You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Social Group Identification Models

A family of fine-tuned Qwen3 models for extracting social group mentions from text in English and German. These models identify human collectives characterized by shared attributes (professional, demographic, role-based, etc.) and return structured spans following precise extraction rules.

Supported groups include: teachers, students, doctors, children, parents, patients, diabetics, single parents, colleagues, and any other human collective with identifiable shared properties.

Excluded: Named individuals, organizations/institutions, non-humans, quantifiers alone.

Usage

Task Prompt

Use the following prompt with your input text, always appending /no_think at the end:

Click to expand full prompt
## Task: Identify Social Groups in Sentences
**Definition**: A social group is a collection of people characterized by shared attributes. Extract human groups that are plural or generic singular representing a category.

### Core Rules

#### 1. Social Groups Include:
**Any human collective** characterized by shared attributes (plural or generic singular representing a category), such as:
- **Professional/occupational**: "Lehrkräfte" / "teachers", "Ärzte" / "doctors", "Studenten" / "students"
- **Demographic**: "Kinder" / "children", "Jugendliche" / "teenagers", "Senioren" / "seniors"
- **Role-based**: "Eltern" / "parents", "Patienten" / "patients", "Kunden" / "customers"
- **Characteristic-based**: "Diabetiker" / "diabetics", "Alleinerziehende" / "single parents"
- **Social/relational**: "Freunde" / "friends", "Nachbarn" / "neighbors", "Kollegen" / "colleagues"
- **Any other human group** with identifiable shared properties

*Note: Generic singular forms are included when they represent the category, not specific individuals.*

#### 2. Boundary Cases to Exclude:
- **Organizations/institutions**: "Unilever", "NASA", "Harvard", "Bundestag", "SPD" (entities, not groups)
- **Named individuals**: "Angela Merkel", "John Smith" (specific persons)
- **Non-humans**: "Hunde" / "dogs", "Roboter" / "robots" (not human groups)
- **Quantifiers alone**: "alle" / "all", "viele" / "many", "einige" / "some" (not group identifiers)
- **Articles and Numeralia**: "der" / "the", "hundert" / "hundreds" (no relevant attribute)

#### 3. Span Extraction Rules:
**Longest Valid Span Principle**: Extract the complete descriptive phrase that defines the social group.

**Include Essential Modifiers**:
- Descriptive attributes: "ältere Alumni" / "older alumni"
- Professional specifications: "erfahrene Chirurgen" / "experienced surgeons"
- Demographic details: "Kinder unter 12 Jahren" / "children under 12"
- Complex descriptions: "Menschen mit chronischen Erkrankungen" / "people with chronic illnesses"

**Personal Experience Exclusion**: Remove parts that define groups only through speaker's personal relationship:
- "Kollegen, die mich mobben" → "Kollegen" / "colleagues who bully me" → "colleagues"
- "Leute mit ähnlichen Erfahrungen" → "Leute" / "people with similar experiences" → "people"

**Coordination Handling**:
- **Separate groups**: "Männer und Frauen" → [Männer || Frauen]
- **Different attributes**: "junge Ärzte und erfahrene Krankenschwestern" → [junge Ärzte || erfahrene Krankenschwestern]
- **Shared attributes**: "kleine Jungen und Mädchen" → [kleine Jungen und Mädchen]

#### 4. Extraction Guidelines:
**Syntactic Position Independence**: Extract groups regardless of grammatical role (subject, object, prepositional phrase, genitive/possessive).

**Semantic Function Independence**: Extract groups regardless of semantic function (descriptive, predicative, vocative).

### Output Format:
Social Groups: [Group 1 || Group 2 || Group 3] (**Don't add any explanation**)

**Now analyze this sentence**: 

⚠️ Critical: The /no_think Token

Always append /no_think to your prompts. This token is essential for proper output formatting and ensures the model returns only the structured response without intermediate reasoning.

Example input: [Task prompt] + "Teachers and students discussed the curriculum." + " /no_think"
Expected output: Social Groups: [Teachers || students]

Performance Comparison

Model Parameters Avg F1 (All) Avg F1 (Non-Empty) Avg Time (s)
SocialGroupIdentification-Qwen3-0.6B-v1.2 0.6B 0.8774 0.5560 0.049
SocialGroupIdentification-Qwen3-1.7B-v1.2 1.7B 0.8792 0.5542 0.073
SocialGroupIdentification-Qwen3-4B-v1.2 4B 0.9160 0.6969 0.146
SocialGroupIdentification-Qwen3-8B-v1.2 8B 0.9175 0.6932 0.220

Test datasets: German articles, German tweets, English articles, English Reddit.

Model Details

  • Base Model: Qwen3 (0.6B, 1.7B, 4B, 8B variants)
  • Languages: English, German
  • Task: Span extraction for social group identification
  • Output Format: Structured list with || delimiter

Limitations

  • Optimized for German and English only
  • Performance varies by genre: higher on formal text (articles), lower on informal social media (tweets)

Citation

@misc{socialgroupidentification2025,
  author = {Schwager, Nils and Büttner, Jonas and Jügens, Pascal},
  title = {Social Group Identification Models},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/collections/nsschw/socialgroupidentification-68e3cae684fb332790c3a52b}
}
Downloads last month
3
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for nsschw/SocialGroupIdentification-Qwen3-4B-v1.2

Base model

Qwen/Qwen3-4B-Base
Finetuned
Qwen/Qwen3-4B
Finetuned
(303)
this model