Social Group Identification Models
A family of fine-tuned Qwen3 models for extracting social group mentions from text in English and German. These models identify human collectives characterized by shared attributes (professional, demographic, role-based, etc.) and return structured spans following precise extraction rules.
Supported groups include: teachers, students, doctors, children, parents, patients, diabetics, single parents, colleagues, and any other human collective with identifiable shared properties.
Excluded: Named individuals, organizations/institutions, non-humans, quantifiers alone.
Usage
Task Prompt
Use the following prompt with your input text, always appending /no_think at the end:
Click to expand full prompt
## Task: Identify Social Groups in Sentences
**Definition**: A social group is a collection of people characterized by shared attributes. Extract human groups that are plural or generic singular representing a category.
### Core Rules
#### 1. Social Groups Include:
**Any human collective** characterized by shared attributes (plural or generic singular representing a category), such as:
- **Professional/occupational**: "Lehrkräfte" / "teachers", "Ärzte" / "doctors", "Studenten" / "students"
- **Demographic**: "Kinder" / "children", "Jugendliche" / "teenagers", "Senioren" / "seniors"
- **Role-based**: "Eltern" / "parents", "Patienten" / "patients", "Kunden" / "customers"
- **Characteristic-based**: "Diabetiker" / "diabetics", "Alleinerziehende" / "single parents"
- **Social/relational**: "Freunde" / "friends", "Nachbarn" / "neighbors", "Kollegen" / "colleagues"
- **Any other human group** with identifiable shared properties
*Note: Generic singular forms are included when they represent the category, not specific individuals.*
#### 2. Boundary Cases to Exclude:
- **Organizations/institutions**: "Unilever", "NASA", "Harvard", "Bundestag", "SPD" (entities, not groups)
- **Named individuals**: "Angela Merkel", "John Smith" (specific persons)
- **Non-humans**: "Hunde" / "dogs", "Roboter" / "robots" (not human groups)
- **Quantifiers alone**: "alle" / "all", "viele" / "many", "einige" / "some" (not group identifiers)
- **Articles and Numeralia**: "der" / "the", "hundert" / "hundreds" (no relevant attribute)
#### 3. Span Extraction Rules:
**Longest Valid Span Principle**: Extract the complete descriptive phrase that defines the social group.
**Include Essential Modifiers**:
- Descriptive attributes: "ältere Alumni" / "older alumni"
- Professional specifications: "erfahrene Chirurgen" / "experienced surgeons"
- Demographic details: "Kinder unter 12 Jahren" / "children under 12"
- Complex descriptions: "Menschen mit chronischen Erkrankungen" / "people with chronic illnesses"
**Personal Experience Exclusion**: Remove parts that define groups only through speaker's personal relationship:
- "Kollegen, die mich mobben" → "Kollegen" / "colleagues who bully me" → "colleagues"
- "Leute mit ähnlichen Erfahrungen" → "Leute" / "people with similar experiences" → "people"
**Coordination Handling**:
- **Separate groups**: "Männer und Frauen" → [Männer || Frauen]
- **Different attributes**: "junge Ärzte und erfahrene Krankenschwestern" → [junge Ärzte || erfahrene Krankenschwestern]
- **Shared attributes**: "kleine Jungen und Mädchen" → [kleine Jungen und Mädchen]
#### 4. Extraction Guidelines:
**Syntactic Position Independence**: Extract groups regardless of grammatical role (subject, object, prepositional phrase, genitive/possessive).
**Semantic Function Independence**: Extract groups regardless of semantic function (descriptive, predicative, vocative).
### Output Format:
Social Groups: [Group 1 || Group 2 || Group 3] (**Don't add any explanation**)
**Now analyze this sentence**:
⚠️ Critical: The /no_think Token
Always append /no_think to your prompts. This token is essential for proper output formatting and ensures the model returns only the structured response without intermediate reasoning.
Example input: [Task prompt] + "Teachers and students discussed the curriculum." + " /no_think"
Expected output: Social Groups: [Teachers || students]
Performance Comparison
| Model | Parameters | Avg F1 (All) | Avg F1 (Non-Empty) | Avg Time (s) |
|---|---|---|---|---|
| SocialGroupIdentification-Qwen3-0.6B-v1.2 | 0.6B | 0.8774 | 0.5560 | 0.049 |
| SocialGroupIdentification-Qwen3-1.7B-v1.2 | 1.7B | 0.8792 | 0.5542 | 0.073 |
| SocialGroupIdentification-Qwen3-4B-v1.2 | 4B | 0.9160 | 0.6969 | 0.146 |
| SocialGroupIdentification-Qwen3-8B-v1.2 | 8B | 0.9175 | 0.6932 | 0.220 |
Test datasets: German articles, German tweets, English articles, English Reddit.
Model Details
- Base Model: Qwen3 (0.6B, 1.7B, 4B, 8B variants)
- Languages: English, German
- Task: Span extraction for social group identification
- Output Format: Structured list with
||delimiter
Limitations
- Optimized for German and English only
- Performance varies by genre: higher on formal text (articles), lower on informal social media (tweets)
Citation
@misc{socialgroupidentification2025,
author = {Schwager, Nils and Büttner, Jonas and Jügens, Pascal},
title = {Social Group Identification Models},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/collections/nsschw/socialgroupidentification-68e3cae684fb332790c3a52b}
}
- Downloads last month
- 3