You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Social Group Identification Models

A family of fine-tuned Qwen3 models for extracting social group mentions from text in English and German. These models identify human collectives characterized by shared attributes (professional, demographic, role-based, etc.) and return structured spans following precise extraction rules.

Supported groups include: teachers, students, doctors, children, parents, patients, diabetics, single parents, colleagues, and any other human collective with identifiable shared properties.

Excluded: Named individuals, organizations/institutions, non-humans, quantifiers alone.

Usage

Task Prompt

Use the following prompt with your input text, always appending /no_think at the end:

Click to expand full prompt

## Task: Identify Social Groups in Sentences
**Definition**: A social group is a collection of people characterized by shared attributes. Extract human groups that are plural or generic singular representing a category.

### Core Rules

#### 1. Social Groups Include:
**Any human collective** characterized by shared attributes (plural or generic singular representing a category), such as:
- **Professional/occupational**: "Lehrkräfte" / "teachers", "Ärzte" / "doctors", "Studenten" / "students"
- **Demographic**: "Kinder" / "children", "Jugendliche" / "teenagers", "Senioren" / "seniors"
- **Role-based**: "Eltern" / "parents", "Patienten" / "patients", "Kunden" / "customers"
- **Characteristic-based**: "Diabetiker" / "diabetics", "Alleinerziehende" / "single parents"
- **Social/relational**: "Freunde" / "friends", "Nachbarn" / "neighbors", "Kollegen" / "colleagues"
- **Any other human group** with identifiable shared properties

*Note: Generic singular forms are included when they represent the category, not specific individuals.*

#### 2. Boundary Cases to Exclude:
- **Organizations/institutions**: "Unilever", "NASA", "Harvard", "Bundestag", "SPD" (entities, not groups)
- **Named individuals**: "Angela Merkel", "John Smith" (specific persons)
- **Non-humans**: "Hunde" / "dogs", "Roboter" / "robots" (not human groups)
- **Quantifiers alone**: "alle" / "all", "viele" / "many", "einige" / "some" (not group identifiers)
- **Articles and Numeralia**: "der" / "the", "hundert" / "hundreds" (no relevant attribute)

#### 3. Span Extraction Rules:
**Longest Valid Span Principle**: Extract the complete descriptive phrase that defines the social group.

**Include Essential Modifiers**:
- Descriptive attributes: "ältere Alumni" / "older alumni"
- Professional specifications: "erfahrene Chirurgen" / "experienced surgeons"
- Demographic details: "Kinder unter 12 Jahren" / "children under 12"
- Complex descriptions: "Menschen mit chronischen Erkrankungen" / "people with chronic illnesses"

**Personal Experience Exclusion**: Remove parts that define groups only through speaker's personal relationship:
- "Kollegen, die mich mobben" → "Kollegen" / "colleagues who bully me" → "colleagues"
- "Leute mit ähnlichen Erfahrungen" → "Leute" / "people with similar experiences" → "people"

**Coordination Handling**:
- **Separate groups**: "Männer und Frauen" → [Männer || Frauen]
- **Different attributes**: "junge Ärzte und erfahrene Krankenschwestern" → [junge Ärzte || erfahrene Krankenschwestern]
- **Shared attributes**: "kleine Jungen und Mädchen" → [kleine Jungen und Mädchen]

#### 4. Extraction Guidelines:
**Syntactic Position Independence**: Extract groups regardless of grammatical role (subject, object, prepositional phrase, genitive/possessive).

**Semantic Function Independence**: Extract groups regardless of semantic function (descriptive, predicative, vocative).

### Output Format:
Social Groups: [Group 1 || Group 2 || Group 3] (**Don't add any explanation**)

**Now analyze this sentence**:

⚠️ Critical: The `/no_think` Token

Always append /no_think to your prompts. This token is essential for proper output formatting and ensures the model returns only the structured response without intermediate reasoning.

Example input: [Task prompt] + "Teachers and students discussed the curriculum." + " /no_think"
Expected output: Social Groups: [Teachers || students]

Performance Comparison

Model	Parameters	Avg F1 (All)	Avg F1 (Non-Empty)	Avg Time (s)
SocialGroupIdentification-Qwen3-0.6B-v1.2	0.6B	0.8774	0.5560	0.049
SocialGroupIdentification-Qwen3-1.7B-v1.2	1.7B	0.8792	0.5542	0.073
SocialGroupIdentification-Qwen3-4B-v1.2	4B	0.9160	0.6969	0.146
SocialGroupIdentification-Qwen3-8B-v1.2	8B	0.9175	0.6932	0.220

Test datasets: German articles, German tweets, English articles, English Reddit.

Model Details

Base Model: Qwen3 (0.6B, 1.7B, 4B, 8B variants)
Languages: English, German
Task: Span extraction for social group identification
Output Format: Structured list with || delimiter

Limitations

Optimized for German and English only
Performance varies by genre: higher on formal text (articles), lower on informal social media (tweets)

Citation

@misc{socialgroupidentification2025,
  author = {Schwager, Nils and Büttner, Jonas and Jügens, Pascal},
  title = {Social Group Identification Models},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/collections/nsschw/socialgroupidentification-68e3cae684fb332790c3a52b}
}

Downloads last month: 3

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for nsschw/SocialGroupIdentification-Qwen3-4B-v1.2

Base model

Qwen/Qwen3-4B-Base

Finetuned

Qwen/Qwen3-4B