# SIP-BERT
SIP-BERT is a transformer-based model designed to detect social inequality in German texts.
It was fine-tuned on German Bundestag debates (sourced from OpenDiscourse), where each training instance is a three-sentence segment.
## Model Description
- Architecture: bert-base-german-cased (from dbmdz)
- Task: Binary classification – detecting social inequality in German texts
- Labels: 0 = no social inequality, 1 = social inequality
- Language: German
- Training Data: 1,950 annotated text passages from Bundestag debates (via OpenDiscourse)
- Segmentation: Data split into three-sentence units (see the sketch after this list)
- Evaluation: Accuracy 0.97; F1 score 0.95
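The card does not specify which tool was used to segment the debates; the following is a minimal sketch of how raw text could be split into three-sentence units to mirror the training setup, assuming NLTK's sentence tokenizer (an illustrative choice, not the documented preprocessing):

```python
import nltk

nltk.download("punkt")  # sentence tokenizer data ("punkt_tab" on newer NLTK versions)

def three_sentence_segments(text: str) -> list[str]:
    """Split German text into consecutive three-sentence segments."""
    sentences = nltk.sent_tokenize(text, language="german")
    return [" ".join(sentences[i:i + 3]) for i in range(0, len(sentences), 3)]
```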
## Intended Use
- Primary use case: Analysis of parliamentary discourse on social inequality
- Research contexts: Political science, computational social science, discourse analysis
## Limitations
- The model was trained on Bundestag debates (1949–2021) but is specialized for texts from 1990 onwards.
- It may be less reliable for earlier parliamentary language (1949–1989) and for non-parliamentary speech.
- It was designed primarily to detect economic inequality, and it may not be applicable to other types of inequality.
## Usage
You can load the model with the Hugging Face transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("miriamex/SIP-BERT")
model = AutoModelForSequenceClassification.from_pretrained("miriamex/SIP-BERT")

inputs = tokenizer("Hier ein Beispieltext über soziale Ungleichheit.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# 0 = no social inequality, 1 = social inequality
predicted_class = outputs.logits.argmax(dim=-1).item()
```
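For quick experiments, the model can also be run through the transformers pipeline API. Note that unless the repository's config maps label ids to names, the pipeline returns the generic labels LABEL_0 (no social inequality) and LABEL_1 (social inequality):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="miriamex/SIP-BERT")
# e.g. [{'label': 'LABEL_1', 'score': 0.99}] if no id2label mapping is configured
print(classifier("Die Schere zwischen Arm und Reich öffnet sich weiter."))
```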