SIP-BERT

SIP-BERT is a transformer-based model designed to detect social inequality in German texts.
It was fine-tuned on German Bundestag debates (sourced from OpenDiscourse); each training instance is a 3-sentence segment of debate text.

Model Description

  • Architecture: bert-base-german-cased (from dbmdz)
  • Task: Binary classification – detecting social inequality in German texts
  • Labels:
    • 0 = no social inequality
    • 1 = social inequality
  • Language: German
  • Training Data: 1,950 annotated text passages from Bundestag debates (via OpenDiscourse)
  • Segmentation: Data split into 3-sentence units (see the sketch after this list)
  • Evaluation: Accuracy 0.97; F1 score 0.95
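
Because the model was trained on 3-sentence segments, longer documents should be split into similar units before classification. The segmenter used to build the training data is not documented here; the sketch below uses a naive regex-based sentence splitter purely as an illustration.

import re

def three_sentence_segments(text):
    # Naive sentence split on terminal punctuation. This is an
    # assumption for illustration; the segmenter used for the
    # original training data is not specified.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Group consecutive sentences into 3-sentence units
    return [" ".join(sentences[i:i + 3]) for i in range(0, len(sentences), 3)]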

Intended Use

  • Primary use case: Analysis of parliamentary discourse on social inequality
  • Research contexts: Political science, computational social science, discourse analysis

Limitations

  • The model is trained on Bundestag debates (1949–2021), but is specialized for texts from 1990 onwards.
  • It may be less reliable for earlier parliamentary language (1949–1989) and for non-parliamentary speech.
  • It was designed primarily to detect economic inequality and may not generalize to other dimensions of inequality.

Usage

You can load the model and run inference with the Hugging Face transformers library:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("miriamex/SIP-BERT")
model = AutoModelForSequenceClassification.from_pretrained("miriamex/SIP-BERT")

# Tokenize a sample passage and run it through the model
inputs = tokenizer("Hier ein Beispieltext über soziale Ungleichheit.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# 0 = no social inequality, 1 = social inequality
predicted_class = outputs.logits.argmax(dim=-1).item()
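
Alternatively, the transformers pipeline API bundles tokenization, inference, and label mapping into a single call. A minimal sketch (the label names it returns depend on the id2label mapping stored in the model's config):

from transformers import pipeline

classifier = pipeline("text-classification", model="miriamex/SIP-BERT")
print(classifier("Hier ein Beispieltext über soziale Ungleichheit."))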