---
license: apache-2.0
language:
- en
tags:
- text-classification
- propaganda-detection
- multi-label
- modernbert
datasets:
- synapti/nci-propaganda-production
- synapti/nci-synthetic-articles
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-base
---
# NCI Technique Classifier v2
Multi-label propaganda technique classifier trained on the NCI (Neural Counter-Intelligence) Protocol dataset.
## Model Description
This model detects 18 propaganda techniques in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:
- **Stage 1**: Binary detection (is there propaganda?) using `synapti/nci-binary-detector`
- **Stage 2**: Technique classification (what techniques are used?) using this model
## Supported Techniques
| Technique | Description |
|---|---|
| Loaded_Language | Words/phrases with strong emotional implications |
| Appeal_to_fear-prejudice | Building support by exploiting fear |
| Exaggeration,Minimisation | Making something more/less important than it is |
| Repetition | Repeating the same message over and over |
| Flag-Waving | Playing on national/group identity |
| Name_Calling,Labeling | Attacking through labels rather than arguments |
| Reductio_ad_hitlerum | Persuading by comparing to disliked groups |
| Black-and-White_Fallacy | Presenting only two choices |
| Causal_Oversimplification | Assuming a single cause for a complex issue |
| Whataboutism,Straw_Men,Red_Herring | Deflection and misdirection |
| Straw_Man | Misrepresenting someone's argument |
| Red_Herring | Introducing irrelevant topics |
| Doubt | Questioning credibility without evidence |
| Appeal_to_Authority | Relying on authority rather than evidence |
| Thought-terminating_Cliches | Phrases that discourage critical thought |
| Bandwagon | Appeals to popularity |
| Slogans | Brief, memorable phrases |
| Obfuscation,Intentional_Vagueness,Confusion | Deliberately unclear language |
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True,
)

# Prepare input
text = "Wake up, patriots! The radical elites are destroying our country!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get per-technique probabilities (multi-label: sigmoid, not softmax)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]

# Map indices to technique labels; config.id2label keys are ints after loading
id2label = model.config.id2label
threshold = 0.3

# Print detected techniques
for idx, prob in enumerate(probs):
    if prob.item() >= threshold:
        print(f"{id2label[idx]}: {prob.item():.1%}")
```
## Two-Stage Pipeline Usage
```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

# Load two-stage pipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector",
    technique_model="synapti/nci-technique-classifier-v2",
)

# Analyze text
result = pipeline.analyze("Some text to analyze...")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Confidence: {result.propaganda_confidence:.1%}")
print(f"Detected techniques: {result.detected_techniques}")
```
## Training Details

### Training Data
- Primary: `synapti/nci-propaganda-production` (11,573 samples)
- Augmentation: `synapti/nci-synthetic-articles` (~5,485 synthetic article-length samples)
- Total: ~17,000 training samples
### Training Procedure
- Base model: `answerdotai/ModernBERT-base`
- Fine-tuning: Hugging Face AutoTrain on an A100 GPU
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Loss function: Focal Loss (gamma=2) to handle class imbalance (see the sketch below)
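
For reference, here is a minimal multi-label focal loss in PyTorch, built on binary cross-entropy with logits. This is the standard formulation; the exact variant used in training (e.g., any alpha weighting or different reduction) is an assumption.

```python
import torch
import torch.nn.functional as F

def multilabel_focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss over independent binary labels (one per technique).

    Down-weights easy examples by (1 - p_t)^gamma so training focuses on
    hard/rare labels -- useful when a few techniques dominate the data.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true label
    return ((1 - p_t) ** gamma * bce).mean()

# Example: batch of 2 texts, 18 technique labels
logits = torch.randn(2, 18)
targets = torch.randint(0, 2, (2, 18)).float()
print(multilabel_focal_loss(logits, targets, gamma=2.0))
```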
## Performance Metrics

**Test Set Performance:**
| Metric | Score |
|---|---|
| Micro F1 | 80.1% |
| Macro F1 | 51.2% |
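
The gap between micro and macro F1 largely reflects class imbalance: micro F1 is dominated by frequent techniques such as Loaded_Language, while macro F1 weights all 18 techniques equally, so rare techniques with little training data pull the average down.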
**Top Performing Techniques:**
| Technique | F1 Score |
|---|---|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 81.8% |
| Exaggeration,Minimisation | 75.4% |
## Limitations
- Trained primarily on English text
- Performance varies by technique (common techniques perform better)
- Best used as Stage 2 after binary detection for efficient inference
- Requires `trust_remote_code=True` for the ModernBERT architecture
## Citation
If you use this model, please cite:
```bibtex
@misc{nci-technique-classifier-v2,
  title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
  author={Synapti},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```
## License
Apache 2.0