---
license: apache-2.0
language:
  - en
tags:
  - text-classification
  - propaganda-detection
  - multi-label
  - modernbert
datasets:
  - synapti/nci-propaganda-production
  - synapti/nci-synthetic-articles
metrics:
  - f1
  - precision
  - recall
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-base
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier trained on the NCI (Neural Counter-Intelligence) Protocol dataset.

## Model Description

This model detects 18 propaganda techniques in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:

1. **Stage 1**: Binary detection (is there propaganda?) using [synapti/nci-binary-detector](https://huggingface.co/synapti/nci-binary-detector)
2. **Stage 2**: Technique classification (which techniques are used?) using this model
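
If the `nci` package (see "Two-Stage Pipeline Usage" below) is not available, the two stages can be chained by hand with plain `transformers`. A minimal sketch, assuming the binary detector is a standard sequence-classification model whose label index 1 means "propaganda"; the 0.5 cutoff and helper names are illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def load(name):
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    mdl = AutoModelForSequenceClassification.from_pretrained(name, trust_remote_code=True)
    return tok, mdl

binary_tok, binary_model = load("synapti/nci-binary-detector")
tech_tok, tech_model = load("synapti/nci-technique-classifier-v2")

def analyze(text, binary_threshold=0.5, technique_threshold=0.3):
    # Stage 1: run the cheap binary detector first
    inputs = binary_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        p = torch.softmax(binary_model(**inputs).logits, dim=-1)[0, 1].item()
    if p < binary_threshold:
        return p, []  # no propaganda detected; skip Stage 2 entirely

    # Stage 2: multi-label technique classification on flagged text only
    inputs = tech_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.sigmoid(tech_model(**inputs).logits)[0]
    detected = [tech_model.config.id2label[i]
                for i, q in enumerate(probs) if q.item() >= technique_threshold]
    return p, detected
```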

## Supported Techniques

| Technique | Description |
|---|---|
| `Loaded_Language` | Words/phrases with strong emotional implications |
| `Appeal_to_fear-prejudice` | Building support by exploiting fear |
| `Exaggeration,Minimisation` | Making something more/less important than it is |
| `Repetition` | Repeating the same message over and over |
| `Flag-Waving` | Playing on national/group identity |
| `Name_Calling,Labeling` | Attacking through labels rather than arguments |
| `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
| `Black-and-White_Fallacy` | Presenting only two choices |
| `Causal_Oversimplification` | Assuming a single cause for a complex issue |
| `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
| `Straw_Man` | Misrepresenting someone's argument |
| `Red_Herring` | Introducing irrelevant topics |
| `Doubt` | Questioning credibility without evidence |
| `Appeal_to_Authority` | Relying on authority rather than evidence |
| `Thought-terminating_Cliches` | Phrases that discourage critical thought |
| `Bandwagon` | Appeals to popularity |
| `Slogans` | Brief, memorable phrases |
| `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)

# Prepare input
text = "Wake up, patriots! The radical elites are destroying our country!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get per-technique probabilities (sigmoid, not softmax: labels are independent)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

# Map indices to technique labels (id2label keys are ints after from_pretrained)
id2label = model.config.id2label
threshold = 0.3

# Print techniques whose probability clears the threshold
for idx, prob in enumerate(probs):
    if prob.item() >= threshold:
        technique = id2label[idx]
        print(f"{technique}: {prob.item():.1%}")
```

## Two-Stage Pipeline Usage

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

# Load the two-stage pipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector",
    technique_model="synapti/nci-technique-classifier-v2",
)

# Analyze text
result = pipeline.analyze("Some text to analyze...")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Confidence: {result.propaganda_confidence:.1%}")
print(f"Detected techniques: {result.detected_techniques}")
```

## Training Details

### Training Data

Fine-tuned on two NCI datasets:

- [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production)
- [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles)

### Training Procedure

- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Loss function**: Focal Loss (gamma=2) to handle class imbalance; see the sketch below
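
The exact loss implementation is not shipped with this card. The following is a minimal sketch of the standard sigmoid focal loss for multi-label classification with `gamma=2`, matching the configuration above; the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-label focal loss: down-weights easy, well-classified labels
    so that rare techniques contribute more to the gradient."""
    targets = targets.float()
    # Per-label binary cross-entropy, kept unreduced
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: predicted probability assigned to the true class of each label
    probs = torch.sigmoid(logits)
    p_t = targets * probs + (1 - targets) * (1 - probs)
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions
    return ((1 - p_t) ** gamma * bce).mean()
```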

## Performance Metrics

**Test set performance:**

| Metric | Score |
|---|---|
| Micro F1 | 80.1% |
| Macro F1 | 51.2% |
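
The micro/macro gap reflects class imbalance: micro F1 pools every label decision, so frequent techniques dominate, while macro F1 averages per-technique scores, so rare techniques count equally. A toy illustration, assuming scikit-learn (the data below is made up):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label matrix: rows = examples, columns = techniques
y_true = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [1, 0, 0], [1, 0, 0]])

# Micro pools all decisions; the one miss on the rare label barely matters
print(f1_score(y_true, y_pred, average="micro"))  # ~0.889
# Macro averages per-label F1; the rare label's 0.0 drags the score down
print(f1_score(y_true, y_pred, average="macro"))  # ~0.667
```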

**Top-performing techniques:**

| Technique | F1 Score |
|---|---|
| `Loaded_Language` | 97.0% |
| `Appeal_to_fear-prejudice` | 89.7% |
| `Name_Calling,Labeling` | 81.8% |
| `Exaggeration,Minimisation` | 75.4% |

## Limitations

- Trained primarily on English text
- Performance varies by technique (common techniques perform better)
- Best used as Stage 2 after binary detection for efficient inference
- Requires `trust_remote_code=True` for the ModernBERT architecture

## Citation

If you use this model, please cite:

```bibtex
@misc{nci-technique-classifier-v2,
  title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
  author={Synapti},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

Apache 2.0