---
license: apache-2.0
language:
  - en
tags:
  - text-classification
  - propaganda-detection
  - multi-label
  - modernbert
datasets:
  - synapti/nci-propaganda-production
  - synapti/nci-synthetic-articles
metrics:
  - f1
  - precision
  - recall
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-base
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier trained on the NCI (Neural Counter-Intelligence) Protocol dataset.

## Model Description

This model detects 18 propaganda techniques in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:

1. **Stage 1**: Binary detection (is there propaganda?) using [synapti/nci-binary-detector](https://huggingface.co/synapti/nci-binary-detector)
2. **Stage 2**: Technique classification (which techniques are used?) using this model
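
If the `nci` package (see "Two-Stage Pipeline Usage" below) is not available, the two stages can be chained by hand with plain `transformers`. A minimal sketch, assuming the binary detector is a standard sequence-classification model whose label index 1 means "propaganda"; the 0.5 cutoff and helper names are illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def load(name):
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    mdl = AutoModelForSequenceClassification.from_pretrained(name, trust_remote_code=True)
    return tok, mdl

binary_tok, binary_model = load("synapti/nci-binary-detector")
tech_tok, tech_model = load("synapti/nci-technique-classifier-v2")

def analyze(text, binary_threshold=0.5, technique_threshold=0.3):
    # Stage 1: run the cheap binary detector first
    inputs = binary_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        p = torch.softmax(binary_model(**inputs).logits, dim=-1)[0, 1].item()
    if p < binary_threshold:
        return p, []  # no propaganda detected; skip Stage 2 entirely

    # Stage 2: multi-label technique classification on flagged text only
    inputs = tech_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.sigmoid(tech_model(**inputs).logits)[0]
    detected = [tech_model.config.id2label[i]
                for i, q in enumerate(probs) if q.item() >= technique_threshold]
    return p, detected
```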

## Supported Techniques

| Technique | Description |
|---|---|
| `Loaded_Language` | Words/phrases with strong emotional implications |
| `Appeal_to_fear-prejudice` | Building support by exploiting fear |
| `Exaggeration,Minimisation` | Making something more/less important than it is |
| `Repetition` | Repeating the same message over and over |
| `Flag-Waving` | Playing on national/group identity |
| `Name_Calling,Labeling` | Attacking through labels rather than arguments |
| `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
| `Black-and-White_Fallacy` | Presenting only two choices |
| `Causal_Oversimplification` | Assuming a single cause for a complex issue |
| `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
| `Straw_Man` | Misrepresenting someone's argument |
| `Red_Herring` | Introducing irrelevant topics |
| `Doubt` | Questioning credibility without evidence |
| `Appeal_to_Authority` | Relying on authority rather than evidence |
| `Thought-terminating_Cliches` | Phrases that discourage critical thought |
| `Bandwagon` | Appeals to popularity |
| `Slogans` | Brief, memorable phrases |
| `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)

# Prepare input
text = "Wake up, patriots! The radical elites are destroying our country!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get per-technique probabilities (sigmoid, not softmax: labels are independent)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

# Map indices to technique labels (id2label keys are ints after from_pretrained)
id2label = model.config.id2label
threshold = 0.3

# Print techniques whose probability clears the threshold
for idx, prob in enumerate(probs):
    if prob.item() >= threshold:
        technique = id2label[idx]
        print(f"{technique}: {prob.item():.1%}")
```

## Two-Stage Pipeline Usage

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

# Load the two-stage pipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector",
    technique_model="synapti/nci-technique-classifier-v2",
)

# Analyze text
result = pipeline.analyze("Some text to analyze...")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Confidence: {result.propaganda_confidence:.1%}")
print(f"Detected techniques: {result.detected_techniques}")
```

## Training Details

### Training Data

Fine-tuned on two NCI datasets:

- [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production)
- [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles)

### Training Procedure

- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Loss function**: Focal Loss (gamma=2) to handle class imbalance; see the sketch below
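
The exact loss implementation is not shipped with this card. The following is a minimal sketch of the standard sigmoid focal loss for multi-label classification with `gamma=2`, matching the configuration above; the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-label focal loss: down-weights easy, well-classified labels
    so that rare techniques contribute more to the gradient."""
    targets = targets.float()
    # Per-label binary cross-entropy, kept unreduced
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: predicted probability assigned to the true class of each label
    probs = torch.sigmoid(logits)
    p_t = targets * probs + (1 - targets) * (1 - probs)
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions
    return ((1 - p_t) ** gamma * bce).mean()
```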

## Performance Metrics

**Test set performance:**

| Metric | Score |
|---|---|
| Micro F1 | 80.1% |
| Macro F1 | 51.2% |
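
The micro/macro gap reflects class imbalance: micro F1 pools every label decision, so frequent techniques dominate, while macro F1 averages per-technique scores, so rare techniques count equally. A toy illustration, assuming scikit-learn (the data below is made up):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label matrix: rows = examples, columns = techniques
y_true = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [1, 0, 0], [1, 0, 0]])

# Micro pools all decisions; the one miss on the rare label barely matters
print(f1_score(y_true, y_pred, average="micro"))  # ~0.889
# Macro averages per-label F1; the rare label's 0.0 drags the score down
print(f1_score(y_true, y_pred, average="macro"))  # ~0.667
```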

**Top-performing techniques:**

| Technique | F1 Score |
|---|---|
| `Loaded_Language` | 97.0% |
| `Appeal_to_fear-prejudice` | 89.7% |
| `Name_Calling,Labeling` | 81.8% |
| `Exaggeration,Minimisation` | 75.4% |

## Limitations

- Trained primarily on English text
- Performance varies by technique (common techniques perform better)
- Best used as Stage 2 after binary detection for efficient inference
- Requires `trust_remote_code=True` for the ModernBERT architecture

## Citation

If you use this model, please cite:

```bibtex
@misc{nci-technique-classifier-v2,
  title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
  author={Synapti},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

Apache 2.0