---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

## Model Description

This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:

- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines whether propaganda is present
- **Stage 2**: This model identifies which specific techniques are used

### Techniques Detected

| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of an issue |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appealing to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing a target to Hitler or the Nazis to discredit it |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for a complex issue |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning the credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to shut down discussion |
| 15 | Bandwagon | Appealing to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |

## Training

- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train / 1,727 validation / 1,729 test examples)
- **Loss**: Focal loss (gamma=2.0) with per-class weights to counter technique imbalance
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
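The focal term down-weights examples the model already classifies easily, so the rarer techniques contribute meaningfully to the gradient. For reference, here is a minimal sketch of a multi-label focal loss in the standard formulation; the actual training script and class-weight values are not published with this card, so the `class_weights` argument is a hypothetical stand-in:

```python
import torch
import torch.nn.functional as F

def multilabel_focal_loss(logits, targets, class_weights=None, gamma=2.0):
    """Per-label BCE scaled by (1 - p_t)^gamma, so easy, confidently
    classified labels contribute less to the loss than rare, hard ones."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability the model assigns to the true label
    loss = (1.0 - p_t) ** gamma * bce
    if class_weights is not None:
        # Hypothetical per-technique weights, shape (num_labels,),
        # broadcast across the batch dimension
        loss = loss * class_weights
    return loss.mean()
```

With `gamma=0` and no weights this reduces to plain binary cross-entropy; `gamma=2.0` increasingly suppresses the contribution of well-classified examples.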
## Performance

| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |

The gap between micro and macro F1 reflects the class imbalance: frequent techniques are detected far more reliably than rare ones.

### Per-Technique Performance (selected)

| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |

## Usage

### With Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

# Report techniques above the cutoff; per-technique thresholds tuned on
# validation data can replace the global 0.5 for specific use cases.
threshold = 0.5
detected = [
    (model.config.id2label[i], p.item())
    for i, p in enumerate(probs)
    if p > threshold
]
print(detected)
```

### With the NCI Protocol

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```

### ONNX Inference

An ONNX export is available at `onnx/model.onnx` in this repository for faster inference (~1.25x speedup).

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
model_path = hf_hub_download("synapti/nci-technique-classifier-v2", "onnx/model.onnx")
session = ort.InferenceSession(model_path)

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)
outputs = session.run(None, {
    # ONNX transformer exports typically expect int64 ids
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid over the raw logits
```

## Limitations

- Trained primarily on English news articles
- May not generalize well to social media or other domains
- The default threshold of 0.5 may need adjustment for specific use cases
- Because this is multi-label classification, several techniques can be flagged for a single text; downstream consumers should not assume one label per input

## Citation

```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

MIT License