---
license: apache-2.0
language:
- en
tags:
- text-classification
- propaganda-detection
- multi-label
- modernbert
datasets:
- synapti/nci-propaganda-production
- synapti/nci-synthetic-articles
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-base
---

# NCI Technique Classifier v2

**Multi-label propaganda technique classifier** trained on the NCI (Neural Counter-Intelligence) Protocol dataset.

## Model Description

This model detects **18 propaganda techniques** in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:

1. **Stage 1**: Binary detection (is there propaganda?) using `synapti/nci-binary-detector`
2. **Stage 2**: Technique classification (what techniques are used?) using this model

### Supported Techniques

| Technique | Description |
|-----------|-------------|
| `Loaded_Language` | Words/phrases with strong emotional implications |
| `Appeal_to_fear-prejudice` | Building support by exploiting fear |
| `Exaggeration,Minimisation` | Making something more/less important than it is |
| `Repetition` | Repeating the same message over and over |
| `Flag-Waving` | Playing on national/group identity |
| `Name_Calling,Labeling` | Attacking through labels rather than arguments |
| `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
| `Black-and-White_Fallacy` | Presenting only two choices |
| `Causal_Oversimplification` | Assuming a single cause for a complex issue |
| `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
| `Straw_Man` | Misrepresenting someone's argument |
| `Red_Herring` | Introducing irrelevant topics |
| `Doubt` | Questioning credibility without evidence |
| `Appeal_to_Authority` | Relying on authority rather than evidence |
| `Thought-terminating_Cliches` | Phrases that discourage critical thought |
| `Bandwagon` | Appeals to popularity |
| `Slogans` | Brief, memorable phrases |
| `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "synapti/nci-technique-classifier-v2",
    trust_remote_code=True
)

# Prepare input
text = "Wake up, patriots! The radical elites are destroying our country!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get per-technique probabilities (sigmoid, since this is multi-label)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]

# Map class indices to technique labels
id2label = model.config.id2label  # transformers stores id2label with int keys
threshold = 0.3

# Print detected techniques
for idx, prob in enumerate(probs):
    if prob.item() >= threshold:
        technique = id2label[idx]
        print(f"{technique}: {prob.item():.1%}")
```
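
The snippet above truncates inputs at 512 tokens. For article-length text, one simple option (an illustration, not part of this model's tooling) is to score overlapping windows with the `model` and `tokenizer` loaded above and keep each technique's maximum probability:

```python
def classify_long_text(text, window=512, stride=384, threshold=0.3):
    """Score overlapping token windows; max-pool probabilities per technique."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    all_probs = []
    for start in range(0, max(len(ids), 1), stride):
        # Decode the window back to text so special tokens are re-added
        chunk = tokenizer.decode(ids[start:start + window])
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=window)
        with torch.no_grad():
            all_probs.append(torch.sigmoid(model(**inputs).logits)[0])
    # Take the maximum probability for each technique across all windows
    pooled = torch.stack(all_probs).max(dim=0).values
    return [model.config.id2label[i] for i, p in enumerate(pooled) if p.item() >= threshold]
```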

### Two-Stage Pipeline Usage

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

# Load two-stage pipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector",
    technique_model="synapti/nci-technique-classifier-v2",
)

# Analyze text
result = pipeline.analyze("Some text to analyze...")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Confidence: {result.propaganda_confidence:.1%}")
print(f"Detected techniques: {result.detected_techniques}")
```
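
`TwoStagePipeline` ships with the NCI Protocol codebase. If that package is not available, the two stages can be chained directly with plain transformers. The sketch below makes one assumption not stated on this card: that the binary detector is a standard two-class head whose index 1 is the propaganda class (check its `id2label` to confirm).

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def load(name):
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    mdl = AutoModelForSequenceClassification.from_pretrained(name, trust_remote_code=True)
    return tok, mdl

binary_tok, binary_model = load("synapti/nci-binary-detector")
tech_tok, tech_model = load("synapti/nci-technique-classifier-v2")

def analyze(text, binary_threshold=0.5, technique_threshold=0.3):
    # Stage 1: binary propaganda detection
    inputs = binary_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = binary_model(**inputs).logits
    confidence = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed propaganda index
    if confidence < binary_threshold:
        return {"has_propaganda": False, "confidence": confidence, "techniques": []}

    # Stage 2: multi-label technique classification (this model)
    inputs = tech_tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.sigmoid(tech_model(**inputs).logits)[0]
    techniques = [tech_model.config.id2label[i]
                  for i, p in enumerate(probs) if p.item() >= technique_threshold]
    return {"has_propaganda": True, "confidence": confidence, "techniques": techniques}
```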

## Training Details

### Training Data

- **Primary**: [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) (11,573 samples)
- **Augmentation**: [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles) (~5,485 synthetic article-length samples)
- **Total**: ~17,000 training samples

### Training Procedure

- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Loss function**: focal loss (gamma=2) to handle class imbalance (see the sketch below)
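
For reference, the sketch below shows a typical multi-label focal loss of this kind; the actual AutoTrain training code is not reproduced on this card.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-label focal loss: down-weights easy examples so that rare,
    hard-to-classify techniques contribute more to the gradient.
    targets: float tensor of 0s/1s with the same shape as logits."""
    # Per-label binary cross-entropy, kept unreduced
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: probability the model assigns to the correct value of each label
    probs = torch.sigmoid(logits)
    p_t = probs * targets + (1 - probs) * (1 - targets)
    # Scale each term by (1 - p_t)^gamma, then average over labels and batch
    return ((1 - p_t) ** gamma * bce).mean()
```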

### Performance Metrics

**Test set performance:**

| Metric | Score |
|--------|-------|
| **Micro F1** | 80.1% |
| **Macro F1** | 51.2% |

The gap between micro and macro F1 reflects class imbalance: frequent techniques are detected reliably, while rarer ones drag the macro average down (see Limitations).

**Top-performing techniques:**

| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 81.8% |
| Exaggeration,Minimisation | 75.4% |
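
For clarity on the two averages, both can be reproduced for a multi-label model with scikit-learn, given binarized predictions (`y_pred = (probs >= threshold)`). The arrays below are a toy illustration, not this model's outputs:

```python
import numpy as np
from sklearn.metrics import f1_score

# Rows are samples, columns are labels (3 labels here; 18 for this model)
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

# Micro pools TP/FP/FN over all labels; macro averages per-label F1 unweighted
print("Micro F1:", f1_score(y_true, y_pred, average="micro"))  # 0.80
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.67
```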

## Limitations

- Trained primarily on English text
- Performance varies by technique (common techniques perform better)
- Best used as Stage 2 after binary detection for efficient inference
- Requires `trust_remote_code=True` for the ModernBERT architecture on transformers versions without native ModernBERT support

## Citation

If you use this model, please cite:

```bibtex
@misc{nci-technique-classifier-v2,
  title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
  author={Synapti},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

Apache 2.0