@@ -1,63 +1,168 @@
 ---
-library_name: transformers
 license: apache-2.0
-base_model: answerdotai/ModernBERT-base
 tags:
-- generated_from_trainer
-model-index:
-- name: nci-technique-classifier-v2
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# nci-technique-classifier-v2
-This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0810
-- Micro F1: 0.8010
-- Macro F1: 0.5416
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 3
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Micro F1 | Macro F1 |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|
-| 0.0868        | 1.0   | 1224 | 0.0925          | 0.7615   | 0.2471   |
-| 0.0783        | 2.0   | 2448 | 0.0834          | 0.7764   | 0.4156   |
-| 0.0666        | 3.0   | 3672 | 0.0810          | 0.8010   | 0.5416   |
-### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.1
-- Tokenizers 0.22.1

 ---
 license: apache-2.0
+language:
+- en
 tags:
+- text-classification
+- propaganda-detection
+- multi-label
+- modernbert
+datasets:
+- synapti/nci-propaganda-production
+- synapti/nci-synthetic-articles
+metrics:
+- f1
+- precision
+- recall
+pipeline_tag: text-classification
+library_name: transformers
+base_model: answerdotai/ModernBERT-base
 ---
+# NCI Technique Classifier v2
+**Multi-label propaganda technique classifier** trained on the NCI (Neural Counter-Intelligence) Protocol dataset.
+## Model Description
+This model detects **18 propaganda techniques** in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:
+1. **Stage 1**: Binary detection (is there propaganda?) using `synapti/nci-binary-detector`
+2. **Stage 2**: Technique classification (what techniques are used?) using this model
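The gating logic between the two stages can be sketched in plain Python. This is a minimal illustration, not the `nci` package's implementation. `two_stage_analyze`, `binary_score_fn`, `technique_probs_fn`, and the 0.5 gate are hypothetical names and values. In practice, the two callables would wrap `synapti/nci-binary-detector` and this model, and the 0.3 technique threshold mirrors the usage example.

```python
from typing import Callable, Dict


def two_stage_analyze(
    text: str,
    binary_score_fn: Callable[[str], float],                # hypothetical Stage 1 scorer
    technique_probs_fn: Callable[[str], Dict[str, float]],  # hypothetical Stage 2 scorer
    gate: float = 0.5,        # hypothetical Stage 1 cutoff
    threshold: float = 0.3,   # per-technique cutoff, as in the usage example
) -> Dict[str, object]:
    """Run Stage 2 only when Stage 1 flags the text (hypothetical helper)."""
    score = binary_score_fn(text)
    if score < gate:
        # Stage 1 says "no propaganda": skip the technique classifier entirely
        return {"has_propaganda": False, "confidence": score, "techniques": []}
    probs = technique_probs_fn(text)
    # Keep techniques above the threshold, most probable first
    techniques = sorted(
        (name for name, p in probs.items() if p >= threshold),
        key=lambda name: probs[name],
        reverse=True,
    )
    return {"has_propaganda": True, "confidence": score, "techniques": techniques}
```

The pipeline saves compute on mostly clean input by skipping Stage 2 for any text that Stage 1 clears.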
+### Supported Techniques
+| Technique | Description |
+|-----------|-------------|
+| `Loaded_Language` | Words/phrases with strong emotional implications |
+| `Appeal_to_fear-prejudice` | Building support by exploiting fear or prejudice |
+| `Exaggeration,Minimisation` | Making something more/less important than it is |
+| `Repetition` | Repeating the same message over and over |
+| `Flag-Waving` | Playing on national/group identity |
+| `Name_Calling,Labeling` | Attacking through labels rather than arguments |
+| `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
+| `Black-and-White_Fallacy` | Presenting two options as the only possibilities |
+| `Causal_Oversimplification` | Assuming a single cause for a complex issue |
+| `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
+| `Straw_Man` | Misrepresenting someone's argument |
+| `Red_Herring` | Introducing irrelevant topics |
+| `Doubt` | Questioning credibility without evidence |
+| `Appeal_to_Authority` | Relying on authority rather than evidence |
+| `Thought-terminating_Cliches` | Phrases that discourage critical thought |
+| `Bandwagon` | Appeals to popularity |
+| `Slogans` | Brief, memorable phrases |
+| `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |
+## Usage
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+# Load model and tokenizer (ModernBERT is built into Transformers >= 4.48,
+# so trust_remote_code=True is not required)
+model = AutoModelForSequenceClassification.from_pretrained(
+    "synapti/nci-technique-classifier-v2"
+)
+tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
+# Prepare input
+text = "Wake up, patriots! The radical elites are destroying our country!"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+# Get predictions
+with torch.no_grad():
+    outputs = model(**inputs)
+    probs = torch.sigmoid(outputs.logits)[0]
+# Get technique labels
+id2label = model.config.id2label
+threshold = 0.3
+# Print detected techniques
+for idx, prob in enumerate(probs):
+    if prob.item() >= threshold:
+        # config.id2label uses int keys once the config is loaded
+        technique = id2label[idx]
+        print(f"{technique}: {prob.item():.1%}")
+```
+### Two-Stage Pipeline Usage
+```python
+from nci.transformers.two_stage_pipeline import TwoStagePipeline
+# Load two-stage pipeline
+pipeline = TwoStagePipeline.from_pretrained(
+    binary_model="synapti/nci-binary-detector",
+    technique_model="synapti/nci-technique-classifier-v2",
+)
+# Analyze text
+result = pipeline.analyze("Some text to analyze...")
+print(f"Has propaganda: {result.has_propaganda}")
+print(f"Confidence: {result.propaganda_confidence:.1%}")
+print(f"Detected techniques: {result.detected_techniques}")
+```
+## Training Details
+### Training Data
+- **Primary**: [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) (11,573 samples)
+- **Augmentation**: [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles) (~5,485 synthetic article-length samples)
+- **Total**: ~17,000 training samples
+### Training Procedure
+- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
+- **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
+- **Epochs**: 3
+- **Batch size**: 16
+- **Learning rate**: 2e-5
+- **Loss function**: Focal Loss (gamma=2) for class imbalance handling
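The card gives only the loss and its gamma, not the training code. A common multi-label form applies the focal term to each label's binary cross-entropy. The sketch below assumes that form with gamma=2 and no alpha class weighting. Both are assumptions about how this model was actually trained, and `multilabel_focal_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def multilabel_focal_loss(
    logits: torch.Tensor,   # (batch, num_labels) raw scores from the classifier head
    targets: torch.Tensor,  # (batch, num_labels) 0/1 technique labels
    gamma: float = 2.0,
) -> torch.Tensor:
    """Focal loss (1 - p_t)^gamma * BCE, averaged over every (example, label) slot."""
    targets = targets.float()
    # Per-label binary cross-entropy, computed stably from logits
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Since bce = -log(p_t), exp(-bce) recovers p_t, the probability of the true class
    p_t = torch.exp(-bce)
    # Down-weight easy labels (p_t near 1) so hard, rarer labels carry more of the loss
    return ((1.0 - p_t) ** gamma * bce).mean()
```

With `gamma=0.0`, this reduces to ordinary mean binary cross-entropy, which makes a convenient sanity check. To plug it into a `Trainer` run, you can subclass `Trainer` and override `compute_loss`.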
+### Performance Metrics
+**Test Set Performance:**
+| Metric | Score |
+|--------|-------|
+| **Micro F1** | 80.1% |
+| **Macro F1** | 51.2% |
+**Top Performing Techniques:**
+| Technique | F1 Score |
+|-----------|----------|
+| Loaded_Language | 97.0% |
+| Appeal_to_fear-prejudice | 89.7% |
+| Name_Calling,Labeling | 81.8% |
+| Exaggeration,Minimisation | 75.4% |
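The gap between micro F1 (80.1%) and macro F1 (51.2%) follows from how each one averages. Micro F1 pools true positives, false positives, and false negatives across all labels, so frequent techniques such as `Loaded_Language` dominate it. Macro F1 averages per-technique F1 scores with equal weight, so rare, weakly predicted techniques pull it down. The sketch below computes both in pure Python using the standard definitions on made-up toy labels, not on this model's predictions.

```python
from typing import List, Sequence, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for equally shaped multi-label 0/1 matrices."""
    num_labels = len(y_true[0])
    counts: List[List[int]] = []  # per-label [tp, fp, fn]
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        counts.append([tp, fp, fn])
    # Micro: sum the counts over labels first, then compute a single F1
    micro = f1(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean
    macro = sum(f1(*c) for c in counts) / num_labels
    return micro, macro
```

Consider two labels where the frequent one is always right and the rare one is always wrong. With `y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]` and `y_pred = [[1, 0], [1, 0], [1, 0], [1, 1]]`, this gives micro F1 0.8 but macro F1 0.5, the same pattern the table above shows.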
+## Limitations
+- Trained primarily on English text
+- Performance varies by technique: frequent techniques score well and rarer ones score lower, which is why macro F1 (51.2%) trails micro F1 (80.1%)
+- Best used as Stage 2 after binary detection for efficient inference
+- Requires Transformers 4.48 or later, where ModernBERT is supported natively (no `trust_remote_code=True` needed)
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{nci-technique-classifier-v2,
+  title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
+  author={Synapti},
+  year={2024},
+  publisher={Hugging Face},
+  url={https://huggingface.co/synapti/nci-technique-classifier-v2}
+}
+```
+## License
+Apache 2.0