synapti committed · Commit 8382f0d · verified · 1 parent: 27dcd7d

Upload README.md with huggingface_hub

Files changed (1): README.md +160 -55
README.md CHANGED
@@ -1,63 +1,168 @@
  ---
- library_name: transformers
  license: apache-2.0
- base_model: answerdotai/ModernBERT-base
  tags:
- - generated_from_trainer
- model-index:
- - name: nci-technique-classifier-v2
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # nci-technique-classifier-v2
-
- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0810
- - Micro F1: 0.8010
- - Macro F1: 0.5416
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 16
- - seed: 42
- - optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- - lr_scheduler_type: linear
- - num_epochs: 3
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Micro F1 | Macro F1 |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|
- | 0.0868 | 1.0 | 1224 | 0.0925 | 0.7615 | 0.2471 |
- | 0.0783 | 2.0 | 2448 | 0.0834 | 0.7764 | 0.4156 |
- | 0.0666 | 3.0 | 3672 | 0.0810 | 0.8010 | 0.5416 |
-
- ### Framework versions
-
- - Transformers 4.57.3
- - Pytorch 2.9.1+cu128
- - Datasets 4.4.1
- - Tokenizers 0.22.1

  ---
  license: apache-2.0
+ language:
+ - en
  tags:
+ - text-classification
+ - propaganda-detection
+ - multi-label
+ - modernbert
+ datasets:
+ - synapti/nci-propaganda-production
+ - synapti/nci-synthetic-articles
+ metrics:
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-classification
+ library_name: transformers
+ base_model: answerdotai/ModernBERT-base
  ---

+ # NCI Technique Classifier v2
+
+ **Multi-label propaganda technique classifier** trained on the NCI (Neural Counter-Intelligence) Protocol dataset.
+
+ ## Model Description
+
+ This model detects **18 propaganda techniques** in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:
+
+ 1. **Stage 1**: Binary detection (is there propaganda?) using `synapti/nci-binary-detector`
+ 2. **Stage 2**: Technique classification (what techniques are used?) using this model
+
+ ### Supported Techniques
+
+ | Technique | Description |
+ |-----------|-------------|
+ | `Loaded_Language` | Words/phrases with strong emotional implications |
+ | `Appeal_to_fear-prejudice` | Building support by exploiting fear |
+ | `Exaggeration,Minimisation` | Making something more/less important than it is |
+ | `Repetition` | Repeating the same message over and over |
+ | `Flag-Waving` | Playing on national/group identity |
+ | `Name_Calling,Labeling` | Attacking through labels rather than arguments |
+ | `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
+ | `Black-and-White_Fallacy` | Presenting only two choices |
+ | `Causal_Oversimplification` | Assuming single cause for complex issue |
+ | `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
+ | `Straw_Man` | Misrepresenting someone's argument |
+ | `Red_Herring` | Introducing irrelevant topics |
+ | `Doubt` | Questioning credibility without evidence |
+ | `Appeal_to_Authority` | Relying on authority rather than evidence |
+ | `Thought-terminating_Cliches` | Phrases that discourage critical thought |
+ | `Bandwagon` | Appeals to popularity |
+ | `Slogans` | Brief, memorable phrases |
+ | `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |
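+
+ The index-to-name mapping for these 18 labels ships in the model config and fixes which logit corresponds to which technique. A minimal sketch for confirming the exact label order before writing any thresholding logic:
+
+ ```python
+ from transformers import AutoConfig
+
+ # Fetch only the config; id2label maps each output index to a technique name
+ config = AutoConfig.from_pretrained("synapti/nci-technique-classifier-v2")
+ for idx in sorted(config.id2label):
+     print(idx, config.id2label[idx])
+ ```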
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ # Load model and tokenizer
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "synapti/nci-technique-classifier-v2",
+     trust_remote_code=True
+ )
+ tokenizer = AutoTokenizer.from_pretrained(
+     "synapti/nci-technique-classifier-v2",
+     trust_remote_code=True
+ )
+
+ # Prepare input
+ text = "Wake up, patriots! The radical elites are destroying our country!"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ # Get predictions (sigmoid, not softmax: labels are independent in a multi-label setup)
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.sigmoid(outputs.logits)[0]
+
+ # Get technique labels (transformers converts id2label keys to int on load)
+ id2label = model.config.id2label
+ threshold = 0.3
+
+ # Print detected techniques
+ for idx, prob in enumerate(probs):
+     if prob.item() >= threshold:
+         technique = id2label[idx]
+         print(f"{technique}: {prob.item():.1%}")
+ ```
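+
+ The `0.3` threshold above is a starting point rather than a tuned value; it can be raised for precision or set per technique. For scoring several texts at once, the same logic works on a padded batch. A sketch, continuing from the snippet above:
+
+ ```python
+ texts = [
+     "Everyone knows this is the only way forward.",
+     "Experts agree: there is no alternative.",
+ ]
+
+ # Tokenize as one padded batch instead of looping text by text
+ batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
+
+ with torch.no_grad():
+     probs = torch.sigmoid(model(**batch).logits)
+
+ # One row of technique probabilities per input text
+ for text, row in zip(texts, probs):
+     detected = [id2label[i] for i, p in enumerate(row) if p.item() >= threshold]
+     print(text, "->", detected)
+ ```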
+
+ ### Two-Stage Pipeline Usage
+
+ ```python
+ from nci.transformers.two_stage_pipeline import TwoStagePipeline
+
+ # Load two-stage pipeline
+ pipeline = TwoStagePipeline.from_pretrained(
+     binary_model="synapti/nci-binary-detector",
+     technique_model="synapti/nci-technique-classifier-v2",
+ )
+
+ # Analyze text
+ result = pipeline.analyze("Some text to analyze...")
+ print(f"Has propaganda: {result.has_propaganda}")
+ print(f"Confidence: {result.propaganda_confidence:.1%}")
+ print(f"Detected techniques: {result.detected_techniques}")
+ ```
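+
+ If the `nci` package is not installed, the two stages can be chained by hand with plain `transformers`. A sketch reusing `model`, `tokenizer`, `id2label`, and `threshold` from the Usage section, under the assumption that the binary detector is a standard two-class sequence classifier with index 1 as the "propaganda" class (its head is not documented here):
+
+ ```python
+ det_tok = AutoTokenizer.from_pretrained("synapti/nci-binary-detector")
+ det = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector")
+
+ def analyze(text):
+     # Stage 1: cheap binary screen; skip Stage 2 for clean text
+     enc = det_tok(text, return_tensors="pt", truncation=True, max_length=512)
+     with torch.no_grad():
+         p_prop = torch.softmax(det(**enc).logits, dim=-1)[0, 1].item()
+     if p_prop < 0.5:
+         return p_prop, []
+     # Stage 2: multi-label technique classification
+     enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+     with torch.no_grad():
+         probs = torch.sigmoid(model(**enc).logits)[0]
+     return p_prop, [id2label[i] for i, p in enumerate(probs) if p.item() >= threshold]
+ ```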
+
+ ## Training Details
+
+ ### Training Data
+
+ - **Primary**: [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) (11,573 samples)
+ - **Augmentation**: [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles) (~5,485 synthetic article-length samples)
+ - **Total**: ~17,000 training samples
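+
+ Both datasets are public and can be pulled for inspection with the `datasets` library. A sketch that assumes the default `train` split (split names are not documented on this card):
+
+ ```python
+ from datasets import load_dataset
+
+ # Sizes should roughly match the counts listed above
+ primary = load_dataset("synapti/nci-propaganda-production", split="train")
+ synthetic = load_dataset("synapti/nci-synthetic-articles", split="train")
+ print(len(primary), len(synthetic))
+ ```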
+
+ ### Training Procedure
+
+ - **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
+ - **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
+ - **Epochs**: 3
+ - **Batch size**: 16
+ - **Learning rate**: 2e-5
+ - **Loss function**: Focal Loss (gamma=2) to handle class imbalance (see the sketch below)
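+
+ For reference, a sketch of a multi-label focal loss in the form described above (gamma=2, built on binary cross-entropy). This is the textbook formulation, not the project's verbatim training code:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def focal_loss(logits, targets, gamma=2.0):
+     """Down-weight easy examples so rare techniques contribute more to the gradient."""
+     bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
+     p_t = torch.exp(-bce)  # probability the model assigns to the true label
+     return ((1.0 - p_t) ** gamma * bce).mean()
+ ```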
+
+ ### Performance Metrics
+
+ **Test Set Performance:**
+
+ | Metric | Score |
+ |--------|-------|
+ | **Micro F1** | 80.1% |
+ | **Macro F1** | 51.2% |
+
+ **Top Performing Techniques:**
+
+ | Technique | F1 Score |
+ |-----------|----------|
+ | Loaded_Language | 97.0% |
+ | Appeal_to_fear-prejudice | 89.7% |
+ | Name_Calling,Labeling | 81.8% |
+ | Exaggeration,Minimisation | 75.4% |
+
+ ## Limitations
+
+ - Trained primarily on English text
+ - Performance varies by technique (common techniques perform better)
+ - Best used as Stage 2 after binary detection for efficient inference
+ - Requires `trust_remote_code=True` for the ModernBERT architecture
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{nci-technique-classifier-v2,
+   title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
+   author={Synapti},
+   year={2024},
+   publisher={Hugging Face},
+   url={https://huggingface.co/synapti/nci-technique-classifier-v2}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0