synapti committed on Commit 6d86c84 · verified · 1 Parent(s): 2e92f74

Retrained with proper id2label mapping

Files changed (3)
  1. README.md +85 -163
  2. calibration_config.json +43 -0
  3. model.safetensors +1 -1
README.md CHANGED
@@ -1,168 +1,90 @@
  ---
- license: apache-2.0
- language:
- - en
- tags:
- - text-classification
- - propaganda-detection
- - multi-label
- - modernbert
- datasets:
- - synapti/nci-propaganda-production
- - synapti/nci-synthetic-articles
- metrics:
- - f1
- - precision
- - recall
- pipeline_tag: text-classification
  library_name: transformers
+ license: apache-2.0
  base_model: answerdotai/ModernBERT-base
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: nci-technique-classifier-v2
+   results: []
  ---
- # NCI Technique Classifier v2
-
- **Multi-label propaganda technique classifier** trained on the NCI (Neural Counter-Intelligence) Protocol dataset.
-
- ## Model Description
-
- This model detects **18 propaganda techniques** in text using a multi-label classification approach. It is designed to work as Stage 2 in a two-stage pipeline:
-
- 1. **Stage 1**: Binary detection (is there propaganda?) using `synapti/nci-binary-detector`
- 2. **Stage 2**: Technique classification (what techniques are used?) using this model
-
- ### Supported Techniques
-
- | Technique | Description |
- |-----------|-------------|
- | `Loaded_Language` | Words/phrases with strong emotional implications |
- | `Appeal_to_fear-prejudice` | Building support by exploiting fear |
- | `Exaggeration,Minimisation` | Making something seem more/less important than it is |
- | `Repetition` | Repeating the same message over and over |
- | `Flag-Waving` | Playing on national/group identity |
- | `Name_Calling,Labeling` | Attacking through labels rather than arguments |
- | `Reductio_ad_hitlerum` | Persuading by comparing to disliked groups |
- | `Black-and-White_Fallacy` | Presenting only two choices |
- | `Causal_Oversimplification` | Assuming a single cause for a complex issue |
- | `Whataboutism,Straw_Men,Red_Herring` | Deflection and misdirection |
- | `Straw_Man` | Misrepresenting someone's argument |
- | `Red_Herring` | Introducing irrelevant topics |
- | `Doubt` | Questioning credibility without evidence |
- | `Appeal_to_Authority` | Relying on authority rather than evidence |
- | `Thought-terminating_Cliches` | Phrases that discourage critical thought |
- | `Bandwagon` | Appeals to popularity |
- | `Slogans` | Brief, memorable phrases |
- | `Obfuscation,Intentional_Vagueness,Confusion` | Deliberately unclear language |
-
- ## Usage
-
- ```python
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
- import torch
-
- # Load model and tokenizer
- model = AutoModelForSequenceClassification.from_pretrained(
-     "synapti/nci-technique-classifier-v2",
-     trust_remote_code=True
- )
- tokenizer = AutoTokenizer.from_pretrained(
-     "synapti/nci-technique-classifier-v2",
-     trust_remote_code=True
- )
-
- # Prepare input
- text = "Wake up, patriots! The radical elites are destroying our country!"
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
- # Get predictions
- with torch.no_grad():
-     outputs = model(**inputs)
-     probs = torch.sigmoid(outputs.logits)[0]
-
- # Get technique labels
- id2label = model.config.id2label
- threshold = 0.3
-
- # Print detected techniques
- for idx, prob in enumerate(probs):
-     if prob.item() >= threshold:
-         technique = id2label[idx]  # config.id2label keys are ints after loading
-         print(f"{technique}: {prob.item():.1%}")
- ```
-
- ### Two-Stage Pipeline Usage
-
- ```python
- from nci.transformers.two_stage_pipeline import TwoStagePipeline
-
- # Load two-stage pipeline
- pipeline = TwoStagePipeline.from_pretrained(
-     binary_model="synapti/nci-binary-detector",
-     technique_model="synapti/nci-technique-classifier-v2",
- )
-
- # Analyze text
- result = pipeline.analyze("Some text to analyze...")
- print(f"Has propaganda: {result.has_propaganda}")
- print(f"Confidence: {result.propaganda_confidence:.1%}")
- print(f"Detected techniques: {result.detected_techniques}")
- ```
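The `nci` package imported above is project-specific and may not be installable from PyPI. As a rough, unofficial sketch of the same two-stage flow using only `transformers` (the binary model's label order and the 0.5/0.3 thresholds are assumptions, not taken from this card):

```python
# Hedged sketch of the two-stage flow with plain transformers; label index 1
# meaning "propaganda" and both thresholds are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def analyze(text: str):
    # Stage 1: binary propaganda detection
    tok1 = AutoTokenizer.from_pretrained("synapti/nci-binary-detector")
    clf1 = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector")
    with torch.no_grad():
        logits = clf1(**tok1(text, return_tensors="pt", truncation=True)).logits
    p_propaganda = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed: index 1 = propaganda
    if p_propaganda < 0.5:  # assumed gating threshold
        return p_propaganda, []
    # Stage 2: technique classification, run only on flagged text
    tok2 = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
    clf2 = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
    with torch.no_grad():
        probs = torch.sigmoid(clf2(**tok2(text, return_tensors="pt", truncation=True)).logits)[0]
    detected = [clf2.config.id2label[i] for i, p in enumerate(probs) if p.item() >= 0.3]
    return p_propaganda, detected
```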
-
- ## Training Details
-
- ### Training Data
-
- - **Primary**: [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) (11,573 samples)
- - **Augmentation**: [synapti/nci-synthetic-articles](https://huggingface.co/datasets/synapti/nci-synthetic-articles) (~5,485 synthetic article-length samples)
- - **Total**: ~17,000 training samples
-
- ### Training Procedure
-
- - **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- - **Fine-tuning**: Hugging Face AutoTrain on an A100 GPU
- - **Epochs**: 3
- - **Batch size**: 16
- - **Learning rate**: 2e-5
- - **Loss function**: Focal Loss (gamma=2) for class-imbalance handling; a sketch follows this list
-
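A minimal sketch of the focal loss named above, assuming the standard sigmoid (multi-label) formulation with gamma=2; the exact variant used in training is not documented here:

```python
# Hedged sketch: sigmoid focal loss for multi-label classification (gamma=2).
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # Per-label binary cross-entropy, kept unreduced so each label is reweighted
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true label value
    return ((1.0 - p_t) ** gamma * bce).mean()  # down-weights easy, confident labels

# Usage inside a training step: loss = focal_loss(outputs.logits, labels.float())
```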
- ### Performance Metrics
-
- **Test Set Performance:**
-
- | Metric | Score |
- |--------|-------|
- | **Micro F1** | 80.1% |
- | **Macro F1** | 51.2% |
-
- **Top Performing Techniques:**
-
- | Technique | F1 Score |
- |-----------|----------|
- | Loaded_Language | 97.0% |
- | Appeal_to_fear-prejudice | 89.7% |
- | Name_Calling,Labeling | 81.8% |
- | Exaggeration,Minimisation | 75.4% |
-
- ## Limitations
-
- - Trained primarily on English text
- - Performance varies by technique (common techniques perform better)
- - Best used as Stage 2 after binary detection for efficient inference
- - Requires `trust_remote_code=True` for the ModernBERT architecture
-
- ## Citation
-
- If you use this model, please cite:
-
- ```bibtex
- @misc{nci-technique-classifier-v2,
-   title={NCI Technique Classifier v2: Multi-label Propaganda Detection},
-   author={Synapti},
-   year={2024},
-   publisher={Hugging Face},
-   url={https://huggingface.co/synapti/nci-technique-classifier-v2}
- }
- ```
-
- ## License
-
- Apache 2.0
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # nci-technique-classifier-v2
+
+ This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0233
+ - Micro F1: 0.8017
+ - Macro F1: 0.6272
+ - Micro Precision: 0.8311
+ - Micro Recall: 0.7743
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
+ - learning_rate: 2e-05
+ - train_batch_size: 16
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 5
+ - mixed_precision_training: Native AMP
+
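For readers who want to reproduce this configuration, a hedged sketch of how the listed values map onto the `Trainer` API; the `output_dir` and the choice of `fp16` for "Native AMP" are assumptions, not taken from the training logs:

```python
# Hedged sketch reconstructing the listed hyperparameters; not the authors' script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nci-technique-classifier-v2",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",  # AdamW with default betas (0.9, 0.999) and epsilon 1e-08
    seed=42,
    fp16=True,  # "Native AMP"; bf16 is equally plausible on recent GPUs
)
```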
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Micro F1 | Macro F1 | Micro Precision | Micro Recall |
+ |:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|:---------------:|:------------:|
+ | No log | 0.1634 | 200 | 0.0350 | 0.6311 | 0.1526 | 0.7644 | 0.5373 |
+ | No log | 0.3268 | 400 | 0.0305 | 0.6658 | 0.1814 | 0.8020 | 0.5692 |
+ | 0.0552 | 0.4902 | 600 | 0.0282 | 0.7023 | 0.2044 | 0.8244 | 0.6117 |
+ | 0.0552 | 0.6536 | 800 | 0.0263 | 0.7268 | 0.2181 | 0.8509 | 0.6343 |
+ | 0.0273 | 0.8170 | 1000 | 0.0256 | 0.7497 | 0.2610 | 0.8305 | 0.6832 |
+ | 0.0273 | 0.9804 | 1200 | 0.0249 | 0.7462 | 0.2371 | 0.8740 | 0.6510 |
+ | 0.0273 | 1.1438 | 1400 | 0.0245 | 0.7626 | 0.2862 | 0.8450 | 0.6949 |
+ | 0.0231 | 1.3072 | 1600 | 0.0242 | 0.7583 | 0.2371 | 0.8582 | 0.6793 |
+ | 0.0231 | 1.4706 | 1800 | 0.0238 | 0.7650 | 0.3155 | 0.8457 | 0.6984 |
+ | 0.0226 | 1.6340 | 2000 | 0.0238 | 0.7624 | 0.3074 | 0.8542 | 0.6885 |
+ | 0.0226 | 1.7974 | 2200 | 0.0230 | 0.7626 | 0.3634 | 0.8681 | 0.6800 |
+ | 0.0226 | 1.9608 | 2400 | 0.0223 | 0.7747 | 0.4246 | 0.8675 | 0.6998 |
+ | 0.0214 | 2.1242 | 2600 | 0.0225 | 0.7731 | 0.4412 | 0.8752 | 0.6924 |
+ | 0.0214 | 2.2876 | 2800 | 0.0221 | 0.7775 | 0.4101 | 0.8733 | 0.7005 |
+ | 0.0189 | 2.4510 | 3000 | 0.0219 | 0.7819 | 0.4757 | 0.8414 | 0.7303 |
+ | 0.0189 | 2.6144 | 3200 | 0.0224 | 0.7796 | 0.4224 | 0.8606 | 0.7126 |
+ | 0.0189 | 2.7778 | 3400 | 0.0217 | 0.7922 | 0.5512 | 0.8389 | 0.7504 |
+ | 0.0187 | 2.9412 | 3600 | 0.0217 | 0.7813 | 0.4680 | 0.8610 | 0.7150 |
+ | 0.0187 | 3.1046 | 3800 | 0.0224 | 0.7912 | 0.5458 | 0.8341 | 0.7526 |
+ | 0.0155 | 3.2680 | 4000 | 0.0231 | 0.7922 | 0.5455 | 0.8475 | 0.7437 |
+ | 0.0155 | 3.4314 | 4200 | 0.0231 | 0.7996 | 0.5843 | 0.8295 | 0.7717 |
+ | 0.0155 | 3.5948 | 4400 | 0.0223 | 0.8004 | 0.5706 | 0.8398 | 0.7646 |
+ | 0.0148 | 3.7582 | 4600 | 0.0228 | 0.8096 | 0.6067 | 0.8527 | 0.7706 |
+ | 0.0148 | 3.9216 | 4800 | 0.0229 | 0.8135 | 0.6228 | 0.8457 | 0.7837 |
+ | 0.0126 | 4.0850 | 5000 | 0.0255 | 0.8095 | 0.6251 | 0.8379 | 0.7830 |
+ | 0.0126 | 4.2484 | 5200 | 0.0267 | 0.8061 | 0.6223 | 0.8325 | 0.7812 |
+ | 0.0126 | 4.4118 | 5400 | 0.0261 | 0.8081 | 0.6338 | 0.8372 | 0.7809 |
+
+ ### Framework versions
+
+ - Transformers 4.57.3
+ - Pytorch 2.9.1+cu128
+ - Datasets 4.4.1
+ - Tokenizers 0.22.1
calibration_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "temperature": 1.0,
+   "thresholds": {
+     "Loaded_Language": 0.5,
+     "Appeal_to_fear-prejudice": 0.5,
+     "Exaggeration,Minimisation": 0.5,
+     "Repetition": 0.5,
+     "Flag-Waving": 0.5,
+     "Name_Calling,Labeling": 0.5,
+     "Reductio_ad_hitlerum": 0.5,
+     "Black-and-White_Fallacy": 0.5,
+     "Causal_Oversimplification": 0.5,
+     "Whataboutism,Straw_Men,Red_Herring": 0.5,
+     "Straw_Man": 0.5,
+     "Red_Herring": 0.5,
+     "Doubt": 0.5,
+     "Appeal_to_Authority": 0.5,
+     "Thought-terminating_Cliches": 0.5,
+     "Bandwagon": 0.5,
+     "Slogans": 0.5,
+     "Obfuscation,Intentional_Vagueness,Confusion": 0.5
+   },
+   "technique_labels": [
+     "Loaded_Language",
+     "Appeal_to_fear-prejudice",
+     "Exaggeration,Minimisation",
+     "Repetition",
+     "Flag-Waving",
+     "Name_Calling,Labeling",
+     "Reductio_ad_hitlerum",
+     "Black-and-White_Fallacy",
+     "Causal_Oversimplification",
+     "Whataboutism,Straw_Men,Red_Herring",
+     "Straw_Man",
+     "Red_Herring",
+     "Doubt",
+     "Appeal_to_Authority",
+     "Thought-terminating_Cliches",
+     "Bandwagon",
+     "Slogans",
+     "Obfuscation,Intentional_Vagueness,Confusion"
+   ]
+ }
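For clients consuming this new file: `temperature` is a logit-scaling factor (1.0 means no rescaling) and `thresholds` gives a per-label decision cutoff. A minimal sketch of one plausible way to apply it at inference time; the repository does not ship a loader, so the application logic here is an assumption:

```python
# Hedged sketch: apply calibration_config.json (temperature scaling + per-label
# thresholds) to the classifier's outputs. The logic is assumed, not documented.
import json
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

with open("calibration_config.json") as f:
    cal = json.load(f)

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")

inputs = tokenizer("Wake up, patriots!", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits[0] / cal["temperature"]  # temperature-scale the logits
probs = torch.sigmoid(logits)

# Keep a label only if its probability clears that label's own threshold
detected = [
    label
    for i, label in enumerate(cal["technique_labels"])
    if probs[i].item() >= cal["thresholds"][label]
]
print(detected)
```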
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b352010ce6054ba31b79158ecba01272ef2c8b0cf8f35c7ee9ad0bfa4774724c
+ oid sha256:3f8a1177759ab1d9f67ee8d2de26fdb6bda5eaa6e382125e17f7a494388d838d
  size 598489008