---
base_model: unsloth/llama-3-8b-bnb-4bit
tags:
- llama.cpp
- gguf
- quantized
- q4_k_m
- text-classification
- bf16
license: apache-2.0
language:
- en
widget:
- text: >-
    On the morning of June 15th, armed individuals forced their way into a local
    bank in Mexico City. They held bank employees and customers at gunpoint for
    several hours while demanding access to the vault. The perpetrators escaped
    with an undisclosed amount of money after a prolonged standoff with local
    authorities.
  example_title: Armed Assault Example
  output:
  - label: Armed Assault | Hostage Taking
    score: 0.9
- text: >-
    A massive explosion occurred outside a government building in Baghdad. The
    blast, caused by a car bomb, killed 12 people and injured over 30 others.
    The explosion caused significant damage to the building's facade and
    surrounding structures.
  example_title: Bombing Example
  output:
  - label: Bombing/Explosion
    score: 0.95
pipeline_tag: text-classification
inference:
  parameters:
    temperature: 0.7
    max_new_tokens: 128
    do_sample: true
---

# ConflLlama: GTD-Finetuned LLaMA-3 8B

- **Model Type:** GGUF quantized (q4_k_m and q8_0)
- **Base Model:** unsloth/llama-3-8b-bnb-4bit
- **Quantization Details:**
  - Methods: q4_k_m, q8_0, BF16
  - q4_k_m uses Q6_K for half of the attention.wv and feed_forward.w2 tensors
  - Optimized for both speed (q8_0) and quality (q4_k_m); see the loading sketch below
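
As a GGUF release, the model runs on any llama.cpp-compatible runtime. Below is a minimal loading sketch using the llama-cpp-python bindings; the filename is a placeholder for whichever quantized file you download from this repository.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# q4_k_m trades a little quality for a much smaller footprint;
# swap in the q8_0 file when fidelity matters more than memory.
llm = Llama(
    model_path="conflllama-q4_k_m.gguf",  # placeholder filename
    n_ctx=1024,  # matches the max sequence length used in training
)
```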

### Training Data
- **Dataset:** Global Terrorism Database (GTD)
- **Time Period:** Events before January 1, 2017
- **Format:** Event summaries with associated attack types
- **Labels:** Attack-type classifications from the GTD

### Data Processing
1. **Date Filtering:**
   - Filtered to events occurring before 2017-01-01
   - Handled missing dates by defaulting month/day to 1
2. **Data Cleaning:**
   - Removed entries with missing summaries
   - Cleaned summary text by removing special characters and formatting artifacts
3. **Attack Type Processing:**
   - Combined multiple attack types with the separator '|'
   - Included primary, secondary, and tertiary attack types when available
4. **Training Format** (see the sketch after this list):
   - Input: processed event summaries
   - Output: combined attack types
   - Chat template:
     ```
     Below describes details about terrorist events.
     >>> Event Details:
     {summary}
     >>> Attack Types:
     {combined_attacks}
     ```
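
The steps above can be approximated with a short pandas script. A sketch, assuming the standard GTD column names (`iyear`, `imonth`, `iday`, `summary`, `attacktype1_txt`, `attacktype2_txt`, `attacktype3_txt`) and GTD's convention of encoding unknown month/day as 0; this is a reconstruction, not the original training code:

```python
import pandas as pd

def preprocess_gtd(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Date filtering: default missing month/day to 1, keep pre-2017 events.
    df = df.copy()
    df["imonth"] = df["imonth"].replace(0, 1)
    df["iday"] = df["iday"].replace(0, 1)
    dates = pd.to_datetime(
        df[["iyear", "imonth", "iday"]].rename(
            columns={"iyear": "year", "imonth": "month", "iday": "day"}
        ),
        errors="coerce",
    )
    df = df[dates < pd.Timestamp("2017-01-01")]

    # 2. Data cleaning: drop missing summaries, strip stray characters.
    df = df.dropna(subset=["summary"])
    df["summary"] = (
        df["summary"].str.replace(r"[^\w\s.,;:'\"-]", " ", regex=True).str.strip()
    )

    # 3. Attack type processing: join up to three attack types with '|'.
    attack_cols = ["attacktype1_txt", "attacktype2_txt", "attacktype3_txt"]
    df["combined_attacks"] = df[attack_cols].apply(
        lambda row: " | ".join(t for t in row if pd.notna(t)), axis=1
    )

    # 4. Training format: render each event with the chat template.
    df["text"] = (
        "Below describes details about terrorist events.\n"
        ">>> Event Details:\n" + df["summary"] + "\n"
        ">>> Attack Types:\n" + df["combined_attacks"]
    )
    return df[["text"]]
```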
77
+
78
+ ### Training Details
79
+ - **Framework:** QLoRA
80
+ - **Hardware:** NVIDIA A100-SXM4-40GB GPU on Delta Supercomputer
81
+ - **Training Configuration:**
82
+ - Batch Size: 1 per device
83
+ - Gradient Accumulation Steps: 8
84
+ - Learning Rate: 2e-4
85
+ - Max Steps: 1000
86
+ - Save Steps: 200
87
+ - Logging Steps: 10
88
+ - **LoRA Configuration:**
89
+ - Rank: 8
90
+ - Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
91
+ - Alpha: 16
92
+ - Dropout: 0
93
+ - **Optimizations:**
94
+ - Gradient Checkpointing: Enabled
95
+ - 4-bit Quantization: Enabled
96
+ - Max Sequence Length: 1024
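
In Unsloth terms, the configuration above corresponds roughly to the following setup. This is a reconstruction from the listed hyperparameters, not the original training script; dataset loading is elided.

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model (the QLoRA-style quantized backbone).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters matching the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing=True,
)
```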

## Model Architecture
The model combines efficient fine-tuning techniques with memory optimizations to handle conflict-event classification:

<p align="center">
    <img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>
</p>

### Data Processing Pipeline
The preprocessing pipeline transforms raw GTD data into a format suitable for fine-tuning:

<p align="center">
    <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
</p>

### Memory Optimizations
To fit training on a single A100-40GB, the run used the following (mapped to trainer flags in the sketch below):
- 4-bit quantization of the base model
- Gradient accumulation (8 steps) to simulate a larger batch
- Memory-efficient gradient checkpointing
- Maximum sequence length reduced to 1024
- Dataloader pin memory disabled
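
These flags map onto the trainer setup roughly as follows. A sketch assuming TRL's `SFTTrainer` (with the `dataset_text_field`/`max_seq_length` arguments of the TRL versions current when this model was trained), plus the `model`, `tokenizer`, and a preprocessed `train_dataset` from the steps above:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    learning_rate=2e-4,
    max_steps=1000,
    save_steps=200,
    logging_steps=10,
    dataloader_pin_memory=False,    # see the memory notes above
    bf16=True,                      # assumption: BF16 on the A100, per the model tags
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,    # assumed: GTD split rendered with the chat template
    dataset_text_field="text",
    max_seq_length=1024,
    args=args,
)
trainer.train()
```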

## Intended Use
This model is designed for:
1. Classification of terrorist events from event descriptions (see the prompt sketch below)
2. Research in conflict studies and terrorism analysis
3. Understanding attack-type patterns in historical events
4. Academic research in security studies
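
In practice, classification reuses the training prompt format: wrap a new event description in the same template and read off the generated attack types. A minimal sketch, reusing the `llm` handle from the loading example above and a hypothetical event summary:

```python
# Hypothetical event description, for illustration only.
summary = (
    "Gunmen opened fire on a police convoy outside the city, "
    "wounding three officers before fleeing the scene."
)

# Mirror the chat template used during training.
prompt = (
    "Below describes details about terrorist events.\n"
    ">>> Event Details:\n"
    f"{summary}\n"
    ">>> Attack Types:\n"
)

out = llm(prompt, max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"].strip())  # e.g. "Armed Assault"
```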

## Limitations
1. Training data is limited to pre-2017 events
2. Maximum sequence length is limited to 1024 tokens
3. May not capture recent changes in attack patterns
4. Performance depends on the quality of event descriptions

## Ethical Considerations
1. The model is trained on sensitive terrorism-related data
2. It should be used responsibly, for research purposes only
3. It is not intended for operational security decisions
4. Results should be interpreted with appropriate context

## Training Logs
<p align="center">
    <img src="images/training.png" alt="Training Logs" width="800"/>
</p>

The training logs show a successful run with healthy convergence:

**Loss & Learning Rate:**
- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
- The learning rate follows a warmup/decay schedule, peaking at ~1.5e-4

**Training Stability:**
- Stable gradient norms (0.4-0.6 range)
- Consistent GPU memory usage (~5800 MB allocated, ~7080 MB reserved)
- Steady training speed (~3.5 s/step), with a brief interruption at step 800

The plots indicate effective optimization dynamics and resource utilization; the loss vs. learning rate curve suggests the model learns best around 1e-4.

## Citation
```bibtex
@misc{conflllama,
  author    = {Meher, Shreyas},
  title     = {ConflLlama: GTD-Finetuned LLaMA-3 8B},
  year      = {2024},
  publisher = {Hugging Face},
  note      = {Based on Meta's LLaMA-3 8B and the GTD dataset}
}
```

## Acknowledgments
- Unsloth, for the optimization framework and quantized base model
- Hugging Face, for the transformers infrastructure
- The Global Terrorism Database team
- This research was supported by NSF award 2311142
- This work used Delta at NCSA / University of Illinois through allocation CIS220162 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by NSF grants 2138259, 2138286, 2138307, 2137603, and 2138296

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)