shreyasmeher/ConflLlama-Alt

@@ -1,22 +1,176 @@
 ---
 base_model: unsloth/llama-3-8b-bnb-4bit
 tags:
-- text-generation-inference
-- transformers
-- unsloth
-- llama
 - gguf
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** shreyasmeher
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/llama-3-8b-bnb-4bit
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 base_model: unsloth/llama-3-8b-bnb-4bit
 tags:
+- llama.cpp
 - gguf
+- quantized
+- q4_k_m
+- text-classification
+- bf16
 license: apache-2.0
 language:
 - en
+widget:
+- text: >-
+    On the morning of June 15th, armed individuals forced their way into a local
+    bank in Mexico City. They held bank employees and customers at gunpoint for
+    several hours while demanding access to the vault. The perpetrators escaped
+    with an undisclosed amount of money after a prolonged standoff with local
+    authorities.
+  example_title: Armed Assault Example
+  output:
+  - label: Armed Assault | Hostage Taking
+    score: 0.9
+- text: >-
+    A massive explosion occurred outside a government building in Baghdad. The
+    blast, caused by a car bomb, killed 12 people and injured over 30 others.
+    The explosion caused significant damage to the building's facade and
+    surrounding structures.
+  example_title: Bombing Example
+  output:
+  - label: Bombing/Explosion
+    score: 0.95
+pipeline_tag: text-classification
+inference:
+  parameters:
+    temperature: 0.7
+    max_new_tokens: 128
+    do_sample: true
 ---
+# ConflLlama: GTD-Finetuned LLaMA-3 8B
+- **Model Type:** GGUF (q4_k_m and q8_0 quantized, plus BF16)
+- **Base Model:** unsloth/llama-3-8b-bnb-4bit
+- **Quantization Details:**
+  - Methods: q4_k_m, q8_0, BF16
+  - q4_k_m uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, and Q4_K for the rest
+  - q4_k_m favors smaller size and faster inference; q8_0 favors output quality
+### Training Data
+- **Dataset:** Global Terrorism Database (GTD)
+- **Time Period:** Events before January 1, 2017
+- **Format:** Event summaries with associated attack types
+- **Labels:** Attack type classifications from GTD
+### Data Processing
+1. **Date Filtering:**
+   - Filtered events occurring before 2017-01-01
+   - Handled missing dates by setting default month/day to 1
+2. **Data Cleaning:**
+   - Removed entries with missing summaries
+   - Cleaned summary text by removing special characters and formatting
+3. **Attack Type Processing:**
+   - Combined multiple attack types with separator '|'
+   - Included primary, secondary, and tertiary attack types when available
+4. **Training Format:**
+   - Input: Processed event summaries
+   - Output: Combined attack types
+   - Used prompt template:
+     ```
+     Below describes details about terrorist events.
+     >>> Event Details:
+     {summary}
+     >>> Attack Types:
+     {combined_attacks}
+     ```
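The four processing steps above can be sketched in plain Python. This is a minimal illustration, not the author's preprocessing script: the column names (`iyear`, `imonth`, `iday`, `summary`, `attacktype1_txt` to `attacktype3_txt`) and the use of 0 for an unknown month or day are assumptions about the GTD export, and the exact "special characters" rule is not given in the card, so `clean_summary` only strips control characters and collapses whitespace.

```python
import re
from datetime import date

# Prompt template as given in the model card.
PROMPT_TEMPLATE = (
    "Below describes details about terrorist events.\n"
    ">>> Event Details:\n"
    "{summary}\n"
    ">>> Attack Types:\n"
    "{combined_attacks}"
)

CUTOFF = date(2017, 1, 1)  # keep only events strictly before this date

# Assumed GTD column names (not stated in the card).
ATTACK_COLUMNS = ("attacktype1_txt", "attacktype2_txt", "attacktype3_txt")


def event_date(row):
    """Build a date, defaulting a missing or zero month/day to 1."""
    month = int(row.get("imonth") or 0) or 1
    day = int(row.get("iday") or 0) or 1
    return date(int(row["iyear"]), month, day)


def clean_summary(text):
    """Assumed cleaning rule: drop control characters, collapse whitespace."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def combine_attacks(row):
    """Join the primary, secondary, and tertiary attack types with '|'."""
    types = [str(row[c]).strip() for c in ATTACK_COLUMNS if row.get(c)]
    return "|".join(types)


def build_example(row):
    """Return one training text, or None if the row is filtered out."""
    summary = row.get("summary")
    if not summary:
        return None  # entries with missing summaries are removed
    if event_date(row) >= CUTOFF:
        return None  # events on or after 2017-01-01 are excluded
    return PROMPT_TEMPLATE.format(
        summary=clean_summary(summary),
        combined_attacks=combine_attacks(row),
    )
```

Calling `build_example` on a dict-like row returns the filled template, or `None` when the row has no summary or falls on or after the cutoff date.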
+### Training Details
+- **Framework:** QLoRA
+- **Hardware:** NVIDIA A100-SXM4-40GB GPU on Delta Supercomputer
+- **Training Configuration:**
+  - Batch Size: 1 per device
+  - Gradient Accumulation Steps: 8
+  - Learning Rate: 2e-4
+  - Max Steps: 1000
+  - Save Steps: 200
+  - Logging Steps: 10
+- **LoRA Configuration:**
+  - Rank: 8
+  - Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+  - Alpha: 16
+  - Dropout: 0
+- **Optimizations:**
+  - Gradient Checkpointing: Enabled
+  - 4-bit Quantization: Enabled
+  - Max Sequence Length: 1024
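A few quantities follow directly from the configuration above. The sketch below derives them under stated assumptions: a single GPU (the card lists one A100), standard LoRA scaling of alpha divided by rank, and checkpoints written at every multiple of Save Steps. The `TrainConfig` class is a hypothetical container for illustration, not part of any training library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values listed in the model card."""
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 8
    max_steps: int = 1000
    save_steps: int = 200
    lora_rank: int = 8
    lora_alpha: int = 16
    num_devices: int = 1  # assumption: single-GPU run

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    @property
    def sequences_seen(self) -> int:
        # Upper bound on training sequences processed over the run.
        return self.effective_batch_size * self.max_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def checkpoint_steps(self) -> list:
        # Assumes a checkpoint at every multiple of save_steps.
        return list(range(self.save_steps, self.max_steps + 1, self.save_steps))


cfg = TrainConfig()
print(cfg.effective_batch_size, cfg.sequences_seen, cfg.lora_scaling)
```

Under these assumptions each optimizer step averages gradients over 8 sequences, so the 1000-step run processes at most 8000 training sequences.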
+## Model Architecture
+The model is fine-tuned with QLoRA for conflict event classification: LoRA adapters are trained on top of a 4-bit quantized LLaMA-3 8B base model.
+<p align="center">
+  <img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>
+</p>
+### Data Processing Pipeline
+The preprocessing pipeline transforms raw GTD data into a format suitable for fine-tuning:
+<p align="center">
+  <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
+</p>
+### Memory Optimizations
+- 4-bit quantization of the base model
+- Batch size of 1 per device with 8 gradient accumulation steps
+- Gradient checkpointing
+- Maximum sequence length reduced to 1024
+- Dataloader pin memory disabled
+## Intended Use
+This model is designed for:
+1. Classification of terrorist events based on event descriptions
+2. Research in conflict studies and terrorism analysis
+3. Understanding attack type patterns in historical events
+4. Academic research in security studies
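For the classification use case, a minimal sketch of how a caller might prompt the model and read back the labels. It assumes inference reuses the training prompt up to the `>>> Attack Types:` line and that the model answers with labels separated by `|` on the first non-empty line. `classify_event`, `parse_attack_types`, and the `generate` callable are hypothetical names introduced here; `generate` stands in for whatever backend runs the GGUF file.

```python
from typing import Callable, List

# Training prompt from the model card, cut off where the model should answer.
INFERENCE_TEMPLATE = (
    "Below describes details about terrorist events.\n"
    ">>> Event Details:\n"
    "{summary}\n"
    ">>> Attack Types:\n"
)


def parse_attack_types(generated: str) -> List[str]:
    """Split the first non-empty generated line on '|' into unique labels."""
    first_line = next(
        (line for line in generated.splitlines() if line.strip()), ""
    )
    labels: List[str] = []
    for part in first_line.split("|"):
        label = part.strip()
        if label and label not in labels:
            labels.append(label)
    return labels


def classify_event(summary: str, generate: Callable[[str], str]) -> List[str]:
    """Build the prompt, call the backend, and return the parsed labels."""
    prompt = INFERENCE_TEMPLATE.format(summary=summary.strip())
    return parse_attack_types(generate(prompt))


if __name__ == "__main__":
    # Stub backend for illustration only; a real one would run the model.
    def fake_generate(prompt: str) -> str:
        return " Armed Assault | Hostage Taking\n"

    print(classify_event("Armed individuals stormed a bank.", fake_generate))
```

Keeping the backend behind a plain callable means the parsing logic can be tested without loading the model, and any runtime that returns the generated text as a string can be plugged in.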
+## Limitations
+1. Training data limited to pre-2017 events
+2. Maximum sequence length limited to 1024 tokens
+3. May not capture recent changes in attack patterns
+4. Performance dependent on quality of event descriptions
+## Ethical Considerations
+1. Model trained on sensitive terrorism-related data
+2. Should be used responsibly for research purposes only
+3. Not intended for operational security decisions
+4. Results should be interpreted with appropriate context
+## Training Logs
+<p align="center">
+  <img src="images/training.png" alt="Training Logs" width="800"/>
+</p>
+The training logs show a successful training run with healthy convergence patterns:
+**Loss & Learning Rate:**
+- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
+- Learning rate follows a warmup/decay schedule, peaking at ~1.5e-4
+**Training Stability:**
+- Stable gradient norms (0.4-0.6 range)
+- Consistent GPU memory usage (~5800MB allocated, 7080MB reserved)
+- Steady training speed (~3.5s/step) with a brief interruption at step 800
+The graphs indicate stable optimization and steady resource utilization. The loss vs. learning rate plot suggests that learning was most effective at a learning rate around 1e-4.
+## Citation
+```bibtex
+@misc{conflllama,
+  author = {Meher, Shreyas},
+  title = {ConflLlama: GTD-Finetuned LLaMA-3 8B},
+  year = {2024},
+  publisher = {HuggingFace},
+  note = {Based on Meta's LLaMA-3 8B and GTD Dataset}
+}
+```
+## Acknowledgments
+- Unsloth for the optimization framework and base model
+- Hugging Face for transformers infrastructure
+- Global Terrorism Database team
+- This research was supported by NSF award 2311142
+- This work used Delta at NCSA / University of Illinois through allocation CIS220162 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by NSF grants 2138259, 2138286, 2138307, 2137603, and 2138296
+<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>