# Llama 3.1 Dynahate QLoRA Adapter
This repository hosts a QLoRA adapter built with LLaMA-Factory for classifying social media posts from the Dynahate dataset as hate or not hate.
It was produced by fine-tuning `meta-llama/Meta-Llama-3.1-8B-Instruct` with 4-bit quantization and rank-8 LoRA adapters, so you only need to load ~150 MB of adapter weights instead of a full model checkpoint.
## Dataset and Prompting
- Dynahate splits: train (32 916), dev (4 100), and test (4 120) CSVs converted into Alpaca-style JSONL (see `LLaMA-Factory/data/dynahate_*.jsonl` in the training workspace; a conversion sketch follows this list).
- Instruction: `You are a helpful Assistant. Your task is to classify the social media post as hate or not hate. Post:`
- System prompt: `Strictly respond only with the label: 'hate' or 'not hate'.`
- Label space: `hate`, `not hate`. Outputs are normalized case-insensitively during evaluation.
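The converted records follow the Alpaca-style schema that LLaMA-Factory consumes. The sketch below shows one plausible way each CSV row could be mapped; the `text`/`label` column names and file paths are assumptions, not the actual conversion script:

```python
# Hypothetical sketch of the CSV -> Alpaca-style JSONL conversion (column names assumed).
import csv
import json

INSTRUCTION = "You are a helpful Assistant. Your task is to classify the social media post as hate or not hate. Post:"
SYSTEM = "Strictly respond only with the label: 'hate' or 'not hate'."

with open("dynahate_train.csv", newline="", encoding="utf-8") as f_in, \
     open("dynahate_train.jsonl", "w", encoding="utf-8") as f_out:
    for row in csv.DictReader(f_in):
        record = {
            "instruction": INSTRUCTION,
            "input": row["text"],    # assumed column holding the post
            "output": row["label"],  # assumed to already be 'hate' / 'not hate'
            "system": SYSTEM,
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + "\n")
```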
## Training Configuration
| Component | Value |
|---|---|
| Base model | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Finetuning method | QLoRA (bnb 4-bit, lora_rank=8, lora_alpha=16, lora_dropout=0.05) |
| Sequence length | 1 024 |
| Batch size | 3 per device × grad acc 8 (effective 24) |
| Optimizer | paged_adamw_32bit, LR 2e-5, cosine schedule, warmup 10 % |
| Epochs | 3 |
| Framework versions | transformers==4.57.1, peft==0.17.1, bitsandbytes==0.43.1, PyTorch 2.1+ |
Full logs (training curves, trainer arguments, etc.) live inside this repository (`trainer_log.jsonl`, `trainer_state.json`, `all_results.json`).
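For reproducibility, the table above corresponds roughly to a LLaMA-Factory SFT config along these lines (a minimal sketch; the dataset name, output path, and a few auxiliary options are assumptions, not the exact config used):

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
quantization_bit: 4            # bitsandbytes 4-bit (QLoRA)

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.05

### dataset
dataset: dynahate_train        # assumed name registered in dataset_info.json
template: llama3
cutoff_len: 1024

### train
per_device_train_batch_size: 3
gradient_accumulation_steps: 8
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
num_train_epochs: 3
optim: paged_adamw_32bit
output_dir: saves/llama31-dynahate-qlora   # assumed output path
```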
## Evaluation
Evaluation was performed with `scripts/eval_dynahate_metrics.py` (greedy decoding, max 4 new tokens, cutoff length 1 024). Detailed raw outputs are captured in `eval_metrics.txt`.
| Split | Accuracy | Macro-F1 | Precision (hate / not hate) | Recall (hate / not hate) |
|---|---|---|---|---|
| Dynahate dev | 0.9259 | 0.9255 | 0.9228 / 0.9294 | 0.9382 / 0.9121 |
| Dynahate test | 0.9153 | 0.9142 | 0.9123 / 0.9191 | 0.9361 / 0.8898 |
The evaluation script also prints full sklearn classification reports (see eval_metrics.txt).
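As a rough illustration of how those numbers are derived from the raw generations, here is a minimal sketch (not the actual evaluation script; the label-normalization rule is an assumption consistent with the case-insensitive matching described above):

```python
# Minimal metric-computation sketch (illustrative, not scripts/eval_dynahate_metrics.py).
from sklearn.metrics import accuracy_score, classification_report, f1_score

LABELS = ["hate", "not hate"]

def normalize(generation: str) -> str:
    """Map a raw model generation to the binary label space, case-insensitively."""
    text = generation.strip().lower()
    return "hate" if text.startswith("hate") else "not hate"

gold = ["hate", "not hate", "hate"]                               # reference labels
pred = [normalize(g) for g in ["Hate", "not hate", "not hate"]]   # raw generations

print("accuracy:", accuracy_score(gold, pred))
print("macro-F1:", f1_score(gold, pred, average="macro", labels=LABELS))
print(classification_report(gold, pred, labels=LABELS, digits=4))
```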
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "muditbaid/llama31-dynahate-qlora"

# Tokenizer assets shipped with the adapter match the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=True, trust_remote_code=True)

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

instruction = "You are a helpful Assistant. Your task is to classify the social media post as hate or not hate. Post:"
system_prompt = "Strictly respond only with the label: 'hate' or 'not hate'."
post = "i can't stand those people coming into our neighborhood"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"{instruction} {post}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4, do_sample=False)

# Decode only the newly generated tokens, i.e. the predicted label.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip())
```
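If GPU memory is limited, the base model can also be loaded in 4-bit before attaching the adapter, mirroring the training setup. This is a sketch using the standard bitsandbytes integration in transformers; the exact quantization and dtype choices are up to you:

```python
import torch
from transformers import BitsAndBytesConfig

# Optional: 4-bit NF4 loading to keep memory usage close to the QLoRA training setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
```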
## Files in this Repository
- `adapter_model.safetensors` / `adapter_config.json`: LoRA weights and configuration.
- `tokenizer.*`, `chat_template.jinja`, `special_tokens_map.json`: tokenizer assets aligned with the base model.
- `dynahate_instruction_prompt.txt`, `dynahate_system_prompt.txt`: prompt text used for data generation.
- `eval_metrics.txt`, `eval_results.json`, `all_results.json`, `train_results.json`: evaluation and training summaries.

Checkpoints produced during training (e.g., `checkpoint-1000`) are omitted from the upload to keep the repository lightweight.
Responsible AI & Limitations
- The Dynahate dataset contains explicit hate speech. Applications should include content warnings and guardrails to avoid resurfacing toxic language to end-users.
- The adapter is trained purely as a binary classifier; it does not provide rationales or severity levels.
- Llama 3.1’s usage terms apply. Make sure you have the correct license and comply with local regulations when deploying hate-speech classifiers.
## Citation
If you use this work, please cite both the Dynahate dataset and Meta Llama 3.1:
```bibtex
@inproceedings{vidgen2021dynahate,
  title     = {Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection},
  author    = {Vidgen, Bertie and others},
  booktitle = {ACL},
  year      = {2021}
}

@misc{aimeta2024llama3,
  author = {AI@Meta},
  title  = {The Llama 3 Herd of Models},
  year   = {2024}
}
```