GSM8K Fine-tuned Llama 3.1 8B Instruct
A Llama 3.1 8B Instruct model fine-tuned with LoRA on the GSM8K dataset to improve mathematical reasoning.
Model Details
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Training Method: LoRA (Low-Rank Adaptation)
- Training Dataset: GSM8K
- Training Date: 2025-11-10
Training Configuration
- LoRA Rank: 8
- LoRA Alpha: 16
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Samples: 500
- Epochs: 3
- Batch Size: 2
- Learning Rate: 1e-4
- Max Length: 512
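The settings above fix the LoRA scaling factor, the adapter size, and the number of optimizer steps. The following is a rough back-of-the-envelope sketch, not part of the training code. The projection shapes are assumptions taken from the published Llama 3.1 8B configuration rather than from this card, and the step count assumes a single device with no gradient accumulation, which the card does not state.

```python
import math

# Values stated in the Training Configuration list.
LORA_RANK = 8
LORA_ALPHA = 16
TRAIN_SAMPLES = 500
EPOCHS = 3
BATCH_SIZE = 2

# Assumption: layer dimensions from the public Llama 3.1 8B config
# (hidden size, key/value projection size, MLP size, number of layers).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32

# (in_features, out_features) for each targeted module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """Each adapted linear layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


scaling = LORA_ALPHA / LORA_RANK
trainable = lora_param_count(LORA_RANK, TARGET_SHAPES, NUM_LAYERS)

# Assumption: one device, no gradient accumulation.
total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE) * EPOCHS

print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"Trainable LoRA parameters:   {trainable:,}")
print(f"Optimizer steps:             {total_steps}")
```

Under these assumptions the adapter update is scaled by 2.0, the adapter holds about 21M trainable parameters, and training runs for 750 optimizer steps.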
Performance
- GSM8K Test Accuracy: 55.00% (11 of 20 sampled test problems correct)
- Training Loss: ~0.43 (final)
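Because the accuracy above comes from 20 problems, it carries wide statistical uncertainty. The sketch below computes a Wilson score interval for 11 correct out of 20. The function name `wilson_interval` and the choice of a 95% level are illustrative assumptions, not something the card reports.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


low, high = wilson_interval(11, 20)
print(f"accuracy = {11 / 20:.2%}, 95% interval ~ [{low:.1%}, {high:.1%}]")
```

For 11/20 the interval spans roughly 34% to 74%, so a larger evaluation sample is needed before comparing this number against other models.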
Usage
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
model = model.merge_and_unload() # Optional: merge LoRA weights
# Generate
prompt = "Solve this math problem step by step:\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\n\nProvide your final answer in the format:\n[reasoning steps]\n####\n[final answer (just the number)]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
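The prompt in the example above follows a fixed template around the question. A small helper can build it for any problem. This is a minimal sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the card does not state whether this exact template was also used during training.

```python
# Template copied from the usage example; only the question varies.
PROMPT_TEMPLATE = (
    "Solve this math problem step by step:\n\n"
    "{question}\n\n"
    "Provide your final answer in the format:\n"
    "[reasoning steps]\n####\n[final answer (just the number)]"
)


def build_prompt(question: str) -> str:
    """Wrap a word problem in the step-by-step answer template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


prompt = build_prompt(
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded prompt.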
With PEFT (Recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load model with LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
tokenizer = AutoTokenizer.from_pretrained("kmseong/GSM8K-Llama3_8B_Instruct_SFT")
# Use for inference
inputs = tokenizer("Solve: 2x + 3 = 7", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Script
The model was trained with the following command:
python finetune_gsm8k.py \
    --num_train_samples 500 \
    --num_eval_samples 50 \
    --epochs 3 \
    --batch_size 2 \
    --learning_rate 1e-4 \
    --output_dir ./gsm8k_finetuned_v2
Evaluation
# Quick evaluation on GSM8K test set
from datasets import load_dataset
test_dataset = load_dataset('openai/gsm8k', 'main', split='test')
# Load your model and evaluate
# (See evaluate_on_gsm8k function in the training script)
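The `evaluate_on_gsm8k` function itself is not reproduced in this card. As a stand-in, the sketch below shows one way exact-match accuracy could be computed, assuming the model ends its output with `####` followed by the number, as GSM8K reference answers do. `extract_final_answer`, `accuracy`, and `generate_fn` are hypothetical names, and the toy rows are placeholders for real dataset rows.

```python
import re
from typing import Callable, Iterable, Mapping, Optional

# An optionally signed number with optional thousands separators and decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_answer(text: str) -> Optional[float]:
    """Return the first number after the last '####' marker, or None."""
    _, marker, tail = text.rpartition("####")
    if not marker:
        return None
    match = _NUMBER.search(tail)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def accuracy(
    examples: Iterable[Mapping[str, str]],
    generate_fn: Callable[[str], str],
) -> float:
    """Fraction of examples whose predicted number equals the reference."""
    correct = total = 0
    for example in examples:
        total += 1
        predicted = extract_final_answer(generate_fn(example["question"]))
        expected = extract_final_answer(example["answer"])
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / total if total else 0.0


# Toy rows using the GSM8K field names ("question", "answer").
toy_rows = [
    {"question": "What is 48 + 24?", "answer": "48 + 24 = 72\n#### 72"},
    {"question": "What is 1,000 + 234?", "answer": "#### 1,234"},
]
# Stand-in for a real model call: answers the first question correctly only.
fake_model = lambda question: "The total is 72.\n####\n72"
print(accuracy(toy_rows, fake_model))
```

With a real model, `generate_fn` would wrap the tokenizer and `model.generate` calls from the Usage section, and `examples` would be rows of `test_dataset`. Taking the last `####` marker matters when the decoded output still contains the prompt, since the prompt template includes the marker too.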
Citation
@misc{gsm8k-finetuned-llama3,
title={GSM8K Fine-tuned Llama 3.1 8B Instruct},
author={Kim, Min-Seong},
year={2025}
}
License
This model is built on Llama 3.1 8B Instruct and is subject to the same license (the Llama 3.1 Community License).
Disclaimer
This model is fine-tuned specifically for mathematical reasoning tasks. Performance on other tasks may vary. Always evaluate model outputs for your specific use cases.