GSM8K Fine-tuned Llama 3.1 8B Instruct

A Llama 3.1 8B Instruct model fine-tuned on the GSM8K dataset to improve mathematical reasoning.

Model Details

Base Model: meta-llama/Llama-3.1-8B-Instruct
Training Method: LoRA (Low-Rank Adaptation)
Training Dataset: GSM8K
Training Date: 2025-11-10

Training Configuration

LoRA Rank: 8
LoRA Alpha: 16
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Samples: 500
Epochs: 3
Batch Size: 2
Learning Rate: 1e-4
Max Length: 512
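
To give a sense of scale for the configuration above, the following minimal sketch counts the parameters that LoRA adds at rank 8 across the seven target modules. The layer dimensions are assumed from the publicly documented Llama 3.1 8B architecture (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); they are not read from the training script, and the helper name `lora_param_count` is hypothetical.

```python
# Sketch: count LoRA parameters for the configuration listed above.
# Dimensions are assumed from the public Llama 3.1 8B config.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128

LORA_RANK = 8
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int = LORA_RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for each targeted layer."""
    per_layer = sum(rank * (i + o) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


trainable = lora_param_count()
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update
print(f"LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the adapter holds about 21 million parameters, well under 1% of the 8B base model.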

Performance

GSM8K Test Accuracy: 55.00% (11/20 samples)
Training Loss: ~0.43 (final)
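
Because the accuracy above comes from 20 samples, it helps to see how wide the statistical uncertainty is. The sketch below computes a 95% Wilson score interval for 11 correct out of 20; the choice of the Wilson interval and the function name are assumptions made for illustration, not part of the original evaluation.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


low, high = wilson_interval(11, 20)
print(f"accuracy = {11 / 20:.2%}, 95% interval = [{low:.1%}, {high:.1%}]")
```

With 20 samples the interval spans roughly 34% to 74%, so the 55% figure is best read as a rough indication.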

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
model = model.merge_and_unload()  # Optional: merge LoRA weights

# Generate
prompt = "Solve this math problem step by step:\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\n\nProvide your final answer in the format:\n[reasoning steps]\n####\n[final answer (just the number)]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
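
The prompt in the example above follows a fixed template, so it can be generated for any question. The following is a minimal sketch with a hypothetical helper, `build_prompt`; whether training used exactly this template is an assumption.

```python
# Template copied from the usage example; {question} is the only placeholder.
PROMPT_TEMPLATE = (
    "Solve this math problem step by step:\n\n"
    "{question}\n\n"
    "Provide your final answer in the format:\n"
    "[reasoning steps]\n"
    "####\n"
    "[final answer (just the number)]"
)


def build_prompt(question: str) -> str:
    """Wrap a word problem in the instruction template shown above."""
    return PROMPT_TEMPLATE.format(question=question.strip())


prompt = build_prompt("A pen costs 3 dollars. How much do 4 pens cost?")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded `prompt`.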

With PEFT (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model with LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
tokenizer = AutoTokenizer.from_pretrained("kmseong/GSM8K-Llama3_8B_Instruct_SFT")

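# Use for inference (move inputs to the same device as the model)
inputs = tokenizer("Solve: 2x + 3 = 7", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))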

Training Script

The model was trained with the following command:

python finetune_gsm8k.py \
  --num_train_samples 500 \
  --num_eval_samples 50 \
  --epochs 3 \
  --batch_size 2 \
  --learning_rate 1e-4 \
  --output_dir ./gsm8k_finetuned_v2
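
The script `finetune_gsm8k.py` is not reproduced here. As an illustration only, the sketch below shows how an argument parser accepting the flags above might look, and derives the number of optimizer steps they imply. The defaults, the description text, and the assumption of a single device with no gradient accumulation are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the command above; defaults are assumed.
    parser = argparse.ArgumentParser(description="Fine-tune on GSM8K with LoRA (sketch)")
    parser.add_argument("--num_train_samples", type=int, default=500)
    parser.add_argument("--num_eval_samples", type=int, default=50)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--output_dir", type=str, default="./gsm8k_finetuned_v2")
    return parser


args = build_parser().parse_args(
    ["--num_train_samples", "500", "--epochs", "3", "--batch_size", "2"]
)

# Optimizer steps, assuming one device and no gradient accumulation.
steps_per_epoch = -(-args.num_train_samples // args.batch_size)  # ceiling division
total_steps = steps_per_epoch * args.epochs
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```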

Evaluation

# Quick evaluation on GSM8K test set
from datasets import load_dataset

test_dataset = load_dataset('openai/gsm8k', 'main', split='test')

# Load your model and evaluate
# (See evaluate_on_gsm8k function in the training script)
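
The `evaluate_on_gsm8k` function itself is not shown here. As a rough sketch of the scoring step, the code below compares the number after the last `####` marker in each model output with the one in the reference answer (GSM8K reference answers end with `#### <number>`). The helper names and the sample strings are hypothetical, and the actual evaluation may differ.

```python
import re
from typing import List, Optional


def extract_answer(text: str) -> Optional[float]:
    """Return the number following the last '####' marker, or None."""
    parts = text.rsplit("####", 1)
    if len(parts) < 2:
        return None
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", parts[1])
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final number equals the reference."""
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_answer(pred)
        if answer is not None and answer == extract_answer(ref):
            correct += 1
    return correct / len(references)


# Made-up strings for illustration, not real model outputs.
references = ["48 / 2 = 24 clips in May.\n#### 72", "#### 1,000"]
predictions = ["April 48, May 24.\n####\n72", "I think it is 999"]
print(exact_match_accuracy(predictions, references))
```

Outputs without a `####` marker count as incorrect in this sketch, which is a stricter rule than some evaluation scripts apply.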

Citation

@misc{gsm8k-finetuned-llama3,
  title={GSM8K Fine-tuned Llama 3.1 8B Instruct},
  author={Kim, Min-Seong},
  year={2025}
}

License

This model is built on Llama 3.1 8B Instruct and is subject to the same license, the Llama 3.1 Community License.

Disclaimer

This model is fine-tuned specifically for mathematical reasoning tasks. Performance on other tasks may vary. Always evaluate model outputs for your specific use cases.
