GSM8K Fine-tuned Llama 3.1 8B Instruct

A Llama 3.1 8B Instruct model fine-tuned with LoRA on the GSM8K dataset to improve step-by-step mathematical reasoning.

Model Details

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Training Method: LoRA (Low-Rank Adaptation)
  • Training Dataset: GSM8K
  • Training Date: 2025-11-10

Training Configuration

  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training Samples: 500
  • Epochs: 3
  • Batch Size: 2
  • Learning Rate: 1e-4
  • Max Length: 512
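
To put the configuration above in perspective, the following sketch estimates the adapter size and the number of optimizer steps. The layer dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128) are assumptions taken from the public Llama 3.1 8B config, and the step count assumes a single device with no gradient accumulation; neither is stated in this card.

```python
import math

# Assumed Llama 3.1 8B dimensions (from the public model config, not from this card).
HIDDEN, MLP, LAYERS, KV_DIM = 4096, 14336, 32, 8 * 128

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per targeted matrix."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGETS.values())
    return per_layer * LAYERS

def optimizer_steps(samples: int, batch_size: int, epochs: int) -> int:
    """Steps on one device without gradient accumulation (an assumption)."""
    return math.ceil(samples / batch_size) * epochs

rank, alpha = 8, 16
print("LoRA scaling (alpha / rank):", alpha / rank)
print("Trainable LoRA parameters:", lora_param_count(rank))
print("Optimizer steps:", optimizer_steps(500, 2, 3))
```

Under these assumptions the adapter holds roughly 21 million trainable parameters, well under 1% of the base model, and training runs for 750 optimizer steps.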

Performance

  • GSM8K Test Accuracy: 55.00% (11/20 correct, measured on a 20-sample subset of the test split)
  • Training Loss: ~0.43 (final)
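
Because the accuracy above comes from 20 samples, it carries considerable uncertainty. The sketch below computes a 95% Wilson score interval for 11 correct out of 20; the choice of the Wilson interval is an illustration added here, not something the original evaluation reports.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

low, high = wilson_interval(11, 20)
print(f"accuracy = {11 / 20:.2%}, 95% interval = [{low:.1%}, {high:.1%}]")
```

The interval spans roughly 34% to 74%, so the reported figure is best read as a rough indication rather than a precise benchmark score.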

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
model = model.merge_and_unload()  # Optional: merge LoRA weights

# Generate
prompt = "Solve this math problem step by step:\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\n\nProvide your final answer in the format:\n[reasoning steps]\n####\n[final answer (just the number)]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only applies when sampling is enabled, so request it explicitly
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
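
The prompt above asks for the final answer after a "####" marker, so the number can be recovered from the decoded text with a small parser. The sketch below is one possible approach; the helper names build_prompt and extract_final_answer are hypothetical and not part of this repository. Since the decoded output echoes the prompt, which itself contains "####", the parser reads only what follows the last marker.

```python
import re
from typing import Optional

PROMPT_TEMPLATE = (
    "Solve this math problem step by step:\n\n{question}\n\n"
    "Provide your final answer in the format:\n"
    "[reasoning steps]\n####\n[final answer (just the number)]"
)

def build_prompt(question: str) -> str:
    """Wrap a question in the prompt format used in the example above."""
    return PROMPT_TEMPLATE.format(question=question)

def extract_final_answer(text: str) -> Optional[str]:
    """Return the first number after the last '####', or None if absent."""
    if "####" not in text:
        return None
    tail = text.rsplit("####", 1)[1]
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", tail)
    return match.group(0).replace(",", "") if match else None

prompt = build_prompt("What is 48 + 24?")
decoded = prompt + "\n48 + 24 = 72\n#### 72"  # hypothetical model output
print(extract_final_answer(decoded))
```

If the model never emits the marker, the last "####" is the one inside the prompt, no number follows it, and the function returns None.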

With PEFT (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model with LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(model, "kmseong/GSM8K-Llama3_8B_Instruct_SFT")
tokenizer = AutoTokenizer.from_pretrained("kmseong/GSM8K-Llama3_8B_Instruct_SFT")

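# Use for inference (move inputs to the same device as the model)
inputs = tokenizer("Solve: 2x + 3 = 7", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))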

Training Script

The model was trained with the following command:

python finetune_gsm8k.py \
  --num_train_samples 500 \
  --num_eval_samples 50 \
  --epochs 3 \
  --batch_size 2 \
  --learning_rate 1e-4 \
  --output_dir ./gsm8k_finetuned_v2

Evaluation

# Quick evaluation on GSM8K test set
from datasets import load_dataset

test_dataset = load_dataset('openai/gsm8k', 'main', split='test')

# Load your model and evaluate
# (See evaluate_on_gsm8k function in the training script)
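
The evaluate_on_gsm8k function itself is not shown in this card. As a rough illustration of how such scoring might work, the sketch below computes exact-match accuracy against GSM8K reference answers, which end with "#### <answer>". All function names here are hypothetical, and dummy_predict is a stand-in for a real model call that returns the predicted final answer as a string.

```python
from typing import Callable, Iterable, Mapping

def normalize_number(text: str) -> str:
    """Strip whitespace, commas, and '$' so that '1,000' matches '1000'."""
    return text.strip().replace(",", "").replace("$", "")

def gold_answer(reference: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return normalize_number(reference.split("####")[-1])

def exact_match_accuracy(
    predict: Callable[[str], str],
    examples: Iterable[Mapping[str, str]],
) -> float:
    """Fraction of examples whose predicted final answer equals the reference."""
    correct = total = 0
    for example in examples:
        predicted = normalize_number(predict(example["question"]))
        correct += predicted == gold_answer(example["answer"])
        total += 1
    return correct / total if total else 0.0

# Hypothetical stand-in for a real model call; replace with generation + answer parsing.
def dummy_predict(question: str) -> str:
    return "72"

examples = [
    {"question": "q1", "answer": "48 / 2 = 24\n48 + 24 = 72\n#### 72"},
    {"question": "q2", "answer": "5 * 2 = 10\n#### 10"},
]
print(exact_match_accuracy(dummy_predict, examples))
```

With a real predictor, the same function could be applied to a slice of the loaded split, for example test_dataset.select(range(20)), since each GSM8K row exposes "question" and "answer" fields.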

Citation

@misc{gsm8k-finetuned-llama3,
  title={GSM8K Fine-tuned Llama 3.1 8B Instruct},
  author={Kim, Min-Seong},
  year={2025}
}

License

This model is built on Llama 3.1 8B Instruct and is distributed under the same license, the Llama 3.1 Community License.

Disclaimer

This model is fine-tuned specifically for mathematical reasoning tasks. Performance on other tasks may vary. Always evaluate model outputs for your specific use cases.
