Qwen2.5-1.5B-Instruct Fine-tuned for Mathematical Reasoning

A fine-tuned version of Qwen2.5-1.5B-Instruct trained to solve mathematical word problems with explicit step-by-step reasoning chains.

Model Details

Model Description

This model is a QLoRA fine-tuned version of Qwen2.5-1.5B-Instruct, specifically trained to solve mathematical word problems from the GSM8K dataset. The model learns to break down complex problems into numbered reasoning steps, show intermediate calculations, and provide clear final answers.

The fine-tuning uses synthetic data generated by prompting the base model to produce detailed reasoning chains, then training on these structured examples to reinforce both mathematical accuracy and explanation quality.

Developed by: Nishitha
Model type: Causal Language Model (Fine-tuned with QLoRA)
Language: English
License: Same as base model (Qwen2.5-1.5B-Instruct)
Finetuned from model: Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning method: QLoRA (4-bit quantization + LoRA adapters)

Model Sources

Base Model Repository: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
Training Dataset: GSM8K with synthetic reasoning chains

Uses

Direct Use

This model is designed to solve grade school math word problems with step-by-step explanations. It excels at:

Breaking down complex math problems into manageable steps
Showing intermediate calculations and reasoning
Providing structured, educational responses
Teaching mathematical problem-solving approaches

Example usage:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "your-username/model-name")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

prompt = "Question: Janet has 5 apples. She buys 3 more. How many does she have now?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0]))

Downstream Use

Potential applications include:

Educational tutoring systems requiring step-by-step explanations
Math homework assistance tools
Reasoning capability enhancement for small language models
Foundation for further fine-tuning on domain-specific math problems

Out-of-Scope Use

This model is NOT suitable for:

Advanced mathematics (calculus, linear algebra, etc.) - trained only on grade school math
High-stakes decision making or professional calculations
Problems requiring external tools, calculators, or symbolic computation
Non-mathematical reasoning tasks

Bias, Risks, and Limitations

Known Limitations:

70% accuracy rate - the model makes errors on 30% of test problems
Occasional arithmetic mistakes in multi-step calculations
Training data generated by same-sized base model, limiting maximum achievable accuracy
Small model size (1.5B parameters) constrains mathematical reasoning capability
May confidently present incorrect answers with plausible-looking reasoning steps

Risks:

Users may trust incorrect mathematical solutions if they appear well-reasoned
Not suitable for any application where calculation accuracy is critical
May inherit biases from the GSM8K dataset and base model

Recommendations

Always verify answers for important calculations
Use as an educational aid, not a calculator replacement
Best suited for learning and demonstration rather than production applications
Consider ensemble methods or verification steps for critical use cases
Be aware that structured reasoning doesn't guarantee correctness

Training Details

Training Data

Dataset: GSM8K (Grade School Math 8K)

Synthetic Data Generation Process:

Prompted Qwen2.5-1.5B-Instruct to generate detailed reasoning chains for GSM8K problems
Created structured dataset with numbered steps, mathematical formulations, and clear final answers
Format: Question/Answer pairs with explicit step-by-step reasoning
Dataset uploaded to Hugging Face for reproducibility

The training data emphasizes teaching the model to show its work through:

Numbered reasoning steps
Intermediate calculations
Clear problem decomposition
Explicit final answers

Training Procedure

Hardware:

Google Colab with T4 GPU (free tier)
Training completed in reasonable time on consumer-grade hardware

Technique: QLoRA (Quantized Low-Rank Adaptation)

4-bit quantization of base model
LoRA adapters for efficient fine-tuning

Training Hyperparameters

LoRA Configuration:
- Rank (r): 8
- Alpha: 16
- Target modules: Attention layers
Training regime: 4-bit quantization with LoRA adapters (QLoRA)

Evaluation

Testing Data & Metrics

Test Set: 10 sample problems from GSM8K

Evaluation Metrics:

Answer Accuracy: Percentage of problems with correct final answers
Reasoning Structure: Percentage of responses following step-by-step format

Results

Performance Summary:

Metric	Score
Answer Accuracy	7/10 (70%)
Reasoning Structure	10/10 (100%)

Key Findings:

✅ Strengths:

100% adoption of structured reasoning format
All responses include intermediate calculations and explanations
Successfully breaks down complex problems into manageable steps
Significant improvement over base model in both structure and correctness

❌ Weaknesses:

30% error rate on mathematical accuracy
Some arithmetic errors in multi-step calculations
Incorrect answers despite showing reasoning steps

Analysis:

The model successfully learned both formatting and mathematical reasoning. The 70% accuracy with 100% structured output demonstrates effective fine-tuning. The self-teaching approach (using the same 1.5B model to generate training data) proved viable for teaching structure and improving accuracy, though there's room for improvement.

Environmental Impact

Training was conducted on Google Colab's free T4 GPU tier, minimizing environmental impact through:

Efficient QLoRA training (4-bit quantization)
Short training time on consumer-grade hardware
Parameter-efficient fine-tuning (only LoRA adapters trained)

Estimated carbon footprint is minimal due to use of shared, optimized infrastructure and efficient training methods.

Technical Specifications

Model Architecture and Objective

Base Architecture: Qwen2.5 transformer architecture (1.5B parameters)
Fine-tuning Method: QLoRA (4-bit quantized base model + trainable LoRA adapters)
Objective: Causal language modeling with focus on mathematical reasoning chains

Compute Infrastructure

Hardware

GPU: NVIDIA T4 (Google Colab free tier)
Memory: ~15GB GPU RAM (enabled by 4-bit quantization)

Software

PEFT 0.17.1
Transformers library (Hugging Face)
PyTorch
bitsandbytes (for quantization)

Citation

If you use this model, please cite:

BibTeX:

@misc{qwen25-math-reasoning,
  author = {Nishitha},
  title = {Qwen2.5-1.5B Fine-tuned for Mathematical Reasoning},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/model-name}}
}

Related Work:

Dettmers et al. (2023). "QLoRA: Efficient Finetuning of Quantized LLMs"
Cobbe et al. (2021). "Training Verifiers to Solve Math Word Problems" (GSM8K dataset)

Future Improvements

Potential Enhancements:

Use larger teacher models (Llama 70B, GPT-4) for higher-quality training data generation
Increase LoRA rank (16-32) for greater model capacity
Expand training dataset to 5,000-10,000 examples
Implement mathematical validation of reasoning chains
Fine-tune larger base models (Qwen 7B/14B) for improved baseline capability

Model Card Contact

For questions or feedback about this model, please reach out through the Hugging Face model repository.

Downloads last month: 1

Model tree for Nishitha03/Qwen2.5-1.5b-Reasoning-Updated

Base model

Qwen/Qwen2.5-1.5B

Finetuned

Qwen/Qwen2.5-1.5B-Instruct

Adapter

(524)

this model

Dataset used to train Nishitha03/Qwen2.5-1.5b-Reasoning-Updated

Evaluation results

Answer Accuracy on GSM8K
self-reported

70.000
Reasoning Structure on GSM8K
self-reported

100.000

View on Papers With Code