---
license: apache-2.0
base_model: Qwen/Qwen2-1.5B-Instruct
language:
- en
tags:
- reasoning
- math
- gsm8k
- chain-of-thought
pipeline_tag: text-generation
---
# 🧠 ARIES 1.5B - Reasoning Language Model
A 1.5B parameter reasoning model fine-tuned with custom reasoning tokens for step-by-step mathematical problem solving.
## 📊 Model Details
- **Architecture:** Qwen2-1.5B-Instruct (base) + Custom Reasoning Tokens
- **Parameters:** 1.54B
- **Training Method:** Fine-tuned on GSM8K with reasoning token integration
- **Special Tokens:** `<think>`, `<context>`, `<answer>`, `<end>` (see the check below)
- **Training Loss:** 0.2130
- **Version:** v1.0-finetuned
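
A quick way to confirm the reasoning tokens listed above are present in the published tokenizer (a minimal sketch, assuming they were registered as additional special tokens during fine-tuning):

```python
from transformers import AutoTokenizer

# Each reasoning token should map to its own dedicated ID
# (assumption: the tokens were added as special tokens before training).
tokenizer = AutoTokenizer.from_pretrained("ziadrone/aries-1.5b-reasoning")
for tok in ["<think>", "<context>", "<answer>", "<end>"]:
    print(tok, "->", tokenizer.convert_tokens_to_ids(tok))
```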
## 🎯 What Makes This Model Special
This model extends Qwen2-1.5B with:
1. **Custom reasoning tokens** for structured thought processes
2. **Step-by-step explanation** capabilities
3. **GSM8K-style notation** support (`<<calculation=result>>`)
4. **Chain-of-thought reasoning** integration
## 📝 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "ziadrone/aries-1.5b-reasoning",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ziadrone/aries-1.5b-reasoning")
# Prompts begin with the <think> token, which cues step-by-step reasoning.
prompt = "<think> What is 25 + 17?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # temperature/top_p only take effect when sampling is enabled
    temperature=0.7,
    top_p=0.9,
)

# Keep special tokens visible so the reasoning structure can be inspected.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
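
Continuing from the snippet above, the final result can be pulled out of the decoded text. This is a post-processing sketch rather than part of the model's API; it assumes the generation closes with a single `<answer> ... <end>` span, as in the example outputs below.

```python
import re

def extract_answer(generated_text: str):
    """Return the text between <answer> and <end>, or None if the span is missing.

    Sketch only: assumes the model emits a single <answer> ... <end> span.
    """
    match = re.search(r"<answer>(.*?)<end>", generated_text, re.DOTALL)
    return match.group(1).strip() if match else None

text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(extract_answer(text))  # e.g. "42" if the model closes with "<answer> 42 <end>"
```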
## 🧪 Example Outputs
**Addition:**
```
Input: <think> What is 25 + 17?
Output: The answer to 25 + 17 is 42.
Explanation: We add the two numbers together.
25 + 17 = <<25+17=42>>42
```
**Word Problems:**
```
Input: <think> If I have $50 and spend $23, how much is left?
Output: You are left with 27 dollars.
If you have $50 and spend $23, then the amount left will be
$50 - $23 = <<50-23=27>>27
```
**Distance Problems:**
```
Input: <think> A train travels 60 mph for 3 hours. How far does it go?
Output: It goes 180 miles because 60 times 3 is 180.
<answer> 180 <end>
```
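
The `<<calculation=result>>` annotations above follow GSM8K's calculator notation, so the arithmetic can be checked programmatically. Below is a rough verifier sketch (not shipped with the model); it only handles plain numeric expressions with `+ - * /`.

```python
import re

CALC_PATTERN = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

def check_calculations(text: str):
    """Extract GSM8K-style <<expr=result>> annotations and verify the arithmetic.

    Sketch only: anything that is not a simple numeric expression is skipped.
    """
    results = []
    for expr, claimed in CALC_PATTERN.findall(text):
        if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
            continue  # skip non-arithmetic content
        try:
            value = eval(expr)  # the regex above restricts expr to digits and + - * / ( ) .
            ok = abs(value - float(claimed)) < 1e-6
        except (SyntaxError, ZeroDivisionError, ValueError):
            continue
        results.append((expr, claimed.strip(), ok))
    return results

print(check_calculations("25 + 17 = <<25+17=42>>42"))  # [('25+17', '42', True)]
```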
## 📈 Training Details
- **Dataset:** GSM8K (1,500 training examples)
- **Epochs:** 2
- **Batch Size:** 1 per device × 32 gradient accumulation steps (effective batch size 32)
- **Learning Rate:** 3e-5 with cosine schedule and warmup
- **Optimizer:** AdamW with CPU offloading for memory efficiency
- **Training Time:** ~42 minutes on a single GPU
- **Hardware:** NVIDIA GPU with 24GB VRAM
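
The training script is not published, but the hyperparameters above map roughly onto a standard `transformers` `TrainingArguments` setup. The sketch below reflects those numbers, not the exact configuration used; the `warmup_ratio` value is an assumption.

```python
from transformers import TrainingArguments

# Rough reconstruction of the listed hyperparameters (the actual script is not published).
training_args = TrainingArguments(
    output_dir="aries-1.5b-reasoning",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,   # effective batch size 32
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                # assumption: the exact warmup length is not stated
    bf16=True,                        # mixed precision, as described below
    gradient_checkpointing=True,
    logging_steps=10,
)
```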
## 🎓 Training Strategy
The model was trained using a memory-efficient approach (see the sketch after this list):
- **CPU-offloaded optimizer states** (saved ~6GB GPU memory)
- **Gradient checkpointing** enabled
- **Mixed precision** (BF16)
- **Custom learning rate scheduler** with warmup
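
The card does not state which offloading mechanism was used. One common way to get CPU-offloaded optimizer states with `transformers` is DeepSpeed ZeRO with optimizer offload; the config below is a hypothetical sketch under that assumption.

```python
# Hypothetical DeepSpeed config matching the strategy above
# (assumption: the actual offloading mechanism is not documented in this card).
deepspeed_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": 32,
    "train_micro_batch_size_per_gpu": 1,
}

# This dict can be passed to TrainingArguments via `deepspeed=deepspeed_config`
# in the sketch from the previous section.
```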
## 🔄 Roadmap
- **v1.0** (Current): Fine-tuned on GSM8K
- **v2.0** (Coming): Knowledge distillation for improved performance
- **v3.0** (Planned): Extended to MATH and MMLU datasets
## 📄 License
Apache 2.0
## 🙏 Credits
- **Base Model:** Qwen Team (Qwen2-1.5B-Instruct)
- **Reasoning Framework:** ARIES (Autonomous Reasoning Improvement via Ensembling Systems)
- **Training Dataset:** OpenAI GSM8K
- **Framework:** HuggingFace Transformers
## 📧 Contact
For questions or collaborations: [Your contact]