---
license: apache-2.0
base_model: Qwen/Qwen2-1.5B-Instruct
language:
- en
tags:
- reasoning
- math
- gsm8k
- chain-of-thought
pipeline_tag: text-generation
---

# 🧠 ARIES 1.5B - Reasoning Language Model

A 1.5B-parameter reasoning model fine-tuned with custom reasoning tokens for step-by-step mathematical problem solving.

## 📊 Model Details

- **Architecture:** Qwen2-1.5B-Instruct (base) + custom reasoning tokens
- **Parameters:** 1.54B
- **Training Method:** Fine-tuned on GSM8K with reasoning-token integration
- **Special Tokens:** four custom reasoning tokens added on top of the base vocabulary (see the token-registration sketch at the end of this card)
- **Training Loss:** 0.2130
- **Version:** v1.0-finetuned

## 🎯 What Makes This Model Special

This model extends Qwen2-1.5B-Instruct with:

1. **Custom reasoning tokens** for structured thought processes
2. **Step-by-step explanation** capabilities
3. **GSM8K-style calculator notation** support (`<<...>>`)
4. **Chain-of-thought reasoning** integration

## 📝 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ziadrone/aries-1.5b-reasoning",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ziadrone/aries-1.5b-reasoning")

prompt = " What is 25 + 17?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

## 🧪 Example Outputs

**Addition:**
```
Input:  What is 25 + 17?
Output: The answer to 25 + 17 is 42.
        Explanation: We add the two numbers together. 25 + 17 = <<25+17=42>>42
```

**Word Problems:**
```
Input:  If I have $50 and spend $23, how much is left?
Output: You are left with 27 dollars. If you have $50 and spend $23,
        then the amount left will be $50 - $23 = <<50-23=27>>27
```

**Distance Problems:**
```
Input:  A train travels 60 mph for 3 hours. How far does it go?
Output: It goes 180 miles because 60 times 3 is 180. 180
```

## 📈 Training Details

- **Dataset:** GSM8K (1,500 training examples)
- **Epochs:** 2
- **Batch Size:** 1 per device × 32 gradient-accumulation steps
- **Learning Rate:** 3e-5 with warmup followed by a cosine schedule
- **Optimizer:** AdamW with CPU-offloaded optimizer states (memory efficient)
- **Training Time:** ~42 minutes on a single GPU
- **Hardware:** NVIDIA GPU with 24 GB VRAM

## 🎓 Training Strategy

The model was trained with a memory-efficient setup (a minimal sketch appears at the end of this card):

- **CPU-offloaded optimizer states** (saved ~6 GB of GPU memory)
- **Gradient checkpointing** enabled
- **Mixed precision** (BF16)
- **Custom learning-rate scheduler** with warmup

## 🔄 Roadmap

- **v1.0** (current): Fine-tuned on GSM8K
- **v2.0** (coming): Knowledge distillation for improved performance
- **v3.0** (planned): Extended to the MATH and MMLU datasets

## 📄 License

Apache 2.0

## 🙏 Credits

- **Base Model:** Qwen Team (Qwen2-1.5B-Instruct)
- **Reasoning Framework:** ARIES (Autonomous Reasoning Improvement via Ensembling Systems)
- **Training Dataset:** OpenAI GSM8K
- **Framework:** Hugging Face Transformers

## 📧 Contact

For questions or collaborations: [Your contact]
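
## 🧩 Appendix: Illustrative Sketches

**Reasoning-token registration.** The literal reasoning-token strings are not reproduced in this card, so the sketch below uses placeholder names (`<reasoning>`, `</reasoning>`, `<answer>`, `</answer>`) purely for illustration. The mechanism shown — registering extra special tokens and resizing the embedding matrix — is standard Hugging Face Transformers usage, not the published ARIES training script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: the actual ARIES reasoning tokens are not listed in this card.
REASONING_TOKENS = ["<reasoning>", "</reasoning>", "<answer>", "</answer>"]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# Register the extra tokens, then grow the embedding table so the new ids have rows.
tokenizer.add_special_tokens({"additional_special_tokens": REASONING_TOKENS})
model.resize_token_embeddings(len(tokenizer))
```

After this step the new tokens can be used as delimiters in the fine-tuning data, and their embeddings are learned during training like any other token.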
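
**Memory-efficient fine-tuning setup.** The exact training script is not included with this card; the sketch below only shows how the pieces named in the Training Strategy section (BF16 weights, gradient checkpointing, AdamW, 32-step gradient accumulation, warmup + cosine learning-rate schedule) fit together using standard PyTorch / Transformers APIs. The warmup ratio and the empty dataloader are placeholder assumptions, and the CPU offloading of optimizer states (e.g. via DeepSpeed ZeRO-Offload) is omitted here.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct", torch_dtype=torch.bfloat16
)
model.gradient_checkpointing_enable()  # trade extra compute for activation memory

# 1,500 examples x 2 epochs with per-device batch 1 and 32 accumulation steps
# works out to roughly 94 optimizer steps.
ACCUM_STEPS = 32
num_optimizer_steps = (1500 * 2) // ACCUM_STEPS
optimizer = AdamW(model.parameters(), lr=3e-5)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_optimizer_steps),  # assumed warmup ratio
    num_training_steps=num_optimizer_steps,
)

dataloader = []  # placeholder: replace with tokenized GSM8K batches
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss / ACCUM_STEPS  # scale loss for gradient accumulation
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
```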