---
license: apache-2.0
base_model: Qwen/Qwen2-1.5B-Instruct
language:
- en
tags:
- reasoning
- math
- gsm8k
- chain-of-thought
pipeline_tag: text-generation
---

# ARIES 1.5B - Reasoning Language Model

A 1.5B-parameter reasoning model fine-tuned with custom reasoning tokens for step-by-step mathematical problem solving.

## Model Details

- **Architecture:** Qwen2-1.5B-Instruct (base) + custom reasoning tokens
- **Parameters:** 1.54B
- **Training Method:** Fine-tuned on GSM8K with reasoning-token integration
- **Special Tokens:** `<think>`, `<context>`, `<answer>`, `<end>` (registered as sketched below)
- **Training Loss:** 0.2130
- **Version:** v1.0-finetuned
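
The card does not include the preprocessing code, but the reasoning-token integration described above is typically done by registering the four tokens as additional special tokens and resizing the embedding matrix. A minimal sketch, assuming the standard Transformers API; the exact procedure used for this model may have differed:

```python
# Illustrative only: the training script is not part of this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

reasoning_tokens = ["<think>", "<context>", "<answer>", "<end>"]
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": reasoning_tokens}
)
if num_added > 0:
    # Grow the embedding matrix so the new token ids have trainable rows.
    model.resize_token_embeddings(len(tokenizer))
```
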
## What Makes This Model Special

This model extends Qwen2-1.5B-Instruct with:

1. **Custom reasoning tokens** for structured thought processes
2. **Step-by-step explanation** capabilities
3. **GSM8K-style notation** support (`<<calculation=result>>`), illustrated in the sketch after this list
4. **Chain-of-thought reasoning** integration
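
The calculator notation makes the model's intermediate arithmetic easy to machine-check. The helper below is a minimal sketch (not part of the model's code): it pulls every `<<expression=result>>` span out of a generated string and verifies it; the regex and function name are illustrative assumptions.

```python
import re

ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>]+)>>")

def check_calculator_annotations(text: str) -> list[tuple[str, bool]]:
    """Return (annotation, is_correct) for each <<expr=result>> span in text."""
    results = []
    for expr, claimed in ANNOTATION.findall(text):
        try:
            # GSM8K annotations are plain arithmetic, so a restricted eval is
            # enough for a sketch; a production checker should parse safely.
            ok = abs(eval(expr, {"__builtins__": {}}) - float(claimed)) < 1e-6
        except Exception:
            ok = False
        results.append((f"<<{expr}={claimed}>>", ok))
    return results

print(check_calculator_annotations("25 + 17 = <<25+17=42>>42"))
# [('<<25+17=42>>', True)]
```
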
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ziadrone/aries-1.5b-reasoning",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ziadrone/aries-1.5b-reasoning")

# Prompts start with the <think> reasoning token.
prompt = "<think> What is 25 + 17?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Keep special tokens so the reasoning markup stays visible in the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
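
To recover just the final answer, you can slice out the text between `<answer>` and `<end>`. A minimal sketch, assuming the model emits both tokens (it may not always do so):

```python
import re

def extract_answer(generated: str) -> str | None:
    """Return the text between <answer> and <end>, or None if it is absent."""
    match = re.search(r"<answer>(.*?)<end>", generated, flags=re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer("It goes 180 miles because 60 times 3 is 180. <answer> 180 <end>"))
# 180
```
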
## Example Outputs

**Addition:**
```
Input: <think> What is 25 + 17?
Output: The answer to 25 + 17 is 42.
Explanation: We add the two numbers together.
25 + 17 = <<25+17=42>>42
```

**Word Problems:**
```
Input: <think> If I have $50 and spend $23, how much is left?
Output: You are left with 27 dollars.
If you have $50 and spend $23, then the amount left will be
$50 - $23 = <<50-23=27>>27
```

**Distance Problems:**
```
Input: <think> A train travels 60 mph for 3 hours. How far does it go?
Output: It goes 180 miles because 60 times 3 is 180.
<answer> 180 <end>
```

## Training Details

- **Dataset:** GSM8K (1,500 training examples)
- **Epochs:** 2
- **Batch Size:** 1 per device × 32 gradient-accumulation steps (effective batch size 32)
- **Learning Rate:** 3e-5 with cosine schedule and warmup (see the configuration sketch below)
- **Optimizer:** AdamW with CPU offloading (memory efficient)
- **Training Time:** ~42 minutes on a single GPU
- **Hardware:** NVIDIA GPU with 24 GB VRAM
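
The card does not say which training loop was used. As an illustration only, here is how the hyperparameters above would look if expressed as Hugging Face `TrainingArguments`; the `output_dir` and `warmup_ratio` values are assumptions, since the card only says "warmup".

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; not the original script.
training_args = TrainingArguments(
    output_dir="aries-1.5b-reasoning",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,     # effective batch size 32
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                  # assumption: the card only says "warmup"
    bf16=True,
    gradient_checkpointing=True,
)
```
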
## Training Strategy

The model was trained using a memory-efficient approach:

- **CPU-offloaded optimizer states** (saved ~6 GB of GPU memory)
- **Gradient checkpointing** enabled
- **Mixed precision** (BF16)
- **Custom learning rate scheduler** with warmup (see the sketch after this list)
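
The card does not name the mechanism behind the CPU offloading (DeepSpeed ZeRO-Offload and paged/offloaded optimizers are common choices). The remaining pieces can be sketched with stock PyTorch and Transformers calls; the step counts below are illustrative, not taken from the card.

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype=torch.bfloat16,        # BF16 precision
)
model.gradient_checkpointing_enable()  # trade recompute for activation memory

# CPU offloading of optimizer states is not shown here, since the card does not
# say how it was implemented.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Cosine decay with warmup over the full run.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=94,  # ~1,500 examples / effective batch 32, 2 epochs
)
```
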
## Roadmap

- **v1.0** (Current): Fine-tuned on GSM8K
- **v2.0** (Coming): Knowledge distillation for improved performance
- **v3.0** (Planned): Extension to the MATH and MMLU datasets

## License

Apache 2.0

## Credits

- **Base Model:** Qwen Team (Qwen2-1.5B-Instruct)
- **Reasoning Framework:** ARIES (Autonomous Reasoning Improvement via Ensembling Systems)
- **Training Dataset:** OpenAI GSM8K
- **Framework:** Hugging Face Transformers

## Contact

For questions or collaborations: [Your contact]