---
license: apache-2.0
base_model: Qwen/Qwen2-1.5B-Instruct
language:
- en
tags:
- reasoning
- math
- gsm8k
- chain-of-thought
pipeline_tag: text-generation
---

# ARIES 1.5B - Reasoning Language Model

A 1.5B-parameter reasoning model fine-tuned with custom reasoning tokens for step-by-step mathematical problem solving.

## Model Details

- **Architecture:** Qwen2-1.5B-Instruct (base) + custom reasoning tokens
- **Parameters:** 1.54B
- **Training Method:** Fine-tuned on GSM8K with reasoning-token integration
- **Special Tokens:** `<think>`, `<context>`, `<answer>`, `<end>` (registered as sketched below)
- **Training Loss:** 0.2130
- **Version:** v1.0-finetuned
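
The card does not include the preprocessing code, but the reasoning-token integration described above is typically done by registering the four tokens as additional special tokens and resizing the embedding matrix. A minimal sketch, assuming the standard Transformers API; the exact procedure used for this model may have differed:

```python
# Illustrative only: the training script is not part of this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

reasoning_tokens = ["<think>", "<context>", "<answer>", "<end>"]
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": reasoning_tokens}
)
if num_added > 0:
    # Grow the embedding matrix so the new token ids have trainable rows.
    model.resize_token_embeddings(len(tokenizer))
```
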
## What Makes This Model Special

This model extends Qwen2-1.5B-Instruct with:

1. **Custom reasoning tokens** for structured thought processes
2. **Step-by-step explanation** capabilities
3. **GSM8K-style notation** support (`<<calculation=result>>`), illustrated in the sketch after this list
4. **Chain-of-thought reasoning** integration
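
The calculator notation makes the model's intermediate arithmetic easy to machine-check. The helper below is a minimal sketch (not part of the model's code): it pulls every `<<expression=result>>` span out of a generated string and verifies it; the regex and function name are illustrative assumptions.

```python
import re

ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>]+)>>")

def check_calculator_annotations(text: str) -> list[tuple[str, bool]]:
    """Return (annotation, is_correct) for each <<expr=result>> span in text."""
    results = []
    for expr, claimed in ANNOTATION.findall(text):
        try:
            # GSM8K annotations are plain arithmetic, so a restricted eval is
            # enough for a sketch; a production checker should parse safely.
            ok = abs(eval(expr, {"__builtins__": {}}) - float(claimed)) < 1e-6
        except Exception:
            ok = False
        results.append((f"<<{expr}={claimed}>>", ok))
    return results

print(check_calculator_annotations("25 + 17 = <<25+17=42>>42"))
# [('<<25+17=42>>', True)]
```
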
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ziadrone/aries-1.5b-reasoning",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ziadrone/aries-1.5b-reasoning")

# Prompts start with the <think> reasoning token.
prompt = "<think> What is 25 + 17?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Keep special tokens so the reasoning markup stays visible in the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
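
To recover just the final answer, you can slice out the text between `<answer>` and `<end>`. A minimal sketch, assuming the model emits both tokens (it may not always do so):

```python
import re

def extract_answer(generated: str) -> str | None:
    """Return the text between <answer> and <end>, or None if it is absent."""
    match = re.search(r"<answer>(.*?)<end>", generated, flags=re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer("It goes 180 miles because 60 times 3 is 180. <answer> 180 <end>"))
# 180
```
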
## Example Outputs

**Addition:**
```
Input: <think> What is 25 + 17?
Output: The answer to 25 + 17 is 42.
Explanation: We add the two numbers together.
25 + 17 = <<25+17=42>>42
```

**Word Problems:**
```
Input: <think> If I have $50 and spend $23, how much is left?
Output: You are left with 27 dollars.
If you have $50 and spend $23, then the amount left will be
$50 - $23 = <<50-23=27>>27
```

**Distance Problems:**
```
Input: <think> A train travels 60 mph for 3 hours. How far does it go?
Output: It goes 180 miles because 60 times 3 is 180.
<answer> 180 <end>
```

## Training Details

- **Dataset:** GSM8K (1,500 training examples)
- **Epochs:** 2
- **Batch Size:** 1 per device × 32 gradient-accumulation steps (effective batch size 32)
- **Learning Rate:** 3e-5 with cosine schedule and warmup (see the configuration sketch below)
- **Optimizer:** AdamW with CPU offloading (memory efficient)
- **Training Time:** ~42 minutes on a single GPU
- **Hardware:** NVIDIA GPU with 24 GB VRAM
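
The card does not say which training loop was used. As an illustration only, here is how the hyperparameters above would look if expressed as Hugging Face `TrainingArguments`; the `output_dir` and `warmup_ratio` values are assumptions, since the card only says "warmup".

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; not the original script.
training_args = TrainingArguments(
    output_dir="aries-1.5b-reasoning",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,     # effective batch size 32
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                  # assumption: the card only says "warmup"
    bf16=True,
    gradient_checkpointing=True,
)
```
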
## Training Strategy

The model was trained using a memory-efficient approach:

- **CPU-offloaded optimizer states** (saved ~6 GB of GPU memory)
- **Gradient checkpointing** enabled
- **Mixed precision** (BF16)
- **Custom learning rate scheduler** with warmup (see the sketch after this list)
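
The card does not name the mechanism behind the CPU offloading (DeepSpeed ZeRO-Offload and paged/offloaded optimizers are common choices). The remaining pieces can be sketched with stock PyTorch and Transformers calls; the step counts below are illustrative, not taken from the card.

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype=torch.bfloat16,        # BF16 precision
)
model.gradient_checkpointing_enable()  # trade recompute for activation memory

# CPU offloading of optimizer states is not shown here, since the card does not
# say how it was implemented.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Cosine decay with warmup over the full run.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=94,  # ~1,500 examples / effective batch 32, 2 epochs
)
```
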
## Roadmap

- **v1.0** (Current): Fine-tuned on GSM8K
- **v2.0** (Coming): Knowledge distillation for improved performance
- **v3.0** (Planned): Extension to the MATH and MMLU datasets

## License

Apache 2.0

## Credits

- **Base Model:** Qwen Team (Qwen2-1.5B-Instruct)
- **Reasoning Framework:** ARIES (Autonomous Reasoning Improvement via Ensembling Systems)
- **Training Dataset:** OpenAI GSM8K
- **Framework:** Hugging Face Transformers

## Contact

For questions or collaborations: [Your contact]