Qwen3-0.6B-DPO
Model Card for Qwen3-0.6B-DPO
This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained using Direct Preference Optimization (DPO) on a preference-formatted version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.
Model Details
Model Description
This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using DPO for preference optimization.
The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.
Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
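As a rough illustration of the memory-saving setup mentioned above, the sketch below assumes training was run with the TRL library's `DPOConfig`. The repository folder is named DPOTrainer, but this card does not confirm the exact setup. Only `gradient_checkpointing` comes from this card; every other value, including the output path and `beta`, is a hypothetical placeholder.

```python
# Hypothetical settings for illustration only. The card does not state the
# hyperparameters that were actually used.
ASSUMED_DPO_SETTINGS = {
    "output_dir": "qwen3-0.6b-dpo",       # assumed output path
    "beta": 0.1,                          # assumed DPO temperature
    "gradient_checkpointing": True,       # the one setting named in this card
    "per_device_train_batch_size": 1,     # assumed: small batches to save memory
    "gradient_accumulation_steps": 8,     # assumed: recovers a larger effective batch
    "bf16": True,                         # assumed: needs bf16-capable hardware
}


def build_dpo_config():
    """Build a TRL DPOConfig from the assumed settings above."""
    # Imported lazily so the sketch can be loaded without TRL installed.
    from trl import DPOConfig

    return DPOConfig(**ASSUMED_DPO_SETTINGS)
```

Gradient checkpointing trades compute for memory: activations are recomputed during the backward pass instead of being stored, which helps because DPO needs log-probabilities from both the policy and a reference model.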
Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans
Model type: Causal decoder-only Transformer (LLM)
Languages: English
License: MIT
Fine-tuned from: Qwen/Qwen3-0.6B
Training Method: Direct Preference Optimization (DPO)
Intended Use: Research on model diffing, preference fine-tuning, and evaluation of behavior changes in lightweight LLMs.
Model Sources
- Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/DPOTrainer
- DPO Paper: https://arxiv.org/abs/2305.18290
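For readers new to the method, here is a minimal sketch of the DPO objective from the paper linked above, written for a single preference pair. It assumes the inputs are summed sequence log-probabilities of the chosen and rejected responses under the policy and the frozen reference model. The default `beta=0.1` is an illustrative assumption, not a value reported in this card.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair of sequence log-probabilities."""
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls as the policy raises the chosen response's likelihood relative to the rejected one.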
Training Details
Training Data
The training data comes from Jennny/helpsteer2-helpfulness-preference. Thanks to Jennny for sharing it.
Evaluation
Below is a comparison between the base Qwen3-0.6B model and our DPO-trained version (trained using HelpSteer2 preference data).
Evaluation Results
The model was evaluated using lm-eval-harness on multiple reasoning and truthfulness benchmarks.
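A run of this kind could be reproduced roughly as follows. This is a sketch that assumes lm-evaluation-harness 0.4 or later and its `simple_evaluate` entry point. The few-shot settings and batch size used for this card are not stated, so the harness defaults and `batch_size=8` here are assumptions.

```python
# Harness task names for the benchmarks reported in this card.
TASKS = ["arc_challenge", "arc_easy", "hellaswag", "truthfulqa_mc2", "winogrande"]


def evaluate(model_id, batch_size=8):
    """Run the benchmarks for one Hugging Face model ID and return per-task results."""
    # Imported lazily so the sketch can be loaded without the harness installed.
    import lm_eval

    output = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=TASKS,
        batch_size=batch_size,  # assumed value; not given in this card
    )
    return output["results"]
```

Calling `evaluate("Qwen/Qwen3-0.6B")` would score the base model. The same call with the DPO checkpoint's path or hub ID would score the fine-tuned model.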
Benchmark Comparison
| Task | Metric | Base Model | DPO Model | Change |
|---|---|---|---|---|
| ARC-Challenge | acc | 0.3148 | 0.3208 | +0.0060 |
| ARC-Challenge | acc_norm | 0.3447 | 0.3430 | -0.0017 |
| ARC-Easy | acc | 0.6044 | 0.6069 | +0.0025 |
| ARC-Easy | acc_norm | 0.5589 | 0.5610 | +0.0021 |
| HellaSwag | acc | 0.3751 | 0.3782 | +0.0031 |
| HellaSwag | acc_norm | 0.4738 | 0.4799 | +0.0061 |
| TruthfulQA MC2 | acc | 0.4275 | 0.4335 | +0.0060 |
| Winogrande | acc | 0.5604 | 0.5627 | +0.0023 |
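The Change column is the DPO score minus the base score. The short sketch below recomputes it from the values in the table above. The names `SCORES` and `score_changes` are illustrative only.

```python
# (task, metric) -> (base score, DPO score), copied from the table above.
SCORES = {
    ("ARC-Challenge", "acc"): (0.3148, 0.3208),
    ("ARC-Challenge", "acc_norm"): (0.3447, 0.3430),
    ("ARC-Easy", "acc"): (0.6044, 0.6069),
    ("ARC-Easy", "acc_norm"): (0.5589, 0.5610),
    ("HellaSwag", "acc"): (0.3751, 0.3782),
    ("HellaSwag", "acc_norm"): (0.4738, 0.4799),
    ("TruthfulQA MC2", "acc"): (0.4275, 0.4335),
    ("Winogrande", "acc"): (0.5604, 0.5627),
}


def score_changes(scores):
    """Return DPO minus base for each (task, metric), rounded to 4 decimals."""
    return {key: round(dpo - base, 4) for key, (base, dpo) in scores.items()}


if __name__ == "__main__":
    for (task, metric), delta in score_changes(SCORES).items():
        print(f"{task:15s} {metric:9s} {delta:+.4f}")
```

Every change is smaller than one percentage point in absolute terms. ARC-Challenge acc_norm is the only metric that decreases.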
Model Card Authors
Jithesh Pavan D Souza, AIPlans Research Intern
Model Card Contact
Jithesh: [email protected]