
Qwen3-0.6B-DPO

Model Card for Qwen3-0.6B-DPO

This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained with Direct Preference Optimization (DPO) on a preference-formatted version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.

Model Details

Model Description

This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using DPO for preference optimization.
The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.

Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
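As a rough illustration of this setup, the sketch below uses TRL's DPOTrainer with gradient checkpointing enabled. It is a minimal sketch rather than the exact training script: the hyperparameters are assumptions, the dataset is assumed to expose prompt/chosen/rejected columns, and argument names (e.g. processing_class) follow recent TRL releases.

```python
# Minimal DPO fine-tuning sketch with TRL (hyperparameters are illustrative,
# not the values used to train this model).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed to provide prompt / chosen / rejected preference triples.
train_dataset = load_dataset("Jennny/helpsteer2-helpfulness-preference", split="train")

args = DPOConfig(
    output_dir="qwen3-0.6b-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,   # one of the memory-saving strategies noted above
    learning_rate=5e-7,
    beta=0.1,                      # DPO KL-regularisation strength
    num_train_epochs=1,
    logging_steps=50,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # a frozen reference model is created automatically
)
trainer.train()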

Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans

Model type: Causal decoder-only Transformer (LLM)
Languages: English
License: MIT
Fine-tuned from: Qwen/Qwen3-0.6B
Training Method: Direct Preference Optimization (DPO)
Intended Use: Research on model diffing, preference fine-tuning, evaluation of lightweight LLM behavior changes.
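For the model-diffing use case, a simple starting point is to compare generations from the base model and this DPO variant on the same prompt. The snippet below is a sketch; the prompt is arbitrary and the repository id is taken from the Hub listing (substitute the actual id if it differs).

```python
# Compare base vs. DPO-tuned generations on the same prompt
# (a minimal model-diffing starting point).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "Qwen/Qwen3-0.6B"
DPO_ID = "AIPlans/Qwen3-0.6B-DPO_NOTLORA"  # repository id as listed on the Hub

prompt = "Give three practical tips for writing clear documentation."

for model_id in (BASE_ID, DPO_ID):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        output = model.generate(inputs, max_new_tokens=128, do_sample=False)
    print(f"=== {model_id} ===")
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```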


Training Details

Training Data

The dataset is taken from Jennny/helpsteer2-helpfulness-preference. Thanks to Jennny for preparing it.
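The dataset can be inspected directly from the Hub; the split name and column layout assumed below are illustrative.

```python
from datasets import load_dataset

# Load the preference dataset used for DPO training (split name is an assumption).
ds = load_dataset("Jennny/helpsteer2-helpfulness-preference", split="train")
print(ds)               # number of rows and column names
print(ds.column_names)  # expected to include preference fields such as prompt / chosen / rejected
print(ds[0])            # one preference example
```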

Evaluation


Evaluation Results

The model was evaluated using lm-eval-harness on multiple reasoning and truthfulness benchmarks.
Below is a comparison between the base Qwen3-0.6B model and this DPO-trained model (trained on HelpSteer2 preference data).

📊 Benchmark Comparison

| Task             | Metric   | Base Model | DPO Model | Change  |
|------------------|----------|------------|-----------|---------|
| ARC-Challenge    | acc      | 0.3148     | 0.3208    | +0.0060 |
| ARC-Challenge    | acc_norm | 0.3447     | 0.3430    | −0.0017 |
| ARC-Easy         | acc      | 0.6044     | 0.6069    | +0.0025 |
| ARC-Easy         | acc_norm | 0.5589     | 0.5610    | +0.0021 |
| HellaSwag        | acc      | 0.3751     | 0.3782    | +0.0031 |
| HellaSwag        | acc_norm | 0.4738     | 0.4799    | +0.0061 |
| TruthfulQA (MC2) | acc      | 0.4275     | 0.4335    | +0.0060 |
| Winogrande       | acc      | 0.5604     | 0.5627    | +0.0023 |
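A comparable evaluation can be run with lm-eval-harness. The snippet below is a sketch: the repository id is taken from the Hub listing, while the few-shot settings and batch size are assumptions rather than the exact configuration behind the table above.

```python
# Evaluation sketch with lm-eval-harness (settings are illustrative).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "truthfulqa_mc2", "winogrande"],
    batch_size=8,
)

# Print per-task metrics (accuracy keys vary slightly across lm-eval versions).
for task, metrics in results["results"].items():
    print(task, metrics)
```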

Model Card Authors

Jithesh Pavan D Souza – AIPlans Research Intern

Model Card Contact

Jithesh – [email protected]
