---
library_name: transformers
license: other
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: s1.1_qwq_ds
  results: []
---

# S1.1-QwQ-DS

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) on the [S1.1-QwQ](https://huggingface.co/datasets/BitStarWalkin/S1.1-QwQ) dataset. It achieves state-of-the-art reasoning performance on challenging benchmarks including AIME2024/2025, MATH500, and GPQA-Diamond.

## Training and evaluation data

We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) with 8× A100-SXM4-80GB GPUs to conduct full-parameter fine-tuning on our self-curated S1.1-QwQ dataset, a refined version of the [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) dataset. We use [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) to generate a reasoning trajectory for each problem in s1K-1.1 (see the illustrative sketch after the benchmark results below). Our experiments show that the QwQ-generated trajectories are of higher quality than those in the original dataset, which were produced by Gemini-2.0-flash-thinking and DeepSeek-R1.

Dataset: [S1.1-QwQ](https://huggingface.co/datasets/BitStarWalkin/S1.1-QwQ)

Below we present the evaluation results of our S1.1-QwQ-DS-32B and S1.1-QwQ-Qwen-32B models on challenging reasoning tasks: AIME2024, AIME2025, MATH500, and GPQA-Diamond.

| Model | Model Size | AIME2024 | AIME2025 | MATH500 | GPQA |
|------------------------------------|-----|----------|----------|----------|----------|
| Qwen2.5-Instruct                   | 32B | 16.7     | 26.7     | 84.2     | 48.5     |
| +S1-1k (Gemini-2.0-flash-thinking) | 32B | 56.7     | 26.7     | 93.0     | 59.6     |
| +S1.1-32B (R1)                     | 32B | 56.7     | 60.0     | 95.4     | 63.6     |
| S1.1-QwQ-Qwen-32B (Ours)           | 32B | 66.7     | 60.0     | 95.8     | 64.7     |
| S1.1-QwQ-DS-32B (Ours)             | 32B | **83.3** | **73.3** | **96.4** | **66.7** |

Compared to other versions of the s1K-1.1 dataset, our newly curated dataset yields superior performance gains on both [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) across all benchmarks.
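For reference, here is a minimal sketch of the trajectory-generation step described above, assuming vLLM is used to run QwQ-32B over the s1K-1.1 problems. This is an illustration rather than our exact curation pipeline: the dataset column name (`question`), the sampling settings, and the output record format are assumptions.

```python
# Illustrative sketch of trajectory generation with QwQ-32B (not the exact curation script).
from datasets import load_dataset
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

teacher_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
llm = LLM(teacher_id, tensor_parallel_size=8)

dataset = load_dataset("simplescaling/s1K-1.1", split="train")

# Build chat prompts with the teacher model's template.
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": example["question"]}],  # column name assumed
        tokenize=False,
        add_generation_prompt=True,
    )
    for example in dataset
]

# Sampling settings are assumptions; a large max_tokens leaves room for full reasoning traces.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)
outputs = llm.generate(prompts, sampling_params)

# Pair each problem with its QwQ-generated trajectory to form the SFT records.
sft_records = [
    {"instruction": example["question"], "output": output.outputs[0].text}
    for example, output in zip(dataset, outputs)
]
```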
We also compare our results with more open-source reasoning LLMs:

| Category            | Model              | Model Size | AIME 2024 | AIME 2025 | MATH500  | GPQA     |
|---------------------|--------------------|------------|-----------|-----------|----------|----------|
| Industrial Models   | QwQ                | 32B        | 80.0      | 60.0      | 97.6     | 68.2     |
|                     | DeepSeek-R1        | 671B       | 79.8      | -         | 97.3     | 71.5     |
| Open-Sourced Models | Qwen2.5-Instruct   | 32B        | 16.7      | 26.7      | 84.2     | 48.5     |
|                     | R1-Distill-Qwen2.5 | 7B         | 50.0      | 40.0      | 92.6     | 47.0     |
|                     | R1-Distill-Qwen2.5 | 14B        | 60.0      | 26.7      | 92.0     | 52.0     |
|                     | R1-Distill-Qwen2.5 | 32B        | 70.0      | 46.7      | 92.0     | 59.6     |
|                     | OpenThinker        | 32B        | 63.3      | 46.7      | 94.8     | 60.1     |
|                     | FuseO1-Preview     | 32B        | 76.7      | 40.0      | 93.4     | 59.1     |
|                     | Tiny-R1            | 32B        | 76.7      | 53.3      | 95.4     | -        |
|                     | Light-R1           | 32B        | 78.1      | 65.9      | 96.2     | 68.0     |
|                     | EXAONE-Deep        | 32B        | 70.0      | 60.0      | 96.2     | 64.6     |
|                     | LIMO               | 32B        | 56.7      | 33.3      | 92.2     | 58.8     |
| Our Model           | S1.1-QwQ-DS        | 32B        | **83.3**  | **73.3**  | **96.4** | **66.7** |

**We provide our evaluation results in the `eval_result` folder.**

## Quick start with vLLM

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'BitStarWalkin/S1.1-QwQ-DS'

# Load the model across 8 GPUs with tensor parallelism.
model = LLM(
    model_id,
    tensor_parallel_size=8,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Allow long generations so the reasoning trace is not truncated.
sampling_params = SamplingParams(
    max_tokens=16384,
)

question = r"""Let \(x, y\), and \(z\) be positive real numbers satisfying the system of equations:
\[
\begin{array}{c}
\sqrt{2 x-x y}+\sqrt{2 y-x y}=1 \\
\sqrt{2 y-y z}+\sqrt{2 z-y z}=\sqrt{2} \\
\sqrt{2 z-z x}+\sqrt{2 x-z x}=\sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""

# DeepSeek-R1-Distill models use the <|User|>/<|Assistant|> chat format.
ds_prompt = "<|User|>\n" + question + "<|Assistant|>\n"

output = model.generate(ds_prompt, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```

## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 64
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 5.0

An illustrative mapping of these settings onto a `transformers` training configuration is sketched at the end of this card.

### Framework versions

- Transformers 4.49.0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
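As a reference, the hyperparameters listed above map roughly onto a standard `transformers` `TrainingArguments` object as sketched below. This is an illustration only, since the actual run was launched through LLaMA-Factory; `output_dir` and the `bf16` precision flag are assumptions.

```python
# Illustrative mapping of the hyperparameters above (actual training used LLaMA-Factory).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="s1.1_qwq_ds",        # assumed output directory
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 GPUs x 1 per device x 2 steps = total train batch size 16
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumed precision for A100 training
)
```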