---
library_name: transformers
license: other
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: s1.1_qwq_ds
  results: []
---

# S1.1-QwQ-DS

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) on the [S1.1-QwQ](https://huggingface.co/datasets/BitStarWalkin/S1.1-QwQ) dataset. It achieves state-of-the-art reasoning performance on challenging benchmarks including AIME2024/2025, MATH500, and GPQA-Diamond.

## Training and evaluation data

We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) with 8× A100-SXM4-80GB GPUs to conduct full-parameter fine-tuning on our self-curated S1.1-QwQ dataset, a refined version of the [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) dataset. We use [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) to generate a reasoning trajectory for each problem in s1K-1.1 (see the illustrative sketch after the benchmark results below). Our experiments show that the QwQ-generated trajectories are of higher quality than those in the original dataset, which were produced by Gemini-2.0-flash-thinking and DeepSeek-R1.

Dataset: [S1.1-QwQ](https://huggingface.co/datasets/BitStarWalkin/S1.1-QwQ)

Below we present the evaluation results of our S1.1-QwQ-DS-32B and S1.1-QwQ-Qwen-32B models on challenging reasoning tasks: AIME2024, AIME2025, MATH500, and GPQA-Diamond.

| Model | Model Size | AIME2024 | AIME2025 | MATH500 | GPQA |
|------------------------------------|-----|----------|----------|----------|----------|
| Qwen2.5-Instruct                   | 32B | 16.7     | 26.7     | 84.2     | 48.5     |
| +S1-1k (Gemini-2.0-flash-thinking) | 32B | 56.7     | 26.7     | 93.0     | 59.6     |
| +S1.1-32B (R1)                     | 32B | 56.7     | 60.0     | 95.4     | 63.6     |
| S1.1-QwQ-Qwen-32B (Ours)           | 32B | 66.7     | 60.0     | 95.8     | 64.7     |
| S1.1-QwQ-DS-32B (Ours)             | 32B | **83.3** | **73.3** | **96.4** | **66.7** |

Compared to other versions of the s1K-1.1 dataset, our newly curated dataset yields superior performance gains on both [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) across all benchmarks.
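For reference, here is a minimal sketch of the trajectory-generation step described above, assuming vLLM is used to run QwQ-32B over the s1K-1.1 problems. This is an illustration rather than our exact curation pipeline: the dataset column name (`question`), the sampling settings, and the output record format are assumptions.

```python
# Illustrative sketch of trajectory generation with QwQ-32B (not the exact curation script).
from datasets import load_dataset
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

teacher_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
llm = LLM(teacher_id, tensor_parallel_size=8)

dataset = load_dataset("simplescaling/s1K-1.1", split="train")

# Build chat prompts with the teacher model's template.
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": example["question"]}],  # column name assumed
        tokenize=False,
        add_generation_prompt=True,
    )
    for example in dataset
]

# Sampling settings are assumptions; a large max_tokens leaves room for full reasoning traces.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)
outputs = llm.generate(prompts, sampling_params)

# Pair each problem with its QwQ-generated trajectory to form the SFT records.
sft_records = [
    {"instruction": example["question"], "output": output.outputs[0].text}
    for example, output in zip(dataset, outputs)
]
```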
We also compare our results with more open-source reasoning LLMs:

| Category            | Model              | Model Size | AIME 2024 | AIME 2025 | MATH500  | GPQA     |
|---------------------|--------------------|------------|-----------|-----------|----------|----------|
| Industrial Models   | QwQ                | 32B        | 80.0      | 60.0      | 97.6     | 68.2     |
|                     | DeepSeek-R1        | 671B       | 79.8      | -         | 97.3     | 71.5     |
| Open-Sourced Models | Qwen2.5-Instruct   | 32B        | 16.7      | 26.7      | 84.2     | 48.5     |
|                     | R1-Distill-Qwen2.5 | 7B         | 50.0      | 40.0      | 92.6     | 47.0     |
|                     | R1-Distill-Qwen2.5 | 14B        | 60.0      | 26.7      | 92.0     | 52.0     |
|                     | R1-Distill-Qwen2.5 | 32B        | 70.0      | 46.7      | 92.0     | 59.6     |
|                     | OpenThinker        | 32B        | 63.3      | 46.7      | 94.8     | 60.1     |
|                     | FuseO1-Preview     | 32B        | 76.7      | 40.0      | 93.4     | 59.1     |
|                     | Tiny-R1            | 32B        | 76.7      | 53.3      | 95.4     | -        |
|                     | Light-R1           | 32B        | 78.1      | 65.9      | 96.2     | 68.0     |
|                     | EXAONE-Deep        | 32B        | 70.0      | 60.0      | 96.2     | 64.6     |
|                     | LIMO               | 32B        | 56.7      | 33.3      | 92.2     | 58.8     |
| Our Model           | S1.1-QwQ-DS        | 32B        | **83.3**  | **73.3**  | **96.4** | **66.7** |

**We provide our evaluation results in the `eval_result` folder.**

## Quick start with vLLM

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'BitStarWalkin/S1.1-QwQ-DS'

# Load the model across 8 GPUs with tensor parallelism.
model = LLM(
    model_id,
    tensor_parallel_size=8,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Allow long generations so the reasoning trace is not truncated.
sampling_params = SamplingParams(
    max_tokens=16384,
)

question = r"""Let \(x, y\), and \(z\) be positive real numbers satisfying the system of equations:
\[
\begin{array}{c}
\sqrt{2 x-x y}+\sqrt{2 y-x y}=1 \\
\sqrt{2 y-y z}+\sqrt{2 z-y z}=\sqrt{2} \\
\sqrt{2 z-z x}+\sqrt{2 x-z x}=\sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""

# DeepSeek-R1-Distill models use the <|User|>/<|Assistant|> chat format.
ds_prompt = "<|User|>\n" + question + "<|Assistant|>\n"

output = model.generate(ds_prompt, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```

## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 64
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 5.0

An illustrative mapping of these settings onto a `transformers` training configuration is sketched at the end of this card.

### Framework versions

- Transformers 4.49.0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
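As a reference, the hyperparameters listed above map roughly onto a standard `transformers` `TrainingArguments` object as sketched below. This is an illustration only, since the actual run was launched through LLaMA-Factory; `output_dir` and the `bf16` precision flag are assumptions.

```python
# Illustrative mapping of the hyperparameters above (actual training used LLaMA-Factory).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="s1.1_qwq_ds",        # assumed output directory
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 GPUs x 1 per device x 2 steps = total train batch size 16
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumed precision for A100 training
)
```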