INFLogic-Qwen2.5-32B-RL-Preview
Model Overview
- INFLogic-Qwen2.5-32B-RL-Preview enhances the reasoning capabilities of DeepSeek-R1-Distill-Qwen-32B through fine-tuning on our proprietary logical reasoning dataset using reinforcement learning with verifiable rewards (RLVR); a sketch of the verifiable-reward idea follows this list.
- As of March 27, 2025, this model achieves state-of-the-art performance among open-source LLMs on ZebraLogicBench, demonstrating enhanced logical reasoning abilities.
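In RLVR, a deterministic program, rather than a learned reward model, checks each response against ground truth and grants reward only on a verified match. The snippet below is a minimal sketch of that idea for grid-style logic puzzles; the `<answer>`-tag output format, the `verifiable_reward` helper, and the JSON solution encoding are illustrative assumptions, since the actual proprietary dataset and reward implementation are not described here.

```python
import json
import re

def verifiable_reward(response: str, gold_solution: dict) -> float:
    """Binary RLVR-style reward: 1.0 iff the model's final answer exactly
    matches the ground-truth puzzle solution, else 0.0.

    Hypothetical convention for illustration: the model is prompted to
    emit its final solution as JSON inside <answer> tags.
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer, no reward
    try:
        predicted = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.0  # malformed answers also earn nothing
    # Exact match keeps the reward verifiable: a deterministic check,
    # no learned judge, no partial credit.
    return 1.0 if predicted == gold_solution else 0.0
```

Because the reward is a program rather than a model, it is cheap to compute and hard to game, which is what makes RLVR a natural fit for logic puzzles with a single checkable solution.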
Evaluation Results
| Model | MATH-500 | ZebraLogic | GPQA | 
|---|---|---|---|
| INFLogic-Qwen2.5-32B-RL-Preview | 95.6 | 85.1 | 65.7 | 
| DeepSeek-R1-Distill-Qwen-32B | 94.3 | 68.7 | 62.1 | 
| DeepSeek-R1 | 96.2 | 77.2 | 78.9 | 
| OpenAI o1 | 96.4 | 87.9 | 85.2 | 
Detailed ZebraLogic Results
| Metric | Value | 
|---|---|
| Overall puzzle accuracy | 0.851 | 
| Small puzzle accuracy | 0.982 | 
| Medium puzzle accuracy | 0.969 | 
| Large puzzle accuracy | 0.848 | 
| XL puzzle accuracy | 0.480 | 
| Total puzzles | 1000 | 
| Sampling mode (N_Mode) | single | 
| Samples per puzzle (N_Size) | 1 | 
| Avg. reasoning length (Reason Lens) | 559.9 | 
We report pass@1 scores obtained with vLLM 0.5.3 (temperature=0.6, top_p=0.95). For MATH-500 and GPQA we used Open R1's evaluation scripts; results for the other models are taken from their original reports.
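For reference, the following sketch shows how the sampling setup above can be reproduced with vLLM's offline API. The sampling parameters come from the evaluation settings reported here; the prompt and the `max_tokens` value are placeholders, not part of the reported configuration.

```python
from vllm import LLM, SamplingParams

# Sampling settings from the evaluation above; max_tokens is an assumption.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

llm = LLM(model="infly/INFLogic-Qwen2.5-32B-RL-Preview")

# Placeholder prompt: a tiny Zebra-style puzzle for illustration.
prompt = (
    "Two houses stand in a row. Alice and Bob each live in one house. "
    "Alice does not live in house 1. Who lives in house 1? "
    "Think step by step, then state the answer."
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```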
Contributors
Supervisors
Wei Chu • Yuan Qi
Logic Team
Cheng Peng • Shuyao Xu • Weidi Xu
Acknowledgments
We thank Chao Qu, Haozhe Wang, Jiaran Hao, and Liuyihan Song for their valuable discussions and support.
Citation
If you find our model useful, please consider citing:
@misc{INFLogic_RL_Preview,
  author       = {Peng, Cheng and Xu, Shuyao and Xu, Weidi and Chu, Wei and Qi, Yuan},
  title        = {INFLogic-Qwen2.5-32B-RL-Preview},
  year         = {2025},
  month        = {March},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/infly/INFLogic-Qwen2.5-32B-RL-Preview},
}