Correct pipeline tag and add Github link
#1
by nielsr (HF Staff) - opened

README.md CHANGED
```diff
@@ -1,13 +1,14 @@
 ---
+base_model:
+- Qwen/Qwen2.5-Math-7B
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
 tags:
 - reasoning
 - Zero-RL
-license: mit
-base_model:
-- Qwen/Qwen2.5-Math-7B
-pipeline_tag: text-generation
 ---
+
 # 📖Introduction
 
 
@@ -74,9 +75,9 @@ LUFFY also generalizes well to out-of-distribution tasks, with over +6.2 average
 | Qwen2.5-Math-7B-Base | 18.2 | 11.1 | 16.9 | 15.4 |
 | Qwen2.5-Math-7B-Instruct | 70.3 | 24.7 | 34.1 | 43.0 |
 | SimpleRL-Zero | 30.2 | 23.2 | 34.5 | 29.3 |
-| OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
 | PRIME-Zero | 73.3 | 18.2 | 32.7 | 41.4 |
 | Oat-Zero | 70.1 | 23.7 | 41.7 | 45.2 |
+| OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
 | **LUFFY** | _80.5_ | _39.9_ | **53.0** | **57.8** |
 
 ---
@@ -85,6 +86,8 @@ LUFFY also generalizes well to out-of-distribution tasks, with over +6.2 average
 
 LUFFY builds upon [veRL](https://github.com/volcengine/verl) and [deepscaler](https://github.com/agentica-project/rllm), and utilizes [vLLM](https://github.com/vllm-project/vllm) for inference. We utilize [Math-Verify](https://github.com/huggingface/Math-Verify) for math reasoning evaluation. We thank the open-source community for datasets and backbones, including [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT), [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k), [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math), and [DeepSeek-R1](https://github.com/deepseek-ai/deepseek-r1) model.
 
+Code: https://github.com/ElliottYan/LUFFY
+
 # Citation
 If you find our model, data, or evaluation code useful, please kindly cite our paper:
 ```bib
```
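For reference, the `library_name: transformers` and `pipeline_tag: text-generation` fields adjusted above are what the Hub uses to pick the inference widget and the default usage snippet. Below is a minimal sketch of the usage that metadata implies; since the LUFFY checkpoint's repo ID is not shown in this diff, it loads the declared base model `Qwen/Qwen2.5-Math-7B` as a stand-in.

```python
# Minimal sketch of what `library_name: transformers` + `pipeline_tag: text-generation`
# imply for loading this model. The repo ID below is the declared base model used as a
# stand-in; swap in the actual LUFFY checkpoint repo ID.
from transformers import pipeline

model_id = "Qwen/Qwen2.5-Math-7B"  # stand-in; replace with the LUFFY model repo

generator = pipeline(
    "text-generation",   # matches the pipeline_tag set in the card metadata
    model=model_id,
    torch_dtype="auto",  # let transformers choose an appropriate dtype
    device_map="auto",   # place the 7B model on available devices (requires accelerate)
)

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
output = generator(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```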