LuyiCui committed
Commit aceb7eb · verified · 1 Parent(s): f60faf3

Model save
README.md CHANGED
@@ -1,10 +1,8 @@
 ---
-datasets: AI-MO/NuminaMath-TIR
 library_name: transformers
 model_name: Qwen2.5-1.5B-Open-R1-GRPO
 tags:
 - generated_from_trainer
-- open-r1
 - trl
 - grpo
 licence: license
@@ -12,7 +10,7 @@ licence: license
 
 # Model Card for Qwen2.5-1.5B-Open-R1-GRPO
 
-This model is a fine-tuned version of [None](https://huggingface.co/None) on the [AI-MO/NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset.
+This model is a fine-tuned version of [None](https://huggingface.co/None).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -28,18 +26,18 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/cuiluyi/huggingface/runs/9ncklfe9)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/cuiluyi/huggingface/runs/evidr78o)
 
 
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 
 ### Framework versions
 
-- TRL: 0.15.0.dev0
-- Transformers: 4.49.0.dev0
-- Pytorch: 2.5.1
-- Datasets: 3.2.0
-- Tokenizers: 0.21.0
+- TRL: 0.17.0.dev0
+- Transformers: 4.51.2
+- Pytorch: 2.6.0
+- Datasets: 3.5.0
+- Tokenizers: 0.21.1
 
 ## Citations
 
@@ -60,7 +58,7 @@ Cite TRL as:
 ```bibtex
 @misc{vonwerra2022trl,
     title = {{TRL: Transformer Reinforcement Learning}},
-    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
     year = 2020,
     journal = {GitHub repository},
     publisher = {GitHub},
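The card describes the model as trained with GRPO (from the DeepSeekMath paper linked above). As a loose illustration of the core idea only — not TRL's actual implementation — GRPO samples a group of completions per prompt, scores them with a reward function, and uses group-normalized rewards as advantages. The function name below is hypothetical:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Illustrative GRPO-style advantage: normalize each completion's
    reward against its own group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions scored by a reward function:
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages sum to zero within the group, so better-than-average
# completions are reinforced and worse-than-average ones penalized.
```

Because the baseline is the group mean rather than a learned value function, no separate critic is needed — one reason the method suits RL fine-tuning of language models.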
all_results.json CHANGED
@@ -1,13 +1,8 @@
 {
-    "eval_loss": 0.012912314385175705,
-    "eval_runtime": 196.6453,
-    "eval_samples": 99,
-    "eval_samples_per_second": 0.503,
-    "eval_steps_per_second": 0.015,
     "total_flos": 0.0,
-    "train_loss": 69.57985635299504,
-    "train_runtime": 92793.5148,
-    "train_samples": 20000,
-    "train_samples_per_second": 0.216,
-    "train_steps_per_second": 0.027
+    "train_loss": 0.026552865276898957,
+    "train_runtime": 9272.812,
+    "train_samples": 93733,
+    "train_samples_per_second": 0.043,
+    "train_steps_per_second": 0.005
 }
generation_config.json CHANGED
@@ -10,5 +10,5 @@
   "temperature": 0.7,
   "top_k": 20,
   "top_p": 0.8,
-  "transformers_version": "4.49.0.dev0"
+  "transformers_version": "4.51.2"
 }
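The updated generation_config.json keeps the sampling settings unchanged: temperature=0.7, top_k=20, top_p=0.8. As a rough pure-Python sketch (not the transformers implementation), these three settings filter the next-token distribution roughly as follows; `filter_logits` is a hypothetical helper:

```python
import math

def filter_logits(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Illustrative next-token filtering: temperature scaling, then
    top-k truncation, then nucleus (top-p) truncation. Returns the
    renormalized probabilities of the surviving token ids."""
    scaled = [l / temperature for l in logits]
    z = max(scaled)  # subtract the max for numerical stability
    probs = [math.exp(s - z) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-k: keep at most the k most likely token ids.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Nucleus: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

# Four-token toy vocabulary: only the head of the distribution survives.
dist = filter_logits([2.0, 1.0, 0.1, -1.0])
```

Lowering the temperature below 1 sharpens the distribution before truncation, so top-p tends to keep fewer tokens than it would at temperature 1.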
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "total_flos": 0.0,
-    "train_loss": 69.57985635299504,
-    "train_runtime": 92793.5148,
-    "train_samples": 20000,
-    "train_samples_per_second": 0.216,
-    "train_steps_per_second": 0.027
+    "train_loss": 0.026552865276898957,
+    "train_runtime": 9272.812,
+    "train_samples": 93733,
+    "train_samples_per_second": 0.043,
+    "train_steps_per_second": 0.005
 }
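As a back-of-the-envelope reading of the updated metrics, runtime × throughput recovers the work this run actually covered (note that `train_samples` reports the dataset size, not the number of samples consumed):

```python
# Figures copied from the updated train_results.json above.
train_runtime = 9272.812             # seconds
train_samples_per_second = 0.043
train_steps_per_second = 0.005

samples_processed = train_runtime * train_samples_per_second  # ~398.7
steps_completed = train_runtime * train_steps_per_second      # ~46.4
```

At the reported (rounded) precision that is roughly 46 optimizer steps over ~400 samples, i.e. on the order of 8–9 samples per step — a much shorter run than the 92793-second one it replaces.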
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff