huseinzol05 committed on
Commit 2e1d11d · verified · 1 Parent(s): 57b5710

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ Online Reinforcement learning using GRPO full parameter on warmup reasoning SFT
 
 ## Improvement
 
-1. Improve reasoning on Dialects, each datapoint been replicated to 12 generations.
+1. Improve reasoning on Dialects, each datapoint been replicated to 6 generations.
 2. Actual online reinforcement learning.
 
 ## Better performance