Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ Online Reinforcement learning using GRPO full parameter on warmup reasoning SFT
 
 ## Improvement
 
-1. Improve reasoning on Dialects, each datapoint been replicated to
+1. Improve reasoning on Dialects, each datapoint been replicated to 6 generations.
 2. Actual online reinforcement learning.
 
 ## Better performance