huseinzol05 commited on
Commit
57b5710
·
verified ·
1 Parent(s): b5afa92

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -0
README.md CHANGED
@@ -17,6 +17,10 @@ Online Reinforcement learning using GRPO full parameter on warmup reasoning SFT
17
  1. Improve reasoning on Dialects, each datapoint been replicated to 12 generations.
18
  2. Actual online reinforcement learning.
19
 
 
 
 
 
20
  ## Training session
21
 
22
  Finetune on [huseinzol05/malaysian-dialect-qa](https://huggingface.co/datasets/huseinzol05/malaysian-dialect-qa), this is train set from [mesolitica/Malay-Dialect-Reasoning](https://huggingface.co/datasets/mesolitica/Malay-Dialect-Reasoning).
 
17
  1. Improve reasoning on Dialects, each datapoint been replicated to 12 generations.
18
  2. Actual online reinforcement learning.
19
 
20
+ ## Better performance
21
+
22
+ To get better performance, use system prompt `You are going to enter reasoning mode. First, you try to think step-by-step in Malay. After that, put your final answer within $\\boxed{}$.`, you can check how we trained it at https://github.com/mesolitica/malaya/blob/master/session/qwen2.5/grpo.py#L80
23
+
24
  ## Training session
25
 
26
  Finetune on [huseinzol05/malaysian-dialect-qa](https://huggingface.co/datasets/huseinzol05/malaysian-dialect-qa), this is train set from [mesolitica/Malay-Dialect-Reasoning](https://huggingface.co/datasets/mesolitica/Malay-Dialect-Reasoning).