Research model
Small-scale training on a Kaggle T4 GPU
Model evaluation on ACEBench
Compare 3 models: Cery (SFT), Cery-M (GRPO), Cery-High (SFT + GRPO)

Details:
- Fix the chat template for instruct generation.
- GRPO training process (focused on tool calling).
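
A minimal sketch of how a tool-calling reward could feed GRPO's group-relative advantage. The reward rule (1.0 / 0.5 / 0.0), the JSON call format, and the names `tool_call_reward` and `group_relative_advantages` are illustrative assumptions, not the project's actual reward. The population standard deviation is also an assumption; some implementations use the sample standard deviation.

```python
import json
import statistics


def tool_call_reward(completion: str, expected_tool: str) -> float:
    """Hypothetical reward for one sampled completion.

    1.0 -> valid JSON object that calls the expected tool
    0.5 -> valid JSON object with a "name" field, but a different tool
    0.0 -> anything else (not JSON, or no "name" field)
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(call, dict) or "name" not in call:
        return 0.0
    return 1.0 if call["name"] == expected_tool else 0.5


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    GRPO scores each completion relative to its group:
    advantage = (reward - group mean) / (group std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, three sampled completions (made-up examples).
completions = [
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}',
    '{"name": "search", "arguments": {}}',
    "not json",
]
rewards = [tool_call_reward(c, "get_weather") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that call the right tool get a positive advantage and malformed ones a negative advantage, so the policy update pushes toward well-formed calls without needing a separate value model.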
LoRA config:
- rank: 16
- alpha: 32
- epochs: 1
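
For reference, the settings above can be kept in one small object. This is a sketch using only the standard library; the class name `LoraSettings` is hypothetical. If the Hugging Face `peft` library is used, rank and alpha would typically map to the `r` and `lora_alpha` arguments of `LoraConfig`, though that mapping is an assumption about this project's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the values listed in these notes."""

    rank: int = 16    # LoRA rank
    alpha: int = 32   # LoRA alpha
    epochs: int = 1   # number of training epochs

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.alpha / self.rank


settings = LoraSettings()
print(settings.scaling)  # 2.0
```

With alpha set to twice the rank, the adapter update is scaled by 2.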