hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Jul 9 • 3
hdong0/Qwen2.5-Math-1.5B-untied-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Aug 5 • 1
hdong0/deepseek-Qwen2.5-7B-baseline-thin-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Aug 10 • 1
hdong0/deepseek-Llama-8B-baseline-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Aug 13 • 1
hdong0/deepseek-Qwen2.5-1.5B-baseline-thin-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Aug 17 • 1
hdong0/deepseek-Qwen-1.5B-baseline-thin-Open-R1-GRPO_deepscaler_mu_8_constant_lr_warmed Text Generation • 2B • Updated Aug 19 • 18
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_no_kl Text Generation • 8B • Updated Aug 20 • 25