Xuerui2312/DeepSeek-R1-Distill-Qwen-7B-TRPA-DeepScaleR-verl0326 Text Generation • 8B • Updated Jun 20 • 3 • 1
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc Text Generation • 8B • Updated Jun 15 • 3
hdong0/deepseek-Qwen2.5-1.5B-baseline-Open-R1-GRPO_deepscaler_mu_8 Text Generation • 2B • Updated Jul 4 • 3
hdong0/Qwen2.5-Math-1.5B-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Jul 7 • 3
hdong0/deepseek-Qwen-1.5B-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Jul 7 • 7
hdong0/Qwen2.5-Math-1.5B-baseline-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Jul 8 • 3