hzy/qwen1.5b-math-base-3-to-5-grpo_std_on-mi300x-3000-drgrpo-len-with-entropy-loss-step-980 2B • Updated Apr 11 • 4
hzy/qwen2.5_1.5b-math-stage2-naive-grpo_std_on-with-large-rollout_with_df_20250328-step-180 2B • Updated Mar 30 • 5
hzy/qwen2.5_1.5b-math-short-0-long-1-restricted-overlong-1024-0.5-len-reward-step-640 2B • Updated Mar 27 • 4
hzy/verl-grpo-math-qwen2.5-1.5b-short-0-long-1-restricted-overlong-1024-step-140 2B • Updated Mar 26 • 5
hzy/verl-grpo-math-qwen2.5-1.5b-short-0-long-1-restricted-overlong-1024-step-120 2B • Updated Mar 26 • 5
hzy/verl-grpo-math-qwen2.5-1.5b-short-0-long-1-restricted-overlong-1024-step-400 2B • Updated Mar 26 • 6
hzy/Qwen2.5-Math-7B-Instruct-PRM-Modified-math_shepherd Token Classification • 7B • Updated Mar 10 • 10