PRM-CoT/Qwen2.5-Math-7B-numina-prm_advorm-n5-eta300-stepSplit-nn-step600 8B • Updated 22 days ago • 28
PRM-CoT/Qwen2.5-Math-7B-numina-prm_advorm-n5-eta300-stepSplit-nn-step500 8B • Updated 22 days ago • 10
PRM-CoT/Qwen2.5-Math-1.5B-numina-prm_advorm-n5-eta300-stepSplit-nn-step550 2B • Updated 26 days ago • 41
PRM-CoT/Qwen2.5-Math-1.5B-numina-prm_advorm-n5-eta300-stepSplit-nn-step500 2B • Updated 26 days ago • 11
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published Oct 13 • 25
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Paper • 2510.13759 • Published Oct 15 • 9
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Paper • 2510.12693 • Published Oct 14 • 26
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7 • 139
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published Oct 6 • 15