Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • Paper • 2510.25992 • Published 3 days ago • 25
The End of Manual Decoding: Towards Truly End-to-End Language Models • Paper • 2510.26697 • Published 2 days ago • 90
SPICE: Self-Play In Corpus Environments Improves Reasoning • Paper • 2510.24684 • Published 4 days ago • 11
VisCoder2: Building Multi-Language Visualization Coding Agents • Paper • 2510.23642 • Published 8 days ago • 20
Reasoning with Sampling: Your Base Model is Smarter Than You Think • Paper • 2510.14901 • Published 16 days ago • 44
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 • Paper • 2510.19600 • Published 10 days ago • 66
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders • Paper • 2510.19779 • Published 10 days ago • 58
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning • Paper • 2510.15444 • Published 16 days ago • 144
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity • Paper • 2510.01171 • Published Oct 1 • 17
Demystifying Reinforcement Learning in Agentic Reasoning • Paper • 2510.11701 • Published 19 days ago • 31
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels • Paper • 2510.06499 • Published 25 days ago • 31
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review • Paper • 2510.08867 • Published 23 days ago • 4
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation • Paper • 2510.02283 • Published about 1 month ago • 91