Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Paper • 2510.19600 • Published 5 days ago • 64
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published 5 days ago • 57
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 10 days ago • 137
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Paper • 2510.01171 • Published 26 days ago • 17
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published 14 days ago • 31
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published 19 days ago • 31
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review Paper • 2510.08867 • Published 17 days ago • 4
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published 25 days ago • 91
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published 28 days ago • 7
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning Paper • 2510.01444 • Published 25 days ago • 19
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published 25 days ago • 26
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 27 days ago • 52
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published about 1 month ago • 20
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published about 1 month ago • 24
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published about 1 month ago • 67