Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper
• 2509.25541
• Published
• 140
Agent Learning via Early Experience
Paper
• 2510.08558
• Published
• 273
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published
• 146
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Paper
• 2509.24002
• Published
• 176
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published
• 190
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper
• 2509.09372
• Published
• 248
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published
• 238
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Paper
• 2510.09201
• Published
• 50
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published
• 32