todo - a sean29 Collection

sean29 's Collections

rl

todo

updated Nov 4, 2025

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 509
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29, 2025 • 140
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 176
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 248
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 238
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10, 2025 • 50
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 32