Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published 5 days ago • 24
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published 11 days ago • 15
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published 10 days ago • 77
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 14 days ago • 166
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published 14 days ago • 25
BroRL: Scaling Reinforcement Learning via Broadened Exploration Paper • 2510.01180 • Published 26 days ago • 17
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published 28 days ago • 133
Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 1.14k
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL Paper • 2508.07976 • Published Aug 11 • 51