A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published Oct 2 • 18
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published Sep 26 • 20
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 73
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26 • 25