Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Paper
• 2505.04842
• Published
• 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
• 2505.04588
• Published
• 65
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Paper
• 2504.21776
• Published
• 59
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Paper
• 2505.01441
• Published
• 39
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Paper
• 2504.20708
• Published
• 23
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published
• 139
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
Paper
• 2504.20157
• Published
• 37
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Paper
• 2504.04718
• Published
• 43
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
RM-R1: Reward Modeling as Reasoning
Paper
• 2505.02387
• Published
• 81
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Paper
• 2505.02847
• Published
• 29
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Paper
• 2505.00234
• Published
• 26
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
• Published
• 88