- Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
  Paper • 2505.04842 • Published • 12
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
  Paper • 2505.04588 • Published • 65
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
  Paper • 2504.21776 • Published • 59
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
  Paper • 2505.01441 • Published • 39
Always OU
AlwaysOU
AI & ML interests
None yet
Recent Activity
upvoted a paper 3 months ago
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
upvoted a paper 3 months ago
Memp: Exploring Agent Procedural Memory
Organizations
None yet