OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Paper • 2512.14051 • Published 13 days ago • 39
Rethinking Expert Trajectory Utilization in LLM Post-training Paper • 2512.11470 • Published 17 days ago • 7
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale Paper • 2512.10398 • Published 18 days ago • 6
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published 18 days ago • 45
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs Paper • 2512.12072 • Published 17 days ago • 17
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published 12 days ago • 17
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision Paper • 2512.15489 • Published 12 days ago • 6
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM Paper • 2503.17793 • Published Mar 22 • 23
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published Sep 3 • 23
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic Paper • 2509.01363 • Published Sep 1 • 58
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 227
SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge Paper • 2509.07968 • Published Sep 9 • 14
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published Sep 8 • 17
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents Paper • 2509.06917 • Published Sep 8 • 41