Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models Paper • 2510.21978 • Published Oct 24 • 14
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9 • 41
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8 • 30
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24