P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published 12 days ago • 128
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published about 1 month ago • 80
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25 • 29
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18 • 52
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 88
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction Paper • 2507.02025 • Published Jul 2 • 35
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 131
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 62
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published Mar 27 • 42
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Paper • 2502.07563 • Published Feb 11 • 24
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 61