Who's Your Judge? On the Detectability of LLM-Generated Judgments Paper • 2509.25154 • Published Sep 29 • 29
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 236
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21 • 49
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation Paper • 2505.18759 • Published May 24 • 14