Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 95
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 187
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs Paper • 2505.13529 • Published May 18, 2025 • 11
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 106
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 99
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 85
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024 • 12
Prompt-Driven LLM Safeguarding via Directed Representation Optimization Paper • 2401.18018 • Published Jan 31, 2024 • 1
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation Paper • 2208.08845 • Published Aug 18, 2022
PsyQA: A Chinese Dataset for Generating Long Counseling Text for Mental Health Support Paper • 2106.01702 • Published Jun 3, 2021
On Large Language Models' Selection Bias in Multi-Choice Questions Paper • 2309.03882 • Published Sep 7, 2023