Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning Paper • 2509.13755 • Published Sep 17
Writing-Zero: Bridge the Gap Between Non-verifiable Problems and Verifiable Rewards Paper • 2506.00103 • Published May 30