Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Submitted by haizhongzheng
Carnegie Mellon University