Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published Aug 13 • 15
AdoraRL/Qwen2.5-7B-Instruct-1M-KK-5ppl-100step-ADORA Text Generation • 8B • Updated Apr 3 • 8 • 1