Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 5 days ago • 40
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory Paper • 2606.06523 • Published 12 days ago • 6
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents Paper • 2606.05597 • Published 10 days ago • 4
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 13 days ago • 13
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 13 days ago • 228
CorrectKLinRL/Qwen3-1.7B-Base-prlCurrentKL-eta100-forward_k3-clipLow_inf-clipHigh_inf 2B • Updated 27 days ago • 60
CorrectKLinRL/Qwen3-1.7B-Base-prlCurrentKL-eta100-reverse_k3-clipLow_inf-clipHigh_inf 2B • Updated 27 days ago • 18