SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Paper • 2506.19767 • Published Jun 24, 2025 • 15
pLSTM: parallelizable Linear Source Transition Mark networks Paper • 2506.11997 • Published Jun 13, 2025 • 10
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025 • 141
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published Apr 22, 2025 • 21
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks Paper • 2410.22391 • Published Oct 29, 2024 • 22
Retrieval-Augmented Decision Transformer: External Memory for In-context RL Paper • 2410.07071 • Published Oct 9, 2024 • 7
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9, 2024 • 16