papers-to-read - a esthor Collection

esthor 's Collections

TTS

papers-to-read

updated Sep 21

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 236
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 257
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 153
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 120
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 121
4KAgent: Agentic Any Image to 4K Super-Resolution

Paper • 2507.07105 • Published Jul 9 • 104
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30 • 98
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 236
DINOv3

Paper • 2508.10104 • Published Aug 13 • 274
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 186
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 177
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10 • 673
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25 • 340
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 218
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 183
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published Aug 28 • 140
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 189
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 98
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published Sep 16 • 104
Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 112
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published Sep 8 • 78
Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 73
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 71
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

Paper • 2509.06806 • Published Sep 8 • 63
FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 110