- Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
  Paper • 2602.12036 • Published • 93
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 36
- Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
  Paper • 2512.23705 • Published • 45
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
  Paper • 2512.19995 • Published • 16
Collections
Collections including paper arxiv:2601.19897
- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 100
- Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
  Paper • 2601.19325 • Published • 79
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61
- MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
  Paper • 2601.21821 • Published • 60

- Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
  Paper • 2601.14249 • Published • 13
- Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
  Paper • 2402.07033 • Published • 19
- MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
  Paper • 2601.07251 • Published • 11
- GameTalk: Training LLMs for Strategic Conversation
  Paper • 2601.16276 • Published • 13

- THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
  Paper • 2601.23143 • Published • 38
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 211
- Agentic Reasoning for Large Language Models
  Paper • 2601.12538 • Published • 200
- BabyVision: Visual Reasoning Beyond Language
  Paper • 2601.06521 • Published • 197

- Self-Distillation Enables Continual Learning
  Paper • 2601.19897 • Published • 26
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
  Paper • 2601.21468 • Published • 25
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
  Paper • 2509.23040 • Published • 12

- Behavior Knowledge Merge in Reinforced Agentic Models
  Paper • 2601.13572 • Published • 25
- Language of Thought Shapes Output Diversity in Large Language Models
  Paper • 2601.11227 • Published • 9
- Agentic-R: Learning to Retrieve for Agentic Search
  Paper • 2601.11888 • Published • 19
- RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
  Paper • 2602.02488 • Published • 33