TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model Paper • 2510.16449 • Published 12 days ago • 34
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published 10 days ago • 114
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published 14 days ago • 22
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published 21 days ago • 62
WebGen-Agent Collection A novel website-generation agent that leverages comprehensive and multi-level visual feedback to iteratively generate and refine the website codebase. • 7 items • Updated Sep 29 • 1
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning Paper • 2509.22644 • Published Sep 26 • 20
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing Paper • 2509.22651 • Published Sep 26 • 22
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 72
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 123
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 218
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench Paper • 2508.20931 • Published Aug 28 • 15
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28 • 35
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models Paper • 2508.21365 • Published Aug 29 • 28
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28 • 140
WebGen-Bench Collection Datasets and models introduced in the paper "WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch". • 11 items • Updated Aug 30 • 1
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 26
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7 • 73
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch Paper • 2505.03733 • Published May 6 • 17
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 46