Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 4 days ago • 45
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 9 days ago • 23
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published 4 days ago • 42
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published 5 days ago • 26
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence Paper • 2512.10863 • Published 16 days ago • 21
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image Paper • 2512.16899 • Published 9 days ago • 12
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 9 days ago • 71
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published 9 days ago • 10
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 9 days ago • 79
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 9 days ago • 19
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models Paper • 2512.15713 • Published 10 days ago • 15
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published 11 days ago • 39
Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure Paper • 2512.14336 • Published 11 days ago • 28
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 11 days ago • 64
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics Paper • 2512.13660 • Published 12 days ago • 37