Canvas-to-Image: Compositional Image Generation with Multimodal Controls Paper • 2511.21691 • Published 6 days ago
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published 13 days ago
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published 12 days ago
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published 12 days ago
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published 12 days ago
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity Paper • 2511.15593 • Published 13 days ago
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning Paper • 2511.14366 • Published 14 days ago
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published 16 days ago
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published 18 days ago
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published 19 days ago
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published 19 days ago
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published 23 days ago
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published 22 days ago
HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published 27 days ago
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published 25 days ago