WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation • Paper • 2511.11434 • Published 7 days ago • 43
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published 8 days ago • 77
Grounding Computer Use Agents on Human Demonstrations • Paper • 2511.07332 • Published 11 days ago • 99
Revisiting Multimodal Positional Encoding in Vision-Language Models • Paper • 2510.23095 • Published 25 days ago • 20
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • Paper • 2511.02778 • Published 17 days ago • 100
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks • Paper • 2510.18455 • Published about 1 month ago • 17
StreamingVLM: Real-Time Understanding for Infinite Video Streams • Paper • 2510.09608 • Published Oct 10 • 50
Paper2Video: Automatic Video Generation from Scientific Papers • Paper • 2510.05096 • Published Oct 6 • 112
Code2Video: A Code-centric Paradigm for Educational Video Generation • Paper • 2510.01174 • Published Oct 1 • 33