Building a Foundational Guardrail for General Agentic Systems via Synthetic Data • Paper • 2510.09781 • Published 24 days ago • 26
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models • Paper • 2510.05034 • Published 28 days ago • 46
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting • Paper • 2504.05541 • Published Apr 7 • 15
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • 2503.04130 • Published Mar 6 • 96