OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published 14 days ago • 85
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints Paper • 2510.08565 • Published 22 days ago • 19
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published 22 days ago • 108
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published 25 days ago • 18
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 202
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7 • 73
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Paper • 2507.12566 • Published Jul 16 • 14
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models Paper • 2505.09694 • Published May 14 • 19
EnerVerse-AC: Envisioning Embodied Environments with Action Condition Paper • 2505.09723 • Published May 14 • 23
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Jul 21 • 650
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 135
PIIP Collection [NeurIPS 2024 Spotlight (Ranking Top 10), TPAMI 2025] Parameter-Inverted Image Pyramid Networks • 11 items • Updated Sep 28 • 1
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 68
Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning Paper • 2503.11646 • Published Mar 14 • 35
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 74
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding Paper • 2501.07783 • Published Jan 14 • 7