InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 202
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 82
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 77
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 53
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 38
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 86
InternVL2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated Sep 28 • 89
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 81
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25, 2024 • 57