Paper list
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Paper
• 2403.02677
• Published
• 18
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Paper
• 2403.03003
• Published
• 11
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Paper
• 2403.01487
• Published
• 16
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper
• 2403.00522
• Published
• 46
FuseChat: Knowledge Fusion of Chat Models
Paper
• 2402.16107
• Published
• 39
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper
• 2403.07508
• Published
• 77
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published
• 129
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Paper
• 2403.06764
• Published
• 27
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper
• 2403.05525
• Published
• 49
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
• 2403.03853
• Published
• 66
Enhancing Vision-Language Pre-training with Rich Supervisions
Paper
• 2403.03346
• Published
• 17
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Paper
• 2403.13248
• Published
• 78
When Do We Not Need Larger Vision Models?
Paper
• 2403.13043
• Published
• 26
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper
• 2403.11703
• Published
• 17
Aria: An Open Multimodal Native Mixture-of-Experts Model
Paper
• 2410.05993
• Published
• 111
Personalized Visual Instruction Tuning
Paper
• 2410.07113
• Published
• 70
Paper
• 2410.07073
• Published
• 69
MM-Ego: Towards Building Egocentric Multimodal LLMs
Paper
• 2410.07177
• Published
• 22
UniMuMo: Unified Text, Music and Motion Generation
Paper
• 2410.04534
• Published
• 19
Video Instruction Tuning With Synthetic Data
Paper
• 2410.02713
• Published
• 41
Distilling an End-to-End Voice Assistant Without Instruction Training Data
Paper
• 2410.02678
• Published
• 23