Paper list - a poeroz Collection

updated Oct 12, 2024

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5, 2024 • 11
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3, 2024 • 16
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25, 2024 • 39
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 77
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 129
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Paper • 2403.06764 • Published Mar 11, 2024 • 27
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 49
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Paper • 2403.03853 • Published Mar 6, 2024 • 66
Enhancing Vision-Language Pre-training with Rich Supervisions

Paper • 2403.03346 • Published Mar 5, 2024 • 17
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20, 2024 • 78
When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19, 2024 • 26
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18, 2024 • 17
Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 111
Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9, 2024 • 70
Pixtral 12B

Paper • 2410.07073 • Published Oct 9, 2024 • 69
MM-Ego: Towards Building Egocentric Multimodal LLMs

Paper • 2410.07177 • Published Oct 9, 2024 • 22
UniMuMo: Unified Text, Music and Motion Generation

Paper • 2410.04534 • Published Oct 6, 2024 • 19
Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 41
Distilling an End-to-End Voice Assistant Without Instruction Training Data

Paper • 2410.02678 • Published Oct 3, 2024 • 23