Vision-Language Models
Visual Instruction Tuning
Paper • 2304.08485
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355
CogVLM: Visual Expert for Pretrained Language Models
Paper • 2311.03079
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525
OmniFusion Technical Report
Paper • 2404.06212
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387
Pegasus-v1 Technical Report
Paper • 2404.14687
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Paper • 2404.16821
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994
What matters when building vision-language models?
Paper • 2405.02246
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper • 2407.03320
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Paper • 2407.02477
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
Paper • 2407.01791