Uniform Discrete Diffusion with Metric Path for Video Generation Paper • 2510.24717 • Published Oct 28, 2025 • 39
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published 30 days ago • 104
CapsFusion: Rethinking Image-Text Data at Scale Paper • 2310.20550 • Published Oct 31, 2023 • 27
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 37
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters Paper • 2402.04252 • Published Feb 6, 2024 • 29
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11, 2024 • 19