Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 133
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published Mar 13 • 18
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published Feb 18 • 38
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published Jan 2 • 44
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published Nov 22, 2024 • 15
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28, 2024 • 10
YOLO-World: Real-Time Open-Vocabulary Object Detection Paper • 2401.17270 • Published Jan 30, 2024 • 42