CASA: Cross-Attention as Self-Attention for Efficient Vision-Language Fusion on Long-Context Streaming Inputs
CASA Gallery
Video Gallery for CASA: Cross-Attention via Self-Attention
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper • 2512.19535
kyutai/CASA-Helium1-VL-2B
Image-Text-to-Text • 3B
kyutai/CASA-Qwen2_5-VL-3B
Image-Text-to-Text • 4B