MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer • Paper • 2509.16197 • Published Sep 19