2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published Jan 1 • 107
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review • Paper • 2502.16586 • Published Feb 23 • 1