T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1 • 44
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 240
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024 • 113
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 32
view article Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 78
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 92