LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 130
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 32
Article Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies Feb 17 • 26
Article Drag GAN - Interactive Point-based Manipulation on the Generative Image Manifold Dec 17, 2023 • 3
Adding Conditional Control to Text-to-Image Diffusion Models Paper • 2302.05543 • Published Feb 10, 2023 • 57
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Paper • 2407.13559 • Published Jul 18, 2024 • 20
Arabic Handwritten Text for Person Biometric Identification: A Deep Learning Approach Paper • 2406.00409 • Published Jun 1, 2024 • 1
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24, 2024 • 202
Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition Paper • 2406.09630 • Published Jun 13, 2024 • 2