PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published 18 days ago • 78
Article There is no such thing as a tokenizer-free lunch By catherinearnett • Sep 25 • 84
Article Synthetic dataset generation techniques: Self-Instruct By davanstrien • May 15, 2024 • 20
Article FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages By davanstrien and 5 others • Jul 8 • 32
Cohere Labs Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated Jul 31 • 70
Running 3.4k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Smol-Course-Models Collection The collection of models that I trained for different chapters and datasets provided in the smol-course • 3 items • Updated Jan 19