LongAttn: Selecting Long-context Training Data via Token-level Attention Paper • 2502.16860 • Published Feb 24 • 1
CLIPPER Collection Models and datasets for CLIPPER: Compression enables long-context synthetic data generation • 7 items • Updated Oct 3 • 5
Article • Deploying Your FastAPI Applications on Huggingface Via Docker • Dec 11, 2023 • 39
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 202
Towards the Law of Capacity Gap in Distilling Language Models Paper • 2311.07052 • Published Nov 13, 2023 • 2
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 252
Article • From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease • Oct 21, 2022 • 42
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 84
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42