SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval Paper • 2109.10086 • Published Sep 21, 2021 • 3
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27 • 139
SANA-Sprint Collection 🏃SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated Sep 13 • 43
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 21 items • Updated Sep 13 • 97
DRAMA Collection A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages. • 3 items • Updated Feb 26 • 7
SuperBPE Collection SuperBPE tokenizers and models trained with them • 9 items • Updated 4 days ago • 17
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators Paper • 2502.06394 • Published Feb 10 • 89
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 174