Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report Paper • 2510.14880 • Published Oct 16 • 15
Embeddings datasets ⚡️ Collection This collection gathers datasets for embeddings pre-training and fine-tuning. • 12 items • Updated Oct 1 • 3
NanoBEIR-fr 🍺 Collection French translation of zeta-alpha-ai's NanoBEIR collection • 13 items • Updated 19 days ago • 2
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published May 22 • 24
RLHN Datasets Collection RLHN: Cleaned Training Datasets with False Negatives Identified & Relabeled as ground truth. • 5 items • Updated May 23 • 4
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 23
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28, 2024 • 20