Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Jul 1 • 130
MTEB-NL Collection Massive Text Embedding Benchmark for Dutch. Check https://github.com/nikolay-banar/mteb-nl-dev to evaluate your models. • 26 items • Updated 11 days ago • 2
MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch Paper • 2509.12340 • Published Sep 15 • 4
Article Provence: efficient and robust context pruning for retrieval-augmented generation Jan 28 • 22
BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language Paper • 2412.08329 • Published Dec 11, 2024 • 1
BEIR-NL Collection Zero-shot Information Retrieval Benchmark for the Dutch Language • 16 items • Updated Sep 23 • 3
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 22
Article How to generate text: using different decoding methods for language generation with Transformers Mar 1, 2020 • 264
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 151
Parallel Sentences Datasets Collection Each of these datasets has "english" and "non_english" columns. They can be used to make embedding models multilingual. • 14 items • Updated Feb 25 • 19
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 59