Nandan Thakur

nthakur

https://thakur-nandan.github.io

AI & ML interests

NLP, IR, QA

Recent Activity

published a dataset 4 days ago

nthakur/odyssey-verified-27K

liked a model 4 days ago

Alibaba-NLP/Tongyi-DeepResearch-30B-A3B

upvoted an article 11 days ago

Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text

upvoted an article 11 days ago

Article

Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text

11 days ago

• 33

upvoted an article 30 days ago

Article

Introducing RTEB: A New Standard for Retrieval Evaluation

about 1 month ago

• 118

upvoted a collection about 2 months ago

EmbeddingGemma

3 items • Updated Sep 11 • 93

upvoted an article about 2 months ago

Article

Welcome EmbeddingGemma, Google's new efficient embedding model

Sep 4

• 251

upvoted a paper 3 months ago

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published Aug 8 • 40

upvoted an article 4 months ago

Article

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

Jul 1

• 126

upvoted a collection 5 months ago

Qwen3-Embedding

6 items • Updated Jul 21 • 132

upvoted a paper 5 months ago

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

Paper • 2505.16967 • Published May 22 • 24

upvoted 2 collections 5 months ago

RLHN Datasets

RLHN: Cleaned Training Datasets with False Negatives Identified & Relabeled as ground truth. • 5 items • Updated May 23 • 4

Multilingual SFT & DPO Datasets

These SFT or DPO datasets were translated from English using Mistral-7B-Instruct-v0.2 or taken from other sources. • 8 items • Updated Mar 31 • 3

upvoted a paper 7 months ago

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Paper • 2504.13128 • Published Apr 17 • 7

upvoted 3 collections 7 months ago

Multimodal DSE Retrievers

A collection of DSE models for multimodal retrieval • 5 items • Updated Apr 15 • 15

🌐 NoMIRACL Dataset [EMNLP'24]

A collection of multilingual relevance assessment datasets. We also have SFT fine-tuned models (Mistral-7B & Llama-3 8B) • 7 items • Updated Mar 31 • 1

🏜️MIRAGE-Bench [NAACL'25]

Dataset Collection from the MIRAGE-Bench paper • 13 items • Updated Mar 31 • 2

upvoted a collection 8 months ago

DRAMA

A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages. • 3 items • Updated Feb 26 • 7

upvoted a paper 9 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 420

upvoted a paper 11 months ago

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Paper • 2312.11361 • Published Dec 18, 2023 • 1

upvoted an article about 1 year ago

Article

Visually Multilingual: Introducing mcdse-2b

Oct 27, 2024

• 41

upvoted a paper over 1 year ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51

upvoted an article over 1 year ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28, 2024

• 257