15 60

Koray

ciCic · ciCciC

AI & ML interests

CV, NLP

Recent Activity

updated a dataset 8 days ago

ciCic/nrc_a_embeddings

published a dataset 9 days ago

ciCic/nrc_a_embeddings

upvoted an article about 6 hours ago

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

Article • Jul 1 • 130

upvoted an article about 1 month ago

Introducing RTEB: A New Standard for Retrieval Evaluation

Article • Oct 1 • 126

upvoted 2 collections about 2 months ago

E5-NL

Collection of Dutch retrieval models • 13 items • Updated Sep 23 • 6

MTEB-NL

Massive Text Embedding Benchmark for Dutch. Check https://github.com/nikolay-banar/mteb-nl-dev to evaluate your models. • 26 items • Updated 11 days ago • 2

upvoted a paper about 2 months ago

MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch

Paper • 2509.12340 • Published Sep 15 • 4

upvoted an article 2 months ago

Welcome EmbeddingGemma, Google's new efficient embedding model

Article • Sep 4 • 256

upvoted an article 4 months ago

Provence: efficient and robust context pruning for retrieval-augmented generation

Article • Jan 28 • 22

upvoted a paper 9 months ago

BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language

Paper • 2412.08329 • Published Dec 11, 2024 • 1

upvoted a collection 9 months ago

BEIR-NL

Zero-shot Information Retrieval Benchmark for the Dutch Language • 16 items • Updated Sep 23 • 3

upvoted a paper 9 months ago

Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

Paper • 2408.04303 • Published Aug 8, 2024 • 22

upvoted an article 12 months ago

EuroLLM-9B

Article • Dec 2, 2024 • 137

upvoted an article about 1 year ago

How to generate text: using different decoding methods for language generation with Transformers

Article • Mar 1, 2020 • 264

upvoted a paper about 1 year ago

Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 151

upvoted a collection about 1 year ago

Parallel Sentences Datasets

These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated Feb 25 • 19

upvoted a paper over 1 year ago

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 59