Yuan-embedding-1.0
Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval tasks. Built on the xiaobu model architecture (a bert-large structure), it combines a new dataset construction, generation, and cleaning pipeline with two-stage fine-tuning to achieve leading retrieval accuracy on the Hugging Face C-MTEB leaderboard [1]. Positive and negative training samples are generated with the Yuan2.0-M32 large language model (Yuan2.0-M32 [2]). The main contributions are:
- During hard negative sampling, a reranker model (bge-reranker-large [3]) is used to rank and filter candidates (a sketch follows this list)
- New queries and corpus passages are generated iteratively with the Yuan2.0-M32 model
- The model is fine-tuned with the MRL (Matryoshka Representation Learning) method; see the truncated-embedding example at the end of the Usage section
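The released mining pipeline is not published, so the following is only a minimal sketch of reranker-based hard-negative filtering with bge-reranker-large. The query, the candidate pool, and the "keep the two lowest-scoring candidates" rule are illustrative assumptions, not the authors' actual procedure.

```python
# Minimal sketch (not the released pipeline): re-score first-stage retrieval
# candidates with a cross-encoder reranker and keep the lowest-scoring ones
# as hard negatives.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-large")

query = "如何查询社保缴费记录"  # illustrative query
candidates = [  # illustrative passages returned by a first-stage retriever
    "社保缴费记录可以通过当地人社局官网在线查询。",
    "住房公积金提取需要提供购房合同等材料。",
    "医保报销比例因地区和医院等级而异。",
]

# Score each (query, passage) pair with the cross-encoder.
scores = reranker.predict([(query, passage) for passage in candidates])

# The first-stage retriever found these passages similar to the query, but the
# reranker judges the low-scoring ones irrelevant: those make good hard
# negatives. Keeping the bottom two is an assumption for illustration.
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1])
hard_negatives = [passage for passage, _ in ranked[:2]]
print(hard_negatives)
```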
Usage
```
pip install -U sentence-transformers==3.1.1
```
Example usage:
```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub.
model = SentenceTransformer("IEIYuan/Yuan-embedding-1.0")

sentences = [
    "这是一个样例-1",
    "这是一个样例-2",
]

# Encode the sentences and compute pairwise similarity scores.
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
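Because the model is fine-tuned with MRL, the leading dimensions of an embedding should remain usable on their own. The sketch below truncates each embedding and re-normalizes it before computing cosine similarity; the 256-dimension cut-off is an assumption, since the model card does not state which MRL dimensions were trained.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("IEIYuan/Yuan-embedding-1.0")
embeddings = model.encode(["这是一个样例-1", "这是一个样例-2"])

# Keep only the leading 256 dimensions (illustrative; the trained MRL
# dimensions are not documented) and re-normalize each vector.
dim = 256
truncated = embeddings[:, :dim]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Cosine similarity on the truncated, re-normalized embeddings.
similarities = truncated @ truncated.T
print(similarities)
```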
Reference
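- [1] C-MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
- [2] Yuan 2.0-M32: Mixture of Experts with Attention Router: https://arxiv.org/abs/2405.17976
- [3] bge-reranker-large: https://huggingface.co/BAAI/bge-reranker-large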
Evaluation results
| Dataset (split) | Metric | Value (self-reported) |
|---|---|---|
| MTEB AFQMC (default), validation | cosine_pearson | 56.399 |
| MTEB AFQMC (default), validation | cosine_spearman | 60.298 |
| MTEB AFQMC (default), validation | manhattan_pearson | 58.344 |
| MTEB AFQMC (default), validation | manhattan_spearman | 59.634 |
| MTEB AFQMC (default), validation | euclidean_pearson | 58.332 |
| MTEB AFQMC (default), validation | euclidean_spearman | 59.633 |
| MTEB AFQMC (default), validation | main_score | 60.298 |
| MTEB ATEC (default), test | cosine_pearson | 56.419 |
| MTEB ATEC (default), test | cosine_spearman | 58.498 |
| MTEB ATEC (default), test | manhattan_pearson | 62.053 |
