IEITYuan/Yuan-embedding-1.0


IEIT-Yuan committed on Nov 14, 2024

Commit 5b8c129 · verified · 1 Parent(s): eb89f36

Update README.md

Files changed (1)

README.md +9 -7


@@ -1261,13 +1261,15 @@ tags:
 ---
 ## Yuan-embedding-1.0
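-Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval tasks. It is based on xiaobu-embedding-v2 [1], with the following main changes:
-- In hard negative sampling, a rerank model (bge-reranker-large [2]) is used to rank and filter the data
-- New queries are generated iteratively with an LLM
-- Training is based on piccolo-embedding [3]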
 ## Usage
@@ -1293,6 +1295,6 @@ print(similarities)
 ## Reference
-1. https://huggingface.co/lier007/xiaobu-embedding-v2
-2. https://huggingface.co/BAAI/bge-reranker-large
-3. https://github.com/hjq133/piccolo-embedding

 ---
 ## Yuan-embedding-1.0
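+Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval tasks.
+Building on the xiaobu model architecture (a bert-large architecture), it uses new methods for dataset construction, generation, and cleaning, combined with two-stage fine-tuning, to achieve leading accuracy on Retrieval tasks (Hugging Face C-MTEB leaderboard [1]).
+The positive and negative samples are generated with the Yuan2.0-M32 [2] large language model. The main work is as follows:
+- In hard negative sampling, a rerank model (bge-reranker-large [3]) is used to rank and filter the data
+- New queries and corpus passages are generated iteratively with the Yuan2.0-M32 model
+- The model is fine-tuned with the MRL method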
 ## Usage
 ## Reference
+1. https://huggingface.co/spaces/mteb/leaderboard
+2. https://huggingface.co/IEITYuan/Yuan2-M32
+3. https://huggingface.co/BAAI/bge-reranker-large
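
The README says a rerank model is used to rank and filter data during hard negative sampling, but it does not give the exact procedure. The sketch below shows one common way such filtering can work, and is not the authors' pipeline. It assumes each candidate passage already has a relevance score for the query (in practice these would come from a reranker such as bge-reranker-large). The function name `select_hard_negatives`, the `margin` and `top_k` parameters, and all scores are hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def select_hard_negatives(
    scored_candidates: List[Tuple[str, float]],
    positive_score: float,
    margin: float = 0.1,
    top_k: int = 2,
) -> List[str]:
    """Pick the highest-scoring candidates that still score clearly below the positive.

    Assumption: candidates scoring within `margin` of the positive (or above it)
    are likely unlabeled positives, so they are dropped rather than used as negatives.
    """
    ceiling = positive_score - margin
    # Keep only candidates that the reranker rates clearly below the positive.
    kept = [(text, score) for text, score in scored_candidates if score < ceiling]
    # Rank the remainder from most to least similar: the top ones are the "hard" negatives.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:top_k]]


# Hypothetical reranker scores for four candidate passages of one query.
candidates = [
    ("passage A", 0.95),  # scores above the positive, treated as a likely false negative
    ("passage B", 0.70),
    ("passage C", 0.40),
    ("passage D", 0.05),  # too easy to be useful when top_k is small
]
hard_negatives = select_hard_negatives(candidates, positive_score=0.90)
print(hard_negatives)
```

Sorting after filtering keeps the negatives that are hardest for the embedding model while reducing the risk of training against passages that are actually relevant. The margin and the number of negatives per query would need tuning on real data.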