---
license: apache-2.0
base_model: microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
language:
- en
---
<div style="display: flex; justify-content: center;">
  <div style="display: flex; align-items: center; gap: 10px;">
    <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
    <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-mt-asym</span>
  </div>
</div>
# Contents

1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)
# Introduction

`mdbr-leaf-mt-asym` is a high-performance text embedding model designed for classification, clustering, semantic sentence similarity, and summarization tasks.

It is the asymmetric variant of `mdbr-leaf-mt`: queries are encoded with the compact [`MongoDB/mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model, while documents are encoded with the larger [`mixedbread-ai/mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model it was distilled from.

The model is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).

If you are looking to perform semantic search / information retrieval (e.g. for RAG), please check out our [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir) model, which is specifically trained for these tasks.

> [!NOTE]
> This model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.
# Technical Report

A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).
# Highlights

* **State-of-the-Art Performance**: `mdbr-leaf-mt-asym` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [MTEB v2 (Eng) leaderboard](https://huggingface.co/spaces/mteb/leaderboard) among models with ≤30M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-mt-asym` uses an asymmetric architecture in which queries are encoded with the compact model and documents with the larger teacher model, typically yielding better results than the symmetric setup.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-mt-asym` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.
## Benchmark Comparison

The table below shows the scores of `mdbr-leaf-mt` on the MTEB v2 (Eng) benchmark, compared to other compact embedding models.

`mdbr-leaf-mt` ranks #1 on this benchmark among models with ≤30M parameters.

| Model                              | Size    | MTEB v2 (Eng) |
|------------------------------------|---------|---------------|
| OpenAI text-embedding-3-large      | Unknown | 66.43         |
| OpenAI text-embedding-3-small      | Unknown | 64.56         |
| **mdbr-leaf-mt**                   | 23M     | **63.97**     |
| gte-small                          | 33M     | 63.22         |
| snowflake-arctic-embed-s           | 32M     | 61.59         |
| e5-small-v2                        | 33M     | 61.32         |
| granite-embedding-small-english-r2 | 47M     | 61.07         |
| all-MiniLM-L6-v2                   | 22M     | 59.03         |
# Quickstart

## Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-mt-asym")

# Example queries and documents
queries = [
    "What is machine learning?",
    "How does neural network training work?",
]

documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
]

# Encode queries and documents
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f"  Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
```
<details>
<summary>See example output</summary>

```
Query: What is machine learning?
  Similarity: 0.8483 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
  Similarity: 0.6805 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...

Query: How does neural network training work?
  Similarity: 0.6050 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
  Similarity: 0.7689 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...
```

</details>
## Transformers Usage

See [here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/transformers_example_mt.ipynb).
## Asymmetric Retrieval Setup

`mdbr-leaf-mt` is *aligned* to [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1), the model it was distilled from. This enables flexible architectures in which, for example, documents are encoded with the larger model, while queries are encoded faster and more efficiently with the compact `leaf` model. This usually outperforms the symmetric setup in which both queries and documents are encoded with `leaf`.

To use the `leaf` model for both queries and documents, see [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt).
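The following is a minimal sketch of the asymmetric setup assembled manually from the two underlying checkpoints. The packaged `mdbr-leaf-mt-asym` model wires this routing up for you, so treat this as an illustration rather than the canonical usage:

```python
from sentence_transformers import SentenceTransformer

# Compact student model for queries, larger teacher model for documents.
# Because leaf is aligned to the teacher's embedding space, the two sets
# of embeddings are directly comparable.
query_model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
doc_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

queries = ["What is machine learning?"]
documents = ["Machine learning is a subset of AI focused on algorithms that learn from data."]

query_embeds = query_model.encode(queries)
doc_embeds = doc_model.encode(documents)

# Similarity between student-encoded queries and teacher-encoded documents
print(query_model.similarity(query_embeds, doc_embeds))
```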
## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:

```python
query_embeds = model.encode_query(queries, truncate_dim=256)
doc_embeds = model.encode_document(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities:\n{similarities}")
```
<details>
<summary>See example output</summary>

```
After MRL:
* Embeddings dimension: 256
* Similarities:
tensor([[0.8584, 0.6921],
        [0.5973, 0.7893]])
```

</details>
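If full-size vectors are already stored, the same effect can be obtained post hoc. Below is a sketch, assuming cosine-based similarity (the re-normalization step is what keeps truncated scores comparable), that keeps the leading 256 MRL dimensions instead of re-encoding with `truncate_dim`:

```python
import torch.nn.functional as F

# A sketch: truncate stored full-size embeddings to the leading 256 MRL
# dimensions and re-normalize, rather than re-encoding from scratch.
full_q = model.encode_query(queries, convert_to_tensor=True)
full_d = model.encode_document(documents, convert_to_tensor=True)

trunc_q = F.normalize(full_q[:, :256], p=2, dim=-1)
trunc_d = F.normalize(full_d[:, :256], p=2, dim=-1)

print(model.similarity(trunc_q, trunc_d))
```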
## Vector Quantization

Vector quantization, for example to `int8` or `binary`, can be performed as follows:

**Note**: for vector quantization to types other than `binary`, we suggest performing a calibration to determine the optimal ranges, [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization). Good initial values are -1.0 and +1.0.

```python
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode_query(queries)
doc_embeds = model.encode_document(documents)

# Quantize embeddings to int8 using -1.0 and +1.0 as calibration ranges
ranges = torch.tensor([[-1.0], [+1.0]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities:\n{similarities}")
```
<details>
<summary>See example output</summary>

```
After quantization:
* Embeddings type: int8
* Similarities:
[[11392  9204]
 [ 8256 10470]]
```

</details>
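For `binary` quantization, which needs no calibration, the following is a minimal sketch; the Hamming-distance scoring shown here is illustrative rather than the only option:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

# Binary quantization: each dimension becomes one bit packed into uint8,
# a 32x storage reduction versus float32. No calibration ranges needed.
query_bin = quantize_embeddings(model.encode_query(queries), "ubinary")
doc_bin = quantize_embeddings(model.encode_document(documents), "ubinary")

# Rank documents by Hamming distance (lower = more similar).
hamming = (np.unpackbits(query_bin[:, None, :], axis=-1)
           != np.unpackbits(doc_bin[None, :, :], axis=-1)).sum(axis=-1)
print(hamming)
```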
## Evaluation

Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/evaluate_models.ipynb).
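As a rough alternative to the notebook, here is a hedged sketch of scoring the model on a single MTEB task with the `mteb` package; the task choice and output folder are placeholders, not what the notebook uses:

```python
import mteb
from sentence_transformers import SentenceTransformer

# A sketch, not the linked notebook: run one MTEB task end to end.
model = SentenceTransformer("MongoDB/mdbr-leaf-mt-asym")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
print(results)
```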
# Citation

If you use this model in your work, please cite:

```bibtex
@misc{mdbr_leaf,
      title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
      author={Robin Vujanic and Thomas Rueckstiess},
      year={2025},
      eprint={2509.12539},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.12539},
}
```
# License

This model is released under the Apache 2.0 license.
# Contact

For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected].
# Acknowledgments

This model version was created by @tomaarsen; we thank him for his contribution to this project.