peterizsak committed on Feb 19, 2024

Commit 9af8004 · verified · 1 Parent(s): 030eb32

Upload README.md

Files changed (1)

README.md +88 -0

@@ -1,3 +1,91 @@
 ---
 license: mit
+language:
+- en
 ---
+# BGE-base-en-v1.5-rag-int8-static
+A version of [BAAI/BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) quantized with [Intel® Neural Compressor](https://github.com/huggingface/optimum-intel) and compatible with [Optimum-Intel](https://github.com/huggingface/optimum-intel).
+The model can be used through the [Optimum-Intel](https://github.com/huggingface/optimum-intel) API as a standalone model, or as an embedder or ranker module in a [fastRAG](https://github.com/IntelLabs/fastRAG) RAG pipeline.
+## Technical details
+Quantized using post-training static quantization.
+|  |  |
+|---|:---:|
+| Calibration set | [qasper](https://huggingface.co/datasets/allenai/qasper) (80 random samples) |
+| Quantization tool | [Optimum-Intel](https://github.com/huggingface/optimum-intel) |
+| Backend | `IPEX` |
+| Original model | [BAAI/BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) |
+Instructions for reproducing the quantized model can be found [here](https://github.com/IntelLabs/fastRAG/tree/main/scripts/optimizations/embedders).
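As a rough illustration of what post-training static quantization involves, the sketch below derives a fixed scale and zero point from a handful of calibration values and then reuses them to map floats to `INT8`. It is a toy under stated assumptions, not the procedure used by Intel® Neural Compressor or IPEX; the function names `calibrate`, `quantize`, and `dequantize` are hypothetical.

```python
def calibrate(samples, num_bits=8):
    """Derive a fixed affine (scale, zero_point) pair from calibration values."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all calibration values are zero
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to a signed integer, clamping to the representable range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activation values standing in for a calibration set.
calibration_values = [-1.0, 0.25, 0.5, 3.0]
scale, zero_point = calibrate(calibration_values)

for value in calibration_values:
    q = quantize(value, scale, zero_point)
    print(value, "->", q, "->", round(dequantize(q, scale, zero_point), 4))
```

"Static" here means the scale and zero point are fixed once from the calibration data and reused at inference time, rather than being recomputed for every input.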
+## Evaluation - MTEB
+Model performance on the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard) *retrieval* and *reranking* tasks.
+|  | `INT8` | `FP32` | % diff |
+|---|:---:|:---:|:---:|
+| Reranking | 0.5886 | 0.5886 | 0.0%   |
+| Retrieval | 0.5242 | 0.5325 | -1.55% |
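The `% diff` column can be reproduced approximately with the short calculation below. It assumes the difference is expressed relative to the `FP32` score, which the table does not state explicitly; `relative_diff_percent` is a hypothetical helper name.

```python
def relative_diff_percent(quantized, baseline):
    """Relative change of the quantized score with respect to the baseline, in percent."""
    return (quantized - baseline) / baseline * 100.0


# Scores copied from the table above (INT8 first, FP32 second).
reranking = relative_diff_percent(0.5886, 0.5886)
retrieval = relative_diff_percent(0.5242, 0.5325)

print(f"Reranking: {reranking:.2f}%")
print(f"Retrieval: {retrieval:.2f}%")
```

With the rounded scores from the table this gives roughly -1.56% for retrieval, which agrees with the reported -1.55% up to rounding of the published scores.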
+## Usage
+### Using with Optimum-Intel
+See the [Optimum-Intel](https://github.com/huggingface/optimum-intel) installation page for installation instructions, or run:
+``` sh
+pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers
+```
+Loading a model:
+``` python
+from optimum.intel import IPEXModel
+model = IPEXModel.from_pretrained("Intel/bge-base-en-v1.5-rag-int8-static")
+```
+Running inference:
+``` python
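+import torch
+from transformers import AutoTokenizer
+# `model` is the IPEXModel loaded above; `sentences` is example input
+sentences = ["This is an example sentence."]
+tokenizer = AutoTokenizer.from_pretrained("Intel/bge-base-en-v1.5-rag-int8-static")
+# pad and truncate so a batch of sentences forms a rectangular tensor
+inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+    # get the vector of [CLS]
+    embedded = outputs[0][:, 0]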
+```
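Once `[CLS]` embeddings are available, a typical next step is to compare them. The sketch below assumes cosine similarity as the comparison measure, which is common for sentence embeddings but is not specified in this README. It uses plain Python lists with toy 3-dimensional vectors so that it runs without the model; real rows could be obtained with `embedded.tolist()`. The helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy stand-ins for a query embedding and two document embeddings.
query = [1.0, 0.0, 1.0]
docs = [[2.0, 0.0, 2.0], [0.0, 1.0, 0.0]]

scores = [cosine_similarity(query, d) for d in docs]
# Document indices ordered from most to least similar to the query.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(scores, ranked)
```

Normalizing first makes the score independent of vector length, so the first toy document (a scaled copy of the query) ranks above the orthogonal one.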
+### Using with a fastRAG RAG pipeline
+Get started by installing [fastRAG](https://github.com/IntelLabs/fastRAG) as instructed [here](https://github.com/IntelLabs/fastRAG).
+The example below loads the model into a ranker node that embeds and re-ranks all the documents it receives as input in a pipeline.
+``` python
+from fastrag.rankers import QuantizedBiEncoderRanker
+ranker = QuantizedBiEncoderRanker("Intel/bge-base-en-v1.5-rag-int8-static")
+```
+Then plug it into a pipeline:
+``` python
+from haystack import Pipeline
+# `retriever` is assumed to be an already-initialized retriever node
+p = Pipeline()
+p.add_node(component=retriever, name="retriever", inputs=["Query"])
+p.add_node(component=ranker, name="ranker", inputs=["retriever"])
+```
+See a more complete example notebook [here](https://github.com/IntelLabs/fastRAG/blob/main/examples/optimized-embeddings.ipynb).