---
library_name: onnx
tags:
- text-reranking
- jina
- onnx
- fp16
pipeline_tag: sentence-similarity
base_model:
- jinaai/jina-reranker-m0
---
# Jina Reranker M0 - ONNX FP16 Version
This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.
## Model Description
Jina Reranker is designed to rerank search results or document passages based on their relevance to a given query. It takes a query and a list of documents as input and outputs relevance scores.
This version is specifically exported for use with ONNX Runtime.
**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)
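In code terms, the reranking contract looks like the hypothetical helper below; `rerank` is illustrative only and stands in for the actual ONNX Runtime pipeline shown under Usage.
```python
# Hypothetical signature for illustration only; the Usage section below
# shows the real ONNX Runtime pipeline.
def rerank(query: str, documents: list[str]) -> list[float]:
    """Return one relevance score per document; higher means more relevant."""
    raise NotImplementedError
```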
## Technical Details
* **Format:** ONNX
* **Opset:** 14
* **Precision:** FP16 (exported using `.half()`)
* **External Data:** Uses the ONNX external data format due to model size. All files in this repository are required; the usage script below fetches the full repository with `snapshot_download` so the external data files sit next to the main `.onnx` file.
* **Export Source:** Exported from the Hugging Face `transformers` library using `torch.onnx.export`; a minimal sketch of the workflow follows.
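The exact export script is not part of this repository; the snippet below is a minimal sketch of such a workflow, assuming the model loads through `transformers` with `trust_remote_code=True` and that only the text inputs (`input_ids`, `attention_mask`) are traced. Input names, dummy shapes, and the output name are illustrative, not the exact values used.
```python
# Minimal export sketch (assumed workflow; names and shapes are illustrative).
# FP16 tracing typically requires a CUDA device; move the model and inputs
# there if CPU tracing of half-precision ops fails.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-reranker-m0", trust_remote_code=True
).half().eval()

input_ids = torch.ones(1, 16, dtype=torch.int64)
attention_mask = torch.ones(1, 16, dtype=torch.int64)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "jina-reranker-m0.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```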
## Usage
You can use this model with `onnxruntime` for inference. You will also need the `transformers` library to load the appropriate processor for input preparation and `huggingface_hub` to download the model files.
**1. Installation:**
```bash
pip install onnxruntime huggingface_hub transformers torch sentencepiece
```
**2. Inference Script:**
```python
import os

import numpy as np
import onnxruntime as ort
import torch  # Needed for the processor's PyTorch tensor outputs
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
# --- Configuration ---
# Replace with your repository ID if different
repo_id = "jian-mo/jina-reranker-m0-onnx"
onnx_filename = "jina-reranker-m0.onnx" # Main ONNX file name
# Use the original model ID to load the correct processor
original_model_id = "jinaai/jina-reranker-m0"
# --- End Configuration ---
# 1. Download the ONNX model files from the Hub.
# The model uses the ONNX external data format, so download the whole
# repository to keep the external data files next to the main .onnx file.
print(f"Downloading ONNX model from {repo_id}...")
local_dir = snapshot_download(repo_id=repo_id)
local_onnx_path = os.path.join(local_dir, onnx_filename)
print(f"ONNX model downloaded to: {local_onnx_path}")
# 2. Load ONNX Runtime session
print("Loading ONNX Inference Session...")
# You can choose execution providers, e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
# if you have GPU support and the necessary onnxruntime build.
session_options = ort.SessionOptions()
# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
providers = ['CPUExecutionProvider'] # Default to CPU
session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
print(f"ONNX session loaded with provider: {session.get_providers()}")
# 3. Load the Processor
print(f"Loading processor from {original_model_id}...")
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
print("Processor loaded.")
# 4. Prepare Input Data
query = "What is deep learning?"
document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
# Example with multiple documents (batch processing)
# documents = [
# "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
# "Artificial intelligence refers to the simulation of human intelligence in machines.",
# "A transformer is a deep learning model used primarily in the field of natural language processing."
# ]
# To rerank multiple documents, score each (query, document) pair separately;
# see the batch sketch after this script.
print("Preparing input data...")
# Process query and document together as expected by the reranker model
inputs = processor(
    text=f"{query} {document}",
    images=None,  # Text-only reranking
    return_tensors="pt",  # Get PyTorch tensors first
    padding=True,
    truncation=True,
    max_length=512,  # A reasonable max_length for a query + document pair
)
# Convert the tensors to NumPy arrays for ONNX Runtime
inputs_np = {
    "input_ids": inputs["input_ids"].numpy(),
    "attention_mask": inputs["attention_mask"].numpy(),
}
print("Input data prepared.")
# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})
# 5. Run Inference
print("Running inference...")
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, inputs_np)
print("Inference complete.")
# 6. Process Output
# For Jina Reranker, the output is a relevance logit per (query, document)
# pair; higher values indicate higher relevance. Check the original model
# card for the exact output semantics.
print(f"Number of outputs: {len(outputs)}")
if len(outputs) > 0:
    logits = outputs[0]
    print(f"Output logits shape: {logits.shape}")
    # Assuming the exported graph returns one score per pair, squeeze away
    # the batch dimension to get the relevance score
    scores = np.asarray(logits).squeeze()
    print(f"Relevance score(s): {scores}")
```
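**3. Reranking Multiple Documents (optional):**
To rerank several candidates, a simple pattern is one forward pass per (query, document) pair followed by a sort. The sketch below is illustrative: it reuses the `session`, `processor`, and `query` objects from the script above and assumes the graph returns a single score per pair.
```python
# Minimal batch-reranking sketch, reusing `session`, `processor`, and
# `query` from the inference script above. The documents are illustrative.
documents = [
    "Deep learning is a subset of machine learning based on artificial neural networks.",
    "Artificial intelligence refers to the simulation of human intelligence in machines.",
    "A transformer is a deep learning model used primarily in natural language processing.",
]
scores = []
for doc in documents:
    feats = processor(
        text=f"{query} {doc}",
        images=None,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,
    )
    feats_np = {
        "input_ids": feats["input_ids"].numpy(),
        "attention_mask": feats["attention_mask"].numpy(),
    }
    logits = session.run(None, feats_np)[0]  # None = fetch all outputs; take the first
    scores.append(float(np.asarray(logits).squeeze()))  # assumes one score per pair
# Print documents from most to least relevant
for score, doc in sorted(zip(scores, documents), key=lambda p: p[0], reverse=True):
    print(f"{score:.4f}  {doc}")
```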