---
library_name: onnx
tags:
- text-reranking
- jina
- onnx
- fp16
pipeline_tag: sentence-similarity
base_model:
- jinaai/jina-reranker-m0
---

# Jina Reranker M0 - ONNX FP16 Version

This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.

## Model Description

Jina Reranker is designed to rerank search results or document passages based on their relevance to a given query. It takes a query and a list of documents as input and outputs relevance scores.

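To make that input/output contract concrete, the sketch below shows what a caller typically does with the scores: sort the documents from most to least relevant. The `rerank` helper and the score values are illustrative assumptions, not part of this repository; real scores come from the model.

```python
def rerank(documents, scores):
    """Return (document, score) pairs sorted from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only; real scores come from the model.
documents = [
    "Artificial intelligence refers to the simulation of human intelligence in machines.",
    "Deep learning is a subset of machine learning based on artificial neural networks.",
]
scores = [0.31, 0.92]

for document, score in rerank(documents, scores):
    print(f"{score:.2f}  {document}")
```
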

This version is specifically exported for use with ONNX Runtime.

**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)

## Technical Details

* **Format:** ONNX
* **Opset:** 14
* **Precision:** FP16 (exported using `.half()`)
* **External Data:** Uses the ONNX external data format due to model size. All files in this repository are required and must sit in the same directory as the main `.onnx` file. `huggingface_hub.snapshot_download` fetches the whole repository, whereas `hf_hub_download` fetches a single file.
* **Export Source:** Exported from the Hugging Face `transformers` library using `torch.onnx.export`.

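As a rough illustration of what FP16 precision means for the scores, the NumPy sketch below prints the limits of the float16 format and rounds one float32 value to float16. It uses only NumPy, not the model, and the sample value is arbitrary.

```python
import numpy as np

# Properties of the IEEE half-precision format used by the exported weights.
fp16 = np.finfo(np.float16)
print(f"float16 machine epsilon: {fp16.eps}")
print(f"float16 largest finite value: {fp16.max}")

# Round an arbitrary float32 value to float16 and measure the difference.
value_fp32 = np.float32(3.14159265)
value_fp16 = np.float16(value_fp32)
rounding_error = abs(float(value_fp32) - float(value_fp16))
print(f"float32: {float(value_fp32)}  float16: {float(value_fp16)}")
print(f"rounding error: {rounding_error:.6f}")
```

Half precision keeps roughly three significant decimal digits, so scores from this export may differ slightly from those of a full-precision model.
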

## Usage

You can use this model with `onnxruntime` for inference. You will also need the `transformers` library to load the appropriate processor for input preparation and `huggingface_hub` to download the model files.

**1. Installation:**

```bash
pip install onnxruntime huggingface_hub transformers torch sentencepiece
```

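The command above installs the CPU build of ONNX Runtime. If a GPU build is installed instead (for example the `onnxruntime-gpu` package), the session can be asked to prefer CUDA and fall back to CPU. The `choose_providers` helper below is a hypothetical sketch of that choice: it only filters a list of provider names, and in a real script that list would come from `onnxruntime.get_available_providers()`.

```python
# Preference order: GPU first, CPU as the fallback.
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {preferred} is available: {available}")
    return chosen


# Example inputs; a real list would come from onnxruntime.get_available_providers().
print(choose_providers(["CPUExecutionProvider"]))
print(choose_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
))
```

The returned list can be passed as the `providers` argument of `ort.InferenceSession`.
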

**2. Inference Script:**

```python
import os

import onnxruntime as ort
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
import numpy as np
import torch # For processor output handling

# --- Configuration ---
# Replace with your repository ID if different
repo_id = "jian-mo/jina-reranker-m0-onnx"
onnx_filename = "jina-reranker-m0.onnx" # Main ONNX file name
# Use the original model ID to load the correct processor
original_model_id = "jinaai/jina-reranker-m0"
# --- End Configuration ---

# 1. Download ONNX model files from the Hub
# snapshot_download fetches every file in the repository, so the external data
# files end up next to the main .onnx file (hf_hub_download fetches only one file)
print(f"Downloading ONNX model from {repo_id}...")
local_repo_dir = snapshot_download(repo_id=repo_id)
local_onnx_path = os.path.join(local_repo_dir, onnx_filename)
print(f"ONNX model downloaded to: {local_onnx_path}")

# 2. Load ONNX Runtime session
print("Loading ONNX Inference Session...")
# You can choose execution providers, e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
# if you have GPU support and the necessary onnxruntime build.
session_options = ort.SessionOptions()
# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
providers = ['CPUExecutionProvider'] # Default to CPU
session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
print(f"ONNX session loaded with provider: {session.get_providers()}")

# 3. Load the Processor
print(f"Loading processor from {original_model_id}...")
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
print("Processor loaded.")

# 4. Prepare Input Data
query = "What is deep learning?"
document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
# Example with multiple documents (batch processing)
# documents = [
#     "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
#     "Artificial intelligence refers to the simulation of human intelligence in machines.",
#     "A transformer is a deep learning model used primarily in the field of natural language processing."
# ]
# Use processor logic suitable for query + multiple documents if needed

print("Preparing input data...")
# Process query and document together as expected by the reranker model
inputs = processor(
    text=f"{query} {document}",
    images=None, # Assuming text-only reranking
    return_tensors="pt", # Get PyTorch tensors first
    padding=True,
    truncation=True,
    max_length=512 # Use a reasonable max_length
)

# Convert to NumPy for ONNX Runtime
inputs_np = {
    "input_ids": inputs["input_ids"].numpy(),
    "attention_mask": inputs["attention_mask"].numpy()
}
print("Input data prepared.")
# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})

# 5. Run Inference
print("Running inference...")
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, inputs_np)
print("Inference complete.")

# 6. Process Output
# The exact interpretation depends on the model's output structure.
# For Jina Reranker, the output is typically a logit score.
# Higher values usually indicate higher relevance. Check the original model card.
print(f"Number of outputs: {len(outputs)}")
if len(outputs) > 0:
    logits = outputs[0]
    print(f"Output logits shape: {logits.shape}")
    # Often, the relevance score is associated