---
library_name: onnx
tags:
- text-reranking
- jina
- onnx
- fp16
pipeline_tag: sentence-similarity
base_model:
- jinaai/jina-reranker-m0
---
# Jina Reranker M0 - ONNX FP16 Version
This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.
## Model Description
Jina Reranker is designed to rerank search results or document passages based on their relevance to a given query. It takes a query and a list of documents as input and outputs relevance scores.
This version is specifically exported for use with ONNX Runtime.
**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)
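In code terms, the reranking contract looks like the hypothetical helper below; `rerank` is illustrative only and stands in for the actual ONNX Runtime pipeline shown under Usage.
```python
# Hypothetical signature for illustration only; the Usage section below
# shows the real ONNX Runtime pipeline.
def rerank(query: str, documents: list[str]) -> list[float]:
    """Return one relevance score per document; higher means more relevant."""
    raise NotImplementedError
```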
## Technical Details
* **Format:** ONNX
* **Opset:** 14
* **Precision:** FP16 (exported using `.half()`)
* **External Data:** Uses the ONNX external data format due to model size. All files in this repository are required; the usage script below fetches the full repository with `snapshot_download` so the external data files sit next to the main `.onnx` file.
* **Export Source:** Exported from the Hugging Face `transformers` library using `torch.onnx.export`; a minimal sketch of the workflow follows.
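The exact export script is not part of this repository; the snippet below is a minimal sketch of such a workflow, assuming the model loads through `transformers` with `trust_remote_code=True` and that only the text inputs (`input_ids`, `attention_mask`) are traced. Input names, dummy shapes, and the output name are illustrative, not the exact values used.
```python
# Minimal export sketch (assumed workflow; names and shapes are illustrative).
# FP16 tracing typically requires a CUDA device; move the model and inputs
# there if CPU tracing of half-precision ops fails.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-reranker-m0", trust_remote_code=True
).half().eval()

input_ids = torch.ones(1, 16, dtype=torch.int64)
attention_mask = torch.ones(1, 16, dtype=torch.int64)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "jina-reranker-m0.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```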
## Usage
You can use this model with `onnxruntime` for inference. You will also need the `transformers` library to load the appropriate processor for input preparation and `huggingface_hub` to download the model files.
**1. Installation:**
```bash
pip install onnxruntime huggingface_hub transformers torch sentencepiece
```
**2. Inference Script:**
```python
import os

import numpy as np
import onnxruntime as ort
import torch  # Needed for the processor's PyTorch tensor outputs
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
# --- Configuration ---
# Replace with your repository ID if different
repo_id = "jian-mo/jina-reranker-m0-onnx"
onnx_filename = "jina-reranker-m0.onnx" # Main ONNX file name
# Use the original model ID to load the correct processor
original_model_id = "jinaai/jina-reranker-m0"
# --- End Configuration ---
# 1. Download the ONNX model files from the Hub.
# The model uses the ONNX external data format, so download the whole
# repository to keep the external data files next to the main .onnx file.
print(f"Downloading ONNX model from {repo_id}...")
local_dir = snapshot_download(repo_id=repo_id)
local_onnx_path = os.path.join(local_dir, onnx_filename)
print(f"ONNX model downloaded to: {local_onnx_path}")
# 2. Load ONNX Runtime session
print("Loading ONNX Inference Session...")
# You can choose execution providers, e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
# if you have GPU support and the necessary onnxruntime build.
session_options = ort.SessionOptions()
# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
providers = ['CPUExecutionProvider'] # Default to CPU
session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
print(f"ONNX session loaded with provider: {session.get_providers()}")
# 3. Load the Processor
print(f"Loading processor from {original_model_id}...")
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
print("Processor loaded.")
# 4. Prepare Input Data
query = "What is deep learning?"
document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
# Example with multiple documents (batch processing)
# documents = [
# "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
# "Artificial intelligence refers to the simulation of human intelligence in machines.",
# "A transformer is a deep learning model used primarily in the field of natural language processing."
# ]
# To rerank multiple documents, score each (query, document) pair separately;
# see the batch sketch after this script.
print("Preparing input data...")
# Process query and document together as expected by the reranker model
inputs = processor(
    text=f"{query} {document}",
    images=None,  # Text-only reranking
    return_tensors="pt",  # Get PyTorch tensors first
    padding=True,
    truncation=True,
    max_length=512,  # A reasonable max_length for a query + document pair
)
# Convert the tensors to NumPy arrays for ONNX Runtime
inputs_np = {
    "input_ids": inputs["input_ids"].numpy(),
    "attention_mask": inputs["attention_mask"].numpy(),
}
print("Input data prepared.")
# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})
# 5. Run Inference
print("Running inference...")
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, inputs_np)
print("Inference complete.")
# 6. Process Output
# For Jina Reranker, the output is a relevance logit per (query, document)
# pair; higher values indicate higher relevance. Check the original model
# card for the exact output semantics.
print(f"Number of outputs: {len(outputs)}")
if len(outputs) > 0:
    logits = outputs[0]
    print(f"Output logits shape: {logits.shape}")
    # Assuming the exported graph returns one score per pair, squeeze away
    # the batch dimension to get the relevance score
    scores = np.asarray(logits).squeeze()
    print(f"Relevance score(s): {scores}")
```
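**3. Reranking Multiple Documents (optional):**
To rerank several candidates, a simple pattern is one forward pass per (query, document) pair followed by a sort. The sketch below is illustrative: it reuses the `session`, `processor`, and `query` objects from the script above and assumes the graph returns a single score per pair.
```python
# Minimal batch-reranking sketch, reusing `session`, `processor`, and
# `query` from the inference script above. The documents are illustrative.
documents = [
    "Deep learning is a subset of machine learning based on artificial neural networks.",
    "Artificial intelligence refers to the simulation of human intelligence in machines.",
    "A transformer is a deep learning model used primarily in natural language processing.",
]
scores = []
for doc in documents:
    feats = processor(
        text=f"{query} {doc}",
        images=None,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,
    )
    feats_np = {
        "input_ids": feats["input_ids"].numpy(),
        "attention_mask": feats["attention_mask"].numpy(),
    }
    logits = session.run(None, feats_np)[0]  # None = fetch all outputs; take the first
    scores.append(float(np.asarray(logits).squeeze()))  # assumes one score per pair
# Print documents from most to least relevant
for score, doc in sorted(zip(scores, documents), key=lambda p: p[0], reverse=True):
    print(f"{score:.4f}  {doc}")
```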