---
library_name: onnx
tags:
- text-reranking
- jina
- onnx
- fp16
pipeline_tag: sentence-similarity
base_model:
- jinaai/jina-reranker-m0
---

# Jina Reranker M0 - ONNX FP16 Version

This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.

## Model Description

Jina Reranker reranks search results or document passages by their relevance to a given query: it takes a query and a list of documents as input and outputs a relevance score for each. This version is exported specifically for use with ONNX Runtime.

**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)

## Technical Details

* **Format:** ONNX
* **Opset:** 14
* **Precision:** FP16 (exported using `.half()`)
* **External Data:** Uses the ONNX external data format due to model size. All files in this repository are required; `huggingface_hub` downloads them automatically.
* **Export Source:** Exported from the Hugging Face `transformers` library using `torch.onnx.export`.

## Usage

You can use this model with `onnxruntime` for inference. You will also need the `transformers` library to load the appropriate processor for input preparation and `huggingface_hub` to download the model files.

**1. Installation:**

```bash
pip install onnxruntime huggingface_hub transformers torch sentencepiece
```

**2. Inference Script:**

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor

# --- Configuration ---
# Replace with your repository ID if different
repo_id = "jian-mo/jina-reranker-m0-onnx"
onnx_filename = "jina-reranker-m0.onnx"  # Main ONNX file name
# Use the original model ID to load the correct processor
original_model_id = "jinaai/jina-reranker-m0"
# --- End Configuration ---

# 1. Download the ONNX model files from the Hub.
# hf_hub_download automatically handles external data files linked via LFS.
print(f"Downloading ONNX model from {repo_id}...")
local_onnx_path = hf_hub_download(repo_id=repo_id, filename=onnx_filename)
print(f"ONNX model downloaded to: {local_onnx_path}")

# 2. Load the ONNX Runtime session.
print("Loading ONNX Inference Session...")
# You can choose execution providers, e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'],
# if you have GPU support and the matching onnxruntime build.
session_options = ort.SessionOptions()
# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
providers = ["CPUExecutionProvider"]  # Default to CPU
session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
print(f"ONNX session loaded with providers: {session.get_providers()}")

# 3. Load the processor.
print(f"Loading processor from {original_model_id}...")
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
print("Processor loaded.")

# 4. Prepare the input data.
query = "What is deep learning?"
document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
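
# (Optional) Inspect the input names and shapes the exported graph expects
# before building the feed dict. This is a generic ONNX Runtime check, not
# anything specific to this export, but it helps confirm the names below.
for model_input in session.get_inputs():
    print(f"  expects input: {model_input.name}, shape: {model_input.shape}, type: {model_input.type}")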
# Example with multiple documents (batch processing); see the helper sketch
# after this script for one way to score and sort several documents:
# documents = [
#     "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
#     "Artificial intelligence refers to the simulation of human intelligence in machines.",
#     "A transformer is a deep learning model used primarily in the field of natural language processing.",
# ]

print("Preparing input data...")
# Process the query and document together as a single query-document pair,
# as expected by the reranker model.
inputs = processor(
    text=f"{query} {document}",
    images=None,  # Text-only reranking
    return_tensors="pt",  # Get PyTorch tensors first
    padding=True,
    truncation=True,
    max_length=512,  # Use a reasonable max_length
)

# Convert to NumPy for ONNX Runtime
inputs_np = {
    "input_ids": inputs["input_ids"].numpy(),
    "attention_mask": inputs["attention_mask"].numpy(),
}
print("Input data prepared.")
# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})

# 5. Run inference.
print("Running inference...")
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, inputs_np)
print("Inference complete.")

# 6. Process the output.
# The exact interpretation depends on the model's output structure. For
# Jina Reranker, the output is typically a logit score, and higher values
# usually indicate higher relevance.
print(f"Number of outputs: {len(outputs)}")
if len(outputs) > 0:
    logits = outputs[0]
    print(f"Output logits shape: {logits.shape}")
    # Often, the relevance score is associated with a specific logit;
    # check the original model card for the exact mapping from logits
    # to relevance scores.
```
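
**3. Reranking Multiple Documents (sketch):**

To rerank several documents against one query, you can score each query-document pair and sort by score. The sketch below reuses the `session` and `processor` objects from the script above; the `query + document` concatenation and the use of the first output value as a relevance logit are assumptions carried over from the single-pair example, so verify them against the original model card before relying on the scores.

```python
import numpy as np

def rerank(query, documents):
    """Score each (query, document) pair and return documents sorted by score.

    Sketch only: assumes the pair format from the example above and that the
    first value of the model's first output can serve as a relevance logit.
    """
    scores = []
    for doc in documents:
        encoded = processor(
            text=f"{query} {doc}",
            images=None,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        feed = {
            "input_ids": encoded["input_ids"].numpy(),
            "attention_mask": encoded["attention_mask"].numpy(),
        }
        logits = session.run(None, feed)[0]  # None -> all outputs; take the first
        # Take the first logit as the relevance score (assumption; the real
        # post-processing may differ, see the original model card).
        scores.append(float(np.asarray(logits).reshape(-1)[0]))
    # Highest score first
    return sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)

documents = [
    "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
    "Artificial intelligence refers to the simulation of human intelligence in machines.",
    "A transformer is a deep learning model used primarily in the field of natural language processing.",
]
for doc, score in rerank("What is deep learning?", documents):
    print(f"{score:8.4f}  {doc}")
```

Scoring pairs one at a time keeps the example simple; for larger candidate sets you may want to batch several pairs into a single `session.run` call, padding them to a common length.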