---
language: multilingual
license: mit
tags:
- onnx
- optimum
- text-embedding
- onnxruntime
- opset19
- sentence-similarity
- gpu
- optimized
datasets:
- mmarco
pipeline_tag: sentence-similarity
---
# gte-multilingual-reranker-base-onnx-op19-opt-gpu
This is an ONNX export of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), produced with ONNX opset 19 and graph-optimized for GPU inference.
## Model Details
- **Framework**: ONNX Runtime
- **ONNX Opset**: 19
- **Task**: sentence-similarity
- **Target Device**: GPU
- **Optimized**: Yes
- **Original Model**: [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base)
- **Exported On**: 2025-03-31
- **Author**: Converted by [Jaro](https://www.linkedin.com/in/jaroai/)
## Environment and Package Versions
| Package | Version |
| --- | --- |
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |
### Applied Optimizations
| Optimization | Setting |
| --- | --- |
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers Specific Optimizations Enabled | Yes |
| Gelu Fusion Enabled | Yes |
| Layer Norm Fusion Enabled | Yes |
| Attention Fusion Enabled | Yes |
| Skip Layer Norm Fusion Enabled | Yes |
| Gelu Approximation Enabled | Yes |
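For reference, here is a minimal sketch of how these settings map onto Optimum's `OptimizationConfig` (assuming optimum 1.24 as listed above; the exact script used for this export is not recorded in this card):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Attach the optimizer to an already-exported ONNX model
model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)

# Mirrors the table above: level 2 ("Extended"), GPU-targeted, FP16 off,
# transformers-specific fusions and GELU approximation enabled
optimization_config = OptimizationConfig(
    optimization_level=2,
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,
    enable_gelu_approximation=True,
)
optimizer.optimize(optimization_config=optimization_config, save_dir="onnx-optimized")
```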
## Usage
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load model and tokenizer (local export directory, or the Hub repo id)
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores query-document pairs
pairs = [["what is the capital of China?", "Beijing is the capital of China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference; the logits are the relevance scores
outputs = model(**inputs)
scores = outputs.logits.view(-1)
```
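Since the graph is optimized for GPU, you would typically load it with the CUDA execution provider (a sketch; requires the `onnxruntime-gpu` package):

```python
# Select the CUDA execution provider for GPU inference
model = ORTModelForSequenceClassification.from_pretrained(
    "onnx", provider="CUDAExecutionProvider"
)
```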
## Export Process
This model was exported to ONNX with Hugging Face's Optimum library using opset 19. Graph optimizations targeting GPU devices were applied during export.
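The exact invocation is not recorded here; the following is a plausible reconstruction with Optimum's Python export API (the `task` and `trust_remote_code` values are assumptions, since the base repository ships custom modeling code):

```python
from optimum.exporters.onnx import main_export

# Re-export the base model to ONNX with opset 19 (illustrative reconstruction)
main_export(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",
    task="text-classification",
    opset=19,
    trust_remote_code=True,  # assumption: the base repo uses custom modeling code
)
```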
## Performance
ONNX Runtime typically reduces inference latency relative to the native PyTorch model, especially once graph optimizations such as operator fusion are applied and the model runs on a matching execution provider. Actual gains depend on hardware, batch size, and sequence length.
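A rough way to check latency on your own hardware, continuing from the Usage snippet above (illustrative only; results vary with batch size, sequence length, and execution provider):

```python
import time

# Warm up once, then average over repeated runs
model(**inputs)
start = time.perf_counter()
runs = 20
for _ in range(runs):
    model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```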