---
language: multilingual
license: mit
tags:
- onnx
- optimum
- text-embedding
- onnxruntime
- opset19
- sentence-similarity
- gpu
- optimized
datasets:
- mmarco
pipeline_tag: sentence-similarity
---

# gte-multilingual-reranker-base-onnx-op19-opt-gpu

This model is an ONNX version of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), exported with ONNX opset 19.

## Model Details

- **Framework**: ONNX Runtime
- **ONNX Opset**: 19
- **Task**: sentence-similarity
- **Target Device**: GPU
- **Optimized**: Yes
- **Original Model**: [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base)
- **Exported On**: 2025-03-31
- **Author**: Modified by [Jaro](https://www.linkedin.com/in/jaroai/)

## Environment and Package Versions

| Package | Version |
| --- | --- |
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |

### Applied Optimizations

| Optimization | Setting |
| --- | --- |
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers-Specific Optimizations Enabled | Yes |
| GELU Fusion Enabled | Yes |
| Layer Norm Fusion Enabled | Yes |
| Attention Fusion Enabled | Yes |
| Skip Layer Norm Fusion Enabled | Yes |
| GELU Approximation Enabled | Yes |
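
These settings correspond to Optimum's `OptimizationConfig`. A sketch that mirrors the table ("Extended" maps to `optimization_level=2` in ONNX Runtime; the directory names are assumptions, not recorded paths):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Mirror the settings listed in the table above
optimization_config = OptimizationConfig(
    optimization_level=2,  # "Extended" graph optimizations
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,  # enables the fusions above
    enable_gelu_approximation=True,
)

# Assumed paths: "onnx" holds the raw export, "onnx-opt" the optimized result
model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(save_dir="onnx-opt", optimization_config=optimization_config)
```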

## Usage

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the exported model and tokenizer from the local "onnx"
# directory (or pass this repository's Hub id instead)
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores (query, passage) pairs
pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The Nile is a river in Africa."],
]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Higher logits indicate higher query-passage relevance
scores = model(**inputs).logits.squeeze(-1)
print(scores)
```
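
Since this export is optimized for GPU, inference should normally run on ONNX Runtime's CUDA execution provider. A minimal sketch, assuming the `onnxruntime-gpu` package is installed:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Select the CUDA execution provider (requires onnxruntime-gpu)
model = ORTModelForSequenceClassification.from_pretrained(
    "onnx", provider="CUDAExecutionProvider"
)
tokenizer = AutoTokenizer.from_pretrained("onnx")

# Move input tensors to the GPU before running inference
inputs = tokenizer(
    [["sample query", "sample passage"]],
    padding=True, truncation=True, return_tensors="pt",
).to("cuda")
outputs = model(**inputs)
```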

## Export Process

This model was exported to ONNX format using the Hugging Face Optimum library with opset 19. Graph optimization was applied during export, targeting GPU devices.
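
The exact export invocation is not recorded; a sketch of an equivalent export through Optimum's Python API (`task="text-classification"` matches the reranker head, `trust_remote_code=True` is assumed because the original model ships custom modeling code, and the output directory name is hypothetical):

```python
from optimum.exporters.onnx import main_export

# Hypothetical reconstruction of the export step
main_export(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",  # assumed output directory name
    task="text-classification",
    opset=19,
    trust_remote_code=True,
)
```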

## Performance

ONNX Runtime models generally offer better inference speed than native PyTorch models, especially in production environments where graph optimizations such as the fusions applied above take effect.
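
The actual speedup depends on hardware, batch size, and sequence length, so it is worth measuring on your own workload. A hypothetical micro-benchmark sketch, reusing the `model` and `tokenizer` from the Usage section:

```python
import time

# Build a fixed batch of (query, passage) pairs to score
pairs = [["sample query", "sample passage text"]] * 32
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Warm up, then average latency over repeated runs
for _ in range(3):
    model(**inputs)
runs = 20
start = time.perf_counter()
for _ in range(runs):
    model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```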