---
language: multilingual
license: mit
tags:
- onnx
- optimum
- text-embedding
- onnxruntime
- opset19
- sentence-similarity
- gpu
- optimized
datasets:
- mmarco
pipeline_tag: sentence-similarity
---
# gte-multilingual-reranker-base-onnx-op19-opt-gpu
This is an ONNX export of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), produced with ONNX opset 19 and graph-optimized for GPU inference.
## Model Details
- **Framework**: ONNX Runtime
- **ONNX Opset**: 19
- **Task**: sentence-similarity
- **Target Device**: GPU
- **Optimized**: Yes
- **Original Model**: [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base)
- **Exported On**: 2025-03-31
- **Author**: Converted by [Jaro](https://www.linkedin.com/in/jaroai/)
## Environment and Package Versions
| Package | Version |
| --- | --- |
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |
### Applied Optimizations
| Optimization | Setting |
| --- | --- |
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers Specific Optimizations Enabled | Yes |
| Gelu Fusion Enabled | Yes |
| Layer Norm Fusion Enabled | Yes |
| Attention Fusion Enabled | Yes |
| Skip Layer Norm Fusion Enabled | Yes |
| Gelu Approximation Enabled | Yes |
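For reference, here is a minimal sketch of how these settings map onto Optimum's `OptimizationConfig` (assuming optimum 1.24 as listed above; the exact script used for this export is not recorded in this card):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Attach the optimizer to an already-exported ONNX model
model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)

# Mirrors the table above: level 2 ("Extended"), GPU-targeted, FP16 off,
# transformers-specific fusions and GELU approximation enabled
optimization_config = OptimizationConfig(
    optimization_level=2,
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,
    enable_gelu_approximation=True,
)
optimizer.optimize(optimization_config=optimization_config, save_dir="onnx-optimized")
```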
## Usage
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load model and tokenizer (local export directory, or the Hub repo id)
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores query-document pairs
pairs = [["what is the capital of China?", "Beijing is the capital of China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference; the logits are the relevance scores
outputs = model(**inputs)
scores = outputs.logits.view(-1)
```
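Since the graph is optimized for GPU, you would typically load it with the CUDA execution provider (a sketch; requires the `onnxruntime-gpu` package):

```python
# Select the CUDA execution provider for GPU inference
model = ORTModelForSequenceClassification.from_pretrained(
    "onnx", provider="CUDAExecutionProvider"
)
```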
## Export Process
This model was exported to ONNX with Hugging Face's Optimum library using opset 19. Graph optimizations targeting GPU devices were applied during export.
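The exact invocation is not recorded here; the following is a plausible reconstruction with Optimum's Python export API (the `task` and `trust_remote_code` values are assumptions, since the base repository ships custom modeling code):

```python
from optimum.exporters.onnx import main_export

# Re-export the base model to ONNX with opset 19 (illustrative reconstruction)
main_export(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",
    task="text-classification",
    opset=19,
    trust_remote_code=True,  # assumption: the base repo uses custom modeling code
)
```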
## Performance
ONNX Runtime typically reduces inference latency relative to the native PyTorch model, especially once graph optimizations such as operator fusion are applied and the model runs on a matching execution provider. Actual gains depend on hardware, batch size, and sequence length.
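A rough way to check latency on your own hardware, continuing from the Usage snippet above (illustrative only; results vary with batch size, sequence length, and execution provider):

```python
import time

# Warm up once, then average over repeated runs
model(**inputs)
start = time.perf_counter()
runs = 20
for _ in range(runs):
    model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```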