	---
	library_name: onnx
	pipeline_tag: translation
	language:
	- ar
	- bg
	- zh
	- cs
	- da
	- nl
	- en
	- fi
	- fr
	- de
	- el
	- gu
	- he
	- hi
	- hu
	- id
	- it
	- ja
	- ko
	- fa
	- pl
	- pt
	- ro
	- ru
	- sk
	- es
	- sv
	- tl
	- th
	- tr
	- uk
	- vi
	license: gemma
	tags:
	- onnx
	- onnxruntime
	- optimum
	- translation
	- gemma
	- int4
	- quantized
	- cuda
	- directml
	base_model: google/gemma-3-4b-pt
	base_model_relation: quantized
	model-index:
	- name: YanoljaNEXT-Rosetta-4B-ONNX
	  results:
	  - task:
	      type: translation
	      name: Translation
	    metrics:
	    - type: bleu
	      value: 31.5
	      name: BLEU Score
	---

	# YanoljaNEXT-Rosetta-4B-2510-ONNX

	## Introduction
	This repository hosts Pangaia Software's optimized versions of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model to accelerate inference with ONNX Runtime.

	Optimized models are published here in ONNX format to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

	Here are some of the optimized configurations we have added:

	1. int4 CPU: ONNX model for CPU and mobile, quantized to int4 via RTN (round-to-nearest).
	2. int4 GPU: ONNX model for GPU, quantized to int4 via RTN (round-to-nearest).
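To make the term concrete, here is a minimal sketch of round-to-nearest (RTN) int4 quantization on a single block of weights. It assumes a symmetric scheme with one scale per block; the block size, symmetry, and other settings actually used for this conversion are not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple

# Signed 4-bit integer range.
INT4_MIN, INT4_MAX = -8, 7


def quantize_block_rtn(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one block of weights to int4 by rounding to the nearest level.

    Assumption: symmetric quantization with a single scale per block.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to INT4_MAX; avoid dividing by zero.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    quantized = [
        min(INT4_MAX, max(INT4_MIN, round(w / scale))) for w in weights
    ]
    return quantized, scale


def dequantize_block(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int4 values and the block scale."""
    return [q * scale for q in quantized]


block = [0.7, -0.28, 0.14, 0.0]
q, scale = quantize_block_rtn(block)
restored = dequantize_block(q, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the source of the small output differences relative to the base model.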

	## Model Run
	For CPU:

	```bash
	# Download the model directly using the Hugging Face CLI
	huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cpu_and_mobile/*" --local-dir .

	# Install the CPU package of ONNX Runtime GenAI
	pip install --pre onnxruntime-genai
	```

	For CUDA:

	```bash
	# Download the model directly using the Hugging Face CLI
	huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cuda/*" --local-dir .

	# Install the CUDA package of ONNX Runtime GenAI
	pip install --pre onnxruntime-genai-cuda
	```

	For GPU:

	```bash
	# Download the model directly using the Hugging Face CLI
	huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "gpu/*" --local-dir .

	# Install the CUDA package of ONNX Runtime GenAI
	pip install --pre onnxruntime-genai-cuda
	```

	For DirectML:

	```bash
	# Download the model directly using the Hugging Face CLI
	huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "directml/*" --local-dir .

	# Install the DML package of ONNX Runtime GenAI
	pip install --pre onnxruntime-genai-directml
	```

	Execution:

	Refer to the [`ONNX Runtime GenAI`](https://github.com/microsoft/onnxruntime-genai) repo for the latest samples for model execution.
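For orientation, a token-by-token generation loop might look like the sketch below. It assumes a recent `onnxruntime-genai` release in which `Generator.append_tokens` is available; older releases used a different way of passing input tokens, so treat the official samples as authoritative. The function name `generate_text` and the `model_dir` argument are hypothetical.

```python
def generate_text(model_dir: str, prompt: str, max_length: int = 512) -> str:
    """Generate text with ONNX Runtime GenAI from an already formatted prompt.

    Assumption: `model_dir` points at a downloaded folder containing the
    ONNX model and its genai_config.json.
    """
    # Imported lazily so this module loads even without the package installed.
    import onnxruntime_genai as og

    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()

    params = og.GeneratorParams(model)
    # max_length counts prompt tokens plus generated tokens.
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        token = generator.get_next_tokens()[0]
        pieces.append(stream.decode(token))
    return "".join(pieces)
```

A call would then be `generate_text("path/to/model_dir", prompt)`, where the path is a placeholder for wherever the model files were downloaded.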

	Note: since this is a Gemma-based model, use the corresponding prompt template:

	```
	System = "<start_of_turn>instruction\n{{CONTENT}}<end_of_turn>\n",
	User = "<start_of_turn>source\n{{CONTENT}}<end_of_turn>\n",
	Assistant = "<start_of_turn>translation\n{{CONTENT}}<end_of_turn>\n",
	Stop = ["<end_of_turn>", "<start_of_turn>"]
	```
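The template above can be assembled with plain string formatting. The sketch below is one way to do it, assuming the `translation` turn is left open so the model writes the translation, and that output is cut at the first stop sequence. The helper names and the sample instruction text are hypothetical, not part of the original model card.

```python
from typing import List

# Stop sequences taken from the template above.
STOP_SEQUENCES: List[str] = ["<end_of_turn>", "<start_of_turn>"]


def build_prompt(instruction: str, source: str) -> str:
    """Fill the instruction and source turns and open the translation turn."""
    return (
        f"<start_of_turn>instruction\n{instruction}<end_of_turn>\n"
        f"<start_of_turn>source\n{source}<end_of_turn>\n"
        "<start_of_turn>translation\n"
    )


def truncate_at_stop(text: str, stops: List[str] = STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest stop sequence, if any is present."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical instruction wording, for illustration only.
prompt = build_prompt("Translate the following text to Korean.", "Hello, world!")
```

The string returned by `build_prompt` is what gets tokenized and passed to the runtime; `truncate_at_stop` is only needed if the runtime does not already stop at these tokens.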


	## Model Description
	- Developed by: Pangaia Software
	- Model type: ONNX
	- License: gemma
	- Model Description: This is a conversion of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model for ONNX Runtime inference, which in turn is based on the [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt) model.

	Disclaimer: This model is only an optimization of the base model; any risk associated with it is the responsibility of the user. Please verify and test it for your scenarios. With the optimizations applied, output may differ slightly from that of the base model.

	## License
	This model is released under the Gemma license, inherited from its base model, [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt). Please consult the official [Gemma license terms](https://ai.google.dev/gemma/terms) for detailed usage guidelines.