---

library_name: onnx
pipeline_tag: translation
language:
  - ar
  - bg
  - zh
  - cs
  - da
  - nl
  - en
  - fi
  - fr
  - de
  - el
  - gu
  - he
  - hi
  - hu
  - id
  - it
  - ja
  - ko
  - fa
  - pl
  - pt
  - ro
  - ru
  - sk
  - es
  - sv
  - tl
  - th
  - tr
  - uk
  - vi
license: gemma
tags:
- onnx
- onnxruntime
- optimum
- translation
- gemma
- int4
- quantized
- cuda
- directml
base_model: google/gemma-3-4b-pt
base_model_relation: quantized
model-index:
- name: YanoljaNEXT-Rosetta-4B-ONNX
  results:
  - task:
      type: translation
      name: Translation
    metrics:
    - type: bleu
      value: 31.5
      name: BLEU Score
---


# YanoljaNEXT-Rosetta-4B-2510-ONNX

## Introduction
This repository hosts Pangaia Software's optimized versions of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model to accelerate inference with ONNX Runtime.

Optimized models are published here in ONNX format to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

Here are some of the optimized configurations we have added:  

1. ONNX model for int4 CPU: targets CPU and mobile, using int4 quantization via RTN (round-to-nearest).
2. ONNX model for int4 GPU: targets GPU, using int4 quantization via RTN (round-to-nearest).

## Model Run
For CPU:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it locally)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cpu_and_mobile/*" --local-dir .

# Install the CPU package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai
```

For CUDA:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it locally)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cuda/*" --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda
```

For GPU:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it locally)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "gpu/*" --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda
```

For DirectML:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it locally)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "directml/*" --local-dir .

# Install the DML package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-directml
```

Execution:

Refer to the [`ONNX Runtime GenAI`](https://github.com/microsoft/onnxruntime-genai) repo for the latest samples for model execution. 
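As a rough orientation until you consult those samples, the following is a minimal sketch of a token-by-token generation loop. It assumes a recent `onnxruntime-genai` release in which `Generator.append_tokens` is available (older releases passed input ids differently), and the function name `generate` and its parameters are illustrative, not part of this repository. The `prompt` argument is expected to already follow the prompt template given in this section.

```python
def generate(model_dir: str, prompt: str, max_length: int = 512) -> str:
    """Generate text for an already-formatted prompt and return it as a string."""
    # Imported lazily so this file can be loaded where the package is absent.
    import onnxruntime_genai as og

    model = og.Model(model_dir)         # folder holding the downloaded ONNX files
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()  # decodes tokens incrementally

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)  # upper bound on sequence length

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        token = generator.get_next_tokens()[0]
        pieces.append(stream.decode(token))
    return "".join(pieces)


# Hypothetical usage; the path depends on where the files were downloaded:
# text = generate("./cpu_and_mobile", prompt)
```

The API of ONNX Runtime GenAI has changed between pre-release versions, so treat the upstream samples as the authority if any call above does not match your installed version.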

**Note**: since this is a Gemma-based model, use the corresponding prompt template:

```
System = "<start_of_turn>instruction\n{{CONTENT}}<end_of_turn>\n",
User = "<start_of_turn>source\n{{CONTENT}}<end_of_turn>\n",
Assistant = "<start_of_turn>translation\n{{CONTENT}}<end_of_turn>\n",
Stop = ["<end_of_turn>", "<start_of_turn>"]
```
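To make the template concrete, here is a small sketch that assembles a prompt from it and trims generated text at the stop strings. The helper names (`build_prompt`, `truncate_at_stop`) and the example instruction wording are assumptions for illustration; the sketch also assumes that generation is triggered by leaving the `translation` turn open, and that the tokenizer adds any beginning-of-sequence token itself.

```python
# Turn markers copied from the prompt template; {content} replaces {{CONTENT}}.
SYSTEM = "<start_of_turn>instruction\n{content}<end_of_turn>\n"
USER = "<start_of_turn>source\n{content}<end_of_turn>\n"
ASSISTANT_OPEN = "<start_of_turn>translation\n"  # left open for the model to fill
STOP = ["<end_of_turn>", "<start_of_turn>"]


def build_prompt(instruction: str, source: str) -> str:
    """Join the instruction and source turns, then open the translation turn."""
    return (
        SYSTEM.format(content=instruction)
        + USER.format(content=source)
        + ASSISTANT_OPEN
    )


def truncate_at_stop(text: str) -> str:
    """Cut generated text at the earliest stop string, if one appears."""
    cut = len(text)
    for stop in STOP:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Placeholder instruction; check the original model card for recommended wording.
prompt = build_prompt("Translate the text from English to Korean.", "Good morning.")
print(prompt)
```

Applying `truncate_at_stop` to the raw output is only needed when the runtime does not already halt on these strings.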


## Model Description
- Developed by: Pangaia Software
- Model type: ONNX
- License: gemma
- Model Description: This is a conversion of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model for ONNX Runtime inference. The source model is in turn based on the [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt) model.

**Disclaimer:** This model is only an optimization of the base model; any risk associated with it is the responsibility of the user. Please verify and test it for your scenarios. With the optimizations applied, output may differ slightly from that of the base model.

## License
This model is released under the Gemma license, inherited from its base model, [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt). Please consult the official [Gemma license terms](https://ai.google.dev/gemma/terms) for detailed usage guidelines.