---

library_name: onnx
pipeline_tag: translation
language:
  - ar
  - bg
  - zh
  - cs
  - da
  - nl
  - en
  - fi
  - fr
  - de
  - el
  - gu
  - he
  - hi
  - hu
  - id
  - it
  - ja
  - ko
  - fa
  - pl
  - pt
  - ro
  - ru
  - sk
  - es
  - sv
  - tl
  - th
  - tr
  - uk
  - vi
license: gemma
tags:
- onnx
- onnxruntime
- optimum
- translation
- gemma
- int4
- quantized
- cuda
- directml
base_model: google/gemma-3-4b-pt
base_model_relation: quantized
model-index:
- name: YanoljaNEXT-Rosetta-4B-ONNX
  results:
  - task:
      type: translation
      name: Translation
    metrics:
    - type: bleu
      value: 31.5
      name: BLEU Score
---


# YanoljaNEXT-Rosetta-4B-2510-ONNX

## Introduction
This repository hosts Pangaia Software's optimized versions of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model to accelerate inference with ONNX Runtime.

Optimized models are published here in ONNX format to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

Here are some of the optimized configurations we have added:  

1. int4 CPU: ONNX model for CPU and mobile, quantized to int4 with round-to-nearest (RTN).
2. int4 GPU: ONNX model for GPU, quantized to int4 with round-to-nearest (RTN).

## Model Run
For CPU:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it against local files)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cpu_and_mobile/*" --local-dir .

# Install the CPU package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai
```

For CUDA:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it against local files)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "cuda/*" --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda
```

For GPU:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it against local files)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "gpu/*" --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda
```

For DirectML:

```bash
# Download the model directly using the Hugging Face CLI
# (the pattern is quoted so the shell does not expand it against local files)
huggingface-cli download PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx --include "directml/*" --local-dir .

# Install the DML package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-directml
```

Execution:

Refer to the [`ONNX Runtime GenAI`](https://github.com/microsoft/onnxruntime-genai) repo for the latest samples for model execution. 
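
As a rough sketch of what those samples do, the function below runs a token-by-token generation loop with the `onnxruntime_genai` Python package. It assumes a recent release in which `Generator.append_tokens` is available (older releases used a different input API), and it expects a prompt that is already formatted for this model. The function name `generate_translation`, the `model_dir` argument, and the default `max_length` are illustrative choices, not part of this repository.

```python
def generate_translation(model_dir: str, prompt: str, max_length: int = 1024) -> str:
    """Generate text for an already formatted prompt (illustrative sketch)."""
    # Imported lazily so this module can be loaded without the package installed.
    import onnxruntime_genai as og

    # model_dir is the folder that holds the downloaded ONNX model and its
    # genai_config.json, for example one of the folders fetched above.
    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()

    params = og.GeneratorParams(model)
    # max_length counts prompt tokens plus generated tokens.
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        token = generator.get_next_tokens()[0]
        # The tokenizer stream decodes one token at a time into text.
        pieces.append(stream.decode(token))
    return "".join(pieces)
```

Check the samples in the linked repo against the package version you installed, since the generation API has changed between releases.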

**Note**: since this is a Gemma-based model, use the corresponding prompt template:

```
System = "<start_of_turn>instruction\n{{CONTENT}}<end_of_turn>\n",
User = "<start_of_turn>source\n{{CONTENT}}<end_of_turn>\n",
Assistant = "<start_of_turn>translation\n{{CONTENT}}<end_of_turn>\n",
Stop = ["<end_of_turn>", "<start_of_turn>"]
```
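
The template above can be assembled with a small helper. This is a minimal sketch, assuming that generation is triggered by ending the prompt with an open `translation` turn and that the output should be cut at the first stop sequence. The names `build_prompt` and `trim_at_stop` are hypothetical, and the instruction text in the usage line is only a placeholder.

```python
from typing import Optional, Sequence

# Stop sequences copied from the template above.
STOP_SEQUENCES = ("<end_of_turn>", "<start_of_turn>")


def build_prompt(source: str, instruction: Optional[str] = None) -> str:
    """Format an optional instruction and the source text into a prompt."""
    parts = []
    if instruction:
        parts.append(f"<start_of_turn>instruction\n{instruction}<end_of_turn>\n")
    parts.append(f"<start_of_turn>source\n{source}<end_of_turn>\n")
    # Assumption: leave the translation turn open so the model completes it.
    parts.append("<start_of_turn>translation\n")
    return "".join(parts)


def trim_at_stop(text: str, stops: Sequence[str] = STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest stop sequence, if any is present."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    # Placeholder instruction; adapt it to the target language you need.
    print(build_prompt("Hello, world!", "Translate the text to Korean."))
```

Trimming at the stop sequences in post-processing is useful when the runtime does not already stop at `<end_of_turn>` on its own.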


## Model Description
- Developed by: Pangaia Software
- Model type: ONNX
- License: gemma
- Model Description: This is a conversion of the [`YanoljaNEXT-Rosetta-4B-2510`](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2510) model for ONNX Runtime inference. The source model is in turn based on the [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt) model.

**Disclaimer:** This model is only an optimization of the base model. Any risk associated with the model is the responsibility of its user. Please verify and test it for your scenarios. With the optimizations applied, the output may differ slightly from that of the base model.

## License
This model is released under the Gemma license, inherited from its base model, [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt). Please consult the official [Gemma license terms](https://ai.google.dev/gemma/terms) for detailed usage guidelines.