Itzune v1.9 EN -> EU machine translation argos model
This model was trained with the argostrain training scripts on 11,542,706 English-Basque parallel strings extracted from datasets obtained directly from the OPUS project.
Model description
- Developed by: argostranslate
- Model type: translation
- Model version: v1.9
- Source Language: English
- Target Language: Basque
- License: MIT
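As an Argos model, it would normally be loaded through the Argos Translate Python package. The sketch below is a minimal illustration, not an official usage guide: it assumes `argostranslate` is installed and that the model package has been downloaded locally. The helper name `translate_en_eu` and the `model_path` argument are hypothetical.

```python
from pathlib import Path
from typing import Optional

FROM_CODE = "en"  # English (source language)
TO_CODE = "eu"    # Basque (target language)


def translate_en_eu(text: str, model_path: Optional[Path] = None) -> str:
    """Translate English text to Basque with Argos Translate.

    model_path is a hypothetical local path to the downloaded .argosmodel
    package; pass it on the first call to install the model, then omit it.
    """
    # Imported lazily so this file can be loaded without argostranslate installed.
    import argostranslate.package
    import argostranslate.translate

    if model_path is not None:
        # Registers the package with the local Argos Translate installation.
        argostranslate.package.install_from_path(model_path)
    return argostranslate.translate.translate(text, FROM_CODE, TO_CODE)
```

A first call might look like `translate_en_eu("Hello, world!", Path("model.argosmodel"))`, where `model.argosmodel` stands in for whatever file name the downloaded package actually has.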
Training Data
The English-Basque parallel sentences were collected from the following datasets:
| Dataset | Sentences before cleaning |
|---|---|
| CCMatrix v1 | 7,788,871 |
| OpenSubtitles v2018 | 805,780 |
| XLEnt v1.2 | 800,631 |
| GNOME v1 | 652,298 |
| HPLT v1.1 | 610,694 |
| EhuHac v1 | 585,210 |
| WikiMatrix v1 | 119,480 |
| KDE4 v2 | 100,160 |
| wikimedia v20230407 | 60,990 |
| bible-uedin v1 | 15,893 |
| Tatoeba v2023-04-12 | 2,070 |
| Wiktionary | 629 |
| Total | 11,542,706 |
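The composition of the corpus is easier to judge as percentages. The following sketch copies the counts from the table, re-derives the total, and prints each dataset's share of the raw (pre-cleaning) data. The variable and function names are illustrative only.

```python
# Sentence counts before cleaning, copied from the table above.
DATASETS = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(DATASETS.values())


def share(name: str) -> float:
    """Percentage of the raw corpus contributed by one dataset."""
    return 100 * DATASETS[name] / total


# Print datasets from largest to smallest with their share of the total.
for name, count in sorted(DATASETS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{count:>12,}{share(name):>7.1f}%")
print(f"{'Total':<22}{total:>12,}")
```

The sum matches the reported total of 11,542,706, and CCMatrix alone accounts for roughly two thirds of the raw sentences.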
Evaluation results
The tables below compare this model's English-to-Basque translation quality with Google Translate, NLLB 200 3.3B and mt-hitz-en-eu. A dash marks a test set on which a system was not evaluated.
BLEU scores (higher is better)
| Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | itzune 1.9 |
|---|---|---|---|---|
| Flores 200 devtest | 20.5 | 13.3 | 19.2 | 17.0 |
| TaCON | 12.1 | 9.4 | 8.8 | - |
| NTREX | 15.7 | 8.0 | 14.5 | - |
| Average | 16.1 | 10.2 | 14.2 | - |
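The Average row is the plain mean of the three test-set scores, which is why itzune 1.9, evaluated only on Flores 200 devtest, has no average. A short sketch with the values copied from the BLEU table (the names `BLEU` and `average` are illustrative):

```python
# BLEU scores copied from the table above; systems without a score on a
# test set simply omit that key.
BLEU = {
    "Google Translate": {"Flores 200 devtest": 20.5, "TaCON": 12.1, "NTREX": 15.7},
    "NLLB 200 3.3B": {"Flores 200 devtest": 13.3, "TaCON": 9.4, "NTREX": 8.0},
    "mt-hitz-en-eu": {"Flores 200 devtest": 19.2, "TaCON": 8.8, "NTREX": 14.5},
    "itzune 1.9": {"Flores 200 devtest": 17.0},
}
TEST_SETS = ("Flores 200 devtest", "TaCON", "NTREX")


def average(scores: dict) -> float:
    """Mean over the test sets, rounded to one decimal like the table."""
    return round(sum(scores.values()) / len(scores), 1)


# Only systems evaluated on every test set get an average.
averages = {
    system: average(scores)
    for system, scores in BLEU.items()
    if all(test_set in scores for test_set in TEST_SETS)
}

for system, value in averages.items():
    print(f"{system}: {value}")
```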
TER scores (lower is better)
| Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | itzune 1.9 |
|---|---|---|---|---|
| Flores 200 devtest | 59.5 | 70.4 | 65.0 | 70.1 |
| TaCON | 69.5 | 75.3 | 76.8 | - |
| NTREX | 65.8 | 81.6 | 66.7 | - |
| Average | 64.9 | 75.8 | 68.2 | - |