mT5-Small (Maltese News Categories)
This model is a fine-tuned version of google/mt5-small on the MLRS/maltese_news_categories dataset. It achieves the following results on the test set:
- Loss: 0.5892
- F1: 0.5247
Intended uses & limitations
The model is fine-tuned on a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited. A minimal usage sketch is shown below.
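The snippet below is a minimal inference sketch, not taken from the model card: it assumes the model generates category labels as text for a given Maltese news passage via the standard `text2text-generation` pipeline. The example sentence and generation settings are illustrative.

```python
# Minimal inference sketch: load the fine-tuned checkpoint and generate a
# category label for a Maltese news passage (example text is illustrative).
from transformers import pipeline

classifier = pipeline(
    "text2text-generation",
    model="MLRS/mt5-small_maltese-news-categories",
)

text = "Il-Prim Ministru ħabbar miżuri ġodda fil-baġit għas-sena d-dieħla."
print(classifier(text, max_new_tokens=16)[0]["generated_text"])
```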
Training procedure
The model was fine-tuned using a customised script; a sketch of an equivalent configuration is given after the hyperparameter list below.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
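
The following is a minimal sketch, not the authors' customised script, of how the listed hyperparameters could map onto Hugging Face `Seq2SeqTrainingArguments`. The output directory name is illustrative, and dataset preprocessing, the `compute_metrics` function, and the `Seq2SeqTrainer` wiring are omitted.

```python
# Sketch of a training configuration matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_maltese-news-categories",  # illustrative
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",             # Adafactor with no additional arguments
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # required for early stopping on validation F1
    metric_for_best_model="f1",    # assumes compute_metrics returns an "f1" key
)

# Early stopping with patience 20, passed to the trainer via `callbacks=[...]`.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```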
Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---|---|---|---|---|
| No log | 1.0 | 337 | 0.8587 | 0.3587 |
| 2.0301 | 2.0 | 674 | 0.8575 | 0.2998 |
| 0.9165 | 3.0 | 1011 | 0.6689 | 0.4065 |
| 0.9165 | 4.0 | 1348 | 0.6344 | 0.4198 |
| 0.7345 | 5.0 | 1685 | 0.6038 | 0.4447 |
| 0.6836 | 6.0 | 2022 | 0.6210 | 0.4416 |
| 0.6836 | 7.0 | 2359 | 0.5396 | 0.4754 |
| 0.6346 | 8.0 | 2696 | 0.5455 | 0.4775 |
| 0.5773 | 9.0 | 3033 | 0.5194 | 0.4861 |
| 0.5773 | 10.0 | 3370 | 0.5245 | 0.4884 |
| 0.5429 | 11.0 | 3707 | 0.5024 | 0.4899 |
| 0.5212 | 12.0 | 4044 | 0.4950 | 0.4872 |
| 0.5212 | 13.0 | 4381 | 0.4842 | 0.5041 |
| 0.4964 | 14.0 | 4718 | 0.5221 | 0.4931 |
| 0.4977 | 15.0 | 5055 | 0.5183 | 0.4740 |
| 0.4977 | 16.0 | 5392 | 0.5177 | 0.4901 |
| 0.4808 | 17.0 | 5729 | 0.4934 | 0.5075 |
| 0.4622 | 18.0 | 6066 | 0.5008 | 0.5195 |
| 0.4622 | 19.0 | 6403 | 0.4905 | 0.5345 |
| 0.4578 | 20.0 | 6740 | 0.5069 | 0.5278 |
| 0.4434 | 21.0 | 7077 | 0.4920 | 0.5223 |
| 0.4434 | 22.0 | 7414 | 0.5079 | 0.5271 |
| 0.43 | 23.0 | 7751 | 0.5004 | 0.5204 |
| 0.4209 | 24.0 | 8088 | 0.5059 | 0.5363 |
| 0.4209 | 25.0 | 8425 | 0.5395 | 0.5185 |
| 0.4095 | 26.0 | 8762 | 0.5299 | 0.5249 |
| 0.4063 | 27.0 | 9099 | 0.5102 | 0.5226 |
| 0.4063 | 28.0 | 9436 | 0.5150 | 0.5022 |
| 0.3836 | 29.0 | 9773 | 0.5194 | 0.5326 |
| 0.3793 | 30.0 | 10110 | 0.5496 | 0.5203 |
| 0.3793 | 31.0 | 10447 | 0.5071 | 0.5275 |
| 0.3812 | 32.0 | 10784 | 0.5050 | 0.5239 |
| 0.3594 | 33.0 | 11121 | 0.5033 | 0.5296 |
| 0.3594 | 34.0 | 11458 | 0.5007 | 0.5455 |
| 0.3552 | 35.0 | 11795 | 0.5199 | 0.5331 |
| 0.3471 | 36.0 | 12132 | 0.5331 | 0.5180 |
| 0.3471 | 37.0 | 12469 | 0.5292 | 0.5392 |
| 0.3434 | 38.0 | 12806 | 0.5206 | 0.5392 |
| 0.3351 | 39.0 | 13143 | 0.5226 | 0.5285 |
| 0.3351 | 40.0 | 13480 | 0.5186 | 0.5256 |
| 0.3298 | 41.0 | 13817 | 0.5490 | 0.5517 |
| 0.3174 | 42.0 | 14154 | 0.5689 | 0.5234 |
| 0.3174 | 43.0 | 14491 | 0.5520 | 0.5205 |
| 0.3194 | 44.0 | 14828 | 0.5371 | 0.5145 |
| 0.301 | 45.0 | 15165 | 0.5831 | 0.5204 |
| 0.3013 | 46.0 | 15502 | 0.5832 | 0.5282 |
| 0.3013 | 47.0 | 15839 | 0.5852 | 0.5289 |
| 0.2946 | 48.0 | 16176 | 0.5476 | 0.5290 |
| 0.2923 | 49.0 | 16513 | 0.5757 | 0.5316 |
| 0.2923 | 50.0 | 16850 | 0.6236 | 0.5116 |
| 0.2773 | 51.0 | 17187 | 0.5951 | 0.5292 |
| 0.2795 | 52.0 | 17524 | 0.5958 | 0.5367 |
| 0.2795 | 53.0 | 17861 | 0.6318 | 0.5166 |
| 0.268 | 54.0 | 18198 | 0.6652 | 0.5204 |
| 0.2627 | 55.0 | 18535 | 0.6435 | 0.5074 |
| 0.2627 | 56.0 | 18872 | 0.6365 | 0.5213 |
| 0.2557 | 57.0 | 19209 | 0.5879 | 0.5217 |
| 0.2577 | 58.0 | 19546 | 0.6339 | 0.5322 |
| 0.2577 | 59.0 | 19883 | 0.6272 | 0.5278 |
| 0.2375 | 60.0 | 20220 | 0.6326 | 0.5406 |
| 0.2258 | 61.0 | 20557 | 0.6520 | 0.5265 |
Framework versions
- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
Evaluation results
- Macro-averaged F1 on MLRS/maltese_news_categories (MELABench Leaderboard): 55.170
