---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- MLRS/maltese_news_categories
model-index:
- name: mt5-small_maltese-news-categories
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: maltese_news_categories
      name: MLRS/maltese_news_categories
    metrics:
    - type: f1
      args: macro
      value: 55.17
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (Maltese News Categories)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [MLRS/maltese_news_categories](https://huggingface.co/datasets/MLRS/maltese_news_categories) dataset. It achieves the following results on the test set:

- Loss: 0.5892
- F1: 0.5247

## Intended uses & limitations

The model is fine-tuned for a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq_classification.py).
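Since mT5 is a text-to-text model, the fine-tuned classifier produces category labels as generated text rather than class logits. The following is a minimal inference sketch; the Hub repository id and the comma-separated label convention are assumptions, so consult the training script above for the exact input/output format.

```python
# Assumed Hub repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "MLRS/mt5-small_maltese-news-categories"


def load_classifier():
    """Load the fine-tuned checkpoint as a text2text-generation pipeline.

    mT5 is a seq2seq model, so topic categories come back as generated text.
    """
    from transformers import pipeline  # lazy import to keep the helper light

    return pipeline("text2text-generation", model=MODEL_ID)


def parse_labels(generated: str) -> list:
    """Split generated text into individual category labels.

    Assumes labels are emitted comma-separated, a common convention for
    seq2seq multi-label classification; verify against the training script.
    """
    return [label.strip() for label in generated.split(",") if label.strip()]


# Example usage (downloads the model weights):
# clf = load_classifier()
# out = clf("Il-gvern ħabbar il-baġit għas-sena d-dieħla.", max_new_tokens=16)
# print(parse_labels(out[0]["generated_text"]))
```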
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 337   | 0.8587          | 0.3587 |
| 2.0301        | 2.0   | 674   | 0.8575          | 0.2998 |
| 0.9165        | 3.0   | 1011  | 0.6689          | 0.4065 |
| 0.9165        | 4.0   | 1348  | 0.6344          | 0.4198 |
| 0.7345        | 5.0   | 1685  | 0.6038          | 0.4447 |
| 0.6836        | 6.0   | 2022  | 0.6210          | 0.4416 |
| 0.6836        | 7.0   | 2359  | 0.5396          | 0.4754 |
| 0.6346        | 8.0   | 2696  | 0.5455          | 0.4775 |
| 0.5773        | 9.0   | 3033  | 0.5194          | 0.4861 |
| 0.5773        | 10.0  | 3370  | 0.5245          | 0.4884 |
| 0.5429        | 11.0  | 3707  | 0.5024          | 0.4899 |
| 0.5212        | 12.0  | 4044  | 0.4950          | 0.4872 |
| 0.5212        | 13.0  | 4381  | 0.4842          | 0.5041 |
| 0.4964        | 14.0  | 4718  | 0.5221          | 0.4931 |
| 0.4977        | 15.0  | 5055  | 0.5183          | 0.4740 |
| 0.4977        | 16.0  | 5392  | 0.5177          | 0.4901 |
| 0.4808        | 17.0  | 5729  | 0.4934          | 0.5075 |
| 0.4622        | 18.0  | 6066  | 0.5008          | 0.5195 |
| 0.4622        | 19.0  | 6403  | 0.4905          | 0.5345 |
| 0.4578        | 20.0  | 6740  | 0.5069          | 0.5278 |
| 0.4434        | 21.0  | 7077  | 0.4920          | 0.5223 |
| 0.4434        | 22.0  | 7414  | 0.5079          | 0.5271 |
| 0.43          | 23.0  | 7751  | 0.5004          | 0.5204 |
| 0.4209        | 24.0  | 8088  | 0.5059          | 0.5363 |
| 0.4209        | 25.0  | 8425  | 0.5395          | 0.5185 |
| 0.4095        | 26.0  | 8762  | 0.5299          | 0.5249 |
| 0.4063        | 27.0  | 9099  | 0.5102          | 0.5226 |
| 0.4063        | 28.0  | 9436  | 0.5150          | 0.5022 |
| 0.3836        | 29.0  | 9773  | 0.5194          | 0.5326 |
| 0.3793        | 30.0  | 10110 | 0.5496          | 0.5203 |
| 0.3793        | 31.0  | 10447 | 0.5071          | 0.5275 |
| 0.3812        | 32.0  | 10784 | 0.5050          | 0.5239 |
| 0.3594        | 33.0  | 11121 | 0.5033          | 0.5296 |
| 0.3594        | 34.0  | 11458 | 0.5007          | 0.5455 |
| 0.3552        | 35.0  | 11795 | 0.5199          | 0.5331 |
| 0.3471        | 36.0  | 12132 | 0.5331          | 0.5180 |
| 0.3471        | 37.0  | 12469 | 0.5292          | 0.5392 |
| 0.3434        | 38.0  | 12806 | 0.5206          | 0.5392 |
| 0.3351        | 39.0  | 13143 | 0.5226          | 0.5285 |
| 0.3351        | 40.0  | 13480 | 0.5186          | 0.5256 |
| 0.3298        | 41.0  | 13817 | 0.5490          | 0.5517 |
| 0.3174        | 42.0  | 14154 | 0.5689          | 0.5234 |
| 0.3174        | 43.0  | 14491 | 0.5520          | 0.5205 |
| 0.3194        | 44.0  | 14828 | 0.5371          | 0.5145 |
| 0.301         | 45.0  | 15165 | 0.5831          | 0.5204 |
| 0.3013        | 46.0  | 15502 | 0.5832          | 0.5282 |
| 0.3013        | 47.0  | 15839 | 0.5852          | 0.5289 |
| 0.2946        | 48.0  | 16176 | 0.5476          | 0.5290 |
| 0.2923        | 49.0  | 16513 | 0.5757          | 0.5316 |
| 0.2923        | 50.0  | 16850 | 0.6236          | 0.5116 |
| 0.2773        | 51.0  | 17187 | 0.5951          | 0.5292 |
| 0.2795        | 52.0  | 17524 | 0.5958          | 0.5367 |
| 0.2795        | 53.0  | 17861 | 0.6318          | 0.5166 |
| 0.268         | 54.0  | 18198 | 0.6652          | 0.5204 |
| 0.2627        | 55.0  | 18535 | 0.6435          | 0.5074 |
| 0.2627        | 56.0  | 18872 | 0.6365          | 0.5213 |
| 0.2557        | 57.0  | 19209 | 0.5879          | 0.5217 |
| 0.2577        | 58.0  | 19546 | 0.6339          | 0.5322 |
| 0.2577        | 59.0  | 19883 | 0.6272          | 0.5278 |
| 0.2375        | 60.0  | 20220 | 0.6326          | 0.5406 |
| 0.2258        | 61.0  | 20557 | 0.6520          | 0.5265 |

### Framework versions

- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa]. Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and Borg, Claudia",
    editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```