
mT5-Small (Maltese News Categories)

This model is a fine-tuned version of google/mt5-small on the MLRS/maltese_news_categories dataset. It achieves the following results on the test set:

  • Loss: 0.5892
  • F1: 0.5247
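
For reference, below is a minimal inference sketch using the standard Transformers seq2seq API. The plain-text input shown (a Maltese headline with no task prefix) is an assumption; the exact formatting expected by the fine-tuned model may differ from the training setup.

```python
# Minimal inference sketch using the standard Transformers seq2seq API.
# The plain-headline input format below is an assumption; the fine-tuning
# script may have used a different prompt or prefix.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "MLRS/mt5-small_maltese-news-categories"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Il-gvern ħabbar miżuri ġodda għall-edukazzjoni."  # example Maltese news snippet
inputs = tokenizer(text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # generated category label(s)
```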

Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto standard training arguments follows the list):

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 200.0
  • early_stopping_patience: 20
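
The customised training script is not reproduced here; the following is an assumed reconstruction showing how the hyperparameters above would map onto the standard Hugging Face Seq2SeqTrainingArguments, not the exact setup used.

```python
# Assumed mapping of the listed hyperparameters onto standard Hugging Face
# Seq2SeqTrainingArguments; the actual customised script may differ.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_maltese-news-categories",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",            # Adafactor with no additional optimizer arguments
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",        # evaluation once per epoch, as in the results table
    save_strategy="epoch",
    load_best_model_at_end=True,  # needed so early stopping can restore the best checkpoint
)

# Early stopping with patience 20 would be attached to the trainer, e.g.:
# Seq2SeqTrainer(..., args=args, callbacks=[EarlyStoppingCallback(early_stopping_patience=20)])
```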

Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 337   | 0.8587          | 0.3587 |
| 2.0301        | 2.0   | 674   | 0.8575          | 0.2998 |
| 0.9165        | 3.0   | 1011  | 0.6689          | 0.4065 |
| 0.9165        | 4.0   | 1348  | 0.6344          | 0.4198 |
| 0.7345        | 5.0   | 1685  | 0.6038          | 0.4447 |
| 0.6836        | 6.0   | 2022  | 0.6210          | 0.4416 |
| 0.6836        | 7.0   | 2359  | 0.5396          | 0.4754 |
| 0.6346        | 8.0   | 2696  | 0.5455          | 0.4775 |
| 0.5773        | 9.0   | 3033  | 0.5194          | 0.4861 |
| 0.5773        | 10.0  | 3370  | 0.5245          | 0.4884 |
| 0.5429        | 11.0  | 3707  | 0.5024          | 0.4899 |
| 0.5212        | 12.0  | 4044  | 0.4950          | 0.4872 |
| 0.5212        | 13.0  | 4381  | 0.4842          | 0.5041 |
| 0.4964        | 14.0  | 4718  | 0.5221          | 0.4931 |
| 0.4977        | 15.0  | 5055  | 0.5183          | 0.4740 |
| 0.4977        | 16.0  | 5392  | 0.5177          | 0.4901 |
| 0.4808        | 17.0  | 5729  | 0.4934          | 0.5075 |
| 0.4622        | 18.0  | 6066  | 0.5008          | 0.5195 |
| 0.4622        | 19.0  | 6403  | 0.4905          | 0.5345 |
| 0.4578        | 20.0  | 6740  | 0.5069          | 0.5278 |
| 0.4434        | 21.0  | 7077  | 0.4920          | 0.5223 |
| 0.4434        | 22.0  | 7414  | 0.5079          | 0.5271 |
| 0.43          | 23.0  | 7751  | 0.5004          | 0.5204 |
| 0.4209        | 24.0  | 8088  | 0.5059          | 0.5363 |
| 0.4209        | 25.0  | 8425  | 0.5395          | 0.5185 |
| 0.4095        | 26.0  | 8762  | 0.5299          | 0.5249 |
| 0.4063        | 27.0  | 9099  | 0.5102          | 0.5226 |
| 0.4063        | 28.0  | 9436  | 0.5150          | 0.5022 |
| 0.3836        | 29.0  | 9773  | 0.5194          | 0.5326 |
| 0.3793        | 30.0  | 10110 | 0.5496          | 0.5203 |
| 0.3793        | 31.0  | 10447 | 0.5071          | 0.5275 |
| 0.3812        | 32.0  | 10784 | 0.5050          | 0.5239 |
| 0.3594        | 33.0  | 11121 | 0.5033          | 0.5296 |
| 0.3594        | 34.0  | 11458 | 0.5007          | 0.5455 |
| 0.3552        | 35.0  | 11795 | 0.5199          | 0.5331 |
| 0.3471        | 36.0  | 12132 | 0.5331          | 0.5180 |
| 0.3471        | 37.0  | 12469 | 0.5292          | 0.5392 |
| 0.3434        | 38.0  | 12806 | 0.5206          | 0.5392 |
| 0.3351        | 39.0  | 13143 | 0.5226          | 0.5285 |
| 0.3351        | 40.0  | 13480 | 0.5186          | 0.5256 |
| 0.3298        | 41.0  | 13817 | 0.5490          | 0.5517 |
| 0.3174        | 42.0  | 14154 | 0.5689          | 0.5234 |
| 0.3174        | 43.0  | 14491 | 0.5520          | 0.5205 |
| 0.3194        | 44.0  | 14828 | 0.5371          | 0.5145 |
| 0.301         | 45.0  | 15165 | 0.5831          | 0.5204 |
| 0.3013        | 46.0  | 15502 | 0.5832          | 0.5282 |
| 0.3013        | 47.0  | 15839 | 0.5852          | 0.5289 |
| 0.2946        | 48.0  | 16176 | 0.5476          | 0.5290 |
| 0.2923        | 49.0  | 16513 | 0.5757          | 0.5316 |
| 0.2923        | 50.0  | 16850 | 0.6236          | 0.5116 |
| 0.2773        | 51.0  | 17187 | 0.5951          | 0.5292 |
| 0.2795        | 52.0  | 17524 | 0.5958          | 0.5367 |
| 0.2795        | 53.0  | 17861 | 0.6318          | 0.5166 |
| 0.268         | 54.0  | 18198 | 0.6652          | 0.5204 |
| 0.2627        | 55.0  | 18535 | 0.6435          | 0.5074 |
| 0.2627        | 56.0  | 18872 | 0.6365          | 0.5213 |
| 0.2557        | 57.0  | 19209 | 0.5879          | 0.5217 |
| 0.2577        | 58.0  | 19546 | 0.6339          | 0.5322 |
| 0.2577        | 59.0  | 19883 | 0.6272          | 0.5278 |
| 0.2375        | 60.0  | 20220 | 0.6326          | 0.5406 |
| 0.2258        | 61.0  | 20557 | 0.6520          | 0.5265 |

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}