---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- MLRS/maltese_news_categories
model-index:
- name: mt5-small_maltese-news-categories
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: maltese_news_categories
      name: MLRS/maltese_news_categories
    metrics:
    - type: f1
      args: macro
      value: 55.17
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (Maltese News Categories)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [MLRS/maltese_news_categories](https://huggingface.co/datasets/MLRS/maltese_news_categories) dataset. It achieves the following results on the test set:

- Loss: 0.5892
- F1: 0.5247

## Intended uses & limitations

The model is fine-tuned for a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq_classification.py).
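Since mT5 is a text-to-text model, the fine-tuned classifier produces category labels as generated text rather than class logits. The following is a minimal inference sketch; the Hub repository id and the comma-separated label convention are assumptions, so consult the training script above for the exact input/output format.

```python
# Assumed Hub repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "MLRS/mt5-small_maltese-news-categories"


def load_classifier():
    """Load the fine-tuned checkpoint as a text2text-generation pipeline.

    mT5 is a seq2seq model, so topic categories come back as generated text.
    """
    from transformers import pipeline  # lazy import to keep the helper light

    return pipeline("text2text-generation", model=MODEL_ID)


def parse_labels(generated: str) -> list:
    """Split generated text into individual category labels.

    Assumes labels are emitted comma-separated, a common convention for
    seq2seq multi-label classification; verify against the training script.
    """
    return [label.strip() for label in generated.split(",") if label.strip()]


# Example usage (downloads the model weights):
# clf = load_classifier()
# out = clf("Il-gvern ħabbar il-baġit għas-sena d-dieħla.", max_new_tokens=16)
# print(parse_labels(out[0]["generated_text"]))
```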
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 337   | 0.8587          | 0.3587 |
| 2.0301        | 2.0   | 674   | 0.8575          | 0.2998 |
| 0.9165        | 3.0   | 1011  | 0.6689          | 0.4065 |
| 0.9165        | 4.0   | 1348  | 0.6344          | 0.4198 |
| 0.7345        | 5.0   | 1685  | 0.6038          | 0.4447 |
| 0.6836        | 6.0   | 2022  | 0.6210          | 0.4416 |
| 0.6836        | 7.0   | 2359  | 0.5396          | 0.4754 |
| 0.6346        | 8.0   | 2696  | 0.5455          | 0.4775 |
| 0.5773        | 9.0   | 3033  | 0.5194          | 0.4861 |
| 0.5773        | 10.0  | 3370  | 0.5245          | 0.4884 |
| 0.5429        | 11.0  | 3707  | 0.5024          | 0.4899 |
| 0.5212        | 12.0  | 4044  | 0.4950          | 0.4872 |
| 0.5212        | 13.0  | 4381  | 0.4842          | 0.5041 |
| 0.4964        | 14.0  | 4718  | 0.5221          | 0.4931 |
| 0.4977        | 15.0  | 5055  | 0.5183          | 0.4740 |
| 0.4977        | 16.0  | 5392  | 0.5177          | 0.4901 |
| 0.4808        | 17.0  | 5729  | 0.4934          | 0.5075 |
| 0.4622        | 18.0  | 6066  | 0.5008          | 0.5195 |
| 0.4622        | 19.0  | 6403  | 0.4905          | 0.5345 |
| 0.4578        | 20.0  | 6740  | 0.5069          | 0.5278 |
| 0.4434        | 21.0  | 7077  | 0.4920          | 0.5223 |
| 0.4434        | 22.0  | 7414  | 0.5079          | 0.5271 |
| 0.43          | 23.0  | 7751  | 0.5004          | 0.5204 |
| 0.4209        | 24.0  | 8088  | 0.5059          | 0.5363 |
| 0.4209        | 25.0  | 8425  | 0.5395          | 0.5185 |
| 0.4095        | 26.0  | 8762  | 0.5299          | 0.5249 |
| 0.4063        | 27.0  | 9099  | 0.5102          | 0.5226 |
| 0.4063        | 28.0  | 9436  | 0.5150          | 0.5022 |
| 0.3836        | 29.0  | 9773  | 0.5194          | 0.5326 |
| 0.3793        | 30.0  | 10110 | 0.5496          | 0.5203 |
| 0.3793        | 31.0  | 10447 | 0.5071          | 0.5275 |
| 0.3812        | 32.0  | 10784 | 0.5050          | 0.5239 |
| 0.3594        | 33.0  | 11121 | 0.5033          | 0.5296 |
| 0.3594        | 34.0  | 11458 | 0.5007          | 0.5455 |
| 0.3552        | 35.0  | 11795 | 0.5199          | 0.5331 |
| 0.3471        | 36.0  | 12132 | 0.5331          | 0.5180 |
| 0.3471        | 37.0  | 12469 | 0.5292          | 0.5392 |
| 0.3434        | 38.0  | 12806 | 0.5206          | 0.5392 |
| 0.3351        | 39.0  | 13143 | 0.5226          | 0.5285 |
| 0.3351        | 40.0  | 13480 | 0.5186          | 0.5256 |
| 0.3298        | 41.0  | 13817 | 0.5490          | 0.5517 |
| 0.3174        | 42.0  | 14154 | 0.5689          | 0.5234 |
| 0.3174        | 43.0  | 14491 | 0.5520          | 0.5205 |
| 0.3194        | 44.0  | 14828 | 0.5371          | 0.5145 |
| 0.301         | 45.0  | 15165 | 0.5831          | 0.5204 |
| 0.3013        | 46.0  | 15502 | 0.5832          | 0.5282 |
| 0.3013        | 47.0  | 15839 | 0.5852          | 0.5289 |
| 0.2946        | 48.0  | 16176 | 0.5476          | 0.5290 |
| 0.2923        | 49.0  | 16513 | 0.5757          | 0.5316 |
| 0.2923        | 50.0  | 16850 | 0.6236          | 0.5116 |
| 0.2773        | 51.0  | 17187 | 0.5951          | 0.5292 |
| 0.2795        | 52.0  | 17524 | 0.5958          | 0.5367 |
| 0.2795        | 53.0  | 17861 | 0.6318          | 0.5166 |
| 0.268         | 54.0  | 18198 | 0.6652          | 0.5204 |
| 0.2627        | 55.0  | 18535 | 0.6435          | 0.5074 |
| 0.2627        | 56.0  | 18872 | 0.6365          | 0.5213 |
| 0.2557        | 57.0  | 19209 | 0.5879          | 0.5217 |
| 0.2577        | 58.0  | 19546 | 0.6339          | 0.5322 |
| 0.2577        | 59.0  | 19883 | 0.6272          | 0.5278 |
| 0.2375        | 60.0  | 20220 | 0.6326          | 0.5406 |
| 0.2258        | 61.0  | 20557 | 0.6520          | 0.5265 |

### Framework versions

- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa]. Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and Borg, Claudia",
    editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```