mT5-Small (Maltese News Categories)
This model is a fine-tuned version of google/mt5-small on the MLRS/maltese_news_categories dataset. It achieves the following results on the test set:
- Loss: 0.5892
- F1: 0.5247
Intended uses & limitations
The model is fine-tuned on a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited. A minimal usage sketch is shown below.
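The snippet below is a minimal inference sketch, not taken from the model card: it assumes the model generates category labels as text for a given Maltese news passage via the standard `text2text-generation` pipeline. The example sentence and generation settings are illustrative.

```python
# Minimal inference sketch: load the fine-tuned checkpoint and generate a
# category label for a Maltese news passage (example text is illustrative).
from transformers import pipeline

classifier = pipeline(
    "text2text-generation",
    model="MLRS/mt5-small_maltese-news-categories",
)

text = "Il-Prim Ministru ħabbar miżuri ġodda fil-baġit għas-sena d-dieħla."
print(classifier(text, max_new_tokens=16)[0]["generated_text"])
```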
Training procedure
The model was fine-tuned using a customised script; a sketch of an equivalent configuration is given after the hyperparameter list below.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
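
The following is a minimal sketch, not the authors' customised script, of how the listed hyperparameters could map onto Hugging Face `Seq2SeqTrainingArguments`. The output directory name is illustrative, and dataset preprocessing, the `compute_metrics` function, and the `Seq2SeqTrainer` wiring are omitted.

```python
# Sketch of a training configuration matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_maltese-news-categories",  # illustrative
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",             # Adafactor with no additional arguments
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # required for early stopping on validation F1
    metric_for_best_model="f1",    # assumes compute_metrics returns an "f1" key
)

# Early stopping with patience 20, passed to the trainer via `callbacks=[...]`.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```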
Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---|---|---|---|---|
| No log | 1.0 | 337 | 0.8587 | 0.3587 |
| 2.0301 | 2.0 | 674 | 0.8575 | 0.2998 |
| 0.9165 | 3.0 | 1011 | 0.6689 | 0.4065 |
| 0.9165 | 4.0 | 1348 | 0.6344 | 0.4198 |
| 0.7345 | 5.0 | 1685 | 0.6038 | 0.4447 |
| 0.6836 | 6.0 | 2022 | 0.6210 | 0.4416 |
| 0.6836 | 7.0 | 2359 | 0.5396 | 0.4754 |
| 0.6346 | 8.0 | 2696 | 0.5455 | 0.4775 |
| 0.5773 | 9.0 | 3033 | 0.5194 | 0.4861 |
| 0.5773 | 10.0 | 3370 | 0.5245 | 0.4884 |
| 0.5429 | 11.0 | 3707 | 0.5024 | 0.4899 |
| 0.5212 | 12.0 | 4044 | 0.4950 | 0.4872 |
| 0.5212 | 13.0 | 4381 | 0.4842 | 0.5041 |
| 0.4964 | 14.0 | 4718 | 0.5221 | 0.4931 |
| 0.4977 | 15.0 | 5055 | 0.5183 | 0.4740 |
| 0.4977 | 16.0 | 5392 | 0.5177 | 0.4901 |
| 0.4808 | 17.0 | 5729 | 0.4934 | 0.5075 |
| 0.4622 | 18.0 | 6066 | 0.5008 | 0.5195 |
| 0.4622 | 19.0 | 6403 | 0.4905 | 0.5345 |
| 0.4578 | 20.0 | 6740 | 0.5069 | 0.5278 |
| 0.4434 | 21.0 | 7077 | 0.4920 | 0.5223 |
| 0.4434 | 22.0 | 7414 | 0.5079 | 0.5271 |
| 0.43 | 23.0 | 7751 | 0.5004 | 0.5204 |
| 0.4209 | 24.0 | 8088 | 0.5059 | 0.5363 |
| 0.4209 | 25.0 | 8425 | 0.5395 | 0.5185 |
| 0.4095 | 26.0 | 8762 | 0.5299 | 0.5249 |
| 0.4063 | 27.0 | 9099 | 0.5102 | 0.5226 |
| 0.4063 | 28.0 | 9436 | 0.5150 | 0.5022 |
| 0.3836 | 29.0 | 9773 | 0.5194 | 0.5326 |
| 0.3793 | 30.0 | 10110 | 0.5496 | 0.5203 |
| 0.3793 | 31.0 | 10447 | 0.5071 | 0.5275 |
| 0.3812 | 32.0 | 10784 | 0.5050 | 0.5239 |
| 0.3594 | 33.0 | 11121 | 0.5033 | 0.5296 |
| 0.3594 | 34.0 | 11458 | 0.5007 | 0.5455 |
| 0.3552 | 35.0 | 11795 | 0.5199 | 0.5331 |
| 0.3471 | 36.0 | 12132 | 0.5331 | 0.5180 |
| 0.3471 | 37.0 | 12469 | 0.5292 | 0.5392 |
| 0.3434 | 38.0 | 12806 | 0.5206 | 0.5392 |
| 0.3351 | 39.0 | 13143 | 0.5226 | 0.5285 |
| 0.3351 | 40.0 | 13480 | 0.5186 | 0.5256 |
| 0.3298 | 41.0 | 13817 | 0.5490 | 0.5517 |
| 0.3174 | 42.0 | 14154 | 0.5689 | 0.5234 |
| 0.3174 | 43.0 | 14491 | 0.5520 | 0.5205 |
| 0.3194 | 44.0 | 14828 | 0.5371 | 0.5145 |
| 0.301 | 45.0 | 15165 | 0.5831 | 0.5204 |
| 0.3013 | 46.0 | 15502 | 0.5832 | 0.5282 |
| 0.3013 | 47.0 | 15839 | 0.5852 | 0.5289 |
| 0.2946 | 48.0 | 16176 | 0.5476 | 0.5290 |
| 0.2923 | 49.0 | 16513 | 0.5757 | 0.5316 |
| 0.2923 | 50.0 | 16850 | 0.6236 | 0.5116 |
| 0.2773 | 51.0 | 17187 | 0.5951 | 0.5292 |
| 0.2795 | 52.0 | 17524 | 0.5958 | 0.5367 |
| 0.2795 | 53.0 | 17861 | 0.6318 | 0.5166 |
| 0.268 | 54.0 | 18198 | 0.6652 | 0.5204 |
| 0.2627 | 55.0 | 18535 | 0.6435 | 0.5074 |
| 0.2627 | 56.0 | 18872 | 0.6365 | 0.5213 |
| 0.2557 | 57.0 | 19209 | 0.5879 | 0.5217 |
| 0.2577 | 58.0 | 19546 | 0.6339 | 0.5322 |
| 0.2577 | 59.0 | 19883 | 0.6272 | 0.5278 |
| 0.2375 | 60.0 | 20220 | 0.6326 | 0.5406 |
| 0.2258 | 61.0 | 20557 | 0.6520 | 0.5265 |
Framework versions
- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
Evaluation results
- Macro-averaged F1 on MLRS/maltese_news_categories (MELABench Leaderboard): 55.170
