mRoBERTa_FT2_DFT2_lenguaje_claro

Description

This model is fine-tuned from BSC-LT/mRoBERTa for the task of clear language classification in Spanish texts.

It assigns each text to one of three categories of linguistic clarity (a usage sketch follows the list below):

  • TXT: Original text
  • FAC: Facilitated text
  • LF: Easy-to-read text
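
For quick use, a minimal inference sketch with the Hugging Face transformers text-classification pipeline is shown below. It assumes the checkpoint's config exposes the TXT/FAC/LF label names (check model.config.id2label after loading); the example text and output are illustrative only.

# Minimal inference sketch (assumption: the pipeline reads the TXT/FAC/LF
# labels from the model config; verify with model.config.id2label).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro",
)

text = "El ayuntamiento abre la piscina municipal en junio."
print(classifier(text))
# Illustrative output: [{'label': 'FAC', 'score': 0.97}]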

Dataset

The dataset consists of Spanish texts annotated with clarity levels:

  • Training set: 9,299 instances
  • Test set: 3,723 instances
  • Extra test set: 465 instances (texts from non-contiguous categories not seen during training, used to evaluate generalization)

Training Parameters

  • learning_rate: 2e-5
  • num_train_epochs: 2
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • overwrite_output_dir: true
  • logging_strategy: steps
  • logging_steps: 10
  • seed: 852
  • fp16: true
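
For reference, the hyperparameters above can be expressed as Hugging Face TrainingArguments. The sketch below mirrors the reported values; output_dir is illustrative, and dataset loading, tokenization, and the Trainer call are omitted.

# Sketch of the reported hyperparameters as transformers TrainingArguments.
# output_dir is illustrative; data preparation and Trainer setup are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mRoBERTa_FT2_DFT2_lenguaje_claro",
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    overwrite_output_dir=True,
    logging_strategy="steps",
    logging_steps=10,
    seed=852,
    fp16=True,
)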

Results

Combined test set (4,188 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC      1373        15          8
True LF         29      1367          0
True TXT        16         1       1379

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9683   0.9835     0.9758      1396
LF         0.9884   0.9792     0.9838      1396
TXT        0.9942   0.9878     0.9910      1396
  • Accuracy: 0.9835
  • Macro Avg F1: 0.9836
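
The per-class figures appear to follow a scikit-learn style classification report. The sketch below shows how such a report and confusion matrix could be reproduced from gold labels and predictions; the y_true and y_pred lists are placeholders, not the actual evaluation data.

# Reproduction sketch; y_true and y_pred are placeholders for the gold labels
# and model predictions of the split being scored.
from sklearn.metrics import classification_report, confusion_matrix

labels = ["FAC", "LF", "TXT"]
y_true = ["FAC", "LF", "TXT", "LF"]   # placeholder gold labels
y_pred = ["FAC", "LF", "TXT", "FAC"]  # placeholder predictions

print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels, digits=4))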

Test set (3,723 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC      1220        13          8
True LF         28      1213          0
True TXT        13         1       1227

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9675   0.9831     0.9752      1241
LF         0.9886   0.9774     0.9830      1241
TXT        0.9935   0.9887     0.9911      1241
  • Accuracy: 0.9831
  • Macro Avg F1: 0.9831

Extra test set (465 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC       153         2          0
True LF          1       154          0
True TXT         3         0        152

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9745   0.9871     0.9808       155
LF         0.9872   0.9936     0.9903       155
TXT        1.0000   0.9806     0.9902       155
  • Accuracy: 0.9871
  • Macro Avg F1: 0.9871

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

Reference

@misc{gplsi-mroberta-lenguajeclaro,
  author       = {Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Bonora, Mar and Consuegra-Ayala, Juan Pablo},
  title        = {mRoBERTa_FT2_DFT2_lenguaje_claro: Fine-tuned model for clear language classification (TXT, FAC, LF)},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro}},
  note         = {Accessed: 2025-10-03}
}