mRoBERTa_FT2_DFT2_lenguaje_claro

Description

This model is fine-tuned from BSC-LT/mRoBERTa for the task of clear language classification in Spanish texts.

It assigns each text to one of three categories of linguistic clarity (a usage sketch follows the list below):

  • TXT: Original text
  • FAC: Facilitated text
  • LF: Easy-to-read text
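
For quick use, a minimal inference sketch with the Hugging Face transformers text-classification pipeline is shown below. It assumes the checkpoint's config exposes the TXT/FAC/LF label names (check model.config.id2label after loading); the example text and output are illustrative only.

# Minimal inference sketch (assumption: the pipeline reads the TXT/FAC/LF
# labels from the model config; verify with model.config.id2label).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro",
)

text = "El ayuntamiento abre la piscina municipal en junio."
print(classifier(text))
# Illustrative output: [{'label': 'FAC', 'score': 0.97}]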

Dataset

The dataset consists of Spanish texts annotated with clarity levels:

  • Training set: 9,299 instances
  • Test set: 3,723 instances
  • Extra test set: 465 instances (texts from non-contiguous categories not seen during training, used to evaluate generalization)

Training Parameters

  • learning_rate: 2e-5
  • num_train_epochs: 2
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • overwrite_output_dir: true
  • logging_strategy: steps
  • logging_steps: 10
  • seed: 852
  • fp16: true
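
For reference, the hyperparameters above can be expressed as Hugging Face TrainingArguments. The sketch below mirrors the reported values; output_dir is illustrative, and dataset loading, tokenization, and the Trainer call are omitted.

# Sketch of the reported hyperparameters as transformers TrainingArguments.
# output_dir is illustrative; data preparation and Trainer setup are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mRoBERTa_FT2_DFT2_lenguaje_claro",
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    overwrite_output_dir=True,
    logging_strategy="steps",
    logging_steps=10,
    seed=852,
    fp16=True,
)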

Results

Combined test set (4,188 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC      1373        15          8
True LF         29      1367          0
True TXT        16         1       1379

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9683   0.9835     0.9758      1396
LF         0.9884   0.9792     0.9838      1396
TXT        0.9942   0.9878     0.9910      1396
  • Accuracy: 0.9835
  • Macro Avg F1: 0.9836
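
The per-class figures appear to follow a scikit-learn style classification report. The sketch below shows how such a report and confusion matrix could be reproduced from gold labels and predictions; the y_true and y_pred lists are placeholders, not the actual evaluation data.

# Reproduction sketch; y_true and y_pred are placeholders for the gold labels
# and model predictions of the split being scored.
from sklearn.metrics import classification_report, confusion_matrix

labels = ["FAC", "LF", "TXT"]
y_true = ["FAC", "LF", "TXT", "LF"]   # placeholder gold labels
y_pred = ["FAC", "LF", "TXT", "FAC"]  # placeholder predictions

print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels, digits=4))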

Test set (3,723 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC      1220        13          8
True LF         28      1213          0
True TXT        13         1       1227

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9675   0.9831     0.9752      1241
LF         0.9886   0.9774     0.9830      1241
TXT        0.9935   0.9887     0.9911      1241
  • Accuracy: 0.9831
  • Macro Avg F1: 0.9831

Extra test set (465 instances)

Confusion Matrix

          Pred FAC   Pred LF   Pred TXT
True FAC       153         2          0
True LF          1       154          0
True TXT         3         0        152

Classification Report

Class   Precision   Recall   F1-score   Support
FAC        0.9745   0.9871     0.9808       155
LF         0.9872   0.9936     0.9903       155
TXT        1.0000   0.9806     0.9902       155
  • Accuracy: 0.9871
  • Macro Avg F1: 0.9871

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

Reference

@misc{gplsi-mroberta-lenguajeclaro,
  author       = {Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Bonora, Mar and Consuegra-Ayala, Juan Pablo},
  title        = {mRoBERTa_FT2_DFT2_lenguaje_claro: Fine-tuned model for clear language classification (TXT, FAC, LF)},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro}},
  note         = {Accessed: 2025-10-03}
}