# mRoBERTa_FT2_DFT2_lenguaje_claro
## Description
This model is fine-tuned from BSC-LT/mRoBERTa for clear-language classification of Spanish texts.
It assigns each text to one of three categories of linguistic clarity:
- TXT: Original text
- FAC: Facilitated text
- LF: Easy-to-read text (Lectura Fácil)
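The model can be used with the standard transformers text-classification pipeline. A minimal usage sketch follows; the example sentence is illustrative, and the returned label names assume the checkpoint's id2label mapping exposes the TXT/FAC/LF tags.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint; the tokenizer is resolved from the same repo.
classifier = pipeline(
    "text-classification",
    model="gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro",
)

# Classify a Spanish text; the predicted label should be one of TXT, FAC, or LF
# (assuming the checkpoint exposes those tags in its id2label mapping).
print(classifier("El ayuntamiento informa de los nuevos horarios de la biblioteca."))
# e.g. [{'label': 'TXT', 'score': 0.98}]
```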
## Dataset
The dataset consists of Spanish texts annotated with clarity levels:
- Training set: 9,299 instances
- Test set: 3,723 instances
- Extra test set: 465 instances (texts from categories not seen during training, used to evaluate generalization)
## Training Parameters
- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true
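These values map one-to-one onto Hugging Face `TrainingArguments`; a minimal sketch is shown below. The `output_dir` value is a placeholder, and the dataset and Trainer wiring are omitted.

```python
from transformers import TrainingArguments

# Hyperparameters exactly as listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="mRoBERTa_FT2_DFT2_lenguaje_claro",
    overwrite_output_dir=True,
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    logging_strategy="steps",
    logging_steps=10,
    seed=852,
    fp16=True,
)
```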
## Results
### Combined test set (4,188 instances: test + extra test)
#### Confusion Matrix

|  | Pred FAC | Pred LF | Pred TXT |
|---|---|---|---|
| True FAC | 1373 | 15 | 8 |
| True LF | 29 | 1367 | 0 |
| True TXT | 16 | 1 | 1379 |

#### Classification Report

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| FAC | 0.9683 | 0.9835 | 0.9758 | 1396 |
| LF | 0.9884 | 0.9792 | 0.9838 | 1396 |
| TXT | 0.9942 | 0.9878 | 0.9910 | 1396 |
- Accuracy: 0.9835
- Macro Avg F1: 0.9836
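All of the reported figures follow directly from the confusion matrix; the short NumPy sketch below recomputes them for the combined test set (rows are true labels, columns are predictions, both in FAC/LF/TXT order).

```python
import numpy as np

# Combined test set confusion matrix: rows = true class, columns = predicted
# class, ordered FAC, LF, TXT (copied from the table above).
cm = np.array([
    [1373,   15,    8],
    [  29, 1367,    0],
    [  16,    1, 1379],
])

tp = np.diag(cm)                 # correctly classified instances per class
precision = tp / cm.sum(axis=0)  # column sums: all predictions for each class
recall = tp / cm.sum(axis=1)     # row sums: all true instances of each class
f1 = 2 * precision * recall / (precision + recall)

print(np.round(precision, 4))               # [0.9683 0.9884 0.9942]
print(np.round(recall, 4))                  # [0.9835 0.9792 0.9878]
print(np.round(f1, 4))                      # [0.9758 0.9838 0.991 ]
print("accuracy:", round(tp.sum() / cm.sum(), 4))  # 0.9835
print("macro F1:", round(f1.mean(), 4))            # 0.9836
```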
### Test set (3,723 instances)
#### Confusion Matrix

|  | Pred FAC | Pred LF | Pred TXT |
|---|---|---|---|
| True FAC | 1220 | 13 | 8 |
| True LF | 28 | 1213 | 0 |
| True TXT | 13 | 1 | 1227 |

#### Classification Report

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| FAC | 0.9675 | 0.9831 | 0.9752 | 1241 |
| LF | 0.9886 | 0.9774 | 0.9830 | 1241 |
| TXT | 0.9935 | 0.9887 | 0.9911 | 1241 |
- Accuracy: 0.9831
- Macro Avg F1: 0.9831
### Extra test set (465 instances)
#### Confusion Matrix

|  | Pred FAC | Pred LF | Pred TXT |
|---|---|---|---|
| True FAC | 153 | 2 | 0 |
| True LF | 1 | 154 | 0 |
| True TXT | 3 | 0 | 152 |

#### Classification Report

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| FAC | 0.9745 | 0.9871 | 0.9808 | 155 |
| LF | 0.9872 | 0.9936 | 0.9903 | 155 |
| TXT | 1.0000 | 0.9806 | 0.9902 | 155 |
- Accuracy: 0.9871
- Macro Avg F1: 0.9871
## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and co-financed by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.
## Reference
```bibtex
@misc{gplsi-mroberta-lenguajeclaro,
  author       = {Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Bonora, Mar and Consuegra-Ayala, Juan Pablo},
  title        = {mRoBERTa_FT2_DFT2_lenguaje_claro: Fine-tuned model for clear language classification (TXT, FAC, LF)},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT2_DFT2_lenguaje_claro}},
  note         = {Accessed: 2025-10-03}
}
```