93beed4cf0a292354d00816feccfa413

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [es-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2710
  • Data Size: 1.0
  • Epoch Runtime: 21.8637
  • Bleu: 0.9816
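
For quick experimentation, here is a minimal inference sketch assuming the checkpoint is published on the Hub as contemmcm/93beed4cf0a292354d00816feccfa413; depending on how training was run, a T5-style task prefix such as "translate Spanish to Portuguese: " may be required:

```python
# Minimal inference sketch; the repo id and prefix handling are assumptions,
# not a confirmed usage recipe from the model authors.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/93beed4cf0a292354d00816feccfa413"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a Spanish sentence into Portuguese.
text = "La casa estaba en silencio."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```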

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
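
A hedged sketch of this configuration expressed as Seq2SeqTrainingArguments follows; the output directory name is hypothetical and the exact training script used for this run is not known:

```python
# Sketch of the hyperparameters listed above; argument names follow
# Transformers 4.x and may not match the original script exactly.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-es-pt",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```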

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 236.3281 | 0 | 1.8417 | 0.0218 |
| No log | 1 | 33 | 220.4164 | 0.0078 | 2.4454 | 0.0215 |
| No log | 2 | 66 | 204.8405 | 0.0156 | 3.2886 | 0.0205 |
| No log | 3 | 99 | 178.9747 | 0.0312 | 5.1289 | 0.0170 |
| 10.8048 | 4 | 132 | 149.1265 | 0.0625 | 7.5365 | 0.0163 |
| 10.8048 | 5 | 165 | 115.3729 | 0.125 | 9.7194 | 0.0064 |
| 10.8048 | 6 | 198 | 73.5350 | 0.25 | 11.7979 | 0.0057 |
| 23.6174 | 7 | 231 | 35.5477 | 0.5 | 14.7557 | 0.0068 |
| 36.4933 | 8.0 | 264 | 17.0832 | 1.0 | 22.0719 | 0.0452 |
| 36.4933 | 9.0 | 297 | 14.2073 | 1.0 | 21.0046 | 0.0598 |
| 26.2706 | 10.0 | 330 | 12.5317 | 1.0 | 21.5360 | 0.0276 |
| 19.6304 | 11.0 | 363 | 11.6492 | 1.0 | 20.4911 | 0.1435 |
| 19.6304 | 12.0 | 396 | 10.6991 | 1.0 | 20.8877 | 0.0648 |
| 16.9168 | 13.0 | 429 | 10.2809 | 1.0 | 20.8863 | 0.0476 |
| 15.2537 | 14.0 | 462 | 9.7464 | 1.0 | 20.6664 | 0.0335 |
| 15.2537 | 15.0 | 495 | 9.8049 | 1.0 | 20.8270 | 0.0558 |
| 13.8518 | 16.0 | 528 | 8.2535 | 1.0 | 20.6819 | 0.1067 |
| 12.748 | 17.0 | 561 | 8.4780 | 1.0 | 21.5175 | 0.0440 |
| 12.748 | 18.0 | 594 | 8.1782 | 1.0 | 20.7945 | 0.0967 |
| 11.8111 | 19.0 | 627 | 7.4566 | 1.0 | 20.7800 | 0.2006 |
| 11.0365 | 20.0 | 660 | 7.3057 | 1.0 | 21.1041 | 0.1273 |
| 11.0365 | 21.0 | 693 | 6.7133 | 1.0 | 21.0027 | 0.3471 |
| 10.5073 | 22.0 | 726 | 6.9228 | 1.0 | 20.9700 | 0.3550 |
| 9.8583 | 23.0 | 759 | 7.0375 | 1.0 | 21.6299 | 0.2705 |
| 9.8583 | 24.0 | 792 | 6.7037 | 1.0 | 21.2713 | 0.4168 |
| 9.3498 | 25.0 | 825 | 6.2121 | 1.0 | 20.5182 | 0.5571 |
| 8.9215 | 26.0 | 858 | 5.9844 | 1.0 | 20.7437 | 0.5474 |
| 8.9215 | 27.0 | 891 | 6.0323 | 1.0 | 20.7299 | 0.6197 |
| 8.5465 | 28.0 | 924 | 5.6314 | 1.0 | 20.8857 | 0.5808 |
| 8.1801 | 29.0 | 957 | 5.5487 | 1.0 | 21.6938 | 0.5568 |
| 8.1801 | 30.0 | 990 | 5.6767 | 1.0 | 20.6651 | 0.4915 |
| 7.8944 | 31.0 | 1023 | 5.3007 | 1.0 | 20.7470 | 0.6073 |
| 7.6164 | 32.0 | 1056 | 5.4566 | 1.0 | 20.6885 | 0.5843 |
| 7.6164 | 33.0 | 1089 | 5.2941 | 1.0 | 21.0663 | 0.4686 |
| 7.3031 | 34.0 | 1122 | 5.2816 | 1.0 | 21.5694 | 0.7061 |
| 7.101 | 35.0 | 1155 | 5.2643 | 1.0 | 20.7937 | 0.6266 |
| 7.101 | 36.0 | 1188 | 5.0665 | 1.0 | 20.9322 | 0.5939 |
| 6.8672 | 37.0 | 1221 | 4.9107 | 1.0 | 21.7358 | 0.6661 |
| 6.6882 | 38.0 | 1254 | 4.9897 | 1.0 | 21.7348 | 0.7313 |
| 6.6882 | 39.0 | 1287 | 5.2259 | 1.0 | 22.1833 | 0.5689 |
| 6.48 | 40.0 | 1320 | 4.8984 | 1.0 | 22.8371 | 0.7928 |
| 6.3136 | 41.0 | 1353 | 4.7463 | 1.0 | 21.5830 | 0.7910 |
| 6.3136 | 42.0 | 1386 | 4.6112 | 1.0 | 21.7582 | 0.7291 |
| 6.1308 | 43.0 | 1419 | 4.8187 | 1.0 | 22.0048 | 0.6860 |
| 5.9509 | 44.0 | 1452 | 4.6719 | 1.0 | 22.0582 | 0.7044 |
| 5.9509 | 45.0 | 1485 | 4.4241 | 1.0 | 21.8078 | 0.9858 |
| 5.8166 | 46.0 | 1518 | 4.5210 | 1.0 | 21.7949 | 0.8620 |
| 5.6598 | 47.0 | 1551 | 4.5541 | 1.0 | 22.3571 | 0.7085 |
| 5.6598 | 48.0 | 1584 | 4.3421 | 1.0 | 21.4648 | 0.7746 |
| 5.4744 | 49.0 | 1617 | 4.2760 | 1.0 | 21.7306 | 0.8698 |
| 5.4035 | 50.0 | 1650 | 4.2710 | 1.0 | 21.8637 | 0.9816 |
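
The Bleu column can be computed with the sacreBLEU implementation in the `evaluate` library; the sketch below is an assumption about the metric code, not a confirmed part of this run, and note that sacreBLEU natively reports scores on a 0-100 scale:

```python
# Hedged sketch of computing a BLEU score like the column above.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["A casa estava em silêncio."]   # illustrative model outputs
references = [["A casa estava em silêncio."]]  # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacreBLEU score on a 0-100 scale
```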

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1