4d18f0b7b0fc742690ac565165d0438f

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2221
  • Data Size: 1.0
  • Epoch Runtime: 301.5996
  • Bleu: 1.1422
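
The card does not include a usage snippet, so below is a minimal inference sketch (not taken from the card): it loads this checkpoint from the Hub and translates one German sentence to Spanish. The repo id `contemmcm/4d18f0b7b0fc742690ac565165d0438f` is assumed from the page metadata, and whether a task prefix (e.g. "translate German to Spanish: ") is required depends on how the training data was preprocessed, which the card does not state.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed repo id; generation settings are illustrative, not from the card.
repo_id = "contemmcm/4d18f0b7b0fc742690ac565165d0438f"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# Translate a single German sentence to Spanish.
inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```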

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
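
The card leaves this section blank; for reference, a minimal sketch of loading the Helsinki-NLP/opus_books `de-es` configuration with the datasets library. The train/validation split below is purely an assumption, since the card does not say how the data was partitioned.

```python
from datasets import load_dataset

# Each example holds a {"de": ..., "es": ...} pair under the "translation" key.
raw = load_dataset("Helsinki-NLP/opus_books", "de-es")

# opus_books ships only a "train" split, so an evaluation set has to be
# carved out manually; the 90/10 split here is illustrative, not from the card.
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0]["translation"])
```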

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
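
As a rough guide, the listed values map onto the transformers Trainer API as sketched below. The output directory is a hypothetical name, and any setting not in the list above (logging, evaluation cadence, etc.) is left at its default; the per-device batch size of 8 across 4 GPUs yields the total batch size of 32.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters; output_dir is assumed.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-de-es",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs = 32 total
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```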

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 235.5920        | 0         | 21.6725       | 0.0151 |
| No log        | 1     | 688   | 158.6363        | 0.0078    | 25.2709       | 0.0065 |
| No log        | 2     | 1376  | 85.2046         | 0.0156    | 27.3339       | 0.0036 |
| No log        | 3     | 2064  | 30.0130         | 0.0312    | 33.1372       | 0.0016 |
| 2.6754        | 4     | 2752  | 16.1713         | 0.0625    | 41.7777       | 0.0725 |
| 1.863         | 5     | 3440  | 12.0949         | 0.125     | 58.9698       | 0.1161 |
| 14.77         | 6     | 4128  | 9.1553          | 0.25      | 94.4603       | 0.1050 |
| 10.7326       | 7     | 4816  | 7.1104          | 0.5       | 162.4500      | 0.0777 |
| 7.1993        | 8.0   | 5504  | 4.8120          | 1.0       | 296.1590      | 0.0317 |
| 5.7684        | 9.0   | 6192  | 4.0821          | 1.0       | 296.1616      | 0.0404 |
| 4.8521        | 10.0  | 6880  | 3.5462          | 1.0       | 295.2833      | 0.1019 |
| 4.3215        | 11.0  | 7568  | 3.3064          | 1.0       | 293.7784      | 0.0986 |
| 3.9813        | 12.0  | 8256  | 3.1607          | 1.0       | 294.3076      | 0.1001 |
| 3.7181        | 13.0  | 8944  | 2.9624          | 1.0       | 293.7822      | 0.1871 |
| 3.5393        | 14.0  | 9632  | 2.9189          | 1.0       | 294.3335      | 0.2653 |
| 3.3765        | 15.0  | 10320 | 2.8352          | 1.0       | 297.6367      | 0.3737 |
| 3.2647        | 16.0  | 11008 | 2.7518          | 1.0       | 295.5343      | 0.2972 |
| 3.1507        | 17.0  | 11696 | 2.7165          | 1.0       | 293.9573      | 0.3930 |
| 3.0727        | 18.0  | 12384 | 2.6494          | 1.0       | 294.2747      | 0.3934 |
| 2.9689        | 19.0  | 13072 | 2.6156          | 1.0       | 293.8179      | 0.4885 |
| 2.9318        | 20.0  | 13760 | 2.5777          | 1.0       | 296.2244      | 0.4388 |
| 2.8625        | 21.0  | 14448 | 2.5504          | 1.0       | 292.5996      | 0.6039 |
| 2.8148        | 22.0  | 15136 | 2.5291          | 1.0       | 294.3252      | 0.5524 |
| 2.7718        | 23.0  | 15824 | 2.4954          | 1.0       | 295.4557      | 0.6300 |
| 2.7298        | 24.0  | 16512 | 2.4774          | 1.0       | 296.3096      | 0.6497 |
| 2.6893        | 25.0  | 17200 | 2.4443          | 1.0       | 295.0107      | 0.7417 |
| 2.6229        | 26.0  | 17888 | 2.4218          | 1.0       | 292.1687      | 0.6751 |
| 2.582         | 27.0  | 18576 | 2.3986          | 1.0       | 297.3084      | 0.6930 |
| 2.5685        | 28.0  | 19264 | 2.3927          | 1.0       | 300.0445      | 0.7737 |
| 2.5398        | 29.0  | 19952 | 2.3765          | 1.0       | 300.2355      | 0.7913 |
| 2.5014        | 30.0  | 20640 | 2.3554          | 1.0       | 295.5488      | 0.8455 |
| 2.4913        | 31.0  | 21328 | 2.3539          | 1.0       | 293.5029      | 0.7853 |
| 2.4406        | 32.0  | 22016 | 2.3387          | 1.0       | 297.9184      | 0.8931 |
| 2.4185        | 33.0  | 22704 | 2.3182          | 1.0       | 298.8404      | 0.9088 |
| 2.398         | 34.0  | 23392 | 2.3152          | 1.0       | 296.9342      | 0.7916 |
| 2.3568        | 35.0  | 24080 | 2.3110          | 1.0       | 298.0802      | 0.9005 |
| 2.3569        | 36.0  | 24768 | 2.2831          | 1.0       | 298.2648      | 0.9909 |
| 2.3031        | 37.0  | 25456 | 2.2923          | 1.0       | 299.1979      | 0.9535 |
| 2.2997        | 38.0  | 26144 | 2.2674          | 1.0       | 297.0862      | 0.9699 |
| 2.258         | 39.0  | 26832 | 2.2698          | 1.0       | 300.9418      | 1.1056 |
| 2.2468        | 40.0  | 27520 | 2.2597          | 1.0       | 294.5861      | 1.0727 |
| 2.227         | 41.0  | 28208 | 2.2564          | 1.0       | 298.8053      | 1.0456 |
| 2.1836        | 42.0  | 28896 | 2.2530          | 1.0       | 300.2710      | 1.1390 |
| 2.1922        | 43.0  | 29584 | 2.2477          | 1.0       | 302.2907      | 0.9623 |
| 2.1422        | 44.0  | 30272 | 2.2420          | 1.0       | 303.2315      | 1.0065 |
| 2.1354        | 45.0  | 30960 | 2.2412          | 1.0       | 302.1064      | 1.1697 |
| 2.1167        | 46.0  | 31648 | 2.2408          | 1.0       | 300.0712      | 1.1307 |
| 2.0993        | 47.0  | 32336 | 2.2337          | 1.0       | 300.4522      | 1.1571 |
| 2.0814        | 48.0  | 33024 | 2.2205          | 1.0       | 299.3218      | 1.0888 |
| 2.0697        | 49.0  | 33712 | 2.2232          | 1.0       | 302.0744      | 1.1725 |
| 2.0495        | 50.0  | 34400 | 2.2221          | 1.0       | 301.5996      | 1.1422 |
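
The Bleu column above is produced by the evaluation loop; a minimal sketch of computing a score in the same style with the evaluate library is shown below. The sample strings are placeholders, and sacrebleu is one common backend rather than necessarily the one used for this card.

```python
import evaluate

# Corpus-level BLEU via the sacrebleu backend; the prediction and reference
# below are placeholder strings, not outputs of this model.
bleu = evaluate.load("sacrebleu")
result = bleu.compute(
    predictions=["El libro está sobre la mesa."],
    references=[["El libro está sobre la mesa."]],
)
print(result["score"])
```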

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1