4c71ce1eafdb5d36bfec031e622c7637

This model is a fine-tuned version of google/long-t5-local-large on the German-Italian (de-it) subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2967
  • Data Size: 1.0 (full training set)
  • Epoch Runtime: 296.9825 seconds
  • Bleu: 1.5312

Model description

More information needed

Intended uses & limitations

More information needed
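
Usage is not documented in this card, so the snippet below is only a minimal sketch of how a LongT5 translation checkpoint is typically loaded with transformers. The repo id is taken from this card; whether a task prefix (e.g. "translate German to Italian: ") was used during fine-tuning is not documented, so treat the prompt format as an assumption.

```python
# Minimal sketch (assumptions: the checkpoint is published as
# contemmcm/4c71ce1eafdb5d36bfec031e622c7637 and maps German source text
# directly to Italian without a task prefix).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/4c71ce1eafdb5d36bfec031e622c7637"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

src = "Der kleine Prinz ist ein Buch für Kinder und Erwachsene."
inputs = tokenizer(src, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```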

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
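
For reference, the values above map roughly onto the Seq2SeqTrainingArguments shown below. This is a sketch under the assumption of a standard Seq2SeqTrainer run (per-device batch size 8 on 4 GPUs gives the reported totals of 32); the original training script is not published, and the output_dir name is hypothetical.

```python
# Sketch of training arguments matching the hyperparameters above (assumption:
# a standard Seq2SeqTrainer setup; not the authors' actual script).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-de-it",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999) and eps=1e-08 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # generate during eval so BLEU can be computed
)
```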

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0     | 229.1420        | 0         | 21.3235           | 0.0102 |
| No log        | 1     | 684   | 169.5335        | 0.0078    | 23.9654           | 0.0061 |
| No log        | 2     | 1368  | 105.1773        | 0.0156    | 27.4962           | 0.0041 |
| No log        | 3     | 2052  | 37.4363         | 0.0312    | 33.8458           | 0.0015 |
| No log        | 4     | 2736  | 17.6935         | 0.0625    | 42.6523           | 0.0095 |
| 22.5653       | 5     | 3420  | 12.8546         | 0.125     | 61.7509           | 0.2321 |
| 15.8601       | 6     | 4104  | 9.7817          | 0.25      | 97.5144           | 0.0232 |
| 11.2192       | 7     | 4788  | 7.5349          | 0.5       | 167.9222          | 0.0168 |
| 7.604         | 8.0   | 5472  | 5.1665          | 1.0       | 311.9814          | 0.1142 |
| 5.8428        | 9.0   | 6156  | 4.0933          | 1.0       | 306.4764          | 0.1187 |
| 5.0192        | 10.0  | 6840  | 3.7044          | 1.0       | 308.3549          | 0.1499 |
| 4.485         | 11.0  | 7524  | 3.3896          | 1.0       | 300.2145          | 0.1799 |
| 4.0706        | 12.0  | 8208  | 3.1497          | 1.0       | 298.1315          | 0.3534 |
| 3.8092        | 13.0  | 8892  | 3.0523          | 1.0       | 300.3850          | 0.3495 |
| 3.6472        | 14.0  | 9576  | 2.9593          | 1.0       | 298.2718          | 0.4008 |
| 3.4461        | 15.0  | 10260 | 2.8402          | 1.0       | 297.2106          | 0.4580 |
| 3.3258        | 16.0  | 10944 | 2.7898          | 1.0       | 301.6928          | 0.4834 |
| 3.2073        | 17.0  | 11628 | 2.7334          | 1.0       | 300.2841          | 0.6354 |
| 3.1163        | 18.0  | 12312 | 2.6852          | 1.0       | 298.0317          | 0.6447 |
| 3.0383        | 19.0  | 12996 | 2.6541          | 1.0       | 297.7672          | 0.6590 |
| 2.9504        | 20.0  | 13680 | 2.6051          | 1.0       | 300.4691          | 0.7512 |
| 2.8632        | 21.0  | 14364 | 2.5677          | 1.0       | 297.4315          | 0.8523 |
| 2.8375        | 22.0  | 15048 | 2.5375          | 1.0       | 296.9936          | 0.8462 |
| 2.7784        | 23.0  | 15732 | 2.5039          | 1.0       | 299.7437          | 0.9535 |
| 2.7273        | 24.0  | 16416 | 2.4831          | 1.0       | 299.1126          | 0.9218 |
| 2.7019        | 25.0  | 17100 | 2.4658          | 1.0       | 307.1864          | 0.9450 |
| 2.6508        | 26.0  | 17784 | 2.4403          | 1.0       | 298.0001          | 1.0115 |
| 2.5866        | 27.0  | 18468 | 2.4429          | 1.0       | 300.0931          | 1.0111 |
| 2.5655        | 28.0  | 19152 | 2.4098          | 1.0       | 298.6058          | 1.0417 |
| 2.5107        | 29.0  | 19836 | 2.3942          | 1.0       | 297.7683          | 1.2072 |
| 2.4684        | 30.0  | 20520 | 2.3807          | 1.0       | 297.4598          | 1.1409 |
| 2.4466        | 31.0  | 21204 | 2.3601          | 1.0       | 298.9890          | 1.1560 |
| 2.4378        | 32.0  | 21888 | 2.3492          | 1.0       | 298.1015          | 1.2198 |
| 2.375         | 33.0  | 22572 | 2.3348          | 1.0       | 298.1212          | 1.2304 |
| 2.3469        | 34.0  | 23256 | 2.3288          | 1.0       | 296.2021          | 1.3018 |
| 2.3185        | 35.0  | 23940 | 2.3205          | 1.0       | 298.5160          | 1.3001 |
| 2.2792        | 36.0  | 24624 | 2.3057          | 1.0       | 298.3413          | 1.3007 |
| 2.2462        | 37.0  | 25308 | 2.3119          | 1.0       | 297.9856          | 1.3171 |
| 2.2336        | 38.0  | 25992 | 2.3040          | 1.0       | 300.3649          | 1.4047 |
| 2.178         | 39.0  | 26676 | 2.2890          | 1.0       | 298.0711          | 1.3388 |
| 2.1507        | 40.0  | 27360 | 2.2871          | 1.0       | 299.2080          | 1.4162 |
| 2.1355        | 41.0  | 28044 | 2.2830          | 1.0       | 296.7871          | 1.4717 |
| 2.113         | 42.0  | 28728 | 2.2790          | 1.0       | 297.9198          | 1.4811 |
| 2.0881        | 43.0  | 29412 | 2.2894          | 1.0       | 297.2793          | 1.5521 |
| 2.0622        | 44.0  | 30096 | 2.2832          | 1.0       | 298.7365          | 1.4748 |
| 2.0253        | 45.0  | 30780 | 2.2811          | 1.0       | 295.0279          | 1.5263 |
| 2.0143        | 46.0  | 31464 | 2.2967          | 1.0       | 296.9825          | 1.5312 |
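
The Bleu column is a corpus-level score on the evaluation set; the card does not state which implementation was used, but such scores are typically computed with sacrebleu via the evaluate library, roughly as in the sketch below (the example strings are placeholders, not data from this run).

```python
# Sketch: corpus-level BLEU with sacrebleu through the evaluate library
# (assumption about the metric implementation; in a real run, predictions and
# references are decoded strings from the evaluation split).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Il piccolo principe è un libro per bambini e adulti."]
references = [["Il piccolo principe è un libro per bambini e adulti."]]
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # BLEU on a 0-100 scale
```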

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1