c45e5d95c04fb6c57a4c8033d39fca63

This model is a fine-tuned version of google/long-t5-local-large on the es-fi subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8312
  • Data Size: 1.0
  • Epoch Runtime: 40.7791
  • Bleu: 0.2250
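
A minimal inference sketch is shown below. It assumes the checkpoint id contemmcm/c45e5d95c04fb6c57a4c8033d39fca63 and that the model takes the raw Spanish source sentence with no task prefix; both points should be checked against the actual training preprocessing.

```python
# Hedged sketch: load the fine-tuned checkpoint and translate es -> fi.
# The repository id and the absence of a task prefix are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/c45e5d95c04fb6c57a4c8033d39fca63"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "La vida es sueño."  # Spanish source sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```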

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
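
Assuming the standard Hugging Face Seq2SeqTrainer was used (the card does not say), the hyperparameters above map onto training arguments roughly as in this sketch; output_dir is a placeholder, and the 4-GPU setup is what yields the total batch size of 32:

```python
# Approximate reconstruction of the training configuration above,
# assuming the standard Hugging Face Seq2SeqTrainer was used.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-es-fi",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",            # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # assumption: needed to compute BLEU at eval time
)
```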

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 209.7094        | 0         | 3.3267        | 0.0010 |
| No log        | 1     | 83   | 184.2445        | 0.0078    | 3.9533        | 0.0012 |
| No log        | 2     | 166  | 161.0579        | 0.0156    | 5.0809        | 0.0013 |
| No log        | 3     | 249  | 144.9138        | 0.0312    | 6.7955        | 0.0013 |
| 5.3868        | 4     | 332  | 111.1862        | 0.0625    | 9.0224        | 0.0011 |
| 5.3868        | 5     | 415  | 65.2582         | 0.125     | 12.1194       | 0.0011 |
| 5.3868        | 6     | 498  | 25.2881         | 0.25      | 16.3851       | 0.0011 |
| 13.292        | 7     | 581  | 13.2800         | 0.5       | 25.3184       | 0.0035 |
| 18.2018       | 8.0   | 664  | 9.5933          | 1.0       | 42.7601       | 0.0043 |
| 15.789        | 9.0   | 747  | 8.5477          | 1.0       | 41.0928       | 0.0045 |
| 13.0828       | 10.0  | 830  | 7.7491          | 1.0       | 41.1897       | 0.0043 |
| 11.4015       | 11.0  | 913  | 7.6218          | 1.0       | 41.2410       | 0.0044 |
| 10.7068       | 12.0  | 996  | 6.7302          | 1.0       | 41.2708       | 0.0077 |
| 9.7104        | 13.0  | 1079 | 6.2087          | 1.0       | 40.5168       | 0.0122 |
| 8.8903        | 14.0  | 1162 | 5.9130          | 1.0       | 41.7129       | 0.0248 |
| 8.6231        | 15.0  | 1245 | 5.1895          | 1.0       | 41.0517       | 0.0652 |
| 7.9684        | 16.0  | 1328 | 5.2467          | 1.0       | 41.2612       | 0.0485 |
| 7.5212        | 17.0  | 1411 | 4.8904          | 1.0       | 41.8618       | 0.0594 |
| 7.3085        | 18.0  | 1494 | 4.9712          | 1.0       | 41.2331       | 0.0317 |
| 6.8657        | 19.0  | 1577 | 4.5537          | 1.0       | 41.8624       | 0.0458 |
| 6.5853        | 20.0  | 1660 | 4.4626          | 1.0       | 41.9728       | 0.0634 |
| 6.3659        | 21.0  | 1743 | 4.3185          | 1.0       | 41.6910       | 0.0857 |
| 6.1174        | 22.0  | 1826 | 4.2137          | 1.0       | 41.7061       | 0.0646 |
| 5.8922        | 23.0  | 1909 | 4.1079          | 1.0       | 41.7720       | 0.0545 |
| 5.747         | 24.0  | 1992 | 3.9319          | 1.0       | 41.2620       | 0.1931 |
| 5.5505        | 25.0  | 2075 | 3.8795          | 1.0       | 41.2630       | 0.1169 |
| 5.3418        | 26.0  | 2158 | 3.8160          | 1.0       | 41.6766       | 0.0979 |
| 5.1912        | 27.0  | 2241 | 3.7565          | 1.0       | 41.6077       | 0.0972 |
| 5.0714        | 28.0  | 2324 | 3.6059          | 1.0       | 41.8966       | 0.1575 |
| 4.8811        | 29.0  | 2407 | 3.5714          | 1.0       | 41.3058       | 0.1184 |
| 4.8396        | 30.0  | 2490 | 3.6037          | 1.0       | 41.7404       | 0.0959 |
| 4.6897        | 31.0  | 2573 | 3.5540          | 1.0       | 40.8913       | 0.0918 |
| 4.5818        | 32.0  | 2656 | 3.4263          | 1.0       | 41.7674       | 0.1365 |
| 4.4231        | 33.0  | 2739 | 3.2331          | 1.0       | 40.9916       | 0.1768 |
| 4.345         | 34.0  | 2822 | 3.3233          | 1.0       | 41.6434       | 0.1605 |
| 4.2561        | 35.0  | 2905 | 3.2043          | 1.0       | 41.8876       | 0.1920 |
| 4.192         | 36.0  | 2988 | 3.2323          | 1.0       | 42.0650       | 0.1436 |
| 4.1131        | 37.0  | 3071 | 3.1558          | 1.0       | 41.6571       | 0.1903 |
| 4.0248        | 38.0  | 3154 | 3.0953          | 1.0       | 41.1832       | 0.2231 |
| 3.9353        | 39.0  | 3237 | 3.0665          | 1.0       | 40.7884       | 0.2473 |
| 3.8715        | 40.0  | 3320 | 3.0905          | 1.0       | 41.5836       | 0.1599 |
| 3.7659        | 41.0  | 3403 | 3.0597          | 1.0       | 41.5660       | 0.1769 |
| 3.7303        | 42.0  | 3486 | 2.9279          | 1.0       | 41.5299       | 0.2641 |
| 3.6671        | 43.0  | 3569 | 2.9700          | 1.0       | 41.6461       | 0.2292 |
| 3.5778        | 44.0  | 3652 | 2.9280          | 1.0       | 41.0377       | 0.3111 |
| 3.5576        | 45.0  | 3735 | 2.9184          | 1.0       | 42.2988       | 0.2039 |
| 3.4826        | 46.0  | 3818 | 2.9182          | 1.0       | 40.8568       | 0.2763 |
| 3.4323        | 47.0  | 3901 | 2.8435          | 1.0       | 40.8695       | 0.2529 |
| 3.3986        | 48.0  | 3984 | 2.8897          | 1.0       | 42.0452       | 0.2206 |
| 3.346         | 49.0  | 4067 | 2.8105          | 1.0       | 41.3435       | 0.2463 |
| 3.2765        | 50.0  | 4150 | 2.8312          | 1.0       | 40.7791       | 0.2250 |
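
The Bleu column appears to be reported on a 0-1 scale. A minimal sketch of computing a comparable corpus-level score with the evaluate library follows; the card does not state which BLEU implementation produced the numbers above, so treat this only as an illustration of the metric:

```python
# Hedged sketch: corpus-level BLEU via sacrebleu from the `evaluate` library.
# The actual metric implementation used for this card is not documented.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Elämä on unta."]     # model outputs (Finnish), illustrative only
references = [["Elämä on unelma."]]  # one list of reference strings per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"] / 100)  # sacrebleu reports 0-100; divide for a 0-1 scale
```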

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1