eac34b484a22a3628c2ccdb617917d69

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [es-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5964
  • Data Size: 1.0
  • Epoch Runtime: 185.4283
  • Bleu: 1.3104
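
A minimal inference sketch, assuming the checkpoint is loaded by its Hub repo id (contemmcm/eac34b484a22a3628c2ccdb617917d69) with the standard transformers seq2seq API; the card does not document a task prefix or generation settings, so the values below are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id as listed on the Hub; substitute a local path if loading offline.
model_id = "contemmcm/eac34b484a22a3628c2ccdb617917d69"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# LongT5 is a T5-style encoder-decoder; a plain Spanish source sentence is a
# reasonable default input, since the card does not document a task prefix.
inputs = tokenizer("La casa estaba en silencio.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```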

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
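
The card only names the dataset; below is a minimal sketch of loading the Helsinki-NLP/opus_books es-ru pair with the datasets library (the author's actual preprocessing and splits are not documented):

```python
from datasets import load_dataset

# Dataset and language pair taken from the card; preprocessing is unspecified.
dataset = load_dataset("Helsinki-NLP/opus_books", "es-ru")
print(dataset["train"][0]["translation"])  # {'es': '...', 'ru': '...'}
```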

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
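
A hedged reconstruction of how these values could be expressed as Seq2SeqTrainingArguments; the actual training script is not part of this card, and the output_dir name below is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Per-device batch size 8 on 4 GPUs gives the reported total batch size of 32
# (launching across devices, e.g. with torchrun or accelerate, is assumed).
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-es-ru",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```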

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 201.7415        | 0         | 13.7183       | 0.0021 |
| No log        | 1     | 419   | 150.6333        | 0.0078    | 16.5532       | 0.0017 |
| No log        | 2     | 838   | 99.6125         | 0.0156    | 17.9770       | 0.0029 |
| 3.5686        | 3     | 1257  | 40.6733         | 0.0312    | 21.3690       | 0.0034 |
| 3.5686        | 4     | 1676  | 15.8695         | 0.0625    | 27.4520       | 0.0037 |
| 2.029         | 5     | 2095  | 8.8537          | 0.125     | 39.4947       | 0.0151 |
| 1.8451        | 6     | 2514  | 7.1642          | 0.25      | 60.9853       | 0.0610 |
| 10.1166       | 7     | 2933  | 5.6426          | 0.5       | 102.4794      | 0.0545 |
| 7.1548        | 8.0   | 3352  | 3.9471          | 1.0       | 187.1191      | 0.0995 |
| 5.8098        | 9.0   | 3771  | 3.2701          | 1.0       | 185.9093      | 0.1842 |
| 5.0287        | 10.0  | 4190  | 3.0513          | 1.0       | 185.4253      | 0.2690 |
| 4.305         | 11.0  | 4609  | 2.6723          | 1.0       | 184.4337      | 0.3414 |
| 3.8963        | 12.0  | 5028  | 2.6301          | 1.0       | 185.9095      | 0.3794 |
| 3.5101        | 13.0  | 5447  | 2.3203          | 1.0       | 185.7310      | 0.3516 |
| 3.294         | 14.0  | 5866  | 2.2275          | 1.0       | 186.2012      | 0.3561 |
| 3.056         | 15.0  | 6285  | 2.1389          | 1.0       | 186.7088      | 0.5289 |
| 2.8816        | 16.0  | 6704  | 2.0732          | 1.0       | 186.1090      | 0.5626 |
| 2.7164        | 17.0  | 7123  | 2.0461          | 1.0       | 186.8601      | 0.5161 |
| 2.6059        | 18.0  | 7542  | 1.9743          | 1.0       | 186.4966      | 0.5527 |
| 2.4976        | 19.0  | 7961  | 1.9548          | 1.0       | 184.8019      | 0.3953 |
| 2.4114        | 20.0  | 8380  | 1.9271          | 1.0       | 187.4944      | 0.4917 |
| 2.3611        | 21.0  | 8799  | 1.8831          | 1.0       | 186.7338      | 0.5610 |
| 2.3081        | 22.0  | 9218  | 1.8734          | 1.0       | 186.6796      | 0.5303 |
| 2.2358        | 23.0  | 9637  | 1.8519          | 1.0       | 187.8298      | 0.5170 |
| 2.1536        | 24.0  | 10056 | 1.8180          | 1.0       | 187.0982      | 0.4831 |
| 2.1465        | 25.0  | 10475 | 1.8126          | 1.0       | 185.7881      | 0.6376 |
| 2.1027        | 26.0  | 10894 | 1.7823          | 1.0       | 186.0421      | 0.7252 |
| 2.0634        | 27.0  | 11313 | 1.7695          | 1.0       | 186.5257      | 0.6960 |
| 2.0437        | 28.0  | 11732 | 1.7442          | 1.0       | 184.7985      | 0.7366 |
| 2.0119        | 29.0  | 12151 | 1.7512          | 1.0       | 184.7184      | 0.6206 |
| 1.9818        | 30.0  | 12570 | 1.7254          | 1.0       | 186.1011      | 0.7023 |
| 1.952         | 31.0  | 12989 | 1.7143          | 1.0       | 186.7252      | 0.7113 |
| 1.9279        | 32.0  | 13408 | 1.7260          | 1.0       | 187.3581      | 0.7404 |
| 1.9076        | 33.0  | 13827 | 1.7100          | 1.0       | 187.1126      | 0.8001 |
| 1.882         | 34.0  | 14246 | 1.7079          | 1.0       | 187.0736      | 0.9485 |
| 1.8713        | 35.0  | 14665 | 1.6857          | 1.0       | 185.7581      | 0.8636 |
| 1.8552        | 36.0  | 15084 | 1.6698          | 1.0       | 186.2798      | 0.9409 |
| 1.811         | 37.0  | 15503 | 1.6657          | 1.0       | 184.8279      | 0.9436 |
| 1.8074        | 38.0  | 15922 | 1.6667          | 1.0       | 185.9223      | 0.9371 |
| 1.7854        | 39.0  | 16341 | 1.6506          | 1.0       | 187.9232      | 1.0613 |
| 1.7659        | 40.0  | 16760 | 1.6429          | 1.0       | 186.2874      | 1.0016 |
| 1.7587        | 41.0  | 17179 | 1.6280          | 1.0       | 185.0036      | 0.9775 |
| 1.748         | 42.0  | 17598 | 1.6436          | 1.0       | 185.1594      | 1.1243 |
| 1.7297        | 43.0  | 18017 | 1.6248          | 1.0       | 185.3714      | 1.0467 |
| 1.7089        | 44.0  | 18436 | 1.6211          | 1.0       | 186.5728      | 1.1851 |
| 1.6861        | 45.0  | 18855 | 1.6184          | 1.0       | 184.8577      | 1.1488 |
| 1.6765        | 46.0  | 19274 | 1.6148          | 1.0       | 185.9029      | 1.0135 |
| 1.6652        | 47.0  | 19693 | 1.6117          | 1.0       | 187.1650      | 1.1559 |
| 1.6399        | 48.0  | 20112 | 1.5886          | 1.0       | 187.4137      | 1.1896 |
| 1.6529        | 49.0  | 20531 | 1.5901          | 1.0       | 185.5260      | 1.1178 |
| 1.6186        | 50.0  | 20950 | 1.5964          | 1.0       | 185.4283      | 1.3104 |
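
The Bleu column can in principle be reproduced with the evaluate library's sacrebleu metric; the exact metric configuration used for this card is not documented, so the snippet below is only a sketch with placeholder sentences:

```python
import evaluate

# Sketch of a BLEU computation (sacrebleu backend); the sentences here are
# placeholders, not actual model outputs from this training run.
bleu = evaluate.load("sacrebleu")
predictions = ["..."]        # decoded model outputs (Russian)
references = [["..."]]       # one list of reference translations per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])
```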

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1