a04f75b168fb5e65cc4aa043c0986e46

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5194
  • Data Size: 1.0
  • Epoch Runtime: 190.7035
  • Bleu: 2.9426
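
The checkpoint can be loaded like any other seq2seq Transformers model. Below is a minimal sketch, assuming the weights are hosted under the repo id used by this card (contemmcm/a04f75b168fb5e65cc4aa043c0986e46); whether a task prefix was prepended to the source text during fine-tuning is not documented, so none is used here.

```python
# Minimal German-to-Russian inference sketch for this checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/a04f75b168fb5e65cc4aa043c0986e46"  # assumed hosting location
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Das ist ein Beispiel."  # German source sentence; prepend a task prefix here if one was used in training
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```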

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
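
The corpus named above can be inspected with the datasets library. A minimal sketch; how the evaluation split was carved out for this run is not documented, so only the published train split is shown.

```python
from datasets import load_dataset

# Load the German-Russian pairs from OPUS Books (the dataset ships a single train split).
dataset = load_dataset("Helsinki-NLP/opus_books", "de-ru")
example = dataset["train"][0]
print(example["translation"]["de"])  # German source sentence
print(example["translation"]["ru"])  # Russian reference translation
```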

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
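
These values map directly onto Seq2SeqTrainingArguments. A minimal sketch, assuming the run used the Seq2SeqTrainer API; the output directory name is a placeholder, and only the values listed above come from this card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-de-ru",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs = 32 total
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # generate translations during evaluation so BLEU can be computed
)
```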

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 202.5984 | 0 | 14.1871 | 0.0032 |
| No log | 1 | 434 | 146.0637 | 0.0078 | 16.4882 | 0.0009 |
| No log | 2 | 868 | 82.6320 | 0.0156 | 17.8014 | 0.0042 |
| No log | 3 | 1302 | 36.9967 | 0.0312 | 21.5719 | 0.0014 |
| No log | 4 | 1736 | 12.9721 | 0.0625 | 27.5246 | 0.0322 |
| 2.352 | 5 | 2170 | 8.9631 | 0.125 | 38.5097 | 0.0120 |
| 13.7923 | 6 | 2604 | 6.8370 | 0.25 | 60.6155 | 0.0158 |
| 9.8565 | 7 | 3038 | 5.2524 | 0.5 | 102.4748 | 0.1683 |
| 7.1531 | 8.0 | 3472 | 3.9123 | 1.0 | 189.4097 | 0.1544 |
| 5.7235 | 9.0 | 3906 | 3.3748 | 1.0 | 187.1634 | 0.3366 |
| 4.9217 | 10.0 | 4340 | 2.8952 | 1.0 | 189.0930 | 0.5316 |
| 4.2887 | 11.0 | 4774 | 2.6399 | 1.0 | 189.0462 | 0.6685 |
| 3.7893 | 12.0 | 5208 | 2.5477 | 1.0 | 188.3899 | 0.6963 |
| 3.5148 | 13.0 | 5642 | 2.2363 | 1.0 | 188.5408 | 0.9609 |
| 3.1863 | 14.0 | 6076 | 2.1948 | 1.0 | 187.5332 | 0.9438 |
| 2.9695 | 15.0 | 6510 | 2.1323 | 1.0 | 191.2049 | 1.0293 |
| 2.8219 | 16.0 | 6944 | 2.0521 | 1.0 | 189.4276 | 1.1523 |
| 2.6797 | 17.0 | 7378 | 1.9762 | 1.0 | 187.4849 | 1.2506 |
| 2.5808 | 18.0 | 7812 | 1.9738 | 1.0 | 188.5891 | 1.2668 |
| 2.4829 | 19.0 | 8246 | 1.9159 | 1.0 | 189.5944 | 1.2928 |
| 2.3538 | 20.0 | 8680 | 1.8791 | 1.0 | 189.3800 | 1.2319 |
| 2.2968 | 21.0 | 9114 | 1.8656 | 1.0 | 188.7305 | 1.2929 |
| 2.2477 | 22.0 | 9548 | 1.8339 | 1.0 | 190.0467 | 1.2751 |
| 2.1886 | 23.0 | 9982 | 1.8181 | 1.0 | 188.2904 | 1.2143 |
| 2.1226 | 24.0 | 10416 | 1.7675 | 1.0 | 187.7876 | 1.5454 |
| 2.0982 | 25.0 | 10850 | 1.7498 | 1.0 | 187.9860 | 1.6132 |
| 2.0616 | 26.0 | 11284 | 1.7464 | 1.0 | 189.7731 | 1.6805 |
| 1.9994 | 27.0 | 11718 | 1.7334 | 1.0 | 188.4008 | 1.6033 |
| 1.9536 | 28.0 | 12152 | 1.7069 | 1.0 | 188.5661 | 1.7287 |
| 1.9417 | 29.0 | 12586 | 1.6890 | 1.0 | 187.8541 | 1.7209 |
| 1.912 | 30.0 | 13020 | 1.6810 | 1.0 | 189.3464 | 1.7065 |
| 1.8667 | 31.0 | 13454 | 1.6775 | 1.0 | 190.3884 | 1.7976 |
| 1.8477 | 32.0 | 13888 | 1.6604 | 1.0 | 190.3526 | 1.8207 |
| 1.8402 | 33.0 | 14322 | 1.6425 | 1.0 | 189.3657 | 1.9363 |
| 1.815 | 34.0 | 14756 | 1.6353 | 1.0 | 190.6804 | 2.0192 |
| 1.7951 | 35.0 | 15190 | 1.6228 | 1.0 | 191.0806 | 2.0446 |
| 1.7637 | 36.0 | 15624 | 1.6211 | 1.0 | 190.5929 | 1.9894 |
| 1.7506 | 37.0 | 16058 | 1.6261 | 1.0 | 187.5604 | 2.0763 |
| 1.721 | 38.0 | 16492 | 1.5991 | 1.0 | 188.9553 | 2.2292 |
| 1.6896 | 39.0 | 16926 | 1.5949 | 1.0 | 190.0516 | 2.2357 |
| 1.6767 | 40.0 | 17360 | 1.5716 | 1.0 | 190.8610 | 2.3121 |
| 1.6509 | 41.0 | 17794 | 1.5777 | 1.0 | 191.4814 | 2.2918 |
| 1.6528 | 42.0 | 18228 | 1.5765 | 1.0 | 189.4791 | 2.3562 |
| 1.6221 | 43.0 | 18662 | 1.5635 | 1.0 | 189.4849 | 2.4379 |
| 1.5932 | 44.0 | 19096 | 1.5540 | 1.0 | 190.0615 | 2.5319 |
| 1.5844 | 45.0 | 19530 | 1.5539 | 1.0 | 190.9674 | 2.6273 |
| 1.5697 | 46.0 | 19964 | 1.5391 | 1.0 | 189.2361 | 2.5885 |
| 1.5423 | 47.0 | 20398 | 1.5403 | 1.0 | 190.4720 | 2.5408 |
| 1.5435 | 48.0 | 20832 | 1.5302 | 1.0 | 190.5557 | 2.6584 |
| 1.536 | 49.0 | 21266 | 1.5258 | 1.0 | 189.1135 | 2.8198 |
| 1.5052 | 50.0 | 21700 | 1.5194 | 1.0 | 190.7035 | 2.9426 |
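
For reference, BLEU scores like the ones in the table above can be computed with the evaluate library. This is only a sketch; whether this run used sacrebleu or another BLEU implementation is an assumption, and the sentences below are placeholders.

```python
import evaluate

# Corpus-level BLEU between model outputs and gold Russian translations.
bleu = evaluate.load("sacrebleu")
predictions = ["Это пример."]      # hypothetical model outputs
references = [["Это пример."]]     # hypothetical gold references (one list per prediction)
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])
```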

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1