c13a759b61384e71ae749ca08ee45860

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-en] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4073
  • Data Size: 1.0 (fraction of the training set used; see the ramp-up schedule in the results table)
  • Epoch Runtime: 555.3545
  • Bleu: 8.7727

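A minimal usage sketch follows. The repo id is assumed from this card's repository path, and the T5-style "translate German to English:" task prefix is an assumption, since the card does not document the exact input format used during fine-tuning.

```python
# Minimal inference sketch. The model_id is assumed from this card's
# repository path; the task prefix is an assumption carried over from
# T5-style translation fine-tuning and may not match the actual format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/c13a759b61384e71ae749ca08ee45860"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate German to English: Der alte Mann blickte auf das Meer."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
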
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
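
Pending that information, here is a minimal sketch of loading the dataset named in the introduction. Helsinki-NLP/opus_books ships only a `train` split, so an evaluation set has to be carved out manually; the 10% split size and seed below are placeholder assumptions, not the card's actual recipe.

```python
# Hedged sketch: opus_books has no built-in validation split, so one is
# created here. The test_size and seed are placeholder assumptions.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "de-en")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(train_ds[0]["translation"])  # {'de': '...', 'en': '...'}
```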

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
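
A sketch of how these settings map onto `Seq2SeqTrainingArguments`. The `output_dir` is a placeholder, and `predict_with_generate` is an assumption made so BLEU can be computed during evaluation; the multi-GPU setup (4 devices) is handled by the launcher, which is how the per-device batch size of 8 yields the total batch size of 32.

```python
# Hedged configuration sketch mirroring the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-de-en",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total batch size 32
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: needed for BLEU during eval
)
```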

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 235.2919        | 0         | 39.2661       | 0.0132 |
| No log        | 1     | 1286  | 120.1965        | 0.0078    | 44.2889       | 0.0043 |
| 2.8146        | 2     | 2572  | 49.4094         | 0.0156    | 49.6286       | 0.0051 |
| 1.6071        | 3     | 3858  | 20.4550         | 0.0312    | 59.4574       | 0.7587 |
| 1.3271        | 4     | 5144  | 14.5204         | 0.0625    | 76.6593       | 0.8207 |
| 18.1546       | 5     | 6430  | 11.0647         | 0.125     | 108.8657      | 0.5067 |
| 13.2691       | 6     | 7716  | 8.8710          | 0.25      | 174.8121      | 0.2393 |
| 9.5107        | 7     | 9002  | 6.4360          | 0.5       | 305.4358      | 0.3510 |
| 6.6882        | 8.0   | 10288 | 5.0426          | 1.0       | 559.3121      | 0.4920 |
| 5.6293        | 9.0   | 11574 | 4.4863          | 1.0       | 562.5860      | 0.7639 |
| 5.0402        | 10.0  | 12860 | 4.2530          | 1.0       | 548.2183      | 0.8391 |
| 4.7066        | 11.0  | 14146 | 4.0890          | 1.0       | 547.5112      | 1.0605 |
| 4.4638        | 12.0  | 15432 | 3.9215          | 1.0       | 550.3030      | 1.2834 |
| 4.2706        | 13.0  | 16718 | 3.7970          | 1.0       | 546.8803      | 1.5993 |
| 4.1206        | 14.0  | 18004 | 3.7101          | 1.0       | 547.8734      | 1.6043 |
| 4.0485        | 15.0  | 19290 | 3.6028          | 1.0       | 550.3787      | 2.0361 |
| 3.8513        | 16.0  | 20576 | 3.4695          | 1.0       | 554.7206      | 2.4633 |
| 3.6965        | 17.0  | 21862 | 3.3163          | 1.0       | 552.1435      | 3.2631 |
| 3.5174        | 18.0  | 23148 | 3.1673          | 1.0       | 552.6019      | 3.7798 |
| 3.4198        | 19.0  | 24434 | 3.0481          | 1.0       | 553.0598      | 4.8593 |
| 3.2156        | 20.0  | 25720 | 2.9519          | 1.0       | 546.2709      | 5.5273 |
| 3.1142        | 21.0  | 27006 | 2.8637          | 1.0       | 548.0975      | 5.5330 |
| 3.0316        | 22.0  | 28292 | 2.7891          | 1.0       | 547.9021      | 5.8670 |
| 2.9276        | 23.0  | 29578 | 2.7405          | 1.0       | 556.6313      | 6.6010 |
| 2.8277        | 24.0  | 30864 | 2.6889          | 1.0       | 562.8570      | 6.5369 |
| 2.7293        | 25.0  | 32150 | 2.6523          | 1.0       | 554.6323      | 6.9262 |
| 2.7133        | 26.0  | 33436 | 2.6092          | 1.0       | 555.7105      | 7.2215 |
| 2.6023        | 27.0  | 34722 | 2.5574          | 1.0       | 555.7316      | 7.3526 |
| 2.5282        | 28.0  | 36008 | 2.5258          | 1.0       | 557.3460      | 7.5779 |
| 2.4563        | 29.0  | 37294 | 2.5076          | 1.0       | 562.6717      | 7.6128 |
| 2.3971        | 30.0  | 38580 | 2.4874          | 1.0       | 550.9001      | 7.7558 |
| 2.2987        | 31.0  | 39866 | 2.4654          | 1.0       | 552.4083      | 8.1211 |
| 2.2517        | 32.0  | 41152 | 2.4466          | 1.0       | 560.5730      | 8.0958 |
| 2.2175        | 33.0  | 42438 | 2.4352          | 1.0       | 555.0719      | 8.5334 |
| 2.2171        | 34.0  | 43724 | 2.4245          | 1.0       | 555.6583      | 8.1145 |
| 2.092         | 35.0  | 45010 | 2.4045          | 1.0       | 557.6587      | 7.7817 |
| 2.1219        | 36.0  | 46296 | 2.3932          | 1.0       | 556.0943      | 8.4077 |
| 2.0355        | 37.0  | 47582 | 2.3923          | 1.0       | 566.1635      | 8.4984 |
| 2.0013        | 38.0  | 48868 | 2.3870          | 1.0       | 560.9526      | 8.2356 |
| 1.9366        | 39.0  | 50154 | 2.3860          | 1.0       | 556.5659      | 8.4268 |
| 1.8571        | 40.0  | 51440 | 2.3951          | 1.0       | 556.1993      | 8.4897 |
| 1.8342        | 41.0  | 52726 | 2.3917          | 1.0       | 557.9698      | 8.8207 |
| 1.7889        | 42.0  | 54012 | 2.3831          | 1.0       | 554.2279      | 8.6197 |
| 1.7624        | 43.0  | 55298 | 2.3902          | 1.0       | 554.2447      | 8.5846 |
| 1.7079        | 44.0  | 56584 | 2.3930          | 1.0       | 556.5337      | 8.7441 |
| 1.6758        | 45.0  | 57870 | 2.4084          | 1.0       | 557.9219      | 8.4475 |
| 1.6574        | 46.0  | 59156 | 2.4073          | 1.0       | 555.3545      | 8.7727 |
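
The Bleu column can be reproduced in spirit with the `evaluate` library's sacrebleu metric, which scores on a 0-100 scale consistent with the final ~8.77 above; whether the original run used sacrebleu or another BLEU implementation is an assumption.

```python
# Hedged sketch of computing BLEU with evaluate's sacrebleu metric.
# The example sentences are illustrative, not from the evaluation set.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The old man looked out at the sea."]
references = [["The old man looked out to sea."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```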

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1