e89c4d4bc924762169972784ce0743d1

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books dataset (en-no configuration). It achieves the following results on the evaluation set (a minimal loading sketch follows the results list):

  • Loss: 2.9462
  • Data Size: 1.0
  • Epoch Runtime: 43.1853
  • BLEU: 0.8830
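
The sketch below shows one way to load this checkpoint for English-to-Norwegian generation with transformers. It is a hedged example, not documentation from the authors: the repository id is the one shown in this card's model tree, and the card does not say whether a task prefix was applied during fine-tuning.

```python
# Minimal sketch: load the fine-tuned checkpoint and translate one sentence.
# The repo id comes from this card's model tree; substitute a local
# checkpoint path if you trained the model yourself.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/e89c4d4bc924762169972784ce0743d1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Whether a task prefix (e.g. "translate English to Norwegian: ") was used
# during preprocessing is not documented; adjust the input to match.
inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```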

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction as training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
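
For readers who want to reproduce this setup, the sketch below expresses the hyperparameters above as transformers Seq2SeqTrainingArguments. It is an assumption-laden reconstruction, not the original training script: the output_dir name is hypothetical, and dataset loading, preprocessing, and the Seq2SeqTrainer call are omitted. With 4 devices, the per-device batch size of 8 yields the total batch size of 32 listed above.

```python
# Hedged sketch: the hyperparameters above, expressed as
# Seq2SeqTrainingArguments. Only the arguments reported in this card
# are set; everything else keeps the library defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-no",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```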

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU   |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0    | 224.2352        | 0         | 3.5508        | 0.0032 |
| No log        | 1     | 87   | 212.5210        | 0.0078    | 4.1545        | 0.0032 |
| No log        | 2     | 174  | 192.2315        | 0.0156    | 5.8040        | 0.0054 |
| No log        | 3     | 261  | 161.1566        | 0.0312    | 7.6665        | 0.0074 |
| No log        | 4     | 348  | 126.3028        | 0.0625    | 9.8214        | 0.0060 |
| 7.3924        | 5     | 435  | 71.1394         | 0.125     | 12.4434       | 0.0046 |
| 22.2446       | 6     | 522  | 30.4704         | 0.25      | 17.5750       | 0.0035 |
| 14.0298       | 7     | 609  | 15.4122         | 0.5       | 26.8646       | 0.0209 |
| 13.029        | 8     | 696  | 10.2502         | 1.0       | 44.2144       | 0.0331 |
| 15.3985       | 9     | 783  | 9.0259          | 1.0       | 43.5366       | 0.0348 |
| 12.89         | 10    | 870  | 8.1174          | 1.0       | 42.5628       | 0.0544 |
| 11.3325       | 11    | 957  | 7.4832          | 1.0       | 43.3066       | 0.0545 |
| 10.7894       | 12    | 1044 | 6.9241          | 1.0       | 43.5899       | 0.0645 |
| 9.8539        | 13    | 1131 | 6.4567          | 1.0       | 43.7614       | 0.0814 |
| 9.0522        | 14    | 1218 | 5.9655          | 1.0       | 42.7995       | 0.1457 |
| 8.3978        | 15    | 1305 | 5.7566          | 1.0       | 43.4940       | 0.0861 |
| 8.1609        | 16    | 1392 | 5.4790          | 1.0       | 42.9029       | 0.1043 |
| 7.6713        | 17    | 1479 | 5.0701          | 1.0       | 43.7232       | 0.1156 |
| 7.3557        | 18    | 1566 | 4.9153          | 1.0       | 42.9773       | 0.1368 |
| 7.0059        | 19    | 1653 | 4.8731          | 1.0       | 43.4807       | 0.1251 |
| 6.8734        | 20    | 1740 | 5.0616          | 1.0       | 43.8662       | 0.1216 |
| 6.5613        | 21    | 1827 | 4.5904          | 1.0       | 43.5890       | 0.1562 |
| 6.249         | 22    | 1914 | 4.4879          | 1.0       | 42.7270       | 0.1646 |
| 6.0007        | 23    | 2001 | 4.3395          | 1.0       | 42.6230       | 0.1809 |
| 5.8872        | 24    | 2088 | 4.5178          | 1.0       | 43.2088       | 0.2103 |
| 5.7128        | 25    | 2175 | 4.2705          | 1.0       | 42.8065       | 0.2511 |
| 5.5175        | 26    | 2262 | 4.3097          | 1.0       | 43.0957       | 0.2314 |
| 5.3951        | 27    | 2349 | 3.9705          | 1.0       | 42.3281       | 0.3685 |
| 5.1632        | 28    | 2436 | 3.7944          | 1.0       | 43.0646       | 0.2776 |
| 5.0412        | 29    | 2523 | 3.7838          | 1.0       | 43.4584       | 0.3647 |
| 4.8635        | 30    | 2610 | 3.6218          | 1.0       | 42.9818       | 0.5388 |
| 4.764         | 31    | 2697 | 3.6284          | 1.0       | 43.7082       | 0.5182 |
| 4.6246        | 32    | 2784 | 3.5281          | 1.0       | 42.6314       | 0.5128 |
| 4.4867        | 33    | 2871 | 3.5645          | 1.0       | 42.4939       | 0.5195 |
| 4.3818        | 34    | 2958 | 3.6955          | 1.0       | 43.6410       | 0.3649 |
| 4.376         | 35    | 3045 | 3.4153          | 1.0       | 42.5558       | 0.5612 |
| 4.2115        | 36    | 3132 | 3.3060          | 1.0       | 43.0912       | 0.6357 |
| 4.1147        | 37    | 3219 | 3.2941          | 1.0       | 42.9670       | 0.6773 |
| 4.0376        | 38    | 3306 | 3.4444          | 1.0       | 42.8137       | 0.5379 |
| 3.9058        | 39    | 3393 | 3.2842          | 1.0       | 43.1814       | 0.6365 |
| 3.8797        | 40    | 3480 | 3.3343          | 1.0       | 43.0732       | 0.6551 |
| 3.799         | 41    | 3567 | 3.2287          | 1.0       | 43.0785       | 0.6682 |
| 3.7084        | 42    | 3654 | 3.0826          | 1.0       | 42.4590       | 0.8260 |
| 3.6224        | 43    | 3741 | 3.2035          | 1.0       | 42.6156       | 0.7929 |
| 3.5609        | 44    | 3828 | 3.1611          | 1.0       | 43.3427       | 0.7412 |
| 3.4857        | 45    | 3915 | 3.0869          | 1.0       | 43.0869       | 0.7860 |
| 3.4254        | 46    | 4002 | 3.0354          | 1.0       | 43.9039       | 0.8531 |
| 3.3516        | 47    | 4089 | 2.9898          | 1.0       | 42.9068       | 0.8146 |
| 3.3001        | 48    | 4176 | 2.9625          | 1.0       | 43.2900       | 0.8684 |
| 3.2534        | 49    | 4263 | 2.9992          | 1.0       | 42.2250       | 0.8155 |
| 3.1864        | 50    | 4350 | 2.9462          | 1.0       | 43.1853       | 0.8830 |
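
The card does not document how the BLEU column was computed; a common choice in the transformers ecosystem is sacreBLEU via the evaluate library, sketched below with placeholder strings. Note that sacreBLEU reports scores on a 0-100 scale, and the card does not state which scale the values above use.

```python
# Hedged sketch: scoring translations with sacreBLEU through the evaluate
# library, one common way to produce a BLEU column like the one above.
# The strings are placeholders, not examples from opus_books.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Katten satt på matten."]
references = [["Katten satt på matten."]]  # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacreBLEU score on a 0-100 scale
```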

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1