cc6530f988294864455773617553a4e2

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fi-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9504
  • Data Size: 1.0
  • Epoch Runtime: 43.6482
  • BLEU: 0.6960
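
As a quick orientation, here is a minimal inference sketch. It assumes the checkpoint is published under the repo id shown on this page (contemmcm/cc6530f988294864455773617553a4e2) and follows the standard LongT5 seq2seq interface; whether a task prefix was used during fine-tuning is not documented, so plain input is an assumption.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id taken from this page; adjust if the checkpoint lives elsewhere.
repo_id = "contemmcm/cc6530f988294864455773617553a4e2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# Finnish source sentence; no task prefix is assumed, since the card
# does not document the input format used during fine-tuning.
inputs = tokenizer("Hyvää huomenta, maailma.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # French output
```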

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
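
Although the card leaves this section blank, the header names the corpus. A minimal loading sketch with the datasets library follows; opus_books ships only a train split, and the evaluation split used for this card is undocumented, so the 90/10 split below is an assumption.

```python
from datasets import load_dataset

# Config name "fi-fr" matches the language pair named in the header.
dataset = load_dataset("Helsinki-NLP/opus_books", "fi-fr")

# opus_books provides a single "train" split; the held-out evaluation
# set for this card is undocumented, so a 90/10 split is assumed here.
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"][0]["translation"])  # {'fi': '...', 'fr': '...'}
```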

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent Seq2SeqTrainingArguments configuration follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
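
The list above maps directly onto Transformers training arguments. The sketch below mirrors those values; output_dir is a hypothetical name, and the multi-GPU launch (4 devices, giving the total batch size of 32) is handled by the launcher rather than by these arguments.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above. With 4 GPUs, a per-device
# batch size of 8 yields the reported total train/eval batch size of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-fi-fr",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,  # required to compute BLEU at eval time
)
```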

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log        | 0     | 0    | 210.3118        | 0         | 3.5214        | 0.0339 |
| No log        | 1     | 88   | 194.9818        | 0.0078    | 4.3850        | 0.0391 |
| No log        | 2     | 176  | 169.7989        | 0.0156    | 5.5429        | 0.0418 |
| No log        | 3     | 264  | 138.3005        | 0.0312    | 7.4443        | 0.0376 |
| No log        | 4     | 352  | 98.2089         | 0.0625    | 10.0366       | 0.0106 |
| No log        | 5     | 440  | 57.4321         | 0.125     | 13.4255       | 0.0171 |
| 8.5688        | 6     | 528  | 29.0110         | 0.25      | 17.6872       | 0.0283 |
| 16.0629       | 7     | 616  | 18.4866         | 0.5       | 26.4004       | 0.2147 |
| 24.2236       | 8     | 704  | 14.5337         | 1.0       | 44.8358       | 0.3396 |
| 21.6713       | 9     | 792  | 12.8300         | 1.0       | 44.1918       | 0.1911 |
| 18.5801       | 10    | 880  | 11.2006         | 1.0       | 42.8811       | 0.0965 |
| 16.5609       | 11    | 968  | 10.4868         | 1.0       | 43.8189       | 0.1437 |
| 15.0038       | 12    | 1056 | 9.2199          | 1.0       | 43.4298       | 0.1613 |
| 14.2505       | 13    | 1144 | 9.2859          | 1.0       | 42.9938       | 0.2398 |
| 13.3176       | 14    | 1232 | 8.5574          | 1.0       | 44.2948       | 0.2922 |
| 12.514        | 15    | 1320 | 7.9899          | 1.0       | 43.3043       | 0.2558 |
| 11.7392       | 16    | 1408 | 7.7368          | 1.0       | 43.6196       | 0.2846 |
| 11.3779       | 17    | 1496 | 7.9881          | 1.0       | 44.4180       | 0.2235 |
| 10.8107       | 18    | 1584 | 7.3984          | 1.0       | 42.9446       | 0.3104 |
| 10.3264       | 19    | 1672 | 6.7918          | 1.0       | 43.6092       | 0.3328 |
| 9.7584        | 20    | 1760 | 6.9432          | 1.0       | 42.9454       | 0.2219 |
| 9.5394        | 21    | 1848 | 6.6225          | 1.0       | 43.8095       | 0.2157 |
| 9.1196        | 22    | 1936 | 5.9313          | 1.0       | 42.8700       | 0.2882 |
| 8.6616        | 23    | 2024 | 5.8392          | 1.0       | 44.2303       | 0.3488 |
| 8.3646        | 24    | 2112 | 5.9353          | 1.0       | 44.0809       | 0.3769 |
| 8.0709        | 25    | 2200 | 5.5733          | 1.0       | 43.3977       | 0.2680 |
| 7.8487        | 26    | 2288 | 5.7821          | 1.0       | 43.9842       | 0.5559 |
| 7.5668        | 27    | 2376 | 5.6761          | 1.0       | 43.7011       | 0.2355 |
| 7.2396        | 28    | 2464 | 5.2907          | 1.0       | 43.4624       | 0.3873 |
| 7.042         | 29    | 2552 | 5.2608          | 1.0       | 43.5775       | 0.3210 |
| 6.8641        | 30    | 2640 | 5.0165          | 1.0       | 43.7052       | 0.4909 |
| 6.5967        | 31    | 2728 | 5.0911          | 1.0       | 43.7801       | 0.3103 |
| 6.423         | 32    | 2816 | 4.7339          | 1.0       | 42.6348       | 0.3563 |
| 6.2719        | 33    | 2904 | 4.7812          | 1.0       | 44.5199       | 0.4468 |
| 6.1618        | 34    | 2992 | 4.8750          | 1.0       | 43.5037       | 0.4621 |
| 5.9636        | 35    | 3080 | 4.7315          | 1.0       | 43.6152       | 0.3522 |
| 5.8011        | 36    | 3168 | 4.5619          | 1.0       | 43.7941       | 0.5355 |
| 5.7031        | 37    | 3256 | 4.4944          | 1.0       | 43.8364       | 0.4846 |
| 5.5746        | 38    | 3344 | 4.4306          | 1.0       | 42.8775       | 0.5597 |
| 5.4236        | 39    | 3432 | 4.4541          | 1.0       | 44.3190       | 0.5422 |
| 5.2757        | 40    | 3520 | 4.4262          | 1.0       | 44.0665       | 0.6967 |
| 5.1584        | 41    | 3608 | 4.2310          | 1.0       | 44.1722       | 0.6211 |
| 5.0859        | 42    | 3696 | 4.2331          | 1.0       | 43.3939       | 0.9223 |
| 4.9627        | 43    | 3784 | 4.1307          | 1.0       | 42.9168       | 0.6862 |
| 4.881         | 44    | 3872 | 4.1794          | 1.0       | 43.3896       | 0.6229 |
| 4.7832        | 45    | 3960 | 4.0830          | 1.0       | 43.9060       | 0.6629 |
| 4.7222        | 46    | 4048 | 4.0359          | 1.0       | 43.2298       | 0.6652 |
| 4.6348        | 47    | 4136 | 4.0967          | 1.0       | 44.0093       | 0.6725 |
| 4.5194        | 48    | 4224 | 4.0618          | 1.0       | 43.0736       | 0.9749 |
| 4.453         | 49    | 4312 | 3.9898          | 1.0       | 43.0868       | 0.7547 |
| 4.3851        | 50    | 4400 | 3.9504          | 1.0       | 43.6482       | 0.6960 |
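
The BLEU column implies generation-based evaluation after each epoch. The exact metric implementation is not documented; the sketch below uses the evaluate library's sacrebleu wrapper, the usual choice in Transformers translation examples, as an assumption.

```python
import evaluate

# sacreBLEU is assumed here; the card does not name the BLEU variant used.
bleu = evaluate.load("sacrebleu")

predictions = ["Bonjour le monde."]         # decoded model outputs
references = [["Bonjour, tout le monde."]]  # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```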

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1