# e89c4d4bc924762169972784ce0743d1
This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-no] dataset. It achieves the following results on the evaluation set:
- Loss: 2.9462
- Data Size: 1.0
- Epoch Runtime: 43.1853
- BLEU: 0.8830
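
The card does not ship a usage example; the snippet below is a minimal inference sketch, assuming this checkpoint is the one hosted at `contemmcm/e89c4d4bc924762169972784ce0743d1` and that the translation direction is English to Norwegian, as suggested by the `en-no` dataset configuration.

```python
# Minimal inference sketch; the repo id and translation direction are assumptions noted above.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/e89c4d4bc924762169972784ce0743d1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Depending on how the training data was preprocessed, a T5-style task prefix may be required.
text = "The old man closed the book and looked out of the window."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```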
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
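
Pending those details, here is a hedged sketch of loading the `en-no` pair of Helsinki-NLP/opus_books with the `datasets` library; the dataset ships a single `train` split, and the validation split used for the results below is not documented, so the 90/10 split here is purely illustrative.

```python
# Illustrative loading of the dataset named in the summary; the actual split used is undocumented.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "en-no")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)  # illustrative 90/10 split
print(splits["train"][0]["translation"])  # {'en': '...', 'no': '...'}
```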
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
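
For reference, these settings map onto `Seq2SeqTrainingArguments` roughly as sketched below; this is a reconstruction under the assumptions noted in the comments, not the original training script.

```python
# Reconstruction of the listed hyperparameters; output_dir and eval cadence are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-no",  # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 8 per device x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,   # 8 per device x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",            # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
    eval_strategy="epoch",          # per-epoch evaluation inferred from the results table
    predict_with_generate=True,     # required so BLEU can be computed at evaluation time
)
```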
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 224.2352 | 0 | 3.5508 | 0.0032 |
| No log | 1 | 87 | 212.5210 | 0.0078 | 4.1545 | 0.0032 |
| No log | 2 | 174 | 192.2315 | 0.0156 | 5.8040 | 0.0054 |
| No log | 3 | 261 | 161.1566 | 0.0312 | 7.6665 | 0.0074 |
| No log | 4 | 348 | 126.3028 | 0.0625 | 9.8214 | 0.0060 |
| 7.3924 | 5 | 435 | 71.1394 | 0.125 | 12.4434 | 0.0046 |
| 22.2446 | 6 | 522 | 30.4704 | 0.25 | 17.5750 | 0.0035 |
| 14.0298 | 7 | 609 | 15.4122 | 0.5 | 26.8646 | 0.0209 |
| 13.029 | 8.0 | 696 | 10.2502 | 1.0 | 44.2144 | 0.0331 |
| 15.3985 | 9.0 | 783 | 9.0259 | 1.0 | 43.5366 | 0.0348 |
| 12.89 | 10.0 | 870 | 8.1174 | 1.0 | 42.5628 | 0.0544 |
| 11.3325 | 11.0 | 957 | 7.4832 | 1.0 | 43.3066 | 0.0545 |
| 10.7894 | 12.0 | 1044 | 6.9241 | 1.0 | 43.5899 | 0.0645 |
| 9.8539 | 13.0 | 1131 | 6.4567 | 1.0 | 43.7614 | 0.0814 |
| 9.0522 | 14.0 | 1218 | 5.9655 | 1.0 | 42.7995 | 0.1457 |
| 8.3978 | 15.0 | 1305 | 5.7566 | 1.0 | 43.4940 | 0.0861 |
| 8.1609 | 16.0 | 1392 | 5.4790 | 1.0 | 42.9029 | 0.1043 |
| 7.6713 | 17.0 | 1479 | 5.0701 | 1.0 | 43.7232 | 0.1156 |
| 7.3557 | 18.0 | 1566 | 4.9153 | 1.0 | 42.9773 | 0.1368 |
| 7.0059 | 19.0 | 1653 | 4.8731 | 1.0 | 43.4807 | 0.1251 |
| 6.8734 | 20.0 | 1740 | 5.0616 | 1.0 | 43.8662 | 0.1216 |
| 6.5613 | 21.0 | 1827 | 4.5904 | 1.0 | 43.5890 | 0.1562 |
| 6.249 | 22.0 | 1914 | 4.4879 | 1.0 | 42.7270 | 0.1646 |
| 6.0007 | 23.0 | 2001 | 4.3395 | 1.0 | 42.6230 | 0.1809 |
| 5.8872 | 24.0 | 2088 | 4.5178 | 1.0 | 43.2088 | 0.2103 |
| 5.7128 | 25.0 | 2175 | 4.2705 | 1.0 | 42.8065 | 0.2511 |
| 5.5175 | 26.0 | 2262 | 4.3097 | 1.0 | 43.0957 | 0.2314 |
| 5.3951 | 27.0 | 2349 | 3.9705 | 1.0 | 42.3281 | 0.3685 |
| 5.1632 | 28.0 | 2436 | 3.7944 | 1.0 | 43.0646 | 0.2776 |
| 5.0412 | 29.0 | 2523 | 3.7838 | 1.0 | 43.4584 | 0.3647 |
| 4.8635 | 30.0 | 2610 | 3.6218 | 1.0 | 42.9818 | 0.5388 |
| 4.764 | 31.0 | 2697 | 3.6284 | 1.0 | 43.7082 | 0.5182 |
| 4.6246 | 32.0 | 2784 | 3.5281 | 1.0 | 42.6314 | 0.5128 |
| 4.4867 | 33.0 | 2871 | 3.5645 | 1.0 | 42.4939 | 0.5195 |
| 4.3818 | 34.0 | 2958 | 3.6955 | 1.0 | 43.6410 | 0.3649 |
| 4.376 | 35.0 | 3045 | 3.4153 | 1.0 | 42.5558 | 0.5612 |
| 4.2115 | 36.0 | 3132 | 3.3060 | 1.0 | 43.0912 | 0.6357 |
| 4.1147 | 37.0 | 3219 | 3.2941 | 1.0 | 42.9670 | 0.6773 |
| 4.0376 | 38.0 | 3306 | 3.4444 | 1.0 | 42.8137 | 0.5379 |
| 3.9058 | 39.0 | 3393 | 3.2842 | 1.0 | 43.1814 | 0.6365 |
| 3.8797 | 40.0 | 3480 | 3.3343 | 1.0 | 43.0732 | 0.6551 |
| 3.799 | 41.0 | 3567 | 3.2287 | 1.0 | 43.0785 | 0.6682 |
| 3.7084 | 42.0 | 3654 | 3.0826 | 1.0 | 42.4590 | 0.8260 |
| 3.6224 | 43.0 | 3741 | 3.2035 | 1.0 | 42.6156 | 0.7929 |
| 3.5609 | 44.0 | 3828 | 3.1611 | 1.0 | 43.3427 | 0.7412 |
| 3.4857 | 45.0 | 3915 | 3.0869 | 1.0 | 43.0869 | 0.7860 |
| 3.4254 | 46.0 | 4002 | 3.0354 | 1.0 | 43.9039 | 0.8531 |
| 3.3516 | 47.0 | 4089 | 2.9898 | 1.0 | 42.9068 | 0.8146 |
| 3.3001 | 48.0 | 4176 | 2.9625 | 1.0 | 43.2900 | 0.8684 |
| 3.2534 | 49.0 | 4263 | 2.9992 | 1.0 | 42.2250 | 0.8155 |
| 3.1864 | 50.0 | 4350 | 2.9462 | 1.0 | 43.1853 | 0.8830 |
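
The BLEU column is reported by the evaluation loop; the card does not state which BLEU implementation was used, but a score of this kind can be computed with the `evaluate` library (an assumption, since `evaluate` is not listed under the framework versions) as sketched below.

```python
# Hedged sketch of a BLEU computation; the exact metric implementation behind the
# table above is not documented in this card.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Den gamle mannen lukket boken."]   # hypothetical model outputs
references = [["Den gamle mannen lukket boka."]]   # hypothetical references
print(bleu.compute(predictions=predictions, references=references)["score"])
```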
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1