93beed4cf0a292354d00816feccfa413

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [es-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2710
  • Data Size: 1.0
  • Epoch Runtime: 21.8637
  • Bleu: 0.9816
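
For quick experimentation, here is a minimal inference sketch assuming the checkpoint is published on the Hub as contemmcm/93beed4cf0a292354d00816feccfa413; depending on how training was run, a T5-style task prefix such as "translate Spanish to Portuguese: " may be required:

```python
# Minimal inference sketch; the repo id and prefix handling are assumptions,
# not a confirmed usage recipe from the model authors.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/93beed4cf0a292354d00816feccfa413"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a Spanish sentence into Portuguese.
text = "La casa estaba en silencio."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```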

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
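
A hedged sketch of this configuration expressed as Seq2SeqTrainingArguments follows; the output directory name is hypothetical and the exact training script used for this run is not known:

```python
# Sketch of the hyperparameters listed above; argument names follow
# Transformers 4.x and may not match the original script exactly.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-es-pt",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```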

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 236.3281 | 0 | 1.8417 | 0.0218 |
| No log | 1 | 33 | 220.4164 | 0.0078 | 2.4454 | 0.0215 |
| No log | 2 | 66 | 204.8405 | 0.0156 | 3.2886 | 0.0205 |
| No log | 3 | 99 | 178.9747 | 0.0312 | 5.1289 | 0.0170 |
| 10.8048 | 4 | 132 | 149.1265 | 0.0625 | 7.5365 | 0.0163 |
| 10.8048 | 5 | 165 | 115.3729 | 0.125 | 9.7194 | 0.0064 |
| 10.8048 | 6 | 198 | 73.5350 | 0.25 | 11.7979 | 0.0057 |
| 23.6174 | 7 | 231 | 35.5477 | 0.5 | 14.7557 | 0.0068 |
| 36.4933 | 8.0 | 264 | 17.0832 | 1.0 | 22.0719 | 0.0452 |
| 36.4933 | 9.0 | 297 | 14.2073 | 1.0 | 21.0046 | 0.0598 |
| 26.2706 | 10.0 | 330 | 12.5317 | 1.0 | 21.5360 | 0.0276 |
| 19.6304 | 11.0 | 363 | 11.6492 | 1.0 | 20.4911 | 0.1435 |
| 19.6304 | 12.0 | 396 | 10.6991 | 1.0 | 20.8877 | 0.0648 |
| 16.9168 | 13.0 | 429 | 10.2809 | 1.0 | 20.8863 | 0.0476 |
| 15.2537 | 14.0 | 462 | 9.7464 | 1.0 | 20.6664 | 0.0335 |
| 15.2537 | 15.0 | 495 | 9.8049 | 1.0 | 20.8270 | 0.0558 |
| 13.8518 | 16.0 | 528 | 8.2535 | 1.0 | 20.6819 | 0.1067 |
| 12.748 | 17.0 | 561 | 8.4780 | 1.0 | 21.5175 | 0.0440 |
| 12.748 | 18.0 | 594 | 8.1782 | 1.0 | 20.7945 | 0.0967 |
| 11.8111 | 19.0 | 627 | 7.4566 | 1.0 | 20.7800 | 0.2006 |
| 11.0365 | 20.0 | 660 | 7.3057 | 1.0 | 21.1041 | 0.1273 |
| 11.0365 | 21.0 | 693 | 6.7133 | 1.0 | 21.0027 | 0.3471 |
| 10.5073 | 22.0 | 726 | 6.9228 | 1.0 | 20.9700 | 0.3550 |
| 9.8583 | 23.0 | 759 | 7.0375 | 1.0 | 21.6299 | 0.2705 |
| 9.8583 | 24.0 | 792 | 6.7037 | 1.0 | 21.2713 | 0.4168 |
| 9.3498 | 25.0 | 825 | 6.2121 | 1.0 | 20.5182 | 0.5571 |
| 8.9215 | 26.0 | 858 | 5.9844 | 1.0 | 20.7437 | 0.5474 |
| 8.9215 | 27.0 | 891 | 6.0323 | 1.0 | 20.7299 | 0.6197 |
| 8.5465 | 28.0 | 924 | 5.6314 | 1.0 | 20.8857 | 0.5808 |
| 8.1801 | 29.0 | 957 | 5.5487 | 1.0 | 21.6938 | 0.5568 |
| 8.1801 | 30.0 | 990 | 5.6767 | 1.0 | 20.6651 | 0.4915 |
| 7.8944 | 31.0 | 1023 | 5.3007 | 1.0 | 20.7470 | 0.6073 |
| 7.6164 | 32.0 | 1056 | 5.4566 | 1.0 | 20.6885 | 0.5843 |
| 7.6164 | 33.0 | 1089 | 5.2941 | 1.0 | 21.0663 | 0.4686 |
| 7.3031 | 34.0 | 1122 | 5.2816 | 1.0 | 21.5694 | 0.7061 |
| 7.101 | 35.0 | 1155 | 5.2643 | 1.0 | 20.7937 | 0.6266 |
| 7.101 | 36.0 | 1188 | 5.0665 | 1.0 | 20.9322 | 0.5939 |
| 6.8672 | 37.0 | 1221 | 4.9107 | 1.0 | 21.7358 | 0.6661 |
| 6.6882 | 38.0 | 1254 | 4.9897 | 1.0 | 21.7348 | 0.7313 |
| 6.6882 | 39.0 | 1287 | 5.2259 | 1.0 | 22.1833 | 0.5689 |
| 6.48 | 40.0 | 1320 | 4.8984 | 1.0 | 22.8371 | 0.7928 |
| 6.3136 | 41.0 | 1353 | 4.7463 | 1.0 | 21.5830 | 0.7910 |
| 6.3136 | 42.0 | 1386 | 4.6112 | 1.0 | 21.7582 | 0.7291 |
| 6.1308 | 43.0 | 1419 | 4.8187 | 1.0 | 22.0048 | 0.6860 |
| 5.9509 | 44.0 | 1452 | 4.6719 | 1.0 | 22.0582 | 0.7044 |
| 5.9509 | 45.0 | 1485 | 4.4241 | 1.0 | 21.8078 | 0.9858 |
| 5.8166 | 46.0 | 1518 | 4.5210 | 1.0 | 21.7949 | 0.8620 |
| 5.6598 | 47.0 | 1551 | 4.5541 | 1.0 | 22.3571 | 0.7085 |
| 5.6598 | 48.0 | 1584 | 4.3421 | 1.0 | 21.4648 | 0.7746 |
| 5.4744 | 49.0 | 1617 | 4.2760 | 1.0 | 21.7306 | 0.8698 |
| 5.4035 | 50.0 | 1650 | 4.2710 | 1.0 | 21.8637 | 0.9816 |
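
The Bleu column can be computed with the sacreBLEU implementation in the `evaluate` library; the sketch below is an assumption about the metric code, not a confirmed part of this run, and note that sacreBLEU natively reports scores on a 0-100 scale:

```python
# Hedged sketch of computing a BLEU score like the column above.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["A casa estava em silêncio."]   # illustrative model outputs
references = [["A casa estava em silêncio."]]  # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacreBLEU score on a 0-100 scale
```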

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1