# c45e5d95c04fb6c57a4c8033d39fca63

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [es-fi] dataset. It achieves the following results on the evaluation set:
- Loss: 2.8312
- Data Size: 1.0
- Epoch Runtime: 40.7791
- Bleu: 0.2250
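The card does not include a usage example, so the following is a minimal, hedged sketch of loading this checkpoint for Spanish to Finnish translation with the `transformers` auto classes. The task prefix is an assumption; the card does not state how source sentences were formatted during fine-tuning.

```python
# Minimal usage sketch (not from the original card): Spanish -> Finnish translation.
# Assumptions: the repo id matches this card, and a T5-style task prefix is used;
# if the model was fine-tuned without a prefix, drop it.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/c45e5d95c04fb6c57a4c8033d39fca63"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate Spanish to Finnish: ¿Dónde está la biblioteca?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```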
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
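Although this section is left as "More information needed", the summary above names the Helsinki-NLP/opus_books [es-fi] pair. As a hedged sketch, it can be loaded with the `datasets` library; the 90/10 validation split below is an assumption for illustration, since the card does not describe how evaluation data was held out.

```python
# Hedged sketch: loading the es-fi pair of opus_books with the datasets library.
# The train_test_split call is an assumption, not the split actually used for this card.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "es-fi")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]

print(train_ds[0]["translation"])  # e.g. {"es": "...", "fi": "..."}
```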
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
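As a rough, hedged reconstruction, the hyperparameters above map onto `Seq2SeqTrainingArguments` as sketched below; the output directory and the `predict_with_generate` flag are assumptions, and anything not listed in the card is left at its default.

```python
# Hedged reconstruction of the hyperparameters listed above (transformers 4.57 argument names).
# Per-device batch size 8 on 4 GPUs yields the reported total batch size of 32.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-es-fi",  # hypothetical path, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: the BLEU column in the results table implies generation during eval
)
```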
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 209.7094 | 0 | 3.3267 | 0.0010 |
| No log | 1.0 | 83 | 184.2445 | 0.0078 | 3.9533 | 0.0012 |
| No log | 2.0 | 166 | 161.0579 | 0.0156 | 5.0809 | 0.0013 |
| No log | 3.0 | 249 | 144.9138 | 0.0312 | 6.7955 | 0.0013 |
| 5.3868 | 4.0 | 332 | 111.1862 | 0.0625 | 9.0224 | 0.0011 |
| 5.3868 | 5.0 | 415 | 65.2582 | 0.125 | 12.1194 | 0.0011 |
| 5.3868 | 6.0 | 498 | 25.2881 | 0.25 | 16.3851 | 0.0011 |
| 13.292 | 7.0 | 581 | 13.2800 | 0.5 | 25.3184 | 0.0035 |
| 18.2018 | 8.0 | 664 | 9.5933 | 1.0 | 42.7601 | 0.0043 |
| 15.789 | 9.0 | 747 | 8.5477 | 1.0 | 41.0928 | 0.0045 |
| 13.0828 | 10.0 | 830 | 7.7491 | 1.0 | 41.1897 | 0.0043 |
| 11.4015 | 11.0 | 913 | 7.6218 | 1.0 | 41.2410 | 0.0044 |
| 10.7068 | 12.0 | 996 | 6.7302 | 1.0 | 41.2708 | 0.0077 |
| 9.7104 | 13.0 | 1079 | 6.2087 | 1.0 | 40.5168 | 0.0122 |
| 8.8903 | 14.0 | 1162 | 5.9130 | 1.0 | 41.7129 | 0.0248 |
| 8.6231 | 15.0 | 1245 | 5.1895 | 1.0 | 41.0517 | 0.0652 |
| 7.9684 | 16.0 | 1328 | 5.2467 | 1.0 | 41.2612 | 0.0485 |
| 7.5212 | 17.0 | 1411 | 4.8904 | 1.0 | 41.8618 | 0.0594 |
| 7.3085 | 18.0 | 1494 | 4.9712 | 1.0 | 41.2331 | 0.0317 |
| 6.8657 | 19.0 | 1577 | 4.5537 | 1.0 | 41.8624 | 0.0458 |
| 6.5853 | 20.0 | 1660 | 4.4626 | 1.0 | 41.9728 | 0.0634 |
| 6.3659 | 21.0 | 1743 | 4.3185 | 1.0 | 41.6910 | 0.0857 |
| 6.1174 | 22.0 | 1826 | 4.2137 | 1.0 | 41.7061 | 0.0646 |
| 5.8922 | 23.0 | 1909 | 4.1079 | 1.0 | 41.7720 | 0.0545 |
| 5.747 | 24.0 | 1992 | 3.9319 | 1.0 | 41.2620 | 0.1931 |
| 5.5505 | 25.0 | 2075 | 3.8795 | 1.0 | 41.2630 | 0.1169 |
| 5.3418 | 26.0 | 2158 | 3.8160 | 1.0 | 41.6766 | 0.0979 |
| 5.1912 | 27.0 | 2241 | 3.7565 | 1.0 | 41.6077 | 0.0972 |
| 5.0714 | 28.0 | 2324 | 3.6059 | 1.0 | 41.8966 | 0.1575 |
| 4.8811 | 29.0 | 2407 | 3.5714 | 1.0 | 41.3058 | 0.1184 |
| 4.8396 | 30.0 | 2490 | 3.6037 | 1.0 | 41.7404 | 0.0959 |
| 4.6897 | 31.0 | 2573 | 3.5540 | 1.0 | 40.8913 | 0.0918 |
| 4.5818 | 32.0 | 2656 | 3.4263 | 1.0 | 41.7674 | 0.1365 |
| 4.4231 | 33.0 | 2739 | 3.2331 | 1.0 | 40.9916 | 0.1768 |
| 4.345 | 34.0 | 2822 | 3.3233 | 1.0 | 41.6434 | 0.1605 |
| 4.2561 | 35.0 | 2905 | 3.2043 | 1.0 | 41.8876 | 0.1920 |
| 4.192 | 36.0 | 2988 | 3.2323 | 1.0 | 42.0650 | 0.1436 |
| 4.1131 | 37.0 | 3071 | 3.1558 | 1.0 | 41.6571 | 0.1903 |
| 4.0248 | 38.0 | 3154 | 3.0953 | 1.0 | 41.1832 | 0.2231 |
| 3.9353 | 39.0 | 3237 | 3.0665 | 1.0 | 40.7884 | 0.2473 |
| 3.8715 | 40.0 | 3320 | 3.0905 | 1.0 | 41.5836 | 0.1599 |
| 3.7659 | 41.0 | 3403 | 3.0597 | 1.0 | 41.5660 | 0.1769 |
| 3.7303 | 42.0 | 3486 | 2.9279 | 1.0 | 41.5299 | 0.2641 |
| 3.6671 | 43.0 | 3569 | 2.9700 | 1.0 | 41.6461 | 0.2292 |
| 3.5778 | 44.0 | 3652 | 2.9280 | 1.0 | 41.0377 | 0.3111 |
| 3.5576 | 45.0 | 3735 | 2.9184 | 1.0 | 42.2988 | 0.2039 |
| 3.4826 | 46.0 | 3818 | 2.9182 | 1.0 | 40.8568 | 0.2763 |
| 3.4323 | 47.0 | 3901 | 2.8435 | 1.0 | 40.8695 | 0.2529 |
| 3.3986 | 48.0 | 3984 | 2.8897 | 1.0 | 42.0452 | 0.2206 |
| 3.346 | 49.0 | 4067 | 2.8105 | 1.0 | 41.3435 | 0.2463 |
| 3.2765 | 50.0 | 4150 | 2.8312 | 1.0 | 40.7791 | 0.2250 |
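The Bleu column above is reported without a stated implementation or scale. A common way to score translations is the `evaluate` library's sacrebleu wrapper, sketched below as an illustration only; note that sacrebleu reports scores on a 0 to 100 scale, so it may not match the scale of the values in the table.

```python
# Hedged sketch: scoring generated translations with BLEU via the evaluate library.
# Illustration only; the card does not state which BLEU implementation produced the table above.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Missä kirjasto on?"]          # example model outputs
references = [["Missä kirjasto sijaitsee?"]]  # one list of reference translations per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacrebleu score on a 0-100 scale
```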
### Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1