4d18f0b7b0fc742690ac565165d0438f

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-es] dataset. It achieves the following results on the evaluation set:

Loss: 2.2221
Data Size: 1.0
Epoch Runtime: 301.5996
Bleu: 1.1422

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
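The batch-size entries above are related by simple arithmetic. The sketch below is a plain-Python check, not training code. It assumes no gradient accumulation, since none is listed, and that each epoch iterates over the whole training split. The card does not explain the Data Size column, so treat the dataset estimate as indicative only. Under those assumptions, the 688 steps per epoch in the Step column of the training results imply roughly 22k training pairs, and 688 × 50 reproduces the final step count of 34400.

```python
# Hyperparameters as listed in the card.
train_batch_size = 8      # per device
eval_batch_size = 8       # per device
num_devices = 4
# Assumption: gradient accumulation is not listed in the card, so it is taken as 1.
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# From the "Step" column of the training results: 688 optimizer steps per epoch.
steps_per_epoch = 688
num_epochs = 50
total_steps = steps_per_epoch * num_epochs

# Assumption: every epoch covers the full training split. If the last batch of
# an epoch may be partial, the number of training examples lies in this range.
min_examples = (steps_per_epoch - 1) * total_train_batch_size + 1
max_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size, total_steps)
print(min_examples, max_examples)
```

This prints `32 32 34400` and then `21985 22016`. The first line matches the listed total batch sizes and the last Step value in the table.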

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	235.5920	0	21.6725	0.0151
No log	1	688	158.6363	0.0078	25.2709	0.0065
No log	2	1376	85.2046	0.0156	27.3339	0.0036
No log	3	2064	30.0130	0.0312	33.1372	0.0016
2.6754	4	2752	16.1713	0.0625	41.7777	0.0725
1.863	5	3440	12.0949	0.125	58.9698	0.1161
14.77	6	4128	9.1553	0.25	94.4603	0.1050
10.7326	7	4816	7.1104	0.5	162.4500	0.0777
7.1993	8.0	5504	4.8120	1.0	296.1590	0.0317
5.7684	9.0	6192	4.0821	1.0	296.1616	0.0404
4.8521	10.0	6880	3.5462	1.0	295.2833	0.1019
4.3215	11.0	7568	3.3064	1.0	293.7784	0.0986
3.9813	12.0	8256	3.1607	1.0	294.3076	0.1001
3.7181	13.0	8944	2.9624	1.0	293.7822	0.1871
3.5393	14.0	9632	2.9189	1.0	294.3335	0.2653
3.3765	15.0	10320	2.8352	1.0	297.6367	0.3737
3.2647	16.0	11008	2.7518	1.0	295.5343	0.2972
3.1507	17.0	11696	2.7165	1.0	293.9573	0.3930
3.0727	18.0	12384	2.6494	1.0	294.2747	0.3934
2.9689	19.0	13072	2.6156	1.0	293.8179	0.4885
2.9318	20.0	13760	2.5777	1.0	296.2244	0.4388
2.8625	21.0	14448	2.5504	1.0	292.5996	0.6039
2.8148	22.0	15136	2.5291	1.0	294.3252	0.5524
2.7718	23.0	15824	2.4954	1.0	295.4557	0.6300
2.7298	24.0	16512	2.4774	1.0	296.3096	0.6497
2.6893	25.0	17200	2.4443	1.0	295.0107	0.7417
2.6229	26.0	17888	2.4218	1.0	292.1687	0.6751
2.582	27.0	18576	2.3986	1.0	297.3084	0.6930
2.5685	28.0	19264	2.3927	1.0	300.0445	0.7737
2.5398	29.0	19952	2.3765	1.0	300.2355	0.7913
2.5014	30.0	20640	2.3554	1.0	295.5488	0.8455
2.4913	31.0	21328	2.3539	1.0	293.5029	0.7853
2.4406	32.0	22016	2.3387	1.0	297.9184	0.8931
2.4185	33.0	22704	2.3182	1.0	298.8404	0.9088
2.398	34.0	23392	2.3152	1.0	296.9342	0.7916
2.3568	35.0	24080	2.3110	1.0	298.0802	0.9005
2.3569	36.0	24768	2.2831	1.0	298.2648	0.9909
2.3031	37.0	25456	2.2923	1.0	299.1979	0.9535
2.2997	38.0	26144	2.2674	1.0	297.0862	0.9699
2.258	39.0	26832	2.2698	1.0	300.9418	1.1056
2.2468	40.0	27520	2.2597	1.0	294.5861	1.0727
2.227	41.0	28208	2.2564	1.0	298.8053	1.0456
2.1836	42.0	28896	2.2530	1.0	300.2710	1.1390
2.1922	43.0	29584	2.2477	1.0	302.2907	0.9623
2.1422	44.0	30272	2.2420	1.0	303.2315	1.0065
2.1354	45.0	30960	2.2412	1.0	302.1064	1.1697
2.1167	46.0	31648	2.2408	1.0	300.0712	1.1307
2.0993	47.0	32336	2.2337	1.0	300.4522	1.1571
2.0814	48.0	33024	2.2205	1.0	299.3218	1.0888
2.0697	49.0	33712	2.2232	1.0	302.0744	1.1725
2.0495	50.0	34400	2.2221	1.0	301.5996	1.1422
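The evaluation results at the top of the card are those of the final epoch (50). The short check below copies the last six rows of the table verbatim and shows that neither metric peaks there. Validation loss is lowest at epoch 48, and BLEU is highest at epoch 49, and both remain the extremes over the full table. This is only a sketch for reading the table. The card does not state whether intermediate checkpoints were kept.

```python
# Last rows of the training results table: (epoch, validation_loss, bleu)
rows = [
    (45, 2.2412, 1.1697),
    (46, 2.2408, 1.1307),
    (47, 2.2337, 1.1571),
    (48, 2.2205, 1.0888),
    (49, 2.2232, 1.1725),
    (50, 2.2221, 1.1422),
]

best_by_loss = min(rows, key=lambda r: r[1])   # lowest validation loss
best_by_bleu = max(rows, key=lambda r: r[2])   # highest BLEU
final = rows[-1]                               # the epoch reported in the summary

print("lowest validation loss:", best_by_loss)
print("highest BLEU:", best_by_bleu)
print("final epoch:", final)
```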

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Model size: 0.8B params (Safetensors, tensor type F32)

Model tree for contemmcm/4d18f0b7b0fc742690ac565165d0438f: fine-tuned from the base model google/long-t5-local-large
