c13a759b61384e71ae749ca08ee45860

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-en] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4073
  • Data Size: 1.0 (fraction of the training set used; see the ramp-up schedule in the results table)
  • Epoch Runtime: 555.3545
  • Bleu: 8.7727

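A minimal usage sketch follows. The repo id is assumed from this card's repository path, and the T5-style "translate German to English:" task prefix is an assumption, since the card does not document the exact input format used during fine-tuning.

```python
# Minimal inference sketch. The model_id is assumed from this card's
# repository path; the task prefix is an assumption carried over from
# T5-style translation fine-tuning and may not match the actual format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/c13a759b61384e71ae749ca08ee45860"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate German to English: Der alte Mann blickte auf das Meer."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
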
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
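
Pending that information, here is a minimal sketch of loading the dataset named in the introduction. Helsinki-NLP/opus_books ships only a `train` split, so an evaluation set has to be carved out manually; the 10% split size and seed below are placeholder assumptions, not the card's actual recipe.

```python
# Hedged sketch: opus_books has no built-in validation split, so one is
# created here. The test_size and seed are placeholder assumptions.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "de-en")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(train_ds[0]["translation"])  # {'de': '...', 'en': '...'}
```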

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
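
A sketch of how these settings map onto `Seq2SeqTrainingArguments`. The `output_dir` is a placeholder, and `predict_with_generate` is an assumption made so BLEU can be computed during evaluation; the multi-GPU setup (4 devices) is handled by the launcher, which is how the per-device batch size of 8 yields the total batch size of 32.

```python
# Hedged configuration sketch mirroring the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-de-en",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total batch size 32
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: needed for BLEU during eval
)
```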

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 235.2919        | 0         | 39.2661       | 0.0132 |
| No log        | 1     | 1286  | 120.1965        | 0.0078    | 44.2889       | 0.0043 |
| 2.8146        | 2     | 2572  | 49.4094         | 0.0156    | 49.6286       | 0.0051 |
| 1.6071        | 3     | 3858  | 20.4550         | 0.0312    | 59.4574       | 0.7587 |
| 1.3271        | 4     | 5144  | 14.5204         | 0.0625    | 76.6593       | 0.8207 |
| 18.1546       | 5     | 6430  | 11.0647         | 0.125     | 108.8657      | 0.5067 |
| 13.2691       | 6     | 7716  | 8.8710          | 0.25      | 174.8121      | 0.2393 |
| 9.5107        | 7     | 9002  | 6.4360          | 0.5       | 305.4358      | 0.3510 |
| 6.6882        | 8.0   | 10288 | 5.0426          | 1.0       | 559.3121      | 0.4920 |
| 5.6293        | 9.0   | 11574 | 4.4863          | 1.0       | 562.5860      | 0.7639 |
| 5.0402        | 10.0  | 12860 | 4.2530          | 1.0       | 548.2183      | 0.8391 |
| 4.7066        | 11.0  | 14146 | 4.0890          | 1.0       | 547.5112      | 1.0605 |
| 4.4638        | 12.0  | 15432 | 3.9215          | 1.0       | 550.3030      | 1.2834 |
| 4.2706        | 13.0  | 16718 | 3.7970          | 1.0       | 546.8803      | 1.5993 |
| 4.1206        | 14.0  | 18004 | 3.7101          | 1.0       | 547.8734      | 1.6043 |
| 4.0485        | 15.0  | 19290 | 3.6028          | 1.0       | 550.3787      | 2.0361 |
| 3.8513        | 16.0  | 20576 | 3.4695          | 1.0       | 554.7206      | 2.4633 |
| 3.6965        | 17.0  | 21862 | 3.3163          | 1.0       | 552.1435      | 3.2631 |
| 3.5174        | 18.0  | 23148 | 3.1673          | 1.0       | 552.6019      | 3.7798 |
| 3.4198        | 19.0  | 24434 | 3.0481          | 1.0       | 553.0598      | 4.8593 |
| 3.2156        | 20.0  | 25720 | 2.9519          | 1.0       | 546.2709      | 5.5273 |
| 3.1142        | 21.0  | 27006 | 2.8637          | 1.0       | 548.0975      | 5.5330 |
| 3.0316        | 22.0  | 28292 | 2.7891          | 1.0       | 547.9021      | 5.8670 |
| 2.9276        | 23.0  | 29578 | 2.7405          | 1.0       | 556.6313      | 6.6010 |
| 2.8277        | 24.0  | 30864 | 2.6889          | 1.0       | 562.8570      | 6.5369 |
| 2.7293        | 25.0  | 32150 | 2.6523          | 1.0       | 554.6323      | 6.9262 |
| 2.7133        | 26.0  | 33436 | 2.6092          | 1.0       | 555.7105      | 7.2215 |
| 2.6023        | 27.0  | 34722 | 2.5574          | 1.0       | 555.7316      | 7.3526 |
| 2.5282        | 28.0  | 36008 | 2.5258          | 1.0       | 557.3460      | 7.5779 |
| 2.4563        | 29.0  | 37294 | 2.5076          | 1.0       | 562.6717      | 7.6128 |
| 2.3971        | 30.0  | 38580 | 2.4874          | 1.0       | 550.9001      | 7.7558 |
| 2.2987        | 31.0  | 39866 | 2.4654          | 1.0       | 552.4083      | 8.1211 |
| 2.2517        | 32.0  | 41152 | 2.4466          | 1.0       | 560.5730      | 8.0958 |
| 2.2175        | 33.0  | 42438 | 2.4352          | 1.0       | 555.0719      | 8.5334 |
| 2.2171        | 34.0  | 43724 | 2.4245          | 1.0       | 555.6583      | 8.1145 |
| 2.092         | 35.0  | 45010 | 2.4045          | 1.0       | 557.6587      | 7.7817 |
| 2.1219        | 36.0  | 46296 | 2.3932          | 1.0       | 556.0943      | 8.4077 |
| 2.0355        | 37.0  | 47582 | 2.3923          | 1.0       | 566.1635      | 8.4984 |
| 2.0013        | 38.0  | 48868 | 2.3870          | 1.0       | 560.9526      | 8.2356 |
| 1.9366        | 39.0  | 50154 | 2.3860          | 1.0       | 556.5659      | 8.4268 |
| 1.8571        | 40.0  | 51440 | 2.3951          | 1.0       | 556.1993      | 8.4897 |
| 1.8342        | 41.0  | 52726 | 2.3917          | 1.0       | 557.9698      | 8.8207 |
| 1.7889        | 42.0  | 54012 | 2.3831          | 1.0       | 554.2279      | 8.6197 |
| 1.7624        | 43.0  | 55298 | 2.3902          | 1.0       | 554.2447      | 8.5846 |
| 1.7079        | 44.0  | 56584 | 2.3930          | 1.0       | 556.5337      | 8.7441 |
| 1.6758        | 45.0  | 57870 | 2.4084          | 1.0       | 557.9219      | 8.4475 |
| 1.6574        | 46.0  | 59156 | 2.4073          | 1.0       | 555.3545      | 8.7727 |
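
The Bleu column can be reproduced in spirit with the `evaluate` library's sacrebleu metric, which scores on a 0-100 scale consistent with the final ~8.77 above; whether the original run used sacrebleu or another BLEU implementation is an assumption.

```python
# Hedged sketch of computing BLEU with evaluate's sacrebleu metric.
# The example sentences are illustrative, not from the evaluation set.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The old man looked out at the sea."]
references = [["The old man looked out to sea."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```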

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1