cc6530f988294864455773617553a4e2

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fi-fr] dataset. It achieves the following results on the evaluation set:

Loss: 3.9504
Data Size: 1.0
Epoch Runtime: 43.6482
Bleu: 0.6960
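
A minimal sketch of how the checkpoint might be loaded for inference with the Transformers seq2seq classes. The card does not document usage, so several things here are assumptions: the repository ID is taken from the model page, the translation direction is assumed to be Finnish to French, no task prefix is prepended to the input, and `translate` and `max_new_tokens=128` are illustrative choices rather than the author's settings.

```python
# Repository ID as shown on the model page (assumed to be the loadable ID).
MODEL_ID = "contemmcm/cc6530f988294864455773617553a4e2"


def translate(text: str, max_new_tokens: int = 128) -> str:
    """Translate one sentence; the Finnish-to-French direction is an assumption."""
    # Imported lazily so this file can be read or imported without
    # transformers installed; the first call downloads the weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: the fine-tuning run used raw source text with no task prefix.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `translate("Hyvää huomenta.")` would download the checkpoint on first use. Given the BLEU score reported above, the output quality should be checked before relying on it.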

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
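
The totals in this list follow from the per-device values, and the step counts in the results table follow from the epoch count. The sketch below works through that arithmetic and collects the settings in a dictionary. The key names are an assumed mapping onto `Seq2SeqTrainingArguments` keyword arguments; the card does not show the actual training script, and the final step count of 4400 is read from the results table.

```python
# Values copied from the hyperparameter list.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
num_epochs = 50

# Effective batch sizes across all GPUs.
total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Final optimizer step as reported in the training results table.
final_step = 4400
steps_per_epoch = final_step // num_epochs

# Upper bound on examples seen per epoch (the last batch may be partial).
max_examples_per_epoch = steps_per_epoch * total_train_batch_size

# Assumed mapping of the listed settings onto Seq2SeqTrainingArguments names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": per_device_train_batch_size,
    "per_device_eval_batch_size": per_device_eval_batch_size,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": num_epochs,
}

print(total_train_batch_size, total_eval_batch_size,
      steps_per_epoch, max_examples_per_epoch)
# prints: 32 32 88 2816
```

The multi-GPU setup (`distributed_type`, `num_devices`) is normally supplied by the launcher rather than by these arguments, so it is left out of the dictionary.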

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	210.3118	0	3.5214	0.0339
No log	1	88	194.9818	0.0078	4.3850	0.0391
No log	2	176	169.7989	0.0156	5.5429	0.0418
No log	3	264	138.3005	0.0312	7.4443	0.0376
No log	4	352	98.2089	0.0625	10.0366	0.0106
No log	5	440	57.4321	0.125	13.4255	0.0171
8.5688	6	528	29.0110	0.25	17.6872	0.0283
16.0629	7	616	18.4866	0.5	26.4004	0.2147
24.2236	8.0	704	14.5337	1.0	44.8358	0.3396
21.6713	9.0	792	12.8300	1.0	44.1918	0.1911
18.5801	10.0	880	11.2006	1.0	42.8811	0.0965
16.5609	11.0	968	10.4868	1.0	43.8189	0.1437
15.0038	12.0	1056	9.2199	1.0	43.4298	0.1613
14.2505	13.0	1144	9.2859	1.0	42.9938	0.2398
13.3176	14.0	1232	8.5574	1.0	44.2948	0.2922
12.514	15.0	1320	7.9899	1.0	43.3043	0.2558
11.7392	16.0	1408	7.7368	1.0	43.6196	0.2846
11.3779	17.0	1496	7.9881	1.0	44.4180	0.2235
10.8107	18.0	1584	7.3984	1.0	42.9446	0.3104
10.3264	19.0	1672	6.7918	1.0	43.6092	0.3328
9.7584	20.0	1760	6.9432	1.0	42.9454	0.2219
9.5394	21.0	1848	6.6225	1.0	43.8095	0.2157
9.1196	22.0	1936	5.9313	1.0	42.8700	0.2882
8.6616	23.0	2024	5.8392	1.0	44.2303	0.3488
8.3646	24.0	2112	5.9353	1.0	44.0809	0.3769
8.0709	25.0	2200	5.5733	1.0	43.3977	0.2680
7.8487	26.0	2288	5.7821	1.0	43.9842	0.5559
7.5668	27.0	2376	5.6761	1.0	43.7011	0.2355
7.2396	28.0	2464	5.2907	1.0	43.4624	0.3873
7.042	29.0	2552	5.2608	1.0	43.5775	0.3210
6.8641	30.0	2640	5.0165	1.0	43.7052	0.4909
6.5967	31.0	2728	5.0911	1.0	43.7801	0.3103
6.423	32.0	2816	4.7339	1.0	42.6348	0.3563
6.2719	33.0	2904	4.7812	1.0	44.5199	0.4468
6.1618	34.0	2992	4.8750	1.0	43.5037	0.4621
5.9636	35.0	3080	4.7315	1.0	43.6152	0.3522
5.8011	36.0	3168	4.5619	1.0	43.7941	0.5355
5.7031	37.0	3256	4.4944	1.0	43.8364	0.4846
5.5746	38.0	3344	4.4306	1.0	42.8775	0.5597
5.4236	39.0	3432	4.4541	1.0	44.3190	0.5422
5.2757	40.0	3520	4.4262	1.0	44.0665	0.6967
5.1584	41.0	3608	4.2310	1.0	44.1722	0.6211
5.0859	42.0	3696	4.2331	1.0	43.3939	0.9223
4.9627	43.0	3784	4.1307	1.0	42.9168	0.6862
4.881	44.0	3872	4.1794	1.0	43.3896	0.6229
4.7832	45.0	3960	4.0830	1.0	43.9060	0.6629
4.7222	46.0	4048	4.0359	1.0	43.2298	0.6652
4.6348	47.0	4136	4.0967	1.0	44.0093	0.6725
4.5194	48.0	4224	4.0618	1.0	43.0736	0.9749
4.453	49.0	4312	3.9898	1.0	43.0868	0.7547
4.3851	50.0	4400	3.9504	1.0	43.6482	0.6960
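
The reported evaluation numbers come from the final row (epoch 50), which has the lowest validation loss, while the highest BLEU in the table (0.9749) occurs at epoch 48. A small sketch of how the tab-separated table could be parsed to compare the two selection criteria is shown below. Only the last five rows are embedded to keep it short, and `load_rows` is a hypothetical helper, not part of the training code.

```python
import csv
import io

# Header plus the last five rows of the results table (tab-separated).
TABLE = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tData Size\tEpoch Runtime\tBleu\n"
    "4.7222\t46.0\t4048\t4.0359\t1.0\t43.2298\t0.6652\n"
    "4.6348\t47.0\t4136\t4.0967\t1.0\t44.0093\t0.6725\n"
    "4.5194\t48.0\t4224\t4.0618\t1.0\t43.0736\t0.9749\n"
    "4.453\t49.0\t4312\t3.9898\t1.0\t43.0868\t0.7547\n"
    "4.3851\t50.0\t4400\t3.9504\t1.0\t43.6482\t0.6960\n"
)


def load_rows(text):
    """Parse the table into dicts, keeping only the columns used below."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "epoch": float(record["Epoch"]),
            "step": int(record["Step"]),
            "val_loss": float(record["Validation Loss"]),
            "bleu": float(record["Bleu"]),
        })
    return rows


rows = load_rows(TABLE)
best_bleu = max(rows, key=lambda r: r["bleu"])
lowest_loss = min(rows, key=lambda r: r["val_loss"])

print("best BLEU:", best_bleu["epoch"], best_bleu["bleu"])
print("lowest validation loss:", lowest_loss["epoch"], lowest_loss["val_loss"])
# prints:
# best BLEU: 48.0 0.9749
# lowest validation loss: 50.0 3.9504
```

The same two rows are selected when the full table is used, since no earlier epoch has a higher BLEU or a lower validation loss.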

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1
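
To check a local environment against these versions, something like the following sketch could be used. The distribution names (in particular `torch` for Pytorch) are assumptions about how the packages were installed, and `version_report` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed on the card, keyed by the assumed pip distribution name.
EXPECTED = {
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def version_report(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in version_report().items():
    status = "ok" if ok else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are only what the training run used.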

Safetensors

Model size: 0.8B params
Tensor type: F32

Model tree for contemmcm/cc6530f988294864455773617553a4e2

Base model: google/long-t5-local-large
