40efc318792ebd56a10f1612e2fce919

This model is a fine-tuned version of google/gemma-2b on the mnli configuration of the nyu-mll/glue dataset. Per-epoch results on the evaluation set are listed in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
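The hyperparameter names above follow the format printed by the Hugging Face Trainer. As a minimal sketch, assuming training was run with `transformers.Trainer` (the card does not state this explicitly), the values map onto `TrainingArguments` keyword arguments as shown below. Gradient accumulation is assumed to be 1, because 8 examples per device across 4 devices already gives the reported total batch size of 32. The helper names `trainer_kwargs` and `total_batch_size` are hypothetical.

```python
def trainer_kwargs() -> dict:
    """Keyword arguments for transformers.TrainingArguments (assumed mapping)."""
    return {
        "learning_rate": 5e-05,
        "per_device_train_batch_size": 8,  # train_batch_size is per device
        "per_device_eval_batch_size": 8,
        "seed": 42,
        "optim": "adamw_torch",
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-08,
        "lr_scheduler_type": "constant",  # no warmup or decay listed
        "num_train_epochs": 50,
    }


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    kwargs = trainer_kwargs()
    # 8 per device x 4 GPUs x 1 accumulation step (assumed) = 32
    print(total_batch_size(kwargs["per_device_train_batch_size"], num_devices=4))
```

With `transformers` installed, these would be passed as `TrainingArguments(output_dir="out", **trainer_kwargs())` (the `output_dir` value is a placeholder), with the four processes started by a launcher such as `torchrun --nproc_per_node 4`.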

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	RougeL	RougeLsum
No log	0	0	8.3050	0	14.1671	0.3243	0.1890	0.3246	0.3246	0.3244
4.2895	1	12271	3.0390	0.0078	30.4238	0.6819	0.6844	0.6818	0.6819	0.6819
2.5474	2	24542	2.6322	0.0156	51.3774	0.7443	0.7429	0.7440	0.7445	0.7447
2.5025	3	36813	2.2221	0.0312	90.2574	0.7816	0.7810	0.7816	0.7816	0.7815
2.5107	4	49084	2.4528	0.0625	160.5099	0.7479	0.7441	0.7473	0.7478	0.7478
2.42	5	61355	2.6393	0.125	294.8323	0.7353	0.7324	0.7353	0.7355	0.7354
2.5207	6	73626	2.5955	0.25	570.8636	0.7446	0.7434	0.7445	0.7446	0.7446
2.2561	7	85897	2.4339	0.5	1125.5597	0.7509	0.7503	0.7508	0.7509	0.7510
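The card does not say how the Accuracy and F1 Macro columns were computed. The sketch below assumes the usual definitions: accuracy is the fraction of exact label matches, and macro F1 is the unweighted mean of per-class F1 over the three MNLI labels (entailment, neutral, contradiction). A class with no predicted or no true examples gets a precision or recall of 0. The function names are illustrative, not taken from the training code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["entailment", "neutral", "contradiction"]
    gold = ["entailment", "neutral", "contradiction", "neutral"]
    pred = ["entailment", "contradiction", "contradiction", "neutral"]
    print(accuracy(gold, pred), macro_f1(gold, pred, labels))
```

On this toy input, accuracy is 0.75 and macro F1 is 7/9 (about 0.778): per-class F1 is 1.0 for entailment and 2/3 each for neutral and contradiction. Because macro F1 weights every class equally, it can sit slightly above or below accuracy, as it does across the epochs in the table.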

Format: Safetensors
Model size: 0.6B params
Tensor type: F32
Base model: google/gemma-2b