0d6c6b1dab82360d7d4f1596be35f632

This model is a fine-tuned version of albert/albert-large-v2 on the nyu-mll/glue dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
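The hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`. This is a minimal sketch, not the original training script: the dictionary name, the helper `effective_batch_size`, and the gradient accumulation value of 1 are assumptions (the card does not list gradient accumulation), while the values themselves are taken from the list above. It also shows how the total batch size of 32 follows from 8 samples per device on 4 devices.

```python
# Sketch only: values come from the hyperparameter list above; the names
# `training_kwargs`, `effective_batch_size` and NUM_DEVICES are hypothetical.

def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


# Keys follow the parameter names of transformers.TrainingArguments.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # num_devices from the card (multi-GPU)

# Assumes gradient accumulation of 1, which the card does not state.
total_train_batch_size = effective_batch_size(
    training_kwargs["per_device_train_batch_size"], NUM_DEVICES
)
total_eval_batch_size = effective_batch_size(
    training_kwargs["per_device_eval_batch_size"], NUM_DEVICES
)

# With transformers installed, the dictionary could be passed on like this
# ("out" is a placeholder output directory):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)
```

Both computed totals come out to 32, matching total_train_batch_size and total_eval_batch_size in the list.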

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.6981	0	1.1350	0.5069	0.3840	0.5069	0.5069	0.5069
No log	1	2104	0.6951	0.0078	3.2977	0.4931	0.4451	0.4931	0.4931	0.4931
No log	2	4208	0.6977	0.0156	3.2906	0.5093	0.3374	0.5093	0.5093	0.5081
0.0152	3	6312	0.7175	0.0312	5.3165	0.5093	0.3374	0.5093	0.5093	0.5081
0.7024	4	8416	0.6975	0.0625	9.5343	0.5093	0.3374	0.5093	0.5093	0.5081
0.703	5	10520	0.7688	0.125	17.8642	0.5093	0.3374	0.5093	0.5093	0.5081
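One reading consistent with the rows for epochs 2 to 5 is that the model predicts a single class for every evaluation example. The sketch below checks the arithmetic under two assumptions that the card does not state: the task is binary, and the never-predicted class contributes an F1 of 0. The function name is hypothetical.

```python
# Sketch only: assumes a binary task and a classifier that always predicts
# the class making up `majority_share` of the evaluation set.

def macro_f1_single_class(majority_share: float) -> float:
    """Macro F1 when every prediction is the majority class (binary task)."""
    precision = majority_share  # fraction of predictions that are correct
    recall = 1.0                # every majority-class example is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    f1_minority = 0.0           # the other class is never predicted
    return (f1_majority + f1_minority) / 2


accuracy = 0.5093  # Accuracy reported for epochs 2 to 5
macro_f1 = round(macro_f1_single_class(accuracy), 4)
print(macro_f1)
```

Under these assumptions the result rounds to 0.3374, the F1 Macro value reported alongside an accuracy of 0.5093. This is a consistency check on the table, not a confirmed diagnosis of the run.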

Safetensors

Model size: 17.7M params

Tensor type: F32
