train_cola_101112_1760638047

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

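Loss: 1.1558 (matches the lowest validation loss in the training results, reached at epoch 5)
Num Input Tokens Seen: 7325256 (cumulative total at the end of epoch 20)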

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
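
The cosine scheduler and warmup ratio listed above can be sketched in plain Python to show how the learning rate would evolve over the run. This is a minimal sketch, not the training code: it assumes the usual shape of linear warmup followed by a half-cosine decay to zero, takes the total step count (38480) from the last row of the training results, and assumes the warmup length is the warmup ratio times the total steps, rounded. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative.

```python
import math

BASE_LR = 5e-5        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 38480   # final step in the training results table

# Assumption: warmup length is the ratio times the total steps, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the assumed learning rate at a few epoch boundaries.
for step in (0, 1924, 3848, 19240, 38480):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the warmup spans 3848 steps, which coincides with the end of epoch 2 in the training results, and the learning rate decays from that point until the final step.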

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.8001	1.0	1924	1.3159	366136
1.1352	2.0	3848	1.1830	732880
1.0681	3.0	5772	1.1599	1099816
1.2597	4.0	7696	1.1679	1465464
0.8928	5.0	9620	1.1558	1831728
1.1918	6.0	11544	1.1778	2198176
1.2012	7.0	13468	1.1566	2564208
1.2548	8.0	15392	1.1667	2930240
1.0706	9.0	17316	1.1753	3297136
0.8893	10.0	19240	1.1691	3663392
1.0184	11.0	21164	1.1679	4028760
1.2287	12.0	23088	1.1591	4394320
1.2017	13.0	25012	1.1729	4761000
0.9829	14.0	26936	1.1689	5127440
1.05	15.0	28860	1.1611	5494368
1.1359	16.0	30784	1.1626	5860888
1.5683	17.0	32708	1.1663	6226952
0.8134	18.0	34632	1.1715	6593400
1.0798	19.0	36556	1.1715	6959600
1.2922	20.0	38480	1.1715	7325256
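
The table can be checked with a few lines of Python. The sketch below copies the epoch, step, and validation loss columns from the rows above, finds the epoch with the lowest validation loss, and derives the number of steps per epoch. The sequences-per-epoch figure is an assumption: it holds only if training ran on a single device without gradient accumulation, which the card does not state.

```python
# (epoch, step, validation_loss) copied from the training results table.
ROWS = [
    (1, 1924, 1.3159), (2, 3848, 1.1830), (3, 5772, 1.1599), (4, 7696, 1.1679),
    (5, 9620, 1.1558), (6, 11544, 1.1778), (7, 13468, 1.1566), (8, 15392, 1.1667),
    (9, 17316, 1.1753), (10, 19240, 1.1691), (11, 21164, 1.1679), (12, 23088, 1.1591),
    (13, 25012, 1.1729), (14, 26936, 1.1689), (15, 28860, 1.1611), (16, 30784, 1.1626),
    (17, 32708, 1.1663), (18, 34632, 1.1715), (19, 36556, 1.1715), (20, 38480, 1.1715),
]
TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list

# Row with the lowest validation loss.
best_epoch, best_step, best_val_loss = min(ROWS, key=lambda row: row[2])

# Steps per epoch, read from the first row (epoch 1 ends at this step).
steps_per_epoch = ROWS[0][1]

# Assumption: one device and no gradient accumulation, so each step sees
# exactly TRAIN_BATCH_SIZE sequences.
sequences_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

final_val_loss = ROWS[-1][2]

print("best epoch:", best_epoch, "step:", best_step, "val loss:", best_val_loss)
print("final val loss:", final_val_loss)
print("steps per epoch:", steps_per_epoch)
print("sequences per epoch (assumed):", sequences_per_epoch)
```

The lowest validation loss occurs at epoch 5, and from epoch 3 onward the validation loss stays within a narrow band (1.1558 to 1.1778), so the later epochs do not improve on the early checkpoints by this metric.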

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
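
Because the model is published as a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, using it means loading the base model first and then attaching the adapter. The following is a minimal sketch under stated assumptions: the adapter repository id is taken from this page, the base model is assumed to load as a causal language model, and the helper name `load_adapter_model` is hypothetical. Access to the base model weights may need to be granted separately, and the card does not document a prompt format, so none is shown.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_101112_1760638047"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imported here so the module can be inspected without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the base checkpoint is loaded with a causal LM head.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Attach the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both repositories, so it needs network access and enough memory for an 8B-parameter model.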

Model tree for rbelanec/train_cola_101112_1760638047

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
