# train_gsm8k_789_1760637938

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

- Loss: 0.4816
- Num Input Tokens Seen: 34722248
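
For inference, the adapter can be loaded on top of the base model with `peft`. Below is a minimal sketch, assuming this repository holds a standard PEFT adapter checkpoint; the example question and generation settings are illustrative, not part of the training setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_789_1760637938"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then "
        "she sold half as many clips in May. How many clips did she sell altogether?",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the adapter is LoRA-based, `model.merge_and_unload()` folds it into the base weights for standalone deployment.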
 
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
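
As referenced above, here is a hedged sketch of how these values map onto `transformers.TrainingArguments`; the `output_dir` and anything outside the listed hyperparameters are assumptions, since the actual training script is not published here:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters reported in this card.
args = TrainingArguments(
    output_dir="train_gsm8k_789_1760637938",  # placeholder (assumption)
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```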
 
### Training results

Validation loss reaches its minimum of 0.4740 at epoch 11 and drifts slightly upward afterwards, suggesting mild overfitting in the later epochs.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen | 
|---|---|---|---|---|
| 0.5477 | 1.0 | 1682 | 0.5399 | 1739480 | 
| 0.4443 | 2.0 | 3364 | 0.5181 | 3478568 | 
| 0.45 | 3.0 | 5046 | 0.4996 | 5217760 | 
| 0.4048 | 4.0 | 6728 | 0.4938 | 6949888 | 
| 0.4764 | 5.0 | 8410 | 0.4856 | 8687904 | 
| 0.5323 | 6.0 | 10092 | 0.4819 | 10421288 | 
| 0.4192 | 7.0 | 11774 | 0.4789 | 12155264 | 
| 0.5273 | 8.0 | 13456 | 0.4779 | 13889536 | 
| 0.5192 | 9.0 | 15138 | 0.4751 | 15631248 | 
| 0.4844 | 10.0 | 16820 | 0.4764 | 17370104 | 
| 0.489 | 11.0 | 18502 | 0.4740 | 19100344 | 
| 0.4828 | 12.0 | 20184 | 0.4757 | 20834120 | 
| 0.3606 | 13.0 | 21866 | 0.4750 | 22566752 | 
| 0.3906 | 14.0 | 23548 | 0.4766 | 24305592 | 
| 0.3912 | 15.0 | 25230 | 0.4767 | 26037952 | 
| 0.3813 | 16.0 | 26912 | 0.4781 | 27770056 | 
| 0.3704 | 17.0 | 28594 | 0.4790 | 29506864 | 
| 0.3476 | 18.0 | 30276 | 0.4812 | 31245432 | 
| 0.3341 | 19.0 | 31958 | 0.4815 | 32980080 | 
| 0.4645 | 20.0 | 33640 | 0.4816 | 34722248 | 
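
Validation loss measures token-level prediction, not problem-solving accuracy. If task-level numbers are needed, a common approach for GSM8K is exact match on the final numeric answer, using the dataset's `#### <answer>` convention. A minimal sketch; the helper names are hypothetical and not part of this training run:

```python
import re

def extract_answer(text):
    # GSM8K references end with "#### <number>"; a model fine-tuned on the
    # dataset typically imitates that format. Fall back to the last number.
    m = re.search(r"####\s*(-?[\d,\.]+)", text)
    if m:
        raw = m.group(1)
    else:
        nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
        if not nums:
            return None
        raw = nums[-1]
    return raw.replace(",", "").rstrip(".")

def exact_match(prediction, reference):
    # Score a generation against a reference by final-answer equality.
    pred, ref = extract_answer(prediction), extract_answer(reference)
    return pred is not None and pred == ref

reference = "Natalia sold 48/2 = 24 clips in May. 48 + 24 = 72. #### 72"
generation = "...so Natalia sold 72 clips altogether. #### 72"
print(exact_match(generation, reference))  # True
```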
### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
 