train_svamp_101112_1760638000

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

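Loss: 0.1424 (matches the lowest validation loss in the training results, reached at epoch 3)
Num Input Tokens Seen: 1430592 (cumulative total at the end of epoch 20)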

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
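
For reference, the learning-rate curve implied by these settings can be sketched in plain Python. This is a minimal sketch, not the training code itself: it assumes linear warmup over the first 10% of the run followed by a half-cosine decay to zero (the usual shape of a cosine schedule with warmup), and it takes the total of 3160 optimizer steps from the final row of the training results. The helper name `lr_at` is hypothetical.

```python
import math

# Values taken from the hyperparameter list and the training results.
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
TOTAL_STEPS = 3160  # final step reported in the training results

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0 at the final step.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 158, 316, 1738, 3160):
        print(f"step {step:>4}: lr = {lr_at(step):.6f}")
```

Under these assumptions the warmup ends at step 316 (the end of epoch 2), and the rate has fallen back to half its peak by step 1738 (the end of epoch 11).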

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.168	1.0	158	0.2062	71552
0.1665	2.0	316	0.1748	142960
0.0347	3.0	474	0.1424	214432
0.0922	4.0	632	0.1507	286000
0.0243	5.0	790	0.1568	357888
0.0981	6.0	948	0.1513	429456
0.0112	7.0	1106	0.1503	501136
0.0097	8.0	1264	0.1528	573104
0.0126	9.0	1422	0.1989	644752
0.0109	10.0	1580	0.2334	716192
0.0012	11.0	1738	0.2143	787200
0.0027	12.0	1896	0.2454	858736
0.0003	13.0	2054	0.2466	930160
0.0002	14.0	2212	0.2516	1001792
0.001	15.0	2370	0.2517	1073248
0.0001	16.0	2528	0.2521	1144672
0.0003	17.0	2686	0.2519	1216160
0.0002	18.0	2844	0.2529	1287728
0.0004	19.0	3002	0.2522	1359120
0.0002	20.0	3160	0.2546	1430592
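
The rows above show validation loss bottoming out early and then climbing while training loss keeps shrinking. A small sketch that locates the best epoch from the validation-loss column (values copied from the rows above; the variable names are illustrative):

```python
# Validation loss per epoch, copied from the training results.
val_loss = [
    0.2062, 0.1748, 0.1424, 0.1507, 0.1568,
    0.1513, 0.1503, 0.1528, 0.1989, 0.2334,
    0.2143, 0.2454, 0.2466, 0.2516, 0.2517,
    0.2521, 0.2519, 0.2529, 0.2522, 0.2546,
]

# Epochs are 1-indexed in the results, so offset the list index by one.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]

# How much worse the final epoch is than the best one.
final_gap = val_loss[-1] - best_loss

print(f"best epoch: {best_epoch}, validation loss: {best_loss}")
print(f"final-epoch gap: {final_gap:.4f}")
```

The minimum falls at epoch 3 with a validation loss of 0.1424. By epoch 20 the validation loss has risen to 0.2546 while the training loss is near zero, which suggests the later epochs overfit.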

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
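
Since the framework list includes PEFT, the checkpoint is presumably a PEFT adapter that is loaded on top of the base model. A hedged sketch of how that might look, assuming the adapter is hosted under the repository name rbelanec/train_svamp_101112_1760638000, that you have access to the gated base model, and that `torch`, `transformers`, and `peft` are installed. The helper name `load_model` and the dtype choice are illustrative and not taken from this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_svamp_101112_1760638000"  # assumed adapter repo


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imported lazily so this file can be imported without the heavy deps.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 to reduce memory use
    )
    # PeftModel.from_pretrained wraps the base model with the adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` downloads the full base weights, so it is left as a definition here rather than executed.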

Model tree for rbelanec/train_svamp_101112_1760638000

Base model

meta-llama/Meta-Llama-3-8B-Instruct
