train_gsm8k_789_1760637938

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 6.6031
  • Num Input Tokens Seen: 34722248
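
Since the framework versions below list PEFT, this checkpoint is presumably a PEFT adapter (e.g. LoRA) on top of the base model rather than a full set of weights. A minimal inference sketch under that assumption, using this card's repository id as the adapter path (access to the gated base model is required):

```python
# Minimal inference sketch (assumes this repo hosts a PEFT adapter for the
# base model named in this card; adjust ids/paths if they differ).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_789_1760637938"  # this card's repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# A GSM8K-style word problem as a chat prompt.
messages = [{"role": "user", "content": (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```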

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the reproduction sketch after this list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
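
A minimal sketch of how these settings map onto a transformers/PEFT training run. Only the TrainingArguments mirror the card; the LoRA configuration and the GSM8K preprocessing shown here are assumptions, since the card does not state the adapter hyperparameters or the prompt format:

```python
# Hedged reproduction sketch: LoraConfig values and dataset formatting are
# assumptions; the TrainingArguments come from the hyperparameter list above.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

model = AutoModelForCausalLM.from_pretrained(base_id)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))  # rank etc. unknown

def tokenize(example):
    # Simplified formatting: question and reference solution concatenated.
    text = example["question"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

dataset = load_dataset("gsm8k", "main")
train = dataset["train"].map(tokenize,
                             remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="train_gsm8k_789",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
Trainer(model=model, args=args, train_dataset=train,
        data_collator=collator).train()
```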

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.5477        | 1.0   | 1682  | 0.5399          | 1739480           |
| 0.4443        | 2.0   | 3364  | 0.5181          | 3478568           |
| 0.45          | 3.0   | 5046  | 0.4996          | 5217760           |
| 0.4048        | 4.0   | 6728  | 0.4938          | 6949888           |
| 0.4764        | 5.0   | 8410  | 0.4856          | 8687904           |
| 0.5323        | 6.0   | 10092 | 0.4819          | 10421288          |
| 0.4192        | 7.0   | 11774 | 0.4789          | 12155264          |
| 0.5273        | 8.0   | 13456 | 0.4779          | 13889536          |
| 0.5192        | 9.0   | 15138 | 0.4751          | 15631248          |
| 0.4844        | 10.0  | 16820 | 0.4764          | 17370104          |
| 0.489         | 11.0  | 18502 | 0.4740          | 19100344          |
| 0.4828        | 12.0  | 20184 | 0.4757          | 20834120          |
| 0.3606        | 13.0  | 21866 | 0.4750          | 22566752          |
| 0.3906        | 14.0  | 23548 | 0.4766          | 24305592          |
| 0.3912        | 15.0  | 25230 | 0.4767          | 26037952          |
| 0.3813        | 16.0  | 26912 | 0.4781          | 27770056          |
| 0.3704        | 17.0  | 28594 | 0.4790          | 29506864          |
| 0.3476        | 18.0  | 30276 | 0.4812          | 31245432          |
| 0.3341        | 19.0  | 31958 | 0.4815          | 32980080          |
| 0.4645        | 20.0  | 33640 | 0.4816          | 34722248          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
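
For reproducibility, a small check that the installed versions match this card (the pinned values come from the list above; matching them exactly is optional in practice):

```python
# Hedged environment check: compares installed versions against this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "PEFT": (peft.__version__, "0.17.1"),
    "Transformers": (transformers.__version__, "4.51.3"),
    "PyTorch": (torch.__version__, "2.9.0+cu128"),
    "Datasets": (datasets.__version__, "4.0.0"),
    "Tokenizers": (tokenizers.__version__, "0.21.4"),
}
for name, (installed, card) in expected.items():
    status = "OK" if installed == card else "differs"
    print(f"{name}: installed {installed}, card {card} ({status})")
```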