train_winogrande_101112_1760638068

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset; a loading sketch follows the results below. It achieves the following results on the evaluation set:

  • Loss: 0.2313
  • Num Input Tokens Seen: 38366624
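
Since the framework versions below list PEFT and the repository is published as an adapter for the base model, the snippet below is a minimal loading sketch assuming standard transformers + peft usage. The repository id rbelanec/train_winogrande_101112_1760638068 is taken from the model page; everything else is illustrative rather than the author's confirmed setup.

```python
# Minimal loading sketch (assumed usage, not confirmed by the card):
# attach the PEFT adapter to the Meta-Llama-3-8B-Instruct base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "rbelanec/train_winogrande_101112_1760638068")
model.eval()
```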

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for one way to express them as TrainingArguments):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
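
For reference, here is a minimal sketch of how these settings map onto transformers TrainingArguments; the output_dir is an assumed placeholder, and any setting not listed above is left at its default rather than guessed.

```python
# Hedged sketch: mirrors the hyperparameters listed above.
# output_dir is an assumption; it does not appear in the original card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_winogrande_101112_1760638068",  # assumed placeholder
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```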

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2313 | 1.0 | 9090 | 0.2314 | 1917952 |
| 0.2347 | 2.0 | 18180 | 0.2317 | 3835840 |
| 0.2335 | 3.0 | 27270 | 0.2313 | 5753152 |
| 0.2317 | 4.0 | 36360 | 0.2316 | 7672000 |
| 0.2319 | 5.0 | 45450 | 0.2314 | 9590080 |
| 0.2293 | 6.0 | 54540 | 0.2314 | 11509088 |
| 0.2362 | 7.0 | 63630 | 0.2317 | 13427712 |
| 0.2314 | 8.0 | 72720 | 0.2313 | 15346672 |
| 0.2329 | 9.0 | 81810 | 0.2315 | 17265344 |
| 0.2319 | 10.0 | 90900 | 0.2315 | 19184224 |
| 0.2319 | 11.0 | 99990 | 0.2314 | 21102912 |
| 0.2324 | 12.0 | 109080 | 0.2314 | 23021312 |
| 0.2324 | 13.0 | 118170 | 0.2314 | 24938688 |
| 0.2309 | 14.0 | 127260 | 0.2314 | 26857088 |
| 0.2319 | 15.0 | 136350 | 0.2313 | 28775840 |
| 0.2324 | 16.0 | 145440 | 0.2313 | 30693088 |
| 0.2298 | 17.0 | 154530 | 0.2313 | 32612480 |
| 0.2309 | 18.0 | 163620 | 0.2313 | 34530176 |
| 0.2324 | 19.0 | 172710 | 0.2315 | 36447600 |
| 0.2319 | 20.0 | 181800 | 0.2313 | 38366624 |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4