train_wsc_123_1760364575

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5938
  • Num Input Tokens Seen: 1465808
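
The card does not include usage code. Below is a minimal sketch of how this PEFT adapter might be loaded for inference; the adapter id is taken from the model name, the gated base model meta-llama/Meta-Llama-3-8B-Instruct must be accessible, and the prompt is only an illustration (the prompt format used during training is not documented here).

```python
# Minimal sketch, assuming the adapter is published as rbelanec/train_wsc_123_1760364575
# and that access to the gated base model meta-llama/Meta-Llama-3-8B-Instruct is available.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_wsc_123_1760364575"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

# Illustrative Winograd-schema-style prompt; WSC asks whether a pronoun refers to a marked span.
prompt = (
    "The trophy doesn't fit in the suitcase because it is too big. "
    "Does 'it' refer to 'the trophy'? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```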

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
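
A sketch of how these values might map onto Hugging Face `TrainingArguments` is shown below; the actual training script is not part of this card, so the `output_dir` and any field not listed above are assumptions.

```python
# Hedged sketch: mapping the listed hyperparameters onto TrainingArguments.
# Fields beyond those in the list above (e.g. output_dir) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1760364575",  # assumed; matches the model name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```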

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.4242        | 1.504  | 188  | 0.6816          | 73760             |
| 0.3927        | 3.008  | 376  | 0.3612          | 148032            |
| 0.3887        | 4.512  | 564  | 0.4348          | 222944            |
| 0.3473        | 6.016  | 752  | 0.3647          | 294320            |
| 0.3476        | 7.52   | 940  | 0.3601          | 369248            |
| 0.3447        | 9.024  | 1128 | 0.3562          | 442000            |
| 0.3588        | 10.528 | 1316 | 0.3446          | 516624            |
| 0.3716        | 12.032 | 1504 | 0.3595          | 589072            |
| 0.3868        | 13.536 | 1692 | 0.3683          | 662256            |
| 0.327         | 15.04  | 1880 | 0.3556          | 736272            |
| 0.3534        | 16.544 | 2068 | 0.3579          | 809824            |
| 0.3555        | 18.048 | 2256 | 0.3524          | 882480            |
| 0.3638        | 19.552 | 2444 | 0.3565          | 956000            |
| 0.3484        | 21.056 | 2632 | 0.3556          | 1028736           |
| 0.3373        | 22.56  | 2820 | 0.3549          | 1102672           |
| 0.3468        | 24.064 | 3008 | 0.3554          | 1176448           |
| 0.3631        | 25.568 | 3196 | 0.3627          | 1249968           |
| 0.3279        | 27.072 | 3384 | 0.3589          | 1322608           |
| 0.352         | 28.576 | 3572 | 0.3576          | 1396032           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
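
To reproduce the environment, a small sketch for checking locally installed versions against the ones listed above (assuming the standard PyPI distributions of these libraries):

```python
# Print the locally installed versions of the libraries listed above so they can be
# compared against the versions this adapter was trained with.
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("Pytorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(name, module.__version__)
```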