train_cola_1757340211

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2174
  • Num Input Tokens Seen: 3668312
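To run the model, the adapter is loaded on top of the base model with PEFT. The following is a minimal sketch, assuming the adapter is published as rbelanec/train_cola_1757340211, that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights, and an illustrative prompt format (the exact CoLA prompt template used for fine-tuning is not documented in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_1757340211"  # adapter repo for this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Hypothetical prompt; the real fine-tuning template may differ.
prompt = "Is the following sentence grammatically acceptable? Sentence: The boy quickly ran."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```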

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
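
For reference, the hyperparameters above map roughly onto a transformers TrainingArguments configuration along these lines. This is a sketch, not the original training script; output_dir and any PEFT/LoRA-specific settings are assumptions not stated in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_1757340211",  # assumed output directory
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```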

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.2364        | 0.5   | 962   | 0.1978          | 183008            |
| 0.3251        | 1.0   | 1924  | 0.1863          | 366712            |
| 0.1059        | 1.5   | 2886  | 0.1736          | 550360            |
| 0.2070        | 2.0   | 3848  | 0.1481          | 734016            |
| 0.2568        | 2.5   | 4810  | 0.1479          | 917408            |
| 0.0476        | 3.0   | 5772  | 0.1609          | 1100824           |
| 0.0386        | 3.5   | 6734  | 0.1598          | 1283896           |
| 0.1213        | 4.0   | 7696  | 0.1442          | 1467248           |
| 0.0282        | 4.5   | 8658  | 0.1766          | 1651280           |
| 0.1311        | 5.0   | 9620  | 0.1388          | 1834568           |
| 0.0748        | 5.5   | 10582 | 0.1554          | 2017960           |
| 0.0980        | 6.0   | 11544 | 0.1478          | 2201464           |
| 0.1090        | 6.5   | 12506 | 0.1749          | 2384536           |
| 0.0855        | 7.0   | 13468 | 0.1513          | 2568040           |
| 0.0174        | 7.5   | 14430 | 0.1706          | 2750664           |
| 0.0580        | 8.0   | 15392 | 0.1701          | 2934360           |
| 0.0563        | 8.5   | 16354 | 0.1804          | 3118424           |
| 0.2265        | 9.0   | 17316 | 0.1770          | 3301448           |
| 0.0602        | 9.5   | 18278 | 0.1786          | 3485512           |
| 0.1016        | 10.0  | 19240 | 0.1781          | 3668312           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
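
To check a local environment against these versions, a quick convenience snippet (not part of the original card) could look like:

```python
import peft, transformers, torch, datasets, tokenizers

print("PEFT:", peft.__version__)                   # expected 0.15.2
print("Transformers:", transformers.__version__)   # expected 4.51.3
print("PyTorch:", torch.__version__)               # expected 2.8.0+cu128
print("Datasets:", datasets.__version__)           # expected 3.6.0
print("Tokenizers:", tokenizers.__version__)       # expected 0.21.1
```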