train_rte_1744902661

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the RTE (Recognizing Textual Entailment) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0805
  • Num Input Tokens Seen: 98761256
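
The PEFT version listed under "Framework versions" suggests this checkpoint is a parameter-efficient adapter on top of the base model rather than a full fine-tune. Below is a minimal loading sketch, assuming the adapter is published as rbelanec/train_rte_1744902661 and that a plain RTE-style yes/no prompt is acceptable; the exact prompt format used during training is not documented here, so treat the prompt as illustrative.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# The adapter id and the prompt format below are assumptions, not documented facts.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_1744902661"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Hypothetical RTE-style prompt.
prompt = (
    "Premise: The cat sat on the mat.\n"
    "Hypothesis: A cat is on the mat.\n"
    "Does the premise entail the hypothesis? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```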

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
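
For reference, these settings map onto a standard transformers TrainingArguments configuration roughly as follows. This is a sketch that assumes the run used the Hugging Face Trainer API; the output directory and the 200-step logging/evaluation interval are inferred from the model name and the results table below, not documented directly.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# Assumes the Hugging Face Trainer API was used; the actual training script is not provided.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_1744902661",  # assumed; matches the model name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,      # effective batch size 4 * 4 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
    logging_steps=200,                  # inferred from the results table
    eval_steps=200,                     # inferred from the results table
    eval_strategy="steps",
)
```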

Training results

Validation loss reaches its minimum of 0.0805 at step 600 (epoch ≈ 4.26) and rises steadily for the remainder of the run; this minimum matches the evaluation loss reported at the top of this card.

Training Loss Epoch Step Validation Loss Input Tokens Seen
0.0779 1.4207 200 0.0918 496688
0.0755 2.8414 400 0.0845 991488
0.0428 4.2567 600 0.0805 1481464
0.0787 5.6774 800 0.0808 1979088
0.0332 7.0927 1000 0.0825 2468504
0.0234 8.5134 1200 0.0847 2963120
0.0565 9.9340 1400 0.0859 3459048
0.0294 11.3494 1600 0.0934 3951104
0.0229 12.7701 1800 0.1011 4445432
0.0214 14.1854 2000 0.1132 4938824
0.004 15.6061 2200 0.1359 5433720
0.0013 17.0214 2400 0.1360 5925896
0.0017 18.4421 2600 0.1567 6422360
0.0012 19.8627 2800 0.1644 6914152
0.0064 21.2781 3000 0.1777 7403976
0.0006 22.6988 3200 0.1956 7902520
0.0002 24.1141 3400 0.2198 8394080
0.0003 25.5348 3600 0.2378 8884224
0.0001 26.9554 3800 0.2536 9382368
0.0001 28.3708 4000 0.2729 9872768
0.0 29.7914 4200 0.2836 10366000
0.0001 31.2068 4400 0.2932 10867488
0.0 32.6275 4600 0.2957 11358568
0.0001 34.0428 4800 0.3032 11852320
0.0 35.4635 5000 0.3060 12343880
0.0 36.8841 5200 0.3148 12837040
0.0 38.2995 5400 0.3201 13329368
0.0 39.7201 5600 0.3267 13828784
0.0 41.1355 5800 0.3299 14315304
0.0 42.5561 6000 0.3314 14806592
0.0 43.9768 6200 0.3411 15305208
0.0 45.3922 6400 0.3413 15791608
0.0 46.8128 6600 0.3466 16292464
0.0 48.2282 6800 0.3515 16781768
0.0 49.6488 7000 0.3563 17278560
0.0 51.0642 7200 0.3560 17769384
0.0 52.4848 7400 0.3619 18262680
0.0 53.9055 7600 0.3626 18763936
0.0 55.3209 7800 0.3667 19258096
0.0 56.7415 8000 0.3683 19753648
0.0 58.1569 8200 0.3750 20244128
0.0 59.5775 8400 0.3802 20739208
0.0 60.9982 8600 0.3804 21236872
0.0 62.4135 8800 0.3883 21726944
0.0 63.8342 9000 0.3856 22223288
0.0 65.2496 9200 0.3914 22716672
0.0 66.6702 9400 0.3966 23209088
0.0 68.0856 9600 0.3962 23701520
0.0 69.5062 9800 0.4016 24197944
0.0 70.9269 10000 0.3996 24694272
0.0 72.3422 10200 0.3997 25191256
0.0 73.7629 10400 0.4040 25688288
0.0 75.1783 10600 0.4158 26177720
0.0 76.5989 10800 0.4156 26675248
0.0 78.0143 11000 0.4173 27168496
0.0 79.4349 11200 0.4198 27664360
0.0 80.8556 11400 0.4272 28161984
0.0 82.2709 11600 0.4260 28655448
0.0 83.6916 11800 0.4278 29151808
0.0 85.1070 12000 0.4304 29642952
0.0 86.5276 12200 0.4381 30140536
0.0 87.9483 12400 0.4350 30639808
0.0 89.3636 12600 0.4398 31135048
0.0 90.7843 12800 0.4397 31630256
0.0 92.1996 13000 0.4424 32121256
0.0 93.6203 13200 0.4532 32618184
0.0 95.0357 13400 0.4497 33115432
0.0 96.4563 13600 0.4475 33609472
0.0 97.8770 13800 0.4516 34098712
0.0 99.2923 14000 0.4572 34590368
0.0 100.7130 14200 0.4523 35081248
0.0 102.1283 14400 0.4666 35571464
0.0 103.5490 14600 0.4678 36063824
0.0 104.9697 14800 0.4681 36557944
0.0 106.3850 15000 0.4724 37048560
0.0 107.8057 15200 0.4635 37543928
0.0 109.2210 15400 0.4800 38035968
0.0 110.6417 15600 0.4819 38526000
0.0 112.0570 15800 0.4826 39021440
0.0 113.4777 16000 0.4793 39519712
0.0 114.8984 16200 0.4884 40014440
0.0 116.3137 16400 0.4961 40509368
0.0 117.7344 16600 0.4914 41001000
0.0 119.1497 16800 0.4953 41492672
0.0 120.5704 17000 0.4933 41991984
0.0 121.9911 17200 0.5009 42486736
0.0 123.4064 17400 0.5002 42979888
0.0 124.8271 17600 0.5029 43473920
0.0 126.2424 17800 0.5095 43963728
0.0 127.6631 18000 0.5176 44457208
0.0 129.0784 18200 0.5129 44952664
0.0 130.4991 18400 0.5183 45446704
0.0 131.9198 18600 0.5097 45936552
0.0 133.3351 18800 0.5149 46426240
0.0 134.7558 19000 0.5187 46921256
0.0 136.1711 19200 0.5181 47412080
0.0 137.5918 19400 0.5187 47911024
0.0 139.0071 19600 0.5154 48404752
0.0 140.4278 19800 0.5252 48901416
0.0 141.8485 20000 0.5204 49400736
0.0 143.2638 20200 0.5231 49895752
0.0 144.6845 20400 0.5181 50380736
0.0 146.0998 20600 0.5283 50871288
0.0 147.5205 20800 0.5325 51360328
0.0 148.9412 21000 0.5230 51853696
0.0 150.3565 21200 0.5282 52348712
0.0 151.7772 21400 0.5331 52842992
0.0 153.1925 21600 0.5305 53335368
0.0 154.6132 21800 0.5353 53831240
0.0 156.0285 22000 0.5339 54320840
0.0 157.4492 22200 0.5395 54818304
0.0 158.8699 22400 0.5286 55310560
0.0 160.2852 22600 0.5361 55805192
0.0 161.7059 22800 0.5427 56294240
0.0 163.1212 23000 0.5402 56785216
0.0 164.5419 23200 0.5328 57277112
0.0 165.9626 23400 0.5400 57768960
0.0 167.3779 23600 0.5325 58259216
0.0 168.7986 23800 0.5375 58754552
0.0 170.2139 24000 0.5380 59250304
0.0 171.6346 24200 0.5376 59743752
0.0 173.0499 24400 0.5403 60240920
0.0 174.4706 24600 0.5476 60738488
0.0 175.8913 24800 0.5405 61232632
0.0 177.3066 25000 0.5426 61726896
0.0 178.7273 25200 0.5500 62220440
0.0 180.1426 25400 0.5384 62713544
0.0 181.5633 25600 0.5392 63208560
0.0 182.9840 25800 0.5366 63703320
0.0 184.3993 26000 0.5411 64195280
0.0 185.8200 26200 0.5460 64693448
0.0 187.2353 26400 0.5340 65180864
0.0 188.6560 26600 0.5409 65680024
0.0 190.0713 26800 0.5322 66173368
0.0 191.4920 27000 0.5351 66664968
0.0 192.9127 27200 0.5311 67157528
0.0 194.3280 27400 0.5385 67657848
0.0 195.7487 27600 0.5397 68154280
0.0 197.1640 27800 0.5369 68648760
0.0 198.5847 28000 0.5372 69145424
0.0 200.0 28200 0.5430 69634592
0.0 201.4207 28400 0.5315 70126824
0.0 202.8414 28600 0.5439 70621048
0.0 204.2567 28800 0.5393 71112744
0.0 205.6774 29000 0.5434 71609328
0.0 207.0927 29200 0.5420 72096488
0.0 208.5134 29400 0.5392 72590600
0.0 209.9340 29600 0.5428 73085400
0.0 211.3494 29800 0.5378 73578704
0.0 212.7701 30000 0.5413 74071832
0.0 214.1854 30200 0.5365 74558088
0.0 215.6061 30400 0.5338 75054720
0.0 217.0214 30600 0.5447 75550968
0.0 218.4421 30800 0.5418 76052048
0.0 219.8627 31000 0.5440 76544760
0.0 221.2781 31200 0.5407 77039312
0.0 222.6988 31400 0.5393 77536608
0.0 224.1141 31600 0.5339 78029096
0.0 225.5348 31800 0.5421 78521640
0.0 226.9554 32000 0.5473 79014704
0.0 228.3708 32200 0.5443 79509056
0.0 229.7914 32400 0.5436 80004760
0.0 231.2068 32600 0.5393 80498576
0.0 232.6275 32800 0.5429 80992160
0.0 234.0428 33000 0.5428 81484216
0.0 235.4635 33200 0.5518 81981536
0.0 236.8841 33400 0.5384 82469112
0.0 238.2995 33600 0.5381 82967264
0.0 239.7201 33800 0.5362 83460632
0.0 241.1355 34000 0.5469 83946936
0.0 242.5561 34200 0.5378 84438976
0.0 243.9768 34400 0.5419 84936992
0.0 245.3922 34600 0.5366 85424648
0.0 246.8128 34800 0.5383 85921552
0.0 248.2282 35000 0.5398 86414392
0.0 249.6488 35200 0.5405 86904424
0.0 251.0642 35400 0.5402 87399560
0.0 252.4848 35600 0.5418 87900568
0.0 253.9055 35800 0.5278 88391952
0.0 255.3209 36000 0.5487 88887288
0.0 256.7415 36200 0.5428 89375944
0.0 258.1569 36400 0.5406 89868176
0.0 259.5775 36600 0.5420 90365056
0.0 260.9982 36800 0.5366 90855096
0.0 262.4135 37000 0.5435 91348504
0.0 263.8342 37200 0.5350 91843280
0.0 265.2496 37400 0.5408 92339160
0.0 266.6702 37600 0.5417 92834936
0.0 268.0856 37800 0.5320 93329096
0.0 269.5062 38000 0.5405 93825960
0.0 270.9269 38200 0.5393 94316976
0.0 272.3422 38400 0.5388 94808456
0.0 273.7629 38600 0.5399 95304384
0.0 275.1783 38800 0.5384 95796256
0.0 276.5989 39000 0.5312 96293992
0.0 278.0143 39200 0.5339 96783960
0.0 279.4349 39400 0.5395 97275176
0.0 280.8556 39600 0.5489 97769584
0.0 282.2709 39800 0.5464 98266712
0.0 283.6916 40000 0.5345 98761256

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
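
A quick way to confirm that a local environment matches these versions (illustrative only; newer releases will usually also work but are untested here):

```python
# Sanity-check installed versions against those used for training.
import peft, transformers, torch, datasets, tokenizers

print(peft.__version__)          # trained with 0.15.1
print(transformers.__version__)  # trained with 4.51.3
print(torch.__version__)         # trained with 2.6.0+cu124
print(datasets.__version__)      # trained with 3.5.0
print(tokenizers.__version__)    # trained with 0.21.1
```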