lemexp-task1-v2-template_small_nodefs-Llama-3.2-1B-8lr-24epochs-nonspecial-eos-token

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1670
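
The framework versions below indicate this repository is a PEFT adapter on top of meta-llama/Llama-3.2-1B. A minimal loading sketch with the peft library follows; the prompt text, generation settings, and tokenizer source are assumptions, since the intended prompt format is not documented in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B"
adapter_id = "yalhessi/lemexp-task1-v2-template_small_nodefs-Llama-3.2-1B-8lr-24epochs-nonspecial-eos-token"

# The adapter repo may ship its own tokenizer config (the model name mentions a
# non-special EOS token); loading the base tokenizer here is an assumption.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical prompt; replace with the prompt template used for this task.
inputs = tokenizer("lemma example :", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```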

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0008
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 24
  • mixed_precision_training: Native AMP
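
The list above corresponds roughly to the following transformers.TrainingArguments sketch. It is a hypothetical reconstruction: output_dir, the bf16-vs-fp16 choice for "Native AMP", and logging/evaluation cadence are assumptions, and the multi-GPU launch (8 devices) is handled by the launcher rather than these arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task1-v2-template_small_nodefs-Llama-3.2-1B-8lr-24epochs-nonspecial-eos-token",
    learning_rate=8e-4,
    per_device_train_batch_size=1,   # x 8 GPUs x 2 accumulation steps = 16 total
    per_device_eval_batch_size=2,    # x 8 GPUs = 16 total
    gradient_accumulation_steps=2,
    num_train_epochs=24,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # "Native AMP"; fp16=True is the other possibility
)
```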

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.4177        | 0.4002  | 1440  | 0.4011          |
| 0.3596        | 0.8003  | 2880  | 0.3598          |
| 0.3297        | 1.2004  | 4320  | 0.3253          |
| 0.3159        | 1.6005  | 5760  | 0.3246          |
| 0.3067        | 2.0006  | 7200  | 0.3112          |
| 0.2948        | 2.4007  | 8640  | 0.3049          |
| 0.2864        | 2.8009  | 10080 | 0.2953          |
| 0.2753        | 3.2009  | 11520 | 0.2888          |
| 0.2713        | 3.6011  | 12960 | 0.2779          |
| 0.2682        | 4.0011  | 14400 | 0.2768          |
| 0.262         | 4.4013  | 15840 | 0.2739          |
| 0.2555        | 4.8014  | 17280 | 0.2742          |
| 0.2499        | 5.2015  | 18720 | 0.2648          |
| 0.2484        | 5.6016  | 20160 | 0.2723          |
| 0.2425        | 6.0017  | 21600 | 0.2579          |
| 0.2395        | 6.4018  | 23040 | 0.2447          |
| 0.2363        | 6.8020  | 24480 | 0.2571          |
| 0.226         | 7.2020  | 25920 | 0.2494          |
| 0.2289        | 7.6022  | 27360 | 0.2449          |
| 0.2294        | 8.0022  | 28800 | 0.2507          |
| 0.2153        | 8.4024  | 30240 | 0.2330          |
| 0.2159        | 8.8026  | 31680 | 0.2298          |
| 0.21          | 9.2026  | 33120 | 0.2275          |
| 0.2115        | 9.6028  | 34560 | 0.2332          |
| 0.2026        | 10.0028 | 36000 | 0.2274          |
| 0.1982        | 10.4029 | 37440 | 0.2214          |
| 0.2012        | 10.8031 | 38880 | 0.2145          |
| 0.1917        | 11.2031 | 40320 | 0.2179          |
| 0.1919        | 11.6033 | 41760 | 0.2135          |
| 0.1929        | 12.0033 | 43200 | 0.2088          |
| 0.18          | 12.4035 | 44640 | 0.2131          |
| 0.1841        | 12.8037 | 46080 | 0.2005          |
| 0.1729        | 13.2037 | 47520 | 0.2046          |
| 0.174         | 13.6039 | 48960 | 0.2056          |
| 0.1745        | 14.0039 | 50400 | 0.1975          |
| 0.1668        | 14.4041 | 51840 | 0.2001          |
| 0.1687        | 14.8042 | 53280 | 0.1923          |
| 0.1562        | 15.2043 | 54720 | 0.1963          |
| 0.1583        | 15.6044 | 56160 | 0.1853          |
| 0.1593        | 16.0044 | 57600 | 0.1869          |
| 0.1494        | 16.4046 | 59040 | 0.1833          |
| 0.1511        | 16.8048 | 60480 | 0.1816          |
| 0.1401        | 17.2048 | 61920 | 0.1810          |
| 0.1403        | 17.6050 | 63360 | 0.1832          |
| 0.142         | 18.0050 | 64800 | 0.1797          |
| 0.1293        | 18.4052 | 66240 | 0.1759          |
| 0.1346        | 18.8053 | 67680 | 0.1734          |
| 0.1223        | 19.2054 | 69120 | 0.1782          |
| 0.1218        | 19.6055 | 70560 | 0.1702          |
| 0.12          | 20.0056 | 72000 | 0.1749          |
| 0.1127        | 20.4057 | 73440 | 0.1701          |
| 0.1137        | 20.8059 | 74880 | 0.1672          |
| 0.1065        | 21.2059 | 76320 | 0.1728          |
| 0.1032        | 21.6061 | 77760 | 0.1663          |
| 0.1019        | 22.0061 | 79200 | 0.1690          |
| 0.0929        | 22.4063 | 80640 | 0.1686          |
| 0.0948        | 22.8064 | 82080 | 0.1651          |
| 0.0858        | 23.2065 | 83520 | 0.1667          |
| 0.0874        | 23.6066 | 84960 | 0.1670          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
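
As a small convenience, the following sketch checks an installed environment against the versions listed above; the strict string comparison (including the +cu128 local version tag for torch) is an assumption about how closely you want to match.

```python
from importlib.metadata import version

# Versions reported in this card (see the list above).
expected = {
    "peft": "0.14.0",
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}

for package, wanted in expected.items():
    installed = version(package)
    status = "OK" if installed == wanted else f"mismatch (found {installed})"
    print(f"{package}: expected {wanted} -> {status}")
```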