lemexp-task1-v3-template_small-Llama-3.2-1B-8lr-12epochs-no-eos

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1547

Model description

More information needed

Intended uses & limitations

More information needed
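
The card does not yet document how to use this adapter, so the following is a minimal, hedged sketch (not the authors' documented workflow) for loading it on top of the base model with the standard transformers and peft APIs. The repository IDs come from this card; the prompt format is not documented and is left as a placeholder.

```python
# Assumption-laden usage sketch: attach this LoRA/PEFT adapter to the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B"
adapter_id = "yalhessi/lemexp-task1-v3-template_small-Llama-3.2-1B-8lr-12epochs-no-eos"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "..."  # the expected prompt/template format is not documented in this card
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```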

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0008
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
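
For reference, here is a rough sketch of a TrainingArguments configuration that matches the hyperparameters listed above. It is an assumption-based illustration only: the dataset, LoRA configuration, trainer setup, and multi-GPU launch command are not documented in this card.

```python
# Illustrative sketch, not the authors' actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task1-v3-template_small-Llama-3.2-1B-8lr-12epochs-no-eos",
    learning_rate=8e-4,              # 0.0008
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 1 per device x 8 GPUs x 2 steps = 16 effective
    num_train_epochs=12,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # assumption for "Native AMP" mixed precision
)
```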

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.4564        | 0.2001  | 720   | 0.3630          |
| 0.3609        | 0.4002  | 1440  | 0.3301          |
| 0.3184        | 0.6003  | 2160  | 0.3115          |
| 0.3117        | 0.8003  | 2880  | 0.3125          |
| 0.2947        | 1.0003  | 3600  | 0.2918          |
| 0.285         | 1.2004  | 4320  | 0.2820          |
| 0.276         | 1.4004  | 5040  | 0.2790          |
| 0.2747        | 1.6005  | 5760  | 0.2702          |
| 0.2739        | 1.8006  | 6480  | 0.2805          |
| 0.2668        | 2.0006  | 7200  | 0.2689          |
| 0.2572        | 2.2006  | 7920  | 0.2628          |
| 0.2544        | 2.4007  | 8640  | 0.2497          |
| 0.2466        | 2.6008  | 9360  | 0.2474          |
| 0.2469        | 2.8009  | 10080 | 0.2447          |
| 0.2469        | 3.0008  | 10800 | 0.2448          |
| 0.2368        | 3.2009  | 11520 | 0.2434          |
| 0.2375        | 3.4010  | 12240 | 0.2370          |
| 0.2312        | 3.6011  | 12960 | 0.2370          |
| 0.2277        | 3.8012  | 13680 | 0.2412          |
| 0.2286        | 4.0011  | 14400 | 0.2287          |
| 0.2189        | 4.2012  | 15120 | 0.2300          |
| 0.2183        | 4.4013  | 15840 | 0.2249          |
| 0.2139        | 4.6014  | 16560 | 0.2314          |
| 0.2138        | 4.8014  | 17280 | 0.2169          |
| 0.2082        | 5.0014  | 18000 | 0.2230          |
| 0.2035        | 5.2015  | 18720 | 0.2187          |
| 0.2027        | 5.4016  | 19440 | 0.2156          |
| 0.2014        | 5.6016  | 20160 | 0.2166          |
| 0.2003        | 5.8017  | 20880 | 0.2091          |
| 0.1951        | 6.0017  | 21600 | 0.2080          |
| 0.1882        | 6.2018  | 22320 | 0.2013          |
| 0.1877        | 6.4018  | 23040 | 0.1974          |
| 0.1848        | 6.6019  | 23760 | 0.2013          |
| 0.1837        | 6.8020  | 24480 | 0.2057          |
| 0.1808        | 7.0019  | 25200 | 0.1938          |
| 0.1719        | 7.2020  | 25920 | 0.1946          |
| 0.1712        | 7.4021  | 26640 | 0.1917          |
| 0.1717        | 7.6022  | 27360 | 0.1951          |
| 0.1654        | 7.8023  | 28080 | 0.1839          |
| 0.1691        | 8.0022  | 28800 | 0.1845          |
| 0.1575        | 8.2023  | 29520 | 0.1828          |
| 0.1524        | 8.4024  | 30240 | 0.1815          |
| 0.1562        | 8.6025  | 30960 | 0.1795          |
| 0.1522        | 8.8026  | 31680 | 0.1767          |
| 0.1531        | 9.0025  | 32400 | 0.1742          |
| 0.141         | 9.2026  | 33120 | 0.1736          |
| 0.1401        | 9.4027  | 33840 | 0.1682          |
| 0.1415        | 9.6028  | 34560 | 0.1684          |
| 0.1396        | 9.8028  | 35280 | 0.1651          |
| 0.1308        | 10.0028 | 36000 | 0.1657          |
| 0.1249        | 10.2029 | 36720 | 0.1627          |
| 0.122         | 10.4029 | 37440 | 0.1608          |
| 0.1211        | 10.6030 | 38160 | 0.1640          |
| 0.121         | 10.8031 | 38880 | 0.1563          |
| 0.1219        | 11.0031 | 39600 | 0.1582          |
| 0.1105        | 11.2031 | 40320 | 0.1561          |
| 0.1102        | 11.4032 | 41040 | 0.1561          |
| 0.1073        | 11.6033 | 41760 | 0.1556          |
| 0.1064        | 11.8034 | 42480 | 0.1547          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.57.1
  • Pytorch 2.7.0+cu126
  • Datasets 4.3.0
  • Tokenizers 0.22.1
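
Because this repository ships a PEFT adapter rather than full model weights (see PEFT 0.15.2 above), one optional deployment step, sketched here as an assumption rather than a documented workflow, is merging the LoRA weights into the base model so inference no longer requires the peft dependency:

```python
# Hypothetical deployment sketch: fold the adapter weights into the base model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
merged = PeftModel.from_pretrained(
    base,
    "yalhessi/lemexp-task1-v3-template_small-Llama-3.2-1B-8lr-12epochs-no-eos",
).merge_and_unload()
merged.save_pretrained("llama-3.2-1B-lemexp-merged")  # hypothetical output path
```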