# d0de82b464048426c21013a98b94b226
This model is a fine-tuned version of distilbert/distilbert-base-uncased-distilled-squad on the stsb subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the metrics):
- Loss: 0.5263
- Data Size: 1.0
- Epoch Runtime: 6.2360
- MSE: 0.5266
- MAE: 0.5632
- R²: 0.7644
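STS-B is a sentence-pair similarity regression task, so the checkpoint exposes a single-output regression head. Below is a minimal usage sketch, assuming the model is loaded by this card's repo id with the standard `transformers` Auto classes (adjust the id if the model lives under a namespace or a local path):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "d0de82b464048426c21013a98b94b226"  # this card's model name; adjust path if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)  # num_labels=1 regression head
model.eval()

# STS-B scores sentence pairs for semantic similarity on a 0-5 scale.
inputs = tokenizer(
    "A man is playing a guitar.",
    "A person plays a stringed instrument.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Predicted similarity: {score:.2f}")
```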
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
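As a rough guide, the settings above might map onto `transformers.TrainingArguments` as in the sketch below. The `output_dir` is a placeholder, and the dataset/`Trainer` wiring is omitted, so treat this as an illustration rather than the exact training script:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="distilbert-stsb",   # placeholder, not taken from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```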
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 7.1630 | 0 | 0.9963 | 7.1642 | 2.2442 | -2.2048 |
| No log | 1 | 179 | 4.1232 | 0.0078 | 1.3503 | 4.1242 | 1.6843 | -0.8449 |
| No log | 2 | 358 | 2.3754 | 0.0156 | 1.2906 | 2.3763 | 1.3290 | -0.0630 |
| No log | 3 | 537 | 1.7820 | 0.0312 | 1.5784 | 1.7826 | 1.1047 | 0.2026 |
| No log | 4 | 716 | 1.2457 | 0.0625 | 1.7373 | 1.2458 | 0.9017 | 0.4427 |
| No log | 5 | 895 | 0.6916 | 0.125 | 1.9072 | 0.6918 | 0.6535 | 0.6905 |
| 0.0938 | 6 | 1074 | 0.8121 | 0.25 | 2.5930 | 0.8121 | 0.6988 | 0.6367 |
| 0.5726 | 7 | 1253 | 0.5908 | 0.5 | 3.6150 | 0.5911 | 0.6119 | 0.7356 |
| 0.4264 | 8 | 1432 | 0.6125 | 1.0 | 6.1224 | 0.6126 | 0.5930 | 0.7260 |
| 0.2704 | 9 | 1611 | 0.6132 | 1.0 | 6.1703 | 0.6135 | 0.6034 | 0.7256 |
| 0.2095 | 10 | 1790 | 0.5552 | 1.0 | 6.1675 | 0.5555 | 0.5592 | 0.7515 |
| 0.1598 | 11 | 1969 | 0.6219 | 1.0 | 6.1374 | 0.6221 | 0.5942 | 0.7217 |
| 0.1313 | 12 | 2148 | 0.5631 | 1.0 | 6.4765 | 0.5633 | 0.5739 | 0.7480 |
| 0.107 | 13 | 2327 | 0.5578 | 1.0 | 6.0775 | 0.5581 | 0.5739 | 0.7503 |
| 0.1065 | 14 | 2506 | 0.5416 | 1.0 | 6.0314 | 0.5419 | 0.5551 | 0.7576 |
| 0.1038 | 15 | 2685 | 0.5477 | 1.0 | 6.1429 | 0.5480 | 0.5622 | 0.7549 |
| 0.0869 | 16 | 2864 | 0.5574 | 1.0 | 6.1720 | 0.5576 | 0.5697 | 0.7506 |
| 0.0699 | 17 | 3043 | 0.5085 | 1.0 | 6.0534 | 0.5088 | 0.5404 | 0.7724 |
| 0.0737 | 18 | 3222 | 0.6007 | 1.0 | 6.0803 | 0.6010 | 0.5945 | 0.7312 |
| 0.0612 | 19 | 3401 | 0.5205 | 1.0 | 6.0794 | 0.5208 | 0.5545 | 0.7670 |
| 0.0654 | 20 | 3580 | 0.5380 | 1.0 | 6.1629 | 0.5383 | 0.5534 | 0.7592 |
| 0.0567 | 21 | 3759 | 0.5263 | 1.0 | 6.2360 | 0.5266 | 0.5632 | 0.7644 |
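For reference, the MSE, MAE, and R² columns above can be computed with a `compute_metrics` callback of the following shape. This is a hedged sketch using scikit-learn; the exact metric function used for this run is not shown in the card:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def compute_metrics(eval_pred):
    """Regression metrics for a Trainer evaluating an STS-B style model."""
    predictions, labels = eval_pred
    predictions = np.squeeze(predictions)  # regression head emits shape (n, 1)
    return {
        "mse": mean_squared_error(labels, predictions),
        "mae": mean_absolute_error(labels, predictions),
        "r2": r2_score(labels, predictions),
    }
```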
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1