# 4cf557b80c2abbcebf5cb6742ac3111a
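This model is a fine-tuned version of Qwen/Qwen2.5-7B on the STS-B (`stsb`) configuration of the nyu-mll/glue dataset. It achieves the following results on the evaluation set at the final logged epoch (epoch 12 in the training results table below):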
- Loss: 9.0028
- Data Size: 1.0
- Epoch Runtime: 167.2591
- MSE: 2.2516
- MAE: 1.2798
- R²: -0.0072
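The three regression metrics above can be sketched as follows, assuming the card's MSE, MAE and R² follow their standard definitions (the card does not say which library computed them). The function name `regression_metrics` is hypothetical. One useful consequence: a model that always predicts the evaluation-set mean score has R² = 0, so the reported R² of -0.0072 means the final checkpoint does roughly as well as that constant predictor.

```python
from typing import Dict, Sequence


def regression_metrics(preds: Sequence[float], labels: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE and R^2 (coefficient of determination) for paired values.

    Hypothetical helper; assumes the standard metric definitions.
    """
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    n = len(labels)
    errors = [p - y for p, y in zip(preds, labels)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_y = sum(labels) / n
    ss_tot = sum((y - mean_y) ** 2 for y in labels)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all labels are identical")
    r2 = 1.0 - ss_res / ss_tot                   # can be negative for poor models
    return {"mse": mse, "mae": mae, "r2": r2}


# Predicting the label mean for every example gives R^2 == 0.
print(regression_metrics([1.5, 1.5, 1.5, 1.5], [0.0, 1.0, 2.0, 3.0]))
```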
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50 (the logged training results stop after epoch 12)
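A minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments` keyword names; it is not the author's training script. The `output_dir` value is hypothetical, and `gradient_accumulation_steps=1` is an assumption because the card does not list it. The sketch also checks the arithmetic behind the totals: 8 examples per device × 4 devices × 1 accumulation step = 32.

```python
# Hypothetical mapping of the card's hyperparameters onto
# transformers.TrainingArguments keyword names (not the author's script).
NUM_DEVICES = 4                  # from "num_devices: 4" on the card
GRADIENT_ACCUMULATION_STEPS = 1  # assumption: not listed on the card

training_kwargs = {
    "output_dir": "stsb-qwen2.5-7b",  # hypothetical output path
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
    "gradient_accumulation_steps": GRADIENT_ACCUMULATION_STEPS,
}


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


total_train = effective_batch_size(
    training_kwargs["per_device_train_batch_size"], NUM_DEVICES, GRADIENT_ACCUMULATION_STEPS
)
# Evaluation has no gradient accumulation, so only devices multiply the batch.
total_eval = training_kwargs["per_device_eval_batch_size"] * NUM_DEVICES
print(total_train, total_eval)
```

With `transformers` installed, `TrainingArguments(**training_kwargs)` should accept these keys. The multi-GPU setup itself comes from the launcher (for example `torchrun --nproc_per_node=4` or `accelerate launch`), not from these arguments.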
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 162.1640 | 0 | 4.9325 | 40.5406 | 5.3933 | -17.1352 |
| No log | 1 | 179 | 216.4438 | 0.0078 | 6.0286 | 54.1147 | 5.4474 | -23.2074 |
| No log | 2 | 358 | 78.2590 | 0.0156 | 15.2698 | 19.5655 | 3.7806 | -7.7524 |
| No log | 3 | 537 | 71.3912 | 0.0312 | 26.6903 | 17.8493 | 3.8704 | -6.9846 |
| No log | 4 | 716 | 20.5629 | 0.0625 | 36.6249 | 5.1418 | 1.8678 | -1.3001 |
| No log | 5 | 895 | 14.8689 | 0.125 | 52.8272 | 3.7177 | 1.5604 | -0.6631 |
| 16.3429 | 6 | 1074 | 8.3563 | 0.25 | 75.7176 | 2.0895 | 1.1038 | 0.0653 |
| 5.9830 | 7 | 1253 | 4.2240 | 0.5 | 126.6140 | 1.0560 | 0.8367 | 0.5276 |
| 4.5779 | 8 | 1432 | 3.3550 | 1.0 | 195.2600 | 0.8391 | 0.7164 | 0.6246 |
| 6.1749 | 9 | 1611 | 7.1648 | 1.0 | 153.6322 | 1.7914 | 1.0613 | 0.1987 |
| 12.2715 | 10 | 1790 | 10.9483 | 1.0 | 170.9618 | 2.7376 | 1.3450 | -0.2246 |
| 7.0884 | 11 | 1969 | 8.7058 | 1.0 | 157.1733 | 2.1772 | 1.2580 | 0.0260 |
| 9.5085 | 12 | 2148 | 9.0028 | 1.0 | 167.2591 | 2.2516 | 1.2798 | -0.0072 |
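The headline metrics at the top of the card match the last row (epoch 12), but the lowest validation loss and highest R² in the training results occur at epoch 8. The short sketch below copies the epoch, validation loss and R² columns from that table and picks the best epoch by each criterion; the variable names are illustrative only.

```python
# (epoch, validation_loss, r2) copied from the training results table.
RESULTS = [
    (0, 162.1640, -17.1352),
    (1, 216.4438, -23.2074),
    (2, 78.2590, -7.7524),
    (3, 71.3912, -6.9846),
    (4, 20.5629, -1.3001),
    (5, 14.8689, -0.6631),
    (6, 8.3563, 0.0653),
    (7, 4.2240, 0.5276),
    (8, 3.3550, 0.6246),
    (9, 7.1648, 0.1987),
    (10, 10.9483, -0.2246),
    (11, 8.7058, 0.0260),
    (12, 9.0028, -0.0072),
]

best_by_loss = min(RESULTS, key=lambda row: row[1])  # lowest validation loss
best_by_r2 = max(RESULTS, key=lambda row: row[2])    # highest R^2
final = RESULTS[-1]                                  # checkpoint reported at the top

print(f"best epoch by validation loss: {best_by_loss[0]}")
print(f"best epoch by R^2: {best_by_r2[0]}")
print(f"final logged epoch: {final[0]} (R^2 = {final[2]})")
```

The card does not say whether the epoch-8 checkpoint was kept. In `transformers`, keeping the best checkpoint is typically configured with `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"` in `TrainingArguments`.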
### Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
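To compare a local environment against the versions listed above, a minimal sketch using the standard-library `importlib.metadata` follows. The helper name `version_mismatches` is hypothetical, and `torch` is assumed to be the installed distribution name for PyTorch. The comparison is an exact string match, so a PyTorch build without the `+cu128` local suffix (for example a CPU wheel) shows up as a mismatch even if it is otherwise compatible.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions" on the card.
CARD_VERSIONS = {
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def version_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            mismatches[package] = None  # package not installed
            continue
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


print(version_mismatches(CARD_VERSIONS))  # output depends on the local environment
```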
## Model tree

- Repository: contemmcm/4cf557b80c2abbcebf5cb6742ac3111a
- Base model: Qwen/Qwen2.5-7B