irenewds/shawgpt-ft


Files changed (3)

README.md +16 -25
runs/Oct24_20-34-40_1f4e1c060daf/events.out.tfevents.1729802081.1f4e1c060daf.28754.7 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.4966
+- Loss: 3.3903
 ## Model description
@@ -39,37 +39,28 @@ The following hyperparameters were used during training:
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 16
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- num_epochs: 20
+- num_epochs: 16
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch   | Step | Validation Loss |
-|:-------------:|:-------:|:----:|:---------------:|
-| 4.6204        | 0.9231  | 3    | 4.1048          |
-| 4.3258        | 1.8462  | 6    | 3.8169          |
-| 3.9783        | 2.7692  | 9    | 3.5341          |
-| 2.7261        | 4.0     | 13   | 3.1687          |
-| 3.3397        | 4.9231  | 16   | 2.9220          |
-| 3.0432        | 5.8462  | 19   | 2.7075          |
-| 2.7883        | 6.7692  | 22   | 2.5078          |
-| 1.9298        | 8.0     | 26   | 2.2852          |
-| 2.3461        | 8.9231  | 29   | 2.1040          |
-| 2.1186        | 9.8462  | 32   | 1.9649          |
-| 1.9918        | 10.7692 | 35   | 1.8571          |
-| 1.3817        | 12.0    | 39   | 1.7531          |
-| 1.7729        | 12.9231 | 42   | 1.6817          |
-| 1.6647        | 13.8462 | 45   | 1.6200          |
-| 1.6129        | 14.7692 | 48   | 1.5709          |
-| 1.1762        | 16.0    | 52   | 1.5287          |
-| 1.5169        | 16.9231 | 55   | 1.5096          |
-| 1.4859        | 17.8462 | 58   | 1.4992          |
-| 1.0397        | 18.4615 | 60   | 1.4966          |
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 6.9703        | 0.6154 | 1    | 4.2307          |
+| 3.4299        | 1.8462 | 3    | 4.0954          |
+| 6.6315        | 2.4615 | 4    | 4.0002          |
+| 3.1735        | 3.6923 | 6    | 3.8260          |
+| 3.0824        | 4.9231 | 8    | 3.6824          |
+| 5.9421        | 5.5385 | 9    | 3.6204          |
+| 2.9008        | 6.7692 | 11   | 3.5164          |
+| 2.8262        | 8.0    | 13   | 3.4415          |
+| 5.5625        | 8.6154 | 14   | 3.4162          |
+| 2.334         | 9.8462 | 16   | 3.3903          |
 ### Framework versions
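In the new run the last logged epoch is 9.8462 even though num_epochs is 16, and the old run likewise stops at 18.4615 of 20. The sketch below reproduces both endpoints under two assumptions the card does not state: that one epoch is 13 micro-batches of train_batch_size 4 (inferred because Epoch = Step × gradient_accumulation_steps / 13 holds for every row of both tables), and that the trainer budgets max_steps = num_epochs × max(13 // gradient_accumulation_steps, 1), an assumed rule consistent with the Hugging Face Trainer's usual step accounting. A single training device is also assumed. BATCHES_PER_EPOCH, logged_epoch and run_shape are hypothetical names used only for illustration.

```python
import math

# Value from the model card above.
TRAIN_BATCH_SIZE = 4
# Assumption (not in the card): 13 micro-batches per epoch, inferred because
# Epoch == Step * gradient_accumulation_steps / 13 for every logged row.
BATCHES_PER_EPOCH = 13


def logged_epoch(step: int, grad_accum: int) -> float:
    """Fractional epoch after `step` optimizer steps, each consuming grad_accum micro-batches."""
    return round(step * grad_accum / BATCHES_PER_EPOCH, 4)


def run_shape(grad_accum: int, num_epochs: int) -> dict:
    """Sketch of one run's step budget under the assumed max_steps rule."""
    # Single device assumed: effective batch = per-device batch * accumulation steps.
    total_batch = TRAIN_BATCH_SIZE * grad_accum
    # Assumed rule: whole optimizer steps per epoch, floored, but at least 1.
    steps_per_epoch = max(BATCHES_PER_EPOCH // grad_accum, 1)
    max_steps = math.ceil(num_epochs * steps_per_epoch)
    return {
        "total_train_batch_size": total_batch,
        "max_steps": max_steps,
        "final_logged_epoch": logged_epoch(max_steps, grad_accum),
    }


old = run_shape(grad_accum=4, num_epochs=20)
new = run_shape(grad_accum=8, num_epochs=16)
print(old)  # {'total_train_batch_size': 16, 'max_steps': 60, 'final_logged_epoch': 18.4615}
print(new)  # {'total_train_batch_size': 32, 'max_steps': 16, 'final_logged_epoch': 9.8462}
```

Under these assumptions, raising gradient_accumulation_steps from 4 to 8 drops the floored steps per epoch from 3 to 1, so the 16-epoch run makes only 16 optimizer updates and sees about 9.8 epochs of data.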

runs/Oct24_20-34-40_1f4e1c060daf/events.out.tfevents.1729802081.1f4e1c060daf.28754.7 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f7ccc4bcd3168a9a3101920db9672b671185a9c462111d15faf1ce8774c40d75
+size 10660
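The three added lines are a Git LFS pointer, not the TensorBoard event file itself: the repository records only the pointer spec version, the SHA-256 of the real content, and its size in bytes (10660). A minimal sketch of parsing such a pointer and checking a downloaded blob against it follows; parse_lfs_pointer and matches_pointer are hypothetical helper names, and the pointer is assumed to be well-formed with one "key value" pair per line.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into its fields.

    Sketch: assumes a well-formed pointer with version, oid and size lines.
    """
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(blob: bytes, pointer: dict) -> bool:
    """True if blob has the pointer's byte size and SHA-256 digest."""
    return (
        pointer["algo"] == "sha256"
        and len(blob) == pointer["size"]
        and hashlib.sha256(blob).hexdigest() == pointer["digest"]
    )


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f7ccc4bcd3168a9a3101920db9672b671185a9c462111d15faf1ce8774c40d75
size 10660"""

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 10660

# Round-trip check on made-up bytes (a hypothetical stand-in for the real event file).
sample = b"example payload"
fake = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(sample).hexdigest()}\n"
    f"size {len(sample)}"
)
print(matches_pointer(sample, parse_lfs_pointer(fake)))  # True
```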

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c96ca61cdd35f9cd64d101a9f251c09920cbb29127ce95ca255189bdeedfa31a
+oid sha256:3df52654c2125c6cc3d3a176af21eb30f2ef1fc7e01e15ed33451cef4f6c43c0
 size 5176
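For training_args.bin only the oid line changes while size stays at 5176, which means the stored bytes differ but their length does not. A small sketch that compares two LFS pointers field by field makes this explicit; pointer_fields and changed_fields are hypothetical helper names, and well-formed "key value" pointer lines are assumed.

```python
def pointer_fields(text: str) -> dict:
    """Map each pointer key to its value, splitting every line on the first space."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())


def changed_fields(old: str, new: str) -> dict:
    """Return {key: (old_value, new_value)} for every pointer field that differs."""
    a, b = pointer_fields(old), pointer_fields(new)
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:c96ca61cdd35f9cd64d101a9f251c09920cbb29127ce95ca255189bdeedfa31a
size 5176"""

NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:3df52654c2125c6cc3d3a176af21eb30f2ef1fc7e01e15ed33451cef4f6c43c0
size 5176"""

diff = changed_fields(OLD, NEW)
print(sorted(diff))  # ['oid']
```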