End of training

Browse files
- README.md +21 -21
- adapter_model.bin +1 -1
README.md CHANGED

@@ -22,8 +22,8 @@ base_model: mistralai/Mistral-7B-v0.1
 model_type: MistralForCausalLM
 tokenizer_type: LlamaTokenizer
 
-load_in_8bit:
-load_in_4bit:
+load_in_8bit: true
+load_in_4bit: false
 strict: false
 
 data_seed: 42

@@ -38,7 +38,7 @@ output_dir: ./outputs/mistral/lora-out-templatefree
 hub_model_id: strickvl/isafpr-mistral-lora-templatefree
 
 
-sequence_len:
+sequence_len: 2048
 sample_packing: true
 pad_to_sequence_len: true
 

@@ -110,7 +110,7 @@ special_tokens:
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.0288
 
 ## Model description
 

@@ -147,23 +147,23 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 1.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 1.5339 | 0.0131 | 1 | 1.5408 |
+| 0.0671 | 0.2492 | 19 | 0.0549 |
+| 0.037 | 0.4984 | 38 | 0.0406 |
+| 0.0424 | 0.7475 | 57 | 0.0361 |
+| 0.035 | 0.9967 | 76 | 0.0351 |
+| 0.0322 | 1.2295 | 95 | 0.0336 |
+| 0.0247 | 1.4787 | 114 | 0.0314 |
+| 0.0229 | 1.7279 | 133 | 0.0313 |
+| 0.0241 | 1.9770 | 152 | 0.0299 |
+| 0.0222 | 2.2098 | 171 | 0.0307 |
+| 0.0183 | 2.4590 | 190 | 0.0296 |
+| 0.0205 | 2.7082 | 209 | 0.0291 |
+| 0.0153 | 2.9574 | 228 | 0.0281 |
+| 0.0162 | 3.1902 | 247 | 0.0286 |
+| 0.0126 | 3.4393 | 266 | 0.0290 |
+| 0.0147 | 3.6885 | 285 | 0.0287 |
+| 0.0157 | 3.9377 | 304 | 0.0288 |
 
 
 ### Framework versions
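As a quick sanity check on the updated model card, the validation-loss column of the table can be re-read directly: the reported card loss (0.0288) is the *final* eval loss at step 304, while the *best* checkpoint by eval loss is slightly lower, at step 228. A minimal sketch (the pairs below are copied from the table above):

```python
# (step, validation_loss) pairs taken from the training table in the README
eval_history = [
    (1, 1.5408), (19, 0.0549), (38, 0.0406), (57, 0.0361),
    (76, 0.0351), (95, 0.0336), (114, 0.0314), (133, 0.0313),
    (152, 0.0299), (171, 0.0307), (190, 0.0296), (209, 0.0291),
    (228, 0.0281), (247, 0.0286), (266, 0.0290), (285, 0.0287),
    (304, 0.0288),
]

# The card reports the loss of the last evaluation, not the minimum
final_step, final_loss = eval_history[-1]
best_step, best_loss = min(eval_history, key=lambda p: p[1])

print(final_loss)            # 0.0288 — the "Loss" value in the card
print(best_step, best_loss)  # 228 0.0281 — lowest eval loss seen
```

This is only a reading aid for the table; it makes no claim about which checkpoint was actually pushed.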
|
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1a9f2c96b8754c87ccd86910df8f4514f74e07548f3f863e6c1c99422fe65200
 size 335706186
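The adapter_model.bin entry is not the weights themselves but a Git LFS pointer file (spec v1: `version`, `oid`, `size` keys, one per line). As an illustration, such a pointer can be parsed with a few lines of Python; the pointer text below is the new side of the diff above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1a9f2c96b8754c87ccd86910df8f4514f74e07548f3f863e6c1c99422fe65200
size 335706186
"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 335706186 — adapter size in bytes (~335 MB)
print(info["oid"])   # sha256:<digest> of the actual adapter blob
```

The changed sha256 digest is what marks the adapter weights as updated in this commit; the byte size is unchanged, as expected for a retrained adapter of identical architecture.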