andysalerno/openchat-nectar-0.5

andysalerno committed on Jan 16, 2024

Commit 2e3fef2 · verified · 1 Parent(s): ba3caf5

Update README.md

Files changed (1)

README.md +18 -1


@@ -14,4 +14,21 @@ Data selection and filtering:
 - I filtered out any examples with "good_natured == False". Why? When I inspected examples with "good_natured == False" in the Nectar dataset, I noticed they frequently include refusals from even the top ranking model. So, counter-intuitively, including "bad natured" entries might actually censor the model *more*, since the top responses (as ranked by GPT-4) to these queries tend to be refusals. Not to mention, the quality of the conversations that are "bad natured" tends to be worse in general, in my own opinion.
 Differences from 0.4:
-- Trained on 5000 steps instead of 500, with a lower learning rate and slower warmup period.

+- Trained on 5000 steps instead of 500, with a lower learning rate and slower warmup period.
+Summary of versions:
+**[openchat-nectar-0.1](https://huggingface.co/andysalerno/openchat-nectar-0.1)**
+- 200 steps, no filtering on Nectar dataset, 5e-5 learning rate
+**[openchat-nectar-0.2](https://huggingface.co/andysalerno/openchat-nectar-0.2)**
+- empty repo, failed training. ignore it
+**[openchat-nectar-0.3](https://huggingface.co/andysalerno/openchat-nectar-0.3)**
+- 500 steps, no filtering on Nectar dataset, 5e-5 learning rate (same as 0.1 but with more steps)
+**[openchat-nectar-0.4](https://huggingface.co/andysalerno/openchat-nectar-0.4)**
+- 500 steps, filtered dataset to only include multi-chat-turn examples, used 4th ranking response as the "rejected" instead of 3rd, filtered out "good_natured=False", 5e-5 learning rate
+**[openchat-nectar-0.5](https://huggingface.co/andysalerno/openchat-nectar-0.5)**
+- 5000 steps (over a full epoch), filtered dataset to only include multi-chat-turn examples, used 4th ranking response as the "rejected" instead of 3rd, filtered out "good_natured=False", 5e-6 learning rate. Same as 0.4 but with 10x the steps, and 1/10th the learning rate
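
The data selection described for 0.4 and 0.5 could be sketched roughly as follows. This is not the author's training script, which the commit does not show. The field names (`prompt`, `answers`, `rank`, `answer`, `turns`, `good_natured`) are assumptions about the Nectar schema, `to_preference_pair` is a hypothetical helper, and using the rank-1 response as "chosen" is also an assumption, since the README only states which rank is used as "rejected".

```python
from typing import Optional


def to_preference_pair(example: dict, rejected_rank: int = 4) -> Optional[dict]:
    """Return a chosen/rejected pair, or None if the example is filtered out.

    Field names are assumed, not confirmed by the README.
    """
    # Drop examples flagged as not good natured.
    if not example.get("good_natured", False):
        return None
    # Keep only multi-chat-turn examples (assumes a `turns` count field).
    if example.get("turns", 1) < 2:
        return None
    # Index responses by rank; ranks may be stored as floats, so cast to int.
    by_rank = {int(a["rank"]): a["answer"] for a in example["answers"]}
    if 1 not in by_rank or rejected_rank not in by_rank:
        return None
    return {
        "prompt": example["prompt"],
        "chosen": by_rank[1],                # assumed: top-ranked response
        "rejected": by_rank[rejected_rank],  # 4th-ranked, per the README
    }


# Small made-up example, for illustration only.
sample = {
    "prompt": "Human: hi\n\nAssistant: hello\n\nHuman: how are you?\n\nAssistant: ",
    "turns": 2,
    "good_natured": True,
    "answers": [{"rank": float(r), "answer": f"response {r}"} for r in range(1, 8)],
}
pair = to_preference_pair(sample)
```

Passing `rejected_rank=3` would correspond to the "3rd" setting that the 0.4 entry says it replaced, under the same assumptions.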