Update README.md
README.md CHANGED
````diff
@@ -20,6 +20,7 @@ with the K-wise maximum likelihood estimator proposed in [this paper](https://ar
 less harmful will get the highest reward score. Note that since the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) is based on GPT-4 preference, the reward model is likely to be biased
 towards GPT-4's own preference, including longer responses and certain response formats.
 
+For more detailed discussions, please check out our [blog post](https://starling.cs.berkeley.edu), and stay tuned for our upcoming code and paper!
 
 
 - **Developed by:** Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu and Jiantao Jiao.
@@ -35,16 +36,15 @@ towards GPT-4's own preference, including longer responses and certain response
 - **Blog:** https://starling.cs.berkeley.edu/
 - **Paper:** Coming soon!
 - **Code:** Coming soon!
-
+
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-Please use the following code for
+Please use the following code for inference with the reward model.
 
 ```
-
-
-
+## Define the reward model function class
+Test.
 ```
 
 
````
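The inference snippet added by this commit is still a placeholder (`Test.`), so below is a minimal sketch of what scoring a (prompt, response) pair with the reward model could look like. Everything in it is an assumption rather than the official code: the repo id `berkeley-nest/Starling-RM-7B-alpha`, loading via `AutoModelForSequenceClassification`, and the `get_reward` helper are illustrative, and the actual release may ship a custom reward-model class instead.

```python
# Hypothetical sketch only: the official Starling inference code was still
# "coming soon" at the time of this commit. The repo id and the
# sequence-classification loading path below are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "berkeley-nest/Starling-RM-7B-alpha"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=1,  # a reward model emits a single scalar score
    torch_dtype=torch.float16,
)
model.eval()


def get_reward(prompt: str, response: str) -> float:
    """Return a scalar reward; higher = more helpful and less harmful."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = model(**inputs).logits.squeeze()  # logits shape: (1, 1)
    return score.item()


# Example: rank two candidate responses to the same prompt.
prompt = "How can I improve my sleep schedule?"
for resp in ["Go to bed at a consistent time every night.", "Just don't sleep."]:
    print(round(get_reward(prompt, resp), 3), resp)
```

Keep in mind the caveat from the diff above: because the Nectar preferences are GPT-4-based, scores may be biased toward GPT-4's own preferences, such as longer responses and particular response formats.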