Update README.md
README.md CHANGED
````diff
@@ -20,6 +20,7 @@ with the K-wise maximum likelihood estimator proposed in [this paper](https://ar
 less harmful will get the highest reward score. Note that since the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) is based on GPT-4 preference, the reward model is likely to be biased
 towards GPT-4's own preference, including longer responses and certain response formats.
 
+For more detailed discussions, please check out our [blog post](https://starling.cs.berkeley.edu), and stay tuned for our upcoming code and paper!
 
 
 - **Developed by:** Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu and Jiantao Jiao.
@@ -35,16 +36,15 @@ towards GPT-4's own preference, including longer responses and certain response
 - **Blog:** https://starling.cs.berkeley.edu/
 - **Paper:** Coming soon!
 - **Code:** Coming soon!
-
+
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-Please use the following code for
+Please use the following code for inference with the reward model.
 
 ```
-
-
-
+## Define the reward model function class
+Test.
 ```
 
 
````
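The inference snippet added by this commit is still a placeholder (`Test.`), so below is a minimal sketch of what scoring a (prompt, response) pair with the reward model could look like. Everything in it is an assumption rather than the official code: the repo id `berkeley-nest/Starling-RM-7B-alpha`, loading via `AutoModelForSequenceClassification`, and the `get_reward` helper are illustrative, and the actual release may ship a custom reward-model class instead.

```python
# Hypothetical sketch only: the official Starling inference code was still
# "coming soon" at the time of this commit. The repo id and the
# sequence-classification loading path below are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "berkeley-nest/Starling-RM-7B-alpha"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=1,  # a reward model emits a single scalar score
    torch_dtype=torch.float16,
)
model.eval()


def get_reward(prompt: str, response: str) -> float:
    """Return a scalar reward; higher = more helpful and less harmful."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = model(**inputs).logits.squeeze()  # logits shape: (1, 1)
    return score.item()


# Example: rank two candidate responses to the same prompt.
prompt = "How can I improve my sleep schedule?"
for resp in ["Go to bed at a consistent time every night.", "Just don't sleep."]:
    print(round(get_reward(prompt, resp), 3), resp)
```

Keep in mind the caveat from the diff above: because the Nectar preferences are GPT-4-based, scores may be biased toward GPT-4's own preferences, such as longer responses and particular response formats.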