Commit 61a8927
1 Parent(s): dbfbc30
Update README.md
README.md CHANGED
@@ -7,7 +7,8 @@ language:
 pipeline_tag: text-classification
 ---
 
-
+A reward model trained on deberta-large-v3 using Anthropic-hh dataset. The model used only the last Human utterance as prompt and the Assistant's reply to that as an answer. It achieves an accuracy of 87% on this dataset.
+
 To use this model for reward scoring:
 
 ```python