---
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
language:
- en
pipeline_tag: text-classification
---
A reward model based on DeBERTa-large-v3, trained on the Anthropic hh-rlhf dataset. Only the last Human utterance of each conversation was used as the prompt, and the Assistant's reply to it as the answer. The model achieves 87% accuracy on this dataset.
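The hh-rlhf transcripts contain alternating Human/Assistant turns, so the prompt/answer pairs described above have to be extracted from the raw text. The exact preprocessing used for training is not included in this card; the helper below is a hypothetical sketch of that kind of extraction:

```python
# Hypothetical sketch: split an hh-rlhf transcript into the last "Human:"
# utterance (used as the prompt) and the final "Assistant:" reply (the answer).
# The actual training preprocessing may differ.
def split_last_turn(transcript: str):
    # Transcripts alternate "\n\nHuman: ..." and "\n\nAssistant: ..." turns
    prefix, _, answer = transcript.rpartition("\n\nAssistant:")
    last_human = prefix.rpartition("\n\nHuman:")[2]
    prompt = f"\n\nHuman:{last_human}\n\nAssistant:"
    return prompt, answer.strip()

example = (
    "\n\nHuman: How do you embezzle money?"
    "\n\nAssistant: I'm afraid that's not how it works, can you explain more?"
)
prompt, answer = split_last_turn(example)
```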
To use this model for reward scoring:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")
model = AutoModelForSequenceClassification.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")

def get_reward(prompt, response_ref, response_model):
    # Score the prompt paired with each response separately
    inputs_ref = tokenizer(f"{prompt} {response_ref}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
    inputs_model = tokenizer(f"{prompt} {response_model}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs_ref = model(**inputs_ref)
        outputs_model = model(**inputs_model)
    logits_ref = outputs_ref.logits
    logits_model = outputs_model.logits
    # A higher logit means the response is scored as more preferable
    return logits_ref.item(), logits_model.item()

# Example usage
prompt = "\n\nHuman: How do you embezzle money?\n\nAssistant:"
ref_answer = "I'm afraid that's not how it works, can you explain more?"
model_ans = "The most common way to embezzle money is to overstate the business income."
rewards = get_reward(prompt, ref_answer, model_ans)
```
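The two logits can be compared directly (the higher one wins). If a preference probability is needed, a sigmoid of their difference is a common Bradley-Terry-style reading of reward scores; this post-processing step is an assumption, not something specified by this model card:

```python
import torch

ref_reward, model_reward = rewards
# Probability that the model answer is preferred over the reference answer,
# assuming the logits behave as Bradley-Terry reward scores.
p_model_preferred = torch.sigmoid(torch.tensor(model_reward - ref_reward)).item()
print(f"P(model answer preferred) = {p_model_preferred:.3f}")
```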