Reward model for PPO

Task description

In this homework assignment, a reward model was trained by fine-tuning SmolLM-135M-Instruct as a sequence classifier on the Human-Like-DPO-Dataset.
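
Below is a minimal training sketch, not the exact homework code: pairwise reward-model training with a Bradley-Terry loss, where the chosen answer of each preference pair should receive a higher scalar reward than the rejected one. The base-checkpoint name and the padding fallback are assumptions.

import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE_MODEL = "HuggingFaceTB/SmolLM-135M-Instruct"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fall back to EOS if no pad token is set

# A single output logit is used as the scalar reward.
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

def pairwise_loss(chosen_texts, rejected_texts):
    # Bradley-Terry objective: push r(chosen) above r(rejected).
    chosen = tokenizer(chosen_texts, padding=True, truncation=True, return_tensors="pt")
    rejected = tokenizer(rejected_texts, padding=True, truncation=True, return_tensors="pt")
    r_chosen = model(**chosen).logits.squeeze(-1)
    r_rejected = model(**rejected).logits.squeeze(-1)
    return -F.logsigmoid(r_chosen - r_rejected).mean()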

Loading example

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda")
REWARD_MODEL_REPO_NAME = "MurDanya/llm-course-hw2-reward-model"

tokenizer = AutoTokenizer.from_pretrained(REWARD_MODEL_REPO_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(REWARD_MODEL_REPO_NAME).to(device)
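
A hedged usage sketch (not part of the original card): scoring a single prompt/response pair with the loaded model, assuming the tokenizer ships a chat template and the reward is read from the first classification logit.

# Example conversation to score; the formatting via the chat template is an assumption.
messages = [
    {"role": "user", "content": "How are you today?"},
    {"role": "assistant", "content": "I'm doing great, thanks for asking!"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    # Interpret the first logit as the scalar reward.
    reward = reward_model(**inputs).logits[0, 0].item()
print(f"reward = {reward:.4f}")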