roberta-bias-reward-model

This model is a fine-tuned version of FacebookAI/roberta-base. The training dataset is not specified. At the final epoch (30), it achieves the following results on the evaluation set:

Loss: 0.0150
MSE: 0.0150

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 128
eval_batch_size: 128
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 30
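
The linear scheduler setting can be illustrated with a short sketch. It assumes zero warmup steps and a decay to zero over the 1950 total optimizer steps (30 epochs of 65 steps, as reported in the training results table). Neither the warmup nor the decay endpoint is stated in this card. `linear_lr` and `train_set_size_bounds` are hypothetical helpers written for illustration, not part of any library.

```python
# Values taken from the hyperparameter list in this card.
BASE_LR = 1e-4
NUM_EPOCHS = 30
TRAIN_BATCH_SIZE = 128

# Taken from the training results table: 65 optimizer steps per epoch.
STEPS_PER_EPOCH = 65
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # matches the final Step value in the table

# ASSUMPTION: no warmup; the card does not list any warmup steps.
WARMUP_STEPS = 0


def linear_lr(step):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


def train_set_size_bounds():
    """Smallest and largest training-set sizes that give 65 steps per epoch.

    ASSUMPTION: single device, no gradient accumulation, last partial batch kept.
    """
    low = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
    high = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
    return low, high


if __name__ == "__main__":
    for step in (0, 975, 1950):
        print(step, linear_lr(step))
    print(train_set_size_bounds())
```

Under these assumptions the learning rate starts at 0.0001, is roughly halved at the midpoint of training, and reaches zero at step 1950. The step count also suggests a training set of roughly 8,200 to 8,300 examples, though that estimate breaks down if multiple devices or gradient accumulation were used.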

Training Loss	Epoch	Step	Validation Loss	MSE
0.0218	1.0	65	0.0157	0.0157
0.0157	2.0	130	0.0130	0.0130
0.0121	3.0	195	0.0120	0.0120
0.0112	4.0	260	0.0195	0.0195
0.0096	5.0	325	0.0111	0.0111
0.0077	6.0	390	0.0126	0.0126
0.008	7.0	455	0.0127	0.0127
0.0054	8.0	520	0.0160	0.0160
0.0047	9.0	585	0.0182	0.0182
0.0047	10.0	650	0.0144	0.0144
0.0035	11.0	715	0.0124	0.0124
0.0036	12.0	780	0.0142	0.0142
0.003	13.0	845	0.0158	0.0158
0.0034	14.0	910	0.0130	0.0130
0.0027	15.0	975	0.0175	0.0175
0.0031	16.0	1040	0.0160	0.0160
0.0024	17.0	1105	0.0149	0.0149
0.002	18.0	1170	0.0159	0.0159
0.0022	19.0	1235	0.0139	0.0139
0.0021	20.0	1300	0.0150	0.0150
0.002	21.0	1365	0.0143	0.0143
0.0018	22.0	1430	0.0146	0.0146
0.0018	23.0	1495	0.0169	0.0169
0.0018	24.0	1560	0.0158	0.0158
0.0015	25.0	1625	0.0161	0.0161
0.0015	26.0	1690	0.0149	0.0149
0.0015	27.0	1755	0.0145	0.0145
0.0014	28.0	1820	0.0153	0.0153
0.0015	29.0	1885	0.0152	0.0152
0.0015	30.0	1950	0.0150	0.0150
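
To read the table at a glance, the sketch below picks the epoch with the lowest validation MSE and converts the best and final values to RMSE. The numbers are copied from the table above. `VAL_MSE` and `best_epoch` are illustrative names, not part of the training code.

```python
import math

# Validation MSE per epoch (epochs 1 to 30), copied from the table.
VAL_MSE = [
    0.0157, 0.0130, 0.0120, 0.0195, 0.0111, 0.0126, 0.0127, 0.0160, 0.0182, 0.0144,
    0.0124, 0.0142, 0.0158, 0.0130, 0.0175, 0.0160, 0.0149, 0.0159, 0.0139, 0.0150,
    0.0143, 0.0146, 0.0169, 0.0158, 0.0161, 0.0149, 0.0145, 0.0153, 0.0152, 0.0150,
]


def best_epoch(val_mse):
    """Return (1-based epoch, MSE) for the lowest validation MSE."""
    idx = min(range(len(val_mse)), key=val_mse.__getitem__)
    return idx + 1, val_mse[idx]


epoch, mse = best_epoch(VAL_MSE)
rmse_best = math.sqrt(mse)           # error in the same units as the target
rmse_final = math.sqrt(VAL_MSE[-1])  # RMSE at the final epoch

if __name__ == "__main__":
    print(f"best epoch: {epoch}, MSE: {mse:.4f}, RMSE: {rmse_best:.4f}")
    print(f"final epoch: {len(VAL_MSE)}, MSE: {VAL_MSE[-1]:.4f}, RMSE: {rmse_final:.4f}")
```

The lowest validation MSE occurs at epoch 5, while training loss keeps falling through epoch 30. One possible reading is that the later epochs overfit, so an earlier checkpoint might generalize better if one was saved. The card does not say which checkpoint was published.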

Safetensors

Model size: 0.1B params
Tensor type: F32
