RLVR Strategy Suggestions

#30
by TheOfficialAJ - opened

I am trying to finetune the base variant for a table-parsing task. I am also looking into outputting tables in OTSL instead of HTML to save tokens.

After standard finetuning, I want to experiment with RLVR (reinforcement learning with verifiable rewards) to better enforce table structure. However, I couldn't find the exact training strategy discussed in either the paper or the finetuning notebook.
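For reference, here is a minimal sketch of the kind of verifiable structure reward such a setup might use. It is not the authors' RLVR pipeline, which is undocumented. It makes three assumptions. First, the output uses Docling-style OTSL tokens (`<fcel>`, `<ecel>`, `<lcel>`, `<ucel>`, `<xcel>`, `<nl>`), and any other tags (for example header tokens) are ignored. Second, every row must end with `<nl>`. Third, only a subset of OTSL's local syntax rules is checked: the grid must be rectangular, a left-looking cell needs a new cell or another left-looking cell to its left, an up-looking cell needs a new cell or another up-looking cell above it, and a cross cell needs an up-looking or cross cell to its left and a left-looking or cross cell above it. All function names are hypothetical.

```python
import re

# Assumption: Docling-style OTSL, where the "new cell" token is split into
# full (fcel) and empty (ecel) cells. Other tags are ignored by this regex.
TOKEN_RE = re.compile(r"<(fcel|ecel|lcel|ucel|xcel|nl)>")
NEW_CELLS = {"fcel", "ecel"}


def parse_otsl_grid(text):
    """Return rows of structure tokens (cell text dropped), or None if malformed."""
    rows, current = [], []
    for match in TOKEN_RE.finditer(text):
        tok = match.group(1)
        if tok == "nl":
            rows.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # strict: cells after the last <nl> count as malformed
        return None
    # The grid must be non-empty and rectangular.
    if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
        return None
    return rows


def is_valid_otsl(rows):
    """Local merge-consistency checks (a subset of OTSL's syntax rules)."""
    for r, row in enumerate(rows):
        for c, tok in enumerate(row):
            left = row[c - 1] if c > 0 else None
            up = rows[r - 1][c] if r > 0 else None
            if tok == "lcel" and left not in NEW_CELLS | {"lcel"}:
                return False
            if tok == "ucel" and up not in NEW_CELLS | {"ucel"}:
                return False
            if tok == "xcel" and (left not in {"ucel", "xcel"}
                                  or up not in {"lcel", "xcel"}):
                return False
    return True


def otsl_structure_reward(completion, reference=None):
    """Binary reward: 1.0 if the structure is valid and, when a reference is
    given, its token grid matches the reference exactly (fcel vs. ecel
    included); otherwise 0.0."""
    rows = parse_otsl_grid(completion)
    if rows is None or not is_valid_otsl(rows):
        return 0.0
    if reference is not None:
        return 1.0 if parse_otsl_grid(reference) == rows else 0.0
    return 1.0


if __name__ == "__main__":
    simple = "<fcel>Name<fcel>Qty<nl><fcel>Apples<fcel>3<nl>"
    spanning = "<fcel>A<lcel><nl><ucel><xcel><nl>"  # one 2x2 merged cell
    print(otsl_structure_reward(simple))    # 1.0
    print(otsl_structure_reward(spanning))  # 1.0
    print(otsl_structure_reward("<ucel>x<fcel>y<nl>"))  # 0.0, ucel in first row
```

A binary reward like this keeps verification cheap and unambiguous. A softer variant, such as partial credit for matching the row and column counts, could be layered on top and wrapped into whatever batched reward-function interface the RL library in use expects.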

Is it possible to get access to the RLVR pipeline?
