RLVR Strategy Suggestions
#30
opened by TheOfficialAJ
I am trying to finetune the base variant for a table parsing task. I am also looking into outputting the tables in OTSL instead of HTML to save on tokens.
After normal finetuning, I want to experiment with RLVR to better enforce the structure of the table. I couldn't find the exact training strategy discussed in either the paper or the finetuning notebook.
Is it possible to get access to the RLVR pipeline?
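To make the question concrete, a structure check along these lines could serve as the verifiable reward. This is only a minimal sketch, not the pipeline from the paper. It assumes the model emits whitespace-separated OTSL tokens using the vocabulary from the OTSL paper (`C`, `L`, `U`, `X`, `NL`), and the function name `otsl_structure_reward` is hypothetical. If the tokenizer uses different tag names, the token set would need to be adapted.

```python
# Hypothetical structural reward for RLVR on OTSL table output.
# Assumption: tokens are whitespace-separated and use the OTSL paper's
# vocabulary: C (cell), L (left-looking), U (up-looking), X (cross), NL (row end).
CELL_TOKENS = {"C", "L", "U", "X"}


def otsl_structure_reward(output: str) -> float:
    """Return 1.0 if the token sequence is a well-formed OTSL grid, else 0.0."""
    tokens = output.split()
    # Every row, including the last one, must be terminated by NL.
    if not tokens or tokens[-1] != "NL":
        return 0.0

    rows, row = [], []
    for tok in tokens:
        if tok == "NL":
            rows.append(row)
            row = []
        elif tok in CELL_TOKENS:
            row.append(tok)
        else:
            return 0.0  # unknown token

    # Rectangular rule: all rows have the same, non-zero width.
    width = len(rows[0])
    if width == 0 or any(len(r) != width for r in rows):
        return 0.0

    for i, r in enumerate(rows):
        for j, tok in enumerate(r):
            left = r[j - 1] if j > 0 else None
            up = rows[i - 1][j] if i > 0 else None
            # A missing neighbour (None) fails these checks, which also
            # enforces the first-row and first-column rules.
            if tok == "L" and left not in ("L", "C"):
                return 0.0
            if tok == "U" and up not in ("U", "C"):
                return 0.0
            if tok == "X" and (left not in ("X", "U") or up not in ("X", "L")):
                return 0.0
    return 1.0


print(otsl_structure_reward("C L NL U X NL"))  # 2x2 merged cell
print(otsl_structure_reward("C C NL C NL"))    # ragged rows
```

A binary reward like this only checks syntax, so in practice it would probably have to be combined with a content or similarity term against the ground-truth table. Is that close to what was used?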