Hugging Face
zeliang0426/RM-Think
Tags: Text Generation, Transformers, Safetensors, llama_adapter, Generated from Trainer, unsloth, grpo, trl, conversational, custom_code, arxiv:2402.03300
Commit History (branch: main)
Training in progress, step 120 | 9b97f19 | verified | zeliang0426 committed on Aug 16
Training in progress, step 110 | 143f427 | verified | zeliang0426 committed on Aug 16
Training in progress, step 100 | 35cada9 | verified | zeliang0426 committed on Aug 16
Training in progress, step 90 | 66a1217 | verified | zeliang0426 committed on Aug 16
Training in progress, step 80 | a4617c0 | verified | zeliang0426 committed on Aug 16
Training in progress, step 70 | 7bde851 | verified | zeliang0426 committed on Aug 16
Training in progress, step 60 | 8dc7655 | verified | zeliang0426 committed on Aug 16
Training in progress, step 50 | ccc8f7c | verified | zeliang0426 committed on Aug 16
Training in progress, step 40 | 9f87d77 | verified | zeliang0426 committed on Aug 16
Training in progress, step 30 | 3caded3 | verified | zeliang0426 committed on Aug 16
Training in progress, step 20 | 4789525 | verified | zeliang0426 committed on Aug 16
Training in progress, step 10 | 58b044a | verified | zeliang0426 committed on Aug 16
initial commit | 3135783 | verified | zeliang0426 committed on Aug 15