Co-rewarding is a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views.

- TMLR-Group-HF/Co-rewarding-RephrasedMATH (dataset)
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-3B-MATH (text generation, 3B)
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-7B-MATH (text generation, 8B)
- TMLR-Group-HF/Co-rewarding-I-Qwen3-1.7B-Base-MATH (text generation, 2B)