Process Reward Models that Think -- https://arxiv.org/abs/2504.16828

AI & ML interests: Factuality, reasoning, alignment, LLM applications

Spaces: 5

ExpertLongBench 🚀 (Running, 2 likes): Leaderboard for ExpertLongBench
ManyICLBench 🚀 (Sleeping, 1 like): Leaderboard for ManyICLBench
FactRBench 🏆 (Running): View and analyze long-form factuality leaderboard
MLRC-BENCH 📊 (Running): Display model performance rankings
Factbench 📈 (Running, 3 likes): View and compare language model factuality scores

Datasets: 12

launch/ExpertLongBench (Preview): 116 downloads, 10 likes
launch/thinkprm-1K-verification-cots (Viewer): 1k rows, 53 downloads, 6 likes
launch/ManyICLBench (Viewer): 66 rows, 167 downloads, 1 like
launch/CMV (Viewer): 133 rows, 21 downloads
launch/FactRBench (Viewer): 1.06k rows, 26 downloads, 1 like
launch/FactBench (Viewer): 1k rows, 40 downloads, 3 likes
launch/CLASH (Viewer): 345 rows, 20 downloads, 2 likes
launch/gov_report (Viewer): 58.4k rows, 124 downloads, 7 likes
launch/gov_report_qs (Viewer): 7.87k rows, 74 downloads, 3 likes
launch/open_question_type (Viewer): 4.96k rows, 47 downloads, 5 likes
