gary109's Collections
			 
		
			
		RLHF
		
			
 
				
				
	
	
	
			
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 11
			
 
	
	 
	
	
	
			
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 25
			
 
	
	 
	
	
	
			
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Paper • 2309.13041 • Published • 9
			
 
	
	 
	
	
	
			
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 11
			
 
	
	 
	
	
	
			
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18
			
 
	
	 
	
	
	
			
JaxMARL: Multi-Agent RL Environments in JAX
Paper • 2311.10090 • Published • 8
			
 
	
	 
	
	
	
			
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper • 2312.00849 • Published • 12
			
 
	
	 
	
	
	
			
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 12
			
 
	
	 
	
	
	
			
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper • 2402.11450 • Published • 22
			
 
	
	 
	
	
	
		
Paper • 2403.03954 • Published • 14
			
 
	
	 
	
	
	
			
Dataset Reset Policy Optimization for RLHF
Paper • 2404.08495 • Published • 9
			
 
	
	 
	
	
	
			
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment
Paper • 2406.15193 • Published • 15
			
 
	
	 
	
	
	
			
WARP: On the Benefits of Weight Averaged Rewarded Policies
Paper • 2406.16768 • Published • 23
			
 
	
	 
	
	
	
			
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
Paper • 2408.08441 • Published • 8
			
 
	
	 
	
	
	
			
Reward-Robust RLHF in LLMs
Paper • 2409.15360 • Published • 6