Collections
Discover the best community collections!
Collections including paper arxiv:2405.11143

- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 39
- Pandora: Towards General World Model with Natural Language Actions and Video States
  Paper • 2406.09455 • Published • 16
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 15
- In-Context Editing: Learning Knowledge from Self-Induced Distributions
  Paper • 2406.11194 • Published • 18

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 71
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 10
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 62
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
  Paper • 2407.00617 • Published • 7

- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 71
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 29
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 30
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 41

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41