- FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
  Paper • 2410.13925 • Published • 24
- BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
  Paper • 2410.14672 • Published • 8
- Scalable Ranked Preference Optimization for Text-to-Image Generation
  Paper • 2410.18013 • Published • 15
- DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
  Paper • 2410.18666 • Published • 19
Collections
Collections including paper arxiv:2410.22366

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41

- FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
  Paper • 2403.06775 • Published • 5
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Paper • 2010.11929 • Published • 15
- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
  Paper • 2110.07040 • Published • 2
- A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
  Paper • 1811.00056 • Published • 2

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 34
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 27
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22

- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 19
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 22
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 36
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
  Paper • 2404.07987 • Published • 48

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 84
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25