https://github.com/huggingface/diffusers/pull/12207
I can't do much beyond this at this point. A couple of things are still very unclear.
 
					
	
Phenomenal post!
It might help future readers to add some background on the CUDA programming model itself, so that it is clear where each component (SMs, thread blocks, registers, etc.) sits in the hierarchy.
 
		torch.compile 
					
		2 ** search_round) and repeat 1 - 3. 
					
diffusers 🧨 bitsandbytes as the official backend, but using others like torchao is already very simple. enable_model_cpu_offload()
					
	
		torch.compile() them.  
					
	
For from_single_file loading affected by the Runway SD 1.5 issue:

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single file checkpoints in the following way should still work.

from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")

Otherwise, pass a config explicitly, because runwayml/stable-diffusion-v1-5 doesn't exist anymore.

from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")
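
The two loading calls above can be combined into one helper that tries the default config lookup first and only falls back to an explicit config repo if that fails. This is a minimal sketch, not something from the original post: the helper name load_with_fallback and its loader parameter are hypothetical, and catching a broad Exception is an assumption, since the exact error raised when the reference repo is missing is not stated here.

```python
from typing import Any, Callable, Optional


def load_with_fallback(
    checkpoint: str,
    fallback_config: str,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a single-file checkpoint, retrying with an explicit config repo.

    `loader` is a hypothetical hook that defaults to
    StableDiffusionPipeline.from_single_file; it exists so the control
    flow can be exercised without downloading anything.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without diffusers installed.
        from diffusers import StableDiffusionPipeline

        loader = StableDiffusionPipeline.from_single_file

    try:
        # First attempt: let the library infer the config on its own
        # (works when the reference repo is available, e.g. in the local HF cache).
        return loader(checkpoint)
    except Exception:
        # Assumption: any failure here is treated as a missing reference repo.
        # Second attempt: point at an alternative repo for the config.
        return loader(checkpoint, config=fallback_config)
```

A call would look like load_with_fallback("<url or path to single file checkpoint>", "Lykon/DreamShaper"). In real use you would probably want to narrow the except clause once you know which error the missing repo produces, so that unrelated failures are not silently retried.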
					