I tested sd-flow-alpha and found that while it does work, it has a severe middle gray (aka latent zero) bias, much worse than base sd15. This led me to believe it hadn't fully learned the RF objective yet, and more finetuning was needed.
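For context, "latent zero" maps to middle gray because decoding an all-zeros latent through the sd15 VAE gives a roughly uniform gray image. A minimal sketch to see this for yourself, using diffusers:

```python
import torch
from diffusers import AutoencoderKL

# Load the SD1.5 VAE and decode an all-zeros latent.
vae = AutoencoderKL.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="vae"
)
with torch.no_grad():
    latent = torch.zeros(1, 4, 64, 64)  # 512x512 image -> 4x64x64 latent
    image = vae.decode(latent).sample   # latent scaling is moot for zeros

# Pixel values land near 0 in [-1, 1], i.e. roughly middle gray.
print(image.mean().item(), image.std().item())
```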
What I did:
- mashed together some training code from scripts I had lying around
- full parameter unet finetune, no special tricks (the core of the RF training step is sketched after this list)
- used the biggest dataset I had on hand (~2m images of commoncatalog-cc-by)
- batch size 16, lr 8e-6, 8k steps (128k images sampled, probably no repeats, about 8h)
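For reference, the core of an RF training step looks roughly like this. This is a minimal sketch of the objective, not the exact contents of train_sd.py: x_t is a straight-line interpolation between data and noise, and the target is the constant velocity noise - x0.

```python
import torch
import torch.nn.functional as F

def rf_training_step(unet, latents, text_emb):
    """One rectified-flow training step for an SD1.5-style UNet."""
    noise = torch.randn_like(latents)
    # Sample t uniformly in [0, 1], one per batch element.
    t = torch.rand(latents.shape[0], device=latents.device)
    t_ = t.view(-1, 1, 1, 1)
    # Straight-line interpolation: x_t = (1 - t) * x0 + t * noise.
    noisy = (1.0 - t_) * latents + t_ * noise
    # The RF target is the constant velocity along that line.
    target = noise - latents
    # SD1.5's UNet expects timesteps in [0, 1000), so scale t up.
    pred = unet(noisy, t * 1000, encoder_hidden_states=text_emb).sample
    return F.mse_loss(pred.float(), target.float())
```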
The brute force approach definitely worked: the 8k checkpoint has no trouble creating dark images, bright images, high contrast, and sometimes even solid colors (although it's not perfect at that). Additionally, the finetuned model tends to obey bright/dark prompts better than base sd15, although it still has the CLIP color bleed issue.
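If you want to reproduce that kind of test, here's a minimal sketch using diffusers. It assumes the uploaded diffusers checkpoint ships with a scheduler config suited to RF sampling; if not, you'd need to swap in a flow-matching scheduler yourself:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "spacepxl/sd15-flow-alpha-finetune", torch_dtype=torch.float16
).to("cuda")

# Prompts that a latent-zero-biased model typically struggles with.
for prompt in ["a pitch black room, almost total darkness",
               "a blinding white snowfield at noon",
               "a solid red background"]:
    image = pipe(prompt, guidance_scale=7.0).images[0]
    image.save(prompt[:20].replace(" ", "_") + ".png")
```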
Downsides: because I trained all parameters indiscriminately, some of the base model's knowledge has been partially forgotten. The dataset I used is fairly general, but nowhere near as diverse as LAION-5b.
It now also tends to default to rich, high-contrast images (especially at high CFG), so if you want something else you'll have to prompt for it.
I've uploaded two checkpoints in both diffusers and merged checkpoint formats, as well as a difference-merged checkpoint with RV5.1, which works quite well.
The example comfyui workflow includes the nodes for difference merging with other checkpoints if you want to try that.
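If you'd rather do the add-difference merge outside ComfyUI, it's just a few lines over the state dicts. Minimal sketch; the filenames are placeholders for wherever your checkpoints live:

```python
from safetensors.torch import load_file, save_file

base = load_file("v1-5-pruned-emaonly.safetensors")            # placeholder path
finetune = load_file("sd15-flow-alpha-finetune.safetensors")   # placeholder path
other = load_file("realisticVisionV51.safetensors")            # placeholder path

# Add-difference merge: other + (finetune - base) transplants the
# RF finetune onto another sd15-based checkpoint.
merged = {}
for key, base_w in base.items():
    if key in finetune and key in other:
        merged[key] = other[key] + (finetune[key] - base_w)
    else:
        merged[key] = finetune.get(key, base_w)

save_file(merged, "rv51-flow-merge.safetensors")
```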
I've included my train_sd.py as well. All the defaults are what I used for the training run.
Model: spacepxl/sd15-flow-alpha-finetune
Base model: stable-diffusion-v1-5/stable-diffusion-v1-5