Update README.md
README.md CHANGED
@@ -45,7 +45,7 @@ An adapter to reproduce the likeness of the legendary Symbolist/Modernist Russia
This version of our Blok LoRA was the product of an experimental training run aimed at transferring face/attribute features from historical photos with minimal compute time, using high-rank training and a relatively high learning rate, but only a minimal number of steps. <br>
This is the version trained at rank 128 (linear_dims+alpha), an lr of 0.0005, batch size 2 (with a dataset of only 12 images, but at 3 resolutions: 512, 768, 1024), minimalist descriptive captions with a dropout of 0.09 (9%), the adamw8bit optimizer, and only 50 steps (!), with no warmups. <br>

- All in all, we consider the experiment fairly successful, and exceptionally demonstrative of the unprecedentedly absorbent
+ All in all, we consider the experiment fairly successful, and exceptionally demonstrative of the unprecedentedly absorbent learning capacity not just of FLUX models, but of DiT-based models more broadly. <br>
We will soon reproduce this experiment on one of the homebrew de-distilled versions of FLUX, and see whether fast learning improves or diminishes without the extra steering from distilled guidance during fine-tuning. <br>
And we are most curious about whether the re-introduction of distilled guidance during inference still zeroes in even on low-step learning. <br>
Higher-step learning with de-distilled FLUX models has so far demonstrated broader potential than any other technique. <br>
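
For readers who want the recipe at a glance, the hyperparameters described in the diff above can be collected into a single configuration sketch. This is only an illustration of the settings named in the text, assuming a generic LoRA trainer; the key names below are hypothetical and do not correspond to any particular trainer's config schema.

```python
# Illustrative only: the training settings named in the commit above,
# gathered into one hypothetical config dict (key names are not taken
# from any specific trainer's schema).
blok_lora_experiment = {
    "network_rank": 128,               # "rank128 (linear_dims+alpha)"
    "network_alpha": 128,              # alpha assumed equal to rank
    "learning_rate": 5e-4,             # relatively high for a LoRA
    "optimizer": "adamw8bit",
    "batch_size": 2,
    "num_training_images": 12,         # small dataset
    "resolutions": [512, 768, 1024],   # each image trained at 3 resolutions
    "caption_style": "minimalist descriptive",
    "caption_dropout_rate": 0.09,      # 9% caption dropout
    "max_train_steps": 50,             # deliberately very low
    "lr_warmup_steps": 0,              # no warmup
}
```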