bubbliiiing committed on
Commit · 5155fc5
1
Parent(s): 3a7324b
Update 2602 for gray control
README.md
CHANGED
@@ -8,40 +8,46 @@ library_name: videox_fun
[](https://github.com/aigc-apps/VideoX-Fun)

## Update
--
--
--

## Model Card

-###
| Name | Description |
|--|--|
-| Z-Image-Turbo-Fun-Controlnet-Union-2.1-2601-8steps.safetensors | Compared to the old version
-| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-2601-8steps.safetensors | Compared to the old version
-| Z-Image-Turbo-Fun-Controlnet-Union-2.1-lite-2601-8steps.safetensors |
-| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-lite-2601-8steps.safetensors |

-###
| Name | Description |
|--|--|
-| Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors |
-| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors | A Tile model trained on high-definition datasets
-| Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors | A retrained model
-| Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors | ControlNet weights for Z-Image-Turbo. Compared to version 1.0,

## Model Features
-- This ControlNet is
-- Inpainting mode is also supported.
-- Training Process:
-- 2.0:
-- 2.1:
-- 2.1-8-steps:
-- Note on Steps:
-- 2.0 and 2.1:
-- 2.1-8-steps:
--
--

## Results
### a. Difference between 2.1-8steps and 2.1-2601-8steps.
@@ -88,7 +94,7 @@ The old 8-steps model sometimes learned the mask information and tended to compl

### c. Generation Results With 2.1-lite-2601-8steps

-

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
@@ -234,6 +240,19 @@ Uses the same training scheme as the 2601 version, but compared to the large ver
</tr>
</table>

## Inference
Go to the VideoX-Fun repository for more details.

[](https://github.com/aigc-apps/VideoX-Fun)

## Update
- **[2026.02.26]** Updated to version 2602, adding support for Gray Control.
- **[2026.01.12]** Updated to version 2601, adding support for Scribble Control. Added lite models (1.9GB, 5 layers). Retrained the Control and Tile models with more varied masks, an improved training schedule, and multi-resolution control images (512–1536) to fix mask-pattern leakage and artifacts at large `control_context_scale` values.
- **[2025.12.22]** Performed 8-step distillation on v2.1 to restore the acceleration lost when ControlNet is applied. Uploaded a Tile model for super-resolution.
- **[2025.12.17]** Fixed a typo in v2.0 (`control_layers` was used instead of `control_noise_refiner`) that caused a double forward pass and slow inference. Speed is restored in v2.1.
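
To see why the 2025.12.17 typo slowed inference, the toy below counts block calls in one denoising step. It is a hypothetical sketch, not VideoX-Fun code: only the attribute names `control_layers` and `control_noise_refiner` come from the note above, the block counts (15 and 2) are borrowed from the Model Features section, and the `ToyControlNet` forward logic is an assumption. It assumes the refiner stage should iterate `control_noise_refiner`; pointing it at `control_layers` instead makes those blocks run twice.

```python
# Hypothetical toy: counts block forward calls in one denoising step.
# Only the names `control_layers` and `control_noise_refiner` come from the
# changelog; block counts and forward logic are illustrative assumptions.


class CountingBlock:
    """Stand-in for a transformer block that records each forward call."""

    def __init__(self, counter: dict, name: str) -> None:
        self.counter = counter
        self.name = name

    def __call__(self, x: float) -> float:
        self.counter[self.name] = self.counter.get(self.name, 0) + 1
        return x  # identity: only the call count matters here


class ToyControlNet:
    def __init__(self, typo: bool, n_layers: int = 15, n_refiner: int = 2) -> None:
        self.calls: dict = {}
        self.typo = typo
        self.control_layers = [CountingBlock(self.calls, "control_layers") for _ in range(n_layers)]
        self.control_noise_refiner = [
            CountingBlock(self.calls, "control_noise_refiner") for _ in range(n_refiner)
        ]

    def forward(self, x: float) -> float:
        # Refiner stage: the buggy version iterated the wrong module list here.
        refiner = self.control_layers if self.typo else self.control_noise_refiner
        for block in refiner:
            x = block(x)
        # Main stage: the control layer blocks run once, as intended.
        for block in self.control_layers:
            x = block(x)
        return x


def block_calls(typo: bool) -> dict:
    model = ToyControlNet(typo=typo)
    model.forward(0.0)
    return model.calls


print(block_calls(typo=True))   # {'control_layers': 30}
print(block_calls(typo=False))  # {'control_noise_refiner': 2, 'control_layers': 15}
```

In this toy the buggy path runs 30 control-block forwards per step instead of 17, which matches the "double forward pass" description in spirit; the real cost difference depends on the actual block sizes.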

## Model Card
### a. 2602 Models
| Name | Description |
|--|--|
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-2602-8steps.safetensors | Supports multiple control conditions (Canny, Depth, Pose, MLSD, Hed, Scribble, and Gray). |
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-lite-2602-8steps.safetensors | Same training scheme as the 2601 version, but with control applied to fewer layers. Supports multiple control conditions (Canny, Depth, Pose, MLSD, Hed, Scribble, and Gray). |

### b. 2601 Models
| Name | Description |
|--|--|
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-2601-8steps.safetensors | Compared to the old version, this model uses more diverse masks, a more reasonable training schedule, and multi-resolution control images (512–1536) instead of a single resolution (512). This reduces artifacts and mask-information leakage while improving robustness. Supports multiple control conditions (Canny, Depth, Pose, MLSD, Hed, and Scribble). |
| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-2601-8steps.safetensors | Compared to the old version, this model uses a higher training resolution and a more refined distillation schedule, reducing bright spots and artifacts. |
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-lite-2601-8steps.safetensors | Same training scheme as the 2601 version, but with control applied to fewer layers, resulting in weaker control. This allows larger `control_context_scale` values with more natural results and makes the model better suited to lower-spec machines. Supports multiple control conditions (Canny, Depth, Pose, MLSD, Hed, and Scribble). |
| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-lite-2601-8steps.safetensors | Same training scheme as the 2601 version, but with control applied to fewer layers, resulting in weaker control. This allows larger `control_context_scale` values with more natural results and makes the model better suited to lower-spec machines. |

### c. Models Before 2601
| Name | Description |
|--|--|
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors | Distilled from version 2.1 using an 8-step distillation algorithm. Compared to version 2.1, 8-step inference yields clearer images with more reasonable composition. Supports Canny, Depth, Pose, MLSD, and Hed. |
| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors | A Tile model for super-resolution, trained on high-definition datasets (up to 2048×2048) and distilled with an 8-step algorithm. 8-step inference is recommended. |
| Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors | A retrained model that fixes the typo in version 2.0, giving faster per-step inference. Supports Canny, Depth, Pose, MLSD, and Hed. However, like version 2.0, it lost some acceleration capability during training and requires more steps and CFG (classifier-free guidance). |
| Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors | ControlNet weights for Z-Image-Turbo. Compared to version 1.0, more layers are modified and training runs longer. However, a code typo caused the layer blocks to run their forward pass twice, slowing inference. Supports Canny, Depth, Pose, MLSD, and Hed. Some acceleration capability was lost during training, so more steps are required. |

## Model Features
- This ControlNet is applied to 15 layer blocks and 2 refiner layer blocks (Lite models: 3 layer blocks and 2 refiner layer blocks). It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, plus Scribble from the 2601 models onward and Gray in the 2602 models.
- Inpainting mode is also supported. For inpainting, use a larger `control_context_scale` for better image continuity.
- **Training Process:**
  - **2.0:** Trained from scratch for 70,000 steps on 1M high-quality images (general and human-centric content) at 1328 resolution with BFloat16 precision, batch size 64, learning rate 2e-5, and a text dropout ratio of 0.10.
  - **2.1:** After the typo fix, training continued from the 2.0 weights for 11,000 additional steps with the same parameters and dataset.
  - **2.1-8-steps:** Distilled from version 2.1 with an 8-step distillation algorithm for 5,500 steps.
- **Note on Steps:**
  - **2.0 and 2.1:** Higher `control_context_scale` values may require more inference steps for better results, likely because the control model has not been distilled.
  - **2.1-8-steps:** Use 8 steps for inference.
- Adjust `control_context_scale` (optimal range: 0.65–1.00); larger values give stronger control and better detail preservation. A detailed prompt is highly recommended for stability.
- In versions 2.0 and 2.1, applying ControlNet to Z-Image-Turbo caused a loss of acceleration capability and blurry images. For details on strength and step-count tests, refer to [Scale Test Results](#scale-test-results) (generated with version 2.0).
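
The 2.0 and 2.1 training settings above imply the sample counts below. This is plain arithmetic under two assumptions: each step consumes exactly one batch of 64 images, and 2.1 kept batch size 64 ("same parameters"). The 5,500 distillation steps are left out because their batch size is not stated, and epoch counts are approximate because the dataset size is given only as 1M.

```python
# Back-of-the-envelope sample counts from the training settings listed above.
# Assumption: one optimizer step consumes exactly one batch of 64 images,
# and 2.1 reused the same batch size ("same parameters").

BATCH_SIZE = 64
DATASET_SIZE = 1_000_000  # "1M high-quality images" (approximate)

STAGES = {
    "2.0": 70_000,  # steps trained from scratch
    "2.1": 11_000,  # additional steps continued from the 2.0 weights
}


def samples_seen(steps: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of training images processed over `steps` steps."""
    return steps * batch_size


def approx_epochs(steps: int) -> float:
    """Passes over the dataset, assuming it has exactly DATASET_SIZE images."""
    return samples_seen(steps) / DATASET_SIZE


total_steps = sum(STAGES.values())
for name, steps in STAGES.items():
    print(f"{name}: {samples_seen(steps):,} images, ~{approx_epochs(steps):.2f} epochs")
print(f"2.0 + 2.1: {samples_seen(total_steps):,} images, ~{approx_epochs(total_steps):.2f} epochs")
# 2.0: 4,480,000 images, ~4.48 epochs
# 2.1: 704,000 images, ~0.70 epochs
# 2.0 + 2.1: 5,184,000 images, ~5.18 epochs
```

Under these assumptions, 2.0 saw each image about 4.5 times on average, and the 2.1 continuation added a bit less than one more pass.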

## Results
### a. Difference between 2.1-8steps and 2.1-2601-8steps.

### c. Generation Results With 2.1-lite-2601-8steps

This model shares the training scheme of the 2601 version but applies control to fewer layers, which weakens the control signal. It therefore tolerates larger `control_context_scale` values with more natural results, and it is better suited to lower-spec machines.
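
Why fewer controlled layers means weaker control can be illustrated with a toy ControlNet-style injection in which each controlled block adds `control_context_scale` times a control residual to the hidden state. This is a generic sketch for intuition, not VideoX-Fun's implementation: the scalar hidden state, the identical unit residual per block, and the `inject_control` helper are all hypothetical; only the block counts (15 vs 3 layer blocks) come from the Model Features section.

```python
# Toy ControlNet-style residual injection, for intuition only (hypothetical).
# Assumptions: a scalar hidden state, an identity base block, and the same
# control residual at every controlled block. Real models use tensors and
# learned, block-specific residuals, so only the trend carries over.

FULL_CONTROLLED_BLOCKS = 15  # "applied to 15 layer blocks" (refiner blocks ignored)
LITE_CONTROLLED_BLOCKS = 3   # "Lite models: 3 layer blocks"


def inject_control(num_controlled_blocks: int, control_context_scale: float,
                   residual: float = 1.0, hidden: float = 0.0) -> float:
    """Add control_context_scale * residual once per controlled block."""
    for _ in range(num_controlled_blocks):
        hidden = hidden + control_context_scale * residual
    return hidden


# Compare the total injected control at both ends of the recommended 0.65-1.00 range.
for scale in (0.65, 1.00):
    full = inject_control(FULL_CONTROLLED_BLOCKS, scale)
    lite = inject_control(LITE_CONTROLLED_BLOCKS, scale)
    print(f"scale={scale:.2f}: full={full:.2f}, lite={lite:.2f}")
```

At the same scale the toy lite model injects one fifth of the control signal, which is consistent with lite models tolerating larger `control_context_scale` values; the real ratio depends on the learned residuals.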

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>

</tr>
</table>

### e. Gray Control Results with 2602 Models

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Low Resolution</td>
<td>High Resolution</td>
</tr>
<tr>
<td><img src="asset/gray.jpg" width="100%" /></td>
<td><img src="results/gray.png" width="100%" /></td>
</tr>
</table>
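
How a gray control image is prepared is not documented here. One plausible preprocessing step, assumed rather than taken from VideoX-Fun, is to convert an RGB reference to single-channel luma, optionally at reduced resolution as in the "Low Resolution" column above. The sketch uses the ITU-R BT.601 weights that Pillow documents for `convert("L")` and a simple box downsample; `to_gray` and `box_downsample` are hypothetical helpers operating on nested lists.

```python
# Hypothetical preprocessing for a gray control image (not VideoX-Fun's code).
# Images are nested lists: rows of (R, G, B) tuples with 0-255 integer channels.

from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to 8-bit luma with ITU-R BT.601 weights."""
    # Integer arithmetic; adding 500 before // 1000 rounds to the nearest value.
    return [[(r * 299 + g * 587 + b * 114 + 500) // 1000 for (r, g, b) in row]
            for row in image]


def box_downsample(gray: List[List[int]], factor: int) -> List[List[int]]:
    """Average non-overlapping factor x factor blocks; edge pixels that don't fit are dropped."""
    height = len(gray) // factor
    width = len(gray[0]) // factor if gray else 0
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            block = [gray[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out


pure_colors = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = to_gray(pure_colors)
print(gray)                     # [[76, 150], [29, 255]]
print(box_downsample(gray, 2))  # [[128]]
```

In practice an image library would handle decoding and resizing; the pure-Python version only makes the luma formula and the downsampling step explicit.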

## Inference
See the [VideoX-Fun repository](https://github.com/aigc-apps/VideoX-Fun) for inference details.

Z-Image-Turbo-Fun-Controlnet-Union-2.1-2602-8steps.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1251cc7bc3486bc61d25c3be498ef394c31c85ddf4ee9137d2e933411f4a689
size 6712485600

Z-Image-Turbo-Fun-Controlnet-Union-2.1-lite-2602-8steps.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ea098db9bd145be525c7e2366920b6d76c5ffd46b3d7aa8169bbc943fdaee35
size 2016627488
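
Each added `.safetensors` entry above is a Git LFS pointer rather than the weights themselves: the `oid` line carries the SHA-256 of the real file and `size` its length in bytes. The sketch below parses such a pointer and checks a downloaded file against it; `parse_lfs_pointer`, `verify_file`, and the local path in the commented call are hypothetical, while the `key value` line format follows the Git LFS pointer specification.

```python
# Sketch: parse a Git LFS pointer and verify a downloaded file against it.
# Function names and the example path are hypothetical.

import hashlib
from pathlib import Path

POINTER_2602_LITE = """version https://git-lfs.github.com/spec/v1
oid sha256:3ea098db9bd145be525c7e2366920b6d76c5ffd46b3d7aa8169bbc943fdaee35
size 2016627488
"""


def parse_lfs_pointer(text: str) -> dict:
    """Return {'version', 'oid', 'size'} from a pointer made of 'key value' lines."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_file(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check the size first (cheap), then stream the file through SHA-256."""
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_2602_LITE)
print(f"{pointer['size'] / 2**30:.2f} GiB")  # 1.88 GiB
# verify_file(Path("Z-Image-Turbo-Fun-Controlnet-Union-2.1-lite-2602-8steps.safetensors"), pointer)
```

The sizes also offer a quick sanity check on the model card: 2,016,627,488 bytes is about 1.88 GiB, in line with the "1.9GB" quoted for the lite models, while the full Union 2602 file (6,712,485,600 bytes) is about 6.25 GiB.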

asset/gray.jpg
ADDED

results/gray.png
ADDED