---
license: apache-2.0
language:
  - en
tags:
  - video
  - video-generation
  - video-to-video
  - controlnet
  - diffusers
  - wan2.2
---
# Controlnet for Wan2.2 (tile)

This repo contains the controlnet module for Wan2.2. See the <a href="https://github.com/TheDenk/wan2.2-controlnet">GitHub code</a>.  
It uses the same approach as the controlnet for [Wan2.1](https://github.com/TheDenk/wan2.1-dilated-controlnet).

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/nf0_13795_uaVEOuKodOK.mp4"></video>

### For ComfyUI
Use the cool [ComfyUI-WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper).  

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/xMkt_cPrf4aWipPKxfVIy.png)

### Inference examples
#### Simple inference with cli
```bash
python -m inference.cli_demo \
    --video_path "resources/bubble.mp4" \
    --prompt "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color." \
    --controlnet_type "tile" \
    --base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --controlnet_model_path TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1
```
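The `tile` condition is simply a degraded copy of the source video: each frame is downscaled, blurred at low resolution, and upscaled back. Here is a minimal sketch of that preprocessing with a hypothetical helper `make_tile_condition` (the same steps appear in the full example below; the downscale factor and kernel size are illustrative defaults, not fixed requirements):

```python
import cv2
import numpy as np
from PIL import Image

def make_tile_condition(frame: Image.Image, width: int, height: int,
                        downscale: int = 4, ksize: int = 5) -> Image.Image:
    # Downscale, blur at low resolution, then upscale back so the
    # controlnet receives a "tile"-style degraded copy of the frame.
    small = np.array(frame.resize((width // downscale, height // downscale)))
    blurred = cv2.GaussianBlur(small, (ksize, ksize), sigmaX=ksize // 2)
    return Image.fromarray(blurred).resize((width, height))
```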
#### Minimal code example
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import cv2
import numpy as np
from PIL import Image
import torch
from diffusers.utils import load_video, export_to_video
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler

from wan_controlnet import WanControlnet
from wan_transformer import CustomWanTransformer3DModel
from wan_t2v_controlnet_pipeline import WanTextToVideoControlnetPipeline

base_model_path = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
controlnet_model_path = "TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1"
vae = AutoencoderKLWan.from_pretrained(base_model_path, subfolder="vae", torch_dtype=torch.float32)
transformer = CustomWanTransformer3DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=torch.bfloat16)
controlnet = WanControlnet.from_pretrained(controlnet_model_path, torch_dtype=torch.bfloat16)
pipe = WanTextToVideoControlnetPipeline.from_pretrained(
    pretrained_model_name_or_path=base_model_path,
    controlnet=controlnet,
    transformer=transformer,
    vae=vae, 
    torch_dtype=torch.bfloat16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
pipe.enable_model_cpu_offload()

img_h = 704       # or 480
img_w = 1280      # or 832
num_frames = 121  # or 81 / 49

def apply_gaussian_blur(image, ksize=5, sigmaX=1.0):
    image_np = np.array(image)
    # cv2.GaussianBlur requires an odd kernel size.
    if ksize % 2 == 0:
        ksize += 1
    blurred_image = cv2.GaussianBlur(image_np, (ksize, ksize), sigmaX=sigmaX)
    return Image.fromarray(blurred_image)

video_path = 'bubble.mp4'
video_frames = load_video(video_path)[:num_frames]

# Build the "tile" condition: downscale, blur, then upscale back,
# so the controlnet sees a degraded copy of the source video.
ksize = 5
downscale_coef = 4
controlnet_frames = [x.resize((img_w // downscale_coef, img_h // downscale_coef)) for x in video_frames]
controlnet_frames = [apply_gaussian_blur(x, ksize=ksize, sigmaX=ksize // 2) for x in controlnet_frames]
controlnet_frames = [x.resize((img_w, img_h)) for x in controlnet_frames]

prompt = "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color."
negative_prompt = "bad quality, worst quality"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=img_h,
    width=img_w,
    num_frames=num_frames,
    guidance_scale=5,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pil",

    # Controlnet conditioning parameters
    controlnet_frames=controlnet_frames,
    controlnet_guidance_start=0.0,
    controlnet_guidance_end=0.8,
    controlnet_weight=0.8,

    teacache_treshold=0.6,  # TeaCache threshold (parameter name as defined in the pipeline)
).frames[0]

export_to_video(output, "output.mp4", fps=16)
```
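For quicker test runs you can shrink the resolution and frame count. A lighter configuration, assuming the commented alternatives above are the intended presets:

```python
# Lighter settings for a quick test (values taken from the commented
# alternatives in the example above; treat this as a sketch).
img_h, img_w, num_frames = 480, 832, 49
```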
## Acknowledgements
Original code and models: [Wan2.2](https://github.com/Wan-Video/Wan2.2).  


## Citations
```bibtex
@misc{TheDenk,
    title={Wan2.2 Controlnet},
    author={Karachev Denis},
    url={https://github.com/TheDenk/wan2.2-controlnet},
    publisher={Github},
    year={2025}
}
```

## Contacts
<p>Issues should be raised directly in the repository. For professional support and recommendations, please contact <a>[email protected]</a>.</p>