Ovi for Rapid: sound consideration?
@Phr00t have you seen the WAN 2.2 Video+Audio Ovi model?
https://aaxwaz.github.io/Ovi/
https://github.com/character-ai/Ovi
The model is still only 24GB, and I can only imagine the magic you could add to it. Mega is very good, but imagine it with sound! I hope it is VACE-like, allowing extended generation and/or ControlNets.
It's still early days before ComfyUI support is added, but this is exciting.
Eehhhhhhhh, it is based on the WAN 5B model, which isn't that great.
Wan 2.5 will have sound, so it is just a waiting game.
But WAN 2.5 is not open source, so I don't see how that is relevant...
It's a real loss for local creators if WAN 2.5 remains closed. Ovi could still be useful for generating a baseline video and then maybe doing V2V through Rapid with the sound merged back in. Or S2V could work better than it currently does and plug in Rapid.
A lot becomes possible when NVFP4 models come out, but it will require an upgrade to the 50xx series. So much more could be accomplished with limited VRAM; it's the only sensible way to quantize WAN 2.5 for consumer cards. We just have to hope WAN 2.5 becomes open source, and someone will have to make the NVFP4 quantization. If all of that happens, decent movies could be made with 5090s. (It will still take significant effort to keep characters and environments consistent, and months of work.)
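For anyone wondering what NVFP4 actually buys: it stores weights as 4-bit floats with a shared scale per small block, which is how it halves memory versus FP8 while still handling outliers. Here is a rough NumPy sketch of that block-scaling idea; it is purely illustrative (real NVFP4 uses FP8 E4M3 block scales plus a global FP32 scale, and this is nobody's shipping quantizer):

import numpy as np

# The eight non-negative magnitudes an E2M1 4-bit float can represent.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_nvfp4(weights, block=16):
    """Quantize-dequantize a flat weight vector in 16-element blocks."""
    w = weights.reshape(-1, block)
    # Per-block scale so the largest magnitude maps to E2M1's max (6.0).
    scales = np.abs(w).max(axis=1, keepdims=True) / 6.0
    scales[scales == 0.0] = 1.0
    mags = np.abs(w) / scales
    # Round each magnitude to the nearest representable E2M1 value.
    idx = np.abs(mags[..., None] - E2M1).argmin(axis=-1)
    return (np.sign(w) * E2M1[idx] * scales).reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096).astype(np.float32)
print("mean abs error:", np.abs(w - fake_nvfp4(w)).mean())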
Wan 2.5 will be released as a watered-down open-source version, like the other Wan versions. The "full" version would not run on a consumer PC anyway.
That's obvious; that's why people are asking about the Ovi model...
After some issues, this ComfyUI Ovi repo is working, at least in ComfyUI desktop mode. (There is another one, "RH_Ovi"; stay away from that one.)
https://github.com/snicolast/ComfyUI-Ovi
It is also available through ComfyUI Manager; search "-ovi".
KJ is working on his version. He has an ovi branch, but I haven't tried it yet.
Even with only 16GB VRAM (the stated requirement is 24GB+), I'm still able to generate an audio+video clip, sometimes in as little as 6 minutes.
Be sure to have this WAN text encoder: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/umt5-xxl-enc-fp8_e4m3fn.safetensors
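If you prefer scripting the download, something like this should work with huggingface_hub; the local_dir below is just an example, so point it at wherever your ComfyUI install keeps text encoders:

from huggingface_hub import hf_hub_download

# Fetch the fp8 UMT5-XXL encoder from Kijai's repo.
hf_hub_download(
    repo_id="Kijai/WanVideo_comfy",
    filename="umt5-xxl-enc-fp8_e4m3fn.safetensors",
    local_dir="ComfyUI/models/text_encoders",  # example path, adjust to your install
)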
This uses a Rapid starting frame and then Ovi I2V (done manually, using only the default workflows for each). Imagine putting the quality of Rapid into Ovi.
The Kijai nodes execute Ovi faster, even with 50 Euler steps, bf16 Ovi models, and fp32 audio, while using another ~12GB of RAM for offloading (so you definitely need 16GB VRAM + 64GB RAM minimum; total RAM usage hits 62GB). I like how the models are loaded explicitly instead of hidden behind an "engine" node. It also exposes the latent space, so a possible extended-generation loop can be built.
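To make that extended-loop idea concrete, here is a hypothetical Python sketch. None of these function names are real WanVideoWrapper APIs, and the latent shapes are made up; the stubs only stand in for whatever sampler nodes the ovi branch exposes:

import numpy as np

def sample_segment(start_latent, prompt):
    # Placeholder for the actual Ovi sampler node: returns fake
    # (video_latents, audio_latents) so the loop structure is runnable.
    rng = np.random.default_rng(hash(prompt) % 2**32)
    return rng.normal(size=(21, 16, 60, 104)), rng.normal(size=(1, 512))

def tail_latents(video_latents, n=4):
    # Carry the last few latent frames of one segment into the next,
    # so each new clip starts where the previous one ended.
    return video_latents[-n:]

segments = []
start = None
for prompt in ["she opens the door", "she steps into the rain"]:
    video, audio = sample_segment(start, prompt)
    segments.append((video, audio))
    start = tail_latents(video)
print("chained", len(segments), "segments")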
Kijai's Ovi support is not in main as of last night, so you have to manually switch to the ovi branch:
MINGW64 .../ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper (main)
$ git fetch origin
remote: Enumerating objects: 135, done.
remote: Counting objects: 100% (135/135), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 135 (delta 78), reused 107 (delta 50), pack-reused 0 (from 0)
Receiving objects: 100% (135/135), 128.27 KiB | 2.25 MiB/s, done.
Resolving deltas: 100% (78/78), completed with 9 local objects.
From https://github.com/kijai/ComfyUI-WanVideoWrapper
1d94082..6d2ff33 main -> origin/main
* [new branch] ovi -> origin/ovi
MINGW64 .../ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper (main)
$ git switch ovi
branch 'ovi' set up to track 'origin/ovi'.
Switched to a new branch 'ovi'
As someone with a 5070, constantly feeling like nothing truly works or utilises the Blackwell architecture properly, I can't wait for proper NVFP4 models to arrive! I just hope the 12GB of VRAM on the 5070 doesn't negate most of the benefits...