ComfyUI cannot load the VAE
I tried both lighttaew2_1.safetensors and the .pth version; neither can be loaded with LoadVAE. Could you fix this? This feels like a revolutionary VAE, really impressive: compressing 8 GB down to a few tens of MB is amazing.
Thank you so much. ComfyUI support would be fantastic. Right now Wan VAE decode consumes a huge amount of VRAM and is slow; when I saw the numbers I was shocked that it could be optimized to this degree.
It's not a metadata issue. I looked at the source code and can offer a small pointer; that's about all the help I can give.
`"decoder.middle.0.residual.0.gamma"` is the key check,
and for Wan 2.2 the state dict must also contain `"decoder.upsamples.0.upsamples.0.residual.2.weight"`.
Source location: https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sd.py#L274
elif "decoder.middle.0.residual.0.gamma" in sd:
    if "decoder.upsamples.0.upsamples.0.residual.2.weight" in sd:  # Wan 2.2 VAE
        self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 16, 16)
        self.upscale_index_formula = (4, 16, 16)
        self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 16, 16)
        self.downscale_index_formula = (4, 16, 16)
        self.latent_dim = 3
        self.latent_channels = 48
        ddconfig = {"dim": 160, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
        self.first_stage_model = comfy.ldm.wan.vae2_2.WanVAE(**ddconfig)
        self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
        self.memory_used_encode = lambda shape, dtype: 3300 * shape[3] * shape[4] * model_management.dtype_size(dtype)
        self.memory_used_decode = lambda shape, dtype: 8000 * shape[3] * shape[4] * (16 * 16) * model_management.dtype_size(dtype)
    else:  # Wan 2.1 VAE
        self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 8, 8)
        self.upscale_index_formula = (4, 8, 8)
        self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 8, 8)
        self.downscale_index_formula = (4, 8, 8)
        self.latent_dim = 3
        self.latent_channels = 16
        ddconfig = {"dim": 96, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
        self.first_stage_model = comfy.ldm.wan.vae.WanVAE(**ddconfig)
        self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
        self.memory_used_encode = lambda shape, dtype: 6000 * shape[3] * shape[4] * model_management.dtype_size(dtype)
        self.memory_used_decode = lambda shape, dtype: 7000 * shape[3] * shape[4] * (8 * 8) * model_management.dtype_size(dtype)
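The branch above keys entirely off two state-dict entries, which is why a checkpoint missing them falls through to another loader. A minimal standalone sketch of the same detection logic, handy for checking which branch (if any) ComfyUI would take for a given checkpoint; the `safe_open` part is only a comment because it needs the actual file:

```python
def detect_wan_vae(sd_keys) -> "str | None":
    """Mirror of the key checks in comfy/sd.py for the Wan VAE branches."""
    keys = set(sd_keys)
    if "decoder.middle.0.residual.0.gamma" not in keys:
        return None  # not recognized as a Wan VAE at all
    if "decoder.upsamples.0.upsamples.0.residual.2.weight" in keys:
        return "wan2.2"  # 16x16 spatial compression, 48 latent channels
    return "wan2.1"      # 8x8 spatial compression, 16 latent channels

# To inspect a real checkpoint (requires the `safetensors` package):
# from safetensors import safe_open
# with safe_open("lighttaew2_1.safetensors", framework="pt") as f:
#     print(detect_wan_vae(f.keys()))

print(detect_wan_vae({"decoder.middle.0.residual.0.gamma"}))  # wan2.1
print(detect_wan_vae({"some.other.key"}))                     # None
```

If `detect_wan_vae` returns `None` for the lighttaew checkpoint, its keys simply don't match either Wan branch, which would explain the LoadVAE failure.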
custom_nodes\ComfyUI-LightVAE module for custom nodes: No module named 'lightx2v.models.video_encoders'
The 10 GB figure is a bit odd; you can `watch nvidia-smi` to check whether the reported number is accurate. If it's a precision issue, and you are only doing video reconstruction (not involving Wan), you can change `need_scaled` to `False` here: https://github.com/ModelTC/ComfyUI-LightVAE/blob/ca7a73965963bf966b405af401482c8f5ad470d9/nodes.py#L88, then see how the results look.
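To sanity-check the nvidia-smi reading against what ComfyUI itself budgets for decode, the `memory_used_decode` formula from the sd.py excerpt above can be evaluated directly. A sketch; the latent shape (80x45 latent for a 1280x720 clip at 16x spatial compression, bf16 weights) is an assumption for illustration:

```python
def wan22_decode_budget_bytes(latent_h: int, latent_w: int, dtype_bytes: int = 2) -> int:
    """memory_used_decode for the Wan 2.2 VAE in comfy/sd.py:
    8000 * shape[3] * shape[4] * (16 * 16) * dtype_size."""
    return 8000 * latent_h * latent_w * (16 * 16) * dtype_bytes

# 1280x720 output -> 80x45 latent grid; bf16 is 2 bytes per element
budget = wan22_decode_budget_bytes(45, 80)
print(f"{budget / 1024**3:.1f} GiB")  # 13.7 GiB
```

So for a single 720p frame batch ComfyUI already budgets on the order of 13 to 14 GiB for the stock Wan 2.2 decode, which puts the 10 GB observation in a plausible range.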
It may be a sageAttention2 problem; I found it while testing your VAE. After removing sageAttention2, decode speed is back to normal (a few seconds, as it originally was). After changing `need_scaled` to `False`, the image colors are correct.
I tried several high-resolution images; at 4K I can't see any difference. The original taew I ran had artifacts, e.g. spots appearing on faces, but yours has no such issues. Thanks a lot, impressive work!
No module named 'lightx2v.models.video_encoders'
As a temporary workaround, you can copy the two missing video_encoders directories from the original lightx2v project into place.
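After copying the directories over, a quick way to check whether the missing submodule is now visible to Python without restarting ComfyUI repeatedly (a minimal sketch; `lightx2v.models.video_encoders` is the module named in the error above):

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be resolved, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when a parent package is itself missing.
        return False

if __name__ == "__main__":
    # Should print True once the copied directories are in the right place.
    print(module_available("lightx2v.models.video_encoders"))
```

Run it with the same Python interpreter that launches ComfyUI, since custom-node imports resolve against that environment.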

