Wan 2.2 Image-to-Video (I2V-A14B) - GGUF Models (FP16 + Q4_K_S)
This repository contains GGUF quantized versions of the Wan 2.2 Image-to-Video A14B model, optimized for efficient inference with reduced VRAM requirements while maintaining high-quality video generation capabilities.
Model Description
Wan 2.2 is an advanced large-scale video generative model that uses a Mixture-of-Experts (MoE) architecture specifically designed for image-to-video synthesis. The A14B variant features a dual-expert design with approximately 14 billion parameters per expert:
- High-Noise Expert: Optimized for early denoising stages, focusing on overall layout and composition
- Low-Noise Expert: Specialized for later denoising stages, refining video details and quality
The model generates videos at 480P and 720P resolutions from static images, with support for text-guided prompts to control the generation process. Wan 2.2 incorporates meticulously curated aesthetic data with detailed labels for lighting, composition, contrast, and color tone, enabling precise cinematic-style video generation.
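For intuition, here is a minimal sketch (not the model's actual implementation) of how denoising steps can be routed between the two experts; the 0.5 boundary, the function name, and the expert labels are illustrative assumptions.
# Illustrative only: route each denoising step to one of the two experts.
# The 0.5 boundary and all names here are assumptions for clarity.
def select_expert(step: int, num_steps: int, boundary: float = 0.5) -> str:
    progress = step / max(num_steps - 1, 1)
    # Early (high-noise) steps shape layout; later (low-noise) steps refine detail.
    return "high_noise_expert" if progress < boundary else "low_noise_expert"

# A 50-step schedule splits roughly evenly between the two experts.
schedule = [select_expert(t, 50) for t in range(50)]
print(schedule[0], schedule[-1])  # high_noise_expert low_noise_expert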
Repository Contents
This repository contains three GGUF model files optimized for different use cases:
diffusion_models/wan/
├── wan22-i2v-a14b-high.gguf (15 GB) - Full FP16 high-noise expert
├── wan22-i2v-a14b-high-q4-k-s.gguf (8.2 GB) - Q4_K_S quantized high-noise expert
└── wan22-i2v-a14b-low-q4-k-s.gguf (8.2 GB) - Q4_K_S quantized low-noise expert
Total Repository Size: 31 GB
Model Files Explained
- wan22-i2v-a14b-high.gguf: Full precision FP16 high-noise expert model for maximum quality
- wan22-i2v-a14b-high-q4-k-s.gguf: Q4_K_S quantized high-noise expert (46% size reduction)
- wan22-i2v-a14b-low-q4-k-s.gguf: Q4_K_S quantized low-noise expert (46% size reduction)
Quantization Format: Q4_K_S (4-bit K-quant Small) provides an optimal balance between model size, memory usage, and generation quality.
Hardware Requirements
Minimum Requirements
| Configuration | VRAM | Disk Space | RAM |
|---|---|---|---|
| Full FP16 | 24 GB | 31 GB | 32 GB |
| Q4_K_S Quantized | 12 GB | 31 GB | 16 GB |
| Mixed (FP16 + Q4_K_S) | 18 GB | 31 GB | 24 GB |
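As a quick way to match your GPU against the table above, the following snippet (assuming PyTorch with CUDA is installed) reports total VRAM and suggests a starting configuration; the 24 GB threshold mirrors the Full FP16 row.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    suggestion = "Full FP16" if total_gb >= 24 else "Q4_K_S quantized"
    print(f"{props.name}: {total_gb:.1f} GB VRAM -> start with {suggestion}")
else:
    print("No CUDA GPU detected; these models are not practical on CPU.")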
Recommended Requirements
- GPU: NVIDIA RTX 4090 (24GB), RTX 6000 Ada (48GB), or A6000 (48GB)
- CPU: Modern multi-core processor (8+ cores recommended)
- Storage: SSD for faster model loading
- Operating System: Windows 10/11, Linux (Ubuntu 22.04+)
Performance Notes
- FP16 models provide the highest quality but require more VRAM
- Q4_K_S quantization reduces VRAM usage by ~50% with minimal quality loss
- Video generation time depends on resolution (480P ~30-60s, 720P ~60-120s per video)
- Batch processing can improve throughput but requires additional VRAM
Usage Examples
ComfyUI Integration
The most common way to use these GGUF models is through ComfyUI with the ComfyUI-GGUF custom node.
Installation:
# Navigate to ComfyUI custom nodes directory
cd ComfyUI/custom_nodes
# Clone the GGUF node
git clone https://github.com/city96/ComfyUI-GGUF
# Install dependencies
cd ComfyUI-GGUF
pip install -r requirements.txt
Model Setup:
# Copy models into ComfyUI's model directory (PowerShell example)
copy E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\*.gguf ComfyUI\models\unet\
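If you prefer downloading directly into ComfyUI rather than copying local files, something like the following works with huggingface_hub; the repo ID below is a placeholder for this repository's actual ID.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="<user>/wan22-fp16-i2v-gguf",  # placeholder; use this repository's real ID
    filename="diffusion_models/wan/wan22-i2v-a14b-high-q4-k-s.gguf",
    local_dir="ComfyUI/models/unet",       # the file keeps its subfolder path under this directory
)
print(local_path)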
Workflow Configuration:
- Load image input node
- Add GGUF Model Loader node
- Select wan22-i2v-a14b-high-q4-k-s.gguf for the high-noise expert (a second loader with wan22-i2v-a14b-low-q4-k-s.gguf typically handles the low-noise stages)
- Add prompt conditioning (optional)
- Configure the video sampler with:
  - Steps: 50-100
  - CFG Scale: 7-9
  - Resolution: 480P or 720P
- Connect to video output node
Python Usage (Diffusers)
For direct Python usage with absolute paths:
from diffusers import DiffusionPipeline
from diffusers.utils import load_image  # used below to read the input image
import torch
# Note: GGUF models require conversion or specialized loaders
# For native Diffusers support, use the base model:
# pipe = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
# For GGUF files, use ComfyUI or llama.cpp-based loaders
# Example using a custom GGUF loader (illustrative only; comfyui_gguf_loader
# is a placeholder name, not a published package):
from comfyui_gguf_loader import load_gguf_model
model_path = r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf"
model = load_gguf_model(model_path, device="cuda", dtype=torch.float16)
# Generate video from image
image = load_image("input_image.jpg")
video = model.generate(
image=image,
prompt="A serene landscape with gentle wind moving through grass",
num_frames=48,
resolution="720p",
guidance_scale=8.0,
num_inference_steps=75
)
# Save video
video.save("output_video.mp4")
Advanced Configuration
# Memory-optimized configuration for 12GB VRAM
config = {
"model_path": r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf",
"vae_tiling": True, # Reduce VAE memory usage
"enable_xformers": True, # Memory-efficient attention
"gradient_checkpointing": True,
"low_vram_mode": True,
"chunk_size": 2, # Process video in chunks
}
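The dictionary above is a configuration sketch rather than a concrete API. When running the Diffusers base model instead of the GGUF files, comparable savings come from the pipeline's own helpers, roughly as follows (a sketch, assuming the Wan-AI/Wan2.2-I2V-A14B-Diffusers checkpoint):
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()               # analogous to "low_vram_mode": True
if hasattr(pipe, "vae") and hasattr(pipe.vae, "enable_tiling"):
    pipe.vae.enable_tiling()                  # analogous to "vae_tiling": True
# xFormers attention needs the optional xformers package:
# pipe.enable_xformers_memory_efficient_attention()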
Model Specifications
Architecture
- Base Model: Wan 2.2 I2V-A14B (Image-to-Video)
- Parameters: 14.3 billion per expert (~27B total, 14B active)
- Architecture: Mixture-of-Experts (MoE) Diffusion Transformer
- Experts: Dual-expert design (high-noise + low-noise)
- Precision: FP16 (full) / Q4_K_S (quantized)
- Format: GGUF (GPT-Generated Unified Format)
Capabilities
- Input: Static images (any resolution, recommended 512x512 or higher)
- Output: Video sequences at 480P (854x480) or 720P (1280x720)
- Frame Count: Configurable (typically 24-96 frames)
- Frame Rate: 24 FPS (configurable)
- Duration: 1-4 seconds typical output (duration = frames / frame rate; see the quick check after this list)
- Text Conditioning: Optional prompt-guided generation
- Style Control: Lighting, composition, contrast, color tone
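A quick check of how the frame count, frame rate, and duration figures above fit together:
# Duration = frames / fps; the typical 24-96 frame range at 24 FPS gives 1-4 seconds.
for frames in (24, 48, 96):
    print(f"{frames} frames @ 24 FPS = {frames / 24:.1f} s")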
Quantization Details
Q4_K_S Quantization:
- Bit Depth: 4-bit per weight (mixed with some 6-bit components)
- Method: K-quant Small (balanced quality/size trade-off)
- Size Reduction: ~46% compared to FP16
- Quality Loss: Minimal (~2-5% perceptual difference)
- Speed: Similar or faster inference thanks to lower memory-bandwidth requirements
Performance Tips and Optimization
Memory Optimization
- Use Quantized Models: Start with Q4_K_S versions for 12GB VRAM systems
- Enable VAE Tiling: Reduces memory usage by processing image tiles
- Lower Resolution: Generate at 480P first, upscale if needed
- Reduce Batch Size: Process one video at a time on limited VRAM
- Model Offloading: Move models to CPU between inference steps
Quality Optimization
- Inference Steps: Use 75-100 steps for best quality (50 minimum)
- Guidance Scale: CFG 7-9 provides good prompt adherence
- Prompt Engineering: Describe motion, lighting, and camera movement
- Input Image Quality: Higher quality input = better video output
- Resolution Matching: Match input aspect ratio to output resolution
Speed Optimization
- Use Quantized Models: Q4_K_S inference is 10-20% faster
- Enable xFormers: Memory-efficient attention for faster processing
- Optimize Steps: Balance quality vs speed (50-75 steps for faster generation)
- Compile Model: Use torch.compile() for a 15-25% speedup (PyTorch 2.0+); see the sketch after this list
- GPU Warmup: Run one generation to compile kernels before batch processing
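A minimal sketch of the torch.compile() step above, assuming the Diffusers base model; component names vary between pipeline versions, so this inspects pipe.components instead of hard-coding them.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.float16
).to("cuda")
# Compile any transformer-like denoiser components; the first generation
# afterwards acts as the warmup run that triggers kernel compilation.
for name, module in pipe.components.items():
    if isinstance(module, torch.nn.Module) and "transformer" in name:
        setattr(pipe, name, torch.compile(module))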
Example Prompts
Good Prompts:
- "Gentle camera pan right, golden hour lighting, soft wind through trees"
- "Slow zoom in, dramatic lighting from left, subtle motion in background"
- "Static camera, clouds moving across sky, soft ambient lighting"
Avoid:
- Overly complex multi-action prompts
- Conflicting motion directions
- Unrealistic physics or transformations
License
This model is released under a custom Wan license. Please refer to the original Wan 2.2 model repository for complete licensing terms.
Usage Terms
Users are accountable for the content they generate and must not:
- Violate laws or regulations
- Cause harm to individuals or groups
- Generate or spread misinformation or disinformation
- Target or harm vulnerable populations
Commercial Use
Please consult the original Wan 2.2 license for commercial use terms and conditions.
Citation
If you use Wan 2.2 models in your research or applications, please cite:
@article{wan2025,
title={Wan: Open and Advanced Large-Scale Video Generative Models},
author={Team Wan and Contributors},
journal={arXiv preprint arXiv:2503.20314},
year={2025}
}
Related Resources
Official Resources
- Original Model: Wan-AI/Wan2.2-I2V-A14B
- Diffusers Version: Wan-AI/Wan2.2-I2V-A14B-Diffusers
- GGUF Collection: QuantStack/Wan2.2-I2V-A14B-GGUF
- GitHub Repository: Wan-Video/Wan2.2
- Research Paper: arXiv:2503.20314
Community Resources
- ComfyUI Integration: ComfyUI-GGUF
- Tutorial: Wan 2.2 VideoGen in ComfyUI
- Low VRAM Guide: Running Wan 2.2 GGUF with Low VRAM
Other Wan 2.2 Variants
- Text-to-Video: Wan2.2-T2V-A14B
- Text+Image-to-Video: Wan2.2-TI2V-5B
- Speech-to-Video: Wan2.2-S2V-14B
Troubleshooting
Common Issues
Issue: Out of memory errors
Solution: Use Q4_K_S quantized models, enable VAE tiling, reduce resolution to 480P
Issue: Slow generation speed
Solution: Use quantized models, enable xFormers, reduce inference steps to 50-75
Issue: Poor video quality
Solution: Increase inference steps to 75-100, use a higher guidance scale (8-9), improve input image quality
Issue: Model fails to load
Solution: Verify GGUF loader compatibility, check file integrity (see the header check below), ensure sufficient disk space
Issue: Inconsistent motion
Solution: Use clearer motion prompts, adjust the guidance scale, increase inference steps
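For the "check file integrity" step, a quick first pass is to confirm the GGUF magic bytes at the start of the file (this only validates the header, not the whole download):
from pathlib import Path

# Every valid GGUF file begins with the 4-byte ASCII magic "GGUF".
path = Path(r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf")
with path.open("rb") as f:
    ok = f.read(4) == b"GGUF"
print("GGUF header OK" if ok else "Unexpected header - re-download the file")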
Support and Contact
For issues, questions, or contributions:
- Model Issues: Wan-AI on Hugging Face
- GGUF Issues: ComfyUI-GGUF GitHub
- General Discussion: Hugging Face Forums
- Model Version: v2.2
- README Version: v1.3
- Last Updated: 2025-10-14
- Format: GGUF (FP16 + Q4_K_S)
- Base Model: Wan-AI/Wan2.2-I2V-A14B