Wan 2.2 Image-to-Video (I2V-A14B) - GGUF FP16 Quantized Models

This repository contains GGUF builds of the Wan 2.2 Image-to-Video A14B model, in both full FP16 and Q4_K_S quantized form, offering reduced VRAM requirements while maintaining high-quality video generation.

Model Description

Wan 2.2 is an advanced large-scale video generative model that uses a Mixture-of-Experts (MoE) architecture specifically designed for image-to-video synthesis. The A14B variant features a dual-expert design with approximately 14 billion parameters per expert:

  • High-Noise Expert: Optimized for early denoising stages, focusing on overall layout and composition
  • Low-Noise Expert: Specialized for later denoising stages, refining video details and quality

The model generates videos at 480P and 720P resolutions from static images, with support for text-guided prompts to control the generation process. Wan 2.2 incorporates meticulously curated aesthetic data with detailed labels for lighting, composition, contrast, and color tone, enabling precise cinematic-style video generation.
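
To make the expert hand-off concrete, here is a minimal sketch (illustrative only, not the actual Wan 2.2 code) of how a two-expert denoising loop can route steps between the experts; the boundary value and the step callback are placeholders:

def denoise(latents, sigmas, high_noise_expert, low_noise_expert, step, boundary=0.9):
    # sigmas: noise levels in descending order (high noise first)
    # step(noise_pred, sigma, latents): one scheduler update (placeholder callback)
    for sigma in sigmas:
        # Route early, high-noise steps to the high-noise expert and the rest
        # to the low-noise expert; only one ~14B expert is active per step.
        expert = high_noise_expert if sigma >= boundary else low_noise_expert
        noise_pred = expert(latents, sigma)
        latents = step(noise_pred, sigma, latents)
    return latents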

Repository Contents

This repository contains three GGUF model files optimized for different use cases:

diffusion_models/wan/
├── wan22-i2v-a14b-high.gguf           (15 GB)  - Full FP16 high-noise expert
├── wan22-i2v-a14b-high-q4-k-s.gguf    (8.2 GB) - Q4_K_S quantized high-noise expert
└── wan22-i2v-a14b-low-q4-k-s.gguf     (8.2 GB) - Q4_K_S quantized low-noise expert

Total Repository Size: 31 GB

Model Files Explained

  • wan22-i2v-a14b-high.gguf: Full precision FP16 high-noise expert model for maximum quality
  • wan22-i2v-a14b-high-q4-k-s.gguf: Q4_K_S quantized high-noise expert (46% size reduction)
  • wan22-i2v-a14b-low-q4-k-s.gguf: Q4_K_S quantized low-noise expert (46% size reduction)

Quantization Format: Q4_K_S (4-bit K-quant Small) provides an optimal balance between model size, memory usage, and generation quality.
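
To download a single file instead of the whole 31 GB repository, huggingface_hub can fetch it directly; a sketch, with the repo ID taken from this repository and the filename from the layout above:

from huggingface_hub import hf_hub_download

# Fetch only the Q4_K_S high-noise expert (~8.2 GB)
local_path = hf_hub_download(
    repo_id="wangkanai/wan22-fp16-i2v-gguf",
    filename="diffusion_models/wan/wan22-i2v-a14b-high-q4-k-s.gguf",
    local_dir="models",   # downloaded path preserves the in-repo folder structure
)
print(local_path)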

Hardware Requirements

Minimum Requirements

Configuration            VRAM     Disk Space   RAM
Full FP16                24 GB    31 GB        32 GB
Q4_K_S Quantized         12 GB    31 GB        16 GB
Mixed (FP16 + Q4_K_S)    18 GB    31 GB        24 GB
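
A quick way to see which row of the table applies to your GPU; a sketch that assumes PyTorch with CUDA is installed:

import torch

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 24:
        print(f"{vram_gb:.0f} GB VRAM: full FP16 or the mixed setup should fit")
    elif vram_gb >= 12:
        print(f"{vram_gb:.0f} GB VRAM: use the Q4_K_S quantized experts")
    else:
        print(f"{vram_gb:.0f} GB VRAM: below the minimum listed above")
else:
    print("No CUDA device detected")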

Recommended Requirements

  • GPU: NVIDIA RTX 4090 (24GB), RTX 6000 Ada (48GB), or A6000 (48GB)
  • CPU: Modern multi-core processor (8+ cores recommended)
  • Storage: SSD for faster model loading
  • Operating System: Windows 10/11, Linux (Ubuntu 22.04+)

Performance Notes

  • FP16 models provide the highest quality but require more VRAM
  • Q4_K_S quantization cuts model VRAM usage by roughly 45% with minimal quality loss
  • Video generation time depends on resolution (480P ~30-60s, 720P ~60-120s per video)
  • Batch processing can improve throughput but requires additional VRAM

Usage Examples

ComfyUI Integration

The most common way to use these GGUF models is through ComfyUI with the ComfyUI-GGUF custom node.

Installation:

# Navigate to ComfyUI custom nodes directory
cd ComfyUI/custom_nodes

# Clone the GGUF node
git clone https://github.com/city96/ComfyUI-GGUF

# Install dependencies
cd ComfyUI-GGUF
pip install -r requirements.txt

Model Setup:

# Copy the GGUF files into ComfyUI's unet models directory
# (PowerShell shown; adjust the source path to where you downloaded this repository)
cp E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\*.gguf ComfyUI\models\unet\

Workflow Configuration:

  1. Load image input node
  2. Add GGUF Model Loader node
  3. Select wan22-i2v-a14b-high-q4-k-s.gguf (for high-noise expert)
  4. Add prompt conditioning (optional)
  5. Configure video sampler with:
    • Steps: 50-100
    • CFG Scale: 7-9
    • Resolution: 480P or 720P
  6. Connect to video output node

Python Usage (Diffusers)

For direct Python usage:

import torch
from diffusers.utils import load_image

# Note: GGUF models require conversion or a specialized loader.
# For native Diffusers support, use the base model instead:
#   from diffusers import DiffusionPipeline
#   pipe = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")

# For GGUF files, use ComfyUI or a GGUF-aware loader. The import and the
# generate() signature below are placeholders for whatever compatible
# library you use; they are not a published API.
from comfyui_gguf_loader import load_gguf_model

model_path = r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf"
model = load_gguf_model(model_path, device="cuda", dtype=torch.float16)

# Generate video from image
image = load_image("input_image.jpg")
video = model.generate(
    image=image,
    prompt="A serene landscape with gentle wind moving through grass",
    num_frames=48,
    resolution="720p",
    guidance_scale=8.0,
    num_inference_steps=75
)

# Save video
video.save("output_video.mp4")

Advanced Configuration

# Memory-optimized configuration for 12GB VRAM
# (illustrative option names; map them onto whatever loader or pipeline you use)
config = {
    "model_path": r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf",
    "vae_tiling": True,          # Reduce VAE memory usage
    "enable_xformers": True,      # Memory-efficient attention
    "gradient_checkpointing": True,
    "low_vram_mode": True,
    "chunk_size": 2,              # Process video in chunks
}
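
If you are running the base (non-GGUF) Diffusers pipeline instead, the rough equivalents of these flags are the standard Diffusers memory switches; a sketch, noting that method availability depends on your diffusers version and the resolved pipeline class:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()   # "low_vram_mode": keep weights on CPU until each module is needed
pipe.vae.enable_tiling()          # "vae_tiling": decode latents in tiles (if the VAE class supports it)
# pipe.enable_xformers_memory_efficient_attention()   # "enable_xformers": requires the xformers package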

Model Specifications

Architecture

  • Base Model: Wan 2.2 I2V-A14B (Image-to-Video)
  • Parameters: 14.3 billion per expert (~27B total, 14B active)
  • Architecture: Mixture-of-Experts (MoE) Diffusion Transformer
  • Experts: Dual-expert design (high-noise + low-noise)
  • Precision: FP16 (full) / Q4_K_S (quantized)
  • Format: GGUF (GPT-Generated Unified Format)

Capabilities

  • Input: Static images (any resolution, recommended 512x512 or higher)
  • Output: Video sequences at 480P (854x480) or 720P (1280x720)
  • Frame Count: Configurable (typically 24-96 frames)
  • Frame Rate: 24 FPS (configurable)
  • Duration: 1-4 seconds typical output
  • Text Conditioning: Optional prompt-guided generation
  • Style Control: Lighting, composition, contrast, color tone
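
Clip duration follows directly from frame count and frame rate; for example, the 48-frame generation shown earlier at 24 FPS yields a 2-second clip:

num_frames, fps = 48, 24
print(f"{num_frames / fps:.1f} seconds")   # 2.0 seconds, within the 1-4 s typical range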

Quantization Details

Q4_K_S Quantization:

  • Bit Depth: 4-bit per weight (mixed with some 6-bit components)
  • Method: K-quant Small (balanced quality/size trade-off)
  • Size Reduction: ~46% compared to FP16
  • Quality Loss: Minimal (~2-5% perceptual difference)
  • Speed: Similar or slightly faster inference thanks to lower memory bandwidth requirements
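
As a rough consistency check on these numbers, dividing the Q4_K_S file size by the parameter count gives an effective bit width close to Q4_K_S's mix of 4- and 6-bit blocks (a back-of-envelope sketch, treating GB as 10^9 bytes and ignoring GGUF metadata overhead):

params = 14.3e9          # parameters per expert (see Model Specifications)
q4_bytes = 8.2e9         # size of wan22-i2v-a14b-high-q4-k-s.gguf

print(f"{q4_bytes * 8 / params:.1f} bits per weight")   # ~4.6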

Performance Tips and Optimization

Memory Optimization

  1. Use Quantized Models: Start with Q4_K_S versions for 12GB VRAM systems
  2. Enable VAE Tiling: Reduces memory usage by processing image tiles
  3. Lower Resolution: Generate at 480P first, upscale if needed
  4. Reduce Batch Size: Process one video at a time on limited VRAM
  5. Model Offloading: Move models to CPU between inference steps

Quality Optimization

  1. Inference Steps: Use 75-100 steps for best quality (50 minimum)
  2. Guidance Scale: CFG 7-9 provides good prompt adherence
  3. Prompt Engineering: Describe motion, lighting, and camera movement
  4. Input Image Quality: Higher quality input = better video output
  5. Resolution Matching: Match input aspect ratio to output resolution

Speed Optimization

  1. Use Quantized Models: Q4_K_S inference is 10-20% faster
  2. Enable xFormers: Memory-efficient attention for faster processing
  3. Optimize Steps: Balance quality vs speed (50-75 steps for faster generation)
  4. Compile Model: Use torch.compile() for a 15-25% speedup (PyTorch 2.0+; see the sketch after this list)
  5. GPU Warmup: Run one generation to compile kernels before batch processing
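
A minimal torch.compile() sketch for tip 4, assuming a Diffusers-style pipeline whose denoiser lives in pipe.transformer (the attribute name may differ for other loaders):

import torch

# `pipe` is a loaded video pipeline (e.g. from the Advanced Configuration sketch above).
# The first generation pays the compilation cost; later runs reuse the compiled kernels.
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead")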

Example Prompts

Good Prompts:

  • "Gentle camera pan right, golden hour lighting, soft wind through trees"
  • "Slow zoom in, dramatic lighting from left, subtle motion in background"
  • "Static camera, clouds moving across sky, soft ambient lighting"

Avoid:

  • Overly complex multi-action prompts
  • Conflicting motion directions
  • Unrealistic physics or transformations

License

This model is released under a custom Wan license. Please refer to the original Wan 2.2 model repository for complete licensing terms.

Usage Terms

Users are accountable for the content they generate and must not:

  • Violate laws or regulations
  • Cause harm to individuals or groups
  • Generate or spread misinformation or disinformation
  • Target or harm vulnerable populations

Commercial Use

Please consult the original Wan 2.2 license for commercial use terms and conditions.

Citation

If you use Wan 2.2 models in your research or applications, please cite:

@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models},
  author={Team Wan and Contributors},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}


Troubleshooting

Common Issues

Issue: Out of memory errors
Solution: Use Q4_K_S quantized models, enable VAE tiling, reduce resolution to 480P

Issue: Slow generation speed
Solution: Use quantized models, enable xFormers, reduce inference steps to 50-75

Issue: Poor video quality
Solution: Increase inference steps to 75-100, use higher guidance scale (8-9), improve input image quality

Issue: Model fails to load
Solution: Verify GGUF loader compatibility, check file integrity, ensure sufficient disk space

Issue: Inconsistent motion
Solution: Use clearer motion prompts, adjust guidance scale, increase inference steps
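
For the "model fails to load" case, a quick integrity check is to open the file with the gguf Python package (pip install gguf); a sketch assuming that package's GGUFReader API:

from gguf import GGUFReader

path = r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf"
reader = GGUFReader(path)                 # raises if the header or metadata is corrupt

print(f"{len(reader.tensors)} tensors found")
for tensor in reader.tensors[:5]:         # peek at the first few entries
    print(tensor.name, tensor.shape, tensor.tensor_type)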

Support and Contact

For issues, questions, or contributions, please open a discussion in this repository's Community tab on Hugging Face.


  • Model Version: v2.2
  • README Version: v1.3
  • Last Updated: 2025-10-14
  • Format: GGUF (FP16 + Q4_K_S)
  • Base Model: Wan-AI/Wan2.2-I2V-A14B
