Qwen Image ModelOpt FP8 SGLang Transformer

This repository contains a SGLang-ready ModelOpt FP8 transformer override for Qwen/Qwen-Image. It only replaces the transformer weights; tokenizer, scheduler, VAE, and other non-transformer components are loaded from the original base model.
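
The override idea can be sketched as a component-source map: everything except the transformer resolves to the base repo. The function and component names below are illustrative, not SGLang's actual loader API.

```python
# Sketch of the transformer-override policy: only the transformer is loaded
# from this FP8 checkpoint; all other pipeline components come from the base
# model. Component names are illustrative.

BASE_REPO = "Qwen/Qwen-Image"
TRANSFORMER_REPO = "BBuf/Qwen-Image-ModelOpt-FP8-SGLang"

def resolve_component_sources(components):
    """Map each pipeline component to the repo it should be loaded from."""
    return {
        name: TRANSFORMER_REPO if name == "transformer" else BASE_REPO
        for name in components
    }

sources = resolve_component_sources(
    ["tokenizer", "text_encoder", "scheduler", "vae", "transformer"]
)
# sources["transformer"] -> "BBuf/Qwen-Image-ModelOpt-FP8-SGLang"
# sources["vae"]         -> "Qwen/Qwen-Image"
```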

The checkpoint is intended for SGLang Diffusion with the Qwen Image FP8 support from sgl-project/sglang#23155.

Usage

sglang generate \
  --backend sglang \
  --model-id Qwen-Image \
  --model-path Qwen/Qwen-Image \
  --transformer-path BBuf/Qwen-Image-ModelOpt-FP8-SGLang \
  --prompt "A futuristic cyberpunk city at night, neon lights reflecting on wet streets" \
  --width 1024 \
  --height 1024 \
  --num-inference-steps 50 \
  --guidance-scale 4.0 \
  --seed 42 \
  --num-gpus 1 \
  --dit-cpu-offload false \
  --dit-layerwise-offload false \
  --warmup \
  --save-output

H100 Validation Snapshot

Validation was run on a single H100 GPU (rank 0) with --backend=sglang. The FP8 artifact below was produced by the fixed checkpoint, which keeps the validated set of numerically sensitive Qwen Image fallback tensors in BF16.

Artifacts:

  • BF16 output, 1024x1024, 50 steps
  • FP8 fixed output, 1024x1024, 50 steps

Benchmark, warmup excluded:

  Metric           BF16        FP8 fixed   Delta                Speedup
  E2E latency      13.589 s    12.159 s    -1.430 s (-10.5%)    1.12x
  Denoising stage  12.929 s    11.437 s    -1.491 s (-11.5%)    1.13x
  Decoding stage   58.55 ms    52.30 ms    -6.25 ms (-10.7%)    1.12x
  Text encoding    599.85 ms   666.43 ms   +66.57 ms (+11.1%)   0.90x
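
The Delta and Speedup columns follow directly from the raw stage timings; a quick sanity check (values copied from the table, converted to seconds):

```python
# Reproduce the Delta and Speedup columns from the raw BF16/FP8 timings.
# All values are copied from the benchmark table; units are seconds.
timings = {
    "E2E latency":     (13.589, 12.159),
    "Denoising stage": (12.929, 11.437),
    "Decoding stage":  (0.05855, 0.05230),
    "Text encoding":   (0.59985, 0.66643),
}

for metric, (bf16, fp8) in timings.items():
    delta_pct = (fp8 - bf16) / bf16 * 100   # negative means FP8 is faster
    speedup = bf16 / fp8                    # >1.0 means FP8 is faster
    print(f"{metric}: {delta_pct:+.1f}%  {speedup:.2f}x")

# Prints:
#   E2E latency: -10.5%  1.12x
#   Denoising stage: -11.5%  1.13x
#   Decoding stage: -10.7%  1.12x
#   Text encoding: +11.1%  0.90x
```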

Notes:

  • Validation prompt: A futuristic cyberpunk city at night, neon lights reflecting on wet streets.
  • Validation settings: 1024x1024, 50 inference steps, guidance_scale=4.0, seed=42, --dit-cpu-offload false, --dit-layerwise-offload false, --warmup.
  • Profiler artifacts were captured separately with profiler flags; those profiler timings include profiling overhead and are not used as benchmark latency numbers.

Conversion Notes

The checkpoint was converted from an NVIDIA ModelOpt FP8 export with SGLang's build_modelopt_fp8_transformer tool. Most linear weights are stored in FP8. The validated fallback set keeps numerically sensitive tensors in BF16, including the Qwen Image image-MLP output projection family, which is required for normal image quality.
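
The conversion policy can be sketched as a per-tensor decision: weights matching a validated fallback list stay in BF16, everything else gets a per-tensor FP8 (E4M3) scale. The fallback pattern below is hypothetical; the real list is checkpoint-specific and was determined by validation.

```python
# Sketch of the FP8-with-BF16-fallback conversion policy (names hypothetical).
# Most linear weights get a per-tensor FP8 (E4M3) scale; tensors matching a
# validated fallback list are kept in BF16.

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

# Hypothetical fallback patterns; the real set covers the image-MLP output
# projection family that was found numerically sensitive during validation.
BF16_FALLBACK_PATTERNS = ("img_mlp.net.2",)

def plan_tensor(name, amax):
    """Decide the stored dtype and per-tensor scale for one weight tensor."""
    if any(pattern in name for pattern in BF16_FALLBACK_PATTERNS):
        return {"name": name, "dtype": "bfloat16", "scale": None}
    # The scale maps the tensor's absolute max onto the FP8 dynamic range.
    return {"name": name, "dtype": "float8_e4m3fn", "scale": amax / FP8_E4M3_MAX}
```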
