E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources
Paper: arXiv:2510.27135
This is the Nitro-E 512px text-to-image diffusion model with DC-AE-Lite for faster image decoding.
| VAE Variant | Decoding Speed | Quality |
|---|---|---|
| DC-AE (Standard) | 1.0× | Reference |
| DC-AE-Lite | 1.8× | Similar |
With roughly 1.8× faster decoding at similar quality, this variant lowers Nitro-E's end-to-end latency, which helps in real-time applications.
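To check the decoding speedup on your own hardware, a small timing helper along these lines may be useful. This is a minimal sketch rather than the benchmark behind the table above: `median_latency` and `speedup` are hypothetical helper names introduced here, and the callables you pass in (for example, one that decodes a latent with each VAE variant) are left for you to define.

```python
import time
from statistics import median
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 2, iters: int = 10) -> float:
    """Return the median wall-clock time in seconds of fn() over `iters` timed runs."""
    # Warm-up runs are discarded so one-time costs (lazy init, caches) do not skew results.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate_s must be positive")
    return baseline_s / candidate_s
```

When timing GPU work, the callable should end with `torch.cuda.synchronize()`, since CUDA kernels launch asynchronously and the timer would otherwise stop before the work finishes. The ratio `speedup(median_latency(decode_standard), median_latency(decode_lite))`, with both callables being placeholders you supply, can then be compared against the table.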
import torch
from diffusers import NitroEPipeline

# Load the lite variant
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512_lite",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Generate image (1.8x faster decoding!)
prompt = "A hot air balloon in the shape of a heart grand canyon"
image = pipe(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]
image.save("output.png")
Use DC-AE-Lite (this model) when:
Use standard DC-AE when:
@article{nitro-e-2025,
  title={E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
Licensed under the MIT License.
Base model: amd/Nitro-E