---
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---
# 🚀 REPA-E for T2I
End-to-End Tuned VAEs for Supercharging Text-to-Image Diffusion Transformers
---
## 🚀 Overview
We present REPA-E for T2I, a family of end-to-end tuned VAEs designed to supercharge text-to-image generation training. These models consistently outperform SD-3.5-VAE across all benchmarks (COCO-30K, DPG-Bench, GenAI-Bench, GenEval, and MJHQ-30K) without requiring any additional representation alignment losses.
For training, we use the official REPA-E training code to optimize the SD-3.5-VAE for 80 epochs with a batch size of 256 on the ImageNet-256 dataset.
REPA-E training refines the VAE's latent-space structure and enables faster convergence when training downstream text-to-image latent diffusion models.
This repository provides diffusers-compatible weights for the end-to-end trained SD-3.5-VAE. In addition, we release end-to-end trained variants of several other widely used VAEs to facilitate research and integration within text-to-image diffusion frameworks.
## ⚡️ Quickstart
```python
from diffusers import AutoencoderKL
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-sd3.5-vae").to("cuda")
```
> Use `vae.encode(...)` / `vae.decode(...)` in your pipeline. (A full example is provided below.)
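When the VAE feeds a latent diffusion model, latents are typically normalized before training and denormalized before decoding. The sketch below follows the convention used by the `diffusers` Stable Diffusion 3 pipeline, `(z - shift_factor) * scaling_factor`. It assumes the released config populates `scaling_factor` and `shift_factor`, which this card does not confirm. The helper names are illustrative and not part of the REPA-E code.

```python
def normalize_latents(latents, scaling_factor, shift_factor=0.0):
    """Map raw VAE latents to the space the diffusion model is trained in."""
    return (latents - shift_factor) * scaling_factor


def denormalize_latents(latents, scaling_factor, shift_factor=0.0):
    """Invert normalize_latents before calling vae.decode(...)."""
    return latents / scaling_factor + shift_factor


# Hypothetical usage with a loaded VAE (values are read from its config, not hard-coded):
#   shift = vae.config.shift_factor or 0.0   # shift_factor may be None in the config
#   z = vae.encode(image).latent_dist.sample()
#   z = normalize_latents(z, vae.config.scaling_factor, shift)
#   ...
#   decoded = vae.decode(denormalize_latents(z, vae.config.scaling_factor, shift)).sample
```

The helpers work on plain floats as well as tensors, so the round trip can be checked before wiring them into a training loop.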
### 🧩 End-to-End Trained VAE Releases
| Model | Hugging Face Link |
|-------|-------------------|
| **E2E-FLUX-VAE** | 🤗 [REPA-E/e2e-flux-vae](https://huggingface.co/REPA-E/e2e-flux-vae) |
| **E2E-SD-3.5-VAE** | 🤗 [REPA-E/e2e-sd3.5-vae](https://huggingface.co/REPA-E/e2e-sd3.5-vae) |
| **E2E-Qwen-Image-VAE** | 🤗 [REPA-E/e2e-qwenimage-vae](https://huggingface.co/REPA-E/e2e-qwenimage-vae) |
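
To switch between the releases above in a script, a small lookup can map a short key to its repository ID. This is a minimal sketch: the keys and helper names are hypothetical. The Quickstart shows `AutoencoderKL` for the SD-3.5 checkpoint only; loading the other checkpoints with the same class is an assumption, and the Qwen-Image VAE in particular may require a different autoencoder class.

```python
E2E_VAE_REPOS = {
    "flux": "REPA-E/e2e-flux-vae",
    "sd3.5": "REPA-E/e2e-sd3.5-vae",
    "qwen-image": "REPA-E/e2e-qwenimage-vae",
}


def resolve_e2e_vae(name: str) -> str:
    """Return the Hugging Face repo ID for a short model key (case-insensitive)."""
    key = name.strip().lower()
    if key not in E2E_VAE_REPOS:
        raise KeyError(f"unknown VAE {name!r}; choose from {sorted(E2E_VAE_REPOS)}")
    return E2E_VAE_REPOS[key]


def load_e2e_vae(name: str, device: str = "cpu"):
    """Load a released VAE; assumes the checkpoint loads with AutoencoderKL."""
    # Imported lazily so the lookup itself works without diffusers installed.
    from diffusers import AutoencoderKL

    return AutoencoderKL.from_pretrained(resolve_e2e_vae(name)).to(device)
```

For example, `load_e2e_vae("sd3.5", device="cuda")` is equivalent to the Quickstart snippet.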
## 📦 Requirements
The following packages are required to load and run the REPA-E VAEs with the `diffusers` library:
```bash
pip install "diffusers>=0.33.0"
pip install "torch>=2.3.1"
```
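
To confirm that an existing environment meets these minimums, a small standard-library check can help. This is a sketch, not part of the REPA-E code: the helper names are illustrative, and the comparison looks only at the numeric release segment, so pre-release and local suffixes such as `+cu121` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {"diffusers": "0.33.0", "torch": "2.3.1"}


def release_tuple(v: str) -> tuple:
    """Parse the leading numeric release, e.g. '2.3.1+cu121' -> (2, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare zero-padded release tuples; suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> dict:
    """Return {package: True/False/None}; None means the package is not installed."""
    report = {}
    for package, minimum in MINIMUMS.items():
        try:
            report[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            report[package] = None
    return report


if __name__ == "__main__":
    print(check_requirements())
```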
## 🚀 Example Usage
Below is a minimal example showing how to load and use the REPA-E end-to-end trained SD-3.5-VAE with `diffusers`:
```python
from io import BytesIO

import numpy as np
import requests
import torch
from diffusers import AutoencoderKL
from PIL import Image

device = "cuda"

# Download the example image and force 3-channel RGB
response = requests.get("https://raw.githubusercontent.com/End2End-Diffusion/fuse-dit/main/assets/example.png")
response.raise_for_status()
pil_image = Image.open(BytesIO(response.content)).convert("RGB")

# (H, W, C) uint8 in [0, 255] -> (1, C, H, W) float32 in [-1, 1]
image = torch.from_numpy(np.array(pil_image)).permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 127.5 - 1
image = image.to(device)

vae = AutoencoderKL.from_pretrained("REPA-E/e2e-sd3.5-vae").to(device)

# Encode to latents, then decode back to image space without tracking gradients
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()
    reconstructed = vae.decode(latents).sample
```
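
To inspect the result, the decoded tensor can be mapped back to an 8-bit image and compared with the input. The helpers below are a minimal NumPy sketch with illustrative names. They assume a `(C, H, W)` array in `[-1, 1]`, matching the preprocessing in the example above, and use PSNR as one simple reconstruction metric rather than the metric used in the REPA-E evaluation.

```python
import numpy as np


def to_uint8_image(chw: np.ndarray) -> np.ndarray:
    """Convert a (C, H, W) float array in [-1, 1] to an (H, W, C) uint8 array."""
    hwc = np.transpose(chw, (1, 2, 0))
    return np.clip((hwc + 1.0) * 127.5, 0, 255).round().astype(np.uint8)


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; returns inf for identical inputs."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = float(np.mean(diff**2))
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak**2 / mse))


# Usage with the tensors from the example above:
#   original = to_uint8_image(image[0].cpu().numpy())
#   recon = to_uint8_image(reconstructed[0].cpu().numpy())
#   Image.fromarray(recon).save("reconstruction.png")
#   print(f"PSNR: {psnr(original, recon):.2f} dB")
```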