Diversity Has Always Been There in Your Visual Autoregressive Models
Abstract
DiverseVAR enhances generative diversity in Visual Autoregressive models by rescaling the pivotal component of feature maps, requiring no additional training while preserving synthesis quality.
Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only a negligible impact on performance. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.
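The abstract does not specify how the "pivotal component" is defined or extracted. As a rough illustration only, the sketch below assumes one plausible reading: the pivotal component is the leading (rank-1) singular component of a flattened feature map, which is attenuated before the model input and boosted on the model output. The function names, the SVD-based decomposition, and the scaling factors are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def split_pivotal_component(feat: np.ndarray):
    """Split a 2-D feature map (channels x tokens) into an assumed
    'pivotal' part (leading rank-1 singular component) and the residual."""
    U, S, Vt = np.linalg.svd(feat, full_matrices=False)
    pivotal = S[0] * np.outer(U[:, 0], Vt[0])  # dominant rank-1 term
    return pivotal, feat - pivotal

def rescale_pivotal(feat: np.ndarray, alpha: float) -> np.ndarray:
    """Rescale only the assumed pivotal component, leaving the rest intact.
    alpha < 1 suppresses it (model input); alpha > 1 amplifies it (model output)."""
    pivotal, residual = split_pivotal_component(feat)
    return alpha * pivotal + residual

# Toy usage on a random "feature map"; alpha values are illustrative.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))
suppressed = rescale_pivotal(tokens, 0.5)   # before feeding the model
amplified = rescale_pivotal(tokens, 1.5)    # on the model's output features
```

With alpha = 1 the transform is the identity, so the operation only redistributes energy along the dominant direction rather than altering the feature map wholesale.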
Community
We introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention (2025)
- Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy (2025)
- Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models (2025)
- Dynamic Mixture-of-Experts for Visual Autoregressive Model (2025)
- ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation (2025)
- SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model (2025)
- EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model (2025)