Abstract
KL-Adaptive Stability Sampling (KLASS) accelerates diffusion-based generation by identifying stable predictions, achieving significant speedups and quality improvements across various domains.
Masked diffusion models have demonstrated competitive results on various tasks, including language generation. However, due to their iterative refinement process, inference is often bottlenecked by slow, static sampling. To overcome this problem, we introduce KL-Adaptive Stability Sampling (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach significantly speeds up generation while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to 2.78× wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.
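To make the idea concrete, here is a minimal sketch of a token-level KL stability test in the spirit of the abstract. The helper name `kl_stability_mask` and the threshold value are illustrative assumptions, not the exact criterion used in the paper.

```python
import torch
import torch.nn.functional as F

def kl_stability_mask(prev_logits, curr_logits, kl_threshold=1e-3):
    """Flag positions whose predictive distributions have stabilized.

    prev_logits, curr_logits: [seq_len, vocab] logits from two consecutive
    denoising iterations. A position counts as "stable" when the KL divergence
    between its successive distributions falls below `kl_threshold`
    (the threshold value here is an illustrative assumption).
    """
    prev_log_probs = F.log_softmax(prev_logits, dim=-1)
    curr_log_probs = F.log_softmax(curr_logits, dim=-1)
    # Token-level KL(curr || prev): one scalar per sequence position.
    kl = (curr_log_probs.exp() * (curr_log_probs - prev_log_probs)).sum(dim=-1)
    return kl < kl_threshold  # boolean mask of stable positions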
Community
TL;DR: We propose KLASS, a sampling method that leverages token-level KL divergence dynamics to identify stable tokens for early unmasking, achieving significant inference speedups for masked diffusion LMs while maintaining and even improving generation quality.
paper: https://arxiv.org/abs/2511.05664
code: https://github.com/shkim0116/KLASS
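For context, a minimal decoding loop built around such a stability signal might look like the sketch below. The `model` interface, `mask_id` handling, the greedy unmasking of stable positions, and the single-token fallback step are all illustrative assumptions rather than the repository's actual API; it also reuses the hypothetical `kl_stability_mask` helper sketched under the abstract.

```python
import torch

@torch.no_grad()
def klass_style_decode(model, seq, mask_id, max_steps=128, kl_threshold=1e-3):
    """Iterative masked-diffusion decoding with KL-based early unmasking.

    `model(seq)` is assumed to return per-position logits of shape
    [seq_len, vocab]; `seq` starts out filled with `mask_id`. This is only a
    sketch of the sampling pattern, not the official KLASS implementation.
    """
    prev_logits = None
    for _ in range(max_steps):
        masked = seq == mask_id
        if not masked.any():
            break
        logits = model(seq)

        # Always unmask the most confident masked position (a standard
        # confidence-based step), so decoding is guaranteed to progress.
        conf = logits.softmax(dim=-1).max(dim=-1).values
        conf[~masked] = float("-inf")
        to_unmask = torch.zeros_like(masked)
        to_unmask[conf.argmax()] = True

        # Additionally unmask any masked position whose distribution has
        # stabilized relative to the previous iteration (low token-level KL),
        # using the kl_stability_mask helper sketched above.
        if prev_logits is not None:
            stable = kl_stability_mask(prev_logits, logits, kl_threshold)
            to_unmask |= stable & masked

        seq[to_unmask] = logits[to_unmask].argmax(dim=-1)
        prev_logits = logits
    return seq
```

Because several positions can clear the stability test in the same iteration, the loop typically finishes in far fewer model calls than one-token-per-step decoding, which is where the reported wall-clock speedup comes from.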
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Guided Star-Shaped Masked Diffusion (2025)
- Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models (2025)
- Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding (2025)
- Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing (2025)
- Accelerating Diffusion LLM Inference via Local Determinism Propagation (2025)
- Fast-dLLM v2: Efficient Block-Diffusion LLM (2025)
- Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model (2025)