Abstract
KL-Adaptive Stability Sampling (KLASS) accelerates diffusion-based generation by identifying stable predictions, achieving significant speedups and quality improvements across various domains.
Masked diffusion models have demonstrated competitive results on various tasks, including language generation. However, due to their iterative refinement process, inference is often bottlenecked by slow, static sampling. To overcome this problem, we introduce KL-Adaptive Stability Sampling (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach significantly speeds up generation while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to 2.78× wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.
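To make the idea concrete, here is a minimal sketch of a token-level KL stability test in the spirit of the abstract. The helper name `kl_stability_mask` and the threshold value are illustrative assumptions, not the exact criterion used in the paper.

```python
import torch
import torch.nn.functional as F

def kl_stability_mask(prev_logits, curr_logits, kl_threshold=1e-3):
    """Flag positions whose predictive distributions have stabilized.

    prev_logits, curr_logits: [seq_len, vocab] logits from two consecutive
    denoising iterations. A position counts as "stable" when the KL divergence
    between its successive distributions falls below `kl_threshold`
    (the threshold value here is an illustrative assumption).
    """
    prev_log_probs = F.log_softmax(prev_logits, dim=-1)
    curr_log_probs = F.log_softmax(curr_logits, dim=-1)
    # Token-level KL(curr || prev): one scalar per sequence position.
    kl = (curr_log_probs.exp() * (curr_log_probs - prev_log_probs)).sum(dim=-1)
    return kl < kl_threshold  # boolean mask of stable positions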
Community
TL;DR: We propose KLASS, a sampling method that leverages token-level KL divergence dynamics to identify stable tokens for early unmasking, achieving significant inference speedups for masked diffusion LMs while maintaining and even improving generation quality.
paper: https://arxiv.org/abs/2511.05664
code: https://github.com/shkim0116/KLASS
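For context, a minimal decoding loop built around such a stability signal might look like the sketch below. The `model` interface, `mask_id` handling, the greedy unmasking of stable positions, and the single-token fallback step are all illustrative assumptions rather than the repository's actual API; it also reuses the hypothetical `kl_stability_mask` helper sketched under the abstract.

```python
import torch

@torch.no_grad()
def klass_style_decode(model, seq, mask_id, max_steps=128, kl_threshold=1e-3):
    """Iterative masked-diffusion decoding with KL-based early unmasking.

    `model(seq)` is assumed to return per-position logits of shape
    [seq_len, vocab]; `seq` starts out filled with `mask_id`. This is only a
    sketch of the sampling pattern, not the official KLASS implementation.
    """
    prev_logits = None
    for _ in range(max_steps):
        masked = seq == mask_id
        if not masked.any():
            break
        logits = model(seq)

        # Always unmask the most confident masked position (a standard
        # confidence-based step), so decoding is guaranteed to progress.
        conf = logits.softmax(dim=-1).max(dim=-1).values
        conf[~masked] = float("-inf")
        to_unmask = torch.zeros_like(masked)
        to_unmask[conf.argmax()] = True

        # Additionally unmask any masked position whose distribution has
        # stabilized relative to the previous iteration (low token-level KL),
        # using the kl_stability_mask helper sketched above.
        if prev_logits is not None:
            stable = kl_stability_mask(prev_logits, logits, kl_threshold)
            to_unmask |= stable & masked

        seq[to_unmask] = logits[to_unmask].argmax(dim=-1)
        prev_logits = logits
    return seq
```

Because several positions can clear the stability test in the same iteration, the loop typically finishes in far fewer model calls than one-token-per-step decoding, which is where the reported wall-clock speedup comes from.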
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Guided Star-Shaped Masked Diffusion (2025)
- Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models (2025)
- Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding (2025)
- Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing (2025)
- Accelerating Diffusion LLM Inference via Local Determinism Propagation (2025)
- Fast-dLLM v2: Efficient Block-Diffusion LLM (2025)
- Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model (2025)