Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024 • 4
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 29 days ago • 44
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published about 1 month ago • 58
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 173
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published Oct 2 • 92
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published Sep 29 • 36
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29 • 44
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28 • 115
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization Paper • 2508.10395 • Published Aug 14 • 42
LPD Collection Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation • 6 items • Updated Jul 2 • 2
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation Paper • 2507.01957 • Published Jul 2 • 21
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24 • 41
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 44
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16 • 75
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Paper • 2505.18875 • Published May 24 • 42
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Paper • 2502.01776 • Published Feb 3 • 3