MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head • Paper • 2601.07832 • Published 14 days ago • 51
TwinFlow • Collection • A collection of TwinFlow-accelerated diffusion models • 4 items • Updated 28 days ago • 6
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance • Paper • 2512.08765 • Published Dec 9, 2025 • 132
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows • Paper • 2512.05150 • Published Dec 3, 2025 • 75
Model Merging in Pre-training of Large Language Models • Paper • 2505.12082 • Published May 17, 2025 • 40
Efficient Generative Model Training via Embedded Representation Warmup • Paper • 2504.10188 • Published Apr 14, 2025 • 12
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • Paper • 2403.03853 • Published Mar 6, 2024 • 66