Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published about 1 month ago • 118
FastWan Collection models trained with video sparse attention: https://arxiv.org/abs/2505.13389 and distillation • 9 items • Updated about 17 hours ago • 9
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29 • 23