Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling Paper • 2510.16751 • Published 11 days ago • 19
VISTA: A Test-Time Self-Improving Video Generation Agent Paper • 2510.15831 • Published 13 days ago • 19
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 14 days ago • 48
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 17 days ago • 160
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published 28 days ago • 91
dParallel: Learnable Parallel Decoding for dLLMs Paper • 2509.26488 • Published about 1 month ago • 19
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published Sep 29 • 36
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 76
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis Paper • 2509.10441 • Published Sep 12 • 30
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8 • 30
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Paper • 2507.14137 • Published Jul 18 • 34
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
Cosmos-Predict2 Collection • World Foundation Model for Future Prediction • 13 items • Updated 9 days ago • 29
DIP: Unsupervised Dense In-Context Post-training of Visual Representations Paper • 2506.18463 • Published Jun 23 • 21
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling Paper • 2506.10178 • Published Jun 11 • 7