MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation Paper • 2405.17842 • Published May 28, 2024
HumanGif: Single-View Human Diffusion with Generative Prior Paper • 2502.12080 • Published Feb 17, 2025 • 1
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation Paper • 2412.13462 • Published Dec 18, 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation Paper • 2409.17550 • Published Sep 26, 2024
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 20
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network Paper • 2309.02836 • Published Sep 6, 2023 • 1
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping Paper • 2405.17251 • Published May 27, 2024 • 2
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer Paper • 2301.12811 • Published Jan 30, 2023
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23, 2024 • 14
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published May 28, 2024 • 9