Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Paper • 2106.06103 • Published Jun 11, 2021 • 4
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published 17 days ago • 86
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 11 days ago • 44
Article Building the Open Agent Ecosystem Together: Introducing OpenEnv 12 days ago • 114
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Paper • 2510.13344 • Published 19 days ago • 61
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec Paper • 2410.15764 • Published Oct 21, 2024 • 1
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Paper • 2409.00750 • Published Sep 1, 2024 • 5
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Paper • 2509.22220 • Published Sep 26 • 64
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28 • 115
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published Sep 21 • 18
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26 • 16
Fast Text-to-Audio Generation with Adversarial Post-Training Paper • 2505.08175 • Published May 13 • 25
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15 • 103