Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis Paper • 2407.09732 • Published Jul 13, 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation Paper • 2408.11849 • Published Aug 13, 2024
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue Paper • 2409.04927 • Published Sep 7, 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Paper • 2409.10058 • Published Sep 16, 2024
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform Paper • 2309.09493 • Published Sep 18, 2023
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding Paper • 2502.16794 • Published Feb 24, 2025
Learning Representations for New Sound Classes With Continual Self-Supervised Learning Paper • 2205.07390 • Published May 15, 2022
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions Paper • 2301.08810 • Published Jan 20, 2023
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model Paper • 2405.11831 • Published May 20, 2024
Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience Paper • 2402.03710 • Published Feb 6, 2024
Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation Paper • 2403.18257 • Published Mar 27, 2024
Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation Paper • 2309.15938 • Published Sep 27, 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes Paper • 2305.18441 • Published May 29, 2023