video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Paper • 2406.15704 • Published Jun 22, 2024 • 6
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing Paper • 2509.16622 • Published Sep 20, 2025 • 1
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement Paper • 2409.09642 • Published Sep 15, 2024 • 1
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024 • 22
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 298