SAO-Instruct: Free-form Audio Editing using Natural Language Instructions Paper • 2510.22795 • Published 24 days ago • 4
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 27
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 154
LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping Paper • 2402.18351 • Published Feb 28, 2024 • 2
Mitsua/mitsua-japanese-clip-vit-b-16 Zero-Shot Image Classification • 0.2B • Updated Dec 9, 2024 • 3 • 7
Talking Face Generation with Multilingual TTS 👄 • Running • 557 • Generate a talking face video from text in multiple languages