Audio & Speech Models - an adarshzolekar Collection

Updated Jan 23

Purpose: Speech recognition, text-to-speech, music generation, and audio analysis.

openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12, 2024 • 6.07M • 5.43k

facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 3.57M • 389

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 8.12M • 3.41k

microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 114k • 823

facebook/musicgen-small

Text-to-Audio • Updated Nov 17, 2023 • 149k • 478