adarshzolekar's Collections

Audio & Speech Models

updated Jan 23

Purpose: Speech recognition, text-to-speech, music, audio analysis.

  • openai/whisper-large-v3

    Automatic Speech Recognition • Updated Aug 12, 2024 • 6.07M • 5.43k

  • facebook/wav2vec2-base-960h

    Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 3.57M • 389

  • coqui/XTTS-v2

    Text-to-Speech • Updated Dec 11, 2023 • 8.12M • 3.41k

  • microsoft/speecht5_tts

    Text-to-Speech • Updated Nov 8, 2023 • 114k • 823

  • facebook/musicgen-small

    Text-to-Audio • Updated Nov 17, 2023 • 149k • 478