Luganda TTS v3

A text-to-speech (TTS) system for Luganda built with NVIDIA NeMo. It pairs a FastPitch spectrogram generator with a HiFi-GAN vocoder.

Models

Model                    Description                      Size
luganda_fastpitch.nemo   FastPitch spectrogram generator  187 MB
luganda_hifigan.nemo     HiFi-GAN neural vocoder          339 MB

Training

  • Dataset: Sunbird/salt (~2,380 samples, 2.69 hours)
  • FastPitch: 20,000 steps
  • HiFi-GAN: 20,000 steps
  • Sample Rate: 22,050 Hz
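To give a sense of the dataset's scale, the figures above can be turned into an average clip length and a total sample count. This is only back-of-the-envelope arithmetic: the sample count and duration are the approximate values listed above, and the variable names are illustrative.

```python
# Approximate figures taken from the Training list above.
NUM_CLIPS = 2380          # "~2,380 samples"
TOTAL_HOURS = 2.69        # "2.69 hours"
SAMPLE_RATE = 22050       # Hz

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / NUM_CLIPS
total_audio_samples = round(total_seconds * SAMPLE_RATE)

print(f"average clip length: {avg_clip_seconds:.2f} s")
print(f"total audio samples: {total_audio_samples:,}")
```

At roughly four seconds per clip, the corpus consists mostly of short utterances.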

Usage

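import torch
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# Load both checkpoints and switch them to inference mode.
fastpitch = FastPitchModel.restore_from("luganda_fastpitch.nemo").eval()
hifigan = HifiGanModel.restore_from("luganda_hifigan.nemo").eval()

text = "Oli otya?"  # Luganda: "How are you?"
with torch.no_grad():
    # Text -> token IDs -> mel spectrogram -> waveform (22,050 Hz).
    tokens = fastpitch.parse(text)
    spec = fastpitch.generate_spectrogram(tokens=tokens)
    audio = hifigan.convert_spectrogram_to_audio(spec=spec)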
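The snippet above stops at an in-memory waveform. One way to write it to disk is sketched below, using only the Python standard library. The helper `save_wav` and the output path are hypothetical names introduced for illustration, and a synthetic sine tone stands in for the model output so the example runs without NeMo installed.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # matches the sample rate listed under Training


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of audio frames written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))                   # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in signal: one second of a 440 Hz tone in place of synthesized speech.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.gettempdir(), "luganda_demo.wav")
n_written = save_wav(tone, out_path)
```

With the real models, and assuming `audio` is a PyTorch tensor of shape `[1, num_samples]`, something like `save_wav(audio[0].detach().cpu().tolist(), "oli_otya.wav")` should produce a playable file. Libraries such as `soundfile` are a common alternative for writing WAV files.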
