Luganda TTS v3

A text-to-speech (TTS) system for Luganda built with NVIDIA NeMo. It pairs a FastPitch spectrogram generator with a HiFi-GAN vocoder.

Models

Model	Description	Size
`luganda_fastpitch.nemo`	FastPitch spectrogram generator	187 MB
`luganda_hifigan.nemo`	HiFi-GAN neural vocoder	339 MB

Training

Dataset: Sunbird/salt (~2,380 samples, 2.69 hours)
FastPitch: 20,000 steps
HiFi-GAN: 20,000 steps
Sample Rate: 22,050 Hz
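
For a rough sense of clip length, the dataset figures above imply an average of about four seconds of audio per sample. The following is a back-of-the-envelope sketch only: it treats the approximate sample count and total duration as exact, and the variable names are illustrative.

```python
# Figures taken from the Training section; the sample count is approximate.
NUM_SAMPLES = 2380
TOTAL_HOURS = 2.69
SAMPLE_RATE = 22050  # Hz

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / NUM_SAMPLES
avg_clip_frames = avg_clip_seconds * SAMPLE_RATE

print(f"Average clip length: {avg_clip_seconds:.2f} s")
print(f"Average audio samples per clip: {avg_clip_frames:,.0f}")
```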

Usage

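import torch
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# Load both checkpoints and switch them to inference mode
fastpitch = FastPitchModel.restore_from("luganda_fastpitch.nemo").eval()
hifigan = HifiGanModel.restore_from("luganda_hifigan.nemo").eval()

text = "Oli otya?"
with torch.no_grad():
    tokens = fastpitch.parse(text)  # text -> token IDs
    spec = fastpitch.generate_spectrogram(tokens=tokens)  # tokens -> mel spectrogram
    audio = hifigan.convert_spectrogram_to_audio(spec=spec)  # mel spectrogram -> waveform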
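
The usage snippet above stops at an in-memory waveform. One way to save it, sketched here with only the Python standard library, is to write 16-bit PCM at the 22,050 Hz rate the models were trained on. The helper name `write_wav` and the output filename are hypothetical, and a generated sine tone stands in for model output so the block runs without NeMo installed.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, matching the training sample rate


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_wav("sample.wav", tone)
```

With the models loaded, the samples would come from the vocoder output instead, for example `write_wav("oli_otya.wav", audio[0].detach().cpu().tolist())`. This assumes `audio` is a PyTorch tensor shaped `[batch, time]`.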
