Luganda TTS v3
A text-to-speech (TTS) system for Luganda built with NVIDIA NeMo. It pairs a FastPitch spectrogram generator with a HiFi-GAN neural vocoder.
Models
| Model | Description | Size |
|---|---|---|
| luganda_fastpitch.nemo | FastPitch spectrogram generator | 187 MB |
| luganda_hifigan.nemo | HiFi-GAN neural vocoder | 339 MB |
Training
- Dataset: Sunbird/salt (~2,380 samples, 2.69 hours)
- FastPitch: 20,000 steps
- HiFi-GAN: 20,000 steps
- Sample Rate: 22,050 Hz
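
For a rough sense of scale, the dataset figures above imply an average clip length of about four seconds. The sketch below only redoes that arithmetic. The constant names are illustrative, and the sample count is the approximate value quoted in the list.

```python
# Figures quoted in the Training list (the sample count is approximate).
NUM_CLIPS = 2380
TOTAL_HOURS = 2.69
SAMPLE_RATE_HZ = 22050

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / NUM_CLIPS
# Number of audio samples in an average-length clip at the training sample rate.
avg_clip_frames = avg_clip_seconds * SAMPLE_RATE_HZ

print(f"total audio: {total_seconds:.0f} s")
print(f"average clip: {avg_clip_seconds:.2f} s")
print(f"average clip length in samples: {avg_clip_frames:.0f}")
```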
Usage
from nemo.collections.tts.models import FastPitchModel, HifiGanModel
fastpitch = FastPitchModel.restore_from("luganda_fastpitch.nemo")
hifigan = HifiGanModel.restore_from("luganda_hifigan.nemo")
text = "Oli otya?"
spec = fastpitch.generate_spectrogram(tokens=fastpitch.parse(text))
audio = hifigan.convert_spectrogram_to_audio(spec=spec)
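
The snippet above stops at the `audio` tensor. One way to save it as a WAV file at the 22,050 Hz training sample rate is sketched below, using only the Python standard library. `write_wav` is a hypothetical helper, not part of NeMo, and the output file name is illustrative. A synthetic tone stands in for the vocoder output so the sketch runs without the models; it assumes the real `audio` is a `[batch, time]` float tensor in the range -1 to 1, which should be checked against your NeMo version.

```python
import math
import struct
import wave

SAMPLE_RATE_HZ = 22050  # matches the sample rate listed under Training


def write_wav(path, samples, sample_rate=SAMPLE_RATE_HZ):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Stand-in signal: one second of a 440 Hz tone.
# With the models loaded, the samples would instead come from the vocoder,
# e.g. audio[0].detach().cpu().tolist() (assumes a [batch, time] tensor).
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
    for n in range(SAMPLE_RATE_HZ)
]
num_frames = write_wav("luganda_demo.wav", tone)
```

Writing 16-bit PCM keeps the file playable in most audio tools. Libraries such as `soundfile` can write float arrays directly if they are already installed.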