XTTS v2 - Setswana Fine-Tune

This is a fine-tuned version of Coqui XTTS v2 for Setswana (Tswana).

It was trained on 3,428 high-quality clips from the Mozilla Common Voice 17.0 dataset to capture authentic Setswana prosody, rhythm, and intonation.
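The clip selection can be sketched as a filter over the Common Voice `validated.tsv` metadata, using the "more than 2 upvotes" criterion listed under Training Metrics. This is a minimal sketch, not the author's actual preprocessing script: the `path`, `sentence`, `up_votes`, and `down_votes` columns follow the Common Voice release format, while the function name, the threshold parameter, and the two sample rows are made up for illustration.

```python
import csv
import io


def select_clips(tsv_text, min_upvotes=3):
    """Return (path, sentence) pairs for clips with at least `min_upvotes` up-votes.

    "More than 2 upvotes" is interpreted here as up_votes >= 3 (an assumption).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    selected = []
    for row in reader:
        if int(row["up_votes"]) >= min_upvotes:
            selected.append((row["path"], row["sentence"]))
    return selected


# Hypothetical sample rows in the validated.tsv layout (not real dataset entries).
sample = (
    "path\tsentence\tup_votes\tdown_votes\n"
    "clip_a.mp3\tDumela.\t4\t0\n"
    "clip_b.mp3\tKe a leboga.\t2\t0\n"
)

print(select_clips(sample))
```

On a real download, the same function could be applied to the contents of the locale's `validated.tsv` file; the resulting count may differ from 3,428 if the author applied additional filters.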

Model Capabilities

  • Authentic Prosody: Captures the melodic flow and stress patterns of native Setswana speech.
  • Native Pronunciation: Improved handling of specific Setswana phonemes compared to the base model.
  • Cross-Lingual Inference: Can transfer the Setswana voice style to other languages supported by XTTS.

Training Metrics

  • Base Model: XTTS v2.0.2
  • Dataset: Common Voice 17.0 Setswana (validated clips with more than 2 upvotes)
  • Training Steps: about 850 (first epoch complete)
  • Initial Loss: 3.46
  • Final Eval Loss: 2.22
  • Current Training Loss: ~1.83
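To put the loss figures in perspective, the relative drop from the initial loss to the final evaluation loss can be computed directly. This is a small worked example using only the numbers listed above; it assumes the two losses were measured with the same objective and are therefore comparable, which the card does not state explicitly.

```python
initial_loss = 3.46      # "Initial Loss" from the metrics above
final_eval_loss = 2.22   # "Final Eval Loss" from the metrics above


def relative_reduction(start, end):
    """Fraction by which `end` is lower than `start`."""
    return (start - end) / start


reduction = relative_reduction(initial_loss, final_eval_loss)
print(f"Relative loss reduction: {reduction:.1%}")  # prints "Relative loss reduction: 35.8%"
```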

Usage

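from TTS.api import TTS

# Load the fine-tuned checkpoint (gpu=True requires a CUDA-capable device)
tts = TTS("ogaufi/xtts-v2-setswana", gpu=True)

# Generate speech, cloning the voice from a short reference recording
tts.tts_to_file(
    text="Dumêla rra, o tsogile jang?",
    file_path="output.wav",
    speaker_wav="reference_speaker.wav",  # reference clip of the target voice
    language="en",  # XTTS has no Setswana language code, so 'en' is used as the base language
)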