XTTS v2 - Setswana Fine-Tune

This is a fine-tuned version of Coqui XTTS v2 for Setswana (Tswana).

It was fine-tuned on 3,428 validated Setswana clips from the Mozilla Common Voice 17.0 dataset to capture natural Setswana prosody, rhythm, and intonation.
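The Training Metrics list gives the clip-selection criterion as "Validated, >2 upvotes". As a rough illustration, the sketch below filters a Common Voice-style `validated.tsv` by its `up_votes` column. The column names, the strict "greater than 2" reading of the threshold, and the helper `select_clips` are assumptions based on the usual Common Voice release layout. This is not the author's actual preprocessing script, and the sample rows are made up.

```python
import csv
import io
from typing import TextIO


def select_clips(tsv: TextIO, min_upvotes_exclusive: int = 2) -> list[str]:
    """Return clip paths whose up_votes count is strictly above the threshold.

    Hypothetical helper: assumes a Common Voice-style TSV with 'path' and
    'up_votes' columns.
    """
    # QUOTE_NONE keeps quotation marks inside transcripts from being parsed as CSV quoting.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row["path"] for row in reader if int(row["up_votes"]) > min_upvotes_exclusive]


# Illustrative rows only, not real Common Voice data.
sample = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "a\tcommon_voice_tn_1.mp3\tDumela rra.\t3\t0\n"
    "b\tcommon_voice_tn_2.mp3\tO tsogile jang?\t2\t0\n"
)
print(select_clips(io.StringIO(sample)))
```

Only the first row passes, because a clip with exactly 2 upvotes does not satisfy ">2".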

Model Capabilities

Authentic Prosody: Captures the melodic flow and stress patterns of native Setswana speech.
Native Pronunciation: Improved handling of specific Setswana phonemes compared to the base model.
Cross-Lingual Inference: Can transfer the Setswana voice style to other languages supported by XTTS.

Training Metrics

Base Model: XTTS v2.0.2
Dataset: Common Voice Setswana (Validated, >2 upvotes)
Training Steps: 850+ (first epoch complete)
Initial Loss: 3.46
Final Eval Loss: 2.22
Current Training Loss: ~1.83
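The step count and dataset size are related by steps per epoch = ceil(clips / batch size), assuming one optimizer step per batch (no gradient accumulation) and that a final partial batch still counts. The card states neither the batch size nor whether clips were held out for evaluation. The sketch below assumes a batch size of 4 and no held-out clips, purely to show that roughly 850 steps is the expected order of magnitude for one epoch over 3,428 clips.

```python
import math


def steps_per_epoch(num_clips: int, batch_size: int) -> int:
    # One step per batch; a final partial batch still counts (i.e. drop_last=False).
    return math.ceil(num_clips / batch_size)


NUM_CLIPS = 3428          # from this card
ASSUMED_BATCH_SIZE = 4    # assumption: not stated in this card

print(steps_per_epoch(NUM_CLIPS, ASSUMED_BATCH_SIZE))
```

Under these assumptions one epoch is 857 steps, which is consistent with the reported figure. A different batch size or eval split would change it.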

Usage

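import torch
from TTS.api import TTS

# Run on the GPU when one is available; TTS(..., gpu=True) is deprecated in favour of .to(device)
device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("ogaufi/xtts-v2-setswana").to(device)

# Generate speech
tts.tts_to_file(text="Dumêla rra, o tsogile jang?",
                file_path="output.wav",
                speaker_wav="reference_speaker.wav",
                language="en")  # XTTS v2 has no Setswana language code, so 'en' is used as the base language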
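The `language="en"` argument is needed because XTTS v2 accepts only a fixed set of language codes, and Setswana is not among them. One way to guard this in a script is a small helper that passes supported codes through and maps anything else (for example `tn`, the ISO 639-1 code for Setswana) to the `en` base language. `resolve_language` is a hypothetical helper, not part of the TTS library. The language list is the one published for XTTS v2 and should be checked against your installed version.

```python
# Language codes accepted by XTTS v2, per the Coqui XTTS documentation.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})


def resolve_language(code: str, fallback: str = "en") -> str:
    """Return a language code XTTS v2 accepts (hypothetical helper).

    Unsupported codes such as 'tn' (Setswana) fall back to `fallback`,
    matching the 'en' base language used in the usage example.
    """
    normalized = code.strip().lower()
    return normalized if normalized in XTTS_V2_LANGUAGES else fallback


print(resolve_language("tn"))  # Setswana is not supported -> "en"
print(resolve_language("FR"))  # supported -> "fr"
```

In the usage snippet, passing `language=resolve_language("tn")` to `tts_to_file` would then send `"en"` to the model.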
