akera/whisper-large-v3-kin-200h-v2

This model is a fine-tuned version of Whisper large-v3 for Automatic Speech Recognition (ASR) in Kinyarwanda, trained on 200 hours of labeled speech data. It was developed as part of the research presented in the paper:

How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu

The paper investigates the minimum data volumes required for viable ASR performance in low-resource African languages, using Kinyarwanda and Kikuyu as case studies. It demonstrates that practical ASR performance (WER < 13%) can be achieved with as little as 50 hours of training data, with significant improvements up to 200 hours (WER < 10%). The research also highlights the critical role of data quality in achieving robust system performance.

For more details on the experiments, data preparation, and full code, please refer to the accompanying GitHub repository.
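The model can be loaded through the Hugging Face `transformers` ASR pipeline. A minimal usage sketch (the audio file name and runtime settings below are illustrative, not part of the release):

```python
from transformers import pipeline

# Model ID from this card.
MODEL_ID = "akera/whisper-large-v3-kin-200h-v2"

def transcribe(audio_path: str) -> str:
    """Transcribe a Kinyarwanda audio file with the fine-tuned model."""
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    return asr(audio_path)["text"]
```

For example, `transcribe("sample.wav")` returns the transcription of `sample.wav` as a string; pass `device=0` (or a `torch_dtype`) to the pipeline to run on GPU.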

Training Configurations and Results

This model (akera/whisper-large-v3-kin-200h-v2) is one of several models trained and evaluated in the linked research. Below is a summary of the training configurations and their performance (WER, CER) on a dev_test[:300] subset, as reported in the GitHub repository.

| Config | Hours | Model ID on Hugging Face | WER (%) | CER (%) | Score |
|---|---|---|---|---|---|
| baseline.yaml | 0 | openai/whisper-large-v3 | 33.10 | 9.80 | 0.861 |
| train_1h.yaml | 1 | akera/whisper-large-v3-kin-1h-v2 | 47.63 | 16.97 | 0.754 |
| train_50h.yaml | 50 | akera/whisper-large-v3-kin-50h-v2 | 12.51 | 3.31 | 0.932 |
| train_100h.yaml | 100 | akera/whisper-large-v3-kin-100h-v2 | 10.90 | 2.84 | 0.943 |
| train_150h.yaml | 150 | akera/whisper-large-v3-kin-150h-v2 | 10.21 | 2.64 | 0.948 |
| train_200h.yaml | 200 | akera/whisper-large-v3-kin-200h-v2 | 9.82 | 2.56 | 0.951 |
| train_500h.yaml | 500 | akera/whisper-large-v3-kin-500h-v2 | 8.24 | 2.15 | 0.963 |
| train_1000h.yaml | 1000 | akera/whisper-large-v3-kin-1000h-v2 | 7.65 | 1.98 | 0.967 |
| train_full.yaml | ~1400 | akera/whisper-large-v3-kin-full | 7.14 | 1.88 | 0.970 |

Score = 1 - (0.6 × CER + 0.4 × WER)
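The composite score above can be computed directly; a small sketch, assuming WER and CER are supplied as fractions rather than percentages:

```python
def asr_score(wer: float, cer: float) -> float:
    """Composite quality score: 1 - (0.6 * CER + 0.4 * WER).

    `wer` and `cer` are fractions, e.g. 0.0982 for a WER of 9.82%.
    Higher is better; a perfect system (WER = CER = 0) scores 1.0.
    """
    return 1.0 - (0.6 * cer + 0.4 * wer)
```

The weighting favors CER, so the score rewards systems that get most characters right even when whole-word errors remain.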

Citation

@article{akera2025how,
    title={How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu},
    author={Benjamin Akera and Evelyn Nafula and Patrick Walukagga and Gilbert Yiga and John Quinn and Ernest Mwebaze},
    journal={arXiv preprint arXiv:2510.07221},
    year={2025}
}
Model size: 2B parameters (F32, Safetensors)