akera/whisper-large-v3-kin-200h-v2
This model is a fine-tuned Whisper large-v3 model for Automatic Speech Recognition (ASR) in Kinyarwanda, trained on 200 hours of labeled speech data. It was developed as part of the research presented in the paper *How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu* (arXiv:2510.07221).
The paper investigates the minimum data volumes required for viable ASR performance in low-resource African languages, using Kinyarwanda and Kikuyu as case studies. It demonstrates that practical ASR performance (WER < 13%) can be achieved with as little as 50 hours of training data, with significant improvements up to 200 hours (WER < 10%). The research also highlights the critical role of data quality in achieving robust system performance.
For more details on the experiments, data preparation, and full code, please refer to the accompanying GitHub repository.
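A minimal usage sketch with the Hugging Face `transformers` ASR pipeline is shown below; `audio.wav` is a hypothetical local recording (the pipeline's feature extractor resamples input audio to the 16 kHz rate Whisper expects):

```python
# Minimal sketch: transcribe a Kinyarwanda recording with this checkpoint.
# "audio.wav" is a hypothetical placeholder path, not a file shipped with
# the model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="akera/whisper-large-v3-kin-200h-v2",
)

result = asr("audio.wav")
print(result["text"])
```

For long recordings, the pipeline also accepts a `chunk_length_s` argument to transcribe audio in overlapping windows.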
Training Configurations and Results
This model (`akera/whisper-large-v3-kin-200h-v2`) is one of several models trained and evaluated in the linked research. Below is a summary of the training configurations and their performance (WER, CER) on a `dev_test[:300]` subset, as reported in the GitHub repository.
| Config | Hours | Model ID on Hugging Face | WER (%) | CER (%) | Score |
|---|---|---|---|---|---|
| `baseline.yaml` | 0 | openai/whisper-large-v3 | 33.10 | 9.80 | 0.861 |
| `train_1h.yaml` | 1 | akera/whisper-large-v3-kin-1h-v2 | 47.63 | 16.97 | 0.754 |
| `train_50h.yaml` | 50 | akera/whisper-large-v3-kin-50h-v2 | 12.51 | 3.31 | 0.932 |
| `train_100h.yaml` | 100 | akera/whisper-large-v3-kin-100h-v2 | 10.90 | 2.84 | 0.943 |
| `train_150h.yaml` | 150 | akera/whisper-large-v3-kin-150h-v2 | 10.21 | 2.64 | 0.948 |
| `train_200h.yaml` | 200 | akera/whisper-large-v3-kin-200h-v2 | 9.82 | 2.56 | 0.951 |
| `train_500h.yaml` | 500 | akera/whisper-large-v3-kin-500h-v2 | 8.24 | 2.15 | 0.963 |
| `train_1000h.yaml` | 1000 | akera/whisper-large-v3-kin-1000h-v2 | 7.65 | 1.98 | 0.967 |
| `train_full.yaml` | ~1400 | akera/whisper-large-v3-kin-full | 7.14 | 1.88 | 0.970 |
Score = 1 − (0.6 × CER + 0.4 × WER), with WER and CER expressed as fractions.
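The composite score can be computed with a small helper. This is a sketch assuming the percentage metrics are converted to fractions before weighting; the scores reported in the table may be rounded or computed slightly differently in the repository, so minor discrepancies are possible:

```python
def composite_score(wer_pct: float, cer_pct: float) -> float:
    """Score = 1 - (0.6 * CER + 0.4 * WER), with WER and CER given
    as percentages and converted to fractions (assumed convention)."""
    wer, cer = wer_pct / 100.0, cer_pct / 100.0
    return 1.0 - (0.6 * cer + 0.4 * wer)

# Example: the 200-hour model's dev_test[:300] metrics (WER 9.82, CER 2.56).
print(round(composite_score(9.82, 2.56), 3))
```

The CER term carries the larger weight (0.6), so the score rewards character-level accuracy more heavily than word-level accuracy.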
Citation
```bibtex
@article{akera2025asr,
  title={How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu},
  author={Akera, Benjamin and Nafula, Evelyn and Walukagga, Patrick and Yiga, Gilbert and Quinn, John and Mwebaze, Ernest},
  journal={arXiv preprint arXiv:2510.07221},
  year={2025}
}
```