# Kinyarwanda Whisper Large v3 (Full Data)
This model, `akera/whisper-large-v3-kin-full`, is a fine-tuned version of `openai/whisper-large-v3` for automatic speech recognition (ASR) in Kinyarwanda. It was presented in the paper *How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu*.

The model was trained on approximately 1,400 hours of transcribed Kinyarwanda speech.
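For quick inference outside the evaluation harness, the model can be loaded with the standard Hugging Face `transformers` ASR pipeline. The sketch below is illustrative: the audio path is a placeholder, and the dtype, device, and chunking settings are common defaults rather than values from the paper.

```python
import torch
from transformers import pipeline

# Load the fine-tuned checkpoint as an automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="akera/whisper-large-v3-kin-full",
    torch_dtype=torch.float16,
    device="cuda:0",  # use "cpu" if no GPU is available
)

# "sample.wav" is a placeholder path to a Kinyarwanda recording.
result = asr("sample.wav", chunk_length_s=30)
print(result["text"])
```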
## Paper Abstract

The abstract of the paper is as follows:
The development of Automatic Speech Recognition (ASR) systems for low-resource African languages remains challenging due to limited transcribed speech data. While recent advances in large multilingual models like OpenAI's Whisper offer promising pathways for low-resource ASR development, critical questions persist regarding practical deployment requirements. This paper addresses two fundamental concerns for practitioners: determining the minimum data volumes needed for viable performance and characterizing the primary failure modes that emerge in production systems. We evaluate Whisper's performance through comprehensive experiments on two Bantu languages: systematic data scaling analysis on Kinyarwanda using training sets from 1 to 1,400 hours, and detailed error characterization on Kikuyu using 270 hours of training data. Our scaling experiments demonstrate that practical ASR performance (WER < 13%) becomes achievable with as little as 50 hours of training data, with substantial improvements continuing through 200 hours (WER < 10%). Complementing these volume-focused findings, our error analysis reveals that data quality issues, particularly noisy ground truth transcriptions, account for 38.6% of high-error cases, indicating that careful data curation is as critical as data volume for robust system performance. These results provide actionable benchmarks and deployment guidance for teams developing ASR systems across similar low-resource language contexts. We release accompanying code and models; see this https URL.
## Code and Project Page

Find the code and more details at the [GitHub repository](https://github.com/SunbirdAI/kinyarwanda-whisper-eval).
## Installation (from GitHub)

To set up the environment and run experiments from the GitHub repository:

```bash
git clone https://github.com/SunbirdAI/kinyarwanda-whisper-eval.git
cd kinyarwanda-whisper-eval
uv sync
```
Install SALT:

```bash
git clone https://github.com/SunbirdAI/salt.git
uv pip install -r salt/requirements.txt
```
Set up the environment:

```bash
cp env_example .env
```
Fill in your .env with MLflow and Hugging Face credentials.
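As a rough sketch of what the scripts expect, the credentials can be loaded from `.env` at runtime; the variable names below (`HF_TOKEN`, `MLFLOW_TRACKING_URI`) are assumptions based on common conventions, so check `env_example` for the actual keys.

```python
import os

from dotenv import load_dotenv   # pip install python-dotenv
from huggingface_hub import login

load_dotenv()  # reads key=value pairs from .env into the environment

# HF_TOKEN and MLFLOW_TRACKING_URI are assumed names; see env_example.
login(token=os.environ["HF_TOKEN"])       # authenticate with Hugging Face
print(os.environ["MLFLOW_TRACKING_URI"])  # picked up automatically by mlflow
```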
## Evaluation

To evaluate this model (or any other Hugging Face ASR model) using the provided evaluation script:

```bash
uv run python eval.py --model_path akera/whisper-large-v3-kin-full --batch_size=8
```
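The script reports word and character error rates; a stripped-down illustration of that computation with the `jiwer` library is below. This is not the repository's actual `eval.py`, and the transcript strings are placeholders.

```python
import jiwer

# Placeholder reference transcripts and model outputs; in the real script
# these would come from the dev_test split and the Whisper checkpoint.
references = ["muraho neza", "amakuru yawe"]
hypotheses = ["muraho neza", "amakuru yacu"]

wer = jiwer.wer(references, hypotheses)  # corpus-level word error rate
cer = jiwer.cer(references, hypotheses)  # corpus-level character error rate

print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```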
## Performance Results

Evaluation on the `dev_test[:300]` subset (as reported in the paper and GitHub repository):
| Model | Hours | WER (%) | CER (%) | Score |
|---|---|---|---|---|
| openai/whisper-large-v3 | 0 | 33.10 | 9.80 | 0.861 |
| akera/whisper-large-v3-kin-1h-v2 | 1 | 47.63 | 16.97 | 0.754 |
| akera/whisper-large-v3-kin-50h-v2 | 50 | 12.51 | 3.31 | 0.932 |
| akera/whisper-large-v3-kin-100h-v2 | 100 | 10.90 | 2.84 | 0.943 |
| akera/whisper-large-v3-kin-150h-v2 | 150 | 10.21 | 2.64 | 0.948 |
| akera/whisper-large-v3-kin-200h-v2 | 200 | 9.82 | 2.56 | 0.951 |
| akera/whisper-large-v3-kin-500h-v2 | 500 | 8.24 | 2.15 | 0.963 |
| akera/whisper-large-v3-kin-1000h-v2 | 1000 | 7.65 | 1.98 | 0.967 |
| **akera/whisper-large-v3-kin-full** | ~1400 | 7.14 | 1.88 | 0.970 |
Score = 1 - (0.6 × CER + 0.4 × WER)
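A literal reading of this formula in code, assuming WER and CER are supplied as fractions (e.g. 0.0714 for 7.14%); whether the reported Score column is computed per utterance or over the whole corpus is not stated here.

```python
def score(wer: float, cer: float) -> float:
    """Composite score: 1 - (0.6 * CER + 0.4 * WER).

    Assumes wer and cer are fractions in [0, 1], e.g. 0.0714 for 7.14%.
    """
    return 1.0 - (0.6 * cer + 0.4 * wer)
```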