## Abstract
Most speech technology targets high-resource languages, leaving low-resource languages such as Idoma digitally underrepresented. This project bridges that gap by developing a fully functional bidirectional English-Idoma speech system.
The system follows an applied research design to address the data scarcity and tonal complexity inherent in the Idoma language. It integrates three fine-tuned transformer models to enable natural conversation flow:
- ASR (Automatic Speech Recognition): Fine-tuned Wav2Vec 2.0 XLS-R.
- NMT (Neural Machine Translation): Fine-tuned NLLB-200 (base checkpoint: facebook/nllb-200-1.3B).
- TTS (Text-to-Speech): Fine-tuned VITS-based MMS-TTS.
### Core Components
- Speech-to-Text (STT): Converts input audio to text using Wav2Vec2 (Idoma) and Whisper Large v3 (English).
- Machine Translation (NMT): Translates text between English and Idoma using NLLB-200.
- Text-to-Speech (TTS): Synthesizes the translated text into speech using VITS (Idoma) and SpeechT5 (English).
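At inference time these components run back to back. The following minimal sketch shows the English-to-Idoma direction with Hugging Face `transformers` pipelines; note two assumptions not guaranteed by this README: the Idoma model IDs use the `mrheartng` namespace seen in the dataset list below, and `idu_Latn` is assumed to be the Idoma language code added to the fine-tuned NLLB tokenizer.

```python
# Minimal sketch of the English -> Idoma direction using Hugging Face
# `transformers` pipelines. ASSUMPTIONS: the Idoma repos live under the
# `mrheartng` namespace, and "idu_Latn" is the Idoma code added to the
# fine-tuned NLLB tokenizer; neither is guaranteed by this README.
from transformers import pipeline

# 1) STT: transcribe English speech (Whisper Large v3, per the list above).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
english_text = asr("english_input.wav")["text"]

# 2) NMT: translate English -> Idoma with the fine-tuned NLLB-200 model.
nmt = pipeline(
    "translation",
    model="mrheartng/idu-eng-translator",
    src_lang="eng_Latn",
    tgt_lang="idu_Latn",  # assumed language code for Idoma
)
idoma_text = nmt(english_text)[0]["translation_text"]

# 3) TTS: synthesize Idoma speech with the fine-tuned VITS MMS-TTS model.
tts = pipeline("text-to-speech", model="mrheartng/idoma-mms-tts-eng")
speech = tts(idoma_text)  # dict with "audio" waveform and "sampling_rate"
```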
## Performance Results
The system was evaluated on a held-out test set. The table below lists the quantitative metrics reported in the thesis:
| Component | Model Architecture | Metric | Score |
|---|---|---|---|
| Idoma STT | Wav2Vec 2.0 XLS-R | Word Error Rate (WER) | 11.43% |
| Idoma STT | Wav2Vec 2.0 XLS-R | Character Error Rate (CER) | 3.5% |
| NMT Engine | NLLB-200 | BLEU | 31.42 |
| NMT Engine | NLLB-200 | spBLEU | 33.25 |
| NMT Engine | NLLB-200 | ChrF++ | 50.51 |
| Idoma TTS | VITS MMS-TTS | MOS (Intelligibility) | 4.36 / 5.0 |
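To reproduce these metrics on your own splits, the common `jiwer` (WER/CER) and `sacrebleu` (BLEU/spBLEU/ChrF++) packages can be used; this is a hedged sketch of the standard computations, not necessarily the exact tooling used in the thesis:

```python
# Hedged sketch of the evaluation metrics above, using `jiwer` and
# `sacrebleu`. Scores will only match the thesis numbers on the same
# test split and text normalization.
import jiwer
import sacrebleu

references = ["ref sentence one", "ref sentence two"]   # gold text
hypotheses = ["hyp sentence one", "hyp sentence two"]   # model output

wer = jiwer.wer(references, hypotheses)                 # Word Error Rate
cer = jiwer.cer(references, hypotheses)                 # Character Error Rate

bleu = sacrebleu.corpus_bleu(hypotheses, [references])  # BLEU
spbleu = sacrebleu.corpus_bleu(                         # spBLEU uses the
    hypotheses, [references], tokenize="flores200"      # FLORES-200 tokenizer
)
chrf = sacrebleu.corpus_chrf(                           # word_order=2 gives
    hypotheses, [references], word_order=2              # ChrF++ (not plain ChrF)
)

print(f"WER={wer:.2%} CER={cer:.2%} BLEU={bleu.score:.2f} "
      f"spBLEU={spbleu.score:.2f} ChrF++={chrf.score:.2f}")
```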
## Software Artefacts & Models
All models and datasets developed during this research are hosted on Hugging Face and are automatically downloaded by this application upon first run.
| Function | Model Name | Hugging Face Link |
|---|---|---|
| STT | wav2vec2-xls-r-1b-finetuned-idoma | View Model |
| TTS | idoma-mms-tts-eng | View Model |
| NMT | idu-eng-translator | View Model |
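The artefacts can also be loaded directly with explicit model classes rather than through this application. A minimal sketch, assuming all three repos are published under the same `mrheartng` namespace as the datasets below (only `idu-eng-translator` is confirmed by its model page):

```python
# Minimal sketch of loading each artefact with explicit model classes.
# ASSUMPTION: the repos live under the `mrheartng` namespace; only
# `mrheartng/idu-eng-translator` is confirmed by its model page.
from transformers import (
    AutoModelForCTC,        # Wav2Vec 2.0 CTC head for STT
    AutoModelForSeq2SeqLM,  # NLLB-200 encoder-decoder for NMT
    AutoProcessor,
    AutoTokenizer,
    VitsModel,              # VITS-based MMS-TTS for speech synthesis
)

stt_processor = AutoProcessor.from_pretrained("mrheartng/wav2vec2-xls-r-1b-finetuned-idoma")
stt_model = AutoModelForCTC.from_pretrained("mrheartng/wav2vec2-xls-r-1b-finetuned-idoma")

nmt_tokenizer = AutoTokenizer.from_pretrained("mrheartng/idu-eng-translator")
nmt_model = AutoModelForSeq2SeqLM.from_pretrained("mrheartng/idu-eng-translator")

tts_tokenizer = AutoTokenizer.from_pretrained("mrheartng/idoma-mms-tts-eng")
tts_model = VitsModel.from_pretrained("mrheartng/idoma-mms-tts-eng")
```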
### Curated Datasets
- Adah-Idoma Dataset: mrheartng/adah-idoma
- Idoma TTS (Speaker 1): mrheartng/idoma-tts-speaker1
- Idoma STT (Multi-speaker): mrheartng/idoma-tts-multiple-speakers
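The datasets can be pulled programmatically with the `datasets` library; split and column names are best inspected after loading:

```python
# Pull the curated datasets from the Hugging Face Hub; inspect the
# returned DatasetDict for the actual split and column names.
from datasets import load_dataset

adah = load_dataset("mrheartng/adah-idoma")
tts_speaker1 = load_dataset("mrheartng/idoma-tts-speaker1")
stt_multi = load_dataset("mrheartng/idoma-tts-multiple-speakers")

print(adah)  # shows splits, features, and row counts
```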
## Installation & Usage
### Prerequisites
- Python 3.8+
- CUDA-enabled GPU (Recommended for faster inference)
- FFmpeg (Required for audio processing)
### Setup
1. Clone the repository:

```bash
git clone https://github.com/mrheart/idoma-english-bidirectional-tts-stt.git
cd idoma-english-bidirectional-tts-stt/idoma_translator
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Run the application:

```bash
python app.py
```

4. Access the UI: open your browser at the local Gradio URL printed in the terminal (usually http://127.0.0.1:7860).
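For orientation, the sketch below shows the general shape of a Gradio entry point like `app.py`; it is an illustrative skeleton with a placeholder function, not the repository's actual code:

```python
# Illustrative Gradio skeleton (NOT the repository's actual app.py):
# wires a speech-in/speech-out function into a web UI that Gradio
# serves at http://127.0.0.1:7860 by default.
import gradio as gr

def translate_speech(audio_path: str, direction: str) -> str:
    # Placeholder: the real app chains STT -> NMT -> TTS here and
    # returns the path of the synthesized audio file.
    return audio_path

demo = gr.Interface(
    fn=translate_speech,
    inputs=[gr.Audio(type="filepath"),
            gr.Radio(["English to Idoma", "Idoma to English"])],
    outputs=gr.Audio(),
    title="Bidirectional English-Idoma Speech System",
)

if __name__ == "__main__":
    demo.launch()
```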
## Project Structure
```
├── config/              # Configuration settings and model IDs
├── src/
│   ├── services/        # Core inference logic (ASR, NMT, TTS)
│   ├── model_loader.py  # Singleton pattern for memory-efficient model loading
│   └── utils.py         # Audio processing utilities
├── tests/               # Unit tests
├── app.py               # Main entry point (Gradio Application)
└── requirements.txt     # Project dependencies
```
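The `model_loader.py` entry above refers to a singleton pattern for memory-efficient model loading. A minimal sketch of that idea follows; the function name and structure are hypothetical, not the repository's actual implementation:

```python
# Hypothetical sketch of the singleton idea behind model_loader.py:
# each checkpoint is loaded at most once and cached for reuse, so
# repeated requests do not reload multi-gigabyte weights.
from functools import lru_cache
from transformers import pipeline

@lru_cache(maxsize=None)
def get_pipeline(task: str, model_id: str):
    """Return a cached transformers pipeline, loading it on first use."""
    return pipeline(task, model=model_id)

asr_a = get_pipeline("automatic-speech-recognition", "openai/whisper-large-v3")
asr_b = get_pipeline("automatic-speech-recognition", "openai/whisper-large-v3")
assert asr_a is asr_b  # second call reuses the cached object
```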
## Citation
If you use this code, models, or dataset in your research, please cite the thesis:
```bibtex
@mastersthesis{Ojabo2025Idoma,
  author = {Ojabo, John Heart},
  title  = {Developing a Bidirectional English-Idoma Speech-to-Text (STT) and Text-to-Speech (TTS) System for Enhanced Communication and Language Preservation},
  school = {University of Essex},
  year   = {2025}
}
```
## Contact
For questions regarding the dataset or model fine-tuning parameters, please open an issue in this repository or contact the author via Hugging Face.