
πŸ“– Abstract

Most speech technology exists for high-resource languages, leaving low-resource languages like Idoma digitally underrepresented. This project bridges that gap by developing a fully functional Bidirectional English-Idoma Speech System.

The system follows an applied research design to address the data scarcity and tonal complexity inherent in the Idoma language. It integrates three fine-tuned transformer models to enable a natural conversational flow:

  • ASR (Automatic Speech Recognition): Fine-tuned Wav2Vec 2.0 XLS-R.
  • NMT (Neural Machine Translation): Fine-tuned NLLB-200.
  • TTS (Text-to-Speech): Fine-tuned VITS-based MMS-TTS.

Core Components

  • Speech-to-Text (STT): Converts input audio to text using Wav2Vec2 (Idoma) and Whisper Large v3 (English).
  • Machine Translation (NMT): Translates text between English and Idoma using NLLB-200.
  • Text-to-Speech (TTS): Synthesizes the translated text into speech using VITS (Idoma) and SpeechT5 (English).
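
Taken together, the three components form a speech-to-speech chain. Below is a minimal sketch of the English-to-Idoma direction using the Hugging Face transformers library. The mrheartng/ namespace for the STT and TTS checkpoints and the idu_Latn language code are assumptions (only mrheartng/idu-eng-translator is confirmed on this page), so treat this as illustrative rather than the project's actual app.py:

```python
# Minimal sketch of the English -> Idoma direction. Repo IDs follow the
# table under "Software Artefacts & Models"; the mrheartng/ namespace and
# the "idu_Latn" code are assumptions, not confirmed by this README.
import torch
from transformers import pipeline, VitsModel, AutoTokenizer

# 1) English speech -> English text (Whisper Large v3)
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
english_text = asr("input.wav")["text"]

# 2) English text -> Idoma text (fine-tuned NLLB-200)
nmt = pipeline(
    "translation",
    model="mrheartng/idu-eng-translator",
    src_lang="eng_Latn",
    tgt_lang="idu_Latn",  # assumed FLORES-style code added during fine-tuning
)
idoma_text = nmt(english_text)[0]["translation_text"]

# 3) Idoma text -> Idoma speech (fine-tuned VITS / MMS-TTS)
tts = VitsModel.from_pretrained("mrheartng/idoma-mms-tts-eng")  # assumed namespace
tok = AutoTokenizer.from_pretrained("mrheartng/idoma-mms-tts-eng")
inputs = tok(idoma_text, return_tensors="pt")
with torch.no_grad():
    waveform = tts(**inputs).waveform  # (1, n_samples) at tts.config.sampling_rate
```

The Idoma-to-English direction mirrors this chain with Wav2Vec2 for recognition and SpeechT5 for synthesis.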

πŸ“Š Performance Results

The system was evaluated on a held-out test set. Below are the quantitative evaluation metrics reported in the thesis:

| Component | Model Architecture | Metric | Score |
| --- | --- | --- | --- |
| Idoma STT | Wav2Vec 2.0 XLS-R | Word Error Rate (WER) | 11.43% |
| Idoma STT | Wav2Vec 2.0 XLS-R | Character Error Rate (CER) | 3.5% |
| NMT Engine | NLLB-200 | BLEU | 31.42 |
| NMT Engine | NLLB-200 | spBLEU | 33.25 |
| NMT Engine | NLLB-200 | ChrF++ | 50.51 |
| Idoma TTS | VITS MMS-TTS | MOS (Intelligibility) | 4.36 / 5.0 |
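
For context on the STT rows: WER and CER are normalized edit distances, WER = (S + D + I) / N over words and CER the same ratio over characters. An illustrative computation with the jiwer library (not necessarily the evaluation script used in the thesis) is shown below:

```python
# Illustrative WER/CER computation with jiwer; toy English strings,
# not data from the thesis.
from jiwer import cer, wer

reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"

print(wer(reference, hypothesis))  # (substitutions + deletions + insertions) / reference word count
print(cer(reference, hypothesis))  # same edit-distance ratio over characters
```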

πŸ”— Software Artefacts & Models

All models and datasets developed during this research are hosted on Hugging Face and are automatically downloaded by this application upon first run.

| Function | Model Name | Hugging Face Link |
| --- | --- | --- |
| STT | wav2vec2-xls-r-1b-finetuned-idoma | View Model |
| TTS | idoma-mms-tts-eng | View Model |
| NMT | idu-eng-translator | View Model |
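
Since the checkpoints are fetched automatically on first run, you can optionally prefetch them. A sketch with huggingface_hub follows; only mrheartng/idu-eng-translator is confirmed on this page, and the other two repo IDs assume the same namespace:

```python
# Optional pre-download so the app starts without waiting on the Hub.
from huggingface_hub import snapshot_download

for repo_id in (
    "mrheartng/wav2vec2-xls-r-1b-finetuned-idoma",  # assumed namespace
    "mrheartng/idoma-mms-tts-eng",                  # assumed namespace
    "mrheartng/idu-eng-translator",
):
    snapshot_download(repo_id)
```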

Curated Datasets

πŸ› οΈ Installation & Usage

Prerequisites

  • Python 3.8+
  • CUDA-enabled GPU (Recommended for faster inference)
  • FFmpeg (Required for audio processing)

Setup

Clone the repository:

git clone https://github.com/mrheart/idoma-english-bidirectional-tts-stt.git
cd idoma-english-bidirectional-tts-stt/idoma_translator

Install dependencies:

pip install -r requirements.txt

Run the application:

python app.py

Access the UI: Open your browser to the local Gradio URL provided in the terminal (usually http://127.0.0.1:7860).
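
The running app can also be driven programmatically. The sketch below uses gradio_client; the endpoint names exposed by app.py are not documented here, so it only discovers them rather than calling a specific one:

```python
# Connect to the locally running Gradio app and list its API endpoints.
# app.py's endpoint names are not documented in this README, so inspect
# them with view_api() before calling client.predict(...).
from gradio_client import Client

client = Client("http://127.0.0.1:7860")
client.view_api()  # prints available endpoints and their signatures
```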

πŸ“‚ Project Structure

β”œβ”€β”€ config/             # Configuration settings and model IDs
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ services/       # Core inference logic (ASR, NMT, TTS)
β”‚   β”œβ”€β”€ model_loader.py # Singleton pattern for memory-efficient model loading
β”‚   └── utils.py        # Audio processing utilities
β”œβ”€β”€ tests/              # Unit tests
β”œβ”€β”€ app.py              # Main entry point (Gradio Application)
└── requirements.txt    # Project dependencies
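
The singleton loading mentioned for src/model_loader.py can be approximated as below; this is a sketch of the pattern under assumed structure, and the real module may differ in detail:

```python
# Sketch of the singleton idea behind src/model_loader.py (assumed
# structure): each checkpoint is loaded at most once per process and
# the cached instance is reused by every request.
from functools import lru_cache
from transformers import pipeline

@lru_cache(maxsize=None)
def get_pipeline(task: str, model_id: str):
    # First call loads the weights; subsequent calls hit the cache.
    return pipeline(task, model=model_id)
```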

πŸ“ Citation

If you use the code, models, or datasets from this project in your research, please cite the thesis:

@mastersthesis{Ojabo2025Idoma,
  author  = {Ojabo, John Heart},
  title   = {Developing a Bidirectional English-Idoma Speech-to-Text (STT) and Text-to-Speech (TTS) System for Enhanced Communication and Language Preservation},
  school  = {University of Essex},
  year    = {2025}
}

πŸ“§ Contact

For questions regarding the dataset or model fine-tuning parameters, please open an issue in this repository or contact the author via Hugging Face.
