NuralVoiceSTT

Developed by Blink Digital

An accurate, universal English speech-to-text model optimized for both call-center (narrowband) and wideband audio.

Model Details

NuralVoiceSTT is an English (US) speech recognition model developed by Blink Digital. It provides high-accuracy speech-to-text conversion for both narrowband (call-center) and wideband audio formats.
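Vosk's file-based examples feed the recognizer mono, 16-bit, uncompressed PCM WAV audio. The sketch below uses only the standard-library wave module to check that format and label a file as narrowband or wideband by its sample rate. The 8 kHz cut-off reflects the conventional telephony rate and is an assumption, not a value documented for this model. describe_wav is a hypothetical helper name.

```python
import wave

# Assumption: telephony (narrowband) audio is conventionally sampled at 8 kHz;
# anything above that is treated as wideband here.
NARROWBAND_MAX_HZ = 8000

def describe_wav(path):
    """Return (sample_rate, band) for a mono 16-bit PCM WAV file.

    Raises ValueError if the file is not mono, 16-bit, uncompressed PCM.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio, got %d channels" % wf.getnchannels())
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples, got %d-byte samples" % wf.getsampwidth())
        if wf.getcomptype() != "NONE":
            raise ValueError("expected uncompressed PCM audio")
        rate = wf.getframerate()
    band = "narrowband" if rate <= NARROWBAND_MAX_HZ else "wideband"
    return rate, band
```

For example, describe_wav("call.wav") on an 8 kHz telephone recording would return (8000, "narrowband").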

Key Features

High Accuracy: Optimized for real-world speech recognition tasks
Universal Support: Works with both call-center (narrowband) and wideband audio
Dynamic Graph: Dynamic decoding graph architecture for flexible recognition
Speaker Adaptation: Built-in i-vector support for speaker adaptation
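In other Vosk models with a dynamic graph, a recognizer can be restricted to a small phrase list by passing a JSON grammar string as the third argument to KaldiRecognizer. This card does not state whether NuralVoiceSTT supports that, so the sketch below assumes it does. build_grammar and make_restricted_recognizer are hypothetical helper names, and lowercasing the phrases assumes the model's word list is lowercase. "[unk]" is the token Vosk grammars use for out-of-grammar speech.

```python
import json

def build_grammar(phrases, allow_unknown=True):
    """Serialize a phrase list into a JSON grammar string for KaldiRecognizer."""
    # Assumption: the model's vocabulary is lowercase, so normalize case and spacing.
    cleaned = [" ".join(p.lower().split()) for p in phrases if p.strip()]
    if allow_unknown:
        # "[unk]" lets the recognizer output a placeholder instead of forcing
        # out-of-grammar speech onto one of the listed phrases.
        cleaned.append("[unk]")
    return json.dumps(cleaned)

def make_restricted_recognizer(model, sample_rate, phrases):
    """Create a recognizer limited to `phrases` (assumes a dynamic-graph model)."""
    from vosk import KaldiRecognizer  # imported lazily so build_grammar works without vosk
    return KaldiRecognizer(model, sample_rate, build_grammar(phrases))
```

For a yes/no prompt in a call flow, make_restricted_recognizer(model, 8000, ["yes", "no"]) would limit output to those words or "[unk]".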

Model Architecture

Acoustic Model: Neural-network-based acoustic model
Language Model: Large-vocabulary language model with a dynamic decoding graph
Feature Extraction: MFCC-based feature extraction pipeline
Speaker Adaptation: I-vector based speaker adaptation system
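Kaldi-style configuration files, such as the MFCC configuration in conf/, store one --option=value pair per line. A small parser like the one below can read the MFCC sample frequency to confirm which input rate the model expects. The file name conf/mfcc.conf and the sample-frequency key are assumptions based on common Vosk model layouts, not documented for NuralVoiceSTT. parse_kaldi_conf and expected_sample_rate are hypothetical helpers.

```python
from pathlib import Path

def parse_kaldi_conf(path):
    """Parse a Kaldi-style config file of `--name=value` lines into a dict."""
    options = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line.startswith("--"):
            continue
        name, sep, value = line[2:].partition("=")
        # A bare boolean flag such as "--allow-downsample" means "true".
        options[name] = value if sep else "true"
    return options

def expected_sample_rate(model_dir):
    """Return the MFCC sample frequency in Hz, or None if it is not set."""
    # Assumption: the MFCC options live in conf/mfcc.conf inside the model directory.
    conf = parse_kaldi_conf(Path(model_dir) / "conf" / "mfcc.conf")
    freq = conf.get("sample-frequency")
    return int(float(freq)) if freq is not None else None
```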

Model Structure

am/ - Acoustic model files
conf/ - Configuration files (MFCC, model config)
graph/ - Language model graph files (FST files, phones, words)
ivector/ - I-vector files for speaker adaptation
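A quick sanity check after downloading is to confirm that the four top-level directories listed above are present. missing_model_dirs is a hypothetical helper; it only checks that the directories exist, not which files they contain.

```python
from pathlib import Path

# Top-level directories listed in the "Model Structure" section.
EXPECTED_DIRS = ("am", "conf", "graph", "ivector")

def missing_model_dirs(model_dir):
    """Return the expected subdirectories that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means the layout matches; otherwise the returned names show which parts of the download are missing.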

Usage

Python API

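import json
import sys
import wave

from vosk import Model, KaldiRecognizer, SetLogLevel

# 0 keeps the default log output; use -1 to silence Kaldi/Vosk logging
SetLogLevel(0)

# Load the NuralVoiceSTT model (the directory containing am/, conf/, graph/, ivector/)
model = Model("path/to/NuralVoiceSTT")

# Open the input audio; the recognizer is fed mono, 16-bit PCM WAV data
with wave.open("audio.wav", "rb") as wf:
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        sys.exit("Audio file must be WAV format, mono, 16-bit PCM")

    # Create a recognizer for the file's sample rate and request word-level timings
    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)

    # Feed audio in chunks of 4000 frames; print each completed utterance
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            print(result["text"])

# Flush any remaining buffered audio and print the last utterance
result = json.loads(rec.FinalResult())
print(result["text"])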
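Because the example calls rec.SetWords(True), each JSON string from Result() and FinalResult() also carries a "result" list whose entries have "word", "start", "end" (in seconds) and "conf" fields. The hypothetical helper below turns one such result into timestamped lines and can drop low-confidence words. The sample JSON in the tests is illustrative, not real model output.

```python
import json

def format_words(result_json, min_conf=0.0):
    """Convert a Vosk result JSON string into 'start-end word' lines.

    Words with confidence below min_conf are skipped. Results without a
    "result" list (for example, silence) produce an empty list.
    """
    result = json.loads(result_json)
    lines = []
    for w in result.get("result", []):
        if w.get("conf", 1.0) < min_conf:
            continue
        lines.append("%.2f-%.2f %s" % (w["start"], w["end"], w["word"]))
    return lines
```

Inside the recognition loop, format_words(rec.Result()) could replace the plain print(result["text"]) when timings are needed.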

Download from Hugging Face

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="ashishkblink/NuralVoiceSTT",
    local_dir="./models/NuralVoiceSTT"
)

Installation

Install Vosk, plus huggingface_hub if you use the download snippet:

pip install vosk huggingface_hub

Performance

NuralVoiceSTT is optimized for:

Call-Center Audio: High accuracy on narrowband telephone audio (typically sampled at 8 kHz)
Wideband Audio: Strong performance on high-quality recordings (typically sampled at 16 kHz)
Real-time Processing: Efficient enough for streaming, live transcription applications
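For live transcription, audio arrives as a stream of small chunks rather than from a file. The sketch below splits PCM bytes into fixed-duration chunks and feeds any iterable of chunks to a recognizer, collecting finalized segments and reporting partial hypotheses along the way. It relies only on the AcceptWaveform, Result, PartialResult and FinalResult methods of Vosk's KaldiRecognizer (PartialResult returns JSON with a "partial" field). chunk_pcm and transcribe_stream are hypothetical names, and the 0.25 s chunk length is an illustrative choice, not a tuned value.

```python
import json

def chunk_pcm(pcm, sample_rate, seconds=0.25, sample_width=2):
    """Split mono PCM bytes into chunks of about `seconds` of audio each."""
    step = int(sample_rate * seconds) * sample_width
    for i in range(0, len(pcm), step):
        yield pcm[i:i + step]

def transcribe_stream(rec, chunks, on_partial=None):
    """Feed audio chunks to a recognizer and return the finalized segment texts."""
    segments = []
    for data in chunks:
        if rec.AcceptWaveform(data):
            # An utterance boundary was detected: keep the finalized text.
            text = json.loads(rec.Result()).get("text", "")
            if text:
                segments.append(text)
        elif on_partial is not None:
            # Still inside an utterance: report the current hypothesis.
            on_partial(json.loads(rec.PartialResult()).get("partial", ""))
    # Flush audio still buffered in the recognizer at the end of the stream.
    text = json.loads(rec.FinalResult()).get("text", "")
    if text:
        segments.append(text)
    return segments
```

In a live setting, chunks would come from a microphone or telephony stream instead of chunk_pcm; passing print as on_partial shows hypotheses as they update.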

License

This model is distributed under the Apache License 2.0, which allows:

✅ Free use (including commercial)
✅ Modification
✅ Distribution

Citation

If you use this model, please cite:

@misc{nuralvoicestt2024,
  title={NuralVoiceSTT: High-Accuracy English Speech-to-Text Model},
  author={Blink Digital},
  year={2024},
  publisher={Blink Digital},
  url={https://huggingface.co/ashishkblink/NuralVoiceSTT}
}

About Blink Digital

NuralVoiceSTT is developed and maintained by Blink Digital, a provider of AI-powered speech recognition solutions. For more information, visit our Hugging Face profile.

Support

For questions, issues, or commercial inquiries, please contact Blink Digital through our Hugging Face profile.
