NuralVoiceSTT
Developed by Blink Digital
Accurate, universal English speech-to-text model optimized for both call-center (narrowband) and wideband audio.
Model Details
NuralVoiceSTT is an English (US) speech recognition model developed by Blink Digital for high-accuracy speech-to-text conversion. It supports both narrowband (call-center telephone) and wideband audio.
Key Features
- High Accuracy: Optimized for real-world speech recognition tasks
- Universal Support: Works with both call-center and wideband audio
- Dynamic Graph: Advanced dynamic graph architecture for flexible recognition
- Speaker Adaptation: Built-in i-vector support for speaker adaptation
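In Vosk, runtime vocabulary restriction generally requires a model with a dynamic graph: a JSON list of phrases is passed as a third argument to `KaldiRecognizer`. Assuming NuralVoiceSTT's dynamic graph supports this in the same way, the sketch below builds such a grammar string. The helper name `build_grammar` is hypothetical, lower-casing assumes a lowercase vocabulary as in Vosk's English models, and the trailing `"[unk]"` entry lets speech outside the list map to an unknown token instead of being forced onto a listed phrase.

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Return a JSON phrase list for KaldiRecognizer(model, rate, grammar).

    Hypothetical helper: phrases are lower-cased (assuming a lowercase
    vocabulary), whitespace is collapsed, and duplicates are dropped while
    keeping the original order.
    """
    cleaned = []
    for phrase in phrases:
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized not in cleaned:
            cleaned.append(normalized)
    if allow_unknown:
        # "[unk]" absorbs speech that matches none of the phrases
        cleaned.append("[unk]")
    return json.dumps(cleaned)


if __name__ == "__main__":
    grammar = build_grammar(["Yes", "No", "speak to an agent", "yes"])
    print(grammar)
    # With vosk installed and the model loaded (assumption):
    # rec = KaldiRecognizer(model, 16000, grammar)
```

A phrase list like this suits call-center menus, where only a handful of answers are expected.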
Model Architecture
- Acoustic Model: Advanced neural network-based acoustic modeling
- Language Model: Large vocabulary language model with dynamic graph
- Feature Extraction: MFCC-based feature extraction pipeline
- Speaker Adaptation: I-vector based speaker adaptation system
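For reference, the textbook i-vector model (the usual basis for i-vector speaker adaptation; this card does not describe NuralVoiceSTT's exact setup) writes an utterance's GMM mean supervector as a speaker- and channel-independent mean plus a low-rank offset:

```latex
\mathbf{M} = \mathbf{m} + \mathbf{T}\,\mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Here $\mathbf{m}$ is the universal background model (UBM) mean supervector, $\mathbf{T}$ is the low-rank total variability matrix, and the i-vector is the posterior mean of $\mathbf{w}$ given the utterance's frames. In Kaldi-style neural acoustic models this vector is typically fed to the network alongside the MFCC features, which lets the acoustic model adjust to each speaker and channel.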
Model Structure
- am/: Acoustic model files
- conf/: Configuration files (MFCC, model config)
- graph/: Language model graph files (FST files, phones, words)
- ivector/: I-vector files for speaker adaptation
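Because the recognizer loads these sub-directories from the model path, a quick layout check after downloading can catch a wrong or incomplete path before loading. A minimal stdlib sketch; the function name `missing_model_dirs` and the default path are illustrative assumptions, and it only checks that the four directories exist, not their contents.

```python
from pathlib import Path

# Sub-directories listed in the model structure
EXPECTED_DIRS = ("am", "conf", "graph", "ivector")


def missing_model_dirs(model_dir):
    """Return the expected sub-directories that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "./models/NuralVoiceSTT"
    missing = missing_model_dirs(path)
    if missing:
        print(f"{path} is missing: {', '.join(missing)}")
    else:
        print(f"{path} has the expected model layout")
```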
Usage
Python API
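```python
import json
import wave

from vosk import KaldiRecognizer, Model, SetLogLevel

# 0 keeps Kaldi's informational logging; use -1 to silence it
SetLogLevel(0)

# Load the NuralVoiceSTT model from its local directory
model = Model("path/to/NuralVoiceSTT")

# Open the audio file; the recognizer expects mono, 16-bit PCM WAV
with wave.open("audio.wav", "rb") as wf:
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        raise ValueError("Audio must be a mono, 16-bit PCM WAV file")

    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)  # include per-word timings in the results

    # Recognize speech by feeding the audio in chunks of 4000 frames
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            # A complete utterance was recognized
            result = json.loads(rec.Result())
            print(result["text"])

    # Flush the recognizer to get the text for the remaining audio
    result = json.loads(rec.FinalResult())
    print(result["text"])
```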
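Because the example calls `rec.SetWords(True)`, each parsed result from `Result()` or `FinalResult()` also carries a `"result"` list whose entries have `"word"`, `"start"`, `"end"` (in seconds) and `"conf"` fields. A minimal sketch for turning a parsed result into `(word, start, end)` tuples; the function name `word_timings` is hypothetical, and the sample string is illustrative, not real model output.

```python
import json


def word_timings(result):
    """Return (word, start_s, end_s) tuples from a parsed Vosk result dict.

    The "result" list is only present when SetWords(True) is enabled and at
    least one word was recognized, so a missing key yields an empty list.
    """
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


if __name__ == "__main__":
    # Illustrative result in the shape Vosk returns, not real model output
    sample = json.loads(
        '{"result": [{"conf": 1.0, "end": 0.9, "start": 0.42, "word": "hello"},'
        ' {"conf": 0.97, "end": 1.5, "start": 0.93, "word": "world"}],'
        ' "text": "hello world"}'
    )
    for word, start, end in word_timings(sample):
        print(f"{start:6.2f}-{end:6.2f}s  {word}")
```

In the loop above, `word_timings(result)` can be called right after each `json.loads(rec.Result())`.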
Download from Hugging Face
```python
from huggingface_hub import snapshot_download

# Download the model repository; model_path is the local folder path
model_path = snapshot_download(
    repo_id="ashishkblink/NuralVoiceSTT",
    local_dir="./models/NuralVoiceSTT",
)
```
Installation
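```shell
# vosk runs the model; huggingface_hub is needed only for the download step
pip install vosk huggingface_hub
```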
Performance
NuralVoiceSTT is optimized for:
- Call-Center Audio: High accuracy on narrowband telephone audio
- Wideband Audio: Excellent performance on high-quality audio recordings
- Real-time Processing: Efficient for live transcription applications
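For live transcription, latency depends partly on how much audio is passed to `AcceptWaveform` at a time: the example's 4000-frame chunks are 0.25 s at 16 kHz but 0.5 s at 8 kHz narrowband audio. A minimal sketch, assuming mono 16-bit PCM input, that yields chunks of a fixed duration from any `wave` reader; the helper name `pcm_chunks` and the 200 ms default are illustrative choices, not values from this model card.

```python
import io
import wave


def pcm_chunks(wf, chunk_ms=200):
    """Yield raw PCM byte chunks of about chunk_ms milliseconds from a wave reader."""
    # Frames per chunk = sample rate (frames/s) * chunk duration (s)
    frames_per_chunk = max(1, wf.getframerate() * chunk_ms // 1000)
    while True:
        data = wf.readframes(frames_per_chunk)
        if not data:
            break
        yield data


if __name__ == "__main__":
    # One second of 8 kHz, 16-bit mono silence built in memory (telephone-style audio)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 8000)
    buf.seek(0)

    with wave.open(buf, "rb") as wf:
        sizes = [len(chunk) for chunk in pcm_chunks(wf, chunk_ms=200)]
    print(sizes)  # -> [3200, 3200, 3200, 3200, 3200]
```

In a real-time loop, each chunk would go to `rec.AcceptWaveform(chunk)`, with `rec.PartialResult()` giving the in-progress hypothesis between complete results. Smaller chunks update partial results more often at the cost of more calls.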
License
This model is distributed under the Apache License 2.0, which allows:
- Free use (including commercial)
- Modification
- Distribution
Citation
If you use this model, please cite:
```bibtex
@misc{nuralvoicestt2024,
  title={NuralVoiceSTT: High-Accuracy English Speech-to-Text Model},
  author={{Blink Digital}},
  year={2024},
  publisher={Blink Digital},
  url={https://huggingface.co/ashishkblink/NuralVoiceSTT}
}
```
About Blink Digital
NuralVoiceSTT is developed and maintained by Blink Digital, a leading provider of AI-powered speech recognition solutions. For more information, visit our Hugging Face profile.
Support
For questions, issues, or commercial inquiries, please contact Blink Digital through our Hugging Face profile.