NuralVoiceSTT

Developed by Blink Digital

Universal English speech-to-text model optimized for both call-center (narrowband) and wideband audio.

Model Details

NuralVoiceSTT is an English (US) speech recognition model developed by Blink Digital. It converts speech to text for both narrowband (call-center) and wideband audio, and is loaded through the Vosk Python API shown under Usage.

Key Features

  • High Accuracy: Optimized for real-world speech recognition tasks
  • Universal Support: Works with both call-center (narrowband) and wideband audio
  • Dynamic Graph: Advanced dynamic graph architecture for flexible recognition
  • Speaker Adaptation: Built-in i-vector support for speaker adaptation

Model Architecture

  • Acoustic Model: Advanced neural network-based acoustic modeling
  • Language Model: Large vocabulary language model with dynamic graph
  • Feature Extraction: MFCC-based feature extraction pipeline
  • Speaker Adaptation: I-vector based speaker adaptation system

Model Structure

  • am/ - Acoustic model files
  • conf/ - Configuration files (MFCC, model config)
  • graph/ - Language model graph files (FST files, phones, words)
  • ivector/ - I-vector files for speaker adaptation
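Before loading the model, it can help to confirm that a downloaded copy has the layout listed above. The following is a minimal sketch, not part of the model distribution: the helper name `missing_model_dirs` is hypothetical, and it checks only for the four directories named in this card, not for the individual files inside them.

```python
from pathlib import Path

# Subdirectories named in the Model Structure list above.
# Files inside each directory are NOT checked (their names are not given here).
EXPECTED_DIRS = ("am", "conf", "graph", "ivector")


def missing_model_dirs(model_root):
    """Return the expected subdirectories that are absent under model_root."""
    root = Path(model_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "path/to/NuralVoiceSTT" is a placeholder, as in the Usage section.
    missing = missing_model_dirs("path/to/NuralVoiceSTT")
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("Model directory layout looks complete.")
```

An empty list means all four directories are present; a non-empty list usually points to an incomplete download or a wrong path.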

Usage

Python API

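import json
import wave
from vosk import Model, KaldiRecognizer, SetLogLevel

# Log verbosity: 0 prints informational messages; use -1 to silence them
SetLogLevel(0)

# Load the NuralVoiceSTT model from its local directory
model = Model("path/to/NuralVoiceSTT")

# Open the audio file; the recognizer expects mono 16-bit PCM WAV
with wave.open("audio.wav", "rb") as wf:
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        raise ValueError("audio.wav must be mono 16-bit PCM WAV")

    # The recognizer is created with the sample rate of the input file
    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)  # include per-word timing and confidence in results

    # Feed audio in chunks and print each finalized utterance
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            print(result["text"])

    # Flush the remaining audio and print the final result
    result = json.loads(rec.FinalResult())
    print(result["text"])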
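Because the example calls `rec.SetWords(True)`, each result from Vosk also carries a `result` list with per-word `word`, `start`, `end`, and `conf` fields alongside `text`. The sketch below shows one way to pull those out; the helper name `word_timings` and the sample JSON string are illustrative, not output from this model.

```python
import json


def word_timings(result_json):
    """Extract (word, start, end, conf) tuples from a Vosk result JSON string.

    The "result" key is absent when no words were recognized,
    so an empty list is returned in that case.
    """
    result = json.loads(result_json)
    return [
        (w["word"], w["start"], w["end"], w["conf"])
        for w in result.get("result", [])
    ]


if __name__ == "__main__":
    # Hand-written sample in the shape Vosk returns; not real model output.
    sample = (
        '{"result": [{"conf": 1.0, "end": 0.5, "start": 0.1, "word": "hello"}],'
        ' "text": "hello"}'
    )
    for word, start, end, conf in word_timings(sample):
        print(f"{start:6.2f}s to {end:6.2f}s  {word}  (conf {conf:.2f})")
```

In the loop above, the same helper could be applied to the string returned by `rec.Result()` or `rec.FinalResult()` instead of the sample.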

Download from Hugging Face

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="ashishkblink/NuralVoiceSTT",
    local_dir="./models/NuralVoiceSTT"
)

Installation

pip install vosk
pip install huggingface_hub  # only needed for the Hugging Face download above

Performance

NuralVoiceSTT is optimized for:

  • Call-center Audio: High accuracy on narrowband telephone audio
  • Wideband Audio: Excellent performance on high-quality audio recordings
  • Real-time Processing: Efficient for live transcription applications
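To see which of these scenarios a given recording falls into, the WAV header can be inspected before recognition. This is a rough sketch using only the standard library: the helper name `describe_wav` is hypothetical, and the 8 kHz cut-off used to label a file "narrowband" is an assumption based on typical telephone audio, not a threshold stated by Blink Digital.

```python
import wave

# Assumption: telephone (call-center) audio is typically sampled at 8 kHz,
# so anything at or below this rate is labeled narrowband here.
NARROWBAND_MAX_HZ = 8000


def describe_wav(path):
    """Return basic format information for a WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "band": "narrowband" if rate <= NARROWBAND_MAX_HZ else "wideband",
        }


if __name__ == "__main__":
    # "audio.wav" is the same placeholder file name used in the Usage section.
    print(describe_wav("audio.wav"))
```

The reported sample rate is the value passed to `KaldiRecognizer` in the usage example, so this check also confirms what the recognizer will be configured with.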

License

This model is distributed under the Apache License 2.0, which allows:

  • ✅ Free use (including commercial)
  • ✅ Modification
  • ✅ Distribution

Citation

If you use this model, please cite:

@misc{nuralvoicestt2024,
  title={NuralVoiceSTT: High-Accuracy English Speech-to-Text Model},
  author={Blink Digital},
  year={2024},
  publisher={Blink Digital},
  url={https://huggingface.co/ashishkblink/NuralVoiceSTT}
}

About Blink Digital

NuralVoiceSTT is developed and maintained by Blink Digital, a leading provider of AI-powered speech recognition solutions. For more information, visit our Hugging Face profile.

Support

For questions, issues, or commercial inquiries, please contact Blink Digital through our Hugging Face profile.
