metadata
license: cc-by-4.0
language:
  - en
  - es
  - it
  - fr
  - de
  - nl
  - ru
  - pl
  - uk
  - sk
  - bg
  - fi
  - ro
  - hr
  - cs
  - sv
  - et
  - hu
  - lt
  - da
  - mt
  - sl
  - lv
  - el
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
  - automatic-speech-recognition
  - speech
  - audio
  - Transducer
  - TDT
  - FastConformer
  - Conformer
  - multilingual
  - NeMo
  - OpenVINO
base_model:
  - nvidia/parakeet-tdt-1.1b

Parakeet TDT 1.1B V3 - OpenVINO


OpenVINO-optimized version of NVIDIA's Parakeet TDT 1.1B V3 model for high-performance multilingual automatic speech recognition on Intel NPUs and CPUs.

Benchmark Results

Hardware: Intel Core Ultra 7 155H (Meteor Lake) with Intel AI Boost NPU
Software: OpenVINO 2025.x

LibriSpeech test-clean (English)

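| Metric | Value |
|---|---|
| Average WER | 3.7% |
| Median WER | 0.0% |
| Average CER | 1.9% |
| RTFx (NPU) | 25.7× |
| RTFx (CPU) | 5-8× |
| Files processed | 2,620 (5.4 hours) |

RTFx is audio duration divided by processing time (higher is faster). For example, 25.7× means roughly 25.7 seconds of audio are transcribed per second of compute.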
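WER and CER are conventionally computed as a word-level or character-level Levenshtein distance divided by the reference length. The minimal sketch below illustrates that calculation. It is not the eddy benchmark harness. It assumes the text has already been normalized (real evaluations usually lowercase and strip punctuation first), and the names `edit_distance`, `wer`, and `cer` are hypothetical.

The toy data also shows how a median WER of 0.0% can coexist with a non-zero average: when most files are transcribed perfectly, a few errors raise the mean but not the median. The card does not state whether "Average WER" is a per-file mean or a corpus-level ratio.

```python
from statistics import mean, median


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (costs 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy data (not benchmark data): two perfect transcripts, one substituted word.
refs = ["the cat sat on the mat", "hello world", "good morning"]
hyps = ["the cat sat on the mat", "hello world", "good evening"]
per_file = [wer(r, h) for r, h in zip(refs, hyps)]
print(per_file)  # [0.0, 0.0, 0.5]
print(f"mean WER {mean(per_file):.1%}, median WER {median(per_file):.1%}")
# mean WER 16.7%, median WER 0.0%
```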

FLEURS Multilingual (24 Languages)

| Metric | Value |
|---|---|
| Average WER | 17.0% |
| Average CER | 5.4% |
| Average RTFx | 41.1× |
| Total samples | 15,000+ |

Best-performing languages (WER): Italian 4.3%, Spanish 5.4%, English 6.1%, German 7.4%, French 7.7%.

See BENCHMARK_RESULTS.md for complete per-language results.

Performance Comparison

| Implementation | Device | RTFx (avg) | WER (LibriSpeech) |
|---|---|---|---|
| eddy (OpenVINO) | Intel Core Ultra 7 155H NPU | 25.7× | 3.7% |
| Parakeet (PyTorch) | Intel Arc 140V GPU | ~20×* | ~2.5%* |
| eddy (OpenVINO) | Intel Core Ultra 7 155H CPU | 5-8× | 3.7% |

Note: Benchmarked on an HP EliteBook Ultra G1i. On the NPU, eddy is ~1.3× faster than PyTorch on the Intel Arc GPU (25.7× vs. ~20× RTFx), with lower power consumption. *PyTorch figures for V3 are estimated from the V2 benchmark.
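Because RTFx is audio duration divided by wall-clock processing time, the reported figures can be converted into implied runtimes. The sketch below does this for LibriSpeech test-clean and reproduces the ~1.3× NPU-vs-GPU ratio stated in the note above. These are back-of-the-envelope values derived from the reported numbers, not separate measurements. The helper names `rtfx` and `implied_runtime_s` are hypothetical. The sketch also assumes RTFx is aggregated as total audio over total time, and an average of per-file RTFx values can differ from that.

```python
# Hypothetical helpers for reasoning about RTFx figures (not part of eddy).


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


def implied_runtime_s(audio_seconds: float, rtfx_value: float) -> float:
    """Wall-clock seconds implied by a given RTFx for a given amount of audio."""
    if rtfx_value <= 0:
        raise ValueError("RTFx must be positive")
    return audio_seconds / rtfx_value


AUDIO_S = 5.4 * 3600  # LibriSpeech test-clean, ~5.4 hours per the benchmark table

npu_s = implied_runtime_s(AUDIO_S, 25.7)      # eddy on the NPU
cpu_fast_s = implied_runtime_s(AUDIO_S, 8.0)  # eddy on the CPU, high end of 5-8x
cpu_slow_s = implied_runtime_s(AUDIO_S, 5.0)  # eddy on the CPU, low end of 5-8x
speedup = 25.7 / 20.0                         # NPU vs. the ~20x PyTorch estimate

print(f"NPU: {npu_s / 60:.1f} min")                            # NPU: 12.6 min
print(f"CPU: {cpu_fast_s / 60:.1f}-{cpu_slow_s / 60:.1f} min")  # CPU: 40.5-64.8 min
print(f"NPU vs. PyTorch GPU: {speedup:.1f}x")                  # NPU vs. PyTorch GPU: 1.3x
```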

Supported Languages

24 European languages: English, Spanish, Italian, French, German, Dutch, Russian, Polish, Ukrainian, Slovak, Bulgarian, Finnish, Romanian, Croatian, Czech, Swedish, Estonian, Hungarian, Lithuanian, Danish, Maltese, Slovenian, Latvian, Greek

Usage

Python usage is available via ctypes; see the eddy repository for details.

Model Details

  • Parameters: 1.1B
  • Architecture: FastConformer-RNNT (4-model pipeline)
  • Languages: 24 European languages
  • Blank token ID: 8192
  • Context window: 10s chunks with 3s overlap
  • Features: LSTM state continuity, token deduplication, per-token timestamps
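The chunked context window and token deduplication listed above can be illustrated with a small sketch. With 10 s windows and 3 s of overlap, each window starts 7 s after the previous one. Overlapping transcripts are then merged using per-token timestamps. The merge rule used here (cut each overlap at its midpoint) is an assumed, reasonable strategy, not a description of eddy's actual deduplication logic. `Token`, `chunk_windows`, and `merge_chunks` are hypothetical names. The sketch does not model LSTM state continuity, which would live in the decoder loop.

```python
from dataclasses import dataclass

CHUNK_S = 10.0   # chunk length from the model card
OVERLAP_S = 3.0  # overlap between consecutive chunks from the model card


@dataclass
class Token:
    text: str
    time_s: float  # hypothetical: absolute timestamp within the full recording


def chunk_windows(duration_s, chunk_s=CHUNK_S, overlap_s=OVERLAP_S):
    """Return (start, end) windows in seconds covering [0, duration_s]."""
    stride = chunk_s - overlap_s  # 7 s hop with the defaults
    if stride <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            return windows
        start += stride


def merge_chunks(chunks):
    """Merge per-chunk tokens, dropping duplicates from overlapping regions.

    `chunks` is a time-ordered list of ((start, end), [Token, ...]). Each
    overlap is split at its midpoint: the earlier chunk keeps tokens before
    the midpoint, and the later chunk keeps tokens at or after it.
    """
    merged = []
    for i, ((start, end), tokens) in enumerate(chunks):
        lo = float("-inf") if i == 0 else (start + chunks[i - 1][0][1]) / 2
        hi = float("inf") if i == len(chunks) - 1 else (chunks[i + 1][0][0] + end) / 2
        merged.extend(t for t in tokens if lo <= t.time_s < hi)
    return merged


# Toy example: both chunks transcribed "quick brown" inside the 7-10 s overlap.
windows = chunk_windows(17.0)  # [(0.0, 10.0), (7.0, 17.0)]
chunk_a = [Token("the", 1.0), Token("quick", 8.0), Token("brown", 9.0)]
chunk_b = [Token("quick", 8.1), Token("brown", 9.1), Token("fox", 12.0)]
merged = merge_chunks(list(zip(windows, [chunk_a, chunk_b])))
print(" ".join(t.text for t in merged))  # the quick brown fox
```

A midpoint cut is used so that each token in the overlap is taken from the chunk where it sits farther from a window edge. Tokens near window edges are usually where chunked decoders are least reliable.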

License

CC-BY-4.0. See LICENSE for details.


Acknowledgments

Based on NVIDIA's Parakeet TDT model. OpenVINO conversion and optimization by the FluidInference team.