---
license: cc-by-4.0
language:
- en
- es
- it
- fr
- de
- nl
- ru
- pl
- uk
- sk
- bg
- fi
- ro
- hr
- cs
- sv
- et
- hu
- lt
- da
- mt
- sl
- lv
- el
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- multilingual
- NeMo
- OpenVINO
base_model:
- nvidia/parakeet-tdt-1.1b
---

# Parakeet TDT 1.1B V3 - OpenVINO

[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe) [![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/eddy?style=flat&logo=github)](https://github.com/FluidInference/eddy)

OpenVINO-optimized version of NVIDIA's Parakeet TDT 1.1B V3 model for high-performance multilingual automatic speech recognition on Intel NPUs and CPUs.

## Benchmark Results

**Hardware**: Intel Core Ultra 7 155H (Meteor Lake) with Intel AI Boost NPU

**Software**: OpenVINO 2025.x

RTFx is the real-time factor throughput (audio duration divided by processing time), so higher is faster.

### LibriSpeech test-clean (English)

| Metric | Value |
|--------|-------|
| **Average WER** | 3.7% |
| **Median WER** | 0.0% |
| **Average CER** | 1.9% |
| **RTFx (NPU)** | 25.7× |
| **RTFx (CPU)** | 5-8× |
| **Files processed** | 2,620 (5.4 hours) |

### FLEURS Multilingual (24 Languages)

| Metric | Value |
|--------|-------|
| **Average WER** | 17.0% |
| **Average CER** | 5.4% |
| **Average RTFx** | 41.1× |
| **Total samples** | ~15,000 |

**Best performing languages** (WER): Italian 4.3%, Spanish 5.4%, English 6.1%, German 7.4%, French 7.7%

See [BENCHMARK_RESULTS.md](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md) for complete per-language results.

## Performance Comparison

| Implementation | Device | RTFx (Avg) | WER (LibriSpeech) |
|----------------|--------|------------|-------------------|
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H NPU | **25.7×** | 3.7% |
| Parakeet (PyTorch) | Intel Arc 140V GPU | ~20×* | ~2.5%* |
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H CPU | **5-8×** | 3.7% |

> **Note**: Benchmarked on an HP EliteBook Ultra G1i. eddy on the NPU is ~1.3× faster than PyTorch on the Intel Arc GPU, with lower power consumption. *PyTorch V3 figures are estimated from the V2 benchmark.

## Supported Languages

**24 European languages**: English, Spanish, Italian, French, German, Dutch, Russian, Polish, Ukrainian, Slovak, Bulgarian, Finnish, Romanian, Croatian, Czech, Swedish, Estonian, Hungarian, Lithuanian, Danish, Maltese, Slovenian, Latvian, Greek

## Usage

Python usage is available via ctypes bindings; see the [eddy repository](https://github.com/FluidInference/eddy) for details. A hedged sketch of loading the exported OpenVINO models directly with the OpenVINO Python runtime is included at the end of this card.

## Model Details

- **Parameters**: 1.1B
- **Architecture**: FastConformer-RNNT (4-model pipeline)
- **Languages**: 24 European languages
- **Blank token ID**: 8192
- **Context window**: 10 s chunks with 3 s overlap (see the chunking sketch at the end of this card)
- **Features**: LSTM state continuity, token deduplication, per-token timestamps

## License

CC-BY-4.0 - See [LICENSE](LICENSE) for details.

## Links

- **GitHub**: [FluidInference/eddy](https://github.com/FluidInference/eddy)
- **Base Model**: [nvidia/parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b)
- **Documentation**: [Benchmark Results](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md)

## Acknowledgments

Based on NVIDIA's Parakeet TDT model. OpenVINO conversion and optimization by the FluidInference team.
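
## Appendix: Loading the OpenVINO Models (Sketch)

A minimal sketch of opening the exported OpenVINO IR files with the standard `openvino` Python runtime and targeting the NPU. The file name `encoder.xml` and the single-model view are assumptions for illustration; the full ASR pipeline (feature extraction, encoder, decoder/joint, TDT decoding) is implemented in the eddy repository and is not reproduced here.

```python
# Sketch only: inspect and compile one exported IR file on NPU (or CPU fallback).
# "parakeet-tdt-1.1b-v3/encoder.xml" is a hypothetical path; use the files shipped
# in this repository and the eddy pipeline for actual transcription.
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("parakeet-tdt-1.1b-v3/encoder.xml")

# Prefer the NPU when present, otherwise fall back to CPU.
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model(model, device)

# Print the expected input/output tensor names and shapes of the compiled model.
for port in compiled.inputs:
    print("input :", port.get_any_name(), port.get_partial_shape())
for port in compiled.outputs:
    print("output:", port.get_any_name(), port.get_partial_shape())
```

Device selection is the main knob here: compiling the same IR for `"NPU"` or `"CPU"` is what produces the two RTFx rows in the comparison table above.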
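
## Appendix: Chunking With Overlap (Sketch)

The model details list a 10 s context window with 3 s overlap. The sketch below only illustrates how such overlapping sample ranges can be derived for a long recording; the function name and structure are illustrative, not the eddy implementation, which additionally handles LSTM state continuity and token deduplication across chunk boundaries.

```python
# Illustrative windowing for 10 s chunks with 3 s overlap (values from the model card).
SAMPLE_RATE = 16_000        # Parakeet models expect 16 kHz mono audio
CHUNK_SECONDS = 10.0        # context window per chunk
OVERLAP_SECONDS = 3.0       # overlap between consecutive chunks

def chunk_ranges(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 chunk_s: float = CHUNK_SECONDS,
                 overlap_s: float = OVERLAP_SECONDS) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)   # 7 s hop between chunk starts
    ranges = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        ranges.append((start, end))
        if end == num_samples:
            break
        start += step
    return ranges

# Example: 25 s of audio -> chunks covering 0-10 s, 7-17 s, 14-24 s, 21-25 s.
print(chunk_ranges(25 * SAMPLE_RATE))
```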