Questions about streaming with Parakeet and TDT merging methods
I’m currently trying to work with Parakeet in streaming mode, receiving microphone chunks and generating live transcriptions.
As a reference, I’m using the following code for streaming: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py
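For context, here is roughly the kind of naive per-chunk loop I am starting from. The sounddevice/soundfile glue, the chunk length, and the re-transcription of the trailing window are my own choices rather than anything from the NeMo script; each window is decoded independently, which is exactly where the merging question comes up:

```python
# Naive chunked microphone loop (a sketch of my setup, not the NeMo buffered pipeline).
# Assumptions: sounddevice/soundfile for capture and IO, 16 kHz mono input,
# and independent transcription of each trailing window.
import tempfile

import numpy as np
import sounddevice as sd
import soundfile as sf
import nemo.collections.asr as nemo_asr

SAMPLE_RATE = 16000      # Parakeet expects 16 kHz mono audio
CHUNK_SECS = 2.0         # arbitrary chunk length chosen for this sketch
WINDOW_SECS = 4.0        # trailing window that gets re-transcribed each step

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
model.eval()

def transcribe_window(audio: np.ndarray) -> str:
    """Write the window to a temporary WAV file and transcribe it."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        sf.write(tmp.name, audio, SAMPLE_RATE)
        result = model.transcribe([tmp.name])
        if isinstance(result, tuple):   # older NeMo returns (best, all) for RNNT/TDT
            result = result[0]
        hyp = result[0]
        return hyp.text if hasattr(hyp, "text") else hyp

buffer = np.zeros(0, dtype=np.float32)
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
    while True:
        chunk, _ = stream.read(int(SAMPLE_RATE * CHUNK_SECS))
        buffer = np.concatenate([buffer, chunk[:, 0]])
        # Each trailing window is decoded independently; the partial texts still
        # have to be merged, which is the part that behaves differently for TDT.
        print(transcribe_window(buffer[-int(SAMPLE_RATE * WINDOW_SECS):]))
```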
However, I’ve run into some questions:
Why don’t the more conventional merging methods work well for TDT? I tested them, and the transcription quality dropped significantly.
Is there already an implementation available for this use case (streaming with Parakeet using microphone chunks)?
I responded in the related thread: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/discussions/63#68cc58004fdfe65cc5d61be5
In brief:
- Please use the new streaming pipeline: https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py (a rough invocation sketch follows after this list)
- You can use https://github.com/NVIDIA-NeMo/NeMo/pull/14759 as a reference for chunked streaming with a microphone
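Running the script looks roughly like this. The Hydra-style override names below are carried over from the older buffered RNNT script and are an assumption on my side; please check the config dataclass at the top of the new streaming script for the exact fields:

```bash
python examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py \
    pretrained_name="nvidia/parakeet-tdt-0.6b-v2" \
    dataset_manifest=/path/to/manifest.json \
    output_filename=/path/to/output.json \
    chunk_len_in_secs=2.0 \
    batch_size=16
```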
Guys, maybe this will help.
I finally got streaming from the microphone working in a Gradio Space. The microphone errors are gone now; I also fought with that problem for a long time.
The Space itself isn’t polished, but the streaming concept and the Gradio integration finally work.
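A minimal sketch of the kind of Gradio wiring I mean (this is not the Space’s exact code; the streaming-state pattern follows Gradio’s real-time ASR guide, and writing each buffer to a temporary WAV before calling NeMo’s transcribe is my own simplification):

```python
# Minimal Gradio streaming sketch (not the Space's exact code). A streaming
# microphone Audio input plus a state that accumulates samples; the whole
# buffer is re-transcribed on every incoming chunk.
import tempfile

import gradio as gr
import numpy as np
import soundfile as sf
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
model.eval()

def transcribe(stream, new_chunk):
    sr, y = new_chunk                     # Gradio delivers (sample_rate, samples)
    y = y.astype(np.float32) / 32768.0    # assuming int16 PCM from the browser
    if y.ndim > 1:                        # downmix stereo microphones to mono
        y = y.mean(axis=1)
    stream = y if stream is None else np.concatenate([stream, y])

    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        sf.write(tmp.name, stream, sr)
        result = model.transcribe([tmp.name])
        if isinstance(result, tuple):     # older NeMo returns (best, all) for RNNT/TDT
            result = result[0]
        hyp = result[0]
        text = hyp.text if hasattr(hyp, "text") else hyp
    return stream, text

demo = gr.Interface(
    transcribe,
    inputs=["state", gr.Audio(sources=["microphone"], streaming=True)],
    outputs=["state", "text"],
    live=True,
)
demo.launch()
```

Re-transcribing the whole buffer on every chunk obviously does not scale for long sessions, so a real setup would keep only a sliding window and merge the partial texts, but it is enough to prove the microphone-to-Gradio path.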