Questions about streaming with Parakeet and TDT merging methods

#13
by alexandreacff - opened

I’m currently trying to work with Parakeet in streaming mode, receiving microphone chunks and generating live transcriptions.

As a reference, I’m using the following code for streaming: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py

However, I’ve run into some questions:

  1. Why do the more conventional merging methods not work well for TDT? I tested them, but the performance dropped significantly.

  2. Is there already an implementation available for this use case (streaming with Parakeet using microphone chunks)?
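
For context, the chunked setup described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the logic of the NeMo script linked above: `RollingAudioBuffer` and `select_tokens` are hypothetical names, the model call is left out entirely, and selecting tokens by their offset inside the buffer is an assumption about one possible way to merge, not a confirmed fix for TDT.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingAudioBuffer:
    """Hypothetical helper: keeps the most recent `buffer_len` samples."""

    def __init__(self, chunk_len: int, buffer_len: int) -> None:
        if buffer_len < chunk_len:
            raise ValueError("buffer_len must be at least chunk_len")
        self.chunk_len = chunk_len
        self.buffer_len = buffer_len
        # Start with silence so the first buffer already has full length.
        self._samples: Deque[float] = deque([0.0] * buffer_len, maxlen=buffer_len)

    def push(self, chunk: List[float]) -> List[float]:
        """Append one microphone chunk and return the current full buffer."""
        if len(chunk) != self.chunk_len:
            raise ValueError("unexpected chunk length")
        # maxlen on the deque drops the oldest samples automatically.
        self._samples.extend(chunk)
        return list(self._samples)

    def new_region(self) -> Tuple[int, int]:
        """Sample range [start, end) of the newest chunk inside the buffer."""
        return self.buffer_len - self.chunk_len, self.buffer_len


def select_tokens(tokens: List[Tuple[str, int]], start: int, end: int) -> List[str]:
    """Keep tokens whose offset in the buffer lies in [start, end).

    `tokens` is a list of (text, sample_offset) pairs; producing those
    offsets from the model output is assumed and not shown here.
    """
    return [text for text, offset in tokens if start <= offset < end]
```

Each incoming chunk would be pushed into the buffer, the whole buffer transcribed, and only the tokens that fall inside the `new_region()` range appended to the running transcript.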

NVIDIA org

Guys, maybe this will help.
I finally got streaming from the microphone working with Gradio, and there are no more microphone errors. I struggled with that problem for a long time.
The Space itself is not great, but the streaming concept and the Gradio integration finally work.

https://huggingface.co/spaces/WJ88/Parakeet-TDT-0.6b-V3_-_multilingual_but_performance_issues_accumulating_over_time
