diarray committed on
Commit 81b3c01 · verified · 1 Parent(s): 47a5cbe

Update Model Card: Clarification about WER value discrepancy && Adding workaround for GPUs that don't support CUDA Graphs

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -59,7 +59,9 @@ img {
  `soloni-114m-tdt-ctc` is a fine-tuned version of NVIDIA's [`parakeet-tdt_ctc-110m`](https://huggingface.co/nvidia/parakeet-tdt_ctc-110m) that transcribes Bambara speech. Unlike its base model, this model does not output punctuation or capitalization, since neither was present in its training data.
  The model was fine-tuned using **NVIDIA NeMo** and supports **both TDT (Token-and-Duration Transducer) and CTC (Connectionist Temporal Classification) decoding**.
 
- ## **🚨 Important Note**
 
  This model, along with its associated resources, is part of an **ongoing research effort**; improvements and refinements are expected in future versions. Users should be aware that:
 
  - **The model may not generalize very well across all speaking conditions and dialects.**
@@ -89,6 +91,14 @@ asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_na
  asr_model.transcribe(['sample_audio.wav'])
  ```
 
  ### Input
 
  This model accepts **16000 Hz mono-channel** audio (wav files) as input.
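The input requirement above can be checked before calling `transcribe()`. The following is a minimal sketch that uses only Python's standard-library `wave` module, which reads uncompressed PCM WAV headers. The helper names `format_problems` and `check_wav_format` are hypothetical; they are not part of NeMo or of this model card.

```python
import wave

# Values stated in the model card: 16000 Hz, mono-channel
EXPECTED_SAMPLE_RATE = 16000
EXPECTED_CHANNELS = 1


def format_problems(sample_rate: int, num_channels: int) -> list[str]:
    """Describe how an audio format deviates from 16 kHz mono (empty list if it matches)."""
    problems = []
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if num_channels != EXPECTED_CHANNELS:
        problems.append(f"audio has {num_channels} channels, expected mono")
    return problems


def check_wav_format(path: str) -> list[str]:
    """Read the WAV header with the standard-library `wave` module (PCM WAV only)."""
    with wave.open(path, "rb") as wav_file:
        return format_problems(wav_file.getframerate(), wav_file.getnchannels())


# Hypothetical usage on the file from the model card example:
# print(check_wav_format("sample_audio.wav"))
```

If a file is reported as non-conforming, it can be resampled and downmixed beforehand, for example with `ffmpeg -i input.wav -ar 16000 -ac 1 output.wav`.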
@@ -126,7 +136,7 @@ These are greedy WER numbers without external LM. By default the main decoder br
 
  ```python
  # Retrieve the CTC decoding config
- ctc_decoding_cfg = model.cfg.aux_ctc.decoding
  # Then change the decoding strategy
  asr_model.change_decoding_strategy(decoder_type='ctc', decoding_cfg=ctc_decoding_cfg)
  # Transcribe with the CTC decoder
 
  `soloni-114m-tdt-ctc` is a fine-tuned version of NVIDIA's [`parakeet-tdt_ctc-110m`](https://huggingface.co/nvidia/parakeet-tdt_ctc-110m) that transcribes Bambara speech. Unlike its base model, this model does not output punctuation or capitalization, since neither was present in its training data.
  The model was fine-tuned using **NVIDIA NeMo** and supports **both TDT (Token-and-Duration Transducer) and CTC (Connectionist Temporal Classification) decoding**.
 
+ ## **🚨 Important Note**
+ **Update (February 17th):** We observed a significantly lower WER (~36%) for the TDT branch when using an external WER calculation method that relies solely on the predicted and reference transcriptions. However, the WER values reported in this model card were obtained with the standard NeMo workflow using PyTorch Lightning's trainer, in which the TDT branch yielded a higher WER (~66%). The gap may stem from differences in post-processing, alignment handling, or evaluation methodology.
+
  This model, along with its associated resources, is part of an **ongoing research effort**; improvements and refinements are expected in future versions. Users should be aware that:
 
  - **The model may not generalize very well across all speaking conditions and dialects.**
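Because the reported numbers depend on how WER is computed, the sketch below shows a plain corpus-level WER that, like the external method mentioned in the update, uses only predicted and reference transcriptions. It is neither the authors' external method nor NeMo's implementation. The `normalize` step (lowercasing and replacing punctuation with spaces) is a hypothetical example of the kind of post-processing choice that can shift the score.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    # \w is Unicode-aware, so Bambara letters such as ɛ, ɔ, ɲ and ŋ are kept
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def word_edit_distance(ref_words: list[str], hyp_words: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion of ref_word
                curr[j - 1] + 1,  # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitution (cost 0 if words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str], apply_normalization: bool = True) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = total_words = 0
    for ref, hyp in zip(references, hypotheses):
        if apply_normalization:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return errors / total_words if total_words else 0.0


print(corpus_wer(["I ni ce."], ["i ni ce"]))  # 0.0
print(corpus_wer(["I ni ce."], ["i ni ce"], apply_normalization=False))  # 0.6666666666666666
```

The same reference/hypothesis pair scores 0% or about 67% depending only on normalization. This illustrates how post-processing alone can move WER; it is not meant to explain the specific ~36% vs ~66% gap.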
 
  asr_model.transcribe(['sample_audio.wav'])
  ```
 
+ Note that the greedy decoding strategy of the TDT decoder uses CUDA Graphs by default, but not all GPUs and CUDA versions support them. If you run into `RuntimeError: CUDA error: invalid argument`, set `use_cuda_graph_decoder` to `False` in the decoding config before calling `asr_model.transcribe()`:
+ ```python
+ decoding_cfg = asr_model.cfg.decoding
+ # Disable CUDA Graphs
+ decoding_cfg.greedy.use_cuda_graph_decoder = False
+ # Then change the decoding strategy
+ asr_model.change_decoding_strategy(decoding_cfg=decoding_cfg)
+ ```
  ### Input
 
  This model accepts **16000 Hz mono-channel** audio (wav files) as input.
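To apply the CUDA Graphs workaround consistently, for example in a script that loads the model in several places, the snippet can be wrapped in a small helper. This is a sketch: the name `disable_cuda_graph_decoding` is hypothetical, and the helper only reuses the config path (`cfg.decoding.greedy.use_cuda_graph_decoder`) and the `change_decoding_strategy` call shown in the diff. It assumes that key exists in the installed NeMo version's decoding config.

```python
def disable_cuda_graph_decoding(asr_model):
    """Switch greedy TDT decoding to the non-CUDA-Graphs path and re-apply the strategy.

    Assumes a NeMo hybrid TDT-CTC model whose `cfg.decoding.greedy` section
    has a `use_cuda_graph_decoder` key, as in the model card snippet.
    """
    # Retrieve the TDT (main branch) decoding config
    decoding_cfg = asr_model.cfg.decoding
    # Disable CUDA Graphs for greedy decoding
    decoding_cfg.greedy.use_cuda_graph_decoder = False
    # Rebuild the decoding module with the updated config
    asr_model.change_decoding_strategy(decoding_cfg=decoding_cfg)
    return asr_model


# Hypothetical usage, with `asr_model` loaded via `from_pretrained` as in the model card:
# asr_model = disable_cuda_graph_decoding(asr_model)
# asr_model.transcribe(['sample_audio.wav'])
```

Calling the helper once, right after loading, avoids hitting the error mid-run rather than recovering from it afterwards.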
 
 
  ```python
  # Retrieve the CTC decoding config
+ ctc_decoding_cfg = asr_model.cfg.aux_ctc.decoding
  # Then change the decoding strategy
  asr_model.change_decoding_strategy(decoder_type='ctc', decoding_cfg=ctc_decoding_cfg)
  # Transcribe with the CTC decoder
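The corrected snippet switches the model to the CTC branch but leaves it there. Below is a minimal sketch of a helper that transcribes with CTC and then switches back to the TDT branch. The helper name is hypothetical, and it assumes NeMo's hybrid-model convention that `decoder_type="rnnt"` selects the transducer (TDT) branch and `decoder_type="ctc"` the CTC branch; this is worth checking against the installed NeMo version.

```python
def transcribe_with_ctc(asr_model, audio_paths):
    """Transcribe with the auxiliary CTC branch, then restore the TDT branch.

    Assumes a NeMo hybrid TDT-CTC model where `decoder_type="rnnt"` selects
    the transducer (TDT) branch and `decoder_type="ctc"` the CTC branch.
    """
    # Retrieve the CTC decoding config (from `asr_model`, as in the fixed line)
    ctc_decoding_cfg = asr_model.cfg.aux_ctc.decoding
    asr_model.change_decoding_strategy(decoder_type="ctc", decoding_cfg=ctc_decoding_cfg)
    try:
        return asr_model.transcribe(audio_paths)
    finally:
        # Switch back to the TDT branch even if the CTC pass raises
        asr_model.change_decoding_strategy(decoder_type="rnnt", decoding_cfg=asr_model.cfg.decoding)


# Hypothetical usage:
# ctc_output = transcribe_with_ctc(asr_model, ['sample_audio.wav'])
```

The `try`/`finally` keeps later `transcribe()` calls on the TDT branch, so switching to CTC for a comparison does not silently change the decoder used afterwards.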