Bhili ASR
Automatic Speech Recognition (ASR) for Bhili (भीली), specifically the Dehvali Bhili dialect, an Indo-Aryan language spoken by the Bhil community in western India. This is a fine-tuned version of ai4bharat/indic-conformer-600m-multilingual, trained on ~180 hours of Bhili read, conversational, and spontaneous speech data. The ONNX version of the model can be found here. Try out the model here!
Quick Start
1. Download Files
huggingface-cli download ai4bharat/bhili-asr --local-dir bhili-asr
cd bhili-asr
2. Extract Tokenizers and NeMo
tar -xzvf tokenizers.tar.gz
tar -xzvf NeMo.tar.gz
3. Install NeMo Toolkit
⚠️ Important: This model requires a custom NeMo toolkit with multilingual tokenizer support. Do NOT install NeMo from pip or the official NVIDIA repository. You must use the provided NeMo.tar.gz file.
cd NeMo
pip install -e ".[asr]"
cd ..
Verify correct NeMo is installed:
python -c "import nemo; print(nemo.__file__)"
⚠️ Output should point to your local NeMo folder, not system packages.
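The same check can be scripted so that a wrong install fails loudly instead of relying on reading the printed path. The following is a minimal sketch, not part of the repository: `is_inside` and `nemo_is_local` are hypothetical helper names, and the default folder name `NeMo` assumes you run it from inside `bhili-asr` after extracting `NeMo.tar.gz`.

```python
import importlib.util
from pathlib import Path


def is_inside(module_file: str, local_root: str) -> bool:
    """Return True if module_file lives somewhere under local_root."""
    module_path = Path(module_file).resolve()
    root = Path(local_root).resolve()
    return root in module_path.parents


def nemo_is_local(local_root: str = "NeMo") -> bool:
    """Locate the importable `nemo` package (without importing it) and
    check that it comes from the extracted local folder."""
    spec = importlib.util.find_spec("nemo")
    if spec is None or spec.origin is None:
        return False  # nemo is not installed, or has no file location
    return is_inside(spec.origin, local_root)
```

Calling `nemo_is_local("NeMo")` from the `bhili-asr` directory should return `True` only when the editable install from step 3 is the one Python picks up.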
4. Update Model Paths
Update the tokenizer paths to match your local directory:
python update_paths.py --root_dir /full/path/to/bhili-asr/tokenizers/tokenizers_v3
This creates bhili_asr_finetune_v1_updated.nemo with correct paths.
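Since the script expects a full path, it can help to derive the argument rather than type it by hand. This is a small sketch under assumptions: `build_update_command` is a hypothetical helper, and it only assembles the command shown in this step (the `--root_dir` flag and the `tokenizers/tokenizers_v3` folder come from the instructions above). It does not reproduce what `update_paths.py` does internally.

```python
from pathlib import Path


def build_update_command(repo_dir: str) -> list[str]:
    """Build the step 4 command with an absolute tokenizer path."""
    # resolve() turns a relative repo_dir such as "." into an absolute path
    root = (Path(repo_dir) / "tokenizers" / "tokenizers_v3").resolve()
    return ["python", "update_paths.py", "--root_dir", str(root)]
```

The returned list can be passed to `subprocess.run(..., check=True)` from inside the `bhili-asr` directory.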
5. Verify Setup
Your directory should look like:
bhili-asr/
├── bhili_asr_finetune_v1.nemo
├── bhili_asr_finetune_v1_updated.nemo (created after step 4)
├── tokenizers/
│   └── tokenizers_v3/
│       ├── as_256/
│       ├── bn_256/
│       ├── hi_256/
│       ├── mr_256/
│       └── ...
├── NeMo/
└── update_paths.py
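The layout above can also be checked programmatically. The sketch below is illustrative only: `missing_entries` and the `EXPECTED` list are hypothetical names, and the list covers only the entries named in the tree, with `mr_256` standing in for the tokenizer folders because it is the one used for decoding.

```python
from pathlib import Path

# Entries taken from the directory tree above (paths relative to bhili-asr/)
EXPECTED = [
    "bhili_asr_finetune_v1.nemo",
    "bhili_asr_finetune_v1_updated.nemo",
    "tokenizers/tokenizers_v3/mr_256",
    "NeMo",
    "update_paths.py",
]


def missing_entries(root: str) -> list[str]:
    """Return the expected files or folders that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

An empty list from `missing_entries("bhili-asr")` suggests that steps 1 to 4 completed. Any names it returns point to the step that needs repeating.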
Inference
See the GitHub repository for complete setup instructions.
Requirements
- Python 3.10
- NeMo toolkit (from provided NeMo.tar.gz)
- PyTorch 2.0+
- CUDA 11.8+ (for GPU inference)
- soundfile (pip install soundfile)
Basic Usage
import soundfile as sf
import torch
from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel

# Load the checkpoint produced by update_paths.py (step 4)
model = EncDecHybridRNNTCTCBPEModel.restore_from("bhili_asr_finetune_v1_updated.nemo", map_location="cuda")
model.eval()

# Load audio (must be 16kHz mono)
audio, sr = sf.read("audio.wav")
assert sr == 16000, "Audio must be 16kHz"

# Shape the waveform as a batch of one: (1, num_samples)
signal = torch.tensor(audio, dtype=torch.float32).unsqueeze(0).cuda()
signal_len = torch.tensor([signal.shape[1]]).cuda()

with torch.no_grad():
    encoded, encoded_len = model.forward(input_signal=signal, input_signal_length=signal_len)
    # The language ID selects the Marathi ("mr") tokenizer and joint head
    hyps, _ = model.decoding.rnnt_decoder_predictions_tensor(
        encoded, encoded_len, return_hypotheses=False, lang_ids=["mr"],
    )

print(hyps[0])
Because this is a multilingual model with per-language joint network heads, the language ID ("mr") must be passed explicitly during decoding.
⚠️ Note on Tokenizer Choice
Bhili does not currently have a dedicated tokenizer in the IndicConformer model's supported language set.
We use the Marathi (mr) tokenizer as the closest alternative, since both languages use the Devanagari script.
⚠️ Note on Audio Format
The model was trained on audio bandlimited to 8kHz (telephony bandwidth) and resampled to 16kHz. For best results, inference audio should match this preprocessing. If feeding clean 16kHz studio audio, performance will be lower than expected.
Troubleshooting
| Error | Solution |
|---|---|
| KeyError: 'dir' | Run update_paths.py with the correct absolute path |
| KeyError: None (during transcribe) | Pass lang_ids=["mr"] to rnnt_decoder_predictions_tensor as shown above |
| MultilingualTokenizer not found | Install NeMo from the provided NeMo.tar.gz, not pip |
| huggingface_hub errors | pip install huggingface_hub==0.23.5 transformers==4.36.0 |
| pyarrow errors | pip install numpy==1.26.4 pyarrow==14.0.1 datasets==2.14.0 |
| Stereo / wrong sample rate | Convert with ffmpeg -i input.wav -ar 16000 -ac 1 output.wav |
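To catch the stereo or sample-rate case before running inference, the WAV header can be inspected with Python's standard `wave` module. This is a rough sketch: `wav_problems` is a hypothetical helper, and `wave` only reads PCM WAV files, so other encodings (for example float WAV) raise `wave.Error` and are better checked with soundfile.

```python
import wave


def wav_problems(path: str, expected_rate: int = 16000) -> list[str]:
    """Return a list of format problems for a PCM WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

If the list is non-empty, converting the file with the ffmpeg command from the table should resolve both problems.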