# whisper-small-hindi

## Model Description
This is a fine-tuned version of the OpenAI [`whisper-small`](https://huggingface.co/openai/whisper-small) model, specifically adapted for Hindi speech recognition. The model was trained on the Hindi subset of the Mozilla Common Voice 17.0 dataset to improve transcription accuracy for Hindi audio.

---

## Intended Uses & Limitations

**Intended Uses:**

- Automatic speech recognition (ASR) for Hindi language audio.
- Speech-to-text transcription services targeting Hindi speakers.
- Integration into voice-enabled applications and platforms requiring Hindi transcription.

**Limitations:**

- The model’s performance depends on the diversity and quality of the training data.
- May not generalize well to Hindi dialects or accents not represented in Common Voice 17.0.
- Performance can degrade in noisy, overlapping, or highly conversational speech scenarios.

---

## Training and Evaluation Data

- **Training Dataset:** Mozilla Common Voice 17.0 (Hindi subset).
- **Evaluation Dataset:** Standard Common Voice 17.0 Hindi test split.
- Both datasets are publicly available and annotated for speech recognition tasks.

---

## Training Procedure

| Hyperparameter              | Value                                   |
|-----------------------------|-----------------------------------------|
| Learning Rate               | 1e-5                                    |
| Training Batch Size         | 8                                       |
| Evaluation Batch Size       | 8                                       |
| Gradient Accumulation Steps | 2                                       |
| Total Effective Batch Size  | 16                                      |
| Optimizer                   | Adam (betas=(0.9, 0.999), epsilon=1e-8) |
| Learning Rate Scheduler     | Linear                                  |
| Warmup Steps                | 250                                     |
| Total Training Steps        | 1000                                    |
| Mixed Precision Training    | Native AMP (Automatic Mixed Precision)  |
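
To make the numbers in the table concrete, here is a minimal sketch of how the effective batch size and the learning rate at a given step follow from them. It assumes a single training device and assumes that "Linear" means a linear ramp from 0 to the peak rate over the warmup steps followed by a linear decay to 0 at the final step; the card does not spell out either detail. The function names are illustrative and are not part of the training script.

```python
# Values copied from the hyperparameter table.
PEAK_LR = 1e-5
WARMUP_STEPS = 250
TOTAL_STEPS = 1000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update (single device assumed)."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay.

    Assumption: the rate starts at 0, peaks at WARMUP_STEPS, and reaches 0
    again at TOTAL_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 125, 250, 625, 1000):
        print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at step 250 and spends the remaining 750 steps decaying, so a quarter of the run is warmup.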

---

## Training Results

- Framework versions: Hugging Face Transformers 4.39.3, PyTorch 2.6.0+cu124, Tokenizers 0.15.2.
- The author reports a competitive Word Error Rate (WER) on the Common Voice 17.0 Hindi test split; no numeric value is given in this card.
- For detailed evaluation metrics, please contact the author.
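
For readers who want to measure WER on their own transcripts, the sketch below shows one way to compute it: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the author's evaluation script; it splits on whitespace only and applies no text normalization, both of which are simplifying assumptions, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "मैं घर जा रहा हूँ"
    hypothesis = "मैं घर जा रहा"
    print("WER:", word_error_rate(reference, hypothesis))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.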

---

## Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

processor = WhisperProcessor.from_pretrained("bohraanuj23/whisper-small-hindi")
model = WhisperForConditionalGeneration.from_pretrained("bohraanuj23/whisper-small-hindi")

audio_input = ...  # 1-D float array of mono audio sampled at 16 kHz

inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```

- Ensure audio is sampled at 16 kHz for best performance.
- The model disables forced decoder IDs and suppresses specific tokens for optimal decoding.

---

## Citation

If you use this model, please cite:

---

## License

Apache License 2.0

---

For questions or issues, please open a GitHub issue or contact the author directly: https://github.com/anujbohra23
