Whisper Hebrew Nikud v1

Model Description

whisper-hebrew-nikud-v1 transcribes Hebrew speech directly to text with full diacritical marks (niqqud, also spelled nikud) in a single step, replacing the traditional two-step pipeline of transcription followed by niqqud restoration.

Developed by: Maayan Bogin
Model type: Automatic Speech Recognition
Language(s): Hebrew (עברית)
License: MIT
Finetuned from model: ivrit-ai/whisper-large-v3-turbo

Model Sources

Repository: https://github.com/MaayanBogin/Nikudon
Model: https://huggingface.co/MayBog/whisper-hebrew-nikud-v1

How to Get Started with the Model

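from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else "cpu"

# Build a speech recognition pipeline; long audio is split into 30-second chunks.
pipe = pipeline(
    task="automatic-speech-recognition",
    model="MayBog/whisper-hebrew-nikud-v1",
    chunk_length_s=30,
    device=device,
)

# Transcribe a local audio file, forcing Hebrew transcription (no translation).
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "hebrew", "task": "transcribe"},
)

print(result["text"])  # Hebrew text with niqqud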
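
Because the model returns pointed text, downstream code may need to check that niqqud is actually present, or derive an unpointed version (for example, to compare against a plain-text reference). The following is a minimal sketch and not part of the model or its repository: `strip_niqqud` and `has_niqqud` are hypothetical helper names, and the sketch assumes that niqqud and cantillation marks are exactly the nonspacing marks in the Unicode range U+0591 to U+05C7.

```python
import unicodedata

# Assumption: Hebrew points and cantillation marks are the nonspacing marks
# (category "Mn") in U+0591..U+05C7. The same range also contains punctuation
# such as maqaf (U+05BE), which is not "Mn" and is therefore kept.
_MARKS_START, _MARKS_END = 0x0591, 0x05C7


def _is_hebrew_mark(ch: str) -> bool:
    """Return True if ch is a Hebrew point or cantillation mark."""
    return (
        _MARKS_START <= ord(ch) <= _MARKS_END
        and unicodedata.category(ch) == "Mn"
    )


def strip_niqqud(text: str) -> str:
    """Return text with Hebrew points removed (hypothetical helper)."""
    # NFD splits precomposed presentation forms into base letter + marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not _is_hebrew_mark(ch))


def has_niqqud(text: str) -> bool:
    """Return True if text contains at least one Hebrew point."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(_is_hebrew_mark(ch) for ch in decomposed)


# "shalom" with niqqud, written with escapes to avoid encoding issues.
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
print(strip_niqqud(pointed))
print(has_niqqud(pointed))
```

In practice, `result["text"]` from the pipeline above would be passed in place of `pointed`.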