whisper-tiny-khmer-mlx-fp32

This model was converted to MLX format from openai-whisper-tiny, then fine-tuned for the Khmer language using three datasets:

seanghay/khmer_mpwt_speech
seanghay/km-speech-corpus
train split of openslr/openslr SLR42

It achieves the following word error rate (WER):

77.3% on test split of google/fleurs km-kh
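For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is only an illustration of that definition, not the scoring script used for this model: the function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption. Khmer is normally written without spaces between words, so the reported number depends on a word segmentation and text normalization that this card does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: words are separated by whitespace in both strings.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(f"{word_error_rate('a b c d', 'a x c'):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.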

NOTE: The MLX format is usable on Apple M-series chips (Apple silicon).
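If you are unsure whether a machine qualifies, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name `is_apple_silicon` is hypothetical, and the check is a heuristic rather than an official MLX requirement test.

```python
import platform


def is_apple_silicon() -> bool:
    """Return True when Python runs natively on macOS with an arm64 CPU."""
    # Caveat: a Python build running under Rosetta 2 reports "x86_64",
    # so this returns False there even on M-series hardware.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if is_apple_silicon():
    print("Apple silicon detected.")
else:
    print("This does not look like a native Apple silicon Python environment.")
```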

Use with mlx

pip install mlx-whisper

Write a Python script, example.py, as follows:

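import mlx_whisper

# Placeholder: replace with the path to your audio file.
SPEECH_FILE_NAME = "speech.wav"

# fp16=False keeps inference in float32, matching this model's fp32 weights.
result = mlx_whisper.transcribe(
    SPEECH_FILE_NAME,
    path_or_hf_repo="mlx-community/whisper-tiny-khmer-mlx-fp32",
    fp16=False,
)
print(result["text"])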

Then run example.py to see the result.
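Beyond the full text, you may want per-segment timestamps. The sketch below assumes the dictionary returned by `mlx_whisper.transcribe` contains a `"segments"` list whose entries carry `"start"`, `"end"` (in seconds) and `"text"`, as in the upstream Whisper transcription output. The helper names `format_timestamp`, `format_segments` and `transcribe_with_timestamps` are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: dict) -> list[str]:
    """Turn a transcription result into one line per segment."""
    # Assumption: each segment has "start", "end" and "text" keys.
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


def transcribe_with_timestamps(speech_file: str) -> list[str]:
    """Transcribe a file with this model and return timestamped lines."""
    # Imported lazily so the helpers above work without MLX installed.
    import mlx_whisper

    result = mlx_whisper.transcribe(
        speech_file,
        path_or_hf_repo="mlx-community/whisper-tiny-khmer-mlx-fp32",
        fp16=False,
    )
    return format_segments(result)
```

Calling `transcribe_with_timestamps(SPEECH_FILE_NAME)` on an Apple silicon machine should then return one formatted line per recognized segment.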

You can also use the command line in a terminal:

mlx_whisper --model mlx-community/whisper-tiny-khmer-mlx-fp32 --task transcribe SPEECH_FILE_NAME --fp16 False


Safetensors

Model size: 37.2M params
Tensor types: I64, F32


Evaluation results

WER on the test split of "km_kh" in google/fleurs (self-reported): 77.3%