Model Card for Wav2Vec2-Large-XLSR-53-Kapampangan

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for Automatic Speech Recognition (ASR) in Kapampangan. It was trained on a custom dataset of 2,489 audio-transcription pairs and achieves competitive results on a held-out test split. The model supports transcription of Kapampangan speech, with partial support for English code-switching.

Model Details

Model Description

  • Developed by: Sean Almendral
  • Model type: Speech-to-text (ASR)
  • Language(s) (NLP): Kapampangan (pam), English (en)
  • License: Apache-2.0
  • Finetuned from model: facebook/wav2vec2-large-xlsr-53
  • Pipeline tag: automatic-speech-recognition

Model Sources


Uses

Direct Use

  • Transcribing spoken Kapampangan into written text.
  • Educational and cultural preservation of Kapampangan.
  • Can be used as a baseline ASR model for further fine-tuning.

Downstream Use

  • Integration into translation pipelines (e.g., Kapampangan → English).
  • Voice-enabled chatbots, transcription tools, or accessibility apps.

Out-of-Scope Use

  • Not suited for medical, legal, or safety-critical transcription.
  • Performance may degrade with noisy audio or dialects outside the training data.

Bias, Risks, and Limitations

  • Dataset size is relatively small (2,489 clips), so coverage is limited.
  • May not generalize well to informal speech, background noise, or rare words.
  • English code-switching is partially supported but not fully accurate.
  • Risk of transcription bias due to imbalanced speaker representation in the dataset.

Recommendations

  • Users should validate outputs before using them in sensitive applications.
  • For production, additional fine-tuning with larger and more diverse Kapampangan datasets is recommended.

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="kruokruo/wav2vec2-large-xlsr-53-kapampangan"
)

# Input audio should be 16kHz mono, matching the training data.
result = asr("sample_audio.wav")
print(result["text"])
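
For finer control over decoding, the processor and model can also be loaded directly. The sketch below is a minimal example built on the public Transformers API, assuming the same checkpoint; librosa handles the 16kHz mono resampling described under Preprocessing.

import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("kruokruo/wav2vec2-large-xlsr-53-kapampangan")
model = Wav2Vec2ForCTC.from_pretrained("kruokruo/wav2vec2-large-xlsr-53-kapampangan")

# Load and resample the audio to the 16kHz mono format the model expects.
speech, _ = librosa.load("sample_audio.wav", sr=16000, mono=True)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])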

Training Details

Training Data

  • Custom dataset of 2,489 audio-transcription pairs in Kapampangan.
  • Audio normalized to 16kHz mono WAV.
  • Preprocessing included lowercasing, punctuation removal, and character-based vocabulary building.

Training Procedure

  • Base model: facebook/wav2vec2-large-xlsr-53
  • Objective: Connectionist Temporal Classification (CTC)
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Epochs: 30
  • Train/Validation split: 80/20 (random seed=42)

Preprocessing

  • Audio: resampled to 16kHz mono WAV
  • Text: lowercased, punctuation removed, character-level tokenization
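
A minimal sketch of these preprocessing steps is shown below. The punctuation pattern, the wav2vec2-style special tokens, and the placeholder transcript list are assumptions, not the exact training script.

import re
import json
import librosa

def normalize_text(text: str) -> str:
    # Lowercase, strip punctuation (keeping apostrophes), and collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return re.sub(r"\s+", " ", text).strip()

def load_audio(path: str):
    # Resample to 16kHz mono, as described above.
    speech, _ = librosa.load(path, sr=16000, mono=True)
    return speech

# Build a character-level vocabulary from the normalized transcripts.
transcripts = ["..."]  # placeholder for the 2,489 Kapampangan transcriptions
normalized = [normalize_text(t) for t in transcripts]
vocab = {ch: i for i, ch in enumerate(sorted(set("".join(normalized))))}
vocab["|"] = vocab.pop(" ", len(vocab))  # wav2vec2 convention: "|" marks word boundaries
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

with open("vocab.json", "w") as f:
    json.dump(vocab, f, ensure_ascii=False)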

Training Hyperparameters

  • Precision: fp32
  • Batch size: dependent on GPU memory (typical 16–32)
  • Scheduler: Linear decay with warmup
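
A sketch of the corresponding Trainer setup appears below, using the values listed above. The warmup steps, the exact batch size within the stated range, the frozen feature encoder, and the data collator are assumptions; the Trainer uses AdamW by default.

from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,  # processor from the preprocessing step
    vocab_size=len(processor.tokenizer),            # character vocabulary built earlier
)
model.freeze_feature_encoder()  # common for XLSR fine-tuning; an assumption here

args = TrainingArguments(
    output_dir="wav2vec2-large-xlsr-53-kapampangan",
    learning_rate=1e-4,              # as listed above
    num_train_epochs=30,             # as listed above
    per_device_train_batch_size=16,  # within the 16–32 range stated above
    lr_scheduler_type="linear",      # linear decay with warmup
    warmup_steps=500,                # assumed value
    fp16=False,                      # fp32 precision per the card
    seed=42,                         # matches the split seed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,     # prepared datasets assumed in scope
    eval_dataset=eval_dataset,
    data_collator=data_collator,     # a CTC padding collator, assumed defined elsewhere
)
trainer.train()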

Evaluation

Testing Data

  • 20% held-out split from the same dataset (≈498 samples).

Metrics

  • WER (Word Error Rate): 0.3586
  • CER (Character Error Rate): 0.1259
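
For reference, the two metrics can be computed with the `evaluate` library as sketched below; the prediction and reference lists are placeholders.

import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["..."]  # decoded model outputs for the held-out split (placeholder)
references = ["..."]   # ground-truth transcripts (placeholder)

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))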

Results

  • The model performs reliably on clean Kapampangan speech.
  • Character-level error is low (12.6% CER), while word-level error remains moderate (35.9% WER).

Environmental Impact

  • Hardware Type: NVIDIA L4 GPU (Google Colab Pro)
  • Training Duration: [Fill in total hours used, if tracked]
  • Energy Consumed: ~514.06 Wh
  • Carbon Emissions: ~209.74 gCO2e
  • Cloud Provider: Google Colab Pro
  • Compute Region: Not disclosed by Google Colab

These numbers were estimated using the Machine Learning Impact calculator.


Technical Specifications

Model Architecture and Objective

  • Architecture: Wav2Vec2-Large-XLSR-53 (317M parameters, trained on 53 languages)
  • Objective: Fine-tuned with a CTC head for Kapampangan ASR
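
For background, CTC training minimizes the negative log-likelihood of a transcription $y$ given audio $x$, summed over all frame-level alignments $\pi$ that collapse to $y$:

$$\mathcal{L}_{\mathrm{CTC}} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x)$$

where $\mathcal{B}$ removes repeated labels and blank tokens, and $T$ is the number of output frames.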

Compute Infrastructure

  • Framework: PyTorch + Hugging Face Transformers
  • Trainer: Hugging Face Trainer API

Citation

BibTeX:

@misc{wav2vec2-kapampangan,
  title={Wav2Vec2-Large-XLSR-53-Kapampangan: Automatic Speech Recognition Model},
  author={Sean Almendral},
  year={2025},
  howpublished={\url{https://huggingface.co/kruokruo/wav2vec2-large-xlsr-53-kapampangan}},
}

Model Card Authors

  • Sean Almendral

Model Card Contact

