Model Card for Wav2Vec2-Large-XLSR-53-Kapampangan

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for Automatic Speech Recognition (ASR) in Kapampangan. It was trained on a custom dataset of 2,489 audio-transcription pairs and achieves competitive results on a held-out test split. The model supports transcription of Kapampangan speech, with partial support for English code-switching.

Model Details

Model Description

  • Developed by: Sean Almendral
  • Model type: Speech-to-text (ASR)
  • Language(s) (NLP): Kapampangan (pam), English (en)
  • License: Apache-2.0
  • Finetuned from model: facebook/wav2vec2-large-xlsr-53
  • Pipeline tag: automatic-speech-recognition

Model Sources


Uses

Direct Use

  • Transcribing spoken Kapampangan into written text.
  • Educational and cultural preservation of Kapampangan.
  • Can be used as a baseline ASR model for further fine-tuning.

Downstream Use

  • Integration into translation pipelines (e.g., Kapampangan → English).
  • Voice-enabled chatbots, transcription tools, or accessibility apps.

Out-of-Scope Use

  • Not suited for medical, legal, or safety-critical transcription.
  • Performance may degrade with noisy audio or dialects outside the training data.

Bias, Risks, and Limitations

  • Dataset size is relatively small (2,489 clips), so coverage is limited.
  • May not generalize well to informal speech, background noise, or rare words.
  • English code-switching is partially supported but not fully accurate.
  • Risk of transcription bias due to imbalanced speaker representation in the dataset.

Recommendations

  • Users should validate outputs before using them in sensitive applications.
  • For production, additional fine-tuning with larger and more diverse Kapampangan datasets is recommended.

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="kruokruo/wav2vec2-large-xlsr-53-kapampangan"
)

# Input audio should be 16kHz mono, matching the training data.
result = asr("sample_audio.wav")
print(result["text"])
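
For finer control over decoding, the processor and model can also be loaded directly. The sketch below is a minimal example built on the public Transformers API, assuming the same checkpoint; librosa handles the 16kHz mono resampling described under Preprocessing.

import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("kruokruo/wav2vec2-large-xlsr-53-kapampangan")
model = Wav2Vec2ForCTC.from_pretrained("kruokruo/wav2vec2-large-xlsr-53-kapampangan")

# Load and resample the audio to the 16kHz mono format the model expects.
speech, _ = librosa.load("sample_audio.wav", sr=16000, mono=True)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])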

Training Details

Training Data

  • Custom dataset of 2,489 audio-transcription pairs in Kapampangan.
  • Audio normalized to 16kHz mono WAV.
  • Preprocessing included lowercasing, punctuation removal, and character-based vocabulary building.

Training Procedure

  • Base model: facebook/wav2vec2-large-xlsr-53
  • Objective: Connectionist Temporal Classification (CTC)
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Epochs: 30
  • Train/Validation split: 80/20 (random seed=42)

Preprocessing

  • Audio: resampled to 16kHz mono WAV
  • Text: lowercased, punctuation removed, character-level tokenization
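
A minimal sketch of these preprocessing steps is shown below. The punctuation pattern, the wav2vec2-style special tokens, and the placeholder transcript list are assumptions, not the exact training script.

import re
import json
import librosa

def normalize_text(text: str) -> str:
    # Lowercase, strip punctuation (keeping apostrophes), and collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return re.sub(r"\s+", " ", text).strip()

def load_audio(path: str):
    # Resample to 16kHz mono, as described above.
    speech, _ = librosa.load(path, sr=16000, mono=True)
    return speech

# Build a character-level vocabulary from the normalized transcripts.
transcripts = ["..."]  # placeholder for the 2,489 Kapampangan transcriptions
normalized = [normalize_text(t) for t in transcripts]
vocab = {ch: i for i, ch in enumerate(sorted(set("".join(normalized))))}
vocab["|"] = vocab.pop(" ", len(vocab))  # wav2vec2 convention: "|" marks word boundaries
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

with open("vocab.json", "w") as f:
    json.dump(vocab, f, ensure_ascii=False)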

Training Hyperparameters

  • Precision: fp32
  • Batch size: dependent on GPU memory (typical 16–32)
  • Scheduler: Linear decay with warmup
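
A sketch of the corresponding Trainer setup appears below, using the values listed above. The warmup steps, the exact batch size within the stated range, the frozen feature encoder, and the data collator are assumptions; the Trainer uses AdamW by default.

from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,  # processor from the preprocessing step
    vocab_size=len(processor.tokenizer),            # character vocabulary built earlier
)
model.freeze_feature_encoder()  # common for XLSR fine-tuning; an assumption here

args = TrainingArguments(
    output_dir="wav2vec2-large-xlsr-53-kapampangan",
    learning_rate=1e-4,              # as listed above
    num_train_epochs=30,             # as listed above
    per_device_train_batch_size=16,  # within the 16–32 range stated above
    lr_scheduler_type="linear",      # linear decay with warmup
    warmup_steps=500,                # assumed value
    fp16=False,                      # fp32 precision per the card
    seed=42,                         # matches the split seed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,     # prepared datasets assumed in scope
    eval_dataset=eval_dataset,
    data_collator=data_collator,     # a CTC padding collator, assumed defined elsewhere
)
trainer.train()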

Evaluation

Testing Data

  • 20% held-out split from the same dataset (≈498 samples).

Metrics

  • WER (Word Error Rate): 0.3586
  • CER (Character Error Rate): 0.1259
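
For reference, the two metrics can be computed with the `evaluate` library as sketched below; the prediction and reference lists are placeholders.

import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["..."]  # decoded model outputs for the held-out split (placeholder)
references = ["..."]   # ground-truth transcripts (placeholder)

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))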

Results

  • The model performs reliably on clean Kapampangan speech.
  • Character-level error is low (12.6% CER), while word-level error remains moderate (35.9% WER).

Environmental Impact

  • Hardware Type: NVIDIA L4 GPU (Google Colab Pro)
  • Training Duration: [Fill in total hours used, if tracked]
  • Energy Consumed: ~514.06 Wh
  • Carbon Emissions: ~209.74 gCO2e
  • Cloud Provider: Google Colab Pro
  • Compute Region: Not disclosed by Google Colab

These numbers were estimated using the Machine Learning Impact calculator.


Technical Specifications

Model Architecture and Objective

  • Architecture: Wav2Vec2-Large-XLSR-53 (317M parameters, trained on 53 languages)
  • Objective: Fine-tuned with a CTC head for Kapampangan ASR
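
For background, CTC training minimizes the negative log-likelihood of a transcription $y$ given audio $x$, summed over all frame-level alignments $\pi$ that collapse to $y$:

$$\mathcal{L}_{\mathrm{CTC}} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x)$$

where $\mathcal{B}$ removes repeated labels and blank tokens, and $T$ is the number of output frames.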

Compute Infrastructure

  • Framework: PyTorch + Hugging Face Transformers
  • Trainer: Hugging Face Trainer API

Citation

BibTeX:

@misc{wav2vec2-kapampangan,
  title={Wav2Vec2-Large-XLSR-53-Kapampangan: Automatic Speech Recognition Model},
  author={Sean Almendral},
  year={2025},
  howpublished={\url{https://huggingface.co/kruokruo/wav2vec2-large-xlsr-53-kapampangan}},
}

Model Card Authors

  • Sean Almendral

Model Card Contact

