Model Card for Wav2Vec2-Large-XLSR-53-Kapampangan
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for Automatic Speech Recognition (ASR) in Kapampangan. It was trained on a custom dataset of 2,489 audio-transcription pairs and achieves competitive results on a held-out test split. The model supports transcription of Kapampangan speech, with partial support for English code-switching.
Model Details
Model Description
- Developed by: Sean Almendral
- Model type: Speech-to-text (ASR)
- Languages (NLP): Kapampangan (kap), English (en)
- License: Apache-2.0
- Finetuned from model: facebook/wav2vec2-large-xlsr-53
- Pipeline tag: automatic-speech-recognition
Model Sources
- Repository: Hugging Face Model Card
- Paper [base model]: Wav2Vec2: Self-Supervised Learning of Speech Representations
Uses
Direct Use
- Transcribing spoken Kapampangan into written text.
- Educational and cultural preservation of Kapampangan.
- Can be used as a baseline ASR model for further fine-tuning.
Downstream Use
- Integration into translation pipelines (e.g., Kapampangan → English).
- Voice-enabled chatbots, transcription tools, or accessibility apps.
Out-of-Scope Use
- Not suited for medical, legal, or safety-critical transcription.
- Performance may degrade with noisy audio or dialects outside the training data.
Bias, Risks, and Limitations
- Dataset size is relatively small (2,489 clips), so coverage is limited.
- May not generalize well to informal speech, background noise, or rare words.
- English code-switching is partially supported but not fully accurate.
- Risk of transcription bias due to imbalanced speaker representation in the dataset.
Recommendations
- Users should validate outputs before using them in sensitive applications.
- For production, additional fine-tuning with larger and more diverse Kapampangan datasets is recommended.
How to Get Started with the Model
```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="kruokruo/wav2vec2-large-xlsr-53-kapampangan"
)
result = asr("sample_audio.wav")
print(result["text"])
```
Training Details
Training Data
- Custom dataset of 2,489 audio-transcription pairs in Kapampangan.
- Audio normalized to 16kHz mono WAV.
- Preprocessing included lowercasing, punctuation removal, and character-based vocabulary building.
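To make the audio side of this preprocessing concrete, here is a minimal NumPy sketch of downmixing to mono and resampling to 16 kHz. The helper name `to_mono_16k` is hypothetical, and a real pipeline would typically use torchaudio or librosa rather than plain linear interpolation:

```python
import numpy as np

def to_mono_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Downmix (channels, samples) or (samples,) audio to mono and linearly
    resample it to target_sr. Illustrative only; production code would use
    a proper resampler (e.g. torchaudio.transforms.Resample)."""
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # average channels -> mono
    duration = audio.shape[0] / orig_sr
    n_target = int(round(duration * target_sr))
    # Interpolate the old sample grid onto the new 16 kHz grid
    old_t = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio).astype(np.float32)
```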
Training Procedure
- Base model: facebook/wav2vec2-large-xlsr-53
- Objective: Connectionist Temporal Classification (CTC)
- Optimizer: AdamW
- Learning rate: 1e-4
- Epochs: 30
- Train/Validation split: 80/20 (random seed=42)
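A fine-tuning setup consistent with the settings above might look like the following. This is a configuration sketch only, not the author's actual script: it assumes a Kapampangan character vocabulary has already been saved with a processor at the hypothetical path "./processor", the warmup size is an assumption (the card does not state it), and the dataset and data collator are omitted:

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor, TrainingArguments, Trainer

processor = Wav2Vec2Processor.from_pretrained("./processor")  # hypothetical path
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",                      # CTC objective, as stated above
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # common practice when fine-tuning wav2vec2

args = TrainingArguments(
    output_dir="wav2vec2-large-xlsr-53-kapampangan",
    learning_rate=1e-4,
    num_train_epochs=30,
    per_device_train_batch_size=16,  # adjust to GPU memory (see Hyperparameters)
    lr_scheduler_type="linear",
    warmup_steps=500,                # assumed; not stated in the card
    seed=42,
    fp16=False,                      # card reports fp32 precision
)
trainer = Trainer(model=model, args=args)
# trainer.train()  # requires train_dataset / eval_dataset, omitted here
```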
Preprocessing
- Audio: resampled to 16kHz mono WAV
- Text: lowercased, punctuation removed, character-level tokenization
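The text side can be sketched in plain Python. The names `normalize` and `build_char_vocab` are illustrative, and the special tokens shown are common wav2vec2 conventions rather than the card's documented vocabulary:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation, mirroring the preprocessing above."""
    text = text.lower()
    return re.sub(r"[^\w\s]", "", text)

def build_char_vocab(transcripts):
    """Character-level vocabulary plus pad/unknown tokens (token names are
    illustrative; the actual special tokens may differ)."""
    chars = sorted(set("".join(normalize(t) for t in transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```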
Training Hyperparameters
- Precision: fp32
- Batch size: dependent on GPU memory (typically 16–32)
- Scheduler: Linear decay with warmup
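The linear-decay-with-warmup schedule can be written out as a small function; the step counts used here are illustrative:

```python
def lr_at_step(step: int, total_steps: int, warmup_steps: int, base_lr: float = 1e-4) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```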
Evaluation
Testing Data
- 20% held-out split from the same dataset (≈498 samples).
Metrics
- WER (Word Error Rate): 0.3586
- CER (Character Error Rate): 0.1259
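Both metrics are edit-distance ratios: WER over word tokens, CER over characters. A minimal self-contained implementation (equivalent in spirit to libraries like jiwer, which a real evaluation would normally use) looks like:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over a sequence of tokens (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min of deletion, insertion, and (mis)match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref: str, hyp: str) -> float:
    return edit_distance(list(ref), list(hyp)) / len(ref)
```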
Results
- The model achieves strong performance for clean Kapampangan speech.
- Maintains low character-level error (12.6%) and moderate word-level error (35.9%).
Environmental Impact
- Hardware Type: NVIDIA L4 GPU (Google Colab Pro)
- Training Duration: [Fill in total hours used, if tracked]
- Energy Consumed: ~514.06 Wh
- Carbon Emissions: ~209.74 gCO2e
- Cloud Provider: Google Colab Pro
- Compute Region: Not disclosed by Google Colab
These numbers were estimated using the Machine Learning Impact calculator.
Technical Specifications
Model Architecture and Objective
- Architecture: Wav2Vec2-Large-XLSR-53 (317M parameters, trained on 53 languages)
- Objective: Fine-tuned with a CTC head for Kapampangan ASR
Compute Infrastructure
- Framework: PyTorch + Hugging Face Transformers
- Trainer: Hugging Face Trainer API
Citation
BibTeX:
@misc{wav2vec2-kapampangan,
  title={Wav2Vec2-Large-XLSR-53-Kapampangan: Automatic Speech Recognition Model},
  author={Sean Almendral},
  year={2025},
  howpublished={\url{https://huggingface.co/kruokruo/wav2vec2-large-xlsr-53-kapampangan}},
}
Model Card Authors
- Sean Almendral
Model Card Contact