ChunkFormer Classification Model


This model performs four speech classification tasks: gender recognition, dialect identification, emotion detection, and age classification.

Classification Tasks

  • Age: 5 classes
  • Dialect: 5 classes
  • Emotion: 8 classes
  • Gender: 2 classes

Usage

Install the package:

pip install chunkformer

Single Audio Classification

from chunkformer import ChunkFormerModel

# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-gender-emotion-dialect-age-classification")

# Classify a single audio file
result = model.classify_audio(
    audio_path="path/to/your/audio.wav",
    chunk_size=-1,  # -1 for full attention
    left_context_size=-1,
    right_context_size=-1
)

print(result)
# Output example:
# {
#   'gender': {
#       'label': 'female',
#       'label_id': 0,
#       'prob': 0.95
#   },
#   'dialect': {
#       'label': 'northern dialect',
#       'label_id': 3,
#       'prob': 0.70
#   },
#   'emotion': {
#       'label': 'neutral',
#       'label_id': 5,
#       'prob': 0.80
#   }
# }
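
The returned dictionary can be post-processed with ordinary Python. The following is a minimal sketch, assuming the result has the shape shown in the output example above (one entry per task, each with `label`, `label_id`, and `prob`). The helper name `confident_labels` and the 0.75 threshold are illustrative choices, not part of the chunkformer package.

```python
from typing import Any, Dict


def confident_labels(
    result: Dict[str, Dict[str, Any]], threshold: float = 0.75
) -> Dict[str, str]:
    """Keep only the tasks whose predicted label has prob >= threshold."""
    return {
        task: pred["label"]
        for task, pred in result.items()
        if pred["prob"] >= threshold
    }


# Hand-written dictionary copying the output example above
# (not produced by running the model here).
example = {
    "gender": {"label": "female", "label_id": 0, "prob": 0.95},
    "dialect": {"label": "northern dialect", "label_id": 3, "prob": 0.70},
    "emotion": {"label": "neutral", "label_id": 5, "prob": 0.80},
}

print(confident_labels(example))
# {'gender': 'female', 'emotion': 'neutral'}
```

A suitable threshold likely differs per task, since the tasks have different numbers of classes.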

Command Line Usage

chunkformer-decode \
    --model_checkpoint khanhld/chunkformer-gender-emotion-dialect-age-classification \
    --audio_file path/to/audio.wav

Training

This model was trained using the ChunkFormer framework. For more details about the training process and to access the source code, please visit: https://github.com/khanld/chunkformer

Paper: https://arxiv.org/abs/2502.14673

Citation

If you use this work in your research, please cite:

@INPROCEEDINGS{10888640,
    author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
    booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
    year={2025},
    volume={},
    number={},
    pages={1-5},
    keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
    doi={10.1109/ICASSP49660.2025.10888640}}