---
tags:
- audio-classification
- speech-classification
- audio
- chunkformer
- pytorch
- transformers
- speech-processing
- age
- dialect
- emotion
- gender
license: apache-2.0
library_name: transformers
pipeline_tag: audio-classification
---

# ChunkFormer Classification Model

[![GitHub](https://img.shields.io/badge/GitHub-ChunkFormer-blue)](https://github.com/khanld/chunkformer)
[![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://arxiv.org/abs/2502.14673)

This model performs speech classification tasks such as gender recognition, dialect identification, emotion detection, and age classification.

## Classification Tasks

- **Age**: 5 classes
- **Dialect**: 5 classes
- **Emotion**: 8 classes
- **Gender**: 2 classes

## Usage

Install the package:

```bash
pip install chunkformer
```

### Single Audio Classification

```python
from chunkformer import ChunkFormerModel

# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-gender-emotion-dialect-age-classification")

# Classify a single audio file
result = model.classify_audio(
    audio_path="path/to/your/audio.wav",
    chunk_size=-1,  # -1 for full attention
    left_context_size=-1,
    right_context_size=-1
)

print(result)
# Output example:
# {
#   'gender': {
#       'label': 'female',
#       'label_id': 0,
#       'prob': 0.95
#   },
#   'dialect': {
#       'label': 'northern dialect',
#       'label_id': 3,
#       'prob': 0.70
#   },
#   'emotion': {
#       'label': 'neutral',
#       'label_id': 5,
#       'prob': 0.80
#   }
# }
```

### Command Line Usage

```bash
chunkformer-decode \
    --model_checkpoint khanhld/chunkformer-gender-emotion-dialect-age-classification \
    --audio_file path/to/audio.wav
```

## Training

This model was trained using the ChunkFormer framework.
For more details about the training process and to access the source code, please visit: https://github.com/khanld/chunkformer

Paper: https://arxiv.org/abs/2502.14673

## Citation

If you use this work in your research, please cite:

```bibtex
@INPROCEEDINGS{10888640,
  author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
  doi={10.1109/ICASSP49660.2025.10888640}
}
```
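
## Post-processing the Result

The dictionary returned by `classify_audio` in the Usage section maps each task to a `label`, a `label_id`, and a `prob`. The sketch below is not part of the ChunkFormer package. It shows one possible way to keep only confident predictions and to rank tasks by probability. The helper names `confident_labels` and `rank_by_confidence` and the `min_prob` threshold of 0.75 are illustrative assumptions. The sample dictionary is copied from the output example in the Usage section so the code runs without loading the model.

```python
from typing import Dict, List, Tuple

# Sample result copied from the output example in the Usage section.
# In practice this would be the value returned by model.classify_audio(...).
example_result = {
    "gender": {"label": "female", "label_id": 0, "prob": 0.95},
    "dialect": {"label": "northern dialect", "label_id": 3, "prob": 0.70},
    "emotion": {"label": "neutral", "label_id": 5, "prob": 0.80},
}


def confident_labels(result: Dict[str, dict], min_prob: float = 0.75) -> Dict[str, str]:
    """Return {task: label} for tasks whose probability is at least min_prob.

    Hypothetical helper; the 0.75 default is an arbitrary illustrative threshold.
    """
    return {
        task: pred["label"]
        for task, pred in result.items()
        if pred["prob"] >= min_prob
    }


def rank_by_confidence(result: Dict[str, dict]) -> List[Tuple[str, str, float]]:
    """Return (task, label, prob) tuples sorted from most to least confident."""
    rows = [(task, pred["label"], pred["prob"]) for task, pred in result.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(confident_labels(example_result))
# {'gender': 'female', 'emotion': 'neutral'}

for task, label, prob in rank_by_confidence(example_result):
    print(f"{task}: {label} ({prob:.2f})")
```

Filtering on `prob` is a simple way to avoid acting on low-confidence predictions. A suitable threshold depends on the application and would need to be chosen on held-out data.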