---
tags:
- speech-recognition
- audio
- chunkformer
- ctc
- pytorch
- transformers
- automatic-speech-recognition
- long-form transcription
- asr
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
# ChunkFormer Model

<style>
img {
    display: inline;
}
</style>
[GitHub](https://github.com/khanld/chunkformer)
[Paper](https://arxiv.org/abs/2502.14673)
## Usage

Install the package:

```bash
pip install chunkformer
```
```python
from chunkformer import ChunkFormerModel

# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-ctc-small-libri-100h")

# For long-form audio transcription
transcription = model.endless_decode(
    audio_path="path/to/your/audio.wav",
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
    return_timestamps=True,
)
print(transcription)

# For batch processing
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
transcriptions = model.batch_decode(
    audio_paths=audio_files,
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
)
print(transcriptions)
```
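The exact structure of the timestamped output depends on your chunkformer version. Assuming each segment can be unpacked as a `(start_seconds, end_seconds, text)` tuple (an assumption, not the library's documented format), a small helper like this can render the result in a readable form:

```python
# NOTE: the (start, end, text) tuple layout is an assumption; adapt the
# unpacking below to whatever endless_decode(..., return_timestamps=True)
# actually returns in your chunkformer version.

def format_segments(segments):
    """Render (start_sec, end_sec, text) tuples, one '[HH:MM:SS.mmm - HH:MM:SS.mmm] text' line each."""
    def ts(seconds):
        hours, rem = divmod(seconds, 3600)
        minutes, secs = divmod(rem, 60)
        return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"
    return "\n".join(f"[{ts(start)} - {ts(end)}] {text}" for start, end, text in segments)

print(format_segments([(0.0, 2.48, "hello world"), (2.48, 5.1, "this is chunkformer")]))
```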
## Training

This model was trained with the ChunkFormer framework. For details on the training process and access to the source code, visit https://github.com/khanld/chunkformer

Paper: https://arxiv.org/abs/2502.14673
## Citation

If you use this work in your research, please cite:

```bibtex
@INPROCEEDINGS{10888640,
  author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
  doi={10.1109/ICASSP49660.2025.10888640}}
```