---
library_name: transformers
license: mit
base_model: pyannote/segmentation-3.0
tags:
- speaker-diarization
- speaker-segmentation
- generated_from_trainer
- air traffic control
datasets:
- miguelozaalon/atco2-1h-asr-diarization
model-index:
- name: speaker-segmentation-atc
  results:
  - task:
      type: segmentation
    dataset:
      name: atco2
      type: atco2
    metrics:
    - name: Diarization Error Rate (DER)
      type: Diarization Error Rate
      value: 15.816%
    - name: Jaccard Error Rate (JER)
      type: Jaccard Error Rate
      value: 24.198%
  - task:
      type: segmentation
    dataset:
      name: atco2-noise-reduction
      type: atco2-noise-reduction
    metrics:
    - name: Diarization Error Rate (DER)
      type: Diarization Error Rate
      value: 14.764%
    - name: Jaccard Error Rate (JER)
      type: Jaccard Error Rate
      value: 19.815%
language:
- en
---

# speaker-segmentation-atc

This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on the [miguelozaalon/atco2-1h-asr-diarization](https://huggingface.co/datasets/miguelozaalon/atco2-1h-asr-diarization) dataset.

## Model description

This model performs speaker segmentation in air traffic control (ATC) communications. It has been fine-tuned on a dataset specifically curated for ATC conversations, making it particularly effective at identifying and segmenting the different speakers in ATC audio recordings.

The model uses the pyannote/segmentation-3.0 architecture as its base, which is known for its robust performance in speaker diarization tasks. Fine-tuning on ATC-specific data adapts the model to the characteristics of air traffic control communications, including rapid speaker exchanges, radio-channel background noise, and technical phraseology.
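The fine-tuned checkpoint can be plugged into a standard pyannote.audio diarization pipeline. The sketch below is illustrative, not part of this repository: `your-username/speaker-segmentation-atc` is a placeholder for this model's actual repository id, the embedding model and pipeline hyperparameters are generic defaults rather than tuned values, and a Hugging Face access token is assumed in `HF_TOKEN`.

```python
import os

from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the fine-tuned segmentation checkpoint
# (placeholder repo id: replace with this model's actual id)
model = Model.from_pretrained(
    "your-username/speaker-segmentation-atc",
    use_auth_token=os.environ["HF_TOKEN"],
)

# Build a diarization pipeline around it; the embedding model and the
# hyperparameters below are illustrative defaults, not tuned values.
pipeline = SpeakerDiarization(
    segmentation=model,
    embedding="pyannote/wespeaker-voxceleb-resnet34-LM",
    clustering="AgglomerativeClustering",
)
pipeline.instantiate({
    "segmentation": {"min_duration_off": 0.0},
    "clustering": {"method": "centroid", "min_cluster_size": 12, "threshold": 0.7},
})

# Apply to an ATC recording and print speaker turns
diarization = pipeline("atc_recording.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```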
## Intended uses & limitations

### Intended uses

- Speaker segmentation in air traffic control audio recordings
- Diarization of ATC communications for transcription or analysis purposes
- Identifying turn-taking patterns in ATC conversations

### Limitations

- The model is specifically trained on ATC data and may not perform as well on general conversational audio

## Training and evaluation data

The model was trained on the [miguelozaalon/atco2-1h-asr-diarization](https://huggingface.co/datasets/miguelozaalon/atco2-1h-asr-diarization) dataset. This dataset consists of:

- 1 hour of annotated ATC communications
- Multiple speakers, including air traffic controllers and pilots
- Varied acoustic conditions typical of ATC environments
- Detailed speaker turn annotations

The dataset was split into training and validation sets to ensure proper evaluation during the fine-tuning process.

## Training procedure

Training started from the pre-trained [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) model and focused on adapting it to the specific characteristics of ATC communications while retaining its general speaker segmentation capabilities.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 5.0

### Training results

- DER = 15.816%
- JER = 24.198%

### Framework versions

- Transformers 4.45.1
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.0
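The DER figures reported above combine three error types: missed speech, false-alarm speech, and speaker confusion, divided by the total duration of reference speech. A minimal stdlib-only sketch of that computation follows; it is a simplification of what `pyannote.metrics` actually does, comparing labels on a fixed frame grid and assuming reference and hypothesis speaker labels are already aligned (no optimal speaker-mapping step).

```python
# Simplified frame-level Diarization Error Rate (DER).
# Frames carry a speaker label, or None for silence.

def frame_der(reference, hypothesis):
    """DER = (missed + false alarm + confusion) / total reference speech."""
    assert len(reference) == len(hypothesis)
    missed = false_alarm = confusion = 0
    total_speech = sum(1 for r in reference if r is not None)
    for r, h in zip(reference, hypothesis):
        if r is not None and h is None:
            missed += 1          # speech labelled as silence
        elif r is None and h is not None:
            false_alarm += 1     # silence labelled as speech
        elif r is not None and r != h:
            confusion += 1       # speech assigned to the wrong speaker
    return (missed + false_alarm + confusion) / total_speech

# Toy example: 10 frames, two speakers, one silence gap
ref = ["ctrl"] * 4 + [None] * 2 + ["pilot"] * 4
hyp = ["ctrl"] * 3 + [None] * 2 + ["pilot"] * 5
print(f"DER = {frame_der(ref, hyp):.1%}")  # → DER = 25.0%
```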