---
library_name: transformers
license: mit
base_model: pyannote/segmentation-3.0
tags:
- speaker-diarization
- speaker-segmentation
- generated_from_trainer
- air traffic control
datasets:
- miguelozaalon/atco2-1h-asr-diarization
model-index:
- name: speaker-segmentation-atc
  results:
  - task:
      type: segmentation
    dataset:
      name: atco2
      type: atco2
    metrics:
    - name: Diarization Error Rate (DER)
      type: Diarization Error Rate
      value: 15.816%
    - name: Jaccard Error Rate (JER)
      type: Jaccard Error Rate
      value: 24.198%
  - task:
      type: segmentation
    dataset:
      name: atco2-noise-reduction
      type: atco2-noise-reduction
    metrics:
    - name: Diarization Error Rate (DER)
      type: Diarization Error Rate
      value: 14.764%
    - name: Jaccard Error Rate (JER)
      type: Jaccard Error Rate
      value: 19.815%
language:
- en
---

# speaker-segmentation-atc

This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on the [miguelozaalon/atco2-1h-asr-diarization](https://huggingface.co/datasets/miguelozaalon/atco2-1h-asr-diarization) dataset.

## Model description

This model performs speaker segmentation in air traffic control (ATC) communications. It has been fine-tuned on a dataset specifically curated for ATC conversations, making it particularly effective at identifying and segmenting the different speakers in ATC audio recordings.

The model uses the pyannote/segmentation-3.0 architecture as its base, which is known for its robust performance in speaker diarization tasks. Fine-tuning on ATC-specific data adapts the model to the characteristics of air traffic control communications, including rapid speaker exchanges, radio-channel background noise, and technical phraseology.
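The fine-tuned checkpoint can be plugged into a standard pyannote.audio diarization pipeline. The sketch below is illustrative, not part of this repository: `your-username/speaker-segmentation-atc` is a placeholder for this model's actual repository id, the embedding model and pipeline hyperparameters are generic defaults rather than tuned values, and a Hugging Face access token is assumed in `HF_TOKEN`.

```python
import os

from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the fine-tuned segmentation checkpoint
# (placeholder repo id: replace with this model's actual id)
model = Model.from_pretrained(
    "your-username/speaker-segmentation-atc",
    use_auth_token=os.environ["HF_TOKEN"],
)

# Build a diarization pipeline around it; the embedding model and the
# hyperparameters below are illustrative defaults, not tuned values.
pipeline = SpeakerDiarization(
    segmentation=model,
    embedding="pyannote/wespeaker-voxceleb-resnet34-LM",
    clustering="AgglomerativeClustering",
)
pipeline.instantiate({
    "segmentation": {"min_duration_off": 0.0},
    "clustering": {"method": "centroid", "min_cluster_size": 12, "threshold": 0.7},
})

# Apply to an ATC recording and print speaker turns
diarization = pipeline("atc_recording.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```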
## Intended uses & limitations

### Intended uses

- Speaker segmentation in air traffic control audio recordings
- Diarization of ATC communications for transcription or analysis purposes
- Identifying turn-taking patterns in ATC conversations

### Limitations

- The model is specifically trained on ATC data and may not perform as well on general conversational audio

## Training and evaluation data

The model was trained on the [miguelozaalon/atco2-1h-asr-diarization](https://huggingface.co/datasets/miguelozaalon/atco2-1h-asr-diarization) dataset. This dataset consists of:

- 1 hour of annotated ATC communications
- Multiple speakers, including air traffic controllers and pilots
- Varied acoustic conditions typical of ATC environments
- Detailed speaker turn annotations

The dataset was split into training and validation sets to ensure proper evaluation during the fine-tuning process.

## Training procedure

Training started from the pre-trained [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) model and focused on adapting it to the specific characteristics of ATC communications while retaining its general speaker segmentation capabilities.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 5.0

### Training results

- DER = 15.816%
- JER = 24.198%

### Framework versions

- Transformers 4.45.1
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.0
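The DER figures reported above combine three error types: missed speech, false-alarm speech, and speaker confusion, divided by the total duration of reference speech. A minimal stdlib-only sketch of that computation follows; it is a simplification of what `pyannote.metrics` actually does, comparing labels on a fixed frame grid and assuming reference and hypothesis speaker labels are already aligned (no optimal speaker-mapping step).

```python
# Simplified frame-level Diarization Error Rate (DER).
# Frames carry a speaker label, or None for silence.

def frame_der(reference, hypothesis):
    """DER = (missed + false alarm + confusion) / total reference speech."""
    assert len(reference) == len(hypothesis)
    missed = false_alarm = confusion = 0
    total_speech = sum(1 for r in reference if r is not None)
    for r, h in zip(reference, hypothesis):
        if r is not None and h is None:
            missed += 1          # speech labelled as silence
        elif r is None and h is not None:
            false_alarm += 1     # silence labelled as speech
        elif r is not None and r != h:
            confusion += 1       # speech assigned to the wrong speaker
    return (missed + false_alarm + confusion) / total_speech

# Toy example: 10 frames, two speakers, one silence gap
ref = ["ctrl"] * 4 + [None] * 2 + ["pilot"] * 4
hyp = ["ctrl"] * 3 + [None] * 2 + ["pilot"] * 5
print(f"DER = {frame_der(ref, hyp):.1%}")  # → DER = 25.0%
```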