GigaAM-v3

GigaAM-v3 is a Conformer-based foundation model with 220–240M parameters, pretrained on diverse Russian speech data using the HuBERT-CTC objective. It is the third generation of the GigaAM family and provides state-of-the-art performance on Russian ASR across a wide range of domains.

GigaAM-v3 includes the following model variants:

ssl — self-supervised HuBERT–CTC encoder pre-trained on 700,000 hours of Russian speech
ctc — ASR model fine-tuned with a CTC decoder
rnnt — ASR model fine-tuned with an RNN-T decoder
e2e_ctc — end-to-end CTC model with punctuation and text normalization
e2e_rnnt — end-to-end RNN-T model with punctuation and text normalization

GigaAM-v3 training incorporates new internal datasets: call-center conversations, speech with background music, natural speech, and speech with atypical characteristics. The models perform on average 30% better on these new domains while maintaining the quality of previous GigaAM generations on public benchmarks.

The table below reports the Word Error Rate (%) for GigaAM-v3 and other existing models over diverse domains.

Set Name	V3_CTC	V3_RNNT	T-One + LM	Whisper
Open Datasets	3.0	2.6	5.7	12.0
Golos Farfield	4.5	3.9	12.2	16.7
Natural Speech	7.8	6.9	14.5	13.6
Disordered Speech	20.6	19.2	51.0	59.3
Callcenter	10.3	9.5	13.5	23.9
Average	9.2	8.4	19.4	25.1
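
For readers unfamiliar with the metric, the sketch below shows one common way to compute Word Error Rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration; the text normalization used for the table above is not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For instance, a four-word reference transcribed with one wrong word and one missing word has two errors, so the function returns 50.0.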

The end-to-end ASR models (e2e_ctc and e2e_rnnt) produce punctuated, normalized text directly. In side-by-side comparisons against Whisper-large-v3, with Gemini 2.5 Pro acting as an LLM judge, the e2e_ctc and e2e_rnnt models win by an average ratio of 70:30.

For detailed results, see metrics.

Usage

from transformers import AutoModel

revision = "e2e_rnnt"  # can be any v3 model: ssl, ctc, rnnt, e2e_ctc, e2e_rnnt
model = AutoModel.from_pretrained(
    "ai-sage/GigaAM-v3",
    revision=revision,
    trust_remote_code=True,
)

transcription = model.transcribe("example.wav")
print(transcription)
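
To process several recordings, the same `transcribe` call can be wrapped in a small loop. The sketch below is only an illustration: the helper name `transcribe_files` is hypothetical, and it assumes nothing about the model beyond the `transcribe(path)` method shown above, so whatever that method returns is stored unchanged.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Union


def transcribe_files(model: Any, paths: Iterable[Union[str, Path]]) -> Dict[str, Any]:
    """Call model.transcribe on each path and collect results by file path.

    `model` is any object exposing transcribe(path), such as the model
    loaded above. This helper is hypothetical, not part of GigaAM.
    """
    results: Dict[str, Any] = {}
    for path in paths:
        key = str(path)
        results[key] = model.transcribe(key)
    return results
```

With the model loaded as above, `transcribe_files(model, ["a.wav", "b.wav"])` would return a dictionary mapping each file path to its transcription.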

Recommended versions:

torch==2.8.0, torchaudio==2.8.0
transformers==4.57.1
pyannote-audio==4.0.0, torchcodec==0.7.0
(any) hydra-core, omegaconf, sentencepiece
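
One way to confirm that an environment matches the pinned versions is to query the installed package metadata. The sketch below uses only the standard library; the helper name `check_versions` is hypothetical, the distribution names are taken from the list above as written, and any local build suffix (such as `+cu128` on some torch builds) is ignored when comparing.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pinned versions copied from the list above.
RECOMMENDED = {
    "torch": "2.8.0",
    "torchaudio": "2.8.0",
    "transformers": "4.57.1",
    "pyannote-audio": "4.0.0",
    "torchcodec": "0.7.0",
}


def check_versions(
    recommended: Dict[str, str] = RECOMMENDED,
) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Map each package to (installed version or None, wanted version, match)."""
    report = {}
    for name, wanted in recommended.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build suffix such as "+cu128" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: installed={installed} recommended={wanted} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are recommendations.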

A full usage guide can be found in the example.

License: MIT

Paper: GigaAM: Efficient Self-Supervised Learner for Speech Recognition (InterSpeech 2025)
