---
base_model:
- openai/whisper-large-v3-turbo
base_model_relation: quantized
pipeline_tag: automatic-speech-recognition
tags:
- quantized
- hardware-optimized
- whisper
- audio
- tensordyne
license: apache-2.0
---

## 📝 Overview

Tensordyne builds advanced [AI inference systems](https://www.tensordyne.ai/inference-system), enabling faster, more affordable, and sustainable generative AI.

This repository provides resources to quickly get started with **[Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)** on the **Tensordyne Inference System and its SDK**.

## 🧩 Model Details

- **Quantization:** post-training quantization of the base model; no fine-tuning or additional training has been performed
- **Supported data types:** Tensordyne FP16 (tFP16), Tensordyne FP8 (tFP8), mixed precision

## ⚙️ Quantization

The Tensordyne SDK offers multiple post-training quantization strategies to convert AI models for efficient inference on the Tensordyne Inference System, fully customizable for your optimization targets.

Here we showcase several preselected quantization variants that can be applied on the fly to quantize the model to Tensordyne data types. The calibration-based strategies are defined by quantization configurations provided as `.json` files.

The quantized models are evaluated on a subset of the [LibriSpeech ASR](https://huggingface.co/datasets/openslr/librispeech_asr) test set. A negative WER drop indicates that the quantized model performs better than the float base model.

| Model Configuration | Absolute WER [%] | Relative WER Drop vs. BF16 [%] | Details |
|---------------------------|------------------|--------------------------------|------------------------------------------------------------|
| BF16 | 1.933 | – | the baseline model trained in BF16 |
| calibration_based_tFP16 | 1.921 | -0.61 | calibration-based tFP16 quantization |
| layerwise_mixed_precision | 1.909 | -1.23 | calibration-based mixed precision: tFP8, outliers in tFP16 |

## 🚀 Getting Started

Refer to the [Tensordyne Hugging Face Hub tutorial](https://resources.tensordyne.ai/sdk/v0.1.1/tutorials/tutorials/#tensordyne-hugging-face-hub-tutorials) for instructions on using the artifacts provided in this repository.

Our [hosted documentation](https://resources.tensordyne.ai/sdk/v0.1.1/) provides more information on Tensordyne's quantization strategies and introduces you to our SDK.
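
As a minimal, illustrative sketch (the tutorial linked above remains the authoritative reference), the artifacts in this repository, including the quantization configuration `.json` files, can be fetched with the Hugging Face Hub client. The repository ID below is a placeholder and should be replaced with the ID of this repository:

```python
# Illustrative sketch only -- follow the Tensordyne tutorial linked above for
# the officially supported workflow. The repository ID is a placeholder.
from huggingface_hub import snapshot_download

# Download all artifacts in this repository (model card, quantization
# configuration .json files, etc.) to a local cache directory.
local_dir = snapshot_download(repo_id="tensordyne/whisper-large-v3-turbo")
print(f"Artifacts downloaded to: {local_dir}")
```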
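
For reference, the absolute WER figures in the table above follow the standard word-error-rate definition. The snippet below is a generic sketch of computing WER with the Hugging Face `evaluate` library; it is not the Tensordyne evaluation pipeline, which this card does not show:

```python
# Generic WER sketch using the Hugging Face `evaluate` library (requires the
# `jiwer` backend). Not the Tensordyne evaluation pipeline.
import evaluate

wer_metric = evaluate.load("wer")

references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over the lazy dog"]

# WER = (substitutions + insertions + deletions) / number of reference words
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer * 100:.3f} %")
```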