---
language:
- en
license: mit
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- transcription
- phone-calls
- conversational
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://olib.ai/logo.png" alt="Olib AI Logo" width="200"/>

# Whisper to Oliver

**Fine-tuned Whisper for Real-World Conversational Audio**

[Model](https://huggingface.co/olib-ai/whisper-to-oliver) | [License: MIT](https://opensource.org/licenses/MIT) | [Olib AI](https://www.olib.ai)

</div>
## 🎯 Model Description
**Whisper to Oliver** is a specialized fine-tuned version of OpenAI's `whisper-large-v3-turbo` model, optimized for real-world conversational audio with challenging acoustic conditions. This model is specifically designed to excel at transcribing phone calls and conversations where audio quality may be compromised.
### ✨ Key Features
- 🎙️ **Enhanced Performance on Poor-Quality Audio**: Fine-tuned on 170K conversational audio samples ranging from mildly degraded to poor quality
- 📞 **Phone Call Optimized**: Specifically trained on short conversational segments typical of phone calls
- 🚀 **Turbo Performance**: Inherits the speed advantages of whisper-large-v3-turbo
- 💼 **Enterprise Ready**: Developed by [Olib AI](https://www.olib.ai) for business applications
- 🔧 **FP32 Precision**: Full-precision weights for maximum accuracy
## 📊 Training Details
- **Base Model**: [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)
- **Training Dataset**: 170,000 conversational audio samples
- **Audio Characteristics**: Mildly degraded to poor-quality recordings
- **Focus**: Short conversational segments typical of phone interactions
- **Developer**: [Olib AI](https://www.olib.ai) - Building AI Services for Businesses
## 🚀 Usage
### Using the Transformers Library
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "olib-ai/whisper-to-oliver"

# The released weights are stored in FP32; on GPU they are cast to float16 here.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe an audio file
result = pipe("audio.mp3")
print(result["text"])
```
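If your audio arrives as an in-memory array (for example, pulled from a telephony system) rather than a file on disk, the pipeline also accepts a raw NumPy waveform plus its sampling rate. Below is a minimal sketch, assuming `soundfile` and `librosa` are installed and reusing the `pipe` object created above; the file name is a placeholder:

```python
import soundfile as sf
import librosa

# Telephony audio is often recorded at 8 kHz; Whisper expects 16 kHz input.
audio, sr = sf.read("phone_call.wav")  # hypothetical file name
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # down-mix stereo to mono
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)

# The ASR pipeline also accepts in-memory audio as a dict.
result = pipe({"raw": audio, "sampling_rate": 16000})
print(result["text"])
```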
### Advanced Usage with Parameters
```python
# For better results with phone calls or poor quality audio
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
)
print(result["text"])
```
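Because the model card metadata lists English as the supported language, it may also help to pin the language and task at generation time so Whisper skips language detection on noisy segments. A small sketch reusing the `pipe` object from above:

```python
# Force English transcription to avoid misdetection on degraded audio.
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    generate_kwargs={"language": "english", "task": "transcribe"},
)
print(result["text"])
```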
## 📈 Performance
Whisper to Oliver shows significant improvements over the base model under the following conditions; a sketch for measuring the gain on your own recordings follows the list:
- 📞 Phone call recordings
- 🎙️ Low-quality microphone inputs
- 🔊 Conversational speech with background noise
- 💬 Short dialogue segments
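One way to quantify the improvement on your own labeled recordings is to compute word error rate (WER), for example with the `jiwer` package. The file names and reference transcripts below are placeholders, and `pipe` is the pipeline built in the Usage section:

```python
import jiwer

# Hypothetical evaluation set: (audio path, reference transcript) pairs.
eval_set = [
    ("call_001.wav", "hi thanks for calling how can i help you today"),
    ("call_002.wav", "i would like to check the status of my order"),
]

references, hypotheses = [], []
for path, reference in eval_set:
    hypotheses.append(pipe(path)["text"].lower().strip())
    references.append(reference)

# Lower WER means better transcription quality; repeat with the base model to compare.
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
```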
## 🎯 Intended Use
This model is designed for:
- Customer service call transcription (see the sketch after this list)
- Meeting transcription with variable audio quality
- Voice assistant applications
- Real-time conversation analysis
- Accessibility applications for hearing-impaired users
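As a concrete illustration of the call-transcription use case above, the following sketch walks a directory of recordings and writes one transcript per file. The directory name is a placeholder, and `pipe` is the pipeline from the Usage section:

```python
from pathlib import Path

calls_dir = Path("recorded_calls")  # hypothetical input directory
out_dir = Path("transcripts")
out_dir.mkdir(exist_ok=True)

for audio_path in sorted(calls_dir.glob("*.mp3")):
    # Chunked long-form transcription with timestamps for each call.
    result = pipe(str(audio_path), chunk_length_s=30, return_timestamps=True)
    out_file = out_dir / f"{audio_path.stem}.txt"
    out_file.write_text(result["text"], encoding="utf-8")
    print(f"Transcribed {audio_path.name} -> {out_file}")
```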
## ⚠️ Limitations and Ethical Considerations
Following the ethical guidelines of the base Whisper model:
- Should not be used to transcribe recordings without consent
- Not recommended for "subjective classification" tasks
- Should undergo robust evaluation before deployment in high-risk contexts
- May show performance variations across different languages and demographics
## 📄 License
This model is released under the **MIT License**, allowing for commercial and non-commercial use with proper attribution.
## 📚 Citation
If you use this model in your research or applications, please cite both our work and the original Whisper paper:
```bibtex
@misc{whisper-to-oliver,
  author       = {{Olib AI}},
  title        = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver}},
}

@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
## 👥 About Olib AI
[Olib AI](https://www.olib.ai) specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.
**Contact Us:**
- 🌐 Website: [www.olib.ai](https://www.olib.ai)
- 📧 Akram H. Sharkar: [[email protected]](mailto:[email protected])
- 📧 Maya M. Sharkar: [[email protected]](mailto:[email protected])
- 💻 GitHub: [https://github.com/Olib-AI](https://github.com/Olib-AI)
---
<div align="center">
<strong>Built with ❤️ by Olib AI</strong>
</div>