---
language:
- en
license: mit
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- transcription
- phone-calls
- conversational
pipeline_tag: automatic-speech-recognition
---

<div align="center">
  <img src="https://olib.ai/logo.png" alt="Olib AI Logo" width="200"/>
  
  # Whisper to Oliver
  
  **Fine-tuned Whisper for Real-World Conversational Audio**
  
  [![Model on HF](https://img.shields.io/badge/🤗-Model%20on%20HF-yellow.svg)](https://huggingface.co/olib-ai/whisper-to-oliver)
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
  [![Olib AI](https://img.shields.io/badge/🌐-Olib%20AI-green.svg)](https://www.olib.ai)
</div>

## 🎯 Model Description

**Whisper to Oliver** is a fine-tuned version of OpenAI's `whisper-large-v3-turbo`, optimized for real-world conversational audio recorded under challenging acoustic conditions. It targets phone calls and other conversations where audio quality is compromised.

### ✨ Key Features

- πŸŽ™οΈ **Enhanced Performance on Poor Quality Audio**: Fine-tuned on 170K conversational datasets with minor to poor audio quality
- πŸ“ž **Phone Call Optimized**: Specifically trained on short conversational segments typical of phone calls
- πŸš€ **Turbo Performance**: Inherits the speed advantages of whisper-large-v3-turbo
- πŸ’Ό **Enterprise Ready**: Developed by [Olib AI](https://www.olib.ai) for business applications
- πŸ”§ **FP32 Precision**: Full precision model for maximum accuracy

## 📊 Training Details

- **Base Model**: [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)
- **Training Dataset**: 170,000 conversational audio samples
- **Audio Characteristics**: Minor to poor quality recordings
- **Focus**: Short conversational segments typical of phone interactions
- **Developer**: [Olib AI](https://www.olib.ai) - Building AI Services for Businesses

## 🚀 Usage

### Using the Transformers Library

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "olib-ai/whisper-to-oliver"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# Note: the published weights are FP32; torch_dtype above casts them to FP16 when a GPU is available

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe audio
result = pipe("audio.mp3")
print(result["text"])
```
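
The loading code above selects `torch.float16` on GPU even though the checkpoint is published in FP32. As a rough guide to what that choice means for memory, the sketch below estimates the weight footprint at each precision. It assumes the fine-tuned model keeps the roughly 809 million parameters of the base `whisper-large-v3-turbo`; `estimate_weight_memory_gb` is a hypothetical helper, and the figures cover weights only, not activations or the decoding cache.

```python
# Assumption: parameter count matches the base whisper-large-v3-turbo (~809M).
ASSUMED_PARAM_COUNT = 809_000_000

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def estimate_weight_memory_gb(dtype: str, param_count: int = ASSUMED_PARAM_COUNT) -> float:
    """Return the approximate size of the model weights in gigabytes (10^9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{estimate_weight_memory_gb(dtype):.1f} GB of weights")
```

Under that assumption the weights take about 3.2 GB in FP32 and about 1.6 GB in FP16. To keep full precision on GPU, pass `torch.float32` as `torch_dtype` instead.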

### Advanced Usage with Parameters

```python
# Chunked transcription with timestamps, e.g. for full phone calls or poor quality audio
# (reuses the `pipe` object created above)
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
)
print(result["text"])
```
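
With `return_timestamps=True`, the pipeline result also carries a `"chunks"` list whose entries hold a `"timestamp"` pair and the matching `"text"`. The sketch below turns that list into readable lines. `format_chunks` is a hypothetical helper and `sample_result` is hand-written illustrative data, not real model output.

```python
def _format_time(seconds):
    """Render a timestamp in seconds, or '?' when the pipeline left it unset."""
    return f"{seconds:.2f}s" if seconds is not None else "?"


def format_chunks(result: dict) -> list[str]:
    """Build one '[start - end] text' line per timestamped chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{_format_time(start)} - {_format_time(end)}] {text}")
    return lines


# Hand-written example in the shape the pipeline returns (illustrative only).
sample_result = {
    "text": " Hello, thanks for calling. How can I help?",
    "chunks": [
        {"timestamp": (0.0, 2.4), "text": " Hello, thanks for calling."},
        {"timestamp": (2.4, None), "text": " How can I help?"},
    ],
}

for line in format_chunks(sample_result):
    print(line)
```

The end time of the final chunk can be missing when the audio is cut off mid-utterance, which is why the helper tolerates `None`.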

## 📈 Performance

Whisper to Oliver shows significant improvements over the base model when dealing with:
- 📞 Phone call recordings
- 🎙️ Low-quality microphone inputs
- 🌐 Conversational speech with background noise
- 💬 Short dialogue segments
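
One way to check these gains on your own recordings is to compare word error rate (WER) between this model and the base model against a human reference transcript. The sketch below is a minimal pure-Python WER based on word-level edit distance. `word_error_rate` is a hypothetical helper, and its normalization (lowercasing and whitespace splitting only) is an assumption; a real evaluation would usually also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("please hold the line", "please hold line"))  # 0.25
```

Running the same audio through both models and comparing the two scores gives a like-for-like measure on the data you care about.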

## 🎯 Intended Use

This model is designed for:
- Customer service call transcription
- Meeting transcription with variable audio quality
- Voice assistant applications
- Real-time conversation analysis
- Accessibility applications for hearing-impaired users

## ⚠️ Limitations and Ethical Considerations

Following the ethical guidelines of the base Whisper model:
- Should not be used to transcribe recordings without consent
- Not recommended for "subjective classification" tasks
- Should undergo robust evaluation before deployment in high-risk contexts
- May show performance variations across different languages and demographics

## 📜 License

This model is released under the **MIT License**, which permits commercial and non-commercial use provided the copyright and license notice are retained.

## 📖 Citation

If you use this model in your research or applications, please cite both our work and the original Whisper paper:

```bibtex
@misc{whisper-to-oliver,
  author = {{Olib AI}},
  title = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver}},
}

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

## 👥 About Olib AI

[Olib AI](https://www.olib.ai) specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.

**Contact Us:**
- 🌐 Website: [www.olib.ai](https://www.olib.ai)
- 📧 Akram H. Sharkar: [[email protected]](mailto:[email protected])
- 📧 Maya M. Sharkar: [[email protected]](mailto:[email protected])
- 💻 GitHub: [https://github.com/Olib-AI](https://github.com/Olib-AI)

---

<div align="center">
  <strong>Built with ❤️ by Olib AI</strong>
</div>