---
language:
- en
license: mit
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- transcription
- phone-calls
- conversational
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://olib.ai/logo.png" alt="Olib AI Logo" width="200"/>

# Whisper to Oliver

**Fine-tuned Whisper for Real-World Conversational Audio**

[Model](https://huggingface.co/olib-ai/whisper-to-oliver) | [License: MIT](https://opensource.org/licenses/MIT) | [Olib AI](https://www.olib.ai)

</div>
## 🎯 Model Description
**Whisper to Oliver** is a specialized fine-tuned version of OpenAI's `whisper-large-v3-turbo` model, optimized for real-world conversational audio with challenging acoustic conditions. This model is specifically designed to excel at transcribing phone calls and conversations where audio quality may be compromised.
### ✨ Key Features
- 🎙️ **Enhanced Performance on Poor-Quality Audio**: Fine-tuned on 170K conversational audio samples ranging from mildly degraded to poor quality
- 📞 **Phone Call Optimized**: Specifically trained on short conversational segments typical of phone calls
- 🚀 **Turbo Performance**: Inherits the speed advantages of whisper-large-v3-turbo
- 💼 **Enterprise Ready**: Developed by [Olib AI](https://www.olib.ai) for business applications
- 🔧 **FP32 Precision**: Full-precision weights for maximum accuracy
## 📊 Training Details
- **Base Model**: [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)
- **Training Dataset**: 170,000 conversational audio samples
- **Audio Characteristics**: Mildly degraded to poor-quality recordings
- **Focus**: Short conversational segments typical of phone interactions
- **Developer**: [Olib AI](https://www.olib.ai) - Building AI Services for Businesses
## 🚀 Usage
### Using the Transformers Library
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "olib-ai/whisper-to-oliver"

# The released weights are stored in FP32; on GPU they are cast to float16 here.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe an audio file
result = pipe("audio.mp3")
print(result["text"])
```
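If your audio arrives as an in-memory array (for example, pulled from a telephony system) rather than a file on disk, the pipeline also accepts a raw NumPy waveform plus its sampling rate. Below is a minimal sketch, assuming `soundfile` and `librosa` are installed and reusing the `pipe` object created above; the file name is a placeholder:

```python
import soundfile as sf
import librosa

# Telephony audio is often recorded at 8 kHz; Whisper expects 16 kHz input.
audio, sr = sf.read("phone_call.wav")  # hypothetical file name
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # down-mix stereo to mono
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)

# The ASR pipeline also accepts in-memory audio as a dict.
result = pipe({"raw": audio, "sampling_rate": 16000})
print(result["text"])
```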
### Advanced Usage with Parameters
```python
# For better results with phone calls or poor quality audio
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
)
print(result["text"])
```
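Because the model card metadata lists English as the supported language, it may also help to pin the language and task at generation time so Whisper skips language detection on noisy segments. A small sketch reusing the `pipe` object from above:

```python
# Force English transcription to avoid misdetection on degraded audio.
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    generate_kwargs={"language": "english", "task": "transcribe"},
)
print(result["text"])
```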
## 📈 Performance
Whisper to Oliver shows significant improvements over the base model under the following conditions; a sketch for measuring the gain on your own recordings follows the list:
- 📞 Phone call recordings
- 🎙️ Low-quality microphone inputs
- 🔊 Conversational speech with background noise
- 💬 Short dialogue segments
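One way to quantify the improvement on your own labeled recordings is to compute word error rate (WER), for example with the `jiwer` package. The file names and reference transcripts below are placeholders, and `pipe` is the pipeline built in the Usage section:

```python
import jiwer

# Hypothetical evaluation set: (audio path, reference transcript) pairs.
eval_set = [
    ("call_001.wav", "hi thanks for calling how can i help you today"),
    ("call_002.wav", "i would like to check the status of my order"),
]

references, hypotheses = [], []
for path, reference in eval_set:
    hypotheses.append(pipe(path)["text"].lower().strip())
    references.append(reference)

# Lower WER means better transcription quality; repeat with the base model to compare.
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
```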
## 🎯 Intended Use
This model is designed for:
- Customer service call transcription (see the sketch after this list)
- Meeting transcription with variable audio quality
- Voice assistant applications
- Real-time conversation analysis
- Accessibility applications for hearing-impaired users
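As a concrete illustration of the call-transcription use case above, the following sketch walks a directory of recordings and writes one transcript per file. The directory name is a placeholder, and `pipe` is the pipeline from the Usage section:

```python
from pathlib import Path

calls_dir = Path("recorded_calls")  # hypothetical input directory
out_dir = Path("transcripts")
out_dir.mkdir(exist_ok=True)

for audio_path in sorted(calls_dir.glob("*.mp3")):
    # Chunked long-form transcription with timestamps for each call.
    result = pipe(str(audio_path), chunk_length_s=30, return_timestamps=True)
    out_file = out_dir / f"{audio_path.stem}.txt"
    out_file.write_text(result["text"], encoding="utf-8")
    print(f"Transcribed {audio_path.name} -> {out_file}")
```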
## ⚠️ Limitations and Ethical Considerations
Following the ethical guidelines of the base Whisper model:
- Should not be used to transcribe recordings without consent
- Not recommended for "subjective classification" tasks
- Should undergo robust evaluation before deployment in high-risk contexts
- May show performance variations across different languages and demographics
## 📄 License
This model is released under the **MIT License**, allowing for commercial and non-commercial use with proper attribution.
## 📚 Citation
If you use this model in your research or applications, please cite both our work and the original Whisper paper:
```bibtex
@misc{whisper-to-oliver,
  author       = {{Olib AI}},
  title        = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver}},
}

@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
## 👥 About Olib AI
[Olib AI](https://www.olib.ai) specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.
**Contact Us:**
- 🌐 Website: [www.olib.ai](https://www.olib.ai)
- 📧 Akram H. Sharkar: [[email protected]](mailto:[email protected])
- 📧 Maya M. Sharkar: [[email protected]](mailto:[email protected])
- 💻 GitHub: [https://github.com/Olib-AI](https://github.com/Olib-AI)
---
<div align="center">
<strong>Built with ❤️ by Olib AI</strong>
</div>