orreries committed on
Commit 6df7c1b · verified · 1 Parent(s): ac1904f

Upload huggingface_audio_oma.py


App file for audio transcription tool

Files changed (1)
  1. huggingface_audio_oma.py +322 -0
huggingface_audio_oma.py ADDED
@@ -0,0 +1,322 @@
# -*- coding: utf-8 -*-
"""HuggingFace-Audio_OMA

Automatically generated by Colab.

Original file is located at
https://colab.research.google.com/drive/1n_vCpc3G-TY3mIVX-FlN56kVuiuzNYqe

## Setup
To get started, install the `gradio` library along with `transformers`.
"""

!pip -q install gradio #==4.36.1
!pip -q install transformers #==4.41.2

# the usual shorthand is to import gradio as gr
import gradio as gr

"""### Audio-to-text
In this part we will build a demo that handles the first step of the meeting transcription tool: converting audio into text.

As we learned, the key ingredient to building a Gradio demo is to have a Python function that executes the logic we are trying to showcase. For the audio-to-text conversion, we will build our function using the awesome `transformers` library and its `pipeline` utility to load a popular audio-to-text model called `distil-whisper/distil-large-v3`.

The result is the following `transcribe` function, which takes as input the audio that we want to convert:
"""

import os
import tempfile

import torch
import gradio as gr
from transformers import pipeline

device = 0 if torch.cuda.is_available() else "cpu"

AUDIO_MODEL_NAME = "distil-whisper/distil-large-v3"  # faster and very close in performance to the full-size "openai/whisper-large-v3"
BATCH_SIZE = 8


pipe = pipeline(
    task="automatic-speech-recognition",
    model=AUDIO_MODEL_NAME,
    chunk_length_s=30,
    device=device,
)


def transcribe(audio_input):
    """Function to convert audio to text."""
    if audio_input is None:
        raise gr.Error("No audio file submitted!")

    output = pipe(
        audio_input,
        batch_size=BATCH_SIZE,
        generate_kwargs={"task": "transcribe"},
        return_timestamps=True,
    )
    return output["text"]

"""Now that we have our Python function, we can build a demo from it by passing it into `gr.Interface`. Notice how in this case the input that the function expects is the audio that we want to convert. Gradio includes a ton of useful components, one of which is [Audio](https://www.gradio.app/docs/gradio/audio), exactly what we need for our demo 🎶 😎.
"""

part_1_demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),  # "filepath" passes a str path to a temporary file containing the audio
    outputs=gr.Textbox(show_copy_button=True),  # give users the option to copy the results
    title="Transcribe Audio to Text",  # give our demo a title :)
)

part_1_demo.launch()

"""Go ahead and try it out 👆! You can upload an `.mp3` file or hit the 🎤 button to record your own voice.

For a sample file with an actual meeting recording, you can check out the [MeetingBank_Audio dataset](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio), a dataset of city council meetings from 6 major U.S. cities. For my own testing, I tried out a couple of the [Denver meetings](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/blob/main/Denver/mp3/Denver-21.zip).

> [!TIP]
> Also check out `Interface`'s [from_pipeline](https://www.gradio.app/docs/gradio/interface#interface-from_pipeline) constructor, which will build the `Interface` directly from a `pipeline`.
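For instance, a minimal sketch of an equivalent one-liner demo, built directly from the `pipe` object we already created (not run here), could look like:

```python
# minimal sketch: let Gradio infer the audio input and text output from the pipeline itself
gr.Interface.from_pipeline(pipe).launch()
```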

### Text organization with an LLM
Using the Serverless Inference API means that instead of calling a model via a pipeline (like we did for the audio conversion part), we will call it through the `InferenceClient`, which is part of the `huggingface_hub` library ([Hub Python Library](https://huggingface.co/docs/huggingface_hub/en/package_reference/login)). In turn, to use the `InferenceClient`, we need to log into the 🤗 Hub using `notebook_login()`, which will produce a dialog box asking for your User Access Token to authenticate with the Hub.

You can manage your tokens from your [personal settings page](https://huggingface.co/settings/tokens), and please remember to use [fine-grained](https://huggingface.co/docs/hub/security-tokens) tokens as much as possible for enhanced security.
"""

from huggingface_hub import notebook_login, InferenceClient

# running this will prompt you to enter your Hugging Face credentials
notebook_login()

"""Now that we are logged into the Hub, we can write our text processing function using the Serverless Inference API via `InferenceClient`.

The code for this part will be structured into two functions:

- `build_messages`, to format the prompt messages for the LLM;
- `organize_text`, to actually pass the raw meeting text to the LLM for organization (and summarization, depending on the prompt we provide).
"""

# sample meeting transcript from huuuyeah/MeetingBank_Audio
# this is just a copy-paste from the output of part 1 using one of the Denver meetings
sample_transcript = """
Good evening. Welcome to the Denver City Council meeting of Monday, May 8, 2017. My name is Kelly Velez. I'm your Council Secretary. According to our rules of procedure, when our Council President, Albus Brooks, and Council President Pro Tem, JoLynn Clark, are both absent, the Council Secretary calls the meeting to order. Please rise and join Councilman Herndon in the Pledge of Allegiance. Madam Secretary, roll call. Roll call. Here. Mark. Espinosa. Here. Platt. Delmar. Here. Here. Here. Here. We have five members present. There is not a quorum this evening. Many of the council members are participating in an urban exploration trip in Portland, Oregon, pursuant to Section 3.3.4 of the city charter. Because there is not a quorum of seven council members present, all of tonight's business will move to next week, to Monday, May 15th. Seeing no other business before this body except to wish Councilwoman Keniche a very happy birthday this meeting is adjourned Thank you. A standard model and an energy efficient model likely will be returned to you in energy savings many times during its lifespan. Now, what size do you need? Air conditioners are not a one-size-or-type fits all. Before you buy an air conditioner, you need to consider the size of your home and the cost to operate the unit per hour. Do you want a room air conditioner, which costs less but cools a smaller area, or do you want a central air conditioner, which cools your entire house but costs more? Do your homework. Now, let's discuss evaporative coolers. In low humidity areas, evaporating water into the air provides a natural and energy efficient means of cooling. Evaporative coolers, also called swamp coolers, cool outdoor air by passing it over water saturated pads, causing the water to evaporate into it. Evaporative coolers cost about one half as much to install as central air conditioners and use about one-quarter as much energy. However, they require more frequent maintenance than refrigerated air conditioners, and they're suitable only for areas with low humidity. Watch the maintenance tips at the end of this segment to learn more. And finally, fans. When air moves around in your home, it creates a wind chill effect. A mere two-mile-an-hour breeze will make your home feel four degrees cooler and therefore you can set your thermostat a bit higher. Ceiling fans and portable oscillating fans are cheap to run and they make your house feel cooler. You can also install a whole house fan to draw the hot air out of your home. A whole house fan draws cool outdoor air inside through open windows and exhausts hot room air through the attic to the outside. The result is excellent ventilation, lower indoor temperatures, and improved evaporative cooling. But remember, there are many low-cost, no-cost ways that you can keep your home cool. You should focus on these long before you turn on your AC or even before you purchase an AC. But if you are going to purchase a new cooling system, remember to get one that's energy efficient and the correct size for your home. Wait, wait, don't go away, there's more. After this segment of the presentation is over, you're going to be given the option to view maintenance tips about air conditioners and evaporative coolers. Now all of these tips are brought to you by the people at Xcel Energy. Thanks for watching.
"""

# NOTE: this much shorter test transcript overrides the Denver sample above
sample_transcript = """
Hello, my first audio was not long enough, so the text output was not long enough for it to work right. So I want to do a longer, I want to say more things. I'm going to start saying more things in this short time period so it could have more words and I can make this thing work better. Okay, thank you.
"""

from huggingface_hub import InferenceClient

TEXT_MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"

client = InferenceClient()


def organize_text(meeting_transcript):
    messages = build_messages(meeting_transcript)
    response = client.chat_completion(
        messages, model=TEXT_MODEL_NAME, max_tokens=1000, seed=430
    )
    return response.choices[0].message.content


def build_messages(meeting_transcript) -> list:
    system_input = "You are an assistant that organizes meeting minutes."
    user_input = """Take this raw meeting transcript and return an organized version.
Here is the transcript:
{meeting_transcript}
""".format(meeting_transcript=meeting_transcript)

    messages = [
        {"role": "system", "content": system_input},
        {"role": "user", "content": user_input},
    ]
    return messages

"""And now that we have our text organization function `organize_text`, we can build a demo for it as well:"""

part_2_demo = gr.Interface(
    fn=organize_text,
    inputs=gr.Textbox(value=sample_transcript),
    outputs=gr.Textbox(show_copy_button=True),
    title="Clean Up Transcript Text",
)
part_2_demo.launch()

"""Go ahead and try it out 👆! If you hit "Submit" in the demo above, you will see that the output text is a much clearer and more organized version of the transcript, with a title and sections for the different parts of the meeting.

See if you can get a summary by playing around with the `user_input` variable that controls the LLM prompt.
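For example, a summarization-oriented variant of the prompt inside `build_messages` (a sketch; tweak the wording to taste) could look like:

```python
# hypothetical prompt variant that asks for a summary instead of a cleaned-up transcript
user_input = (
    "Summarize this raw meeting transcript in a few bullet points, "
    "highlighting any decisions and action items.\n"
    "Here is the transcript:\n"
    "{meeting_transcript}"
).format(meeting_transcript=meeting_transcript)
```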

### Putting it all together
At this point we have a function for each of the two steps we want our meeting transcription tool to perform:
1. convert the audio into a text file, and
2. organize that text file into a nicely-formatted meeting document.

All we have to do next is stitch these two functions together and build a demo for the combined steps. In other words, our complete meeting transcription tool is just a new function (which we'll creatively call `meeting_transcript_tool` 😄) that takes the output of `transcribe` and passes it into `organize_text`:
"""

def meeting_transcript_tool(audio_input):
    meeting_text = transcribe(audio_input)
    organized_text = organize_text(meeting_text)
    return organized_text


full_demo = gr.Interface(
    fn=meeting_transcript_tool,
    inputs=gr.Audio(type="filepath"),
    outputs=gr.Textbox(show_copy_button=True),
    title="The Complete Meeting Transcription Tool",
)
full_demo.launch()

"""Go ahead and try it out 👆! This is now the full demo of our transcription tool. If you give it an audio file, the output will be the already-organized (and potentially summarized) version of the meeting. Super cool 😎.

## Move your demo into 🤗 Spaces
If you made it this far, you now know the basics of how to create a demo of your machine learning model using Gradio 👏!

Up next we are going to show you how to take your brand new demo to Hugging Face Spaces. On top of the ease of use and powerful features of Gradio, moving your demo to 🤗 Spaces gives you the benefit of permanent hosting, easy redeployment each time you update your app, and the ability to share your work with anyone! Do keep in mind that your Space will go to sleep after a while unless you are using it or making changes to it.

The first step is to head over to [https://huggingface.co/new-space](https://huggingface.co/new-space), select "Gradio" from the templates, and leave the rest of the options as default for now (you can change these later):

<img src="https://github.com/dmaniloff/public-screenshots/blob/main/create-new-space.png?raw=true" width="350" alt="Creating a new Space">

This will result in a newly created Space that you can populate with your demo code. As an example, I created the 🤗 Space `jjs1111/class10hw`, which you can access [here](https://huggingface.co/spaces/jjs1111/class10hw).

If you are using the free tier of Hugging Face, you'll be constrained by a lack of GPU, and you only get a small number of free monthly credits for the serverless Inference API. I recommend starting by running just the first part of this tutorial -- the speech-to-text function, which is configured to run locally rather than through the serverless Inference API -- and swapping the `distil-whisper/distil-large-v3` model for the smaller [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny), as it runs faster on a CPU (see the sketch below).

Note: for the serverless Inference API to work from your Space, you'll also need to provide an `HF_TOKEN` environment variable. You can generate a token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then add it as a new secret under the `Variables and secrets` header on the `Settings` page of your new Space.
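A sketch of both of those tweaks, reusing the variable names from earlier in this notebook (the exact model choice and token handling are up to you):

```python
import os
from huggingface_hub import InferenceClient

# smaller ASR model so the local transcription step stays responsive on free CPU hardware
AUDIO_MODEL_NAME = "openai/whisper-tiny"

# Space secrets are exposed to the app as environment variables, so the HF_TOKEN secret
# added under Settings can be passed to the client explicitly
client = InferenceClient(token=os.environ.get("HF_TOKEN"))
```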

There are two files we need to edit:

* `app.py` -- This is where the demo code lives. It should contain all the code we used earlier, or at least the `transcribe` function if you're on the free tier of Hugging Face:
```python
# outline of app.py:

# imports and global variables

def meeting_transcript_tool(...):
    ...

def transcribe(...):
    ...

def organize_text(...):
    ...

# gradio instantiation / launch code
```

* `requirements.txt` -- This is where we tell our Space about the libraries it will need. It should look something like this:
```
# contents of requirements.txt:
torch
transformers
```

## Gradio comes with batteries included 🔋

Gradio comes with lots of cool functionality right out of the box. We won't be able to cover all of it in this notebook, but here are three features that we will check out:

- Access as an API
- Sharing via public URL
- Flagging

### Access as an API
One of the benefits of building your web demos with Gradio is that you automatically get an API 🙌! This means that you can access the functionality of your Python function using a standard HTTP client like `curl` or the Python `requests` library.

If you look closely at the demos we created above, you will see at the bottom there is a link that says "Use via API". If you click on it in the Space I created ([jjs1111/class10hw](https://huggingface.co/spaces/jjs1111/class10hw/blob/main/app.py)), you will see something similar to the following:

<img src="https://github.com/dmaniloff/public-screenshots/blob/main/gradio-as-api.png?raw=true" width="750" alt="The 'Use via API' panel of a Gradio app">

Let's go ahead and copy-paste that code below to use our Space as an API:
"""

!pip install gradio_client

from gradio_client import Client, handle_file

client = Client("jjs1111/class10hw")
result = client.predict(
    audio_input=handle_file('https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav'),
    api_name="/predict"
)
print(result)

"""Wow! What happened there? Let's break it down:

- We installed `gradio_client`, a package specifically designed to interact with APIs built with Gradio.
- We instantiated the client by providing the name of the 🤗 Space that we want to query.
- We called the client's `predict` method and passed a sample audio file to it.

The Gradio client takes care of making the HTTP POST for us, and it also provides helpers like `handle_file` for preparing the input audio file that our meeting transcript tool will process.

Again, using this client is a choice, and you can just as well run a `curl -X POST https://dmaniloff-meeting-transcript-tool.hf.space/call/predict [...]` and pass in all the parameters needed in the request.
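As a sketch of what that would look like without the Gradio client, here is the same two-step call made with the `requests` library instead. This assumes the Space is served at the usual `<owner>-<name>.hf.space` address and that the audio input uses Gradio's standard file-data payload; check the "Use via API" page of your Space for the exact shapes:

```python
import requests

SPACE_URL = "https://jjs1111-class10hw.hf.space"  # assumed direct URL of the Space
AUDIO_URL = "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav"

# step 1: POST the inputs to /call/<api_name>; Gradio replies with an event id
resp = requests.post(
    f"{SPACE_URL}/call/predict",
    json={"data": [{"path": AUDIO_URL, "meta": {"_type": "gradio.FileData"}}]},
)
event_id = resp.json()["event_id"]

# step 2: GET /call/<api_name>/<event_id>; the reply is a server-sent-events stream
# whose final "complete" event carries the output data
stream = requests.get(f"{SPACE_URL}/call/predict/{event_id}")
print(stream.text)
```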

> [!TIP]
> The output that we get from the call above is a made-up meeting that was generated by the LLM that we are using for text organization. This is because the sample input file isn't an actual meeting recording. You can tweak the LLM's prompt to handle this case.

### Share via public URL
Another cool feature built into Gradio is that even if you build your demo on your local computer (before you move it into a 🤗 Space), you can still share it with anyone in the world by passing `share=True` to `launch`, like so:

```python
demo.launch(share=True)
```

You might have noticed that in this Google Colab environment this behaviour is enabled by default, so the previous demos that we created already had a public URL that you can share 🌎. Go back ⬆ and look in the logs for `Running on public URL:` to find it 🔎!

### Flagging
[Flagging](https://www.gradio.app/guides/using-flagging) is a feature built into Gradio that allows the users of your demo to provide feedback. You might have noticed that the first demo we created had a `Flag` button at the bottom.

Under the default options, if a user clicks that button then the input and output samples are saved into a CSV log file that you can review later. If the demo involves audio (like in our case), the audio files are saved separately in a parallel directory and the paths to these files are recorded in the CSV file.

Go back and play with our first demo once more, and then click the `Flag` button. You will see that a new log file is created in the `flagged` directory:
"""

!cat flagged/log.csv

"""In this case I set the inputs to `name=diego` and `intensity=29` (using the simple `greet` demo from the start of the original notebook), which I then flagged. You can see that the log file includes the inputs to the function, the output `"Hello, diego!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"`, and also a timestamp.

While a list of inputs and outputs that your users found problematic is better than nothing, Gradio's flagging feature allows you to do much more. For example, you can provide a `flagging_options` parameter that lets you customize the kinds of feedback or error categories you want to receive, such as `["Incorrect", "Ambiguous"]`. Note that this requires that `allow_flagging` is set to `"manual"`:
"""

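# NOTE: `greet` is the introductory demo from the start of the original notebook and is not
# defined in this file. A minimal stand-in, consistent with the flagged output shown above
# ("Hello, diego!!!..." for name=diego, intensity=29), would be:
def greet(name, intensity):
    return "Hello, " + name + "!" * int(intensity)
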
demo_with_custom_flagging = gr.Interface(
    fn=greet,
    inputs=["text", "slider"],  # the inputs are a text box and a slider ("text" and "slider" are Gradio components)
    outputs=["text"],  # the output is a text box
    allow_flagging="manual",
    flagging_options=["Incorrect", "Ambiguous"],
)
demo_with_custom_flagging.launch()

"""Go ahead and try it out 👆! You can see that the flagging buttons are now `Flag as Incorrect` and `Flag as Ambiguous`, and the new log file will reflect those options:"""

!cat flagged/log.csv

"""## Wrap up & Next Steps
In this notebook we learned how to demo any machine learning model using Gradio.

First, we learned the basics of setting up an interface for a simple Python function; and second, we covered Gradio's true strength: building demos for machine learning models.

For this, we learned how easy it is to leverage models on the 🤗 Hub via the `transformers` library and its `pipeline` function, and how to use multimedia inputs like `gr.Audio`.

Third, we covered how to host your Gradio demo on 🤗 Spaces, which lets you keep your demo running in the cloud and gives you flexibility in terms of the compute requirements for your demo.

Finally, we showcased a few of the batteries-included features that come with Gradio, such as API access, public URLs, and flagging.

For next steps, check out the `Further reading` links below.

## ⏭️ Further reading
- [Your first demo with gradio](https://www.gradio.app/guides/quickstart#building-your-first-demo)
- [Gradio Components](https://www.gradio.app/docs/gradio/introduction)
- [The transformers library](https://huggingface.co/docs/transformers/en/index)
- [The pipeline function](https://huggingface.co/docs/transformers/en/main_classes/pipelines)
- [Hub Python Library](https://huggingface.co/docs/huggingface_hub/en/package_reference/login)
- [Serverless Inference API](https://huggingface.co/docs/api-inference/index)
- [🤗 Spaces](https://huggingface.co/spaces)
- [Spaces documentation](https://huggingface.co/docs/hub/spaces)
"""