metadata
title: English Accent Detector
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
short_description: Lightweight Gradio demo that detects whether a video’s speec
Random‑slice English Accent Classifier
A Gradio-based tool for detecting and classifying English accents from public video URLs (e.g., YouTube, Loom). It samples multiple 8‑second clips, filters for English speech, then predicts accents (American, British, Australian, etc.) with confidence scores.
🚀 Features
- Public URL support: Download audio from YouTube, Loom, or direct MP4 links via yt_dlp.
- Language filtering: Uses SpeechBrain’s language-ID model to skip non-English content.
- Random‑slice sampling: Analyzes N random 8‑second windows to avoid full‑audio processing.
- Accent classification: Classifies each slice using a pretrained ECAPA model and aggregates via majority vote.
- Confidence scores: Returns confidence percentages for language detection, per‑slice accent, and overall accent.
- Interactive UI: Simple Gradio interface—paste URL, choose sample count, click Analyse.
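The random-slice idea can be sketched as follows. This is a minimal illustration, not the app's actual code: the README names a `pick_random_offsets` helper, but the signature, the `seed` parameter, and the `CLIP_SECONDS` constant used here are assumptions.

```python
import random

CLIP_SECONDS = 8.0  # slice length described in this README


def pick_random_offsets(total_seconds, n_samples, clip_seconds=CLIP_SECONDS, seed=None):
    """Return n_samples sorted start times whose clips fit inside the audio.

    Hypothetical signature; the real helper may differ.
    """
    latest_start = total_seconds - clip_seconds
    if latest_start < 0:
        # Mirrors the "short audio" error mentioned under Troubleshooting.
        raise ValueError("audio is shorter than one clip")
    rng = random.Random(seed)
    # Each start is drawn independently, so clips may overlap.
    return sorted(rng.uniform(0.0, latest_start) for _ in range(n_samples))
```

Drawing starts in the range [0, duration - 8] keeps every clip fully inside the audio, which is why only the sampled windows need to be decoded and classified.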
🛠️ Requirements
- Python 3.8+ (tested on 3.10)
- yt_dlp
- torch, torchaudio
- gradio
- speechbrain
You can install all dependencies via:
pip install -r requirements.txt
requirements.txt should include:
yt_dlp
torch
torchaudio
gradio
speechbrain
📦 Installation
Clone the repo:
git clone <repository-url>
cd english-accent-classifier
Install dependencies:
pip install -r requirements.txt
▶️ Usage
CLI Mode
Run the script directly:
python3 main.py --share
- The --share flag enables a public Gradio link for easy testing.
- By default, the app runs on http://localhost:7860.
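One way an entry point could wire the --share flag to Gradio is sketched below. It is an assumption-laden outline rather than the contents of main.py: `parse_args`, `analyse`, `build_demo`, and the slider's default value of 4 are hypothetical, and `analyse` is only a placeholder for the real pipeline.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --share is the only flag the README mentions."""
    parser = argparse.ArgumentParser(description="English accent detector demo")
    parser.add_argument("--share", action="store_true",
                        help="create a public Gradio link")
    return parser.parse_args(argv)


def analyse(url, n_samples):
    """Placeholder for the real pipeline; returns a dict shown as JSON."""
    return {"url": url, "samples": int(n_samples)}


def build_demo():
    """Assemble a UI with a URL box, a 1-10 sample slider, and JSON output."""
    import gradio as gr  # imported lazily so parse_args works without Gradio

    return gr.Interface(
        fn=analyse,
        inputs=[
            gr.Textbox(label="Video URL"),
            gr.Slider(minimum=1, maximum=10, step=1, value=4, label="Samples"),
        ],
        outputs=gr.JSON(),
    )


def main(argv=None):
    """Launch the demo; share=True asks Gradio for a temporary public URL."""
    args = parse_args(argv)
    build_demo().launch(share=args.share)
```

Calling `main(["--share"])` would start the server and request a public link; calling `main([])` serves on the local address only.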
Gradio UI
Open the link in your browser (e.g., http://localhost:7860).
Paste a public video URL in the Video URL field.
Choose the number of random 8‑second samples (1–10).
Click Analyse.
Inspect the JSON output for:
- language and language_confidence
- accent_overall and overall_confidence_avg
- per_clip array with individual slice details
- summary string
Example Output
{
"language": "English",
"language_confidence": 98.7,
"accent_overall": "British",
"overall_confidence_avg": 87.5,
"per_clip": [
{"clip": 0, "start": "00:01:23", "end": "00:01:31", "accent": "British", "confidence": 89.3},
...
],
"summary": "English detected. Overall accent = British (≈87.5% on 3/4 slices)."
}
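The overall fields in this output can be derived from the per_clip array by a majority vote. The sketch below shows one plausible way to do that; the function name `aggregate_clips` is hypothetical, and averaging the confidence over only the slices that voted for the winning accent is an assumption inferred from the summary string, not a confirmed detail of the app.

```python
from collections import Counter


def aggregate_clips(per_clip):
    """Majority-vote the per-clip accents and average the winners' confidence."""
    votes = Counter(clip["accent"] for clip in per_clip)
    accent, count = votes.most_common(1)[0]
    # Assumption: only slices that agree with the winner enter the average.
    winning = [c["confidence"] for c in per_clip if c["accent"] == accent]
    avg = round(sum(winning) / len(winning), 1)
    return {
        "accent_overall": accent,
        "overall_confidence_avg": avg,
        "summary": f"Overall accent = {accent} (≈{avg}% on {count}/{len(per_clip)} slices).",
    }


clips = [
    {"clip": 0, "accent": "British", "confidence": 90.0},
    {"clip": 1, "accent": "British", "confidence": 85.0},
    {"clip": 2, "accent": "American", "confidence": 70.0},
    {"clip": 3, "accent": "British", "confidence": 87.5},
]
result = aggregate_clips(clips)
```

With these four illustrative clips, three vote British, so the result reports British with an average confidence of 87.5 over 3/4 slices.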
📝 Code Structure
- main.py: Core logic, Gradio UI, and entry point.
- Helpers:
  - download_audio: Fetches best audio track via yt_dlp.
  - extract_wav: Cuts 8‑second WAV clips with torchaudio.
  - classify_language / classify_accent: Run SpeechBrain models.
  - pick_random_offsets: Selects random start times.
- Models:
  - speechbrain/lang-id-voxlingua107-ecapa (language detection)
  - Jzuluaga/accent-id-commonaccent_ecapa (accent classification)
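As an illustration of the download step, the sketch below shows how a helper like download_audio might request the best audio track through yt_dlp's Python API. The `build_ydl_opts` helper, the output template, and the function signature are assumptions; only the use of yt_dlp comes from this README.

```python
def build_ydl_opts(out_dir):
    """Options asking yt_dlp for the best available audio stream."""
    return {
        "format": "bestaudio/best",              # prefer audio-only, else best
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # hypothetical naming scheme
        "quiet": True,
        "noplaylist": True,
    }


def download_audio(url, out_dir):
    """Download the audio for url and return the saved file path."""
    import yt_dlp  # imported lazily; requires the yt_dlp package

    with yt_dlp.YoutubeDL(build_ydl_opts(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

The returned path would then be handed to the slicing and classification helpers.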
⚙️ Configuration
- DEVICE: Change to "cuda" in main.py if you have a GPU.
- Sample length: Default is 8 seconds; adjust dur in extract_wav if desired.
- Model IDs: Swap out for custom models by updating ACCENT_MODEL_ID and LANG_MODEL_ID.
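Instead of editing DEVICE by hand, the value could be chosen at startup. The snippet below is an optional sketch, not something the app is known to do; `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper degrades gracefully
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


DEVICE = pick_device()
```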
🐛 Troubleshooting
- Private or invalid URLs: Ensure the video is publicly accessible.
- Short audio: Audio shorter than 8 seconds will trigger an error.
- Missing dependencies: Double-check that pip install -r requirements.txt completed successfully.
- Slow startup: Model downloads occur on first run, so expect a delay of roughly 10–20 s.
- YouTube bot detection on hosted Spaces: On Hugging Face Spaces, direct YouTube downloads may be blocked, but Loom links work.