---
title: English Accent Detector
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
short_description: Lightweight Gradio demo that detects whether a video’s speec
---

Random‑slice English Accent Classifier

A Gradio-based tool for detecting and classifying English accents from public video URLs (e.g., YouTube, Loom). It samples multiple 8‑second clips, filters for English speech, then predicts accents (American, British, Australian, etc.) with confidence scores.

🚀 Features

  • Public URL support: Download audio from YouTube, Loom, or direct MP4 links via yt_dlp.
  • Language filtering: Uses SpeechBrain’s language-ID model to skip non-English content.
  • Random‑slice sampling: Analyzes N random 8‑second windows to avoid full‑audio processing.
  • Accent classification: Classifies each slice using a pretrained ECAPA model and aggregates results via majority vote (see the sketch after this list).
  • Confidence scores: Returns confidence percentages for language detection, per‑slice accent, and overall accent.
  • Interactive UI: Simple Gradio interface—paste URL, choose sample count, click Analyse.
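
The random-slice sampling and majority-vote aggregation above can be pictured with a minimal sketch like the one below. pick_random_offsets is named in the Code Structure section; aggregate_accent and the per-clip result shape are illustrative assumptions, not the exact code in main.py.

import random
from collections import Counter

CLIP_SECONDS = 8  # fixed window length used throughout the app

def pick_random_offsets(total_seconds, n_samples, dur=CLIP_SECONDS):
    # Choose n random, in-bounds start times (in seconds) for fixed-length slices.
    if total_seconds < dur:
        raise ValueError("audio is shorter than one slice")
    latest_start = total_seconds - dur
    return sorted(random.uniform(0, latest_start) for _ in range(n_samples))

def aggregate_accent(per_clip):
    # Majority vote over per-slice labels; confidence is the mean score of the winning label.
    labels = [c["accent"] for c in per_clip]
    winner, votes = Counter(labels).most_common(1)[0]
    scores = [c["confidence"] for c in per_clip if c["accent"] == winner]
    return winner, sum(scores) / len(scores), votes, len(per_clip)

The votes/total pair from such an aggregation is what would feed the "(≈87.5% on 3/4 slices)" part of the summary string shown in the example output further down.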

🛠️ Requirements

  • Python 3.8+ (tested on 3.10)
  • yt_dlp
  • torch, torchaudio
  • gradio
  • speechbrain

You can install all dependencies via:

pip install -r requirements.txt

requirements.txt should include:

yt_dlp
torch
torchaudio
gradio
speechbrain

📦 Installation

  1. Clone the repo:

    git clone 
    cd english-accent-classifier
    
  2. Install dependencies:

     pip install -r requirements.txt
    

▶️ Usage

CLI Mode

Run the script directly:

python3 main.py --share

  • The --share flag enables a public Gradio link for easy testing (see the sketch below).
  • By default, the app runs on http://localhost:7860.
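
A sketch of how --share can be wired into the Gradio entry point. The use of argparse and the build_demo helper are assumptions about how main.py is structured, not a quote from it.

import argparse
import gradio as gr

def build_demo():
    # Placeholder UI; the real app adds the URL box, sample-count control and Analyse button.
    with gr.Blocks() as demo:
        gr.Markdown("English Accent Detector")
    return demo

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--share", action="store_true",
                        help="create a temporary public Gradio link")
    args = parser.parse_args()
    # share=False keeps the default local server at http://localhost:7860
    build_demo().launch(share=args.share)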

Gradio UI

  1. Open the link in your browser (e.g., http://localhost:7860).

  2. Paste a public video URL in the Video URL field.

  3. Choose the number of random 8‑second samples (1–10).

  4. Click Analyse.

  5. Inspect the JSON output for:

    • language & language_confidence
    • accent_overall & overall_confidence_avg
    • per_clip array with individual slice details
    • summary string

Example Output

{
  "language": "English",
  "language_confidence": 98.7,
  "accent_overall": "British",
  "overall_confidence_avg": 87.5,
  "per_clip": [
    {"clip": 0, "start": "00:01:23", "end": "00:01:31", "accent": "British", "confidence": 89.3},
    ...
  ],
  "summary": "English detected. Overall accent = British (≈87.5% on 3/4 slices)."
}

📝 Code Structure

  • main.py: Core logic, Gradio UI, and entry point.

  • Helpers (the first two are sketched below):

    • download_audio: Fetches best audio track via yt_dlp.
    • extract_wav: Cuts 8‑second WAV clips with torchaudio.
    • classify_language / classify_accent: Run SpeechBrain models.
    • pick_random_offsets: Selects random start times.
  • Models:

    • speechbrain/lang-id-voxlingua107-ecapa (language detection)
    • Jzuluaga/accent-id-commonaccent_ecapa (accent classification)
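
As a rough illustration of the first two helpers, the snippet below shows one way download_audio and extract_wav could be built on yt_dlp and torchaudio. The yt_dlp options, default file names, and exact signatures are assumptions; consult main.py for the real versions.

import torchaudio
import yt_dlp

def download_audio(url, out_path="downloaded_audio"):
    # Fetch the best available audio track and convert it to WAV via ffmpeg.
    opts = {
        "format": "bestaudio/best",
        "outtmpl": out_path + ".%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])
    return out_path + ".wav"

def extract_wav(src_path, start_sec, dur=8.0, dst_path="clip.wav"):
    # Cut a dur-second slice starting at start_sec and write it as a standalone WAV.
    waveform, sample_rate = torchaudio.load(src_path)
    start = int(start_sec * sample_rate)
    end = start + int(dur * sample_rate)
    torchaudio.save(dst_path, waveform[:, start:end], sample_rate)
    return dst_path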

⚙️ Configuration

  • DEVICE: Change to "cuda" in main.py if you have a GPU.
  • Sample length: Default is 8 seconds—adjust dur in extract_wav if desired.
  • Model IDs: Swap out for custom models by updating ACCENT_MODEL_ID and LANG_MODEL_ID (see the sketch below).
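
Putting those knobs together, the top of main.py plausibly looks something like this sketch. Only DEVICE, LANG_MODEL_ID, and ACCENT_MODEL_ID are named in this README; the savedir paths and variable names are illustrative, and the loading call follows the models' published SpeechBrain usage rather than the app's exact code.

from speechbrain.pretrained import EncoderClassifier

DEVICE = "cpu"  # change to "cuda" if a GPU is available
LANG_MODEL_ID = "speechbrain/lang-id-voxlingua107-ecapa"
ACCENT_MODEL_ID = "Jzuluaga/accent-id-commonaccent_ecapa"

lang_clf = EncoderClassifier.from_hparams(
    source=LANG_MODEL_ID,
    savedir="pretrained_models/lang-id",
    run_opts={"device": DEVICE},
)
accent_clf = EncoderClassifier.from_hparams(
    source=ACCENT_MODEL_ID,
    savedir="pretrained_models/accent-id",
    run_opts={"device": DEVICE},
)

# classify_file returns (log-probabilities, best score, best index, text labels)
out_prob, score, index, text_lab = accent_clf.classify_file("clip.wav")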

🐛 Troubleshooting

  • Private or invalid URLs: Ensure the video is publicly accessible.
  • Short audio: Audio shorter than 8 seconds will trigger an error (a pre-check is sketched at the end of this section).
  • Missing dependencies: Double-check pip install -r requirements.txt.
  • Slow startup: Model downloads occur on first run—expect ~10–20 s delay.
  • YouTube bot detection on hosted Spaces: On Hugging Face Spaces, direct YouTube downloads may be blocked by YouTube's bot detection, but Loom links work.
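
If you want to fail fast on the short-audio case above, a pre-check along these lines works with torchaudio; it is a suggestion, not part of the shipped app.

import torchaudio

def assert_long_enough(wav_path, min_seconds=8.0):
    # Raise before slicing if the audio cannot supply even one full window.
    info = torchaudio.info(wav_path)
    duration = info.num_frames / info.sample_rate
    if duration < min_seconds:
        raise ValueError(f"audio is only {duration:.1f}s; need at least {min_seconds}s")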