---
title: English Accent Detector
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
short_description: Lightweight Gradio demo that detects whether a video’s speec
---

Random‑slice English Accent Classifier

A Gradio-based tool for detecting and classifying English accents from public video URLs (e.g., YouTube, Loom). It samples multiple 8‑second clips, filters for English speech, then predicts accents (American, British, Australian, etc.) with confidence scores.

🚀 Features

  • Public URL support: Download audio from YouTube, Loom, or direct MP4 links via yt_dlp.
  • Language filtering: Uses SpeechBrain’s language-ID model to skip non-English content.
  • Random‑slice sampling: Analyzes N random 8‑second windows to avoid full‑audio processing.
  • Accent classification: Classifies each slice using a pretrained ECAPA model and aggregates results via majority vote (see the sketch after this list).
  • Confidence scores: Returns confidence percentages for language detection, per‑slice accent, and overall accent.
  • Interactive UI: Simple Gradio interface—paste URL, choose sample count, click Analyse.
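
The random-slice sampling and majority-vote aggregation above can be pictured with a minimal sketch like the one below. pick_random_offsets is named in the Code Structure section; aggregate_accent and the per-clip result shape are illustrative assumptions, not the exact code in main.py.

import random
from collections import Counter

CLIP_SECONDS = 8  # fixed window length used throughout the app

def pick_random_offsets(total_seconds, n_samples, dur=CLIP_SECONDS):
    # Choose n random, in-bounds start times (in seconds) for fixed-length slices.
    if total_seconds < dur:
        raise ValueError("audio is shorter than one slice")
    latest_start = total_seconds - dur
    return sorted(random.uniform(0, latest_start) for _ in range(n_samples))

def aggregate_accent(per_clip):
    # Majority vote over per-slice labels; confidence is the mean score of the winning label.
    labels = [c["accent"] for c in per_clip]
    winner, votes = Counter(labels).most_common(1)[0]
    scores = [c["confidence"] for c in per_clip if c["accent"] == winner]
    return winner, sum(scores) / len(scores), votes, len(per_clip)

The votes/total pair from such an aggregation is what would feed the "(≈87.5% on 3/4 slices)" part of the summary string shown in the example output further down.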

🛠️ Requirements

  • Python 3.8+ (tested on 3.10)
  • yt_dlp
  • torch, torchaudio
  • gradio
  • speechbrain

You can install all dependencies via:

pip install -r requirements.txt

requirements.txt should include:

yt_dlp
torch
torchaudio
gradio
speechbrain

📦 Installation

  1. Clone the repo:

    git clone 
    cd english-accent-classifier
    
  2. Install dependencies:

     pip install -r requirements.txt
    

▶️ Usage

CLI Mode

Run the script directly:

python3 main.py --share

  • The --share flag enables a public Gradio link for easy testing (see the sketch below).
  • By default, the app runs on http://localhost:7860.
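
A sketch of how --share can be wired into the Gradio entry point. The use of argparse and the build_demo helper are assumptions about how main.py is structured, not a quote from it.

import argparse
import gradio as gr

def build_demo():
    # Placeholder UI; the real app adds the URL box, sample-count control and Analyse button.
    with gr.Blocks() as demo:
        gr.Markdown("English Accent Detector")
    return demo

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--share", action="store_true",
                        help="create a temporary public Gradio link")
    args = parser.parse_args()
    # share=False keeps the default local server at http://localhost:7860
    build_demo().launch(share=args.share)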

Gradio UI

  1. Open the link in your browser (e.g., http://localhost:7860).

  2. Paste a public video URL in the Video URL field.

  3. Choose the number of random 8‑second samples (1–10).

  4. Click Analyse.

  5. Inspect the JSON output for:

    • language & language_confidence
    • accent_overall & overall_confidence_avg
    • per_clip array with individual slice details
    • summary string

Example Output

{
  "language": "English",
  "language_confidence": 98.7,
  "accent_overall": "British",
  "overall_confidence_avg": 87.5,
  "per_clip": [
    {"clip": 0, "start": "00:01:23", "end": "00:01:31", "accent": "British", "confidence": 89.3},
    ...
  ],
  "summary": "English detected. Overall accent = British (≈87.5% on 3/4 slices)."
}

📝 Code Structure

  • main.py: Core logic, Gradio UI, and entry point.

  • Helpers (the first two are sketched below):

    • download_audio: Fetches best audio track via yt_dlp.
    • extract_wav: Cuts 8‑second WAV clips with torchaudio.
    • classify_language / classify_accent: Run SpeechBrain models.
    • pick_random_offsets: Selects random start times.
  • Models:

    • speechbrain/lang-id-voxlingua107-ecapa (language detection)
    • Jzuluaga/accent-id-commonaccent_ecapa (accent classification)
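
As a rough illustration of the first two helpers, the snippet below shows one way download_audio and extract_wav could be built on yt_dlp and torchaudio. The yt_dlp options, default file names, and exact signatures are assumptions; consult main.py for the real versions.

import torchaudio
import yt_dlp

def download_audio(url, out_path="downloaded_audio"):
    # Fetch the best available audio track and convert it to WAV via ffmpeg.
    opts = {
        "format": "bestaudio/best",
        "outtmpl": out_path + ".%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])
    return out_path + ".wav"

def extract_wav(src_path, start_sec, dur=8.0, dst_path="clip.wav"):
    # Cut a dur-second slice starting at start_sec and write it as a standalone WAV.
    waveform, sample_rate = torchaudio.load(src_path)
    start = int(start_sec * sample_rate)
    end = start + int(dur * sample_rate)
    torchaudio.save(dst_path, waveform[:, start:end], sample_rate)
    return dst_path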

⚙️ Configuration

  • DEVICE: Change to "cuda" in main.py if you have a GPU.
  • Sample length: Default is 8 seconds—adjust dur in extract_wav if desired.
  • Model IDs: Swap out for custom models by updating ACCENT_MODEL_ID and LANG_MODEL_ID (see the sketch below).
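
Putting those knobs together, the top of main.py plausibly looks something like this sketch. Only DEVICE, LANG_MODEL_ID, and ACCENT_MODEL_ID are named in this README; the savedir paths and variable names are illustrative, and the loading call follows the models' published SpeechBrain usage rather than the app's exact code.

from speechbrain.pretrained import EncoderClassifier

DEVICE = "cpu"  # change to "cuda" if a GPU is available
LANG_MODEL_ID = "speechbrain/lang-id-voxlingua107-ecapa"
ACCENT_MODEL_ID = "Jzuluaga/accent-id-commonaccent_ecapa"

lang_clf = EncoderClassifier.from_hparams(
    source=LANG_MODEL_ID,
    savedir="pretrained_models/lang-id",
    run_opts={"device": DEVICE},
)
accent_clf = EncoderClassifier.from_hparams(
    source=ACCENT_MODEL_ID,
    savedir="pretrained_models/accent-id",
    run_opts={"device": DEVICE},
)

# classify_file returns (log-probabilities, best score, best index, text labels)
out_prob, score, index, text_lab = accent_clf.classify_file("clip.wav")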

🐛 Troubleshooting

  • Private or invalid URLs: Ensure the video is publicly accessible.
  • Short audio: Audio shorter than 8 seconds will trigger an error (a pre-check is sketched at the end of this section).
  • Missing dependencies: Double-check pip install -r requirements.txt.
  • Slow startup: Model downloads occur on first run—expect ~10–20 s delay.
  • YouTube bot detection on hosted Spaces: On Hugging Face Spaces, direct YouTube downloads may be blocked by YouTube's bot detection, but Loom links work.
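
If you want to fail fast on the short-audio case above, a pre-check along these lines works with torchaudio; it is a suggestion, not part of the shipped app.

import torchaudio

def assert_long_enough(wav_path, min_seconds=8.0):
    # Raise before slicing if the audio cannot supply even one full window.
    info = torchaudio.info(wav_path)
    duration = info.num_frames / info.sample_rate
    if duration < min_seconds:
        raise ValueError(f"audio is only {duration:.1f}s; need at least {min_seconds}s")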