Project Structure
Complete File Tree
madrid-analyzer-hf/
│
├── app.py                     Main Gradio application (run this!)
├── requirements.txt           Python dependencies
├── README.md                  Space description (for HF)
├── QUICKSTART.md              Start here! 30-min deploy guide
├── DEPLOYMENT_GUIDE.md        Detailed deployment instructions
├── .gitignore                 Git ignore rules
│
├── config/
│   └── database.py            DuckDB connection & schema
│
├── storage/
│   ├── __init__.py
│   └── repository.py          Data access layer (all queries)
│
├── fetchers/
│   ├── __init__.py
│   └── rss_fetcher.py         RSS feed fetcher for Madrid
│
├── analyzers/
│   ├── __init__.py
│   ├── analyzer_wrapper.py    Aclarador integration point
│   └── aclarador/             ADD YOUR CODE HERE!
│       ├── __init__.py
│       ├── README.md          Instructions for adding Aclarador
│       └── ... (your Aclarador files go here)
│
├── schedulers/
│   ├── __init__.py
│   └── background_tasks.py    Fetch & analyze scheduler
│
└── utils/
    ├── __init__.py
    └── logger.py              Logging configuration
File Descriptions
Core Files
app.py (Main Application)
- Gradio interface with 4 tabs
- Dashboard, Browser, Analytics, Settings
- Background scheduler
- Entry point for the app
requirements.txt (Dependencies)
- All Python packages needed
- Add your Aclarador's deps here
- Installed automatically by HF Spaces
README.md (Space Description)
- Shows on your HF Space page
- Describes what the app does
- For public visitors
Configuration
config/database.py (Database)
- DuckDB connection setup
- Schema creation (5 tables)
- Persistent storage in /data/
Data Layer
storage/repository.py (Data Access)
- All database queries
- CRUD operations
- Statistics methods
- Search and filtering
Content Fetching
fetchers/rss_fetcher.py (RSS Fetcher)
- Fetches from Madrid RSS feed
- Parses entries
- Cleans HTML to text
- Deduplication
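The three steps above (parse entries, clean HTML to text, deduplicate) can be sketched with the standard library alone. This is a minimal illustration, not the contents of rss_fetcher.py: the function names, the choice of a SHA-256 hash over link and title as the duplicate key, and the sample feed are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def parse_feed(xml_text: str) -> list:
    """Parse RSS 2.0 <item> elements, skipping entries already seen."""
    root = ET.fromstring(xml_text)
    seen = set()
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        text = html_to_text(item.findtext("description") or "")
        # Hypothetical duplicate key: hash of link + title.
        key = hashlib.sha256(f"{link}|{title}".encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        items.append({"id": key, "title": title, "link": link, "text": text})
    return items


# Made-up feed with one entry repeated, to exercise the deduplication path.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Aviso A</title><link>https://example.org/a</link>
<description>&lt;p&gt;Hola &lt;b&gt;Madrid&lt;/b&gt;&lt;/p&gt;</description></item>
<item><title>Aviso A</title><link>https://example.org/a</link>
<description>&lt;p&gt;Hola &lt;b&gt;Madrid&lt;/b&gt;&lt;/p&gt;</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(entry["title"], "|", entry["text"])
```

Keying duplicates on a hash keeps the check independent of how the feed formats its own GUIDs. A real fetcher would also compare against IDs already stored in the database.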
Analysis
analyzers/analyzer_wrapper.py (Integration)
- THIS IS WHERE YOU INTEGRATE ACLARADOR
- Currently contains a placeholder implementation
- Update this to call your real Aclarador
analyzers/aclarador/ (Your Code)
- PUT YOUR ACLARADOR CODE HERE
- See README.md in that folder
- Then update analyzer_wrapper.py
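One possible shape for the integration point is sketched below. Only the class name `AclaradorAnalyzer` and its `analyze(text)` method come from this README (see Quick Reference). The `backend` parameter, the placeholder heuristic, and the result keys `clarity_score`, `issues`, and `placeholder` are assumptions made for illustration, so adapt the mapping to whatever your Aclarador actually returns.

```python
from typing import Any, Callable, Dict, Optional


def _placeholder_score(text: str) -> Dict[str, Any]:
    """Stand-in heuristic used until the real Aclarador is wired in."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    # Purely illustrative: shorter average word length gives a higher score.
    score = max(0.0, min(100.0, 100.0 - 8.0 * avg_len))
    return {"score": round(score, 1), "issues": []}


class AclaradorAnalyzer:
    """Wraps an analysis callable and maps its output to one fixed shape."""

    def __init__(self, backend: Optional[Callable[[str], Dict[str, Any]]] = None):
        # Hypothetical hook: pass your Aclarador's entry point as `backend`.
        self._backend = backend or _placeholder_score
        self.using_placeholder = backend is None

    def analyze(self, text: str) -> Dict[str, Any]:
        raw = self._backend(text)
        # Mapping step: translate backend keys into the keys the app stores.
        return {
            "clarity_score": float(raw.get("score", 0.0)),
            "issues": list(raw.get("issues", [])),
            "placeholder": self.using_placeholder,
        }


if __name__ == "__main__":
    print(AclaradorAnalyzer().analyze("test"))
```

Keeping the mapping inside the wrapper means the rest of the app depends on one result shape, whichever analyzer sits behind it.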
Background Tasks
schedulers/background_tasks.py (Main Pipeline)
- Fetches content every 6 hours
- Analyzes each item
- Stores results
- Error handling
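The fetch, analyze, store and error-handling steps listed above might be arranged roughly as follows. This is a sketch under assumptions, not the code in background_tasks.py: the function name, the four injected callables, and the counter keys are hypothetical, and the 6-hour scheduling itself is left out.

```python
import logging
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger("pipeline_sketch")


def run_pipeline_once(
    fetch: Callable[[], Iterable[Dict[str, Any]]],
    analyze: Callable[[str], Dict[str, Any]],
    store_item: Callable[[Dict[str, Any]], None],
    store_analysis: Callable[[str, Dict[str, Any]], None],
) -> Dict[str, int]:
    """Run one fetch -> store -> analyze -> store cycle and return counters."""
    counts = {"fetched": 0, "analyzed": 0, "failed": 0}
    try:
        items = list(fetch())
    except Exception:
        # A failed fetch aborts this run only; the next scheduled run retries.
        logger.exception("Fetch failed; skipping this run")
        return counts
    counts["fetched"] = len(items)
    for item in items:
        try:
            store_item(item)
            store_analysis(item["id"], analyze(item["text"]))
            counts["analyzed"] += 1
        except Exception:
            # One bad item must not stop the rest of the batch.
            logger.exception("Processing failed for item %s", item.get("id"))
            counts["failed"] += 1
    return counts
```

Passing the steps in as callables lets the pipeline be exercised with stubs, without a database or network access.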
Utilities
utils/logger.py (Logging)
- Structured logging
- Console output
- Debug information
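A logging helper along these lines would cover the points above. It is a minimal sketch using only the standard `logging` module. The function name `get_logger` and the field layout of the format string are assumptions, not necessarily what utils/logger.py contains.

```python
import logging
import sys


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a console logger with a fixed, field-separated format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep messages from being printed twice
    return logger


if __name__ == "__main__":
    get_logger("madrid.fetch", logging.DEBUG).debug("fetch started")
```

Writing to stdout keeps the messages visible in the HF Space logs.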
Data Flow
1. Background Scheduler (every 6 hours)
   ↓
2. fetchers/rss_fetcher.py
   - Fetches from Madrid RSS
   ↓
3. storage/repository.py
   - Stores in DuckDB
   ↓
4. analyzers/analyzer_wrapper.py
   - Calls YOUR Aclarador
   ↓
5. storage/repository.py
   - Stores analysis results
   ↓
6. app.py (Gradio UI)
   - Displays in dashboard
What You Need to Modify
Required
- analyzers/aclarador/ - Add your Aclarador code
- analyzers/analyzer_wrapper.py - Update imports and mapping
- requirements.txt - Add Aclarador's dependencies (if any)
Optional
- app.py line 35 - Change fetch frequency (default: 6 hours)
- app.py UI sections - Customize dashboard tabs
- fetchers/ - Add more content sources
Never Modify (Unless You Know What You're Doing)
- config/database.py - Database schema
- storage/repository.py - Data access methods
- schedulers/background_tasks.py - Background logic
Database Schema (DuckDB)
Tables Created Automatically
- content_sources - RSS feeds and API sources
- content_items - Fetched content with metadata
- clarity_analyses - Your Aclarador's results
- analysis_history - Trends over time
- fetch_logs - Operation audit trail
Located at: /data/madrid.duckdb (persistent!)
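"Created automatically" usually means running idempotent `CREATE TABLE IF NOT EXISTS` statements on startup. The sketch below shows that pattern. The five table names come from the list above, but every column is a hypothetical example and not the real schema in config/database.py. The standard-library `sqlite3` module stands in for DuckDB so the example runs without extra packages. With DuckDB installed, a connection from `duckdb.connect("/data/madrid.duckdb")` also exposes `execute()` and could be passed instead.

```python
import sqlite3  # stdlib stand-in so the sketch runs without DuckDB installed

# Table names come from the list above; every column is a hypothetical example.
SCHEMA = {
    "content_sources": "id INTEGER PRIMARY KEY, name VARCHAR, url VARCHAR",
    "content_items": (
        "id VARCHAR PRIMARY KEY, source_id INTEGER, title VARCHAR, "
        "body VARCHAR, fetched_at TIMESTAMP"
    ),
    "clarity_analyses": "item_id VARCHAR, clarity_score DOUBLE, analyzed_at TIMESTAMP",
    "analysis_history": "recorded_on DATE, avg_clarity DOUBLE, item_count INTEGER",
    "fetch_logs": "id INTEGER PRIMARY KEY, started_at TIMESTAMP, status VARCHAR, message VARCHAR",
}


def create_schema(conn) -> None:
    """Create any missing tables; safe to call on every startup."""
    for table, columns in SCHEMA.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(sorted(SCHEMA))
```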
Deployment Steps
- Copy this entire folder to your HF Space clone
- Add Aclarador to analyzers/aclarador/
- Update analyzer_wrapper.py with your imports
- Add dependencies to requirements.txt
- Commit and push to Hugging Face
- Done! Space builds automatically
File Sizes
app.py ~19 KB (main application)
storage/repository.py ~10 KB (database queries)
fetchers/rss_fetcher.py ~5 KB (RSS parsing)
schedulers/background_tasks.py ~6 KB (fetch pipeline)
config/database.py ~5 KB (DB setup)
analyzers/analyzer_wrapper.py ~7 KB (integration template)
Total: ~50 KB (without your Aclarador)
Gradio UI Structure (app.py)
Tab 1: Dashboard (lines 50-200)
- Statistics display
- Clarity distribution chart
- Content timeline
- Category breakdown
Tab 2: Browse Content (lines 202-280)
- Search filters
- Date range selector
- Category dropdown
- Results table
Tab 3: Analytics (lines 282-340)
- Low clarity items
- Export functionality
- Data download
Tab 4: Settings (lines 342-420)
- Manual fetch trigger
- Database statistics
- Recent logs viewer
Finding Things
"Where do I...?"
Add my Aclarador?
→ analyzers/aclarador/ (folder)
→ analyzers/analyzer_wrapper.py (integration)
Change fetch frequency?
→ app.py line 35 (change hours=6)
Add dependencies?
→ requirements.txt
See database schema?
→ config/database.py (lines 20-100)
Modify UI?
→ app.py (lines 50-450)
Add new data source?
→ Create new fetcher in fetchers/
→ Add to background_tasks.py
Debug errors?
→ Check HF Space logs (App files → Logs)
→ See utils/logger.py for logging
Verification Checklist
After copying files:
- app.py exists (main file)
- requirements.txt exists
- config/database.py exists
- analyzers/analyzer_wrapper.py exists
- analyzers/aclarador/__init__.py exists
- All __init__.py files present
- .gitignore exists
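The "after copying files" checks above can be scripted. The sketch below takes its file list from this checklist and its package directories from the file tree. The function name `missing_files` and the idea of running it from the repository root are assumptions made for the example.

```python
from pathlib import Path

# Files named explicitly in the checklist above.
REQUIRED_FILES = [
    "app.py",
    "requirements.txt",
    "config/database.py",
    "analyzers/analyzer_wrapper.py",
    "analyzers/aclarador/__init__.py",
    ".gitignore",
]

# Directories shown with an __init__.py in the file tree.
PACKAGE_DIRS = [
    "storage",
    "fetchers",
    "analyzers",
    "analyzers/aclarador",
    "schedulers",
    "utils",
]


def missing_files(root: str = ".") -> list:
    """Return the expected paths that do not exist under `root`, sorted."""
    base = Path(root)
    expected = set(REQUIRED_FILES) | {f"{d}/__init__.py" for d in PACKAGE_DIRS}
    return sorted(p for p in expected if not (base / p).is_file())


if __name__ == "__main__":
    missing = missing_files(".")
    print("All files present" if not missing else f"Missing: {missing}")
```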
Before deploying:
- Aclarador code in analyzers/aclarador/
- analyzer_wrapper.py updated
- Dependencies in requirements.txt
- Tested locally (optional)
After deploying:
- Space builds successfully
- App starts (Running status)
- Dashboard loads
- Manual fetch works
- Data persists
Quick Reference
Run locally:
python app.py
Test Aclarador:
python -c "from analyzers.analyzer_wrapper import AclaradorAnalyzer; a=AclaradorAnalyzer(); print(a.analyze('test'))"
Check structure:
find . -name "*.py" | sort
See QUICKSTART.md to deploy!