

📁 Project Structure

Complete File Tree

madrid-analyzer-hf/
│
├── 📄 app.py                          ⭐ Main Gradio application (run this!)
├── 📄 requirements.txt                 Python dependencies
├── 📄 README.md                        Space description (for HF)
├── 📄 QUICKSTART.md                   🚀 Start here! 30-min deploy guide
├── 📄 DEPLOYMENT_GUIDE.md             📖 Detailed deployment instructions
├── 📄 .gitignore                       Git ignore rules
│
├── 📁 config/
│   └── 📄 database.py                 🦆 DuckDB connection & schema
│
├── 📁 storage/
│   ├── 📄 __init__.py
│   └── 📄 repository.py               💾 Data access layer (all queries)
│
├── 📁 fetchers/
│   ├── 📄 __init__.py
│   └── 📄 rss_fetcher.py              📥 RSS feed fetcher for Madrid
│
├── 📁 analyzers/
│   ├── 📄 __init__.py
│   ├── 📄 analyzer_wrapper.py         🔧 Aclarador integration point
│   └── 📁 aclarador/                  👈 ADD YOUR CODE HERE!
│       ├── 📄 __init__.py
│       ├── 📄 README.md               📝 Instructions for adding Aclarador
│       └── ... (your Aclarador files go here)
│
├── 📁 schedulers/
│   ├── 📄 __init__.py
│   └── 📄 background_tasks.py         ⏰ Fetch & analyze scheduler
│
└── 📁 utils/
    ├── 📄 __init__.py
    └── 📄 logger.py                   📊 Logging configuration

📄 File Descriptions

Core Files

app.py (Main Application)

  • Gradio interface with 4 tabs
  • Dashboard, Browser, Analytics, Settings
  • Background scheduler
  • Entry point for the app

requirements.txt (Dependencies)

  • All Python packages needed
  • Add your Aclarador's deps here
  • Installed automatically by HF Spaces

README.md (Space Description)

  • Shows on your HF Space page
  • Describes what the app does
  • For public visitors

Configuration

config/database.py (Database)

  • DuckDB connection setup
  • Schema creation (5 tables)
  • Persistent storage in /data/

Data Layer

storage/repository.py (Data Access)

  • All database queries
  • CRUD operations
  • Statistics methods
  • Search and filtering

Content Fetching

fetchers/rss_fetcher.py (RSS Fetcher)

  • Fetches from Madrid RSS feed
  • Parses entries
  • Cleans HTML to text
  • Deduplication
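
The four steps above (fetch, parse, clean, deduplicate) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of rss_fetcher.py: the function names `html_to_text` and `parse_feed`, the returned dictionary keys, and the choice of a SHA-256 hash as the deduplication key are all hypothetical, and the feed is assumed to be plain RSS 2.0 with `item`, `title`, `link`, and `description` elements.

```python
import hashlib
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags from an HTML fragment and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def parse_feed(xml_text, seen_ids):
    """Return the entries of an RSS 2.0 document that were not seen before.

    seen_ids is a set of deduplication keys and is updated in place.
    """
    entries = []
    for item in ET.fromstring(xml_text).iter("item"):
        link = (item.findtext("link") or "").strip()
        title = html_to_text(item.findtext("title") or "")
        body = html_to_text(item.findtext("description") or "")
        # Hypothetical dedup key: hash the link, or title + body if no link.
        key = hashlib.sha256((link or title + body).encode("utf-8")).hexdigest()
        if key in seen_ids:
            continue
        seen_ids.add(key)
        entries.append({"id": key, "title": title, "link": link, "text": body})
    return entries
```

Passing the same `seen_ids` set across fetch cycles keeps already-stored entries out of the result. A real fetcher would more likely check the database for existing keys instead of an in-memory set.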

Analysis

analyzers/analyzer_wrapper.py (Integration)

  • 🔧 THIS IS WHERE YOU INTEGRATE ACLARADOR
  • Currently contains a placeholder implementation
  • Update this to call your real Aclarador

analyzers/aclarador/ (Your Code)

  • 👈 PUT YOUR ACLARADOR CODE HERE
  • See README.md in that folder
  • Then update analyzer_wrapper.py
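
One possible shape for the wrapper is sketched below. The class name `AclaradorAnalyzer` and its `analyze` method come from the test command in the Quick Reference. Everything else is an assumption made for illustration: the entry point name `analyze_text`, the `score` key in Aclarador's result, the output keys `clarity_score` and `source`, and the sentence-length heuristic used as the placeholder.

```python
import importlib


def _placeholder_score(text):
    """Stand-in heuristic: shorter sentences count as clearer (0 to 100)."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    if not sentences:
        return 0.0
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    # Lose 2 points for every word above an average of 10 per sentence.
    return max(0.0, min(100.0, 100.0 - 2.0 * max(0.0, avg_words - 10.0)))


class AclaradorAnalyzer:
    """Calls the real Aclarador when it is importable, else the placeholder."""

    def __init__(self, module_name="analyzers.aclarador", entry_point="analyze_text"):
        # entry_point is a hypothetical name; use whatever your code exposes.
        try:
            module = importlib.import_module(module_name)
            self._analyze = getattr(module, entry_point)
        except (ImportError, AttributeError):
            self._analyze = None

    def analyze(self, text):
        if self._analyze is None:
            return {"clarity_score": _placeholder_score(text), "source": "placeholder"}
        result = self._analyze(text)
        # Mapping step: adapt Aclarador's output to the fields the app stores.
        # The "score" key is an assumption about that output.
        return {"clarity_score": float(result["score"]), "source": "aclarador"}
```

Keeping the fallback means the app still starts and stores results before your Aclarador code is in place, and the `source` field shows which path produced each score.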

Background Tasks

schedulers/background_tasks.py (Main Pipeline)

  • Fetches content every 6 hours
  • Analyzes each item
  • Stores results
  • Error handling
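
The repeat-every-6-hours behaviour with error handling can be sketched using only the standard library. This is not necessarily how background_tasks.py is written (it may use a scheduling library instead); `start_scheduler`, `_safe_run`, and their parameters are hypothetical names.

```python
import logging
import threading

logger = logging.getLogger(__name__)


def _safe_run(job):
    """Run one cycle and log any exception instead of propagating it."""
    try:
        job()
    except Exception:
        # A failed cycle must not kill the scheduler thread.
        logger.exception("Scheduled job failed")


def start_scheduler(job, interval_hours=6.0, run_immediately=True):
    """Run job() repeatedly in a daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        if run_immediately:
            _safe_run(job)
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_hours * 3600):
            _safe_run(job)

    threading.Thread(target=loop, daemon=True, name="fetch-scheduler").start()
    return stop
```

Because the thread is a daemon, it does not keep the process alive when the Gradio app shuts down.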

Utilities

utils/logger.py (Logging)

  • Structured logging
  • Console output
  • Debug information

🔄 Data Flow

1. Background Scheduler (every 6 hours)
   ↓
2. fetchers/rss_fetcher.py
   - Fetches from Madrid RSS
   ↓
3. storage/repository.py
   - Stores in DuckDB
   ↓
4. analyzers/analyzer_wrapper.py
   - Calls YOUR Aclarador
   ↓
5. storage/repository.py
   - Stores analysis results
   ↓
6. app.py (Gradio UI)
   - Displays in dashboard
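
Steps 2 to 5 amount to one pass over the fetched items. The sketch below takes the four operations as plain callables so it stays independent of the real modules. `run_cycle` and its parameter names are hypothetical, and items are assumed to be dictionaries with a `text` key.

```python
def run_cycle(fetch, analyze, save_item, save_analysis):
    """Run one fetch-and-analyze pass.

    Returns a tuple (stored, analyzed, failed) of item counts.
    """
    stored = analyzed = failed = 0
    for item in fetch():                       # step 2: fetch from RSS
        item_id = save_item(item)              # step 3: store the raw item
        stored += 1
        try:
            result = analyze(item["text"])     # step 4: call Aclarador
            save_analysis(item_id, result)     # step 5: store the analysis
            analyzed += 1
        except Exception:
            # One bad item should not stop the rest of the batch.
            failed += 1
    return stored, analyzed, failed
```

Storing the item before analyzing it means content is kept even when analysis fails, so it can be analyzed again later.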

🎯 What You Need to Modify

Required

  • analyzers/aclarador/ - Add your Aclarador code
  • analyzers/analyzer_wrapper.py - Update imports and mapping
  • requirements.txt - Add Aclarador's dependencies (if any)

Optional

  • app.py line 35 - Change fetch frequency (default: 6 hours)
  • app.py UI sections - Customize dashboard tabs
  • fetchers/ - Add more content sources

Never Modify (Unless You Know What You're Doing)

  • config/database.py - Database schema
  • storage/repository.py - Data access methods
  • schedulers/background_tasks.py - Background logic

📊 Database Schema (DuckDB)

Tables Created Automatically

  1. content_sources - RSS feeds and API sources
  2. content_items - Fetched content with metadata
  3. clarity_analyses - Your Aclarador's results
  4. analysis_history - Trends over time
  5. fetch_logs - Operation audit trail

Located at: /data/madrid.duckdb (persistent!)
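
When running locally, /data/ usually does not exist. A small path helper like the one below is one way to handle that; it is a sketch, not code taken from config/database.py. `resolve_db_path` and its parameters are hypothetical names, while the default directory and file name come from the location stated above.

```python
import os
from pathlib import Path


def resolve_db_path(data_dir="/data", filename="madrid.duckdb", fallback_dir="."):
    """Return the database path, preferring the persistent data directory."""
    base = Path(data_dir)
    # Use the persistent directory only if it exists and is writable.
    if base.is_dir() and os.access(base, os.W_OK):
        return base / filename
    # Otherwise fall back to a local file (assumed fine for development).
    return Path(fallback_dir) / filename
```

The resulting path can then be passed as a string to `duckdb.connect`.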


🚀 Deployment Steps

  1. Copy this entire folder to your HF Space clone
  2. Add Aclarador to analyzers/aclarador/
  3. Update analyzer_wrapper.py with your imports
  4. Add dependencies to requirements.txt
  5. Commit and push to Hugging Face
  6. Done! Space builds automatically

📏 File Sizes

app.py                          ~19 KB    (main application)
storage/repository.py           ~10 KB    (database queries)
fetchers/rss_fetcher.py         ~5 KB     (RSS parsing)
schedulers/background_tasks.py  ~6 KB     (fetch pipeline)
config/database.py              ~5 KB     (DB setup)
analyzers/analyzer_wrapper.py   ~7 KB     (integration template)

Total: ~50 KB (without your Aclarador)


🎨 Gradio UI Structure (app.py)

Tab 1: Dashboard (lines 50-200)

  • Statistics display
  • Clarity distribution chart
  • Content timeline
  • Category breakdown

Tab 2: Browse Content (lines 202-280)

  • Search filters
  • Date range selector
  • Category dropdown
  • Results table

Tab 3: Analytics (lines 282-340)

  • Low clarity items
  • Export functionality
  • Data download

Tab 4: Settings (lines 342-420)

  • Manual fetch trigger
  • Database statistics
  • Recent logs viewer

🔍 Finding Things

"Where do I...?"

Add my Aclarador? → analyzers/aclarador/ (folder) → analyzers/analyzer_wrapper.py (integration)

Change fetch frequency? → app.py line 35 (change hours=6)

Add dependencies? → requirements.txt

See database schema? → config/database.py (lines 20-100)

Modify UI? → app.py (lines 50-450)

Add new data source? → Create new fetcher in fetchers/ → Add to background_tasks.py

Debug errors? → Check HF Space logs (App files → Logs) → See utils/logger.py for logging


✅ Verification Checklist

After copying files:

  • app.py exists (main file)
  • requirements.txt exists
  • config/database.py exists
  • analyzers/analyzer_wrapper.py exists
  • analyzers/aclarador/__init__.py exists
  • All __init__.py files present
  • .gitignore exists
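
The "after copying files" checks can be automated with a short script. The paths below are taken from the file tree in this document. The function name `missing_files` is hypothetical, and treating the five folders that contain an __init__.py in the tree as the full list of packages is an assumption.

```python
from pathlib import Path

# Files named explicitly in the checklist.
REQUIRED_FILES = [
    "app.py",
    "requirements.txt",
    "config/database.py",
    "analyzers/analyzer_wrapper.py",
    "analyzers/aclarador/__init__.py",
    ".gitignore",
]

# Folders shown with an __init__.py in the file tree.
PACKAGE_DIRS = ["storage", "fetchers", "analyzers", "schedulers", "utils"]


def missing_files(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    expected = REQUIRED_FILES + [f"{d}/__init__.py" for d in PACKAGE_DIRS]
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("All files present" if not missing else f"Missing: {missing}")
```

Run it from the project root. An empty result means every file in the checklist was found.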

Before deploying:

  • Aclarador code in analyzers/aclarador/
  • analyzer_wrapper.py updated
  • Dependencies in requirements.txt
  • Tested locally (optional)

After deploying:

  • Space builds successfully
  • App starts (Running status)
  • Dashboard loads
  • Manual fetch works
  • Data persists

🎯 Quick Reference

Run locally:

python app.py

Test Aclarador:

python -c "from analyzers.analyzer_wrapper import AclaradorAnalyzer; a=AclaradorAnalyzer(); print(a.analyze('test'))"

Check structure:

find . -name "*.py" | sort

See QUICKSTART.md to deploy! 🚀