madriClaro / DEPLOYMENT_GUIDE.md
Ruben
Integrate Aclarador with Groq API for clarity analysis
28aa7d9

🚀 Deployment Guide - Hugging Face Spaces

Complete step-by-step guide to deploy Madrid Content Analyzer on Hugging Face Spaces

Cost: $0/month forever! 🎉


📋 Prerequisites

  1. ✅ Hugging Face account (free!)
  2. ✅ Git installed on your computer
  3. ✅ Your Aclarador code ready

Time needed: 30-60 minutes


Step 1: Create Hugging Face Account (5 minutes)

1.1 Sign Up

1. Go to https://huggingface.co
2. Click "Sign Up"
3. Use email or GitHub
4. Verify your email

No credit card required! ✅

1.2 Get Access Token

1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it "madrid-analyzer"
4. Select "write" permissions
5. Click "Generate"
6. Copy the token (save it safely!)

Step 2: Create Your Space (3 minutes)

2.1 Create Space

1. Go to https://huggingface.co/new-space
2. Fill in:
   - Owner: your username
   - Space name: madrid-content-analyzer
   - License: MIT
   - Select SDK: Gradio
   - SDK version: 4.44.0
   - Hardware: CPU basic (free!)
   - Visibility: Public (or Private if you prefer)
3. Click "Create Space"

2.2 Note Your Space URL

Your space will be at:

https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer

Step 3: Clone and Setup Locally (10 minutes)

3.1 Clone Your Space

# Clone the empty space
git clone https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer
cd madrid-content-analyzer

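# Or, if it asks for credentials, put your token (from Step 1.2) in the URL.
# Note: the token is then saved in plain text in .git/config
git clone https://YOUR_USERNAME:YOUR_TOKEN@huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer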

3.2 Copy the Adapted Code

# You have two options:

# Option A: Download from outputs folder
# Copy everything from /mnt/user-data/outputs/madrid-analyzer-hf/
# to your local madrid-content-analyzer/ folder

# Option B: Copy the files manually
# Files needed (provided with this guide):
# - app.py
# - requirements.txt  
# - README.md
# - config/database.py
# - config/settings.py
# - storage/repository.py
# - storage/models.py
# - fetchers/rss_fetcher.py
# - fetchers/api_fetcher.py
# - analyzers/analyzer_wrapper.py
# - schedulers/background_tasks.py
# - utils/logger.py
# - utils/text_cleaner.py
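Before committing, you might want to confirm that every file listed above actually made it into your local clone. Here is a minimal sketch. `missing_files` is a hypothetical helper and not part of the project. It only checks that the paths exist relative to the repository root.

```python
from pathlib import Path

# Files listed in Step 3.2, relative to the Space repository root
REQUIRED_FILES = [
    "app.py",
    "requirements.txt",
    "README.md",
    "config/database.py",
    "config/settings.py",
    "storage/repository.py",
    "storage/models.py",
    "fetchers/rss_fetcher.py",
    "fetchers/api_fetcher.py",
    "analyzers/analyzer_wrapper.py",
    "schedulers/background_tasks.py",
    "utils/logger.py",
    "utils/text_cleaner.py",
]


def missing_files(root: str = ".") -> list[str]:
    """Return the required files that do not exist under `root`."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All required files are present.")
```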

3.3 Add Your Aclarador Code

# Create the analyzers/aclarador directory if it doesn't exist
mkdir -p analyzers/aclarador

# Copy your Aclarador code
cp -r /path/to/your/aclarador/* analyzers/aclarador/

# Your Aclarador analyzer should be importable as:
# from analyzers.aclarador import your_analysis_function
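To confirm the package is importable before you edit the wrapper, a quick check like the following can help. It is a standard-library sketch, and `can_import` is a hypothetical name. Run it from the repository root so that `analyzers` is on the import path. Errors raised inside your own modules (for example a `SyntaxError`) are deliberately not caught, so you see the real traceback.

```python
import importlib
import importlib.util


def can_import(module_name: str) -> bool:
    """Return True if `module_name` can be located and imported."""
    try:
        # find_spec returns None for a missing top-level module and raises
        # ModuleNotFoundError (an ImportError) when a parent package is missing
        if importlib.util.find_spec(module_name) is None:
            return False
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("analyzers.aclarador importable:", can_import("analyzers.aclarador"))
```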

3.4 Update analyzer_wrapper.py

Edit analyzers/analyzer_wrapper.py to integrate your Aclarador:

# Import your actual Aclarador function
from analyzers.aclarador.your_module import analyze_text

class AclaradorAnalyzer:
    def analyze(self, text, title=None):
        # Call your actual analysis function
        result = analyze_text(text)
        
        # Map to expected format
        return {
            'overall_score': result.get('clarity_score', 0),
            'readability_score': result.get('readability', 0),
            'complexity_score': result.get('complexity', 0),
            'sentence_stats': result.get('sentence_analysis', {}),
            'vocabulary_stats': result.get('vocabulary', {}),
            'jargon_count': len(result.get('jargon_terms', [])),
            'jargon_words': result.get('jargon_terms', []),
            'suggestions': result.get('recommendations', [])
        }
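If Aclarador isn't wired in yet, a stand-in `analyze_text` that returns the keys the wrapper reads lets you test the dashboard end to end first. Everything here is hypothetical. The metrics are crude placeholders rather than Aclarador's real scoring, and the key names simply mirror the `result.get(...)` calls above.

```python
import re


def analyze_text(text: str) -> dict:
    """Hypothetical stand-in for Aclarador's analysis function.

    Returns the keys AclaradorAnalyzer.analyze() reads, filled with
    placeholder metrics so the rest of the app can be exercised.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    avg_len = len(words) / len(sentences) if sentences else 0.0
    long_words = [w for w in words if len(w) > 12]

    return {
        # Placeholder score: penalize sentences longer than 15 words, clamp to 0-100
        "clarity_score": max(0.0, min(100.0, 100.0 - 2 * max(0.0, avg_len - 15))),
        # Placeholder: average words per sentence
        "readability": round(avg_len, 1),
        "complexity": len(long_words),
        "sentence_analysis": {"count": len(sentences), "avg_words": round(avg_len, 1)},
        "vocabulary": {
            "total_words": len(words),
            "unique_words": len({w.lower() for w in words}),
        },
        # Stand-in heuristic: treat very long words as jargon
        "jargon_terms": long_words,
        "recommendations": ["Shorten long sentences"] if avg_len > 25 else [],
    }
```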

Step 4: Deploy to Hugging Face (5 minutes)

4.1 Configure Git

# Set your git email and name
git config user.email "your-email@example.com"
git config user.name "Your Name"

4.2 Commit Everything

# Add all files
git add .

# Commit
git commit -m "Initial deployment of Madrid Content Analyzer"

4.3 Push to Hugging Face

# Push to deploy
git push

# If it asks for credentials:
# Username: YOUR_USERNAME
# Password: YOUR_TOKEN (from Step 1.2)

4.4 Watch Build

1. Go to your Space URL
2. You'll see "Building..." status
3. Watch the logs
4. Takes 2-5 minutes

Step 5: Verify Deployment (5 minutes)

5.1 Check Space is Running

1. Go to your Space URL
2. You should see the Gradio interface
3. It might show "Waiting for app to start..."
4. Give it 30-60 seconds

5.2 Test Dashboard

1. Click "Dashboard" tab
2. Click "🔄 Refresh Statistics"
3. Should show initial stats (might be empty)

5.3 Trigger First Fetch

1. Go to "Settings" tab
2. Click "🔄 Trigger Manual Fetch"
3. Wait 1-2 minutes
4. Go back to Dashboard
5. Click refresh - you should see data!

Step 6: Configure Secrets (Optional)

If your Aclarador needs API keys or configuration:

6.1 Add Secrets

1. Go to your Space settings
2. Find the "Variables and secrets" section
3. Click "New secret" and add:
   - Name: ACLARADOR_API_KEY
   - Value: your-key
4. Restart the Space

6.2 Access Secrets in Code

import os

api_key = os.getenv('ACLARADOR_API_KEY')
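Space secrets reach the app as environment variables, which is what `os.getenv` reads. If you want the app to fail fast when a secret is missing, rather than on the first analysis call, you could wrap the lookup like this. `require_secret` is a hypothetical helper.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment, raising a clear error if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Secret {name!r} is not set. Add it in the Space settings "
            "under 'Variables and secrets', then restart the Space."
        )
    return value


# Example: check at startup in app.py
# api_key = require_secret("ACLARADOR_API_KEY")
```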

Step 7: Make Space Public/Private

7.1 Public Space (Recommended)

- Anyone can view
- Good for showcasing
- Free forever

7.2 Private Space

1. Go to Settings
2. Change visibility to "Private"
3. Only you can access
4. Still free!

🎉 You're Live!

Your Madrid Content Analyzer is now running FREE forever on Hugging Face Spaces!

Your Space URL:

https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer

Share it:

https://YOUR_USERNAME-madrid-content-analyzer.hf.space

🔧 Post-Deployment Configuration

Update Fetch Frequency

Edit app.py line ~35:

scheduler.add_job(
    fetch_and_analyze_content,
    'interval',
    hours=6,  # Change this! (1, 6, 12, 24)
    id='content_fetch'
)

Commit and push:

git add app.py
git commit -m "Update fetch frequency"
git push
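If you'd rather change the frequency without editing code, you could read it from a Space variable instead. This is a sketch under a few assumptions: `FETCH_INTERVAL_HOURS` and `fetch_interval_hours` are hypothetical names, and the allowed values mirror the comment in the `add_job` snippet above. Restart the Space after changing the variable so `app.py` picks up the new value.

```python
import os

# Allowed values, taken from the comment in the add_job snippet
ALLOWED_HOURS = (1, 6, 12, 24)


def fetch_interval_hours(default: int = 6) -> int:
    """Read the fetch interval from FETCH_INTERVAL_HOURS (hypothetical variable)."""
    raw = os.getenv("FETCH_INTERVAL_HOURS")
    if raw is None:
        return default
    try:
        hours = int(raw)
    except ValueError:
        return default  # ignore non-numeric values
    return hours if hours in ALLOWED_HOURS else default


# In app.py, assuming the APScheduler call shown above:
# scheduler.add_job(fetch_and_analyze_content, 'interval',
#                   hours=fetch_interval_hours(), id='content_fetch')
```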

Update Data Retention

Add cleanup job in app.py:

scheduler.add_job(
    cleanup_old_data,
    'interval',
    days=7,  # Run weekly
    id='cleanup'
)
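`cleanup_old_data` isn't shown in this guide, so here is one possible shape for it. It assumes a hypothetical `content_items` table with a `fetched_at` timestamp column and a 90-day default retention. Adjust both to your schema. Because the function takes a connection argument, with APScheduler you would pass one via `args`, e.g. `scheduler.add_job(cleanup_old_data, 'interval', days=7, args=[con.cursor()], id='cleanup')`. With DuckDB, giving the background job its own `con.cursor()` is generally safer than sharing one connection across threads.

```python
from datetime import datetime, timedelta


def cleanup_old_data(con, retention_days: int = 90) -> int:
    """Delete rows older than `retention_days` and return how many were removed.

    Assumes a hypothetical `content_items` table with a `fetched_at`
    timestamp column. `con` is any connection with a DB-API style
    `execute()` (e.g. duckdb.connect(...) or sqlite3.connect(...)).
    """
    cutoff = datetime.now() - timedelta(days=retention_days)
    before = con.execute("SELECT COUNT(*) FROM content_items").fetchone()[0]
    con.execute("DELETE FROM content_items WHERE fetched_at < ?", [cutoff])
    after = con.execute("SELECT COUNT(*) FROM content_items").fetchone()[0]
    return before - after
```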

📊 Monitoring Your Space

Check Logs

1. Go to your Space
2. Click "App files" → "Logs"
3. See real-time logs

Check Database Size

1. Go to "Settings" tab in your app
2. Click "Refresh Database Stats"
3. See "Database Size" in MB
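If you'd like the same number from a script or in the logs, a small helper can compute it from the file itself. This is a sketch: `database_size_mb` is hypothetical, the app's own button may measure size differently, and the `.wal` check covers the write-ahead log DuckDB may keep next to the database file.

```python
import os


def database_size_mb(db_path: str) -> float:
    """Size of a DuckDB database in MB, including its .wal file if present."""
    total = 0
    # DuckDB may keep recent changes in a write-ahead log beside the database
    for path in (db_path, db_path + ".wal"):
        if os.path.exists(path):
            total += os.path.getsize(path)
    return round(total / (1024 * 1024), 2)


# e.g. database_size_mb('/data/madrid.duckdb')
```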

Running low on storage?

1. Go to Settings tab
2. Run cleanup manually
3. Or decrease data retention period

🔄 Updating Your Space

Update Code

# Make changes locally
# Edit files

# Commit and push
git add .
git commit -m "Update: your changes"
git push

# Space rebuilds automatically!

Update Dependencies

# Edit requirements.txt
# Add/update packages

# Commit and push
git add requirements.txt
git commit -m "Update dependencies"
git push

🆘 Troubleshooting

Space won't start

Check logs: Look for errors in the Space logs.

Common issues:

  • Missing dependencies in requirements.txt
  • Import errors in code
  • Database permission issues

Solution:

# Check requirements.txt has all deps
pip install -r requirements.txt  # Test locally first

# Check imports work
python app.py  # Test locally

Database not persisting

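Issue: Data disappears after a restart.

Solution: Store the database under /data/. That directory is only mounted when persistent storage is enabled for the Space (a paid add-on in the Space settings); without it, anything written to disk is lost when the Space restarts.

DB_PATH = '/data/madrid.duckdb'  # ✅ Correct
DB_PATH = 'madrid.duckdb'        # ❌ Wrong (ephemeral!)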
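To avoid silently falling back to ephemeral storage, you could pick the path at startup and warn when `/data` isn't available. Here is a minimal sketch. `resolve_db_path` is a hypothetical helper, and `madrid.duckdb` is the filename from the snippet above.

```python
import os
import warnings


def resolve_db_path(data_dir: str = "/data", filename: str = "madrid.duckdb") -> str:
    """Return a DB path on the persistent mount, or a local fallback with a warning."""
    if os.path.isdir(data_dir) and os.access(data_dir, os.W_OK):
        return os.path.join(data_dir, filename)
    # No writable /data: persistent storage is probably not enabled
    warnings.warn(
        f"{data_dir} is missing or not writable; using ./{filename}, "
        "which will be lost when the Space restarts",
        RuntimeWarning,
    )
    return filename


# Hypothetical use in config/settings.py:
# DB_PATH = resolve_db_path()
```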

Scheduler not running

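Issue: No automatic fetches.

Check: Make sure the background scheduler is started. Also keep in mind that Spaces on free hardware go to sleep after about 48 hours without visitors, and no scheduled jobs run while the Space is asleep.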

scheduler.start()  # Make sure this is called!

Out of memory

Issue: The Space crashes with a memory error.

Solution:

  1. Reduce fetch batch size
  2. Add pagination to queries
  3. Upgrade to better hardware (paid)

Import errors

Issue: Aclarador can't be imported.

Solution:

# Check your analyzer structure
analyzers/
  aclarador/
    __init__.py  # Make sure this exists!
    your_code.py
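To spot a missing `__init__.py` quickly, here is a hypothetical helper that lists directories under `analyzers/` that contain `.py` files but have no `__init__.py`.

```python
from pathlib import Path


def packages_missing_init(root: str = "analyzers") -> list[str]:
    """List directories under `root` that contain .py files but no __init__.py."""
    base = Path(root)
    if not base.is_dir():
        return []
    candidates = [base, *(p for p in base.rglob("*") if p.is_dir())]
    missing = []
    for d in candidates:
        if d.name == "__pycache__":
            continue  # compiled-bytecode folders are not packages
        has_py = any(f.suffix == ".py" for f in d.iterdir() if f.is_file())
        if has_py and not (d / "__init__.py").exists():
            missing.append(str(d))
    return sorted(missing)


if __name__ == "__main__":
    for path in packages_missing_init():
        print(f"Missing __init__.py in {path}")
```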

💡 Pro Tips

Tip 1: Test Locally First

# Before pushing, test locally
python app.py

# Visit http://localhost:7860
# Make sure everything works!
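While `python app.py` is running, you can check from a second terminal that the app answers on port 7860, the port used above. This is a standard-library sketch, and `is_up` is a hypothetical name.

```python
import urllib.request


def is_up(url: str = "http://localhost:7860", timeout: float = 5.0) -> bool:
    """Return True if `url` responds with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("App is up" if is_up() else "App is not responding on port 7860")
```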

Tip 2: Use .gitignore

Create .gitignore:

__pycache__/
*.pyc
.env
*.duckdb
.DS_Store

Tip 3: Add Status Badge

Add to your Space README.md:

![Space Status](https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer/badge.svg)

Tip 4: Monitor Resource Usage

HF Spaces shows CPU/Memory usage in Space settings

Tip 5: Version Your Data

Before major changes:

# Export data
export_data('csv')

# Make changes

# Can restore if needed
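`export_data` is the app's own helper and isn't shown here. If you need a standalone backup, a sketch like this writes one table to CSV. `export_table_csv` and the `content_items` table name are hypothetical. DuckDB can also do this natively with `COPY content_items TO 'content_items.csv' (HEADER)`.

```python
import csv


def export_table_csv(con, table: str, out_path: str) -> int:
    """Write every row of `table` to `out_path` as CSV and return the row count.

    `con` is a DB-API style connection (DuckDB or sqlite3). `table` is
    interpolated into the SQL, so only pass trusted, known table names.
    """
    cur = con.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row
        writer.writerows(rows)
    return len(rows)
```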

📈 Scaling Up (If Needed)

If You Outgrow Free Tier

Paid Hardware Options:

  • CPU Upgrade: $0.03/hour (~$22/month)
  • Basic GPU: $0.60/hour (~$432/month)
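The monthly figures are simply the hourly rate times about 720 hours (24 hours × 30 days), assuming the hardware runs around the clock. A tiny sketch of that arithmetic:

```python
HOURS_PER_MONTH = 24 * 30  # the estimates above assume a 30-day month


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost in USD for hardware billed per hour."""
    return round(hourly_rate * hours, 2)


print(monthly_cost(0.03))  # 21.6
print(monthly_cost(0.60))  # 432.0
```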

But you probably won't need it!

  • Free tier handles 100K+ items easily
  • DuckDB is very efficient
  • 16GB of RAM (CPU basic) is plenty

✅ Deployment Checklist

Before Deployment

  • Hugging Face account created
  • Access token generated
  • Space created
  • Code copied to local clone
  • Aclarador integrated
  • Tested locally

During Deployment

  • All files committed
  • Pushed to Hugging Face
  • Build successful
  • App starts without errors

After Deployment

  • Dashboard loads
  • Manual fetch works
  • Data persists
  • Scheduler running
  • Analysis working

Post-Launch

  • Set visibility (public/private)
  • Share URL
  • Monitor first few fetches
  • Check database size
  • Verify automatic fetches

🎊 You Did It!

You now have a Madrid Content Analyzer running for free on Hugging Face Spaces!

What you saved:

  • Heroku: $84-168/year
  • Server costs: $0/month
  • Database: $0/month (vs $5-60/month)

What you got:

  • Modern Gradio interface
  • Fast DuckDB analytics
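  • 16GB of RAM on the free CPU basic hardware
  • A hosted service that stays up while in use (free Spaces sleep after about 48 hours without visitors)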
  • Beautiful visualizations

📞 Need Help?

Hugging Face Community:

Check Your Space Logs:

  • App files → Logs
  • See errors in real-time

Congratulations! You're now running on Hugging Face Spaces! 🎉

Next: Share your Space URL and start analyzing Madrid's content!