# 🚀 Deployment Guide - Hugging Face Spaces
## Complete step-by-step guide to deploy Madrid Content Analyzer on Hugging Face Spaces
**Cost: $0/month forever!** 🎉
---
## 📋 Prerequisites
1. ✅ Hugging Face account (free!)
2. ✅ Git installed on your computer
3. ✅ Your Aclarador code ready
**Time needed**: 30-60 minutes
---
## Step 1: Create Hugging Face Account (5 minutes)
### 1.1 Sign Up
```
1. Go to https://huggingface.co
2. Click "Sign Up"
3. Use email or GitHub
4. Verify your email
```
**No credit card required!** ✅
### 1.2 Get Access Token
```
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it "madrid-analyzer"
4. Select "write" permissions
5. Click "Generate"
6. Copy the token (save it safely!)
```
---
## Step 2: Create Your Space (3 minutes)
### 2.1 Create Space
```
1. Go to https://huggingface.co/new-space
2. Fill in:
   - Owner: your username
   - Space name: madrid-content-analyzer
   - License: MIT
   - Select SDK: Gradio
   - SDK version: 4.44.0
   - Hardware: CPU basic (free!)
   - Visibility: Public (or Private if you prefer)
3. Click "Create Space"
```
### 2.2 Note Your Space URL
Your space will be at:
```
https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer
```
---
## Step 3: Clone and Setup Locally (10 minutes)
### 3.1 Clone Your Space
```bash
# Clone the empty space
git clone https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer

# If the clone fails with a credentials error, use your token from Step 1.2 instead:
# git clone https://YOUR_USERNAME:YOUR_TOKEN@huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer

# Enter the cloned folder
cd madrid-content-analyzer
```
### 3.2 Copy the Adapted Code
```bash
# You have two options:
# Option A: Download from outputs folder
# Copy everything from /mnt/user-data/outputs/madrid-analyzer-hf/
# to your local madrid-content-analyzer/ folder
# Option B: Copy manually
# Files needed (I've created them for you):
# - app.py
# - requirements.txt
# - README.md
# - config/database.py
# - config/settings.py
# - storage/repository.py
# - storage/models.py
# - fetchers/rss_fetcher.py
# - fetchers/api_fetcher.py
# - analyzers/analyzer_wrapper.py
# - schedulers/background_tasks.py
# - utils/logger.py
# - utils/text_cleaner.py
```
### 3.3 Add Your Aclarador Code
```bash
# Create analyzers directory if not exists
mkdir -p analyzers/aclarador
# Copy your Aclarador code
cp -r /path/to/your/aclarador/* analyzers/aclarador/
# Your Aclarador analyzer should be importable as:
# from analyzers.aclarador import your_analysis_function
```
### 3.4 Update analyzer_wrapper.py
Edit `analyzers/analyzer_wrapper.py` to integrate your Aclarador:
```python
# Import your actual Aclarador function
from analyzers.aclarador.your_module import analyze_text


class AclaradorAnalyzer:
    def analyze(self, text, title=None):
        # Call your actual analysis function
        result = analyze_text(text)

        # Map to expected format
        return {
            'overall_score': result.get('clarity_score', 0),
            'readability_score': result.get('readability', 0),
            'complexity_score': result.get('complexity', 0),
            'sentence_stats': result.get('sentence_analysis', {}),
            'vocabulary_stats': result.get('vocabulary', {}),
            'jargon_count': len(result.get('jargon_terms', [])),
            'jargon_words': result.get('jargon_terms', []),
            'suggestions': result.get('recommendations', [])
        }
```
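To check the mapping before wiring in the real Aclarador, a minimal sketch like the following may help. `fake_analyze_text` is a hypothetical stand-in for your own function, and the result keys it returns are assumptions that simply mirror the keys used in the wrapper above.

```python
from typing import Any, Dict, Optional


def fake_analyze_text(text: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the real Aclarador analysis function."""
    words = text.split()
    return {
        "clarity_score": 80,
        "readability": 70,
        # Toy rule for the demo only: treat very long words as jargon.
        "jargon_terms": [w for w in words if len(w) > 12],
    }


class AclaradorAnalyzer:
    def __init__(self, analyze_fn=fake_analyze_text):
        # The analysis function is injected so the wrapper can be tested alone.
        self._analyze_fn = analyze_fn

    def analyze(self, text: str, title: Optional[str] = None) -> Dict[str, Any]:
        result = self._analyze_fn(text)
        jargon = result.get("jargon_terms", [])
        return {
            "overall_score": result.get("clarity_score", 0),
            "readability_score": result.get("readability", 0),
            "complexity_score": result.get("complexity", 0),  # missing key falls back to 0
            "jargon_count": len(jargon),
            "jargon_words": jargon,
            "suggestions": result.get("recommendations", []),
        }


report = AclaradorAnalyzer().analyze("El procedimiento administrativo es sencillo")
print(report["overall_score"], report["jargon_count"])
```

Using `.get()` with defaults means a missing key in your Aclarador output yields a neutral value instead of a `KeyError`.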
---
## Step 4: Deploy to Hugging Face (5 minutes)
### 4.1 Configure Git
```bash
# Set your git email and name
git config user.email "your-email@example.com"
git config user.name "Your Name"
```
### 4.2 Commit Everything
```bash
# Add all files
git add .
# Commit
git commit -m "Initial deployment of Madrid Content Analyzer"
```
### 4.3 Push to Hugging Face
```bash
# Push to deploy
git push
# If it asks for credentials:
# Username: YOUR_USERNAME
# Password: YOUR_TOKEN (from Step 1.2)
```
### 4.4 Watch Build
```
1. Go to your Space URL
2. You'll see "Building..." status
3. Watch the logs
4. Takes 2-5 minutes
```
---
## Step 5: Verify Deployment (5 minutes)
### 5.1 Check Space is Running
```
1. Go to your Space URL
2. You should see the Gradio interface
3. It might show "Waiting for app to start..."
4. Give it 30-60 seconds
```
### 5.2 Test Dashboard
```
1. Click "Dashboard" tab
2. Click "🔄 Refresh Statistics"
3. Should show initial stats (might be empty)
```
### 5.3 Trigger First Fetch
```
1. Go to "Settings" tab
2. Click "🔄 Trigger Manual Fetch"
3. Wait 1-2 minutes
4. Go back to Dashboard
5. Click refresh - you should see data!
```
---
## Step 6: Configure Secrets (Optional)
If your Aclarador needs API keys or configuration:
### 6.1 Add Secrets
```
1. Go to your Space settings
2. Click "Repository secrets"
3. Add secrets:
   - Name: ACLARADOR_API_KEY
   - Value: your-key
4. Restart space
```
### 6.2 Access Secrets in Code
```python
import os
api_key = os.getenv('ACLARADOR_API_KEY')
```
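`os.getenv` returns `None` when the secret is missing, which tends to surface later as a confusing error. A minimal sketch of a stricter lookup follows; `require_secret` is a hypothetical helper name and `DEMO_SECRET` is a throwaway variable used only so the example runs anywhere.

```python
import os


def require_secret(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Secret {name!r} is not set. Add it in the Space settings and restart."
        )
    return value


# Demo with a throwaway variable instead of a real key.
os.environ["DEMO_SECRET"] = "abc123"
print(len(require_secret("DEMO_SECRET")))
```

In the app you would call `require_secret('ACLARADOR_API_KEY')` once at startup, so a missing key fails fast with a readable message in the logs.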
---
## Step 7: Make Space Public/Private
### 7.1 Public Space (Recommended)
```
- Anyone can view
- Good for showcasing
- Free forever
```
### 7.2 Private Space
```
1. Go to Settings
2. Change visibility to "Private"
3. Only you can access
4. Still free!
```
---
## 🎉 You're Live!
Your Madrid Content Analyzer is now running **FREE forever** on Hugging Face Spaces!
**Your Space URL**:
```
https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer
```
**Share it**:
```
https://YOUR_USERNAME-madrid-content-analyzer.hf.space
```
---
## 🔧 Post-Deployment Configuration
### Update Fetch Frequency
Edit `app.py` line ~35:
```python
scheduler.add_job(
    fetch_and_analyze_content,
    'interval',
    hours=6,  # Change this! (1, 6, 12, 24)
    id='content_fetch'
)
```
Commit and push:
```bash
git add app.py
git commit -m "Update fetch frequency"
git push
```
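If you would rather change the interval without editing code, one option is to read it from an environment variable. This is a minimal sketch, assuming a hypothetical variable named `FETCH_INTERVAL_HOURS` that you would define yourself in the Space settings. The allowed values mirror the ones suggested in the comment above.

```python
import os

ALLOWED_HOURS = (1, 6, 12, 24)


def fetch_interval_hours(default: int = 6) -> int:
    """Read the fetch interval from the environment, falling back to a default."""
    raw = os.environ.get("FETCH_INTERVAL_HOURS", "")  # hypothetical variable name
    try:
        hours = int(raw)
    except ValueError:
        return default
    # Ignore values outside the supported set instead of crashing at startup.
    return hours if hours in ALLOWED_HOURS else default


print(fetch_interval_hours())
```

The job definition would then use `hours=fetch_interval_hours()` in place of the hard-coded `hours=6`.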
### Update Data Retention
Add cleanup job in `app.py`:
```python
scheduler.add_job(
    cleanup_old_data,
    'interval',
    days=7,  # Run weekly
    id='cleanup'
)
```
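The guide does not show what `cleanup_old_data` does, so here is one possible shape for it. Everything specific is an assumption: the table name `content`, the `fetched_at` column, and the 90-day retention window are hypothetical, and `sqlite3` stands in for DuckDB only so the sketch runs with the standard library.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cleanup_old_data(conn, retention_days: int = 90, now=None) -> int:
    """Delete rows older than the retention window; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # ISO 8601 strings in the same timezone sort chronologically.
    cur = conn.execute("DELETE FROM content WHERE fetched_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo on an in-memory database with one old row and one recent row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER, fetched_at TEXT)")
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    (1, (now - timedelta(days=200)).isoformat()),
    (2, (now - timedelta(days=5)).isoformat()),
]
conn.executemany("INSERT INTO content VALUES (?, ?)", rows)
removed = cleanup_old_data(conn, retention_days=90, now=now)
print(removed)
```

Since the scheduler calls the job without arguments, the real function would need to open its own database connection or be registered with `args=[...]`.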
---
## 📊 Monitoring Your Space
### Check Logs
```
1. Go to your Space
2. Click "App files" → "Logs"
3. See real-time logs
```
### Check Database Size
```
1. Go to "Settings" tab in your app
2. Click "Refresh Database Stats"
3. See "Database Size" in MB
```
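If you want the same number in your own logs, the size can be read straight from the file. This is a minimal sketch; `file_size_mb` is a hypothetical helper, and in the app you would pass it your own database path.

```python
import os
import tempfile


def file_size_mb(path: str) -> float:
    """Return the file size in megabytes (MiB), or 0.0 if the file is missing."""
    if not os.path.isfile(path):
        return 0.0
    return os.path.getsize(path) / (1024 * 1024)


# Demo with a temporary 2 MiB file instead of a real database.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024))
print(file_size_mb(tmp.name))
os.remove(tmp.name)
```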
### Space is at 16GB limit?
```
1. Go to Settings tab
2. Run cleanup manually
3. Or decrease data retention period
```
---
## 🔄 Updating Your Space
### Update Code
```bash
# Make changes locally
# Edit files
# Commit and push
git add .
git commit -m "Update: your changes"
git push
# Space rebuilds automatically!
```
### Update Dependencies
```bash
# Edit requirements.txt
# Add/update packages
# Commit and push
git add requirements.txt
git commit -m "Update dependencies"
git push
```
---
## 🆘 Troubleshooting
### Space won't start
**Check logs**: Look for errors in Space logs
**Common issues**:
- Missing dependencies in requirements.txt
- Import errors in code
- Database permission issues
**Solution**:
```bash
# Check requirements.txt has all deps
pip install -r requirements.txt # Test locally first
# Check imports work
python app.py # Test locally
```
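To spot a missing dependency before pushing, you can compare `requirements.txt` with what is installed locally. This is a rough sketch using only the standard library. The parsing is deliberately simple and only extracts the package name from each line, so version constraints are not checked.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as "-r other.txt"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


sample = ["gradio==4.44.0", "# a comment", "duckdb>=0.9  # database", ""]
print(requirement_names(sample))
```

To check the real file, pass `open('requirements.txt').read().splitlines()` to `requirement_names` and feed the result to `missing_packages`.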
### Database not persisting
**Issue**: Data disappears after restart
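**Solution**: Store the database under `/data/`. On Hugging Face Spaces, `/data/` is the mount point for persistent storage, so also check that persistent storage is enabled in the Space settings.
```python
DB_PATH = '/data/madrid.duckdb'   # ✅ Correct
# DB_PATH = 'madrid.duckdb'       # ❌ Wrong (ephemeral!)
```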
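When you run the app locally, `/data/` usually does not exist. One way to keep a single code path is to fall back to the working directory and print a warning. This is a minimal sketch; `resolve_db_path` is a hypothetical helper name.

```python
import os
import tempfile


def resolve_db_path(filename: str = "madrid.duckdb", data_dir: str = "/data") -> str:
    """Prefer the persistent directory; fall back to the working directory."""
    if os.path.isdir(data_dir) and os.access(data_dir, os.W_OK):
        return os.path.join(data_dir, filename)
    print(f"Warning: {data_dir} is not writable; data will not survive a restart.")
    return filename


# Demo: a writable temporary directory, then a directory that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    print(resolve_db_path(data_dir=tmp))
print(resolve_db_path(data_dir="/no/such/dir-xyz"))
```

The warning shows up in the Space logs, which makes an accidental fallback easy to notice.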
### Scheduler not running
**Issue**: No automatic fetches
**Check**: Background scheduler started
```python
scheduler.start() # Make sure this is called!
```
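The sketch below guards against starting the scheduler twice. It assumes the `scheduler` object is an APScheduler scheduler, as the `add_job(..., 'interval', ...)` calls in this guide suggest, since those expose a `running` property. `FakeScheduler` is a hypothetical stand-in so the example runs without APScheduler installed.

```python
def ensure_started(scheduler) -> bool:
    """Start the scheduler once; return True if this call started it."""
    if getattr(scheduler, "running", False):
        return False
    scheduler.start()
    return True


class FakeScheduler:
    """Tiny stand-in that mimics the two members used above."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


demo = FakeScheduler()
print(ensure_started(demo), ensure_started(demo))
```

Logging the return value at startup gives you a line in the Space logs that confirms the scheduler actually started.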
### Out of memory
**Issue**: Space crashes with memory error
**Solution**:
1. Reduce fetch batch size
2. Add pagination to queries
3. Upgrade to better hardware (paid)
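For the first two suggestions, the usual pattern is to process items in fixed-size chunks so that only one chunk is held in memory at a time. This is a minimal sketch; `in_batches` is a hypothetical helper, and the batch size is something you would tune for your own data.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


for chunk in in_batches(range(5), 2):
    print(chunk)
```

Because this is a generator, it also works with a lazy source such as a database cursor, without loading every row first.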
### Import errors
**Issue**: Can't import Aclarador
**Solution**:
```
# Check your analyzer structure
analyzers/
    aclarador/
        __init__.py    # Make sure this exists!
        your_code.py
```
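You can also ask Python directly whether it can locate the package, without running any of your analysis code. This is a minimal sketch using the standard library; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module or package."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


print(is_importable("json"))
print(is_importable("no_such_package_xyz.aclarador"))
```

Run from the repository root, `is_importable('analyzers.aclarador')` should return `True` once the folder layout is correct.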
---
## 💡 Pro Tips
### Tip 1: Test Locally First
```bash
# Before pushing, test locally
python app.py
# Visit http://localhost:7860
# Make sure everything works!
```
### Tip 2: Use .gitignore
Create `.gitignore`:
```
__pycache__/
*.pyc
.env
*.duckdb
.DS_Store
```
### Tip 3: Add Status Badge
Add to your Space README.md:
```markdown
![Space Status](https://huggingface.co/spaces/YOUR_USERNAME/madrid-content-analyzer/badge.svg)
```
### Tip 4: Monitor Resource Usage
HF Spaces shows CPU/Memory usage in Space settings
### Tip 5: Version Your Data
Before major changes:
```python
# Export data
export_data('csv')
# Make changes
# Can restore if needed
```
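The guide does not define `export_data`, so the following is only one way such an export could look. It writes rows to a timestamped CSV file using the standard library. The function name, the `backup_` file prefix, and the sample columns are all hypothetical.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def export_rows_to_csv(rows, fieldnames, out_dir):
    """Write dict rows to a timestamped CSV file and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(out_dir, f"backup_{stamp}.csv")
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Demo in a temporary directory with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"id": 1, "title": "Aviso"}, {"id": 2, "title": "Nota"}]
    out = export_rows_to_csv(sample, ["id", "title"], tmp)
    print(os.path.basename(out).startswith("backup_"))
```

The timestamp in the file name keeps each export separate, so an older snapshot stays available if a later change goes wrong.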
---
## 📈 Scaling Up (If Needed)
### If You Outgrow Free Tier
**Paid Hardware Options**:
- **CPU Upgrade**: $0.03/hour (~$22/month)
- **Basic GPU**: $0.60/hour (~$432/month)
**But you probably won't need it!**
- Free tier handles 100K+ items easily
- DuckDB is very efficient
- 16GB is plenty
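The monthly figures listed under the paid hardware options are the hourly rates multiplied out over a 30-day month of continuous running. Here is a small sketch of that arithmetic, using the rates quoted in this guide (check current pricing before relying on them):

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, running around the clock


def monthly_cost(hourly_rate: float) -> float:
    """Convert an hourly price into an approximate monthly price."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)


print(monthly_cost(0.03))  # CPU upgrade rate from this guide
print(monthly_cost(0.60))  # basic GPU rate from this guide
```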
---
## ✅ Deployment Checklist
### Before Deployment
- [ ] Hugging Face account created
- [ ] Access token generated
- [ ] Space created
- [ ] Code copied to local clone
- [ ] Aclarador integrated
- [ ] Tested locally
### During Deployment
- [ ] All files committed
- [ ] Pushed to Hugging Face
- [ ] Build successful
- [ ] App starts without errors
### After Deployment
- [ ] Dashboard loads
- [ ] Manual fetch works
- [ ] Data persists
- [ ] Scheduler running
- [ ] Analysis working
### Post-Launch
- [ ] Set visibility (public/private)
- [ ] Share URL
- [ ] Monitor first few fetches
- [ ] Check database size
- [ ] Verify automatic fetches
---
## 🎊 You Did It!
You now have a **completely free** Madrid Content Analyzer running 24/7!
**What you saved**:
- Heroku: $84-168/year
- Server costs: $0/month
- Database: $0/month (vs $5-60/month)
**What you got**:
- Modern Gradio interface
- Fast DuckDB analytics
- 16GB storage
- Always-on service
- Beautiful visualizations
---
## 📞 Need Help?
**Hugging Face Community**:
- Forums: https://discuss.huggingface.co
- Discord: https://hf.co/join/discord
- Documentation: https://huggingface.co/docs/hub/spaces
**Check Your Space Logs**:
- App files → Logs
- See errors in real-time
---
**Congratulations! You're now running on Hugging Face Spaces! 🎉**
**Next**: Share your Space URL and start analyzing Madrid's content!