# πŸš€ Deployment Guide for HuggingFace Space with ZeroGPU
## βœ… Pre-Deployment Checklist
All code is ready! Here's what's configured:
- βœ… Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- βœ… ZeroGPU support: Enabled with `@spaces.GPU` decorator
- βœ… Local/Space compatibility: Auto-detects environment
- βœ… Usage tracking: 50 requests/day per user
- βœ… Requirements: All dependencies listed
- βœ… README: Updated with instructions
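The local/Space compatibility and ZeroGPU items above can be sketched as follows. This is a minimal illustration, not the app's actual code: the `gpu` alias and `generate` function are hypothetical names, and the real inference call is replaced by a placeholder. On a Space the `spaces` package provides the real `@spaces.GPU` decorator; locally it may be missing, so the sketch falls back to a no-op.

```python
# Hypothetical sketch of local/Space auto-detection with @spaces.GPU.
try:
    import spaces  # installed on HuggingFace Spaces with ZeroGPU
    gpu = spaces.GPU
except ImportError:
    # Local fallback: a no-op decorator that supports both
    # @gpu and @gpu(duration=...) call styles.
    def gpu(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@gpu
def generate(prompt: str) -> str:
    # Placeholder for the actual Phi-3-mini inference call
    return f"echo: {prompt}"
```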
## πŸ“‹ Deployment Steps
### Step 1: Push Code to Your Space
```bash
cd /Users/tom/code/cojournalist-data
# First-time setup (skip these two commands if the Space remote already exists)
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Commit and push
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```
### Step 2: Configure Space Hardware
1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (βš™οΈ icon in top right)
3. Scroll to **Hardware** section
4. Select **ZeroGPU** from dropdown
5. Click **Save**
6. Space will restart automatically
### Step 3: Wait for Build
The Space will:
1. Install dependencies (~2-3 minutes)
2. Download Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load model into memory (~30 seconds)
4. Launch Gradio interface
**Total build time: ~5-7 minutes**
### Step 4: Test Your Space
Once running, test with these queries:
1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik"
3. **French:** "Qui sont les parlementaires de Zurich?"
4. **Italian:** "Mostrami i voti recenti sulla politica climatica"
## πŸ”§ Space Settings Summary
### Hardware
- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** NVIDIA H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)
### Environment Variables (Optional)
None are required for this deployment; the only variable you might set is:
- `HF_TOKEN`: Your HuggingFace token (for private models, not needed for Phi-3)
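If the app ever needs the token (e.g. for a private model), it could read it like this. The helper name `get_hf_token` is hypothetical; a missing token is fine for a public model like Phi-3-mini.

```python
import os

def get_hf_token(env=os.environ):
    # Returns the configured token, or None when unset (fine for
    # public models). `env` is a parameter only to make this testable.
    return env.get("HF_TOKEN")

# Could then be passed along, e.g.:
#   AutoModelForCausalLM.from_pretrained(..., token=get_hf_token())
```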
## πŸ“Š Expected Behavior
### First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests faster (~2-5 seconds)
### Rate Limiting
- 50 requests per day per user IP
- Error message shown when limit reached
- Resets daily at midnight UTC
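The rate-limiting behavior above (50/day per IP, in-memory, UTC-midnight reset) can be sketched like this. The class and method names are hypothetical, not the app's actual code.

```python
from datetime import datetime, timezone

DAILY_LIMIT = 50

class UsageTracker:
    """In-memory per-IP daily counter; state is lost on restart."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self.counts = {}          # ip -> requests seen today
        self.day = self._today()

    def _today(self) -> str:
        # UTC date string; a date change means midnight UTC has passed
        return datetime.now(timezone.utc).strftime("%Y-%m-%d")

    def check(self, ip: str) -> bool:
        # Clear all counters once the UTC date rolls over
        today = self._today()
        if today != self.day:
            self.day = today
            self.counts.clear()
        self.counts[ip] = self.counts.get(ip, 0) + 1
        return self.counts[ip] <= self.limit
```

In a Gradio app, the client IP could come from the `gr.Request` object passed to the handler (e.g. `request.client.host`).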
### Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests
## πŸ› Troubleshooting
### "Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` is listed in requirements.txt
### "Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support
### "Rate limit not working"
- Usage tracker uses in-memory storage
- Resets on Space restart
- Tracking is keyed by client IP, so it is only meaningful in the deployed Space (all local requests share one IP)
### "Slow inference"
- First request allocates GPU (slower)
- Subsequent requests use cached allocation
- Normal: 2-5 seconds per request
## πŸ’° Cost Breakdown
- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)
**Total additional cost: $0/month** πŸŽ‰
## πŸ”„ Updates & Maintenance
To update your Space:
```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```
Space will automatically rebuild and redeploy.
## πŸ“ˆ Monitoring Usage
Check your Space's metrics:
1. Go to Space page
2. Click "Analytics" tab
3. View daily/weekly usage stats
## 🎯 Next Steps After Deployment
1. βœ… Test all 4 languages
2. βœ… Verify tool calling works
3. βœ… Check rate limiting
4. βœ… Monitor performance
5. πŸ”œ Adjust system prompt if needed
6. πŸ”œ Fine-tune temperature/max_tokens if needed
## πŸ“ž Support
If you encounter issues:
- Check Space logs (Settings β†’ Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/
---
**You're ready to deploy! πŸš€**