DEPLOYMENT: Supabase + HuggingFace Spaces
Your SAP Chatbot now uses production-grade infrastructure:
- Vector DB: Supabase pgvector
- App Hosting: HuggingFace Spaces (Docker + Streamlit)
- Ingestion: GitHub Actions (automated)
- LLM: HuggingFace Inference API
Total cost: $0-25/month (Supabase free or $25 pro)
Step-by-Step Deployment
Phase 1: Supabase Setup (10 minutes)
1.1 Create Supabase Project
1. Go to https://supabase.com
2. Click "Start your project"
3. Sign up with GitHub (free)
4. Create organization & project
5. Choose region (closest to you)
6. Wait for initialization (~2 min)
1.2 Enable pgvector
-- In Supabase Dashboard → SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;
1.3 Create Documents Table
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
source TEXT,
url TEXT,
title TEXT,
content TEXT,
chunk_id INT,
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create an index for faster search (for best recall, build the ivfflat index after data is loaded)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
1.4 Create Search Function
CREATE OR REPLACE FUNCTION search_documents(query_embedding VECTOR(384), k INT DEFAULT 5)
RETURNS TABLE(id BIGINT, source TEXT, url TEXT, title TEXT, content TEXT, chunk_id INT, distance FLOAT8) AS $$
BEGIN
RETURN QUERY
SELECT
documents.id,
documents.source,
documents.url,
documents.title,
documents.content,
documents.chunk_id,
documents.embedding <=> query_embedding AS distance
FROM documents
ORDER BY documents.embedding <=> query_embedding
LIMIT k;
END;
$$ LANGUAGE plpgsql;
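For reference, pgvector's `<=>` operator computes cosine distance (1 − cosine similarity), so 0 means identical direction and larger values mean less similar. A minimal pure-Python sketch of the same score, no database required:

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as pgvector's <=> operator defines it:
    1 - (a · b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors → 1.0
```

Because `ORDER BY embedding <=> query_embedding` sorts ascending, the nearest chunks come first.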
1.5 Get Credentials
In Supabase Dashboard → Settings → API
Copy these:
- Project URL → SUPABASE_URL
- Anon (public) key → SUPABASE_ANON_KEY (for the app)
- service_role key → SUPABASE_SERVICE_ROLE_KEY (for Actions only!)
⚠️ IMPORTANT: Never expose the service_role key in HF Spaces!
Phase 2: GitHub Actions Setup (5 minutes)
2.1 Add GitHub Secrets
Your repo → Settings → Secrets and variables → Actions
Add these secrets:
- SUPABASE_URL
- SUPABASE_SERVICE_ROLE_KEY
2.2 Verify Workflow
Your repo → Actions
You should see: "Ingest & Deploy to HF Spaces"
2.3 Manual Trigger (Optional)
Actions → "Ingest & Deploy to HF Spaces" → Run workflow
This:
1. Runs ingest.py
2. Loads SAP documents
3. Computes embeddings
4. Inserts into Supabase
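The chunking step before embedding can be pictured with a small sketch; the actual chunk size, overlap, and splitting strategy in ingest.py may differ (the values here are illustrative):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks, as an ingestion
    script might do before computing one embedding per chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character document yields 3 overlapping chunks.
chunks = chunk_text("x" * 1200)
print(len(chunks))  # 3
```

Each chunk then gets its own row in `documents`, with `chunk_id` recording its position in the source document.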
Phase 3: HuggingFace Spaces Setup (10 minutes)
3.1 Create Space
1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Fill in:
- Name: sap-chatbot
- License: Apache 2.0
- Space SDK: Docker (important!)
- Visibility: Public
4. Click "Create Space"
3.2 Link GitHub Repository
Space Settings → "Linked Repository"
Select: your-username/sap-chatbot
Space now auto-syncs with GitHub!
3.3 Add Secrets
Space Settings → Secrets
Add these:
- HF_API_TOKEN (from https://huggingface.co/settings/tokens)
- SUPABASE_URL (from Supabase API settings; public, safe to expose)
- SUPABASE_ANON_KEY (from Supabase API settings; public, safe to expose)
- EMBEDDING_MODEL (optional, default: all-MiniLM-L6-v2)
- RESULTS_K (optional, default: 5)
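Inside the Space, these secrets surface as environment variables. A sketch of how config.py might read them, with the documented defaults for the optional ones (the repo's actual config handling may differ):

```python
import os

# Required secrets -- injected by HF Spaces as environment variables.
SUPABASE_URL = os.environ.get("SUPABASE_URL", "")
SUPABASE_ANON_KEY = os.environ.get("SUPABASE_ANON_KEY", "")

# Optional settings with the documented defaults.
EMBEDDING_MODEL = os.environ.get(
    "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
)
RESULTS_K = int(os.environ.get("RESULTS_K", "5"))
```

Reading everything through `os.environ` keeps credentials out of the repo and lets the same code run locally with exported shell variables.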
3.4 Wait for Build
Space will:
1. Detect changes from GitHub
2. Build Docker image (~3 min)
3. Start Streamlit app (~1 min)
4. Status: "Running" (green light)
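For a Docker Space, HuggingFace builds the Dockerfile in the repo and expects the app to listen on port 7860. A minimal sketch of what such a Dockerfile might look like (the repo's actual Dockerfile may differ):

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# HF Spaces routes traffic to port 7860.
EXPOSE 7860
CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
```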
3.5 Test the App
1. Click "Open in iframe" or visit the Space URL
2. Wait for Streamlit to load
3. Ask: "How do I monitor SAP background jobs?"
4. Should return answer with sources from Supabase!
File Structure
sap-chatbot/
├── app.py                    # Streamlit app (uses HF API + Supabase)
├── ingest.py                 # Ingestion script (GitHub Actions)
├── config.py                 # Configuration
├── Dockerfile                # Docker config (HF Spaces)
├── requirements.txt          # Dependencies (supabase, sentence-transformers)
├── .github/
│   └── workflows/
│       └── deploy.yml        # GitHub Actions workflow
├── tools/
│   ├── agent.py              # LLM interface
│   ├── embeddings.py         # Embedding utilities
│   └── build_dataset.py      # Dataset builder
├── data/
│   └── sap_dataset.json      # Source documents
├── SUPABASE_SETUP.md         # Detailed Supabase guide
├── README.md                 # Main README
└── QUICKSTART_HF_SPACES.md   # Local setup (alternative)
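The "Ingest & Deploy to HF Spaces" workflow could take roughly this shape; this is an illustrative sketch, not the repo's actual deploy.yml (the deploy half is handled by the linked-repository sync, so the workflow only needs the ingestion job):

```yaml
name: Ingest & Deploy to HF Spaces
on:
  push:
    branches: [main]
  workflow_dispatch:        # allows the manual trigger described above
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python ingest.py
        env:
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
```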
Workflows
Adding More Documents
1. Update data/sap_dataset.json with new documents
   └─ Run: python tools/build_dataset.py
2. Push to GitHub
   └─ git add . && git commit -m "Add documents" && git push
3. GitHub Actions auto-runs:
   ├─ ingest.py computes embeddings
   ├─ Inserts into Supabase
   └─ Takes ~2-5 minutes
4. HF Spaces auto-syncs from GitHub
   └─ New documents immediately available
Updating Code
1. Make changes to app.py, config.py, etc.
2. Push to GitHub
3. HF Spaces auto-rebuilds and redeploys (~3 min)
4. App is live with new features!
Manual Ingestion (Local)
# Set env vars
export SUPABASE_URL="https://..."
export SUPABASE_SERVICE_ROLE_KEY="eyJ..."
export EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
# Run ingestion
python ingest.py
# Logs show progress:
# - Loading 47 documents
# - Computing embeddings
# - Inserting into Supabase
# - Total chunks: 234
Security
Keys & Secrets
| Key | Use | Where | Public? |
|---|---|---|---|
| HF_API_TOKEN | API access | HF Spaces Secrets | No |
| SUPABASE_URL | DB connection | HF Spaces Secrets | Yes |
| SUPABASE_ANON_KEY | Row-level access (RLS) | HF Spaces Secrets | Yes (limited) |
| SUPABASE_SERVICE_ROLE_KEY | Bypass RLS | GitHub Secrets only | NO! |
Row-Level Security (RLS)
Supabase uses RLS policies to control access:
- SUPABASE_ANON_KEY: can read from the documents table (subject to RLS policies)
- SUPABASE_SERVICE_ROLE_KEY: bypasses RLS (used for ingestion only)

Best Practice: keep the service_role key only in GitHub Actions.
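An example of RLS policies matching this split, to run in the Supabase SQL Editor (the policy name is illustrative):

```sql
-- Turn on RLS; with no policies, the anon key can no longer read anything.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Allow read-only access for the anon role the app uses.
CREATE POLICY "anon can read documents"
  ON documents FOR SELECT
  TO anon
  USING (true);

-- No policy is needed for ingestion: the service_role key bypasses RLS.
```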
Scaling
Free Tier Limits
- 500MB database
- 2GB file storage
- Limited API calls
- Great for testing!
When to Upgrade Supabase
Free tier is enough if:
- Documents < 500MB
- Users < 100/month
- Searches < 1000/day
Upgrade to Pro ($25/mo) when:
- Growing beyond limits
- Need higher rate limits
- Want priority support
Cost Optimization
Current (Free):
- HF Spaces: $0
- Supabase: $0
- HF Inference API: $0
- GitHub Actions: $0
- Total: $0
With Supabase Pro ($25):
- HF Spaces: $0
- Supabase: $25
- HF Inference API: $0
- GitHub Actions: $0
- Total: $25/month
Supports:
- 100+ concurrent users
- Much larger document sets
- High search volumes
Checklist
Before Deploying
- Supabase project created
- pgvector enabled
- documents table created
- search_documents() function created
- GitHub Actions secrets added
- HF Space created and linked to GitHub
- HF Space secrets configured
- data/sap_dataset.json in repo
Deployment Day
- Run GitHub Actions ingestion (manual trigger)
- Wait for ingestion to complete
- HF Space auto-syncs and builds
- App available at Space URL
- Test with sample query
- Share URL with team
Post-Deployment
- Monitor ingestion logs
- Monitor app performance
- Add more documents as needed
- Gather feedback from users
- Plan upgrades if needed
Troubleshooting
"Module not found: supabase"
# Install missing packages
pip install -r requirements.txt
"pgvector not found"
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
"RPC function not found"
-- Create function in Supabase SQL Editor
CREATE OR REPLACE FUNCTION search_documents...
"Embedding dimension mismatch"
# Check model outputs 384 dimensions
# Table must be VECTOR(384)
"Ingestion too slow"
# In ingest.py, increase batch size
BATCH_SIZE = 200 # default: 100
"App can't connect to Supabase"
- Verify SUPABASE_URL in secrets
- Verify SUPABASE_ANON_KEY in secrets
- Check RLS policies allow read from documents
"Search results are empty"
- Verify ingestion completed
- Check documents table has rows
- Test search_documents() directly in Supabase
Next Steps
- Set up Supabase project
- Configure GitHub Actions
- Create HF Space with secrets
- Trigger ingestion manually
- Deploy and test
- Share with your SAP team!
Resources
- Supabase: https://supabase.com/docs
- pgvector: https://github.com/pgvector/pgvector
- HF Spaces: https://huggingface.co/docs/hub/spaces
- Docker on HF: https://huggingface.co/docs/hub/spaces-sdks-docker
Your production-grade SAP chatbot is ready!