DEPLOYMENT: Supabase + HuggingFace Spaces
Your SAP Chatbot now uses production-grade infrastructure:
- Vector DB: Supabase pgvector
- App Hosting: HuggingFace Spaces (Docker + Streamlit)
- Ingestion: GitHub Actions (automated)
- LLM: HuggingFace Inference API
Total cost: $0-25/month (Supabase free or $25 pro)
Step-by-Step Deployment
Phase 1: Supabase Setup (10 minutes)
1.1 Create Supabase Project
1. Go to https://supabase.com
2. Click "Start your project"
3. Sign up with GitHub (free)
4. Create organization & project
5. Choose region (closest to you)
6. Wait for initialization (~2 min)
1.2 Enable pgvector
-- In Supabase Dashboard → SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;
1.3 Create Documents Table
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
source TEXT,
url TEXT,
title TEXT,
content TEXT,
chunk_id INT,
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create an index for faster search (for best recall, build the ivfflat index after data is loaded)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
1.4 Create Search Function
CREATE OR REPLACE FUNCTION search_documents(query_embedding VECTOR(384), k INT DEFAULT 5)
RETURNS TABLE(id BIGINT, source TEXT, url TEXT, title TEXT, content TEXT, chunk_id INT, distance FLOAT8) AS $$
BEGIN
RETURN QUERY
SELECT
documents.id,
documents.source,
documents.url,
documents.title,
documents.content,
documents.chunk_id,
documents.embedding <=> query_embedding AS distance
FROM documents
ORDER BY documents.embedding <=> query_embedding
LIMIT k;
END;
$$ LANGUAGE plpgsql;
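For reference, pgvector's `<=>` operator computes cosine distance (1 − cosine similarity), so 0 means identical direction and larger values mean less similar. A minimal pure-Python sketch of the same score, no database required:

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as pgvector's <=> operator defines it:
    1 - (a · b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors → 1.0
```

Because `ORDER BY embedding <=> query_embedding` sorts ascending, the nearest chunks come first.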
1.5 Get Credentials
In Supabase Dashboard → Settings → API
Copy these:
- Project URL → SUPABASE_URL
- Anon (public) key → SUPABASE_ANON_KEY (for the app)
- service_role key → SUPABASE_SERVICE_ROLE_KEY (for Actions only!)
⚠️ IMPORTANT: Never expose the service_role key in HF Spaces!
Phase 2: GitHub Actions Setup (5 minutes)
2.1 Add GitHub Secrets
Your repo → Settings → Secrets and variables → Actions
Add these secrets:
- SUPABASE_URL
- SUPABASE_SERVICE_ROLE_KEY
2.2 Verify Workflow
Your repo → Actions
You should see: "Ingest & Deploy to HF Spaces"
2.3 Manual Trigger (Optional)
Actions → "Ingest & Deploy to HF Spaces" → Run workflow
This:
1. Runs ingest.py
2. Loads SAP documents
3. Computes embeddings
4. Inserts into Supabase
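The chunking step before embedding can be pictured with a small sketch; the actual chunk size, overlap, and splitting strategy in ingest.py may differ (the values here are illustrative):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks, as an ingestion
    script might do before computing one embedding per chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character document yields 3 overlapping chunks.
chunks = chunk_text("x" * 1200)
print(len(chunks))  # 3
```

Each chunk then gets its own row in `documents`, with `chunk_id` recording its position in the source document.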
Phase 3: HuggingFace Spaces Setup (10 minutes)
3.1 Create Space
1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Fill in:
- Name: sap-chatbot
- License: Apache 2.0
- Space SDK: Docker (important!)
- Visibility: Public
4. Click "Create Space"
3.2 Link GitHub Repository
Space Settings → "Linked Repository"
Select: your-username/sap-chatbot
Space now auto-syncs with GitHub!
3.3 Add Secrets
Space Settings → Secrets
Add these:
- HF_API_TOKEN (from https://huggingface.co/settings/tokens)
- SUPABASE_URL (from Supabase API settings; public, safe to expose)
- SUPABASE_ANON_KEY (from Supabase API settings; public, safe to expose)
- EMBEDDING_MODEL (optional, default: all-MiniLM-L6-v2)
- RESULTS_K (optional, default: 5)
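Inside the Space, these secrets surface as environment variables. A sketch of how config.py might read them, with the documented defaults for the optional ones (the repo's actual config handling may differ):

```python
import os

# Required secrets -- injected by HF Spaces as environment variables.
SUPABASE_URL = os.environ.get("SUPABASE_URL", "")
SUPABASE_ANON_KEY = os.environ.get("SUPABASE_ANON_KEY", "")

# Optional settings with the documented defaults.
EMBEDDING_MODEL = os.environ.get(
    "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
)
RESULTS_K = int(os.environ.get("RESULTS_K", "5"))
```

Reading everything through `os.environ` keeps credentials out of the repo and lets the same code run locally with exported shell variables.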
3.4 Wait for Build
Space will:
1. Detect changes from GitHub
2. Build Docker image (~3 min)
3. Start Streamlit app (~1 min)
4. Status: "Running" (green light)
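For a Docker Space, HuggingFace builds the Dockerfile in the repo and expects the app to listen on port 7860. A minimal sketch of what such a Dockerfile might look like (the repo's actual Dockerfile may differ):

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# HF Spaces routes traffic to port 7860.
EXPOSE 7860
CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
```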
3.5 Test the App
1. Click "Open in iframe" or visit the Space URL
2. Wait for Streamlit to load
3. Ask: "How do I monitor SAP background jobs?"
4. Should return answer with sources from Supabase!
File Structure
sap-chatbot/
├── app.py                    # Streamlit app (uses HF API + Supabase)
├── ingest.py                 # Ingestion script (GitHub Actions)
├── config.py                 # Configuration
├── Dockerfile                # Docker config (HF Spaces)
├── requirements.txt          # Dependencies (supabase, sentence-transformers)
├── .github/
│   └── workflows/
│       └── deploy.yml        # GitHub Actions workflow
├── tools/
│   ├── agent.py              # LLM interface
│   ├── embeddings.py         # Embedding utilities
│   └── build_dataset.py      # Dataset builder
├── data/
│   └── sap_dataset.json      # Source documents
├── SUPABASE_SETUP.md         # Detailed Supabase guide
├── README.md                 # Main README
└── QUICKSTART_HF_SPACES.md   # Local setup (alternative)
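The "Ingest & Deploy to HF Spaces" workflow could take roughly this shape; this is an illustrative sketch, not the repo's actual deploy.yml (the deploy half is handled by the linked-repository sync, so the workflow only needs the ingestion job):

```yaml
name: Ingest & Deploy to HF Spaces
on:
  push:
    branches: [main]
  workflow_dispatch:        # allows the manual trigger described above
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python ingest.py
        env:
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
```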
Workflows
Adding More Documents
1. Update data/sap_dataset.json with new documents
   └─ Run: python tools/build_dataset.py
2. Push to GitHub
   └─ git add . && git commit -m "Add documents" && git push
3. GitHub Actions auto-runs:
   ├─ ingest.py computes embeddings
   ├─ Inserts into Supabase
   └─ Takes ~2-5 minutes
4. HF Spaces auto-syncs from GitHub
   └─ New documents immediately available
Updating Code
1. Make changes to app.py, config.py, etc.
2. Push to GitHub
3. HF Spaces auto-rebuilds and redeploys (~3 min)
4. App is live with new features!
Manual Ingestion (Local)
# Set env vars
export SUPABASE_URL="https://..."
export SUPABASE_SERVICE_ROLE_KEY="eyJ..."
export EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
# Run ingestion
python ingest.py
# Logs show progress:
# - Loading 47 documents
# - Computing embeddings
# - Inserting into Supabase
# - Total chunks: 234
Security
Keys & Secrets
| Key | Use | Where | Public? |
|---|---|---|---|
| HF_API_TOKEN | API access | HF Spaces Secrets | No |
| SUPABASE_URL | DB connection | HF Spaces Secrets | Yes |
| SUPABASE_ANON_KEY | Row-level access (RLS) | HF Spaces Secrets | Yes (limited) |
| SUPABASE_SERVICE_ROLE_KEY | Bypass RLS | GitHub Secrets only | NO! |
Row-Level Security (RLS)
Supabase uses RLS policies to control access:
- SUPABASE_ANON_KEY: can read from the documents table (subject to RLS policies)
- SUPABASE_SERVICE_ROLE_KEY: bypasses RLS (used for ingestion only)

Best Practice: keep the service_role key only in GitHub Actions.
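An example of RLS policies matching this split, to run in the Supabase SQL Editor (the policy name is illustrative):

```sql
-- Turn on RLS; with no policies, the anon key can no longer read anything.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Allow read-only access for the anon role the app uses.
CREATE POLICY "anon can read documents"
  ON documents FOR SELECT
  TO anon
  USING (true);

-- No policy is needed for ingestion: the service_role key bypasses RLS.
```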
Scaling
Free Tier Limits
- 500MB database
- 2GB file storage
- Limited API calls
- Great for testing!
When to Upgrade Supabase
Free tier is enough if:
- Documents < 500MB
- Users < 100/month
- Searches < 1000/day
Upgrade to Pro ($25/mo) when:
- Growing beyond limits
- Need higher rate limits
- Want priority support
Cost Optimization
Current (Free):
- HF Spaces: $0
- Supabase: $0
- HF Inference API: $0
- GitHub Actions: $0
- Total: $0
With Supabase Pro ($25):
- HF Spaces: $0
- Supabase: $25
- HF Inference API: $0
- GitHub Actions: $0
- Total: $25/month
Supports:
- 100+ concurrent users
- Much larger document sets
- High search volumes
Checklist
Before Deploying
- Supabase project created
- pgvector enabled
- documents table created
- search_documents() function created
- GitHub Actions secrets added
- HF Space created and linked to GitHub
- HF Space secrets configured
- data/sap_dataset.json in repo
Deployment Day
- Run GitHub Actions ingestion (manual trigger)
- Wait for ingestion to complete
- HF Space auto-syncs and builds
- App available at Space URL
- Test with sample query
- Share URL with team
Post-Deployment
- Monitor ingestion logs
- Monitor app performance
- Add more documents as needed
- Gather feedback from users
- Plan upgrades if needed
Troubleshooting
"Module not found: supabase"
# Install missing packages
pip install -r requirements.txt
"pgvector not found"
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
"RPC function not found"
-- Create function in Supabase SQL Editor
CREATE OR REPLACE FUNCTION search_documents...
"Embedding dimension mismatch"
# Check model outputs 384 dimensions
# Table must be VECTOR(384)
"Ingestion too slow"
# In ingest.py, increase batch size
BATCH_SIZE = 200 # default: 100
"App can't connect to Supabase"
- Verify SUPABASE_URL in secrets
- Verify SUPABASE_ANON_KEY in secrets
- Check RLS policies allow read from documents
"Search results are empty"
- Verify ingestion completed
- Check documents table has rows
- Test search_documents() directly in Supabase
Next Steps
- Set up Supabase project
- Configure GitHub Actions
- Create HF Space with secrets
- Trigger ingestion manually
- Deploy and test
- Share with your SAP team!
Resources
- Supabase: https://supabase.com/docs
- pgvector: https://github.com/pgvector/pgvector
- HF Spaces: https://huggingface.co/docs/hub/spaces
- Docker on HF: https://huggingface.co/docs/hub/spaces-sdks-docker
Your production-grade SAP chatbot is ready!