# Deployment Guide

This guide covers deploying the AI API Service to various platforms.

## Table of Contents

- [Local Development](#local-development)
- [Docker Deployment](#docker-deployment)
- [Encore Cloud](#encore-cloud)
- [Hugging Face Spaces](#hugging-face-spaces)
- [AWS Deployment](#aws-deployment)
- [Google Cloud Platform](#google-cloud-platform)
- [Azure Deployment](#azure-deployment)
- [Environment Variables](#environment-variables)

## Local Development

### Prerequisites

- Node.js 18+
- npm or yarn
- Encore CLI

### Steps

1. **Install Encore CLI**

   ```bash
   npm install -g encore
   ```

2. **Install dependencies**

   ```bash
   npm install
   ```

3. **Configure environment**

   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   ```

4. **Run the development server**

   ```bash
   encore run
   ```

The API will be available at `http://localhost:8000`.

## Docker Deployment

### Build and Run Locally

```bash
docker-compose up -d
```

This starts:

- The API service on port 8000
- Redis for caching (optional)

### Build Production Image

```bash
docker build -t ai-api-service:latest .
```

### Run Production Container

```bash
docker run -d \
  -p 8000:8000 \
  -e OPENAI_API_KEY=your_key \
  -e API_KEYS=your_api_keys \
  --name ai-api \
  ai-api-service:latest
```

## Encore Cloud

Encore Cloud provides the easiest deployment experience, with automatic infrastructure provisioning.

### Steps

1. **Install Encore CLI**

   ```bash
   npm install -g encore
   ```

2. **Log in to Encore**

   ```bash
   encore auth login
   ```

3. **Create the app (first time only)**

   ```bash
   encore app create ai-api-service
   ```

4. **Set secrets**

   ```bash
   encore secret set OPENAI_API_KEY
   encore secret set HUGGINGFACE_API_KEY
   encore secret set PINECONE_API_KEY
   ```

5. **Deploy**

   ```bash
   encore deploy
   ```

Your API will be deployed with:

- Auto-scaling
- Load balancing
- SSL/TLS certificates
- Monitoring and logs
- Database backups

## Hugging Face Spaces

Deploy as a Docker Space on Hugging Face for easy sharing.

### Steps

1. **Create a new Space**

   - Go to https://huggingface.co/new-space
   - Select "Docker" as the SDK
   - Choose a hardware tier (CPU or GPU)

2. **Clone the Space repository**

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
   cd YOUR_SPACE
   ```

3. **Copy project files**

   ```bash
   cp -r /path/to/ai-api-service/* .
   ```

4. **Create a Dockerfile for HF Spaces**

   ```dockerfile
   FROM node:18-alpine
   WORKDIR /app
   COPY package*.json ./
   RUN npm ci --omit=dev
   COPY . .
   ENV PORT=7860
   EXPOSE 7860
   CMD ["npm", "start"]
   ```

5. **Configure secrets in the Space settings**

   - `OPENAI_API_KEY`
   - `HUGGINGFACE_API_KEY`
   - `API_KEYS`

6. **Push to the Space**

   ```bash
   git add .
   git commit -m "Initial deployment"
   git push
   ```

## AWS Deployment

### Using AWS ECS (Elastic Container Service)

1. **Push the image to ECR**

   ```bash
   aws ecr create-repository --repository-name ai-api-service
   docker build -t ai-api-service .
   aws ecr get-login-password --region us-east-1 | \
     docker login --username AWS --password-stdin \
     YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com
   docker tag ai-api-service:latest \
     YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/ai-api-service:latest
   docker push YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/ai-api-service:latest
   ```

2. **Create an ECS task definition**

   ```json
   {
     "family": "ai-api-service",
     "networkMode": "awsvpc",
     "requiresCompatibilities": ["FARGATE"],
     "cpu": "1024",
     "memory": "2048",
     "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
     "containerDefinitions": [{
       "name": "ai-api",
       "image": "YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/ai-api-service:latest",
       "portMappings": [{
         "containerPort": 8000,
         "protocol": "tcp"
       }],
       "environment": [],
       "secrets": [{
         "name": "OPENAI_API_KEY",
         "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:openai-api-key"
       }]
     }]
   }
   ```

   Note: Fargate tasks that pull from ECR or read Secrets Manager require an `executionRoleArn`; `ecsTaskExecutionRole` is the conventional name for that role.

3. **Create an ECS service with an ALB**

   - Configure an Application Load Balancer
   - Set up a target group (port 8000)
   - Configure auto-scaling
   - Add health checks

### Using AWS Lambda (API Gateway)

For serverless deployment, wrap the endpoints with AWS Lambda handlers.
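As a minimal sketch of that wrapping (the event and result shapes follow API Gateway's proxy-integration contract; the routing shown here is hypothetical):

```typescript
// Trimmed-down API Gateway proxy types; a real project would import
// APIGatewayProxyEvent/APIGatewayProxyResult from @types/aws-lambda.
interface ProxyEvent {
  httpMethod: string;
  path: string;
}

interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Lambda entry point: route the proxied request to the service logic.
export async function handler(event: ProxyEvent): Promise<ProxyResult> {
  if (event.httpMethod === "GET" && event.path === "/health") {
    return {
      statusCode: 200,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ status: "healthy", version: "1.0.0" }),
    };
  }
  return {
    statusCode: 404,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ error: "Not found" }),
  };
}
```

Rather than writing one handler per endpoint, an adapter such as serverless-http can wrap an existing Node HTTP app behind a single Lambda entry point.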
## Google Cloud Platform

### Using Cloud Run

1. **Build and deploy**

   ```bash
   gcloud builds submit --tag gcr.io/PROJECT_ID/ai-api-service
   gcloud run deploy ai-api-service \
     --image gcr.io/PROJECT_ID/ai-api-service \
     --platform managed \
     --region us-central1 \
     --allow-unauthenticated \
     --set-env-vars OPENAI_API_KEY=your_key
   ```

2. **Configure secrets**

   ```bash
   echo -n "your_openai_key" | \
     gcloud secrets create openai-api-key --data-file=-
   gcloud run services update ai-api-service \
     --update-secrets OPENAI_API_KEY=openai-api-key:latest
   ```

### Using GKE (Kubernetes)

1. **Create a cluster**

   ```bash
   gcloud container clusters create ai-api-cluster \
     --num-nodes=3 \
     --machine-type=n1-standard-2
   ```

2. **Deploy the application**

   ```bash
   kubectl apply -f k8s/deployment.yaml
   kubectl apply -f k8s/service.yaml
   kubectl apply -f k8s/ingress.yaml
   ```

## Azure Deployment

### Using Azure Container Instances

```bash
az container create \
  --resource-group ai-api-rg \
  --name ai-api-service \
  --image your-registry.azurecr.io/ai-api-service:latest \
  --cpu 2 \
  --memory 4 \
  --ports 8000 \
  --environment-variables \
    PORT=8000 \
  --secure-environment-variables \
    OPENAI_API_KEY=your_key \
    API_KEYS=demo-key-1
```

### Using Azure App Service

1. **Create an App Service plan**

   ```bash
   az appservice plan create \
     --name ai-api-plan \
     --resource-group ai-api-rg \
     --is-linux \
     --sku B1
   ```

2. **Create the Web App**

   ```bash
   az webapp create \
     --resource-group ai-api-rg \
     --plan ai-api-plan \
     --name ai-api-service \
     --deployment-container-image-name your-registry.azurecr.io/ai-api-service:latest
   ```

3. **Configure settings**

   ```bash
   az webapp config appsettings set \
     --resource-group ai-api-rg \
     --name ai-api-service \
     --settings \
       OPENAI_API_KEY=@Microsoft.KeyVault(SecretUri=...)
   ```
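Key Vault references like the one above only resolve if the Web App has a managed identity with read access to the vault. A sketch, with a placeholder vault name:

```bash
# Assign a system-managed identity to the Web App
az webapp identity assign \
  --resource-group ai-api-rg \
  --name ai-api-service

# Grant that identity permission to read secrets from the vault
az keyvault set-policy \
  --name YOUR_VAULT_NAME \
  --object-id "$(az webapp identity show \
      --resource-group ai-api-rg \
      --name ai-api-service \
      --query principalId -o tsv)" \
  --secret-permissions get
```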
## Environment Variables

### Required Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `API_KEYS` | Comma-separated API keys | `key1,key2,key3` |
| `OPENAI_API_KEY` | OpenAI API key (or an alternative provider key) | `sk-...` |

### Optional Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `HUGGINGFACE_API_KEY` | Hugging Face API key | - |
| `ANTHROPIC_API_KEY` | Anthropic API key | - |
| `PINECONE_API_KEY` | Pinecone vector DB key | - |
| `RATE_LIMIT_DEFAULT` | Requests/min for the default tier | `60` |
| `RATE_LIMIT_ADMIN` | Requests/min for the admin tier | `1000` |
| `LOG_LEVEL` | Logging level | `info` |
| `MAX_FILE_SIZE_MB` | Max upload size in MB | `10` |

### Setting Secrets

**Encore Cloud:**

```bash
encore secret set OPENAI_API_KEY
```

**Docker:**

```bash
docker run -e OPENAI_API_KEY=your_key ...
```

**Kubernetes:**

```bash
kubectl create secret generic api-secrets \
  --from-literal=OPENAI_API_KEY=your_key
```

**AWS Secrets Manager:**

```bash
aws secretsmanager create-secret \
  --name openai-api-key \
  --secret-string your_key
```

## Monitoring

### Health Checks

Configure the health check endpoint:

```
GET /health
```

Expected response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...]
}
```

### Metrics

Access metrics at:

```
GET /metrics
```

### Logging

Logs are output as structured JSON:

```json
{
  "timestamp": "2025-10-01T12:00:00Z",
  "level": "info",
  "message": "Request processed",
  "duration_ms": 245
}
```

## Scaling Recommendations

### Horizontal Scaling

- Start with 2-3 replicas
- Auto-scale based on CPU (70% threshold)
- Use a load balancer for distribution

### Vertical Scaling

- Minimum: 1 CPU, 2 GB RAM
- Recommended: 2 CPU, 4 GB RAM
- High traffic: 4 CPU, 8 GB RAM

### Database Scaling

- Use Pinecone for production vector storage
- Implement Redis for caching
- Consider read replicas for high traffic

## Troubleshooting

### Common Issues

**"No LLM adapter available"**

- Check that at least one provider API key is set (OpenAI, Hugging Face, or Anthropic)

**"Rate limit exceeded"**

- Increase the rate limits via the environment variables
- Use an admin API key for testing

**"Vector DB connection failed"**

- The service falls back to in-memory storage
- Check your Pinecone credentials

**High latency**

- Enable caching (Redis)
- Use a region closer to the upstream APIs
- Optimize model selection

## Support

For deployment assistance:

- GitHub Issues
- Documentation at `docs/`
- Community Discord
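The horizontal-scaling guidance above (start at 2-3 replicas, scale on 70% CPU) can be sketched as a Kubernetes HorizontalPodAutoscaler for the GKE setup; the deployment name and `maxReplicas` value below are assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-api-service   # assumed name from k8s/deployment.yaml
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```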