# Quick Start Guide
Get your AI API Service up and running in 5 minutes!
## Prerequisites
- Node.js 18+
- npm or yarn
- At least one LLM API key (OpenAI, HuggingFace, or Anthropic)
## 5-Minute Setup

### 1. Install Dependencies

```bash
npm install
```
### 2. Configure Environment

```bash
cp .env.example .env
```

Edit `.env` and add your API keys:

```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
```
### 3. Start the Server

```bash
npm run dev
```

The API will be available at `http://localhost:8000`.
### 4. Test the API

```bash
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}
```
### 5. Make Your First Request

```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
## Example Requests

### Chat

```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'
```

### RAG Query

```bash
curl -X POST http://localhost:8000/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key features?", "top_k": 5}'
```

### Image Generation

```bash
curl -X POST http://localhost:8000/image/generate \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'
```
## What Each Component Does
### Authentication (`/backend/utils/auth.ts`)
- Validates API keys from the Authorization header
- Implements role-based access (default, premium, admin)
- Used by all protected endpoints
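A minimal sketch of what the key check could look like; names and shapes here are illustrative, not the actual `auth.ts` code:

```ts
// Illustrative Bearer-token check (hypothetical names, not the real module).
const API_KEYS = new Set((process.env.API_KEYS ?? "").split(","));
const ADMIN_KEYS = new Set((process.env.ADMIN_API_KEYS ?? "").split(","));

function authenticate(authHeader: string | undefined): { key: string; role: "default" | "admin" } {
  // Expect "Authorization: Bearer <key>"
  const key = authHeader?.startsWith("Bearer ") ? authHeader.slice(7) : undefined;
  if (!key || (!API_KEYS.has(key) && !ADMIN_KEYS.has(key))) {
    throw new Error("Invalid API key");
  }
  return { key, role: ADMIN_KEYS.has(key) ? "admin" : "default" };
}
```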
### Rate Limiting (`/backend/utils/rate_limit.ts`)
- Token bucket algorithm
- Configurable limits per tier (60/300/1000 requests/min)
- Automatic reset after 1 minute
- Prevents abuse and cost overruns
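For intuition, a bare-bones token bucket might look like the sketch below; the real module also handles tiers and resets:

```ts
// Minimal token-bucket sketch (illustrative; per-tier limits come from config).
interface Bucket { tokens: number; lastRefill: number }
const buckets = new Map<string, Bucket>();

function allowRequest(apiKey: string, limitPerMin: number): boolean {
  const now = Date.now();
  const b = buckets.get(apiKey) ?? { tokens: limitPerMin, lastRefill: now };
  // Refill proportionally to elapsed time, capped at the per-minute limit.
  const elapsedMin = (now - b.lastRefill) / 60_000;
  b.tokens = Math.min(limitPerMin, b.tokens + elapsedMin * limitPerMin);
  b.lastRefill = now;
  if (b.tokens < 1) return false; // caller should respond 429
  b.tokens -= 1;
  buckets.set(apiKey, b);
  return true;
}
```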
### AI Service (`/backend/services/ai_service.ts`)
- Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
- Automatic model selection and fallback
- Chat completions with context management
- Embedding generation for RAG
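A simplified view of provider fallback, assuming a common `chat` method across adapters (the actual service also does model selection):

```ts
// Illustrative fallback: try adapters in order until one succeeds.
interface LLMAdapter {
  chat(messages: { role: string; content: string }[]): Promise<string>;
}

async function chatWithFallback(
  adapters: LLMAdapter[],
  messages: { role: string; content: string }[],
): Promise<string> {
  let lastError: unknown;
  for (const adapter of adapters) {
    try {
      return await adapter.chat(messages); // first healthy provider wins
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError ?? new Error("No LLM adapter available");
}
```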
### RAG Service (`/backend/services/rag_service.ts`)
- Vector-based document retrieval
- Automatic context injection into prompts
- Supports Pinecone or in-memory vector DB
- Returns sources with similarity scores
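Conceptually, context injection amounts to prepending retrieved chunks to the prompt; a hypothetical helper:

```ts
// Sketch of retrieval-augmented prompting (the prompt template is an assumption).
function buildRagPrompt(query: string, chunks: { text: string; score: number }[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}
```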
### Image Service (`/backend/services/image_service.ts`)
- Text-to-image generation
- Supports DALL-E and Stable Diffusion
- Configurable sizes and quality
- Returns base64 or URLs
### Voice Service (`/backend/services/voice_service.ts`)
- Text-to-speech synthesis (TTS)
- Speech-to-text transcription (STT)
- Multiple voice options
- Various audio formats (mp3, opus, etc.)
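A TypeScript client call to the synthesis endpoint might look like this; the request fields (`text`, `format`) are assumptions, so check the endpoint schema:

```ts
// Hypothetical TTS call using Node 18+'s built-in fetch (run as an ES module).
const res = await fetch("http://localhost:8000/voice/synthesize", {
  method: "POST",
  headers: {
    Authorization: "Bearer demo-key-1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "Hello from the AI API Service", format: "mp3" }),
});
const audio = Buffer.from(await res.arrayBuffer()); // raw audio bytes
```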
### Document Service (`/backend/services/document_service.ts`)
- Upload PDF, DOCX, TXT files
- Automatic text extraction
- Chunking with overlap for better retrieval
- Background processing with workers
- Stores chunks in vector DB
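Chunking with overlap can be pictured as fixed-size windows that share some characters, so a sentence split at a boundary still retrieves well. An illustrative helper, with `chunkSize` matching the `CHUNK_SIZE` setting:

```ts
// Illustrative fixed-window chunking; consecutive chunks share `overlap` chars.
// Assumes chunkSize > overlap (true for the defaults shown here).
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```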
## Adapters
### OpenAI Adapter (`/backend/adapters/openai_adapter.ts`)
- Chat completions (GPT-4, GPT-3.5)
- Embeddings (text-embedding-ada-002)
- Image generation (DALL-E)
- Voice synthesis and transcription
- Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces
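The interface definitions are not reproduced here; a plausible shape, for orientation only, might be:

```ts
// Hypothetical adapter interfaces; the real ones live in /backend/adapters/.
interface LLMAdapter {
  chat(messages: { role: string; content: string }[]): Promise<string>;
  embed(text: string): Promise<number[]>;
}
interface ImageAdapter {
  generate(prompt: string, size: string): Promise<string>; // URL or base64
}
interface VoiceAdapter {
  synthesize(text: string, voice?: string): Promise<Buffer>; // Node Buffer
  transcribe(audio: Buffer): Promise<string>;
}
```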
### HuggingFace Adapter (`/backend/adapters/huggingface_adapter.ts`)
- Open-source models (Mistral, Llama, etc.)
- Stable Diffusion for images
- Sentence transformers for embeddings
- Free tier available
### Anthropic Adapter (`/backend/adapters/anthropic_adapter.ts`)
- Claude models (Sonnet, Opus)
- Advanced reasoning capabilities
- Long context windows
### Vector DB Adapters (`/backend/adapters/vector_db_adapter.ts`)
- PineconeAdapter: Production vector storage with managed scaling
- InMemoryVectorDB: Development fallback with cosine similarity
- Supports metadata filtering and batch operations
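As a sketch of how the in-memory fallback could rank documents by cosine similarity (the actual adapter's API may differ):

```ts
// Illustrative in-memory vector search with cosine similarity.
type Doc = { id: string; vector: number[]; metadata?: Record<string, unknown> };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], docs: Doc[], k: number): { doc: Doc; score: number }[] {
  return docs
    .map((doc) => ({ doc, score: cosine(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```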
## Observability
### Logger (`/backend/utils/logger.ts`)
- Structured JSON logging
- Configurable log levels (debug, info, warn, error)
- Automatic timestamping
- Production-ready format
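A structured JSON logger with a level filter can be as small as this illustrative sketch:

```ts
// Minimal structured-logging sketch (not the actual logger.ts).
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];
const minLevel: Level = (process.env.LOG_LEVEL as Level) ?? "info";

function log(level: Level, message: string, fields: Record<string, unknown> = {}) {
  if (LEVELS.indexOf(level) < LEVELS.indexOf(minLevel)) return;
  console.log(JSON.stringify({ level, message, timestamp: new Date().toISOString(), ...fields }));
}

log("info", "request completed", { endpoint: "/ai/chat", ms: 142 });
```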
### Metrics (`/backend/utils/metrics.ts`)
- Request counting by endpoint
- Error tracking
- Response time measurement
- Model usage statistics
- Vector DB query counts
- Document processing stats
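Conceptually, these are in-process counters and timers; an illustrative shape:

```ts
// Hypothetical metrics store; the real module tracks more dimensions.
const requestCounts = new Map<string, number>();
const latenciesMs: number[] = [];

function recordRequest(endpoint: string, ms: number) {
  requestCounts.set(endpoint, (requestCounts.get(endpoint) ?? 0) + 1);
  latenciesMs.push(ms);
}
```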
## Background Workers (`/backend/workers/ingestion_worker.ts`)
- Async document processing
- Configurable concurrency
- Job status tracking
- Webhook notifications on completion
- Automatic retries on failure
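Retries with exponential backoff might be structured like this (illustrative, not the worker's actual code):

```ts
// Sketch of retry-with-backoff for an ingestion job.
async function processWithRetries(job: () => Promise<void>, maxRetries = 3): Promise<void> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await job();
      return;
    } catch (err) {
      if (attempt === maxRetries) throw err; // give up after the last attempt
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt)); // exponential backoff
    }
  }
}
```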
## API Endpoints

All endpoints are in `/backend/api/`:

### Health & Metrics (`health.ts`)

- `GET /health` - Service health with component status
- `GET /metrics` - Usage metrics and statistics

### Authentication (`auth.ts`)

- `POST /auth/verify` - Validate API key

### Chat (`chat.ts`)

- `POST /ai/chat` - Multi-turn conversation
- `GET /ai/query` - Simple Q&A

### RAG (`rag.ts`)

- `POST /rag/query` - Query with retrieval
- `GET /rag/models` - List available models

### Images (`image.ts`)

- `POST /image/generate` - Generate images

### Voice (`voice.ts`)

- `POST /voice/synthesize` - Text to speech
- `POST /voice/transcribe` - Speech to text

### Documents (`documents.ts`)

- `POST /upload` - Upload document
- `GET /docs/:id/sources` - Get document chunks
- `POST /webhook/events` - Processing webhooks
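The same requests can be made from TypeScript with the built-in `fetch` in Node 18+. This mirrors the curl examples above; the response field names are an assumption:

```ts
// Calling the chat endpoint from a TypeScript client (run as an ES module).
const res = await fetch("http://localhost:8000/ai/chat", {
  method: "POST",
  headers: {
    Authorization: "Bearer demo-key-1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ conversation: [{ role: "user", content: "Hello!" }] }),
});
const data = await res.json();
console.log(data); // JSON body with the model's reply plus metadata
```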
## Architecture Flow

```
┌──────────┐
│  Client  │
└────┬─────┘
     │
     ├─ Authorization header (Bearer token)
     │
┌────▼────────────┐
│ Auth Middleware │ ← Validates API key
└────┬────────────┘
     │
     ├─ Checks rate limit
     │
┌────▼─────────┐
│ API Endpoint │ ← Routes request
└────┬─────────┘
     │
     ├─ POST /ai/chat          → AI Service
     ├─ POST /rag/query        → RAG Service → Vector DB → AI Service
     ├─ POST /image/generate   → Image Service
     ├─ POST /voice/synthesize → Voice Service
     ├─ POST /upload           → Document Service → Worker → Vector DB
     │
┌────▼──────┐
│ Response  │ ← JSON with data + metadata
└───────────┘
```
## Configuration

### Environment Variables
| Variable | What It Does | Example |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI access for GPT models | `sk-...` |
| `HUGGINGFACE_API_KEY` | HuggingFace models access | `hf_...` |
| `API_KEYS` | Valid API keys (comma-separated) | `key1,key2` |
| `RATE_LIMIT_DEFAULT` | Requests/min for basic users | `60` |
| `RATE_LIMIT_ADMIN` | Requests/min for admins | `1000` |
| `MAX_FILE_SIZE_MB` | Max document upload size | `10` |
| `CHUNK_SIZE` | Text chunk size for RAG | `1000` |
| `LOG_LEVEL` | Logging verbosity | `info` |
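One hypothetical way to load these values with fallbacks (defaults taken from the table above):

```ts
// Illustrative config loader; numeric values fall back to the documented defaults.
const config = {
  rateLimitDefault: Number(process.env.RATE_LIMIT_DEFAULT ?? 60),
  rateLimitAdmin: Number(process.env.RATE_LIMIT_ADMIN ?? 1000),
  maxFileSizeMb: Number(process.env.MAX_FILE_SIZE_MB ?? 10),
  chunkSize: Number(process.env.CHUNK_SIZE ?? 1000),
  logLevel: process.env.LOG_LEVEL ?? "info",
};
```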
### Tier System

- Default: 60 requests/min
- Premium: 300 requests/min (add to config)
- Admin: 1000 requests/min (via `ADMIN_API_KEYS`)
## Testing

Run tests:

```bash
npm test
```

Run with coverage:

```bash
npm run test:coverage
```
## Production Checklist

- Set strong `API_KEYS`
- Configure `ADMIN_API_KEYS` separately
- Set up Pinecone for vector storage
- Increase rate limits based on needs
- Enable background workers
- Set `LOG_LEVEL=info` or `warn`
- Configure CORS origins
- Set up monitoring/alerting
- Review cost limits on LLM providers
## Troubleshooting

- "No LLM adapter available" → Add at least one API key (`OPENAI_API_KEY`, `HUGGINGFACE_API_KEY`, or `ANTHROPIC_API_KEY`)
- "Invalid API key" → Check the Authorization header: `Bearer your-key-here`
- "Rate limit exceeded" → Wait 60 seconds or use an admin key
- Vector DB queries fail → The service falls back to in-memory storage automatically
## Next Steps

- Read the full README: `README.md`
- Check the deployment guide: `DEPLOYMENT.md`
- Review the examples: `examples/js_client.js` and `examples/curl.sh`
- Run the tests: `npm test`
- Deploy to production: see `DEPLOYMENT.md`
## Support

- GitHub Issues
- Documentation in `/docs`
- Example code in `/examples`
Enjoy building with the AI API Service!