# Quick Start Guide
Get your AI API Service up and running in 5 minutes!
## Prerequisites
- Node.js 18+
- npm or yarn
- At least one LLM API key (OpenAI, HuggingFace, or Anthropic)
## 5-Minute Setup
### 1. Install Dependencies
```bash
npm install
```
### 2. Configure Environment
```bash
cp .env.example .env
```
Edit `.env` and add your API keys:
```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
```
### 3. Start the Server
```bash
npm run dev
```
The API will be available at `http://localhost:8000`.
### 4. Test the API
```bash
curl http://localhost:8000/health
```
Expected response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}
```
### 5. Make Your First Request
```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
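If you prefer calling the API from code, the same request with the built-in `fetch` (Node.js 18+) looks like the sketch below. It mirrors the curl example above; the response is printed as-is, since its exact shape depends on the configured provider.
```typescript
// Same request as the curl example above, from a Node.js 18+ script (ESM).
const response = await fetch("http://localhost:8000/ai/chat", {
  method: "POST",
  headers: {
    Authorization: "Bearer demo-key-1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    conversation: [{ role: "user", content: "Hello!" }],
  }),
});
console.log(await response.json());
```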
## Example Requests
### Chat
```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'
```
### RAG Query
```bash
curl -X POST http://localhost:8000/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key features?", "top_k": 5}'
```
### Image Generation
```bash
curl -X POST http://localhost:8000/image/generate \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'
```
## What Each Component Does
### πŸ” **Authentication (`/backend/utils/auth.ts`)**
- Validates API keys from the Authorization header
- Implements role-based access (default, premium, admin)
- Used by all protected endpoints
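As a rough sketch of the idea (illustrative only, not the actual code in `auth.ts`, and assuming an Express-style handler), validation boils down to parsing the Bearer token and checking it against the configured key sets:
```typescript
import type { Request, Response, NextFunction } from "express";

// Illustrative: key sets are read from the API_KEYS / ADMIN_API_KEYS env vars.
const keys = new Set((process.env.API_KEYS ?? "").split(","));
const adminKeys = new Set((process.env.ADMIN_API_KEYS ?? "").split(","));

export function requireApiKey(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer\s+/i, "");
  if (!token || (!keys.has(token) && !adminKeys.has(token))) {
    return res.status(401).json({ error: "Invalid API key" });
  }
  // Record the resolved role so rate limiting can pick the right tier.
  (req as Request & { role?: string }).role = adminKeys.has(token) ? "admin" : "default";
  next();
}
```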
### ⚑ **Rate Limiting (`/backend/utils/rate_limit.ts`)**
- Token bucket algorithm (sketched below)
- Configurable limits per tier (60/300/1000 requests/min)
- Automatic reset after 1 minute
- Prevents abuse and cost overruns
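The token bucket, in miniature (a sketch of the algorithm, not the code in `rate_limit.ts`): each key gets a bucket that refills continuously up to its per-minute limit, and a request goes through only if a token is available.
```typescript
// Minimal token bucket: `limit` tokens per 60-second window, per API key.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private limit: number) {
    this.tokens = limit;
  }

  tryConsume(): boolean {
    const now = Date.now();
    // Refill in proportion to elapsed time, capped at the limit.
    this.tokens = Math.min(
      this.limit,
      this.tokens + ((now - this.lastRefill) / 60_000) * this.limit
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();
export function allowRequest(apiKey: string, limit = 60): boolean {
  if (!buckets.has(apiKey)) buckets.set(apiKey, new TokenBucket(limit));
  return buckets.get(apiKey)!.tryConsume();
}
```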
### πŸ€– **AI Service (`/backend/services/ai_service.ts`)**
- Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
- Automatic model selection and fallback (sketched below)
- Chat completions with context management
- Embedding generation for RAG
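The fallback idea, sketched (provider order and error handling here are illustrative; the real selection logic lives in `ai_service.ts`): try adapters in priority order and fall through to the next on failure.
```typescript
// Illustrative fallback loop over whichever adapters are configured.
type Message = { role: string; content: string };
type Adapter = { name: string; chat: (conversation: Message[]) => Promise<string> };

async function chatWithFallback(adapters: Adapter[], conversation: Message[]): Promise<string> {
  for (const adapter of adapters) {
    try {
      return await adapter.chat(conversation);
    } catch (err) {
      console.warn(`${adapter.name} failed, trying next provider`, err);
    }
  }
  // This is the condition behind the "No LLM adapter available" error below.
  throw new Error("No LLM adapter available");
}
```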
### πŸ“š **RAG Service (`/backend/services/rag_service.ts`)**
- Vector-based document retrieval
- Automatic context injection into prompts
- Supports Pinecone or in-memory vector DB
- Returns sources with similarity scores
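Conceptually (a sketch of the flow, not the actual `rag_service.ts`), a RAG query embeds the question, pulls the top-k chunks from the vector DB, and injects them into the prompt:
```typescript
// Hypothetical wiring: the real service binds these to the AI and vector DB adapters.
type Message = { role: string; content: string };
type Embed = (text: string) => Promise<number[]>;
type Search = (vector: number[], topK: number) => Promise<{ text: string; score: number }[]>;
type Chat = (messages: Message[]) => Promise<string>;

async function ragQuery(query: string, embed: Embed, search: Search, chat: Chat, topK = 5) {
  const queryVector = await embed(query);                     // embed the question
  const chunks = await search(queryVector, topK);             // top-k by similarity
  const context = chunks.map((c) => c.text).join("\n---\n");  // inject as context
  const answer = await chat([
    { role: "system", content: `Answer using this context:\n${context}` },
    { role: "user", content: query },
  ]);
  return { answer, sources: chunks };                         // sources with scores
}
```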
### πŸ–ΌοΈ **Image Service (`/backend/services/image_service.ts`)**
- Text-to-image generation
- Supports DALL-E and Stable Diffusion
- Configurable sizes and quality
- Returns base64 or URLs
### πŸŽ™οΈ **Voice Service (`/backend/services/voice_service.ts`)**
- Text-to-speech synthesis (TTS)
- Speech-to-text transcription (STT)
- Multiple voice options
- Various audio formats (mp3, opus, etc.)
### πŸ“„ **Document Service (`/backend/services/document_service.ts`)**
- Upload PDF, DOCX, TXT files
- Automatic text extraction
- Chunking with overlap for better retrieval (sketched below)
- Background processing with workers
- Stores chunks in vector DB
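Chunking with overlap, sketched below (the default size follows `CHUNK_SIZE` from the configuration table; the overlap value is illustrative): consecutive chunks share a margin of text so sentences straddling a boundary survive intact in at least one chunk.
```typescript
// Split extracted text into fixed-size chunks that overlap.
// Requires overlap < chunkSize, or the loop would not advance.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```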
### πŸ”Œ **Adapters**
#### **OpenAI Adapter (`/backend/adapters/openai_adapter.ts`)**
- Chat completions (GPT-4, GPT-3.5)
- Embeddings (text-embedding-ada-002)
- Image generation (DALL-E)
- Voice synthesis and transcription
- Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces
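The three adapter contracts might look roughly like this (shapes inferred from the capabilities listed above; the actual interfaces live in the adapter files):
```typescript
// Illustrative interface shapes only.
type Message = { role: string; content: string };

interface LLMAdapter {
  chat(conversation: Message[]): Promise<string>;
  embed(text: string): Promise<number[]>;
}

interface ImageAdapter {
  generate(prompt: string, size: string): Promise<{ url?: string; base64?: string }>;
}

interface VoiceAdapter {
  synthesize(text: string, voice: string): Promise<Uint8Array>;
  transcribe(audio: Uint8Array): Promise<string>;
}
```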
#### **HuggingFace Adapter (`/backend/adapters/huggingface_adapter.ts`)**
- Open-source models (Mistral, Llama, etc.)
- Stable Diffusion for images
- Sentence transformers for embeddings
- Free tier available
#### **Anthropic Adapter (`/backend/adapters/anthropic_adapter.ts`)**
- Claude models (Sonnet, Opus)
- Advanced reasoning capabilities
- Long context windows
#### **Vector DB Adapters (`/backend/adapters/vector_db_adapter.ts`)**
- **PineconeAdapter**: Production vector storage with managed scaling
- **InMemoryVectorDB**: Development fallback with cosine similarity (sketched below)
- Supports metadata filtering and batch operations
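The in-memory fallback's ranking measure is plain cosine similarity; as a sketch:
```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// 1 means the vectors point the same way, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```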
### πŸ“Š **Observability**
#### **Logger (`/backend/utils/logger.ts`)**
- Structured JSON logging
- Configurable log levels (debug, info, warn, error)
- Automatic timestamping
- Production-ready format
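In essence, each log line is one JSON object (a sketch of the format, not the code in `logger.ts`; field names are illustrative):
```typescript
// One JSON object per line, with level and timestamp attached automatically.
type Level = "debug" | "info" | "warn" | "error";

function log(level: Level, message: string, extra: Record<string, unknown> = {}) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...extra,
  }));
}

log("info", "request completed", { endpoint: "/ai/chat", ms: 412 });
```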
#### **Metrics (`/backend/utils/metrics.ts`)**
- Request counting by endpoint
- Error tracking
- Response time measurement
- Model usage statistics
- Vector DB query counts
- Document processing stats
### πŸ”„ **Background Workers (`/backend/workers/ingestion_worker.ts`)**
- Async document processing
- Configurable concurrency
- Job status tracking
- Webhook notifications on completion
- Automatic retries on failure
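The retry behavior, sketched (attempt count and backoff are illustrative; see `ingestion_worker.ts` for the real policy):
```typescript
// Retry an async job with exponential backoff before giving up.
async function withRetries<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Back off: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}
```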
### 🌐 **API Endpoints**
All endpoints are in `/backend/api/`:
#### **Health & Metrics (`health.ts`)**
- `GET /health` - Service health with component status
- `GET /metrics` - Usage metrics and statistics
#### **Authentication (`auth.ts`)**
- `POST /auth/verify` - Validate API key
#### **Chat (`chat.ts`)**
- `POST /ai/chat` - Multi-turn conversation
- `GET /ai/query` - Simple Q&A
#### **RAG (`rag.ts`)**
- `POST /rag/query` - Query with retrieval
- `GET /rag/models` - List available models
#### **Images (`image.ts`)**
- `POST /image/generate` - Generate images
#### **Voice (`voice.ts`)**
- `POST /voice/synthesize` - Text to speech
- `POST /voice/transcribe` - Speech to text
#### **Documents (`documents.ts`)**
- `POST /upload` - Upload document (example below)
- `GET /docs/:id/sources` - Get document chunks
- `POST /webhook/events` - Processing webhooks
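For example, an upload from code could look like this (the multipart field name `file` is an assumption here; check `documents.ts` for the field the endpoint actually expects):
```typescript
import { readFile } from "node:fs/promises";

// Hypothetical upload call using the FormData/Blob globals in Node.js 18+.
const form = new FormData();
form.append("file", new Blob([await readFile("report.pdf")]), "report.pdf");

const res = await fetch("http://localhost:8000/upload", {
  method: "POST",
  headers: { Authorization: "Bearer demo-key-1" },
  body: form,
});
console.log(await res.json());
```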
## Architecture Flow
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
     β”‚
     β”œβ”€ Authorization Header (Bearer token)
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Auth Middleware β”‚ ← Validates API key
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ Checks rate limit
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API Endpoint β”‚ ← Routes request
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ POST /ai/chat          β†’ AI Service
     β”œβ”€ POST /rag/query        β†’ RAG Service β†’ Vector DB β†’ AI Service
     β”œβ”€ POST /image/generate   β†’ Image Service
     β”œβ”€ POST /voice/synthesize β†’ Voice Service
     β”œβ”€ POST /upload           β†’ Document Service β†’ Worker β†’ Vector DB
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Response  β”‚ ← JSON with data + metadata
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Configuration
### Environment Variables
| Variable | What It Does | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI access for GPT models | `sk-...` |
| `HUGGINGFACE_API_KEY` | HuggingFace models access | `hf_...` |
| `API_KEYS` | Valid API keys (comma-separated) | `key1,key2` |
| `RATE_LIMIT_DEFAULT` | Requests/min for basic users | `60` |
| `RATE_LIMIT_ADMIN` | Requests/min for admins | `1000` |
| `MAX_FILE_SIZE_MB` | Max document upload size | `10` |
| `CHUNK_SIZE` | Text chunk size for RAG | `1000` |
| `LOG_LEVEL` | Logging verbosity | `info` |
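Putting those together, a minimal `.env` for local development might look like this (values are the examples from the table; set only the provider keys you actually use):
```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_ADMIN=1000
MAX_FILE_SIZE_MB=10
CHUNK_SIZE=1000
LOG_LEVEL=info
```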
### Tier System
- **Default**: 60 requests/min
- **Premium**: 300 requests/min (add to config)
- **Admin**: 1000 requests/min (via `ADMIN_API_KEYS`)
## Testing
Run tests:
```bash
npm test
```
Run with coverage:
```bash
npm run test:coverage
```
## Production Checklist
- [ ] Set strong `API_KEYS`
- [ ] Configure `ADMIN_API_KEYS` separately
- [ ] Set up Pinecone for vector storage
- [ ] Increase rate limits based on needs
- [ ] Enable background workers
- [ ] Set `LOG_LEVEL=info` or `warn`
- [ ] Configure CORS origins
- [ ] Set up monitoring/alerting
- [ ] Review cost limits on LLM providers
## Troubleshooting
**"No LLM adapter available"**
β†’ Add at least one API key (OPENAI_API_KEY, HUGGINGFACE_API_KEY, or ANTHROPIC_API_KEY)
**"Invalid API key"**
β†’ Check Authorization header: `Bearer your-key-here`
**"Rate limit exceeded"**
β†’ Wait 60 seconds or use admin key
**Vector DB queries fail**
β†’ Service falls back to in-memory storage automatically
## Next Steps
1. **Read the full README**: `README.md`
2. **Check deployment guide**: `DEPLOYMENT.md`
3. **Review examples**: `examples/js_client.js` and `examples/curl.sh`
4. **Run tests**: `npm test`
5. **Deploy to production**: See DEPLOYMENT.md
## Support
- GitHub Issues
- Documentation in `/docs`
- Example code in `/examples`
Enjoy building with the AI API Service! πŸš€