
Quick Start Guide

Get your AI API Service up and running in 5 minutes!

Prerequisites

  • Node.js 18+
  • npm or yarn
  • At least one LLM API key (OpenAI, HuggingFace, or Anthropic)

5-Minute Setup

1. Install Dependencies

npm install

2. Configure Environment

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key

3. Start the Server

npm run dev

The API will be available at http://localhost:8000

4. Test the API

curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}

5. Make Your First Request

curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example Requests

Chat

curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'

RAG Query

curl -X POST http://localhost:8000/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key features?", "top_k": 5}'

Image Generation

curl -X POST http://localhost:8000/image/generate \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'

What Each Component Does

πŸ” Authentication (/backend/utils/auth.ts)

  • Validates API keys from the Authorization header
  • Implements role-based access (default, premium, admin)
  • Used by all protected endpoints (see the sketch below)
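
A rough sketch of that check (illustrative only; the real logic lives in auth.ts):

// Illustrative key check, assuming "Authorization: Bearer <key>" and
// comma-separated keys in API_KEYS / ADMIN_API_KEYS. Not the real implementation.
type Tier = "default" | "premium" | "admin";

const apiKeys = new Set((process.env.API_KEYS ?? "").split(",").filter(Boolean));
const adminKeys = new Set((process.env.ADMIN_API_KEYS ?? "").split(",").filter(Boolean));

function authenticate(authHeader: string | undefined): Tier | null {
  const match = authHeader?.match(/^Bearer (.+)$/); // expect "Bearer <key>"
  if (!match) return null;
  if (adminKeys.has(match[1])) return "admin";
  if (apiKeys.has(match[1])) return "default"; // premium keys would be looked up the same way
  return null; // unknown key: the endpoint responds 401
}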

⚑ Rate Limiting (/backend/utils/rate_limit.ts)

  • Token bucket algorithm (sketched below)
  • Configurable limits per tier (60/300/1000 requests/min)
  • Automatic reset after 1 minute
  • Prevents abuse and cost overruns
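
One common way to write a token bucket, shown as a sketch (the real limiter lives in rate_limit.ts and, per the note above, may reset in whole-minute windows rather than refill continuously):

// Sketch of a token bucket: each key gets `capacity` tokens, refilled over time;
// one request costs one token, and an empty bucket means HTTP 429.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerMinute: number) {
    this.tokens = capacity;
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsedMinutes = (now - this.lastRefill) / 60_000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedMinutes * this.refillPerMinute);
    this.lastRefill = now;
    if (this.tokens < 1) return false; // caller should respond 429
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(60, 60); // default tier: 60 requests/min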

πŸ€– AI Service (/backend/services/ai_service.ts)

  • Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
  • Automatic model selection and fallback (see the sketch after this list)
  • Chat completions with context management
  • Embedding generation for RAG
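
In outline, fallback routing tries providers in order until one succeeds. A sketch; the LLMAdapter shape and chat() signature here are assumptions, not the project's actual interface:

type Message = { role: string; content: string };

interface LLMAdapter {
  name: string;
  chat(messages: Message[]): Promise<string>;
}

// Try each configured provider in priority order; return the first success.
async function chatWithFallback(adapters: LLMAdapter[], messages: Message[]): Promise<string> {
  let lastError: unknown;
  for (const adapter of adapters) {
    try {
      return await adapter.chat(messages);
    } catch (err) {
      lastError = err; // e.g. outage or provider rate limit: fall through to the next adapter
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}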

πŸ“š RAG Service (/backend/services/rag_service.ts)

  • Vector-based document retrieval
  • Automatic context injection into prompts (outlined below)
  • Supports Pinecone or in-memory vector DB
  • Returns sources with similarity scores
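
The flow reduces to: embed the query, retrieve the most similar chunks, inject them into the prompt. A sketch in which every helper signature is an assumption rather than the real service API:

type Hit = { text: string; score: number };
type Embed = (text: string) => Promise<number[]>;
type Search = (vector: number[], topK: number) => Promise<Hit[]>;
type Chat = (messages: { role: string; content: string }[]) => Promise<string>;

async function ragQuery(query: string, topK: number, embed: Embed, search: Search, chat: Chat): Promise<string> {
  const queryVector = await embed(query);       // 1. embed the query
  const hits = await search(queryVector, topK); // 2. retrieve the top-k similar chunks
  const context = hits.map((h, i) => `[${i + 1}] ${h.text}`).join("\n");
  return chat([                                 // 3. inject the context into the prompt
    { role: "system", content: `Answer using only this context:\n${context}` },
    { role: "user", content: query },
  ]);
}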

πŸ–ΌοΈ Image Service (/backend/services/image_service.ts)

  • Text-to-image generation
  • Supports DALL-E and Stable Diffusion
  • Configurable sizes and quality
  • Returns base64 or URLs

πŸŽ™οΈ Voice Service (/backend/services/voice_service.ts)

  • Text-to-speech synthesis (TTS)
  • Speech-to-text transcription (STT)
  • Multiple voice options
  • Various audio formats (mp3, opus, etc.)

πŸ“„ Document Service (/backend/services/document_service.ts)

  • Upload PDF, DOCX, TXT files
  • Automatic text extraction
  • Chunking with overlap for better retrieval (sketched below)
  • Background processing with workers
  • Stores chunks in vector DB
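
Overlapping chunks mean a sentence that straddles a boundary still appears whole in at least one chunk. A character-based sketch (the 200-character overlap is an assumption; CHUNK_SIZE is documented under Configuration below):

// Fixed-size windows that advance by (chunkSize - overlap) characters,
// so consecutive chunks share `overlap` characters of text.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final window reached the end
  }
  return chunks;
}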

πŸ”Œ Adapters

OpenAI Adapter (/backend/adapters/openai_adapter.ts)

  • Chat completions (GPT-4, GPT-3.5)
  • Embeddings (text-embedding-ada-002)
  • Image generation (DALL-E)
  • Voice synthesis and transcription
  • Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces

HuggingFace Adapter (/backend/adapters/huggingface_adapter.ts)

  • Open-source models (Mistral, Llama, etc.)
  • Stable Diffusion for images
  • Sentence transformers for embeddings
  • Free tier available

Anthropic Adapter (/backend/adapters/anthropic_adapter.ts)

  • Claude models (Sonnet, Opus)
  • Advanced reasoning capabilities
  • Long context windows

Vector DB Adapters (/backend/adapters/vector_db_adapter.ts)

  • PineconeAdapter: Production vector storage with managed scaling
  • InMemoryVectorDB: Development fallback with cosine similarity (sketched below)
  • Supports metadata filtering and batch operations
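
The in-memory fallback scores vectors with cosine similarity, which boils down to:

// cos(a, b) = dot(a, b) / (|a| * |b|); 1.0 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1); // guard against zero vectors
}

A query is then just scoring every stored vector against the query embedding and keeping the top-k results.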

πŸ“Š Observability

Logger (/backend/utils/logger.ts)

  • Structured JSON logging (see the sketch below)
  • Configurable log levels (debug, info, warn, error)
  • Automatic timestamping
  • Production-ready format
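
A structured log line is one JSON object per event; a sketch with illustrative field names:

type Level = "debug" | "info" | "warn" | "error";

// Emit one JSON object per event so log aggregators can parse fields directly.
function log(level: Level, message: string, fields: Record<string, unknown> = {}): void {
  console.log(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...fields }));
}

log("info", "chat completed", { endpoint: "/ai/chat", ms: 412 });
// {"timestamp":"2024-01-01T12:00:00.000Z","level":"info","message":"chat completed","endpoint":"/ai/chat","ms":412}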

Metrics (/backend/utils/metrics.ts)

  • Request counting by endpoint
  • Error tracking
  • Response time measurement
  • Model usage statistics
  • Vector DB query counts
  • Document processing stats

πŸ”„ Background Workers (/backend/workers/ingestion_worker.ts)

  • Async document processing
  • Configurable concurrency
  • Job status tracking
  • Webhook notifications on completion
  • Automatic retries on failure (sketched below)
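
Retries with backoff can be sketched as below; the attempt count and delays are assumptions, not the worker's actual settings:

// Retry a failing job with exponential backoff (500ms, 1s, 2s, ...) before giving up.
async function withRetries<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts: mark the job failed
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** (attempt - 1)));
    }
  }
}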

🌐 API Endpoints

All endpoints are in /backend/api/:

Health & Metrics (health.ts)

  • GET /health - Service health with component status
  • GET /metrics - Usage metrics and statistics

Authentication (auth.ts)

  • POST /auth/verify - Validate API key

Chat (chat.ts)

  • POST /ai/chat - Multi-turn conversation
  • GET /ai/query - Simple Q&A

RAG (rag.ts)

  • POST /rag/query - Query with retrieval
  • GET /rag/models - List available models

Images (image.ts)

  • POST /image/generate - Generate images

Voice (voice.ts)

  • POST /voice/synthesize - Text to speech
  • POST /voice/transcribe - Speech to text

Documents (documents.ts)

  • POST /upload - Upload document
  • GET /docs/:id/sources - Get document chunks
  • POST /webhook/events - Processing webhooks

Architecture Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
     β”‚
     β”œβ”€ Authorization Header (Bearer token)
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Auth Middleware β”‚ ← Validates API key
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ Checks rate limit
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API Endpoint β”‚ ← Routes request
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ POST /ai/chat β†’ AI Service
     β”œβ”€ POST /rag/query β†’ RAG Service β†’ Vector DB β†’ AI Service
     β”œβ”€ POST /image/generate β†’ Image Service
     β”œβ”€ POST /voice/synthesize β†’ Voice Service
     β”œβ”€ POST /upload β†’ Document Service β†’ Worker β†’ Vector DB
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Response  β”‚ ← JSON with data + metadata
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration

Environment Variables

Variable            | What It Does                     | Example
--------------------|----------------------------------|----------
OPENAI_API_KEY      | OpenAI access for GPT models     | sk-...
HUGGINGFACE_API_KEY | Access to HuggingFace models     | hf_...
API_KEYS            | Valid API keys (comma-separated) | key1,key2
RATE_LIMIT_DEFAULT  | Requests/min for default tier    | 60
RATE_LIMIT_ADMIN    | Requests/min for admins          | 1000
MAX_FILE_SIZE_MB    | Max document upload size         | 10
CHUNK_SIZE          | Text chunk size for RAG          | 1000
LOG_LEVEL           | Logging verbosity                | info

Tier System

  • Default: 60 requests/min
  • Premium: 300 requests/min (add to config)
  • Admin: 1000 requests/min (via ADMIN_API_KEYS; see the sketch below)
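
A sketch of how the tiers might map to limits read from the environment (RATE_LIMIT_PREMIUM is an assumed variable name; it is not part of the documented configuration above):

// Per-minute request limits by tier, with the documented defaults as fallbacks.
const limits = {
  default: Number(process.env.RATE_LIMIT_DEFAULT ?? 60),
  premium: Number(process.env.RATE_LIMIT_PREMIUM ?? 300),
  admin: Number(process.env.RATE_LIMIT_ADMIN ?? 1000),
};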

Testing

Run tests:

npm test

Run with coverage:

npm run test:coverage

Production Checklist

  • Set strong API_KEYS
  • Configure ADMIN_API_KEYS separately
  • Set up Pinecone for vector storage
  • Increase rate limits based on needs
  • Enable background workers
  • Set LOG_LEVEL=info or warn
  • Configure CORS origins
  • Set up monitoring/alerting
  • Review cost limits on LLM providers

Troubleshooting

"No LLM adapter available" β†’ Add at least one API key (OPENAI_API_KEY, HUGGINGFACE_API_KEY, or ANTHROPIC_API_KEY)

"Invalid API key" β†’ Check Authorization header: Bearer your-key-here

"Rate limit exceeded" β†’ Wait 60 seconds or use admin key

Vector DB queries fail β†’ Service falls back to in-memory storage automatically

Next Steps

  1. Read the full README: README.md
  2. Review the examples: examples/js_client.js and examples/curl.sh
  3. Run the test suite: npm test
  4. Deploy to production: see the deployment guide, DEPLOYMENT.md

Support

  • GitHub Issues
  • Documentation in /docs
  • Example code in /examples

Enjoy building with the AI API Service! πŸš€