
Quick Start Guide

Get your AI API Service up and running in 5 minutes!

Prerequisites

  • Node.js 18+
  • npm or yarn
  • At least one LLM API key (OpenAI, HuggingFace, or Anthropic)

5-Minute Setup

1. Install Dependencies

npm install

2. Configure Environment

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key

3. Start the Server

npm run dev

The API will be available at http://localhost:8000

4. Test the API

curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}

5. Make Your First Request

curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example Requests

Chat

curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'

RAG Query

curl -X POST http://localhost:8000/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key features?", "top_k": 5}'

Image Generation

curl -X POST http://localhost:8000/image/generate \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'

What Each Component Does

πŸ” Authentication (/backend/utils/auth.ts)

  • Validates API keys from the Authorization header
  • Implements role-based access (default, premium, admin)
  • Used by all protected endpoints (see the sketch below)
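
A rough sketch of that check (illustrative only; the real logic lives in auth.ts):

// Illustrative key check, assuming "Authorization: Bearer <key>" and
// comma-separated keys in API_KEYS / ADMIN_API_KEYS. Not the real implementation.
type Tier = "default" | "premium" | "admin";

const apiKeys = new Set((process.env.API_KEYS ?? "").split(",").filter(Boolean));
const adminKeys = new Set((process.env.ADMIN_API_KEYS ?? "").split(",").filter(Boolean));

function authenticate(authHeader: string | undefined): Tier | null {
  const match = authHeader?.match(/^Bearer (.+)$/); // expect "Bearer <key>"
  if (!match) return null;
  if (adminKeys.has(match[1])) return "admin";
  if (apiKeys.has(match[1])) return "default"; // premium keys would be looked up the same way
  return null; // unknown key: the endpoint responds 401
}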

⚑ Rate Limiting (/backend/utils/rate_limit.ts)

  • Token bucket algorithm (sketched below)
  • Configurable limits per tier (60/300/1000 requests/min)
  • Automatic reset after 1 minute
  • Prevents abuse and cost overruns
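
One common way to write a token bucket, shown as a sketch (the real limiter lives in rate_limit.ts and, per the note above, may reset in whole-minute windows rather than refill continuously):

// Sketch of a token bucket: each key gets `capacity` tokens, refilled over time;
// one request costs one token, and an empty bucket means HTTP 429.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerMinute: number) {
    this.tokens = capacity;
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsedMinutes = (now - this.lastRefill) / 60_000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedMinutes * this.refillPerMinute);
    this.lastRefill = now;
    if (this.tokens < 1) return false; // caller should respond 429
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(60, 60); // default tier: 60 requests/min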

πŸ€– AI Service (/backend/services/ai_service.ts)

  • Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
  • Automatic model selection and fallback (see the sketch after this list)
  • Chat completions with context management
  • Embedding generation for RAG
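
In outline, fallback routing tries providers in order until one succeeds. A sketch; the LLMAdapter shape and chat() signature here are assumptions, not the project's actual interface:

type Message = { role: string; content: string };

interface LLMAdapter {
  name: string;
  chat(messages: Message[]): Promise<string>;
}

// Try each configured provider in priority order; return the first success.
async function chatWithFallback(adapters: LLMAdapter[], messages: Message[]): Promise<string> {
  let lastError: unknown;
  for (const adapter of adapters) {
    try {
      return await adapter.chat(messages);
    } catch (err) {
      lastError = err; // e.g. outage or provider rate limit: fall through to the next adapter
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}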

πŸ“š RAG Service (/backend/services/rag_service.ts)

  • Vector-based document retrieval
  • Automatic context injection into prompts (outlined below)
  • Supports Pinecone or in-memory vector DB
  • Returns sources with similarity scores
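
The flow reduces to: embed the query, retrieve the most similar chunks, inject them into the prompt. A sketch in which every helper signature is an assumption rather than the real service API:

type Hit = { text: string; score: number };
type Embed = (text: string) => Promise<number[]>;
type Search = (vector: number[], topK: number) => Promise<Hit[]>;
type Chat = (messages: { role: string; content: string }[]) => Promise<string>;

async function ragQuery(query: string, topK: number, embed: Embed, search: Search, chat: Chat): Promise<string> {
  const queryVector = await embed(query);       // 1. embed the query
  const hits = await search(queryVector, topK); // 2. retrieve the top-k similar chunks
  const context = hits.map((h, i) => `[${i + 1}] ${h.text}`).join("\n");
  return chat([                                 // 3. inject the context into the prompt
    { role: "system", content: `Answer using only this context:\n${context}` },
    { role: "user", content: query },
  ]);
}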

πŸ–ΌοΈ Image Service (/backend/services/image_service.ts)

  • Text-to-image generation
  • Supports DALL-E and Stable Diffusion
  • Configurable sizes and quality
  • Returns base64 or URLs

πŸŽ™οΈ Voice Service (/backend/services/voice_service.ts)

  • Text-to-speech synthesis (TTS)
  • Speech-to-text transcription (STT)
  • Multiple voice options
  • Various audio formats (mp3, opus, etc.)

πŸ“„ Document Service (/backend/services/document_service.ts)

  • Upload PDF, DOCX, TXT files
  • Automatic text extraction
  • Chunking with overlap for better retrieval (sketched below)
  • Background processing with workers
  • Stores chunks in vector DB
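
Overlapping chunks mean a sentence that straddles a boundary still appears whole in at least one chunk. A character-based sketch (the 200-character overlap is an assumption; CHUNK_SIZE is documented under Configuration below):

// Fixed-size windows that advance by (chunkSize - overlap) characters,
// so consecutive chunks share `overlap` characters of text.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final window reached the end
  }
  return chunks;
}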

πŸ”Œ Adapters

OpenAI Adapter (/backend/adapters/openai_adapter.ts)

  • Chat completions (GPT-4, GPT-3.5)
  • Embeddings (text-embedding-ada-002)
  • Image generation (DALL-E)
  • Voice synthesis and transcription
  • Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces

HuggingFace Adapter (/backend/adapters/huggingface_adapter.ts)

  • Open-source models (Mistral, Llama, etc.)
  • Stable Diffusion for images
  • Sentence transformers for embeddings
  • Free tier available

Anthropic Adapter (/backend/adapters/anthropic_adapter.ts)

  • Claude models (Sonnet, Opus)
  • Advanced reasoning capabilities
  • Long context windows

Vector DB Adapters (/backend/adapters/vector_db_adapter.ts)

  • PineconeAdapter: Production vector storage with managed scaling
  • InMemoryVectorDB: Development fallback with cosine similarity (sketched below)
  • Supports metadata filtering and batch operations
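
The in-memory fallback scores vectors with cosine similarity, which boils down to:

// cos(a, b) = dot(a, b) / (|a| * |b|); 1.0 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1); // guard against zero vectors
}

A query is then just scoring every stored vector against the query embedding and keeping the top-k results.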

πŸ“Š Observability

Logger (/backend/utils/logger.ts)

  • Structured JSON logging (see the sketch below)
  • Configurable log levels (debug, info, warn, error)
  • Automatic timestamping
  • Production-ready format
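
A structured log line is one JSON object per event; a sketch with illustrative field names:

type Level = "debug" | "info" | "warn" | "error";

// Emit one JSON object per event so log aggregators can parse fields directly.
function log(level: Level, message: string, fields: Record<string, unknown> = {}): void {
  console.log(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...fields }));
}

log("info", "chat completed", { endpoint: "/ai/chat", ms: 412 });
// {"timestamp":"2024-01-01T12:00:00.000Z","level":"info","message":"chat completed","endpoint":"/ai/chat","ms":412}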

Metrics (/backend/utils/metrics.ts)

  • Request counting by endpoint
  • Error tracking
  • Response time measurement
  • Model usage statistics
  • Vector DB query counts
  • Document processing stats

πŸ”„ Background Workers (/backend/workers/ingestion_worker.ts)

  • Async document processing
  • Configurable concurrency
  • Job status tracking
  • Webhook notifications on completion
  • Automatic retries on failure (sketched below)
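
Retries with backoff can be sketched as below; the attempt count and delays are assumptions, not the worker's actual settings:

// Retry a failing job with exponential backoff (500ms, 1s, 2s, ...) before giving up.
async function withRetries<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts: mark the job failed
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** (attempt - 1)));
    }
  }
}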

🌐 API Endpoints

All endpoints are in /backend/api/:

Health & Metrics (health.ts)

  • GET /health - Service health with component status
  • GET /metrics - Usage metrics and statistics

Authentication (auth.ts)

  • POST /auth/verify - Validate API key

Chat (chat.ts)

  • POST /ai/chat - Multi-turn conversation
  • GET /ai/query - Simple Q&A

RAG (rag.ts)

  • POST /rag/query - Query with retrieval
  • GET /rag/models - List available models

Images (image.ts)

  • POST /image/generate - Generate images

Voice (voice.ts)

  • POST /voice/synthesize - Text to speech
  • POST /voice/transcribe - Speech to text

Documents (documents.ts)

  • POST /upload - Upload document
  • GET /docs/:id/sources - Get document chunks
  • POST /webhook/events - Processing webhooks

Architecture Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
     β”‚
     β”œβ”€ Authorization Header (Bearer token)
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Auth Middleware β”‚ ← Validates API key
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ Checks rate limit
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API Endpoint β”‚ ← Routes request
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ POST /ai/chat β†’ AI Service
     β”œβ”€ POST /rag/query β†’ RAG Service β†’ Vector DB β†’ AI Service
     β”œβ”€ POST /image/generate β†’ Image Service
     β”œβ”€ POST /voice/synthesize β†’ Voice Service
     β”œβ”€ POST /upload β†’ Document Service β†’ Worker β†’ Vector DB
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Response  β”‚ ← JSON with data + metadata
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration

Environment Variables

Variable            | What It Does                     | Example
--------------------|----------------------------------|----------
OPENAI_API_KEY      | OpenAI access for GPT models     | sk-...
HUGGINGFACE_API_KEY | Access to HuggingFace models     | hf_...
API_KEYS            | Valid API keys (comma-separated) | key1,key2
RATE_LIMIT_DEFAULT  | Requests/min for default tier    | 60
RATE_LIMIT_ADMIN    | Requests/min for admins          | 1000
MAX_FILE_SIZE_MB    | Max document upload size         | 10
CHUNK_SIZE          | Text chunk size for RAG          | 1000
LOG_LEVEL           | Logging verbosity                | info

Tier System

  • Default: 60 requests/min
  • Premium: 300 requests/min (add to config)
  • Admin: 1000 requests/min (via ADMIN_API_KEYS; see the sketch below)
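
A sketch of how the tiers might map to limits read from the environment (RATE_LIMIT_PREMIUM is an assumed variable name; it is not part of the documented configuration above):

// Per-minute request limits by tier, with the documented defaults as fallbacks.
const limits = {
  default: Number(process.env.RATE_LIMIT_DEFAULT ?? 60),
  premium: Number(process.env.RATE_LIMIT_PREMIUM ?? 300),
  admin: Number(process.env.RATE_LIMIT_ADMIN ?? 1000),
};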

Testing

Run tests:

npm test

Run with coverage:

npm run test:coverage

Production Checklist

  • Set strong API_KEYS
  • Configure ADMIN_API_KEYS separately
  • Set up Pinecone for vector storage
  • Increase rate limits based on needs
  • Enable background workers
  • Set LOG_LEVEL=info or warn
  • Configure CORS origins
  • Set up monitoring/alerting
  • Review cost limits on LLM providers

Troubleshooting

"No LLM adapter available" β†’ Add at least one API key (OPENAI_API_KEY, HUGGINGFACE_API_KEY, or ANTHROPIC_API_KEY)

"Invalid API key" β†’ Check Authorization header: Bearer your-key-here

"Rate limit exceeded" β†’ Wait 60 seconds or use admin key

Vector DB queries fail β†’ Service falls back to in-memory storage automatically

Next Steps

  1. Read the full README: README.md
  2. Review the examples: examples/js_client.js and examples/curl.sh
  3. Run the test suite: npm test
  4. Deploy to production: see the deployment guide, DEPLOYMENT.md

Support

  • GitHub Issues
  • Documentation in /docs
  • Example code in /examples

Enjoy building with the AI API Service! πŸš€