# Quick Start Guide
Get your AI API Service up and running in 5 minutes!
## Prerequisites
- Node.js 18+
- npm or yarn
- At least one LLM API key (OpenAI, HuggingFace, or Anthropic)
## 5-Minute Setup
### 1. Install Dependencies
```bash
npm install
```
### 2. Configure Environment
```bash
cp .env.example .env
```
Edit `.env` and add your API keys:
```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
```
### 3. Start the Server
```bash
npm run dev
```
The API will be available at `http://localhost:8000`.
### 4. Test the API
```bash
curl http://localhost:8000/health
```
Expected response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}
```
### 5. Make Your First Request
```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
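If you prefer calling the API from code, the same request with the built-in `fetch` (Node.js 18+) looks like the sketch below. It mirrors the curl example above; the response is printed as-is, since its exact shape depends on the configured provider.
```typescript
// Same request as the curl example above, from a Node.js 18+ script (ESM).
const response = await fetch("http://localhost:8000/ai/chat", {
  method: "POST",
  headers: {
    Authorization: "Bearer demo-key-1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    conversation: [{ role: "user", content: "Hello!" }],
  }),
});
console.log(await response.json());
```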
## Example Requests
### Chat
```bash
curl -X POST http://localhost:8000/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'
```
### RAG Query
```bash
curl -X POST http://localhost:8000/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key features?", "top_k": 5}'
```
### Image Generation
```bash
curl -X POST http://localhost:8000/image/generate \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'
```
## What Each Component Does
### πŸ” **Authentication (`/backend/utils/auth.ts`)**
- Validates API keys from the Authorization header
- Implements role-based access (default, premium, admin)
- Used by all protected endpoints
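As a rough sketch of the idea (illustrative only, not the actual code in `auth.ts`, and assuming an Express-style handler), validation boils down to parsing the Bearer token and checking it against the configured key sets:
```typescript
import type { Request, Response, NextFunction } from "express";

// Illustrative: key sets are read from the API_KEYS / ADMIN_API_KEYS env vars.
const keys = new Set((process.env.API_KEYS ?? "").split(","));
const adminKeys = new Set((process.env.ADMIN_API_KEYS ?? "").split(","));

export function requireApiKey(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer\s+/i, "");
  if (!token || (!keys.has(token) && !adminKeys.has(token))) {
    return res.status(401).json({ error: "Invalid API key" });
  }
  // Record the resolved role so rate limiting can pick the right tier.
  (req as Request & { role?: string }).role = adminKeys.has(token) ? "admin" : "default";
  next();
}
```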
### ⚑ **Rate Limiting (`/backend/utils/rate_limit.ts`)**
- Token bucket algorithm (sketched below)
- Configurable limits per tier (60/300/1000 requests/min)
- Automatic reset after 1 minute
- Prevents abuse and cost overruns
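The token bucket, in miniature (a sketch of the algorithm, not the code in `rate_limit.ts`): each key gets a bucket that refills continuously up to its per-minute limit, and a request goes through only if a token is available.
```typescript
// Minimal token bucket: `limit` tokens per 60-second window, per API key.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private limit: number) {
    this.tokens = limit;
  }

  tryConsume(): boolean {
    const now = Date.now();
    // Refill in proportion to elapsed time, capped at the limit.
    this.tokens = Math.min(
      this.limit,
      this.tokens + ((now - this.lastRefill) / 60_000) * this.limit
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();
export function allowRequest(apiKey: string, limit = 60): boolean {
  if (!buckets.has(apiKey)) buckets.set(apiKey, new TokenBucket(limit));
  return buckets.get(apiKey)!.tryConsume();
}
```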
### πŸ€– **AI Service (`/backend/services/ai_service.ts`)**
- Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
- Automatic model selection and fallback (sketched below)
- Chat completions with context management
- Embedding generation for RAG
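The fallback idea, sketched (provider order and error handling here are illustrative; the real selection logic lives in `ai_service.ts`): try adapters in priority order and fall through to the next on failure.
```typescript
// Illustrative fallback loop over whichever adapters are configured.
type Message = { role: string; content: string };
type Adapter = { name: string; chat: (conversation: Message[]) => Promise<string> };

async function chatWithFallback(adapters: Adapter[], conversation: Message[]): Promise<string> {
  for (const adapter of adapters) {
    try {
      return await adapter.chat(conversation);
    } catch (err) {
      console.warn(`${adapter.name} failed, trying next provider`, err);
    }
  }
  // This is the condition behind the "No LLM adapter available" error below.
  throw new Error("No LLM adapter available");
}
```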
### πŸ“š **RAG Service (`/backend/services/rag_service.ts`)**
- Vector-based document retrieval
- Automatic context injection into prompts
- Supports Pinecone or in-memory vector DB
- Returns sources with similarity scores
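Conceptually (a sketch of the flow, not the actual `rag_service.ts`), a RAG query embeds the question, pulls the top-k chunks from the vector DB, and injects them into the prompt:
```typescript
// Hypothetical wiring: the real service binds these to the AI and vector DB adapters.
type Message = { role: string; content: string };
type Embed = (text: string) => Promise<number[]>;
type Search = (vector: number[], topK: number) => Promise<{ text: string; score: number }[]>;
type Chat = (messages: Message[]) => Promise<string>;

async function ragQuery(query: string, embed: Embed, search: Search, chat: Chat, topK = 5) {
  const queryVector = await embed(query);                     // embed the question
  const chunks = await search(queryVector, topK);             // top-k by similarity
  const context = chunks.map((c) => c.text).join("\n---\n");  // inject as context
  const answer = await chat([
    { role: "system", content: `Answer using this context:\n${context}` },
    { role: "user", content: query },
  ]);
  return { answer, sources: chunks };                         // sources with scores
}
```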
### πŸ–ΌοΈ **Image Service (`/backend/services/image_service.ts`)**
- Text-to-image generation
- Supports DALL-E and Stable Diffusion
- Configurable sizes and quality
- Returns base64 or URLs
### πŸŽ™οΈ **Voice Service (`/backend/services/voice_service.ts`)**
- Text-to-speech synthesis (TTS)
- Speech-to-text transcription (STT)
- Multiple voice options
- Various audio formats (mp3, opus, etc.)
### πŸ“„ **Document Service (`/backend/services/document_service.ts`)**
- Upload PDF, DOCX, TXT files
- Automatic text extraction
- Chunking with overlap for better retrieval (sketched below)
- Background processing with workers
- Stores chunks in vector DB
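Chunking with overlap, sketched below (the default size follows `CHUNK_SIZE` from the configuration table; the overlap value is illustrative): consecutive chunks share a margin of text so sentences straddling a boundary survive intact in at least one chunk.
```typescript
// Split extracted text into fixed-size chunks that overlap.
// Requires overlap < chunkSize, or the loop would not advance.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```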
### πŸ”Œ **Adapters**
#### **OpenAI Adapter (`/backend/adapters/openai_adapter.ts`)**
- Chat completions (GPT-4, GPT-3.5)
- Embeddings (text-embedding-ada-002)
- Image generation (DALL-E)
- Voice synthesis and transcription
- Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces
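The three adapter contracts might look roughly like this (shapes inferred from the capabilities listed above; the actual interfaces live in the adapter files):
```typescript
// Illustrative interface shapes only.
type Message = { role: string; content: string };

interface LLMAdapter {
  chat(conversation: Message[]): Promise<string>;
  embed(text: string): Promise<number[]>;
}

interface ImageAdapter {
  generate(prompt: string, size: string): Promise<{ url?: string; base64?: string }>;
}

interface VoiceAdapter {
  synthesize(text: string, voice: string): Promise<Uint8Array>;
  transcribe(audio: Uint8Array): Promise<string>;
}
```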
#### **HuggingFace Adapter (`/backend/adapters/huggingface_adapter.ts`)**
- Open-source models (Mistral, Llama, etc.)
- Stable Diffusion for images
- Sentence transformers for embeddings
- Free tier available
#### **Anthropic Adapter (`/backend/adapters/anthropic_adapter.ts`)**
- Claude models (Sonnet, Opus)
- Advanced reasoning capabilities
- Long context windows
#### **Vector DB Adapters (`/backend/adapters/vector_db_adapter.ts`)**
- **PineconeAdapter**: Production vector storage with managed scaling
- **InMemoryVectorDB**: Development fallback with cosine similarity (sketched below)
- Supports metadata filtering and batch operations
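The in-memory fallback's ranking measure is plain cosine similarity; as a sketch:
```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// 1 means the vectors point the same way, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```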
### πŸ“Š **Observability**
#### **Logger (`/backend/utils/logger.ts`)**
- Structured JSON logging
- Configurable log levels (debug, info, warn, error)
- Automatic timestamping
- Production-ready format
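In essence, each log line is one JSON object (a sketch of the format, not the code in `logger.ts`; field names are illustrative):
```typescript
// One JSON object per line, with level and timestamp attached automatically.
type Level = "debug" | "info" | "warn" | "error";

function log(level: Level, message: string, extra: Record<string, unknown> = {}) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...extra,
  }));
}

log("info", "request completed", { endpoint: "/ai/chat", ms: 412 });
```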
#### **Metrics (`/backend/utils/metrics.ts`)**
- Request counting by endpoint
- Error tracking
- Response time measurement
- Model usage statistics
- Vector DB query counts
- Document processing stats
### πŸ”„ **Background Workers (`/backend/workers/ingestion_worker.ts`)**
- Async document processing
- Configurable concurrency
- Job status tracking
- Webhook notifications on completion
- Automatic retries on failure
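The retry behavior, sketched (attempt count and backoff are illustrative; see `ingestion_worker.ts` for the real policy):
```typescript
// Retry an async job with exponential backoff before giving up.
async function withRetries<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Back off: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}
```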
### 🌐 **API Endpoints**
All endpoints are in `/backend/api/`:
#### **Health & Metrics (`health.ts`)**
- `GET /health` - Service health with component status
- `GET /metrics` - Usage metrics and statistics
#### **Authentication (`auth.ts`)**
- `POST /auth/verify` - Validate API key
#### **Chat (`chat.ts`)**
- `POST /ai/chat` - Multi-turn conversation
- `GET /ai/query` - Simple Q&A
#### **RAG (`rag.ts`)**
- `POST /rag/query` - Query with retrieval
- `GET /rag/models` - List available models
#### **Images (`image.ts`)**
- `POST /image/generate` - Generate images
#### **Voice (`voice.ts`)**
- `POST /voice/synthesize` - Text to speech
- `POST /voice/transcribe` - Speech to text
#### **Documents (`documents.ts`)**
- `POST /upload` - Upload document (example below)
- `GET /docs/:id/sources` - Get document chunks
- `POST /webhook/events` - Processing webhooks
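For example, an upload from code could look like this (the multipart field name `file` is an assumption here; check `documents.ts` for the field the endpoint actually expects):
```typescript
import { readFile } from "node:fs/promises";

// Hypothetical upload call using the FormData/Blob globals in Node.js 18+.
const form = new FormData();
form.append("file", new Blob([await readFile("report.pdf")]), "report.pdf");

const res = await fetch("http://localhost:8000/upload", {
  method: "POST",
  headers: { Authorization: "Bearer demo-key-1" },
  body: form,
});
console.log(await res.json());
```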
## Architecture Flow
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client  β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
     β”‚
     β”œβ”€ Authorization Header (Bearer token)
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Auth Middleware β”‚ ← Validates API key
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ Checks rate limit
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API Endpoint β”‚ ← Routes request
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”œβ”€ POST /ai/chat          β†’ AI Service
     β”œβ”€ POST /rag/query        β†’ RAG Service β†’ Vector DB β†’ AI Service
     β”œβ”€ POST /image/generate   β†’ Image Service
     β”œβ”€ POST /voice/synthesize β†’ Voice Service
     β”œβ”€ POST /upload           β†’ Document Service β†’ Worker β†’ Vector DB
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Response  β”‚ ← JSON with data + metadata
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Configuration
### Environment Variables
| Variable | What It Does | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI access for GPT models | `sk-...` |
| `HUGGINGFACE_API_KEY` | HuggingFace models access | `hf_...` |
| `API_KEYS` | Valid API keys (comma-separated) | `key1,key2` |
| `RATE_LIMIT_DEFAULT` | Requests/min for basic users | `60` |
| `RATE_LIMIT_ADMIN` | Requests/min for admins | `1000` |
| `MAX_FILE_SIZE_MB` | Max document upload size | `10` |
| `CHUNK_SIZE` | Text chunk size for RAG | `1000` |
| `LOG_LEVEL` | Logging verbosity | `info` |
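Putting those together, a minimal `.env` for local development might look like this (values are the examples from the table; set only the provider keys you actually use):
```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_ADMIN=1000
MAX_FILE_SIZE_MB=10
CHUNK_SIZE=1000
LOG_LEVEL=info
```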
### Tier System
- **Default**: 60 requests/min
- **Premium**: 300 requests/min (add to config)
- **Admin**: 1000 requests/min (via `ADMIN_API_KEYS`)
## Testing
Run tests:
```bash
npm test
```
Run with coverage:
```bash
npm run test:coverage
```
## Production Checklist
- [ ] Set strong `API_KEYS`
- [ ] Configure `ADMIN_API_KEYS` separately
- [ ] Set up Pinecone for vector storage
- [ ] Increase rate limits based on needs
- [ ] Enable background workers
- [ ] Set `LOG_LEVEL=info` or `warn`
- [ ] Configure CORS origins
- [ ] Set up monitoring/alerting
- [ ] Review cost limits on LLM providers
## Troubleshooting
**"No LLM adapter available"**
β†’ Add at least one API key (OPENAI_API_KEY, HUGGINGFACE_API_KEY, or ANTHROPIC_API_KEY)
**"Invalid API key"**
β†’ Check Authorization header: `Bearer your-key-here`
**"Rate limit exceeded"**
β†’ Wait 60 seconds or use admin key
**Vector DB queries fail**
β†’ Service falls back to in-memory storage automatically
## Next Steps
1. **Read the full README**: `README.md`
2. **Check deployment guide**: `DEPLOYMENT.md`
3. **Review examples**: `examples/js_client.js` and `examples/curl.sh`
4. **Run tests**: `npm test`
5. **Deploy to production**: See DEPLOYMENT.md
## Support
- GitHub Issues
- Documentation in `/docs`
- Example code in `/examples`
Enjoy building with the AI API Service! πŸš€