# Quick Start Guide
Get your AI API Service up and running in 5 minutes!
## Prerequisites
- Node.js 18+
- npm or yarn
- At least one LLM API key (OpenAI, HuggingFace, or Anthropic)
## 5-Minute Setup
### 1. Install Dependencies
```bash
npm install
```
### 2. Configure Environment
```bash
cp .env.example .env
```
Edit `.env` and add your API keys:
```env
OPENAI_API_KEY=sk-your-openai-key
API_KEYS=demo-key-1,my-secret-key
```
### 3. Start the Server
```bash
npm run dev
```
The API will be available at `http://localhost:8000`.
### 4. Test the API
```bash
curl http://localhost:8000/health
```
Expected response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...],
  "uptime_seconds": 5
}
```
### 5. Make Your First Request
```bash
curl -X POST http://localhost:8000/ai/chat \
-H "Authorization: Bearer demo-key-1" \
-H "Content-Type: application/json" \
-d '{
"conversation": [
{"role": "user", "content": "Hello!"}
]
}'
```
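The same request from TypeScript, for when you move past curl. This is a minimal sketch assuming the dev server from step 3 is running and `demo-key-1` is listed in `API_KEYS`; it uses the global `fetch` that ships with Node.js 18+:
```typescript
// Minimal sketch: the curl request above, from Node.js 18+ (global fetch).
// Requires an ESM context for top-level await.
const response = await fetch("http://localhost:8000/ai/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer demo-key-1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    conversation: [{ role: "user", content: "Hello!" }],
  }),
});

console.log(await response.json());
```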
## Example Requests
### Chat
```bash
curl -X POST http://localhost:8000/ai/chat \
-H "Authorization: Bearer demo-key-1" \
-H "Content-Type: application/json" \
-d '{"conversation": [{"role": "user", "content": "What is AI?"}]}'
```
### RAG Query
```bash
curl -X POST http://localhost:8000/rag/query \
-H "Authorization: Bearer demo-key-1" \
-H "Content-Type: application/json" \
-d '{"query": "What are the key features?", "top_k": 5}'
```
### Image Generation
```bash
curl -X POST http://localhost:8000/image/generate \
-H "Authorization: Bearer demo-key-1" \
-H "Content-Type: application/json" \
-d '{"prompt": "A sunset over mountains", "size": "1024x1024"}'
```
## What Each Component Does
### **Authentication (`/backend/utils/auth.ts`)**
- Validates API keys from the Authorization header
- Implements role-based access (default, premium, admin)
- Used by all protected endpoints
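The gist of the key check looks roughly like this. It is an illustrative sketch, not the actual `auth.ts` code, and it assumes `API_KEYS` and `ADMIN_API_KEYS` are comma-separated environment variables:
```typescript
// Illustrative sketch of Bearer-key validation; the real logic lives in
// /backend/utils/auth.ts and may differ.
type Tier = "default" | "premium" | "admin";

function parseKeys(value: string | undefined): string[] {
  return (value ?? "").split(",").map((k) => k.trim()).filter(Boolean);
}

export function resolveTier(authHeader: string | undefined): Tier | null {
  if (!authHeader?.startsWith("Bearer ")) return null;
  const key = authHeader.slice("Bearer ".length);
  if (parseKeys(process.env.ADMIN_API_KEYS).includes(key)) return "admin";
  if (parseKeys(process.env.API_KEYS).includes(key)) return "default";
  return null; // unknown key -> respond 401
}
```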
### **Rate Limiting (`/backend/utils/rate_limit.ts`)**
- Token bucket algorithm
- Configurable limits per tier (60/300/1000 requests/min)
- Automatic reset after 1 minute
- Prevents abuse and cost overruns
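In sketch form, a per-key bucket with a one-minute reset behaves like this (names and shape are illustrative, not the actual `rate_limit.ts`):
```typescript
// Sketch of a token bucket that refills in one-minute windows.
class TokenBucket {
  private tokens: number;
  private windowStart = Date.now();

  constructor(private readonly capacity: number) {
    this.tokens = capacity; // e.g. 60 for the default tier
  }

  tryConsume(): boolean {
    if (Date.now() - this.windowStart >= 60_000) {
      this.tokens = this.capacity; // automatic reset after 1 minute
      this.windowStart = Date.now();
    }
    if (this.tokens === 0) return false; // caller should return HTTP 429
    this.tokens -= 1;
    return true;
  }
}
```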
### **AI Service (`/backend/services/ai_service.ts`)**
- Multi-provider LLM routing (OpenAI, HuggingFace, Anthropic)
- Automatic model selection and fallback
- Chat completions with context management
- Embedding generation for RAG
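The fallback idea reduces to "try each configured provider in order". A conceptual sketch, with placeholder names rather than the real service wiring:
```typescript
// Placeholder sketch of provider fallback; the actual routing is in
// /backend/services/ai_service.ts.
interface LLMAdapter {
  chat(messages: { role: string; content: string }[]): Promise<string>;
}

async function chatWithFallback(
  adapters: LLMAdapter[],
  messages: { role: string; content: string }[],
): Promise<string> {
  let lastError: unknown;
  for (const adapter of adapters) {
    try {
      return await adapter.chat(messages); // first provider that succeeds wins
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw new Error(`No LLM adapter available: ${String(lastError)}`);
}
```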
### **RAG Service (`/backend/services/rag_service.ts`)**
- Vector-based document retrieval
- Automatic context injection into prompts
- Supports Pinecone or in-memory vector DB
- Returns sources with similarity scores
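The retrieval flow is embed → search → inject. This sketch takes its collaborators as function parameters, since every name here is a placeholder for the real service methods:
```typescript
// Rough shape of a RAG query; all names are placeholders.
type Chunk = { text: string; score: number };

async function ragQuery(
  query: string,
  embed: (text: string) => Promise<number[]>,
  search: (vector: number[], topK: number) => Promise<Chunk[]>,
  chat: (prompt: string) => Promise<string>,
  topK = 5,
): Promise<string> {
  const vector = await embed(query);          // 1. embed the question
  const chunks = await search(vector, topK);  // 2. retrieve similar chunks
  const context = chunks.map((c) => c.text).join("\n---\n");
  // 3. inject the retrieved context into the prompt
  return chat(`Answer using this context:\n${context}\n\nQuestion: ${query}`);
}
```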
### **Image Service (`/backend/services/image_service.ts`)**
- Text-to-image generation
- Supports DALL-E and Stable Diffusion
- Configurable sizes and quality
- Returns base64 or URLs
### **Voice Service (`/backend/services/voice_service.ts`)**
- Text-to-speech synthesis (TTS)
- Speech-to-text transcription (STT)
- Multiple voice options
- Various audio formats (mp3, opus, etc.)
### **Document Service (`/backend/services/document_service.ts`)**
- Upload PDF, DOCX, TXT files
- Automatic text extraction
- Chunking with overlap for better retrieval
- Background processing with workers
- Stores chunks in vector DB
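"Chunking with overlap" means each chunk repeats the tail of the previous one, so sentences that straddle a boundary still land intact in at least one retrievable chunk. A hypothetical character-based chunker (`CHUNK_SIZE` maps to the env var in the table below; the overlap value is an assumption):
```typescript
// Hypothetical chunker with overlap; the real extraction and chunking
// pipeline lives in /backend/services/document_service.ts.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance less than a full chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```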
### **Adapters**
#### **OpenAI Adapter (`/backend/adapters/openai_adapter.ts`)**
- Chat completions (GPT-4, GPT-3.5)
- Embeddings (text-embedding-ada-002)
- Image generation (DALL-E)
- Voice synthesis and transcription
- Implements LLMAdapter, ImageAdapter, VoiceAdapter interfaces
#### **HuggingFace Adapter (`/backend/adapters/huggingface_adapter.ts`)**
- Open-source models (Mistral, Llama, etc.)
- Stable Diffusion for images
- Sentence transformers for embeddings
- Free tier available
#### **Anthropic Adapter (`/backend/adapters/anthropic_adapter.ts`)**
- Claude models (Sonnet, Opus)
- Advanced reasoning capabilities
- Long context windows
#### **Vector DB Adapters (`/backend/adapters/vector_db_adapter.ts`)**
- **PineconeAdapter**: Production vector storage with managed scaling
- **InMemoryVectorDB**: Development fallback with cosine similarity
- Supports metadata filtering and batch operations
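The in-memory fallback ranks stored vectors by cosine similarity, which in minimal form is just a normalized dot product (illustrative, not the actual `InMemoryVectorDB` code):
```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```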
### **Observability**
#### **Logger (`/backend/utils/logger.ts`)**
- Structured JSON logging
- Configurable log levels (debug, info, warn, error)
- Automatic timestamping
- Production-ready format
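A structured JSON log line looks roughly like this; the exact field names emitted by `logger.ts` are an assumption:
```typescript
// Sketch of one structured log entry; field names are assumed.
function logInfo(message: string, fields: Record<string, unknown> = {}): void {
  console.log(JSON.stringify({
    level: "info",
    timestamp: new Date().toISOString(), // automatic timestamping
    message,
    ...fields,
  }));
}

logInfo("request completed", { endpoint: "/ai/chat", durationMs: 142 });
```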
#### **Metrics (`/backend/utils/metrics.ts`)**
- Request counting by endpoint
- Error tracking
- Response time measurement
- Model usage statistics
- Vector DB query counts
- Document processing stats
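At its simplest, per-endpoint counting is a map of counters (the shape is illustrative, not the actual `metrics.ts`):
```typescript
// Illustrative per-endpoint request counter.
const requestCounts = new Map<string, number>();

function countRequest(endpoint: string): void {
  requestCounts.set(endpoint, (requestCounts.get(endpoint) ?? 0) + 1);
}
```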
### **Background Workers (`/backend/workers/ingestion_worker.ts`)**
- Async document processing
- Configurable concurrency
- Job status tracking
- Webhook notifications on completion
- Automatic retries on failure
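"Automatic retries on failure" can be sketched as a wrapper with simple backoff; the real worker adds concurrency control, status tracking, and webhook notifications, and the attempt counts and delays here are assumptions:
```typescript
// Hedged sketch of retry-with-backoff for a single ingestion job.
async function runWithRetries<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up, surface the error
      await new Promise((r) => setTimeout(r, 1000 * attempt)); // linear backoff
    }
  }
}
```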
### **API Endpoints**
All endpoints are in `/backend/api/`:
#### **Health & Metrics (`health.ts`)**
- `GET /health` - Service health with component status
- `GET /metrics` - Usage metrics and statistics
#### **Authentication (`auth.ts`)**
- `POST /auth/verify` - Validate API key
#### **Chat (`chat.ts`)**
- `POST /ai/chat` - Multi-turn conversation
- `GET /ai/query` - Simple Q&A
#### **RAG (`rag.ts`)**
- `POST /rag/query` - Query with retrieval
- `GET /rag/models` - List available models
#### **Images (`image.ts`)**
- `POST /image/generate` - Generate images
#### **Voice (`voice.ts`)**
- `POST /voice/synthesize` - Text to speech
- `POST /voice/transcribe` - Speech to text
#### **Documents (`documents.ts`)**
- `POST /upload` - Upload document
- `GET /docs/:id/sources` - Get document chunks
- `POST /webhook/events` - Processing webhooks
## Architecture Flow
```
┌──────────┐
│  Client  │
└────┬─────┘
     │
     ├─ Authorization Header (Bearer token)
     ▼
┌─────────────────┐
│ Auth Middleware │ ← Validates API key
└────┬────────────┘
     ├─ Checks rate limit
     ▼
┌──────────────┐
│ API Endpoint │ ← Routes request
└────┬─────────┘
     ├─ POST /ai/chat → AI Service
     ├─ POST /rag/query → RAG Service → Vector DB → AI Service
     ├─ POST /image/generate → Image Service
     ├─ POST /voice/synthesize → Voice Service
     ├─ POST /upload → Document Service → Worker → Vector DB
     ▼
┌──────────┐
│ Response │ ← JSON with data + metadata
└──────────┘
```
## Configuration
### Environment Variables
| Variable | What It Does | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI access for GPT models | `sk-...` |
| `HUGGINGFACE_API_KEY` | HuggingFace models access | `hf_...` |
| `API_KEYS` | Valid API keys (comma-separated) | `key1,key2` |
| `RATE_LIMIT_DEFAULT` | Requests/min for basic users | `60` |
| `RATE_LIMIT_ADMIN` | Requests/min for admins | `1000` |
| `MAX_FILE_SIZE_MB` | Max document upload size | `10` |
| `CHUNK_SIZE` | Text chunk size for RAG | `1000` |
| `LOG_LEVEL` | Logging verbosity | `info` |
### Tier System
- **Default**: 60 requests/min
- **Premium**: 300 requests/min (add to config)
- **Admin**: 1000 requests/min (via `ADMIN_API_KEYS`)
## Testing
Run tests:
```bash
npm test
```
Run with coverage:
```bash
npm run test:coverage
```
## Production Checklist
- [ ] Set strong `API_KEYS`
- [ ] Configure `ADMIN_API_KEYS` separately
- [ ] Set up Pinecone for vector storage
- [ ] Increase rate limits based on needs
- [ ] Enable background workers
- [ ] Set `LOG_LEVEL=info` or `warn`
- [ ] Configure CORS origins
- [ ] Set up monitoring/alerting
- [ ] Review cost limits on LLM providers
## Troubleshooting
**"No LLM adapter available"**
β Add at least one API key (OPENAI_API_KEY, HUGGINGFACE_API_KEY, or ANTHROPIC_API_KEY)
**"Invalid API key"**
β Check Authorization header: `Bearer your-key-here`
**"Rate limit exceeded"**
β Wait 60 seconds or use admin key
**Vector DB queries fail**
β Service falls back to in-memory storage automatically
## Next Steps
1. **Read the full README**: `README.md`
2. **Check deployment guide**: `DEPLOYMENT.md`
3. **Review examples**: `examples/js_client.js` and `examples/curl.sh`
4. **Run tests**: `npm test`
5. **Deploy to production**: See DEPLOYMENT.md
## Support
- GitHub Issues
- Documentation in `/docs`
- Example code in `/examples`
Enjoy building with the AI API Service!