# Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces
## (Absolute Beginner-Friendly Guide)

**What you'll build**: A fully working AI API running on Hugging Face Spaces that anyone can access via the internet, powered by Ollama (no OpenAI key needed).

**Time needed**: 30-45 minutes  
**Cost**: FREE (or $0.60/hour for faster GPU)  
**No prior experience needed!**

---

## 📋 **What You Need Before Starting**

1. ✅ A Hugging Face account (we'll create this if you don't have one)
2. ✅ Git installed on your computer
3. ✅ Basic ability to copy/paste and follow instructions
4. ✅ This project's code files (you already have these)

---

## 🎯 **PART 1: Create Hugging Face Account & Space**

### **Step 1.1: Create Hugging Face Account** (Skip if you have one)

1. Open your web browser
2. Go to: https://huggingface.co/join
3. Fill in:
   - **Email**: Your email address
   - **Username**: Pick a username (you'll need this later - write it down!)
   - **Password**: Choose a strong password
4. Click **"Sign Up"**
5. Check your email and click the verification link
6. You're now logged into Hugging Face!

### **Step 1.2: Create a New Space**

1. **Go to**: https://huggingface.co/new-space
   
2. **Fill in the form**:
   
   | Field | What to Enter | Example |
   |-------|---------------|---------|
   | **Owner** | Your username | `yourname` |
   | **Space name** | `ai-api-ollama` | (or anything you like) |
   | **License** | Select "MIT" | |
   | **Select the Space SDK** | Click on **"Docker"** | ⚠️ IMPORTANT: Must be Docker! |
   | **Space hardware** | Select **"CPU basic - Free"** for now | (We'll upgrade later if needed) |
   | **Repo type** | Leave as **"Public"** | (or Private if you prefer) |

3. **Click "Create Space"** button at the bottom

4. **IMPORTANT - Write down your Space URL**:
   ```
   https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
   ```
   Replace `YOUR_USERNAME` with your actual username.

5. You'll see a page with instructions - **ignore them for now**, we'll do it differently.

---

## 🔧 **PART 2: Install Git and Set Up Authentication**

### **Step 2.1: Check if Git is Installed**

**On Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Type: `git --version`
4. If you see a version number (like `git version 2.40.0`), you have Git ✅
5. If you see an error, download Git from: https://git-scm.com/download/win

**On Mac**:
1. Press `Command + Space`
2. Type `terminal` and press Enter
3. Type: `git --version`
4. If you see a version number, you have Git ✅
5. If not, it will prompt you to install Xcode Command Line Tools - click Install

**On Linux**:
```bash
git --version
```
If not installed:
```bash
sudo apt-get update
sudo apt-get install git
```

### **Step 2.2: Create Hugging Face Access Token**

1. Go to: https://huggingface.co/settings/tokens
2. Click **"New token"** button
3. Fill in:
   - **Name**: `git-access` (or anything you like)
   - **Role**: Select **"Write"**
4. Click **"Generate token"**
5. **CRITICAL**: Copy the token and save it somewhere safe (Notepad, password manager)
   - It looks like: `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
   - ⚠️ **You won't be able to see this again!**

---

## 💻 **PART 3: Clone Your Space to Your Computer**

### **Step 3.1: Open Terminal/Command Prompt**

**Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Navigate to where you want to work (e.g., Desktop):
   ```
   cd Desktop
   ```

**Mac/Linux**:
1. Open Terminal
2. Navigate to where you want to work:
   ```bash
   cd ~/Desktop
   ```

### **Step 3.2: Clone the Space Repository**

1. **Copy this command** (replace YOUR_USERNAME with your actual Hugging Face username):
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
   ```

2. **Example**:
   ```bash
   git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
   ```

3. **Press Enter**

4. When prompted for username and password:
   - **Username**: Your Hugging Face username
   - **Password**: **Paste your token** (NOT your password!) - the one that starts with `hf_`
   
5. You should see:
   ```
   Cloning into 'ai-api-ollama'...
   ```

6. **Verify the folder was created**:
   ```bash
   cd ai-api-ollama
   ls
   ```
   (On Windows use `dir` instead of `ls`)

---

## 📂 **PART 4: Copy Project Files to Space**

### **Step 4.1: Locate Your AI API Service Files**

You should have the project files in a folder. Let's say they're in:
- Windows: `C:\Users\YourName\Downloads\ai-api-service\`
- Mac/Linux: `~/Downloads/ai-api-service/`

### **Step 4.2: Copy ALL Files to Space Folder**

**Option A: Using File Explorer (Easiest)**

**Windows**:
1. Open File Explorer
2. Navigate to your original `ai-api-service` folder
3. Press `Ctrl + A` to select all files
4. Press `Ctrl + C` to copy
5. Navigate to `Desktop\ai-api-ollama` (your Space folder)
6. Press `Ctrl + V` to paste
7. When asked about replacing files, click **"Replace"**

**Mac**:
1. Open Finder
2. Navigate to your original `ai-api-service` folder
3. Press `Cmd + A` to select all files
4. Press `Cmd + C` to copy
5. Navigate to `Desktop/ai-api-ollama` (your Space folder)
6. Press `Cmd + V` to paste

**Option B: Using Command Line**

From the terminal, in your Space folder:

**Windows**:
```bat
xcopy /E /I "C:\Users\YourName\Downloads\ai-api-service\*" .
```

**Mac/Linux**:
```bash
# "/." (instead of "/*") also copies hidden files such as .env.example
cp -r ~/Downloads/ai-api-service/. .
```

### **Step 4.3: Verify Files Were Copied**

In your terminal (inside the `ai-api-ollama` folder):

```bash
ls -a   # -a also lists hidden files such as .env.example (on Windows use: dir)
```

You should see these folders/files:
- `backend/`
- `examples/`
- `tests/`
- `package.json`
- `README.md`
- `.env.example`
- `Dockerfile.huggingface`
- And many more files...

✅ If you see these, you're good to proceed!
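
If you would rather not check the list by eye, here is a minimal Python sketch that does the same comparison. The `missing_entries` helper and the `EXPECTED` list are illustrative names, not part of the project; the entries are simply the ones listed above.

```python
from pathlib import Path

# Names taken from the list above; extend it if your project has more.
EXPECTED = ["backend", "examples", "tests", "package.json",
            "README.md", ".env.example", "Dockerfile.huggingface"]


def missing_entries(folder: str, expected=EXPECTED) -> list:
    """Return the expected files or folders that are absent from `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    print("All expected files found" if not missing else f"Missing: {missing}")
```

Run it from inside the `ai-api-ollama` folder. An empty result means every listed item was copied.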

---

## 🐳 **PART 5: Prepare the Dockerfile for Hugging Face**

### **Step 5.1: Rename the Dockerfile**

Hugging Face expects a file named exactly `Dockerfile` (no extension).

**Windows Command Prompt**:
```bat
ren Dockerfile.huggingface Dockerfile
```

**Mac/Linux Terminal**:
```bash
mv Dockerfile.huggingface Dockerfile
```

### **Step 5.2: Verify the Dockerfile**

```bash
cat Dockerfile
```

You should see content starting with `FROM node:18-alpine AS builder`

✅ Good to go!
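
The same check can be scripted. This is a small sketch, assuming the only thing worth verifying is that the first real instruction is the `FROM` line shown above; `first_instruction` is a hypothetical helper and the `WORKDIR /app` line in the sample is made up for illustration.

```python
def first_instruction(dockerfile_text: str) -> str:
    """Return the first line that is neither blank nor a comment."""
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


# Illustrative sample; in practice read the text with open("Dockerfile").read()
sample = "# build stage\n\nFROM node:18-alpine AS builder\nWORKDIR /app\n"
print(first_instruction(sample).startswith("FROM node:18-alpine"))
```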

---

## 📝 **PART 6: Create Space Configuration Files**

### **Step 6.1: Create README.md for Your Space**

This file tells Hugging Face how to run your Space.

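**Open `README.md`** in your `ai-api-ollama` folder. The file already exists (the Space created one, and the project's own README may have replaced it in Part 4), so you will overwrite its entire contents: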

**Windows**:
```bat
notepad README.md
```

**Mac/Linux**:
```bash
nano README.md
```

**Copy and paste this EXACT content** (replace YOUR_USERNAME):

````markdown
---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI API Service with Ollama

A production-ready AI API service powered by Ollama. No OpenAI API key needed!

## 🚀 Features

- 💬 **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- 📚 **RAG** - Retrieval-Augmented Generation with vector search
- 🖼️ **Image Generation** - Text-to-image (requires additional API key)
- 🎙️ **Voice Synthesis** - Text-to-speech (requires additional API key)
- 📄 **Document Processing** - Upload and query PDFs, DOCX, TXT
- 🔒 **Authentication** - Secure API key-based access
- ⚡ **Rate Limiting** - Prevent abuse

## 📡 API Endpoint

```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```

## 🔑 Quick Start

### Health Check

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```

### Chat Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```

### RAG Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```

## 🔐 Authentication

Default API key: `demo-key-1`

**⚠️ IMPORTANT**: Change this in Space settings for production use!

## 📚 Available Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Service health check |
| `/metrics` | GET | Usage metrics |
| `/ai/chat` | POST | Multi-turn conversation |
| `/ai/query` | GET | Simple question answering |
| `/rag/query` | POST | Query with document retrieval |
| `/image/generate` | POST | Generate images (needs API key) |
| `/voice/synthesize` | POST | Text to speech (needs API key) |
| `/upload` | POST | Upload documents |

## ⚙️ Configuration

Configured with Ollama running **inside the Space** for true serverless deployment.

**Current Settings**:
- Model: Llama 2 (7B)
- Embedding Model: nomic-embed-text
- Hardware: See Space settings

## 🎯 Use Cases

- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant

## 📖 Documentation

Full API documentation: [See repository](https://github.com/your-username/ai-api-service)

## 💡 Tips

1. **First request is slow** - Ollama loads the model on first use (~30 seconds)
2. **Subsequent requests are fast** - Model stays in memory
3. **Use persistent hardware** - Upgrade from CPU to GPU for better performance
4. **Monitor costs** - Free tier works great for testing, upgrade for production

## 🆘 Support

Having issues? Check the logs or open an issue on GitHub.

---

Built with [Encore.ts](https://encore.dev) and [Ollama](https://ollama.ai)
````

**Save the file**:
- Notepad: File → Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
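
Before pushing, you can sanity-check the settings block at the top of the README. The sketch below is not a full YAML parser; it assumes the block contains only simple `key: value` lines like the ones above, and both function names are illustrative.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse the simple `key: value` block between the leading '---' lines."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    settings = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return settings
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return {}  # closing '---' was never found


def check_space_config(settings: dict) -> list:
    """Return a list of problems with the two settings this guide relies on."""
    problems = []
    if settings.get("sdk") != "docker":
        problems.append("sdk must be 'docker'")
    if settings.get("app_port") != "7860":
        problems.append("app_port must be 7860")
    return problems
```

For example, `check_space_config(read_front_matter(open("README.md", encoding="utf-8").read()))` should return an empty list for the README above.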

---

## 🔐 **PART 7: Configure Environment Variables in Space Settings**

### **Step 7.1: Go to Your Space Settings**

1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to **"Variables and secrets"** section

### **Step 7.2: Add Environment Variables**

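Click **"New secret"** for the API keys (Variables 1 and 2) and **"New variable"** for the rest. Secrets stay hidden, while variables on a public Space can be read by anyone who visits it. Both are passed to your app as environment variables.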

#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
  - ⚠️ **IMPORTANT**: Replace with your own random keys!
  - Use strong, random strings (20+ characters)
  - Separate multiple keys with commas (no spaces)
- Click **"Save"**
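
One way to get strong random keys is Python's standard `secrets` module. This is a minimal sketch; `make_api_keys` is an illustrative helper, and the key length is an arbitrary choice that comfortably exceeds the 20 characters suggested above.

```python
import secrets


def make_api_keys(count: int = 2, nbytes: int = 24) -> str:
    """Return `count` random URL-safe keys joined by commas (no spaces)."""
    return ",".join(secrets.token_urlsafe(nbytes) for _ in range(count))


# Paste the printed line into the API_KEYS value field
print(make_api_keys())
```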

#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
  - ⚠️ Make this DIFFERENT from regular API keys
  - This bypasses rate limits
- Click **"Save"**

#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
  - `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
  - `llama2:latest` (Good quality - 4GB)
  - `llama3:latest` (Best quality - 4.7GB - needs GPU)
  - `mistral:latest` (Very good - 4GB)
- Click **"Save"**

**Recommendation for FREE tier**: Use `phi:latest`

#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
  - Leave as is, this works great for RAG
- Click **"Save"**

#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
  - This means 100 requests per minute for regular API keys
- Click **"Save"**
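
To make "100 requests per minute" concrete, here is a sketch of a sliding-window limiter. It is an assumption about how such a limit is commonly enforced, not the service's actual implementation; the class name and its parameters are illustrative.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # api key -> timestamps of recent requests

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        recent = self.hits[api_key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # caller would answer "Rate limit exceeded"
        recent.append(now)
        return True
```

Each key gets its own counter, so one busy client does not use up another client's allowance.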

#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**

### **Step 7.3: Verify Your Variables**

You should now see these variables listed:
- ✅ `API_KEYS`
- ✅ `ADMIN_API_KEYS` (if you added it)
- ✅ `OLLAMA_MODEL`
- ✅ `OLLAMA_EMBEDDING_MODEL`
- ✅ `RATE_LIMIT_DEFAULT`
- ✅ `LOG_LEVEL` (if you added it)

---

## 📤 **PART 8: Push Code to Hugging Face**

Now we'll upload all the files to Hugging Face.

### **Step 8.1: Configure Git (First Time Only)**

In your terminal (inside the `ai-api-ollama` folder):

```bash
git config user.email "you@example.com"
git config user.name "Your Name"
```

Replace with your actual email and name.

### **Step 8.2: Add All Files to Git**

```bash
git add .
```

The `.` means "add all files in this folder"

### **Step 8.3: Commit the Files**

```bash
git commit -m "Initial deployment with Ollama support"
```

You should see output like:
```
[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)
```

### **Step 8.4: Push to Hugging Face**

```bash
git push
```

When prompted for credentials:
- **Username**: Your Hugging Face username
- **Password**: Your Hugging Face token (starts with `hf_`)

You'll see:
```
Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.
```

✅ **Success!** Your code is now on Hugging Face.

---

## ⏳ **PART 9: Wait for Build & Monitor Progress**

### **Step 9.1: Go to Your Space**

1. Open browser: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama`
2. You'll see a yellow "Building" status at the top

### **Step 9.2: Watch the Build Logs**

1. Click on the **"Logs"** tab (near the top)
2. You'll see real-time output like:
   ```
   Building Docker image...
   Step 1/15 : FROM node:18-alpine AS builder
   ...
   ```

### **Step 9.3: What to Expect (Timeline)**

| Time | What's Happening | What You'll See |
|------|------------------|-----------------|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | **Downloading Ollama model** | `Pulling model: phi:latest` ⏳ **LONGEST STEP** |
| 15+ min | Warming up model | `Warming up model...` |
| Final | **Space is RUNNING** | 🟢 Green "Running" status |

**Total time**: 15-20 minutes for first deployment

### **Step 9.4: Troubleshooting Build Errors**

If you see **red error messages**:

**Common Error 1**: `npm install failed`
- **Fix**: Check that `package.json` was copied correctly
- Re-run: `git add package.json && git commit -m "fix package.json" && git push`

**Common Error 2**: `Port 7860 already in use`
- **Fix**: This shouldn't happen, but if it does, check README.md has `app_port: 7860`

**Common Error 3**: `Model download timeout`
- **Fix**: Use a smaller model like `phi:latest` in environment variables
- Or upgrade to GPU hardware (see Part 11, Step 11.2)

**Common Error 4**: `Out of memory`
- **Fix**: Model too big for free CPU. Use `phi:latest` or upgrade to paid tier

### **Step 9.5: Verify Space is Running**

When build completes:
1. Status changes to 🟢 **"Running"**
2. You'll see in logs: `Starting AI API Service on port 7860...`
3. **Your API is now LIVE!**
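
Instead of refreshing the page, you could poll the `/health` endpoint until it reports `"healthy"`. The sketch below uses only the standard library; the function names, the 20-second delay, and the attempt count are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(fetch, attempts: int = 30, delay: float = 20.0, sleep=time.sleep) -> bool:
    """Call `fetch()` until it reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            pass  # Space still building, restarting, or not returning JSON yet
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Usage: `wait_until_healthy(lambda: fetch_health("https://YOUR_USERNAME-ai-api-ollama.hf.space"))` returns `True` once the Space answers, or `False` after the attempts are used up.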

---

## 🎉 **PART 10: Test Your Live API**

### **Step 10.1: Get Your Space URL**

Your API is available at:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```

**Example**:
```
https://johndoe-ai-api-ollama.hf.space
```
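
The two URLs used in this guide (the Space page and the API address) follow a pattern you can generate from your username and Space name. This sketch assumes the subdomain is simply `username-spacename` in lowercase with unusual characters turned into hyphens; `space_urls` is an illustrative helper.

```python
import re


def space_urls(username: str, space_name: str) -> dict:
    """Return the Space page URL and the public API base URL."""
    repo = f"https://huggingface.co/spaces/{username}/{space_name}"
    # Assumption: the subdomain is "<owner>-<space>" lowercased, with any
    # character that is not a letter, digit, or hyphen replaced by a hyphen.
    subdomain = re.sub(r"[^a-z0-9-]", "-", f"{username}-{space_name}".lower())
    return {"repo": repo, "api": f"https://{subdomain}.hf.space"}


print(space_urls("johndoe", "ai-api-ollama")["api"])
```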

### **Step 10.2: Test Health Endpoint**

**Option A: Use Browser**
1. Open your browser
2. Go to: `https://YOUR_USERNAME-ai-api-ollama.hf.space/health`
3. You should see JSON like:
   ```json
   {
     "status": "healthy",
     "version": "1.0.0",
     "services": [...]
   }
   ```

✅ If you see this, your API is working!

**Option B: Use Command Line**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```

### **Step 10.3: Test Chat Endpoint**

**Copy this command** (replace YOUR_USERNAME and use one of your API keys):

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'
```

**Expected response** (takes 5-30 seconds for first request):
```json
{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}
```

✅ **Success!** Your AI API is working!

### **Step 10.4: Test RAG Endpoint (Optional)**

First, upload a document:

```bash
# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt

# Convert to base64 on a single line (GNU base64 wraps at 76 characters, which would break the JSON)
base64 < test.txt | tr -d '\n' > test.txt.b64

# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"
```

Then query it:

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'
```

---

## 📊 **PART 11: Monitor and Optimize (Optional)**

### **Step 11.1: Check Metrics**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"
```

You'll see:
- Total requests
- Errors
- Response times
- Model usage

### **Step 11.2: Upgrade Hardware (If Needed)**

If your Space is slow or timing out:

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Space hardware"**
3. Click **"Change hardware"**
4. Select:
   - **CPU upgrade** (about $0.03/hr at the time of writing; the hardware picker shows current prices) - 2x faster than free
   - **GPU T4** ($0.60/hr) - 10x faster, supports bigger models
   - **GPU A10G** ($3.15/hr) - Best performance
5. Click **"Update Space"**
6. Space will restart with new hardware (~5 minutes)

### **Step 11.3: Use Bigger Models**

Once you have GPU:

1. Go to Settings → Variables and secrets
2. Edit `OLLAMA_MODEL`
3. Change to: `llama3:latest` or `mistral:latest`
4. Save
5. Space will restart and download new model

---

## 🔒 **PART 12: Security Best Practices**

### **Step 12.1: Change Default API Keys**

**⚠️ CRITICAL FOR PRODUCTION**

1. Go to Space Settings → Variables
2. Edit `API_KEYS`
3. Replace `demo-key-1` with strong random keys:
   ```
   ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5
   ```
4. **Never share these keys publicly!**

### **Step 12.2: Make Space Private (Optional)**

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Rename or change repo visibility"**
3. Click **"Make private"**
4. Confirm

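Now only you (and anyone you grant access) can see the Space. Be aware that Hugging Face also expects a Hugging Face access token on requests to a private Space's `hf.space` URL, so clients that hold only your API key will no longer be able to reach the API. Keep the Space public if outside apps need to call it.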

### **Step 12.3: Monitor Usage**

Check logs regularly:
1. Go to Space → Logs tab
2. Look for suspicious activity:
   - Many failed authentication attempts
   - Unusually high request volume
   - Error patterns

---

## 🎯 **PART 13: Using Your API in Applications**

### **Example: JavaScript/TypeScript Web App**

```javascript
// Save as: app.js

const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  
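  // Surface HTTP errors (401, 429, 5xx) instead of silently returning undefined
  if (!response.ok) {
    throw new Error(`API request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.reply;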
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});
```

### **Example: Python Application**

```python
# Save as: app.py

import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
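        },
        timeout=120,  # the first request can be slow while the model loads
    )
    response.raise_for_status()  # fail loudly on 401, 429, or 5xx responses
    return response.json()['reply']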

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')
```

### **Example: Mobile App (React Native)**

```javascript
// Save as: ChatService.js

const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });
    
    if (!response.ok) {
      throw new Error('API request failed');
    }
    
    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}
```

---

## 🆘 **PART 14: Troubleshooting Common Issues**

### **Issue 1: "Space is building for too long"**

**Symptoms**: Build takes 30+ minutes

**Causes**:
- Large model download (llama3 is 4.7GB)
- Slow internet on Hugging Face servers
- Free tier resource limits

**Solutions**:
1. Use smaller model: `phi:latest` (1.3GB)
2. Upgrade to GPU hardware for faster downloads
3. Wait patiently - first build is always slow

---

### **Issue 2: "Space crashed / Runtime error"**

**Symptoms**: Red "Runtime error" status

**Check logs for**:

**Error**: `Out of memory`
- **Cause**: Model too big for the hardware
- **Solution**: Use `phi:latest` or upgrade to GPU T4

**Error**: `Port 7860 already in use`
- **Cause**: Port setting in `README.md` is missing or wrong
- **Solution**: Make sure `README.md` contains `app_port: 7860`, then push again

**Error**: `Ollama failed to start`
- **Cause**: Dockerfile issue
- **Solution**: Verify that `Dockerfile.huggingface` was renamed to `Dockerfile`

---

### **Issue 3: "API returns 401 Unauthorized"**

**Symptoms**: 
```json
{"error": "Invalid API key"}
```

**Solutions**:
1. **Check your Authorization header**:
   ```bash
   # Correct format:
   -H "Authorization: Bearer my-secret-key-12345"
   
   # NOT:
   -H "Authorization: my-secret-key-12345"  # Missing "Bearer"
   ```

2. **Verify API key is in Space settings**:
   - Go to Settings → Variables
   - Check `API_KEYS` contains your key
   - Keys are case-sensitive!

3. **Try the default key**:
   ```bash
   -H "Authorization: Bearer demo-key-1"
   ```

---

### **Issue 4: "API is very slow (30+ seconds)"**

**Causes**:
- First request loads model into memory (normal)
- Free CPU tier is slow
- Model is too large for hardware

**Solutions**:
1. **First request is always slow** - subsequent requests are fast
2. **Upgrade to GPU T4**:
   - Settings → Space hardware → GPU T4
   - 10x faster inference
3. **Use smaller model**: `phi:latest`
4. **Add model warmup** (already in Dockerfile):
   - Keeps model loaded
   - Reduces cold start time

---

### **Issue 5: "Cannot upload documents"**

**Error**: `File too large`

**Fix**: 
- Default max size is 10MB
- To increase, add environment variable:
  ```
  MAX_FILE_SIZE_MB=50
  ```

**Error**: `Invalid file format`

**Fix**:
- Only supports: PDF, DOCX, TXT
- Ensure file extension is correct
- Check file is not corrupted
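
You can catch both errors before uploading with a quick local check. This sketch assumes the server treats 1 MB as 1024 × 1024 bytes and decides the format from the file extension; `upload_problem` is an illustrative helper, not part of the API.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def upload_problem(filename: str, size_bytes: int, max_mb: int = 10):
    """Return a reason the upload would likely be rejected, or None if it looks fine."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return "Invalid file format"
    if size_bytes > max_mb * 1024 * 1024:
        return "File too large"
    return None
```

For a real file, pass `Path(path).stat().st_size` as `size_bytes`, and set `max_mb` to match `MAX_FILE_SIZE_MB` if you changed it.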

---

### **Issue 6: "RAG returns no results"**

**Symptoms**: Empty `sources` array in response

**Causes**:
1. No documents uploaded yet
2. Query doesn't match document content
3. Embedding model not loaded

**Solutions**:
1. **Upload a document first**:
   ```bash
   curl -X POST https://YOUR_API/upload \
     -H "Authorization: Bearer YOUR_KEY" \
     -H "Content-Type: application/json" \
     -d '{"filename": "document.txt", "content_base64": "VGhpcyBpcyBhIHRlc3Q="}'
   ```

2. **Wait for processing** (check logs):
   ```
   Document processed successfully: doc_abc123
   ```

3. **Try broader query**:
   - Instead of: "What is the exact price?"
   - Try: "pricing information"

---

### **Issue 7: "How do I see errors?"**

**Steps**:
1. Go to your Space
2. Click **"Logs"** tab
3. Look for lines with:
   ```
   "level": "error"
   ```
4. Read the `"message"` field

**Common errors and fixes**:

```json
{"level":"error","message":"Invalid API key"}
```
→ Fix: Check Authorization header

```json
{"level":"error","message":"Rate limit exceeded"}
```
→ Fix: Wait 60 seconds or use admin key

```json
{"level":"error","message":"Ollama API error"}
```
→ Fix: Model not loaded, wait for startup to complete
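
If you copy the logs into a text file, a short script can tally the errors for you. The sketch assumes each application log line is a JSON object with `level` and `message` fields, as in the examples above; `count_errors` is an illustrative name.

```python
import json
from collections import Counter


def count_errors(log_lines) -> Counter:
    """Count error messages in JSON-formatted log lines, skipping anything else."""
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # not JSON, e.g. plain-text build output
        if isinstance(entry, dict) and entry.get("level") == "error":
            counts[entry.get("message", "<no message>")] += 1
    return counts
```

Calling `count_errors(open("logs.txt", encoding="utf-8"))` and then `.most_common(5)` on the result shows the five most frequent error messages.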

---

### **Issue 8: "Space keeps restarting"**

**Symptoms**: Status alternates between Building and Running

**Causes**:
- Application crashes on startup
- Out of memory
- Port configuration issue

**Debug steps**:
1. Check logs for crash reason
2. Verify environment variables are set
3. Try smaller model
4. Contact Hugging Face support if persistent

---

## 📖 **PART 15: Complete API Reference**

### **Base URL**
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```

### **Authentication**
All endpoints (except `/health`) require:
```
Authorization: Bearer YOUR_API_KEY
```

---

### **1. Health Check**

**Endpoint**: `GET /health`

**No authentication required**

**Example**:
```bash
curl https://YOUR_API/health
```

**Response**:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}
```

---

### **2. Metrics**

**Endpoint**: `GET /metrics`

**Requires authentication**

**Example**:
```bash
curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"
```

**Response**:
```json
{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
```
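
The raw counters are easier to read as rates. A minimal sketch, assuming the field names shown in the response above; `summarize_metrics` is an illustrative helper.

```python
def summarize_metrics(metrics: dict) -> dict:
    """Turn raw counters from /metrics into fractions of all requests."""
    total = metrics.get("requests_total", 0)

    def share(count):
        # Fraction of all requests; 0.0 when there has been no traffic yet
        return count / total if total else 0.0

    return {
        "error_rate": share(metrics.get("errors_total", 0)),
        "rate_limited": share(metrics.get("rate_limit_hits", 0)),
    }
```

With the sample response above, 5 errors out of 150 requests is an error rate of roughly 3%.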

---

### **3. Simple Chat**

**Endpoint**: `POST /ai/chat`

**Request**:
```json
{
  "conversation": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama2",
  "options": {
    "temperature": 0.7,
    "max_tokens": 500
  }
}
```

**Response**:
```json
{
  "reply": "Hello! How can I help you today?",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "sources": null
}
```

**Example**:
```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain AI in one sentence"}
    ]
  }'
```

---

### **4. Multi-turn Conversation**

**Endpoint**: `POST /ai/chat`

**Request** (with context):
```json
{
  "conversation": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 2+3?"}
  ]
}
```

**Response**:
```json
{
  "reply": "2+3 equals 5.",
  "model": "llama2",
  "usage": {...}
}
```
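
The request above shows that the API keeps no memory between calls: your client has to resend the whole conversation every time. A small sketch of how a client might track that history (the `Conversation` class is illustrative, not part of the service):

```python
class Conversation:
    """Keep the running message list that /ai/chat expects on every call."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        # Store the "reply" field from the previous response here
        self.messages.append({"role": "assistant", "content": text})

    def payload(self):
        """Return the JSON body for the next request."""
        return {"conversation": list(self.messages)}
```

After each response, call `add_assistant(reply)` before adding the next user message so the model sees the full context.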

---

### **5. RAG Query**

**Endpoint**: `POST /rag/query`

**Request**:
```json
{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}
```

**Response**:
```json
{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}
```

**Example**:
```bash
curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'
```

---

### **6. Upload Document**

**Endpoint**: `POST /upload`

**Request**:
```json
{
  "filename": "document.txt",
  "content_base64": "VGhpcyBpcyBhIHRlc3Q=",
  "metadata": {
    "title": "Test Document",
    "category": "docs"
  }
}
```

**Response**:
```json
{
  "doc_id": "doc_abc123",
  "filename": "document.txt",
  "size_bytes": 1024,
  "status": "processing",
  "estimated_chunks": 5
}
```

**Example (Linux/Mac)**:
```bash
# Encode file to base64 on a single line (GNU base64 wraps at 76 characters, which would break the JSON)
base64 < document.txt | tr -d '\n' > document.b64

# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"
```
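
If shell quoting gets awkward, the same request body can be built in Python, where `base64.b64encode` never inserts line breaks. This is a sketch of the body only; `build_upload_body` is an illustrative name, and the optional `title` mirrors the `metadata` field shown above.

```python
import base64
import json


def build_upload_body(filename: str, data: bytes, title=None) -> str:
    """Return the JSON request body for POST /upload."""
    payload = {
        "filename": filename,
        # b64encode never inserts line breaks, unlike some `base64` CLI defaults
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    if title:
        payload["metadata"] = {"title": title}
    return json.dumps(payload)
```

Send the returned string as the request body with the same `Authorization` and `Content-Type` headers as in the curl example.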

---

### **7. Get Document Sources**

**Endpoint**: `GET /docs/:id/sources`

**Example**:
```bash
curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"
```

**Response**:
```json
{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}
```

---

### **8. Simple Query**

**Endpoint**: `GET /ai/query?q=QUESTION`

**Example**:
```bash
curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"
```

**Response**:
```json
{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}
```
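
Because the question travels in the URL, spaces and symbols must be encoded (that is why the example uses `What+is+AI`). A short sketch using the standard library; `query_url` is an illustrative helper.

```python
from urllib.parse import urlencode


def query_url(base_url: str, question: str) -> str:
    """Build a /ai/query URL with the question safely encoded."""
    return f"{base_url}/ai/query?{urlencode({'q': question})}"


print(query_url("https://YOUR_API", "What is AI"))
```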

---

### **9. Get Available Models**

**Endpoint**: `GET /rag/models`

**Example**:
```bash
curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"
```

**Response**:
```json
{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}
```

---

## 🎓 **PART 16: Advanced Tips & Tricks**

### **Tip 1: Optimize Response Time**

**Add warmup requests** to keep model in memory:

Create a simple cron job or scheduled task:
```bash
# Every 5 minutes, make a request to keep the model loaded.
# A crontab entry must fit on one line; backslash continuations are not supported.
*/5 * * * * curl -X POST https://YOUR_API/ai/chat -H "Authorization: Bearer YOUR_KEY" -H "Content-Type: application/json" -d '{"conversation":[{"role":"user","content":"ping"}]}'
```

---

### **Tip 2: Use System Prompts for Consistency**

```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'
```

---

### **Tip 3: Batch Document Upload**

Upload multiple documents efficiently:

```bash
#!/usr/bin/env bash
# Save as: batch_upload.sh

for file in docs/*.txt; do
  echo "Uploading $file..."
  # Strip newlines: GNU base64 wraps long output, which would break the JSON
  base64 < "$file" | tr -d '\n' > temp.b64
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$(cat temp.b64)\"
    }"
  sleep 2  # Rate limiting
done

rm -f temp.b64
```

---

### **Tip 4: Monitor Costs**

If using paid hardware:

1. Check Hugging Face billing: https://huggingface.co/settings/billing
2. Set up budget alerts
3. Monitor Space uptime
4. Pause Space when not in use:
   - Settings → "Pause Space"
   - Saves money, stops billing
   - Resume anytime
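
To see how much pausing saves, multiply the hourly rate by the hours the Space actually runs. A quick sketch using the $0.60/hour GPU T4 rate quoted in this guide (current prices may differ; `estimated_cost` is an illustrative helper):

```python
def estimated_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Estimate the bill in dollars for `days` of usage."""
    return round(hourly_rate * hours_per_day * days, 2)


always_on = estimated_cost(0.60, 24)    # running around the clock
office_hours = estimated_cost(0.60, 8)  # paused outside an 8-hour window
print(always_on, office_hours)
```

At that rate, a Space left running all month costs about $432, while one that runs 8 hours a day costs about $144.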

---

### **Tip 5: Create API Key Tiers**

**In Space Settings**, set up different keys for different users:

```
# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2

# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1

# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1
```

Then adjust rate limits:
```
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000
```
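
How the service maps a key to its limit is not shown in this guide. The sketch below is one plausible reading of the variables above: check the most privileged tier first and fall back to the default tier. `limit_for_key` is a hypothetical helper, not the project's code.

```python
def limit_for_key(api_key: str, env: dict):
    """Return the per-minute limit for `api_key`, or None if the key is unknown."""
    tiers = [  # checked from most to least privileged
        ("ADMIN_API_KEYS", "RATE_LIMIT_ADMIN"),
        ("PREMIUM_API_KEYS", "RATE_LIMIT_PREMIUM"),
        ("API_KEYS", "RATE_LIMIT_DEFAULT"),
    ]
    for keys_var, limit_var in tiers:
        keys = [k.strip() for k in env.get(keys_var, "").split(",") if k.strip()]
        if api_key in keys:
            return int(env[limit_var])
    return None
```

In a real deployment `env` would be `os.environ`; passing a plain dictionary makes the lookup easy to try out locally.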

---

## ✅ **Final Checklist**

Before going live, verify:

- [ ] Space is running (green status)
- [ ] Health check returns `"status": "healthy"`
- [ ] Chat endpoint responds correctly
- [ ] Changed default API keys to strong random strings
- [ ] Tested with your own API key
- [ ] Documented your API keys securely (password manager)
- [ ] Set appropriate rate limits
- [ ] Chose right model for your hardware
- [ ] Tested all endpoints you plan to use
- [ ] Reviewed logs for errors
- [ ] (Optional) Upgraded hardware if needed
- [ ] (Optional) Made Space private if needed

---

## 🎉 **Congratulations!**

You now have:
✅ A fully functional AI API running on Hugging Face Spaces  
✅ Powered by Ollama (no OpenAI costs!)  
✅ Accessible from anywhere via HTTPS  
✅ Secure with API key authentication  
✅ Ready to integrate into your apps  

**Your API URL**:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```

**Share your API** (securely):
- Give URL + API key to developers
- Use in web apps, mobile apps, scripts
- Process millions of requests
- Scale as needed

---

## 📞 **Need Help?**

**If you're stuck**:
1. ✅ Re-read the relevant section
2. ✅ Check Space logs for errors
3. ✅ Try the troubleshooting section
4. ✅ Open an issue on GitHub
5. ✅ Ask on Hugging Face forums

**Common beginner mistakes**:
- Forgot to rename `Dockerfile.huggingface` to `Dockerfile`
- Used wrong API key format (missing "Bearer")
- Chose model too large for hardware
- Didn't wait for initial model download

---

## 📚 **What's Next?**

Now that your API is live:

1. **Build a chat interface**:
   - React app
   - Vue app
   - Mobile app
   - WordPress plugin

2. **Add more features**:
   - User accounts
   - Usage analytics
   - Custom models
   - Advanced RAG

3. **Scale up**:
   - Upgrade hardware
   - Add caching
   - Load balancing
   - CDN

4. **Monetize** (optional):
   - Charge for API access
   - Offer different tiers
   - White-label for clients

---

**You did it! 🎉🚀**

Your AI-powered API is now live and ready to change the world!