# Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces

*(Absolute Beginner-Friendly Guide)*

**What you'll build**: A fully working AI API running on Hugging Face Spaces that anyone can access over the internet, powered by Ollama (no OpenAI key needed).

**Time needed**: 30-45 minutes
**Cost**: FREE (or $0.60/hour for faster GPU)
**No prior experience needed!**
## **What You Need Before Starting**

- A Hugging Face account (we'll create this if you don't have one)
- Git installed on your computer
- Basic ability to copy/paste and follow instructions
- This project's code files (you already have these)
## **PART 1: Create Hugging Face Account & Space**

### **Step 1.1: Create Hugging Face Account (Skip if you have one)**

1. Open your web browser
2. Go to: https://huggingface.co/join
3. Fill in:
   - **Email**: Your email address
   - **Username**: Pick a username (you'll need this later - write it down!)
   - **Password**: Choose a strong password
4. Click **"Sign Up"**
5. Check your email and click the verification link
6. You're now logged into Hugging Face!
### **Step 1.2: Create a New Space**

1. Go to: https://huggingface.co/new-space (or click your profile picture and choose "New Space")
2. Fill in the form:

| Field | What to Enter |
|---|---|
| Owner | Your username (e.g., `yourname`) |
| Space name | `ai-api-ollama` (or anything you like) |
| License | Select "MIT" |
| Select the Space SDK | Click on **"Docker"** - ⚠️ IMPORTANT: Must be Docker! |
| Space hardware | Select "CPU basic - Free" for now (we'll upgrade later if needed) |
| Repo type | Leave as "Public" (or Private if you prefer) |

3. Click the **"Create Space"** button at the bottom

**IMPORTANT - Write down your Space URL**: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama` (replace `YOUR_USERNAME` with your actual username).

You'll see a page with instructions - ignore them for now, we'll do it differently.
## **PART 2: Install Git and Set Up Authentication**

### **Step 2.1: Check if Git is Installed**

**On Windows:**

1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Type: `git --version`
4. If you see a version number (like `git version 2.40.0`), you have Git
5. If you see an error, download Git from: https://git-scm.com/download/win

**On Mac:**

1. Press `Command + Space`
2. Type `terminal` and press Enter
3. Type: `git --version`
4. If you see a version number, you have Git
5. If not, it will prompt you to install Xcode Command Line Tools - click Install

**On Linux:**

```bash
git --version
```

If not installed:

```bash
sudo apt-get update
sudo apt-get install git
```
### **Step 2.2: Create Hugging Face Access Token**

1. Go to: https://huggingface.co/settings/tokens
2. Click the **"New token"** button
3. Fill in:
   - **Name**: `git-access` (or anything you like)
   - **Role**: Select "Write"
4. Click **"Generate token"**
5. **CRITICAL**: Copy the token and save it somewhere safe (Notepad, password manager)
   - It looks like: `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
   - ⚠️ You won't be able to see this again!
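Before moving on, you can optionally sanity-check the token from the terminal. This is a sketch using the Hub's `whoami-v2` API endpoint; the `hf_xxx...` value is a placeholder for your real token:

```bash
# Optional: verify the token is valid - this should print your account info as JSON.
curl -s -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  https://huggingface.co/api/whoami-v2
```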
## **PART 3: Clone Your Space to Your Computer**

### **Step 3.1: Open Terminal/Command Prompt**

**Windows:**

1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Navigate to where you want to work (e.g., Desktop):

```
cd Desktop
```

**Mac/Linux:**

1. Open Terminal
2. Navigate to where you want to work:

```bash
cd ~/Desktop
```
### **Step 3.2: Clone the Space Repository**

Copy this command (replace `YOUR_USERNAME` with your actual Hugging Face username):

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```

Example:

```bash
git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
```

Press Enter. When prompted for username and password:

- **Username**: Your Hugging Face username
- **Password**: Paste your token (NOT your password!) - the one that starts with `hf_`

You should see:

```
Cloning into 'ai-api-ollama'...
```

Verify the folder was created:

```bash
cd ai-api-ollama
ls
```

(On Windows use `dir` instead of `ls`.)
## **PART 4: Copy Project Files to Space**

### **Step 4.1: Locate Your AI API Service Files**

You should have the project files in a folder. Let's say they're in:

- **Windows**: `C:\Users\YourName\Downloads\ai-api-service\`
- **Mac/Linux**: `~/Downloads/ai-api-service/`
### **Step 4.2: Copy ALL Files to Space Folder**

**Option A: Using File Explorer (Easiest)**

**Windows:**

1. Open File Explorer
2. Navigate to your original `ai-api-service` folder
3. Press `Ctrl + A` to select all files
4. Press `Ctrl + C` to copy
5. Navigate to `Desktop\ai-api-ollama` (your Space folder)
6. Press `Ctrl + V` to paste
7. When asked about replacing files, click "Replace"

**Mac:**

1. Open Finder
2. Navigate to your original `ai-api-service` folder
3. Press `Cmd + A` to select all files
4. Press `Cmd + C` to copy
5. Navigate to `Desktop/ai-api-ollama` (your Space folder)
6. Press `Cmd + V` to paste

**Option B: Using Command Line**

From the terminal, in your Space folder:

**Windows:**

```
xcopy /E /I "C:\Users\YourName\Downloads\ai-api-service\*" .
```

**Mac/Linux:**

```bash
cp -r ~/Downloads/ai-api-service/* .
# Note: * doesn't match hidden files - copy those separately, e.g.:
cp ~/Downloads/ai-api-service/.env.example .
```
### **Step 4.3: Verify Files Were Copied**

In your terminal (inside the `ai-api-ollama` folder):

```bash
ls -a   # -a also lists hidden files like .env.example
```

You should see these folders/files:

- `backend/`
- `examples/`
- `tests/`
- `package.json`
- `README.md`
- `.env.example`
- `Dockerfile.huggingface`
- And many more files...

If you see these, you're good to proceed!
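If you'd rather let the shell check for you, here's a minimal sketch that tests for the key files named above:

```bash
# Report which of the expected files are present in the current folder.
for f in package.json README.md .env.example Dockerfile.huggingface; do
  [ -e "$f" ] && echo "OK: $f" || echo "MISSING: $f"
done
```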
## **PART 5: Prepare the Dockerfile for Hugging Face**

### **Step 5.1: Rename the Dockerfile**

Hugging Face expects a file named exactly `Dockerfile` (no extension).

**Windows Command Prompt:**

```
ren Dockerfile.huggingface Dockerfile
```

**Mac/Linux Terminal:**

```bash
mv Dockerfile.huggingface Dockerfile
```

### **Step 5.2: Verify the Dockerfile**

```bash
cat Dockerfile
```

You should see content starting with `FROM node:18-alpine AS builder`.

Good to go!
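If you only want to check the essentials rather than read the whole file, this quick sketch confirms the rename worked and the base image matches:

```bash
# The file must be named exactly "Dockerfile" and start with the expected base image.
ls Dockerfile
head -n 1 Dockerfile   # should print: FROM node:18-alpine AS builder
```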
## **PART 6: Create Space Configuration Files**

### **Step 6.1: Create README.md for Your Space**

This file tells Hugging Face how to run your Space.

Create a new file called `README.md` in your `ai-api-ollama` folder:

**Windows:**

```
notepad README.md
```

**Mac/Linux:**

```bash
nano README.md
```

Copy and paste this EXACT content (replace `YOUR_USERNAME`):
````markdown
---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI API Service with Ollama

A production-ready AI API service powered by Ollama. No OpenAI API key needed!

## Features

- **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- **RAG** - Retrieval-Augmented Generation with vector search
- **Image Generation** - Text-to-image (requires additional API key)
- **Voice Synthesis** - Text-to-speech (requires additional API key)
- **Document Processing** - Upload and query PDFs, DOCX, TXT
- **Authentication** - Secure API key-based access
- **Rate Limiting** - Prevent abuse

## API Endpoint

https://YOUR_USERNAME-ai-api-ollama.hf.space

## Quick Start

### Health Check

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```

### Chat Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```

### RAG Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```

## Authentication

Default API key: `demo-key-1`

⚠️ IMPORTANT: Change this in Space settings for production use!

## Available Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Service health check |
| `/metrics` | GET | Usage metrics |
| `/ai/chat` | POST | Multi-turn conversation |
| `/ai/query` | GET | Simple question answering |
| `/rag/query` | POST | Query with document retrieval |
| `/image/generate` | POST | Generate images (needs API key) |
| `/voice/synthesize` | POST | Text to speech (needs API key) |
| `/upload` | POST | Upload documents |

## Configuration

Configured with Ollama running inside the Space for true serverless deployment.

Current settings:

- Model: Llama 2 (7B)
- Embedding model: nomic-embed-text
- Hardware: See Space settings

## Use Cases

- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant

## Documentation

Full API documentation: See repository

## Tips

- **First request is slow** - Ollama loads the model on first use (~30 seconds)
- **Subsequent requests are fast** - Model stays in memory
- **Use persistent hardware** - Upgrade from CPU to GPU for better performance
- **Monitor costs** - Free tier works great for testing, upgrade for production

## Support

Having issues? Check the logs or open an issue on GitHub.

Built with Encore.ts and Ollama
````
**Save the file**:

- Notepad: File → Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
---
## **PART 7: Configure Environment Variables in Space Settings**
### **Step 7.1: Go to Your Space Settings**
1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to the **"Variables and secrets"** section
### **Step 7.2: Add Environment Variables**
Click **"New variable"** for each of these:
#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
- ⚠️ **IMPORTANT**: Replace with your own random keys!
- Use strong, random strings (20+ characters)
- Separate multiple keys with commas (no spaces)
- Click **"Save"**
#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
- ⚠️ Make this DIFFERENT from regular API keys
- This bypasses rate limits
- Click **"Save"**
#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
- `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
- `llama2:latest` (Good quality - 4GB)
- `llama3:latest` (Best quality - 4.7GB - needs GPU)
- `mistral:latest` (Very good - 4GB)
- Click **"Save"**
**Recommendation for FREE tier**: Use `phi:latest`
#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
- Leave this as-is; it works well for RAG
- Click **"Save"**
#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
- This means 100 requests per minute for regular API keys
- Click **"Save"**
#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**
### **Step 7.3: Verify Your Variables**

You should now see these variables listed:

- `API_KEYS`
- `ADMIN_API_KEYS` (if you added it)
- `OLLAMA_MODEL`
- `OLLAMA_EMBEDDING_MODEL`
- `RATE_LIMIT_DEFAULT`
---
## **PART 8: Push Code to Hugging Face**

Now we'll upload all the files to Hugging Face.

### **Step 8.1: Configure Git (First Time Only)**

In your terminal (inside the `ai-api-ollama` folder):

```bash
git config user.email "[email protected]"
git config user.name "Your Name"
```

Replace with your actual email and name.
### **Step 8.2: Add All Files to Git**

```bash
git add .
```

The `.` means "add all files in this folder".

### **Step 8.3: Commit the Files**

```bash
git commit -m "Initial deployment with Ollama support"
```

You should see output like:

```
[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)
```

### **Step 8.4: Push to Hugging Face**

```bash
git push
```

When prompted for credentials:

- **Username**: Your Hugging Face username
- **Password**: Your Hugging Face token (starts with `hf_`)

You'll see:

```
Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.
```

Success! Your code is now on Hugging Face.
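To avoid retyping the token on every push, you can optionally let Git remember it. A minimal sketch using Git's built-in credential store (note: it saves the token in plain text in `~/.git-credentials`):

```bash
# Cache credentials so future pushes don't prompt.
git config credential.helper store
git push   # enter your username and hf_ token once; Git remembers them after that
```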
## **PART 9: Wait for Build & Monitor Progress**

### **Step 9.1: Go to Your Space**

1. Open your browser: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama`
2. You'll see a yellow "Building" status at the top

### **Step 9.2: Watch the Build Logs**

1. Click on the **"Logs"** tab (near the top)
2. You'll see real-time output like:

```
Building Docker image...
Step 1/15 : FROM node:18-alpine AS builder
...
```
### **Step 9.3: What to Expect (Timeline)**

| Time | What's Happening | What You'll See |
|---|---|---|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | Downloading Ollama model | `Pulling model: phi:latest` (the longest step) |
| 15+ min | Warming up model | `Warming up model...` |
| Final | Space is RUNNING | Green "Running" status |

Total time: 15-20 minutes for the first deployment.
### **Step 9.4: Troubleshooting Build Errors**

If you see red error messages:

**Common Error 1: npm install failed**

- Fix: Check that `package.json` was copied correctly
- Re-run: `git add package.json && git commit -m "fix package.json" && git push`

**Common Error 2: Port 7860 already in use**

- Fix: This shouldn't happen, but if it does, check that README.md has `app_port: 7860`

**Common Error 3: Model download timeout**

- Fix: Use a smaller model like `phi:latest` in the environment variables
- Or upgrade to GPU hardware (see Part 11)

**Common Error 4: Out of memory**

- Fix: Model too big for the free CPU. Use `phi:latest` or upgrade to a paid tier
### **Step 9.5: Verify Space is Running**

When the build completes:

1. Status changes to green "Running"
2. You'll see in the logs: `Starting AI API Service on port 7860...`
3. Your API is now LIVE!
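If you'd rather watch from the terminal than refresh the browser, a small polling loop like this works (a sketch; substitute your real Space URL):

```bash
# Poll the health endpoint every 30s until the API answers successfully.
until curl -sf https://YOUR_USERNAME-ai-api-ollama.hf.space/health >/dev/null; do
  echo "Not up yet, retrying in 30s..."
  sleep 30
done
echo "API is live!"
```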
## **PART 10: Test Your Live API**

### **Step 10.1: Get Your Space URL**

Your API is available at: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

Example: `https://johndoe-ai-api-ollama.hf.space`

### **Step 10.2: Test Health Endpoint**

**Option A: Use Browser**

1. Open your browser
2. Go to: `https://YOUR_USERNAME-ai-api-ollama.hf.space/health`
3. You should see JSON like:

```json
{"status": "healthy", "version": "1.0.0", "services": [...]}
```

If you see this, your API is working!

**Option B: Use Command Line**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### **Step 10.3: Test Chat Endpoint**

Copy this command (replace `YOUR_USERNAME` and use one of your API keys):

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'
```

Expected response (takes 5-30 seconds for the first request):

```json
{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}
```

Success! Your AI API is working!
### **Step 10.4: Test RAG Endpoint (Optional)**

First, upload a document:

```bash
# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt

# Convert to base64 (on Linux, add -w 0 so the output isn't line-wrapped)
base64 test.txt > test.txt.b64

# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"
```

Then query it:

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'
```
## **PART 11: Monitor and Optimize (Optional)**

### **Step 11.1: Check Metrics**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"
```

You'll see:

- Total requests
- Errors
- Response times
- Model usage
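The metrics come back as JSON, so piping them through `jq` makes them easier to read (assuming `jq` is installed):

```bash
# Pretty-print the metrics response.
curl -s https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345" | jq .
```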
### **Step 11.2: Upgrade Hardware (If Needed)**

If your Space is slow or timing out:

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Space hardware"**
3. Click **"Change hardware"**
4. Select:
   - **CPU upgrade** ($0.60/hr) - 2x faster than free
   - **GPU T4** ($0.60/hr) - 10x faster, supports bigger models
   - **GPU A10G** ($3.15/hr) - Best performance
5. Click **"Update Space"**
6. The Space will restart with the new hardware (~5 minutes)
### **Step 11.3: Use Bigger Models**

Once you have a GPU:

1. Go to Settings → Variables and secrets
2. Edit `OLLAMA_MODEL`
3. Change it to: `llama3:latest` or `mistral:latest`
4. Save
5. The Space will restart and download the new model
## **PART 12: Security Best Practices**

### **Step 12.1: Change Default API Keys**

⚠️ CRITICAL FOR PRODUCTION

1. Go to Space Settings → Variables
2. Edit `API_KEYS`
3. Replace `demo-key-1` with strong random keys, e.g.: `ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5`
4. Never share these keys publicly!
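If you need a quick way to produce strong random keys, `openssl` can generate them (a sketch; the `ak_live_` prefix is just the naming convention from the example above, not required):

```bash
# Print two random 32-character hex keys, comma-separated (no spaces).
echo "ak_live_$(openssl rand -hex 16),ak_live_$(openssl rand -hex 16)"
```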
### **Step 12.2: Make Space Private (Optional)**

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Rename or change repo visibility"**
3. Click **"Make private"**
4. Confirm

Now only you can see the Space, but the API still works for anyone with the URL and an API key.
### **Step 12.3: Monitor Usage**

Check the logs regularly:

1. Go to Space → Logs tab
2. Look for suspicious activity:
   - Many failed authentication attempts
   - Unusually high request volume
   - Error patterns
## **PART 13: Using Your API in Applications**

### **Example: JavaScript/TypeScript Web App**

```javascript
// Save as: app.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  const data = await response.json();
  return data.reply;
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});
```
### **Example: Python Application**

```python
# Save as: app.py
import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
        }
    )
    return response.json()['reply']

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')
```
### **Example: Mobile App (React Native)**

```javascript
// Save as: ChatService.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });

    if (!response.ok) {
      throw new Error('API request failed');
    }

    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}
```
## **PART 14: Troubleshooting Common Issues**

### **Issue 1: "Space is building for too long"**

**Symptoms**: Build takes 30+ minutes

**Causes**:

- Large model download (llama3 is 4.7GB)
- Slow internet on Hugging Face servers
- Free tier resource limits

**Solutions**:

- Use a smaller model: `phi:latest` (1.3GB)
- Upgrade to GPU hardware for faster downloads
- Wait patiently - the first build is always slow
### **Issue 2: "Space crashed / Runtime error"**

**Symptoms**: Red "Runtime error" status

Check the logs for:

**`Error: Out of memory`**

- Cause: Model too big for the hardware
- Fix: Use `phi:latest` or upgrade to GPU T4

**`Error: Port 7860 already in use`**

- Cause: README.md is missing the correct `app_port: 7860`
- Fix: Edit README.md and push again

**`Error: Ollama failed to start`**

- Cause: Dockerfile issue
- Fix: Verify the Dockerfile was renamed correctly
### **Issue 3: "API returns 401 Unauthorized"**

**Symptoms**:

```json
{"error": "Invalid API key"}
```

**Solutions**:

1. Check your Authorization header:

   ```bash
   # Correct format:
   -H "Authorization: Bearer my-secret-key-12345"
   # NOT (missing "Bearer"):
   -H "Authorization: my-secret-key-12345"
   ```

2. Verify the API key is in Space settings:
   - Go to Settings → Variables
   - Check `API_KEYS` contains your key
   - Keys are case-sensitive!

3. Try the default key:

   ```bash
   -H "Authorization: Bearer demo-key-1"
   ```
### **Issue 4: "API is very slow (30+ seconds)"**

**Causes**:

- First request loads the model into memory (normal)
- Free CPU tier is slow
- Model is too large for the hardware

**Solutions**:

- The first request is always slow - subsequent requests are fast
- Upgrade to GPU T4 (Settings → Space hardware → GPU T4) for ~10x faster inference
- Use a smaller model: `phi:latest`
- Add model warmup (already in the Dockerfile): keeps the model loaded and reduces cold-start time
### **Issue 5: "Cannot upload documents"**

**Error: File too large**

- Default max size is 10MB
- To increase it, add the environment variable `MAX_FILE_SIZE_MB=50`

**Error: Invalid file format**

- Only PDF, DOCX, and TXT are supported
- Ensure the file extension is correct
- Check that the file is not corrupted
### **Issue 6: "RAG returns no results"**

**Symptoms**: Empty `sources` array in the response

**Causes**:

- No documents uploaded yet
- Query doesn't match document content
- Embedding model not loaded

**Solutions**:

1. Upload a document first:

   ```bash
   curl -X POST https://YOUR_API/upload \
     -H "Authorization: Bearer YOUR_KEY" \
     -F "[email protected]"
   ```

   (If your deployment only accepts the JSON/base64 format shown in Step 10.4, use that instead.)

2. Wait for processing (check the logs): `Document processed successfully: doc_abc123`

3. Try a broader query:
   - Instead of: "What is the exact price?"
   - Try: "pricing information"
### **Issue 7: "How do I see errors?"**

Steps:

1. Go to your Space
2. Click the **"Logs"** tab
3. Look for lines with: `"level": "error"`
4. Read the `"message"` field

Common errors and fixes:

- `{"level":"error","message":"Invalid API key"}` → Check the Authorization header
- `{"level":"error","message":"Rate limit exceeded"}` → Wait 60 seconds or use an admin key
- `{"level":"error","message":"Ollama API error"}` → Model not loaded; wait for startup to complete
### **Issue 8: "Space keeps restarting"**

**Symptoms**: Status alternates between Building and Running

**Causes**:

- Application crashes on startup
- Out of memory
- Port configuration issue

**Debug steps**:

1. Check the logs for the crash reason
2. Verify the environment variables are set
3. Try a smaller model
4. Contact Hugging Face support if it persists
## **PART 15: Complete API Reference**

**Base URL**: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

**Authentication**: All endpoints (except `/health`) require the header `Authorization: Bearer YOUR_API_KEY`.

### **1. Health Check**

**Endpoint**: `GET /health`

No authentication required.

Example:

```bash
curl https://YOUR_API/health
```

Response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}
```
### **2. Metrics**

**Endpoint**: `GET /metrics`

Requires authentication.

Example:

```bash
curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
```
### **3. Simple Chat**

**Endpoint**: `POST /ai/chat`

Request:

```json
{
  "conversation": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama2",
  "options": {
    "temperature": 0.7,
    "max_tokens": 500
  }
}
```

Response:

```json
{
  "reply": "Hello! How can I help you today?",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "sources": null
}
```

Example:

```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain AI in one sentence"}
    ]
  }'
```
### **4. Multi-turn Conversation**

**Endpoint**: `POST /ai/chat`

Request (with context):

```json
{
  "conversation": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 2+3?"}
  ]
}
```

Response:

```json
{
  "reply": "2+3 equals 5.",
  "model": "llama2",
  "usage": {...}
}
```
### **5. RAG Query**

**Endpoint**: `POST /rag/query`

Request:

```json
{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}
```

Response:

```json
{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}
```

Example:

```bash
curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'
```
### **6. Upload Document**

**Endpoint**: `POST /upload`

Request:

```json
{
  "filename": "document.txt",
  "content_base64": "VGhpcyBpcyBhIHRlc3Q=",
  "metadata": {
    "title": "Test Document",
    "category": "docs"
  }
}
```

Response:

```json
{
  "doc_id": "doc_abc123",
  "filename": "document.txt",
  "size_bytes": 1024,
  "status": "processing",
  "estimated_chunks": 5
}
```

Example (Linux/Mac):

```bash
# Encode file to base64 (on Linux, add -w 0 so the output isn't line-wrapped)
base64 document.txt > document.b64

# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"
```
### **7. Get Document Sources**

**Endpoint**: `GET /docs/:id/sources`

Example:

```bash
curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}
```
### **8. Simple Query**

**Endpoint**: `GET /ai/query?q=QUESTION`

Example:

```bash
curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}
```
### **9. Get Available Models**

**Endpoint**: `GET /rag/models`

Example:

```bash
curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}
```
## **PART 16: Advanced Tips & Tricks**

### **Tip 1: Optimize Response Time**

Add warmup requests to keep the model in memory. Create a simple cron job or scheduled task (note: a crontab entry must fit on a single line):

```bash
# Every 5 minutes, make a request to keep the model loaded
*/5 * * * * curl -s -X POST https://YOUR_API/ai/chat -H "Authorization: Bearer YOUR_KEY" -H "Content-Type: application/json" -d '{"conversation":[{"role":"user","content":"ping"}]}'
```
### **Tip 2: Use System Prompts for Consistency**

```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'
```
### **Tip 3: Batch Document Upload**

Upload multiple documents efficiently:

```bash
#!/usr/bin/env bash
# Save as: batch_upload.sh - uploads every .txt file in docs/
for file in docs/*.txt; do
  echo "Uploading $file..."
  base64 "$file" > temp.b64   # on Linux, use: base64 -w 0 "$file" > temp.b64
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$(cat temp.b64)\"
    }"
  sleep 2  # crude rate limiting
done
rm temp.b64
```
### **Tip 4: Monitor Costs**

If using paid hardware:

- Check Hugging Face billing: https://huggingface.co/settings/billing
- Set up budget alerts
- Monitor Space uptime
- Pause the Space when not in use:
  - Settings → "Pause Space"
  - Saves money, stops billing
  - Resume anytime
### **Tip 5: Create API Key Tiers**

In Space Settings, set up different keys for different users:

```
# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2

# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1

# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1
```

Then adjust the rate limits:

```
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000
```
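To confirm a tier's limit actually kicks in, you can hammer the API from a shell loop and tally the status codes. A sketch, assuming the service answers rate-limited calls with HTTP 429 and that `free_user_key_1` is one of your free-tier keys:

```bash
# Send 70 requests against a 60/min limit and count the response codes.
for i in $(seq 1 70); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    "https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/query?q=ping" \
    -H "Authorization: Bearer free_user_key_1"
done | sort | uniq -c   # expect mostly 200s, then 429s once the limit is hit
```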
## **Final Checklist**

Before going live, verify (a quick smoke-test script for the first few items follows this list):

- Space is running (green status)
- Health check returns `"status": "healthy"`
- Chat endpoint responds correctly
- Changed default API keys to strong random strings
- Tested with your own API key
- Documented your API keys securely (password manager)
- Set appropriate rate limits
- Chose the right model for your hardware
- Tested all endpoints you plan to use
- Reviewed the logs for errors
- (Optional) Upgraded hardware if needed
- (Optional) Made the Space private if needed
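Here's a minimal smoke-test sketch for the first three checklist items (substitute your real URL and key):

```bash
#!/usr/bin/env bash
# Smoke test: health check, then a one-message chat.
BASE=https://YOUR_USERNAME-ai-api-ollama.hf.space
KEY=my-secret-key-12345

echo "1) Health check:"
curl -sf "$BASE/health" || { echo "health check failed"; exit 1; }

echo "2) Chat endpoint:"
curl -sf -X POST "$BASE/ai/chat" \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"conversation":[{"role":"user","content":"ping"}]}' || { echo "chat failed"; exit 1; }

echo "All smoke tests passed."
```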
## **Congratulations!**

You now have:

- A fully functional AI API running on Hugging Face Spaces
- Powered by Ollama (no OpenAI costs!)
- Accessible from anywhere via HTTPS
- Secured with API key authentication
- Ready to integrate into your apps

Your API URL: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

Share your API (securely):

- Give the URL + an API key to developers
- Use it in web apps, mobile apps, and scripts
- Process millions of requests
- Scale as needed
## **Need Help?**

If you're stuck:

1. Re-read the relevant section
2. Check the Space logs for errors
3. Try the troubleshooting section
4. Open an issue on GitHub
5. Ask on the Hugging Face forums

Common beginner mistakes:

- Forgot to rename `Dockerfile.huggingface` to `Dockerfile`
- Used the wrong API key format (missing "Bearer")
- Chose a model too large for the hardware
- Didn't wait for the initial model download
## **What's Next?**

Now that your API is live:

**Build a chat interface**:

- React app
- Vue app
- Mobile app
- WordPress plugin

**Add more features**:

- User accounts
- Usage analytics
- Custom models
- Advanced RAG

**Scale up**:

- Upgrade hardware
- Add caching
- Load balancing
- CDN

**Monetize (optional)**:

- Charge for API access
- Offer different tiers
- White-label for clients

You did it! Your AI-powered API is now live and ready to change the world!