
Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces

(Absolute Beginner-Friendly Guide)

What you'll build: A fully working AI API running on Hugging Face Spaces that anyone can access via the internet, powered by Ollama (no OpenAI key needed).

Time needed: 30-45 minutes
Cost: FREE (or $0.60/hour for faster GPU)
No prior experience needed!


πŸ“‹ What You Need Before Starting

  1. βœ… A Hugging Face account (we'll create this if you don't have one)
  2. βœ… Git installed on your computer
  3. βœ… Basic ability to copy/paste and follow instructions
  4. βœ… This project's code files (you already have these)

🎯 PART 1: Create Hugging Face Account & Space

Step 1.1: Create Hugging Face Account (Skip if you have one)

  1. Open your web browser
  2. Go to: https://huggingface.co/join
  3. Fill in:
    • Email: Your email address
    • Username: Pick a username (you'll need this later - write it down!)
    • Password: Choose a strong password
  4. Click "Sign Up"
  5. Check your email and click the verification link
  6. You're now logged into Hugging Face!

Step 1.2: Create a New Space

  1. Go to: https://huggingface.co/new-space

  2. Fill in the form:

    | Field | What to Enter |
    |-------|---------------|
    | Owner | Your username (e.g. `yourname`) |
    | Space name | `ai-api-ollama` (or anything you like) |
    | License | Select "MIT" |
    | Select the Space SDK | Click on "Docker" ⚠️ IMPORTANT: Must be Docker! |
    | Space hardware | Select "CPU basic - Free" for now (we'll upgrade later if needed) |
    | Repo type | Leave as "Public" (or Private if you prefer) |
  3. Click "Create Space" button at the bottom

  4. IMPORTANT - Write down your Space URL:

    https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
    

    Replace YOUR_USERNAME with your actual username.

  5. You'll see a page with instructions - ignore them for now, we'll do it differently.


πŸ”§ PART 2: Install Git and Set Up Authentication

Step 2.1: Check if Git is Installed

On Windows:

  1. Press Windows Key + R
  2. Type cmd and press Enter
  3. Type: git --version
  4. If you see a version number (like git version 2.40.0), you have Git βœ…
  5. If you see an error, download Git from: https://git-scm.com/download/win

On Mac:

  1. Press Command + Space
  2. Type terminal and press Enter
  3. Type: git --version
  4. If you see a version number, you have Git βœ…
  5. If not, it will prompt you to install Xcode Command Line Tools - click Install

On Linux:

git --version

If not installed:

sudo apt-get update
sudo apt-get install git

Step 2.2: Create Hugging Face Access Token

  1. Go to: https://huggingface.co/settings/tokens
  2. Click "New token" button
  3. Fill in:
    • Name: git-access (or anything you like)
    • Role: Select "Write"
  4. Click "Generate token"
  5. CRITICAL: Copy the token and save it somewhere safe (Notepad, password manager)
    • It looks like: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    • ⚠️ You won't be able to see this again!

πŸ’» PART 3: Clone Your Space to Your Computer

Step 3.1: Open Terminal/Command Prompt

Windows:

  1. Press Windows Key + R
  2. Type cmd and press Enter
  3. Navigate to where you want to work (e.g., Desktop):
    cd Desktop
    

Mac/Linux:

  1. Open Terminal
  2. Navigate to where you want to work:
    cd ~/Desktop
    

Step 3.2: Clone the Space Repository

  1. Copy this command (replace YOUR_USERNAME with your actual Hugging Face username):

    git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
    
  2. Example:

    git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
    
  3. Press Enter

  4. When prompted for username and password:

    • Username: Your Hugging Face username
    • Password: Paste your token (NOT your password!) - the one that starts with hf_
  5. You should see:

    Cloning into 'ai-api-ollama'...
    
  6. Verify the folder was created:

    cd ai-api-ollama
    ls
    

    (On Windows use dir instead of ls)


πŸ“‚ PART 4: Copy Project Files to Space

Step 4.1: Locate Your AI API Service Files

You should have the project files in a folder. Let's say they're in:

  • Windows: C:\Users\YourName\Downloads\ai-api-service\
  • Mac/Linux: ~/Downloads/ai-api-service/

Step 4.2: Copy ALL Files to Space Folder

Option A: Using File Explorer (Easiest)

Windows:

  1. Open File Explorer
  2. Navigate to your original ai-api-service folder
  3. Press Ctrl + A to select all files
  4. Press Ctrl + C to copy
  5. Navigate to Desktop\ai-api-ollama (your Space folder)
  6. Press Ctrl + V to paste
  7. When asked about replacing files, click "Replace"

Mac:

  1. Open Finder
  2. Navigate to your original ai-api-service folder
  3. Press Cmd + A to select all files
  4. Press Cmd + C to copy
  5. Navigate to Desktop/ai-api-ollama (your Space folder)
  6. Press Cmd + V to paste

Option B: Using Command Line

From the terminal, in your Space folder:

Windows:

xcopy /E /I "C:\Users\YourName\Downloads\ai-api-service\*" .

Mac/Linux:

cp -r ~/Downloads/ai-api-service/* .

Step 4.3: Verify Files Were Copied

In your terminal (inside the ai-api-ollama folder):

ls

You should see these folders/files:

  • backend/
  • examples/
  • tests/
  • package.json
  • README.md
  • .env.example
  • Dockerfile.huggingface
  • And many more files...

βœ… If you see these, you're good to proceed!
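Instead of eyeballing the `ls` output, a small check can confirm the key files are present. A sketch (adjust the list to match your actual project files):

```bash
# List which expected files/folders are present in the current directory
check_files() {
  for f in package.json README.md Dockerfile.huggingface backend; do
    if [ -e "$f" ]; then echo "OK: $f"; else echo "MISSING: $f"; fi
  done
}
check_files
```

Any `MISSING:` line means a file didn't get copied; redo Step 4.2 for it.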


🐳 PART 5: Prepare the Dockerfile for Hugging Face

Step 5.1: Rename the Dockerfile

Hugging Face expects a file named exactly Dockerfile (no extension).

Windows Command Prompt:

ren Dockerfile.huggingface Dockerfile

Mac/Linux Terminal:

mv Dockerfile.huggingface Dockerfile

Step 5.2: Verify the Dockerfile

cat Dockerfile

You should see content starting with FROM node:18-alpine AS builder

βœ… Good to go!


πŸ“ PART 6: Create Space Configuration Files

Step 6.1: Create README.md for Your Space

This file tells Hugging Face how to run your Space.

Create a new file called README.md in your ai-api-ollama folder:

Windows:

notepad README.md

Mac/Linux:

nano README.md

Copy and paste this EXACT content (replace YOUR_USERNAME):

---
title: AI API Service with Ollama
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI API Service with Ollama

A production-ready AI API service powered by Ollama. No OpenAI API key needed!

## πŸš€ Features

- πŸ’¬ **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- πŸ“š **RAG** - Retrieval-Augmented Generation with vector search
- πŸ–ΌοΈ **Image Generation** - Text-to-image (requires additional API key)
- πŸŽ™οΈ **Voice Synthesis** - Text-to-speech (requires additional API key)
- πŸ“„ **Document Processing** - Upload and query PDFs, DOCX, TXT
- πŸ”’ **Authentication** - Secure API key-based access
- ⚑ **Rate Limiting** - Prevent abuse

## πŸ“‘ API Endpoint

https://YOUR_USERNAME-ai-api-ollama.hf.space


## πŸ”‘ Quick Start

### Health Check

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```

### Chat Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```

### RAG Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```

## πŸ” Authentication

Default API key: `demo-key-1`

⚠️ IMPORTANT: Change this in Space settings for production use!

## πŸ“š Available Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Service health check |
| /metrics | GET | Usage metrics |
| /ai/chat | POST | Multi-turn conversation |
| /ai/query | GET | Simple question answering |
| /rag/query | POST | Query with document retrieval |
| /image/generate | POST | Generate images (needs API key) |
| /voice/synthesize | POST | Text to speech (needs API key) |
| /upload | POST | Upload documents |

## βš™οΈ Configuration

Configured with Ollama running inside the Space for true serverless deployment.

Current Settings:

- Model: Llama 2 (7B)
- Embedding Model: nomic-embed-text
- Hardware: See Space settings

## 🎯 Use Cases

- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant

## πŸ“– Documentation

Full API documentation: See repository

## πŸ’‘ Tips

1. **First request is slow** - Ollama loads the model on first use (~30 seconds)
2. **Subsequent requests are fast** - Model stays in memory
3. **Use persistent hardware** - Upgrade from CPU to GPU for better performance
4. **Monitor costs** - Free tier works great for testing, upgrade for production

## πŸ†˜ Support

Having issues? Check the logs or open an issue on GitHub.

---

Built with Encore.ts and Ollama


**Save the file**:
- Notepad: File β†’ Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
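Before pushing, it's worth verifying the front matter landed correctly, since a malformed first line or a missing `app_port` is a common cause of build failures. A quick POSIX-shell sketch:

```bash
# Verify a README starts with '---' and sets the port Hugging Face expects
check_frontmatter() {
  head -n 1 "$1" | grep -qx -- '---' && grep -q 'app_port: 7860' "$1"
}
check_frontmatter README.md && echo "front matter looks good" || echo "check README.md"
```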

---

## πŸ” **PART 7: Configure Environment Variables in Space Settings**

### **Step 7.1: Go to Your Space Settings**

1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to **"Variables and secrets"** section

### **Step 7.2: Add Environment Variables**

Click **"New variable"** for each of these:

#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
  - ⚠️ **IMPORTANT**: Replace with your own random keys!
  - Use strong, random strings (20+ characters)
  - Separate multiple keys with commas (no spaces)
- Click **"Save"**
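One way to produce strong random keys is with `openssl`, which is preinstalled on most systems (a sketch; any cryptographic random source works, and the `ak_` prefix is just a naming convention):

```bash
# Generate two random 32-hex-character API keys with an "ak_" prefix
KEY1="ak_$(openssl rand -hex 16)"
KEY2="ak_$(openssl rand -hex 16)"
echo "API_KEYS=${KEY1},${KEY2}"   # paste this value into the Space variable
```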

#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
  - ⚠️ Make this DIFFERENT from regular API keys
  - This bypasses rate limits
- Click **"Save"**

#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
  - `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
  - `llama2:latest` (Good quality - 4GB)
  - `llama3:latest` (Best quality - 4.7GB - needs GPU)
  - `mistral:latest` (Very good - 4GB)
- Click **"Save"**

**Recommendation for FREE tier**: Use `phi:latest`

#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
  - Leave as is, this works great for RAG
- Click **"Save"**

#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
  - This means 100 requests per minute for regular API keys
- Click **"Save"**

#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**

### **Step 7.3: Verify Your Variables**

You should now see these variables listed:
- βœ… `API_KEYS`
- βœ… `ADMIN_API_KEYS` (if you added it)
- βœ… `OLLAMA_MODEL`
- βœ… `OLLAMA_EMBEDDING_MODEL`
- βœ… `RATE_LIMIT_DEFAULT`

---

## πŸ“€ **PART 8: Push Code to Hugging Face**

Now we'll upload all the files to Hugging Face.

### **Step 8.1: Configure Git (First Time Only)**

In your terminal (inside the `ai-api-ollama` folder):

```bash
git config user.email "[email protected]"
git config user.name "Your Name"
```

Replace with your actual email and name.

Step 8.2: Add All Files to Git

git add .

The . means "add all files in this folder"

Step 8.3: Commit the Files

git commit -m "Initial deployment with Ollama support"

You should see output like:

[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)

Step 8.4: Push to Hugging Face

git push

When prompted for credentials:

  • Username: Your Hugging Face username
  • Password: Your Hugging Face token (starts with hf_)

You'll see:

Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.

βœ… Success! Your code is now on Hugging Face.


⏳ PART 9: Wait for Build & Monitor Progress

Step 9.1: Go to Your Space

  1. Open browser: https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
  2. You'll see a yellow "Building" status at the top

Step 9.2: Watch the Build Logs

  1. Click on the "Logs" tab (near the top)
  2. You'll see real-time output like:
    Building Docker image...
    Step 1/15 : FROM node:18-alpine AS builder
    ...
    

Step 9.3: What to Expect (Timeline)

| Time | What's Happening | What You'll See |
|------|------------------|-----------------|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | Downloading Ollama model | `Pulling model: phi:latest` ⏳ LONGEST STEP |
| 15+ min | Warming up model | `Warming up model...` |
| Final | Space is RUNNING | 🟒 Green "Running" status |

Total time: 15-20 minutes for first deployment

Step 9.4: Troubleshooting Build Errors

If you see red error messages:

Common Error 1: npm install failed

  • Fix: Check that package.json was copied correctly
  • Re-run: git add package.json && git commit -m "fix package.json" && git push

Common Error 2: Port 7860 already in use

  • Fix: This shouldn't happen, but if it does, check README.md has app_port: 7860

Common Error 3: Model download timeout

  • Fix: Use a smaller model like phi:latest in environment variables
  • Or upgrade to GPU hardware (see Part 10)

Common Error 4: Out of memory

  • Fix: Model too big for free CPU. Use phi:latest or upgrade to paid tier

Step 9.5: Verify Space is Running

When build completes:

  1. Status changes to 🟒 "Running"
  2. You'll see in logs: Starting AI API Service on port 7860...
  3. Your API is now LIVE!

πŸŽ‰ PART 10: Test Your Live API

Step 10.1: Get Your Space URL

Your API is available at:

https://YOUR_USERNAME-ai-api-ollama.hf.space

Example:

https://johndoe-ai-api-ollama.hf.space
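The URL follows the Spaces convention `username-spacename.hf.space`, so you can derive it yourself (a sketch; dots and underscores in names are normalized to dashes, so check your Space page if unsure):

```bash
# Derive the public API URL from username and Space name
HF_USERNAME="johndoe"        # hypothetical username
SPACE_NAME="ai-api-ollama"
API_URL="https://${HF_USERNAME}-${SPACE_NAME}.hf.space"
echo "$API_URL"
```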

Step 10.2: Test Health Endpoint

Option A: Use Browser

  1. Open your browser
  2. Go to: https://YOUR_USERNAME-ai-api-ollama.hf.space/health
  3. You should see JSON like:
    {
      "status": "healthy",
      "version": "1.0.0",
      "services": [...]
    }
    

βœ… If you see this, your API is working!

Option B: Use Command Line

curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health

Step 10.3: Test Chat Endpoint

Copy this command (replace YOUR_USERNAME and use one of your API keys):

curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'

Expected response (takes 5-30 seconds for first request):

{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}

βœ… Success! Your AI API is working!
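If you only want the reply text out of that JSON, a quick-and-dirty `sed` sketch works without installing anything (it assumes no space after the colon and no escaped quotes in the reply; prefer `jq` for anything more serious):

```bash
# Extract the "reply" field from a chat response (sample payload shown)
RESPONSE='{"reply":"Hello! How can I help?","model":"llama2"}'
REPLY=$(printf '%s' "$RESPONSE" | sed -n 's/.*"reply":"\([^"]*\)".*/\1/p')
echo "$REPLY"
```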

Step 10.4: Test RAG Endpoint (Optional)

First, upload a document:

# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt

# Convert to base64 (strip newlines - GNU base64 wraps long output,
# which would corrupt the JSON payload below)
base64 test.txt | tr -d '\n' > test.txt.b64

# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"

Then query it:

curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'

πŸ“Š PART 11: Monitor and Optimize (Optional)

Step 11.1: Check Metrics

curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"

You'll see:

  • Total requests
  • Errors
  • Response times
  • Model usage

Step 11.2: Upgrade Hardware (If Needed)

If your Space is slow or timing out:

  1. Go to: https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings
  2. Scroll to "Space hardware"
  3. Click "Change hardware"
  4. Select:
    • CPU upgrade ($0.60/hr) - 2x faster than free
    • GPU T4 ($0.60/hr) - 10x faster, supports bigger models
    • GPU A10G ($3.15/hr) - Best performance
  5. Click "Update Space"
  6. Space will restart with new hardware (~5 minutes)

Step 11.3: Use Bigger Models

Once you have GPU:

  1. Go to Settings β†’ Variables and secrets
  2. Edit OLLAMA_MODEL
  3. Change to: llama3:latest or mistral:latest
  4. Save
  5. Space will restart and download new model

πŸ”’ PART 12: Security Best Practices

Step 12.1: Change Default API Keys

⚠️ CRITICAL FOR PRODUCTION

  1. Go to Space Settings β†’ Variables
  2. Edit API_KEYS
  3. Replace demo-key-1 with strong random keys:
    ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5
    
  4. Never share these keys publicly!

Step 12.2: Make Space Private (Optional)

  1. Go to: https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings
  2. Scroll to "Rename or change repo visibility"
  3. Click "Make private"
  4. Confirm

Now only you can see the Space, but the API still works for anyone with the URL and API key.

Step 12.3: Monitor Usage

Check logs regularly:

  1. Go to Space β†’ Logs tab
  2. Look for suspicious activity:
    • Many failed authentication attempts
    • Unusually high request volume
    • Error patterns

🎯 PART 13: Using Your API in Applications

Example: JavaScript/TypeScript Web App

// Save as: app.js

const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  
  const data = await response.json();
  return data.reply;
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});

Example: Python Application

# Save as: app.py

import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
        },
        timeout=120  # first request can be slow while the model loads
    )
    response.raise_for_status()
    return response.json()['reply']

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')

Example: Mobile App (React Native)

// Save as: ChatService.js

const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });
    
    if (!response.ok) {
      throw new Error('API request failed');
    }
    
    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}

πŸ†˜ PART 14: Troubleshooting Common Issues

Issue 1: "Space is building for too long"

Symptoms: Build takes 30+ minutes

Causes:

  • Large model download (llama3 is 4.7GB)
  • Slow internet on Hugging Face servers
  • Free tier resource limits

Solutions:

  1. Use smaller model: phi:latest (1.3GB)
  2. Upgrade to GPU hardware for faster downloads
  3. Wait patiently - first build is always slow

Issue 2: "Space crashed / Runtime error"

Symptoms: Red "Runtime error" status

Check logs for:

Error: Out of memory

  • Fix: Model too big for hardware
  • Solution: Use phi:latest or upgrade to GPU T4

Error: Port 7860 already in use

  • Fix: Check README.md has correct app_port: 7860
  • Solution: Edit README.md and push again

Error: Ollama failed to start

  • Fix: Dockerfile issue
  • Solution: Verify Dockerfile was renamed correctly

Issue 3: "API returns 401 Unauthorized"

Symptoms:

{"error": "Invalid API key"}

Solutions:

  1. Check your Authorization header:

    # Correct format:
    -H "Authorization: Bearer my-secret-key-12345"
    
    # NOT:
    -H "Authorization: my-secret-key-12345"  # Missing "Bearer"
    
  2. Verify API key is in Space settings:

    • Go to Settings β†’ Variables
    • Check API_KEYS contains your key
    • Keys are case-sensitive!
  3. Try the default key:

    -H "Authorization: Bearer demo-key-1"
    

Issue 4: "API is very slow (30+ seconds)"

Causes:

  • First request loads model into memory (normal)
  • Free CPU tier is slow
  • Model is too large for hardware

Solutions:

  1. First request is always slow - subsequent requests are fast
  2. Upgrade to GPU T4:
    • Settings β†’ Space hardware β†’ GPU T4
    • 10x faster inference
  3. Use smaller model: phi:latest
  4. Add model warmup (already in Dockerfile):
    • Keeps model loaded
    • Reduces cold start time

Issue 5: "Cannot upload documents"

Error: File too large

Fix:

  • Default max size is 10MB
  • To increase, add environment variable:
    MAX_FILE_SIZE_MB=50
    

Error: Invalid file format

Fix:

  • Only supports: PDF, DOCX, TXT
  • Ensure file extension is correct
  • Check file is not corrupted
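A quick pre-flight size check saves a failed upload. A POSIX-shell sketch (the temp file is a stand-in for your real document):

```bash
# Check a file is under the 10MB default limit before uploading
TMPF=$(mktemp)                    # stand-in for your real document
echo "sample content" > "$TMPF"
MAX_BYTES=$((10 * 1024 * 1024))   # 10MB default limit
SIZE=$(wc -c < "$TMPF")
if [ "$SIZE" -le "$MAX_BYTES" ]; then
  echo "OK to upload ($SIZE bytes)"
else
  echo "Too large: raise MAX_FILE_SIZE_MB or split the file"
fi
rm -f "$TMPF"
```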

Issue 6: "RAG returns no results"

Symptoms: Empty sources array in response

Causes:

  1. No documents uploaded yet
  2. Query doesn't match document content
  3. Embedding model not loaded

Solutions:

  1. Upload a document first:

    base64 mydocument.txt | tr -d '\n' > mydocument.b64
    curl -X POST https://YOUR_API/upload \
      -H "Authorization: Bearer YOUR_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"filename\": \"mydocument.txt\", \"content_base64\": \"$(cat mydocument.b64)\"}"
    
  2. Wait for processing (check logs):

    Document processed successfully: doc_abc123
    
  3. Try broader query:

    • Instead of: "What is the exact price?"
    • Try: "pricing information"

Issue 7: "How do I see errors?"

Steps:

  1. Go to your Space
  2. Click "Logs" tab
  3. Look for lines with:
    "level": "error"
    
  4. Read the "message" field

Common errors and fixes:

{"level":"error","message":"Invalid API key"}

β†’ Fix: Check Authorization header

{"level":"error","message":"Rate limit exceeded"}

β†’ Fix: Wait 60 seconds or use admin key

{"level":"error","message":"Ollama API error"}

β†’ Fix: Model not loaded, wait for startup to complete
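If your scripts occasionally hit the rate limit, a small retry-with-backoff helper avoids manual waiting. A sketch (the caller passes any command that prints an HTTP status code, e.g. using curl's standard `-w '%{http_code}'` flag):

```bash
# Retry a request up to 3 times when it returns HTTP 429 (rate limited)
request_with_retry() {
  attempt=1
  while [ "$attempt" -le 3 ]; do
    code=$("$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep $((attempt * 2))   # back off: 2s, then 4s
    attempt=$((attempt + 1))
  done
  echo "$code"
  return 1
}
# Usage:
#   code=$(request_with_retry curl -s -o /dev/null -w '%{http_code}' "$URL")
```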


Issue 8: "Space keeps restarting"

Symptoms: Status alternates between Building and Running

Causes:

  • Application crashes on startup
  • Out of memory
  • Port configuration issue

Debug steps:

  1. Check logs for crash reason
  2. Verify environment variables are set
  3. Try smaller model
  4. Contact Hugging Face support if persistent

πŸ“– PART 15: Complete API Reference

Base URL

https://YOUR_USERNAME-ai-api-ollama.hf.space

Authentication

All endpoints (except /health) require:

Authorization: Bearer YOUR_API_KEY

1. Health Check

Endpoint: GET /health

No authentication required

Example:

curl https://YOUR_API/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}

2. Metrics

Endpoint: GET /metrics

Requires authentication

Example:

curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"

Response:

{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
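Those counters are easy to turn into an error rate from the command line. A sketch using the field names from the response above (assumes POSIX `sed` and `awk`; `jq` would be cleaner if installed):

```bash
# Compute an error rate from the metrics payload (sample shown)
METRICS='{"requests_total":150,"errors_total":5}'
TOTAL=$(printf '%s' "$METRICS" | sed -n 's/.*"requests_total":\([0-9]*\).*/\1/p')
ERRORS=$(printf '%s' "$METRICS" | sed -n 's/.*"errors_total":\([0-9]*\).*/\1/p')
awk -v e="$ERRORS" -v t="$TOTAL" 'BEGIN { printf "error rate: %.1f%%\n", 100*e/t }'
```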

3. Simple Chat

Endpoint: POST /ai/chat

Request:

{
  "conversation": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama2",
  "options": {
    "temperature": 0.7,
    "max_tokens": 500
  }
}

Response:

{
  "reply": "Hello! How can I help you today?",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "sources": null
}

Example:

curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain AI in one sentence"}
    ]
  }'

4. Multi-turn Conversation

Endpoint: POST /ai/chat

Request (with context):

{
  "conversation": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 2+3?"}
  ]
}

Response:

{
  "reply": "2+3 equals 5.",
  "model": "llama2",
  "usage": {...}
}

5. RAG Query

Endpoint: POST /rag/query

Request:

{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}

Response:

{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}

Example:

curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'

6. Upload Document

Endpoint: POST /upload

Request:

{
  "filename": "document.txt",
  "content_base64": "VGhpcyBpcyBhIHRlc3Q=",
  "metadata": {
    "title": "Test Document",
    "category": "docs"
  }
}

Response:

{
  "doc_id": "doc_abc123",
  "filename": "document.txt",
  "size_bytes": 1024,
  "status": "processing",
  "estimated_chunks": 5
}

Example (Linux/Mac):

# Encode file to base64 (strip newlines - GNU base64 wraps long output,
# which would corrupt the JSON payload)
base64 document.txt | tr -d '\n' > document.b64

# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"

7. Get Document Sources

Endpoint: GET /docs/:id/sources

Example:

curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"

Response:

{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}

8. Simple Query

Endpoint: GET /ai/query?q=QUESTION

Example:

curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"

Response:

{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}

9. Get Available Models

Endpoint: GET /rag/models

Example:

curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"

Response:

{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}

πŸŽ“ PART 16: Advanced Tips & Tricks

Tip 1: Optimize Response Time

Add warmup requests to keep model in memory:

Create a simple cron job or scheduled task:

#!/bin/sh
# warmup.sh - make a tiny request to keep the model loaded
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"conversation":[{"role":"user","content":"ping"}]}'

# Crontab entry (cron entries must fit on a single line, so call the script):
# */5 * * * * /path/to/warmup.sh

Tip 2: Use System Prompts for Consistency

curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'

Tip 3: Batch Document Upload

Upload multiple documents efficiently:

#!/bin/sh
# batch_upload.sh - upload every .txt file in docs/

for file in docs/*.txt; do
  echo "Uploading $file..."
  # Strip newlines: GNU base64 wraps output, which would break the JSON
  base64 "$file" | tr -d '\n' > temp.b64
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$(cat temp.b64)\"
    }"
  sleep 2  # Avoid hitting the rate limit
done

rm -f temp.b64

Tip 4: Monitor Costs

If using paid hardware:

  1. Check Hugging Face billing: https://huggingface.co/settings/billing
  2. Set up budget alerts
  3. Monitor Space uptime
  4. Pause Space when not in use:
    • Settings β†’ "Pause Space"
    • Saves money, stops billing
    • Resume anytime

Tip 5: Create API Key Tiers

In Space Settings, set up different keys for different users:

# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2

# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1

# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1

Then adjust rate limits:

RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000

βœ… Final Checklist

Before going live, verify:

  • Space is running (green status)
  • Health check returns "status": "healthy"
  • Chat endpoint responds correctly
  • Changed default API keys to strong random strings
  • Tested with your own API key
  • Documented your API keys securely (password manager)
  • Set appropriate rate limits
  • Chose right model for your hardware
  • Tested all endpoints you plan to use
  • Reviewed logs for errors
  • (Optional) Upgraded hardware if needed
  • (Optional) Made Space private if needed

πŸŽ‰ Congratulations!

You now have: βœ… A fully functional AI API running on Hugging Face Spaces
βœ… Powered by Ollama (no OpenAI costs!)
βœ… Accessible from anywhere via HTTPS
βœ… Secure with API key authentication
βœ… Ready to integrate into your apps

Your API URL:

https://YOUR_USERNAME-ai-api-ollama.hf.space

Share your API (securely):

  • Give URL + API key to developers
  • Use in web apps, mobile apps, scripts
  • Process millions of requests
  • Scale as needed

πŸ“ž Need Help?

If you're stuck:

  1. βœ… Re-read the relevant section
  2. βœ… Check Space logs for errors
  3. βœ… Try the troubleshooting section
  4. βœ… Open an issue on GitHub
  5. βœ… Ask on Hugging Face forums

Common beginner mistakes:

  • Forgot to rename Dockerfile.huggingface to Dockerfile
  • Used wrong API key format (missing "Bearer")
  • Chose model too large for hardware
  • Didn't wait for initial model download

πŸ“š What's Next?

Now that your API is live:

  1. Build a chat interface:

    • React app
    • Vue app
    • Mobile app
    • WordPress plugin
  2. Add more features:

    • User accounts
    • Usage analytics
    • Custom models
    • Advanced RAG
  3. Scale up:

    • Upgrade hardware
    • Add caching
    • Load balancing
    • CDN
  4. Monetize (optional):

    • Charge for API access
    • Offer different tiers
    • White-label for clients

You did it! πŸŽ‰πŸš€

Your AI-powered API is now live and ready to change the world!