# Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces

*(Absolute Beginner-Friendly Guide)*

**What you'll build**: A fully working AI API running on Hugging Face Spaces that anyone can access over the internet, powered by Ollama (no OpenAI key needed).

**Time needed**: 30-45 minutes
**Cost**: FREE (or $0.60/hour for faster GPU)
**No prior experience needed!**
## **What You Need Before Starting**

- A Hugging Face account (we'll create this if you don't have one)
- Git installed on your computer
- Basic ability to copy/paste and follow instructions
- This project's code files (you already have these)
## **PART 1: Create Hugging Face Account & Space**

### **Step 1.1: Create Hugging Face Account (Skip if you have one)**

1. Open your web browser
2. Go to: https://huggingface.co/join
3. Fill in:
   - **Email**: Your email address
   - **Username**: Pick a username (you'll need this later - write it down!)
   - **Password**: Choose a strong password
4. Click **"Sign Up"**
5. Check your email and click the verification link
6. You're now logged into Hugging Face!
### **Step 1.2: Create a New Space**

1. Go to: https://huggingface.co/new-space (or click your profile picture and choose "New Space")
2. Fill in the form:

| Field | What to Enter |
|---|---|
| Owner | Your username (e.g., `yourname`) |
| Space name | `ai-api-ollama` (or anything you like) |
| License | Select "MIT" |
| Select the Space SDK | Click on **"Docker"** - ⚠️ IMPORTANT: Must be Docker! |
| Space hardware | Select "CPU basic - Free" for now (we'll upgrade later if needed) |
| Repo type | Leave as "Public" (or Private if you prefer) |

3. Click the **"Create Space"** button at the bottom

**IMPORTANT - Write down your Space URL**: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama` (replace `YOUR_USERNAME` with your actual username).

You'll see a page with instructions - ignore them for now, we'll do it differently.
## **PART 2: Install Git and Set Up Authentication**

### **Step 2.1: Check if Git is Installed**

**On Windows:**

1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Type: `git --version`
4. If you see a version number (like `git version 2.40.0`), you have Git
5. If you see an error, download Git from: https://git-scm.com/download/win

**On Mac:**

1. Press `Command + Space`
2. Type `terminal` and press Enter
3. Type: `git --version`
4. If you see a version number, you have Git
5. If not, it will prompt you to install Xcode Command Line Tools - click Install

**On Linux:**

```bash
git --version
```

If not installed:

```bash
sudo apt-get update
sudo apt-get install git
```
### **Step 2.2: Create Hugging Face Access Token**

1. Go to: https://huggingface.co/settings/tokens
2. Click the **"New token"** button
3. Fill in:
   - **Name**: `git-access` (or anything you like)
   - **Role**: Select "Write"
4. Click **"Generate token"**
5. **CRITICAL**: Copy the token and save it somewhere safe (Notepad, password manager)
   - It looks like: `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
   - ⚠️ You won't be able to see this again!
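Before moving on, you can optionally sanity-check the token from the terminal. This is a sketch using the Hub's `whoami-v2` API endpoint; the `hf_xxx...` value is a placeholder for your real token:

```bash
# Optional: verify the token is valid - this should print your account info as JSON.
curl -s -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  https://huggingface.co/api/whoami-v2
```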
## **PART 3: Clone Your Space to Your Computer**

### **Step 3.1: Open Terminal/Command Prompt**

**Windows:**

1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Navigate to where you want to work (e.g., Desktop):

```
cd Desktop
```

**Mac/Linux:**

1. Open Terminal
2. Navigate to where you want to work:

```bash
cd ~/Desktop
```
### **Step 3.2: Clone the Space Repository**

Copy this command (replace `YOUR_USERNAME` with your actual Hugging Face username):

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```

Example:

```bash
git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
```

Press Enter. When prompted for username and password:

- **Username**: Your Hugging Face username
- **Password**: Paste your token (NOT your password!) - the one that starts with `hf_`

You should see:

```
Cloning into 'ai-api-ollama'...
```

Verify the folder was created:

```bash
cd ai-api-ollama
ls
```

(On Windows use `dir` instead of `ls`.)
## **PART 4: Copy Project Files to Space**

### **Step 4.1: Locate Your AI API Service Files**

You should have the project files in a folder. Let's say they're in:

- **Windows**: `C:\Users\YourName\Downloads\ai-api-service\`
- **Mac/Linux**: `~/Downloads/ai-api-service/`
### **Step 4.2: Copy ALL Files to Space Folder**

**Option A: Using File Explorer (Easiest)**

**Windows:**

1. Open File Explorer
2. Navigate to your original `ai-api-service` folder
3. Press `Ctrl + A` to select all files
4. Press `Ctrl + C` to copy
5. Navigate to `Desktop\ai-api-ollama` (your Space folder)
6. Press `Ctrl + V` to paste
7. When asked about replacing files, click "Replace"

**Mac:**

1. Open Finder
2. Navigate to your original `ai-api-service` folder
3. Press `Cmd + A` to select all files
4. Press `Cmd + C` to copy
5. Navigate to `Desktop/ai-api-ollama` (your Space folder)
6. Press `Cmd + V` to paste

**Option B: Using Command Line**

From the terminal, in your Space folder:

**Windows:**

```
xcopy /E /I "C:\Users\YourName\Downloads\ai-api-service\*" .
```

**Mac/Linux:**

```bash
cp -r ~/Downloads/ai-api-service/* .
# Note: * doesn't match hidden files - copy those separately, e.g.:
cp ~/Downloads/ai-api-service/.env.example .
```
### **Step 4.3: Verify Files Were Copied**

In your terminal (inside the `ai-api-ollama` folder):

```bash
ls -a   # -a also lists hidden files like .env.example
```

You should see these folders/files:

- `backend/`
- `examples/`
- `tests/`
- `package.json`
- `README.md`
- `.env.example`
- `Dockerfile.huggingface`
- And many more files...

If you see these, you're good to proceed!
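If you'd rather let the shell check for you, here's a minimal sketch that tests for the key files named above:

```bash
# Report which of the expected files are present in the current folder.
for f in package.json README.md .env.example Dockerfile.huggingface; do
  [ -e "$f" ] && echo "OK: $f" || echo "MISSING: $f"
done
```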
## **PART 5: Prepare the Dockerfile for Hugging Face**

### **Step 5.1: Rename the Dockerfile**

Hugging Face expects a file named exactly `Dockerfile` (no extension).

**Windows Command Prompt:**

```
ren Dockerfile.huggingface Dockerfile
```

**Mac/Linux Terminal:**

```bash
mv Dockerfile.huggingface Dockerfile
```

### **Step 5.2: Verify the Dockerfile**

```bash
cat Dockerfile
```

You should see content starting with `FROM node:18-alpine AS builder`.

Good to go!
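If you only want to check the essentials rather than read the whole file, this quick sketch confirms the rename worked and the base image matches:

```bash
# The file must be named exactly "Dockerfile" and start with the expected base image.
ls Dockerfile
head -n 1 Dockerfile   # should print: FROM node:18-alpine AS builder
```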
## **PART 6: Create Space Configuration Files**

### **Step 6.1: Create README.md for Your Space**

This file tells Hugging Face how to run your Space.

Create a new file called `README.md` in your `ai-api-ollama` folder:

**Windows:**

```
notepad README.md
```

**Mac/Linux:**

```bash
nano README.md
```

Copy and paste this EXACT content (replace `YOUR_USERNAME`):
````markdown
---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI API Service with Ollama

A production-ready AI API service powered by Ollama. No OpenAI API key needed!

## Features

- **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- **RAG** - Retrieval-Augmented Generation with vector search
- **Image Generation** - Text-to-image (requires additional API key)
- **Voice Synthesis** - Text-to-speech (requires additional API key)
- **Document Processing** - Upload and query PDFs, DOCX, TXT
- **Authentication** - Secure API key-based access
- **Rate Limiting** - Prevent abuse

## API Endpoint

https://YOUR_USERNAME-ai-api-ollama.hf.space

## Quick Start

### Health Check

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```

### Chat Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```

### RAG Example

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```

## Authentication

Default API key: `demo-key-1`

⚠️ IMPORTANT: Change this in Space settings for production use!

## Available Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Service health check |
| `/metrics` | GET | Usage metrics |
| `/ai/chat` | POST | Multi-turn conversation |
| `/ai/query` | GET | Simple question answering |
| `/rag/query` | POST | Query with document retrieval |
| `/image/generate` | POST | Generate images (needs API key) |
| `/voice/synthesize` | POST | Text to speech (needs API key) |
| `/upload` | POST | Upload documents |

## Configuration

Configured with Ollama running inside the Space for true serverless deployment.

Current settings:

- Model: Llama 2 (7B)
- Embedding model: nomic-embed-text
- Hardware: See Space settings

## Use Cases

- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant

## Documentation

Full API documentation: See repository

## Tips

- **First request is slow** - Ollama loads the model on first use (~30 seconds)
- **Subsequent requests are fast** - Model stays in memory
- **Use persistent hardware** - Upgrade from CPU to GPU for better performance
- **Monitor costs** - Free tier works great for testing, upgrade for production

## Support

Having issues? Check the logs or open an issue on GitHub.

Built with Encore.ts and Ollama
````
**Save the file**:

- Notepad: File → Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
---
## **PART 7: Configure Environment Variables in Space Settings**
### **Step 7.1: Go to Your Space Settings**
1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to the **"Variables and secrets"** section
### **Step 7.2: Add Environment Variables**
Click **"New variable"** for each of these:
#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
- ⚠️ **IMPORTANT**: Replace with your own random keys!
- Use strong, random strings (20+ characters)
- Separate multiple keys with commas (no spaces)
- Click **"Save"**
#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
- ⚠️ Make this DIFFERENT from regular API keys
- This bypasses rate limits
- Click **"Save"**
#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
- `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
- `llama2:latest` (Good quality - 4GB)
- `llama3:latest` (Best quality - 4.7GB - needs GPU)
- `mistral:latest` (Very good - 4GB)
- Click **"Save"**
**Recommendation for FREE tier**: Use `phi:latest`
#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
- Leave this as-is; it works well for RAG
- Click **"Save"**
#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
- This means 100 requests per minute for regular API keys
- Click **"Save"**
#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**
### **Step 7.3: Verify Your Variables**

You should now see these variables listed:

- `API_KEYS`
- `ADMIN_API_KEYS` (if you added it)
- `OLLAMA_MODEL`
- `OLLAMA_EMBEDDING_MODEL`
- `RATE_LIMIT_DEFAULT`
---
## **PART 8: Push Code to Hugging Face**

Now we'll upload all the files to Hugging Face.

### **Step 8.1: Configure Git (First Time Only)**

In your terminal (inside the `ai-api-ollama` folder):

```bash
git config user.email "[email protected]"
git config user.name "Your Name"
```

Replace with your actual email and name.
### **Step 8.2: Add All Files to Git**

```bash
git add .
```

The `.` means "add all files in this folder".

### **Step 8.3: Commit the Files**

```bash
git commit -m "Initial deployment with Ollama support"
```

You should see output like:

```
[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)
```

### **Step 8.4: Push to Hugging Face**

```bash
git push
```

When prompted for credentials:

- **Username**: Your Hugging Face username
- **Password**: Your Hugging Face token (starts with `hf_`)

You'll see:

```
Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.
```

Success! Your code is now on Hugging Face.
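To avoid retyping the token on every push, you can optionally let Git remember it. A minimal sketch using Git's built-in credential store (note: it saves the token in plain text in `~/.git-credentials`):

```bash
# Cache credentials so future pushes don't prompt.
git config credential.helper store
git push   # enter your username and hf_ token once; Git remembers them after that
```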
## **PART 9: Wait for Build & Monitor Progress**

### **Step 9.1: Go to Your Space**

1. Open your browser: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama`
2. You'll see a yellow "Building" status at the top

### **Step 9.2: Watch the Build Logs**

1. Click on the **"Logs"** tab (near the top)
2. You'll see real-time output like:

```
Building Docker image...
Step 1/15 : FROM node:18-alpine AS builder
...
```
### **Step 9.3: What to Expect (Timeline)**

| Time | What's Happening | What You'll See |
|---|---|---|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | Downloading Ollama model | `Pulling model: phi:latest` (the longest step) |
| 15+ min | Warming up model | `Warming up model...` |
| Final | Space is RUNNING | Green "Running" status |

Total time: 15-20 minutes for the first deployment.
### **Step 9.4: Troubleshooting Build Errors**

If you see red error messages:

**Common Error 1: npm install failed**

- Fix: Check that `package.json` was copied correctly
- Re-run: `git add package.json && git commit -m "fix package.json" && git push`

**Common Error 2: Port 7860 already in use**

- Fix: This shouldn't happen, but if it does, check that README.md has `app_port: 7860`

**Common Error 3: Model download timeout**

- Fix: Use a smaller model like `phi:latest` in the environment variables
- Or upgrade to GPU hardware (see Part 11)

**Common Error 4: Out of memory**

- Fix: Model too big for the free CPU. Use `phi:latest` or upgrade to a paid tier
### **Step 9.5: Verify Space is Running**

When the build completes:

1. Status changes to green "Running"
2. You'll see in the logs: `Starting AI API Service on port 7860...`
3. Your API is now LIVE!
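If you'd rather watch from the terminal than refresh the browser, a small polling loop like this works (a sketch; substitute your real Space URL):

```bash
# Poll the health endpoint every 30s until the API answers successfully.
until curl -sf https://YOUR_USERNAME-ai-api-ollama.hf.space/health >/dev/null; do
  echo "Not up yet, retrying in 30s..."
  sleep 30
done
echo "API is live!"
```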
## **PART 10: Test Your Live API**

### **Step 10.1: Get Your Space URL**

Your API is available at: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

Example: `https://johndoe-ai-api-ollama.hf.space`

### **Step 10.2: Test Health Endpoint**

**Option A: Use Browser**

1. Open your browser
2. Go to: `https://YOUR_USERNAME-ai-api-ollama.hf.space/health`
3. You should see JSON like:

```json
{"status": "healthy", "version": "1.0.0", "services": [...]}
```

If you see this, your API is working!

**Option B: Use Command Line**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### **Step 10.3: Test Chat Endpoint**

Copy this command (replace `YOUR_USERNAME` and use one of your API keys):

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'
```

Expected response (takes 5-30 seconds for the first request):

```json
{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}
```

Success! Your AI API is working!
### **Step 10.4: Test RAG Endpoint (Optional)**

First, upload a document:

```bash
# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt

# Convert to base64 (on Linux, add -w 0 so the output isn't line-wrapped)
base64 test.txt > test.txt.b64

# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"
```

Then query it:

```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'
```
## **PART 11: Monitor and Optimize (Optional)**

### **Step 11.1: Check Metrics**

```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"
```

You'll see:

- Total requests
- Errors
- Response times
- Model usage
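The metrics come back as JSON, so piping them through `jq` makes them easier to read (assuming `jq` is installed):

```bash
# Pretty-print the metrics response.
curl -s https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345" | jq .
```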
### **Step 11.2: Upgrade Hardware (If Needed)**

If your Space is slow or timing out:

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Space hardware"**
3. Click **"Change hardware"**
4. Select:
   - **CPU upgrade** ($0.60/hr) - 2x faster than free
   - **GPU T4** ($0.60/hr) - 10x faster, supports bigger models
   - **GPU A10G** ($3.15/hr) - Best performance
5. Click **"Update Space"**
6. The Space will restart with the new hardware (~5 minutes)
### **Step 11.3: Use Bigger Models**

Once you have a GPU:

1. Go to Settings → Variables and secrets
2. Edit `OLLAMA_MODEL`
3. Change it to: `llama3:latest` or `mistral:latest`
4. Save
5. The Space will restart and download the new model
## **PART 12: Security Best Practices**

### **Step 12.1: Change Default API Keys**

⚠️ CRITICAL FOR PRODUCTION

1. Go to Space Settings → Variables
2. Edit `API_KEYS`
3. Replace `demo-key-1` with strong random keys, e.g.: `ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5`
4. Never share these keys publicly!
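If you need a quick way to produce strong random keys, `openssl` can generate them (a sketch; the `ak_live_` prefix is just the naming convention from the example above, not required):

```bash
# Print two random 32-character hex keys, comma-separated (no spaces).
echo "ak_live_$(openssl rand -hex 16),ak_live_$(openssl rand -hex 16)"
```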
### **Step 12.2: Make Space Private (Optional)**

1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Rename or change repo visibility"**
3. Click **"Make private"**
4. Confirm

Now only you can see the Space, but the API still works for anyone with the URL and an API key.
### **Step 12.3: Monitor Usage**

Check the logs regularly:

1. Go to Space → Logs tab
2. Look for suspicious activity:
   - Many failed authentication attempts
   - Unusually high request volume
   - Error patterns
## **PART 13: Using Your API in Applications**

### **Example: JavaScript/TypeScript Web App**

```javascript
// Save as: app.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  const data = await response.json();
  return data.reply;
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});
```
### **Example: Python Application**

```python
# Save as: app.py
import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
        }
    )
    return response.json()['reply']

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')
```
### **Example: Mobile App (React Native)**

```javascript
// Save as: ChatService.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });

    if (!response.ok) {
      throw new Error('API request failed');
    }

    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}
```
## **PART 14: Troubleshooting Common Issues**

### **Issue 1: "Space is building for too long"**

**Symptoms**: Build takes 30+ minutes

**Causes**:

- Large model download (llama3 is 4.7GB)
- Slow internet on Hugging Face servers
- Free tier resource limits

**Solutions**:

- Use a smaller model: `phi:latest` (1.3GB)
- Upgrade to GPU hardware for faster downloads
- Wait patiently - the first build is always slow
### **Issue 2: "Space crashed / Runtime error"**

**Symptoms**: Red "Runtime error" status

Check the logs for:

**`Error: Out of memory`**

- Cause: Model too big for the hardware
- Fix: Use `phi:latest` or upgrade to GPU T4

**`Error: Port 7860 already in use`**

- Cause: README.md is missing the correct `app_port: 7860`
- Fix: Edit README.md and push again

**`Error: Ollama failed to start`**

- Cause: Dockerfile issue
- Fix: Verify the Dockerfile was renamed correctly
### **Issue 3: "API returns 401 Unauthorized"**

**Symptoms**:

```json
{"error": "Invalid API key"}
```

**Solutions**:

1. Check your Authorization header:

   ```bash
   # Correct format:
   -H "Authorization: Bearer my-secret-key-12345"
   # NOT (missing "Bearer"):
   -H "Authorization: my-secret-key-12345"
   ```

2. Verify the API key is in Space settings:
   - Go to Settings → Variables
   - Check `API_KEYS` contains your key
   - Keys are case-sensitive!

3. Try the default key:

   ```bash
   -H "Authorization: Bearer demo-key-1"
   ```
### **Issue 4: "API is very slow (30+ seconds)"**

**Causes**:

- First request loads the model into memory (normal)
- Free CPU tier is slow
- Model is too large for the hardware

**Solutions**:

- The first request is always slow - subsequent requests are fast
- Upgrade to GPU T4 (Settings → Space hardware → GPU T4) for ~10x faster inference
- Use a smaller model: `phi:latest`
- Add model warmup (already in the Dockerfile): keeps the model loaded and reduces cold-start time
### **Issue 5: "Cannot upload documents"**

**Error: File too large**

- Default max size is 10MB
- To increase it, add the environment variable `MAX_FILE_SIZE_MB=50`

**Error: Invalid file format**

- Only PDF, DOCX, and TXT are supported
- Ensure the file extension is correct
- Check that the file is not corrupted
### **Issue 6: "RAG returns no results"**

**Symptoms**: Empty `sources` array in the response

**Causes**:

- No documents uploaded yet
- Query doesn't match document content
- Embedding model not loaded

**Solutions**:

1. Upload a document first:

   ```bash
   curl -X POST https://YOUR_API/upload \
     -H "Authorization: Bearer YOUR_KEY" \
     -F "[email protected]"
   ```

   (If your deployment only accepts the JSON/base64 format shown in Step 10.4, use that instead.)

2. Wait for processing (check the logs): `Document processed successfully: doc_abc123`

3. Try a broader query:
   - Instead of: "What is the exact price?"
   - Try: "pricing information"
### **Issue 7: "How do I see errors?"**

Steps:

1. Go to your Space
2. Click the **"Logs"** tab
3. Look for lines with: `"level": "error"`
4. Read the `"message"` field

Common errors and fixes:

- `{"level":"error","message":"Invalid API key"}` → Check the Authorization header
- `{"level":"error","message":"Rate limit exceeded"}` → Wait 60 seconds or use an admin key
- `{"level":"error","message":"Ollama API error"}` → Model not loaded; wait for startup to complete
### **Issue 8: "Space keeps restarting"**

**Symptoms**: Status alternates between Building and Running

**Causes**:

- Application crashes on startup
- Out of memory
- Port configuration issue

**Debug steps**:

1. Check the logs for the crash reason
2. Verify the environment variables are set
3. Try a smaller model
4. Contact Hugging Face support if it persists
## **PART 15: Complete API Reference**

**Base URL**: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

**Authentication**: All endpoints (except `/health`) require the header `Authorization: Bearer YOUR_API_KEY`.

### **1. Health Check**

**Endpoint**: `GET /health`

No authentication required.

Example:

```bash
curl https://YOUR_API/health
```

Response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}
```
### **2. Metrics**

**Endpoint**: `GET /metrics`

Requires authentication.

Example:

```bash
curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
```
### **3. Simple Chat**

**Endpoint**: `POST /ai/chat`

Request:

```json
{
  "conversation": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama2",
  "options": {
    "temperature": 0.7,
    "max_tokens": 500
  }
}
```

Response:

```json
{
  "reply": "Hello! How can I help you today?",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "sources": null
}
```

Example:

```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain AI in one sentence"}
    ]
  }'
```
### **4. Multi-turn Conversation**

**Endpoint**: `POST /ai/chat`

Request (with context):

```json
{
  "conversation": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 2+3?"}
  ]
}
```

Response:

```json
{
  "reply": "2+3 equals 5.",
  "model": "llama2",
  "usage": {...}
}
```
### **5. RAG Query**

**Endpoint**: `POST /rag/query`

Request:

```json
{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}
```

Response:

```json
{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}
```

Example:

```bash
curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'
```
### **6. Upload Document**

**Endpoint**: `POST /upload`

Request:

```json
{
  "filename": "document.txt",
  "content_base64": "VGhpcyBpcyBhIHRlc3Q=",
  "metadata": {
    "title": "Test Document",
    "category": "docs"
  }
}
```

Response:

```json
{
  "doc_id": "doc_abc123",
  "filename": "document.txt",
  "size_bytes": 1024,
  "status": "processing",
  "estimated_chunks": 5
}
```

Example (Linux/Mac):

```bash
# Encode file to base64 (on Linux, add -w 0 so the output isn't line-wrapped)
base64 document.txt > document.b64

# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"
```
### **7. Get Document Sources**

**Endpoint**: `GET /docs/:id/sources`

Example:

```bash
curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}
```
### **8. Simple Query**

**Endpoint**: `GET /ai/query?q=QUESTION`

Example:

```bash
curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}
```
### **9. Get Available Models**

**Endpoint**: `GET /rag/models`

Example:

```bash
curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"
```

Response:

```json
{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}
```
## **PART 16: Advanced Tips & Tricks**

### **Tip 1: Optimize Response Time**

Add warmup requests to keep the model in memory. Create a simple cron job or scheduled task (note: a crontab entry must fit on a single line):

```bash
# Every 5 minutes, make a request to keep the model loaded
*/5 * * * * curl -s -X POST https://YOUR_API/ai/chat -H "Authorization: Bearer YOUR_KEY" -H "Content-Type: application/json" -d '{"conversation":[{"role":"user","content":"ping"}]}'
```
### **Tip 2: Use System Prompts for Consistency**

```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'
```
### **Tip 3: Batch Document Upload**

Upload multiple documents efficiently:

```bash
#!/usr/bin/env bash
# Save as: batch_upload.sh - uploads every .txt file in docs/
for file in docs/*.txt; do
  echo "Uploading $file..."
  base64 "$file" > temp.b64   # on Linux, use: base64 -w 0 "$file" > temp.b64
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$(cat temp.b64)\"
    }"
  sleep 2  # crude rate limiting
done
rm temp.b64
```
### **Tip 4: Monitor Costs**

If using paid hardware:

- Check Hugging Face billing: https://huggingface.co/settings/billing
- Set up budget alerts
- Monitor Space uptime
- Pause the Space when not in use:
  - Settings → "Pause Space"
  - Saves money, stops billing
  - Resume anytime
### **Tip 5: Create API Key Tiers**

In Space Settings, set up different keys for different users:

```
# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2

# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1

# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1
```

Then adjust the rate limits:

```
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000
```
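To confirm a tier's limit actually kicks in, you can hammer the API from a shell loop and tally the status codes. A sketch, assuming the service answers rate-limited calls with HTTP 429 and that `free_user_key_1` is one of your free-tier keys:

```bash
# Send 70 requests against a 60/min limit and count the response codes.
for i in $(seq 1 70); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    "https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/query?q=ping" \
    -H "Authorization: Bearer free_user_key_1"
done | sort | uniq -c   # expect mostly 200s, then 429s once the limit is hit
```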
## **Final Checklist**

Before going live, verify (a quick smoke-test script for the first few items follows this list):

- Space is running (green status)
- Health check returns `"status": "healthy"`
- Chat endpoint responds correctly
- Changed default API keys to strong random strings
- Tested with your own API key
- Documented your API keys securely (password manager)
- Set appropriate rate limits
- Chose the right model for your hardware
- Tested all endpoints you plan to use
- Reviewed the logs for errors
- (Optional) Upgraded hardware if needed
- (Optional) Made the Space private if needed
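Here's a minimal smoke-test sketch for the first three checklist items (substitute your real URL and key):

```bash
#!/usr/bin/env bash
# Smoke test: health check, then a one-message chat.
BASE=https://YOUR_USERNAME-ai-api-ollama.hf.space
KEY=my-secret-key-12345

echo "1) Health check:"
curl -sf "$BASE/health" || { echo "health check failed"; exit 1; }

echo "2) Chat endpoint:"
curl -sf -X POST "$BASE/ai/chat" \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"conversation":[{"role":"user","content":"ping"}]}' || { echo "chat failed"; exit 1; }

echo "All smoke tests passed."
```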
## **Congratulations!**

You now have:

- A fully functional AI API running on Hugging Face Spaces
- Powered by Ollama (no OpenAI costs!)
- Accessible from anywhere via HTTPS
- Secured with API key authentication
- Ready to integrate into your apps

Your API URL: `https://YOUR_USERNAME-ai-api-ollama.hf.space`

Share your API (securely):

- Give the URL + an API key to developers
- Use it in web apps, mobile apps, and scripts
- Process millions of requests
- Scale as needed
## **Need Help?**

If you're stuck:

1. Re-read the relevant section
2. Check the Space logs for errors
3. Try the troubleshooting section
4. Open an issue on GitHub
5. Ask on the Hugging Face forums

Common beginner mistakes:

- Forgot to rename `Dockerfile.huggingface` to `Dockerfile`
- Used the wrong API key format (missing "Bearer")
- Chose a model too large for the hardware
- Didn't wait for the initial model download
## **What's Next?**

Now that your API is live:

**Build a chat interface**:

- React app
- Vue app
- Mobile app
- WordPress plugin

**Add more features**:

- User accounts
- Usage analytics
- Custom models
- Advanced RAG

**Scale up**:

- Upgrade hardware
- Add caching
- Load balancing
- CDN

**Monetize (optional)**:

- Charge for API access
- Offer different tiers
- White-label for clients

You did it! Your AI-powered API is now live and ready to change the world!