# Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces
## (Absolute Beginner-Friendly Guide)
**What you'll build**: A fully working AI API running on Hugging Face Spaces that anyone can access over the internet, powered by Ollama (no OpenAI key needed).
**Time needed**: 30-45 minutes
**Cost**: FREE (or $0.60/hour for faster GPU)
**No prior experience needed!**
---
## **What You Need Before Starting**
1. ✅ A Hugging Face account (we'll create one if you don't have it)
2. ✅ Git installed on your computer
3. ✅ Basic ability to copy/paste and follow instructions
4. ✅ This project's code files (you already have these)
---
## 🎯 **PART 1: Create Hugging Face Account & Space**
### **Step 1.1: Create Hugging Face Account** (Skip if you have one)
1. Open your web browser
2. Go to: https://huggingface.co/join
3. Fill in:
   - **Email**: Your email address
   - **Username**: Pick a username (you'll need this later - write it down!)
   - **Password**: Choose a strong password
4. Click **"Sign Up"**
5. Check your email and click the verification link
6. You're now logged into Hugging Face!
### **Step 1.2: Create a New Space**
1. **Go to**: https://huggingface.co/new-space
2. **Fill in the form**:

| Field | What to Enter | Example |
|-------|---------------|---------|
| **Owner** | Your username | `yourname` |
| **Space name** | `ai-api-ollama` | (or anything you like) |
| **License** | Select "MIT" | |
| **Select the Space SDK** | Click on **"Docker"** | ⚠️ IMPORTANT: Must be Docker! |
| **Space hardware** | Select **"CPU basic - Free"** for now | (We'll upgrade later if needed) |
| **Repo type** | Leave as **"Public"** | (or Private if you prefer) |

3. **Click "Create Space"** button at the bottom
4. **IMPORTANT - Write down your Space URL**:
```
https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```
Replace `YOUR_USERNAME` with your actual username.
5. You'll see a page with setup instructions - **ignore them for now**; we'll do it differently.
---
## 🔧 **PART 2: Install Git and Set Up Authentication**
### **Step 2.1: Check if Git is Installed**
**On Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Type: `git --version`
4. If you see a version number (like `git version 2.40.0`), you have Git ✅
5. If you see an error, download Git from: https://git-scm.com/download/win
**On Mac**:
1. Press `Command + Space`
2. Type `terminal` and press Enter
3. Type: `git --version`
4. If you see a version number, you have Git ✅
5. If not, it will prompt you to install Xcode Command Line Tools - click Install
**On Linux**:
```bash
git --version
```
If not installed:
```bash
sudo apt-get update
sudo apt-get install git
```
### **Step 2.2: Create Hugging Face Access Token**
1. Go to: https://huggingface.co/settings/tokens
2. Click the **"New token"** button
3. Fill in:
   - **Name**: `git-access` (or anything you like)
   - **Role**: Select **"Write"**
4. Click **"Generate token"**
5. **CRITICAL**: Copy the token and save it somewhere safe (Notepad, password manager)
   - It looks like: `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
   - ⚠️ **You won't be able to see this again!**
---
## 💻 **PART 3: Clone Your Space to Your Computer**
### **Step 3.1: Open Terminal/Command Prompt**
**Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Navigate to where you want to work (e.g., Desktop):
```
cd Desktop
```
**Mac/Linux**:
1. Open Terminal
2. Navigate to where you want to work:
```bash
cd ~/Desktop
```
### **Step 3.2: Clone the Space Repository**
1. **Copy this command** (replace YOUR_USERNAME with your actual Hugging Face username):
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```
2. **Example**:
```bash
git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
```
3. **Press Enter**
4. When prompted for username and password:
   - **Username**: Your Hugging Face username
   - **Password**: **Paste your token** (NOT your account password!) - the one that starts with `hf_`
5. You should see:
```
Cloning into 'ai-api-ollama'...
```
6. **Verify the folder was created**:
```bash
cd ai-api-ollama
ls
```
(On Windows use `dir` instead of `ls`)
---
## 📁 **PART 4: Copy Project Files to Space**
### **Step 4.1: Locate Your AI API Service Files**
You should have the project files in a folder. Let's say they're in:
- Windows: `C:\Users\YourName\Downloads\ai-api-service\`
- Mac/Linux: `~/Downloads/ai-api-service/`
### **Step 4.2: Copy ALL Files to Space Folder**
**Option A: Using File Explorer (Easiest)**
**Windows**:
1. Open File Explorer
2. Navigate to your original `ai-api-service` folder
3. Press `Ctrl + A` to select all files
4. Press `Ctrl + C` to copy
5. Navigate to `Desktop\ai-api-ollama` (your Space folder)
6. Press `Ctrl + V` to paste
7. When asked about replacing files, click **"Replace"**
**Mac**:
1. Open Finder
2. Navigate to your original `ai-api-service` folder
3. Press `Cmd + A` to select all files
4. Press `Cmd + C` to copy
5. Navigate to `Desktop/ai-api-ollama` (your Space folder)
6. Press `Cmd + V` to paste
**Option B: Using Command Line**
From the terminal, in your Space folder:
**Windows**:
```bash
xcopy /E /H /I "C:\Users\YourName\Downloads\ai-api-service" .
```
(`/H` includes hidden files such as `.env.example`.)
**Mac/Linux**:
```bash
cp -r ~/Downloads/ai-api-service/. .
```
(The trailing `/.` makes `cp` include hidden files such as `.env.example`, which `*` would skip.)
### **Step 4.3: Verify Files Were Copied**
In your terminal (inside the `ai-api-ollama` folder):
```bash
ls -a
```
(`-a` also lists hidden files like `.env.example`; on Windows use `dir /a`.)
You should see these folders/files:
- `backend/`
- `examples/`
- `tests/`
- `package.json`
- `README.md`
- `.env.example`
- `Dockerfile.huggingface`
- And many more files...
✅ If you see these, you're good to proceed!
---
## 🐳 **PART 5: Prepare the Dockerfile for Hugging Face**
### **Step 5.1: Rename the Dockerfile**
Hugging Face expects a file named exactly `Dockerfile` (no extension).
**Windows Command Prompt**:
```bash
ren Dockerfile.huggingface Dockerfile
```
**Mac/Linux Terminal**:
```bash
mv Dockerfile.huggingface Dockerfile
```
### **Step 5.2: Verify the Dockerfile**
```bash
cat Dockerfile
```
You should see content starting with `FROM node:18-alpine AS builder`
✅ Good to go!
---
## 📝 **PART 6: Create Space Configuration Files**
### **Step 6.1: Create README.md for Your Space**
This file tells Hugging Face how to run your Space.
**Create a new file called `README.md`** in your `ai-api-ollama` folder:
**Windows**:
```bash
notepad README.md
```
**Mac/Linux**:
```bash
nano README.md
```
**Copy and paste this EXACT content** (replace YOUR_USERNAME):
````markdown
---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# AI API Service with Ollama
A production-ready AI API service powered by Ollama. No OpenAI API key needed!
## 🚀 Features
- 💬 **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- 📚 **RAG** - Retrieval-Augmented Generation with vector search
- 🖼️ **Image Generation** - Text-to-image (requires additional API key)
- 🎙️ **Voice Synthesis** - Text-to-speech (requires additional API key)
- 📄 **Document Processing** - Upload and query PDFs, DOCX, TXT
- 🔐 **Authentication** - Secure API key-based access
- ⚡ **Rate Limiting** - Prevent abuse
## 📡 API Endpoint
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
## 🚀 Quick Start
### Health Check
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### Chat Example
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```
### RAG Example
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```
## 🔐 Authentication
Default API key: `demo-key-1`
**⚠️ IMPORTANT**: Change this in Space settings for production use!
## 📋 Available Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Service health check |
| `/metrics` | GET | Usage metrics |
| `/ai/chat` | POST | Multi-turn conversation |
| `/ai/query` | GET | Simple question answering |
| `/rag/query` | POST | Query with document retrieval |
| `/image/generate` | POST | Generate images (needs API key) |
| `/voice/synthesize` | POST | Text to speech (needs API key) |
| `/upload` | POST | Upload documents |
## ⚙️ Configuration
Configured with Ollama running **inside the Space** for true serverless deployment.
**Current Settings**:
- Model: Llama 2 (7B)
- Embedding Model: nomic-embed-text
- Hardware: See Space settings
## 🎯 Use Cases
- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant
## 📚 Documentation
Full API documentation: [See repository](https://github.com/your-username/ai-api-service)
## 💡 Tips
1. **First request is slow** - Ollama loads the model on first use (~30 seconds)
2. **Subsequent requests are fast** - Model stays in memory
3. **Use persistent hardware** - Upgrade from CPU to GPU for better performance
4. **Monitor costs** - Free tier works great for testing, upgrade for production
## 📞 Support
Having issues? Check the logs or open an issue on GitHub.
---
Built with [Encore.ts](https://encore.dev) and [Ollama](https://ollama.ai)
````
**Save the file**:
- Notepad: File → Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
---
## 🔐 **PART 7: Configure Environment Variables in Space Settings**
### **Step 7.1: Go to Your Space Settings**
1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to the **"Variables and secrets"** section
### **Step 7.2: Add Environment Variables**
Click **"New variable"** for each of these:
#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
- ⚠️ **IMPORTANT**: Replace with your own random keys!
  - Use strong, random strings (20+ characters)
  - Separate multiple keys with commas (no spaces)
- Click **"Save"**
#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
- ⚠️ Make this DIFFERENT from regular API keys
- This bypasses rate limits
- Click **"Save"**
#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
  - `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
  - `llama2:latest` (Good quality - 4GB)
  - `llama3:latest` (Best quality - 4.7GB - needs GPU)
  - `mistral:latest` (Very good - 4GB)
- Click **"Save"**
**Recommendation for FREE tier**: Use `phi:latest`
#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
- Leave as is; this works well for RAG
- Click **"Save"**
#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
- This allows 100 requests per minute for regular API keys
- Click **"Save"**
#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**
### **Step 7.3: Verify Your Variables**
You should now see these variables listed:
- ✅ `API_KEYS`
- ✅ `ADMIN_API_KEYS` (if you added it)
- ✅ `OLLAMA_MODEL`
- ✅ `OLLAMA_EMBEDDING_MODEL`
- ✅ `RATE_LIMIT_DEFAULT`
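With `RATE_LIMIT_DEFAULT=100`, clients that go over 100 requests per minute get rate-limit errors. A minimal client-side retry sketch in Python - the assumption (not confirmed by the source) is that the service signals rate limiting with HTTP status 429; the `send` callable is injected so you can plug in a real `requests` call:

```python
import time

def request_with_retry(send, max_retries=3, base_delay=1.0):
    """Call `send()` and retry with exponential backoff on HTTP 429.

    `send` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait 1s, 2s, 4s, ... before retrying.
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Example with a stubbed transport that rate-limits once, then succeeds:
responses = iter([(429, "slow down"), (200, "ok")])
status, body = request_with_retry(lambda: next(responses), base_delay=0.01)
```

In a real client, `send` would wrap `requests.post(...)` and return `(response.status_code, response.json())`.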
---
## 📤 **PART 8: Push Code to Hugging Face**
Now we'll upload all the files to Hugging Face.
### **Step 8.1: Configure Git (First Time Only)**
In your terminal (inside the `ai-api-ollama` folder):
```bash
git config user.email "[email protected]"
git config user.name "Your Name"
```
Replace with your actual email and name.
### **Step 8.2: Add All Files to Git**
```bash
git add .
```
The `.` means "add all files in this folder".
### **Step 8.3: Commit the Files**
```bash
git commit -m "Initial deployment with Ollama support"
```
You should see output like:
```
[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)
```
### **Step 8.4: Push to Hugging Face**
```bash
git push
```
When prompted for credentials:
- **Username**: Your Hugging Face username
- **Password**: Your Hugging Face token (starts with `hf_`)
You'll see:
```
Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.
```
✅ **Success!** Your code is now on Hugging Face.
---
## ⏳ **PART 9: Wait for Build & Monitor Progress**
### **Step 9.1: Go to Your Space**
1. Open your browser: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama`
2. You'll see a yellow "Building" status at the top
### **Step 9.2: Watch the Build Logs**
1. Click on the **"Logs"** tab (near the top)
2. You'll see real-time output like:
```
Building Docker image...
Step 1/15 : FROM node:18-alpine AS builder
...
```
### **Step 9.3: What to Expect (Timeline)**

| Time | What's Happening | What You'll See |
|------|------------------|-----------------|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | **Downloading Ollama model** | `Pulling model: phi:latest` ⏳ **LONGEST STEP** |
| 15+ min | Warming up model | `Warming up model...` |
| Final | **Space is RUNNING** | 🟢 Green "Running" status |

**Total time**: 15-20 minutes for the first deployment
### **Step 9.4: Troubleshooting Build Errors**
If you see **red error messages**:
**Common Error 1**: `npm install failed`
- **Fix**: Check that `package.json` was copied correctly
- Re-run: `git add package.json && git commit -m "fix package.json" && git push`
**Common Error 2**: `Port 7860 already in use`
- **Fix**: This shouldn't happen, but if it does, check README.md has `app_port: 7860`
**Common Error 3**: `Model download timeout`
- **Fix**: Use a smaller model like `phi:latest` in the environment variables
- Or upgrade to GPU hardware (see Part 11)
**Common Error 4**: `Out of memory`
- **Fix**: The model is too big for the free CPU. Use `phi:latest` or upgrade to a paid tier
### **Step 9.5: Verify Space is Running**
When the build completes:
1. Status changes to 🟢 **"Running"**
2. You'll see in the logs: `Starting AI API Service on port 7860...`
3. **Your API is now LIVE!**
---
## 🌐 **PART 10: Test Your Live API**
### **Step 10.1: Get Your Space URL**
Your API is available at:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
**Example**:
```
https://johndoe-ai-api-ollama.hf.space
```
### **Step 10.2: Test Health Endpoint**
**Option A: Use Browser**
1. Open your browser
2. Go to: `https://YOUR_USERNAME-ai-api-ollama.hf.space/health`
3. You should see JSON like:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...]
}
```
✅ If you see this, your API is working!
**Option B: Use Command Line**
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### **Step 10.3: Test Chat Endpoint**
**Copy this command** (replace YOUR_USERNAME and use one of your API keys):
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'
```
**Expected response** (takes 5-30 seconds for the first request):
```json
{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}
```
✅ **Success!** Your AI API is working!
### **Step 10.4: Test RAG Endpoint (Optional)**
First, upload a document:
```bash
# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt
# Convert to base64 (strip newlines, since GNU base64 wraps lines and would break the JSON)
base64 < test.txt | tr -d '\n' > test.txt.b64
# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"
```
Then query it:
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'
```
---
## 📊 **PART 11: Monitor and Optimize (Optional)**
### **Step 11.1: Check Metrics**
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"
```
You'll see:
- Total requests
- Errors
- Response times
- Model usage
### **Step 11.2: Upgrade Hardware (If Needed)**
If your Space is slow or timing out:
1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Space hardware"**
3. Click **"Change hardware"**
4. Select:
   - **CPU upgrade** ($0.60/hr) - 2x faster than free
   - **GPU T4** ($0.60/hr) - 10x faster, supports bigger models
   - **GPU A10G** ($3.15/hr) - Best performance
5. Click **"Update Space"**
6. The Space will restart with the new hardware (~5 minutes)
### **Step 11.3: Use Bigger Models**
Once you have a GPU:
1. Go to Settings → Variables and secrets
2. Edit `OLLAMA_MODEL`
3. Change it to: `llama3:latest` or `mistral:latest`
4. Save
5. The Space will restart and download the new model
---
## 🔒 **PART 12: Security Best Practices**
### **Step 12.1: Change Default API Keys**
**⚠️ CRITICAL FOR PRODUCTION**
1. Go to Space Settings → Variables
2. Edit `API_KEYS`
3. Replace `demo-key-1` with strong random keys:
```
ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5
```
4. **Never share these keys publicly!**
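One easy way to generate keys of this shape is Python's `secrets` module (the `ak_live_` prefix is just a naming convention from the example above, not something the service requires):

```python
import secrets

def make_api_key(prefix="ak_live_"):
    # 20 random bytes -> 40 hex characters: plenty of entropy for an API key.
    return prefix + secrets.token_hex(20)

key = make_api_key()
print(key)
```

Run it twice to get two keys, then paste them comma-separated into `API_KEYS`.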
### **Step 12.2: Make Space Private (Optional)**
1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Rename or change repo visibility"**
3. Click **"Make private"**
4. Confirm
Now only you can see the Space, but the API still works for anyone with the URL and an API key.
### **Step 12.3: Monitor Usage**
Check the logs regularly:
1. Go to Space → Logs tab
2. Look for suspicious activity:
   - Many failed authentication attempts
   - Unusually high request volume
   - Error patterns
---
## 🎯 **PART 13: Using Your API in Applications**
### **Example: JavaScript/TypeScript Web App**
```javascript
// Save as: app.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  const data = await response.json();
  return data.reply;
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});
```
### **Example: Python Application**
```python
# Save as: app.py
import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
        }
    )
    return response.json()['reply']

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')
```
### **Example: Mobile App (React Native)**
```javascript
// Save as: ChatService.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });
    if (!response.ok) {
      throw new Error('API request failed');
    }
    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}
```
---
## 🔍 **PART 14: Troubleshooting Common Issues**
### **Issue 1: "Space is building for too long"**
**Symptoms**: Build takes 30+ minutes
**Causes**:
- Large model download (llama3 is 4.7GB)
- Slow network on Hugging Face servers
- Free tier resource limits
**Solutions**:
1. Use a smaller model: `phi:latest` (1.3GB)
2. Upgrade to GPU hardware for faster downloads
3. Wait patiently - the first build is always slow
---
### **Issue 2: "Space crashed / Runtime error"**
**Symptoms**: Red "Runtime error" status
**Check the logs for**:
**Error**: `Out of memory`
- **Cause**: Model too big for the hardware
- **Solution**: Use `phi:latest` or upgrade to GPU T4
**Error**: `Port 7860 already in use`
- **Cause**: Port configuration mismatch
- **Solution**: Check README.md has `app_port: 7860`, then edit and push again
**Error**: `Ollama failed to start`
- **Cause**: Dockerfile issue
- **Solution**: Verify the Dockerfile was renamed correctly
---
### **Issue 3: "API returns 401 Unauthorized"**
**Symptoms**:
```json
{"error": "Invalid API key"}
```
**Solutions**:
1. **Check your Authorization header**:
```bash
# Correct format:
-H "Authorization: Bearer my-secret-key-12345"
# NOT:
-H "Authorization: my-secret-key-12345"  # Missing "Bearer"
```
2. **Verify the API key is in the Space settings**:
   - Go to Settings → Variables
   - Check `API_KEYS` contains your key
   - Keys are case-sensitive!
3. **Try the default key**:
```bash
-H "Authorization: Bearer demo-key-1"
```
---
### **Issue 4: "API is very slow (30+ seconds)"**
**Causes**:
- First request loads the model into memory (normal)
- Free CPU tier is slow
- Model is too large for the hardware
**Solutions**:
1. **The first request is always slow** - subsequent requests are fast
2. **Upgrade to GPU T4**:
   - Settings → Space hardware → GPU T4
   - 10x faster inference
3. **Use a smaller model**: `phi:latest`
4. **Model warmup** (already in the Dockerfile):
   - Keeps the model loaded
   - Reduces cold-start time
---
### **Issue 5: "Cannot upload documents"**
**Error**: `File too large`
**Fix**:
- The default max size is 10MB
- To increase it, add an environment variable:
```
MAX_FILE_SIZE_MB=50
```
**Error**: `Invalid file format`
**Fix**:
- Only PDF, DOCX, and TXT are supported
- Ensure the file extension is correct
- Check the file is not corrupted
---
### **Issue 6: "RAG returns no results"**
**Symptoms**: Empty `sources` array in the response
**Causes**:
1. No documents uploaded yet
2. Query doesn't match document content
3. Embedding model not loaded
**Solutions**:
1. **Upload a document first** (JSON body with base64 content, as in Step 10.4):
```bash
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"filename\": \"doc.txt\", \"content_base64\": \"$(base64 < doc.txt | tr -d '\n')\"}"
```
2. **Wait for processing** (check the logs):
```
Document processed successfully: doc_abc123
```
3. **Try a broader query**:
   - Instead of: "What is the exact price?"
   - Try: "pricing information"
---
### **Issue 7: "How do I see errors?"**
**Steps**:
1. Go to your Space
2. Click the **"Logs"** tab
3. Look for lines with:
```
"level": "error"
```
4. Read the `"message"` field
**Common errors and fixes**:
```json
{"level":"error","message":"Invalid API key"}
```
→ Fix: Check the Authorization header
```json
{"level":"error","message":"Rate limit exceeded"}
```
→ Fix: Wait 60 seconds or use an admin key
```json
{"level":"error","message":"Ollama API error"}
```
→ Fix: Model not loaded; wait for startup to complete
---
### **Issue 8: "Space keeps restarting"**
**Symptoms**: Status alternates between Building and Running
**Causes**:
- Application crashes on startup
- Out of memory
- Port configuration issue
**Debug steps**:
1. Check the logs for the crash reason
2. Verify environment variables are set
3. Try a smaller model
4. Contact Hugging Face support if the problem persists
---
## 📚 **PART 15: Complete API Reference**
### **Base URL**
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
### **Authentication**
All endpoints (except `/health`) require:
```
Authorization: Bearer YOUR_API_KEY
```
---
### **1. Health Check**
**Endpoint**: `GET /health`
**No authentication required**
**Example**:
```bash
curl https://YOUR_API/health
```
**Response**:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}
```
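Since the service takes a while to become healthy after a restart, scripts that depend on it can poll `/health` first. A sketch with the fetcher injected - in a real client, `fetch_health` would be something like `lambda: requests.get(f"{API_URL}/health").json()` (that wiring is an assumption, not from the source):

```python
import time

def wait_until_healthy(fetch_health, attempts=30, delay=2.0):
    """Poll until the health payload reports status == "healthy"."""
    for _ in range(attempts):
        try:
            payload = fetch_health()
            if payload.get("status") == "healthy":
                return True
        except Exception:
            pass  # Service may still be starting up; keep polling.
        time.sleep(delay)
    return False

# Stubbed example: not healthy on the first poll, healthy on the second.
payloads = iter([{"status": "starting"}, {"status": "healthy"}])
ok = wait_until_healthy(lambda: next(payloads), delay=0.01)
```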
---
### **2. Metrics**
**Endpoint**: `GET /metrics`
**Requires authentication**
**Example**:
```bash
curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
```
| ### **3. Simple Chat** | |
| **Endpoint**: `POST /ai/chat` | |
| **Request**: | |
| ```json | |
| { | |
| "conversation": [ | |
| {"role": "user", "content": "Hello!"} | |
| ], | |
| "model": "llama2", | |
| "options": { | |
| "temperature": 0.7, | |
| "max_tokens": 500 | |
| } | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "reply": "Hello! How can I help you today?", | |
| "model": "llama2", | |
| "usage": { | |
| "prompt_tokens": 10, | |
| "completion_tokens": 20, | |
| "total_tokens": 30 | |
| }, | |
| "sources": null | |
| } | |
| ``` | |
| **Example**: | |
| ```bash | |
| curl -X POST https://YOUR_API/ai/chat \ | |
| -H "Authorization: Bearer YOUR_KEY" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "conversation": [ | |
| {"role": "user", "content": "Explain AI in one sentence"} | |
| ] | |
| }' | |
| ``` | |
| --- | |
| ### **4. Multi-turn Conversation** | |
| **Endpoint**: `POST /ai/chat` | |
| **Request** (with context): | |
| ```json | |
| { | |
| "conversation": [ | |
| {"role": "user", "content": "What is 2+2?"}, | |
| {"role": "assistant", "content": "2+2 equals 4."}, | |
| {"role": "user", "content": "What about 2+3?"} | |
| ] | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "reply": "2+3 equals 5.", | |
| "model": "llama2", | |
| "usage": {...} | |
| } | |
| ``` | |
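The pattern above generalizes: append each user message and assistant reply to the `conversation` list and resend the whole list on every turn. A sketch with the transport injected - `call_api` is a placeholder you would implement as a POST to `/ai/chat` that returns the `reply` field:

```python
def chat_turn(conversation, user_message, call_api):
    """Append a user turn, call the API with the full history, record the reply.

    `call_api` takes the conversation list and returns the assistant's reply
    string (e.g. the `reply` field of the /ai/chat response).
    """
    conversation.append({"role": "user", "content": user_message})
    reply = call_api(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply

# Stubbed example: the fake API just reports how many user turns it received.
history = []
fake_api = lambda conv: f"reply #{sum(1 for m in conv if m['role'] == 'user')}"
chat_turn(history, "What is 2+2?", fake_api)
chat_turn(history, "What about 2+3?", fake_api)
```

Because the server is stateless, the client owns the history; trim old turns if the conversation grows too long for the model's context.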
---
### **5. RAG Query**
**Endpoint**: `POST /rag/query`
**Request**:
```json
{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}
```
**Response**:
```json
{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}
```
**Example**:
```bash
curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'
```
---
| ### **6. Upload Document** | |
| **Endpoint**: `POST /upload` | |
| **Request**: | |
| ```json | |
| { | |
| "filename": "document.txt", | |
| "content_base64": "VGhpcyBpcyBhIHRlc3Q=", | |
| "metadata": { | |
| "title": "Test Document", | |
| "category": "docs" | |
| } | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "doc_id": "doc_abc123", | |
| "filename": "document.txt", | |
| "size_bytes": 1024, | |
| "status": "processing", | |
| "estimated_chunks": 5 | |
| } | |
| ``` | |
**Example (Linux/Mac)**:
```bash
# Encode the file to base64, stripping line wraps
# (GNU base64 inserts a newline every 76 chars, which would break the JSON body)
base64 document.txt | tr -d '\n' > document.b64
# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"
```
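If you build the payload in Python instead, the standard library's `base64` module never inserts line breaks, so the encoded string is safe to embed in JSON as-is. This reproduces the value from the sample request above:

```python
import base64

content = b"This is a test"
# b64encode returns bytes with no line wrapping, unlike the base64 CLI on Linux
payload_b64 = base64.b64encode(content).decode("ascii")
print(payload_b64)  # VGhpcyBpcyBhIHRlc3Q= — the same value as the sample request
assert base64.b64decode(payload_b64) == content  # round-trips cleanly
```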
---
### **7. Get Document Sources**
**Endpoint**: `GET /docs/:id/sources`
**Example**:
```bash
curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}
```
---
### **8. Simple Query**
**Endpoint**: `GET /ai/query?q=QUESTION`
**Example**:
```bash
curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}
```
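Questions with spaces or punctuation must be URL-encoded in the `q` parameter rather than typed with `+` by hand. A quick sketch using Python's standard library:

```python
from urllib.parse import quote_plus

question = "What is AI"
# quote_plus encodes spaces as "+" and escapes other special characters
url = f"https://YOUR_API/ai/query?q={quote_plus(question)}"
print(url)  # https://YOUR_API/ai/query?q=What+is+AI
```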
---
### **9. Get Available Models**
**Endpoint**: `GET /rag/models`
**Example**:
```bash
curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}
```
---
## **PART 16: Advanced Tips & Tricks**
### **Tip 1: Optimize Response Time**
**Add warmup requests** to keep the model in memory.
Create a simple cron job or scheduled task:
```bash
# Every 5 minutes, make a request to keep the model loaded.
# Note: a crontab entry must fit on a single line (no backslash continuation).
*/5 * * * * curl -s -X POST https://YOUR_API/ai/chat -H "Authorization: Bearer YOUR_KEY" -H "Content-Type: application/json" -d '{"conversation":[{"role":"user","content":"ping"}]}'
```
---
### **Tip 2: Use System Prompts for Consistency**
```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'
```
---
### **Tip 3: Batch Document Upload**
Upload multiple documents efficiently:
```bash
#!/bin/bash
# batch_upload.sh: upload every .txt file in docs/
for file in docs/*.txt; do
  echo "Uploading $file..."
  # Strip line wraps from the base64 output so the JSON body stays valid
  b64=$(base64 "$file" | tr -d '\n')
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$b64\"
    }"
  sleep 2  # Rate limiting
done
```
---
### **Tip 4: Monitor Costs**
If using paid hardware:
1. Check Hugging Face billing: https://huggingface.co/settings/billing
2. Set up budget alerts
3. Monitor Space uptime
4. Pause the Space when not in use:
   - Settings → "Pause Space"
   - Saves money, stops billing
   - Resume anytime
---
### **Tip 5: Create API Key Tiers**
**In Space Settings**, set up different keys for different users:
```
# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2
# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1
# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1
```
Then adjust rate limits:
```
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000
```
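Server-side, the lookup can be as simple as checking each tier's key list in priority order. A hypothetical Python sketch (the real Space code may structure this differently; the variable names follow the settings above):

```python
def rate_limit_for(api_key, env):
    """Return the requests-per-minute limit for a key, or None if unknown."""
    # Check the most privileged tier first so an admin key always wins
    tiers = [
        ("ADMIN_API_KEYS", "RATE_LIMIT_ADMIN", 10000),
        ("PREMIUM_API_KEYS", "RATE_LIMIT_PREMIUM", 300),
        ("API_KEYS", "RATE_LIMIT_DEFAULT", 60),
    ]
    for keys_var, limit_var, default in tiers:
        keys = [k for k in env.get(keys_var, "").split(",") if k]
        if api_key in keys:
            return int(env.get(limit_var, default))
    return None  # unknown key: reject the request

# Example environment mirroring the tier settings above
env = {
    "API_KEYS": "free_user_key_1,free_user_key_2",
    "PREMIUM_API_KEYS": "premium_user_key_1",
    "RATE_LIMIT_DEFAULT": "60",
    "RATE_LIMIT_PREMIUM": "300",
}
print(rate_limit_for("premium_user_key_1", env))  # 300
```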
---
## ✅ **Final Checklist**
Before going live, verify:
- [ ] Space is running (green status)
- [ ] Health check returns `"status": "healthy"`
- [ ] Chat endpoint responds correctly
- [ ] Changed default API keys to strong random strings
- [ ] Tested with your own API key
- [ ] Documented your API keys securely (password manager)
- [ ] Set appropriate rate limits
- [ ] Chose the right model for your hardware
- [ ] Tested all endpoints you plan to use
- [ ] Reviewed logs for errors
- [ ] (Optional) Upgraded hardware if needed
- [ ] (Optional) Made Space private if needed
---
## **Congratulations!**
You now have:
✅ A fully functional AI API running on Hugging Face Spaces
✅ Powered by Ollama (no OpenAI costs!)
✅ Accessible from anywhere via HTTPS
✅ Secured with API key authentication
✅ Ready to integrate into your apps
**Your API URL**:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
**Share your API** (securely):
- Give the URL + an API key to developers
- Use it in web apps, mobile apps, and scripts
- Handle as many requests as your hardware allows
- Scale up as needed
---
| ## π **Need Help?** | |
| **If you're stuck**: | |
| 1. β Re-read the relevant section | |
| 2. β Check Space logs for errors | |
| 3. β Try the troubleshooting section | |
| 4. β Open an issue on GitHub | |
| 5. β Ask on Hugging Face forums | |
| **Common beginner mistakes**: | |
| - Forgot to rename `Dockerfile.huggingface` to `Dockerfile` | |
| - Used wrong API key format (missing "Bearer") | |
| - Chose model too large for hardware | |
| - Didn't wait for initial model download | |
| --- | |
## **What's Next?**
Now that your API is live:
1. **Build a chat interface**:
   - React app
   - Vue app
   - Mobile app
   - WordPress plugin
2. **Add more features**:
   - User accounts
   - Usage analytics
   - Custom models
   - Advanced RAG
3. **Scale up**:
   - Upgrade hardware
   - Add caching
   - Load balancing
   - CDN
4. **Monetize** (optional):
   - Charge for API access
   - Offer different tiers
   - White-label for clients
---
**You did it!**
Your AI-powered API is now live and ready to change the world!