# Complete Step-by-Step Guide: Deploy AI API with Ollama to Hugging Face Spaces
## (Absolute Beginner-Friendly Guide)
**What you'll build**: A fully working AI API running on Hugging Face Spaces that anyone can access over the internet, powered by Ollama (no OpenAI key needed).
**Time needed**: 30-45 minutes
**Cost**: FREE (or $0.60/hour for faster GPU)
**No prior experience needed!**
---
## **What You Need Before Starting**
1. ✅ A Hugging Face account (we'll create one if you don't have it)
2. ✅ Git installed on your computer
3. ✅ Basic ability to copy/paste and follow instructions
4. ✅ This project's code files (you already have these)
---
## 🎯 **PART 1: Create Hugging Face Account & Space**
### **Step 1.1: Create Hugging Face Account** (Skip if you have one)
1. Open your web browser
2. Go to: https://huggingface.co/join
3. Fill in:
   - **Email**: Your email address
   - **Username**: Pick a username (you'll need this later - write it down!)
   - **Password**: Choose a strong password
4. Click **"Sign Up"**
5. Check your email and click the verification link
6. You're now logged into Hugging Face!
### **Step 1.2: Create a New Space**
1. **Go to**: https://huggingface.co/new-space
2. **Fill in the form**:

| Field | What to Enter | Example |
|-------|---------------|---------|
| **Owner** | Your username | `yourname` |
| **Space name** | `ai-api-ollama` | (or anything you like) |
| **License** | Select "MIT" | |
| **Select the Space SDK** | Click on **"Docker"** | ⚠️ IMPORTANT: Must be Docker! |
| **Space hardware** | Select **"CPU basic - Free"** for now | (We'll upgrade later if needed) |
| **Repo type** | Leave as **"Public"** | (or Private if you prefer) |

3. **Click "Create Space"** button at the bottom
4. **IMPORTANT - Write down your Space URL**:
```
https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```
Replace `YOUR_USERNAME` with your actual username.
5. You'll see a page with setup instructions - **ignore them for now**; we'll do it differently.
---
## 🔧 **PART 2: Install Git and Set Up Authentication**
### **Step 2.1: Check if Git is Installed**
**On Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Type: `git --version`
4. If you see a version number (like `git version 2.40.0`), you have Git ✅
5. If you see an error, download Git from: https://git-scm.com/download/win
**On Mac**:
1. Press `Command + Space`
2. Type `terminal` and press Enter
3. Type: `git --version`
4. If you see a version number, you have Git ✅
5. If not, it will prompt you to install Xcode Command Line Tools - click Install
**On Linux**:
```bash
git --version
```
If not installed:
```bash
sudo apt-get update
sudo apt-get install git
```
### **Step 2.2: Create Hugging Face Access Token**
1. Go to: https://huggingface.co/settings/tokens
2. Click the **"New token"** button
3. Fill in:
   - **Name**: `git-access` (or anything you like)
   - **Role**: Select **"Write"**
4. Click **"Generate token"**
5. **CRITICAL**: Copy the token and save it somewhere safe (Notepad, password manager)
   - It looks like: `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
   - ⚠️ **You won't be able to see this again!**
---
## 💻 **PART 3: Clone Your Space to Your Computer**
### **Step 3.1: Open Terminal/Command Prompt**
**Windows**:
1. Press `Windows Key + R`
2. Type `cmd` and press Enter
3. Navigate to where you want to work (e.g., Desktop):
```
cd Desktop
```
**Mac/Linux**:
1. Open Terminal
2. Navigate to where you want to work:
```bash
cd ~/Desktop
```
### **Step 3.2: Clone the Space Repository**
1. **Copy this command** (replace YOUR_USERNAME with your actual Hugging Face username):
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama
```
2. **Example**:
```bash
git clone https://huggingface.co/spaces/johndoe/ai-api-ollama
```
3. **Press Enter**
4. When prompted for username and password:
   - **Username**: Your Hugging Face username
   - **Password**: **Paste your token** (NOT your account password!) - the one that starts with `hf_`
5. You should see:
```
Cloning into 'ai-api-ollama'...
```
6. **Verify the folder was created**:
```bash
cd ai-api-ollama
ls
```
(On Windows use `dir` instead of `ls`)
---
## 📁 **PART 4: Copy Project Files to Space**
### **Step 4.1: Locate Your AI API Service Files**
You should have the project files in a folder. Let's say they're in:
- Windows: `C:\Users\YourName\Downloads\ai-api-service\`
- Mac/Linux: `~/Downloads/ai-api-service/`
### **Step 4.2: Copy ALL Files to Space Folder**
**Option A: Using File Explorer (Easiest)**
**Windows**:
1. Open File Explorer
2. Navigate to your original `ai-api-service` folder
3. Press `Ctrl + A` to select all files
4. Press `Ctrl + C` to copy
5. Navigate to `Desktop\ai-api-ollama` (your Space folder)
6. Press `Ctrl + V` to paste
7. When asked about replacing files, click **"Replace"**
**Mac**:
1. Open Finder
2. Navigate to your original `ai-api-service` folder
3. Press `Cmd + A` to select all files
4. Press `Cmd + C` to copy
5. Navigate to `Desktop/ai-api-ollama` (your Space folder)
6. Press `Cmd + V` to paste
**Option B: Using Command Line**
From the terminal, in your Space folder:
**Windows**:
```bash
xcopy /E /H /I "C:\Users\YourName\Downloads\ai-api-service" .
```
(`/H` includes hidden files such as `.env.example`.)
**Mac/Linux**:
```bash
cp -r ~/Downloads/ai-api-service/. .
```
(The trailing `/.` makes `cp` include hidden files such as `.env.example`, which `*` would skip.)
### **Step 4.3: Verify Files Were Copied**
In your terminal (inside the `ai-api-ollama` folder):
```bash
ls -a
```
(`-a` also lists hidden files like `.env.example`; on Windows use `dir /a`.)
You should see these folders/files:
- `backend/`
- `examples/`
- `tests/`
- `package.json`
- `README.md`
- `.env.example`
- `Dockerfile.huggingface`
- And many more files...
✅ If you see these, you're good to proceed!
---
## 🐳 **PART 5: Prepare the Dockerfile for Hugging Face**
### **Step 5.1: Rename the Dockerfile**
Hugging Face expects a file named exactly `Dockerfile` (no extension).
**Windows Command Prompt**:
```bash
ren Dockerfile.huggingface Dockerfile
```
**Mac/Linux Terminal**:
```bash
mv Dockerfile.huggingface Dockerfile
```
### **Step 5.2: Verify the Dockerfile**
```bash
cat Dockerfile
```
You should see content starting with `FROM node:18-alpine AS builder`
✅ Good to go!
---
## 📝 **PART 6: Create Space Configuration Files**
### **Step 6.1: Create README.md for Your Space**
This file tells Hugging Face how to run your Space.
**Create a new file called `README.md`** in your `ai-api-ollama` folder:
**Windows**:
```bash
notepad README.md
```
**Mac/Linux**:
```bash
nano README.md
```
**Copy and paste this EXACT content** (replace YOUR_USERNAME):
````markdown
---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# AI API Service with Ollama
A production-ready AI API service powered by Ollama. No OpenAI API key needed!
## 🚀 Features
- 💬 **Multi-turn Chat** - Conversational AI with Llama2/Llama3
- 📚 **RAG** - Retrieval-Augmented Generation with vector search
- 🖼️ **Image Generation** - Text-to-image (requires additional API key)
- 🎙️ **Voice Synthesis** - Text-to-speech (requires additional API key)
- 📄 **Document Processing** - Upload and query PDFs, DOCX, TXT
- 🔐 **Authentication** - Secure API key-based access
- ⚡ **Rate Limiting** - Prevent abuse
## 📡 API Endpoint
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
## 🚀 Quick Start
### Health Check
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### Chat Example
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ]
  }'
```
### RAG Example
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer demo-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are transformers in AI?",
    "top_k": 5
  }'
```
## 🔐 Authentication
Default API key: `demo-key-1`
**⚠️ IMPORTANT**: Change this in Space settings for production use!
## 📋 Available Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Service health check |
| `/metrics` | GET | Usage metrics |
| `/ai/chat` | POST | Multi-turn conversation |
| `/ai/query` | GET | Simple question answering |
| `/rag/query` | POST | Query with document retrieval |
| `/image/generate` | POST | Generate images (needs API key) |
| `/voice/synthesize` | POST | Text to speech (needs API key) |
| `/upload` | POST | Upload documents |
## ⚙️ Configuration
Configured with Ollama running **inside the Space** for true serverless deployment.
**Current Settings**:
- Model: Llama 2 (7B)
- Embedding Model: nomic-embed-text
- Hardware: See Space settings
## 🎯 Use Cases
- Chatbot backend for web/mobile apps
- Document Q&A system
- AI-powered search
- Content generation API
- Educational AI assistant
## 📚 Documentation
Full API documentation: [See repository](https://github.com/your-username/ai-api-service)
## 💡 Tips
1. **First request is slow** - Ollama loads the model on first use (~30 seconds)
2. **Subsequent requests are fast** - Model stays in memory
3. **Use persistent hardware** - Upgrade from CPU to GPU for better performance
4. **Monitor costs** - Free tier works great for testing, upgrade for production
## 📞 Support
Having issues? Check the logs or open an issue on GitHub.
---
Built with [Encore.ts](https://encore.dev) and [Ollama](https://ollama.ai)
````
**Save the file**:
- Notepad: File → Save
- Nano: Press `Ctrl + O`, then `Enter`, then `Ctrl + X`
---
## 🔐 **PART 7: Configure Environment Variables in Space Settings**
### **Step 7.1: Go to Your Space Settings**
1. Open your browser
2. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
3. Scroll down to the **"Variables and secrets"** section
### **Step 7.2: Add Environment Variables**
Click **"New variable"** for each of these:
#### **Variable 1: API_KEYS**
- **Name**: `API_KEYS`
- **Value**: `my-secret-key-12345,another-key-67890`
- ⚠️ **IMPORTANT**: Replace with your own random keys!
  - Use strong, random strings (20+ characters)
  - Separate multiple keys with commas (no spaces)
- Click **"Save"**
#### **Variable 2: ADMIN_API_KEYS** (Optional but recommended)
- **Name**: `ADMIN_API_KEYS`
- **Value**: `admin-super-secret-key-99999`
- ⚠️ Make this DIFFERENT from regular API keys
- This bypasses rate limits
- Click **"Save"**
#### **Variable 3: OLLAMA_MODEL**
- **Name**: `OLLAMA_MODEL`
- **Value**: Choose one:
  - `phi:latest` (Fastest, smallest - 1.3GB - **RECOMMENDED FOR FREE CPU**)
  - `llama2:latest` (Good quality - 4GB)
  - `llama3:latest` (Best quality - 4.7GB - needs GPU)
  - `mistral:latest` (Very good - 4GB)
- Click **"Save"**
**Recommendation for FREE tier**: Use `phi:latest`
#### **Variable 4: OLLAMA_EMBEDDING_MODEL**
- **Name**: `OLLAMA_EMBEDDING_MODEL`
- **Value**: `nomic-embed-text`
- Leave as is; this works well for RAG
- Click **"Save"**
#### **Variable 5: RATE_LIMIT_DEFAULT**
- **Name**: `RATE_LIMIT_DEFAULT`
- **Value**: `100`
- This allows 100 requests per minute for regular API keys
- Click **"Save"**
#### **Variable 6: LOG_LEVEL** (Optional)
- **Name**: `LOG_LEVEL`
- **Value**: `info`
- Click **"Save"**
### **Step 7.3: Verify Your Variables**
You should now see these variables listed:
- ✅ `API_KEYS`
- ✅ `ADMIN_API_KEYS` (if you added it)
- ✅ `OLLAMA_MODEL`
- ✅ `OLLAMA_EMBEDDING_MODEL`
- ✅ `RATE_LIMIT_DEFAULT`
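With `RATE_LIMIT_DEFAULT=100`, clients that go over 100 requests per minute get rate-limit errors. A minimal client-side retry sketch in Python - the assumption (not confirmed by the source) is that the service signals rate limiting with HTTP status 429; the `send` callable is injected so you can plug in a real `requests` call:

```python
import time

def request_with_retry(send, max_retries=3, base_delay=1.0):
    """Call `send()` and retry with exponential backoff on HTTP 429.

    `send` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait 1s, 2s, 4s, ... before retrying.
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Example with a stubbed transport that rate-limits once, then succeeds:
responses = iter([(429, "slow down"), (200, "ok")])
status, body = request_with_retry(lambda: next(responses), base_delay=0.01)
```

In a real client, `send` would wrap `requests.post(...)` and return `(response.status_code, response.json())`.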
---
## 📤 **PART 8: Push Code to Hugging Face**
Now we'll upload all the files to Hugging Face.
### **Step 8.1: Configure Git (First Time Only)**
In your terminal (inside the `ai-api-ollama` folder):
```bash
git config user.email "[email protected]"
git config user.name "Your Name"
```
Replace with your actual email and name.
### **Step 8.2: Add All Files to Git**
```bash
git add .
```
The `.` means "add all files in this folder".
### **Step 8.3: Commit the Files**
```bash
git commit -m "Initial deployment with Ollama support"
```
You should see output like:
```
[main abc1234] Initial deployment with Ollama support
 XX files changed, XXX insertions(+)
```
### **Step 8.4: Push to Hugging Face**
```bash
git push
```
When prompted for credentials:
- **Username**: Your Hugging Face username
- **Password**: Your Hugging Face token (starts with `hf_`)
You'll see:
```
Enumerating objects: XX, done.
Counting objects: 100% (XX/XX), done.
Writing objects: 100% (XX/XX), XX.XX MiB | XX.XX MiB/s, done.
```
✅ **Success!** Your code is now on Hugging Face.
---
## ⏳ **PART 9: Wait for Build & Monitor Progress**
### **Step 9.1: Go to Your Space**
1. Open your browser: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama`
2. You'll see a yellow "Building" status at the top
### **Step 9.2: Watch the Build Logs**
1. Click on the **"Logs"** tab (near the top)
2. You'll see real-time output like:
```
Building Docker image...
Step 1/15 : FROM node:18-alpine AS builder
...
```
### **Step 9.3: What to Expect (Timeline)**

| Time | What's Happening | What You'll See |
|------|------------------|-----------------|
| 0-2 min | Docker image building | `Building Docker image...` |
| 2-5 min | Installing Node dependencies | `npm install...` |
| 5-8 min | Installing Ollama | `Installing Ollama...` |
| 8-10 min | Starting services | `Starting Ollama...` |
| 10-15 min | **Downloading Ollama model** | `Pulling model: phi:latest` ⏳ **LONGEST STEP** |
| 15+ min | Warming up model | `Warming up model...` |
| Final | **Space is RUNNING** | 🟢 Green "Running" status |

**Total time**: 15-20 minutes for the first deployment
### **Step 9.4: Troubleshooting Build Errors**
If you see **red error messages**:
**Common Error 1**: `npm install failed`
- **Fix**: Check that `package.json` was copied correctly
- Re-run: `git add package.json && git commit -m "fix package.json" && git push`
**Common Error 2**: `Port 7860 already in use`
- **Fix**: This shouldn't happen, but if it does, check README.md has `app_port: 7860`
**Common Error 3**: `Model download timeout`
- **Fix**: Use a smaller model like `phi:latest` in the environment variables
- Or upgrade to GPU hardware (see Part 11)
**Common Error 4**: `Out of memory`
- **Fix**: The model is too big for the free CPU. Use `phi:latest` or upgrade to a paid tier
### **Step 9.5: Verify Space is Running**
When the build completes:
1. Status changes to 🟢 **"Running"**
2. You'll see in the logs: `Starting AI API Service on port 7860...`
3. **Your API is now LIVE!**
---
## 🌐 **PART 10: Test Your Live API**
### **Step 10.1: Get Your Space URL**
Your API is available at:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
**Example**:
```
https://johndoe-ai-api-ollama.hf.space
```
### **Step 10.2: Test Health Endpoint**
**Option A: Use Browser**
1. Open your browser
2. Go to: `https://YOUR_USERNAME-ai-api-ollama.hf.space/health`
3. You should see JSON like:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [...]
}
```
✅ If you see this, your API is working!
**Option B: Use Command Line**
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/health
```
### **Step 10.3: Test Chat Endpoint**
**Copy this command** (replace YOUR_USERNAME and use one of your API keys):
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/ai/chat \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "user",
        "content": "Hello! Can you explain what you are in one sentence?"
      }
    ]
  }'
```
**Expected response** (takes 5-30 seconds for the first request):
```json
{
  "reply": "I am an AI assistant powered by Llama, designed to help answer questions...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  },
  "sources": null
}
```
✅ **Success!** Your AI API is working!
### **Step 10.4: Test RAG Endpoint (Optional)**
First, upload a document:
```bash
# Create a test document
echo "The AI API Service is a production-ready API for chatbots. It supports Ollama, OpenAI, and HuggingFace." > test.txt
# Convert to base64 (strip newlines, since GNU base64 wraps lines and would break the JSON)
base64 < test.txt | tr -d '\n' > test.txt.b64
# Upload (Mac/Linux)
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/upload \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"test.txt\",
    \"content_base64\": \"$(cat test.txt.b64)\",
    \"metadata\": {\"title\": \"Test Document\"}
  }"
```
Then query it:
```bash
curl -X POST https://YOUR_USERNAME-ai-api-ollama.hf.space/rag/query \
  -H "Authorization: Bearer my-secret-key-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the API support?",
    "top_k": 3
  }'
```
---
## 📊 **PART 11: Monitor and Optimize (Optional)**
### **Step 11.1: Check Metrics**
```bash
curl https://YOUR_USERNAME-ai-api-ollama.hf.space/metrics \
  -H "Authorization: Bearer my-secret-key-12345"
```
You'll see:
- Total requests
- Errors
- Response times
- Model usage
### **Step 11.2: Upgrade Hardware (If Needed)**
If your Space is slow or timing out:
1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Space hardware"**
3. Click **"Change hardware"**
4. Select:
   - **CPU upgrade** ($0.60/hr) - 2x faster than free
   - **GPU T4** ($0.60/hr) - 10x faster, supports bigger models
   - **GPU A10G** ($3.15/hr) - Best performance
5. Click **"Update Space"**
6. The Space will restart with the new hardware (~5 minutes)
### **Step 11.3: Use Bigger Models**
Once you have a GPU:
1. Go to Settings → Variables and secrets
2. Edit `OLLAMA_MODEL`
3. Change it to: `llama3:latest` or `mistral:latest`
4. Save
5. The Space will restart and download the new model
---
## 🔒 **PART 12: Security Best Practices**
### **Step 12.1: Change Default API Keys**
**⚠️ CRITICAL FOR PRODUCTION**
1. Go to Space Settings → Variables
2. Edit `API_KEYS`
3. Replace `demo-key-1` with strong random keys:
```
ak_live_a8f7d9e2c1b4f5a7d8e9c2b1a5f7,ak_live_b9c2d1e3f4a5b7c8d9e1f2a3b5
```
4. **Never share these keys publicly!**
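One easy way to generate keys of this shape is Python's `secrets` module (the `ak_live_` prefix is just a naming convention from the example above, not something the service requires):

```python
import secrets

def make_api_key(prefix="ak_live_"):
    # 20 random bytes -> 40 hex characters: plenty of entropy for an API key.
    return prefix + secrets.token_hex(20)

key = make_api_key()
print(key)
```

Run it twice to get two keys, then paste them comma-separated into `API_KEYS`.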
### **Step 12.2: Make Space Private (Optional)**
1. Go to: `https://huggingface.co/spaces/YOUR_USERNAME/ai-api-ollama/settings`
2. Scroll to **"Rename or change repo visibility"**
3. Click **"Make private"**
4. Confirm
Now only you can see the Space, but the API still works for anyone with the URL and an API key.
### **Step 12.3: Monitor Usage**
Check the logs regularly:
1. Go to Space → Logs tab
2. Look for suspicious activity:
   - Many failed authentication attempts
   - Unusually high request volume
   - Error patterns
---
## 🎯 **PART 13: Using Your API in Applications**
### **Example: JavaScript/TypeScript Web App**
```javascript
// Save as: app.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345'; // Your actual key

async function chat(message) {
  const response = await fetch(`${API_URL}/ai/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      conversation: [
        { role: 'user', content: message }
      ]
    })
  });
  const data = await response.json();
  return data.reply;
}

// Usage
chat('Hello!').then(reply => {
  console.log('AI:', reply);
});
```
### **Example: Python Application**
```python
# Save as: app.py
import requests

API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space'
API_KEY = 'my-secret-key-12345'

def chat(message):
    response = requests.post(
        f'{API_URL}/ai/chat',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'conversation': [
                {'role': 'user', 'content': message}
            ]
        }
    )
    return response.json()['reply']

# Usage
reply = chat('Hello!')
print(f'AI: {reply}')
```
### **Example: Mobile App (React Native)**
```javascript
// Save as: ChatService.js
const API_URL = 'https://YOUR_USERNAME-ai-api-ollama.hf.space';
const API_KEY = 'my-secret-key-12345';

export async function sendMessage(message) {
  try {
    const response = await fetch(`${API_URL}/ai/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        conversation: [
          { role: 'user', content: message }
        ]
      })
    });
    if (!response.ok) {
      throw new Error('API request failed');
    }
    const data = await response.json();
    return data.reply;
  } catch (error) {
    console.error('Chat error:', error);
    throw error;
  }
}
```
---
## 🔍 **PART 14: Troubleshooting Common Issues**
### **Issue 1: "Space is building for too long"**
**Symptoms**: Build takes 30+ minutes
**Causes**:
- Large model download (llama3 is 4.7GB)
- Slow network on Hugging Face servers
- Free tier resource limits
**Solutions**:
1. Use a smaller model: `phi:latest` (1.3GB)
2. Upgrade to GPU hardware for faster downloads
3. Wait patiently - the first build is always slow
---
### **Issue 2: "Space crashed / Runtime error"**
**Symptoms**: Red "Runtime error" status
**Check the logs for**:
**Error**: `Out of memory`
- **Cause**: Model too big for the hardware
- **Solution**: Use `phi:latest` or upgrade to GPU T4
**Error**: `Port 7860 already in use`
- **Cause**: Port configuration mismatch
- **Solution**: Check README.md has `app_port: 7860`, then edit and push again
**Error**: `Ollama failed to start`
- **Cause**: Dockerfile issue
- **Solution**: Verify the Dockerfile was renamed correctly
---
### **Issue 3: "API returns 401 Unauthorized"**
**Symptoms**:
```json
{"error": "Invalid API key"}
```
**Solutions**:
1. **Check your Authorization header**:
```bash
# Correct format:
-H "Authorization: Bearer my-secret-key-12345"
# NOT:
-H "Authorization: my-secret-key-12345"  # Missing "Bearer"
```
2. **Verify the API key is in the Space settings**:
   - Go to Settings → Variables
   - Check `API_KEYS` contains your key
   - Keys are case-sensitive!
3. **Try the default key**:
```bash
-H "Authorization: Bearer demo-key-1"
```
---
### **Issue 4: "API is very slow (30+ seconds)"**
**Causes**:
- First request loads the model into memory (normal)
- Free CPU tier is slow
- Model is too large for the hardware
**Solutions**:
1. **The first request is always slow** - subsequent requests are fast
2. **Upgrade to GPU T4**:
   - Settings → Space hardware → GPU T4
   - 10x faster inference
3. **Use a smaller model**: `phi:latest`
4. **Model warmup** (already in the Dockerfile):
   - Keeps the model loaded
   - Reduces cold-start time
---
### **Issue 5: "Cannot upload documents"**
**Error**: `File too large`
**Fix**:
- The default max size is 10MB
- To increase it, add an environment variable:
```
MAX_FILE_SIZE_MB=50
```
**Error**: `Invalid file format`
**Fix**:
- Only PDF, DOCX, and TXT are supported
- Ensure the file extension is correct
- Check the file is not corrupted
---
### **Issue 6: "RAG returns no results"**
**Symptoms**: Empty `sources` array in the response
**Causes**:
1. No documents uploaded yet
2. Query doesn't match document content
3. Embedding model not loaded
**Solutions**:
1. **Upload a document first** (JSON body with base64 content, as in Step 10.4):
```bash
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"filename\": \"doc.txt\", \"content_base64\": \"$(base64 < doc.txt | tr -d '\n')\"}"
```
2. **Wait for processing** (check the logs):
```
Document processed successfully: doc_abc123
```
3. **Try a broader query**:
   - Instead of: "What is the exact price?"
   - Try: "pricing information"
---
### **Issue 7: "How do I see errors?"**
**Steps**:
1. Go to your Space
2. Click the **"Logs"** tab
3. Look for lines with:
```
"level": "error"
```
4. Read the `"message"` field
**Common errors and fixes**:
```json
{"level":"error","message":"Invalid API key"}
```
→ Fix: Check the Authorization header
```json
{"level":"error","message":"Rate limit exceeded"}
```
→ Fix: Wait 60 seconds or use an admin key
```json
{"level":"error","message":"Ollama API error"}
```
→ Fix: Model not loaded; wait for startup to complete
---
### **Issue 8: "Space keeps restarting"**
**Symptoms**: Status alternates between Building and Running
**Causes**:
- Application crashes on startup
- Out of memory
- Port configuration issue
**Debug steps**:
1. Check the logs for the crash reason
2. Verify environment variables are set
3. Try a smaller model
4. Contact Hugging Face support if the problem persists
---
## 📚 **PART 15: Complete API Reference**
### **Base URL**
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
### **Authentication**
All endpoints (except `/health`) require:
```
Authorization: Bearer YOUR_API_KEY
```
---
### **1. Health Check**
**Endpoint**: `GET /health`
**No authentication required**
**Example**:
```bash
curl https://YOUR_API/health
```
**Response**:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "services": [
    {"name": "llm", "status": "up"},
    {"name": "vector_db", "status": "up"}
  ],
  "uptime_seconds": 3600
}
```
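Since the service takes a while to become healthy after a restart, scripts that depend on it can poll `/health` first. A sketch with the fetcher injected - in a real client, `fetch_health` would be something like `lambda: requests.get(f"{API_URL}/health").json()` (that wiring is an assumption, not from the source):

```python
import time

def wait_until_healthy(fetch_health, attempts=30, delay=2.0):
    """Poll until the health payload reports status == "healthy"."""
    for _ in range(attempts):
        try:
            payload = fetch_health()
            if payload.get("status") == "healthy":
                return True
        except Exception:
            pass  # Service may still be starting up; keep polling.
        time.sleep(delay)
    return False

# Stubbed example: not healthy on the first poll, healthy on the second.
payloads = iter([{"status": "starting"}, {"status": "healthy"}])
ok = wait_until_healthy(lambda: next(payloads), delay=0.01)
```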
---
### **2. Metrics**
**Endpoint**: `GET /metrics`
**Requires authentication**
**Example**:
```bash
curl https://YOUR_API/metrics \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "timestamp": 1698765432000,
  "requests_total": 150,
  "requests_by_endpoint": {
    "/ai/chat": 100,
    "/rag/query": 50
  },
  "errors_total": 5,
  "rate_limit_hits": 2,
  "average_response_time_ms": 1250
}
```
| ### **3. Simple Chat** | |
| **Endpoint**: `POST /ai/chat` | |
| **Request**: | |
| ```json | |
| { | |
| "conversation": [ | |
| {"role": "user", "content": "Hello!"} | |
| ], | |
| "model": "llama2", | |
| "options": { | |
| "temperature": 0.7, | |
| "max_tokens": 500 | |
| } | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "reply": "Hello! How can I help you today?", | |
| "model": "llama2", | |
| "usage": { | |
| "prompt_tokens": 10, | |
| "completion_tokens": 20, | |
| "total_tokens": 30 | |
| }, | |
| "sources": null | |
| } | |
| ``` | |
| **Example**: | |
| ```bash | |
| curl -X POST https://YOUR_API/ai/chat \ | |
| -H "Authorization: Bearer YOUR_KEY" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "conversation": [ | |
| {"role": "user", "content": "Explain AI in one sentence"} | |
| ] | |
| }' | |
| ``` | |
| --- | |
| ### **4. Multi-turn Conversation** | |
| **Endpoint**: `POST /ai/chat` | |
| **Request** (with context): | |
| ```json | |
| { | |
| "conversation": [ | |
| {"role": "user", "content": "What is 2+2?"}, | |
| {"role": "assistant", "content": "2+2 equals 4."}, | |
| {"role": "user", "content": "What about 2+3?"} | |
| ] | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "reply": "2+3 equals 5.", | |
| "model": "llama2", | |
| "usage": {...} | |
| } | |
| ``` | |
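The pattern above generalizes: append each user message and assistant reply to the `conversation` list and resend the whole list on every turn. A sketch with the transport injected - `call_api` is a placeholder you would implement as a POST to `/ai/chat` that returns the `reply` field:

```python
def chat_turn(conversation, user_message, call_api):
    """Append a user turn, call the API with the full history, record the reply.

    `call_api` takes the conversation list and returns the assistant's reply
    string (e.g. the `reply` field of the /ai/chat response).
    """
    conversation.append({"role": "user", "content": user_message})
    reply = call_api(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply

# Stubbed example: the fake API just reports how many user turns it received.
history = []
fake_api = lambda conv: f"reply #{sum(1 for m in conv if m['role'] == 'user')}"
chat_turn(history, "What is 2+2?", fake_api)
chat_turn(history, "What about 2+3?", fake_api)
```

Because the server is stateless, the client owns the history; trim old turns if the conversation grows too long for the model's context.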
---
### **5. RAG Query**
**Endpoint**: `POST /rag/query`
**Request**:
```json
{
  "query": "What are the main features?",
  "top_k": 5,
  "model": "llama2",
  "use_retrieval": true
}
```
**Response**:
```json
{
  "answer": "The main features include...",
  "sources": [
    {
      "doc_id": "doc_123",
      "chunk_id": "chunk_5",
      "content": "Feature description...",
      "score": 0.92,
      "metadata": {"title": "Documentation"}
    }
  ],
  "model": "llama2",
  "usage": {...},
  "retrieval_time_ms": 250
}
```
**Example**:
```bash
curl -X POST https://YOUR_API/rag/query \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "top_k": 3
  }'
```
---
| ### **6. Upload Document** | |
| **Endpoint**: `POST /upload` | |
| **Request**: | |
| ```json | |
| { | |
| "filename": "document.txt", | |
| "content_base64": "VGhpcyBpcyBhIHRlc3Q=", | |
| "metadata": { | |
| "title": "Test Document", | |
| "category": "docs" | |
| } | |
| } | |
| ``` | |
| **Response**: | |
| ```json | |
| { | |
| "doc_id": "doc_abc123", | |
| "filename": "document.txt", | |
| "size_bytes": 1024, | |
| "status": "processing", | |
| "estimated_chunks": 5 | |
| } | |
| ``` | |
**Example (Linux/Mac)**:
```bash
# Encode the file to base64, stripping line wraps
# (GNU base64 inserts a newline every 76 chars, which would break the JSON body)
base64 document.txt | tr -d '\n' > document.b64
# Upload
curl -X POST https://YOUR_API/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"document.txt\",
    \"content_base64\": \"$(cat document.b64)\",
    \"metadata\": {\"title\": \"My Document\"}
  }"
```
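If you build the payload in Python instead, the standard library's `base64` module never inserts line breaks, so the encoded string is safe to embed in JSON as-is. This reproduces the value from the sample request above:

```python
import base64

content = b"This is a test"
# b64encode returns bytes with no line wrapping, unlike the base64 CLI on Linux
payload_b64 = base64.b64encode(content).decode("ascii")
print(payload_b64)  # VGhpcyBpcyBhIHRlc3Q= — the same value as the sample request
assert base64.b64decode(payload_b64) == content  # round-trips cleanly
```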
---
### **7. Get Document Sources**
**Endpoint**: `GET /docs/:id/sources`
**Example**:
```bash
curl https://YOUR_API/docs/doc_abc123/sources \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "sources": [
    {
      "doc_id": "doc_abc123",
      "chunk_id": "chunk_0",
      "content": "This is the first chunk...",
      "score": 1.0,
      "metadata": {...}
    }
  ]
}
```
---
### **8. Simple Query**
**Endpoint**: `GET /ai/query?q=QUESTION`
**Example**:
```bash
curl "https://YOUR_API/ai/query?q=What+is+AI" \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "answer": "AI stands for Artificial Intelligence...",
  "model": "llama2"
}
```
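Questions with spaces or punctuation must be URL-encoded in the `q` parameter rather than typed with `+` by hand. A quick sketch using Python's standard library:

```python
from urllib.parse import quote_plus

question = "What is AI"
# quote_plus encodes spaces as "+" and escapes other special characters
url = f"https://YOUR_API/ai/query?q={quote_plus(question)}"
print(url)  # https://YOUR_API/ai/query?q=What+is+AI
```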
---
### **9. Get Available Models**
**Endpoint**: `GET /rag/models`
**Example**:
```bash
curl https://YOUR_API/rag/models \
  -H "Authorization: Bearer YOUR_KEY"
```
**Response**:
```json
{
  "models": ["ollama", "llama", "llama2", "llama3", "mistral"],
  "default_model": "llama2"
}
```
---
## **PART 16: Advanced Tips & Tricks**
### **Tip 1: Optimize Response Time**
**Add warmup requests** to keep the model in memory.
Create a simple cron job or scheduled task:
```bash
# Every 5 minutes, make a request to keep the model loaded.
# Note: a crontab entry must fit on a single line (no backslash continuation).
*/5 * * * * curl -s -X POST https://YOUR_API/ai/chat -H "Authorization: Bearer YOUR_KEY" -H "Content-Type: application/json" -d '{"conversation":[{"role":"user","content":"ping"}]}'
```
---
### **Tip 2: Use System Prompts for Consistency**
```bash
curl -X POST https://YOUR_API/ai/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {
        "role": "system",
        "content": "You are a friendly customer support agent. Be helpful and concise."
      },
      {
        "role": "user",
        "content": "How do I reset my password?"
      }
    ]
  }'
```
---
### **Tip 3: Batch Document Upload**
Upload multiple documents efficiently:
```bash
#!/bin/bash
# batch_upload.sh: upload every .txt file in docs/
for file in docs/*.txt; do
  echo "Uploading $file..."
  # Strip line wraps from the base64 output so the JSON body stays valid
  b64=$(base64 "$file" | tr -d '\n')
  curl -X POST https://YOUR_API/upload \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"filename\": \"$(basename "$file")\",
      \"content_base64\": \"$b64\"
    }"
  sleep 2  # Rate limiting
done
```
---
### **Tip 4: Monitor Costs**
If using paid hardware:
1. Check Hugging Face billing: https://huggingface.co/settings/billing
2. Set up budget alerts
3. Monitor Space uptime
4. Pause the Space when not in use:
   - Settings → "Pause Space"
   - Saves money, stops billing
   - Resume anytime
---
### **Tip 5: Create API Key Tiers**
**In Space Settings**, set up different keys for different users:
```
# Free tier - limited rate
API_KEYS=free_user_key_1,free_user_key_2
# Premium tier - higher rate
PREMIUM_API_KEYS=premium_user_key_1
# Admin tier - unlimited
ADMIN_API_KEYS=admin_key_1
```
Then adjust rate limits:
```
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_PREMIUM=300
RATE_LIMIT_ADMIN=10000
```
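Server-side, the lookup can be as simple as checking each tier's key list in priority order. A hypothetical Python sketch (the real Space code may structure this differently; the variable names follow the settings above):

```python
def rate_limit_for(api_key, env):
    """Return the requests-per-minute limit for a key, or None if unknown."""
    # Check the most privileged tier first so an admin key always wins
    tiers = [
        ("ADMIN_API_KEYS", "RATE_LIMIT_ADMIN", 10000),
        ("PREMIUM_API_KEYS", "RATE_LIMIT_PREMIUM", 300),
        ("API_KEYS", "RATE_LIMIT_DEFAULT", 60),
    ]
    for keys_var, limit_var, default in tiers:
        keys = [k for k in env.get(keys_var, "").split(",") if k]
        if api_key in keys:
            return int(env.get(limit_var, default))
    return None  # unknown key: reject the request

# Example environment mirroring the tier settings above
env = {
    "API_KEYS": "free_user_key_1,free_user_key_2",
    "PREMIUM_API_KEYS": "premium_user_key_1",
    "RATE_LIMIT_DEFAULT": "60",
    "RATE_LIMIT_PREMIUM": "300",
}
print(rate_limit_for("premium_user_key_1", env))  # 300
```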
---
## ✅ **Final Checklist**
Before going live, verify:
- [ ] Space is running (green status)
- [ ] Health check returns `"status": "healthy"`
- [ ] Chat endpoint responds correctly
- [ ] Changed default API keys to strong random strings
- [ ] Tested with your own API key
- [ ] Documented your API keys securely (password manager)
- [ ] Set appropriate rate limits
- [ ] Chose the right model for your hardware
- [ ] Tested all endpoints you plan to use
- [ ] Reviewed logs for errors
- [ ] (Optional) Upgraded hardware if needed
- [ ] (Optional) Made Space private if needed
---
## **Congratulations!**
You now have:
✅ A fully functional AI API running on Hugging Face Spaces
✅ Powered by Ollama (no OpenAI costs!)
✅ Accessible from anywhere via HTTPS
✅ Secured with API key authentication
✅ Ready to integrate into your apps
**Your API URL**:
```
https://YOUR_USERNAME-ai-api-ollama.hf.space
```
**Share your API** (securely):
- Give the URL + an API key to developers
- Use it in web apps, mobile apps, and scripts
- Handle as many requests as your hardware allows
- Scale up as needed
---
| ## π **Need Help?** | |
| **If you're stuck**: | |
| 1. β Re-read the relevant section | |
| 2. β Check Space logs for errors | |
| 3. β Try the troubleshooting section | |
| 4. β Open an issue on GitHub | |
| 5. β Ask on Hugging Face forums | |
| **Common beginner mistakes**: | |
| - Forgot to rename `Dockerfile.huggingface` to `Dockerfile` | |
| - Used wrong API key format (missing "Bearer") | |
| - Chose model too large for hardware | |
| - Didn't wait for initial model download | |
| --- | |
## **What's Next?**
Now that your API is live:
1. **Build a chat interface**:
   - React app
   - Vue app
   - Mobile app
   - WordPress plugin
2. **Add more features**:
   - User accounts
   - Usage analytics
   - Custom models
   - Advanced RAG
3. **Scale up**:
   - Upgrade hardware
   - Add caching
   - Load balancing
   - CDN
4. **Monetize** (optional):
   - Charge for API access
   - Offer different tiers
   - White-label for clients
---
**You did it!**
Your AI-powered API is now live and ready to change the world!