# ZeroGPU Migration Guide

This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.

## Overview

FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.

## Changes Made

### 1. New Gradio Application (`app_gradio.py`)

Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:

**Key Features:**
- Uses Gradio 5.49.1 (the version pinned in `requirements.txt`) for the interface
- Wraps generation in the `@spaces.GPU(duration=15)` decorator, giving each call a 15-second GPU budget
- Model loading happens in global scope (outside the GPU decorator) for efficiency
- Simpler interface than the original Flask app, which used custom HTML
- Accepts camera trajectory as JSON input
- Returns PLY files for download

**Architecture:**
```python
import spaces

# Model loads globally (once, at startup).
# `ckpt_path`, `device`, and `args` are assumed to be defined earlier in app_gradio.py.
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function uses GPU only when called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns PLY file + status message
    ...  # body elided
```

### 2. Requirements Updates (`requirements.txt`)

**Removed:**
- `flask==3.1.2` (not needed for ZeroGPU deployment)

**Added:**
- `spaces` (Hugging Face Spaces integration library)

**Kept:**
- `gradio==5.49.1` (required for Gradio SDK)
- All other dependencies remain unchanged

### 3. System Dependencies (`packages.txt`)

**Created new file** to install system-level dependencies required by gsplat for CUDA compilation:
- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)

### 4. README Updates

**Added YAML frontmatter:**
```yaml
---
title: FlashWorld
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
---
```

**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy

### 5. CLAUDE.md Updates

Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns

### 6. Example Camera Trajectory

Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
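
For illustration, a trajectory of this shape could be generated with a short script. This is a sketch under assumptions, not the actual contents of `examples/simple_trajectory.json`: the function name `make_forward_trajectory`, the step size, the choice of +z as "forward", and the identity orientation are all hypothetical, and the intrinsics are borrowed from the input-format example in this guide.

```python
import json


def make_forward_trajectory(num_cameras=5, step=0.1):
    """Build a camera list that moves in a straight line.

    Assumptions (hypothetical, not taken from the repository):
    - "forward" is the +z axis
    - every camera keeps the identity orientation
    """
    cameras = []
    for i in range(num_cameras):
        cameras.append({
            "quaternion": [1.0, 0.0, 0.0, 0.0],  # identity rotation, [w, x, y, z]
            "position": [0.0, 0.0, i * step],    # [x, y, z]
            "fx": 352.0,
            "fy": 352.0,
            "cx": 352.0,
            "cy": 240.0,
        })
    return {"cameras": cameras}


trajectory = make_forward_trajectory()
trajectory_json = json.dumps(trajectory, indent=2)  # text to paste into the app
```

If the model expects a different forward axis or camera convention, only the `position` and `quaternion` lines need to change.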

## Key Design Decisions

### Why 15 Seconds?

The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
   - Input processing (image resizing, camera parsing)
   - Export to PLY format
   - Buffer for slower GPUs or variable load
3. ZeroGPU default is 60 seconds, so 15 seconds is conservative
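
One way to check that the budget still holds after a code change is to time each stage and compare the total against the 15 seconds. The sketch below is illustrative only: `timed` and `budget_report` are hypothetical helpers, and the `time.sleep` calls stand in for the real preprocessing, generation, and export steps.

```python
import time
from contextlib import contextmanager

GPU_BUDGET_SECONDS = 15  # matches @spaces.GPU(duration=15)


@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of one stage in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def budget_report(timings, budget=GPU_BUDGET_SECONDS):
    """Return (total seconds used, seconds of headroom left)."""
    total = sum(timings.values())
    return total, budget - total


timings = {}
with timed("preprocess", timings):
    time.sleep(0.01)  # stand-in for image resizing and camera parsing
with timed("generate", timings):
    time.sleep(0.02)  # stand-in for the ~7 s generation step
with timed("export", timings):
    time.sleep(0.01)  # stand-in for PLY export

total, headroom = budget_report(timings)
print(f"used {total:.2f}s, headroom {headroom:.2f}s")
```

Because CUDA kernels run asynchronously, wall-clock timings around real GPU work are only meaningful if the device is synchronized (for example with `torch.cuda.synchronize()`) before each stage ends.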

### Model Loading Strategy

The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:

**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices

**Implementation:**
```python
import spaces

# Global scope - loads once at startup (`...` stands for elided arguments)
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    return generation_system.generate(...)
```

### Input Format

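Camera trajectories are provided as JSON to make the Gradio interface simpler. Each camera has a `quaternion` in `[w, x, y, z]` order, a `position` in `[x, y, z]` order, and the intrinsics `fx`, `fy`, `cx`, `cy`. The example below uses an identity rotation at the origin as placeholder values:

```json
{
  "cameras": [
    {
      "quaternion": [1.0, 0.0, 0.0, 0.0],
      "position": [0.0, 0.0, 0.0],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
```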

This differs from the Flask API, which used nested dictionaries in the POST request.
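
Since the trajectory arrives as free-form text, it helps to validate it before any GPU time is spent. The following is a minimal sketch of such a check, not the parsing code in `app_gradio.py`: the name `parse_camera_json` is hypothetical, and the 24-camera cap assumes one camera per frame, following the frame limit recommended later in this guide.

```python
import json

REQUIRED_KEYS = ("quaternion", "position", "fx", "fy", "cx", "cy")
MAX_FRAMES = 24  # assumption: one camera per frame, capped at the recommended 24


def parse_camera_json(text, max_frames=MAX_FRAMES):
    """Parse and validate a camera trajectory; raise ValueError on bad input."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON: {exc}") from exc

    cameras = data.get("cameras") if isinstance(data, dict) else None
    if not isinstance(cameras, list) or not cameras:
        raise ValueError('Expected a non-empty "cameras" list')
    if len(cameras) > max_frames:
        raise ValueError(f"Too many cameras: {len(cameras)} > {max_frames}")

    for i, cam in enumerate(cameras):
        if not isinstance(cam, dict):
            raise ValueError(f"Camera {i} must be a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in cam]
        if missing:
            raise ValueError(f"Camera {i} is missing keys: {missing}")
        if len(cam["quaternion"]) != 4:
            raise ValueError(f"Camera {i}: quaternion must be [w, x, y, z]")
        if len(cam["position"]) != 3:
            raise ValueError(f"Camera {i}: position must be [x, y, z]")
    return cameras
```

Raising `ValueError` with a readable message lets the caller turn the failure into a status string for the user instead of a stack trace.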

## Deployment Instructions

### Local Testing

Test the Gradio app locally before deploying:

```bash
python app_gradio.py
```

This starts the Gradio interface at `http://localhost:7860`.
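
Local runs need `import spaces` to succeed. The `spaces` package can normally be installed locally as well, but if it is unavailable, a fallback along these lines keeps the same decorator syntax working. This is a sketch of a common pattern, not code from `app_gradio.py`; the names `gpu` and `generate_scene_stub` are hypothetical.

```python
try:
    import spaces  # present on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:

    def gpu(func=None, **_options):
        """No-op stand-in that accepts both @gpu and @gpu(duration=...)."""
        if func is None:
            # Called with options, e.g. @gpu(duration=15): return a decorator
            return lambda wrapped: wrapped
        # Called bare, e.g. @gpu: return the function unchanged
        return func


@gpu(duration=15)
def generate_scene_stub(prompt):
    """Hypothetical stand-in for the real generate_scene."""
    return f"generated: {prompt}"
```

With this in place, the rest of the app can use `@gpu(duration=15)` regardless of where it runs.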

### Hugging Face Spaces Deployment

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "ZeroGPU" as hardware

2. **Upload files:**
   - Push this repository to the Space
   - Ensure `app_gradio.py` is set as the app file in README.md

3. **Configuration:**
   - The Space will automatically use the YAML frontmatter in README.md
   - The model checkpoint downloads automatically from the Hugging Face Hub
   - No additional configuration needed

4. **Optional: Enable T5 offloading:**
   - Edit `app_gradio.py` to pass `offload_t5=True` when constructing `GenerationSystem` (the equivalent of the `--offload_t5` flag)
   - This reduces GPU memory usage but may slightly increase generation time
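
The relationship between the `--offload_t5` flag and the `offload_t5` constructor argument can be sketched with `argparse`. This is an assumption about how the flags are wired, not a copy of the parser in `app_gradio.py`; `build_parser` and the help strings are hypothetical, while `--offload_t5` and `--ckpt` are the flags named in this guide.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags mentioned in this guide."""
    parser = argparse.ArgumentParser(description="FlashWorld Gradio app (sketch)")
    parser.add_argument(
        "--ckpt",
        default=None,
        help="Path to a local checkpoint; when omitted, the app downloads one",
    )
    parser.add_argument(
        "--offload_t5",
        action="store_true",
        help="Offload the T5 text encoder to save GPU memory",
    )
    return parser


# Parsing an empty list simulates launching the app with no flags.
args = build_parser().parse_args([])

# These keyword arguments would be forwarded to GenerationSystem(...).
generation_kwargs = {"offload_t5": args.offload_t5}
```

When the app is launched without flags, `args.offload_t5` is `False`, which is why the step above changes the value in code rather than on the command line.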

## Limitations

### ZeroGPU Constraints

1. **60-second default:** Each GPU call gets a 60-second budget unless `duration` is set; longer calls must request the extra time explicitly
2. **No torch.compile:** Not supported in ZeroGPU environment
3. **Gradio only:** Must use Gradio SDK (no Flask or other frameworks)
4. **Python 3.10.13:** Recommended Python version

### Feature Differences from Flask App

The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`):

**Missing features:**
- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling

**Present features:**
- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface

### Recommended Usage

For **ZeroGPU deployment:**
- Use `app_gradio.py`
- Keep camera trajectories reasonable (≤24 frames)
- Consider enabling `--offload_t5` for memory savings

For **local development with full features:**
- Use `app.py`
- Get the full custom UI with interactive camera controls
- Run multiple concurrent generations

## Testing

### Test the Gradio App

```bash
# Start the app
python app_gradio.py

# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter text prompt (optional)
# 3. Paste example camera JSON from examples/simple_trajectory.json
# 4. Select resolution (24x480x704)
# 5. Click "Generate 3D Scene"
```

### Verify GPU Decorator

Check that model loading happens outside the decorator:

```python
import spaces

# `...` stands for elided arguments throughout this example.

# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    return generation_system.generate(...)

# Bad - would reload model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
```

## Troubleshooting

### "GPU budget exceeded"

**Cause:** Generation took longer than 15 seconds

**Solutions:**
- Reduce number of frames in camera trajectory
- Disable `--offload_t5` if it is enabled, since offloading may slightly increase generation time
- Increase duration: `@spaces.GPU(duration=20)`

### "Out of memory"

**Cause:** GPU memory exhausted

**Solutions:**
- Enable T5 offloading: `offload_t5=True`
- Enable VAE offloading: `offload_vae=True`
- Reduce resolution
- Reduce number of frames

### "Model checkpoint not found"

**Cause:** Automatic download failed

**Solutions:**
- Check internet connection
- Verify Hugging Face access
- Manually download and specify with `--ckpt` flag

### "Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"

**Cause:** Missing GLM library headers required for gsplat CUDA compilation

**Solutions:**
- Ensure `packages.txt` exists with `libglm-dev` and `build-essential`
- Restart the Space to reinstall dependencies
- Check Space build logs for system package installation errors
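
The first check can be automated before pushing to the Space. The sketch below assumes `packages.txt` lists one package name per line; `missing_from_text` and `missing_system_packages` are hypothetical helper names.

```python
from pathlib import Path

REQUIRED_PACKAGES = {"libglm-dev", "build-essential"}


def missing_from_text(text):
    """Return the required packages that are not listed in `text`."""
    listed = {line.strip() for line in text.splitlines() if line.strip()}
    return REQUIRED_PACKAGES - listed


def missing_system_packages(packages_file="packages.txt"):
    """Return the required packages missing from the given packages file."""
    path = Path(packages_file)
    if not path.exists():
        # No file at all means every required package is missing
        return set(REQUIRED_PACKAGES)
    return missing_from_text(path.read_text())


if __name__ == "__main__":
    missing = missing_system_packages()
    if missing:
        print("packages.txt is missing:", ", ".join(sorted(missing)))
    else:
        print("packages.txt lists all required packages")
```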

### "Bias is not supported when out_dtype is set to Float32"

**Cause:** PyTorch FP8 operations limitation on certain GPU architectures

**Solutions:**
- This is fixed in `quant.py` by applying bias separately when needed
- Ensure you have the latest version of the code

## Future Improvements

Potential enhancements for ZeroGPU deployment:

1. **Gradio Blocks UI:** Add more interactive controls
2. **Example gallery:** Pre-loaded example camera trajectories
3. **3D visualization:** Embed PLY viewer in Gradio interface
4. **Video preview:** Show rendered video before downloading PLY
5. **Dynamic duration:** Adjust GPU budget based on camera count
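
As a rough sketch of the dynamic-duration idea: ZeroGPU accepts a callable for `duration` that receives the same inputs as the decorated function and returns the budget in seconds. The constants below are hypothetical placeholders, not measured values, and `estimate_duration` is an illustrative name.

```python
import json
import math

BASE_SECONDS = 8.0        # hypothetical fixed cost: preprocessing, export, margin
SECONDS_PER_CAMERA = 0.3  # hypothetical per-camera generation cost
MAX_SECONDS = 60          # cap at the ZeroGPU default mentioned in this guide
FALLBACK_CAMERAS = 24     # assume the largest recommended trajectory on bad input


def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    """Return a GPU budget in whole seconds for one request.

    Takes the same arguments as generate_scene so it can be passed
    to the decorator as the `duration` callable.
    """
    try:
        num_cameras = len(json.loads(camera_json)["cameras"])
    except (ValueError, KeyError, TypeError):
        num_cameras = FALLBACK_CAMERAS
    seconds = BASE_SECONDS + SECONDS_PER_CAMERA * num_cameras
    return min(MAX_SECONDS, math.ceil(seconds))


# On Spaces this could be wired up as follows (not executed here):
#   @spaces.GPU(duration=estimate_duration)
#   def generate_scene(image_prompt, text_prompt, camera_json, resolution):
#       ...
```

The constants would need to be calibrated against real timings on ZeroGPU hardware before relying on them.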

## References

- [ZeroGPU Documentation](https://huggingface.co/docs/hub/en/spaces-zerogpu)
- [Gradio Documentation](https://gradio.app/docs/)
- [FlashWorld Paper](https://arxiv.org/pdf/2510.13678)
- [FlashWorld Project Page](https://imlixinyang.github.io/FlashWorld-Project-Page)