# ZeroGPU Migration Guide
This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.
## Overview
FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.
## Changes Made
### 1. New Gradio Application (`app_gradio.py`)
Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:
**Key Features:**
- Uses Gradio 5.49.1+ for the interface
- Wraps the generation function in the `@spaces.GPU(duration=15)` decorator, giving each call a 15-second GPU budget
- Loads the model in global scope (outside the GPU-decorated function) for efficiency
- Provides a simpler interface than the original Flask app and its custom HTML front end
- Accepts camera trajectory as JSON input
- Returns PLY files for download
**Architecture:**
```python
import spaces

# Model loads globally (once, at startup)
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function uses the GPU only when called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns the PLY file + a status message
    ...
```
### 2. Requirements Updates (`requirements.txt`)
**Removed:**
- `flask==3.1.2` (not needed for ZeroGPU deployment)
**Added:**
- `spaces` (Hugging Face Spaces integration library)
**Kept:**
- `gradio==5.49.1` (required for Gradio SDK)
- All other dependencies remain unchanged
### 3. System Dependencies (`packages.txt`)
**Created a new file** listing the system-level packages that gsplat needs to compile its CUDA extensions:
- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)
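The file contains just the two package names, one per line; Spaces installs them via apt at build time:

```text
libglm-dev
build-essential
```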
### 4. README Updates
**Added YAML frontmatter:**
```yaml
---
title: FlashWorld
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
---
```
**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy
### 5. CLAUDE.md Updates
Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns
### 6. Example Camera Trajectory
Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
## Key Design Decisions
### Why 15 Seconds?
The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
- Input processing (image resizing, camera parsing)
- Export to PLY format
- Buffer for slower GPUs or variable load
3. The ZeroGPU default duration is 60 seconds, so 15 seconds is conservative
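A rough breakdown of how the 15 seconds are spent (only the ~7-second generation figure comes from the list above; the other numbers are assumed headroom):

```python
generation_s = 7   # ~7 s generation on A100/A800 (from the measurements above)
pre_post_s = 3     # assumed: image resizing, camera parsing, PLY export
buffer_s = 5       # assumed: headroom for slower GPUs and variable load
print(generation_s + pre_post_s + buffer_s)  # 15
```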
### Model Loading Strategy
The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:
**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices
**Implementation:**
```python
# Global scope - loads once at startup
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)
```
### Input Format
Camera trajectories are provided as JSON to make the Gradio interface simpler:
```json
{
  "cameras": [
    {
      "quaternion": [w, x, y, z],
      "position": [x, y, z],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
```
This differs from the Flask API, which accepted nested dictionaries in the POST request.
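If you prefer to build trajectories programmatically, the format above is easy to emit from Python. The following is an illustrative sketch, not code from the repository; the identity quaternion and the assumption that the camera steps forward along +z are placeholders you will likely want to change:

```python
import json

# Build a simple 5-camera, forward-moving trajectory in the expected format.
cameras = []
for i in range(5):
    cameras.append({
        "quaternion": [1.0, 0.0, 0.0, 0.0],  # w, x, y, z (identity: no rotation)
        "position": [0.0, 0.0, 0.1 * i],     # assumed forward step along +z
        "fx": 352.0,
        "fy": 352.0,
        "cx": 352.0,
        "cy": 240.0,
    })

with open("my_trajectory.json", "w") as f:
    json.dump({"cameras": cameras}, f, indent=2)
```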
## Deployment Instructions
### Local Testing
Test the Gradio app locally before deploying:
```bash
python app_gradio.py
```
This starts the Gradio interface at `http://localhost:7860`.
### Hugging Face Spaces Deployment
1. **Create a new Space:**
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "ZeroGPU" as hardware
2. **Upload files:**
- Push this repository to the Space
- Ensure `app_gradio.py` is set as the app file in README.md
3. **Configuration:**
- The Space will automatically use the YAML frontmatter in README.md
- Model checkpoint will auto-download from HuggingFace Hub
- No additional configuration needed
4. **Optional: Enable `--offload_t5` flag:**
- Edit `app_gradio.py` to add `offload_t5=True` in `GenerationSystem` initialization
   - This reduces GPU memory usage but may slightly increase generation time (see the sketch below)
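A minimal sketch of that edit, reusing the constructor call shown in the Architecture section above:

```python
# app_gradio.py (sketch): construct the system with T5 offloading enabled.
generation_system = GenerationSystem(
    ckpt_path=ckpt_path,
    device=device,
    offload_t5=True,  # lower GPU memory usage, slightly longer generation
)
```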
## Limitations
### ZeroGPU Constraints
1. **60-second hard limit:** Cannot exceed 60 seconds per GPU call
2. **No torch.compile:** Not supported in ZeroGPU environment
3. **Gradio only:** Must use Gradio SDK (no Flask or other frameworks)
4. **Python 3.10.13:** Recommended Python version
### Feature Differences from Flask App
The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`):
**Missing features:**
- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling
**Present features:**
- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface
### Recommended Usage
For **ZeroGPU deployment:**
- Use `app_gradio.py`
- Keep camera trajectories reasonable (≀24 frames)
- Consider enabling `--offload_t5` for memory savings
For **local development with full features:**
- Use `app.py`
- Enjoy the full custom UI with interactive camera controls
- Support for multiple concurrent generations
## Testing
### Test the Gradio App
```bash
# Start the app
python app_gradio.py
# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter text prompt (optional)
# 3. Paste example camera JSON from examples/simple_trajectory.json
# 4. Select resolution (24x480x704)
# 5. Click "Generate 3D Scene"
```
### Verify GPU Decorator
Check that model loading happens outside the decorator:
```python
# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)

# Bad - would reload the model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(...):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
```
## Troubleshooting
### "GPU budget exceeded"
**Cause:** Generation took longer than 15 seconds
**Solutions:**
- Reduce number of frames in camera trajectory
- Enable `--offload_t5` flag
- Increase duration: `@spaces.GPU(duration=20)`
### "Out of memory"
**Cause:** GPU memory exhausted
**Solutions:**
- Enable T5 offloading: `offload_t5=True`
- Enable VAE offloading: `offload_vae=True`
- Reduce resolution
- Reduce number of frames
### "Model checkpoint not found"
**Cause:** Automatic download failed
**Solutions:**
- Check internet connection
- Verify HuggingFace access
- Manually download and specify with `--ckpt` flag
### "Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"
**Cause:** Missing GLM library headers required for gsplat CUDA compilation
**Solutions:**
- Ensure `packages.txt` exists with `libglm-dev` and `build-essential`
- Restart the Space to reinstall dependencies
- Check Space build logs for system package installation errors
### "Bias is not supported when out_dtype is set to Float32"
**Cause:** PyTorch FP8 operations limitation on certain GPU architectures
**Solutions:**
- This is fixed in `quant.py` by applying bias separately when needed
- Ensure you have the latest version of the code
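The workaround amounts to leaving the bias out of the FP8 matmul call and adding it afterwards in an ordinary op. Below is a simplified sketch, not the actual `quant.py` code; `fp8_matmul` is a hypothetical stand-in for whatever scaled-matmul call the file uses:

```python
import torch

def fp8_linear(x, weight, bias=None):
    # Run the FP8 matmul without a bias so out_dtype=torch.float32 is accepted.
    out = fp8_matmul(x, weight, out_dtype=torch.float32)  # hypothetical helper
    # Apply the bias separately in a regular floating-point add.
    if bias is not None:
        out = out + bias.to(out.dtype)
    return out
```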
## Future Improvements
Potential enhancements for ZeroGPU deployment:
1. **Gradio Blocks UI:** Add more interactive controls
2. **Example gallery:** Pre-loaded example camera trajectories
3. **3D visualization:** Embed PLY viewer in Gradio interface
4. **Video preview:** Show rendered video before downloading PLY
5. **Dynamic duration:** Adjust GPU budget based on camera count
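On the last point: newer versions of the `spaces` package allow `duration` to be a callable that receives the same arguments as the decorated function. Assuming that feature is available, a sketch (the per-frame timing constants are invented for illustration):

```python
import json
import spaces

def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    # Assumed cost model: fixed overhead plus a per-frame cost, capped at 60 s.
    num_frames = len(json.loads(camera_json).get("cameras", []))
    return min(10 + 0.5 * num_frames, 60)

@spaces.GPU(duration=estimate_duration)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    ...
```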
## References
- [ZeroGPU Documentation](https://huggingface.co/docs/hub/en/spaces-zerogpu)
- [Gradio Documentation](https://gradio.app/docs/)
- [FlashWorld Paper](https://arxiv.org/pdf/2510.13678)
- [FlashWorld Project Page](https://imlixinyang.github.io/FlashWorld-Project-Page)