ZeroGPU Migration Guide
This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.
Overview
FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.
Changes Made
1. New Gradio Application (app_gradio.py)
Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:
Key Features:
- Uses Gradio 5.49.1+ for the interface
- Implements the `@spaces.GPU(duration=15)` decorator with a 15-second GPU budget
- Model loading happens in global scope (outside the GPU decorator) for efficiency
- Simpler interface compared to the original Flask app with custom HTML
- Accepts camera trajectory as JSON input
- Returns PLY files for download
Architecture:
# Model loads globally (once, at startup)
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function uses GPU only when called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns PLY file + status message
    ...
2. Requirements Updates (requirements.txt)
Removed:
- `flask==3.1.2` (not needed for ZeroGPU deployment)
Added:
- `spaces` (Hugging Face Spaces integration library)
Kept:
- `gradio==5.49.1` (required for Gradio SDK)
- All other dependencies remain unchanged
3. System Dependencies (packages.txt)
Created new file to install system-level dependencies required by gsplat for CUDA compilation:
- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)
4. README Updates
**Added YAML frontmatter:**
```yaml
title: FlashWorld
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
```
**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy
### 5. CLAUDE.md Updates
Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns
### 6. Example Camera Trajectory
Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
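A trajectory of this kind can also be generated programmatically. The sketch below is a minimal illustration, not the contents of the shipped example file: the helper name `make_forward_trajectory`, the step size, the identity rotation, and the choice of -z as "forward" are all assumptions, and the intrinsics are copied from the input-format example in this guide.

```python
import json


def make_forward_trajectory(num_cameras=5, step=0.1):
    """Build a straight-line trajectory with a top-level "cameras" list."""
    cameras = []
    for i in range(num_cameras):
        cameras.append({
            # Identity rotation in [w, x, y, z] order: the camera never turns.
            "quaternion": [1.0, 0.0, 0.0, 0.0],
            # Assumption: "forward" is -z and `step` is in scene units.
            "position": [0.0, 0.0, 0.0 - step * i],
            # Intrinsics taken from the example in this guide.
            "fx": 352.0,
            "fy": 352.0,
            "cx": 352.0,
            "cy": 240.0,
        })
    return {"cameras": cameras}


if __name__ == "__main__":
    # Print JSON that can be pasted into the camera trajectory field.
    print(json.dumps(make_forward_trajectory(), indent=2))
```

Check the axis convention against the real example file before relying on the generated poses.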
## Key Design Decisions
### Why 15 Seconds?
The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
- Input processing (image resizing, camera parsing)
- Export to PLY format
- Buffer for slower GPUs or variable load
3. ZeroGPU default is 60 seconds, so 15 seconds is conservative
### Model Loading Strategy
The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:
**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices
**Implementation:**
```python
# Global scope - loads once at startup
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)
```
Input Format
Camera trajectories are provided as JSON to make the Gradio interface simpler:
{
  "cameras": [
    {
      "quaternion": [w, x, y, z],
      "position": [x, y, z],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
This differs from the Flask API, which used nested dictionaries in the POST request.
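Because the trajectory arrives as free-form text, it may help to validate it before any GPU time is spent. The following is a minimal sketch of such a check, assuming the key names shown above; `parse_cameras` and `REQUIRED_KEYS` are hypothetical names and are not taken from `app_gradio.py`.

```python
import json

REQUIRED_KEYS = ("quaternion", "position", "fx", "fy", "cx", "cy")


def parse_cameras(camera_json):
    """Parse the camera JSON string and return the list of camera dicts.

    Raises ValueError with a readable message if the input is malformed.
    """
    try:
        data = json.loads(camera_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"camera input is not valid JSON: {exc}") from exc

    cameras = data.get("cameras") if isinstance(data, dict) else None
    if not isinstance(cameras, list) or not cameras:
        raise ValueError('expected a non-empty "cameras" list')

    for i, cam in enumerate(cameras):
        if not isinstance(cam, dict):
            raise ValueError(f"camera {i} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in cam]
        if missing:
            raise ValueError(f"camera {i} is missing keys: {missing}")
        if len(cam["quaternion"]) != 4:
            raise ValueError(f"camera {i}: quaternion needs 4 values [w, x, y, z]")
        if len(cam["position"]) != 3:
            raise ValueError(f"camera {i}: position needs 3 values [x, y, z]")
    return cameras
```

Running a check like this before entering the GPU-decorated function means malformed input fails fast without consuming any of the time budget.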
Deployment Instructions
Local Testing
Test the Gradio app locally before deploying:
python app_gradio.py
This will start the Gradio interface at http://localhost:7860
Hugging Face Spaces Deployment
Create a new Space:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "ZeroGPU" as hardware
Upload files:
- Push this repository to the Space
- Ensure `app_gradio.py` is set as the app file in README.md
Configuration:
- The Space will automatically use the YAML frontmatter in README.md
- Model checkpoint will auto-download from HuggingFace Hub
- No additional configuration needed
Optional: Enable the `--offload_t5` flag:
- Edit `app_gradio.py` to add `offload_t5=True` in the `GenerationSystem` initialization
- This reduces GPU memory usage but may slightly increase generation time
Limitations
ZeroGPU Constraints
- 60-second default limit: A GPU call is capped at 60 seconds unless a different `duration` is passed to `@spaces.GPU`
- No torch.compile: Not supported in ZeroGPU environment
- Gradio only: Must use Gradio SDK (no Flask or other frameworks)
- Python 3.10.13: Recommended Python version
Feature Differences from Flask App
The Gradio app (app_gradio.py) differs from the original Flask app (app.py):
Missing features:
- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling
Present features:
- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface
Recommended Usage
For ZeroGPU deployment:
- Use `app_gradio.py`
- Keep camera trajectories reasonable (≤24 frames)
- Consider enabling `--offload_t5` for memory savings
For local development with full features:
- Use `app.py`
- Enjoy the full custom UI with interactive camera controls
- Support for multiple concurrent generations
Testing
Test the Gradio App
# Start the app
python app_gradio.py
# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter text prompt (optional)
# 3. Paste example camera JSON from examples/simple_trajectory.json
# 4. Select resolution (24x480x704)
# 5. Click "Generate 3D Scene"
Verify GPU Decorator
Check that model loading happens outside the decorator:
# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)

# Bad - would reload model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(...):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
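The difference between the two patterns can be checked without a GPU. The sketch below is a toy illustration only: `FakeGenerationSystem` and `fake_gpu` are stand-ins invented here so the snippet runs without the `spaces` package or the real model, and the counter simply records how many times the "model" is constructed.

```python
import functools

LOAD_COUNT = 0  # how many times the stand-in model has been constructed


class FakeGenerationSystem:
    """Stand-in for the real model; only counts constructions."""

    def __init__(self):
        global LOAD_COUNT
        LOAD_COUNT += 1

    def generate(self, prompt):
        return f"scene for {prompt!r}"


def fake_gpu(duration):
    """Stand-in decorator with the same call shape as spaces.GPU(duration=...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Global scope: constructed once, when the module is imported.
generation_system = FakeGenerationSystem()


@fake_gpu(duration=15)
def generate_scene(prompt):
    # Only inference happens inside the decorated function.
    return generation_system.generate(prompt)


for prompt in ("a forest", "a beach", "a city"):
    generate_scene(prompt)

print(LOAD_COUNT)  # prints 1: three calls, one load
```

Moving the `FakeGenerationSystem()` call inside `generate_scene` would make the counter grow with every request, which is the behaviour the "Bad" pattern above warns against.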
Troubleshooting
"GPU budget exceeded"
Cause: Generation took longer than 15 seconds
Solutions:
- Reduce number of frames in camera trajectory
- Enable the `--offload_t5` flag
- Increase duration: `@spaces.GPU(duration=20)`
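One way to reduce the frame count is to subsample a long trajectory before generation. The helper below is a hedged sketch: `limit_cameras` is a hypothetical name, the default of 24 mirrors the frame recommendation in this guide, and whether the model accepts an arbitrary subsampled trajectory is an assumption to verify.

```python
def limit_cameras(cameras, max_cameras=24):
    """Return at most `max_cameras` entries, evenly spaced.

    The first and last cameras are kept so the start and end poses survive.
    """
    if max_cameras < 1:
        raise ValueError("max_cameras must be at least 1")
    n = len(cameras)
    if n <= max_cameras:
        return list(cameras)
    if max_cameras == 1:
        return [cameras[0]]
    # Evenly spaced indices from 0 to n - 1 inclusive.
    indices = [round(i * (n - 1) / (max_cameras - 1)) for i in range(max_cameras)]
    return [cameras[i] for i in indices]
```

For example, `limit_cameras(cameras)` on a 48-camera trajectory returns 24 cameras that still begin and end at the original poses.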
"Out of memory"
Cause: GPU memory exhausted
Solutions:
- Enable T5 offloading: `offload_t5=True`
- Enable VAE offloading: `offload_vae=True`
- Reduce resolution
- Reduce number of frames
"Model checkpoint not found"
Cause: Automatic download failed
Solutions:
- Check internet connection
- Verify HuggingFace access
- Manually download and specify with the `--ckpt` flag
"Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"
Cause: Missing GLM library headers required for gsplat CUDA compilation
Solutions:
- Ensure `packages.txt` exists with `libglm-dev` and `build-essential`
- Restart the Space to reinstall dependencies
- Check Space build logs for system package installation errors
"Bias is not supported when out_dtype is set to Float32"
Cause: PyTorch FP8 operations limitation on certain GPU architectures
Solutions:
- This is fixed in `quant.py` by applying bias separately when needed
- Ensure you have the latest version of the code
Future Improvements
Potential enhancements for ZeroGPU deployment:
- Gradio Blocks UI: Add more interactive controls
- Example gallery: Pre-loaded example camera trajectories
- 3D visualization: Embed PLY viewer in Gradio interface
- Video preview: Show rendered video before downloading PLY
- Dynamic duration: Adjust GPU budget based on camera count
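As a rough sketch of the dynamic-duration idea, the budget could be estimated from the number of cameras. Everything numeric here is an assumption chosen for illustration: the fixed overhead, the per-camera cost, and the 60-second cap (which mirrors the figure cited in this guide) are not measurements, and `estimate_duration` is a hypothetical helper.

```python
import math


def estimate_duration(num_cameras, base_seconds=3.0, seconds_per_camera=0.5,
                      cap_seconds=60):
    """Estimate a GPU budget in whole seconds for `num_cameras` frames.

    base_seconds and seconds_per_camera are placeholder values; replace
    them with timings measured on the target hardware.
    """
    if num_cameras < 1:
        raise ValueError("num_cameras must be at least 1")
    estimate = base_seconds + seconds_per_camera * num_cameras
    # Round up to whole seconds and never exceed the cap.
    return min(cap_seconds, math.ceil(estimate))


print(estimate_duration(24))  # prints 15 with the placeholder constants
```

The ZeroGPU documentation describes passing a callable as `duration`, so a function like this could in principle feed the decorator; confirm the expected callable signature there before wiring it in.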