
Julian Bilcke

update documentation

4f5fb48 20 days ago


# ZeroGPU Migration Guide

This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.

## Overview

FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.

## Changes Made

### 1. New Gradio Application (`app_gradio.py`)

Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:

**Key Features:**

- Uses Gradio 5.49.1 for the interface
- Wraps generation in `@spaces.GPU(duration=15)`, giving each call a 15-second GPU budget
- Loads the model in global scope (outside the GPU-decorated function) so it is not reloaded on every call
- Offers a simpler interface than the original Flask app and its custom HTML
- Accepts the camera trajectory as JSON input
- Returns PLY files for download

**Architecture:**

```python
import spaces

# Model loads globally (once, at startup)
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function holds the GPU only while it is being called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns PLY file + status message
    ...
```

### 2. Requirements Updates (`requirements.txt`)

**Removed:**

- `flask==3.1.2` (not needed for ZeroGPU deployment)

**Added:**

- `spaces` (Hugging Face Spaces integration library)

**Kept:**

- `gradio==5.49.1` (required for Gradio SDK)
- All other dependencies remain unchanged

### 3. System Dependencies (`packages.txt`)

Created a new file that installs the system-level dependencies gsplat needs for CUDA compilation:

- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)
### 4. README Updates

**Added YAML frontmatter:**

```yaml
---
title: FlashWorld
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
---
```


**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy

### 5. CLAUDE.md Updates

Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns

### 6. Example Camera Trajectory

Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
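
As a rough illustration of what such a file might contain, the sketch below builds a 5-camera forward-moving trajectory with a top-level `cameras` list. The function name `make_forward_trajectory`, the 0.1 step size, the identity rotation, and the choice of the positive z-axis as "forward" are assumptions for illustration; the actual contents of `examples/simple_trajectory.json` may differ.

```python
import json


def make_forward_trajectory(num_cameras=5, step=0.1):
    """Build a straight-line camera path (hypothetical helper, not from the repo)."""
    cameras = []
    for i in range(num_cameras):
        cameras.append({
            # Identity rotation in [w, x, y, z] order: the camera never turns.
            "quaternion": [1.0, 0.0, 0.0, 0.0],
            # Assumption: "forward" is the positive z-axis.
            "position": [0.0, 0.0, step * i],
            # Intrinsics matching the example values used in this guide.
            "fx": 352.0,
            "fy": 352.0,
            "cx": 352.0,
            "cy": 240.0,
        })
    return {"cameras": cameras}


trajectory = make_forward_trajectory()
trajectory_json = json.dumps(trajectory, indent=2)
print(trajectory_json)
```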

## Key Design Decisions

### Why 15 Seconds?

The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
   - Input processing (image resizing, camera parsing)
   - Export to PLY format
   - Buffer for slower GPUs or variable load
3. ZeroGPU default is 60 seconds, so 15 seconds is conservative
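
One way to check whether a budget is realistic is to time each stage of a request and compare the total against it. The sketch below does that with a small context manager. `StageTimer`, the stage names, and the `time.sleep` calls standing in for real work are illustrative assumptions, not part of FlashWorld.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Record wall-clock time per stage and compare it with a GPU budget."""

    def __init__(self, budget_seconds):
        self.budget_seconds = budget_seconds
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Store the elapsed time even if the stage raised an exception.
            self.stages[name] = time.perf_counter() - start

    @property
    def total(self):
        return sum(self.stages.values())

    @property
    def remaining(self):
        return self.budget_seconds - self.total


timer = StageTimer(budget_seconds=15)
with timer.stage("preprocess"):
    time.sleep(0.01)  # stand-in for image resizing and camera parsing
with timer.stage("generate"):
    time.sleep(0.02)  # stand-in for the generation step
with timer.stage("export"):
    time.sleep(0.01)  # stand-in for PLY export

for name, seconds in timer.stages.items():
    print(f"{name}: {seconds:.3f}s")
print(f"remaining budget: {timer.remaining:.3f}s")
```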

### Model Loading Strategy

The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:

**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices

**Implementation:**
```python
# Global scope - loads once at startup
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)
```

### Input Format

Camera trajectories are provided as JSON to keep the Gradio interface simple:

```json
{
  "cameras": [
    {
      "quaternion": [w, x, y, z],
      "position": [x, y, z],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
```

Here `[w, x, y, z]` and `[x, y, z]` are placeholders for numeric values. This format differs from the Flask API, which accepted nested dictionaries in the POST request.
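
A pasted trajectory can be sanity-checked before it reaches the GPU. The sketch below is a minimal validator for the layout above; `parse_cameras` and `REQUIRED_KEYS` are hypothetical names, and the 24-camera cap is borrowed from the frame count this guide recommends rather than from the app's actual checks.

```python
import json

REQUIRED_KEYS = ("quaternion", "position", "fx", "fy", "cx", "cy")


def parse_cameras(camera_json, max_frames=24):
    """Parse and sanity-check a trajectory string (hypothetical helper)."""
    data = json.loads(camera_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    cameras = data.get("cameras")
    if not isinstance(cameras, list) or not cameras:
        raise ValueError('expected a non-empty "cameras" list')
    if len(cameras) > max_frames:
        raise ValueError(f"too many cameras: {len(cameras)} > {max_frames}")
    for index, camera in enumerate(cameras):
        missing = [key for key in REQUIRED_KEYS if key not in camera]
        if missing:
            raise ValueError(f"camera {index} is missing {missing}")
        if len(camera["quaternion"]) != 4:
            raise ValueError(f"camera {index}: quaternion needs 4 values [w, x, y, z]")
        if len(camera["position"]) != 3:
            raise ValueError(f"camera {index}: position needs 3 values [x, y, z]")
    return cameras


example = """
{"cameras": [{"quaternion": [1, 0, 0, 0], "position": [0, 0, 0],
              "fx": 352.0, "fy": 352.0, "cx": 352.0, "cy": 240.0}]}
"""
cameras = parse_cameras(example)
print(f"parsed {len(cameras)} camera(s)")
```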

## Deployment Instructions

### Local Testing

Test the Gradio app locally before deploying:

```bash
python app_gradio.py
```

This starts the Gradio interface at http://localhost:7860.

### Hugging Face Spaces Deployment

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "ZeroGPU" as hardware
2. **Upload files:**
   - Push this repository to the Space
   - Ensure `app_gradio.py` is set as the app file in `README.md`
3. **Configuration:**
   - The Space will automatically use the YAML frontmatter in `README.md`
   - The model checkpoint will auto-download from the Hugging Face Hub
   - No additional configuration is needed
4. **Optional: enable T5 offloading (`--offload_t5`):**
   - Edit `app_gradio.py` to pass `offload_t5=True` when constructing `GenerationSystem`
   - This reduces GPU memory usage but may slightly increase generation time

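## Limitations

### ZeroGPU Constraints

- **60-second default budget:** A GPU call gets 60 seconds unless `duration` says otherwise; a call that needs longer must request it through `duration`
- **No `torch.compile`:** Not supported in the ZeroGPU environment
- **Gradio only:** Must use the Gradio SDK (no Flask or other frameworks)
- **Python 3.10.13:** Recommended Python version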

### Feature Differences from Flask App

The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`):

**Missing features:**

- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling

**Present features:**

- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface

### Recommended Usage

**For ZeroGPU deployment:**

- Use `app_gradio.py`
- Keep camera trajectories reasonable (≤24 frames)
- Consider enabling `--offload_t5` for memory savings

**For local development with full features:**

- Use `app.py`
- Get the full custom UI with interactive camera controls
- Run multiple generations concurrently

## Testing

### Test the Gradio App

```bash
# Start the app
python app_gradio.py

# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter a text prompt (optional)
# 3. Paste the example camera JSON from examples/simple_trajectory.json
# 4. Select the resolution (24x480x704, i.e. frames x height x width)
# 5. Click "Generate 3D Scene"
```

### Verify GPU Decorator

Check that model loading happens outside the decorated function:

```python
# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)

# Bad - would reload model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(...):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
```
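
The load-once behavior can also be checked mechanically. The sketch below counts constructor calls across several requests; `gpu` is a pass-through stand-in for `spaces.GPU` so the example runs anywhere, and `CountingModel` is a hypothetical stand-in for `GenerationSystem`.

```python
import functools


def gpu(duration):
    """Pass-through stand-in for spaces.GPU (assumption: it only wraps the call)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper
    return decorator


class CountingModel:
    """Stand-in for GenerationSystem that counts how often it is built."""
    loads = 0

    def __init__(self):
        CountingModel.loads += 1

    def generate(self, prompt):
        return f"scene for {prompt!r}"


model = CountingModel()  # global scope: built once, when the module is imported


@gpu(duration=15)
def generate_scene(prompt):
    # Only inference happens inside the decorated function.
    return model.generate(prompt)


results = [generate_scene(p) for p in ("a forest", "a beach", "a city")]
print(f"model loads: {CountingModel.loads}, requests served: {len(results)}")
```

If the constructor call were moved inside `generate_scene`, the load counter would grow with every request instead of staying at one.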

## Troubleshooting

### "GPU budget exceeded"

**Cause:** Generation took longer than 15 seconds

**Solutions:**

- Reduce the number of frames in the camera trajectory
- Enable the `--offload_t5` flag
- Increase the duration: `@spaces.GPU(duration=20)`

### "Out of memory"

**Cause:** GPU memory exhausted

**Solutions:**

- Enable T5 offloading: `offload_t5=True`
- Enable VAE offloading: `offload_vae=True`
- Reduce the resolution
- Reduce the number of frames

### "Model checkpoint not found"

**Cause:** Automatic download failed

**Solutions:**

- Check the internet connection
- Verify Hugging Face access
- Download the checkpoint manually and specify it with the `--ckpt` flag

### "Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"

**Cause:** Missing GLM library headers required for gsplat CUDA compilation

**Solutions:**

- Ensure `packages.txt` exists and lists `libglm-dev` and `build-essential`
- Restart the Space to reinstall dependencies
- Check the Space build logs for system package installation errors

### "Bias is not supported when out_dtype is set to Float32"

**Cause:** PyTorch FP8 operations limitation on certain GPU architectures

**Solutions:**

- This is fixed in `quant.py` by applying the bias separately when needed
- Ensure you have the latest version of the code

## Future Improvements

Potential enhancements for ZeroGPU deployment:

- **Gradio Blocks UI:** Add more interactive controls
- **Example gallery:** Pre-loaded example camera trajectories
- **3D visualization:** Embed a PLY viewer in the Gradio interface
- **Video preview:** Show the rendered video before downloading the PLY
- **Dynamic duration:** Adjust the GPU budget based on camera count
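
The dynamic-duration idea could look roughly like the sketch below. `spaces.GPU` accepts a callable for `duration` that receives the same arguments as the decorated function, so the budget can be derived from the request. The function name `estimate_duration`, the base cost, and the per-camera cost are placeholder assumptions, not measurements from FlashWorld.

```python
import json
import math


def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    """Guess a GPU budget in seconds from the number of cameras.

    The base cost and per-camera cost are placeholder numbers, not measurements.
    """
    base_seconds = 5.0
    seconds_per_camera = 0.4
    num_cameras = len(json.loads(camera_json)["cameras"])
    estimate = base_seconds + seconds_per_camera * num_cameras
    # Round up, and cap the request at 60 seconds (a cap chosen for this sketch).
    return min(60, math.ceil(estimate))


# On a Space this would be passed as: @spaces.GPU(duration=estimate_duration)
five_cameras = json.dumps({"cameras": [{}] * 5})
full_trajectory = json.dumps({"cameras": [{}] * 24})
short_budget = estimate_duration(None, "", five_cameras, "24x480x704")
long_budget = estimate_duration(None, "", full_trajectory, "24x480x704")
print(short_budget, long_budget)
```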
