# ZeroGPU Migration Guide

This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.

## Overview

FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.

## Changes Made

### 1. New Gradio Application (app_gradio.py)

Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:

**Key Features:**

- Uses Gradio 5.49.1+ for the interface
- Implements the `@spaces.GPU(duration=15)` decorator with a 15-second GPU budget
- Model loading happens in global scope (outside the GPU decorator) for efficiency
- Simpler interface than the original Flask app with its custom HTML
- Accepts the camera trajectory as JSON input
- Returns PLY files for download

**Architecture:**

```python
# Model loads globally (once, at startup)
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function uses the GPU only when called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns a PLY file + status message
    ...
```
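
For context, here is a minimal sketch of how such a function can be wired into a Gradio interface. The component choices and labels are assumptions for illustration, not necessarily what `app_gradio.py` uses:

```python
import gradio as gr

# Illustrative wiring only: the inputs mirror the features listed above
# (optional image, optional text, camera JSON, resolution) and the outputs
# mirror the PLY file + status message returned by generate_scene.
demo = gr.Interface(
    fn=generate_scene,  # the @spaces.GPU-decorated function shown above
    inputs=[
        gr.Image(type="filepath", label="Image prompt (optional)"),
        gr.Textbox(label="Text prompt (optional)"),
        gr.Textbox(label="Camera trajectory (JSON)", lines=10),
        gr.Dropdown(choices=["24x480x704"], label="Resolution"),
    ],
    outputs=[
        gr.File(label="Generated scene (PLY)"),
        gr.Textbox(label="Status"),
    ],
    title="FlashWorld",
)

if __name__ == "__main__":
    demo.launch()
```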

### 2. Requirements Updates (requirements.txt)

**Removed:**

- `flask==3.1.2` (not needed for ZeroGPU deployment)

**Added:**

- `spaces` (Hugging Face Spaces integration library)

**Kept:**

- `gradio==5.49.1` (required for the Gradio SDK)
- All other dependencies remain unchanged
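
After these changes, the relevant portion of `requirements.txt` looks roughly like this (only the lines discussed above are shown; the `flask==3.1.2` line is simply deleted):

```text
gradio==5.49.1
spaces
```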

### 3. System Dependencies (packages.txt)

Created a new file to install system-level dependencies required by gsplat for CUDA compilation:

- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)
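
Based on the list above, `packages.txt` can be as small as two lines:

```text
libglm-dev
build-essential
```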

### 4. README Updates

**Added YAML frontmatter:**

```yaml
title: FlashWorld
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
```

**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy

### 5. CLAUDE.md Updates

Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns

### 6. Example Camera Trajectory

Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
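
A trajectory in the same spirit can also be generated programmatically. The sketch below writes a 5-camera forward-moving path in the JSON format documented under "Input Format" below; the step size, forward axis, and output filename are illustrative choices, not the contents of the shipped example file:

```python
import json

# Build a simple 5-camera forward-moving trajectory (illustrative values).
cameras = []
for i in range(5):
    cameras.append({
        "quaternion": [1.0, 0.0, 0.0, 0.0],   # identity rotation (w, x, y, z)
        "position": [0.0, 0.0, 0.1 * i],      # move forward in small steps
        "fx": 352.0,
        "fy": 352.0,
        "cx": 352.0,
        "cy": 240.0,
    })

with open("my_trajectory.json", "w") as f:
    json.dump({"cameras": cameras}, f, indent=2)
```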

## Key Design Decisions

### Why 15 Seconds?

The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
   - Input processing (image resizing, camera parsing)
   - Export to PLY format
   - Buffer for slower GPUs or variable load
3. ZeroGPU default is 60 seconds, so 15 seconds is conservative

### Model Loading Strategy

The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:

**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices

**Implementation:**
```python
# Global scope - loads once at startup
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)
```

### Input Format

Camera trajectories are provided as JSON to make the Gradio interface simpler:

```json
{
  "cameras": [
    {
      "quaternion": [w, x, y, z],
      "position": [x, y, z],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
```

This differs from the Flask API, which used nested dictionaries in the POST request body.
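
On the app side, the JSON string from the Gradio textbox can be parsed and lightly validated before the cameras are handed to the generation system. A minimal sketch (this helper is an illustration, not the actual code in `app_gradio.py`):

```python
import json

REQUIRED_KEYS = {"quaternion", "position", "fx", "fy", "cx", "cy"}

def parse_camera_json(camera_json: str) -> list[dict]:
    """Parse and sanity-check a camera trajectory in the format shown above."""
    data = json.loads(camera_json)
    cameras = data.get("cameras", [])
    if not cameras:
        raise ValueError("No cameras found in trajectory JSON")
    for i, cam in enumerate(cameras):
        missing = REQUIRED_KEYS - cam.keys()
        if missing:
            raise ValueError(f"Camera {i} is missing keys: {sorted(missing)}")
    return cameras
```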

## Deployment Instructions

### Local Testing

Test the Gradio app locally before deploying:

```bash
python app_gradio.py
```

This will start the Gradio interface at http://localhost:7860.

### Hugging Face Spaces Deployment

1. **Create a new Space:**
   - Create a new Hugging Face Space using the Gradio SDK with ZeroGPU hardware
2. **Upload files:**
   - Push this repository to the Space
   - Ensure `app_gradio.py` is set as the app file in `README.md`
3. **Configuration:**
   - The Space automatically uses the YAML frontmatter in `README.md`
   - The model checkpoint auto-downloads from the Hugging Face Hub
   - No additional configuration is needed
4. **Optional: enable `--offload_t5`:**
   - Edit `app_gradio.py` to pass `offload_t5=True` when initializing `GenerationSystem` (see the sketch below)
   - This reduces GPU memory usage but may slightly increase generation time
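
For step 4, the change amounts to something like the following, mirroring the `GenerationSystem` call shown earlier (a sketch, not a verbatim diff of `app_gradio.py`):

```python
# Sketch: construct the generation system with T5 offloading enabled, so the
# text encoder stays off the GPU when idle at a small cost in generation time.
generation_system = GenerationSystem(
    ckpt_path=ckpt_path,
    device=device,
    offload_t5=True,
)
```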

## Limitations

### ZeroGPU Constraints

1. **60-second hard limit:** a single GPU call cannot exceed 60 seconds
2. **No `torch.compile`:** not supported in the ZeroGPU environment
3. **Gradio only:** the Space must use the Gradio SDK (no Flask or other frameworks)
4. **Python 3.10.13:** the recommended Python version

### Feature Differences from the Flask App

The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`):

**Missing features:**

- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling

**Present features:**

- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface

## Recommended Usage

**For ZeroGPU deployment:**

- Use `app_gradio.py`
- Keep camera trajectories reasonable (≀24 frames)
- Consider enabling `--offload_t5` for memory savings

**For local development with full features:**

- Use `app.py`
- Enjoy the full custom UI with interactive camera controls
- Support for multiple concurrent generations

## Testing

### Test the Gradio App

```bash
# Start the app
python app_gradio.py

# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter a text prompt (optional)
# 3. Paste the example camera JSON from examples/simple_trajectory.json
# 4. Select a resolution (24x480x704)
# 5. Click "Generate 3D Scene"
```

### Verify GPU Decorator

Check that model loading happens outside the decorator:

```python
# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)

# Bad - would reload the model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(...):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
```

## Troubleshooting

### "GPU budget exceeded"

**Cause:** Generation took longer than the 15-second GPU budget.

**Solutions:**

- Reduce the number of frames in the camera trajectory
- Enable the `--offload_t5` flag
- Increase the duration: `@spaces.GPU(duration=20)`

"Out of memory"

Cause: GPU memory exhausted

Solutions:

  • Enable T5 offloading: offload_t5=True
  • Enable VAE offloading: offload_vae=True
  • Reduce resolution
  • Reduce number of frames

"Model checkpoint not found"

Cause: Automatic download failed

Solutions:

  • Check internet connection
  • Verify HuggingFace access
  • Manually download and specify with --ckpt flag

"Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"

Cause: Missing GLM library headers required for gsplat CUDA compilation

Solutions:

  • Ensure packages.txt exists with libglm-dev and build-essential
  • Restart the Space to reinstall dependencies
  • Check Space build logs for system package installation errors

"Bias is not supported when out_dtype is set to Float32"

Cause: PyTorch FP8 operations limitation on certain GPU architectures

Solutions:

  • This is fixed in quant.py by applying bias separately when needed
  • Ensure you have the latest version of the code
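
Conceptually, the workaround looks like the sketch below; `fp8_matmul` is a hypothetical stand-in for the FP8 matmul used in `quant.py`, which cannot fuse a bias when the output dtype is float32:

```python
import torch

def fp8_matmul(x, weight):
    # Hypothetical placeholder for the real FP8 kernel in quant.py, included
    # only so the example runs; the real kernel rejects a fused bias when
    # out_dtype is float32.
    return x.float() @ weight.float().t()

def linear_fp8(x, weight, bias=None):
    out = fp8_matmul(x, weight)   # matmul without a fused bias
    if bias is not None:
        out = out + bias.float()  # bias applied separately, in float32
    return out
```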

## Future Improvements

Potential enhancements for ZeroGPU deployment:

1. **Gradio Blocks UI:** add more interactive controls
2. **Example gallery:** pre-loaded example camera trajectories
3. **3D visualization:** embed a PLY viewer in the Gradio interface
4. **Video preview:** show the rendered video before downloading the PLY
5. **Dynamic duration:** adjust the GPU budget based on the camera count (see the sketch below)
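
If the installed `spaces` version supports the dynamic-duration pattern (passing a function as `duration` that receives the same arguments as the decorated function), item 5 could look roughly like this; the per-camera cost numbers are made up for illustration:

```python
import json
import spaces

def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    # Made-up cost model: a fixed overhead plus a small per-camera allowance,
    # capped at the ZeroGPU 60-second hard limit.
    try:
        n_cameras = len(json.loads(camera_json).get("cameras", []))
    except (TypeError, ValueError):
        n_cameras = 24
    return min(60, 10 + n_cameras // 2)

@spaces.GPU(duration=estimate_duration)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    ...  # GPU-intensive generation, as in app_gradio.py
```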

## References