# ZeroGPU Migration Guide
This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU.
## Overview
FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget.
## Changes Made
### 1. New Gradio Application (`app_gradio.py`)
Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment:
**Key Features:**
- Uses Gradio 5.49.1+ for the interface
- Wraps the generation function in the `@spaces.GPU(duration=15)` decorator, giving each call a 15-second GPU budget
- Loads the model in global scope (outside the GPU-decorated function) for efficiency
- Provides a simpler interface than the original Flask app and its custom HTML front end
- Accepts camera trajectory as JSON input
- Returns PLY files for download
**Architecture:**
```python
import spaces

# Model loads globally (once, at startup)
generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5)

# Generation function uses the GPU only when called
@spaces.GPU(duration=15)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    # GPU-intensive work happens here
    # Returns the PLY file + a status message
    ...
```
### 2. Requirements Updates (`requirements.txt`)
**Removed:**
- `flask==3.1.2` (not needed for ZeroGPU deployment)
**Added:**
- `spaces` (Hugging Face Spaces integration library)
**Kept:**
- `gradio==5.49.1` (required for Gradio SDK)
- All other dependencies remain unchanged
### 3. System Dependencies (`packages.txt`)
**Created a new file** listing the system-level packages that gsplat needs to compile its CUDA extensions:
- `libglm-dev` (OpenGL Mathematics library headers)
- `build-essential` (compilation tools)
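The file contains just the two package names, one per line; Spaces installs them via apt at build time:

```text
libglm-dev
build-essential
```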
### 4. README Updates
**Added YAML frontmatter:**
```yaml
---
title: FlashWorld
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app_gradio.py
pinned: false
license: cc-by-nc-sa-4.0
python_version: 3.10.13
---
```
**Added ZeroGPU deployment section:**
- Instructions for deploying on Hugging Face Spaces
- Documentation of 15-second GPU budget
- Explanation of model loading strategy
### 5. CLAUDE.md Updates
Updated the development documentation to include:
- Instructions for running both Flask (local) and Gradio (ZeroGPU) versions
- Documentation of ZeroGPU configuration
- Explanation of decorator usage and model loading patterns
### 6. Example Camera Trajectory
Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started.
## Key Design Decisions
### Why 15 Seconds?
The GPU duration budget was set to 15 seconds for the following reasons:
1. Generation takes ~7 seconds on A100/A800
2. Additional time needed for:
- Input processing (image resizing, camera parsing)
- Export to PLY format
- Buffer for slower GPUs or variable load
3. The ZeroGPU default duration is 60 seconds, so 15 seconds is conservative
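A rough breakdown of how the 15 seconds are spent (only the ~7-second generation figure comes from the list above; the other numbers are assumed headroom):

```python
generation_s = 7   # ~7 s generation on A100/A800 (from the measurements above)
pre_post_s = 3     # assumed: image resizing, camera parsing, PLY export
buffer_s = 5       # assumed: headroom for slower GPUs and variable load
print(generation_s + pre_post_s + buffer_s)  # 15
```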
### Model Loading Strategy
The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator:
**Advantages:**
- Model loads at startup, not on every generation
- Faster response time for users
- More efficient use of GPU time budget
- Follows ZeroGPU best practices
**Implementation:**
```python
# Global scope - loads once at startup
generation_system = GenerationSystem(...)

# GPU decorator - only for inference
@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)
```
### Input Format
Camera trajectories are provided as JSON to make the Gradio interface simpler:
```json
{
  "cameras": [
    {
      "quaternion": [w, x, y, z],
      "position": [x, y, z],
      "fx": 352.0,
      "fy": 352.0,
      "cx": 352.0,
      "cy": 240.0
    }
  ]
}
```
This differs from the Flask API, which accepted nested dictionaries in the POST request.
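If you prefer to build trajectories programmatically, the format above is easy to emit from Python. The following is an illustrative sketch, not code from the repository; the identity quaternion and the assumption that the camera steps forward along +z are placeholders you will likely want to change:

```python
import json

# Build a simple 5-camera, forward-moving trajectory in the expected format.
cameras = []
for i in range(5):
    cameras.append({
        "quaternion": [1.0, 0.0, 0.0, 0.0],  # w, x, y, z (identity: no rotation)
        "position": [0.0, 0.0, 0.1 * i],     # assumed forward step along +z
        "fx": 352.0,
        "fy": 352.0,
        "cx": 352.0,
        "cy": 240.0,
    })

with open("my_trajectory.json", "w") as f:
    json.dump({"cameras": cameras}, f, indent=2)
```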
## Deployment Instructions
### Local Testing
Test the Gradio app locally before deploying:
```bash
python app_gradio.py
```
This starts the Gradio interface at `http://localhost:7860`.
### Hugging Face Spaces Deployment
1. **Create a new Space:**
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "ZeroGPU" as hardware
2. **Upload files:**
- Push this repository to the Space
- Ensure `app_gradio.py` is set as the app file in README.md
3. **Configuration:**
- The Space will automatically use the YAML frontmatter in README.md
- Model checkpoint will auto-download from HuggingFace Hub
- No additional configuration needed
4. **Optional: Enable `--offload_t5` flag:**
- Edit `app_gradio.py` to add `offload_t5=True` in `GenerationSystem` initialization
   - This reduces GPU memory usage but may slightly increase generation time (see the sketch below)
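A minimal sketch of that edit, reusing the constructor call shown in the Architecture section above:

```python
# app_gradio.py (sketch): construct the system with T5 offloading enabled.
generation_system = GenerationSystem(
    ckpt_path=ckpt_path,
    device=device,
    offload_t5=True,  # lower GPU memory usage, slightly longer generation
)
```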
## Limitations
### ZeroGPU Constraints
1. **60-second hard limit:** Cannot exceed 60 seconds per GPU call
2. **No torch.compile:** Not supported in ZeroGPU environment
3. **Gradio only:** Must use Gradio SDK (no Flask or other frameworks)
4. **Python 3.10.13:** Recommended Python version
### Feature Differences from Flask App
The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`):
**Missing features:**
- Custom HTML/CSS interface
- Real-time 3D preview with Spark.js
- Manual camera trajectory recording with mouse/keyboard
- Template-based trajectory generation
- Queue visualization with progress bars
- Concurrent request handling
**Present features:**
- Image and text prompts
- Camera trajectory input (via JSON)
- PLY file generation and download
- Simple, accessible Gradio interface
### Recommended Usage
For **ZeroGPU deployment:**
- Use `app_gradio.py`
- Keep camera trajectories reasonable (≀24 frames)
- Consider enabling `--offload_t5` for memory savings
For **local development with full features:**
- Use `app.py`
- Enjoy the full custom UI with interactive camera controls
- Support for multiple concurrent generations
## Testing
### Test the Gradio App
```bash
# Start the app
python app_gradio.py
# In the browser (http://localhost:7860):
# 1. Upload an image (optional)
# 2. Enter text prompt (optional)
# 3. Paste example camera JSON from examples/simple_trajectory.json
# 4. Select resolution (24x480x704)
# 5. Click "Generate 3D Scene"
```
### Verify GPU Decorator
Check that model loading happens outside the decorator:
```python
# Good - model loads once at startup
generation_system = GenerationSystem(...)

@spaces.GPU(duration=15)
def generate_scene(...):
    return generation_system.generate(...)

# Bad - would reload the model on every call (slow!)
@spaces.GPU(duration=15)
def generate_scene(...):
    generation_system = GenerationSystem(...)  # Don't do this!
    return generation_system.generate(...)
```
## Troubleshooting
### "GPU budget exceeded"
**Cause:** Generation took longer than 15 seconds
**Solutions:**
- Reduce number of frames in camera trajectory
- Enable `--offload_t5` flag
- Increase duration: `@spaces.GPU(duration=20)`
### "Out of memory"
**Cause:** GPU memory exhausted
**Solutions:**
- Enable T5 offloading: `offload_t5=True`
- Enable VAE offloading: `offload_vae=True`
- Reduce resolution
- Reduce number of frames
### "Model checkpoint not found"
**Cause:** Automatic download failed
**Solutions:**
- Check internet connection
- Verify HuggingFace access
- Manually download and specify with `--ckpt` flag
### "Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory"
**Cause:** Missing GLM library headers required for gsplat CUDA compilation
**Solutions:**
- Ensure `packages.txt` exists with `libglm-dev` and `build-essential`
- Restart the Space to reinstall dependencies
- Check Space build logs for system package installation errors
### "Bias is not supported when out_dtype is set to Float32"
**Cause:** PyTorch FP8 operations limitation on certain GPU architectures
**Solutions:**
- This is fixed in `quant.py` by applying bias separately when needed
- Ensure you have the latest version of the code
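The workaround amounts to leaving the bias out of the FP8 matmul call and adding it afterwards in an ordinary op. Below is a simplified sketch, not the actual `quant.py` code; `fp8_matmul` is a hypothetical stand-in for whatever scaled-matmul call the file uses:

```python
import torch

def fp8_linear(x, weight, bias=None):
    # Run the FP8 matmul without a bias so out_dtype=torch.float32 is accepted.
    out = fp8_matmul(x, weight, out_dtype=torch.float32)  # hypothetical helper
    # Apply the bias separately in a regular floating-point add.
    if bias is not None:
        out = out + bias.to(out.dtype)
    return out
```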
## Future Improvements
Potential enhancements for ZeroGPU deployment:
1. **Gradio Blocks UI:** Add more interactive controls
2. **Example gallery:** Pre-loaded example camera trajectories
3. **3D visualization:** Embed PLY viewer in Gradio interface
4. **Video preview:** Show rendered video before downloading PLY
5. **Dynamic duration:** Adjust GPU budget based on camera count
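On the last point: newer versions of the `spaces` package allow `duration` to be a callable that receives the same arguments as the decorated function. Assuming that feature is available, a sketch (the per-frame timing constants are invented for illustration):

```python
import json
import spaces

def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    # Assumed cost model: fixed overhead plus a per-frame cost, capped at 60 s.
    num_frames = len(json.loads(camera_json).get("cameras", []))
    return min(10 + 0.5 * num_frames, 60)

@spaces.GPU(duration=estimate_duration)
def generate_scene(image_prompt, text_prompt, camera_json, resolution):
    ...
```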
## References
- [ZeroGPU Documentation](https://huggingface.co/docs/hub/en/spaces-zerogpu)
- [Gradio Documentation](https://gradio.app/docs/)
- [FlashWorld Paper](https://arxiv.org/pdf/2510.13678)
- [FlashWorld Project Page](https://imlixinyang.github.io/FlashWorld-Project-Page)