| # ZeroGPU Migration Guide | |
| This document describes the changes made to enable FlashWorld to run on Hugging Face Spaces with ZeroGPU. | |
| ## Overview | |
| FlashWorld has been adapted to support ZeroGPU deployment on Hugging Face Spaces. This allows the model to run on free, dynamically allocated GPU resources with a configurable time budget. | |
| ## Changes Made | |
| ### 1. New Gradio Application (`app_gradio.py`) | |
| Created a new Gradio-based interface that replaces the Flask API for ZeroGPU deployment: | |
| **Key Features:** | |
| - Uses Gradio 5.49.1+ for the interface | |
| - Wraps generation in the `@spaces.GPU(duration=15)` decorator, giving each call a 15-second GPU budget | |
| - Model loading happens in global scope (outside GPU decorator) for efficiency | |
| - Simpler interface compared to the original Flask app with custom HTML | |
| - Accepts camera trajectory as JSON input | |
| - Returns PLY files for download | |
| **Architecture:** | |
| ```python | |
| # Model loads globally (once, at startup) | |
| generation_system = GenerationSystem(ckpt_path=ckpt_path, device=device, offload_t5=args.offload_t5) | |
| # Generation function uses GPU only when called | |
| @spaces.GPU(duration=15) | |
| def generate_scene(image_prompt, text_prompt, camera_json, resolution): | |
|     # GPU-intensive work happens here | |
|     # Returns PLY file + status message | |
|     ... | |
| ``` | |
| ### 2. Requirements Updates (`requirements.txt`) | |
| **Removed:** | |
| - `flask==3.1.2` (not needed for ZeroGPU deployment) | |
| **Added:** | |
| - `spaces` (Hugging Face Spaces integration library) | |
| **Kept:** | |
| - `gradio==5.49.1` (required for Gradio SDK) | |
| - All other dependencies remain unchanged | |
| ### 3. System Dependencies (`packages.txt`) | |
| **Created new file** to install system-level dependencies required by gsplat for CUDA compilation: | |
| - `libglm-dev` (OpenGL Mathematics library headers) | |
| - `build-essential` (compilation tools) | |
| ### 4. README Updates | |
| **Added YAML frontmatter:** | |
| ```yaml | |
| --- | |
| title: FlashWorld | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: 5.49.1 | |
| app_file: app_gradio.py | |
| pinned: false | |
| license: cc-by-nc-sa-4.0 | |
| python_version: 3.10.13 | |
| --- | |
| ``` | |
| **Added ZeroGPU deployment section:** | |
| - Instructions for deploying on Hugging Face Spaces | |
| - Documentation of 15-second GPU budget | |
| - Explanation of model loading strategy | |
| ### 5. CLAUDE.md Updates | |
| Updated the development documentation to include: | |
| - Instructions for running both Flask (local) and Gradio (ZeroGPU) versions | |
| - Documentation of ZeroGPU configuration | |
| - Explanation of decorator usage and model loading patterns | |
| ### 6. Example Camera Trajectory | |
| Created `examples/simple_trajectory.json` with a basic 5-camera forward-moving trajectory to help users get started. | |
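
As an illustration of what such a file could contain, here is a minimal sketch that builds a 5-camera straight-line trajectory in the camera JSON layout used by this guide. The function name `make_forward_trajectory`, the step size, and the choice of +z as the "forward" axis are assumptions made for this example; the shipped `examples/simple_trajectory.json` may use different values.

```python
import json


def make_forward_trajectory(num_cameras=5, step=0.1, width=704, height=480, focal=352.0):
    """Build a straight-line trajectory (hypothetical helper, not part of FlashWorld)."""
    cameras = []
    for i in range(num_cameras):
        cameras.append({
            # Identity rotation in [w, x, y, z] order: the camera never turns.
            "quaternion": [1.0, 0.0, 0.0, 0.0],
            # Assumption: "forward" is modelled as +z, advancing `step` units per camera.
            "position": [0.0, 0.0, round(i * step, 6)],
            "fx": focal,
            "fy": focal,
            # Principal point at the image centre (352, 240 for a 704x480 frame).
            "cx": width / 2,
            "cy": height / 2,
        })
    return {"cameras": cameras}


trajectory = make_forward_trajectory()
print(json.dumps(trajectory, indent=2))
```

The printed JSON can be pasted into the camera trajectory field of the Gradio interface.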
| ## Key Design Decisions | |
| ### Why 15 Seconds? | |
| The GPU duration budget was set to 15 seconds for the following reasons: | |
| 1. Generation takes ~7 seconds on A100/A800 | |
| 2. Additional time needed for: | |
| - Input processing (image resizing, camera parsing) | |
| - Export to PLY format | |
| - Buffer for slower GPUs or variable load | |
| 3. ZeroGPU default is 60 seconds, so 15 seconds is conservative | |
| ### Model Loading Strategy | |
| The model is loaded **once** in global scope, not inside the `@spaces.GPU` decorator: | |
| **Advantages:** | |
| - Model loads at startup, not on every generation | |
| - Faster response time for users | |
| - More efficient use of GPU time budget | |
| - Follows ZeroGPU best practices | |
| **Implementation:** | |
| ```python | |
| # Global scope - loads once at startup | |
| generation_system = GenerationSystem(...) | |
| # GPU decorator - only for inference | |
| @spaces.GPU(duration=15) | |
| def generate_scene(image_prompt, text_prompt, camera_json, resolution): | |
|     return generation_system.generate(...) | |
| ``` | |
| ### Input Format | |
| Camera trajectories are provided as JSON to make the Gradio interface simpler: | |
| ```json | |
| { | |
| "cameras": [ | |
| { | |
| "quaternion": [w, x, y, z], | |
| "position": [x, y, z], | |
| "fx": 352.0, | |
| "fy": 352.0, | |
| "cx": 352.0, | |
| "cy": 240.0 | |
| } | |
| ] | |
| } | |
| ``` | |
| This differs from the Flask API, which used nested dictionaries in the POST request. | |
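
Because the trajectory arrives as free-form text, it can help to validate it before any GPU time is spent. The following is a minimal sketch of such a check based only on the field layout shown above; `parse_cameras` and `REQUIRED_KEYS` are hypothetical names, and the real `app_gradio.py` may validate its input differently.

```python
import json

# Fields every camera entry carries in the JSON layout shown in this guide.
REQUIRED_KEYS = ("quaternion", "position", "fx", "fy", "cx", "cy")


def parse_cameras(camera_json):
    """Parse a camera-trajectory JSON string and check its basic shape."""
    data = json.loads(camera_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    cameras = data.get("cameras")
    if not isinstance(cameras, list) or not cameras:
        raise ValueError('expected a non-empty "cameras" list')
    for i, cam in enumerate(cameras):
        missing = [key for key in REQUIRED_KEYS if key not in cam]
        if missing:
            raise ValueError(f"camera {i} is missing keys: {missing}")
        if len(cam["quaternion"]) != 4:
            raise ValueError(f"camera {i}: quaternion needs 4 values [w, x, y, z]")
        if len(cam["position"]) != 3:
            raise ValueError(f"camera {i}: position needs 3 values [x, y, z]")
    return cameras


example = (
    '{"cameras": [{"quaternion": [1, 0, 0, 0], "position": [0, 0, 0], '
    '"fx": 352.0, "fy": 352.0, "cx": 352.0, "cy": 240.0}]}'
)
cameras = parse_cameras(example)
print(f"parsed {len(cameras)} camera(s)")
```

Running a check like this outside the `@spaces.GPU` function means malformed input fails without consuming any of the GPU budget.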
| ## Deployment Instructions | |
| ### Local Testing | |
| Test the Gradio app locally before deploying: | |
| ```bash | |
| python app_gradio.py | |
| ``` | |
| This starts the Gradio interface at `http://localhost:7860`. | |
| ### Hugging Face Spaces Deployment | |
| 1. **Create a new Space:** | |
| - Go to https://huggingface.co/spaces | |
| - Click "Create new Space" | |
| - Select "ZeroGPU" as hardware | |
| 2. **Upload files:** | |
| - Push this repository to the Space | |
| - Ensure `app_gradio.py` is set as the app file in README.md | |
| 3. **Configuration:** | |
| - The Space will automatically use the YAML frontmatter in README.md | |
| - The model checkpoint downloads automatically from the Hugging Face Hub | |
| - No additional configuration needed | |
| 4. **Optional: Enable T5 offloading:** | |
| - Edit `app_gradio.py` to pass `offload_t5=True` to the `GenerationSystem` initialization (the in-code equivalent of the `--offload_t5` flag) | |
| - This reduces GPU memory usage but may slightly increase generation time | |
| ## Limitations | |
| ### ZeroGPU Constraints | |
| 1. **60-second default limit:** GPU calls are capped at 60 seconds unless a longer `duration` is requested in the decorator | |
| 2. **No torch.compile:** Not supported in ZeroGPU environment | |
| 3. **Gradio only:** Must use Gradio SDK (no Flask or other frameworks) | |
| 4. **Python 3.10.13:** Recommended Python version | |
| ### Feature Differences from Flask App | |
| The Gradio app (`app_gradio.py`) differs from the original Flask app (`app.py`): | |
| **Missing features:** | |
| - Custom HTML/CSS interface | |
| - Real-time 3D preview with Spark.js | |
| - Manual camera trajectory recording with mouse/keyboard | |
| - Template-based trajectory generation | |
| - Queue visualization with progress bars | |
| - Concurrent request handling | |
| **Present features:** | |
| - Image and text prompts | |
| - Camera trajectory input (via JSON) | |
| - PLY file generation and download | |
| - Simple, accessible Gradio interface | |
| ### Recommended Usage | |
| For **ZeroGPU deployment:** | |
| - Use `app_gradio.py` | |
| - Keep camera trajectories reasonable (≤24 frames) | |
| - Consider enabling `--offload_t5` for memory savings | |
| For **local development with full features:** | |
| - Use `app.py` | |
| - Provides the full custom UI with interactive camera controls | |
| - Supports multiple concurrent generations | |
| ## Testing | |
| ### Test the Gradio App | |
| ```bash | |
| # Start the app | |
| python app_gradio.py | |
| # In the browser (http://localhost:7860): | |
| # 1. Upload an image (optional) | |
| # 2. Enter text prompt (optional) | |
| # 3. Paste example camera JSON from examples/simple_trajectory.json | |
| # 4. Select resolution (24x480x704) | |
| # 5. Click "Generate 3D Scene" | |
| ``` | |
| ### Verify GPU Decorator | |
| Check that model loading happens outside the decorator: | |
| ```python | |
| # Good - model loads once at startup | |
| generation_system = GenerationSystem(...) | |
| @spaces.GPU(duration=15) | |
| def generate_scene(image_prompt, text_prompt, camera_json, resolution): | |
|     return generation_system.generate(...) | |
| # Bad - would reload model on every call (slow!) | |
| @spaces.GPU(duration=15) | |
| def generate_scene(image_prompt, text_prompt, camera_json, resolution): | |
|     generation_system = GenerationSystem(...)  # Don't do this! | |
|     return generation_system.generate(...) | |
| ``` | |
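
One way to confirm the load-once behavior without a GPU is to count constructor calls. The sketch below uses stand-ins so that it runs anywhere: `fake_gpu` imitates the shape of `spaces.GPU`, and `FakeGenerationSystem` is a hypothetical placeholder for the real `GenerationSystem`. Neither is part of FlashWorld or the `spaces` package.

```python
import functools

load_count = 0  # incremented every time the (fake) model is constructed


def fake_gpu(duration):
    """Stand-in for `spaces.GPU` so this sketch runs without the `spaces` package."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper
    return decorator


class FakeGenerationSystem:
    """Hypothetical placeholder for GenerationSystem that records each load."""

    def __init__(self):
        global load_count
        load_count += 1

    def generate(self, prompt):
        return f"scene for {prompt!r}"


# Global scope: the model is constructed once, at import time.
generation_system = FakeGenerationSystem()


@fake_gpu(duration=15)
def generate_scene(prompt):
    return generation_system.generate(prompt)


for prompt in ("a forest", "a beach", "a city"):
    generate_scene(prompt)

print(load_count)  # 1: three generations, one model load
```

Moving the constructor call inside `generate_scene` would raise the count to one load per request, which is the pattern the "Bad" example above warns against.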
| ## Troubleshooting | |
| ### "GPU budget exceeded" | |
| **Cause:** Generation took longer than 15 seconds | |
| **Solutions:** | |
| - Reduce number of frames in camera trajectory | |
| - Enable `--offload_t5` flag | |
| - Increase duration: `@spaces.GPU(duration=20)` | |
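
Before raising the duration, it can be useful to see which stage actually consumes the budget. This is a minimal timing sketch using only the standard library; the stage names and the `time.sleep` calls are placeholders for the real preprocessing, generation, and export steps, and `stage_timer` is a hypothetical helper.

```python
import time
from contextlib import contextmanager

GPU_BUDGET_SECONDS = 15  # matches @spaces.GPU(duration=15) in this guide


@contextmanager
def stage_timer(name, timings):
    """Record the wall-clock duration of one stage into the `timings` dict."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


timings = {}
with stage_timer("preprocess", timings):
    time.sleep(0.01)  # placeholder for image resizing and camera parsing
with stage_timer("generate", timings):
    time.sleep(0.02)  # placeholder for the model call
with stage_timer("export", timings):
    time.sleep(0.01)  # placeholder for PLY export

total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
print(f"total: {total:.3f}s of {GPU_BUDGET_SECONDS}s budget")
```

If one stage dominates, that stage is the place to reduce work (for example, fewer frames) before increasing `duration`.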
| ### "Out of memory" | |
| **Cause:** GPU memory exhausted | |
| **Solutions:** | |
| - Enable T5 offloading: `offload_t5=True` | |
| - Enable VAE offloading: `offload_vae=True` | |
| - Reduce resolution | |
| - Reduce number of frames | |
| ### "Model checkpoint not found" | |
| **Cause:** Automatic download failed | |
| **Solutions:** | |
| - Check internet connection | |
| - Verify access to the Hugging Face Hub | |
| - Manually download and specify with `--ckpt` flag | |
| ### "Error building extension 'gsplat_cuda'" or "glm/gtc/type_ptr.hpp: No such file or directory" | |
| **Cause:** Missing GLM library headers required for gsplat CUDA compilation | |
| **Solutions:** | |
| - Ensure `packages.txt` exists with `libglm-dev` and `build-essential` | |
| - Restart the Space to reinstall dependencies | |
| - Check Space build logs for system package installation errors | |
| ### "Bias is not supported when out_dtype is set to Float32" | |
| **Cause:** PyTorch FP8 operations limitation on certain GPU architectures | |
| **Solutions:** | |
| - This is fixed in `quant.py` by applying bias separately when needed | |
| - Ensure you have the latest version of the code | |
| ## Future Improvements | |
| Potential enhancements for ZeroGPU deployment: | |
| 1. **Gradio Blocks UI:** Add more interactive controls | |
| 2. **Example gallery:** Pre-loaded example camera trajectories | |
| 3. **3D visualization:** Embed PLY viewer in Gradio interface | |
| 4. **Video preview:** Show rendered video before downloading PLY | |
| 5. **Dynamic duration:** Adjust GPU budget based on camera count | |
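
As a rough sketch of the dynamic-duration idea, the function below estimates a budget from the number of cameras in the trajectory. The constants are assumptions chosen so that a 24-camera trajectory lands on the current 15-second budget; they are not measured values, and `estimate_duration` is a hypothetical name.

```python
import json
import math

BASE_SECONDS = 7          # assumption: fixed cost for preprocessing and PLY export
SECONDS_PER_CAMERA = 0.3  # assumption: roughly 7 s of generation spread over 24 frames
MAX_SECONDS = 60          # the ZeroGPU default cited in this guide, used here as a cap
FALLBACK_CAMERAS = 24     # worst case assumed when the JSON cannot be parsed


def estimate_duration(image_prompt, text_prompt, camera_json, resolution):
    """Estimate a GPU budget in whole seconds from the camera count."""
    try:
        num_cameras = len(json.loads(camera_json)["cameras"])
    except (ValueError, KeyError, TypeError):
        num_cameras = FALLBACK_CAMERAS
    estimate = math.ceil(BASE_SECONDS + SECONDS_PER_CAMERA * num_cameras)
    return min(MAX_SECONDS, estimate)


five_cameras = json.dumps({"cameras": [{} for _ in range(5)]})
full_length = json.dumps({"cameras": [{} for _ in range(24)]})
print(estimate_duration(None, "", five_cameras, "24x480x704"))  # 9
print(estimate_duration(None, "", full_length, "24x480x704"))   # 15
```

The function takes the same arguments as `generate_scene` so that, assuming the installed `spaces` version accepts a callable for `duration`, it could be passed as `@spaces.GPU(duration=estimate_duration)`; check the ZeroGPU documentation before relying on that.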
| ## References | |
| - [ZeroGPU Documentation](https://huggingface.co/docs/hub/en/spaces-zerogpu) | |
| - [Gradio Documentation](https://gradio.app/docs/) | |
| - [FlashWorld Paper](https://arxiv.org/pdf/2510.13678) | |
| - [FlashWorld Project Page](https://imlixinyang.github.io/FlashWorld-Project-Page) | |