metadata

title: Image GS
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
python_version: '3.10'
app_file: gradio_app.py
suggested_hardware: cpu-basic
models:
  - blanchon/image-gs-models-utils
pinned: false

Image-GS: Content-Adaptive Image Representation via 2D Gaussians

Yunxiang Zhang¹*, Bingxuan Li¹*, Alexandr Kuznetsov³†, Akshay Jindal², Stavros Diolatzis², Kenneth Chen¹, Anton Sochenov², Anton Kaplanyan², Qi Sun¹

* Equal contribution † Work done while at Intel

¹ ² ³

Neural image representations have emerged as a promising approach for encoding and rendering visual data. Combined with learning-based workflows, they demonstrate impressive trade-offs between visual fidelity and memory footprint. Existing methods in this domain, however, often rely on fixed data structures that suboptimally allocate memory or compute-intensive implicit models, hindering their practicality for real-time graphics applications.

Inspired by recent advancements in radiance field rendering, we introduce Image-GS, a content-adaptive image representation based on 2D Gaussians. Leveraging a custom differentiable renderer, Image-GS reconstructs images by adaptively allocating and progressively optimizing a group of anisotropic, colored 2D Gaussians. It achieves a favorable balance between visual fidelity and memory efficiency across a variety of stylized images frequently seen in graphics workflows, especially for those showing non-uniformly distributed features and in low-bitrate regimes. Moreover, it supports hardware-friendly rapid random access for real-time usage, requiring only 0.3K MACs to decode a pixel. Through error-guided progressive optimization, Image-GS naturally constructs a smooth level-of-detail hierarchy. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.

Figure 1: Image-GS reconstructs an image by adaptively allocating and progressively optimizing a set of colored 2D Gaussians. It achieves favorable rate-distortion trade-offs, hardware-friendly random access, and flexible quality control through a smooth level-of-detail stack. (a) visualizes the optimized spatial distribution of Gaussians (20% randomly sampled for clarity). (b) Image-GS’s explicit content-adaptive design effectively captures non-uniformly distributed image features and better preserves fine details under constrained memory budgets. In the inset error maps, brighter colors indicate larger errors.
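To make the representation concrete, here is a minimal NumPy sketch of how one pixel could be decoded from a set of anisotropic, colored 2D Gaussians. It is not the project's custom renderer: the function name `render_pixel`, the `top_k` value, and the normalized weighted-sum blending are assumptions chosen for illustration (the README only confirms that top-K normalization exists as an option).

```python
import numpy as np

def render_pixel(pixel, means, scales, angles, colors, top_k=10):
    """Blend the top_k strongest Gaussians at one pixel (illustrative only).

    pixel:  (2,) pixel coordinate (x, y)
    means:  (N, 2) Gaussian centers
    scales: (N, 2) per-axis standard deviations
    angles: (N,) orientation angles in radians
    colors: (N, C) per-Gaussian colors
    """
    # Offset from each Gaussian center to the pixel, shape (N, 2).
    d = pixel[None, :] - means
    # Rotate offsets into each Gaussian's local, axis-aligned frame.
    cos, sin = np.cos(angles), np.sin(angles)
    local_x = cos * d[:, 0] + sin * d[:, 1]
    local_y = -sin * d[:, 0] + cos * d[:, 1]
    # Anisotropic falloff: exp(-0.5 * squared Mahalanobis distance).
    weights = np.exp(-0.5 * ((local_x / scales[:, 0]) ** 2
                             + (local_y / scales[:, 1]) ** 2))
    # Keep only the top_k largest contributions (assumed blending rule).
    k = min(top_k, len(weights))
    idx = np.argpartition(weights, -k)[-k:]
    w = weights[idx]
    # Normalized weighted sum of the selected colors.
    return (w[:, None] * colors[idx]).sum(axis=0) / (w.sum() + 1e-8)
```

Because only a small, fixed number of Gaussians contributes to each pixel in this sketch, the per-pixel cost stays bounded regardless of how many Gaussians the image uses overall.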

Setup

Create a dedicated Python environment and install the dependencies

git clone https://github.com/NYU-ICL/image-gs.git
cd image-gs
conda env create -f environment.yml
conda activate image-gs
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
cd gsplat
pip install -e ".[dev]"
cd ..

Download the image and texture datasets from OneDrive and organize the folder structure as follows
```
image-gs
└── media
    ├── images
    └── textures
```
(Optional) To run saliency-guided Gaussian position initialization, download the pre-trained EML-Net models (res_imagenet.pth, res_places.pth, res_decoder.pth) and place them under the models/emlnet/ folder
```
image-gs
└── models
    └── emlnet
        ├── res_decoder.pth
        ├── res_imagenet.pth
        └── res_places.pth
```
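Before launching a run, it can help to confirm that the folders and optional model files described above are in place. The helper below is a small sketch, not part of the repository; the function name `missing_paths` is hypothetical, and the paths are taken from the two directory trees above.

```python
from pathlib import Path

# Directories the datasets are expected to live in (from the tree above).
REQUIRED_DIRS = ["media/images", "media/textures"]
# Weights needed only for saliency-guided initialization.
OPTIONAL_FILES = [
    "models/emlnet/res_decoder.pth",
    "models/emlnet/res_imagenet.pth",
    "models/emlnet/res_places.pth",
]

def missing_paths(root):
    """Return (missing required dirs, missing optional files) under root."""
    root = Path(root)
    missing_dirs = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing_files = [f for f in OPTIONAL_FILES if not (root / f).is_file()]
    return missing_dirs, missing_files

if __name__ == "__main__":
    dirs, files = missing_paths(".")
    print("Missing dataset folders:", dirs or "none")
    print("Missing EML-Net weights (optional):", files or "none")
```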

Quick Start

Image Compression

Optimize an Image-GS representation for an input image anime-1_2k.png using 10000 Gaussians with half-precision parameters

python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize

Render the corresponding optimized Image-GS representation at a new resolution with height 4000 (aspect ratio is maintained)

python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --eval --render_height=4000
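Since only the height is specified, the output width follows from the source aspect ratio. The arithmetic can be sketched as below; the function name `render_size` and the rounding rule are assumptions, and the example dimensions are hypothetical rather than the actual size of anime-1_2k.png.

```python
def render_size(src_width, src_height, render_height):
    """Return (width, height) of a render that keeps the source aspect ratio."""
    scale = render_height / src_height
    # Rounding to the nearest pixel is an assumption about the implementation.
    return max(1, round(src_width * scale)), render_height

# Hypothetical 2048x1024 source rendered at height 4000.
print(render_size(2048, 1024, 4000))
```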

Texture Stack Compression

Optimize an Image-GS representation for an input texture stack alarm-clock_2k using 30000 Gaussians with half-precision parameters

python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize

Render the corresponding optimized Image-GS representation at a new resolution with height 3000 (aspect ratio is maintained)

python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize --eval --render_height=3000

Control bit precision of Gaussian parameters

Optimize an Image-GS representation for an input image anime-1_2k.png using 10000 Gaussians with 12-bit-precision parameters

python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --pos_bits=12 --scale_bits=12 --rot_bits=12 --feat_bits=12
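The effect of these settings on storage can be estimated with simple arithmetic. The sketch below assumes each 2D Gaussian stores two position coordinates, two scales, one orientation angle, and three feature dimensions (RGB); the function name `footprint_bytes` is hypothetical, and any container or header overhead is ignored.

```python
def footprint_bytes(num_gaussians, pos_bits=16, scale_bits=16,
                    rot_bits=16, feat_bits=16, feat_dims=3):
    """Estimate the parameter storage of an Image-GS representation in bytes."""
    bits_per_gaussian = (2 * pos_bits          # x and y position
                         + 2 * scale_bits      # two scale dimensions
                         + rot_bits            # one orientation angle
                         + feat_dims * feat_bits)  # assumed RGB features
    return num_gaussians * bits_per_gaussian / 8

print(footprint_bytes(10000, 12, 12, 12, 12))  # 12-bit parameters
print(footprint_bytes(10000))                  # half precision (16 bits)
```

Under these assumptions, 10000 Gaussians take about 120 kB at 12 bits per dimension versus about 160 kB at half precision.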

Switch to saliency-guided Gaussian position initialization

Optimize an Image-GS representation for an input image anime-1_2k.png using 10000 Gaussians with half-precision parameters and saliency-guided initialization

python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --init_mode="saliency"
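The quick-start commands above can also be assembled from a script, for example to sweep over several Gaussian counts. This is a convenience sketch rather than part of the repository: `build_command` is a hypothetical helper, and it only emits flags that appear in this README.

```python
def build_command(input_path, exp_name, num_gaussians, quantize=True, extra=None):
    """Build the argument list for one main.py invocation."""
    cmd = [
        "python", "main.py",
        f"--input_path={input_path}",
        f"--exp_name={exp_name}",
        f"--num_gaussians={num_gaussians}",
    ]
    if quantize:
        cmd.append("--quantize")
    for flag, value in (extra or {}).items():
        if value is False:
            continue  # boolean switch that is turned off
        # True means a bare switch such as --eval; anything else is --flag=value.
        cmd.append(f"--{flag}" if value is True else f"--{flag}={value}")
    return cmd

if __name__ == "__main__":
    for n in (5000, 10000, 20000):
        cmd = build_command("images/anime-1_2k.png", f"test/anime-1_2k_{n}", n)
        print(" ".join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```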

Gradio Web Interface

We provide a user-friendly web interface built with Gradio for easy experimentation and training visualization.

Setup for Web Interface

Install Gradio (in addition to the main dependencies):

pip install "gradio>=5.0.0"

Launch the web interface:

python gradio_app.py

Open your browser and navigate to http://localhost:7860

Features

The Gradio interface provides:

- Interactive Parameter Configuration: Adjust all training parameters through an intuitive UI
- Image Upload: Drag and drop any image to train on
- Real-time Training Progress: Stream training logs and intermediate results
- Live Visualization: Watch Gaussian placement and rendering progress during training
- Result Gallery: View final renders, gradient maps, and saliency maps
- Easy Experimentation: No need to remember command-line arguments

Interface Sections

Configuration Panel:
- Basic parameters (number of Gaussians, training steps)
- Quantization settings for memory efficiency
- Initialization modes (gradient, saliency, random)
- Advanced optimization parameters (learning rates, loss weights)
Training Progress:
- Real-time streaming logs
- Current render and Gaussian visualization updates
- Training status and control buttons
Results Display:
- Final optimized image
- Gradient and saliency maps used for initialization
- Download capabilities for all results

Usage Tips

- Start with default parameters for your first run
- Use saliency initialization for better results on complex images
- Enable Gaussian visualization to see how the representation evolves
- Adjust save image steps to control visualization frequency (lower = more updates, but slower)
- For quick tests, reduce max steps to 500-1000

Command Line Arguments

Please refer to cfgs/default.yaml for the full list of arguments and their default values.

Post-optimization rendering

--eval render the optimized Image-GS representation.
--render_height image height for rendering (aspect ratio is maintained).

Bit precision control: 32 bits (float32) per dimension by default

--quantize enable bit precision control of Gaussian parameters.
--pos_bits bit precision of individual coordinate dimension.
--scale_bits bit precision of individual scale dimension.
--rot_bits bit precision of Gaussian orientation angle.
--feat_bits bit precision of individual feature dimension.
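As an illustration of what limiting a parameter to a given number of bits means, the sketch below applies plain uniform quantization over a fixed value range. This is one common scheme and not necessarily the one the repository implements; the function name `quantize_uniform` and the explicit `lo`/`hi` range are assumptions.

```python
import numpy as np

def quantize_uniform(values, bits, lo, hi):
    """Snap values in [lo, hi] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    # Map to [0, 1], clipping anything outside the representable range.
    normalized = np.clip((values - lo) / (hi - lo), 0.0, 1.0)
    # Integer code that would be stored with `bits` bits.
    codes = np.round(normalized * levels)
    # Map the code back to the original range.
    return lo + codes / levels * (hi - lo)

x = np.linspace(0.0, 1.0, 5)
print(quantize_uniform(x, bits=12, lo=0.0, hi=1.0))
```

In this scheme the worst-case rounding error is half a quantization step, (hi - lo) / (2 * (2**bits - 1)), so each additional bit roughly halves it.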

Logging

--exp_name path to the logging directory.
--vis_gaussians visualize Gaussians during optimization.
--save_image_steps frequency of rendering intermediate results during optimization.
--save_ckpt_steps frequency of checkpointing during optimization.

Input image

--input_path path to an image file or a directory containing a texture stack.
--downsample load a downsampled version of the input image or texture stack as the optimization target to evaluate image upsampling performance.
--downsample_ratio downsampling ratio.
--gamma optimize in a gamma-corrected space, modify with caution.

Gaussian

--num_gaussians number of Gaussians (for compression rate control).
--init_scale initial Gaussian scale in number of pixels.
--disable_topk_norm disable top-K normalization.
--disable_inverse_scale disable inverse Gaussian scale optimization.
--init_mode Gaussian position initialization mode, valid values include "gradient", "saliency", and "random".
--init_random_ratio ratio of Gaussians with randomly initialized position.
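The idea behind the "gradient" mode combined with a random ratio can be sketched as follows: most positions are drawn with probability proportional to the local image gradient magnitude, and the remaining fraction is drawn uniformly. This is an illustrative approximation, not the repository's code; `init_positions` and its parameters are hypothetical names.

```python
import numpy as np

def init_positions(image_gray, num_gaussians, random_ratio, seed=0):
    """Sample (x, y) pixel positions, favoring high-gradient regions."""
    rng = np.random.default_rng(seed)
    h, w = image_gray.shape
    # Gradient magnitude serves as an unnormalized sampling density.
    gy, gx = np.gradient(image_gray.astype(np.float64))
    magnitude = np.hypot(gx, gy).ravel()
    total = magnitude.sum()
    # Fall back to a uniform density for a perfectly flat image.
    probs = magnitude / total if total > 0 else np.full(h * w, 1.0 / (h * w))
    num_random = int(round(num_gaussians * random_ratio))
    num_guided = num_gaussians - num_random
    guided = rng.choice(h * w, size=num_guided, p=probs)
    uniform = rng.integers(0, h * w, size=num_random)
    flat = np.concatenate([guided, uniform])
    # Convert flat indices back to (row, column), then return as (x, y).
    ys, xs = np.divmod(flat, w)
    return np.stack([xs, ys], axis=1)
```

A saliency-guided variant would follow the same pattern with a saliency map in place of the gradient magnitude.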

Optimization

--disable_tiles disable tile-based rendering (warning: optimization and rendering without tiles will be much slower).
--max_steps maximum number of optimization steps.
--pos_lr Gaussian position learning rate.
--scale_lr Gaussian scale learning rate.
--rot_lr Gaussian orientation angle learning rate.
--feat_lr Gaussian feature learning rate.
--disable_lr_schedule disable learning rate decay and early stopping schedule.
--disable_prog_optim disable error-guided progressive optimization.
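To give a rough intuition for error-guided progressive optimization, the sketch below picks locations for newly added Gaussians with probability proportional to the current per-pixel reconstruction error. The actual schedule and criteria used by Image-GS are not described in this README, so the function `sample_new_positions` and the absolute-error measure are assumptions.

```python
import numpy as np

def sample_new_positions(target, render, num_new, seed=0):
    """Sample (x, y) positions for new Gaussians where the error is largest."""
    rng = np.random.default_rng(seed)
    # Per-pixel absolute error, summed over channels if present.
    error = np.abs(target.astype(np.float64) - render.astype(np.float64))
    if error.ndim == 3:
        error = error.sum(axis=2)
    h, w = error.shape
    total = error.sum()
    if total == 0:
        # Perfect reconstruction: nothing to add.
        return np.empty((0, 2), dtype=np.int64)
    flat = rng.choice(h * w, size=num_new, p=error.ravel() / total)
    ys, xs = np.divmod(flat, w)
    return np.stack([xs, ys], axis=1)
```

Adding Gaussians in stages like this means that an early subset already gives a coarse reconstruction, which is consistent with the level-of-detail behavior described in the abstract.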

Acknowledgements

We would like to thank the gsplat team, and the authors of 3DGS, fused-ssim, and EML-Net for their great work, based on which Image-GS was developed.

License

This project is licensed under the terms of the MIT license.

Citation

If you find this project helpful to your research, please consider citing it with the following BibTeX entry:

@inproceedings{zhang2025image,
  title={{Image-GS}: Content-Adaptive Image Representation via {2D} {G}aussians},
  author={Zhang, Yunxiang and Li, Bingxuan and Kuznetsov, Alexandr and Jindal, Akshay and Diolatzis, Stavros and Chen, Kenneth and Sochenov, Anton and Kaplanyan, Anton and Sun, Qi},
  booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  pages={1--11},
  year={2025}
}