arXiv:2511.16301

Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling

Published on Nov 20 · Submitted by minseokseo on Nov 25

AI-generated summary

Upsample Anything is a lightweight test-time optimization framework that upsamples low-resolution features to high-resolution, pixel-wise outputs without any training, using a learned anisotropic Gaussian kernel for edge-aware reconstruction in tasks such as semantic segmentation and depth estimation.

Abstract

We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14×/16× (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in only ≈0.419 s per 224×224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling. Project page: https://seominseok0429.github.io/Upsample-Anything/
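
The abstract's core operation, weighting neighbors by a spatial Gaussian times a range (color) term from a high-resolution guide, is the Joint Bilateral Upsampling idea the paper builds on. Below is a minimal PyTorch sketch of such an edge-aware spatial-range kernel. The function name, window size, and the single-bandwidth isotropic simplification are illustrative assumptions; the paper itself learns per-pixel anisotropic kernel parameters at test time.

```python
import torch
import torch.nn.functional as F

def joint_bilateral_upsample(feat_lr, guide_hr, sigma_s=2.0, sigma_r=0.1, k=7):
    """Upsample feat_lr (1, C, h, w) to the resolution of guide_hr (1, 3, H, W).

    Each high-resolution pixel aggregates bilinearly upsampled features over a
    k x k window, weighted by a spatial Gaussian times a range (color) Gaussian
    computed on the high-resolution guide image.
    """
    H, W = guide_hr.shape[-2:]
    feat_up = F.interpolate(feat_lr, size=(H, W), mode="bilinear",
                            align_corners=False)

    pad = k // 2
    # Gather k x k neighborhoods of the upsampled features and of the guide.
    feat_n = F.unfold(feat_up, k, padding=pad).view(1, feat_up.shape[1],
                                                    k * k, H, W)
    guide_n = F.unfold(guide_hr, k, padding=pad).view(1, 3, k * k, H, W)

    # Spatial term: isotropic Gaussian over pixel offsets inside the window
    # (the paper learns anisotropic, per-pixel covariances instead).
    ys, xs = torch.meshgrid(torch.arange(k) - pad, torch.arange(k) - pad,
                            indexing="ij")
    w_spatial = torch.exp(-(xs ** 2 + ys ** 2).float()
                          / (2 * sigma_s ** 2)).view(1, 1, k * k, 1, 1)

    # Range term: Gaussian over color differences to the window's center pixel.
    d2 = ((guide_n - guide_hr.unsqueeze(2)) ** 2).sum(dim=1, keepdim=True)
    w_range = torch.exp(-d2 / (2 * sigma_r ** 2))

    # Normalized edge-aware average: weights fall off across color edges,
    # so features do not bleed over object boundaries.
    w = w_spatial * w_range
    return (feat_n * w).sum(dim=2) / w.sum(dim=2).clamp_min(1e-8)

# Example: lift 16x-downsampled ViT-style features to image resolution.
feat_lr = torch.randn(1, 16, 14, 14)
guide_hr = torch.rand(1, 3, 224, 224)
feat_hr = joint_bilateral_upsample(feat_lr, guide_hr)  # (1, 16, 224, 224)
```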

Community

Paper submitter
  • This paper presents a remarkably simple yet highly effective test-time optimization framework for feature upsampling (a hypothetical sketch of such a loop follows this list).
  • The method is fully training-free and generalizes seamlessly across domains, tasks, and backbone architectures.
  • Its per-pixel anisotropic Gaussian formulation offers strong edge preservation and superior spatial fidelity compared to prior work.
  • The approach is computationally lightweight, scalable to high resolutions, and consistently achieves state-of-the-art performance.
  • Overall, this work provides a robust and universal upsampler that elegantly bridges JBU and Gaussian Splatting.
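
To make the "per-image optimization" concrete, here is a hypothetical test-time optimization loop that fits the two bandwidths of the `joint_bilateral_upsample` sketch above under an assumed cycle-consistency objective: the upsampled features, pooled back to the input grid, should reproduce the low-resolution features. The loss, log-parameterization, and step count are assumptions for illustration; the paper optimizes richer per-pixel anisotropic parameters, and its exact objective may differ.

```python
import torch
import torch.nn.functional as F

def tto_upsample(feat_lr, guide_hr, steps=100, lr=0.05):
    """Fit the two kernel bandwidths per image, then upsample with them."""
    h, w = feat_lr.shape[-2:]
    # Log-parameterization keeps both bandwidths positive during optimization.
    log_sigma_s = torch.tensor(0.7, requires_grad=True)   # exp(0.7)  ~ 2.0 px
    log_sigma_r = torch.tensor(-2.3, requires_grad=True)  # exp(-2.3) ~ 0.1
    opt = torch.optim.Adam([log_sigma_s, log_sigma_r], lr=lr)

    for _ in range(steps):
        feat_hr = joint_bilateral_upsample(
            feat_lr, guide_hr, log_sigma_s.exp(), log_sigma_r.exp())
        # Assumed cycle-consistency objective: the high-res output, averaged
        # back down to the input grid, should match the low-res features.
        loss = F.mse_loss(F.adaptive_avg_pool2d(feat_hr, (h, w)), feat_lr)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return joint_bilateral_upsample(
            feat_lr, guide_hr, log_sigma_s.exp(), log_sigma_r.exp())
```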
