Papers
arxiv:2511.14899

InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization

Published on Nov 18
· Submitted by Daniel Gilo on Nov 24
Authors:

Abstract

A framework called InstructMix2Mix uses a 2D diffusion model to improve multi-view image editing by leveraging a 3D prior for cross-view consistency.

AI-generated summary

We address the task of multi-view image editing from sparse input views, where the inputs can be seen as a mix of images capturing the scene from different viewpoints. The goal is to modify the scene according to a textual instruction while preserving consistency across all views. Existing methods, based on per-scene neural fields or temporal attention mechanisms, struggle in this setting, often producing artifacts and incoherent edits. We propose InstructMix2Mix (I-Mix2Mix), a framework that distills the editing capabilities of a 2D diffusion model into a pretrained multi-view diffusion model, leveraging its data-driven 3D prior for cross-view consistency. A key contribution is replacing the conventional neural field consolidator in Score Distillation Sampling (SDS) with a multi-view diffusion student, which requires novel adaptations: incremental student updates across timesteps, a specialized teacher noise scheduler to prevent degeneration, and an attention modification that enhances cross-view coherence without additional cost. Experiments demonstrate that I-Mix2Mix significantly improves multi-view consistency while maintaining high per-frame edit quality.

Community

Paper author Paper submitter

I-Mix2Mix performs instruction-driven edits on a sparse set of views. The key idea is SDS with a twist: we distill a 2D editor into a pretrained multi-view diffusion model rather than a NeRF/3DGS. The student’s learned 3D prior enables multi-view consistent edits, despite the sparse input.

Check out our project page: https://danielgilo.github.io/instruct-mix2mix/

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.14899 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.14899 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.14899 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.