arxiv:2503.16153

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

Published on Mar 20

Authors:

Abstract

Rotary Position Embedding (RoPE) in Multimodal Diffusion Transformer (MMDiT) enhances text-to-image generation, and the analysis reveals distinct dependency patterns leading to a task-specific image editing framework.

AI-generated summary

The integration of Rotary Position Embedding (RoPE) in Multimodal Diffusion Transformer (MMDiT) has significantly enhanced text-to-image generation quality. However, the fundamental reliance of self-attention layers on positional embedding versus query-key similarity during generation remains an intriguing question. We present the first mechanistic analysis of RoPE-based MMDiT models (e.g., FLUX), introducing an automated probing strategy that disentangles positional information versus content dependencies by strategically manipulating RoPE during generation. Our analysis reveals distinct dependency patterns that do not straightforwardly correlate with depth, offering new insights into the layer-specific roles in RoPE-based MMDiT. Based on these findings, we propose a training-free, task-specific image editing framework that categorizes editing tasks into three types: position-dependent editing (e.g., object addition), content similarity-dependent editing (e.g., non-rigid editing), and region-preserved editing (e.g., background replacement). For each type, we design tailored key-value injection strategies based on the characteristics of the editing task. Extensive qualitative and quantitative evaluations demonstrate that our method outperforms state-of-the-art approaches, particularly in preserving original semantic content and achieving seamless modifications.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.16153 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.16153 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.16153 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.