OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
Abstract
OmniRefiner, a detail-aware refinement framework using single-image diffusion and reinforcement learning, enhances reference-guided image generation, improving detail preservation and consistency.
Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent compression inherently discards subtle texture information, causing identity- and attribute-specific cues to vanish. Moreover, existing post-editing approaches that amplify local details often produce results inconsistent with the original image in lighting, texture, or shape. To address this, we introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction to enhance pixel-level consistency. We first adapt a single-image diffusion editor by fine-tuning it to jointly ingest the draft image and the reference image, enabling globally coherent refinement while maintaining structural fidelity. We then apply reinforcement learning to further strengthen localized editing capability, explicitly optimizing for detail accuracy and semantic consistency. Extensive experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation, producing faithful and visually coherent edits that surpass both open-source and commercial models on challenging reference-guided restoration benchmarks.
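The abstract outlines a two-stage training recipe: supervised fine-tuning of a diffusion editor that jointly conditions on a draft and a reference image, followed by reinforcement learning against a detail/consistency reward. Below is a minimal PyTorch sketch of that control flow. Everything here is an illustrative assumption: the toy `DraftRefEditor` module, the pixel-error `detail_reward`, and the Gaussian-policy REINFORCE update stand in for the paper's actual architecture, reward model, and RL algorithm, which the abstract does not specify.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All names (DraftRefEditor, detail_reward) are hypothetical stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DraftRefEditor(nn.Module):
    """Toy editor that jointly ingests a draft and a reference image."""

    def __init__(self, channels=3, hidden=64):
        super().__init__()
        # Draft and reference are concatenated along the channel axis.
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, draft, reference):
        return self.net(torch.cat([draft, reference], dim=1))


def detail_reward(edit, reference):
    # Hypothetical per-sample reward: negative pixel error against the
    # reference, standing in for learned detail/consistency scores.
    return -F.mse_loss(edit, reference, reduction="none").mean(dim=(1, 2, 3))


model = DraftRefEditor()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

draft = torch.rand(4, 3, 64, 64)      # draft images to be refined
reference = torch.rand(4, 3, 64, 64)  # reference images
target = torch.rand(4, 3, 64, 64)     # ground-truth refined images

# Stage 1: supervised fine-tuning on (draft, reference) -> target pairs.
for _ in range(10):
    pred = model(draft, reference)
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: REINFORCE-style update. Treat the edit as a Gaussian policy
# centered on the model output and reinforce high-reward samples.
sigma = 0.1
for _ in range(10):
    mean = model(draft, reference)
    sample = mean + sigma * torch.randn_like(mean)
    reward = detail_reward(sample.detach(), reference)
    adv = (reward - reward.mean()) / (reward.std() + 1e-8)
    # Log-density of the sampled edit under N(mean, sigma^2), up to a constant.
    log_prob = -((sample.detach() - mean) ** 2).sum(dim=(1, 2, 3)) / (2 * sigma**2)
    loss = -(adv * log_prob).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the second stage would score samples with learned detail-accuracy and semantic-consistency rewards rather than raw pixel error, but the loop structure (sample an edit, score it, reinforce high-reward behavior) follows the staging the abstract describes.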
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment (2025)
- CaliTex: Geometry-Calibrated Attention for View-Coherent 3D Texture Generation (2025)
- Local-Global Context-Aware and Structure-Preserving Image Super-Resolution (2025)
- BLIP3o-NEXT: Next Frontier of Native Image Generation (2025)
- Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization (2025)
- Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes (2025)
- Group Relative Attention Guidance for Image Editing (2025)