arxiv:2511.19320

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

Published on Nov 24

Submitted by Jiaming Zhang on Nov 26

Multimedia Computing Group-Nanjing University


Authors:

Jiaming Zhang ,

Abstract

AI-generated summary

SteadyDancer, an Image-to-Video framework, ensures first-frame identity preservation and precise motion control through harmonized conditions, adaptive pose representation, and hierarchical training objectives.

Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments that are common in real-world applications, leading to failures such as identity drift and visual artifacts. We introduce SteadyDancer, a framework built on the Image-to-Video (I2V) paradigm that achieves harmonized and coherent animation and is the first to robustly ensure first-frame preservation. First, we propose a Condition-Reconciliation Mechanism to harmonize the two conflicting conditions, enabling precise control without sacrificing fidelity. Second, we design Synergistic Pose Modulation Modules to generate an adaptive and coherent pose representation that is highly compatible with the reference image. Finally, we employ a Staged Decoupled-Objective Training Pipeline that hierarchically optimizes the model for motion fidelity, visual quality, and temporal coherence. Experiments demonstrate that SteadyDancer achieves state-of-the-art performance in both appearance fidelity and motion control, while requiring significantly fewer training resources than comparable methods.


Community

jiamingZ

Paper author Paper submitter 1 day ago

SteadyDancer is a strong animation framework built on the Image-to-Video paradigm that ensures robust first-frame preservation. Prior Reference-to-Video approaches often suffer from identity drift caused by the spatio-temporal misalignments common in real-world applications. In contrast, SteadyDancer generates high-fidelity, temporally coherent human animations, outperforming existing methods in visual quality and control while requiring significantly fewer training resources.