---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)

<div align="center">

[arXiv](https://arxiv.org/abs/2509.24980)
[Project Page](https://t-s-liang.github.io/SDPose)
[Demo](https://huggingface.co/spaces/teemosliang/SDPose-Body)
[License: MIT](https://opensource.org/licenses/MIT)

</div>

## Model Description

**SDPose** is a state-of-the-art human pose estimation model that leverages the strong visual priors of **Stable Diffusion** to achieve robust performance in out-of-distribution (OOD) scenarios. This variant predicts the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

### Model Architecture

SDPose uses a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates top-down:

1. **Person Detection**: detect human bounding boxes with an object detector (e.g., YOLO11-x)
2. **Pose Estimation**: crop each detected person and estimate their 17 body keypoints
3. **Heatmap Generation**: produce per-keypoint confidence heatmaps that are decoded into coordinates
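
The pipeline above ends with heatmap decoding. Here is a minimal NumPy sketch of that last step, using a plain per-channel argmax for illustration; the released model uses MMPose's UDP codec, and `decode_heatmaps` is a hypothetical helper, not part of the SDPose API:

```python
import numpy as np

def decode_heatmaps(heatmaps: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Turn (K, H, W) keypoint heatmaps into (K, 2) xy coordinates
    and (K,) confidence scores via a per-channel argmax."""
    num_kpts, _, width = heatmaps.shape
    flat = heatmaps.reshape(num_kpts, -1)
    idx = flat.argmax(axis=1)            # flat index of each channel's peak
    scores = flat.max(axis=1)            # peak value doubles as confidence
    coords = np.stack([idx % width, idx // width], axis=1)
    return coords.astype(float), scores

# Toy check: a single 4x4 heatmap with its peak at (x=2, y=1).
hm = np.zeros((1, 4, 4))
hm[0, 1, 2] = 0.9
coords, conf = decode_heatmaps(hm)  # coords -> [[2., 1.]], conf -> [0.9]
```

Production decoders refine this with sub-pixel offsets (e.g., UDP), but the argmax view captures what the heatmap head produces.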

**Model Specifications:**
- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose
|
| | ## Supported Keypoints (COCO Format) |
| |
|
| | The model predicts 17 body keypoints following the COCO keypoint format: |
| |
|
| | ``` |
| | 0: nose |
| | 1: left_eye |
| | 2: right_eye |
| | 3: left_ear |
| | 4: right_ear |
| | 5: left_shoulder |
| | 6: right_shoulder |
| | 7: left_elbow |
| | 8: right_elbow |
| | 9: left_wrist |
| | 10: right_wrist |
| | 11: left_hip |
| | 12: right_hip |
| | 13: left_knee |
| | 14: right_knee |
| | 15: left_ankle |
| | 16: right_ankle |
| | ``` |
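
A convenient property of this ordering is its left/right symmetry: every `left_*` keypoint sits one index before its `right_*` counterpart, which makes horizontal-flip augmentation and flip-testing cheap. A small sketch (`flip_index` is a hypothetical helper, shown only to illustrate the index layout):

```python
# Indices 1..16 come in (left, right) pairs; nose (0) mirrors to itself.
FLIP_PAIRS = [(i, i + 1) for i in range(1, 17, 2)]  # (1, 2), (3, 4), ..., (15, 16)

def flip_index(num_keypoints: int = 17) -> list[int]:
    """Permutation such that keypoints[flip_index()] swaps left/right labels,
    as needed after horizontally flipping an image."""
    idx = list(range(num_keypoints))
    for left, right in FLIP_PAIRS:
        idx[left], idx[right] = idx[right], idx[left]
    return idx

flip_index()  # [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```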

## Intended Use

### Primary Use Cases

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts
|
| | ## How to Use |
| |
|
| | ### Installation |
| |
|
| | ```bash |
| | # Clone the repository |
| | git clone https://github.com/t-s-liang/SDPose-OOD.git |
| | cd SDPose-OOD |
| | |
| | # Install dependencies |
| | pip install -r requirements.txt |
| | # Download YOLO11-x for human detection |
| | wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/ |
| | |
| | # Launch Gradio interface |
| | cd gradio_app |
| | bash launch_gradio.sh |
| | ``` |

## Training Data

### Datasets

SDPose is trained exclusively on the COCO 2017 `train2017` split; no extra data is used.

- **COCO (Common Objects in Context)**: 200K+ images annotated with 17 body keypoints
|
| | ### Preprocessing |
| |
|
| | - Images are resized and cropped to 1024ร768 resolution |
| | - Augmentation: random horizontal flip, half-body & bbox transforms, UDP affine; Albumentations (Gaussian/Median blur, coarse dropout). |
| | - Heatmaps: UDP codec (MMPose style). |
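
The resize-and-crop step can be pictured as an affine map from a padded person box onto the 768×1024 (W×H) input. The sketch below is a plain NumPy illustration, not MMPose's UDP transform; the 1.25 padding factor and the helper name are assumptions:

```python
import numpy as np

def bbox_to_input_affine(bbox, input_wh=(768, 1024), padding=1.25):
    """2x3 affine matrix mapping a person bbox (x, y, w, h) onto the model
    input: the box is grown to the input aspect ratio, padded, and scaled."""
    x, y, w, h = bbox
    in_w, in_h = input_wh
    cx, cy = x + w / 2, y + h / 2          # bbox centre
    aspect = in_w / in_h
    if w > aspect * h:                     # box too wide -> grow height
        h = w / aspect
    else:                                  # box too tall -> grow width
        w = aspect * h
    w, h = w * padding, h * padding        # margin around the person
    sx, sy = in_w / w, in_h / h            # equal by construction
    # Scale about the origin, then translate the bbox centre to the input centre.
    return np.array([[sx, 0.0, in_w / 2 - sx * cx],
                     [0.0, sy, in_h / 2 - sy * cy]])

M = bbox_to_input_affine((100, 50, 200, 400))
M @ np.array([200.0, 250.0, 1.0])  # bbox centre lands at (384, 512)
```

Growing the box to the target aspect ratio before scaling keeps the person undistorted; the same matrix (inverted) maps predicted keypoints back to image coordinates.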

### Comparison with Baselines

SDPose outperforms strong pose estimation baselines (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while maintaining competitive performance on in-domain data.

See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.
|
| | ## Citation |
| |
|
| | If you use SDPose in your research, please cite our paper: |
| |
|
| | ```bibtex |
| | @misc{liang2025sdposeexploitingdiffusionpriors, |
| | title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation}, |
| | author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan}, |
| | year={2025}, |
| | eprint={2509.24980}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV}, |
| | url={https://arxiv.org/abs/2509.24980}, |
| | } |
| | ``` |

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com
|
| | --- |
| |
|
| | <div align="center"> |
| |
|
| | **โญ Star us on GitHub โ it motivates us a lot!** |
| |
|
| | </div> |
| |
|