---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)

<div align="center">

[arXiv](https://arxiv.org/abs/2509.24980)
[Project Page](https://t-s-liang.github.io/SDPose)
[Demo](https://huggingface.co/spaces/teemosliang/SDPose-Body)
[License: MIT](https://opensource.org/licenses/MIT)

</div>

## Model Description

**SDPose** is a state-of-the-art human pose estimation model that leverages the strong visual priors of **Stable Diffusion** to achieve robust performance in out-of-distribution (OOD) scenarios. This variant predicts the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

### Model Architecture

SDPose uses a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates top-down:

1. **Person Detection**: detect human bounding boxes with an object detector (e.g., YOLO11-x)
2. **Pose Estimation**: crop each detected person and estimate their 17 body keypoints
3. **Heatmap Generation**: produce per-keypoint confidence heatmaps that are decoded into coordinates
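
The pipeline above ends with heatmap decoding. Here is a minimal NumPy sketch of that last step, using a plain per-channel argmax for illustration; the released model uses MMPose's UDP codec, and `decode_heatmaps` is a hypothetical helper, not part of the SDPose API:

```python
import numpy as np

def decode_heatmaps(heatmaps: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Turn (K, H, W) keypoint heatmaps into (K, 2) xy coordinates
    and (K,) confidence scores via a per-channel argmax."""
    num_kpts, _, width = heatmaps.shape
    flat = heatmaps.reshape(num_kpts, -1)
    idx = flat.argmax(axis=1)            # flat index of each channel's peak
    scores = flat.max(axis=1)            # peak value doubles as confidence
    coords = np.stack([idx % width, idx // width], axis=1)
    return coords.astype(float), scores

# Toy check: a single 4x4 heatmap with its peak at (x=2, y=1).
hm = np.zeros((1, 4, 4))
hm[0, 1, 2] = 0.9
coords, conf = decode_heatmaps(hm)  # coords -> [[2., 1.]], conf -> [0.9]
```

Production decoders refine this with sub-pixel offsets (e.g., UDP), but the argmax view captures what the heatmap head produces.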

**Model Specifications:**
- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose
|
| | ## Supported Keypoints (COCO Format) |
| |
|
| | The model predicts 17 body keypoints following the COCO keypoint format: |
| |
|
| | ``` |
| | 0: nose |
| | 1: left_eye |
| | 2: right_eye |
| | 3: left_ear |
| | 4: right_ear |
| | 5: left_shoulder |
| | 6: right_shoulder |
| | 7: left_elbow |
| | 8: right_elbow |
| | 9: left_wrist |
| | 10: right_wrist |
| | 11: left_hip |
| | 12: right_hip |
| | 13: left_knee |
| | 14: right_knee |
| | 15: left_ankle |
| | 16: right_ankle |
| | ``` |
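
A convenient property of this ordering is its left/right symmetry: every `left_*` keypoint sits one index before its `right_*` counterpart, which makes horizontal-flip augmentation and flip-testing cheap. A small sketch (`flip_index` is a hypothetical helper, shown only to illustrate the index layout):

```python
# Indices 1..16 come in (left, right) pairs; nose (0) mirrors to itself.
FLIP_PAIRS = [(i, i + 1) for i in range(1, 17, 2)]  # (1, 2), (3, 4), ..., (15, 16)

def flip_index(num_keypoints: int = 17) -> list[int]:
    """Permutation such that keypoints[flip_index()] swaps left/right labels,
    as needed after horizontally flipping an image."""
    idx = list(range(num_keypoints))
    for left, right in FLIP_PAIRS:
        idx[left], idx[right] = idx[right], idx[left]
    return idx

flip_index()  # [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```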

## Intended Use

### Primary Use Cases

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts
|
| | ## How to Use |
| |
|
| | ### Installation |
| |
|
| | ```bash |
| | # Clone the repository |
| | git clone https://github.com/t-s-liang/SDPose-OOD.git |
| | cd SDPose-OOD |
| | |
| | # Install dependencies |
| | pip install -r requirements.txt |
| | # Download YOLO11-x for human detection |
| | wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/ |
| | |
| | # Launch Gradio interface |
| | cd gradio_app |
| | bash launch_gradio.sh |
| | ``` |

## Training Data

### Datasets

SDPose is trained exclusively on the COCO 2017 `train2017` split; no extra data is used.

- **COCO (Common Objects in Context)**: 200K+ images annotated with 17 body keypoints
|
| | ### Preprocessing |
| |
|
| | - Images are resized and cropped to 1024ร768 resolution |
| | - Augmentation: random horizontal flip, half-body & bbox transforms, UDP affine; Albumentations (Gaussian/Median blur, coarse dropout). |
| | - Heatmaps: UDP codec (MMPose style). |
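
The resize-and-crop step can be pictured as an affine map from a padded person box onto the 768×1024 (W×H) input. The sketch below is a plain NumPy illustration, not MMPose's UDP transform; the 1.25 padding factor and the helper name are assumptions:

```python
import numpy as np

def bbox_to_input_affine(bbox, input_wh=(768, 1024), padding=1.25):
    """2x3 affine matrix mapping a person bbox (x, y, w, h) onto the model
    input: the box is grown to the input aspect ratio, padded, and scaled."""
    x, y, w, h = bbox
    in_w, in_h = input_wh
    cx, cy = x + w / 2, y + h / 2          # bbox centre
    aspect = in_w / in_h
    if w > aspect * h:                     # box too wide -> grow height
        h = w / aspect
    else:                                  # box too tall -> grow width
        w = aspect * h
    w, h = w * padding, h * padding        # margin around the person
    sx, sy = in_w / w, in_h / h            # equal by construction
    # Scale about the origin, then translate the bbox centre to the input centre.
    return np.array([[sx, 0.0, in_w / 2 - sx * cx],
                     [0.0, sy, in_h / 2 - sy * cy]])

M = bbox_to_input_affine((100, 50, 200, 400))
M @ np.array([200.0, 250.0, 1.0])  # bbox centre lands at (384, 512)
```

Growing the box to the target aspect ratio before scaling keeps the person undistorted; the same matrix (inverted) maps predicted keypoints back to image coordinates.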

### Comparison with Baselines

SDPose outperforms strong pose estimation baselines (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while maintaining competitive performance on in-domain data.

See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.
|
| | ## Citation |
| |
|
| | If you use SDPose in your research, please cite our paper: |
| |
|
| | ```bibtex |
| | @misc{liang2025sdposeexploitingdiffusionpriors, |
| | title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation}, |
| | author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan}, |
| | year={2025}, |
| | eprint={2509.24980}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV}, |
| | url={https://arxiv.org/abs/2509.24980}, |
| | } |
| | ``` |

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com
|
| | --- |
| |
|
| | <div align="center"> |
| |
|
| | **โญ Star us on GitHub โ it motivates us a lot!** |
| |
|
| | </div> |
| |
|