Update model card: Add Expression Adapter paper, pipeline tag, and diffusers usage
#1
by nielsr (HF Staff) · opened
README.md CHANGED

---
language:
- en
library_name: diffusers
license: mit
pipeline_tag: image-to-image
---

# Arc2Face Model Card

<div align="center">

[**Project Page**](https://arc2face.github.io/) **|** [**Original Paper (ArXiv)**](https://arxiv.org/abs/2403.11641) **|** [**Expression Adapter Paper (HF)**](https://huggingface.co/papers/2510.04706) **|** [**Code**](https://github.com/foivospar/Arc2Face) **|** [🤗 **Gradio demo**](https://huggingface.co/spaces/FoivosPar/Arc2Face)

</div>

Arc2Face is an ID-conditioned face model that can generate diverse, ID-consistent photos of a person given only their ArcFace ID-embedding.
It is trained on a restored version of the WebFace42M face recognition database and is further fine-tuned on FFHQ and CelebA-HQ.

Arc2Face has been extended with a fine-grained **Expression Adapter**, enabling the generation of any subject under any facial expression (even rare, asymmetric, subtle, or extreme ones). This extension is detailed in the paper [ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion](https://huggingface.co/papers/2510.04706).

<div align="center">
<img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/exp_teaser.jpg'>
</div>

<div align="center">
<img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/samples_short.jpg'>
</div>

## Model Details

Arc2Face adapts the pre-trained backbone to the task of ID-to-face generation, conditioned on ArcFace ID-embeddings.

We also provide a ControlNet model trained on top of Arc2Face for pose control.

<div align="center">
<img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/controlnet_short.jpg'>
</div>
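
The pose-control inference code itself is not shown in this card; the following is only a rough sketch of how the weights could be plugged into the standard `diffusers` ControlNet pipeline. The `controlnet` subfolder name matches the download snippet below, `base_model`, `encoder`, and `unet` are assumed to be defined as in the Sample Usage section, and pose-image preparation follows the GitHub repo:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Assumes the ControlNet weights (plus a config.json) were fetched to
# ./models/controlnet as in the download snippet below.
controlnet = ControlNetModel.from_pretrained(
    'models', subfolder="controlnet", torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    base_model,
    text_encoder=encoder,
    unet=unet,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
)
# Generation then takes an extra pose-conditioning image:
# images = pipeline(prompt_embeds=id_emb, image=pose_image, num_inference_steps=25).images
```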

## Download Models

The models can be downloaded directly from this repository or using python:

```python
from huggingface_hub import hf_hub_download

# NOTE: only the controlnet filename below is confirmed by this card; the
# arc2face and encoder filenames are assumed from the subfolders used in the
# Sample Usage section. The repo may contain additional files (e.g. the ArcFace
# ONNX model); verify with the listing snippet below.
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.safetensors", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="controlnet/diffusion_pytorch_model.safetensors", local_dir="./models")
```
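
Since some of the filenames above are inferred rather than quoted, the repository contents can be verified with the standard `huggingface_hub` listing API:

```python
from huggingface_hub import list_repo_files

# Prints every file in the model repo, e.g. 'arc2face/...', 'encoder/...', 'controlnet/...'.
for filename in list_repo_files("FoivosPar/Arc2Face"):
    print(filename)
```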

Please check our [GitHub repository](https://github.com/foivospar/Arc2Face) for complete inference instructions.

## Sample Usage with Diffusers

To use the Arc2Face model with the `diffusers` library, first load the pipeline components:

```python
from diffusers import (
    StableDiffusionPipeline,
    UNet2DConditionModel,
    DPMSolverMultistepScheduler,
)

from arc2face import CLIPTextModelWrapper, project_face_embs

import torch
from insightface.app import FaceAnalysis
from PIL import Image
import numpy as np

# Arc2Face is built upon SD1.5. The 'runwayml/stable-diffusion-v1-5' repository
# is deprecated; the mirror 'stable-diffusion-v1-5/stable-diffusion-v1-5' hosts
# the same weights and can be substituted if the original becomes unavailable.
base_model = 'runwayml/stable-diffusion-v1-5'

encoder = CLIPTextModelWrapper.from_pretrained(
    'models', subfolder="encoder", torch_dtype=torch.float16
)

unet = UNet2DConditionModel.from_pretrained(
    'models', subfolder="arc2face", torch_dtype=torch.float16
)

pipeline = StableDiffusionPipeline.from_pretrained(
    base_model,
    text_encoder=encoder,
    unet=unet,
    torch_dtype=torch.float16,
    safety_checker=None
)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
```
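
On GPUs with limited memory, `diffusers`' standard offloading helper can be used in place of `pipeline.to('cuda')` (an optional convenience, not part of the original instructions; requires `accelerate`):

```python
# Keeps submodules on CPU and moves each to the GPU only while it runs,
# trading some speed for a much smaller memory footprint.
pipeline.enable_model_cpu_offload()
```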

Then, pick an image and extract the ID-embedding:

```python
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# PIL cannot open a URL directly, so fetch the example image first
# (requests is an extra dependency used only for this example).
from io import BytesIO
import requests

url = 'https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/examples/joacquin.png'
img = np.array(Image.open(BytesIO(requests.get(url).content)).convert('RGB'))[:,:,::-1]  # RGB -> BGR for insightface

faces = app.get(img)
faces = sorted(faces, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # select largest face (if more than one detected)
id_emb = torch.tensor(faces['embedding'], dtype=torch.float16)[None].cuda()
id_emb = id_emb/torch.norm(id_emb, dim=1, keepdim=True)  # normalize embedding
id_emb = project_face_embs(pipeline, id_emb)  # pass through the encoder
```
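
After `project_face_embs`, the ID-embedding has been mapped into the token-embedding sequence that SD1.5's cross-attention consumes, which is why it can be passed directly as `prompt_embeds` below. A quick sanity check (the exact shape is an assumption based on the SD1.5 text-encoder output format, not stated in this card):

```python
# One "prompt" of 77 tokens with 768-dim features, as expected by the UNet.
print(id_emb.shape)  # expected: torch.Size([1, 77, 768])
```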

<div align="center">
<img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/examples/joacquin.png' style='width:25%;'>
</div>

Finally, generate images:

```python
num_images = 4
images = pipeline(prompt_embeds=id_emb, num_inference_steps=25, guidance_scale=3.0, num_images_per_prompt=num_images).images
```
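
The pipeline returns a list of standard PIL images, so the results can be saved directly (illustrative snippet; filenames are arbitrary):

```python
# Write each generated sample to disk.
for i, image in enumerate(images):
    image.save(f'arc2face_sample_{i}.png')
```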

<div align="center">
<img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/samples.jpg'>
</div>

## Limitations and Bias

- Only one person per image can be generated.


## Citation

If you find Arc2Face useful for your research, please consider citing us:

**BibTeX for Arc2Face:**

```bibtex
@inproceedings{paraperas2024arc2face,
  title={Arc2Face: A Foundation Model for ID-Consistent Human Faces},
  author={Paraperas Papantoniou, Foivos and Lattas, Alexandros and Moschoglou, Stylianos and Deng, Jiankang and Kainz, Bernhard and Zafeiriou, Stefanos},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}
```

Additionally, if you use the Expression Adapter, please also cite the extension:

**BibTeX for Expression Adapter:**

```bibtex
@inproceedings{paraperas2025arc2face_exp,
  title={ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion},
  author={Paraperas Papantoniou, Foivos and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  year={2025}
}
```