| language: | |
| - en | |
| library_name: diffusers | |
| license: mit | |
| pipeline_tag: image-to-image | |
| # Arc2Face Model Card | |
| <div align="center"> | |
| [**Project Page**](https://arc2face.github.io/) **|** [**Original Paper (ArXiv)**](https://arxiv.org/abs/2403.11641) **|** [**Expression Adapter Paper (HF)**](https://huggingface.co/papers/2510.04706) **|** [**Code**](https://github.com/foivospar/Arc2Face) **|** [🤗 **Gradio demo**](https://huggingface.co/spaces/FoivosPar/Arc2Face) | |
| </div> | |
| ## Introduction | |
| Arc2Face is an ID-conditioned face model that can generate diverse, ID-consistent photos of a person given only their ArcFace ID-embedding. | |
| It is trained on a restored version of the WebFace42M face recognition database, and is further fine-tuned on FFHQ and CelebA-HQ. | |
| Arc2Face has been extended with a fine-grained **Expression Adapter**, enabling the generation of any subject under any facial expression (even rare, asymmetric, subtle, or extreme ones). This extension is detailed in the paper [ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion](https://huggingface.co/papers/2510.04706). | |
| <div align="center"> | |
| <img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/exp_teaser.jpg'> | |
| </div> | |
| <div align="center"> | |
| <img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/samples_short.jpg'> | |
| </div> | |
| ## Model Details | |
| Arc2Face consists of two components, both fine-tuned from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5): | |
| - encoder, a fine-tuned CLIP ViT-L/14 model | |
| - arc2face, a fine-tuned UNet model | |
| The encoder is tailored for projecting ID-embeddings to the CLIP latent space. | |
| Arc2Face adapts the pre-trained backbone to the task of ID-to-face generation, conditioned solely on ID vectors. | |
| ## ControlNet (pose) | |
| We also provide a ControlNet model trained on top of Arc2Face for pose control. | |
| <div align="center"> | |
| <img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/controlnet_short.jpg'> | |
| </div> | |
| ## Download Models | |
| The models can be downloaded directly from this repository or with Python: | |
| ```python | |
| from huggingface_hub import hf_hub_download | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./models") | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./models") | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./models") | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.bin", local_dir="./models") | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="controlnet/config.json", local_dir="./models") | |
| hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="controlnet/diffusion_pytorch_model.safetensors", local_dir="./models") | |
| ``` | |
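The six calls above can also be driven from a single list, which makes it easy to see which files are still missing before fetching anything. The following is a minimal sketch, not part of the official instructions: `MODEL_FILES`, `missing_files`, and `download_missing` are hypothetical names introduced here, and only `hf_hub_download` with the same arguments as above is assumed from `huggingface_hub`.

```python
from pathlib import Path

REPO_ID = "FoivosPar/Arc2Face"

# The six files fetched one by one in the snippet above.
MODEL_FILES = [
    "arc2face/config.json",
    "arc2face/diffusion_pytorch_model.safetensors",
    "encoder/config.json",
    "encoder/pytorch_model.bin",
    "controlnet/config.json",
    "controlnet/diffusion_pytorch_model.safetensors",
]


def missing_files(local_dir="./models"):
    """Return the entries of MODEL_FILES that are not yet present under local_dir."""
    root = Path(local_dir)
    return [name for name in MODEL_FILES if not (root / name).is_file()]


def download_missing(local_dir="./models"):
    """Request only the missing files; returns the list of names it asked for."""
    # Imported lazily so that missing_files() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    todo = missing_files(local_dir)
    for name in todo:
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
    return todo
```

Calling `download_missing()` once should leave `./models` with the `arc2face`, `encoder`, and `controlnet` subfolders that the loading code expects.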
| Please check our [GitHub repository](https://github.com/foivospar/Arc2Face) for complete inference instructions. | |
| ## Sample Usage with Diffusers | |
| To use the Arc2Face model with the `diffusers` library, first load the pipeline components (the `arc2face` module imported below comes from the GitHub repository linked above): | |
| ```python | |
| from diffusers import ( | |
| StableDiffusionPipeline, | |
| UNet2DConditionModel, | |
| DPMSolverMultistepScheduler, | |
| ) | |
| from arc2face import CLIPTextModelWrapper, project_face_embs | |
| import torch | |
| from insightface.app import FaceAnalysis | |
| from PIL import Image | |
| import numpy as np | |
| # Arc2Face is built upon SD1.5 | |
| # The repo below can be used instead of the now deprecated 'runwayml/stable-diffusion-v1-5' | |
| base_model = 'stable-diffusion-v1-5/stable-diffusion-v1-5' | |
| encoder = CLIPTextModelWrapper.from_pretrained( | |
| 'models', subfolder="encoder", torch_dtype=torch.float16 | |
| ) | |
| unet = UNet2DConditionModel.from_pretrained( | |
| 'models', subfolder="arc2face", torch_dtype=torch.float16 | |
| ) | |
| pipeline = StableDiffusionPipeline.from_pretrained( | |
| base_model, | |
| text_encoder=encoder, | |
| unet=unet, | |
| torch_dtype=torch.float16, | |
| safety_checker=None | |
| ) | |
| pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) | |
| pipeline = pipeline.to('cuda') | |
| ``` | |
| Then, pick an image and extract its ID-embedding: | |
| ```python | |
| app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) | |
| app.prepare(ctx_id=0, det_size=(640, 640)) | |
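| from diffusers.utils import load_image | |
| # Image.open cannot read URLs; load_image downloads the file and returns an RGB PIL image | |
| img = np.array(load_image('https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/examples/joacquin.png'))[:,:,::-1] # RGB -> BGR, as insightface expects | |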
| faces = app.get(img) | |
| faces = sorted(faces, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1] # select largest face (if more than one detected) | |
| id_emb = torch.tensor(faces['embedding'], dtype=torch.float16)[None].cuda() | |
| id_emb = id_emb/torch.norm(id_emb, dim=1, keepdim=True) # normalize embedding | |
| id_emb = project_face_embs(pipeline, id_emb) # pass through the encoder | |
| ``` | |
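The face selection and normalization steps above can be illustrated without `insightface` or a GPU. This is a simplified sketch under stated assumptions: plain dictionaries stand in for the detections returned by `app.get(img)`, the embeddings are toy two-dimensional vectors rather than real ArcFace embeddings, and the helper names (`bbox_area`, `largest_face`, `l2_normalize`) are hypothetical.

```python
import math


def bbox_area(face):
    """Area of a face's bounding box, given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = face["bbox"]
    return (x2 - x1) * (y2 - y1)


def largest_face(faces):
    """Pick the detection with the largest bounding box."""
    if not faces:
        raise ValueError("no face detected in the image")
    return max(faces, key=bbox_area)


def l2_normalize(embedding):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in embedding]


# Toy detections standing in for the output of app.get(img).
faces = [
    {"bbox": (0, 0, 10, 10), "embedding": [1.0, 0.0]},
    {"bbox": (5, 5, 45, 35), "embedding": [3.0, 4.0]},
]
face = largest_face(faces)
unit = l2_normalize(face["embedding"])
print(unit)  # [0.6, 0.8]
```

Unlike the snippet above, this sketch raises an explicit error when no face is detected, which may be a useful guard before indexing the sorted list.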
| <div align="center"> | |
| <img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/examples/joacquin.png' style='width:25%;'> | |
| </div> | |
| Finally, generate images: | |
| ```python | |
| num_images = 4 | |
| images = pipeline(prompt_embeds=id_emb, num_inference_steps=25, guidance_scale=3.0, num_images_per_prompt=num_images).images | |
| ``` | |
| <div align="center"> | |
| <img src='https://huggingface.co/foivospar/Arc2Face/resolve/main/assets/samples.jpg'> | |
| </div> | |
| ## Limitations and Bias | |
| - Only one person per image can be generated. | |
| - Poses are constrained to the frontal hemisphere, similar to FFHQ images. | |
| - The model may reflect the biases of the training data or the ID encoder. | |
| ## Citation | |
| If you find Arc2Face useful for your research, please consider citing us: | |
| **BibTeX for Arc2Face:** | |
| ```bibtex | |
| @inproceedings{paraperas2024arc2face, | |
| title={Arc2Face: A Foundation Model for ID-Consistent Human Faces}, | |
| author={Paraperas Papantoniou, Foivos and Lattas, Alexandros and Moschoglou, Stylianos and Deng, Jiankang and Kainz, Bernhard and Zafeiriou, Stefanos}, | |
| booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, | |
| year={2024} | |
| } | |
| ``` | |
| Additionally, if you use the Expression Adapter, please also cite the extension: | |
| **BibTeX for Expression Adapter:** | |
| ```bibtex | |
| @inproceedings{paraperas2025arc2face_exp, | |
| title={ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion}, | |
| author={Paraperas Papantoniou, Foivos and Zafeiriou, Stefanos}, | |
| booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, | |
| year={2025} | |
| } | |
| ``` |