Diffusers documentation

ZImageTransformer2DModel


A Transformer model for image-like data from Z-Image.

ZImageTransformer2DModel

class diffusers.ZImageTransformer2DModel

( all_patch_size = (2,), all_f_patch_size = (1,), in_channels = 16, dim = 3840, n_layers = 30, n_refiner_layers = 2, n_heads = 30, n_kv_heads = 30, norm_eps = 1e-05, qk_norm = True, cap_feat_dim = 2560, siglip_feat_dim = None, rope_theta = 256.0, t_scale = 1000.0, axes_dims = [32, 48, 48], axes_lens = [1024, 512, 512] )
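
A quick consistency check on these defaults: with multi-axis rotary embeddings, the per-axis dimensions in `axes_dims` would typically sum to the per-head attention dimension, and the default configuration satisfies this. A minimal sketch using only the numbers from the signature above (the mapping of the three axes to sequence/height/width is an assumption):

```python
# Default config values from the signature above.
dim, n_heads = 3840, 30
axes_dims = [32, 48, 48]  # per-axis RoPE dims; axis meaning assumed, not documented here

# Per-head dimension implied by the defaults.
head_dim = dim // n_heads
print(head_dim, sum(axes_dims))  # 128 128
```

Note that `n_kv_heads` defaults to the same value as `n_heads`, i.e. plain multi-head attention rather than grouped-query attention.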

forward

( x: typing.Union[typing.List[torch.Tensor], typing.List[typing.List[torch.Tensor]]], t, cap_feats: typing.Union[typing.List[torch.Tensor], typing.List[typing.List[torch.Tensor]]], return_dict: bool = True, controlnet_block_samples: typing.Optional[typing.Dict[int, torch.Tensor]] = None, siglip_feats: typing.Optional[typing.List[typing.List[torch.Tensor]]] = None, image_noise_mask: typing.Optional[typing.List[typing.List[int]]] = None, patch_size: int = 2, f_patch_size: int = 1 )

Flow: patchify -> t_embed -> x_embed -> x_refine -> cap_embed -> cap_refine -> [siglip_embed -> siglip_refine] -> build_unified -> main_layers -> final_layer -> unpatchify
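
The patchify/unpatchify pair at the two ends of this flow can be sketched in plain PyTorch. This is an illustrative round trip only: the helper names and the exact token layout are assumptions, and the real model additionally handles `f_patch_size`, caption/SigLIP tokens, and variable-length batches.

```python
import torch

def patchify(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split a (C, H, W) latent into a sequence of flattened patches."""
    c, h, w = x.shape
    p = patch_size
    assert h % p == 0 and w % p == 0
    # (C, H, W) -> (C, H/p, p, W/p, p) -> (H/p * W/p, p * p * C)
    x = x.reshape(c, h // p, p, w // p, p)
    return x.permute(1, 3, 2, 4, 0).reshape((h // p) * (w // p), p * p * c)

def unpatchify(tokens: torch.Tensor, c: int, h: int, w: int, patch_size: int) -> torch.Tensor:
    """Invert patchify: fold a token sequence back into a (C, H, W) latent."""
    p = patch_size
    x = tokens.reshape(h // p, w // p, p, p, c)
    return x.permute(4, 0, 2, 1, 3).reshape(c, h, w)

latent = torch.randn(16, 64, 64)          # in_channels = 16
tokens = patchify(latent, patch_size=2)   # (1024, 64): 32*32 tokens of 2*2*16 values
restored = unpatchify(tokens, 16, 64, 64, patch_size=2)
assert torch.equal(restored, latent)      # exact round trip
```

With the default `patch_size = 2`, every 2x2 spatial block of the latent becomes one transformer token, so a 64x64 latent yields a sequence of 1024 tokens.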

patchify_and_embed

( all_image: typing.List[torch.Tensor], all_cap_feats: typing.List[torch.Tensor], patch_size: int, f_patch_size: int )

Patchify for basic mode: single image per batch item.

patchify_and_embed_omni

( all_x: typing.List[typing.List[torch.Tensor]], all_cap_feats: typing.List[typing.List[torch.Tensor]], all_siglip_feats: typing.List[typing.List[torch.Tensor]], patch_size: int, f_patch_size: int, images_noise_mask: typing.List[typing.List[int]] )

Patchify for omni mode: multiple images per batch item with noise masks.
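
The difference between the two entry points is purely in how the inputs are nested. A hedged sketch of the expected argument structure (all tensors are random placeholders; `cap_feat_dim = 2560` comes from the defaults above, while the caption length of 77 is arbitrary):

```python
import torch

# Basic mode: one image per batch item -> flat lists, one tensor each.
basic_x = [torch.randn(16, 64, 64), torch.randn(16, 32, 32)]
basic_caps = [torch.randn(77, 2560), torch.randn(77, 2560)]  # cap_feat_dim = 2560

# Omni mode: several images per batch item -> nested lists, plus a noise
# mask marking which images are noised latents (1) vs. clean conditions (0).
omni_x = [
    [torch.randn(16, 64, 64), torch.randn(16, 64, 64)],  # item 0: two images
    [torch.randn(16, 32, 32)],                           # item 1: one image
]
image_noise_mask = [[1, 0], [1]]  # item 0: first image noised, second is a condition
```

Each inner list must stay aligned: for a given batch item, `all_x`, `all_cap_feats`, and `images_noise_mask` describe the same sequence of images.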
