---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen-Image
pipeline_tag: text-to-image
library_name: diffusers
widget:
- text: >-
    cute anime girl with massive fennec ears and a big fluffy fox tail with long
    wavy blonde hair between eyes and large blue eyes blonde colored eyelashes
    chubby wearing oversized clothes summer uniform long blue maxi skirt muddy
    clothes happy sitting on the side of the road in a run down dark gritty
    cyberpunk city with neon and a crumbling skyscraper in the rain at night
    while dipping her feet in a river of water she is holding a sign that says
    "ComfyUI is the best" written in cursive
  output:
    url: workflow-demo1.png
- text: >-
    cute anime girl with massive fennec ears and a big fluffy fox tail with long
    wavy blonde hair between eyes and large blue eyes blonde colored eyelashes
    chubby wearing oversized clothes summer uniform long blue maxi skirt muddy
    clothes happy sitting on the side of the road in a run down dark gritty
    cyberpunk city with neon and a crumbling skyscraper in the rain at night
    while dipping her feet in a river of water she is holding a sign that says
    "PIG is the best" written in cursive
  output:
    url: workflow-demo2.png
- text: >-
    cute anime girl with massive fennec ears and a big fluffy fox tail with long
    wavy blonde hair between eyes and large blue eyes blonde colored eyelashes
    chubby wearing oversized clothes summer uniform long blue maxi skirt muddy
    clothes happy sitting on the side of the road in a run down dark gritty
    cyberpunk city with neon and a crumbling skyscraper in the rain at night
    while dipping her feet in a river of water she is holding a sign that says
    "1+1=2 is it correct?" written in cursive
  output:
    url: workflow-demo3.png
tags:
- gguf-node
- gguf-connector
---
# **gguf quantized version of qwen-image**
- run it straight with `gguf-connector`
```
ggc q5
```
>
>GGUF file(s) available. Select which one to use:
>
>1. qwen-image-iq2_s.gguf
>2. qwen-image-iq4_nl.gguf
>3. qwen-image-q4_0.gguf
>4. qwen-image-q8_0.gguf
>
>Enter your choice (1 to 4): _
>
## **run it with gguf-node via comfyui**
- drag **qwen-image** to > `./ComfyUI/models/diffusion_models`
- drag **qwen2.5-vl-7b** [[4.43GB](https://huggingface.co/chatpig/qwen2.5-vl-7b-it-gguf/blob/main/qwen2.5-vl-7b-it-q4_0.gguf)] to > `./ComfyUI/models/text_encoders`
- drag **pig** [[254MB](https://huggingface.co/calcuis/pig-vae/blob/main/pig_qwen_image_vae_fp32-f16.gguf)] to > `./ComfyUI/models/vae`
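
The placement above can be sanity-checked before launching ComfyUI. The following is a minimal sketch, not part of gguf-node or ComfyUI: `EXPECTED` and `missing_models` are hypothetical names, the text encoder and VAE file names are taken from the links above, and the qwen-image file name assumes the q4_0 quant from the selection menu (substitute whichever quant you downloaded).

```python
from pathlib import Path

# Folder -> file name, following the three list items above.
# Assumption: the q4_0 quant of qwen-image; change it to match your download.
EXPECTED = {
    "diffusion_models": "qwen-image-q4_0.gguf",
    "text_encoders": "qwen2.5-vl-7b-it-q4_0.gguf",
    "vae": "pig_qwen_image_vae_fp32-f16.gguf",
}


def missing_models(comfy_root, expected=EXPECTED):
    """Return the expected model paths that are not files under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    return [
        models_dir / folder / name
        for folder, name in expected.items()
        if not (models_dir / folder / name).is_file()
    ]


if __name__ == "__main__":
    # Report anything that still needs to be dragged into place.
    for path in missing_models("./ComfyUI"):
        print(f"missing: {path}")
```

An empty result means all three files are where the workflow expects them.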

<Gallery />

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/qwen-image.png)

tip: the text encoder used for this model is qwen2.5-vl-**7b**; more encoder quants are available either [here](https://huggingface.co/calcuis/pig-encoder/tree/main) (pig quant) or [here](https://huggingface.co/chatpig/qwen2.5-vl-7b-it-gguf/tree/main) (llama.cpp quant); note that it is a different size from the encoder (qwen2.5-vl-**3b**) used in **omnigen2**

## **run it with diffusers**
```py
import torch
from diffusers import DiffusionPipeline, GGUFQuantizationConfig, QwenImageTransformer2DModel

model_path = "https://huggingface.co/calcuis/qwen-image-gguf/blob/main/qwen-image-q2_k.gguf"
transformer = QwenImageTransformer2DModel.from_single_file(
    model_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config="callgg/qi-decoder",
    subfolder="transformer"
)
pipe = DiffusionPipeline.from_pretrained(
    "callgg/qi-decoder",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt = "a pig holding a sign that says hello world"
# leading ", " keeps the suffix from fusing with the last word of the prompt
positive_magic = {"en": ", Ultra HD, 4K, cinematic composition."}
negative_prompt = " "
image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=24,
    true_cfg_scale=2.5,
    # unseeded generator: results vary between runs; call .manual_seed(...) to reproduce
    generator=torch.Generator()
).images[0]
image.save("output.png")
```

### **reference**
- base model from [qwen](https://huggingface.co/Qwen)
- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
- gguf-node ([pypi](https://pypi.org/project/gguf-node)|[repo](https://github.com/calcuis/gguf)|[pack](https://github.com/calcuis/gguf/releases))
- gguf-connector ([pypi](https://pypi.org/project/gguf-connector))