V9 OOM

#106
by Ashotkon - opened

Using the V9 model results in OOM; V5 works normally.
8 GB VRAM, 32 GB RAM.

Indeed, machines with 8 GB of VRAM struggle to run anything after V8. I wonder if there is a quantized version of the model, as this poses considerable difficulties for users with low VRAM.

I'm just doing a Q5_K_M for testing V9, and only the full NSFW variant.
As I said in my repo comments, it always takes a huge amount of time to make the whole set.
Since I'm constantly checking user reports, and the majority says v5.3 is actually the best, it's impossible for me to release every new version.
I wanted to upload v8; as I began, v8.1 came out, then I downloaded it and reports said it's not good.
Quanting v8, v8.1, and all the different v9 variants would take around 14 days.
Just the upload of ~220 GB, which is what one model in all quants comes to, needs at least 8 hours IF all goes well.
While quanting and uploading I can't really use my PC. That's the main reason; normally I do it overnight.
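To put that upload figure in perspective (my own back-of-the-envelope math, not from the post above): 220 GB in 8 hours is about 220,000 MB / 28,800 s ≈ 7.6 MB/s, i.e. a ~60 Mbit/s uplink running flat out, so any hiccup pushes it well past 8 hours.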

Is there an easy step-by-step guide for quanting, so maybe others (like me) can help?

I hope I got it packed, patched, and uploaded; no guarantee at all:
https://github.com/phil2sat/convert
tool_auto.py downloads, checks out, and patches llama.cpp for you. If you get compile errors (WATCH FOR THEM ON THE FIRST RUN), go into the llama.cpp.auto folder and run:

mkdir build                                                      # optional; cmake -B creates it as well
cmake -B build
cmake --build build --config Debug -j10 --target llama-quantize  # build only the quantize tool
cd ..
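Side note (my suggestion, not part of the original instructions): --config Debug produces an unoptimized binary, so if the quantize step feels slow, a Release build of the same target may be noticeably faster:

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j10 --target llama-quantize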

After this, tool_auto does its work:

python tool_auto.py --src /data/models/Qwen-rapid/Qwen-Rapid-NSFW-v9.0.safetensors --output /data/models/Qwen-rapid/out/v90 --temp-dir /daten/models/Qwen-rapid/tmp --quants all
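If you already have a GGUF and only want a single quant without the wrapper, the llama-quantize binary built above can be called directly (the file names here are illustrative, not from the repo):

./llama.cpp.auto/build/bin/llama-quantize /data/models/Qwen-rapid/tmp/Qwen-Rapid-NSFW-v9.0-F16.gguf /data/models/Qwen-rapid/out/v90/Qwen-Rapid-NSFW-v9.0-Q5_K_M.gguf Q5_K_M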

Hey there! I've got a bit of an upfront request: could you tell me which part of the qwen-image conversion script I need to modify if the safetensors I want to quantize aren't your "aio" versions?

Just the name and the folder. It works with either a checkpoint or a diffusion model.
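In other words, something like this should work (an illustrative invocation; the paths are placeholders, and I'm assuming the flags behave as in the example above):

python tool_auto.py --src /data/models/SomeOtherModel/model.safetensors --output /data/models/SomeOtherModel/out --temp-dir /data/models/SomeOtherModel/tmp --quants all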
