V9 OOM

#106
by Ashotkon - opened

Using the V9 model results in OOM; V5 works normally.
8 GB VRAM, 32 GB RAM.

Indeed, machines with 8 GB of VRAM struggle to run anything after V8. I wonder if there is a quantized version of the model, as this poses considerable difficulties for users with low VRAM.

I'm just doing a Q5_K_M for testing V9, and only the full NSFW variant.
As I said in my repo comments, it always takes a huge amount of time to make the whole set.
Since I'm constantly checking user reports, and the majority says v5.3 is actually the best, it's impossible for me to release every new version.
I wanted to upload v8; as I began, v8.1 came out, then I downloaded it and reports said it's not good.
Quanting v8, v8.1, and all the different v9 variants would take around 14 days.
Just the upload of ~220 GB, which is what one model in all quants comes to, needs at least 8 hours IF all goes well.
While quanting and uploading I can't really use my PC. That's the main reason; normally I do it overnight.
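To put that upload figure in perspective (my own back-of-the-envelope math, not from the post above): 220 GB in 8 hours is about 220,000 MB / 28,800 s ≈ 7.6 MB/s, i.e. a ~60 Mbit/s uplink running flat out, so any hiccup pushes it well past 8 hours.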

Is there an easy step-by-step guide for quanting, so maybe others (like me) can help?

I hope I got it packed, patched, and uploaded; no guarantee at all:
https://github.com/phil2sat/convert
tool_auto.py downloads, checks out, and patches llama.cpp for you. If you get compile errors (WATCH FOR THEM ON THE FIRST RUN), go into the llama.cpp.auto folder and run:

mkdir build                                                      # optional; cmake -B creates it as well
cmake -B build
cmake --build build --config Debug -j10 --target llama-quantize  # build only the quantize tool
cd ..
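Side note (my suggestion, not part of the original instructions): --config Debug produces an unoptimized binary, so if the quantize step feels slow, a Release build of the same target may be noticeably faster:

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j10 --target llama-quantize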

After this, tool_auto does its work:

python tool_auto.py --src /data/models/Qwen-rapid/Qwen-Rapid-NSFW-v9.0.safetensors --output /data/models/Qwen-rapid/out/v90 --temp-dir /daten/models/Qwen-rapid/tmp --quants all
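If you already have a GGUF and only want a single quant without the wrapper, the llama-quantize binary built above can be called directly (the file names here are illustrative, not from the repo):

./llama.cpp.auto/build/bin/llama-quantize /data/models/Qwen-rapid/tmp/Qwen-Rapid-NSFW-v9.0-F16.gguf /data/models/Qwen-rapid/out/v90/Qwen-Rapid-NSFW-v9.0-Q5_K_M.gguf Q5_K_M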

Hey there! I've got a bit of an upfront request: could you tell me which part of the qwen-image conversion script I need to modify if the safetensors I want to quantize aren't your "aio" versions?

Just the name and the folder. It works with either a checkpoint or a diffusion model.
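In other words, something like this should work (an illustrative invocation; the paths are placeholders, and I'm assuming the flags behave as in the example above):

python tool_auto.py --src /data/models/SomeOtherModel/model.safetensors --output /data/models/SomeOtherModel/out --temp-dir /data/models/SomeOtherModel/tmp --quants all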
