FP8?

#3
by notafraud - opened

Hello! Can you please upload an fp8 version of the model? GGUF runs slower than fp8, so it makes sense.

  1. GGUF can be slower because the weights have to be dequantized on the fly, but actual performance also depends on hardware (see the sketch after this list).
  2. He has other quants; no idea why they haven't been added to the repo.
    P.S. Sadly, I wouldn't be able to convert it to fp8 myself.
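
To illustrate point 1: the dequantization overhead comes from block formats like Q8_0, which store weights as blocks of 32 int8 values plus one fp16 scale per block, and these must be expanded back to floats at inference time. A minimal sketch of the idea (illustrative only, not llama.cpp's actual kernel):

```python
# Illustrative Q8_0 dequantization: GGUF Q8_0 packs weights into blocks of
# 32 int8 values with one fp16 scale per block; expanding them back to
# float is the "decompression" cost mentioned in point 1.
import numpy as np

BLOCK_SIZE = 32

def dequantize_q8_0(scales: np.ndarray, quants: np.ndarray) -> np.ndarray:
    """scales: (n_blocks,) float16; quants: (n_blocks, 32) int8."""
    return (scales.astype(np.float32)[:, None]
            * quants.astype(np.float32)).reshape(-1)
```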

There is a full model available, so it shouldn't be a problem to make one.

I mean, if it's fp32... is it?

Yes, I could probably download the diffusers version and cast it to fp8, but that model is twice the size of the fp16 one. An fp8 version would be great.
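
For reference, here's roughly what such a cast could look like. This is a minimal sketch, assuming a single safetensors checkpoint and PyTorch >= 2.1 (which added `torch.float8_e4m3fn`); the filenames are hypothetical:

```python
# Minimal sketch: naive (unscaled) cast of a checkpoint to fp8.
# Filenames are hypothetical; requires PyTorch >= 2.1 and a recent
# safetensors release that supports float8 dtypes.
import torch
from safetensors.torch import load_file, save_file

state_dict = load_file("pony-v7-base.safetensors")

fp8_state_dict = {}
for name, tensor in state_dict.items():
    if tensor.dtype in (torch.float32, torch.float16, torch.bfloat16):
        # Direct cast, no scaling: values beyond the fp8 range saturate,
        # which can hurt quality -- hence the interest in *scaled* fp8.
        fp8_state_dict[name] = tensor.to(torch.float8_e4m3fn)
    else:
        fp8_state_dict[name] = tensor  # keep non-float tensors as-is

save_file(fp8_state_dict, "pony-v7-base-fp8.safetensors")
```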

https://x.com/AstraliteHeart/status/1982558682362872264 : "GGUFs ... of various sizes" + https://hf.co/purplesmartai/pony-v7-base/blob/main/gguf/comparison.png
Where are they?
@Mescalamba
The model is fp16.

Why not bf16? That would be better.

A scaled FP8 version, if possible, would be best, I think.

PurpleSmartAI, INC org

Scaled FP8 soon

https://huggingface.co/silveroxides/pony-v7-base-fp8_scaled-and-GGUF
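
For context, "scaled" fp8 means each tensor is divided by a per-tensor scale so its values fit the fp8 range, and the scale is stored next to the quantized weights; this avoids the saturation problem of a plain cast. A rough sketch of the idea (not necessarily the exact recipe that repo uses):

```python
# Illustrative per-tensor scaled-fp8 quantization; the actual layout and
# key names in silveroxides' repo may differ.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3fn

def quantize_scaled_fp8(w: torch.Tensor):
    # Per-tensor scale; clamp avoids division by zero on all-zero tensors.
    scale = w.abs().max().float().clamp(min=1e-12) / FP8_MAX
    q = (w.float() / scale).to(torch.float8_e4m3fn)
    return q, scale  # both are stored in the checkpoint

def dequantize_scaled_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```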

With identical settings, it just generates a black image.

See my comment in that repo; hybrid fp8 requires https://github.com/silveroxides/ComfyUI_Hybrid-Scaled_fp8-Loader

With this loader, it works. In total, it gives me around a 15% increase in generation speed. Not bad, but I expected more.

This comment has been hidden (marked as Resolved)

It would still be nice to have a normal FP8 scaled version that doesn't need that node, TBH.

@qpqpqpqpqpqp , I don't know why your comment is hidden, but here's an answer anyway. I meant that fp8 is faster than fp16; in my opinion, GGUF models are just garbage. Every GGUF model I've tried has been slower than fp16, often several times slower, and usually worse quality than fp8. I think they might only be useful if you have a very powerful video card with fairly low VRAM.
