fp8_e5m2?
Hey, could you please also make an fp8_e5m2 quant of your models when you release them?
If you make this version by quantizing straight from the source weights instead of from bf16, it can be up to 50% faster for a lot of users (I am one of them).
Thanks
Have you tried q8_0? Reports say it's 60% faster.
Let me know when the v9 Q8 is uploaded.
Click on the v9 safetensors here on HF. There is a workflow; paste it into ChatGPT and say you want a JSON.
Load it in ComfyUI, swap between the base model and the fp8 one, then run and test.
You need around 100 GB of RAM, or a night of waiting if your swap is big enough.
That's why I didn't try it: the model alone is 40 GB in bf16 and 80 GB in fp32.
And as you can see, the base is only bf16, so fp32 isn't available at all, which makes the whole process pointless.
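
For anyone who still wants to try the conversion, here is a minimal sketch of casting a bf16 safetensors checkpoint to fp8_e5m2 with plain torch + safetensors (this is not the exact ComfyUI workflow described above). It assumes torch >= 2.1 for the float8 dtypes and a recent safetensors build that can serialize them; the file names are placeholders, and it still needs enough RAM to hold the full bf16 state dict.

```python
# Minimal sketch: cast a bf16 safetensors checkpoint to fp8_e5m2.
# Assumes torch >= 2.1 (float8 dtypes) and a recent safetensors version
# that can serialize F8_E5M2 tensors. Paths are placeholders.
import torch
from safetensors.torch import load_file, save_file

SRC = "qwen_image_edit_bf16.safetensors"       # placeholder input
DST = "qwen_image_edit_fp8_e5m2.safetensors"   # placeholder output

state = load_file(SRC)  # loads everything into CPU RAM
out = {}
for name, t in state.items():
    # Only cast the big floating-point weight matrices; keep 1-D tensors
    # (norms, biases) and non-float buffers at their original precision.
    if t.is_floating_point() and t.ndim >= 2:
        out[name] = t.to(torch.float8_e5m2)
    else:
        out[name] = t

save_file(out, DST)
print(f"wrote {DST} ({len(out)} tensors)")
```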
The only image model I could find is https://huggingface.co/patientxtr/qwen-image-edit-fp8-e5m2/tree/main
Maybe test it and compare speeds to see whether that claim is true or nonsense.
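
If someone does test it, a simple way to compare is to time a fixed number of generations with each checkpoint. Below is a minimal sketch of such a timing harness; `generate_with()` is a placeholder for whatever actually runs the generation (a ComfyUI API call, a diffusers pipeline, etc.), not a real function.

```python
# Minimal sketch: time N generations per checkpoint and compare averages.
import time
import torch

def benchmark(run_once, warmup: int = 2, iters: int = 5) -> float:
    """Return the average seconds per run after a short warmup."""
    for _ in range(warmup):
        run_once()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Usage (placeholder callables):
# t_e4m3 = benchmark(lambda: generate_with("model_fp8_e4m3.safetensors"))
# t_e5m2 = benchmark(lambda: generate_with("model_fp8_e5m2.safetensors"))
# print(f"e5m2 speedup: {t_e4m3 / t_e5m2:.2f}x")
```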
Thanks for finding that for me.
The speed increase is only roughly 5-9% on that model.
So either they did not quantize it optimally, or the vision model is the limiting factor here.
I'm now trying to find an e5m2 version of the text encoder (TE), but one doesn't seem to exist.
A good example of the performance gap between e4m3 and e5m2 is the regular Qwen Image fp8 vs the JIB Mix fp8 e5m2: I actually get nearly double the speed with that JIB model.
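
One way to check the "did they quantize it optimally" question is to look at which dtypes that checkpoint actually contains, without loading the weights: the safetensors format starts with an 8-byte little-endian length followed by a JSON header that lists every tensor's dtype. A minimal sketch (the path is a placeholder):

```python
# Minimal sketch: count tensor dtypes in a safetensors file by reading
# only its JSON header (no weights are loaded). Path is a placeholder.
import json
import struct
from collections import Counter

PATH = "qwen-image-edit-fp8-e5m2.safetensors"  # placeholder

with open(PATH, "rb") as f:
    header_len = struct.unpack("<Q", f.read(8))[0]  # u64 little-endian header size
    header = json.loads(f.read(header_len))

counts = Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")
for dtype, n in counts.most_common():
    print(f"{dtype}: {n} tensors")
```

If a large share of tensors still report BF16 or F32, the speed gain from the fp8 cast will naturally be limited.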

