Could you make a set of quants for the new 12B version?

#1
by HDiffusion - opened

I believe this is the original 13.6B version. There's a newer version with even more layers removed that they claim performs just as well. I was hoping that you could produce gguf versions of that model too.

Thank you for your work thus far.

What do you run the gguf or non-gguf with? What's a low end system that can run it?

I'm using SwarmUI, which I believe uses these nodes for its gguf support: https://github.com/city96/ComfyUI-GGUF

I'm currently running the Q4_K_S on my 2060 6GB using less than 32GB of system memory. It's very slow (~15 seconds per step), but it's usable at 4 steps with the distill LoRA that's pre-merged into these weights. The prompt following is better than anything else I've tried, so the speed tradeoff is worth it to me.
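For anyone wondering whether a quant will fit their hardware, here's a rough back-of-the-envelope estimate. The 4.5 bits-per-weight figure is an assumption (Q4_K_S mixes quant types, so actual GGUF files vary somewhat with tensor mix and metadata):

```python
# Rough size estimate for a Q4_K_S quant of a ~13.6B-parameter model.
# Assumption: Q4_K_S averages roughly 4.5 bits per weight (approximate).
params = 13.6e9
bits_per_weight = 4.5
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~7.7 GB for the weights alone
```

That's why it won't fit entirely in 6GB of VRAM and ends up partially offloaded to system RAM, which explains the slow per-step times.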

Hey, yeah, I'll try to get them uploaded tomorrow (;

Okay, sorry for the delay, I'm uploading them right now, a few are already online (;

I'm getting an error using the 12B (Q8) in ComfyUI:
KSampler
linear(): input and weight.T shapes cannot be multiplied (1x3072 and 18432x3072)
The 13B works, but I cannot add LoRAs; they don't seem to load.
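For context on what that error means: linear() computes x @ W.T and expects the weight stored as (out_features, in_features), so the message suggests the loaded tensor is transposed or mapped from the wrong layer. A minimal sketch of the shape rule, using a hypothetical helper and the dimensions taken from the error message:

```python
# linear() computes y = x @ W.T and expects W stored as (out_features, in_features).
# Hypothetical helper just to illustrate the matmul shape rule.
def matmul_shape(a, b):
    """Return the result shape of a @ b, or raise if the inner dims differ."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {a[0]}x{a[1]} and {b[0]}x{b[1]}")
    return (a[0], b[1])

x = (1, 3072)                 # input
w = (18432, 3072)             # expected layout: (out_features, in_features)
w_T = (w[1], w[0])            # linear() multiplies by W.T -> (3072, 18432)
print(matmul_shape(x, w_T))   # (1, 18432): correct layout works

# If the wrong tensor (or a transposed one) gets mapped in, W.T comes out as
# (18432, 3072) and you get exactly the error from the report:
bad_T = (18432, 3072)
try:
    matmul_shape(x, bad_T)
except ValueError as e:
    print(e)                  # cannot multiply 1x3072 and 18432x3072
```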

Hmm, I'll check it out, maybe they changed something with the tensor names (;

Does this work as a diffusion model for you?

I'll test that out, because if not, the issue is with their model /:

Okay yeah, it seems like the tensor structure is different from what ComfyUI expects. I don't have time to fix that right now, though, since that involves checking which tensor maps onto which. If you want and have time, you can do that yourself in the browser on Hugging Face: just check the tensors ComfyUI says are missing, then compare the 13B with the 12B in the browser and look at which key would map to which (;
If you won't be able to do that, I'll do it when I have time (;
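If anyone would rather do the comparison with a script than in the browser, here's a minimal sketch. The gguf package usage and file paths in the comment are assumptions, and the toy tensor names below are made up for illustration, not the model's real keys:

```python
# Sketch: compare tensor names between two GGUF files to find which keys
# went missing in the 12B conversion. The gguf package (pip install gguf)
# provides a GGUFReader; the file paths below are placeholders:
#
#   from gguf import GGUFReader
#   names_13b = {t.name for t in GGUFReader("model-13b.gguf").tensors}
#   names_12b = {t.name for t in GGUFReader("model-12b.gguf").tensors}

def diff_tensor_names(names_a, names_b):
    """Return (only_in_a, only_in_b), sorted, for two sets of tensor names."""
    a, b = set(names_a), set(names_b)
    return sorted(a - b), sorted(b - a)

# Toy example with made-up key names:
ref = {"blk.18.ffn_up.weight", "blk.18.ffn_down.weight", "blk.18.attn_q.weight"}
ours = {"blk.18.ffn_up.weight", "blk.18.attn_q.weight"}
missing, extra = diff_tensor_names(ref, ours)
print("missing:", missing)  # ['blk.18.ffn_down.weight']
print("extra:  ", extra)    # []
```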

image

Seems like in the 12B you randomly removed some tensors from some layers; the 13B version just removed whole layers cleanly.

That should be the case with their original model too, no?

Nope, the original one is fine.

Hmm, I'll go and investigate further later on (;

@wsbagnsv1 take a look at transformer block 18 of yours vs the original

original:
image

Yours:
image

There are many other such cases.

Yeah, at the moment I don't have enough time to debug, so thanks for doing part of that (;
