Low quality output of q4_0
For anyone looking for a simple conclusion (which might become irrelevant after we sort out the issues): choose a quantized version with as many bits as possible (a larger number after the q) for now, and if possible, use the fp16 or even fp32 ones.
I have been experimenting with this model for a while. I used to think that my system (a 6GB RTX 3060 mobile + 16GB RAM) would not be able to handle the fp16 model, so I first tried the q4_0 quant provided here, but was baffled by how bad the output quality was on my side while it was fine on the official demo (on Hugging Face). The output audio sounded compressed, with muffled high-frequency content (like snares and claps, but also on other instruments), as if the audio had been processed by some stem-isolation software. No combination of sampler, scheduler, cfg value, and seed gave a good result. Then I tried the all-in-one checkpoint provided by ComfyUI themselves and found that quantization is the culprit: not only are memory usage and processing speed pretty much the same, but the quality also went right up.
It seems that the model may be quite sensitive to current quantization techniques (without any fine-tuning), so at this moment I would suggest that anyone with at least 4 GB of VRAM use the fp16 checkpoint instead, unless you need to squeeze out the last bit of headroom for longer audio, in which case you'll probably spend much more time waiting for decoding anyway. For reference, 60 seconds of audio using the fp16 checkpoint only took around 4GB of VRAM during sampling (decoding takes up a lot more and usually falls back to tiled decoding if the audio is more than 2 minutes long).
The full-set encoder is inside another repo: here or umt5base; there is no need to attach the full-set encoder inside every repo. The q4 is for quick tests only, not really for production, and the comment written at the top is for the team's review (moving it to the bottom seems better). Breaking an all-in-one model file into different components is always useful: if another model shares the same parts/components you can reuse them, or mix and match to fit your machine's limits. Of course, fp32 and/or fp16 will be the best in quality; since the model size for ace is not large and your hardware is good (someone is still using a 1060), you could opt for the safetensors checkpoint if you feel the output is better. Thanks for your detailed comment, truly; it's very good that you noticed the bottleneck at the last step.

Btw, have you ever tried (or would you be up for trying) to re-assemble t5-base (the one for stable-audio) and quantize it with unmodified llama.cpp? Someone tried many times with different combinations of config, but the quantized file doesn't seem to work; with another approach it eventually worked, and it's usable but with certain trade-offs.
> The full-set encoder is inside another repo: here or umt5base; there is no need to attach the full-set encoder inside every repo.
Might be interesting to test out whether the quantized text encoder or the ace-step model itself is the real problem. I swapped both of them at the same time and have not tested them individually.
> Btw, have you ever tried (or would you be up for trying) to re-assemble t5-base (the one for stable-audio) and quantize it with unmodified llama.cpp? Someone tried many times with different combinations of config, but the quantized file doesn't seem to work; with another approach it eventually worked.
I'm not really into quantizing models at the moment. I did try to quantize umt5-xxl's encoder before writing the sentencepiece tokenizer into the metadata was a thing, and of course it was unusable. If you'd like me to test that out, perhaps you can point me to some resources that might help me get started (at the moment I only know how to quantize models that llama.cpp supports, and perhaps do some simple hacks like removing parts of a model).
Thanks; please see t5-base and t5-base-encoder, just uploaded along with an f32 gguf for each; a llama.cpp-supported quant should be sufficient.
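Something along these lines should do it (a rough sketch only; the file names are placeholders for the f32 files in this repo, and the quantize binary name depends on your llama.cpp build):

```python
import subprocess

# Sketch only: quantize the uploaded f32 ggufs with a stock llama.cpp build.
# Recent builds name the tool "llama-quantize" (older ones just "quantize");
# file names below are placeholders.
subprocess.run(
    ["./llama-quantize", "t5-base-f32.gguf", "t5-base-q4_0.gguf", "Q4_0"],
    check=True,
)
subprocess.run(
    ["./llama-quantize", "t5-base-encoder-f32.gguf", "t5-base-encoder-q4_0.gguf", "Q4_0"],
    check=True,
)
```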
I can install PIG from this link https://huggingface.co/connector, right? But there are too many versions and I don't know which one to choose. Can you help me?
T5-Base has been quantized successfully (https://huggingface.co/Able2/t5-base-gguf), but during inference ComfyUI reports a size mismatch while handling a feed-forward layer: RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x768 and 3072x768). Upon comparing your version with mine, it seems that the layer names are quite different, since I started from the safetensors. Perhaps that tripped the gguf handling logic into misplacing some of the tensors and caused the error.
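For anyone who wants to see the difference, something like this can list and diff the tensor names (a rough sketch using the gguf Python package that ships with llama.cpp; the file names are placeholders):

```python
# Sketch only: compare tensor names between two gguf files using the
# gguf Python package (gguf-py, shipped with llama.cpp).
from gguf import GGUFReader

def tensor_names(path):
    reader = GGUFReader(path)
    return {t.name: tuple(int(d) for d in t.shape) for t in reader.tensors}

mine = tensor_names("t5-base-q4_0.gguf")       # placeholder: quant made from safetensors
reference = tensor_names("t5base-q4_0.gguf")   # placeholder: the working quant

# Names present in one file but not the other point at the mismatch
print(sorted(set(mine) - set(reference)))
print(sorted(set(reference) - set(mine)))
```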
For the encoder-only model, it seems that either my llama.cpp is out of date (I'm using a version that someone else compiled for a Colab environment) or llama.cpp somehow does not support T5Encoder out of the box, so I cannot successfully quantize the encoder-only model (I can convert it to gguf format without a problem but cannot proceed to quantize it). I will try to build llama.cpp from source when I have more time.
> I can install PIG from this link https://huggingface.co/connector, right? But there are too many versions and I don't know which one to choose. Can you help me?
What are you trying to run? If you're using ace-step, then the files here in this repo are probably enough.
Where can I download this file?
"Pig" is the vae gguf file whose name begins with "pig"; you can find it inside this repo, here.
> T5-Base has been quantized successfully (https://huggingface.co/Able2/t5-base-gguf), but during inference ComfyUI reports a size mismatch while handling a feed-forward layer: RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x768 and 3072x768). Upon comparing your version with mine, it seems that the layer names are quite different, since I started from the safetensors. Perhaps that tripped the gguf handling logic into misplacing some of the tensors and caused the error.
Just quickly tested your quants t5-base-f32-Q4_0.gguf, t5base-q4_0.gguf, and t5base-encoder-q4_0.gguf; yes, same problem, and with a "clip missing" message as well: ['encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight'...]. I guess the tensor swap logic doesn't swap those heads; it should be fixable in the node code. Currently only @chatpig's quant works; after adding the logic to match those tensors, all of them should work, I think.
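Rough idea of that matching, as a sketch only; the llama.cpp-style source names and the exact swap the node performs are assumptions here, not the actual node code:

```python
import re

# Sketch only: map llama.cpp-style T5 encoder tensor names back to the
# HF-style names that ComfyUI reported as missing. The exact name pairs
# are assumptions and would need to be checked against both gguf files.
FFN_MAP = {
    "ffn_up.weight":   "layer.1.DenseReluDense.wi.weight",
    "ffn_down.weight": "layer.1.DenseReluDense.wo.weight",
}

def remap_key(name: str) -> str:
    m = re.match(r"enc\.blk\.(\d+)\.(.+)", name)
    if m and m.group(2) in FFN_MAP:
        return f"encoder.block.{m.group(1)}.{FFN_MAP[m.group(2)]}"
    return name  # leave everything else untouched

print(remap_key("enc.blk.0.ffn_up.weight"))
# encoder.block.0.layer.1.DenseReluDense.wi.weight
```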
> For the encoder-only model, it seems that either my llama.cpp is out of date (I'm using a version that someone else compiled for a Colab environment) or llama.cpp somehow does not support T5Encoder out of the box, so I cannot successfully quantize the encoder-only model (I can convert it to gguf format without a problem but cannot proceed to quantize it). I will try to build llama.cpp from source when I have more time.
The encoder-only model here retains only the encoder tensors (~50% smaller in file size). If you use the full structure, try editing the name in the config: change "T5ForConditionalGeneration" to "T5EncoderModel", and t5 will be renamed to t5encoder. Conversely, you could follow the same logic and change "T5EncoderModel" back to "T5ForConditionalGeneration"; then the llama.cpp build you are currently using (from someone's Colab env) might work.
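For example, something like this (a minimal sketch; the path is a placeholder, and which way you flip it depends on your llama.cpp build):

```python
import json

# Sketch: flip the declared architecture in config.json before running
# llama.cpp's convert script. Use "T5EncoderModel" so the converter treats
# the model as encoder-only, or "T5ForConditionalGeneration" if your older
# llama.cpp build only knows the full t5 graph. The path is a placeholder.
cfg_path = "t5-base/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["T5EncoderModel"]  # or ["T5ForConditionalGeneration"]

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```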
