llama.cpp support
#21 · opened by ngxson
Hi Zhipu AI team,
I am Xuan-Son from the llama.cpp team. We're glad to announce that the GLM-4V model family is now officially supported by llama.cpp. For more info, please refer to this PR: https://github.com/ggml-org/llama.cpp/pull/18042
We provide easy-to-use weights in GGUF format (quantized to 4 bits) here: https://huggingface.co/collections/ggml-org/glm-4v
Users can run it via a simple command:
```sh
# using CLI
llama-cli -hf ggml-org/GLM-4.6V-Flash-GGUF

# or, deploy as a server
llama-server -hf ggml-org/GLM-4.6V-Flash-GGUF
```
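When deployed with llama-server, the model can be queried over llama.cpp's OpenAI-compatible HTTP API. Here is a minimal sketch, assuming the server was started as shown above and is listening on the default port 8080 (adjust host and port to your setup):

```sh
# Query the OpenAI-compatible chat completions endpoint of a running llama-server.
# Assumes default port 8080; the prompt below is just an illustration.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Describe this model in one sentence."}
    ]
  }'
```

Since GLM-4V is a vision model, image input can also be supplied through the OpenAI-style `image_url` content format, provided your llama.cpp build includes multimodal support.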
Users can also run it via LM Studio using the same GGUF weights mentioned above.
Would the Zhipu AI team be willing to mention this in the "Usage" section of your model card? This would allow users to run the model locally on low-resource devices, making it accessible to everyone.
Thank you!