llama.cpp support


Hi Zhipu AI team,

I am Xuan-Son from the llama.cpp team. We're glad to announce that the GLM-4V model family is now officially supported by llama.cpp. For more info, please refer to this PR: https://github.com/ggml-org/llama.cpp/pull/18042

We provide easy-to-use weights in GGUF format here (quantized to 4 bits): https://huggingface.co/collections/ggml-org/glm-4v

Users can run it via a simple command:

# using CLI
llama-cli -hf ggml-org/GLM-4.6V-Flash-GGUF

# or, deploy as server
llama-server -hf ggml-org/GLM-4.6V-Flash-GGUF
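
Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal sketch of a chat completion request with an image, assuming the server's default address 127.0.0.1:8080 and a hypothetical image URL (the -hf flag should fetch the multimodal projector automatically for vision models):

# example request against the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'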

Users can also run it via LM Studio using the same GGUF mentioned above.

I would like to ask if the Zhipu AI team could mention this in the "Usage" section of your model card. This would allow users to run the model locally on low-resource devices, making it accessible to everyone.

Thank you!
