llama.cpp support


Hi Zhipu AI team,

I am Xuan-Son from the llama.cpp team. We're glad to announce that the GLM-4V model family is now officially supported by llama.cpp. For more info, please refer to this PR: https://github.com/ggml-org/llama.cpp/pull/18042

We provide easy-to-use weights in GGUF format here (quantized to 4 bits): https://huggingface.co/collections/ggml-org/glm-4v

Users can run it via a simple command:

# using CLI
llama-cli -hf ggml-org/GLM-4.6V-Flash-GGUF

# or, deploy as server
llama-server -hf ggml-org/GLM-4.6V-Flash-GGUF
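
Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal sketch of a chat completion request with an image, assuming the server's default address 127.0.0.1:8080 and a hypothetical image URL (the -hf flag should fetch the multimodal projector automatically for vision models):

# example request against the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'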

Users can also run it via LM Studio using the same GGUF mentioned above.

I would like to ask if the Zhipu AI team could mention this in the "Usage" section of your model card. This would allow users to run the model locally on low-resource devices, making it accessible to everyone.

Thank you!
