It works!

#1
by uzvisa - opened

Hi there!

Thanks a bunch for your help and attention!

I just checked and it seems to be working perfectly!

By the way, is it hard to take away the vision feature from the model?
For example, if I don’t use the vision feature, can the model be smaller?
I mostly write some texts, short descriptions, and stories.
My laptop isn’t up for heavy models.
It’s a MacBook Pro M1 with 16GB of RAM, and when I run LLMs I end up having to unload everything else from RAM.

Here’s my list of models:
[screenshot of installed models]

No, not at this time. Because MXFP4 quantization isn't natively supported in MLX-VLM (yet), I have to bolt that feature on, and the embeddings don't quantize down as much as they would in a regular 4-bit vision model. I can't remove the vision capability without damaging the model overall.

You could look at alternative models and see if they fit your needs. There are many good text-only models. Olmo 3 has a good 7B model that even at 8-bit is only 7.76GB (https://huggingface.co/mlx-community/Olmo-3-7B-Instruct-8bit). I use it for a lot of things. I could see what an MXFP4 quant of that would come out to; my guess would be between 4 and 4.5GB. They also have a 7B Reasoning/Thinking model. Their 32B Think is great, but I've only tried the 7B Instruct (which is good), not the 7B Think.
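In case it's useful, here's a minimal sketch of trying that Olmo model with plain mlx-lm (text-only, so no vision stack to load). This assumes mlx-lm is installed (`pip install mlx-lm`); the repo name is the one linked above, and the prompt is just a placeholder for the kind of short writing you mentioned.

```python
from mlx_lm import load, generate

# 8-bit text-only Olmo 3 7B Instruct; repo name taken from the link above.
model, tokenizer = load("mlx-community/Olmo-3-7B-Instruct-8bit")

prompt = "Write a short product description for a handmade ceramic mug."

# Wrap the prompt in the model's chat template if it has one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 256 new tokens; verbose=True streams the output as it's produced.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

The 8-bit weights are around 7.76GB on disk, so on a 16GB M1 you'd still want to close other heavy apps before running it.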

ExaltedSlayer changed discussion status to closed
