Qwen2.5-3B-Instruct-Q4_K_M-GGUF
A GGUF-quantized version of Qwen2.5-3B-Instruct, intended for mobile and edge deployment.
Model Details
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Quantization: Q4_K_M (4-bit quantization with K-quants)
- File Size: ~1.8 GB
- Format: GGUF
Usage
With llama.cpp
./llama-cli -m Qwen2.5-3B-Instruct-Q4_K_M.gguf -p "Hello, how are you?"
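When the command above is launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the original card: the helper name `build_llama_cli_command` and the default values for `-n`, `-c`, and `--temp` are assumptions chosen for illustration, and the binary is assumed to live at `./llama-cli` as in the command above.

```python
import shlex


def build_llama_cli_command(model_path, prompt, n_predict=128, ctx_size=2048, temperature=0.7):
    """Return the llama-cli argument list for a single prompt.

    The defaults are illustrative assumptions, not tuned recommendations.
    """
    return [
        "./llama-cli",
        "-m", model_path,            # path to the GGUF file
        "-p", prompt,                # prompt text
        "-n", str(n_predict),        # maximum number of tokens to generate
        "-c", str(ctx_size),         # context window size in tokens
        "--temp", str(temperature),  # sampling temperature
    ]


if __name__ == "__main__":
    cmd = build_llama_cli_command(
        "Qwen2.5-3B-Instruct-Q4_K_M.gguf", "Hello, how are you?"
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting problems when the prompt contains spaces or quotes.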
With llama.swiftui (iOS)
This model is intended to run on iOS devices with the llama.swiftui example app from llama.cpp.
- Download the model file (Qwen2.5-3B-Instruct-Q4_K_M.gguf)
- Copy it to the app's Documents folder
- Load the model in the app and start chatting
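Before copying a multi-gigabyte file to a device, it can be worth checking that the download really is a GGUF file. A GGUF file starts with the four bytes `GGUF` followed by a 32-bit version number. The sketch below reads that header; the function names are hypothetical, and it assumes the usual little-endian layout.

```python
import struct

GGUF_MAGIC = b"GGUF"


def parse_gguf_version(header):
    """Return the GGUF version from the first 8 header bytes, or None if invalid."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Assumption: little-endian file, which is the common case.
    (version,) = struct.unpack("<I", header[4:8])
    return version


def read_gguf_version(path):
    """Open a file and return its GGUF version, or None if it is not GGUF."""
    with open(path, "rb") as f:
        return parse_gguf_version(f.read(8))
```

A `None` result usually means the download was truncated or an HTML error page was saved in place of the model.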
Chat Template
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
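For runtimes that expect a raw prompt string, the template above can be filled in with a small helper. This is a minimal single-turn sketch; the function name `build_chatml_prompt` is an assumption, and the default system prompt is the one shown in the template.

```python
def build_chatml_prompt(user_message, system_prompt="You are a helpful assistant."):
    """Format one user turn with the chat template shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        # The prompt ends with an open assistant turn for the model to complete.
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("Hello, how are you?"))
```

Generation should stop when the model emits `<|im_end|>`, which closes the assistant turn.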
Performance
| Device | Tokens/sec (approximate) |
|---|---|
| iPhone 15 Pro | 15-25 |
| iPhone 14 | 10-15 |
| M1 Mac | 30-50 |
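To get a feel for what these throughput figures mean in practice, they can be converted into a rough generation time for a reply of a given length. The sketch below uses the ranges from the table; the helper name and the 300-token reply length are assumptions for illustration, and the estimate ignores prompt processing and model load time.

```python
# Approximate throughput ranges (tokens/sec) taken from the table above.
THROUGHPUT = {
    "iPhone 15 Pro": (15, 25),
    "iPhone 14": (10, 15),
    "M1 Mac": (30, 50),
}


def generation_time_range(num_tokens, slow_tps, fast_tps):
    """Return (best_case_seconds, worst_case_seconds) for generating num_tokens."""
    if slow_tps <= 0 or fast_tps < slow_tps:
        raise ValueError("expected 0 < slow_tps <= fast_tps")
    return num_tokens / fast_tps, num_tokens / slow_tps


if __name__ == "__main__":
    for device, (slow, fast) in THROUGHPUT.items():
        best, worst = generation_time_range(300, slow, fast)
        print(f"{device}: {best:.0f}-{worst:.0f} s for 300 tokens")
```

For example, a 300-token reply on an iPhone 15 Pro would take roughly 12 to 20 seconds at the listed rates.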
License
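This quantized model follows the base model's license. Note that Qwen/Qwen2.5-3B-Instruct is distributed under the Qwen Research License rather than Apache 2.0; check the base model repository for the exact terms.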
Credits
- Original model by Qwen Team
- Quantization and mobile optimization by TurkishCodeMan