---
base_model: microsoft/Phi-4-mini-instruct
language:
- multilingual
- ar
- zh
- cs
- da
- nl
- en
- fi
- fr
- de
- he
- hu
- it
- ja
- ko
- 'no'
- pl
- pt
- ru
- es
- sv
- th
- tr
- uk
library_name: transformers
license: mit
license_link: https://huggingface.co/microsoft/Phi-4-mini-instruct/resolve/main/LICENSE
pipeline_tag: text-generation
tags:
- nlp
- code
- llama-cpp
- gguf-my-repo
widget:
- messages:
  - role: user
    content: Can you provide ways to eat combinations of bananas and dragonfruits?
---
					
						

# Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF
This model was converted to GGUF format from [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/microsoft/Phi-4-mini-instruct) for more details on the model.

---
					
						
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K token context length. It underwent an enhancement process incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

---
					
						
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```
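After installation, a quick sanity check that the binaries are on your PATH (exact output varies by build):

```bash
# Should print the installed llama.cpp build/version information
llama-cli --version
```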
					
						
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF --hf-file phi-4-mini-instruct-q5_k_m.gguf -p "The meaning to life and the universe is"
```
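The command above runs a single completion of the given prompt. For an interactive chat that applies the model's built-in chat template, recent llama.cpp builds also offer a conversation mode; a minimal sketch, assuming your build supports the `-cnv` flag:

```bash
# Interactive chat using the GGUF's embedded chat template (flag name may differ between llama.cpp versions)
llama-cli --hf-repo Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF --hf-file phi-4-mini-instruct-q5_k_m.gguf -cnv
```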
					
						

### Server:
```bash
llama-server --hf-repo Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF --hf-file phi-4-mini-instruct-q5_k_m.gguf -c 2048
```
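Once the server is up, you can query its OpenAI-compatible chat endpoint, for example with curl (assuming the default bind address and port, 127.0.0.1:8080):

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
        ],
        "max_tokens": 256
      }'
```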
					
						

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
					
						
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
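Note that newer llama.cpp checkouts have replaced the Makefile build with CMake; a rough equivalent under that build system (option names can differ between versions) is sketched below.
```
cmake -B build -DLLAMA_CURL=ON        # add -DGGML_CUDA=ON for Nvidia GPUs
cmake --build build --config Release  # binaries end up under build/bin/
```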
					
						

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF --hf-file phi-4-mini-instruct-q5_k_m.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Phi-4-mini-instruct-Q5_K_M-GGUF --hf-file phi-4-mini-instruct-q5_k_m.gguf -c 2048
```