llama-2-7b-chat_q4_quantized_cpp
- This repository contains a 4-bit quantized (q4_0) version of the Llama-2-7B-chat model for use with llama.cpp (a sketch of how such a file is typically produced is shown after this list).
- The model can be run locally on a CPU through llama.cpp (instructions are given below).
- The model has been tested on Linux (Ubuntu) with 12 GB RAM and a Core i5 processor.
- Performance is roughly ~3 tokens per second.
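For reference, a q4_0 file like the one in this repository is normally produced with llama.cpp's quantize tool from a 16-bit GGML conversion of the original weights. The commands below are a sketch under that assumption; the conversion script name and file paths are illustrative and may differ between llama.cpp versions:

# Convert the original PyTorch/HF weights to a 16-bit GGML file first
# (llama.cpp ships a conversion script for this; output file name is an assumption).
python3 convert.py models/7B/
# Quantize the f16 file down to 4-bit (q4_0).
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0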
Usage:
- Clone the llama.cpp repository from GitHub:
git clone https://github.com/ggerganov/llama.cpp.git
- Enter the llama.cpp directory (downloaded in step 1) and build it by running the make command:
cd llama.cpp
make
- Create a directory named 7B under the directory llama.cpp/models and put the model file ggml-model-q4_0.bin in this newly created 7B directory:
cd models
mkdir 7B
- Navigate back to the llama.cpp directory and run the command below (the initial prompt file can be changed from prompts/alpaca.txt to one of your choice; see the example after this list):
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/alpaca.txt
- That's it. Enter the desired prompts and let the results surprise you...
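As an example of swapping in a custom initial prompt, you could create your own prompt file and point -f at it; the file name and prompt text below are just illustrations:

# Create a simple custom prompt file (name is arbitrary).
echo "You are a helpful assistant. User:" > ./prompts/my_prompt.txt
# Run the same command as above, but with the custom prompt file.
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/my_prompt.txt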
Credits: