
Strict copy of https://huggingface.co/tiiuae/falcon-40b, quantized with GPTQ (calibrated on wikitext-2, 4 bits, group size 128).

Intended to be used with https://github.com/huggingface/text-generation-inference, for example:

model=huggingface/falcon-40b-gptq
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
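
Once the container is up, the server can be queried over HTTP. The following is a minimal sketch, assuming the container started by the command above is listening on local port 8080 and exposes text-generation-inference's `/generate` route. `build_payload` and `generate` are hypothetical helper names introduced here for illustration; they are not part of the project.

```shell
# Hypothetical helper: build the JSON request body.
# Assumption: the prompt contains no quotes or backslashes, since no JSON escaping is done here.
build_payload() {
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "$2"
}

# Hypothetical helper: POST a prompt to the local server.
# $1 is the prompt, $2 is the maximum number of new tokens (defaults to 20).
generate() {
  curl -s http://127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "${2:-20}")"
}
```

Calling `generate "What is GPTQ quantization?" 50` should return a JSON object containing the generated text, provided the model has finished loading.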

For the full set of configuration options, and for usage outside Docker, see https://github.com/huggingface/text-generation-inference
