Model Card for Behemoth-ReduX-123B-v1c-GGUF

I just made this so I could fit it on my GPU with a little more room for comfort.

I had to quantize down to Q3_K_M from their Q5_K_M GGUF rather than from the original full-precision weights.

Requantizing an already-quantized GGUF is, in theory, bad according to llama.cpp. So much so that the quantize tool requires a flag (--allow-requantize) before it will do it. In practice, though, this hasn't been the case for me: every time I've quantized from a GGUF the result has been great.

So don't be worried. This model, while quantized twice, should still perform well.
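
If you want to reproduce that step yourself, here is a minimal sketch of the requantization, assuming a local llama.cpp build with the llama-quantize binary on your PATH; the file names are illustrative, not the exact files I used.

```python
# Sketch of requantizing an already-quantized GGUF with llama.cpp's quantize tool.
# Assumes llama-quantize is on PATH; file names below are placeholders.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "--allow-requantize",  # required because the input GGUF is already quantized (Q5_K_M)
        "Behemoth-ReduX-123B-v1c-Q5_K_M.gguf",  # source quant (illustrative file name)
        "Behemoth-ReduX-123B-v1c-Q3_K_M.gguf",  # output quant (illustrative file name)
        "Q3_K_M",  # target quantization type
    ],
    check=True,
)
```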

Bias, Risks, and Limitations

Uncensored

Recommendations

See the base model card for more information. This model uses the Mistral V2/V3 prompt template.
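
For a quick local test, here is a minimal loading sketch using llama-cpp-python (my assumption; any GGUF-capable runtime works), with the prompt wrapped in the Mistral-style [INST] tags. The path, context length, and prompt are illustrative, not tuned recommendations.

```python
# Minimal sketch: load the Q3_K_M GGUF and send one Mistral-style prompt.
from llama_cpp import Llama

llm = Llama(
    model_path="Behemoth-ReduX-123B-v1c-Q3_K_M.gguf",  # illustrative path
    n_gpu_layers=-1,  # offload as many layers as possible; reduce if you run out of VRAM
    n_ctx=8192,       # illustrative context length
)

out = llm(
    "[INST] Write a short scene aboard a storm-battered airship. [/INST]",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```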
