This is an MXFP4_MOE quantization of the MiniMax-M2-THRIFT model.

Download the latest llama.cpp to use it.
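
As a rough sketch of what running the model could look like once llama.cpp is installed, the snippet below builds a `llama-cli` command line and runs it from Python. The model filename is a hypothetical placeholder (the actual GGUF may be named differently or split into several parts), and the prompt and token limit are illustrative only.

```python
import shutil
import subprocess

# Hypothetical local filename; use whichever GGUF file you actually downloaded.
MODEL_PATH = "MiniMax-M2-THRIFT-MXFP4_MOE.gguf"


def build_command(model_path, prompt, max_tokens=256):
    """Build an argument list for llama.cpp's llama-cli binary."""
    return [
        "llama-cli",
        "-m", model_path,       # path to the GGUF model file
        "-p", prompt,           # prompt text
        "-n", str(max_tokens),  # maximum number of tokens to generate
    ]


if __name__ == "__main__":
    cmd = build_command(MODEL_PATH, "Write a Python function that reverses a string.")
    if shutil.which(cmd[0]) is None:
        # llama.cpp is not installed or not on PATH; nothing is executed.
        print("llama-cli not found on PATH; install or build llama.cpp first.")
    else:
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.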

Compression: 25% expert pruning (256 -> 192), top_k = 8
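
The compression figures can be checked with a few lines of arithmetic. This sketch assumes the expert counts are per MoE layer and that `top_k` is the number of experts routed per token; neither is stated explicitly above.

```python
EXPERTS_BEFORE = 256  # experts before pruning (from the line above)
EXPERTS_AFTER = 192   # experts after pruning (from the line above)
TOP_K = 8             # assumed: experts selected per token

pruned = EXPERTS_BEFORE - EXPERTS_AFTER
pruned_fraction = pruned / EXPERTS_BEFORE
active_fraction = TOP_K / EXPERTS_AFTER

print(f"experts removed: {pruned} ({pruned_fraction:.0%})")
print(f"experts active per token: {TOP_K}/{EXPERTS_AFTER} ({active_fraction:.1%})")
```

Under these assumptions, pruning shrinks the stored expert weights, while the number of experts evaluated per token stays at 8.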

Also keep in mind that this is a coding model.

Format: GGUF

Model size: 173B params

Architecture: minimax-m2

Quantization: 4-bit
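
For a rough lower bound on the file size, the sketch below assumes every one of the 173B parameters is stored in MXFP4 blocks of 32 weights (16 bytes of packed 4-bit values plus a 1-byte shared scale). That is a simplifying assumption: as the MXFP4_MOE name suggests, probably only the expert tensors use MXFP4 and the remaining tensors are stored at higher precision, so the real files will be larger.

```python
PARAMS = 173e9      # parameter count from the model card
BLOCK_WEIGHTS = 32  # weights per MXFP4 block
BLOCK_BYTES = 17    # 16 bytes of 4-bit values + 1 byte shared scale

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
# Assumption: all weights are MXFP4, which understates the true size.
lower_bound_bytes = PARAMS * bits_per_weight / 8

print(f"{bits_per_weight} bits per weight")
print(f"lower bound: about {lower_bound_bytes / 1e9:.0f} GB")
```

Memory needed at run time is higher again, since the KV cache and compute buffers come on top of the weights.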

Repository: noctrex/MiniMax-M2-THRIFT-MXFP4_MOE-GGUF