DACMini-IT

#1456
by MaxForce01 - opened

We already tried this model the day before yesterday and again yesterday, but I have now retried it manually, and the pre-tokenizer is still not supported no matter how often someone requests this model:

INFO:hf-to-gguf:blk.9.attn_qkv.bias,       torch.float32 --> F32, shape = {2304}
INFO:hf-to-gguf:blk.9.attn_qkv.weight,     torch.float32 --> F32, shape = {768, 2304}
INFO:hf-to-gguf:blk.9.attn_output.bias,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.float32 --> F32, shape = {768, 768}
INFO:hf-to-gguf:blk.9.attn_norm.bias,      torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_up.bias,         torch.float32 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.float32 --> F32, shape = {768, 3072}
INFO:hf-to-gguf:blk.9.ffn_down.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.float32 --> F32, shape = {3072, 768}
INFO:hf-to-gguf:output_norm.bias,          torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:output_norm.weight,        torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:position_embd.weight,      torch.float32 --> F32, shape = {768, 1024}
INFO:hf-to-gguf:token_embd.weight,         torch.float32 --> F32, shape = {768, 30002}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggml-org/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  a81a3d402e8564fb52e642db51005cdeddf718acc1af0849d2b2c92c2b8fbea9
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 9554, in <module>
    main()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 9548, in main
    model_instance.write()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 443, in write
    self.prepare_metadata(vocab_only=False)
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 564, in prepare_metadata
    self.set_vocab()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 536, in set_vocab
    self._set_vocab_gpt2()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 936, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 654, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 924, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
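
For context, get_vocab_base_pre() in convert_hf_to_gguf.py identifies the pre-tokenizer by fingerprinting the tokenizer: it encodes a fixed test string, hashes the resulting token ids, and compares that hash against a table of known entries. A rough sketch of that check (the real test string and hash table live in the script itself, and the model path here is only illustrative):

from hashlib import sha256
from transformers import AutoTokenizer

# Illustrative model path; point this at wherever the HF checkpoint actually lives.
tokenizer = AutoTokenizer.from_pretrained("Mattimax/DACMini-IT")

# convert_hf_to_gguf.py encodes a long, fixed multilingual test string (chktxt)
# and fingerprints the resulting token ids. The real chktxt is defined in the
# script; any stand-in string will of course produce a different hash.
chktxt = "stand-in for the real chktxt defined in convert_hf_to_gguf.py"
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)

# get_vocab_base_pre() then walks a list of branches of the form
#   if chkhsh == "<known hash>": res = "<pre-tokenizer name>"
# and raises NotImplementedError when nothing matches, which is the failure above.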

But fine, I will just use the GPT-2 pre-tokenizer. I can't guarantee that this won't affect the model's quality, as there might be a reason why an Italian model doesn't use the GPT-2 pre-tokenizer.

Great news: after modifying the llama.cpp source code to use the GPT-2 pre-tokenizer for this model, it was successfully converted into the source GGUF.
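
For reference, teaching the converter to accept this model essentially means adding one more fingerprint branch to get_vocab_base_pre(), keyed on the chkhsh printed in the warning above. A minimal sketch of that kind of patch, not necessarily the exact edit that was applied here:

# Inside get_vocab_base_pre() in convert_hf_to_gguf.py, next to the other
# fingerprint branches; the hash is the one printed in the warning above.
if chkhsh == "a81a3d402e8564fb52e642db51005cdeddf718acc1af0849d2b2c92c2b8fbea9":
    # DACMini-IT: treat its BPE tokenizer as plain GPT-2 pre-tokenization
    res = "gpt-2"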

INFO:hf-to-gguf:blk.9.attn_qkv.bias,       torch.float32 --> F32, shape = {2304}
INFO:hf-to-gguf:blk.9.attn_qkv.weight,     torch.float32 --> F32, shape = {768, 2304}
INFO:hf-to-gguf:blk.9.attn_output.bias,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.float32 --> F32, shape = {768, 768}
INFO:hf-to-gguf:blk.9.attn_norm.bias,      torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_up.bias,         torch.float32 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.float32 --> F32, shape = {768, 3072}
INFO:hf-to-gguf:blk.9.ffn_down.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.float32 --> F32, shape = {3072, 768}
INFO:hf-to-gguf:output_norm.bias,          torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:output_norm.weight,        torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:position_embd.weight,      torch.float32 --> F32, shape = {768, 1024}
INFO:hf-to-gguf:token_embd.weight,         torch.float32 --> F32, shape = {768, 30002}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 29743 merge(s).
INFO:gguf.vocab:Setting special token type bos to 30000
INFO:gguf.vocab:Setting special token type eos to 0
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 30001
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] != 'system' %}<|system|>
Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}
{% for message in messages %}{% if message['role'] == 'user' %}<|user|>
{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>
{{ message['content'] }}{% endif %}{% endfor %}
<|assistant|>
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mradermacher/tmp/quant/DACMini-IT.gguf: n_tensors = 148, total_size = 435.5M
gguf serialising key  general.architecture value GGUFValue(value='gpt2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.type value GGUFValue(value='model', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.name value GGUFValue(value='DACMini IT', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.size_label value GGUFValue(value='109M', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.license value GGUFValue(value='mit', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.count value GGUFValue(value=1, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.base_model.0.name value GGUFValue(value='DACMini', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.0.organization value GGUFValue(value='Mattimax', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.0.repo_url value GGUFValue(value='https://huggingface.co/Mattimax/DACMini', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.count value GGUFValue(value=1, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.dataset.0.name value GGUFValue(value='DATA AI_Conversation_ITA', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.0.organization value GGUFValue(value='Mattimax', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.0.repo_url value GGUFValue(value='https://huggingface.co/Mattimax/DATA-AI_Conversation_ITA', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.tags value GGUFValue(value=['DAC', 'DATA-AI', 'data-ai'], type=<GGUFValueType.ARRAY: 9>, sub_type=None)
gguf serialising key  general.languages value GGUFValue(value=['it'], type=<GGUFValueType.ARRAY: 9>, sub_type=None)
gguf serialising key  gpt2.block_count value GGUFValue(value=12, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.context_length value GGUFValue(value=1024, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.embedding_length value GGUFValue(value=768, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.feed_forward_length value GGUFValue(value=3072, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.attention.head_count value GGUFValue(value=12, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.attention.layer_norm_epsilon value GGUFValue(value=1e-05, type=<GGUFValueType.FLOAT32: 6>, sub_type=None)
gguf serialising key  general.file_type value GGUFValue(value=<LlamaFileType.MOSTLY_SOURCE: 1025>, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.quantization_version value GGUFValue(value=2, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.model value GGUFValue(value='gpt2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  tokenizer.ggml.pre value GGUFValue(value='gpt-2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  tokenizer.ggml.tokens value-suppressed
gguf serialising key  tokenizer.ggml.token_type value-suppressed
gguf serialising key  tokenizer.ggml.merges value-suppressed
gguf serialising key  tokenizer.ggml.bos_token_id value GGUFValue(value=30000, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.eos_token_id value GGUFValue(value=0, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.unknown_token_id value GGUFValue(value=0, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.padding_token_id value GGUFValue(value=30001, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.add_bos_token value GGUFValue(value=False, type=<GGUFValueType.BOOL: 7>, sub_type=None)
gguf serialising key  tokenizer.chat_template value GGUFValue(value="{% if messages[0]['role'] != 'system' %}<|system|>\nTi chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}\n{% for message in messages %}{% if message['role'] == 'user' %}<|user|>\n{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>\n{{ message['content'] }}{% endif %}{% endfor %}\n<|assistant|>", type=<GGUFValueType.STRING: 8>, sub_type=None)
Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436M/436M [00:01<00:00, 309Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mradermacher/tmp/quant/DACMini-IT.gguf
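
For anyone prompting the GGUFs directly: the chat template embedded above expects <|system|>/<|user|>/<|assistant|> turns. A small sketch of how it renders a one-turn conversation with jinja2 (llama.cpp's chat endpoints should apply the same embedded template on their own):

from jinja2 import Template

# Chat template copied verbatim from the tokenizer.chat_template field above.
chat_template = (
    "{% if messages[0]['role'] != 'system' %}<|system|>\n"
    "Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}\n"
    "{% for message in messages %}{% if message['role'] == 'user' %}<|user|>\n"
    "{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>\n"
    "{{ message['content'] }}{% endif %}{% endfor %}\n"
    "<|assistant|>"
)

messages = [{"role": "user", "content": "Ciao! Chi sei?"}]
print(Template(chat_template).render(messages=messages))
# <|system|>
# Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.
# <|user|>
# Ciao! Chi sei?
# <|assistant|>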

Static quants and imatrix computed successfully:

-2000    0 DACMini-IT                                    run/imatrix (GPU-2d) 616/12 0.14s/c 0.3/1.1m(-0.2-0.4) [405/503] 96.4704

The last imatrix quant has also just been computed:

-1999    1  I DACMini-IT                                   run/imatrix 24/24,IQ3_S [8/148]

So the model is already done.

Static quants: https://huggingface.co/mradermacher/DACMini-IT-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/DACMini-IT-i1-GGUF
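
If you want to try them from Python, one way to fetch a quant is via the huggingface_hub client; the exact quant filenames are listed in the repos, so this sketch simply grabs the first .gguf it finds:

from huggingface_hub import hf_hub_download, list_repo_files

repo = "mradermacher/DACMini-IT-GGUF"
files = list_repo_files(repo)  # exact quant filenames are listed in the repo
print(files)

# Download one quant and point llama.cpp at it (e.g. llama-cli -m <path>).
gguf_path = hf_hub_download(repo, next(f for f in files if f.endswith(".gguf")))
print(gguf_path)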

Please let us know how it turned out compared to the unquantized model. We don't often quantize models with an incompatible pre-tokenizer, but if this one turned out well we might decide to do so more often.

Hi there,

I’ve tested the quantized DACMini-IT models and I must say that they are fantastic! Even with very aggressive quantizations, the performance and output quality remain excellent. Honestly, this works better than I expected.
I really encourage you to do this much more often for models like this, as it clearly makes a huge difference for users.
Also, I want to sincerely thank you for your patience and effort in doing this. I know it wasn’t a trivial task, and your dedication is greatly appreciated.
Looking forward to seeing more of these quantized models in the future!

Best regards,
MaxForce01
