DACMini-IT

#1456
by MaxForce01 - opened

We already tried this model the day before yesterday and again yesterday, but I have now retried it manually, and the pre-tokenizer is still not supported no matter how often someone requests this model:

INFO:hf-to-gguf:blk.9.attn_qkv.bias,       torch.float32 --> F32, shape = {2304}
INFO:hf-to-gguf:blk.9.attn_qkv.weight,     torch.float32 --> F32, shape = {768, 2304}
INFO:hf-to-gguf:blk.9.attn_output.bias,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.float32 --> F32, shape = {768, 768}
INFO:hf-to-gguf:blk.9.attn_norm.bias,      torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_up.bias,         torch.float32 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.float32 --> F32, shape = {768, 3072}
INFO:hf-to-gguf:blk.9.ffn_down.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.float32 --> F32, shape = {3072, 768}
INFO:hf-to-gguf:output_norm.bias,          torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:output_norm.weight,        torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:position_embd.weight,      torch.float32 --> F32, shape = {768, 1024}
INFO:hf-to-gguf:token_embd.weight,         torch.float32 --> F32, shape = {768, 30002}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggml-org/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  a81a3d402e8564fb52e642db51005cdeddf718acc1af0849d2b2c92c2b8fbea9
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 9554, in <module>
    main()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 9548, in main
    model_instance.write()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 443, in write
    self.prepare_metadata(vocab_only=False)
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 564, in prepare_metadata
    self.set_vocab()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 536, in set_vocab
    self._set_vocab_gpt2()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 936, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 654, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 924, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
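
For context, get_vocab_base_pre() in convert_hf_to_gguf.py identifies the pre-tokenizer by fingerprinting the tokenizer: it encodes a fixed test string, hashes the resulting token ids, and compares that hash against a table of known entries. A rough sketch of that check (the real test string and hash table live in the script itself, and the model path here is only illustrative):

from hashlib import sha256
from transformers import AutoTokenizer

# Illustrative model path; point this at wherever the HF checkpoint actually lives.
tokenizer = AutoTokenizer.from_pretrained("Mattimax/DACMini-IT")

# convert_hf_to_gguf.py encodes a long, fixed multilingual test string (chktxt)
# and fingerprints the resulting token ids. The real chktxt is defined in the
# script; any stand-in string will of course produce a different hash.
chktxt = "stand-in for the real chktxt defined in convert_hf_to_gguf.py"
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)

# get_vocab_base_pre() then walks a list of branches of the form
#   if chkhsh == "<known hash>": res = "<pre-tokenizer name>"
# and raises NotImplementedError when nothing matches, which is the failure above.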

But fine, I will just use the GPT-2 pre-tokenizer. I can't guarantee that this won't affect the model's quality, as there might be a reason why an Italian model doesn't use the GPT-2 pre-tokenizer.

Great news: after modifying the llama.cpp source code to use the GPT-2 pre-tokenizer for this model, it was successfully converted into the source GGUF.
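
For reference, teaching the converter to accept this model essentially means adding one more fingerprint branch to get_vocab_base_pre(), keyed on the chkhsh printed in the warning above. A minimal sketch of that kind of patch, not necessarily the exact edit that was applied here:

# Inside get_vocab_base_pre() in convert_hf_to_gguf.py, next to the other
# fingerprint branches; the hash is the one printed in the warning above.
if chkhsh == "a81a3d402e8564fb52e642db51005cdeddf718acc1af0849d2b2c92c2b8fbea9":
    # DACMini-IT: treat its BPE tokenizer as plain GPT-2 pre-tokenization
    res = "gpt-2"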

INFO:hf-to-gguf:blk.9.attn_qkv.bias,       torch.float32 --> F32, shape = {2304}
INFO:hf-to-gguf:blk.9.attn_qkv.weight,     torch.float32 --> F32, shape = {768, 2304}
INFO:hf-to-gguf:blk.9.attn_output.bias,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.float32 --> F32, shape = {768, 768}
INFO:hf-to-gguf:blk.9.attn_norm.bias,      torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_up.bias,         torch.float32 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.float32 --> F32, shape = {768, 3072}
INFO:hf-to-gguf:blk.9.ffn_down.bias,       torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.float32 --> F32, shape = {3072, 768}
INFO:hf-to-gguf:output_norm.bias,          torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:output_norm.weight,        torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:position_embd.weight,      torch.float32 --> F32, shape = {768, 1024}
INFO:hf-to-gguf:token_embd.weight,         torch.float32 --> F32, shape = {768, 30002}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 29743 merge(s).
INFO:gguf.vocab:Setting special token type bos to 30000
INFO:gguf.vocab:Setting special token type eos to 0
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 30001
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] != 'system' %}<|system|>
Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}
{% for message in messages %}{% if message['role'] == 'user' %}<|user|>
{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>
{{ message['content'] }}{% endif %}{% endfor %}
<|assistant|>
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mradermacher/tmp/quant/DACMini-IT.gguf: n_tensors = 148, total_size = 435.5M
gguf serialising key  general.architecture value GGUFValue(value='gpt2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.type value GGUFValue(value='model', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.name value GGUFValue(value='DACMini IT', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.size_label value GGUFValue(value='109M', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.license value GGUFValue(value='mit', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.count value GGUFValue(value=1, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.base_model.0.name value GGUFValue(value='DACMini', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.0.organization value GGUFValue(value='Mattimax', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.base_model.0.repo_url value GGUFValue(value='https://huggingface.co/Mattimax/DACMini', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.count value GGUFValue(value=1, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.dataset.0.name value GGUFValue(value='DATA AI_Conversation_ITA', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.0.organization value GGUFValue(value='Mattimax', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.dataset.0.repo_url value GGUFValue(value='https://huggingface.co/Mattimax/DATA-AI_Conversation_ITA', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  general.tags value GGUFValue(value=['DAC', 'DATA-AI', 'data-ai'], type=<GGUFValueType.ARRAY: 9>, sub_type=None)
gguf serialising key  general.languages value GGUFValue(value=['it'], type=<GGUFValueType.ARRAY: 9>, sub_type=None)
gguf serialising key  gpt2.block_count value GGUFValue(value=12, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.context_length value GGUFValue(value=1024, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.embedding_length value GGUFValue(value=768, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.feed_forward_length value GGUFValue(value=3072, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.attention.head_count value GGUFValue(value=12, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  gpt2.attention.layer_norm_epsilon value GGUFValue(value=1e-05, type=<GGUFValueType.FLOAT32: 6>, sub_type=None)
gguf serialising key  general.file_type value GGUFValue(value=<LlamaFileType.MOSTLY_SOURCE: 1025>, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  general.quantization_version value GGUFValue(value=2, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.model value GGUFValue(value='gpt2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  tokenizer.ggml.pre value GGUFValue(value='gpt-2', type=<GGUFValueType.STRING: 8>, sub_type=None)
gguf serialising key  tokenizer.ggml.tokens value-suppressed
gguf serialising key  tokenizer.ggml.token_type value-suppressed
gguf serialising key  tokenizer.ggml.merges value-suppressed
gguf serialising key  tokenizer.ggml.bos_token_id value GGUFValue(value=30000, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.eos_token_id value GGUFValue(value=0, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.unknown_token_id value GGUFValue(value=0, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.padding_token_id value GGUFValue(value=30001, type=<GGUFValueType.UINT32: 4>, sub_type=None)
gguf serialising key  tokenizer.ggml.add_bos_token value GGUFValue(value=False, type=<GGUFValueType.BOOL: 7>, sub_type=None)
gguf serialising key  tokenizer.chat_template value GGUFValue(value="{% if messages[0]['role'] != 'system' %}<|system|>\nTi chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}\n{% for message in messages %}{% if message['role'] == 'user' %}<|user|>\n{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>\n{{ message['content'] }}{% endif %}{% endfor %}\n<|assistant|>", type=<GGUFValueType.STRING: 8>, sub_type=None)
Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436M/436M [00:01<00:00, 309Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mradermacher/tmp/quant/DACMini-IT.gguf
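
For anyone prompting the GGUFs directly: the chat template embedded above expects <|system|>/<|user|>/<|assistant|> turns. A small sketch of how it renders a one-turn conversation with jinja2 (llama.cpp's chat endpoints should apply the same embedded template on their own):

from jinja2 import Template

# Chat template copied verbatim from the tokenizer.chat_template field above.
chat_template = (
    "{% if messages[0]['role'] != 'system' %}<|system|>\n"
    "Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.{% endif %}\n"
    "{% for message in messages %}{% if message['role'] == 'user' %}<|user|>\n"
    "{{ message['content'] }}{% elif message['role'] == 'assistant' %}<|assistant|>\n"
    "{{ message['content'] }}{% endif %}{% endfor %}\n"
    "<|assistant|>"
)

messages = [{"role": "user", "content": "Ciao! Chi sei?"}]
print(Template(chat_template).render(messages=messages))
# <|system|>
# Ti chiami DACMini, un modello di intelligenza artificiale creato da M.INC.
# <|user|>
# Ciao! Chi sei?
# <|assistant|>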

Static quants and imatrix computed successfully:

-2000    0 DACMini-IT                                    run/imatrix (GPU-2d) 616/12 0.14s/c 0.3/1.1m(-0.2-0.4) [405/503] 96.4704

The last imatrix quant has also just been computed:

-1999    1  I DACMini-IT                                   run/imatrix 24/24,IQ3_S [8/148]

So the model is already done.

Static quants: https://huggingface.co/mradermacher/DACMini-IT-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/DACMini-IT-i1-GGUF
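
If you want to try them from Python, one way to fetch a quant is via the huggingface_hub client; the exact quant filenames are listed in the repos, so this sketch simply grabs the first .gguf it finds:

from huggingface_hub import hf_hub_download, list_repo_files

repo = "mradermacher/DACMini-IT-GGUF"
files = list_repo_files(repo)  # exact quant filenames are listed in the repo
print(files)

# Download one quant and point llama.cpp at it (e.g. llama-cli -m <path>).
gguf_path = hf_hub_download(repo, next(f for f in files if f.endswith(".gguf")))
print(gguf_path)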

Please let us know how it turned out compared to the unquantized model. We don't often quantize models with an incompatible pre-tokenizer, but if this one turned out well we might decide to do so more often.

Hi there,

I’ve tested the quantized DACMini-IT models and I must say that they are fantastic! Even with very aggressive quantizations, the performance and output quality remain excellent. Honestly, this works better than I expected.
I really encourage you to do this much more often for models like this, as it clearly makes a huge difference for users.
Also, I want to sincerely thank you for your patience and effort in doing this. I know it wasn’t a trivial task, and your dedication is greatly appreciated.
Looking forward to seeing more of these quantized models in the future!

Best regards,
MaxForce01
