https://huggingface.co/swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA

#1411
by Rubertigno - opened

Please add GGUF quants.
Many thanks in advance for your wonderful work!

Rubertigno changed discussion title from Request: swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA to swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA
Rubertigno changed discussion title from swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA to https://huggingface.co/swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA

Unfortunately, we tried this before and it failed; it seems the tokenizer does not match the model:

```
LLaMAntino-2-70b-hf-UltraChat-ITA WARNING:hf-to-gguf:ignore token 32000: id is out of range, max=31999
LLaMAntino-2-70b-hf-UltraChat-ITA File "/root/cvs/llama.cpp/convert_hf_to_gguf.py", line 823, in _create_vocab_sentencepiece
LLaMAntino-2-70b-hf-UltraChat-ITA     if toktypes[token_id] != SentencePieceTokenTypes.UNUSED:
LLaMAntino-2-70b-hf-UltraChat-ITA        ~~~~~~~~^^^^^^^^^^
LLaMAntino-2-70b-hf-UltraChat-ITA IndexError: list index out of range
```
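The error above means the repository declares an added token (id 32000) beyond the base SentencePiece vocabulary, which ends at 31999. A minimal sketch of how one might diagnose this kind of mismatch, assuming the model follows the usual Hugging Face layout with `config.json` (carrying `vocab_size`) and `added_tokens.json` (mapping token strings to ids) — the function name and file paths here are illustrative, not part of llama.cpp:

```python
import json

def find_out_of_range_tokens(config_path: str, added_tokens_path: str) -> dict:
    """Return added tokens whose ids fall outside the model's declared vocab.

    config_path       : path to the model's config.json (has "vocab_size")
    added_tokens_path : path to added_tokens.json ({token string: id})
    """
    with open(config_path) as f:
        vocab_size = json.load(f)["vocab_size"]
    with open(added_tokens_path) as f:
        added = json.load(f)
    # Any id >= vocab_size would trigger the "id is out of range" warning
    # and the subsequent IndexError during SentencePiece vocab creation.
    return {tok: tid for tok, tid in added.items() if tid >= vocab_size}
```

If this returns a non-empty dict, the tokenizer files and model config disagree, and the conversion script has nothing sensible to map those ids to.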

We have a preprocessor that tries to fix such things, I'll see if I can extend it for this architecture.

Many thanks!

It should work now; you can watch the status of the model at http://hf.tst.eu/status.html (imatrix quants are delayed due to another big model).

Unfortunately, imatrix generation failed, and I am not sure llama.cpp can load the static quants. I am not convinced this error is a problem with the model per se, so I will keep the static quants.

```
/llmjob/llama.cpp-cuda512/tools/imatrix/imatrix.cpp:915: GGML_ASSERT(!llama_vocab_get_add_eos(vocab)) failed
```
