https://huggingface.co/swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA
Please add GGUF quantized versions.
Many thanks in advance for your wonderful work!
Unfortunately, we tried this before and it failed; it seems the tokenizer does not match the model:
```
WARNING:hf-to-gguf:ignore token 32000: id is out of range, max=31999
  File "/root/cvs/llama.cpp/convert_hf_to_gguf.py", line 823, in _create_vocab_sentencepiece
    if toktypes[token_id] != SentencePieceTokenTypes.UNUSED:
       ~~~~~~~~^^^^^^^^^^
IndexError: list index out of range
```
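The warning suggests the tokenizer defines a token with id 32000 while the model's embedding table only covers ids 0–31999. A minimal sketch of how one might check for this mismatch locally, assuming a downloaded model directory with the usual `config.json` and an `added_tokens.json` (the helper name and file layout are assumptions, not part of the conversion script):

```python
import json
from pathlib import Path


def find_out_of_range_tokens(model_dir: str) -> list[tuple[int, str]]:
    """Return (id, token) pairs whose ids fall outside the model's vocab.

    Hypothetical helper: compares `vocab_size` from config.json against the
    ids in added_tokens.json, which is where extra tokens such as id 32000
    typically come from.
    """
    root = Path(model_dir)
    vocab_size = json.loads((root / "config.json").read_text())["vocab_size"]

    added_path = root / "added_tokens.json"
    added = json.loads(added_path.read_text()) if added_path.exists() else {}

    # Any added token with id >= vocab_size would trigger the
    # "id is out of range" warning during GGUF conversion.
    return sorted((tid, tok) for tok, tid in added.items() if tid >= vocab_size)
```

If this reports offenders, the usual fixes are to resize the model's embeddings to match the tokenizer or to drop the extra tokens before converting.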
We have a preprocessor that tries to fix such things, I'll see if I can extend it for this architecture.
many thanks
It should work; you can watch the status of the model at http://hf.tst.eu/status.html (imatrix quants are delayed due to another big model).
Unfortunately, imatrix generation failed, and I am not sure llama.cpp can load the static quants. I am not convinced this error is a problem with the model per se, so I will keep the static quants:
```
/llmjob/llama.cpp-cuda512/tools/imatrix/imatrix.cpp:915: GGML_ASSERT(!llama_vocab_get_add_eos(vocab)) failed
```