ssllm_hf / tokenizer_info.txt
Tokenizer: cl100k_base (tiktoken)
Vocabulary size: 100277
BOS token ID: 100256
EOS token ID: 100257
PAD token ID: 100257
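
The values above can be read programmatically. The following is a minimal sketch, not part of the original repository: it assumes the file keeps the one "Key: value" pair per line layout shown here, and the helper names `parse_tokenizer_info` and `check_ids` are hypothetical. It parses the fields into a dictionary and checks that each special token ID falls inside the stated vocabulary size.

```python
# Sketch only: parse "Key: value" lines like those in tokenizer_info.txt.
# parse_tokenizer_info and check_ids are hypothetical helper names,
# not functions from tiktoken or from the ssllm_hf repository.

TOKENIZER_INFO = """\
Tokenizer: cl100k_base (tiktoken)
Vocabulary size: 100277
BOS token ID: 100256
EOS token ID: 100257
PAD token ID: 100257
"""


def parse_tokenizer_info(text):
    """Return a dict of the fields, converting purely numeric values to int."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        value = value.strip()
        info[key.strip()] = int(value) if value.isdigit() else value
    return info


def check_ids(info):
    """Raise ValueError if a special token ID is outside [0, vocabulary size)."""
    vocab_size = info["Vocabulary size"]
    for key in ("BOS token ID", "EOS token ID", "PAD token ID"):
        token_id = info[key]
        if not 0 <= token_id < vocab_size:
            raise ValueError(f"{key} {token_id} is outside vocabulary of size {vocab_size}")


info = parse_tokenizer_info(TOKENIZER_INFO)
check_ids(info)

# In this file the PAD token reuses the EOS token ID.
print("PAD shares EOS ID:", info["PAD token ID"] == info["EOS token ID"])
```

If tiktoken is installed, the `n_vocab` and `eot_token` attributes of `tiktoken.get_encoding("cl100k_base")` could be compared against the vocabulary size and EOS token ID listed here. That check is left out of the sketch because loading the encoding may require a network download.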