TokenMonster
The documentation and code are available on GitHub: alasdairforsythe/tokenmonster.
The pretrained vocabularies are all available for download here.
July 11: TokenMonster v1.1.1 has been released. The 420 prebuilt vocabularies are being released as they are completed, at a rate of around 10 per day.
Choose a dataset from:
`code`, `english`, `englishcode`, `fiction`
Choose a vocab size from:
`1024`, `2048`, `4096`, `8000`, `16000`, `24000`, `32000`, `40000`, `50256`, `65536`, `100256`
Choose an optimization mode from:
`unfiltered`, `clean`, `balanced`, `consistent`, `strict`
For a capcode disabled vocabulary add:
`nocapcode`
And finally add the version number:
`v1`
Examples:
`fiction-24000-consistent-v1`
`code-4096-clean-nocapcode-v1`
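The naming scheme above can be sketched as a small helper that joins the components with hyphens. This is an illustrative function, not part of the TokenMonster library; the `capcode` flag here is an assumption for toggling the `nocapcode` suffix.

```python
def build_vocab_name(dataset, vocab_size, mode, capcode=True, version="v1"):
    """Compose a TokenMonster vocabulary name from its components.

    Illustrative helper only (not part of the TokenMonster library);
    it follows the dataset-size-mode[-nocapcode]-version scheme above.
    """
    parts = [dataset, str(vocab_size), mode]
    if not capcode:
        parts.append("nocapcode")  # capcode-disabled variant
    parts.append(version)
    return "-".join(parts)

print(build_vocab_name("fiction", 24000, "consistent"))        # fiction-24000-consistent-v1
print(build_vocab_name("code", 4096, "clean", capcode=False))  # code-4096-clean-nocapcode-v1
```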
There are two additional vocabularies:
`gpt2`, `llama`