1
🛠️ Environment Setup
Set up your Python environment using uv, create a virtual environment, and install all necessary dependencies for the nanochat project.
View setup instructions →
2
Tokenizer Training
Train a custom BPE tokenizer using Rust bindings on 2 billion characters of data, achieving compression ratios competitive with GPT-4's tokenizer.
View tokenizer guide →
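To make the idea behind BPE training and "compression ratio" concrete, here is a minimal pure-Python sketch. It is not nanochat's Rust-backed tokenizer: the function names (`train_bpe`, `merge`, `encode`, `compression_ratio`) are hypothetical, and the ratio is assumed to mean UTF-8 bytes per token, which is one common definition.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` byte-pair merges from `text`.

    Returns the merge table and the training text encoded with it.
    """
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}  # (left_id, right_id) -> new_id, in the order learned
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge would not compress
        new_id = 256 + len(merges)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges, ids


def encode(text, merges):
    """Encode `text` by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def compression_ratio(text, ids):
    """UTF-8 bytes per token (assumed definition); `ids` must be non-empty."""
    return len(text.encode("utf-8")) / len(ids)
```

A higher ratio means fewer tokens for the same text. A production tokenizer would also pre-split text with a regex and train on far more data, which this sketch omits.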
3
🥦 Pre-training
Download a larger dataset and run distributed training across 8 GPUs using torchrun, with metrics tracked in a shared trackio space.
View pre-training steps →
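As a rough illustration of how a script launched by torchrun can split work across 8 GPUs, the sketch below reads the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that torchrun sets for each process and assigns dataset shards round-robin. The helper names (`dist_context`, `shards_for_rank`) and the shard file names are hypothetical and are not taken from nanochat.

```python
import os


def dist_context():
    """Read the per-process variables that torchrun exports.

    Falls back to a single-process setup when they are absent,
    for example when the script is run with plain `python`.
    """
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


def shards_for_rank(shards, rank, world_size):
    """Assign shards round-robin so each one goes to exactly one rank."""
    return shards[rank::world_size]


# Hypothetical shard names, used only for illustration.
all_shards = [f"shard_{i:05d}.parquet" for i in range(16)]
rank, local_rank, world_size = dist_context()
my_shards = shards_for_rank(all_shards, rank, world_size)
```

A script like this would typically be started with something like `torchrun --standalone --nproc_per_node=8 train.py`, where `train.py` is a placeholder name. Each of the 8 processes then sees a different `RANK` and reads a disjoint subset of the shards.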