# MEMGPT

A GPT-2-style large language model (LLM) repository. This implementation includes full support for distributed training, sharded datasets, benchmark evaluation, and efficient text generation.
## Features

- Transformer architecture based on GPT-2.
- Configurable training and model hyperparameters via JSON.
- Sharded dataset loading from `.npy` files.
- Mixed-precision training with `torch.autocast`.
- DDP (DistributedDataParallel) support.
- Evaluation support with HellaSwag.
- Modular codebase for easy extensibility.
## Project Structure

```
MEMGPT/
├── configs/
│   └── config.json          # Model and training configuration
│
├── data/
│   ├── edu_fineweb/         # Sharded training data
│   │   ├── train_000001.npy
│   │   ├── train_000002.npy
│   │   └── test_000001.npy
│   ├── hellaswag/
│   │   └── hellaswag_val.jsonl
│   └── fineweb.py           # Dataset sharding/processing logic
│
├── model_core/
│   ├── __init__.py
│   ├── attention.py         # Self-attention module
│   ├── model.py             # GPT2 model architecture
│   ├── dataloader.py        # DataLoader_1 class
│   └── training.py          # train_nanogpt function
│
├── scripts/
│   ├── train.py             # Entry point to start training
│   ├── evaluate.py          # Run evaluation
│   └── generate.py          # Generate text from trained model
│
├── evaluation/
│   ├── __init__.py
│   ├── hellaswag.py         # HellaSwag dataset preparation
│   └── val_hellaswag.py     # HellaSwag scoring function
│
├── logs/
│   ├── log.txt              # Training log file
│   └── model_xxxxx.pt       # Checkpoint files
│
├── .gitignore
├── README.md
└── requirements.txt
```
## Configuration
Edit configs/config.json to configure your model and training setup.
Example:
```json
{
  "model": {
    "block_size": 1024,
    "vocab_size": 50304,
    "n_layer": 12,
    "n_head": 12,
    "n_embd": 768
  },
  "training": {
    "max_steps": 19073,
    "log_dir": "log",
    "total_batch_size": 524288,
    "B": 64,
    "T": 1024,
    "max_lr": 0.0006,
    "min_lr": 0.00006,
    "warmup_steps": 715,
    "weight_decay": 0.1,
    "learning_rate": 0.0006
  }
}
```
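As a point of reference, the relationship between `total_batch_size`, `B`, and `T` presumably follows the usual nanoGPT convention (an assumption, since the exact loop lives in `model_core/training.py`): `total_batch_size` counts tokens per optimizer step, so the number of gradient-accumulation micro-steps is `total_batch_size / (B * T * world_size)`, i.e. 524288 / (64 × 1024) = 8 on a single GPU. A quick sketch of reading the config and computing that value:

```python
import json

# Load the training configuration (path matches the repository layout).
with open("configs/config.json") as f:
    cfg = json.load(f)

train_cfg = cfg["training"]
B, T = train_cfg["B"], train_cfg["T"]              # micro-batch size and sequence length
total_batch_size = train_cfg["total_batch_size"]   # tokens per optimizer step (assumed convention)

world_size = 1  # single-GPU case; under DDP this would be the number of processes
assert total_batch_size % (B * T * world_size) == 0
grad_accum_steps = total_batch_size // (B * T * world_size)
print(f"gradient accumulation steps: {grad_accum_steps}")  # 524288 / (64 * 1024) = 8
```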
## Training
To start training the model:
```bash
python scripts/train.py
```
This script loads the configuration from `configs/config.json` and calls `train_nanogpt()` from `model_core/training.py`.
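For orientation, the entry point presumably looks something like the sketch below; the exact call signature of `train_nanogpt()` is an assumption, not the repository's verbatim code.

```python
# Illustrative sketch of scripts/train.py; train_nanogpt's real signature may differ.
import json

from model_core.training import train_nanogpt

if __name__ == "__main__":
    with open("configs/config.json") as f:
        config = json.load(f)
    train_nanogpt(config)
```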
### Optional: Distributed Training
To run training across multiple GPUs using PyTorch DDP:
```bash
torchrun --nproc_per_node=NUM_GPUS scripts/train.py
```
Replace `NUM_GPUS` with the number of GPUs you want to use.
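Under `torchrun`, each process discovers its rank through environment variables and joins a NCCL process group before the model is wrapped in `DistributedDataParallel`. A minimal sketch of that setup (variable names are illustrative; the repository's training loop may organize this differently):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process it launches.
ddp = int(os.environ.get("RANK", -1)) != -1
if ddp:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = f"cuda:{local_rank}"
    torch.cuda.set_device(device)
else:
    local_rank = 0
    device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in module; the real GPT2 model lives in model_core/model.py.
model = torch.nn.Linear(8, 8).to(device)
if ddp:
    model = DDP(model, device_ids=[local_rank])
```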
## Evaluation
To evaluate on HellaSwag:
```bash
python scripts/evaluate.py
```
Make sure the `hellaswag_val.jsonl` file is available under `data/hellaswag/`.
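The usual HellaSwag scoring rule is to concatenate the context with each of the four candidate endings, evaluate the model's loss on just the ending tokens, and pick the ending with the lowest average loss. A sketch of that rule, assuming a GPT-2 `tiktoken` tokenizer and a model that returns raw logits (the repository's `evaluation/val_hellaswag.py` may differ in detail):

```python
import tiktoken
import torch
import torch.nn.functional as F

enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def score_example(model, ctx: str, endings: list[str], device: str) -> int:
    """Return the index of the ending with the lowest average loss on its own tokens."""
    losses = []
    for ending in endings:
        ctx_tokens = enc.encode(ctx)
        end_tokens = enc.encode(" " + ending)
        tokens = torch.tensor([ctx_tokens + end_tokens], device=device)
        logits = model(tokens)  # assumed shape (1, T, vocab_size); some models return (logits, loss)
        # Next-token prediction: logits at position t predict token t + 1.
        shift_logits = logits[:, :-1, :]
        shift_targets = tokens[:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
            reduction="none",
        ).view(1, -1)
        # Only the ending tokens count toward the score.
        end_loss = loss[:, len(ctx_tokens) - 1:].mean()
        losses.append(end_loss.item())
    return int(torch.tensor(losses).argmin())
```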
## Text Generation
To generate text from a trained model:
```bash
python scripts/generate.py
```
Make sure to adjust the generation script to point to the correct checkpoint under the `logs/` directory.
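Generation typically follows the standard autoregressive loop: encode a prompt, take the logits at the last position, sample from the top-k candidates, and append the sampled token. A minimal sketch under those assumptions (the model's forward signature and the use of `tiktoken` are assumptions, and you would load your checkpoint from `logs/` first):

```python
import tiktoken
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, prompt: str, max_new_tokens: int = 64, top_k: int = 50,
             device: str = "cuda") -> str:
    enc = tiktoken.get_encoding("gpt2")
    tokens = torch.tensor([enc.encode(prompt)], device=device)
    for _ in range(max_new_tokens):
        logits = model(tokens)                       # assumed shape (1, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        topk_probs, topk_idx = torch.topk(probs, top_k, dim=-1)
        next_tok = topk_idx.gather(-1, torch.multinomial(topk_probs, 1))
        tokens = torch.cat([tokens, next_tok], dim=1)
    return enc.decode(tokens[0].tolist())
```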
## Requirements
Install required packages:
```bash
pip install -r requirements.txt
```
## Notes

- Ensure your `.npy` sharded data is placed under `data/edu_fineweb/`.
- The log directory and checkpoints will be saved in `logs/`.
- The `DataLoader_1` class handles distributed data loading.
- Supports `bfloat16` autocasting for better training efficiency (see the sketch below).
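Putting the last two notes together, loading a token shard and running a `bfloat16` autocast forward pass looks roughly like this. This is a sketch of the pattern, not the repository's `DataLoader_1` implementation; the shard filename comes from the project tree above, and the stand-in model only illustrates the autocast context:

```python
import numpy as np
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load one token shard produced by data/fineweb.py (filename from the tree above).
shard = np.load("data/edu_fineweb/train_000001.npy")
tokens = torch.tensor(shard.astype(np.int64))

# Slice one (B, T) micro-batch of inputs and next-token targets.
B, T = 4, 1024
buf = tokens[: B * T + 1]
x = buf[:-1].view(B, T).to(device)
y = buf[1:].view(B, T).to(device)

# Stand-in model (the real GPT2 lives in model_core/model.py): embed, then project back to the vocab.
model = torch.nn.Sequential(torch.nn.Embedding(50304, 64), torch.nn.Linear(64, 50304)).to(device)

# bfloat16 autocast around the forward/loss computation, as in the training loop.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    logits = model(x)  # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
```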
## License
MIT License. Feel free to modify and build upon this for research or commercial use.
## Acknowledgements

Inspired by Andrej Karpathy's nanoGPT. Special thanks to Andrej Karpathy's YouTube tutorials and the open-source AI community.