
MEMGPT

A GPT-2-style large language model (LLM) repository. This implementation includes full support for distributed training, sharded datasets, benchmark evaluation, and efficient text generation.


🔧 Features

  • Transformer architecture based on GPT-2.
  • Configurable training and model hyperparameters via JSON.
  • Sharded dataset loading from .npy files.
  • Mixed-precision training with torch.autocast.
  • DDP (DistributedDataParallel) support.
  • Evaluation support with HellaSwag.
  • Modular codebase for easy extensibility.

๐Ÿ“ Project Structure

MEMGPT/
├── configs/
│   └── config.json                    # Model and training configuration
│
├── data/
│   ├── edu_fineweb/                   # Sharded training data
│   │   ├── train_000001.npy
│   │   ├── train_000002.npy
│   │   └── test_000001.npy
│   ├── hellaswag/
│   │   └── hellaswag_val.jsonl
│   └── fineweb.py                     # Dataset sharding/processing logic
│
├── model_core/
│   ├── __init__.py
│   ├── attention.py                   # Self-attention module
│   ├── model.py                       # GPT2 model architecture
│   ├── dataloader.py                  # DataLoader_1 class
│   └── training.py                    # train_nanogpt function
│
├── scripts/
│   ├── train.py                       # Entry point to start training
│   ├── evaluate.py                    # Run evaluation
│   └── generate.py                    # Generate text from trained model
│
├── evaluation/
│   ├── __init__.py
│   ├── hellaswag.py                   # HellaSwag dataset preparation
│   └── val_hellaswag.py               # HellaSwag scoring function
│
├── logs/
│   ├── log.txt                        # Training log file
│   └── model_xxxxx.pt                 # Checkpoint files
│
├── .gitignore
├── README.md
└── requirements.txt

โš™๏ธ Configuration

Edit configs/config.json to configure your model and training setup.

Example:

{
  "model": {
    "block_size": 1024,
    "vocab_size": 50304,
    "n_layer": 12,
    "n_head": 12,
    "n_embd": 768
  },
  "training": {
    "max_steps": 19073,
    "log_dir": "log",
    "total_batch_size": 524288,
    "B": 64,
    "T": 1024,
    "max_lr": 0.0006,
    "min_lr": 0.00006,
    "warmup_steps": 715,
    "weight_decay": 0.1,
    "learning_rate": 0.0006
  }
}
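
For orientation: total_batch_size is the number of tokens per optimizer step, so the gradient-accumulation factor follows from B, T, and the number of GPUs, while max_lr, min_lr, warmup_steps, and max_steps describe a warmup-then-cosine learning-rate schedule in the usual nanoGPT style. The snippet below is a minimal sketch of that interpretation; the authoritative logic lives in model_core/training.py and may differ in detail.

import math

# Values from configs/config.json (sketch; actual handling is in model_core/training.py).
total_batch_size = 524288   # tokens per optimizer step
B, T = 64, 1024             # micro-batch size and sequence length
world_size = 1              # number of GPUs (set by DDP at runtime)

# Gradient accumulation: micro-batches per optimizer step.
grad_accum_steps = total_batch_size // (B * T * world_size)   # 524288 / 65536 = 8

max_lr, min_lr = 6e-4, 6e-5
warmup_steps, max_steps = 715, 19073

def get_lr(step: int) -> float:
    """Typical nanoGPT-style schedule: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step > max_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)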

🚀 Training

To start training the model:

python scripts/train.py

This script internally loads train_nanogpt() from model_core/training.py using the config in configs/config.json.
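
Conceptually the entry point is thin: it parses the JSON config and hands it to train_nanogpt(). A minimal sketch of that flow is shown below; the actual call signature of train_nanogpt() is defined in model_core/training.py and may differ.

# Rough shape of scripts/train.py (sketch only; check model_core/training.py
# for the real train_nanogpt signature).
import json

from model_core.training import train_nanogpt

with open("configs/config.json") as f:
    config = json.load(f)

train_nanogpt(config["model"], config["training"])  # hypothetical signature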

Optional: Distributed Training

To run training across multiple GPUs using PyTorch DDP:

torchrun --nproc_per_node=NUM_GPUS scripts/train.py

Replace NUM_GPUS with the number of GPUs you want to use.
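
Under torchrun, each process receives RANK, LOCAL_RANK, and WORLD_SIZE environment variables; the training code is expected to initialize a process group and wrap the model in DistributedDataParallel. The snippet below is a minimal sketch of that setup, not the repository's exact code (the placeholder model stands in for the GPT2 class from model_core/model.py).

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets these environment variables for each spawned process.
ddp = int(os.environ.get("RANK", -1)) != -1
if ddp:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = f"cuda:{local_rank}"
    torch.cuda.set_device(device)
else:
    device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(8, 8)  # placeholder; the repo would build the GPT2 model here
model.to(device)
if ddp:
    model = DDP(model, device_ids=[local_rank])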


📊 Evaluation

To evaluate on HellaSwag:

python scripts/evaluate.py

Make sure the hellaswag_val.jsonl file is available under data/hellaswag/.
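
HellaSwag is typically scored by concatenating the context with each candidate ending, computing the model's per-token loss over the ending tokens only, and picking the ending with the lowest length-normalized loss. The sketch below illustrates that idea for a generic causal LM that returns logits of shape (1, T, vocab); the repository's own scoring lives in evaluation/val_hellaswag.py and may differ.

import torch
import torch.nn.functional as F

@torch.no_grad()
def score_example(model, ctx_tokens, ending_tokens_list, device="cpu"):
    """Return the index of the most likely ending (sketch of the usual HellaSwag scoring)."""
    losses = []
    for ending in ending_tokens_list:
        tokens = torch.tensor(ctx_tokens + ending, device=device)[None, :]  # (1, T)
        logits = model(tokens)                 # (1, T, vocab); adapt if the model returns (logits, loss)
        # Position t predicts token t+1; score only the ending tokens.
        shift_logits = logits[:, :-1, :]
        shift_targets = tokens[:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
            reduction="none",
        ).view(1, -1)
        ending_loss = loss[:, -len(ending):].mean()   # length-normalized loss over the ending
        losses.append(ending_loss.item())
    return min(range(len(losses)), key=losses.__getitem__)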


โœ๏ธ Text Generation

To generate text from a trained model:

python scripts/generate.py

Make sure to adjust the generation script to point to the correct checkpoint under the logs/ directory.
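
Generation follows the standard GPT-2 recipe: load a checkpoint from logs/, encode a prompt with the GPT-2 tokenizer, and sample autoregressively with top-k sampling. The sketch below assumes a tiktoken GPT-2 tokenizer and a model that returns logits of shape (1, T, vocab); the checkpoint format and model constructor in this repository may differ, so treat it as an outline rather than the script's actual code.

import torch
import torch.nn.functional as F
import tiktoken

# Assumption: GPT-2 BPE tokenizer via tiktoken; adjust to match the repo's tokenizer.
enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def generate(model, prompt, max_new_tokens=64, top_k=50, device="cpu"):
    tokens = torch.tensor(enc.encode(prompt), device=device)[None, :]
    for _ in range(max_new_tokens):
        logits = model(tokens)                       # (1, T, vocab); adapt if model returns (logits, loss)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        topk_probs, topk_idx = torch.topk(probs, top_k, dim=-1)
        next_tok = topk_idx.gather(-1, torch.multinomial(topk_probs, 1))
        tokens = torch.cat([tokens, next_tok], dim=1)
    return enc.decode(tokens[0].tolist())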


🧩 Requirements

Install required packages:

pip install -r requirements.txt

📌 Notes

  • Ensure your .npy sharded data is placed under data/edu_fineweb/.
  • The log directory and checkpoints will be saved in logs/.
  • The DataLoader_1 class handles distributed, sharded data loading.
  • Training supports bfloat16 autocasting for better efficiency (a minimal sketch follows this list).
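
For reference, this is roughly how a .npy token shard is turned into (B, T) input/target batches and run under bfloat16 autocast. It is a minimal sketch only, with a tiny stand-in model so it runs end to end; the real logic lives in model_core/dataloader.py (DataLoader_1) and model_core/training.py, and it assumes each shard is a flat array of token ids.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of shard-based batching plus bfloat16 autocast (not the DataLoader_1 implementation).
B, T, vocab_size = 2, 1024, 50304          # small B here just for the sketch
device = "cuda" if torch.cuda.is_available() else "cpu"

shard = np.load("data/edu_fineweb/train_000001.npy")   # assumed: 1-D array of token ids
tokens = torch.tensor(shard.astype(np.int64))

buf = tokens[: B * T + 1]
x = buf[:-1].view(B, T).to(device)   # inputs
y = buf[1:].view(B, T).to(device)    # targets, shifted by one position

# Stand-in model so the sketch runs; the repo uses the GPT2 class from model_core/model.py.
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size)).to(device)

with torch.autocast(device_type=device, dtype=torch.bfloat16):
    logits = model(x)                                           # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))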

📮 License

MIT License. Feel free to modify and build upon this for research or commercial use.


🙌 Acknowledgements

Inspired by Andrej Karpathy's nanoGPT. Special thanks to Andrej Karpathy's YouTube tutorials and the open-source AI community.
