---
language:
- en
tags:
- chess
- gpt
- transformers
- text-generation
license: mit
datasets:
- lichess
pipeline_tag: text-generation
library_name: transformers
base_model: gpt2
---
# Chess GPT-4.5M
## Overview
Chess GPT-4.5M is a language model trained specifically to generate chess move sequences and support automated analysis of chess games. The model is based on the GPT architecture and uses a custom 32-token vocabulary covering the symbols and notation needed to record chess moves.
## Model Details
- **Architecture:** GPT-based language model (GPT2LMHeadModel)
- **Parameters:** Approximately 4.5M parameters
- **Layers:** 8 transformer layers
- **Heads:** 4 attention heads per layer
- **Embedding Dimension:** 256
- **Training Sequence Length:** 1024 tokens per chess game
- **Vocabulary:** 32 tokens (custom vocabulary)
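
For reference, an architecture with these dimensions can be expressed with `transformers`. The sketch below only mirrors the figures listed above; it is not the training code, and the exact parameter count depends on implementation details:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Minimal sketch of the architecture described above (not the training code itself).
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```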
## Training Data
The model was trained on tokenized chess game data prepared from the [Lichess dataset](https://huggingface.co/datasets/lichess). The preparation process (sketched after this list) involved:
- Tokenizing chess games using a custom 32-token vocabulary.
- Creating binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.
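
A rough sketch of this preparation pipeline is shown below; the toy games, vocabulary construction, and helper names are illustrative assumptions, not the repository's actual data-prep script:

```python
import pickle
import numpy as np

# Toy stand-ins for the real Lichess games (illustrative only).
games = [";1.e4 e5 2.Nf3 Nc6 3.Bb5 a6", ";1.d4 d5 2.c4 e6 3.Nc3 Nf6"]

# Build a small character-level vocabulary over the symbols in the games.
chars = sorted(set("".join(games)))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(game: str) -> list[int]:
    return [stoi[ch] for ch in game]

# Concatenate token ids and split into binary train/val files.
ids = np.array([t for g in games for t in encode(g)], dtype=np.uint16)
split = int(0.9 * len(ids))
ids[:split].tofile("train.bin")
ids[split:].tofile("val.bin")

# Save vocabulary metadata so sampling can decode model output.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```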
## Training Configuration
The training configuration, found in `config/mac_chess_gpt.py`, includes the following settings (a sketch follows the list):
- **Dataset:** lichess_hf_dataset
- **Batch Size:** 2 (optimized for Mac's memory constraints)
- **Block Size:** 1023 (1024 including the positional embedding)
- **Learning Rate:** 3e-4
- **Max Iterations:** 140,000
- **Device:** `mps` (Apple Metal backend for Mac)
- **Other Settings:** No dropout and compile set to False for Mac compatibility
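
Written out as a nanoGPT-style config file, those settings might look roughly like this; the variable names follow nanoGPT conventions and are an assumption about `config/mac_chess_gpt.py`, not its verbatim contents:

```python
# Hypothetical sketch of config/mac_chess_gpt.py; the real file may differ.
out_dir = "out-chess-mac"
dataset = "lichess_hf_dataset"

batch_size = 2        # small batch for Mac memory constraints
block_size = 1023     # context length used during training

n_layer = 8
n_head = 4
n_embd = 256
dropout = 0.0         # no dropout

learning_rate = 3e-4
max_iters = 140_000

device = "mps"        # Apple Metal backend
compile = False       # torch.compile disabled for Mac compatibility
```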
## How to Use
### Generating Chess Moves
After training, use the generation script to sample chess moves. Example commands:

```bash
# Sample from the model without a provided prompt
python sample.py --out_dir=out-chess-mac

# Generate a chess game sequence starting with a custom prompt
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```
### Loading the Model in Transformers
Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```
_Note:_ The tokenizer uses a custom vocabulary provided in `vocab.json`.
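
After loading, generation works like any other causal language model in `transformers`. A minimal sketch (the `;1.e4` prompt mirrors the sampling example above; the sampling parameters are illustrative):

```python
import torch

inputs = tokenizer(";1.e4", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,  # number of additional tokens to sample
        do_sample=True,     # sample instead of greedy decoding
        temperature=0.8,
    )
print(tokenizer.decode(output_ids[0]))
```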
## Intended Use
The model is intended for:
- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational purposes in understanding language model training on specialized domains.
## Limitations
- The model uses a relatively small architecture (about 4.5M parameters) and may not capture complex chess strategy.
- It is specialized for chess move generation and does not generalize to general-purpose language tasks.
## Training Process Summary
1. **Data Preparation:** Tokenized the Lichess chess game dataset using a 32-token vocabulary.
2. **Model Training:** Used custom training configurations specified in `config/mac_chess_gpt.py`.
3. **Model Conversion:** Converted the trained checkpoint `out-chess-mac/ckpt.pt` into a Hugging Face compatible format with `convert_to_hf.py` (see the sketch below).
4. **Repository Setup:** Pushed the converted model files (including custom tokenizer vocab) to the Hugging Face Hub with Git LFS handling large files.
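
For step 3, a conversion script along these lines would typically rebuild the architecture as a `GPT2LMHeadModel`, copy the checkpoint weights into it, and call `save_pretrained`. The sketch below is an assumption about that workflow, not the contents of `convert_to_hf.py`; in particular, the state-dict key mapping depends on the training code and is omitted:

```python
# Hypothetical conversion sketch; convert_to_hf.py may differ in its details.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

ckpt = torch.load("out-chess-mac/ckpt.pt", map_location="cpu")
args = ckpt["model_args"]  # assumes a nanoGPT-style checkpoint layout

config = GPT2Config(
    vocab_size=args["vocab_size"],
    n_positions=args["block_size"],
    n_embd=args["n_embd"],
    n_layer=args["n_layer"],
    n_head=args["n_head"],
)
model = GPT2LMHeadModel(config)

# The real script would map checkpoint parameter names (and any transposed
# weights) onto the Hugging Face state dict here before loading them.

model.save_pretrained("chess-gpt-4.5M-hf")  # directory ready to push to the Hub
```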
## Acknowledgements
This model was developed following inspiration from [GPT-2](https://openai.com/blog/better-language-models/) and adapted for the chess domain.
---