---
language:
- en
tags:
- chess
- gpt
- transformers
- text-generation
license: mit
datasets:
- lichess
pipeline_tag: text-generation
library_name: transformers
base_model: gpt2
---

# Chess GPT-4.5M

## Overview

Chess GPT-4.5M is a generative language model trained specifically to generate chess moves and analyze chess games. The model is based on the GPT architecture and was trained with a custom 32-token vocabulary covering the symbols of chess notation.

## Model Details

- **Architecture:** GPT-based language model (`GPT2LMHeadModel`)
- **Parameters:** Approximately 4.5M parameters
- **Layers:** 8 transformer layers
- **Heads:** 4 attention heads per layer
- **Embedding Dimension:** 256
- **Training Sequence Length:** 1024 tokens per chess game
- **Vocabulary:** 32 tokens (custom vocabulary)
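
For reference, these hyperparameters map onto a `GPT2Config` roughly as in the sketch below. The five listed values come from this card; every other field is left at the `transformers` library default, which may differ from the actual model:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of a config matching the hyperparameters above; fields not
# listed in this card are left at the transformers library defaults.
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")
```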

## Training Data

The model was trained on tokenized chess game data prepared from the [Lichess dataset](https://huggingface.co/datasets/lichess). The preparation process, sketched in code after this list, involved:

- Tokenizing chess games using a custom 32-token vocabulary.
- Creating binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.
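
A minimal sketch of this nanoGPT-style preparation follows. The output file names come from this card; the input file name, the character-level tokenization, and the 90/10 split are assumptions:

```python
import pickle
import numpy as np

# Hypothetical input: one long string of concatenated chess games.
games = open("lichess_games.txt").read()

# Build the character-level vocabulary (32 symbols for this model).
chars = sorted(set(games))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

# Encode and write the binary training/validation files.
ids = np.array([stoi[c] for c in games], dtype=np.uint16)
split = int(0.9 * len(ids))  # assumed 90/10 train/val split
ids[:split].tofile("train.bin")
ids[split:].tofile("val.bin")

# Save vocabulary information alongside the data.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```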

## Training Configuration

The training configuration, found in `config/mac_chess_gpt.py` and sketched after this list, includes:

- **Dataset:** lichess_hf_dataset
- **Batch Size:** 2 (small, to fit the Mac's memory constraints)
- **Block Size:** 1023 (each 1024-token game yields a 1023-token input with next-token targets shifted by one)
- **Learning Rate:** 3e-4
- **Max Iterations:** 140,000
- **Device:** `mps` (Mac-specific setting)
- **Other Settings:** no dropout, and compilation disabled (`compile = False`) for Mac compatibility
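
Put together, the file plausibly looks like the following sketch; the values come from this card, while the variable names follow nanoGPT conventions and are assumptions:

```python
# config/mac_chess_gpt.py -- reconstructed from the values above;
# variable names follow nanoGPT conventions and are assumptions.
dataset = "lichess_hf_dataset"
batch_size = 2        # small batch for Mac memory constraints
block_size = 1023     # 1024-token games with targets shifted by one
learning_rate = 3e-4
max_iters = 140000
dropout = 0.0         # no dropout
device = "mps"        # Apple-silicon GPU backend
compile = False       # torch.compile disabled for Mac compatibility

# Model shape (from the Model Details section)
n_layer = 8
n_head = 4
n_embd = 256
```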

## How to Use

### Generating Chess Moves

After training, use the sampling script to generate chess moves. Example commands:

Sample from the model without a provided prompt:

```bash
python sample.py --out_dir=out-chess-mac
```

Generate a chess game sequence starting with a custom prompt:

```bash
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```

### Loading the Model in Transformers

Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

_Note:_ The tokenizer uses a custom vocabulary provided in `vocab.json`.
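
From there, generation follows the standard `transformers` API. A sketch, reusing the `;1.e4` prompt format from the sampling example above (the sampling settings here are illustrative, not the ones used upstream):

```python
import torch

# Assumes `model` and `tokenizer` from the snippet above.
inputs = tokenizer(";1.e4", return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,  # length of the continuation to sample
        do_sample=True,     # sample rather than greedy-decode
        top_k=10,           # illustrative sampling setting
    )
print(tokenizer.decode(output[0]))
```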

## Intended Use

The model is intended for:

- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational purposes: understanding how language models are trained on specialized domains.

## Limitations

- At roughly 4.5M parameters, the model is small and may not capture complex chess strategy.
- It is specialized for chess move generation and may not generalize to standard language tasks.

## Training Process Summary

1. **Data Preparation:** Tokenized the Lichess chess game dataset using a 32-token vocabulary.
2. **Model Training:** Trained with the custom configuration in `config/mac_chess_gpt.py`.
3. **Model Conversion:** Converted the trained checkpoint `out-chess-mac/ckpt.pt` into a Hugging Face-compatible format with `convert_to_hf.py`.
4. **Repository Setup:** Pushed the converted model files (including the custom tokenizer vocabulary) to the Hugging Face Hub, with Git LFS handling large files.
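
As a rough illustration of step 3, the conversion amounts to loading the nanoGPT checkpoint and re-saving it in Hugging Face format. The sketch below assumes the checkpoint stores its hyperparameters under a `model_args` key (nanoGPT's usual layout), and it elides the per-tensor weight mapping that the real `convert_to_hf.py` performs:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Load the nanoGPT checkpoint; the "model_args" key is an assumption
# based on nanoGPT's checkpoint layout.
ckpt = torch.load("out-chess-mac/ckpt.pt", map_location="cpu")
args = ckpt["model_args"]

config = GPT2Config(
    vocab_size=args["vocab_size"],
    n_positions=args["block_size"],
    n_embd=args["n_embd"],
    n_layer=args["n_layer"],
    n_head=args["n_head"],
)
model = GPT2LMHeadModel(config)

# ... here the real script copies weights from ckpt["model"] into
# `model`, transposing nanoGPT's Linear weights where HF uses Conv1D ...

model.save_pretrained("chess-gpt-4.5M-hf")  # output directory (illustrative)
```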

## Acknowledgements

This model was inspired by [GPT-2](https://openai.com/blog/better-language-models/) and adapted for the chess domain.

---