Searchless Chess 9M Self-Play

A 9-million-parameter, transformer-based chess engine trained via self-play with Stockfish evaluation. This model learns to play chess without explicit search at inference time, relying purely on learned pattern recognition.

Model Description

  • Model Size: 9M parameters (8 layers, 256 embedding dim, 8 attention heads)
  • Architecture: Decoder-only Transformer with learned positional encodings
  • Training Method: Self-play with Stockfish rewards
  • Framework: JAX + Haiku
  • Q-Value Distribution: 128 return buckets for action-value prediction

This model predicts action-values (Q-values) for chess positions without performing tree search, making it extremely fast for inference while maintaining strong play.
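The 128 return buckets are a discretized value distribution; at inference time a scalar Q-value can be read off as the expectation of the predicted bucket probabilities. The snippet below is a minimal NumPy sketch of that idea, assuming uniform buckets over a [0, 1] win-probability range; the bundled code may use a different bucketing scheme and value range.

import numpy as np

# Conceptual sketch only (not the bundled implementation): with 128 uniform
# return buckets, the scalar Q-value is the expectation of the predicted
# bucket distribution over the bucket-center values.
num_buckets = 128
bucket_edges = np.linspace(0.0, 1.0, num_buckets + 1)
bucket_values = (bucket_edges[:-1] + bucket_edges[1:]) / 2  # bucket centers

# bucket_probs would come from the model's softmax over its 128 outputs;
# a uniform placeholder is used here for illustration.
bucket_probs = np.full(num_buckets, 1.0 / num_buckets)
q_value = float(np.dot(bucket_probs, bucket_values))
print(q_value)  # 0.5 for the uniform placeholder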

Installation

CPU Installation

Install the required dependencies for CPU inference:

pip install jax jaxlib dm-haiku orbax-checkpoint numpy chess huggingface-hub jaxtyping apache-beam grain

GPU Installation (Recommended)

For GPU acceleration with CUDA 12:

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install dm-haiku orbax-checkpoint numpy chess huggingface-hub jaxtyping apache-beam grain

For other CUDA versions, see the JAX installation guide.
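To confirm that the GPU build of JAX was picked up, list the visible devices:

import jax

# A successful CUDA install reports a GPU device here; a CPU-only install
# falls back to a single CPU device.
print(jax.devices())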

Note: This model includes all necessary code and can be used without cloning the original repository.

Quick Start

import sys
from huggingface_hub import snapshot_download

# Download model from HuggingFace Hub
model_path = snapshot_download(
    repo_id="dbest-isi/searchless-chess-9M-selfplay",
    local_dir="./searchless_chess_model"
)

# Add bundled code to Python path
sys.path.insert(0, f"{model_path}/searchless_chess_code")

# Import model wrapper
import hf_model

# Load the model
model = hf_model.SearchlessChessModel.from_pretrained(model_path)

# Make a prediction
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
result = model.predict(fen, temperature=1.0)

print(f"Best move: {result['best_move']}")
print(f"Q-value: {result['q_value']:.4f}")
print(f"Action probabilities shape: {result['action_probs'].shape}")

Example Output

Best move: e7e5
Q-value: 0.0119
Action probabilities shape: (1968,)
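The best_move field is a plain UCI string, so it can be converted to SAN or applied to a board with the python-chess package installed above (a small illustrative follow-up, not part of the bundled wrapper):

import chess

board = chess.Board(fen)                         # position from the Quick Start example
move = chess.Move.from_uci(result["best_move"])
print(board.san(move))                           # e.g. "e5" for the UCI move e7e5
board.push(move)                                 # apply the move to the board
print(board.fen())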

Full Example with Multiple Positions

import sys
from huggingface_hub import snapshot_download

# Download and setup
model_path = snapshot_download(
    repo_id="dbest-isi/searchless-chess-9M-selfplay",
    local_dir="./searchless_chess_model"
)
sys.path.insert(0, f"{model_path}/searchless_chess_code")

import hf_model

# Load model
print("Loading model...")
model = hf_model.SearchlessChessModel.from_pretrained(model_path)
print("Model loaded!")

# Test on multiple positions
positions = [
    ("Starting position", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"),
    ("After 1.e4", "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"),
    ("Scandinavian Defense", "rnbqkbnr/ppp1pppp/8/3p4/4P3/8/PPPP1PPP/RNBQKBNR w KQkq d6 0 2"),
]

for name, fen in positions:
    result = model.predict(fen)
    print(f"\n{name}")
    print(f"  FEN: {fen}")
    print(f"  Best move: {result['best_move']}")
    print(f"  Q-value: {result['q_value']:.4f}")

Model Architecture

TransformerConfig(
    vocab_size=1968,
    output_size=128,
    embedding_dim=256,
    num_layers=8,
    num_heads=8,
    max_sequence_length=79,
    num_return_buckets=128,
    pos_encodings="LEARNED",
    apply_post_ln=True,
    apply_qk_layernorm=False,
    use_causal_mask=False,
)

Training Details

  • Base Model: Initialized from pretrained 9M checkpoint
  • Training Method: Self-play reinforcement learning
  • Reward Signal: Stockfish evaluation at depth 20
  • Iteration: 22 (EMA parameters)
  • Action Space: 1968 possible moves (every UCI move that is legal in at least one position)
  • Value Representation: Discretized into 128 buckets (see the sketch below)
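The following is a hedged sketch of how a Stockfish score could be mapped into one of the 128 value buckets. The exact centipawn-to-value conversion used during training is not documented in this card, so the logistic scaling below is only illustrative.

import numpy as np

# Illustrative only: an assumed logistic centipawn-to-win-probability mapping,
# followed by discretization into 128 uniform buckets over [0, 1].
def centipawns_to_win_prob(cp: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-cp / 400.0))

def win_prob_to_bucket(p: float, num_buckets: int = 128) -> int:
    edges = np.linspace(0.0, 1.0, num_buckets + 1)
    return int(np.clip(np.searchsorted(edges, p, side="right") - 1, 0, num_buckets - 1))

print(win_prob_to_bucket(centipawns_to_win_prob(35)))  # bucket index for a +0.35 pawn eval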

Use Cases

  • Fast chess move prediction without search
  • Chess position evaluation
  • Research on learned planning in board games
  • Integration into chess applications requiring low-latency move suggestions

Limitations

  • Does not perform explicit search (unlike traditional chess engines)
  • May make suboptimal moves in complex tactical positions
  • Performance depends on training data distribution
  • Best suited for fast move suggestions rather than deep analysis

Background

This model is based on the architecture from DeepMind's Searchless Chess work. The self-play training implementation and this trained model are original work by Darrell Best.

For the full self-play training implementation and codebase, visit:

License

Apache 2.0

Model Card Contact

For questions or issues, please open an issue on the GitHub repository.
