Searchless Chess 9M Self-Play

A 9-million-parameter, transformer-based chess engine trained via self-play with Stockfish evaluation. This model learns to play chess without explicit search at inference time, relying purely on learned pattern recognition.

Model Description

  • Model Size: 9M parameters (8 layers, 256 embedding dim, 8 attention heads)
  • Architecture: Decoder-only Transformer with learned positional encodings
  • Training Method: Self-play with Stockfish rewards
  • Framework: JAX + Haiku
  • Q-Value Distribution: 128 return buckets for action-value prediction

This model predicts action-values (Q-values) for chess positions without performing tree search, making it extremely fast for inference while maintaining strong play.
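The 128 return buckets are a discretized value distribution; at inference time a scalar Q-value can be read off as the expectation of the predicted bucket probabilities. The snippet below is a minimal NumPy sketch of that idea, assuming uniform buckets over a [0, 1] win-probability range; the bundled code may use a different bucketing scheme and value range.

import numpy as np

# Conceptual sketch only (not the bundled implementation): with 128 uniform
# return buckets, the scalar Q-value is the expectation of the predicted
# bucket distribution over the bucket-center values.
num_buckets = 128
bucket_edges = np.linspace(0.0, 1.0, num_buckets + 1)
bucket_values = (bucket_edges[:-1] + bucket_edges[1:]) / 2  # bucket centers

# bucket_probs would come from the model's softmax over its 128 outputs;
# a uniform placeholder is used here for illustration.
bucket_probs = np.full(num_buckets, 1.0 / num_buckets)
q_value = float(np.dot(bucket_probs, bucket_values))
print(q_value)  # 0.5 for the uniform placeholder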

Installation

CPU Installation

Install the required dependencies for CPU inference:

pip install jax jaxlib dm-haiku orbax-checkpoint numpy chess huggingface-hub jaxtyping apache-beam grain

GPU Installation (Recommended)

For GPU acceleration with CUDA 12:

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install dm-haiku orbax-checkpoint numpy chess huggingface-hub jaxtyping apache-beam grain

For other CUDA versions, see the JAX installation guide.
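To confirm that the GPU build of JAX was picked up, list the visible devices:

import jax

# A successful CUDA install reports a GPU device here; a CPU-only install
# falls back to a single CPU device.
print(jax.devices())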

Note: This model includes all necessary code and can be used without cloning the original repository.

Quick Start

import sys
from huggingface_hub import snapshot_download

# Download model from HuggingFace Hub
model_path = snapshot_download(
    repo_id="dbest-isi/searchless-chess-9M-selfplay",
    local_dir="./searchless_chess_model"
)

# Add bundled code to Python path
sys.path.insert(0, f"{model_path}/searchless_chess_code")

# Import model wrapper
import hf_model

# Load the model
model = hf_model.SearchlessChessModel.from_pretrained(model_path)

# Make a prediction
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
result = model.predict(fen, temperature=1.0)

print(f"Best move: {result['best_move']}")
print(f"Q-value: {result['q_value']:.4f}")
print(f"Action probabilities shape: {result['action_probs'].shape}")

Example Output

Best move: e7e5
Q-value: 0.0119
Action probabilities shape: (1968,)
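The best_move field is a plain UCI string, so it can be converted to SAN or applied to a board with the python-chess package installed above (a small illustrative follow-up, not part of the bundled wrapper):

import chess

board = chess.Board(fen)                         # position from the Quick Start example
move = chess.Move.from_uci(result["best_move"])
print(board.san(move))                           # e.g. "e5" for the UCI move e7e5
board.push(move)                                 # apply the move to the board
print(board.fen())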

Full Example with Multiple Positions

import sys
from huggingface_hub import snapshot_download

# Download and setup
model_path = snapshot_download(
    repo_id="dbest-isi/searchless-chess-9M-selfplay",
    local_dir="./searchless_chess_model"
)
sys.path.insert(0, f"{model_path}/searchless_chess_code")

import hf_model

# Load model
print("Loading model...")
model = hf_model.SearchlessChessModel.from_pretrained(model_path)
print("Model loaded!")

# Test on multiple positions
positions = [
    ("Starting position", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"),
    ("After 1.e4", "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"),
    ("Scandinavian Defense", "rnbqkbnr/ppp1pppp/8/3p4/4P3/8/PPPP1PPP/RNBQKBNR w KQkq d6 0 2"),
]

for name, fen in positions:
    result = model.predict(fen)
    print(f"\n{name}")
    print(f"  FEN: {fen}")
    print(f"  Best move: {result['best_move']}")
    print(f"  Q-value: {result['q_value']:.4f}")

Model Architecture

TransformerConfig(
    vocab_size=1968,
    output_size=128,
    embedding_dim=256,
    num_layers=8,
    num_heads=8,
    max_sequence_length=79,
    num_return_buckets=128,
    pos_encodings="LEARNED",
    apply_post_ln=True,
    apply_qk_layernorm=False,
    use_causal_mask=False,
)

Training Details

  • Base Model: Initialized from pretrained 9M checkpoint
  • Training Method: Self-play reinforcement learning
  • Reward Signal: Stockfish evaluation at depth 20
  • Iteration: 22 (EMA parameters)
  • Action Space: 1968 possible moves (every UCI move that is legal in at least one position)
  • Value Representation: Discretized into 128 buckets (see the sketch below)
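The following is a hedged sketch of how a Stockfish score could be mapped into one of the 128 value buckets. The exact centipawn-to-value conversion used during training is not documented in this card, so the logistic scaling below is only illustrative.

import numpy as np

# Illustrative only: an assumed logistic centipawn-to-win-probability mapping,
# followed by discretization into 128 uniform buckets over [0, 1].
def centipawns_to_win_prob(cp: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-cp / 400.0))

def win_prob_to_bucket(p: float, num_buckets: int = 128) -> int:
    edges = np.linspace(0.0, 1.0, num_buckets + 1)
    return int(np.clip(np.searchsorted(edges, p, side="right") - 1, 0, num_buckets - 1))

print(win_prob_to_bucket(centipawns_to_win_prob(35)))  # bucket index for a +0.35 pawn eval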

Use Cases

  • Fast chess move prediction without search
  • Chess position evaluation
  • Research on learned planning in board games
  • Integration into chess applications requiring low-latency move suggestions

Limitations

  • Does not perform explicit search (unlike traditional chess engines)
  • May make suboptimal moves in complex tactical positions
  • Performance depends on training data distribution
  • Best suited for fast move suggestions rather than deep analysis

Background

This model is based on the architecture from DeepMind's Searchless Chess work. The self-play training implementation and this trained model are original work by Darrell Best.

For the full self-play training implementation and codebase, visit:

License

Apache 2.0

Model Card Contact

For questions or issues, please open an issue on the GitHub repository.
