---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- chess
- neuron
- aws-trainium
- vllm
- optimum-neuron
- continuous-batching
- sharded
base_model: karanps/ChessLM_Qwen3
---

# ChessLM Qwen3 - Neuron Traced (Sharded Model)

This is a **sharded version** of the Neuron-traced [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3), optimized for AWS Trainium (trn1) and Inferentia (inf2) instances using vLLM with **continuous batching enabled**. The model.pt file (16.4GB) has been split into **9 shards** of ~2GB each for easier downloading and storage.

## Model Details

- **Base Model**: Qwen3-8B fine-tuned for chess
- **Compilation**: optimum-neuron[vllm]==0.3.0
- **Compiler Version**: neuronxcc 2.21.33363.0
- **Target Hardware**: AWS Trainium (trn1) / Inferentia (inf2)
- **Precision**: BF16
- **Tensor Parallelism**: 2 cores
- **Batch Size**: 4 (continuous batching enabled)
- **Max Sequence Length**: 2048
- **Model Format**: Sharded (9 parts)

## Files

### Model Shards

- `model.shard0000.pt` through `model.shard0007.pt`: 2GB each
- `model.shard0008.pt`: 799MB (final shard)
- `model.shards.json`: Metadata with SHA256 hashes for verification
- `reconstruct.py`: Script to reconstruct the original model.pt

### Configuration Files

- `config.json`: Model configuration
- `neuron_config.json`: Neuron compilation settings
- Tokenizer files: `tokenizer.json`, `vocab.json`, `merges.txt`, etc.

## Usage

### Option 1: Reconstruct the Full Model

If you need the complete `model.pt` file:

```bash
# Clone the repository
git clone https://huggingface.co/kunhunjon/ChessLM_Qwen3_Trainium_Sharded
cd ChessLM_Qwen3_Trainium_Sharded

# Reconstruct the original model.pt
python3 reconstruct.py

# This creates model.pt (16.4GB) from the shards
```

### Option 2: Use Directly with optimum-neuron

The model can be loaded directly, without reconstruction:

```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

# Load the model (shards are handled automatically if needed)
model = NeuronModelForCausalLM.from_pretrained("kunhunjon/ChessLM_Qwen3_Trainium_Sharded")
tokenizer = AutoTokenizer.from_pretrained("kunhunjon/ChessLM_Qwen3_Trainium_Sharded")

# Run inference
prompt = "e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

## Requirements

```bash
pip install "optimum-neuron[vllm]==0.3.0"
pip install neuronx-distributed --extra-index-url=https://pip.repos.neuron.amazonaws.com
```

## Hardware Requirements

- AWS Trainium (trn1.32xlarge, trn1.2xlarge) or Inferentia (inf2) instances
- At least 2 Neuron cores (as configured during tracing)
- Minimum 32GB RAM recommended

## Sharding Details

The model was sharded using a custom script that:

- Splits the 16.4GB model.pt into 9 chunks of ~2GB each
- Generates a SHA256 hash for each shard for integrity verification
- Includes a reconstruction script to reassemble the original file
- Preserves all original model functionality

### Verification

The `model.shards.json` file contains SHA256 hashes for each shard. The reconstruction script verifies these hashes automatically when reassembling the model.
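For reference, the sketch below shows the kind of logic the reconstruction step performs. It is illustrative only: the exact `model.shards.json` schema assumed here (a `shards` list with `file` and `sha256` fields) may differ from the actual metadata layout, and the bundled `reconstruct.py` remains the authoritative implementation.

```python
import hashlib
import json
from pathlib import Path

# Illustrative only -- the bundled reconstruct.py is authoritative. The
# model.shards.json schema assumed here ("shards" list with "file" and
# "sha256" fields) is an assumption for this sketch.
repo = Path(".")
meta = json.loads((repo / "model.shards.json").read_text())

with open(repo / "model.pt", "wb") as out:
    for shard in meta["shards"]:
        sha = hashlib.sha256()
        with open(repo / shard["file"], "rb") as f:
            # Stream each ~2GB shard in 64MB chunks to keep memory bounded
            while chunk := f.read(64 * 1024 * 1024):
                sha.update(chunk)
                out.write(chunk)
        if sha.hexdigest() != shard["sha256"]:
            raise ValueError(f"checksum mismatch for {shard['file']}")

print("model.pt reconstructed; all shard checksums verified")
```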
## Continuous Batching

This model is compiled with **continuous batching enabled**, which allows vLLM to:

- Process multiple requests simultaneously with dynamic batch sizes up to 4
- Optimize throughput by batching requests with different sequence lengths
- Reduce latency for concurrent inference workloads

**Note**: On-device sampling is disabled due to a known Neuron runtime limitation when using tensor parallelism with 2 cores. Sampling is handled on the host instead.

## Compilation Details

- `batch_size=4`
- `sequence_length=2048`
- `num_cores=2`
- `auto_cast_type="bf16"`
- `continuous_batching=True`
- Total compilation time: ~8.1 minutes

## License

This model inherits its license from the base model [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3).

## Citation

If you use this model, please cite the original ChessLM model and the AWS Neuron tools.
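## Example: Serving with vLLM (Sketch)

As a concrete illustration of the continuous-batching setup described above, the sketch below uses vLLM's offline `LLM` API. It is a hedged example, not a verified recipe: it assumes the `optimum-neuron[vllm]` plugin registers the Neuron platform with vLLM, and the engine arguments simply mirror the compiled configuration (`max_num_seqs=4`, `max_model_len=2048`, `tensor_parallel_size=2`); exact behavior can vary across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Sketch of offline serving with continuous batching on Neuron. Assumes the
# optimum-neuron[vllm]==0.3.0 plugin makes the Neuron platform visible to
# vLLM; the engine arguments mirror the compiled configuration above.
llm = LLM(
    model="kunhunjon/ChessLM_Qwen3_Trainium_Sharded",
    max_num_seqs=4,          # matches the compiled batch size
    max_model_len=2048,      # matches the compiled sequence length
    tensor_parallel_size=2,  # matches num_cores used during tracing
)

# Prompts of different lengths are batched dynamically by the scheduler;
# sampling runs on the host, per the note in the Continuous Batching section
prompts = ["e2e4", "e2e4 e7e5 g1f3", "d2d4 d7d5 c2c4 e7e6 b1c3 g8f6"]
params = SamplingParams(temperature=0.7, max_tokens=20)

for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text)
```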