kunhunjon committed · verified
Commit d3b192b · 1 Parent(s): a534acd

Upload sharded model (9x2GB shards, continuous batching, neuronxcc 2.21)
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - chess
+ - neuron
+ - aws-trainium
+ - vllm
+ - optimum-neuron
+ - continuous-batching
+ - sharded
+ base_model: karanps/ChessLM_Qwen3
+ ---
+
+ # ChessLM Qwen3 - Neuron Traced (Sharded Model)
+
+ This is a **sharded version** of the Neuron-traced [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3), optimized for AWS Trainium (trn1) and Inferentia (inf2) instances using vLLM with **continuous batching enabled**.
+
+ The model.pt file (16.4GB) has been split into **9 shards** of ~2GB each for easier downloading and storage.
+
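+ To fetch all of the shards without `git`, you can use the `huggingface_hub` client. A minimal sketch (the local directory name is an arbitrary choice):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads every file in the repo, including the LFS shards.
+ snapshot_download(
+     repo_id="kunhunjon/ChessLM_Qwen3_Trainium_Sharded",
+     local_dir="ChessLM_Qwen3_Trainium_Sharded",  # arbitrary local path
+ )
+ ```
+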
+ ## Model Details
+
+ - **Base Model**: Qwen3-8B fine-tuned for chess (the architecture in `config.json` — 36 layers, hidden size 4096 — and the 16.4GB BF16 checkpoint correspond to the 8B variant)
+ - **Compilation**: optimum-neuron[vllm]==0.3.0
+ - **Compiler Version**: neuronxcc 2.21.33363.0
+ - **Target Hardware**: AWS Trainium (trn1) / Inferentia (inf2)
+ - **Precision**: BF16
+ - **Tensor Parallelism**: 2 cores
+ - **Batch Size**: 4 (continuous batching enabled)
+ - **Max Sequence Length**: 2048
+ - **Model Format**: Sharded (9 parts)
+
+ ## Files
+
+ ### Model Shards
+ - `model.shard0000.pt` through `model.shard0007.pt`: 2GB each
+ - `model.shard0008.pt`: 799MB (final shard)
+ - `model.shards.json`: Metadata with SHA256 hashes for verification
+ - `reconstruct.py`: Script to reconstruct the original model.pt
+
+ ### Configuration Files
+ - `config.json`: Model configuration
+ - `neuron_config.json`: Neuron compilation settings
+ - Tokenizer files: `tokenizer.json`, `vocab.json`, `merges.txt`, etc.
+
+ ## Usage
+
+ ### Option 1: Reconstruct the Full Model
+
+ If you need the complete `model.pt` file:
+
+ ```bash
+ # Clone the repository (git-lfs is required to fetch the shard files)
+ git clone https://huggingface.co/kunhunjon/ChessLM_Qwen3_Trainium_Sharded
+ cd ChessLM_Qwen3_Trainium_Sharded
+
+ # Reconstruct the original model.pt
+ python3 reconstruct.py
+
+ # This will create model.pt (16.4GB) from the shards
+ ```
+
+ ### Option 2: Use Directly with optimum-neuron
+
+ The model can be loaded directly without reconstruction:
+
+ ```python
+ from optimum.neuron import NeuronModelForCausalLM
+ from transformers import AutoTokenizer
+
+ # Load the model (will handle shards automatically if needed)
+ model = NeuronModelForCausalLM.from_pretrained("kunhunjon/ChessLM_Qwen3_Trainium_Sharded")
+ tokenizer = AutoTokenizer.from_pretrained("kunhunjon/ChessLM_Qwen3_Trainium_Sharded")
+
+ # Run inference
+ prompt = "e2e4"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=20)
+ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(result)
+ ```
+
+ ## Requirements
+
+ ```bash
+ pip install "optimum-neuron[vllm]==0.3.0"
+ pip install neuronx-distributed --extra-index-url=https://pip.repos.neuron.amazonaws.com
+ ```
+
+ ## Hardware Requirements
+
+ - AWS Trainium (trn1.32xlarge, trn1.2xlarge) or Inferentia (inf2) instances
+ - At least 2 Neuron cores (as configured during tracing)
+ - Minimum 32GB RAM recommended
+
+ ## Sharding Details
+
+ The model was sharded using a custom script (sketched below) that:
+ - Splits the 16.4GB model.pt into 9 chunks of ~2GB each
+ - Generates SHA256 hashes for each shard for integrity verification
+ - Includes a reconstruction script to reassemble the original file
+ - Preserves all original model functionality
+
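+ The sharding script itself is not part of this repo; a minimal sketch of the approach, consistent with the layout recorded in `model.shards.json` (2000MB chunks, one SHA256 per shard), might look like this:
+
+ ```python
+ import hashlib
+ import json
+ from pathlib import Path
+
+ def shard_file(path="model.pt", shard_size_mb=2000):
+     """Split `path` into fixed-size chunks and record a hash for each."""
+     path = Path(path)
+     chunk_size = shard_size_mb * 1024 * 1024
+     shards = []
+     with open(path, "rb") as f_in:
+         index = 0
+         while True:
+             chunk = f_in.read(chunk_size)
+             if not chunk:
+                 break
+             filename = f"{path.stem}.shard{index:04d}{path.suffix}"
+             Path(filename).write_bytes(chunk)
+             shards.append({
+                 "index": index,
+                 "filename": filename,
+                 "size": len(chunk),
+                 "sha256": hashlib.sha256(chunk).hexdigest(),
+             })
+             index += 1
+     metadata = {
+         "original_file": path.name,
+         "file_size": path.stat().st_size,
+         "shard_size_mb": shard_size_mb,
+         "num_shards": len(shards),
+         "shards": shards,
+     }
+     Path(f"{path.stem}.shards.json").write_text(json.dumps(metadata, indent=2))
+
+ shard_file("model.pt")
+ ```
+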
+ ### Verification
+
+ The `model.shards.json` file contains SHA256 hashes for each shard, and the reconstruction script verifies them automatically while reassembling the model. You can also check the shards by hand, as shown below.
+
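+ A short pre-flight check along these lines verifies every shard without writing `model.pt` (assuming the shards and `model.shards.json` sit in the current directory):
+
+ ```python
+ import hashlib
+ import json
+ from pathlib import Path
+
+ # Compare each shard on disk against the hash recorded in the metadata.
+ metadata = json.loads(Path("model.shards.json").read_text())
+ for shard in metadata["shards"]:
+     digest = hashlib.sha256(Path(shard["filename"]).read_bytes()).hexdigest()
+     print(shard["filename"], "ok" if digest == shard["sha256"] else "HASH MISMATCH")
+ ```
+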
+ ## Continuous Batching
+
+ This model is compiled with **continuous batching enabled**, which allows vLLM to:
+ - Process multiple requests simultaneously with dynamic batch sizes up to 4
+ - Optimize throughput by batching requests with different sequence lengths
+ - Reduce latency for concurrent inference workloads
+
+ **Note**: On-device sampling is disabled due to a known Neuron runtime limitation when using tensor parallelism with 2 cores. Sampling is handled on the host instead.
+
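+ As a rough illustration, serving this model through vLLM's offline entry point could look like the sketch below. This is an untested outline: the `device` argument and the Neuron plugin wiring differ across vLLM and optimum-neuron versions.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Values mirror the compilation settings documented above.
+ llm = LLM(
+     model="kunhunjon/ChessLM_Qwen3_Trainium_Sharded",
+     max_num_seqs=4,          # matches batch_size=4
+     max_model_len=2048,      # matches sequence_length=2048
+     tensor_parallel_size=2,  # matches tp_degree=2
+     device="neuron",         # assumption: backend selector, version-dependent
+ )
+
+ # Sampling runs on the host (see the note above).
+ params = SamplingParams(temperature=0.0, max_tokens=20)
+ for out in llm.generate(["e2e4"], params):
+     print(out.outputs[0].text)
+ ```
+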
+ ## Compilation Details
+
+ The model was traced with the following settings (a reproduction sketch follows the list):
+ - `batch_size=4`
+ - `sequence_length=2048`
+ - `num_cores=2`
+ - `auto_cast_type="bf16"`
+ - `continuous_batching=True`
+ - Total compilation time: ~8.1 minutes
+
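+ A sketch of how such a trace is typically produced with optimum-neuron's export path (the `continuous_batching` keyword is an assumption mirrored from `neuron_config.json`; check the optimum-neuron 0.3.0 docs for the exact signature):
+
+ ```python
+ from optimum.neuron import NeuronModelForCausalLM
+
+ # Compile the base checkpoint for Neuron with the settings listed above.
+ model = NeuronModelForCausalLM.from_pretrained(
+     "karanps/ChessLM_Qwen3",
+     export=True,
+     batch_size=4,
+     sequence_length=2048,
+     num_cores=2,
+     auto_cast_type="bf16",
+     continuous_batching=True,  # assumption: mirrors neuron_config.json
+ )
+ model.save_pretrained("ChessLM_Qwen3_Trainium")  # hypothetical output dir
+ ```
+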
+ ## License
+
+ This model inherits the license from the base model [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3).
+
+ ## Citation
+
+ If you use this model, please cite the original ChessLM model and AWS Neuron tools.
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,69 @@
+ {
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "dtype": "float32",
+   "eos_token_id": 151645,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 12288,
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 40960,
+   "max_window_layers": 36,
+   "model_type": "qwen3",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 36,
+   "num_key_value_heads": 8,
+   "pad_token_id": 151643,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.shard0000.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:840add1e51a5bf0bb8da47d347e3d120577557706af2be13c7863a87d87feef7
+ size 2097152000
model.shard0001.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb03c716b799049da61b0f51bf702576502821e770f27fe5ad596e45f2901293
+ size 2097152000
model.shard0002.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2113ab040f5803048520559b532f9354b6c9a84567614a236a79a10fa72af803
+ size 2097152000
model.shard0003.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46e74e9919f0ad0406402fe56481a9b06f785e895a88f0f25988b455175a0fe1
+ size 2097152000
model.shard0004.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50fa0595a89f711bd73dba6138da92c6a4785298ea2049fb4eb52231696a04c4
+ size 2097152000
model.shard0005.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce56a9c20b94e5bfe98c17bde3ca8538ae448df1c85d0ca15fbce5dd110ef1a3
+ size 2097152000
model.shard0006.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b8ec7de3ec641a46536894fb48871ad0d107773f1a13c6a9997bf699043eefc
+ size 2097152000
model.shard0007.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cb4c72d399021f970ac3e465b897ca09611af166d9bd37dd71c37401d13703a
+ size 2097152000
model.shard0008.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aaecd4628682247206ec58cbb9f189a4abf1e21bed2f5e6ddfeaf161a0b7b886
+ size 837175015
model.shards.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "original_file": "model.pt",
+   "file_size": 17614391015,
+   "shard_size_mb": 2000,
+   "num_shards": 9,
+   "shards": [
+     {
+       "index": 0,
+       "filename": "model.shard0000.pt",
+       "size": 2097152000,
+       "sha256": "840add1e51a5bf0bb8da47d347e3d120577557706af2be13c7863a87d87feef7"
+     },
+     {
+       "index": 1,
+       "filename": "model.shard0001.pt",
+       "size": 2097152000,
+       "sha256": "eb03c716b799049da61b0f51bf702576502821e770f27fe5ad596e45f2901293"
+     },
+     {
+       "index": 2,
+       "filename": "model.shard0002.pt",
+       "size": 2097152000,
+       "sha256": "2113ab040f5803048520559b532f9354b6c9a84567614a236a79a10fa72af803"
+     },
+     {
+       "index": 3,
+       "filename": "model.shard0003.pt",
+       "size": 2097152000,
+       "sha256": "46e74e9919f0ad0406402fe56481a9b06f785e895a88f0f25988b455175a0fe1"
+     },
+     {
+       "index": 4,
+       "filename": "model.shard0004.pt",
+       "size": 2097152000,
+       "sha256": "50fa0595a89f711bd73dba6138da92c6a4785298ea2049fb4eb52231696a04c4"
+     },
+     {
+       "index": 5,
+       "filename": "model.shard0005.pt",
+       "size": 2097152000,
+       "sha256": "ce56a9c20b94e5bfe98c17bde3ca8538ae448df1c85d0ca15fbce5dd110ef1a3"
+     },
+     {
+       "index": 6,
+       "filename": "model.shard0006.pt",
+       "size": 2097152000,
+       "sha256": "0b8ec7de3ec641a46536894fb48871ad0d107773f1a13c6a9997bf699043eefc"
+     },
+     {
+       "index": 7,
+       "filename": "model.shard0007.pt",
+       "size": 2097152000,
+       "sha256": "3cb4c72d399021f970ac3e465b897ca09611af166d9bd37dd71c37401d13703a"
+     },
+     {
+       "index": 8,
+       "filename": "model.shard0008.pt",
+       "size": 837175015,
+       "sha256": "aaecd4628682247206ec58cbb9f189a4abf1e21bed2f5e6ddfeaf161a0b7b886"
+     }
+   ]
+ }
neuron_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "_serialized_key": "NxDNeuronConfig",
+   "async_mode": false,
+   "attn_kernel_enabled": false,
+   "batch_size": 4,
+   "capacity_factor": null,
+   "cc_pipeline_tiling_factor": 2,
+   "checkpoint_id": "karanps/ChessLM_Qwen3",
+   "checkpoint_revision": "e0d57507d96b2be2dd0dc901ecb231dec2dd6330",
+   "continuous_batching": true,
+   "enable_bucketing": false,
+   "ep_degree": 1,
+   "flash_decoding_enabled": false,
+   "fused_qkv": true,
+   "glu_mlp": true,
+   "is_chunked_prefill": false,
+   "local_ranks_size": 2,
+   "logical_nc_config": 1,
+   "max_batch_size": 4,
+   "max_context_length": 2048,
+   "max_topk": 256,
+   "mlp_kernel_enabled": false,
+   "mlp_kernel_fuse_residual_add": false,
+   "n_active_tokens": 2048,
+   "neuronxcc_version": "2.21.33363.0+82129205",
+   "num_cores_per_group": 1,
+   "on_device_sampling": false,
+   "optimum_neuron_version": "0.3.0",
+   "output_logits": false,
+   "padding_side": "right",
+   "pp_degree": 1,
+   "qk_layernorm": false,
+   "qkv_kernel_enabled": false,
+   "rpl_reduce_dtype": "bfloat16",
+   "sequence_length": 2048,
+   "sequence_parallel_enabled": false,
+   "speculation_length": 0,
+   "start_rank_id": 0,
+   "target": null,
+   "torch_dtype": "bfloat16",
+   "tp_degree": 2,
+   "vocab_parallel": false
+ }
reconstruct.py ADDED
@@ -0,0 +1,60 @@
+ #!/usr/bin/env python3
+ """
+ Script to reconstruct the original model file from shards
+ """
+ import hashlib
+ import json
+ import sys
+ from pathlib import Path
+
+ def reconstruct_file(shards_dir="."):
+     shards_dir = Path(shards_dir)
+
+     # Find metadata file
+     metadata_files = list(shards_dir.glob("*.shards.json"))
+     if not metadata_files:
+         print("Error: No shards metadata file found")
+         return False
+
+     metadata_path = metadata_files[0]
+     print(f"Loading metadata: {metadata_path}")
+
+     with open(metadata_path, 'r') as f:
+         metadata = json.load(f)
+
+     output_file = metadata["original_file"]
+     print(f"Reconstructing: {output_file}")
+     print(f"  Expected size: {metadata['file_size'] / (1024**3):.2f} GB")
+     print(f"  Number of shards: {metadata['num_shards']}")
+
+     with open(output_file, 'wb') as f_out:
+         for shard_info in metadata["shards"]:
+             shard_path = shards_dir / shard_info["filename"]
+             print(f"  Processing shard {shard_info['index'] + 1}/{metadata['num_shards']}: {shard_info['filename']}")
+
+             if not shard_path.exists():
+                 print(f"Error: Shard not found: {shard_path}")
+                 return False
+
+             # Read shard (each shard is ~2GB, so this needs that much RAM)
+             with open(shard_path, 'rb') as f_in:
+                 chunk_data = f_in.read()
+
+             # Verify hash
+             chunk_hash = hashlib.sha256(chunk_data).hexdigest()
+             if chunk_hash != shard_info["sha256"]:
+                 print(f"Error: Hash mismatch for {shard_info['filename']}")
+                 print(f"  Expected: {shard_info['sha256']}")
+                 print(f"  Got: {chunk_hash}")
+                 return False
+
+             # Write to output
+             f_out.write(chunk_data)
+
+     print(f"\n✓ Reconstruction complete: {output_file}")
+     return True
+
+ if __name__ == "__main__":
+     shards_dir = sys.argv[1] if len(sys.argv) > 1 else "."
+     success = reconstruct_file(shards_dir)
+     sys.exit(0 if success else 1)
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9417dfa2470f086897a0fa5acf4c11e1b05646717bdd7f9d4dc119332c65d421
+ size 11422919
tokenizer_config.json ADDED
@@ -0,0 +1,247 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is string %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in content %}\n                {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n                {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "max_length": 512,
+   "model_max_length": 131072,
+   "pad_to_multiple_of": null,
+   "pad_token": "<|endoftext|>",
+   "pad_token_type_id": 0,
+   "padding_side": "left",
+   "split_special_tokens": false,
+   "stride": 0,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff