prabhuat committed on
Commit ff4566a · verified · 1 Parent(s): ac01696

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,239 +1,10 @@
  ---
  base_model: unsloth/Qwen3-Coder-30B-A3B-Instruct
- language:
- - en
  library_name: mlx
  license: apache-2.0
- tags:
- - cdxgen
- - transformers
- - sbom
- - supply-chain-security
- - mlx
  pipeline_tag: text-generation
- datasets:
- - CycloneDX/cdx-docs
  ---
- # Abstract
-
- We present [cdx1](https://huggingface.co/collections/CycloneDX/cdx1-67a616a859ac0582df99700b) and [cdx1-pro](https://huggingface.co/collections/CycloneDX/cdx1-pro-688e15a3c3b593753ceefc05), a family of language models designed to emulate the expertise of a professional in DevOps, xBOM (Bill of Materials), and the CycloneDX specification. The base models, `unsloth/Qwen2.5-Coder-14B-Instruct` (for cdx1) and `unsloth/Qwen3-Coder-30B-A3B-Instruct` (for cdx1-pro), were fine-tuned on a specialized, high-quality [dataset](https://huggingface.co/CycloneDX/datasets). This dataset was constructed using a synthetic data generation strategy with a teacher model (Gemini 2.5 Pro). The primary objective was to align the fine-tuned models' capabilities with the teacher model's performance on xBOM and CycloneDX-related question-answering tasks.
-
- ## Approach to Data
-
- ### Data Curation and Generation
-
- The models were trained on [cdx-docs](https://huggingface.co/datasets/CycloneDX/cdx-docs), a curated dataset comprising technical documentation, authoritative OWASP guides, and semantic interpretations derived from the CycloneDX Generator (cdxgen) source code. The dataset was augmented using a synthetic data generation technique. This process involved prompting a teacher model (Gemini 2.5 Pro) to generate question-answer pairs that encapsulate the nuances and semantics of the domain. The generated data was structured to facilitate effective learning by the target cdx1 models.
-
- ### Alignment with Inference
-
- During the training phase, the dataset was iteratively refined to ensure the format and context of the training examples closely resembled the intended inference-time inputs. This alignment is critical for the models to learn the domain's complexity and respond accurately to real-world prompts.
-
- ## Benchmarking
-
- The cdx1 models are optimized for xBOM use cases, including BOM summarization, component tagging, validation, and troubleshooting. To evaluate model performance, we developed a custom benchmark suite named [xBOMEval](https://github.com/CycloneDX/cdxgen/tree/master/contrib/xBOMEval).
-
- ### Categories
-
- xBOMEval contains tests across the following categories:
-
- - **Bias:** Assesses potential model bias towards CycloneDX or SPDX specifications through targeted questions.
- - **Specification (Spec):** Measures factual recall and synthesis on topics such as CycloneDX, PURL, and SPDX.
- - **Logic:** Evaluates problem-solving and reasoning capabilities with complex questions about specifications.
- - **DevOps:** Assesses knowledge of platforms and tools like GitHub, Azure Pipelines, and package managers.
- - **Linux:** Tests proficiency with Linux environments, including terminal and PowerShell commands.
- - **Docker:** Measures understanding of Docker, Podman, and the OCI specification.
-
- ### Scoring
-
- Model responses were scored using a combination of automated evaluation by a high-capability model (Gemini 2.5 Pro) and manual human review. To maintain benchmark integrity, the evaluation set was held out and not included in any model's training data. Detailed results and configurations are available in the `xBOMEval` directory of the [cdxgen repository](https://github.com/CycloneDX/cdxgen).
-
- ## Benchmark Results - August 2025
-
- ### Logic Category Comparison
-
- | Model               | Accuracy (%) |
- | :------------------ | :----------- |
- | `gemini-2.5-pro`    | 93.60        |
- | `deepthink-r1`      | 89.63        |
- | `gpt-5`             | 83.23        |
- | `deepseek-r1`       | 82.92        |
- | `gpt-oss-120b`      | 80.49        |
- | `gpt-oss-20b`       | 79.27        |
- | `cdx1-pro-mlx-8bit` | 73.17        |
- | `o4-mini-high`      | 67.99        |
- | `qwen3-coder-480B`  | 48.48        |
- | `cdx1-mlx-8bit`     | 46.04        |
-
- This table compares the accuracy of **ten** AI models on a logic benchmark designed to assess reasoning and problem-solving skills. The results highlight a clear hierarchy of performance, with the newly added `gpt-5` debuting as a top-tier model.
-
- **Key Findings from the Table:**
-
- - **Dominant Leader:** `gemini-2.5-pro` is the undisputed leader, achieving the highest accuracy of **93.6%**, placing it in a class of its own.
- - **Top-Tier Competitors:** A strong group of models follows, led by `deepthink-r1` at **89.63%**. The newly introduced **`gpt-5`** makes a powerful debut, securing third place with **83.23%** accuracy, slightly ahead of `deepseek-r1` (82.92%) and `gpt-oss-120b` (80.49%).
- - **Strong Mid-Tier:** The `gpt-oss-20b` model performs impressively well for its size at **79.27%**, outscoring several larger models and leading the middle pack, which also includes `cdx1-pro-mlx-8bit` (73.17%) and `o4-mini-high` (67.99%).
- - **Lower Performers:** `qwen3-coder-480B` (48.48%) and `cdx1-mlx-8bit` (46.04%) score the lowest. The score for `cdx1-mlx-8bit` is artificially low due to context-length limitations, which caused it to miss questions.
- - **Efficiency and Performance:** The results from the `gpt-oss` models, particularly the 20B variant, demonstrate that highly optimized, smaller models can be very competitive on logic tasks.
-
- ### Performance Tiers
-
- The models can be grouped into four clear performance tiers:
-
- - **Elite Tier (>90%):**
-   - `gemini-2.5-pro` (93.6%)
- - **High-Performing Tier (80%-90%):**
-   - `deepthink-r1` (89.63%)
-   - `gpt-5` (83.23%)
-   - `deepseek-r1` (82.92%)
-   - `gpt-oss-120b` (80.49%)
- - **Mid-Tier (65%-80%):**
-   - `gpt-oss-20b` (79.27%)
-   - `cdx1-pro-mlx-8bit` (73.17%)
-   - `o4-mini-high` (67.99%)
- - **Lower Tier (<50%):**
-   - `qwen3-coder-480B` (48.48%)
-   - `cdx1-mlx-8bit` (46.04%)
-
- ### Spec Category Comparison
-
- | Model               | Accuracy (%) |
- | :------------------ | :----------- |
- | `gemini-2.5-pro`    | 100.00       |
- | `deepseek-r1`       | 98.58        |
- | `cdx1-pro-mlx-8bit` | 98.30        |
- | `gpt-5`             | 95.17        |
- | `qwen3-coder-480B`  | 90.34        |
- | `gpt-oss-120b`      | 89.20        |
- | `cdx1-mlx-8bit`     | 83.52        |
- | `deepthink-r1`      | 12.36        |
- | `gpt-oss-20b`       | 9.09         |
- | `o4-mini-high`      | 0.00         |
-
- This table evaluates **ten** AI models on the "Spec Category," a test of factual recall on 352 technical specification questions. The results starkly illustrate that a model's reliability and cooperative behavior are as crucial as its underlying knowledge. Several models, including the newly added `gpt-5`, achieved high scores only after overcoming significant behavioral hurdles.
-
- **Key Findings from the Table:**
-
- - **Elite Factual Recall:** A top tier of models demonstrated near-perfect knowledge retrieval. **`gemini-2.5-pro`** led with a perfect **100%** score and superior answer depth. It was closely followed by **`deepseek-r1`** (98.58%) and **`cdx1-pro-mlx-8bit`** (98.3%).
-
- - **High Score with Major Caveats (`gpt-5`):** The newly added **`gpt-5`** achieved a high accuracy of **95.17%**, placing it among the top performers. However, this result required a significant compromise:
-   - The model initially refused to answer the full set of questions, only offering to respond in small batches that required six separate user confirmations. This compromise was accepted to prevent an outright failure.
-   - A related variant, `gpt-5-thinking`, refused the test entirely after a minute of processing.
-
- - **Complete Behavioral Failures:** Three models effectively failed the test not due to a lack of knowledge, but because they refused to cooperate:
-   - **`o4-mini-high`** scored **0%** after refusing to answer, citing too many questions.
-   - **`deepthink-r1`** (12.36%) and **`gpt-oss-20b`** (9.09%) also failed, answering only a small fraction of the questions without acknowledging the limitation.
-
- - **Strong Mid-Tier Performers:** `qwen3-coder-480B` (90.34%) and `gpt-oss-120b` (89.2%) both demonstrated strong and reliable factual recall without the behavioral issues seen elsewhere.
-
- - **Impact of Scale and Systematic Errors:** The contrast between the two `cdx1` models is revealing. The larger `cdx1-pro-mlx-8bit` (98.3%) performed exceptionally well, while the smaller `cdx1-mlx-8bit` (83.52%) was hampered by a single systematic error (misunderstanding "CBOM"), which cascaded into multiple wrong answers.
-
- ### Summary of Key Themes
-
- 1. **Reliability is Paramount:** This test's most important finding is that knowledge is useless if a model is unwilling or unable to share it. The failures of `o4-mini-high`, `deepthink-r1`, and `gpt-oss-20b`, together with the behavioral friction from `gpt-5`, highlight this critical dimension.
- 2. **Scores Don't Tell the Whole Story:** The 95.17% score for `gpt-5` obscures the significant user intervention required to obtain it. Similarly, the near-identical scores of `cdx1-pro` and `gemini-2.5-pro` don't capture Gemini's superior answer quality.
- 3. **Scale Can Overcome Flaws:** The dramatic performance leap from the 14B to the 30B `cdx1` model suggests that increased scale can help correct specific knowledge gaps and improve overall accuracy.
-
- ### Other Categories
-
- Performance in additional technical categories is summarized below.
-
- | Category | cdx1-mlx-8bit | cdx1-pro-mlx-8bit |
- | -------- | ------------- | ----------------- |
- | DevOps   | 87.46%        | 96.1%             |
- | Docker   | 89.08%        | 100%              |
- | Linux    | 90.6%         | 95.8%             |
-
- ## Model Availability
-
- The `cdx1` and `cdx1-pro` models are provided in multiple formats and quantization levels to facilitate deployment across diverse hardware environments. Models are available in the **MLX** format, optimized for local inference on Apple Silicon, and the **GGUF** format, which offers broad compatibility with CPUs and various GPUs. The selection of quantization levels allows users to balance performance with resource consumption, enabling effective operation even in environments with limited VRAM.
-
- The table below details the available formats and their approximate resource requirements. All quantized models can be found on [Hugging Face](https://huggingface.co/CycloneDX/models).
-
- | Model              | Format | Quantization | File Size (GiB) | Est. VRAM (GiB) | Notes                                      |
- | :----------------- | :----- | :----------- | :-------------- | :-------------- | :----------------------------------------- |
- | **cdx1 (14B)**     | MLX    | 4-bit        | ~8.1            | > 8             | For Apple Silicon with unified memory.     |
- |                    | MLX    | 6-bit        | ~12             | > 12            | For Apple Silicon with unified memory.     |
- |                    | MLX    | 8-bit        | ~14.2           | > 14            | Higher fidelity for Apple Silicon.         |
- |                    | MLX    | 16-bit       | ~30             | > 30            | bfloat16 for fine-tuning.                  |
- |                    | GGUF   | Q4_K_M       | 8.99            | ~10.5           | Recommended balance for quality/size.      |
- |                    | GGUF   | Q8_0         | 15.7            | ~16.5           | Near-lossless quality.                     |
- |                    | GGUF   | BF16         | 29.5            | ~30             | bfloat16 for fine-tuning.                  |
- | **cdx1-pro (30B)** | MLX    | 4-bit        | ~17.5           | > 18            | For Apple Silicon with unified memory.     |
- |                    | MLX    | 6-bit        | ~24.8           | > 25            | For Apple Silicon with unified memory.     |
- |                    | MLX    | 8-bit        | ~32.4           | > 33            | Higher fidelity for Apple Silicon.         |
- |                    | MLX    | 16-bit       | ~57             | > 57            | bfloat16 for fine-tuning.                  |
- |                    | GGUF   | Q4_K_M       | 18.6            | ~20.0           | Recommended balance for quality/size.      |
- |                    | GGUF   | IQ4_NL       | 17.6            | ~20.0           | Recommended balance for quality/size.      |
- |                    | GGUF   | Q8_0         | 32.5            | ~33             | Near-lossless quality.                     |
- |                    | GGUF   | Q2_K         | 11.3            | ~12             | Low quality. Use for speculative decoding. |
- |                    | GGUF   | BF16         | 57              | ~60             | bfloat16 for fine-tuning.                  |
-
- **Notes on Quantization and Formats:**
-
- - **IQ4_NL (Importance-aware Quantization, Non-Linear):** A sophisticated 4-bit method that preserves important model weights with higher precision. It often provides superior performance compared to standard 4-bit quants at a similar file size and is a strong alternative to `Q4_K_M`.
- - **K-Quants (Q2_K, Q4_K_M):** This family of quantization methods generally offers a better quality-to-size ratio than the older `_0` and `_1` variants.
- - **Q2_K:** An extremely small 2-bit quantization designed for environments with severe resource limitations. Users should anticipate a noticeable reduction in model accuracy and coherence in exchange for the minimal VRAM and storage footprint.
- - **Q8_0:** A full 8-bit quantization that provides high fidelity at the cost of a larger file size. It is suitable for systems with ample VRAM.
- - **VRAM Requirements:** The values provided are estimates for loading the model and processing a moderate context. Actual VRAM consumption can vary based on factors such as context length, batch size, and the specific inference software used.
-
- ## Safety and Bias
-
- ### Safety
-
- To rigorously evaluate safety performance, we developed a comprehensive testing framework comprising over 200 adversarial prompts spanning 10 critical risk categories, including cybersecurity threats, hate speech, illegal activities, privacy violations, physical safety risks, misinformation, bias and discrimination, self-harm, child safety, and copyright infringement. These questions were systematically generated using a multi-layered approach: first establishing domain-specific threat models based on NIST AI RMF guidelines, then crafting prompts that incorporate real-world evasion techniques (including leetspeak substitutions, roleplay scenarios, and encoded instructions) to test for policy circumvention. Each category contains progressively severe prompts, ranging from general inquiries about harmful activities to highly specific requests for executable code and step-by-step instructions. During evaluation, our model consistently refused all safety-compromising requests, demonstrating robust adherence to ethical boundaries without attempting to fulfill harmful instructions, even when presented with sophisticated evasion attempts. This testing protocol exceeds standard industry benchmarks by incorporating both direct harmful requests and nuanced edge cases designed to probe boundary conditions in safety policies.
-
- ### Bias
-
- Our analysis reveals that the cdx1 and cdx1-pro models exhibit a notable bias toward CycloneDX specifications, a tendency directly attributable to the composition of their training data, which contains significantly more CycloneDX-related content than competing Software Bill of Materials (SBOM) standards. This data imbalance manifests in a consistent preference for recommending CycloneDX over alternative frameworks such as SPDX and OmniBOR, even in contexts where these competing standards might offer superior suitability for specific use cases. The models frequently fail to provide balanced comparative analysis, instead defaulting to CycloneDX-centric recommendations without adequate consideration of factors like ecosystem compatibility, tooling support, or organizational requirements that might favor alternative specifications. We recognize this as a limitation affecting the models' objectivity in technical decision support. Our long-term mitigation strategy involves targeted expansion of the training corpus with high-quality, balanced documentation of all major SBOM standards, implementation of adversarial debiasing techniques during fine-tuning, and development of explicit prompting protocols that require the model to evaluate multiple standards against specific technical requirements before making recommendations. We are committed to evolving cdx1 toward genuine impartiality in standards evaluation while maintaining its deep expertise in software supply chain security.
-
- ## Weaknesses
-
- (To be determined)
-
- ## Acknowledgments
-
- (To be determined)
-
- ## Citation
-
- Please cite the following resources if you use the datasets, models, or benchmark in your work.
-
- ### For the Dataset
-
- ```bibtex
- @misc{cdx-docs,
-   author       = {OWASP CycloneDX Generator Team},
-   title        = {{cdx-docs: A Curated Dataset for SBOM and DevOps Tasks}},
-   year         = {2025},
-   month        = {February},
-   howpublished = {\url{https://huggingface.co/datasets/CycloneDX/cdx-docs}}
- }
- ```
-
- ### For the Models
-
- ```bibtex
- @misc{cdx1_models,
-   author       = {OWASP CycloneDX Generator Team},
-   title        = {{cdx1 and cdx1-pro: Language Models for SBOM and DevOps}},
-   year         = {2025},
-   month        = {February},
-   howpublished = {\url{https://huggingface.co/CycloneDX}}
- }
- ```
-
- ### For the xBOMEval Benchmark
-
- ```bibtex
- @misc{xBOMEval_v1,
-   author       = {OWASP CycloneDX Generator Team},
-   title        = {{xBOMEval: A Benchmark for Evaluating Language Models on SBOM Tasks}},
-   year         = {2025},
-   month        = {August},
-   howpublished = {\url{https://github.com/CycloneDX/cdxgen}}
- }
- ```
-
- ## Licenses
-
- - **Datasets:** CC0-1.0
- - **Models:** Apache-2.0
 
  ---
+ tags:
+ - unsloth
+ - mlx
  base_model: unsloth/Qwen3-Coder-30B-A3B-Instruct
  library_name: mlx
  license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
  pipeline_tag: text-generation
  ---
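The Model Availability notes in the removed card describe MLX and GGUF quantizations. As a minimal sketch (not part of this commit), the MLX builds can be exercised with the `mlx-lm` package; the repo id below is the 8-bit cdx1-pro quant named in the card, and the prompt is illustrative:

```python
# Sketch: run an MLX quantization of cdx1-pro on Apple Silicon.
# Assumes `pip install mlx-lm` and enough unified memory for the chosen
# quant (roughly the "Est. VRAM" column in the card's table).
from mlx_lm import load, generate

model, tokenizer = load("CycloneDX/cdx1-pro-mlx-8bit")

messages = [{"role": "user", "content": "Explain the purpose of metadata.component in a CycloneDX BOM."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```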
chat_template.jinja CHANGED
@@ -1,86 +1,135 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0].role == 'system' %}
- {{- messages[0].content + '\n\n' }}
  {%- endif %}
- {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
  {%- else %}
- {%- if messages[0].role == 'system' %}
- {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
- {%- endif %}
  {%- endif %}
- {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
- {%- for message in messages[::-1] %}
- {%- set index = (messages|length - 1) - loop.index0 %}
- {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
- {%- set ns.multi_step_tool = false %}
- {%- set ns.last_query_index = index %}
- {%- endif %}
- {%- endfor %}
- {%- for message in messages %}
- {%- if message.content is string %}
- {%- set content = message.content %}
- {%- else %}
- {%- set content = '' %}
  {%- endif %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
- {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {%- set reasoning_content = '' %}
- {%- if message.reasoning_content is string %}
- {%- set reasoning_content = message.reasoning_content %}
- {%- else %}
- {%- if '</think>' in content %}
- {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
- {%- set content = content.split('</think>')[-1].lstrip('\n') %}
- {%- endif %}
  {%- endif %}
- {%- if loop.index0 > ns.last_query_index %}
- {%- if loop.last or (not loop.last and reasoning_content) %}
- {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
  {%- endif %}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- endif %}
- {%- if message.tool_calls %}
- {%- for tool_call in message.tool_calls %}
- {%- if (loop.first and content) or (not loop.first) %}
- {{- '\n' }}
- {%- endif %}
- {%- if tool_call.function %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {%- if tool_call.arguments is string %}
- {{- tool_call.arguments }}
- {%- else %}
- {{- tool_call.arguments | tojson }}
  {%- endif %}
- {{- '}\n</tool_call>' }}
  {%- endfor %}
  {%- endif %}
  {{- '<|im_end|>\n' }}
  {%- elif message.role == "tool" %}
- {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
  {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
  {{- '<|im_end|>\n' }}
  {%- endif %}
  {%- endif %}
  {%- endfor %}
  {%- if add_generation_prompt %}
  {{- '<|im_start|>assistant\n' }}
- {%- endif %}

+ {# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth Chat template fixes #}
+ {% macro render_item_list(item_list, tag_name='required') %}
+ {%- if item_list is defined and item_list is iterable and item_list | length > 0 %}
+ {%- if tag_name %}{{- '\n<' ~ tag_name ~ '>' -}}{% endif %}
+ {{- '[' }}
+ {%- for item in item_list -%}
+ {%- if loop.index > 1 %}{{- ", "}}{% endif -%}
+ {%- if item is string -%}
+ {{ "`" ~ item ~ "`" }}
+ {%- else -%}
+ {{ item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- ']' }}
+ {%- if tag_name %}{{- '</' ~ tag_name ~ '>' -}}{% endif %}
  {%- endif %}
+ {% endmacro %}
+
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
  {%- else %}
+ {%- set loop_messages = messages %}
  {%- endif %}
+
+ {%- if not tools is defined %}
+ {%- set tools = [] %}
+ {%- endif %}
+
+ {%- if system_message is defined %}
+ {{- "<|im_start|>system\n" + system_message }}
+ {%- else %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
  {%- endif %}
+ {%- endif %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- "\n\nYou have access to the following functions:\n\n" }}
+ {{- "<tools>" }}
+ {%- for tool in tools %}
+ {%- if tool.function is defined %}
+ {%- set tool = tool.function %}
  {%- endif %}
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+ {{- '\n<parameters>' }}
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
+ {{- '\n<parameter>' }}
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
+ {%- if param_fields.type is defined %}
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
  {%- endif %}
+ {%- if param_fields.description is defined %}
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {{- render_item_list(param_fields.enum, 'enum') }}
+ {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}
+ {%- for json_key, json_value in param_fields|items %}
+ {%- if json_key not in handled_keys %}
+ {%- set normed_json_key = json_key|string %}
+ {%- if json_value is mapping %}
+ {{- '\n<' ~ normed_json_key ~ '>' ~ (json_value | tojson | safe) ~ '</' ~ normed_json_key ~ '>' }}
+ {%- else %}
+ {{- '\n<' ~ normed_json_key ~ '>' ~ (json_value | string) ~ '</' ~ normed_json_key ~ '>' }}
+ {%- endif %}
  {%- endif %}
  {%- endfor %}
+ {{- render_item_list(param_fields.required, 'required') }}
+ {{- '\n</parameter>' }}
+ {%- endfor %}
+ {{- render_item_list(tool.parameters.required, 'required') }}
+ {{- '\n</parameters>' }}
+ {%- if tool.return is defined %}
+ {%- if tool.return is mapping %}
+ {{- '\n<return>' ~ (tool.return | tojson | safe) ~ '</return>' }}
+ {%- else %}
+ {{- '\n<return>' ~ (tool.return | string) ~ '</return>' }}
+ {%- endif %}
  {%- endif %}
+ {{- '\n</function>' }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- endif %}
+ {%- if system_message is defined %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if tools is iterable and tools | length > 0 %}
  {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in loop_messages %}
+ {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+ {{- '\n' + message.content | trim + '\n' }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value if args_value is string else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
  {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user\n' }}
  {%- endif %}
+ {{- '<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>\n' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
  {{- '<|im_end|>\n' }}
  {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
  {%- endif %}
  {%- endfor %}
  {%- if add_generation_prompt %}
  {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
+ {# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth Chat template fixes #}
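The replacement template renders tool definitions as XML `<function>`/`<parameter>` blocks instead of the JSON-style `<tool_call>` objects used by the old template. A hedged sketch of exercising it through `transformers` follows; the `get_weather` tool is hypothetical and `path/to/this-repo` is a placeholder for a local checkout or the Hub repo id:

```python
# Sketch: render a tool-enabled prompt with the updated chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/this-repo")  # placeholder

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)  # shows the <tools>/<function=...> system block produced by the template
```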
config.json CHANGED
@@ -10,9 +10,9 @@
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
- "intermediate_size": 6144,
+ "intermediate_size": 5472,
  "max_position_embeddings": 262144,
- "max_window_layers": 48,
+ "max_window_layers": 28,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 768,
@@ -24,16 +24,19 @@
  "num_key_value_heads": 4,
  "output_router_logits": false,
  "pad_token_id": 151654,
+ "qkv_bias": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000000,
- "router_aux_loss_coef": 0.001,
+ "router_aux_loss_coef": 0.0,
+ "shared_expert_intermediate_size": 0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.54.0",
+ "transformers_version": "4.54.1",
  "unsloth_fixed": true,
  "use_cache": true,
+ "use_qk_norm": true,
  "use_sliding_window": false,
  "vocab_size": 151936
  }
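The config changes correct the dense-MLP geometry (`intermediate_size`, `max_window_layers`) and add fields that newer `transformers` releases expect for `qwen3_moe` (`qkv_bias`, `use_qk_norm`, `shared_expert_intermediate_size`, a zeroed `router_aux_loss_coef`). A quick sketch for confirming which values a checkout actually carries; `path/to/this-repo` is a placeholder:

```python
# Sketch: inspect the fields this commit touches in config.json.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("path/to/this-repo")  # placeholder path or repo id
for field in ("intermediate_size", "max_window_layers", "qkv_bias",
              "router_aux_loss_coef", "shared_expert_intermediate_size", "use_qk_norm"):
    print(field, "=", getattr(cfg, field, "<absent>"))
```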
generation_config.json CHANGED
@@ -1,13 +1,12 @@
  {
- "bos_token_id": 151643,
- "do_sample": true,
- "eos_token_id": [
-   151645,
-   151643
- ],
- "pad_token_id": 151643,
- "temperature": 0.7,
- "top_k": 20,
- "top_p": 0.8,
- "transformers_version": "4.51.0"
+ "pad_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": [
+   151645,
+   151643
+ ],
+ "repetition_penalty": 1.05,
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20
  }
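The new defaults keep the recommended sampling for this base model (temperature 0.7, top_p 0.8, top_k 20) and add a 1.05 repetition penalty while dropping `bos_token_id` and `transformers_version`. If an inference stack does not read `generation_config.json`, the same values can be passed explicitly; a sketch with a placeholder repo path:

```python
# Sketch: reproduce the sampling defaults above in a direct generate() call.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "path/to/this-repo"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Write a purl for lodash 4.17.21 from npm.", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,            # as in generation_config.json
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,   # added by this commit
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```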
model-00001-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:00bf00857a7ee66981f0724e01a46d7c942ee2844978466d3106863b2f84c99e
+ oid sha256:e2c91e9afd41575e546ee4186be55a982b3d5303849b37719635e541728d1581
  size 5204647414
model-00002-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:648268d56a9fd92ebb1f0c66462526d121dc72848071b85aa554f880ab42db4e
+ oid sha256:9bebacf4ce55b4035cc4aee9264fe623edd9b470e6af4b9386d5441fd8e1a2d0
  size 4984970772
model-00003-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:97a6c01d2b231d9b1ba4d4b5d6bdbde6435850833e05a0784a29a51d03afdd83
+ oid sha256:5efdcc20c7145ac7a277bc960efbb3f112a258304f447841aa009afddfa9150f
  size 4984970797
model-00004-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a2d24f9165c31655c032be0f19078ece7729457d47a841059a88f60aabb0702d
+ oid sha256:6bf2acef428d084cb213914d08fca5393ec9aca7497d5bc1c375b9cfb26a0dec
  size 4984970824
model-00005-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:45386c7039d77d4c072608a119b411d993074572631264326c675d631d4c5c38
+ oid sha256:fa9d5a95689a7ddacc557ac5d4ef91102e763efd7f6cf8b579820826e055dac0
  size 4984970804
model-00006-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3da0f64c4321d66e600ff12c235054b14d665e910857e445695c265d500b3aea
+ oid sha256:2c2a1fc571ec44d6bbba8928f86b9ab99f40147397d89a6895a850c03ab1c0ed
  size 4984970822
model-00007-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:35cca769e57ad0ef3e52c725009de170e6e6b46b0f052c1b374aff2d796a79f1
+ oid sha256:c24ff403e02e6386a1bb597f0dee9ec12bf4fd6e3f40e276ea305cd958c732b0
  size 4984970824
model-00008-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e0eb2e9bc6b508762de131f39112b6e10ad6224c602f4d70524371cf2ec3cdb9
+ oid sha256:3b88ef5f2e4db320229315fbeb0257b414cfc9515189d558b023b9eae552ce63
  size 4984970806
model-00009-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ad3ce6ced2a58edf0fbf79d53c07fbee826288a2839be9c3d22ca8696df78c5e
+ oid sha256:4889ad31c8412d47b883414d71cb798f37ea11cee1120c45aa54d631253931f1
  size 4984970818
model-00010-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9a8a79a95ea6e47283a6b90b757f3eb705ffb63aab465a5a35ff0eb86db2d43c
+ oid sha256:bfdc0c883de859f1e57d250f362b4af297573055adc83a7dd15eae41e61001e5
  size 4984970752
model-00011-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1597fe7587d4db6ae1b08b7bd24f50e8b275eb3bb8206f1a8f09309b89adda47
+ oid sha256:1ac3aeaf7bd7ce784625d51df9cb65148ec6a797b81fa7fce7cf49422cc8f4f8
  size 4984970814
model-00012-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:47a6b9413d6d2afcfcfef3b542b974266da010c390638e99547f54fc3d3721a6
+ oid sha256:3f9a273f814f700e0d9e0d107a83e285ea5b5f4d37c9807c6e4c709c3e2dbdb6
  size 4984970822
model-00013-of-00013.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:77050d027d7fa116e4857543404f5d2ae50bba47758ce45c8ea978927fb3fa26
+ oid sha256:2baa49e37e374c49aa76d19b90c2552f6a71a3ce636822c0bad7277fd340474c
  size 1024987469
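Each `.safetensors` entry above is a Git LFS pointer; only the `oid sha256:` content hash changes while every size stays identical, so the shards were re-uploaded with the same lengths but different contents. A small sketch for verifying a downloaded shard against its pointer (the expected hash is the new oid of the first shard above):

```python
# Sketch: check a downloaded weight shard against the sha256 in its LFS pointer.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

expected = "e2c91e9afd41575e546ee4186be55a982b3d5303849b37719635e541728d1581"
actual = sha256_of("model-00001-of-00013.safetensors")
print("OK" if actual == expected else f"MISMATCH: {actual}")
```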
qwen3coder_tool_parser.py ADDED
@@ -0,0 +1,675 @@
+ # SPDX-License-Identifier: Apache-2.0
+
+ import json
+ import re
+ import uuid
+ from collections.abc import Sequence
+ from typing import Union, Optional, Any, List, Dict
+ from enum import Enum
+
+ from vllm.entrypoints.openai.protocol import (
+     ChatCompletionRequest,
+     ChatCompletionToolsParam,
+     DeltaMessage,
+     DeltaToolCall,
+     DeltaFunctionCall,
+     ExtractedToolCallInformation,
+     FunctionCall,
+     ToolCall,
+ )
+ from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
+     ToolParser,
+     ToolParserManager,
+ )
+ from vllm.logger import init_logger
+ from vllm.transformers_utils.tokenizer import AnyTokenizer
+
+ logger = init_logger(__name__)
+
+
+ @ToolParserManager.register_module("qwen3_xml")
+ class Qwen3XMLToolParser(ToolParser):
+     def __init__(self, tokenizer: AnyTokenizer):
+         super().__init__(tokenizer)
+
+         self.current_tool_name_sent: bool = False
+         self.prev_tool_call_arr: list[dict] = []
+         self.current_tool_id: int = -1
+         self.streamed_args_for_tool: list[str] = []
+
+         # Sentinel tokens for streaming mode
+         self.tool_call_start_token: str = "<tool_call>"
+         self.tool_call_end_token: str = "</tool_call>"
+         self.tool_call_prefix: str = "<function="
+         self.function_end_token: str = "</function>"
+         self.parameter_prefix: str = "<parameter="
+         self.parameter_end_token: str = "</parameter>"
+         self.is_tool_call_started: bool = False
+         self.failed_count: int = 0
+
+         # Enhanced streaming state - reset for each new message
+         self._reset_streaming_state()
+
+         # Regex patterns
+         self.tool_call_complete_regex = re.compile(
+             r"<tool_call>(.*?)</tool_call>", re.DOTALL
+         )
+         self.tool_call_regex = re.compile(
+             r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
+         )
+         self.tool_call_function_regex = re.compile(
+             r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL
+         )
+         self.tool_call_parameter_regex = re.compile(
+             r"<parameter=(.*?)</parameter>|<parameter=(.*?)$", re.DOTALL
+         )
+
+         if not self.model_tokenizer:
+             raise ValueError(
+                 "The model tokenizer must be passed to the ToolParser "
+                 "constructor during construction."
+             )
+
+         self.tool_call_start_token_id = self.vocab.get(self.tool_call_start_token)
+         self.tool_call_end_token_id = self.vocab.get(self.tool_call_end_token)
+
+         if self.tool_call_start_token_id is None or self.tool_call_end_token_id is None:
+             raise RuntimeError(
+                 "Qwen3 XML Tool parser could not locate tool call start/end "
+                 "tokens in the tokenizer!"
+             )
+
+         logger.info(f"vLLM successfully imported tool parser {self.__class__.__name__}!")
+
+     def _generate_tool_call_id(self) -> str:
+         """Generate a unique tool call ID."""
+         return f"call_{uuid.uuid4().hex[:24]}"
+
+     def _reset_streaming_state(self):
+         """Reset all streaming state."""
+         self.current_tool_index = 0
+         self.is_tool_call_started = False
+         self.header_sent = False
+         self.current_tool_id = None
+         self.current_function_name = None
+         self.current_param_name = None
+         self.current_param_value = ""
+         self.param_count = 0
+         self.in_param = False
+         self.in_function = False
+         self.accumulated_text = ""
+         self.json_started = False
+         self.json_closed = False
+
+     def _parse_xml_function_call(
+         self, function_call_str: str, tools: Optional[list[ChatCompletionToolsParam]]
+     ) -> Optional[ToolCall]:
+         def get_arguments_config(func_name: str) -> dict:
+             if tools is None:
+                 return {}
+             for config in tools:
+                 if not hasattr(config, "type") or not (
+                     hasattr(config, "function") and hasattr(config.function, "name")
+                 ):
+                     continue
+                 if config.type == "function" and config.function.name == func_name:
+                     if not hasattr(config.function, "parameters"):
+                         return {}
+                     params = config.function.parameters
+                     if isinstance(params, dict) and "properties" in params:
+                         return params["properties"]
+                     elif isinstance(params, dict):
+                         return params
+                     else:
+                         return {}
+             logger.warning(f"Tool '{func_name}' is not defined in the tools list.")
+             return {}
+
+         def convert_param_value(
+             param_value: str, param_name: str, param_config: dict, func_name: str
+         ) -> Any:
+             # Handle null value for any type
+             if param_value.lower() == "null":
+                 return None
+
+             if param_name not in param_config:
+                 if param_config != {}:
+                     logger.warning(
+                         f"Parsed parameter '{param_name}' is not defined in the tool "
+                         f"parameters for tool '{func_name}', directly returning the string value."
+                     )
+                 return param_value
+
+             if (
+                 isinstance(param_config[param_name], dict)
+                 and "type" in param_config[param_name]
+             ):
+                 param_type = str(param_config[param_name]["type"]).strip().lower()
+             else:
+                 param_type = "string"
+             if param_type in ["string", "str", "text", "varchar", "char", "enum"]:
+                 return param_value
+             elif (
+                 param_type.startswith("int")
+                 or param_type.startswith("uint")
+                 or param_type.startswith("long")
+                 or param_type.startswith("short")
+                 or param_type.startswith("unsigned")
+             ):
+                 try:
+                     param_value = int(param_value)
+                 except Exception:
+                     logger.warning(
+                         f"Parsed value '{param_value}' of parameter '{param_name}' is not an integer in tool "
+                         f"'{func_name}', degenerating to string."
+                     )
+                 return param_value
+             elif param_type.startswith("num") or param_type.startswith("float"):
+                 try:
+                     float_param_value = float(param_value)
+                     param_value = float_param_value if float_param_value - int(float_param_value) != 0 else int(float_param_value)
+                 except Exception:
+                     logger.warning(
+                         f"Parsed value '{param_value}' of parameter '{param_name}' is not a float in tool "
+                         f"'{func_name}', degenerating to string."
+                     )
+                 return param_value
+             elif param_type in ["boolean", "bool", "binary"]:
+                 param_value = param_value.lower()
+                 if param_value not in ["true", "false"]:
+                     logger.warning(
+                         f"Parsed value '{param_value}' of parameter '{param_name}' is not a boolean (`true` or `false`) in tool '{func_name}', degenerating to false."
+                     )
+                 return param_value == "true"
+             else:
+                 if param_type == "object" or param_type.startswith("dict"):
+                     try:
+                         param_value = json.loads(param_value)
+                         return param_value
+                     except Exception:
+                         logger.warning(
+                             f"Parsed value '{param_value}' of parameter '{param_name}' is not a valid JSON object in tool "
+                             f"'{func_name}', will try other methods to parse it."
+                         )
+                 try:
+                     param_value = eval(param_value)
+                 except Exception:
+                     logger.warning(
+                         f"Parsed value '{param_value}' of parameter '{param_name}' cannot be converted via Python `eval()` in tool '{func_name}', degenerating to string."
+                     )
+                 return param_value
+
+         # Extract function name
+         end_index = function_call_str.index(">")
+         function_name = function_call_str[:end_index]
+         param_config = get_arguments_config(function_name)
+         parameters = function_call_str[end_index + 1 :]
+         param_dict = {}
+         for match in self.tool_call_parameter_regex.findall(parameters):
+             match_text = match[0] if match[0] else match[1]
+             idx = match_text.index(">")
+             param_name = match_text[:idx]
+             param_value = str(match_text[idx + 1 :])
+             # Remove prefix and trailing \n
+             if param_value.startswith("\n"):
+                 param_value = param_value[1:]
+             if param_value.endswith("\n"):
+                 param_value = param_value[:-1]
+
+             param_dict[param_name] = convert_param_value(
+                 param_value, param_name, param_config, function_name
+             )
+         return ToolCall(
+             type="function",
+             function=FunctionCall(
+                 name=function_name, arguments=json.dumps(param_dict, ensure_ascii=False)
+             ),
+         )
+
+     def _get_function_calls(self, model_output: str) -> List[str]:
+         # Find all tool calls
+         matched_ranges = self.tool_call_regex.findall(model_output)
+         raw_tool_calls = [
+             match[0] if match[0] else match[1] for match in matched_ranges
+         ]
+
+         # Back-off strategy if no tool_call tags found
+         if len(raw_tool_calls) == 0:
+             raw_tool_calls = [model_output]
+
+         raw_function_calls = []
+         for tool_call in raw_tool_calls:
+             raw_function_calls.extend(self.tool_call_function_regex.findall(tool_call))
+
+         function_calls = [
+             match[0] if match[0] else match[1] for match in raw_function_calls
+         ]
+         return function_calls
+
+     def extract_tool_calls(
+         self,
+         model_output: str,
+         request: ChatCompletionRequest,
+     ) -> ExtractedToolCallInformation:
+         # Quick check to avoid unnecessary processing
+         if self.tool_call_prefix not in model_output:
+             return ExtractedToolCallInformation(
+                 tools_called=False, tool_calls=[], content=model_output
+             )
+
+         try:
+             function_calls = self._get_function_calls(model_output)
+             if len(function_calls) == 0:
+                 return ExtractedToolCallInformation(
+                     tools_called=False, tool_calls=[], content=model_output
+                 )
+
+             tool_calls = [
+                 self._parse_xml_function_call(function_call_str, request.tools)
+                 for function_call_str in function_calls
+             ]
+
+             # Populate prev_tool_call_arr for serving layer to set finish_reason
+             self.prev_tool_call_arr.clear()  # Clear previous calls
+             for tool_call in tool_calls:
+                 if tool_call:
+                     self.prev_tool_call_arr.append(
+                         {
+                             "name": tool_call.function.name,
+                             "arguments": tool_call.function.arguments,
+                         }
+                     )
+
+             # Extract content before tool calls
+             content_index = model_output.find(self.tool_call_start_token)
+             content_index = (
+                 content_index
+                 if content_index >= 0
+                 else model_output.find(self.tool_call_prefix)
+             )
+             content = model_output[:content_index]  # .rstrip()
+
+             return ExtractedToolCallInformation(
+                 tools_called=(len(tool_calls) > 0),
+                 tool_calls=tool_calls,
+                 content=content if content else None,
+             )
+
+         except Exception:
+             logger.exception("Error in extracting tool call from response.")
+             return ExtractedToolCallInformation(
+                 tools_called=False, tool_calls=[], content=model_output
+             )
+
+     def extract_tool_calls_streaming(
+         self,
+         previous_text: str,
+         current_text: str,
+         delta_text: str,
+         previous_token_ids: Sequence[int],
+         current_token_ids: Sequence[int],
+         delta_token_ids: Sequence[int],
+         request: ChatCompletionRequest,
+     ) -> Union[DeltaMessage, None]:
+         # If no delta text, return None unless it's an EOS token after tool calls
+         if not delta_text:
+             # Check if this is an EOS token after all tool calls are complete
+             # We check for tool calls in the text even if is_tool_call_started is False
+             # because it might have been reset after processing all tools
+             if delta_token_ids and self.tool_call_end_token_id not in delta_token_ids:
+                 # Count complete tool calls
+                 complete_calls = len(
+                     self.tool_call_complete_regex.findall(current_text)
+                 )
+
+                 # If we have completed tool calls and populated prev_tool_call_arr
+                 if complete_calls > 0 and len(self.prev_tool_call_arr) > 0:
+                     # Check if all tool calls are closed
+                     open_calls = current_text.count(
+                         self.tool_call_start_token
+                     ) - current_text.count(self.tool_call_end_token)
+                     if open_calls == 0:
+                         # Return empty delta message to allow finish_reason processing
+                         return DeltaMessage(content="")
+                 elif not self.is_tool_call_started and current_text:
+                     # This is a regular content response that's now complete
+                     return DeltaMessage(content="")
+             return None
+
+         # Check if this is the first call (reset state if needed)
+         if not previous_text:
+             self._reset_streaming_state()
+
+         # Update accumulated text
+         self.accumulated_text = current_text
+
+         # Check if we need to advance to next tool
+         if self.json_closed and not self.in_function:
+             # Check if this tool call has ended
+             tool_ends = current_text.count(self.tool_call_end_token)
+             if tool_ends > self.current_tool_index:
+                 # This tool has ended, advance to next
+                 self.current_tool_index += 1
+                 self.header_sent = False
+                 self.param_count = 0
+                 self.json_started = False
+                 self.json_closed = False
+
+                 # Check if there are more tool calls
+                 tool_starts = current_text.count(self.tool_call_start_token)
+                 if self.current_tool_index >= tool_starts:
+                     # No more tool calls
+                     self.is_tool_call_started = False
+                 # Continue processing next tool
+                 return None
+
+         # Handle normal content before tool calls
+         if not self.is_tool_call_started:
+             # Check if tool call is starting
+             if (
+                 self.tool_call_start_token_id in delta_token_ids
+                 or self.tool_call_start_token in delta_text
+             ):
+                 self.is_tool_call_started = True
+                 # Return any content before the tool call
+                 if self.tool_call_start_token in delta_text:
+                     content_before = delta_text[
+                         : delta_text.index(self.tool_call_start_token)
+                     ]
+                     if content_before:
+                         return DeltaMessage(content=content_before)
+                 return None
+             else:
+                 # Check if we're between tool calls - skip whitespace
+                 if current_text.rstrip().endswith(self.tool_call_end_token):
+                     # We just ended a tool call, skip whitespace
+                     if delta_text.strip() == "":
+                         return None
+                 # Normal content, no tool call
+                 return DeltaMessage(content=delta_text)
+
+         # Check if we're between tool calls (waiting for next one)
+         # Count tool calls we've seen vs processed
+         tool_starts_count = current_text.count(self.tool_call_start_token)
+         if self.current_tool_index >= tool_starts_count:
+             # We're past all tool calls, shouldn't be here
+             return None
+
+         # We're in a tool call, find the current tool call portion
+         # Need to find the correct tool call based on current_tool_index
+         tool_starts = []
+         idx = 0
+         while True:
+             idx = current_text.find(self.tool_call_start_token, idx)
+             if idx == -1:
+                 break
+             tool_starts.append(idx)
+             idx += len(self.tool_call_start_token)
+
+         if self.current_tool_index >= len(tool_starts):
+             # No more tool calls to process yet
+             return None
+
+         tool_start_idx = tool_starts[self.current_tool_index]
+         # Find where this tool call ends (or current position if not ended yet)
+         tool_end_idx = current_text.find(self.tool_call_end_token, tool_start_idx)
+         if tool_end_idx == -1:
+             tool_text = current_text[tool_start_idx:]
+         else:
+             tool_text = current_text[
+                 tool_start_idx : tool_end_idx + len(self.tool_call_end_token)
+             ]
+
+         # Looking for function header
+         if not self.header_sent:
+             if self.tool_call_prefix in tool_text:
+                 func_start = tool_text.find(self.tool_call_prefix) + len(
+                     self.tool_call_prefix
+                 )
+                 func_end = tool_text.find(">", func_start)
+
+                 if func_end != -1:
+                     # Found complete function name
+                     self.current_function_name = tool_text[func_start:func_end]
+                     self.current_tool_id = self._generate_tool_call_id()
+                     self.header_sent = True
+                     self.in_function = True
+
+                     # IMPORTANT: Add to prev_tool_call_arr immediately when we detect a tool call
+                     # This ensures finish_reason="tool_calls" even if parsing isn't complete
+                     already_added = any(
+                         tool.get("name") == self.current_function_name
+                         for tool in self.prev_tool_call_arr
+                     )
+                     if not already_added:
+                         self.prev_tool_call_arr.append(
+                             {
+                                 "name": self.current_function_name,
+                                 "arguments": "{}",  # Placeholder, will be updated later
+                             }
+                         )
+
+                     # Send header with function info
+                     return DeltaMessage(
+                         tool_calls=[
+                             DeltaToolCall(
+                                 index=self.current_tool_index,
+                                 id=self.current_tool_id,
+                                 function=DeltaFunctionCall(
+                                     name=self.current_function_name, arguments=""
+                                 ),
+                                 type="function",
+                             )
+                         ]
+                     )
+             return None
+
+         # We've sent header, now handle function body
+         if self.in_function:
+             # Send opening brace if not sent yet
+             if not self.json_started and self.parameter_prefix not in delta_text:
+                 self.json_started = True
+                 return DeltaMessage(
+                     tool_calls=[
+                         DeltaToolCall(
+                             index=self.current_tool_index,
+                             function=DeltaFunctionCall(arguments="{"),
+                         )
+                     ]
+                 )
+
+             # Make sure json_started is set if we're processing parameters
+             if not self.json_started:
+                 self.json_started = True
+
+             # Check for function end in accumulated text
+             if not self.json_closed and self.function_end_token in tool_text:
+                 # Close JSON
+                 self.json_closed = True
+
+                 # Extract the complete tool call to update prev_tool_call_arr with final arguments
+                 # Find the function content
+                 func_start = tool_text.find(self.tool_call_prefix) + len(
+                     self.tool_call_prefix
+                 )
+                 func_content_end = tool_text.find(self.function_end_token, func_start)
+                 if func_content_end != -1:
+                     func_content = tool_text[func_start:func_content_end]
+                     # Parse to get the complete arguments
+                     try:
+                         parsed_tool = self._parse_xml_function_call(
+                             func_content, request.tools if request else None
+                         )
+                         if parsed_tool:
+                             # Update existing entry in prev_tool_call_arr with complete arguments
+                             for i, tool in enumerate(self.prev_tool_call_arr):
+                                 if tool.get("name") == parsed_tool.function.name:
+                                     self.prev_tool_call_arr[i]["arguments"] = (
+                                         parsed_tool.function.arguments
+                                     )
+                                     break
+                     except Exception:
+                         pass  # Ignore parsing errors during streaming
+
+                 result = DeltaMessage(
+                     tool_calls=[
+                         DeltaToolCall(
+                             index=self.current_tool_index,
+                             function=DeltaFunctionCall(arguments="}"),
+                         )
+                     ]
+                 )
+
+                 # Reset state for next tool
+                 self.in_function = False
+                 self.json_closed = True
+
+                 return result
+
+             # Look for parameters
+             # Count how many complete parameters we have processed
+             complete_params = tool_text.count(self.parameter_end_token)
+
+             # Check if we should start a new parameter
+             if not self.in_param and self.param_count < complete_params:
+                 # Find the unprocessed parameter
+                 # Count parameter starts
+                 param_starts = []
+                 idx = 0
+                 while True:
+                     idx = tool_text.find(self.parameter_prefix, idx)
+                     if idx == -1:
+                         break
+                     param_starts.append(idx)
+                     idx += len(self.parameter_prefix)
+
+                 if len(param_starts) > self.param_count:
+                     # Process the next parameter
+                     param_idx = param_starts[self.param_count]
+                     param_start = param_idx + len(self.parameter_prefix)
+                     remaining = tool_text[param_start:]
+
+                     if ">" in remaining:
+                         # We have the complete parameter name
+                         name_end = remaining.find(">")
+                         self.current_param_name = remaining[:name_end]
+
+                         # Find the parameter value
+                         value_start = param_start + name_end + 1
+                         value_text = tool_text[value_start:]
+                         if value_text.startswith("\n"):
+                             value_text = value_text[1:]
+
+                         # Find where this parameter ends
+                         param_end_idx = value_text.find(self.parameter_end_token)
+                         if param_end_idx != -1:
+                             # Complete parameter found
+                             param_value = value_text[:param_end_idx]
+                             if param_value.endswith("\n"):
+                                 param_value = param_value[:-1]
+
+                             # Build complete JSON fragment for this parameter
+                             if self.param_count == 0:
+                                 json_fragment = (
+                                     '"'
+                                     + self.current_param_name
+                                     + '": "'
+                                     + json.dumps(param_value)[1:-1]
+                                     + '"'
+                                 )
+                             else:
+                                 json_fragment = (
+                                     ', "'
+                                     + self.current_param_name
+                                     + '": "'
+                                     + json.dumps(param_value)[1:-1]
+                                     + '"'
+                                 )
+
+                             self.param_count += 1
+
+                             return DeltaMessage(
+                                 tool_calls=[
+                                     DeltaToolCall(
+                                         index=self.current_tool_index,
+                                         function=DeltaFunctionCall(
+                                             arguments=json_fragment
+                                         ),
+                                     )
+                                 ]
+                             )
+
+             # Continue parameter value
+             if self.in_param:
+                 if self.parameter_end_token in delta_text:
+                     # End of parameter
+                     end_idx = delta_text.find(self.parameter_end_token)
+                     value_chunk = delta_text[:end_idx]
+
+                     # Skip past > if at start
+                     if not self.current_param_value and ">" in value_chunk:
+                         gt_idx = value_chunk.find(">")
+                         value_chunk = value_chunk[gt_idx + 1 :]
+
+                     if not self.current_param_value and value_chunk.startswith("\n"):
+                         value_chunk = value_chunk[1:]
+
+                     # Calculate incremental JSON
+                     full_value = self.current_param_value + value_chunk
+                     prev_escaped = (
+                         json.dumps(self.current_param_value)[1:-1]
+                         if self.current_param_value
+                         else ""
+                     )
+                     full_escaped = json.dumps(full_value)[1:-1]
+                     delta_escaped = full_escaped[len(prev_escaped) :]
+
+                     self.in_param = False
+                     self.current_param_value = ""
+
+                     return DeltaMessage(
+                         tool_calls=[
+                             DeltaToolCall(
+                                 index=self.current_tool_index,
+                                 function=DeltaFunctionCall(
+                                     arguments=delta_escaped + '"'
+                                 ),
+                             )
+                         ]
+                     )
+                 else:
+                     # Continue accumulating value
+                     value_chunk = delta_text
+
+                     # Handle first chunk after param name
+                     if not self.current_param_value and ">" in value_chunk:
+                         gt_idx = value_chunk.find(">")
+                         value_chunk = value_chunk[gt_idx + 1 :]
+
+                     if not self.current_param_value and value_chunk.startswith("\n"):
+                         value_chunk = value_chunk[1:]
+
+                     if value_chunk:
+                         # Stream the escaped delta
+                         prev_escaped = (
+                             json.dumps(self.current_param_value)[1:-1]
+                             if self.current_param_value
+                             else ""
+                         )
+                         self.current_param_value += value_chunk
+                         full_escaped = json.dumps(self.current_param_value)[1:-1]
+                         delta_escaped = full_escaped[len(prev_escaped) :]
+
+                         if delta_escaped:
+                             return DeltaMessage(
+                                 tool_calls=[
+                                     DeltaToolCall(
+                                         index=self.current_tool_index,
+                                         function=DeltaFunctionCall(
+                                             arguments=delta_escaped
+                                         ),
+                                     )
+                                 ]
+                             )
+
+         return None
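The parser registers itself with vLLM's `ToolParserManager` under the name `qwen3_xml`. A hedged sketch of putting it to use with vLLM's OpenAI-compatible server follows; flag names are taken from vLLM's tool-calling options, the model path is a placeholder, and the `run_command` tool is hypothetical:

```python
# Launch the server with the plugin (shell, shown as a comment):
#   vllm serve path/to/this-repo \
#       --enable-auto-tool-choice \
#       --tool-parser-plugin qwen3coder_tool_parser.py \
#       --tool-call-parser qwen3_xml
# Then call it with any OpenAI-compatible client:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="path/to/this-repo",  # placeholder
    messages=[{"role": "user", "content": "List the files in the current directory."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "run_command",  # hypothetical tool for illustration
            "description": "Run a shell command.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }],
)
print(resp.choices[0].message.tool_calls)
```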