Volko76 committed
Commit dba10be · verified · Parent(s): d707617

Other config files

Files changed (36):
  1. .gitattributes +11 -0
  2. README.md +430 -0
  3. configs/genie/baichuan2_7b.json +59 -0
  4. configs/genie/ibm_granite_v3_1_8b_instruct.json +63 -0
  5. configs/genie/llama_v2_7b_chat.json +57 -0
  6. configs/genie/llama_v3_1_8b_instruct.json +69 -0
  7. configs/genie/llama_v3_1_sea_lion_3_5_r_8b_chat.json +69 -0
  8. configs/genie/llama_v3_2_3b_instruct.json +67 -0
  9. configs/genie/llama_v3_8b_instruct.json +60 -0
  10. configs/genie/llama_v3_taide_8b_chat.json +60 -0
  11. configs/genie/mistral_7b_instruct_v0_3.json +59 -0
  12. configs/genie/phi_3_5_mini_instruct.json +170 -0
  13. configs/genie/qwen2_7b_instruct.json +58 -0
  14. configs/htp/htp_backend_ext_config.json.template +21 -0
  15. genie_bundle/Genie.dll +3 -0
  16. genie_bundle/PlatformValidatorShared.dll +3 -0
  17. genie_bundle/QnnGenAiTransformer.dll +3 -0
  18. genie_bundle/QnnGenAiTransformerModel.dll +0 -0
  19. genie_bundle/QnnHtp.dll +3 -0
  20. genie_bundle/QnnHtpNetRunExtensions.dll +3 -0
  21. genie_bundle/QnnHtpPrepare.dll +3 -0
  22. genie_bundle/QnnHtpv73CalculatorStub.dll +0 -0
  23. genie_bundle/QnnHtpv73Stub.dll +3 -0
  24. genie_bundle/QnnSystem.dll +3 -0
  25. genie_bundle/genie-t2t-run.exe +3 -0
  26. genie_bundle/genie_config.json +67 -0
  27. genie_bundle/htp_backend_ext_config.json +21 -0
  28. genie_bundle/libCalculator_skel.so +0 -0
  29. genie_bundle/libQnnHtpv73.cat +0 -0
  30. genie_bundle/libQnnHtpv73Skel.so +3 -0
  31. genie_bundle/qnn-platform-validator.exe +3 -0
  32. genie_bundle/tokenizer.json +0 -0
  33. platformValidator/output/Result.csv +3 -0
  34. powershell/LlmUtils.ps1 +188 -0
  35. powershell/README.md +41 -0
  36. powershell/RunLlm.ps1 +135 -0
.gitattributes CHANGED
@@ -33,3 +33,14 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+genie_bundle/genie-t2t-run.exe filter=lfs diff=lfs merge=lfs -text
+genie_bundle/Genie.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/libQnnHtpv73Skel.so filter=lfs diff=lfs merge=lfs -text
+genie_bundle/PlatformValidatorShared.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/qnn-platform-validator.exe filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnGenAiTransformer.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnHtp.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnHtpNetRunExtensions.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnHtpPrepare.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnHtpv73Stub.dll filter=lfs diff=lfs merge=lfs -text
+genie_bundle/QnnSystem.dll filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,430 @@
# LLM On-Device Deployment

In this tutorial we show an end-to-end workflow for deploying large language
models (LLMs) to Snapdragon® platforms such as Snapdragon® 8 Elite,
Snapdragon® 8 Gen 3 (e.g., the Samsung Galaxy S24 family), and Snapdragon® X Elite
(e.g., Snapdragon® based Microsoft Surface Pro). We use
[Qualcomm AI Hub](https://aihub.qualcomm.com/) to compile the models to QAIRT
context binaries and run them with Genie from the [QAIRT
SDK](https://qpm.qualcomm.com/#/main/tools/details/Qualcomm_AI_Runtime_SDK).

We use Llama 3 8B as a running example. Other LLMs from [AI Hub
Models](https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models)
work with the same flow.

## Overview

We will walk you through the following steps:

1. Get access to the [Llama 3 weights on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
2. Use Qualcomm [AI Hub Models](https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models) to export Llama 3 through AI Hub.
3. Prepare the assets required by Qualcomm Genie, the inference runtime for LLMs.
4. Run the LLM on device with an example prompt, on Android or on a Windows PC with Snapdragon®.

Note that because this is a large model, it may take 4-6 hours to generate the required assets.

If you have any questions, please feel free to post on the [AI Hub Slack channel](https://aihub.qualcomm.com/community/slack).

## Device Requirements

| Model name | Minimum Compile QAIRT SDK version | Supported devices |
| --- | --- | --- |
| Llama-v2-7B-Chat | 2.27.0 | Snapdragon® 8 Elite<br>Snapdragon® 8 Gen 3<br>Snapdragon® X Elite<br>Snapdragon® X Plus |
| Llama-v3-8B-Instruct | 2.27.0 | Snapdragon® 8 Elite<br>Snapdragon® X Elite<br>Snapdragon® X Plus |
| Llama-v3.1-8B-Instruct | 2.27.7 | Snapdragon® 8 Elite |
| Llama-v3.1-8B-Instruct | 2.28.0 | Snapdragon® X Elite<br>Snapdragon® X Plus |
| Llama-v3.2-3B-Instruct | 2.27.7 | Snapdragon® 8 Elite<br>Snapdragon® 8 Gen 3 (context length 2048) |
| Llama-v3.2-3B-Instruct | 2.28.0 | Snapdragon® X Elite<br>Snapdragon® X Plus |
| Llama-SEA-LION-v3.5-8B-R | 2.28.0 | Snapdragon® 8 Elite<br>Snapdragon® X Elite<br>Snapdragon® X Plus |
| Llama3-TAIDE-LX-8B-Chat-Alpha1 | 2.27.0 | Snapdragon® 8 Elite<br>Snapdragon® X Elite<br>Snapdragon® X Plus |
| Baichuan2-7B | 2.27.7 | Snapdragon® 8 Elite |
| Qwen2-7B-Instruct | 2.27.7 | Snapdragon® 8 Elite |
| Mistral-7B-Instruct-v0.3 | 2.27.7 | Snapdragon® 8 Elite |
| Phi-3.5-Mini-Instruct | 2.29.0 | Snapdragon® 8 Elite<br>Snapdragon® X Elite<br>Snapdragon® 8 Gen 3 |
| IBM-Granite-v3.1-8B-Instruct | 2.30.0 | Snapdragon® 8 Elite<br>Snapdragon® X Elite |

Device requirements:

- Android 15
- At least the Genie SDK from QAIRT (or QNN) SDK 2.29.0 (earlier versions have issues with long prompts).
- Hexagon architecture v73 or above (please see the [Devices](https://app.aihub.qualcomm.com/devices/) list).
- 16 GB memory or more for 7B+ models or a 4096 context length.
- 12 GB memory or more for 3B+ models (you may need to adjust the context length down).

> [!IMPORTANT]
> Please make sure the device requirements are met before proceeding.

## Required Software

The following packages are required:

1. [QAIRT SDK](https://qpm.qualcomm.com/#/main/tools/details/Qualcomm_AI_Runtime_SDK) (see [QNN SDK](https://qpm.qualcomm.com/#/main/tools/details/qualcomm_ai_engine_direct) for versions prior to 2.32)
2. [qai-hub-models](https://pypi.org/project/qai-hub-models/) and any extras for your desired model.
3. [qai-hub](https://pypi.org/project/qai-hub/)

### QAIRT Installation

We typically recommend using the same QAIRT SDK version that AI Hub used to compile
the assets. You can find this version by clicking the job links printed
by the export command.

Go to [QAIRT
SDK](https://qpm.qualcomm.com/#/main/tools/details/Qualcomm_AI_Runtime_SDK) (or [QNN SDK](https://qpm.qualcomm.com/#/main/tools/details/qualcomm_ai_engine_direct) for older versions) and
follow the installation instructions. Note that the first time after logging in you
will be redirected to the QPM home page. Click the link again to get to the
QAIRT download page.

If you are on a Mac laptop, we recommend using
[Docker](https://www.docker.com/) to install qpm-cli to extract the `.qik` file.

If successful, you should see a message with the install path. This depends on
the platform and can look like this:

```text
/opt/qcom/aistack/qairt/<version>
C:\Qualcomm\AIStack\QAIRT\<version>
```

Set your `QNN_SDK_ROOT` environment variable to point to this directory. On
Linux or Mac you would run:

```bash
export QNN_SDK_ROOT=/opt/qcom/aistack/qairt/<version>
```

On Windows, you can search the taskbar for "Edit the system environment
variables".
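Alternatively, you can set the variable from a PowerShell terminal. As a sketch, assuming the Windows install path shown above (replace `<version>` with your installed version):

```powershell
# Persist QNN_SDK_ROOT for future terminal sessions
setx QNN_SDK_ROOT "C:\Qualcomm\AIStack\QAIRT\<version>"
```

Note that `setx` affects new terminal sessions, not the one you run it in.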
### Python Packages

Following standard best practices, we recommend creating a virtual environment specifically for
exporting AI Hub models. The following steps can be performed on Windows,
Linux, or Mac. On Windows, you can either install x86-64 Python (since package
support is limited on native ARM64 Python) or use Windows Subsystem for Linux
(WSL).

#### Create Virtual Environment

Create a [virtualenv](https://virtualenv.pypa.io/en/latest/) for `qai-hub-models` with Python 3.10.
You can also use [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).

For clarity, we recommend creating a virtual environment:

```bash
python3.10 -m venv llm_on_genie_venv
```

#### Install `qai-hub-models`

In a shell session, install `qai-hub-models` in the virtual environment:

```bash
source llm_on_genie_venv/bin/activate
pip install -U "qai-hub-models[llama-v3-8b-instruct]"
```
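On Windows with x86-64 Python (rather than WSL), the activation script lives under `Scripts` instead of `bin`; a sketch of the equivalent commands:

```powershell
# Activate the virtual environment in PowerShell, then install the model extras
llm_on_genie_venv\Scripts\Activate.ps1
pip install -U "qai-hub-models[llama-v3-8b-instruct]"
```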
Replace `llama-v3-8b-instruct` with the desired Llama model from [AI Hub
Models](https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models).
Note that you must replace `_` with `-` (e.g., `llama_v3_8b_instruct` -> `llama-v3-8b-instruct`).

Make sure Git is installed in your environment. This command should work:

```bash
git --version
```

Ensure at least 80 GB of memory (RAM + swap). On Ubuntu (including through WSL) you can check this with:

```bash
free -h
```

Increase the swap size if needed.
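For example, on Ubuntu you can add a swap file. A minimal sketch, assuming you have the free disk space and want an extra 64 GB (adjust the size to your machine):

```bash
# Create, protect, format, and enable a 64 GB swap file (lasts until reboot)
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h  # verify the new swap is active
```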
We use
[qai-hub-models](https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/)
to adapt Hugging Face Llama models for on-device inference.

## Acquire Genie-Compatible QNN Binaries from AI Hub

### [Llama Only] Set Up Hugging Face Token

Setting up a Hugging Face token is required only for the Llama model family.
Request model access on Hugging Face for Llama models. For instance, you can [apply here](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) for access to the Llama 3.2 3B model.

Set up your Hugging Face token locally by following the instructions [here](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
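In short, with the Hugging Face CLI installed, logging in looks like this (paste a token created in your Hugging Face account settings when prompted):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login
```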
### Download or Generate Genie-Compatible QNN Binaries

Some models can be downloaded directly from [AI
Hub](https://aihub.qualcomm.com). Llama models have to be exported through [AI Hub
Models](https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models).

To generate the Llama assets, we will run a single command that performs the
following steps:

1. Download model weights from Hugging Face. You will need to sign the Llama
   license if you haven't already done so.

2. Upload models to AI Hub for compilation.

3. Download compiled context binaries. Note that there are multiple binaries, as
   we have split up the model.

Make a directory to hold all deployable assets. For this example we use:

```bash
mkdir -p genie_bundle
```

#### [Optional] Upgrade PyTorch

The export command below may take 4-6 hours. It takes an additional 1-2 hours
on PyTorch versions earlier than 2.4.0. We recommend upgrading PyTorch first:

```bash
pip install torch==2.4.0
```

This version is not yet supported in general by AI Hub Models but will work
for the export command below.

Note that the export also requires a lot of memory (RAM + swap) on the host
device (for Llama 3, we recommend 80 GB). If we detect that you have less
memory than recommended, the export command will print a warning with
instructions on how to increase your swap space.

#### For Android on Snapdragon® 8 Elite

```bash
python -m qai_hub_models.models.llama_v3_8b_instruct.export --device "Snapdragon 8 Elite QRD" --skip-inferencing --skip-profiling --output-dir genie_bundle
```

For Snapdragon 8 Gen 3, please use `--device "Snapdragon 8 Gen 3 QRD"`.

#### For Windows on Snapdragon® X Elite

```bash
python -m qai_hub_models.models.llama_v3_8b_instruct.export --device "Snapdragon X Elite CRD" --skip-inferencing --skip-profiling --output-dir genie_bundle
```

Note: For older devices, you may need to adjust the context length using
`--context-length <context-length>`.

The `genie_bundle` directory now contains both the intermediate models (`token`,
`prompt`) and the final context binaries (`*.bin`). Remove the intermediate
models to get a smaller deployable artifact:

```bash
# Remove intermediate assets
rm -rf genie_bundle/{prompt,token}
```

## Prepare Genie Configs

### Tokenizer

To download the tokenizer, go to the source model's Hugging Face page and open "Files
and versions." You can find a Hugging Face link through the model card on
[AI Hub](https://aihub.qualcomm.com/). This will take you to the Qualcomm Hugging Face page,
which in turn links to the source Hugging Face page. The file is named `tokenizer.json`
and should be downloaded to the `genie_bundle` directory. The tokenizers are only hosted on the source Hugging Face page.

| Model name | Tokenizer | Notes |
| --- | --- | --- |
| Llama-v2-7B-Chat | [tokenizer.json](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer.json) | |
| Llama-v3-8B-Instruct | [tokenizer.json](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/tokenizer.json) | |
| Llama-v3.1-8B-Instruct | [tokenizer.json](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/tokenizer.json) | |
| Llama-SEA-LION-v3.5-8B-R | [tokenizer.json](https://huggingface.co/aisingapore/Llama-SEA-LION-v3.5-8B-R/blob/main/tokenizer.json) | |
| Llama-v3.2-3B-Instruct | [tokenizer.json](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/blob/main/tokenizer.json) | |
| Llama3-TAIDE-LX-8B-Chat-Alpha1 | [tokenizer.json](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1/blob/main/tokenizer.json) | |
| Baichuan2-7B | [tokenizer.json](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/baichuan2_7b_quantized/v2/tokenizer.json) | |
| Qwen2-7B-Instruct | [tokenizer.json](https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/tokenizer.json) | |
| Phi-3.5-Mini-Instruct | [tokenizer.json](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/tokenizer.json) | To see appropriate spaces in the output, remove lines 193-196 (the Strip rule) in the tokenizer file. |
| Mistral-7B-Instruct-v0.3 | [tokenizer.json](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/tokenizer.json) | |
| IBM-Granite-v3.1-8B-Instruct | [tokenizer.json](https://huggingface.co/ibm-granite/granite-3.1-8b-base/blob/main/tokenizer.json) | |
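If you prefer the command line, the tokenizer can also be fetched with the Hugging Face CLI. A sketch for the Llama 3 example, assuming you have been granted access to the source repository:

```bash
# Download only tokenizer.json into the bundle directory
huggingface-cli download meta-llama/Meta-Llama-3-8B tokenizer.json --local-dir genie_bundle
```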
### [Optional] Use the Windows PowerShell LLM Runner

**Do not use this script to create your Genie bundle if you are building the Windows ChatApp. Continue with the rest of the tutorial instead.**

The easiest path to running an LLM on a Windows on Snapdragon® device is to use the [PowerShell implementation](powershell/)
of the rest of this tutorial. It will automatically generate the appropriate configuration files and execute `genie-t2t-run.exe`
on a prompt of your choosing.

### Genie Config

Check out the [AI Hub Apps repository](https://github.com/quic/ai-hub-apps)
using Git:

```bash
git clone https://github.com/quic/ai-hub-apps.git
```

Now run (replacing `llama_v3_8b_instruct` with the desired model ID):

```bash
cp ai-hub-apps/tutorials/llm_on_genie/configs/genie/llama_v3_8b_instruct.json genie_bundle/genie_config.json
```

For Windows laptops, please set `use-mmap` to `false`.

If you customized the context length by adding `--context-length` to the export
command, please open `genie_config.json` and modify the `"size"` option (under
`"dialog"` -> `"context"`) to be consistent.

In `genie_bundle/genie_config.json`, also ensure that the list of bin files in
`ctx-bins` matches the bin files under `genie_bundle`; see the sketch after this
paragraph. Genie will look for the QNN binaries specified here.
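For example, an illustrative fragment of `genie_config.json` (other required fields omitted), assuming `--context-length 2048` and a hypothetical three-part Llama 3.2 3B export; your bin file names may differ:

```json
{
  "dialog": {
    "context": {
      "size": 2048
    },
    "engine": {
      "model": {
        "binary": {
          "ctx-bins": [
            "llama_v3_2_3b_instruct_part_1_of_3.bin",
            "llama_v3_2_3b_instruct_part_2_of_3.bin",
            "llama_v3_2_3b_instruct_part_3_of_3.bin"
          ]
        }
      }
    }
  }
}
```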
### HTP Backend Config

Copy the HTP config template:

```bash
cp ai-hub-apps/tutorials/llm_on_genie/configs/htp/htp_backend_ext_config.json.template genie_bundle/htp_backend_ext_config.json
```

Edit `soc_model` and `dsp_arch` in `genie_bundle/htp_backend_ext_config.json`
depending on your target device (they should be consistent with the `--device` you
specified in the export command):

| Generation | `soc_model` | `dsp_arch` |
| --- | --- | --- |
| Snapdragon® 8 Gen 2 | 43 | v73 |
| Snapdragon® 8 Gen 3 | 57 | v75 |
| Snapdragon® 8 Elite | 69 | v79 |
| Snapdragon® X Elite | 60 | v73 |
| Snapdragon® X Plus | 60 | v73 |
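For instance, here is the template filled in for Snapdragon® 8 Elite, using the `soc_model` and `dsp_arch` values from the table above:

```json
{
  "devices": [
    {
      "soc_model": 69,
      "dsp_arch": "v79",
      "cores": [
        {
          "core_id": 0,
          "perf_profile": "burst",
          "rpc_control_latency": 100
        }
      ]
    }
  ],
  "memory": {
    "mem_type": "shared_buffer"
  },
  "context": {
    "weight_sharing_enabled": true
  }
}
```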
## Collect & Finalize Genie Bundle

When finished with the above steps, your bundle should look like this:

```text
genie_bundle/
    genie_config.json
    htp_backend_ext_config.json
    tokenizer.json
    <model_id>_part_1_of_N.bin
    ...
    <model_id>_part_N_of_N.bin
```

where `<model_id>` is the name of the model, i.e., the name of the JSON config you copied from `configs/genie/<model_id>.json`.

## Run LLM on Device

You have three options to run the LLM on device:

1. Use the `genie-t2t-run` CLI command.
2. Use the [CLI Windows ChatApp](https://github.com/quic/ai-hub-apps/tree/main/apps/windows/cpp/ChatApp) (Windows only).
3. Use the [Android ChatApp](https://github.com/quic/ai-hub-apps/tree/main/apps/android/ChatApp).

### Prompt Formats

Each LLM expects a different prompt format. To get sensible output, it is important to use the correct prompt format for the model. The formats can also be found in each model's Hugging Face repository. Samples for a few models are shown below.

| Model name | Sample Prompt |
| --- | --- |
| Llama-v2-7B-Chat | &lt;s&gt;[INST] &lt;&lt;SYS&gt;&gt;You are a helpful AI Assistant.&lt;&lt;/SYS&gt;&gt;[/INST]&lt;/s&gt;&lt;s&gt;[INST]What is France's capital?[/INST] |
| Llama-v3-8B-Instruct <br> Llama-v3.1-8B-Instruct <br> Llama-v3.2-3B-Instruct | <&#124;begin_of_text&#124;><&#124;start_header_id&#124;>user<&#124;end_header_id&#124;>\n\nWhat is France's capital?<&#124;eot_id&#124;><&#124;start_header_id&#124;>assistant<&#124;end_header_id&#124;> |
| Llama3-TAIDE-LX-8B-Chat-Alpha1 | <&#124;begin_of_text&#124;><&#124;start_header_id&#124;>system<&#124;end_header_id&#124;>\n\n你是一個來自台灣的AI助理,你的名字是 TAIDE,樂於以台灣人的立場幫助使用者,會用繁體中文回答問題<&#124;eot_id&#124;>\n<&#124;start_header_id&#124;>user<&#124;end_header_id&#124;>\n\n介紹台灣特色<&#124;eot_id&#124;>\n<&#124;start_header_id&#124;>assistant<&#124;end_header_id&#124;> |
| Llama-SEA-LION-v3.5-8B-R (non-thinking mode) | <&#124;begin_of_text&#124;><&#124;start_header_id&#124;>system<&#124;end_header_id&#124;>\n\ndetailed thinking off<&#124;eot_id&#124;><&#124;start_header_id&#124;>user<&#124;end_header_id&#124;>\n\nThủ đô của Việt Nam là thành phố nào?<&#124;eot_id&#124;><&#124;start_header_id&#124;>assistant<&#124;end_header_id&#124;>\n\n&lt;think&gt;\n\n&lt;/think&gt;\n\n |
| Llama-SEA-LION-v3.5-8B-R (thinking mode) | <&#124;begin_of_text&#124;><&#124;start_header_id&#124;>system<&#124;end_header_id&#124;>\n\ndetailed thinking on<&#124;eot_id&#124;><&#124;start_header_id&#124;>user<&#124;end_header_id&#124;>\n\nThủ đô của Việt Nam là thành phố nào?<&#124;eot_id&#124;><&#124;start_header_id&#124;>assistant<&#124;end_header_id&#124;>\n\n&lt;think&gt;\nHere is my thinking:\n |
| Qwen2-7B-Instruct | <&#124;im_start&#124;>system\nYou are a helpful AI Assistant<&#124;im_end&#124;>\n<&#124;im_start&#124;>user\nWhat is France's capital?<&#124;im_end&#124;>\n<&#124;im_start&#124;>assistant\n |
| Phi-3.5-Mini-Instruct | <&#124;system&#124;>\nYou are a helpful assistant. Be helpful but brief.<&#124;end&#124;>\n<&#124;user&#124;>What is France's capital?\n<&#124;end&#124;>\n<&#124;assistant&#124;>\n |
| Mistral-7B-Instruct-v0.3 | &lt;s&gt;[INST] You are a helpful assistant\n\nTranslate 'Good morning, how are you?' into French.[/INST] |
| IBM-Granite-v3.1-8B-Instruct | <&#124;start_of_role&#124;>system<&#124;end_of_role&#124;>You are a helpful AI assistant.<&#124;end_of_text&#124;>\n <&#124;start_of_role&#124;>user<&#124;end_of_role&#124;>What is France's capital?<&#124;end_of_text&#124;>\n <&#124;start_of_role&#124;>assistant<&#124;end_of_role&#124;>\n |

### 1. Run Genie On-Device via `genie-t2t-run`

#### Genie on Windows with Snapdragon® X

Copy Genie's shared libraries and executable to the bundle.
(You can skip this step if you used the PowerShell script to prepare your bundle.)

```bash
cp $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/* genie_bundle
cp $QNN_SDK_ROOT/lib/aarch64-windows-msvc/* genie_bundle
cp $QNN_SDK_ROOT/bin/aarch64-windows-msvc/genie-t2t-run.exe genie_bundle
```

In PowerShell, navigate to the bundle directory and run:

```bash
./genie-t2t-run.exe -c genie_config.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
```

Note that this prompt format is specific to Llama 3.

#### Genie on Android

Copy Genie's shared libraries and executable to the bundle.

```bash
# For 8 Gen 2
cp $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/* genie_bundle
# For 8 Gen 3
cp $QNN_SDK_ROOT/lib/hexagon-v75/unsigned/* genie_bundle
# For 8 Elite
cp $QNN_SDK_ROOT/lib/hexagon-v79/unsigned/* genie_bundle
# For all devices
cp $QNN_SDK_ROOT/lib/aarch64-android/* genie_bundle
cp $QNN_SDK_ROOT/bin/aarch64-android/genie-t2t-run genie_bundle
```

Copy `genie_bundle` from the host machine to the target device using ADB and
open an interactive shell on the target device:

```bash
adb push genie_bundle /data/local/tmp
adb shell
```

On device, navigate to the bundle directory:

```bash
cd /data/local/tmp/genie_bundle
```

Set `LD_LIBRARY_PATH` and `ADSP_LIBRARY_PATH` to the current directory:

```bash
export LD_LIBRARY_PATH=$PWD
export ADSP_LIBRARY_PATH=$PWD
```

Then run:

```bash
./genie-t2t-run -c genie_config.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
```

#### Sample Output

```text
Using libGenie.so version 1.1.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary List Async"
[INFO]  "Allocated total size = 323453440 across 10 buffers"
[PROMPT]: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

[BEGIN]: \n\nFrance's capital is Paris.[END]

[KPIS]:
Init Time: 6549034 us
Prompt Processing Time: 196067 us, Prompt Processing Rate : 86.707710 toks/sec
Token Generation Time: 740568 us, Token Generation Rate: 12.152884 toks/sec
```

### 2. Sample C++ Chat App Powered by Genie SDK

We provide a sample C++ app that shows how to build an application using the Genie SDK.
See [CLI Windows ChatApp](https://github.com/quic/ai-hub-apps/tree/main/apps/windows/cpp/ChatApp) for more details.

### 3. Sample Android Chat App Powered by Genie SDK

We provide a sample Android app (Java and C++) that shows how to build an application using the Genie SDK for mobile.
See [Android ChatApp](https://github.com/quic/ai-hub-apps/tree/main/apps/android/ChatApp) for more details.
configs/genie/baichuan2_7b.json ADDED
@@ -0,0 +1,59 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 125696,
      "bos-token": 1,
      "eos-token": 2
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": false,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "rope-theta": 10000,
          "pos-id-dim": 64,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "weight_sharing_model_1_of_5.serialized.bin",
            "weight_sharing_model_2_of_5.serialized.bin",
            "weight_sharing_model_3_of_5.serialized.bin",
            "weight_sharing_model_4_of_5.serialized.bin",
            "weight_sharing_model_5_of_5.serialized.bin"
          ]
        }
      }
    }
  }
}
configs/genie/ibm_granite_v3_1_8b_instruct.json ADDED
@@ -0,0 +1,63 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 49155,
      "bos-token": 0,
      "eos-token": 0,
      "pad-token": 0
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "weight_sharing_model_1_of_5.serialized.bin",
            "weight_sharing_model_2_of_5.serialized.bin",
            "weight_sharing_model_3_of_5.serialized.bin",
            "weight_sharing_model_4_of_5.serialized.bin",
            "weight_sharing_model_5_of_5.serialized.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 10000000
        }
      }
    }
  }
}
configs/genie/llama_v2_7b_chat.json ADDED
@@ -0,0 +1,57 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 1024,
      "n-vocab": 32000,
      "bos-token": 1,
      "eos-token": 2
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 4,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "spill-fill-bufsize": 0,
          "use-mmap": true,
          "mmap-budget": 0,
          "poll": true,
          "pos-id-dim": 64,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "Llama2_Part1.bin",
            "Llama2_Part2.bin",
            "Llama2_Part3.bin",
            "Llama2_Part4.bin"
          ]
        }
      }
    }
  }
}
configs/genie/llama_v3_1_8b_instruct.json ADDED
@@ -0,0 +1,69 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": -1,
      "eos-token": [128001, 128009, 128008]
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "llama_v3_1_8b_instruct_part_1_of_5.bin",
            "llama_v3_1_8b_instruct_part_2_of_5.bin",
            "llama_v3_1_8b_instruct_part_3_of_5.bin",
            "llama_v3_1_8b_instruct_part_4_of_5.bin",
            "llama_v3_1_8b_instruct_part_5_of_5.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 500000,
          "rope-scaling": {
            "rope-type": "llama3",
            "factor": 8.0,
            "low-freq-factor": 1.0,
            "high-freq-factor": 4.0,
            "original-max-position-embeddings": 8192
          }
        }
      }
    }
  }
}
configs/genie/llama_v3_1_sea_lion_3_5_r_8b_chat.json ADDED
@@ -0,0 +1,69 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": -1,
      "eos-token": [128001, 128009, 128008]
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.85,
      "top-k": 40,
      "top-p": 0.6
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "llama_v3_1_sea_lion_3_5_r_8b_chat_part_1_of_5.bin",
            "llama_v3_1_sea_lion_3_5_r_8b_chat_part_2_of_5.bin",
            "llama_v3_1_sea_lion_3_5_r_8b_chat_part_3_of_5.bin",
            "llama_v3_1_sea_lion_3_5_r_8b_chat_part_4_of_5.bin",
            "llama_v3_1_sea_lion_3_5_r_8b_chat_part_5_of_5.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 500000,
          "rope-scaling": {
            "rope-type": "llama3",
            "factor": 8.0,
            "low-freq-factor": 1.0,
            "high-freq-factor": 4.0,
            "original-max-position-embeddings": 8192
          }
        }
      }
    }
  }
}
configs/genie/llama_v3_2_3b_instruct.json ADDED
@@ -0,0 +1,67 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": -1,
      "eos-token": [128001, 128009, 128008]
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "llama_v3_2_3b_instruct_part_1_of_3.bin",
            "llama_v3_2_3b_instruct_part_2_of_3.bin",
            "llama_v3_2_3b_instruct_part_3_of_3.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 500000,
          "rope-scaling": {
            "rope-type": "llama3",
            "factor": 8.0,
            "low-freq-factor": 1.0,
            "high-freq-factor": 4.0,
            "original-max-position-embeddings": 8192
          }
        }
      }
    }
  }
}
configs/genie/llama_v3_8b_instruct.json ADDED
@@ -0,0 +1,60 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": 128000,
      "eos-token": 128001,
      "eot-token": 128009
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "pos-id-dim": 64,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "rope-theta": 500000,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "llama_v3_8b_instruct_part_1_of_5.bin",
            "llama_v3_8b_instruct_part_2_of_5.bin",
            "llama_v3_8b_instruct_part_3_of_5.bin",
            "llama_v3_8b_instruct_part_4_of_5.bin",
            "llama_v3_8b_instruct_part_5_of_5.bin"
          ]
        }
      }
    }
  }
}
configs/genie/llama_v3_taide_8b_chat.json ADDED
@@ -0,0 +1,60 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": 128000,
      "eos-token": 128001,
      "eot-token": 128009
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "pos-id-dim": 64,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "rope-theta": 500000,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "llama_v3_taide_8b_chat_part_1_of_5.bin",
            "llama_v3_taide_8b_chat_part_2_of_5.bin",
            "llama_v3_taide_8b_chat_part_3_of_5.bin",
            "llama_v3_taide_8b_chat_part_4_of_5.bin",
            "llama_v3_taide_8b_chat_part_5_of_5.bin"
          ]
        }
      }
    }
  }
}
configs/genie/mistral_7b_instruct_v0_3.json ADDED
@@ -0,0 +1,59 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": 1,
      "eos-token": 2,
      "eot-token": 128009
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "spill-fill-bufsize": 0,
          "use-mmap": true,
          "mmap-budget": 0,
          "poll": true,
          "pos-id-dim": 64,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "rope-theta": 10000,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "weight_sharing_model_1_of_4.serialized.bin",
            "weight_sharing_model_2_of_4.serialized.bin",
            "weight_sharing_model_3_of_4.serialized.bin",
            "weight_sharing_model_4_of_4.serialized.bin"
          ]
        }
      }
    }
  }
}
configs/genie/phi_3_5_mini_instruct.json ADDED
@@ -0,0 +1,170 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "n-vocab": 32064,
      "size": 4096,
      "bos-token": 1,
      "eos-token": [
        32007,
        32001,
        32000
      ]
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 96,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "weight_sharing_model_1_of_4.serialized.bin",
            "weight_sharing_model_2_of_4.serialized.bin",
            "weight_sharing_model_3_of_4.serialized.bin",
            "weight_sharing_model_4_of_4.serialized.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 48,
          "rope-theta": 10000,
          "rope-scaling": {
            "rope-type": "longrope",
            "factor": 32,
            "original-max-position-embeddings": 4096,
            "long-factor": [
              1.0800000429153442,
              1.1100000143051147,
              1.1399999856948853,
              1.340000033378601,
              1.5899999141693115,
              1.600000023841858,
              1.6200000047683716,
              2.620000123977661,
              3.2300000190734863,
              3.2300000190734863,
              4.789999961853027,
              7.400000095367432,
              7.700000286102295,
              9.09000015258789,
              12.199999809265137,
              17.670000076293945,
              24.46000099182129,
              28.57000160217285,
              30.420001983642578,
              30.840002059936523,
              32.590003967285156,
              32.93000411987305,
              42.320003509521484,
              44.96000289916992,
              50.340003967285156,
              50.45000457763672,
              57.55000305175781,
              57.93000411987305,
              58.21000289916992,
              60.1400032043457,
              62.61000442504883,
              62.62000274658203,
              62.71000289916992,
              63.1400032043457,
              63.1400032043457,
              63.77000427246094,
              63.93000411987305,
              63.96000289916992,
              63.970001220703125,
              64.02999877929688,
              64.06999969482422,
              64.08000183105469,
              64.12000274658203,
              64.41000366210938,
              64.4800033569336,
              64.51000213623047,
              64.52999877929688,
              64.83999633789062
            ],
            "short-factor": [
              1,
              1.0199999809265137,
              1.0299999713897705,
              1.0299999713897705,
              1.0499999523162842,
              1.0499999523162842,
              1.0499999523162842,
              1.0499999523162842,
              1.0499999523162842,
              1.0699999332427979,
              1.0999999046325684,
              1.1099998950958252,
              1.1599998474121094,
              1.1599998474121094,
              1.1699998378753662,
              1.2899998426437378,
              1.339999794960022,
              1.679999828338623,
              1.7899998426437378,
              1.8199998140335083,
              1.8499997854232788,
              1.8799997568130493,
              1.9099997282028198,
              1.9399996995925903,
              1.9899996519088745,
              2.0199997425079346,
              2.0199997425079346,
              2.0199997425079346,
              2.0199997425079346,
              2.0199997425079346,
              2.0199997425079346,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0299997329711914,
              2.0799996852874756,
              2.0899996757507324,
              2.189999580383301,
              2.2199995517730713,
              2.5899994373321533,
              2.729999542236328,
              2.749999523162842,
              2.8399994373321533
            ]
          }
        }
      }
    }
  }
}
configs/genie/qwen2_7b_instruct.json ADDED
@@ -0,0 +1,58 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 152064,
      "bos-token": -1,
      "eos-token": 151645
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": false,
          "pos-id-dim": 64,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "rope-theta": 1000000,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "weight_sharing_model_1_of_4.serialized.bin",
            "weight_sharing_model_2_of_4.serialized.bin",
            "weight_sharing_model_3_of_4.serialized.bin",
            "weight_sharing_model_4_of_4.serialized.bin"
          ]
        }
      }
    }
  }
}
configs/htp/htp_backend_ext_config.json.template ADDED
@@ -0,0 +1,21 @@
{
  "devices": [
    {
      "soc_model": <TODO>,
      "dsp_arch": "<TODO>",
      "cores": [
        {
          "core_id": 0,
          "perf_profile": "burst",
          "rpc_control_latency": 100
        }
      ]
    }
  ],
  "memory": {
    "mem_type": "shared_buffer"
  },
  "context": {
    "weight_sharing_enabled": true
  }
}
genie_bundle/Genie.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:415408862c6d4d00dcb6311fbe09295616488e34fef7c56021d78a19124ab018
size 6075904
genie_bundle/PlatformValidatorShared.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:148b5d54dd5c66886bd54fc2fc61492414d04ff8dd31ce8a2395596219cdc852
size 10963968
genie_bundle/QnnGenAiTransformer.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b2fe7907763b53c1a6f576844a3cc8a2e984adc5ca74fca8aa68e9023b0c5b64
size 516096
genie_bundle/QnnGenAiTransformerModel.dll ADDED
Binary file (65 kB).
 
genie_bundle/QnnHtp.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45371a736f3fa4a131fb51031959d6b771577ec2ecc2ea2ef25c690ac94f77ab
size 2168832
genie_bundle/QnnHtpNetRunExtensions.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85aa19646293a0b6319345064b034ce36f6a89935b6ab98fad5221af3640c6e8
size 835584
genie_bundle/QnnHtpPrepare.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6202a3516c606545ab28fc4db0e554f46839aef9c178065cc9f637e82024d5bf
size 62586880
genie_bundle/QnnHtpv73CalculatorStub.dll ADDED
Binary file (25.1 kB).
 
genie_bundle/QnnHtpv73Stub.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14bba834d18b6491b0da343f2ef37d6497f5f6f43991d84ea4954916ae09419d
size 328704
genie_bundle/QnnSystem.dll ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1bc04d39a24d81932b545293e7de1dccc632d3b52c9841dc5e8cb4276142bbeb
size 2276352
genie_bundle/genie-t2t-run.exe ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1afe8a72063d6780b100e9a75f161c681b309c9e2af8640a28f1e4c344269453
size 228864
genie_bundle/genie_config.json ADDED
@@ -0,0 +1,67 @@
{
  "dialog": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 128256,
      "bos-token": -1,
      "eos-token": [128001, 128009, 128008]
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95
    },
    "tokenizer": {
      "version": 1,
      "path": "C:\\ai-hub-apps\\tutorials\\llm_on_genie\\genie_bundle\\tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 3,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": false,
          "spill-fill-bufsize": 0,
          "mmap-budget": 0,
          "poll": true,
          "cpu-mask": "0xe0",
          "kv-dim": 128,
          "allow-async-init": false
        },
        "extensions": "C:\\ai-hub-apps\\tutorials\\llm_on_genie\\genie_bundle\\htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "C:\\ai-hub-apps\\tutorials\\llm_on_genie\\genie_bundle\\llama_v3_2_3b_instruct_part_1_of_3.bin",
            "C:\\ai-hub-apps\\tutorials\\llm_on_genie\\genie_bundle\\llama_v3_2_3b_instruct_part_2_of_3.bin",
            "C:\\ai-hub-apps\\tutorials\\llm_on_genie\\genie_bundle\\llama_v3_2_3b_instruct_part_3_of_3.bin"
          ]
        },
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 500000,
          "rope-scaling": {
            "rope-type": "llama3",
            "factor": 8.0,
            "low-freq-factor": 1.0,
            "high-freq-factor": 4.0,
            "original-max-position-embeddings": 8192
          }
        }
      }
    }
  }
}
genie_bundle/htp_backend_ext_config.json ADDED
@@ -0,0 +1,21 @@
{
  "devices": [
    {
      "soc_model": 60,
      "dsp_arch": "v73",
      "cores": [
        {
          "core_id": 0,
          "perf_profile": "burst",
          "rpc_control_latency": 100
        }
      ]
    }
  ],
  "memory": {
    "mem_type": "shared_buffer"
  },
  "context": {
    "weight_sharing_enabled": true
  }
}
genie_bundle/libCalculator_skel.so ADDED
Binary file (5.6 kB).
 
genie_bundle/libQnnHtpv73.cat ADDED
Binary file (12.1 kB).
 
genie_bundle/libQnnHtpv73Skel.so ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a96ea19f8bb55d576ab3721787d2502bc744064cca5f7f4aa267b39b802d45b5
size 9311684
genie_bundle/qnn-platform-validator.exe ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75b259422c65c7e970280a732059f451637ea1048bafdab00bcf45d37fd90047
size 637952
genie_bundle/tokenizer.json ADDED
The diff for this file is too large to render.
 
platformValidator/output/Result.csv ADDED
@@ -0,0 +1,3 @@
Platform Validator Results:
Backend,Hardware Backend,Library Prerequisites,Library Version,Backend Core Version,Unit Test,Overall Result
DSP,Supported,Found,Not Queried,Not Queried,Passed,Passed
powershell/LlmUtils.ps1 ADDED
@@ -0,0 +1,188 @@
# ------------------------------------------------------------------------
# Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ------------------------------------------------------------------------

function Copy-Genie {
    <#
    .SYNOPSIS
        Copy enough of QNN to be able to run genie-t2t-run.exe.
    #>

    param(
        [Parameter(Mandatory = $true)]
        [string]$QnnSdkRoot,

        [Parameter(Mandatory = $true)]
        [string]$Destination,

        [Parameter(Mandatory = $true)]
        [string]$DspArch,

        [boolean]$Clean = $true
    )

    if ($Clean) {
        Remove-Item -Path $Destination\*.* -Include *.cat, *.dll, *.exe, *.lib, *.so
    }

    $Arch = "aarch64-windows-msvc"

    Copy-Item -Destination $Destination `
        $QnnSdkRoot\bin\$Arch\genie-t2t-run.exe, `
        $QnnSdkRoot\bin\$Arch\qnn-platform-validator.exe, `
        $QnnSdkRoot\lib\$Arch\Genie.dll, `
        $QnnSdkRoot\lib\$Arch\PlatformValidatorShared.dll, `
        $QnnSdkRoot\lib\$Arch\QnnGenAiTransformer.dll, `
        $QnnSdkRoot\lib\$Arch\QnnGenAiTransformerModel.dll, `
        $QnnSdkRoot\lib\$Arch\QnnHtp.dll, `
        $QnnSdkRoot\lib\$Arch\QnnHtpNetRunExtensions.dll, `
        $QnnSdkRoot\lib\$Arch\QnnHtpPrepare.dll, `
        $QnnSdkRoot\lib\$Arch\QnnHtp${DspArch}Stub.dll, `
        $QnnSdkRoot\lib\$Arch\QnnHtp${DspArch}CalculatorStub.dll, `
        $QnnSdkRoot\lib\$Arch\QnnSystem.dll, `
        $QnnSdkRoot\lib\hexagon-$DspArch\unsigned\libCalculator_skel.so, `
        $QnnSdkRoot\lib\hexagon-$DspArch\unsigned\libQnnHtp${DspArch}.cat, `
        $QnnSdkRoot\lib\hexagon-$DspArch\unsigned\libQnnHtp${DspArch}Skel.so
}

function Get-ModelFileNames {
    <#
    .SYNOPSIS
        Get a list of a model's context binaries.
    #>
    param (
        [Parameter(Mandatory = $true)]
        [string]$GenieConfigPath
    )

    $genieConfig = (Get-Content -Path $GenieConfigPath) | ConvertFrom-Json
    return $genieConfig.dialog.engine.model.binary."ctx-bins"
}

function Test-ModelFiles {
    <#
    .SYNOPSIS
        Test if all files required to run a model are available.
    #>
    param (
        [string]$GenieConfigTemplatePath,
        [string]$BundleRoot,
        [string]$TokenizerPath
    )

    if (-Not (Test-Path $GenieConfigTemplatePath)) {
        Write-Error "Genie config template $GenieConfigTemplatePath does not exist."
        return $false
    }

    $modelFileNames = Get-ModelFileNames -GenieConfigPath $GenieConfigTemplatePath
    foreach ($modelFileName in $modelFileNames) {
        $modelFilePath = Join-Path $BundleRoot $modelFileName
        if (-Not (Test-Path $modelFilePath)) {
            Write-Error "Model file $modelFilePath does not exist."
            return $false
        }
    }

    if (-Not (Test-Path $TokenizerPath)) {
        Write-Error "Tokenizer not found in $TokenizerPath. See LLM on Genie tutorial for more info."
        return $false
    }

    return $true
}

function Test-SdkDirectories {
    <#
    .SYNOPSIS
        Check if the QNN and QAIHM directories are valid.
    #>

    param(
        [Parameter(Mandatory = $true)]
        [string]$QnnSdkRoot,

        [Parameter(Mandatory = $true)]
        [string]$QaihmRoot
    )

    if (-Not $QnnSdkRoot) {
        Write-Error "QNN SDK root has not been set. Set Env[QNN_SDK_ROOT] or pass -QnnSdkRoot argument."
        return $false
    }

    if (-Not (Test-Path -Path $QnnSdkRoot -PathType Container)) {
        Write-Error "QNN SDK root '$QnnSdkRoot' is not a directory."
        return $false
    }

    if (-Not $QaihmRoot) {
        Write-Error "ai-hub-models repo root has not been set. Set Env[QAIHM_ROOT] or pass -QaihmRoot argument."
        return $false
    }

    if (-Not (Test-Path -Path $QaihmRoot -PathType Container)) {
        Write-Error "ai-hub-models repo root '$QaihmRoot' is not a directory."
        return $false
    }

    return $true
}

function ThrowIfLastFailed() {
    param (
        [string]$ErrorMessage = "Last command failed."
    )
    if (-Not $?) {
        throw $ErrorMessage
    }
}

function Update-ModelPaths {
    <#
    .SYNOPSIS
        Prefix bare file references in the given genie.config text with our bundle root.
    #>
    param (
        [Parameter(Mandatory = $true)]
        [string]$GenieConfigTemplatePath,

        [Parameter(Mandatory = $true)]
        [string]$BundleRoot,

        [Parameter(Mandatory = $true)]
        [string]$TokenizerPath,

        [Parameter(ValueFromPipeline = $true, Mandatory = $true)]
        [string]$GenieConfigLine
    )

    BEGIN {
        $BundleRootEscaped = $BundleRoot.Replace("\", "\\")
        $TokenizerPathEscaped = $TokenizerPath.Replace("\", "\\")
        $ModelFileNames = Get-ModelFileNames -GenieConfigPath $GenieConfigTemplatePath
    }

    PROCESS {
        foreach ($line in $GenieConfigLine) {
            $line = $line.Replace("`"path`": `"tokenizer.json`"", "`"path`": `"$TokenizerPathEscaped`"")
            $line = $line.Replace("`"extensions`": `"htp_backend_ext_config.json`"", "`"extensions`": `"$BundleRootEscaped\\htp_backend_ext_config.json`"")

            foreach ($modelFile in $ModelFileNames) {
                $line = $line.Replace("`"$modelFile`"", "`"$BundleRootEscaped\\$modelFile`"")
            }

            $line
        }
    }
}

function Write-Status {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Message
    )

    Write-Host -ForegroundColor Green $Message
}
powershell/README.md ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Genie PowerShell script for Windows
2
+
3
+ This script simplifies the steps in the [LLM On-Device Deployment](..)
4
+ tutorial. It is assumed that you have followed the tutorial and prepared a
5
+ Genie bundle with the context binaries and tokenizer.
6
+
7
+ From the parent directory (`llm_on_genie`), run the following command in a
8
+ PowerShell terminal:
9
+
10
+ ```powershell
11
+ .\powershell\RunLlm.ps1
12
+ ```
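+
+ The script will prompt for its mandatory parameters (`-ModelName`,
+ `-BundleRoot`, `-RawPrompt`); you can also pass them on the command line.
+ For example, for a Llama 3.2 3B bundle (the raw prompt format is
+ model-specific; see the tutorial README):
+
+ ```powershell
+ .\powershell\RunLlm.ps1 -ModelName llama_v3_2_3b_instruct -BundleRoot genie_bundle -RawPrompt "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
+ ```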
13
+
14
+ For more information, run:
15
+
16
+ ```powershell
17
+ Get-Help .\powershell\RunLlm.ps1
18
+ ```
19
+
20
+ If PowerShell reports that you do not have permission to run scripts, you may
21
+ need to relax the execution policy first:
22
+
23
+ ```powershell
24
+ Set-ExecutionPolicy -Scope CurrentUser Unrestricted -Force
25
+ ```
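+
+ You can check the current policy with `Get-ExecutionPolicy -List`, and
+ restore the default for your user later if you prefer:
+
+ ```powershell
+ Set-ExecutionPolicy -Scope CurrentUser Undefined -Force
+ ```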
26
+
27
+ ## UTF-8 support
28
+
29
+ If you want to use Unicode characters in your prompts, we recommend globally
30
+ enabling UTF-8 on Windows:
31
+
32
+ * Open Control Panel -> Region -> Administrative -> Change system locale...
33
+ * Tick the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox
34
+ * Restart Windows
35
+
36
+ If you want to change this only for a single PowerShell session, execute:
37
+
38
+ ```powershell
39
+ chcp 65001
40
+ [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
41
+ ```
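+
+ To check that UTF-8 output works in the current session, print a non-ASCII
+ string:
+
+ ```powershell
+ Write-Host "déjà vu 你好"
+ ```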
powershell/RunLlm.ps1 ADDED
@@ -0,0 +1,135 @@
1
+ # ------------------------------------------------------------------------
2
+ # Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
3
+ # SPDX-License-Identifier: BSD-3-Clause
4
+ # ------------------------------------------------------------------------
5
+
6
+ <#
7
+ .SYNOPSIS
8
+ Run large language models exported by Qualcomm AI Hub.
9
+
10
+ .DESCRIPTION
11
+ This script handles all of the tasks required to run supported LLMs on
12
+ a Windows on Snapdragon device. Use the ai-hub-models Python package to
13
+ export the model and provide the associated tokenizer.json. Next, this
14
+ will produce all necessary configuration files, set up QNN/Genie, and
15
+ run genie-t2t-run.exe.
16
+
17
+ .EXAMPLE
18
+ # 1. Export the model
19
+ python -m qai_hub_models.models.llama_v3_2_3b_instruct.export --device "Snapdragon X Elite CRD" --skip-inferencing --skip-profiling --output-dir genie_bundle
20
+
21
+ # 2. Download tokenizer to genie_bundle\tokenizer.json (see ..\README.md for download links).
22
+
23
+ # 3. Run the model on this device.
24
+ & .\RunLlm.ps1 -ModelName llama_v3_2_3b_instruct -BundleRoot genie_bundle -RawPrompt "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow many dogs are there in space?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
25
+
26
+ .LINK
27
+ https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie/
28
+
29
+ .LINK
30
+ https://github.com/quic/ai-hub-models/
31
+ #>
32
+ param (
33
+ [Parameter(Mandatory = $true,
34
+ HelpMessage = "The name of the AI Hub model to run,`n e.g., llama_v3_2_3b_instruct.")]
35
+ [string]$ModelName,
36
+
37
+ [Parameter(Mandatory = $true,
38
+ HelpMessage = "Path to the directory containing the model's context binaries.`n e.g., genie_bundle")]
39
+ [string]$BundleRoot,
40
+
41
+ [Parameter(HelpMessage = "Path to the model's tokenizer.json. Default: `$BundleRoot\tokenizer.json")]
42
+ [string]$TokenizerPath,
43
+
44
+ [Parameter(Mandatory = $true,
45
+ HelpMessage = "A raw prompt string, including system prompt header. Please note that this is model-specific; details in ..\README.md.`n e.g., <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>")]
46
+ [string]$RawPrompt,
47
+
48
+ [Parameter(HelpMessage = "Path to the QNN SDK.")]
49
+ [string]$QnnSdkRoot = $Env:QNN_SDK_ROOT,
50
+
51
+ [Parameter(HelpMessage = "Override path to the ai-hub-models repository.")]
52
+ [string]$QaihmRoot,
53
+
54
+ [Parameter(HelpMessage = "This device's soc_model. 60 = Snapdragon X Elite / X Plus.")]
55
+ [int]$SocModel = 60,
56
+
57
+ [Parameter(HelpMessage = "This device's dsp_arch. v73 = Snapdragon X Elite / X Plus and many others.")]
58
+ [string]$DspArch = "v73"
59
+ )
60
+
61
+ # Make utility functions available
62
+ $ScriptRoot = (Resolve-Path -Path "$(Split-Path -Parent $MyInvocation.MyCommand.Definition)").Path
63
+ . $ScriptRoot\LlmUtils.ps1
64
+
65
+ if (-Not $TokenizerPath) {
66
+ $TokenizerPath = Join-Path $BundleRoot "tokenizer.json"
67
+ }
68
+ $TokenizerPath = (Resolve-Path -Path $TokenizerPath)
69
+
70
+ if (-Not $QaihmRoot) {
71
+ $QaihmRoot = (Resolve-Path -Path $ScriptRoot\..\..\..)
72
+ }
73
+
74
+ # Turn $BundleRoot into an absolute path
75
+ $BundleRoot = (Resolve-Path -Path $BundleRoot)
76
+
77
+ # Location of the sample Genie configs in the ai-hub-models repo
78
+ $GenieConfigsDir = "$QaihmRoot\tutorials\llm_on_genie\configs"
79
+ $HtpConfigTemplatePath = "$GenieConfigsDir\htp\htp_backend_ext_config.json.template"
80
+ $GenieConfigTemplatePath = "$GenieConfigsDir\genie\$ModelName.json"
81
+
82
+ # Ensure the SDKs and other files are available
83
+ if (-Not (Test-SdkDirectories -QnnSdkRoot $QnnSdkRoot -QaihmRoot $QaihmRoot)) {
84
+ exit 1
85
+ }
86
+ if (-Not (Test-ModelFiles -GenieConfigTemplatePath $GenieConfigTemplatePath -BundleRoot $BundleRoot -TokenizerPath $TokenizerPath)) {
87
+ exit 1
88
+ }
89
+
90
+ # Where the config files need to go
91
+ $HtpConfigPath = "$BundleRoot/htp_backend_ext_config.json"
92
+ $GenieConfigPath = "$BundleRoot/genie_config.json"
93
+
94
+ # Substitute two placeholder strings in the HTP config.
95
+ # Using -Encoding ascii in the Out-File invocation is required to avoid
96
+ # cryptic JSON parsing errors.
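+ # The template contains placeholder lines of the form (see
+ # configs/htp/htp_backend_ext_config.json.template):
+ #   "soc_model": <TODO>,
+ #   "dsp_arch": "<TODO>",
+ # which become e.g. "soc_model": 60 and "dsp_arch": "v73" here.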
97
+ Write-Status "Creating $HtpConfigPath from template in $HtpConfigTemplatePath."
98
+ (Get-Content -Path $HtpConfigTemplatePath) `
99
+ -replace "`"soc_model`": <TODO>", "`"soc_model`": $SocModel" `
100
+ -replace "`"dsp_arch`": `"<TODO>`"", "`"dsp_arch`": `"$DspArch`"" |
101
+ Out-File -Encoding ascii "$HtpConfigPath"
102
+ ThrowIfLastFailed
103
+
104
+ # Disable mmap on this Windows device and update some paths so we don't have to
105
+ # change directories to run Genie.
106
+ # As above, ascii encoding is required.
107
+ Write-Status "Creating $GenieConfigPath from template in $GenieConfigTemplatePath."
108
+ (Get-Content -Path "$GenieConfigTemplatePath") `
109
+ -replace "`"use-mmap`": true", "`"use-mmap`": false" ` |
110
+ Update-ModelPaths -GenieConfigTemplatePath $GenieConfigTemplatePath -BundleRoot $BundleRoot -TokenizerPath $TokenizerPath |
111
+ Out-File -Encoding ascii "$GenieConfigPath"
112
+ ThrowIfLastFailed
113
+
114
+ Write-Status "Setting ADSP_LIBRARY_PATH = `"$BundleRoot`"."
115
+ $OldAdspLibraryPath = $Env:ADSP_LIBRARY_PATH
116
+ $Env:ADSP_LIBRARY_PATH = $BundleRoot
117
+
118
+ # I haven't been able to get Genie to run from $QnnSdkRoot, so copy it into $BundleRoot and run from there.
119
+ # Note that setting ADSP_LIBRARY_PATH and PATH _is_ sufficient for QNN (including the validator),
120
+ # but _not_ for Genie on Windows.
121
+ Write-Status "Copying Genie and $DspArch binaries to $BundleRoot."
122
+ Copy-Genie -QnnSdkRoot $QnnSdkRoot -Destination $BundleRoot -DspArch $DspArch
123
+
124
+ # Check if QNN thinks our host is set up properly.
125
+ # Note that qnn-platform-validator always returns 0 so we can't easily bail on failure.
126
+ # Its output is a malformed CSV, so even that is hard to use :-/
127
+ & $BundleRoot\qnn-platform-validator.exe --backend dsp --testBackend
128
+
129
+ # Run Genie
130
+ $GenieCmd = "$BundleRoot\genie-t2t-run.exe"
131
+ $GenieArgs = @("-c", "$BundleRoot\genie_config.json", "-p", $RawPrompt)
132
+ Write-Status "Running $GenieCmd $GenieArgs"
133
+ & $GenieCmd $GenieArgs
134
+
135
+ $Env:ADSP_LIBRARY_PATH = $OldAdspLibraryPath