amant555 committed on
Commit
77c992c
·
1 Parent(s): b9114ba

Upload model Apriel 1.5 Thinker

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ Apriel-1.5-Thinker.pdf filter=lfs diff=lfs merge=lfs -text
Apriel-1.5-Thinker.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f38a1b75f8714198ca728a60563f34122f5b2c9fd207055d47de12f7bf055da
3
+ size 3388802
README.md ADDED
@@ -0,0 +1,308 @@
1
+ ---
2
+ license: mit
3
+ pipeline_tag: text-generation
4
+ library_name: transformers
5
+ ---
6
+
7
+ # Apriel-1.5-15b-Thinker - Mid training is all you need!
8
+
9
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63d3095c2727d7888cbb54e2/Lt1t0tOO5emz1X23Azg-E.png" width="120" alt="thumbnail"/> `/ˈɑː.pri.əl/`
10
+
11
+ ---
12
+
13
+ # Table of Contents
14
+
15
+ 1. [Summary](#summary)
16
+ 2. [Evaluation](#evaluation)
17
+ 3. [Training Details](#training-details)
18
+ 4. [How to Use](#how-to-use)
19
+ 5. [Intended Use](#intended-use)
20
+ 6. [Limitations](#limitations)
21
+ 7. [Security and Responsible Use](#security-and-responsible-use)
22
+ 8. [Software](#software)
23
+ 9. [License](#license)
24
+ 10. [Acknowledgements](#acknowledgements)
25
+ 11. [Citation](#citation)
26
+
27
+
28
+ ---
29
+
30
+ # Summary
31
+
32
+ **Apriel-1.5-15b-Thinker** is a multimodal reasoning model in ServiceNow’s Apriel SLM series that achieves competitive performance against models 10 times its size. Apriel-1.5 is the second model in the reasoning series. It introduces enhanced textual reasoning capabilities and adds image reasoning support to the previous text model. It has undergone extensive continual pretraining across both text and image domains. In terms of post-training, this model has **undergone text-SFT only**. Our research demonstrates that with a strong mid-training regimen, we are able to achieve SOTA performance on text and image reasoning tasks without any image SFT or RL.
33
+
34
+ **Highlights**
35
+ - Achieves a score of **52** on the Artificial Analysis index and is competitive with Deepseek R1 0528, Gemini-Flash etc.
36
+ - It is **at least 10 times smaller** than any other model that scores > 50 on the Artificial Analysis index.
37
+ - Scores **68** on Tau2 Bench Telecom and **62** on IFBench, which are key benchmarks for the enterprise domain.
38
+ - At 15B parameters, the model fits on a single GPU, making it highly memory-efficient.
39
+
40
+ ---
41
+
42
+ # Evaluation
43
+
44
+ - For text benchmarks, we report evaluations performed by a third party, **Artificial Analysis**.
45
+ - For image benchmarks, we report evaluations obtained with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).
46
+
47
+ ---
48
+
49
+ # Results reported by Artificial Analysis
50
+
51
+ <!-- ![image](https://cdn-uploads.huggingface.co/production/uploads/63d3095c2727d7888cbb54e2/if7y-zjJbzHNYgc4agl7X.png)
52
+
53
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/63d3095c2727d7888cbb54e2/Ds-VYH--MbLRH7GrhWnF0.png) -->
54
+
55
+
56
+ ![index - latest 2](https://cdn-uploads.huggingface.co/production/uploads/66a41f3c1d52bffc13c285a5/pdSRgotQw00XKB04bmJg3.png)
57
+
58
+ ![index vs size fixed 3](https://cdn-uploads.huggingface.co/production/uploads/66a41f3c1d52bffc13c285a5/ewXvkN75gfyplJpWEJLyP.png)
59
+
60
+ ---
61
+
62
+
63
+ # Training Details
64
+
65
+ **Mid training / Continual Pre‑training**
66
+ In this stage, the model is trained on billions of tokens of carefully curated textual samples drawn from mathematical reasoning, coding challenges, scientific discourse, logical puzzles, and diverse knowledge-rich texts along with multimodal samples covering image understanding and reasoning, captioning, and interleaved image-text data. The objective is to strengthen foundational reasoning capabilities of the model. This stage is critical for the model to function as a reasoner and provides significant lifts in reasoning benchmarks.
67
+
68
+ **Supervised Fine‑Tuning (SFT)**
69
+ The model is fine-tuned on over 2M high-quality text samples spanning mathematical and scientific problem-solving, coding tasks, instruction-following, API/function invocation, and conversational use cases. This results in superior text performance comparable to models such as Deepseek R1 0528 and Gemini-Flash. Although no image-specific fine-tuning is performed, the model’s inherent multimodal capabilities and cross-modal transfer of reasoning behavior from the text SFT yield competitive image performance relative to other leading open-source VL models.
70
+
71
+ ---
72
+
73
+ # Running Apriel-1.5-15B-Thinker with vLLM
74
+
75
+ As the upstream PR is not yet merged, you can use this custom image as an alternate way to run the model with tool and reasoning parsers enabled.
76
+
77
+ ### Docker Image
78
+
79
+ ```
80
+ docker.io/amant555/vllm_apriel:latest
81
+ ```
82
+
83
+ ### Start Command
84
+
85
+ Run the container with the following command:
86
+
87
+ ```bash
88
+ python3 -m vllm.entrypoints.openai.api_server \
88
+   --model ServiceNow-AI/Apriel-1.5-15b-Thinker \
89
+   --served-model-name Apriel-1p5-15B-Thinker \
90
+   --trust-remote-code \
91
+   --max-model-len 131072 \
92
+   --enable-auto-tool-choice \
93
+   --tool-call-parser apriel \
94
+   --reasoning-parser apriel
96
+ ```
97
+
98
+ This will start the vLLM OpenAI-compatible API server serving the **Apriel-1.5-15B-Thinker** model with Apriel’s custom tool parser and reasoning parser.
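Once the server is running, it can be queried like any OpenAI-compatible chat endpoint. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:8000` is an assumption (vLLM's default port on the local host), the helper name `build_chat_request` is hypothetical, and the model name must match the `--served-model-name` value from the start command.

```python
import json
import urllib.request

# Assumption: the server listens on vLLM's default port on the local machine.
BASE_URL = "http://localhost:8000"
# Must match the --served-model-name flag used when starting the server.
SERVED_MODEL_NAME = "Apriel-1p5-15B-Thinker"


def build_chat_request(prompt, temperature=0.6, max_tokens=1024):
    """Build (but do not send) a chat-completions request for the server."""
    payload = {
        "model": SERVED_MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body; the assistant message is expected under `choices[0]["message"]`, as in other OpenAI-compatible servers.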
99
+
100
+
101
+
102
+ # How to Use
103
+
104
+ ```bash
105
+ pip install transformers
106
+ ```
107
+
108
+ ---
109
+
110
+ ## Running the Reasoning model
111
+
112
+
113
+ Here is a code snippet demonstrating the model's usage with the transformers library's generate function:
114
+
115
+ ```python
116
+ # Tested with transformers==4.48
117
+
118
+ import re
119
+ import requests
120
+ import torch
121
+ from PIL import Image
122
+ from transformers import AutoProcessor, AutoModelForImageTextToText
123
+
124
+ # Load model
125
+ model_id = "ServiceNow-AI/Apriel-1.5-15b-Thinker"
126
+ model = AutoModelForImageTextToText.from_pretrained(
127
+     model_id,
128
+     torch_dtype=torch.bfloat16,
129
+     device_map="auto"
130
+ )
131
+ processor = AutoProcessor.from_pretrained(model_id)
132
+
133
+ # Example 1: Text-only prompt
134
+ chat = [
135
+     {
136
+         "role": "user",
137
+         "content": [
138
+             {"type": "text", "text": "What is the capital for France?"},
139
+         ],
140
+     }
141
+ ]
142
+
143
+ inputs = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
144
+ inputs = {k: v.to(model.device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
145
+ inputs.pop("token_type_ids", None)
146
+
147
+ with torch.no_grad():
148
+     output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
149
+
150
+ generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
151
+ output = processor.decode(generated_ids[0], skip_special_tokens=True)
152
+ response = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)[0].strip()
153
+
154
+ print("Text-only Response:", response)
155
+
156
+ # Example 2: Image understanding
157
+ url = "https://picsum.photos/id/237/200/300"
158
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
159
+
160
+ chat = [
161
+     {
162
+         "role": "user",
163
+         "content": [
164
+             {"type": "text", "text": "Which animal is this?"},
165
+             {"type": "image"},
166
+         ],
167
+     }
168
+ ]
169
+
170
+ prompt = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
171
+ inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
172
+ inputs.pop("token_type_ids", None)
173
+
174
+ with torch.no_grad():
175
+     output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
176
+
177
+ generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
178
+ output = processor.decode(generated_ids[0], skip_special_tokens=True)
179
+ response = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)[0].strip()
180
+
181
+ print("Image Response:", response)
182
+
183
+ ```
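The `re.findall(...)[0]` lookups in the snippet above raise `IndexError` when the output contains no complete `[BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]` pair, for example when generation stops at `max_new_tokens` before the closing marker. A more tolerant extraction could look like the sketch below; the helper name `extract_final_response` is hypothetical and not part of the model's tooling.

```python
import re

# Match the text after the opening marker up to the closing marker, or up to
# the end of the string if the closing marker was never generated.
FINAL_RESPONSE_PATTERN = re.compile(
    r"\[BEGIN FINAL RESPONSE\](.*?)(?:\[END FINAL RESPONSE\]|$)",
    re.DOTALL,
)


def extract_final_response(output):
    """Return the final response text, or None if no opening marker exists."""
    match = FINAL_RESPONSE_PATTERN.search(output)
    if match is None:
        return None
    return match.group(1).strip()
```

Returning `None` instead of raising lets the caller decide whether to retry with a larger `max_new_tokens` or to fall back to the raw output.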
184
+
185
+ ---
186
+
187
+ ## Chat Template
188
+
189
+
190
+ ```
191
+ <|system|>
192
+ You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
193
+ <|end|>
194
+ <|user|>
195
+ # user message here
196
+ <|end|>
197
+ <|assistant|>
198
+ Here are my reasoning steps:
199
+ # thoughts here
200
+ [BEGIN FINAL RESPONSE]
201
+ # assistant response here
202
+ [END FINAL RESPONSE]
203
+ <|end|>
204
+ ```
205
+ The model will first generate its thinking process and then generate its final response between `[BEGIN FINAL RESPONSE]` and `[END FINAL RESPONSE]`. Here is a code snippet demonstrating the application of the chat template:
206
+
207
+
208
+
209
+ ```python
210
+ from transformers import AutoTokenizer
211
+ model_name = "ServiceNow-AI/Apriel-1.5-15b-Thinker"
212
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
213
+
214
+ # prepare the model input
215
+ custom_system_prompt = "Answer like a pirate."
216
+ prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. 
If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is [email protected].\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
217
+ messages = [
218
+ {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
219
+ ]
220
+ # example tools
221
+ tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
222
+ text = tokenizer.apply_chat_template(
223
+     messages,
224
+     tokenize=False,
225
+     add_generation_prompt=True,
226
+     tools=tools
227
+ )
228
+ model_inputs = tokenizer([text], return_tensors="pt")
229
+ ```
230
+
231
+ ## Usage Guidelines
232
+ 1. Use the model’s default chat template, which already includes a system prompt. We recommend adding all other instructions within the user message.
233
+ 2. We recommend setting temperature to `0.6`.
234
+ 3. We ensure the model starts with `Here are my reasoning steps:\n` during all our evaluations. This is implemented in the default chat template.
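The first two guidelines can be sketched as a pair of small helpers: extra instructions are folded into the user message rather than sent as a separate system message, and sampling uses temperature `0.6`. The helper names `build_messages` and `generation_kwargs` are hypothetical, and the `max_new_tokens` default simply mirrors the earlier usage example.

```python
RECOMMENDED_TEMPERATURE = 0.6  # value recommended in the usage guidelines


def build_messages(instructions, question):
    """Place all extra instructions inside the user message.

    The default chat template already supplies a system prompt, so no
    system message is added here.
    """
    if instructions:
        user_content = f"{instructions}\n\n{question}"
    else:
        user_content = question
    return [{"role": "user", "content": user_content}]


def generation_kwargs(max_new_tokens=1024):
    """Keyword arguments intended for `model.generate`."""
    return {
        "do_sample": True,
        "temperature": RECOMMENDED_TEMPERATURE,
        "max_new_tokens": max_new_tokens,
    }
```

The returned list can be passed to `apply_chat_template`, and the dictionary can be unpacked into `model.generate(**inputs, **generation_kwargs())`.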
235
+
236
+ ---
237
+
238
+
239
+ # Intended Use
240
+
241
+ The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:
242
+
243
+ - Code assistance and generation
244
+ - Logical reasoning and multi-step tasks
245
+ - Question answering and information retrieval
246
+ - Function calling, complex instruction following and agent use cases
247
+
248
+ They are **not intended** for use in safety-critical applications without human oversight or in scenarios requiring guaranteed factual accuracy.
249
+
250
+ ---
251
+
252
+ # Limitations
253
+
254
+ - **Factual accuracy:** May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
255
+ - **Bias:** May reflect societal, cultural, or systemic biases present in training data.
256
+ - **Ethics:** Do not use the model to produce harmful, unlawful, or unethical content.
257
+ - **Language:** Strongest performance is in English. Output quality may degrade in underrepresented languages.
258
+ - **Critical use:** Not suitable for medical, legal, financial, or other high-risk applications without safeguards.
259
+
260
+ ---
261
+
262
+ # Security and Responsible Use
263
+
264
+ **Security Responsibilities:**
265
+ Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).
266
+
267
+ **Guidelines for Deployers:**
268
+
269
+ - Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
270
+ - Implement validation and filtering processes to prevent harmful or biased outputs.
271
+ - Continuously perform data privacy checks to guard against unintended data leaks.
272
+ - Document and communicate the model's limitations, intended usage, and known security risks to all end-users.
273
+ - Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.
274
+
275
+ **Guidelines for Users:**
276
+
277
+ - Follow established security policies and usage guidelines provided by deployers.
278
+ - Protect and manage sensitive information when interacting with the model.
279
+ - Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
280
+ - Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.
281
+
282
+ **Disclaimer:**
283
+ Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as-is," without explicit or implied warranty regarding security or fitness for any specific application or environment.
284
+
285
+ ---
286
+
287
+ # Software
288
+
289
+ - **Training stack:** [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)
290
+
291
+ ---
292
+
293
+ # License
294
+
295
+ MIT
296
+
297
+ ---
298
+
299
+ # Citation
300
+
301
+ ```bibtex
302
+ @article{radhakrishna2025apriel,
303
+ title={Apriel-Nemotron-15B-Thinker},
304
+ author={Radhakrishna, Shruthan and Parikh, Soham and Sarda, Gopal and Turkkan, Anil and Vohra, Quaizar and Li, Raymond and Jhamb, Dhruv and Ogueji, Kelechi and Shukla, Aanjaneya and Bamgbose, Oluwanifemi and others},
305
+ journal={arXiv preprint arXiv:2508.10948},
306
+ year={2025}
307
+ }
308
+ ```
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{%- set available_tools_string, thought_instructions, add_tool_id, tool_output_format = '', '', true, \"default\" -%}\n\n{%- if tools is not none and tools|length > 0 -%}\n {%- set available_tools_string -%}\nYou are provided with function signatures within <available_tools></available_tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about the arguments. You should infer the argument values from previous user responses and the system message. Here are the available tools:\n<available_tools>\n{% for tool in tools %}\n{{ tool|string }}\n{% endfor %}\n</available_tools>\n{%- endset -%}\n{%- endif -%}\n{%- if tool_output_format is none or tool_output_format == \"default\" -%}\n{%- set tool_output_instructions -%}\nReturn all function calls as a list of json objects within <tool_call></tool_call> XML tags. Each json object should contain a function name and arguments as follows:\n<tool_calls>[{\"name\": <function-name-1>, \"arguments\": <args-dict-1>}, {\"name\": <function-name-2>, \"arguments\": <args-dict-2>},...]</tool_calls>\n{%- endset -%}\n{%- elif tool_output_format == \"yaml\" -%}\n{%- set tool_output_instructions -%}\nReturn all function calls as a list of yaml objects within <tool_call></tool_call> XML tags. Each yaml object should contain a function name and arguments as follows:\n<tool_calls>\n- name: <function-name-1>\n arguments: <args-dict-1>\n- name: <function-name-2>\n arguments: <args-dict-2>\n...\n</tool_calls>\n{%- endset -%}\n{%- endif -%}\n{%- if add_thoughts -%}\n{%- set thought_instructions -%}\nPrior to generating the function calls, you should generate the reasoning for why you're calling the function. Please generate these reasoning thoughts between <thinking> and </thinking> XML tags.\n{%- endset -%}\n{%- endif -%}\n{{- bos_token -}}\n{%- set reasoning_prompt='You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. 
Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].' -%}\n{%- if messages[0]['role'] != 'system' and tools is not none and tools|length > 0 -%}\n {{- '<|system|>\\n' + reasoning_prompt + available_tools_string + \"\\n\" + tool_output_instructions + '\\n<|end|>\\n' -}}\n{%- endif -%}\n{%- if messages|selectattr('role', 'equalto', 'system')|list|length == 0 -%}\n{{- '<|system|>\\n' + reasoning_prompt + '\\n<|end|>\\n' -}}\n{%- endif -%}\n{%- for message in messages -%}\n {%- if message['role'] == 'user' -%}\n {{- '<|user|>\\n' }}\n {%- if message['content'] is not string %}\n {%- for chunk in message['content'] %}\n {%- if chunk['type'] == 'text' %}\n {{- chunk['text'] }}\n {%- elif chunk['type'] == 'image' or chunk['type'] == 'image_url'%}\n {{- '[IMG]' }}\n {%- else %}\n {{- raise_exception('Unrecognized content type!') }}\n {%- endif %}\n {%- endfor %}\n {%- else %}\n {{- message['content'] }}\n {%- endif %}\n {{- '\\n<|end|>\\n' }}\n {%- elif message['role'] == 'content' -%}\n {%- if message['content'] is not string %}\n {{- '<|content|>\\n' + message['content'][0]['text'] + '\\n<|end|>\\n' -}}\n {%- else %}\n {{- '<|content|>\\n' + message['content'] + '\\n<|end|>\\n' -}}\n {%- endif -%}\n {%- elif message['role'] == 'system' -%}\n {%- if message['content'] is not none and message['content']|length > 0 %}\n {%- if message['content'] is string %}\n {%- set system_message = message['content'] %}\n {%- else %}\n {%- set system_message = message['content'][0]['text'] %}\n {%- endif %}\n {%- else %}\n {%- set system_message = '' %}\n {%- endif %}\n {%- if tools is not none and tools|length > 0 -%}\n {{- '<|system|>\\n' + reasoning_prompt + system_message + '\\n' + available_tools_string + '\\n<|end|>\\n' -}}\n {%- else -%}\n {{- '<|system|>\\n' + reasoning_prompt + system_message + 
'\\n<|end|>\\n' -}}\n {%- endif -%}\n {%- elif message['role'] == 'assistant' -%}\n {%- if loop.last -%}\n {%- set add_tool_id = false -%}\n {%- endif -%}\n {{- '<|assistant|>\\n' -}}\n {%- if message['content'] is not none and message['content']|length > 0 -%}\n {%- if message['content'] is not string and message['content'][0]['text'] is not none %}\n {{- message['content'][0]['text'] }}\n {%- else %}\n {{- message['content'] -}}\n {%- endif -%}\n {%- elif message['chosen'] is not none and message['chosen']|length > 0 -%}\n {{- message['chosen'][0] -}}\n {%- endif -%}\n {%- if add_thoughts and 'thought' in message and message['thought'] is not none -%}\n {{- '<thinking>' + message['thought'] + '</thinking>' -}}\n {%- endif -%}\n {%- if message['tool_calls'] is not none and message['tool_calls']|length > 0 -%}\n {{- '\\n<tool_calls>[' -}}\n {%- for tool_call in message[\"tool_calls\"] -%}\n {{- '{\"name\": \"' + tool_call['function']['name'] + '\", \"arguments\": ' + tool_call['function']['arguments']|string -}}\n {%- if add_tool_id == true -%}\n {{- ', \"id\": \"' + tool_call['id'] + '\"' -}}\n {%- endif -%}\n {{- '}' -}}\n {%- if not loop.last -%}{{- ', ' -}}{%- endif -%}\n {%- endfor -%}\n {{- ']</tool_calls>' -}}\n {%- endif -%}\n {{- '\\n<|end|>\\n' + eos_token -}}\n {%- elif message['role'] == 'tool' -%}\n {%- if message['content'] is string %}\n {%- set tool_message = message['content'] %}\n {%- else %}\n {%- set tool_message = message['content'][0]['text'] %}\n {%- endif -%}\n {{- '<|tool_result|>\\n' + tool_message|string + '\\n<|end|>\\n' -}}\n {%- endif -%}\n {%- if loop.last and add_generation_prompt and message['role'] != 'assistant' -%}\n {{- '<|assistant|>\\n' -}}\n {%- endif -%}\n{%- endfor -%}"
3
+ }
config.json ADDED
@@ -0,0 +1,169 @@
1
+ {
2
+ "architectures": [
3
+ "LlavaForConditionalGeneration"
4
+ ],
5
+ "ignore_index": -100,
6
+ "image_seq_length": 1,
7
+ "image_token_index": 10,
8
+ "model_type": "llava",
9
+ "projector_hidden_act": "gelu",
10
+ "text_config": {
11
+ "_attn_implementation_autoset": false,
12
+ "_name_or_path": "",
13
+ "add_cross_attention": false,
14
+ "architectures": null,
15
+ "attention_dropout": 0.0,
16
+ "bad_words_ids": null,
17
+ "begin_suppress_tokens": null,
18
+ "bos_token_id": 1,
19
+ "chunk_size_feed_forward": 0,
20
+ "cross_attention_hidden_size": null,
21
+ "decoder_start_token_id": null,
22
+ "diversity_penalty": 0.0,
23
+ "do_sample": false,
24
+ "early_stopping": false,
25
+ "encoder_no_repeat_ngram_size": 0,
26
+ "eos_token_id": 2,
27
+ "exponential_decay_length_penalty": null,
28
+ "finetuning_task": null,
29
+ "forced_bos_token_id": null,
30
+ "forced_eos_token_id": null,
31
+ "head_dim": 128,
32
+ "hidden_act": "silu",
33
+ "hidden_size": 5120,
34
+ "id2label": {
35
+ "0": "LABEL_0",
36
+ "1": "LABEL_1"
37
+ },
38
+ "initializer_range": 0.02,
39
+ "intermediate_size": 14336,
40
+ "is_decoder": false,
41
+ "is_encoder_decoder": false,
42
+ "label2id": {
43
+ "LABEL_0": 0,
44
+ "LABEL_1": 1
45
+ },
46
+ "length_penalty": 1.0,
47
+ "max_length": 20,
48
+ "max_position_embeddings": 262400,
49
+ "min_length": 0,
50
+ "model_type": "mistral",
51
+ "no_repeat_ngram_size": 0,
52
+ "num_attention_heads": 32,
53
+ "num_beam_groups": 1,
54
+ "num_beams": 1,
55
+ "num_hidden_layers": 48,
56
+ "num_key_value_heads": 8,
57
+ "num_return_sequences": 1,
58
+ "output_attentions": false,
59
+ "output_hidden_states": false,
60
+ "output_scores": false,
61
+ "pad_token_id": null,
62
+ "prefix": null,
63
+ "problem_type": null,
64
+ "pruned_heads": {},
65
+ "remove_invalid_values": false,
66
+ "repetition_penalty": 1.0,
67
+ "return_dict": true,
68
+ "return_dict_in_generate": false,
69
+ "rms_norm_eps": 1e-05,
70
+ "rope_theta": 1000000000.0,
71
+ "sep_token_id": null,
72
+ "sliding_window": null,
73
+ "suppress_tokens": null,
74
+ "task_specific_params": null,
75
+ "temperature": 1.0,
76
+ "tf_legacy_loss": false,
77
+ "tie_encoder_decoder": false,
78
+ "tie_word_embeddings": false,
79
+ "tokenizer_class": null,
80
+ "top_k": 50,
81
+ "top_p": 1.0,
82
+ "torch_dtype": null,
83
+ "torchscript": false,
84
+ "typical_p": 1.0,
85
+ "use_bfloat16": false,
86
+ "use_cache": true,
87
+ "vocab_size": 131072
88
+ },
89
+ "torch_dtype": "bfloat16",
90
+ "transformers_version": "4.46.3",
91
+ "vision_config": {
92
+ "_attn_implementation_autoset": false,
93
+ "_name_or_path": "",
94
+ "add_cross_attention": false,
95
+ "architectures": null,
96
+ "attention_dropout": 0.0,
97
+ "bad_words_ids": null,
98
+ "begin_suppress_tokens": null,
99
+ "bos_token_id": null,
100
+ "chunk_size_feed_forward": 0,
101
+ "cross_attention_hidden_size": null,
102
+ "decoder_start_token_id": null,
103
+ "diversity_penalty": 0.0,
104
+ "do_sample": false,
105
+ "early_stopping": false,
106
+ "encoder_no_repeat_ngram_size": 0,
107
+ "eos_token_id": null,
108
+ "exponential_decay_length_penalty": null,
109
+ "finetuning_task": null,
110
+ "forced_bos_token_id": null,
111
+ "forced_eos_token_id": null,
112
+ "head_dim": 64,
113
+ "hidden_act": "silu",
114
+ "hidden_size": 1024,
115
+ "id2label": {
116
+ "0": "LABEL_0",
117
+ "1": "LABEL_1"
118
+ },
119
+ "image_size": 1024,
120
+ "initializer_range": 0.02,
121
+ "intermediate_size": 4096,
122
+ "is_decoder": false,
123
+ "is_encoder_decoder": false,
124
+ "label2id": {
125
+ "LABEL_0": 0,
126
+ "LABEL_1": 1
127
+ },
128
+ "length_penalty": 1.0,
129
+ "max_length": 20,
130
+ "min_length": 0,
131
+ "model_type": "pixtral",
132
+ "no_repeat_ngram_size": 0,
133
+ "num_attention_heads": 16,
134
+ "num_beam_groups": 1,
135
+ "num_beams": 1,
136
+ "num_channels": 3,
137
+ "num_hidden_layers": 24,
138
+ "num_return_sequences": 1,
139
+ "output_attentions": false,
140
+ "output_hidden_states": false,
141
+ "output_scores": false,
142
+ "pad_token_id": null,
143
+ "patch_size": 16,
144
+ "prefix": null,
145
+ "problem_type": null,
146
+ "pruned_heads": {},
147
+ "remove_invalid_values": false,
148
+ "repetition_penalty": 1.0,
149
+ "return_dict": true,
150
+ "return_dict_in_generate": false,
151
+ "rope_theta": 10000.0,
152
+ "sep_token_id": null,
153
+ "suppress_tokens": null,
154
+ "task_specific_params": null,
155
+ "temperature": 1.0,
156
+ "tf_legacy_loss": false,
157
+ "tie_encoder_decoder": false,
158
+ "tie_word_embeddings": true,
159
+ "tokenizer_class": null,
160
+ "top_k": 50,
161
+ "top_p": 1.0,
162
+ "torch_dtype": null,
163
+ "torchscript": false,
164
+ "typical_p": 1.0,
165
+ "use_bfloat16": false
166
+ },
167
+ "vision_feature_layer": -1,
168
+ "vision_feature_select_strategy": "full"
169
+ }
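The `text_config` values above are enough to estimate the size of the language model and its KV cache. The sketch below assumes a standard Mistral-style decoder layout (no bias terms, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm); that layout is an assumption about the architecture, not something stated in this config, and the function names are hypothetical.

```python
# Values copied from "text_config" in config.json.
TEXT_CONFIG = {
    "hidden_size": 5120,
    "intermediate_size": 14336,
    "num_hidden_layers": 48,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 131072,
    "tie_word_embeddings": False,
}


def count_text_parameters(cfg):
    """Estimate decoder parameters, assuming a bias-free Mistral-style layout."""
    hidden = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q_proj and o_proj use the full query width; k_proj and v_proj use the KV width.
    attention = hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden
    # gate_proj, up_proj and down_proj.
    mlp = 3 * hidden * cfg["intermediate_size"]
    # input_layernorm and post_attention_layernorm.
    norms = 2 * hidden
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    """Bytes of KV cache per token (keys plus values, bf16 by default)."""
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * bytes_per_value)
```

Under these assumptions the text decoder comes to about 14.4B parameters, and a single sequence at the `--max-model-len 131072` setting would need 24 GiB of bf16 KV cache, which is worth keeping in mind when sizing a single-GPU deployment.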
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.47.0"
6
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2a61970257466b22ad181daa9e2fb9203d285ce9be8a69404a5656ff2df5697c
3
+ size 4990957080
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:043752ae06ccb0033497f5edf079f2e1b479f99fde5d36142d7ebff340f98fc1
3
+ size 4959959696
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b4e9505546d37801f8e22d2ba17addf38d25788db205418119e8cb7eac72b9a3
3
+ size 4907530672
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d7cc7b852cc200395798ca9c4c34dff2bdedb7acd45651e2da19274e02b84bc
3
+ size 4907530672
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ad8051f55027b8d759a5cab223b429a1bd6011af4f7c1a67f421535893e0a255
3
+ size 4907530672
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:902cbc266bde9aa9d9ea0087516a2ef2da41f3b279193450d71d736cc06442b4
3
+ size 3712120512
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d04fad29f38deec44697bb621ad6b462e606dedebe4d37132ae1e9e9b751bcb
3
+ size 1342177424
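The seven LFS pointer sizes above can be cross-checked against the `total_size` value recorded in `model.safetensors.index.json`. The sketch below assumes that `total_size` counts tensor bytes only and that every tensor is stored in bfloat16 (2 bytes per value, matching `"torch_dtype": "bfloat16"` in `config.json`); under those assumptions the small remainder would be per-file header overhead. The function name `summarize_checkpoint` is hypothetical.

```python
# "size" values from the seven LFS pointer files, in shard order.
SHARD_SIZES = [
    4_990_957_080,
    4_959_959_696,
    4_907_530_672,
    4_907_530_672,
    4_907_530_672,
    3_712_120_512,
    1_342_177_424,
]
# "total_size" from model.safetensors.index.json.
INDEX_TOTAL_SIZE = 29_727_719_424
BYTES_PER_BF16 = 2  # assumption: all tensors are stored as bfloat16


def summarize_checkpoint(shard_sizes, tensor_bytes, bytes_per_param):
    """Compare on-disk shard sizes with the tensor bytes listed in the index."""
    on_disk = sum(shard_sizes)
    return {
        "on_disk_bytes": on_disk,
        "non_tensor_bytes": on_disk - tensor_bytes,
        "approx_parameters": tensor_bytes // bytes_per_param,
    }
```

With these inputs the shards total roughly 29.7 GB on disk and the index implies about 14.9B parameters across the language model, vision tower, and projector, which is consistent with the 15B figure in the model name.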
model.safetensors.index.json ADDED
@@ -0,0 +1,664 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 29727719424
4
+ },
5
+ "weight_map": {
6
+ "language_model.lm_head.weight": "model-00007-of-00007.safetensors",
7
+ "language_model.model.embed_tokens.weight": "model-00001-of-00007.safetensors",
8
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
9
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
10
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
11
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
12
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
13
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
14
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
15
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
16
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
17
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
18
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
19
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
20
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
21
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
22
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
23
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
24
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
25
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
26
+ "language_model.model.layers.10.input_layernorm.weight": "model-00002-of-00007.safetensors",
27
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
28
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
29
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
30
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
31
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
32
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
33
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
34
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
35
+ "language_model.model.layers.11.input_layernorm.weight": "model-00002-of-00007.safetensors",
36
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
37
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
38
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
39
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
40
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
41
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
42
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
43
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
44
+ "language_model.model.layers.12.input_layernorm.weight": "model-00002-of-00007.safetensors",
45
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
46
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
47
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
48
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
49
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
50
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
51
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
52
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
53
+ "language_model.model.layers.13.input_layernorm.weight": "model-00002-of-00007.safetensors",
54
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
55
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
56
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
57
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
58
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
59
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
60
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
61
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
62
+ "language_model.model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
63
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
64
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
65
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
66
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
67
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
68
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
69
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
70
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
71
+ "language_model.model.layers.15.input_layernorm.weight": "model-00003-of-00007.safetensors",
72
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
73
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
74
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
75
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
76
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
77
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
78
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
79
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
80
+ "language_model.model.layers.16.input_layernorm.weight": "model-00003-of-00007.safetensors",
81
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
82
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
83
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
84
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
85
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
86
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
87
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
88
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
89
+ "language_model.model.layers.17.input_layernorm.weight": "model-00003-of-00007.safetensors",
90
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
91
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
92
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
93
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
94
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
95
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
96
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
97
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
98
+ "language_model.model.layers.18.input_layernorm.weight": "model-00003-of-00007.safetensors",
99
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
100
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
101
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
102
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
103
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
104
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
105
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
106
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
107
+ "language_model.model.layers.19.input_layernorm.weight": "model-00003-of-00007.safetensors",
108
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
109
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
110
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
111
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
112
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
113
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
114
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
115
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
116
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
117
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
118
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
119
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
120
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
121
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
122
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
123
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
124
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
125
+ "language_model.model.layers.20.input_layernorm.weight": "model-00003-of-00007.safetensors",
126
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
127
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
128
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
129
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
130
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
131
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
132
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
133
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
134
+ "language_model.model.layers.21.input_layernorm.weight": "model-00003-of-00007.safetensors",
135
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
136
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
137
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
138
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
139
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
140
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
141
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
142
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
143
+ "language_model.model.layers.22.input_layernorm.weight": "model-00003-of-00007.safetensors",
144
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
145
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
146
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
147
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
148
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
149
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
150
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
151
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
152
+ "language_model.model.layers.23.input_layernorm.weight": "model-00004-of-00007.safetensors",
153
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
154
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
155
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
156
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
157
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
158
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
159
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
160
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
161
+ "language_model.model.layers.24.input_layernorm.weight": "model-00004-of-00007.safetensors",
162
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
163
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
164
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
165
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
166
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
167
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
168
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
169
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
170
+ "language_model.model.layers.25.input_layernorm.weight": "model-00004-of-00007.safetensors",
171
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
172
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
173
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
174
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
175
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
176
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
177
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
178
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
179
+ "language_model.model.layers.26.input_layernorm.weight": "model-00004-of-00007.safetensors",
180
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
181
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
182
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
183
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
184
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
185
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
186
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
187
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
188
+ "language_model.model.layers.27.input_layernorm.weight": "model-00004-of-00007.safetensors",
189
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
190
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
191
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
192
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
193
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
194
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
195
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
196
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
197
+ "language_model.model.layers.28.input_layernorm.weight": "model-00004-of-00007.safetensors",
198
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
199
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
200
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
201
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
202
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
203
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
204
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
205
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
206
+ "language_model.model.layers.29.input_layernorm.weight": "model-00004-of-00007.safetensors",
207
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
208
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
209
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
210
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
211
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
212
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
213
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
214
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
215
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
216
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
217
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
218
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
219
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
220
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
221
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
222
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
223
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
224
+ "language_model.model.layers.30.input_layernorm.weight": "model-00004-of-00007.safetensors",
225
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
226
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
227
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
228
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
229
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
230
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
231
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
232
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
233
+ "language_model.model.layers.31.input_layernorm.weight": "model-00004-of-00007.safetensors",
234
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
235
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
236
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
237
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
238
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
239
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
240
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
241
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
242
+ "language_model.model.layers.32.input_layernorm.weight": "model-00005-of-00007.safetensors",
243
+ "language_model.model.layers.32.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
244
+ "language_model.model.layers.32.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
245
+ "language_model.model.layers.32.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
246
+ "language_model.model.layers.32.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
247
+ "language_model.model.layers.32.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
248
+ "language_model.model.layers.32.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
249
+ "language_model.model.layers.32.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
250
+ "language_model.model.layers.32.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
251
+ "language_model.model.layers.33.input_layernorm.weight": "model-00005-of-00007.safetensors",
252
+ "language_model.model.layers.33.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
253
+ "language_model.model.layers.33.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
254
+ "language_model.model.layers.33.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
255
+ "language_model.model.layers.33.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
256
+ "language_model.model.layers.33.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
257
+ "language_model.model.layers.33.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
258
+ "language_model.model.layers.33.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
259
+ "language_model.model.layers.33.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
260
+ "language_model.model.layers.34.input_layernorm.weight": "model-00005-of-00007.safetensors",
261
+ "language_model.model.layers.34.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
262
+ "language_model.model.layers.34.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
263
+ "language_model.model.layers.34.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
264
+ "language_model.model.layers.34.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
265
+ "language_model.model.layers.34.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
266
+ "language_model.model.layers.34.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
267
+ "language_model.model.layers.34.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
268
+ "language_model.model.layers.34.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
269
+ "language_model.model.layers.35.input_layernorm.weight": "model-00005-of-00007.safetensors",
270
+ "language_model.model.layers.35.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
271
+ "language_model.model.layers.35.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
272
+ "language_model.model.layers.35.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
273
+ "language_model.model.layers.35.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
274
+ "language_model.model.layers.35.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
275
+ "language_model.model.layers.35.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
276
+ "language_model.model.layers.35.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
277
+ "language_model.model.layers.35.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
278
+ "language_model.model.layers.36.input_layernorm.weight": "model-00005-of-00007.safetensors",
279
+ "language_model.model.layers.36.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
280
+ "language_model.model.layers.36.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
281
+ "language_model.model.layers.36.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
282
+ "language_model.model.layers.36.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
283
+ "language_model.model.layers.36.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
284
+ "language_model.model.layers.36.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
285
+ "language_model.model.layers.36.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
286
+ "language_model.model.layers.36.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
287
+ "language_model.model.layers.37.input_layernorm.weight": "model-00005-of-00007.safetensors",
288
+ "language_model.model.layers.37.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
289
+ "language_model.model.layers.37.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
290
+ "language_model.model.layers.37.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
291
+ "language_model.model.layers.37.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
292
+ "language_model.model.layers.37.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
293
+ "language_model.model.layers.37.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
294
+ "language_model.model.layers.37.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
295
+ "language_model.model.layers.37.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
296
+ "language_model.model.layers.38.input_layernorm.weight": "model-00005-of-00007.safetensors",
297
+ "language_model.model.layers.38.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
298
+ "language_model.model.layers.38.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
299
+ "language_model.model.layers.38.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
300
+ "language_model.model.layers.38.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
301
+ "language_model.model.layers.38.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
302
+ "language_model.model.layers.38.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
303
+ "language_model.model.layers.38.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
304
+ "language_model.model.layers.38.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
305
+ "language_model.model.layers.39.input_layernorm.weight": "model-00005-of-00007.safetensors",
306
+ "language_model.model.layers.39.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
307
+ "language_model.model.layers.39.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
308
+ "language_model.model.layers.39.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
309
+ "language_model.model.layers.39.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
310
+ "language_model.model.layers.39.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
311
+ "language_model.model.layers.39.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
312
+ "language_model.model.layers.39.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
313
+ "language_model.model.layers.39.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
314
+ "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00007.safetensors",
315
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
316
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
317
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
318
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
319
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
320
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
321
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
322
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
323
+ "language_model.model.layers.40.input_layernorm.weight": "model-00005-of-00007.safetensors",
324
+ "language_model.model.layers.40.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
325
+ "language_model.model.layers.40.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
326
+ "language_model.model.layers.40.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
327
+ "language_model.model.layers.40.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
328
+ "language_model.model.layers.40.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
329
+ "language_model.model.layers.40.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
330
+ "language_model.model.layers.40.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
331
+ "language_model.model.layers.40.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
332
+ "language_model.model.layers.41.input_layernorm.weight": "model-00006-of-00007.safetensors",
333
+ "language_model.model.layers.41.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
334
+ "language_model.model.layers.41.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
335
+ "language_model.model.layers.41.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
336
+ "language_model.model.layers.41.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
337
+ "language_model.model.layers.41.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
338
+ "language_model.model.layers.41.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
339
+ "language_model.model.layers.41.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
340
+ "language_model.model.layers.41.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
341
+ "language_model.model.layers.42.input_layernorm.weight": "model-00006-of-00007.safetensors",
342
+ "language_model.model.layers.42.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
343
+ "language_model.model.layers.42.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
344
+ "language_model.model.layers.42.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
345
+ "language_model.model.layers.42.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
346
+ "language_model.model.layers.42.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
347
+ "language_model.model.layers.42.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
348
+ "language_model.model.layers.42.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
349
+ "language_model.model.layers.42.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
350
+ "language_model.model.layers.43.input_layernorm.weight": "model-00006-of-00007.safetensors",
351
+ "language_model.model.layers.43.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
352
+ "language_model.model.layers.43.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
353
+ "language_model.model.layers.43.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
354
+ "language_model.model.layers.43.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
355
+ "language_model.model.layers.43.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
356
+ "language_model.model.layers.43.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
357
+ "language_model.model.layers.43.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
358
+ "language_model.model.layers.43.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
359
+ "language_model.model.layers.44.input_layernorm.weight": "model-00006-of-00007.safetensors",
360
+ "language_model.model.layers.44.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
361
+ "language_model.model.layers.44.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
362
+ "language_model.model.layers.44.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
363
+ "language_model.model.layers.44.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
364
+ "language_model.model.layers.44.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
365
+ "language_model.model.layers.44.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
366
+ "language_model.model.layers.44.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
367
+ "language_model.model.layers.44.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
368
+ "language_model.model.layers.45.input_layernorm.weight": "model-00006-of-00007.safetensors",
369
+ "language_model.model.layers.45.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
370
+ "language_model.model.layers.45.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
371
+ "language_model.model.layers.45.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
372
+ "language_model.model.layers.45.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
373
+ "language_model.model.layers.45.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
374
+ "language_model.model.layers.45.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
375
+ "language_model.model.layers.45.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
376
+ "language_model.model.layers.45.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
377
+ "language_model.model.layers.46.input_layernorm.weight": "model-00006-of-00007.safetensors",
378
+ "language_model.model.layers.46.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
379
+ "language_model.model.layers.46.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
380
+ "language_model.model.layers.46.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
381
+ "language_model.model.layers.46.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
382
+ "language_model.model.layers.46.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
383
+ "language_model.model.layers.46.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
384
+ "language_model.model.layers.46.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
385
+ "language_model.model.layers.46.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
386
+ "language_model.model.layers.47.input_layernorm.weight": "model-00006-of-00007.safetensors",
387
+ "language_model.model.layers.47.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
388
+ "language_model.model.layers.47.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
389
+ "language_model.model.layers.47.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
390
+ "language_model.model.layers.47.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
391
+ "language_model.model.layers.47.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
392
+ "language_model.model.layers.47.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
393
+ "language_model.model.layers.47.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
394
+ "language_model.model.layers.47.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
395
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
396
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
397
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
398
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
399
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
400
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
401
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
402
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
403
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
404
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
405
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
406
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
407
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
408
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
409
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
410
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
411
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
412
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
413
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
414
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
415
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
416
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
417
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
418
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
419
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
420
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
421
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
422
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
423
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
424
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
425
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
426
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
427
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
428
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
429
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
430
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
431
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00007.safetensors",
432
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
433
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
434
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
435
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
436
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
437
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
438
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
439
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
440
+ "language_model.model.norm.weight": "model-00006-of-00007.safetensors",
441
+ "multi_modal_projector.linear_1.bias": "model-00001-of-00007.safetensors",
442
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00007.safetensors",
443
+ "multi_modal_projector.linear_2.bias": "model-00001-of-00007.safetensors",
444
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00007.safetensors",
445
+ "vision_tower.ln_pre.weight": "model-00001-of-00007.safetensors",
446
+ "vision_tower.patch_conv.weight": "model-00001-of-00007.safetensors",
447
+ "vision_tower.transformer.layers.0.attention.k_proj.weight": "model-00001-of-00007.safetensors",
448
+ "vision_tower.transformer.layers.0.attention.o_proj.weight": "model-00001-of-00007.safetensors",
449
+ "vision_tower.transformer.layers.0.attention.q_proj.weight": "model-00001-of-00007.safetensors",
450
+ "vision_tower.transformer.layers.0.attention.v_proj.weight": "model-00001-of-00007.safetensors",
451
+ "vision_tower.transformer.layers.0.attention_norm.weight": "model-00001-of-00007.safetensors",
452
+ "vision_tower.transformer.layers.0.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
453
+ "vision_tower.transformer.layers.0.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
454
+ "vision_tower.transformer.layers.0.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
455
+ "vision_tower.transformer.layers.0.ffn_norm.weight": "model-00001-of-00007.safetensors",
456
+ "vision_tower.transformer.layers.1.attention.k_proj.weight": "model-00001-of-00007.safetensors",
457
+ "vision_tower.transformer.layers.1.attention.o_proj.weight": "model-00001-of-00007.safetensors",
458
+ "vision_tower.transformer.layers.1.attention.q_proj.weight": "model-00001-of-00007.safetensors",
459
+ "vision_tower.transformer.layers.1.attention.v_proj.weight": "model-00001-of-00007.safetensors",
460
+ "vision_tower.transformer.layers.1.attention_norm.weight": "model-00001-of-00007.safetensors",
461
+ "vision_tower.transformer.layers.1.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
462
+ "vision_tower.transformer.layers.1.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
463
+ "vision_tower.transformer.layers.1.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
464
+ "vision_tower.transformer.layers.1.ffn_norm.weight": "model-00001-of-00007.safetensors",
465
+ "vision_tower.transformer.layers.10.attention.k_proj.weight": "model-00001-of-00007.safetensors",
466
+ "vision_tower.transformer.layers.10.attention.o_proj.weight": "model-00001-of-00007.safetensors",
467
+ "vision_tower.transformer.layers.10.attention.q_proj.weight": "model-00001-of-00007.safetensors",
468
+ "vision_tower.transformer.layers.10.attention.v_proj.weight": "model-00001-of-00007.safetensors",
469
+ "vision_tower.transformer.layers.10.attention_norm.weight": "model-00001-of-00007.safetensors",
470
+ "vision_tower.transformer.layers.10.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
471
+ "vision_tower.transformer.layers.10.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
472
+ "vision_tower.transformer.layers.10.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
473
+ "vision_tower.transformer.layers.10.ffn_norm.weight": "model-00001-of-00007.safetensors",
474
+ "vision_tower.transformer.layers.11.attention.k_proj.weight": "model-00001-of-00007.safetensors",
475
+ "vision_tower.transformer.layers.11.attention.o_proj.weight": "model-00001-of-00007.safetensors",
476
+ "vision_tower.transformer.layers.11.attention.q_proj.weight": "model-00001-of-00007.safetensors",
477
+ "vision_tower.transformer.layers.11.attention.v_proj.weight": "model-00001-of-00007.safetensors",
478
+ "vision_tower.transformer.layers.11.attention_norm.weight": "model-00001-of-00007.safetensors",
479
+ "vision_tower.transformer.layers.11.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
480
+ "vision_tower.transformer.layers.11.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
481
+ "vision_tower.transformer.layers.11.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
482
+ "vision_tower.transformer.layers.11.ffn_norm.weight": "model-00001-of-00007.safetensors",
483
+ "vision_tower.transformer.layers.12.attention.k_proj.weight": "model-00001-of-00007.safetensors",
484
+ "vision_tower.transformer.layers.12.attention.o_proj.weight": "model-00001-of-00007.safetensors",
485
+ "vision_tower.transformer.layers.12.attention.q_proj.weight": "model-00001-of-00007.safetensors",
486
+ "vision_tower.transformer.layers.12.attention.v_proj.weight": "model-00001-of-00007.safetensors",
487
+ "vision_tower.transformer.layers.12.attention_norm.weight": "model-00001-of-00007.safetensors",
488
+ "vision_tower.transformer.layers.12.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
489
+ "vision_tower.transformer.layers.12.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
490
+ "vision_tower.transformer.layers.12.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
491
+ "vision_tower.transformer.layers.12.ffn_norm.weight": "model-00001-of-00007.safetensors",
492
+ "vision_tower.transformer.layers.13.attention.k_proj.weight": "model-00001-of-00007.safetensors",
493
+ "vision_tower.transformer.layers.13.attention.o_proj.weight": "model-00001-of-00007.safetensors",
494
+ "vision_tower.transformer.layers.13.attention.q_proj.weight": "model-00001-of-00007.safetensors",
495
+ "vision_tower.transformer.layers.13.attention.v_proj.weight": "model-00001-of-00007.safetensors",
496
+ "vision_tower.transformer.layers.13.attention_norm.weight": "model-00001-of-00007.safetensors",
497
+ "vision_tower.transformer.layers.13.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
498
+ "vision_tower.transformer.layers.13.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
499
+ "vision_tower.transformer.layers.13.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
500
+ "vision_tower.transformer.layers.13.ffn_norm.weight": "model-00001-of-00007.safetensors",
501
+ "vision_tower.transformer.layers.14.attention.k_proj.weight": "model-00001-of-00007.safetensors",
502
+ "vision_tower.transformer.layers.14.attention.o_proj.weight": "model-00001-of-00007.safetensors",
503
+ "vision_tower.transformer.layers.14.attention.q_proj.weight": "model-00001-of-00007.safetensors",
504
+ "vision_tower.transformer.layers.14.attention.v_proj.weight": "model-00001-of-00007.safetensors",
505
+ "vision_tower.transformer.layers.14.attention_norm.weight": "model-00001-of-00007.safetensors",
506
+ "vision_tower.transformer.layers.14.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
507
+ "vision_tower.transformer.layers.14.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
508
+ "vision_tower.transformer.layers.14.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
509
+ "vision_tower.transformer.layers.14.ffn_norm.weight": "model-00001-of-00007.safetensors",
510
+ "vision_tower.transformer.layers.15.attention.k_proj.weight": "model-00001-of-00007.safetensors",
511
+ "vision_tower.transformer.layers.15.attention.o_proj.weight": "model-00001-of-00007.safetensors",
512
+ "vision_tower.transformer.layers.15.attention.q_proj.weight": "model-00001-of-00007.safetensors",
513
+ "vision_tower.transformer.layers.15.attention.v_proj.weight": "model-00001-of-00007.safetensors",
514
+ "vision_tower.transformer.layers.15.attention_norm.weight": "model-00001-of-00007.safetensors",
515
+ "vision_tower.transformer.layers.15.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
516
+ "vision_tower.transformer.layers.15.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
517
+ "vision_tower.transformer.layers.15.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
518
+ "vision_tower.transformer.layers.15.ffn_norm.weight": "model-00001-of-00007.safetensors",
519
+ "vision_tower.transformer.layers.16.attention.k_proj.weight": "model-00001-of-00007.safetensors",
520
+ "vision_tower.transformer.layers.16.attention.o_proj.weight": "model-00001-of-00007.safetensors",
521
+ "vision_tower.transformer.layers.16.attention.q_proj.weight": "model-00001-of-00007.safetensors",
522
+ "vision_tower.transformer.layers.16.attention.v_proj.weight": "model-00001-of-00007.safetensors",
523
+ "vision_tower.transformer.layers.16.attention_norm.weight": "model-00001-of-00007.safetensors",
524
+ "vision_tower.transformer.layers.16.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
525
+ "vision_tower.transformer.layers.16.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
526
+ "vision_tower.transformer.layers.16.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
527
+ "vision_tower.transformer.layers.16.ffn_norm.weight": "model-00001-of-00007.safetensors",
528
+ "vision_tower.transformer.layers.17.attention.k_proj.weight": "model-00001-of-00007.safetensors",
529
+ "vision_tower.transformer.layers.17.attention.o_proj.weight": "model-00001-of-00007.safetensors",
530
+ "vision_tower.transformer.layers.17.attention.q_proj.weight": "model-00001-of-00007.safetensors",
531
+ "vision_tower.transformer.layers.17.attention.v_proj.weight": "model-00001-of-00007.safetensors",
532
+ "vision_tower.transformer.layers.17.attention_norm.weight": "model-00001-of-00007.safetensors",
533
+ "vision_tower.transformer.layers.17.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
534
+ "vision_tower.transformer.layers.17.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
535
+ "vision_tower.transformer.layers.17.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
536
+ "vision_tower.transformer.layers.17.ffn_norm.weight": "model-00001-of-00007.safetensors",
537
+ "vision_tower.transformer.layers.18.attention.k_proj.weight": "model-00001-of-00007.safetensors",
538
+ "vision_tower.transformer.layers.18.attention.o_proj.weight": "model-00001-of-00007.safetensors",
539
+ "vision_tower.transformer.layers.18.attention.q_proj.weight": "model-00001-of-00007.safetensors",
540
+ "vision_tower.transformer.layers.18.attention.v_proj.weight": "model-00001-of-00007.safetensors",
541
+ "vision_tower.transformer.layers.18.attention_norm.weight": "model-00001-of-00007.safetensors",
542
+ "vision_tower.transformer.layers.18.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
543
+ "vision_tower.transformer.layers.18.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
544
+ "vision_tower.transformer.layers.18.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
545
+ "vision_tower.transformer.layers.18.ffn_norm.weight": "model-00001-of-00007.safetensors",
546
+ "vision_tower.transformer.layers.19.attention.k_proj.weight": "model-00001-of-00007.safetensors",
547
+ "vision_tower.transformer.layers.19.attention.o_proj.weight": "model-00001-of-00007.safetensors",
548
+ "vision_tower.transformer.layers.19.attention.q_proj.weight": "model-00001-of-00007.safetensors",
549
+ "vision_tower.transformer.layers.19.attention.v_proj.weight": "model-00001-of-00007.safetensors",
550
+ "vision_tower.transformer.layers.19.attention_norm.weight": "model-00001-of-00007.safetensors",
551
+ "vision_tower.transformer.layers.19.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
552
+ "vision_tower.transformer.layers.19.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
553
+ "vision_tower.transformer.layers.19.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
554
+ "vision_tower.transformer.layers.19.ffn_norm.weight": "model-00001-of-00007.safetensors",
555
+ "vision_tower.transformer.layers.2.attention.k_proj.weight": "model-00001-of-00007.safetensors",
556
+ "vision_tower.transformer.layers.2.attention.o_proj.weight": "model-00001-of-00007.safetensors",
557
+ "vision_tower.transformer.layers.2.attention.q_proj.weight": "model-00001-of-00007.safetensors",
558
+ "vision_tower.transformer.layers.2.attention.v_proj.weight": "model-00001-of-00007.safetensors",
559
+ "vision_tower.transformer.layers.2.attention_norm.weight": "model-00001-of-00007.safetensors",
560
+ "vision_tower.transformer.layers.2.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
561
+ "vision_tower.transformer.layers.2.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
562
+ "vision_tower.transformer.layers.2.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
563
+ "vision_tower.transformer.layers.2.ffn_norm.weight": "model-00001-of-00007.safetensors",
564
+ "vision_tower.transformer.layers.20.attention.k_proj.weight": "model-00001-of-00007.safetensors",
565
+ "vision_tower.transformer.layers.20.attention.o_proj.weight": "model-00001-of-00007.safetensors",
566
+ "vision_tower.transformer.layers.20.attention.q_proj.weight": "model-00001-of-00007.safetensors",
567
+ "vision_tower.transformer.layers.20.attention.v_proj.weight": "model-00001-of-00007.safetensors",
568
+ "vision_tower.transformer.layers.20.attention_norm.weight": "model-00001-of-00007.safetensors",
569
+ "vision_tower.transformer.layers.20.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
570
+ "vision_tower.transformer.layers.20.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
571
+ "vision_tower.transformer.layers.20.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
572
+ "vision_tower.transformer.layers.20.ffn_norm.weight": "model-00001-of-00007.safetensors",
573
+ "vision_tower.transformer.layers.21.attention.k_proj.weight": "model-00001-of-00007.safetensors",
574
+ "vision_tower.transformer.layers.21.attention.o_proj.weight": "model-00001-of-00007.safetensors",
575
+ "vision_tower.transformer.layers.21.attention.q_proj.weight": "model-00001-of-00007.safetensors",
576
+ "vision_tower.transformer.layers.21.attention.v_proj.weight": "model-00001-of-00007.safetensors",
577
+ "vision_tower.transformer.layers.21.attention_norm.weight": "model-00001-of-00007.safetensors",
578
+ "vision_tower.transformer.layers.21.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
579
+ "vision_tower.transformer.layers.21.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
580
+ "vision_tower.transformer.layers.21.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
581
+ "vision_tower.transformer.layers.21.ffn_norm.weight": "model-00001-of-00007.safetensors",
582
+ "vision_tower.transformer.layers.22.attention.k_proj.weight": "model-00001-of-00007.safetensors",
583
+ "vision_tower.transformer.layers.22.attention.o_proj.weight": "model-00001-of-00007.safetensors",
584
+ "vision_tower.transformer.layers.22.attention.q_proj.weight": "model-00001-of-00007.safetensors",
585
+ "vision_tower.transformer.layers.22.attention.v_proj.weight": "model-00001-of-00007.safetensors",
586
+ "vision_tower.transformer.layers.22.attention_norm.weight": "model-00001-of-00007.safetensors",
587
+ "vision_tower.transformer.layers.22.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
588
+ "vision_tower.transformer.layers.22.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
589
+ "vision_tower.transformer.layers.22.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
590
+ "vision_tower.transformer.layers.22.ffn_norm.weight": "model-00001-of-00007.safetensors",
591
+ "vision_tower.transformer.layers.23.attention.k_proj.weight": "model-00001-of-00007.safetensors",
592
+ "vision_tower.transformer.layers.23.attention.o_proj.weight": "model-00001-of-00007.safetensors",
593
+ "vision_tower.transformer.layers.23.attention.q_proj.weight": "model-00001-of-00007.safetensors",
594
+ "vision_tower.transformer.layers.23.attention.v_proj.weight": "model-00001-of-00007.safetensors",
595
+ "vision_tower.transformer.layers.23.attention_norm.weight": "model-00001-of-00007.safetensors",
596
+ "vision_tower.transformer.layers.23.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
597
+ "vision_tower.transformer.layers.23.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
598
+ "vision_tower.transformer.layers.23.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
599
+ "vision_tower.transformer.layers.23.ffn_norm.weight": "model-00001-of-00007.safetensors",
600
+ "vision_tower.transformer.layers.3.attention.k_proj.weight": "model-00001-of-00007.safetensors",
601
+ "vision_tower.transformer.layers.3.attention.o_proj.weight": "model-00001-of-00007.safetensors",
602
+ "vision_tower.transformer.layers.3.attention.q_proj.weight": "model-00001-of-00007.safetensors",
603
+ "vision_tower.transformer.layers.3.attention.v_proj.weight": "model-00001-of-00007.safetensors",
604
+ "vision_tower.transformer.layers.3.attention_norm.weight": "model-00001-of-00007.safetensors",
605
+ "vision_tower.transformer.layers.3.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
606
+ "vision_tower.transformer.layers.3.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
607
+ "vision_tower.transformer.layers.3.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
608
+ "vision_tower.transformer.layers.3.ffn_norm.weight": "model-00001-of-00007.safetensors",
609
+ "vision_tower.transformer.layers.4.attention.k_proj.weight": "model-00001-of-00007.safetensors",
610
+ "vision_tower.transformer.layers.4.attention.o_proj.weight": "model-00001-of-00007.safetensors",
611
+ "vision_tower.transformer.layers.4.attention.q_proj.weight": "model-00001-of-00007.safetensors",
612
+ "vision_tower.transformer.layers.4.attention.v_proj.weight": "model-00001-of-00007.safetensors",
613
+ "vision_tower.transformer.layers.4.attention_norm.weight": "model-00001-of-00007.safetensors",
614
+ "vision_tower.transformer.layers.4.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
615
+ "vision_tower.transformer.layers.4.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
616
+ "vision_tower.transformer.layers.4.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
617
+ "vision_tower.transformer.layers.4.ffn_norm.weight": "model-00001-of-00007.safetensors",
618
+ "vision_tower.transformer.layers.5.attention.k_proj.weight": "model-00001-of-00007.safetensors",
619
+ "vision_tower.transformer.layers.5.attention.o_proj.weight": "model-00001-of-00007.safetensors",
620
+     "vision_tower.transformer.layers.5.attention.q_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.attention.v_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.attention_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.5.ffn_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.attention.k_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.attention.o_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.attention.q_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.attention.v_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.attention_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.6.ffn_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.attention.k_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.attention.o_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.attention.q_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.attention.v_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.attention_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.7.ffn_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.attention.k_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.attention.o_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.attention.q_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.attention.v_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.attention_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.8.ffn_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.attention.k_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.attention.o_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.attention.q_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.attention.v_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.attention_norm.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
+     "vision_tower.transformer.layers.9.ffn_norm.weight": "model-00001-of-00007.safetensors"
+   }
+ }
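The entries above map each tensor name to the shard file that stores it. As a rough illustration of how such a map can be read, the sketch below groups tensor names by shard. It assumes the mapping sits under a top-level `weight_map` key, as in the usual sharded safetensors index layout; `INDEX_JSON` and `tensors_by_shard` are hypothetical names, and the excerpt contains only two of the entries shown in the diff.

```python
import json
from collections import defaultdict

# Hypothetical two-entry excerpt; the real index lists every tensor in the model.
INDEX_JSON = """
{
  "weight_map": {
    "vision_tower.transformer.layers.9.attention.q_proj.weight": "model-00001-of-00007.safetensors",
    "vision_tower.transformer.layers.9.ffn_norm.weight": "model-00001-of-00007.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that holds them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


index = json.loads(INDEX_JSON)
grouped = tensors_by_shard(index)
for shard_file, names in grouped.items():
    print(shard_file, len(names))
```

Grouping by shard is handy when only part of the model is needed, for example loading just the `vision_tower.*` tensors without opening every shard.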
preprocessor_config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "PixtralImageProcessor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "patch_size": {
+     "height": 16,
+     "width": 16
+   },
+   "processor_class": "PixtralProcessor",
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "longest_edge": 1024
+   }
+ }
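To make the numeric fields above concrete, here is a minimal sketch of the per-pixel arithmetic they imply, assuming the usual order of rescale first, then normalize. The `rescale_factor` is 1/255, so 8-bit channel values land in [0, 1] before the per-channel mean and standard deviation are applied. `normalize_pixel` is a hypothetical helper, not part of any library, and resizing is left out.

```python
# Values copied from preprocessor_config.json above.
IMAGE_MEAN = (0.48145466, 0.4578275, 0.40821073)
IMAGE_STD = (0.26862954, 0.26130258, 0.27577711)
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255


def normalize_pixel(rgb):
    """Rescale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (channel * RESCALE_FACTOR - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print(white)
print(black)
```

A black pixel maps to negative values in every channel and a white pixel to positive ones, since each channel mean lies strictly between 0 and 1.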
processor_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "image_break_token": "[IMG_BREAK]",
+   "image_end_token": "[IMG_END]",
+   "image_token": "[IMG]",
+   "patch_size": 16,
+   "processor_class": "PixtralProcessor"
+ }
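The three image tokens and the patch size above determine how an image is laid out in the token sequence. The sketch below is an approximation under stated assumptions, not the processor's actual code: it assumes the image is scaled down so that neither side exceeds `longest_edge` (1024, from preprocessor_config.json), each side is rounded up to a multiple of the patch size, every row of patches is followed by `[IMG_BREAK]`, and the final break is replaced by `[IMG_END]`. `resized_size` and `image_placeholder` are hypothetical helper names.

```python
import math

PATCH = 16           # "patch_size" from processor_config.json
LONGEST_EDGE = 1024  # "size.longest_edge" from preprocessor_config.json


def resized_size(height, width):
    """Shrink to fit LONGEST_EDGE (never upscale), then round up to whole patches."""
    ratio = max(height / LONGEST_EDGE, width / LONGEST_EDGE)
    if ratio > 1:
        height = math.ceil(height / ratio)
        width = math.ceil(width / ratio)
    rows = (height - 1) // PATCH + 1
    cols = (width - 1) // PATCH + 1
    return rows * PATCH, cols * PATCH


def image_placeholder(height, width):
    """Build the placeholder token list for an already-resized image."""
    rows, cols = height // PATCH, width // PATCH
    tokens = []
    for _ in range(rows):
        tokens.extend(["[IMG]"] * cols)
        tokens.append("[IMG_BREAK]")
    tokens[-1] = "[IMG_END]"  # the last row ends the image instead of breaking
    return tokens


height, width = resized_size(2048, 1024)
tokens = image_placeholder(height, width)
print(height, width, len(tokens))
```

Under these assumptions an image costs `rows * cols` patch tokens plus one separator per row, which is a useful estimate when budgeting context length for multimodal prompts.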
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84f33e6f52b2833e8cc17229af8eea363f640a898f19a48184a2c7f6f5a88337
+ size 17077329
+ size 17077329
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff