Text Generation · Transformers · Safetensors · arcee · conversational

bartowski committed · Commit 53636b1 · verified · 0 parent(s)

Super-squash branch 'main' using huggingface_hub
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,108 @@
+ AFM‑License‑1
+
+ 1. Definitions
+
+ For purposes of this License, the following capitalised terms have the meanings set forth below. Terms not defined here have the meanings ordinarily ascribed to them by applicable law.
+
+ Term Definition
+ “License” This legal document, including any appendices or updates issued by Licensor.
+ “Licensor” The copyright owner releasing the Work under this License (presently Arcee AI, Inc.).
+ “You” / “Your” / “Licensee” An individual or Legal Entity exercising permissions granted by this License.
+ “Legal Entity” The entity (such as a corporation, partnership, or trust) on whose behalf You are acting; if none, then You as an individual.
+ “Work” The AFM large‑language‑model checkpoint(s), including all Model Weights, tokenizer files, and configuration files released by Licensor.
+ “Model Weights” The numerical parameters (including any quantised, fine‑tuned, merged, pruned, or otherwise modified versions) that define the neural‑network functions of the Work.
+ “Derivative Work” Any work based on or derived from the Work, including but not limited to: (i) fine‑tuned or further pre‑trained Model Weights; (ii) checkpoints produced by merging or adapter techniques (e.g., LoRA, PEFT); (iii) distillations or pruned versions; or (iv) translations to other model architectures.
+ “Outputs” Any text, image, code, embedding, or other content generated by executing the Work or a Derivative Work.
+ “Contribution” Any Work or Derivative Work intentionally submitted to Licensor for inclusion in the Work by any Contributor.
+ “Contributor” Licensor and any individual or Legal Entity that submits a Contribution.
+ “Commercial Use” Any use of the Work or a Derivative Work intended for or directed toward commercial advantage or monetary remuneration.
+ “Non‑Commercial Use” Any use that is not Commercial Use.
+ “Threshold” US $1.75 million in gross annual revenue of You or Your Legal Entity, calculated on a consolidated basis for the most recently completed fiscal year.
+ “Qualified Non‑Profit Organization” An entity described in §501(c)(3) of the U.S. Internal Revenue Code or its foreign equivalent.
+
+
+ 2. Grant of Copyright License
+
+ Each Contributor hereby grants You a perpetual, worldwide, royalty‑free, non‑exclusive, irrevocable copyright licence to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and any Derivative Works in Source or Object form, provided You comply with §5 (Commercial Use Limitation) and §7A (Acceptable Use).
+
+ 3. Grant of Patent License
+
+ Each Contributor hereby grants You a perpetual, worldwide, royalty‑free, non‑exclusive licence under such Contributor’s patent claims that are necessarily infringed by the Contributor’s Contribution, to make, use, sell, offer for sale, import, and otherwise run the Work or a Derivative Work, subject to §12.
+
+ 4. Redistribution, Notice & Attribution
+
+ (a) If You distribute the Work or a Derivative Work in Source or Object form, You must include a copy of this License.
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files.
+ (c) You must retain, in the Source form of any Derivative Work, all copyright, patent, trademark, and attribution notices from the Source form of the Work.
+ (d) If the Work includes a “NOTICE” text file, You must reproduce the contents of that file within Your Derivative Works.
+ (e) Powered by AFM Badge. If Your product or service substantially relies on the Work and is public‑facing, You must display the notice “Powered by AFM” in a reasonably prominent location (e.g., footer, about modal, or API documentation).
+
+ 5. Commercial Use Limitation
+
+ (a) The rights granted under this License for Commercial Use are conditioned on You or Your Legal Entity not exceeding the Threshold.
+ (b) Any Commercial Use of the Work or a Derivative Work by a Legal Entity that meets or exceeds the Threshold is not licensed under this License.
+ (c) The Threshold limitation does not apply to use of the Work or a Derivative Work by a Qualified Non‑Profit Organization for Non‑Commercial or research purposes.
+ (d) Early‑Stage Startup Carve‑Out. A Legal Entity below the Threshold may embed the Work or a Derivative Work into a broader product or service offered to third‑party customers (including large for‑profit enterprises) provided that:
+ 1. No Model Weights Transfer. Model Weights (or Derivative Weights) are not transferred, licensed, or otherwise provided to the customer; and
+ 2. Execution within the Offering. The Work executes solely within Licensee’s product or service and is not exposed as a stand‑alone model or API that the customer can extract or re‑package.
+
+ 6. Submission of Contributions
+
+ Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be licensed as part of the Work under the terms of this License.
+
+ 7. Trademarks
+
+ This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor or its Contributors, except as required for reasonable and customary use in describing the origin of the Work.
+
+ 7A. Acceptable Use (Responsible‑AI Policy)
+
+ You shall not use the Work, a Derivative Work, or Outputs to:
+ 1. Develop or deploy biological, chemical, nuclear, or other weapons of mass destruction;
+ 2. Enable surveillance or profiling that violates internationally recognised human‑rights norms;
+ 3. Create or disseminate sexually exploitative content involving minors;
+ 4. Violate applicable laws or regulations on hate speech, harassment, or extremist content;
+ 5. Train, fine‑tune, or otherwise develop a competing large‑language model using the Outputs.
+ Licensor may update this list by publishing a revised version; Licensees must comply within 30 days of publication.
+
+ 7A‑1 Privacy & Data Responsibility
+
+ Licensee remains solely responsible for compliance with all data‑protection laws (including GDPR and CCPA) for any personal data processed through the Work.
+
+ 8. Disclaimer of Warranty
+
+ Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) “AS IS” and “AS AVAILABLE”, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non‑infringement, merchantability, or fitness for a particular purpose.
+
+ 8A. Export‑Control Compliance
+
+ Licensee shall not export, re‑export, or transfer the Work, Model Weights, or Derivative Works in violation of U.S. Export Administration Regulations (EAR), U.S. economic sanctions, or other applicable export‑control laws.
+
+ 9. Limitation of Liability
+
+ In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work.
+
+ 10. Accepting Warranty or Additional Liability
+
+ While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor.
+
+ 11. Termination and Cure
+
+ (a) If You breach §§ 4–8A (inclusive) or initiate patent litigation described in §12, Licensor may provide written notice. Your rights terminate 30 days after such notice unless You cure the breach within that period.
+ (b) Upon termination, You must cease all use of the Work and Derivative Works and delete all copies in Your possession.
+ (c) §§ 8, 9, 10, 12–14 survive termination.
+
+ 12. Patent Defensive Termination
+
+ If You initiate litigation asserting that the Work or a Contribution infringes a patent, any patent licences granted under this License to You shall terminate as of the date such litigation is filed, unless You withdraw or dismiss such litigation within 30 days.
+
+ 13. Governing Law, Venue & Audit
+
+ (a) This License is governed by the laws of the State of Delaware, USA, without regard to conflict‑of‑laws principles.
+ (b) Any dispute shall be resolved exclusively in the state or federal courts located in Wilmington, Delaware; each party consents to personal jurisdiction and venue therein.
+ (c) Audit Right. No more than once per calendar year, Licensor may request reasonable attestations or evidence of Licensee’s compliance with §§ 5 and 7A. Licensee shall provide such evidence and Licensor will treat it as confidential.
+
+ 14. Miscellaneous
+
+ (a) If any provision of this License is held unenforceable, the remainder continues in effect.
+ (b) No waiver shall be deemed a continuing waiver of any subsequent breach.
+ (c) Licensee may not assign this License without Licensor’s prior written consent.
+ (d) Headings are for convenience only and have no legal effect.
README.md ADDED
@@ -0,0 +1,127 @@
+ ---
+ license: other
+ license_name: aml
+ language:
+ - en
+ - es
+ - fr
+ - de
+ - it
+ - pt
+ - ru
+ - ar
+ - hi
+ - ko
+ - zh
+ library_name: transformers
+ extra_gated_fields:
+ Company (optional): text
+ base_model:
+ - arcee-ai/AFM-4.5B-Base
+ ---
+
+ <div align="center">
+ <picture>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/Lj9YVLIKKdImV_jID0A1g.png" width="25%" alt="Arcee AFM 4.5B">
+ </picture>
+ </div>
+
+
+ # AFM-4.5B
+
+ AFM-4.5B is a 4.5-billion-parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments, from cloud to edge. The base model was trained on 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets, and was further refined through reinforcement learning, both against verifiable rewards and for human preference. We use a modified version of [TorchTitan](https://arxiv.org/abs/2410.06511) for pretraining, [Axolotl](https://axolotl.ai) for supervised fine-tuning, and a modified version of [Verifiers](https://github.com/willccbb/verifiers) for reinforcement learning.
+
+ The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to strong real-world performance.
+
+ The model architecture follows a standard transformer decoder-only design based on Vaswani et al., incorporating several modifications for performance and efficiency. Notable architectural features include grouped query attention for improved inference efficiency and ReLU^2 activation functions (instead of SwiGLU) to enable sparsification while maintaining or exceeding performance benchmarks.
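For concreteness, the ReLU^2 activation mentioned above is just ReLU followed by squaring. A minimal sketch in plain Python (an illustration of the function's shape, not the model's actual kernel):

```python
def relu2(x: float) -> float:
    """ReLU^2: zero for negative inputs, x squared for positive ones."""
    r = max(x, 0.0)   # standard ReLU
    return r * r      # squaring keeps the zero region, sharpening sparsity

print(relu2(-3.0), relu2(2.0))  # prints: 0.0 4.0
```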
+
+ The model available in this repo is the instruct model, i.e. the result of the supervised fine-tuning and reinforcement learning stages described above.
+
+ ***
+
+ <div align="center">
+ <picture>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/sSVjGNHfrJKmQ6w8I18ek.png" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
+ </picture>
+ </div>
+
+ ## Model Details
+
+ * **Model Architecture:** ArceeForCausalLM
+ * **Parameters:** 4.5B
+ * **Training Tokens:** 8 trillion (6.5T general pretraining + 1.5T midtraining)
+ * **License:** [Arcee Model License (AML)](https://huggingface.co/arcee-ai/AFM-4.5B#license)
+ * **Key Features:**
+   * Built-in support for function calling and agentic reasoning.
+   * Strong multilingual performance (English, Spanish, French, German, Italian, Portuguese, Russian, Arabic, Hindi, Korean, and Mandarin).
+
+ ***
+
+ ## Benchmarks
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/BdsWFc4pxiHlK2E0j9AfG.png)
+
+ ## How to use with `transformers`
+
+ You can use the model directly with the `transformers` library.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "arcee-ai/AFM-4.5B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+
+ messages = [
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(
+     input_ids,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+     top_k=50,
+     top_p=0.95
+ )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ## How to use with `vllm`
+
+ Ensure you are on version `0.10.0` or newer:
+
+ ```
+ pip install "vllm>=0.10.0"
+ ```
+
+ You can then serve the model natively:
+
+ ```
+ vllm serve arcee-ai/AFM-4.5B
+ ```
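`vllm serve` exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). As a sketch, this builds the JSON body you would POST to the server's `/v1/chat/completions` endpoint; the base URL and endpoint path are vLLM's documented defaults, so verify them against your deployment:

```python
import json

def build_chat_request(model: str, messages: list, max_tokens: int = 256) -> dict:
    # Request body for POST <base_url>/v1/chat/completions on the vLLM server.
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

payload = build_chat_request(
    "arcee-ai/AFM-4.5B",
    [{"role": "user", "content": "Who are you?"}],
)
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client (e.g. the `openai` Python SDK pointed at the local base URL) can send the same request without hand-building the payload.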
+
+ ## Quantization support
+
+ Support for llama.cpp is available; GGUF-format quants are provided here:
+
+ https://huggingface.co/arcee-ai/AFM-4.5B-GGUF
+
+ ## License
+
+ AFM-4.5B is released under the [Arcee Model License](https://huggingface.co/arcee-ai/AFM-4.5B/blob/main/LICENSE). If your company makes less than $1.75 million in annual revenue, you’re free to use the model for commercial purposes, as long as you’re not providing the weights to a company above that threshold. If your product or application using AFM-4.5B is sold to a larger company, that’s fine, as long as they don’t receive or run the weights directly.
+
+ We want as many developers, researchers, and builders as possible to benefit from AFM-4.5B. At the same time, this license ensures that we can continue to develop and support the model for the community.
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architectures": [
+     "ArceeForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "eos_token_id": 128003,
+   "head_dim": 128,
+   "hidden_act": "relu2",
+   "hidden_size": 2560,
+   "initializer_range": 0.02,
+   "intermediate_size": 18432,
+   "max_position_embeddings": 65536,
+   "mlp_bias": false,
+   "model_type": "arcee",
+   "num_attention_heads": 20,
+   "num_hidden_layers": 36,
+   "num_key_value_heads": 4,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": {
+     "beta_fast": 32.0,
+     "beta_slow": 1.0,
+     "factor": 20.0,
+     "mscale": 1.0,
+     "original_max_position_embeddings": 4096,
+     "rope_type": "yarn",
+     "type": "yarn"
+   },
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.53.3",
+   "use_cache": false,
+   "vocab_size": 128005
+ }
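A quick sanity check on the attention values in this config: the query projection width should equal the hidden size, and with grouped query attention several query heads share each key/value head. A sketch using the numbers above:

```python
# Values copied from config.json above.
hidden_size = 2560
head_dim = 128
num_attention_heads = 20
num_key_value_heads = 4

# 20 query heads x 128 dims per head = 2560, matching hidden_size.
assert num_attention_heads * head_dim == hidden_size

# GQA: each key/value head serves a group of query heads.
group_size = num_attention_heads // num_key_value_heads
print(group_size)  # prints: 5
```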
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 128000,
+   "eos_token_id": 128003,
+   "transformers_version": "4.53.3",
+   "use_cache": false
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57ef34a05def7d83a9fa33f11a44c842b99c4ab3fc72eab03403a15c0b6de445
+ size 4965245872
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcf0b354a1975416200837ff9401064fa72462bc57044741d36854658f5bc25e
+ size 4273167472
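The two LFS pointers above are bf16 shards, i.e. two bytes per parameter. A sketch checking that against the totals reported in `model.safetensors.index.json` (the raw shard sizes also include safetensors header overhead, so only the index totals divide exactly):

```python
# Totals from the metadata block of model.safetensors.index.json.
total_parameters = 4_619_189_760
total_size_bytes = 9_238_379_520

# bf16 stores each parameter in exactly 2 bytes.
bytes_per_param = total_size_bytes / total_parameters
print(bytes_per_param)  # prints: 2.0
```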
model.safetensors.index.json ADDED
@@ -0,0 +1,299 @@
+ {
+   "metadata": {
+     "total_parameters": 4619189760,
+     "total_size": 9238379520
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00002-of-00002.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
171
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
172
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
173
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
174
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
175
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
176
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
177
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
178
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
179
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
180
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
181
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
182
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
183
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
184
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
185
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
186
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
187
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
188
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
189
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
190
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
191
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
192
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
193
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
194
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
195
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
197
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
198
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
199
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
202
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
203
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
204
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
205
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
206
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
207
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
208
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
209
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
210
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
211
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
212
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
213
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
214
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
215
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
216
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
217
+ "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
218
+ "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
219
+ "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
220
+ "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
221
+ "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
222
+ "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
223
+ "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
224
+ "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
225
+ "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
226
+ "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
227
+ "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
228
+ "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
229
+ "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
230
+ "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
231
+ "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
232
+ "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
233
+ "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
234
+ "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
235
+ "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
236
+ "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
237
+ "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
238
+ "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
239
+ "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
240
+ "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
241
+ "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
242
+ "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
243
+ "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
244
+ "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
245
+ "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
246
+ "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
247
+ "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
248
+ "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
249
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
250
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
251
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
252
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
253
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
254
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
255
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
256
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
257
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
258
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
259
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
260
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
261
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
262
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
263
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
264
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
265
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
266
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
267
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
268
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
269
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
270
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
271
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
272
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
273
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
274
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
275
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
276
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
277
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
278
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
279
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
280
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
281
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
282
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
283
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
284
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
285
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
286
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
287
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
288
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
289
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
290
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
291
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
292
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
293
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
294
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
295
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
296
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
297
+ "model.norm.weight": "model-00002-of-00002.safetensors"
298
+ }
299
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "bos_token": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|finetune_right_pad_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d48708c6021027e8fc6d5342e1498111d8e87aae8903319d3ead1fbdfc4a9125
+ size 17158115
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "128000": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128001": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128002": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128003": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128004": {
+ "content": "<|finetune_right_pad_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|begin_of_text|>",
+ "chat_template": "{%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n{%- else %}\n {{- '<|im_start|>system\\nThe assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant\\'s responses are calm, intelligent, and personable, always aiming to truly understand the user\\'s intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.<|im_end|>\\n' }}\n{%- endif %}\n{%- for message in messages %}\n {%- if not (message.role == 'system' and loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endfor %}\n{%- if messages[-1]['role'] != 'assistant' %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|im_end|>",
+ "extra_special_tokens": {},
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 65536,
+ "pad_token": "<|finetune_right_pad_id|>",
+ "tokenizer_class": "PreTrainedTokenizerFast"
+ }
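The `chat_template` in tokenizer_config.json follows ChatML formatting: a system turn (the user-supplied one, or the default AFM-4.5B prompt), each message wrapped in `<|im_start|>role … <|im_end|>` markers, and a trailing `<|im_start|>assistant` when generation should continue. A minimal hand-written Python sketch of that logic (not the Jinja template itself; the `default_system` string here is a hypothetical stand-in for the long AFM system prompt above):

```python
def build_prompt(messages, default_system="You are a helpful assistant."):
    """Mirror the ChatML chat_template logic by hand (illustrative sketch)."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A user-supplied system message is emitted as the first turn.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        body = messages[1:]
    else:
        # Otherwise the template falls back to the default system prompt.
        parts.append(f"<|im_start|>system\n{default_system}<|im_end|>\n")
        body = messages
    for m in body:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if not messages or messages[-1]["role"] != "assistant":
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hi"}])
```

In practice you would not hand-build this string; `AutoTokenizer.from_pretrained(...)` followed by `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` renders the stored template for you.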