robbiemu committed on
Commit
e39ff3a
·
1 Parent(s): e296535

add mlx and mlx-lm support

README.md CHANGED
@@ -1,5 +1,695 @@
1
- ---
2
- license: other
3
- license_name: fair-noncommercial-research-license
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: fair-noncommercial-research
4
+ extra_gated_prompt: >
5
+ FAIR Noncommercial Research License v1 Last Updated: August 18, 2025
6
+
7
+ “Acceptable Use Policy” means the FAIR Acceptable Use Policy, applicable to
8
+ Research Materials, that is incorporated into this Agreement.
9
+
10
+ “Agreement” means the terms and conditions for use, reproduction, distribution
11
+ and modification of the Research Materials set forth herein.
12
+
13
+
14
+ “Documentation” means the specifications, manuals and documentation
15
+ accompanying Research Materials distributed by Meta.
16
+
17
+
18
+ “Licensee” or “you” means you, or your employer or any other person or entity
19
+ (if you are entering into this Agreement on such person or entity’s behalf),
20
+ of the age required under applicable laws, rules or regulations to provide
21
+ legal consent and that has legal authority to bind your employer or such other
22
+ person or entity if you are entering in this Agreement on their behalf.
23
+
24
+
25
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or,
26
+ if you are an entity, your principal place of business is in the EEA or
27
+ Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA
28
+ or Switzerland).
29
+
30
+ “Noncommercial Research Uses” means noncommercial research use cases related
31
+ to research, development, education, processing, or analysis and in each case,
32
+ is not primarily intended for commercial advantage or monetary compensation to
33
+ you or others.
34
+
35
+ “Research Materials” means, collectively, Documentation and the models,
36
+ software and algorithms, including machine-learning model code, trained model
37
+ weights, inference-enabling code, training-enabling code, fine-tuning enabling
38
+ code, demonstration materials and other elements of the foregoing distributed
39
+ by Meta and made available under this Agreement.
40
+
41
+ By clicking “I Accept” below or by using or distributing any portion or
42
+ element of the Research Materials, you agree to be bound by this Agreement.
43
+
44
+
45
+ 1. License Rights and Redistribution.
46
+
47
+
48
+ a. Grant of Rights. You are granted a non-exclusive, worldwide,
49
+ non-transferable and royalty-free limited license under Meta’s intellectual
50
+ property or other rights owned by Meta embodied in the Research Materials to
51
+ use, reproduce, distribute, copy, create derivative works of, and make
52
+ modifications to the Research Materials.
53
+
54
+ b. Redistribution and Use. i. You will not use the Research Materials or any
55
+ outputs or results of the Research Materials in connection with any commercial
56
+ uses or for any uses other than Noncommercial Research Uses;
57
+
58
+
59
+ ii. Distribution of Research Materials, and any derivative works thereof, are
60
+ subject to the terms of this Agreement. If you distribute or make the Research
61
+ Materials, or any derivative works thereof, available to a third party, you
62
+ may only do so under the terms of this Agreement. You shall also provide a
63
+ copy of this Agreement to such third party.
64
+
65
+
66
+ iii. If you submit for publication the results of research you perform on,
67
+ using, or otherwise in connection with Research Materials, you must
68
+ acknowledge the use of Research Materials in your publication.
69
+
70
+
71
+ iv. Your use of the Research Materials must comply with applicable laws and
72
+ regulations (including Trade Control Laws) and adhere to the FAIR Acceptable
73
+ Use Policy, which is hereby incorporated by reference into this Agreement. 2.
74
+ User Support. Your Noncommercial Research Use of the Research Materials is
75
+ done at your own discretion; Meta does not process any information nor provide
76
+ any service in relation to such use. Meta is under no obligation to provide
77
+ any support services for the Research Materials. Any support provided is “as
78
+ is”, “with all faults”, and without warranty of any kind.
79
+
80
+
81
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE RESEARCH
82
+ MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS”
83
+ BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF
84
+ ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY
85
+ WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A
86
+ PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE
87
+ APPROPRIATENESS OF USING OR REDISTRIBUTING THE RESEARCH MATERIALS AND ASSUME
88
+ ANY RISKS ASSOCIATED WITH YOUR USE OF THE RESEARCH MATERIALS AND ANY OUTPUT
89
+ AND RESULTS.
90
+
91
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE
92
+ UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS
93
+ LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS
94
+ OR ANY DIRECT OR INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR
95
+ PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE
96
+ POSSIBILITY OF ANY OF THE FOREGOING.
97
+
98
+ 5. Intellectual Property.
99
+
100
+
101
+ a. Subject to Meta’s ownership of Research Materials and derivatives made by
102
+ or for Meta, with respect to any derivative works and modifications of the
103
+ Research Materials that are made by you, as between you and Meta, you are and
104
+ will be the owner of such derivative works and modifications.
105
+
106
+ b. If you institute litigation or other proceedings against Meta or any entity
107
+ (including a cross-claim or counterclaim in a lawsuit) alleging that the
108
+ Research Materials, outputs or results, or any portion of any of the
109
+ foregoing, constitutes infringement of intellectual property or other rights
110
+ owned or licensable by you, then any licenses granted to you under this
111
+ Agreement shall terminate as of the date such litigation or claim is filed or
112
+ instituted. You will indemnify and hold harmless Meta from and against any
113
+ claim by any third party arising out of or related to your use or distribution
114
+ of the Research Materials.
115
+
116
+ 6. Term and Termination. The term of this Agreement will commence upon your
117
+ acceptance of this Agreement or access to the Research Materials and will
118
+ continue in full force and effect until terminated in accordance with the
119
+ terms and conditions herein. Meta may terminate this Agreement if you are in
120
+ breach of any term or condition of this Agreement. Upon termination of this
121
+ Agreement, you shall delete and cease use of the Research Materials. Sections
122
+ 3, 4 and 7 shall survive the termination of this Agreement.
123
+
124
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and
125
+ construed under the laws of the State of California without regard to choice
126
+ of law principles, and the UN Convention on Contracts for the International
127
+ Sale of Goods does not apply to this Agreement. The courts of California shall
128
+ have exclusive jurisdiction of any dispute arising out of this Agreement.
129
+
130
+
131
+ 8. Modifications and Amendments. Meta may modify this Agreement from time to
132
+ time; provided that they are similar in spirit to the current version of the
133
+ Agreement, but may differ in detail to address new problems or concerns. All
134
+ such changes will be effective immediately. Your continued use of the Research
135
+ Materials after any modification to this Agreement constitutes your agreement
136
+ to such modification. Except as provided in this Agreement, no modification or
137
+ addition to any provision of this Agreement will be binding unless it is in
138
+ writing and signed by an authorized representative of both you and Meta.
139
+
140
+
141
+ FAIR Acceptable Use Policy
142
+
143
+ The Fundamental AI Research (FAIR) team at Meta seeks to further understanding
144
+ of new and existing research domains with the mission of advancing the
145
+ state-of-the-art in artificial intelligence through open research for the
146
+ benefit of all.
147
+
148
+ As part of this mission, Meta makes certain research materials available for
149
+ noncommercial research use. Meta is committed to promoting the safe and
150
+ responsible use of such research materials.
151
+
152
+ Prohibited Uses
153
+
154
+ You agree you will not use, or allow others to use, Research Materials to:
155
+
156
+ Violate the law or others’ rights, including to: Engage in, promote, generate,
157
+ contribute to, encourage, plan, incite, or further illegal or unlawful
158
+ activity or content, such as: Violence or terrorism Exploitation or harm to
159
+ children, including the solicitation, creation, acquisition, or dissemination
160
+ of child exploitative content or failure to report Child Sexual Abuse Material
161
+ Human trafficking, exploitation, and sexual violence The illegal distribution
162
+ of information or materials to minors, including obscene materials, or failure
163
+ to employ legally required age-gating in connection with such information or
164
+ materials. Sexual solicitation Any other criminal activity
165
+
166
+ Engage in, promote, incite, or facilitate the harassment, abuse, threatening,
167
+ or bullying of individuals or groups of individuals
168
+
169
+ Engage in, promote, incite, or facilitate discrimination or other unlawful or
170
+ harmful conduct in the provision of employment, employment benefits, credit,
171
+ housing, other economic benefits, or other essential goods and services
172
+
173
+ Engage in the unauthorized or unlicensed practice of any profession including,
174
+ but not limited to, financial, legal, medical/health, or related professional
175
+ practices
176
+
177
+ Collect, process, disclose, generate, or infer health, demographic, or other
178
+ sensitive personal or private information about individuals without rights and
179
+ consents required by applicable laws
180
+
181
+ Engage in or facilitate any action or generate any content that infringes,
182
+ misappropriates, or otherwise violates any third-party rights, including the
183
+ outputs or results of any technology using FAIR research materials
184
+
185
+ Create, generate, or facilitate the creation of malicious code, malware,
186
+ computer viruses or do anything else that could disable, overburden, interfere
187
+ with or impair the proper working, integrity, operation or appearance of a
188
+ website or computer system
189
+
190
+ 2. Engage in, promote, incite, facilitate, or assist in the planning or
191
+ development of activities that present a risk of death or bodily harm to
192
+ individuals, including use of research artifacts related to the following:
193
+
194
+ Military, warfare, nuclear industries or applications, espionage, use for
195
+ materials or activities that are subject to the International Traffic Arms
196
+ Regulations (ITAR) maintained by the United States Department of State
197
+
198
+ Guns and illegal weapons (including weapon development)
199
+
200
+ Illegal drugs and regulated/controlled substances
201
+
202
+ Operation of critical infrastructure, transportation technologies, or heavy
203
+ machinery
204
+
205
+ Self-harm or harm to others, including suicide, cutting, and eating disorders
206
+
207
+ Any content intended to incite or promote violence, abuse, or any infliction
208
+ of bodily harm to an individual
209
+
210
+ 3. Intentionally deceive or mislead others, including use of FAIR Research
211
+ Materials related to the following:
212
+
213
+ Generating, promoting, or furthering fraud or the creation or promotion of
214
+ disinformation
215
+
216
+ Generating, promoting, or furthering defamatory content, including the
217
+ creation of defamatory statements, images, or other content
218
+
219
+ Generating, promoting, or further distributing spam
220
+
221
+ Impersonating another individual without consent, authorization, or legal
222
+ right
223
+
224
+ Representing that outputs of FAIR research materials or outputs from
225
+ technology using FAIR research materials are human-generated
226
+
227
+ Generating or facilitating false online engagement, including fake reviews and
228
+ other means of fake online engagement
229
+
230
+ 4. Fail to appropriately disclose to end users any known dangers of your
231
+ Research Materials.
232
+
233
+ Please report any violation of this Policy or other problems that could lead
234
+ to a violation of this Policy by submitting a report here
235
+ [https://docs.google.com/forms/d/e/1FAIpQLSeb11cryAopJ7LNrC4nxEUXrHY26hfkXQMf_uH-oFgA3WlYZQ/viewform].
236
+ extra_gated_fields:
237
+ First Name: text
238
+ Last Name: text
239
+ Date of birth: date_picker
240
+ Country: country
241
+ Affiliation: text
242
+ Job title:
243
+ type: select
244
+ options:
245
+ - Student
246
+ - Research Graduate
247
+ - AI researcher
248
+ - AI developer/engineer
249
+ - Reporter
250
+ - Other
251
+ geo: ip_location
252
+ By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy: checkbox
253
+ extra_gated_description: >-
254
+ The information you provide will be collected, stored, processed and shared in
255
+ accordance with the [Meta Privacy
256
+ Policy](https://www.facebook.com/privacy/policy/).
257
+ extra_gated_button_content: Submit
258
+ extra_gated_heading: >-
259
+ Please be sure to provide your full legal name, date of birth, and full
260
+ organization name with all corporate identifiers. Avoid the use of acronyms
261
+ and special characters. Failure to follow these instructions may prevent you
262
+ from accessing this model and others on Hugging Face. You will not have the
263
+ ability to edit this form after submission, so please ensure all information
264
+ is accurate.
265
+ language:
266
+ - en
267
+ library_name: mlx
268
+ pipeline_tag: text-generation
269
+ tags:
270
+ - facebook
271
+ - meta
272
+ - pytorch
273
+ - mobilellm
274
+ - mlx
275
+ - apple-mlx
276
+ - runtime
277
+ base_model:
278
+ - facebook/MobileLLM-R1-950M
279
+ ---
280
+
281
+ # MLX Runtime (Apple silicon) — Added Files & Usage
282
+
283
+ This fork adds a lightweight MLX runtime so you can run the original MobileLLM‑R1‑950M weights with Apple’s MLX on Apple silicon. It keeps the original weights (`model.safetensors`) and tokenizer; only the runtime is added.
284
+
285
+ ## Technical Documentation
286
+
287
+ For detailed technical information about this port, see:
288
+ - [**MLX Technical Summary**](mlx_technical_summary.md) - Challenges and solutions for porting MobileLLM-R1 to MLX in this PoC conversion.
289
+ - [**Conversion Log**](conversion.log) - Details of the model conversion process
290
+ - [**Quantization Log**](quantization.log) - Information about quantization procedures and results
291
+
292
+
293
+ What’s included (added files)
294
+ - `model.py` — Minimal MLX implementation of the architecture with GQA, optional Q/K norm, RoPE, and output weight tying.
295
+ - `inference.py` — Simple text generation CLI with temperature, top‑p, greedy mode, optional chat template, EOS handling, plus boxed‑answer controls for math.
296
+ - `test_model.py` — Diagnostics to verify model structure/parameter shapes and key weight presence.
297
+ - `check_shape.py` — Heuristic check to inspect the MLP variant from `model.safetensors` and `config.json`.
298
+ - `main.py` — Convenience entry for quick manual tests.
299
+
300
+ Notes
301
+ - This is an MLX runtime; it does not change or fine‑tune the weights. The README front‑matter marks this repo as a derivative of `facebook/MobileLLM-R1-950M` via `base_model` so it appears correctly on Hugging Face.
302
+ - Tested via `uv` on macOS with Python 3.13; deps are pinned in `uv.lock`/`pyproject.toml`.
303
+
304
+ Quick start (MLX, local safetensors)
305
+ - Install and run with uv: `uv run python inference.py --prompt "What is 2+2?" --temperature 0.0 --max-tokens 64`
306
+ - Use chat template (default if `chat_template.jinja` present): `uv run python inference.py --prompt "Explain quicksort in 1–2 sentences." --temperature 0.7 --top-p 0.9`
307
+ - Disable chat template: `uv run python inference.py --prompt "Explain quicksort in 1–2 sentences." --disable-chat-template --temperature 0.7 --top-p 0.9`
308
+ - Math mode, final answer only: `uv run python inference.py --prompt "Compute 17 * 23. Put your final answer in \\boxed{.}" --temperature 0.0 --final-only --stop-at-boxed --extract-boxed --max-tokens 128`
309
+
310
+ Tips
311
+ - If a sampled response stops mid‑sentence, increase `--max-tokens` (e.g., 192–256) or use a lower `--temperature`/`--top-p`.
312
+ - For concise answers with the chat template, pass a system prompt: `--system "Be concise. Answer in 1–2 sentences."`.
313
+
314
+ Diagnostics
315
+ - Structure/weights check: `uv run python test_model.py`
316
+ - MLP variant heuristic: `uv run python check_shape.py .`
317
+
318
+ Details
319
+ - The loader maps HF weight names to MLX module names and detects the MLP variant from weight keys to ensure correct layer wiring (see the rename sketch below).
320
+ - Attention uses standard `1/sqrt(d)` scaling for best generation quality.
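+
+ For reference, the key renames the loader applies look roughly like this (a sketch mirroring the substring replacements in `custom_mlx_lm/custom_convert.py`; MLP-variant detection and the dual-branch case are omitted):
+
+ ```python
+ # HF checkpoint key -> MLX module path (substring replacements, applied in order)
+ RENAMES = [
+     ("model.embed_tokens", "tok_embeddings"),
+     ("model.layers", "layers"),
+     ("self_attn", "attention"),
+     ("input_layernorm", "attention_norm"),
+     ("post_attention_layernorm", "ffn_norm"),
+     ("mlp.", "feed_forward."),
+     ("model.norm", "norm"),
+ ]
+
+ def remap(key: str) -> str:
+     for old, new in RENAMES:
+         key = key.replace(old, new)
+     return key
+ ```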
321
+
322
+
323
+ ## Installation
324
+
325
+ This project uses `uv` for dependency management.
326
+
327
+ ### Using uv (recommended)
328
+ ```bash
329
+ # 1. Clone the repo
330
+ git clone <your-repo>
331
+ cd <your-repo>
332
+
333
+ # 2. Sync all dependencies (includes the default set)
334
+ uv sync
335
+
336
+ # 3. (Optional) Add the torch group if you plan to customize/train models
337
+ uv sync --extra torch
338
+ ```
339
+
340
+ ### Without uv
341
+ If you prefer pip/venv, a `requirements.txt` is provided:
342
+ ```bash
343
+ python -m venv .venv
344
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
345
+ pip install -r requirements.txt
346
+ ```
347
+
348
+ > The `torch` extra is only required if you intend to fine-tune or swap model back-ends; the default installation already supports inference.
349
+
350
+
351
+ ## MLX Inference Examples (safetensors)
352
+
353
+ - Basic greedy generation:
354
+ - `uv run python inference.py --prompt "MobileLLM-R1 runs on MLX." --temperature 0 --max-tokens 64`
355
+ - Chat-style with template:
356
+ - `uv run python inference.py --prompt "Briefly summarize quicksort." --temperature 0.7 --top-p 0.9`
357
+ - Disable the chat template:
358
+ - `uv run python inference.py --prompt "Briefly summarize quicksort." --disable-chat-template --temperature 0.7 --top-p 0.9`
359
+ - Math/coding “final answer only”:
360
+ - `uv run python inference.py --prompt "Solve: 128 / 8. Put final answer in \\boxed{.}" --temperature 0 --final-only --stop-at-boxed --extract-boxed`
361
+
362
+ ## Design Choices (why not a trivial block)
363
+
364
+ This runtime mirrors the functional details of the released weights so they load 1:1 and generate well in MLX. A minimal “one size fits all” block hides critical differences and leads to poor output quality. Key choices:
365
+
366
+ - Attention layout and features
367
+ - Grouped-Query Attention (GQA): separate `num_attention_heads` vs `num_key_value_heads`, with `head_dim` taken from the config. We implement a custom `Attention` so K/V can be repeated across groups and still match the HF weight layout (see the sketch at the end of this section).
368
+ - Q/K normalization: optional RMSNorm applied to per-head Q and K, controlled by `use_qk_norm`.
369
+ - RoPE: MLX `nn.RoPE` with the model’s `rope_theta` (8e6 here), and a per-layer toggle via `no_rope_layers`. We gate RoPE per block, with a safe fallback if the list disables all layers.
370
+ - Scaling: we use standard `1/sqrt(d)` for SDPA. Some configs expose an `attn_scale` used for training tricks; applying it at inference severely degraded outputs, so it’s not multiplied into SDPA.
371
+
372
+ - MLP variant detection
373
+ - MobileLLM variants use either standard SwiGLU (gate_proj/up_proj/down_proj) or a dual-branch dense MLP. We detect the variant from weight keys in `model.safetensors` and instantiate the correct module so shapes and semantics match.
374
+
375
+ - Weight tying and mapping
376
+ - Tie output logits to the token embedding matrix when `tie_word_embeddings` is true, matching HF behavior and saving memory.
377
+ - Map HF names to MLX names during load: `model.embed_tokens`→`tok_embeddings`, layer/attn/norm renames, `mlp.`→`feed_forward.`, `model.norm`→`norm`.
378
+
379
+ - Template and decoding
380
+ - Provide a Jinja chat template for parity with HF chat usage, but allow `--disable-chat-template` for raw prompting. Multiple EOS IDs are supported.
381
+ - Sampling: temperature, top‑p, and greedy; optional repetition/frequency penalties; math helpers `--final-only/--stop-at-boxed/--extract-boxed` to keep answers concise.
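+
+ To make the attention bullets concrete, here is a minimal, self-contained sketch of such a block (illustrative names and shapes, not the exact `model.py` code; the per-layer `no_rope_layers` gating, KV caching, and the exact Q/K-norm ordering are omitted or simplified):
+
+ ```python
+ import mlx.core as mx
+ import mlx.nn as nn
+
+ def repeat_kv(x, n_rep):
+     # (B, n_kv_heads, T, head_dim) -> (B, n_kv_heads * n_rep, T, head_dim)
+     return x if n_rep == 1 else mx.repeat(x, n_rep, axis=1)
+
+ class GQAAttention(nn.Module):
+     def __init__(self, dim, n_heads, n_kv_heads, head_dim, rope_theta=8e6, use_qk_norm=True):
+         super().__init__()
+         self.n_heads, self.n_kv_heads, self.head_dim = n_heads, n_kv_heads, head_dim
+         self.q_proj = nn.Linear(dim, n_heads * head_dim, bias=False)
+         self.k_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
+         self.v_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
+         self.o_proj = nn.Linear(n_heads * head_dim, dim, bias=False)
+         self.rope = nn.RoPE(head_dim, base=rope_theta)
+         self.q_norm = nn.RMSNorm(head_dim) if use_qk_norm else None
+         self.k_norm = nn.RMSNorm(head_dim) if use_qk_norm else None
+
+     def __call__(self, x, mask=None):
+         B, T, _ = x.shape
+         q = self.q_proj(x).reshape(B, T, self.n_heads, self.head_dim).transpose(0, 2, 1, 3)
+         k = self.k_proj(x).reshape(B, T, self.n_kv_heads, self.head_dim).transpose(0, 2, 1, 3)
+         v = self.v_proj(x).reshape(B, T, self.n_kv_heads, self.head_dim).transpose(0, 2, 1, 3)
+         if self.q_norm is not None:  # optional per-head Q/K RMSNorm (use_qk_norm)
+             q, k = self.q_norm(q), self.k_norm(k)
+         q, k = self.rope(q), self.rope(k)
+         k = repeat_kv(k, self.n_heads // self.n_kv_heads)
+         v = repeat_kv(v, self.n_heads // self.n_kv_heads)
+         # Standard 1/sqrt(d) scaling; the config's attn_scale is deliberately not applied.
+         out = mx.fast.scaled_dot_product_attention(
+             q, k, v, scale=self.head_dim**-0.5, mask=mask
+         )
+         return self.o_proj(out.transpose(0, 2, 1, 3).reshape(B, T, -1))
+ ```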
382
+
383
+ # Model Details
384
+
385
+ We present MobileLLM-R1, a new series of efficient reasoning models in the MobileLLM family. The release includes two categories of models:
386
+
387
+ Base models:
388
+ - [MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base/)
389
+ - [MobileLLM-R1-360M-base](https://huggingface.co/facebook/MobileLLM-R1-360M-base/)
390
+ - [MobileLLM-R1-950M-base](https://huggingface.co/facebook/MobileLLM-R1-950M-base/)
391
+
392
+ Final models:
393
+ - [MobileLLM-R1-140M](https://huggingface.co/facebook/MobileLLM-R1-140M/)
394
+ - [MobileLLM-R1-360M](https://huggingface.co/facebook/MobileLLM-R1-360M/)
395
+ - [MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M/)
396
+
397
+ > **Note**: These models are not general-purpose chat models. They are Supervised Fine-Tuned (SFT) models, specifically trained to address mathematical, programming (Python, C++), and scientific problems.
398
+
399
+ In addition to the models, we release the complete training recipes and data sources to ensure reproducibility and support further research.
400
+
401
+ Remarkably, the MobileLLM-R1 950M, pre-trained on only **~2T high-quality tokens** and with fewer than 5T total training tokens, achieves comparable or superior performance to Qwen3 0.6B, which was trained on 36T tokens, across MATH, GSM8K, MMLU, and LiveCodeBench benchmarks.
402
+
403
+ Compared to existing fully open-source models, the MobileLLM-R1 950M model achieves **~5× higher accuracy on MATH** than the Olmo 1.24B model and **~2× higher accuracy** than the SmolLM2 1.7B model, despite being substantially smaller in parameter scale. In addition, MobileLLM-R1 950M outperforms both Olmo 1.24B and SmolLM2 1.7B **by a wide margin on coding benchmarks**, establishing a new state of the art among fully open-source models.
404
+
405
+ # Highlights
406
+
407
+
408
+ ### Pretrained Model
409
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/b9rg8yZTxeWhRWus_tJR_.jpeg)
410
+
411
+ ### Token efficiency comparison across pretrained models
412
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/dJtdh5dmVTdowP1gMR5qQ.jpeg)
413
+
414
+ ### Post-trained Model
415
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/0MxKBLDfb8xRwg-uVi1WQ.png)
416
+
417
+
418
+
419
+ **Model Architecture**:
420
+
421
+ | | # Layers | # Attention Heads | # KV Heads | Dim | Hidden Dim | Params |
422
+ | --- | --- | --- | --- | --- | --- | --- |
423
+ | MobileLLM-R1-140M | 15 | 9 | 3 | 576 | 2048 | 140M |
424
+ | MobileLLM-R1-360M | 15 | 16 | 4 | 1024 | 4096 | 359M |
425
+ | MobileLLM-R1-950M | 22 | 24 | 6 | 1536 | 6144 | 949M |
426
+
427
+ | | Input modalities | Output modalities | Context Length | Vocabulary Size | Shared Embeddings |
428
+ | --- | --- | --- | --- | --- | --- |
429
+ | [MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base) | Text | Text | 4k | 128k | Yes |
430
+ | [MobileLLM-R1-360M-base](https://huggingface.co/facebook/MobileLLM-R1-360M-base) | Text | Text | 4k | 128k | Yes |
431
+ | [MobileLLM-R1-950M-base](https://huggingface.co/facebook/MobileLLM-R1-950M-base) | Text | Text | 4k | 128k | Yes |
432
+ | [MobileLLM-R1-140M](https://huggingface.co/facebook/MobileLLM-R1-140M) | Text | Text | 32k | 128k | Yes |
433
+ | [MobileLLM-R1-360M](https://huggingface.co/facebook/MobileLLM-R1-360M) | Text | Text | 32k | 128k | Yes |
434
+ | [MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) | Text | Text | 32k | 128k | Yes |
435
+
436
+ # How to use
437
+
438
+ To load the pretrained model for further finetuning or evaluation:
439
+ ```python
440
+ from transformers import AutoModelForCausalLM, AutoTokenizer
441
+ tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-R1-950M")
442
+ model = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-R1-950M")
443
+ ```
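+
+ A quick smoke test after loading (standard Transformers generation API; the prompt and token budget are illustrative):
+
+ ```python
+ inputs = tokenizer("What is 2+2?", return_tensors="pt")
+ output = model.generate(**inputs, max_new_tokens=32)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```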
444
+
445
+ # Inference examples
446
+
447
+ ## Inference (MLX)
448
+
449
+ Use the MLX runtime provided in this repo to run the local `model.safetensors` on Apple silicon.
450
+
451
+ - Basic: `uv run python inference.py --prompt "Hello MLX" --temperature 0.7 --top-p 0.9`
452
+ - Deterministic: `uv run python inference.py --prompt "Hello MLX" --temperature 0 --max-tokens 64`
453
+
454
+ Flags in `inference.py`
455
+ - `--model-path`: path to model directory (default: `.`)
456
+ - `--prompt`: input text
457
+ - `--max-tokens`: number of tokens to generate
458
+ - `--temperature`: 0 for greedy, >0 for sampling
459
+ - `--top-p`: nucleus sampling cutoff
460
+ - `--system`: optional system message when using chat template
461
+ - `--final-only`: instructs model to output only a final boxed answer
462
+ - `--stop-at-boxed`: stop generation after closing `}` following `\boxed{`
463
+ - `--extract-boxed`: print the last `\boxed{...}` content
464
+ - `--disable-chat-template`: bypass `chat_template.jinja` and send raw prompt (with BOS)
465
+ - `--repetition-penalty`: discourage previously generated tokens (>1.0)
466
+ - `--frequency-penalty`: subtract alpha * token frequency from logits
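+
+ Example combining sampling with the penalty flags (values are illustrative): `uv run python inference.py --prompt "List three prime numbers." --temperature 0.7 --top-p 0.9 --repetition-penalty 1.1 --max-tokens 128`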
467
+
468
+ See also: the “MLX Runtime (Apple silicon) — Added Files & Usage” section above for more examples and notes.
469
+
470
+ ## Transformers
471
+
472
+ ```py
473
+ from transformers import pipeline
474
+ import torch
475
+
476
+ model_id = "facebook/MobileLLM-R1-950M"
477
+
478
+ pipe = pipeline(
479
+ "text-generation",
480
+ model=model_id,
481
+ torch_dtype="auto",
482
+ device_map="auto",
483
+ )
484
+
485
+ # Math problem / default scenario
486
+ messages = [
487
+ {
488
+ "role": "system",
489
+ "content": "Please reason step by step, and put your final answer within \\boxed{}."
490
+ },
491
+ {"role": "user", "content": "Compute: $1-2+3-4+5- \\dots +99-100$."},
492
+ ]
493
+
494
+ # C++ coding scenario
495
+ messages = [
496
+ {
497
+ "role": "system",
498
+ "content": (
499
+ "\nYou are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\n"
500
+ "Please use c++ programming language only.\n"
501
+ "You must use ```cpp for just the final solution code block with the following format:\n"
502
+ "```cpp\n# Your code here\n```\n"
503
+ )
504
+ },
505
+ {"role": "user", "content": "Write a C++ program that prints 'Hello, World!'."},
506
+ ]
507
+
508
+ # Python coding scenario
509
+ messages = [
510
+ {
511
+ "role": "system",
512
+ "content": (
513
+ "\nYou are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\n"
514
+ "Please use python programming language only.\n"
515
+ "You must use ```python for just the final solution code block with the following format:\n"
516
+ "```python\n# Your code here\n```\n"
517
+ )
518
+ },
519
+ {"role": "user", "content": "Write a Python function that returns the square of a number."},
520
+ ]
521
+
522
+ outputs = pipe(
523
+ messages,
524
+ max_new_tokens=8192,
525
+ )
526
+ print(outputs[0]["generated_text"][-1])
527
+ ```
528
+
529
+ You can also run inference with vLLM. You only need to register the model architecture `Llama4ForCausalLM` with the vLLM `ModelRegistry`:
530
+ ```python
531
+ from vllm.model_executor.models.llama4 import Llama4ForCausalLM
532
+ from vllm.model_executor.models.registry import ModelRegistry
533
+ ModelRegistry.register_model("Llama4ForCausalLM", Llama4ForCausalLM)
534
+ ```
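+
+ After registering the architecture, the model can be used through the standard vLLM entry points (a minimal sketch; sampling values are illustrative):
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(model="facebook/MobileLLM-R1-950M")
+ outputs = llm.generate(["What is 2+2?"], SamplingParams(temperature=0.0, max_tokens=64))
+ print(outputs[0].outputs[0].text)
+ ```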
535
+
536
+
537
+ # Evaluation
538
+
539
+ ## MobileLLM-R1 base model
540
+ | Model | Size | MATH500 | GSM8K | MBPP | HumanEval | CommonSense Avg. | MMLU |
541
+ | --- | --- | --- | --- | --- | --- | --- | --- |
542
+ | | | 4-shot <br> em | 8-shot <br> em | 3-shot <br> pass@1 | 0-shot <br> pass@1 | 0-shot <br> accuracy | 5-shot <br> accuracy |
543
+ | |
544
+ | *<150M* | | | | | | | |
545
+ | SmolLM2-135M-base | 135M | 0.4 | 1.8 | 3.8 | 0.0 | **50.7** | -- |
546
+ | **MobileLLM-R1-140M-base** | 140M | **4.6** | **16.3** | **5.4** | **15.9** | 44.3 | -- |
547
+ | |
548
+ | *150M - 400M* | | | | | | | |
549
+ | Gemma-3-270M-pt | 268M | 0.6 | 1.1 | 2.0 | 3.1 | 48.4 | 26.5 |
550
+ | SmolLM2-360M-base | 362M | 1.8 | 5.0 | **19.4** | 0.0 | **56.6** | 24.7 |
551
+ | **MobileLLM-R1-360M-base** | 359M | **13.4** | **39.4** | **20.8** | **32.9** | 51.0 | **26.8** |
552
+ | |
553
+ | *400M - 1B* | | | | | | | |
554
+ | Qwen2.5-0.5B-base | 494M | 14.8 | 41.8 | 29.6 | 28.1 | 52.3 | 47.5 |
555
+ | Qwen3-0.6B-base | 596M | **29.8** | 60.9 | **39.0** | 30.5 | 55.3 | **52.4** |
556
+ | **MobileLLM-R1-950M-base** | 949M | 26.8 | **61.6** | **39.2** | **46.3** | **58.6** | 47.4 |
557
+ | |
558
+ | *> 1B* | | | | | | | |
559
+ | Gemma-3-1B-pt | 1.0B | 0.6 | 2.4 | 9.4 | 6.1 | 57.3 | 26.1 |
560
+ | LLaMA3.2-1B-base | 1.24B | 1.6 | 6.8 | 26.6 | 17.1 | 58.4 | 32.0 |
561
+ | OLMo-2-0425-1B-base | 1.48B | 5.2 | 39.8 | 7.8 | 6.7 | 61.0 | 42.4 |
562
+ | Qwen2.5-1.5B-base | 1.54B | 31.0 | 68.4 | 44.6 | 36.6 | 58.7 | 61.2 |
563
+ | SmolLM2-1.7B-base | 1.71B | 11.6 | 31.8 | 35.4 | 0.6 | 62.9 | 50.0 |
564
+ | Qwen3-1.7B-base | 2.03B | 38.5 | 76.2 | 56.4 | 47.6 | 60.9 | 62.1 |
565
+
566
+
567
+ Here, CommonSense Avg. denotes the average over 8 CommonSense Reasoning tasks: ARC-Easy, ARC-Challenge, BoolQ, PIQA, SIQA, HellaSwag, OBQA, and WinoGrande. Models with fewer than 150M parameters do not yield reliable MMLU scores and are therefore denoted as '--'.
568
+
569
+ ## MobileLLM-R1 post-trained model
570
+
571
+ | Model | Size | MATH500 | GSM8K | AIME'24 | AIME'25 | LiveCodeBench-v6 |
572
+ | --- | --- | --- | --- | --- | --- | --- |
573
+ | | | 0-shot <br> pass@1 | 0-shot <br> pass@1 | 0-shot <br> pass@1, n=64 | 0-shot <br> pass@1, n=64 | 0-shot <br> pass@1, n=16 |
574
+ | |
575
+ | *<150M* | | | | | | |
576
+ | SmolLM2-135M-Instruct | 135M | 3.0 | 2.4 | -- | -- | 0.0 |
577
+ | **MobileLLM-R1-140M** | 140M | **7.4** | **3.0** | -- | -- | **1.0** |
578
+ | |
579
+ | *150M - 400M* | | | | | | |
580
+ | Gemma-3-270m-it | 268M | 6.8 | 8.4 | -- | -- | 0.0 |
581
+ | SmolLM2-360M-Instruct | 362M | 3.4 | 8.1 | -- | -- | 0.7 |
582
+ | **MobileLLM-R1-360M** | 359M | **26.6** | **22.7** | -- | -- | **4.8** |
583
+ | |
584
+ | *400M - 1B* | | | | | | |
585
+ | Qwen2.5-0.5B-Instruct | 494M | 31.2 | 48.1 | 0.1 | 0.3 | 3.6 |
586
+ | Qwen3-0.6B | 596M | 73.0 | **79.2** | 11.3 | **17.0** | 14.9 |
587
+ | **MobileLLM-R1-950M** | 949M | **74.0** | 67.5 | **15.5** | 16.3 | **19.9** |
588
+ | |
589
+ | *> 1B* | | | | | | |
590
+ | Gemma-3-1B-it | 1.0B | 45.4 | 62.9 | 0.9 | 0.0 | 2.0 |
591
+ | LLaMA3.2-1B-Instruct | 1.24B | 24.8 | 38.8 | 1.1 | 0.2 | 4.1 |
592
+ | OLMo-2-0425-1B-Instruct | 1.48B | 19.2 | 69.7 | 0.6 | 0.1 | 0.0 |
593
+ | OpenReasoning-Nemotron-1.5B | 1.54B | 83.4 | 76.7 | 49.7 | 40.4 | 28.3 |
594
+ | DeepSeek-R1-Distill-Qwen-1.5B | 1.54B | 83.2 | 77.3 | 29.1 | 23.4 | 19.9 |
595
+ | Qwen2.5-1.5B-Instruct | 1.54B | 54.0 | 70.0 | 2.5 | 0.9 | 7.9 |
596
+ | SmolLM2-1.7B-Instruct | 1.71B | 19.2 | 41.8 | 0.3 | 0.1 | 4.4 |
597
+ | Qwen3-1.7B | 2.03B | 89.4 | 90.3 | 47.0 | 37.0 | 29.8 |
598
+
599
+ For AIME, we evaluate models across 64 runs and report the average accuracy. For LiveCodeBench, results are reported as the average accuracy across 16 runs. Models with fewer than 400M parameters do not produce reliable AIME scores and are therefore denoted as '--'.
600
+
601
+
602
+ # Training
603
+
604
+ ## Training Process
605
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/ThVFzsaaGa4gQ3iha5CKM.jpeg)
606
+
607
+ ### Training stages and hyperparameter details
608
+
609
+ In the pretraining phase, MobileLLM-R1 models are randomly initialized and optimized using the Adam optimizer with hyperparameters (β_1, β_2, ε) = (0.9, 0.95, 1e-8), coupled with a weight decay coefficient of 0.1. The learning rate follows a 2k-step warmup schedule and then decays linearly from its peak to 10% of the maximum.
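+
+ For concreteness, the pretraining schedule described above can be written as (a sketch; `peak_lr` and `total_steps` are taken from the hyperparameter table below):
+
+ ```python
+ def pretrain_lr(step, peak_lr=4e-3, warmup_steps=2_000, total_steps=500_000):
+     # Linear warmup to the peak, then linear decay to 10% of the peak.
+     if step < warmup_steps:
+         return peak_lr * step / warmup_steps
+     frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+     return peak_lr * (1.0 - 0.9 * frac)
+ ```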
610
+
611
+ In the mid-training phase, we use the Adam optimizer with a learning rate that decays linearly from its maximum value to zero. We employ knowledge distillation with the Llama-3.1-8B-Instruct model as the teacher, where the student is trained by minimizing the KL divergence between its output logits and the teacher's logits.
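+
+ The distillation objective amounts to the usual logit-matching KL term (a sketch; the temperature `T` is an assumption, since the recipe does not specify one):
+
+ ```python
+ import torch.nn.functional as F
+
+ def kd_loss(student_logits, teacher_logits, T=1.0):
+     # KL(teacher || student) over the softened vocabulary distributions.
+     log_p_student = F.log_softmax(student_logits / T, dim=-1)
+     p_teacher = F.softmax(teacher_logits / T, dim=-1)
+     return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
+ ```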
612
+
613
+ In the post-training phase, we use the Adam optimizer with zero weight decay. The learning rate warmup ratio is set to 0.03 for general-purpose SFT and 0.1 for reasoning-specific SFT, after which the learning rate decays linearly from its maximum value to zero. Full training hyperparameters are provided in the table below.
614
+
615
+ | Stage | Phase | Tokens / Samples | BS | Sequence Length | Steps | LR | #GPUs | Training Time |
616
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
617
+ | Pre-training | Phase1 | 2T tokens | 16 | 2k | 500k | 4.00E-03 | 16 x 8 | 4-5 days |
618
+ | | Phase2 | 2T tokens | 16 | 2k | 500k | 4.00E-03 | 16 x 8 | 4-5 days |
619
+ | Mid-training | Phase1 | 100B tokens | 4 | 4k | 50K | 3.60E-04 | 16 x 8 | 1-2 days |
620
+ | | Phase2 | 100B tokens | 4 | 4k | 50K | 3.60E-04 | 16 x 8 | 1-2 days |
621
+ | Post-training | General SFT | 866K samples | 4 | 4k | 2 epochs | 5.00E-06 | 16 x 8 | ~2h |
622
+ | | Reasoning SFT | 6.2M samples | 8 | 32k | 4 epochs | 8.00E-05 | 16 x 8 | ~2.5days |
623
+
624
+ ## Data Mix
625
+
626
+ ### Pre-training
627
+
628
+ | Dataset | Rows | Tokens (B) | Phase1 Mix Ratio | Phase2 Mix Ratio |
629
+ | --- | --- | --- | --- | --- |
630
+ | [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) | 206,640,114 | 263.8 | 10.66% | 0.52% |
631
+ | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | 6,117,786 | 12.6 | 6.93% | 23.33% |
632
+ | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) | 1,279,107,432 | 1300 | 63.75% | 54.83% |
633
+ | [Wiki](https://huggingface.co/datasets/allenai/dolmino-mix-1124/tree/main/data/wiki) | 7,222,303 | 3.7 | 5.03% | 0.14% |
634
+ | [Arxiv](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T/blob/main/urls/arxiv.txt) | 1,533,917 | 28 | 6.36% | 1.32% |
635
+ | [StackExchange](https://data.together.xyz/redpajama-data-1T/v1.0.0/stackexchange/stackexchange.jsonl) | 29,249,120 | 19.6 | 5.03% | 0.86% |
636
+ | [Algebraic stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2/tree/main/algebraic-stack) | 3,404,331 | 12.6 | 2.25% | 1.26% |
637
+ | [Nemotron science](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/science/science.jsonl) | 708,920 | 2 | -- | 0.03% |
638
+ | [Nemotron code](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/code/code_v1.1.jsonl) | 10,108,883 | 16 | -- | 0.72% |
639
+ | [Nemotron math](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/math/math_v1.1.jsonl) | 22,066,397 | 15 | -- | 3.01% |
640
+ | [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) | 31,064,744 | 25 | -- | 2.70% |
641
+ | [Facebook natural reasoning](https://huggingface.co/datasets/facebook/natural_reasoning) | 1,145,824 | 1.8 | -- | 3.18% |
642
+ | [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath/tree/main/finemath-3plus) | 48,283,984 | 34 | -- | 8.01% |
643
+ | [peS2o](https://huggingface.co/datasets/allenai/peS2o) | 38,800,000 | 50 | -- | 0.08% |
644
+ | **Total** | | | 100% | 100% |
645
+
646
+
647
+
648
+
649
+ ### Mid-training
650
+
651
+
652
+ | Dataset | Subset | Rows (M) | Phase1 Mix Ratio | Phase2 Mix Ratio |
653
+ | --- | --- | --- | --- | --- |
654
+ | [Dolmino](https://huggingface.co/datasets/allenai/dolmino-mix-1124) | DCLM Baseline | 606 | 37.03% | 6.51% |
655
+ | | FLAN | 57.3 | 4.10% | 0.72% |
656
+ | | peS2o | 38.8 | 11.41% | 2.01% |
657
+ | | Wiki | 6.17 | 2.66% | 0.47% |
658
+ | | StackExchange | 2.48 | 2.12% | 2.00% |
659
+ | | Math | 21 | 11.63% | 29.10% |
660
+ | Nemotron | [Nemotron-Pretraining-Code-v1](https://huggingface.co/datasets/nvidia/Nemotron-Pretraining-Code-v1) | 882 | 20.69% | 29.10% |
661
+ | | [Nemotron-CC-Math-v1](https://huggingface.co/datasets/nvidia/Nemotron-CC-Math-v1) | 144 | 3.45% | 19.40% |
662
+ | StarCoder | [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) | 206 | 6.90% | 9.70% |
663
+ | Benchmark training set | [TriviaQA (train)](https://huggingface.co/datasets/mandarjoshi/trivia_qa/tree/main/rc) <br> [OBQA (train)](https://huggingface.co/datasets/allenai/openbookqa/blob/main/main/train-00000-of-00001.parquet) <br> [NaturalQuestions (train)](https://github.com/google-research-datasets/natural-questions/blob/master/nq_open/NQ-open.train.jsonl) <br> [PIQA (train)](https://github.com/ybisk/ybisk.github.io/blob/master/piqa/data/train.jsonl) <br> [GSM8K (train)](https://huggingface.co/datasets/openai/gsm8k/blob/main/main/train-00000-of-00001.parquet) <br> [BoolQ (train)](https://huggingface.co/datasets/google/boolq/blob/main/data/train-00000-of-00001.parquet) <br> [ARC-Easy (train)](https://huggingface.co/datasets/allenai/ai2_arc/blob/main/ARC-Easy/train-00000-of-00001.parquet) <br> [ARC-Challenge (train)](https://huggingface.co/datasets/allenai/ai2_arc/blob/main/ARC-Challenge/train-00000-of-00001.parquet) | ~0.01 | -- | 0.97% |
664
+ | Total | | | 100.00% | 100.00% |
665
+
666
+ ### Post-training
667
+ | Phase | Dataset | Rows |
668
+ | --- | --- | --- |
669
+ | General SFT | [Tulu-3-sft-olmo-2-mixture-0225](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225) | 866K samples |
670
+ | Reasoning SFT | [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) | 3.2M samples |
671
+ | | [OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2) | 803K samples |
672
+ | | [OpenCodeReasoning-2](https://huggingface.co/datasets/nvidia/OpenCodeReasoning-2) | 2.16M samples |
673
+
674
+
675
+ # Citation
676
+
677
+ If you find our model useful for your research, please consider citing:
678
+
679
+ ```bibtex
+ @misc{mobilellm_r1_2025,
680
+ title={MobileLLM-R1: Model Card},
681
+ author={Zechun Liu*, Ernie Chang*, Changsheng Zhao*, Chia-Jung Chang, Wei Wen, Chen Lai, Rick Cao, Yuandong Tian, Raghuraman Krishnamoorthi, Yangyang Shi, Vikas Chandra},
682
+ year={2025},
683
+ url = {https://huggingface.co/mobilellm-r1}
684
+ }
+ ```
685
+
686
+ # Contact
687
+ Zechun Liu, Meta Inc (zechunliu at meta dot com)
688
+
689
+ Ernie Chang, Meta Inc (erniecyc at meta dot com)
690
+
691
+ Changsheng Zhao, Meta Inc (cszhao at meta dot com)
692
+
693
+ # License
694
+
695
+ MobileLLM-R1 is currently licensed under the FAIR Noncommercial Research License (see LICENSE above).
check_shape.py ADDED
@@ -0,0 +1,80 @@
1
+ import argparse
2
+ import json
3
+ from pathlib import Path
4
+ from safetensors import safe_open
5
+
6
+
7
+ def check_model_shape(model_path: str):
8
+ """Inspects a model's config and weights to determine its MLP structure."""
9
+ model_path = Path(model_path)
10
+ config_path = model_path / "config.json"
11
+ weights_path = model_path / "model.safetensors"
12
+
13
+ if not config_path.exists():
14
+ print(f"Error: config.json not found in {model_path}")
15
+ return
16
+
17
+ if not weights_path.exists():
18
+ print(f"Error: model.safetensors not found in {model_path}")
19
+ return
20
+
21
+ print(f"--- Checking model shape in {model_path} ---")
22
+
23
+ # 1. Inspect config.json
24
+ with open(config_path, "r") as f:
25
+ config = json.load(f)
26
+
27
+ has_dual_mlp_config = config.get("intermediate_size_mlp", 0) > 0
28
+ print(f"Config has 'intermediate_size_mlp': {has_dual_mlp_config}")
29
+
30
+ # 2. Inspect weight keys from model.safetensors
31
+ has_dual_mlp_weights = False
32
+ try:
33
+ with safe_open(weights_path, framework="mlx") as f:
34
+ weight_keys = f.keys()
35
+ # A simple heuristic: check for weight keys that are not part of the standard SwiGLU MLP.
36
+ # This is not foolproof as names can vary, but it's a good indicator.
37
+ for key in weight_keys:
38
+ if (
39
+ "mlp" in key
40
+ and "gate_proj" not in key
41
+ and "up_proj" not in key
42
+ and "down_proj" not in key
43
+ ):
44
+ print(f"Found potential dual-branch weight: {key}")
45
+ has_dual_mlp_weights = True
46
+ break
47
+ except Exception as e:
48
+ print(f"Could not read weights from model.safetensors: {e}")
49
+ return
50
+
51
+ print(f"Found potential dual-branch MLP weights: {has_dual_mlp_weights}")
52
+
53
+ # 3. Report conclusion
54
+ print("\n--- Conclusion ---")
55
+ if has_dual_mlp_config and has_dual_mlp_weights:
56
+ print("✅ The model appears to be a DUAL-BRANCH MLP variant.")
57
+ elif has_dual_mlp_config and not has_dual_mlp_weights:
58
+ print(
59
+ "⚠️ The model configuration suggests a dual-branch MLP, but no corresponding weights were found."
60
+ )
61
+ print(" It will likely run as a SINGLE-BRANCH model.")
62
+ else:
63
+ print("✅ The model appears to be a SINGLE-BRANCH MLP variant.")
64
+ print("--------------------\n")
65
+
66
+
67
+ if __name__ == "__main__":
68
+ parser = argparse.ArgumentParser(
69
+ description="Check the MLP shape of a model variant."
70
+ )
71
+ parser.add_argument(
72
+ "model_path",
73
+ type=str,
74
+ nargs="?",
75
+ default=".",
76
+ help="Path to the model directory to check.",
77
+ )
78
+ args = parser.parse_args()
79
+
80
+ check_model_shape(args.model_path)
conversion.log ADDED
@@ -0,0 +1,8 @@
1
+ uv run python custom_mlx_lm/custom_convert.py --hf-path . --mlx-path MobileLLM-R1-950M-mlx/ --report-ppl
2
+ Loading model from ....
3
+ Loading calibration data...
4
+ Token indices sequence length is longer than the specified maximum sequence length for this model (110205 > 32768). Running this sequence through the model will result in indexing errors
5
+ Calculating perplexity of original model...
6
+ Original PPL: 50.262
7
+
8
+ ✅ Model saved to MobileLLM-R1-950M-mlx/
custom_mlx_lm/README.md ADDED
@@ -0,0 +1,48 @@
1
+ Custom MLX-LM Conversion, Quantization, and Inference
2
+
3
+ Overview
4
+ - Scripts here convert the HF safetensors model to MLX format, optionally apply mixed-precision dynamic quantization, and run inference with prompt formatting consistent with inference.py.
5
+ - Quant layout is persisted in config.json so the loader can re-materialize only Linear layers as QuantizedLinear while keeping embeddings and norms in float.
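+
+ The persisted metadata has the following shape (as written by the conversion script; the layer paths and bit choices shown here are illustrative):
+
+ ```python
+ config["quantization"] = {
+     "group_size": 64,
+     "method": "mixed_precision_dynamic",
+     "per_layer_bits": {
+         "layers.0.attention.q_proj": 8,        # sensitive layer kept at 8-bit
+         "layers.0.feed_forward.down_proj": 4,  # less sensitive layer at 4-bit
+     },
+ }
+ ```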
6
+
7
+ Key scripts
8
+ - custom_convert_2.py
9
+ - Convert and optionally quantize.
10
+ - Mixed precision uses calibration data and a sensitivity-driven split between 4-bit and 8-bit Linear layers.
11
+ - Saves weights to weights.npz and writes quantization metadata to config.json.
12
+ - custom_loader.py
13
+ - Loads the model with the correct module types (QuantizedLinear vs float) based on config metadata, then applies saved weights.
14
+ - Leaves embeddings and layernorms in float.
15
+ - inference_mlx_lm.py (CLI: mobilellm-infer)
16
+ - Runs generation. Uses chat_template.jinja when present, else prepends BOS, matching inference.py behavior.
17
+ - quant_summary.py
18
+ - Prints a summary of per-layer bit-widths and checks quantized tensors exist in weights.npz.
19
+
20
+ Quickstart
21
+ - Mixed-precision dynamic quantization
22
+ - uv run python custom_mlx_lm/custom_convert_2.py --hf-path . --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx --dynamic-quant --target-bpw 4.5 --report-ppl
23
+ - Group size defaults to 64 when not provided.
24
+ - Uniform quantization
25
+ - uv run python custom_mlx_lm/custom_convert_2.py --hf-path . --mlx-path MobileLLM-R1-950M-4bit-mlx --quantize --bits 4 --report-ppl
26
+ - Summarize quant layout
27
+ - uv run python custom_mlx_lm/quant_summary.py --model-path MobileLLM-R1-950M-mixed-4bit-mlx --show 8
28
+ - Inference
29
+ - mobilellm-infer --model-path MobileLLM-R1-950M-mixed-4bit-mlx --prompt "What is the nearest prime to 9^2?"
30
+
31
+ Notes and defaults
32
+ - Calibration: load_data uses WikiText-like data; dynamic quant computes sensitivities once and chooses 4/8-bit per Linear layer to target the requested bits-per-weight. Reported PPL is from the same set.
33
+ - Group size: defaults to 64 when quantizing if not provided.
34
+ - Prompt formatting: by default uses chat_template.jinja if present; otherwise prepends BOS for stable behavior across float and quant models.
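+
+ For reference, the bits-per-weight figure targeted by the dynamic search is measured exactly as in custom_convert.py:
+
+ ```python
+ from mlx.utils import tree_flatten
+
+ # assumes `model` is a loaded (possibly quantized) MLX model
+ params = dict(tree_flatten(model.parameters()))
+ bpw = sum(p.nbytes for p in params.values()) * 8 / sum(p.size for p in params.values())
+ ```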
35
+
36
+ Troubleshooting
37
+ - Empty sensitivities (ValueError: min() arg is empty)
38
+ - Fixed: ensure Linear weights are not frozen during sensitivity estimation; grads must exist.
39
+ - Unable to quantize model of type QuantizedLinear
40
+ - Fixed: second quantization pass now targets only remaining float Linear layers.
41
+ - [dequantize] The matrix should be given as a uint32
42
+ - Fixed: loader does not blanket-quantize; it re-materializes only Linear layers from per-layer bits map before loading weights, leaving embeddings in float.
43
+
44
+ Rationale and behavior
45
+ - Persist per-layer bits: enables deterministic, loader-driven reconstruction of quant modules and prevents accidental quantization of unsupported modules.
46
+ - Keep embeddings float: avoids dtype mismatch and preserves quality.
47
+ - Match inference.py formatting: improves output consistency between float and quant variants.
48
+
custom_mlx_lm/__init__.py ADDED
File without changes
custom_mlx_lm/custom_convert.py ADDED
@@ -0,0 +1,305 @@
1
+ import argparse
2
+ import copy
3
+ import json
4
+ import os
5
+ import sys
6
+ from pathlib import Path
7
+
8
+ import mlx.core as mx
9
+ import mlx.nn as nn
10
+ from mlx.utils import tree_flatten, tree_map, tree_unflatten
11
+ from mlx_lm.quant.dynamic_quant import eval_ppl
12
+ from mlx_lm.quant.utils import load_data
13
+ from safetensors import safe_open
14
+ from tqdm import tqdm
15
+ from transformers import AutoTokenizer
16
+
17
+ # FIX: Correctly calculate the project root to find model.py
18
+ project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
+ if project_root not in sys.path:
20
+ sys.path.insert(0, project_root)
21
+
22
+ from model import Model, ModelArgs
23
+
24
+
25
+ def estimate_sensitivities(
26
+ model, data, low_bits, low_group_size, high_bits, high_group_size, batch_size=4
27
+ ):
28
+ def qdq(w, bits, group_size):
29
+ w, s, b = mx.quantize(w, bits=bits, group_size=group_size)
30
+ return mx.dequantize(w, scales=s, biases=b, bits=bits, group_size=group_size)
31
+
32
+ q_model = copy.deepcopy(model)
33
+ linear_layers = {
34
+ k: layer
35
+ for k, layer in tree_flatten(
36
+ q_model.leaf_modules(), is_leaf=nn.Module.is_module
37
+ )
38
+ if isinstance(layer, nn.Linear)
39
+ }
40
+ # Quantize-dequantize weights for low-precision model copy and ensure
41
+ # the weights remain trainable so gradients are computed for sensitivities.
42
+ for layer in linear_layers.values():
43
+ layer.weight = qdq(layer.weight, low_bits, low_group_size)
44
+
45
+ def loss_fn(batch, targets):
46
+ logits = q_model(batch)
47
+ return nn.losses.cross_entropy(logits, targets, reduction="mean")
48
+
49
+ grad_accum = tree_map(lambda x: mx.zeros(x.shape), q_model.trainable_parameters())
50
+
51
+ for s in tqdm(range(0, len(data), batch_size), desc="Estimating sensitivities"):
52
+ batch = data[s : s + batch_size]
53
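+ # NOTE: these original-model logits are computed but not used by loss_fn below,
+ # which scores the quantized model against the ground-truth next tokens.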
+ targets = model(batch[:, :-1])
54
+ mx.eval(targets)
55
+ _, grads = nn.value_and_grad(q_model, loss_fn)(batch[:, :-1], batch[:, 1:])
56
+ grad_accum = tree_map(lambda x, y: x + y, grad_accum, grads)
57
+ mx.eval(grad_accum)
58
+
59
+ def compute_sensitivity(grad, lq_w, orig_w):
60
+ hq_w = qdq(orig_w, high_bits, high_group_size)
61
+ return (grad * (lq_w - hq_w)).sum()
62
+
63
+ # Use a direct loop instead of tree_map to be more robust
64
+ grad_dict = dict(tree_flatten(grad_accum))
65
+ q_params_dict = dict(tree_flatten(q_model.parameters()))
66
+ orig_params_dict = dict(tree_flatten(model.parameters()))
67
+
68
+ sensitivities = {}
69
+ for path, module in linear_layers.items():
70
+ weight_key = f"{path}.weight"
71
+ if weight_key in grad_dict:
72
+ grad = grad_dict[weight_key]
73
+ q_weight = q_params_dict[weight_key]
74
+ orig_weight = orig_params_dict[weight_key]
75
+
76
+ sensitivity = compute_sensitivity(grad, q_weight, orig_weight)
77
+ sensitivities[path] = sensitivity.item()
78
+
79
+ return sensitivities
80
+
81
+
82
+ def estimate_threshold(
83
+ model,
84
+ sensitivities,
85
+ target_bpw,
86
+ low_bits,
87
+ low_group_size,
88
+ high_bits,
89
+ high_group_size,
90
+ ):
91
+ def predicate(p, m, threshold):
92
+ if not isinstance(m, nn.Linear):
93
+ return False
94
+ return sensitivities.get(p, 0) > threshold
95
+
96
+ sens_vals = list(sensitivities.values())
97
+ if len(sens_vals) == 0:
98
+ raise RuntimeError(
99
+ "No sensitivities were computed. This usually means gradients "
100
+ "for Linear weights were not collected. Ensure layers are detected "
101
+ "and weights are trainable during sensitivity estimation."
102
+ )
103
+ min_thr, max_thr = min(sens_vals), max(sens_vals)
104
+
105
+ while (max_thr - min_thr) > 1e-3 * (max(sens_vals) - min(sens_vals)):
106
+ mid = (max_thr + min_thr) / 2
107
+ q_model = copy.deepcopy(model)
108
+
109
+ def high_predicate(p, m):
110
+ return predicate(p, m, mid)
111
+
112
+ def low_predicate(p, m):
113
+ # Only quantize remaining float nn.Linear layers; avoid re-quantizing
114
+ # modules already quantized in the first pass.
115
+ return isinstance(m, nn.Linear) and (not predicate(p, m, mid))
116
+
117
+ nn.quantize(
118
+ q_model,
119
+ group_size=high_group_size,
120
+ bits=high_bits,
121
+ class_predicate=high_predicate,
122
+ )
123
+ nn.quantize(
124
+ q_model,
125
+ group_size=low_group_size,
126
+ bits=low_bits,
127
+ class_predicate=low_predicate,
128
+ )
129
+
130
+ bpw = (
131
+ sum(p.nbytes for _, p in tree_flatten(q_model.parameters()))
132
+ * 8
133
+ / sum(p.size for _, p in tree_flatten(q_model.parameters()))
134
+ )
135
+
136
+ if bpw > target_bpw:
137
+ min_thr = mid
138
+ else:
139
+ max_thr = mid
140
+ return (max_thr + min_thr) / 2
141
+
142
+
143
+ # --- Main Conversion and Saving Logic ---
144
+ def main():
145
+ parser = argparse.ArgumentParser(
146
+ description="Convert and optionally quantize a model."
147
+ )
148
+ parser.add_argument(
149
+ "--hf-path", type=str, default=".", help="Path to the Hugging Face model."
150
+ )
151
+ parser.add_argument(
152
+ "--mlx-path", type=str, required=True, help="Path to save the MLX model."
153
+ )
154
+ parser.add_argument(
155
+ "--quantize",
156
+ "-q",
157
+ action="store_true",
158
+ help="Generate a simple uniformly quantized model.",
159
+ )
160
+ parser.add_argument(
161
+ "--dynamic-quant",
162
+ action="store_true",
163
+ help="Use advanced mixed-precision quantization.",
164
+ )
165
+ parser.add_argument(
166
+ "--report-ppl",
167
+ action="store_true",
168
+ help="Report perplexity before and after quantization.",
169
+ )
170
+ parser.add_argument(
171
+ "--target-bpw",
172
+ type=float,
173
+ default=4.5,
174
+ help="Target bits per weight for advanced quant.",
175
+ )
176
+ parser.add_argument(
177
+ "--bits", "-b", type=int, default=4, help="Bits for uniform quantization."
178
+ )
179
+ parser.add_argument(
180
+ "--group-size",
181
+ "-g",
182
+ type=int,
183
+ default=None,
184
+ help="Group size for quantization. If omitted, defaults to 64 when quantizing.",
185
+ )
186
+ args = parser.parse_args()
187
+
188
+ print(f"Loading model from {args.hf_path}...")
189
+ hf_path = Path(args.hf_path)
190
+ tokenizer = AutoTokenizer.from_pretrained(args.hf_path)
191
+
192
+ with open(hf_path / "config.json", "r") as f:
193
+ config = json.load(f)
194
+
195
+ with safe_open(hf_path / "model.safetensors", framework="mlx") as f:
196
+ keys = list(f.keys())
197
+ has_dual = any(
198
+ (".feed_forward.g_up.weight" in k) or (".mlp.g_up.weight" in k) for k in keys
199
+ )
200
+ model_args = ModelArgs.from_dict(config)
201
+ model_args.use_dual_mlp = bool(has_dual)
202
+ model = Model(model_args)
203
+
204
+ weights = {}
205
+ with safe_open(hf_path / "model.safetensors", framework="mlx") as f:
206
+ for k in f.keys():
207
+ if has_dual and ("gate_proj" in k or "up_proj" in k or "down_proj" in k):
208
+ continue
209
+ v = f.get_tensor(k)
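+ # Rename HF checkpoint keys to this runtime's MLX module paths.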
210
+ k = k.replace("model.embed_tokens", "tok_embeddings")
211
+ k = k.replace("model.layers", "layers")
212
+ k = k.replace("self_attn", "attention")
213
+ k = k.replace("input_layernorm", "attention_norm")
214
+ k = k.replace("post_attention_layernorm", "ffn_norm")
215
+ k = k.replace("mlp.", "feed_forward.")
216
+ k = k.replace("model.norm", "norm")
217
+ weights[k] = v
218
+ if config.get("tie_word_embeddings", True):
219
+ weights.pop("output.weight", None)
220
+ model.update(tree_unflatten(list(weights.items())))
221
+
222
+ calibration_data = None
223
+ if args.report_ppl or args.dynamic_quant:
224
+ print("Loading calibration data...")
225
+ calibration_data = load_data(tokenizer, num_samples=-1, sequence_length=512)
226
+
227
+ if args.report_ppl:
228
+ print("Calculating perplexity of original model...")
229
+ ppl = eval_ppl(model, data=calibration_data)
230
+ print(f"Original PPL: {ppl:.3f}")
231
+
232
+ if args.dynamic_quant:
233
+ # Choose a sensible default group size if not provided
234
+ if args.group_size is None:
235
+ args.group_size = 64
236
+ print("[info] Using default group_size=64 for dynamic quantization")
237
+ print("Starting advanced mixed-precision quantization...")
238
+ sensitivities = estimate_sensitivities(
239
+ model, calibration_data, 4, args.group_size, 8, args.group_size
240
+ )
241
+
242
+ threshold = estimate_threshold(
243
+ model,
244
+ sensitivities,
245
+ args.target_bpw,
246
+ 4,
247
+ args.group_size,
248
+ 8,
249
+ args.group_size,
250
+ )
251
+
252
+ # Compute per-layer bit widths BEFORE mutating the model
253
+ per_layer_bits = {p: (8 if s > threshold else 4) for p, s in sensitivities.items()}
254
+
255
+ def high_predicate(p, m):
256
+ return isinstance(m, nn.Linear) and per_layer_bits.get(p, 4) == 8
257
+
258
+ def low_predicate(p, m):
259
+ return isinstance(m, nn.Linear) and per_layer_bits.get(p, 4) == 4
260
+
261
+ nn.quantize(
262
+ model, group_size=args.group_size, bits=8, class_predicate=high_predicate
263
+ )
264
+ nn.quantize(
265
+ model, group_size=args.group_size, bits=4, class_predicate=low_predicate
266
+ )
267
+
268
+ # Persist per-layer bit-widths so the loader can re-materialize
269
+ # the correct QuantizedLinear modules on load without touching
270
+ # embeddings or other layers.
271
+ config["quantization"] = {
272
+ "group_size": args.group_size,
273
+ "method": "mixed_precision_dynamic",
274
+ "per_layer_bits": per_layer_bits,
275
+ }
276
+
277
+ elif args.quantize:
278
+ # Choose a sensible default group size if not provided
279
+ if args.group_size is None:
280
+ args.group_size = 64
281
+ print("[info] Using default group_size=64 for uniform quantization")
282
+ print("Starting simple uniform quantization...")
283
+ nn.quantize(model, group_size=args.group_size, bits=args.bits)
284
+ config["quantization"] = {
285
+ "group_size": args.group_size,
286
+ "bits": args.bits,
287
+ "method": "uniform",
288
+ }
289
+
290
+ if args.report_ppl and (args.quantize or args.dynamic_quant):
291
+ print("Calculating perplexity of quantized model...")
292
+ ppl = eval_ppl(model, data=calibration_data)
293
+ print(f"Quantized PPL: {ppl:.3f}")
294
+
295
+ output_path = Path(args.mlx_path)
296
+ output_path.mkdir(parents=True, exist_ok=True)
297
+ mx.savez(str(output_path / "weights.npz"), **dict(tree_flatten(model.parameters())))
298
+ with open(output_path / "config.json", "w") as f:
299
+ json.dump(config, f, indent=4)
300
+ tokenizer.save_pretrained(output_path)
301
+ print(f"\n✅ Model saved to {args.mlx_path}")
302
+
303
+
304
+ if __name__ == "__main__":
305
+ main()
custom_mlx_lm/custom_loader.py ADDED
@@ -0,0 +1,61 @@
+import json
+from pathlib import Path
+import numpy as np
+import mlx.core as mx
+import mlx.nn as nn
+from mlx.utils import tree_unflatten
+
+# We must import from the project root's model.py
+from model import Model, ModelArgs
+
+
+def load_model(model_path: str):
+    model_path = Path(model_path)
+    with open(model_path / "config.json", "r") as f:
+        config = json.load(f)
+
+    # Peek with numpy to inspect keys without materializing MLX arrays yet
+    npz_path = model_path / "weights.npz"
+    npz = np.load(npz_path, allow_pickle=False)
+    keys = list(npz.files)
+    has_dual = any("g_up" in k for k in keys)
+
+    args = ModelArgs.from_dict(config)
+    args.use_dual_mlp = bool(has_dual)
+    model = Model(args)
+
+    # If quantization metadata is present, re-materialize QuantizedLinear modules
+    qcfg = config.get("quantization") or {}
+    method = qcfg.get("method")
+    group_size = qcfg.get("group_size")
+
+    if method == "uniform":
+        bits = int(qcfg.get("bits", 4))
+        nn.quantize(
+            model,
+            group_size=int(group_size) if group_size is not None else 64,
+            bits=bits,
+            class_predicate=lambda p, m: isinstance(m, nn.Linear),
+        )
+    elif method == "mixed_precision_dynamic":
+        per_layer_bits = qcfg.get("per_layer_bits", {})
+
+        # The predicate returns False to skip a layer, or a dict of
+        # quantization parameters to override bits per layer.
+        def predicate(p, m):
+            if not isinstance(m, nn.Linear):
+                return False
+            b = per_layer_bits.get(p)
+            if b is None:
+                return False
+            return {"bits": int(b), "group_size": int(group_size)}
+
+        nn.quantize(
+            model,
+            group_size=int(group_size) if group_size is not None else 64,
+            bits=4,
+            class_predicate=predicate,
+        )
+
+    # Now load the actual weights into MLX and update
+    weights = mx.load(str(npz_path))
+    model.update(tree_unflatten(list(weights.items())))
+    return model
custom_mlx_lm/inference_mlx_lm.py ADDED
@@ -0,0 +1,131 @@
+# custom_mlx_lm/inference_mlx_lm.py
+import argparse
+import os
+import sys
+import time
+
+import mlx.core as mx
+from transformers import AutoTokenizer
+from pathlib import Path
+
+project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+if project_root not in sys.path:
+    sys.path.insert(0, project_root)
+
+# Use the robust universal loader
+from custom_mlx_lm.custom_loader import load_model
+
+
+def generate_text(
+    prompt: str,
+    model_path: str,
+    max_tokens: int = 100,
+    temperature: float = 0.1,
+    top_p: float = 0.9,
+    # Additional sampling parameters from inference.py can be added here if needed
+):
+    """
+    Generates text using the loaded MLX model with the robust custom sampler.
+    The logic is adapted from the standalone inference.py script.
+    """
+    print("Loading model and tokenizer using custom loader...")
+    model = load_model(model_path)
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+    # Align prompt handling with inference.py: prefer chat template, else prepend BOS
+    chat_template_path = Path(model_path) / "chat_template.jinja"
+    use_chat_format = chat_template_path.exists()
+
+    if use_chat_format:
+        messages = [{"role": "user", "content": prompt}]
+        formatted_prompt = tokenizer.apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True
+        )
+    else:
+        bos = tokenizer.bos_token or ""
+        formatted_prompt = f"{bos}{prompt}"
+
+    print("Starting generation...")
+    prompt_tokens = tokenizer.encode(formatted_prompt, add_special_tokens=False)
+    prompt_tokens = mx.array([prompt_tokens])
+
+    start_time = time.time()
+    generated_tokens = []
+    for i in range(max_tokens):
+        logits = model(prompt_tokens)
+        next_token_logits = logits[0, -1, :]
+
+        if temperature == 0:
+            next_token = int(mx.argmax(next_token_logits).item())
+        else:
+            scaled_logits = next_token_logits / temperature
+            if 0.0 < top_p < 1.0:
+                probs = mx.softmax(scaled_logits, axis=-1)
+                sorted_probs = mx.sort(probs)[::-1]
+                cumulative_probs = mx.cumsum(sorted_probs, axis=-1)
+                cutoff_index = mx.sum(cumulative_probs < top_p)
+                cutoff_prob = sorted_probs[cutoff_index.item()]
+                mask = probs >= cutoff_prob
+                scaled_logits = mx.where(mask, scaled_logits, float("-inf"))
+            next_token = mx.random.categorical(scaled_logits, num_samples=1).item()
+
+        eos_ids = tokenizer.eos_token_id
+        stop_ids = (
+            {int(i) for i in eos_ids} if isinstance(eos_ids, list) else {int(eos_ids)}
+        )
+        if next_token in stop_ids:
+            break
+
+        generated_tokens.append(next_token)
+        prompt_tokens = mx.concatenate(
+            [prompt_tokens, mx.array([[next_token]])], axis=1
+        )
+
+    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+    print("\n--- Response ---")
+    print(response)
+    print("------------------")
+    generation_speed = (
+        len(generated_tokens) / (time.time() - start_time) if generated_tokens else 0
+    )
+    print(
+        f"Generated {len(generated_tokens)} tokens at {generation_speed:.2f} tokens/sec"
+    )
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Run inference on converted MLX models."
+    )
+    parser.add_argument(
+        "--model-path",
+        type=str,
+        required=True,
+        help="Path to the converted MLX model directory.",
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default="What is the capital of France?",
+        help="The prompt.",
+    )
+    parser.add_argument(
+        "--max-tokens", type=int, default=100, help="Max tokens to generate."
+    )
+    parser.add_argument(
+        "--temperature", type=float, default=0.1, help="Sampling temperature."
+    )
+    parser.add_argument("--top-p", type=float, default=0.9, help="Top-p sampling.")
+    args = parser.parse_args()
+
+    generate_text(
+        args.prompt,
+        args.model_path,
+        args.max_tokens,
+        args.temperature,
+        args.top_p,
+    )
+
+
+if __name__ == "__main__":
+    main()
custom_mlx_lm/quant_summary.py ADDED
@@ -0,0 +1,67 @@
+import argparse
+import json
+from pathlib import Path
+import numpy as np
+
+
+def main():
+    p = argparse.ArgumentParser(description="Summarize MLX-LM quantization layout")
+    p.add_argument("--model-path", required=True, help="Path to converted MLX model")
+    p.add_argument("--show", type=int, default=10, help="Show up to N entries per group")
+    args = p.parse_args()
+
+    mpath = Path(args.model_path)
+    cfg = json.loads((mpath / "config.json").read_text())
+    q = cfg.get("quantization") or {}
+    method = q.get("method", "none")
+    gsize = q.get("group_size")
+    plb = q.get("per_layer_bits", {})
+
+    print(f"Method: {method}")
+    print(f"Group size: {gsize}")
+    if method == "uniform":
+        print(f"Uniform bits: {q.get('bits')}")
+        return
+
+    if not plb:
+        print("No per-layer bits found in config.")
+        return
+
+    # Basic counts
+    buckets = {4: [], 8: [], "other": []}
+    for k, b in plb.items():
+        if b == 4:
+            buckets[4].append(k)
+        elif b == 8:
+            buckets[8].append(k)
+        else:
+            buckets["other"].append(k)
+
+    total = sum(len(v) for v in buckets.values())
+    print(f"Total linear layers: {total}")
+    print(f"4-bit layers: {len(buckets[4])}")
+    print(f"8-bit layers: {len(buckets[8])}")
+    if buckets["other"]:
+        print(f"Other-bit layers: {len(buckets['other'])}")
+
+    # Optional: show a few examples
+    for b in (8, 4):
+        items = sorted(buckets[b])
+        if not items:
+            continue
+        print(f"\nExamples ({b}-bit):")
+        for k in items[: args.show]:
+            print(f"- {k}")
+
+    # Optional: sanity-check against npz contents
+    try:
+        npz = np.load(mpath / "weights.npz", allow_pickle=False)
+        has_q = any(k.endswith(".scales") or k.endswith(".biases") for k in npz.files)
+        print(f"\nweights.npz contains quantized tensors: {has_q}")
+    except Exception as e:
+        print(f"Note: could not open weights.npz: {e}")
+
+
+if __name__ == "__main__":
+    main()
+
inference.py ADDED
@@ -0,0 +1,270 @@
+import argparse
+import time
+import mlx.core as mx
+from transformers import AutoTokenizer
+from model import load_model
+from pathlib import Path
+
+
+def generate_text(
+    prompt: str,
+    model_path: str,
+    max_tokens: int = 100,
+    temperature: float = 0.1,
+    top_p: float = 0.9,
+    system: str | None = None,
+    final_only: bool = False,
+    stop_at_boxed: bool = False,
+    extract_boxed: bool = False,
+    disable_chat_template: bool = False,
+    repetition_penalty: float = 1.0,
+    frequency_penalty: float = 0.0,
+):
+    """Generates text using the loaded MLX model with better sampling."""
+    print("Loading model and tokenizer...")
+    model = load_model(model_path)
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+    # Check if we have the chat template
+    chat_template_path = Path(model_path) / "chat_template.jinja"
+    use_chat_format = chat_template_path.exists() and not disable_chat_template
+
+    print(f"Chat template found: {use_chat_format}")
+    print("Starting generation...")
+    print(f"Prompt: {prompt}")
+
+    # Format the prompt if using chat template
+    if use_chat_format:
+        messages = []
+        if system is None and final_only:
+            system = (
+                "You are a helpful assistant. Do not reveal your reasoning. "
+                "Respond with only the final answer enclosed in \\boxed{...}."
+            )
+        if system is not None:
+            messages.append({"role": "system", "content": system})
+        messages.append({"role": "user", "content": prompt})
+        formatted_prompt = tokenizer.apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True
+        )
+        print(f"Formatted prompt: {formatted_prompt}")
+    else:
+        # No chat template: prepend BOS if available in tokenizer
+        bos = tokenizer.bos_token or ""
+        formatted_prompt = f"{bos}{prompt}"
+
+    # Tokenize the prompt
+    prompt_tokens = tokenizer.encode(formatted_prompt, add_special_tokens=False)
+    prompt_tokens = mx.array([prompt_tokens])
+
+    print(f"Prompt tokens shape: {prompt_tokens.shape}")
+    print(
+        f"First few token IDs: {prompt_tokens[0, : min(10, prompt_tokens.shape[1])].tolist()}"
+    )
+
+    # Generation loop with better sampling
+    start_time = time.time()
+    generated_tokens = []
+    freq_counts = {}
+
+    running_text = ""
+    seen_box_start = False
+    for i in range(max_tokens):
+        # Get logits from model
+        logits = model(prompt_tokens)
+
+        # Focus on next-token logits
+        next_token_logits = logits[0, -1, :]
+
+        # Apply repetition and frequency penalties before sampling/argmax
+        if repetition_penalty and repetition_penalty != 1.0 and generated_tokens:
+            # Apply a simple repetition penalty to previously generated tokens
+            # Using HF-like rule: if logit > 0 divide by penalty else multiply by penalty
+            logits_list = next_token_logits.tolist()
+            seen = set(generated_tokens)
+            for tid in seen:
+                val = logits_list[tid]
+                if val > 0:
+                    logits_list[tid] = val / repetition_penalty
+                else:
+                    logits_list[tid] = val * repetition_penalty
+            next_token_logits = mx.array(logits_list)
+
+        if frequency_penalty and frequency_penalty > 0 and generated_tokens:
+            # Subtract a multiple of token frequency from logits
+            counts = {}
+            for t in generated_tokens:
+                counts[t] = counts.get(t, 0) + 1
+            # Build a dense penalty vector once per step
+            vocab_size = next_token_logits.shape[-1]
+            pen = [0.0] * vocab_size
+            for tid, c in counts.items():
+                pen[tid] = frequency_penalty * float(c)
+            next_token_logits = next_token_logits - mx.array(pen)
+
+        # Apply temperature (temperature==0 -> greedy)
+        if temperature == 0:
+            # Greedy decode
+            next_token = int(mx.argmax(next_token_logits).item())
+        else:
+            # Sampling path: scale logits, apply top-p mask in logits space
+            scaled_logits = next_token_logits / temperature
+
+            if 0.0 < top_p < 1.0:
+                probs = mx.softmax(scaled_logits, axis=-1)
+                sorted_probs = mx.sort(probs)[::-1]
+                cumulative_probs = mx.cumsum(sorted_probs, axis=-1)
+                cutoff_index = mx.sum(cumulative_probs < top_p)
+                cutoff_prob = sorted_probs[cutoff_index.item()]
+                mask = probs >= cutoff_prob
+                scaled_logits = mx.where(mask, scaled_logits, float("-inf"))
+
+            # Sample from logits (MLX categorical expects logits)
+            next_token = mx.random.categorical(scaled_logits, num_samples=1).item()
+
+        # Safer stop condition: support multiple EOS ids
+        eos_ids = tokenizer.eos_token_id
+        if isinstance(eos_ids, (list, tuple)):
+            stop_ids = set(int(i) for i in eos_ids)
+        else:
+            stop_ids = {int(eos_ids)}
+        if next_token in stop_ids:
+            print(f"Stopping generation at EOS token: {next_token}")
+            break
+
+        generated_tokens.append(next_token)
+        # Update frequency counts
+        freq_counts[next_token] = freq_counts.get(next_token, 0) + 1
+        # Append the new token for the next iteration
+        prompt_tokens = mx.concatenate(
+            [prompt_tokens, mx.array([[next_token]])], axis=1
+        )
+
+        # Print token as we generate for debugging
+        if i < 10:  # Only print first 10 tokens to avoid spam
+            token_text = tokenizer.decode([next_token])
+            print(f"Token {i}: {next_token} -> '{token_text}'")
+
+        # Optional boxed stopping condition
+        if stop_at_boxed:
+            token_text_full = tokenizer.decode([next_token], skip_special_tokens=False)
+            running_text += token_text_full
+            if not seen_box_start and "\\boxed{" in running_text:
+                seen_box_start = True
+            if seen_box_start and "}" in running_text:
+                print("Stopping generation at boxed answer.")
+                break
+
+    end_time = time.time()
+
+    # Decode and print the result
+    if generated_tokens:
+        response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+        print("\n--- Response ---")
+        print(response)
+    else:
+        print("\n--- No tokens generated ---")
+
+    print("------------------")
+
+    generation_speed = (
+        len(generated_tokens) / (end_time - start_time) if generated_tokens else 0
+    )
+    print(f"Generated {len(generated_tokens)} tokens")
+    print(f"Generation speed: {generation_speed:.2f} tokens/sec")
+
+    # Also print the full generated sequence including special tokens for debugging
+    if generated_tokens:
+        full_response = tokenizer.decode(generated_tokens, skip_special_tokens=False)
+        print(f"\nFull response (with special tokens): '{full_response}'")
+
+    if extract_boxed and generated_tokens:
+        import re
+
+        m = None
+        # Find the last occurrence of \boxed{...}; the pattern needs a single
+        # escaped backslash, since the decoded text contains one backslash.
+        for m in re.finditer(r"\\boxed\{([^}]*)\}", full_response):
+            pass
+        if m:
+            print(f"\nExtracted boxed answer: {m.group(1).strip()}")
+        else:
+            print("\nNo \\boxed{...} segment found to extract.")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Run inference with the MLX model.")
+    parser.add_argument(
+        "--model-path", type=str, default=".", help="Path to the model directory."
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default="What is the capital of France?",
+        help="The prompt to start generation from.",
+    )
+    parser.add_argument(
+        "--max-tokens",
+        type=int,
+        default=100,
+        help="The maximum number of tokens to generate.",
+    )
+    parser.add_argument(
+        "--temperature", type=float, default=0.1, help="Sampling temperature."
+    )
+    parser.add_argument(
+        "--top-p", type=float, default=0.9, help="Top-p (nucleus) sampling parameter."
+    )
+    parser.add_argument(
+        "--system", type=str, default=None, help="Optional system message for chat template."
+    )
+    parser.add_argument(
+        "--final-only",
+        action="store_true",
+        help="Instruct the model to output only the final answer inside \\boxed{...}.",
+    )
+    parser.add_argument(
+        "--stop-at-boxed",
+        action="store_true",
+        help="Stop generation once a closing '}' appears after \\boxed{.",
+    )
+    parser.add_argument(
+        "--extract-boxed",
+        action="store_true",
+        help="Extract and print the content inside the last \\boxed{...} in the response.",
+    )
+    parser.add_argument(
+        "--disable-chat-template",
+        action="store_true",
+        help="Ignore chat_template.jinja and feed the raw prompt (prepended with BOS).",
+    )
+    parser.add_argument(
+        "--repetition-penalty",
+        type=float,
+        default=1.0,
+        help="Penalty (>1.0) to discourage previously generated tokens.",
+    )
+    parser.add_argument(
+        "--frequency-penalty",
+        type=float,
+        default=0.0,
+        help="Subtract alpha * count(token) from logits before sampling.",
+    )
+    args = parser.parse_args()
+
+    generate_text(
+        args.prompt,
+        args.model_path,
+        args.max_tokens,
+        args.temperature,
+        args.top_p,
+        args.system,
+        args.final_only,
+        args.stop_at_boxed,
+        args.extract_boxed,
+        args.disable_chat_template,
+        args.repetition_penalty,
+        args.frequency_penalty,
+    )
+
+
+if __name__ == "__main__":
+    main()
main.py ADDED
@@ -0,0 +1,6 @@
+def main():
+    print("Hello from mobilellm-r1-950m!")
+
+
+if __name__ == "__main__":
+    main()
mlx_technical_summary.md ADDED
@@ -0,0 +1,96 @@
+# Porting **MobileLLM-R1-950M** to MLX and mlx-lm: Architectural Challenges and Solutions
+
+I spent some time pairing with Gemini 2.5 Pro and later OpenAI Codex to drag the brand-new facebook/MobileLLM-R1-950M weights onto Apple Silicon.
+This write-up is the “why it wasn’t copy-paste” story, plus the gotchas that bit us until the model finally spoke clean English and quantized without drama.
+
+### Goal
+
+Enable **facebook/MobileLLM-R1-950M** to run natively on Apple Silicon using MLX, then create quantized versions compatible with the mlx-lm ecosystem.
+
+---
+
+## 1. Why a Direct "Llama-4 Drop-In" Failed
+
+Although the Hugging Face repo presents MobileLLM-R1-950M as a Llama-4-style dense model, its **config and weights don't align cleanly** with a stock Llama block. The deviations aren't quirks of MLX—they reflect this model's specific architecture:
+
+* **MLP ambiguity**
+  The config advertises both `intermediate_size` and `intermediate_size_mlp`, suggesting a dual-branch feed-forward.
+  The actual weights contain only a SwiGLU branch (`gate_proj`, `up_proj`, `down_proj`).
+  → Solution: **auto-detect the MLP variant from weight names** at load time (see the sketch after this list).
+
+* **Grouped-Query Attention (GQA)**
+  `num_attention_heads=24`, `num_key_value_heads=6`.
+  K/V tensors must be **repeated to the full head count** for attention shapes to align correctly.
+
+* **QK-norm and scaling**
+  The config includes `use_qk_norm=True` and `attn_scale=0.1`.
+  We add the **RMSNorm on Q/K** as specified, but drop the extra `0.1` multiplier—applying it in MLX's `scaled_dot_product_attention` collapses logits into gibberish.
+
+* **RoPE gating**
+  The config lists all layers under `no_rope_layers`.
+  Disabling RoPE everywhere would eliminate positional encoding entirely.
+  → Treat "all layers disabled" as a config artifact and **apply RoPE everywhere**.
+
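+A minimal sketch of the detection step, assuming the checkpoint is a single `model.safetensors` (the helper name `detect_dual_mlp` is illustrative; the same key check lives in `model.py`'s `load_model`):
+
+```python
+from safetensors import safe_open
+
+def detect_dual_mlp(checkpoint_path: str) -> bool:
+    """Return True if the checkpoint carries dual-branch MLP weights."""
+    with safe_open(checkpoint_path, framework="mlx") as f:
+        keys = list(f.keys())
+    # Dual-branch checkpoints expose g_up/p_up projections; a plain SwiGLU
+    # checkpoint only has gate_proj/up_proj/down_proj.
+    return any(".g_up.weight" in k or ".p_up.weight" in k for k in keys)
+```
+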
+---
+
+## 2. Prompt-Level Deviations
+
+Even after the weights loaded correctly, default inference was disrupted by tokenizer settings (the combined fix is sketched after this list):
+
+* **Chat template**
+  Default system prompt: *"Please reason step-by-step and put your final answer within \boxed{}."*
+  Without overrides, the model produces verbose "reasoning" outputs.
+  → Added CLI controls: `--system`, `--disable-chat-template`, `--final-only`.
+
+* **Double BOS**
+  Both the tokenizer and the template inserted BOS tokens.
+  → Fixed with `add_special_tokens=False`.
+
+* **Premature EOS**
+  Template headers (`<|eot_id|>`) were treated as stop tokens.
+  → Limited the stopping criteria to the true EOS token only.
+
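+In miniature, the prompt handling that honors a shipped template while avoiding the double BOS (mirroring `inference.py`; the `build_prompt` helper is illustrative):
+
+```python
+from pathlib import Path
+
+def build_prompt(tokenizer, model_path: str, prompt: str) -> list[int]:
+    # Prefer the chat template when the repo ships one; otherwise prepend BOS.
+    if (Path(model_path) / "chat_template.jinja").exists():
+        text = tokenizer.apply_chat_template(
+            [{"role": "user", "content": prompt}],
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+    else:
+        text = f"{tokenizer.bos_token or ''}{prompt}"
+    # add_special_tokens=False keeps encode() from stacking a second BOS
+    # on top of the one the template (or the f-string above) already added.
+    return tokenizer.encode(text, add_special_tokens=False)
+```
+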
+---
+
+## 3. Sampling Stability
+
+Sampling issues stemmed from API mismatches rather than model problems (the fixed sampler is sketched below):
+
+* **Top-p on probabilities** followed by feeding probabilities to `mx.random.categorical` produced repetition loops.
+* **Solution:** Apply penalties → scale logits → top-p mask (with `float('-inf')`) → `categorical(logits)`.
+* Added controls for **temperature, repetition penalty, and frequency penalty**.
+
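+A sketch of the working order of operations (matching the sampler in `inference.py`, minus the penalty terms):
+
+```python
+import mlx.core as mx
+
+def sample_next(logits: mx.array, temperature: float, top_p: float) -> int:
+    if temperature == 0:
+        return int(mx.argmax(logits).item())  # greedy decode
+    scaled = logits / temperature
+    if 0.0 < top_p < 1.0:
+        probs = mx.softmax(scaled, axis=-1)
+        sorted_probs = mx.sort(probs)[::-1]  # descending
+        cumulative = mx.cumsum(sorted_probs, axis=-1)
+        cutoff = sorted_probs[mx.sum(cumulative < top_p).item()]
+        # Mask in logits space; mx.random.categorical expects logits.
+        scaled = mx.where(probs >= cutoff, scaled, float("-inf"))
+    return int(mx.random.categorical(scaled, num_samples=1).item())
+```
+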
+---
+
+## 4. Quantization in mlx-lm: Why Custom Metadata Was Required
+
+mlx-lm provides quantization hooks, but MobileLLM's architecture exposed several challenges:
+
+1. **Frozen gradients during sensitivity analysis** → empty sensitivity lists.
+   → Avoid freezing weights during gradient computation.
+
+2. **Re-quantizing quantized layers** → type errors on the second pass.
+   → Skip layers that are already `QuantizedLinear`.
+
+3. **Embedding/norm dtype crashes**
+   Standard quantization re-quantized everything, but embeddings must remain float.
+   → Introduced a **metadata-driven approach**: config.json records *per-layer bit-widths*. Only the specified layers are instantiated as `QuantizedLinear`.
+
+This metadata contract allows the **4-bit mixed-precision MobileLLM** to be loaded cleanly by our **metadata-aware `custom_loader.py`**, making it compatible with the mlx-lm ecosystem.
+
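+Concretely, the contract is a small block the converter persists alongside the weights (a sketch; the per-layer values shown match the run recorded in `quantization.log`):
+
+```python
+# Written by custom_convert.py into the saved model's config.json
+config["quantization"] = {
+    "method": "mixed_precision_dynamic",
+    "group_size": 64,
+    "per_layer_bits": {
+        "layers.0.attention.o_proj": 8,  # most sensitive layer stays at 8-bit
+        "layers.0.attention.q_proj": 4,
+        # ... one entry per nn.Linear in the model
+    },
+}
+```
+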
+---
+
+## 5. End State
+
+* **MLX path:**
+  Structural fixes (GQA, MLP detection), numerical fixes (QK-norm, RoPE, attn_scale), and prompt controls together yield fluent, stable inference.
+
+* **mlx-lm path:**
+  The custom quantization pipeline produces FP16 and 4-bit models. These can be loaded with our **metadata-aware `custom_loader.py`** and used for inference with the provided scripts.
+  Performance: a measurable speedup and reduced memory use on Apple Silicon, with minimal quality degradation.
+
+---
+
+### Takeaway
+
+The MobileLLM-R1-950M port required systematically addressing architectural mismatches (MLP variant detection, GQA handling, QK-norm implementation, RoPE configuration) and developing a metadata-driven quantization approach. Once these were resolved, the model became fully functional in MLX with both float and quantized inference paths.
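+
+For reference, the end-to-end flow (the conversion command is taken verbatim from `quantization.log`; the inference invocation mirrors the CLI of `inference_mlx_lm.py`):
+
+```bash
+# Convert + mixed-precision quantize, reporting perplexity before/after
+uv run python custom_mlx_lm/custom_convert.py --hf-path . \
+    --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx \
+    --dynamic-quant --target-bpw 4.5 --group-size 64 --report-ppl
+
+# Run inference against the converted model
+uv run python custom_mlx_lm/inference_mlx_lm.py \
+    --model-path MobileLLM-R1-950M-mixed-4bit-mlx \
+    --prompt "What is the capital of France?"
+```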
model.py ADDED
@@ -0,0 +1,339 @@
+import mlx.core as mx
+import mlx.nn as nn
+import json
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass
+class ModelArgs:
+    hidden_size: int
+    num_attention_heads: int
+    num_hidden_layers: int
+    vocab_size: int
+    intermediate_size: int
+    intermediate_size_mlp: int | None = None
+    num_key_value_heads: int = 0
+    rms_norm_eps: float = 1e-5
+    rope_theta: float = 10000.0
+    head_dim: int | None = None
+    use_dual_mlp: bool = False
+    tie_word_embeddings: bool = True
+    use_qk_norm: bool = False
+    attn_scale: float = 1.0
+    no_rope_layers: list | None = None
+    attention_chunk_size: int | None = None
+    attn_temperature_tuning: bool = False
+
+    @classmethod
+    def from_dict(cls, params):
+        return cls(
+            hidden_size=params["hidden_size"],
+            num_attention_heads=params["num_attention_heads"],
+            num_hidden_layers=params["num_hidden_layers"],
+            vocab_size=params["vocab_size"],
+            intermediate_size=params["intermediate_size"],
+            intermediate_size_mlp=params.get("intermediate_size_mlp"),
+            num_key_value_heads=params.get("num_key_value_heads", 0),
+            rms_norm_eps=params.get("rms_norm_eps", 1e-5),
+            rope_theta=params.get("rope_theta", 10000.0),
+            head_dim=params.get("head_dim"),
+            # Default: off. We'll detect from weights in load_model.
+            use_dual_mlp=False,
+            tie_word_embeddings=params.get("tie_word_embeddings", True),
+            use_qk_norm=params.get("use_qk_norm", False),
+            attn_scale=params.get("attn_scale", 1.0),
+            no_rope_layers=params.get("no_rope_layers"),
+            attention_chunk_size=params.get("attention_chunk_size"),
+            attn_temperature_tuning=params.get("attn_temperature_tuning", False),
+        )
+
+
+class RMSNorm(nn.Module):
+    def __init__(self, dims: int, eps: float = 1e-5):
+        super().__init__()
+        self.weight = mx.ones((dims,))
+        self.eps = eps
+
+    def _norm(self, x):
+        return x * mx.rsqrt(x.square().mean(-1, keepdims=True) + self.eps)
+
+    def __call__(self, x):
+        output = self._norm(x.astype(mx.float32)).astype(x.dtype)
+        return self.weight * output
+
+
+class Attention(nn.Module):
+    def __init__(self, args: ModelArgs):
+        super().__init__()
+        self.args = args
+        self.n_heads = args.num_attention_heads
+        self.n_kv_heads = (
+            args.num_key_value_heads
+            if args.num_key_value_heads > 0
+            else args.num_attention_heads
+        )
+        self.head_dim = (
+            args.head_dim
+            if getattr(args, "head_dim", None) is not None
+            else (args.hidden_size // self.n_heads)
+        )
+        # Use standard LLaMA scaling. The attn_scale field in some configs
+        # does not correspond to SDPA scaling and degrades outputs if applied here.
+        self.scale = self.head_dim**-0.5
+
+        self.q_proj = nn.Linear(
+            args.hidden_size, self.n_heads * self.head_dim, bias=False
+        )
+        self.k_proj = nn.Linear(
+            args.hidden_size, self.n_kv_heads * self.head_dim, bias=False
+        )
+        self.v_proj = nn.Linear(
+            args.hidden_size, self.n_kv_heads * self.head_dim, bias=False
+        )
+        self.o_proj = nn.Linear(
+            self.n_heads * self.head_dim, args.hidden_size, bias=False
+        )
+        self.q_norm = (
+            RMSNorm(self.head_dim, eps=args.rms_norm_eps)
+            if getattr(args, "use_qk_norm", False)
+            else None
+        )
+        self.k_norm = (
+            RMSNorm(self.head_dim, eps=args.rms_norm_eps)
+            if getattr(args, "use_qk_norm", False)
+            else None
+        )
+        # Llama 4 text models commonly use traditional RoPE application
+        self.rope = nn.RoPE(self.head_dim, traditional=True, base=args.rope_theta)
+
+    def __call__(
+        self,
+        x,
+        mask=None,
+        cache=None,
+        apply_rope: bool = True,
+        attn_temp: float | None = None,
+    ):
+        B, L, D = x.shape
+        queries, keys, values = self.q_proj(x), self.k_proj(x), self.v_proj(x)
+
+        queries = queries.reshape(B, L, self.n_heads, -1).transpose(0, 2, 1, 3)
+        keys = keys.reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
+        values = values.reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
+
+        if self.q_norm is not None:
+            queries = self.q_norm(queries)
+            keys = self.k_norm(keys)
+
+        # Optionally apply RoPE depending on per-layer setting
+        if apply_rope:
+            if cache is not None:
+                queries = self.rope(queries, offset=cache.offset)
+                keys = self.rope(keys, offset=cache.offset)
+                keys, values = cache.update_and_fetch(keys, values)
+            else:
+                queries = self.rope(queries)
+                keys = self.rope(keys)
+        else:
+            if cache is not None:
+                keys, values = cache.update_and_fetch(keys, values)
+
+        if self.n_kv_heads != self.n_heads:
+            repeat = self.n_heads // self.n_kv_heads
+            keys = mx.repeat(keys, repeat, axis=1)
+            values = mx.repeat(values, repeat, axis=1)
+
+        # Optional attention temperature tuning (scale the softmax input)
+        scale = self.scale if attn_temp is None else (self.scale * attn_temp)
+        output = mx.fast.scaled_dot_product_attention(
+            queries, keys, values, scale=scale, mask=mask
+        )
+        output = output.transpose(0, 2, 1, 3).reshape(B, L, -1)
+        return self.o_proj(output)
+
+
+class SwiGLUMLP(nn.Module):
+    """Standard LLaMA-style gated MLP (SwiGLU)."""
+
+    def __init__(self, dim, intermediate_size, activation=nn.silu):
+        super().__init__()
+        self.gate_proj = nn.Linear(dim, intermediate_size, bias=False)
+        self.up_proj = nn.Linear(dim, intermediate_size, bias=False)
+        self.down_proj = nn.Linear(intermediate_size, dim, bias=False)
+
+    def __call__(self, x):
+        return self.down_proj(nn.silu(self.gate_proj(x)) * self.up_proj(x))
+
+
+class DualMLP(nn.Module):
+    """Dense dual-branch MLP: gated + plain."""
+
+    def __init__(self, dim, intermediate_gated, intermediate_plain, activation=nn.silu):
+        super().__init__()
+        self.g_up = nn.Linear(dim, intermediate_gated, bias=False)
+        self.g_gate = nn.Linear(dim, intermediate_gated, bias=False)
+        self.g_down = nn.Linear(intermediate_gated, dim, bias=False)
+
+        self.p_up = nn.Linear(dim, intermediate_plain, bias=False)
+        self.p_down = nn.Linear(intermediate_plain, dim, bias=False)
+
+    def __call__(self, x):
+        gated_out = self.g_down(nn.silu(self.g_gate(x)) * self.g_up(x))
+        plain_out = self.p_down(nn.silu(self.p_up(x)))
+        return gated_out + plain_out
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, args: ModelArgs, layer_idx: int):
+        super().__init__()
+        self.attention = Attention(args)
+        self.layer_idx = layer_idx
+        # RoPE gating per layer.
+        # If the config provides a per-layer no_rope mask:
+        #   - If it disables ALL layers, ignore it (apply RoPE everywhere)
+        #   - Otherwise, honor the per-layer flag.
+        if (
+            isinstance(args.no_rope_layers, list)
+            and len(args.no_rope_layers) > layer_idx
+        ):
+            all_marked = all(bool(v) for v in args.no_rope_layers)
+            if all_marked:
+                disable_rope = False
+            else:
+                disable_rope = bool(args.no_rope_layers[layer_idx])
+        else:
+            disable_rope = False
+        self.apply_rope = not disable_rope
+
+        if args.use_dual_mlp and args.intermediate_size_mlp:
+            self.feed_forward = DualMLP(
+                args.hidden_size,
+                args.intermediate_size,
+                args.intermediate_size_mlp,
+            )
+        else:
+            self.feed_forward = SwiGLUMLP(
+                args.hidden_size,
+                args.intermediate_size_mlp,
+            )
+
+        self.attention_norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+        self.ffn_norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+
+    def __call__(self, x, mask=None, cache=None):
+        L = x.shape[1]
+        # Use standard causal mask; iRoPE chunking is not applied for now
+        attn_mask = (
+            None
+            if L <= 1
+            else nn.MultiHeadAttention.create_additive_causal_mask(L).astype(x.dtype)
+        )
+        args = self.attention.args
+        apply_rope = self.apply_rope
+        attn_temp = 1.0 if getattr(args, "attn_temperature_tuning", False) else None
+
+        r = self.attention(
+            self.attention_norm(x),
+            attn_mask,
+            cache,
+            apply_rope=apply_rope,
+            attn_temp=attn_temp,
+        )
+        h = x + r
+        r = self.feed_forward(self.ffn_norm(h))
+        return h + r
+
+
+class Model(nn.Module):
+    def __init__(self, args: ModelArgs):
+        super().__init__()
+        self.args = args
+        self.vocab_size = args.vocab_size
+        self.tok_embeddings = nn.Embedding(args.vocab_size, args.hidden_size)
+        # Plain Python list is fine in MLX
+        self.layers = [
+            TransformerBlock(args=args, layer_idx=i)
+            for i in range(args.num_hidden_layers)
+        ]
+        self.norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+
+        if not self.args.tie_word_embeddings:
+            self.output = nn.Linear(args.hidden_size, args.vocab_size, bias=False)
+
+    def __call__(self, inputs, cache=None):
+        h = self.tok_embeddings(inputs)
+
+        if cache is None:
+            cache = [None] * len(self.layers)
+
+        for layer, c in zip(self.layers, cache):
+            h = layer(h, None, c)
+
+        h = self.norm(h)
+
+        if self.args.tie_word_embeddings:
+            return h @ self.tok_embeddings.weight.T
+        else:
+            return self.output(h)
+
+
+def load_model(model_path: str):
+    model_path = Path(model_path)
+    with open(model_path / "config.json", "r") as f:
+        config = json.load(f)
+
+    from safetensors import safe_open
+    from mlx.utils import tree_unflatten
+
+    # Peek at weights to decide MLP variant
+    with safe_open(model_path / "model.safetensors", framework="mlx") as f:
+        keys = list(f.keys())
+    has_dual = any(
+        (".feed_forward.g_up.weight" in k)
+        or (".mlp.g_up.weight" in k)
+        or (".feed_forward.p_up.weight" in k)
+        or (".mlp.p_up.weight" in k)
+        for k in keys
+    )
+
+    args = ModelArgs.from_dict(config)
+    args.use_dual_mlp = bool(has_dual)
+    model = Model(args)
+
+    weights = {}
+    with safe_open(model_path / "model.safetensors", framework="mlx") as f:
+        for k in f.keys():
+            v = f.get_tensor(k)
+            # The keys in the safetensors file are from the Hugging Face model.
+            # We need to map them to the names in our MLX model. For the MLP,
+            # the projection names are conveniently the same when using SwiGLUMLP.
+            k = k.replace("model.embed_tokens", "tok_embeddings")
+            k = k.replace("model.layers", "layers")
+            k = k.replace("self_attn", "attention")
+            k = k.replace("input_layernorm", "attention_norm")
+            k = k.replace("post_attention_layernorm", "ffn_norm")
+            k = k.replace("mlp.", "feed_forward.")
+            k = k.replace("model.norm", "norm")
+            weights[k] = v
+
+    # The output layer is tied to the token embeddings, so we don't load
+    # weights for it separately.
+    if config.get("tie_word_embeddings", True):
+        weights.pop("output.weight", None)
+
+    model.update(tree_unflatten(list(weights.items())))
+    return model
pr-16104-summary.md ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,23 @@
+[project]
+name = "mobilellm-r1-950m"
+version = "0.1.0"
+description = "mlx_lm_for_mobile_llm_r1"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "mlx>=0.29.1",
+    "mlx-lm>=0.27.1",
+    "safetensors>=0.6.2",
+    "transformers>=4.56.1",
+]
+
+[dependency-groups]
+dev = [
+    "torch>=2.8.0",
+]
+
+[tool.hatch.build.targets.wheel]
+packages = ["custom_mlx_lm"]
+
+[project.scripts]
+mobilellm-infer = "custom_mlx_lm.inference_mlx_lm:main"
quantization.log ADDED
@@ -0,0 +1,38 @@
+uv run python custom_mlx_lm/custom_convert.py --hf-path . --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx --dynamic-quant --target-bpw 4.5 --group-size 64 --report-ppl
+Loading model from ....
+Loading calibration data...
+Token indices sequence length is longer than the specified maximum sequence length for this model (110205 > 32768). Running this sequence through the model will result in indexing errors
+Calculating perplexity of original model...
+Original PPL: 50.262
+Starting advanced mixed-precision quantization...
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+    - Avoid using `tokenizers` before the fork if possible
+    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+Estimating sensitivities: 100%|████████████████████████████████████| 54/54 [02:03<00:00,  2.28s/it]
+Calculating perplexity of quantized model...
+Quantized PPL: 59.059
+
+✅ Model saved to MobileLLM-R1-950M-mixed-4bit-mlx
+
+uv run python custom_mlx_lm/quant_summary.py --model-path MobileLLM-R1-950M-mixed-4bit-mlx --show 8
+Method: mixed_precision_dynamic
+Group size: 64
+Total linear layers: 154
+4-bit layers: 153
+8-bit layers: 1
+
+Examples (8-bit):
+- layers.0.attention.o_proj
+
+Examples (4-bit):
+- layers.0.attention.k_proj
+- layers.0.attention.q_proj
+- layers.0.attention.v_proj
+- layers.0.feed_forward.down_proj
+- layers.0.feed_forward.gate_proj
+- layers.0.feed_forward.up_proj
+- layers.1.attention.k_proj
+- layers.1.attention.o_proj
+
+weights.npz contains quantized tensors: True
requirements.txt ADDED
@@ -0,0 +1,29 @@
+certifi==2025.8.3
+charset-normalizer==3.4.3
+filelock==3.19.1
+fsspec==2025.9.0
+hf-xet==1.1.10
+huggingface-hub==0.34.4
+idna==3.10
+jinja2==3.1.6
+markupsafe==3.0.2
+mlx==0.29.1
+mlx-lm==0.27.1
+mlx-metal==0.29.1
+mpmath==1.3.0
+networkx==3.5
+numpy==2.3.3
+packaging==25.0
+protobuf==6.32.1
+pyyaml==6.0.2
+regex==2025.9.1
+requests==2.32.5
+safetensors==0.6.2
+setuptools==80.9.0
+sympy==1.14.0
+tokenizers==0.22.0
+torch==2.8.0
+tqdm==4.67.1
+transformers==4.56.1
+typing-extensions==4.15.0
+urllib3==2.5.0
test_model.py ADDED
@@ -0,0 +1,103 @@
+import sys
+from pathlib import Path
+
+# Add the current directory to the python path to import model.py
+sys.path.append(str(Path.cwd()))
+
+from model import load_model
+from mlx.utils import tree_flatten
+
+
+def run_diagnostic_checks():
+    """
+    Performs the verification checks outlined in the review.
+    """
+    print("--- Running Diagnostic Checks ---")
+
+    # 1. Load model and check for errors
+    try:
+        model = load_model(".")
+        print("Successfully loaded model definition.")
+    except Exception as e:
+        print(f"Error loading model: {e}")
+        return
+
+    # 2. Print total parameter count
+    try:
+        params = model.parameters()
+        num_params = sum(p.size for _, p in tree_flatten(params))
+        print(f"Total number of parameters: {num_params / 1e6:.2f}M")
+    except Exception as e:
+        print(f"Error calculating parameters: {e}")
+
+    # 3. Verify MLP weight shapes
+    print("--- Verifying MLP Weight Shapes ---")
+    try:
+        first_block = model.layers[0]
+        args = model.args
+        print(f"use_dual_mlp detected: {args.use_dual_mlp}")
+
+        if args.use_dual_mlp:
+            g_up_shape = first_block.feed_forward.g_up.weight.shape
+            p_up_shape = first_block.feed_forward.p_up.weight.shape
+            print(f"Gated MLP branch (g_up) weight shape: {g_up_shape}")
+            print(f"Plain MLP branch (p_up) weight shape: {p_up_shape}")
+            assert g_up_shape == (args.intermediate_size, args.hidden_size)
+            assert p_up_shape == (args.intermediate_size_mlp, args.hidden_size)
+            print("DualMLP weight shapes are correct.")
+        else:
+            gate_proj_shape = first_block.feed_forward.gate_proj.weight.shape
+            up_proj_shape = first_block.feed_forward.up_proj.weight.shape
+            print(f"SwiGLUMLP gate_proj weight shape: {gate_proj_shape}")
+            print(f"SwiGLUMLP up_proj weight shape: {up_proj_shape}")
+            assert gate_proj_shape == (args.intermediate_size_mlp, args.hidden_size)
+            assert up_proj_shape == (args.intermediate_size_mlp, args.hidden_size)
+            print("SwiGLUMLP weight shapes are correct.")
+
+    except AttributeError as e:
+        print(
+            f"Error accessing MLP weights. It seems the structure is not as expected: {e}"
+        )
+    except AssertionError:
+        print("Error: MLP weight shapes do not match the configuration.")
+    except Exception as e:
+        print(f"An unexpected error occurred while verifying shapes: {e}")
+
+    # 4. Verify Embedding shape
+    print("--- Verifying Embedding Shape ---")
+    try:
+        embedding_shape = model.tok_embeddings.weight.shape
+        print(f"Embedding weight shape: {embedding_shape}")
+
+        args = model.args
+        print(f"Expected embedding shape: ({args.vocab_size}, {args.hidden_size})")
+
+        assert embedding_shape == (args.vocab_size, args.hidden_size)
+        print("Embedding shape is correct.")
+    except Exception as e:
+        print(f"An unexpected error occurred while verifying embedding shape: {e}")
+
+    print("--- Sanity Checking Loaded Weights ---")
+    try:
+        # Check expected attribute exists based on architecture
+        if model.args.use_dual_mlp:
+            _ = model.layers[0].feed_forward.g_gate.weight
+            _ = model.layers[0].feed_forward.g_up.weight
+            _ = model.layers[0].feed_forward.g_down.weight
+            _ = model.layers[0].feed_forward.p_up.weight
+            _ = model.layers[0].feed_forward.p_down.weight
+            print("Found dual-branch MLP weights in the model.")
+        else:
+            _ = model.layers[0].feed_forward.gate_proj.weight
+            _ = model.layers[0].feed_forward.up_proj.weight
+            _ = model.layers[0].feed_forward.down_proj.weight
+            print("Found SwiGLU MLP weights in the model.")
+        print("Weight presence sanity check passed.")
+    except Exception as e:
+        print(f"An error occurred during sanity check: {e}")
+
+    print("--- Diagnostic Checks Complete ---")
+
+
+if __name__ == "__main__":
+    run_diagnostic_checks()
uv.lock ADDED
@@ -0,0 +1,678 @@
1
+ version = 1
2
+ revision = 3
3
+ requires-python = ">=3.13"
4
+
5
+ [[package]]
6
+ name = "certifi"
7
+ version = "2025.8.3"
8
+ source = { registry = "https://pypi.org/simple" }
9
+ sdist = { url = "https://files.pythonhosted.org/packages/dc/67/960ebe6bf230a96cda2e0abcf73af550ec4f090005363542f0765df162e0/certifi-2025.8.3.tar.gz", hash = "sha256:e564105f78ded564e3ae7c923924435e1daa7463faeab5bb932bc53ffae63407", size = 162386, upload-time = "2025-08-03T03:07:47.08Z" }
10
+ wheels = [
11
+ { url = "https://files.pythonhosted.org/packages/e5/48/1549795ba7742c948d2ad169c1c8cdbae65bc450d6cd753d124b17c8cd32/certifi-2025.8.3-py3-none-any.whl", hash = "sha256:f6c12493cfb1b06ba2ff328595af9350c65d6644968e5d3a2ffd78699af217a5", size = 161216, upload-time = "2025-08-03T03:07:45.777Z" },
12
+ ]
13
+
14
+ [[package]]
15
+ name = "charset-normalizer"
16
+ version = "3.4.3"
17
+ source = { registry = "https://pypi.org/simple" }
18
+ sdist = { url = "https://files.pythonhosted.org/packages/83/2d/5fd176ceb9b2fc619e63405525573493ca23441330fcdaee6bef9460e924/charset_normalizer-3.4.3.tar.gz", hash = "sha256:6fce4b8500244f6fcb71465d4a4930d132ba9ab8e71a7859e6a5d59851068d14", size = 122371, upload-time = "2025-08-09T07:57:28.46Z" }
19
+ wheels = [
20
+ { url = "https://files.pythonhosted.org/packages/65/ca/2135ac97709b400c7654b4b764daf5c5567c2da45a30cdd20f9eefe2d658/charset_normalizer-3.4.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:14c2a87c65b351109f6abfc424cab3927b3bdece6f706e4d12faaf3d52ee5efe", size = 205326, upload-time = "2025-08-09T07:56:24.721Z" },
21
+ { url = "https://files.pythonhosted.org/packages/71/11/98a04c3c97dd34e49c7d247083af03645ca3730809a5509443f3c37f7c99/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:41d1fc408ff5fdfb910200ec0e74abc40387bccb3252f3f27c0676731df2b2c8", size = 146008, upload-time = "2025-08-09T07:56:26.004Z" },
22
+ { url = "https://files.pythonhosted.org/packages/60/f5/4659a4cb3c4ec146bec80c32d8bb16033752574c20b1252ee842a95d1a1e/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:1bb60174149316da1c35fa5233681f7c0f9f514509b8e399ab70fea5f17e45c9", size = 159196, upload-time = "2025-08-09T07:56:27.25Z" },
23
+ { url = "https://files.pythonhosted.org/packages/86/9e/f552f7a00611f168b9a5865a1414179b2c6de8235a4fa40189f6f79a1753/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:30d006f98569de3459c2fc1f2acde170b7b2bd265dc1943e87e1a4efe1b67c31", size = 156819, upload-time = "2025-08-09T07:56:28.515Z" },
24
+ { url = "https://files.pythonhosted.org/packages/7e/95/42aa2156235cbc8fa61208aded06ef46111c4d3f0de233107b3f38631803/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:416175faf02e4b0810f1f38bcb54682878a4af94059a1cd63b8747244420801f", size = 151350, upload-time = "2025-08-09T07:56:29.716Z" },
25
+ { url = "https://files.pythonhosted.org/packages/c2/a9/3865b02c56f300a6f94fc631ef54f0a8a29da74fb45a773dfd3dcd380af7/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6aab0f181c486f973bc7262a97f5aca3ee7e1437011ef0c2ec04b5a11d16c927", size = 148644, upload-time = "2025-08-09T07:56:30.984Z" },
26
+ { url = "https://files.pythonhosted.org/packages/77/d9/cbcf1a2a5c7d7856f11e7ac2d782aec12bdfea60d104e60e0aa1c97849dc/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:fdabf8315679312cfa71302f9bd509ded4f2f263fb5b765cf1433b39106c3cc9", size = 160468, upload-time = "2025-08-09T07:56:32.252Z" },
27
+ { url = "https://files.pythonhosted.org/packages/f6/42/6f45efee8697b89fda4d50580f292b8f7f9306cb2971d4b53f8914e4d890/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:bd28b817ea8c70215401f657edef3a8aa83c29d447fb0b622c35403780ba11d5", size = 158187, upload-time = "2025-08-09T07:56:33.481Z" },
28
+ { url = "https://files.pythonhosted.org/packages/70/99/f1c3bdcfaa9c45b3ce96f70b14f070411366fa19549c1d4832c935d8e2c3/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:18343b2d246dc6761a249ba1fb13f9ee9a2bcd95decc767319506056ea4ad4dc", size = 152699, upload-time = "2025-08-09T07:56:34.739Z" },
29
+ { url = "https://files.pythonhosted.org/packages/a3/ad/b0081f2f99a4b194bcbb1934ef3b12aa4d9702ced80a37026b7607c72e58/charset_normalizer-3.4.3-cp313-cp313-win32.whl", hash = "sha256:6fb70de56f1859a3f71261cbe41005f56a7842cc348d3aeb26237560bfa5e0ce", size = 99580, upload-time = "2025-08-09T07:56:35.981Z" },
30
+ { url = "https://files.pythonhosted.org/packages/9a/8f/ae790790c7b64f925e5c953b924aaa42a243fb778fed9e41f147b2a5715a/charset_normalizer-3.4.3-cp313-cp313-win_amd64.whl", hash = "sha256:cf1ebb7d78e1ad8ec2a8c4732c7be2e736f6e5123a4146c5b89c9d1f585f8cef", size = 107366, upload-time = "2025-08-09T07:56:37.339Z" },
31
+ { url = "https://files.pythonhosted.org/packages/8e/91/b5a06ad970ddc7a0e513112d40113e834638f4ca1120eb727a249fb2715e/charset_normalizer-3.4.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:3cd35b7e8aedeb9e34c41385fda4f73ba609e561faedfae0a9e75e44ac558a15", size = 204342, upload-time = "2025-08-09T07:56:38.687Z" },
32
+ { url = "https://files.pythonhosted.org/packages/ce/ec/1edc30a377f0a02689342f214455c3f6c2fbedd896a1d2f856c002fc3062/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b89bc04de1d83006373429975f8ef9e7932534b8cc9ca582e4db7d20d91816db", size = 145995, upload-time = "2025-08-09T07:56:40.048Z" },
33
+ { url = "https://files.pythonhosted.org/packages/17/e5/5e67ab85e6d22b04641acb5399c8684f4d37caf7558a53859f0283a650e9/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2001a39612b241dae17b4687898843f254f8748b796a2e16f1051a17078d991d", size = 158640, upload-time = "2025-08-09T07:56:41.311Z" },
34
+ { url = "https://files.pythonhosted.org/packages/f1/e5/38421987f6c697ee3722981289d554957c4be652f963d71c5e46a262e135/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8dcfc373f888e4fb39a7bc57e93e3b845e7f462dacc008d9749568b1c4ece096", size = 156636, upload-time = "2025-08-09T07:56:43.195Z" },
35
+ { url = "https://files.pythonhosted.org/packages/a0/e4/5a075de8daa3ec0745a9a3b54467e0c2967daaaf2cec04c845f73493e9a1/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:18b97b8404387b96cdbd30ad660f6407799126d26a39ca65729162fd810a99aa", size = 150939, upload-time = "2025-08-09T07:56:44.819Z" },
36
+ { url = "https://files.pythonhosted.org/packages/02/f7/3611b32318b30974131db62b4043f335861d4d9b49adc6d57c1149cc49d4/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ccf600859c183d70eb47e05a44cd80a4ce77394d1ac0f79dbd2dd90a69a3a049", size = 148580, upload-time = "2025-08-09T07:56:46.684Z" },
37
+ { url = "https://files.pythonhosted.org/packages/7e/61/19b36f4bd67f2793ab6a99b979b4e4f3d8fc754cbdffb805335df4337126/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:53cd68b185d98dde4ad8990e56a58dea83a4162161b1ea9272e5c9182ce415e0", size = 159870, upload-time = "2025-08-09T07:56:47.941Z" },
38
+ { url = "https://files.pythonhosted.org/packages/06/57/84722eefdd338c04cf3030ada66889298eaedf3e7a30a624201e0cbe424a/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:30a96e1e1f865f78b030d65241c1ee850cdf422d869e9028e2fc1d5e4db73b92", size = 157797, upload-time = "2025-08-09T07:56:49.756Z" },
39
+ { url = "https://files.pythonhosted.org/packages/72/2a/aff5dd112b2f14bcc3462c312dce5445806bfc8ab3a7328555da95330e4b/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d716a916938e03231e86e43782ca7878fb602a125a91e7acb8b5112e2e96ac16", size = 152224, upload-time = "2025-08-09T07:56:51.369Z" },
40
+ { url = "https://files.pythonhosted.org/packages/b7/8c/9839225320046ed279c6e839d51f028342eb77c91c89b8ef2549f951f3ec/charset_normalizer-3.4.3-cp314-cp314-win32.whl", hash = "sha256:c6dbd0ccdda3a2ba7c2ecd9d77b37f3b5831687d8dc1b6ca5f56a4880cc7b7ce", size = 100086, upload-time = "2025-08-09T07:56:52.722Z" },
41
+ { url = "https://files.pythonhosted.org/packages/ee/7a/36fbcf646e41f710ce0a563c1c9a343c6edf9be80786edeb15b6f62e17db/charset_normalizer-3.4.3-cp314-cp314-win_amd64.whl", hash = "sha256:73dc19b562516fc9bcf6e5d6e596df0b4eb98d87e4f79f3ae71840e6ed21361c", size = 107400, upload-time = "2025-08-09T07:56:55.172Z" },
42
+ { url = "https://files.pythonhosted.org/packages/8a/1f/f041989e93b001bc4e44bb1669ccdcf54d3f00e628229a85b08d330615c5/charset_normalizer-3.4.3-py3-none-any.whl", hash = "sha256:ce571ab16d890d23b5c278547ba694193a45011ff86a9162a71307ed9f86759a", size = 53175, upload-time = "2025-08-09T07:57:26.864Z" },
43
+ ]
44
+
45
+ [[package]]
46
+ name = "colorama"
47
+ version = "0.4.6"
48
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
+ ]
+
+ [[package]]
+ name = "filelock"
+ version = "3.19.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/40/bb/0ab3e58d22305b6f5440629d20683af28959bf793d98d11950e305c1c326/filelock-3.19.1.tar.gz", hash = "sha256:66eda1888b0171c998b35be2bcc0f6d75c388a7ce20c3f3f37aa8e96c2dddf58", size = 17687, upload-time = "2025-08-14T16:56:03.016Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/42/14/42b2651a2f46b022ccd948bca9f2d5af0fd8929c4eec235b8d6d844fbe67/filelock-3.19.1-py3-none-any.whl", hash = "sha256:d38e30481def20772f5baf097c122c3babc4fcdb7e14e57049eb9d88c6dc017d", size = 15988, upload-time = "2025-08-14T16:56:01.633Z" },
+ ]
+
+ [[package]]
+ name = "fsspec"
+ version = "2025.9.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/de/e0/bab50af11c2d75c9c4a2a26a5254573c0bd97cea152254401510950486fa/fsspec-2025.9.0.tar.gz", hash = "sha256:19fd429483d25d28b65ec68f9f4adc16c17ea2c7c7bf54ec61360d478fb19c19", size = 304847, upload-time = "2025-09-02T19:10:49.215Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/47/71/70db47e4f6ce3e5c37a607355f80da8860a33226be640226ac52cb05ef2e/fsspec-2025.9.0-py3-none-any.whl", hash = "sha256:530dc2a2af60a414a832059574df4a6e10cce927f6f4a78209390fe38955cfb7", size = 199289, upload-time = "2025-09-02T19:10:47.708Z" },
+ ]
+
+ [[package]]
+ name = "hf-xet"
+ version = "1.1.10"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/74/31/feeddfce1748c4a233ec1aa5b7396161c07ae1aa9b7bdbc9a72c3c7dd768/hf_xet-1.1.10.tar.gz", hash = "sha256:408aef343800a2102374a883f283ff29068055c111f003ff840733d3b715bb97", size = 487910, upload-time = "2025-09-12T20:10:27.12Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f7/a2/343e6d05de96908366bdc0081f2d8607d61200be2ac802769c4284cc65bd/hf_xet-1.1.10-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:686083aca1a6669bc85c21c0563551cbcdaa5cf7876a91f3d074a030b577231d", size = 2761466, upload-time = "2025-09-12T20:10:22.836Z" },
+ { url = "https://files.pythonhosted.org/packages/31/f9/6215f948ac8f17566ee27af6430ea72045e0418ce757260248b483f4183b/hf_xet-1.1.10-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:71081925383b66b24eedff3013f8e6bbd41215c3338be4b94ba75fd75b21513b", size = 2623807, upload-time = "2025-09-12T20:10:21.118Z" },
+ { url = "https://files.pythonhosted.org/packages/15/07/86397573efefff941e100367bbda0b21496ffcdb34db7ab51912994c32a2/hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6b6bceb6361c80c1cc42b5a7b4e3efd90e64630bcf11224dcac50ef30a47e435", size = 3186960, upload-time = "2025-09-12T20:10:19.336Z" },
+ { url = "https://files.pythonhosted.org/packages/01/a7/0b2e242b918cc30e1f91980f3c4b026ff2eedaf1e2ad96933bca164b2869/hf_xet-1.1.10-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:eae7c1fc8a664e54753ffc235e11427ca61f4b0477d757cc4eb9ae374b69f09c", size = 3087167, upload-time = "2025-09-12T20:10:17.255Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/25/3e32ab61cc7145b11eee9d745988e2f0f4fafda81b25980eebf97d8cff15/hf_xet-1.1.10-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:0a0005fd08f002180f7a12d4e13b22be277725bc23ed0529f8add5c7a6309c06", size = 3248612, upload-time = "2025-09-12T20:10:24.093Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/3d/ab7109e607ed321afaa690f557a9ada6d6d164ec852fd6bf9979665dc3d6/hf_xet-1.1.10-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:f900481cf6e362a6c549c61ff77468bd59d6dd082f3170a36acfef2eb6a6793f", size = 3353360, upload-time = "2025-09-12T20:10:25.563Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/0e/471f0a21db36e71a2f1752767ad77e92d8cde24e974e03d662931b1305ec/hf_xet-1.1.10-cp37-abi3-win_amd64.whl", hash = "sha256:5f54b19cc347c13235ae7ee98b330c26dd65ef1df47e5316ffb1e87713ca7045", size = 2804691, upload-time = "2025-09-12T20:10:28.433Z" },
+ ]
+
+ [[package]]
+ name = "huggingface-hub"
+ version = "0.34.4"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "hf-xet", marker = "platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'" },
+ { name = "packaging" },
+ { name = "pyyaml" },
+ { name = "requests" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/45/c9/bdbe19339f76d12985bc03572f330a01a93c04dffecaaea3061bdd7fb892/huggingface_hub-0.34.4.tar.gz", hash = "sha256:a4228daa6fb001be3f4f4bdaf9a0db00e1739235702848df00885c9b5742c85c", size = 459768, upload-time = "2025-08-08T09:14:52.365Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/39/7b/bb06b061991107cd8783f300adff3e7b7f284e330fd82f507f2a1417b11d/huggingface_hub-0.34.4-py3-none-any.whl", hash = "sha256:9b365d781739c93ff90c359844221beef048403f1bc1f1c123c191257c3c890a", size = 561452, upload-time = "2025-08-08T09:14:50.159Z" },
+ ]
+
+ [[package]]
+ name = "idna"
+ version = "3.10"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/f1/70/7703c29685631f5a7590aa73f1f1d3fa9a380e654b86af429e0934a32f7d/idna-3.10.tar.gz", hash = "sha256:12f65c9b470abda6dc35cf8e63cc574b1c52b11df2c86030af0ac09b01b13ea9", size = 190490, upload-time = "2024-09-15T18:07:39.745Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl", hash = "sha256:946d195a0d259cbba61165e88e65941f16e9b36ea6ddb97f00452bae8b1287d3", size = 70442, upload-time = "2024-09-15T18:07:37.964Z" },
+ ]
+
+ [[package]]
+ name = "jinja2"
+ version = "3.1.6"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "markupsafe" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d", size = 245115, upload-time = "2025-03-05T20:05:02.478Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
+ ]
+
+ [[package]]
+ name = "markupsafe"
+ version = "3.0.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/b2/97/5d42485e71dfc078108a86d6de8fa46db44a1a9295e89c5d6d4a06e23a62/markupsafe-3.0.2.tar.gz", hash = "sha256:ee55d3edf80167e48ea11a923c7386f4669df67d7994554387f84e7d8b0a2bf0", size = 20537, upload-time = "2024-10-18T15:21:54.129Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/83/0e/67eb10a7ecc77a0c2bbe2b0235765b98d164d81600746914bebada795e97/MarkupSafe-3.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ba9527cdd4c926ed0760bc301f6728ef34d841f405abf9d4f959c478421e4efd", size = 14274, upload-time = "2024-10-18T15:21:24.577Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/6d/9409f3684d3335375d04e5f05744dfe7e9f120062c9857df4ab490a1031a/MarkupSafe-3.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f8b3d067f2e40fe93e1ccdd6b2e1d16c43140e76f02fb1319a05cf2b79d99430", size = 12352, upload-time = "2024-10-18T15:21:25.382Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/f5/6eadfcd3885ea85fe2a7c128315cc1bb7241e1987443d78c8fe712d03091/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:569511d3b58c8791ab4c2e1285575265991e6d8f8700c7be0e88f86cb0672094", size = 24122, upload-time = "2024-10-18T15:21:26.199Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/91/96cf928db8236f1bfab6ce15ad070dfdd02ed88261c2afafd4b43575e9e9/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396", size = 23085, upload-time = "2024-10-18T15:21:27.029Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/cf/c9d56af24d56ea04daae7ac0940232d31d5a8354f2b457c6d856b2057d69/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f3818cb119498c0678015754eba762e0d61e5b52d34c8b13d770f0719f7b1d79", size = 22978, upload-time = "2024-10-18T15:21:27.846Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/9f/8619835cd6a711d6272d62abb78c033bda638fdc54c4e7f4272cf1c0962b/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:cdb82a876c47801bb54a690c5ae105a46b392ac6099881cdfb9f6e95e4014c6a", size = 24208, upload-time = "2024-10-18T15:21:28.744Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/bf/176950a1792b2cd2102b8ffeb5133e1ed984547b75db47c25a67d3359f77/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:cabc348d87e913db6ab4aa100f01b08f481097838bdddf7c7a84b7575b7309ca", size = 23357, upload-time = "2024-10-18T15:21:29.545Z" },
+ { url = "https://files.pythonhosted.org/packages/ce/4f/9a02c1d335caabe5c4efb90e1b6e8ee944aa245c1aaaab8e8a618987d816/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:444dcda765c8a838eaae23112db52f1efaf750daddb2d9ca300bcae1039adc5c", size = 23344, upload-time = "2024-10-18T15:21:30.366Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/55/c271b57db36f748f0e04a759ace9f8f759ccf22b4960c270c78a394f58be/MarkupSafe-3.0.2-cp313-cp313-win32.whl", hash = "sha256:bcf3e58998965654fdaff38e58584d8937aa3096ab5354d493c77d1fdd66d7a1", size = 15101, upload-time = "2024-10-18T15:21:31.207Z" },
+ { url = "https://files.pythonhosted.org/packages/29/88/07df22d2dd4df40aba9f3e402e6dc1b8ee86297dddbad4872bd5e7b0094f/MarkupSafe-3.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:e6a2a455bd412959b57a172ce6328d2dd1f01cb2135efda2e4576e8a23fa3b0f", size = 15603, upload-time = "2024-10-18T15:21:32.032Z" },
+ { url = "https://files.pythonhosted.org/packages/62/6a/8b89d24db2d32d433dffcd6a8779159da109842434f1dd2f6e71f32f738c/MarkupSafe-3.0.2-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:b5a6b3ada725cea8a5e634536b1b01c30bcdcd7f9c6fff4151548d5bf6b3a36c", size = 14510, upload-time = "2024-10-18T15:21:33.625Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/06/a10f955f70a2e5a9bf78d11a161029d278eeacbd35ef806c3fd17b13060d/MarkupSafe-3.0.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:a904af0a6162c73e3edcb969eeeb53a63ceeb5d8cf642fade7d39e7963a22ddb", size = 12486, upload-time = "2024-10-18T15:21:34.611Z" },
+ { url = "https://files.pythonhosted.org/packages/34/cf/65d4a571869a1a9078198ca28f39fba5fbb910f952f9dbc5220afff9f5e6/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4aa4e5faecf353ed117801a068ebab7b7e09ffb6e1d5e412dc852e0da018126c", size = 25480, upload-time = "2024-10-18T15:21:35.398Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/e3/90e9651924c430b885468b56b3d597cabf6d72be4b24a0acd1fa0e12af67/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c0ef13eaeee5b615fb07c9a7dadb38eac06a0608b41570d8ade51c56539e509d", size = 23914, upload-time = "2024-10-18T15:21:36.231Z" },
+ { url = "https://files.pythonhosted.org/packages/66/8c/6c7cf61f95d63bb866db39085150df1f2a5bd3335298f14a66b48e92659c/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d16a81a06776313e817c951135cf7340a3e91e8c1ff2fac444cfd75fffa04afe", size = 23796, upload-time = "2024-10-18T15:21:37.073Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/35/cbe9238ec3f47ac9a7c8b3df7a808e7cb50fe149dc7039f5f454b3fba218/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6381026f158fdb7c72a168278597a5e3a5222e83ea18f543112b2662a9b699c5", size = 25473, upload-time = "2024-10-18T15:21:37.932Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/32/7621a4382488aa283cc05e8984a9c219abad3bca087be9ec77e89939ded9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:3d79d162e7be8f996986c064d1c7c817f6df3a77fe3d6859f6f9e7be4b8c213a", size = 24114, upload-time = "2024-10-18T15:21:39.799Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/80/0985960e4b89922cb5a0bac0ed39c5b96cbc1a536a99f30e8c220a996ed9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:131a3c7689c85f5ad20f9f6fb1b866f402c445b220c19fe4308c0b147ccd2ad9", size = 24098, upload-time = "2024-10-18T15:21:40.813Z" },
+ { url = "https://files.pythonhosted.org/packages/82/78/fedb03c7d5380df2427038ec8d973587e90561b2d90cd472ce9254cf348b/MarkupSafe-3.0.2-cp313-cp313t-win32.whl", hash = "sha256:ba8062ed2cf21c07a9e295d5b8a2a5ce678b913b45fdf68c32d95d6c1291e0b6", size = 15208, upload-time = "2024-10-18T15:21:41.814Z" },
+ { url = "https://files.pythonhosted.org/packages/4f/65/6079a46068dfceaeabb5dcad6d674f5f5c61a6fa5673746f42a9f4c233b3/MarkupSafe-3.0.2-cp313-cp313t-win_amd64.whl", hash = "sha256:e444a31f8db13eb18ada366ab3cf45fd4b31e4db1236a4448f68778c1d1a5a2f", size = 15739, upload-time = "2024-10-18T15:21:42.784Z" },
+ ]
+
+ [[package]]
+ name = "mlx"
+ version = "0.29.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "mlx-metal", marker = "sys_platform == 'darwin'" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/66/62/7691ea664123d6e1fc0626207d5f1a6ed2b92b71059f4be42634e89b479e/mlx-0.29.1-cp313-cp313-macosx_13_0_arm64.whl", hash = "sha256:e86644cef409a00dd46eb9debf0796899623c686d16cc25b6e83078fb5081eba", size = 546904, upload-time = "2025-09-12T00:17:43.197Z" },
+ { url = "https://files.pythonhosted.org/packages/44/b8/1a77cafb6302703fe5576b2298f533cb36b6721fa6d9c41a9d6078c14a89/mlx-0.29.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:fd27d49f631ecc9d0a766327e65236738e338c74c7be504c22a1e53801eb40d1", size = 546909, upload-time = "2025-09-12T00:17:15.127Z" },
+ { url = "https://files.pythonhosted.org/packages/79/f1/1f4ddf70d1f77993e25f25fb0ab8f5579d81fce6a2a554400c75b447c148/mlx-0.29.1-cp313-cp313-macosx_15_0_arm64.whl", hash = "sha256:aaeacf864163b645ddd58c57e65290bf4c8cd493378e89dd11c00d2c9c42b42d", size = 546904, upload-time = "2025-09-12T00:17:09.691Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/47/7216e859ba3dbda78c840858cf1e120442721b48c974f587ef4e89d5f86f/mlx-0.29.1-cp313-cp313-manylinux_2_35_x86_64.whl", hash = "sha256:e33221c75ebed38dc6bad7fed46cdde8e4dbb47d789401232b4ab2c34305d42d", size = 646103, upload-time = "2025-09-12T00:21:56.544Z" },
+ ]
+
+ [[package]]
+ name = "mlx-lm"
+ version = "0.27.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "jinja2" },
+ { name = "mlx" },
+ { name = "numpy" },
+ { name = "protobuf" },
+ { name = "pyyaml" },
+ { name = "transformers" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/41/77/e8d3a82658a2070bc392a583dd08c8d24088433e920eac4905bf882255ad/mlx_lm-0.27.1.tar.gz", hash = "sha256:36640fb64c909cfd9baddf37b16e7d3b94a1a141033e6b7ea7a0ef5a965fb4ae", size = 185170, upload-time = "2025-09-04T16:06:57.949Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/e1/54/5f35831d208cbf81572e9a0ae8ac6d595ca7c59f3e1da57c367894b0a75b/mlx_lm-0.27.1-py3-none-any.whl", hash = "sha256:300da6f63d8d392483b62b2abda794730fa04343dcb28a1f6a712f4c3ab60f3c", size = 255687, upload-time = "2025-09-04T16:06:54.904Z" },
+ ]
+
+ [[package]]
+ name = "mlx-metal"
+ version = "0.29.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/61/b4/c96f54061fff12c2acc06f2cd402aae4a9cba52e40aae51f71ae508ef206/mlx_metal-0.29.1-py3-none-macosx_13_0_arm64.whl", hash = "sha256:b9dadd432948eab196ed110db0dc745795fd516b7124c0d3c4d176fee678a07a", size = 34983555, upload-time = "2025-09-12T00:19:44.815Z" },
+ { url = "https://files.pythonhosted.org/packages/82/3a/45c9ea1b6741a5dc80ad0b57eeee09e544a0d89ca66c7ad6cc55887c00d8/mlx_metal-0.29.1-py3-none-macosx_14_0_arm64.whl", hash = "sha256:824b939b721a964a455aeea4d0e956e4cc945f3333522c1e72a077ae774bca49", size = 34712571, upload-time = "2025-09-12T00:19:26.183Z" },
+ { url = "https://files.pythonhosted.org/packages/64/7f/294c8cac159661d732e5c01f841e07edfd2ea90651d39faca6579b3cdbf4/mlx_metal-0.29.1-py3-none-macosx_15_0_arm64.whl", hash = "sha256:ebd9ba8e83213f929663b92b8065b451a4276c7002ed83eae0fc8dde721c50c5", size = 34704543, upload-time = "2025-09-12T00:18:59.595Z" },
+ ]
+
+ [[package]]
+ name = "mobilellm-r1-950m"
+ version = "0.1.0"
+ source = { virtual = "." }
+ dependencies = [
+ { name = "mlx" },
+ { name = "mlx-lm" },
+ { name = "safetensors" },
+ { name = "transformers" },
+ ]
+
+ [package.dev-dependencies]
+ dev = [
+ { name = "torch" },
+ ]
+
+ [package.metadata]
+ requires-dist = [
+ { name = "mlx", specifier = ">=0.29.1" },
+ { name = "mlx-lm", specifier = ">=0.27.1" },
+ { name = "safetensors", specifier = ">=0.6.2" },
+ { name = "transformers", specifier = ">=4.56.1" },
+ ]
+
+ [package.metadata.requires-dev]
+ dev = [{ name = "torch", specifier = ">=2.8.0" }]
+
+ [[package]]
+ name = "mpmath"
+ version = "1.3.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/e0/47/dd32fa426cc72114383ac549964eecb20ecfd886d1e5ccf5340b55b02f57/mpmath-1.3.0.tar.gz", hash = "sha256:7a28eb2a9774d00c7bc92411c19a89209d5da7c4c9a9e227be8330a23a25b91f", size = 508106, upload-time = "2023-03-07T16:47:11.061Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c", size = 536198, upload-time = "2023-03-07T16:47:09.197Z" },
+ ]
+
+ [[package]]
+ name = "networkx"
+ version = "3.5"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/6c/4f/ccdb8ad3a38e583f214547fd2f7ff1fc160c43a75af88e6aec213404b96a/networkx-3.5.tar.gz", hash = "sha256:d4c6f9cf81f52d69230866796b82afbccdec3db7ae4fbd1b65ea750feed50037", size = 2471065, upload-time = "2025-05-29T11:35:07.804Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/eb/8d/776adee7bbf76365fdd7f2552710282c79a4ead5d2a46408c9043a2b70ba/networkx-3.5-py3-none-any.whl", hash = "sha256:0030d386a9a06dee3565298b4a734b68589749a544acbb6c412dc9e2489ec6ec", size = 2034406, upload-time = "2025-05-29T11:35:04.961Z" },
+ ]
+
+ [[package]]
+ name = "numpy"
+ version = "2.3.3"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/d0/19/95b3d357407220ed24c139018d2518fab0a61a948e68286a25f1a4d049ff/numpy-2.3.3.tar.gz", hash = "sha256:ddc7c39727ba62b80dfdbedf400d1c10ddfa8eefbd7ec8dcb118be8b56d31029", size = 20576648, upload-time = "2025-09-09T16:54:12.543Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/7d/b9/984c2b1ee61a8b803bf63582b4ac4242cf76e2dbd663efeafcb620cc0ccb/numpy-2.3.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:f5415fb78995644253370985342cd03572ef8620b934da27d77377a2285955bf", size = 20949588, upload-time = "2025-09-09T15:56:59.087Z" },
+ { url = "https://files.pythonhosted.org/packages/a6/e4/07970e3bed0b1384d22af1e9912527ecbeb47d3b26e9b6a3bced068b3bea/numpy-2.3.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:d00de139a3324e26ed5b95870ce63be7ec7352171bc69a4cf1f157a48e3eb6b7", size = 14177802, upload-time = "2025-09-09T15:57:01.73Z" },
+ { url = "https://files.pythonhosted.org/packages/35/c7/477a83887f9de61f1203bad89cf208b7c19cc9fef0cebef65d5a1a0619f2/numpy-2.3.3-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:9dc13c6a5829610cc07422bc74d3ac083bd8323f14e2827d992f9e52e22cd6a6", size = 5106537, upload-time = "2025-09-09T15:57:03.765Z" },
+ { url = "https://files.pythonhosted.org/packages/52/47/93b953bd5866a6f6986344d045a207d3f1cfbad99db29f534ea9cee5108c/numpy-2.3.3-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:d79715d95f1894771eb4e60fb23f065663b2298f7d22945d66877aadf33d00c7", size = 6640743, upload-time = "2025-09-09T15:57:07.921Z" },
+ { url = "https://files.pythonhosted.org/packages/23/83/377f84aaeb800b64c0ef4de58b08769e782edcefa4fea712910b6f0afd3c/numpy-2.3.3-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:952cfd0748514ea7c3afc729a0fc639e61655ce4c55ab9acfab14bda4f402b4c", size = 14278881, upload-time = "2025-09-09T15:57:11.349Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/a5/bf3db6e66c4b160d6ea10b534c381a1955dfab34cb1017ea93aa33c70ed3/numpy-2.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5b83648633d46f77039c29078751f80da65aa64d5622a3cd62aaef9d835b6c93", size = 16636301, upload-time = "2025-09-09T15:57:14.245Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/59/1287924242eb4fa3f9b3a2c30400f2e17eb2707020d1c5e3086fe7330717/numpy-2.3.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:b001bae8cea1c7dfdb2ae2b017ed0a6f2102d7a70059df1e338e307a4c78a8ae", size = 16053645, upload-time = "2025-09-09T15:57:16.534Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/93/b3d47ed882027c35e94ac2320c37e452a549f582a5e801f2d34b56973c97/numpy-2.3.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8e9aced64054739037d42fb84c54dd38b81ee238816c948c8f3ed134665dcd86", size = 18578179, upload-time = "2025-09-09T15:57:18.883Z" },
+ { url = "https://files.pythonhosted.org/packages/20/d9/487a2bccbf7cc9d4bfc5f0f197761a5ef27ba870f1e3bbb9afc4bbe3fcc2/numpy-2.3.3-cp313-cp313-win32.whl", hash = "sha256:9591e1221db3f37751e6442850429b3aabf7026d3b05542d102944ca7f00c8a8", size = 6312250, upload-time = "2025-09-09T15:57:21.296Z" },
+ { url = "https://files.pythonhosted.org/packages/1b/b5/263ebbbbcede85028f30047eab3d58028d7ebe389d6493fc95ae66c636ab/numpy-2.3.3-cp313-cp313-win_amd64.whl", hash = "sha256:f0dadeb302887f07431910f67a14d57209ed91130be0adea2f9793f1a4f817cf", size = 12783269, upload-time = "2025-09-09T15:57:23.034Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/75/67b8ca554bbeaaeb3fac2e8bce46967a5a06544c9108ec0cf5cece559b6c/numpy-2.3.3-cp313-cp313-win_arm64.whl", hash = "sha256:3c7cf302ac6e0b76a64c4aecf1a09e51abd9b01fc7feee80f6c43e3ab1b1dbc5", size = 10195314, upload-time = "2025-09-09T15:57:25.045Z" },
+ { url = "https://files.pythonhosted.org/packages/11/d0/0d1ddec56b162042ddfafeeb293bac672de9b0cfd688383590090963720a/numpy-2.3.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:eda59e44957d272846bb407aad19f89dc6f58fecf3504bd144f4c5cf81a7eacc", size = 21048025, upload-time = "2025-09-09T15:57:27.257Z" },
+ { url = "https://files.pythonhosted.org/packages/36/9e/1996ca6b6d00415b6acbdd3c42f7f03ea256e2c3f158f80bd7436a8a19f3/numpy-2.3.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:823d04112bc85ef5c4fda73ba24e6096c8f869931405a80aa8b0e604510a26bc", size = 14301053, upload-time = "2025-09-09T15:57:30.077Z" },
+ { url = "https://files.pythonhosted.org/packages/05/24/43da09aa764c68694b76e84b3d3f0c44cb7c18cdc1ba80e48b0ac1d2cd39/numpy-2.3.3-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:40051003e03db4041aa325da2a0971ba41cf65714e65d296397cc0e32de6018b", size = 5229444, upload-time = "2025-09-09T15:57:32.733Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/14/50ffb0f22f7218ef8af28dd089f79f68289a7a05a208db9a2c5dcbe123c1/numpy-2.3.3-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:6ee9086235dd6ab7ae75aba5662f582a81ced49f0f1c6de4260a78d8f2d91a19", size = 6738039, upload-time = "2025-09-09T15:57:34.328Z" },
+ { url = "https://files.pythonhosted.org/packages/55/52/af46ac0795e09657d45a7f4db961917314377edecf66db0e39fa7ab5c3d3/numpy-2.3.3-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:94fcaa68757c3e2e668ddadeaa86ab05499a70725811e582b6a9858dd472fb30", size = 14352314, upload-time = "2025-09-09T15:57:36.255Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/b1/dc226b4c90eb9f07a3fff95c2f0db3268e2e54e5cce97c4ac91518aee71b/numpy-2.3.3-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:da1a74b90e7483d6ce5244053399a614b1d6b7bc30a60d2f570e5071f8959d3e", size = 16701722, upload-time = "2025-09-09T15:57:38.622Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/9d/9d8d358f2eb5eced14dba99f110d83b5cd9a4460895230f3b396ad19a323/numpy-2.3.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:2990adf06d1ecee3b3dcbb4977dfab6e9f09807598d647f04d385d29e7a3c3d3", size = 16132755, upload-time = "2025-09-09T15:57:41.16Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/27/b3922660c45513f9377b3fb42240bec63f203c71416093476ec9aa0719dc/numpy-2.3.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:ed635ff692483b8e3f0fcaa8e7eb8a75ee71aa6d975388224f70821421800cea", size = 18651560, upload-time = "2025-09-09T15:57:43.459Z" },
+ { url = "https://files.pythonhosted.org/packages/5b/8e/3ab61a730bdbbc201bb245a71102aa609f0008b9ed15255500a99cd7f780/numpy-2.3.3-cp313-cp313t-win32.whl", hash = "sha256:a333b4ed33d8dc2b373cc955ca57babc00cd6f9009991d9edc5ddbc1bac36bcd", size = 6442776, upload-time = "2025-09-09T15:57:45.793Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/3a/e22b766b11f6030dc2decdeff5c2fb1610768055603f9f3be88b6d192fb2/numpy-2.3.3-cp313-cp313t-win_amd64.whl", hash = "sha256:4384a169c4d8f97195980815d6fcad04933a7e1ab3b530921c3fef7a1c63426d", size = 12927281, upload-time = "2025-09-09T15:57:47.492Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/42/c2e2bc48c5e9b2a83423f99733950fbefd86f165b468a3d85d52b30bf782/numpy-2.3.3-cp313-cp313t-win_arm64.whl", hash = "sha256:75370986cc0bc66f4ce5110ad35aae6d182cc4ce6433c40ad151f53690130bf1", size = 10265275, upload-time = "2025-09-09T15:57:49.647Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/01/342ad585ad82419b99bcf7cebe99e61da6bedb89e213c5fd71acc467faee/numpy-2.3.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:cd052f1fa6a78dee696b58a914b7229ecfa41f0a6d96dc663c1220a55e137593", size = 20951527, upload-time = "2025-09-09T15:57:52.006Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/d8/204e0d73fc1b7a9ee80ab1fe1983dd33a4d64a4e30a05364b0208e9a241a/numpy-2.3.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:414a97499480067d305fcac9716c29cf4d0d76db6ebf0bf3cbce666677f12652", size = 14186159, upload-time = "2025-09-09T15:57:54.407Z" },
+ { url = "https://files.pythonhosted.org/packages/22/af/f11c916d08f3a18fb8ba81ab72b5b74a6e42ead4c2846d270eb19845bf74/numpy-2.3.3-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:50a5fe69f135f88a2be9b6ca0481a68a136f6febe1916e4920e12f1a34e708a7", size = 5114624, upload-time = "2025-09-09T15:57:56.5Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/11/0ed919c8381ac9d2ffacd63fd1f0c34d27e99cab650f0eb6f110e6ae4858/numpy-2.3.3-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:b912f2ed2b67a129e6a601e9d93d4fa37bef67e54cac442a2f588a54afe5c67a", size = 6642627, upload-time = "2025-09-09T15:57:58.206Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/83/deb5f77cb0f7ba6cb52b91ed388b47f8f3c2e9930d4665c600408d9b90b9/numpy-2.3.3-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9e318ee0596d76d4cb3d78535dc005fa60e5ea348cd131a51e99d0bdbe0b54fe", size = 14296926, upload-time = "2025-09-09T15:58:00.035Z" },
+ { url = "https://files.pythonhosted.org/packages/77/cc/70e59dcb84f2b005d4f306310ff0a892518cc0c8000a33d0e6faf7ca8d80/numpy-2.3.3-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ce020080e4a52426202bdb6f7691c65bb55e49f261f31a8f506c9f6bc7450421", size = 16638958, upload-time = "2025-09-09T15:58:02.738Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/5a/b2ab6c18b4257e099587d5b7f903317bd7115333ad8d4ec4874278eafa61/numpy-2.3.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:e6687dc183aa55dae4a705b35f9c0f8cb178bcaa2f029b241ac5356221d5c021", size = 16071920, upload-time = "2025-09-09T15:58:05.029Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/f1/8b3fdc44324a259298520dd82147ff648979bed085feeacc1250ef1656c0/numpy-2.3.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d8f3b1080782469fdc1718c4ed1d22549b5fb12af0d57d35e992158a772a37cf", size = 18577076, upload-time = "2025-09-09T15:58:07.745Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/a1/b87a284fb15a42e9274e7fcea0dad259d12ddbf07c1595b26883151ca3b4/numpy-2.3.3-cp314-cp314-win32.whl", hash = "sha256:cb248499b0bc3be66ebd6578b83e5acacf1d6cb2a77f2248ce0e40fbec5a76d0", size = 6366952, upload-time = "2025-09-09T15:58:10.096Z" },
+ { url = "https://files.pythonhosted.org/packages/70/5f/1816f4d08f3b8f66576d8433a66f8fa35a5acfb3bbd0bf6c31183b003f3d/numpy-2.3.3-cp314-cp314-win_amd64.whl", hash = "sha256:691808c2b26b0f002a032c73255d0bd89751425f379f7bcd22d140db593a96e8", size = 12919322, upload-time = "2025-09-09T15:58:12.138Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/de/072420342e46a8ea41c324a555fa90fcc11637583fb8df722936aed1736d/numpy-2.3.3-cp314-cp314-win_arm64.whl", hash = "sha256:9ad12e976ca7b10f1774b03615a2a4bab8addce37ecc77394d8e986927dc0dfe", size = 10478630, upload-time = "2025-09-09T15:58:14.64Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/df/ee2f1c0a9de7347f14da5dd3cd3c3b034d1b8607ccb6883d7dd5c035d631/numpy-2.3.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:9cc48e09feb11e1db00b320e9d30a4151f7369afb96bd0e48d942d09da3a0d00", size = 21047987, upload-time = "2025-09-09T15:58:16.889Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/92/9453bdc5a4e9e69cf4358463f25e8260e2ffc126d52e10038b9077815989/numpy-2.3.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:901bf6123879b7f251d3631967fd574690734236075082078e0571977c6a8e6a", size = 14301076, upload-time = "2025-09-09T15:58:20.343Z" },
+ { url = "https://files.pythonhosted.org/packages/13/77/1447b9eb500f028bb44253105bd67534af60499588a5149a94f18f2ca917/numpy-2.3.3-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:7f025652034199c301049296b59fa7d52c7e625017cae4c75d8662e377bf487d", size = 5229491, upload-time = "2025-09-09T15:58:22.481Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/f9/d72221b6ca205f9736cb4b2ce3b002f6e45cd67cd6a6d1c8af11a2f0b649/numpy-2.3.3-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:533ca5f6d325c80b6007d4d7fb1984c303553534191024ec6a524a4c92a5935a", size = 6737913, upload-time = "2025-09-09T15:58:24.569Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/5f/d12834711962ad9c46af72f79bb31e73e416ee49d17f4c797f72c96b6ca5/numpy-2.3.3-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0edd58682a399824633b66885d699d7de982800053acf20be1eaa46d92009c54", size = 14352811, upload-time = "2025-09-09T15:58:26.416Z" },
+ { url = "https://files.pythonhosted.org/packages/a1/0d/fdbec6629d97fd1bebed56cd742884e4eead593611bbe1abc3eb40d304b2/numpy-2.3.3-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:367ad5d8fbec5d9296d18478804a530f1191e24ab4d75ab408346ae88045d25e", size = 16702689, upload-time = "2025-09-09T15:58:28.831Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/09/0a35196dc5575adde1eb97ddfbc3e1687a814f905377621d18ca9bc2b7dd/numpy-2.3.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8f6ac61a217437946a1fa48d24c47c91a0c4f725237871117dea264982128097", size = 16133855, upload-time = "2025-09-09T15:58:31.349Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/ca/c9de3ea397d576f1b6753eaa906d4cdef1bf97589a6d9825a349b4729cc2/numpy-2.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:179a42101b845a816d464b6fe9a845dfaf308fdfc7925387195570789bb2c970", size = 18652520, upload-time = "2025-09-09T15:58:33.762Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/c2/e5ed830e08cd0196351db55db82f65bc0ab05da6ef2b72a836dcf1936d2f/numpy-2.3.3-cp314-cp314t-win32.whl", hash = "sha256:1250c5d3d2562ec4174bce2e3a1523041595f9b651065e4a4473f5f48a6bc8a5", size = 6515371, upload-time = "2025-09-09T15:58:36.04Z" },
+ { url = "https://files.pythonhosted.org/packages/47/c7/b0f6b5b67f6788a0725f744496badbb604d226bf233ba716683ebb47b570/numpy-2.3.3-cp314-cp314t-win_amd64.whl", hash = "sha256:b37a0b2e5935409daebe82c1e42274d30d9dd355852529eab91dab8dcca7419f", size = 13112576, upload-time = "2025-09-09T15:58:37.927Z" },
+ { url = "https://files.pythonhosted.org/packages/06/b9/33bba5ff6fb679aa0b1f8a07e853f002a6b04b9394db3069a1270a7784ca/numpy-2.3.3-cp314-cp314t-win_arm64.whl", hash = "sha256:78c9f6560dc7e6b3990e32df7ea1a50bbd0e2a111e05209963f5ddcab7073b0b", size = 10545953, upload-time = "2025-09-09T15:58:40.576Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cublas-cu12"
+ version = "12.8.4.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/dc/61/e24b560ab2e2eaeb3c839129175fb330dfcfc29e5203196e5541a4c44682/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:8ac4e771d5a348c551b2a426eda6193c19aa630236b418086020df5ba9667142", size = 594346921, upload-time = "2025-03-07T01:44:31.254Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-cupti-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f8/02/2adcaa145158bf1a8295d83591d22e4103dbfd821bcaf6f3f53151ca4ffa/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ea0cb07ebda26bb9b29ba82cda34849e73c166c18162d3913575b0c9db9a6182", size = 10248621, upload-time = "2025-03-07T01:40:21.213Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-nvrtc-cu12"
+ version = "12.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/05/6b/32f747947df2da6994e999492ab306a903659555dddc0fbdeb9d71f75e52/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:a7756528852ef889772a84c6cd89d41dfa74667e24cca16bb31f8f061e3e9994", size = 88040029, upload-time = "2025-03-07T01:42:13.562Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-runtime-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/0d/9b/a997b638fcd068ad6e4d53b8551a7d30fe8b404d6f1804abf1df69838932/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adade8dcbd0edf427b7204d480d6066d33902cab2a4707dcfc48a2d0fd44ab90", size = 954765, upload-time = "2025-03-07T01:40:01.615Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cudnn-cu12"
+ version = "9.10.2.21"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-cublas-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/ba/51/e123d997aa098c61d029f76663dedbfb9bc8dcf8c60cbd6adbe42f76d049/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:949452be657fa16687d0930933f032835951ef0892b37d2d53824d1a84dc97a8", size = 706758467, upload-time = "2025-06-06T21:54:08.597Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cufft-cu12"
+ version = "11.3.3.83"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/1f/13/ee4e00f30e676b66ae65b4f08cb5bcbb8392c03f54f2d5413ea99a5d1c80/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d2dd21ec0b88cf61b62e6b43564355e5222e4a3fb394cac0db101f2dd0d4f74", size = 193118695, upload-time = "2025-03-07T01:45:27.821Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cufile-cu12"
+ version = "1.13.1.3"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/bb/fe/1bcba1dfbfb8d01be8d93f07bfc502c93fa23afa6fd5ab3fc7c1df71038a/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1d069003be650e131b21c932ec3d8969c1715379251f8d23a1860554b1cb24fc", size = 1197834, upload-time = "2025-03-07T01:45:50.723Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-curand-cu12"
+ version = "10.3.9.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/fb/aa/6584b56dc84ebe9cf93226a5cde4d99080c8e90ab40f0c27bda7a0f29aa1/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:b32331d4f4df5d6eefa0554c565b626c7216f87a06a4f56fab27c3b68a830ec9", size = 63619976, upload-time = "2025-03-07T01:46:23.323Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusolver-cu12"
+ version = "11.7.3.90"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-cublas-cu12" },
+ { name = "nvidia-cusparse-cu12" },
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/85/48/9a13d2975803e8cf2777d5ed57b87a0b6ca2cc795f9a4f59796a910bfb80/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:4376c11ad263152bd50ea295c05370360776f8c3427b30991df774f9fb26c450", size = 267506905, upload-time = "2025-03-07T01:47:16.273Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusparse-cu12"
+ version = "12.5.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/c2/f5/e1854cb2f2bcd4280c44736c93550cc300ff4b8c95ebe370d0aa7d2b473d/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1ec05d76bbbd8b61b06a80e1eaf8cf4959c3d4ce8e711b65ebd0443bb0ebb13b", size = 288216466, upload-time = "2025-03-07T01:48:13.779Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusparselt-cu12"
+ version = "0.7.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/56/79/12978b96bd44274fe38b5dde5cfb660b1d114f70a65ef962bcbbed99b549/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f1bb701d6b930d5a7cea44c19ceb973311500847f81b634d802b7b539dc55623", size = 287193691, upload-time = "2025-02-26T00:15:44.104Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nccl-cu12"
+ version = "2.27.3"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/5c/5b/4e4fff7bad39adf89f735f2bc87248c81db71205b62bcc0d5ca5b606b3c3/nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adf27ccf4238253e0b826bce3ff5fa532d65fc42322c8bfdfaf28024c0fbe039", size = 322364134, upload-time = "2025-06-03T21:58:04.013Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nvjitlink-cu12"
+ version = "12.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f6/74/86a07f1d0f42998ca31312f998bd3b9a7eff7f52378f4f270c8679c77fb9/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:81ff63371a7ebd6e6451970684f916be2eab07321b73c9d244dc2b4da7f73b88", size = 39254836, upload-time = "2025-03-07T01:49:55.661Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nvtx-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
+ ]
+
+ [[package]]
+ name = "packaging"
+ version = "25.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/a1/d4/1fc4078c65507b51b96ca8f8c3ba19e6a61c8253c72794544580a7b6c24d/packaging-25.0.tar.gz", hash = "sha256:d443872c98d677bf60f6a1f2f8c1cb748e8fe762d2bf9d3148b5599295b0fc4f", size = 165727, upload-time = "2025-04-19T11:48:59.673Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" },
+ ]
+
+ [[package]]
+ name = "protobuf"
+ version = "6.32.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/fa/a4/cc17347aa2897568beece2e674674359f911d6fe21b0b8d6268cd42727ac/protobuf-6.32.1.tar.gz", hash = "sha256:ee2469e4a021474ab9baafea6cd070e5bf27c7d29433504ddea1a4ee5850f68d", size = 440635, upload-time = "2025-09-11T21:38:42.935Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/c0/98/645183ea03ab3995d29086b8bf4f7562ebd3d10c9a4b14ee3f20d47cfe50/protobuf-6.32.1-cp310-abi3-win32.whl", hash = "sha256:a8a32a84bc9f2aad712041b8b366190f71dde248926da517bde9e832e4412085", size = 424411, upload-time = "2025-09-11T21:38:27.427Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/f3/6f58f841f6ebafe076cebeae33fc336e900619d34b1c93e4b5c97a81fdfa/protobuf-6.32.1-cp310-abi3-win_amd64.whl", hash = "sha256:b00a7d8c25fa471f16bc8153d0e53d6c9e827f0953f3c09aaa4331c718cae5e1", size = 435738, upload-time = "2025-09-11T21:38:30.959Z" },
+ { url = "https://files.pythonhosted.org/packages/10/56/a8a3f4e7190837139e68c7002ec749190a163af3e330f65d90309145a210/protobuf-6.32.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d8c7e6eb619ffdf105ee4ab76af5a68b60a9d0f66da3ea12d1640e6d8dab7281", size = 426454, upload-time = "2025-09-11T21:38:34.076Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/be/8dd0a927c559b37d7a6c8ab79034fd167dcc1f851595f2e641ad62be8643/protobuf-6.32.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:2f5b80a49e1eb7b86d85fcd23fe92df154b9730a725c3b38c4e43b9d77018bf4", size = 322874, upload-time = "2025-09-11T21:38:35.509Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/f6/88d77011b605ef979aace37b7703e4eefad066f7e84d935e5a696515c2dd/protobuf-6.32.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:b1864818300c297265c83a4982fd3169f97122c299f56a56e2445c3698d34710", size = 322013, upload-time = "2025-09-11T21:38:37.017Z" },
+ { url = "https://files.pythonhosted.org/packages/97/b7/15cc7d93443d6c6a84626ae3258a91f4c6ac8c0edd5df35ea7658f71b79c/protobuf-6.32.1-py3-none-any.whl", hash = "sha256:2601b779fc7d32a866c6b4404f9d42a3f67c5b9f3f15b4db3cccabe06b95c346", size = 169289, upload-time = "2025-09-11T21:38:41.234Z" },
+ ]
+
+ [[package]]
+ name = "pyyaml"
+ version = "6.0.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/54/ed/79a089b6be93607fa5cdaedf301d7dfb23af5f25c398d5ead2525b063e17/pyyaml-6.0.2.tar.gz", hash = "sha256:d584d9ec91ad65861cc08d42e834324ef890a082e591037abe114850ff7bbc3e", size = 130631, upload-time = "2024-08-06T20:33:50.674Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/ef/e3/3af305b830494fa85d95f6d95ef7fa73f2ee1cc8ef5b495c7c3269fb835f/PyYAML-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efdca5630322a10774e8e98e1af481aad470dd62c3170801852d752aa7a783ba", size = 181309, upload-time = "2024-08-06T20:32:43.4Z" },
+ { url = "https://files.pythonhosted.org/packages/45/9f/3b1c20a0b7a3200524eb0076cc027a970d320bd3a6592873c85c92a08731/PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:50187695423ffe49e2deacb8cd10510bc361faac997de9efef88badc3bb9e2d1", size = 171679, upload-time = "2024-08-06T20:32:44.801Z" },
+ { url = "https://files.pythonhosted.org/packages/7c/9a/337322f27005c33bcb656c655fa78325b730324c78620e8328ae28b64d0c/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0ffe8360bab4910ef1b9e87fb812d8bc0a308b0d0eef8c8f44e0254ab3b07133", size = 733428, upload-time = "2024-08-06T20:32:46.432Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/69/864fbe19e6c18ea3cc196cbe5d392175b4cf3d5d0ac1403ec3f2d237ebb5/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:17e311b6c678207928d649faa7cb0d7b4c26a0ba73d41e99c4fff6b6c3276484", size = 763361, upload-time = "2024-08-06T20:32:51.188Z" },
+ { url = "https://files.pythonhosted.org/packages/04/24/b7721e4845c2f162d26f50521b825fb061bc0a5afcf9a386840f23ea19fa/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b189594dbe54f75ab3a1acec5f1e3faa7e8cf2f1e08d9b561cb41b845f69d5", size = 759523, upload-time = "2024-08-06T20:32:53.019Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/b2/e3234f59ba06559c6ff63c4e10baea10e5e7df868092bf9ab40e5b9c56b6/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:41e4e3953a79407c794916fa277a82531dd93aad34e29c2a514c2c0c5fe971cc", size = 726660, upload-time = "2024-08-06T20:32:54.708Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/0f/25911a9f080464c59fab9027482f822b86bf0608957a5fcc6eaac85aa515/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:68ccc6023a3400877818152ad9a1033e3db8625d899c72eacb5a668902e4d652", size = 751597, upload-time = "2024-08-06T20:32:56.985Z" },
+ { url = "https://files.pythonhosted.org/packages/14/0d/e2c3b43bbce3cf6bd97c840b46088a3031085179e596d4929729d8d68270/PyYAML-6.0.2-cp313-cp313-win32.whl", hash = "sha256:bc2fa7c6b47d6bc618dd7fb02ef6fdedb1090ec036abab80d4681424b84c1183", size = 140527, upload-time = "2024-08-06T20:33:03.001Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" },
+ ]
+
+ [[package]]
+ name = "regex"
+ version = "2025.9.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/b2/5a/4c63457fbcaf19d138d72b2e9b39405954f98c0349b31c601bfcb151582c/regex-2025.9.1.tar.gz", hash = "sha256:88ac07b38d20b54d79e704e38aa3bd2c0f8027432164226bdee201a1c0c9c9ff", size = 400852, upload-time = "2025-09-01T22:10:10.479Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/98/25/b2959ce90c6138c5142fe5264ee1f9b71a0c502ca4c7959302a749407c79/regex-2025.9.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:bc6834727d1b98d710a63e6c823edf6ffbf5792eba35d3fa119531349d4142ef", size = 485932, upload-time = "2025-09-01T22:08:57.913Z" },
+ { url = "https://files.pythonhosted.org/packages/49/2e/6507a2a85f3f2be6643438b7bd976e67ad73223692d6988eb1ff444106d3/regex-2025.9.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:c3dc05b6d579875719bccc5f3037b4dc80433d64e94681a0061845bd8863c025", size = 289568, upload-time = "2025-09-01T22:08:59.258Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/d8/de4a4b57215d99868f1640e062a7907e185ec7476b4b689e2345487c1ff4/regex-2025.9.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:22213527df4c985ec4a729b055a8306272d41d2f45908d7bacb79be0fa7a75ad", size = 286984, upload-time = "2025-09-01T22:09:00.835Z" },
+ { url = "https://files.pythonhosted.org/packages/03/15/e8cb403403a57ed316e80661db0e54d7aa2efcd85cb6156f33cc18746922/regex-2025.9.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8e3f6e3c5a5a1adc3f7ea1b5aec89abfc2f4fbfba55dafb4343cd1d084f715b2", size = 797514, upload-time = "2025-09-01T22:09:02.538Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/26/2446f2b9585fed61faaa7e2bbce3aca7dd8df6554c32addee4c4caecf24a/regex-2025.9.1-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:bcb89c02a0d6c2bec9b0bb2d8c78782699afe8434493bfa6b4021cc51503f249", size = 862586, upload-time = "2025-09-01T22:09:04.322Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/b8/82ffbe9c0992c31bbe6ae1c4b4e21269a5df2559102b90543c9b56724c3c/regex-2025.9.1-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b0e2f95413eb0c651cd1516a670036315b91b71767af83bc8525350d4375ccba", size = 910815, upload-time = "2025-09-01T22:09:05.978Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/d8/7303ea38911759c1ee30cc5bc623ee85d3196b733c51fd6703c34290a8d9/regex-2025.9.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:09a41dc039e1c97d3c2ed3e26523f748e58c4de3ea7a31f95e1cf9ff973fff5a", size = 802042, upload-time = "2025-09-01T22:09:07.865Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/0e/6ad51a55ed4b5af512bb3299a05d33309bda1c1d1e1808fa869a0bed31bc/regex-2025.9.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4f0b4258b161094f66857a26ee938d3fe7b8a5063861e44571215c44fbf0e5df", size = 786764, upload-time = "2025-09-01T22:09:09.362Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/d5/394e3ffae6baa5a9217bbd14d96e0e5da47bb069d0dbb8278e2681a2b938/regex-2025.9.1-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:bf70e18ac390e6977ea7e56f921768002cb0fa359c4199606c7219854ae332e0", size = 856557, upload-time = "2025-09-01T22:09:11.129Z" },
+ { url = "https://files.pythonhosted.org/packages/cd/80/b288d3910c41194ad081b9fb4b371b76b0bbfdce93e7709fc98df27b37dc/regex-2025.9.1-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:b84036511e1d2bb0a4ff1aec26951caa2dea8772b223c9e8a19ed8885b32dbac", size = 849108, upload-time = "2025-09-01T22:09:12.877Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/cd/5ec76bf626d0d5abdc277b7a1734696f5f3d14fbb4a3e2540665bc305d85/regex-2025.9.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c2e05dcdfe224047f2a59e70408274c325d019aad96227ab959403ba7d58d2d7", size = 788201, upload-time = "2025-09-01T22:09:14.561Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/36/674672f3fdead107565a2499f3007788b878188acec6d42bc141c5366c2c/regex-2025.9.1-cp313-cp313-win32.whl", hash = "sha256:3b9a62107a7441b81ca98261808fed30ae36ba06c8b7ee435308806bd53c1ed8", size = 264508, upload-time = "2025-09-01T22:09:16.193Z" },
+ { url = "https://files.pythonhosted.org/packages/83/ad/931134539515eb64ce36c24457a98b83c1b2e2d45adf3254b94df3735a76/regex-2025.9.1-cp313-cp313-win_amd64.whl", hash = "sha256:b38afecc10c177eb34cfae68d669d5161880849ba70c05cbfbe409f08cc939d7", size = 275469, upload-time = "2025-09-01T22:09:17.462Z" },
+ { url = "https://files.pythonhosted.org/packages/24/8c/96d34e61c0e4e9248836bf86d69cb224fd222f270fa9045b24e218b65604/regex-2025.9.1-cp313-cp313-win_arm64.whl", hash = "sha256:ec329890ad5e7ed9fc292858554d28d58d56bf62cf964faf0aa57964b21155a0", size = 268586, upload-time = "2025-09-01T22:09:18.948Z" },
+ { url = "https://files.pythonhosted.org/packages/21/b1/453cbea5323b049181ec6344a803777914074b9726c9c5dc76749966d12d/regex-2025.9.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:72fb7a016467d364546f22b5ae86c45680a4e0de6b2a6f67441d22172ff641f1", size = 486111, upload-time = "2025-09-01T22:09:20.734Z" },
+ { url = "https://files.pythonhosted.org/packages/f6/0e/92577f197bd2f7652c5e2857f399936c1876978474ecc5b068c6d8a79c86/regex-2025.9.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:c9527fa74eba53f98ad86be2ba003b3ebe97e94b6eb2b916b31b5f055622ef03", size = 289520, upload-time = "2025-09-01T22:09:22.249Z" },
+ { url = "https://files.pythonhosted.org/packages/af/c6/b472398116cca7ea5a6c4d5ccd0fc543f7fd2492cb0c48d2852a11972f73/regex-2025.9.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c905d925d194c83a63f92422af7544ec188301451b292c8b487f0543726107ca", size = 287215, upload-time = "2025-09-01T22:09:23.657Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/11/f12ecb0cf9ca792a32bb92f758589a84149017467a544f2f6bfb45c0356d/regex-2025.9.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:74df7c74a63adcad314426b1f4ea6054a5ab25d05b0244f0c07ff9ce640fa597", size = 797855, upload-time = "2025-09-01T22:09:25.197Z" },
+ { url = "https://files.pythonhosted.org/packages/46/88/bbb848f719a540fb5997e71310f16f0b33a92c5d4b4d72d4311487fff2a3/regex-2025.9.1-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:4f6e935e98ea48c7a2e8be44494de337b57a204470e7f9c9c42f912c414cd6f5", size = 863363, upload-time = "2025-09-01T22:09:26.705Z" },
+ { url = "https://files.pythonhosted.org/packages/54/a9/2321eb3e2838f575a78d48e03c1e83ea61bd08b74b7ebbdeca8abc50fc25/regex-2025.9.1-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:4a62d033cd9ebefc7c5e466731a508dfabee827d80b13f455de68a50d3c2543d", size = 910202, upload-time = "2025-09-01T22:09:28.906Z" },
+ { url = "https://files.pythonhosted.org/packages/33/07/d1d70835d7d11b7e126181f316f7213c4572ecf5c5c97bdbb969fb1f38a2/regex-2025.9.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef971ebf2b93bdc88d8337238be4dfb851cc97ed6808eb04870ef67589415171", size = 801808, upload-time = "2025-09-01T22:09:30.733Z" },
+ { url = "https://files.pythonhosted.org/packages/13/d1/29e4d1bed514ef2bf3a4ead3cb8bb88ca8af94130239a4e68aa765c35b1c/regex-2025.9.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:d936a1db208bdca0eca1f2bb2c1ba1d8370b226785c1e6db76e32a228ffd0ad5", size = 786824, upload-time = "2025-09-01T22:09:32.61Z" },
+ { url = "https://files.pythonhosted.org/packages/33/27/20d8ccb1bee460faaa851e6e7cc4cfe852a42b70caa1dca22721ba19f02f/regex-2025.9.1-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:7e786d9e4469698fc63815b8de08a89165a0aa851720eb99f5e0ea9d51dd2b6a", size = 857406, upload-time = "2025-09-01T22:09:34.117Z" },
+ { url = "https://files.pythonhosted.org/packages/74/fe/60c6132262dc36430d51e0c46c49927d113d3a38c1aba6a26c7744c84cf3/regex-2025.9.1-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:6b81d7dbc5466ad2c57ce3a0ddb717858fe1a29535c8866f8514d785fdb9fc5b", size = 848593, upload-time = "2025-09-01T22:09:35.598Z" },
+ { url = "https://files.pythonhosted.org/packages/cc/ae/2d4ff915622fabbef1af28387bf71e7f2f4944a348b8460d061e85e29bf0/regex-2025.9.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:cd4890e184a6feb0ef195338a6ce68906a8903a0f2eb7e0ab727dbc0a3156273", size = 787951, upload-time = "2025-09-01T22:09:37.139Z" },
+ { url = "https://files.pythonhosted.org/packages/85/37/dc127703a9e715a284cc2f7dbdd8a9776fd813c85c126eddbcbdd1ca5fec/regex-2025.9.1-cp314-cp314-win32.whl", hash = "sha256:34679a86230e46164c9e0396b56cab13c0505972343880b9e705083cc5b8ec86", size = 269833, upload-time = "2025-09-01T22:09:39.245Z" },
+ { url = "https://files.pythonhosted.org/packages/83/bf/4bed4d3d0570e16771defd5f8f15f7ea2311edcbe91077436d6908956c4a/regex-2025.9.1-cp314-cp314-win_amd64.whl", hash = "sha256:a1196e530a6bfa5f4bde029ac5b0295a6ecfaaffbfffede4bbaf4061d9455b70", size = 278742, upload-time = "2025-09-01T22:09:40.651Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/3e/7d7ac6fd085023312421e0d69dfabdfb28e116e513fadbe9afe710c01893/regex-2025.9.1-cp314-cp314-win_arm64.whl", hash = "sha256:f46d525934871ea772930e997d577d48c6983e50f206ff7b66d4ac5f8941e993", size = 271860, upload-time = "2025-09-01T22:09:42.413Z" },
+ ]
+
+ [[package]]
+ name = "requests"
+ version = "2.32.5"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "certifi" },
+ { name = "charset-normalizer" },
+ { name = "idna" },
+ { name = "urllib3" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
+ ]
+
+ [[package]]
+ name = "safetensors"
+ version = "0.6.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/ac/cc/738f3011628920e027a11754d9cae9abec1aed00f7ae860abbf843755233/safetensors-0.6.2.tar.gz", hash = "sha256:43ff2aa0e6fa2dc3ea5524ac7ad93a9839256b8703761e76e2d0b2a3fa4f15d9", size = 197968, upload-time = "2025-08-08T13:13:58.654Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/4d/b1/3f5fd73c039fc87dba3ff8b5d528bfc5a32b597fea8e7a6a4800343a17c7/safetensors-0.6.2-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:9c85ede8ec58f120bad982ec47746981e210492a6db876882aa021446af8ffba", size = 454797, upload-time = "2025-08-08T13:13:52.066Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/c9/bb114c158540ee17907ec470d01980957fdaf87b4aa07914c24eba87b9c6/safetensors-0.6.2-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:d6675cf4b39c98dbd7d940598028f3742e0375a6b4d4277e76beb0c35f4b843b", size = 432206, upload-time = "2025-08-08T13:13:50.931Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/8e/f70c34e47df3110e8e0bb268d90db8d4be8958a54ab0336c9be4fe86dac8/safetensors-0.6.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1d2d2b3ce1e2509c68932ca03ab8f20570920cd9754b05063d4368ee52833ecd", size = 473261, upload-time = "2025-08-08T13:13:41.259Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/f5/be9c6a7c7ef773e1996dc214e73485286df1836dbd063e8085ee1976f9cb/safetensors-0.6.2-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:93de35a18f46b0f5a6a1f9e26d91b442094f2df02e9fd7acf224cfec4238821a", size = 485117, upload-time = "2025-08-08T13:13:43.506Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/55/23f2d0a2c96ed8665bf17a30ab4ce5270413f4d74b6d87dd663258b9af31/safetensors-0.6.2-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:89a89b505f335640f9120fac65ddeb83e40f1fd081cb8ed88b505bdccec8d0a1", size = 616154, upload-time = "2025-08-08T13:13:45.096Z" },
+ { url = "https://files.pythonhosted.org/packages/98/c6/affb0bd9ce02aa46e7acddbe087912a04d953d7a4d74b708c91b5806ef3f/safetensors-0.6.2-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:fc4d0d0b937e04bdf2ae6f70cd3ad51328635fe0e6214aa1fc811f3b576b3bda", size = 520713, upload-time = "2025-08-08T13:13:46.25Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/5d/5a514d7b88e310c8b146e2404e0dc161282e78634d9358975fd56dfd14be/safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8045db2c872db8f4cbe3faa0495932d89c38c899c603f21e9b6486951a5ecb8f", size = 485835, upload-time = "2025-08-08T13:13:49.373Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/7b/4fc3b2ba62c352b2071bea9cfbad330fadda70579f617506ae1a2f129cab/safetensors-0.6.2-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:81e67e8bab9878bb568cffbc5f5e655adb38d2418351dc0859ccac158f753e19", size = 521503, upload-time = "2025-08-08T13:13:47.651Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/50/0057e11fe1f3cead9254315a6c106a16dd4b1a19cd247f7cc6414f6b7866/safetensors-0.6.2-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:b0e4d029ab0a0e0e4fdf142b194514695b1d7d3735503ba700cf36d0fc7136ce", size = 652256, upload-time = "2025-08-08T13:13:53.167Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/29/473f789e4ac242593ac1656fbece6e1ecd860bb289e635e963667807afe3/safetensors-0.6.2-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:fa48268185c52bfe8771e46325a1e21d317207bcabcb72e65c6e28e9ffeb29c7", size = 747281, upload-time = "2025-08-08T13:13:54.656Z" },
+ { url = "https://files.pythonhosted.org/packages/68/52/f7324aad7f2df99e05525c84d352dc217e0fa637a4f603e9f2eedfbe2c67/safetensors-0.6.2-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:d83c20c12c2d2f465997c51b7ecb00e407e5f94d7dec3ea0cc11d86f60d3fde5", size = 692286, upload-time = "2025-08-08T13:13:55.884Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/fe/cad1d9762868c7c5dc70c8620074df28ebb1a8e4c17d4c0cb031889c457e/safetensors-0.6.2-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:d944cea65fad0ead848b6ec2c37cc0b197194bec228f8020054742190e9312ac", size = 655957, upload-time = "2025-08-08T13:13:57.029Z" },
+ { url = "https://files.pythonhosted.org/packages/59/a7/e2158e17bbe57d104f0abbd95dff60dda916cf277c9f9663b4bf9bad8b6e/safetensors-0.6.2-cp38-abi3-win32.whl", hash = "sha256:cab75ca7c064d3911411461151cb69380c9225798a20e712b102edda2542ddb1", size = 308926, upload-time = "2025-08-08T13:14:01.095Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/c3/c0be1135726618dc1e28d181b8c442403d8dbb9e273fd791de2d4384bcdd/safetensors-0.6.2-cp38-abi3-win_amd64.whl", hash = "sha256:c7b214870df923cbc1593c3faee16bec59ea462758699bd3fee399d00aac072c", size = 320192, upload-time = "2025-08-08T13:13:59.467Z" },
+ ]
+
+ [[package]]
+ name = "setuptools"
+ version = "80.9.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/18/5d/3bf57dcd21979b887f014ea83c24ae194cfcd12b9e0fda66b957c69d1fca/setuptools-80.9.0.tar.gz", hash = "sha256:f36b47402ecde768dbfafc46e8e4207b4360c654f1f3bb84475f0a28628fb19c", size = 1319958, upload-time = "2025-05-27T00:56:51.443Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922", size = 1201486, upload-time = "2025-05-27T00:56:49.664Z" },
+ ]
+
+ [[package]]
+ name = "sympy"
+ version = "1.14.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "mpmath" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/83/d3/803453b36afefb7c2bb238361cd4ae6125a569b4db67cd9e79846ba2d68c/sympy-1.14.0.tar.gz", hash = "sha256:d3d3fe8df1e5a0b42f0e7bdf50541697dbe7d23746e894990c030e2b05e72517", size = 7793921, upload-time = "2025-04-27T18:05:01.611Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" },
+ ]
+
+ [[package]]
+ name = "tokenizers"
+ version = "0.22.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "huggingface-hub" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/5e/b4/c1ce3699e81977da2ace8b16d2badfd42b060e7d33d75c4ccdbf9dc920fa/tokenizers-0.22.0.tar.gz", hash = "sha256:2e33b98525be8453f355927f3cab312c36cd3e44f4d7e9e97da2fa94d0a49dcb", size = 362771, upload-time = "2025-08-29T10:25:33.914Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/6d/b1/18c13648edabbe66baa85fe266a478a7931ddc0cd1ba618802eb7b8d9865/tokenizers-0.22.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:eaa9620122a3fb99b943f864af95ed14c8dfc0f47afa3b404ac8c16b3f2bb484", size = 3081954, upload-time = "2025-08-29T10:25:24.993Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/02/c3c454b641bd7c4f79e4464accfae9e7dfc913a777d2e561e168ae060362/tokenizers-0.22.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:71784b9ab5bf0ff3075bceeb198149d2c5e068549c0d18fe32d06ba0deb63f79", size = 2945644, upload-time = "2025-08-29T10:25:23.405Z" },
+ { url = "https://files.pythonhosted.org/packages/55/02/d10185ba2fd8c2d111e124c9d92de398aee0264b35ce433f79fb8472f5d0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ec5b71f668a8076802b0241a42387d48289f25435b86b769ae1837cad4172a17", size = 3254764, upload-time = "2025-08-29T10:25:12.445Z" },
+ { url = "https://files.pythonhosted.org/packages/13/89/17514bd7ef4bf5bfff58e2b131cec0f8d5cea2b1c8ffe1050a2c8de88dbb/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ea8562fa7498850d02a16178105b58803ea825b50dc9094d60549a7ed63654bb", size = 3161654, upload-time = "2025-08-29T10:25:15.493Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/d8/bac9f3a7ef6dcceec206e3857c3b61bb16c6b702ed7ae49585f5bd85c0ef/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4136e1558a9ef2e2f1de1555dcd573e1cbc4a320c1a06c4107a3d46dc8ac6e4b", size = 3511484, upload-time = "2025-08-29T10:25:20.477Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/27/9c9800eb6763683010a4851db4d1802d8cab9cec114c17056eccb4d4a6e0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cdf5954de3962a5fd9781dc12048d24a1a6f1f5df038c6e95db328cd22964206", size = 3712829, upload-time = "2025-08-29T10:25:17.154Z" },
+ { url = "https://files.pythonhosted.org/packages/10/e3/b1726dbc1f03f757260fa21752e1921445b5bc350389a8314dd3338836db/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8337ca75d0731fc4860e6204cc24bb36a67d9736142aa06ed320943b50b1e7ed", size = 3408934, upload-time = "2025-08-29T10:25:18.76Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/61/aeab3402c26874b74bb67a7f2c4b569dde29b51032c5384db592e7b216f4/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a89264e26f63c449d8cded9061adea7b5de53ba2346fc7e87311f7e4117c1cc8", size = 3345585, upload-time = "2025-08-29T10:25:22.08Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/d3/498b4a8a8764cce0900af1add0f176ff24f475d4413d55b760b8cdf00893/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:790bad50a1b59d4c21592f9c3cf5e5cf9c3c7ce7e1a23a739f13e01fb1be377a", size = 9322986, upload-time = "2025-08-29T10:25:26.607Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/62/92378eb1c2c565837ca3cb5f9569860d132ab9d195d7950c1ea2681dffd0/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:76cf6757c73a10ef10bf06fa937c0ec7393d90432f543f49adc8cab3fb6f26cb", size = 9276630, upload-time = "2025-08-29T10:25:28.349Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/f0/342d80457aa1cda7654327460f69db0d69405af1e4c453f4dc6ca7c4a76e/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:1626cb186e143720c62c6c6b5371e62bbc10af60481388c0da89bc903f37ea0c", size = 9547175, upload-time = "2025-08-29T10:25:29.989Z" },
+ { url = "https://files.pythonhosted.org/packages/14/84/8aa9b4adfc4fbd09381e20a5bc6aa27040c9c09caa89988c01544e008d18/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:da589a61cbfea18ae267723d6b029b84598dc8ca78db9951d8f5beff72d8507c", size = 9692735, upload-time = "2025-08-29T10:25:32.089Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/24/83ee2b1dc76bfe05c3142e7d0ccdfe69f0ad2f1ebf6c726cea7f0874c0d0/tokenizers-0.22.0-cp39-abi3-win32.whl", hash = "sha256:dbf9d6851bddae3e046fedfb166f47743c1c7bd11c640f0691dd35ef0bcad3be", size = 2471915, upload-time = "2025-08-29T10:25:36.411Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/9b/0e0bf82214ee20231845b127aa4a8015936ad5a46779f30865d10e404167/tokenizers-0.22.0-cp39-abi3-win_amd64.whl", hash = "sha256:c78174859eeaee96021f248a56c801e36bfb6bd5b067f2e95aa82445ca324f00", size = 2680494, upload-time = "2025-08-29T10:25:35.14Z" },
+ ]
+
+ [[package]]
+ name = "torch"
+ version = "2.8.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "jinja2" },
+ { name = "networkx" },
+ { name = "nvidia-cublas-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-cupti-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-nvrtc-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-runtime-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cudnn-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cufft-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cufile-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-curand-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusolver-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusparse-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusparselt-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nccl-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nvjitlink-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nvtx-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "setuptools" },
+ { name = "sympy" },
+ { name = "triton", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "typing-extensions" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/10/4e/469ced5a0603245d6a19a556e9053300033f9c5baccf43a3d25ba73e189e/torch-2.8.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:2b2f96814e0345f5a5aed9bf9734efa913678ed19caf6dc2cddb7930672d6128", size = 101936856, upload-time = "2025-08-06T14:54:01.526Z" },
+ { url = "https://files.pythonhosted.org/packages/16/82/3948e54c01b2109238357c6f86242e6ecbf0c63a1af46906772902f82057/torch-2.8.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:65616ca8ec6f43245e1f5f296603e33923f4c30f93d65e103d9e50c25b35150b", size = 887922844, upload-time = "2025-08-06T14:55:50.78Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/54/941ea0a860f2717d86a811adf0c2cd01b3983bdd460d0803053c4e0b8649/torch-2.8.0-cp313-cp313-win_amd64.whl", hash = "sha256:659df54119ae03e83a800addc125856effda88b016dfc54d9f65215c3975be16", size = 241330968, upload-time = "2025-08-06T14:54:45.293Z" },
+ { url = "https://files.pythonhosted.org/packages/de/69/8b7b13bba430f5e21d77708b616f767683629fc4f8037564a177d20f90ed/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:1a62a1ec4b0498930e2543535cf70b1bef8c777713de7ceb84cd79115f553767", size = 73915128, upload-time = "2025-08-06T14:54:34.769Z" },
+ { url = "https://files.pythonhosted.org/packages/15/0e/8a800e093b7f7430dbaefa80075aee9158ec22e4c4fc3c1a66e4fb96cb4f/torch-2.8.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:83c13411a26fac3d101fe8035a6b0476ae606deb8688e904e796a3534c197def", size = 102020139, upload-time = "2025-08-06T14:54:39.047Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/15/5e488ca0bc6162c86a33b58642bc577c84ded17c7b72d97e49b5833e2d73/torch-2.8.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:8f0a9d617a66509ded240add3754e462430a6c1fc5589f86c17b433dd808f97a", size = 887990692, upload-time = "2025-08-06T14:56:18.286Z" },
+ { url = "https://files.pythonhosted.org/packages/b4/a8/6a04e4b54472fc5dba7ca2341ab219e529f3c07b6941059fbf18dccac31f/torch-2.8.0-cp313-cp313t-win_amd64.whl", hash = "sha256:a7242b86f42be98ac674b88a4988643b9bc6145437ec8f048fea23f72feb5eca", size = 241603453, upload-time = "2025-08-06T14:55:22.945Z" },
+ { url = "https://files.pythonhosted.org/packages/04/6e/650bb7f28f771af0cb791b02348db8b7f5f64f40f6829ee82aa6ce99aabe/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:7b677e17f5a3e69fdef7eb3b9da72622f8d322692930297e4ccb52fefc6c8211", size = 73632395, upload-time = "2025-08-06T14:55:28.645Z" },
+ ]
+
+ [[package]]
+ name = "tqdm"
+ version = "4.67.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "colorama", marker = "sys_platform == 'win32'" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/a8/4b/29b4ef32e036bb34e4ab51796dd745cdba7ed47ad142a9f4a1eb8e0c744d/tqdm-4.67.1.tar.gz", hash = "sha256:f8aef9c52c08c13a65f30ea34f4e5aac3fd1a34959879d7e59e63027286627f2", size = 169737, upload-time = "2024-11-24T20:12:22.481Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl", hash = "sha256:26445eca388f82e72884e0d580d5464cd801a3ea01e63e5601bdff9ba6a48de2", size = 78540, upload-time = "2024-11-24T20:12:19.698Z" },
+ ]
+
+ [[package]]
+ name = "transformers"
+ version = "4.56.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "huggingface-hub" },
+ { name = "numpy" },
+ { name = "packaging" },
+ { name = "pyyaml" },
+ { name = "regex" },
+ { name = "requests" },
+ { name = "safetensors" },
+ { name = "tokenizers" },
+ { name = "tqdm" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/89/21/dc88ef3da1e49af07ed69386a11047a31dcf1aaf4ded3bc4b173fbf94116/transformers-4.56.1.tar.gz", hash = "sha256:0d88b1089a563996fc5f2c34502f10516cad3ea1aa89f179f522b54c8311fe74", size = 9855473, upload-time = "2025-09-04T20:47:13.14Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/71/7c/283c3dd35e00e22a7803a0b2a65251347b745474a82399be058bde1c9f15/transformers-4.56.1-py3-none-any.whl", hash = "sha256:1697af6addfb6ddbce9618b763f4b52d5a756f6da4899ffd1b4febf58b779248", size = 11608197, upload-time = "2025-09-04T20:47:04.895Z" },
+ ]
+
+ [[package]]
+ name = "triton"
+ version = "3.4.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "setuptools" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/30/7b/0a685684ed5322d2af0bddefed7906674f67974aa88b0fae6e82e3b766f6/triton-3.4.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00be2964616f4c619193cb0d1b29a99bd4b001d7dc333816073f92cf2a8ccdeb", size = 155569223, upload-time = "2025-07-30T19:58:44.017Z" },
+ { url = "https://files.pythonhosted.org/packages/20/63/8cb444ad5cdb25d999b7d647abac25af0ee37d292afc009940c05b82dda0/triton-3.4.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7936b18a3499ed62059414d7df563e6c163c5e16c3773678a3ee3d417865035d", size = 155659780, upload-time = "2025-07-30T19:58:51.171Z" },
+ ]
+
+ [[package]]
+ name = "typing-extensions"
+ version = "4.15.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
+ ]
+
+ [[package]]
+ name = "urllib3"
+ version = "2.5.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/15/22/9ee70a2574a4f4599c47dd506532914ce044817c7752a79b6a51286319bc/urllib3-2.5.0.tar.gz", hash = "sha256:3fc47733c7e419d4bc3f6b3dc2b4f890bb743906a30d56ba4a5bfa4bbff92760", size = 393185, upload-time = "2025-06-18T14:07:41.644Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl", hash = "sha256:e6b01673c0fa6a13e374b50871808eb3bf7046c4b125b216f6bf1cc604cff0dc", size = 129795, upload-time = "2025-06-18T14:07:40.39Z" },
+ ]