robbiemu committed on
Commit
e39ff3a
·
1 Parent(s): e296535

add mlx and mlx-lm support

README.md CHANGED
@@ -1,5 +1,695 @@
1
- ---
2
- license: other
3
- license_name: fair-noncommercial-research-license
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: fair-noncommercial-research
4
+ extra_gated_prompt: >
5
+ FAIR Noncommercial Research License v1 Last Updated: August 18, 2025
6
+
7
+ “Acceptable Use Policy” means the FAIR Acceptable Use Policy, applicable to
8
+ Research Materials, that is incorporated into this Agreement.
9
+
10
+ “Agreement” means the terms and conditions for use, reproduction, distribution
11
+ and modification of the Research Materials set forth herein.
12
+
13
+
14
+ “Documentation” means the specifications, manuals and documentation
15
+ accompanying Research Materials distributed by Meta.
16
+
17
+
18
+ “Licensee” or “you” means you, or your employer or any other person or entity
19
+ (if you are entering into this Agreement on such person or entity’s behalf),
20
+ of the age required under applicable laws, rules or regulations to provide
21
+ legal consent and that has legal authority to bind your employer or such other
22
+ person or entity if you are entering in this Agreement on their behalf.
23
+
24
+
25
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or,
26
+ if you are an entity, your principal place of business is in the EEA or
27
+ Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA
28
+ or Switzerland).
29
+
30
+ “Noncommercial Research Uses” means noncommercial research use cases related
31
+ to research, development, education, processing, or analysis and in each case,
32
+ is not primarily intended for commercial advantage or monetary compensation to
33
+ you or others.
34
+
35
+ “Research Materials” means, collectively, Documentation and the models,
36
+ software and algorithms, including machine-learning model code, trained model
37
+ weights, inference-enabling code, training-enabling code, fine-tuning enabling
38
+ code, demonstration materials and other elements of the foregoing distributed
39
+ by Meta and made available under this Agreement.
40
+
41
+ By clicking “I Accept” below or by using or distributing any portion or
42
+ element of the Research Materials, you agree to be bound by this Agreement.
43
+
44
+
45
+ 1. License Rights and Redistribution.
46
+
47
+
48
+ a. Grant of Rights. You are granted a non-exclusive, worldwide,
49
+ non-transferable and royalty-free limited license under Meta’s intellectual
50
+ property or other rights owned by Meta embodied in the Research Materials to
51
+ use, reproduce, distribute, copy, create derivative works of, and make
52
+ modifications to the Research Materials.
53
+
54
+ b. Redistribution and Use. i. You will not use the Research Materials or any
55
+ outputs or results of the Research Materials in connection with any commercial
56
+ uses or for any uses other than Noncommercial Research Uses;
57
+
58
+
59
+ ii. Distribution of Research Materials, and any derivative works thereof, are
60
+ subject to the terms of this Agreement. If you distribute or make the Research
61
+ Materials, or any derivative works thereof, available to a third party, you
62
+ may only do so under the terms of this Agreement. You shall also provide a
63
+ copy of this Agreement to such third party.
64
+
65
+
66
+ iii. If you submit for publication the results of research you perform on,
67
+ using, or otherwise in connection with Research Materials, you must
68
+ acknowledge the use of Research Materials in your publication.
69
+
70
+
71
+ iv. Your use of the Research Materials must comply with applicable laws and
72
+ regulations (including Trade Control Laws) and adhere to the FAIR Acceptable
73
+ Use Policy, which is hereby incorporated by reference into this Agreement. 2.
74
+ User Support. Your Noncommercial Research Use of the Research Materials is
75
+ done at your own discretion; Meta does not process any information nor provide
76
+ any service in relation to such use. Meta is under no obligation to provide
77
+ any support services for the Research Materials. Any support provided is “as
78
+ is”, “with all faults”, and without warranty of any kind.
79
+
80
+
81
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE RESEARCH
82
+ MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS”
83
+ BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF
84
+ ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY
85
+ WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A
86
+ PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE
87
+ APPROPRIATENESS OF USING OR REDISTRIBUTING THE RESEARCH MATERIALS AND ASSUME
88
+ ANY RISKS ASSOCIATED WITH YOUR USE OF THE RESEARCH MATERIALS AND ANY OUTPUT
89
+ AND RESULTS.
90
+
91
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE
92
+ UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS
93
+ LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS
94
+ OR ANY DIRECT OR INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR
95
+ PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE
96
+ POSSIBILITY OF ANY OF THE FOREGOING.
97
+
98
+ 5. Intellectual Property.
99
+
100
+
101
+ a. Subject to Meta’s ownership of Research Materials and derivatives made by
102
+ or for Meta, with respect to any derivative works and modifications of the
103
+ Research Materials that are made by you, as between you and Meta, you are and
104
+ will be the owner of such derivative works and modifications.
105
+
106
+ b. If you institute litigation or other proceedings against Meta or any entity
107
+ (including a cross-claim or counterclaim in a lawsuit) alleging that the
108
+ Research Materials, outputs or results, or any portion of any of the
109
+ foregoing, constitutes infringement of intellectual property or other rights
110
+ owned or licensable by you, then any licenses granted to you under this
111
+ Agreement shall terminate as of the date such litigation or claim is filed or
112
+ instituted. You will indemnify and hold harmless Meta from and against any
113
+ claim by any third party arising out of or related to your use or distribution
114
+ of the Research Materials.
115
+
116
+ 6. Term and Termination. The term of this Agreement will commence upon your
117
+ acceptance of this Agreement or access to the Research Materials and will
118
+ continue in full force and effect until terminated in accordance with the
119
+ terms and conditions herein. Meta may terminate this Agreement if you are in
120
+ breach of any term or condition of this Agreement. Upon termination of this
121
+ Agreement, you shall delete and cease use of the Research Materials. Sections
122
+ 3, 4 and 7 shall survive the termination of this Agreement.
123
+
124
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and
125
+ construed under the laws of the State of California without regard to choice
126
+ of law principles, and the UN Convention on Contracts for the International
127
+ Sale of Goods does not apply to this Agreement. The courts of California shall
128
+ have exclusive jurisdiction of any dispute arising out of this Agreement.
129
+
130
+
131
+ 8. Modifications and Amendments. Meta may modify this Agreement from time to
132
+ time; provided that they are similar in spirit to the current version of the
133
+ Agreement, but may differ in detail to address new problems or concerns. All
134
+ such changes will be effective immediately. Your continued use of the Research
135
+ Materials after any modification to this Agreement constitutes your agreement
136
+ to such modification. Except as provided in this Agreement, no modification or
137
+ addition to any provision of this Agreement will be binding unless it is in
138
+ writing and signed by an authorized representative of both you and Meta.
139
+
140
+
141
+ FAIR Acceptable Use Policy
142
+
143
+ The Fundamental AI Research (FAIR) team at Meta seeks to further understanding
144
+ of new and existing research domains with the mission of advancing the
145
+ state-of-the-art in artificial intelligence through open research for the
146
+ benefit of all.
147
+
148
+ As part of this mission, Meta makes certain research materials available for
149
+ noncommercial research use. Meta is committed to promoting the safe and
150
+ responsible use of such research materials.
151
+
152
+ Prohibited Uses
153
+
154
+ You agree you will not use, or allow others to use, Research Materials to:
155
+
156
+ Violate the law or others’ rights, including to: Engage in, promote, generate,
157
+ contribute to, encourage, plan, incite, or further illegal or unlawful
158
+ activity or content, such as: Violence or terrorism Exploitation or harm to
159
+ children, including the solicitation, creation, acquisition, or dissemination
160
+ of child exploitative content or failure to report Child Sexual Abuse Material
161
+ Human trafficking, exploitation, and sexual violence The illegal distribution
162
+ of information or materials to minors, including obscene materials, or failure
163
+ to employ legally required age-gating in connection with such information or
164
+ materials. Sexual solicitation Any other criminal activity
165
+
166
+ Engage in, promote, incite, or facilitate the harassment, abuse, threatening,
167
+ or bullying of individuals or groups of individuals
168
+
169
+ Engage in, promote, incite, or facilitate discrimination or other unlawful or
170
+ harmful conduct in the provision of employment, employment benefits, credit,
171
+ housing, other economic benefits, or other essential goods and services
172
+
173
+ Engage in the unauthorized or unlicensed practice of any profession including,
174
+ but not limited to, financial, legal, medical/health, or related professional
175
+ practices
176
+
177
+ Collect, process, disclose, generate, or infer health, demographic, or other
178
+ sensitive personal or private information about individuals without rights and
179
+ consents required by applicable laws
180
+
181
+ Engage in or facilitate any action or generate any content that infringes,
182
+ misappropriates, or otherwise violates any third-party rights, including the
183
+ outputs or results of any technology using FAIR research materials
184
+
185
+ Create, generate, or facilitate the creation of malicious code, malware,
186
+ computer viruses or do anything else that could disable, overburden, interfere
187
+ with or impair the proper working, integrity, operation or appearance of a
188
+ website or computer system
189
+
190
+ 2. Engage in, promote, incite, facilitate, or assist in the planning or
191
+ development of activities that present a risk of death or bodily harm to
192
+ individuals, including use of research artifacts related to the following:
193
+
194
+ Military, warfare, nuclear industries or applications, espionage, use for
195
+ materials or activities that are subject to the International Traffic Arms
196
+ Regulations (ITAR) maintained by the United States Department of State
197
+
198
+ Guns and illegal weapons (including weapon development)
199
+
200
+ Illegal drugs and regulated/controlled substances
201
+
202
+ Operation of critical infrastructure, transportation technologies, or heavy
203
+ machinery
204
+
205
+ Self-harm or harm to others, including suicide, cutting, and eating disorders
206
+
207
+ Any content intended to incite or promote violence, abuse, or any infliction
208
+ of bodily harm to an individual
209
+
210
+ 3. Intentionally deceive or mislead others, including use of FAIR Research
211
+ Materials related to the following:
212
+
213
+ Generating, promoting, or furthering fraud or the creation or promotion of
214
+ disinformation
215
+
216
+ Generating, promoting, or furthering defamatory content, including the
217
+ creation of defamatory statements, images, or other content
218
+
219
+ Generating, promoting, or further distributing spam
220
+
221
+ Impersonating another individual without consent, authorization, or legal
222
+ right
223
+
224
+ Representing that outputs of FAIR research materials or outputs from
225
+ technology using FAIR research materials are human-generated
226
+
227
+ Generating or facilitating false online engagement, including fake reviews and
228
+ other means of fake online engagement
229
+
230
+ 4. Fail to appropriately disclose to end users any known dangers of your
231
+ Research Materials.
232
+
233
+ Please report any violation of this Policy or other problems that could lead
234
+ to a violation of this Policy by submitting a report here
235
+ [https://docs.google.com/forms/d/e/1FAIpQLSeb11cryAopJ7LNrC4nxEUXrHY26hfkXQMf_uH-oFgA3WlYZQ/viewform].
236
+ extra_gated_fields:
237
+ First Name: text
238
+ Last Name: text
239
+ Date of birth: date_picker
240
+ Country: country
241
+ Affiliation: text
242
+ Job title:
243
+ type: select
244
+ options:
245
+ - Student
246
+ - Research Graduate
247
+ - AI researcher
248
+ - AI developer/engineer
249
+ - Reporter
250
+ - Other
251
+ geo: ip_location
252
+ By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy: checkbox
253
+ extra_gated_description: >-
254
+ The information you provide will be collected, stored, processed and shared in
255
+ accordance with the [Meta Privacy
256
+ Policy](https://www.facebook.com/privacy/policy/).
257
+ extra_gated_button_content: Submit
258
+ extra_gated_heading: >-
259
+ Please be sure to provide your full legal name, date of birth, and full
260
+ organization name with all corporate identifiers. Avoid the use of acronyms
261
+ and special characters. Failure to follow these instructions may prevent you
262
+ from accessing this model and others on Hugging Face. You will not have the
263
+ ability to edit this form after submission, so please ensure all information
264
+ is accurate.
265
+ language:
266
+ - en
267
+ library_name: mlx
268
+ pipeline_tag: text-generation
269
+ tags:
270
+ - facebook
271
+ - meta
272
+ - pytorch
273
+ - mobilellm
274
+ - mlx
275
+ - apple-mlx
276
+ - runtime
277
+ base_model:
278
+ - facebook/MobileLLM-R1-950M
279
+ ---
280
+
281
+ # MLX Runtime (Apple silicon) — Added Files & Usage
282
+
283
+ This fork adds a lightweight MLX runtime so you can run the original MobileLLM‑R1‑950M weights with Apple’s MLX on Apple silicon. It keeps the original weights (`model.safetensors`) and tokenizer; only the runtime is added.
284
+
285
+ ## Technical Documentation
286
+
287
+ For detailed technical information about this port, see:
288
+ - [**MLX Technical Summary**](mlx_technical_summary.md) - Challenges and solutions for porting MobileLLM-R1 to MLX in this PoC conversion.
289
+ - [**Conversion Log**](conversion.log) - Details of the model conversion process
290
+ - [**Quantization Log**](quantization.log) - Information about quantization procedures and results
291
+
292
+
293
+ What’s included (added files)
294
+ - `model.py` — Minimal MLX implementation of the architecture with GQA, optional Q/K norm, RoPE, and output weight tying.
295
+ - `inference.py` — Simple text generation CLI with temperature, top‑p, greedy mode, optional chat template, EOS handling, plus boxed‑answer controls for math.
296
+ - `test_model.py` — Diagnostics to verify model structure/parameter shapes and key weight presence.
297
+ - `check_shape.py` — Heuristic check to inspect the MLP variant from `model.safetensors` and `config.json`.
298
+ - `main.py` — Convenience entry for quick manual tests.
299
+
300
+ Notes
301
+ - This is an MLX runtime; it does not change or fine‑tune the weights. The README front‑matter marks this repo as a derivative of `facebook/MobileLLM-R1-950M` via `base_model` so it appears correctly on Hugging Face.
302
+ - Tested via `uv` on macOS with Python 3.13; deps are pinned in `uv.lock`/`pyproject.toml`.
303
+
304
+ Quick start (MLX, local safetensors)
305
+ - Install and run with uv: `uv run python inference.py --prompt "What is 2+2?" --temperature 0.0 --max-tokens 64`
306
+ - Use chat template (default if `chat_template.jinja` present): `uv run python inference.py --prompt "Explain quicksort in 1–2 sentences." --temperature 0.7 --top-p 0.9`
307
+ - Disable chat template: `uv run python inference.py --prompt "Explain quicksort in 1–2 sentences." --disable-chat-template --temperature 0.7 --top-p 0.9`
308
+ - Math mode, final answer only: `uv run python inference.py --prompt "Compute 17 * 23. Put your final answer in \\boxed{.}" --temperature 0.0 --final-only --stop-at-boxed --extract-boxed --max-tokens 128`
309
+
310
+ Tips
311
+ - If a sampled response stops mid‑sentence, increase `--max-tokens` (e.g., 192–256) or use a lower `--temperature`/`--top-p`.
312
+ - For concise answers with the chat template, pass a system prompt: `--system "Be concise. Answer in 1–2 sentences."`.
313
+
314
+ Diagnostics
315
+ - Structure/weights check: `uv run python test_model.py`
316
+ - MLP variant heuristic: `uv run python check_shape.py .`
317
+
318
+ Details
319
+ - The loader maps HF weight names to MLX module names and detects the MLP variant from weight keys to ensure correct layer wiring (see the rename sketch below).
320
+ - Attention uses standard `1/sqrt(d)` scaling for best generation quality.
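+
+ For reference, the key renames the loader applies look roughly like this (a sketch mirroring the substring replacements in `custom_mlx_lm/custom_convert.py`; MLP-variant detection and the dual-branch case are omitted):
+
+ ```python
+ # HF checkpoint key -> MLX module path (substring replacements, applied in order)
+ RENAMES = [
+     ("model.embed_tokens", "tok_embeddings"),
+     ("model.layers", "layers"),
+     ("self_attn", "attention"),
+     ("input_layernorm", "attention_norm"),
+     ("post_attention_layernorm", "ffn_norm"),
+     ("mlp.", "feed_forward."),
+     ("model.norm", "norm"),
+ ]
+
+ def remap(key: str) -> str:
+     for old, new in RENAMES:
+         key = key.replace(old, new)
+     return key
+ ```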
321
+
322
+
323
+ ## Installation
324
+
325
+ This project uses `uv` for dependency management.
326
+
327
+ ### Using uv (recommended)
328
+ ```bash
329
+ # 1. Clone the repo
330
+ git clone <your-repo>
331
+ cd <your-repo>
332
+
333
+ # 2. Sync all dependencies (includes the default set)
334
+ uv sync
335
+
336
+ # 3. (Optional) Add the torch group if you plan to customize/train models
337
+ uv sync --extra torch
338
+ ```
339
+
340
+ ### Without uv
341
+ If you prefer pip/venv, a `requirements.txt` is provided:
342
+ ```bash
343
+ python -m venv .venv
344
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
345
+ pip install -r requirements.txt
346
+ ```
347
+
348
+ > The `torch` extra is only required if you intend to fine-tune or swap model back-ends; the default installation already supports inference.
349
+
350
+
351
+ ## MLX Inference Examples (safetensors)
352
+
353
+ - Basic greedy generation:
354
+ - `uv run python inference.py --prompt "MobileLLM-R1 runs on MLX." --temperature 0 --max-tokens 64`
355
+ - Chat-style with template:
356
+ - `uv run python inference.py --prompt "Briefly summarize quicksort." --temperature 0.7 --top-p 0.9`
357
+ - Disable the chat template:
358
+ - `uv run python inference.py --prompt "Briefly summarize quicksort." --disable-chat-template --temperature 0.7 --top-p 0.9`
359
+ - Math/coding “final answer only”:
360
+ - `uv run python inference.py --prompt "Solve: 128 / 8. Put final answer in \\boxed{.}" --temperature 0 --final-only --stop-at-boxed --extract-boxed`
361
+
362
+ ## Design Choices (why not a trivial block)
363
+
364
+ This runtime mirrors the functional details of the released weights so they load 1:1 and generate well in MLX. A minimal “one size fits all” block hides critical differences and leads to poor output quality. Key choices:
365
+
366
+ - Attention layout and features
367
+ - Grouped-Query Attention (GQA): separate `num_attention_heads` vs `num_key_value_heads`, with `head_dim` taken from the config. We implement a custom `Attention` so K/V can be repeated across groups and still match the HF weight layout (see the sketch at the end of this section).
368
+ - Q/K normalization: optional RMSNorm applied to per-head Q and K, controlled by `use_qk_norm`.
369
+ - RoPE: MLX `nn.RoPE` with the model’s `rope_theta` (8e6 here), and a per-layer toggle via `no_rope_layers`. We gate RoPE per block, with a safe fallback if the list disables all layers.
370
+ - Scaling: we use standard `1/sqrt(d)` for SDPA. Some configs expose an `attn_scale` used for training tricks; applying it at inference severely degraded outputs, so it’s not multiplied into SDPA.
371
+
372
+ - MLP variant detection
373
+ - MobileLLM variants use either standard SwiGLU (gate_proj/up_proj/down_proj) or a dual-branch dense MLP. We detect the variant from weight keys in `model.safetensors` and instantiate the correct module so shapes and semantics match.
374
+
375
+ - Weight tying and mapping
376
+ - Tie output logits to the token embedding matrix when `tie_word_embeddings` is true, matching HF behavior and saving memory.
377
+ - Map HF names to MLX names during load: `model.embed_tokens`→`tok_embeddings`, layer/attn/norm renames, `mlp.`→`feed_forward.`, `model.norm`→`norm`.
378
+
379
+ - Template and decoding
380
+ - Provide a Jinja chat template for parity with HF chat usage, but allow `--disable-chat-template` for raw prompting. Multiple EOS IDs are supported.
381
+ - Sampling: temperature, top‑p, and greedy; optional repetition/frequency penalties; math helpers `--final-only/--stop-at-boxed/--extract-boxed` to keep answers concise.
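+
+ To make the attention bullets concrete, here is a minimal, self-contained sketch of such a block (illustrative names and shapes, not the exact `model.py` code; the per-layer `no_rope_layers` gating, KV caching, and the exact Q/K-norm ordering are omitted or simplified):
+
+ ```python
+ import mlx.core as mx
+ import mlx.nn as nn
+
+ def repeat_kv(x, n_rep):
+     # (B, n_kv_heads, T, head_dim) -> (B, n_kv_heads * n_rep, T, head_dim)
+     return x if n_rep == 1 else mx.repeat(x, n_rep, axis=1)
+
+ class GQAAttention(nn.Module):
+     def __init__(self, dim, n_heads, n_kv_heads, head_dim, rope_theta=8e6, use_qk_norm=True):
+         super().__init__()
+         self.n_heads, self.n_kv_heads, self.head_dim = n_heads, n_kv_heads, head_dim
+         self.q_proj = nn.Linear(dim, n_heads * head_dim, bias=False)
+         self.k_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
+         self.v_proj = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
+         self.o_proj = nn.Linear(n_heads * head_dim, dim, bias=False)
+         self.rope = nn.RoPE(head_dim, base=rope_theta)
+         self.q_norm = nn.RMSNorm(head_dim) if use_qk_norm else None
+         self.k_norm = nn.RMSNorm(head_dim) if use_qk_norm else None
+
+     def __call__(self, x, mask=None):
+         B, T, _ = x.shape
+         q = self.q_proj(x).reshape(B, T, self.n_heads, self.head_dim).transpose(0, 2, 1, 3)
+         k = self.k_proj(x).reshape(B, T, self.n_kv_heads, self.head_dim).transpose(0, 2, 1, 3)
+         v = self.v_proj(x).reshape(B, T, self.n_kv_heads, self.head_dim).transpose(0, 2, 1, 3)
+         if self.q_norm is not None:  # optional per-head Q/K RMSNorm (use_qk_norm)
+             q, k = self.q_norm(q), self.k_norm(k)
+         q, k = self.rope(q), self.rope(k)
+         k = repeat_kv(k, self.n_heads // self.n_kv_heads)
+         v = repeat_kv(v, self.n_heads // self.n_kv_heads)
+         # Standard 1/sqrt(d) scaling; the config's attn_scale is deliberately not applied.
+         out = mx.fast.scaled_dot_product_attention(
+             q, k, v, scale=self.head_dim**-0.5, mask=mask
+         )
+         return self.o_proj(out.transpose(0, 2, 1, 3).reshape(B, T, -1))
+ ```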
382
+
383
+ # Model Details
384
+
385
+ We present MobileLLM-R1, a new series of efficient reasoning models in the MobileLLM family. The release includes two categories of models:
386
+
387
+ Base models:
388
+ - [MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base/)
389
+ - [MobileLLM-R1-360M-base](https://huggingface.co/facebook/MobileLLM-R1-360M-base/)
390
+ - [MobileLLM-R1-950M-base](https://huggingface.co/facebook/MobileLLM-R1-950M-base/)
391
+
392
+ Final models:
393
+ - [MobileLLM-R1-140M](https://huggingface.co/facebook/MobileLLM-R1-140M/)
394
+ - [MobileLLM-R1-360M](https://huggingface.co/facebook/MobileLLM-R1-360M/)
395
+ - [MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M/)
396
+
397
+ > **Note**: These models are not general-purpose chat models. They are Supervised Fine-Tuned (SFT) models, specifically trained to address mathematical, programming (Python, C++), and scientific problems.
398
+
399
+ In addition to the models, we release the complete training recipes and data sources to ensure reproducibility and support further research.
400
+
401
+ Remarkably, the MobileLLM-R1 950M, pre-trained on only **~2T high-quality tokens** and with fewer than 5T total training tokens, achieves comparable or superior performance to Qwen3 0.6B, which was trained on 36T tokens, across MATH, GSM8K, MMLU, and LiveCodeBench benchmarks.
402
+
403
+ Compared to existing fully open-source models, the MobileLLM-R1 950M model achieves **~5× higher accuracy on MATH** than the Olmo 1.24B model and **~2× higher accuracy** than the SmolLM2 1.7B model, despite being substantially smaller in parameter scale. In addition, MobileLLM-R1 950M outperforms both Olmo 1.24B and SmolLM2 1.7B **by a wide margin on coding benchmarks**, establishing a new state of the art among fully open-source models.
404
+
405
+ # Highlights
406
+
407
+
408
+ ### Pretrained Model
409
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/b9rg8yZTxeWhRWus_tJR_.jpeg)
410
+
411
+ ### Token efficiency comparison across pretrained models
412
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/dJtdh5dmVTdowP1gMR5qQ.jpeg)
413
+
414
+ ### Post-trained Model
415
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/0MxKBLDfb8xRwg-uVi1WQ.png)
416
+
417
+
418
+
419
+ **Model Architecture**:
420
+
421
+ | | # Layers | # Attention Heads | # KV Heads | Dim | Hidden Dim | Params |
422
+ | --- | --- | --- | --- | --- | --- | --- |
423
+ | MobileLLM-R1-140M | 15 | 9 | 3 | 576 | 2048 | 140M |
424
+ | MobileLLM-R1-360M | 15 | 16 | 4 | 1024 | 4096 | 359M |
425
+ | MobileLLM-R1-950M | 22 | 24 | 6 | 1536 | 6144 | 949M |
426
+
427
+ | | Input modalities | Output modalities | Context Length | Vocabulary Size | Shared Embeddings |
428
+ | --- | --- | --- | --- | --- | --- |
429
+ | [MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base) | Text | Text | 4k | 128k | Yes |
430
+ | [MobileLLM-R1-360M-base](https://huggingface.co/facebook/MobileLLM-R1-360M-base) | Text | Text | 4k | 128k | Yes |
431
+ | [MobileLLM-R1-950M-base](https://huggingface.co/facebook/MobileLLM-R1-950M-base) | Text | Text | 4k | 128k | Yes |
432
+ | [MobileLLM-R1-140M](https://huggingface.co/facebook/MobileLLM-R1-140M) | Text | Text | 32k | 128k | Yes |
433
+ | [MobileLLM-R1-360M](https://huggingface.co/facebook/MobileLLM-R1-360M) | Text | Text | 32k | 128k | Yes |
434
+ | [MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) | Text | Text | 32k | 128k | Yes |
435
+
436
+ # How to use
437
+
438
+ To load the pretrained model for further finetuning or evaluation:
439
+ ```python
440
+ from transformers import AutoModelForCausalLM, AutoTokenizer
441
+ tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-R1-950M")
442
+ model = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-R1-950M")
443
+ ```
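+
+ A quick smoke test after loading (standard Transformers generation API; the prompt and token budget are illustrative):
+
+ ```python
+ inputs = tokenizer("What is 2+2?", return_tensors="pt")
+ output = model.generate(**inputs, max_new_tokens=32)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```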
444
+
445
+ # Inference examples
446
+
447
+ ## Inference (MLX)
448
+
449
+ Use the MLX runtime provided in this repo to run the local `model.safetensors` on Apple silicon.
450
+
451
+ - Basic: `uv run python inference.py --prompt "Hello MLX" --temperature 0.7 --top-p 0.9`
452
+ - Deterministic: `uv run python inference.py --prompt "Hello MLX" --temperature 0 --max-tokens 64`
453
+
454
+ Flags in `inference.py`
455
+ - `--model-path`: path to model directory (default: `.`)
456
+ - `--prompt`: input text
457
+ - `--max-tokens`: number of tokens to generate
458
+ - `--temperature`: 0 for greedy, >0 for sampling
459
+ - `--top-p`: nucleus sampling cutoff
460
+ - `--system`: optional system message when using chat template
461
+ - `--final-only`: instructs model to output only a final boxed answer
462
+ - `--stop-at-boxed`: stop generation after closing `}` following `\boxed{`
463
+ - `--extract-boxed`: print the last `\boxed{...}` content
464
+ - `--disable-chat-template`: bypass `chat_template.jinja` and send raw prompt (with BOS)
465
+ - `--repetition-penalty`: discourage previously generated tokens (>1.0)
466
+ - `--frequency-penalty`: subtract alpha * token frequency from logits
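+
+ Example combining sampling with the penalty flags (values are illustrative): `uv run python inference.py --prompt "List three prime numbers." --temperature 0.7 --top-p 0.9 --repetition-penalty 1.1 --max-tokens 128`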
467
+
468
+ See also: the “MLX Runtime (Apple silicon) — Added Files & Usage” section above for more examples and notes.
469
+
470
+ ## Transformers
471
+
472
+ ```py
473
+ from transformers import pipeline
474
+ import torch
475
+
476
+ model_id = "facebook/MobileLLM-R1-950M"
477
+
478
+ pipe = pipeline(
479
+ "text-generation",
480
+ model=model_id,
481
+ torch_dtype="auto",
482
+ device_map="auto",
483
+ )
484
+
485
+ # Math problem / default scenario
486
+ messages = [
487
+ {
488
+ "role": "system",
489
+ "content": "Please reason step by step, and put your final answer within \\boxed{}."
490
+ },
491
+ {"role": "user", "content": "Compute: $1-2+3-4+5- \\dots +99-100$."},
492
+ ]
493
+
494
+ # C++ coding scenario
495
+ messages = [
496
+ {
497
+ "role": "system",
498
+ "content": (
499
+ "\nYou are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\n"
500
+ "Please use c++ programming language only.\n"
501
+ "You must use ```cpp for just the final solution code block with the following format:\n"
502
+ "```cpp\n# Your code here\n```\n"
503
+ )
504
+ },
505
+ {"role": "user", "content": "Write a C++ program that prints 'Hello, World!'."},
506
+ ]
507
+
508
+ # Python coding scenario
509
+ messages = [
510
+ {
511
+ "role": "system",
512
+ "content": (
513
+ "\nYou are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\n"
514
+ "Please use python programming language only.\n"
515
+ "You must use ```python for just the final solution code block with the following format:\n"
516
+ "```python\n# Your code here\n```\n"
517
+ )
518
+ },
519
+ {"role": "user", "content": "Write a Python function that returns the square of a number."},
520
+ ]
521
+
522
+ outputs = pipe(
523
+ messages,
524
+ max_new_tokens=8192,
525
+ )
526
+ print(outputs[0]["generated_text"][-1])
527
+ ```
528
+
529
+ You can also run inference with vLLM. You only need to register the model architecture `Llama4ForCausalLM` with the vLLM `ModelRegistry`:
530
+ ```python
531
+ from vllm.model_executor.models.llama4 import Llama4ForCausalLM
532
+ from vllm.model_executor.models.registry import ModelRegistry
533
+ ModelRegistry.register_model("Llama4ForCausalLM", Llama4ForCausalLM)
534
+ ```
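+
+ After registering the architecture, the model can be used through the standard vLLM entry points (a minimal sketch; sampling values are illustrative):
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(model="facebook/MobileLLM-R1-950M")
+ outputs = llm.generate(["What is 2+2?"], SamplingParams(temperature=0.0, max_tokens=64))
+ print(outputs[0].outputs[0].text)
+ ```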
535
+
536
+
537
+ # Evaluation
538
+
539
+ ## MobileLLM-R1 base model
540
+ | Model | Size | MATH500 | GSM8K | MBPP | HumanEval | CommonSense Avg. | MMLU |
541
+ | --- | --- | --- | --- | --- | --- | --- | --- |
542
+ | | | 4-shot <br> em | 8-shot <br> em | 3-shot <br> pass@1 | 0-shot <br> pass@1 | 0-shot <br> accuracy | 5-shot <br> accuracy |
543
+ | |
544
+ | *<150M* | | | | | | | |
545
+ | SmolLM2-135M-base | 135M | 0.4 | 1.8 | 3.8 | 0.0 | **50.7** | -- |
546
+ | **MobileLLM-R1-140M-base** | 140M | **4.6** | **16.3** | **5.4** | **15.9** | 44.3 | -- |
547
+ | |
548
+ | *150M - 400M* | | | | | | | |
549
+ | Gemma-3-270M-pt | 268M | 0.6 | 1.1 | 2.0 | 3.1 | 48.4 | 26.5 |
550
+ | SmolLM2-360M-base | 362M | 1.8 | 5.0 | **19.4** | 0.0 | **56.6** | 24.7 |
551
+ | **MobileLLM-R1-360M-base** | 359M | **13.4** | **39.4** | **20.8** | **32.9** | 51.0 | **26.8** |
552
+ | |
553
+ | *400M - 1B* | | | | | | | |
554
+ | Qwen2.5-0.5B-base | 494M | 14.8 | 41.8 | 29.6 | 28.1 | 52.3 | 47.5 |
555
+ | Qwen3-0.6B-base | 596M | **29.8** | 60.9 | **39.0** | 30.5 | 55.3 | **52.4** |
556
+ | **MobileLLM-R1-950M-base** | 949M | 26.8 | **61.6** | **39.2** | **46.3** | **58.6** | 47.4 |
557
+ | |
558
+ | *> 1B* | | | | | | | |
559
+ | Gemma-3-1B-pt | 1.0B | 0.6 | 2.4 | 9.4 | 6.1 | 57.3 | 26.1 |
560
+ | LLaMA3.2-1B-base | 1.24B | 1.6 | 6.8 | 26.6 | 17.1 | 58.4 | 32.0 |
561
+ | OLMo-2-0425-1B-base | 1.48B | 5.2 | 39.8 | 7.8 | 6.7 | 61.0 | 42.4 |
562
+ | Qwen2.5-1.5B-base | 1.54B | 31.0 | 68.4 | 44.6 | 36.6 | 58.7 | 61.2 |
563
+ | SmolLM2-1.7B-base | 1.71B | 11.6 | 31.8 | 35.4 | 0.6 | 62.9 | 50.0 |
564
+ | Qwen3-1.7B-base | 2.03B | 38.5 | 76.2 | 56.4 | 47.6 | 60.9 | 62.1 |
565
+
566
+
567
+ Here, CommonSense Avg. denotes the average over 8 CommonSense Reasoning tasks: ARC-Easy, ARC-Challenge, BoolQ, PIQA, SIQA, HellaSwag, OBQA, and WinoGrande. Models with fewer than 150M parameters do not yield reliable MMLU scores and are therefore denoted as '--'.
568
+
569
+ ## MobileLLM-R1 post-trained model
570
+
571
+ | Model | Size | MATH500 | GSM8K | AIME'24 | AIME'25 | LiveCodeBench-v6 |
572
+ | --- | --- | --- | --- | --- | --- | --- |
573
+ | | | 0-shot <br> pass@1 | 0-shot <br> pass@1 | 0-shot <br> pass@1, n=64 | 0-shot <br> pass@1, n=64 | 0-shot <br> pass@1, n=16 |
574
+ | |
575
+ | *<150M* | | | | | | |
576
+ | SmolLM2-135M-Instruct | 135M | 3.0 | 2.4 | -- | -- | 0.0 |
577
+ | **MobileLLM-R1-140M** | 140M | **7.4** | **3.0** | -- | -- | **1.0** |
578
+ | |
579
+ | *150M - 400M* | | | | | | |
580
+ | Gemma-3-270m-it | 268M | 6.8 | 8.4 | -- | -- | 0.0 |
581
+ | SmolLM2-360M-Instruct | 362M | 3.4 | 8.1 | -- | -- | 0.7 |
582
+ | **MobileLLM-R1-360M** | 359M | **26.6** | **22.7** | -- | -- | **4.8** |
583
+ | |
584
+ | *400M - 1B* | | | | | | |
585
+ | Qwen2.5-0.5B-Instruct | 494M | 31.2 | 48.1 | 0.1 | 0.3 | 3.6 |
586
+ | Qwen3-0.6B | 596M | 73.0 | **79.2** | 11.3 | **17.0** | 14.9 |
587
+ | **MobileLLM-R1-950M** | 949M | **74.0** | 67.5 | **15.5** | 16.3 | **19.9** |
588
+ | |
589
+ | *> 1B* | | | | | | |
590
+ | Gemma-3-1B-it | 1.0B | 45.4 | 62.9 | 0.9 | 0.0 | 2.0 |
591
+ | LLaMA3.2-1B-Instruct | 1.24B | 24.8 | 38.8 | 1.1 | 0.2 | 4.1 |
592
+ | OLMo-2-0425-1B-Instruct | 1.48B | 19.2 | 69.7 | 0.6 | 0.1 | 0.0 |
593
+ | OpenReasoning-Nemotron-1.5B | 1.54B | 83.4 | 76.7 | 49.7 | 40.4 | 28.3 |
594
+ | DeepSeek-R1-Distill-Qwen-1.5B | 1.54B | 83.2 | 77.3 | 29.1 | 23.4 | 19.9 |
595
+ | Qwen2.5-1.5B-Instruct | 1.54B | 54.0 | 70.0 | 2.5 | 0.9 | 7.9 |
596
+ | SmolLM2-1.7B-Instruct | 1.71B | 19.2 | 41.8 | 0.3 | 0.1 | 4.4 |
597
+ | Qwen3-1.7B | 2.03B | 89.4 | 90.3 | 47.0 | 37.0 | 29.8 |
598
+
599
+ For AIME, we evaluate models across 64 runs and report the average accuracy. For LiveCodeBench, results are reported as the average accuracy across 16 runs. Models with fewer than 400M parameters do not produce reliable AIME scores and are therefore denoted as '--'.
600
+
601
+
602
+ # Training
603
+
604
+ ## Training Process
605
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/660f893bae89429c07a32cdb/ThVFzsaaGa4gQ3iha5CKM.jpeg)
606
+
607
+ ### Training stages and hyperparameter details
608
+
609
+ In the pretraining phase, MobileLLM-R1 models are randomly initialized and optimized using the Adam optimizer with hyperparameters (β_1, β_2, ε) = (0.9, 0.95, 1e-8), coupled with a weight decay coefficient of 0.1. The learning rate follows a 2k-step warmup schedule and then decays linearly from its peak to 10% of the maximum.
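+
+ For concreteness, the pretraining schedule described above can be written as (a sketch; `peak_lr` and `total_steps` are taken from the hyperparameter table below):
+
+ ```python
+ def pretrain_lr(step, peak_lr=4e-3, warmup_steps=2_000, total_steps=500_000):
+     # Linear warmup to the peak, then linear decay to 10% of the peak.
+     if step < warmup_steps:
+         return peak_lr * step / warmup_steps
+     frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+     return peak_lr * (1.0 - 0.9 * frac)
+ ```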
610
+
611
+ In the mid-training phase, we use the Adam optimizer with a learning rate that decays linearly from its maximum value to zero. We employ knowledge distillation with the Llama-3.1-8B-Instruct model as the teacher, where the student is trained by minimizing the KL divergence between its output logits and the teacher's logits.
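+
+ The distillation objective amounts to the usual logit-matching KL term (a sketch; the temperature `T` is an assumption, since the recipe does not specify one):
+
+ ```python
+ import torch.nn.functional as F
+
+ def kd_loss(student_logits, teacher_logits, T=1.0):
+     # KL(teacher || student) over the softened vocabulary distributions.
+     log_p_student = F.log_softmax(student_logits / T, dim=-1)
+     p_teacher = F.softmax(teacher_logits / T, dim=-1)
+     return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
+ ```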
612
+
613
+ In the post-training phase, we use the Adam optimizer with zero weight decay. The learning rate warmup ratio is set to 0.03 for general-purpose SFT and 0.1 for reasoning-specific SFT, after which the learning rate decays linearly from its maximum value to zero. Full training hyperparameters are provided in the table below.
614
+
615
+ | Stage | Phase | Tokens / Samples | BS | Sequence Length | Steps | LR | #GPUs | Training Time |
616
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
617
+ | Pre-training | Phase1 | 2T tokens | 16 | 2k | 500k | 4.00E-03 | 16 x 8 | 4-5 days |
618
+ | | Phase2 | 2T tokens | 16 | 2k | 500k | 4.00E-03 | 16 x 8 | 4-5 days |
619
+ | Mid-training | Phase1 | 100B tokens | 4 | 4k | 50K | 3.60E-04 | 16 x 8 | 1-2 days |
620
+ | | Phase2 | 100B tokens | 4 | 4k | 50K | 3.60E-04 | 16 x 8 | 1-2 days |
621
+ | Post-training | General SFT | 866K samples | 4 | 4k | 2 epochs | 5.00E-06 | 16 x 8 | ~2h |
622
+ | | Reasoning SFT | 6.2M samples | 8 | 32k | 4 epochs | 8.00E-05 | 16 x 8 | ~2.5days |
623
+
624
+ ## Data Mix
625
+
626
+ ### Pre-training
627
+
628
+ | Dataset | Rows | Tokens (B) | Phase1 Mix Ratio | Phase2 Mix Ratio |
629
+ | --- | --- | --- | --- | --- |
630
+ | [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) | 206,640,114 | 263.8 | 10.66% | 0.52% |
631
+ | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | 6,117,786 | 12.6 | 6.93% | 23.33% |
632
+ | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) | 1,279,107,432 | 1300 | 63.75% | 54.83% |
633
+ | [Wiki](https://huggingface.co/datasets/allenai/dolmino-mix-1124/tree/main/data/wiki) | 7,222,303 | 3.7 | 5.03% | 0.14% |
634
+ | [Arxiv](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T/blob/main/urls/arxiv.txt) | 1,533,917 | 28 | 6.36% | 1.32% |
635
+ | [StackExchange](https://data.together.xyz/redpajama-data-1T/v1.0.0/stackexchange/stackexchange.jsonl) | 29,249,120 | 19.6 | 5.03% | 0.86% |
636
+ | [Algebraic stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2/tree/main/algebraic-stack) | 3,404,331 | 12.6 | 2.25% | 1.26% |
637
+ | [Nemotron science](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/science/science.jsonl) | 708,920 | 2 | -- | 0.03% |
638
+ | [Nemotron code](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/code/code_v1.1.jsonl) | 10,108,883 | 16 | -- | 0.72% |
639
+ | [Nemotron math](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/blob/main/SFT/math/math_v1.1.jsonl) | 22,066,397 | 15 | -- | 3.01% |
640
+ | [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) | 31,064,744 | 25 | -- | 2.70% |
641
+ | [Facebook natural reasoning](https://huggingface.co/datasets/facebook/natural_reasoning) | 1,145,824 | 1.8 | -- | 3.18% |
642
+ | [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath/tree/main/finemath-3plus) | 48,283,984 | 34 | -- | 8.01% |
643
+ | [peS2o](https://huggingface.co/datasets/allenai/peS2o) | 38,800,000 | 50 | -- | 0.08% |
644
+ | **Total** | | | 100% | 100% |
645
+
646
+
647
+
648
+
649
+ ### Mid-training
650
+
651
+
652
+ | Dataset | Subset | Rows (M) | Phase1 Mix Ratio | Phase2 Mix Ratio |
653
+ | --- | --- | --- | --- | --- |
654
+ | [Dolmino](https://huggingface.co/datasets/allenai/dolmino-mix-1124) | DCLM Baseline | 606 | 37.03% | 6.51% |
655
+ | | FLAN | 57.3 | 4.10% | 0.72% |
656
+ | | peS2o | 38.8 | 11.41% | 2.01% |
657
+ | | Wiki | 6.17 | 2.66% | 0.47% |
658
+ | | StackExchange | 2.48 | 2.12% | 2.00% |
659
+ | | Math | 21 | 11.63% | 29.10% |
660
+ | Nemotron | [Nemotron-Pretraining-Code-v1](https://huggingface.co/datasets/nvidia/Nemotron-Pretraining-Code-v1) | 882 | 20.69% | 29.10% |
661
+ | | [Nemotron-CC-Math-v1](https://huggingface.co/datasets/nvidia/Nemotron-CC-Math-v1) | 144 | 3.45% | 19.40% |
662
+ | StarCoder | [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) | 206 | 6.90% | 9.70% |
663
+ | Benchmark training set | [TriviaQA (train)](https://huggingface.co/datasets/mandarjoshi/trivia_qa/tree/main/rc) <br> [OBQA (train)](https://huggingface.co/datasets/allenai/openbookqa/blob/main/main/train-00000-of-00001.parquet) <br> [NaturalQuestions (train)](https://github.com/google-research-datasets/natural-questions/blob/master/nq_open/NQ-open.train.jsonl) <br> [PIQA (train)](https://github.com/ybisk/ybisk.github.io/blob/master/piqa/data/train.jsonl) <br> [GSM8K (train)](https://huggingface.co/datasets/openai/gsm8k/blob/main/main/train-00000-of-00001.parquet) <br> [BoolQ (train)](https://huggingface.co/datasets/google/boolq/blob/main/data/train-00000-of-00001.parquet) <br> [ARC-Easy (train)](https://huggingface.co/datasets/allenai/ai2_arc/blob/main/ARC-Easy/train-00000-of-00001.parquet) <br> [ARC-Challenge (train)](https://huggingface.co/datasets/allenai/ai2_arc/blob/main/ARC-Challenge/train-00000-of-00001.parquet) | ~0.01 | -- | 0.97% |
664
+ | Total | | | 100.00% | 100.00% |
665
+
666
+ ### Post-training
667
+ | Phase | Dataset | Rows |
668
+ | --- | --- | --- |
669
+ | General SFT | [Tulu-3-sft-olmo-2-mixture-0225](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225) | 866K samples |
670
+ | Reasoning SFT | [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) | 3.2M samples |
671
+ | | [OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2) | 803K samples |
672
+ | | [OpenCodeReasoning-2](https://huggingface.co/datasets/nvidia/OpenCodeReasoning-2) | 2.16M samples |
673
+
674
+
675
+ # Citation
676
+
677
+ If you find our model useful for your research, please consider citing:
678
+
679
+ ```bibtex
+ @misc{mobilellm_r1_2025,
680
+ title={MobileLLM-R1: Model Card},
681
+ author={Zechun Liu*, Ernie Chang*, Changsheng Zhao*, Chia-Jung Chang, Wei Wen, Chen Lai, Rick Cao, Yuandong Tian, Raghuraman Krishnamoorthi, Yangyang Shi, Vikas Chandra},
682
+ year={2025},
683
+ url = {https://huggingface.co/mobilellm-r1}
684
+ }
+ ```
685
+
686
+ # Contact
687
+ Zechun Liu, Meta Inc (zechunliu at meta dot com)
688
+
689
+ Ernie Chang, Meta Inc (erniecyc at meta dot com)
690
+
691
+ Changsheng Zhao, Meta Inc (cszhao at meta dot com)
692
+
693
+ # License
694
+
695
+ MobileLLM-R1 is currently licensed under the FAIR Noncommercial Research License (see LICENSE above).
check_shape.py ADDED
@@ -0,0 +1,80 @@
1
+ import argparse
2
+ import json
3
+ from pathlib import Path
4
+ from safetensors import safe_open
5
+
6
+
7
+ def check_model_shape(model_path: str):
8
+ """Inspects a model's config and weights to determine its MLP structure."""
9
+ model_path = Path(model_path)
10
+ config_path = model_path / "config.json"
11
+ weights_path = model_path / "model.safetensors"
12
+
13
+ if not config_path.exists():
14
+ print(f"Error: config.json not found in {model_path}")
15
+ return
16
+
17
+ if not weights_path.exists():
18
+ print(f"Error: model.safetensors not found in {model_path}")
19
+ return
20
+
21
+ print(f"--- Checking model shape in {model_path} ---")
22
+
23
+ # 1. Inspect config.json
24
+ with open(config_path, "r") as f:
25
+ config = json.load(f)
26
+
27
+ has_dual_mlp_config = config.get("intermediate_size_mlp", 0) > 0
28
+ print(f"Config has 'intermediate_size_mlp': {has_dual_mlp_config}")
29
+
30
+ # 2. Inspect weight keys from model.safetensors
31
+ has_dual_mlp_weights = False
32
+ try:
33
+ with safe_open(weights_path, framework="mlx") as f:
34
+ weight_keys = f.keys()
35
+ # A simple heuristic: check for weight keys that are not part of the standard SwiGLU MLP.
36
+ # This is not foolproof as names can vary, but it's a good indicator.
37
+ for key in weight_keys:
38
+ if (
39
+ "mlp" in key
40
+ and "gate_proj" not in key
41
+ and "up_proj" not in key
42
+ and "down_proj" not in key
43
+ ):
44
+ print(f"Found potential dual-branch weight: {key}")
45
+ has_dual_mlp_weights = True
46
+ break
47
+ except Exception as e:
48
+ print(f"Could not read weights from model.safetensors: {e}")
49
+ return
50
+
51
+ print(f"Found potential dual-branch MLP weights: {has_dual_mlp_weights}")
52
+
53
+ # 3. Report conclusion
54
+ print("\n--- Conclusion ---")
55
+ if has_dual_mlp_config and has_dual_mlp_weights:
56
+ print("✅ The model appears to be a DUAL-BRANCH MLP variant.")
57
+ elif has_dual_mlp_config and not has_dual_mlp_weights:
58
+ print(
59
+ "⚠️ The model configuration suggests a dual-branch MLP, but no corresponding weights were found."
60
+ )
61
+ print(" It will likely run as a SINGLE-BRANCH model.")
62
+ else:
63
+ print("✅ The model appears to be a SINGLE-BRANCH MLP variant.")
64
+ print("--------------------\n")
65
+
66
+
67
+ if __name__ == "__main__":
68
+ parser = argparse.ArgumentParser(
69
+ description="Check the MLP shape of a model variant."
70
+ )
71
+ parser.add_argument(
72
+ "model_path",
73
+ type=str,
74
+ nargs="?",
75
+ default=".",
76
+ help="Path to the model directory to check.",
77
+ )
78
+ args = parser.parse_args()
79
+
80
+ check_model_shape(args.model_path)
conversion.log ADDED
@@ -0,0 +1,8 @@
1
+ uv run python custom_mlx_lm/custom_convert.py --hf-path . --mlx-path MobileLLM-R1-950M-mlx/ --report-ppl
2
+ Loading model from ....
3
+ Loading calibration data...
4
+ Token indices sequence length is longer than the specified maximum sequence length for this model (110205 > 32768). Running this sequence through the model will result in indexing errors
5
+ Calculating perplexity of original model...
6
+ Original PPL: 50.262
7
+
8
+ ✅ Model saved to MobileLLM-R1-950M-mlx/
custom_mlx_lm/README.md ADDED
@@ -0,0 +1,48 @@
1
+ Custom MLX-LM Conversion, Quantization, and Inference
2
+
3
+ Overview
4
+ - Scripts here convert the HF safetensors model to MLX format, optionally apply mixed-precision dynamic quantization, and run inference with prompt formatting consistent with inference.py.
5
+ - Quant layout is persisted in config.json so the loader can re-materialize only Linear layers as QuantizedLinear while keeping embeddings and norms in float.
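+
+ The persisted metadata has the following shape (as written by the conversion script; the layer paths and bit choices shown here are illustrative):
+
+ ```python
+ config["quantization"] = {
+     "group_size": 64,
+     "method": "mixed_precision_dynamic",
+     "per_layer_bits": {
+         "layers.0.attention.q_proj": 8,        # sensitive layer kept at 8-bit
+         "layers.0.feed_forward.down_proj": 4,  # less sensitive layer at 4-bit
+     },
+ }
+ ```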
6
+
7
+ Key scripts
8
+ - custom_convert_2.py
9
+ - Convert and optionally quantize.
10
+ - Mixed precision uses calibration data and a sensitivity-driven split between 4-bit and 8-bit Linear layers.
11
+ - Saves weights to weights.npz and writes quantization metadata to config.json.
12
+ - custom_loader.py
13
+ - Loads the model with the correct module types (QuantizedLinear vs float) based on config metadata, then applies saved weights.
14
+ - Leaves embeddings and layernorms in float.
15
+ - inference_mlx_lm.py (CLI: mobilellm-infer)
16
+ - Runs generation. Uses chat_template.jinja when present, else prepends BOS, matching inference.py behavior.
17
+ - quant_summary.py
18
+ - Prints a summary of per-layer bit-widths and checks quantized tensors exist in weights.npz.
19
+
20
+ Quickstart
21
+ - Mixed-precision dynamic quantization
22
+ - uv run python custom_mlx_lm/custom_convert_2.py --hf-path . --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx --dynamic-quant --target-bpw 4.5 --report-ppl
23
+ - Group size defaults to 64 when not provided.
24
+ - Uniform quantization
25
+ - uv run python custom_mlx_lm/custom_convert_2.py --hf-path . --mlx-path MobileLLM-R1-950M-4bit-mlx --quantize --bits 4 --report-ppl
26
+ - Summarize quant layout
27
+ - uv run python custom_mlx_lm/quant_summary.py --model-path MobileLLM-R1-950M-mixed-4bit-mlx --show 8
28
+ - Inference
29
+ - mobilellm-infer --model-path MobileLLM-R1-950M-mixed-4bit-mlx --prompt "What is the nearest prime to 9^2?"
30
+
31
+ Notes and defaults
32
+ - Calibration: load_data uses WikiText-like data; dynamic quant computes sensitivities once and chooses 4/8-bit per Linear layer to target the requested bits-per-weight. Reported PPL is from the same set.
33
+ - Group size: defaults to 64 when quantizing if not provided.
34
+ - Prompt formatting: by default uses chat_template.jinja if present; otherwise prepends BOS for stable behavior across float and quant models.
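+
+ For reference, the bits-per-weight figure targeted by the dynamic search is measured exactly as in custom_convert.py:
+
+ ```python
+ from mlx.utils import tree_flatten
+
+ # assumes `model` is a loaded (possibly quantized) MLX model
+ params = dict(tree_flatten(model.parameters()))
+ bpw = sum(p.nbytes for p in params.values()) * 8 / sum(p.size for p in params.values())
+ ```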
35
+
36
+ Troubleshooting
37
+ - Empty sensitivities (ValueError: min() arg is empty)
38
+ - Fixed: ensure Linear weights are not frozen during sensitivity estimation; grads must exist.
39
+ - Unable to quantize model of type QuantizedLinear
40
+ - Fixed: second quantization pass now targets only remaining float Linear layers.
41
+ - [dequantize] The matrix should be given as a uint32
42
+ - Fixed: loader does not blanket-quantize; it re-materializes only Linear layers from per-layer bits map before loading weights, leaving embeddings in float.
43
+
44
+ Rationale and behavior
45
+ - Persist per-layer bits: enables deterministic, loader-driven reconstruction of quant modules and prevents accidental quantization of unsupported modules.
46
+ - Keep embeddings float: avoids dtype mismatch and preserves quality.
47
+ - Match inference.py formatting: improves output consistency between float and quant variants.
48
+
custom_mlx_lm/__init__.py ADDED
File without changes
custom_mlx_lm/custom_convert.py ADDED
@@ -0,0 +1,305 @@
1
+ import argparse
2
+ import copy
3
+ import json
4
+ import os
5
+ import sys
6
+ from pathlib import Path
7
+
8
+ import mlx.core as mx
9
+ import mlx.nn as nn
10
+ from mlx.utils import tree_flatten, tree_map, tree_unflatten
11
+ from mlx_lm.quant.dynamic_quant import eval_ppl
12
+ from mlx_lm.quant.utils import load_data
13
+ from safetensors import safe_open
14
+ from tqdm import tqdm
15
+ from transformers import AutoTokenizer
16
+
17
+ # FIX: Correctly calculate the project root to find model.py
18
+ project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
+ if project_root not in sys.path:
20
+ sys.path.insert(0, project_root)
21
+
22
+ from model import Model, ModelArgs
23
+
24
+
25
+ def estimate_sensitivities(
26
+ model, data, low_bits, low_group_size, high_bits, high_group_size, batch_size=4
27
+ ):
28
+ def qdq(w, bits, group_size):
29
+ w, s, b = mx.quantize(w, bits=bits, group_size=group_size)
30
+ return mx.dequantize(w, scales=s, biases=b, bits=bits, group_size=group_size)
31
+
32
+ q_model = copy.deepcopy(model)
33
+ linear_layers = {
34
+ k: layer
35
+ for k, layer in tree_flatten(
36
+ q_model.leaf_modules(), is_leaf=nn.Module.is_module
37
+ )
38
+ if isinstance(layer, nn.Linear)
39
+ }
40
+ # Quantize-dequantize weights for low-precision model copy and ensure
41
+ # the weights remain trainable so gradients are computed for sensitivities.
42
+ for layer in linear_layers.values():
43
+ layer.weight = qdq(layer.weight, low_bits, low_group_size)
44
+
45
+ def loss_fn(batch, targets):
46
+ logits = q_model(batch)
47
+ return nn.losses.cross_entropy(logits, targets, reduction="mean")
48
+
49
+ grad_accum = tree_map(lambda x: mx.zeros(x.shape), q_model.trainable_parameters())
50
+
51
+ for s in tqdm(range(0, len(data), batch_size), desc="Estimating sensitivities"):
52
+ batch = data[s : s + batch_size]
53
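+ # NOTE: these original-model logits are computed but not used by loss_fn below,
+ # which scores the quantized model against the ground-truth next tokens.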
+ targets = model(batch[:, :-1])
54
+ mx.eval(targets)
55
+ _, grads = nn.value_and_grad(q_model, loss_fn)(batch[:, :-1], batch[:, 1:])
56
+ grad_accum = tree_map(lambda x, y: x + y, grad_accum, grads)
57
+ mx.eval(grad_accum)
58
+
59
+ def compute_sensitivity(grad, lq_w, orig_w):
60
+ hq_w = qdq(orig_w, high_bits, high_group_size)
61
+ return (grad * (lq_w - hq_w)).sum()
62
+
63
+ # Use a direct loop instead of tree_map to be more robust
64
+ grad_dict = dict(tree_flatten(grad_accum))
65
+ q_params_dict = dict(tree_flatten(q_model.parameters()))
66
+ orig_params_dict = dict(tree_flatten(model.parameters()))
67
+
68
+ sensitivities = {}
69
+ for path, module in linear_layers.items():
70
+ weight_key = f"{path}.weight"
71
+ if weight_key in grad_dict:
72
+ grad = grad_dict[weight_key]
73
+ q_weight = q_params_dict[weight_key]
74
+ orig_weight = orig_params_dict[weight_key]
75
+
76
+ sensitivity = compute_sensitivity(grad, q_weight, orig_weight)
77
+ sensitivities[path] = sensitivity.item()
78
+
79
+ return sensitivities
80
+
81
+
82
+ def estimate_threshold(
83
+ model,
84
+ sensitivities,
85
+ target_bpw,
86
+ low_bits,
87
+ low_group_size,
88
+ high_bits,
89
+ high_group_size,
90
+ ):
91
+ def predicate(p, m, threshold):
92
+ if not isinstance(m, nn.Linear):
93
+ return False
94
+ return sensitivities.get(p, 0) > threshold
95
+
96
+ sens_vals = list(sensitivities.values())
97
+ if len(sens_vals) == 0:
98
+ raise RuntimeError(
99
+ "No sensitivities were computed. This usually means gradients "
100
+ "for Linear weights were not collected. Ensure layers are detected "
101
+ "and weights are trainable during sensitivity estimation."
102
+ )
103
+ min_thr, max_thr = min(sens_vals), max(sens_vals)
104
+
105
+ while (max_thr - min_thr) > 1e-3 * (max(sens_vals) - min(sens_vals)):
106
+ mid = (max_thr + min_thr) / 2
107
+ q_model = copy.deepcopy(model)
108
+
109
+ def high_predicate(p, m):
110
+ return predicate(p, m, mid)
111
+
112
+ def low_predicate(p, m):
113
+ # Only quantize remaining float nn.Linear layers; avoid re-quantizing
114
+ # modules already quantized in the first pass.
115
+ return isinstance(m, nn.Linear) and (not predicate(p, m, mid))
116
+
117
+ nn.quantize(
118
+ q_model,
119
+ group_size=high_group_size,
120
+ bits=high_bits,
121
+ class_predicate=high_predicate,
122
+ )
123
+ nn.quantize(
124
+ q_model,
125
+ group_size=low_group_size,
126
+ bits=low_bits,
127
+ class_predicate=low_predicate,
128
+ )
129
+
130
+ bpw = (
131
+ sum(p.nbytes for _, p in tree_flatten(q_model.parameters()))
132
+ * 8
133
+ / sum(p.size for _, p in tree_flatten(q_model.parameters()))
134
+ )
135
+
136
+ if bpw > target_bpw:
137
+ min_thr = mid
138
+ else:
139
+ max_thr = mid
140
+ return (max_thr + min_thr) / 2
141
+
142
+
143
+ # --- Main Conversion and Saving Logic ---
144
+ def main():
145
+ parser = argparse.ArgumentParser(
146
+ description="Convert and optionally quantize a model."
147
+ )
148
+ parser.add_argument(
149
+ "--hf-path", type=str, default=".", help="Path to the Hugging Face model."
150
+ )
151
+ parser.add_argument(
152
+ "--mlx-path", type=str, required=True, help="Path to save the MLX model."
153
+ )
154
+ parser.add_argument(
155
+ "--quantize",
156
+ "-q",
157
+ action="store_true",
158
+ help="Generate a simple uniformly quantized model.",
159
+ )
160
+ parser.add_argument(
161
+ "--dynamic-quant",
162
+ action="store_true",
163
+ help="Use advanced mixed-precision quantization.",
164
+ )
165
+ parser.add_argument(
166
+ "--report-ppl",
167
+ action="store_true",
168
+ help="Report perplexity before and after quantization.",
169
+ )
170
+ parser.add_argument(
171
+ "--target-bpw",
172
+ type=float,
173
+ default=4.5,
174
+ help="Target bits per weight for advanced quant.",
175
+ )
176
+ parser.add_argument(
177
+ "--bits", "-b", type=int, default=4, help="Bits for uniform quantization."
178
+ )
179
+ parser.add_argument(
180
+ "--group-size",
181
+ "-g",
182
+ type=int,
183
+ default=None,
184
+ help="Group size for quantization. If omitted, defaults to 64 when quantizing.",
185
+ )
186
+ args = parser.parse_args()
187
+
188
+ print(f"Loading model from {args.hf_path}...")
189
+ hf_path = Path(args.hf_path)
190
+ tokenizer = AutoTokenizer.from_pretrained(args.hf_path)
191
+
192
+ with open(hf_path / "config.json", "r") as f:
193
+ config = json.load(f)
194
+
195
+ with safe_open(hf_path / "model.safetensors", framework="mlx") as f:
196
+ keys = list(f.keys())
197
+ has_dual = any(
198
+ (".feed_forward.g_up.weight" in k) or (".mlp.g_up.weight" in k) for k in keys
199
+ )
200
+ model_args = ModelArgs.from_dict(config)
201
+ model_args.use_dual_mlp = bool(has_dual)
202
+ model = Model(model_args)
203
+
204
+ weights = {}
205
+ with safe_open(hf_path / "model.safetensors", framework="mlx") as f:
206
+ for k in f.keys():
207
+ if has_dual and ("gate_proj" in k or "up_proj" in k or "down_proj" in k):
208
+ continue
209
+ v = f.get_tensor(k)
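+ # Rename HF checkpoint keys to this runtime's MLX module paths.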
210
+ k = k.replace("model.embed_tokens", "tok_embeddings")
211
+ k = k.replace("model.layers", "layers")
212
+ k = k.replace("self_attn", "attention")
213
+ k = k.replace("input_layernorm", "attention_norm")
214
+ k = k.replace("post_attention_layernorm", "ffn_norm")
215
+ k = k.replace("mlp.", "feed_forward.")
216
+ k = k.replace("model.norm", "norm")
217
+ weights[k] = v
218
+ if config.get("tie_word_embeddings", True):
219
+ weights.pop("output.weight", None)
220
+ model.update(tree_unflatten(list(weights.items())))
221
+
222
+ calibration_data = None
223
+ if args.report_ppl or args.dynamic_quant:
224
+ print("Loading calibration data...")
225
+ calibration_data = load_data(tokenizer, num_samples=-1, sequence_length=512)
226
+
227
+ if args.report_ppl:
228
+ print("Calculating perplexity of original model...")
229
+ ppl = eval_ppl(model, data=calibration_data)
230
+ print(f"Original PPL: {ppl:.3f}")
231
+
232
+ if args.dynamic_quant:
233
+ # Choose a sensible default group size if not provided
234
+ if args.group_size is None:
235
+ args.group_size = 64
236
+ print("[info] Using default group_size=64 for dynamic quantization")
237
+ print("Starting advanced mixed-precision quantization...")
238
+ sensitivities = estimate_sensitivities(
239
+ model, calibration_data, 4, args.group_size, 8, args.group_size
240
+ )
241
+
242
+ threshold = estimate_threshold(
243
+ model,
244
+ sensitivities,
245
+ args.target_bpw,
246
+ 4,
247
+ args.group_size,
248
+ 8,
249
+ args.group_size,
250
+ )
251
+
252
+ # Compute per-layer bit widths BEFORE mutating the model
253
+ per_layer_bits = {p: (8 if s > threshold else 4) for p, s in sensitivities.items()}
254
+
255
+ def high_predicate(p, m):
256
+ return isinstance(m, nn.Linear) and per_layer_bits.get(p, 4) == 8
257
+
258
+ def low_predicate(p, m):
259
+ return isinstance(m, nn.Linear) and per_layer_bits.get(p, 4) == 4
260
+
261
+ nn.quantize(
262
+ model, group_size=args.group_size, bits=8, class_predicate=high_predicate
263
+ )
264
+ nn.quantize(
265
+ model, group_size=args.group_size, bits=4, class_predicate=low_predicate
266
+ )
267
+
268
+ # Persist per-layer bit-widths so the loader can re-materialize
269
+ # the correct QuantizedLinear modules on load without touching
270
+ # embeddings or other layers.
271
+ config["quantization"] = {
272
+ "group_size": args.group_size,
273
+ "method": "mixed_precision_dynamic",
274
+ "per_layer_bits": per_layer_bits,
275
+ }
276
+
277
+ elif args.quantize:
278
+ # Choose a sensible default group size if not provided
279
+ if args.group_size is None:
280
+ args.group_size = 64
281
+ print("[info] Using default group_size=64 for uniform quantization")
282
+ print("Starting simple uniform quantization...")
283
+ nn.quantize(model, group_size=args.group_size, bits=args.bits)
284
+ config["quantization"] = {
285
+ "group_size": args.group_size,
286
+ "bits": args.bits,
287
+ "method": "uniform",
288
+ }
289
+
290
+ if args.report_ppl and (args.quantize or args.dynamic_quant):
291
+ print("Calculating perplexity of quantized model...")
292
+ ppl = eval_ppl(model, data=calibration_data)
293
+ print(f"Quantized PPL: {ppl:.3f}")
294
+
295
+ output_path = Path(args.mlx_path)
296
+ output_path.mkdir(parents=True, exist_ok=True)
297
+ mx.savez(str(output_path / "weights.npz"), **dict(tree_flatten(model.parameters())))
298
+ with open(output_path / "config.json", "w") as f:
299
+ json.dump(config, f, indent=4)
300
+ tokenizer.save_pretrained(output_path)
301
+ print(f"\n✅ Model saved to {args.mlx_path}")
302
+
303
+
304
+ if __name__ == "__main__":
305
+ main()
custom_mlx_lm/custom_loader.py ADDED
@@ -0,0 +1,61 @@
+import json
+from pathlib import Path
+import numpy as np
+import mlx.core as mx
+import mlx.nn as nn
+from mlx.utils import tree_unflatten
+
+# We must import from the project root's model.py
+from model import Model, ModelArgs
+
+
+def load_model(model_path: str):
+    model_path = Path(model_path)
+    with open(model_path / "config.json", "r") as f:
+        config = json.load(f)
+
+    # Peek with numpy to inspect keys without materializing MLX arrays yet
+    npz_path = model_path / "weights.npz"
+    npz = np.load(npz_path, allow_pickle=False)
+    keys = list(npz.files)
+    has_dual = any("g_up" in k for k in keys)
+
+    args = ModelArgs.from_dict(config)
+    args.use_dual_mlp = bool(has_dual)
+    model = Model(args)
+
+    # If quantization metadata is present, re-materialize QuantizedLinear modules
+    qcfg = config.get("quantization") or {}
+    method = qcfg.get("method")
+    group_size = qcfg.get("group_size")
+
+    if method == "uniform":
+        bits = int(qcfg.get("bits", 4))
+        nn.quantize(
+            model,
+            group_size=int(group_size) if group_size is not None else 64,
+            bits=bits,
+            class_predicate=lambda p, m: isinstance(m, nn.Linear),
+        )
+    elif method == "mixed_precision_dynamic":
+        per_layer_bits = qcfg.get("per_layer_bits", {})
+
+        # The predicate returns False to skip a layer, or a dict of
+        # quantization parameters to override bits per layer.
+        def predicate(p, m):
+            if not isinstance(m, nn.Linear):
+                return False
+            b = per_layer_bits.get(p)
+            if b is None:
+                return False
+            return {"bits": int(b), "group_size": int(group_size)}
+
+        nn.quantize(
+            model,
+            group_size=int(group_size) if group_size is not None else 64,
+            bits=4,
+            class_predicate=predicate,
+        )
+
+    # Now load the actual weights into MLX and update
+    weights = mx.load(str(npz_path))
+    model.update(tree_unflatten(list(weights.items())))
+    return model
custom_mlx_lm/inference_mlx_lm.py ADDED
@@ -0,0 +1,131 @@
+# custom_mlx_lm/inference_mlx_lm.py
+import argparse
+import os
+import sys
+import time
+
+import mlx.core as mx
+from transformers import AutoTokenizer
+from pathlib import Path
+
+project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+if project_root not in sys.path:
+    sys.path.insert(0, project_root)
+
+# Use the robust universal loader
+from custom_mlx_lm.custom_loader import load_model
+
+
+def generate_text(
+    prompt: str,
+    model_path: str,
+    max_tokens: int = 100,
+    temperature: float = 0.1,
+    top_p: float = 0.9,
+    # Additional sampling parameters from inference.py can be added here if needed
+):
+    """
+    Generates text using the loaded MLX model with the robust custom sampler.
+    The logic is adapted from the standalone inference.py script.
+    """
+    print("Loading model and tokenizer using custom loader...")
+    model = load_model(model_path)
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+    # Align prompt handling with inference.py: prefer chat template, else prepend BOS
+    chat_template_path = Path(model_path) / "chat_template.jinja"
+    use_chat_format = chat_template_path.exists()
+
+    if use_chat_format:
+        messages = [{"role": "user", "content": prompt}]
+        formatted_prompt = tokenizer.apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True
+        )
+    else:
+        bos = tokenizer.bos_token or ""
+        formatted_prompt = f"{bos}{prompt}"
+
+    print("Starting generation...")
+    prompt_tokens = tokenizer.encode(formatted_prompt, add_special_tokens=False)
+    prompt_tokens = mx.array([prompt_tokens])
+
+    start_time = time.time()
+    generated_tokens = []
+    for i in range(max_tokens):
+        logits = model(prompt_tokens)
+        next_token_logits = logits[0, -1, :]
+
+        if temperature == 0:
+            next_token = int(mx.argmax(next_token_logits).item())
+        else:
+            scaled_logits = next_token_logits / temperature
+            if 0.0 < top_p < 1.0:
+                probs = mx.softmax(scaled_logits, axis=-1)
+                sorted_probs = mx.sort(probs)[::-1]
+                cumulative_probs = mx.cumsum(sorted_probs, axis=-1)
+                cutoff_index = mx.sum(cumulative_probs < top_p)
+                cutoff_prob = sorted_probs[cutoff_index.item()]
+                mask = probs >= cutoff_prob
+                scaled_logits = mx.where(mask, scaled_logits, float("-inf"))
+            next_token = mx.random.categorical(scaled_logits, num_samples=1).item()
+
+        eos_ids = tokenizer.eos_token_id
+        stop_ids = (
+            {int(i) for i in eos_ids} if isinstance(eos_ids, list) else {int(eos_ids)}
+        )
+        if next_token in stop_ids:
+            break
+
+        generated_tokens.append(next_token)
+        prompt_tokens = mx.concatenate(
+            [prompt_tokens, mx.array([[next_token]])], axis=1
+        )
+
+    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+    print("\n--- Response ---")
+    print(response)
+    print("------------------")
+    generation_speed = (
+        len(generated_tokens) / (time.time() - start_time) if generated_tokens else 0
+    )
+    print(
+        f"Generated {len(generated_tokens)} tokens at {generation_speed:.2f} tokens/sec"
+    )
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Run inference on converted MLX models."
+    )
+    parser.add_argument(
+        "--model-path",
+        type=str,
+        required=True,
+        help="Path to the converted MLX model directory.",
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default="What is the capital of France?",
+        help="The prompt.",
+    )
+    parser.add_argument(
+        "--max-tokens", type=int, default=100, help="Max tokens to generate."
+    )
+    parser.add_argument(
+        "--temperature", type=float, default=0.1, help="Sampling temperature."
+    )
+    parser.add_argument("--top-p", type=float, default=0.9, help="Top-p sampling.")
+    args = parser.parse_args()
+
+    generate_text(
+        args.prompt,
+        args.model_path,
+        args.max_tokens,
+        args.temperature,
+        args.top_p,
+    )
+
+
+if __name__ == "__main__":
+    main()
custom_mlx_lm/quant_summary.py ADDED
@@ -0,0 +1,67 @@
+import argparse
+import json
+from pathlib import Path
+import numpy as np
+
+
+def main():
+    p = argparse.ArgumentParser(description="Summarize MLX-LM quantization layout")
+    p.add_argument("--model-path", required=True, help="Path to converted MLX model")
+    p.add_argument("--show", type=int, default=10, help="Show up to N entries per group")
+    args = p.parse_args()
+
+    mpath = Path(args.model_path)
+    cfg = json.loads((mpath / "config.json").read_text())
+    q = cfg.get("quantization") or {}
+    method = q.get("method", "none")
+    gsize = q.get("group_size")
+    plb = q.get("per_layer_bits", {})
+
+    print(f"Method: {method}")
+    print(f"Group size: {gsize}")
+    if method == "uniform":
+        print(f"Uniform bits: {q.get('bits')}")
+        return
+
+    if not plb:
+        print("No per-layer bits found in config.")
+        return
+
+    # Basic counts
+    buckets = {4: [], 8: [], "other": []}
+    for k, b in plb.items():
+        if b == 4:
+            buckets[4].append(k)
+        elif b == 8:
+            buckets[8].append(k)
+        else:
+            buckets["other"].append(k)
+
+    total = sum(len(v) for v in buckets.values())
+    print(f"Total linear layers: {total}")
+    print(f"4-bit layers: {len(buckets[4])}")
+    print(f"8-bit layers: {len(buckets[8])}")
+    if buckets["other"]:
+        print(f"Other-bit layers: {len(buckets['other'])}")
+
+    # Optional: show a few examples
+    for b in (8, 4):
+        items = sorted(buckets[b])
+        if not items:
+            continue
+        print(f"\nExamples ({b}-bit):")
+        for k in items[: args.show]:
+            print(f"- {k}")
+
+    # Optional: sanity-check against npz contents
+    try:
+        npz = np.load(mpath / "weights.npz", allow_pickle=False)
+        has_q = any(k.endswith(".scales") or k.endswith(".biases") for k in npz.files)
+        print(f"\nweights.npz contains quantized tensors: {has_q}")
+    except Exception as e:
+        print(f"Note: could not open weights.npz: {e}")
+
+
+if __name__ == "__main__":
+    main()
+
inference.py ADDED
@@ -0,0 +1,270 @@
+import argparse
+import time
+import mlx.core as mx
+from transformers import AutoTokenizer
+from model import load_model
+from pathlib import Path
+
+
+def generate_text(
+    prompt: str,
+    model_path: str,
+    max_tokens: int = 100,
+    temperature: float = 0.1,
+    top_p: float = 0.9,
+    system: str | None = None,
+    final_only: bool = False,
+    stop_at_boxed: bool = False,
+    extract_boxed: bool = False,
+    disable_chat_template: bool = False,
+    repetition_penalty: float = 1.0,
+    frequency_penalty: float = 0.0,
+):
+    """Generates text using the loaded MLX model with better sampling."""
+    print("Loading model and tokenizer...")
+    model = load_model(model_path)
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+    # Check if we have the chat template
+    chat_template_path = Path(model_path) / "chat_template.jinja"
+    use_chat_format = chat_template_path.exists() and not disable_chat_template
+
+    print(f"Chat template found: {use_chat_format}")
+    print("Starting generation...")
+    print(f"Prompt: {prompt}")
+
+    # Format the prompt if using chat template
+    if use_chat_format:
+        messages = []
+        if system is None and final_only:
+            system = (
+                "You are a helpful assistant. Do not reveal your reasoning. "
+                "Respond with only the final answer enclosed in \\boxed{...}."
+            )
+        if system is not None:
+            messages.append({"role": "system", "content": system})
+        messages.append({"role": "user", "content": prompt})
+        formatted_prompt = tokenizer.apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True
+        )
+        print(f"Formatted prompt: {formatted_prompt}")
+    else:
+        # No chat template: prepend BOS if available in tokenizer
+        bos = tokenizer.bos_token or ""
+        formatted_prompt = f"{bos}{prompt}"
+
+    # Tokenize the prompt
+    prompt_tokens = tokenizer.encode(formatted_prompt, add_special_tokens=False)
+    prompt_tokens = mx.array([prompt_tokens])
+
+    print(f"Prompt tokens shape: {prompt_tokens.shape}")
+    print(
+        f"First few token IDs: {prompt_tokens[0, : min(10, prompt_tokens.shape[1])].tolist()}"
+    )
+
+    # Generation loop with better sampling
+    start_time = time.time()
+    generated_tokens = []
+    freq_counts = {}
+
+    running_text = ""
+    seen_box_start = False
+    for i in range(max_tokens):
+        # Get logits from model
+        logits = model(prompt_tokens)
+
+        # Focus on next-token logits
+        next_token_logits = logits[0, -1, :]
+
+        # Apply repetition and frequency penalties before sampling/argmax
+        if repetition_penalty and repetition_penalty != 1.0 and generated_tokens:
+            # Apply a simple repetition penalty to previously generated tokens
+            # Using HF-like rule: if logit > 0 divide by penalty else multiply by penalty
+            logits_list = next_token_logits.tolist()
+            seen = set(generated_tokens)
+            for tid in seen:
+                val = logits_list[tid]
+                if val > 0:
+                    logits_list[tid] = val / repetition_penalty
+                else:
+                    logits_list[tid] = val * repetition_penalty
+            next_token_logits = mx.array(logits_list)
+
+        if frequency_penalty and frequency_penalty > 0 and generated_tokens:
+            # Subtract a multiple of token frequency from logits
+            counts = {}
+            for t in generated_tokens:
+                counts[t] = counts.get(t, 0) + 1
+            # Build a dense penalty vector once per step
+            vocab_size = next_token_logits.shape[-1]
+            pen = [0.0] * vocab_size
+            for tid, c in counts.items():
+                pen[tid] = frequency_penalty * float(c)
+            next_token_logits = next_token_logits - mx.array(pen)
+
+        # Apply temperature (temperature==0 -> greedy)
+        if temperature == 0:
+            # Greedy decode
+            next_token = int(mx.argmax(next_token_logits).item())
+        else:
+            # Sampling path: scale logits, apply top-p mask in logits space
+            scaled_logits = next_token_logits / temperature
+
+            if 0.0 < top_p < 1.0:
+                probs = mx.softmax(scaled_logits, axis=-1)
+                sorted_probs = mx.sort(probs)[::-1]
+                cumulative_probs = mx.cumsum(sorted_probs, axis=-1)
+                cutoff_index = mx.sum(cumulative_probs < top_p)
+                cutoff_prob = sorted_probs[cutoff_index.item()]
+                mask = probs >= cutoff_prob
+                scaled_logits = mx.where(mask, scaled_logits, float("-inf"))
+
+            # Sample from logits (MLX categorical expects logits)
+            next_token = mx.random.categorical(scaled_logits, num_samples=1).item()
+
+        # Safer stop condition: support multiple EOS ids
+        eos_ids = tokenizer.eos_token_id
+        if isinstance(eos_ids, (list, tuple)):
+            stop_ids = set(int(i) for i in eos_ids)
+        else:
+            stop_ids = {int(eos_ids)}
+        if next_token in stop_ids:
+            print(f"Stopping generation at EOS token: {next_token}")
+            break
+
+        generated_tokens.append(next_token)
+        # Update frequency counts
+        freq_counts[next_token] = freq_counts.get(next_token, 0) + 1
+        # Append the new token for the next iteration
+        prompt_tokens = mx.concatenate(
+            [prompt_tokens, mx.array([[next_token]])], axis=1
+        )
+
+        # Print token as we generate for debugging
+        if i < 10:  # Only print first 10 tokens to avoid spam
+            token_text = tokenizer.decode([next_token])
+            print(f"Token {i}: {next_token} -> '{token_text}'")
+
+        # Optional boxed stopping condition
+        if stop_at_boxed:
+            token_text_full = tokenizer.decode([next_token], skip_special_tokens=False)
+            running_text += token_text_full
+            if not seen_box_start and "\\boxed{" in running_text:
+                seen_box_start = True
+            if seen_box_start and "}" in running_text:
+                print("Stopping generation at boxed answer.")
+                break
+
+    end_time = time.time()
+
+    # Decode and print the result
+    if generated_tokens:
+        response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+        print("\n--- Response ---")
+        print(response)
+    else:
+        print("\n--- No tokens generated ---")
+
+    print("------------------")
+
+    generation_speed = (
+        len(generated_tokens) / (end_time - start_time) if generated_tokens else 0
+    )
+    print(f"Generated {len(generated_tokens)} tokens")
+    print(f"Generation speed: {generation_speed:.2f} tokens/sec")
+
+    # Also print the full generated sequence including special tokens for debugging
+    if generated_tokens:
+        full_response = tokenizer.decode(generated_tokens, skip_special_tokens=False)
+        print(f"\nFull response (with special tokens): '{full_response}'")
+
+    if extract_boxed and generated_tokens:
+        import re
+
+        m = None
+        # Find the last occurrence of \boxed{...}; the pattern needs a single
+        # escaped backslash, since the decoded text contains one backslash.
+        for m in re.finditer(r"\\boxed\{([^}]*)\}", full_response):
+            pass
+        if m:
+            print(f"\nExtracted boxed answer: {m.group(1).strip()}")
+        else:
+            print("\nNo \\boxed{...} segment found to extract.")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Run inference with the MLX model.")
+    parser.add_argument(
+        "--model-path", type=str, default=".", help="Path to the model directory."
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default="What is the capital of France?",
+        help="The prompt to start generation from.",
+    )
+    parser.add_argument(
+        "--max-tokens",
+        type=int,
+        default=100,
+        help="The maximum number of tokens to generate.",
+    )
+    parser.add_argument(
+        "--temperature", type=float, default=0.1, help="Sampling temperature."
+    )
+    parser.add_argument(
+        "--top-p", type=float, default=0.9, help="Top-p (nucleus) sampling parameter."
+    )
+    parser.add_argument(
+        "--system", type=str, default=None, help="Optional system message for chat template."
+    )
+    parser.add_argument(
+        "--final-only",
+        action="store_true",
+        help="Instruct the model to output only the final answer inside \\boxed{...}.",
+    )
+    parser.add_argument(
+        "--stop-at-boxed",
+        action="store_true",
+        help="Stop generation once a closing '}' appears after \\boxed{.",
+    )
+    parser.add_argument(
+        "--extract-boxed",
+        action="store_true",
+        help="Extract and print the content inside the last \\boxed{...} in the response.",
+    )
+    parser.add_argument(
+        "--disable-chat-template",
+        action="store_true",
+        help="Ignore chat_template.jinja and feed the raw prompt (prepended with BOS).",
+    )
+    parser.add_argument(
+        "--repetition-penalty",
+        type=float,
+        default=1.0,
+        help="Penalty (>1.0) to discourage previously generated tokens.",
+    )
+    parser.add_argument(
+        "--frequency-penalty",
+        type=float,
+        default=0.0,
+        help="Subtract alpha * count(token) from logits before sampling.",
+    )
+    args = parser.parse_args()
+
+    generate_text(
+        args.prompt,
+        args.model_path,
+        args.max_tokens,
+        args.temperature,
+        args.top_p,
+        args.system,
+        args.final_only,
+        args.stop_at_boxed,
+        args.extract_boxed,
+        args.disable_chat_template,
+        args.repetition_penalty,
+        args.frequency_penalty,
+    )
+
+
+if __name__ == "__main__":
+    main()
main.py ADDED
@@ -0,0 +1,6 @@
+def main():
+    print("Hello from mobilellm-r1-950m!")
+
+
+if __name__ == "__main__":
+    main()
mlx_technical_summary.md ADDED
@@ -0,0 +1,96 @@
+# Porting **MobileLLM-R1-950M** to MLX and mlx-lm: Architectural Challenges and Solutions
+
+I spent some time pairing with Gemini 2.5 Pro and later OpenAI Codex to drag the brand-new facebook/MobileLLM-R1-950M weights onto Apple Silicon.
+This write-up is the “why it wasn’t copy-paste” story, plus the gotchas that bit us until the model finally spoke clean English and quantized without drama.
+
+### Goal
+
+Enable **facebook/MobileLLM-R1-950M** to run natively on Apple Silicon using MLX, then create quantized versions compatible with the mlx-lm ecosystem.
+
+---
+
+## 1. Why a Direct "Llama-4 Drop-In" Failed
+
+Although the Hugging Face repo presents MobileLLM-R1-950M as a Llama-4-style dense model, its **config and weights don't align cleanly** with a stock Llama block. The deviations aren't quirks of MLX—they reflect this model's specific architecture:
+
+* **MLP ambiguity**
+  The config advertises both `intermediate_size` and `intermediate_size_mlp`, suggesting a dual-branch feed-forward.
+  The actual weights contain only a SwiGLU branch (`gate_proj`, `up_proj`, `down_proj`).
+  → Solution: **auto-detect the MLP variant from weight names** at load time (see the sketch after this list).
+
+* **Grouped-Query Attention (GQA)**
+  `num_attention_heads=24`, `num_key_value_heads=6`.
+  K/V tensors must be **repeated to the full head count** for attention shapes to align correctly.
+
+* **QK-norm and scaling**
+  The config includes `use_qk_norm=True` and `attn_scale=0.1`.
+  We add the **RMSNorm on Q/K** as specified, but drop the extra `0.1` multiplier—applying it in MLX's `scaled_dot_product_attention` collapses logits into gibberish.
+
+* **RoPE gating**
+  The config lists all layers under `no_rope_layers`.
+  Disabling RoPE everywhere would eliminate positional encoding entirely.
+  → Treat "all layers disabled" as a config artifact and **apply RoPE everywhere**.
+
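+A minimal sketch of the detection step, assuming the checkpoint is a single `model.safetensors` (the helper name `detect_dual_mlp` is illustrative; the same key check lives in `model.py`'s `load_model`):
+
+```python
+from safetensors import safe_open
+
+def detect_dual_mlp(checkpoint_path: str) -> bool:
+    """Return True if the checkpoint carries dual-branch MLP weights."""
+    with safe_open(checkpoint_path, framework="mlx") as f:
+        keys = list(f.keys())
+    # Dual-branch checkpoints expose g_up/p_up projections; a plain SwiGLU
+    # checkpoint only has gate_proj/up_proj/down_proj.
+    return any(".g_up.weight" in k or ".p_up.weight" in k for k in keys)
+```
+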
+---
+
+## 2. Prompt-Level Deviations
+
+Even after the weights loaded correctly, default inference was disrupted by tokenizer settings (the combined fix is sketched after this list):
+
+* **Chat template**
+  Default system prompt: *"Please reason step-by-step and put your final answer within \boxed{}."*
+  Without overrides, the model produces verbose "reasoning" outputs.
+  → Added CLI controls: `--system`, `--disable-chat-template`, `--final-only`.
+
+* **Double BOS**
+  Both the tokenizer and the template inserted BOS tokens.
+  → Fixed with `add_special_tokens=False`.
+
+* **Premature EOS**
+  Template headers (`<|eot_id|>`) were treated as stop tokens.
+  → Limited the stopping criteria to the true EOS token only.
+
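+In miniature, the prompt handling that honors a shipped template while avoiding the double BOS (mirroring `inference.py`; the `build_prompt` helper is illustrative):
+
+```python
+from pathlib import Path
+
+def build_prompt(tokenizer, model_path: str, prompt: str) -> list[int]:
+    # Prefer the chat template when the repo ships one; otherwise prepend BOS.
+    if (Path(model_path) / "chat_template.jinja").exists():
+        text = tokenizer.apply_chat_template(
+            [{"role": "user", "content": prompt}],
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+    else:
+        text = f"{tokenizer.bos_token or ''}{prompt}"
+    # add_special_tokens=False keeps encode() from stacking a second BOS
+    # on top of the one the template (or the f-string above) already added.
+    return tokenizer.encode(text, add_special_tokens=False)
+```
+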
+---
+
+## 3. Sampling Stability
+
+Sampling issues stemmed from API mismatches rather than model problems (the fixed sampler is sketched below):
+
+* **Top-p on probabilities** followed by feeding probabilities to `mx.random.categorical` produced repetition loops.
+* **Solution:** Apply penalties → scale logits → top-p mask (with `float('-inf')`) → `categorical(logits)`.
+* Added controls for **temperature, repetition penalty, and frequency penalty**.
+
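+A sketch of the working order of operations (matching the sampler in `inference.py`, minus the penalty terms):
+
+```python
+import mlx.core as mx
+
+def sample_next(logits: mx.array, temperature: float, top_p: float) -> int:
+    if temperature == 0:
+        return int(mx.argmax(logits).item())  # greedy decode
+    scaled = logits / temperature
+    if 0.0 < top_p < 1.0:
+        probs = mx.softmax(scaled, axis=-1)
+        sorted_probs = mx.sort(probs)[::-1]  # descending
+        cumulative = mx.cumsum(sorted_probs, axis=-1)
+        cutoff = sorted_probs[mx.sum(cumulative < top_p).item()]
+        # Mask in logits space; mx.random.categorical expects logits.
+        scaled = mx.where(probs >= cutoff, scaled, float("-inf"))
+    return int(mx.random.categorical(scaled, num_samples=1).item())
+```
+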
+---
+
+## 4. Quantization in mlx-lm: Why Custom Metadata Was Required
+
+mlx-lm provides quantization hooks, but MobileLLM's architecture exposed several challenges:
+
+1. **Frozen gradients during sensitivity analysis** → empty sensitivity lists.
+   → Avoid freezing weights during gradient computation.
+
+2. **Re-quantizing quantized layers** → type errors on the second pass.
+   → Skip layers that are already `QuantizedLinear`.
+
+3. **Embedding/norm dtype crashes**
+   Standard quantization re-quantized everything, but embeddings must remain float.
+   → Introduced a **metadata-driven approach**: config.json records *per-layer bit-widths*. Only the specified layers are instantiated as `QuantizedLinear`.
+
+This metadata contract allows the **4-bit mixed-precision MobileLLM** to be loaded cleanly by our **metadata-aware `custom_loader.py`**, making it compatible with the mlx-lm ecosystem.
+
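+Concretely, the contract is a small block the converter persists alongside the weights (a sketch; the per-layer values shown match the run recorded in `quantization.log`):
+
+```python
+# Written by custom_convert.py into the saved model's config.json
+config["quantization"] = {
+    "method": "mixed_precision_dynamic",
+    "group_size": 64,
+    "per_layer_bits": {
+        "layers.0.attention.o_proj": 8,  # most sensitive layer stays at 8-bit
+        "layers.0.attention.q_proj": 4,
+        # ... one entry per nn.Linear in the model
+    },
+}
+```
+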
+---
+
+## 5. End State
+
+* **MLX path:**
+  Structural fixes (GQA, MLP detection), numerical fixes (QK-norm, RoPE, attn_scale), and prompt controls together yield fluent, stable inference.
+
+* **mlx-lm path:**
+  The custom quantization pipeline produces FP16 and 4-bit models. These can be loaded with our **metadata-aware `custom_loader.py`** and used for inference with the provided scripts.
+  Performance: a measurable speedup and reduced memory use on Apple Silicon, with minimal quality degradation.
+
+---
+
+### Takeaway
+
+The MobileLLM-R1-950M port required systematically addressing architectural mismatches (MLP variant detection, GQA handling, QK-norm implementation, RoPE configuration) and developing a metadata-driven quantization approach. Once these were resolved, the model became fully functional in MLX with both float and quantized inference paths.
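+
+For reference, the end-to-end flow (the conversion command is taken verbatim from `quantization.log`; the inference invocation mirrors the CLI of `inference_mlx_lm.py`):
+
+```bash
+# Convert + mixed-precision quantize, reporting perplexity before/after
+uv run python custom_mlx_lm/custom_convert.py --hf-path . \
+    --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx \
+    --dynamic-quant --target-bpw 4.5 --group-size 64 --report-ppl
+
+# Run inference against the converted model
+uv run python custom_mlx_lm/inference_mlx_lm.py \
+    --model-path MobileLLM-R1-950M-mixed-4bit-mlx \
+    --prompt "What is the capital of France?"
+```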
model.py ADDED
@@ -0,0 +1,339 @@
+import mlx.core as mx
+import mlx.nn as nn
+import json
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass
+class ModelArgs:
+    hidden_size: int
+    num_attention_heads: int
+    num_hidden_layers: int
+    vocab_size: int
+    intermediate_size: int
+    intermediate_size_mlp: int | None = None
+    num_key_value_heads: int = 0
+    rms_norm_eps: float = 1e-5
+    rope_theta: float = 10000.0
+    head_dim: int | None = None
+    use_dual_mlp: bool = False
+    tie_word_embeddings: bool = True
+    use_qk_norm: bool = False
+    attn_scale: float = 1.0
+    no_rope_layers: list | None = None
+    attention_chunk_size: int | None = None
+    attn_temperature_tuning: bool = False
+
+    @classmethod
+    def from_dict(cls, params):
+        return cls(
+            hidden_size=params["hidden_size"],
+            num_attention_heads=params["num_attention_heads"],
+            num_hidden_layers=params["num_hidden_layers"],
+            vocab_size=params["vocab_size"],
+            intermediate_size=params["intermediate_size"],
+            intermediate_size_mlp=params.get("intermediate_size_mlp"),
+            num_key_value_heads=params.get("num_key_value_heads", 0),
+            rms_norm_eps=params.get("rms_norm_eps", 1e-5),
+            rope_theta=params.get("rope_theta", 10000.0),
+            head_dim=params.get("head_dim"),
+            # Default: off. We'll detect from weights in load_model.
+            use_dual_mlp=False,
+            tie_word_embeddings=params.get("tie_word_embeddings", True),
+            use_qk_norm=params.get("use_qk_norm", False),
+            attn_scale=params.get("attn_scale", 1.0),
+            no_rope_layers=params.get("no_rope_layers"),
+            attention_chunk_size=params.get("attention_chunk_size"),
+            attn_temperature_tuning=params.get("attn_temperature_tuning", False),
+        )
+
+
+class RMSNorm(nn.Module):
+    def __init__(self, dims: int, eps: float = 1e-5):
+        super().__init__()
+        self.weight = mx.ones((dims,))
+        self.eps = eps
+
+    def _norm(self, x):
+        return x * mx.rsqrt(x.square().mean(-1, keepdims=True) + self.eps)
+
+    def __call__(self, x):
+        output = self._norm(x.astype(mx.float32)).astype(x.dtype)
+        return self.weight * output
+
+
+class Attention(nn.Module):
+    def __init__(self, args: ModelArgs):
+        super().__init__()
+        self.args = args
+        self.n_heads = args.num_attention_heads
+        self.n_kv_heads = (
+            args.num_key_value_heads
+            if args.num_key_value_heads > 0
+            else args.num_attention_heads
+        )
+        self.head_dim = (
+            args.head_dim
+            if getattr(args, "head_dim", None) is not None
+            else (args.hidden_size // self.n_heads)
+        )
+        # Use standard LLaMA scaling. The attn_scale field in some configs
+        # does not correspond to SDPA scaling and degrades outputs if applied here.
+        self.scale = self.head_dim**-0.5
+
+        self.q_proj = nn.Linear(
+            args.hidden_size, self.n_heads * self.head_dim, bias=False
+        )
+        self.k_proj = nn.Linear(
+            args.hidden_size, self.n_kv_heads * self.head_dim, bias=False
+        )
+        self.v_proj = nn.Linear(
+            args.hidden_size, self.n_kv_heads * self.head_dim, bias=False
+        )
+        self.o_proj = nn.Linear(
+            self.n_heads * self.head_dim, args.hidden_size, bias=False
+        )
+        self.q_norm = (
+            RMSNorm(self.head_dim, eps=args.rms_norm_eps)
+            if getattr(args, "use_qk_norm", False)
+            else None
+        )
+        self.k_norm = (
+            RMSNorm(self.head_dim, eps=args.rms_norm_eps)
+            if getattr(args, "use_qk_norm", False)
+            else None
+        )
+        # Llama 4 text models commonly use traditional RoPE application
+        self.rope = nn.RoPE(self.head_dim, traditional=True, base=args.rope_theta)
+
+    def __call__(
+        self,
+        x,
+        mask=None,
+        cache=None,
+        apply_rope: bool = True,
+        attn_temp: float | None = None,
+    ):
+        B, L, D = x.shape
+        queries, keys, values = self.q_proj(x), self.k_proj(x), self.v_proj(x)
+
+        queries = queries.reshape(B, L, self.n_heads, -1).transpose(0, 2, 1, 3)
+        keys = keys.reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
+        values = values.reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
+
+        if self.q_norm is not None:
+            queries = self.q_norm(queries)
+            keys = self.k_norm(keys)
+
+        # Optionally apply RoPE depending on per-layer setting
+        if apply_rope:
+            if cache is not None:
+                queries = self.rope(queries, offset=cache.offset)
+                keys = self.rope(keys, offset=cache.offset)
+                keys, values = cache.update_and_fetch(keys, values)
+            else:
+                queries = self.rope(queries)
+                keys = self.rope(keys)
+        else:
+            if cache is not None:
+                keys, values = cache.update_and_fetch(keys, values)
+
+        if self.n_kv_heads != self.n_heads:
+            repeat = self.n_heads // self.n_kv_heads
+            keys = mx.repeat(keys, repeat, axis=1)
+            values = mx.repeat(values, repeat, axis=1)
+
+        # Optional attention temperature tuning (scale the softmax input)
+        scale = self.scale if attn_temp is None else (self.scale * attn_temp)
+        output = mx.fast.scaled_dot_product_attention(
+            queries, keys, values, scale=scale, mask=mask
+        )
+        output = output.transpose(0, 2, 1, 3).reshape(B, L, -1)
+        return self.o_proj(output)
+
+
+class SwiGLUMLP(nn.Module):
+    """Standard LLaMA-style gated MLP (SwiGLU)."""
+
+    def __init__(self, dim, intermediate_size, activation=nn.silu):
+        super().__init__()
+        self.gate_proj = nn.Linear(dim, intermediate_size, bias=False)
+        self.up_proj = nn.Linear(dim, intermediate_size, bias=False)
+        self.down_proj = nn.Linear(intermediate_size, dim, bias=False)
+
+    def __call__(self, x):
+        return self.down_proj(nn.silu(self.gate_proj(x)) * self.up_proj(x))
+
+
+class DualMLP(nn.Module):
+    """Dense dual-branch MLP: gated + plain."""
+
+    def __init__(self, dim, intermediate_gated, intermediate_plain, activation=nn.silu):
+        super().__init__()
+        self.g_up = nn.Linear(dim, intermediate_gated, bias=False)
+        self.g_gate = nn.Linear(dim, intermediate_gated, bias=False)
+        self.g_down = nn.Linear(intermediate_gated, dim, bias=False)
+
+        self.p_up = nn.Linear(dim, intermediate_plain, bias=False)
+        self.p_down = nn.Linear(intermediate_plain, dim, bias=False)
+
+    def __call__(self, x):
+        gated_out = self.g_down(nn.silu(self.g_gate(x)) * self.g_up(x))
+        plain_out = self.p_down(nn.silu(self.p_up(x)))
+        return gated_out + plain_out
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, args: ModelArgs, layer_idx: int):
+        super().__init__()
+        self.attention = Attention(args)
+        self.layer_idx = layer_idx
+        # RoPE gating per layer.
+        # If the config provides a per-layer no_rope mask:
+        #   - If it disables ALL layers, ignore it (apply RoPE everywhere)
+        #   - Otherwise, honor the per-layer flag.
+        if (
+            isinstance(args.no_rope_layers, list)
+            and len(args.no_rope_layers) > layer_idx
+        ):
+            all_marked = all(bool(v) for v in args.no_rope_layers)
+            if all_marked:
+                disable_rope = False
+            else:
+                disable_rope = bool(args.no_rope_layers[layer_idx])
+        else:
+            disable_rope = False
+        self.apply_rope = not disable_rope
+
+        if args.use_dual_mlp and args.intermediate_size_mlp:
+            self.feed_forward = DualMLP(
+                args.hidden_size,
+                args.intermediate_size,
+                args.intermediate_size_mlp,
+            )
+        else:
+            self.feed_forward = SwiGLUMLP(
+                args.hidden_size,
+                args.intermediate_size_mlp,
+            )
+
+        self.attention_norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+        self.ffn_norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+
+    def __call__(self, x, mask=None, cache=None):
+        L = x.shape[1]
+        # Use standard causal mask; iRoPE chunking is not applied for now
+        attn_mask = (
+            None
+            if L <= 1
+            else nn.MultiHeadAttention.create_additive_causal_mask(L).astype(x.dtype)
+        )
+        args = self.attention.args
+        apply_rope = self.apply_rope
+        attn_temp = 1.0 if getattr(args, "attn_temperature_tuning", False) else None
+
+        r = self.attention(
+            self.attention_norm(x),
+            attn_mask,
+            cache,
+            apply_rope=apply_rope,
+            attn_temp=attn_temp,
+        )
+        h = x + r
+        r = self.feed_forward(self.ffn_norm(h))
+        return h + r
+
+
+class Model(nn.Module):
+    def __init__(self, args: ModelArgs):
+        super().__init__()
+        self.args = args
+        self.vocab_size = args.vocab_size
+        self.tok_embeddings = nn.Embedding(args.vocab_size, args.hidden_size)
+        # Plain Python list is fine in MLX
+        self.layers = [
+            TransformerBlock(args=args, layer_idx=i)
+            for i in range(args.num_hidden_layers)
+        ]
+        self.norm = RMSNorm(args.hidden_size, eps=args.rms_norm_eps)
+
+        if not self.args.tie_word_embeddings:
+            self.output = nn.Linear(args.hidden_size, args.vocab_size, bias=False)
+
+    def __call__(self, inputs, cache=None):
+        h = self.tok_embeddings(inputs)
+
+        if cache is None:
+            cache = [None] * len(self.layers)
+
+        for layer, c in zip(self.layers, cache):
+            h = layer(h, None, c)
+
+        h = self.norm(h)
+
+        if self.args.tie_word_embeddings:
+            return h @ self.tok_embeddings.weight.T
+        else:
+            return self.output(h)
+
+
+def load_model(model_path: str):
+    model_path = Path(model_path)
+    with open(model_path / "config.json", "r") as f:
+        config = json.load(f)
+
+    from safetensors import safe_open
+    from mlx.utils import tree_unflatten
+
+    # Peek at weights to decide MLP variant
+    with safe_open(model_path / "model.safetensors", framework="mlx") as f:
+        keys = list(f.keys())
+    has_dual = any(
+        (".feed_forward.g_up.weight" in k)
+        or (".mlp.g_up.weight" in k)
+        or (".feed_forward.p_up.weight" in k)
+        or (".mlp.p_up.weight" in k)
+        for k in keys
+    )
+
+    args = ModelArgs.from_dict(config)
+    args.use_dual_mlp = bool(has_dual)
+    model = Model(args)
+
+    weights = {}
+    with safe_open(model_path / "model.safetensors", framework="mlx") as f:
+        for k in f.keys():
+            v = f.get_tensor(k)
+            # The keys in the safetensors file are from the Hugging Face model.
+            # We need to map them to the names in our MLX model. For the MLP,
+            # the projection names are conveniently the same when using SwiGLUMLP.
+            k = k.replace("model.embed_tokens", "tok_embeddings")
+            k = k.replace("model.layers", "layers")
+            k = k.replace("self_attn", "attention")
+            k = k.replace("input_layernorm", "attention_norm")
+            k = k.replace("post_attention_layernorm", "ffn_norm")
+            k = k.replace("mlp.", "feed_forward.")
+            k = k.replace("model.norm", "norm")
+            weights[k] = v
+
+    # The output layer is tied to the token embeddings, so we don't load
+    # weights for it separately.
+    if config.get("tie_word_embeddings", True):
+        weights.pop("output.weight", None)
+
+    model.update(tree_unflatten(list(weights.items())))
+    return model
pr-16104-summary.md ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,23 @@
+[project]
+name = "mobilellm-r1-950m"
+version = "0.1.0"
+description = "mlx_lm_for_mobile_llm_r1"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "mlx>=0.29.1",
+    "mlx-lm>=0.27.1",
+    "safetensors>=0.6.2",
+    "transformers>=4.56.1",
+]
+
+[dependency-groups]
+dev = [
+    "torch>=2.8.0",
+]
+
+[tool.hatch.build.targets.wheel]
+packages = ["custom_mlx_lm"]
+
+[project.scripts]
+mobilellm-infer = "custom_mlx_lm.inference_mlx_lm:main"
quantization.log ADDED
@@ -0,0 +1,38 @@
+uv run python custom_mlx_lm/custom_convert.py --hf-path . --mlx-path MobileLLM-R1-950M-mixed-4bit-mlx --dynamic-quant --target-bpw 4.5 --group-size 64 --report-ppl
+Loading model from ....
+Loading calibration data...
+Token indices sequence length is longer than the specified maximum sequence length for this model (110205 > 32768). Running this sequence through the model will result in indexing errors
+Calculating perplexity of original model...
+Original PPL: 50.262
+Starting advanced mixed-precision quantization...
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+    - Avoid using `tokenizers` before the fork if possible
+    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+Estimating sensitivities: 100%|████████████████████████████████████| 54/54 [02:03<00:00,  2.28s/it]
+Calculating perplexity of quantized model...
+Quantized PPL: 59.059
+
+✅ Model saved to MobileLLM-R1-950M-mixed-4bit-mlx
+
+uv run python custom_mlx_lm/quant_summary.py --model-path MobileLLM-R1-950M-mixed-4bit-mlx --show 8
+Method: mixed_precision_dynamic
+Group size: 64
+Total linear layers: 154
+4-bit layers: 153
+8-bit layers: 1
+
+Examples (8-bit):
+- layers.0.attention.o_proj
+
+Examples (4-bit):
+- layers.0.attention.k_proj
+- layers.0.attention.q_proj
+- layers.0.attention.v_proj
+- layers.0.feed_forward.down_proj
+- layers.0.feed_forward.gate_proj
+- layers.0.feed_forward.up_proj
+- layers.1.attention.k_proj
+- layers.1.attention.o_proj
+
+weights.npz contains quantized tensors: True
requirements.txt ADDED
@@ -0,0 +1,29 @@
+certifi==2025.8.3
+charset-normalizer==3.4.3
+filelock==3.19.1
+fsspec==2025.9.0
+hf-xet==1.1.10
+huggingface-hub==0.34.4
+idna==3.10
+jinja2==3.1.6
+markupsafe==3.0.2
+mlx==0.29.1
+mlx-lm==0.27.1
+mlx-metal==0.29.1
+mpmath==1.3.0
+networkx==3.5
+numpy==2.3.3
+packaging==25.0
+protobuf==6.32.1
+pyyaml==6.0.2
+regex==2025.9.1
+requests==2.32.5
+safetensors==0.6.2
+setuptools==80.9.0
+sympy==1.14.0
+tokenizers==0.22.0
+torch==2.8.0
+tqdm==4.67.1
+transformers==4.56.1
+typing-extensions==4.15.0
+urllib3==2.5.0
test_model.py ADDED
@@ -0,0 +1,103 @@
+import sys
+from pathlib import Path
+
+# Add the current directory to the python path to import model.py
+sys.path.append(str(Path.cwd()))
+
+from model import load_model
+from mlx.utils import tree_flatten
+
+
+def run_diagnostic_checks():
+    """
+    Performs the verification checks outlined in the review.
+    """
+    print("--- Running Diagnostic Checks ---")
+
+    # 1. Load model and check for errors
+    try:
+        model = load_model(".")
+        print("Successfully loaded model definition.")
+    except Exception as e:
+        print(f"Error loading model: {e}")
+        return
+
+    # 2. Print total parameter count
+    try:
+        params = model.parameters()
+        num_params = sum(p.size for _, p in tree_flatten(params))
+        print(f"Total number of parameters: {num_params / 1e6:.2f}M")
+    except Exception as e:
+        print(f"Error calculating parameters: {e}")
+
+    # 3. Verify MLP weight shapes
+    print("--- Verifying MLP Weight Shapes ---")
+    try:
+        first_block = model.layers[0]
+        args = model.args
+        print(f"use_dual_mlp detected: {args.use_dual_mlp}")
+
+        if args.use_dual_mlp:
+            g_up_shape = first_block.feed_forward.g_up.weight.shape
+            p_up_shape = first_block.feed_forward.p_up.weight.shape
+            print(f"Gated MLP branch (g_up) weight shape: {g_up_shape}")
+            print(f"Plain MLP branch (p_up) weight shape: {p_up_shape}")
+            assert g_up_shape == (args.intermediate_size, args.hidden_size)
+            assert p_up_shape == (args.intermediate_size_mlp, args.hidden_size)
+            print("DualMLP weight shapes are correct.")
+        else:
+            gate_proj_shape = first_block.feed_forward.gate_proj.weight.shape
+            up_proj_shape = first_block.feed_forward.up_proj.weight.shape
+            print(f"SwiGLUMLP gate_proj weight shape: {gate_proj_shape}")
+            print(f"SwiGLUMLP up_proj weight shape: {up_proj_shape}")
+            assert gate_proj_shape == (args.intermediate_size_mlp, args.hidden_size)
+            assert up_proj_shape == (args.intermediate_size_mlp, args.hidden_size)
+            print("SwiGLUMLP weight shapes are correct.")
+
+    except AttributeError as e:
+        print(
+            f"Error accessing MLP weights. It seems the structure is not as expected: {e}"
+        )
+    except AssertionError:
+        print("Error: MLP weight shapes do not match the configuration.")
+    except Exception as e:
+        print(f"An unexpected error occurred while verifying shapes: {e}")
+
+    # 4. Verify Embedding shape
+    print("--- Verifying Embedding Shape ---")
+    try:
+        embedding_shape = model.tok_embeddings.weight.shape
+        print(f"Embedding weight shape: {embedding_shape}")
+
+        args = model.args
+        print(f"Expected embedding shape: ({args.vocab_size}, {args.hidden_size})")
+
+        assert embedding_shape == (args.vocab_size, args.hidden_size)
+        print("Embedding shape is correct.")
+    except Exception as e:
+        print(f"An unexpected error occurred while verifying embedding shape: {e}")
+
+    print("--- Sanity Checking Loaded Weights ---")
+    try:
+        # Check expected attribute exists based on architecture
+        if model.args.use_dual_mlp:
+            _ = model.layers[0].feed_forward.g_gate.weight
+            _ = model.layers[0].feed_forward.g_up.weight
+            _ = model.layers[0].feed_forward.g_down.weight
+            _ = model.layers[0].feed_forward.p_up.weight
+            _ = model.layers[0].feed_forward.p_down.weight
+            print("Found dual-branch MLP weights in the model.")
+        else:
+            _ = model.layers[0].feed_forward.gate_proj.weight
+            _ = model.layers[0].feed_forward.up_proj.weight
+            _ = model.layers[0].feed_forward.down_proj.weight
+            print("Found SwiGLU MLP weights in the model.")
+        print("Weight presence sanity check passed.")
+    except Exception as e:
+        print(f"An error occurred during sanity check: {e}")
+
+    print("--- Diagnostic Checks Complete ---")
+
+
+if __name__ == "__main__":
+    run_diagnostic_checks()
uv.lock ADDED
@@ -0,0 +1,678 @@
1
+ version = 1
2
+ revision = 3
3
+ requires-python = ">=3.13"
4
+
5
+ [[package]]
6
+ name = "certifi"
7
+ version = "2025.8.3"
8
+ source = { registry = "https://pypi.org/simple" }
9
+ sdist = { url = "https://files.pythonhosted.org/packages/dc/67/960ebe6bf230a96cda2e0abcf73af550ec4f090005363542f0765df162e0/certifi-2025.8.3.tar.gz", hash = "sha256:e564105f78ded564e3ae7c923924435e1daa7463faeab5bb932bc53ffae63407", size = 162386, upload-time = "2025-08-03T03:07:47.08Z" }
10
+ wheels = [
11
+ { url = "https://files.pythonhosted.org/packages/e5/48/1549795ba7742c948d2ad169c1c8cdbae65bc450d6cd753d124b17c8cd32/certifi-2025.8.3-py3-none-any.whl", hash = "sha256:f6c12493cfb1b06ba2ff328595af9350c65d6644968e5d3a2ffd78699af217a5", size = 161216, upload-time = "2025-08-03T03:07:45.777Z" },
12
+ ]
13
+
14
+ [[package]]
15
+ name = "charset-normalizer"
16
+ version = "3.4.3"
17
+ source = { registry = "https://pypi.org/simple" }
18
+ sdist = { url = "https://files.pythonhosted.org/packages/83/2d/5fd176ceb9b2fc619e63405525573493ca23441330fcdaee6bef9460e924/charset_normalizer-3.4.3.tar.gz", hash = "sha256:6fce4b8500244f6fcb71465d4a4930d132ba9ab8e71a7859e6a5d59851068d14", size = 122371, upload-time = "2025-08-09T07:57:28.46Z" }
19
+ wheels = [
20
+ { url = "https://files.pythonhosted.org/packages/65/ca/2135ac97709b400c7654b4b764daf5c5567c2da45a30cdd20f9eefe2d658/charset_normalizer-3.4.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:14c2a87c65b351109f6abfc424cab3927b3bdece6f706e4d12faaf3d52ee5efe", size = 205326, upload-time = "2025-08-09T07:56:24.721Z" },
21
+ { url = "https://files.pythonhosted.org/packages/71/11/98a04c3c97dd34e49c7d247083af03645ca3730809a5509443f3c37f7c99/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:41d1fc408ff5fdfb910200ec0e74abc40387bccb3252f3f27c0676731df2b2c8", size = 146008, upload-time = "2025-08-09T07:56:26.004Z" },
22
+ { url = "https://files.pythonhosted.org/packages/60/f5/4659a4cb3c4ec146bec80c32d8bb16033752574c20b1252ee842a95d1a1e/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:1bb60174149316da1c35fa5233681f7c0f9f514509b8e399ab70fea5f17e45c9", size = 159196, upload-time = "2025-08-09T07:56:27.25Z" },
23
+ { url = "https://files.pythonhosted.org/packages/86/9e/f552f7a00611f168b9a5865a1414179b2c6de8235a4fa40189f6f79a1753/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:30d006f98569de3459c2fc1f2acde170b7b2bd265dc1943e87e1a4efe1b67c31", size = 156819, upload-time = "2025-08-09T07:56:28.515Z" },
24
+ { url = "https://files.pythonhosted.org/packages/7e/95/42aa2156235cbc8fa61208aded06ef46111c4d3f0de233107b3f38631803/charset_normalizer-3.4.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:416175faf02e4b0810f1f38bcb54682878a4af94059a1cd63b8747244420801f", size = 151350, upload-time = "2025-08-09T07:56:29.716Z" },
25
+ { url = "https://files.pythonhosted.org/packages/c2/a9/3865b02c56f300a6f94fc631ef54f0a8a29da74fb45a773dfd3dcd380af7/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6aab0f181c486f973bc7262a97f5aca3ee7e1437011ef0c2ec04b5a11d16c927", size = 148644, upload-time = "2025-08-09T07:56:30.984Z" },
26
+ { url = "https://files.pythonhosted.org/packages/77/d9/cbcf1a2a5c7d7856f11e7ac2d782aec12bdfea60d104e60e0aa1c97849dc/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:fdabf8315679312cfa71302f9bd509ded4f2f263fb5b765cf1433b39106c3cc9", size = 160468, upload-time = "2025-08-09T07:56:32.252Z" },
27
+ { url = "https://files.pythonhosted.org/packages/f6/42/6f45efee8697b89fda4d50580f292b8f7f9306cb2971d4b53f8914e4d890/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:bd28b817ea8c70215401f657edef3a8aa83c29d447fb0b622c35403780ba11d5", size = 158187, upload-time = "2025-08-09T07:56:33.481Z" },
28
+ { url = "https://files.pythonhosted.org/packages/70/99/f1c3bdcfaa9c45b3ce96f70b14f070411366fa19549c1d4832c935d8e2c3/charset_normalizer-3.4.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:18343b2d246dc6761a249ba1fb13f9ee9a2bcd95decc767319506056ea4ad4dc", size = 152699, upload-time = "2025-08-09T07:56:34.739Z" },
29
+ { url = "https://files.pythonhosted.org/packages/a3/ad/b0081f2f99a4b194bcbb1934ef3b12aa4d9702ced80a37026b7607c72e58/charset_normalizer-3.4.3-cp313-cp313-win32.whl", hash = "sha256:6fb70de56f1859a3f71261cbe41005f56a7842cc348d3aeb26237560bfa5e0ce", size = 99580, upload-time = "2025-08-09T07:56:35.981Z" },
30
+ { url = "https://files.pythonhosted.org/packages/9a/8f/ae790790c7b64f925e5c953b924aaa42a243fb778fed9e41f147b2a5715a/charset_normalizer-3.4.3-cp313-cp313-win_amd64.whl", hash = "sha256:cf1ebb7d78e1ad8ec2a8c4732c7be2e736f6e5123a4146c5b89c9d1f585f8cef", size = 107366, upload-time = "2025-08-09T07:56:37.339Z" },
31
+ { url = "https://files.pythonhosted.org/packages/8e/91/b5a06ad970ddc7a0e513112d40113e834638f4ca1120eb727a249fb2715e/charset_normalizer-3.4.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:3cd35b7e8aedeb9e34c41385fda4f73ba609e561faedfae0a9e75e44ac558a15", size = 204342, upload-time = "2025-08-09T07:56:38.687Z" },
32
+ { url = "https://files.pythonhosted.org/packages/ce/ec/1edc30a377f0a02689342f214455c3f6c2fbedd896a1d2f856c002fc3062/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b89bc04de1d83006373429975f8ef9e7932534b8cc9ca582e4db7d20d91816db", size = 145995, upload-time = "2025-08-09T07:56:40.048Z" },
33
+ { url = "https://files.pythonhosted.org/packages/17/e5/5e67ab85e6d22b04641acb5399c8684f4d37caf7558a53859f0283a650e9/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2001a39612b241dae17b4687898843f254f8748b796a2e16f1051a17078d991d", size = 158640, upload-time = "2025-08-09T07:56:41.311Z" },
34
+ { url = "https://files.pythonhosted.org/packages/f1/e5/38421987f6c697ee3722981289d554957c4be652f963d71c5e46a262e135/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8dcfc373f888e4fb39a7bc57e93e3b845e7f462dacc008d9749568b1c4ece096", size = 156636, upload-time = "2025-08-09T07:56:43.195Z" },
35
+ { url = "https://files.pythonhosted.org/packages/a0/e4/5a075de8daa3ec0745a9a3b54467e0c2967daaaf2cec04c845f73493e9a1/charset_normalizer-3.4.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:18b97b8404387b96cdbd30ad660f6407799126d26a39ca65729162fd810a99aa", size = 150939, upload-time = "2025-08-09T07:56:44.819Z" },
36
+ { url = "https://files.pythonhosted.org/packages/02/f7/3611b32318b30974131db62b4043f335861d4d9b49adc6d57c1149cc49d4/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ccf600859c183d70eb47e05a44cd80a4ce77394d1ac0f79dbd2dd90a69a3a049", size = 148580, upload-time = "2025-08-09T07:56:46.684Z" },
37
+ { url = "https://files.pythonhosted.org/packages/7e/61/19b36f4bd67f2793ab6a99b979b4e4f3d8fc754cbdffb805335df4337126/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:53cd68b185d98dde4ad8990e56a58dea83a4162161b1ea9272e5c9182ce415e0", size = 159870, upload-time = "2025-08-09T07:56:47.941Z" },
38
+ { url = "https://files.pythonhosted.org/packages/06/57/84722eefdd338c04cf3030ada66889298eaedf3e7a30a624201e0cbe424a/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:30a96e1e1f865f78b030d65241c1ee850cdf422d869e9028e2fc1d5e4db73b92", size = 157797, upload-time = "2025-08-09T07:56:49.756Z" },
39
+ { url = "https://files.pythonhosted.org/packages/72/2a/aff5dd112b2f14bcc3462c312dce5445806bfc8ab3a7328555da95330e4b/charset_normalizer-3.4.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d716a916938e03231e86e43782ca7878fb602a125a91e7acb8b5112e2e96ac16", size = 152224, upload-time = "2025-08-09T07:56:51.369Z" },
40
+ { url = "https://files.pythonhosted.org/packages/b7/8c/9839225320046ed279c6e839d51f028342eb77c91c89b8ef2549f951f3ec/charset_normalizer-3.4.3-cp314-cp314-win32.whl", hash = "sha256:c6dbd0ccdda3a2ba7c2ecd9d77b37f3b5831687d8dc1b6ca5f56a4880cc7b7ce", size = 100086, upload-time = "2025-08-09T07:56:52.722Z" },
41
+ { url = "https://files.pythonhosted.org/packages/ee/7a/36fbcf646e41f710ce0a563c1c9a343c6edf9be80786edeb15b6f62e17db/charset_normalizer-3.4.3-cp314-cp314-win_amd64.whl", hash = "sha256:73dc19b562516fc9bcf6e5d6e596df0b4eb98d87e4f79f3ae71840e6ed21361c", size = 107400, upload-time = "2025-08-09T07:56:55.172Z" },
42
+ { url = "https://files.pythonhosted.org/packages/8a/1f/f041989e93b001bc4e44bb1669ccdcf54d3f00e628229a85b08d330615c5/charset_normalizer-3.4.3-py3-none-any.whl", hash = "sha256:ce571ab16d890d23b5c278547ba694193a45011ff86a9162a71307ed9f86759a", size = 53175, upload-time = "2025-08-09T07:57:26.864Z" },
43
+ ]
44
+
45
+ [[package]]
46
+ name = "colorama"
47
+ version = "0.4.6"
48
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
+ ]
+
+ [[package]]
+ name = "filelock"
+ version = "3.19.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/40/bb/0ab3e58d22305b6f5440629d20683af28959bf793d98d11950e305c1c326/filelock-3.19.1.tar.gz", hash = "sha256:66eda1888b0171c998b35be2bcc0f6d75c388a7ce20c3f3f37aa8e96c2dddf58", size = 17687, upload-time = "2025-08-14T16:56:03.016Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/42/14/42b2651a2f46b022ccd948bca9f2d5af0fd8929c4eec235b8d6d844fbe67/filelock-3.19.1-py3-none-any.whl", hash = "sha256:d38e30481def20772f5baf097c122c3babc4fcdb7e14e57049eb9d88c6dc017d", size = 15988, upload-time = "2025-08-14T16:56:01.633Z" },
+ ]
+
+ [[package]]
+ name = "fsspec"
+ version = "2025.9.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/de/e0/bab50af11c2d75c9c4a2a26a5254573c0bd97cea152254401510950486fa/fsspec-2025.9.0.tar.gz", hash = "sha256:19fd429483d25d28b65ec68f9f4adc16c17ea2c7c7bf54ec61360d478fb19c19", size = 304847, upload-time = "2025-09-02T19:10:49.215Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/47/71/70db47e4f6ce3e5c37a607355f80da8860a33226be640226ac52cb05ef2e/fsspec-2025.9.0-py3-none-any.whl", hash = "sha256:530dc2a2af60a414a832059574df4a6e10cce927f6f4a78209390fe38955cfb7", size = 199289, upload-time = "2025-09-02T19:10:47.708Z" },
+ ]
+
+ [[package]]
+ name = "hf-xet"
+ version = "1.1.10"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/74/31/feeddfce1748c4a233ec1aa5b7396161c07ae1aa9b7bdbc9a72c3c7dd768/hf_xet-1.1.10.tar.gz", hash = "sha256:408aef343800a2102374a883f283ff29068055c111f003ff840733d3b715bb97", size = 487910, upload-time = "2025-09-12T20:10:27.12Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f7/a2/343e6d05de96908366bdc0081f2d8607d61200be2ac802769c4284cc65bd/hf_xet-1.1.10-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:686083aca1a6669bc85c21c0563551cbcdaa5cf7876a91f3d074a030b577231d", size = 2761466, upload-time = "2025-09-12T20:10:22.836Z" },
+ { url = "https://files.pythonhosted.org/packages/31/f9/6215f948ac8f17566ee27af6430ea72045e0418ce757260248b483f4183b/hf_xet-1.1.10-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:71081925383b66b24eedff3013f8e6bbd41215c3338be4b94ba75fd75b21513b", size = 2623807, upload-time = "2025-09-12T20:10:21.118Z" },
+ { url = "https://files.pythonhosted.org/packages/15/07/86397573efefff941e100367bbda0b21496ffcdb34db7ab51912994c32a2/hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6b6bceb6361c80c1cc42b5a7b4e3efd90e64630bcf11224dcac50ef30a47e435", size = 3186960, upload-time = "2025-09-12T20:10:19.336Z" },
+ { url = "https://files.pythonhosted.org/packages/01/a7/0b2e242b918cc30e1f91980f3c4b026ff2eedaf1e2ad96933bca164b2869/hf_xet-1.1.10-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:eae7c1fc8a664e54753ffc235e11427ca61f4b0477d757cc4eb9ae374b69f09c", size = 3087167, upload-time = "2025-09-12T20:10:17.255Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/25/3e32ab61cc7145b11eee9d745988e2f0f4fafda81b25980eebf97d8cff15/hf_xet-1.1.10-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:0a0005fd08f002180f7a12d4e13b22be277725bc23ed0529f8add5c7a6309c06", size = 3248612, upload-time = "2025-09-12T20:10:24.093Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/3d/ab7109e607ed321afaa690f557a9ada6d6d164ec852fd6bf9979665dc3d6/hf_xet-1.1.10-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:f900481cf6e362a6c549c61ff77468bd59d6dd082f3170a36acfef2eb6a6793f", size = 3353360, upload-time = "2025-09-12T20:10:25.563Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/0e/471f0a21db36e71a2f1752767ad77e92d8cde24e974e03d662931b1305ec/hf_xet-1.1.10-cp37-abi3-win_amd64.whl", hash = "sha256:5f54b19cc347c13235ae7ee98b330c26dd65ef1df47e5316ffb1e87713ca7045", size = 2804691, upload-time = "2025-09-12T20:10:28.433Z" },
+ ]
+
+ [[package]]
+ name = "huggingface-hub"
+ version = "0.34.4"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "hf-xet", marker = "platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'" },
+ { name = "packaging" },
+ { name = "pyyaml" },
+ { name = "requests" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/45/c9/bdbe19339f76d12985bc03572f330a01a93c04dffecaaea3061bdd7fb892/huggingface_hub-0.34.4.tar.gz", hash = "sha256:a4228daa6fb001be3f4f4bdaf9a0db00e1739235702848df00885c9b5742c85c", size = 459768, upload-time = "2025-08-08T09:14:52.365Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/39/7b/bb06b061991107cd8783f300adff3e7b7f284e330fd82f507f2a1417b11d/huggingface_hub-0.34.4-py3-none-any.whl", hash = "sha256:9b365d781739c93ff90c359844221beef048403f1bc1f1c123c191257c3c890a", size = 561452, upload-time = "2025-08-08T09:14:50.159Z" },
+ ]
+
+ [[package]]
+ name = "idna"
+ version = "3.10"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/f1/70/7703c29685631f5a7590aa73f1f1d3fa9a380e654b86af429e0934a32f7d/idna-3.10.tar.gz", hash = "sha256:12f65c9b470abda6dc35cf8e63cc574b1c52b11df2c86030af0ac09b01b13ea9", size = 190490, upload-time = "2024-09-15T18:07:39.745Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl", hash = "sha256:946d195a0d259cbba61165e88e65941f16e9b36ea6ddb97f00452bae8b1287d3", size = 70442, upload-time = "2024-09-15T18:07:37.964Z" },
+ ]
+
+ [[package]]
+ name = "jinja2"
+ version = "3.1.6"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "markupsafe" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d", size = 245115, upload-time = "2025-03-05T20:05:02.478Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
+ ]
+
+ [[package]]
+ name = "markupsafe"
+ version = "3.0.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/b2/97/5d42485e71dfc078108a86d6de8fa46db44a1a9295e89c5d6d4a06e23a62/markupsafe-3.0.2.tar.gz", hash = "sha256:ee55d3edf80167e48ea11a923c7386f4669df67d7994554387f84e7d8b0a2bf0", size = 20537, upload-time = "2024-10-18T15:21:54.129Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/83/0e/67eb10a7ecc77a0c2bbe2b0235765b98d164d81600746914bebada795e97/MarkupSafe-3.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ba9527cdd4c926ed0760bc301f6728ef34d841f405abf9d4f959c478421e4efd", size = 14274, upload-time = "2024-10-18T15:21:24.577Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/6d/9409f3684d3335375d04e5f05744dfe7e9f120062c9857df4ab490a1031a/MarkupSafe-3.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f8b3d067f2e40fe93e1ccdd6b2e1d16c43140e76f02fb1319a05cf2b79d99430", size = 12352, upload-time = "2024-10-18T15:21:25.382Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/f5/6eadfcd3885ea85fe2a7c128315cc1bb7241e1987443d78c8fe712d03091/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:569511d3b58c8791ab4c2e1285575265991e6d8f8700c7be0e88f86cb0672094", size = 24122, upload-time = "2024-10-18T15:21:26.199Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/91/96cf928db8236f1bfab6ce15ad070dfdd02ed88261c2afafd4b43575e9e9/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396", size = 23085, upload-time = "2024-10-18T15:21:27.029Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/cf/c9d56af24d56ea04daae7ac0940232d31d5a8354f2b457c6d856b2057d69/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f3818cb119498c0678015754eba762e0d61e5b52d34c8b13d770f0719f7b1d79", size = 22978, upload-time = "2024-10-18T15:21:27.846Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/9f/8619835cd6a711d6272d62abb78c033bda638fdc54c4e7f4272cf1c0962b/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:cdb82a876c47801bb54a690c5ae105a46b392ac6099881cdfb9f6e95e4014c6a", size = 24208, upload-time = "2024-10-18T15:21:28.744Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/bf/176950a1792b2cd2102b8ffeb5133e1ed984547b75db47c25a67d3359f77/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:cabc348d87e913db6ab4aa100f01b08f481097838bdddf7c7a84b7575b7309ca", size = 23357, upload-time = "2024-10-18T15:21:29.545Z" },
+ { url = "https://files.pythonhosted.org/packages/ce/4f/9a02c1d335caabe5c4efb90e1b6e8ee944aa245c1aaaab8e8a618987d816/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:444dcda765c8a838eaae23112db52f1efaf750daddb2d9ca300bcae1039adc5c", size = 23344, upload-time = "2024-10-18T15:21:30.366Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/55/c271b57db36f748f0e04a759ace9f8f759ccf22b4960c270c78a394f58be/MarkupSafe-3.0.2-cp313-cp313-win32.whl", hash = "sha256:bcf3e58998965654fdaff38e58584d8937aa3096ab5354d493c77d1fdd66d7a1", size = 15101, upload-time = "2024-10-18T15:21:31.207Z" },
+ { url = "https://files.pythonhosted.org/packages/29/88/07df22d2dd4df40aba9f3e402e6dc1b8ee86297dddbad4872bd5e7b0094f/MarkupSafe-3.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:e6a2a455bd412959b57a172ce6328d2dd1f01cb2135efda2e4576e8a23fa3b0f", size = 15603, upload-time = "2024-10-18T15:21:32.032Z" },
+ { url = "https://files.pythonhosted.org/packages/62/6a/8b89d24db2d32d433dffcd6a8779159da109842434f1dd2f6e71f32f738c/MarkupSafe-3.0.2-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:b5a6b3ada725cea8a5e634536b1b01c30bcdcd7f9c6fff4151548d5bf6b3a36c", size = 14510, upload-time = "2024-10-18T15:21:33.625Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/06/a10f955f70a2e5a9bf78d11a161029d278eeacbd35ef806c3fd17b13060d/MarkupSafe-3.0.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:a904af0a6162c73e3edcb969eeeb53a63ceeb5d8cf642fade7d39e7963a22ddb", size = 12486, upload-time = "2024-10-18T15:21:34.611Z" },
+ { url = "https://files.pythonhosted.org/packages/34/cf/65d4a571869a1a9078198ca28f39fba5fbb910f952f9dbc5220afff9f5e6/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4aa4e5faecf353ed117801a068ebab7b7e09ffb6e1d5e412dc852e0da018126c", size = 25480, upload-time = "2024-10-18T15:21:35.398Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/e3/90e9651924c430b885468b56b3d597cabf6d72be4b24a0acd1fa0e12af67/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c0ef13eaeee5b615fb07c9a7dadb38eac06a0608b41570d8ade51c56539e509d", size = 23914, upload-time = "2024-10-18T15:21:36.231Z" },
+ { url = "https://files.pythonhosted.org/packages/66/8c/6c7cf61f95d63bb866db39085150df1f2a5bd3335298f14a66b48e92659c/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d16a81a06776313e817c951135cf7340a3e91e8c1ff2fac444cfd75fffa04afe", size = 23796, upload-time = "2024-10-18T15:21:37.073Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/35/cbe9238ec3f47ac9a7c8b3df7a808e7cb50fe149dc7039f5f454b3fba218/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6381026f158fdb7c72a168278597a5e3a5222e83ea18f543112b2662a9b699c5", size = 25473, upload-time = "2024-10-18T15:21:37.932Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/32/7621a4382488aa283cc05e8984a9c219abad3bca087be9ec77e89939ded9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:3d79d162e7be8f996986c064d1c7c817f6df3a77fe3d6859f6f9e7be4b8c213a", size = 24114, upload-time = "2024-10-18T15:21:39.799Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/80/0985960e4b89922cb5a0bac0ed39c5b96cbc1a536a99f30e8c220a996ed9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:131a3c7689c85f5ad20f9f6fb1b866f402c445b220c19fe4308c0b147ccd2ad9", size = 24098, upload-time = "2024-10-18T15:21:40.813Z" },
+ { url = "https://files.pythonhosted.org/packages/82/78/fedb03c7d5380df2427038ec8d973587e90561b2d90cd472ce9254cf348b/MarkupSafe-3.0.2-cp313-cp313t-win32.whl", hash = "sha256:ba8062ed2cf21c07a9e295d5b8a2a5ce678b913b45fdf68c32d95d6c1291e0b6", size = 15208, upload-time = "2024-10-18T15:21:41.814Z" },
+ { url = "https://files.pythonhosted.org/packages/4f/65/6079a46068dfceaeabb5dcad6d674f5f5c61a6fa5673746f42a9f4c233b3/MarkupSafe-3.0.2-cp313-cp313t-win_amd64.whl", hash = "sha256:e444a31f8db13eb18ada366ab3cf45fd4b31e4db1236a4448f68778c1d1a5a2f", size = 15739, upload-time = "2024-10-18T15:21:42.784Z" },
+ ]
+
+ [[package]]
+ name = "mlx"
+ version = "0.29.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "mlx-metal", marker = "sys_platform == 'darwin'" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/66/62/7691ea664123d6e1fc0626207d5f1a6ed2b92b71059f4be42634e89b479e/mlx-0.29.1-cp313-cp313-macosx_13_0_arm64.whl", hash = "sha256:e86644cef409a00dd46eb9debf0796899623c686d16cc25b6e83078fb5081eba", size = 546904, upload-time = "2025-09-12T00:17:43.197Z" },
+ { url = "https://files.pythonhosted.org/packages/44/b8/1a77cafb6302703fe5576b2298f533cb36b6721fa6d9c41a9d6078c14a89/mlx-0.29.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:fd27d49f631ecc9d0a766327e65236738e338c74c7be504c22a1e53801eb40d1", size = 546909, upload-time = "2025-09-12T00:17:15.127Z" },
+ { url = "https://files.pythonhosted.org/packages/79/f1/1f4ddf70d1f77993e25f25fb0ab8f5579d81fce6a2a554400c75b447c148/mlx-0.29.1-cp313-cp313-macosx_15_0_arm64.whl", hash = "sha256:aaeacf864163b645ddd58c57e65290bf4c8cd493378e89dd11c00d2c9c42b42d", size = 546904, upload-time = "2025-09-12T00:17:09.691Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/47/7216e859ba3dbda78c840858cf1e120442721b48c974f587ef4e89d5f86f/mlx-0.29.1-cp313-cp313-manylinux_2_35_x86_64.whl", hash = "sha256:e33221c75ebed38dc6bad7fed46cdde8e4dbb47d789401232b4ab2c34305d42d", size = 646103, upload-time = "2025-09-12T00:21:56.544Z" },
+ ]
+
+ [[package]]
+ name = "mlx-lm"
+ version = "0.27.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "jinja2" },
+ { name = "mlx" },
+ { name = "numpy" },
+ { name = "protobuf" },
+ { name = "pyyaml" },
+ { name = "transformers" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/41/77/e8d3a82658a2070bc392a583dd08c8d24088433e920eac4905bf882255ad/mlx_lm-0.27.1.tar.gz", hash = "sha256:36640fb64c909cfd9baddf37b16e7d3b94a1a141033e6b7ea7a0ef5a965fb4ae", size = 185170, upload-time = "2025-09-04T16:06:57.949Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/e1/54/5f35831d208cbf81572e9a0ae8ac6d595ca7c59f3e1da57c367894b0a75b/mlx_lm-0.27.1-py3-none-any.whl", hash = "sha256:300da6f63d8d392483b62b2abda794730fa04343dcb28a1f6a712f4c3ab60f3c", size = 255687, upload-time = "2025-09-04T16:06:54.904Z" },
+ ]
+
+ [[package]]
+ name = "mlx-metal"
+ version = "0.29.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/61/b4/c96f54061fff12c2acc06f2cd402aae4a9cba52e40aae51f71ae508ef206/mlx_metal-0.29.1-py3-none-macosx_13_0_arm64.whl", hash = "sha256:b9dadd432948eab196ed110db0dc745795fd516b7124c0d3c4d176fee678a07a", size = 34983555, upload-time = "2025-09-12T00:19:44.815Z" },
+ { url = "https://files.pythonhosted.org/packages/82/3a/45c9ea1b6741a5dc80ad0b57eeee09e544a0d89ca66c7ad6cc55887c00d8/mlx_metal-0.29.1-py3-none-macosx_14_0_arm64.whl", hash = "sha256:824b939b721a964a455aeea4d0e956e4cc945f3333522c1e72a077ae774bca49", size = 34712571, upload-time = "2025-09-12T00:19:26.183Z" },
+ { url = "https://files.pythonhosted.org/packages/64/7f/294c8cac159661d732e5c01f841e07edfd2ea90651d39faca6579b3cdbf4/mlx_metal-0.29.1-py3-none-macosx_15_0_arm64.whl", hash = "sha256:ebd9ba8e83213f929663b92b8065b451a4276c7002ed83eae0fc8dde721c50c5", size = 34704543, upload-time = "2025-09-12T00:18:59.595Z" },
+ ]
+
+ [[package]]
+ name = "mobilellm-r1-950m"
+ version = "0.1.0"
+ source = { virtual = "." }
+ dependencies = [
+ { name = "mlx" },
+ { name = "mlx-lm" },
+ { name = "safetensors" },
+ { name = "transformers" },
+ ]
+
+ [package.dev-dependencies]
+ dev = [
+ { name = "torch" },
+ ]
+
+ [package.metadata]
+ requires-dist = [
+ { name = "mlx", specifier = ">=0.29.1" },
+ { name = "mlx-lm", specifier = ">=0.27.1" },
+ { name = "safetensors", specifier = ">=0.6.2" },
+ { name = "transformers", specifier = ">=4.56.1" },
+ ]
+
+ [package.metadata.requires-dev]
+ dev = [{ name = "torch", specifier = ">=2.8.0" }]
+
+ [[package]]
+ name = "mpmath"
+ version = "1.3.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/e0/47/dd32fa426cc72114383ac549964eecb20ecfd886d1e5ccf5340b55b02f57/mpmath-1.3.0.tar.gz", hash = "sha256:7a28eb2a9774d00c7bc92411c19a89209d5da7c4c9a9e227be8330a23a25b91f", size = 508106, upload-time = "2023-03-07T16:47:11.061Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c", size = 536198, upload-time = "2023-03-07T16:47:09.197Z" },
+ ]
+
+ [[package]]
+ name = "networkx"
+ version = "3.5"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/6c/4f/ccdb8ad3a38e583f214547fd2f7ff1fc160c43a75af88e6aec213404b96a/networkx-3.5.tar.gz", hash = "sha256:d4c6f9cf81f52d69230866796b82afbccdec3db7ae4fbd1b65ea750feed50037", size = 2471065, upload-time = "2025-05-29T11:35:07.804Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/eb/8d/776adee7bbf76365fdd7f2552710282c79a4ead5d2a46408c9043a2b70ba/networkx-3.5-py3-none-any.whl", hash = "sha256:0030d386a9a06dee3565298b4a734b68589749a544acbb6c412dc9e2489ec6ec", size = 2034406, upload-time = "2025-05-29T11:35:04.961Z" },
+ ]
+
+ [[package]]
+ name = "numpy"
+ version = "2.3.3"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/d0/19/95b3d357407220ed24c139018d2518fab0a61a948e68286a25f1a4d049ff/numpy-2.3.3.tar.gz", hash = "sha256:ddc7c39727ba62b80dfdbedf400d1c10ddfa8eefbd7ec8dcb118be8b56d31029", size = 20576648, upload-time = "2025-09-09T16:54:12.543Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/7d/b9/984c2b1ee61a8b803bf63582b4ac4242cf76e2dbd663efeafcb620cc0ccb/numpy-2.3.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:f5415fb78995644253370985342cd03572ef8620b934da27d77377a2285955bf", size = 20949588, upload-time = "2025-09-09T15:56:59.087Z" },
+ { url = "https://files.pythonhosted.org/packages/a6/e4/07970e3bed0b1384d22af1e9912527ecbeb47d3b26e9b6a3bced068b3bea/numpy-2.3.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:d00de139a3324e26ed5b95870ce63be7ec7352171bc69a4cf1f157a48e3eb6b7", size = 14177802, upload-time = "2025-09-09T15:57:01.73Z" },
+ { url = "https://files.pythonhosted.org/packages/35/c7/477a83887f9de61f1203bad89cf208b7c19cc9fef0cebef65d5a1a0619f2/numpy-2.3.3-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:9dc13c6a5829610cc07422bc74d3ac083bd8323f14e2827d992f9e52e22cd6a6", size = 5106537, upload-time = "2025-09-09T15:57:03.765Z" },
+ { url = "https://files.pythonhosted.org/packages/52/47/93b953bd5866a6f6986344d045a207d3f1cfbad99db29f534ea9cee5108c/numpy-2.3.3-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:d79715d95f1894771eb4e60fb23f065663b2298f7d22945d66877aadf33d00c7", size = 6640743, upload-time = "2025-09-09T15:57:07.921Z" },
+ { url = "https://files.pythonhosted.org/packages/23/83/377f84aaeb800b64c0ef4de58b08769e782edcefa4fea712910b6f0afd3c/numpy-2.3.3-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:952cfd0748514ea7c3afc729a0fc639e61655ce4c55ab9acfab14bda4f402b4c", size = 14278881, upload-time = "2025-09-09T15:57:11.349Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/a5/bf3db6e66c4b160d6ea10b534c381a1955dfab34cb1017ea93aa33c70ed3/numpy-2.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5b83648633d46f77039c29078751f80da65aa64d5622a3cd62aaef9d835b6c93", size = 16636301, upload-time = "2025-09-09T15:57:14.245Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/59/1287924242eb4fa3f9b3a2c30400f2e17eb2707020d1c5e3086fe7330717/numpy-2.3.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:b001bae8cea1c7dfdb2ae2b017ed0a6f2102d7a70059df1e338e307a4c78a8ae", size = 16053645, upload-time = "2025-09-09T15:57:16.534Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/93/b3d47ed882027c35e94ac2320c37e452a549f582a5e801f2d34b56973c97/numpy-2.3.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8e9aced64054739037d42fb84c54dd38b81ee238816c948c8f3ed134665dcd86", size = 18578179, upload-time = "2025-09-09T15:57:18.883Z" },
+ { url = "https://files.pythonhosted.org/packages/20/d9/487a2bccbf7cc9d4bfc5f0f197761a5ef27ba870f1e3bbb9afc4bbe3fcc2/numpy-2.3.3-cp313-cp313-win32.whl", hash = "sha256:9591e1221db3f37751e6442850429b3aabf7026d3b05542d102944ca7f00c8a8", size = 6312250, upload-time = "2025-09-09T15:57:21.296Z" },
+ { url = "https://files.pythonhosted.org/packages/1b/b5/263ebbbbcede85028f30047eab3d58028d7ebe389d6493fc95ae66c636ab/numpy-2.3.3-cp313-cp313-win_amd64.whl", hash = "sha256:f0dadeb302887f07431910f67a14d57209ed91130be0adea2f9793f1a4f817cf", size = 12783269, upload-time = "2025-09-09T15:57:23.034Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/75/67b8ca554bbeaaeb3fac2e8bce46967a5a06544c9108ec0cf5cece559b6c/numpy-2.3.3-cp313-cp313-win_arm64.whl", hash = "sha256:3c7cf302ac6e0b76a64c4aecf1a09e51abd9b01fc7feee80f6c43e3ab1b1dbc5", size = 10195314, upload-time = "2025-09-09T15:57:25.045Z" },
+ { url = "https://files.pythonhosted.org/packages/11/d0/0d1ddec56b162042ddfafeeb293bac672de9b0cfd688383590090963720a/numpy-2.3.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:eda59e44957d272846bb407aad19f89dc6f58fecf3504bd144f4c5cf81a7eacc", size = 21048025, upload-time = "2025-09-09T15:57:27.257Z" },
+ { url = "https://files.pythonhosted.org/packages/36/9e/1996ca6b6d00415b6acbdd3c42f7f03ea256e2c3f158f80bd7436a8a19f3/numpy-2.3.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:823d04112bc85ef5c4fda73ba24e6096c8f869931405a80aa8b0e604510a26bc", size = 14301053, upload-time = "2025-09-09T15:57:30.077Z" },
+ { url = "https://files.pythonhosted.org/packages/05/24/43da09aa764c68694b76e84b3d3f0c44cb7c18cdc1ba80e48b0ac1d2cd39/numpy-2.3.3-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:40051003e03db4041aa325da2a0971ba41cf65714e65d296397cc0e32de6018b", size = 5229444, upload-time = "2025-09-09T15:57:32.733Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/14/50ffb0f22f7218ef8af28dd089f79f68289a7a05a208db9a2c5dcbe123c1/numpy-2.3.3-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:6ee9086235dd6ab7ae75aba5662f582a81ced49f0f1c6de4260a78d8f2d91a19", size = 6738039, upload-time = "2025-09-09T15:57:34.328Z" },
+ { url = "https://files.pythonhosted.org/packages/55/52/af46ac0795e09657d45a7f4db961917314377edecf66db0e39fa7ab5c3d3/numpy-2.3.3-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:94fcaa68757c3e2e668ddadeaa86ab05499a70725811e582b6a9858dd472fb30", size = 14352314, upload-time = "2025-09-09T15:57:36.255Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/b1/dc226b4c90eb9f07a3fff95c2f0db3268e2e54e5cce97c4ac91518aee71b/numpy-2.3.3-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:da1a74b90e7483d6ce5244053399a614b1d6b7bc30a60d2f570e5071f8959d3e", size = 16701722, upload-time = "2025-09-09T15:57:38.622Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/9d/9d8d358f2eb5eced14dba99f110d83b5cd9a4460895230f3b396ad19a323/numpy-2.3.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:2990adf06d1ecee3b3dcbb4977dfab6e9f09807598d647f04d385d29e7a3c3d3", size = 16132755, upload-time = "2025-09-09T15:57:41.16Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/27/b3922660c45513f9377b3fb42240bec63f203c71416093476ec9aa0719dc/numpy-2.3.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:ed635ff692483b8e3f0fcaa8e7eb8a75ee71aa6d975388224f70821421800cea", size = 18651560, upload-time = "2025-09-09T15:57:43.459Z" },
+ { url = "https://files.pythonhosted.org/packages/5b/8e/3ab61a730bdbbc201bb245a71102aa609f0008b9ed15255500a99cd7f780/numpy-2.3.3-cp313-cp313t-win32.whl", hash = "sha256:a333b4ed33d8dc2b373cc955ca57babc00cd6f9009991d9edc5ddbc1bac36bcd", size = 6442776, upload-time = "2025-09-09T15:57:45.793Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/3a/e22b766b11f6030dc2decdeff5c2fb1610768055603f9f3be88b6d192fb2/numpy-2.3.3-cp313-cp313t-win_amd64.whl", hash = "sha256:4384a169c4d8f97195980815d6fcad04933a7e1ab3b530921c3fef7a1c63426d", size = 12927281, upload-time = "2025-09-09T15:57:47.492Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/42/c2e2bc48c5e9b2a83423f99733950fbefd86f165b468a3d85d52b30bf782/numpy-2.3.3-cp313-cp313t-win_arm64.whl", hash = "sha256:75370986cc0bc66f4ce5110ad35aae6d182cc4ce6433c40ad151f53690130bf1", size = 10265275, upload-time = "2025-09-09T15:57:49.647Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/01/342ad585ad82419b99bcf7cebe99e61da6bedb89e213c5fd71acc467faee/numpy-2.3.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:cd052f1fa6a78dee696b58a914b7229ecfa41f0a6d96dc663c1220a55e137593", size = 20951527, upload-time = "2025-09-09T15:57:52.006Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/d8/204e0d73fc1b7a9ee80ab1fe1983dd33a4d64a4e30a05364b0208e9a241a/numpy-2.3.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:414a97499480067d305fcac9716c29cf4d0d76db6ebf0bf3cbce666677f12652", size = 14186159, upload-time = "2025-09-09T15:57:54.407Z" },
+ { url = "https://files.pythonhosted.org/packages/22/af/f11c916d08f3a18fb8ba81ab72b5b74a6e42ead4c2846d270eb19845bf74/numpy-2.3.3-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:50a5fe69f135f88a2be9b6ca0481a68a136f6febe1916e4920e12f1a34e708a7", size = 5114624, upload-time = "2025-09-09T15:57:56.5Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/11/0ed919c8381ac9d2ffacd63fd1f0c34d27e99cab650f0eb6f110e6ae4858/numpy-2.3.3-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:b912f2ed2b67a129e6a601e9d93d4fa37bef67e54cac442a2f588a54afe5c67a", size = 6642627, upload-time = "2025-09-09T15:57:58.206Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/83/deb5f77cb0f7ba6cb52b91ed388b47f8f3c2e9930d4665c600408d9b90b9/numpy-2.3.3-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9e318ee0596d76d4cb3d78535dc005fa60e5ea348cd131a51e99d0bdbe0b54fe", size = 14296926, upload-time = "2025-09-09T15:58:00.035Z" },
+ { url = "https://files.pythonhosted.org/packages/77/cc/70e59dcb84f2b005d4f306310ff0a892518cc0c8000a33d0e6faf7ca8d80/numpy-2.3.3-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ce020080e4a52426202bdb6f7691c65bb55e49f261f31a8f506c9f6bc7450421", size = 16638958, upload-time = "2025-09-09T15:58:02.738Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/5a/b2ab6c18b4257e099587d5b7f903317bd7115333ad8d4ec4874278eafa61/numpy-2.3.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:e6687dc183aa55dae4a705b35f9c0f8cb178bcaa2f029b241ac5356221d5c021", size = 16071920, upload-time = "2025-09-09T15:58:05.029Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/f1/8b3fdc44324a259298520dd82147ff648979bed085feeacc1250ef1656c0/numpy-2.3.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d8f3b1080782469fdc1718c4ed1d22549b5fb12af0d57d35e992158a772a37cf", size = 18577076, upload-time = "2025-09-09T15:58:07.745Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/a1/b87a284fb15a42e9274e7fcea0dad259d12ddbf07c1595b26883151ca3b4/numpy-2.3.3-cp314-cp314-win32.whl", hash = "sha256:cb248499b0bc3be66ebd6578b83e5acacf1d6cb2a77f2248ce0e40fbec5a76d0", size = 6366952, upload-time = "2025-09-09T15:58:10.096Z" },
+ { url = "https://files.pythonhosted.org/packages/70/5f/1816f4d08f3b8f66576d8433a66f8fa35a5acfb3bbd0bf6c31183b003f3d/numpy-2.3.3-cp314-cp314-win_amd64.whl", hash = "sha256:691808c2b26b0f002a032c73255d0bd89751425f379f7bcd22d140db593a96e8", size = 12919322, upload-time = "2025-09-09T15:58:12.138Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/de/072420342e46a8ea41c324a555fa90fcc11637583fb8df722936aed1736d/numpy-2.3.3-cp314-cp314-win_arm64.whl", hash = "sha256:9ad12e976ca7b10f1774b03615a2a4bab8addce37ecc77394d8e986927dc0dfe", size = 10478630, upload-time = "2025-09-09T15:58:14.64Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/df/ee2f1c0a9de7347f14da5dd3cd3c3b034d1b8607ccb6883d7dd5c035d631/numpy-2.3.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:9cc48e09feb11e1db00b320e9d30a4151f7369afb96bd0e48d942d09da3a0d00", size = 21047987, upload-time = "2025-09-09T15:58:16.889Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/92/9453bdc5a4e9e69cf4358463f25e8260e2ffc126d52e10038b9077815989/numpy-2.3.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:901bf6123879b7f251d3631967fd574690734236075082078e0571977c6a8e6a", size = 14301076, upload-time = "2025-09-09T15:58:20.343Z" },
+ { url = "https://files.pythonhosted.org/packages/13/77/1447b9eb500f028bb44253105bd67534af60499588a5149a94f18f2ca917/numpy-2.3.3-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:7f025652034199c301049296b59fa7d52c7e625017cae4c75d8662e377bf487d", size = 5229491, upload-time = "2025-09-09T15:58:22.481Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/f9/d72221b6ca205f9736cb4b2ce3b002f6e45cd67cd6a6d1c8af11a2f0b649/numpy-2.3.3-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:533ca5f6d325c80b6007d4d7fb1984c303553534191024ec6a524a4c92a5935a", size = 6737913, upload-time = "2025-09-09T15:58:24.569Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/5f/d12834711962ad9c46af72f79bb31e73e416ee49d17f4c797f72c96b6ca5/numpy-2.3.3-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0edd58682a399824633b66885d699d7de982800053acf20be1eaa46d92009c54", size = 14352811, upload-time = "2025-09-09T15:58:26.416Z" },
+ { url = "https://files.pythonhosted.org/packages/a1/0d/fdbec6629d97fd1bebed56cd742884e4eead593611bbe1abc3eb40d304b2/numpy-2.3.3-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:367ad5d8fbec5d9296d18478804a530f1191e24ab4d75ab408346ae88045d25e", size = 16702689, upload-time = "2025-09-09T15:58:28.831Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/09/0a35196dc5575adde1eb97ddfbc3e1687a814f905377621d18ca9bc2b7dd/numpy-2.3.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8f6ac61a217437946a1fa48d24c47c91a0c4f725237871117dea264982128097", size = 16133855, upload-time = "2025-09-09T15:58:31.349Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/ca/c9de3ea397d576f1b6753eaa906d4cdef1bf97589a6d9825a349b4729cc2/numpy-2.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:179a42101b845a816d464b6fe9a845dfaf308fdfc7925387195570789bb2c970", size = 18652520, upload-time = "2025-09-09T15:58:33.762Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/c2/e5ed830e08cd0196351db55db82f65bc0ab05da6ef2b72a836dcf1936d2f/numpy-2.3.3-cp314-cp314t-win32.whl", hash = "sha256:1250c5d3d2562ec4174bce2e3a1523041595f9b651065e4a4473f5f48a6bc8a5", size = 6515371, upload-time = "2025-09-09T15:58:36.04Z" },
+ { url = "https://files.pythonhosted.org/packages/47/c7/b0f6b5b67f6788a0725f744496badbb604d226bf233ba716683ebb47b570/numpy-2.3.3-cp314-cp314t-win_amd64.whl", hash = "sha256:b37a0b2e5935409daebe82c1e42274d30d9dd355852529eab91dab8dcca7419f", size = 13112576, upload-time = "2025-09-09T15:58:37.927Z" },
+ { url = "https://files.pythonhosted.org/packages/06/b9/33bba5ff6fb679aa0b1f8a07e853f002a6b04b9394db3069a1270a7784ca/numpy-2.3.3-cp314-cp314t-win_arm64.whl", hash = "sha256:78c9f6560dc7e6b3990e32df7ea1a50bbd0e2a111e05209963f5ddcab7073b0b", size = 10545953, upload-time = "2025-09-09T15:58:40.576Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cublas-cu12"
+ version = "12.8.4.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/dc/61/e24b560ab2e2eaeb3c839129175fb330dfcfc29e5203196e5541a4c44682/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:8ac4e771d5a348c551b2a426eda6193c19aa630236b418086020df5ba9667142", size = 594346921, upload-time = "2025-03-07T01:44:31.254Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-cupti-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f8/02/2adcaa145158bf1a8295d83591d22e4103dbfd821bcaf6f3f53151ca4ffa/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ea0cb07ebda26bb9b29ba82cda34849e73c166c18162d3913575b0c9db9a6182", size = 10248621, upload-time = "2025-03-07T01:40:21.213Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-nvrtc-cu12"
+ version = "12.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/05/6b/32f747947df2da6994e999492ab306a903659555dddc0fbdeb9d71f75e52/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:a7756528852ef889772a84c6cd89d41dfa74667e24cca16bb31f8f061e3e9994", size = 88040029, upload-time = "2025-03-07T01:42:13.562Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cuda-runtime-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/0d/9b/a997b638fcd068ad6e4d53b8551a7d30fe8b404d6f1804abf1df69838932/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adade8dcbd0edf427b7204d480d6066d33902cab2a4707dcfc48a2d0fd44ab90", size = 954765, upload-time = "2025-03-07T01:40:01.615Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cudnn-cu12"
+ version = "9.10.2.21"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-cublas-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/ba/51/e123d997aa098c61d029f76663dedbfb9bc8dcf8c60cbd6adbe42f76d049/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:949452be657fa16687d0930933f032835951ef0892b37d2d53824d1a84dc97a8", size = 706758467, upload-time = "2025-06-06T21:54:08.597Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cufft-cu12"
+ version = "11.3.3.83"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/1f/13/ee4e00f30e676b66ae65b4f08cb5bcbb8392c03f54f2d5413ea99a5d1c80/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d2dd21ec0b88cf61b62e6b43564355e5222e4a3fb394cac0db101f2dd0d4f74", size = 193118695, upload-time = "2025-03-07T01:45:27.821Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cufile-cu12"
+ version = "1.13.1.3"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/bb/fe/1bcba1dfbfb8d01be8d93f07bfc502c93fa23afa6fd5ab3fc7c1df71038a/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1d069003be650e131b21c932ec3d8969c1715379251f8d23a1860554b1cb24fc", size = 1197834, upload-time = "2025-03-07T01:45:50.723Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-curand-cu12"
+ version = "10.3.9.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/fb/aa/6584b56dc84ebe9cf93226a5cde4d99080c8e90ab40f0c27bda7a0f29aa1/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:b32331d4f4df5d6eefa0554c565b626c7216f87a06a4f56fab27c3b68a830ec9", size = 63619976, upload-time = "2025-03-07T01:46:23.323Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusolver-cu12"
+ version = "11.7.3.90"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-cublas-cu12" },
+ { name = "nvidia-cusparse-cu12" },
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/85/48/9a13d2975803e8cf2777d5ed57b87a0b6ca2cc795f9a4f59796a910bfb80/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:4376c11ad263152bd50ea295c05370360776f8c3427b30991df774f9fb26c450", size = 267506905, upload-time = "2025-03-07T01:47:16.273Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusparse-cu12"
+ version = "12.5.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "nvidia-nvjitlink-cu12" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/c2/f5/e1854cb2f2bcd4280c44736c93550cc300ff4b8c95ebe370d0aa7d2b473d/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1ec05d76bbbd8b61b06a80e1eaf8cf4959c3d4ce8e711b65ebd0443bb0ebb13b", size = 288216466, upload-time = "2025-03-07T01:48:13.779Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-cusparselt-cu12"
+ version = "0.7.1"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/56/79/12978b96bd44274fe38b5dde5cfb660b1d114f70a65ef962bcbbed99b549/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f1bb701d6b930d5a7cea44c19ceb973311500847f81b634d802b7b539dc55623", size = 287193691, upload-time = "2025-02-26T00:15:44.104Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nccl-cu12"
+ version = "2.27.3"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/5c/5b/4e4fff7bad39adf89f735f2bc87248c81db71205b62bcc0d5ca5b606b3c3/nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adf27ccf4238253e0b826bce3ff5fa532d65fc42322c8bfdfaf28024c0fbe039", size = 322364134, upload-time = "2025-06-03T21:58:04.013Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nvjitlink-cu12"
+ version = "12.8.93"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/f6/74/86a07f1d0f42998ca31312f998bd3b9a7eff7f52378f4f270c8679c77fb9/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:81ff63371a7ebd6e6451970684f916be2eab07321b73c9d244dc2b4da7f73b88", size = 39254836, upload-time = "2025-03-07T01:49:55.661Z" },
+ ]
+
+ [[package]]
+ name = "nvidia-nvtx-cu12"
+ version = "12.8.90"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
+ ]
+
+ [[package]]
+ name = "packaging"
+ version = "25.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/a1/d4/1fc4078c65507b51b96ca8f8c3ba19e6a61c8253c72794544580a7b6c24d/packaging-25.0.tar.gz", hash = "sha256:d443872c98d677bf60f6a1f2f8c1cb748e8fe762d2bf9d3148b5599295b0fc4f", size = 165727, upload-time = "2025-04-19T11:48:59.673Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" },
+ ]
+
+ [[package]]
+ name = "protobuf"
+ version = "6.32.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/fa/a4/cc17347aa2897568beece2e674674359f911d6fe21b0b8d6268cd42727ac/protobuf-6.32.1.tar.gz", hash = "sha256:ee2469e4a021474ab9baafea6cd070e5bf27c7d29433504ddea1a4ee5850f68d", size = 440635, upload-time = "2025-09-11T21:38:42.935Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/c0/98/645183ea03ab3995d29086b8bf4f7562ebd3d10c9a4b14ee3f20d47cfe50/protobuf-6.32.1-cp310-abi3-win32.whl", hash = "sha256:a8a32a84bc9f2aad712041b8b366190f71dde248926da517bde9e832e4412085", size = 424411, upload-time = "2025-09-11T21:38:27.427Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/f3/6f58f841f6ebafe076cebeae33fc336e900619d34b1c93e4b5c97a81fdfa/protobuf-6.32.1-cp310-abi3-win_amd64.whl", hash = "sha256:b00a7d8c25fa471f16bc8153d0e53d6c9e827f0953f3c09aaa4331c718cae5e1", size = 435738, upload-time = "2025-09-11T21:38:30.959Z" },
+ { url = "https://files.pythonhosted.org/packages/10/56/a8a3f4e7190837139e68c7002ec749190a163af3e330f65d90309145a210/protobuf-6.32.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d8c7e6eb619ffdf105ee4ab76af5a68b60a9d0f66da3ea12d1640e6d8dab7281", size = 426454, upload-time = "2025-09-11T21:38:34.076Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/be/8dd0a927c559b37d7a6c8ab79034fd167dcc1f851595f2e641ad62be8643/protobuf-6.32.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:2f5b80a49e1eb7b86d85fcd23fe92df154b9730a725c3b38c4e43b9d77018bf4", size = 322874, upload-time = "2025-09-11T21:38:35.509Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/f6/88d77011b605ef979aace37b7703e4eefad066f7e84d935e5a696515c2dd/protobuf-6.32.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:b1864818300c297265c83a4982fd3169f97122c299f56a56e2445c3698d34710", size = 322013, upload-time = "2025-09-11T21:38:37.017Z" },
+ { url = "https://files.pythonhosted.org/packages/97/b7/15cc7d93443d6c6a84626ae3258a91f4c6ac8c0edd5df35ea7658f71b79c/protobuf-6.32.1-py3-none-any.whl", hash = "sha256:2601b779fc7d32a866c6b4404f9d42a3f67c5b9f3f15b4db3cccabe06b95c346", size = 169289, upload-time = "2025-09-11T21:38:41.234Z" },
+ ]
+
+ [[package]]
+ name = "pyyaml"
+ version = "6.0.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/54/ed/79a089b6be93607fa5cdaedf301d7dfb23af5f25c398d5ead2525b063e17/pyyaml-6.0.2.tar.gz", hash = "sha256:d584d9ec91ad65861cc08d42e834324ef890a082e591037abe114850ff7bbc3e", size = 130631, upload-time = "2024-08-06T20:33:50.674Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/ef/e3/3af305b830494fa85d95f6d95ef7fa73f2ee1cc8ef5b495c7c3269fb835f/PyYAML-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efdca5630322a10774e8e98e1af481aad470dd62c3170801852d752aa7a783ba", size = 181309, upload-time = "2024-08-06T20:32:43.4Z" },
+ { url = "https://files.pythonhosted.org/packages/45/9f/3b1c20a0b7a3200524eb0076cc027a970d320bd3a6592873c85c92a08731/PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:50187695423ffe49e2deacb8cd10510bc361faac997de9efef88badc3bb9e2d1", size = 171679, upload-time = "2024-08-06T20:32:44.801Z" },
+ { url = "https://files.pythonhosted.org/packages/7c/9a/337322f27005c33bcb656c655fa78325b730324c78620e8328ae28b64d0c/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0ffe8360bab4910ef1b9e87fb812d8bc0a308b0d0eef8c8f44e0254ab3b07133", size = 733428, upload-time = "2024-08-06T20:32:46.432Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/69/864fbe19e6c18ea3cc196cbe5d392175b4cf3d5d0ac1403ec3f2d237ebb5/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:17e311b6c678207928d649faa7cb0d7b4c26a0ba73d41e99c4fff6b6c3276484", size = 763361, upload-time = "2024-08-06T20:32:51.188Z" },
+ { url = "https://files.pythonhosted.org/packages/04/24/b7721e4845c2f162d26f50521b825fb061bc0a5afcf9a386840f23ea19fa/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b189594dbe54f75ab3a1acec5f1e3faa7e8cf2f1e08d9b561cb41b845f69d5", size = 759523, upload-time = "2024-08-06T20:32:53.019Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/b2/e3234f59ba06559c6ff63c4e10baea10e5e7df868092bf9ab40e5b9c56b6/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:41e4e3953a79407c794916fa277a82531dd93aad34e29c2a514c2c0c5fe971cc", size = 726660, upload-time = "2024-08-06T20:32:54.708Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/0f/25911a9f080464c59fab9027482f822b86bf0608957a5fcc6eaac85aa515/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:68ccc6023a3400877818152ad9a1033e3db8625d899c72eacb5a668902e4d652", size = 751597, upload-time = "2024-08-06T20:32:56.985Z" },
+ { url = "https://files.pythonhosted.org/packages/14/0d/e2c3b43bbce3cf6bd97c840b46088a3031085179e596d4929729d8d68270/PyYAML-6.0.2-cp313-cp313-win32.whl", hash = "sha256:bc2fa7c6b47d6bc618dd7fb02ef6fdedb1090ec036abab80d4681424b84c1183", size = 140527, upload-time = "2024-08-06T20:33:03.001Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" },
+ ]
+
+ [[package]]
+ name = "regex"
+ version = "2025.9.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/b2/5a/4c63457fbcaf19d138d72b2e9b39405954f98c0349b31c601bfcb151582c/regex-2025.9.1.tar.gz", hash = "sha256:88ac07b38d20b54d79e704e38aa3bd2c0f8027432164226bdee201a1c0c9c9ff", size = 400852, upload-time = "2025-09-01T22:10:10.479Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/98/25/b2959ce90c6138c5142fe5264ee1f9b71a0c502ca4c7959302a749407c79/regex-2025.9.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:bc6834727d1b98d710a63e6c823edf6ffbf5792eba35d3fa119531349d4142ef", size = 485932, upload-time = "2025-09-01T22:08:57.913Z" },
+ { url = "https://files.pythonhosted.org/packages/49/2e/6507a2a85f3f2be6643438b7bd976e67ad73223692d6988eb1ff444106d3/regex-2025.9.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:c3dc05b6d579875719bccc5f3037b4dc80433d64e94681a0061845bd8863c025", size = 289568, upload-time = "2025-09-01T22:08:59.258Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/d8/de4a4b57215d99868f1640e062a7907e185ec7476b4b689e2345487c1ff4/regex-2025.9.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:22213527df4c985ec4a729b055a8306272d41d2f45908d7bacb79be0fa7a75ad", size = 286984, upload-time = "2025-09-01T22:09:00.835Z" },
+ { url = "https://files.pythonhosted.org/packages/03/15/e8cb403403a57ed316e80661db0e54d7aa2efcd85cb6156f33cc18746922/regex-2025.9.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8e3f6e3c5a5a1adc3f7ea1b5aec89abfc2f4fbfba55dafb4343cd1d084f715b2", size = 797514, upload-time = "2025-09-01T22:09:02.538Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/26/2446f2b9585fed61faaa7e2bbce3aca7dd8df6554c32addee4c4caecf24a/regex-2025.9.1-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:bcb89c02a0d6c2bec9b0bb2d8c78782699afe8434493bfa6b4021cc51503f249", size = 862586, upload-time = "2025-09-01T22:09:04.322Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/b8/82ffbe9c0992c31bbe6ae1c4b4e21269a5df2559102b90543c9b56724c3c/regex-2025.9.1-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b0e2f95413eb0c651cd1516a670036315b91b71767af83bc8525350d4375ccba", size = 910815, upload-time = "2025-09-01T22:09:05.978Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/d8/7303ea38911759c1ee30cc5bc623ee85d3196b733c51fd6703c34290a8d9/regex-2025.9.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:09a41dc039e1c97d3c2ed3e26523f748e58c4de3ea7a31f95e1cf9ff973fff5a", size = 802042, upload-time = "2025-09-01T22:09:07.865Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/0e/6ad51a55ed4b5af512bb3299a05d33309bda1c1d1e1808fa869a0bed31bc/regex-2025.9.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4f0b4258b161094f66857a26ee938d3fe7b8a5063861e44571215c44fbf0e5df", size = 786764, upload-time = "2025-09-01T22:09:09.362Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/d5/394e3ffae6baa5a9217bbd14d96e0e5da47bb069d0dbb8278e2681a2b938/regex-2025.9.1-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:bf70e18ac390e6977ea7e56f921768002cb0fa359c4199606c7219854ae332e0", size = 856557, upload-time = "2025-09-01T22:09:11.129Z" },
+ { url = "https://files.pythonhosted.org/packages/cd/80/b288d3910c41194ad081b9fb4b371b76b0bbfdce93e7709fc98df27b37dc/regex-2025.9.1-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:b84036511e1d2bb0a4ff1aec26951caa2dea8772b223c9e8a19ed8885b32dbac", size = 849108, upload-time = "2025-09-01T22:09:12.877Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/cd/5ec76bf626d0d5abdc277b7a1734696f5f3d14fbb4a3e2540665bc305d85/regex-2025.9.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c2e05dcdfe224047f2a59e70408274c325d019aad96227ab959403ba7d58d2d7", size = 788201, upload-time = "2025-09-01T22:09:14.561Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/36/674672f3fdead107565a2499f3007788b878188acec6d42bc141c5366c2c/regex-2025.9.1-cp313-cp313-win32.whl", hash = "sha256:3b9a62107a7441b81ca98261808fed30ae36ba06c8b7ee435308806bd53c1ed8", size = 264508, upload-time = "2025-09-01T22:09:16.193Z" },
+ { url = "https://files.pythonhosted.org/packages/83/ad/931134539515eb64ce36c24457a98b83c1b2e2d45adf3254b94df3735a76/regex-2025.9.1-cp313-cp313-win_amd64.whl", hash = "sha256:b38afecc10c177eb34cfae68d669d5161880849ba70c05cbfbe409f08cc939d7", size = 275469, upload-time = "2025-09-01T22:09:17.462Z" },
+ { url = "https://files.pythonhosted.org/packages/24/8c/96d34e61c0e4e9248836bf86d69cb224fd222f270fa9045b24e218b65604/regex-2025.9.1-cp313-cp313-win_arm64.whl", hash = "sha256:ec329890ad5e7ed9fc292858554d28d58d56bf62cf964faf0aa57964b21155a0", size = 268586, upload-time = "2025-09-01T22:09:18.948Z" },
+ { url = "https://files.pythonhosted.org/packages/21/b1/453cbea5323b049181ec6344a803777914074b9726c9c5dc76749966d12d/regex-2025.9.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:72fb7a016467d364546f22b5ae86c45680a4e0de6b2a6f67441d22172ff641f1", size = 486111, upload-time = "2025-09-01T22:09:20.734Z" },
+ { url = "https://files.pythonhosted.org/packages/f6/0e/92577f197bd2f7652c5e2857f399936c1876978474ecc5b068c6d8a79c86/regex-2025.9.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:c9527fa74eba53f98ad86be2ba003b3ebe97e94b6eb2b916b31b5f055622ef03", size = 289520, upload-time = "2025-09-01T22:09:22.249Z" },
+ { url = "https://files.pythonhosted.org/packages/af/c6/b472398116cca7ea5a6c4d5ccd0fc543f7fd2492cb0c48d2852a11972f73/regex-2025.9.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c905d925d194c83a63f92422af7544ec188301451b292c8b487f0543726107ca", size = 287215, upload-time = "2025-09-01T22:09:23.657Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/11/f12ecb0cf9ca792a32bb92f758589a84149017467a544f2f6bfb45c0356d/regex-2025.9.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:74df7c74a63adcad314426b1f4ea6054a5ab25d05b0244f0c07ff9ce640fa597", size = 797855, upload-time = "2025-09-01T22:09:25.197Z" },
+ { url = "https://files.pythonhosted.org/packages/46/88/bbb848f719a540fb5997e71310f16f0b33a92c5d4b4d72d4311487fff2a3/regex-2025.9.1-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:4f6e935e98ea48c7a2e8be44494de337b57a204470e7f9c9c42f912c414cd6f5", size = 863363, upload-time = "2025-09-01T22:09:26.705Z" },
+ { url = "https://files.pythonhosted.org/packages/54/a9/2321eb3e2838f575a78d48e03c1e83ea61bd08b74b7ebbdeca8abc50fc25/regex-2025.9.1-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:4a62d033cd9ebefc7c5e466731a508dfabee827d80b13f455de68a50d3c2543d", size = 910202, upload-time = "2025-09-01T22:09:28.906Z" },
+ { url = "https://files.pythonhosted.org/packages/33/07/d1d70835d7d11b7e126181f316f7213c4572ecf5c5c97bdbb969fb1f38a2/regex-2025.9.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef971ebf2b93bdc88d8337238be4dfb851cc97ed6808eb04870ef67589415171", size = 801808, upload-time = "2025-09-01T22:09:30.733Z" },
+ { url = "https://files.pythonhosted.org/packages/13/d1/29e4d1bed514ef2bf3a4ead3cb8bb88ca8af94130239a4e68aa765c35b1c/regex-2025.9.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:d936a1db208bdca0eca1f2bb2c1ba1d8370b226785c1e6db76e32a228ffd0ad5", size = 786824, upload-time = "2025-09-01T22:09:32.61Z" },
+ { url = "https://files.pythonhosted.org/packages/33/27/20d8ccb1bee460faaa851e6e7cc4cfe852a42b70caa1dca22721ba19f02f/regex-2025.9.1-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:7e786d9e4469698fc63815b8de08a89165a0aa851720eb99f5e0ea9d51dd2b6a", size = 857406, upload-time = "2025-09-01T22:09:34.117Z" },
+ { url = "https://files.pythonhosted.org/packages/74/fe/60c6132262dc36430d51e0c46c49927d113d3a38c1aba6a26c7744c84cf3/regex-2025.9.1-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:6b81d7dbc5466ad2c57ce3a0ddb717858fe1a29535c8866f8514d785fdb9fc5b", size = 848593, upload-time = "2025-09-01T22:09:35.598Z" },
+ { url = "https://files.pythonhosted.org/packages/cc/ae/2d4ff915622fabbef1af28387bf71e7f2f4944a348b8460d061e85e29bf0/regex-2025.9.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:cd4890e184a6feb0ef195338a6ce68906a8903a0f2eb7e0ab727dbc0a3156273", size = 787951, upload-time = "2025-09-01T22:09:37.139Z" },
+ { url = "https://files.pythonhosted.org/packages/85/37/dc127703a9e715a284cc2f7dbdd8a9776fd813c85c126eddbcbdd1ca5fec/regex-2025.9.1-cp314-cp314-win32.whl", hash = "sha256:34679a86230e46164c9e0396b56cab13c0505972343880b9e705083cc5b8ec86", size = 269833, upload-time = "2025-09-01T22:09:39.245Z" },
+ { url = "https://files.pythonhosted.org/packages/83/bf/4bed4d3d0570e16771defd5f8f15f7ea2311edcbe91077436d6908956c4a/regex-2025.9.1-cp314-cp314-win_amd64.whl", hash = "sha256:a1196e530a6bfa5f4bde029ac5b0295a6ecfaaffbfffede4bbaf4061d9455b70", size = 278742, upload-time = "2025-09-01T22:09:40.651Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/3e/7d7ac6fd085023312421e0d69dfabdfb28e116e513fadbe9afe710c01893/regex-2025.9.1-cp314-cp314-win_arm64.whl", hash = "sha256:f46d525934871ea772930e997d577d48c6983e50f206ff7b66d4ac5f8941e993", size = 271860, upload-time = "2025-09-01T22:09:42.413Z" },
+ ]
+
+ [[package]]
+ name = "requests"
+ version = "2.32.5"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "certifi" },
+ { name = "charset-normalizer" },
+ { name = "idna" },
+ { name = "urllib3" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
+ ]
+
+ [[package]]
+ name = "safetensors"
+ version = "0.6.2"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/ac/cc/738f3011628920e027a11754d9cae9abec1aed00f7ae860abbf843755233/safetensors-0.6.2.tar.gz", hash = "sha256:43ff2aa0e6fa2dc3ea5524ac7ad93a9839256b8703761e76e2d0b2a3fa4f15d9", size = 197968, upload-time = "2025-08-08T13:13:58.654Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/4d/b1/3f5fd73c039fc87dba3ff8b5d528bfc5a32b597fea8e7a6a4800343a17c7/safetensors-0.6.2-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:9c85ede8ec58f120bad982ec47746981e210492a6db876882aa021446af8ffba", size = 454797, upload-time = "2025-08-08T13:13:52.066Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/c9/bb114c158540ee17907ec470d01980957fdaf87b4aa07914c24eba87b9c6/safetensors-0.6.2-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:d6675cf4b39c98dbd7d940598028f3742e0375a6b4d4277e76beb0c35f4b843b", size = 432206, upload-time = "2025-08-08T13:13:50.931Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/8e/f70c34e47df3110e8e0bb268d90db8d4be8958a54ab0336c9be4fe86dac8/safetensors-0.6.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1d2d2b3ce1e2509c68932ca03ab8f20570920cd9754b05063d4368ee52833ecd", size = 473261, upload-time = "2025-08-08T13:13:41.259Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/f5/be9c6a7c7ef773e1996dc214e73485286df1836dbd063e8085ee1976f9cb/safetensors-0.6.2-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:93de35a18f46b0f5a6a1f9e26d91b442094f2df02e9fd7acf224cfec4238821a", size = 485117, upload-time = "2025-08-08T13:13:43.506Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/55/23f2d0a2c96ed8665bf17a30ab4ce5270413f4d74b6d87dd663258b9af31/safetensors-0.6.2-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:89a89b505f335640f9120fac65ddeb83e40f1fd081cb8ed88b505bdccec8d0a1", size = 616154, upload-time = "2025-08-08T13:13:45.096Z" },
+ { url = "https://files.pythonhosted.org/packages/98/c6/affb0bd9ce02aa46e7acddbe087912a04d953d7a4d74b708c91b5806ef3f/safetensors-0.6.2-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:fc4d0d0b937e04bdf2ae6f70cd3ad51328635fe0e6214aa1fc811f3b576b3bda", size = 520713, upload-time = "2025-08-08T13:13:46.25Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/5d/5a514d7b88e310c8b146e2404e0dc161282e78634d9358975fd56dfd14be/safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8045db2c872db8f4cbe3faa0495932d89c38c899c603f21e9b6486951a5ecb8f", size = 485835, upload-time = "2025-08-08T13:13:49.373Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/7b/4fc3b2ba62c352b2071bea9cfbad330fadda70579f617506ae1a2f129cab/safetensors-0.6.2-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:81e67e8bab9878bb568cffbc5f5e655adb38d2418351dc0859ccac158f753e19", size = 521503, upload-time = "2025-08-08T13:13:47.651Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/50/0057e11fe1f3cead9254315a6c106a16dd4b1a19cd247f7cc6414f6b7866/safetensors-0.6.2-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:b0e4d029ab0a0e0e4fdf142b194514695b1d7d3735503ba700cf36d0fc7136ce", size = 652256, upload-time = "2025-08-08T13:13:53.167Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/29/473f789e4ac242593ac1656fbece6e1ecd860bb289e635e963667807afe3/safetensors-0.6.2-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:fa48268185c52bfe8771e46325a1e21d317207bcabcb72e65c6e28e9ffeb29c7", size = 747281, upload-time = "2025-08-08T13:13:54.656Z" },
+ { url = "https://files.pythonhosted.org/packages/68/52/f7324aad7f2df99e05525c84d352dc217e0fa637a4f603e9f2eedfbe2c67/safetensors-0.6.2-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:d83c20c12c2d2f465997c51b7ecb00e407e5f94d7dec3ea0cc11d86f60d3fde5", size = 692286, upload-time = "2025-08-08T13:13:55.884Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/fe/cad1d9762868c7c5dc70c8620074df28ebb1a8e4c17d4c0cb031889c457e/safetensors-0.6.2-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:d944cea65fad0ead848b6ec2c37cc0b197194bec228f8020054742190e9312ac", size = 655957, upload-time = "2025-08-08T13:13:57.029Z" },
+ { url = "https://files.pythonhosted.org/packages/59/a7/e2158e17bbe57d104f0abbd95dff60dda916cf277c9f9663b4bf9bad8b6e/safetensors-0.6.2-cp38-abi3-win32.whl", hash = "sha256:cab75ca7c064d3911411461151cb69380c9225798a20e712b102edda2542ddb1", size = 308926, upload-time = "2025-08-08T13:14:01.095Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/c3/c0be1135726618dc1e28d181b8c442403d8dbb9e273fd791de2d4384bcdd/safetensors-0.6.2-cp38-abi3-win_amd64.whl", hash = "sha256:c7b214870df923cbc1593c3faee16bec59ea462758699bd3fee399d00aac072c", size = 320192, upload-time = "2025-08-08T13:13:59.467Z" },
+ ]
+
+ [[package]]
+ name = "setuptools"
+ version = "80.9.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/18/5d/3bf57dcd21979b887f014ea83c24ae194cfcd12b9e0fda66b957c69d1fca/setuptools-80.9.0.tar.gz", hash = "sha256:f36b47402ecde768dbfafc46e8e4207b4360c654f1f3bb84475f0a28628fb19c", size = 1319958, upload-time = "2025-05-27T00:56:51.443Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922", size = 1201486, upload-time = "2025-05-27T00:56:49.664Z" },
+ ]
+
+ [[package]]
+ name = "sympy"
+ version = "1.14.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "mpmath" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/83/d3/803453b36afefb7c2bb238361cd4ae6125a569b4db67cd9e79846ba2d68c/sympy-1.14.0.tar.gz", hash = "sha256:d3d3fe8df1e5a0b42f0e7bdf50541697dbe7d23746e894990c030e2b05e72517", size = 7793921, upload-time = "2025-04-27T18:05:01.611Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" },
+ ]
+
+ [[package]]
+ name = "tokenizers"
+ version = "0.22.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "huggingface-hub" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/5e/b4/c1ce3699e81977da2ace8b16d2badfd42b060e7d33d75c4ccdbf9dc920fa/tokenizers-0.22.0.tar.gz", hash = "sha256:2e33b98525be8453f355927f3cab312c36cd3e44f4d7e9e97da2fa94d0a49dcb", size = 362771, upload-time = "2025-08-29T10:25:33.914Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/6d/b1/18c13648edabbe66baa85fe266a478a7931ddc0cd1ba618802eb7b8d9865/tokenizers-0.22.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:eaa9620122a3fb99b943f864af95ed14c8dfc0f47afa3b404ac8c16b3f2bb484", size = 3081954, upload-time = "2025-08-29T10:25:24.993Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/02/c3c454b641bd7c4f79e4464accfae9e7dfc913a777d2e561e168ae060362/tokenizers-0.22.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:71784b9ab5bf0ff3075bceeb198149d2c5e068549c0d18fe32d06ba0deb63f79", size = 2945644, upload-time = "2025-08-29T10:25:23.405Z" },
+ { url = "https://files.pythonhosted.org/packages/55/02/d10185ba2fd8c2d111e124c9d92de398aee0264b35ce433f79fb8472f5d0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ec5b71f668a8076802b0241a42387d48289f25435b86b769ae1837cad4172a17", size = 3254764, upload-time = "2025-08-29T10:25:12.445Z" },
+ { url = "https://files.pythonhosted.org/packages/13/89/17514bd7ef4bf5bfff58e2b131cec0f8d5cea2b1c8ffe1050a2c8de88dbb/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ea8562fa7498850d02a16178105b58803ea825b50dc9094d60549a7ed63654bb", size = 3161654, upload-time = "2025-08-29T10:25:15.493Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/d8/bac9f3a7ef6dcceec206e3857c3b61bb16c6b702ed7ae49585f5bd85c0ef/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4136e1558a9ef2e2f1de1555dcd573e1cbc4a320c1a06c4107a3d46dc8ac6e4b", size = 3511484, upload-time = "2025-08-29T10:25:20.477Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/27/9c9800eb6763683010a4851db4d1802d8cab9cec114c17056eccb4d4a6e0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cdf5954de3962a5fd9781dc12048d24a1a6f1f5df038c6e95db328cd22964206", size = 3712829, upload-time = "2025-08-29T10:25:17.154Z" },
+ { url = "https://files.pythonhosted.org/packages/10/e3/b1726dbc1f03f757260fa21752e1921445b5bc350389a8314dd3338836db/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8337ca75d0731fc4860e6204cc24bb36a67d9736142aa06ed320943b50b1e7ed", size = 3408934, upload-time = "2025-08-29T10:25:18.76Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/61/aeab3402c26874b74bb67a7f2c4b569dde29b51032c5384db592e7b216f4/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a89264e26f63c449d8cded9061adea7b5de53ba2346fc7e87311f7e4117c1cc8", size = 3345585, upload-time = "2025-08-29T10:25:22.08Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/d3/498b4a8a8764cce0900af1add0f176ff24f475d4413d55b760b8cdf00893/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:790bad50a1b59d4c21592f9c3cf5e5cf9c3c7ce7e1a23a739f13e01fb1be377a", size = 9322986, upload-time = "2025-08-29T10:25:26.607Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/62/92378eb1c2c565837ca3cb5f9569860d132ab9d195d7950c1ea2681dffd0/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:76cf6757c73a10ef10bf06fa937c0ec7393d90432f543f49adc8cab3fb6f26cb", size = 9276630, upload-time = "2025-08-29T10:25:28.349Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/f0/342d80457aa1cda7654327460f69db0d69405af1e4c453f4dc6ca7c4a76e/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:1626cb186e143720c62c6c6b5371e62bbc10af60481388c0da89bc903f37ea0c", size = 9547175, upload-time = "2025-08-29T10:25:29.989Z" },
+ { url = "https://files.pythonhosted.org/packages/14/84/8aa9b4adfc4fbd09381e20a5bc6aa27040c9c09caa89988c01544e008d18/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:da589a61cbfea18ae267723d6b029b84598dc8ca78db9951d8f5beff72d8507c", size = 9692735, upload-time = "2025-08-29T10:25:32.089Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/24/83ee2b1dc76bfe05c3142e7d0ccdfe69f0ad2f1ebf6c726cea7f0874c0d0/tokenizers-0.22.0-cp39-abi3-win32.whl", hash = "sha256:dbf9d6851bddae3e046fedfb166f47743c1c7bd11c640f0691dd35ef0bcad3be", size = 2471915, upload-time = "2025-08-29T10:25:36.411Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/9b/0e0bf82214ee20231845b127aa4a8015936ad5a46779f30865d10e404167/tokenizers-0.22.0-cp39-abi3-win_amd64.whl", hash = "sha256:c78174859eeaee96021f248a56c801e36bfb6bd5b067f2e95aa82445ca324f00", size = 2680494, upload-time = "2025-08-29T10:25:35.14Z" },
+ ]
+
+ [[package]]
+ name = "torch"
+ version = "2.8.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "jinja2" },
+ { name = "networkx" },
+ { name = "nvidia-cublas-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-cupti-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-nvrtc-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cuda-runtime-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cudnn-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cufft-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cufile-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-curand-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusolver-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusparse-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-cusparselt-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nccl-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nvjitlink-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "nvidia-nvtx-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "setuptools" },
+ { name = "sympy" },
+ { name = "triton", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" },
+ { name = "typing-extensions" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/10/4e/469ced5a0603245d6a19a556e9053300033f9c5baccf43a3d25ba73e189e/torch-2.8.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:2b2f96814e0345f5a5aed9bf9734efa913678ed19caf6dc2cddb7930672d6128", size = 101936856, upload-time = "2025-08-06T14:54:01.526Z" },
+ { url = "https://files.pythonhosted.org/packages/16/82/3948e54c01b2109238357c6f86242e6ecbf0c63a1af46906772902f82057/torch-2.8.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:65616ca8ec6f43245e1f5f296603e33923f4c30f93d65e103d9e50c25b35150b", size = 887922844, upload-time = "2025-08-06T14:55:50.78Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/54/941ea0a860f2717d86a811adf0c2cd01b3983bdd460d0803053c4e0b8649/torch-2.8.0-cp313-cp313-win_amd64.whl", hash = "sha256:659df54119ae03e83a800addc125856effda88b016dfc54d9f65215c3975be16", size = 241330968, upload-time = "2025-08-06T14:54:45.293Z" },
+ { url = "https://files.pythonhosted.org/packages/de/69/8b7b13bba430f5e21d77708b616f767683629fc4f8037564a177d20f90ed/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:1a62a1ec4b0498930e2543535cf70b1bef8c777713de7ceb84cd79115f553767", size = 73915128, upload-time = "2025-08-06T14:54:34.769Z" },
+ { url = "https://files.pythonhosted.org/packages/15/0e/8a800e093b7f7430dbaefa80075aee9158ec22e4c4fc3c1a66e4fb96cb4f/torch-2.8.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:83c13411a26fac3d101fe8035a6b0476ae606deb8688e904e796a3534c197def", size = 102020139, upload-time = "2025-08-06T14:54:39.047Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/15/5e488ca0bc6162c86a33b58642bc577c84ded17c7b72d97e49b5833e2d73/torch-2.8.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:8f0a9d617a66509ded240add3754e462430a6c1fc5589f86c17b433dd808f97a", size = 887990692, upload-time = "2025-08-06T14:56:18.286Z" },
+ { url = "https://files.pythonhosted.org/packages/b4/a8/6a04e4b54472fc5dba7ca2341ab219e529f3c07b6941059fbf18dccac31f/torch-2.8.0-cp313-cp313t-win_amd64.whl", hash = "sha256:a7242b86f42be98ac674b88a4988643b9bc6145437ec8f048fea23f72feb5eca", size = 241603453, upload-time = "2025-08-06T14:55:22.945Z" },
+ { url = "https://files.pythonhosted.org/packages/04/6e/650bb7f28f771af0cb791b02348db8b7f5f64f40f6829ee82aa6ce99aabe/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:7b677e17f5a3e69fdef7eb3b9da72622f8d322692930297e4ccb52fefc6c8211", size = 73632395, upload-time = "2025-08-06T14:55:28.645Z" },
+ ]
+
+ [[package]]
+ name = "tqdm"
+ version = "4.67.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "colorama", marker = "sys_platform == 'win32'" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/a8/4b/29b4ef32e036bb34e4ab51796dd745cdba7ed47ad142a9f4a1eb8e0c744d/tqdm-4.67.1.tar.gz", hash = "sha256:f8aef9c52c08c13a65f30ea34f4e5aac3fd1a34959879d7e59e63027286627f2", size = 169737, upload-time = "2024-11-24T20:12:22.481Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl", hash = "sha256:26445eca388f82e72884e0d580d5464cd801a3ea01e63e5601bdff9ba6a48de2", size = 78540, upload-time = "2024-11-24T20:12:19.698Z" },
+ ]
+
+ [[package]]
+ name = "transformers"
+ version = "4.56.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "filelock" },
+ { name = "huggingface-hub" },
+ { name = "numpy" },
+ { name = "packaging" },
+ { name = "pyyaml" },
+ { name = "regex" },
+ { name = "requests" },
+ { name = "safetensors" },
+ { name = "tokenizers" },
+ { name = "tqdm" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/89/21/dc88ef3da1e49af07ed69386a11047a31dcf1aaf4ded3bc4b173fbf94116/transformers-4.56.1.tar.gz", hash = "sha256:0d88b1089a563996fc5f2c34502f10516cad3ea1aa89f179f522b54c8311fe74", size = 9855473, upload-time = "2025-09-04T20:47:13.14Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/71/7c/283c3dd35e00e22a7803a0b2a65251347b745474a82399be058bde1c9f15/transformers-4.56.1-py3-none-any.whl", hash = "sha256:1697af6addfb6ddbce9618b763f4b52d5a756f6da4899ffd1b4febf58b779248", size = 11608197, upload-time = "2025-09-04T20:47:04.895Z" },
+ ]
+
+ [[package]]
+ name = "triton"
+ version = "3.4.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "setuptools" },
+ ]
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/30/7b/0a685684ed5322d2af0bddefed7906674f67974aa88b0fae6e82e3b766f6/triton-3.4.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00be2964616f4c619193cb0d1b29a99bd4b001d7dc333816073f92cf2a8ccdeb", size = 155569223, upload-time = "2025-07-30T19:58:44.017Z" },
+ { url = "https://files.pythonhosted.org/packages/20/63/8cb444ad5cdb25d999b7d647abac25af0ee37d292afc009940c05b82dda0/triton-3.4.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7936b18a3499ed62059414d7df563e6c163c5e16c3773678a3ee3d417865035d", size = 155659780, upload-time = "2025-07-30T19:58:51.171Z" },
+ ]
+
+ [[package]]
+ name = "typing-extensions"
+ version = "4.15.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
+ ]
+
+ [[package]]
+ name = "urllib3"
+ version = "2.5.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/15/22/9ee70a2574a4f4599c47dd506532914ce044817c7752a79b6a51286319bc/urllib3-2.5.0.tar.gz", hash = "sha256:3fc47733c7e419d4bc3f6b3dc2b4f890bb743906a30d56ba4a5bfa4bbff92760", size = 393185, upload-time = "2025-06-18T14:07:41.644Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl", hash = "sha256:e6b01673c0fa6a13e374b50871808eb3bf7046c4b125b216f6bf1cc604cff0dc", size = 129795, upload-time = "2025-06-18T14:07:40.39Z" },
+ ]