Model Details

This model is a mixed-precision GGUF q2ks quantization of zai-org/GLM-4.6, generated with the intel/auto-round algorithm. The embedding and lm-head layers fall back to 8 bits, and the non-expert layers fall back to 4 bits. See the "Generate the model" section for details.

Please follow the license of the original model.

How To Use

llama.cpp inference

./llama-cli -hf Intel/GLM-4.6-gguf-q2ks-mixed-AutoRound
> Write a quick sort algorithm.
<think>The user asks: "Write a quick sort algorithm."\n\nWe need to output a quick sort algorithm. The request is a standard programming request, no disallowed content. So we can comply.\n\nWe need to produce code for quick sort. It could be in various language: Python, Java, C, etc. We can choose a language. Usually, quick sort is a classic algorithm.\n\nWe can provide a straightforward implementation with recursion. The user didn\'t specify language, but we can provide Python implementation (most common). Also we can discuss the algorithm.\n\nGiven the request, we can comply and produce a code snippet.\n\nWe might also include explanation.\n\nThus answer: Provide quick sort algorithm in Python.\n\nWill also include typical implementation with in-place partition.\n\nWe can also discuss complexity.\n\nThus we will answer accordingly.\n\nWe must not mention policy or guidelines.\n\nHence produce code and explanation.\n\nPotentially also provide alternative implementation.\n\nThus final answer will be a quick sort algorithm.\n\nWe might also provide a version for generic types.\n\nLet\'s produce Python code:\n\n```\ndef quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[0]\n    less = [x for x in arr[1:] if x <= pivot]\n    greater = [x for x in arr[1:] if x > pivot]\n    return quicksort(less) + [pivot] + quicksort(greater)\n```\n\nAlternatively, in-place version:\n\n```\ndef quicksort_inplace(arr, low, high):\n    if low < high:\n        pi = partition(arr, low, high)\n        quicksort_inplace(arr, low, pi-1)\n        quicksort_inplace(arr, pi+1, high)\n```\n\nWe can also provide partition function.\n\nThus final answer.\n\nGiven the instruction to be brief but thorough, we can provide both.\n\nThus answer: Provide quick sort algorithm.\n\nWe must not output any disallowed content.\n\nThus final answer is safe.\n\nNow produce the final.</think>
Below is a simple, self‑contained implementation of the Quick‑Sort algorithm in Python. Two versions are shown:

* **A functional (non‑in‑place) version** that builds new lists for the recursive calls.
* **An in‑place version** that sorts the original list by swapping elements around a pivot.

Both have the same expected time‑complexity: *O(n log n)* on average (and *O(n²)* in the worst case, e.g., when the pivot is always the smallest or largest element).

---

## 1️⃣ Functional (non‑in‑place) Quick‑Sort

```python
def quicksort(arr):
    """
    Return a new list containing the elements of `arr` sorted.
    This version is not in‑place; it creates new sub‑lists for each recursion.
    """
    # Base case: a list of length 0 or 1 is already sorted
    if len(arr) <= 1:
        return arr

    # Choose the first element as the pivot (any element works)
    pivot = arr[0]

    # Partition the remaining elements into three groups:
    #   * less   – elements <= pivot
    #   * equal  – elements == pivot (excluding the pivot itself)
    #   * greater– elements > pivot
    less = [x for x in arr[1:] if x <= pivot]
    greater = [x for x in arr[1:] if x > pivot]

    # Recursively sort the partitions and concatenate the results
    return quicksort(less) + [pivot] + quicksort(greater)


# Example usage:
if __name__ == "__main__":
    unsorted = [33, 10, 55, 71, 29, 3, 18, 42]
    print("Original:", unsorted)
    print("Sorted:", quicksort(unsorted))
```

**Explanation**

1. **Base case** – If the list has 0 or 1 elements, it’s already sorted.
2. **Pivot selection** – Here we simply pick the first element; you could also use a random element or “median‑of‑three” for better performance on pathological inputs.
3. **Partition** – Two list comprehensions split the remaining items into those ≤ pivot and those > pivot.
4. **Recursive sorting** – The two sub‑lists are sorted recursively, then combined with the pivot to produce the final sorted list.

---

## 2️⃣ In‑Place Quick‑Sort (Lomuto partition scheme)

```python
def quicksort_inplace(arr, low=0, high=None):
    """
    Sort `arr` in place using Quick‑Sort.
    Parameters:
        arr  – list of comparable items.
        low  – starting index of the sub‑array to sort.
        high – ending index (inclusive) of the sub‑array; if None, uses len(arr)-1.
    """
    if high is None:
        high = len(arr) - 1

    # Continue only if there are at least two elements to sort
    if low < high:
        # Partition the array and obtain the pivot's final position
        pivot_index = partition(arr, low, high)

        # Recursively sort the sub‑arrays left and right of the pivot
        quicksort_inplace(arr, low, pivot_index - 1)
        quicksort_inplace(arr, pivot_index + 1, high)


def partition(arr, low, high):
    """
    Lomuto partition: choose the element at `high` as pivot,
    then rearrange the sub‑array so that all elements <= pivot are left
    and all > pivot are right. Returns the final index of the pivot.
    """
    pivot = arr[high]          # pivot value
    i = low - 1                # index of the smaller element

    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]   # swap

    # Place pivot after the last smaller element
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


# Example usage:
if __name__ == "__main__":
    data = [33, 10, 55, 71, 29, 3, 18, 42]
    print("Original:", data)
    quicksort_inplace(data)
    print("Sorted:", data)
```

**Explanation**

* **`partition`** – The Lomuto scheme picks the last element as the pivot. It walks the sub‑array, moving any element ≤ pivot to the front. After the loop, the pivot is swapped into its correct final position, and its index (`i+1`) is returned.
* **Recursion** – The list is sorted recursively on the two halves separated by the pivot.

---

## 3️⃣ Tips & Variations

| Variation | Why use it? |
|-----------|-------------|
| **Random pivot** (`pivot = arr[random.randint(low, high)]`) | Reduces chance of worst‑case *O(n²)* on already sorted inputs. |
| **Median‑of‑three pivot** (choose median of first, middle, last) | Improves partition balance for many real‑world datasets. |
| **Tail‑call optimization** (recurse on the smaller side, loop on the larger) | Keeps recursion depth ≤ log n, preventing stack overflow on large lists. |
| **Hybrid with Insertion Sort** (switch to insertion sort for sub‑arrays ≤ 10 elements) | Quick‑Sort overhead is high for tiny partitions; insertion sort is faster there. |

---

### TL;DR

* Use the **functional version** when you want a concise, immutable‑style implementation or you’re dealing with small data where extra list allocations are cheap.
* Use the **in‑place version** when you need to sort large data without extra memory allocation.

Both achieve the classic Quick‑Sort behavior: fast average performance, elegant divide‑and‑conquer recursion, and a simple pivot‑based partition step. Happy sorting!

Generate the model

Here is a sample script to reproduce the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "zai-org/GLM-4.6"

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="cpu", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
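# Per-layer bit-width overrides passed to AutoRound
layer_config = {}
for n, m in model.named_modules():
    if n == "lm_head" or isinstance(m, torch.nn.Embedding):
        # Embedding and lm-head layers fall back to 8 bits
        layer_config[n] = {"bits": 8}
    elif isinstance(m, torch.nn.Linear) and ("expert" not in n or "shared_experts" in n):
        # Non-expert linear layers (including shared experts) fall back to 4 bits
        layer_config[n] = {"bits": 4}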

autoround = AutoRound(model, tokenizer, iters=0, layer_config=layer_config, nsamples=512)
autoround.quantize_and_save("tmp_autoround", format="gguf:q2_k_s")
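The layer-selection rule in the script above can be checked without loading the full model. The following is a minimal sketch, not part of the original recipe: the helper `assigned_bits` and the module names in `examples` are hypothetical and only illustrate how the name-based conditions sort layers into the 8-bit and 4-bit groups. Layers for which it returns `None` get no override; the assumption here is that they are then quantized at the default of the `gguf:q2_k_s` format.

```python
from typing import Optional


def assigned_bits(name: str, is_linear: bool, is_embedding: bool) -> Optional[int]:
    """Mirror the conditions of the reproduction script (hypothetical helper).

    Returns the override bit-width, or None when the layer gets no entry
    in layer_config (assumed to fall through to the format default).
    """
    if name == "lm_head" or is_embedding:
        return 8
    if is_linear and ("expert" not in name or "shared_experts" in name):
        return 4
    return None


# Illustrative module names only; the real names come from model.named_modules().
examples = [
    ("model.embed_tokens", False, True),
    ("lm_head", True, False),
    ("model.layers.3.self_attn.q_proj", True, False),
    ("model.layers.3.mlp.shared_experts.up_proj", True, False),
    ("model.layers.3.mlp.experts.7.up_proj", True, False),
]

for name, is_linear, is_embedding in examples:
    print(f"{name}: {assigned_bits(name, is_linear, is_embedding)}")
```

Note that the shared-expert check is needed because the substring "expert" also occurs in "shared_experts"; without it, shared experts would be treated like routed experts and receive no 4-bit override.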

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful resource to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

Format: GGUF (2-bit). Model size: 357B params. Architecture: glm4moe.
