SmolLM2-1.7B-Instruct-QuadRotor-Full-4bit

Weight-quantized via QuadRotor-Full at 4 bits.

Smoke-test results (WikiText-2, seq_len 2048, MPS)

Variant                 Perplexity   Δ vs BF16
BF16 (upstream)         8.9391
Affine 4-bit            11.0134      +2.07 (+23 %)
QuadRotor-Full 4-bit    11.0197      +2.08 (+23 %)

Without calibration, random Haar quaternion rotations give essentially the same perplexity as plain affine 4-bit on this model (11.0197 vs 11.0134, a difference of +0.006). See the source repo for the consolidated picture and next-step recommendations.
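The deltas quoted above can be rechecked from the three perplexity values alone. The snippet below is only an arithmetic check on the reported numbers; it does not re-run the evaluation, and the dictionary keys are just the variant labels used in the results.

```python
# Perplexities as reported in the smoke-test results.
ppl = {
    "BF16 (upstream)": 8.9391,
    "Affine 4-bit": 11.0134,
    "QuadRotor-Full 4-bit": 11.0197,
}

base = ppl["BF16 (upstream)"]

# Absolute and relative increase over the BF16 baseline for each variant.
deltas = {
    name: (value - base, 100.0 * (value - base) / base)
    for name, value in ppl.items()
    if name != "BF16 (upstream)"
}

# Gap between the two 4-bit variants.
gap = ppl["QuadRotor-Full 4-bit"] - ppl["Affine 4-bit"]

for name, (abs_delta, rel_delta) in deltas.items():
    print(f"{name}: {abs_delta:+.2f} ({rel_delta:+.0f} %)")
print(f"QuadRotor-Full vs affine: {gap:+.3f}")
```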

Algorithm

Per group of 64 weight elements:

  1. Split norm ρ = ‖x‖₂ from direction x̄ = x/ρ
  2. Reshape into 16 blocks of 4 coordinates
  3. Sample 16 pairs of unit quaternions from a per-tensor seed (SHA-256 of tensor name → 32-bit seed → Haar-on-S³)
  4. Apply T(v) = q_L · v · q̄_R per block
  5. Group-wise affine 4-bit quantize the rotated coordinates
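The five steps above can be sketched in NumPy as follows. This is an illustrative reconstruction, not the code from the source repo: the byte order used to turn the SHA-256 digest into a 32-bit seed, the (w, x, y, z) quaternion layout, the min/max affine quantizer, the way the 16 pairs are shared across groups, and the tensor name in the usage lines are all assumptions, as are all function names.

```python
import hashlib
import numpy as np

GROUP = 64    # weights per group (from the steps above)
BLOCKS = 16   # 4-coordinate blocks per group

def seed_from_name(name: str) -> int:
    # Assumption: the 32-bit seed is the first 4 digest bytes, big-endian.
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:4], "big")

def haar_unit_quaternions(rng, n):
    # Normalised 4-D Gaussian vectors are uniformly (Haar) distributed on S^3.
    q = rng.standard_normal((n, 4))
    return q / np.linalg.norm(q, axis=1, keepdims=True)

def qmul(a, b):
    # Hamilton product, (w, x, y, z) layout, applied row by row.
    aw, ax, ay, az = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
    bw, bx, by, bz = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
    return np.stack([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ], axis=-1)

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(v, q_l, q_r):
    # Step 4: T(v) = q_L * v * conj(q_R), an isometry of R^4.
    return qmul(qmul(q_l, v), qconj(q_r))

def unrotate(v, q_l, q_r):
    # Inverse of T for unit quaternions: conj(q_L) * v * q_R.
    return qmul(qmul(qconj(q_l), v), q_r)

def affine_quantize(x, bits=4):
    # Assumption: plain min/max affine quantizer over the whole group.
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo

def encode_group(x, q_l, q_r, bits=4):
    rho = float(np.linalg.norm(x))             # step 1: norm
    direction = x / rho if rho > 0 else x      # step 1: direction
    blocks = direction.reshape(BLOCKS, 4)      # step 2
    rotated = rotate(blocks, q_l, q_r)         # step 4
    codes, scale, lo = affine_quantize(rotated.reshape(-1), bits)  # step 5
    return rho, codes, scale, lo

def decode_group(rho, codes, scale, lo, q_l, q_r):
    rotated = (codes.astype(np.float64) * scale + lo).reshape(BLOCKS, 4)
    return rho * unrotate(rotated, q_l, q_r).reshape(-1)

# Usage with a hypothetical tensor name and random stand-in weights.
rng = np.random.default_rng(seed_from_name("model.layers.0.mlp.up_proj.weight"))
q_l = haar_unit_quaternions(rng, BLOCKS)       # step 3: left quaternions
q_r = haar_unit_quaternions(rng, BLOCKS)       # step 3: right quaternions
x = rng.standard_normal(GROUP)
rho, codes, scale, lo = encode_group(x, q_l, q_r)
x_hat = decode_group(rho, codes, scale, lo, q_l, q_r)
```

Because T is an isometry on each 4-coordinate block, the rotation changes how the group's energy is spread across coordinates but not the group norm, which is why ρ can be stored separately and reapplied after decoding.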

Loading

import json
from pathlib import Path

import torch
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
from quadrotor.quantize import QuadRotorConfig
from quadrotor.state_dict import decode_state_dict

# Download the repo snapshot (or reuse the local cache) and get its path.
p = Path(snapshot_download("majentik/SmolLM2-1.7B-Instruct-QuadRotor-Full-4bit"))
# The sidecar JSON carries the quantization metadata for the checkpoint.
sidecar = json.loads((p / "quadrotor.json").read_text())
cfg = QuadRotorConfig(variant=sidecar["variant"], bits=sidecar["bits"], group_size=sidecar["group_size"])
# Load the quantized tensors, then decode them using the config and sidecar.
state = load_file(str(p / "model.safetensors"))
restored = decode_state_dict(state, cfg, sidecar)

Source

License

Apache 2.0 (inherited from base model).
