SmolLM2-1.7B-Instruct-QuadRotor-Full-4bit

SmolLM2-1.7B-Instruct with weights quantized to 4 bits using QuadRotor-Full.

Smoke-test results (WikiText-2, seq_len 2048, MPS)

Variant	Perplexity	Δ vs BF16
BF16 (upstream)	8.9391	—
Affine 4-bit	11.0134	+2.07 (+23 %)
QuadRotor-Full 4-bit	11.0197	+2.08 (+23 %)

On this model, random Haar quaternions without calibration perform essentially the same as plain affine 4-bit quantization: the perplexity gap between the two is only +0.006 (11.0197 vs 11.0134). See the source repo for the consolidated picture and next-step recommendations.
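The deltas in the table can be rechecked from the perplexity values alone. The snippet below is a small sanity check, not part of the quadrotor tooling; the variable names are illustrative.

```python
# Perplexities copied from the table above.
ppl = {
    "BF16 (upstream)": 8.9391,
    "Affine 4-bit": 11.0134,
    "QuadRotor-Full 4-bit": 11.0197,
}
base = ppl["BF16 (upstream)"]

# Absolute and relative perplexity increase of each variant over BF16.
deltas = {name: value - base for name, value in ppl.items()}
relative = {name: 100.0 * d / base for name, d in deltas.items()}

# Gap between the two 4-bit variants (about 0.0063).
gap = ppl["QuadRotor-Full 4-bit"] - ppl["Affine 4-bit"]

for name in ppl:
    print(f"{name}: {ppl[name]:.4f}  delta {deltas[name]:+.2f} ({relative[name]:+.0f} %)")
print(f"QuadRotor-Full vs affine: {gap:+.4f}")
```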

Algorithm

Per group of 64 weight elements:

Separate the norm ρ = ‖x‖₂ from the unit direction x̄ = x/ρ
Reshape x̄ into 16 blocks of 4 coordinates (16 × 4 = 64)
Sample 16 pairs of unit quaternions (q_L, q_R) from a per-tensor seed (SHA-256 of tensor name → 32-bit seed → Haar-on-S³)
Apply T(v) = q_L · v · q̄_R to each block v, treated as a quaternion
Quantize the rotated coordinates with group-wise affine 4-bit quantization
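The steps above can be sketched in plain Python for a single group. This is an illustrative sketch under stated assumptions, not the quadrotor library's implementation: the function names, taking the first four SHA-256 bytes (big-endian) as the 32-bit seed, drawing Haar-distributed unit quaternions by normalizing four Gaussians, the w-first quaternion layout, the min/max affine quantizer, and the tensor name are all hypothetical choices.

```python
import hashlib
import math
import random

def seed_from_name(name):
    # Assumption: the 32-bit seed is the first 4 bytes of SHA-256(name), big-endian.
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:4], "big")

def haar_quaternion(rng):
    # A normalized 4D Gaussian sample is uniformly (Haar) distributed on S^3.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]

def qmul(a, b):
    # Hamilton product, components ordered (w, x, y, z).
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]

def qconj(q):
    return [q[0], -q[1], -q[2], -q[3]]

def encode_group(x, seed, bits=4):
    rng = random.Random(seed)
    rho = math.sqrt(sum(c * c for c in x))            # norm of the group
    unit = [c / rho for c in x] if rho > 0 else list(x)
    rotated = []
    for i in range(0, len(unit), 4):                   # 16 blocks of 4 for a group of 64
        q_l, q_r = haar_quaternion(rng), haar_quaternion(rng)
        rotated.extend(qmul(qmul(q_l, unit[i:i + 4]), qconj(q_r)))  # T(v) = q_L v conj(q_R)
    levels = 2 ** bits - 1
    lo, hi = min(rotated), max(rotated)
    scale = (hi - lo) / levels or 1.0                  # avoid a zero scale for constant groups
    codes = [min(levels, max(0, round((c - lo) / scale))) for c in rotated]
    return rho, lo, scale, codes

def decode_group(rho, lo, scale, codes, seed):
    rng = random.Random(seed)                          # regenerate the same quaternions
    rotated = [lo + c * scale for c in codes]
    out = []
    for i in range(0, len(rotated), 4):
        q_l, q_r = haar_quaternion(rng), haar_quaternion(rng)
        v = qmul(qmul(qconj(q_l), rotated[i:i + 4]), q_r)  # inverse of T for unit quaternions
        out.extend(rho * c for c in v)
    return out

seed = seed_from_name("model.layers.0.mlp.up_proj.weight")  # hypothetical tensor name
weights = [math.sin(0.37 * i) for i in range(64)]            # stand-in for one weight group
rho, lo, scale, codes = encode_group(weights, seed)
restored_group = decode_group(rho, lo, scale, codes, seed)
max_err = max(abs(a - b) for a, b in zip(weights, restored_group))
print(f"seed={seed} max abs error={max_err:.5f}")
```

Because the quaternions are regenerated from the seed at decode time, only the norm, the affine parameters, and the 4-bit codes need to be kept per group in this sketch. Since the rotation is norm-preserving within each block, the reconstruction error per coordinate stays below ρ times the quantization step.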

Loading

import json, torch
from pathlib import Path
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
from quadrotor.quantize import QuadRotorConfig
from quadrotor.state_dict import decode_state_dict

# Download the repository snapshot, or reuse the locally cached copy.
p = Path(snapshot_download("majentik/SmolLM2-1.7B-Instruct-QuadRotor-Full-4bit"))
# The sidecar file holds the quantization settings needed for decoding.
sidecar = json.loads((p / "quadrotor.json").read_text())
cfg = QuadRotorConfig(variant=sidecar["variant"], bits=sidecar["bits"], group_size=sidecar["group_size"])
# Load the stored tensors and decode them with the matching config.
state = load_file(str(p / "model.safetensors"))
restored = decode_state_dict(state, cfg, sidecar)
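Before constructing the config, it can help to confirm that the sidecar contains the three fields the loading code reads. The helper below is a hypothetical convenience, not part of quadrotor; the example values are assumptions, with bits and group size taken from this card and the variant string made up for illustration.

```python
import json

# Keys read from quadrotor.json by the loading snippet above.
REQUIRED_KEYS = ("variant", "bits", "group_size")

def check_sidecar(text):
    """Parse sidecar JSON and return only the fields needed to build the config."""
    sidecar = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in sidecar]
    if missing:
        raise KeyError(f"quadrotor.json is missing: {missing}")
    return {key: sidecar[key] for key in REQUIRED_KEYS}

# Hypothetical sidecar contents; the real file may hold additional fields.
example = '{"variant": "full", "bits": 4, "group_size": 64}'
fields = check_sidecar(example)
print(fields)
```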

Source

Algorithm + tooling: github.com/ajentik/quadrotor
Base model: HuggingFaceTB/SmolLM2-1.7B-Instruct

License

Apache 2.0 (inherited from base model).
