SkeptiSTEM-4B-v2 Final (Merged 16-bit)

Complete merged model combining all training stages:

  • R1: STEM SFT (math, science, code)
  • R2: Format primer (reasoning tags)
  • R3: GRPO verification (DOUBT framework)
  • C: Chat restoration SFT
  • D: DPO preference alignment

Capabilities

✅ STEM problem solving (math, science, coding)
✅ Verification of suggested answers
✅ Structured reasoning when appropriate
✅ Natural conversational ability
✅ Preference-aligned responses

Usage

from unsloth import FastLanguageModel

# Load the merged checkpoint; set load_in_4bit=False to use the full 16-bit weights.
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-final-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "What is the derivative of x^3 + 2x?"}
]

text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
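
Because the model was also trained to verify candidate answers (the R3 GRPO stage), you can hand it a problem together with a proposed solution and ask it to check the work. Below is a minimal sketch reusing the model and tokenizer loaded above; the prompt wording is illustrative, not a fixed format the model requires.

# Ask the model to verify a suggested answer instead of solving from scratch.
# NOTE: the phrasing below is an example prompt, not a required template.
messages = [
    {
        "role": "user",
        "content": (
            "Problem: What is the derivative of x^3 + 2x?\n"
            "Suggested answer: 3x^2 + 2\n"
            "Is this answer correct? Check it step by step."
        ),
    }
]

text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))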

License

Apache 2.0

Trained with Unsloth.

Base model

Qwen/Qwen3-4B-Base