SkeptiSTEM-4B-v2 Final (Merged 16-bit)

Complete merged model combining all training stages:

  • R1: STEM SFT (math, science, code)
  • R2: Format primer (reasoning tags)
  • R3: GRPO verification (DOUBT framework)
  • C: Chat restoration SFT
  • D: DPO preference alignment

Capabilities

✅ STEM problem solving (math, science, coding)
✅ Verification of suggested answers
✅ Structured reasoning when appropriate
✅ Natural conversational ability
✅ Preference-aligned responses

Usage

from unsloth import FastLanguageModel

# Load the merged checkpoint; set load_in_4bit=False to use the full 16-bit weights.
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-final-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "What is the derivative of x^3 + 2x?"}
]

text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
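
Because the model was also trained to verify candidate answers (the R3 GRPO stage), you can hand it a problem together with a proposed solution and ask it to check the work. Below is a minimal sketch reusing the model and tokenizer loaded above; the prompt wording is illustrative, not a fixed format the model requires.

# Ask the model to verify a suggested answer instead of solving from scratch.
# NOTE: the phrasing below is an example prompt, not a required template.
messages = [
    {
        "role": "user",
        "content": (
            "Problem: What is the derivative of x^3 + 2x?\n"
            "Suggested answer: 3x^2 + 2\n"
            "Is this answer correct? Check it step by step."
        ),
    }
]

text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))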

License

Apache 2.0

Trained with Unsloth.

Base model

Qwen/Qwen3-4B-Base