toddric_v2_merged

Merged, ready-to-run weights of a fine-tuned Llama-3.1-8B specialized to be crisp, witty, encouraging, and allergic to fluff. The Stage-C (DPO) LoRA is already merged into the base, so you can load it like any normal HF model folder.

Persona: “You are toddric: crisp, witty, encouraging. Prefer concrete advice over fluff.”


Contents

```
toddric_v2_merged/
├─ config.json
├─ generation_config.json
├─ tokenizer_config.json
├─ tokenizer.json        (or tokenizer.model)
├─ model.safetensors     (or shards model-00001-of-0000N.safetensors)
└─ README.md
```


Quickstart (Transformers)

4-bit (single GPU dev)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_dir = "toddie314/toddric_v2_merged"  # or a local path

bnb = BitsAndBytesConfig(load_in_4bit=True)
tok = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    quantization_config=bnb,
    device_map="auto",
)

system = "You are toddric: crisp, witty, encouraging. Prefer concrete advice over fluff."
user = "Give three tactics to make technical docs clearer."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.12,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
Greedy “format-strict” tasks often work best with:

```python
do_sample=False, max_new_tokens=64
```

bf16/fp16 (server inference)

Use vLLM/TGI with 24-32 GB+ VRAM for maximum throughput. Quantization support varies by version.

Why “merged”?
No PEFT adapters at runtime.

Simpler deployment (vLLM/TGI/Transformers).

One folder, one artifact.

To re-merge future adapters, call peft_model.merge_and_unload() or a helper like merge_lora.py.

Prompting patterns (baked-in habits)
Two-line strict style drill

```text
Return EXACTLY a fenced code block with two lines.
Line 1 must begin with 'Tone:' and give a short tip (<=12 words).
Line 2 must begin with 'Style:' and give a short tip (<=12 words).
Use plain text. Include 'narrative', 'voice', and 'prose' across the two lines.
No extra text before/after.
```
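A small checker makes this drill easy to gate automatically. The sketch below is mine, not an official eval script; the limits (two lines, `Tone:`/`Style:` prefixes, 12-word tips, three required keywords) mirror the prompt above:

```python
import re

def check_two_line_drill(output: str) -> bool:
    """Validate the two-line strict style drill described above."""
    # Pull the contents of the first fenced code block, if any.
    m = re.search(r"```(?:\w+)?\n(.*?)```", output, re.DOTALL)
    if not m:
        return False
    lines = [l for l in m.group(1).splitlines() if l.strip()]
    if len(lines) != 2:
        return False
    # Line 1 starts with 'Tone:', line 2 with 'Style:', each tip <= 12 words.
    if not (lines[0].startswith("Tone:") and lines[1].startswith("Style:")):
        return False
    if any(len(l.split(":", 1)[1].split()) > 12 for l in lines):
        return False
    # The three required keywords must appear somewhere across both lines.
    text = " ".join(lines).lower()
    return all(w in text for w in ("narrative", "voice", "prose"))
```

Run it with `do_sample=False` outputs to get a deterministic pass/fail signal.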
Safety refusal (medical dosing)
Brief refusal + helpful redirect (doctor/urgent care/emergency line). No first-person, no apologies, 2-4 sentences.
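A heuristic gate for this pattern (a minimal sketch, assuming the 2-4 sentence rule above; the function name and word lists are mine):

```python
import re

def check_refusal_style(reply: str) -> bool:
    """Heuristic check for the refusal pattern: 2-4 sentences,
    no first-person pronouns, no apologies, plus a redirect."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
    if not 2 <= len(sentences) <= 4:
        return False
    lowered = reply.lower()
    # No first-person voice, no apologies.
    if re.search(r"\b(i|i'm|my|me)\b", lowered):
        return False
    if "sorry" in lowered or "apolog" in lowered:
        return False
    # Must redirect to a professional or emergency resource.
    return any(w in lowered for w in ("doctor", "urgent care", "emergency"))
```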

JSON-only tool output
Output exactly one JSON object. No prose/markdown/questions.
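One way to enforce this downstream is to accept the reply only if the whole string parses as a single object (a minimal sketch; the helper name is mine, the strictness rule comes from the prompt):

```python
import json

def parse_json_only(output: str) -> dict:
    """Accept a reply only if it is exactly one JSON object, no prose."""
    text = output.strip()
    # Reject markdown fences or any text around the object.
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("reply is not a bare JSON object")
    obj = json.loads(text)  # raises on malformed or trailing JSON
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be an object")
    return obj
```

On failure you can retry with greedy decoding, which tends to be more format-stable.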

Hardware & env notes
4-bit runs on ~16 GB consumer GPUs with device_map="auto".

Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation.
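The allocator reads this variable once at CUDA initialization, so it must be set in the shell or before the first `torch` import (a sketch of the in-script variant):

```python
import os

# Must run before the first `import torch`; the CUDA caching allocator
# reads PYTORCH_CUDA_ALLOC_CONF only once, at initialization.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```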

For CPU fallback, load with low_cpu_mem_usage=True (slower, but fine for tests).

Eval snapshot
Style Meter (greedy): passes strict tasks (RAG vs fine-tuning, dosing refusal, two-line truth/misconception, JSON-only SQL gating).

Stratified Eval: sane length distribution; no runaway outputs.

These are smoke tests—bring your own eval for production.

Limitations & safety
Not a medical/legal/financial advisor; should refuse dosing and high-risk instructions and redirect responsibly.

Concise by design; ask explicitly for longer explanations or examples.

License
Base: Meta Llama 3.1 license.

This fine-tuned, merged artifact inherits the base license unless otherwise noted.

Citation
```bibtex
@software{toddric_v2_merged_2025,
  title  = {toddric_v2_merged: a crisp, concrete-advice Llama-3.1-8B},
  author = {toddie314},
  year   = {2025},
  url    = {https://huggingface.co/toddie314/toddric_v2_merged}
}
```
Changelog
v2 (Stage-C merged): DPO merge; strict formatting stabilized; JSON gating improved.

v1: Base + SFT + refinement adapters (pre-merge).