wildjailbreak-linear-head-narrowing-L4-lr1e-06

A simple MLP head for 4-class classification over last-token hidden-state vectors from LLaMA-2.

  • Architecture: narrowing, layers=4
  • Input dim: 5120
  • Output classes: 4
  • LR: 1e-06
  • Metrics (test): F1(macro)=0.969547, Acc=0.963565
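
The architecture listed above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: the card only states "narrowing, layers=4", so the halving schedule for the hidden widths, the ReLU activation, and the helper name `build_narrowing_head` are all assumptions. The input dimension of 5120 happens to match the hidden size of LLaMA-2 13B.

```python
import torch
from torch import nn


def build_narrowing_head(input_dim=5120, num_classes=4, num_layers=4):
    """Build an MLP whose layer widths shrink toward the output.

    ASSUMPTION: each hidden layer halves the width and uses ReLU.
    The model card only says "narrowing, layers=4"; the real widths,
    activation, and any dropout or normalization may differ.
    """
    # For the defaults this gives [5120, 2560, 1280, 640].
    widths = [input_dim // (2 ** i) for i in range(num_layers)]
    layers = []
    # Hidden linear layers: 5120 -> 2560 -> 1280 -> 640.
    for in_dim, out_dim in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
    # Final linear layer maps the narrowest width to class logits.
    layers.append(nn.Linear(widths[-1], num_classes))
    return nn.Sequential(*layers)
```

With the defaults this yields four linear layers in total, mapping a batch of shape `(N, 5120)` to logits of shape `(N, 4)`.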

Usage

See the example code in this repo card, or the snippet provided in the accompanying notebook, to load the head and run inference.
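
As a rough guide to what inference involves, the sketch below shows one way to take last-token vectors from a batch of hidden states and pass them through a classification head. The function names `last_token_vectors` and `predict_classes` are hypothetical, and right-padded inputs are assumed; how the hidden states are produced (which LLaMA-2 checkpoint, which layer) and how the head weights are stored in this repo are not stated on the card, so they are left out.

```python
import torch


def last_token_vectors(hidden_states, attention_mask):
    """Select the hidden state of the last non-padding token per sequence.

    hidden_states:  (batch, seq_len, hidden_dim) float tensor
    attention_mask: (batch, seq_len) tensor, 1 for real tokens, 0 for padding
    ASSUMPTION: sequences are right-padded, so the last real token sits at
    index (number of real tokens - 1).
    """
    last_idx = attention_mask.long().sum(dim=1) - 1
    batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
    return hidden_states[batch_idx, last_idx]


@torch.no_grad()
def predict_classes(head, vectors):
    """Run a classification head and return the argmax class index per row."""
    head.eval()
    logits = head(vectors)
    return logits.argmax(dim=-1)
```

Here `head` can be any `torch.nn.Module` that maps vectors of the input dimension to four logits; the mapping from class index to label name is not given on the card.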

Dataset used to train CatBarks/wildjailbreak-linear-head-narrowing-L4-lr1e-06

Evaluation results

  • f1_macro on wildjailbreak (LLaMA-2 last-token vectors)
    self-reported
    0.970
  • accuracy on wildjailbreak (LLaMA-2 last-token vectors)
    self-reported
    0.964