# AI Detector LoRA (DeBERTa-v3-large)
LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.7M English samples (label: `1` = AI, `0` = Human) using `microsoft/deberta-v3-large` as the base model.
- Base model: `microsoft/deberta-v3-large`
- Task: binary classification (AI vs. Human)
- Head: single logit + `BCEWithLogitsLoss` (see the loss sketch after this list)
- Adapter type: LoRA (`peft`)
- Hardware: 8 x RTX 5090, bf16, multi-GPU
- Final decision threshold: 0.8697 (max-F1 on calibration set)
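The released files do not include the training script; the following is a minimal sketch of how a single-logit head can be trained with `BCEWithLogitsLoss` via a `Trainer` override. The `BCETrainer` class and the override itself are illustrative assumptions, not the actual training code.

```python
import torch
from transformers import Trainer

class BCETrainer(Trainer):
    """Illustrative Trainer subclass: one output logit trained with BCEWithLogitsLoss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")        # 0 = Human, 1 = AI
        outputs = model(**inputs)
        logits = outputs.logits.squeeze(-1)  # shape (batch,): single logit per sample
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, labels.float()
        )
        return (loss, outputs) if return_outputs else loss
```

A custom loss like this is needed because `AutoModelForSequenceClassification` with `num_labels=1` would otherwise default to a regression (MSE) objective.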
## Files in this repo
- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels (see the re-scoring snippet after this list)
- `figures/` – training and evaluation plots
- `README.md` – this file
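To sanity-check the reported numbers, the prediction CSVs can be re-scored locally. The column names used below (`prob`, `label`) are assumptions; check the actual file header before running.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score, f1_score

# Assumed columns: "prob" = predicted P(AI), "label" = 1 for AI, 0 for Human.
df = pd.read_csv("predictions_test.csv")

auroc = roc_auc_score(df["label"], df["prob"])
f1 = f1_score(df["label"], (df["prob"] >= 0.8697).astype(int))
print(f"AUROC: {auroc:.4f}  F1 @ 0.8697: {f1:.4f}")
```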
## Metrics (test set, n=279,241)
Using threshold 0.8697:
| Metric | Value |
|---|---|
| AUROC | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1 | 0.9812 |
| Accuracy | 0.9814 |
| Precision (AI) | 0.9902 |
| Recall (AI) | 0.9724 |
| Precision (Human) | 0.9728 |
| Recall (Human) | 0.9904 |
Confusion matrix (test):
- True Negatives (Human correctly): 138,276
- False Positives (Human → AI): 1,345
- False Negatives (AI → Human): 3,859
- True Positives (AI correctly): 135,761
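As a quick consistency check, the headline metrics can be recomputed directly from these four counts:

```python
# Confusion-matrix counts reported above (positive class = AI).
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

accuracy = (tp + tn) / (tp + tn + fp + fn)       # ~0.9814
precision_ai = tp / (tp + fp)                    # ~0.9902
recall_ai = tp / (tp + fn)                       # ~0.9724
f1_ai = 2 * precision_ai * recall_ai / (precision_ai + recall_ai)  # ~0.9812
precision_human = tn / (tn + fn)                 # ~0.9728
recall_human = tn / (tn + fp)                    # ~0.9904

print(accuracy, precision_ai, recall_ai, f1_ai, precision_human, recall_human)
```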
## Calibration
- Method: temperature scaling (fitting sketch after this list)
- Temperature (T): 1.4437
- Calibration set: the held-out calibration split (`predictions_calib.csv`)
- Test ECE: 0.0075 → 0.0116 (after calibration)
- Test Brier: 0.0157 → 0.0156 (after calibration)
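The exact fitting code is not included in this repo; a minimal sketch of temperature scaling (fit a single scalar T by minimizing the binary NLL on calibration-set logits, assuming you have them as tensors) looks like this:

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor, steps: int = 200) -> float:
    """Fit a single temperature T by minimizing BCE on held-out calibration logits.

    logits: raw (uncalibrated) single-logit outputs, shape (N,)
    labels: 0/1 targets, shape (N,)
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits / log_t.exp(), labels.float()
        )
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()
```

The fitted T (here 1.4437) is applied at inference as `sigmoid(logits / T)`, as shown in the usage section below.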
## Plots

See `figures/` for:

- Training & validation curves
- Validation-set plots
- Test-set plots
## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
### Inference with threshold

```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```
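For reference, a merged checkpoint like `merged_model/` is typically produced from the adapter with `peft`'s `merge_and_unload()`. The snippet below is a sketch, not the exact export script used here; the output directory name is just an example.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)
peft_model = PeftModel.from_pretrained(base, "./adapter")

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = peft_model.merge_and_unload()
merged.save_pretrained("./merged_model")
AutoTokenizer.from_pretrained("microsoft/deberta-v3-large").save_pretrained("./merged_model")
```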
### Optional: apply temperature scaling to logits

```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```
## Notes
- The classifier head is trained together with the LoRA layers (it is unfrozen after applying PEFT); see the setup sketch after this list.
- LoRA config: `r=32`, `alpha=128`, `dropout=0.0`
  - Target modules: `query_proj`, `key_proj`, `value_proj`
- Training config: `bf16=True`, `optim="adamw_torch_fused"`, `lr_scheduler_type="cosine_with_restarts"`, `num_train_epochs=2`, `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`, `max_grad_norm=0.5`
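The released checkpoint already contains these settings; as a reference, a minimal sketch of how an adapter with this configuration could be set up with `peft` is shown below. The training script is not part of this repo, so treat the details (especially which head parameters get unfrozen) as illustrative assumptions.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
)
model = get_peft_model(base, lora_config)

# Unfreeze the classification head so it trains alongside the LoRA layers
# (module names are an assumption based on DeBERTa-v3's classification head).
for name, param in model.named_parameters():
    if "classifier" in name or "pooler" in name:
        param.requires_grad = True
```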
### Threshold

`0.8697` was chosen as the max-F1 point on the calibration set. You can adjust it if you prefer fewer false positives or fewer false negatives.
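If you want to re-derive or shift the operating point, a common approach is to sweep the precision-recall curve on the calibration predictions and pick the threshold that maximizes F1. The column names below are assumptions (check `predictions_calib.csv`).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# Assumed columns: "prob" = predicted P(AI), "label" = 1 for AI, 0 for Human.
df = pd.read_csv("predictions_calib.csv")
precision, recall, thresholds = precision_recall_curve(df["label"], df["prob"])

# F1 for each candidate threshold (the last precision/recall point has no threshold).
f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
best = np.argmax(f1)
print(f"max-F1 threshold: {thresholds[best]:.4f}  F1: {f1[best]:.4f}")
```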