# AI Detector LoRA (DeBERTa-v3-large)
LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.7M English samples (label: `1` = AI, `0` = Human) using `microsoft/deberta-v3-large` as the base model.
- Base model: `microsoft/deberta-v3-large`
- Task: binary classification (AI vs. Human)
- Head: single logit + `BCEWithLogitsLoss` (see the loss sketch after this list)
- Adapter type: LoRA (`peft`)
- Hardware: 8 x RTX 5090, bf16, multi-GPU
- Final decision threshold: 0.8697 (max-F1 on calibration set)
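The released files do not include the training script; the following is a minimal sketch of how a single-logit head can be trained with `BCEWithLogitsLoss` via a `Trainer` override. The `BCETrainer` class and the override itself are illustrative assumptions, not the actual training code.

```python
import torch
from transformers import Trainer

class BCETrainer(Trainer):
    """Illustrative Trainer subclass: one output logit trained with BCEWithLogitsLoss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")        # 0 = Human, 1 = AI
        outputs = model(**inputs)
        logits = outputs.logits.squeeze(-1)  # shape (batch,): single logit per sample
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, labels.float()
        )
        return (loss, outputs) if return_outputs else loss
```

A custom loss like this is needed because `AutoModelForSequenceClassification` with `num_labels=1` would otherwise default to a regression (MSE) objective.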
## Files in this repo
- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels (see the re-scoring snippet after this list)
- `figures/` – training and evaluation plots
- `README.md` – this file
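To sanity-check the reported numbers, the prediction CSVs can be re-scored locally. The column names used below (`prob`, `label`) are assumptions; check the actual file header before running.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score, f1_score

# Assumed columns: "prob" = predicted P(AI), "label" = 1 for AI, 0 for Human.
df = pd.read_csv("predictions_test.csv")

auroc = roc_auc_score(df["label"], df["prob"])
f1 = f1_score(df["label"], (df["prob"] >= 0.8697).astype(int))
print(f"AUROC: {auroc:.4f}  F1 @ 0.8697: {f1:.4f}")
```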
## Metrics (test set, n=279,241)
Using threshold 0.8697:
| Metric | Value |
|---|---|
| AUROC | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1 | 0.9812 |
| Accuracy | 0.9814 |
| Precision (AI) | 0.9902 |
| Recall (AI) | 0.9724 |
| Precision (Human) | 0.9728 |
| Recall (Human) | 0.9904 |
Confusion matrix (test):
- True Negatives (Human correctly): 138,276
- False Positives (Human → AI): 1,345
- False Negatives (AI → Human): 3,859
- True Positives (AI correctly): 135,761
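As a quick consistency check, the headline metrics can be recomputed directly from these four counts:

```python
# Confusion-matrix counts reported above (positive class = AI).
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

accuracy = (tp + tn) / (tp + tn + fp + fn)       # ~0.9814
precision_ai = tp / (tp + fp)                    # ~0.9902
recall_ai = tp / (tp + fn)                       # ~0.9724
f1_ai = 2 * precision_ai * recall_ai / (precision_ai + recall_ai)  # ~0.9812
precision_human = tn / (tn + fn)                 # ~0.9728
recall_human = tn / (tn + fp)                    # ~0.9904

print(accuracy, precision_ai, recall_ai, f1_ai, precision_human, recall_human)
```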
## Calibration
- Method: temperature scaling (fitting sketch after this list)
- Temperature (T): 1.4437
- Calibration set: the held-out calibration split (`predictions_calib.csv`)
- Test ECE: 0.0075 → 0.0116 (after calibration)
- Test Brier: 0.0157 → 0.0156 (after calibration)
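The exact fitting code is not included in this repo; a minimal sketch of temperature scaling (fit a single scalar T by minimizing the binary NLL on calibration-set logits, assuming you have them as tensors) looks like this:

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor, steps: int = 200) -> float:
    """Fit a single temperature T by minimizing BCE on held-out calibration logits.

    logits: raw (uncalibrated) single-logit outputs, shape (N,)
    labels: 0/1 targets, shape (N,)
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits / log_t.exp(), labels.float()
        )
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()
```

The fitted T (here 1.4437) is applied at inference as `sigmoid(logits / T)`, as shown in the usage section below.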
## Plots

See `figures/` for:

- Training & validation curves
- Validation-set plots
- Test-set plots
## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
### Inference with threshold

```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```
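For reference, a merged checkpoint like `merged_model/` is typically produced from the adapter with `peft`'s `merge_and_unload()`. The snippet below is a sketch, not the exact export script used here; the output directory name is just an example.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)
peft_model = PeftModel.from_pretrained(base, "./adapter")

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = peft_model.merge_and_unload()
merged.save_pretrained("./merged_model")
AutoTokenizer.from_pretrained("microsoft/deberta-v3-large").save_pretrained("./merged_model")
```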
### Optional: apply temperature scaling to logits

```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```
## Notes
- The classifier head is trained together with the LoRA layers (it is unfrozen after applying PEFT); see the setup sketch after this list.
- LoRA config: `r=32`, `alpha=128`, `dropout=0.0`
  - Target modules: `query_proj`, `key_proj`, `value_proj`
- Training config: `bf16=True`, `optim="adamw_torch_fused"`, `lr_scheduler_type="cosine_with_restarts"`, `num_train_epochs=2`, `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`, `max_grad_norm=0.5`
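The released checkpoint already contains these settings; as a reference, a minimal sketch of how an adapter with this configuration could be set up with `peft` is shown below. The training script is not part of this repo, so treat the details (especially which head parameters get unfrozen) as illustrative assumptions.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
)
model = get_peft_model(base, lora_config)

# Unfreeze the classification head so it trains alongside the LoRA layers
# (module names are an assumption based on DeBERTa-v3's classification head).
for name, param in model.named_parameters():
    if "classifier" in name or "pooler" in name:
        param.requires_grad = True
```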
### Threshold

`0.8697` was chosen as the max-F1 point on the calibration set. You can adjust it if you prefer fewer false positives or fewer false negatives.
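If you want to re-derive or shift the operating point, a common approach is to sweep the precision-recall curve on the calibration predictions and pick the threshold that maximizes F1. The column names below are assumptions (check `predictions_calib.csv`).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# Assumed columns: "prob" = predicted P(AI), "label" = 1 for AI, 0 for Human.
df = pd.read_csv("predictions_calib.csv")
precision, recall, thresholds = precision_recall_curve(df["label"], df["prob"])

# F1 for each candidate threshold (the last precision/recall point has no threshold).
f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
best = np.argmax(f1)
print(f"max-F1 threshold: {thresholds[best]:.4f}  F1: {f1[best]:.4f}")
```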