---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-classification
base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
tags:
- clinical
- healthcare
- emergency-medicine
- OHCA
- PubMedBERT
- MIMIC
- patient-level-split
model-index:
- name: OHCA Classifier v8 (PubMedBERT fine-tuned)
  results:
  - task:
      type: text-classification
      name: Binary OHCA detection (OHCA vs non-OHCA)
    dataset:
      name: Internal (MIMIC-derived discharge notes)
      type: text
      split: test (patient-level)
    metrics:
    - type: recall
      name: Sensitivity (Recall)
      value: 1.000
    - type: specificity
      name: Specificity
      value: 0.879
    - type: precision
      name: PPV (Precision)
      value: 0.562
    - type: npv
      name: NPV
      value: 1.000
    - type: f1
      name: F1-score
      value: 0.720
    - type: auc
      name: ROC-AUC
      value: 0.971
---

# OHCA Classifier v8 — PubMedBERT fine-tuned for cardiac arrest detection

**Author:** Mona Moukaddem  
**Model:** `monajm36/ohca-classifier-v8`  
**Task:** Binary text classification — *Out-of-Hospital Cardiac Arrest (OHCA) vs Non-OHCA*  
**Base model:** `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

This model predicts whether a discharge note likely describes **out-of-hospital cardiac arrest (OHCA)**.  
It was fine-tuned from PubMedBERT on MIMIC-derived discharge notes using **patient-level splits** to prevent leakage.  

> ⚠️ For research and decision support only. Not a substitute for clinical judgment.

---

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "monajm36/ohca-classifier-v8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = """Chief Complaint: cardiac arrest
History of Present Illness: Patient found unresponsive at home... ROSC after EMS CPR..."""

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
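# Softmax turns the two logits into class probabilities.
probs = torch.softmax(logits, dim=-1).squeeze()
print(probs)
# The index-to-label mapping is stored in the model config.
print(model.config.id2label)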
```

## Threshold recommendations

| Clinical Goal | Threshold | Behavior |
|---|---|---|
| High Sensitivity | 0.28–0.32 | Captures nearly all OHCA cases |
| Balanced | 0.36 | Validation-optimized |
| High Precision | ≥0.50 | Fewer false positives |

At a threshold of 0.36, validation yielded:

- Sensitivity (Recall): 1.000
- Specificity: 0.879
- AUC: 0.971
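
The thresholds above are applied to the predicted OHCA probability. The following is a minimal sketch of that decision step, not the author's code. It assumes class index 1 is the OHCA class, which this card does not state; confirm it with `model.config.id2label`. The names `THRESHOLDS`, `ohca_probability`, and `classify` are hypothetical, and the lower bound 0.28 is used for the high-sensitivity range.

```python
import math

# Thresholds taken from the table above; the dictionary keys are hypothetical.
THRESHOLDS = {"high_sensitivity": 0.28, "balanced": 0.36, "high_precision": 0.50}


def ohca_probability(logits, ohca_index=1):
    """Softmax over the two class logits; return the probability at ohca_index.

    ASSUMPTION: index 1 is the OHCA class. Verify with model.config.id2label.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[ohca_index] / sum(exps)


def classify(prob_ohca, goal="balanced"):
    """Return True when the OHCA probability meets the threshold for the goal."""
    return prob_ohca >= THRESHOLDS[goal]


# Made-up logits for illustration only.
p = ohca_probability([0.4, 0.1])
print(round(p, 3), classify(p, "high_sensitivity"), classify(p, "high_precision"))
```

With the usage example earlier in this card, the logits can be passed in as `logits.squeeze().tolist()`.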

## Data & Training Summary

- Source: MIMIC-derived discharge notes
- Sections used: Chief Complaint, History of Present Illness
- Splits: Train 210, Val 54, Test 66 (patient-level)
- Max length: 512 tokens
- Epochs: 5
- Loss: Weighted cross-entropy
- Sampler: Class-balanced
- Hardware: CPU
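
A patient-level split means every note from a given patient lands in exactly one of train, validation, or test, so the model is never evaluated on a patient it saw during training. The sketch below illustrates the idea on synthetic data. It is not the author's pipeline: the `patient_id` key, the function name `patient_level_split`, the split fractions, and the seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def patient_level_split(notes, fractions=(0.64, 0.16, 0.20), seed=0):
    """Split notes into train/val/test so that no patient spans two splits.

    `notes` is a list of dicts with a hypothetical "patient_id" key.
    The fractions are illustrative, not the ones used for this model.
    """
    by_patient = defaultdict(list)
    for note in notes:
        by_patient[note["patient_id"]].append(note)

    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each group of patient IDs back into that group's notes.
    return {name: [n for pid in ids for n in by_patient[pid]]
            for name, ids in groups.items()}


# Toy data: 10 patients with 2 notes each (synthetic, not from MIMIC).
notes = [{"patient_id": p, "text": f"note {i}"} for p in range(10) for i in range(2)]
splits = patient_level_split(notes)
ids = {name: {n["patient_id"] for n in part} for name, part in splits.items()}
print({name: len(part) for name, part in splits.items()})
```

Shuffling patient IDs rather than notes is what prevents leakage: the split sizes in notes can be uneven, but the patient sets are always disjoint.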

## Evaluation (Test Set)

| | Pred Non-OHCA | Pred OHCA |
|---|---|---|
| Actual Non-OHCA | 51 | 7 |
| Actual OHCA | 0 | 9 |

Metrics:

- Recall: 1.000
- Specificity: 0.879
- Precision: 0.562
- NPV: 1.000
- F1-score: 0.720
- AUC: 0.971

Interpretation: The model captured all OHCA cases at the chosen threshold, with 7 false positives.
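
The threshold-dependent metrics follow directly from the four confusion-matrix counts (TN = 51, FP = 7, FN = 0, TP = 9). As a quick check, here is a short sketch that recomputes them; the function name `binary_metrics` is hypothetical, and it does not guard against zero denominators. ROC-AUC is not included because it needs the per-note scores, not just the counts.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


# Counts from the test-set confusion matrix reported in this card.
m = binary_metrics(tn=51, fp=7, fn=0, tp=9)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```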

## License

MIT

## Citation

M. Moukaddem. OHCA Classifier v8: PubMedBERT fine-tuned for Out-of-Hospital Cardiac Arrest