metadata

license: mit
language:
  - en
library_name: transformers
pipeline_tag: text-classification
base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
tags:
  - clinical
  - healthcare
  - emergency-medicine
  - OHCA
  - PubMedBERT
  - MIMIC
  - patient-level-split
model-index:
  - name: OHCA Classifier v8 (PubMedBERT fine-tuned)
    results:
      - task:
          type: text-classification
          name: Binary OHCA detection (OHCA vs non-OHCA)
        dataset:
          name: Internal (MIMIC-derived discharge notes)
          type: text
          split: test (patient-level)
        metrics:
          - type: recall
            name: Sensitivity (Recall)
            value: 1
          - type: specificity
            name: Specificity
            value: 0.879
          - type: precision
            name: PPV (Precision)
            value: 0.562
          - type: npv
            name: NPV
            value: 1
          - type: f1
            name: F1-score
            value: 0.72
          - type: auc
            name: ROC-AUC
            value: 0.971

OHCA Classifier v8 — PubMedBERT fine-tuned for cardiac arrest detection

Author: Mona Moukaddem
Model: monajm36/ohca-classifier-v8
Task: Binary text classification — Out-of-Hospital Cardiac Arrest (OHCA) vs Non-OHCA
Base model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract

This model predicts whether a discharge note describes an out-of-hospital cardiac arrest (OHCA).
It was fine-tuned from PubMedBERT on MIMIC-derived discharge notes, with the train, validation, and test sets split at the patient level so that no patient's notes appear in more than one split (preventing leakage).

⚠️ For research and decision support only. Not a substitute for clinical judgment.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "monajm36/ohca-classifier-v8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

# Input: the Chief Complaint and History of Present Illness sections of a discharge note
text = """Chief Complaint: cardiac arrest
History of Present Illness: Patient found unresponsive at home... ROSC after EMS CPR..."""

# Inputs longer than 512 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
# Class probabilities; check model.config.id2label to see which index is OHCA
probs = torch.softmax(logits, dim=-1).squeeze()
print(probs)
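The model was trained on the Chief Complaint and History of Present Illness sections only (see Data & Training Summary), so it may help to reduce a full discharge note to those sections before tokenizing. The following is a minimal sketch and not the preprocessing used in training: the helper name `extract_sections` is hypothetical, and it assumes each section starts with a `Header:` label at the beginning of a line.

```python
import re

# Sections the model was trained on, per the Data & Training Summary.
TARGET_SECTIONS = ("Chief Complaint", "History of Present Illness")

# Assumption: a section header is a capitalized phrase at the start of a
# line, immediately followed by a colon (e.g. "Chief Complaint:").
HEADER_RE = re.compile(r"^([A-Z][A-Za-z /]+):", re.MULTILINE)


def extract_sections(note: str, wanted=TARGET_SECTIONS) -> str:
    """Return only the wanted sections of a note, in their original order."""
    wanted_lower = {w.lower() for w in wanted}
    matches = list(HEADER_RE.finditer(note))
    kept = []
    for i, match in enumerate(matches):
        if match.group(1).strip().lower() not in wanted_lower:
            continue
        # A section runs until the next header, or to the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        kept.append(note[match.start():end].strip())
    return "\n".join(kept)
```

Any line in the note body that happens to look like a header (a capitalized phrase followed by a colon) would be treated as a section boundary, so the pattern would need adjusting for real notes.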

Threshold recommendations

| Clinical Goal | Threshold | Behavior |
|---|---|---|
| High Sensitivity | 0.28–0.32 | Captures nearly all OHCA cases |
| Balanced | 0.36 | Validation-optimized |
| High Precision | ≥0.50 | Fewer false positives |

At 0.36, validation yielded:

Sensitivity (Recall): 1.000

Specificity: 0.879

AUC: 0.971
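The thresholds above can be applied to the OHCA probability as in the sketch below. It rests on a few assumptions that are not stated in this card: the OHCA probability is taken from `probs[1]` (check `model.config.id2label`), a note is flagged when its probability is greater than or equal to the threshold, and 0.30 is used as the midpoint of the 0.28–0.32 range. The names `THRESHOLDS` and `flag_ohca` are illustrative.

```python
# Thresholds from the recommendations table. The high-sensitivity value is an
# assumption: the midpoint of the recommended 0.28-0.32 range.
THRESHOLDS = {
    "high_sensitivity": 0.30,
    "balanced": 0.36,
    "high_precision": 0.50,
}


def flag_ohca(p_ohca: float, goal: str = "balanced") -> bool:
    """Return True when the OHCA probability meets the threshold for the goal."""
    return p_ohca >= THRESHOLDS[goal]


# Hypothetical OHCA probabilities for three notes.
for p in (0.31, 0.40, 0.75):
    print(p, {goal: flag_ohca(p, goal) for goal in THRESHOLDS})
```

Lowering the threshold trades more false positives for fewer missed cases, so the choice depends on how costly a missed OHCA case is for the intended use.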

Data & Training Summary
Source: MIMIC-derived discharge notes

Sections used: Chief Complaint, History of Present Illness

Splits: Train 210, Val 54, Test 66 (patient-level)

Max length: 512 tokens

Epochs: 5

Loss: Weighted cross-entropy

Sampler: Class-balanced

Hardware: CPU
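A patient-level split assigns whole patients, not individual notes, to train, validation, and test. The sketch below illustrates the idea and is not the splitting code used for this model: the function name, the `(patient_id, note)` record shape, and the fractions (chosen to roughly match the 210/54/66 split reported above) are assumptions.

```python
import random
from collections import defaultdict


def patient_level_split(records, val_frac=0.16, test_frac=0.20, seed=0):
    """Split (patient_id, note) records so that no patient spans two splits."""
    by_patient = defaultdict(list)
    for patient_id, note in records:
        by_patient[patient_id].append(note)

    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    # Fractions apply to the number of patients, not the number of notes.
    n_test = round(len(patients) * test_frac)
    n_val = round(len(patients) * val_frac)
    groups = {
        "test": patients[:n_test],
        "val": patients[n_test:n_test + n_val],
        "train": patients[n_test + n_val:],
    }
    return {
        name: [(pid, note) for pid in pids for note in by_patient[pid]]
        for name, pids in groups.items()
    }
```

Because the fractions are applied to patients, the note counts per split vary with how many notes each patient has. scikit-learn's `GroupShuffleSplit` offers a similar grouped split if that library is already in use.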

Evaluation (Test Set)

| | Pred Non-OHCA | Pred OHCA |
|---|---|---|
| Actual Non-OHCA | 51 | 7 |
| Actual OHCA | 0 | 9 |

Metrics:

Recall: 1.000

Specificity: 0.879

Precision: 0.562

NPV: 1.000

F1-score: 0.720

AUC: 0.971

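Interpretation: At the chosen threshold the model identified all 9 OHCA cases in the test set (no false negatives) and produced 7 false positives among the 58 non-OHCA notes, which is why precision (0.562) is much lower than recall.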
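All the reported metrics except AUC can be recomputed from the confusion matrix, as the short sketch below shows. AUC needs the per-note probabilities and cannot be derived from these four counts. The four cells sum to 67, while the Data & Training Summary lists 66 test notes; the sketch takes the matrix as given.

```python
# Test-set confusion matrix as reported in the table above.
tn, fp, fn, tp = 51, 7, 0, 9

sensitivity = tp / (tp + fn)   # recall: share of OHCA notes that were flagged
specificity = tn / (tn + fp)   # share of non-OHCA notes correctly left unflagged
precision = tp / (tp + fp)     # PPV: share of flagged notes that are OHCA
npv = tn / (tn + fn)           # share of unflagged notes that are non-OHCA
f1 = 2 * precision * sensitivity / (precision + sensitivity)

for name, value in [
    ("Recall", sensitivity),
    ("Specificity", specificity),
    ("Precision", precision),
    ("NPV", npv),
    ("F1-score", f1),
]:
    print(f"{name}: {value:.3f}")
```

With only 9 positive notes in the test set, each of these estimates has wide uncertainty.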

License
MIT

Citation
M. Moukaddem. OHCA Classifier v8: PubMedBERT fine-tuned for Out-of-Hospital Cardiac Arrest