---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---
# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + MedCLIP/Swin)
This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:
- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention, Grad-CAM, and Integrated Gradients (IG)
> Developed as a final project at HCMUS.
---
## Model Architecture
- **Image Encoder:** Swin Transformer (pretrained) / MedCLIP (pretrained)
- **Text Encoder (Base):** ClinicalBERT
- **Fusion Module:** Cross-modal attention with hybrid FFN layers
- **Losses:** BCE + Focal Loss for multi-label classification
Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
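The cross-modal fusion and shared-space projection described above can be sketched as follows. This is a minimal NumPy illustration with hypothetical dimensions and random stand-in weights, not the released implementation: the projection matrices, token counts, and pooling choice are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_JOINT = 768, 768, 256   # hypothetical dimensions

# Learned projection heads (random stand-ins here) map each modality
# into the shared joint space used for retrieval.
W_img = rng.normal(0, 0.02, (D_IMG, D_JOINT))
W_txt = rng.normal(0, 0.02, (D_TXT, D_JOINT))

img_tokens = rng.normal(size=(49, D_IMG))    # e.g. Swin patch features
txt_tokens = rng.normal(size=(128, D_TXT))   # e.g. ClinicalBERT token features

q = img_tokens @ W_img                       # queries from the image
k = txt_tokens @ W_txt                       # keys from the report
attn = softmax(q @ k.T / np.sqrt(D_JOINT))   # cross-modal attention weights
fused = attn @ (txt_tokens @ W_txt)          # text evidence routed to image tokens

# Pooled, L2-normalised joint embedding for retrieval.
joint = fused.mean(axis=0)
joint /= np.linalg.norm(joint)
print(joint.shape)  # (256,)
```

Normalising the pooled embedding makes dot-product retrieval equivalent to cosine similarity in the joint space.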
---
## Training Data
- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
- Chest X-ray DICOMs
- Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
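Deriving multi-label targets from a report's MeSH headings can be sketched with the standard library alone. The tag names below follow the public Open-i report format but should be treated as assumptions, and the three-class vocabulary is a toy stand-in for the model's 22 MeSH-derived labels:

```python
import xml.etree.ElementTree as ET

# Hypothetical label vocabulary; the real model uses 22 MeSH-derived classes.
LABELS = ["Cardiomegaly", "COPD", "Pleural Effusion"]

SAMPLE = """<eCitation>
  <MeSH>
    <major>Cardiomegaly/severe</major>
    <major>Pleural Effusion</major>
  </MeSH>
  <abstract>
    <AbstractText Label="FINDINGS">Enlarged cardiac silhouette.</AbstractText>
  </abstract>
</eCitation>"""

def parse_report(xml_text, labels=LABELS):
    root = ET.fromstring(xml_text)
    # MeSH majors may carry qualifiers after '/'; keep only the heading.
    majors = {m.text.split("/")[0].strip() for m in root.iter("major")}
    findings = " ".join(t.text or "" for t in root.iter("AbstractText"))
    multi_hot = [1 if lbl in majors else 0 for lbl in labels]
    return findings, multi_hot

text, y = parse_report(SAMPLE)
print(y)  # [1, 0, 1]
```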
---
## Intended Uses
* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
## Model Performance
### Classification
The model was evaluated on a held-out **evaluation set** and a **separate test set** across 22 disease labels. The best results are obtained with the **MedCLIP** encoder configuration. Reported metrics are macro-averaged **F1-score**, **AUROC**, and **average precision (AP)**.
| Metric | Eval Set (Macro Avg) | Test Set (Macro Avg) |
|--------|----------------------|----------------------|
| F1-score | **0.7967** | **0.8974** |
| AUROC | **0.9664** | **0.7372** |
| AP | **0.9138** | **0.7648** |
*The model achieves strong label-level performance, particularly on common findings such as COPD, Cardiomegaly, and Musculoskeletal degenerative diseases. The MedCLIP configuration significantly improves overall performance.*
---
### Retrieval Performance
Retrieval was evaluated under two protocols: **Generalization** (test queries retrieving from the test set) and **Historical** (test queries retrieving from the training set).
| Protocol | P@5 | mAP | MRR | DCG@5 | Avg Time (ms) |
|----------|-----|-----|-----|-------|---------------|
| Generalization (test → test) | 0.7463 | 0.0068 | 0.848 | 0.9381 | 0.77 |
| Historical (test → train) | 0.9173 | 0.0010 | 0.881 | 0.9503 | 0.58 |
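The ranking metrics in the table can be computed as below. This is a generic sketch over a hypothetical binary relevance list (1 if a retrieved neighbour shares a label with the query), not the project's evaluation code:

```python
import math

def precision_at_k(rel, k=5):
    # Fraction of the top-k retrieved cases that are relevant.
    return sum(rel[:k]) / k

def mrr(rel):
    # Reciprocal rank of the first relevant result.
    for i, r in enumerate(rel, 1):
        if r:
            return 1.0 / i
    return 0.0

def dcg_at_k(rel, k=5):
    # Discounted cumulative gain over the top-k results.
    return sum(r / math.log2(i + 1) for i, r in enumerate(rel[:k], 1))

def average_precision(rel):
    # Mean of precision values at each relevant rank.
    hits, score = 0, 0.0
    for i, r in enumerate(rel, 1):
        if r:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

# rel[i] = 1 if the i-th ranked neighbour shares a label with the query.
rel = [1, 1, 0, 1, 0, 0, 1]
print(precision_at_k(rel), mrr(rel), round(dcg_at_k(rel), 4))
# 0.6 1.0 2.0616
```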
---
### Explainability Performance
Attribution metrics confirm high visual fidelity, ensuring the model's attention aligns with clinically relevant image regions.
| Metric | Value | Interpretation |
|--------|------|-----------------|
| Pearson correlation ($\rho$) | **0.9163** | High linear agreement across attribution maps |
| IoU@0.05 | **0.5762** | Moderate overlap of the top 5% most salient regions |
| IoU@0.20 | **0.2519** | Moderate overlap across the broader top 20% of salient regions |
*The model retrieves diverse and relevant cases, enabling multimodal explanation and case-based reasoning for clinical education.*
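The attribution-agreement metrics above can be sketched as follows. This is a generic NumPy illustration over random maps, not the project's evaluation code; map sizes and the perturbation are assumptions:

```python
import numpy as np

def iou_at_k(map_a, map_b, frac=0.05):
    """Overlap (intersection over union) of the top-`frac` most salient
    pixels of two attribution maps."""
    k = max(1, int(frac * map_a.size))
    top_a = set(np.argsort(map_a.ravel())[-k:])
    top_b = set(np.argsort(map_b.ravel())[-k:])
    return len(top_a & top_b) / len(top_a | top_b)

def pearson(map_a, map_b):
    # Linear agreement between two flattened attribution maps.
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])

rng = np.random.default_rng(0)
base = rng.random((32, 32))                  # e.g. a Grad-CAM map
noisy = base + 0.05 * rng.random((32, 32))   # e.g. an IG map of the same case

print(round(iou_at_k(base, noisy, 0.05), 2), round(pearson(base, noisy), 2))
```

Comparing two attribution methods this way (e.g. Grad-CAM vs. IG) measures whether they agree on which image regions carry the disease evidence.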
---
### Notes
- Retrieval and diversity metrics highlight the model’s ability to surface multiple relevant cases per query.
- Lower performance on some rare labels may reflect dataset imbalance in Open-i.
---
## Limitations & Risks
* Trained on a public dataset (Open-i) — may not generalize to other hospitals
* Explanations are not clinically validated
* Not for diagnostic use in real-world settings
---
## Acknowledgments
* [NIH Open-i Dataset](https://openi.nlm.nih.gov/faq#collection)
* [DOID](http://purl.obolibrary.org/obo/doid.obo)
* [RADLEX](https://bioportal.bioontology.org/ontologies/RADLEX)
* Swin Transformer (Timm)
* ClinicalBERT (Emily Alsentzer)
* MedCLIP (Zifeng Wang et al., EMNLP 2022)
* Captum (for IG explanations)
* Grad-CAM
## Code
[GitHub](https://github.com/ppddddpp/unified-multimodal-chestxray)