---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---
# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + MedCLIP/Swin)
This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:
- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention, Grad-CAM, and Integrated Gradients (IG)
> Developed as a final project at HCMUS.
---
## Model Architecture
- **Image Encoder:** Swin Transformer (pretrained) / MedCLIP (pretrained)
- **Text Encoder (Base):** ClinicalBERT
- **Fusion Module:** Cross-modal attention with hybrid FFN layers
- **Losses:** BCE + Focal Loss for multi-label classification
Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
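The cross-modal fusion and shared-space projection described above can be sketched as follows. This is a minimal NumPy illustration with hypothetical dimensions and random stand-in weights, not the released implementation: the projection matrices, token counts, and pooling choice are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_JOINT = 768, 768, 256   # hypothetical dimensions

# Learned projection heads (random stand-ins here) map each modality
# into the shared joint space used for retrieval.
W_img = rng.normal(0, 0.02, (D_IMG, D_JOINT))
W_txt = rng.normal(0, 0.02, (D_TXT, D_JOINT))

img_tokens = rng.normal(size=(49, D_IMG))    # e.g. Swin patch features
txt_tokens = rng.normal(size=(128, D_TXT))   # e.g. ClinicalBERT token features

q = img_tokens @ W_img                       # queries from the image
k = txt_tokens @ W_txt                       # keys from the report
attn = softmax(q @ k.T / np.sqrt(D_JOINT))   # cross-modal attention weights
fused = attn @ (txt_tokens @ W_txt)          # text evidence routed to image tokens

# Pooled, L2-normalised joint embedding for retrieval.
joint = fused.mean(axis=0)
joint /= np.linalg.norm(joint)
print(joint.shape)  # (256,)
```

Normalising the pooled embedding makes dot-product retrieval equivalent to cosine similarity in the joint space.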
---
## Training Data
- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
- Chest X-ray DICOMs
- Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
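Deriving multi-label targets from a report's MeSH headings can be sketched with the standard library alone. The tag names below follow the public Open-i report format but should be treated as assumptions, and the three-class vocabulary is a toy stand-in for the model's 22 MeSH-derived labels:

```python
import xml.etree.ElementTree as ET

# Hypothetical label vocabulary; the real model uses 22 MeSH-derived classes.
LABELS = ["Cardiomegaly", "COPD", "Pleural Effusion"]

SAMPLE = """<eCitation>
  <MeSH>
    <major>Cardiomegaly/severe</major>
    <major>Pleural Effusion</major>
  </MeSH>
  <abstract>
    <AbstractText Label="FINDINGS">Enlarged cardiac silhouette.</AbstractText>
  </abstract>
</eCitation>"""

def parse_report(xml_text, labels=LABELS):
    root = ET.fromstring(xml_text)
    # MeSH majors may carry qualifiers after '/'; keep only the heading.
    majors = {m.text.split("/")[0].strip() for m in root.iter("major")}
    findings = " ".join(t.text or "" for t in root.iter("AbstractText"))
    multi_hot = [1 if lbl in majors else 0 for lbl in labels]
    return findings, multi_hot

text, y = parse_report(SAMPLE)
print(y)  # [1, 0, 1]
```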
---
## Intended Uses
* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
## Model Performance
### Classification
The model was evaluated on a held-out **evaluation set** and a **separate test set** across 22 disease labels. The best results are obtained with the **MedCLIP** encoder configuration. Reported metrics are macro-averaged **F1-score**, **AUROC**, and **average precision (AP)**.
| Metric | Eval Set (Macro Avg) | Test Set (Macro Avg) |
|--------|----------------------|----------------------|
| F1-score | **0.7967** | **0.8974** |
| AUROC | **0.9664** | **0.7372** |
| AP | **0.9138** | **0.7648** |
*The model achieves strong label-level performance, particularly on common findings such as COPD, Cardiomegaly, and Musculoskeletal degenerative diseases. The MedCLIP configuration significantly improves overall performance.*
---
### Retrieval Performance
Retrieval was evaluated under two protocols: **Generalization** (test queries retrieving from the test set) and **Historical** (test queries retrieving from the training set).
| Protocol | P@5 | mAP | MRR | DCG@5 | Avg Time (ms) |
|----------|-----|-----|-----|-------|---------------|
| Generalization (test → test) | 0.7463 | 0.0068 | 0.848 | 0.9381 | 0.77 |
| Historical (test → train) | 0.9173 | 0.0010 | 0.881 | 0.9503 | 0.58 |
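The ranking metrics in the table can be computed as below. This is a generic sketch over a hypothetical binary relevance list (1 if a retrieved neighbour shares a label with the query), not the project's evaluation code:

```python
import math

def precision_at_k(rel, k=5):
    # Fraction of the top-k retrieved cases that are relevant.
    return sum(rel[:k]) / k

def mrr(rel):
    # Reciprocal rank of the first relevant result.
    for i, r in enumerate(rel, 1):
        if r:
            return 1.0 / i
    return 0.0

def dcg_at_k(rel, k=5):
    # Discounted cumulative gain over the top-k results.
    return sum(r / math.log2(i + 1) for i, r in enumerate(rel[:k], 1))

def average_precision(rel):
    # Mean of precision values at each relevant rank.
    hits, score = 0, 0.0
    for i, r in enumerate(rel, 1):
        if r:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

# rel[i] = 1 if the i-th ranked neighbour shares a label with the query.
rel = [1, 1, 0, 1, 0, 0, 1]
print(precision_at_k(rel), mrr(rel), round(dcg_at_k(rel), 4))
# 0.6 1.0 2.0616
```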
---
### Explainability Performance
Attribution metrics confirm high visual fidelity, ensuring the model's attention aligns with clinically relevant image regions.
| Metric | Value | Interpretation |
|--------|------|-----------------|
| Pearson correlation ($\rho$) | **0.9163** | High linear agreement across attribution maps |
| IoU@0.05 | **0.5762** | Moderate overlap of the top 5% most salient regions |
| IoU@0.20 | **0.2519** | Moderate overlap across the broader top 20% of salient regions |
*The model retrieves diverse and relevant cases, enabling multimodal explanation and case-based reasoning for clinical education.*
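The attribution-agreement metrics above can be sketched as follows. This is a generic NumPy illustration over random maps, not the project's evaluation code; map sizes and the perturbation are assumptions:

```python
import numpy as np

def iou_at_k(map_a, map_b, frac=0.05):
    """Overlap (intersection over union) of the top-`frac` most salient
    pixels of two attribution maps."""
    k = max(1, int(frac * map_a.size))
    top_a = set(np.argsort(map_a.ravel())[-k:])
    top_b = set(np.argsort(map_b.ravel())[-k:])
    return len(top_a & top_b) / len(top_a | top_b)

def pearson(map_a, map_b):
    # Linear agreement between two flattened attribution maps.
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])

rng = np.random.default_rng(0)
base = rng.random((32, 32))                  # e.g. a Grad-CAM map
noisy = base + 0.05 * rng.random((32, 32))   # e.g. an IG map of the same case

print(round(iou_at_k(base, noisy, 0.05), 2), round(pearson(base, noisy), 2))
```

Comparing two attribution methods this way (e.g. Grad-CAM vs. IG) measures whether they agree on which image regions carry the disease evidence.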
---
### Notes
- Retrieval and diversity metrics highlight the model’s ability to surface multiple relevant cases per query.
- Lower performance on some rare labels may reflect dataset imbalance in Open-i.
---
## Limitations & Risks
* Trained on a public dataset (Open-i) — may not generalize to other hospitals
* Explanations are not clinically validated
* Not for diagnostic use in real-world settings
---
## Acknowledgments
* [NIH Open-i Dataset](https://openi.nlm.nih.gov/faq#collection)
* [DOID](http://purl.obolibrary.org/obo/doid.obo)
* [RADLEX](https://bioportal.bioontology.org/ontologies/RADLEX)
* Swin Transformer (Timm)
* ClinicalBERT (Emily Alsentzer)
* MedCLIP (Zifeng Wang et al., EMNLP 2022)
* Captum (for IG explanations)
* Grad-CAM
## Code
[GitHub](https://github.com/ppddddpp/unified-multimodal-chestxray)