---
license: cc-by-nd-4.0
language:
- en
base_model:
- timm/resnet50.a1_in1k
pipeline_tag: image-classification
tags:
- breast-cancer
- whole-slide-image
- TCGA-BRCA
- Multiple-Instance-Learning
---

ResNet50-MIL for TCGA Breast Cancer

This model is a ResNet50-based Multiple Instance Learning (MIL) framework designed for detecting breast cancer in whole slide images (WSIs) of surgically removed breast tissue.

🏆 Institutional Achievement

Developed as part of the National HPC Supporting Program by AICA (Gwangju, South Korea) and partly through the National NPU Support Program by NIPA (Daegu, South Korea), through which Elice Group Co., Ltd. (Seoul, South Korea) kindly provided two A100 GPUs for model training. This model represents our commitment to reducing the manual workload of pathologists through high-performance AI.

📊 Model Details

  • Architecture: ResNet50-Backbone with Attention-based MIL Aggregator
  • Training Data: TCGA-BRCA (H&E Stained Slides)
  • Framework: Keras / TensorFlow
  • Target: Detection of breast cancers in surgically removed breast tissue
  • Note: keras_hub utilizes standardized ResNet50 weights (resnet50.a1_in1k) originally released by the timm team. The base_model tag on Hugging Face is used for lineage tracking.
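
The attention-based MIL aggregator listed above can be illustrated with a minimal sketch. It assumes the common tanh-attention pooling form (one score per patch embedding, softmax-normalized across the bag, then a weighted sum); this model card does not state the exact aggregator, so the function name, the parameters `V` and `w`, and the toy dimensions are all hypothetical. Plain Python is used for readability, whereas the released model is built with Keras / TensorFlow.

```python
import math

def attention_mil_pool(instances, V, w):
    """Pool a bag of patch embeddings into one slide-level embedding.

    instances: list of N embeddings, each a list of D floats
    V:         H x D projection matrix (hypothetical learned parameter)
    w:         length-H scoring vector (hypothetical learned parameter)
    Returns (bag_embedding, attention_weights).
    """
    # One unnormalized attention score per patch: w . tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))

    # Numerically stable softmax over the patches in the bag
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the patch embeddings
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

# Toy bag: 3 patches with 2-dimensional embeddings (illustrative values only)
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, -0.5], [0.25, 0.75]]
w = [1.0, 1.0]
bag_embedding, attention = attention_mil_pool(patches, V, w)
```

The attention weights sum to 1 across the bag, so they can also be read as a per-patch relevance map over the slide.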

Model Reproducibility

The implementation, including feature extraction and MIL training, is provided as an interactive Jupyter Notebook in our GitHub repository. This allows researchers to step through the pipeline cell-by-cell.

📁 Dataset & Data Availability

The model was trained on a curated version of the TCGA-BRCA dataset, tiled into patches at 10x magnification.

Dataset Components:

  • Patches: Extracted at 10.0x magnification (for morphological features).
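
Extracting patches at 10.0x from a slide scanned at a higher objective power amounts to reading a larger region at full resolution and downsampling it. A minimal sketch of that arithmetic follows; the 40x and 20x scan magnifications and the 256-pixel patch size are illustrative assumptions, not values stated in this model card, and the function name is hypothetical.

```python
def level0_read_size(base_magnification, target_magnification=10.0, patch_size=256):
    """Return (downsample_factor, level-0 region edge in pixels) for one patch."""
    if target_magnification > base_magnification:
        raise ValueError("target magnification exceeds the scan magnification")
    downsample = base_magnification / target_magnification
    # A patch_size x patch_size patch at the target magnification covers
    # patch_size * downsample pixels per edge at full (level-0) resolution.
    return downsample, int(round(patch_size * downsample))

factor_40x, edge_40x = level0_read_size(40.0)  # slide scanned at 40x (assumed)
factor_20x, edge_20x = level0_read_size(20.0)  # slide scanned at 20x (assumed)
```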

Access:

Due to the significant storage size and ongoing curation for commercial spin-off readiness, the processed dataset is not publicly hosted at this time.

  • Academic Researchers: Available upon reasonable request for validation purposes.
  • Inquiries: Please contact [dskim@btrust.co.kr] for data access requests.

📊 Dataset Pipeline

We provide the full pipeline for converting the original TCGA-BRCA .svs images into the TFRecord format used to train this model. It is available at https://github.com/kimdesok/ViT-backbone-MIL-on-TCGA/SVS_to_TFRecord_Convert.ipynb

Data Components

  • Source: Original TCGA-BRCA WSIs (.svs)
  • Output: TFRecord sets (10.0x magnification)
  • Contents: Patch sets
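
One way to picture the step from a slide to patch sets is a regular tiling grid whose tiles are then grouped per slide. The sketch below only computes tile coordinates; it does not read .svs files or write TFRecords, and the slide dimensions, tile size, group size, and function names are hypothetical rather than taken from the conversion notebook.

```python
def tile_origins(slide_width, slide_height, tile_edge):
    """Top-left (x, y) corners of non-overlapping tiles that fit fully inside the slide."""
    return [
        (x, y)
        for y in range(0, slide_height - tile_edge + 1, tile_edge)
        for x in range(0, slide_width - tile_edge + 1, tile_edge)
    ]

def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Toy slide of 5000 x 3000 pixels tiled with 1024-pixel tiles (illustrative values)
origins = tile_origins(slide_width=5000, slide_height=3000, tile_edge=1024)
# Group tiles into sets of 4; the group size is arbitrary here
patch_sets = chunk(origins, size=4)
```

A real pipeline would typically also discard background tiles before grouping; that filtering is omitted here.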

Accessing the Data

The processed TFRecord files are hosted on our secure institutional storage due to their large scale.

  • Scripts: The SVS-to-TFRecord conversion code is in the notebook linked under Dataset Pipeline.
  • Download: To request access to the pre-processed TFRecord sets, please fill out our Data Request Form or email us.

📈 Version History

| Version | Date | Description | Status |
|---|---|---|---|
| v1.0 | 2024-05-22 | Initial Release (Fine-tuned on TCGA-BRCA) | Current |
| v2.0 | (TBD) | Planned Virchow 2.0 Integration on H100 | R&D Phase |

⚠️ License & Commercial Use

This model is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

  • Academics: Free to use for research and publications.
  • Industry/Commercial: For-profit use requires a separate commercial license.
  • Inquiries: Please contact [dskim@btrust.co.kr] for licensing and collaboration.