dbMiM Neuron Segmentation

Official implementation for dbMiM pretraining and CREMI neuron segmentation. The maintained workflow is:

prepare unlabeled EM volumes for self-supervised pretraining;
run dbMiM / MAE-style masked-image pretraining;
finetune an anisotropic 3D UNETR affinity model on CREMI;
decode instances with the reference waterz post-processing backend;
evaluate full CREMI A/B/C volumes with VOI and adapted Rand error (ARAND).

Learnable / differentiable post-processing is developed separately at https://github.com/ydchen0806/nnEM-Seg-diff-postprocess.

Method

The segmentation model is UNETRAnisotropicAffinityNet.

Input crop: 32 x 160 x 160
Patch size: (4, 16, 16)
Output: z/y/x nearest-neighbor affinity logits
Backbone: ViT encoder initialized from dbMiM pretraining
Decoder: UNETR-style staged upsampling with an anisotropic z transition
Finetuning loss: MSE + membrane-aware spatial weighting (MAWS)
Evaluation: full-volume CREMI A/B/C inference, ignore_label=0, boundary ignore xy=1, z=0
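For readers new to affinity targets, the z/y/x nearest-neighbor affinities listed above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the helper name `nearest_neighbor_affinities`, the convention that each voxel's affinity refers to its predecessor along the axis, and the treatment of label 0 as background are all assumptions.

```python
import numpy as np


def nearest_neighbor_affinities(labels: np.ndarray) -> np.ndarray:
    """Return a (3, Z, Y, X) float32 array of z/y/x affinities.

    Assumed convention: channel c at voxel v is 1 when v and its predecessor
    along axis c carry the same non-zero label, otherwise 0. The first slice
    along each axis has no predecessor and stays 0.
    """
    affs = np.zeros((3,) + labels.shape, dtype=np.float32)
    for axis in range(3):
        cur = [slice(None)] * 3
        prev = [slice(None)] * 3
        cur[axis] = slice(1, None)     # voxels that have a predecessor
        prev[axis] = slice(None, -1)   # the predecessors themselves
        a = labels[tuple(cur)]
        b = labels[tuple(prev)]
        affs[(axis,) + tuple(cur)] = (a == b) & (a > 0)
    return affs


# Toy volume: label 1 fills x = 0..1, label 2 fills x = 2.
labels = np.zeros((2, 3, 3), dtype=np.int64)
labels[:, :, :2] = 1
labels[:, :, 2] = 2
affs = nearest_neighbor_affinities(labels)
```

In the toy volume the x-affinity drops to 0 exactly where label 1 meets label 2, which is the boundary signal the network is trained to predict as logits.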

dbMiM pretraining masks 3D ViT patches and reconstructs EM voxels with membrane-aware weighting and a lightweight structure-consistency term. Plain MAE controls use the same data, model size, crop size, mask ratio, and schedule with the dbMiM-specific terms disabled.
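As a rough sketch of what masking 3D ViT patches means for the crop and patch sizes listed above: a 32 x 160 x 160 crop with (4, 16, 16) patches yields an 8 x 10 x 10 token grid, i.e. 800 tokens. The snippet below draws a uniform random mask over those tokens, as in plain MAE. The mask ratio of 0.75 is a hypothetical value chosen for illustration, and the masking strategy actually used by dbMiM may differ.

```python
import numpy as np

CROP = (32, 160, 160)   # input crop (z, y, x) from the Method section
PATCH = (4, 16, 16)     # patch size from the Method section
MASK_RATIO = 0.75       # hypothetical value, not taken from the repository

# Number of tokens along each axis and in total.
grid = tuple(c // p for c, p in zip(CROP, PATCH))
num_tokens = int(np.prod(grid))

# Uniform random masking over token indices (plain MAE style).
rng = np.random.default_rng(0)
num_masked = int(round(MASK_RATIO * num_tokens))
masked_ids = rng.choice(num_tokens, size=num_masked, replace=False)

token_mask = np.zeros(num_tokens, dtype=bool)
token_mask[masked_ids] = True
token_mask = token_mask.reshape(grid)

# Expand the token mask to voxel resolution, e.g. to restrict a
# reconstruction loss to masked voxels.
voxel_mask = token_mask
for axis, p in enumerate(PATCH):
    voxel_mask = np.repeat(voxel_mask, p, axis=axis)

print(grid, num_tokens, voxel_mask.shape)
```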

Results

Lower is better for both VOI and ARAND.
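For orientation, both metrics can be written down in a few lines from the contingency counts between ground truth and prediction. The sketch below is a dependency-free definition example, not the evaluation code: the function name `voi_and_arand` is hypothetical, the logarithm base (2) is a choice made here, and library implementations may differ in small details such as whether self-pairs are excluded, so values may not match the repository's metric backend exactly.

```python
import math
from collections import Counter


def voi_and_arand(gt, seg, ignore_label=0):
    """Return (VOI, ARAND) for two flat label sequences of equal length."""
    # Drop voxels whose ground-truth label is ignored.
    pairs = [(g, s) for g, s in zip(gt, seg) if g != ignore_label]
    n = len(pairs)
    joint = Counter(pairs)
    gt_counts = Counter(g for g, _ in pairs)
    seg_counts = Counter(s for _, s in pairs)

    def entropy(counts):
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    # VOI = H(gt | seg) + H(seg | gt) = 2 H(gt, seg) - H(gt) - H(seg)
    voi = 2 * entropy(joint) - entropy(gt_counts) - entropy(seg_counts)

    # ARAND = 1 - F-score of Rand precision and recall over pair counts.
    sum_joint = sum(c * c for c in joint.values())
    sum_gt = sum(c * c for c in gt_counts.values())
    sum_seg = sum(c * c for c in seg_counts.values())
    arand = 1.0 - 2.0 * sum_joint / (sum_gt + sum_seg)
    return voi, arand


gt = [1, 1, 1, 1, 2, 2, 2, 2]
seg = [1, 1, 1, 1, 2, 2, 3, 3]   # object 2 is split in two
voi, arand = voi_and_arand(gt, seg)
print(round(voi, 6), round(arand, 6))
```

A perfect segmentation gives 0 for both metrics; the split in the toy example raises both, which is why lower values indicate better agreement.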

Run	Checkpoint	VOI	ARAND	Note
R48	`weights/publicem_dbmim_r48_seed309_long20k/finetuned_latest.pt`	0.962154	0.178252	Best VOI
R57	`weights/publicem_dbmim_r57_seed777_long20k/finetuned_latest.pt`	0.964617	0.178248	Best ARAND in repeat sweep
R33	`weights/fullem_mixedmask_dbmim_r33/finetuned_latest.pt`	1.039372	0.205380	Best fullEM recipe

Validation uses the publicly labeled CREMI A/B/C training volumes, not the hidden challenge labels.

Weights

Weights are hosted on Hugging Face:

https://huggingface.co/cyd0806/dbmim-neuron-segmentation

File	Use
`weights/publicem_dbmim_r48_seed309_long20k/finetuned_latest.pt`	Recommended segmentation checkpoint
`weights/publicem_dbmim_r57_seed777_long20k/finetuned_latest.pt`	ARAND-best repeat checkpoint
`weights/publicem_dbmim_r17/pretrained_latest.pt`	PublicEM dbMiM encoder pretraining checkpoint
`weights/publicem_dbmim_r17/finetuned_latest.pt`	Earlier publicEM finetuned checkpoint
`weights/fullem_mixedmask_dbmim_r33/pretrained_latest.pt`	FullEM mixed-mask pretraining checkpoint
`weights/fullem_mixedmask_dbmim_r33/finetuned_latest.pt`	FullEM mixed-mask finetuned checkpoint

Download example:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="cyd0806/dbmim-neuron-segmentation",
    local_dir="weights/dbmim-neuron-segmentation",
)
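If only one checkpoint is needed, a single-file download avoids fetching the whole repository. The sketch below uses `hf_hub_download` from `huggingface_hub`; it assumes the file paths listed in the table above are relative to the root of the Hub repository, and the helper name `fetch_checkpoint` and the `CHECKPOINTS` mapping are introduced here for illustration only.

```python
REPO_ID = "cyd0806/dbmim-neuron-segmentation"

# Finetuned checkpoints from the results table, keyed by run name.
# Assumption: these paths are relative to the Hub repository root.
CHECKPOINTS = {
    "R48": "weights/publicem_dbmim_r48_seed309_long20k/finetuned_latest.pt",
    "R57": "weights/publicem_dbmim_r57_seed777_long20k/finetuned_latest.pt",
    "R33": "weights/fullem_mixedmask_dbmim_r33/finetuned_latest.pt",
}


def fetch_checkpoint(run: str, local_dir: str = "weights/dbmim-neuron-segmentation") -> str:
    """Download one checkpoint and return its local path."""
    # Imported lazily so the mapping above can be used without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=CHECKPOINTS[run],
        local_dir=local_dir,
    )
```

Calling `fetch_checkpoint("R48")` would then return the local path of the recommended segmentation checkpoint.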

Setup

git clone https://github.com/ydchen0806/dbMiM.git
cd dbMiM
python -m pip install -r requirements-dbMIM.txt

The waterz reference backend is optional for training but required for the reported instance-segmentation metrics.

Data

CREMI finetuning/evaluation expects the public CREMI 2016 training files under:

data/CREMI/sample_A_20160501.hdf
data/CREMI/sample_B_20160501.hdf
data/CREMI/sample_C_20160501.hdf
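A quick pre-flight check can confirm that the three files are in place before launching a long run. This is a small convenience sketch; the helper name `missing_cremi_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

CREMI_FILES = (
    "sample_A_20160501.hdf",
    "sample_B_20160501.hdf",
    "sample_C_20160501.hdf",
)


def missing_cremi_files(data_dir="data/CREMI"):
    """Return the expected CREMI file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in CREMI_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_cremi_files()
    print("all CREMI files present" if not missing else f"missing: {missing}")
```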

Prepare public EM pretraining data:

python scripts/prepare_public_em_pretrain_data.py \
  --target-dir data/EM_pretrain_data

Prepare the larger fullEM pretraining set:

HF_TOKEN=<your_token> python scripts/prepare_em_pretrain_data.py \
  --target-dir data/EM_pretrain_data

Pretraining

PublicEM dbMiM pretraining:

python train_pretrain.py \
  --config configs/pretrain_public_em_membrane_r16.yaml

FullEM mixed-mask pretraining:

python train_pretrain.py \
  --config configs/pretrain_em_full_mixedmask_dbmim_r33.yaml

Finetuning

Recommended R48 finetuning recipe:

python train_finetune.py \
  --config configs/finetune_cremi_real_unetr_aniso_em_mse_maws_publicem_r16_seed309_long20k_r48q.yaml

The config points to the pretrained encoder checkpoint. Update that path in the config if the weights are stored outside outputs/.

Evaluation

Run full-volume CREMI A/B/C waterz evaluation:

python scripts/evaluate_cremi_segmentation.py \
  --config configs/finetune_cremi_real_unetr_aniso_em_mse_maws_publicem_r16_seed309_long20k_r48q.yaml \
  --checkpoint outputs/finetune_cremi_real_unetr_aniso_em_mse_maws_publicem_r16_seed309_long20k_r48q/finetuned_latest.pt \
  --data-dir data/CREMI \
  --output-dir outputs/eval_r48_cremi_abc \
  --crop-size 0 0 0 \
  --stride 16 80 80 \
  --backends waterz \
  --thresholds 0.16 0.18 0.20 0.22 0.24 \
  --calibration-biases -0.25 -0.50 -0.50 \
  --seed-method maxima_distance \
  --seed-distance 10 \
  --boundary-threshold 0.5 \
  --waterz-scoring hist_quantile \
  --batched-waterz \
  --metric-backend skimage \
  --ignore-label 0 \
  --cremi-boundary-ignore-distance-xy 1 \
  --cremi-boundary-ignore-distance-z 0 \
  --device cuda

--batched-waterz evaluates all waterz thresholds for each affinity variant in one waterz hierarchy pass. It leaves the reported R48 VOI unchanged (0.962154) and cuts threshold-sweep post-processing time from about 75 s to about 17 s on CREMI A/B/C.
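The reason a single hierarchy pass can serve every threshold is that a merge accepted at a lower threshold is also accepted at any higher one. The toy sketch below illustrates this with a union-find over pre-scored fragment edges. It is a simplified illustration, not waterz's algorithm or API: the function name `sweep_thresholds` is hypothetical and edge scores are treated as fixed, which is an assumption that may not hold for waterz's own scoring.

```python
def sweep_thresholds(num_fragments, scored_edges, thresholds):
    """Return {threshold: labels} using one pass over the sorted merges."""
    parent = list(range(num_fragments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(scored_edges)  # (score, u, v), cheapest merge first
    results, k = {}, 0
    for t in sorted(thresholds):
        # Apply only the merges that lower thresholds did not already apply.
        while k < len(edges) and edges[k][0] <= t:
            _, u, v = edges[k]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
            k += 1
        # Snapshot the current fragment -> segment assignment.
        results[t] = [find(i) for i in range(num_fragments)]
    return results


edges = [(0.10, 0, 1), (0.17, 1, 2), (0.21, 2, 3), (0.30, 3, 4)]
out = sweep_thresholds(5, edges, [0.16, 0.18, 0.20, 0.22, 0.24])
for t, seg in out.items():
    print(t, seg)
```

Each threshold only adds merges on top of the previous state, so the cost of the sweep is close to that of the highest threshold alone rather than the sum over all thresholds.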

The summary is written to:

outputs/eval_r48_cremi_abc/cremi_segmentation_summary.json

Repository Layout

dbmim/                         Models, datasets, metrics, utilities
configs/                       Pretraining and finetuning configs
scripts/prepare_*_data.py       Data preparation
scripts/evaluate_*.py           CREMI evaluation
train_pretrain.py               dbMiM / MAE pretraining
train_finetune.py               CREMI affinity finetuning

Large datasets, checkpoints, and generated outputs are not tracked in Git.
