# scGPT (no prior): Replogle K562/Jurkat/HepG2 → RPE1
Produced as part of the sc-interp single-cell model comparison repo.
## Provenance
- Source code commit: `1d58906`
- Runner: `scripts/run_scgpt.py`
- Dataset manifest: `data/manifests/replogle.yaml`
## Base model
scGPT whole-human pretrained (Cui et al. 2024), used as-is with the model's original learnable gene-token embeddings (no external prior). 12 transformer blocks, 8 heads, d_model=512, max_seq_len=1536. This run is the baseline counterpart to matthewshu/scgpt-replogle-esm-ft, which adds a frozen ESM2-15B per-gene prior; both runs use identical training data, splits, optimizer, and budget except for the prior.
## Training
Source dataset: arcinstitute/State-Replogle-Filtered, the CRISPRi essential-genome screens from Replogle et al. 2022 and Nadig et al. 2025. Training: 362,327 cells from K562 + Jurkat + HepG2 with 1,383 perturbations and 8,569 val pairs (held-out K562 perturbations). Evaluation: 109,207 RPE1 cells perturbed by the 1,047 genes overlapping the K562 training perturbation set, plus 10,691 real RPE1 controls.
Fine-tuned the scGPT whole-human pretrained checkpoint on this split with no additional gene prior. Used --stop-metric pearson_delta (per-perturbation Pearson on Δ-expression) for early-stopping and best-checkpoint selection; this metric directly measures perturbation-effect prediction quality, whereas full-expression Pearson is dominated by the unchanged-genes baseline. Training ran the full 30-epoch budget without triggering early stopping; the best checkpoint is from epoch 27.
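As a sketch of what the stop metric computes (not the cell-eval implementation; function and argument names here are illustrative), per-perturbation pearson_delta correlates the predicted and observed shifts from the control mean:

```python
import numpy as np

def pearson_delta(pred_mean, true_mean, control_mean):
    """Pearson correlation of predicted vs. observed delta-expression.

    Each argument is an (n_genes,) vector: mean expression of the
    predicted perturbed cells, the real perturbed cells, and the
    unperturbed controls. Correlating deltas removes the
    unchanged-genes baseline that dominates full-expression Pearson.
    """
    pred_delta = pred_mean - control_mean
    true_delta = true_mean - control_mean
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])
```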
## Budget and stopping
| Setting | Value |
|---|---|
| Hardware | NVIDIA H100 PCIe (80 GB) |
| Train batch size | 192 |
| Eval batch size | 192 |
| Max epochs | 30 |
| Early-stop patience | 10 |
| Stop metric | pearson_delta |
| Epochs trained | 30 |
| Best epoch | 27 |
| Best val pearson_delta | 0.1993 |
| Training cells seen | 5,400,630 |
| Wall clock | 393.6 min (~6.56 h) |
| Stop reason | max_epochs |
| AMP | fp16 |
| Optimizer | Adam, lr=1e-4, StepLR γ=0.9 |
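Under the StepLR schedule above, the learning rate decays as lr = 1e-4 · 0.9^epoch, assuming PyTorch's step_size=1 (the run records only the base lr and γ, so step_size is an assumption here). A minimal sketch:

```python
def steplr_lr(epoch, base_lr=1e-4, gamma=0.9, step_size=1):
    """Learning rate after `epoch` completed epochs under StepLR decay.

    Assumes step_size=1; only base_lr and gamma are recorded for this run.
    """
    return base_lr * gamma ** (epoch // step_size)
```

By the best epoch (27), the learning rate has decayed to roughly 5.8e-6.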
## Test set metrics (cell-eval)
| metric | mean | median | max |
|---|---|---|---|
| pearson_delta | 0.1825 | 0.1424 | 0.6170 |
| pr_auc | 0.5206 | 0.5159 | 0.9191 |
| roc_auc | 0.3626 | 0.3603 | 0.4858 |
| overlap_at_N | 0.5081 | 0.4978 | 0.9252 |
| de_sig_genes_recall | 0.5313 | 0.5135 | 0.9527 |
| de_direction_match | 0.5252 | 0.5336 | 0.7896 |
| discrimination_score_l1 | 0.5091 | 0.5091 | 1.0000 |
| mae_delta | 0.1763 | 0.1737 | 0.2336 |
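The mean/median/max columns above aggregate the 1,047 per-perturbation rows in eval/results.csv. A stdlib-only sketch of that aggregation (the shipped tables come from cell-eval's pandas describe(), not this function):

```python
import statistics

def summarize(values):
    """Collapse one per-perturbation metric column into the
    mean/median/max summary reported per metric in the table."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "max": max(values),
    }
```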
Compare with matthewshu/scgpt-replogle-esm-ft (same data, same budget, with an ESM2-15B prior injected at the gene-embedding layer): the +ESM run reaches pearson_delta = 0.508 on test vs 0.183 here, and improves de_direction_match, de_sig_genes_recall, and overlap_at_N by 5–11 absolute percentage points. The two runs share commit, runner code, dataset manifest, split, optimizer, and batch size.
## Known limitations
- Cell line distribution shift: trained on K562/Jurkat/HepG2, evaluated on RPE1.
- Test set restricted to the 1,047 perturbed genes overlapping K562 training perturbations; it does not test out-of-distribution perturbed genes.
- roc_auc < 0.5 on test (also seen in the +ESM counterpart). Both runs use the same eval pipeline, so this is a cell-eval/data convention quirk rather than a model defect.
## Files
- `best_model.pt`: fine-tuned scGPT weights (PyTorch state_dict, best val pearson_delta)
- `args.json`: scGPT pretrained args (whole-human checkpoint config)
- `vocab.json`: scGPT gene-token vocabulary
- `training_stats.json`: wall clock, wandb run URL, epoch count, best metrics, stop reason
- `eval/agg_results.csv`: cell-eval `describe()` table over 1,047 RPE1 test perturbations
- `eval/results.csv`: per-perturbation cell-eval metrics (1,047 rows × 28 metric columns)
- `predictions/scgpt_replogle_test.h5ad`: self-contained predictions h5ad with predicted expression in `.X` and ground truth in `.layers['truth']`; includes 10,691 real RPE1 control cells. Layout produced by `scripts/run_scgpt.py:save_predictions`.