Size of tensor a does not match size of tensor b in InSilicoPerturbation
Thank you for making this model available!
I previously ran InSilicoPerturber successfully on a subset of the Genecorpus-30M dataset (~2,700 cells); I am using Genecorpus-30M simply to test how InSilicoPerturber works. However, with the same parameters, since the changes pushed on Aug 2, 2023 to fix the attention mask issue, I now receive the following error:
from geneformer import InSilicoPerturber

isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000135100"],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=None,
                        cell_states_to_model=None,
                        max_ncells=None,
                        emb_layer=-1,
                        forward_batch_size=50,
                        nproc=16,
                        token_dictionary_file="/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")
isp.perturb_data("/home/ubuntu/Geneformer",
                 "/data/subset_genecorpus/",
                 "/data/subset_genecorpus/delete_cell/",
                 "delete_cell_HNF1A")
Filter (num_proc=16): 100%|███████| 2741/2741 [00:12<00:00, 214.74 examples/s]
Map (num_proc=16): 100%|███████████████| 37/37 [00:12<00:00, 2.92 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 159.23 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 166.24 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 165.63 examples/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 974, in perturb_data
    self.in_silico_perturb(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 1052, in in_silico_perturb
    cos_sims_data = quant_cos_sims(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 444, in quant_cos_sims
    cos_sims += [cos(minibatch_emb, minibatch_comparison).to("cpu")]
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 87, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: The size of tensor a (2047) must match the size of tensor b (2046) at non-singleton dimension 1
I've consulted Discussion #85 for help with this issue; however, changing the batch size to 200 still raises the same error. I have also pulled the latest version of Geneformer.
Could I get some help understanding why this error is now being raised? Thank you!
Hi there, thanks for bringing this issue up! We've just updated the code to address the issue. Thanks for your interest!
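For readers who hit the same RuntimeError: the two sizes in the message (2047 and 2046) differ by exactly one, which is consistent with one input to the cosine similarity still containing the position of the deleted gene while the other does not. That reading is an assumption; the thread does not show the maintainers' actual fix. The sketch below is a pure-Python illustration of the general problem, not Geneformer's code: `cosine_per_position`, the toy embeddings, and `deleted_index` are all hypothetical names and data.

```python
import math


def cosine_per_position(a, b):
    """Cosine similarity between matching positions of two (seq_len x dim) lists."""
    if len(a) != len(b):
        # Analogous to the torch error: position-wise comparison needs equal lengths.
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)} positions")
    sims = []
    for u, v in zip(a, b):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        sims.append(dot / norm if norm else 0.0)
    return sims


# Hypothetical toy data: 4 gene positions with 2-dimensional embeddings.
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
deleted_index = 1  # assumed position of the gene removed by a "delete" perturbation
# The perturbed cell has one position fewer than the original.
perturbed = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# Comparing the unaligned sequences fails, like the 2047-vs-2046 case.
try:
    cosine_per_position(original, perturbed)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Dropping the deleted position from the original aligns the two sequences.
aligned_original = original[:deleted_index] + original[deleted_index + 1:]
sims = cosine_per_position(aligned_original, perturbed)
```

Whichever way the library aligns the tensors internally, the point of the sketch is only that both inputs must cover the same set of positions before a position-wise cosine similarity is taken.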