Vir2vec: A Genome-Wide Viral Embedding

Model description

vir2vec is a viral genomic language model (gLM) that produces fixed-length, genome-level embeddings. The model can be fine-tuned for downstream tasks such as viral discrimination, host-range prediction, and variant typing. For more details and the training scripts, see the project's GitHub repository.

Intended use

vir2vec embeddings are intended for tasks including (but not limited to):

  • Virus vs non-virus genome/read discrimination
  • DNA vs RNA virus classification
  • Host-range prediction
  • Intra-genus separation (e.g., HIV-1 vs HIV-2)
  • Variant/subtype typing (e.g., SARS-CoV-2 lineages)
  • Phenotypic signal detection (e.g., tissue tropism proxies)
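Because each genome maps to one fixed-length vector, several of the tasks above can be approached with a lightweight classifier on top of the embeddings. The following is a minimal sketch of nearest-centroid classification with cosine similarity. It is only an illustration, not the evaluation protocol used by the authors; the function names and the toy 3-dimensional vectors are hypothetical stand-ins for real vir2vec embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def class_centroids(embeddings, labels):
    """Average the embeddings of each label into one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine_similarity(embedding, centroids[label]))

# Toy 3-dimensional vectors standing in for real vir2vec embeddings (assumption).
train_embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.9], [0.1, 0.8, 1.0]]
train_labels = ["virus", "virus", "non-virus", "non-virus"]
centroids = class_centroids(train_embeddings, train_labels)
print(predict([0.8, 0.1, 0.0], centroids))
```

In practice the training vectors would be embeddings computed from labeled genomes, and a stronger classifier may be preferable for harder tasks such as lineage typing.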

Model sizes

  • 422M
  • 138M
  • 17M

How to use

Load from Hugging Face

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pass revision="138M" or revision="17M" to both calls to change the model size; 422M is the default.
tokenizer = AutoTokenizer.from_pretrained("pabloarozarenad/Vir2vec", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("pabloarozarenad/Vir2vec", trust_remote_code=True)
model.eval()
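The tokenizer and the model should be loaded from the same revision. A small helper can keep the two calls in sync. This is a sketch: `SIZE_TO_REVISION` and `pretrained_kwargs` are hypothetical names, and the revision strings are taken from the comments in the loading snippet rather than verified against the repository's branches.

```python
# Revision names as given in the loading snippet; the default branch is
# described there as the 422M model, so no revision is passed for it.
SIZE_TO_REVISION = {"422M": None, "138M": "138M", "17M": "17M"}

def pretrained_kwargs(size="422M"):
    """Build keyword arguments for from_pretrained for a given model size."""
    if size not in SIZE_TO_REVISION:
        raise ValueError(f"Unknown size {size!r}; expected one of {sorted(SIZE_TO_REVISION)}")
    kwargs = {"trust_remote_code": True}
    revision = SIZE_TO_REVISION[size]
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs

print(pretrained_kwargs("138M"))
```

The same dictionary can then be unpacked into both calls, for example `AutoTokenizer.from_pretrained("pabloarozarenad/Vir2vec", **pretrained_kwargs("17M"))` and likewise for `AutoModelForCausalLM.from_pretrained`.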

Compute embeddings

dna = "ACGTAGCATCGCGATGACTGCATCACT"
inputs = tokenizer(dna, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
    last_hidden = outputs.hidden_states[-1]  # [1, seq_len, hidden_dim]
    embedding = last_hidden.max(dim=1).values[0]  # [hidden_dim] (max pooling)

print(embedding.shape)
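Whole genomes are often longer than a single model input. One possible workaround, sketched below, is to split the genome into overlapping windows, embed each window as shown above, and combine the per-window vectors with an element-wise max. This is an assumption, not a documented vir2vec procedure: the model's maximum input length is not stated here, so the `window` and `stride` defaults are placeholders, and `sliding_windows` and `elementwise_max` are hypothetical helper names.

```python
def sliding_windows(sequence, window=1000, stride=500):
    """Split a sequence into overlapping windows of at most `window` bases."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(sequence) <= window:
        return [sequence]
    starts = list(range(0, len(sequence) - window + 1, stride))
    # Add a final window so the tail of the sequence is always covered.
    if starts[-1] + window < len(sequence):
        starts.append(len(sequence) - window)
    return [sequence[s:s + window] for s in starts]

def elementwise_max(vectors):
    """Combine per-window embeddings by taking the max of each dimension."""
    return [max(values) for values in zip(*vectors)]

chunks = sliding_windows("ACGTACGTAC", window=4, stride=3)
print(chunks)
```

Each chunk would be passed through the tokenizer and model as in the snippet above, and the resulting vectors (converted with `embedding.tolist()`) combined with `elementwise_max`. Mean aggregation across windows is an equally plausible choice; which works better for a given task is an empirical question.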

Access

Access to vir2vec is granted upon request. Each request must include an institutional email address, a brief description of the intended use, and the associated IRB protocol number.
