CardioEmbed-BioLinkBERT

Domain-specialized cardiology text embeddings using LoRA-adapted BioLinkBERT-large

This is the best-performing model from our comparative study of 10 embedding architectures for clinical cardiology.

Performance

Metric               Score
Separation Score     0.510
Similar Pair Avg     0.811
Different Pair Avg   0.301
Throughput           143.5 embeddings/sec
Memory               1.51 GB
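
Separation Score appears to be Similar Pair Avg minus Different Pair Avg (0.811 - 0.301 = 0.510), i.e. how far apart the model pushes related and unrelated cardiology texts. A minimal sketch of that arithmetic, assuming the metric is the mean cosine similarity over similar pairs minus the mean over different pairs; the per-pair values here are hypothetical placeholders, not the study's data:

import numpy as np

def separation_score(similar_sims, different_sims):
    # Mean similarity of pairs that should match, minus mean similarity
    # of pairs that should not; higher means cleaner separation
    return float(np.mean(similar_sims) - np.mean(different_sims))

similar_sims = [0.82, 0.79, 0.83]    # hypothetical per-pair cosine similarities
different_sims = [0.31, 0.28, 0.31]
print(separation_score(similar_sims, different_sims))  # ~0.51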

Usage

import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModel.from_pretrained("michiyasunaga/BioLinkBERT-large")
tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/BioLinkBERT-large")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "richardyoung/CardioEmbed-BioLinkBERT")
model.eval()

# Generate embeddings with attention-mask-aware mean pooling, so padding
# tokens do not dilute the average when encoding batches
text = "Atrial fibrillation with rapid ventricular response"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
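
Continuing from the snippet above (reusing model and tokenizer), embeddings can be compared with cosine similarity; the example texts are illustrative and not from the evaluation set:

import torch.nn.functional as F

def embed(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**batch)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

emb = embed([
    "Atrial fibrillation with rapid ventricular response",
    "Irregularly irregular rhythm with tachycardia",  # clinically related
    "Routine childhood vaccination schedule",         # unrelated
])
print(F.cosine_similarity(emb[0], emb[1], dim=0))  # should score high
print(F.cosine_similarity(emb[0], emb[2], dim=0))  # should score low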

Training

  • Training Data: 106,535 cardiology text pairs from medical textbooks
  • Method: LoRA fine-tuning (r=16, alpha=32)
  • Loss: Multiple Negatives Ranking Loss (InfoNCE); see the sketch after this list
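
The card does not include the training script; below is a minimal sketch of the stated setup. The LoRA rank and alpha come from the card, while target_modules and lora_dropout are assumptions typical for BERT-style encoders, and the loss follows the standard in-batch-negatives InfoNCE formulation:

import torch
import torch.nn.functional as F
from peft import LoraConfig

# r and lora_alpha are from the card; target modules and dropout are guesses
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["query", "value"], lora_dropout=0.1)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    # anchors/positives: (batch, dim) embeddings of paired texts; row i of
    # `positives` is the positive for row i of `anchors`, and every other
    # row in the batch serves as an in-batch negative (InfoNCE)
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    scores = anchors @ positives.T * scale  # scaled cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)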

Citation

@article{young2024comparative,
  title={Comparative Analysis of LoRA-Adapted Embedding Models for Clinical Cardiology Text Representation},
  author={Young, Richard J and Matthews, Alice M},
  journal={arXiv preprint},
  year={2024}
}

Related Models

This is part of the CardioEmbed model family. See richardyoung/CardioEmbed for more models.
