CardioEmbed-BioLinkBERT

Domain-specialized cardiology text embeddings using LoRA-adapted BioLinkBERT-large

This is the best-performing model from our comparative study of 10 embedding architectures for clinical cardiology.

Performance

Metric               Score
Separation Score     0.510
Similar Pair Avg     0.811
Different Pair Avg   0.301
Throughput           143.5 embeddings/sec
Memory               1.51 GB
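
Separation Score appears to be Similar Pair Avg minus Different Pair Avg (0.811 - 0.301 = 0.510), i.e. how far apart the model pushes related and unrelated cardiology texts. A minimal sketch of that arithmetic, assuming the metric is the mean cosine similarity over similar pairs minus the mean over different pairs; the per-pair values here are hypothetical placeholders, not the study's data:

import numpy as np

def separation_score(similar_sims, different_sims):
    # Mean similarity of pairs that should match, minus mean similarity
    # of pairs that should not; higher means cleaner separation
    return float(np.mean(similar_sims) - np.mean(different_sims))

similar_sims = [0.82, 0.79, 0.83]    # hypothetical per-pair cosine similarities
different_sims = [0.31, 0.28, 0.31]
print(separation_score(similar_sims, different_sims))  # ~0.51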

Usage

import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModel.from_pretrained("michiyasunaga/BioLinkBERT-large")
tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/BioLinkBERT-large")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "richardyoung/CardioEmbed-BioLinkBERT")
model.eval()

# Generate embeddings with attention-mask-aware mean pooling, so padding
# tokens do not dilute the average when encoding batches
text = "Atrial fibrillation with rapid ventricular response"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
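
Continuing from the snippet above (reusing model and tokenizer), embeddings can be compared with cosine similarity; the example texts are illustrative and not from the evaluation set:

import torch.nn.functional as F

def embed(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**batch)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

emb = embed([
    "Atrial fibrillation with rapid ventricular response",
    "Irregularly irregular rhythm with tachycardia",  # clinically related
    "Routine childhood vaccination schedule",         # unrelated
])
print(F.cosine_similarity(emb[0], emb[1], dim=0))  # should score high
print(F.cosine_similarity(emb[0], emb[2], dim=0))  # should score low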

Training

  • Training Data: 106,535 cardiology text pairs from medical textbooks
  • Method: LoRA fine-tuning (r=16, alpha=32)
  • Loss: Multiple Negatives Ranking Loss (InfoNCE); see the sketch after this list
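
The card does not include the training script; below is a minimal sketch of the stated setup. The LoRA rank and alpha come from the card, while target_modules and lora_dropout are assumptions typical for BERT-style encoders, and the loss follows the standard in-batch-negatives InfoNCE formulation:

import torch
import torch.nn.functional as F
from peft import LoraConfig

# r and lora_alpha are from the card; target modules and dropout are guesses
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["query", "value"], lora_dropout=0.1)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    # anchors/positives: (batch, dim) embeddings of paired texts; row i of
    # `positives` is the positive for row i of `anchors`, and every other
    # row in the batch serves as an in-batch negative (InfoNCE)
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    scores = anchors @ positives.T * scale  # scaled cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)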

Citation

@article{young2024comparative,
  title={Comparative Analysis of LoRA-Adapted Embedding Models for Clinical Cardiology Text Representation},
  author={Young, Richard J and Matthews, Alice M},
  journal={arXiv preprint},
  year={2024}
}

Related Models

This is part of the CardioEmbed model family. See richardyoung/CardioEmbed for more models.
