# Model Card for Gaykar/phi2-drug-lora
This model is a LoRA-based fine-tuned version of Microsoft Phi-2 trained to generate concise medical-style descriptions of drugs. Given a drug name as input, the model produces a short, single-paragraph medical description following an instruction-style prompt format.
The model was trained using a two-stage pipeline consisting of continued pretraining (CPT) on domain-relevant text and supervised fine-tuning (SFT) on structured drug name–description pairs.
This model is intended strictly for educational and research purposes and is not suitable for real-world medical or clinical use.
## Model Details

### Model Description
This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:
- Continued Pretraining (CPT) to adapt the base model to drug and medical terminology.
- Supervised Fine-Tuning (SFT) using instruction-style input–output pairs.
LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base-model knowledge; an illustrative adapter configuration is sketched at the end of this section.
- Developed by: Atharva Gaykar
- Funded by: Not applicable
- Shared by: Atharva Gaykar
- Model type: Causal Language Model (LoRA-adapted)
- Language(s) (NLP): English
- License: CC-BY-NC 4.0
- Finetuned from model: microsoft/phi-2
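For illustration, this kind of adapter setup can be expressed with PEFT as sketched below. The rank, alpha, dropout, and target modules are assumed values chosen for the example; the actual configuration used for this adapter ships with the adapter weights in this repository.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative hyperparameters only; the real values are not restated in this card
lora_config = LoraConfig(
    r=16,                     # assumed LoRA rank
    lora_alpha=32,            # assumed scaling factor
    lora_dropout=0.05,        # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # Phi-2 attention projections
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```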
## Uses
This model is designed to generate concise medical-style descriptions of drugs given their names.
### Direct Use
- Educational demonstrations of instruction-following language models
- Academic research on medical-domain adaptation
- Experimentation with CPT + SFT pipelines
- Studying hallucination behavior in domain-specific LLMs
The model should only be used in non-production, educational, or research settings.
### Out-of-Scope Use
This model is not designed or validated for:
- Medical diagnosis or treatment planning
- Clinical decision support systems
- Dosage recommendations or prescribing guidance
- Patient-facing healthcare applications
- Professional medical, pharmaceutical, or regulatory use
- Any real-world deployment where incorrect medical information could cause harm
## Bias, Risks, and Limitations
This model was developed solely for educational purposes and must not be used in real-world medical or clinical decision-making.
### Known Limitations
- May hallucinate incorrect drug indications or mechanisms
- Generated descriptions may be incomplete or outdated
- Does not verify outputs against authoritative medical sources
- Does not understand patient context, dosage, or drug interactions
- Output quality is sensitive to prompt phrasing
### Risks
- Misinterpretation of outputs as medical advice
- Overconfidence in fluent but inaccurate responses
- Potential propagation of misinformation if misused
### Recommendations
- Always verify outputs using trusted medical references
- Use only in controlled, non-production environments
- Clearly disclose limitations in any downstream use
- Avoid deployment in safety-critical or healthcare systems
## How to Get Started with the Model
This repository contains LoRA adapter weights, not a full model.
Example usage (conceptual):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")
model.eval()

# Drug to evaluate
drug_name = "Anavip"

# Build evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize prompt
model_input = tokenizer(eval_prompt, return_tensors="pt").to(model.device)

# Generate output (greedy decoding)
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,  # Greedy decoding: in the medical domain, factual consistency and determinism matter more than linguistic diversity
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )

# Remove prompt tokens
prompt_length = model_input["input_ids"].shape[1]
generated_tokens = output[0][prompt_length:]

# Decode generated text only
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print("DRUG NAME:", drug_name)
print("MODEL GENERATED DESCRIPTION:")
print(generated_text)
```
Example output:

```text
DRUG NAME: Anavip
MODEL GENERATED DESCRIPTION:
Anavip (Crotalidae immune F(ab')2 equine) is an antivenin used to treat adults and children with crotalid snake envenomation (rattlesnake, copperhead, or cottonmouth/water moccasin).
```
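If a standalone checkpoint is more convenient for downstream inference, the LoRA adapter can be folded into the base weights with PEFT's `merge_and_unload`. A minimal sketch (the output directory is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("phi2-drug-merged")  # placeholder output directory
```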
## Training Details

### Training Data
- Dataset: Gaykar/DrugData
- Structured drug name–description pairs
- Used for both CPT (domain adaptation) and SFT (instruction following)
### Training Procedure

#### Continued Pretraining (CPT)
The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.
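The CPT script itself is not reproduced in this card. The sketch below shows how such a stage is commonly prepared with the Hugging Face libraries, assuming a `description` text column and a `train` split in Gaykar/DrugData (the actual schema may differ):

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2 defines no pad token by default

# Assumed column name and split
dataset = load_dataset("Gaykar/DrugData", split="train")

def tokenize(batch):
    # Plain text, no instruction formatting: the CPT objective is next-token prediction
    return tokenizer(batch["description"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False selects the causal (next-token) language-modeling objective
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```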
#### Supervised Fine-Tuning (SFT)
After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
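The exact SFT template is not published here; a plausible formatting function, inferred from the inference prompt in the usage example above, would look like this:

```python
def format_example(drug_name: str, description: str) -> str:
    # Mirrors the inference prompt; the reference description after
    # "Description:" serves as the training target.
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        f"Description: {description}"
    )

print(format_example(
    "Anavip",
    "Anavip is an antivenin used to treat crotalid snake envenomation.",
))
```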
### Training Hyperparameters

#### CPT Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch size (per device) | 1 |
| Effective batch size | 8 |
| Epochs | 4 |
| Learning rate | 2e-4 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Logging steps | 10 |
| Checkpoint saving | Every 500 steps |
| Checkpoint limit | 2 |
#### SFT Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch size (per device) | 4 |
| Gradient accumulation | 1 |
| Effective batch size | 4 |
| Epochs | 5 |
| Learning rate | 5e-5 |
| LR scheduler | Linear |
| Warmup ratio | 6% |
| Weight decay | 1e-4 |
| Max gradient norm | 1.0 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Checkpoint saving | Every 50 steps |
| Checkpoint limit | 2 |
| Experiment tracking | Weights & Biases |
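As a reading aid, the SFT settings above correspond roughly to the following `transformers` `TrainingArguments`; the output directory and anything not listed in the table are assumptions, and the actual training script is not part of this card.

```python
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="phi2-drug-sft",        # placeholder path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    num_train_epochs=5,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    fp16=True,
    optim="paged_adamw_8bit",
    save_steps=50,
    save_total_limit=2,
    report_to="wandb",                 # Weights & Biases experiment tracking
)
```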
## Evaluation

### Testing Data
Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.
### Metrics
Evaluation Method: LLM-as-a-Judge (Google Gemini)
- Binary classification: Factually Correct / Hallucinated
- Three evaluation batches
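The judging prompt is not included in this card. The sketch below shows one way the binary check could be posed to Gemini via the `google-generativeai` SDK; the model name, prompt wording, and API-key handling are assumptions.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
judge = genai.GenerativeModel("gemini-1.5-flash")  # assumed Gemini variant

def judge_description(drug_name: str, generated: str) -> str:
    prompt = (
        "You are verifying a drug description for factual accuracy.\n"
        f"Drug: {drug_name}\n"
        f"Description: {generated}\n"
        "Answer with exactly one word: CORRECT or HALLUCINATED."
    )
    return judge.generate_content(prompt).text.strip()
```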
### Results

#### Batch 1
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 24 | 96% |
| Hallucinated / Failed | 1 | 4% |
#### Batch 2
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 22 | 88% |
| Hallucinated / Failed | 3 | 12% |
#### Batch 3
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 10 | 100% |
| Factually Correct | 10 | 100% |
| Hallucinated / Failed | 0 | 0% |
#### Summary
Across the three evaluation batches, 56 of 60 generated descriptions (about 93%) were judged factually correct. Because the model was fine-tuned with LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is difficult: LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model's internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation.
## Environmental Impact
- Hardware Type: NVIDIA T4 GPU
- Hours used: Not recorded
- Cloud Provider: Google Colab
- Compute Region: Not specified
- Carbon Emitted: Not estimated
## Technical Specifications

### Model Architecture and Objective
- Base model: Microsoft Phi-2
- Objective: Instruction-following text generation
- Adaptation method: LoRA (PEFT)
### Compute Infrastructure

#### Hardware
- NVIDIA T4 GPU
#### Software
- Transformers
- PEFT
- PyTorch
## Model Card Contact
Atharva Gaykar
### Framework Versions
- PEFT 0.18.0