Model Card for Gaykar/phi2-drug-lora

This model is a LoRA-based fine-tuned version of Microsoft Phi-2 trained to generate concise medical-style descriptions of drugs. Given a drug name as input, the model produces a short, single-paragraph medical description following an instruction-style prompt format.

The model was trained using a two-stage pipeline consisting of continued pretraining (CPT) on domain-relevant text and supervised fine-tuning (SFT) on structured drug name–description pairs.

This model is intended strictly for educational and research purposes and is not suitable for real-world medical or clinical use.


Model Details

Model Description

This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:

  1. Continued Pretraining (CPT) to adapt the base model to drug and medical terminology.
  2. Supervised Fine-Tuning (SFT) using instruction-style input–output pairs.

LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge.
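
As a rough illustration of how such a LoRA adapter is attached for training, the sketch below uses the PEFT library; the rank, scaling, dropout, and target modules shown are assumptions for illustration only and are not documented for this adapter:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model that the adapters are trained on
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# Illustrative LoRA configuration; r, lora_alpha, lora_dropout, and
# target_modules are assumed values, not the ones used for this adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # common Phi-2 projection layers
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the small adapter matrices are trainable
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()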

  • Developed by: Atharva Gaykar
  • Funded by: Not applicable
  • Shared by: Atharva Gaykar
  • Model type: Causal Language Model (LoRA-adapted)
  • Language(s) (NLP): English
  • License: CC-BY-NC 4.0
  • Finetuned from model: microsoft/phi-2

Uses

This model is designed to generate concise medical-style descriptions of drugs given their names.

Direct Use

  • Educational demonstrations of instruction-following language models
  • Academic research on medical-domain adaptation
  • Experimentation with CPT + SFT pipelines
  • Studying hallucination behavior in domain-specific LLMs

The model should only be used in non-production, educational, or research settings.

Out-of-Scope Use

This model is not designed or validated for:

  • Medical diagnosis or treatment planning
  • Clinical decision support systems
  • Dosage recommendations or prescribing guidance
  • Patient-facing healthcare applications
  • Professional medical, pharmaceutical, or regulatory use
  • Any real-world deployment where incorrect medical information could cause harm

Bias, Risks, and Limitations

This model was developed solely for educational purposes and must not be used in real-world medical or clinical decision-making.

Known Limitations

  • May hallucinate incorrect drug indications or mechanisms
  • Generated descriptions may be incomplete or outdated
  • Does not verify outputs against authoritative medical sources
  • Does not understand patient context, dosage, or drug interactions
  • Output quality is sensitive to prompt phrasing

Risks

  • Misinterpretation of outputs as medical advice
  • Overconfidence in fluent but inaccurate responses
  • Potential propagation of misinformation if misused

Recommendations

  • Always verify outputs using trusted medical references
  • Use only in controlled, non-production environments
  • Clearly disclose limitations in any downstream use
  • Avoid deployment in safety-critical or healthcare systems

How to Get Started with the Model

This repository contains LoRA adapter weights, not a full model.

Example usage (conceptual):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")
model.eval()

# Drug to evaluate
drug_name = "Anavip"

# Build evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize prompt
model_input = tokenizer(
    eval_prompt,
    return_tensors="pt"
).to(model.device)

# Generate output (greedy decoding)
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,              # Greedy decoding: in the medical domain, determinism and factual consistency matter more than output diversity
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id
    )

# Remove prompt tokens
prompt_length = model_input["input_ids"].shape[1]
generated_tokens = output[0][prompt_length:]

# Decode generated text only
generated_text = tokenizer.decode(
    generated_tokens,
    skip_special_tokens=True
).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print(" DRUG NAME:", drug_name)
print(" MODEL GENERATED DESCRIPTION:")
print(generated_text)

#Example output
DRUG NAME (EVAL): Anavip

MODEL GENERATED DESCRIPTION:
Anavip (Crotalidae immune $F(ab')_{2}$ equine) is an antivenin used to treat adults and children with crotalid snake envenomation (rattlesnake, copperhead, or cottonmouth/water moccasin).
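
Because only adapter weights are published, users who want a standalone checkpoint can optionally merge the adapter into the base weights. A minimal sketch using PEFT's merge_and_unload, assuming the model and tokenizer are loaded as shown above (the output directory name is illustrative):

# Optional: fold the LoRA weights into the base model so inference no
# longer requires the PEFT library; "phi2-drug-merged" is an example path
merged_model = model.merge_and_unload()
merged_model.save_pretrained("phi2-drug-merged")
tokenizer.save_pretrained("phi2-drug-merged")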


Training Details

Training Data

  • Dataset: Gaykar/DrugData
  • Structured drug name–description pairs
  • Used for both CPT (domain adaptation) and SFT (instruction following)

Training Procedure

Continued Pretraining (CPT)

The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.

Supervised Fine-Tuning (SFT)

After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
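
The exact data-formatting code is not published in this card; the sketch below illustrates the assumed difference between the two stages, with CPT examples trained as plain domain text and SFT examples wrapped in the same instruction template used at inference time:

def format_cpt_example(drug_name: str, description: str) -> str:
    # CPT: raw domain text with no instruction wrapper, trained with the
    # standard next-token-prediction objective
    return f"{drug_name}: {description}"


def format_sft_example(drug_name: str, description: str) -> str:
    # SFT: instruction-style prompt (matching the inference prompt shown
    # above) followed by the target description as the completion
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        f"Description: {description}"
    )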

Training Hyperparameters

CPT Hyperparameters

  • Batch size (per device): 1
  • Effective batch size: 8
  • Epochs: 4
  • Learning rate: 2e-4
  • Precision: FP16
  • Optimizer: Paged AdamW (8-bit)
  • Logging steps: 10
  • Checkpoint saving: every 500 steps
  • Checkpoint limit: 2

SFT Hyperparameters

  • Batch size (per device): 4
  • Gradient accumulation: 1
  • Effective batch size: 4
  • Epochs: 5
  • Learning rate: 5e-5
  • LR scheduler: Linear
  • Warmup ratio: 6%
  • Weight decay: 1e-4
  • Max gradient norm: 1.0
  • Precision: FP16
  • Optimizer: Paged AdamW (8-bit)
  • Checkpoint saving: every 50 steps
  • Checkpoint limit: 2
  • Experiment tracking: Weights & Biases
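
As a rough guide, the SFT settings above map to Hugging Face TrainingArguments approximately as in the sketch below (the CPT run differs as listed in its own table); the output directory is illustrative and unlisted arguments keep their defaults:

from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="phi2-drug-sft",          # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,       # effective batch size of 4
    num_train_epochs=5,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    fp16=True,
    optim="paged_adamw_8bit",
    save_steps=50,
    save_total_limit=2,
    report_to="wandb",                   # Weights & Biases tracking
)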

Evaluation

Testing Data

Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.

Metrics

Evaluation Method: LLM-as-a-Judge (Google Gemini)

  • Binary classification: Factually Correct / Hallucinated
  • Three evaluation batches
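
The judge prompt and API wiring are not published in this card; the sketch below shows one way the binary check and per-batch aggregation could look, with the prompt wording and helper names being illustrative:

# Illustrative sketch of the binary LLM-as-a-judge check; the actual
# Gemini prompt and client code used for evaluation are not published here.

def build_judge_prompt(drug_name: str, generated_description: str) -> str:
    return (
        "You are verifying a drug description for factual correctness.\n"
        f"Drug name: {drug_name}\n"
        f"Generated description: {generated_description}\n\n"
        "Answer with exactly one word: CORRECT if the description is "
        "factually accurate for this drug, or HALLUCINATED otherwise."
    )


def summarize_batch(verdicts: list[str]) -> dict:
    # Aggregate a batch of judge verdicts into the counts reported below
    correct = sum(1 for v in verdicts if v.strip().upper() == "CORRECT")
    total = len(verdicts)
    return {
        "total": total,
        "factually_correct": correct,
        "hallucinated": total - correct,
        "accuracy": correct / total if total else 0.0,
    }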

Results

Batch 1

  • Total Drugs Evaluated: 25 (100%)
  • Factually Correct: 24 (96%)
  • Hallucinated / Failed: 1 (4%)

Batch 2

  • Total Drugs Evaluated: 25 (100%)
  • Factually Correct: 22 (88%)
  • Hallucinated / Failed: 3 (12%)

Batch 3

  • Total Drugs Evaluated: 10 (100%)
  • Factually Correct: 10 (100%)
  • Hallucinated / Failed: 0 (0%)

Summary

Since this model was fine-tuned using LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is challenging. While LoRA enables efficient training and strong instruction-following behavior, it does not fully overwrite the base model’s internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation tasks.


Environmental Impact

  • Hardware Type: NVIDIA T4 GPU
  • Hours used: Not recorded
  • Cloud Provider: Google Colab
  • Compute Region: Not specified
  • Carbon Emitted: Not estimated

Technical Specifications

Model Architecture and Objective

  • Base model: Microsoft Phi-2
  • Objective: Instruction-following text generation
  • Adaptation method: LoRA (PEFT)

Compute Infrastructure

Hardware

  • NVIDIA T4 GPU

Software

  • Transformers
  • PEFT
  • PyTorch

Model Card Contact

Atharva Gaykar

Framework Versions

  • PEFT 0.18.0