# Model Card for Gaykar/phi2-drug-lora
This model is a LoRA-based fine-tuned version of Microsoft Phi-2 trained to generate concise medical-style descriptions of drugs. Given a drug name as input, the model produces a short, single-paragraph medical description following an instruction-style prompt format.
The model was trained using a two-stage pipeline consisting of continued pretraining (CPT) on domain-relevant text and supervised fine-tuning (SFT) on structured drug name–description pairs.
This model is intended strictly for educational and research purposes and is not suitable for real-world medical or clinical use.
## Model Details

### Model Description
This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:
- Continued Pretraining (CPT) to adapt the base model to drug and medical terminology.
- Supervised Fine-Tuning (SFT) using instruction-style input–output pairs.
LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base-model knowledge; an illustrative adapter configuration is sketched at the end of this section.
- Developed by: Atharva Gaykar
- Funded by: Not applicable
- Shared by: Atharva Gaykar
- Model type: Causal Language Model (LoRA-adapted)
- Language(s) (NLP): English
- License: CC-BY-NC 4.0
- Finetuned from model: microsoft/phi-2
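For illustration, this kind of adapter setup can be expressed with PEFT as sketched below. The rank, alpha, dropout, and target modules are assumed values chosen for the example; the actual configuration used for this adapter ships with the adapter weights in this repository.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative hyperparameters only; the real values are not restated in this card
lora_config = LoraConfig(
    r=16,                     # assumed LoRA rank
    lora_alpha=32,            # assumed scaling factor
    lora_dropout=0.05,        # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # Phi-2 attention projections
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```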
## Uses
This model is designed to generate concise medical-style descriptions of drugs given their names.
### Direct Use
- Educational demonstrations of instruction-following language models
- Academic research on medical-domain adaptation
- Experimentation with CPT + SFT pipelines
- Studying hallucination behavior in domain-specific LLMs
The model should only be used in non-production, educational, or research settings.
### Out-of-Scope Use
This model is not designed or validated for:
- Medical diagnosis or treatment planning
- Clinical decision support systems
- Dosage recommendations or prescribing guidance
- Patient-facing healthcare applications
- Professional medical, pharmaceutical, or regulatory use
- Any real-world deployment where incorrect medical information could cause harm
## Bias, Risks, and Limitations
This model was developed solely for educational purposes and must not be used in real-world medical or clinical decision-making.
### Known Limitations
- May hallucinate incorrect drug indications or mechanisms
- Generated descriptions may be incomplete or outdated
- Does not verify outputs against authoritative medical sources
- Does not understand patient context, dosage, or drug interactions
- Output quality is sensitive to prompt phrasing
### Risks
- Misinterpretation of outputs as medical advice
- Overconfidence in fluent but inaccurate responses
- Potential propagation of misinformation if misused
### Recommendations
- Always verify outputs using trusted medical references
- Use only in controlled, non-production environments
- Clearly disclose limitations in any downstream use
- Avoid deployment in safety-critical or healthcare systems
## How to Get Started with the Model
This repository contains LoRA adapter weights, not a full model.
Example usage (conceptual):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")
model.eval()

# Drug to evaluate
drug_name = "Anavip"

# Build evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize prompt
model_input = tokenizer(eval_prompt, return_tensors="pt").to(model.device)

# Generate output (greedy decoding)
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,  # Greedy decoding: in the medical domain, factual consistency and determinism matter more than linguistic diversity
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )

# Remove prompt tokens
prompt_length = model_input["input_ids"].shape[1]
generated_tokens = output[0][prompt_length:]

# Decode generated text only
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print("DRUG NAME:", drug_name)
print("MODEL GENERATED DESCRIPTION:")
print(generated_text)
```
Example output:

```text
DRUG NAME: Anavip
MODEL GENERATED DESCRIPTION:
Anavip (Crotalidae immune F(ab')2 equine) is an antivenin used to treat adults and children with crotalid snake envenomation (rattlesnake, copperhead, or cottonmouth/water moccasin).
```
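If a standalone checkpoint is more convenient for downstream inference, the LoRA adapter can be folded into the base weights with PEFT's `merge_and_unload`. A minimal sketch (the output directory is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("phi2-drug-merged")  # placeholder output directory
```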
## Training Details

### Training Data
- Dataset: Gaykar/DrugData
- Structured drug name–description pairs
- Used for both CPT (domain adaptation) and SFT (instruction following)
### Training Procedure

#### Continued Pretraining (CPT)
The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.
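The CPT script itself is not reproduced in this card. The sketch below shows how such a stage is commonly prepared with the Hugging Face libraries, assuming a `description` text column and a `train` split in Gaykar/DrugData (the actual schema may differ):

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2 defines no pad token by default

# Assumed column name and split
dataset = load_dataset("Gaykar/DrugData", split="train")

def tokenize(batch):
    # Plain text, no instruction formatting: the CPT objective is next-token prediction
    return tokenizer(batch["description"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False selects the causal (next-token) language-modeling objective
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```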
#### Supervised Fine-Tuning (SFT)
After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
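The exact SFT template is not published here; a plausible formatting function, inferred from the inference prompt in the usage example above, would look like this:

```python
def format_example(drug_name: str, description: str) -> str:
    # Mirrors the inference prompt; the reference description after
    # "Description:" serves as the training target.
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        f"Description: {description}"
    )

print(format_example(
    "Anavip",
    "Anavip is an antivenin used to treat crotalid snake envenomation.",
))
```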
### Training Hyperparameters

#### CPT Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch size (per device) | 1 |
| Effective batch size | 8 |
| Epochs | 4 |
| Learning rate | 2e-4 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Logging steps | 10 |
| Checkpoint saving | Every 500 steps |
| Checkpoint limit | 2 |
#### SFT Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch size (per device) | 4 |
| Gradient accumulation | 1 |
| Effective batch size | 4 |
| Epochs | 5 |
| Learning rate | 5e-5 |
| LR scheduler | Linear |
| Warmup ratio | 6% |
| Weight decay | 1e-4 |
| Max gradient norm | 1.0 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Checkpoint saving | Every 50 steps |
| Checkpoint limit | 2 |
| Experiment tracking | Weights & Biases |
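As a reading aid, the SFT settings above correspond roughly to the following `transformers` `TrainingArguments`; the output directory and anything not listed in the table are assumptions, and the actual training script is not part of this card.

```python
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="phi2-drug-sft",        # placeholder path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    num_train_epochs=5,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    fp16=True,
    optim="paged_adamw_8bit",
    save_steps=50,
    save_total_limit=2,
    report_to="wandb",                 # Weights & Biases experiment tracking
)
```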
## Evaluation

### Testing Data
Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.
### Metrics
Evaluation Method: LLM-as-a-Judge (Google Gemini)
- Binary classification: Factually Correct / Hallucinated
- Three evaluation batches
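The judging prompt is not included in this card. The sketch below shows one way the binary check could be posed to Gemini via the `google-generativeai` SDK; the model name, prompt wording, and API-key handling are assumptions.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
judge = genai.GenerativeModel("gemini-1.5-flash")  # assumed Gemini variant

def judge_description(drug_name: str, generated: str) -> str:
    prompt = (
        "You are verifying a drug description for factual accuracy.\n"
        f"Drug: {drug_name}\n"
        f"Description: {generated}\n"
        "Answer with exactly one word: CORRECT or HALLUCINATED."
    )
    return judge.generate_content(prompt).text.strip()
```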
### Results

#### Batch 1
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 24 | 96% |
| Hallucinated / Failed | 1 | 4% |
#### Batch 2
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 22 | 88% |
| Hallucinated / Failed | 3 | 12% |
#### Batch 3
| Category | Count | Percentage |
|---|---|---|
| Total Drugs Evaluated | 10 | 100% |
| Factually Correct | 10 | 100% |
| Hallucinated / Failed | 0 | 0% |
#### Summary
Across the three evaluation batches, 56 of 60 generated descriptions (about 93%) were judged factually correct. Because the model was fine-tuned with LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is difficult: LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model's internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation.
## Environmental Impact
- Hardware Type: NVIDIA T4 GPU
- Hours used: Not recorded
- Cloud Provider: Google Colab
- Compute Region: Not specified
- Carbon Emitted: Not estimated
## Technical Specifications

### Model Architecture and Objective
- Base model: Microsoft Phi-2
- Objective: Instruction-following text generation
- Adaptation method: LoRA (PEFT)
### Compute Infrastructure

#### Hardware
- NVIDIA T4 GPU
#### Software
- Transformers
- PEFT
- PyTorch
## Model Card Contact
Atharva Gaykar
### Framework Versions
- PEFT 0.18.0