---
license: mit
datasets:
  - SURESHBEEKHANI/medical-reasoning-orpo
language:
  - en
base_model:
  - unsloth/gemma-2b-bnb-4bit
pipeline_tag: question-answering
metrics:
  - accuracy
---

# Model Card: Gemma 2B Medical ORPO RLHF Fine-Tuning

## Model Description

This model is a fine-tuned version of Gemma 2B, aligned with ORPO (Odds Ratio Preference Optimization), a preference-optimization approach related to RLHF, to enhance its medical reasoning capabilities. Fine-tuning uses a medical-reasoning preference dataset to improve decision-making and contextual understanding for healthcare-related queries.

## Intended Use

This model is designed for:

- Assisting in medical reasoning and diagnosis
- Enhancing clinical decision support
- Providing explanations for medical queries
- Research and educational purposes in the medical field

**Limitations:**

- Not a substitute for professional medical advice.
- May contain biases inherited from the training dataset.
- Performance depends on prompt formulation.

## Training Details

- **Dataset:** SURESHBEEKHANI/medical-reasoning-orpo
- **Training steps:** 30 (demo setting; increase for full training)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 4
- **Optimizer:** AdamW (8-bit)
- **Learning-rate scheduler:** linear
- **Precision:** mixed (bfloat16 or float16, depending on hardware support)
- **Quantization:** 4-bit base model; saved quantized variants: q4_k_m, q8_0, q5_k_m
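For reference, the hyperparameters above can be collected into a single configuration mapping. This is only a sketch: the key names mirror common `transformers`/`trl` argument conventions and are not taken from the actual training script.

```python
# Hyperparameters from the Training Details list above, as a plain mapping.
# Key names follow common TrainingArguments conventions (an assumption);
# the real training script may name them differently.
training_config = {
    "max_steps": 30,                   # demo setting; increase for full training
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_8bit",
    "lr_scheduler_type": "linear",
    "load_in_4bit": True,              # 4-bit quantized base model
}

# Effective batch size seen by the optimizer per update step:
effective_batch = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch)  # → 4
```

With gradient accumulation, each optimizer update aggregates gradients from 4 micro-batches, so the effective batch size is 4 despite the per-device batch size of 1.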

## Model Performance

The model was evaluated on:

- Accuracy in medical reasoning tasks
- Fluency of generated responses
- Coherence and factual correctness
- Comparison with baseline medical AI models
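The accuracy metric above is typically the fraction of model answers matching the reference answers. A minimal sketch follows; `preds` and `refs` are hypothetical placeholders, and real medical-QA evaluations often use normalized or semantic matching rather than the exact string comparison shown here.

```python
def accuracy(predictions, references):
    """Fraction of predictions that match their reference after simple normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical example answers (not from the actual evaluation set):
preds = ["measles", "migraine", "influenza"]
refs  = ["Measles", "tension headache", "influenza"]
print(accuracy(preds, refs))  # → 0.6666666666666666 (2 of 3 match after normalization)
```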

## Ethical Considerations

- The model should not be used to make actual medical decisions without professional oversight.
- Biases in medical datasets may lead to inaccurate or misleading outputs.
- Always verify responses with medical professionals before acting on them.

## How to Use

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and its tokenizer in 4-bit precision.
model, tokenizer = FastLanguageModel.from_pretrained(
    "SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

prompt = (
    "### Instruction: Diagnose the following symptoms...\n"
    "### Input: Fever, headache, and rash\n"
    "### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
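The Alpaca-style prompt layout in the example can be assembled with a small helper. This helper is hypothetical: the `### Instruction` / `### Input` / `### Response` template is inferred from the example prompt, so verify it against the dataset's actual formatting before relying on it.

```python
def build_prompt(instruction, input_text=""):
    """Assemble an Alpaca-style prompt matching the usage example above.

    The template is inferred from the example prompt, not taken from the
    training code; confirm it matches the dataset's actual format.
    """
    prompt = f"### Instruction: {instruction}\n"
    if input_text:
        prompt += f"### Input: {input_text}\n"
    prompt += "### Response:"
    return prompt

print(build_prompt("Diagnose the following symptoms...",
                   "Fever, headache, and rash"))
```

Keeping inference prompts in the same format the model saw during fine-tuning generally matters for output quality, which is why the Limitations section notes that performance depends on prompt formulation.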

## Citation

If you use this model, please cite:

```bibtex
@misc{gemma2b_orpo_medical,
  author    = {Suresh Beekhanii},
  title     = {Fine-Tuning Gemma 2B for Medical Reasoning using ORPO RLHF},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning}
}
```

## Contact

For any issues or questions, please contact Suresh Beekhanii or open an issue in the Hugging Face repository.