MDCAT-Llama3.2-3B

This is a 4-bit quantized version of LLaMA 3.2 3B, fine-tuned for answering MDCAT (Medical and Dental College Admission Test) questions. It uses Parameter-Efficient Fine-Tuning (PEFT) with QLoRA to provide accurate responses to MDCAT-related queries in biology, chemistry, physics, and medical topics, while refusing non-MDCAT questions.

Model Details

Model Description

Designed to assist MDCAT students, this model delivers precise answers within its domain and rejects off-topic queries. It's quantized to 4-bit precision for efficiency.

  • Developed by: abdullah1101
  • Model type: Text generation (causal language model)
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: meta-llama/Llama-3.2-3B
  • Size: 2.35 GB (4-bit quantized)
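
The card states the model was fine-tuned with QLoRA, but the exact training recipe is not published. The sketch below shows a typical QLoRA setup with PEFT and bitsandbytes; all hyperparameters (r, lora_alpha, target_modules, dropout) are illustrative assumptions, not the values used for this model.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base model, as is standard practice for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter on the attention projections; these settings are guesses
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()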


Uses

Direct Use

Use the model via the Hugging Face Inference API (once it is picked up by a provider) or load it locally for MDCAT question answering.
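
Once an inference provider serves the model, a call through huggingface_hub might look like the following. This is a minimal sketch; availability depends on provider support.

from huggingface_hub import InferenceClient

client = InferenceClient(model="abdullah1101/MDCAT-Llama3.2-3B")
# The model expects the "Question: ...\nAnswer: " prompt format
answer = client.text_generation(
    "Question: What is the function of the liver?\nAnswer: ",
    max_new_tokens=200,
)
print(answer)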

Local Usage Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 4-bit precision with bitsandbytes
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "abdullah1101/MDCAT-Llama3.2-3B",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("abdullah1101/MDCAT-Llama3.2-3B")

# Prompt format used by the model: "Question: ...\nAnswer: "
inputs = tokenizer("Question: What is the function of the liver?\nAnswer: ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
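
For repeated queries it can help to wrap the prompt template in a small function. ask_mdcat below is a hypothetical convenience helper, not part of the model's published API; it assumes the model and tokenizer loaded above.

def ask_mdcat(question, max_new_tokens=200):
    # Build the prompt in the "Question: ...\nAnswer: " format shown above
    prompt = f"Question: {question}\nAnswer: "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(ask_mdcat("What is the function of the liver?"))  # in-domain query
print(ask_mdcat("Who directed Inception?"))             # off-topic: the model is expected to refuse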