MDCAT-Llama3.2-3B

This is a 4-bit quantized version of LLaMA 3.2 3B, fine-tuned for answering MDCAT (Medical and Dental College Admission Test) questions. It uses Parameter-Efficient Fine-Tuning (PEFT) with QLoRA to provide accurate responses to MDCAT-related queries in biology, chemistry, physics, and medical topics, while refusing non-MDCAT questions.

Model Details

Model Description

Designed to assist MDCAT students, this model delivers precise answers within its domain and rejects off-topic queries. It's quantized to 4-bit precision for efficiency.

  • Developed by: abdullah1101
  • Model type: Text generation (causal language model)
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: meta-llama/Llama-3.2-3B
  • Size: 2.35 GB (4-bit quantized)
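
The card states the model was fine-tuned with QLoRA, but the exact training recipe is not published. The sketch below shows a typical QLoRA setup with PEFT and bitsandbytes; all hyperparameters (r, lora_alpha, target_modules, dropout) are illustrative assumptions, not the values used for this model.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base model, as is standard practice for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter on the attention projections; these settings are guesses
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()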


Uses

Direct Use

Use the model via the Hugging Face Inference API (once it is picked up by a provider) or load it locally for MDCAT question answering.
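
Once an inference provider serves the model, a call through huggingface_hub might look like the following. This is a minimal sketch; availability depends on provider support.

from huggingface_hub import InferenceClient

client = InferenceClient(model="abdullah1101/MDCAT-Llama3.2-3B")
# The model expects the "Question: ...\nAnswer: " prompt format
answer = client.text_generation(
    "Question: What is the function of the liver?\nAnswer: ",
    max_new_tokens=200,
)
print(answer)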

Local Usage Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 4-bit precision with bitsandbytes
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "abdullah1101/MDCAT-Llama3.2-3B",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("abdullah1101/MDCAT-Llama3.2-3B")

# Prompt format used by the model: "Question: ...\nAnswer: "
inputs = tokenizer("Question: What is the function of the liver?\nAnswer: ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
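
For repeated queries it can help to wrap the prompt template in a small function. ask_mdcat below is a hypothetical convenience helper, not part of the model's published API; it assumes the model and tokenizer loaded above.

def ask_mdcat(question, max_new_tokens=200):
    # Build the prompt in the "Question: ...\nAnswer: " format shown above
    prompt = f"Question: {question}\nAnswer: "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(ask_mdcat("What is the function of the liver?"))  # in-domain query
print(ask_mdcat("Who directed Inception?"))             # off-topic: the model is expected to refuse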