---
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B
tags:
- text-generation
- mdcat
- medical
- education
license: apache-2.0
---

# MDCAT-Llama3.2-3B

This is a 4-bit quantized version of LLaMA 3.2 3B, fine-tuned for answering MDCAT (Medical and Dental College Admission Test) questions. It uses Parameter-Efficient Fine-Tuning (PEFT) with QLoRA to provide accurate responses to MDCAT-related queries in biology, chemistry, physics, and medical topics, while refusing non-MDCAT questions.

## Model Details

### Model Description

Designed to assist MDCAT students, this model delivers precise answers within its domain and rejects off-topic queries. It is quantized to 4-bit precision for efficiency.

- **Developed by:** abdullah1101
- **Model type:** Text generation (causal language model)
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from:** meta-llama/Llama-3.2-3B
- **Size:** 2.35 GB (4-bit quantized)

### Model Sources

- **Repository:** https://huggingface.co/abdullah1101/MDCAT-Llama3.2-3B

## Uses

### Direct Use

Use the model via the Hugging Face Inference API (once processed) or load it locally for MDCAT question answering.

#### Local Usage Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 4-bit precision with float16 compute
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "abdullah1101/MDCAT-Llama3.2-3B",
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("abdullah1101/MDCAT-Llama3.2-3B")

# Ask an MDCAT-style question and generate an answer
inputs = tokenizer("Question: What is the function of the liver?\nAnswer: ", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
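
#### Inference API Example (sketch)

The Direct Use section also mentions the Hugging Face Inference API. The following is a minimal sketch, assuming the model has been processed by the hosted Inference API and that `huggingface_hub` is installed; the token value and prompt are placeholders.

```python
# Hypothetical sketch: querying the hosted Inference API via huggingface_hub.
# Assumes the model is available on the Inference API and that "hf_xxx" is
# replaced with a valid access token.
from huggingface_hub import InferenceClient

client = InferenceClient(model="abdullah1101/MDCAT-Llama3.2-3B", token="hf_xxx")

prompt = "Question: What is the function of the liver?\nAnswer: "
# Send a plain text-generation request to the endpoint
response = client.text_generation(prompt, max_new_tokens=200)
print(response)
```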