PaliGemma 3B Fine-tuned for Chart Classification (Vision + Language)

Fine-tuning of the complete PaliGemma model, with LoRA adapters on:

✅ Language Model (LoRA rank 8)
✅ Vision Tower (LoRA rank 4)
✅ Multi-Modal Projector (LoRA rank 8)
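To give a feel for what the different ranks cost, here is a minimal sketch of how many trainable parameters a LoRA adapter adds to one linear layer. A rank-r adapter on a layer with `d_in` inputs and `d_out` outputs adds two matrices, A (r × d_in) and B (d_out × r). The function name `lora_param_count` and the layer sizes below are illustrative assumptions, not values taken from this model's training configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d_in -> d_out linear layer.

    LoRA learns A (rank x d_in) and B (d_out x rank), so the total is
    rank * (d_in + d_out). The frozen base weight is not counted.
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_in + d_out)


# Hypothetical layer sizes, chosen only for illustration.
rank8_square = lora_param_count(2048, 2048, rank=8)
rank4_square = lora_param_count(1152, 1152, rank=4)

print(rank8_square)  # 8 * (2048 + 2048) = 32768
print(rank4_square)  # 4 * (1152 + 1152) = 9216
```

Halving the rank halves the adapter size for a given layer, which is one plausible reason to use a smaller rank on the vision tower than on the language model.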

Usage

from transformers import PaliGemmaProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel
import torch

# Load the base model
model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-448",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load the LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, "PessoniHugo/paligemma_residencia_vision_v2")
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-448")

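# Inference
image = ...  # PIL Image
question = "answer What type of chart is this?"
# Move inputs to the model's device; only floating-point tensors are cast to bfloat16
inputs = processor(text=question, images=image, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=20)
# generate() returns prompt + completion, so decode only the newly generated tokens
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_len:], skip_special_tokens=True))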

Statistics

Trainable parameters: 13,071,424
Trainable percentage: 0.75%
Epochs: 2
Effective batch size: 16
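An effective batch size of 16 is normally the product of the per-device batch size, the number of gradient accumulation steps, and the number of devices. The card does not say how the 16 was split, so the combinations below are hypothetical; the helper name `effective_batch_size` is likewise an assumption made for this sketch.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    if min(per_device, grad_accum_steps, num_devices) < 1:
        raise ValueError("all arguments must be >= 1")
    return per_device * grad_accum_steps * num_devices


# Hypothetical splits that all yield an effective batch size of 16.
candidates = [(4, 4, 1), (2, 8, 1), (1, 16, 1), (8, 1, 2)]
for per_device, accum, devices in candidates:
    total = effective_batch_size(per_device, accum, devices)
    print(f"{per_device} x {accum} x {devices} = {total}")
```

On a single memory-constrained GPU, a small per-device batch combined with more accumulation steps is the usual way to reach a target like this.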
