πŸ₯˜ Llama-3.2-11B-Vision-Food-mini

A highly optimized, 16-bit merged Vision-Language Model (VLM) fine-tuned for Indian Food Recognition and Analysis.

This model is a fine-tuned, 16-bit merged version of unsloth/Llama-3.2-11B-Vision-Instruct. It was trained using the Unsloth library for rapid and memory-efficient QLoRA fine-tuning. The model specializes in multimodal tasks related to Indian food, such as visual classification, detailed description, and Visual Question Answering (VQA).


πŸš€ Model Details

| Property | Value |
|----------|-------|
| Model Type | Vision-Language Model (VLM) |
| Base Model | Meta Llama-3.2-11B-Vision-Instruct |
| Fine-Tuning Method | Unsloth QLoRA (Merged to 16-bit) |
| Parameters | 11 Billion |
| Quantization | Full 16-bit (merged from 4-bit LoRA) |
| Owner | Anachronox39 |
| Repo ID | Anachronox39/Llama-3.2-11B-Vision-Food-mini |

Intended Use

This model is intended for research, academic, and non-commercial applications focused on:

  • Indian Food Classification/Recognition from images.
  • Multimodal Food Description (e.g., generating detailed captions based on an image).
  • Visual Question Answering (VQA) on Indian food content (e.g., "What region is this dish from?").
  • Note: Commercial use is strictly governed by the Llama 3.2 Community License.

Training Data

  • Dataset Used: SKSKCulinaryAI/Indian_Food_Images_Dataset
  • Training Hardware: 2X Dual NVIDIA GeForce RTX 3060 GPUs

πŸ’» How to Use for Inference

The model is loaded with FastVisionModel from the unsloth library for optimized inference.
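
Below is a minimal inference sketch. The image path ("dish.jpg"), the question text, and the generation settings are placeholders; a CUDA GPU with enough memory for an 11B model in bf16 is assumed.

```python
from unsloth import FastVisionModel
from PIL import Image

# Load the merged 16-bit model (no 4-bit quantization needed for the released weights).
model, tokenizer = FastVisionModel.from_pretrained(
    "Anachronox39/Llama-3.2-11B-Vision-Food-mini",
    load_in_4bit=False,
)
FastVisionModel.for_inference(model)  # switch Unsloth into inference mode

image = Image.open("dish.jpg")  # placeholder: an image of an Indian dish
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What Indian dish is this, and which region is it from?"},
    ]}
]

# Build the chat prompt and tokenize it together with the image.
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

output = model.generate(**inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```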
