Llama-3.2-11B-Vision-Food-mini
An Unsloth-optimized, 16-bit merged Vision-Language Model (VLM) fine-tuned for Indian Food Recognition and Analysis.
This model is a fine-tuned, 16-bit merged version of unsloth/Llama-3.2-11B-Vision-Instruct. It was trained using the Unsloth library for rapid and memory-efficient QLoRA fine-tuning. The model specializes in multimodal tasks related to Indian food, such as visual classification, detailed description, and Visual Question Answering (VQA).
Model Details
| Property | Value |
|---|---|
| Model Type | Vision-Language Model (VLM) |
| Base Model | Meta Llama-3.2-11B-Vision-Instruct |
| Fine-Tuning Method | Unsloth QLoRA (Merged to 16-bit) |
| Parameters | 11 Billion |
| Quantization | Full 16-bit (merged from 4-bit LoRA) |
| Owner | Anachronox39 |
| Repo ID | Anachronox39/Llama-3.2-11B-Vision-Food-mini |
Intended Use
This model is intended for research, academic and non-commercial applications focused on:
- Indian Food Classification/Recognition from images.
- Multimodal Food Description (e.g., generating detailed captions based on an image).
- Visual Question Answering (VQA) on Indian food content (e.g., "What region is this dish from?").
- Note: Commercial use is strictly governed by the Llama 3.2 Community License.
Training Data
- Dataset Used: SKSKCulinaryAI/Indian_Food_Images_Dataset
- Training Hardware: Dual NVIDIA GeForce RTX 3060 GPUs
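For reference, the snippet below is a minimal sketch of how a QLoRA fine-tune of this kind is set up with Unsloth's FastVisionModel and then merged to 16-bit. The LoRA hyperparameters (r, lora_alpha, lora_dropout) and the output directory are illustrative assumptions rather than the exact settings used for this model, and the training loop on the dataset above is omitted.

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit for memory-efficient QLoRA fine-tuning
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# Attach LoRA adapters to the vision and language towers
# (r / lora_alpha / dropout values are placeholders, not this model's settings)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
)

# ... fine-tune on SKSKCulinaryAI/Indian_Food_Images_Dataset (training loop omitted) ...

# Merge the 4-bit LoRA adapters into full 16-bit weights and save
model.save_pretrained_merged(
    "Llama-3.2-11B-Vision-Food-mini",  # hypothetical output directory
    tokenizer,
    save_method="merged_16bit",
)
```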
How to Use for Inference
The model is loaded using the FastVisionModel from the unsloth library for optimized inference performance.
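Below is a minimal inference sketch. It assumes a CUDA-capable GPU; the image path, prompt, and generation settings are placeholders to adapt to your own use case.

```python
from unsloth import FastVisionModel
from PIL import Image

# Load the merged 16-bit model; no 4-bit quantization is needed at inference time
model, tokenizer = FastVisionModel.from_pretrained(
    "Anachronox39/Llama-3.2-11B-Vision-Food-mini",
    load_in_4bit=False,
)
FastVisionModel.for_inference(model)  # enable Unsloth's optimized inference mode

image = Image.open("dish.jpg")  # placeholder path to a food image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What Indian dish is this, and what region is it from?"},
    ]},
]

# Build the multimodal prompt and run generation
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

output = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the repository contains fully merged 16-bit weights, no separate LoRA adapter loading step is required.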