AskAnythingInCharts-Qwen2.5-7B
A Qwen2.5-VL-7B-Instruct model fine-tuned with LoRA (Low-Rank Adaptation) on the ChartQA dataset, optimized for chart understanding tasks.
Model Details
- Base Model: Qwen/Qwen2.5-VL-7B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: ChartQA (chart understanding benchmark)
- Accuracy: 66.0% on the ChartQA validation set (+8.5 percentage points over the base model)
Performance Comparison
| Model | ChartQA Accuracy | Improvement |
|---|---|---|
| Qwen2.5-VL-7B-Instruct (base) | 57.5% | - |
| AskAnythingInCharts-Qwen2.5-7B | 66.0% | +8.5 pts |
Training Configuration
- Epochs: 6
- Learning Rate: 4e-5
- LoRA Rank: 64
- LoRA Alpha: 16
- Target Modules: Vision and language attention layers
- Batch Size: 1 (with gradient accumulation)
- Optimizer: AdamW with fused implementation
- Scheduler: Cosine learning rate schedule
- Hardware: GPU with 16GB+ VRAM
- Framework: HuggingFace Transformers + PEFT + DeepSpeed (see the configuration sketch below)
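The hyperparameters above map naturally onto PEFT and Transformers configuration objects. A minimal sketch follows; the `target_modules` names, gradient-accumulation steps, and output directory are illustrative assumptions, not the exact training script:

```python
# Hedged configuration sketch: maps the listed hyperparameters onto
# peft.LoraConfig and transformers.TrainingArguments. Values marked
# "assumed" are illustrative, not taken from the actual training run.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                     # LoRA rank
    lora_alpha=16,            # LoRA alpha
    target_modules=[          # assumed attention projections (vision + language)
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",               # assumed
    num_train_epochs=6,
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,      # assumed accumulation factor
    optim="adamw_torch_fused",          # fused AdamW
    lr_scheduler_type="cosine",
    bf16=True,
    # deepspeed="ds_config.json",       # DeepSpeed config path, if used (assumed)
)
```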
Key Improvements
The fine-tuned model shows significant improvements in:
- ✅ Concise Answers: Returns exact values without verbose explanations
- ✅ Label Recognition: Better at reading text labels from charts
- ✅ Color Identification: More accurate at identifying chart colors
- ✅ Statistical Calculations: Improved at medians, ratios, differences
- ✅ Counting: Better accuracy in counting chart elements
- ✅ Region Comparison: Accurate comparisons across chart regions
- ✅ Yes/No Questions: More reliable binary responses
Usage
Direct Usage
```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load the base model
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)

# Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA")
model = model.merge_and_unload()

# Load the processor
processor = AutoProcessor.from_pretrained("prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA")

# Inference
image = Image.open("chart.png")
question = "What is the highest value in the chart?"
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": question},
        {"type": "image", "image": image},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Process the inputs and generate a response
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so only the generated answer is decoded
answer_ids = generated_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```
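Note: `merge_and_unload()` folds the LoRA weights into the base model, so inference runs at full speed without PEFT overhead; skip the merge if you prefer to keep the adapter separate and hot-swappable.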
Try the Demo
- Interactive Demo: HuggingFace Spaces
- GitHub Repository: AskAnythingInCharts-Qwen2.5-7B
Training Data
The model was fine-tuned on the ChartQA dataset, which contains:
- Chart images from various sources
- Questions about chart content
- Ground truth answers
- Multiple chart types (bar charts, line graphs, pie charts, etc.)
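For exploration, ChartQA is available on the Hugging Face Hub. A minimal loading sketch is below, assuming the commonly used `HuggingFaceM4/ChartQA` mirror; the exact copy used for this fine-tune is not specified here:

```python
# Hedged sketch: load a ChartQA mirror from the Hub. The dataset id and
# column names below are assumptions based on a common public mirror.
from datasets import load_dataset

dataset = load_dataset("HuggingFaceM4/ChartQA")
sample = dataset["train"][0]
print(sample["query"])   # the question about the chart (assumed column name)
print(sample["label"])   # the ground-truth answer (assumed column name)
sample["image"].show()   # the chart image (PIL)
```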
Evaluation
- Evaluation Set: ChartQA validation split (500 examples)
- Metric: Exact Match with answer normalization and numeric tolerance (sketched below)
- Filtering: Only genuine improvements counted; verbose-but-correct base-model answers were excluded
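A minimal sketch of a relaxed exact-match metric of this kind, assuming lower-cased string normalization and a 5% relative tolerance on numeric answers (the common ChartQA convention; the exact normalization used here is an assumption):

```python
# Hedged sketch of relaxed exact match: normalize both strings, then
# compare numerically with a relative tolerance when both parse as numbers.
def relaxed_match(pred: str, gold: str, rel_tol: float = 0.05) -> bool:
    norm = lambda s: s.strip().lower().rstrip(".").replace(",", "").replace("%", "")
    p, g = norm(pred), norm(gold)
    try:
        pv, gv = float(p), float(g)
        # Numeric answers: match if within rel_tol of the gold value
        return abs(pv - gv) <= rel_tol * abs(gv) if gv != 0 else pv == gv
    except ValueError:
        # Non-numeric answers fall back to exact string match
        return p == g
```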
Limitations
- Primarily optimized for chart understanding tasks
- May not perform as well on general vision-language tasks
- Requires GPU with sufficient VRAM for inference
- Performance may vary on chart types not well-represented in training data
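If VRAM is tight, 4-bit quantized loading with bitsandbytes is one option; a hedged sketch follows. Merging an adapter into quantized base weights has caveats, so the simpler path is to keep the adapter un-merged in this setup:

```python
# Hedged sketch: 4-bit loading to reduce VRAM requirements.
from peft import PeftModel
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype="bfloat16")
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
# Keep the adapter un-merged when the base weights are quantized
model = PeftModel.from_pretrained(base_model, "prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA")
```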
Citation
If you use this model in your research, please cite:
```bibtex
@misc{askanything-charts-qwen2.5,
  title={AskAnythingInCharts-Qwen2.5-7B: Fine-tuned Qwen2.5-VL for Chart Understanding},
  author={Prakash Chandra Chhipa},
  year={2025},
  url={https://huggingface.co/prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA}
}
```
License
This model is released under the MIT License. The base Qwen2.5-VL model is subject to its own license terms.
Contact
- Author: Prakash Chandra Chhipa
- Portfolio: prakashchhipa.github.io
- GitHub: @prakashchhipa
Built with ❤️ using Qwen2.5-VL and HuggingFace Transformers