Tags: Image-Text-to-Text, Transformers, Safetensors, GGUF, English, qwen2_5_vl, image-to-text, remyx, qwen2.5-vl, spatial-reasoning, multimodal, vlm, vqasynth, thinking, reasoning, test-time-compute, robotics, embodied-ai, quantitative-spatial-reasoning, distance-estimation, visual-question-answering, conversational, Eval Results, text-generation-inference
File size: 135 Bytes
Commit: 834c2a8
version https://git-lfs.github.com/spec/v1
oid sha256:59a57561032bbe89397fd35b956e76f7d5e63587c4d38f98390250f2df5e8516
size 1929903424
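
The three lines above are a Git LFS pointer, not the model weights themselves: `version` names the pointer spec, `oid` gives the SHA-256 digest of the real file, and `size` gives its length in bytes (1929903424 bytes, roughly 1.8 GiB). As a minimal sketch of how such a pointer could be read, assuming the simple `key value` layout shown here, the snippet below splits it into fields. The function name `parse_lfs_pointer` is hypothetical and is not part of Git LFS or any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer of the 'key value' form shown above.

    Hypothetical helper for illustration only; it handles just the
    three keys present in this pointer (version, oid, size).
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# The pointer text exactly as it appears above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:59a57561032bbe89397fd35b956e76f7d5e63587c4d38f98390250f2df5e8516
size 1929903424
"""

info = parse_lfs_pointer(POINTER)
size_gib = info["size"] / 2**30  # convert bytes to GiB

print(f"hash algorithm: {info['hash_algo']}")
print(f"digest: {info['digest']}")
print(f"target size: {info['size']} bytes ({size_gib:.2f} GiB)")
```

The pointer text itself is 135 bytes, which matches the file size reported above; the `size` field describes the much larger file that the pointer stands in for.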