Vision, Language and Reading

non-profit

https://www.vlr.ai/

AI & ML interests

Multimodal AI, Document Understanding, Reading Systems.

Recent Activity

emanuelevivoli authored a paper 17 days ago

CoSMo: A Multimodal Transformer for Page Stream Segmentation in Comic Books

emanuelevivoli authored a paper 17 days ago

Multimodal Transformer for Comics Text-Cloze

Llabres updated a dataset about 1 month ago

VLR-CVC/ComicsPAP

Papers

ComicsPAP: understanding comic strips by picking the correct panel

One missing piece in Vision and Language: A Survey on Comics Understanding

Organization Card

Vision, Language, and Reading Group

Based at the Computer Vision Center (CVC) in Barcelona, Spain.

The VLR research team conducts fundamental research and technology transfer at the intersection of vision, language, and reading systems. We devise reading systems for text in the wild and incorporate scene-text semantics into computer vision tasks such as captioning, visual question answering, cross-modal retrieval, and fine-grained classification. In parallel, we advance document understanding, with a special interest in end-to-end approaches for Document Visual Question Answering.

models 2

VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP

Updated Apr 9

Collections 3

VLR-CVC/ComicsPAP

ComicsPAP: understanding comic strips by picking the correct panel

VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP

VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP

DocVQA: A Dataset for VQA on Document Images

Hierarchical multimodal transformers for Multi-Page DocVQA

InfographicVQA

models 2

VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP

VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP

datasets 1

VLR-CVC/ComicsPAP

Team members 8
