arxiv:2204.03162

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Published on Apr 7, 2022


Authors: Tristan Thrush, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela

AI-generated summary

Winoground evaluates the ability of vision and language models to perform visio-linguistic compositional reasoning. It pairs images with captions that contain the same words in a different order, and its results reveal limitations in current models.

Abstract

We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two images and two captions, the goal is to match them correctly. Crucially, both captions contain a completely identical set of words, only in a different order. The dataset was carefully hand-curated by expert annotators and is labeled with a rich set of fine-grained tags to assist in analyzing model performance. We probe a diverse range of state-of-the-art vision and language models and find that, surprisingly, none of them do much better than chance. Evidently, these models are not as skilled at visio-linguistic compositional reasoning as we might have hoped. We perform an extensive analysis to obtain insights into how future work might try to mitigate these models' shortcomings. We aim for Winoground to serve as a useful evaluation set for advancing the state of the art and driving further progress in the field. The dataset is available at https://huggingface.co/datasets/facebook/winoground.
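The abstract describes the matching task only at a high level. What follows is a minimal sketch, not the authors' evaluation code, and every helper name in it is hypothetical.

- `same_words` checks the defining property that one caption is a reordering of the other. It lowercases the text and tokenizes on whitespace, which is an assumption.
- `score_example` and `aggregate` implement the text, image and group scores described in the full paper, as we understand them.
- The model is abstracted away as a 2x2 similarity matrix `sim[c][i]`, where caption k is the correct match for image k.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # sim[c][i]: score of caption c against image i


def same_words(caption_a: str, caption_b: str) -> bool:
    """True if both captions use the same multiset of words (assumed:
    lowercased, whitespace-tokenized), i.e. one is a reordering of the other."""
    return Counter(caption_a.lower().split()) == Counter(caption_b.lower().split())


def score_example(sim: Matrix) -> Tuple[bool, bool, bool]:
    """Return (text_ok, image_ok, group_ok) for one example.

    text: for each image, its correct caption scores higher than the other caption.
    image: for each caption, its correct image scores higher than the other image.
    group: both conditions hold.
    """
    text_ok = sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]
    image_ok = sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]
    return text_ok, image_ok, text_ok and image_ok


def aggregate(sims: Sequence[Matrix]) -> Dict[str, float]:
    """Fraction of examples passing each score."""
    results = [score_example(s) for s in sims]
    n = len(results)
    return {
        "text": sum(r[0] for r in results) / n,
        "image": sum(r[1] for r in results) / n,
        "group": sum(r[2] for r in results) / n,
    }


if __name__ == "__main__":
    # Illustrative caption pair, not taken from the dataset.
    print(same_words("a dog chases a cat", "a cat chases a dog"))  # True
    # Made-up similarity matrices, not real model outputs.
    sims = [
        [[0.9, 0.2], [0.3, 0.8]],  # every comparison correct
        [[0.5, 0.1], [0.6, 0.9]],  # image condition holds, text condition fails
    ]
    print(aggregate(sims))  # {'text': 0.5, 'image': 1.0, 'group': 0.5}
```

Under i.i.d. continuous random similarities, each of the text and image scores has an expected value of 1/4. The group score has an expected value of 1/6, because the two diagonal entries must be the two largest of the four. These values give a sense of what "chance" means in this setting.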

