Commit 06fb61a
Parent: 61db6a0
Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ You can use the raw model for masked language modeling, but it's mostly intended
Note that this model is primarily aimed at being fine-tuned on tasks such as visuo-linguistic sequence classification or visual question answering. We fine-tuned this model on a multi-translated version of the visual question answering task, [VQA v2](https://visualqa.org/challenge.html). Since Conceptual-12M is a dataset scraped from the internet, it contains biases that will also affect all fine-tuned versions of this model.

### How to use❓

- You can use this model directly with a pipeline for masked language modeling:
+ You can use this model directly with a pipeline for masked language modeling. You will need to clone the model repository from [here](https://github.com/gchhablani/multilingual-vqa). An example of usage is shown below:
```python
>>> from torchvision.io import read_image
>>> import numpy as np
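# NOTE: the diff hunk ends here, so the rest of the example is not shown.
# Below is a minimal, hedged sketch of how the snippet might continue. The
# model class, its module path, the tokenizer checkpoint, and the pretrained
# checkpoint name are assumptions (not confirmed by this diff); adapt them to
# the cloned multilingual-vqa repository's actual README.
>>> from transformers import CLIPProcessor, BertTokenizerFast
>>> from models.flax_clip_vision_bert.modeling_clip_vision_bert import FlaxCLIPVisionBertForMaskedLM  # import path assumed
>>> img = read_image("path/to/image.jpg")  # CHW uint8 tensor
>>> clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> pixel_values = clip_processor(images=img, return_tensors="np")["pixel_values"]
>>> pixel_values = pixel_values.transpose(0, 2, 3, 1)  # NCHW -> NHWC, if the Flax model expects channels-last input
>>> tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
>>> tokens = tokenizer(["¿Qué hay en la [MASK]?"], return_tensors="np")  # multilingual masked prompt
>>> model = FlaxCLIPVisionBertForMaskedLM.from_pretrained("flax-community/multilingual-vqa")  # checkpoint name assumed
>>> outputs = model(pixel_values=pixel_values, **tokens)  # assumes a standard MaskedLM output with .logits
>>> mask_pos = int(np.where(tokens["input_ids"][0] == tokenizer.mask_token_id)[0][0])
>>> print(tokenizer.decode([int(outputs.logits[0, mask_pos].argmax(-1))]))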