Felix1023 and nielsr (HF Staff) committed
Commit 97b57d6 · verified · 1 parent: 0378ef1

Add pipeline tag and fix image path in model card (#1)

- Add pipeline tag and fix image path in model card (1523c27143502a99ce97b59f5b2bb1758f13a7a7)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+10, -10)
README.md CHANGED:

```diff
@@ -1,27 +1,27 @@
 ---
-license: other
-license_name: other
-license_link: https://github.com/TencentARC/TokLIP/blob/main/LICENSE
-language:
-- en
 base_model:
 - google/siglip2-so400m-patch16-384
 - google/siglip2-so400m-patch16-256
+language:
+- en
+license: other
+license_name: other
+license_link: https://github.com/TencentARC/TokLIP/blob/main/LICENSE
+pipeline_tag: image-text-to-text
 tags:
 - Tokenizer
 - CLIP
 - UnifiedMLLM
 ---
 
-
 # TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
 
 <h5 align="center">
 
 [![arXiv](https://img.shields.io/badge/TokLIP-2505.05422-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2505.05422)
 [![GitHub](https://img.shields.io/badge/GitHub-Code-green?logo=github)](https://github.com/TencentARC/TokLIP)
-[![HuggingFace](https://img.shields.io/badge/🤗%20Model-Huggingface-yellow)](https://huggingface.co/TencentARC/TokLIP)
-[![License](https://img.shields.io/badge/⚖️%20Code%20License-Other-blue)](https://github.com/TencentARC/TokLIP/blob/main/LICENSE)
+[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Model-Huggingface-yellow)](https://huggingface.co/TencentARC/TokLIP)
+[![License](https://img.shields.io/badge/%E2%9A%96%EF%B8%8F%20Code%20License-Other-blue)](https://github.com/TencentARC/TokLIP/blob/main/LICENSE)
 <br>
 
 </h5>
@@ -41,7 +41,7 @@ Your star means a lot to us in developing this project! ⭐⭐⭐
 
 ## 👀 Introduction
 
-<img src="./TokLIP.png" alt="TokLIP" style="zoom:50%;" />
+<img src="https://raw.githubusercontent.com/TencentARC/TokLIP/main/docs/TokLIP.png" alt="TokLIP" style="zoom:50%;" />
 
 - We introduce TokLIP, a visual tokenizer that enhances comprehension by **semanticizing** vector-quantized (VQ) tokens and **incorporating CLIP-level semantics** while enabling end-to-end multimodal autoregressive training with standard VQ tokens.
 
@@ -137,4 +137,4 @@ Please cite our work if you use our code or discuss our findings in your own res
 journal={arXiv preprint arXiv:2505.05422},
 year={2025}
 }
-```
+```
```
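
The badge edits above replace raw emoji in the shields.io URLs with their percent-encoded forms. As a quick sanity check (a minimal sketch using Python's standard library, not part of the commit itself), `urllib.parse.quote` reproduces exactly the encodings that appear in the new badge lines:

```python
from urllib.parse import quote

# Percent-encode the badge label text the same way the updated
# shields.io URLs do; raw emoji in a URL path can break badge rendering.
print(quote("🤗 Model"))         # %F0%9F%A4%97%20Model
print(quote("⚖️ Code License"))  # %E2%9A%96%EF%B8%8F%20Code%20License
```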
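
Once this commit is live, the two substantive changes can be checked from outside the repo; a minimal sketch, assuming the public `huggingface_hub` and `requests` APIs and that the merge has propagated:

```python
import requests
from huggingface_hub import model_info

# The new front-matter key should surface as the model's pipeline tag.
info = model_info("TencentARC/TokLIP")
print(info.pipeline_tag)  # expected: image-text-to-text

# The image now uses an absolute raw.githubusercontent.com URL, so it
# should resolve outside a repository checkout as well.
url = "https://raw.githubusercontent.com/TencentARC/TokLIP/main/docs/TokLIP.png"
print(requests.head(url, allow_redirects=True).status_code)  # expected: 200
```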