Update README.md
README.md CHANGED
@@ -14,9 +14,9 @@ base_model_relation: finetune
 
 # InternViT-6B-448px-V1-5
 
-[\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
+[\[GitHub\]](https://github.com/OpenGVLab/InternVL) [\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
 
-[\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start)
+[\[Chat Demo\]](https://internvl.opengvlab.com/) [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start) [\[Chinese Interpretation\]](https://zhuanlan.zhihu.com/p/706547971) [\[Documents\]](https://internvl.readthedocs.io/en/latest/)
 
 We develop InternViT-6B-448px-V1-5 by continuing the pre-training of the strong foundation [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). In this update, the resolution of training images is expanded from fixed 448×448 to dynamic 448×448, where the basic tile size is 448×448 and the number of tiles ranges from 1 to 12.
 Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our model.
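To make the dynamic-resolution scheme described in the updated paragraph concrete, here is a minimal Python sketch of the idea: pick the tile grid (1 to 12 tiles of 448×448) whose aspect ratio best matches the input image, resize the image to that grid, and crop it into tiles. The names `dynamic_tiles` and `example.jpg` are illustrative, and real preprocessing such as InternVL's also weighs image area when ranking grids and appends a thumbnail tile; this is not the repository's exact code.

```python
# Sketch of dynamic 448x448 tiling: choose a (cols, rows) grid of 448x448
# tiles (1-12 tiles total) closest to the image's aspect ratio, then crop.
from PIL import Image

TILE = 448
MAX_TILES = 12

def candidate_grids(max_tiles=MAX_TILES):
    # All (cols, rows) grids using between 1 and max_tiles tiles.
    return [(c, r) for c in range(1, max_tiles + 1)
                   for r in range(1, max_tiles + 1)
                   if c * r <= max_tiles]

def dynamic_tiles(image):
    w, h = image.size
    aspect = w / h
    # Grid whose aspect ratio (cols/rows) best matches the image's.
    cols, rows = min(candidate_grids(), key=lambda g: abs(g[0] / g[1] - aspect))
    resized = image.resize((cols * TILE, rows * TILE))
    return [resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
            for r in range(rows) for c in range(cols)]

tiles = dynamic_tiles(Image.open('example.jpg').convert('RGB'))  # placeholder path
print(len(tiles), tiles[0].size)  # between 1 and 12 tiles, each 448x448
```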
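The new `[Quick Start]` anchor points at the model card's usage section. As a hedged sketch of how a remote-code vision encoder like this one is typically loaded with Hugging Face Transformers (the processor class, dtype, and example path below are assumptions, not a quote of that section):

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load the vision encoder; trust_remote_code is needed because the
# architecture is defined in the model repository, not in transformers.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-5',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

# CLIP-style preprocessing to 448x448 pixel values (assumed processor).
image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5')
image = Image.open('example.jpg').convert('RGB')  # placeholder path
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

outputs = model(pixel_values)  # image features
```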