Update README.md

README.md

```bash
accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
    --batch_size=1
```

## Quick Start Guide

### 1. 🐳 Docker (Recommended)

We strongly recommend using the Docker environment for a seamless experience. The following instructions are tailored for an A100 80GB GPU environment.

```bash
# Clone repository
git clone https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5.git
cd LLaVA-OneVision-1.5

docker build -t llava_megatron:25.04 .

# Run container with -w to set working directory directly to the mounted volume
docker run -it --gpus all \
    --ipc host --net host --privileged --cap-add IPC_LOCK \
    --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    -v $(pwd):/workspace/LLaVA-OneVision-1.5 \
    -w /workspace/LLaVA-OneVision-1.5 \
    --name "llava_megatron_container" \
    llava_megatron:25.04 /bin/bash
```
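
Once inside the container, you can optionally confirm that the GPUs are visible before continuing:

```bash
# Should list the available A100 GPUs
nvidia-smi
```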

### 2. Checkpoint and Format Conversion

You have two options to get started with LLaVA-OneVision-1.5-stage-0:

#### Option 1: Download the pre-trained model from HuggingFace

Download our `LLaVA-OneVision-1.5-4B-stage0` model directly from [HuggingFace](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-stage0).
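
For example, the checkpoint can be fetched with the Hugging Face CLI (the local directory name below simply matches what the later commands expect):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download lmms-lab/LLaVA-OneVision-1.5-4B-stage0 \
    --local-dir LLaVA-OneVision-1.5-4B-stage0
```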

#### Option 2: Merge the initial weights yourself

Alternatively, you can merge the initial weights from the original ViT and LLM:

```bash
python ds/merge_model.py \
    --vit_path DeepGlint-AI/rice-vit-large-patch14-560 \
    --llm_path Qwen/Qwen3-4B-Instruct-2507 \
    --output LLaVA-OneVision-1.5-4B-stage0
```

Note: when merging weights, the adapter component is initialized with default values.

Convert the model from HuggingFace format to Megatron format (the trailing `1 1` arguments set the tensor- and pipeline-parallel sizes, matching the `tp1_pp1` output name):

```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 bash examples/llava_ov_1_5/convert/convert_4b_hf_to_mcore.sh \
    LLaVA-OneVision-1.5-4B-stage0 \
    LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
    1 1
```
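
If the conversion succeeds, the output directory should contain a Megatron-style checkpoint; the exact layout below is an assumption based on standard Megatron conventions:

```bash
ls LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1
# Typically an iter_*/ directory plus latest_checkpointed_iteration.txt
```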

### 3. Stage 1 Alignment Training

Download the alignment data from [LLaVA-558K-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-558K-Webdataset).
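
As with the model, the dataset can be fetched with the Hugging Face CLI (note the `--repo-type dataset` flag); the datasets in steps 4 and 5 can be downloaded the same way:

```bash
huggingface-cli download lmms-lab/LLaVA-558K-Webdataset \
    --repo-type dataset --local-dir LLaVA-558K-Webdataset
```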

```bash
# ============================================================
# Required environment variables:
#   AIAK_TRAINING_PATH  Root directory of the AIAK-Training-LLM project
#   DATA_PATH           Directory with WebDataset shards (.tar) for pretraining
#   TOKENIZER_PATH      Hugging Face tokenizer directory
#   CHECKPOINT_PATH     Megatron-formatted checkpoint directory (e.g., mcore TP1/PP1)
#   SAVE_CKPT_PATH      Output directory for saving training checkpoints
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-558K-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
bash examples/llava_ov_1_5/quick_start/stage_1_alignment_llava_ov_4b.sh
```
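
`SAVE_CKPT_PATH` is listed as required but not set above; judging by the checkpoint path consumed in step 4, the script appears to default to saving under `stage_1_alignment_llava_ov_4b`. To save somewhere else, set it explicitly, for example:

```bash
# Same launch as above with an explicit (illustrative) save directory
SAVE_CKPT_PATH=/workspace/LLaVA-OneVision-1.5/stage_1_alignment_llava_ov_4b \
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-558K-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
bash examples/llava_ov_1_5/quick_start/stage_1_alignment_llava_ov_4b.sh
```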

### 4. Stage 1.5 Mid-Training

Download our lightweight packed subset from [LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Mid-Training-Webdataset-Quick-Start-3M).

```bash
# ============================================================
# Convert the stage-1 checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1_alignment_llava_ov_4b/iter_0002500/ \
    stage_1_alignment_llava_ov_4b_release 1 1
# ============================================================
# Launch mid-training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1_alignment_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_1.5_mid_training_llava_ov_4b.sh
```

### 5. Stage 2 Instruct Training

Download the LLaVA-NeXT-780K webdataset from [LLaVA-NeXT-780K Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-780k-webdataset).

```bash
# ============================================================
# Convert the mid-training checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1.5_mid_training_llava_ov_4b/iter_0020000/ \
    stage_1.5_mid_training_llava_ov_4b_release 1 1
# ============================================================
# Launch instruct training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-NeXT-780k-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1.5_mid_training_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_2_instruct_llava_ov_4b.sh
```

### 6. Convert mcore to HuggingFace

```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_hf.sh \
    stage_2_instruct_llava_ov_4b/iter_0003500 \
    LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct \
    1 1
# Copy non-model files (e.g., tokenizer config) to the new directory
find LLaVA-OneVision-1.5-4B-stage0/ -type f -not -iname '*safetensors*' -exec cp {} LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct/ ';'
```
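
Before evaluating, you can optionally smoke-test the exported checkpoint by loading its tokenizer (this assumes a recent `transformers`; the full model is exercised by `lmms-eval` in the next step):

```bash
python -c "from transformers import AutoTokenizer; print(AutoTokenizer.from_pretrained('LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct'))"
```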

### 7. Evaluation

```bash
# pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
    --num_processes=4 --main_process_port 12399 -m lmms_eval --model=llava_onevision1_5 --batch_size=1 --tasks=mme \
    --model_args=pretrained=/workspace/LLaVA-OneVision-1.5/LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct,max_pixels=3240000
```
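
To run other benchmarks, pass a comma-separated list to `--tasks`; `lmms-eval` can also print the available task names:

```bash
python -m lmms_eval --tasks list
```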

## Full Reproduction Guide

> [!TIP]
> More detailed reproduction steps for the complete process will be provided once the dataset upload is complete.

### Mid-Training