Image-Text-to-Text · Transformers · TensorBoard · Safetensors · feature-extraction · conversational · custom_code

Commit d6fbfb7 (verified) · committed by xiangan · 1 parent: d51517f

Update README.md

Files changed (1): README.md (+0, -133)

README.md CHANGED
@@ -146,139 +146,6 @@ accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
  --batch_size=1
  ```

## Quick Start Guide

### 1. 🐳 Docker (Recommended)

We strongly recommend using the Docker environment for a seamless experience. The following instructions are tailored for an A100 80GB GPU environment.

```bash
# Clone repository
git clone https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5.git
cd LLaVA-OneVision-1.5

docker build -t llava_megatron:25.04 .

# Run container with -w to set the working directory directly to the mounted volume
docker run -it --gpus all \
    --ipc host --net host --privileged --cap-add IPC_LOCK \
    --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    -v $(pwd):/workspace/LLaVA-OneVision-1.5 \
    -w /workspace/LLaVA-OneVision-1.5 \
    --name "llava_megatron_container" \
    llava_megatron:25.04 /bin/bash
```
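Once inside the container, a quick way to confirm the GPUs are visible (a minimal check, assuming the NVIDIA container toolkit is set up on the host):

```bash
nvidia-smi
```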

### 2. Checkpoint and Format Conversion

You have two options to obtain the LLaVA-OneVision-1.5-4B-stage0 starting checkpoint:

#### Option 1: Download the pre-trained model from HuggingFace
Download our `LLaVA-OneVision-1.5-4B-stage0` model directly from [HuggingFace](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-stage0).
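If you prefer the command line, a sketch using the Hugging Face CLI (assuming `huggingface_hub` is installed; the `--local-dir` target below is only an example):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download lmms-lab/LLaVA-OneVision-1.5-4B-stage0 \
    --local-dir LLaVA-OneVision-1.5-4B-stage0
```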

#### Option 2: Merge the initial weights yourself
Alternatively, you can merge the initial weights from the original ViT and LLM:
```bash
python ds/merge_model.py \
    --vit_path DeepGlint-AI/rice-vit-large-patch14-560 \
    --llm_path Qwen/Qwen3-4B-Instruct-2507 \
    --output LLaVA-OneVision-1.5-4B-stage0
```
Note: when merging weights, the adapter component is initialized with default values.

Convert the model from HuggingFace format to Megatron format (the trailing `1 1` correspond to the tensor- and pipeline-parallel sizes, matching the `tp1_pp1` suffix of the output directory):

```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 bash examples/llava_ov_1_5/convert/convert_4b_hf_to_mcore.sh \
    LLaVA-OneVision-1.5-4B-stage0 \
    LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
    1 1
```

### 3. Stage 1 Alignment Training

Download the LLaVA-558K webdataset from [LLaVA-558K-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-558K-Webdataset).
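For example, with the Hugging Face CLI (a sketch, assuming `huggingface_hub` is installed; the same pattern works for the Stage 1.5 and Stage 2 datasets below):

```bash
huggingface-cli download --repo-type dataset lmms-lab/LLaVA-558K-Webdataset \
    --local-dir LLaVA-558K-Webdataset
```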

```bash
# ============================================================
# Required environment variables:
#   AIAK_TRAINING_PATH   Root directory of the AIAK-Training-LLM project
#   DATA_PATH            Directory with WebDataset shards (.tar) for pretraining
#   TOKENIZER_PATH       Hugging Face tokenizer directory
#   CHECKPOINT_PATH      Megatron-formatted checkpoint directory (e.g., mcore TP1/PP1)
#   SAVE_CKPT_PATH       Output directory for saving training checkpoints
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-558K-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
bash examples/llava_ov_1_5/quick_start/stage_1_alignment_llava_ov_4b.sh
```

### 4. Stage 1.5 Mid-Training

Download our lightweight packed subset from [LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Mid-Training-Webdataset-Quick-Start-3M).

```bash
# ============================================================
# Convert the Stage 1 checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1_alignment_llava_ov_4b/iter_0002500/ \
    stage_1_alignment_llava_ov_4b_release 1 1
# ============================================================
# Launch Stage 1.5 mid-training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1_alignment_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_1.5_mid_training_llava_ov_4b.sh
```

### 5. Stage 2 Instruct Training

Download the LLaVA-NeXT-780K webdataset from [LLaVA-NeXT-780K Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-780k-webdataset).

```bash
# ============================================================
# Convert the Stage 1.5 checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1.5_mid_training_llava_ov_4b/iter_0020000/ \
    stage_1.5_mid_training_llava_ov_4b_release 1 1
# ============================================================
# Launch Stage 2 instruct training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-NeXT-780k-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1.5_mid_training_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_2_instruct_llava_ov_4b.sh
```

### 6. Convert mcore to HuggingFace
```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_hf.sh \
    stage_2_instruct_llava_ov_4b/iter_0003500 \
    LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct \
    1 1
# Copy non-model files (e.g., tokenizer config) to the new directory
find LLaVA-OneVision-1.5-4B-stage0/ -type f -not -iname '*safetensors*' -exec cp {} LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct/ ';'
```
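As an optional sanity check (not part of the official scripts), you can verify that the copied config and tokenizer files load from the new directory; `trust_remote_code=True` is assumed because the repository ships custom code:

```bash
# Hypothetical quick check: the copied config/tokenizer files should load from the converted directory
python -c "from transformers import AutoConfig, AutoTokenizer; \
p='LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct'; \
AutoConfig.from_pretrained(p, trust_remote_code=True); \
AutoTokenizer.from_pretrained(p, trust_remote_code=True); print('ok')"
```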

### 7. Evaluation
```bash
# pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
    --num_processes=4 --main_process_port 12399 -m lmms_eval --model=llava_onevision1_5 --batch_size=1 --tasks=mme \
    --model_args=pretrained=/workspace/LLaVA-OneVision-1.5/LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct,max_pixels=3240000
```
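To score several benchmarks in one run, `--tasks` accepts a comma-separated list (a sketch; the extra task name below is illustrative, so check your lmms-eval installation for the exact task names available):

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
    --num_processes=4 --main_process_port 12399 -m lmms_eval --model=llava_onevision1_5 --batch_size=1 --tasks=mme,mmmu_val \
    --model_args=pretrained=/workspace/LLaVA-OneVision-1.5/LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct,max_pixels=3240000
```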

## Fully Reproducing Guide

> [!TIP]
> More detailed reproduction steps for the complete pipeline will be provided once the dataset upload is finished.


### Mid-Training
 