Image-Text-to-Text · Transformers · TensorBoard · Safetensors · feature-extraction · conversational · custom_code

Commit d6fbfb7 (verified) · committed by xiangan · 1 parent: d51517f

Update README.md

Files changed (1): README.md (+0, -133)

README.md CHANGED
@@ -146,139 +146,6 @@ accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
  --batch_size=1
  ```

## Quick Start Guide

### 1. 🐳 Docker (Recommended)

We strongly recommend using the Docker environment for a seamless experience. The following instructions are tailored for an A100 80GB GPU environment.

```bash
# Clone repository
git clone https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5.git
cd LLaVA-OneVision-1.5

docker build -t llava_megatron:25.04 .

# Run container with -w to set the working directory directly to the mounted volume
docker run -it --gpus all \
    --ipc host --net host --privileged --cap-add IPC_LOCK \
    --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    -v $(pwd):/workspace/LLaVA-OneVision-1.5 \
    -w /workspace/LLaVA-OneVision-1.5 \
    --name "llava_megatron_container" \
    llava_megatron:25.04 /bin/bash
```
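Once inside the container, a quick way to confirm the GPUs are visible (a minimal check, assuming the NVIDIA container toolkit is set up on the host):

```bash
nvidia-smi
```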

### 2. Checkpoint and Format Conversion

You have two options to obtain the LLaVA-OneVision-1.5-4B-stage0 starting checkpoint:

#### Option 1: Download the pre-trained model from HuggingFace
Download our `LLaVA-OneVision-1.5-4B-stage0` model directly from [HuggingFace](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-stage0).
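If you prefer the command line, a sketch using the Hugging Face CLI (assuming `huggingface_hub` is installed; the `--local-dir` target below is only an example):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download lmms-lab/LLaVA-OneVision-1.5-4B-stage0 \
    --local-dir LLaVA-OneVision-1.5-4B-stage0
```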

#### Option 2: Merge the initial weights yourself
Alternatively, you can merge the initial weights from the original ViT and LLM:
```bash
python ds/merge_model.py \
    --vit_path DeepGlint-AI/rice-vit-large-patch14-560 \
    --llm_path Qwen/Qwen3-4B-Instruct-2507 \
    --output LLaVA-OneVision-1.5-4B-stage0
```
Note: when merging weights, the adapter component is initialized with default values.

Convert the model from HuggingFace format to Megatron format (the trailing `1 1` correspond to the tensor- and pipeline-parallel sizes, matching the `tp1_pp1` suffix of the output directory):

```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 bash examples/llava_ov_1_5/convert/convert_4b_hf_to_mcore.sh \
    LLaVA-OneVision-1.5-4B-stage0 \
    LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
    1 1
```

### 3. Stage 1 Alignment Training

Download the LLaVA-558K webdataset from [LLaVA-558K-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-558K-Webdataset).
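For example, with the Hugging Face CLI (a sketch, assuming `huggingface_hub` is installed; the same pattern works for the Stage 1.5 and Stage 2 datasets below):

```bash
huggingface-cli download --repo-type dataset lmms-lab/LLaVA-558K-Webdataset \
    --local-dir LLaVA-558K-Webdataset
```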

```bash
# ============================================================
# Required environment variables:
#   AIAK_TRAINING_PATH   Root directory of the AIAK-Training-LLM project
#   DATA_PATH            Directory with WebDataset shards (.tar) for pretraining
#   TOKENIZER_PATH       Hugging Face tokenizer directory
#   CHECKPOINT_PATH      Megatron-formatted checkpoint directory (e.g., mcore TP1/PP1)
#   SAVE_CKPT_PATH       Output directory for saving training checkpoints
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-558K-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=LLaVA-OneVision-1.5-4B-stage0_mcore_tp1_pp1 \
bash examples/llava_ov_1_5/quick_start/stage_1_alignment_llava_ov_4b.sh
```

### 4. Stage 1.5 Mid-Training

Download our lightweight packed subset from [LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Mid-Training-Webdataset-Quick-Start-3M).

```bash
# ============================================================
# Convert the Stage 1 checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1_alignment_llava_ov_4b/iter_0002500/ \
    stage_1_alignment_llava_ov_4b_release 1 1
# ============================================================
# Launch Stage 1.5 mid-training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-OneVision-1.5-Mid-Training-Quick-Start-3M-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1_alignment_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_1.5_mid_training_llava_ov_4b.sh
```

### 5. Stage 2 Instruct Training

Download the LLaVA-NeXT-780K webdataset from [LLaVA-NeXT-780K Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-780k-webdataset).

```bash
# ============================================================
# Convert the Stage 1.5 checkpoint to release format
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_release.sh \
    stage_1.5_mid_training_llava_ov_4b/iter_0020000/ \
    stage_1.5_mid_training_llava_ov_4b_release 1 1
# ============================================================
# Launch Stage 2 instruct training
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
DATA_PATH=LLaVA-NeXT-780k-Webdataset \
TOKENIZER_PATH=LLaVA-OneVision-1.5-4B-stage0 \
CHECKPOINT_PATH=stage_1.5_mid_training_llava_ov_4b_release \
bash examples/llava_ov_1_5/quick_start/stage_2_instruct_llava_ov_4b.sh
```

### 6. Convert mcore to HuggingFace
```bash
AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_hf.sh \
    stage_2_instruct_llava_ov_4b/iter_0003500 \
    LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct \
    1 1
# Copy non-model files (e.g., tokenizer config) to the new directory
find LLaVA-OneVision-1.5-4B-stage0/ -type f -not -iname '*safetensors*' -exec cp {} LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct/ ';'
```
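As an optional sanity check (not part of the official scripts), you can verify that the copied config and tokenizer files load from the new directory; `trust_remote_code=True` is assumed because the repository ships custom code:

```bash
# Hypothetical quick check: the copied config/tokenizer files should load from the converted directory
python -c "from transformers import AutoConfig, AutoTokenizer; \
p='LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct'; \
AutoConfig.from_pretrained(p, trust_remote_code=True); \
AutoTokenizer.from_pretrained(p, trust_remote_code=True); print('ok')"
```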

### 7. Evaluation
```bash
# pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
    --num_processes=4 --main_process_port 12399 -m lmms_eval --model=llava_onevision1_5 --batch_size=1 --tasks=mme \
    --model_args=pretrained=/workspace/LLaVA-OneVision-1.5/LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct,max_pixels=3240000
```
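To score several benchmarks in one run, `--tasks` accepts a comma-separated list (a sketch; the extra task name below is illustrative, so check your lmms-eval installation for the exact task names available):

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
    --num_processes=4 --main_process_port 12399 -m lmms_eval --model=llava_onevision1_5 --batch_size=1 --tasks=mme,mmmu_val \
    --model_args=pretrained=/workspace/LLaVA-OneVision-1.5/LLaVA-OneVision-1.5-4B-3M-Mid-Training-780K-Instruct,max_pixels=3240000
```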

## Fully Reproducing Guide

> [!TIP]
> More detailed reproduction steps for the complete pipeline will be provided once the dataset upload is finished.


### Mid-Training
 