---
library_name: peft
license: gemma
base_model: google/gemma-3-4b-it
tags:
- axolotl
- base_model:adapter:google/gemma-3-4b-it
- lora
- transformers
datasets:
- vlm_data_2025101_1/gemma3-4b-v-KoV_0.0.0.jsonl
pipeline_tag: text-generation
model-index:
- name: outputs/gemma3-4b-v-KoV_0.0.0_w_lora_2.jsonl
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.12.2`

```yaml
# ===== Model =====
base_model: google/gemma-3-4b-it
processor_type: AutoProcessor
chat_template: gemma3

# Required flags for multimodal (vision-chat) training
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
#shuffle_merged_datasets: false
#shuffle_before_merging_datasets: false
# (default is false, but setting it explicitly is recommended)

ddp_find_unused_parameters: true
dataloader_num_workers: 0

# ===== Data =====
eot_tokens:
  -
datasets:
  - path: vlm_data_2025101_1/gemma3-4b-v-KoV_0.0.0.jsonl
    type: chat_template
    field_messages: messages
    split: null
val_set_size: 0.0
dataset_prepared_path:

# ===== Output / Logging =====
output_dir: ./outputs/gemma3-4b-v-KoV_0.0.0_w_lora_2.jsonl
logging_steps: 1

# wandb integration (change or comment out as needed)
wandb_entity: minkyun1
wandb_project: kisti_vlm_axo
wandb_name: gemma3-4b-v-KoV_0.0.0_w_lora_2.jsonl

# ===== LoRA / Quantization =====
adapter: lora
# As in LLaVA, apply LoRA only to the language-model projections (safe default)
lora_r: 128
lora_alpha: 256
lora_dropout: 0.05
lora_target_modules: "model.language_model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj"

# Memory headroom is ample, but start with 4-bit for stability
load_in_4bit: false
load_in_8bit: false

bf16: true
tf32: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true
eager_attention:

# ===== Optim & Train =====
optimizer: adamw_torch_fused
learning_rate: 4e-5
lr_scheduler: cosine
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0
seed: 42

sequence_len: 8192
pad_to_sequence_len: false
excess_length_strategy: drop

# Per-GPU micro-batch / accumulation → effective batch =
# micro_batch_size * gradient_accumulation_steps * GPUs = 1 * 16 * 8 GPUs = 128 in this run
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 5
evals_per_epoch: 1
saves_per_epoch: 1
# save_first_step: true

# ===== Multi-GPU: DeepSpeed (recommended) =====
# Download the DeepSpeed presets and use one:
#   axolotl fetch deepspeed_configs
# ZeRO-2 is fast and stable for 2×A100 80GB with a 7B model
deepspeed: ds_zero2.json

# ===== Debugging / reproducibility (optional) =====
# If multiprocess data preprocessing causes problems, lower this to 1 to isolate the cause
# dataset_processes: 1

# ===== [Alternative] FSDP2 settings (to use instead of DeepSpeed) =====
# fsdp_version: 2
# fsdp_config:
#   offload_params: false
#   cpu_ram_efficient_loading: true
#   auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   transformer_layer_cls_to_wrap: LlamaDecoderLayer
#   state_dict_type: FULL_STATE_DICT
#   reshard_after_forward: true
```
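To sanity-check which modules the `lora_target_modules` regex selects, you can test it against candidate module names. The sketch below assumes that axolotl forwards the string unchanged to PEFT, and that PEFT full-matches a string-valued `target_modules` against each module's dotted name. The module names are illustrative guesses at Gemma 3's layout in Transformers 4.55, not a dump of this checkpoint. YAML's double-quoted `\\d` becomes `\d` in the actual regex.

```python
import re

# Pattern from lora_target_modules; YAML's "\\d" unescapes to "\d".
# The unescaped "." matches any character, which is harmless for these names.
TARGET_PATTERN = r"model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj"

# Illustrative (assumed) module names, not read from the real model.
CANDIDATES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.33.mlp.down_proj",
    "model.language_model.layers.0.self_attn.q_norm",
    "model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
]


def is_lora_target(name: str, pattern: str = TARGET_PATTERN) -> bool:
    """Return True if the whole module name matches the pattern (re.fullmatch)."""
    return re.fullmatch(pattern, name) is not None


# Standard (non-rsLoRA) LoRA scales the low-rank update B @ A by alpha / r.
LORA_R, LORA_ALPHA = 128, 256
SCALING = LORA_ALPHA / LORA_R

if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{'LoRA' if is_lora_target(name) else 'skip':>4}  {name}")
    print("scaling =", SCALING)
```

Only the two language-model projections match. The vision tower and the norm layers get no adapter, which is consistent with the config comment about targeting only the language-model side. With `lora_alpha: 256` and `lora_r: 128`, the adapter output is scaled by 2.0.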

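The batch and schedule numbers in the config and in the reported training hyperparameters can be cross-checked with a little arithmetic. This sketch makes three assumptions:

- The effective batch is `micro_batch_size × gradient_accumulation_steps × number of GPUs`.
- Warmup steps are the floor of `warmup_ratio × training_steps`. This reproduces the reported 195, but the exact rounding axolotl uses is an assumption.
- `lr_scheduler: cosine` follows the usual linear warmup followed by a half-cosine decay to zero, as in Transformers' `get_cosine_schedule_with_warmup`.

The 8 GPUs and 3907 steps come from the training hyperparameters section.

```python
import math

# Values from the axolotl config and the reported training hyperparameters.
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
NUM_GPUS = 8
PEAK_LR = 4e-5
WARMUP_RATIO = 0.05
TRAINING_STEPS = 3907


def effective_batch_size(micro: int, accum: int, gpus: int) -> int:
    """Sequences contributing to one optimizer step across all GPUs."""
    return micro * accum * gpus


def warmup_steps(ratio: float, total_steps: int) -> int:
    # Assumption: the warmup length is truncated to an integer (floor).
    return int(ratio * total_steps)


def cosine_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 (assumed schedule)."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    w = warmup_steps(WARMUP_RATIO, TRAINING_STEPS)
    print("effective batch:", effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_GPUS))
    print("warmup steps:", w)
    for s in (0, w, TRAINING_STEPS // 2, TRAINING_STEPS):
        print(s, cosine_lr(s, PEAK_LR, w, TRAINING_STEPS))
```

With 16 accumulation steps on 8 GPUs the product is 128, matching the reported `total_train_batch_size`. Over 5 epochs, 3907 steps works out to roughly 781 optimizer steps per epoch.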
# outputs/gemma3-4b-v-KoV_0.0.0_w_lora_2.jsonl

This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) on the vlm_data_2025101_1/gemma3-4b-v-KoV_0.0.0.jsonl dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 195
- training_steps: 3907

### Training results

### Framework versions

- PEFT 0.17.0
- Transformers 4.55.2
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4