sunbv56 committed
Commit af9f175 · verified · 1 Parent(s): bbc601d

feat: Upload full training checkpoint for resume

Files changed (5)
  1. README.md +182 -47
  2. adapter_model.safetensors +1 -1
  3. optimizer.pt +1 -1
  4. scheduler.pt +0 -0
  5. trainer_state.json +308 -5
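
Together with the refreshed adapter weights, the optimizer.pt, scheduler.pt, and trainer_state.json files are what `transformers.Trainer` needs to continue this run in place rather than restart from the adapter alone. A minimal resume sketch, assuming the original run's setup (the `model` and `train_ds` objects and the output directory are stand-ins, not part of this commit):

```python
# Sketch only: `model` and `train_ds` stand in for the original run's
# PEFT-wrapped Qwen2.5-VL model and dataset. The output directory follows
# the checkpoint path recorded in trainer_state.json.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="./qwen2.5-vl-finetune-checkpoints")
trainer = Trainer(model=model, args=args, train_dataset=train_ds)

# resume_from_checkpoint=True locates the newest checkpoint-<step> directory
# and restores the adapter weights, optimizer.pt, scheduler.pt, and the step
# counter from trainer_state.json before continuing toward max_steps.
trainer.train(resume_from_checkpoint=True)
```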
README.md CHANGED
@@ -1,72 +1,207 @@
  ---
- library_name: peft
  base_model: Qwen/Qwen2.5-VL-3B-Instruct
  tags:
  - base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
  - lora
  - transformers
- pipeline_tag: text-generation
- model-index:
- - name: qwen2.5-vl-vqa-vibook-tmp
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # qwen2.5-vl-vqa-vibook-tmp

- This model is a fine-tuned version of [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1527

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - training_steps: 1576

- ### Training results

- | Training Loss | Epoch  | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.9777        | 0.1111 | 50   | 1.0407          |
- | 0.8787        | 0.2222 | 100  | 0.8106          |
- | 0.9219        | 0.3333 | 150  | 0.7609          |
- | 0.6949        | 0.4444 | 200  | 0.7009          |
- | 0.7088        | 0.5556 | 250  | 0.6456          |
- | 0.6903        | 0.6667 | 300  | 0.5962          |
- | 0.5669        | 0.7778 | 350  | 0.5696          |
- | 0.6577        | 0.8889 | 400  | 0.5607          |
- | 0.4788        | 1.0    | 450  | 0.5549          |

  ### Framework versions

- - PEFT 0.16.0
- - Transformers 4.53.3
- - Pytorch 2.6.0+cu124
- - Datasets 4.4.1
- - Tokenizers 0.21.2
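
For readers reconstructing the run, the removed hyperparameter list above maps roughly onto the `TrainingArguments` below. This is a hedged reconstruction, not the original training script; `output_dir` and the logging/eval/save intervals are taken from trainer_state.json elsewhere in this commit, and the eval strategy is an assumption.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above (assumed, not the
# original script).
args = TrainingArguments(
    output_dir="./qwen2.5-vl-finetune-checkpoints",
    learning_rate=1e-4,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=2,   # effective batch size 4 * 2 = 8
    seed=42,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=1576,
    logging_steps=10,                # from trainer_state.json
    eval_strategy="steps",           # assumed, so eval_steps applies
    eval_steps=50,
    save_steps=50,
)
```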
 
  ---
  base_model: Qwen/Qwen2.5-VL-3B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
  tags:
  - base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
  - lora
  - transformers
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]

  ### Framework versions

+ - PEFT 0.16.0
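
The new card's quick-start section is still a stub, and the repository ships only a LoRA adapter (see `adapter_model.safetensors` below), which must be attached to the base model for inference. A hedged loading sketch; the adapter repo id is inferred from the old card's model name and the committer, so treat it as a placeholder:

```python
import torch
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Base model and processor named in the card's front matter.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Attach the LoRA adapter; the repo id below is an assumed placeholder.
model = PeftModel.from_pretrained(base, "sunbv56/qwen2.5-vl-vqa-vibook-tmp")
model.eval()
```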
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07159a33ca2c1b778fa34d8d79942224bc0f31e7b0936740b7ffcb4734f9f89c
+ oid sha256:3750ccd57d3fdcb6b88d266ceb4058d9820139544a558a1849183cd4df3477ae
  size 148712776
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1ca0feda33341f6afe8006871a17ce269886994285439866daa5248ee77a7d5e
+ oid sha256:4e9c16c75f244fe4373934880e5b893fdc5bc9b875528012f878df42fdd3be53
  size 297808698
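
Both weight files above are stored as Git LFS pointers: only the `oid sha256:` line changes, while the byte size stays the same, which is consistent with overwriting tensors of identical shapes. A downloaded copy can be checked against its pointer with a short sketch like this (local path assumed):

```python
import hashlib

def lfs_sha256(path: str, chunk: int = 1 << 20) -> str:
    """Stream the file and return the sha256 hex digest used in the pointer's oid."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Expected value from the new adapter_model.safetensors pointer above.
assert lfs_sha256("adapter_model.safetensors") == (
    "3750ccd57d3fdcb6b88d266ceb4058d9820139544a558a1849183cd4df3477ae"
)
```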
scheduler.pt CHANGED
Binary files a/scheduler.pt and b/scheduler.pt differ
 
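The trainer_state.json diff below extends the file's `log_history` array with the step 1240-1576 records. Once the file is downloaded locally, the newly added eval curve can be read back out with a small sketch like this (filename assumed):

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# Keep only evaluation records, e.g. {"step": 1550, "eval_loss": 1.1526...}.
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
for step, loss in evals:
    print(f"step {step:>4}: eval_loss {loss:.4f}")
```
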
trainer_state.json CHANGED
@@ -2,9 +2,9 @@
  "best_global_step": 750,
  "best_metric": 0.48672306537628174,
  "best_model_checkpoint": "./qwen2.5-vl-finetune-checkpoints/checkpoint-750",
- "epoch": 2.7511111111111113,
  "eval_steps": 50,
- "global_step": 1238,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -1106,12 +1106,315 @@
  "train_runtime": 34829.2458,
  "train_samples_per_second": 0.284,
  "train_steps_per_second": 0.036
  }
  ],
  "logging_steps": 10,
- "max_steps": 1238,
  "num_input_tokens_seen": 0,
- "num_train_epochs": 3,
  "save_steps": 50,
  "stateful_callbacks": {
  "TrainerControl": {
@@ -1125,7 +1428,7 @@
  "attributes": {}
  }
  },
- "total_flos": 6.388579291468186e+16,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
 
  "best_global_step": 750,
  "best_metric": 0.48672306537628174,
  "best_model_checkpoint": "./qwen2.5-vl-finetune-checkpoints/checkpoint-750",
+ "epoch": 4.666666666666667,
  "eval_steps": 50,
+ "global_step": 1576,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,

  "train_runtime": 34829.2458,
  "train_samples_per_second": 0.284,
  "train_steps_per_second": 0.036
+ },
+ {
+ "epoch": 3.66962962962963,
+ "grad_norm": 14.275522232055664,
+ "learning_rate": 1.3300797847207797e-05,
+ "loss": 3.5621,
+ "step": 1240
+ },
+ {
+ "epoch": 3.699259259259259,
+ "grad_norm": 27.858943939208984,
+ "learning_rate": 1.2557515699430094e-05,
+ "loss": 4.3815,
+ "step": 1250
+ },
+ {
+ "epoch": 3.699259259259259,
+ "eval_loss": 2.2658419609069824,
+ "eval_runtime": 995.7526,
+ "eval_samples_per_second": 0.301,
+ "eval_steps_per_second": 0.075,
+ "step": 1250
+ },
+ {
+ "epoch": 3.728888888888889,
+ "grad_norm": 30.557031631469727,
+ "learning_rate": 1.1832611379355878e-05,
+ "loss": 3.2056,
+ "step": 1260
+ },
+ {
+ "epoch": 3.7585185185185184,
+ "grad_norm": 34.28306579589844,
+ "learning_rate": 1.1126440690477996e-05,
+ "loss": 2.8957,
+ "step": 1270
+ },
+ {
+ "epoch": 3.788148148148148,
+ "grad_norm": 29.017297744750977,
+ "learning_rate": 1.0439350241294566e-05,
+ "loss": 2.5225,
+ "step": 1280
+ },
+ {
+ "epoch": 3.8177777777777777,
+ "grad_norm": 23.32266616821289,
+ "learning_rate": 9.771677275183744e-06,
+ "loss": 2.6028,
+ "step": 1290
+ },
+ {
+ "epoch": 3.8474074074074074,
+ "grad_norm": 32.830848693847656,
+ "learning_rate": 9.123749504875135e-06,
+ "loss": 2.7177,
+ "step": 1300
+ },
+ {
+ "epoch": 3.8474074074074074,
+ "eval_loss": 1.3522464036941528,
+ "eval_runtime": 985.7859,
+ "eval_samples_per_second": 0.304,
+ "eval_steps_per_second": 0.076,
+ "step": 1300
+ },
+ {
+ "epoch": 3.877037037037037,
+ "grad_norm": 6.538234233856201,
+ "learning_rate": 8.495884951599142e-06,
+ "loss": 2.2624,
+ "step": 1310
+ },
+ {
+ "epoch": 3.9066666666666667,
+ "grad_norm": 19.523771286010742,
+ "learning_rate": 7.888391788993216e-06,
+ "loss": 2.6275,
+ "step": 1320
+ },
+ {
+ "epoch": 3.9362962962962964,
+ "grad_norm": 11.971488952636719,
+ "learning_rate": 7.301568191841457e-06,
+ "loss": 2.1496,
+ "step": 1330
+ },
+ {
+ "epoch": 3.965925925925926,
+ "grad_norm": 34.24433898925781,
+ "learning_rate": 6.735702189722115e-06,
+ "loss": 2.0774,
+ "step": 1340
+ },
+ {
+ "epoch": 3.9955555555555557,
+ "grad_norm": 12.619851112365723,
+ "learning_rate": 6.191071525634456e-06,
+ "loss": 2.0749,
+ "step": 1350
+ },
+ {
+ "epoch": 3.9955555555555557,
+ "eval_loss": 1.2665727138519287,
+ "eval_runtime": 972.1433,
+ "eval_samples_per_second": 0.309,
+ "eval_steps_per_second": 0.077,
+ "step": 1350
+ },
+ {
+ "epoch": 4.026666666666666,
+ "grad_norm": 21.63642692565918,
+ "learning_rate": 5.667943519674723e-06,
+ "loss": 2.2795,
+ "step": 1360
+ },
+ {
+ "epoch": 4.0562962962962965,
+ "grad_norm": 5.838581562042236,
+ "learning_rate": 5.166574937827867e-06,
+ "loss": 2.6146,
+ "step": 1370
+ },
+ {
+ "epoch": 4.085925925925926,
+ "grad_norm": 11.008721351623535,
+ "learning_rate": 4.687211865939539e-06,
+ "loss": 2.3045,
+ "step": 1380
+ },
+ {
+ "epoch": 4.115555555555556,
+ "grad_norm": 6.246650218963623,
+ "learning_rate": 4.2300895889302805e-06,
+ "loss": 1.823,
+ "step": 1390
+ },
+ {
+ "epoch": 4.145185185185185,
+ "grad_norm": 13.782442092895508,
+ "learning_rate": 3.7954324753109673e-06,
+ "loss": 2.2982,
+ "step": 1400
+ },
+ {
+ "epoch": 4.145185185185185,
+ "eval_loss": 1.2098972797393799,
+ "eval_runtime": 998.8662,
+ "eval_samples_per_second": 0.3,
+ "eval_steps_per_second": 0.075,
+ "step": 1400
+ },
+ {
+ "epoch": 4.174814814814815,
+ "grad_norm": 11.179134368896484,
+ "learning_rate": 3.383453867056452e-06,
+ "loss": 2.5618,
+ "step": 1410
+ },
+ {
+ "epoch": 4.204444444444444,
+ "grad_norm": 73.97550201416016,
+ "learning_rate": 2.9943559748912996e-06,
+ "loss": 1.8831,
+ "step": 1420
+ },
+ {
+ "epoch": 4.234074074074074,
+ "grad_norm": 17.907745361328125,
+ "learning_rate": 2.628329779039057e-06,
+ "loss": 2.2352,
+ "step": 1430
+ },
+ {
+ "epoch": 4.263703703703704,
+ "grad_norm": 81.71790313720703,
+ "learning_rate": 2.2855549354837912e-06,
+ "loss": 2.1651,
+ "step": 1440
+ },
+ {
+ "epoch": 4.293333333333333,
+ "grad_norm": 10.33467960357666,
+ "learning_rate": 1.9661996877898105e-06,
+ "loss": 1.7595,
+ "step": 1450
+ },
+ {
+ "epoch": 4.293333333333333,
+ "eval_loss": 1.1622637510299683,
+ "eval_runtime": 993.3397,
+ "eval_samples_per_second": 0.302,
+ "eval_steps_per_second": 0.076,
+ "step": 1450
+ },
+ {
+ "epoch": 4.322962962962963,
+ "grad_norm": 40.43919372558594,
+ "learning_rate": 1.6704207845230358e-06,
+ "loss": 1.9304,
+ "step": 1460
+ },
+ {
+ "epoch": 4.352592592592592,
+ "grad_norm": 10.497286796569824,
+ "learning_rate": 1.3983634023143511e-06,
+ "loss": 2.098,
+ "step": 1470
+ },
+ {
+ "epoch": 4.3822222222222225,
+ "grad_norm": 9.101359367370605,
+ "learning_rate": 1.1501610746028124e-06,
+ "loss": 1.8441,
+ "step": 1480
+ },
+ {
+ "epoch": 4.411851851851852,
+ "grad_norm": 20.517807006835938,
+ "learning_rate": 9.25935626093688e-07,
+ "loss": 2.3551,
+ "step": 1490
+ },
+ {
+ "epoch": 4.441481481481482,
+ "grad_norm": 7.981099605560303,
+ "learning_rate": 7.257971129634389e-07,
+ "loss": 1.6124,
+ "step": 1500
+ },
+ {
+ "epoch": 4.441481481481482,
+ "eval_loss": 1.1480356454849243,
+ "eval_runtime": 970.9195,
+ "eval_samples_per_second": 0.309,
+ "eval_steps_per_second": 0.077,
+ "step": 1500
+ },
+ {
+ "epoch": 4.471111111111111,
+ "grad_norm": 51.19599533081055,
+ "learning_rate": 5.498437688410463e-07,
+ "loss": 2.0946,
+ "step": 1510
+ },
+ {
+ "epoch": 4.50074074074074,
+ "grad_norm": 7.847194671630859,
+ "learning_rate": 3.981619565921968e-07,
+ "loss": 1.8896,
+ "step": 1520
+ },
+ {
+ "epoch": 4.53037037037037,
+ "grad_norm": 12.63452434539795,
+ "learning_rate": 2.708261259299072e-07,
+ "loss": 2.1132,
+ "step": 1530
+ },
+ {
+ "epoch": 4.5600000000000005,
+ "grad_norm": 8.711173057556152,
+ "learning_rate": 1.6789877687254928e-07,
+ "loss": 1.9074,
+ "step": 1540
+ },
+ {
+ "epoch": 4.58962962962963,
+ "grad_norm": 14.014768600463867,
+ "learning_rate": 8.943042906705001e-08,
+ "loss": 2.4591,
+ "step": 1550
+ },
+ {
+ "epoch": 4.58962962962963,
+ "eval_loss": 1.1526756286621094,
+ "eval_runtime": 1013.0536,
+ "eval_samples_per_second": 0.296,
+ "eval_steps_per_second": 0.074,
+ "step": 1550
+ },
+ {
+ "epoch": 4.619259259259259,
+ "grad_norm": 241.5323486328125,
+ "learning_rate": 3.545959699243207e-08,
+ "loss": 1.9968,
+ "step": 1560
+ },
+ {
+ "epoch": 4.648888888888889,
+ "grad_norm": 41.02328109741211,
+ "learning_rate": 6.0127710558133265e-09,
+ "loss": 1.9328,
+ "step": 1570
+ },
+ {
+ "epoch": 4.666666666666667,
+ "step": 1576,
+ "total_flos": 8.15036810717184e+16,
+ "train_loss": 0.49436442077462445,
+ "train_runtime": 26325.3193,
+ "train_samples_per_second": 0.479,
+ "train_steps_per_second": 0.06
  }
  ],
  "logging_steps": 10,
+ "max_steps": 1576,
  "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
  "save_steps": 50,
  "stateful_callbacks": {
  "TrainerControl": {

  "attributes": {}
  }
  },
+ "total_flos": 8.15036810717184e+16,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null