Upload folder using huggingface_hub

Files changed (12)

README.md +202 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0
merges.txt +0 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +24 -0
tokenizer_config.json +22 -0
trainer_state.json +2217 -0
training_args.bin +3 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2-medium
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "GPT2LMHeadModel",
+    "parent_library": "transformers.models.gpt2.modeling_gpt2"
+  },
+  "base_model_name_or_path": "gpt2-medium",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": null,
+  "use_dora": false,
+  "use_rslora": false
+}
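
The configuration above applies LoRA with `r` = 64 and `lora_alpha` = 128 to the `c_attn` projection only. The following is a rough sketch of what that implies for the scaling factor and the adapter size. It assumes the standard gpt2-medium dimensions (hidden size 1024, 24 transformer blocks, `c_attn` projecting to 3 × 1024 features), which are not stated anywhere in this commit, and it assumes the plain `lora_alpha / r` scaling that applies when `use_rslora` is false. The helper names are hypothetical.

```python
# Assumed gpt2-medium dimensions (NOT part of this commit).
HIDDEN_SIZE = 1024            # input features of c_attn
NUM_LAYERS = 24               # transformer blocks, one c_attn each
C_ATTN_OUT = 3 * HIDDEN_SIZE  # fused query/key/value projection

# Values taken from adapter_config.json.
R = 64
LORA_ALPHA = 128


def lora_scaling(alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update when use_rslora is false."""
    return alpha / r


def lora_param_count(in_features: int, out_features: int, r: int, num_layers: int) -> int:
    """Parameters in the A (r x in) and B (out x r) matrices across all layers."""
    per_layer = r * in_features + out_features * r
    return per_layer * num_layers


scaling = lora_scaling(LORA_ALPHA, R)
n_params = lora_param_count(HIDDEN_SIZE, C_ATTN_OUT, R, NUM_LAYERS)
fp32_bytes = 4 * n_params  # 4 bytes per float32 weight

print(f"scaling={scaling}, params={n_params}, fp32 bytes={fp32_bytes}")
```

Under these assumptions the adapter holds 6,291,456 weights, or 25,165,824 bytes in float32. That is close to the 25,172,088-byte size recorded for `adapter_model.safetensors` in this commit; the small remainder would plausibly be file header metadata.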

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:39118f7368d7b390b9e48c4e9a09b46e6fcdee5e50be90f61bfd90db2c637fa8
+size 25172088

merges.txt ADDED

The diff for this file is too large to render. See raw diff

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b60f982069b7054a7c3e41818e87b3ee2a8d276881f8d520b1f0c0d70d36309
+size 50372538

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e5994634e90ce1b45d012a45568063be05fea876e791cd66b48a4efc924164b2
+size 14244

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:392855cc9cbe029377262097ef598767921e2a3bc6937822c989a7603ee182c3
+size 1064

special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "50256": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}

trainer_state.json ADDED
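
The `learning_rate` values logged in `trainer_state.json` look consistent with a linear warmup followed by a linear decay. The sketch below reproduces a few of the logged values under assumptions inferred from the numbers themselves, not stated in the commit: a peak rate of 1e-4, 469 warmup steps, and 9375 total steps (three times the 3125 steps recorded at epoch 1.0). The function name and constants are hypothetical.

```python
import math

# Assumed schedule parameters, inferred from the logged values (NOT stated in the commit).
PEAK_LR = 1e-4
WARMUP_STEPS = 469
TOTAL_STEPS = 9375


def linear_schedule_lr(step: int,
                       peak_lr: float = PEAK_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    10: 2.132196162046908e-06,
    100: 2.1321961620469083e-05,
    470: 9.998877161464182e-05,
    1000: 9.40377273748035e-05,
}

matches = {
    step: math.isclose(linear_schedule_lr(step), lr, rel_tol=1e-6)
    for step, lr in logged.items()
}
print(matches)
```

If the assumed parameters are right, the checkpoint captured here sits one third of the way through the planned schedule, which would explain why the learning rate at the logged steps is still close to its peak.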

	@@ -0,0 +1,2217 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 3125,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0032,
+      "grad_norm": 10.0,
+      "learning_rate": 2.132196162046908e-06,
+      "loss": 22.4733,
+      "step": 10
+    },
+    {
+      "epoch": 0.0064,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 4.264392324093816e-06,
+      "loss": 18.8042,
+      "step": 20
+    },
+    {
+      "epoch": 0.0096,
+      "grad_norm": 10.000001907348633,
+      "learning_rate": 6.396588486140726e-06,
+      "loss": 19.6935,
+      "step": 30
+    },
+    {
+      "epoch": 0.0128,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 8.528784648187633e-06,
+      "loss": 19.4804,
+      "step": 40
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 10.000000953674316,
+      "learning_rate": 1.0660980810234541e-05,
+      "loss": 17.4569,
+      "step": 50
+    },
+    {
+      "epoch": 0.0192,
+      "grad_norm": 10.0,
+      "learning_rate": 1.2793176972281452e-05,
+      "loss": 18.3143,
+      "step": 60
+    },
+    {
+      "epoch": 0.0224,
+      "grad_norm": 10.0,
+      "learning_rate": 1.4925373134328357e-05,
+      "loss": 16.6743,
+      "step": 70
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 1.7057569296375266e-05,
+      "loss": 14.7713,
+      "step": 80
+    },
+    {
+      "epoch": 0.0288,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 1.9189765458422178e-05,
+      "loss": 14.883,
+      "step": 90
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 2.1321961620469083e-05,
+      "loss": 14.1393,
+      "step": 100
+    },
+    {
+      "epoch": 0.0352,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 2.345415778251599e-05,
+      "loss": 11.5208,
+      "step": 110
+    },
+    {
+      "epoch": 0.0384,
+      "grad_norm": 10.0,
+      "learning_rate": 2.5586353944562904e-05,
+      "loss": 12.865,
+      "step": 120
+    },
+    {
+      "epoch": 0.0416,
+      "grad_norm": 10.0,
+      "learning_rate": 2.771855010660981e-05,
+      "loss": 10.3148,
+      "step": 130
+    },
+    {
+      "epoch": 0.0448,
+      "grad_norm": 9.999998092651367,
+      "learning_rate": 2.9850746268656714e-05,
+      "loss": 9.7127,
+      "step": 140
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 9.999998092651367,
+      "learning_rate": 3.1982942430703626e-05,
+      "loss": 8.3006,
+      "step": 150
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 3.411513859275053e-05,
+      "loss": 7.292,
+      "step": 160
+    },
+    {
+      "epoch": 0.0544,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 3.624733475479744e-05,
+      "loss": 6.0268,
+      "step": 170
+    },
+    {
+      "epoch": 0.0576,
+      "grad_norm": 9.999998092651367,
+      "learning_rate": 3.8379530916844355e-05,
+      "loss": 5.7297,
+      "step": 180
+    },
+    {
+      "epoch": 0.0608,
+      "grad_norm": 10.0,
+      "learning_rate": 4.051172707889126e-05,
+      "loss": 4.6862,
+      "step": 190
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 9.999999046325684,
+      "learning_rate": 4.2643923240938166e-05,
+      "loss": 4.0643,
+      "step": 200
+    },
+    {
+      "epoch": 0.0672,
+      "grad_norm": 9.999998092651367,
+      "learning_rate": 4.477611940298508e-05,
+      "loss": 3.3502,
+      "step": 210
+    },
+    {
+      "epoch": 0.0704,
+      "grad_norm": 9.999998092651367,
+      "learning_rate": 4.690831556503198e-05,
+      "loss": 2.6549,
+      "step": 220
+    },
+    {
+      "epoch": 0.0736,
+      "grad_norm": 8.796483039855957,
+      "learning_rate": 4.904051172707889e-05,
+      "loss": 2.883,
+      "step": 230
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 8.624096870422363,
+      "learning_rate": 5.117270788912581e-05,
+      "loss": 2.3625,
+      "step": 240
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 4.12373161315918,
+      "learning_rate": 5.330490405117271e-05,
+      "loss": 2.0516,
+      "step": 250
+    },
+    {
+      "epoch": 0.0832,
+      "grad_norm": 6.344674110412598,
+      "learning_rate": 5.543710021321962e-05,
+      "loss": 1.8814,
+      "step": 260
+    },
+    {
+      "epoch": 0.0864,
+      "grad_norm": 5.940646171569824,
+      "learning_rate": 5.756929637526652e-05,
+      "loss": 1.8997,
+      "step": 270
+    },
+    {
+      "epoch": 0.0896,
+      "grad_norm": 4.136787414550781,
+      "learning_rate": 5.970149253731343e-05,
+      "loss": 1.6641,
+      "step": 280
+    },
+    {
+      "epoch": 0.0928,
+      "grad_norm": 3.336697578430176,
+      "learning_rate": 6.183368869936035e-05,
+      "loss": 1.5741,
+      "step": 290
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 4.003772735595703,
+      "learning_rate": 6.396588486140725e-05,
+      "loss": 1.4759,
+      "step": 300
+    },
+    {
+      "epoch": 0.0992,
+      "grad_norm": 5.183838367462158,
+      "learning_rate": 6.609808102345416e-05,
+      "loss": 1.5434,
+      "step": 310
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 4.6075849533081055,
+      "learning_rate": 6.823027718550106e-05,
+      "loss": 1.5185,
+      "step": 320
+    },
+    {
+      "epoch": 0.1056,
+      "grad_norm": 4.707767486572266,
+      "learning_rate": 7.036247334754798e-05,
+      "loss": 1.3706,
+      "step": 330
+    },
+    {
+      "epoch": 0.1088,
+      "grad_norm": 3.7312469482421875,
+      "learning_rate": 7.249466950959489e-05,
+      "loss": 1.3455,
+      "step": 340
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 3.605818033218384,
+      "learning_rate": 7.46268656716418e-05,
+      "loss": 1.2145,
+      "step": 350
+    },
+    {
+      "epoch": 0.1152,
+      "grad_norm": 4.285920143127441,
+      "learning_rate": 7.675906183368871e-05,
+      "loss": 1.303,
+      "step": 360
+    },
+    {
+      "epoch": 0.1184,
+      "grad_norm": 3.215698480606079,
+      "learning_rate": 7.889125799573562e-05,
+      "loss": 1.168,
+      "step": 370
+    },
+    {
+      "epoch": 0.1216,
+      "grad_norm": 4.043213844299316,
+      "learning_rate": 8.102345415778252e-05,
+      "loss": 1.0962,
+      "step": 380
+    },
+    {
+      "epoch": 0.1248,
+      "grad_norm": 4.1487555503845215,
+      "learning_rate": 8.315565031982943e-05,
+      "loss": 1.1853,
+      "step": 390
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 3.942502498626709,
+      "learning_rate": 8.528784648187633e-05,
+      "loss": 1.0418,
+      "step": 400
+    },
+    {
+      "epoch": 0.1312,
+      "grad_norm": 4.188830852508545,
+      "learning_rate": 8.742004264392325e-05,
+      "loss": 1.0992,
+      "step": 410
+    },
+    {
+      "epoch": 0.1344,
+      "grad_norm": 4.130038738250732,
+      "learning_rate": 8.955223880597016e-05,
+      "loss": 0.976,
+      "step": 420
+    },
+    {
+      "epoch": 0.1376,
+      "grad_norm": 3.643944501876831,
+      "learning_rate": 9.168443496801706e-05,
+      "loss": 1.0592,
+      "step": 430
+    },
+    {
+      "epoch": 0.1408,
+      "grad_norm": 3.760075092315674,
+      "learning_rate": 9.381663113006397e-05,
+      "loss": 1.0115,
+      "step": 440
+    },
+    {
+      "epoch": 0.144,
+      "grad_norm": 2.4692914485931396,
+      "learning_rate": 9.594882729211087e-05,
+      "loss": 1.0096,
+      "step": 450
+    },
+    {
+      "epoch": 0.1472,
+      "grad_norm": 3.1593716144561768,
+      "learning_rate": 9.808102345415778e-05,
+      "loss": 0.9181,
+      "step": 460
+    },
+    {
+      "epoch": 0.1504,
+      "grad_norm": 3.6118807792663574,
+      "learning_rate": 9.998877161464182e-05,
+      "loss": 0.921,
+      "step": 470
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 3.3329732418060303,
+      "learning_rate": 9.987648776105997e-05,
+      "loss": 0.8714,
+      "step": 480
+    },
+    {
+      "epoch": 0.1568,
+      "grad_norm": 2.9491333961486816,
+      "learning_rate": 9.97642039074781e-05,
+      "loss": 0.8553,
+      "step": 490
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 3.0345425605773926,
+      "learning_rate": 9.965192005389625e-05,
+      "loss": 0.9232,
+      "step": 500
+    },
+    {
+      "epoch": 0.1632,
+      "grad_norm": 3.3396542072296143,
+      "learning_rate": 9.95396362003144e-05,
+      "loss": 0.8112,
+      "step": 510
+    },
+    {
+      "epoch": 0.1664,
+      "grad_norm": 3.1262471675872803,
+      "learning_rate": 9.942735234673256e-05,
+      "loss": 0.8252,
+      "step": 520
+    },
+    {
+      "epoch": 0.1696,
+      "grad_norm": 2.431586742401123,
+      "learning_rate": 9.931506849315069e-05,
+      "loss": 0.7863,
+      "step": 530
+    },
+    {
+      "epoch": 0.1728,
+      "grad_norm": 3.4569954872131348,
+      "learning_rate": 9.920278463956883e-05,
+      "loss": 0.7916,
+      "step": 540
+    },
+    {
+      "epoch": 0.176,
+      "grad_norm": 3.2024927139282227,
+      "learning_rate": 9.909050078598698e-05,
+      "loss": 0.8355,
+      "step": 550
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 3.4040327072143555,
+      "learning_rate": 9.897821693240512e-05,
+      "loss": 0.8548,
+      "step": 560
+    },
+    {
+      "epoch": 0.1824,
+      "grad_norm": 2.5302040576934814,
+      "learning_rate": 9.886593307882327e-05,
+      "loss": 0.7237,
+      "step": 570
+    },
+    {
+      "epoch": 0.1856,
+      "grad_norm": 3.6245014667510986,
+      "learning_rate": 9.875364922524142e-05,
+      "loss": 0.7395,
+      "step": 580
+    },
+    {
+      "epoch": 0.1888,
+      "grad_norm": 2.6966214179992676,
+      "learning_rate": 9.864136537165956e-05,
+      "loss": 0.7004,
+      "step": 590
+    },
+    {
+      "epoch": 0.192,
+      "grad_norm": 3.207789421081543,
+      "learning_rate": 9.852908151807771e-05,
+      "loss": 0.7419,
+      "step": 600
+    },
+    {
+      "epoch": 0.1952,
+      "grad_norm": 3.4613256454467773,
+      "learning_rate": 9.841679766449586e-05,
+      "loss": 0.7264,
+      "step": 610
+    },
+    {
+      "epoch": 0.1984,
+      "grad_norm": 3.311279058456421,
+      "learning_rate": 9.8304513810914e-05,
+      "loss": 0.6852,
+      "step": 620
+    },
+    {
+      "epoch": 0.2016,
+      "grad_norm": 2.7546231746673584,
+      "learning_rate": 9.819222995733213e-05,
+      "loss": 0.6922,
+      "step": 630
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 3.5927953720092773,
+      "learning_rate": 9.807994610375028e-05,
+      "loss": 0.7391,
+      "step": 640
+    },
+    {
+      "epoch": 0.208,
+      "grad_norm": 2.975539207458496,
+      "learning_rate": 9.796766225016843e-05,
+      "loss": 0.699,
+      "step": 650
+    },
+    {
+      "epoch": 0.2112,
+      "grad_norm": 3.653235673904419,
+      "learning_rate": 9.785537839658657e-05,
+      "loss": 0.6669,
+      "step": 660
+    },
+    {
+      "epoch": 0.2144,
+      "grad_norm": 3.6133604049682617,
+      "learning_rate": 9.774309454300472e-05,
+      "loss": 0.7107,
+      "step": 670
+    },
+    {
+      "epoch": 0.2176,
+      "grad_norm": 2.855388641357422,
+      "learning_rate": 9.763081068942287e-05,
+      "loss": 0.6938,
+      "step": 680
+    },
+    {
+      "epoch": 0.2208,
+      "grad_norm": 2.790356159210205,
+      "learning_rate": 9.751852683584101e-05,
+      "loss": 0.6208,
+      "step": 690
+    },
+    {
+      "epoch": 0.224,
+      "grad_norm": 3.2559432983398438,
+      "learning_rate": 9.740624298225916e-05,
+      "loss": 0.6327,
+      "step": 700
+    },
+    {
+      "epoch": 0.2272,
+      "grad_norm": 2.8997764587402344,
+      "learning_rate": 9.729395912867731e-05,
+      "loss": 0.6176,
+      "step": 710
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 3.3574609756469727,
+      "learning_rate": 9.718167527509545e-05,
+      "loss": 0.601,
+      "step": 720
+    },
+    {
+      "epoch": 0.2336,
+      "grad_norm": 2.94797945022583,
+      "learning_rate": 9.706939142151358e-05,
+      "loss": 0.5904,
+      "step": 730
+    },
+    {
+      "epoch": 0.2368,
+      "grad_norm": 2.8964426517486572,
+      "learning_rate": 9.695710756793174e-05,
+      "loss": 0.5889,
+      "step": 740
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 2.505166530609131,
+      "learning_rate": 9.684482371434989e-05,
+      "loss": 0.5491,
+      "step": 750
+    },
+    {
+      "epoch": 0.2432,
+      "grad_norm": 2.8895745277404785,
+      "learning_rate": 9.673253986076802e-05,
+      "loss": 0.5751,
+      "step": 760
+    },
+    {
+      "epoch": 0.2464,
+      "grad_norm": 2.348055601119995,
+      "learning_rate": 9.662025600718617e-05,
+      "loss": 0.5702,
+      "step": 770
+    },
+    {
+      "epoch": 0.2496,
+      "grad_norm": 3.0331923961639404,
+      "learning_rate": 9.650797215360432e-05,
+      "loss": 0.5299,
+      "step": 780
+    },
+    {
+      "epoch": 0.2528,
+      "grad_norm": 2.881728172302246,
+      "learning_rate": 9.639568830002246e-05,
+      "loss": 0.4772,
+      "step": 790
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 2.9465715885162354,
+      "learning_rate": 9.628340444644061e-05,
+      "loss": 0.5137,
+      "step": 800
+    },
+    {
+      "epoch": 0.2592,
+      "grad_norm": 2.664348602294922,
+      "learning_rate": 9.617112059285875e-05,
+      "loss": 0.4398,
+      "step": 810
+    },
+    {
+      "epoch": 0.2624,
+      "grad_norm": 2.931985378265381,
+      "learning_rate": 9.605883673927689e-05,
+      "loss": 0.4969,
+      "step": 820
+    },
+    {
+      "epoch": 0.2656,
+      "grad_norm": 2.953338861465454,
+      "learning_rate": 9.594655288569504e-05,
+      "loss": 0.4829,
+      "step": 830
+    },
+    {
+      "epoch": 0.2688,
+      "grad_norm": 3.0215139389038086,
+      "learning_rate": 9.583426903211319e-05,
+      "loss": 0.463,
+      "step": 840
+    },
+    {
+      "epoch": 0.272,
+      "grad_norm": 3.6885106563568115,
+      "learning_rate": 9.572198517853134e-05,
+      "loss": 0.4601,
+      "step": 850
+    },
+    {
+      "epoch": 0.2752,
+      "grad_norm": 3.8273849487304688,
+      "learning_rate": 9.560970132494948e-05,
+      "loss": 0.5616,
+      "step": 860
+    },
+    {
+      "epoch": 0.2784,
+      "grad_norm": 3.266014337539673,
+      "learning_rate": 9.549741747136763e-05,
+      "loss": 0.4581,
+      "step": 870
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 2.304288387298584,
+      "learning_rate": 9.538513361778578e-05,
+      "loss": 0.4192,
+      "step": 880
+    },
+    {
+      "epoch": 0.2848,
+      "grad_norm": 2.7526509761810303,
+      "learning_rate": 9.527284976420391e-05,
+      "loss": 0.3809,
+      "step": 890
+    },
+    {
+      "epoch": 0.288,
+      "grad_norm": 3.033849000930786,
+      "learning_rate": 9.516056591062205e-05,
+      "loss": 0.3865,
+      "step": 900
+    },
+    {
+      "epoch": 0.2912,
+      "grad_norm": 2.9246816635131836,
+      "learning_rate": 9.50482820570402e-05,
+      "loss": 0.3565,
+      "step": 910
+    },
+    {
+      "epoch": 0.2944,
+      "grad_norm": 2.111208438873291,
+      "learning_rate": 9.493599820345834e-05,
+      "loss": 0.3507,
+      "step": 920
+    },
+    {
+      "epoch": 0.2976,
+      "grad_norm": 3.193631172180176,
+      "learning_rate": 9.482371434987649e-05,
+      "loss": 0.3136,
+      "step": 930
+    },
+    {
+      "epoch": 0.3008,
+      "grad_norm": 2.647897481918335,
+      "learning_rate": 9.471143049629464e-05,
+      "loss": 0.3034,
+      "step": 940
+    },
+    {
+      "epoch": 0.304,
+      "grad_norm": 4.482762813568115,
+      "learning_rate": 9.459914664271278e-05,
+      "loss": 0.3168,
+      "step": 950
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 2.408997058868408,
+      "learning_rate": 9.448686278913093e-05,
+      "loss": 0.2838,
+      "step": 960
+    },
+    {
+      "epoch": 0.3104,
+      "grad_norm": 2.701946496963501,
+      "learning_rate": 9.437457893554908e-05,
+      "loss": 0.2945,
+      "step": 970
+    },
+    {
+      "epoch": 0.3136,
+      "grad_norm": 2.5696310997009277,
+      "learning_rate": 9.426229508196722e-05,
+      "loss": 0.2483,
+      "step": 980
+    },
+    {
+      "epoch": 0.3168,
+      "grad_norm": 1.9443378448486328,
+      "learning_rate": 9.415001122838537e-05,
+      "loss": 0.2379,
+      "step": 990
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 2.137860059738159,
+      "learning_rate": 9.40377273748035e-05,
+      "loss": 0.2396,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3232,
+      "grad_norm": 2.6493215560913086,
+      "learning_rate": 9.392544352122165e-05,
+      "loss": 0.2294,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3264,
+      "grad_norm": 3.381498336791992,
+      "learning_rate": 9.381315966763979e-05,
+      "loss": 0.2162,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3296,
+      "grad_norm": 2.907684564590454,
+      "learning_rate": 9.370087581405794e-05,
+      "loss": 0.194,
+      "step": 1030
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 2.3227946758270264,
+      "learning_rate": 9.358859196047609e-05,
+      "loss": 0.2071,
+      "step": 1040
+    },
+    {
+      "epoch": 0.336,
+      "grad_norm": 2.3173024654388428,
+      "learning_rate": 9.347630810689423e-05,
+      "loss": 0.2018,
+      "step": 1050
+    },
+    {
+      "epoch": 0.3392,
+      "grad_norm": 2.9500386714935303,
+      "learning_rate": 9.336402425331238e-05,
+      "loss": 0.1987,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3424,
+      "grad_norm": 5.206624984741211,
+      "learning_rate": 9.325174039973053e-05,
+      "loss": 0.2036,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3456,
+      "grad_norm": 1.9075260162353516,
+      "learning_rate": 9.313945654614867e-05,
+      "loss": 0.1785,
+      "step": 1080
+    },
+    {
+      "epoch": 0.3488,
+      "grad_norm": 1.7478675842285156,
+      "learning_rate": 9.30271726925668e-05,
+      "loss": 0.1956,
+      "step": 1090
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 2.1234798431396484,
+      "learning_rate": 9.291488883898496e-05,
+      "loss": 0.175,
+      "step": 1100
+    },
+    {
+      "epoch": 0.3552,
+      "grad_norm": 1.828234076499939,
+      "learning_rate": 9.280260498540311e-05,
+      "loss": 0.1591,
+      "step": 1110
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 1.8341890573501587,
+      "learning_rate": 9.269032113182124e-05,
+      "loss": 0.1649,
+      "step": 1120
+    },
+    {
+      "epoch": 0.3616,
+      "grad_norm": 1.3575913906097412,
+      "learning_rate": 9.25780372782394e-05,
+      "loss": 0.1694,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3648,
+      "grad_norm": 2.1410560607910156,
+      "learning_rate": 9.246575342465755e-05,
+      "loss": 0.1659,
+      "step": 1140
+    },
+    {
+      "epoch": 0.368,
+      "grad_norm": 1.7794946432113647,
+      "learning_rate": 9.235346957107568e-05,
+      "loss": 0.1479,
+      "step": 1150
+    },
+    {
+      "epoch": 0.3712,
+      "grad_norm": 2.1806530952453613,
+      "learning_rate": 9.224118571749383e-05,
+      "loss": 0.151,
+      "step": 1160
+    },
+    {
+      "epoch": 0.3744,
+      "grad_norm": 1.5767009258270264,
+      "learning_rate": 9.212890186391197e-05,
+      "loss": 0.1541,
+      "step": 1170
+    },
+    {
+      "epoch": 0.3776,
+      "grad_norm": 1.453840970993042,
+      "learning_rate": 9.201661801033011e-05,
+      "loss": 0.1368,
+      "step": 1180
+    },
+    {
+      "epoch": 0.3808,
+      "grad_norm": 1.8884810209274292,
+      "learning_rate": 9.190433415674826e-05,
+      "loss": 0.1394,
+      "step": 1190
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 1.6665995121002197,
+      "learning_rate": 9.179205030316641e-05,
+      "loss": 0.1356,
+      "step": 1200
+    },
+    {
+      "epoch": 0.3872,
+      "grad_norm": 1.6958074569702148,
+      "learning_rate": 9.167976644958456e-05,
+      "loss": 0.1468,
+      "step": 1210
+    },
+    {
+      "epoch": 0.3904,
+      "grad_norm": 1.609466552734375,
+      "learning_rate": 9.15674825960027e-05,
+      "loss": 0.1396,
+      "step": 1220
+    },
+    {
+      "epoch": 0.3936,
+      "grad_norm": 2.045232057571411,
+      "learning_rate": 9.145519874242085e-05,
+      "loss": 0.1234,
+      "step": 1230
+    },
+    {
+      "epoch": 0.3968,
+      "grad_norm": 1.8025046586990356,
+      "learning_rate": 9.1342914888839e-05,
+      "loss": 0.1335,
+      "step": 1240
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 1.4416459798812866,
+      "learning_rate": 9.123063103525713e-05,
+      "loss": 0.1325,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4032,
+      "grad_norm": 2.590508460998535,
+      "learning_rate": 9.111834718167527e-05,
+      "loss": 0.1258,
+      "step": 1260
+    },
+    {
+      "epoch": 0.4064,
+      "grad_norm": 1.7362828254699707,
+      "learning_rate": 9.100606332809342e-05,
+      "loss": 0.1337,
+      "step": 1270
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 1.726511836051941,
+      "learning_rate": 9.089377947451156e-05,
+      "loss": 0.1274,
+      "step": 1280
+    },
+    {
+      "epoch": 0.4128,
+      "grad_norm": 1.3185865879058838,
+      "learning_rate": 9.078149562092971e-05,
+      "loss": 0.1244,
+      "step": 1290
+    },
+    {
+      "epoch": 0.416,
+      "grad_norm": 1.6214089393615723,
+      "learning_rate": 9.066921176734786e-05,
+      "loss": 0.1002,
+      "step": 1300
+    },
+    {
+      "epoch": 0.4192,
+      "grad_norm": 1.206648826599121,
+      "learning_rate": 9.055692791376601e-05,
+      "loss": 0.1169,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4224,
+      "grad_norm": 1.020280122756958,
+      "learning_rate": 9.044464406018415e-05,
+      "loss": 0.1036,
+      "step": 1320
+    },
+    {
+      "epoch": 0.4256,
+      "grad_norm": 1.86745285987854,
+      "learning_rate": 9.03323602066023e-05,
+      "loss": 0.1163,
+      "step": 1330
+    },
+    {
+      "epoch": 0.4288,
+      "grad_norm": 1.2367403507232666,
+      "learning_rate": 9.022007635302045e-05,
+      "loss": 0.1166,
+      "step": 1340
+    },
+    {
+      "epoch": 0.432,
+      "grad_norm": 1.3708258867263794,
+      "learning_rate": 9.010779249943859e-05,
+      "loss": 0.0956,
+      "step": 1350
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 1.9553757905960083,
+      "learning_rate": 8.999550864585672e-05,
+      "loss": 0.1139,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4384,
+      "grad_norm": 2.1358702182769775,
+      "learning_rate": 8.988322479227488e-05,
+      "loss": 0.1094,
+      "step": 1370
+    },
+    {
+      "epoch": 0.4416,
+      "grad_norm": 3.4062862396240234,
+      "learning_rate": 8.977094093869301e-05,
+      "loss": 0.1017,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4448,
+      "grad_norm": 1.4105799198150635,
+      "learning_rate": 8.965865708511116e-05,
+      "loss": 0.1008,
+      "step": 1390
+    },
+    {
+      "epoch": 0.448,
+      "grad_norm": 1.713500738143921,
+      "learning_rate": 8.954637323152931e-05,
+      "loss": 0.1078,
+      "step": 1400
+    },
+    {
+      "epoch": 0.4512,
+      "grad_norm": 1.128848671913147,
+      "learning_rate": 8.943408937794746e-05,
+      "loss": 0.093,
+      "step": 1410
+    },
+    {
+      "epoch": 0.4544,
+      "grad_norm": 1.3671025037765503,
+      "learning_rate": 8.93218055243656e-05,
+      "loss": 0.1093,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4576,
+      "grad_norm": 1.8151606321334839,
+      "learning_rate": 8.920952167078375e-05,
+      "loss": 0.1088,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 1.1207199096679688,
+      "learning_rate": 8.909723781720189e-05,
+      "loss": 0.1051,
+      "step": 1440
+    },
+    {
+      "epoch": 0.464,
+      "grad_norm": 1.0649629831314087,
+      "learning_rate": 8.898495396362003e-05,
+      "loss": 0.1,
+      "step": 1450
+    },
+    {
+      "epoch": 0.4672,
+      "grad_norm": 1.4365625381469727,
+      "learning_rate": 8.887267011003818e-05,
+      "loss": 0.0861,
+      "step": 1460
+    },
+    {
+      "epoch": 0.4704,
+      "grad_norm": 1.1462955474853516,
+      "learning_rate": 8.876038625645633e-05,
+      "loss": 0.0866,
+      "step": 1470
+    },
+    {
+      "epoch": 0.4736,
+      "grad_norm": 1.3217401504516602,
+      "learning_rate": 8.864810240287447e-05,
+      "loss": 0.0979,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4768,
+      "grad_norm": 1.3830331563949585,
+      "learning_rate": 8.853581854929262e-05,
+      "loss": 0.0792,
+      "step": 1490
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 1.777252435684204,
+      "learning_rate": 8.842353469571077e-05,
+      "loss": 0.0727,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4832,
+      "grad_norm": 1.220021367073059,
+      "learning_rate": 8.83112508421289e-05,
+      "loss": 0.0836,
+      "step": 1510
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 2.3963851928710938,
+      "learning_rate": 8.819896698854705e-05,
+      "loss": 0.0851,
+      "step": 1520
+    },
+    {
+      "epoch": 0.4896,
+      "grad_norm": 1.1223018169403076,
+      "learning_rate": 8.808668313496519e-05,
+      "loss": 0.082,
+      "step": 1530
+    },
+    {
+      "epoch": 0.4928,
+      "grad_norm": 1.148827314376831,
+      "learning_rate": 8.797439928138334e-05,
+      "loss": 0.0758,
+      "step": 1540
+    },
+    {
+      "epoch": 0.496,
+      "grad_norm": 0.8612151145935059,
+      "learning_rate": 8.786211542780148e-05,
+      "loss": 0.0783,
+      "step": 1550
+    },
+    {
+      "epoch": 0.4992,
+      "grad_norm": 1.1042686700820923,
+      "learning_rate": 8.774983157421963e-05,
+      "loss": 0.0846,
+      "step": 1560
+    },
+    {
+      "epoch": 0.5024,
+      "grad_norm": 1.3059678077697754,
+      "learning_rate": 8.763754772063778e-05,
+      "loss": 0.0842,
+      "step": 1570
+    },
+    {
+      "epoch": 0.5056,
+      "grad_norm": 1.976445198059082,
+      "learning_rate": 8.752526386705592e-05,
+      "loss": 0.0811,
+      "step": 1580
+    },
+    {
+      "epoch": 0.5088,
+      "grad_norm": 1.7577661275863647,
+      "learning_rate": 8.741298001347407e-05,
+      "loss": 0.0848,
+      "step": 1590
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 1.2085758447647095,
+      "learning_rate": 8.730069615989222e-05,
+      "loss": 0.077,
+      "step": 1600
+    },
+    {
+      "epoch": 0.5152,
+      "grad_norm": 1.1045840978622437,
+      "learning_rate": 8.718841230631036e-05,
+      "loss": 0.0794,
+      "step": 1610
+    },
+    {
+      "epoch": 0.5184,
+      "grad_norm": 2.760986328125,
+      "learning_rate": 8.70761284527285e-05,
+      "loss": 0.0887,
+      "step": 1620
+    },
+    {
+      "epoch": 0.5216,
+      "grad_norm": 1.1649103164672852,
+      "learning_rate": 8.696384459914664e-05,
+      "loss": 0.0806,
+      "step": 1630
+    },
+    {
+      "epoch": 0.5248,
+      "grad_norm": 2.1718943119049072,
+      "learning_rate": 8.68515607455648e-05,
+      "loss": 0.0774,
+      "step": 1640
+    },
+    {
+      "epoch": 0.528,
+      "grad_norm": 5.1032586097717285,
+      "learning_rate": 8.673927689198293e-05,
+      "loss": 0.0823,
+      "step": 1650
+    },
+    {
+      "epoch": 0.5312,
+      "grad_norm": 1.3188016414642334,
+      "learning_rate": 8.662699303840108e-05,
+      "loss": 0.0723,
+      "step": 1660
+    },
+    {
+      "epoch": 0.5344,
+      "grad_norm": 1.1091735363006592,
+      "learning_rate": 8.651470918481923e-05,
+      "loss": 0.0801,
+      "step": 1670
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 1.944667935371399,
+      "learning_rate": 8.640242533123737e-05,
+      "loss": 0.0656,
+      "step": 1680
+    },
+    {
+      "epoch": 0.5408,
+      "grad_norm": 1.409705400466919,
+      "learning_rate": 8.629014147765552e-05,
+      "loss": 0.0783,
+      "step": 1690
+    },
+    {
+      "epoch": 0.544,
+      "grad_norm": 1.0202471017837524,
+      "learning_rate": 8.617785762407367e-05,
+      "loss": 0.0698,
+      "step": 1700
+    },
+    {
+      "epoch": 0.5472,
+      "grad_norm": 0.8339371681213379,
+      "learning_rate": 8.606557377049181e-05,
+      "loss": 0.0646,
+      "step": 1710
+    },
+    {
+      "epoch": 0.5504,
+      "grad_norm": 1.0416721105575562,
+      "learning_rate": 8.595328991690995e-05,
+      "loss": 0.0631,
+      "step": 1720
+    },
+    {
+      "epoch": 0.5536,
+      "grad_norm": 0.812588632106781,
+      "learning_rate": 8.58410060633281e-05,
+      "loss": 0.0718,
+      "step": 1730
+    },
+    {
+      "epoch": 0.5568,
+      "grad_norm": 1.2019861936569214,
+      "learning_rate": 8.572872220974623e-05,
+      "loss": 0.0766,
+      "step": 1740
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.8724514842033386,
+      "learning_rate": 8.561643835616438e-05,
+      "loss": 0.066,
+      "step": 1750
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 0.9269486665725708,
+      "learning_rate": 8.550415450258253e-05,
+      "loss": 0.0711,
+      "step": 1760
+    },
+    {
+      "epoch": 0.5664,
+      "grad_norm": 1.1435014009475708,
+      "learning_rate": 8.539187064900069e-05,
+      "loss": 0.0649,
+      "step": 1770
+    },
+    {
+      "epoch": 0.5696,
+      "grad_norm": 0.9017223119735718,
+      "learning_rate": 8.527958679541882e-05,
+      "loss": 0.0743,
+      "step": 1780
+    },
+    {
+      "epoch": 0.5728,
+      "grad_norm": 1.5114269256591797,
+      "learning_rate": 8.516730294183697e-05,
+      "loss": 0.0711,
+      "step": 1790
+    },
+    {
+      "epoch": 0.576,
+      "grad_norm": 1.1478126049041748,
+      "learning_rate": 8.505501908825511e-05,
+      "loss": 0.0749,
+      "step": 1800
+    },
+    {
+      "epoch": 0.5792,
+      "grad_norm": 1.3925341367721558,
+      "learning_rate": 8.494273523467325e-05,
+      "loss": 0.0623,
+      "step": 1810
+    },
+    {
+      "epoch": 0.5824,
+      "grad_norm": 0.7855392098426819,
+      "learning_rate": 8.48304513810914e-05,
+      "loss": 0.0568,
+      "step": 1820
+    },
+    {
+      "epoch": 0.5856,
+      "grad_norm": 0.7520506381988525,
+      "learning_rate": 8.471816752750955e-05,
+      "loss": 0.0654,
+      "step": 1830
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 0.8448010087013245,
+      "learning_rate": 8.460588367392769e-05,
+      "loss": 0.0624,
+      "step": 1840
+    },
+    {
+      "epoch": 0.592,
+      "grad_norm": 0.7805534601211548,
+      "learning_rate": 8.449359982034584e-05,
+      "loss": 0.0676,
+      "step": 1850
+    },
+    {
+      "epoch": 0.5952,
+      "grad_norm": 1.1754975318908691,
+      "learning_rate": 8.438131596676399e-05,
+      "loss": 0.0609,
+      "step": 1860
+    },
+    {
+      "epoch": 0.5984,
+      "grad_norm": 0.7776190638542175,
+      "learning_rate": 8.426903211318214e-05,
+      "loss": 0.0601,
+      "step": 1870
+    },
+    {
+      "epoch": 0.6016,
+      "grad_norm": 1.610683560371399,
+      "learning_rate": 8.415674825960028e-05,
+      "loss": 0.0618,
+      "step": 1880
+    },
+    {
+      "epoch": 0.6048,
+      "grad_norm": 0.8926658630371094,
+      "learning_rate": 8.404446440601843e-05,
+      "loss": 0.0582,
+      "step": 1890
+    },
+    {
+      "epoch": 0.608,
+      "grad_norm": 1.2540363073349,
+      "learning_rate": 8.393218055243656e-05,
+      "loss": 0.0623,
+      "step": 1900
+    },
+    {
+      "epoch": 0.6112,
+      "grad_norm": 1.463120937347412,
+      "learning_rate": 8.38198966988547e-05,
+      "loss": 0.0595,
+      "step": 1910
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 1.3101223707199097,
+      "learning_rate": 8.370761284527285e-05,
+      "loss": 0.0653,
+      "step": 1920
+    },
+    {
+      "epoch": 0.6176,
+      "grad_norm": 1.1716495752334595,
+      "learning_rate": 8.3595328991691e-05,
+      "loss": 0.0518,
+      "step": 1930
+    },
+    {
+      "epoch": 0.6208,
+      "grad_norm": 1.539556860923767,
+      "learning_rate": 8.348304513810914e-05,
+      "loss": 0.0496,
+      "step": 1940
+    },
+    {
+      "epoch": 0.624,
+      "grad_norm": 0.8535395264625549,
+      "learning_rate": 8.337076128452729e-05,
+      "loss": 0.0581,
+      "step": 1950
+    },
+    {
+      "epoch": 0.6272,
+      "grad_norm": 0.9112345576286316,
+      "learning_rate": 8.325847743094544e-05,
+      "loss": 0.0568,
+      "step": 1960
+    },
+    {
+      "epoch": 0.6304,
+      "grad_norm": 0.9361368417739868,
+      "learning_rate": 8.314619357736358e-05,
+      "loss": 0.0507,
+      "step": 1970
+    },
+    {
+      "epoch": 0.6336,
+      "grad_norm": 1.4482288360595703,
+      "learning_rate": 8.303390972378173e-05,
+      "loss": 0.0521,
+      "step": 1980
+    },
+    {
+      "epoch": 0.6368,
+      "grad_norm": 0.9236279129981995,
+      "learning_rate": 8.292162587019986e-05,
+      "loss": 0.0512,
+      "step": 1990
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 1.238692045211792,
+      "learning_rate": 8.280934201661802e-05,
+      "loss": 0.0538,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6432,
+      "grad_norm": 0.9093800187110901,
+      "learning_rate": 8.269705816303615e-05,
+      "loss": 0.054,
+      "step": 2010
+    },
+    {
+      "epoch": 0.6464,
+      "grad_norm": 0.7657427787780762,
+      "learning_rate": 8.25847743094543e-05,
+      "loss": 0.049,
+      "step": 2020
+    },
+    {
+      "epoch": 0.6496,
+      "grad_norm": 1.5526565313339233,
+      "learning_rate": 8.247249045587245e-05,
+      "loss": 0.0524,
+      "step": 2030
+    },
+    {
+      "epoch": 0.6528,
+      "grad_norm": 0.7204543352127075,
+      "learning_rate": 8.236020660229059e-05,
+      "loss": 0.0555,
+      "step": 2040
+    },
+    {
+      "epoch": 0.656,
+      "grad_norm": 0.8594505190849304,
+      "learning_rate": 8.224792274870874e-05,
+      "loss": 0.0529,
+      "step": 2050
+    },
+    {
+      "epoch": 0.6592,
+      "grad_norm": 1.0530890226364136,
+      "learning_rate": 8.213563889512689e-05,
+      "loss": 0.0621,
+      "step": 2060
+    },
+    {
+      "epoch": 0.6624,
+      "grad_norm": 0.92835533618927,
+      "learning_rate": 8.202335504154503e-05,
+      "loss": 0.0434,
+      "step": 2070
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 1.0584304332733154,
+      "learning_rate": 8.191107118796317e-05,
+      "loss": 0.0515,
+      "step": 2080
+    },
+    {
+      "epoch": 0.6688,
+      "grad_norm": 0.7033383250236511,
+      "learning_rate": 8.179878733438132e-05,
+      "loss": 0.0451,
+      "step": 2090
+    },
+    {
+      "epoch": 0.672,
+      "grad_norm": 1.09385085105896,
+      "learning_rate": 8.168650348079947e-05,
+      "loss": 0.0486,
+      "step": 2100
+    },
+    {
+      "epoch": 0.6752,
+      "grad_norm": 1.0709513425827026,
+      "learning_rate": 8.15742196272176e-05,
+      "loss": 0.0571,
+      "step": 2110
+    },
+    {
+      "epoch": 0.6784,
+      "grad_norm": 0.7316601276397705,
+      "learning_rate": 8.146193577363576e-05,
+      "loss": 0.0496,
+      "step": 2120
+    },
+    {
+      "epoch": 0.6816,
+      "grad_norm": 0.9458874464035034,
+      "learning_rate": 8.13496519200539e-05,
+      "loss": 0.054,
+      "step": 2130
+    },
+    {
+      "epoch": 0.6848,
+      "grad_norm": 2.1238739490509033,
+      "learning_rate": 8.123736806647204e-05,
+      "loss": 0.0535,
+      "step": 2140
+    },
+    {
+      "epoch": 0.688,
+      "grad_norm": 1.011391520500183,
+      "learning_rate": 8.11250842128902e-05,
+      "loss": 0.0527,
+      "step": 2150
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 0.8783167004585266,
+      "learning_rate": 8.101280035930835e-05,
+      "loss": 0.0424,
+      "step": 2160
+    },
+    {
+      "epoch": 0.6944,
+      "grad_norm": 0.9206530451774597,
+      "learning_rate": 8.090051650572648e-05,
+      "loss": 0.0588,
+      "step": 2170
+    },
+    {
+      "epoch": 0.6976,
+      "grad_norm": 0.5304967164993286,
+      "learning_rate": 8.078823265214462e-05,
+      "loss": 0.0574,
+      "step": 2180
+    },
+    {
+      "epoch": 0.7008,
+      "grad_norm": 0.5870644450187683,
+      "learning_rate": 8.067594879856277e-05,
+      "loss": 0.0512,
+      "step": 2190
+    },
+    {
+      "epoch": 0.704,
+      "grad_norm": 0.7831016182899475,
+      "learning_rate": 8.056366494498092e-05,
+      "loss": 0.0485,
+      "step": 2200
+    },
+    {
+      "epoch": 0.7072,
+      "grad_norm": 2.5291478633880615,
+      "learning_rate": 8.045138109139906e-05,
+      "loss": 0.058,
+      "step": 2210
+    },
+    {
+      "epoch": 0.7104,
+      "grad_norm": 0.705797553062439,
+      "learning_rate": 8.033909723781721e-05,
+      "loss": 0.0534,
+      "step": 2220
+    },
+    {
+      "epoch": 0.7136,
+      "grad_norm": 0.7620034217834473,
+      "learning_rate": 8.022681338423536e-05,
+      "loss": 0.0466,
+      "step": 2230
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0571022033691406,
+      "learning_rate": 8.01145295306535e-05,
+      "loss": 0.0469,
+      "step": 2240
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 1.094312310218811,
+      "learning_rate": 8.000224567707165e-05,
+      "loss": 0.0482,
+      "step": 2250
+    },
+    {
+      "epoch": 0.7232,
+      "grad_norm": 0.857584536075592,
+      "learning_rate": 7.988996182348978e-05,
+      "loss": 0.044,
+      "step": 2260
+    },
+    {
+      "epoch": 0.7264,
+      "grad_norm": 0.7637543082237244,
+      "learning_rate": 7.977767796990792e-05,
+      "loss": 0.0493,
+      "step": 2270
+    },
+    {
+      "epoch": 0.7296,
+      "grad_norm": 0.6703894138336182,
+      "learning_rate": 7.966539411632607e-05,
+      "loss": 0.0462,
+      "step": 2280
+    },
+    {
+      "epoch": 0.7328,
+      "grad_norm": 0.7242295145988464,
+      "learning_rate": 7.955311026274422e-05,
+      "loss": 0.044,
+      "step": 2290
+    },
+    {
+      "epoch": 0.736,
+      "grad_norm": 1.0875325202941895,
+      "learning_rate": 7.944082640916236e-05,
+      "loss": 0.0458,
+      "step": 2300
+    },
+    {
+      "epoch": 0.7392,
+      "grad_norm": 0.8322548270225525,
+      "learning_rate": 7.932854255558051e-05,
+      "loss": 0.0471,
+      "step": 2310
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 1.6921989917755127,
+      "learning_rate": 7.921625870199866e-05,
+      "loss": 0.0481,
+      "step": 2320
+    },
+    {
+      "epoch": 0.7456,
+      "grad_norm": 0.9339900612831116,
+      "learning_rate": 7.910397484841681e-05,
+      "loss": 0.0438,
+      "step": 2330
+    },
+    {
+      "epoch": 0.7488,
+      "grad_norm": 0.9007784724235535,
+      "learning_rate": 7.899169099483495e-05,
+      "loss": 0.051,
+      "step": 2340
+    },
+    {
+      "epoch": 0.752,
+      "grad_norm": 0.7366623282432556,
+      "learning_rate": 7.887940714125309e-05,
+      "loss": 0.0488,
+      "step": 2350
+    },
+    {
+      "epoch": 0.7552,
+      "grad_norm": 1.0581986904144287,
+      "learning_rate": 7.876712328767124e-05,
+      "loss": 0.0465,
+      "step": 2360
+    },
+    {
+      "epoch": 0.7584,
+      "grad_norm": 0.8398572206497192,
+      "learning_rate": 7.865483943408937e-05,
+      "loss": 0.0444,
+      "step": 2370
+    },
+    {
+      "epoch": 0.7616,
+      "grad_norm": 0.829765796661377,
+      "learning_rate": 7.854255558050752e-05,
+      "loss": 0.0468,
+      "step": 2380
+    },
+    {
+      "epoch": 0.7648,
+      "grad_norm": 0.8922726511955261,
+      "learning_rate": 7.843027172692568e-05,
+      "loss": 0.0429,
+      "step": 2390
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 0.9981004595756531,
+      "learning_rate": 7.831798787334381e-05,
+      "loss": 0.0425,
+      "step": 2400
+    },
+    {
+      "epoch": 0.7712,
+      "grad_norm": 1.1781072616577148,
+      "learning_rate": 7.820570401976196e-05,
+      "loss": 0.0467,
+      "step": 2410
+    },
+    {
+      "epoch": 0.7744,
+      "grad_norm": 1.295114517211914,
+      "learning_rate": 7.809342016618011e-05,
+      "loss": 0.0429,
+      "step": 2420
+    },
+    {
+      "epoch": 0.7776,
+      "grad_norm": 1.2923245429992676,
+      "learning_rate": 7.798113631259825e-05,
+      "loss": 0.0411,
+      "step": 2430
+    },
+    {
+      "epoch": 0.7808,
+      "grad_norm": 0.9972735643386841,
+      "learning_rate": 7.78688524590164e-05,
+      "loss": 0.0392,
+      "step": 2440
+    },
+    {
+      "epoch": 0.784,
+      "grad_norm": 0.7741293907165527,
+      "learning_rate": 7.775656860543454e-05,
+      "loss": 0.0432,
+      "step": 2450
+    },
+    {
+      "epoch": 0.7872,
+      "grad_norm": 0.5855127573013306,
+      "learning_rate": 7.764428475185269e-05,
+      "loss": 0.0468,
+      "step": 2460
+    },
+    {
+      "epoch": 0.7904,
+      "grad_norm": 0.6745654940605164,
+      "learning_rate": 7.753200089827083e-05,
+      "loss": 0.0394,
+      "step": 2470
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.2831262350082397,
+      "learning_rate": 7.741971704468898e-05,
+      "loss": 0.037,
+      "step": 2480
+    },
+    {
+      "epoch": 0.7968,
+      "grad_norm": 0.6621804237365723,
+      "learning_rate": 7.730743319110713e-05,
+      "loss": 0.0413,
+      "step": 2490
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.5354174375534058,
+      "learning_rate": 7.719514933752526e-05,
+      "loss": 0.0375,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8032,
+      "grad_norm": 0.9441866278648376,
+      "learning_rate": 7.708286548394342e-05,
+      "loss": 0.0364,
+      "step": 2510
+    },
+    {
+      "epoch": 0.8064,
+      "grad_norm": 0.6830460429191589,
+      "learning_rate": 7.697058163036157e-05,
+      "loss": 0.042,
+      "step": 2520
+    },
+    {
+      "epoch": 0.8096,
+      "grad_norm": 0.6004045605659485,
+      "learning_rate": 7.68582977767797e-05,
+      "loss": 0.04,
+      "step": 2530
+    },
+    {
+      "epoch": 0.8128,
+      "grad_norm": 0.6852745413780212,
+      "learning_rate": 7.674601392319784e-05,
+      "loss": 0.0404,
+      "step": 2540
+    },
+    {
+      "epoch": 0.816,
+      "grad_norm": 1.0080032348632812,
+      "learning_rate": 7.663373006961599e-05,
+      "loss": 0.043,
+      "step": 2550
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 0.7791699767112732,
+      "learning_rate": 7.652144621603414e-05,
+      "loss": 0.0411,
+      "step": 2560
+    },
+    {
+      "epoch": 0.8224,
+      "grad_norm": 1.192091464996338,
+      "learning_rate": 7.640916236245228e-05,
+      "loss": 0.0413,
+      "step": 2570
+    },
+    {
+      "epoch": 0.8256,
+      "grad_norm": 0.744788646697998,
+      "learning_rate": 7.629687850887043e-05,
+      "loss": 0.0401,
+      "step": 2580
+    },
+    {
+      "epoch": 0.8288,
+      "grad_norm": 0.6732345223426819,
+      "learning_rate": 7.618459465528858e-05,
+      "loss": 0.0426,
+      "step": 2590
+    },
+    {
+      "epoch": 0.832,
+      "grad_norm": 1.0868829488754272,
+      "learning_rate": 7.607231080170672e-05,
+      "loss": 0.0368,
+      "step": 2600
+    },
+    {
+      "epoch": 0.8352,
+      "grad_norm": 1.4248939752578735,
+      "learning_rate": 7.596002694812487e-05,
+      "loss": 0.0369,
+      "step": 2610
+    },
+    {
+      "epoch": 0.8384,
+      "grad_norm": 0.7218601107597351,
+      "learning_rate": 7.5847743094543e-05,
+      "loss": 0.0415,
+      "step": 2620
+    },
+    {
+      "epoch": 0.8416,
+      "grad_norm": 0.803717851638794,
+      "learning_rate": 7.573545924096114e-05,
+      "loss": 0.0352,
+      "step": 2630
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.821607768535614,
+      "learning_rate": 7.562317538737929e-05,
+      "loss": 0.039,
+      "step": 2640
+    },
+    {
+      "epoch": 0.848,
+      "grad_norm": 1.1404283046722412,
+      "learning_rate": 7.551089153379744e-05,
+      "loss": 0.0369,
+      "step": 2650
+    },
+    {
+      "epoch": 0.8512,
+      "grad_norm": 1.2288737297058105,
+      "learning_rate": 7.53986076802156e-05,
+      "loss": 0.0417,
+      "step": 2660
+    },
+    {
+      "epoch": 0.8544,
+      "grad_norm": 1.0263468027114868,
+      "learning_rate": 7.528632382663373e-05,
+      "loss": 0.0423,
+      "step": 2670
+    },
+    {
+      "epoch": 0.8576,
+      "grad_norm": 0.8517736196517944,
+      "learning_rate": 7.517403997305188e-05,
+      "loss": 0.0364,
+      "step": 2680
+    },
+    {
+      "epoch": 0.8608,
+      "grad_norm": 0.8727993369102478,
+      "learning_rate": 7.506175611947003e-05,
+      "loss": 0.0382,
+      "step": 2690
+    },
+    {
+      "epoch": 0.864,
+      "grad_norm": 0.7277560234069824,
+      "learning_rate": 7.494947226588817e-05,
+      "loss": 0.0368,
+      "step": 2700
+    },
+    {
+      "epoch": 0.8672,
+      "grad_norm": 0.854989230632782,
+      "learning_rate": 7.483718841230631e-05,
+      "loss": 0.0431,
+      "step": 2710
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.47089987993240356,
+      "learning_rate": 7.472490455872446e-05,
+      "loss": 0.0372,
+      "step": 2720
+    },
+    {
+      "epoch": 0.8736,
+      "grad_norm": 0.716643750667572,
+      "learning_rate": 7.46126207051426e-05,
+      "loss": 0.0348,
+      "step": 2730
+    },
+    {
+      "epoch": 0.8768,
+      "grad_norm": 0.8277871012687683,
+      "learning_rate": 7.450033685156075e-05,
+      "loss": 0.0369,
+      "step": 2740
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.9618933796882629,
+      "learning_rate": 7.43880529979789e-05,
+      "loss": 0.0377,
+      "step": 2750
+    },
+    {
+      "epoch": 0.8832,
+      "grad_norm": 0.6898852586746216,
+      "learning_rate": 7.427576914439703e-05,
+      "loss": 0.0543,
+      "step": 2760
+    },
+    {
+      "epoch": 0.8864,
+      "grad_norm": 1.4362825155258179,
+      "learning_rate": 7.416348529081518e-05,
+      "loss": 0.0397,
+      "step": 2770
+    },
+    {
+      "epoch": 0.8896,
+      "grad_norm": 1.1972767114639282,
+      "learning_rate": 7.405120143723333e-05,
+      "loss": 0.0324,
+      "step": 2780
+    },
+    {
+      "epoch": 0.8928,
+      "grad_norm": 0.5438815355300903,
+      "learning_rate": 7.393891758365149e-05,
+      "loss": 0.0397,
+      "step": 2790
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.513469398021698,
+      "learning_rate": 7.382663373006962e-05,
+      "loss": 0.0338,
+      "step": 2800
+    },
+    {
+      "epoch": 0.8992,
+      "grad_norm": 0.5743911266326904,
+      "learning_rate": 7.371434987648776e-05,
+      "loss": 0.0313,
+      "step": 2810
+    },
+    {
+      "epoch": 0.9024,
+      "grad_norm": 1.011957049369812,
+      "learning_rate": 7.360206602290591e-05,
+      "loss": 0.0374,
+      "step": 2820
+    },
+    {
+      "epoch": 0.9056,
+      "grad_norm": 0.6926620602607727,
+      "learning_rate": 7.348978216932405e-05,
+      "loss": 0.0392,
+      "step": 2830
+    },
+    {
+      "epoch": 0.9088,
+      "grad_norm": 0.6338510513305664,
+      "learning_rate": 7.33774983157422e-05,
+      "loss": 0.0359,
+      "step": 2840
+    },
+    {
+      "epoch": 0.912,
+      "grad_norm": 0.7649824023246765,
+      "learning_rate": 7.326521446216035e-05,
+      "loss": 0.0353,
+      "step": 2850
+    },
+    {
+      "epoch": 0.9152,
+      "grad_norm": 0.8123289346694946,
+      "learning_rate": 7.315293060857849e-05,
+      "loss": 0.0322,
+      "step": 2860
+    },
+    {
+      "epoch": 0.9184,
+      "grad_norm": 0.8033359050750732,
+      "learning_rate": 7.304064675499664e-05,
+      "loss": 0.0331,
+      "step": 2870
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.8859496116638184,
+      "learning_rate": 7.292836290141479e-05,
+      "loss": 0.0352,
+      "step": 2880
+    },
+    {
+      "epoch": 0.9248,
+      "grad_norm": 0.7962930202484131,
+      "learning_rate": 7.281607904783292e-05,
+      "loss": 0.0373,
+      "step": 2890
+    },
+    {
+      "epoch": 0.928,
+      "grad_norm": 0.746497392654419,
+      "learning_rate": 7.270379519425106e-05,
+      "loss": 0.0426,
+      "step": 2900
+    },
+    {
+      "epoch": 0.9312,
+      "grad_norm": 0.8344641327857971,
+      "learning_rate": 7.259151134066921e-05,
+      "loss": 0.0349,
+      "step": 2910
+    },
+    {
+      "epoch": 0.9344,
+      "grad_norm": 0.8275250792503357,
+      "learning_rate": 7.247922748708736e-05,
+      "loss": 0.0358,
+      "step": 2920
+    },
+    {
+      "epoch": 0.9376,
+      "grad_norm": 0.5994471907615662,
+      "learning_rate": 7.23669436335055e-05,
+      "loss": 0.0344,
+      "step": 2930
+    },
+    {
+      "epoch": 0.9408,
+      "grad_norm": 0.6452350616455078,
+      "learning_rate": 7.225465977992365e-05,
+      "loss": 0.0358,
+      "step": 2940
+    },
+    {
+      "epoch": 0.944,
+      "grad_norm": 1.0141571760177612,
+      "learning_rate": 7.21423759263418e-05,
+      "loss": 0.0347,
+      "step": 2950
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.832384467124939,
+      "learning_rate": 7.203009207275994e-05,
+      "loss": 0.0332,
+      "step": 2960
+    },
+    {
+      "epoch": 0.9504,
+      "grad_norm": 0.7129203677177429,
+      "learning_rate": 7.191780821917809e-05,
+      "loss": 0.0313,
+      "step": 2970
+    },
+    {
+      "epoch": 0.9536,
+      "grad_norm": 0.7890746593475342,
+      "learning_rate": 7.180552436559623e-05,
+      "loss": 0.0331,
+      "step": 2980
+    },
+    {
+      "epoch": 0.9568,
+      "grad_norm": 1.432335615158081,
+      "learning_rate": 7.169324051201438e-05,
+      "loss": 0.0353,
+      "step": 2990
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 1.0536537170410156,
+      "learning_rate": 7.158095665843251e-05,
+      "loss": 0.039,
+      "step": 3000
+    },
+    {
+      "epoch": 0.9632,
+      "grad_norm": 0.7935389280319214,
+      "learning_rate": 7.146867280485066e-05,
+      "loss": 0.0334,
+      "step": 3010
+    },
+    {
+      "epoch": 0.9664,
+      "grad_norm": 1.4054580926895142,
+      "learning_rate": 7.135638895126882e-05,
+      "loss": 0.033,
+      "step": 3020
+    },
+    {
+      "epoch": 0.9696,
+      "grad_norm": 0.6271975040435791,
+      "learning_rate": 7.124410509768695e-05,
+      "loss": 0.0327,
+      "step": 3030
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.5951416492462158,
+      "learning_rate": 7.11318212441051e-05,
+      "loss": 0.0325,
+      "step": 3040
+    },
+    {
+      "epoch": 0.976,
+      "grad_norm": 0.6794223785400391,
+      "learning_rate": 7.101953739052325e-05,
+      "loss": 0.0336,
+      "step": 3050
+    },
+    {
+      "epoch": 0.9792,
+      "grad_norm": 1.084647536277771,
+      "learning_rate": 7.090725353694139e-05,
+      "loss": 0.035,
+      "step": 3060
+    },
+    {
+      "epoch": 0.9824,
+      "grad_norm": 0.40548598766326904,
+      "learning_rate": 7.079496968335954e-05,
+      "loss": 0.0277,
+      "step": 3070
+    },
+    {
+      "epoch": 0.9856,
+      "grad_norm": 0.6343255043029785,
+      "learning_rate": 7.068268582977768e-05,
+      "loss": 0.0282,
+      "step": 3080
+    },
+    {
+      "epoch": 0.9888,
+      "grad_norm": 0.53138667345047,
+      "learning_rate": 7.057040197619582e-05,
+      "loss": 0.032,
+      "step": 3090
+    },
+    {
+      "epoch": 0.992,
+      "grad_norm": 0.7178220748901367,
+      "learning_rate": 7.045811812261397e-05,
+      "loss": 0.0323,
+      "step": 3100
+    },
+    {
+      "epoch": 0.9952,
+      "grad_norm": 0.5384820103645325,
+      "learning_rate": 7.034583426903212e-05,
+      "loss": 0.0319,
+      "step": 3110
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 1.4491897821426392,
+      "learning_rate": 7.023355041545027e-05,
+      "loss": 0.0345,
+      "step": 3120
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 32,
+  "trial_name": null,
+  "trial_params": null
+}
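
The log above is easier to read once reduced to a few numbers. The following is a minimal sketch, not part of the commit, that assumes the file follows the `log_history` / `max_steps` / `num_train_epochs` layout shown in this diff. The function name `summarize` and the `SAMPLE` string are hypothetical; the sample holds only four entries copied from the log, and entries without a `loss` key are skipped on the assumption that a complete file may also contain evaluation or summary records.

```python
import json

# Four entries copied from the log above; a real file holds every logged step.
SAMPLE = """
{
  "log_history": [
    {"epoch": 0.416,  "learning_rate": 9.066921176734786e-05, "loss": 0.1002, "step": 1300},
    {"epoch": 0.4192, "learning_rate": 9.055692791376601e-05, "loss": 0.1169, "step": 1310},
    {"epoch": 0.9824, "learning_rate": 7.079496968335954e-05, "loss": 0.0277, "step": 3070},
    {"epoch": 0.9984, "learning_rate": 7.023355041545027e-05, "loss": 0.0345, "step": 3120}
  ],
  "max_steps": 9375,
  "num_train_epochs": 3,
  "train_batch_size": 32
}
"""


def summarize(state):
    """Reduce a parsed trainer-state dict to a few headline numbers."""
    # Keep only training-loss records (assumption: other record types lack "loss").
    logs = [entry for entry in state["log_history"] if "loss" in entry]
    first, last = logs[0], logs[-1]
    best = min(logs, key=lambda entry: entry["loss"])
    return {
        # Optimizer steps per epoch implied by the run configuration.
        "steps_per_epoch": state["max_steps"] / state["num_train_epochs"],
        # Lowest logged training loss and the step at which it occurred.
        "best_loss": best["loss"],
        "best_step": best["step"],
        # Average learning-rate decrease per step across the logged window.
        "lr_drop_per_step": (first["learning_rate"] - last["learning_rate"])
        / (last["step"] - first["step"]),
    }


summary = summarize(json.loads(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value}")
```

To run this on the uploaded file rather than the sample, replace `json.loads(SAMPLE)` with `json.load(f)` on an opened `trainer_state.json`. The per-step learning-rate drop is constant across the entries shown, which is consistent with a linear decay schedule, although the schedule itself is not stated in this diff.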

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:58dc4832b9ecbedb58e177e8210247010a0ec93903efa66a51b87b3bc91d64e4
+size 5304

vocab.json ADDED

The diff for this file is too large to render.