2023-10-11 18:40:02,970 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,973 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
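The per-head attention size is not printed directly, but it can be read off the shapes above: q/k/v project d_model=1472 down to an inner dimension of 384, and the second dimension of `relative_attention_bias: Embedding(32, 6)` is, in Hugging Face's `T5Attention`, the number of heads. A minimal sanity check (plain arithmetic; no model is loaded, and the head-count reading is an assumption about the T5 implementation):

```python
# Shapes copied from the module printout above.
d_model = 1472    # in_features of q/k/v, out_features of o
inner_dim = 384   # out_features of q/k/v
num_heads = 6     # assumed: second dim of relative_attention_bias Embedding(32, 6)

# Per-head key/value size follows directly.
d_kv = inner_dim // num_heads

assert num_heads * d_kv == inner_dim  # 6 heads x 64 dims per head = 384
print(d_kv)  # 64
```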
2023-10-11 18:40:02,973 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Train:  20847 sentences
2023-10-11 18:40:02,974         (train_with_dev=False, train_with_test=False)
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Training Params:
2023-10-11 18:40:02,974  - learning_rate: "0.00015"
2023-10-11 18:40:02,974  - mini_batch_size: "8"
2023-10-11 18:40:02,974  - max_epochs: "10"
2023-10-11 18:40:02,974  - shuffle: "True"
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Plugins:
2023-10-11 18:40:02,975  - TensorboardLogger
2023-10-11 18:40:02,975  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
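The `lr:` column in the iteration lines below follows a linear warmup-then-decay schedule: with `warmup_fraction: '0.1'` over 10 epochs of 2,606 iterations, the learning rate ramps from 0 to the 0.00015 peak during epoch 1, then decays linearly to 0. A minimal sketch of that schedule (`lr_at` is a hypothetical helper for illustration, not Flair's actual LinearScheduler implementation):

```python
# Schedule parameters read from the log header above.
PEAK_LR = 0.00015
ITERS_PER_EPOCH = 2606
TOTAL_STEPS = 10 * ITERS_PER_EPOCH
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 2606 -> warmup spans epoch 1

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak over the warmup fraction.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 over the remaining steps.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Matches the logged values to the printed precision:
print(f"{lr_at(260):.6f}")   # 0.000015  (epoch 1, iter 260)
print(f"{lr_at(2600):.6f}")  # 0.000150  (epoch 1, iter 2600)
print(f"{lr_at(5200):.6f}")  # 0.000133  (epoch 2, iter 2600)
```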
2023-10-11 18:40:02,975 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 18:40:02,975  - metric: "('micro avg', 'f1-score')"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Computation:
2023-10-11 18:40:02,975  - compute on device: cuda:0
2023-10-11 18:40:02,975  - embedding storage: none
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,976 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 18:42:06,831 epoch 1 - iter 260/2606 - loss 2.81712327 - time (sec): 123.85 - samples/sec: 267.17 - lr: 0.000015 - momentum: 0.000000
2023-10-11 18:44:12,618 epoch 1 - iter 520/2606 - loss 2.56637286 - time (sec): 249.64 - samples/sec: 271.69 - lr: 0.000030 - momentum: 0.000000
2023-10-11 18:46:20,188 epoch 1 - iter 780/2606 - loss 2.15544405 - time (sec): 377.21 - samples/sec: 277.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 18:48:27,598 epoch 1 - iter 1040/2606 - loss 1.75472736 - time (sec): 504.62 - samples/sec: 281.12 - lr: 0.000060 - momentum: 0.000000
2023-10-11 18:50:37,138 epoch 1 - iter 1300/2606 - loss 1.48365754 - time (sec): 634.16 - samples/sec: 283.56 - lr: 0.000075 - momentum: 0.000000
2023-10-11 18:52:43,224 epoch 1 - iter 1560/2606 - loss 1.32250010 - time (sec): 760.25 - samples/sec: 282.41 - lr: 0.000090 - momentum: 0.000000
2023-10-11 18:54:52,733 epoch 1 - iter 1820/2606 - loss 1.18203870 - time (sec): 889.76 - samples/sec: 283.87 - lr: 0.000105 - momentum: 0.000000
2023-10-11 18:57:02,260 epoch 1 - iter 2080/2606 - loss 1.06995283 - time (sec): 1019.28 - samples/sec: 283.80 - lr: 0.000120 - momentum: 0.000000
2023-10-11 18:59:14,269 epoch 1 - iter 2340/2606 - loss 0.97221876 - time (sec): 1151.29 - samples/sec: 285.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:01:26,904 epoch 1 - iter 2600/2606 - loss 0.89144401 - time (sec): 1283.93 - samples/sec: 285.71 - lr: 0.000150 - momentum: 0.000000
2023-10-11 19:01:29,797 ----------------------------------------------------------------------------------------------------
2023-10-11 19:01:29,798 EPOCH 1 done: loss 0.8903 - lr: 0.000150
2023-10-11 19:02:04,152 DEV : loss 0.12409034371376038 - f1-score (micro avg)  0.2288
2023-10-11 19:02:04,214 saving best model
2023-10-11 19:02:05,095 ----------------------------------------------------------------------------------------------------
2023-10-11 19:04:16,365 epoch 2 - iter 260/2606 - loss 0.19944480 - time (sec): 131.27 - samples/sec: 276.21 - lr: 0.000148 - momentum: 0.000000
2023-10-11 19:06:29,023 epoch 2 - iter 520/2606 - loss 0.17888305 - time (sec): 263.93 - samples/sec: 280.41 - lr: 0.000147 - momentum: 0.000000
2023-10-11 19:08:39,009 epoch 2 - iter 780/2606 - loss 0.17550124 - time (sec): 393.91 - samples/sec: 276.30 - lr: 0.000145 - momentum: 0.000000
2023-10-11 19:10:54,773 epoch 2 - iter 1040/2606 - loss 0.17289231 - time (sec): 529.68 - samples/sec: 279.05 - lr: 0.000143 - momentum: 0.000000
2023-10-11 19:13:08,244 epoch 2 - iter 1300/2606 - loss 0.16686068 - time (sec): 663.15 - samples/sec: 278.12 - lr: 0.000142 - momentum: 0.000000
2023-10-11 19:15:18,062 epoch 2 - iter 1560/2606 - loss 0.16331495 - time (sec): 792.96 - samples/sec: 274.50 - lr: 0.000140 - momentum: 0.000000
2023-10-11 19:17:26,046 epoch 2 - iter 1820/2606 - loss 0.16202813 - time (sec): 920.95 - samples/sec: 271.80 - lr: 0.000138 - momentum: 0.000000
2023-10-11 19:19:38,966 epoch 2 - iter 2080/2606 - loss 0.15745900 - time (sec): 1053.87 - samples/sec: 273.38 - lr: 0.000137 - momentum: 0.000000
2023-10-11 19:21:55,074 epoch 2 - iter 2340/2606 - loss 0.15266877 - time (sec): 1189.98 - samples/sec: 277.06 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:24:06,873 epoch 2 - iter 2600/2606 - loss 0.14911363 - time (sec): 1321.78 - samples/sec: 277.57 - lr: 0.000133 - momentum: 0.000000
2023-10-11 19:24:09,628 ----------------------------------------------------------------------------------------------------
2023-10-11 19:24:09,628 EPOCH 2 done: loss 0.1491 - lr: 0.000133
2023-10-11 19:24:50,274 DEV : loss 0.11884504556655884 - f1-score (micro avg)  0.3333
2023-10-11 19:24:50,327 saving best model
2023-10-11 19:24:52,888 ----------------------------------------------------------------------------------------------------
2023-10-11 19:27:06,818 epoch 3 - iter 260/2606 - loss 0.09549702 - time (sec): 133.93 - samples/sec: 256.83 - lr: 0.000132 - momentum: 0.000000
2023-10-11 19:29:18,230 epoch 3 - iter 520/2606 - loss 0.09118772 - time (sec): 265.34 - samples/sec: 253.83 - lr: 0.000130 - momentum: 0.000000
2023-10-11 19:31:36,696 epoch 3 - iter 780/2606 - loss 0.09431511 - time (sec): 403.80 - samples/sec: 267.40 - lr: 0.000128 - momentum: 0.000000
2023-10-11 19:33:47,901 epoch 3 - iter 1040/2606 - loss 0.09523690 - time (sec): 535.01 - samples/sec: 268.90 - lr: 0.000127 - momentum: 0.000000
2023-10-11 19:36:01,433 epoch 3 - iter 1300/2606 - loss 0.09429778 - time (sec): 668.54 - samples/sec: 267.31 - lr: 0.000125 - momentum: 0.000000
2023-10-11 19:38:21,667 epoch 3 - iter 1560/2606 - loss 0.09059556 - time (sec): 808.77 - samples/sec: 269.83 - lr: 0.000123 - momentum: 0.000000
2023-10-11 19:40:42,056 epoch 3 - iter 1820/2606 - loss 0.09119508 - time (sec): 949.16 - samples/sec: 273.08 - lr: 0.000122 - momentum: 0.000000
2023-10-11 19:42:54,370 epoch 3 - iter 2080/2606 - loss 0.09200293 - time (sec): 1081.48 - samples/sec: 270.13 - lr: 0.000120 - momentum: 0.000000
2023-10-11 19:45:09,393 epoch 3 - iter 2340/2606 - loss 0.09134453 - time (sec): 1216.50 - samples/sec: 270.53 - lr: 0.000118 - momentum: 0.000000
2023-10-11 19:47:25,969 epoch 3 - iter 2600/2606 - loss 0.09143332 - time (sec): 1353.08 - samples/sec: 270.96 - lr: 0.000117 - momentum: 0.000000
2023-10-11 19:47:28,983 ----------------------------------------------------------------------------------------------------
2023-10-11 19:47:28,983 EPOCH 3 done: loss 0.0913 - lr: 0.000117
2023-10-11 19:48:08,882 DEV : loss 0.2239927500486374 - f1-score (micro avg)  0.3746
2023-10-11 19:48:08,941 saving best model
2023-10-11 19:48:11,492 ----------------------------------------------------------------------------------------------------
2023-10-11 19:50:25,170 epoch 4 - iter 260/2606 - loss 0.06118530 - time (sec): 133.67 - samples/sec: 270.56 - lr: 0.000115 - momentum: 0.000000
2023-10-11 19:52:37,833 epoch 4 - iter 520/2606 - loss 0.06087637 - time (sec): 266.34 - samples/sec: 278.05 - lr: 0.000113 - momentum: 0.000000
2023-10-11 19:54:53,690 epoch 4 - iter 780/2606 - loss 0.06056691 - time (sec): 402.19 - samples/sec: 280.14 - lr: 0.000112 - momentum: 0.000000
2023-10-11 19:57:05,159 epoch 4 - iter 1040/2606 - loss 0.06218598 - time (sec): 533.66 - samples/sec: 282.03 - lr: 0.000110 - momentum: 0.000000
2023-10-11 19:59:16,019 epoch 4 - iter 1300/2606 - loss 0.06065766 - time (sec): 664.52 - samples/sec: 280.05 - lr: 0.000108 - momentum: 0.000000
2023-10-11 20:01:25,024 epoch 4 - iter 1560/2606 - loss 0.06041444 - time (sec): 793.53 - samples/sec: 282.13 - lr: 0.000107 - momentum: 0.000000
2023-10-11 20:03:32,195 epoch 4 - iter 1820/2606 - loss 0.06132794 - time (sec): 920.70 - samples/sec: 282.66 - lr: 0.000105 - momentum: 0.000000
2023-10-11 20:05:42,062 epoch 4 - iter 2080/2606 - loss 0.06371225 - time (sec): 1050.56 - samples/sec: 280.30 - lr: 0.000103 - momentum: 0.000000
2023-10-11 20:07:54,590 epoch 4 - iter 2340/2606 - loss 0.06335648 - time (sec): 1183.09 - samples/sec: 280.57 - lr: 0.000102 - momentum: 0.000000
2023-10-11 20:10:05,042 epoch 4 - iter 2600/2606 - loss 0.06430066 - time (sec): 1313.55 - samples/sec: 278.85 - lr: 0.000100 - momentum: 0.000000
2023-10-11 20:10:08,636 ----------------------------------------------------------------------------------------------------
2023-10-11 20:10:08,636 EPOCH 4 done: loss 0.0642 - lr: 0.000100
2023-10-11 20:10:49,803 DEV : loss 0.29200002551078796 - f1-score (micro avg)  0.3514
2023-10-11 20:10:49,859 ----------------------------------------------------------------------------------------------------
2023-10-11 20:13:04,052 epoch 5 - iter 260/2606 - loss 0.04431033 - time (sec): 134.19 - samples/sec: 269.01 - lr: 0.000098 - momentum: 0.000000
2023-10-11 20:15:19,115 epoch 5 - iter 520/2606 - loss 0.04629748 - time (sec): 269.25 - samples/sec: 268.81 - lr: 0.000097 - momentum: 0.000000
2023-10-11 20:17:32,772 epoch 5 - iter 780/2606 - loss 0.04374310 - time (sec): 402.91 - samples/sec: 264.85 - lr: 0.000095 - momentum: 0.000000
2023-10-11 20:19:48,742 epoch 5 - iter 1040/2606 - loss 0.04534844 - time (sec): 538.88 - samples/sec: 268.67 - lr: 0.000093 - momentum: 0.000000
2023-10-11 20:22:02,477 epoch 5 - iter 1300/2606 - loss 0.04402533 - time (sec): 672.62 - samples/sec: 266.46 - lr: 0.000092 - momentum: 0.000000
2023-10-11 20:24:16,827 epoch 5 - iter 1560/2606 - loss 0.04300346 - time (sec): 806.97 - samples/sec: 268.39 - lr: 0.000090 - momentum: 0.000000
2023-10-11 20:26:33,229 epoch 5 - iter 1820/2606 - loss 0.04146209 - time (sec): 943.37 - samples/sec: 270.44 - lr: 0.000088 - momentum: 0.000000
2023-10-11 20:28:48,188 epoch 5 - iter 2080/2606 - loss 0.04272847 - time (sec): 1078.33 - samples/sec: 271.02 - lr: 0.000087 - momentum: 0.000000
2023-10-11 20:31:02,796 epoch 5 - iter 2340/2606 - loss 0.04380797 - time (sec): 1212.93 - samples/sec: 270.51 - lr: 0.000085 - momentum: 0.000000
2023-10-11 20:33:19,020 epoch 5 - iter 2600/2606 - loss 0.04446162 - time (sec): 1349.16 - samples/sec: 271.77 - lr: 0.000083 - momentum: 0.000000
2023-10-11 20:33:21,988 ----------------------------------------------------------------------------------------------------
2023-10-11 20:33:21,988 EPOCH 5 done: loss 0.0445 - lr: 0.000083
2023-10-11 20:34:01,578 DEV : loss 0.35915789008140564 - f1-score (micro avg)  0.3515
2023-10-11 20:34:01,631 ----------------------------------------------------------------------------------------------------
2023-10-11 20:36:15,312 epoch 6 - iter 260/2606 - loss 0.02751499 - time (sec): 133.68 - samples/sec: 285.83 - lr: 0.000082 - momentum: 0.000000
2023-10-11 20:38:28,554 epoch 6 - iter 520/2606 - loss 0.02668139 - time (sec): 266.92 - samples/sec: 285.05 - lr: 0.000080 - momentum: 0.000000
2023-10-11 20:40:39,962 epoch 6 - iter 780/2606 - loss 0.02772708 - time (sec): 398.33 - samples/sec: 280.60 - lr: 0.000078 - momentum: 0.000000
2023-10-11 20:42:51,500 epoch 6 - iter 1040/2606 - loss 0.02939592 - time (sec): 529.87 - samples/sec: 277.21 - lr: 0.000077 - momentum: 0.000000
2023-10-11 20:45:05,211 epoch 6 - iter 1300/2606 - loss 0.02930341 - time (sec): 663.58 - samples/sec: 279.96 - lr: 0.000075 - momentum: 0.000000
2023-10-11 20:47:18,160 epoch 6 - iter 1560/2606 - loss 0.03009322 - time (sec): 796.53 - samples/sec: 281.04 - lr: 0.000073 - momentum: 0.000000
2023-10-11 20:49:26,446 epoch 6 - iter 1820/2606 - loss 0.03205205 - time (sec): 924.81 - samples/sec: 280.23 - lr: 0.000072 - momentum: 0.000000
2023-10-11 20:51:34,987 epoch 6 - iter 2080/2606 - loss 0.03189978 - time (sec): 1053.35 - samples/sec: 279.61 - lr: 0.000070 - momentum: 0.000000
2023-10-11 20:53:46,605 epoch 6 - iter 2340/2606 - loss 0.03201383 - time (sec): 1184.97 - samples/sec: 279.41 - lr: 0.000068 - momentum: 0.000000
2023-10-11 20:55:59,370 epoch 6 - iter 2600/2606 - loss 0.03251908 - time (sec): 1317.74 - samples/sec: 278.21 - lr: 0.000067 - momentum: 0.000000
2023-10-11 20:56:02,483 ----------------------------------------------------------------------------------------------------
2023-10-11 20:56:02,483 EPOCH 6 done: loss 0.0325 - lr: 0.000067
2023-10-11 20:56:43,741 DEV : loss 0.3739728629589081 - f1-score (micro avg)  0.3533
2023-10-11 20:56:43,807 ----------------------------------------------------------------------------------------------------
2023-10-11 20:58:52,823 epoch 7 - iter 260/2606 - loss 0.02619037 - time (sec): 129.01 - samples/sec: 283.19 - lr: 0.000065 - momentum: 0.000000
2023-10-11 21:01:03,347 epoch 7 - iter 520/2606 - loss 0.02490134 - time (sec): 259.54 - samples/sec: 293.88 - lr: 0.000063 - momentum: 0.000000
2023-10-11 21:03:10,109 epoch 7 - iter 780/2606 - loss 0.02134400 - time (sec): 386.30 - samples/sec: 289.15 - lr: 0.000062 - momentum: 0.000000
2023-10-11 21:05:20,230 epoch 7 - iter 1040/2606 - loss 0.02220272 - time (sec): 516.42 - samples/sec: 288.75 - lr: 0.000060 - momentum: 0.000000
2023-10-11 21:07:28,798 epoch 7 - iter 1300/2606 - loss 0.02232363 - time (sec): 644.99 - samples/sec: 286.82 - lr: 0.000058 - momentum: 0.000000
2023-10-11 21:09:39,566 epoch 7 - iter 1560/2606 - loss 0.02293314 - time (sec): 775.76 - samples/sec: 287.24 - lr: 0.000057 - momentum: 0.000000
2023-10-11 21:11:47,374 epoch 7 - iter 1820/2606 - loss 0.02380953 - time (sec): 903.56 - samples/sec: 285.08 - lr: 0.000055 - momentum: 0.000000
2023-10-11 21:13:57,055 epoch 7 - iter 2080/2606 - loss 0.02337935 - time (sec): 1033.24 - samples/sec: 285.77 - lr: 0.000053 - momentum: 0.000000
2023-10-11 21:16:05,268 epoch 7 - iter 2340/2606 - loss 0.02286752 - time (sec): 1161.46 - samples/sec: 284.52 - lr: 0.000052 - momentum: 0.000000
2023-10-11 21:18:15,839 epoch 7 - iter 2600/2606 - loss 0.02318007 - time (sec): 1292.03 - samples/sec: 283.93 - lr: 0.000050 - momentum: 0.000000
2023-10-11 21:18:18,517 ----------------------------------------------------------------------------------------------------
2023-10-11 21:18:18,517 EPOCH 7 done: loss 0.0232 - lr: 0.000050
2023-10-11 21:18:59,922 DEV : loss 0.43210309743881226 - f1-score (micro avg)  0.3573
2023-10-11 21:18:59,981 ----------------------------------------------------------------------------------------------------
2023-10-11 21:21:15,546 epoch 8 - iter 260/2606 - loss 0.01736298 - time (sec): 135.56 - samples/sec: 273.93 - lr: 0.000048 - momentum: 0.000000
2023-10-11 21:23:32,130 epoch 8 - iter 520/2606 - loss 0.01768000 - time (sec): 272.15 - samples/sec: 279.51 - lr: 0.000047 - momentum: 0.000000
2023-10-11 21:25:51,149 epoch 8 - iter 780/2606 - loss 0.01844562 - time (sec): 411.16 - samples/sec: 285.42 - lr: 0.000045 - momentum: 0.000000
2023-10-11 21:28:06,844 epoch 8 - iter 1040/2606 - loss 0.01901340 - time (sec): 546.86 - samples/sec: 280.10 - lr: 0.000043 - momentum: 0.000000
2023-10-11 21:30:19,867 epoch 8 - iter 1300/2606 - loss 0.01890679 - time (sec): 679.88 - samples/sec: 276.66 - lr: 0.000042 - momentum: 0.000000
2023-10-11 21:32:31,499 epoch 8 - iter 1560/2606 - loss 0.01866615 - time (sec): 811.52 - samples/sec: 273.92 - lr: 0.000040 - momentum: 0.000000
2023-10-11 21:34:40,747 epoch 8 - iter 1820/2606 - loss 0.01757957 - time (sec): 940.76 - samples/sec: 272.69 - lr: 0.000038 - momentum: 0.000000
2023-10-11 21:36:51,645 epoch 8 - iter 2080/2606 - loss 0.01765646 - time (sec): 1071.66 - samples/sec: 272.98 - lr: 0.000037 - momentum: 0.000000
2023-10-11 21:39:02,297 epoch 8 - iter 2340/2606 - loss 0.01729175 - time (sec): 1202.31 - samples/sec: 273.08 - lr: 0.000035 - momentum: 0.000000
2023-10-11 21:41:16,293 epoch 8 - iter 2600/2606 - loss 0.01830845 - time (sec): 1336.31 - samples/sec: 274.43 - lr: 0.000033 - momentum: 0.000000
2023-10-11 21:41:19,135 ----------------------------------------------------------------------------------------------------
2023-10-11 21:41:19,135 EPOCH 8 done: loss 0.0183 - lr: 0.000033
2023-10-11 21:41:57,348 DEV : loss 0.4571216106414795 - f1-score (micro avg)  0.3794
2023-10-11 21:41:57,402 saving best model
2023-10-11 21:41:59,961 ----------------------------------------------------------------------------------------------------
2023-10-11 21:44:14,638 epoch 9 - iter 260/2606 - loss 0.01227961 - time (sec): 134.67 - samples/sec: 288.28 - lr: 0.000032 - momentum: 0.000000
2023-10-11 21:46:25,123 epoch 9 - iter 520/2606 - loss 0.01627790 - time (sec): 265.16 - samples/sec: 289.19 - lr: 0.000030 - momentum: 0.000000
2023-10-11 21:48:34,665 epoch 9 - iter 780/2606 - loss 0.01503186 - time (sec): 394.70 - samples/sec: 281.10 - lr: 0.000028 - momentum: 0.000000
2023-10-11 21:50:45,179 epoch 9 - iter 1040/2606 - loss 0.01537357 - time (sec): 525.21 - samples/sec: 277.70 - lr: 0.000027 - momentum: 0.000000
2023-10-11 21:52:57,790 epoch 9 - iter 1300/2606 - loss 0.01459316 - time (sec): 657.82 - samples/sec: 278.54 - lr: 0.000025 - momentum: 0.000000
2023-10-11 21:55:08,103 epoch 9 - iter 1560/2606 - loss 0.01431843 - time (sec): 788.14 - samples/sec: 278.39 - lr: 0.000023 - momentum: 0.000000
2023-10-11 21:57:19,726 epoch 9 - iter 1820/2606 - loss 0.01345396 - time (sec): 919.76 - samples/sec: 281.39 - lr: 0.000022 - momentum: 0.000000
2023-10-11 21:59:26,920 epoch 9 - iter 2080/2606 - loss 0.01290213 - time (sec): 1046.95 - samples/sec: 280.05 - lr: 0.000020 - momentum: 0.000000
2023-10-11 22:01:36,701 epoch 9 - iter 2340/2606 - loss 0.01298304 - time (sec): 1176.74 - samples/sec: 280.05 - lr: 0.000018 - momentum: 0.000000
2023-10-11 22:03:49,254 epoch 9 - iter 2600/2606 - loss 0.01328377 - time (sec): 1309.29 - samples/sec: 279.95 - lr: 0.000017 - momentum: 0.000000
2023-10-11 22:03:52,305 ----------------------------------------------------------------------------------------------------
2023-10-11 22:03:52,306 EPOCH 9 done: loss 0.0134 - lr: 0.000017
2023-10-11 22:04:31,292 DEV : loss 0.48688554763793945 - f1-score (micro avg)  0.3662
2023-10-11 22:04:31,344 ----------------------------------------------------------------------------------------------------
2023-10-11 22:06:40,870 epoch 10 - iter 260/2606 - loss 0.00976991 - time (sec): 129.52 - samples/sec: 274.54 - lr: 0.000015 - momentum: 0.000000
2023-10-11 22:08:51,631 epoch 10 - iter 520/2606 - loss 0.01046522 - time (sec): 260.28 - samples/sec: 277.04 - lr: 0.000013 - momentum: 0.000000
2023-10-11 22:11:04,388 epoch 10 - iter 780/2606 - loss 0.01008981 - time (sec): 393.04 - samples/sec: 281.28 - lr: 0.000012 - momentum: 0.000000
2023-10-11 22:13:14,775 epoch 10 - iter 1040/2606 - loss 0.00982650 - time (sec): 523.43 - samples/sec: 280.23 - lr: 0.000010 - momentum: 0.000000
2023-10-11 22:15:25,947 epoch 10 - iter 1300/2606 - loss 0.01060840 - time (sec): 654.60 - samples/sec: 279.02 - lr: 0.000008 - momentum: 0.000000
2023-10-11 22:17:41,758 epoch 10 - iter 1560/2606 - loss 0.01019264 - time (sec): 790.41 - samples/sec: 276.50 - lr: 0.000007 - momentum: 0.000000
2023-10-11 22:19:57,563 epoch 10 - iter 1820/2606 - loss 0.01002362 - time (sec): 926.22 - samples/sec: 276.30 - lr: 0.000005 - momentum: 0.000000
2023-10-11 22:22:13,203 epoch 10 - iter 2080/2606 - loss 0.01004871 - time (sec): 1061.86 - samples/sec: 277.49 - lr: 0.000003 - momentum: 0.000000
2023-10-11 22:24:25,037 epoch 10 - iter 2340/2606 - loss 0.01040771 - time (sec): 1193.69 - samples/sec: 277.55 - lr: 0.000002 - momentum: 0.000000
2023-10-11 22:26:35,199 epoch 10 - iter 2600/2606 - loss 0.01012088 - time (sec): 1323.85 - samples/sec: 276.81 - lr: 0.000000 - momentum: 0.000000
2023-10-11 22:26:38,285 ----------------------------------------------------------------------------------------------------
2023-10-11 22:26:38,286 EPOCH 10 done: loss 0.0101 - lr: 0.000000
2023-10-11 22:27:17,636 DEV : loss 0.49472910165786743 - f1-score (micro avg)  0.3727
2023-10-11 22:27:18,568 ----------------------------------------------------------------------------------------------------
2023-10-11 22:27:18,570 Loading model from best epoch ...
2023-10-11 22:27:22,331 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 22:28:58,579 
Results:
- F-score (micro) 0.4707
- F-score (macro) 0.3209
- Accuracy 0.3127

By class:
              precision    recall  f1-score   support

         LOC     0.4855    0.6087    0.5402      1214
         PER     0.3913    0.4926    0.4362       808
         ORG     0.3030    0.3116    0.3073       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4287    0.5218    0.4707      2390
   macro avg     0.2950    0.3532    0.3209      2390
weighted avg     0.4237    0.5218    0.4672      2390
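The summary rows of this report follow from the per-class numbers by standard arithmetic: micro F1 is the harmonic mean of the micro-averaged precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores. A minimal check against the logged values (pure arithmetic, no model involved):

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
per_class = {
    "LOC":       (0.4855, 0.6087, 0.5402, 1214),
    "PER":       (0.3913, 0.4926, 0.4362,  808),
    "ORG":       (0.3030, 0.3116, 0.3073,  353),
    "HumanProd": (0.0000, 0.0000, 0.0000,   15),
}

# Micro F1: harmonic mean of the micro-averaged precision and recall.
micro_p, micro_r = 0.4287, 0.5218
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

print(f"{micro_f1:.4f}")  # 0.4707 -- matches "F-score (micro)"
print(f"{macro_f1:.4f}")  # 0.3209 -- matches "F-score (macro)"
```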
2023-10-11 22:28:58,579 ----------------------------------------------------------------------------------------------------