2023-10-11 18:40:02,970 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,973 Model: "SequenceTagger(
(embeddings): ByT5Embeddings(
(model): T5EncoderModel(
(shared): Embedding(384, 1472)
(encoder): T5Stack(
(embed_tokens): Embedding(384, 1472)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
(relative_attention_bias): Embedding(32, 6)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1-11): 11 x T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=1472, out_features=17, bias=True)
(loss_function): CrossEntropyLoss()
)"
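Reading dimensions off the module dump above (d_model=1472, d_ff=3584, attention inner dim 384, 12 blocks, a 384-entry byte vocabulary, and 6 heads implied by the `relative_attention_bias: Embedding(32, 6)`), the encoder's parameter count can be sketched by hand. This is back-of-the-envelope arithmetic, not an exact count: `shared` and `embed_tokens` are tied and counted once, and the small 32x6 relative-position bias table in block 0 is ignored.

```python
# Rough encoder parameter count from the module dump above.
d_model, d_ff, d_attn, n_blocks, vocab = 1472, 3584, 384, 12, 384
n_heads = 6                    # relative_attention_bias: Embedding(32, 6)
assert d_attn == n_heads * 64  # 6 heads of 64 dims each

attn = 3 * d_model * d_attn + d_attn * d_model  # q, k, v + o (all bias-free)
ff = 2 * d_model * d_ff + d_ff * d_model        # wi_0, wi_1 + wo (gated GELU)
per_block = attn + ff + 2 * d_model             # plus two RMSNorm weight vectors

# Tied token embedding + 12 blocks + final RMSNorm (32x6 bias table omitted).
total = vocab * d_model + n_blocks * per_block + d_model
print(f"{total / 1e6:.1f}M encoder parameters")
```

The result, roughly 218M encoder parameters, is consistent with ByT5-small's heavily encoder-weighted split of its ~300M total.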
2023-10-11 18:40:02,973 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Train: 20847 sentences
2023-10-11 18:40:02,974 (train_with_dev=False, train_with_test=False)
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Training Params:
2023-10-11 18:40:02,974 - learning_rate: "0.00015"
2023-10-11 18:40:02,974 - mini_batch_size: "8"
2023-10-11 18:40:02,974 - max_epochs: "10"
2023-10-11 18:40:02,974 - shuffle: "True"
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Plugins:
2023-10-11 18:40:02,975 - TensorboardLogger
2023-10-11 18:40:02,975 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
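The `LinearScheduler` with `warmup_fraction: 0.1` ramps the learning rate linearly from 0 to the peak over the first 10% of all steps, then decays linearly back to 0. With 2606 iterations x 10 epochs = 26,060 steps, warmup covers exactly epoch 1, which matches the lr column in the log: 0.000150 at the end of epoch 1, 0 at the final step. A minimal sketch (hypothetical helper mirroring the logged behavior, not Flair's own scheduler code):

```python
def linear_lr(step, peak=0.00015, total=2606 * 10, warmup_fraction=0.1):
    """Linear warmup to `peak`, then linear decay to zero, as seen in the log."""
    warmup = int(total * warmup_fraction)  # 2606 steps = exactly epoch 1 here
    if step <= warmup:
        return peak * step / warmup
    return peak * (total - step) / (total - warmup)

print(linear_lr(260))    # epoch 1, iter 260: ~0.000015, matching the log
print(linear_lr(2606))   # end of warmup: peak lr 0.00015
print(linear_lr(26060))  # final step: decayed to 0
```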
2023-10-11 18:40:02,975 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 18:40:02,975 - metric: "('micro avg', 'f1-score')"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Computation:
2023-10-11 18:40:02,975 - compute on device: cuda:0
2023-10-11 18:40:02,975 - embedding storage: none
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,976 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 18:42:06,831 epoch 1 - iter 260/2606 - loss 2.81712327 - time (sec): 123.85 - samples/sec: 267.17 - lr: 0.000015 - momentum: 0.000000
2023-10-11 18:44:12,618 epoch 1 - iter 520/2606 - loss 2.56637286 - time (sec): 249.64 - samples/sec: 271.69 - lr: 0.000030 - momentum: 0.000000
2023-10-11 18:46:20,188 epoch 1 - iter 780/2606 - loss 2.15544405 - time (sec): 377.21 - samples/sec: 277.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 18:48:27,598 epoch 1 - iter 1040/2606 - loss 1.75472736 - time (sec): 504.62 - samples/sec: 281.12 - lr: 0.000060 - momentum: 0.000000
2023-10-11 18:50:37,138 epoch 1 - iter 1300/2606 - loss 1.48365754 - time (sec): 634.16 - samples/sec: 283.56 - lr: 0.000075 - momentum: 0.000000
2023-10-11 18:52:43,224 epoch 1 - iter 1560/2606 - loss 1.32250010 - time (sec): 760.25 - samples/sec: 282.41 - lr: 0.000090 - momentum: 0.000000
2023-10-11 18:54:52,733 epoch 1 - iter 1820/2606 - loss 1.18203870 - time (sec): 889.76 - samples/sec: 283.87 - lr: 0.000105 - momentum: 0.000000
2023-10-11 18:57:02,260 epoch 1 - iter 2080/2606 - loss 1.06995283 - time (sec): 1019.28 - samples/sec: 283.80 - lr: 0.000120 - momentum: 0.000000
2023-10-11 18:59:14,269 epoch 1 - iter 2340/2606 - loss 0.97221876 - time (sec): 1151.29 - samples/sec: 285.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:01:26,904 epoch 1 - iter 2600/2606 - loss 0.89144401 - time (sec): 1283.93 - samples/sec: 285.71 - lr: 0.000150 - momentum: 0.000000
2023-10-11 19:01:29,797 ----------------------------------------------------------------------------------------------------
2023-10-11 19:01:29,798 EPOCH 1 done: loss 0.8903 - lr: 0.000150
2023-10-11 19:02:04,152 DEV : loss 0.12409034371376038 - f1-score (micro avg) 0.2288
2023-10-11 19:02:04,214 saving best model
2023-10-11 19:02:05,095 ----------------------------------------------------------------------------------------------------
2023-10-11 19:04:16,365 epoch 2 - iter 260/2606 - loss 0.19944480 - time (sec): 131.27 - samples/sec: 276.21 - lr: 0.000148 - momentum: 0.000000
2023-10-11 19:06:29,023 epoch 2 - iter 520/2606 - loss 0.17888305 - time (sec): 263.93 - samples/sec: 280.41 - lr: 0.000147 - momentum: 0.000000
2023-10-11 19:08:39,009 epoch 2 - iter 780/2606 - loss 0.17550124 - time (sec): 393.91 - samples/sec: 276.30 - lr: 0.000145 - momentum: 0.000000
2023-10-11 19:10:54,773 epoch 2 - iter 1040/2606 - loss 0.17289231 - time (sec): 529.68 - samples/sec: 279.05 - lr: 0.000143 - momentum: 0.000000
2023-10-11 19:13:08,244 epoch 2 - iter 1300/2606 - loss 0.16686068 - time (sec): 663.15 - samples/sec: 278.12 - lr: 0.000142 - momentum: 0.000000
2023-10-11 19:15:18,062 epoch 2 - iter 1560/2606 - loss 0.16331495 - time (sec): 792.96 - samples/sec: 274.50 - lr: 0.000140 - momentum: 0.000000
2023-10-11 19:17:26,046 epoch 2 - iter 1820/2606 - loss 0.16202813 - time (sec): 920.95 - samples/sec: 271.80 - lr: 0.000138 - momentum: 0.000000
2023-10-11 19:19:38,966 epoch 2 - iter 2080/2606 - loss 0.15745900 - time (sec): 1053.87 - samples/sec: 273.38 - lr: 0.000137 - momentum: 0.000000
2023-10-11 19:21:55,074 epoch 2 - iter 2340/2606 - loss 0.15266877 - time (sec): 1189.98 - samples/sec: 277.06 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:24:06,873 epoch 2 - iter 2600/2606 - loss 0.14911363 - time (sec): 1321.78 - samples/sec: 277.57 - lr: 0.000133 - momentum: 0.000000
2023-10-11 19:24:09,628 ----------------------------------------------------------------------------------------------------
2023-10-11 19:24:09,628 EPOCH 2 done: loss 0.1491 - lr: 0.000133
2023-10-11 19:24:50,274 DEV : loss 0.11884504556655884 - f1-score (micro avg) 0.3333
2023-10-11 19:24:50,327 saving best model
2023-10-11 19:24:52,888 ----------------------------------------------------------------------------------------------------
2023-10-11 19:27:06,818 epoch 3 - iter 260/2606 - loss 0.09549702 - time (sec): 133.93 - samples/sec: 256.83 - lr: 0.000132 - momentum: 0.000000
2023-10-11 19:29:18,230 epoch 3 - iter 520/2606 - loss 0.09118772 - time (sec): 265.34 - samples/sec: 253.83 - lr: 0.000130 - momentum: 0.000000
2023-10-11 19:31:36,696 epoch 3 - iter 780/2606 - loss 0.09431511 - time (sec): 403.80 - samples/sec: 267.40 - lr: 0.000128 - momentum: 0.000000
2023-10-11 19:33:47,901 epoch 3 - iter 1040/2606 - loss 0.09523690 - time (sec): 535.01 - samples/sec: 268.90 - lr: 0.000127 - momentum: 0.000000
2023-10-11 19:36:01,433 epoch 3 - iter 1300/2606 - loss 0.09429778 - time (sec): 668.54 - samples/sec: 267.31 - lr: 0.000125 - momentum: 0.000000
2023-10-11 19:38:21,667 epoch 3 - iter 1560/2606 - loss 0.09059556 - time (sec): 808.77 - samples/sec: 269.83 - lr: 0.000123 - momentum: 0.000000
2023-10-11 19:40:42,056 epoch 3 - iter 1820/2606 - loss 0.09119508 - time (sec): 949.16 - samples/sec: 273.08 - lr: 0.000122 - momentum: 0.000000
2023-10-11 19:42:54,370 epoch 3 - iter 2080/2606 - loss 0.09200293 - time (sec): 1081.48 - samples/sec: 270.13 - lr: 0.000120 - momentum: 0.000000
2023-10-11 19:45:09,393 epoch 3 - iter 2340/2606 - loss 0.09134453 - time (sec): 1216.50 - samples/sec: 270.53 - lr: 0.000118 - momentum: 0.000000
2023-10-11 19:47:25,969 epoch 3 - iter 2600/2606 - loss 0.09143332 - time (sec): 1353.08 - samples/sec: 270.96 - lr: 0.000117 - momentum: 0.000000
2023-10-11 19:47:28,983 ----------------------------------------------------------------------------------------------------
2023-10-11 19:47:28,983 EPOCH 3 done: loss 0.0913 - lr: 0.000117
2023-10-11 19:48:08,882 DEV : loss 0.2239927500486374 - f1-score (micro avg) 0.3746
2023-10-11 19:48:08,941 saving best model
2023-10-11 19:48:11,492 ----------------------------------------------------------------------------------------------------
2023-10-11 19:50:25,170 epoch 4 - iter 260/2606 - loss 0.06118530 - time (sec): 133.67 - samples/sec: 270.56 - lr: 0.000115 - momentum: 0.000000
2023-10-11 19:52:37,833 epoch 4 - iter 520/2606 - loss 0.06087637 - time (sec): 266.34 - samples/sec: 278.05 - lr: 0.000113 - momentum: 0.000000
2023-10-11 19:54:53,690 epoch 4 - iter 780/2606 - loss 0.06056691 - time (sec): 402.19 - samples/sec: 280.14 - lr: 0.000112 - momentum: 0.000000
2023-10-11 19:57:05,159 epoch 4 - iter 1040/2606 - loss 0.06218598 - time (sec): 533.66 - samples/sec: 282.03 - lr: 0.000110 - momentum: 0.000000
2023-10-11 19:59:16,019 epoch 4 - iter 1300/2606 - loss 0.06065766 - time (sec): 664.52 - samples/sec: 280.05 - lr: 0.000108 - momentum: 0.000000
2023-10-11 20:01:25,024 epoch 4 - iter 1560/2606 - loss 0.06041444 - time (sec): 793.53 - samples/sec: 282.13 - lr: 0.000107 - momentum: 0.000000
2023-10-11 20:03:32,195 epoch 4 - iter 1820/2606 - loss 0.06132794 - time (sec): 920.70 - samples/sec: 282.66 - lr: 0.000105 - momentum: 0.000000
2023-10-11 20:05:42,062 epoch 4 - iter 2080/2606 - loss 0.06371225 - time (sec): 1050.56 - samples/sec: 280.30 - lr: 0.000103 - momentum: 0.000000
2023-10-11 20:07:54,590 epoch 4 - iter 2340/2606 - loss 0.06335648 - time (sec): 1183.09 - samples/sec: 280.57 - lr: 0.000102 - momentum: 0.000000
2023-10-11 20:10:05,042 epoch 4 - iter 2600/2606 - loss 0.06430066 - time (sec): 1313.55 - samples/sec: 278.85 - lr: 0.000100 - momentum: 0.000000
2023-10-11 20:10:08,636 ----------------------------------------------------------------------------------------------------
2023-10-11 20:10:08,636 EPOCH 4 done: loss 0.0642 - lr: 0.000100
2023-10-11 20:10:49,803 DEV : loss 0.29200002551078796 - f1-score (micro avg) 0.3514
2023-10-11 20:10:49,859 ----------------------------------------------------------------------------------------------------
2023-10-11 20:13:04,052 epoch 5 - iter 260/2606 - loss 0.04431033 - time (sec): 134.19 - samples/sec: 269.01 - lr: 0.000098 - momentum: 0.000000
2023-10-11 20:15:19,115 epoch 5 - iter 520/2606 - loss 0.04629748 - time (sec): 269.25 - samples/sec: 268.81 - lr: 0.000097 - momentum: 0.000000
2023-10-11 20:17:32,772 epoch 5 - iter 780/2606 - loss 0.04374310 - time (sec): 402.91 - samples/sec: 264.85 - lr: 0.000095 - momentum: 0.000000
2023-10-11 20:19:48,742 epoch 5 - iter 1040/2606 - loss 0.04534844 - time (sec): 538.88 - samples/sec: 268.67 - lr: 0.000093 - momentum: 0.000000
2023-10-11 20:22:02,477 epoch 5 - iter 1300/2606 - loss 0.04402533 - time (sec): 672.62 - samples/sec: 266.46 - lr: 0.000092 - momentum: 0.000000
2023-10-11 20:24:16,827 epoch 5 - iter 1560/2606 - loss 0.04300346 - time (sec): 806.97 - samples/sec: 268.39 - lr: 0.000090 - momentum: 0.000000
2023-10-11 20:26:33,229 epoch 5 - iter 1820/2606 - loss 0.04146209 - time (sec): 943.37 - samples/sec: 270.44 - lr: 0.000088 - momentum: 0.000000
2023-10-11 20:28:48,188 epoch 5 - iter 2080/2606 - loss 0.04272847 - time (sec): 1078.33 - samples/sec: 271.02 - lr: 0.000087 - momentum: 0.000000
2023-10-11 20:31:02,796 epoch 5 - iter 2340/2606 - loss 0.04380797 - time (sec): 1212.93 - samples/sec: 270.51 - lr: 0.000085 - momentum: 0.000000
2023-10-11 20:33:19,020 epoch 5 - iter 2600/2606 - loss 0.04446162 - time (sec): 1349.16 - samples/sec: 271.77 - lr: 0.000083 - momentum: 0.000000
2023-10-11 20:33:21,988 ----------------------------------------------------------------------------------------------------
2023-10-11 20:33:21,988 EPOCH 5 done: loss 0.0445 - lr: 0.000083
2023-10-11 20:34:01,578 DEV : loss 0.35915789008140564 - f1-score (micro avg) 0.3515
2023-10-11 20:34:01,631 ----------------------------------------------------------------------------------------------------
2023-10-11 20:36:15,312 epoch 6 - iter 260/2606 - loss 0.02751499 - time (sec): 133.68 - samples/sec: 285.83 - lr: 0.000082 - momentum: 0.000000
2023-10-11 20:38:28,554 epoch 6 - iter 520/2606 - loss 0.02668139 - time (sec): 266.92 - samples/sec: 285.05 - lr: 0.000080 - momentum: 0.000000
2023-10-11 20:40:39,962 epoch 6 - iter 780/2606 - loss 0.02772708 - time (sec): 398.33 - samples/sec: 280.60 - lr: 0.000078 - momentum: 0.000000
2023-10-11 20:42:51,500 epoch 6 - iter 1040/2606 - loss 0.02939592 - time (sec): 529.87 - samples/sec: 277.21 - lr: 0.000077 - momentum: 0.000000
2023-10-11 20:45:05,211 epoch 6 - iter 1300/2606 - loss 0.02930341 - time (sec): 663.58 - samples/sec: 279.96 - lr: 0.000075 - momentum: 0.000000
2023-10-11 20:47:18,160 epoch 6 - iter 1560/2606 - loss 0.03009322 - time (sec): 796.53 - samples/sec: 281.04 - lr: 0.000073 - momentum: 0.000000
2023-10-11 20:49:26,446 epoch 6 - iter 1820/2606 - loss 0.03205205 - time (sec): 924.81 - samples/sec: 280.23 - lr: 0.000072 - momentum: 0.000000
2023-10-11 20:51:34,987 epoch 6 - iter 2080/2606 - loss 0.03189978 - time (sec): 1053.35 - samples/sec: 279.61 - lr: 0.000070 - momentum: 0.000000
2023-10-11 20:53:46,605 epoch 6 - iter 2340/2606 - loss 0.03201383 - time (sec): 1184.97 - samples/sec: 279.41 - lr: 0.000068 - momentum: 0.000000
2023-10-11 20:55:59,370 epoch 6 - iter 2600/2606 - loss 0.03251908 - time (sec): 1317.74 - samples/sec: 278.21 - lr: 0.000067 - momentum: 0.000000
2023-10-11 20:56:02,483 ----------------------------------------------------------------------------------------------------
2023-10-11 20:56:02,483 EPOCH 6 done: loss 0.0325 - lr: 0.000067
2023-10-11 20:56:43,741 DEV : loss 0.3739728629589081 - f1-score (micro avg) 0.3533
2023-10-11 20:56:43,807 ----------------------------------------------------------------------------------------------------
2023-10-11 20:58:52,823 epoch 7 - iter 260/2606 - loss 0.02619037 - time (sec): 129.01 - samples/sec: 283.19 - lr: 0.000065 - momentum: 0.000000
2023-10-11 21:01:03,347 epoch 7 - iter 520/2606 - loss 0.02490134 - time (sec): 259.54 - samples/sec: 293.88 - lr: 0.000063 - momentum: 0.000000
2023-10-11 21:03:10,109 epoch 7 - iter 780/2606 - loss 0.02134400 - time (sec): 386.30 - samples/sec: 289.15 - lr: 0.000062 - momentum: 0.000000
2023-10-11 21:05:20,230 epoch 7 - iter 1040/2606 - loss 0.02220272 - time (sec): 516.42 - samples/sec: 288.75 - lr: 0.000060 - momentum: 0.000000
2023-10-11 21:07:28,798 epoch 7 - iter 1300/2606 - loss 0.02232363 - time (sec): 644.99 - samples/sec: 286.82 - lr: 0.000058 - momentum: 0.000000
2023-10-11 21:09:39,566 epoch 7 - iter 1560/2606 - loss 0.02293314 - time (sec): 775.76 - samples/sec: 287.24 - lr: 0.000057 - momentum: 0.000000
2023-10-11 21:11:47,374 epoch 7 - iter 1820/2606 - loss 0.02380953 - time (sec): 903.56 - samples/sec: 285.08 - lr: 0.000055 - momentum: 0.000000
2023-10-11 21:13:57,055 epoch 7 - iter 2080/2606 - loss 0.02337935 - time (sec): 1033.24 - samples/sec: 285.77 - lr: 0.000053 - momentum: 0.000000
2023-10-11 21:16:05,268 epoch 7 - iter 2340/2606 - loss 0.02286752 - time (sec): 1161.46 - samples/sec: 284.52 - lr: 0.000052 - momentum: 0.000000
2023-10-11 21:18:15,839 epoch 7 - iter 2600/2606 - loss 0.02318007 - time (sec): 1292.03 - samples/sec: 283.93 - lr: 0.000050 - momentum: 0.000000
2023-10-11 21:18:18,517 ----------------------------------------------------------------------------------------------------
2023-10-11 21:18:18,517 EPOCH 7 done: loss 0.0232 - lr: 0.000050
2023-10-11 21:18:59,922 DEV : loss 0.43210309743881226 - f1-score (micro avg) 0.3573
2023-10-11 21:18:59,981 ----------------------------------------------------------------------------------------------------
2023-10-11 21:21:15,546 epoch 8 - iter 260/2606 - loss 0.01736298 - time (sec): 135.56 - samples/sec: 273.93 - lr: 0.000048 - momentum: 0.000000
2023-10-11 21:23:32,130 epoch 8 - iter 520/2606 - loss 0.01768000 - time (sec): 272.15 - samples/sec: 279.51 - lr: 0.000047 - momentum: 0.000000
2023-10-11 21:25:51,149 epoch 8 - iter 780/2606 - loss 0.01844562 - time (sec): 411.16 - samples/sec: 285.42 - lr: 0.000045 - momentum: 0.000000
2023-10-11 21:28:06,844 epoch 8 - iter 1040/2606 - loss 0.01901340 - time (sec): 546.86 - samples/sec: 280.10 - lr: 0.000043 - momentum: 0.000000
2023-10-11 21:30:19,867 epoch 8 - iter 1300/2606 - loss 0.01890679 - time (sec): 679.88 - samples/sec: 276.66 - lr: 0.000042 - momentum: 0.000000
2023-10-11 21:32:31,499 epoch 8 - iter 1560/2606 - loss 0.01866615 - time (sec): 811.52 - samples/sec: 273.92 - lr: 0.000040 - momentum: 0.000000
2023-10-11 21:34:40,747 epoch 8 - iter 1820/2606 - loss 0.01757957 - time (sec): 940.76 - samples/sec: 272.69 - lr: 0.000038 - momentum: 0.000000
2023-10-11 21:36:51,645 epoch 8 - iter 2080/2606 - loss 0.01765646 - time (sec): 1071.66 - samples/sec: 272.98 - lr: 0.000037 - momentum: 0.000000
2023-10-11 21:39:02,297 epoch 8 - iter 2340/2606 - loss 0.01729175 - time (sec): 1202.31 - samples/sec: 273.08 - lr: 0.000035 - momentum: 0.000000
2023-10-11 21:41:16,293 epoch 8 - iter 2600/2606 - loss 0.01830845 - time (sec): 1336.31 - samples/sec: 274.43 - lr: 0.000033 - momentum: 0.000000
2023-10-11 21:41:19,135 ----------------------------------------------------------------------------------------------------
2023-10-11 21:41:19,135 EPOCH 8 done: loss 0.0183 - lr: 0.000033
2023-10-11 21:41:57,348 DEV : loss 0.4571216106414795 - f1-score (micro avg) 0.3794
2023-10-11 21:41:57,402 saving best model
2023-10-11 21:41:59,961 ----------------------------------------------------------------------------------------------------
2023-10-11 21:44:14,638 epoch 9 - iter 260/2606 - loss 0.01227961 - time (sec): 134.67 - samples/sec: 288.28 - lr: 0.000032 - momentum: 0.000000
2023-10-11 21:46:25,123 epoch 9 - iter 520/2606 - loss 0.01627790 - time (sec): 265.16 - samples/sec: 289.19 - lr: 0.000030 - momentum: 0.000000
2023-10-11 21:48:34,665 epoch 9 - iter 780/2606 - loss 0.01503186 - time (sec): 394.70 - samples/sec: 281.10 - lr: 0.000028 - momentum: 0.000000
2023-10-11 21:50:45,179 epoch 9 - iter 1040/2606 - loss 0.01537357 - time (sec): 525.21 - samples/sec: 277.70 - lr: 0.000027 - momentum: 0.000000
2023-10-11 21:52:57,790 epoch 9 - iter 1300/2606 - loss 0.01459316 - time (sec): 657.82 - samples/sec: 278.54 - lr: 0.000025 - momentum: 0.000000
2023-10-11 21:55:08,103 epoch 9 - iter 1560/2606 - loss 0.01431843 - time (sec): 788.14 - samples/sec: 278.39 - lr: 0.000023 - momentum: 0.000000
2023-10-11 21:57:19,726 epoch 9 - iter 1820/2606 - loss 0.01345396 - time (sec): 919.76 - samples/sec: 281.39 - lr: 0.000022 - momentum: 0.000000
2023-10-11 21:59:26,920 epoch 9 - iter 2080/2606 - loss 0.01290213 - time (sec): 1046.95 - samples/sec: 280.05 - lr: 0.000020 - momentum: 0.000000
2023-10-11 22:01:36,701 epoch 9 - iter 2340/2606 - loss 0.01298304 - time (sec): 1176.74 - samples/sec: 280.05 - lr: 0.000018 - momentum: 0.000000
2023-10-11 22:03:49,254 epoch 9 - iter 2600/2606 - loss 0.01328377 - time (sec): 1309.29 - samples/sec: 279.95 - lr: 0.000017 - momentum: 0.000000
2023-10-11 22:03:52,305 ----------------------------------------------------------------------------------------------------
2023-10-11 22:03:52,306 EPOCH 9 done: loss 0.0134 - lr: 0.000017
2023-10-11 22:04:31,292 DEV : loss 0.48688554763793945 - f1-score (micro avg) 0.3662
2023-10-11 22:04:31,344 ----------------------------------------------------------------------------------------------------
2023-10-11 22:06:40,870 epoch 10 - iter 260/2606 - loss 0.00976991 - time (sec): 129.52 - samples/sec: 274.54 - lr: 0.000015 - momentum: 0.000000
2023-10-11 22:08:51,631 epoch 10 - iter 520/2606 - loss 0.01046522 - time (sec): 260.28 - samples/sec: 277.04 - lr: 0.000013 - momentum: 0.000000
2023-10-11 22:11:04,388 epoch 10 - iter 780/2606 - loss 0.01008981 - time (sec): 393.04 - samples/sec: 281.28 - lr: 0.000012 - momentum: 0.000000
2023-10-11 22:13:14,775 epoch 10 - iter 1040/2606 - loss 0.00982650 - time (sec): 523.43 - samples/sec: 280.23 - lr: 0.000010 - momentum: 0.000000
2023-10-11 22:15:25,947 epoch 10 - iter 1300/2606 - loss 0.01060840 - time (sec): 654.60 - samples/sec: 279.02 - lr: 0.000008 - momentum: 0.000000
2023-10-11 22:17:41,758 epoch 10 - iter 1560/2606 - loss 0.01019264 - time (sec): 790.41 - samples/sec: 276.50 - lr: 0.000007 - momentum: 0.000000
2023-10-11 22:19:57,563 epoch 10 - iter 1820/2606 - loss 0.01002362 - time (sec): 926.22 - samples/sec: 276.30 - lr: 0.000005 - momentum: 0.000000
2023-10-11 22:22:13,203 epoch 10 - iter 2080/2606 - loss 0.01004871 - time (sec): 1061.86 - samples/sec: 277.49 - lr: 0.000003 - momentum: 0.000000
2023-10-11 22:24:25,037 epoch 10 - iter 2340/2606 - loss 0.01040771 - time (sec): 1193.69 - samples/sec: 277.55 - lr: 0.000002 - momentum: 0.000000
2023-10-11 22:26:35,199 epoch 10 - iter 2600/2606 - loss 0.01012088 - time (sec): 1323.85 - samples/sec: 276.81 - lr: 0.000000 - momentum: 0.000000
2023-10-11 22:26:38,285 ----------------------------------------------------------------------------------------------------
2023-10-11 22:26:38,286 EPOCH 10 done: loss 0.0101 - lr: 0.000000
2023-10-11 22:27:17,636 DEV : loss 0.49472910165786743 - f1-score (micro avg) 0.3727
2023-10-11 22:27:18,568 ----------------------------------------------------------------------------------------------------
2023-10-11 22:27:18,570 Loading model from best epoch ...
2023-10-11 22:27:22,331 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
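The 17-tag dictionary is a BIOES scheme over four entity types: S- marks a single-token entity, B-/I-/E- mark the begin, inside, and end of a multi-token one. A minimal decoder sketch (a hypothetical helper for illustration, not Flair's own span extraction) turns such a tag sequence into labeled spans:

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end, label) spans, end-inclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":          # single-token entity
            spans.append((i, i, label))
            start = None
        elif prefix == "B":        # open a multi-token entity
            start = i
        elif prefix == "E" and start is not None:  # close it
            spans.append((start, i, label))
            start = None
    return spans

print(bioes_to_spans(["S-LOC", "O", "B-PER", "I-PER", "E-PER"]))
# [(0, 0, 'LOC'), (2, 4, 'PER')]
```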
2023-10-11 22:28:58,579
Results:
- F-score (micro) 0.4707
- F-score (macro) 0.3209
- Accuracy 0.3127
By class:
              precision    recall  f1-score   support

         LOC     0.4855    0.6087    0.5402      1214
         PER     0.3913    0.4926    0.4362       808
         ORG     0.3030    0.3116    0.3073       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4287    0.5218    0.4707      2390
   macro avg     0.2950    0.3532    0.3209      2390
weighted avg     0.4237    0.5218    0.4672      2390
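The aggregate rows in the table above follow the usual classification-report arithmetic: micro F1 is the harmonic mean of micro precision and recall, macro F1 the unweighted mean of per-class F1, and weighted F1 the support-weighted mean. A quick sanity check against the reported numbers (agreement is up to the table's own rounding):

```python
per_class = {  # (precision, recall, f1, support) copied from the table above
    "LOC": (0.4855, 0.6087, 0.5402, 1214),
    "PER": (0.3913, 0.4926, 0.4362, 808),
    "ORG": (0.3030, 0.3116, 0.3073, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}
micro_p, micro_r = 0.4287, 0.5218

micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)   # harmonic mean
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)
total = sum(s for *_, s in per_class.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in per_class.values()) / total

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.4707 0.3209, as reported
```

The rare HumanProd class (15 test mentions, zero correct predictions) is what drags the macro average so far below the micro average.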
2023-10-11 22:28:58,579 ----------------------------------------------------------------------------------------------------