2023-10-11 18:40:02,970 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,973 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 18:40:02,973 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Train: 20847 sentences
2023-10-11 18:40:02,974 (train_with_dev=False, train_with_test=False)
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Training Params:
2023-10-11 18:40:02,974  - learning_rate: "0.00015"
2023-10-11 18:40:02,974  - mini_batch_size: "8"
2023-10-11 18:40:02,974  - max_epochs: "10"
2023-10-11 18:40:02,974  - shuffle: "True"
2023-10-11 18:40:02,974 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,974 Plugins:
2023-10-11 18:40:02,975  - TensorboardLogger
2023-10-11 18:40:02,975  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 18:40:02,975  - metric: "('micro avg', 'f1-score')"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
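The Plugins section lists a LinearScheduler with warmup_fraction '0.1'. The lr values printed in the per-iteration lines below are consistent with linear warmup to the peak learning rate over the first 10% of optimizer steps, followed by linear decay to zero. A minimal sketch of that schedule, assuming this interpretation (the function `lr_at` and the step constants are illustrative, not Flair API; step counts come from the log: 2606 batches/epoch x 10 epochs):

```python
PEAK_LR = 0.00015
STEPS_PER_EPOCH = 2606
TOTAL_STEPS = STEPS_PER_EPOCH * 10      # 26060 optimizer steps
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # 2606 -- exactly one epoch here

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, linear decay."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Cross-check against lr values printed in the log:
print(round(lr_at(260), 6))    # epoch 1, iter 260 -> 1.5e-05  (logged: 0.000015)
print(round(lr_at(2606), 6))   # end of epoch 1    -> 0.00015  (peak, logged: 0.000150)
print(round(lr_at(2866), 6))   # epoch 2, iter 260 -> 0.000148 (logged: 0.000148)
```

Note that with max_epochs 10 and warmup_fraction 0.1, the warmup phase spans exactly the first epoch, which matches the lr peaking at 0.000150 at the end of epoch 1.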
2023-10-11 18:40:02,975 Computation:
2023-10-11 18:40:02,975  - compute on device: cuda:0
2023-10-11 18:40:02,975  - embedding storage: none
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,975 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:02,976 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 18:42:06,831 epoch 1 - iter 260/2606 - loss 2.81712327 - time (sec): 123.85 - samples/sec: 267.17 - lr: 0.000015 - momentum: 0.000000
2023-10-11 18:44:12,618 epoch 1 - iter 520/2606 - loss 2.56637286 - time (sec): 249.64 - samples/sec: 271.69 - lr: 0.000030 - momentum: 0.000000
2023-10-11 18:46:20,188 epoch 1 - iter 780/2606 - loss 2.15544405 - time (sec): 377.21 - samples/sec: 277.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 18:48:27,598 epoch 1 - iter 1040/2606 - loss 1.75472736 - time (sec): 504.62 - samples/sec: 281.12 - lr: 0.000060 - momentum: 0.000000
2023-10-11 18:50:37,138 epoch 1 - iter 1300/2606 - loss 1.48365754 - time (sec): 634.16 - samples/sec: 283.56 - lr: 0.000075 - momentum: 0.000000
2023-10-11 18:52:43,224 epoch 1 - iter 1560/2606 - loss 1.32250010 - time (sec): 760.25 - samples/sec: 282.41 - lr: 0.000090 - momentum: 0.000000
2023-10-11 18:54:52,733 epoch 1 - iter 1820/2606 - loss 1.18203870 - time (sec): 889.76 - samples/sec: 283.87 - lr: 0.000105 - momentum: 0.000000
2023-10-11 18:57:02,260 epoch 1 - iter 2080/2606 - loss 1.06995283 - time (sec): 1019.28 - samples/sec: 283.80 - lr: 0.000120 - momentum: 0.000000
2023-10-11 18:59:14,269 epoch 1 - iter 2340/2606 - loss 0.97221876 - time (sec): 1151.29 - samples/sec: 285.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:01:26,904 epoch 1 - iter 2600/2606 - loss 0.89144401 - time (sec): 1283.93 - samples/sec: 285.71 - lr: 0.000150 - momentum: 0.000000
2023-10-11 19:01:29,797 ----------------------------------------------------------------------------------------------------
2023-10-11 19:01:29,798 EPOCH 1 done: loss 0.8903 - lr: 0.000150
2023-10-11 19:02:04,152 DEV : loss 0.12409034371376038 - f1-score (micro avg)  0.2288
2023-10-11 19:02:04,214 saving best model
2023-10-11 19:02:05,095 ----------------------------------------------------------------------------------------------------
2023-10-11 19:04:16,365 epoch 2 - iter 260/2606 - loss 0.19944480 - time (sec): 131.27 - samples/sec: 276.21 - lr: 0.000148 - momentum: 0.000000
2023-10-11 19:06:29,023 epoch 2 - iter 520/2606 - loss 0.17888305 - time (sec): 263.93 - samples/sec: 280.41 - lr: 0.000147 - momentum: 0.000000
2023-10-11 19:08:39,009 epoch 2 - iter 780/2606 - loss 0.17550124 - time (sec): 393.91 - samples/sec: 276.30 - lr: 0.000145 - momentum: 0.000000
2023-10-11 19:10:54,773 epoch 2 - iter 1040/2606 - loss 0.17289231 - time (sec): 529.68 - samples/sec: 279.05 - lr: 0.000143 - momentum: 0.000000
2023-10-11 19:13:08,244 epoch 2 - iter 1300/2606 - loss 0.16686068 - time (sec): 663.15 - samples/sec: 278.12 - lr: 0.000142 - momentum: 0.000000
2023-10-11 19:15:18,062 epoch 2 - iter 1560/2606 - loss 0.16331495 - time (sec): 792.96 - samples/sec: 274.50 - lr: 0.000140 - momentum: 0.000000
2023-10-11 19:17:26,046 epoch 2 - iter 1820/2606 - loss 0.16202813 - time (sec): 920.95 - samples/sec: 271.80 - lr: 0.000138 - momentum: 0.000000
2023-10-11 19:19:38,966 epoch 2 - iter 2080/2606 - loss 0.15745900 - time (sec): 1053.87 - samples/sec: 273.38 - lr: 0.000137 - momentum: 0.000000
2023-10-11 19:21:55,074 epoch 2 - iter 2340/2606 - loss 0.15266877 - time (sec): 1189.98 - samples/sec: 277.06 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:24:06,873 epoch 2 - iter 2600/2606 - loss 0.14911363 - time (sec): 1321.78 - samples/sec: 277.57 - lr: 0.000133 - momentum: 0.000000
2023-10-11 19:24:09,628 ----------------------------------------------------------------------------------------------------
2023-10-11 19:24:09,628 EPOCH 2 done: loss 0.1491 - lr: 0.000133
2023-10-11 19:24:50,274 DEV : loss 0.11884504556655884 - f1-score (micro avg)  0.3333
2023-10-11 19:24:50,327 saving best model
2023-10-11 19:24:52,888 ----------------------------------------------------------------------------------------------------
2023-10-11 19:27:06,818 epoch 3 - iter 260/2606 - loss 0.09549702 - time (sec): 133.93 - samples/sec: 256.83 - lr: 0.000132 - momentum: 0.000000
2023-10-11 19:29:18,230 epoch 3 - iter 520/2606 - loss 0.09118772 - time (sec): 265.34 - samples/sec: 253.83 - lr: 0.000130 - momentum: 0.000000
2023-10-11 19:31:36,696 epoch 3 - iter 780/2606 - loss 0.09431511 - time (sec): 403.80 - samples/sec: 267.40 - lr: 0.000128 - momentum: 0.000000
2023-10-11 19:33:47,901 epoch 3 - iter 1040/2606 - loss 0.09523690 - time (sec): 535.01 - samples/sec: 268.90 - lr: 0.000127 - momentum: 0.000000
2023-10-11 19:36:01,433 epoch 3 - iter 1300/2606 - loss 0.09429778 - time (sec): 668.54 - samples/sec: 267.31 - lr: 0.000125 - momentum: 0.000000
2023-10-11 19:38:21,667 epoch 3 - iter 1560/2606 - loss 0.09059556 - time (sec): 808.77 - samples/sec: 269.83 - lr: 0.000123 - momentum: 0.000000
2023-10-11 19:40:42,056 epoch 3 - iter 1820/2606 - loss 0.09119508 - time (sec): 949.16 - samples/sec: 273.08 - lr: 0.000122 - momentum: 0.000000
2023-10-11 19:42:54,370 epoch 3 - iter 2080/2606 - loss 0.09200293 - time (sec): 1081.48 - samples/sec: 270.13 - lr: 0.000120 - momentum: 0.000000
2023-10-11 19:45:09,393 epoch 3 - iter 2340/2606 - loss 0.09134453 - time (sec): 1216.50 - samples/sec: 270.53 - lr: 0.000118 - momentum: 0.000000
2023-10-11 19:47:25,969 epoch 3 - iter 2600/2606 - loss 0.09143332 - time (sec): 1353.08 - samples/sec: 270.96 - lr: 0.000117 - momentum: 0.000000
2023-10-11 19:47:28,983 ----------------------------------------------------------------------------------------------------
2023-10-11 19:47:28,983 EPOCH 3 done: loss 0.0913 - lr: 0.000117
2023-10-11 19:48:08,882 DEV : loss 0.2239927500486374 - f1-score (micro avg)  0.3746
2023-10-11 19:48:08,941 saving best model
2023-10-11 19:48:11,492 ----------------------------------------------------------------------------------------------------
2023-10-11 19:50:25,170 epoch 4 - iter 260/2606 - loss 0.06118530 - time (sec): 133.67 - samples/sec: 270.56 - lr: 0.000115 - momentum: 0.000000
2023-10-11 19:52:37,833 epoch 4 - iter 520/2606 - loss 0.06087637 - time (sec): 266.34 - samples/sec: 278.05 - lr: 0.000113 - momentum: 0.000000
2023-10-11 19:54:53,690 epoch 4 - iter 780/2606 - loss 0.06056691 - time (sec): 402.19 - samples/sec: 280.14 - lr: 0.000112 - momentum: 0.000000
2023-10-11 19:57:05,159 epoch 4 - iter 1040/2606 - loss 0.06218598 - time (sec): 533.66 - samples/sec: 282.03 - lr: 0.000110 - momentum: 0.000000
2023-10-11 19:59:16,019 epoch 4 - iter 1300/2606 - loss 0.06065766 - time (sec): 664.52 - samples/sec: 280.05 - lr: 0.000108 - momentum: 0.000000
2023-10-11 20:01:25,024 epoch 4 - iter 1560/2606 - loss 0.06041444 - time (sec): 793.53 - samples/sec: 282.13 - lr: 0.000107 - momentum: 0.000000
2023-10-11 20:03:32,195 epoch 4 - iter 1820/2606 - loss 0.06132794 - time (sec): 920.70 - samples/sec: 282.66 - lr: 0.000105 - momentum: 0.000000
2023-10-11 20:05:42,062 epoch 4 - iter 2080/2606 - loss 0.06371225 - time (sec): 1050.56 - samples/sec: 280.30 - lr: 0.000103 - momentum: 0.000000
2023-10-11 20:07:54,590 epoch 4 - iter 2340/2606 - loss 0.06335648 - time (sec): 1183.09 - samples/sec: 280.57 - lr: 0.000102 - momentum: 0.000000
2023-10-11 20:10:05,042 epoch 4 - iter 2600/2606 - loss 0.06430066 - time (sec): 1313.55 - samples/sec: 278.85 - lr: 0.000100 - momentum: 0.000000
2023-10-11 20:10:08,636 ----------------------------------------------------------------------------------------------------
2023-10-11 20:10:08,636 EPOCH 4 done: loss 0.0642 - lr: 0.000100
2023-10-11 20:10:49,803 DEV : loss 0.29200002551078796 - f1-score (micro avg)  0.3514
2023-10-11 20:10:49,859 ----------------------------------------------------------------------------------------------------
2023-10-11 20:13:04,052 epoch 5 - iter 260/2606 - loss 0.04431033 - time (sec): 134.19 - samples/sec: 269.01 - lr: 0.000098 - momentum: 0.000000
2023-10-11 20:15:19,115 epoch 5 - iter 520/2606 - loss 0.04629748 - time (sec): 269.25 - samples/sec: 268.81 - lr: 0.000097 - momentum: 0.000000
2023-10-11 20:17:32,772 epoch 5 - iter 780/2606 - loss 0.04374310 - time (sec): 402.91 - samples/sec: 264.85 - lr: 0.000095 - momentum: 0.000000
2023-10-11 20:19:48,742 epoch 5 - iter 1040/2606 - loss 0.04534844 - time (sec): 538.88 - samples/sec: 268.67 - lr: 0.000093 - momentum: 0.000000
2023-10-11 20:22:02,477 epoch 5 - iter 1300/2606 - loss 0.04402533 - time (sec): 672.62 - samples/sec: 266.46 - lr: 0.000092 - momentum: 0.000000
2023-10-11 20:24:16,827 epoch 5 - iter 1560/2606 - loss 0.04300346 - time (sec): 806.97 - samples/sec: 268.39 - lr: 0.000090 - momentum: 0.000000
2023-10-11 20:26:33,229 epoch 5 - iter 1820/2606 - loss 0.04146209 - time (sec): 943.37 - samples/sec: 270.44 - lr: 0.000088 - momentum: 0.000000
2023-10-11 20:28:48,188 epoch 5 - iter 2080/2606 - loss 0.04272847 - time (sec): 1078.33 - samples/sec: 271.02 - lr: 0.000087 - momentum: 0.000000
2023-10-11 20:31:02,796 epoch 5 - iter 2340/2606 - loss 0.04380797 - time (sec): 1212.93 - samples/sec: 270.51 - lr: 0.000085 - momentum: 0.000000
2023-10-11 20:33:19,020 epoch 5 - iter 2600/2606 - loss 0.04446162 - time (sec): 1349.16 - samples/sec: 271.77 - lr: 0.000083 - momentum: 0.000000
2023-10-11 20:33:21,988 ----------------------------------------------------------------------------------------------------
2023-10-11 20:33:21,988 EPOCH 5 done: loss 0.0445 - lr: 0.000083
2023-10-11 20:34:01,578 DEV : loss 0.35915789008140564 - f1-score (micro avg)  0.3515
2023-10-11 20:34:01,631 ----------------------------------------------------------------------------------------------------
2023-10-11 20:36:15,312 epoch 6 - iter 260/2606 - loss 0.02751499 - time (sec): 133.68 - samples/sec: 285.83 - lr: 0.000082 - momentum: 0.000000
2023-10-11 20:38:28,554 epoch 6 - iter 520/2606 - loss 0.02668139 - time (sec): 266.92 - samples/sec: 285.05 - lr: 0.000080 - momentum: 0.000000
2023-10-11 20:40:39,962 epoch 6 - iter 780/2606 - loss 0.02772708 - time (sec): 398.33 - samples/sec: 280.60 - lr: 0.000078 - momentum: 0.000000
2023-10-11 20:42:51,500 epoch 6 - iter 1040/2606 - loss 0.02939592 - time (sec): 529.87 - samples/sec: 277.21 - lr: 0.000077 - momentum: 0.000000
2023-10-11 20:45:05,211 epoch 6 - iter 1300/2606 - loss 0.02930341 - time (sec): 663.58 - samples/sec: 279.96 - lr: 0.000075 - momentum: 0.000000
2023-10-11 20:47:18,160 epoch 6 - iter 1560/2606 - loss 0.03009322 - time (sec): 796.53 - samples/sec: 281.04 - lr: 0.000073 - momentum: 0.000000
2023-10-11 20:49:26,446 epoch 6 - iter 1820/2606 - loss 0.03205205 - time (sec): 924.81 - samples/sec: 280.23 - lr: 0.000072 - momentum: 0.000000
2023-10-11 20:51:34,987 epoch 6 - iter 2080/2606 - loss 0.03189978 - time (sec): 1053.35 - samples/sec: 279.61 - lr: 0.000070 - momentum: 0.000000
2023-10-11 20:53:46,605 epoch 6 - iter 2340/2606 - loss 0.03201383 - time (sec): 1184.97 - samples/sec: 279.41 - lr: 0.000068 - momentum: 0.000000
2023-10-11 20:55:59,370 epoch 6 - iter 2600/2606 - loss 0.03251908 - time (sec): 1317.74 - samples/sec: 278.21 - lr: 0.000067 - momentum: 0.000000
2023-10-11 20:56:02,483 ----------------------------------------------------------------------------------------------------
2023-10-11 20:56:02,483 EPOCH 6 done: loss 0.0325 - lr: 0.000067
2023-10-11 20:56:43,741 DEV : loss 0.3739728629589081 - f1-score (micro avg)  0.3533
2023-10-11 20:56:43,807 ----------------------------------------------------------------------------------------------------
2023-10-11 20:58:52,823 epoch 7 - iter 260/2606 - loss 0.02619037 - time (sec): 129.01 - samples/sec: 283.19 - lr: 0.000065 - momentum: 0.000000
2023-10-11 21:01:03,347 epoch 7 - iter 520/2606 - loss 0.02490134 - time (sec): 259.54 - samples/sec: 293.88 - lr: 0.000063 - momentum: 0.000000
2023-10-11 21:03:10,109 epoch 7 - iter 780/2606 - loss 0.02134400 - time (sec): 386.30 - samples/sec: 289.15 - lr: 0.000062 - momentum: 0.000000
2023-10-11 21:05:20,230 epoch 7 - iter 1040/2606 - loss 0.02220272 - time (sec): 516.42 - samples/sec: 288.75 - lr: 0.000060 - momentum: 0.000000
2023-10-11 21:07:28,798 epoch 7 - iter 1300/2606 - loss 0.02232363 - time (sec): 644.99 - samples/sec: 286.82 - lr: 0.000058 - momentum: 0.000000
2023-10-11 21:09:39,566 epoch 7 - iter 1560/2606 - loss 0.02293314 - time (sec): 775.76 - samples/sec: 287.24 - lr: 0.000057 - momentum: 0.000000
2023-10-11 21:11:47,374 epoch 7 - iter 1820/2606 - loss 0.02380953 - time (sec): 903.56 - samples/sec: 285.08 - lr: 0.000055 - momentum: 0.000000
2023-10-11 21:13:57,055 epoch 7 - iter 2080/2606 - loss 0.02337935 - time (sec): 1033.24 - samples/sec: 285.77 - lr: 0.000053 - momentum: 0.000000
2023-10-11 21:16:05,268 epoch 7 - iter 2340/2606 - loss 0.02286752 - time (sec): 1161.46 - samples/sec: 284.52 - lr: 0.000052 - momentum: 0.000000
2023-10-11 21:18:15,839 epoch 7 - iter 2600/2606 - loss 0.02318007 - time (sec): 1292.03 - samples/sec: 283.93 - lr: 0.000050 - momentum: 0.000000
2023-10-11 21:18:18,517 ----------------------------------------------------------------------------------------------------
2023-10-11 21:18:18,517 EPOCH 7 done: loss 0.0232 - lr: 0.000050
2023-10-11 21:18:59,922 DEV : loss 0.43210309743881226 - f1-score (micro avg)  0.3573
2023-10-11 21:18:59,981 ----------------------------------------------------------------------------------------------------
2023-10-11 21:21:15,546 epoch 8 - iter 260/2606 - loss 0.01736298 - time (sec): 135.56 - samples/sec: 273.93 - lr: 0.000048 - momentum: 0.000000
2023-10-11 21:23:32,130 epoch 8 - iter 520/2606 - loss 0.01768000 - time (sec): 272.15 - samples/sec: 279.51 - lr: 0.000047 - momentum: 0.000000
2023-10-11 21:25:51,149 epoch 8 - iter 780/2606 - loss 0.01844562 - time (sec): 411.16 - samples/sec: 285.42 - lr: 0.000045 - momentum: 0.000000
2023-10-11 21:28:06,844 epoch 8 - iter 1040/2606 - loss 0.01901340 - time (sec): 546.86 - samples/sec: 280.10 - lr: 0.000043 - momentum: 0.000000
2023-10-11 21:30:19,867 epoch 8 - iter 1300/2606 - loss 0.01890679 - time (sec): 679.88 - samples/sec: 276.66 - lr: 0.000042 - momentum: 0.000000
2023-10-11 21:32:31,499 epoch 8 - iter 1560/2606 - loss 0.01866615 - time (sec): 811.52 - samples/sec: 273.92 - lr: 0.000040 - momentum: 0.000000
2023-10-11 21:34:40,747 epoch 8 - iter 1820/2606 - loss 0.01757957 - time (sec): 940.76 - samples/sec: 272.69 - lr: 0.000038 - momentum: 0.000000
2023-10-11 21:36:51,645 epoch 8 - iter 2080/2606 - loss 0.01765646 - time (sec): 1071.66 - samples/sec: 272.98 - lr: 0.000037 - momentum: 0.000000
2023-10-11 21:39:02,297 epoch 8 - iter 2340/2606 - loss 0.01729175 - time (sec): 1202.31 - samples/sec: 273.08 - lr: 0.000035 - momentum: 0.000000
2023-10-11 21:41:16,293 epoch 8 - iter 2600/2606 - loss 0.01830845 - time (sec): 1336.31 - samples/sec: 274.43 - lr: 0.000033 - momentum: 0.000000
2023-10-11 21:41:19,135 ----------------------------------------------------------------------------------------------------
2023-10-11 21:41:19,135 EPOCH 8 done: loss 0.0183 - lr: 0.000033
2023-10-11 21:41:57,348 DEV : loss 0.4571216106414795 - f1-score (micro avg)  0.3794
2023-10-11 21:41:57,402 saving best model
2023-10-11 21:41:59,961 ----------------------------------------------------------------------------------------------------
2023-10-11 21:44:14,638 epoch 9 - iter 260/2606 - loss 0.01227961 - time (sec): 134.67 - samples/sec: 288.28 - lr: 0.000032 - momentum: 0.000000
2023-10-11 21:46:25,123 epoch 9 - iter 520/2606 - loss 0.01627790 - time (sec): 265.16 - samples/sec: 289.19 - lr: 0.000030 - momentum: 0.000000
2023-10-11 21:48:34,665 epoch 9 - iter 780/2606 - loss 0.01503186 - time (sec): 394.70 - samples/sec: 281.10 - lr: 0.000028 - momentum: 0.000000
2023-10-11 21:50:45,179 epoch 9 - iter 1040/2606 - loss 0.01537357 - time (sec): 525.21 - samples/sec: 277.70 - lr: 0.000027 - momentum: 0.000000
2023-10-11 21:52:57,790 epoch 9 - iter 1300/2606 - loss 0.01459316 - time (sec): 657.82 - samples/sec: 278.54 - lr: 0.000025 - momentum: 0.000000
2023-10-11 21:55:08,103 epoch 9 - iter 1560/2606 - loss 0.01431843 - time (sec): 788.14 - samples/sec: 278.39 - lr: 0.000023 - momentum: 0.000000
2023-10-11 21:57:19,726 epoch 9 - iter 1820/2606 - loss 0.01345396 - time (sec): 919.76 - samples/sec: 281.39 - lr: 0.000022 - momentum: 0.000000
2023-10-11 21:59:26,920 epoch 9 - iter 2080/2606 - loss 0.01290213 - time (sec): 1046.95 - samples/sec: 280.05 - lr: 0.000020 - momentum: 0.000000
2023-10-11 22:01:36,701 epoch 9 - iter 2340/2606 - loss 0.01298304 - time (sec): 1176.74 - samples/sec: 280.05 - lr: 0.000018 - momentum: 0.000000
2023-10-11 22:03:49,254 epoch 9 - iter 2600/2606 - loss 0.01328377 - time (sec): 1309.29 - samples/sec: 279.95 - lr: 0.000017 - momentum: 0.000000
2023-10-11 22:03:52,305 ----------------------------------------------------------------------------------------------------
2023-10-11 22:03:52,306 EPOCH 9 done: loss 0.0134 - lr: 0.000017
2023-10-11 22:04:31,292 DEV : loss 0.48688554763793945 - f1-score (micro avg)  0.3662
2023-10-11 22:04:31,344 ----------------------------------------------------------------------------------------------------
2023-10-11 22:06:40,870 epoch 10 - iter 260/2606 - loss 0.00976991 - time (sec): 129.52 - samples/sec: 274.54 - lr: 0.000015 - momentum: 0.000000
2023-10-11 22:08:51,631 epoch 10 - iter 520/2606 - loss 0.01046522 - time (sec): 260.28 - samples/sec: 277.04 - lr: 0.000013 - momentum: 0.000000
2023-10-11 22:11:04,388 epoch 10 - iter 780/2606 - loss 0.01008981 - time (sec): 393.04 - samples/sec: 281.28 - lr: 0.000012 - momentum: 0.000000
2023-10-11 22:13:14,775 epoch 10 - iter 1040/2606 - loss 0.00982650 - time (sec): 523.43 - samples/sec: 280.23 - lr: 0.000010 - momentum: 0.000000
2023-10-11 22:15:25,947 epoch 10 - iter 1300/2606 - loss 0.01060840 - time (sec): 654.60 - samples/sec: 279.02 - lr: 0.000008 - momentum: 0.000000
2023-10-11 22:17:41,758 epoch 10 - iter 1560/2606 - loss 0.01019264 - time (sec): 790.41 - samples/sec: 276.50 - lr: 0.000007 - momentum: 0.000000
2023-10-11 22:19:57,563 epoch 10 - iter 1820/2606 - loss 0.01002362 - time (sec): 926.22 - samples/sec: 276.30 - lr: 0.000005 - momentum: 0.000000
2023-10-11 22:22:13,203 epoch 10 - iter 2080/2606 - loss 0.01004871 - time (sec): 1061.86 - samples/sec: 277.49 - lr: 0.000003 - momentum: 0.000000
2023-10-11 22:24:25,037 epoch 10 - iter 2340/2606 - loss 0.01040771 - time (sec): 1193.69 - samples/sec: 277.55 - lr: 0.000002 - momentum: 0.000000
2023-10-11 22:26:35,199 epoch 10 - iter 2600/2606 - loss 0.01012088 - time (sec): 1323.85 - samples/sec: 276.81 - lr: 0.000000 - momentum: 0.000000
2023-10-11 22:26:38,285 ----------------------------------------------------------------------------------------------------
2023-10-11 22:26:38,286 EPOCH 10 done: loss 0.0101 - lr: 0.000000
2023-10-11 22:27:17,636 DEV : loss 0.49472910165786743 - f1-score (micro avg)  0.3727
2023-10-11 22:27:18,568 ----------------------------------------------------------------------------------------------------
2023-10-11 22:27:18,570 Loading model from best epoch ...
2023-10-11 22:27:22,331 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 22:28:58,579 Results:
- F-score (micro) 0.4707
- F-score (macro) 0.3209
- Accuracy 0.3127

By class:
              precision    recall  f1-score   support

         LOC     0.4855    0.6087    0.5402      1214
         PER     0.3913    0.4926    0.4362       808
         ORG     0.3030    0.3116    0.3073       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4287    0.5218    0.4707      2390
   macro avg     0.2950    0.3532    0.3209      2390
weighted avg     0.4237    0.5218    0.4672      2390

2023-10-11 22:28:58,579 ----------------------------------------------------------------------------------------------------
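The aggregate rows of the final test report follow from the per-class rows. A short sketch recomputing the micro, macro, and weighted F1 values from the logged per-class numbers (the dict layout and variable names are illustrative; small discrepancies are possible because the logged per-class values are themselves rounded):

```python
# Per-class rows copied from the test report above: (precision, recall, f1, support).
classes = {
    "LOC":       (0.4855, 0.6087, 0.5402, 1214),
    "PER":       (0.3913, 0.4926, 0.4362,  808),
    "ORG":       (0.3030, 0.3116, 0.3073,  353),
    "HumanProd": (0.0000, 0.0000, 0.0000,   15),
}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)

# Weighted average: per-class F1 weighted by class support.
total_support = sum(s for *_, s in classes.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total_support

# Micro F1: harmonic mean of the logged micro precision and recall.
micro_p, micro_r = 0.4287, 0.5218
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(round(macro_f1, 4))     # 0.3209 -- matches "F-score (macro)"
print(round(micro_f1, 4))     # 0.4707 -- matches "F-score (micro)"
print(round(weighted_f1, 3))  # 0.467  -- matches the 0.4672 row up to rounding
```

The zero scores for HumanProd (only 15 test mentions) drag the macro average well below the micro average, which is dominated by the frequent LOC and PER classes.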