2023-10-10 22:24:24,605 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,608 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
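The per-block parameter counts can be read directly off the Linear shapes in the repr above. As a quick sanity check (pure arithmetic, no library needed; the module and variable names below are just labels for the printed shapes):

```python
# Shapes taken from the model repr: d_model=1472, attention width 384, FF width 3584.
d_model, attn_dim, d_ff = 1472, 384, 3584

# T5Attention: q, k, v project d_model -> 384; o projects 384 -> d_model (all bias-free).
attn_params = 3 * (d_model * attn_dim) + attn_dim * d_model

# T5DenseGatedActDense: wi_0 and wi_1 (d_model -> d_ff), wo (d_ff -> d_model).
ff_params = 2 * (d_model * d_ff) + d_ff * d_model

print(attn_params)  # 2260992 weights per self-attention sub-layer
print(ff_params)    # 15826944 weights per feed-forward sub-layer
```

So the gated feed-forward sub-layers dominate each block by roughly a factor of seven.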
2023-10-10 22:24:24,608 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,608 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-10 22:24:24,608 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Train: 20847 sentences
2023-10-10 22:24:24,609 (train_with_dev=False, train_with_test=False)
2023-10-10 22:24:24,609 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Training Params:
2023-10-10 22:24:24,609 - learning_rate: "0.00016"
2023-10-10 22:24:24,609 - mini_batch_size: "4"
2023-10-10 22:24:24,609 - max_epochs: "10"
2023-10-10 22:24:24,609 - shuffle: "True"
2023-10-10 22:24:24,609 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Plugins:
2023-10-10 22:24:24,609 - TensorboardLogger
2023-10-10 22:24:24,609 - LinearScheduler | warmup_fraction: '0.1'
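The LinearScheduler with warmup_fraction 0.1 explains the lr values in the iteration logs below: linear warmup over the first 10% of the 52,120 total steps (10 epochs x 5,212 batches), then linear decay to zero. A minimal sketch of that schedule (Flair's internal step bookkeeping may differ by an off-by-one, so treat this as an approximation):

```python
def linear_warmup_decay_lr(step, total_steps=52_120, peak_lr=0.00016, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to 0 (sketch of the schedule)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 5212 for this run
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Matches the logged values: lr ~0.000016 at iter 521 of epoch 1,
# and ~0.000142 at the end of epoch 2 (global step 5212 + 5210).
print(round(linear_warmup_decay_lr(521), 6))          # 1.6e-05
print(round(linear_warmup_decay_lr(5212 + 5210), 6))  # 0.000142
```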
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 22:24:24,610 - metric: "('micro avg', 'f1-score')"
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Computation:
2023-10-10 22:24:24,610 - compute on device: cuda:0
2023-10-10 22:24:24,610 - embedding storage: none
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 22:26:46,615 epoch 1 - iter 521/5212 - loss 2.80621420 - time (sec): 142.00 - samples/sec: 252.95 - lr: 0.000016 - momentum: 0.000000
2023-10-10 22:29:11,418 epoch 1 - iter 1042/5212 - loss 2.34595380 - time (sec): 286.81 - samples/sec: 251.49 - lr: 0.000032 - momentum: 0.000000
2023-10-10 22:31:43,864 epoch 1 - iter 1563/5212 - loss 1.80357624 - time (sec): 439.25 - samples/sec: 249.05 - lr: 0.000048 - momentum: 0.000000
2023-10-10 22:34:13,264 epoch 1 - iter 2084/5212 - loss 1.44765160 - time (sec): 588.65 - samples/sec: 252.62 - lr: 0.000064 - momentum: 0.000000
2023-10-10 22:36:34,209 epoch 1 - iter 2605/5212 - loss 1.24536717 - time (sec): 729.60 - samples/sec: 254.32 - lr: 0.000080 - momentum: 0.000000
2023-10-10 22:38:55,115 epoch 1 - iter 3126/5212 - loss 1.09417563 - time (sec): 870.50 - samples/sec: 256.70 - lr: 0.000096 - momentum: 0.000000
2023-10-10 22:41:14,782 epoch 1 - iter 3647/5212 - loss 0.98901475 - time (sec): 1010.17 - samples/sec: 257.39 - lr: 0.000112 - momentum: 0.000000
2023-10-10 22:43:35,247 epoch 1 - iter 4168/5212 - loss 0.90166452 - time (sec): 1150.63 - samples/sec: 256.91 - lr: 0.000128 - momentum: 0.000000
2023-10-10 22:45:52,609 epoch 1 - iter 4689/5212 - loss 0.83405959 - time (sec): 1288.00 - samples/sec: 256.15 - lr: 0.000144 - momentum: 0.000000
2023-10-10 22:48:12,733 epoch 1 - iter 5210/5212 - loss 0.77082608 - time (sec): 1428.12 - samples/sec: 257.19 - lr: 0.000160 - momentum: 0.000000
2023-10-10 22:48:13,210 ----------------------------------------------------------------------------------------------------
2023-10-10 22:48:13,210 EPOCH 1 done: loss 0.7706 - lr: 0.000160
2023-10-10 22:48:47,574 DEV : loss 0.15222090482711792 - f1-score (micro avg) 0.2556
2023-10-10 22:48:47,634 saving best model
2023-10-10 22:48:48,628 ----------------------------------------------------------------------------------------------------
2023-10-10 22:51:07,221 epoch 2 - iter 521/5212 - loss 0.18484159 - time (sec): 138.59 - samples/sec: 253.05 - lr: 0.000158 - momentum: 0.000000
2023-10-10 22:53:28,919 epoch 2 - iter 1042/5212 - loss 0.18798941 - time (sec): 280.29 - samples/sec: 253.43 - lr: 0.000156 - momentum: 0.000000
2023-10-10 22:55:50,621 epoch 2 - iter 1563/5212 - loss 0.18286963 - time (sec): 421.99 - samples/sec: 255.27 - lr: 0.000155 - momentum: 0.000000
2023-10-10 22:58:20,531 epoch 2 - iter 2084/5212 - loss 0.17954531 - time (sec): 571.90 - samples/sec: 249.53 - lr: 0.000153 - momentum: 0.000000
2023-10-10 23:00:52,956 epoch 2 - iter 2605/5212 - loss 0.17672694 - time (sec): 724.32 - samples/sec: 248.40 - lr: 0.000151 - momentum: 0.000000
2023-10-10 23:03:19,258 epoch 2 - iter 3126/5212 - loss 0.16868245 - time (sec): 870.63 - samples/sec: 250.12 - lr: 0.000149 - momentum: 0.000000
2023-10-10 23:05:44,294 epoch 2 - iter 3647/5212 - loss 0.16681779 - time (sec): 1015.66 - samples/sec: 251.89 - lr: 0.000148 - momentum: 0.000000
2023-10-10 23:08:07,373 epoch 2 - iter 4168/5212 - loss 0.16454487 - time (sec): 1158.74 - samples/sec: 251.36 - lr: 0.000146 - momentum: 0.000000
2023-10-10 23:10:31,277 epoch 2 - iter 4689/5212 - loss 0.16327904 - time (sec): 1302.65 - samples/sec: 250.95 - lr: 0.000144 - momentum: 0.000000
2023-10-10 23:12:58,253 epoch 2 - iter 5210/5212 - loss 0.15972063 - time (sec): 1449.62 - samples/sec: 253.30 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:12:58,844 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:58,844 EPOCH 2 done: loss 0.1598 - lr: 0.000142
2023-10-10 23:13:39,614 DEV : loss 0.14660175144672394 - f1-score (micro avg) 0.3391
2023-10-10 23:13:39,668 saving best model
2023-10-10 23:13:42,315 ----------------------------------------------------------------------------------------------------
2023-10-10 23:16:06,577 epoch 3 - iter 521/5212 - loss 0.09466517 - time (sec): 144.26 - samples/sec: 247.08 - lr: 0.000140 - momentum: 0.000000
2023-10-10 23:18:30,244 epoch 3 - iter 1042/5212 - loss 0.10152859 - time (sec): 287.93 - samples/sec: 252.60 - lr: 0.000139 - momentum: 0.000000
2023-10-10 23:20:52,348 epoch 3 - iter 1563/5212 - loss 0.10301371 - time (sec): 430.03 - samples/sec: 255.52 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:23:12,806 epoch 3 - iter 2084/5212 - loss 0.11068085 - time (sec): 570.49 - samples/sec: 255.33 - lr: 0.000135 - momentum: 0.000000
2023-10-10 23:25:40,398 epoch 3 - iter 2605/5212 - loss 0.11269189 - time (sec): 718.08 - samples/sec: 258.48 - lr: 0.000133 - momentum: 0.000000
2023-10-10 23:28:03,360 epoch 3 - iter 3126/5212 - loss 0.11124043 - time (sec): 861.04 - samples/sec: 259.51 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:30:25,782 epoch 3 - iter 3647/5212 - loss 0.10989130 - time (sec): 1003.46 - samples/sec: 259.62 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:32:47,962 epoch 3 - iter 4168/5212 - loss 0.10954252 - time (sec): 1145.64 - samples/sec: 258.97 - lr: 0.000128 - momentum: 0.000000
2023-10-10 23:35:08,657 epoch 3 - iter 4689/5212 - loss 0.10867168 - time (sec): 1286.34 - samples/sec: 256.45 - lr: 0.000126 - momentum: 0.000000
2023-10-10 23:37:30,782 epoch 3 - iter 5210/5212 - loss 0.10800539 - time (sec): 1428.46 - samples/sec: 257.08 - lr: 0.000124 - momentum: 0.000000
2023-10-10 23:37:31,326 ----------------------------------------------------------------------------------------------------
2023-10-10 23:37:31,326 EPOCH 3 done: loss 0.1080 - lr: 0.000124
2023-10-10 23:38:11,188 DEV : loss 0.17651064693927765 - f1-score (micro avg) 0.3568
2023-10-10 23:38:11,239 saving best model
2023-10-10 23:38:13,860 ----------------------------------------------------------------------------------------------------
2023-10-10 23:40:32,288 epoch 4 - iter 521/5212 - loss 0.06937373 - time (sec): 138.42 - samples/sec: 266.87 - lr: 0.000123 - momentum: 0.000000
2023-10-10 23:42:53,769 epoch 4 - iter 1042/5212 - loss 0.07237076 - time (sec): 279.90 - samples/sec: 273.67 - lr: 0.000121 - momentum: 0.000000
2023-10-10 23:45:12,667 epoch 4 - iter 1563/5212 - loss 0.06838367 - time (sec): 418.80 - samples/sec: 269.27 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:47:31,229 epoch 4 - iter 2084/5212 - loss 0.07063170 - time (sec): 557.36 - samples/sec: 266.63 - lr: 0.000117 - momentum: 0.000000
2023-10-10 23:49:55,323 epoch 4 - iter 2605/5212 - loss 0.06907979 - time (sec): 701.46 - samples/sec: 264.86 - lr: 0.000116 - momentum: 0.000000
2023-10-10 23:52:19,372 epoch 4 - iter 3126/5212 - loss 0.07197287 - time (sec): 845.51 - samples/sec: 263.54 - lr: 0.000114 - momentum: 0.000000
2023-10-10 23:54:44,597 epoch 4 - iter 3647/5212 - loss 0.07262935 - time (sec): 990.73 - samples/sec: 263.16 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:57:08,949 epoch 4 - iter 4168/5212 - loss 0.07328823 - time (sec): 1135.08 - samples/sec: 260.17 - lr: 0.000110 - momentum: 0.000000
2023-10-10 23:59:34,787 epoch 4 - iter 4689/5212 - loss 0.07551901 - time (sec): 1280.92 - samples/sec: 259.28 - lr: 0.000108 - momentum: 0.000000
2023-10-11 00:01:57,824 epoch 4 - iter 5210/5212 - loss 0.07534259 - time (sec): 1423.96 - samples/sec: 257.97 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:01:58,278 ----------------------------------------------------------------------------------------------------
2023-10-11 00:01:58,279 EPOCH 4 done: loss 0.0754 - lr: 0.000107
2023-10-11 00:02:36,597 DEV : loss 0.3018316924571991 - f1-score (micro avg) 0.3465
2023-10-11 00:02:36,649 ----------------------------------------------------------------------------------------------------
2023-10-11 00:04:55,777 epoch 5 - iter 521/5212 - loss 0.05055408 - time (sec): 139.13 - samples/sec: 247.21 - lr: 0.000105 - momentum: 0.000000
2023-10-11 00:07:17,265 epoch 5 - iter 1042/5212 - loss 0.05232179 - time (sec): 280.61 - samples/sec: 250.93 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:09:41,678 epoch 5 - iter 1563/5212 - loss 0.05134707 - time (sec): 425.03 - samples/sec: 255.95 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:12:05,508 epoch 5 - iter 2084/5212 - loss 0.05065103 - time (sec): 568.86 - samples/sec: 255.50 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:14:30,059 epoch 5 - iter 2605/5212 - loss 0.05233700 - time (sec): 713.41 - samples/sec: 255.88 - lr: 0.000098 - momentum: 0.000000
2023-10-11 00:16:53,617 epoch 5 - iter 3126/5212 - loss 0.05266053 - time (sec): 856.97 - samples/sec: 256.10 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:19:14,374 epoch 5 - iter 3647/5212 - loss 0.05313538 - time (sec): 997.72 - samples/sec: 255.43 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:21:37,110 epoch 5 - iter 4168/5212 - loss 0.05233419 - time (sec): 1140.46 - samples/sec: 257.40 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:24:00,274 epoch 5 - iter 4689/5212 - loss 0.05299320 - time (sec): 1283.62 - samples/sec: 258.64 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:26:19,077 epoch 5 - iter 5210/5212 - loss 0.05236410 - time (sec): 1422.43 - samples/sec: 258.11 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:26:19,687 ----------------------------------------------------------------------------------------------------
2023-10-11 00:26:19,687 EPOCH 5 done: loss 0.0523 - lr: 0.000089
2023-10-11 00:26:58,423 DEV : loss 0.34503793716430664 - f1-score (micro avg) 0.3566
2023-10-11 00:26:58,477 ----------------------------------------------------------------------------------------------------
2023-10-11 00:29:17,497 epoch 6 - iter 521/5212 - loss 0.03864880 - time (sec): 139.02 - samples/sec: 252.13 - lr: 0.000087 - momentum: 0.000000
2023-10-11 00:31:39,164 epoch 6 - iter 1042/5212 - loss 0.03680672 - time (sec): 280.68 - samples/sec: 249.61 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:34:03,957 epoch 6 - iter 1563/5212 - loss 0.03797352 - time (sec): 425.48 - samples/sec: 255.01 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:36:27,335 epoch 6 - iter 2084/5212 - loss 0.03866110 - time (sec): 568.86 - samples/sec: 258.52 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:38:54,007 epoch 6 - iter 2605/5212 - loss 0.03679084 - time (sec): 715.53 - samples/sec: 259.93 - lr: 0.000080 - momentum: 0.000000
2023-10-11 00:41:22,709 epoch 6 - iter 3126/5212 - loss 0.03674023 - time (sec): 864.23 - samples/sec: 256.47 - lr: 0.000078 - momentum: 0.000000
2023-10-11 00:43:51,760 epoch 6 - iter 3647/5212 - loss 0.03691101 - time (sec): 1013.28 - samples/sec: 254.80 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:46:19,390 epoch 6 - iter 4168/5212 - loss 0.03750351 - time (sec): 1160.91 - samples/sec: 254.23 - lr: 0.000075 - momentum: 0.000000
2023-10-11 00:48:44,878 epoch 6 - iter 4689/5212 - loss 0.03642794 - time (sec): 1306.40 - samples/sec: 252.38 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:51:14,302 epoch 6 - iter 5210/5212 - loss 0.03786334 - time (sec): 1455.82 - samples/sec: 252.32 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:51:14,775 ----------------------------------------------------------------------------------------------------
2023-10-11 00:51:14,775 EPOCH 6 done: loss 0.0379 - lr: 0.000071
2023-10-11 00:51:56,260 DEV : loss 0.3911699652671814 - f1-score (micro avg) 0.377
2023-10-11 00:51:56,318 saving best model
2023-10-11 00:51:57,349 ----------------------------------------------------------------------------------------------------
2023-10-11 00:54:23,005 epoch 7 - iter 521/5212 - loss 0.02758289 - time (sec): 145.65 - samples/sec: 244.02 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:56:48,889 epoch 7 - iter 1042/5212 - loss 0.02486965 - time (sec): 291.54 - samples/sec: 245.37 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:59:14,382 epoch 7 - iter 1563/5212 - loss 0.02452347 - time (sec): 437.03 - samples/sec: 247.72 - lr: 0.000066 - momentum: 0.000000
2023-10-11 01:01:42,335 epoch 7 - iter 2084/5212 - loss 0.02779702 - time (sec): 584.98 - samples/sec: 247.26 - lr: 0.000064 - momentum: 0.000000
2023-10-11 01:04:12,124 epoch 7 - iter 2605/5212 - loss 0.02791685 - time (sec): 734.77 - samples/sec: 249.16 - lr: 0.000062 - momentum: 0.000000
2023-10-11 01:06:41,818 epoch 7 - iter 3126/5212 - loss 0.02752516 - time (sec): 884.47 - samples/sec: 247.89 - lr: 0.000060 - momentum: 0.000000
2023-10-11 01:09:13,741 epoch 7 - iter 3647/5212 - loss 0.02871266 - time (sec): 1036.39 - samples/sec: 247.16 - lr: 0.000059 - momentum: 0.000000
2023-10-11 01:11:40,180 epoch 7 - iter 4168/5212 - loss 0.02782251 - time (sec): 1182.83 - samples/sec: 245.44 - lr: 0.000057 - momentum: 0.000000
2023-10-11 01:14:10,487 epoch 7 - iter 4689/5212 - loss 0.02769821 - time (sec): 1333.14 - samples/sec: 246.10 - lr: 0.000055 - momentum: 0.000000
2023-10-11 01:16:41,507 epoch 7 - iter 5210/5212 - loss 0.02717297 - time (sec): 1484.16 - samples/sec: 247.48 - lr: 0.000053 - momentum: 0.000000
2023-10-11 01:16:42,009 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:42,010 EPOCH 7 done: loss 0.0272 - lr: 0.000053
2023-10-11 01:17:23,080 DEV : loss 0.4509029686450958 - f1-score (micro avg) 0.3605
2023-10-11 01:17:23,145 ----------------------------------------------------------------------------------------------------
2023-10-11 01:19:56,459 epoch 8 - iter 521/5212 - loss 0.02089913 - time (sec): 153.31 - samples/sec: 260.97 - lr: 0.000052 - momentum: 0.000000
2023-10-11 01:22:23,031 epoch 8 - iter 1042/5212 - loss 0.01959801 - time (sec): 299.88 - samples/sec: 258.01 - lr: 0.000050 - momentum: 0.000000
2023-10-11 01:24:44,575 epoch 8 - iter 1563/5212 - loss 0.01903963 - time (sec): 441.43 - samples/sec: 254.36 - lr: 0.000048 - momentum: 0.000000
2023-10-11 01:27:13,502 epoch 8 - iter 2084/5212 - loss 0.01784594 - time (sec): 590.35 - samples/sec: 254.25 - lr: 0.000046 - momentum: 0.000000
2023-10-11 01:29:41,666 epoch 8 - iter 2605/5212 - loss 0.01877834 - time (sec): 738.52 - samples/sec: 251.02 - lr: 0.000044 - momentum: 0.000000
2023-10-11 01:32:11,565 epoch 8 - iter 3126/5212 - loss 0.01871028 - time (sec): 888.42 - samples/sec: 251.11 - lr: 0.000043 - momentum: 0.000000
2023-10-11 01:34:38,191 epoch 8 - iter 3647/5212 - loss 0.01838741 - time (sec): 1035.04 - samples/sec: 250.50 - lr: 0.000041 - momentum: 0.000000
2023-10-11 01:37:06,347 epoch 8 - iter 4168/5212 - loss 0.01893451 - time (sec): 1183.20 - samples/sec: 248.63 - lr: 0.000039 - momentum: 0.000000
2023-10-11 01:39:32,627 epoch 8 - iter 4689/5212 - loss 0.01873050 - time (sec): 1329.48 - samples/sec: 246.39 - lr: 0.000037 - momentum: 0.000000
2023-10-11 01:42:01,006 epoch 8 - iter 5210/5212 - loss 0.01887562 - time (sec): 1477.86 - samples/sec: 248.60 - lr: 0.000036 - momentum: 0.000000
2023-10-11 01:42:01,426 ----------------------------------------------------------------------------------------------------
2023-10-11 01:42:01,427 EPOCH 8 done: loss 0.0189 - lr: 0.000036
2023-10-11 01:42:43,115 DEV : loss 0.47359704971313477 - f1-score (micro avg) 0.3753
2023-10-11 01:42:43,171 ----------------------------------------------------------------------------------------------------
2023-10-11 01:45:15,714 epoch 9 - iter 521/5212 - loss 0.01495092 - time (sec): 152.54 - samples/sec: 244.06 - lr: 0.000034 - momentum: 0.000000
2023-10-11 01:47:47,653 epoch 9 - iter 1042/5212 - loss 0.01250951 - time (sec): 304.48 - samples/sec: 254.36 - lr: 0.000032 - momentum: 0.000000
2023-10-11 01:50:15,849 epoch 9 - iter 1563/5212 - loss 0.01218680 - time (sec): 452.68 - samples/sec: 250.90 - lr: 0.000030 - momentum: 0.000000
2023-10-11 01:52:40,851 epoch 9 - iter 2084/5212 - loss 0.01212985 - time (sec): 597.68 - samples/sec: 245.62 - lr: 0.000028 - momentum: 0.000000
2023-10-11 01:55:10,319 epoch 9 - iter 2605/5212 - loss 0.01257174 - time (sec): 747.15 - samples/sec: 248.48 - lr: 0.000027 - momentum: 0.000000
2023-10-11 01:57:33,873 epoch 9 - iter 3126/5212 - loss 0.01277984 - time (sec): 890.70 - samples/sec: 247.69 - lr: 0.000025 - momentum: 0.000000
2023-10-11 01:59:56,151 epoch 9 - iter 3647/5212 - loss 0.01203742 - time (sec): 1032.98 - samples/sec: 247.46 - lr: 0.000023 - momentum: 0.000000
2023-10-11 02:02:18,498 epoch 9 - iter 4168/5212 - loss 0.01213790 - time (sec): 1175.32 - samples/sec: 248.50 - lr: 0.000021 - momentum: 0.000000
2023-10-11 02:04:43,630 epoch 9 - iter 4689/5212 - loss 0.01165778 - time (sec): 1320.46 - samples/sec: 248.72 - lr: 0.000020 - momentum: 0.000000
2023-10-11 02:07:12,851 epoch 9 - iter 5210/5212 - loss 0.01197423 - time (sec): 1469.68 - samples/sec: 249.97 - lr: 0.000018 - momentum: 0.000000
2023-10-11 02:07:13,282 ----------------------------------------------------------------------------------------------------
2023-10-11 02:07:13,282 EPOCH 9 done: loss 0.0120 - lr: 0.000018
2023-10-11 02:07:53,434 DEV : loss 0.41926491260528564 - f1-score (micro avg) 0.3992
2023-10-11 02:07:53,488 saving best model
2023-10-11 02:07:59,890 ----------------------------------------------------------------------------------------------------
2023-10-11 02:10:30,789 epoch 10 - iter 521/5212 - loss 0.00878818 - time (sec): 150.89 - samples/sec: 253.51 - lr: 0.000016 - momentum: 0.000000
2023-10-11 02:13:01,984 epoch 10 - iter 1042/5212 - loss 0.01027434 - time (sec): 302.09 - samples/sec: 254.66 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:15:30,771 epoch 10 - iter 1563/5212 - loss 0.00888626 - time (sec): 450.88 - samples/sec: 250.10 - lr: 0.000012 - momentum: 0.000000
2023-10-11 02:17:55,199 epoch 10 - iter 2084/5212 - loss 0.00837755 - time (sec): 595.31 - samples/sec: 244.29 - lr: 0.000011 - momentum: 0.000000
2023-10-11 02:20:21,472 epoch 10 - iter 2605/5212 - loss 0.00871533 - time (sec): 741.58 - samples/sec: 246.59 - lr: 0.000009 - momentum: 0.000000
2023-10-11 02:22:49,693 epoch 10 - iter 3126/5212 - loss 0.00858225 - time (sec): 889.80 - samples/sec: 246.01 - lr: 0.000007 - momentum: 0.000000
2023-10-11 02:25:17,618 epoch 10 - iter 3647/5212 - loss 0.00859775 - time (sec): 1037.72 - samples/sec: 247.36 - lr: 0.000005 - momentum: 0.000000
2023-10-11 02:27:46,389 epoch 10 - iter 4168/5212 - loss 0.00874080 - time (sec): 1186.49 - samples/sec: 249.03 - lr: 0.000004 - momentum: 0.000000
2023-10-11 02:30:11,769 epoch 10 - iter 4689/5212 - loss 0.00880848 - time (sec): 1331.87 - samples/sec: 248.84 - lr: 0.000002 - momentum: 0.000000
2023-10-11 02:32:39,960 epoch 10 - iter 5210/5212 - loss 0.00872135 - time (sec): 1480.07 - samples/sec: 248.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 02:32:40,633 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:40,634 EPOCH 10 done: loss 0.0087 - lr: 0.000000
2023-10-11 02:33:20,405 DEV : loss 0.4779926538467407 - f1-score (micro avg) 0.3891
2023-10-11 02:33:21,353 ----------------------------------------------------------------------------------------------------
2023-10-11 02:33:21,355 Loading model from best epoch ...
2023-10-11 02:33:25,843 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
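The 17-tag dictionary above is the standard BIOES expansion of the four entity types: O plus S-/B-/E-/I- variants of LOC, PER, ORG and HumanProd. A minimal sketch of that expansion (the helper name is ours, not a Flair API):

```python
def bioes_tags(entity_types):
    """Expand entity types into a BIOES tag set: O plus S/B/E/I per type."""
    tags = ["O"]
    for etype in entity_types:
        tags.extend(f"{prefix}-{etype}" for prefix in ("S", "B", "E", "I"))
    return tags

tags = bioes_tags(["LOC", "PER", "ORG", "HumanProd"])
print(len(tags))  # 17, matching the SequenceTagger's output layer
```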
2023-10-11 02:35:07,559
Results:
- F-score (micro) 0.4583
- F-score (macro) 0.3044
- Accuracy 0.3024
By class:
                precision    recall  f1-score   support

          LOC      0.5082    0.5601    0.5329      1214
          PER      0.4197    0.4530    0.4357       808
          ORG      0.2576    0.2408    0.2489       353
    HumanProd      0.0000    0.0000    0.0000        15

    micro avg      0.4442    0.4732    0.4583      2390
    macro avg      0.2964    0.3135    0.3044      2390
 weighted avg      0.4381    0.4732    0.4548      2390
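The averages in the table follow directly from the per-class rows. Recomputing them from the (already rounded) table values, small last-digit deviations from the reported numbers are expected:

```python
# Per-class (precision, recall, f1, support) rows copied from the table above.
rows = {
    "LOC":       (0.5082, 0.5601, 0.5329, 1214),
    "PER":       (0.4197, 0.4530, 0.4357,  808),
    "ORG":       (0.2576, 0.2408, 0.2489,  353),
    "HumanProd": (0.0000, 0.0000, 0.0000,   15),
}

# Macro F1: unweighted mean of per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in rows.values()) / len(rows)

# Weighted F1: per-class F1 weighted by support.
total = sum(support for *_, support in rows.values())
weighted_f1 = sum(f1 * support for _, _, f1, support in rows.values()) / total

# Micro F1: harmonic mean of the micro-averaged precision and recall.
micro_p, micro_r = 0.4442, 0.4732
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(round(macro_f1, 4))     # 0.3044, as reported
print(round(weighted_f1, 4))  # ~0.4547 vs reported 0.4548 (inputs rounded)
print(round(micro_f1, 4))     # ~0.4582 vs reported 0.4583 (inputs rounded)
```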
2023-10-11 02:35:07,559 ----------------------------------------------------------------------------------------------------