2023-10-10 22:24:24,605 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,608 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 22:24:24,608 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,608 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-10 22:24:24,608 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Train:  20847 sentences
2023-10-10 22:24:24,609         (train_with_dev=False, train_with_test=False)
2023-10-10 22:24:24,609 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Training Params:
2023-10-10 22:24:24,609  - learning_rate: "0.00016"
2023-10-10 22:24:24,609  - mini_batch_size: "4"
2023-10-10 22:24:24,609  - max_epochs: "10"
2023-10-10 22:24:24,609  - shuffle: "True"
2023-10-10 22:24:24,609 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,609 Plugins:
2023-10-10 22:24:24,609  - TensorboardLogger
2023-10-10 22:24:24,609  - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 22:24:24,610  - metric: "('micro avg', 'f1-score')"
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Computation:
2023-10-10 22:24:24,610  - compute on device: cuda:0
2023-10-10 22:24:24,610  - embedding storage: none
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 ----------------------------------------------------------------------------------------------------
2023-10-10 22:24:24,610 Logging anything other than scalars to TensorBoard is currently not supported.
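Editor's note: the lr values logged below are consistent with the LinearScheduler plugin's warmup_fraction of 0.1, i.e. linear warmup over the first 10% of steps to the peak learning rate 0.00016, then linear decay to zero. A minimal sketch (pure Python, assuming 10 epochs x 5212 iterations; illustrative, not Flair's actual implementation) reproduces the logged values:

```python
def linear_schedule_lr(step, total_steps=10 * 5212, peak_lr=0.00016, warmup_fraction=0.1):
    """Linear warmup to peak_lr over the first warmup_fraction of steps, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Compare with the log: lr 0.000016 at epoch 1 iter 521, 0.000160 at epoch 1 iter 5210,
# 0.000158 at epoch 2 iter 521 (global step 5212 + 521 = 5733).
print(round(linear_schedule_lr(521), 6))    # → 1.6e-05
print(round(linear_schedule_lr(5733), 6))   # → 0.000158
```

The decay leg ends exactly at zero, matching the lr 0.000000 logged at the end of epoch 10.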
2023-10-10 22:26:46,615 epoch 1 - iter 521/5212 - loss 2.80621420 - time (sec): 142.00 - samples/sec: 252.95 - lr: 0.000016 - momentum: 0.000000
2023-10-10 22:29:11,418 epoch 1 - iter 1042/5212 - loss 2.34595380 - time (sec): 286.81 - samples/sec: 251.49 - lr: 0.000032 - momentum: 0.000000
2023-10-10 22:31:43,864 epoch 1 - iter 1563/5212 - loss 1.80357624 - time (sec): 439.25 - samples/sec: 249.05 - lr: 0.000048 - momentum: 0.000000
2023-10-10 22:34:13,264 epoch 1 - iter 2084/5212 - loss 1.44765160 - time (sec): 588.65 - samples/sec: 252.62 - lr: 0.000064 - momentum: 0.000000
2023-10-10 22:36:34,209 epoch 1 - iter 2605/5212 - loss 1.24536717 - time (sec): 729.60 - samples/sec: 254.32 - lr: 0.000080 - momentum: 0.000000
2023-10-10 22:38:55,115 epoch 1 - iter 3126/5212 - loss 1.09417563 - time (sec): 870.50 - samples/sec: 256.70 - lr: 0.000096 - momentum: 0.000000
2023-10-10 22:41:14,782 epoch 1 - iter 3647/5212 - loss 0.98901475 - time (sec): 1010.17 - samples/sec: 257.39 - lr: 0.000112 - momentum: 0.000000
2023-10-10 22:43:35,247 epoch 1 - iter 4168/5212 - loss 0.90166452 - time (sec): 1150.63 - samples/sec: 256.91 - lr: 0.000128 - momentum: 0.000000
2023-10-10 22:45:52,609 epoch 1 - iter 4689/5212 - loss 0.83405959 - time (sec): 1288.00 - samples/sec: 256.15 - lr: 0.000144 - momentum: 0.000000
2023-10-10 22:48:12,733 epoch 1 - iter 5210/5212 - loss 0.77082608 - time (sec): 1428.12 - samples/sec: 257.19 - lr: 0.000160 - momentum: 0.000000
2023-10-10 22:48:13,210 ----------------------------------------------------------------------------------------------------
2023-10-10 22:48:13,210 EPOCH 1 done: loss 0.7706 - lr: 0.000160
2023-10-10 22:48:47,574 DEV : loss 0.15222090482711792 - f1-score (micro avg)  0.2556
2023-10-10 22:48:47,634 saving best model
2023-10-10 22:48:48,628 ----------------------------------------------------------------------------------------------------
2023-10-10 22:51:07,221 epoch 2 - iter 521/5212 - loss 0.18484159 - time (sec): 138.59 - samples/sec: 253.05 - lr: 0.000158 - momentum: 0.000000
2023-10-10 22:53:28,919 epoch 2 - iter 1042/5212 - loss 0.18798941 - time (sec): 280.29 - samples/sec: 253.43 - lr: 0.000156 - momentum: 0.000000
2023-10-10 22:55:50,621 epoch 2 - iter 1563/5212 - loss 0.18286963 - time (sec): 421.99 - samples/sec: 255.27 - lr: 0.000155 - momentum: 0.000000
2023-10-10 22:58:20,531 epoch 2 - iter 2084/5212 - loss 0.17954531 - time (sec): 571.90 - samples/sec: 249.53 - lr: 0.000153 - momentum: 0.000000
2023-10-10 23:00:52,956 epoch 2 - iter 2605/5212 - loss 0.17672694 - time (sec): 724.32 - samples/sec: 248.40 - lr: 0.000151 - momentum: 0.000000
2023-10-10 23:03:19,258 epoch 2 - iter 3126/5212 - loss 0.16868245 - time (sec): 870.63 - samples/sec: 250.12 - lr: 0.000149 - momentum: 0.000000
2023-10-10 23:05:44,294 epoch 2 - iter 3647/5212 - loss 0.16681779 - time (sec): 1015.66 - samples/sec: 251.89 - lr: 0.000148 - momentum: 0.000000
2023-10-10 23:08:07,373 epoch 2 - iter 4168/5212 - loss 0.16454487 - time (sec): 1158.74 - samples/sec: 251.36 - lr: 0.000146 - momentum: 0.000000
2023-10-10 23:10:31,277 epoch 2 - iter 4689/5212 - loss 0.16327904 - time (sec): 1302.65 - samples/sec: 250.95 - lr: 0.000144 - momentum: 0.000000
2023-10-10 23:12:58,253 epoch 2 - iter 5210/5212 - loss 0.15972063 - time (sec): 1449.62 - samples/sec: 253.30 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:12:58,844 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:58,844 EPOCH 2 done: loss 0.1598 - lr: 0.000142
2023-10-10 23:13:39,614 DEV : loss 0.14660175144672394 - f1-score (micro avg)  0.3391
2023-10-10 23:13:39,668 saving best model
2023-10-10 23:13:42,315 ----------------------------------------------------------------------------------------------------
2023-10-10 23:16:06,577 epoch 3 - iter 521/5212 - loss 0.09466517 - time (sec): 144.26 - samples/sec: 247.08 - lr: 0.000140 - momentum: 0.000000
2023-10-10 23:18:30,244 epoch 3 - iter 1042/5212 - loss 0.10152859 - time (sec): 287.93 - samples/sec: 252.60 - lr: 0.000139 - momentum: 0.000000
2023-10-10 23:20:52,348 epoch 3 - iter 1563/5212 - loss 0.10301371 - time (sec): 430.03 - samples/sec: 255.52 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:23:12,806 epoch 3 - iter 2084/5212 - loss 0.11068085 - time (sec): 570.49 - samples/sec: 255.33 - lr: 0.000135 - momentum: 0.000000
2023-10-10 23:25:40,398 epoch 3 - iter 2605/5212 - loss 0.11269189 - time (sec): 718.08 - samples/sec: 258.48 - lr: 0.000133 - momentum: 0.000000
2023-10-10 23:28:03,360 epoch 3 - iter 3126/5212 - loss 0.11124043 - time (sec): 861.04 - samples/sec: 259.51 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:30:25,782 epoch 3 - iter 3647/5212 - loss 0.10989130 - time (sec): 1003.46 - samples/sec: 259.62 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:32:47,962 epoch 3 - iter 4168/5212 - loss 0.10954252 - time (sec): 1145.64 - samples/sec: 258.97 - lr: 0.000128 - momentum: 0.000000
2023-10-10 23:35:08,657 epoch 3 - iter 4689/5212 - loss 0.10867168 - time (sec): 1286.34 - samples/sec: 256.45 - lr: 0.000126 - momentum: 0.000000
2023-10-10 23:37:30,782 epoch 3 - iter 5210/5212 - loss 0.10800539 - time (sec): 1428.46 - samples/sec: 257.08 - lr: 0.000124 - momentum: 0.000000
2023-10-10 23:37:31,326 ----------------------------------------------------------------------------------------------------
2023-10-10 23:37:31,326 EPOCH 3 done: loss 0.1080 - lr: 0.000124
2023-10-10 23:38:11,188 DEV : loss 0.17651064693927765 - f1-score (micro avg)  0.3568
2023-10-10 23:38:11,239 saving best model
2023-10-10 23:38:13,860 ----------------------------------------------------------------------------------------------------
2023-10-10 23:40:32,288 epoch 4 - iter 521/5212 - loss 0.06937373 - time (sec): 138.42 - samples/sec: 266.87 - lr: 0.000123 - momentum: 0.000000
2023-10-10 23:42:53,769 epoch 4 - iter 1042/5212 - loss 0.07237076 - time (sec): 279.90 - samples/sec: 273.67 - lr: 0.000121 - momentum: 0.000000
2023-10-10 23:45:12,667 epoch 4 - iter 1563/5212 - loss 0.06838367 - time (sec): 418.80 - samples/sec: 269.27 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:47:31,229 epoch 4 - iter 2084/5212 - loss 0.07063170 - time (sec): 557.36 - samples/sec: 266.63 - lr: 0.000117 - momentum: 0.000000
2023-10-10 23:49:55,323 epoch 4 - iter 2605/5212 - loss 0.06907979 - time (sec): 701.46 - samples/sec: 264.86 - lr: 0.000116 - momentum: 0.000000
2023-10-10 23:52:19,372 epoch 4 - iter 3126/5212 - loss 0.07197287 - time (sec): 845.51 - samples/sec: 263.54 - lr: 0.000114 - momentum: 0.000000
2023-10-10 23:54:44,597 epoch 4 - iter 3647/5212 - loss 0.07262935 - time (sec): 990.73 - samples/sec: 263.16 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:57:08,949 epoch 4 - iter 4168/5212 - loss 0.07328823 - time (sec): 1135.08 - samples/sec: 260.17 - lr: 0.000110 - momentum: 0.000000
2023-10-10 23:59:34,787 epoch 4 - iter 4689/5212 - loss 0.07551901 - time (sec): 1280.92 - samples/sec: 259.28 - lr: 0.000108 - momentum: 0.000000
2023-10-11 00:01:57,824 epoch 4 - iter 5210/5212 - loss 0.07534259 - time (sec): 1423.96 - samples/sec: 257.97 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:01:58,278 ----------------------------------------------------------------------------------------------------
2023-10-11 00:01:58,279 EPOCH 4 done: loss 0.0754 - lr: 0.000107
2023-10-11 00:02:36,597 DEV : loss 0.3018316924571991 - f1-score (micro avg)  0.3465
2023-10-11 00:02:36,649 ----------------------------------------------------------------------------------------------------
2023-10-11 00:04:55,777 epoch 5 - iter 521/5212 - loss 0.05055408 - time (sec): 139.13 - samples/sec: 247.21 - lr: 0.000105 - momentum: 0.000000
2023-10-11 00:07:17,265 epoch 5 - iter 1042/5212 - loss 0.05232179 - time (sec): 280.61 - samples/sec: 250.93 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:09:41,678 epoch 5 - iter 1563/5212 - loss 0.05134707 - time (sec): 425.03 - samples/sec: 255.95 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:12:05,508 epoch 5 - iter 2084/5212 - loss 0.05065103 - time (sec): 568.86 - samples/sec: 255.50 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:14:30,059 epoch 5 - iter 2605/5212 - loss 0.05233700 - time (sec): 713.41 - samples/sec: 255.88 - lr: 0.000098 - momentum: 0.000000
2023-10-11 00:16:53,617 epoch 5 - iter 3126/5212 - loss 0.05266053 - time (sec): 856.97 - samples/sec: 256.10 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:19:14,374 epoch 5 - iter 3647/5212 - loss 0.05313538 - time (sec): 997.72 - samples/sec: 255.43 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:21:37,110 epoch 5 - iter 4168/5212 - loss 0.05233419 - time (sec): 1140.46 - samples/sec: 257.40 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:24:00,274 epoch 5 - iter 4689/5212 - loss 0.05299320 - time (sec): 1283.62 - samples/sec: 258.64 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:26:19,077 epoch 5 - iter 5210/5212 - loss 0.05236410 - time (sec): 1422.43 - samples/sec: 258.11 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:26:19,687 ----------------------------------------------------------------------------------------------------
2023-10-11 00:26:19,687 EPOCH 5 done: loss 0.0523 - lr: 0.000089
2023-10-11 00:26:58,423 DEV : loss 0.34503793716430664 - f1-score (micro avg)  0.3566
2023-10-11 00:26:58,477 ----------------------------------------------------------------------------------------------------
2023-10-11 00:29:17,497 epoch 6 - iter 521/5212 - loss 0.03864880 - time (sec): 139.02 - samples/sec: 252.13 - lr: 0.000087 - momentum: 0.000000
2023-10-11 00:31:39,164 epoch 6 - iter 1042/5212 - loss 0.03680672 - time (sec): 280.68 - samples/sec: 249.61 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:34:03,957 epoch 6 - iter 1563/5212 - loss 0.03797352 - time (sec): 425.48 - samples/sec: 255.01 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:36:27,335 epoch 6 - iter 2084/5212 - loss 0.03866110 - time (sec): 568.86 - samples/sec: 258.52 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:38:54,007 epoch 6 - iter 2605/5212 - loss 0.03679084 - time (sec): 715.53 - samples/sec: 259.93 - lr: 0.000080 - momentum: 0.000000
2023-10-11 00:41:22,709 epoch 6 - iter 3126/5212 - loss 0.03674023 - time (sec): 864.23 - samples/sec: 256.47 - lr: 0.000078 - momentum: 0.000000
2023-10-11 00:43:51,760 epoch 6 - iter 3647/5212 - loss 0.03691101 - time (sec): 1013.28 - samples/sec: 254.80 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:46:19,390 epoch 6 - iter 4168/5212 - loss 0.03750351 - time (sec): 1160.91 - samples/sec: 254.23 - lr: 0.000075 - momentum: 0.000000
2023-10-11 00:48:44,878 epoch 6 - iter 4689/5212 - loss 0.03642794 - time (sec): 1306.40 - samples/sec: 252.38 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:51:14,302 epoch 6 - iter 5210/5212 - loss 0.03786334 - time (sec): 1455.82 - samples/sec: 252.32 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:51:14,775 ----------------------------------------------------------------------------------------------------
2023-10-11 00:51:14,775 EPOCH 6 done: loss 0.0379 - lr: 0.000071
2023-10-11 00:51:56,260 DEV : loss 0.3911699652671814 - f1-score (micro avg)  0.377
2023-10-11 00:51:56,318 saving best model
2023-10-11 00:51:57,349 ----------------------------------------------------------------------------------------------------
2023-10-11 00:54:23,005 epoch 7 - iter 521/5212 - loss 0.02758289 - time (sec): 145.65 - samples/sec: 244.02 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:56:48,889 epoch 7 - iter 1042/5212 - loss 0.02486965 - time (sec): 291.54 - samples/sec: 245.37 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:59:14,382 epoch 7 - iter 1563/5212 - loss 0.02452347 - time (sec): 437.03 - samples/sec: 247.72 - lr: 0.000066 - momentum: 0.000000
2023-10-11 01:01:42,335 epoch 7 - iter 2084/5212 - loss 0.02779702 - time (sec): 584.98 - samples/sec: 247.26 - lr: 0.000064 - momentum: 0.000000
2023-10-11 01:04:12,124 epoch 7 - iter 2605/5212 - loss 0.02791685 - time (sec): 734.77 - samples/sec: 249.16 - lr: 0.000062 - momentum: 0.000000
2023-10-11 01:06:41,818 epoch 7 - iter 3126/5212 - loss 0.02752516 - time (sec): 884.47 - samples/sec: 247.89 - lr: 0.000060 - momentum: 0.000000
2023-10-11 01:09:13,741 epoch 7 - iter 3647/5212 - loss 0.02871266 - time (sec): 1036.39 - samples/sec: 247.16 - lr: 0.000059 - momentum: 0.000000
2023-10-11 01:11:40,180 epoch 7 - iter 4168/5212 - loss 0.02782251 - time (sec): 1182.83 - samples/sec: 245.44 - lr: 0.000057 - momentum: 0.000000
2023-10-11 01:14:10,487 epoch 7 - iter 4689/5212 - loss 0.02769821 - time (sec): 1333.14 - samples/sec: 246.10 - lr: 0.000055 - momentum: 0.000000
2023-10-11 01:16:41,507 epoch 7 - iter 5210/5212 - loss 0.02717297 - time (sec): 1484.16 - samples/sec: 247.48 - lr: 0.000053 - momentum: 0.000000
2023-10-11 01:16:42,009 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:42,010 EPOCH 7 done: loss 0.0272 - lr: 0.000053
2023-10-11 01:17:23,080 DEV : loss 0.4509029686450958 - f1-score (micro avg)  0.3605
2023-10-11 01:17:23,145 ----------------------------------------------------------------------------------------------------
2023-10-11 01:19:56,459 epoch 8 - iter 521/5212 - loss 0.02089913 - time (sec): 153.31 - samples/sec: 260.97 - lr: 0.000052 - momentum: 0.000000
2023-10-11 01:22:23,031 epoch 8 - iter 1042/5212 - loss 0.01959801 - time (sec): 299.88 - samples/sec: 258.01 - lr: 0.000050 - momentum: 0.000000
2023-10-11 01:24:44,575 epoch 8 - iter 1563/5212 - loss 0.01903963 - time (sec): 441.43 - samples/sec: 254.36 - lr: 0.000048 - momentum: 0.000000
2023-10-11 01:27:13,502 epoch 8 - iter 2084/5212 - loss 0.01784594 - time (sec): 590.35 - samples/sec: 254.25 - lr: 0.000046 - momentum: 0.000000
2023-10-11 01:29:41,666 epoch 8 - iter 2605/5212 - loss 0.01877834 - time (sec): 738.52 - samples/sec: 251.02 - lr: 0.000044 - momentum: 0.000000
2023-10-11 01:32:11,565 epoch 8 - iter 3126/5212 - loss 0.01871028 - time (sec): 888.42 - samples/sec: 251.11 - lr: 0.000043 - momentum: 0.000000
2023-10-11 01:34:38,191 epoch 8 - iter 3647/5212 - loss 0.01838741 - time (sec): 1035.04 - samples/sec: 250.50 - lr: 0.000041 - momentum: 0.000000
2023-10-11 01:37:06,347 epoch 8 - iter 4168/5212 - loss 0.01893451 - time (sec): 1183.20 - samples/sec: 248.63 - lr: 0.000039 - momentum: 0.000000
2023-10-11 01:39:32,627 epoch 8 - iter 4689/5212 - loss 0.01873050 - time (sec): 1329.48 - samples/sec: 246.39 - lr: 0.000037 - momentum: 0.000000
2023-10-11 01:42:01,006 epoch 8 - iter 5210/5212 - loss 0.01887562 - time (sec): 1477.86 - samples/sec: 248.60 - lr: 0.000036 - momentum: 0.000000
2023-10-11 01:42:01,426 ----------------------------------------------------------------------------------------------------
2023-10-11 01:42:01,427 EPOCH 8 done: loss 0.0189 - lr: 0.000036
2023-10-11 01:42:43,115 DEV : loss 0.47359704971313477 - f1-score (micro avg)  0.3753
2023-10-11 01:42:43,171 ----------------------------------------------------------------------------------------------------
2023-10-11 01:45:15,714 epoch 9 - iter 521/5212 - loss 0.01495092 - time (sec): 152.54 - samples/sec: 244.06 - lr: 0.000034 - momentum: 0.000000
2023-10-11 01:47:47,653 epoch 9 - iter 1042/5212 - loss 0.01250951 - time (sec): 304.48 - samples/sec: 254.36 - lr: 0.000032 - momentum: 0.000000
2023-10-11 01:50:15,849 epoch 9 - iter 1563/5212 - loss 0.01218680 - time (sec): 452.68 - samples/sec: 250.90 - lr: 0.000030 - momentum: 0.000000
2023-10-11 01:52:40,851 epoch 9 - iter 2084/5212 - loss 0.01212985 - time (sec): 597.68 - samples/sec: 245.62 - lr: 0.000028 - momentum: 0.000000
2023-10-11 01:55:10,319 epoch 9 - iter 2605/5212 - loss 0.01257174 - time (sec): 747.15 - samples/sec: 248.48 - lr: 0.000027 - momentum: 0.000000
2023-10-11 01:57:33,873 epoch 9 - iter 3126/5212 - loss 0.01277984 - time (sec): 890.70 - samples/sec: 247.69 - lr: 0.000025 - momentum: 0.000000
2023-10-11 01:59:56,151 epoch 9 - iter 3647/5212 - loss 0.01203742 - time (sec): 1032.98 - samples/sec: 247.46 - lr: 0.000023 - momentum: 0.000000
2023-10-11 02:02:18,498 epoch 9 - iter 4168/5212 - loss 0.01213790 - time (sec): 1175.32 - samples/sec: 248.50 - lr: 0.000021 - momentum: 0.000000
2023-10-11 02:04:43,630 epoch 9 - iter 4689/5212 - loss 0.01165778 - time (sec): 1320.46 - samples/sec: 248.72 - lr: 0.000020 - momentum: 0.000000
2023-10-11 02:07:12,851 epoch 9 - iter 5210/5212 - loss 0.01197423 - time (sec): 1469.68 - samples/sec: 249.97 - lr: 0.000018 - momentum: 0.000000
2023-10-11 02:07:13,282 ----------------------------------------------------------------------------------------------------
2023-10-11 02:07:13,282 EPOCH 9 done: loss 0.0120 - lr: 0.000018
2023-10-11 02:07:53,434 DEV : loss 0.41926491260528564 - f1-score (micro avg)  0.3992
2023-10-11 02:07:53,488 saving best model
2023-10-11 02:07:59,890 ----------------------------------------------------------------------------------------------------
2023-10-11 02:10:30,789 epoch 10 - iter 521/5212 - loss 0.00878818 - time (sec): 150.89 - samples/sec: 253.51 - lr: 0.000016 - momentum: 0.000000
2023-10-11 02:13:01,984 epoch 10 - iter 1042/5212 - loss 0.01027434 - time (sec): 302.09 - samples/sec: 254.66 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:15:30,771 epoch 10 - iter 1563/5212 - loss 0.00888626 - time (sec): 450.88 - samples/sec: 250.10 - lr: 0.000012 - momentum: 0.000000
2023-10-11 02:17:55,199 epoch 10 - iter 2084/5212 - loss 0.00837755 - time (sec): 595.31 - samples/sec: 244.29 - lr: 0.000011 - momentum: 0.000000
2023-10-11 02:20:21,472 epoch 10 - iter 2605/5212 - loss 0.00871533 - time (sec): 741.58 - samples/sec: 246.59 - lr: 0.000009 - momentum: 0.000000
2023-10-11 02:22:49,693 epoch 10 - iter 3126/5212 - loss 0.00858225 - time (sec): 889.80 - samples/sec: 246.01 - lr: 0.000007 - momentum: 0.000000
2023-10-11 02:25:17,618 epoch 10 - iter 3647/5212 - loss 0.00859775 - time (sec): 1037.72 - samples/sec: 247.36 - lr: 0.000005 - momentum: 0.000000
2023-10-11 02:27:46,389 epoch 10 - iter 4168/5212 - loss 0.00874080 - time (sec): 1186.49 - samples/sec: 249.03 - lr: 0.000004 - momentum: 0.000000
2023-10-11 02:30:11,769 epoch 10 - iter 4689/5212 - loss 0.00880848 - time (sec): 1331.87 - samples/sec: 248.84 - lr: 0.000002 - momentum: 0.000000
2023-10-11 02:32:39,960 epoch 10 - iter 5210/5212 - loss 0.00872135 - time (sec): 1480.07 - samples/sec: 248.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 02:32:40,633 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:40,634 EPOCH 10 done: loss 0.0087 - lr: 0.000000
2023-10-11 02:33:20,405 DEV : loss 0.4779926538467407 - f1-score (micro avg)  0.3891
2023-10-11 02:33:21,353 ----------------------------------------------------------------------------------------------------
2023-10-11 02:33:21,355 Loading model from best epoch ...
2023-10-11 02:33:25,843 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
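Editor's note: the 17 tags above follow the BIOES scheme over the four entity types LOC, PER, ORG and HumanProd (S- single-token span, B-/I-/E- begin/inside/end of a multi-token span, O outside). A minimal decoder sketch (pure Python, illustrative only, not Flair's span-decoding code) shows how such tag sequences map to spans:

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end_exclusive, type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                      # single-token entity
            spans.append((i, i + 1, label))
        elif prefix == "B":                    # open a multi-token entity
            start, etype = i, label
        elif prefix == "E" and etype == label: # close the open entity
            spans.append((start, i + 1, label))
            start, etype = None, None
        elif prefix == "O":                    # outside any entity
            start, etype = None, None
    return spans

print(bioes_to_spans(["B-LOC", "E-LOC", "O", "S-PER"]))  # → [(0, 2, 'LOC'), (3, 4, 'PER')]
```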
2023-10-11 02:35:07,559 
Results:
- F-score (micro) 0.4583
- F-score (macro) 0.3044
- Accuracy 0.3024

By class:
              precision    recall  f1-score   support

         LOC     0.5082    0.5601    0.5329      1214
         PER     0.4197    0.4530    0.4357       808
         ORG     0.2576    0.2408    0.2489       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4442    0.4732    0.4583      2390
   macro avg     0.2964    0.3135    0.3044      2390
weighted avg     0.4381    0.4732    0.4548      2390
2023-10-11 02:35:07,559 ----------------------------------------------------------------------------------------------------
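Editor's note: the reported averages are consistent with the per-class table. Micro F1 is the harmonic mean of micro precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores; recomputing from the (display-rounded) table values agrees with the report up to rounding:

```python
# Per-class F1 scores from the table above
class_f1 = {"LOC": 0.5329, "PER": 0.4357, "ORG": 0.2489, "HumanProd": 0.0000}

# Micro F1 from the reported micro-averaged precision/recall
p, r = 0.4442, 0.4732
micro_f1 = 2 * p * r / (p + r)
print(round(micro_f1, 3))  # → 0.458 (report: 0.4583, computed from exact counts)

# Macro F1: unweighted mean over the four classes
macro_f1 = sum(class_f1.values()) / len(class_f1)
print(round(macro_f1, 4))  # → 0.3044
```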