| 2023-10-11 00:31:54,080 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,083 Model: "SequenceTagger( | |
| (embeddings): ByT5Embeddings( | |
| (model): T5EncoderModel( | |
| (shared): Embedding(384, 1472) | |
| (encoder): T5Stack( | |
| (embed_tokens): Embedding(384, 1472) | |
| (block): ModuleList( | |
| (0): T5Block( | |
| (layer): ModuleList( | |
| (0): T5LayerSelfAttention( | |
| (SelfAttention): T5Attention( | |
| (q): Linear(in_features=1472, out_features=384, bias=False) | |
| (k): Linear(in_features=1472, out_features=384, bias=False) | |
| (v): Linear(in_features=1472, out_features=384, bias=False) | |
| (o): Linear(in_features=384, out_features=1472, bias=False) | |
| (relative_attention_bias): Embedding(32, 6) | |
| ) | |
| (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (1): T5LayerFF( | |
| (DenseReluDense): T5DenseGatedActDense( | |
| (wi_0): Linear(in_features=1472, out_features=3584, bias=False) | |
| (wi_1): Linear(in_features=1472, out_features=3584, bias=False) | |
| (wo): Linear(in_features=3584, out_features=1472, bias=False) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| (act): NewGELUActivation() | |
| ) | |
| (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| ) | |
| ) | |
| (1-11): 11 x T5Block( | |
| (layer): ModuleList( | |
| (0): T5LayerSelfAttention( | |
| (SelfAttention): T5Attention( | |
| (q): Linear(in_features=1472, out_features=384, bias=False) | |
| (k): Linear(in_features=1472, out_features=384, bias=False) | |
| (v): Linear(in_features=1472, out_features=384, bias=False) | |
| (o): Linear(in_features=384, out_features=1472, bias=False) | |
| ) | |
| (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (1): T5LayerFF( | |
| (DenseReluDense): T5DenseGatedActDense( | |
| (wi_0): Linear(in_features=1472, out_features=3584, bias=False) | |
| (wi_1): Linear(in_features=1472, out_features=3584, bias=False) | |
| (wo): Linear(in_features=3584, out_features=1472, bias=False) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| (act): NewGELUActivation() | |
| ) | |
| (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| ) | |
| ) | |
| ) | |
| (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| ) | |
| ) | |
| (locked_dropout): LockedDropout(p=0.5) | |
| (linear): Linear(in_features=1472, out_features=17, bias=True) | |
| (loss_function): CrossEntropyLoss() | |
| )" | |
| 2023-10-11 00:31:54,083 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,083 MultiCorpus: 1166 train + 165 dev + 415 test sentences | |
| - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator | |
| 2023-10-11 00:31:54,083 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,083 Train: 1166 sentences | |
| 2023-10-11 00:31:54,083 (train_with_dev=False, train_with_test=False) | |
| 2023-10-11 00:31:54,083 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,083 Training Params: | |
| 2023-10-11 00:31:54,083 - learning_rate: "0.00016" | |
| 2023-10-11 00:31:54,083 - mini_batch_size: "8" | |
| 2023-10-11 00:31:54,084 - max_epochs: "10" | |
| 2023-10-11 00:31:54,084 - shuffle: "True" | |
| 2023-10-11 00:31:54,084 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,084 Plugins: | |
| 2023-10-11 00:31:54,084 - TensorboardLogger | |
| 2023-10-11 00:31:54,084 - LinearScheduler | warmup_fraction: '0.1' | |
| 2023-10-11 00:31:54,084 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,084 Final evaluation on model from best epoch (best-model.pt) | |
| 2023-10-11 00:31:54,084 - metric: "('micro avg', 'f1-score')" | |
| 2023-10-11 00:31:54,084 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,084 Computation: | |
| 2023-10-11 00:31:54,084 - compute on device: cuda:0 | |
| 2023-10-11 00:31:54,084 - embedding storage: none | |
| 2023-10-11 00:31:54,084 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,084 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3" | |
| 2023-10-11 00:31:54,084 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,085 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:31:54,085 Logging anything other than scalars to TensorBoard is currently not supported. | |
| 2023-10-11 00:32:02,779 epoch 1 - iter 14/146 - loss 2.82806942 - time (sec): 8.69 - samples/sec: 431.49 - lr: 0.000014 - momentum: 0.000000 | |
| 2023-10-11 00:32:12,268 epoch 1 - iter 28/146 - loss 2.81932722 - time (sec): 18.18 - samples/sec: 447.92 - lr: 0.000030 - momentum: 0.000000 | |
| 2023-10-11 00:32:21,654 epoch 1 - iter 42/146 - loss 2.80887588 - time (sec): 27.57 - samples/sec: 443.61 - lr: 0.000045 - momentum: 0.000000 | |
| 2023-10-11 00:32:30,455 epoch 1 - iter 56/146 - loss 2.79001925 - time (sec): 36.37 - samples/sec: 435.57 - lr: 0.000060 - momentum: 0.000000 | |
| 2023-10-11 00:32:40,316 epoch 1 - iter 70/146 - loss 2.74954234 - time (sec): 46.23 - samples/sec: 447.70 - lr: 0.000076 - momentum: 0.000000 | |
| 2023-10-11 00:32:50,359 epoch 1 - iter 84/146 - loss 2.69183004 - time (sec): 56.27 - samples/sec: 456.68 - lr: 0.000091 - momentum: 0.000000 | |
| 2023-10-11 00:32:59,788 epoch 1 - iter 98/146 - loss 2.62431560 - time (sec): 65.70 - samples/sec: 456.55 - lr: 0.000106 - momentum: 0.000000 | |
| 2023-10-11 00:33:09,461 epoch 1 - iter 112/146 - loss 2.55499292 - time (sec): 75.37 - samples/sec: 451.80 - lr: 0.000122 - momentum: 0.000000 | |
| 2023-10-11 00:33:18,969 epoch 1 - iter 126/146 - loss 2.46794210 - time (sec): 84.88 - samples/sec: 452.41 - lr: 0.000137 - momentum: 0.000000 | |
| 2023-10-11 00:33:28,484 epoch 1 - iter 140/146 - loss 2.38373239 - time (sec): 94.40 - samples/sec: 451.06 - lr: 0.000152 - momentum: 0.000000 | |
| 2023-10-11 00:33:32,416 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:33:32,416 EPOCH 1 done: loss 2.3459 - lr: 0.000152 | |
| 2023-10-11 00:33:37,294 DEV : loss 1.2697190046310425 - f1-score (micro avg) 0.0 | |
| 2023-10-11 00:33:37,303 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:33:46,076 epoch 2 - iter 14/146 - loss 1.28532706 - time (sec): 8.77 - samples/sec: 430.14 - lr: 0.000158 - momentum: 0.000000 | |
| 2023-10-11 00:33:55,330 epoch 2 - iter 28/146 - loss 1.18643116 - time (sec): 18.03 - samples/sec: 432.90 - lr: 0.000157 - momentum: 0.000000 | |
| 2023-10-11 00:34:04,613 epoch 2 - iter 42/146 - loss 1.10584441 - time (sec): 27.31 - samples/sec: 441.81 - lr: 0.000155 - momentum: 0.000000 | |
| 2023-10-11 00:34:13,560 epoch 2 - iter 56/146 - loss 1.04255237 - time (sec): 36.26 - samples/sec: 438.39 - lr: 0.000153 - momentum: 0.000000 | |
| 2023-10-11 00:34:23,157 epoch 2 - iter 70/146 - loss 0.95927936 - time (sec): 45.85 - samples/sec: 446.28 - lr: 0.000152 - momentum: 0.000000 | |
| 2023-10-11 00:34:32,776 epoch 2 - iter 84/146 - loss 0.93373424 - time (sec): 55.47 - samples/sec: 451.40 - lr: 0.000150 - momentum: 0.000000 | |
| 2023-10-11 00:34:41,982 epoch 2 - iter 98/146 - loss 0.89334982 - time (sec): 64.68 - samples/sec: 448.77 - lr: 0.000148 - momentum: 0.000000 | |
| 2023-10-11 00:34:51,404 epoch 2 - iter 112/146 - loss 0.84771656 - time (sec): 74.10 - samples/sec: 451.21 - lr: 0.000147 - momentum: 0.000000 | |
| 2023-10-11 00:35:01,045 epoch 2 - iter 126/146 - loss 0.81012905 - time (sec): 83.74 - samples/sec: 452.68 - lr: 0.000145 - momentum: 0.000000 | |
| 2023-10-11 00:35:10,772 epoch 2 - iter 140/146 - loss 0.78129014 - time (sec): 93.47 - samples/sec: 453.54 - lr: 0.000143 - momentum: 0.000000 | |
| 2023-10-11 00:35:14,956 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:35:14,956 EPOCH 2 done: loss 0.7769 - lr: 0.000143 | |
| 2023-10-11 00:35:20,385 DEV : loss 0.4217626750469208 - f1-score (micro avg) 0.0 | |
| 2023-10-11 00:35:20,394 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:35:30,560 epoch 3 - iter 14/146 - loss 0.52507267 - time (sec): 10.16 - samples/sec: 487.22 - lr: 0.000141 - momentum: 0.000000 | |
| 2023-10-11 00:35:40,572 epoch 3 - iter 28/146 - loss 0.47734218 - time (sec): 20.18 - samples/sec: 498.62 - lr: 0.000139 - momentum: 0.000000 | |
| 2023-10-11 00:35:49,814 epoch 3 - iter 42/146 - loss 0.52236022 - time (sec): 29.42 - samples/sec: 491.47 - lr: 0.000137 - momentum: 0.000000 | |
| 2023-10-11 00:35:58,140 epoch 3 - iter 56/146 - loss 0.49316258 - time (sec): 37.74 - samples/sec: 490.92 - lr: 0.000136 - momentum: 0.000000 | |
| 2023-10-11 00:36:06,676 epoch 3 - iter 70/146 - loss 0.48608795 - time (sec): 46.28 - samples/sec: 493.22 - lr: 0.000134 - momentum: 0.000000 | |
| 2023-10-11 00:36:15,235 epoch 3 - iter 84/146 - loss 0.46795110 - time (sec): 54.84 - samples/sec: 495.23 - lr: 0.000132 - momentum: 0.000000 | |
| 2023-10-11 00:36:23,555 epoch 3 - iter 98/146 - loss 0.44959769 - time (sec): 63.16 - samples/sec: 493.81 - lr: 0.000131 - momentum: 0.000000 | |
| 2023-10-11 00:36:31,553 epoch 3 - iter 112/146 - loss 0.44307766 - time (sec): 71.16 - samples/sec: 488.26 - lr: 0.000129 - momentum: 0.000000 | |
| 2023-10-11 00:36:39,208 epoch 3 - iter 126/146 - loss 0.43279780 - time (sec): 78.81 - samples/sec: 482.64 - lr: 0.000127 - momentum: 0.000000 | |
| 2023-10-11 00:36:47,675 epoch 3 - iter 140/146 - loss 0.42683695 - time (sec): 87.28 - samples/sec: 482.64 - lr: 0.000125 - momentum: 0.000000 | |
| 2023-10-11 00:36:51,608 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:36:51,608 EPOCH 3 done: loss 0.4184 - lr: 0.000125 | |
| 2023-10-11 00:36:57,059 DEV : loss 0.2698569595813751 - f1-score (micro avg) 0.2605 | |
| 2023-10-11 00:36:57,068 saving best model | |
| 2023-10-11 00:36:57,949 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:37:06,223 epoch 4 - iter 14/146 - loss 0.31637424 - time (sec): 8.27 - samples/sec: 464.20 - lr: 0.000123 - momentum: 0.000000 | |
| 2023-10-11 00:37:15,069 epoch 4 - iter 28/146 - loss 0.31876580 - time (sec): 17.12 - samples/sec: 488.18 - lr: 0.000121 - momentum: 0.000000 | |
| 2023-10-11 00:37:23,150 epoch 4 - iter 42/146 - loss 0.30486535 - time (sec): 25.20 - samples/sec: 488.19 - lr: 0.000120 - momentum: 0.000000 | |
| 2023-10-11 00:37:31,536 epoch 4 - iter 56/146 - loss 0.31674327 - time (sec): 33.59 - samples/sec: 491.25 - lr: 0.000118 - momentum: 0.000000 | |
| 2023-10-11 00:37:40,236 epoch 4 - iter 70/146 - loss 0.30195569 - time (sec): 42.29 - samples/sec: 500.46 - lr: 0.000116 - momentum: 0.000000 | |
| 2023-10-11 00:37:48,519 epoch 4 - iter 84/146 - loss 0.32557627 - time (sec): 50.57 - samples/sec: 499.37 - lr: 0.000115 - momentum: 0.000000 | |
| 2023-10-11 00:37:56,785 epoch 4 - iter 98/146 - loss 0.31759790 - time (sec): 58.83 - samples/sec: 498.45 - lr: 0.000113 - momentum: 0.000000 | |
| 2023-10-11 00:38:05,588 epoch 4 - iter 112/146 - loss 0.31022905 - time (sec): 67.64 - samples/sec: 501.32 - lr: 0.000111 - momentum: 0.000000 | |
| 2023-10-11 00:38:13,906 epoch 4 - iter 126/146 - loss 0.31045555 - time (sec): 75.96 - samples/sec: 500.41 - lr: 0.000109 - momentum: 0.000000 | |
| 2023-10-11 00:38:22,839 epoch 4 - iter 140/146 - loss 0.30366458 - time (sec): 84.89 - samples/sec: 499.98 - lr: 0.000108 - momentum: 0.000000 | |
| 2023-10-11 00:38:26,552 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:38:26,552 EPOCH 4 done: loss 0.2989 - lr: 0.000108 | |
| 2023-10-11 00:38:32,194 DEV : loss 0.209104984998703 - f1-score (micro avg) 0.4208 | |
| 2023-10-11 00:38:32,206 saving best model | |
| 2023-10-11 00:38:39,160 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:38:49,119 epoch 5 - iter 14/146 - loss 0.25760456 - time (sec): 9.96 - samples/sec: 452.21 - lr: 0.000105 - momentum: 0.000000 | |
| 2023-10-11 00:38:58,893 epoch 5 - iter 28/146 - loss 0.23471806 - time (sec): 19.73 - samples/sec: 438.29 - lr: 0.000104 - momentum: 0.000000 | |
| 2023-10-11 00:39:08,357 epoch 5 - iter 42/146 - loss 0.26661266 - time (sec): 29.19 - samples/sec: 432.83 - lr: 0.000102 - momentum: 0.000000 | |
| 2023-10-11 00:39:17,448 epoch 5 - iter 56/146 - loss 0.28286595 - time (sec): 38.28 - samples/sec: 428.20 - lr: 0.000100 - momentum: 0.000000 | |
| 2023-10-11 00:39:27,150 epoch 5 - iter 70/146 - loss 0.26401052 - time (sec): 47.99 - samples/sec: 427.81 - lr: 0.000099 - momentum: 0.000000 | |
| 2023-10-11 00:39:37,205 epoch 5 - iter 84/146 - loss 0.25154027 - time (sec): 58.04 - samples/sec: 433.92 - lr: 0.000097 - momentum: 0.000000 | |
| 2023-10-11 00:39:47,492 epoch 5 - iter 98/146 - loss 0.24582348 - time (sec): 68.33 - samples/sec: 443.59 - lr: 0.000095 - momentum: 0.000000 | |
| 2023-10-11 00:39:57,217 epoch 5 - iter 112/146 - loss 0.23536220 - time (sec): 78.05 - samples/sec: 444.78 - lr: 0.000093 - momentum: 0.000000 | |
| 2023-10-11 00:40:06,736 epoch 5 - iter 126/146 - loss 0.23213010 - time (sec): 87.57 - samples/sec: 445.89 - lr: 0.000092 - momentum: 0.000000 | |
| 2023-10-11 00:40:16,105 epoch 5 - iter 140/146 - loss 0.22807507 - time (sec): 96.94 - samples/sec: 444.47 - lr: 0.000090 - momentum: 0.000000 | |
| 2023-10-11 00:40:19,721 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:40:19,722 EPOCH 5 done: loss 0.2286 - lr: 0.000090 | |
| 2023-10-11 00:40:26,552 DEV : loss 0.17275798320770264 - f1-score (micro avg) 0.533 | |
| 2023-10-11 00:40:26,563 saving best model | |
| 2023-10-11 00:40:34,103 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:40:43,731 epoch 6 - iter 14/146 - loss 0.14647893 - time (sec): 9.62 - samples/sec: 508.91 - lr: 0.000088 - momentum: 0.000000 | |
| 2023-10-11 00:40:52,055 epoch 6 - iter 28/146 - loss 0.15333865 - time (sec): 17.95 - samples/sec: 475.21 - lr: 0.000086 - momentum: 0.000000 | |
| 2023-10-11 00:41:00,745 epoch 6 - iter 42/146 - loss 0.15603887 - time (sec): 26.64 - samples/sec: 476.53 - lr: 0.000084 - momentum: 0.000000 | |
| 2023-10-11 00:41:09,885 epoch 6 - iter 56/146 - loss 0.14799884 - time (sec): 35.78 - samples/sec: 480.65 - lr: 0.000083 - momentum: 0.000000 | |
| 2023-10-11 00:41:18,384 epoch 6 - iter 70/146 - loss 0.16215169 - time (sec): 44.28 - samples/sec: 478.17 - lr: 0.000081 - momentum: 0.000000 | |
| 2023-10-11 00:41:28,260 epoch 6 - iter 84/146 - loss 0.17969819 - time (sec): 54.15 - samples/sec: 492.60 - lr: 0.000079 - momentum: 0.000000 | |
| 2023-10-11 00:41:36,809 epoch 6 - iter 98/146 - loss 0.17881606 - time (sec): 62.70 - samples/sec: 489.82 - lr: 0.000077 - momentum: 0.000000 | |
| 2023-10-11 00:41:45,394 epoch 6 - iter 112/146 - loss 0.17728906 - time (sec): 71.29 - samples/sec: 488.60 - lr: 0.000076 - momentum: 0.000000 | |
| 2023-10-11 00:41:54,314 epoch 6 - iter 126/146 - loss 0.17321844 - time (sec): 80.21 - samples/sec: 486.73 - lr: 0.000074 - momentum: 0.000000 | |
| 2023-10-11 00:42:02,929 epoch 6 - iter 140/146 - loss 0.17286199 - time (sec): 88.82 - samples/sec: 481.50 - lr: 0.000072 - momentum: 0.000000 | |
| 2023-10-11 00:42:06,597 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:42:06,597 EPOCH 6 done: loss 0.1704 - lr: 0.000072 | |
| 2023-10-11 00:42:12,450 DEV : loss 0.1590806394815445 - f1-score (micro avg) 0.6079 | |
| 2023-10-11 00:42:12,460 saving best model | |
| 2023-10-11 00:42:19,778 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:42:29,140 epoch 7 - iter 14/146 - loss 0.13574474 - time (sec): 9.36 - samples/sec: 494.46 - lr: 0.000070 - momentum: 0.000000 | |
| 2023-10-11 00:42:38,694 epoch 7 - iter 28/146 - loss 0.13424153 - time (sec): 18.91 - samples/sec: 501.67 - lr: 0.000068 - momentum: 0.000000 | |
| 2023-10-11 00:42:47,777 epoch 7 - iter 42/146 - loss 0.13292302 - time (sec): 27.99 - samples/sec: 487.31 - lr: 0.000067 - momentum: 0.000000 | |
| 2023-10-11 00:42:56,399 epoch 7 - iter 56/146 - loss 0.12583950 - time (sec): 36.62 - samples/sec: 476.75 - lr: 0.000065 - momentum: 0.000000 | |
| 2023-10-11 00:43:05,449 epoch 7 - iter 70/146 - loss 0.12456020 - time (sec): 45.67 - samples/sec: 471.05 - lr: 0.000063 - momentum: 0.000000 | |
| 2023-10-11 00:43:13,769 epoch 7 - iter 84/146 - loss 0.12805783 - time (sec): 53.99 - samples/sec: 469.43 - lr: 0.000061 - momentum: 0.000000 | |
| 2023-10-11 00:43:22,866 epoch 7 - iter 98/146 - loss 0.13275272 - time (sec): 63.08 - samples/sec: 473.50 - lr: 0.000060 - momentum: 0.000000 | |
| 2023-10-11 00:43:31,148 epoch 7 - iter 112/146 - loss 0.13262712 - time (sec): 71.37 - samples/sec: 465.17 - lr: 0.000058 - momentum: 0.000000 | |
| 2023-10-11 00:43:40,630 epoch 7 - iter 126/146 - loss 0.13489647 - time (sec): 80.85 - samples/sec: 470.22 - lr: 0.000056 - momentum: 0.000000 | |
| 2023-10-11 00:43:50,153 epoch 7 - iter 140/146 - loss 0.13471555 - time (sec): 90.37 - samples/sec: 475.26 - lr: 0.000055 - momentum: 0.000000 | |
| 2023-10-11 00:43:53,631 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:43:53,631 EPOCH 7 done: loss 0.1341 - lr: 0.000055 | |
| 2023-10-11 00:43:59,885 DEV : loss 0.1412263810634613 - f1-score (micro avg) 0.7484 | |
| 2023-10-11 00:43:59,896 saving best model | |
| 2023-10-11 00:44:04,148 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:44:14,108 epoch 8 - iter 14/146 - loss 0.11916361 - time (sec): 9.96 - samples/sec: 527.93 - lr: 0.000052 - momentum: 0.000000 | |
| 2023-10-11 00:44:22,742 epoch 8 - iter 28/146 - loss 0.13021523 - time (sec): 18.59 - samples/sec: 481.62 - lr: 0.000051 - momentum: 0.000000 | |
| 2023-10-11 00:44:31,277 epoch 8 - iter 42/146 - loss 0.12227977 - time (sec): 27.12 - samples/sec: 474.15 - lr: 0.000049 - momentum: 0.000000 | |
| 2023-10-11 00:44:39,841 epoch 8 - iter 56/146 - loss 0.12444551 - time (sec): 35.69 - samples/sec: 479.15 - lr: 0.000047 - momentum: 0.000000 | |
| 2023-10-11 00:44:48,733 epoch 8 - iter 70/146 - loss 0.12722021 - time (sec): 44.58 - samples/sec: 483.62 - lr: 0.000045 - momentum: 0.000000 | |
| 2023-10-11 00:44:57,112 epoch 8 - iter 84/146 - loss 0.12712166 - time (sec): 52.96 - samples/sec: 476.71 - lr: 0.000044 - momentum: 0.000000 | |
| 2023-10-11 00:45:05,995 epoch 8 - iter 98/146 - loss 0.12123456 - time (sec): 61.84 - samples/sec: 474.22 - lr: 0.000042 - momentum: 0.000000 | |
| 2023-10-11 00:45:15,792 epoch 8 - iter 112/146 - loss 0.11590990 - time (sec): 71.64 - samples/sec: 470.71 - lr: 0.000040 - momentum: 0.000000 | |
| 2023-10-11 00:45:25,852 epoch 8 - iter 126/146 - loss 0.11272017 - time (sec): 81.70 - samples/sec: 467.06 - lr: 0.000039 - momentum: 0.000000 | |
| 2023-10-11 00:45:35,986 epoch 8 - iter 140/146 - loss 0.11288497 - time (sec): 91.83 - samples/sec: 462.69 - lr: 0.000037 - momentum: 0.000000 | |
| 2023-10-11 00:45:40,161 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:45:40,161 EPOCH 8 done: loss 0.1126 - lr: 0.000037 | |
| 2023-10-11 00:45:46,862 DEV : loss 0.13121522963047028 - f1-score (micro avg) 0.7425 | |
| 2023-10-11 00:45:46,873 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:45:56,825 epoch 9 - iter 14/146 - loss 0.12532790 - time (sec): 9.95 - samples/sec: 472.51 - lr: 0.000035 - momentum: 0.000000 | |
| 2023-10-11 00:46:07,362 epoch 9 - iter 28/146 - loss 0.10275371 - time (sec): 20.49 - samples/sec: 455.04 - lr: 0.000033 - momentum: 0.000000 | |
| 2023-10-11 00:46:16,396 epoch 9 - iter 42/146 - loss 0.09731750 - time (sec): 29.52 - samples/sec: 445.90 - lr: 0.000031 - momentum: 0.000000 | |
| 2023-10-11 00:46:26,755 epoch 9 - iter 56/146 - loss 0.09793219 - time (sec): 39.88 - samples/sec: 442.79 - lr: 0.000029 - momentum: 0.000000 | |
| 2023-10-11 00:46:36,624 epoch 9 - iter 70/146 - loss 0.09884983 - time (sec): 49.75 - samples/sec: 441.18 - lr: 0.000028 - momentum: 0.000000 | |
| 2023-10-11 00:46:46,452 epoch 9 - iter 84/146 - loss 0.09904834 - time (sec): 59.58 - samples/sec: 441.88 - lr: 0.000026 - momentum: 0.000000 | |
| 2023-10-11 00:46:56,310 epoch 9 - iter 98/146 - loss 0.09634791 - time (sec): 69.43 - samples/sec: 438.61 - lr: 0.000024 - momentum: 0.000000 | |
| 2023-10-11 00:47:06,018 epoch 9 - iter 112/146 - loss 0.09264175 - time (sec): 79.14 - samples/sec: 439.24 - lr: 0.000023 - momentum: 0.000000 | |
| 2023-10-11 00:47:15,964 epoch 9 - iter 126/146 - loss 0.09617828 - time (sec): 89.09 - samples/sec: 438.73 - lr: 0.000021 - momentum: 0.000000 | |
| 2023-10-11 00:47:25,790 epoch 9 - iter 140/146 - loss 0.09910015 - time (sec): 98.91 - samples/sec: 435.56 - lr: 0.000019 - momentum: 0.000000 | |
| 2023-10-11 00:47:29,342 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:47:29,343 EPOCH 9 done: loss 0.0986 - lr: 0.000019 | |
| 2023-10-11 00:47:36,270 DEV : loss 0.1271078884601593 - f1-score (micro avg) 0.78 | |
| 2023-10-11 00:47:36,281 saving best model | |
| 2023-10-11 00:47:42,280 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:47:51,489 epoch 10 - iter 14/146 - loss 0.10220196 - time (sec): 9.21 - samples/sec: 498.35 - lr: 0.000017 - momentum: 0.000000 | |
| 2023-10-11 00:48:01,979 epoch 10 - iter 28/146 - loss 0.10097503 - time (sec): 19.70 - samples/sec: 463.65 - lr: 0.000015 - momentum: 0.000000 | |
| 2023-10-11 00:48:11,849 epoch 10 - iter 42/146 - loss 0.10253434 - time (sec): 29.57 - samples/sec: 471.18 - lr: 0.000013 - momentum: 0.000000 | |
| 2023-10-11 00:48:21,454 epoch 10 - iter 56/146 - loss 0.09627103 - time (sec): 39.17 - samples/sec: 475.71 - lr: 0.000012 - momentum: 0.000000 | |
| 2023-10-11 00:48:29,986 epoch 10 - iter 70/146 - loss 0.09699707 - time (sec): 47.70 - samples/sec: 474.22 - lr: 0.000010 - momentum: 0.000000 | |
| 2023-10-11 00:48:39,413 epoch 10 - iter 84/146 - loss 0.09278009 - time (sec): 57.13 - samples/sec: 472.41 - lr: 0.000008 - momentum: 0.000000 | |
| 2023-10-11 00:48:47,745 epoch 10 - iter 98/146 - loss 0.09014855 - time (sec): 65.46 - samples/sec: 460.60 - lr: 0.000007 - momentum: 0.000000 | |
| 2023-10-11 00:48:56,979 epoch 10 - iter 112/146 - loss 0.09219331 - time (sec): 74.70 - samples/sec: 463.89 - lr: 0.000005 - momentum: 0.000000 | |
| 2023-10-11 00:49:05,736 epoch 10 - iter 126/146 - loss 0.09003232 - time (sec): 83.45 - samples/sec: 462.88 - lr: 0.000003 - momentum: 0.000000 | |
| 2023-10-11 00:49:14,622 epoch 10 - iter 140/146 - loss 0.09300204 - time (sec): 92.34 - samples/sec: 462.35 - lr: 0.000002 - momentum: 0.000000 | |
| 2023-10-11 00:49:18,215 ---------------------------------------------------------------------------------------------------- | |
| 2023-10-11 00:49:18,215 EPOCH 10 done: loss 0.0927 - lr: 0.000002 | |
| 2023-10-11 00:49:24,001 DEV : loss 0.1254645138978958 - f1-score (micro avg) 0.779 | |
| 2023-10-11 00:49:24,961 ---------------------------------------------------------------------------------------------------- | |
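The lr column in the training lines above climbs during epoch 1 and then falls toward zero, which matches the LinearScheduler plugin and warmup_fraction '0.1' listed in the header. The sketch below reproduces that shape only. It assumes 146 iterations x 10 epochs = 1460 total steps, a peak of 0.00016, and decay to exactly zero. The function name and the step accounting are illustrative assumptions, not Flair's implementation, so its values approximate the logged ones rather than match them digit for digit.

```python
def linear_schedule_lr(step, peak_lr=0.00016, total_steps=1460, warmup_fraction=0.1):
    """Learning rate at optimizer step `step` for linear warmup then linear decay.

    Assumed values: total_steps = 146 iterations/epoch * 10 epochs, taken from
    the log above; the real trainer may count steps slightly differently.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the sketched rate at the end of each epoch (146 steps apiece).
    for epoch in range(1, 11):
        print(f"epoch {epoch:2d}: lr ~ {linear_schedule_lr(epoch * 146):.6f}")
```

For instance, linear_schedule_lr(160) comes out at roughly 0.000158, close to the value logged at epoch 2, iter 14.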
| 2023-10-11 00:49:24,963 Loading model from best epoch ... | |
| 2023-10-11 00:49:29,153 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd | |
| 2023-10-11 00:49:42,185 | |
| Results: | |
| - F-score (micro) 0.7087 | |
| - F-score (macro) 0.628 | |
| - Accuracy 0.5675 | |
| By class: | |
|               precision    recall  f1-score   support | |
|          PER     0.7895    0.8190    0.8039       348 | |
|          LOC     0.5805    0.7739    0.6634       261 | |
|          ORG     0.2979    0.2692    0.2828        52 | |
|    HumanProd     0.8000    0.7273    0.7619        22 | |
|    micro avg     0.6662    0.7570    0.7087       683 | |
|    macro avg     0.6170    0.6474    0.6280       683 | |
| weighted avg     0.6725    0.7570    0.7092       683 | |
| 2023-10-11 00:49:42,185 ---------------------------------------------------------------------------------------------------- | |
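The summary rows of the report can be re-derived from the figures printed above. The sketch below is a consistency check, not part of the training run. It recomputes the micro F-score from the micro-averaged precision and recall, the macro F-score as the unweighted mean of the per-class F1 values, and the weighted F-score as the support-weighted mean. All numbers are copied from the log; the helper and variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class F1 and support, copied from the "By class" table in the log.
per_class_f1 = {"PER": 0.8039, "LOC": 0.6634, "ORG": 0.2828, "HumanProd": 0.7619}
support = {"PER": 348, "LOC": 261, "ORG": 52, "HumanProd": 22}

# Micro F1 from the micro-averaged precision and recall row.
micro_f1 = f1(0.6662, 0.7570)

# Macro F1: every class counts equally, whatever its support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted F1: each class contributes in proportion to its support.
total_support = sum(support.values())
weighted_f1 = sum(per_class_f1[c] * support[c] for c in support) / total_support

if __name__ == "__main__":
    print(f"micro    {micro_f1:.4f}")
    print(f"macro    {macro_f1:.4f}")
    print(f"weighted {weighted_f1:.4f}")
```

The gap between the micro and macro scores comes mostly from ORG, which has low F1 and only 52 of the 683 test entities, so it drags down the macro average more than the micro one.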