2023-10-11 00:14:02,384 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 00:14:02,386 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Train:  1166 sentences
2023-10-11 00:14:02,387         (train_with_dev=False, train_with_test=False)
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Training Params:
2023-10-11 00:14:02,387  - learning_rate: "0.00015" 
2023-10-11 00:14:02,387  - mini_batch_size: "8"
2023-10-11 00:14:02,387  - max_epochs: "10"
2023-10-11 00:14:02,387  - shuffle: "True"
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Plugins:
2023-10-11 00:14:02,387  - TensorboardLogger
2023-10-11 00:14:02,387  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
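[Editor's note] The LinearScheduler plugin above (warmup_fraction '0.1') accounts for the lr column in the per-iteration lines below: linear warmup over the first 10% of the 1460 total steps (10 epochs x 146 iterations) up to the peak learning_rate 0.00015, then linear decay to zero. A minimal sketch of that schedule (the function name and exact step indexing are assumptions, not Flair's implementation; the log's values agree to within one step of indexing):

```python
def linear_schedule_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup phase: lr grows proportionally with the step count
        return peak_lr * step / warmup_steps
    # decay phase: lr falls linearly from peak_lr to 0 at the final step
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# 10 epochs x 146 iterations = 1460 steps, warmup over the first 146 steps;
# e.g. the schedule peaks at step 146 and reaches ~0.000001 near step 1454,
# consistent with the lr values logged at epoch 2, iter 14 and epoch 10, iter 140.
```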
2023-10-11 00:14:02,388 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 00:14:02,388  - metric: "('micro avg', 'f1-score')"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Computation:
2023-10-11 00:14:02,388  - compute on device: cuda:0
2023-10-11 00:14:02,388  - embedding storage: none
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 00:14:11,164 epoch 1 - iter 14/146 - loss 2.82817866 - time (sec): 8.77 - samples/sec: 427.52 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:14:20,453 epoch 1 - iter 28/146 - loss 2.81986952 - time (sec): 18.06 - samples/sec: 450.86 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:14:29,670 epoch 1 - iter 42/146 - loss 2.81010839 - time (sec): 27.28 - samples/sec: 448.27 - lr: 0.000042 - momentum: 0.000000
2023-10-11 00:14:38,407 epoch 1 - iter 56/146 - loss 2.79282156 - time (sec): 36.02 - samples/sec: 439.82 - lr: 0.000057 - momentum: 0.000000
2023-10-11 00:14:48,463 epoch 1 - iter 70/146 - loss 2.75564153 - time (sec): 46.07 - samples/sec: 449.23 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:14:58,486 epoch 1 - iter 84/146 - loss 2.70141670 - time (sec): 56.10 - samples/sec: 458.13 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:15:07,903 epoch 1 - iter 98/146 - loss 2.63744532 - time (sec): 65.51 - samples/sec: 457.86 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:15:16,446 epoch 1 - iter 112/146 - loss 2.57069765 - time (sec): 74.06 - samples/sec: 459.84 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:15:25,138 epoch 1 - iter 126/146 - loss 2.48819700 - time (sec): 82.75 - samples/sec: 464.09 - lr: 0.000128 - momentum: 0.000000
2023-10-11 00:15:33,976 epoch 1 - iter 140/146 - loss 2.40859330 - time (sec): 91.59 - samples/sec: 464.91 - lr: 0.000143 - momentum: 0.000000
2023-10-11 00:15:37,658 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:37,659 EPOCH 1 done: loss 2.3727 - lr: 0.000143
2023-10-11 00:15:42,937 DEV : loss 1.3521078824996948 - f1-score (micro avg)  0.0
2023-10-11 00:15:42,946 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:50,957 epoch 2 - iter 14/146 - loss 1.37293642 - time (sec): 8.01 - samples/sec: 471.08 - lr: 0.000149 - momentum: 0.000000
2023-10-11 00:15:59,218 epoch 2 - iter 28/146 - loss 1.27588533 - time (sec): 16.27 - samples/sec: 479.57 - lr: 0.000147 - momentum: 0.000000
2023-10-11 00:16:07,875 epoch 2 - iter 42/146 - loss 1.19578947 - time (sec): 24.93 - samples/sec: 484.01 - lr: 0.000145 - momentum: 0.000000
2023-10-11 00:16:16,004 epoch 2 - iter 56/146 - loss 1.12937665 - time (sec): 33.06 - samples/sec: 480.81 - lr: 0.000144 - momentum: 0.000000
2023-10-11 00:16:24,974 epoch 2 - iter 70/146 - loss 1.04224254 - time (sec): 42.03 - samples/sec: 486.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 00:16:34,067 epoch 2 - iter 84/146 - loss 1.00774700 - time (sec): 51.12 - samples/sec: 489.84 - lr: 0.000141 - momentum: 0.000000
2023-10-11 00:16:42,519 epoch 2 - iter 98/146 - loss 0.96099641 - time (sec): 59.57 - samples/sec: 487.23 - lr: 0.000139 - momentum: 0.000000
2023-10-11 00:16:51,251 epoch 2 - iter 112/146 - loss 0.90901669 - time (sec): 68.30 - samples/sec: 489.49 - lr: 0.000137 - momentum: 0.000000
2023-10-11 00:17:00,072 epoch 2 - iter 126/146 - loss 0.86569250 - time (sec): 77.12 - samples/sec: 491.52 - lr: 0.000136 - momentum: 0.000000
2023-10-11 00:17:09,029 epoch 2 - iter 140/146 - loss 0.83180630 - time (sec): 86.08 - samples/sec: 492.45 - lr: 0.000134 - momentum: 0.000000
2023-10-11 00:17:12,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:12,930 EPOCH 2 done: loss 0.8265 - lr: 0.000134
2023-10-11 00:17:18,512 DEV : loss 0.45962727069854736 - f1-score (micro avg)  0.0
2023-10-11 00:17:18,522 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:27,515 epoch 3 - iter 14/146 - loss 0.56664628 - time (sec): 8.99 - samples/sec: 550.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 00:17:36,702 epoch 3 - iter 28/146 - loss 0.51000469 - time (sec): 18.18 - samples/sec: 553.40 - lr: 0.000130 - momentum: 0.000000
2023-10-11 00:17:45,427 epoch 3 - iter 42/146 - loss 0.55411592 - time (sec): 26.90 - samples/sec: 537.41 - lr: 0.000129 - momentum: 0.000000
2023-10-11 00:17:53,675 epoch 3 - iter 56/146 - loss 0.52261842 - time (sec): 35.15 - samples/sec: 527.12 - lr: 0.000127 - momentum: 0.000000
2023-10-11 00:18:02,091 epoch 3 - iter 70/146 - loss 0.51648209 - time (sec): 43.57 - samples/sec: 523.93 - lr: 0.000126 - momentum: 0.000000
2023-10-11 00:18:10,925 epoch 3 - iter 84/146 - loss 0.49728401 - time (sec): 52.40 - samples/sec: 518.27 - lr: 0.000124 - momentum: 0.000000
2023-10-11 00:18:19,329 epoch 3 - iter 98/146 - loss 0.47812146 - time (sec): 60.81 - samples/sec: 512.91 - lr: 0.000122 - momentum: 0.000000
2023-10-11 00:18:27,211 epoch 3 - iter 112/146 - loss 0.47088239 - time (sec): 68.69 - samples/sec: 505.81 - lr: 0.000121 - momentum: 0.000000
2023-10-11 00:18:34,844 epoch 3 - iter 126/146 - loss 0.46014170 - time (sec): 76.32 - samples/sec: 498.40 - lr: 0.000119 - momentum: 0.000000
2023-10-11 00:18:43,390 epoch 3 - iter 140/146 - loss 0.45282216 - time (sec): 84.87 - samples/sec: 496.35 - lr: 0.000118 - momentum: 0.000000
2023-10-11 00:18:47,362 ----------------------------------------------------------------------------------------------------
2023-10-11 00:18:47,362 EPOCH 3 done: loss 0.4440 - lr: 0.000118
2023-10-11 00:18:53,034 DEV : loss 0.28692546486854553 - f1-score (micro avg)  0.1634
2023-10-11 00:18:53,043 saving best model
2023-10-11 00:18:53,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:19:02,136 epoch 4 - iter 14/146 - loss 0.33858093 - time (sec): 8.21 - samples/sec: 468.00 - lr: 0.000115 - momentum: 0.000000
2023-10-11 00:19:11,252 epoch 4 - iter 28/146 - loss 0.33897577 - time (sec): 17.32 - samples/sec: 482.48 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:19:19,557 epoch 4 - iter 42/146 - loss 0.32510505 - time (sec): 25.63 - samples/sec: 480.07 - lr: 0.000112 - momentum: 0.000000
2023-10-11 00:19:27,999 epoch 4 - iter 56/146 - loss 0.33741625 - time (sec): 34.07 - samples/sec: 484.30 - lr: 0.000111 - momentum: 0.000000
2023-10-11 00:19:36,717 epoch 4 - iter 70/146 - loss 0.32330073 - time (sec): 42.79 - samples/sec: 494.60 - lr: 0.000109 - momentum: 0.000000
2023-10-11 00:19:45,262 epoch 4 - iter 84/146 - loss 0.35066962 - time (sec): 51.33 - samples/sec: 491.95 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:19:53,579 epoch 4 - iter 98/146 - loss 0.34254804 - time (sec): 59.65 - samples/sec: 491.65 - lr: 0.000106 - momentum: 0.000000
2023-10-11 00:20:02,300 epoch 4 - iter 112/146 - loss 0.33425876 - time (sec): 68.37 - samples/sec: 495.96 - lr: 0.000104 - momentum: 0.000000
2023-10-11 00:20:10,709 epoch 4 - iter 126/146 - loss 0.33494481 - time (sec): 76.78 - samples/sec: 495.05 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:20:19,699 epoch 4 - iter 140/146 - loss 0.32750282 - time (sec): 85.77 - samples/sec: 494.85 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:20:23,418 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:23,418 EPOCH 4 done: loss 0.3225 - lr: 0.000101
2023-10-11 00:20:29,017 DEV : loss 0.23322905600070953 - f1-score (micro avg)  0.332
2023-10-11 00:20:29,025 saving best model
2023-10-11 00:20:35,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:43,857 epoch 5 - iter 14/146 - loss 0.27735107 - time (sec): 8.82 - samples/sec: 510.33 - lr: 0.000099 - momentum: 0.000000
2023-10-11 00:20:52,350 epoch 5 - iter 28/146 - loss 0.25431500 - time (sec): 17.31 - samples/sec: 499.40 - lr: 0.000097 - momentum: 0.000000
2023-10-11 00:21:00,691 epoch 5 - iter 42/146 - loss 0.29245784 - time (sec): 25.66 - samples/sec: 492.51 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:21:08,893 epoch 5 - iter 56/146 - loss 0.30867369 - time (sec): 33.86 - samples/sec: 484.17 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:21:17,431 epoch 5 - iter 70/146 - loss 0.28826282 - time (sec): 42.40 - samples/sec: 484.22 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:21:26,668 epoch 5 - iter 84/146 - loss 0.27456335 - time (sec): 51.63 - samples/sec: 487.77 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:21:36,019 epoch 5 - iter 98/146 - loss 0.26911782 - time (sec): 60.98 - samples/sec: 497.02 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:21:44,734 epoch 5 - iter 112/146 - loss 0.25803376 - time (sec): 69.70 - samples/sec: 498.10 - lr: 0.000088 - momentum: 0.000000
2023-10-11 00:21:53,421 epoch 5 - iter 126/146 - loss 0.25520050 - time (sec): 78.39 - samples/sec: 498.15 - lr: 0.000086 - momentum: 0.000000
2023-10-11 00:22:01,791 epoch 5 - iter 140/146 - loss 0.25148287 - time (sec): 86.76 - samples/sec: 496.66 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:22:05,100 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:05,100 EPOCH 5 done: loss 0.2521 - lr: 0.000084
2023-10-11 00:22:10,781 DEV : loss 0.19501639902591705 - f1-score (micro avg)  0.473
2023-10-11 00:22:10,790 saving best model
2023-10-11 00:22:16,955 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:26,475 epoch 6 - iter 14/146 - loss 0.16514820 - time (sec): 9.52 - samples/sec: 514.75 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:22:34,815 epoch 6 - iter 28/146 - loss 0.17522830 - time (sec): 17.86 - samples/sec: 477.67 - lr: 0.000081 - momentum: 0.000000
2023-10-11 00:22:43,531 epoch 6 - iter 42/146 - loss 0.17690455 - time (sec): 26.57 - samples/sec: 477.73 - lr: 0.000079 - momentum: 0.000000
2023-10-11 00:22:52,461 epoch 6 - iter 56/146 - loss 0.16628079 - time (sec): 35.50 - samples/sec: 484.40 - lr: 0.000077 - momentum: 0.000000
2023-10-11 00:23:00,737 epoch 6 - iter 70/146 - loss 0.18071160 - time (sec): 43.78 - samples/sec: 483.63 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:23:10,600 epoch 6 - iter 84/146 - loss 0.20187792 - time (sec): 53.64 - samples/sec: 497.31 - lr: 0.000074 - momentum: 0.000000
2023-10-11 00:23:19,008 epoch 6 - iter 98/146 - loss 0.20080362 - time (sec): 62.05 - samples/sec: 494.99 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:23:27,506 epoch 6 - iter 112/146 - loss 0.19888829 - time (sec): 70.55 - samples/sec: 493.73 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:23:35,994 epoch 6 - iter 126/146 - loss 0.19539473 - time (sec): 79.03 - samples/sec: 493.95 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:23:43,994 epoch 6 - iter 140/146 - loss 0.19529054 - time (sec): 87.03 - samples/sec: 491.39 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:23:47,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:23:47,387 EPOCH 6 done: loss 0.1923 - lr: 0.000068
2023-10-11 00:23:52,889 DEV : loss 0.1738743782043457 - f1-score (micro avg)  0.5498
2023-10-11 00:23:52,897 saving best model
2023-10-11 00:23:59,052 ----------------------------------------------------------------------------------------------------
2023-10-11 00:24:08,020 epoch 7 - iter 14/146 - loss 0.15115652 - time (sec): 8.96 - samples/sec: 516.15 - lr: 0.000066 - momentum: 0.000000
2023-10-11 00:24:16,989 epoch 7 - iter 28/146 - loss 0.15169848 - time (sec): 17.93 - samples/sec: 529.01 - lr: 0.000064 - momentum: 0.000000
2023-10-11 00:24:25,559 epoch 7 - iter 42/146 - loss 0.15112913 - time (sec): 26.50 - samples/sec: 514.72 - lr: 0.000062 - momentum: 0.000000
2023-10-11 00:24:33,559 epoch 7 - iter 56/146 - loss 0.14375947 - time (sec): 34.50 - samples/sec: 505.96 - lr: 0.000061 - momentum: 0.000000
2023-10-11 00:24:41,851 epoch 7 - iter 70/146 - loss 0.14191662 - time (sec): 42.80 - samples/sec: 502.65 - lr: 0.000059 - momentum: 0.000000
2023-10-11 00:24:49,764 epoch 7 - iter 84/146 - loss 0.14733674 - time (sec): 50.71 - samples/sec: 499.78 - lr: 0.000058 - momentum: 0.000000
2023-10-11 00:24:58,428 epoch 7 - iter 98/146 - loss 0.15209724 - time (sec): 59.37 - samples/sec: 503.09 - lr: 0.000056 - momentum: 0.000000
2023-10-11 00:25:06,268 epoch 7 - iter 112/146 - loss 0.15110304 - time (sec): 67.21 - samples/sec: 493.92 - lr: 0.000054 - momentum: 0.000000
2023-10-11 00:25:15,356 epoch 7 - iter 126/146 - loss 0.15315807 - time (sec): 76.30 - samples/sec: 498.24 - lr: 0.000053 - momentum: 0.000000
2023-10-11 00:25:24,258 epoch 7 - iter 140/146 - loss 0.15339801 - time (sec): 85.20 - samples/sec: 504.10 - lr: 0.000051 - momentum: 0.000000
2023-10-11 00:25:27,449 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:27,450 EPOCH 7 done: loss 0.1525 - lr: 0.000051
2023-10-11 00:25:33,160 DEV : loss 0.1568579375743866 - f1-score (micro avg)  0.6026
2023-10-11 00:25:33,170 saving best model
2023-10-11 00:25:39,402 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:48,695 epoch 8 - iter 14/146 - loss 0.14470409 - time (sec): 9.29 - samples/sec: 565.87 - lr: 0.000049 - momentum: 0.000000
2023-10-11 00:25:56,856 epoch 8 - iter 28/146 - loss 0.15466879 - time (sec): 17.45 - samples/sec: 513.07 - lr: 0.000047 - momentum: 0.000000
2023-10-11 00:26:05,090 epoch 8 - iter 42/146 - loss 0.14556756 - time (sec): 25.68 - samples/sec: 500.74 - lr: 0.000046 - momentum: 0.000000
2023-10-11 00:26:13,757 epoch 8 - iter 56/146 - loss 0.14562604 - time (sec): 34.35 - samples/sec: 497.81 - lr: 0.000044 - momentum: 0.000000
2023-10-11 00:26:22,674 epoch 8 - iter 70/146 - loss 0.14744312 - time (sec): 43.27 - samples/sec: 498.29 - lr: 0.000043 - momentum: 0.000000
2023-10-11 00:26:31,283 epoch 8 - iter 84/146 - loss 0.14623450 - time (sec): 51.88 - samples/sec: 486.66 - lr: 0.000041 - momentum: 0.000000
2023-10-11 00:26:40,567 epoch 8 - iter 98/146 - loss 0.13910467 - time (sec): 61.16 - samples/sec: 479.51 - lr: 0.000039 - momentum: 0.000000
2023-10-11 00:26:50,125 epoch 8 - iter 112/146 - loss 0.13334599 - time (sec): 70.72 - samples/sec: 476.83 - lr: 0.000038 - momentum: 0.000000
2023-10-11 00:26:59,926 epoch 8 - iter 126/146 - loss 0.12965286 - time (sec): 80.52 - samples/sec: 473.91 - lr: 0.000036 - momentum: 0.000000
2023-10-11 00:27:09,581 epoch 8 - iter 140/146 - loss 0.12939577 - time (sec): 90.18 - samples/sec: 471.19 - lr: 0.000035 - momentum: 0.000000
2023-10-11 00:27:13,663 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:13,664 EPOCH 8 done: loss 0.1293 - lr: 0.000035
2023-10-11 00:27:20,336 DEV : loss 0.14915454387664795 - f1-score (micro avg)  0.6711
2023-10-11 00:27:20,346 saving best model
2023-10-11 00:27:26,639 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:35,810 epoch 9 - iter 14/146 - loss 0.14480468 - time (sec): 9.17 - samples/sec: 512.79 - lr: 0.000032 - momentum: 0.000000
2023-10-11 00:27:44,972 epoch 9 - iter 28/146 - loss 0.12111733 - time (sec): 18.33 - samples/sec: 508.59 - lr: 0.000031 - momentum: 0.000000
2023-10-11 00:27:53,282 epoch 9 - iter 42/146 - loss 0.11551154 - time (sec): 26.64 - samples/sec: 494.12 - lr: 0.000029 - momentum: 0.000000
2023-10-11 00:28:02,178 epoch 9 - iter 56/146 - loss 0.11450421 - time (sec): 35.54 - samples/sec: 496.91 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:28:11,315 epoch 9 - iter 70/146 - loss 0.11627392 - time (sec): 44.67 - samples/sec: 491.31 - lr: 0.000026 - momentum: 0.000000
2023-10-11 00:28:20,239 epoch 9 - iter 84/146 - loss 0.11633930 - time (sec): 53.60 - samples/sec: 491.19 - lr: 0.000024 - momentum: 0.000000
2023-10-11 00:28:28,971 epoch 9 - iter 98/146 - loss 0.11323542 - time (sec): 62.33 - samples/sec: 488.62 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:28:37,804 epoch 9 - iter 112/146 - loss 0.10890718 - time (sec): 71.16 - samples/sec: 488.50 - lr: 0.000021 - momentum: 0.000000
2023-10-11 00:28:46,797 epoch 9 - iter 126/146 - loss 0.11265525 - time (sec): 80.15 - samples/sec: 487.63 - lr: 0.000020 - momentum: 0.000000
2023-10-11 00:28:55,411 epoch 9 - iter 140/146 - loss 0.11540303 - time (sec): 88.77 - samples/sec: 485.35 - lr: 0.000018 - momentum: 0.000000
2023-10-11 00:28:58,607 ----------------------------------------------------------------------------------------------------
2023-10-11 00:28:58,607 EPOCH 9 done: loss 0.1148 - lr: 0.000018
2023-10-11 00:29:04,628 DEV : loss 0.15014490485191345 - f1-score (micro avg)  0.7097
2023-10-11 00:29:04,638 saving best model
2023-10-11 00:29:10,636 ----------------------------------------------------------------------------------------------------
2023-10-11 00:29:19,540 epoch 10 - iter 14/146 - loss 0.11532110 - time (sec): 8.90 - samples/sec: 515.52 - lr: 0.000016 - momentum: 0.000000
2023-10-11 00:29:28,674 epoch 10 - iter 28/146 - loss 0.11738693 - time (sec): 18.03 - samples/sec: 506.38 - lr: 0.000014 - momentum: 0.000000
2023-10-11 00:29:37,812 epoch 10 - iter 42/146 - loss 0.11927845 - time (sec): 27.17 - samples/sec: 512.69 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:29:47,596 epoch 10 - iter 56/146 - loss 0.11248663 - time (sec): 36.96 - samples/sec: 504.22 - lr: 0.000011 - momentum: 0.000000
2023-10-11 00:29:56,837 epoch 10 - iter 70/146 - loss 0.11378977 - time (sec): 46.20 - samples/sec: 489.68 - lr: 0.000009 - momentum: 0.000000
2023-10-11 00:30:06,496 epoch 10 - iter 84/146 - loss 0.10860430 - time (sec): 55.86 - samples/sec: 483.19 - lr: 0.000008 - momentum: 0.000000
2023-10-11 00:30:15,171 epoch 10 - iter 98/146 - loss 0.10596910 - time (sec): 64.53 - samples/sec: 467.25 - lr: 0.000006 - momentum: 0.000000
2023-10-11 00:30:25,050 epoch 10 - iter 112/146 - loss 0.10933142 - time (sec): 74.41 - samples/sec: 465.68 - lr: 0.000005 - momentum: 0.000000
2023-10-11 00:30:34,564 epoch 10 - iter 126/146 - loss 0.10650302 - time (sec): 83.92 - samples/sec: 460.29 - lr: 0.000003 - momentum: 0.000000
2023-10-11 00:30:44,129 epoch 10 - iter 140/146 - loss 0.10893081 - time (sec): 93.49 - samples/sec: 456.67 - lr: 0.000001 - momentum: 0.000000
2023-10-11 00:30:48,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:30:48,032 EPOCH 10 done: loss 0.1087 - lr: 0.000001
2023-10-11 00:30:53,773 DEV : loss 0.15238186717033386 - f1-score (micro avg)  0.7229
2023-10-11 00:30:53,782 saving best model
2023-10-11 00:31:00,790 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:00,792 Loading model from best epoch ...
2023-10-11 00:31:04,651 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 00:31:16,872 
Results:
- F-score (micro) 0.7015
- F-score (macro) 0.6099
- Accuracy 0.5632

By class:
              precision    recall  f1-score   support

         PER     0.7821    0.8046    0.7932       348
         LOC     0.5766    0.7931    0.6677       261
         ORG     0.2982    0.3269    0.3119        52
   HumanProd     0.7647    0.5909    0.6667        22

   micro avg     0.6536    0.7570    0.7015       683
   macro avg     0.6054    0.6289    0.6099       683
weighted avg     0.6662    0.7570    0.7045       683

2023-10-11 00:31:16,872 ----------------------------------------------------------------------------------------------------
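[Editor's note] As a sanity check on the Results block above, the micro-averaged F-score follows directly from the micro precision and recall in the "By class" table. A worked check (not part of the Flair output):

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# micro avg row: precision 0.6536, recall 0.7570
print(round(f1(0.6536, 0.7570), 4))  # -> 0.7015, matching "F-score (micro) 0.7015"
```

The same identity reproduces each per-class f1-score, e.g. PER with precision 0.7821 and recall 0.8046 gives 0.7932.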