deter3 committed
Commit 15722b2 · verified · 1 Parent(s): 63e0ad8

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
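This pooling config selects CLS-token pooling over 768-dimensional token embeddings. As a rough sketch (not part of the committed files), an equivalent module can be built with the sentence-transformers `models` API:

```python
from sentence_transformers import models

# CLS-token pooling over 768-dimensional token embeddings,
# mirroring the settings in 1_Pooling/config.json above.
pooling = models.Pooling(
    word_embedding_dimension=768,
    pooling_mode="cls",
)
print(pooling.get_pooling_mode_str())  # "cls"
```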
README.md ADDED
@@ -0,0 +1,844 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: The total lease payments for 2023 were initially valued at $1,008
+ million, but after incorporating $43 million for interest, the final amount totaled
+ $1,051 million.
+ sentences:
+ - What percentage of Kenvue's shares did Johnson & Johnson own after the exchange
+ offer on August 23, 2023?
+ - What was the increase in total lease payments from the base amount to the final
+ amount including interest in 2023?
+ - What is the primary use of Global Business Services within Procter & Gamble?
+ - source_sentence: We amortize software costs using the straight-line method over
+ the expected life of the software, generally 3 to 7 years.
+ sentences:
+ - How often does the company issue standby letters of credit, performance or surety
+ bonds, or other guarantees?
+ - What is the amortization method used for software costs and what is their expected
+ useful life range?
+ - How are the translation adjustments of foreign entity operations recorded in financial
+ statements?
+ - source_sentence: In 2023, we continued to invest in our colleagues, building on
+ a wide range of learning and development opportunities and enhancing our competitive
+ benefits in key areas including holistic health and wellness, total compensation
+ and flexibility. We conduct an annual Colleague Experience Survey to better understand
+ our colleagues’ needs and overall experience at American Express.
+ sentences:
+ - How does American Express support employee development and well-being?
+ - By what percentage did admissions revenues increase during the year ended December
+ 31, 2023 compared to the prior year?
+ - What is the maximum amount payable by the Corporation for most credit derivatives,
+ and how is this measured in terms of credit risk management?
+ - source_sentence: Prepaid expenses were $69,167 in 2022 and increased to $97,670
+ in 2023.
+ sentences:
+ - What functional responsibility does Mary E. Adcock have at Kroger?
+ - What is Apple's approach to licenses for intellectual property owned by third
+ parties used in its products and services?
+ - How much did the prepaid expenses increase from 2022 to 2023?
+ - source_sentence: Generated cash flows from operations of $4.5 billion.
+ sentences:
+ - How much did cash flows from operations amount to in 2022?
+ - What was the overall turnover rate at the company in fiscal year 2023?
+ - What are the expectations the company has for its employees in aligning with the
+ Code of Conduct?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: BGE base Financial Matryoshka
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 768
+ type: dim_768
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.7014285714285714
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8242857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8671428571428571
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9071428571428571
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.7014285714285714
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2747619047619047
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.1734285714285714
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09071428571428569
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.7014285714285714
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8242857142857143
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8671428571428571
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9071428571428571
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8052852140611453
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7727052154195015
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7763711302515639
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 512
+ type: dim_512
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.7114285714285714
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8242857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8642857142857143
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9085714285714286
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.7114285714285714
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2747619047619047
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.17285714285714285
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09085714285714284
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.7114285714285714
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8242857142857143
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8642857142857143
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9085714285714286
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8098666238099614
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7784104308390026
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7819743643907353
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 256
+ type: dim_256
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.7014285714285714
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8242857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8557142857142858
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.8914285714285715
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.7014285714285714
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2747619047619047
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.17114285714285712
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.08914285714285713
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.7014285714285714
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8242857142857143
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8557142857142858
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.8914285714285715
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8008524512077413
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7714569160997735
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7758614780389599
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 128
+ type: dim_128
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6828571428571428
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8128571428571428
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8485714285714285
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.8914285714285715
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6828571428571428
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.270952380952381
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.16971428571428568
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.08914285714285713
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6828571428571428
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8128571428571428
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8485714285714285
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.8914285714285715
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.7893688537538128
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.756581632653061
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7607042782514057
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 64
+ type: dim_64
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6614285714285715
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.7957142857142857
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8285714285714286
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.8771428571428571
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6614285714285715
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2652380952380953
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.1657142857142857
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.0877142857142857
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6614285714285715
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.7957142857142857
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8285714285714286
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.8771428571428571
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.7706919427250147
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.736583900226757
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7408800803327711
+ name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+ - json
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("deter3/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+ 'Generated cash flows from operations of $4.5 billion.',
+ 'How much did cash flows from operations amount to in 2022?',
+ 'What was the overall turnover rate at the company in fiscal year 2023?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.7518, 0.2425],
+ # [0.7518, 1.0000, 0.2768],
+ # [0.2425, 0.2768, 1.0000]])
+ ```
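Because the model was trained with MatryoshkaLoss (see the training details further down), its embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest drop in retrieval quality (compare the per-dimension metrics below). A rough sketch of this, not part of the committed README, using the `truncate_dim` argument of `SentenceTransformer`:

```python
from sentence_transformers import SentenceTransformer

# Load the same checkpoint but keep only the first 256 embedding dimensions.
model = SentenceTransformer("deter3/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model.encode([
    "Generated cash flows from operations of $4.5 billion.",
    "How much did cash flows from operations amount to in 2022?",
])
print(embeddings.shape)  # (2, 256)
```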
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+ ```json
+ {
+ "truncate_dim": 768
+ }
+ ```
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.7014 |
+ | cosine_accuracy@3 | 0.8243 |
+ | cosine_accuracy@5 | 0.8671 |
+ | cosine_accuracy@10 | 0.9071 |
+ | cosine_precision@1 | 0.7014 |
+ | cosine_precision@3 | 0.2748 |
+ | cosine_precision@5 | 0.1734 |
+ | cosine_precision@10 | 0.0907 |
+ | cosine_recall@1 | 0.7014 |
+ | cosine_recall@3 | 0.8243 |
+ | cosine_recall@5 | 0.8671 |
+ | cosine_recall@10 | 0.9071 |
+ | **cosine_ndcg@10** | **0.8053** |
+ | cosine_mrr@10 | 0.7727 |
+ | cosine_map@100 | 0.7764 |
+
+ #### Information Retrieval
+
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+ ```json
+ {
+ "truncate_dim": 512
+ }
+ ```
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.7114 |
+ | cosine_accuracy@3 | 0.8243 |
+ | cosine_accuracy@5 | 0.8643 |
+ | cosine_accuracy@10 | 0.9086 |
+ | cosine_precision@1 | 0.7114 |
+ | cosine_precision@3 | 0.2748 |
+ | cosine_precision@5 | 0.1729 |
+ | cosine_precision@10 | 0.0909 |
+ | cosine_recall@1 | 0.7114 |
+ | cosine_recall@3 | 0.8243 |
+ | cosine_recall@5 | 0.8643 |
+ | cosine_recall@10 | 0.9086 |
+ | **cosine_ndcg@10** | **0.8099** |
+ | cosine_mrr@10 | 0.7784 |
+ | cosine_map@100 | 0.782 |
+
+ #### Information Retrieval
+
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+ ```json
+ {
+ "truncate_dim": 256
+ }
+ ```
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.7014 |
+ | cosine_accuracy@3 | 0.8243 |
+ | cosine_accuracy@5 | 0.8557 |
+ | cosine_accuracy@10 | 0.8914 |
+ | cosine_precision@1 | 0.7014 |
+ | cosine_precision@3 | 0.2748 |
+ | cosine_precision@5 | 0.1711 |
+ | cosine_precision@10 | 0.0891 |
+ | cosine_recall@1 | 0.7014 |
+ | cosine_recall@3 | 0.8243 |
+ | cosine_recall@5 | 0.8557 |
+ | cosine_recall@10 | 0.8914 |
+ | **cosine_ndcg@10** | **0.8009** |
+ | cosine_mrr@10 | 0.7715 |
+ | cosine_map@100 | 0.7759 |
+
+ #### Information Retrieval
+
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+ ```json
+ {
+ "truncate_dim": 128
+ }
+ ```
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6829 |
+ | cosine_accuracy@3 | 0.8129 |
+ | cosine_accuracy@5 | 0.8486 |
+ | cosine_accuracy@10 | 0.8914 |
+ | cosine_precision@1 | 0.6829 |
+ | cosine_precision@3 | 0.271 |
+ | cosine_precision@5 | 0.1697 |
+ | cosine_precision@10 | 0.0891 |
+ | cosine_recall@1 | 0.6829 |
+ | cosine_recall@3 | 0.8129 |
+ | cosine_recall@5 | 0.8486 |
+ | cosine_recall@10 | 0.8914 |
+ | **cosine_ndcg@10** | **0.7894** |
+ | cosine_mrr@10 | 0.7566 |
+ | cosine_map@100 | 0.7607 |
+
+ #### Information Retrieval
+
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+ ```json
+ {
+ "truncate_dim": 64
+ }
+ ```
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6614 |
+ | cosine_accuracy@3 | 0.7957 |
+ | cosine_accuracy@5 | 0.8286 |
+ | cosine_accuracy@10 | 0.8771 |
+ | cosine_precision@1 | 0.6614 |
+ | cosine_precision@3 | 0.2652 |
+ | cosine_precision@5 | 0.1657 |
+ | cosine_precision@10 | 0.0877 |
+ | cosine_recall@1 | 0.6614 |
+ | cosine_recall@3 | 0.7957 |
+ | cosine_recall@5 | 0.8286 |
+ | cosine_recall@10 | 0.8771 |
+ | **cosine_ndcg@10** | **0.7707** |
+ | cosine_mrr@10 | 0.7366 |
+ | cosine_map@100 | 0.7409 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | positive | anchor |
+ |:--------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 13 tokens</li><li>mean: 45.95 tokens</li><li>max: 248 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 20.43 tokens</li><li>max: 41 tokens</li></ul> |
+ * Samples:
+ | positive | anchor |
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|
+ | <code>The Company's nominal par value per share was slightly reduced to USD $0.10, reflecting in the share capital, as of December 30, 2023.</code> | <code>What was the nominal par value per share of Garmin Ltd. in U.S. dollars as of December 30, 2023?</code> |
+ | <code>Over the last several years, the number and potential significance of the litigation and investigations involving the company have increased, and there can be no assurance that this trend will not continue.</code> | <code>How has the litigation and investigation landscape changed for the company over recent years?</code> |
+ | <code>As of January 31, 2023, assets located outside the Americas were 15 percent of total assets.</code> | <code>What percentage of Salesforce's total assets were located outside the Americas as of January 31, 2023?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+ "loss": "MultipleNegativesRankingLoss",
+ "matryoshka_dims": [
+ 768,
+ 512,
+ 256,
+ 128,
+ 64
+ ],
+ "matryoshka_weights": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "n_dims_per_step": -1
+ }
+ ```
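The loss configuration above nests `MultipleNegativesRankingLoss` (in-batch negatives) inside `MatryoshkaLoss`, so the ranking objective is applied at every listed truncation width. A minimal sketch of building an equivalent loss object (not part of the committed README; the surrounding training script is an assumption):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Base ranking loss with in-batch negatives, wrapped so it is computed
# at each Matryoshka dimension from the parameters shown above.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
)
```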
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
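The non-default values listed above map onto `SentenceTransformerTrainingArguments`. A rough reconstruction (not part of the committed README; the output directory and all unlisted defaults are assumptions):

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # assumed
    eval_strategy="epoch",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```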
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.8122 | 10 | 1.5429 | - | - | - | - | - |
+ | 0.9746 | 12 | - | 0.7915 | 0.7927 | 0.7820 | 0.7722 | 0.7396 |
+ | 1.6244 | 20 | 0.6772 | - | - | - | - | - |
+ | 1.9492 | 24 | - | 0.8019 | 0.8041 | 0.7971 | 0.7835 | 0.7625 |
+ | 2.4365 | 30 | 0.5496 | - | - | - | - | - |
+ | 2.9239 | 36 | - | 0.8048 | 0.8070 | 0.8007 | 0.7879 | 0.7690 |
+ | 3.2487 | 40 | 0.4528 | - | - | - | - | - |
+ | **3.8985** | **48** | **-** | **0.8053** | **0.8099** | **0.8009** | **0.7894** | **0.7707** |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.41.2
+ - PyTorch: 2.1.2+cu121
+ - Accelerate: 1.8.1
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.41.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "__version__": {
+ "sentence_transformers": "5.0.0",
+ "transformers": "4.41.2",
+ "pytorch": "2.1.2+cu121"
+ },
+ "model_type": "SentenceTransformer",
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:602fa462b6cf004974e1d1d519b38e0c1d6926a95cef0764d5a478dedc312e69
+ size 437951328
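This file is a Git LFS pointer: the actual weights (~438 MB) live in LFS storage and are identified by the SHA-256 digest above. A small sketch (assuming the weights have been downloaded locally as `model.safetensors`) for checking that a download matches the pointer:

```python
import hashlib
from pathlib import Path

path = Path("model.safetensors")  # assumed local download location
print(path.stat().st_size)        # should print 437951328

digest = hashlib.sha256(path.read_bytes()).hexdigest()
print(digest)                     # should match the "oid sha256" value above
```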
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff