rahulseetharaman committed on
Commit
9d724bf
·
verified ·
1 Parent(s): b02b0ac

Add new CrossEncoder model

README.md ADDED
@@ -0,0 +1,498 @@
1
+ ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - sentence-transformers
6
+ - cross-encoder
7
+ - reranker
8
+ - generated_from_trainer
9
+ - dataset_size:90000
10
+ - loss:BinaryCrossEntropyLoss
11
+ base_model: bansalaman18/bert-uncased_L-10_H-256_A-4
12
+ datasets:
13
+ - sentence-transformers/msmarco
14
+ pipeline_tag: text-ranking
15
+ library_name: sentence-transformers
16
+ metrics:
17
+ - map
18
+ - mrr@10
19
+ - ndcg@10
20
+ model-index:
21
+ - name: CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4
22
+ results:
23
+ - task:
24
+ type: cross-encoder-reranking
25
+ name: Cross Encoder Reranking
26
+ dataset:
27
+ name: NanoMSMARCO R100
28
+ type: NanoMSMARCO_R100
29
+ metrics:
30
+ - type: map
31
+ value: 0.0872
32
+ name: Map
33
+ - type: mrr@10
34
+ value: 0.0649
35
+ name: Mrr@10
36
+ - type: ndcg@10
37
+ value: 0.0903
38
+ name: Ndcg@10
39
+ - task:
40
+ type: cross-encoder-reranking
41
+ name: Cross Encoder Reranking
42
+ dataset:
43
+ name: NanoNFCorpus R100
44
+ type: NanoNFCorpus_R100
45
+ metrics:
46
+ - type: map
47
+ value: 0.2815
48
+ name: Map
49
+ - type: mrr@10
50
+ value: 0.4108
51
+ name: Mrr@10
52
+ - type: ndcg@10
53
+ value: 0.2897
54
+ name: Ndcg@10
55
+ - task:
56
+ type: cross-encoder-reranking
57
+ name: Cross Encoder Reranking
58
+ dataset:
59
+ name: NanoNQ R100
60
+ type: NanoNQ_R100
61
+ metrics:
62
+ - type: map
63
+ value: 0.0564
64
+ name: Map
65
+ - type: mrr@10
66
+ value: 0.0317
67
+ name: Mrr@10
68
+ - type: ndcg@10
69
+ value: 0.0532
70
+ name: Ndcg@10
71
+ - task:
72
+ type: cross-encoder-nano-beir
73
+ name: Cross Encoder Nano BEIR
74
+ dataset:
75
+ name: NanoBEIR R100 mean
76
+ type: NanoBEIR_R100_mean
77
+ metrics:
78
+ - type: map
79
+ value: 0.1417
80
+ name: Map
81
+ - type: mrr@10
82
+ value: 0.1692
83
+ name: Mrr@10
84
+ - type: ndcg@10
85
+ value: 0.1444
86
+ name: Ndcg@10
87
+ ---
88
+
89
+ # CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4
90
+
91
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
92
+
93
+ ## Model Details
94
+
95
+ ### Model Description
96
+ - **Model Type:** Cross Encoder
97
+ - **Base model:** [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) <!-- at revision 2c743a1678c7e2a9a2ba9cda4400b08cfa7054fc -->
98
+ - **Maximum Sequence Length:** 512 tokens
99
+ - **Number of Output Labels:** 1 label
100
+ - **Training Dataset:**
101
+ - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
102
+ - **Language:** en
103
+ <!-- - **License:** Unknown -->
104
+
105
+ ### Model Sources
106
+
107
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
108
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
109
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
110
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
111
+
112
+ ## Usage
113
+
114
+ ### Direct Usage (Sentence Transformers)
115
+
116
+ First install the Sentence Transformers library:
117
+
118
+ ```bash
119
+ pip install -U sentence-transformers
120
+ ```
121
+
122
+ Then you can load this model and run inference.
123
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("rahulseetharaman/reranker-bert-uncased_L-10_H-256_A-4-msmarco-bce")
+ # Get scores for pairs of texts
+ pairs = [
+     ['are solar pool covers worth it', 'If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.'],
+     ['how much do Customer Service Agent: Ticketing/Gate make in general', '$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.'],
+     ['what is adverse selection economics', 'The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.'],
+     ['where do newts live', 'Newts can be found living in North America, Europe and Asia. They are not found in Australia or Africa. In fact there are no species of salamander that live in Australia and only a few found in Northern Africa. Seven species of newt live in Europe.'],
+     ['define: rolling hourly average', 'An example of two moving average curves. In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is a type of finite impulse response filter.'],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     'are solar pool covers worth it',
+     [
+         'If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.',
+         '$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.',
+         'The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.',
+         'Newts can be found living in North America, Europe and Asia. They are not found in Australia or Africa. In fact there are no species of salamander that live in Australia and only a few found in Northern Africa. Seven species of newt live in Europe.',
+         'An example of two moving average curves. In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is a type of finite impulse response filter.',
+     ]
+ )
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+ ```
153
+
154
+ <!--
155
+ ### Direct Usage (Transformers)
156
+
157
+ <details><summary>Click to see the direct usage in Transformers</summary>
158
+
159
+ </details>
160
+ -->
161
+
162
+ <!--
163
+ ### Downstream Usage (Sentence Transformers)
164
+
165
+ You can finetune this model on your own dataset.
166
+
167
+ <details><summary>Click to expand</summary>
168
+
169
+ </details>
170
+ -->
171
+
172
+ <!--
173
+ ### Out-of-Scope Use
174
+
175
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
176
+ -->
177
+
178
+ ## Evaluation
179
+
180
+ ### Metrics
181
+
182
+ #### Cross Encoder Reranking
183
+
184
+ * Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
185
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
186
+ ```json
187
+ {
188
+ "at_k": 10,
189
+ "always_rerank_positives": true
190
+ }
191
+ ```
192
+
193
+ | Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
+ |:------------|:---------------------|:---------------------|:---------------------|
+ | map | 0.0872 (-0.4024) | 0.2815 (+0.0205) | 0.0564 (-0.3632) |
+ | mrr@10 | 0.0649 (-0.4126) | 0.4108 (-0.0890) | 0.0317 (-0.3949) |
+ | **ndcg@10** | **0.0903 (-0.4501)** | **0.2897 (-0.0353)** | **0.0532 (-0.4474)** |
198
+
199
+ #### Cross Encoder Nano BEIR
200
+
201
+ * Dataset: `NanoBEIR_R100_mean`
202
+ * Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
203
+ ```json
204
+ {
205
+ "dataset_names": [
206
+ "msmarco",
207
+ "nfcorpus",
208
+ "nq"
209
+ ],
210
+ "rerank_k": 100,
211
+ "at_k": 10,
212
+ "always_rerank_positives": true
213
+ }
214
+ ```
215
+
216
+ | Metric | Value |
+ |:------------|:---------------------|
+ | map | 0.1417 (-0.2484) |
+ | mrr@10 | 0.1692 (-0.2989) |
+ | **ndcg@10** | **0.1444 (-0.3110)** |
221
+
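For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MRR@10 and nDCG@10 are commonly defined for a single query with binary relevance labels. It is not the `CrossEncoderRerankingEvaluator` implementation; the function names and the toy ranking are illustrative assumptions. The last lines check that the NanoBEIR mean for ndcg@10 is, up to rounding, the average of the three per-dataset values reported above.

```python
import math

def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevance, k=10):
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical reranked list: 1 = relevant, 0 = not relevant.
toy_ranking = [0, 0, 1, 0, 1]
toy_mrr = mrr_at_k(toy_ranking)    # first relevant hit is at rank 3
toy_ndcg = ndcg_at_k(toy_ranking)

# Per-dataset ndcg@10 values copied from the per-dataset table.
per_dataset_ndcg = [0.0903, 0.2897, 0.0532]
mean_ndcg = sum(per_dataset_ndcg) / len(per_dataset_ndcg)
```

Both metrics reward placing relevant passages near the top of the list; MRR only looks at the first relevant hit, while nDCG accounts for every relevant passage in the top 10.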
222
+ <!--
223
+ ## Bias, Risks and Limitations
224
+
225
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
226
+ -->
227
+
228
+ <!--
229
+ ### Recommendations
230
+
231
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
232
+ -->
233
+
234
+ ## Training Details
235
+
236
+ ### Training Dataset
237
+
238
+ #### msmarco
239
+
240
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
241
+ * Size: 90,000 training samples
242
+ * Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
243
+ * Approximate statistics based on the first 1000 samples:
244
+ | | query | passage | score |
245
+ |:--------|:-----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
246
+ | type | string | string | float |
247
+ | details | <ul><li>min: 7 characters</li><li>mean: 33.59 characters</li><li>max: 164 characters</li></ul> | <ul><li>min: 49 characters</li><li>mean: 340.88 characters</li><li>max: 1018 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.53</li><li>max: 1.0</li></ul> |
248
+ * Samples:
249
+ | query | passage | score |
250
+ |:---------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
251
+ | <code>fantomcoin current price</code> | <code>The current Average monthly rental price per square meter for a studio property in Pretoria / Tshwane on Gumtree is R 47.</code> | <code>0.0</code> |
252
+ | <code>ddp price definition</code> | <code>Delivered Duty Paid - DDP. Loading the player... What does 'Delivered Duty Paid - DDP' mean. Delivered duty paid (DDP) is a transaction where the seller pays for the total costs associated with transporting goods and is fully responsible for the goods until they are received and transferred to the buyer.</code> | <code>1.0</code> |
253
+ | <code>what is neil diamond's hometown</code> | <code>Oct 6, 2014 8:00 am ET. Brooklyn native Neil Diamond played his first-ever hometown show last week with a 10-song set at Erasmus Hall High School, where he sang in the choir during the two years he was a student there. Speakeasy today premieres a clip of Diamond performing the new song “Something Blue” at that concert.</code> | <code>1.0</code> |
254
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
255
+ ```json
256
+ {
257
+ "activation_fn": "torch.nn.modules.linear.Identity",
258
+ "pos_weight": null
259
+ }
260
+ ```
261
+
262
+ ### Evaluation Dataset
263
+
264
+ #### msmarco
265
+
266
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
267
+ * Size: 10,000 evaluation samples
268
+ * Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
269
+ * Approximate statistics based on the first 1000 samples:
270
+ | | query | passage | score |
271
+ |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
272
+ | type | string | string | float |
273
+ | details | <ul><li>min: 9 characters</li><li>mean: 34.17 characters</li><li>max: 146 characters</li></ul> | <ul><li>min: 83 characters</li><li>mean: 349.58 characters</li><li>max: 974 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.51</li><li>max: 1.0</li></ul> |
274
+ * Samples:
275
+ | query | passage | score |
276
+ |:--------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
277
+ | <code>are solar pool covers worth it</code> | <code>If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.</code> | <code>0.0</code> |
278
+ | <code>how much do Customer Service Agent: Ticketing/Gate make in general</code> | <code>$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.</code> | <code>1.0</code> |
279
+ | <code>what is adverse selection economics</code> | <code>The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.</code> | <code>0.0</code> |
280
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
281
+ ```json
282
+ {
283
+ "activation_fn": "torch.nn.modules.linear.Identity",
284
+ "pos_weight": null
285
+ }
286
+ ```
287
+
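The loss parameters list an `Identity` activation, which suggests the loss is computed directly on the raw logit, while `config.json` in this commit lists a Sigmoid activation for inference. A minimal pure-Python sketch of binary cross-entropy on a logit, written in the numerically stable form, is shown here; it illustrates the general technique and is not the library's code. The helper names and example values are hypothetical.

```python
import math

def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    # Branch on the sign to avoid overflow in exp().
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

def bce_with_logits(logit, label):
    """Stable BCE on a logit: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# A logit of 0 means "undecided": probability 0.5 and loss ln(2).
undecided_probability = sigmoid(0.0)
undecided_loss = bce_with_logits(0.0, 1.0)
```

For reference, ln 2 ≈ 0.6931 is the loss of a predictor that always outputs 0.5, which gives a baseline for reading the loss columns in the training logs.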
288
+ ### Training Hyperparameters
289
+ #### Non-Default Hyperparameters
290
+
291
+ - `eval_strategy`: steps
292
+ - `per_device_train_batch_size`: 16
293
+ - `per_device_eval_batch_size`: 16
294
+ - `learning_rate`: 2e-05
295
+ - `num_train_epochs`: 4
296
+ - `warmup_ratio`: 0.1
297
+ - `seed`: 12
298
+ - `bf16`: True
299
+ - `dataloader_num_workers`: 4
300
+ - `load_best_model_at_end`: True
301
+
302
+ #### All Hyperparameters
303
+ <details><summary>Click to expand</summary>
304
+
305
+ - `overwrite_output_dir`: False
306
+ - `do_predict`: False
307
+ - `eval_strategy`: steps
308
+ - `prediction_loss_only`: True
309
+ - `per_device_train_batch_size`: 16
310
+ - `per_device_eval_batch_size`: 16
311
+ - `per_gpu_train_batch_size`: None
312
+ - `per_gpu_eval_batch_size`: None
313
+ - `gradient_accumulation_steps`: 1
314
+ - `eval_accumulation_steps`: None
315
+ - `torch_empty_cache_steps`: None
316
+ - `learning_rate`: 2e-05
317
+ - `weight_decay`: 0.0
318
+ - `adam_beta1`: 0.9
319
+ - `adam_beta2`: 0.999
320
+ - `adam_epsilon`: 1e-08
321
+ - `max_grad_norm`: 1.0
322
+ - `num_train_epochs`: 4
323
+ - `max_steps`: -1
324
+ - `lr_scheduler_type`: linear
325
+ - `lr_scheduler_kwargs`: {}
326
+ - `warmup_ratio`: 0.1
327
+ - `warmup_steps`: 0
328
+ - `log_level`: passive
329
+ - `log_level_replica`: warning
330
+ - `log_on_each_node`: True
331
+ - `logging_nan_inf_filter`: True
332
+ - `save_safetensors`: True
333
+ - `save_on_each_node`: False
334
+ - `save_only_model`: False
335
+ - `restore_callback_states_from_checkpoint`: False
336
+ - `no_cuda`: False
337
+ - `use_cpu`: False
338
+ - `use_mps_device`: False
339
+ - `seed`: 12
340
+ - `data_seed`: None
341
+ - `jit_mode_eval`: False
342
+ - `use_ipex`: False
343
+ - `bf16`: True
344
+ - `fp16`: False
345
+ - `fp16_opt_level`: O1
346
+ - `half_precision_backend`: auto
347
+ - `bf16_full_eval`: False
348
+ - `fp16_full_eval`: False
349
+ - `tf32`: None
350
+ - `local_rank`: 0
351
+ - `ddp_backend`: None
352
+ - `tpu_num_cores`: None
353
+ - `tpu_metrics_debug`: False
354
+ - `debug`: []
355
+ - `dataloader_drop_last`: False
356
+ - `dataloader_num_workers`: 4
357
+ - `dataloader_prefetch_factor`: None
358
+ - `past_index`: -1
359
+ - `disable_tqdm`: False
360
+ - `remove_unused_columns`: True
361
+ - `label_names`: None
362
+ - `load_best_model_at_end`: True
363
+ - `ignore_data_skip`: False
364
+ - `fsdp`: []
365
+ - `fsdp_min_num_params`: 0
366
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
367
+ - `fsdp_transformer_layer_cls_to_wrap`: None
368
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
369
+ - `deepspeed`: None
370
+ - `label_smoothing_factor`: 0.0
371
+ - `optim`: adamw_torch
372
+ - `optim_args`: None
373
+ - `adafactor`: False
374
+ - `group_by_length`: False
375
+ - `length_column_name`: length
376
+ - `ddp_find_unused_parameters`: None
377
+ - `ddp_bucket_cap_mb`: None
378
+ - `ddp_broadcast_buffers`: False
379
+ - `dataloader_pin_memory`: True
380
+ - `dataloader_persistent_workers`: False
381
+ - `skip_memory_metrics`: True
382
+ - `use_legacy_prediction_loop`: False
383
+ - `push_to_hub`: False
384
+ - `resume_from_checkpoint`: None
385
+ - `hub_model_id`: None
386
+ - `hub_strategy`: every_save
387
+ - `hub_private_repo`: None
388
+ - `hub_always_push`: False
389
+ - `hub_revision`: None
390
+ - `gradient_checkpointing`: False
391
+ - `gradient_checkpointing_kwargs`: None
392
+ - `include_inputs_for_metrics`: False
393
+ - `include_for_metrics`: []
394
+ - `eval_do_concat_batches`: True
395
+ - `fp16_backend`: auto
396
+ - `push_to_hub_model_id`: None
397
+ - `push_to_hub_organization`: None
398
+ - `mp_parameters`:
399
+ - `auto_find_batch_size`: False
400
+ - `full_determinism`: False
401
+ - `torchdynamo`: None
402
+ - `ray_scope`: last
403
+ - `ddp_timeout`: 1800
404
+ - `torch_compile`: False
405
+ - `torch_compile_backend`: None
406
+ - `torch_compile_mode`: None
407
+ - `include_tokens_per_second`: False
408
+ - `include_num_input_tokens_seen`: False
409
+ - `neftune_noise_alpha`: None
410
+ - `optim_target_modules`: None
411
+ - `batch_eval_metrics`: False
412
+ - `eval_on_start`: False
413
+ - `use_liger_kernel`: False
414
+ - `liger_kernel_config`: None
415
+ - `eval_use_gather_object`: False
416
+ - `average_tokens_across_devices`: False
417
+ - `prompts`: None
418
+ - `batch_sampler`: batch_sampler
419
+ - `multi_dataset_batch_sampler`: proportional
420
+ - `router_mapping`: {}
421
+ - `learning_rate_mapping`: {}
422
+
423
+ </details>
424
+
425
+ ### Training Logs
426
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
+ |:----------:|:---------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
+ | -1 | -1 | - | - | 0.0797 (-0.4607) | 0.2817 (-0.0434) | 0.0302 (-0.4704) | 0.1305 (-0.3248) |
+ | 0.0002 | 1 | 0.6362 | - | - | - | - | - |
+ | 0.1778 | 1000 | 0.6946 | 0.7033 | 0.0227 (-0.5178) | 0.2131 (-0.1119) | 0.0285 (-0.4722) | 0.0881 (-0.3673) |
+ | 0.3556 | 2000 | 0.6943 | 0.6900 | 0.0155 (-0.5250) | 0.2458 (-0.0792) | 0.0718 (-0.4289) | 0.1110 (-0.3443) |
+ | 0.5333 | 3000 | 0.6924 | 0.6786 | 0.0399 (-0.5005) | 0.2142 (-0.1109) | 0.0626 (-0.4380) | 0.1056 (-0.3498) |
+ | 0.7111 | 4000 | 0.6821 | 0.6755 | 0.0379 (-0.5025) | 0.2399 (-0.0851) | 0.0682 (-0.4325) | 0.1153 (-0.3400) |
+ | 0.8889 | 5000 | 0.6749 | 0.6678 | 0.0466 (-0.4938) | 0.2542 (-0.0709) | 0.0947 (-0.4060) | 0.1318 (-0.3235) |
+ | 1.0667 | 6000 | 0.6699 | 0.6661 | 0.0536 (-0.4868) | 0.2670 (-0.0581) | 0.0498 (-0.4508) | 0.1235 (-0.3319) |
+ | 1.2444 | 7000 | 0.6576 | 0.6651 | 0.0389 (-0.5016) | 0.2491 (-0.0760) | 0.0450 (-0.4557) | 0.1110 (-0.3444) |
+ | 1.4222 | 8000 | 0.6579 | 0.6891 | 0.0375 (-0.5029) | 0.2852 (-0.0398) | 0.0370 (-0.4637) | 0.1199 (-0.3355) |
+ | 1.6 | 9000 | 0.6459 | 0.6646 | 0.0553 (-0.4851) | 0.2706 (-0.0544) | 0.0461 (-0.4545) | 0.1240 (-0.3314) |
+ | 1.7778 | 10000 | 0.6576 | 0.6592 | 0.0493 (-0.4911) | 0.2633 (-0.0618) | 0.0352 (-0.4654) | 0.1159 (-0.3394) |
+ | 1.9556 | 11000 | 0.6499 | 0.6589 | 0.0631 (-0.4773) | 0.2778 (-0.0472) | 0.0581 (-0.4426) | 0.1330 (-0.3224) |
+ | 2.1333 | 12000 | 0.6289 | 0.6755 | 0.0744 (-0.4660) | 0.2747 (-0.0503) | 0.0386 (-0.4620) | 0.1292 (-0.3261) |
+ | 2.3111 | 13000 | 0.6233 | 0.6888 | 0.0617 (-0.4787) | 0.2963 (-0.0287) | 0.0494 (-0.4513) | 0.1358 (-0.3196) |
+ | 2.4889 | 14000 | 0.6257 | 0.6854 | 0.0788 (-0.4616) | 0.2920 (-0.0331) | 0.0532 (-0.4475) | 0.1413 (-0.3141) |
+ | 2.6667 | 15000 | 0.619 | 0.6705 | 0.0741 (-0.4663) | 0.2863 (-0.0388) | 0.0645 (-0.4361) | 0.1416 (-0.3137) |
+ | 2.8444 | 16000 | 0.6218 | 0.6868 | 0.0750 (-0.4654) | 0.2874 (-0.0377) | 0.0583 (-0.4424) | 0.1402 (-0.3151) |
+ | 3.0222 | 17000 | 0.6191 | 0.6846 | 0.0768 (-0.4637) | 0.2879 (-0.0372) | 0.0393 (-0.4613) | 0.1346 (-0.3207) |
+ | 3.2 | 18000 | 0.5977 | 0.6846 | 0.0883 (-0.4521) | 0.2874 (-0.0376) | 0.0457 (-0.4549) | 0.1405 (-0.3149) |
+ | 3.3778 | 19000 | 0.5947 | 0.6938 | 0.0877 (-0.4528) | 0.2798 (-0.0452) | 0.0615 (-0.4391) | 0.1430 (-0.3124) |
+ | 3.5556 | 20000 | 0.5944 | 0.6860 | 0.0815 (-0.4589) | 0.2856 (-0.0395) | 0.0561 (-0.4446) | 0.1411 (-0.3143) |
+ | **3.7333** | **21000** | **0.5939** | **0.6887** | **0.0903 (-0.4501)** | **0.2897 (-0.0353)** | **0.0532 (-0.4474)** | **0.1444 (-0.3110)** |
+ | 3.9111 | 22000 | 0.5947 | 0.6908 | 0.0876 (-0.4528) | 0.2897 (-0.0353) | 0.0545 (-0.4461) | 0.1440 (-0.3114) |
+ | -1 | -1 | - | - | 0.0903 (-0.4501) | 0.2897 (-0.0353) | 0.0532 (-0.4474) | 0.1444 (-0.3110) |
453
+
454
+ * The bold row denotes the saved checkpoint.
455
+
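The Epoch column in the log follows from the dataset size and batch size listed earlier in the card. A short worked calculation, assuming a single device (the device count is not stated in the card, so that part is an assumption; `gradient_accumulation_steps` is 1 in the hyperparameter list):

```python
import math

train_samples = 90_000   # "Size: 90,000 training samples"
batch_size = 16          # per_device_train_batch_size
num_epochs = 4           # num_train_epochs
warmup_ratio = 0.1       # warmup_ratio

# Assumption: one device, so the effective batch size equals the per-device one.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

first_eval_epoch = epoch_at(1_000)          # first evaluation row in the log
saved_checkpoint_epoch = epoch_at(21_000)   # bold row in the log
```

Under these assumptions there are 5,625 steps per epoch, so step 1000 corresponds to epoch 0.1778 and the bold row at step 21000 to epoch 3.7333, matching the log.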
456
+ ### Framework Versions
457
+ - Python: 3.10.18
458
+ - Sentence Transformers: 5.0.0
459
+ - Transformers: 4.56.0.dev0
460
+ - PyTorch: 2.7.1+cu126
461
+ - Accelerate: 1.9.0
462
+ - Datasets: 4.0.0
463
+ - Tokenizers: 0.21.4
464
+
465
+ ## Citation
466
+
467
+ ### BibTeX
468
+
469
+ #### Sentence Transformers
470
+ ```bibtex
471
+ @inproceedings{reimers-2019-sentence-bert,
472
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
473
+ author = "Reimers, Nils and Gurevych, Iryna",
474
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
475
+ month = "11",
476
+ year = "2019",
477
+ publisher = "Association for Computational Linguistics",
478
+ url = "https://arxiv.org/abs/1908.10084",
479
+ }
480
+ ```
481
+
482
+ <!--
483
+ ## Glossary
484
+
485
+ *Clearly define terms in order to be accessible across audiences.*
486
+ -->
487
+
488
+ <!--
489
+ ## Model Card Authors
490
+
491
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
492
+ -->
493
+
494
+ <!--
495
+ ## Model Card Contact
496
+
497
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
498
+ -->
config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 256,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1024,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 4,
+   "num_hidden_layers": 10,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "5.0.0"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.56.0.dev0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
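As a sanity check on the configuration, the parameter count can be estimated from these fields. The sketch below assumes the standard BERT layout (word, position and token-type embeddings, per-layer self-attention and feed-forward blocks with biases and LayerNorm, a pooler, and a single-logit classification head); the layout is an assumption based on the `BertForSequenceClassification` architecture name, not something read from the weights.

```python
hidden = 256          # hidden_size
layers = 10           # num_hidden_layers
intermediate = 1024   # intermediate_size
vocab = 30522         # vocab_size
max_pos = 512         # max_position_embeddings
type_vocab = 2        # type_vocab_size
num_labels = 1        # one entry in id2label

def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # scale and shift vectors

embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
attention = 4 * linear(hidden, hidden) + layer_norm   # Q, K, V, output projection
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
encoder = layers * (attention + feed_forward)
pooler = linear(hidden, hidden)
classifier = linear(hidden, num_labels)

total_params = embeddings + encoder + pooler + classifier
float32_bytes = total_params * 4   # torch_dtype is float32
```

Under these assumptions the estimate is 15,909,377 parameters, or 63,637,508 bytes in float32. That is within about 20 KB of the 63,656,924-byte `model.safetensors` listed in this commit; the remainder would be consistent with file-format overhead such as a metadata header, though that is not verified here.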
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b48e3902af4b99993437fa0cfa75ff8306333fa96c2e6c64c13d2b4ee04a3dc5
+ size 63656924
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
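The special-token IDs in this file determine how a query and passage pair is laid out for a BERT-style cross-encoder. A minimal sketch of that layout follows, assuming the conventional `[CLS] query [SEP] passage [SEP]` format and truncation of the passage only; the real tokenizer's truncation strategy may differ, and the function name and the wordpiece IDs in the example are hypothetical.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102   # IDs from added_tokens_decoder
MAX_LENGTH = 512                        # model_max_length

def build_pair_input(query_ids, passage_ids, max_length=MAX_LENGTH):
    """Return (input_ids, token_type_ids) for one query/passage pair."""
    # Three slots are reserved for [CLS] and the two [SEP] tokens.
    budget = max_length - 3 - len(query_ids)
    if budget < 0:
        raise ValueError("query alone exceeds the length budget")
    passage_ids = passage_ids[:budget]
    input_ids = [CLS_ID] + query_ids + [SEP_ID] + passage_ids + [SEP_ID]
    # Segment 0 covers [CLS], the query and the first [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    return input_ids, token_type_ids

# Hypothetical wordpiece IDs, for illustration only.
ids, types = build_pair_input([2054, 2003], [1037, 3231, 6251])
```

Because both texts share one input sequence, the 512-token limit applies to the query and passage together, so long passages are the part most likely to be cut.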
vocab.txt ADDED
The diff for this file is too large to render. See raw diff