Add new CrossEncoder model

Files changed (7)

README.md +498 -0
config.json +34 -0
model.safetensors +3 -0
special_tokens_map.json +7 -0
tokenizer.json +0 -0
tokenizer_config.json +58 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,498 @@

+---
+language:
+- en
+tags:
+- sentence-transformers
+- cross-encoder
+- reranker
+- generated_from_trainer
+- dataset_size:90000
+- loss:BinaryCrossEntropyLoss
+base_model: bansalaman18/bert-uncased_L-10_H-256_A-4
+datasets:
+- sentence-transformers/msmarco
+pipeline_tag: text-ranking
+library_name: sentence-transformers
+metrics:
+- map
+- mrr@10
+- ndcg@10
+model-index:
+- name: CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4
+  results:
+  - task:
+      type: cross-encoder-reranking
+      name: Cross Encoder Reranking
+    dataset:
+      name: NanoMSMARCO R100
+      type: NanoMSMARCO_R100
+    metrics:
+    - type: map
+      value: 0.0872
+      name: Map
+    - type: mrr@10
+      value: 0.0649
+      name: Mrr@10
+    - type: ndcg@10
+      value: 0.0903
+      name: Ndcg@10
+  - task:
+      type: cross-encoder-reranking
+      name: Cross Encoder Reranking
+    dataset:
+      name: NanoNFCorpus R100
+      type: NanoNFCorpus_R100
+    metrics:
+    - type: map
+      value: 0.2815
+      name: Map
+    - type: mrr@10
+      value: 0.4108
+      name: Mrr@10
+    - type: ndcg@10
+      value: 0.2897
+      name: Ndcg@10
+  - task:
+      type: cross-encoder-reranking
+      name: Cross Encoder Reranking
+    dataset:
+      name: NanoNQ R100
+      type: NanoNQ_R100
+    metrics:
+    - type: map
+      value: 0.0564
+      name: Map
+    - type: mrr@10
+      value: 0.0317
+      name: Mrr@10
+    - type: ndcg@10
+      value: 0.0532
+      name: Ndcg@10
+  - task:
+      type: cross-encoder-nano-beir
+      name: Cross Encoder Nano BEIR
+    dataset:
+      name: NanoBEIR R100 mean
+      type: NanoBEIR_R100_mean
+    metrics:
+    - type: map
+      value: 0.1417
+      name: Map
+    - type: mrr@10
+      value: 0.1692
+      name: Mrr@10
+    - type: ndcg@10
+      value: 0.1444
+      name: Ndcg@10
+---
+# CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4
+This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+## Model Details
+### Model Description
+- **Model Type:** Cross Encoder
+- **Base model:** [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) <!-- at revision 2c743a1678c7e2a9a2ba9cda4400b08cfa7054fc -->
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Output Labels:** 1 label
+- **Training Dataset:**
+    - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
+- **Language:** en
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import CrossEncoder
+# Download from the 🤗 Hub
+model = CrossEncoder("rahulseetharaman/reranker-bert-uncased_L-10_H-256_A-4-msmarco-bce")
+# Get scores for pairs of texts
+pairs = [
+    ['are solar pool covers worth it', 'If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.'],
+    ['how much do Customer Service Agent: Ticketing/Gate make in general', '$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.'],
+    ['what is adverse selection economics', 'The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.'],
+    ['where do newts live', 'Newts can be found living in North America, Europe and Asia. They are not found in Australia or Africa. In fact there are no species of salamander that live in Australia and only a few found in Northern Africa. Seven species of newt live in Europe.'],
+    ['define: rolling hourly average', 'An example of two moving average curves. In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is a type of finite impulse response filter.'],
+]
+scores = model.predict(pairs)
+print(scores.shape)
+# (5,)
+# Or rank different texts based on similarity to a single text
+ranks = model.rank(
+    'are solar pool covers worth it',
+    [
+        'If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.',
+        '$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.',
+        'The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.',
+        'Newts can be found living in North America, Europe and Asia. They are not found in Australia or Africa. In fact there are no species of salamander that live in Australia and only a few found in Northern Africa. Seven species of newt live in Europe.',
+        'An example of two moving average curves. In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is a type of finite impulse response filter.',
+    ]
+)
+# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Cross Encoder Reranking
+* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
+* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
+  ```json
+  {
+      "at_k": 10,
+      "always_rerank_positives": true
+  }
+  ```
+| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
+|:------------|:---------------------|:---------------------|:---------------------|
+| map         | 0.0872 (-0.4024)     | 0.2815 (+0.0205)     | 0.0564 (-0.3632)     |
+| mrr@10      | 0.0649 (-0.4126)     | 0.4108 (-0.0890)     | 0.0317 (-0.3949)     |
+| **ndcg@10** | **0.0903 (-0.4501)** | **0.2897 (-0.0353)** | **0.0532 (-0.4474)** |
+#### Cross Encoder Nano BEIR
+* Dataset: `NanoBEIR_R100_mean`
+* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
+  ```json
+  {
+      "dataset_names": [
+          "msmarco",
+          "nfcorpus",
+          "nq"
+      ],
+      "rerank_k": 100,
+      "at_k": 10,
+      "always_rerank_positives": true
+  }
+  ```
+| Metric      | Value                |
+|:------------|:---------------------|
+| map         | 0.1417 (-0.2484)     |
+| mrr@10      | 0.1692 (-0.2989)     |
+| **ndcg@10** | **0.1444 (-0.3110)** |
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### msmarco
+* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
+* Size: 90,000 training samples
+* Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | query                                                                                          | passage                                                                                           | score                                                          |
+  |:--------|:-----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+  | type    | string                                                                                         | string                                                                                            | float                                                          |
+  | details | <ul><li>min: 7 characters</li><li>mean: 33.59 characters</li><li>max: 164 characters</li></ul> | <ul><li>min: 49 characters</li><li>mean: 340.88 characters</li><li>max: 1018 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.53</li><li>max: 1.0</li></ul> |
+* Samples:
+  | query                                        | passage                                                                                                                                                                                                                                                                                                                                           | score            |
+  |:---------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>fantomcoin current price</code>        | <code>The current Average monthly rental price per square meter for a studio property in Pretoria / Tshwane on Gumtree is R 47.</code>                                                                                                                                                                                                            | <code>0.0</code> |
+  | <code>ddp price definition</code>            | <code>Delivered Duty Paid - DDP. Loading the player... What does 'Delivered Duty Paid - DDP' mean. Delivered duty paid (DDP) is a transaction where the seller pays for the total costs associated with transporting goods and is fully responsible for the goods until they are received and transferred to the buyer.</code>                    | <code>1.0</code> |
+  | <code>what is neil diamond's hometown</code> | <code>Oct 6, 2014 8:00 am ET. Brooklyn native Neil Diamond played his first-ever hometown show last week with a 10-song set at Erasmus Hall High School, where he sang in the choir during the two years he was a student there. Speakeasy today premieres a clip of Diamond performing the new song “Something Blue” at that concert.</code> | <code>1.0</code> |
+* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+  ```json
+  {
+      "activation_fn": "torch.nn.modules.linear.Identity",
+      "pos_weight": null
+  }
+  ```
+### Evaluation Dataset
+#### msmarco
+* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
+* Size: 10,000 evaluation samples
+* Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | query                                                                                          | passage                                                                                          | score                                                          |
+  |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+  | type    | string                                                                                         | string                                                                                           | float                                                          |
+  | details | <ul><li>min: 9 characters</li><li>mean: 34.17 characters</li><li>max: 146 characters</li></ul> | <ul><li>min: 83 characters</li><li>mean: 349.58 characters</li><li>max: 974 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.51</li><li>max: 1.0</li></ul> |
+* Samples:
+  | query                                                                           | passage                                                                                                                                                                                                                                                                                                                                                                                                                                           | score            |
+  |:--------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>are solar pool covers worth it</code>                                     | <code>If you are using Onga pool pumps or Hurlcon pool pumps, then you need not worry about them getting overheated for they are one of the best pool pumps available on the market. If you want to know about What causes a pool pump to overheat so please visit here onga pool pumps.</code>                                                                                                                                                   | <code>0.0</code> |
+  | <code>how much do Customer Service Agent: Ticketing/Gate make in general</code> | <code>$41,000. Average Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.verage Airport Customer Service Ticketing Gate Agent salaries for job postings in Houston, TX are 13% higher than average Airport Customer Service Ticketing Gate Agent salaries for job postings nationwide.</code> | <code>1.0</code> |
+  | <code>what is adverse selection economics</code>                                | <code>The last first woman to win the Nobel in her category was Elinor Ostrom, who shared the 2009 economics prize for her groundbreaking analysis of common property. The wait was so long for a woman economics laureate in part because that prize wasn’t established until 1969.</code>                                                                                                                                                     | <code>0.0</code> |
+* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+  ```json
+  {
+      "activation_fn": "torch.nn.modules.linear.Identity",
+      "pos_weight": null
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 4
+- `warmup_ratio`: 0.1
+- `seed`: 12
+- `bf16`: True
+- `dataloader_num_workers`: 4
+- `load_best_model_at_end`: True
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 4
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 12
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 4
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: True
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+| Epoch      | Step      | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
+|:----------:|:---------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
+| -1         | -1        | -             | -               | 0.0797 (-0.4607)         | 0.2817 (-0.0434)          | 0.0302 (-0.4704)     | 0.1305 (-0.3248)           |
+| 0.0002     | 1         | 0.6362        | -               | -                        | -                         | -                    | -                          |
+| 0.1778     | 1000      | 0.6946        | 0.7033          | 0.0227 (-0.5178)         | 0.2131 (-0.1119)          | 0.0285 (-0.4722)     | 0.0881 (-0.3673)           |
+| 0.3556     | 2000      | 0.6943        | 0.6900          | 0.0155 (-0.5250)         | 0.2458 (-0.0792)          | 0.0718 (-0.4289)     | 0.1110 (-0.3443)           |
+| 0.5333     | 3000      | 0.6924        | 0.6786          | 0.0399 (-0.5005)         | 0.2142 (-0.1109)          | 0.0626 (-0.4380)     | 0.1056 (-0.3498)           |
+| 0.7111     | 4000      | 0.6821        | 0.6755          | 0.0379 (-0.5025)         | 0.2399 (-0.0851)          | 0.0682 (-0.4325)     | 0.1153 (-0.3400)           |
+| 0.8889     | 5000      | 0.6749        | 0.6678          | 0.0466 (-0.4938)         | 0.2542 (-0.0709)          | 0.0947 (-0.4060)     | 0.1318 (-0.3235)           |
+| 1.0667     | 6000      | 0.6699        | 0.6661          | 0.0536 (-0.4868)         | 0.2670 (-0.0581)          | 0.0498 (-0.4508)     | 0.1235 (-0.3319)           |
+| 1.2444     | 7000      | 0.6576        | 0.6651          | 0.0389 (-0.5016)         | 0.2491 (-0.0760)          | 0.0450 (-0.4557)     | 0.1110 (-0.3444)           |
+| 1.4222     | 8000      | 0.6579        | 0.6891          | 0.0375 (-0.5029)         | 0.2852 (-0.0398)          | 0.0370 (-0.4637)     | 0.1199 (-0.3355)           |
+| 1.6        | 9000      | 0.6459        | 0.6646          | 0.0553 (-0.4851)         | 0.2706 (-0.0544)          | 0.0461 (-0.4545)     | 0.1240 (-0.3314)           |
+| 1.7778     | 10000     | 0.6576        | 0.6592          | 0.0493 (-0.4911)         | 0.2633 (-0.0618)          | 0.0352 (-0.4654)     | 0.1159 (-0.3394)           |
+| 1.9556     | 11000     | 0.6499        | 0.6589          | 0.0631 (-0.4773)         | 0.2778 (-0.0472)          | 0.0581 (-0.4426)     | 0.1330 (-0.3224)           |
+| 2.1333     | 12000     | 0.6289        | 0.6755          | 0.0744 (-0.4660)         | 0.2747 (-0.0503)          | 0.0386 (-0.4620)     | 0.1292 (-0.3261)           |
+| 2.3111     | 13000     | 0.6233        | 0.6888          | 0.0617 (-0.4787)         | 0.2963 (-0.0287)          | 0.0494 (-0.4513)     | 0.1358 (-0.3196)           |
+| 2.4889     | 14000     | 0.6257        | 0.6854          | 0.0788 (-0.4616)         | 0.2920 (-0.0331)          | 0.0532 (-0.4475)     | 0.1413 (-0.3141)           |
+| 2.6667     | 15000     | 0.619         | 0.6705          | 0.0741 (-0.4663)         | 0.2863 (-0.0388)          | 0.0645 (-0.4361)     | 0.1416 (-0.3137)           |
+| 2.8444     | 16000     | 0.6218        | 0.6868          | 0.0750 (-0.4654)         | 0.2874 (-0.0377)          | 0.0583 (-0.4424)     | 0.1402 (-0.3151)           |
+| 3.0222     | 17000     | 0.6191        | 0.6846          | 0.0768 (-0.4637)         | 0.2879 (-0.0372)          | 0.0393 (-0.4613)     | 0.1346 (-0.3207)           |
+| 3.2        | 18000     | 0.5977        | 0.6846          | 0.0883 (-0.4521)         | 0.2874 (-0.0376)          | 0.0457 (-0.4549)     | 0.1405 (-0.3149)           |
+| 3.3778     | 19000     | 0.5947        | 0.6938          | 0.0877 (-0.4528)         | 0.2798 (-0.0452)          | 0.0615 (-0.4391)     | 0.1430 (-0.3124)           |
+| 3.5556     | 20000     | 0.5944        | 0.6860          | 0.0815 (-0.4589)         | 0.2856 (-0.0395)          | 0.0561 (-0.4446)     | 0.1411 (-0.3143)           |
+| **3.7333** | **21000** | **0.5939**    | **0.6887**      | **0.0903 (-0.4501)**     | **0.2897 (-0.0353)**      | **0.0532 (-0.4474)** | **0.1444 (-0.3110)**       |
+| 3.9111     | 22000     | 0.5947        | 0.6908          | 0.0876 (-0.4528)         | 0.2897 (-0.0353)          | 0.0545 (-0.4461)     | 0.1440 (-0.3114)           |
+| -1         | -1        | -             | -               | 0.0903 (-0.4501)         | 0.2897 (-0.0353)          | 0.0532 (-0.4474)     | 0.1444 (-0.3110)           |
+* The bold row denotes the saved checkpoint.
+### Framework Versions
+- Python: 3.10.18
+- Sentence Transformers: 5.0.0
+- Transformers: 4.56.0.dev0
+- PyTorch: 2.7.1+cu126
+- Accelerate: 1.9.0
+- Datasets: 4.0.0
+- Tokenizers: 0.21.4
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
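
The Epoch and Step columns of the Training Logs table in README.md can be cross-checked against the listed hyperparameters. The following is a minimal sketch of that arithmetic, not the trainer's own code. It assumes single-device training (the device count is not stated in the model card) and approximates warmup as `warmup_ratio` times the total step count; `epoch_at` is a hypothetical helper introduced only for illustration.

```python
import math

train_samples = 90_000   # "Size: 90,000 training samples"
batch_size = 16          # per_device_train_batch_size
grad_accum = 1           # gradient_accumulation_steps
num_devices = 1          # assumption: the model card does not state the device count
num_epochs = 4           # num_train_epochs
warmup_ratio = 0.1       # warmup_ratio

# Optimizer steps needed to see every training sample once.
steps_per_epoch = math.ceil(train_samples / (batch_size * grad_accum * num_devices))
total_steps = steps_per_epoch * num_epochs
# Approximation of the warmup length implied by the ratio.
warmup_steps = round(total_steps * warmup_ratio)


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps (hypothetical helper)."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps, warmup_steps)
print(epoch_at(1000), epoch_at(21000), epoch_at(22000))
```

Under these assumptions one epoch is 5,625 steps, so step 1000 falls at epoch 0.1778 and the bold checkpoint at step 21000 falls at epoch 3.7333, which agrees with the logged values.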

config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 256,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1024,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 4,
+  "num_hidden_layers": 10,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "sentence_transformers": {
+    "activation_fn": "torch.nn.modules.activation.Sigmoid",
+    "version": "5.0.0"
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.56.0.dev0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
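
The values in config.json are enough to estimate the parameter count and compare it with the size recorded for model.safetensors in this commit (63,656,924 bytes). This is a rough sketch that assumes the standard BERT layout (embeddings, encoder layers, pooler) plus a single-output linear classifier, with weights stored as 4-byte floats per `torch_dtype`. The helper `linear` is hypothetical and only counts weights plus biases.

```python
hidden = 256          # hidden_size
layers = 10           # num_hidden_layers
intermediate = 1024   # intermediate_size
vocab = 30522         # vocab_size
max_pos = 512         # max_position_embeddings
type_vocab = 2        # type_vocab_size
num_labels = 1        # one entry in id2label


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: weight matrix plus bias (hypothetical helper)."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

# Word, position, and token-type embeddings followed by a LayerNorm.
embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
# Query, key, value, and output projections followed by a LayerNorm.
attention = 4 * linear(hidden, hidden) + layer_norm
# Two dense layers followed by a LayerNorm.
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
encoder = layers * (attention + feed_forward)
pooler = linear(hidden, hidden)
classifier = linear(hidden, num_labels)

total_params = embeddings + encoder + pooler + classifier
weight_bytes = total_params * 4  # float32

print(f"{total_params:,} parameters, about {weight_bytes:,} bytes of weights")
```

The estimate comes to 15,909,377 parameters, or 63,637,508 bytes, which is within about 19 KB of the recorded file size. The remainder is plausibly the safetensors header, although that is not verified here.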

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b48e3902af4b99993437fa0cfa75ff8306333fa96c2e6c64c13d2b4ee04a3dc5
+size 63656924
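
The three lines added for model.safetensors are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 object id, and the byte size of the real file. A downloaded copy could be checked against the pointer roughly as follows. Both function names are hypothetical, and the sketch reads the whole file into memory for brevity, where a streaming hash would suit a larger file better.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its key/value fields (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b48e3902af4b99993437fa0cfa75ff8306333fa96c2e6c64c13d2b4ee04a3dc5\n"
    "size 63656924\n"
)
print(pointer["algo"], pointer["size"])
```

To use it, read the downloaded file with `open(path, "rb").read()` and pass the bytes to `matches_pointer` together with the parsed pointer.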

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
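
The special-token ids in tokenizer_config.json determine how a (query, passage) pair is laid out for a BERT-style cross-encoder: `[CLS] query [SEP] passage [SEP]`, with token type 0 for the first segment and 1 for the second. The sketch below illustrates that layout in plain Python. The word ids are made-up placeholders, not real WordPiece output, `build_pair_inputs` is a hypothetical helper, and truncating only the passage is a simplification of the strategies a real tokenizer offers.

```python
CLS, SEP = 101, 102  # ids of [CLS] and [SEP] in added_tokens_decoder


def build_pair_inputs(query_ids, passage_ids, max_length=512):
    """Assemble input_ids and token_type_ids for one pair (hypothetical helper)."""
    # Reserve room for [CLS] and the two [SEP] tokens.
    budget = max_length - len(query_ids) - 3
    if budget < 0:
        raise ValueError("query alone exceeds max_length")
    # Simplification: only the passage is truncated to fit.
    passage_ids = passage_ids[:budget]
    input_ids = [CLS] + query_ids + [SEP] + passage_ids + [SEP]
    # Segment 0 covers [CLS], the query, and the first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    return input_ids, token_type_ids


# Placeholder ids standing in for tokenized words.
ids, types = build_pair_inputs([2001, 2002], [3001, 3002, 3003])
print(ids)
print(types)
```

The default `max_length` of 512 mirrors `model_max_length` in the tokenizer configuration and `max_position_embeddings` in config.json.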

vocab.txt ADDED

The diff for this file is too large to render.