SentenceTransformer based on dunzhang/stella_en_400M_v5
This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5 on the csv dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
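Once the library is installed and the model loaded (see the usage section below), this module stack can be inspected directly; a small sketch, assuming the checkpoint loads as shown later:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jebish7/stella-MNSR-2")
# Module (1) mean-pools token embeddings; module (2) is a 1024 -> 1024 Dense head.
print(model[1].pooling_mode_mean_tokens)         # True
print(model.get_sentence_embedding_dimension())  # 1024
print(model.max_seq_length)                      # 512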
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/stella-MNSR-2")
# Run inference
sentences = [
"Could you provide examples of circumstances that, when changed, would necessitate the reevaluation of a customer's risk assessment and the application of updated CDD measures?",
'DocumentID: 1 | PassageID: 8.1.2.(1) | Passage: A Relevant Person must also apply CDD measures to each existing customer under Rules \u200e8.3.1, \u200e8.4.1 or \u200e8.5.1 as applicable:\n(a)\twith a frequency appropriate to the outcome of the risk-based approach taken in relation to each customer; and\n(b)\twhen the Relevant Person becomes aware that any circumstances relevant to its risk assessment for a customer have changed.',
"DocumentID: 13 | PassageID: 9.2.1.Guidance.1. | Passage: The Regulator expects that an Authorised Person's Liquidity Risk strategy will set out the approach that the Authorised Person will take to Liquidity Risk management, including various quantitative and qualitative targets. It should be communicated to all relevant functions and staff within the organisation and be set out in the Authorised Person's Liquidity Risk policy.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
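The same embeddings support the semantic-search use case mentioned above; a minimal sketch that reuses the sentences list from the snippet to rank the two passages against the query:

# Semantic search: score each passage against the query and pick the best one
query_emb = model.encode(sentences[:1])      # shape [1, 1024]
passage_embs = model.encode(sentences[1:])   # shape [2, 1024]
scores = model.similarity(query_emb, passage_embs)  # shape [1, 2]
best = int(scores.argmax())
print(sentences[1:][best][:100])  # highest-scoring passage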
The csv training set has two columns, anchor and positive:

| | anchor | positive |
|---|---|---|
| type | string | string |
Samples:

| anchor | positive |
|---|---|
| Could you outline the expected procedures for a Trade Repository to notify relevant authorities of any significant errors or omissions in previously submitted data? | DocumentID: 7 |
| In the context of a non-binding MPO, how are commodities held by an Authorised Person treated for the purpose of determining the Commodities Risk Capital Requirement? | DocumentID: 9 |
| Can the FSRA provide case studies or examples of best practices for RIEs operating MTFs or OTFs using spot commodities in line with the Spot Commodities Framework? | DocumentID: 34 |
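The card names only a generic csv dataset; a hypothetical loading sketch (the file name train.csv is an assumption, the column names come from the table above):

from datasets import load_dataset

# Hypothetical path: the card does not publish the actual CSV file.
train_dataset = load_dataset("csv", data_files="train.csv", split="train")
print(train_dataset.column_names)  # expected: ['anchor', 'positive']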
Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
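This corresponds to the library's built-in loss class; a minimal construction sketch (the trust_remote_code flag is an assumption, since the stella base model ships custom modeling code):

from sentence_transformers import SentenceTransformer, losses, util

base_model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)
# Symmetric MNRL: in-batch negatives are scored in both directions
# (anchor -> positive and positive -> anchor), with scaled cosine similarity.
loss = losses.MultipleNegativesSymmetricRankingLoss(
    base_model, scale=20.0, similarity_fct=util.cos_sim
)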
Non-default hyperparameters:
- per_device_train_batch_size: 16
- num_train_epochs: 1
- warmup_ratio: 0.1
- batch_sampler: no_duplicates

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0541 | 100 | 0.4442 |
| 0.1083 | 200 | 0.4793 |
| 0.1624 | 300 | 0.4395 |
| 0.2166 | 400 | 0.4783 |
| 0.2707 | 500 | 0.4573 |
| 0.3249 | 600 | 0.4235 |
| 0.3790 | 700 | 0.4029 |
| 0.4331 | 800 | 0.3951 |
| 0.4873 | 900 | 0.438 |
| 0.5414 | 1000 | 0.364 |
| 0.5956 | 1100 | 0.3732 |
| 0.6497 | 1200 | 0.3932 |
| 0.7038 | 1300 | 0.3387 |
| 0.7580 | 1400 | 0.2956 |
| 0.8121 | 1500 | 0.3612 |
| 0.8663 | 1600 | 0.3333 |
| 0.9204 | 1700 | 0.2837 |
| 0.9746 | 1800 | 0.2785 |
| 0.0541 | 100 | 0.2263 |
| 0.1083 | 200 | 0.2085 |
| 0.1624 | 300 | 0.1638 |
| 0.2166 | 400 | 0.2085 |
| 0.2707 | 500 | 0.2442 |
| 0.3249 | 600 | 0.1965 |
| 0.3790 | 700 | 0.2548 |
| 0.4331 | 800 | 0.2504 |
| 0.4873 | 900 | 0.2358 |
| 0.5414 | 1000 | 0.2083 |
| 0.5956 | 1100 | 0.2117 |
| 0.6497 | 1200 | 0.248 |
| 0.7038 | 1300 | 0.221 |
| 0.7580 | 1400 | 0.1886 |
| 0.8121 | 1500 | 0.2653 |
| 0.8663 | 1600 | 0.2651 |
| 0.9204 | 1700 | 0.2349 |
| 0.9746 | 1800 | 0.2435 |
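A run with the non-default hyperparameters above can be reproduced roughly as follows; a hedged sketch using the SentenceTransformerTrainer API (the dataset path and the base-model loading flag are assumptions):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)
train_dataset = load_dataset("csv", data_files="train.csv", split="train")  # hypothetical path
loss = losses.MultipleNegativesSymmetricRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="stella-MNSR-2",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    warmup_ratio=0.1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate samples within a batch
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()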
BibTeX:

@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: NovaSearch/stella_en_400M_v5