SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
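
The Pooling module above averages BERT's token embeddings into a single 768-dimensional sentence vector (pooling_mode_mean_tokens). Below is a minimal sketch of that mean-pooling step written against the plain transformers API, using the base checkpoint purely for illustration; SentenceTransformer.encode() performs all of this internally for the fine-tuned model.

import torch
from transformers import AutoModel, AutoTokenizer

# Illustration only: reimplements the mean-pooling step performed by the
# Pooling module above, using the base checkpoint rather than this model.
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
bert = AutoModel.from_pretrained("google-bert/bert-base-cased")

encoded = tokenizer(["example sentence"], padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Average over non-padding tokens only (pooling_mode_mean_tokens=True)
mask = encoded["attention_mask"].unsqueeze(-1).float()    # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])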

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_800_8epoch_wd_1.00")
# Run inference
sentences = [
    'O=C(NC(Cc1ccccc1)C(=O)NO)OCc1cc(=O)c(O)co1',
    'Cc1ccc(C(C)C)c(OC(=O)/C=C/c2ccc(O)cc2)c1',
    'COCCOc1ccc(C=C2C(=O)NC(=S)NC2=O)cc1',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6172, -0.5274],
#         [ 0.6172,  1.0000,  0.2429],
#         [-0.5274,  0.2429,  1.0000]])
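
The same embeddings can also drive semantic search over a larger corpus. Below is a minimal sketch using the util.semantic_search helper from Sentence Transformers; the corpus and query SMILES are placeholders reused from the example above.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_800_8epoch_wd_1.00")

# Placeholder corpus and query; substitute your own SMILES strings.
corpus = [
    'O=C(NC(Cc1ccccc1)C(=O)NO)OCc1cc(=O)c(O)co1',
    'Cc1ccc(C(C)C)c(OC(=O)/C=C/c2ccc(O)cc2)c1',
    'COCCOc1ccc(C=C2C(=O)NC(=S)NC2=O)cc1',
]
query = ['COCCOc1ccc(C=C2C(=O)NC(=S)NC2=O)cc1']

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(query, convert_to_tensor=True)

# For each query, return the top_k corpus entries ranked by cosine similarity
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit['corpus_id']], hit['score'])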

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 120,059 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 11 tokens, mean: 38.92 tokens, max: 106 tokens
    • hypothesis: string; min: 11 tokens, mean: 40.28 tokens, max: 106 tokens
    • label: int; 0: ~48.30%, 2: ~51.70%
  • Samples (premise | hypothesis | label):
    • Cc1cc(OCc2cn(Cc3ccc(F)cc3)nn2)c(C=C2C(=O)NC(=S)NC2=O)cc1Br | Nn1c(SCc2cc(=O)c(O)co2)nnc1-c1ccc(Cl)cc1 | 2
    • Cl.NC(Cc1ccc(=O)n(O)c1)C(=O)O | Oc1cc(O)cc(CC(c2ccc(O)cc2O)c2c(O)cc(/C=C/c3ccc(O)cc3O)cc2O)c1 | 2
    • Nc1ccc(C(=O)N2CCN(Cc3ccc(F)cc3)CC2)cc1 | NC(=O)C@HNC(=O)OCc1cc(=O)c(O)co1 | 2
  • Loss: SoftmaxLoss
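
Below is a minimal sketch of how a dataset with these premise, hypothesis, and label columns is typically paired with SoftmaxLoss when training with Sentence Transformers. The CSV path and num_labels value are assumptions; the card does not include the original training script.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("google-bert/bert-base-cased")

# Placeholder path: the card only identifies the data as a "csv" dataset
# with premise, hypothesis, and label columns.
train_dataset = load_dataset("csv", data_files="train.csv", split="train")

loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # assumption: integer class labels, with 0 and 2 observed above
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()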

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 21,187 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 11 tokens, mean: 39.53 tokens, max: 106 tokens
    • hypothesis: string; min: 11 tokens, mean: 39.69 tokens, max: 106 tokens
    • label: int; 0: ~51.60%, 2: ~48.40%
  • Samples (premise | hypothesis | label):
    • COc1ccc(/C=C/C(=O)NCCc2c[nH]c3ccc(O)cc23)cc1O | O=C(NCc1ccc(O)cc1O)c1cc(O)c(O)c(O)c1 | 0
    • COc1ccc(/C=C/C(=O)NCCc2c[nH]c3ccc(O)cc23)cc1O | CC(C)CC@HC(=O)OCc1cc(=O)c(O)c[nH]1 | 2
    • COc1c(O)cc2c(c1O)[C@@H]1OC@HC@@HC@H[C@H]1OC2=O | COc1cc(OC)c(C2CCN(C)C2CO)c(O)c1-c1cc(-c2ccc(N+[O-])cc2Cl)[nH]n1 | 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • weight_decay: 1.0
  • num_train_epochs: 8
  • warmup_steps: 100
  • fp16: True
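
A minimal sketch of how these non-default values would be passed to SentenceTransformerTrainingArguments; the output directory is a placeholder, and every unlisted option keeps the defaults given in the full list below.

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",              # placeholder
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=1.0,
    num_train_epochs=8,
    warmup_steps=100,
    fp16=True,
)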

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 1.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 8
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0533 100 0.7573
0.1066 200 0.6717
0.1599 300 0.6325
0.2132 400 0.6192
0.2665 500 0.6102
0.3198 600 0.5817
0.3731 700 0.5813
0.4264 800 0.5718
0.4797 900 0.5619
0.5330 1000 0.5711
0.5864 1100 0.5602
0.6397 1200 0.5601
0.6930 1300 0.5498
0.7463 1400 0.5549
0.7996 1500 0.5677
0.8529 1600 0.5444
0.9062 1700 0.5491
0.9595 1800 0.5353
1.0128 1900 0.5528
1.0661 2000 0.5414
1.1194 2100 0.5451
1.1727 2200 0.5392
1.2260 2300 0.5343
1.2793 2400 0.5384
1.3326 2500 0.5289
1.3859 2600 0.5273
1.4392 2700 0.5279
1.4925 2800 0.5278
1.5458 2900 0.536
1.5991 3000 0.5286
1.6525 3100 0.5301
1.7058 3200 0.5302
1.7591 3300 0.5298
1.8124 3400 0.5231
1.8657 3500 0.5258
1.9190 3600 0.5212
1.9723 3700 0.5363
2.0256 3800 0.5207
2.0789 3900 0.5112
2.1322 4000 0.5237
2.1855 4100 0.5218
2.2388 4200 0.5182
2.2921 4300 0.5187
2.3454 4400 0.5213
2.3987 4500 0.5122
2.4520 4600 0.5128
2.5053 4700 0.5115
2.5586 4800 0.5271
2.6119 4900 0.5152
2.6652 5000 0.5131
2.7186 5100 0.5097
2.7719 5200 0.517
2.8252 5300 0.5157
2.8785 5400 0.5121
2.9318 5500 0.525
2.9851 5600 0.511
3.0384 5700 0.516
3.0917 5800 0.5191
3.1450 5900 0.5078
3.1983 6000 0.5108
3.2516 6100 0.5121
3.3049 6200 0.5031
3.3582 6300 0.5149
3.4115 6400 0.5041
3.4648 6500 0.5129
3.5181 6600 0.5057
3.5714 6700 0.508
3.6247 6800 0.5085
3.6780 6900 0.5133
3.7313 7000 0.509
3.7846 7100 0.5093
3.8380 7200 0.503
3.8913 7300 0.5017
3.9446 7400 0.5119
3.9979 7500 0.5073
4.0512 7600 0.5107
4.1045 7700 0.5104
4.1578 7800 0.5067
4.2111 7900 0.5164
4.2644 8000 0.4986
4.3177 8100 0.5128
4.3710 8200 0.506
4.4243 8300 0.5094
4.4776 8400 0.5018
4.5309 8500 0.5026
4.5842 8600 0.5069
4.6375 8700 0.5033
4.6908 8800 0.5054
4.7441 8900 0.4993
4.7974 9000 0.501
4.8507 9100 0.4963
4.9041 9200 0.4999
4.9574 9300 0.5126
5.0107 9400 0.498
5.0640 9500 0.5026
5.1173 9600 0.5131
5.1706 9700 0.5013
5.2239 9800 0.4994
5.2772 9900 0.4954
5.3305 10000 0.4972
5.3838 10100 0.5021
5.4371 10200 0.5024
5.4904 10300 0.5019
5.5437 10400 0.5021
5.5970 10500 0.5022
5.6503 10600 0.5009
5.7036 10700 0.5034
5.7569 10800 0.4943
5.8102 10900 0.5131
5.8635 11000 0.4962
5.9168 11100 0.4991
5.9701 11200 0.4976
6.0235 11300 0.5086
6.0768 11400 0.5012
6.1301 11500 0.5033
6.1834 11600 0.5006
6.2367 11700 0.5029
6.2900 11800 0.4883
6.3433 11900 0.5016
6.3966 12000 0.4932
6.4499 12100 0.4971
6.5032 12200 0.5025
6.5565 12300 0.5023
6.6098 12400 0.4985
6.6631 12500 0.4998
6.7164 12600 0.4997
6.7697 12700 0.498
6.8230 12800 0.5077
6.8763 12900 0.4966
6.9296 13000 0.4992
6.9829 13100 0.4906
7.0362 13200 0.5024
7.0896 13300 0.4981
7.1429 13400 0.4913
7.1962 13500 0.5022
7.2495 13600 0.4935
7.3028 13700 0.4994
7.3561 13800 0.4915
7.4094 13900 0.4973
7.4627 14000 0.4985
7.5160 14100 0.4958
7.5693 14200 0.5001
7.6226 14300 0.4955
7.6759 14400 0.4946
7.7292 14500 0.4889
7.7825 14600 0.494
7.8358 14700 0.5026
7.8891 14800 0.5017
7.9424 14900 0.4965
7.9957 15000 0.5036

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
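
To approximate this environment, the listed versions can be pinned at install time (a sketch; adjust the PyTorch build to your CUDA setup, since the +cu126 wheel comes from the PyTorch index rather than PyPI):

pip install sentence-transformers==5.1.2 transformers==4.57.1 accelerate==1.10.1 datasets==4.0.0 tokenizers==0.22.1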

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}