# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
### Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
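The Pooling and Normalize modules above can be sketched in plain Python: module (1), with `pooling_mode_mean_tokens=True`, averages the token embeddings over the non-padding positions, and module (2) rescales the result to unit L2 norm. This is a minimal sketch using made-up 4-dimensional token vectors; the real model works on 384 dimensions.

```python
import math

# Hypothetical token embeddings for one 3-token sentence (4 dims for brevity;
# the real model produces 384-dim vectors per token).
token_embeddings = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 0.0, 1.0, 1.0],
    [2.0, 1.0, 2.0, 0.0],
]
attention_mask = [1, 1, 1]  # all tokens are real (no padding)

# (1) Pooling with pooling_mode_mean_tokens=True: average the token vectors,
# counting only positions where the attention mask is 1.
n = sum(attention_mask)
pooled = [
    sum(tok[d] for tok, m in zip(token_embeddings, attention_mask) if m) / n
    for d in range(len(token_embeddings[0]))
]

# (2) Normalize(): scale the pooled vector to unit L2 norm.
norm = math.sqrt(sum(x * x for x in pooled))
embedding = [x / norm for x in pooled]

print(pooled)                                    # [2.0, 1.0, 1.0, 0.6666666666666666]
print(round(sum(x * x for x in embedding), 6))   # 1.0 — unit length after Normalize()
```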
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zihoo/all-MiniLM-L6-v2-WMNLI-10epoch")
# Run inference
sentences = [
    'I accept my mistakes as part of my learning process.',
    'I fully concentrate on client communications.',
    'I remain conscious of my work-life balance.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
## Training Details

### Training Dataset

#### Unnamed Dataset
- Size: 8,000 training samples
- Columns: `sentence1`, `sentence2`, and `label`
- Approximate statistics based on the first 1000 samples:

|  | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 11.65 tokens, max: 17 tokens | min: 8 tokens, mean: 11.77 tokens, max: 17 tokens | 0: ~25.80%, 1: ~36.80%, 2: ~37.40% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| I focus on one work task at a time. | I keep my attention on the task despite office chatter. | 0 |
| I worry they might spread false rumors about me | I return focus to my work when my mind drifts. | 2 |
| I stay aware of my posture when working at a desk. | I pay attention to non-verbal cues from others. | 0 |

- Loss: `SoftmaxLoss`
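As described in the Sentence-BERT paper cited below, SoftmaxLoss concatenates the two sentence embeddings u and v with their element-wise difference |u − v|, feeds the result through a linear classifier, and trains with cross-entropy over the three labels. A minimal pure-Python sketch (tiny made-up dimensions and random weights; the real head operates on the model's 384-dim embeddings):

```python
import math
import random

random.seed(0)

EMB_DIM = 4      # stand-in for the model's 384 dimensions
NUM_LABELS = 3   # labels 0, 1, 2 as in the dataset above

# Linear classifier over the concatenated features (u, v, |u - v|).
W = [[random.uniform(-0.1, 0.1) for _ in range(3 * EMB_DIM)]
     for _ in range(NUM_LABELS)]

def softmax_loss(u, v, label):
    """Cross-entropy over softmax(W @ concat(u, v, |u - v|))."""
    features = u + v + [abs(a - b) for a, b in zip(u, v)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in W]
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[label]), probs

loss, probs = softmax_loss([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], label=2)
print(round(sum(probs), 6))  # 1.0 — the class probabilities sum to 1
```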
### Evaluation Dataset

#### Unnamed Dataset
- Size: 2,000 evaluation samples
- Columns: `sentence1`, `sentence2`, and `label`
- Approximate statistics based on the first 1000 samples:

|  | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 11.68 tokens, max: 17 tokens | min: 8 tokens, mean: 11.79 tokens, max: 17 tokens | 0: ~24.40%, 1: ~36.30%, 2: ~39.30% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| I stay conscious of my emotional responses to work challenges. | I pay close attention to verbal instructions. | 1 |
| I accept varied perspectives from my team graciously. | I accept team dynamics as they naturally evolve. | 0 |
| I accept technology upgrades with an open heart. | I am mindful of my facial expressions during discussions. | 1 |

- Loss: `SoftmaxLoss`
### Training Hyperparameters

#### Non-Default Hyperparameters

- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 9
- warmup_ratio: 0.01
#### All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 9
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.01
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
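The learning-rate schedule implied by these settings can be sketched directly: 8,000 training samples at batch size 32 give 250 optimizer steps per epoch, so 9 epochs is 2,250 steps, and `warmup_ratio: 0.01` warms up for roughly the first 23 steps (assuming the ceil-based step math used by the Hugging Face Trainer's linear scheduler):

```python
import math

# Derived from the hyperparameters above: 8,000 samples / batch size 32
# = 250 steps per epoch; 9 epochs = 2,250 total optimizer steps.
total_steps = (8000 // 32) * 9
warmup_steps = math.ceil(0.01 * total_steps)  # warmup_ratio: 0.01 -> 23 steps
base_lr = 5e-5                                # learning_rate: 5e-05

def linear_schedule_lr(step):
    """Linear warmup then linear decay (lr_scheduler_type: linear)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(linear_schedule_lr(0))             # 0.0 (start of warmup)
print(linear_schedule_lr(warmup_steps))  # 5e-05 (peak learning rate)
print(linear_schedule_lr(total_steps))   # 0.0 (fully decayed)
```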
### Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.4 | 100 | 0.9566 | 0.8119 |
| 0.8 | 200 | 0.7499 | 0.6819 |
| 1.2 | 300 | 0.6541 | 0.5908 |
| 1.6 | 400 | 0.5759 | 0.5258 |
| 2.0 | 500 | 0.5112 | 0.4811 |
| 2.4 | 600 | 0.4659 | 0.4377 |
| 2.8 | 700 | 0.4400 | 0.4020 |
| 3.2 | 800 | 0.4112 | 0.3721 |
| 3.6 | 900 | 0.3751 | 0.3462 |
| 4.0 | 1000 | 0.3517 | 0.3233 |
| 4.4 | 1100 | 0.3232 | 0.3033 |
| 4.8 | 1200 | 0.3189 | 0.2871 |
| 5.2 | 1300 | 0.2961 | 0.2711 |
| 5.6 | 1400 | 0.2865 | 0.2597 |
| 6.0 | 1500 | 0.2715 | 0.2499 |
| 6.4 | 1600 | 0.2639 | 0.2403 |
| 6.8 | 1700 | 0.2528 | 0.2339 |
| 7.2 | 1800 | 0.2482 | 0.2277 |
| 7.6 | 1900 | 0.2406 | 0.2236 |
| 8.0 | 2000 | 0.2403 | 0.2207 |
| 8.4 | 2100 | 0.2382 | 0.2184 |
| 8.8 | 2200 | 0.2314 | 0.2166 |
### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```