SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
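The Pooling and Normalize stages listed above can be sketched in NumPy. This is only an illustration of mean pooling over non-padding tokens followed by L2 normalization, not the library's own code; the function name `mean_pool_and_normalize` and the random toy inputs are assumptions made for the example.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize each row."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against empty rows
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy stand-in for transformer output: 2 sequences, 4 positions, 384 dimensions.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 384))
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding

sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

Because the output rows have unit length, cosine similarity between two embeddings reduces to a dot product.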
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-2-epochs")
# Run inference
sentences = [
'What is the name of the cup that some Catholics think is the Grail?',
"On 9 July 2006, during Mass at Valencia's Cathedral, Our Lady of the Forsaken Basilica, Pope Benedict XVI used, at the World Day of Families, the Santo Caliz, a 1st-century Middle-Eastern artifact that some Catholics believe is the Holy Grail. It was supposedly brought to that church by Emperor Valerian in the 3rd century, after having been brought by St. Peter to Rome from Jerusalem. The Santo Caliz (Holy Chalice) is a simple, small stone cup. Its base was added in Medieval Times and consists of fine gold, alabaster and gem stones.",
'The quail is a small to medium-sized, cryptically coloured bird. In its natural environment, it is found in bushy places, in rough grassland, among agricultural crops, and in other places with dense cover. It feeds on seeds, insects, and other small invertebrates. Being a largely ground-dwelling, gregarious bird, domestication of the quail was not difficult, although many of its wild instincts are retained in captivity. It was known to the Egyptians long before the arrival of chickens and was depicted in hieroglyphs from 2575 BC. It migrated across Egypt in vast flocks and the birds could sometimes be picked up off the ground by hand. These were the common quail (Coturnix coturnix), but modern domesticated flocks are mostly of Japanese quail (Coturnix japonica) which was probably domesticated as early as the 11th century AD in Japan. They were originally kept as songbirds, and they are thought to have been regularly used in song contests.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
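In the example above, the first entry is a question and the other two are passages, so a natural follow-up is to rank the passages against the question. The sketch below does this with plain NumPy on small made-up vectors that stand in for `embeddings[0]` and `embeddings[1:]`. The helper name `rank_by_cosine` is hypothetical and the numbers are not real model output.

```python
import numpy as np

def rank_by_cosine(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                 # cosine similarity of each passage to the query
    order = np.argsort(-scores)    # descending order
    return order, scores

# Toy vectors (assumption): passage 0 points roughly the same way as the query.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([[0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0]])

order, scores = rank_by_cosine(query, passages)
print(order, scores)
```

With real embeddings from `model.encode`, the same ranking could be read from the first row of `model.similarity(embeddings, embeddings)`.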
Evaluation
Metrics
Triplet
- Dataset: gooqa-dev
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.4078 |
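As a rough guide to reading the metric, the sketch below computes a triplet accuracy under cosine similarity. It assumes a triplet counts as correct when the anchor is strictly more similar to the positive than to the negative. The function names and toy vectors are illustrative, and details such as margins may differ in TripletEvaluator.

```python
import numpy as np

def cosine_sim_rows(a, b):
    """Row-wise cosine similarity between two equally shaped matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    pos = cosine_sim_rows(anchors, positives)
    neg = cosine_sim_rows(anchors, negatives)
    return float((pos > neg).mean())

# Two toy triplets: the first is ranked correctly, the second is not.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [1.0, 0.2]])
negatives = np.array([[0.0, 1.0], [0.1, 1.0]])

accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)
```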
Training Details
Training Dataset
Unnamed Dataset
- Size: 44,286 training samples
- Columns: question, context, and negative
- Approximate statistics based on the first 1000 samples:
| | question | context | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 6 tokens, mean: 14.48 tokens, max: 39 tokens | min: 28 tokens, mean: 147.46 tokens, max: 256 tokens | min: 29 tokens, mean: 147.85 tokens, max: 256 tokens |
- Samples:
| question | context | negative |
|---|---|---|
| What are the two cycles of the Greek folk song? | Along with the Byzantine (Church) chant and music, the Greek people also cultivated the Greek folk song which is divided into two cycles, the akritic and klephtic. The akritic was created between the 9th and 10th centuries and expressed the life and struggles of the akrites (frontier guards) of the Byzantine empire, the most well known being the stories associated with Digenes Akritas. The klephtic cycle came into being between the late Byzantine period and the start of the Greek War of Independence. The klephtic cycle, together with historical songs, paraloghes (narrative song or ballad), love songs, mantinades, wedding songs, songs of exile and dirges express the life of the Greeks. There is a unity between the Greek people's struggles for freedom, their joys and sorrow and attitudes towards love and death. | The Hellenic languages or Greek language are widely spoken in Greece and in the Greek part of Cyprus. Additionally, other varieties of Greek are spoken in small communities in parts of other European counties. |
| What material is within a wrestling ring? | Matches are held within a wrestling ring, an elevated square canvas mat with posts on each corner. A cloth apron hangs over the edges of the ring. Three horizontal ropes or cables surround the ring, suspended with turnbuckles which are connected to the posts. For safety, the ropes are padded at the turnbuckles and cushioned mats surround the floor outside the ring. Guardrails or a similar barrier enclose this area from the audience. Wrestlers are generally expected to stay within the confines of the ring, though matches sometimes end up outside the ring, and even in the audience, to add excitement. | Many modern specialty matches have been devised, with unique winning conditions. The most common of these is the ladder match. In the basic ladder match, the wrestlers or teams of wrestlers must climb a ladder to obtain a prize that is hoisted above the ring. The key to winning this match is that the wrestler or team of wrestlers must try to incapacitate each other long enough for one wrestler to climb the ladder and secure that prize for their team. As a result, the ladder can be used as a weapon. The prizes include – but are not limited to any given championship belt (the traditional prize), a document granting the winner the right to a future title shot, or any document that matters to the wrestlers involved in the match (such as one granting the winner a cash prize). Another common specialty match is known as the battle royal. In a battle royal, all the wrestlers enter the ring to the point that there are 20-30 wrestlers in the ring at one time. When the match begins, the simple ob... |
| What is the Hebrew Bible? | The Hebrew Bible, a religious interpretation of the traditions and early national history of the Jews, established the first of the Abrahamic religions, which are now practiced by 54% of the world. Judaism guides its adherents in both practice and belief, and has been called not only a religion, but also a "way of life," which has made drawing a clear distinction between Judaism, Jewish culture, and Jewish identity rather difficult. Throughout history, in eras and places as diverse as the ancient Hellenic world, in Europe before and after The Age of Enlightenment (see Haskalah), in Islamic Spain and Portugal, in North Africa and the Middle East, India, China, or the contemporary United States and Israel, cultural phenomena have developed that are in some sense characteristically Jewish without being at all specifically religious. Some factors in this come from within Judaism, others from the interaction of Jews or specific communities of Jews with their surroundings, others from the in... | Israel's diverse culture stems from the diversity of its population: Jews from diaspora communities around the world have brought their cultural and religious traditions back with them, creating a melting pot of Jewish customs and beliefs. Israel is the only country in the world where life revolves around the Hebrew calendar. Work and school holidays are determined by the Jewish holidays, and the official day of rest is Saturday, the Jewish Sabbath. Israel's substantial Arab minority has also left its imprint on Israeli culture in such spheres as architecture, music, and cuisine. |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
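To make the loss parameters concrete, here is a minimal NumPy sketch of a multiple-negatives ranking objective: cosine similarities between each question and every candidate passage in the batch are multiplied by `scale`, and cross-entropy pushes each question toward its own context. It assumes the candidates are the contexts followed by the explicit negatives. The function name `mnr_loss` and the toy vectors are illustrative, and this is not the library's implementation.

```python
import numpy as np

def mnr_loss(anchor_emb, candidate_emb, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    anchor_emb: (B, D) question embeddings.
    candidate_emb: (C, D) with C >= B; candidate i is the positive for anchor i,
    and every other candidate acts as a negative.
    """
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    c = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    logits = scale * (a @ c.T)                              # (B, C)
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(a.shape[0])
    return float(-log_probs[idx, idx].mean())

# Toy batch (assumption): 2 questions, 2 matching contexts, 2 hard negatives.
questions = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
contexts = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])
negatives = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])

loss_aligned = mnr_loss(questions, np.vstack([contexts, negatives]))
loss_swapped = mnr_loss(questions, np.vstack([contexts[::-1], negatives]))
print(loss_aligned, loss_swapped)
```

The aligned batch should give a much lower loss than the batch with swapped contexts. With cosine similarity bounded in [-1, 1], a scale of 20.0 spreads the logits enough for the softmax to separate candidates.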
Evaluation Dataset
Unnamed Dataset
- Size: 5,000 evaluation samples
- Columns: question, context, and negative_1
- Approximate statistics based on the first 1000 samples:
| | question | context | negative_1 |
|---|---|---|---|
| type | string | string | string |
| details | min: 6 tokens, mean: 14.47 tokens, max: 36 tokens | min: 28 tokens, mean: 147.36 tokens, max: 256 tokens | min: 28 tokens, mean: 146.82 tokens, max: 256 tokens |
- Samples:
| question | context | negative_1 |
|---|---|---|
| What state supported DST because it wanted to sell more potatoes? | The history of time in the United States includes DST during both world wars, but no standardization of peacetime DST until 1966. In May 1965, for two weeks, St. Paul, Minnesota and Minneapolis, Minnesota were on different times, when the capital city decided to join most of the nation by starting Daylight Saving Time while Minneapolis opted to follow the later date set by state law. In the mid-1980s, Clorox (parent of Kingsford Charcoal) and 7-Eleven provided the primary funding for the Daylight Saving Time Coalition behind the 1987 extension to US DST, and both Idaho senators voted for it based on the premise that during DST fast-food restaurants sell more French fries, which are made from Idaho potatoes. | The history of time in the United States includes DST during both world wars, but no standardization of peacetime DST until 1966. In May 1965, for two weeks, St. Paul, Minnesota and Minneapolis, Minnesota were on different times, when the capital city decided to join most of the nation by starting Daylight Saving Time while Minneapolis opted to follow the later date set by state law. In the mid-1980s, Clorox (parent of Kingsford Charcoal) and 7-Eleven provided the primary funding for the Daylight Saving Time Coalition behind the 1987 extension to US DST, and both Idaho senators voted for it based on the premise that during DST fast-food restaurants sell more French fries, which are made from Idaho potatoes. |
| Who dealt with the design faults of the palace? | Buckingham Palace finally became the principal royal residence in 1837, on the accession of Queen Victoria, who was the first monarch to reside there; her predecessor William IV had died before its completion. While the state rooms were a riot of gilt and colour, the necessities of the new palace were somewhat less luxurious. For one thing, it was reported the chimneys smoked so much that the fires had to be allowed to die down, and consequently the court shivered in icy magnificence. Ventilation was so bad that the interior smelled, and when a decision was taken to install gas lamps, there was a serious worry about the build-up of gas on the lower floors. It was also said that staff were lax and lazy and the palace was dirty. Following the queen's marriage in 1840, her husband, Prince Albert, concerned himself with a reorganisation of the household offices and staff, and with the design faults of the palace. The problems were all rectified by the close of 1840. However, the builders w... | Buckingham Palace finally became the principal royal residence in 1837, on the accession of Queen Victoria, who was the first monarch to reside there; her predecessor William IV had died before its completion. While the state rooms were a riot of gilt and colour, the necessities of the new palace were somewhat less luxurious. For one thing, it was reported the chimneys smoked so much that the fires had to be allowed to die down, and consequently the court shivered in icy magnificence. Ventilation was so bad that the interior smelled, and when a decision was taken to install gas lamps, there was a serious worry about the build-up of gas on the lower floors. It was also said that staff were lax and lazy and the palace was dirty. Following the queen's marriage in 1840, her husband, Prince Albert, concerned himself with a reorganisation of the household offices and staff, and with the design faults of the palace. The problems were all rectified by the close of 1840. However, the builders w... |
| On what date did IndyMac fail? | The first visible institution to run into trouble in the United States was the Southern California–based IndyMac, a spin-off of Countrywide Financial. Before its failure, IndyMac Bank was the largest savings and loan association in the Los Angeles market and the seventh largest mortgage originator in the United States. The failure of IndyMac Bank on July 11, 2008, was the fourth largest bank failure in United States history up until the crisis precipitated even larger failures, and the second largest failure of a regulated thrift. IndyMac Bank's parent corporation was IndyMac Bancorp until the FDIC seized IndyMac Bank. IndyMac Bancorp filed for Chapter 7 bankruptcy in July 2008. | The first visible institution to run into trouble in the United States was the Southern California–based IndyMac, a spin-off of Countrywide Financial. Before its failure, IndyMac Bank was the largest savings and loan association in the Los Angeles market and the seventh largest mortgage originator in the United States. The failure of IndyMac Bank on July 11, 2008, was the fourth largest bank failure in United States history up until the crisis precipitated even larger failures, and the second largest failure of a regulated thrift. IndyMac Bank's parent corporation was IndyMac Bancorp until the FDIC seized IndyMac Bank. IndyMac Bancorp filed for Chapter 7 bankruptcy in July 2008. |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- num_train_epochs: 2
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
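To see what these settings imply for the learning-rate schedule, the sketch below works through the arithmetic. It assumes one optimizer step per batch, a linear schedule with warmup (lr_scheduler_type: linear), the listed learning_rate of 5e-05, and a warmup length equal to the ceiling of warmup_ratio times the total steps. The helper `linear_schedule_lr` is hypothetical and only mirrors the usual warmup-then-decay shape.

```python
import math

def linear_schedule_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

train_samples = 44286
batch_size = 128
epochs = 2
warmup_ratio = 0.1
base_lr = 5e-05

# ceil(44286 / 128) = 346, consistent with the training log (step 100 at epoch 0.2890).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps, base_lr))
```

Under these assumptions the peak learning rate is reached after roughly the first tenth of training and decays to zero at the final step.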
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3266 |
| 0.2890 | 100 | 0.4285 | 0.7828 | 0.3894 |
| 0.5780 | 200 | 0.3895 | 0.7691 | 0.4006 |
| 0.8671 | 300 | 0.3744 | 0.7545 | 0.3992 |
| 1.1561 | 400 | 0.3157 | 0.7396 | 0.4070 |
| 1.4451 | 500 | 0.2715 | 0.7422 | 0.4074 |
| 1.7341 | 600 | 0.2672 | 0.7405 | 0.4080 |
| -1 | -1 | - | - | 0.4078 |
Framework Versions
- Python: 3.11.0
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}