SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
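
As a rough illustration of what the Pooling (CLS token) and Normalize modules listed above do to the transformer output, here is a minimal NumPy sketch. The function name `cls_pool_and_normalize` and the random stand-in for token embeddings are assumptions made for this example; this is not the library's own code.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token vectors to (batch, hidden) unit vectors."""
    # CLS pooling: keep only the first token's vector of each sequence.
    cls = token_embeddings[:, 0, :]
    # L2 normalization: scale every pooled vector to unit length.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Random stand-in for the transformer output: 2 sequences, 5 tokens, 1024 dims.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 1024))
sentence_vectors = cls_pool_and_normalize(fake_tokens)
print(sentence_vectors.shape)  # (2, 1024)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.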

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-m3-base-en")
# Run inference
sentences = [
    'Shell integration: bash and zsh don\'t serialize \\n and ; characters Part of https://github.com/microsoft/vscode/issues/155639\r\n\r\nRepro:\r\n\r\n1. Open a bash or zsh session\r\n2. Run:\r\n  ```sh\r\n  echo "a\r\n  … b"\r\n  ```\r\n  ![image](https://github.com/microsoft/vscode/assets/2193314/95ce755b-cec5-4da1-8791-75923a088efa)\r\n3. ctrl+alt+r to run recent command, select the last command, 🐛 it\'s run without the new line\r\n  ![image](https://github.com/microsoft/vscode/assets/2193314/98e81851-3a68-422c-94e8-e3e4bb21ef53)\r\n',
    'TreeView state out of sync Testing #117304\r\n\r\nRepro: Not Sure\r\n\r\nTest state shows passed in file but still running in tree view.\r\n\r\n![image](https://user-images.githubusercontent.com/6561887/108932683-a082c600-75fe-11eb-8ccd-afa5d0e2a91e.png)\r\n',
    'Setting icon and color in createTerminal API no longer works correctly See https://github.com/fabiospampinato/vscode-terminals/issues/77\r\n\r\nLooks like the default tab color/icon change probably regressed this.\r\n\r\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4264, 0.4315],
#         [0.4264, 1.0000, 0.4278],
#         [0.4315, 0.4278, 1.0000]])
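
One possible way to use such a similarity matrix is to look up, for each sentence, the most similar other sentence. The sketch below works on the values printed above, copied into a NumPy array so that it runs without downloading the model; the variable names are illustrative.

```python
import numpy as np

# Similarity matrix copied from the example output above.
similarities = np.array([
    [1.0000, 0.4264, 0.4315],
    [0.4264, 1.0000, 0.4278],
    [0.4315, 0.4278, 1.0000],
])

# Mask the diagonal so a sentence is never matched with itself.
masked = similarities.copy()
np.fill_diagonal(masked, -np.inf)

# Index of the most similar other sentence for each row.
nearest = masked.argmax(axis=1)
print(nearest.tolist())  # [2, 2, 0]
```

With real embeddings, the same masking and `argmax` steps could be applied to the result of `model.similarity(...)` after converting it to a NumPy array.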

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 1.0

Triplet

Metric Value
cosine_accuracy 0.9524
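
For orientation, cosine accuracy on triplets is commonly defined as the fraction of (anchor, positive, negative) triplets in which the anchor is closer, by cosine similarity, to the positive than to the negative. The sketch below assumes that definition and uses made-up toy vectors; the helper names are hypothetical and this is not the evaluator's actual code.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(triplets) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(cosine(a, p) > cosine(a, n) for a, p, n in triplets)
    return correct / len(triplets)

# Toy triplets (anchor, positive, negative); the values are illustrative only.
toy_triplets = [
    (np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])),  # ranked correctly
    (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.1])),  # ranked incorrectly
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.5
```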

Training Details

Training Dataset

Unnamed Dataset

  • Size: 336 training samples
  • Columns: texts and label
  • Approximate statistics based on the first 336 samples:
    texts (type: string)
    • min: 12 tokens
    • mean: 340.9 tokens
    • max: 996 tokens
    label (type: int), share of samples per label value:
    • 0: ~1.19%
    • 5: ~0.60%
    • 7: ~0.60%
    • 8: ~0.60%
    • 9: ~0.60%
    • 10: ~0.60%
    • 12: ~0.60%
    • 13: ~0.60%
    • 15: ~0.60%
    • 16: ~0.60%
    • 17: ~0.60%
    • 19: ~0.60%
    • 20: ~0.60%
    • 21: ~0.60%
    • 25: ~0.60%
    • 26: ~0.60%
    • 27: ~0.60%
    • 28: ~0.60%
    • 29: ~0.89%
    • 30: ~0.89%
    • 31: ~2.08%
    • 33: ~1.49%
    • 34: ~0.60%
    • 36: ~0.60%
    • 37: ~0.89%
    • 38: ~0.60%
    • 42: ~0.89%
    • 43: ~1.19%
    • 45: ~0.60%
    • 47: ~0.60%
    • 48: ~0.60%
    • 49: ~0.60%
    • 50: ~0.60%
    • 51: ~0.60%
    • 52: ~1.19%
    • 53: ~0.60%
    • 55: ~0.60%
    • 57: ~1.19%
    • 59: ~0.60%
    • 60: ~0.60%
    • 61: ~0.60%
    • 62: ~0.60%
    • 63: ~0.60%
    • 64: ~0.89%
    • 65: ~0.60%
    • 67: ~0.60%
    • 68: ~0.89%
    • 69: ~0.60%
    • 70: ~0.60%
    • 71: ~0.60%
    • 72: ~0.60%
    • 73: ~0.60%
    • 74: ~0.60%
    • 75: ~0.60%
    • 77: ~0.60%
    • 78: ~0.60%
    • 82: ~0.89%
    • 84: ~0.60%
    • 85: ~0.60%
    • 86: ~0.60%
    • 87: ~0.60%
    • 88: ~0.60%
    • 89: ~0.60%
    • 90: ~0.60%
    • 91: ~0.60%
    • 92: ~0.89%
    • 93: ~0.60%
    • 95: ~0.60%
    • 96: ~0.60%
    • 97: ~0.60%
    • 98: ~0.60%
    • 99: ~0.60%
    • 100: ~0.60%
    • 101: ~0.60%
    • 103: ~0.60%
    • 105: ~0.60%
    • 108: ~0.60%
    • 109: ~0.60%
    • 110: ~0.89%
    • 113: ~0.60%
    • 115: ~0.60%
    • 116: ~0.60%
    • 117: ~0.60%
    • 118: ~0.60%
    • 119: ~0.60%
    • 120: ~1.49%
    • 121: ~0.60%
    • 126: ~0.60%
    • 127: ~0.60%
    • 130: ~0.60%
    • 131: ~0.60%
    • 134: ~0.60%
    • 135: ~0.60%
    • 136: ~0.60%
    • 137: ~0.60%
    • 138: ~0.60%
    • 139: ~0.60%
    • 140: ~0.60%
    • 143: ~0.60%
    • 144: ~0.60%
    • 147: ~0.60%
    • 148: ~0.60%
    • 149: ~0.60%
    • 150: ~0.60%
    • 152: ~0.60%
    • 153: ~0.60%
    • 154: ~0.60%
    • 155: ~0.60%
    • 157: ~0.60%
    • 158: ~0.60%
    • 159: ~0.60%
    • 160: ~0.60%
    • 161: ~0.60%
    • 163: ~0.60%
    • 164: ~0.60%
    • 165: ~0.60%
    • 168: ~0.60%
    • 170: ~0.60%
    • 171: ~0.89%
    • 174: ~0.89%
    • 175: ~0.60%
    • 176: ~0.60%
    • 177: ~0.60%
    • 178: ~0.60%
    • 179: ~0.60%
    • 180: ~0.60%
    • 181: ~0.60%
    • 182: ~0.60%
    • 183: ~0.60%
    • 185: ~0.60%
    • 186: ~0.60%
    • 187: ~0.60%
    • 192: ~0.60%
    • 193: ~0.60%
    • 194: ~0.60%
    • 195: ~0.89%
    • 196: ~0.60%
    • 197: ~0.60%
    • 198: ~0.60%
    • 199: ~0.60%
    • 202: ~0.60%
    • 204: ~0.60%
    • 205: ~0.89%
    • 207: ~0.60%
    • 208: ~0.60%
    • 209: ~0.60%
    • 210: ~0.60%
    • 213: ~0.60%
    • 214: ~0.60%
    • 215: ~0.60%
    • 216: ~0.60%
    • 218: ~0.60%
  • Samples:
    texts: Branch list is sometimes out of order
    Type: Bug
    1. Open a workspace
    2. Quickly open the branch picker and type main
    Bug
    The first time you do this, sometimes you end up with an unordered list:
    Image
    The correct order shows up when you keep start typing or try doing this again:
    Image
    VS Code version: Code - Insiders 1.91.0-insider (Universal) (0354163c1c66b950b0762364f5b4cd37937b624a, 2024-06-26T10:12:33.304Z)
    OS version: Darwin arm64 23.5.0
    Modes:
    System Info
    |Item|Value|
    |---|---|
    |CPUs|Apple M2 Max (12 x 2400)|
    |GPU Status|2d_canvas: unavailable_software
    canvas_oop_rasterization: disabled_off
    direct_rendering_display_compositor: disabled_off_ok
    gpu_compositing: disabled_software
    multiple_raster_threads: enabled_on
    ope...
    label: 218

    texts: Git Branch Picker Race Condition If I paste the branch too quickly and then press enter, it does not switch to it, but creates a new branch.
    This breaks muscle memory, as it works when you do it slowly.
    Code_-_Insiders_peF36XR6nS
    Once loading completes, it should select the branch again.
    label: 218

    texts: links aren't discoverable to screen reader users in markdown documents They're only discoverable via visual distinction and the action that can be taken (IE opening them) is only indicated in the tooltip AFAICT.
    https://github.com/microsoft/vscode/assets/29464607/09d28b81-c2cc-4477-b1fc-7b1de1baae74
    label: 177
  • Loss: BatchSemiHardTripletLoss
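
To give a feel for the semi-hard idea behind this loss, here is a simplified NumPy sketch. For every anchor-positive pair with the same label, it looks for negatives that are farther away than the positive but still within a margin, and penalises the closest of them. The Euclidean distance, the margin of 1.0, the decision to skip pairs without a semi-hard negative, and the function name are all assumptions for illustration; the library's implementation differs in its details.

```python
import numpy as np

def semi_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Simplified batch semi-hard triplet loss (illustrative, not the library code)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    losses = []
    for a in range(len(labels)):
        # Distances from the anchor to every sample with a different label.
        negatives = dist[a][labels != labels[a]]
        for p in range(len(labels)):
            if a == p or labels[a] != labels[p]:
                continue
            # Semi-hard: farther than the positive, but closer than positive + margin.
            mask = (negatives > dist[a, p]) & (negatives < dist[a, p] + margin)
            if not mask.any():
                continue  # assumption: skip pairs without a semi-hard negative
            losses.append(dist[a, p] - negatives[mask].min() + margin)
    return float(np.mean(losses)) if losses else 0.0

# Toy batch: two samples per label, mirroring the label-per-text layout above.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0], [10.0, 0.0]]
toy_labels = [0, 0, 1, 1]
loss = semi_hard_triplet_loss(toy_embeddings, toy_labels)
print(loss)  # 0.5
```

A loss of this kind needs at least two samples per label in a batch, otherwise no anchor-positive pair can be formed.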

Evaluation Dataset

Unnamed Dataset

  • Size: 70 evaluation samples
  • Columns: texts and label
  • Approximate statistics based on the first 70 samples:
    texts (type: string)
    • min: 58 tokens
    • mean: 303.57 tokens
    • max: 864 tokens
    label (type: int), share of samples per label value:
    • 1: ~2.86%
    • 2: ~2.86%
    • 6: ~2.86%
    • 11: ~5.71%
    • 14: ~2.86%
    • 23: ~2.86%
    • 32: ~5.71%
    • 35: ~2.86%
    • 39: ~2.86%
    • 40: ~2.86%
    • 46: ~2.86%
    • 54: ~2.86%
    • 83: ~2.86%
    • 102: ~2.86%
    • 104: ~4.29%
    • 111: ~2.86%
    • 122: ~2.86%
    • 123: ~2.86%
    • 125: ~2.86%
    • 145: ~2.86%
    • 146: ~2.86%
    • 162: ~2.86%
    • 166: ~2.86%
    • 169: ~2.86%
    • 184: ~2.86%
    • 188: ~2.86%
    • 190: ~2.86%
    • 200: ~2.86%
    • 201: ~4.29%
    • 203: ~2.86%
    • 206: ~2.86%
    • 217: ~2.86%
  • Samples:
    texts: Ctrl+I stopped working after first hold+talk+release Testing #213355
    Screencast shows that it seems to be in the wrong context and is trying to stop the session?
    Recording 2024-05-28 at 14 05 54
    Repro was just asking "Testing testing" and then trying to ask something else
    label: 217

    texts: Ctrl + I does not work when chat input field has focus Testing #213355
    Ctrl + I works in the editor and when I hold it, I get into speech mode. But when the chat input field (panel or inline chat) already has focus, Ctrl + I does not work.
    (Connected to Windows through Remote Desktop in case that matters.)
    label: 217

    texts: Terminal renaming not functioning as expected in editor area
    Does this issue occur when all extensions are disabled?: Yes



Model size: 0.6B params (Safetensors), tensor type: F32