BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-base-en-v1.5
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
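
The last two modules above can be sketched in a few lines of NumPy: the Pooling layer keeps only the [CLS] token vector (pooling_mode_cls_token is True), and Normalize() rescales it to unit length. This is an illustrative sketch, not the library's implementation; the random array stands in for real BertModel hidden states, and the function name is hypothetical.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token embeddings to (batch, hidden) unit vectors."""
    # Pooling: take the first token ([CLS]) of every sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize: divide each vector by its L2 norm (guarding against zero).
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Stand-in for transformer output: 2 sentences, 5 tokens each, hidden size 768.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(2, 5, 768))

sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)
print(sentence_embeddings.shape)  # (2, 768)
```

Because the outputs are unit-length, the cosine similarity of two embeddings equals their dot product.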

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1000")
# Run inference
sentences = [
    '(1) Joint: join with step 2, where the few-shot examples are structured as (response, verification questions, verification answers); The drawback is that the original response is in the context, so the model may repeat similar hallucination.\n(2) 2-step: separate the verification planning and execution steps, such as the original response doesn’t impact\n(3) Factored: each verification question is answered separately. Say, if a long-form base generation results in multiple verification questions, we would answer each question one-by-one.\n(4) Factor+revise: adding a “cross-checking” step after factored verification execution, conditioned on both the baseline response and the verification question and answer. It detects inconsistency.\n\n\nFinal output: Generate the final, refined output. The output gets revised at this step if any inconsistency is discovered.',
    "In what ways does the 'Factor+revise' method enhance the reliability of responses when compared to the 'Joint' and '2-step' methods used for cross-checking?",
    'What obstacles arise when depending on the pre-training dataset in the context of extrinsic hallucination affecting model outputs?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
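
Since the model is evaluated at 768, 512, 256, 128, and 64 dimensions, shorter embeddings can be obtained by keeping the leading components and re-normalizing. The sketch below shows that step on random stand-in vectors so it runs without downloading the model; truncate_embeddings is a hypothetical helper, not part of the library. Recent Sentence Transformers releases also accept a truncate_dim argument when constructing SentenceTransformer, which is likely the more convenient route in practice.

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each embedding and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model.encode(sentences): 3 unit-length vectors of size 768.
rng = np.random.default_rng(0)
full_embeddings = rng.normal(size=(3, 768))
full_embeddings /= np.linalg.norm(full_embeddings, axis=1, keepdims=True)

small_embeddings = truncate_embeddings(full_embeddings, 256)
# With unit-length vectors, the dot product is the cosine similarity.
small_similarities = small_embeddings @ small_embeddings.T
print(small_embeddings.shape)    # (3, 256)
print(small_similarities.shape)  # (3, 3)
```

Re-normalizing after truncation matters because the first 256 components of a unit-length 768-dimensional vector are generally no longer unit-length.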

Evaluation

Metrics

Information Retrieval

Dataset: dim_768
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.8802
cosine_accuracy@3	0.9844
cosine_accuracy@5	0.9948
cosine_accuracy@10	0.9948
cosine_precision@1	0.8802
cosine_precision@3	0.3281
cosine_precision@5	0.199
cosine_precision@10	0.0995
cosine_recall@1	0.8802
cosine_recall@3	0.9844
cosine_recall@5	0.9948
cosine_recall@10	0.9948
cosine_ndcg@10	0.9495
cosine_mrr@10	0.9338
cosine_map@100	0.9342

Information Retrieval

Dataset: dim_512
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.8854
cosine_accuracy@3	0.9844
cosine_accuracy@5	0.9948
cosine_accuracy@10	1.0
cosine_precision@1	0.8854
cosine_precision@3	0.3281
cosine_precision@5	0.199
cosine_precision@10	0.1
cosine_recall@1	0.8854
cosine_recall@3	0.9844
cosine_recall@5	0.9948
cosine_recall@10	1.0
cosine_ndcg@10	0.9537
cosine_mrr@10	0.9378
cosine_map@100	0.9378

Information Retrieval

Dataset: dim_256
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.901
cosine_accuracy@3	0.9844
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.901
cosine_precision@3	0.3281
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.901
cosine_recall@3	0.9844
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9588
cosine_mrr@10	0.9446
cosine_map@100	0.9446

Information Retrieval

Dataset: dim_128
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.9062
cosine_accuracy@3	0.9844
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.9062
cosine_precision@3	0.3281
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.9062
cosine_recall@3	0.9844
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9609
cosine_mrr@10	0.9475
cosine_map@100	0.9475

Information Retrieval

Dataset: dim_64
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.8906
cosine_accuracy@3	0.9844
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.8906
cosine_precision@3	0.3281
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.8906
cosine_recall@3	0.9844
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9551
cosine_mrr@10	0.9397
cosine_map@100	0.9397
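
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9844 / 3 ≈ 0.3281). That pattern is what one would expect if each query has exactly one relevant document; this is an inference from the numbers, not something the card states. The following sketch computes the metrics under that assumption. The function name and the example ranks are hypothetical, and this is not the InformationRetrievalEvaluator code.

```python
def retrieval_metrics_at_k(ranks, k):
    """Compute accuracy, precision, recall, and MRR at cutoff k.

    ranks: 1-based rank of the single relevant document for each query,
           or None when it was not retrieved at all.
    """
    n = len(ranks)
    # Ranks of the relevant documents that landed inside the top k.
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    hits = len(hit_ranks)
    return {
        "accuracy": hits / n,         # share of queries with a hit in the top k
        "precision": hits / (k * n),  # one relevant doc among k retrieved, averaged
        "recall": hits / n,           # equals accuracy when there is one relevant doc
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
    }

# Hypothetical ranks for six queries (None means the document was never retrieved).
example_ranks = [1, 1, 2, 4, 1, None]
metrics = retrieval_metrics_at_k(example_ranks, k=3)
print(metrics)
```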

Training Details

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 5
lr_scheduler_type: cosine
warmup_ratio: 0.1
load_best_model_at_end: True
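
To make the learning-rate settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values above (learning_rate 2e-05, warmup_ratio 0.1) and the 625 total steps shown in the training logs. It assumes the warmup length is the step count times the ratio, rounded up, and a half-cosine decay to zero; the function is illustrative and is not the Transformers scheduler itself.

```python
import math

def cosine_lr_with_warmup(step, total_steps=625, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    # Assumption: warmup covers ceil(total_steps * warmup_ratio) steps (63 here).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Start of training, end of warmup, midpoint of the decay, and final step.
for step in (0, 63, 344, 625):
    print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the rate climbs from zero to 2e-05 over the first 63 steps, is roughly half that at step 344, and reaches zero at step 625.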

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	dim_128_cosine_map@100	dim_256_cosine_map@100	dim_512_cosine_map@100	dim_64_cosine_map@100	dim_768_cosine_map@100
0.04	5	4.9678	-	-	-	-	-
0.08	10	4.6482	-	-	-	-	-
0.12	15	5.0735	-	-	-	-	-
0.16	20	4.0336	-	-	-	-	-
0.2	25	3.7572	-	-	-	-	-
0.24	30	4.3054	-	-	-	-	-
0.28	35	2.6705	-	-	-	-	-
0.32	40	3.1929	-	-	-	-	-
0.36	45	3.1139	-	-	-	-	-
0.4	50	2.5219	-	-	-	-	-
0.44	55	3.1847	-	-	-	-	-
0.48	60	2.2306	-	-	-	-	-
0.52	65	2.251	-	-	-	-	-
0.56	70	2.2432	-	-	-	-	-
0.6	75	2.7462	-	-	-	-	-
0.64	80	2.9992	-	-	-	-	-
0.68	85	2.338	-	-	-	-	-
0.72	90	2.0169	-	-	-	-	-
0.76	95	1.257	-	-	-	-	-
0.8	100	1.5015	-	-	-	-	-
0.84	105	1.9198	-	-	-	-	-
0.88	110	2.2154	-	-	-	-	-
0.92	115	2.4026	-	-	-	-	-
0.96	120	1.911	-	-	-	-	-
1.0	125	2.079	0.9151	0.9098	0.9220	0.8788	0.9251
1.04	130	1.4704	-	-	-	-	-
1.08	135	0.7323	-	-	-	-	-
1.12	140	0.6308	-	-	-	-	-
1.16	145	0.4655	-	-	-	-	-
1.2	150	1.0186	-	-	-	-	-
1.24	155	1.1408	-	-	-	-	-
1.28	160	1.965	-	-	-	-	-
1.32	165	1.5987	-	-	-	-	-
1.36	170	3.288	-	-	-	-	-
1.4	175	1.632	-	-	-	-	-
1.44	180	1.0376	-	-	-	-	-
1.48	185	0.9466	-	-	-	-	-
1.52	190	1.0106	-	-	-	-	-
1.56	195	1.4875	-	-	-	-	-
1.6	200	1.314	-	-	-	-	-
1.64	205	1.3022	-	-	-	-	-
1.68	210	1.5312	-	-	-	-	-
1.72	215	1.7982	-	-	-	-	-
1.76	220	1.7962	-	-	-	-	-
1.8	225	1.5788	-	-	-	-	-
1.84	230	1.152	-	-	-	-	-
1.88	235	2.0556	-	-	-	-	-
1.92	240	1.3165	-	-	-	-	-
1.96	245	0.6941	-	-	-	-	-
2.0	250	1.2239	0.9404	0.944	0.9427	0.9327	0.9424
2.04	255	1.0423	-	-	-	-	-
2.08	260	0.8893	-	-	-	-	-
2.12	265	1.2859	-	-	-	-	-
2.16	270	1.4505	-	-	-	-	-
2.2	275	0.2728	-	-	-	-	-
2.24	280	0.6588	-	-	-	-	-
2.28	285	0.8014	-	-	-	-	-
2.32	290	0.3053	-	-	-	-	-
2.36	295	1.4289	-	-	-	-	-
2.4	300	1.1458	-	-	-	-	-
2.44	305	0.6987	-	-	-	-	-
2.48	310	1.3389	-	-	-	-	-
2.52	315	1.2991	-	-	-	-	-
2.56	320	1.8088	-	-	-	-	-
2.6	325	0.4242	-	-	-	-	-
2.64	330	1.5873	-	-	-	-	-
2.68	335	1.3873	-	-	-	-	-
2.72	340	1.4297	-	-	-	-	-
2.76	345	2.0637	-	-	-	-	-
2.8	350	1.1252	-	-	-	-	-
2.84	355	0.367	-	-	-	-	-
2.88	360	1.7606	-	-	-	-	-
2.92	365	1.196	-	-	-	-	-
2.96	370	1.8827	-	-	-	-	-
3.0	375	0.6822	0.9494	0.9479	0.9336	0.9414	0.9405
3.04	380	0.4954	-	-	-	-	-
3.08	385	0.1717	-	-	-	-	-
3.12	390	0.7435	-	-	-	-	-
3.16	395	1.4323	-	-	-	-	-
3.2	400	1.1207	-	-	-	-	-
3.24	405	1.9009	-	-	-	-	-
3.28	410	1.6706	-	-	-	-	-
3.32	415	0.8378	-	-	-	-	-
3.36	420	1.0911	-	-	-	-	-
3.4	425	0.6565	-	-	-	-	-
3.44	430	1.0302	-	-	-	-	-
3.48	435	0.6425	-	-	-	-	-
3.52	440	1.1472	-	-	-	-	-
3.56	445	1.996	-	-	-	-	-
3.6	450	1.5308	-	-	-	-	-
3.64	455	0.7427	-	-	-	-	-
3.68	460	1.4596	-	-	-	-	-
3.72	465	1.1984	-	-	-	-	-
3.76	470	0.7601	-	-	-	-	-
3.8	475	1.3544	-	-	-	-	-
3.84	480	1.6655	-	-	-	-	-
3.88	485	1.2596	-	-	-	-	-
3.92	490	0.9451	-	-	-	-	-
3.96	495	0.7079	-	-	-	-	-
4.0	500	1.3471	0.9453	0.9446	0.9404	0.9371	0.9335
4.04	505	0.4583	-	-	-	-	-
4.08	510	1.288	-	-	-	-	-
4.12	515	1.6946	-	-	-	-	-
4.16	520	1.1239	-	-	-	-	-
4.2	525	1.1026	-	-	-	-	-
4.24	530	1.4121	-	-	-	-	-
4.28	535	1.7113	-	-	-	-	-
4.32	540	0.8389	-	-	-	-	-
4.36	545	0.3117	-	-	-	-	-
4.4	550	0.3144	-	-	-	-	-
4.44	555	1.4694	-	-	-	-	-
4.48	560	1.3233	-	-	-	-	-
4.52	565	0.792	-	-	-	-	-
4.56	570	0.4881	-	-	-	-	-
4.6	575	0.5097	-	-	-	-	-
4.64	580	1.6377	-	-	-	-	-
4.68	585	0.7273	-	-	-	-	-
4.72	590	1.5464	-	-	-	-	-
4.76	595	1.4392	-	-	-	-	-
4.8	600	1.4384	-	-	-	-	-
4.84	605	0.6375	-	-	-	-	-
4.88	610	1.0528	-	-	-	-	-
4.92	615	0.0276	-	-	-	-	-
4.96	620	0.9604	-	-	-	-	-
5.0	625	0.7219	0.9475	0.9446	0.9378	0.9397	0.9342

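The saved checkpoint corresponds to the final row (epoch 5.0, step 625), whose cosine_map@100 values match those reported in the Evaluation section.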

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.42.4
PyTorch: 2.3.1+cu121
Accelerate: 0.32.1
Datasets: 2.21.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.1B params (Safetensors, tensor type F32)
