BGE base Financial Matryoshka
	
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
	
		
	
	
		Model Details
	
	
		
	
	
		Model Description
	
- Model Type: Sentence Transformer
 
- Base model: BAAI/bge-base-en-v1.5 
 
- Maximum Sequence Length: 512 tokens
 
- Output Dimensionality: 768 dimensions
 
- Similarity Function: Cosine Similarity
 
- Language: en
 
- License: apache-2.0
 
	
		
	
	
		Model Sources
	
	
		
	
	
		Full Model Architecture
	
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
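The architecture above uses the `[CLS]` token for pooling (`pooling_mode_cls_token: True`) and then applies `Normalize()`. A minimal pure-Python sketch of those two steps, assuming the first token vector is the `[CLS]` position; the helper name `cls_pool_and_normalize` and the toy 4-dimensional vectors are illustrative and not part of the library:

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Return the L2-normalized vector of the first ([CLS]) token.

    token_embeddings: list of per-token vectors for one input text.
    Assumption: index 0 holds the [CLS] token's hidden state.
    """
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional hidden states (the real model uses 768).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.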
	
		
	
	
		Usage
	
	
		
	
	
		Direct Usage (Sentence Transformers)
	
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("xiaofengzi/bge-base-financial-matryoshka")

# Encode a few example sentences
sentences = [
    'What is basic earnings per share based on?',
    'How is basic net income per share calculated?',
    "How did NIKE's fiscal 2023 revenue compare to its fiscal 2022 revenue?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])
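Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, the leading components of an embedding should remain usable on their own. The sketch below shows the idea on toy vectors: keep the first `dim` values, rescale to unit length, and compare. The helper names and the 8-dimensional vectors are illustrative assumptions. In Sentence Transformers itself, the `truncate_dim` argument of `SentenceTransformer` is the usual way to request shorter embeddings.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0.0 else head

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional stand-ins for full 768-dimensional embeddings.
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2]
passage = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, -0.2, -0.1]

# Compare the same pair at progressively smaller sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))
```

Smaller sizes trade some retrieval quality for less storage and faster search; the evaluation tables in this card quantify that trade-off for this model.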
	
		
	
	
		Evaluation
	
	
		
	
	
		Metrics
	
	
		
	
	
		Information Retrieval (dim_768)
	
	
		
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6829 |
| cosine_accuracy@3 | 0.82 |
| cosine_accuracy@5 | 0.8629 |
| cosine_accuracy@10 | 0.91 |
| cosine_precision@1 | 0.6829 |
| cosine_precision@3 | 0.2733 |
| cosine_precision@5 | 0.1726 |
| cosine_precision@10 | 0.091 |
| cosine_recall@1 | 0.6829 |
| cosine_recall@3 | 0.82 |
| cosine_recall@5 | 0.8629 |
| cosine_recall@10 | 0.91 |
| cosine_ndcg@10 | 0.7971 |
| cosine_mrr@10 | 0.7608 |
| cosine_map@100 | 0.7644 |
	
 
	
		
	
	
		Information Retrieval (dim_512)
	
	
		
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.68 |
| cosine_accuracy@3 | 0.8186 |
| cosine_accuracy@5 | 0.8657 |
| cosine_accuracy@10 | 0.9114 |
| cosine_precision@1 | 0.68 |
| cosine_precision@3 | 0.2729 |
| cosine_precision@5 | 0.1731 |
| cosine_precision@10 | 0.0911 |
| cosine_recall@1 | 0.68 |
| cosine_recall@3 | 0.8186 |
| cosine_recall@5 | 0.8657 |
| cosine_recall@10 | 0.9114 |
| cosine_ndcg@10 | 0.7955 |
| cosine_mrr@10 | 0.7584 |
| cosine_map@100 | 0.7618 |
	
 
	
		
	
	
		Information Retrieval (dim_256)
	
	
		
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6786 |
| cosine_accuracy@3 | 0.81 |
| cosine_accuracy@5 | 0.8571 |
| cosine_accuracy@10 | 0.9143 |
| cosine_precision@1 | 0.6786 |
| cosine_precision@3 | 0.27 |
| cosine_precision@5 | 0.1714 |
| cosine_precision@10 | 0.0914 |
| cosine_recall@1 | 0.6786 |
| cosine_recall@3 | 0.81 |
| cosine_recall@5 | 0.8571 |
| cosine_recall@10 | 0.9143 |
| cosine_ndcg@10 | 0.7955 |
| cosine_mrr@10 | 0.7578 |
| cosine_map@100 | 0.7606 |
	
 
	
		
	
	
		Information Retrieval (dim_128)
	
	
		
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6571 |
| cosine_accuracy@3 | 0.7957 |
| cosine_accuracy@5 | 0.8471 |
| cosine_accuracy@10 | 0.9 |
| cosine_precision@1 | 0.6571 |
| cosine_precision@3 | 0.2652 |
| cosine_precision@5 | 0.1694 |
| cosine_precision@10 | 0.09 |
| cosine_recall@1 | 0.6571 |
| cosine_recall@3 | 0.7957 |
| cosine_recall@5 | 0.8471 |
| cosine_recall@10 | 0.9 |
| cosine_ndcg@10 | 0.7771 |
| cosine_mrr@10 | 0.7379 |
| cosine_map@100 | 0.742 |
	
 
	
		
	
	
		Information Retrieval (dim_64)
	
	
		
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6343 |
| cosine_accuracy@3 | 0.7643 |
| cosine_accuracy@5 | 0.8057 |
| cosine_accuracy@10 | 0.8657 |
| cosine_precision@1 | 0.6343 |
| cosine_precision@3 | 0.2548 |
| cosine_precision@5 | 0.1611 |
| cosine_precision@10 | 0.0866 |
| cosine_recall@1 | 0.6343 |
| cosine_recall@3 | 0.7643 |
| cosine_recall@5 | 0.8057 |
| cosine_recall@10 | 0.8657 |
| cosine_ndcg@10 | 0.7467 |
| cosine_mrr@10 | 0.7091 |
| cosine_map@100 | 0.7142 |
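
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.82 / 3 ≈ 0.2733). That pattern is what one would expect if each query has exactly one relevant passage, though the card does not state this explicitly. Under that assumption, the per-query metrics reduce to the following sketch; the function names and the toy rankings are illustrative only:

```python
import math

def retrieval_metrics(ranked_ids, relevant_id, k):
    """Top-k metrics for one query, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": (1.0 if hit else 0.0) / k,  # at most 1 relevant item in k slots
        "recall": 1.0 if hit else 0.0,           # the single relevant item was found or not
        "mrr": 1.0 / rank if hit else 0.0,
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,  # ideal DCG is 1
    }

def mean_metrics(results):
    """Average each metric over all queries."""
    return {name: sum(r[name] for r in results) / len(results) for name in results[0]}

# Toy rankings: (ranked document ids, id of the single relevant document).
queries = [
    (["d1", "d2", "d3"], "d1"),  # relevant at rank 1
    (["d5", "d2", "d9"], "d2"),  # relevant at rank 2
    (["d7", "d8", "d4"], "d3"),  # relevant not retrieved
]
summary = mean_metrics([retrieval_metrics(r, rel, k=3) for r, rel in queries])
print(summary)
```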
	
 
	
		
	
	
		Training Details
	
	
		
	
	
		Training Dataset
	
	
		
	
	
		Unnamed Dataset
	
- Size: 6,300 training samples
 
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
	
		
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 7 tokens, mean: 20.44 tokens, max: 51 tokens | min: 6 tokens, mean: 47.22 tokens, max: 512 tokens |
	
 
 
- Samples:
	
		
| anchor | positive |
|---|---|
| How did the Energy & Transportation segment's sales and profit change in 2023? | Energy & Transportation's total sales were $28.001 billion in 2023, an increase of $4.249 billion, or 18... and profit was $4.936 billion in 2023, an increase of $1.627 billion, or 49 percent... |
| In which segments were acquisitions made in 2022? | During 2022, acquisitions occurred in Workforce Solutions and USIS operating segments, and the International segment. |
| What are the contents found on pages 163 to 309 in the document? | The Consolidated Financial Statements, together with the Notes thereto and the report thereon dated February 16, 2024, of PricewaterhouseCoopers LLP, appear on pages 163–309. |
	
 
 
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
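
To illustrate how this configuration combines the two losses, here is a simplified pure-Python sketch: for each size in `matryoshka_dims`, both embeddings are truncated, an in-batch ranking loss is computed, and the results are summed with `matryoshka_weights`. The helper names and toy vectors are illustrative, and the similarity scale of 20.0 is an assumption taken from the library's usual default for MultipleNegativesRankingLoss; the real implementation operates on batched tensors inside the training loop.

```python
import math

def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch cross-entropy: anchor i should score positive i highest.

    The other positives in the batch act as negatives for anchor i.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(a_i, p_j)) for p_j in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding sizes."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

With equal weights, as in the configuration above, every truncation size contributes equally to the gradient.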
 
	
		
	
	
		Training Hyperparameters
	
	
		
	
	
		Non-Default Hyperparameters
	
eval_strategy: epoch 
per_device_train_batch_size: 32 
per_device_eval_batch_size: 16 
gradient_accumulation_steps: 16 
learning_rate: 2e-05 
num_train_epochs: 4 
lr_scheduler_type: cosine 
warmup_ratio: 0.1 
bf16: True 
tf32: True 
load_best_model_at_end: True 
optim: adamw_torch_fused 
batch_sampler: no_duplicates 
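
The combination of `lr_scheduler_type: cosine`, `warmup_ratio: 0.1`, and `learning_rate: 2e-05` implies a learning rate that ramps up linearly and then decays along a half cosine. The sketch below approximates that shape; the total of 48 optimizer steps is taken from the last row of the training log, and the exact rounding of the warmup length is an assumption rather than a confirmed detail of the trainer.

```python
import math

def cosine_lr_with_warmup(step, total_steps=48, base_lr=2e-5, warmup_ratio=0.1):
    """Approximate learning rate at a given optimizer step.

    Assumption: warmup lasts ceil(total_steps * warmup_ratio) steps,
    followed by a single half-cosine decay to zero.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Print the schedule at a few of the steps that appear in the training log.
for step in (0, 5, 10, 24, 48):
    print(step, cosine_lr_with_warmup(step))
```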
	
		
	
	
		All Hyperparameters
	
overwrite_output_dir: False 
do_predict: False 
eval_strategy: epoch 
prediction_loss_only: True 
per_device_train_batch_size: 32 
per_device_eval_batch_size: 16 
per_gpu_train_batch_size: None 
per_gpu_eval_batch_size: None 
gradient_accumulation_steps: 16 
eval_accumulation_steps: None 
learning_rate: 2e-05 
weight_decay: 0.0 
adam_beta1: 0.9 
adam_beta2: 0.999 
adam_epsilon: 1e-08 
max_grad_norm: 1.0 
num_train_epochs: 4 
max_steps: -1 
lr_scheduler_type: cosine 
lr_scheduler_kwargs: {} 
warmup_ratio: 0.1 
warmup_steps: 0 
log_level: passive 
log_level_replica: warning 
log_on_each_node: True 
logging_nan_inf_filter: True 
save_safetensors: True 
save_on_each_node: False 
save_only_model: False 
restore_callback_states_from_checkpoint: False 
no_cuda: False 
use_cpu: False 
use_mps_device: False 
seed: 42 
data_seed: None 
jit_mode_eval: False 
use_ipex: False 
bf16: True 
fp16: False 
fp16_opt_level: O1 
half_precision_backend: auto 
bf16_full_eval: False 
fp16_full_eval: False 
tf32: True 
local_rank: 0 
ddp_backend: None 
tpu_num_cores: None 
tpu_metrics_debug: False 
debug: [] 
dataloader_drop_last: False 
dataloader_num_workers: 0 
dataloader_prefetch_factor: None 
past_index: -1 
disable_tqdm: False 
remove_unused_columns: True 
label_names: None 
load_best_model_at_end: True 
ignore_data_skip: False 
fsdp: [] 
fsdp_min_num_params: 0 
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} 
fsdp_transformer_layer_cls_to_wrap: None 
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} 
deepspeed: None 
label_smoothing_factor: 0.0 
optim: adamw_torch_fused 
optim_args: None 
adafactor: False 
group_by_length: False 
length_column_name: length 
ddp_find_unused_parameters: None 
ddp_bucket_cap_mb: None 
ddp_broadcast_buffers: False 
dataloader_pin_memory: True 
dataloader_persistent_workers: False 
skip_memory_metrics: True 
use_legacy_prediction_loop: False 
push_to_hub: False 
resume_from_checkpoint: None 
hub_model_id: None 
hub_strategy: every_save 
hub_private_repo: False 
hub_always_push: False 
gradient_checkpointing: False 
gradient_checkpointing_kwargs: None 
include_inputs_for_metrics: False 
eval_do_concat_batches: True 
fp16_backend: auto 
push_to_hub_model_id: None 
push_to_hub_organization: None 
mp_parameters:  
auto_find_batch_size: False 
full_determinism: False 
torchdynamo: None 
ray_scope: last 
ddp_timeout: 1800 
torch_compile: False 
torch_compile_backend: None 
torch_compile_mode: None 
dispatch_batches: None 
split_batches: None 
include_tokens_per_second: False 
include_num_input_tokens_seen: False 
neftune_noise_alpha: None 
optim_target_modules: None 
batch_eval_metrics: False 
batch_sampler: no_duplicates 
multi_dataset_batch_sampler: proportional 
 
	
		
	
	
		Training Logs
	
	
		
| Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|---|---|---|---|---|---|---|---|
| 0.8122 | 10 | 1.5644 | - | - | - | - | - |
| 0.9746 | 12 | - | 0.7186 | 0.7399 | 0.7414 | 0.6757 | 0.7445 |
| 1.6244 | 20 | 0.6502 | - | - | - | - | - |
| 1.9492 | 24 | - | 0.7379 | 0.7544 | 0.7573 | 0.7069 | 0.7600 |
| 2.4365 | 30 | 0.434 | - | - | - | - | - |
| 2.9239 | 36 | - | 0.7426 | 0.7614 | 0.7616 | 0.7134 | 0.7634 |
| 3.2487 | 40 | 0.3627 | - | - | - | - | - |
| 3.8985 | 48 | - | 0.7420 | 0.7606 | 0.7618 | 0.7142 | 0.7644 |
	
 
- The bold row denotes the saved checkpoint.
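
The fractional epoch values in the log follow from the dataset size and batch settings reported in this card. A small sketch of the arithmetic, assuming a single training device and that the final partial batch of each epoch is kept (neither is stated in the card); the function name is illustrative:

```python
import math

def epoch_at_step(optimizer_step, num_samples=6300, batch_size=32, grad_accum=16):
    """Fraction of epochs completed after a number of optimizer steps.

    Each optimizer step consumes `grad_accum` batches, and one epoch
    contains ceil(num_samples / batch_size) batches.
    """
    batches_per_epoch = math.ceil(num_samples / batch_size)  # 197 for these values
    return optimizer_step * grad_accum / batches_per_epoch

for step in (10, 12, 24, 36, 48):
    print(step, round(epoch_at_step(step), 4))
# 10 0.8122
# 12 0.9746
# 24 1.9492
# 36 2.9239
# 48 3.8985
```

Under these assumptions each optimizer step covers 32 × 16 = 512 samples, which is why an epoch of 6,300 samples takes only about 12 steps.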
 
	
		
	
	
		Framework Versions
	
- Python: 3.10.4
 
- Sentence Transformers: 3.0.0
 
- Transformers: 4.41.2
 
- PyTorch: 2.6.0+cu118
 
- Accelerate: 1.6.0
 
- Datasets: 2.19.1
 
- Tokenizers: 0.19.1
 
	
		
	
	
		Citation
	
	
		
	
	
		BibTeX
	
	
		
	
	
		Sentence Transformers
	
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
	
		
	
	
		MatryoshkaLoss
	
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
	
		
	
	
		MultipleNegativesRankingLoss
	
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}