How to use nadrajak/allenai-specter-ft2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nadrajak/allenai-specter-ft2")
sentences = [
"A structure with a massive group in its center and a cool expansion outflow outside is studied around the Cen A galaxy with the use of the Hubble Space Telescope observations. It is demonstrated that the dynamics of the flow is dominated by the antigravity of the dark energy background. The density of dark energy in the cell is estimated to be near the global cosmological density. This agrees with our previous result from the neighborhood of the Local group. A notion of the ``Hubble cell'' is introduced as a building block of the local structure of the universe.",
"Today's computer-based annotation systems implement a wide range of functionalities that often go beyond those available in traditional paper-and-pencil annotations. Conceptually, annotation systems are based on thoroughly investigated psycho-sociological and pedagogical learning theories. They offer a huge diversity of annotation types that can be placed in textual as well as in multimedia format. Additionally, annotations can be published or shared with a group of interested parties via well-organized repositories. Although highly sophisticated annotation systems exist both conceptually as well as technologically, we still observe that their acceptance is somewhat limited. In this paper, we argue that nowadays annotation systems suffer from several fundamental problems that are inherent in the traditional paper-and-pencil annotation paradigm. As a solution, we propose to shift the annotation paradigm for the implementation of annotation system.",
"By providing sensitive sub-arcsecond images and integral field spectroscopy in the 25 - 400 micron wavelength range, a far-IR interferometer will revolutionize our understanding of planetary system formation, reveal otherwise-undetectable planets through the disk perturbations they induce, and spectroscopically probe the atmospheres of extrasolar giant planets in orbits typical of most of the planets in our solar system. The technical challenges associated with interferometry in the far-IR are greatly relaxed relative to those encountered at shorter wavelengths or when starlight nulling is required. A structurally connected far-IR interferometer with a maximum baseline length of 36 m can resolve the interesting spatial structures in nascent and developed exoplanetary systems and measure exozodiacal emission at a sensitivity level critical to TPF-I mission planning. The Space Infrared Interferometric Telescope was recommended in the Community Plan for Far-IR/Submillimeter Space Astronomy, studied as a Probe-class mission, and estimated to cost 800M dollars. The scientific communities in Europe, Japan, and Canada have also demonstrated a keen interest in far-IR interferometry through mission planning workshops and technology research, suggesting the possibility of an international collaborative effort.",
"Late long-term potentiation (L-LTP) appears essential for the formation of long-term memory, with memories at least partly encoded by patterns of strengthened synapses. How memories are preserved for months or years, despite molecular turnover, is not well understood. Ongoing recurrent neuronal activity, during memory recall or during sleep, has been hypothesized to preferentially potentiate strong synapses, preserving memories. This hypothesis has not been evaluated in the context of a mathematical model representing biochemical pathways important for L-LTP. I incorporated ongoing activity into two such models: a reduced model that represents some of the essential biochemical processes, and a more detailed published model. The reduced model represents synaptic tagging and gene induction intuitively, and the detailed model adds activation of essential kinases by Ca. Ongoing activity was modeled as continual brief elevations of [Ca]. In each model, two stable states of synaptic weight resulted. Positive feedback between synaptic weight and the amplitude of ongoing Ca transients underlies this bistability. A tetanic or theta-burst stimulus switches a model synapse from a low weight to a high weight stabilized by ongoing activity. Bistability was robust to parameter variations. Simulations illustrated that prolonged decreased activity reset synapses to low weights, suggesting a plausible forgetting mechanism. However, episodic activity with shorter inactive intervals maintained strong synapses. Both models support experimental predictions. Tests of these predictions are expected to further understanding of how neuronal activity is coupled to maintenance of synaptic strength."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from sentence-transformers/allenai-specter. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
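The Pooling module above has only pooling_mode_cls_token enabled, so each sentence embedding is the vector of the first ([CLS]) token. A minimal sketch of that step follows; it uses plain Python lists and a toy 4-dimensional batch instead of the model's 768 dimensions, and the helper name cls_pool is hypothetical, not part of the library.

```python
def cls_pool(token_embeddings):
    """Return the first ([CLS]) token vector of every sequence in the batch."""
    return [sequence[0] for sequence in token_embeddings]


# Toy batch: 2 sequences, 3 tokens each, 4 dimensions per token.
batch = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]

# One vector per sequence, taken from token position 0.
sentence_embeddings = cls_pool(batch)
```

Because only the first token is kept, the other token vectors affect the result only through the transformer's attention layers, not through the pooling step itself.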
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("nadrajak/allenai-specter-ft2")
# Run inference
sentences = [
"We describe azimuth structure commonly associated with elliptic and directed flow in the context of 2D angular autocorrelations for the purpose of precise separation of so-called nonflow (mainly minijets) from flow. We extend the Fourier-transform description of azimuth structure to include power spectra and autocorrelations related by the Wiener-Khintchine theorem. We analyze several examples of conventional flow analysis in that context and question the relevance of reaction plane estimation to flow analysis. We introduce the 2D angular autocorrelation with examples from data analysis and describe a simulation exercise which demonstrates precise separation of flow and nonflow using the 2D autocorrelation method. We show that an alternative correlation measure based on Pearson's normalized covariance provides a more intuitive measure of azimuth structure.",
'It is a brief review on composing and solving Infrared Evolution Equations. They can be used in order to calculate amplitudes of high-energy reactions in different kinematic regions in the double-logarithmic approximation.',
'Moeller\'s energy-momentum complex is employed in order to determine the energy and momentum distributions for a spacetime described by a "generalized Schwarzschild" geometry in (3+1)-dimensions on a noncommutative curved D3-brane in an effective, open bosonic string theory. The geometry considered is obtained by an effective theory of gravity coupled with a nonlinear electromagnetic field and depends only on the generalized (effective) mass and charge which incorporate corrections of first order in the noncommutativity parameter.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
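To make the [3, 3] shape concrete, here is a rough pure-Python sketch of a pairwise similarity matrix. It assumes the similarity function is cosine similarity, which this card does not state explicitly; the function names and the 2-dimensional toy vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Compare every embedding with every other one: N inputs give an N x N matrix."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Three toy "embeddings" in 2 dimensions (the real ones have 768).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
```

The diagonal holds each vector's similarity with itself, and the matrix is symmetric, so only the upper triangle carries distinct information.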
triplet_eval (TripletEvaluator)
| Metric | Value |
|---|---|
| cosine_accuracy | 0.949 |
triplet_eval (TripletEvaluator)
| Metric | Value |
|---|---|
| cosine_accuracy | 0.947 |
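As a sketch of what cosine_accuracy measures, the snippet below counts the fraction of triplets in which the anchor is more cosine-similar to the positive than to the negative. This is an assumption about the metric's definition written in plain Python with toy vectors; it is not the evaluator's actual code, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets):
    """Share of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)


toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),  # negative is closer: counted as wrong
]
accuracy = triplet_cosine_accuracy(toy_triplets)
```

Under this reading, a value of 0.947 would mean that roughly 947 of every 1,000 evaluation triplets are ranked correctly.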
Columns: anchor, positive, and negative
| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

| anchor | positive | negative |
|---|---|---|
| We study the notion of the scaled entropy of a filtration of $\sigma$-fields (= decreasing sequence of $\sigma$-fields) introduced by the first author ({V4}). We suggest a method for computing this entropy for the sequence of $\sigma$-fields of pasts of a Markov process determined by a random walk over the trajectories of a Bernoulli action of a commutative or nilpotent countable group (Theorems | In this paper we complete a classification of finite linear spaces $\cS$ with line size at most 12 admitting a line-transitive point-imprimitive subgroup of automorphisms. The examples are the Desarguesian projective planes of orders $4,7, 9$ and 11, two designs on 91 points with line size 6, and 467 designs on 729 points with line size 8. | We show that the combined data from solar, long-baseline and reactor neutrino experiments can exclude the generalized bicycle model of Lorentz noninvariant direction-dependent and/or direction-independent oscillations of massless neutrinos. This model has five parameters, which is more than is needed in standard oscillation phenomenology with neutrino masses. Solar data alone are sufficient to exclude the pure direction-dependent case. The combination of solar and long-baseline data rules out the pure direction-independent case. With the addition of KamLAND data, a mixture of direction-dependent and direction-independent terms in the effective Hamiltonian is also excluded. |
| We discuss a numerical model for black hole growth and its associated feedback processes that for the first time allows cosmological simulations of structure formation to self-consistently follow the build up of the cosmic population of galaxies and active galactic nuclei. Our model assumes that seed black holes are present at early cosmic epochs at the centres of forming halos. We then track their growth from gas accretion and mergers with other black holes in the course of cosmic time. For black holes that are active, we distinguish between two distinct modes of feedback, depending on the black hole accretion rate itself. Black holes that accrete at high rates are assumed to be in a `quasar regime', where we model their feedback by thermally coupling a small fraction of their bolometric luminosity to the surrounding gas. For black holes with low accretion rates, we conjecture that most of their feedback occurs in mechanical form, where AGN-driven bubbles are injected into a gaseous e... | Context: L'-band (3.8 micron) images of the Galactic Center show a large number of thin filaments in the mini-spiral, located west of the mini-cavity and along the inner edge of the Northern Arm. One possible mechanism that could produce such structures is the interaction of a central wind with the mini-spiral. Additionally, we identify similar features that appear to be associated with stars. Aims: We present the first proper motion measurements of the thin dust filaments observed in the central parsec around SgrA* and investigate possible mechanisms that could be responsible for the observed motions. Methods: The observations have been carried out using the NACO adaptive optics system at the ESO VLT. The images have been transformed to a common coordinate system and features of interest were extracted. Then a cross-correlation technique could be performed in order to determine the offsets between the features with respect to their position in the reference epoch. Results: We derive t... | We consider a social system of interacting heterogeneous agents with learning abilities, a model close to Random Field Ising Models, where the random field corresponds to the idiosyncratic willingness to pay. Given a fixed price, agents decide repeatedly whether to buy or not a unit of a good, so as to maximize their expected utilities. We show that the equilibrium reached by the system depends on the nature of the information agents use to estimate their expected utilities. |
| Low-energy dipole excitations have been investigated theoretically in N=50, several N=82 isotones and the Z=50 Sn isotopes. For this purpose a method incorporating both HFB and multi-phonon QPM theory is applied. A concentration of one-phonon dipole strength located below the neutron emission threshold has been calculated in these nuclei. The analysis of the corresponding neutron and proton dipole transition densities allows to assign a genuine pattern to the low-energy excitations and making them distinct from the conventional GDR modes. Analyzing also the QRPA wave functions of the states we can identify these excitations as Pygmy Dipole Resonance (PDR) modes, recently studied also in Sn and N=82 nuclei. The results for N=50 are exploratory for an experimental project designed for the bremsstrahlung facility at the ELBE accelerator. | The NA60 experiment is a fixed-target experiment at the CERN SPS. It has measured the dimuon yield in Indium--Indium collisions with an In beam of 158 AGeV/c and in p-A collisions with a proton beam of 400 and 158 AGeV/c. The results allow to address three important physics topics, namely the study of the rho spectral function in nuclear collisions, the clarification of the origin of the dimuon excess measured by NA50 in the intermediate mass range, and the J/psi suppression pattern in a collision system different from Pb-Pb. An overview of these results will be given in this paper. | We examine the stability of a trapped dipolar condensate mixed with a single-component fermion gas at T=0. Whereas pure dipolar condensates with small s-wave interaction are unstable even for small dipole-dipole interaction strength, we find that the admixture of fermions can significantly stabilize them, depending on the strength of the boson-fermion interaction. Within the stable regime we find a region where a ground state is characterized by a density wave along the soft trap direction. |

TripletLoss with these parameters:
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
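The parameters above can be read as a hinge on Euclidean distances: the loss is zero once the negative is at least triplet_margin farther from the anchor than the positive. The snippet below is a per-triplet sketch of that formula in plain Python, max(d(a, p) - d(a, n) + margin, 0); batch averaging is left out, and the function names are illustrative rather than the library's.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge loss on the gap between anchor-positive and anchor-negative distance."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]  # distance 5 from the anchor

# Negative at distance 10: 5 - 10 + 5 = 0, so the margin is just satisfied.
satisfied = triplet_loss(anchor, positive, [6.0, 8.0])

# Negative at distance 1: 5 - 1 + 5 = 9, so the triplet is penalised.
violated = triplet_loss(anchor, positive, [0.0, 1.0])
```

With a margin of 5 on unnormalised 768-dimensional embeddings, the loss pushes for an absolute distance gap rather than a gap in cosine similarity, which is worth keeping in mind when comparing the training loss with the cosine-based evaluation metric.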
Columns: anchor, positive, and negative
| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

| anchor | positive | negative |
|---|---|---|
| We disprove a conjecture of A. Koldobsky asking whether it is enough to compare $(n-2)$-derivatives of the projection functions of two symmetric convex bodies in the Shephard problem in order to get a positive answer in all dimensions. | The projective degrees of strict partitions of n were computed for all n < 101 and the partitions with maximal projective degree were found for each n. It was observed that maximizing partitions for successive values of n "lie close to each other" in a certain sense. Conjecturing that this holds for larger values of n, the partitions of maximal degree were computed for all n < 221. The results are consistent with a recent conjecture on the limiting shape of the strict partition of maximal projective degree. | In [1] was considered the superintegrable system which describes the magnetic dipole with spin 1/2 (neutron) in the field of linear current. Here we present its generalization for any spin which preserves superintegrability. The dynamical symmetry stays the same as it is for spin 1/2. |
| We develop a method for measuring and localizing homology classes. This involves two problems. First, we define relevant notions of size for both a homology class and a homology group basis, using ideas from relative homology. Second, we propose an algorithm to compute the optimal homology basis, using techniques from persistent homology and finite field algebra. Classes of the computed optimal basis are localized with cycles conveying their sizes. The algorithm runs in $O(\beta^4 n^3 \log^2 n)$ time, where $n$ is the size of the simplicial complex and $\beta$ is the Betti number of the homology group. | We consider two-way wire-tap channels, where two users are communicating with each other in the presence of an eavesdropper, who has access to the communications through a multiple-access channel. We find achievable rates for two different scenarios, the Gaussian two-way wire-tap channel, (GTW-WT), and the binary additive two-way wire-tap channel, (BATW-WT). It is shown that the two-way channels inherently provide a unique advantage for wire-tapped scenarios, as the users know their own transmitted signals and in effect help encrypt the other user's messages, similar to a one-time pad. We compare the achievable rates to that of the Gaussian multiple-access wire-tap channel (GMAC-WT) to illustrate this advantage. | We report quantitative relations between corruption level and economic factors, such as country wealth and foreign investment per capita, which are characterized by a power law spanning multiple scales of wealth and investments per capita. These relations hold for diverse countries, and also remain stable over different time periods. We also observe a negative correlation between level of corruption and long-term economic growth. We find similar results for two independent indices of corruption, suggesting that the relation between corruption and wealth does not depend on the specific measure of corruption. The functional relations we report have implications when assessing the relative level of corruption for two countries with comparable wealth, and for quantifying the impact of corruption on economic growth and foreign investments. |
| The paper addresses the space-frequency correlations of electromagnetic waves in general random, bi-anisotropic media whose constitutive tensors are complex Hermitian matrices. The two-frequency Wigner distribution (2f-WD) for polarized waves is introduced to describe the space-frequency correlations and the closed form Wigner-Moyal equation is derived from the Maxwell equations. Two-frequency radiative transfer (2f-RT) equations is then derived from the Wigner-Moyal equation by using the multiple scale expansion. For the simplest isotropic medium, the result coincides with Chandrasekhar's transfer equation. In birefringent media, the 2f-RT equations take the scalar form due to the absence of depolarization. A number of birefringent media such as the chiral, uniaxial and gyrotropic media are examined. For the unpolarized wave in the isotropic medium the 2f-RT equations reduces to the Fokker-Planck equation previously derived in Part I. A similar Fokker-Planck equation is derived from t... | In this paper, it is shown that the cosmological model that was introduced in a sequence of three earlier papers under the title, A Dust Universe Solution to the Dark Energy Problem, can be used to resolve the problem of the great mismatch of numerical values between dark energy from cosmology and zero point energy from quantum theory. It is shown that, if the zero point energies for the cosmic microwave background and for all the rest of the universe that is not cosmic microwave background are introduced into this model as two entities, their separate values appear within this theory in the form of a numerical difference. It is this difference that gives the numerical value for the zero point value of Einstein's dark energy density. Consequently, although the two zero point energies may be large, their difference can give the known small dark energy value from cosmology for dark energy density. Issues relating to interpretation, calculation and measurement associated with this result ... | We demonstrate spin injection into a graphene thin film with high reliability by using non-local magnetoresistance (MR) measurements, in which the electric current path is completely separated from the spin current path. Using these non-local measurements, an obvious MR effect was observed at room temperature; and the MR effect was ascribed to magnetization reversal of ferromagnetic electrodes. This result is a direct demonstration of spin injection into a graphene thin film. Furthermore, this is the first report of spin injection into molecules at room temperature. |

TripletLoss with these parameters:
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
Non-default hyperparameters:
eval_strategy: steps
gradient_accumulation_steps: 2
learning_rate: 2e-05
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: True
dataloader_num_workers: 1
dataloader_pin_memory: False

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 1
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: False
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | Validation Loss | triplet_eval_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.8210 |
| 0.8 | 500 | 2.5031 | 0.7956 | 0.9410 |
| 1.6 | 1000 | 1.0464 | 0.7594 | 0.9450 |
| 2.4 | 1500 | 0.5218 | 0.7086 | 0.9480 |
| -1 | -1 | - | - | 0.9470 |
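The batch settings and the training log together allow a back-of-the-envelope estimate of the training set size. The arithmetic below is only a sketch: it assumes a single training device (the card does not say how many were used) and treats the logged epoch values as rounded, so the results are approximate.

```python
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_train_epochs = 3
num_devices = 1  # assumption: the device count is not stated in the card

# Triplets consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# The log reports step 500 at epoch 0.8 (and step 1500 at epoch 2.4).
steps_per_epoch = round(500 / 0.8)

# Rough size of the training set and length of the whole run.
approx_train_triplets = steps_per_epoch * effective_batch_size
total_steps = steps_per_epoch * num_train_epochs
```

Under these assumptions the run amounts to about 625 optimizer steps per epoch, roughly 10,000 training triplets, and about 1,875 steps in total.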
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Base model
sentence-transformers/allenai-specter