Instructions to use johngiorgi/declutr-sci-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use johngiorgi/declutr-sci-base with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("johngiorgi/declutr-sci-base")

    sentences = [
        "That is a happy person",
        "That is a happy dog",
        "That is a very happy person",
        "Today is a sunny day",
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)  # [4, 4]

- Notebooks
- Google Colab
- Kaggle
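For readers wondering what the `similarity` call in the sentence-transformers snippet above returns: assuming the library's default setting (cosine similarity), it is a square matrix of pairwise cosine similarities between the embeddings. The sketch below reproduces that computation in plain Python on toy vectors. The function name `cosine_similarity_matrix` and the toy data are illustrative assumptions, not part of the library or the model.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return pairwise cosine similarities between the rows of `embeddings`."""
    # Euclidean norm of each row.
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    # Dot product of every pair of rows, divided by the product of their norms.
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(embeddings, norms)
        ]
        for u, nu in zip(embeddings, norms)
    ]


# Toy 3-dimensional vectors standing in for real sentence embeddings (hypothetical data).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 0.0],
]
similarities = cosine_similarity_matrix(toy_embeddings)
print(len(similarities), len(similarities[0]))
```

With real embeddings the input would be the output of `model.encode(sentences)`, and the result would have one row and one column per sentence.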
Tokens
#1
by Recognizeme - opened
Can you tell me if you are still developing the model?
Are you looking to increase the number of tokens?
I am no longer actively developing the model, but it would be pretty straightforward to train it on more tokens! See: https://github.com/JohnGiorgi/DeCLUTR. Based on the results in the paper, I would expect increasing the size of the training set to have a large positive effect on performance.
It's a real shame. Your model is one of the best for getting embeddings of scientific texts!