m2v-multilingual-european
The minishlab/M2V_multilingual_output model (distilled from LaBSE), pruned to European-script tokens only.
What is this?
This is the original M2V multilingual model with all non-European-script tokens removed. The base model was distilled from LaBSE (Language-agnostic BERT Sentence Embedding, 470M params) by the MinishLab team. We pruned its vocabulary to keep only European-script tokens.
Stats
| | Before pruning | After pruning |
|---|---|---|
| Vocabulary | 501,054 tokens | 357,416 tokens |
| Model size | ~490 MB | ~350 MB |
| Embedding dim | 256 | 256 |
Pruning removed 143,638 tokens (28.7% of the vocabulary), all from non-European scripts.
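The figures in the table can be sanity-checked with a little arithmetic. The sketch below recomputes the share of removed tokens and estimates the size of the embedding matrix. It assumes the weights are stored as float32 (4 bytes per value) and that sizes are measured in MiB; neither is stated in this model card, so treat the size estimate as approximate.

```python
# Figures taken from the Stats table above.
VOCAB_BEFORE = 501_054
VOCAB_AFTER = 357_416
EMBEDDING_DIM = 256

# Assumption (not stated in the model card): float32 weights, 4 bytes each.
BYTES_PER_VALUE = 4
MIB = 1024 ** 2


def embedding_size_mib(vocab_size: int, dim: int = EMBEDDING_DIM) -> float:
    """Size of a (vocab_size x dim) embedding matrix in MiB."""
    return vocab_size * dim * BYTES_PER_VALUE / MIB


removed = VOCAB_BEFORE - VOCAB_AFTER
removed_pct = 100 * removed / VOCAB_BEFORE

print(f"removed tokens: {removed:,} ({removed_pct:.1f}%)")
print(f"before: ~{embedding_size_mib(VOCAB_BEFORE):.0f} MiB")
print(f"after:  ~{embedding_size_mib(VOCAB_AFTER):.0f} MiB")
```

Under these assumptions the embedding matrix alone accounts for nearly all of the reported model size, which is expected for a static embedding model.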
Usage
from model2vec import StaticModel
model = StaticModel.from_pretrained("flipbitsnotburgers/m2v-multilingual-european")
embeddings = model.encode(["deodorant", "Duschgel", "shower gel"])
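A common next step is to compare the returned vectors, for example to check whether "Duschgel" and "shower gel" land close together. The helper below is a minimal, dependency-free sketch of cosine similarity; the function name `cosine_similarity` is illustrative and not part of Model2Vec. It works on any two equal-length sequences of numbers, including the rows returned by `model.encode`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal

# With the `embeddings` array from the usage snippet (one row per input):
#   cosine_similarity(embeddings[1], embeddings[2])  # "Duschgel" vs "shower gel"
```

For large batches a vectorized NumPy version would be faster; the pure-Python form is used here only to keep the example self-contained.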
Pruned scripts
The following scripts were removed:
- CJK (Chinese, Japanese Kanji)
- Hangul (Korean)
- Hiragana & Katakana (Japanese)
- Arabic
- Hebrew
- Thai, Lao
- Devanagari, Bengali, Tamil, Telugu, and other Indic scripts
- Myanmar, Ethiopic, Tibetan, Khmer
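To illustrate what script-based filtering can look like, here is a hedged sketch that flags a token if any of its characters belongs to one of the scripts named above. This is not the procedure actually used to build this model, which the card does not describe. The helper `uses_pruned_script` and the prefix list are hypothetical, and the list covers only the scripts named explicitly, not the "other Indic scripts". The check relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical prefix list: Unicode character names for the scripts named in
# the list above start with these words. "Other Indic scripts" are not covered.
PRUNED_NAME_PREFIXES = (
    "CJK", "HANGUL", "HIRAGANA", "KATAKANA", "ARABIC", "HEBREW",
    "THAI", "LAO", "DEVANAGARI", "BENGALI", "TAMIL", "TELUGU",
    "MYANMAR", "ETHIOPIC", "TIBETAN", "KHMER",
)


def uses_pruned_script(token: str) -> bool:
    """Return True if any character in the token is from a listed script."""
    for ch in token:
        # Characters without a name (e.g. some control codes) yield "".
        name = unicodedata.name(ch, "")
        if name.startswith(PRUNED_NAME_PREFIXES):
            return True
    return False


for token in ["Duschgel", "привет", "ελληνικά", "中文", "한국어", "שלום"]:
    print(token, uses_pruned_script(token))
```

Latin, Cyrillic, and Greek tokens pass through unflagged, while tokens containing CJK, Hangul, or Hebrew characters are flagged. A production filter would more likely use the Unicode Script property than name prefixes.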
License
MIT (same as base model)