This model provides a Tigre–English translation quality checker built on a fine-tuned SONAR encoder. It embeds Tigre and English text in a shared space and scores each pair by the cosine similarity of the two embeddings. The result is a fast, lightweight tool for filtering parallel data, validating translations, and supporting Tigre–English NLP workflows.
Install the dependencies with `pip install transformers torch`.
```python
from transformers import AutoTokenizer, M2M100ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Tigre-trained encoder and its tokenizer
model_id = "BeitTigreAI/tigre-sonar-encoder"
seq2seq = M2M100ForConditionalGeneration.from_pretrained(model_id)
encoder = seq2seq.get_encoder().to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

@torch.inference_mode()
def embed(texts, lang):
    # Mean-pool encoder states over non-padding tokens, then L2-normalize
    tokenizer.src_lang = lang
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
    out = encoder(**batch, return_dict=True)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp_min(1.0)
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

def score_pair(tig, eng):
    # Dot product of unit-length embeddings = cosine similarity, scaled by 100
    t = embed([tig], "tig_Ethi")
    e = embed([eng], "eng_Latn")
    sim = float((t*e).sum())
    return round(sim*100, 1)

print(score_pair("እት እድንየ እግል ትርኤ ተሐዜዮ ተቅዪር ግበእ", "Be the change that you wish to see in the world"))
print(score_pair("ክል ዶል ኢገብእ መስል እስከ ይከለስ", "It always seems impossible until it's done"))
```
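For filtering parallel data, one possible approach is to keep only the pairs whose score clears a cutoff. The sketch below is illustrative, not part of the model's API: `filter_pairs` is a hypothetical helper, and the default threshold of `70.0` is an assumed placeholder that would need tuning on held-out data. It takes the scorer as an argument, so `score_pair` from the snippet above can be passed in directly.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (Tigre sentence, English sentence)


def filter_pairs(
    pairs: Iterable[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float = 70.0,  # assumed cutoff, not a recommended value
) -> List[Tuple[str, str, float]]:
    """Return (tigre, english, score) for pairs scoring at least `threshold`."""
    kept = []
    for tig, eng in pairs:
        score = score_fn(tig, eng)
        if score >= threshold:
            kept.append((tig, eng, score))
    # Highest-scoring pairs first, so the most reliable ones lead the list
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept
```

With the model loaded, `filter_pairs(candidate_pairs, score_pair)` would return the retained pairs together with their scores, which makes it easy to inspect borderline cases before settling on a threshold.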
Base model: facebook/SONAR