SETU - Script-agnostic English Translation Unifier
SETU is a neural translation model that unifies multiscript, multilingual, and informal text into clean, formal English.
Model Description
The SETU model can handle:
- Romanized Nepali to English translation
- Devanagari Nepali to English translation
- Code-mixed text to English translation
- Informal/slang to formal English translation
Try It Out
🚀 Interactive Demo: Try SETU in Google Colab: https://colab.research.google.com/drive/1KdLiLtAKGK8_XLyFlEwSqGFPZZqGwl4n?usp=sharing
Installation
Ensure that you have transformers and onnxruntime installed:
pip install transformers onnxruntime
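To confirm the installation before loading the model, one option is to check that the packages can be found by Python. This is a minimal sketch, not part of SETU: the helper name `missing_modules` is hypothetical, and listing `sentencepiece` is an assumption based on the tokenizer named under Technical Implementation.

```python
import importlib.util
from typing import Iterable, List

# Import names to look for. "sentencepiece" is an assumption: the install
# command above does not list it, but the model card names it as the tokenizer.
REQUIRED = ["transformers", "onnxruntime", "sentencepiece"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If the script reports a missing package, installing it with pip under the same name should normally resolve it.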
Usage
from transformers import AutoModel
# Load the model
model = AutoModel.from_pretrained("santoshdahal/setu", trust_remote_code=True)
# Translate text
result = model("mero name ramesh ho")
print("Translation:", result)
# Output: "My name is Ramesh."
# Works with Devanagari script too
result = model("सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।")
print("Translation:", result)
# Output: "Social media and reality are different."
# Handles informal text
result = model("what is your nam")
print("Translation:", result)
# Output: "what's your name"
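For more than a handful of inputs, a small wrapper around the call shown above may be convenient. The sketch below is illustrative only: `translate_all` is a hypothetical helper, and it assumes that calling the model with a string returns the translation as a string, as the usage example suggests. It accepts any callable, so it can be tried without downloading the model.

```python
from typing import Callable, Dict, Iterable


def translate_all(translate: Callable[[str], str], texts: Iterable[str]) -> Dict[str, str]:
    """Map each non-empty input text to its translation.

    `translate` is any callable taking one string and returning one string.
    Surrounding whitespace is stripped and empty inputs are skipped.
    """
    results: Dict[str, str] = {}
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue
        results[cleaned] = translate(cleaned)
    return results


# Hypothetical usage with the model loaded as in the example above:
#   translations = translate_all(model, ["mero name ramesh ho", "what is your nam"])
#   for source, target in translations.items():
#       print(source, "->", target)
```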
Model Details
- Model Type: Neural Machine Translation
- Architecture: Transformer
- Vocabulary Size: 40,253 tokens
- Languages Supported: Nepali (Romanized & Devanagari), English, Code-mixed text
- Model Format: ONNX for efficient inference
Technical Implementation
The model uses:
- ONNX Runtime for efficient inference
- SentencePiece for tokenization
- Beam search decoding with configurable beam size
- Separate encoder and decoder ONNX models
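To illustrate what beam search with a configurable beam size does, here is a minimal pure-Python sketch. It is not SETU's implementation: the function `beam_search`, the `toy_step` scorer, and the probability table are all hypothetical. In the real model the per-step probabilities would come from the decoder ONNX model rather than a lookup table. The sketch also omits length normalization, which practical decoders often add.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 4,
    max_len: int = 10,
    eos: str = "</s>",
) -> List[str]:
    """Return the best-scoring sequence, keeping `beam_size` hypotheses per step.

    `step_fn` maps a partial sequence to {next_token: probability}.
    Scores are sums of log-probabilities.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + math.log(prob), seq + [token])
            for score, seq in beams
            for token, prob in step_fn(seq).items()
            if prob > 0.0
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep only the top `beam_size` expansions; set aside finished ones.
        for score, seq in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    # Prefer hypotheses that reached end-of-sequence.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


# Toy next-token distributions (hypothetical, for illustration only).
TOY = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"z": 0.9, "w": 0.1},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    """Stand-in for a decoder step: end the sequence after two tokens."""
    if len(seq) >= 2:
        return {"</s>": 1.0}
    return TOY[tuple(seq)]


print(beam_search(toy_step, beam_size=1))  # greedy: commits to "a" first
print(beam_search(toy_step, beam_size=2))  # wider beam: finds "b z"
```

With a beam size of 1 the search is greedy and settles on the sequence starting with "a" (probability 0.33), while a beam size of 2 keeps "b" alive long enough to find the more probable "b z" (probability 0.36). This is the trade-off a configurable beam size exposes: larger beams explore more at higher cost.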
Files Included
- encoder.onnx: ONNX encoder model
- decoder.onnx: ONNX decoder model
- spm.model: SentencePiece tokenizer model
- spm.vocab: SentencePiece vocabulary
- config.json: Model configuration
- modeling_setu_translation.py: Model implementation
- configuration_setu_translation.py: Configuration class
Citation
If you use this model, please cite:
@misc{setu2025,
title={SETU: Script-agnostic English Translation Unifier},
author={Santosh Dahal},
year={2025}
}
Evaluation results
- BLEU on Nepali-English Mixed Dataset (self-reported): 49.500
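For readers unfamiliar with the metric, the sketch below shows the basic shape of a corpus-level BLEU computation: clipped n-gram precision up to 4-grams combined with a brevity penalty, scaled to 0-100. It is a simplified illustration with whitespace tokenization and no smoothing. The function name `corpus_bleu` and the example sentences are hypothetical, and this is not necessarily how the reported score was computed. Standard tools differ in tokenization and smoothing.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with no match gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than their references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


# Toy example: an exact match scores 100, no overlap scores 0.
print(corpus_bleu(["my name is Ramesh ."], ["my name is Ramesh ."]))
print(corpus_bleu(["a b c d"], ["w x y z"]))
```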