SETU - Script-agnostic English Translation Unifier

SETU is a neural translation model that unifies multiscript, multilingual, and informal text into clean, formal English.

Model Description

The SETU model can handle:

  • Romanized Nepali to English translation
  • Devanagari Nepali to English translation
  • Code-mixed text to English translation
  • Informal/slang to formal English translation

Try It Out

🚀 Interactive Demo: Try SETU in Google Colab: https://colab.research.google.com/drive/1KdLiLtAKGK8_XLyFlEwSqGFPZZqGwl4n?usp=sharing

Installation

Ensure that you have transformers and onnxruntime installed:

pip install transformers onnxruntime

Usage

from transformers import AutoModel

# Load the model
model = AutoModel.from_pretrained("santoshdahal/setu", trust_remote_code=True)

# Translate text
result = model("mero name ramesh  ho")
print("Translation:", result)
# Output: "My name is Ramesh."

# Works with Devanagari script too
result = model("सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।")
print("Translation:", result) 
# Output: "Social media and reality are different."

# Handles informal or misspelled input
result = model("what is your nam")
print("Translation:", result)
# Output: "what's your name"
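
To translate several inputs in one go, a small wrapper around the call shown above may be convenient. This is a minimal sketch, assuming the loaded model is callable on a single string and returns a string, as in the examples above. The helper name translate_all and the whitespace clean-up are illustrative assumptions and are not part of the SETU API.

```python
from typing import Callable, Iterable, List


def translate_all(model: Callable[[str], str], texts: Iterable[str]) -> List[str]:
    """Translate each text in turn with a callable model (hypothetical helper)."""
    results: List[str] = []
    for text in texts:
        # Assumption: collapsing repeated whitespace is harmless for the model.
        cleaned = " ".join(text.split())
        if not cleaned:
            # Skip the model call for empty or whitespace-only inputs.
            results.append("")
            continue
        results.append(model(cleaned))
    return results
```

With the model loaded as shown above, translate_all(model, ["mero name ramesh ho", "what is your nam"]) would return one translation per input, in order.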

Model Details

  • Model Type: Neural Machine Translation
  • Architecture: Transformer
  • Vocabulary Size: 40,253 tokens
  • Languages Supported: Nepali (Romanized & Devanagari), English, code-mixed text
  • Model Format: ONNX for efficient inference

Technical Implementation

The model uses:

  • ONNX Runtime for efficient inference
  • SentencePiece for tokenization
  • Beam search decoding with configurable beam size
  • Separate encoder and decoder ONNX models
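
Beam search with a configurable beam size can be illustrated independently of the ONNX models. The sketch below is a generic, simplified version and not the code in modeling_setu_translation.py. It assumes a hypothetical step_fn that maps a token prefix to log-probabilities for the next token (in a real pipeline this role would be played by one decoder call), and it applies no length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    step_fn: Callable[[List[int]], Dict[int, float]],
    bos_id: int,
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 20,
) -> List[int]:
    """Return the highest-scoring sequence found by a simple beam search."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [bos_id])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + logp, prefix + [token])
            for score, prefix in beams
            for token, logp in step_fn(prefix).items()
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep the top beam_size candidates; set aside those that ended.
        for score, seq in candidates[:beam_size]:
            if seq[-1] == eos_id:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


def toy_step(prefix: List[int]) -> Dict[int, float]:
    """Hypothetical stand-in for a decoder step (ids: 0 = BOS, 1 = EOS)."""
    table = {
        (0,): {5: math.log(0.6), 7: math.log(0.4)},
        (0, 5): {1: math.log(0.55), 6: math.log(0.45)},
        (0, 7): {1: math.log(0.9), 6: math.log(0.1)},
    }
    return table.get(tuple(prefix), {1: 0.0})
```

In this toy setup a beam size of 1 (greedy decoding) commits to token 5 first and ends with a total probability of 0.33, while a beam size of 2 keeps the alternative prefix alive and finds the sequence through token 7 with probability 0.36. That trade-off between search width and cost is what a configurable beam size controls.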

Files Included

  • encoder.onnx: ONNX encoder model
  • decoder.onnx: ONNX decoder model
  • spm.model: SentencePiece tokenizer model
  • spm.vocab: SentencePiece vocabulary
  • config.json: Model configuration
  • modeling_setu_translation.py: Model implementation
  • configuration_setu_translation.py: Configuration class
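
When working from a local copy of the repository, it can help to confirm that every file listed above is present before loading the model. The following is a minimal sketch. The helper name missing_files and the idea of a local model directory are assumptions for illustration, and the file names are taken from the list above.

```python
from pathlib import Path
from typing import List

# File names as listed in this model card.
EXPECTED_FILES = [
    "encoder.onnx",
    "decoder.onnx",
    "spm.model",
    "spm.vocab",
    "config.json",
    "modeling_setu_translation.py",
    "configuration_setu_translation.py",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the directory contains all seven files.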

Citation

If you use this model, please cite:

@misc{setu2025,
  title={SETU: Script-agnostic English Translation Unifier},
  author={Santosh Dahal},
  year={2025}
}