Chayan: Multi-Model LLM Router

Chayan is a high-performance LLM router that sends each query to one of four models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.

🏆 RouterArena Performance

Official Leaderboard Results (8,400 queries):

  • 🥇 #1 Optimal Accuracy Score: 88.7% - SOTA! (Best routing decision quality)
  • 🥈 #2 Optimal Selection Score: 43.0% - Silver! (Second-best model selection)
  • #7 Overall (#5 open-source): 64.9% accuracy, 63.8 arena score
  • $0.60 per 1K queries - Cost-efficient routing

What do these metrics mean?

  • Optimal Accuracy: When Chayan routes to a model, that model gives the correct answer 88.7% of the time
  • Optimal Selection: Chayan selects the best available model 43% of the time

View full leaderboard: RouterArena | PR #24

Quick Start

pip install adaptive-classifier

from adaptive_classifier import AdaptiveClassifier

# Load router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Get routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)

# Route to top model
selected_model = predictions[0][0]  # e.g., "openai/gpt-4o-mini"
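
Once a label is selected, it still has to be mapped to an actual API call. The following is a minimal sketch of that step, assuming labels follow the `provider/model` pattern shown above and that predictions arrive sorted best-first. `call_openai`, `call_gemini`, `HANDLERS`, and `dispatch` are hypothetical names standing in for whatever client code you already have; they are not part of adaptive-classifier.

```python
from typing import Callable, Dict, List, Tuple


def call_openai(model: str, query: str) -> str:
    # Hypothetical stub: replace with a real OpenAI client call.
    return f"[openai:{model}] {query}"


def call_gemini(model: str, query: str) -> str:
    # Hypothetical stub: replace with a real Gemini client call.
    return f"[google:{model}] {query}"


# Maps the provider prefix of a router label to the function that calls it.
HANDLERS: Dict[str, Callable[[str, str], str]] = {
    "openai": call_openai,
    "google": call_gemini,
}


def dispatch(predictions: List[Tuple[str, float]], query: str) -> str:
    """Send the query to the top-ranked model in a (label, score) list."""
    if not predictions:
        raise ValueError("router returned no predictions")
    label = predictions[0][0]
    provider, _, model = label.partition("/")
    if provider not in HANDLERS:
        raise KeyError(f"no handler registered for provider {provider!r}")
    return HANDLERS[provider](model, query)
```

Keeping the provider lookup in a dictionary means a new backend can be added without touching the routing logic.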

Recommended: Use with Calibration

# Apply calibration factors for best performance
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
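
To see why calibration can change the routing decision, it may help to wrap the step in a small function and run it on made-up scores. This is a sketch only: `calibrated_choice` is a hypothetical helper, the raw scores below are invented for illustration, and treating labels missing from the table as having a factor of 1.0 is an assumption.

```python
from typing import Dict, List, Tuple


def calibrated_choice(
    predictions: List[Tuple[str, float]],
    calibration: Dict[str, float],
) -> str:
    """Return the label with the highest score after calibration."""
    if not predictions:
        raise ValueError("no predictions to choose from")
    # Assumption: labels absent from the table keep their raw score (factor 1.0).
    scaled = {
        label: score * calibration.get(label, 1.0)
        for label, score in predictions
    }
    return max(scaled, key=scaled.get)


calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

# Hypothetical raw scores, not real router output.
example = [
    ("openai/gpt-4o-mini", 0.50),
    ("openai/gpt-4o", 0.32),
    ("google/gemini-2.5-flash", 0.10),
    ("google/gemini-2.5-flash-lite", 0.08),
]

raw_choice = example[0][0]                             # top raw score: gpt-4o-mini
choice = calibrated_choice(example, calibration)       # 0.32 * 1.5 beats 0.50 * 0.9
```

With these invented numbers the raw top pick is gpt-4o-mini, but after scaling gpt-4o scores 0.48 against 0.45 and wins.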

Architecture

Core Components:

  • Base Model: BERT-base-uncased embeddings
  • Classifier: Adaptive K-NN with prototype memory (FAISS-backed)
  • Innovation: Calibrated confidence scores to correct training data imbalance
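
The idea behind a prototype-based K-NN classifier can be sketched in plain Python. This is a conceptual illustration, not the library's FAISS-backed implementation: `cosine` and `knn_scores` are hypothetical names, the vectors are toy 2-D stand-ins for BERT embeddings, and weighting neighbours with a temperature softmax is an assumption about how similarities might be turned into scores.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(
    query_vec: Sequence[float],
    prototypes: List[Tuple[str, Sequence[float]]],
    k: int = 3,
    temperature: float = 0.4,
) -> List[Tuple[str, float]]:
    """Score labels by softmax-weighted votes of the k nearest prototypes."""
    # Rank all prototypes by similarity and keep the k closest.
    nearest = sorted(
        ((cosine(query_vec, vec), label) for label, vec in prototypes),
        reverse=True,
    )[:k]
    # Lower temperature sharpens the weighting toward the closest prototype.
    weights = [math.exp(sim / temperature) for sim, _ in nearest]
    total = sum(weights)
    scores: Dict[str, float] = defaultdict(float)
    for weight, (_, label) in zip(weights, nearest):
        scores[label] += weight / total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy prototypes: two for label "a", one for label "b".
toy = [("a", [1.0, 0.0]), ("a", [0.9, 0.1]), ("b", [0.0, 1.0])]
ranked = knn_scores([1.0, 0.0], toy, k=2)
```

A real deployment would replace the linear scan with an approximate nearest-neighbour index, which is the role FAISS plays here.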

Supported Models:

| Model | Use Case | Cost/1M tokens |
|---|---|---|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |

How It Works

Training

  • Dataset: RouterArena sub_10 (809 queries)
  • Oracle Labels: 4-model cascade strategy (select cheapest successful model)
  • Training Time: 19.2 minutes
  • Method: K-NN classifier with 3000 prototypes, temperature 0.4
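
The oracle labelling rule ("select cheapest successful model") can be illustrated with a short sketch. It assumes per-query correctness is already known for each model and ranks models by the per-1M-token prices listed under Supported Models; whether the original labels used those prices or a fixed cascade order is not stated, and returning `None` when every model fails is also an assumption. `oracle_label` is a hypothetical name.

```python
from typing import Dict, Optional

# Per-1M-token prices from the Supported Models table.
COST = {
    "openai/gpt-4o-mini": 0.15,
    "google/gemini-2.5-flash-lite": 0.075,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
}


def oracle_label(correct: Dict[str, bool], cost: Dict[str, float]) -> Optional[str]:
    """Return the cheapest model that answered correctly, or None if none did."""
    winners = [model for model, ok in correct.items() if ok]
    if not winners:
        # Assumption: queries no model solves get no label.
        return None
    return min(winners, key=lambda model: cost[model])


# Hypothetical outcome for one query: only the two larger models succeed.
outcome = {
    "openai/gpt-4o-mini": False,
    "google/gemini-2.5-flash-lite": False,
    "google/gemini-2.5-flash": True,
    "openai/gpt-4o": True,
}
label = oracle_label(outcome, COST)
```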

The Calibration Breakthrough

The uncalibrated router achieved 61.76% accuracy but routed 83% of queries to gpt-4o-mini. This bias came from class imbalance in the training data:

  • 57% gpt-4o-mini examples
  • 27% gpt-4o examples
  • 12% gemini-2.5-flash-lite examples
  • 4% gemini-2.5-flash examples

Solution: Apply post-training calibration factors to correct the bias without retraining.

Result: +7.29 pp improvement (61.76% → 69.05% on sub_10 benchmark)

Performance Benchmarks

Sub_10 Benchmark (809 queries):

| Router | Accuracy | Cost/1K queries |
|---|---|---|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| Chayan (calibrated) | 69.05% | $0.333 |
| Perfect 2-model oracle | 69.84% | $0.784 |

Key Insight: Calibrated Chayan reaches 99% of the perfect 2-model oracle's accuracy (69.05% vs. 69.84%) at 57% lower cost ($0.333 vs. $0.784 per 1K queries).

Full Dataset (8,400 queries):

  • Optimal Accuracy: 88.7% (🥇 #1)
  • Optimal Selection: 43.0% (🥈 #2)
  • Overall Accuracy: 64.9% (#7 overall, #5 open-source)
  • Cost: $0.60/1K queries

Advanced Usage

Feature Augmentation

Chayan was trained with query features prepended as tokens:

from adaptive_classifier.complexity_features import augment_query_with_features

query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"

predictions = router.predict(augmented, k=4)
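
For readers who want a feel for what the prefix encodes, here is a rough re-creation of the tag format. It is not the library's implementation: `sketch_augment` is a hypothetical function, and the heuristics for `MATH`, `SENT`, and `MC` are guesses chosen so that the example above is reproduced. Use `augment_query_with_features` for real routing so that inputs match what the model saw during training.

```python
import re


def sketch_augment(query: str) -> str:
    """Prepend guessed complexity tags in the [KEY:value] format shown above."""
    length = len(query)            # character count
    words = len(query.split())     # whitespace-separated tokens
    # Guess: a digit, an arithmetic operator, then another digit means "math".
    has_math = int(bool(re.search(r"\d\s*[-+*/^=]\s*\d", query)))
    # Guess: count runs of sentence-ending punctuation, with a minimum of one.
    sentences = max(1, len(re.findall(r"[.!?]+", query)))
    # Guess: a line starting with "A)", "(B)", "C." etc. marks multiple choice.
    is_mc = int(bool(re.search(r"^\s*\(?[A-D][).]\s", query, flags=re.MULTILINE)))
    return (
        f"[LEN:{length}][WORDS:{words}][MATH:{has_math}]"
        f"[SENT:{sentences}][MC:{is_mc}] {query}"
    )


tagged = sketch_augment("What is 2+2?")
```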

Limitations

  • Calibration factors were optimized on RouterArena sub_10 and may require adjustment for other domains
  • Requires all 4 specific models to be available via API
  • Performance depends on the query distribution being similar to the RouterArena benchmark
  • Cost estimates assume ~500 tokens per query
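
To adapt the cost estimate to your own traffic, a back-of-the-envelope calculation along these lines may be useful. It is a sketch under simplifying assumptions: one flat price per token (the per-1M-token figures from the Supported Models table), the ~500 tokens per query stated above, and a routing mix that you supply. The benchmark figures were presumably computed from actual token counts, so this will not reproduce them exactly. `cost_per_1k_queries` is a hypothetical helper.

```python
from typing import Dict

# Per-1M-token prices from the Supported Models table.
PRICE_PER_1M = {
    "openai/gpt-4o-mini": 0.15,
    "google/gemini-2.5-flash-lite": 0.075,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
}


def cost_per_1k_queries(
    routing_share: Dict[str, float],
    tokens_per_query: int = 500,
) -> float:
    """Estimate dollars per 1,000 queries for a given routing mix."""
    if abs(sum(routing_share.values()) - 1.0) > 1e-6:
        raise ValueError("routing shares must sum to 1.0")
    tokens_per_1k_queries = 1000 * tokens_per_query
    return sum(
        share * PRICE_PER_1M[model] * tokens_per_1k_queries / 1_000_000
        for model, share in routing_share.items()
    )


# Sending every query to gpt-4o-mini: 500,000 tokens at $0.15 per 1M tokens.
baseline = cost_per_1k_queries({"openai/gpt-4o-mini": 1.0})
```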

Citation

@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
