---
library_name: adaptive-classifier
tags:
- llm
- routing
- multi-model
- bert
- router-arena
- model-selection
language:
- en
metrics:
- accuracy
license: apache-2.0
---

# Chayan: Multi-Model LLM Router

**Chayan** is a high-performance LLM router that intelligently routes between 4 models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.

## 🏆 RouterArena Performance

**Official Leaderboard Results** (8,400 queries):

- 🥇 **#1 Optimal Accuracy Score: 88.7%** - SOTA! (Best routing decision quality)
- 🥈 **#2 Optimal Selection Score: 43.0%** - Silver! (Second-best model selection)
- **#7 Overall** (#5 open-source): 64.9% accuracy, 63.8 arena score
- **$0.60 per 1K queries** - Cost-efficient routing

![RouterArena Leaderboard](routerarena_leaderboard.png)

**What do these metrics mean?**

- **Optimal Accuracy**: When Chayan routes to a model, that model gives the correct answer 88.7% of the time
- **Optimal Selection**: Chayan selects the best available model 43% of the time

View the full leaderboard: [RouterArena](https://routeworks.github.io/) | [PR #24](https://github.com/RouteWorks/RouterArena/pull/24)

## Quick Start

```bash
pip install adaptive-classifier
```

```python
from adaptive_classifier import AdaptiveClassifier

# Load the router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Get a routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)

# Route to the top-ranked model
selected_model = predictions[0][0]  # e.g., "openai/gpt-4o-mini"
```

### Recommended: Use with Calibration

```python
# Apply calibration factors for best performance
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
```

## Architecture

**Core Components:**

- **Base Model**: BERT-base-uncased embeddings
- **Classifier**: Adaptive K-NN with prototype memory (FAISS-backed)
- **Innovation**: Calibrated confidence scores to correct training data imbalance

**Supported Models:**

| Model | Use Case | Cost/1M tokens |
|-------|----------|----------------|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |

## How It Works

### Training

- **Dataset**: RouterArena sub_10 (809 queries)
- **Oracle Labels**: 4-model cascade strategy (select the cheapest model that succeeds)
- **Training Time**: 19.2 minutes
- **Method**: K-NN classifier with 3,000 prototypes, temperature 0.4

### The Calibration Breakthrough

The uncalibrated router achieved 61.76% accuracy but was biased toward gpt-4o-mini (83% of routing decisions). This happened because the training data had class imbalance:

- 57% gpt-4o-mini examples
- 27% gpt-4o examples
- 12% gemini-flash-lite examples
- 4% gemini-flash examples

**Solution**: Apply post-training calibration factors to correct the bias without retraining.

**Result**: +7.29pp improvement (61.76% → 69.05% on the sub_10 benchmark)
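The published factors (0.9, 1.5, 1.8, 1.5) were tuned against sub_10 labels. As a rough illustration of how similar factors could be found for a different domain, the sketch below runs a coarse grid search over per-model score multipliers on a labeled validation set. The `tune_calibration` helper, the `val_set` format, and the grid values are illustrative assumptions, not part of the library, and this is not necessarily the exact procedure used to produce Chayan's factors.

```python
import itertools

MODELS = [
    "openai/gpt-4o-mini",
    "google/gemini-2.5-flash-lite",
    "google/gemini-2.5-flash",
    "openai/gpt-4o",
]

def tune_calibration(router, val_set, grid=(0.8, 0.9, 1.0, 1.2, 1.5, 1.8)):
    """Coarse grid search over per-model score multipliers (illustrative).

    val_set: list of (query, correct_models) pairs, where correct_models is
    the set of models known to answer that query correctly.
    """
    # Cache raw router scores so the grid search never re-runs the classifier.
    cached = [(dict(router.predict(q, k=4)), correct) for q, correct in val_set]

    best_factors, best_acc = None, -1.0
    for combo in itertools.product(grid, repeat=len(MODELS)):
        factors = dict(zip(MODELS, combo))
        # Count validation queries routed to a model that answers correctly.
        hits = sum(
            max(scores, key=lambda m: scores[m] * factors[m]) in correct
            for scores, correct in cached
        )
        acc = hits / len(cached)
        if acc > best_acc:
            best_factors, best_acc = factors, acc
    return best_factors, best_acc
```

A coarse grid keeps the search small (6^4 = 1,296 combinations here), and caching the raw scores means the classifier runs only once per validation query.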
## Performance Benchmarks

**Sub_10 Benchmark (809 queries):**

| Router | Accuracy | Cost per 1K queries |
|--------|----------|---------------------|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| **Chayan (calibrated)** | **69.05%** | **$0.333** |
| Perfect 2-model oracle | 69.84% | $0.784 |

**Key Insight**: Chayan reaches 99% of the perfect 2-model oracle's accuracy (69.05% vs. 69.84%) at 57% lower cost ($0.333 vs. $0.784 per 1K queries).

**Full Dataset (8,400 queries):**

- **Optimal Accuracy**: 88.7% (🥇 #1)
- **Optimal Selection**: 43.0% (🥈 #2)
- **Overall Accuracy**: 64.9% (#7 overall, #5 open-source)
- **Cost**: $0.60/1K queries

## Advanced Usage

### Feature Augmentation

Chayan was trained with query features prepended as tokens:

```python
from adaptive_classifier.complexity_features import augment_query_with_features

query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"

predictions = router.predict(augmented, k=4)
```

## Limitations

- Calibration factors were optimized on RouterArena sub_10 and may require adjustment for other domains
- Requires the 4 specific models to be available via API
- Performance assumes a query distribution similar to the RouterArena benchmark
- Cost estimates assume ~500 tokens per query

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```

## Links

- **Library**: https://github.com/codelion/adaptive-classifier
- **RouterArena**: https://routeworks.github.io/
- **RouterArena Paper**: https://arxiv.org/abs/2510.00202
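## Appendix: End-to-End Routing Sketch

For convenience, the Quick Start, calibration, and feature-augmentation snippets above can be combined into one helper. This is a minimal sketch under the assumptions documented earlier (the published calibration factors and `augment_query_with_features`); the `route` function itself is illustrative, not part of the library API.

```python
from adaptive_classifier import AdaptiveClassifier
from adaptive_classifier.complexity_features import augment_query_with_features

router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Published calibration factors (see "The Calibration Breakthrough" above).
CALIBRATION = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

def route(query: str) -> str:
    """Return the calibrated model choice for a query (illustrative helper)."""
    # Prepend complexity features, as Chayan saw during training.
    augmented = augment_query_with_features(query)
    predictions = router.predict(augmented, k=4)
    # Rescale scores with the calibration factors and pick the top model.
    calibrated = {model: score * CALIBRATION[model] for model, score in predictions}
    return max(calibrated, key=calibrated.get)

print(route("Prove that the sum of two even integers is even."))
# Prints the selected model name, e.g. "openai/gpt-4o" for harder queries.
```

In production you would typically pass the returned model name to your own API client and add a fallback choice in case that model is unavailable.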