Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute and careful orchestration of training procedures. Model souping, the practice of averaging the weights of multiple models that share an architecture, has emerged as a promising pre- and post-training technique that can enhance performance without expensive retraining. In this paper, we introduce Soup Of Category Experts (SoCE), a principled approach to model souping that uses benchmark composition to identify optimal model candidates and applies non-uniform weighted averaging to maximize performance. In contrast to previous uniform-averaging approaches, our method leverages the observation that benchmark categories often exhibit low inter-correlations in model performance. SoCE identifies "expert" models for each weakly correlated category cluster and combines them using optimized weighted averaging rather than uniform weights. We demonstrate that the proposed method improves performance and robustness across multiple domains, including multilingual capabilities, tool calling, and math, and achieves state-of-the-art results on the Berkeley Function Calling Leaderboard.
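At its core, model souping is an element-wise weighted average of the parameters of same-architecture checkpoints. The sketch below illustrates that operation, assuming PyTorch state dicts; the function name soup_state_dicts, the example checkpoint files, and the coefficients are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of non-uniform weight averaging ("souping") over models that
# share an architecture. Names and checkpoints here are hypothetical.
from typing import Dict, List
import torch


def soup_state_dicts(
    state_dicts: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Return the weighted average of several same-architecture state dicts."""
    assert state_dicts and len(state_dicts) == len(weights), "need one weight per model"
    total = sum(weights)
    coeffs = [w / total for w in weights]  # normalize so coefficients sum to 1
    souped = {}
    for key in state_dicts[0]:
        souped[key] = sum(c * sd[key].float() for c, sd in zip(coeffs, state_dicts))
    return souped


# Hypothetical usage: assign a larger coefficient to the "expert" checkpoint of
# each weakly correlated benchmark category, in the spirit of SoCE's
# non-uniform weighted averaging.
# experts = [torch.load(p, map_location="cpu") for p in ["tool_expert.pt", "math_expert.pt"]]
# souped = soup_state_dicts(experts, weights=[0.7, 0.3])
```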
Community
outperforming uniform averaging is really hard. congrats, this is huge!!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes (2025)
- Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation (2025)
- MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics (2025)
- GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models (2025)
- Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models (2025)
- Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation (2025)
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (2025)