Papers - MoE - Research
Adaptive sequential Monte Carlo by means of mixture of experts
• arXiv:1108.2836
Convergence Rates for Mixture-of-Experts
• arXiv:1110.2058
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
• arXiv:2310.12008
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
• arXiv:2308.11793
Robust Mixture-of-Expert Training for Convolutional Neural Networks
• arXiv:2308.10110
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
• arXiv:2308.06512
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
• arXiv:2308.06093
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
• arXiv:2403.03432
Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense
• arXiv:2402.18787
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
• arXiv:2402.14800
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
• arXiv:2402.12550
Turn Waste into Worth: Rectifying Top-k Router of MoE
• arXiv:2402.12399
Buffer Overflow in Mixture of Experts
• arXiv:2402.05526
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
• arXiv:2211.15841
A Machine Learning Perspective on Predictive Coding with PAQ
• arXiv:1108.3298
DEMix Layers: Disentangling Domains for Modular Language Modeling
• arXiv:2108.05036
Sparse Backpropagation for MoE Training
• arXiv:2310.00811
A Review of Sparse Expert Models in Deep Learning
• arXiv:2209.01667
FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts
• arXiv:2306.08586
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
• arXiv:2306.04845
MoAI: Mixture of All Intelligence for Large Language and Vision Models
• arXiv:2403.07508
Unified Scaling Laws for Routed Language Models
• arXiv:2202.01169
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
• arXiv:2310.16795
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
• arXiv:2212.05055