Decentralized State Sync for Sparse Mixture-of-Experts Nodes
Authors: R. Rithish, A. Kowalski, J. Chen
Abstract
Sparse Mixture-of-Experts (MoE) architectures are difficult to keep synchronized across wide-area distributed deployments. We propose an optimistic state synchronization protocol that uses low-rank weight projection, reducing node synchronization overhead by 60% while maintaining p99 convergence under high packet-loss conditions.
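
To make the idea of low-rank weight projection concrete, the following is a minimal sketch and not the protocol described in this paper. It assumes that a node sends a truncated-SVD factorization of an expert's weight update rather than the full matrix. The function names (compress_delta, apply_delta, payload_ratio) and the choice of truncated SVD are illustrative assumptions.

```python
import numpy as np


def compress_delta(delta: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor a weight update into (a, b) so that a @ b is its best rank-`rank` approximation.

    Hypothetical helper: truncated SVD is one common way to build a low-rank
    projection; the paper's actual projection may differ.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, n)
    return a, b


def apply_delta(weights: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Receiver side: rebuild the approximate update from its factors and apply it."""
    return weights + a @ b


def payload_ratio(shape: tuple[int, int], rank: int) -> float:
    """Fraction of entries sent when shipping the two factors instead of the full matrix."""
    m, n = shape
    return rank * (m + n) / (m * n)
```

Under these assumptions, a sender would call compress_delta on the difference between its current and last-synchronized expert weights, transmit the two factors, and the receiver would call apply_delta. The rank controls the trade-off between payload size and approximation error.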