EPLB is DeepSeek’s open-source implementation of a load-balancing algorithm for expert-parallel (EP) serving of MoE models. In EP, different “experts” are mapped to different GPUs or nodes, so if some experts are invoked far more often than others, load imbalance becomes a performance bottleneck. EPLB addresses this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out the computational load. It supports two placement policies: hierarchical load balancing, which first distributes expert groups across nodes and then replicates and places experts within each node’s GPUs, and global load balancing, used when the grouping cannot be divided evenly across the nodes. The logic is implemented in eplb.py and computes a balanced placement from estimated expert usage weights. EPLB aims to reduce hot-spotting and keep compute utilization uniform in large MoE deployments.
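To make the replicate-then-place idea concrete, here is a hypothetical greedy sketch. It is not the code in eplb.py; the function name and structure are illustrative assumptions, and the real algorithm additionally keeps an equal number of physical experts per GPU and respects group structure. The sketch duplicates whichever expert currently carries the highest per-replica load, then packs replicas onto GPUs heaviest-first:

```python
# Hypothetical sketch of a replicate-then-place heuristic (illustrative only;
# NOT the eplb.py implementation).

def plan_placement(loads: list[float], num_replicas: int, num_gpus: int):
    """loads[i]: estimated load of expert i; num_replicas >= len(loads)."""
    num_experts = len(loads)
    counts = [1] * num_experts
    # Redundancy: repeatedly duplicate the expert with the highest
    # per-replica load until all replica slots are used.
    for _ in range(num_replicas - num_experts):
        hottest = max(range(num_experts), key=lambda i: loads[i] / counts[i])
        counts[hottest] += 1
    # Placement: assign replicas heaviest-first to the currently least-loaded
    # GPU (longest-processing-time bin packing).
    replicas = sorted(
        ((loads[i] / counts[i], i)
         for i in range(num_experts)
         for _ in range(counts[i])),
        reverse=True,
    )
    gpu_load = [0.0] * num_gpus
    placement: list[list[int]] = [[] for _ in range(num_gpus)]
    for load, expert in replicas:
        g = min(range(num_gpus), key=gpu_load.__getitem__)
        placement[g].append(expert)
        gpu_load[g] += load
    return placement, gpu_load
```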
Features
- Expert replication (redundancy) to mitigate hot-spot usage
- Hierarchical load balancing (group-aware placement)
- Global balancing fallback when grouping doesn’t align with hardware
- Heuristic-based placement planning from usage statistics
- Simple Python interface (`rebalance_experts`) for reuse; see the usage sketch after this list
- MIT-licensed and publicly available algorithm for expert placement
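A usage sketch, adapted from the example in the repository’s README; the weight values are illustrative per-layer, per-expert load estimates:

```python
import torch
import eplb  # the repository's eplb.py

# Estimated per-layer, per-expert load (2 MoE layers x 12 logical experts);
# the numbers are illustrative.
weight = torch.tensor([[ 90, 132,  40,  61, 104, 165,  39,   4,  73,  56, 183,  86],
                       [ 20, 107, 104,  64,  19, 197, 187, 157, 172,  86,  16,  27]])

num_replicas = 16  # physical expert slots: 12 logical experts + 4 redundant copies
num_groups = 4     # expert groups, kept together on a node when possible
num_nodes = 2
num_gpus = 8

# phy2log maps each physical slot to its logical expert, log2phy is the
# inverse mapping, and logcnt gives the replica count per logical expert.
phy2log, log2phy, logcnt = eplb.rebalance_experts(
    weight, num_replicas, num_groups, num_nodes, num_gpus)
print(phy2log)
```

With `num_groups` divisible across `num_nodes`, as here, the hierarchical policy applies: groups are first spread over nodes, then experts are replicated and packed within each node’s GPUs.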