AutoEP (Automatic Expert Parallelism)
AutoEP automatically detects MoE layers in Hugging Face models and replaces them with EP-enabled versions, requiring zero model code changes. It follows the pattern of AutoTP (Automatic Tensor Parallelism).
This API is separate from the explicit deepspeed.moe.layer.MoE layer API.
For the explicit DeepSpeed MoE layer API, see Mixture of Experts (DeepSpeed MoE).
Built-in AutoEP presets: mixtral (Mixtral), qwen3_moe (Qwen3-MoE),
qwen3_5_moe (Qwen3.5-MoE), deepseek_v2 (DeepSeek-V2), and
deepseek_v3 (DeepSeek-V3).
A preset name indicates that AutoEP knows the router, expert, and weight naming
patterns for that model family. Running a Hugging Face model additionally requires a
Transformers build that exposes the matching config and model classes, the expected
model.config.model_type value, and the fused expert layout.
| Preset | Minimum Transformers version | Notes |
|---|---|---|
| mixtral | | |
| qwen3_moe | | Also covers Qwen2-MoE when the installed Transformers build uses the validated fused expert layout. Qwen3-MoE classes appear in |
| qwen3_5_moe | | Requires the Qwen3.5 text-backbone |
| deepseek_v2 | | |
| deepseek_v3 | | |
ZeRO compatibility: Stages 0, 1, and 2. Stage 3 is not supported.
Usage:
{
  "expert_parallel": {
    "enabled": true,
    "autoep_size": 4,
    "preset_model": "mixtral"
  }
}
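As a rough sketch, the same section can be assembled from Python and handed to DeepSpeed as a dict. `build_autoep_config` is a hypothetical helper, not part of DeepSpeed; only the `expert_parallel` keys come from the example above, while `train_micro_batch_size_per_gpu` and `zero_optimization.stage` are standard DeepSpeed config keys added here on the assumption that a complete config needs them.

```python
import json


def build_autoep_config(preset_model, autoep_size, zero_stage=1, micro_batch_size=1):
    """Hypothetical helper: build a DeepSpeed config dict with AutoEP enabled."""
    return {
        # Assumed surrounding settings; adjust to your training setup.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "zero_optimization": {"stage": zero_stage},
        # The AutoEP section, mirroring the JSON shown above.
        "expert_parallel": {
            "enabled": True,
            "autoep_size": autoep_size,
            "preset_model": preset_model,
        },
    }


ds_config = build_autoep_config("mixtral", autoep_size=4)
print(json.dumps(ds_config, indent=2))

# With a model and optimizer already constructed, the dict can be passed as:
#   engine, _, _, _ = deepspeed.initialize(model=model, optimizer=optimizer, config=ds_config)
```

Building the config in code makes it easy to vary `autoep_size` per launch while keeping the preset fixed.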
How it works:
1. During deepspeed.initialize(), AutoEP scans the model for MoE layers using preset-defined patterns (router name, expert name, weight shapes).
2. Detected MoE blocks are replaced with AutoEPMoELayer, which uses TorchTitan’s grouped GEMM kernels and AllToAll token dispatch.
3. EP/EDP process groups are created automatically based on autoep_size.
4. Expert parameters are marked for expert-data-parallel gradient reduction; router and shared-expert parameters use standard data-parallel reduction.
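To make the EP/EDP split concrete, the sketch below enumerates which ranks could share each group for a given world size. It is an illustration, not AutoEP's implementation: the function name `ep_and_edp_groups` is hypothetical, and the layout (contiguous ranks form an expert-parallel group, ranks at the same offset across those groups form an expert-data-parallel group) and the requirement that the world size be divisible by `autoep_size` are assumptions, since the text above does not specify either.

```python
def ep_and_edp_groups(world_size, autoep_size):
    """Illustrative only: enumerate EP and EDP rank groups under an assumed layout."""
    if autoep_size < 1 or world_size % autoep_size != 0:
        raise ValueError("world_size must be a positive multiple of autoep_size")

    # EP groups: experts are sharded across these ranks, and AllToAll runs inside each group.
    ep_groups = [
        list(range(start, start + autoep_size))
        for start in range(0, world_size, autoep_size)
    ]
    # EDP groups: ranks holding the same expert shard; expert gradients are reduced here.
    edp_groups = [
        list(range(offset, world_size, autoep_size))
        for offset in range(autoep_size)
    ]
    return ep_groups, edp_groups


ep, edp = ep_and_edp_groups(world_size=8, autoep_size=4)
print(ep)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(edp)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Under this layout, router and shared-expert parameters are replicated on every rank and would be reduced across all 8 ranks, while each expert shard is reduced only across its 2-rank EDP group.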
Constraints:
- autoep_size must divide num_experts for all detected MoE layers.
- autoep_size=1 is valid: all experts remain local (no AllToAll), useful for functional testing on a single GPU.
- AutoEP currently cannot be combined with AutoTP (tensor_parallel.autotp_size > 1) or tensor model parallelism from mpu; support is planned as follow-up work.
- AutoEP currently supports ZeRO stages 0, 1, and 2 only. ZeRO stage 3 and its partitioned-parameter get/set APIs are outside the scope of the current AutoEP support.
- Checkpoint save/load requires matching autoep_size. To change autoep_size across runs for the same AutoEP-detected model topology, convert the checkpoint to Universal Checkpoint format and load it with checkpoint.load_universal; see the Universal Checkpointing tutorial for the detailed flow and constraints.
- DeepSeek-V2 and DeepSeek-V3 AutoEP do not support load-balance expert bias yet. The built-in DeepSeek presets disable it by default; explicit non-null values fail.
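Several of these constraints can be checked before launching a job. The following is a minimal pre-flight sketch, not a DeepSpeed API: `check_autoep_constraints` is a hypothetical helper, the caller supplies the per-layer expert counts, and treating a missing `autoep_size`, `zero_optimization.stage`, or `tensor_parallel.autotp_size` as 1, 0, and 1 respectively is an assumption made for the example.

```python
def check_autoep_constraints(ds_config, num_experts_per_layer):
    """Hypothetical pre-flight check; returns a list of violation messages."""
    problems = []

    # Assumed default: a missing autoep_size is treated as 1.
    ep_size = ds_config.get("expert_parallel", {}).get("autoep_size", 1)
    for layer_idx, num_experts in enumerate(num_experts_per_layer):
        if num_experts % ep_size != 0:
            problems.append(
                f"layer {layer_idx}: autoep_size={ep_size} does not divide num_experts={num_experts}"
            )

    # Only ZeRO stages 0, 1, and 2 are supported with AutoEP.
    stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    if stage not in (0, 1, 2):
        problems.append(f"ZeRO stage {stage} is not supported with AutoEP")

    # AutoEP cannot currently be combined with AutoTP.
    autotp_size = ds_config.get("tensor_parallel", {}).get("autotp_size", 1)
    if autotp_size > 1:
        problems.append(f"autotp_size={autotp_size} cannot be combined with AutoEP")

    return problems


ok_config = {
    "zero_optimization": {"stage": 2},
    "expert_parallel": {"enabled": True, "autoep_size": 4, "preset_model": "mixtral"},
}
bad_config = {
    "zero_optimization": {"stage": 3},
    "tensor_parallel": {"autotp_size": 2},
    "expert_parallel": {"enabled": True, "autoep_size": 3, "preset_model": "mixtral"},
}
print(check_autoep_constraints(ok_config, [8, 8]))
print(check_autoep_constraints(bad_config, [8, 8]))
```

The check does not cover tensor model parallelism supplied through mpu, checkpoint compatibility, or the DeepSeek expert-bias restriction, which depend on runtime state rather than the config alone.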