Mixture of Experts (MoE)
Layer specification
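
DeepSpeed exposes its Mixture of Experts layer as deepspeed.moe.layer.MoE. The sketch below shows, under stated assumptions, how such a layer might be specified: the expert module, hidden size, and expert count are illustrative values, and in practice the layer is constructed inside a model launched with the DeepSpeed runner so that the required expert-parallel process groups exist.

    # Illustrative sketch only; sizes and expert counts are made-up values,
    # and a distributed environment is assumed to have been initialized.
    import torch
    from deepspeed.moe.layer import MoE

    hidden_size = 1024

    # A simple feed-forward block used as the expert network.
    expert = torch.nn.Sequential(
        torch.nn.Linear(hidden_size, 4 * hidden_size),
        torch.nn.ReLU(),
        torch.nn.Linear(4 * hidden_size, hidden_size),
    )

    # Wrap the expert in an MoE layer with 8 experts and top-1 gating.
    moe_layer = MoE(
        hidden_size=hidden_size,
        expert=expert,
        num_experts=8,
        k=1,
    )

    # The layer routes each token to its selected expert(s); the forward pass
    # returns the combined expert output together with auxiliary gating
    # information (e.g. the load-balancing loss).
    hidden_states = torch.randn(4, 16, hidden_size)
    output, aux_loss, exp_counts = moe_layer(hidden_states)

The MoE layer is typically dropped in place of (or around) a transformer block's feed-forward sublayer, with the auxiliary gating loss added to the training objective.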