DeepSpeed
Model Setup
Training Setup
Argument Parsing
Training Initialization
Distributed Initialization
Inference Setup
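As a quick orientation to the setup APIs listed above, here is a minimal training-setup sketch; the model is a placeholder and the DeepSpeed config is assumed to be supplied via --deepspeed_config on the command line. Inference Setup is sketched together with the Inference API further below.

    import argparse
    import torch
    import deepspeed

    # Argument parsing: let DeepSpeed register its launcher/config arguments
    # (e.g. --deepspeed, --deepspeed_config) on an existing argparse parser.
    parser = argparse.ArgumentParser(description="my training script")
    parser.add_argument("--local_rank", type=int, default=-1)  # set by the launcher
    parser = deepspeed.add_config_arguments(parser)
    args = parser.parse_args()

    # Distributed initialization: optional to call explicitly, since
    # deepspeed.initialize() will set up the process group if needed.
    deepspeed.init_distributed()

    # Training initialization: wrap the model in a DeepSpeed engine.
    model = torch.nn.Linear(1024, 1024)  # placeholder model
    model_engine, optimizer, _, _ = deepspeed.initialize(
        args=args,
        model=model,
        model_parameters=model.parameters(),
    )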
Training API
Training API
Forward Propagation
Backward Propagation
Optimizer Step
Gradient Accumulation
Model Saving
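The training-loop calls listed above map onto the engine roughly as follows. This is a minimal sketch: model_engine comes from the setup sketch earlier, train_loader is a hypothetical DataLoader yielding (inputs, labels) pairs, and the model's forward is assumed to return the loss.

    for step, (inputs, labels) in enumerate(train_loader):
        inputs = inputs.to(model_engine.device)
        labels = labels.to(model_engine.device)

        # Forward propagation through the engine.
        loss = model_engine(inputs, labels)

        # Backward propagation: the engine applies loss scaling and handles
        # gradient accumulation according to the config.
        model_engine.backward(loss)

        # Optimizer step: weights are only updated at a gradient
        # accumulation boundary; otherwise this call is effectively a no-op.
        model_engine.step()

    # Model saving (see also the Checkpointing API section below).
    model_engine.save_checkpoint("checkpoints", tag="final")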
Inference API
Inference API
Forward Propagation
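A corresponding inference sketch, covering both the inference setup call and forward propagation. A CUDA device is assumed, and the keyword arguments shown are illustrative; see the Inference Setup page for the full list.

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # placeholder; typically a trained transformer model

    # Inference setup: wrap the model in a DeepSpeed inference engine.
    engine = deepspeed.init_inference(model, dtype=torch.half)

    # Forward propagation goes through the engine like a regular module call.
    inputs = torch.randn(4, 1024, dtype=torch.half, device="cuda")
    outputs = engine(inputs)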
Checkpointing API
Model Checkpointing
Loading Training Checkpoints
Saving Training Checkpoints
ZeRO Checkpoint fp32 Weights Recovery
Activation Checkpointing
Configuring Activation Checkpointing
Using Activation Checkpointing
Configuring and Checkpointing Random Seeds
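A minimal sketch of the checkpointing calls above. model_engine is the engine from the earlier setup sketch; the paths, tags, custom_forward function and hidden tensor are hypothetical placeholders.

    import deepspeed
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # Saving a training checkpoint: a collective call that every rank must
    # make; arbitrary extra state can be attached via client_state.
    model_engine.save_checkpoint("checkpoints", tag="step1000",
                                 client_state={"step": 1000})

    # Loading a training checkpoint: returns the checkpoint path and the
    # client_state dict that was saved with it.
    load_path, client_state = model_engine.load_checkpoint("checkpoints", tag="step1000")

    # ZeRO checkpoint fp32 weights recovery: consolidate the partitioned
    # ZeRO shards into a single fp32 state_dict (done offline, on CPU).
    state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints")

    # Activation checkpointing: configure once, then wrap forward segments.
    # custom_forward and hidden are hypothetical stand-ins for part of the
    # model's forward pass and its input.
    deepspeed.checkpointing.configure(mpu_=None, partition_activations=True)
    hidden = deepspeed.checkpointing.checkpoint(custom_forward, hidden)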
ZeRO API
ZeRO
Getting Started
Constructing Massive Models
Manual Parameter Coordination
Memory-Centric Tiling
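A short sketch of the ZeRO construction and parameter-coordination utilities above, assuming a ZeRO stage 3 configuration and an initialized distributed environment.

    import torch
    import deepspeed

    # Constructing massive models: parameters are allocated in partitioned
    # (ZeRO-3) form as each submodule is built, so no single rank ever
    # holds the full model in memory.
    with deepspeed.zero.Init():
        model = torch.nn.Linear(8192, 8192)  # placeholder for a very large model

    # Manual parameter coordination: temporarily gather a partitioned
    # parameter when it must be accessed outside the engine, e.g. for
    # custom initialization performed on rank 0 and then re-partitioned.
    with deepspeed.zero.GatheredParameters(model.weight, modifier_rank=0):
        if torch.distributed.get_rank() == 0:
            torch.nn.init.xavier_uniform_(model.weight)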
Mixture of Experts (MoE)
Mixture of Experts (MoE)
Layer specification
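A layer-specification sketch for the MoE API above. The values are illustrative, and expert parallelism requires an initialized distributed environment.

    import torch
    import deepspeed
    from deepspeed.moe.layer import MoE

    deepspeed.init_distributed()  # MoE creates expert-parallel process groups

    hidden_size = 1024
    expert = torch.nn.Linear(hidden_size, hidden_size)  # placeholder expert network

    # Wrap the expert in an MoE layer with top-1 gating over 8 experts.
    moe_layer = MoE(hidden_size=hidden_size, expert=expert, num_experts=8, k=1)

    # The layer returns the output together with the auxiliary gating loss
    # and per-expert token counts.
    output, l_aux, exp_counts = moe_layer(torch.randn(4, hidden_size))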
Transformer Kernel API
Transformer Kernels
DeepSpeed Transformer Config
DeepSpeed Transformer Layer
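A sketch of constructing the fused transformer kernel from the entries above. Parameter names follow the DeepSpeed transformer-kernel tutorial and may vary between releases; a CUDA device and a working kernel build are assumed.

    from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer

    # DeepSpeed Transformer Config: describes the shape and behaviour of the layer.
    config = DeepSpeedTransformerConfig(
        batch_size=8,
        hidden_size=1024,
        intermediate_size=4096,
        heads=16,
        attn_dropout_ratio=0.1,
        hidden_dropout_ratio=0.1,
        num_hidden_layers=24,
        initializer_range=0.02,
        fp16=True,
        pre_layer_norm=True,
    )

    # DeepSpeed Transformer Layer: a fused CUDA implementation of a
    # transformer encoder layer, built from the config above.
    layer = DeepSpeedTransformerLayer(config)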
Pipeline Parallelism
Pipeline Parallelism
Model Specification
Training
Extending Pipeline Parallelism
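A minimal pipeline-parallel sketch corresponding to the entries above. args is the parsed argument namespace from the setup sketch, and train_iter is a hypothetical iterator of (input, label) batches.

    import torch
    import deepspeed
    from deepspeed.pipe import PipelineModule, LayerSpec

    # Model specification: express the network as a sequence of layers;
    # LayerSpec defers construction until the owning pipeline stage builds it.
    net = PipelineModule(
        layers=[
            LayerSpec(torch.nn.Linear, 1024, 1024),
            LayerSpec(torch.nn.ReLU),
            LayerSpec(torch.nn.Linear, 1024, 10),
        ],
        num_stages=2,
        loss_fn=torch.nn.CrossEntropyLoss(),
    )

    # Training: the pipeline engine runs a whole pipeline schedule per call
    # instead of a manual forward/backward/step loop.
    engine, _, _, _ = deepspeed.initialize(
        args=args, model=net, model_parameters=net.parameters())
    loss = engine.train_batch(data_iter=train_iter)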
Optimizers
Optimizers
Adam (CPU)
FusedAdam (GPU)
FusedLamb (GPU)
OneBitAdam (GPU)
ZeroOneAdam (GPU)
OnebitLamb (GPU)
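The optimizers above are normally selected through the "optimizer" section of the DeepSpeed config; they can also be constructed directly. A sketch follows: the config fragment is illustrative rather than complete, and model is assumed to exist.

    # Config-driven selection (fragment of a DeepSpeed config, shown as a dict):
    ds_config = {
        "train_batch_size": 16,
        "optimizer": {
            "type": "OneBitAdam",          # or "Adam", "FusedLamb", ...
            "params": {
                "lr": 1e-4,
                "freeze_step": 400,        # warm-up steps before 1-bit compression
            },
        },
    }

    # Direct construction of the CPU and fused-GPU Adam variants:
    from deepspeed.ops.adam import DeepSpeedCPUAdam, FusedAdam
    cpu_adam = DeepSpeedCPUAdam(model.parameters(), lr=1e-4)
    fused_adam = FusedAdam(model.parameters(), lr=1e-4)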
Learning Rate Schedulers
Learning Rate Schedulers
LRRangeTest
OneCycle
WarmupLR
WarmupDecayLR
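Schedulers are likewise usually chosen in the "scheduler" section of the config, or constructed directly and passed to deepspeed.initialize. A sketch with illustrative values; optimizer is assumed to exist.

    # Config-driven selection (fragment):
    ds_config = {
        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0.0,
                "warmup_max_lr": 1e-4,
                "warmup_num_steps": 1000,
            },
        },
    }

    # Direct construction:
    from deepspeed.runtime.lr_schedules import WarmupDecayLR
    scheduler = WarmupDecayLR(optimizer,
                              total_num_steps=10000,
                              warmup_max_lr=1e-4,
                              warmup_num_steps=1000)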
Flops Profiler
Flops Profiler
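A sketch of profiling a model with the flops profiler's high-level entry point; argument names beyond the ones shown may differ between releases.

    import torch
    from deepspeed.profiling.flops_profiler import get_model_profile

    model = torch.nn.Linear(1024, 1024)  # placeholder model

    # Returns total FLOPs, MACs and parameter count, and optionally prints
    # a per-module breakdown.
    flops, macs, params = get_model_profile(model=model,
                                            input_shape=(4, 1024),
                                            print_profile=True,
                                            detailed=True)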
Autotuning
Autotuning
Autotuner
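The autotuner is driven from the deepspeed launcher together with an "autotuning" block in the config. A sketch of the config fragment; keys beyond "enabled" are illustrative.

    # Launch, for example:
    #   deepspeed --autotuning run train.py --deepspeed ds_config.json
    ds_config = {
        "autotuning": {
            "enabled": True,
            "fast": True,   # tune a reduced search space for quicker results
        },
    }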
Memory Usage
Memory Requirements
API To Estimate Memory Usage
Discussion
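A sketch of the memory-estimation helpers referenced above, here for ZeRO stage 3; the model is a placeholder, and the linked page documents similar helpers for the other ZeRO stages.

    import torch
    from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

    model = torch.nn.Linear(4096, 4096)  # placeholder; use the real model here

    # Prints estimated per-GPU / per-node memory needs for model states
    # (parameters, gradients, optimizer states) under ZeRO-3, with and
    # without CPU offload, for the given cluster shape.
    estimate_zero3_model_states_mem_needs_all_live(model,
                                                   num_gpus_per_node=8,
                                                   num_nodes=1)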
Indices and tables
Index
Module Index
Search Page