Transformer Kernels

The transformer kernel API in DeepSpeed can be used to create BERT transformer layers for more efficient pre-training and fine-tuning. It includes the transformer layer configuration and the transformer layer module initialization.

Here we present the transformer kernel API. Please see the BERT pre-training tutorial for usage details.

DeepSpeed Transformer Config
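As a rough illustration of what a layer configuration carries, the sketch below collects settings for a BERT-large-sized layer and sanity-checks them before optionally building a config object. The keyword names are assumed to match the arguments of `DeepSpeedTransformerConfig` and should be verified against the installed DeepSpeed version. The values are illustrative only, and `check_layer_settings` is a hypothetical helper that is not part of DeepSpeed.

```python
# Illustrative settings for a BERT-large-sized transformer layer.
# ASSUMPTION: these keyword names match DeepSpeedTransformerConfig in your
# installed DeepSpeed version; the values are examples, not recommendations.
layer_settings = dict(
    batch_size=8,
    hidden_size=1024,
    intermediate_size=4096,
    heads=16,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=24,
    initializer_range=0.02,
    local_rank=-1,
    seed=1234,
    fp16=True,
    pre_layer_norm=True,
)


def check_layer_settings(settings):
    """Hypothetical helper (not part of DeepSpeed): generic sanity checks.

    Returns the per-head dimension implied by the settings.
    """
    if settings["hidden_size"] % settings["heads"] != 0:
        raise ValueError("hidden_size must be divisible by heads")
    for key in ("attn_dropout_ratio", "hidden_dropout_ratio"):
        if not 0.0 <= settings[key] < 1.0:
            raise ValueError(f"{key} must be in [0, 1)")
    return settings["hidden_size"] // settings["heads"]


head_dim = check_layer_settings(layer_settings)

# Build the real config object only when DeepSpeed can be imported, so the
# sketch still runs on machines without it.
try:
    from deepspeed import DeepSpeedTransformerConfig
except Exception:
    DeepSpeedTransformerConfig = None

config = None
if DeepSpeedTransformerConfig is not None:
    config = DeepSpeedTransformerConfig(**layer_settings)
```

A config object built this way is what the transformer layer module is initialized from. The BERT pre-training tutorial shows the full usage in context.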

DeepSpeed Transformer Layer