Transformer Kernels
The transformer kernel API in DeepSpeed can be used to create BERT transformer layers for more efficient pre-training and fine-tuning. It covers two things: the transformer layer configuration and the initialization of the transformer layer module.
Here we present the transformer kernel API. Please see the BERT pre-training tutorial for usage details.
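As a rough illustration of the two pieces named above (a configuration object and a layer module built from it), the following is a minimal sketch rather than the tutorial's code. It assumes the `DeepSpeedTransformerConfig` and `DeepSpeedTransformerLayer` classes exported by the `deepspeed` package. The keyword names follow the DeepSpeed documentation as best recalled and may differ between releases. The numeric values are illustrative BERT-large-style settings, and `build_transformer_layer` is a hypothetical helper introduced only for this example.

```python
# Illustrative settings only: the values are assumptions, not tuned recommendations.
# Keyword names are assumed to match DeepSpeedTransformerConfig; check your installed version.
transformer_kwargs = dict(
    batch_size=8,               # micro-batch size the kernel is sized for
    hidden_size=1024,           # model width
    heads=16,                   # attention heads; must divide hidden_size evenly
    attn_dropout_ratio=0.1,     # dropout on attention probabilities
    hidden_dropout_ratio=0.1,   # dropout on the layer output
    num_hidden_layers=24,       # total layers in the model
    initializer_range=0.02,     # std-dev for weight initialization
    local_rank=0,               # GPU index for this process
    seed=1234,                  # seed for the dropout layers
    fp16=True,                  # run the kernel in half precision
    pre_layer_norm=True,        # pre-LN rather than post-LN layout
)


def build_transformer_layer(kwargs):
    """Hypothetical helper: return a DeepSpeedTransformerLayer, or None
    when DeepSpeed or a CUDA device is not available."""
    try:
        import torch
        from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    config = DeepSpeedTransformerConfig(**kwargs)  # step 1: layer configuration
    return DeepSpeedTransformerLayer(config)       # step 2: module initialization
```

Calling `build_transformer_layer(transformer_kwargs)` on a machine with DeepSpeed and a GPU should yield one layer. A full BERT encoder would typically create `num_hidden_layers` such layers, as the pre-training tutorial describes.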