GeLU activation function – On the Impact of the Activation Function on Deep Neural Networks Training

https://arxiv.org/abs/1902.06853

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks …
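For reference, the GeLU named in the title is defined as GELU(x) = x·Φ(x), with Φ the standard normal CDF. A minimal plain-Python sketch of the exact erf-based form, along with the tanh approximation used by some frameworks:

```python
import math

def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GeLU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, GeLU is smooth everywhere: it approaches the identity for large positive inputs and 0 for large negative inputs, which is one reason its effect on signal propagation in deep untrained networks (the subject of the paper) differs from piecewise-linear activations.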
