https://arxiv.org/abs/1902.06853
[1902.06853] On the Impact of the Activation Function on Deep Neural Networks Training The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks … arxiv.org |