An overview of gradient descent optimization algorithms

https://ruder.io/optimizing-gradient-descent/

An overview of gradient descent optimization algorithms
Gradient descent is the preferred way to optimize neural networks and many other machine learning algorithms but is often used as a black box. This post explores how many of the most popular gradient-based optimization algorithms such as Momentum, Adagrad, and Adam actually work.
ruder.io