Gradient Descent (GD)

$$ W:=W-\eta\frac{\partial L}{\partial W} $$
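
The update rule above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `gd_step`, `quadratic_grad`, and `lr` (the learning rate) are hypothetical, and the loss is assumed to be the toy quadratic L(W) = sum of w_i squared, whose gradient is 2W.

```python
def gd_step(w, grad, lr=0.1):
    """One gradient-descent update: W := W - lr * dL/dW, element-wise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def quadratic_grad(w):
    """Gradient of the assumed toy loss L(W) = sum(w_i ** 2)."""
    return [2.0 * wi for wi in w]


# Repeatedly step downhill from an arbitrary starting point.
w = [1.0, -2.0]
for _ in range(100):
    w = gd_step(w, quadratic_grad(w))
```

With this loss and learning rate, each step shrinks every weight by a constant factor, so `w` moves toward the minimum at zero.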

Stochastic Gradient Descent (SGD)

Momentum

$$ v=\alpha v-\eta\frac{\partial L}{\partial W}\\W=W+v $$
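
As a rough sketch, the two momentum equations map onto a small optimizer class. The class name `Momentum` and the parameter names `lr` (for η) and `momentum` (for α) are assumptions made for illustration, and the velocity `v` is assumed to start at zero.

```python
class Momentum:
    """Momentum update: v = momentum * v - lr * dL/dW, then W = W + v."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr              # eta in the equation
        self.momentum = momentum  # alpha in the equation
        self.v = None             # velocity, created lazily to match W's size

    def update(self, w, grad):
        if self.v is None:
            self.v = [0.0] * len(w)
        # Decay the old velocity and add the new (negative) gradient step.
        self.v = [self.momentum * vi - self.lr * gi
                  for vi, gi in zip(self.v, grad)]
        # Move the weights along the velocity.
        return [wi + vi for wi, vi in zip(w, self.v)]
```

Because `v` carries over between calls, the weights keep moving even on a step where the gradient is zero, which is the behaviour the velocity term is meant to provide.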

AdaGrad

$$ h=h+\frac{\partial L}{\partial W}\odot\frac{\partial L}{\partial W}\\W=W-\eta\frac{1}{\sqrt{h}}\frac{\partial L}{\partial W} $$
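
The AdaGrad equations can be sketched the same way. The class name `AdaGrad` and the parameter `lr` (for η) are illustrative. The small constant `eps` is not part of the equations above: it is an added assumption that guards against division by zero while `h` is still zero.

```python
import math


class AdaGrad:
    """AdaGrad update: h += g * g, then W -= lr * g / sqrt(h), element-wise."""

    def __init__(self, lr=0.01, eps=1e-7):
        self.lr = lr    # eta in the equation
        self.eps = eps  # assumption: not in the equation, avoids dividing by zero
        self.h = None   # running sum of squared gradients

    def update(self, w, grad):
        if self.h is None:
            self.h = [0.0] * len(w)
        # Accumulate the element-wise squared gradient.
        self.h = [hi + gi * gi for hi, gi in zip(self.h, grad)]
        # Scale each step by the inverse square root of its accumulated history.
        return [wi - self.lr * gi / (math.sqrt(hi) + self.eps)
                for wi, gi, hi in zip(w, grad, self.h)]
```

Since `h` only grows, the effective step size for each weight shrinks over time, and it shrinks fastest for weights that have seen large gradients.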