https://s3-us-west-2.amazonaws.com/secure.notion-static.com/0cf36bc3-82b2-427c-aa32-cffb1de5f03c/Untitled.png

Gradient Descent (GD)

$$ W:=W-\eta\frac{\partial L}{\partial W} $$
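A minimal sketch of this update in plain Python, assuming the weights and gradient are flat lists of floats. The names `gd_step`, `grad_quadratic`, and `lr` (the learning rate) are illustrative, and the toy loss $L(W)=\sum_i w_i^2$ is chosen here only so the gradient is easy to write down.

```python
def gd_step(w, grad, lr=0.1):
    """One gradient descent update, element-wise: W := W - lr * dL/dW."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def grad_quadratic(w):
    # Gradient of the toy loss L(W) = sum(w_i ** 2); an assumption for this demo.
    return [2.0 * wi for wi in w]


w = [3.0, -4.0]
for _ in range(100):
    w = gd_step(w, grad_quadratic(w))
# With lr=0.1 each step scales every weight by 0.8, so w shrinks toward the minimum at 0.
```
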

Stochastic Gradient Descent (SGD)

Momentum

$$ v=\alpha v-\eta\frac{\partial L}{\partial W}\\W=W+v $$
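The two momentum equations can be sketched as a small optimizer class. This is an illustration under assumptions: the class name `Momentum`, the default values, and the list-of-floats representation are not from the notes. `momentum` plays the role of $\alpha$ and `lr` the role of $\eta$.

```python
class Momentum:
    """Momentum update: v = momentum * v - lr * dL/dW, then W = W + v."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr              # eta in the equations
        self.momentum = momentum  # alpha in the equations
        self.v = None             # velocity, created lazily to match the shape of W

    def update(self, w, grad):
        if self.v is None:
            self.v = [0.0] * len(w)
        # The velocity accumulates a decaying sum of past negative gradients.
        self.v = [self.momentum * vi - self.lr * gi
                  for vi, gi in zip(self.v, grad)]
        return [wi + vi for wi, vi in zip(w, self.v)]
```

With $v$ starting at zero, the first step equals a plain gradient descent step; later steps grow larger as long as the gradient keeps pointing the same way.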

AdaGrad

$$ h=h+\frac{\partial L}{\partial W}\odot\frac{\partial L}{\partial W}\\W=W-\eta\frac{1}{\sqrt{h}}\frac{\partial L}{\partial W} $$
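A sketch of the AdaGrad update in the same style. The class name `AdaGrad`, the defaults, and the list-of-floats representation are assumptions. The small constant `eps` does not appear in the equations; it is added here only to avoid dividing by zero when an entry of $h$ is still 0.

```python
import math


class AdaGrad:
    """AdaGrad update: h = h + g * g (element-wise), W = W - lr * g / sqrt(h)."""

    def __init__(self, lr=0.01, eps=1e-7):
        self.lr = lr    # eta in the equations
        self.eps = eps  # not in the equations; guards against division by zero
        self.h = None   # running sum of squared gradients, one entry per weight

    def update(self, w, grad):
        if self.h is None:
            self.h = [0.0] * len(w)
        self.h = [hi + gi * gi for hi, gi in zip(self.h, grad)]
        # Weights with a large gradient history get a smaller effective step.
        return [wi - self.lr * gi / (math.sqrt(hi) + self.eps)
                for wi, gi, hi in zip(w, grad, self.h)]
```

Because $h$ only grows, the effective step size for each weight decreases monotonically over training.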