Optimization

Problem with SGD

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e1a12396-61c2-48af-8acd-02008ae268d3/Untitled.png

Momentum

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d9073983-1cda-4e5a-9b1a-0724ec2c48b5/Untitled.png

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b4d35ba1-f4b2-4bd6-9d43-b85f55fd3b13/Untitled.png

Nesterov Momentum

$v_{t+1}=\rho v_t-\alpha\nabla f(x_t+\rho v_t)$

$x_{t+1}=x_t+v_{t+1}$
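
The two updates above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `nesterov_step` and `grad_quadratic`, the toy objective $f(x)=\tfrac12(x_1^2+10x_2^2)$, and the values of $\alpha$ and $\rho$ are assumptions chosen for the example, not part of the notes.

```python
import numpy as np

def nesterov_step(x, v, grad_f, alpha=0.05, rho=0.9):
    """One Nesterov momentum update (hypothetical helper for illustration)."""
    # Evaluate the gradient at the look-ahead point x_t + rho * v_t
    lookahead = x + rho * v
    # v_{t+1} = rho * v_t - alpha * grad f(x_t + rho * v_t)
    v_next = rho * v - alpha * grad_f(lookahead)
    # x_{t+1} = x_t + v_{t+1}
    x_next = x + v_next
    return x_next, v_next

def grad_quadratic(x):
    # Gradient of the assumed toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)
    return np.array([1.0, 10.0]) * x

# Run a few hundred updates from an arbitrary starting point
x = np.array([1.0, 1.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = nesterov_step(x, v, grad_quadratic)
```

The only difference from plain momentum is where the gradient is evaluated: at the look-ahead point $x_t+\rho v_t$ rather than at $x_t$. On this toy problem the iterate should end up close to the minimizer at the origin.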