Optimization

Problem with SGD

Momentum

Nesterov Momentum

$v_{t+1}=\rho v_t-\alpha\nabla f(x_t+\rho v_t)$

$x_{t+1}=x_t+v_{t+1}$
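
The two update equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `nesterov_momentum`, the default values of `alpha` and `rho`, the zero initial velocity, and the quadratic test objective $f(x)=\tfrac{1}{2}\lVert x\rVert^2$ are all assumptions made for the example.

```python
import numpy as np

def nesterov_momentum(grad_f, x0, alpha=0.1, rho=0.9, num_steps=200):
    """Iterate the Nesterov momentum update for a fixed number of steps.

    grad_f    : callable returning the gradient of f at a point
    x0        : starting point
    alpha     : learning rate (assumed default)
    rho       : momentum coefficient (assumed default)
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # assumption: velocity starts at zero
    for _ in range(num_steps):
        # Gradient is evaluated at the look-ahead point x_t + rho * v_t
        v = rho * v - alpha * grad_f(x + rho * v)
        # Step with the updated velocity: x_{t+1} = x_t + v_{t+1}
        x = x + v
    return x

def grad_quadratic(x):
    # Gradient of the illustrative objective f(x) = 0.5 * ||x||^2
    return x

x_final = nesterov_momentum(grad_quadratic, x0=[3.0, -2.0])
```

The only difference from a plain momentum step is where the gradient is evaluated: at the look-ahead point $x_t+\rho v_t$ rather than at $x_t$.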