Activation Functions
Sigmoid
- squashes numbers to the range [0, 1]
- can be interpreted as a saturating 'firing rate' of a neuron
- problems (see the sketch after this list)
    - saturated neurons kill gradients
        - the vanishing gradient problem
        - when the input is very large or very small, the local gradient is nearly zero, so the gradient is killed
    - sigmoid outputs are not zero-centered
        - the outputs are always positive, so the gradients on the weights are either all positive or all negative
        - this leads to inefficient, zig-zagging gradient updates
    - the exp() computation is relatively expensive
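A minimal NumPy sketch (function names are my own) of why saturation kills the gradient: the local gradient sigma(x) * (1 - sigma(x)) peaks at 0.25 and is nearly zero for large |x|, and the outputs are always positive.

```python
import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x)), at most 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # outputs are all positive (not zero-centered)
print(sigmoid_grad(x))  # ~0 at x = +/-10: saturated neurons kill the gradient
```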
tanh
- squashes numbers to the range [-1, 1]
- still suffers from the vanishing gradient problem when saturated
- zero-centered, so it generally works better than sigmoid (quick check below)
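A quick check under the same assumptions as the sigmoid sketch: the local gradient 1 - tanh(x)^2 still vanishes for large |x|, but the outputs are zero-centered.

```python
import numpy as np

def tanh_grad(x):
    # local gradient: 1 - tanh(x)^2, still vanishes for large |x|
    return 1.0 - np.tanh(x) ** 2

x = np.array([-10.0, 0.0, 10.0])
print(np.tanh(x))    # outputs in [-1, 1], zero-centered
print(tanh_grad(x))  # ~0 at x = +/-10: saturation still kills the gradient
```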
ReLU
- does not saturate in the positive region, so no vanishing gradient there
- computationally efficient
- converges much faster than sigmoid/tanh in practice
- the most commonly used activation
- problems (see the sketch after this list)
    - not zero-centered output
        - an annoyance
    - the negative half still kills the gradient (zero gradient for x < 0)
    - dead ReLUs are possible
        - a dead ReLU never activates and therefore never updates
        - causes
            - bad initialization
            - learning rate too high
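A minimal sketch (my own helper names) showing the two sides of ReLU: the gradient is 1 for positive inputs (no saturation there) and exactly 0 for negative inputs, which is why a neuron stuck in the negative region never updates.

```python
import numpy as np

def relu(x):
    # max(0, x): negative inputs are clamped to zero
    return np.maximum(0.0, x)

def relu_grad(x):
    # gradient is 1 for x > 0 and 0 for x < 0 (no saturation on the positive side)
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))       # negative inputs become 0
print(relu_grad(x))  # 0 for the negative half: a neuron stuck there never updates
```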
Leaky ReLU
- small negative slope (commonly 0.01) for x < 0 instead of a flat zero slope
- does not saturate at all, so gradients never die (sketch below)
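A minimal sketch assuming the common slope of 0.01: the gradient is 1 for x > 0 and 0.01 for x < 0, so unlike ReLU there is no region where updates stop entirely.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0 instead of a flat zero
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # gradient is 1 for x > 0 and alpha for x < 0, never exactly zero
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(x))       # negative inputs are scaled by 0.01, not zeroed
print(leaky_relu_grad(x))  # no dead region: gradients never vanish completely
```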