Activation Functions
Sigmoid
- squashes numbers to the range [0, 1]
- can be interpreted as a saturating 'firing rate' of a neuron
- problems (see the sketch after this list)
    - saturated neurons kill gradients
        - the vanishing gradient problem
        - when the input is very large or very small, the local gradient is nearly zero, so the gradient is killed
    - sigmoid outputs are not zero-centered
        - the outputs are always positive, so the gradients on the weights are either all positive or all negative
        - this leads to inefficient, zig-zagging gradient updates
    - the exp() computation is relatively expensive
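A minimal NumPy sketch (function names are my own) of why saturation kills the gradient: the local gradient sigma(x) * (1 - sigma(x)) peaks at 0.25 and is nearly zero for large |x|, and the outputs are always positive.

```python
import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x)), at most 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # outputs are all positive (not zero-centered)
print(sigmoid_grad(x))  # ~0 at x = +/-10: saturated neurons kill the gradient
```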
tanh
- squashes numbers to the range [-1, 1]
- still suffers from the vanishing gradient problem when saturated
- zero-centered, so it generally works better than sigmoid (quick check below)
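A quick check under the same assumptions as the sigmoid sketch: the local gradient 1 - tanh(x)^2 still vanishes for large |x|, but the outputs are zero-centered.

```python
import numpy as np

def tanh_grad(x):
    # local gradient: 1 - tanh(x)^2, still vanishes for large |x|
    return 1.0 - np.tanh(x) ** 2

x = np.array([-10.0, 0.0, 10.0])
print(np.tanh(x))    # outputs in [-1, 1], zero-centered
print(tanh_grad(x))  # ~0 at x = +/-10: saturation still kills the gradient
```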
ReLU
- does not saturate in the positive region, so no vanishing gradient there
- computationally efficient
- converges much faster than sigmoid/tanh in practice
- the most commonly used activation
- problems (see the sketch after this list)
    - not zero-centered output
        - an annoyance
    - the negative half still kills the gradient (zero gradient for x < 0)
    - dead ReLUs are possible
        - a dead ReLU never activates and therefore never updates
        - causes
            - bad initialization
            - learning rate too high
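A minimal sketch (my own helper names) showing the two sides of ReLU: the gradient is 1 for positive inputs (no saturation there) and exactly 0 for negative inputs, which is why a neuron stuck in the negative region never updates.

```python
import numpy as np

def relu(x):
    # max(0, x): negative inputs are clamped to zero
    return np.maximum(0.0, x)

def relu_grad(x):
    # gradient is 1 for x > 0 and 0 for x < 0 (no saturation on the positive side)
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))       # negative inputs become 0
print(relu_grad(x))  # 0 for the negative half: a neuron stuck there never updates
```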
Leaky ReLU
- small negative slope (commonly 0.01) for x < 0 instead of a flat zero slope
- does not saturate at all, so gradients never die (sketch below)
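A minimal sketch assuming the common slope of 0.01: the gradient is 1 for x > 0 and 0.01 for x < 0, so unlike ReLU there is no region where updates stop entirely.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0 instead of a flat zero
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # gradient is 1 for x > 0 and alpha for x < 0, never exactly zero
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(x))       # negative inputs are scaled by 0.01, not zeroed
print(leaky_relu_grad(x))  # no dead region: gradients never vanish completely
```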