Computational Graphs

- Can represent any function, no matter how complex
- node = one step of computation (see the sketch after this list)
Advantage
- Can apply backpropagation
- recursively use chain rule to compute the gradient with respect to every variable
- Especially useful for complex models
- ex) Convolutional Networks
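
A minimal sketch of the idea (the example function here is assumed, not from these notes): even a nested function like a sigmoid of a linear score breaks into one simple node per step of computation, and the forward pass just evaluates the nodes in order.

```python
import math

# Hypothetical example function (not from the notes):
# f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2)))
# Every primitive step becomes one node in the graph.
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

a = w0 * x0        # mul node: -2.0
b = w1 * x1        # mul node:  6.0
c = a + b          # add node:  4.0
d = c + w2         # add node:  1.0
e = -d             # negate node
g = math.exp(e)    # exp node
h = 1.0 + g        # add node
f = 1.0 / h        # reciprocal node

print(f)           # 0.7310..., i.e. sigmoid(1.0)
```

Backpropagation then only ever needs the local derivative of each primitive node.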
Backpropagation

How it's done
- Compute the forward pass first
- name every intermediate variable
- from the output, apply the chain rule backwards along every path
- calculate the gradient of every variable
- $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}$
- What exactly happens
- each node has two inputs and one output
- during backpropagation:
- each node receives an upstream gradient coming back from the node(s) after it
- it multiplies the upstream gradient by the local gradient of each input
- it passes the computed gradients back to the previous node(s)
- when a node's output branches to multiple nodes, the upstream gradients flowing back are summed (see the worked example after this list)
- If x, y, z are vectors
- backpropagation works the same way
- local gradients become Jacobian matrices
- for an elementwise operation the Jacobian is diagonal, so it is never formed explicitly (see the Jacobian sketch below)
- the gradient with respect to a vector always has the same shape as that vector
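
A worked sketch of the whole procedure on an assumed example function, f(x, y, z) = (x + y) * z (the function and variable names are illustrative, not from these notes):

```python
# Assumed example function: f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute and name every intermediate variable.
q = x + y              # q = 3.0
f = q * z              # f = -12.0

# Backward pass: from the output, multiply each upstream gradient
# by the node's local gradient (chain rule).
df_df = 1.0            # gradient of f with respect to itself
df_dq = z * df_df      # mul node: local grad wrt q is the other input z
df_dz = q * df_df      # mul node: local grad wrt z is the other input q
df_dx = 1.0 * df_dq    # add node: local grad is 1 -> upstream passes through
df_dy = 1.0 * df_dq

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0

# Branching: if x fed two nodes, e.g. g(x, y) = x*y + x, the two
# upstream gradients arriving at x would simply add: dg/dx = y + 1.
```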
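For the vector case, a sketch assuming an elementwise op (a ReLU-style max(0, x) here) showing why the diagonal Jacobian is never built in practice:

```python
import numpy as np

# Elementwise op: output i depends only on input i,
# so the Jacobian dy/dx is diagonal.
x = np.array([1.0, -2.0, 3.0])
y = np.maximum(0.0, x)                  # forward pass

upstream = np.array([0.1, 0.2, 0.3])    # dL/dy flowing back

# Explicit Jacobian, only to show the shapes (3x3 diagonal):
J = np.diag((x > 0).astype(float))
grad_x_full = J @ upstream

# The shortcut real code uses: elementwise multiply instead.
grad_x = upstream * (x > 0)

print(grad_x_full)   # [0.1 0.  0.3]
print(grad_x)        # same values; same shape as x
```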
Advantage
- Simpler than deriving the full gradient analytically as one giant expression
Tips

- backpropagation of common gates (see the sketch below)
- addition node: local gradient is 1, so it passes the upstream value through unchanged to both inputs
- multiplication node: multiplies the upstream value by the other input
- max gate
- greater input: receives the full upstream gradient
- smaller input: receives 0
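
A minimal sketch of these three patterns (function names are assumed for illustration, not from any library); each returns the gradients flowing back to a node's two inputs:

```python
def add_backward(upstream):
    # local gradient is 1 for both inputs: the upstream
    # value is distributed unchanged.
    return upstream, upstream

def mul_backward(upstream, a, b):
    # each input's local gradient is the *other* input.
    return upstream * b, upstream * a

def max_backward(upstream, a, b):
    # route the upstream gradient to the larger input; the other gets 0.
    return (upstream, 0.0) if a > b else (0.0, upstream)

print(add_backward(2.0))            # (2.0, 2.0)
print(mul_backward(2.0, 3.0, 4.0))  # (8.0, 6.0)
print(max_backward(2.0, 3.0, 4.0))  # (0.0, 2.0)
```

In short: the add gate acts as a gradient distributor, the mul gate as a swap-multiplier, and the max gate as a gradient router.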