Computational Graphs
- Can represent any function, no matter how complex
- node = one step of computation (see the sketch after this list)
Advantage
- Can apply backpropagation
- recursively use chain rule to compute the gradient with respect to every variable
- Especially useful when using complex models
- ex) Convolutional Network
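A minimal sketch (in Python, with illustrative names `q` and `f`) of the toy function f(x, y, z) = (x + y) * z broken into one node per computation step:

```python
# f(x, y, z) = (x + y) * z as a two-node computational graph
def forward(x, y, z):
    q = x + y      # node 1: addition
    f = q * z      # node 2: multiplication
    return q, f

q, f = forward(-2.0, 5.0, -4.0)   # q = 3.0, f = -12.0
```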
Backpropagation
How it's done
- Compute the forward pass first
- Give names to all intermediate variables
- From the output, apply the chain rule backwards along every path
- Calculate the gradients of all variables
- $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}$ (worked through in the sketch after this list)
- What's exactly happening
- Each node has (typically two) inputs and an output
- during backpropagation:
- each node receives an upstream gradient coming back from the node(s) after it
- it computes only the local gradients of its own inputs and multiplies them by the upstream gradient
- it passes the resulting gradients back to the previous node(s)
- when a node's output feeds multiple nodes, the upstream gradients from those branches are summed
- If x, y, z are vectors
- works the same
- local gradients become Jacobian matrices
- for elementwise operations the Jacobian is diagonal, so it is never formed explicitly in practice
- the gradient with respect to a vector always has the same shape as that vector (see the vector sketch below)
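A worked sketch of the steps above for the same toy graph f(x, y, z) = (x + y) * z, where each node multiplies its local gradient by the upstream gradient (values are just illustrative):

```python
def grads(x, y, z):
    # Forward pass, naming every intermediate variable
    q = x + y            # addition node
    f = q * z            # multiplication node
    # Backward pass, starting from df/df = 1.0
    df = 1.0
    dq = z * df          # multiply node: local gradient w.r.t. q is the other input, z
    dz = q * df          # multiply node: local gradient w.r.t. z is the other input, q
    dx = 1.0 * dq        # add node: local gradient is 1, upstream passes through
    dy = 1.0 * dq
    return f, (dx, dy, dz)

f, (dx, dy, dz) = grads(-2.0, 5.0, -4.0)
print(f, dx, dy, dz)     # -12.0 -4.0 -4.0 3.0
```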
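For the vector case, a small sketch (assuming NumPy, with ReLU as the example elementwise node): the Jacobian of an elementwise operation is diagonal, so the backward pass reduces to an elementwise multiply, and the gradient has the same shape as the input:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
upstream = np.array([0.1, 0.2, 0.3])      # gradient flowing back from later nodes

# Elementwise node y = max(x, 0): its Jacobian dy/dx is diagonal
J = np.diag((x > 0).astype(float))        # full Jacobian, built only for illustration
dx_full = J @ upstream                    # [0.1, 0.0, 0.3]

# The diagonal structure means an elementwise multiply gives the same result
dx = (x > 0) * upstream                   # same values, same shape as x
print(np.allclose(dx_full, dx))           # True
```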
Advantage
- Simpler than deriving and evaluating the full gradient expression by hand
Tips
- backpropagation of
- an addition node passes the upstream gradient through unchanged (local gradient of 1 for each input)
- a multiplication node multiplies the upstream gradient by the other input
- max gate
- the larger input receives the upstream gradient
- the smaller input receives 0
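A quick numeric check of these gate rules (a sketch with made-up values; the variable names are just illustrative):

```python
x, y, upstream = 3.0, -5.0, 2.0

# Add gate z = x + y: distributes the upstream gradient unchanged to both inputs
dx_add, dy_add = 1.0 * upstream, 1.0 * upstream    # (2.0, 2.0)

# Multiply gate z = x * y: scales the upstream gradient by the *other* input
dx_mul, dy_mul = y * upstream, x * upstream        # (-10.0, 6.0)

# Max gate z = max(x, y): routes the upstream gradient to the larger input
dx_max = upstream if x >= y else 0.0               # 2.0 (x is larger)
dy_max = upstream if y > x else 0.0                # 0.0
```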