Computational Graphs

- Can represent any function, no matter how complex
- node = one step of computation (see the sketch after this list)
Advantage
- Can apply backpropagation
- recursively use chain rule to compute the gradient with respect to every variable
- Especially useful for complex models
- ex) Convolutional Networks
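
A minimal sketch of the idea (the example function here is assumed, not from these notes): even a nested function like a sigmoid of a linear score breaks into one simple node per step of computation, and the forward pass just evaluates the nodes in order.

```python
import math

# Hypothetical example function (not from the notes):
# f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2)))
# Every primitive step becomes one node in the graph.
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

a = w0 * x0        # mul node: -2.0
b = w1 * x1        # mul node:  6.0
c = a + b          # add node:  4.0
d = c + w2         # add node:  1.0
e = -d             # negate node
g = math.exp(e)    # exp node
h = 1.0 + g        # add node
f = 1.0 / h        # reciprocal node

print(f)           # 0.7310..., i.e. sigmoid(1.0)
```

Backpropagation then only ever needs the local derivative of each primitive node.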
Backpropagation

How it's done
- Compute the forward pass first
- name every intermediate variable
- from the output, apply the chain rule backwards along every path
- calculate the gradient of every variable
- $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}$
- What exactly happens
- each node has two inputs and one output
- during backpropagation:
- each node receives an upstream gradient coming back from the node(s) after it
- it multiplies the upstream gradient by the local gradient of each input
- it passes the computed gradients back to the previous node(s)
- when a node's output branches to multiple nodes, the upstream gradients flowing back are summed (see the worked example after this list)
- If x, y, z are vectors
- backpropagation works the same way
- local gradients become Jacobian matrices
- for an elementwise operation the Jacobian is diagonal, so it is never formed explicitly (see the Jacobian sketch below)
- the gradient with respect to a vector always has the same shape as that vector
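
A worked sketch of the whole procedure on an assumed example function, f(x, y, z) = (x + y) * z (the function and variable names are illustrative, not from these notes):

```python
# Assumed example function: f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute and name every intermediate variable.
q = x + y              # q = 3.0
f = q * z              # f = -12.0

# Backward pass: from the output, multiply each upstream gradient
# by the node's local gradient (chain rule).
df_df = 1.0            # gradient of f with respect to itself
df_dq = z * df_df      # mul node: local grad wrt q is the other input z
df_dz = q * df_df      # mul node: local grad wrt z is the other input q
df_dx = 1.0 * df_dq    # add node: local grad is 1 -> upstream passes through
df_dy = 1.0 * df_dq

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0

# Branching: if x fed two nodes, e.g. g(x, y) = x*y + x, the two
# upstream gradients arriving at x would simply add: dg/dx = y + 1.
```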
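For the vector case, a sketch assuming an elementwise op (a ReLU-style max(0, x) here) showing why the diagonal Jacobian is never built in practice:

```python
import numpy as np

# Elementwise op: output i depends only on input i,
# so the Jacobian dy/dx is diagonal.
x = np.array([1.0, -2.0, 3.0])
y = np.maximum(0.0, x)                  # forward pass

upstream = np.array([0.1, 0.2, 0.3])    # dL/dy flowing back

# Explicit Jacobian, only to show the shapes (3x3 diagonal):
J = np.diag((x > 0).astype(float))
grad_x_full = J @ upstream

# The shortcut real code uses: elementwise multiply instead.
grad_x = upstream * (x > 0)

print(grad_x_full)   # [0.1 0.  0.3]
print(grad_x)        # same values; same shape as x
```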
Advantage
- Simpler than deriving the full gradient analytically as one giant expression
Tips

- backpropagation of common gates (see the sketch below)
- addition node: local gradient is 1, so it passes the upstream value through unchanged to both inputs
- multiplication node: multiplies the upstream value by the other input
- max gate
- greater input: receives the full upstream gradient
- smaller input: receives 0
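
A minimal sketch of these three patterns (function names are assumed for illustration, not from any library); each returns the gradients flowing back to a node's two inputs:

```python
def add_backward(upstream):
    # local gradient is 1 for both inputs: the upstream
    # value is distributed unchanged.
    return upstream, upstream

def mul_backward(upstream, a, b):
    # each input's local gradient is the *other* input.
    return upstream * b, upstream * a

def max_backward(upstream, a, b):
    # route the upstream gradient to the larger input; the other gets 0.
    return (upstream, 0.0) if a > b else (0.0, upstream)

print(add_backward(2.0))            # (2.0, 2.0)
print(mul_backward(2.0, 3.0, 4.0))  # (8.0, 6.0)
print(max_backward(2.0, 3.0, 4.0))  # (0.0, 2.0)
```

In short: the add gate acts as a gradient distributor, the mul gate as a swap-multiplier, and the max gate as a gradient router.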