Linear Classifier
- W gives a score for each class for a given image
- The class with the highest score is chosen as the prediction
- Need to determine which W is best
- Need some way to quantify the "badness" of a given W (a loss function)
- Need an efficient procedure for searching through the space of all possible Ws to find the best value (optimization)
Loss Function
- Tells us how good the current classifier is
- Notations
- x: input (image, data)
- y: label (class)
- In classification, usually a single integer that stands for a certain class
- $L_i$: per-example loss function
- Takes in the predicted scores and the true label
- Returns a quantitative value of how bad the prediction was
- L: average of the per-example losses over the entire dataset
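As a minimal sketch (function and variable names are my own, not from the notes), the full-dataset loss L is just the mean of the per-example losses $L_i$:

```python
import numpy as np

def dataset_loss(X, y, W, per_example_loss):
    """L = (1/N) * sum_i L_i, the mean of per-example losses."""
    return float(np.mean([per_example_loss(x_i, y_i, W)
                          for x_i, y_i in zip(X, y)]))

# Toy illustration with a 0/1 per-example loss (1 if misclassified).
zero_one = lambda x, y, W: float(np.argmax(W.dot(x)) != y)
W = np.eye(2)                 # identity weights: class score = feature value
X = np.array([[2.0, 0.0],     # highest score -> class 0
              [0.0, 3.0]])    # highest score -> class 1
y = np.array([0, 0])          # second example is wrong on purpose
print(dataset_loss(X, y, W, zero_one))  # 0.5 -> one of two examples wrong
```

Any per-example loss (such as the SVM loss below) can be plugged in for `per_example_loss`.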
Multi-class SVM Loss
$$
L_i=\sum_{j\neq y_i}\max(0,\ s_j-s_{y_i}+1)
$$
- $s_j$: score of an incorrect class $j$
- $s_{y_i}$: score of the correct class
- Generalization of the binary SVM to handle multiple classes
- For each incorrect category, compare its score against the score of the correct category
- If the correct category's score is greater than the incorrect score by some safety margin
- Loss for that term = 0
- Safety margin = 1 (in the math notation above)
- If not, the term contributes $s_j-s_{y_i}+1$; the per-example losses are then averaged over the whole dataset to get L
- Also referred to as hinge loss
- Once the margin is satisfied, the loss is 0 and stops changing, even if the correct score grows further
- Sanity check: at initialization W is very small, leading to small (near-zero) scores in every category
- The loss then ends up near $n_{classes}-1$
- The sum loops over all incorrect classes; with all scores similar, each term is roughly $\max(0,\ 0-0+1)=1$, one per incorrect class
- Squared hinge loss ($\max(0,\cdot)^2$) gets used sometimes
- Useful when you want to penalize "very wrong" much more heavily than "slightly wrong"
- Example Code
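A minimal numpy sketch of the per-example multi-class SVM loss (function and variable names are my own); it also demonstrates the initialization sanity check from above:

```python
import numpy as np

def svm_loss_single(x, y, W, margin=1.0):
    """Multi-class SVM (hinge) loss for one example.
    x: (D,) features, y: integer label, W: (C, D) weight matrix."""
    scores = W.dot(x)                                    # one score per class
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                                     # sum skips j == y_i
    return margins.sum()

# Sanity check: with W = 0 all scores are 0, so every incorrect
# class contributes exactly the margin (1), giving C - 1.
C, D = 4, 3
W = np.zeros((C, D))
x = np.array([1.0, 2.0, 3.0])
print(svm_loss_single(x, y=0, W=W))  # 3.0  (= C - 1)
```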
- There can be multiple Ws that give 0 loss (e.g., if W gives 0 loss, so does 2W, since scaling only widens the margins)
- Always consider test/validation performance in such cases to choose among them
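A quick numeric illustration of the non-uniqueness point (toy numbers of my own): once W achieves zero loss, any positive multiple of it does too, so the data loss alone cannot prefer W over 2W.

```python
import numpy as np

def svm_loss_single(x, y, W, margin=1.0):
    # Per-example multi-class SVM (hinge) loss.
    scores = W.dot(x)
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0
    return margins.sum()

x = np.array([1.0, 1.0])
y = 0
W = np.array([[2.0, 1.0],   # correct class scores 3 on x
              [0.5, 0.5],   # scores 1
              [0.0, 0.0]])  # scores 0

print(svm_loss_single(x, y, W))      # 0.0 -> W achieves zero loss
print(svm_loss_single(x, y, 2 * W))  # 0.0 -> so does 2W
```

This degeneracy is one motivation for the regularization term introduced in the next section.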
Regularization