[Google_Bootcamp_Day11]
Batch Normalization
- makes hyperparameter search much easier
Normalizing inputs to speed up learning
- Normalizing input features helps train w, b more efficiently
- The same process is applied to the hidden layer values z[l] (see the sketch below)
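A minimal sketch of the normalization step, assuming NumPy and the course's (n_features, m_examples) data layout; applying the same transform to z[l] is the core of Batch Norm:

```python
import numpy as np

def normalize_features(X, eps=1e-8):
    """X: (n_features, m_examples). Return zero-mean, unit-variance features."""
    mu = X.mean(axis=1, keepdims=True)    # per-feature mean over the training set
    var = X.var(axis=1, keepdims=True)    # per-feature variance
    return (X - mu) / np.sqrt(var + eps)  # eps avoids division by zero
```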
Implementing Batch Norm
Adding Batch Norm to a network
With mini-batch
Implementing gradient descent
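A minimal sketch (assumed NumPy conventions; helper names are illustrative, not the course's code) of Batch Norm applied to one layer's pre-activations on a mini-batch, with gamma[l] and beta[l] as learnable parameters updated by gradient descent just like W[l]:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Z: (n_units, m) pre-activations of layer l for one mini-batch."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * Z_norm + beta             # learnable scale and shift: z_tilde[l]

# Inside the mini-batch gradient descent loop, beta[l] and gamma[l] are updated
# like any other parameter (b[l] can be dropped, since subtracting the mean
# cancels it and beta[l] takes over the shift):
#   W[l]     -= learning_rate * dW[l]
#   gamma[l] -= learning_rate * dgamma[l]
#   beta[l]  -= learning_rate * dbeta[l]
```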
Why Batch Norm works
- Problem
  - Covariate shift: from the perspective of a later layer, the distribution of its inputs z[2]_1, z[2]_2, ..., z[2]_4 (the hidden unit values of the second layer) keeps changing as the earlier layers' parameters are updated
- Solution
  - Batch Norm guarantees that even if the exact values of z[2]_1, ..., z[2]_4 change, their mean and variance remain the same (governed by beta[2] and gamma[2])
  - Batch Norm reduces the amount that the distribution of hidden unit values shifts around
  - Limits the amount that updating parameters in the earlier layers can affect the distribution of values the later layer sees
- Batch Norm also acts as (slight) regularization
- Each mini-batch is scaled by the mean/variance computed on just that mini-batch
- This adds some noise to the values z[l] within that minibatch.
- So, similar to dropout, it adds some noise to each hidden layer’s activations
- Larger mini-batch size -> less noise -> smaller regularization effect (see the demo below)
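A small illustrative demo (not from the lecture): the mean/variance used to scale z are recomputed on each mini-batch, and these statistics fluctuate more for small mini-batches than for large ones, which is where the noise comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=3.0, size=4096)  # pre-activations of one hidden unit

for batch_size in (64, 1024):
    batches = z.reshape(-1, batch_size)
    mus = batches.mean(axis=1)
    sigmas = batches.std(axis=1)
    # Spread of the per-mini-batch statistics: larger batches -> less noise
    print(batch_size, round(mus.std(), 3), round(sigmas.std(), 3))
```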
Batch Norm at test time
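- At test time there may be only a single example, so instead of computing mu and sigma^2 on a mini-batch, use running estimates kept during training (exponentially weighted averages across mini-batches)

A minimal sketch of that idea, assuming NumPy; variable names such as running_mu are illustrative:

```python
import numpy as np

def update_running_stats(running_mu, running_var, mu, var, momentum=0.9):
    """Exponentially weighted averages of the mini-batch statistics (training time)."""
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return running_mu, running_var

def batch_norm_test(z, running_mu, running_var, gamma, beta, eps=1e-8):
    """At test time, normalize with the running estimates instead of
    statistics computed on the (possibly single-example) test input."""
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta
```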
Softmax regression
- Ex. Recognizing cats (1), dogs (2), and baby chicks (3)
- Softmax regression generalizes logistic regression to C classes
Softmax layer
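A minimal sketch of the softmax activation, assuming NumPy: exponentiate the output-layer pre-activations z[L] and divide by their sum so the C outputs form a probability distribution over the classes:

```python
import numpy as np

def softmax(z):
    """z: (C, m) pre-activations of the output layer; returns (C, m) probabilities."""
    t = np.exp(z - z.max(axis=0, keepdims=True))  # subtract max for numerical stability
    return t / t.sum(axis=0, keepdims=True)

# Example with C = 3 classes (cats, dogs, baby chicks) and one input:
z_L = np.array([[2.0], [1.0], [0.1]])
print(softmax(z_L))  # roughly [[0.66], [0.24], [0.10]], columns sum to 1
```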
[Source] https://www.coursera.org/learn/deep-neural-network