[Google_Bootcamp_Day11]

Batch Normalization

  • Makes hyperparameter search much easier and makes the network more robust to the choice of hyperparameters


Normalizing inputs to speed up learning

  • Normalizing the input features helps train w and b more efficiently (see the sketch below)
  • The same process can be applied to the values in the hidden layers, which is what Batch Norm does
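
A minimal NumPy sketch of input normalization (the function and variable names are my own, not from the course): subtract each feature's training-set mean and divide by its standard deviation, and reuse the same μ and σ² on the test set.

```python
import numpy as np

def normalize_inputs(X_train, X_test, eps=1e-8):
    """Feature-wise normalization; rows are features, columns are examples."""
    mu = np.mean(X_train, axis=1, keepdims=True)       # per-feature mean
    sigma2 = np.var(X_train, axis=1, keepdims=True)    # per-feature variance
    X_train_norm = (X_train - mu) / np.sqrt(sigma2 + eps)
    # Use the *training* mu and sigma2 for the test set so both go through the same transform
    X_test_norm = (X_test - mu) / np.sqrt(sigma2 + eps)
    return X_train_norm, X_test_norm
```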

Implementing Batch Norm

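A sketch of the Batch Norm computation for one layer's pre-activations Z (shape: hidden units × mini-batch size); γ (gamma) and β (beta) are the learnable scale and shift, and the helper name is my own:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z across the mini-batch, then scale and shift with the learnable gamma, beta."""
    mu = np.mean(Z, axis=1, keepdims=True)        # mean over the mini-batch
    sigma2 = np.var(Z, axis=1, keepdims=True)     # variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(sigma2 + eps)     # zero mean, unit variance
    Z_tilde = gamma * Z_norm + beta               # mean/variance now set by beta and gamma
    return Z_tilde, (Z_norm, mu, sigma2)          # cache mu, sigma2 for backprop and test-time averages
```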

Adding Batch Norm to a network

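To show where Batch Norm sits in the network, here is a rough sketch of one hidden layer's forward step: the linear step produces Z, Batch Norm is applied to Z, and the activation comes after. The activation choice and the function names are illustrative assumptions, not the course code.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def layer_forward_with_bn(A_prev, W, gamma, beta, eps=1e-8):
    """One hidden layer: linear -> Batch Norm -> activation."""
    Z = W @ A_prev                                         # note: no bias b here (see the next section)
    mu = np.mean(Z, axis=1, keepdims=True)
    sigma2 = np.var(Z, axis=1, keepdims=True)
    Z_tilde = gamma * (Z - mu) / np.sqrt(sigma2 + eps) + beta
    return relu(Z_tilde)                                   # activation applied to the normalized values
```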

With mini-batch

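With mini-batches, μ and σ² are computed on each mini-batch separately, and because the mean is subtracted, any constant bias b[l] added before Batch Norm cancels out; β[l] takes over that role, so each layer's parameters become W[l], γ[l], β[l]. A tiny check of that cancellation (my own toy example):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 64))        # pre-activations of 4 hidden units over a mini-batch of 64
b = rng.normal(size=(4, 1))         # a constant per-unit bias

def normalize(Z, eps=1e-8):
    mu = np.mean(Z, axis=1, keepdims=True)
    return (Z - mu) / np.sqrt(np.var(Z, axis=1, keepdims=True) + eps)

# Subtracting the mini-batch mean removes any constant bias, so b has no effect after Batch Norm
print(np.allclose(normalize(Z), normalize(Z + b)))   # True
```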

Implementing gradient descent

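A hedged, pseudocode-style sketch of the training loop: for each mini-batch, forward prop with Batch Norm, backprop to get dW[l], dγ[l], dβ[l], then update every parameter. The forward/backward helpers here are hypothetical placeholders, not functions from the course.

```python
# Sketch only: forward_prop_with_bn and backward_prop_with_bn are hypothetical placeholders.
def train(params, minibatches, learning_rate, num_layers):
    for X_batch, Y_batch in minibatches:
        caches = forward_prop_with_bn(X_batch, params)            # uses this mini-batch's mu, sigma2
        grads = backward_prop_with_bn(Y_batch, params, caches)    # yields dW, dgamma, dbeta per layer
        for l in range(1, num_layers + 1):
            params[f"W{l}"] -= learning_rate * grads[f"dW{l}"]
            params[f"gamma{l}"] -= learning_rate * grads[f"dgamma{l}"]
            params[f"beta{l}"] -= learning_rate * grads[f"dbeta{l}"]
    return params
```

The same structure works with momentum, RMSprop, or Adam in place of the plain gradient descent update.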

Why Batch Norm works

  • Problem: covariate shift
    • From the perspective of a later hidden layer, the distribution of the values it receives (e.g. z[2]_1, z[2]_2, …, z[2]_4 in the second layer) keeps changing as the earlier layers’ parameters are updated
  • Solution
    • Batch Norm ensures that even if the exact values of z[2]_1, z[2]_2, …, z[2]_4 change, at least their mean and variance remain the same (controlled by β[2] and γ[2])
    • Batch Norm reduces the amount that the distribution of hidden unit values shifts around
    • It limits the amount that updating the parameters in the earlier layers can affect the distribution of values that the later layers see
    • Batch Norm also has a slight regularization effect
      • Each mini-batch is scaled by the mean/variance computed on just that mini-batch
      • This adds some noise to the values z[l] within that mini-batch
      • So, similar to dropout, it adds some noise to each hidden layer’s activations
      • A larger mini-batch size reduces this noise and therefore reduces the regularization effect (see the toy check below)
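
A tiny toy check of the noise argument (my own, not from the notes): the per-mini-batch mean of a hidden unit fluctuates around the true mean, and the fluctuation shrinks as the mini-batch grows, which is why a large mini-batch size weakens the regularization effect.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(loc=3.0, scale=2.0, size=100_000)   # one hidden unit's values over many examples

for batch_size in (64, 256, 1024):
    n = (len(z) // batch_size) * batch_size
    batch_means = z[:n].reshape(-1, batch_size).mean(axis=1)
    # Larger mini-batches -> mini-batch means cluster tighter around the true mean -> less noise
    print(batch_size, round(float(batch_means.std()), 4))
```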

Batch Norm at test time

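At test time there may be no mini-batch to compute μ and σ² from, so the usual approach is to keep an exponentially weighted average of the mini-batch statistics during training and use those estimates when predicting. A rough sketch (the names and the momentum value are my own assumptions):

```python
import numpy as np

def update_running_stats(running_mu, running_sigma2, mu_batch, sigma2_batch, momentum=0.9):
    """During training: exponentially weighted averages of the mini-batch mean and variance."""
    running_mu = momentum * running_mu + (1 - momentum) * mu_batch
    running_sigma2 = momentum * running_sigma2 + (1 - momentum) * sigma2_batch
    return running_mu, running_sigma2

def batch_norm_at_test_time(z, running_mu, running_sigma2, gamma, beta, eps=1e-8):
    """At test time: normalize with the running estimates instead of mini-batch statistics."""
    z_norm = (z - running_mu) / np.sqrt(running_sigma2 + eps)
    return gamma * z_norm + beta
```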

Softmax regression

  • Ex. recognizing cats (class 1), dogs (class 2), and baby chicks (class 3); everything else is class 0 (other)
  • Softmax regression generalizes logistic regression to C classes

Softmax layer

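The softmax layer exponentiates each element of the final-layer z[L] (one per class) and normalizes so the C outputs sum to 1 and can be read as class probabilities. A small sketch; the max subtraction is a standard numerical-stability trick, not something from the notes:

```python
import numpy as np

def softmax(z):
    """z: (C, m) pre-activations for C classes and m examples -> (C, m) probabilities."""
    t = np.exp(z - np.max(z, axis=0, keepdims=True))   # t = e^z, shifted for numerical stability
    return t / np.sum(t, axis=0, keepdims=True)        # each column now sums to 1

# Worked example with C = 4 classes and a single input
z = np.array([[5.0], [2.0], [-1.0], [3.0]])
print(softmax(z).ravel())   # approximately [0.842, 0.042, 0.002, 0.114]
```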

[Source] https://www.coursera.org/learn/deep-neural-network
