[Google_Bootcamp_Day8]

Mini-batch gradient descent

  • Vectorization allows you to efficiently compute on m examples
  • But if m is very large, each gradient descent step is slow because it processes the entire training set

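As a concrete illustration, here is a minimal sketch of splitting the training set into mini-batches X^{t}, Y^{t}. The function name, default batch size, and column-per-example array shapes are assumptions made for the example, not taken from the lecture.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) and split into mini-batches (X^{t}, Y^{t}).

    Assumes X has shape (n_features, m) and Y has shape (1, m),
    i.e. one training example per column.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)               # shuffle the examples
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]

    mini_batches = []
    for start in range(0, m, batch_size):   # the last mini-batch may be smaller
        end = start + batch_size
        mini_batches.append((X_shuf[:, start:end], Y_shuf[:, start:end]))
    return mini_batches
```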

Training with mini-batch gradient descent

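A sketch of the training loop, with one parameter update per mini-batch. Logistic regression is used as a stand-in model so the example stays self-contained; the model choice and hyperparameters are illustrative, and make_mini_batches is the helper sketched above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, Y, learning_rate=0.1, num_epochs=10, batch_size=64):
    """Mini-batch gradient descent on logistic regression (illustrative model)."""
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for epoch in range(num_epochs):
        for X_t, Y_t in make_mini_batches(X, Y, batch_size, seed=epoch):
            mt = X_t.shape[1]                  # size of this mini-batch
            A = sigmoid(w.T @ X_t + b)         # forward pass on the mini-batch only
            dZ = A - Y_t                       # backward pass (cross-entropy loss)
            dw = X_t @ dZ.T / mt
            db = dZ.sum() / mt
            w -= learning_rate * dw            # update after every mini-batch,
            b -= learning_rate * db            # not once per pass over the data
    return w, b
```

Because the cost is computed on a different mini-batch at every step, it decreases with some noise rather than monotonically, but each epoch makes many updates instead of one.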

Choosing your mini-batch size

  • If mini-batch size = m : Batch gradient descent
  • If mini-batch size = 1 : Stochastic gradient descent (Every example is its own mini-batch)
  • In practice, use something in between 1 and m
  1. Batch gradient descent
    • Takes too long per iteration
  2. Stochastic gradient descent
    • Loses the speed-up from vectorization
  3. Mini-batch gradient descent
    • Fastest learning
    • Keeps the speed-up from vectorization
    • Makes progress without processing the entire training set

If the training set is small (m <= 2000), use batch gradient descent.
Otherwise, use a typical mini-batch size (64, 128, 256, 512).
Make sure the mini-batch fits in CPU/GPU memory.

Exponentially weighted average

  • Recent values get a high weight; older values get a low weight
  • Example: temperature in London (see the sketch below)

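The exponentially weighted average follows v_t = beta * v_{t-1} + (1 - beta) * theta_t. A small sketch; the temperature readings are made up for illustration.

```python
import numpy as np

def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0
    averages = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta   # recent values weighted more heavily
        averages.append(v)
    return np.array(averages)

# Illustrative daily temperatures; beta = 0.9 averages over roughly the last 1/(1-beta) = 10 days.
temps = [40, 49, 45, 44, 51, 52, 48, 47]
print(ewa(temps, beta=0.9))
```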

Bias correction in exponentially weighted average

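Because v_0 = 0, the early values of v_t are biased toward zero; dividing by (1 - beta^t) corrects this. A minimal sketch:

```python
def ewa_bias_corrected(thetas, beta=0.9):
    """Exponentially weighted average with bias correction: v_t / (1 - beta**t)."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))   # compensates the start-up bias toward 0
    return corrected
```

The correction matters mostly for small t; as t grows, beta^t goes to 0 and the corrected and uncorrected averages coincide.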

Gradient descent with momentum

  • Gradient descent example
  • Implementation of the method (see the sketch below)
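Momentum keeps an exponentially weighted average of the gradients and uses it for the update: v_dW = beta * v_dW + (1 - beta) * dW, then W := W - alpha * v_dW. A minimal sketch; the dictionary layout of parameters and grads (same keys in both) is an assumption made for the example.

```python
import numpy as np

def init_velocities(parameters):
    """Velocity terms start at zero, one per parameter array (assumed layout)."""
    return {key: np.zeros_like(value) for key, value in parameters.items()}

def update_with_momentum(parameters, grads, velocities, beta=0.9, learning_rate=0.01):
    """One momentum step: v = beta * v + (1 - beta) * grad, then param -= alpha * v."""
    for key in parameters:                      # e.g. "W1", "b1", "W2", ... (assumed keys)
        velocities[key] = beta * velocities[key] + (1 - beta) * grads[key]
        parameters[key] = parameters[key] - learning_rate * velocities[key]
    return parameters, velocities
```

Averaging the gradients damps the oscillations of plain gradient descent, so a larger learning rate can be used; beta = 0.9 is a common default.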

[Source] https://www.coursera.org/learn/deep-neural-network
