[Google_Bootcamp_Day8]
Mini-batch gradient descent
- Vectorization lets you compute efficiently over all m examples at once
- But when m is very large, each gradient step must process the entire training set, so training is slow
Training with mini-batch gradient descent
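Below is a minimal NumPy sketch of what one epoch could look like, assuming X of shape (n_x, m) and Y of shape (1, m); the helper names (random_mini_batches, one_epoch) and the logistic-regression gradients are an illustration, not the course's exact code.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=None):
    """Shuffle the m columns of X/Y together and split them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

def one_epoch(w, b, X, Y, lr=0.1, batch_size=64):
    """One pass over the training set: a gradient step per mini-batch."""
    for X_t, Y_t in random_mini_batches(X, Y, batch_size):
        mt = X_t.shape[1]
        A = 1.0 / (1.0 + np.exp(-(w.T @ X_t + b)))  # forward pass (logistic regression)
        dZ = A - Y_t                                # backward pass on the mini-batch
        dw = X_t @ dZ.T / mt
        db = dZ.sum() / mt
        w -= lr * dw                                # update after every mini-batch,
        b -= lr * db                                # not once per full pass over the data
    return w, b
```

Updating after every mini-batch is what gives the "progress without processing the entire training set" behavior noted below.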
Choosing your mini-batch size
- If mini-batch size = m : Batch gradient descent
- If mini-batch size = 1 : Stochastic gradient descent (Every example is its own mini-batch)
- In practice, use something in between 1 and m
- Batch gradient descent (size = m)
  - Takes too long per iteration: every update has to process the whole training set
- Stochastic gradient descent (size = 1)
  - Loses the speed-up from vectorization
- Mini-batch gradient descent (size in between)
  - Fastest learning in practice
  - Keeps the speed-up from vectorization
  - Makes progress without processing the entire training set
If the training set is small (m <= 2000), just use batch gradient descent.
Otherwise use a typical mini-batch size of 64, 128, 256, or 512 (powers of 2), as in the usage sketch below.
Make sure a single mini-batch fits in CPU/GPU memory.
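A usage sketch tying this guidance to the loop above (reusing numpy and one_epoch from that sketch; the toy data and m = 4096 are made up for illustration, and since m > 2000 a power-of-two mini-batch size of 64 is used):

```python
rng = np.random.default_rng(1)
m = 4096                                            # m > 2000, so use mini-batches
X = rng.standard_normal((2, m))
Y = (X.sum(axis=0, keepdims=True) > 0).astype(float)

w, b = np.zeros((2, 1)), 0.0
for epoch in range(20):
    w, b = one_epoch(w, b, X, Y, lr=0.1, batch_size=64)  # 64 examples per update
```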
Exponentially weighted average
- Recent values get large weight; older values get exponentially smaller weight
- v_t = beta * v_{t-1} + (1 - beta) * theta_t, which averages over roughly 1 / (1 - beta) values
- Example: daily temperature in London
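A small sketch of that recursion; the temperature values are made up, and beta = 0.9 averages over roughly the last 10 days.

```python
import numpy as np

def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return np.array(out)

temps = [4.0, 9.0, 6.0, 5.0, 10.0, 12.0, 8.0]  # illustrative daily temperatures
print(ewa(temps))  # smoothed series; note the first values sit well below the data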
Bias correction in exponentially weighted average
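The low early values come from initializing v_0 = 0; bias correction divides each estimate by (1 - beta^t). A sketch extending the one above (reusing numpy and the same illustrative temps):

```python
def ewa_bias_corrected(thetas, beta=0.9):
    """Same recursion, but each estimate is divided by (1 - beta**t)."""
    v, out = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        out.append(v / (1 - beta ** t))  # correction matters most for small t
    return np.array(out)

print(ewa_bias_corrected(temps))  # early estimates now track the data instead of starting near 0
```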
Gradient descent with momentum
- Example: plain gradient descent can oscillate across elongated cost contours, which forces a small learning rate
- Implementation: compute an exponentially weighted average of the gradients and use it for the update, which damps the oscillations
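A minimal sketch of the momentum update, v = beta * v + (1 - beta) * dW followed by W -= alpha * v (beta = 0.9 is the common default); the parameter/gradient dictionary layout here is an assumption for illustration.

```python
import numpy as np

def initialize_velocity(params):
    """One zero-initialized velocity array per parameter."""
    return {name: np.zeros_like(value) for name, value in params.items()}

def momentum_update(params, grads, velocity, lr=0.01, beta=0.9):
    """v = beta*v + (1-beta)*grad; param -= lr*v (an EWA of the gradients)."""
    for name in params:
        velocity[name] = beta * velocity[name] + (1 - beta) * grads[name]
        params[name] -= lr * velocity[name]
    return params, velocity

# Toy usage with made-up parameter shapes and gradients
params = {"W1": np.ones((3, 2)), "b1": np.zeros((3, 1))}
grads = {"W1": 0.1 * np.ones((3, 2)), "b1": 0.2 * np.ones((3, 1))}
velocity = initialize_velocity(params)
params, velocity = momentum_update(params, grads, velocity, lr=0.1)
```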
[Source] https://www.coursera.org/learn/deep-neural-network