[Google_Bootcamp_Day8]
Mini-batch gradient descent
- Vectorization allows you to efficiently compute on m examples
- But if m is very large, every gradient descent step has to process the whole training set, which makes training slow
 

Training with mini-batch gradient descent
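A minimal NumPy sketch of the shuffle-and-partition step used in mini-batch training; the function name and the one-example-per-column shapes are assumptions for illustration, not code from the course. Each epoch shuffles the data, splits it into mini-batches, and runs a forward pass, backward pass, and parameter update on each batch.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Shuffle (X, Y) and partition them into mini-batches.

    X : data, shape (n_x, m)   -- one column per example
    Y : labels, shape (1, m)
    Returns a list of (mini_batch_X, mini_batch_Y) tuples.
    """
    np.random.seed(seed)
    m = X.shape[1]
    mini_batches = []

    # Step 1: shuffle the columns of X and Y with the same permutation
    permutation = np.random.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]

    # Step 2: partition into batches (the last batch may be smaller)
    for k in range(0, m, mini_batch_size):
        mini_batch_X = shuffled_X[:, k:k + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k:k + mini_batch_size]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    return mini_batches

# Example: 1000 examples with 5 features each -> 16 mini-batches of size 64 (last one has 40)
X = np.random.randn(5, 1000)
Y = (np.random.rand(1, 1000) > 0.5).astype(int)
batches = random_mini_batches(X, Y, mini_batch_size=64)
print(len(batches), batches[0][0].shape)  # 16 (5, 64)
```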

Choosing your mini-batch size
- If mini-batch size = m : Batch gradient descent
- If mini-batch size = 1 : Stochastic gradient descent (Every example is its own mini-batch)
- In practice, use a mini-batch size somewhere between 1 and m
- Batch gradient descent
    - Takes too long per iteration
- Stochastic gradient descent
    - Loses the speed-up from vectorization
- Mini-batch gradient descent
    - Fastest learning in practice
        - Keeps the vectorization speed-up
        - Makes progress without processing the entire training set
 
If the training set is small (m <= 2000), just use batch gradient descent.
Otherwise, use a typical mini-batch size (64, 128, 256, 512; powers of 2).
Make sure each mini-batch fits in CPU/GPU memory.
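As a tiny illustration, this helper simply encodes the heuristic above; the function name and return values are placeholders, not anything from the course.

```python
def suggest_mini_batch_size(m):
    """Heuristic from these notes: m <= 2000 -> full batch, else a power of two (64-512)."""
    if m <= 2000:
        return m    # small training set: batch gradient descent
    return 64       # otherwise start at 64; try 128, 256, 512 if it still fits in memory
```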
Exponentially weighted averages
- Recent values get a high weight; older values get an exponentially smaller weight
- Update rule: v_t = beta * v_(t-1) + (1 - beta) * theta_t, which averages over roughly the last 1 / (1 - beta) values
- Example: daily temperature in London
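A minimal sketch of the update rule above applied to a synthetic temperature series; the data is made up for illustration, not the London data from the lecture.

```python
import numpy as np

def exp_weighted_average(theta, beta=0.9):
    """Compute v_t = beta * v_(t-1) + (1 - beta) * theta_t, starting from v_0 = 0."""
    v = 0.0
    averages = []
    for x in theta:
        v = beta * v + (1 - beta) * x
        averages.append(v)
    return np.array(averages)

# Synthetic "daily temperature" data (illustration only)
days = np.arange(365)
temps = 10 + 8 * np.sin(2 * np.pi * days / 365) + np.random.randn(365)

smoothed = exp_weighted_average(temps, beta=0.9)  # averages over roughly the last 10 days
print(temps[:3], smoothed[:3])
```

With beta = 0.9 the average effectively covers about the last 10 values; a larger beta gives a smoother but more lagged curve.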
 
 
 

Bias correction in exponentially weighted average
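Because v_0 = 0, the early values of v_t are biased toward zero; dividing by (1 - beta^t) corrects this. A minimal sketch, continuing the function above:

```python
def exp_weighted_average_corrected(theta, beta=0.9):
    """Exponentially weighted average with bias correction: v_t / (1 - beta^t)."""
    v = 0.0
    corrected = []
    for t, x in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * x
        corrected.append(v / (1 - beta ** t))  # undo the bias toward 0 from v_0 = 0
    return corrected

print(exp_weighted_average_corrected([10.0, 12.0, 11.0], beta=0.9))
# First value is 10.0 rather than 1.0, because the early bias toward zero is corrected
```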

Gradient descent with momentum
- Gradient descent example: plain gradient descent tends to oscillate across the contours of the cost, which forces a small learning rate
 
- Implementation of the momentum update (see the sketch below)
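Momentum keeps an exponentially weighted average of the gradients and steps in that smoothed direction, which damps the oscillations. A minimal NumPy sketch, where `params`, `grads`, and the velocity dict `v` are placeholder names, not course code:

```python
import numpy as np

def update_with_momentum(params, grads, v, beta=0.9, learning_rate=0.01):
    """One momentum step: v = beta*v + (1-beta)*grad, then param -= learning_rate * v.

    params, grads, v are dicts keyed by parameter name, e.g. "W1", "b1", ...
    """
    for key in params:
        v[key] = beta * v[key] + (1 - beta) * grads[key]     # smoothed gradient (velocity)
        params[key] = params[key] - learning_rate * v[key]   # step in the smoothed direction
    return params, v

# Tiny usage example with made-up shapes (illustration only)
params = {"W1": np.random.randn(3, 2), "b1": np.zeros((3, 1))}
grads  = {"W1": np.random.randn(3, 2), "b1": np.random.randn(3, 1)}
v      = {k: np.zeros_like(p) for k, p in params.items()}   # velocities start at zero

params, v = update_with_momentum(params, grads, v, beta=0.9, learning_rate=0.01)
```

In practice beta = 0.9 is a robust default, and bias correction is usually skipped for momentum because the average warms up after about 10 iterations.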
 
[Source] https://www.coursera.org/learn/deep-neural-network
 
      
    