[Google_Bootcamp_Day8]
Mini-batch gradient descent
- Vectorization lets you compute efficiently over all m examples at once
- But when m is very large, each gradient step must process the entire training set, so training is slow
Training with mini-batch gradient descent
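Below is a minimal NumPy sketch of what one epoch could look like, assuming X of shape (n_x, m) and Y of shape (1, m); the helper names (random_mini_batches, one_epoch) and the logistic-regression gradients are an illustration, not the course's exact code.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=None):
    """Shuffle the m columns of X/Y together and split them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

def one_epoch(w, b, X, Y, lr=0.1, batch_size=64):
    """One pass over the training set: a gradient step per mini-batch."""
    for X_t, Y_t in random_mini_batches(X, Y, batch_size):
        mt = X_t.shape[1]
        A = 1.0 / (1.0 + np.exp(-(w.T @ X_t + b)))  # forward pass (logistic regression)
        dZ = A - Y_t                                # backward pass on the mini-batch
        dw = X_t @ dZ.T / mt
        db = dZ.sum() / mt
        w -= lr * dw                                # update after every mini-batch,
        b -= lr * db                                # not once per full pass over the data
    return w, b
```

Updating after every mini-batch is what gives the "progress without processing the entire training set" behavior noted below.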
Choosing your mini-batch size
- If mini-batch size = m : Batch gradient descent
- If mini-batch size = 1 : Stochastic gradient descent (Every example is its own mini-batch)
- In practice, use something in between 1 and m
- Batch gradient descent (size = m)
  - Takes too long per iteration: every update has to process the whole training set
- Stochastic gradient descent (size = 1)
  - Loses the speed-up from vectorization
- Mini-batch gradient descent (size in between)
  - Fastest learning in practice
  - Keeps the speed-up from vectorization
  - Makes progress without processing the entire training set
If the training set is small (m <= 2000), just use batch gradient descent.
Otherwise use a typical mini-batch size of 64, 128, 256, or 512 (powers of 2), as in the usage sketch below.
Make sure a single mini-batch fits in CPU/GPU memory.
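A usage sketch tying this guidance to the loop above (reusing numpy and one_epoch from that sketch; the toy data and m = 4096 are made up for illustration, and since m > 2000 a power-of-two mini-batch size of 64 is used):

```python
rng = np.random.default_rng(1)
m = 4096                                            # m > 2000, so use mini-batches
X = rng.standard_normal((2, m))
Y = (X.sum(axis=0, keepdims=True) > 0).astype(float)

w, b = np.zeros((2, 1)), 0.0
for epoch in range(20):
    w, b = one_epoch(w, b, X, Y, lr=0.1, batch_size=64)  # 64 examples per update
```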
Exponentially weighted average
- Recent values get large weight; older values get exponentially smaller weight
- v_t = beta * v_{t-1} + (1 - beta) * theta_t, which averages over roughly 1 / (1 - beta) values
- Example: daily temperature in London
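A small sketch of that recursion; the temperature values are made up, and beta = 0.9 averages over roughly the last 10 days.

```python
import numpy as np

def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return np.array(out)

temps = [4.0, 9.0, 6.0, 5.0, 10.0, 12.0, 8.0]  # illustrative daily temperatures
print(ewa(temps))  # smoothed series; note the first values sit well below the data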
Bias correction in exponentially weighted average
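The low early values come from initializing v_0 = 0; bias correction divides each estimate by (1 - beta^t). A sketch extending the one above (reusing numpy and the same illustrative temps):

```python
def ewa_bias_corrected(thetas, beta=0.9):
    """Same recursion, but each estimate is divided by (1 - beta**t)."""
    v, out = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        out.append(v / (1 - beta ** t))  # correction matters most for small t
    return np.array(out)

print(ewa_bias_corrected(temps))  # early estimates now track the data instead of starting near 0
```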
Gradient descent with momentum
- Example: plain gradient descent can oscillate across elongated cost contours, which forces a small learning rate
- Implementation: compute an exponentially weighted average of the gradients and use it for the update, which damps the oscillations
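A minimal sketch of the momentum update, v = beta * v + (1 - beta) * dW followed by W -= alpha * v (beta = 0.9 is the common default); the parameter/gradient dictionary layout here is an assumption for illustration.

```python
import numpy as np

def initialize_velocity(params):
    """One zero-initialized velocity array per parameter."""
    return {name: np.zeros_like(value) for name, value in params.items()}

def momentum_update(params, grads, velocity, lr=0.01, beta=0.9):
    """v = beta*v + (1-beta)*grad; param -= lr*v (an EWA of the gradients)."""
    for name in params:
        velocity[name] = beta * velocity[name] + (1 - beta) * grads[name]
        params[name] -= lr * velocity[name]
    return params, velocity

# Toy usage with made-up parameter shapes and gradients
params = {"W1": np.ones((3, 2)), "b1": np.zeros((3, 1))}
grads = {"W1": 0.1 * np.ones((3, 2)), "b1": 0.2 * np.ones((3, 1))}
velocity = initialize_velocity(params)
params, velocity = momentum_update(params, grads, velocity, lr=0.1)
```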
[Source] https://www.coursera.org/learn/deep-neural-network