[Google_Bootcamp_Day20]

Updated:

Object Detection

Object localization

1

Define the target label y

2

Landmark Detection

  • Labels have to be consistent across the different images 3

Sliding windows detection

  • Pass lots of little cropped images into the ConvNet and have it classified zero or one for each position
  • Problem : computation cost 4

Implementation of sliding windows

  • Turning FC layer into Convolutional layers 5

  • Convolution implementation 6 7

  • crop out a region
  • Repeat run cropped image to your ConvNet until recognizes the object
  • Instead of doing it sequentially, implement the entire image, all maybe 28 by 28 and convolutionally make all the predictions at the same time by one forward pass
  • Problem : the position of the bounding boxes is not going to be too accurate (none of the bounding boxes match exactly)

Intersection over Union (IoU)

  • Generally, IoU is a measure of the overlap between two bounding boxes
  • Evaluation for object localization task iou

Non-max Suppression

  • Non-max suppression is a way for you to make sure that your algorithm detects each object only once
  • What non-max suppression does is that it cleans up all detections, so they end up with just one detection per car, rather than multiple detections per car

ex

  • Disard all boxes with Pc <= 0.6 (threshold)
  • While there are any remaining boxes
    • Pick the box with the largest Pc, then output that as a prediction
    • Discard any remaining box with IoU >= 0.5 (threshold) with the box output in the previous step

Anchor boxes

  • Problem : each of the grid cells can detect only one object
  • Motivation : What if a grid cell wants to detect multiple objects? anchor
  • Previous method
    • Each object in training image is assigned to grid cell that contains that object’s midpoint
    • Output y : 3 * 3 * 8
  • With two anchor boxes
    • Each object in training image is assigned to grid cell that contains object’s midpoint and anchor box for the grid cell with highest IoU (similar shape between bounding box and anchor box)
    • Output y : 3 * 3 * 16
    • Objects assiged to (grid cell, anchor box) pairs

Region proposal

  • R-CNN
    • Idea : Rather than running sliding windows on every single window, you instead select just a few windows and run your classifier on just a few windows (segementation algorithm applied in order to figure out in which windows could be objeccts)
    • Propose regions
    • Classify proposed regions one at a time
    • Output label + bounding box
  • Fast R-CNN
    • Propose regions
    • Use convolution implementation of sliding windows to classify all the proposed regions
  • Faster R-CNN
    • Use convolutional network to propose regions

[source] https://www.coursera.org/learn/convolutional-neural-networks

Categories:

Updated:

Leave a comment