Deep Learning - Object Detection Notes Part 2

R-CNN

Posted by Rico's Nerd Cluster on February 13, 2022

Region Based CNN (R-CNN, Girshick et al. CVPR 2014)

Zhihu

Region proposals are the core of R-CNN. It first uses a segmentation algorithm to find regions that are likely to contain objects, then uses these regions as “region proposals” for a CNN to run on [2]. For each region, the output is [label, bounding box].

  1. Use Selective Search Algorithm to come up with 2000 region proposals: TODO
  2. Use AlexNet for feature extraction on the 2000 region proposals.
    • This yields a 2000x4096 feature matrix (one 4096-dimensional feature vector per proposal)
  3. Classification & bounding box
    • Use 21 linear SVMs, one per class (including background), to classify the 2000 region proposals
      • Each SVM outputs one score per proposal, so every proposal gets 21 scores (a 2000x21 score matrix).
    • In parallel to classification, use TODO regression for bounding box regression
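
The shapes in the steps above can be sanity-checked with a small NumPy sketch. This is only an illustration under assumptions: the features are random stand-ins for real AlexNet outputs, and `score_proposals`, `svm_weights`, and `svm_biases` are hypothetical names, not part of any R-CNN codebase. Stacking the per-class linear SVMs as columns turns the scoring of all proposals into a single matrix product.

```python
import numpy as np

NUM_PROPOSALS = 2000  # region proposals per image (from the steps above)
FEATURE_DIM = 4096    # length of each AlexNet feature vector
NUM_CLASSES = 21      # class count used in the steps above


def score_proposals(features, svm_weights, svm_biases):
    """Score every proposal with every per-class linear SVM at once.

    features:    (N, D) matrix, one row per region proposal
    svm_weights: (D, C) matrix, one column per class SVM
    svm_biases:  (C,) vector, one bias per class SVM
    returns:     (N, C) matrix of raw SVM scores
    """
    return features @ svm_weights + svm_biases


# Random placeholders standing in for real CNN features and trained SVMs.
rng = np.random.default_rng(0)
features = rng.standard_normal((NUM_PROPOSALS, FEATURE_DIM)).astype(np.float32)
svm_weights = rng.standard_normal((FEATURE_DIM, NUM_CLASSES)).astype(np.float32)
svm_biases = np.zeros(NUM_CLASSES, dtype=np.float32)

scores = score_proposals(features, svm_weights, svm_biases)
labels = scores.argmax(axis=1)  # simplification: pick the best-scoring class

print(scores.shape, labels.shape)
```

Taking the arg-max per proposal is a shortcut for the sketch; a real detector would also filter overlapping boxes before reporting detections.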

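Later came Fast R-CNN (Girshick, ICCV 2015). Fast R-CNN still proposes regions, but it then uses a convolutional implementation of sliding windows to classify all proposed regions: the CNN runs once over the whole image, and every proposal reuses the resulting feature map instead of being passed through the network separately.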
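
In the Fast R-CNN paper, the step that lets every proposal reuse one shared feature map is RoI pooling: each region is cropped from the feature map and max-pooled to a fixed-size grid. Below is a minimal single-channel sketch of that idea. The function name `roi_max_pool`, the integer bin edges, and the toy 6x6 feature map are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one region of a shared feature map to a fixed-size grid.

    feature_map: (H, W) array (a single channel, for simplicity)
    roi:         (y0, x0, y1, x1) in feature-map cells, end-exclusive
    returns:     (output_size, output_size) array
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape

    # Integer bin edges that split the region into output_size x output_size cells.
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)

    pooled = np.empty((output_size, output_size), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Guarantee each bin covers at least one cell for tiny regions.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            pooled[i, j] = region[ys[i]:y_end, xs[j]:x_end].max()
    return pooled


# One shared feature map, computed once per image in the real network.
feature_map = np.arange(36).reshape(6, 6)

# Two proposals of different sizes map to the same fixed output shape.
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 4))
pooled_b = roi_max_pool(feature_map, (2, 1, 6, 6))
print(pooled_a)
print(pooled_b)
```

Because both outputs have the same shape regardless of the proposal size, they can be fed to the same classification layers, which is what removes the need for a separate CNN pass per proposal.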

Then came Faster R-CNN (Ren, He et al. NeurIPS 2015). All three R-CNN variants are slower than YOLO. From Prof. Andrew Ng’s perspective, YOLO’s one-stage architecture is also more concise.

  • YOLOv3: about 30 fps on a high-end GPU
  • Faster R-CNN: 7 fps+
  • YOLOv4 and YOLOv5: 60 fps+
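
To make the frame rates above easier to compare, they can be converted into a per-frame time budget (1000 ms divided by fps). The helper name `frame_budget_ms` is hypothetical, and the fps values are simply the rough figures quoted in the list.

```python
def frame_budget_ms(fps):
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Rough frame rates taken from the list above.
quoted_fps = [("YOLOv3", 30), ("Faster R-CNN", 7), ("YOLOv4 / YOLOv5", 60)]

for name, fps in quoted_fps:
    print(f"{name}: {frame_budget_ms(fps):.1f} ms per frame")
```

At 7 fps each frame takes roughly 143 ms, compared with about 33 ms at 30 fps and about 17 ms at 60 fps.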

References