Deep Learning - Object Detection Notes Part 2 - Rico贾若童's Blog

Region Based CNN (R-CNN, Girshick et al. CVPR 2014)

Region proposal is the core of R-CNN. It first uses a segmentation algorithm to find regions that are likely to contain objects, then runs a CNN on each of these “region proposals” [2]. Each region outputs [label, bounding box].

Use the Selective Search algorithm to generate about 2000 region proposals: TODO
- Use the Hierarchical Grouping Algorithm, which starts from the graph-based segmentation of Felzenszwalb and Huttenlocher (2004) TODO: https://zhuanlan.zhihu.com/p/39927488
Use AlexNet for feature extraction on the 2000 region proposals.
- Get a 2000x4096 feature matrix (one 4096-dimensional feature vector per proposal)
Classification & bounding box
- Use 21 SVMs, one per class (including background), to classify the 2000 region proposals
  - Each proposal gets 21 scores, one from each SVM.
- In parallel with classification, use TODO regression for bounding box regression
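
The classification step above boils down to a matrix product between the per-proposal features and the per-class SVM weights. Below is a minimal sketch of that shape bookkeeping in plain Python. The names `score_proposals` and `predict_labels` are made up for illustration, the random weights stand in for trained SVMs, the toy sizes (5 proposals, 8 features, 3 classes) stand in for 2000, 4096 and 21, and picking the highest-scoring class per proposal is a simplifying assumption rather than the exact R-CNN test-time procedure.

```python
import random


def score_proposals(features, weights, biases):
    """Return scores[i][c] = dot(features[i], weights[c]) + biases[c].

    features: num_proposals x feature_dim (2000 x 4096 in the notes)
    weights:  num_classes x feature_dim   (one linear SVM per class)
    biases:   num_classes
    """
    return [
        [sum(f * w for f, w in zip(feat, w_c)) + b_c
         for w_c, b_c in zip(weights, biases)]
        for feat in features
    ]


def predict_labels(scores):
    """Pick the index of the highest-scoring class for each proposal."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy stand-ins for 2000 proposals, 4096 features, 21 classes.
NUM_PROPOSALS, FEATURE_DIM, NUM_CLASSES = 5, 8, 3

random.seed(0)
features = [[random.gauss(0, 1) for _ in range(FEATURE_DIM)]
            for _ in range(NUM_PROPOSALS)]
# Random weights are placeholders for trained SVM parameters.
weights = [[random.gauss(0, 1) for _ in range(FEATURE_DIM)]
           for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

scores = score_proposals(features, weights, biases)
labels = predict_labels(scores)
print(len(scores), "proposals x", len(scores[0]), "class scores")
print("predicted class index per proposal:", labels)
```

With the real sizes, `scores` would be a 2000x21 matrix, which is why the features for all proposals can be classified in one batched product.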

Later came Fast R-CNN (Girshick, ICCV 2015). Fast R-CNN proposes regions, then uses a convolutional implementation of sliding windows to classify all proposed regions.
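
To make "convolutional implementation of sliding windows" concrete, here is a toy sketch in plain Python. It is not Fast R-CNN itself: it assumes a tiny two-layer linear network (one 3x3 convolution followed by a fully connected layer) with random weights, and all function names are hypothetical. It compares cropping every window and running the network on each crop against running the network once over the whole image, with the fully connected layer treated as a convolution. Both give the same scores, but the second shares the convolution work between overlapping windows.

```python
import random


def conv2d_valid(x, kernel):
    """Valid cross-correlation of a 2D list `x` with a 2D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [sum(x[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def crop(x, top, left, size):
    """Cut a size x size window out of `x`."""
    return [row[left:left + size] for row in x[top:top + size]]


def per_window_scores(image, conv_kernel, fc_weights, window):
    """Naive sliding windows: run the whole network on every crop."""
    scores = []
    for top in range(len(image) - window + 1):
        row = []
        for left in range(len(image[0]) - window + 1):
            feats = conv2d_valid(crop(image, top, left, window), conv_kernel)
            # The FC layer covers the whole feature map, so the output is 1x1.
            row.append(conv2d_valid(feats, fc_weights)[0][0])
        scores.append(row)
    return scores


def convolutional_scores(image, conv_kernel, fc_weights):
    """One pass over the full image; the FC layer is slid as a convolution."""
    feats = conv2d_valid(image, conv_kernel)
    return conv2d_valid(feats, fc_weights)


random.seed(0)
IMAGE_SIZE, WINDOW, KERNEL = 6, 4, 3
FC_SIZE = WINDOW - KERNEL + 1  # feature-map size for a single window

image = [[random.random() for _ in range(IMAGE_SIZE)] for _ in range(IMAGE_SIZE)]
conv_kernel = [[random.gauss(0, 1) for _ in range(KERNEL)] for _ in range(KERNEL)]
fc_weights = [[random.gauss(0, 1) for _ in range(FC_SIZE)] for _ in range(FC_SIZE)]

naive = per_window_scores(image, conv_kernel, fc_weights, WINDOW)
shared = convolutional_scores(image, conv_kernel, fc_weights)
max_diff = max(abs(a - b) for ra, rb in zip(naive, shared) for a, b in zip(ra, rb))
print("score map:", len(shared), "x", len(shared[0]), "max difference:", max_diff)
```

In the naive version the 3x3 convolution is recomputed for every overlapping crop, while the convolutional version computes each feature once and reuses it for every window that contains it.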

Then came Faster R-CNN (Ren, He et al. NeurIPS 2015). All three R-CNN variants are slower than YOLO. From Prof. Andrew Ng’s perspective, YOLO’s 1-stage architecture is more concise.

YOLOv3: 30 fps on high end CPU
Faster R-CNN: 7 fps+
YOLOv4 and YOLOv5: 60 fps+
