In image classification, we often get many bounding boxes of the same object in a small subarea. Non Maximum Suppression (NMS) can help us pick which boxes we want to keep.
Inputs:
- Bounding box coordinates. These bounding boxes may / may not be far from each other.
- Confidence score of each bounding box.
- A threshold for determining if two bounding boxes are “separate enough”, IOU (Intersection Over Union)
The basic idea is to go through every pair of bounding boxes, following the descending order of confidence scores, and filter out the bounding boxes that has high intersection over union (IoU) ratios. The remaining bounding boxes are the ones that are separate from each other and have high confidence scores.
Well, let’s jump right into the code!
- First, delete the boxes with a confidence lower than 0.6
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#! /usr/bin/env python3
import numpy as np
import typing
def handcrafted_non_maximum_suppresion(scores: np.ndarray, boxes: np.ndarray, overlapping_thre: float = 0.3) -> typing.List:
# Sort scores into descending order and store their indices
score_order = scores.argsort()[::-1]
# Store individual coordinates in lists.
x1s = boxes[:, 0]
y1s = boxes[:, 1]
x2s = boxes[:, 2]
y2s = boxes[:, 3]
# Find areas of each box
areas = (x2s - x1s + 1) * (y2s - y1s + 1)
keep = []
while score_order.size > 0:
keep.append(score_order[0])
# Find coords differences along x
other_x2_this_x1 = x2s[score_order[1:]] - x1s[score_order[0]]
this_x2_other_x1 = x2s[score_order[0]] - x1s[score_order[1:]]
# Rest of the score_order elements
x_overlapping_lengths = np.maximum(np.minimum(other_x2_this_x1, this_x2_other_x1), 0)
other_y2_this_y1 = y2s[score_order[1:]] - y1s[score_order[0]]
this_y2_other_y1 = y2s[score_order[0]] - y1s[score_order[1:]]
y_overlapping_lengths = np.maximum(np.minimum(other_y2_this_y1, this_y2_other_y1), 0)
# Find intersection area
overlapping_areas = x_overlapping_lengths * y_overlapping_lengths
ious = overlapping_areas / (areas[score_order[0]] + areas[score_order[1:]] - overlapping_areas)
independent_box_indices = np.where(ious <= overlapping_thre)[0]
# Because ious, independent_box_indices are arrays starting at the next element, we want to add 1 here.
score_order = score_order[independent_box_indices+1]
return keep
# [x1, y1, x2, y2]
boxes = np.array([
[100, 100, 210, 210],
[101, 101, 209, 209],
[105, 105, 215, 215],
[150, 150, 270, 270]
])
scores = np.array([0.8, 0.79, 0.75, 0.9])
overlapping_thre = 0.3
selected_indices = handcrafted_non_maximum_suppresion(scores=scores, boxes=boxes, overlapping_thre=overlapping_thre)
# Should see [3,0]
print(selected_indices)
- There’s a smarter way:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def iou(box1, box2):
"""Implement the intersection over union (IoU) between box1 and box2
Arguments:
box1 -- first box, list object with coordinates (box1_x1, box1_y1, box1_x2, box_1_y2)
box2 -- second box, list object with coordinates (box2_x1, box2_y1, box2_x2, box2_y2)
"""
(box1_x1, box1_y1, box1_x2, box1_y2) = box1
(box2_x1, box2_y1, box2_x2, box2_y2) = box2
# Follows the convention that (0,0) is the top-left corner of an image, (1,0) is the upper-right corner, and (1,1) is the lower-right corner. This is easier than using the midpoint.
xi1 = max(box1_x1, box2_x1)
yi1 = max(box1_y1, box2_y1)
xi2 = min(box1_x2, box2_x2)
yi2 = min(box1_y2, box2_y2)
inter_width = xi2 - xi1
inter_height = yi2 - yi1
inter_area = inter_width * inter_height
# Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
box1_area = (box1_x2 - box1_x1 ) * (box1_y2 - box1_y1)
box2_area = (box2_x2 - box2_x1 ) * (box2_y2 - box2_y1)
union_area = box1_area + box2_area - inter_area
print(union_area)
# When there's no overlap, the inter_width and inter_height would be negative.
iou = inter_area / union_area if inter_width > 0 and inter_height > 0 else 0
return iou
- IoU threshold is often 0.5
TensorFlow Implementation
1
2
3
4
5
6
7
8
9
tf.image.non_max_suppression(
boxes,
scores,
max_output_size,
iou_threshold=0.5,
score_threshold=float('-inf'),
name=None
)
- Bounding boxes are supplied as
[y1, x1, y2, x2]
, where(y1, x1)
and(y2, x2)
are the coordinates of any diagonal pair of box corners -
A scalar integer Tensor representing the maximum number of boxes to be selected by non-max suppression.
tf.gather()
: The bounding box coordinates corresponding to the selected indices can then be obtained using the tf.gather operation.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
boxes = tf.constant([
[0.1, 0.2, 0.3, 0.4], # Box 1
[0.5, 0.6, 0.7, 0.8], # Box 2
[0.2, 0.3, 0.4, 0.5], # Box 3
[0.6, 0.7, 0.8, 0.9] # Box 4
])
# Let's say the selected indices after a selection operation are:
selected_indices = tf.constant([0, 2]) # Select Box 1 and Box 3
# see: tf.Tensor(
[[0.1 0.2 0.3 0.4]
[0.2 0.3 0.4 0.5]], shape=(2, 4), dtype=float32)
selected_boxes = tf.gather(boxes, selected_indices)
MIoU
Mean IoU is to take the average of IoU at each channel:
\[miou = \frac{1}{C} \sum_C IoU_c \\ = \frac{1}{C} \sum_C \frac{True Positive}{True Positive + False Positive + False Negative}\]