Receptive Field Block Net for Accurate and Fast Object Detection

last modified : 09-11-2018

General Information

Title: Receptive Field Block Net for Accurate and Fast Object Detection
Authors: Songtao Liu, Di Huang, Yunhong Wang
Link: article
Date of first submission: Tue, 21 Nov 2017
Implementations:
- PyTorch

Brief

This article tries to improve the accuracy of the single shot detectors while avoiding a decrease of the speed of inference. To do so, they introduce a new module, the RF Block (RFB) inspired by the Receptive Fields (RFs) in the human visual systems. The idea is to copy the pattern of the neurones in the human visual cortex. They get results on par with the state of the art multi-stage detectors while keeping the speed advantage of the one stage classifiers.

How Does It Work

The architecture of the proposed test network is shown in the following figure and is a SSD with some of the layers replaced by the RFB module. The backbone network is still a VGG16 and the prediction is still made by a cascade of convolutions.

RFB Network

The idea is to look wider, the RBF module makes use of a trou convolutions to increase the perception field of the network, as shown in the following figure:

RFB module activation

Results

Comparison of results for detection on the VOC 2007:

Model	mAP	fps
Faster RCNN	73.2%	7
R-FCN	80.5%	9
YOLOv2 544	78.6%	40
R-FCN w deformable CNN	82.6 \%	8
SSD300	77.2%	46
RFB Net300	80.5%	43
RFB Net512	82.2%	18

Comparison of results for classification on COCO dataset:

Method	Backbone	Data	Time	Avg. Precision, IoU: 0.5:0.9
Faster	VGG	trainval	147 ms	24.2
Faster+++	ResNet-101	trainval	3.36 s	34.9
Faster w FPN	ResNet-101-FPN	trainval35k	240 ms	36.2
Faster by G-RMI	Inception-Resnet-v2	trainval	–	34.7
R-FCN	ResNet-101	trainval	110 ms	29.9
R-FCN w Deformable CNN	ResNet-101	trainval	125 ms	34.5
Mask R-CNN	ResNext-101-FPN	trainval35k	210 ms	37.1
YOLOv2	darknet	trainval35k	25 ms	21.6
SSD300*	VGG	trainval35k	22 ms	25.1
SSD512*	VGG	trainval35k	53 ms	28.8
DSSD513	ResNet-101	trainval35k	182 ms	33.2
RetinaNet500	ResNet-101-FPN	trainval35k	90 ms	34.4
RetinaNet800	ResNet-101-FPN	trainval35k	198 ms	39.1
RFB Net300	VGG	trainval35k	25 ms	29.9
RFB Net512	VGG	trainval35k	55 ms	33.8
RFB Net512-E	VGG	trainval35k	59 ms	34.4

In Depth

They introduce two different module, the rfb_a and rfb_b. In conception they are both very similar, the internal layer is the only part changing, cf figures.

RFB-a: RFB module a

RFB-b: RFB module activation

For each of the two modules, the principal change compared to an inception module is the presence of the dilated convolutions.