FSSD

last modified : 06-11-2018

General Information

Title: FSSD: Feature Fusion Single Shot Multibox Detector
Authors: Zuo-Xin Li, Fu-Qiang Zhou
Link: article
Date of first submission: 4 December 2017
Implementations:
- PyTorch
- Caffe

Brief

The FSSD is an improved version of the SSD. The authors add semantic information to improve the mAP without losing too much computation time.

How Does It Work

The FSSD is very close to the SSD and works on exactly the same principle: a cascade of convolutional layers predicts a set of boxes. The difference is that the network uses the feature maps of some layers a second time, fusing them to add information to the predictions made for the smaller boxes (see image below):

SSD/FSSD comparison

Results

The results shown here are the main results for the networks. Some were omitted; see the paper for more details.

Comparison of detection results on the VOC 2007 test set:

Model         Train set     mAP (%)  FPS   GPU      Backbone
Faster R-CNN  07+12         73.2     7     Titan X  VGGNet
Faster R-CNN  07+12         76.2     2.4   K40      ResNet-101
SSD           07+12+COCO    81.2     46    Titan X  VGGNet
SSD           07+12         77.2     85    1080Ti   VGGNet
FSSD300       07+12+COCO    82.7     65.8  1080Ti   VGGNet

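The two 1080Ti rows can be used to put a number on the speed cost of the fusion. The short Python sketch below converts their frame rates into milliseconds per image. It assumes both rows use the same input size (the SSD row does not state it); the different training sets should not affect inference speed, while the rows measured on other GPUs are not comparable.

```python
# FPS figures copied from the table (both measured on a GTX 1080Ti).
SSD_FPS = 85.0
FSSD_FPS = 65.8


def latency_ms(fps: float) -> float:
    """Convert a frames-per-second figure into milliseconds per image."""
    return 1000.0 / fps


ssd_ms = latency_ms(SSD_FPS)
fssd_ms = latency_ms(FSSD_FPS)
overhead_ms = fssd_ms - ssd_ms

print(f"SSD:  {ssd_ms:.1f} ms/image")   # SSD:  11.8 ms/image
print(f"FSSD: {fssd_ms:.1f} ms/image")  # FSSD: 15.2 ms/image
# Relative overhead is expressed against the SSD time per image.
print(f"FSSD overhead: {overhead_ms:.1f} ms/image ({overhead_ms / ssd_ms:.0%})")  # FSSD overhead: 3.4 ms/image (29%)
```

With these figures, the FSSD spends roughly 3.4 ms more per image than the SSD, which is still well within real-time speed.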

In Depth

As said before, the main particularity of the network is to reuse already computed feature maps to improve its accuracy without slowing down the computation too much.

Looking at the diagram in the previous section, we can see that three layers are reused before the classification part. The selection is limited to these three layers because the authors consider that the smaller layers at the end carry too little information to be worth merging.
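The selection rule can be written as a simple size filter. In this Python sketch, the layer names and spatial sizes are assumed SSD300 source layers (this summary does not list them), and the threshold `min_size=10` is a hypothetical value chosen so that exactly three layers are kept, as in the text.

```python
# Assumed SSD300 source layers and their spatial size (height = width), largest first.
SSD300_SOURCES = {
    "conv4_3": 38,
    "fc7": 19,
    "conv8_2": 10,
    "conv9_2": 5,
    "conv10_2": 3,
    "conv11_2": 1,
}


def select_layers_to_fuse(sources: dict[str, int], min_size: int = 10) -> list[str]:
    """Keep only the feature maps whose spatial size is at least `min_size`.

    Hypothetical rule: smaller maps are treated as carrying too little
    information to be worth merging.
    """
    return [name for name, size in sources.items() if size >= min_size]


print(select_layers_to_fuse(SSD300_SOURCES))  # ['conv4_3', 'fc7', 'conv8_2']
```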

Before merging, 1x1 convolutions reduce the number of filters (channels) of each layer, and the smaller layers are resized to the size of the largest one using bilinear interpolation. The three layers are then merged by concatenation.
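A minimal NumPy sketch of this fusion step is shown below. It assumes SSD300-like input shapes (512x38x38, 1024x19x19 and 512x10x10), a reduction to 256 channels per map and random weights, none of which come from this summary. The bilinear resize follows the half-pixel (align_corners=False) convention, biases are omitted, and any normalization applied after the concatenation is left out.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution without bias: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def bilinear_resize(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of a (C, H, W) array using half-pixel centers."""
    _, h, w = x.shape
    # Map each output pixel center back to input coordinates, clamped to the valid range.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Interpolation weights, shaped to broadcast over (C, out_h, out_w).
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def fuse_features(feature_maps, weights, target_hw):
    """Reduce channels with 1x1 convs, resize every map to target_hw, concatenate on channels."""
    reduced = [conv1x1(f, wt) for f, wt in zip(feature_maps, weights)]
    resized = [r if r.shape[1:] == target_hw else bilinear_resize(r, *target_hw) for r in reduced]
    return np.concatenate(resized, axis=0)


rng = np.random.default_rng(0)
# Assumed SSD300-like shapes: (channels, height, width).
maps = [rng.standard_normal(s) for s in [(512, 38, 38), (1024, 19, 19), (512, 10, 10)]]
# Hypothetical random 1x1 conv weights reducing each map to 256 channels.
weights = [rng.standard_normal((256, f.shape[0])) * 0.01 for f in maps]
fused = fuse_features(maps, weights, (38, 38))
print(fused.shape)  # (768, 38, 38)
```

Concatenating along the channel axis keeps the information from each source layer in separate channels, leaving the following layers free to learn how to combine them.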