Virtual Worlds as Proxy for Multi-Object Tracking Analysis

(will be updated by a git hook on commit) last modified : 04-02-2020

General Information (main fields described, non-exhaustive list)

Title: Virtual Worlds as Proxy for Multi-Object Tracking Analysis
Authors: Adrien Gaidon, Qiao Wang, Yohann Cabon and Eleonora Vig
Link: article
Date of first submission: 20 May 2016
Implementations:

Brief

This article presents an approach to reduce the cost of annotating training data. The authors use generated data to train neural networks and show the impact of weather conditions on results obtained with real datasets. They make four contributions:

Generation of a synthetic dataset by cloning an existing real one
Creation of the Virtual KITTI dataset
Quantitative measure of the usefulness of the created world for multi-object tracking
Measurement of the impact of altered conditions on real datasets

How Does It Work

The generation of the virtual world and the evaluation of its usefulness follow a five-step scheme:

Acquisition of the real data
Cloning of the world (using the annotated real images and manual reconstruction of the world)
Generation of various weather conditions
Automatic generation of detailed ground truth annotations (notably segmentation, which is easier in a virtual world)
Evaluation of the usefulness by comparing the results obtained on the real videos and on their clones (with a network trained on real data)

Good usefulness means that a network trained on real data behaves the same on the generated data as on the real data.

If usefulness is good, you can reasonably expect the network to react the same way to a change of conditions (adding rain, for instance) on real and on generated data. At the very least, if the results drop when rain is added to the generated data, you know that assuming the network will work well on real rainy images is "an optimistic upper bound at best".

Using this, they can estimate the impact of altered conditions on real datasets. You evaluate the network on the altered generated data and compare the results with those obtained on the unaltered clone. The observed change is the behavior to expect on the real dataset.
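The comparison described above can be sketched in a few lines. This is only an illustration, not the authors' evaluation code: the function name `condition_impact` and the dictionary layout (metric name mapped to a score in percent) are assumptions made for this example.

```python
def condition_impact(baseline, altered):
    """Per-metric change (altered minus baseline), in percentage points.

    Hypothetical helper, not from the article. Both arguments map a metric
    name (for instance "MOTA") to a score in percent, measured with the same
    model on the unaltered clone and on the clone rendered under an altered
    condition such as rain.
    """
    # Only compare metrics that were measured in both settings.
    shared = baseline.keys() & altered.keys()
    return {name: altered[name] - baseline[name] for name in sorted(shared)}
```

A negative value for a metric where higher is better means the altered condition hurts the model on generated data, so a similar or worse drop should be expected on real data under that condition.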

Results

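They show that networks trained on real data transfer quite well to synthetic videos. Conversely, networks trained on synthetic data do not transfer as well to real videos (see the article for more details). The table below shows the results obtained on real videos and on their clones (rows prefixed with v). MOTA and MOTP are the multi-object tracking accuracy and precision, MT and ML the mostly tracked and mostly lost ratios, I and F the numbers of identity switches and fragmentations, and P and R the precision and recall.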

MDP	MOTA	MOTP	MT	ML	I	F	P	R
0001	81.8%	85.3%	78.7%	13.3%	5	6	91.1%	92.5%
v0001	82.8%	81.9%	63.3%	13.9%	1	10	98.7%	85.8%
0002	80.7%	82.2%	63.6%	27.3%	0	1	99.0%	82.5%
v0002	81.1%	81.8%	60.0%	20.0%	0	2	98.4%	83.4%
0006	91.3%	84.3%	72.7%	9.1%	0	3	99.7%	92.3%
v0006	91.3%	84.4%	81.8%	9.1%	1	2	99.9%	92.0%
0018	91.1%	87.0%	52.9%	35.3%	1	1	96.7%	95.2%
v0018	90.9%	74.9%	44.4%	33.3%	0	0	99.1%	92.4%
0020	84.4%	85.1%	58.1%	25.6%	14	24	96.7%	88.7%
v0020	84.0%	79.4%	52.1%	34.4%	1	9	99.3%	85.6%
AVG	85.9%	84.8%	65.2%	22.1%	4	7	96.7%	90.3%
v-AVG	86.0%	80.5%	60.3%	22.1%	0	4	99.1%	87.9%
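To read the real-to-virtual gap off the table more easily, the averages can be subtracted metric by metric. The snippet below is a small sketch using only the AVG and v-AVG rows copied from the table; the function name `real_to_virtual_gap` is an assumption made for this example, and the count columns I and F are left out.

```python
# Average rows copied from the table above (percent columns only).
real_avg = {"MOTA": 85.9, "MOTP": 84.8, "MT": 65.2, "ML": 22.1, "P": 96.7, "R": 90.3}
clone_avg = {"MOTA": 86.0, "MOTP": 80.5, "MT": 60.3, "ML": 22.1, "P": 99.1, "R": 87.9}


def real_to_virtual_gap(real, clone):
    """Return clone minus real for each metric, in percentage points."""
    return {name: round(clone[name] - real[name], 1) for name in real}


gaps = real_to_virtual_gap(real_avg, clone_avg)
for name, gap in gaps.items():
    print(f"{name:>4}: {gap:+.1f} pp")
```

On these averages MOTA differs by only 0.1 point between real videos and clones, while MOTP is 4.3 points lower on the clones.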

They also show a strong degradation of the results under altered conditions, which suggests that results obtained on the KITTI dataset may not hold under other weather conditions.