To add to the challenge, the data comes in two forms: video from cameras and lidar point clouds. To build HD maps, this data needs to be labeled accurately and consistently. Today, labeling can be done either manually – which is time-consuming and does not scale – or with artificial intelligence (AI).
Using AI to label data
At TomTom, we leverage advanced AI algorithms to label data in a robust, scalable way, leading to higher quality HD maps. This involves extracting detailed geometry and semantics from our rich lidar and camera data sources.
One challenge lies in applying convolutional neural networks (CNNs) to structured prediction problems. While CNNs have been applied successfully to image-based segmentation tasks, they fall short when the problem is not strictly a per-pixel classification task and the predictions need to preserve certain structures or qualities.
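To see why a purely per-pixel loss can miss structure, consider this minimal numpy sketch (the shapes and the "lane marking" scenario are illustrative, not taken from any TomTom system). Two predictions make the same number of pixel errors, so the pixel-wise loss cannot tell them apart, even though one breaks the marking's connectivity:

```python
import numpy as np

def pixelwise_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy, averaged independently over all pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))

# Ground truth: a thin vertical "lane marking" in an 8x8 mask.
gt = np.zeros((8, 8))
gt[:, 3] = 1.0

# Prediction A: the marking with a gap in the middle (structurally broken).
pred_a = gt.copy()
pred_a[3:5, 3] = 0.0

# Prediction B: the full marking plus two spurious pixels elsewhere.
pred_b = gt.copy()
pred_b[3, 6] = 1.0
pred_b[5, 6] = 1.0

# Both predictions make exactly 2 pixel errors, so the per-pixel loss is
# identical -- it is blind to the fact that a broken marking is far worse
# for map building than a couple of stray pixels.
loss_a = pixelwise_bce(pred_a, gt)
loss_b = pixelwise_bce(pred_b, gt)
print(loss_a, loss_b)
```

This is the sense in which per-pixel classification "falls short": the loss is invariant to where the errors sit, while the downstream map-building task cares a great deal about connectivity and shape.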
Introducing EL-GAN
To address this problem, the TomTom AI team proposed a novel framework: embedded loss generative adversarial networks, or EL-GAN for short. This framework improves semantic segmentation results by adding an extra "adversarial" loss term that better preserves structural qualities.
The result greatly reduces the need for error-prone post-processing of the intermediate neural network output. This is achieved by leveraging the concept of generative adversarial networks (GANs), in which two contrasting networks are trained together: a generator that is trained to create results and a discriminator that is trained to distinguish generated results from ground truth.
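The "embedded loss" idea can be sketched in a few lines of numpy. This is a toy illustration only: the discriminator here is a single random linear layer standing in for a deep CNN, the shapes are made up, and no actual training happens. The point is the shape of the loss: the generator's segmentation output is scored by how close its discriminator embedding lands to the embedding of the ground-truth labels, rather than only by per-pixel agreement:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, weights):
    """Stand-in discriminator: one linear layer + ReLU producing an embedding.
    In a real setup this would be a deep CNN; one layer keeps the sketch short."""
    return np.maximum(weights @ x.ravel(), 0.0)

# Illustrative inputs: an 8x8 "image" paired with 8x8 label maps,
# stacked before being fed to the discriminator.
image = rng.random((8, 8))
gt_labels = (rng.random((8, 8)) > 0.7).astype(float)   # ground-truth mask
pred_labels = gt_labels * 0.9 + 0.05                   # generator's soft output

# Hypothetical discriminator weights mapping 128 inputs to a 16-dim embedding.
weights = rng.standard_normal((16, 2 * 64))

# Embedding loss: distance between the discriminator embeddings of the
# (image, prediction) pair and the (image, ground truth) pair. Training the
# generator to shrink this distance rewards predictions that share
# higher-level structure with real labels, not just per-pixel agreement.
e_fake = embed(np.concatenate([image, pred_labels]), weights)
e_real = embed(np.concatenate([image, gt_labels]), weights)
embedding_loss = float(np.sum((e_fake - e_real) ** 2))

# In practice this term is added to the usual segmentation loss, e.g.
# total_loss = segmentation_loss + lambda_adv * embedding_loss
print(embedding_loss)
```

Because the discriminator's embedding summarizes global properties of the label map, penalizing the embedding distance pushes the generator toward structurally plausible outputs, which is what reduces the need for hand-crafted post-processing.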