Building maps for autonomous driving is a challenge, even for experienced mapmakers. It entails making sense of vast amounts – petabytes – of data gathered by survey vehicles and connected cars. By leveraging advanced AI algorithms, TomTom is accelerating the production of HD maps for self-driving cars.
To add to the challenge, the data comes from two kinds of sources: video from cameras and lidar point clouds. To build HD maps, this data needs to be accurately and consistently labeled. Today, labeling can be done either manually – which is time-consuming and not scalable – or by using artificial intelligence (AI).
At TomTom, we leverage advanced AI algorithms to label data in a robust, scalable way, leading to higher quality HD maps. This involves extracting detailed geometry and semantics from our rich lidar and camera data sources.
One challenge lies in applying convolutional neural networks (CNNs) to structured prediction problems. While they have been successfully applied to perform image-based segmentation tasks, they fall short in cases when the problem is not strictly a per-pixel classification task and the predictions need to preserve certain structures or qualities.
To address this problem, the TomTom AI team proposed a novel framework: embedded loss generative adversarial networks, or EL-GAN for short. This framework improves semantic segmentation results by adding an “adversarial” loss term that better preserves structural qualities.
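As a rough illustration of what adding a loss term means in practice, the sketch below combines a conventional per-pixel cross-entropy loss with a weighted extra term. It is a minimal sketch, not the EL-GAN implementation: the function names, the `adv_weight` parameter and its default value are assumptions made for this example, and the adversarial term is passed in as a plain number rather than computed by a discriminator network.

```python
import math


def pixelwise_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross-entropy over all pixels of a 2D probability map."""
    total = 0.0
    count = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            # Clamp probabilities so log() never receives 0.
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def generator_loss(pred, label, adversarial_term, adv_weight=0.1):
    """Per-pixel fitting loss plus a weighted adversarial term.

    adversarial_term is assumed to come from a discriminator;
    adv_weight is a hypothetical tuning parameter for this sketch.
    """
    return pixelwise_cross_entropy(pred, label) + adv_weight * adversarial_term


if __name__ == "__main__":
    pred = [[0.9, 0.2], [0.8, 0.1]]
    label = [[1, 0], [1, 0]]
    print(generator_loss(pred, label, adversarial_term=2.0))
```

With `adv_weight` set to zero the function reduces to an ordinary per-pixel segmentation loss, so the weight controls how strongly the structural signal influences training.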
The result is a great reduction in the need for error-prone post-processing of the intermediate neural network output. This is achieved by leveraging the concept of generative adversarial networks (GANs), where two contrasting networks are trained together: a generator that is trained to create results and a discriminator that is trained to distinguish fake results from ground truth results.
Figure: Overview of the EL-GAN architecture, illustrating both the training of the generator and discriminator with examples from the TuSimple lane marking challenge.
In our proposed EL-GAN solution, the discriminator sees not only the predictions but also the ground truth labels, and training minimizes the difference between predictions and labels in a dynamically learned embedding space. As a result, the semantic segmentation predictions are structurally much more similar to the training labels, without needing complicated post-processing.
Figure: Illustration of using EL-GAN for lane marking segmentation: an example of a ground truth label (left), its corresponding raw prediction by a conventional segmentation network (middle), and a prediction by EL-GAN (right). Note how EL-GAN matches the thin-line style of the labels in terms of certainty and connectivity.
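The idea of comparing predictions and labels in an embedding space, rather than pixel by pixel, can be sketched with a toy example. Everything below is an assumption made for illustration: in EL-GAN the embedding is learned by the discriminator network, whereas `toy_embedding` here is a hand-crafted stand-in that measures only line width and confidence, and the small masks are made-up data.

```python
def toy_embedding(mask, threshold=0.5):
    """Hand-crafted stand-in for a learned discriminator embedding.

    Returns [mean number of marked pixels per row, mean confidence of
    marked pixels]. A real embedding would be learned, not designed.
    """
    widths = []
    confidences = []
    for row in mask:
        marked = [v for v in row if v >= threshold]
        widths.append(len(marked))
        confidences.extend(marked)
    mean_width = sum(widths) / len(widths)
    mean_conf = sum(confidences) / len(confidences) if confidences else 0.0
    return [mean_width, mean_conf]


def embedding_loss(pred, label):
    """Squared L2 distance between the embeddings of prediction and label."""
    e_pred = toy_embedding(pred)
    e_label = toy_embedding(label)
    return sum((a - b) ** 2 for a, b in zip(e_pred, e_label))


# Made-up 3x4 masks: a thin ground truth line, a blurry thick
# prediction, and a thin confident prediction.
label = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
blurry = [[0.6, 0.7, 0.6, 0], [0.6, 0.7, 0.6, 0], [0.6, 0.7, 0.6, 0]]
thin = [[0, 0.9, 0, 0], [0, 0.9, 0, 0], [0, 0.9, 0, 0]]

if __name__ == "__main__":
    print("blurry vs label:", embedding_loss(blurry, label))
    print("thin vs label:  ", embedding_loss(thin, label))
```

In this toy setting the thin prediction lies much closer to the label than the blurry one, which mirrors the kind of structural difference the figure describes. A learned embedding can capture such properties without them being specified by hand.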
The EL-GAN approach, which you can explore in full in the technical report, is shown to be particularly useful for semantic segmentation problems involving lane detection, making it easier to enforce line qualities such as thinness, uniqueness and straightness.
By leveraging advanced AI algorithms and improving them with cutting-edge approaches such as EL-GAN, we’re in high gear to accelerate the production of HD maps for autonomous driving, leading to a high-quality, robust and scalable product.
The TomTom HD Map is a highly accurate representation of the road. It helps autonomous vehicles precisely localize themselves on the road, supports their sensors in understanding the surroundings, and helps them plan manoeuvres in a way that is safer and more comfortable for passengers. To achieve this, the HD map uses attributes such as lane models, traffic signs, road furniture and lane geometry, with centimetre-level accuracy.