Why machine learning development should always be Agile
In today’s world, Agile software development is considered a mainstream practice.
While there’s plenty of literature on the topic, most of it focuses on standard enterprise software development scenarios.
A prime example is business process automation. Articles and books typically cover applications with graphical user interfaces and miss the full spectrum of possibilities.
The real picture is this: Agile principles are flexible and powerful enough to go beyond standard use cases and software development.
Let’s take a step back for some context on my experience with the topic. I’m a Product Owner in TomTom's Autonomous Driving product unit. At TomTom, we make maps for both humans and machines, including HD Maps for autonomous vehicles (hence the name of my product unit).
In this article, I will show how the principles of Agile software development apply to machine learning projects, to give some insight into how TomTom approaches innovation.
The capabilities of machine learning
If you want to build a truly global map with centimeter-level accuracy and extremely high detail like we do at TomTom, you’re going to need a great deal of automation.
Ultimately, people should be taken out of the loop of extracting map data from images, lidar point clouds and other sensor measurements of the Earth. This is where modern computer vision and other machine learning technologies come into play.
As a side note, it’s important to understand that although we are solving complex problems with machine learning, we’re not a research institute. Essentially, what we do is software engineering. Therefore, most of the principles and best practices of software development apply, including clean code, unit testing, and modularity, to name a few.
As with any software development team, it is crucial that machine learning teams keep an explicit and transparent connection between their research activities and the value they bring to the business and its clients.
Gathering new knowledge
Machine learning development requires a significant research component, which sets it apart from other types of software development. The goal of the research is to find new knowledge: either knowledge the team is lacking, or knowledge that isn’t available anywhere yet.
Our development process consists of three main components:
Applied research. Digesting existing research papers and trying to implement them in our system
Basic research. Authoring our own research papers and publishing them to the community
Software engineering. Conducting software engineering around machine learning algorithms to incorporate them into real products
While these activities have much in common, they differ in their levels of uncertainty and technical complexity.
It’s crucial to note that machine learning involves much more uncertainty than regular software development. Some research hypotheses and ideas might not work for the use case and application, and it can take a while to identify the issues. Developers and the business should always be ready for hiccups.
Another interesting observation is that some factors usually considered quality attributes of a system (also known as non-functional requirements) are now becoming its main features.
Say you’re building an object detection or image classification pipeline. What would be most important? At the top of the list would be how accurately and correctly the algorithm does the task.
In traditional software development, attributes like accuracy are usually not considered features of the product, and most of the development is dedicated to building new functionality. In machine learning use cases, however, attributes like algorithm accuracy are among the main features, and most of the development is dedicated to improving them.
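One way to picture accuracy as a feature is to write it down as an explicit acceptance criterion, the same way a functional requirement would be. The following is only a minimal sketch under stated assumptions: the function names, the sample labels, and the 0.9 threshold are hypothetical illustrations, not TomTom's actual pipeline or targets.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def meets_acceptance_criterion(predictions, labels, min_accuracy=0.9):
    """Check accuracy against a threshold, as one would check any product requirement.

    The default threshold of 0.9 is an assumed value for illustration only.
    """
    return accuracy(predictions, labels) >= min_accuracy


# Hypothetical evaluation data for an image classifier (made up for this example).
labels = ["stop", "yield", "stop", "speed_limit", "yield"]
predictions = ["stop", "yield", "stop", "yield", "yield"]

print(accuracy(predictions, labels))
print(meets_acceptance_criterion(predictions, labels))
```

In this sketch, an iteration that fails the check is simply not "done", which is what makes accuracy behave like a feature rather than a background quality attribute.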
Embracing the Agile mindset
Here is how machine learning teams can benefit from embracing the Agile mindset:
Understanding that planning in advance is harder, because the significant research component usually implies high uncertainty about the result of each iteration.
Staying focused on goals by getting sufficient feedback from clients. There’s a temptation to carry out endless research on topics that are more interesting to a researcher than relevant to the customer.
Self-organizing to catalogue the many ideas that come from within teams.
Applying Agile principles to help relieve some of the pressure and manage the complexity inherent to the machine learning development process.
Balancing quick and optimal
It’s important to keep a balance between quick and optimal solutions.
In the context of machine learning, a quick approach usually means using heuristic techniques to solve data-driven challenges. A quick solution can deliver results fast, in anything up to several months. While quick is often good enough, the team should always aim to solve the same problem in a more structured and scientific way, which requires flexibility and discipline.
These challenges only emphasize the value of a fast feedback loop and the ability to focus on what’s most valuable for the customer. In other words, machine learning development can and should be Agile.
In an upcoming article, I will delve into the specific Agile concepts, techniques, and methodologies that we use for machine learning development within the TomTom Autonomous Driving unit.