How to structure machine learning work effectively
In my previous article, I explained why machine learning development should always be Agile. Now I will share how we apply those principles within the TomTom Autonomous Driving unit.
Specifically, I'll cover how we break the machine learning work down into small parts so that we can iterate, deliver in short cycles and get valuable feedback from customers.
We found that industry best practices work pretty well, but a few important details on how to fit that structure to machine learning development are worth mentioning.
Break it down!
Themes
Themes refer to large focus areas of the company, usually separate products. Themes are not solely dedicated to machine learning work. They combine a set of activities that are done by different teams, in the end amounting to something big and important. An example of a theme is TomTom’s HD Map.
Initiatives
An initiative refers to something significant that usually requires a good amount of development and cross-team coordination. Keeping with the example of the TomTom HD Map, an initiative for this theme could be a new map attribute.
Let’s dive into this
Say we want to locate all the potholes in our HD Map. For that, we would need to develop an application for manual annotation and validation of the pothole data, machine learning algorithms to automate pothole detection, and a lot of backend processes to tie it all together and eventually put the potholes on the map. That would be an initiative.
Epics
Then comes the epic. Epics are big chunks of work that have a clear result and influence the product. They usually take one to three months to deliver.
In our setup, epics are generally supposed to be developed and delivered by a single team. That's why epics are the main means by which the team communicates with the "outer world". They need to be formulated in such a way that they can be understood not only by developers but also by various stakeholders, who usually have less context and a lighter understanding of machine learning.
For machine learning teams, an epic is usually an idea or hypothesis that can be the subject of research and development work, for example, a new deep neural network architecture. In a machine learning context, it is quite normal that an idea eventually doesn't work out, so an epic can have a positive or a negative outcome, depending on the situation.
User stories
An epic contains a number of user stories and other backlog items.
So, what is a user story for machine learning projects?
The concept of a user story comes from Extreme Programming (XP). It is a natural-language description of one or more features of a software system, written from an end-user perspective.
Here’s where the main issue comes in.
Machine learning pipelines are usually hidden away from end users, which is why the "As a user, I want... so that..." template cannot be applied very successfully in most cases.
Let's flesh this out with an example. Say we're building a detector service that recognizes potholes in pictures of the road. Here's an example user story for the machine learning team, written according to the "classic" template:
"As an HD Map user, I want to have the 'deep pothole' detection model optimized so that a pothole is located on the map more accurately."
This sounds a bit outlandish, doesn’t it? A map user would never ask for a machine learning model to be optimized because the user has no idea of its existence in the first place.
Given this, let's just call them stories to reduce the ambiguity.
The stories in this setup are usually quite technical. Still, it's important to describe them in a neat, concise and easily understandable way so that they can be taken up by any developer on the team, and understood by the product owner, new team members or anyone else with limited knowledge of the particular context or technical details.
The three Ws
One possible alternative to the common "as a user" pattern discussed above is the so-called WWW template. The same example would then read:
What: "deep pothole" model optimization
Why: to increase the accuracy of pothole localization
Who will benefit: users of the HD Map
Acceptance criteria
Apart from the "what, why, who" formula, a good story should have detailed acceptance criteria, so that it is clear what exactly the result of the work should be and how to assess that result. This is especially important for research-oriented work, because research otherwise tends to be never-ending.
The acceptance criteria should draw the boundaries of the research, so that the work produces a definite result and eventually helps a research idea evolve into a product feature.
In the "pothole" example above, the acceptance criteria would describe what the expected impact of the model optimization is, how to measure that impact and possibly what the necessary experiments that should be done are.
Of course, due to the nature of research-oriented work, it is often not possible to define all the acceptance criteria up front, so making room for exploration and experimentation is a good idea. However, that doesn't mean a machine learning team can drop any attempt at defining the scope and expected result of the work. Whatever is certain at planning time should be reflected in the acceptance criteria.
More insights about stories
There are also other practices that make research and development stories effective. While they will be the focus of a future article, here are two to consider already:
Thorough discussion and estimation within the development team before a story is taken into work.
Collaboration between developers, and with the product owner, during the implementation phase.
Takeaways
The boundary between "how" and "what" moves up in the hierarchy, making stories and even epics more technical.
New stories and epics are, more often than not, initiated by developers, while initiatives and some epics come from product stakeholders.
Not all backlog items should be stories.
There are bug tickets. From the product owner's perspective, the main difference between fixing a bug and other activities is that bug fixing doesn't really add new value to the product.
Other types of backlog items also could be introduced. For instance, some maintenance activities such as refactoring are not really stories. They could be called tasks.
Stories: knowledge and feature
Stories can be further classified into "new knowledge" and "new feature" stories.
Let me explain.
The high uncertainty about whether a machine learning idea will work often makes it risky to combine the feasibility investigation (and other kinds of research activities) with the implementation of that idea within the same iteration. Thus, epics contain multiple backlog items of different kinds.
If the team has an idea of how to improve its machine learning product, it often makes sense to create a "new knowledge" story, which has knowledge as its result. One important benefit of this approach is that many new ideas turn out to be ineffective, so it is good to carry out some investigation and experimentation first, report on what has been discovered, present the results to the team and then iterate further.
As I mentioned before, an epic can end up with a negative result (i.e. the idea didn't work out). By separating the "new knowledge" and "new feature" stories, we make sure this is discovered as early as possible. And if the idea does turn out to be effective and feasible for the product, the distinction still helps us achieve the desired result sooner.
The "new knowledge" idea is somewhat contradictory to the notion of a user story, as it implies that every iteration of work should have a direct impact on the customer. However, I believe this is the "lesser evil" given the benefits that approach gives.
In the case of "knowledge" stories, it is especially important for the team to properly manage that knowledge in a clear, consistent and persistent way. In the next article, I'll reveal how we manage knowledge in TomTom Autonomous Driving machine learning teams.
An example of a "new knowledge" story would be:
What: run a set of experiments for the "deep pothole" model optimization
Why: prove that the pothole location accuracy can be further improved
Who will benefit: developers
Acceptance criteria:
The key internal metrics for pothole localization should improve on the evaluation dataset
The evaluation should take different road surface types and geographies into account
The results should be reported in written form
Yes, it's the developers who benefit most of all from that work.
Then the "new feature" story would look like:
What: implement the "deep pothole" model optimization
Why: to improve the pothole location accuracy
Who will benefit: HD Map users
Acceptance criteria:
So far, I haven't mentioned the definition of done (DoD). Unlike acceptance criteria, the DoD is not specific to one story; it is a generic checklist of everything that must be done before a story of a certain kind can be considered complete. That checklist is created and maintained by the team and is usually project-specific. For "knowledge" and "feature" stories, the DoDs are also supposed to differ for obvious reasons: a "knowledge" story might be done once the results are written up and presented, while a "feature" story is done only when the change is tested, reviewed and deployed.
Wrap-up
In the first article of the series, I explained the "why", the conceptual part of the "Agile machine learning" mindset. In this article, we focused on the "static" side of the story, namely, how to structure, divide and formulate the work for machine learning research and development teams.
Next time, I will unveil some "dynamic" aspects: how to apply specific Agile methodologies like Scrum and XP in a machine learning setup. I will also share some home-grown best practices for gathering, growing and managing knowledge in order to apply research ideas in real software projects.
Stay tuned!