Recently, we faced the following problem: most of a team’s work went into bug fixes and urgent issues, and almost nothing else got done.
The solutions usually discussed in such situations are:
- A fixed percentage of the team’s time is reserved for fixing bugs.
- No fix without a test: every fix must come with a test.
- A bug bash: one or more sprints in which only bugs are fixed, to get some breathing room again.
- A fast lane for bugs.
- …
In short, anything that relieves the symptoms and eases the immediate pain.
All of that is legitimate. But when a team spends so much time on bug fixes that almost nothing else gets done, it doesn’t matter how you distribute the work: the team will remain massively slowed down. Too much is simply too much. The solution must lie elsewhere.
Let’s take a step back.
If you ask a developer, “How long does it take you to fix a bug while you’re developing a feature and your focus is on that exact part of the code?” the answer is usually: “It depends, but usually only a few minutes. An hour at most if it’s something bigger.”
“And when the code has been sitting for two to three months and you’re working on something else entirely?”
“Well, I have to get back into it, maybe do some research. At least an hour, more like half a day. A day or more if it’s something bigger. Plus, I have to set aside my current work and pick up that thread again afterwards.”
The effort to fix a bug increases with the time the code has been sitting. Or, to put it the other way around: the earlier a bug is found, the easier (and cheaper) it is to fix.
Figure 1 illustrates this relationship, which is intuitively clear to most people. But what does it mean for the team mentioned above?
If a team is busy only fixing bugs, it is because bugs are found too late.
That’s it. That’s why some teams drown in bugfixing and others don’t.
Caution! This statement rests on an assumption: the code base is not so fragile that every small change causes a flood of new bugs. If it is, you have a different problem altogether. But back to the matter at hand 🙂
There are essentially two levers for finding bugs sooner: better automated tests and shorter lead times (i.e., the time from ‘the code is written’ to ‘the code runs in production’).
In the case described at the beginning, the lead time is a year or more, so bugs that tests did not catch are reported no earlier than a year after the code was written. Naturally, the effort to fix them is enormous. The only way to relieve the team of the bug burden in the long run is therefore to drastically reduce the lead time.
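To make the lead-time definition concrete, here is a minimal Python sketch. It assumes each change is recorded as a pair of timestamps: when it was committed (standing in for ‘the code is written’) and when it was deployed (‘the code runs in production’). The function names `lead_time` and `median_lead_time` are hypothetical and not taken from any particular tool.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from 'the code is written' to 'the code runs in production'.

    Assumption: the commit timestamp approximates 'written' and the
    production deployment timestamp approximates 'runs in production'.
    """
    if deployed_at < committed_at:
        raise ValueError("deployment cannot precede the commit")
    return deployed_at - committed_at


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median lead time over (committed_at, deployed_at) pairs (hypothetical helper)."""
    seconds = [lead_time(c, d).total_seconds() for c, d in changes]
    return timedelta(seconds=median(seconds))
```

The median is used here rather than the mean so that a handful of unusually long-lived changes does not dominate the figure. For a team like the one described above, this number would come out at a year or more.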