Michal Paszkiewicz

the bus factor

After a gruelling expedition, Douglas Mawson returned to Cape Denison without his two companions, Ninnis and Mertz. Both had died during the journey. Little did he know that for the next year, he would be dealing with the problem of a low bus factor in Antarctica.

Sidney Jeffryes had been given the job of wireless officer at Cape Denison. No-one else had a clue how to operate the wireless system at the house. Six men were depending on Jeffryes to stay in touch with the world outside of Antarctica. Then Jeffryes started to develop serious paranoia and lived on the edge of a mental breakdown for the rest of the year. Getting messages across to Australia, which was already hard with the flaky wireless system, slowly became impossible. Jeffryes, believing that all his colleagues had turned against him and wanted to kill him, kept the wireless crystal to himself, refused to send messages and started storing his urine in jars.

Why am I telling you all of this, you wonder. Everyone knows that a low bus factor is bad and dangerous. Ladies and gentlemen, this may indeed be common sense, but frankly it is a recurring mistake, and you will keep seeing it until management starts recording the bus factors of the components in your office and scrutinising those metrics more closely than team velocities.

The greatest problem is that a low bus factor always gets noticed too late.
It'll be at that point when a lot of people are leaving the office and no more money can be spent on the team.
It'll be when that piece of work is finished off and everyone has moved on to work on entirely new projects.
It'll be when you are negotiating a salary with the sole owner of the knowledge needed to keep your project afloat.
It'll be when that breaking change takes your whole system down and you no longer have anyone left to save the day.

What IS the bus factor?

I'm just going to take a small step back for all the people who have no idea what I'm talking about. The Bus Factor is the number of people a project would have to lose before it could no longer be run. It is named a "Bus Factor" because if these people were hit by a bus, or if some other disaster befell them (e.g. a mental breakdown in Antarctica), everything would be over.

Of course, that is not entirely true, as any project can eventually be picked back up. But in the long term, you will save a lot of effort and have smoother transitions if you can keep all the knowledge you need within your company.

Some people call the factor "the truck factor", but in London we always refer to it as the Bus Factor, because engineers here are more likely to be hit by buses. It remains to be established whether that is also why buses are red.

The Bus Factor is not like golf

The Bus Factor is not like golf. It is in fact more like gold - you want as much of it as possible!
A score of 0 means your project is dead.
A score of 1 is a dire situation.
A score of 2 is still pretty desperate.
A score of 3 means you can always get a majority vote on tough decisions, but you should remember you're just one step away from a score of 2.

Generally, you are doing OK as soon as your score approaches the size of the perfect Scrum team. If you have half a dozen people sharing domain and technical knowledge of a project, you should be safe from most disasters. You will find that while a project is running healthily, the Bus Factor is high and no-one considers worrying about it. But eventually, there is always some movement in the teams and in the office politics. Eventually, every project will reach a critical stage. Eventually, you will need to start double checking the traffic lights.

It is all good to prophesy fire and brimstone, but what can you actually do to keep your bus factor up?

You will need to attach bus factors to every logically independent section of knowledge and handle these scores carefully, aiming to maximise them with every business decision.

Reinventing the wheels on the bus

Whenever you reinvent the wheel in-house, you are decreasing the bus factor. I have talked a lot (in person) to people about reinventing wheels and I've even drawn a comic for your amusement. Interestingly, I have always found that reinventing the wheel seems to occur at the point that someone has to make some trade-offs. It might be a choice between low cost, speed and quality, or it might be a choice defined by the CAP theorem (you have to choose between consistency and availability if you want to tolerate a network partition). One of the greatest problems with software is that a new solution just doesn't seem THAT difficult to achieve. But as the codebase grows and grows, it becomes more difficult to manage.

If you pick an out-of-the-box solution supported by hundreds if not thousands of people worldwide (or large companies specialising in that one thing), you are keeping the bus factor high for that component. If something goes wrong and someone in your team can't figure out what is wrong, it shouldn't be too hard to get support from the company providing the component or from the myriad of people using the solution.

If you start making your own versions of existing products, you are needlessly adding knowledge that needs to be shared by the members of your team. At first, this may not seem to be much extra knowledge, but the knowledge base just grows and grows. To minimise this growth, you need to keep asking at each step which knowledge you really need to have in your team and which you can avoid. The knowledge your team needs should be as close as possible to the Business Domain knowledge. Not that you should take this to an extreme, as seen in the case of left-pad, a trivial npm package whose removal from the registry broke numerous projects that depended on it. However, if you are spending a considerable amount of time on middleware, you should reconsider the risk you are adding to the project: it will take you longer to hand over this expanded body of knowledge to anyone who may inherit the project from you.

A Ubiquitous Language

In Domain-Driven Design, Eric Evans uses the term "Ubiquitous Language" to describe what a programmer's code should look like. A programmer should write code that still makes sense when read by a non-technical member of the team. The words in the language should have the same meaning in the code as they do when talking to the client.

Following the Domain Driven Design principles is not entirely a science, but more of an art. It is the art of trying to make your code look like your domain, while keeping it neatly structured. The great advantage of this art form is that, should you succeed, you will increase your bus factor.

The moment your code is written in such a way that it is readable to your client and they are capable of checking the key parts of your logic, you have increased your bus factor to include them. Your new bus factor will also include your testers, your business analysts, your project manager, product owner and anyone else involved in the development process. You should be able to retire, and the non-technical team members should be able to explain how the project works to someone who has technical skills but no business knowledge.
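To make the idea a little more concrete, here is a minimal sketch of code that tries to speak the client's language. The domain (a lending library) and every name and rule in it, such as LOAN_PERIOD, MAX_LOANS_PER_MEMBER and may_borrow_another_book, are hypothetical and chosen only for illustration. The point is that a librarian with no programming background could plausibly check the rules by reading them.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical business rules, named the way the client would say them.
LOAN_PERIOD = timedelta(days=21)
MAX_LOANS_PER_MEMBER = 5


@dataclass
class Loan:
    book_title: str
    borrowed_on: date

    def due_date(self) -> date:
        # A book is due back one loan period after it was borrowed.
        return self.borrowed_on + LOAN_PERIOD

    def is_overdue(self, today: date) -> bool:
        return today > self.due_date()


@dataclass
class Member:
    name: str
    loans: list = field(default_factory=list)

    def has_overdue_loans(self, today: date) -> bool:
        return any(loan.is_overdue(today) for loan in self.loans)

    def may_borrow_another_book(self, today: date) -> bool:
        # Reads as the rule itself: no overdue loans, and under the limit.
        return (
            not self.has_overdue_loans(today)
            and len(self.loans) < MAX_LOANS_PER_MEMBER
        )
```

Compare this with a version full of names like check_flag2 or process_record: the logic would be identical, but only the programmer who wrote it could confirm that it matches the business rule.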

Metrics

I have mentioned this a few times already in this article - you need metrics. If there is one thing you should definitely do after you read this article (if you haven't done it already), it is to draw up a table with all the components of your project, with the bus factor of each written next to it.

Component          Bus Factor
Main thing         7
Important stuff    5
Other thing        2
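If you would rather generate such a table than draw it by hand, the following is a rough sketch of one way to do it. It assumes you can write down who knows each component well enough to maintain it; the names in KNOWLEDGE, the helper functions and the default threshold of 3 are all made up for illustration and are not a standard.

```python
from collections import defaultdict

# Hypothetical input: for each person, the components they could maintain.
KNOWLEDGE = {
    "alice": {"Main thing", "Important stuff"},
    "bob": {"Main thing"},
    "carol": {"Main thing", "Other thing"},
}


def bus_factors(knowledge, components=()):
    """Count, per component, how many people could maintain it.

    Components listed in `components` are reported even if nobody
    knows them, in which case their score is 0.
    """
    counts = defaultdict(int)
    for component in components:
        counts[component] = 0
    for known in knowledge.values():
        for component in known:
            counts[component] += 1
    return dict(counts)


def at_risk(factors, threshold=3):
    """Return components scoring below `threshold`, lowest score first."""
    risky = [name for name, score in factors.items() if score < threshold]
    return sorted(risky, key=lambda name: (factors[name], name))


def format_table(factors):
    """Render the scores as a plain-text table, highest score first."""
    rows = sorted(factors.items(), key=lambda item: (-item[1], item[0]))
    width = max([len("Component")] + [len(name) for name, _ in rows])
    lines = [f"{'Component':<{width}}  Bus Factor"]
    lines += [f"{name:<{width}}  {score}" for name, score in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    factors = bus_factors(KNOWLEDGE)
    print(format_table(factors))
    print("Needs attention:", ", ".join(at_risk(factors)))
```

The hard part is not the counting but keeping the input honest: someone who once reviewed a pull request for a component probably should not be counted as able to maintain it.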

You may want to consider a few small techniques to increase your bus factor without hiring anyone new or decreasing your teams' velocity too much. You could occasionally rotate some of the team members - let some of the bigger teams get to know what the smaller teams do. It will become easier for these individuals to take over a project if the need arises. Equally, if you spread the knowledge of your bigger projects to the smaller teams, you should be able to bring in some emergency relief if, for example, half the big team goes off sick while the smaller component isn't in need of maintenance.
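As a sketch of how a rotation might be chosen, the function below picks the component with the lowest bus factor and suggests someone to learn it, preferring a person whose own components are already well covered. The function name suggest_rotation, the shape of the input and the tie-breaking rules are all assumptions made for this example; a real decision would also weigh skills, workload and what people actually want to work on.

```python
from collections import Counter


def suggest_rotation(knowledge):
    """Suggest one (person, component) pairing to raise the lowest score.

    `knowledge` maps each person to the set of components they could
    maintain. Returns None when there is nothing sensible to suggest.
    """
    coverage = Counter(c for known in knowledge.values() for c in known)
    if not coverage:
        return None

    # Target the component that the fewest people know (ties: by name).
    target = min(coverage, key=lambda c: (coverage[c], c))

    # Only people who do not already know the target can add to its score.
    candidates = sorted(p for p, known in knowledge.items() if target not in known)
    if not candidates:
        return None

    def spare_capacity(person):
        # How well covered is this person's least-covered component?
        # Someone with no components at all has nothing to neglect.
        return min((coverage[c] for c in knowledge[person]), default=float("inf"))

    # Lend out whoever comes from the best-covered work (ties: by name).
    person = max(candidates, key=spare_capacity)
    return person, target


if __name__ == "__main__":
    team = {
        "alice": {"Main thing"},
        "bob": {"Main thing"},
        "carol": {"Main thing", "Other thing"},
        "dave": {"Other thing"},
    }
    print(suggest_rotation(team))
```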


Anyway, I've done enough rambling now. Stay safe, only cross on green lights.

published: Mon Dec 05 2016

Michal Paszkiewicz reads books, solves equations and plays instruments whenever he isn't developing software for Transport For London. All views on this site are the author's views only and do not necessarily represent the views of TfL.