One of the key aspects of Continuous Delivery is being able to deliver new versions of your software continuously. Things go wrong sometimes. A root cause analysis helps to find the reason why things fail. This activity is also called a post-mortem. Although it is a good practice for getting better with every iteration, essentially you’re too late. A completely different strategy to reduce the number of these post-mortem activities is the premortem strategy. In this strategy, you imagine that a project or software release has failed and then work your way back to determine what could have led to the failure. Here is what you need to know about the premortem strategy.
Essentially, a premortem is a strategy to collect and prioritize project risks. In terms of software development practices, it is a method to help reduce the likelihood of failed releases. Whereas post-mortems look back at what happened, premortem sessions look forward and answer questions like “What might be the cause when the system fails?”. By practicing this strategy, your development team can come up with creative ideas to solve challenges before they lead to concrete problems.
It all starts with a brainstorming session with the entire team. The main problem statement to “look back at” will be the failure of a new release. Suppose your problem statement is “The database server is out of disk space”. What could have led to this issue? Multiple causes might play a role here: a missed critical feature, improper unit testing, incorrect expectations of the load, or unintended configuration changes as a result of manual handovers. The list is long and diverse.
With an open view of these causes, people feel empowered to control potential problems as well as focus on the best (architectural) design of the system to prevent them. Don’t focus on solutions yet but on the problem area and the functional perspective of the problem statement.
Step 1: One pager
The first step of the process is to write a compact one-pager that describes the problem that needs to be solved. It needs to address a business-related issue (answering the why question), followed by how this issue would be tackled. It includes a design and the actual implementation of the proposed solution. Finally, it should describe how to test it. All of this is needed to make sure everyone is on the same page with respect to the problem statement.
Step 2: Identify failure reasons
After step one, conduct a brainstorming session with the entire development team. In this step, collect all potential reasons why the system can fail, and try to catch the difficult-to-spot issues that can cause severe harm. Following the example above, potential failure reasons include log files that grow too fast (too many requests, or the log level is set too high), unexpected traffic volumes that lead to many new records, or Blob storage that demands more space than anticipated.
Record every failure reason and don’t focus yet on solutions or other implications. This is something for the next step.
Partnering teams and/or architects from other teams should also be involved if the problem statement spans multiple teams. The initial design from Step 1 should be aligned with the applicable architectural standards and guidelines.
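To make the recording part of this step concrete, here is a minimal sketch of a shared log of failure reasons, assuming Python 3.9 or later. The `FailureReason` class, the `record` function, and the participant names are hypothetical and only serve as an illustration. The log stores every reason once and tracks who raised it, without judging or solving anything yet.

```python
from dataclasses import dataclass, field


@dataclass
class FailureReason:
    """One potential failure reason collected during the brainstorm (hypothetical structure)."""
    description: str
    raised_by: set[str] = field(default_factory=set)


def record(log: dict[str, FailureReason], description: str, participant: str) -> None:
    """Add a failure reason to the log, merging entries that differ only in case or spacing."""
    key = " ".join(description.lower().split())
    entry = log.setdefault(key, FailureReason(description.strip()))
    entry.raised_by.add(participant)


# Example session for "The database server is out of disk space".
log: dict[str, FailureReason] = {}
record(log, "Log files grow too fast", "alice")
record(log, "log files  grow too fast ", "bob")  # same reason, raised by someone else
record(log, "Unexpected traffic volumes lead to many new records", "carol")
record(log, "Blob storage demands more space than anticipated", "alice")

for reason in log.values():
    print(f"{reason.description} (raised by {len(reason.raised_by)})")
```

Merging only on case and whitespace is a deliberately simple choice. Reasons that are worded differently stay separate, so the team can decide in the session whether they really describe the same failure.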
Step 3: Prioritize issues
Brainstorming can go on and on if you do not define a fixed time box. After your brainstorming session is finished, prioritize the issues. A common technique is to rate the potential impact of each issue on a scale such as “extremely high”, “high”, “medium” and “low”. Pick the most important issues first and work out potential solutions for them. Then revise your design to incorporate the improved solution.
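The prioritization could be sketched as follows in Python. The function name `prioritize` and the ratings assigned to the example issues are assumptions made for illustration. In practice, the ratings come out of the team discussion.

```python
# Rating scale from highest to lowest impact, as used in this step.
RATING_ORDER = ["extremely high", "high", "medium", "low"]


def prioritize(issues, top_n=None):
    """Sort (description, rating) pairs by impact; optionally keep only the top_n issues."""
    rank = {rating: position for position, rating in enumerate(RATING_ORDER)}
    # sorted() is stable, so issues with the same rating keep their original order.
    ordered = sorted(issues, key=lambda issue: rank[issue[1]])
    return ordered if top_n is None else ordered[:top_n]


# Failure reasons from the disk space example, with hypothetical ratings.
issues = [
    ("Blob storage demands more space than anticipated", "medium"),
    ("Log level is set too high", "high"),
    ("Unexpected traffic volumes lead to many new records", "extremely high"),
    ("Too many requests make the log files grow too fast", "low"),
]

top = prioritize(issues, top_n=2)
for description, rating in top:
    print(f"[{rating}] {description}")
```

Limiting the result with `top_n` reflects the advice to pick the most important issues first. The remaining issues stay in the list for a later iteration.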
Conducting premortem sessions has several (organizational) advantages that improve your business. Think of the following:
- Thinking and discussing failures breaks down the “taboo” of making mistakes. It creates a positive atmosphere in which people feel safe.
- Premortem sessions help to identify the problem and tackle it from different angles. If the problem is unclear, this is revealed at an early stage, even before the refinement session. Having a collective session with the entire team, or even with external stakeholders, boosts the collective intelligence and imagination of every team member.
- In addition to the previous point, new and/or inexperienced team members are also heard, since they provide input. This also gives managers a way to collect feedback about individual team members and a hook on where and how to coach them.
- Planning and actually executing the initiatives in an organization benefit immediately from premortem outcomes.
- It increases the collective knowledge of the team, requires very little time to prepare, and is an easy-to-understand process.
These are just a handful of practical advantages. Of course, the entire system benefits from conducting premortem sessions since the overall stability grows. It also helps to outline the expertise of the team in problem-solving. Perhaps they can be seen as an example for the other teams as well.
As with all good initiatives, common pitfalls can reduce their effectiveness. It’s good to understand these pitfalls up front so you can address them at an early stage:
- Management support is essential. If people perceive premortem sessions as overhead (spending time on problems that might never become reality), it won’t work.
- Make sure everyone within the team is involved, not only group leaders or people who are not directly involved in the problem area. If the people closest to the problem are left out, you’ll never get to the best-designed solution and thus won’t solve problems up-front.
- Aggregating the results with the entire group of participants helps to get a common understanding of the topics and also makes sure everyone is on the same page about their shared characteristics. This helps to weigh the risks and their impact.
- Keep the same structure for your premortem sessions at least at the start when you don’t have much experience yet. Letting every team figure out the format that works for them doesn’t help the organization compare results in an equal manner.
- Don’t mix up the concepts of risk management and premortem. In risk management, you would monitor your project through a so-called risk register. You would react to risks in case they happen. In a premortem, your team imagines a project already has failed and uses that as an “as is” situation to tackle problems up-front.
For sure there are many more pitfalls that you might encounter. For now, use them to your advantage before you start practicing premortem sessions.
Premortem sessions are relatively new and not many teams use this strategy yet, but some valuable resources can help you get off the ground.
From a project point of view, the website of asana.com provides a huge checklist of all the relevant aspects to take into account. You can use all of the sections to carefully execute your first premortem sessions.
Josh Clemm writes about premortem as a “Software Engineering Best Practice”. This valuable statement might help you to get commitment from the senior management in case you need it.
An interesting viewpoint from riskology is to see premortem techniques as a way to “bring order to chaos in a big project”. It emphasizes looking forward instead of looking back when a project goes horribly wrong, and it provides a lot of great examples and practical tips.
Premortem is a strategy to think about potential failures for projects and software applications up-front. Its goal is to capture risks in an early stage by designing the best solution. The strategy is easy to learn and implement. Although it provides many advantages to the team(s) in charge, there are common pitfalls that should be tackled. Our simple steps help to get the process done. Last but not least, the last section of this article offers great resources as a starting point to actually implement it in your organization.