Stop Putting Band-Aids on Bullet Holes
Most operational failures are not solved. They are temporarily suppressed.
The server crashes, so the team reboots it. The project misses the deadline, so leadership adds more meetings. Customer churn rises, so the company launches a discount campaign. The symptom disappears for a week, then returns in a slightly different form.
This is first-order problem solving: treating the visible failure instead of the system that created it.

Taiichi Ohno’s rule was simple:
Keep asking “Why?” until you reach the system-level cause.
The mistake most organisations make is assuming the root cause sits close to the incident itself. In practice, it usually sits several layers away.
What looks like a technical failure at Level 1 is often a process failure at Level 3 and a leadership failure at Level 5.
A missed deadline is rarely caused by “slow engineering.” A customer escalation is rarely caused by “poor communication.” An outage is rarely caused by “human error.”
Those are surface-level manifestations of deeper operating design problems.
The five-level pattern
A typical failure chain looks like this:
Problem: We missed the Q1 shipping deadline.
Why did we miss the deadline? Engineering delivery slipped.
Why did engineering delivery slip? Requirements kept changing.
Why did requirements keep changing? Sales kept committing to new features during the quarter.
Why was Sales doing that? Their incentives rewarded revenue generation, not delivery feasibility.
Why were incentives structured that way? Leadership optimised compensation for growth without integrating operational constraints.
That fifth layer matters because it changes the corrective action completely.
A Level 1 fix sounds like this:
“Engineering needs to execute faster.”
A Level 5 fix sounds like this:
“Redesign commercial incentives so feature commitments require delivery sign-off.”
One treats symptoms. The other changes the system.
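For teams that record their analyses, the Q1 chain above can be captured as a small data structure. What follows is a minimal sketch in Python, not part of Toyota's method: the names `Why`, `Category` and `root_cause` are hypothetical, the category list is borrowed from the analysis prompt later in this article, and the category assigned to each level is one reading of the example rather than something the example states.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Labels taken from the analysis prompt in this article.
    TECHNICAL = "technical"
    PROCESS = "process"
    INCENTIVE = "incentive"
    COMMUNICATION = "communication"
    LEADERSHIP = "leadership"


@dataclass(frozen=True)
class Why:
    question: str
    answer: str
    category: Category  # assigned by the analyst; a judgment call


def root_cause(chain: list[Why], depth: int = 5) -> Why:
    """Return the answer at the final level, rejecting chains that stop early."""
    if len(chain) < depth:
        raise ValueError(
            f"analysis stopped at level {len(chain)} of {depth}; keep asking why"
        )
    return chain[depth - 1]


# The missed-deadline example, one entry per level (categories are assumptions).
q1_chain = [
    Why("Why did we miss the deadline?",
        "Engineering delivery slipped.", Category.TECHNICAL),
    Why("Why did engineering delivery slip?",
        "Requirements kept changing.", Category.PROCESS),
    Why("Why did requirements keep changing?",
        "Sales kept committing to new features during the quarter.",
        Category.PROCESS),
    Why("Why was Sales doing that?",
        "Their incentives rewarded revenue generation, not delivery feasibility.",
        Category.INCENTIVE),
    Why("Why were incentives structured that way?",
        "Leadership optimised compensation for growth without integrating "
        "operational constraints.", Category.LEADERSHIP),
]

cause = root_cause(q1_chain)
print(cause.category.value, "->", cause.answer)
```

The only design choice worth noting is that `root_cause` refuses a chain shorter than five levels, so an analysis that stops at "engineering was slow" cannot be filed as complete.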
The “human error” trap
Many organisations stop their root cause analysis at the first socially convenient answer:
“Someone forgot.”
“Someone missed the check.”
“Someone made a mistake.”
That is not root cause analysis. That is blame with administrative language.
If your investigation ends with “human error,” the process design is probably incomplete.
Humans forget things. Humans skip steps. Humans get distracted. Good operating systems assume this in advance.
Toyota’s manufacturing philosophy was built around this principle. Instead of trying to create flawless humans, they designed systems that prevented predictable mistakes from occurring in the first place.
That distinction matters because organisations that punish individuals for process failures usually create two downstream problems:
Employees learn to hide issues instead of raising them early.
The underlying failure mechanism survives untouched.
The result is recurring operational instability.
The executive rule
A useful leadership rule is this:
Never fire a person for a process failure you designed.
That does not eliminate accountability. It relocates accountability upward, toward system ownership.
If the process allows a predictable mistake to occur repeatedly, leadership owns the process.
This is why mature operational cultures focus less on blame and more on controls, incentives, interfaces and decision architecture.
The goal is not to find who failed.
The goal is to understand why the system permitted the failure.
Use this prompt to run a proper root cause analysis.
Prompt: Toyota 5 Whys Analysis
Act as a Root Cause Analyst using the Toyota 5 Whys Method.
The Problem: [Describe the operational failure]
Walk backwards through five layers of causality.
For each layer:
State the immediate cause.
Explain why that cause existed.
Classify the cause as one of:
a technical failure
a process failure
an incentive failure
a communication failure
a leadership failure
At Level 5:
Identify the root systemic cause.
Explain which leadership assumption, structure or incentive created it.
Then output:
The incorrect Level 1 fix most organisations would apply
The correct Level 5 corrective action
The operational risks if the root cause remains unresolved
The metrics or signals that would confirm the issue is fixed
The next problem: recurrence
Even when organisations identify the root cause correctly, many still fail because they do not redesign the process afterward.
They conduct the post-mortem, document the findings, circulate the slides, then leave the workflow unchanged.
That guarantees recurrence.
Toyota addressed this using “Poka-Yoke”: mistake-proofing mechanisms that either prevent the error entirely or make it immediately visible.
A strong operational process should not rely on memory, heroics or vigilance.
It should make the correct behaviour automatic.
Examples:
A deployment pipeline that blocks a production release when the change lacks test coverage
A CRM that will not let a record progress until the mandatory data fields are completed
A procurement workflow that requires dual approval above spending thresholds
A board paper template that forces the recommendation onto page one
The pattern is always the same:
Do not ask people to remember critical controls.
Engineer the controls into the system.
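To make one of the examples above concrete, here is a minimal sketch of the dual-approval control written in Python. It is an illustration under stated assumptions, not a reference implementation: `PurchaseRequest`, `ApprovalError` and the 10,000 threshold are all hypothetical, and a real procurement system would add rules this sketch omits.

```python
from dataclasses import dataclass, field

APPROVAL_THRESHOLD = 10_000  # hypothetical spending limit, in any currency


class ApprovalError(Exception):
    """Raised when a purchase is released without the required approvals."""


@dataclass
class PurchaseRequest:
    requester: str
    amount: float
    # A set, so the same approver clicking twice still counts once.
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        self.approvers.add(approver)

    def required_approvals(self) -> int:
        # Two distinct approvers above the threshold, one otherwise.
        return 2 if self.amount > APPROVAL_THRESHOLD else 1

    def release(self) -> str:
        missing = self.required_approvals() - len(self.approvers)
        if missing > 0:
            # The control lives here: release is impossible, not discouraged.
            raise ApprovalError(f"{missing} more approval(s) required")
        return "released"


request = PurchaseRequest(requester="ana", amount=25_000)
request.approve("bo")
try:
    request.release()
except ApprovalError as err:
    print("blocked:", err)

request.approve("cy")
print(request.release())
```

Nobody has to remember the spending policy here. A request above the threshold cannot be released with a single approver, which is the mistake-proofing property the examples describe.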
Use this prompt to redesign the process itself.


