Safety cases don’t usually fail where people expect
When people hear “safety case failure,” they often imagine something quite simple.
- a missing requirement
- an incorrect assumption
- a calculation error
- a missed hazard
And yes, those do happen.
But in modern aviation and complex engineered systems, that’s rarely how safety cases actually fail in practice.
They don’t usually fail because something is missing.
They fail because everything looks correct in isolation — but breaks down in interaction.
What a safety case is really assuming
At its core, a safety case is trying to demonstrate one thing:
The system is acceptably safe to operate within a defined environment.
To do that, it relies on:
- hazard identification (FHA)
- failure analysis (FTA, FMEA, etc.)
- probability and severity classification
- mitigation strategies (redundancy, alerts, procedures)
- validation and verification evidence
Each of these elements is usually assessed as a structured, well-bounded problem.
And that works — until the system stops behaving as a set of independent parts.
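The recombination step can be sketched as a toy fault-tree calculation. Everything below is illustrative: the element names and probabilities are invented, and the point is the independence assumption baked into the arithmetic.

```python
# Toy fault-tree combination, assuming fully independent elements.
# All element names and probabilities are illustrative, not from any real system.
failure_prob = {
    "sensor_a": 1e-5,
    "system_b": 1e-6,
    "function_c": 1e-6,
    "procedure_d": 1e-3,  # human procedure, per operation
}

def and_gate(probs):
    """Hazard path occurs only if ALL contributing failures occur (independence assumed)."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(probs):
    """Hazard occurs if ANY contributing path occurs (independence assumed)."""
    p_none = 1.0
    for q in probs:
        p_none *= (1.0 - q)
    return 1.0 - p_none

# "Overall risk is controlled": each hazard path requires two independent failures,
# so multiplying small probabilities yields a very small combined number.
p_hazard = or_gate([
    and_gate([failure_prob["sensor_a"], failure_prob["system_b"]]),
    and_gate([failure_prob["function_c"], failure_prob["procedure_d"]]),
])
print(f"combined hazard probability: {p_hazard:.2e}")
```

The conclusion looks reassuring precisely because the multiplication step assumes the elements cannot fail together for a shared reason.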
The hidden assumption: clean separation of concerns
Most safety cases assume something important:
System functions can be analysed independently and then recombined safely.
So you evaluate:
- Sensor A
- System B
- Function C
- Human procedure D
And then conclude:
“Overall risk is controlled.”
But in real systems, especially modern aircraft, this assumption starts to break down.
Because these elements are not independent.
They are tightly coupled and continuously interacting.
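One way to see why this matters is a small Monte Carlo sketch, with all rates invented: two "redundant" channels that share a common-cause condition fail together far more often than the independence model predicts.

```python
# Illustrative only: two "redundant" channels that share a common environmental cause.
# Under independence, P(both fail) = p**2; with a shared cause it can be far larger.
import random

random.seed(0)
P_COMMON = 1e-3       # shared condition (e.g. icing) affects both channels
P_INDEP = 1e-3        # each channel's independent failure rate
P_GIVEN_COMMON = 0.5  # per-channel failure probability when the shared condition holds

def channel_fails(common_event: bool) -> bool:
    if common_event and random.random() < P_GIVEN_COMMON:
        return True
    return random.random() < P_INDEP

trials = 1_000_000
both = 0
for _ in range(trials):
    common = random.random() < P_COMMON
    if channel_fails(common) and channel_fails(common):
        both += 1

observed = both / trials
# What a naive model assuming independent channels would predict:
independent_model = (P_INDEP + P_COMMON * P_GIVEN_COMMON) ** 2
print(f"observed dual failure rate:   {observed:.2e}")
print(f"independence would predict:   {independent_model:.2e}")
```

The per-channel numbers look identical in both models; only the joint behaviour differs, which is exactly the part a component-by-component analysis does not see.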
Where interaction breaks the model
Complex systems don’t fail in straight lines.
They fail through combinations of behaviours, each of which was individually considered safe.
For example:
- a sensor produces slightly degraded data
- automation logic reacts correctly to that data
- the pilot responds correctly to the automation output
- ATC procedures are followed correctly
Individually, each component behaves as designed.
But collectively, the system's picture of its own state diverges from reality.
This is where safety cases start to lose fidelity.
Not because the analysis was wrong — but because the interaction space was underestimated.
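The chain above can be shown with a deliberately minimal toy loop, with all numbers invented: a slightly biased sensor, an automation law that tracks the sensed value correctly, and an aircraft that obeys commands. Every layer behaves correctly relative to its own input, yet believed and true state settle a bias apart.

```python
# Minimal sketch (all numbers invented): each layer is "correct" w.r.t. its input,
# yet the believed state drifts from reality by exactly the undetected sensor bias.
true_altitude = 10_000.0   # feet, actual state of the world
sensor_bias = -15.0        # degraded but within-tolerance sensor error
target = 10_000.0          # altitude the automation is holding

believed = true_altitude
for _ in range(10):
    sensed = true_altitude + sensor_bias   # sensor: degraded but plausible output
    command = target - sensed              # automation: correct w.r.t. sensed data
    true_altitude += command               # aircraft: responds exactly as commanded
    believed = sensed + command            # crew/automation believe the target is held

print(f"believed altitude: {believed:.1f}")
print(f"true altitude:     {true_altitude:.1f}")
```

No single analysis step in this loop contains an error; the divergence lives entirely in the interaction between the sensor's tolerance and the control loop's trust in it.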
The problem of system coupling
As systems evolve, coupling increases:
- sensors feed multiple systems simultaneously
- automation depends on fused data sources
- control laws depend on inferred state
- humans depend on system interpretation
- external systems (ATC, traffic, procedures) feed back into the same loop
This creates a condition where a change in one part of the system can propagate in unexpected ways across multiple layers.
Importantly, this propagation is not always linear or obvious at design time.
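This propagation can be sketched as reachability over a coupling graph. The node names and edges below are hypothetical, not any real avionics architecture; the point is how far a single upstream change can reach once feedback loops exist.

```python
# Hypothetical coupling graph (names invented): which element consumes whose output.
from collections import deque

feeds = {
    "air_data_sensor": ["fusion", "display"],
    "fusion": ["control_law", "autopilot", "display"],
    "control_law": ["actuators"],
    "autopilot": ["control_law", "crew"],
    "display": ["crew"],
    "crew": ["autopilot"],   # feedback: crew inputs re-enter the loop
    "actuators": [],
}

def affected_by(node):
    """Everything a change in `node` can propagate to, including via feedback loops."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in feeds[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(affected_by("air_data_sensor")))
```

In a graph this coupled, a single degraded sensor reaches every other layer, including the human, which is why "a change in one part" is rarely local.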
Why redundancy does not always protect the safety case
Redundancy is often treated as a safety case strength.
And in simple systems, it is.
But redundancy assumes:
independent failure modes and clear voting logic between elements.
In complex interactions, redundancy can introduce a different problem:
- multiple correct signals under different assumptions
- disagreement between valid sources
- ambiguity in which source represents “truth”
So instead of eliminating uncertainty, redundancy can sometimes redistribute uncertainty across the system.
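A classic mid-value-select voter illustrates the point, with all values invented: the vote always produces an answer, but it cannot say which of three individually valid sources reflects the real world.

```python
# Sketch: triplex inputs with mid-value selection. All values invented.
# Each source can be individually "valid" (self-consistent, within its own
# tolerance) while disagreeing with the others.
def mid_value_select(a, b, c):
    """Classic triplex vote: take the median, masking a single outlier."""
    return sorted([a, b, c])[1]

# Two sources share one calibration assumption, the third uses another.
airspeed = {"adc_1": 252.0, "adc_2": 251.8, "adc_3": 244.0}  # knots

selected = mid_value_select(*airspeed.values())
spread = max(airspeed.values()) - min(airspeed.values())
print(f"selected: {selected} kt, spread across sources: {spread:.1f} kt")
```

The voter resolves the disagreement numerically, but the spread itself is the real signal: deciding which assumption set represents "truth" has simply been pushed to downstream layers, often the human.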
Human operators: the final interpretation layer
In most safety cases, humans are treated as:
- mitigators of system failure
- procedural actors
- final decision-makers under defined rules
But in reality, the human is also a real-time interpreter of system state under uncertainty.
And that interpretation depends on:
- consistency of feedback
- trust in system outputs
- workload and time pressure
- clarity of automation state
If system outputs are internally inconsistent, the human does not see neat system design boundaries.
They see conflicting signals about reality.
And the safety case rarely models that transition explicitly.
Where safety cases quietly degrade
Safety cases tend to be strongest in:
- normal operation
- single-fault conditions
- well-defined failure modes
They tend to be weakest in:
- multi-layer partial failures
- degraded but functional systems
- conflicting but valid system outputs
- dynamic transitions between modes
Because these states are hard to enumerate exhaustively.
And more importantly, they are defined by interactions, not components.
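A back-of-envelope count shows why exhaustive enumeration fails. The mode counts below are invented, but the multiplicative growth is the point: the full interaction space dwarfs the handful of single-deviation cases a safety case typically covers in depth.

```python
# Back-of-envelope: the interaction state space grows multiplicatively with layers.
# Mode counts below are invented for illustration.
from math import prod

modes = {
    "sensor_a": 3,      # nominal / degraded / failed
    "system_b": 4,      # nominal / stale / degraded / failed
    "automation": 5,    # modes and sub-modes
    "human": 3,         # nominal / high workload / mistrusting automation
    "environment": 4,
}

combinations = prod(modes.values())
# Single-deviation cases (everything nominal except one element) are what a
# safety case typically enumerates well: one per non-nominal mode.
single_deviation_cases = sum(n - 1 for n in modes.values())

print(f"total interaction states:  {combinations}")
print(f"single-deviation states:   {single_deviation_cases}")
```

Even with five coarsely-binned elements, the multi-layer combinations outnumber the single-deviation cases by orders of magnitude, and each added element multiplies the gap again.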
The real failure mode: unmodelled interaction space
The most important gap in many safety cases is not missing hazards.
It is missing interaction combinations:
- system A behaves correctly with degraded input from system B
- system B behaves correctly with outdated state from system C
- human operator behaves correctly based on system A and B outputs
- overall system state is still incorrect
Each step is valid.
But the combination was not explicitly bounded.
This is where latent risk accumulates.
Why this matters in modern aviation systems
Modern aircraft are not just mechanical systems with redundancy.
They are:
- distributed sensing systems
- real-time control systems
- adaptive automation layers
- human-in-the-loop decision systems
- external networked environments (ATC, traffic systems)
This creates a key shift:
Safety is no longer a property of components — it is a property of system interaction coherence.
And safety cases must now demonstrate not only that components are safe…
but that their interactions remain interpretable and bounded under degraded conditions.
Closing thought
A safety case can be technically correct in every individual analysis step.
Every hazard can be identified.
Every failure mode can be classified.
Every mitigation can be justified.
And still fail in practice.
Not because it was wrong…
but because the system it describes does not operate as isolated parts.
It operates as a continuously interacting network.
And in that network, safety is not just about preventing failure.
It is about preserving consistency of understanding across all system layers when things start to deviate from normal operation.

