How Safety Cases Fail in Complex System Interactions


Safety cases don’t usually fail where people expect

When people hear “safety case failure,” they often imagine something quite simple.

  • a missing requirement
  • an incorrect assumption
  • a calculation error
  • a missed hazard

And yes, those do happen.

But in modern aviation and complex engineered systems, that’s rarely how safety cases actually fail in practice.

They don’t usually fail because something is missing.

They fail because everything looks correct in isolation — but breaks down in interaction.

What a safety case is really assuming

At its core, a safety case is trying to demonstrate one thing:

The system is acceptably safe to operate within a defined environment.

To do that, it relies on:

  • hazard identification (FHA)
  • failure analysis (FTA, FMEA, etc.)
  • probability and severity classification
  • mitigation strategies (redundancy, alerts, procedures)
  • validation and verification evidence

Each of these elements is usually assessed as a structured, well-bounded problem.

And that works — until the system stops behaving as a set of independent parts.
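
As a rough illustration of how bounded that per-element arithmetic usually is, here is a minimal fault-tree style sketch in Python. The gate structure and failure rates are invented for illustration and do not come from any real system:

```python
# Minimal fault-tree arithmetic under the usual independence assumption.
# Gate structure and probabilities are hypothetical, for illustration only.

# Per-element failure probabilities per flight hour (illustrative)
p_sensor_a = 1e-5
p_sensor_b = 1e-5
p_alerting = 1e-4

# AND gate: this hazard path requires both redundant sensors to fail,
# multiplied out on the assumption that they fail independently.
p_sensing_loss = p_sensor_a * p_sensor_b          # 1e-10

# OR gate: the top event occurs if sensing is lost or alerting fails silently.
p_top_event = 1 - (1 - p_sensing_loss) * (1 - p_alerting)

print(f"Loss of sensing:  {p_sensing_loss:.1e} per flight hour")
print(f"Top-level hazard: {p_top_event:.1e} per flight hour")
```

Every line of that calculation is defensible on its own; the rest of this piece is about the assumptions buried in the multiplication.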

The hidden assumption: clean separation of concerns

Most safety cases assume something important:

System functions can be analysed independently and then recombined safely.

So you evaluate:

  • Sensor A
  • System B
  • Function C
  • Human procedure D

And then conclude:

“Overall risk is controlled.”

But in real systems, especially modern aircraft, this assumption starts to break down.

Because these elements are not independent.

They are tightly coupled and continuously interacting.
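
The cost of assuming independence where coupling exists is easy to show numerically. In the hypothetical sketch below, two elements that were assessed separately share an unmodelled common cause, and the real joint failure probability is dominated by that coupling rather than by the product of the individual rates:

```python
# How an unmodelled common cause breaks the independence assumption.
# All probabilities are hypothetical, chosen only to show the effect.

p_a = 1e-5        # element A failure probability, assessed in isolation
p_b = 1e-5        # element B failure probability, assessed in isolation
p_common = 1e-6   # shared condition that takes out both, present in neither analysis

# What the recombined safety case assumes:
p_joint_assumed = p_a * p_b                 # 1e-10

# What the coupled system actually does (a simple common-cause approximation):
p_joint_actual = p_a * p_b + p_common       # ~1e-6

print(f"Assumed joint failure: {p_joint_assumed:.1e}")
print(f"Actual joint failure:  {p_joint_actual:.1e}")
print(f"Underestimated by a factor of {p_joint_actual / p_joint_assumed:,.0f}")
```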

Where interaction breaks the model

Complex systems don’t fail in straight lines.

They fail through combinations of interactions, each of which was considered safe on its own.

For example:

  • a sensor produces slightly degraded data
  • automation logic reacts correctly to that data
  • the pilot responds correctly to the automation output
  • ATC procedures are followed correctly

Individually, each component behaves as designed.

But collectively, the system state diverges from reality.

This is where safety cases start to lose fidelity.

Not because the analysis was wrong — but because the interaction space was underestimated.
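
A toy example, with every number invented, shows how this plays out: each stage below applies its own rule correctly, and only the accumulated picture is wrong.

```python
# Every layer behaves as designed; only the combined state drifts from truth.
# Altitudes, biases, and rules are invented purely for illustration.

true_altitude = 10_000.0                       # feet, ground truth

# Stage 1: sensor is degraded but inside its 100 ft monitoring threshold.
sensor_reading = true_altitude - 80.0

# Stage 2: automation correctly fuses the sensor with a slightly stale backup.
stale_backup = 9_940.0
fused_estimate = (sensor_reading + stale_backup) / 2.0

# Stage 3: the pilot correctly reads the display to the nearest 100 ft.
pilot_picture = round(fused_estimate / 100) * 100

print(f"Truth:           {true_altitude:.0f} ft")
print(f"Sensor:          {sensor_reading:.0f} ft  (within tolerance)")
print(f"Automation:      {fused_estimate:.0f} ft  (correct fusion)")
print(f"Pilot's picture: {pilot_picture:.0f} ft  (correct procedure)")
print(f"Divergence:      {true_altitude - pilot_picture:.0f} ft")
```

In this toy chain, no single stage exceeds its own limit, yet the end-to-end error grows to the size of the alert threshold itself.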

The problem of system coupling

As systems evolve, coupling increases:

  • sensors feed multiple systems simultaneously
  • automation depends on fused data sources
  • control laws depend on inferred state
  • humans depend on system interpretation
  • external systems (ATC, traffic, procedures) feed back into the same loop

This creates a condition where a change in one part of the system can propagate in unexpected ways across multiple layers.

Importantly, this propagation is not always linear or obvious at design time.
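
One way to see the reach of that propagation is to treat the system as a small dependency graph and ask what a change in one element can touch. The graph below is a made-up miniature, but the traversal is the point:

```python
from collections import deque

# Hypothetical coupling graph: "X feeds Y" means a change in X can affect Y.
feeds = {
    "air_data_sensor": ["fusion", "display"],
    "fusion":          ["control_laws", "automation", "display"],
    "control_laws":    ["automation"],
    "automation":      ["pilot"],
    "display":         ["pilot"],
    "pilot":           ["atc_interaction"],
    "atc_interaction": ["automation"],      # external loop feeding back in
}

def reachable(start: str) -> set[str]:
    """Everything a change at `start` can propagate to, directly or indirectly."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in feeds.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(reachable("air_data_sensor"))
# In this miniature, a single sensor change can touch every layer,
# including the human and the ATC loop.
```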

Why redundancy does not always protect the safety case

Redundancy is often treated as a safety case strength.

And in simple systems, it is.

But redundancy assumes:

independent failure modes and clear voting logic between elements.

In complex interactions, redundancy can introduce a different problem:

  • multiple correct signals under different assumptions
  • disagreement between valid sources
  • ambiguity in which source represents “truth”

So instead of eliminating uncertainty, redundancy can sometimes redistribute uncertainty across the system.
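
A short sketch of that disagreement case, with invented names and tolerances: both sources below pass their own health monitoring, yet they answer subtly different questions, so a simple voter has nothing to vote out.

```python
# Two redundant sources, each valid under its own assumptions, disagree.
# Values, tolerances, and source names are illustrative only.

sources = {
    "adc_1": {"value": 251.0, "healthy": True},   # uses corrected static pressure
    "adc_2": {"value": 243.0, "healthy": True},   # still on the uncorrected source
}
AGREEMENT_TOLERANCE = 4.0   # knots

healthy_values = [s["value"] for s in sources.values() if s["healthy"]]

if max(healthy_values) - min(healthy_values) <= AGREEMENT_TOLERANCE:
    consensus = sum(healthy_values) / len(healthy_values)
    print(f"Voted value: {consensus:.1f} kt")
else:
    # Neither source has failed, so there is nothing to exclude.
    # The uncertainty is not removed; it is handed to the next layer.
    print("Miscompare: two valid sources, no failure flag, no consensus")
```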

Human operators: the final interpretation layer

In most safety cases, humans are treated as:

  • mitigators of system failure
  • procedural actors
  • final decision-makers under defined rules

But in reality, the human is also a real-time interpreter of system state under uncertainty.

And that interpretation depends on:

  • consistency of feedback
  • trust in system outputs
  • workload and time pressure
  • clarity of automation state

If system outputs are internally inconsistent, the human does not see system design boundaries.

They see conflicting signals about what is real.

And the safety case rarely models that transition explicitly.

Where safety cases quietly degrade

Safety cases tend to be strongest in:

  • normal operation
  • single-fault conditions
  • well-defined failure modes

They tend to be weakest in:

  • multi-layer partial failures
  • degraded but functional systems
  • conflicting but valid system outputs
  • dynamic transitions between modes

Because these states are hard to enumerate exhaustively.

And more importantly, they are defined by interactions, not components.
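
A rough count makes the enumeration problem concrete. Even a toy system, with numbers chosen only for illustration, has an interaction space orders of magnitude larger than the single- and dual-fault cases a safety case typically walks through:

```python
from math import comb

# Illustrative system: 12 elements, each with 3 non-normal states
# (degraded, stale, failed) in addition to "normal".
elements = 12
non_normal_states = 3

# States a typical safety case enumerates explicitly:
single_fault_cases = elements * non_normal_states                  # 36
dual_fault_cases = comb(elements, 2) * non_normal_states ** 2      # 594

# Full interaction space: every element in any of its 4 states.
all_combinations = (non_normal_states + 1) ** elements             # 16,777,216

print(f"Single-fault cases analysed: {single_fault_cases}")
print(f"Dual-fault cases analysed:   {dual_fault_cases}")
print(f"Total interaction space:     {all_combinations:,}")
```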

The real failure mode: unmodelled interaction space

The most important gap in many safety cases is not missing hazards.

It is missing interaction combinations:

  • system A behaves correctly with degraded input from system B
  • system B behaves correctly with outdated state from system C
  • human operator behaves correctly based on system A and B outputs
  • overall system state is still incorrect

Each step is valid.

But the combination was not explicitly bounded.

This is where latent risk accumulates.

Why this matters in modern aviation systems

Modern aircraft are not just mechanical systems with redundancy.

They are:

  • distributed sensing systems
  • real-time control systems
  • adaptive automation layers
  • human-in-the-loop decision systems
  • external networked environments (ATC, traffic systems)

This creates a key shift:

Safety is no longer a property of components — it is a property of system interaction coherence.

And safety cases must now demonstrate not only that components are safe…

but that their interactions remain interpretable and bounded under degraded conditions.

Closing Thought

A safety case can be technically correct in every individual analysis step.

Every hazard can be identified.

Every failure mode can be classified.

Every mitigation can be justified.

And still fail in practice.

Not because it was wrong…

but because the system it describes does not operate as isolated parts.

It operates as a continuously interacting network.

And in that network, safety is not just about preventing failure.

It is about preserving consistency of understanding across all system layers when things start to deviate from normal operation.