Safety Engineering Doesn’t Fail at the Big Things — It Fails at the “Almost Invisible” Ones

There’s a strange pattern in aviation safety work that you only really notice after a while.

The catastrophic events, the ones that end up in reports, in front of boards, and in headlines, almost never come from something truly unknown. In hindsight, they’re usually traceable. The warning signs were there. The system “knew” in some sense.

So the real question is not: why did we miss it?

It’s:

Why do we keep letting small, almost boring weaknesses accumulate until they matter?


 

We’re very good at designing against big failures

Modern safety engineering is excellent at the obvious failure modes:

  • Engine failures
  • Structural overloads
  • Electrical faults
  • Software crashes
  • Single-point failures

We have tools, models, redundancy, certification regimes, and decades of lessons learned.

If something is dramatic and clearly dangerous, we usually have a strategy for it.

And yet…

That’s not where most risk lives anymore.


 

The real risk is the stuff that “sort of works”

The uncomfortable truth is that the hardest safety problems today are not binary failures.

They’re degraded behaviours like:

  • A sensor that’s slightly unreliable, but “good enough”
  • A procedure that still works, but only if people shortcut it
  • A maintenance step that gets quietly compressed under time pressure
  • A warning system that’s technically functional, but routinely ignored
  • A design assumption that is almost true in real operations

Individually, none of these are catastrophic.

Together, they become normalised drift.

And normalised drift is where safety quietly erodes.


 

Safety systems don’t usually break — they relax

One of the most misleading mental models in safety engineering is that systems “fail.”

In reality, most systems don’t fail suddenly. They relax.

They move from:

  • tightly designed behaviour
  • to loosely followed intent
  • to informal practice
  • to “this is how we actually do it”

By the time something serious happens, the system hasn’t “broken” in a technical sense.

It has simply drifted far enough from its safety assumptions that those assumptions no longer apply.


 

The dangerous word in safety engineering: “acceptable”

A lot of safety work quietly depends on this idea:

“This level of risk is acceptable.”

That’s not wrong. It’s necessary. You can’t design a zero-risk aviation system.

But here’s the subtle issue:

“Acceptable” is not a fixed property. It is a contextual agreement.

And context changes:

  • Staffing pressure changes
  • Operational tempo changes
  • Technology gets layered over old assumptions
  • Experience levels shift
  • Maintenance realities evolve

What was acceptable at design time may become fragile in operation.

Not because anyone failed — but because the world moved.


 

Most safety systems are optimised for certainty, not ambiguity

Engineering tools are strongest when the system is:

  • defined
  • bounded
  • predictable

But real operations are:

  • messy
  • time-pressured
  • partially visible
  • constantly adapting

So we end up with a gap:

  • Design assumes clarity
  • Operation lives in ambiguity

And in that gap, people do what they always do in real systems:

They adapt.


 

Adaptation is both the solution and the risk

This is where it gets uncomfortable.

Operational staff are not “deviating from design” in a careless sense.

They are:

  • making the system work in real conditions
  • closing gaps the design didn’t fully anticipate
  • keeping operations moving under constraints

In many cases, adaptation is exactly what prevents incidents.

But adaptation has a cost:

It slowly replaces designed safety margins with lived experience.

And lived experience is not always visible to engineers, analysts, or regulators.


 

So what actually keeps systems safe?

Not perfection. Not elimination of error. Not rigid adherence.

It’s something more boring and more important:

Continuous visibility of how the system is actually behaving, not just how it was designed to behave.

That means:

  • listening to weak signals, not just incidents
  • treating “workarounds” as data, not discipline problems
  • updating assumptions faster than operational reality changes
  • designing for drift, not just nominal cases

In other words:

Safety engineering is less about preventing failure
and more about preventing invisibility.
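
In practice, “preventing invisibility” can be as simple as watching the gap between what a component actually does in service and the assumption it was certified under, and flagging the trend long before anything hard-fails. The sketch below is purely illustrative, not a real monitoring standard: the availability figure, alert margin, and window size are hypothetical assumptions chosen only to show the idea.

```python
from collections import deque

# Hypothetical example: a sensor was certified on the assumption that it
# delivers a valid reading at least 99% of the time. Nothing "fails" if it
# slips to 97% -- it still "sort of works" -- but the design margin is gone.
DESIGN_ASSUMPTION = 0.99   # availability the safety case assumes (hypothetical)
ALERT_MARGIN = 0.005       # flag before the assumption itself is violated
WINDOW = 500               # rolling window of recent readings (arbitrary)


class DriftMonitor:
    """Tracks a weak signal (recent validity rate) instead of waiting for an incident."""

    def __init__(self) -> None:
        self.recent = deque(maxlen=WINDOW)

    def record(self, reading_valid: bool) -> None:
        self.recent.append(reading_valid)

    def observed_rate(self) -> float:
        if not self.recent:
            return 1.0
        return sum(self.recent) / len(self.recent)

    def drifting(self) -> bool:
        # "Drifting" here means: still above the certified floor, but the
        # margin between operational reality and the design assumption has
        # nearly closed.
        return self.observed_rate() < DESIGN_ASSUMPTION + ALERT_MARGIN


# Usage: feed it operational data and review the trend, not just failures.
if __name__ == "__main__":
    import random
    random.seed(0)
    monitor = DriftMonitor()
    for day in range(5):
        true_rate = 0.999 - 0.002 * day   # slow, boring degradation
        for _ in range(WINDOW):
            monitor.record(random.random() < true_rate)
        print(f"day {day}: observed {monitor.observed_rate():.3f}, "
              f"drifting={monitor.drifting()}")
```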


 

A final uncomfortable thought

If a system only looks safe in reports, models, and certification documents…

That’s not evidence of safety.

That’s evidence of alignment between documentation and design intent.

The real question is always:

What is the system doing when no one is formally checking it?

Because that is where safety actually lives.
