DRMacIver's Notebook

Skirting the edge of disaster

Published: 2020-03-04

Have you ever looked at a system and marvelled at the fact that it works at all?

Of course you have: you are one. You also work at one, live in one, and if you’re in London you certainly commute on one. Complex systems are a mess and the only reason we can believe that they work is that they, somehow, do.

Why does this happen?

It’s a boundary defined between two competing forces:

If an important system is broken, we fix it.
If a system has spare capacity, we place more demands on it.

Any system subject to these forces will skirt the edge of disaster: It will constantly be on the verge of being broken, and yet somehow work.

The reason is very simple: If the system is not on the edge of disaster, it will either be currently broken, and move towards the edge by getting better, or it will currently be under capacity and more demands will be put on it, and move towards the edge by getting worse. The edge is the only point of equilibrium.
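A toy simulation makes the argument concrete. This is only a sketch under assumptions that are mine rather than part of the argument: the system is reduced to two numbers, `capacity` and `demand`, "broken" means demand exceeds capacity, and the hypothetical `fix_rate` and `growth_rate` parameters say how strongly each force responds to the gap.

```python
def simulate(capacity, demand, steps=50, fix_rate=0.5, growth_rate=0.5):
    """Return the margin (capacity - demand) after each step.

    A negative margin stands for "broken", a positive margin for
    "spare capacity". Both rates are illustrative assumptions.
    """
    margins = []
    for _ in range(steps):
        margin = capacity - demand
        if margin < 0:
            # Force 1: the system is broken, so we fix it,
            # adding capacity in proportion to the shortfall.
            capacity += fix_rate * (-margin)
        else:
            # Force 2: the system has spare capacity, so we
            # place more demands on it.
            demand += growth_rate * margin
        margins.append(capacity - demand)
    return margins


# Starting with lots of slack, and starting badly broken.
slack = simulate(capacity=100, demand=10)
broken = simulate(capacity=10, demand=100)
print(slack[:3], slack[-1])
print(broken[:3], broken[-1])
```

With these rates the margin halves at every step whichever side it starts on, so both runs end up with a margin of essentially zero: the system sits right on the edge. Real systems are noisier than this, of course, which is why they wobble around the edge rather than resting on it.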

The best approach to this that seems to be in widespread use is not to try to remove the dynamic, but instead to move the edge, by changing what counts as a problem that needs fixing. You still skirt the edge of “disaster”, but your definition of disaster is now that the system is working suboptimally rather than that it is outright broken.

One way in particular to do that is to treat situations in which disaster was plausible rather than actual (near misses) as things that need fixing in their own right. “Learning from samples of one or fewer” is a good paper about this, describing how organisations can learn to avoid low-probability, high-cost events which they have, thankfully, not yet actually experienced. Treating near misses as things to learn from and feed back into the system is a key part of this.