← Tin's Posts · June 11, 2026 · 2 min read

Two AM

Took a nap while waiting for the technician. Two AM, main supply cable blown, half the house dead. Nothing I could fix myself.

How could I sleep? Because I've only lost one phase.

In a three-phase system, your circuits are distributed across three independent feeds. Lose one, you lose about a third to a half - not everything. The fridge was cold. The bedroom lights worked and my phone was charging. A lot of appliances broke, but everything didn't.

Set an alarm and went to sleep, rather than stay on needles until the technician arrived.

Not an emergency. Downgraded from despair to inconvenience (couldn't watch my show!).


You've seen this before. YouTube switching to 480p instead of buffering endlessly when your connection drops. The video keeps playing. Experience worsened but it's still going on.

The same idea runs through distributed systems as the circuit breaker pattern - the approach Netflix formalized when scaling streaming to millions of concurrent users. When a downstream service starts failing, stop sending it requests. Return a fallback. Your users see "this feature is temporarily unavailable" instead of a spinner that never resolves. The service recovers. Nothing else has to break.


I've built this into a couple of things recently.

A self-healing pipeline. Servers have bad days. If the errors stop accumulating and the data is clean, restart automatically. Don't tell me about it. Escalate only if we're consistently down for a while. Don't make me think of you.

And auth tokens that carry just enough context that the app stays usable when the identity server is having a bad hour. Read your data, navigate the app - you just can't change permissions until the auth layer comes back. Maybe sensitive features stop for a while. The core loop doesn't. A graceful window instead of a hard wall.

Both were designed that way from the start. Not patched in after the first outage.


Zero tolerance is possible. It's just expensive in a way most systems don't need or want - redundant infrastructure all the way down, engineering time you probably don't have and probably don't want to pay for.

Low tolerance, failing with style? Six, seven figures saved. Easily.

Failing gracefully runs a different calculation. The cost is upfront: think through your possible failures before you ship, give them permission to fail. Avoid debugging something that "must stay up" at two AM in your socks. The return is the nap.

Especially at two AM.


Enjoyed this? Subscribe to get future posts by email.

Book a discovery call