Could The Telstra Outage Happen To You?

May 22, 2018 at 3:30 pm

Most businesses have disaster recovery and business continuity plans. Often, they are baked into the design of systems so that, if something goes wrong, another system picks up the load and things keep running, more or less, as designed. Think of a modern passenger jet: it has lots of redundant systems, so it’s highly unlikely that the failure of a single part causes a disaster. Telstra’s recent season of outages continued yesterday with a problem that knocked out much of its phone and data network. But should the outage have been avoided?

It’s easy, on the face of it, to say Telstra messed up and is running its network poorly. But here’s what one of my former colleagues had to say about yesterday’s outage.

For those complaining about @Telstra outages: Please remember: Humanity has never built a mobile network as fast or as complex as Telstra’s 4G network. It is the premiere network of its type globally, and usage of it is exploding. We are pushing humanity into new territory here.

— Renai LeMay (@renailemay) May 21, 2018

Telstra released a statement saying that a software issue caused a piece of equipment to malfunction. When traffic was meant to fail over to another piece of hardware, a further fault meant the redundancy built into the systems did not work as intended.

Typically, companies test for all sorts of scenarios to ensure their redundant systems pick up the load as expected – or at least they should. And I have little doubt Telstra does test. But with a network the size and complexity of Telstra’s, it’s very difficult to test for every possible scenario and potential knock-on effect.

So, while it’s easy to pick on Telstra for yesterday’s failure and the other issues they’ve faced over recent weeks, we should consider those failures in the context of the systems they have deployed and are managing.

When was the last time your company did some serious business continuity testing? Have you walked into your data centre and randomly pulled cables to see if the redundancy you’ve designed works? A former CIO of mine used to do exactly that – basically he was a live Chaos Monkey.
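If pulling real cables feels too risky as a first step, the same idea can be rehearsed in miniature. The sketch below is only an illustration, not anything Telstra or Chaos Monkey actually runs: the `Node` class, the `send` and `pull_a_cable` functions and the two-node "primary/standby" setup are all hypothetical names made up for this example. It knocks out one node at random and checks whether a request still gets through.

```python
import random


class Node:
    """A hypothetical service node that is either healthy or failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def send(request, nodes):
    """Try each node in order and return the first successful response."""
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError:
            continue  # fail over to the next node in the list
    raise RuntimeError("all nodes failed: redundancy did not hold")


def pull_a_cable(nodes, rng):
    """Simulate the random cable pull by marking one node as failed."""
    victim = rng.choice(nodes)
    victim.healthy = False
    return victim


def run_drill(seed=0):
    """Run one drill and return True if a request survived the failure."""
    rng = random.Random(seed)
    nodes = [Node("primary"), Node("standby")]
    pull_a_cable(nodes, rng)
    try:
        send("ping", nodes)
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print("Drill passed" if run_drill() else "Drill failed")
```

A toy like this only proves the failover logic on paper. The point of the cable-pulling CIO is that the real test has to happen on the real systems, where the second, unexpected fault tends to hide.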

Perhaps yesterday’s failure by Telstra is a timely reminder to do your own testing.
