While the forensic analysis of this weekend's British Airways outage is still some time away, the airline's woes are a good reminder of why we need robust measures in place to protect the systems and data that underpin our businesses. I spoke with Andrew Martin, Zerto's Vice President for the Asia Pacific and Japan (APJ) region, about the challenges businesses face and what they can do.
While cyber-threats grab the headlines, they aren't the biggest causes of business disruption. Martin said natural disasters, hardware failure, software failure and corruption, and human error are far more prevalent than malware and hackers. A 2016 report conducted by Zerto with Ovum [PDF, registration required] found the following causes of outages in the APJ region:
- 55% of companies reported natural disasters
- 55% reported hardware failure
- 45% reported power failure
- 43% reported network failure
- 43% reported human error
- 43% reported IT software failure
Martin said the rise of virtualisation has created a new paradigm when it comes to protecting systems and data. In the physical world, we had to think about redundancy and resilience in terms of N + 1: we needed at least one more device available to cover a failure. But virtualisation and cloud technologies allow us to change our thinking and give us the ability to replicate data and applications consistently.
While we can only speculate as to what a "power system issue" might mean at British Airways, it's clear elements of their infrastructure weren't equipped for the specific circumstances they faced on the weekend. Martin doesn't claim any specific inside knowledge of the outage but he has dealt with airlines in the past.
It's possible, he thought, that some legacy applications could not be run in a virtualised environment as they were tied, perhaps, to a specific hardware platform.
So, what can we do to mitigate the risks of a British Airways type of incident in our businesses?
Martin suggests three things:
- Virtualise apps wherever you can and use a virtually-aware tool for workload replication.
- Conduct an audit of your applications and data, and classify them by importance and by how much downtime you can afford. Then assign granular service levels so you can focus your effort and apply the best technology for each app or data repository.
- Have multiple DR locations so a single failure doesn't hurt, even if a whole territory is impacted. Using the public cloud helps, and you can consider offshore providers where sovereignty isn't an issue.
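
Martin's second suggestion, classifying applications by how much downtime each can tolerate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, the downtime thresholds and the application names below are hypothetical and do not come from Martin, Zerto or the Ovum report. A real audit would set its own thresholds with the business.

```python
from dataclasses import dataclass

# Hypothetical service tiers: (name, maximum tolerable downtime in minutes).
# These names and thresholds are illustrative assumptions only.
TIERS = [
    ("critical", 15),     # up to 15 minutes
    ("important", 240),   # up to 4 hours
    ("standard", 1440),   # up to 24 hours
]


@dataclass
class App:
    name: str
    max_downtime_minutes: int  # downtime the business says it can afford


def assign_tier(app: App) -> str:
    """Return the first (strictest) tier whose limit covers the app's tolerance."""
    for tier, limit in TIERS:
        if app.max_downtime_minutes <= limit:
            return tier
    return "best-effort"  # tolerates more than any defined tier


def classify(apps):
    """Group application names by tier, least downtime-tolerant apps first."""
    groups = {}
    for app in sorted(apps, key=lambda a: a.max_downtime_minutes):
        groups.setdefault(assign_tier(app), []).append(app.name)
    return groups


# Hypothetical inventory produced by an application audit.
inventory = [
    App("booking-engine", 5),
    App("crew-rostering", 120),
    App("intranet-wiki", 2880),
    App("payroll", 1440),
]

result = classify(inventory)
for tier, names in result.items():
    print(f"{tier}: {', '.join(names)}")
```

Grouping by tier in this way makes it easier to see where the most capable (and usually most expensive) replication technology is actually needed, and where a slower recovery is acceptable.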