How Azure Fell Over And How Microsoft Fixed It

9 years ago

December 18, 2014 at 7:30 am

Cloud computing services are highly automated, but still ultimately managed by humans — and humans make mistakes. That’s the key lesson from a major outage Microsoft’s Azure service suffered last month.

Microsoft’s Azure blog has published a detailed root cause analysis for the outage in its Azure Storage Service, which fell over on November 18. Because storage is also a key requirement for many virtual machines, other Azure customers are always affected.

The main cause of the problem? Azure updates are supposed to be rolled out gradually in “slices” across regions, but that process was ignored — an update designed to improve performance was tested but then immediately put into production across the infrastructure. In other words: a sensible business process was ignored:

The engineer fixing the Azure Table storage performance issue believed that because the change had already been flighted on a portion of the production infrastructure for several weeks, enabling this across the infrastructure was low risk. Unfortunately, the configuration tooling did not have adequate enforcement of this policy of incrementally deploying the change across the infrastructure.

Microsoft has now updated its deployment platform so changes can only be made in stages, which means (in theory at least) that the problem won’t re-occur. It’s a reminder that there’s no such thing as 100 per cent uptime. Hit the link below for the full account.

Final Root Cause Analysis and Improvement Areas: Nov 18 Azure Storage Service Interruption [Azure Blog]

Our Housing System Is Broken and the Poorest Australians Are Being Hardest Hit

12 of the Best Lamps to Buy If You’re Sick of Using the Big Light

Limit Your Data Usage With These Plans and Phone Settings

Baby Reindeer: How the Series Brings a Needed Perspective on Male Victimisation

The Secret to Happiness, According to Psychology Experts

Here Are Amazon Australia’s Best Deals of the Week

TPG Has Changed the Prices for Almost All of Its NBN Plans

Wrap Me in ALDI’s $30 Heated Winter Travel Blanket

JB Hi-Fi Is Clearing Out Games For As Little As $2

Amazon Australia Beauty Week Sale: 24 of the Best Products to Shop

How Azure Fell Over And How Microsoft Fixed It

Comments