How One Little Amazon Error Can Destroy The Internet

The fact that Amazon controls a vast swath of cloud computing services became dreadfully clear on Wednesday morning when a string of errors brought countless websites to their knees. This consolidation of power is, perhaps suddenly, a very big problem.

Unlike its internet marketplace, Amazon Web Services (AWS) works more like a house of cards than a traditional retail business. After all, instead of selling books and reasonably priced electronics, AWS caters to enterprise clients by providing cloud-computing services. Amazon Simple Storage Service (S3), the product that suffered errors and knocked out a solid portion of the web today, provides storage for cloud-based apps like Slack and Trello. Amazon says that its S3 service is "designed to deliver 99.999999999 per cent durability". But when one piece of the infrastructure fails, AWS fails big.
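To put that durability figure in perspective, the rough arithmetic below treats 99.999999999 per cent as the chance that any one stored object survives a given year, with losses independent from object to object. Both are simplifying assumptions, and the function names are made up for illustration. Note that the number is about losing stored data, not about whether the service is reachable.

```python
# Rough arithmetic for S3's advertised "eleven nines" of durability.
# Assumptions (not from Amazon): durability = probability that a single
# object survives one year, and object losses are independent.

DURABILITY = 0.99999999999  # 99.999999999 per cent

def expected_objects_lost_per_year(num_objects: int, durability: float = DURABILITY) -> float:
    """Expected number of objects lost in one year across num_objects."""
    annual_loss_probability = 1.0 - durability  # roughly 1e-11
    return num_objects * annual_loss_probability

def years_per_lost_object(num_objects: int, durability: float = DURABILITY) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_objects_lost_per_year(num_objects, durability)

if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects -> about one loss every {years_per_lost_object(n):,.0f} years")
```

With 10 million objects stored, this works out to roughly one lost object every 10,000 years, which is why a durability promise and an outage like this one are really separate questions.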

This is because Amazon controls a ridiculous share of the cloud computing market and, specifically, of cloud storage. A Gartner study from August 2016 claims that AWS controls 31 per cent of the market in global cloud infrastructure, and the business is growing. The same study said that AWS accounted for 51 per cent of Amazon's profits. (Another study from the same time period puts Amazon's market share at 45 per cent.) Microsoft, IBM and Google are all expanding their cloud offerings as well, but Amazon has been the leader in the space since 2006.

So for over a decade, Amazon has been king of the cloud. During that span of time, the company's business model, which Jeff Bezos once compared to the early days of electricity, enabled startups to scale and yet still afford the cost of hosting. Ingrid Burrington explained in The Atlantic last year:

In practice, this meant that pricing for services was entirely contingent on actual use, an approach that allowed developers to rapidly scale small startups into massive companies by paying for infrastructure support on an as-needed basis and scaffolding as needs grew. Thanks to AWS, the initial overhead for starting a service like Airbnb or Slack (both AWS customers) is so low that those companies can afford to expand quickly.

But what happens when any service gets so big that its tentacles touch the entire industry? Its failures become amplified to a destructive degree. In the case of AWS, that eleven-nines figure describes how rarely stored data is lost, not how often the service stays up, and the rare moments when things don't work just right can mean that over a third of the internet ceases to function well. Amazon won't say how many cloud computing customers it has or the exact percentage of internet traffic that's affected when an error happens. But today's outage showed that it could bring entire networks of websites grinding to a halt. (Gizmodo Media is an AWS customer, so I can confirm that this was a messed-up day.)
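Uptime is normally quoted as its own percentage, separate from durability. The sketch below converts an uptime percentage into the downtime it still allows, assuming a 30-day month and a 365-day year; real service-level agreements define their own measurement windows and exclusions, and the helper name here is hypothetical.

```python
# Convert an uptime percentage into the downtime it still permits.
# Assumptions: a 30-day month and a 365-day year. Real SLAs define
# their own measurement windows and exclusions.

MINUTES_PER_DAY = 24 * 60

def allowed_downtime_minutes(uptime_percent: float, days: int) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    return days * MINUTES_PER_DAY * (1.0 - uptime_percent / 100.0)

if __name__ == "__main__":
    # Illustrative inputs: two common uptime targets, plus the
    # eleven-nines durability figure read (incorrectly) as uptime.
    for uptime in (99.9, 99.99, 99.999999999):
        per_month = allowed_downtime_minutes(uptime, 30)
        per_year = allowed_downtime_minutes(uptime, 365)
        print(f"{uptime}% uptime -> {per_month:.6g} min/month, {per_year:.6g} min/year")
```

At 99.9 per cent, a service may be down for about 43 minutes a month; at 99.99 per cent, about 53 minutes a year. Read as uptime, eleven nines would allow only about a third of a millisecond of downtime per year, which shows why that figure was never a promise that outages like this one wouldn't happen.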

Meanwhile, the fact that many of Amazon's AWS servers are located in northern Virginia, where an unholy number of tubes come together to form one of the most congested bottlenecks of internet traffic, certainly doesn't help. Amazon says that this region, known as US-EAST-1, was the source of Tuesday's outage.

So while this week's paralysing series of errors gave Amazon engineers a terrible headache, cloud computing competitors like Microsoft, IBM and Google must be thrilled. As mentioned earlier, they're all gaining on Amazon's absurd market share, and now their salespeople will have a concrete incident to point to as proof that AWS is not 100 per cent reliable. Added competition should also improve services and lower prices for everyone, which is undeniably a good thing.

Amazon still hasn't explained exactly what went down this morning. In response to a Gizmodo request for comment the company said:

We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.

That's essentially a reworded version of the error notice posted on the AWS website. Good luck using the internet. It's a mess out there.

This article first appeared on Gizmodo.


Comments

    The 99.999999999% refers to durability (i.e. permanent data loss), not uptime.

    Their uptime SLA is 99.9% or 99.99%, depending on the exact service.

      Yes, only data loss. That's why I host all my site's images with them. They're a LOT more expensive for that than a lot of other hosts and resellers but much more reliable and more easily scalable.
      All my website's stuff is working fine too, so that's a relief.

