Amazon Says One Engineer’s Simple Mistake Brought The Internet Down


March 3, 2017 at 3:30 pm

Roughly 48 hours after its major service outage, Amazon is admitting what caused the problem. Apparently, some poor engineer at Amazon Web Services (AWS) did an oopsie and brought the internet to its knees. Oopsies are the worst!

In all seriousness, it’s a sobering story. Here’s how Amazon described it in a recent blog post:

At 9:37AM PST, an authorised S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

We’ve all been there. You push the wrong button and end up getting Sprite instead of Coke. But this poor guy or gal probably made an errant keystroke that crippled AWS for at least four hours. Since about a third of all internet traffic reportedly flows through AWS servers, knocking a whole bunch of those servers offline screwed up a few people’s days.


In theory, a series of failsafes should keep the fallout from such errors localised, but Amazon says that some of the key systems involved hadn’t been fully restarted in many years and “took longer than expected” to come back online.

The company now says it’s “making several changes as a result of this operational event.” One of these changes involves modifying a tool so that it can’t remove a large number of servers at once. That makes total sense, but it still doesn’t solve the problem of unknown unknowns (like, say, a slower-than-expected restart) on an internet that relies so heavily on a single service.
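Amazon hasn’t published how its revised tool works, but the basic idea of a removal cap is easy to picture. Here’s a minimal Python sketch, assuming a hypothetical `remove_servers` function that takes a list of server names. The per-command limit and the minimum-fleet floor are illustrative numbers made up for this example, not anything Amazon has disclosed.

```python
class RemovalTooLarge(Exception):
    """Raised when a removal request exceeds the (hypothetical) safety limits."""


def remove_servers(fleet, to_remove, max_per_call=5, min_remaining=3):
    """Return the fleet with `to_remove` taken out, refusing oversized requests.

    Hypothetical guard: `max_per_call` and `min_remaining` are illustrative
    limits, not values published by Amazon.
    """
    requested = set(to_remove)

    # Reject names that aren't in the fleet, so a typo can't silently match nothing.
    unknown = requested - set(fleet)
    if unknown:
        raise ValueError(f"unknown servers: {sorted(unknown)}")

    # Cap how many servers a single command may remove.
    if len(requested) > max_per_call:
        raise RemovalTooLarge(
            f"refusing to remove {len(requested)} servers; limit is {max_per_call}"
        )

    # Never leave the subsystem below a minimum number of servers.
    remaining = [s for s in fleet if s not in requested]
    if len(remaining) < min_remaining:
        raise RemovalTooLarge(
            f"removal would leave {len(remaining)} servers; minimum is {min_remaining}"
        )
    return remaining


if __name__ == "__main__":
    fleet = [f"billing-{i}" for i in range(20)]
    print(len(remove_servers(fleet, ["billing-0", "billing-1"])))  # small, intended removal
    try:
        remove_servers(fleet, fleet)  # a fat-fingered input that targets everything
    except RemovalTooLarge as err:
        print("blocked:", err)
```

The point of a check like this is that a mistyped input fails loudly before anything is taken offline, rather than being caught hours later. It does nothing, of course, about the slow-restart problem.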

In the meantime, let this serve as a shoutout to that poor AWS engineer who made a tiny mistake that led to major consequences. We’re having a rough year, too.

We’ve reached out to Amazon to find out more details about the incident, specifically the fate of the poor engineer who caused the problem. We’ll update this post when we hear back.

This story originally appeared on Gizmodo.