A little after 5AM this morning, countless websites and web services ground to a halt following a reported widespread outage of Amazon Web Services (AWS). Everything from Slack to Quora to Gizmodo saw major disruptions. Before Down Detector itself went down, the site showed outages on the tier 1 network Level 3 in most major population centres in the United States.
It was initially unclear what was causing the problem, but AWS did say on its website that it was experiencing "increased error rates". More specifically, the company said in an alert:
We've identified the issue as high error rates with S3 in US-EAST-1, which is also impacting applications and services dependent on S3. We are actively working on remediating the issue.
We now know that the widespread outage was caused by a failure at AWS’ Northern Virginia facility in the US. It’s AWS’ oldest farm and also the most commonly borked. The Atlantic did a nice story on it earlier this year.
Amazon S3 refers to the company's Simple Storage Service that helps countless websites stay up and running. Because so many services depend on Amazon's cloud storage, a single outage can cripple large swathes of the internet in a matter of minutes.
The situation undeniably draws comparison to the DDoS attack that affected Dyn's systems late last year, bringing most of America's internet to its knees. Lots of work days are being ruined for people who depend on the internet to do their jobs.
Our own services here were affected by the outage; others we’ve seen affected include Slack, Trello, JWPlayer, SocialFlow, Chartbeat, and Imgur.
We reached out to Amazon for more details about the outage but had not heard back at time of writing.
At least it looks like Amazon is making progress!
For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.
— Amazon Web Services (@awscloud) February 28, 2017
Update 10:07AM: The latest from Amazon:
S3 object retrieval, listing and deletion are fully recovered now. We are still working to recover normal operations for adding new objects to S3.
That sounds like progress, but the catastrophe still isn’t quite over.
Originally posted on Gizmodo.