Over the weekend, file syncing and backup service Dropbox suffered an extensive outage. While initial reports suggested a denial-of-service attack, the actual cause was rather more prosaic: a failure in a script designed to automatically update operating systems on its machines.
In a post on the Dropbox blog, head of infrastructure Akhil Gupta explained what went wrong:
On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines. During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS. A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted which resulted in the site going down.
The lesson everyone can learn?
When running infrastructure at large scale, the standard practice of running multiple slaves provides redundancy. However, should those slaves fail, the only option is to restore from backup. The standard tool used to recover MySQL data from backups is slow when dealing with large data sets.
Gupta also said that Dropbox plans to open source a tool for "parallelising the replay of binary logs", which speeds up the process of restoring large MySQL data sets from backup. We await it with interest.
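The intuition behind parallelised replay is that binary-log events touching different databases are independent, so each database's stream can be applied concurrently while preserving order within each stream. A minimal, hypothetical sketch of that idea in Python follows; the event format, `replay_event` placeholder, and the per-database independence assumption are ours for illustration, not details of Dropbox's tool:

```python
# Hypothetical sketch: replay binary-log events in parallel, grouped by
# database. Order is preserved within each database's event stream, but
# different databases are replayed concurrently (the real workload would
# be I/O-bound against a MySQL server, so threads are a reasonable fit).
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def replay_event(event):
    # Placeholder for applying one binlog event to the recovering server.
    return f"applied {event['db']}:{event['stmt']}"


def replay_group(events):
    # Events within one database must be applied strictly in order.
    return [replay_event(e) for e in events]


def parallel_replay(events, workers=4):
    # Group events by database; groups are assumed independent.
    by_db = defaultdict(list)
    for ev in events:
        by_db[ev["db"]].append(ev)

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {db: pool.submit(replay_group, evs)
                   for db, evs in by_db.items()}
        for db, fut in futures.items():
            results[db] = fut.result()
    return results


events = [
    {"db": "users", "stmt": "INSERT ..."},
    {"db": "files", "stmt": "UPDATE ..."},
    {"db": "users", "stmt": "DELETE ..."},
]
print(parallel_replay(events))
```

A serial replay would walk the whole log in one pass; the sketch above instead pays only for the longest single database's stream, which is where the speed-up on large, multi-database data sets would come from.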
Outage post-mortem [Dropbox Tech Blog]