Gmail suffered an outage yesterday, with some users unable to fully use the service for close to 10 hours after two successive network failures. While that sucked if you were one of the people affected, it’s worth noting that even a 10-hour outage still works out to 98.6 per cent uptime over a month.
Like many IT projects, Gmail aims for 99.9 per cent uptime. As the Google blog explains, the issue began around 6am PST, and some users couldn’t fully send messages until 4pm. Ten hours out of a 30-day month (720 hours) equates to a 1.4 per cent failure rate, which means 98.6 per cent uptime.
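For anyone who wants to check the sums, here is a minimal sketch of that calculation in Python. The function name `uptime_percent` is made up for illustration, and the 30-day measurement window is this article’s assumption rather than anything Google has stated.

```python
def uptime_percent(outage_hours: float, period_hours: float) -> float:
    """Return uptime as a percentage of the measurement period."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return 100.0 * (1.0 - outage_hours / period_hours)


# Assumption: a 30-day month is the measurement window (720 hours).
HOURS_IN_30_DAY_MONTH = 30 * 24

monthly = uptime_percent(10, HOURS_IN_30_DAY_MONTH)
print(f"{monthly:.1f} per cent uptime")  # prints "98.6 per cent uptime"
```

Pick a different window and the figure moves accordingly, so the result is only as meaningful as the period you choose to measure over.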
Google says only 1.5 per cent of users were affected that badly, and notes that most functions (reading existing email and searching) continued to work, so I guess it can claim the 99.9 per cent if it wants to.
The cause of the problem?
The message delivery delays were triggered by a dual network failure. This is a very rare event in which two separate, redundant network paths both stop working at the same time. The two network failures were unrelated, but in combination they reduced Gmail’s capacity to deliver messages to users.
Two lessons here. First, the interaction of complex network services means outages aren’t always predictable. Second, make your uptime targets realistic: 99.9 per cent gives you just over 43 minutes in a 30-day month for everything to go wrong.
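To see how little room a given target leaves, the downtime allowance can be worked out the same way. This is a rough sketch only: `downtime_budget_minutes` is a hypothetical helper, and the default 30-day period is an assumption you should swap for whatever window your own target is measured over.

```python
def downtime_budget_minutes(target_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed by an uptime target."""
    # Assumption: the target is measured over period_days full days.
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - target_percent) / 100.0


print(f"{downtime_budget_minutes(99.9):.1f} minutes")  # prints "43.2 minutes"
```

Passing `period_days=365` gives the allowance for an annual target instead, which for 99.9 per cent comes to roughly 8.76 hours.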
More On Gmail’s Delivery Delays [Google Enterprise Blog]
Comments
2 responses to “10 Hours Of No Gmail Is Still 98.6% Uptime”
Super simplistic representation of availability. You arbitrarily decided on a month as the timeframe and assumed that would be the basis for the calculation; my money’s on it being an annual target, like most availability measures for high-availability systems. Your dismissive attitude towards the extent of the user impact underlines your lack of understanding of how availability works as a performance metric.
If you’re going to talk about a subject as complex as availability, please try to do it justice.
Yes, I would argue that the real uptime percentage would be closer to 17% for that 12 hours
Edit: by that I mean that nobody cares if it breaks while they sleep, unless you like sending email in your sleep