Does 99.999% Uptime Really Matter?

It's the gold standard of service-level agreements (SLAs): something being available 99.999 per cent of the time (or 'five nines', as those in the trade often call it). But does it really mean anything?


Google issued a blog post this morning boasting about the reliability of its Google Apps service. The actual guarantee for Apps is 99.9 per cent uptime (three nines, if you like). For 2012, the figure was 99.983 per cent. That's not quite five nines, but it's rather close, and Google maintains it wants to improve the figure. "We work hard to make sure any disruptions are rare, limited in scale and quickly resolved," the post notes.

That's good news for customers. However, it raises the first important point about high availability: it always costs more. Google makes no such guarantee about the availability or reliability of its free services (though the underlying infrastructure is used by both Google Apps and plain old vanilla Gmail). That doesn't mean lots of people don't use them very happily without incident. It just means they don't have much recourse when things go wrong.

The second point is this: downtime in itself is not always a bad thing. Unplanned downtime can be a bad thing, and in an era where many businesses expect to sell goods on their sites 24/7, it can make sense to try and minimise it. But the world won't stop turning if you have an occasional outage and tell people about it.

The obvious proof? Apple. Every time there's a major product announcement, its online store is taken offline. That's no credit to Apple's ability to develop web services (an area where it lacks serious smarts). But judging from its financial results, it hasn't been a business-crippling problem.


Comments

    If you do the math, 99.999% uptime for a 24/7 business works out to around 5 minutes of downtime per year. That's why it's an ideal figure. If you step it down to 99.99%, it becomes around an hour per year, which is still not devastating as long as that hour is dished out in small chunks. Google's 99.9%, however, equates to 525.6 minutes of downtime per year, which is nearly 9 hours, or 43.8 minutes per month. If you work out how much money you make per transaction and how many transactions you process per minute, 43.8 minutes of downtime per month can be fairly costly.

    Big companies can of course swallow this, and in Apple's case do it on purpose, but for small businesses and cloud hosting, where you have little control over your hosting, it can be pretty bad.
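The downtime figures quoted in the comment above can be checked with a few lines of Python. This is a minimal sketch, assuming a 365-day year and a service that is expected to run 24/7; the function name `downtime_minutes` is made up for illustration.

```python
# Assumption: a non-leap year of 365 days, with the service expected up 24/7.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime allowed in a period at a given availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


if __name__ == "__main__":
    # The availability levels mentioned in the article and comments.
    for pct in (99.9, 99.95, 99.983, 99.99, 99.999):
        per_year = downtime_minutes(pct)
        print(f"{pct}%: {per_year:.1f} min/year, {per_year / 12:.1f} min/month")
```

Run as a script, it prints the yearly and monthly downtime budget for each level, which makes it easy to compare a three-nines guarantee with a five-nines one.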

      The trouble is that nobody relies on only one service, and services rarely have the decency to fail at the same time.

      If you're relying on three four-nines services for your office to function (let's say hosted email, your internet connection, and a central file hosting service), you're up to about 3 hours of downtime per year. Add in your phone service, a random datacentre connection (maybe for your company VPN gateway), and a CRM and you're up to about 6 hours. If all those services were three-nines instead, you'd be down for two full days a year.

      The more hosted services you rely on, the more important SLAs become.
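The compounding effect described in this comment can be sketched as follows. The sketch assumes the services fail independently and that the office is "down" whenever any one of them is down, so the combined availability is the product of the individual ones. The helper names are hypothetical, and real outages are often correlated, so treat the result as a rough estimate.

```python
import math

# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def combined_availability(availabilities):
    """Availability of a setup that needs every listed service to be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return math.prod(availabilities)


def yearly_downtime_hours(availability):
    """Expected hours of downtime per year at the given availability (0 to 1)."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    scenarios = {
        "three four-nines services": [0.9999] * 3,
        "six four-nines services": [0.9999] * 6,
        "six three-nines services": [0.999] * 6,
    }
    for label, services in scenarios.items():
        hours = yearly_downtime_hours(combined_availability(services))
        print(f"{label}: {hours:.1f} hours/year ({hours / 24:.2f} days)")
```

With these assumptions the three scenarios come out close to the commenter's rounded figures of about 3 hours, about 6 hours, and roughly two days.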

        The main difference in most cases is actually the insurance your providers purchase for themselves. The higher their guarantee obviously the higher they pay, which gets passed on to the consumers to one degree or another.

        It's not really about uptime in their eyes, because most of them do things in very similar, very efficient ways; it's about limiting liability across the board.

    I would say the compensation as part of any SLA is more important than the uptime claim. There will be downtime, but if it is that critical to you... Make sure you have a good SLA.

    Horses for courses. If it was uptime for a bank switch which meant the difference between your credit card transaction proceeding or not, you'd want it pretty high. If it was for your company's website that was non-transactional, you'd be willing to live with much less.

    The Australian Stock Exchange hovers somewhere between 99.9 and 99.99 depending on the year. I suspect most big banks are similar.

    Anyone asking for more probably needs to consider whether they are more important than those institutions. Realistically, any more than 99.99 gets very expensive (you would need multiple data centres geographically separated for a start), and is generally very hard to demonstrate with any statistical rigour. Even if you can get this out of your hardware, there is a reasonable chance that the software won't hold up.

    AWS aims for 99.95 on EC2 instances, so ask yourself, is anyone offering higher actually better able to produce reliability than Amazon?

      Keep in mind that the 99.95% SLA provided by AWS is for the region, not for a single availability zone, so you would need to have multiple servers with load balancing to achieve the 99.95% that they state.

    As a rule of thumb, for every "9" you add after 99.99, add a "0" on to the end of the cost of your infrastructure. It's almost certainly an exaggeration in most cases, but it tends to focus businesses on what they *really* need rather than defaulting to five nines just because that's what they think is the standard.

    "Not quite five nines"
    It's not even four nines...
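One way to check this point is to convert an availability percentage into a "number of nines", commonly taken as the negative base-10 logarithm of the unavailable fraction. The snippet below is a small sketch of that conversion; the function name `nines` is made up for illustration.

```python
import math


def nines(availability_pct):
    """Return the 'number of nines' for an availability percentage.

    Defined here as -log10 of the unavailable fraction, so 99.9% gives
    roughly 3 and 99.999% gives roughly 5. Not defined for 100%.
    """
    unavailable = 1 - availability_pct / 100
    return -math.log10(unavailable)


if __name__ == "__main__":
    # Google's reported 2012 figure next to the usual SLA tiers.
    for pct in (99.9, 99.983, 99.99, 99.999):
        print(f"{pct}% is about {nines(pct):.2f} nines")
```

By this measure, the 99.983 per cent reported for 2012 sits between three and four nines.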
