What Organisations Can Learn From The ABS Census Fail

We can all agree that this year’s Census has been a colossal snafu. The Australian Bureau of Statistics (ABS) shut down the website after it was supposedly hit by a number of distributed denial-of-service (DDoS) attacks. This was after the Census website had been stress tested. So what can organisations learn from this incident? Let’s find out.

When the ABS pulled the plug on the Census website on Tuesday, millions of Australians were left unable to complete the compulsory survey. The 2016 Census was already marred by controversy over privacy issues and the site shutdown didn’t help the situation.

So, What Happened?

The next morning, the ABS and Small Business Minister Michael McCormack came out to say the website was hacked… then they clarified that it was a number of DDoS attacks… then they said they weren’t attacks per se, but they were definitely DDoS attacks from an “overseas source”. Confused? Me too.

Currently, the ABS’ official story is that the Census website was subjected to a series of DDoS attacks launched from abroad. This, compounded by the fact that thousands of Australians were trying to log on at the same time, caused a hardware failure. A router became overloaded, which caused a false positive in the system monitoring information. The ever so prudent ABS decided to take a precautionary measure by shutting down the Census website to maintain the integrity of the data that had already been collected.

“Had these events occurred in isolation, the online system would have been maintained,” according to the ABS, but questions remain.

The Census server was hosted by IBM, one of the biggest technology companies in the world, which is well versed in running server networks for the world’s largest organisations. It’s no stranger to dealing with high load capacities and DDoS attacks.

To ensure the Census ran smoothly online, the website was also load tested by Revolution IT for around $470,000. The ABS was confident that Census 2016 online would be uneventful. And yet here we are.

Many people are now starting to doubt whether the ABS really suffered DDoS attacks, or whether it simply underestimated the number of users trying to log on to the Census online form simultaneously. According to information from the ABS:

There were 12.9 million internet subscribers in Australia at the end of 2015, so perhaps the ABS miscalculated the expected demand.
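To see how a demand estimate can go wrong, here is a rough back-of-envelope sketch in Python. The subscriber count is the ABS figure above, and the one million submissions per hour is the design figure quoted further down in this article. The share of forms arriving in the evening rush and the length of that rush are purely hypothetical inputs, not ABS numbers.

```python
# Back-of-envelope peak-load estimate. Only the first two constants come from
# the article; every input marked "assumed" is a hypothetical scenario value.

SUBSCRIBERS = 12_900_000          # internet subscribers at end of 2015 (ABS figure)
DESIGN_RATE_PER_HOUR = 1_000_000  # form submissions per hour quoted in the article


def peak_rate_per_hour(forms: int, share_in_rush: float, rush_hours: float) -> float:
    """Hourly submission rate if `share_in_rush` of all forms arrive within `rush_hours`."""
    if rush_hours <= 0:
        raise ValueError("rush_hours must be positive")
    if not 0 <= share_in_rush <= 1:
        raise ValueError("share_in_rush must be between 0 and 1")
    return forms * share_in_rush / rush_hours


# Assumed scenario: half of all subscribers submit one form each
# during a three-hour evening rush.
demand = peak_rate_per_hour(SUBSCRIBERS, share_in_rush=0.5, rush_hours=3)
ratio = demand / DESIGN_RATE_PER_HOUR

print(f"Assumed peak demand: {demand:,.0f} forms/hour")
print(f"Multiple of quoted design rate: {ratio:.2f}x")
```

Under these made-up inputs the rush comes out at roughly twice the quoted design rate. Different assumptions give different answers, which is exactly why it pays to model several scenarios rather than one.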

Pure Hacking CTO Gordon Maddern told Lifehacker Australia that there are two possible scenarios that could have caused the Census website outage:

“A) A failure in capacity planning and it was poorly architected and engineered, i.e. some project manager might not have properly scoped the load estimates. Or there was a flaw in the design. Here is an interesting tweet:


Or B) It was actually a DDoS attack and they failed to adequately protect against DDoS attacks. This could be some project manager that thought load testing checked off denial-of-service (DoS) testing. When I heard [yesterday] that they claimed it was a DDoS attack from overseas, I thought it sounded fishy. Multiple people on Reddit were reporting that they couldn’t access the site using their VPNs and that you had to come from inside Australia. My first thought was they are doing geoblocking to permit only Australian IPs access. So the DDoS from “overseas”, although still possible, is not likely.”

The ABS has since explained that the geoblocking it had in place failed, but doubts remain. For now, we can only go by what the ABS says.

There are some lessons that organisations can learn from the Census 2016 incident with regard to performance testing and mitigating DDoS attacks. After all, enterprises are no strangers to these kinds of problems.

Learning From The Mistakes Of Others

Tim Koopmans, founder and CTO of Flood IO, a load testing service provider, has listed in a blog post a number of things that the ABS could have done better with regard to performance testing, including:

  • Model: An accurate application simulation model is priceless. One million form submissions per hour sounds awful to baseline and predict load from. What happens when more than 20 million Australians attempt to do their Census online after being reminded on the 6pm news? Stampede!
  • Test: If your process prohibits testing through lack of time, suitably sized environments, access to tools or anything else that gets in the way of performance testing, your process is broken. I can’t comment for this scenario but can make some educated guesses as to impediments.
  • Profile: Generating load is the first dimension. Profiling load is the second dimension. You should allow time and budget for profiling your application under load. Profiling allows you to tune and optimise your code.
  • Monitor: Monitoring is the third dimension of performance testing. When asked if the hacks were a deliberate attempt to sabotage the census, Mr Kalisch [head of the ABS] replied: “We believe so”. “The Australian Signals Directorate are investigating, but they did note that it was very difficult to source the attack.” The point is, monitoring should not make it difficult to source where your load originates.
  • Scale: Ability to scale is paramount. Not the fixed capacity you start with. We’re guessing 10 end points (discovered via DNS) was simply not enough to start with:
      for i in {1..100}; do dig stream$i.census.abs.gov.au +short; done | wc -l
     
    10
  • Throttle: Throttling will ensure performance and availability for your customers. Monitoring of simple metrics such as calls, latency, and error rates can feed into rules for throttling or blocking. The impact of DDoS can be mitigated; it’s a commonplace requirement to protect against.
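The throttling point can be illustrated with a token bucket, a common rate-limiting technique. Koopmans' post does not say which algorithm to use, so this is only a minimal sketch under that assumption, and the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady `rate` requests per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.tokens = capacity      # start full so an initial burst is allowed
        self.clock = clock          # injectable clock, handy for testing
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(7)]       # first 5 pass, last 2 are rejected
fake_now[0] += 1.0                               # one second later: 2 tokens refilled
after_wait = [bucket.allow() for _ in range(3)]  # 2 pass, third is rejected

print(burst)
print(after_wait)
```

In practice a limiter like this would usually be kept per client or per endpoint, and rejected requests would get a "try again later" response rather than taking the whole service down with them.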

This is all advice that organisations can benefit from, especially ones that have to deal with a lot of requests in the background.

While Webroot senior information security analyst Dan Slattery couldn’t comment on specifically what the ABS could have improved on, he did echo Koopmans’ sentiments. He told Lifehacker Australia:

“It’s important for organisations that host websites and services to ensure they have a plan in place to deal with large spikes of activity. Monitoring usage and stats in real time is a must to detect potential DDoS attacks early. Servers should be able to quickly be scaled up to protect themselves from being taken down. Preparation and planning will always be the best way to mitigate these attacks.”
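As a rough illustration of monitoring usage in real time, the sketch below counts requests per source over a sliding time window and flags any source that exceeds a threshold. The class, the threshold and the traffic are all hypothetical, and the addresses come from ranges reserved for documentation. A real DDoS is spread across many sources, so per-source counts would be only one signal among several.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Track requests per source over a sliding time window."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window = window_seconds
        self.threshold = threshold
        self.hits = defaultdict(deque)  # source -> timestamps still inside the window

    def record(self, source: str, timestamp: float) -> bool:
        """Record one request; return True if the source now exceeds the threshold."""
        timestamps = self.hits[source]
        timestamps.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= timestamp - self.window:
            timestamps.popleft()
        return len(timestamps) > self.threshold

    def noisy_sources(self) -> list:
        """Sources over the threshold as of their most recent recorded request."""
        return sorted(s for s, ts in self.hits.items() if len(ts) > self.threshold)


monitor = RateMonitor(window_seconds=10, threshold=3)

# Synthetic traffic: one source sends 5 requests within 2 seconds,
# another sends a single request.
alerts = [monitor.record("203.0.113.7", t * 0.5) for t in range(5)]
monitor.record("198.51.100.20", 1.0)

print(alerts)
print(monitor.noisy_sources())
```

Knowing which sources account for the load is also what makes it possible to say afterwards where the traffic came from.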

Maddern was more critical of the ABS’ failures to keep the Census page running:

“So if it was actually a DDoS they should have protected themselves. There are commercial solutions like CloudFlare that offer up to 600GB/sec DDoS protection (That’s crazy insanely huge). So whatever they purchased or used, it wasn’t enough.”

As for what organisations can learn from the ABS incident, Maddern said companies should invest their money wisely in the design and testing phases of their websites and do their due diligence by double-checking everything. He added that if your business does pay for DDoS protection, it needs to be validated before it goes live.

The ABS Census online form is still down at the time of writing.

