IBM and Nextgen have been blaming each other for the failure of Census 2016. Based on today's Senate Economics References Committee hearing into #CensusFail, it appears both companies were at fault to some extent. Nextgen may have incorrectly implemented the geoblocking aimed at mitigating distributed denial of service (DDoS) attacks, while IBM acknowledged it should have run a real test of its routers' resilience to failure. But Alastair MacGibbon, the Special Adviser to the Prime Minister on Cyber Security, has laid the blame predominantly on IBM for failing to handle relatively small DDoS attacks that shouldn't have brought down the Census website.
The Australian Bureau of Statistics (ABS) Census online form website was taken down after being pounded by DDoS attacks. IBM was the IT provider for the website and it enlisted Nextgen and Telstra as the uplink providers. Each ISP's link was connected to its own router, which carried traffic to IBM's datacentre.
In anticipation of DDoS attacks, IBM instructed the ISPs to implement geoblocking to block incoming traffic from overseas.
According to IBM engineer Michael Shallcross, who oversaw the project and spoke at the Senate hearing, when the Census website came online, it was hit by a large volume of traffic coming through the link from Nextgen, which became fully saturated. This was identified as a DDoS attack. The traffic primarily came from Singapore on a router managed by Nextgen where the geoblocking rule was not properly implemented, Shallcross said.
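To illustrate why a geoblocking rule has to be in place on every uplink, here is a minimal Python sketch. It is not the configuration IBM or the ISPs used. The allowed prefix, the uplink names, and the `admit` helper are all hypothetical, and reserved documentation address ranges stand in for real Australian and overseas address space.

```python
import ipaddress

# Hypothetical allow-list. A documentation prefix stands in for the
# domestic address space that a real geoblocking rule would permit.
ALLOWED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]


def is_domestic(source_ip: str) -> bool:
    """Return True if the source address falls inside an allowed prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_PREFIXES)


def admit(source_ip: str, geoblock_enabled: bool) -> bool:
    """Return True if an uplink router would forward traffic from source_ip.

    With geoblocking enabled, only domestic sources pass. Without it,
    everything passes, including overseas attack traffic.
    """
    return is_domestic(source_ip) or not geoblock_enabled


# Hypothetical uplinks: the rule is applied on one router but not the other.
uplinks = {"uplink_a": True, "uplink_b": False}

domestic = "203.0.113.10"   # inside the allowed prefix
overseas = "198.51.100.7"   # outside the allowed prefix

for name, enabled in uplinks.items():
    print(name, "domestic:", admit(domestic, enabled),
          "overseas:", admit(overseas, enabled))
```

In this sketch, overseas traffic is dropped on the uplink where the rule is active but passes freely on the one where it is missing, which is the kind of gap Shallcross described.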
After some time and more DDoS attacks, IBM made the decision to restart the two routers to remediate the issue. Unfortunately, the router connected to the Telstra link didn't restart properly due to a configuration error. The decision was made to take the website down after IBM misinterpreted data that was being sent out from its load monitoring system as a possible security breach.
The reason why IBM used two uplink providers was to provide redundancy if one connection was affected. IBM had to deal with the unfortunate scenario where both services were affected.
When questioned about whether IBM would have done anything differently if it could do it all over again, Shallcross admitted the company didn't test the routers adequately. IBM had only completed simulations of failure scenarios. In hindsight, Shallcross said the company should have done a 'hard' test, which essentially means pulling the plug on the router and then powering it up again.
MacGibbon was called to today's Senate Committee hearing to provide an assessment of the #CensusFail incident. While there has been speculation as to whether the DDoS attacks actually happened, he assured the Committee members that the attacks did occur but were rather small in scale.
According to information provided by Nextgen and IBM, the DDoS traffic was coming in at a rate of around 3Gbps. It's not uncommon to see DDoS attacks that hit 100Gbps.
The system managing Census online was degraded by the DDoS attacks but they didn't completely knock the website out; it was the ABS and IBM's decision to pull the plug.
"It shouldn't have caused the damage that it did," MacGibbon said. It was clear the problem was exacerbated by insufficient communication between IBM and Nextgen, he said.
While IBM maintained that geoblocking was an effective solution to ward off DDoS attacks for Census Online, MacGibbon made it clear that better alternatives were available.
"Had it worked properly, it may have protected the site but there are other DDoS mitigation you can acquire from ISPs and it's my understanding the services were not acquired," he said.
Earlier in the day, IBM provided its reasoning for refusing the DDoS mitigation offered by Nextgen. IBM is currently in discussions with the government over possible compensation that it will pay for #CensusFail.