Bees And Monkeys: 5 Cloud Lessons NAB Learned From AWS

National Australia Bank (NAB) has used Amazon Web Services (AWS) to make the development and hosting of its online services more reliable. Here are the main lessons it learned as it moved key banking systems into the cloud.

David Broeren, head of digital and online channel services at NAB, gave a presentation on the bank’s cloud systems at the AWS Summit event in Sydney today. These are the key lessons that emerged in that discussion — lessons that are useful even if you’re not working in a big four bank.

You can’t jump onto the cloud straight away. NAB began a major internal transformation project in 2009, but didn’t look at actually using cloud services until some fundamental tool and platform decisions had been made.

“AWS is quite a new thing for NAB, but we’ve been trying to transform for some time,” Broeren said. Key internal tool decisions included introducing an internal GitHub for code tracking and Artifactory for repository management. “There’s been a focus for some time on getting the enterprise toolset right.”

Rollout is fast but not the key metric. Broeren described how speedily NAB rolled out its initial AWS environment. “In 59 minutes we went from an account with AWS to two data centres fully ready. Within two minutes from that we had 40 servers out there running and ready.” Those systems include two load-balanced EC2 instances and an S3 bucket hosting the relevant server images.

While that was impressively speedy, the more important measure for Broeren was performance and resilience. That led to another key decision: deliberate stress testing for the system.

Why you need monkeys and bees. “This is where the resilience bit came in,” Broeren said. “From the outset we put in two key controls. The first was ‘bees with machine guns’: a brute force load onto the site to test out its resilience.”
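To make the brute-force load idea concrete, here is a minimal Python sketch. It is not the tool NAB used and it does not touch a real site: `handle_request` is a hypothetical in-process stand-in for the service under test, and the rule that every tenth request fails is an assumption made purely so the output has something to report.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    """Hypothetical stand-in for the site under test; returns an HTTP-style status."""
    # Assumption for illustration only: every 10th request fails, as if overloaded.
    return 503 if request_id % 10 == 0 else 200


def run_load_test(total_requests: int, workers: int) -> Counter:
    """Fire total_requests calls from a pool of concurrent workers and tally statuses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(handle_request, range(1, total_requests + 1))
        return Counter(statuses)


if __name__ == "__main__":
    results = run_load_test(total_requests=1000, workers=50)
    error_rate = results[503] / sum(results.values())
    print(f"statuses={dict(results)} error_rate={error_rate:.1%}")
```

In a real test the stand-in would be replaced by calls to the live service from many machines at once; the useful output is the same, namely how the error rate moves as the load climbs.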

The second key control was the Chaos Monkey tool, originally developed by Netflix, which deliberately takes out functioning servers to test whether the site could recover from system failures. “To get full effect, you have to run it in production,” Broeren said. “The great thing about that is it continually tests the design.”

“Chaos Monkey takes something that would be a high-severity incident — the loss of a server — so it’s just an information event. It actually delivers resilience to our teams.”
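The principle can be shown with a toy simulation in Python. This is a sketch of the idea only, not Netflix's Chaos Monkey and not an AWS API: `ServerPool`, `heal` and `unleash_monkey` are hypothetical names, and the pool simply models a group of servers that replaces any member it loses.

```python
import random


class ServerPool:
    """Toy model of a self-healing group of servers (not a real AWS API)."""

    def __init__(self, desired: int) -> None:
        self.desired = desired
        self.next_id = 0
        self.servers = set()
        self.heal()

    def heal(self) -> None:
        """Launch replacements until the pool is back at its desired size."""
        while len(self.servers) < self.desired:
            self.servers.add(f"server-{self.next_id}")
            self.next_id += 1

    def terminate(self, name: str) -> None:
        """Remove a server from the pool, as if it had failed."""
        self.servers.remove(name)


def unleash_monkey(pool: ServerPool, rng: random.Random) -> str:
    """Kill one randomly chosen running server and return its name."""
    victim = rng.choice(sorted(pool.servers))
    pool.terminate(victim)
    return victim


if __name__ == "__main__":
    rng = random.Random(42)
    pool = ServerPool(desired=4)
    for round_number in range(1, 4):
        victim = unleash_monkey(pool, rng)
        pool.heal()
        # The design passes only if capacity is restored after every loss.
        assert len(pool.servers) == pool.desired
        print(f"round {round_number}: lost {victim}, pool back to {len(pool.servers)}")
```

Because the failure is injected on purpose and the recovery is checked every round, losing a server becomes a routine logged event rather than an emergency.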

Use the same system design everywhere. “The environments for development, performance, test and production all look the same, so they’re all production code,” Broeren said. That means any problems with changes are very quickly identified.
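One way to picture identical environments is to derive every environment from a single definition. The Python sketch below is an illustration under assumptions: the `Environment` fields and their values are placeholders, not NAB's actual configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    name: str
    server_image: str
    load_balanced: bool
    instance_count: int


# Hypothetical baseline: placeholder values, not NAB's real settings.
PRODUCTION = Environment(
    name="production",
    server_image="web-image-v1",
    load_balanced=True,
    instance_count=2,
)


def derive(name: str) -> Environment:
    """Clone the production definition, changing only the environment's name."""
    return replace(PRODUCTION, name=name)


def design_of(env: Environment) -> tuple:
    """Everything about an environment except its name."""
    return (env.server_image, env.load_balanced, env.instance_count)


if __name__ == "__main__":
    names = ("development", "performance", "test")
    environments = [PRODUCTION] + [derive(name) for name in names]
    # Every environment must share exactly one design.
    assert len({design_of(env) for env in environments}) == 1
    for env in environments:
        print(env)
```

With only the name allowed to differ, a change that breaks in test would break the same way in production, which is why problems surface early.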

Have plans for future deployments. NAB’s next cloud activities will be in disaster recovery and performance optimisation. “I’d love to be able to do continuous disaster recovery,” Broeren said. Optimisation is also a major goal: “I can’t wait to get the Janitor Monkey in.” Janitor Monkey is another Netflix tool, one that finds and cleans up unused cloud resources.