How Securing Big Data Can Slow You Down

If you're attempting analysis of large pools of data using public cloud services, then you'll need a variety of security techniques to ensure that data is protected. Just be aware that this may introduce performance issues.


That point was raised during a presentation on the final day at Data Center World in Las Vegas, which I've been attending as part of our ongoing World Of Servers coverage. Sam Heywood, senior director of global standards at Gazzang, pointed out that while Amazon's Elastic MapReduce (EMR) service provides a cost-effective way for businesses to perform big data analysis without having to set up their own Hadoop servers, securing that data is more challenging.

A common technique in that scenario is to encrypt all data used by the service. While that's a sensible move even if the data is already partially anonymised, and a requirement of many compliance regimes, it necessitates complex key management. "If you're not managing the keys correctly, the encryption is useless," Heywood said.

Individual keys are required for each stage: uploading the data, writing final results to an analytics server, and downloading the data. In addition, each individual Hadoop node requires its own key.
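As a rough illustration of that per-stage, per-node key bookkeeping, here's a minimal sketch (all names are hypothetical, and the toy registry below stands in for what would really be a hardened key-management service with an authenticated cipher such as AES-GCM):

```python
import secrets

class KeyRegistry:
    """Toy key registry: one key per pipeline stage and one per Hadoop node.

    Illustrative only -- a production system would generate, store and
    rotate these keys inside a dedicated key-management service.
    """

    def __init__(self):
        self._keys = {}

    def key_for(self, scope: str) -> bytes:
        # Lazily generate a fresh 256-bit key for each distinct scope.
        if scope not in self._keys:
            self._keys[scope] = secrets.token_bytes(32)
        return self._keys[scope]

registry = KeyRegistry()

# Separate keys for each stage of the EMR workflow...
upload_key = registry.key_for("stage:upload")
results_key = registry.key_for("stage:write-results")
download_key = registry.key_for("stage:download")

# ...plus one key per Hadoop node.
node_keys = {n: registry.key_for(f"node:{n}") for n in ["node-1", "node-2"]}
```

The point of the sketch is simply how quickly the key count grows: every stage and every node adds another key that has to be generated, distributed and protected, which is where the management complexity Heywood describes comes from.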

Even with all that encryption in place, international organisations still worry about the potential for Amazon to surrender data to US law enforcement authorities. "We hear a lot: curse your blasted Patriot Act. Amazon can take data at any minute," Heywood said.

One solution to that dilemma is to ensure that key management happens offshore, ideally behind your own corporate firewall. Even if Amazon hands over data as the result of a subpoena, that data can't be decrypted without access to the keys, and keys stored offshore aren't subject to US law.
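A minimal sketch of why that works, with a dictionary standing in for the on-premises key server and a toy XOR keystream standing in for a real cipher (in practice the key manager would be a remote service reached over TLS, and the cipher would be something authenticated like AES-GCM):

```python
import secrets

# Toy stand-in for an on-premises key server sitting behind the corporate
# firewall, outside the cloud provider's reach.
_ONPREM_KEYS = {"dataset-42": secrets.token_bytes(32)}

def fetch_key(dataset_id: str) -> bytes:
    """Simulate a round trip to the offshore key manager."""
    return _ONPREM_KEYS[dataset_id]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR against a repeating key is for illustration only; it is NOT
    # secure. Use an authenticated cipher in any real deployment.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# The cloud side only ever holds ciphertext.
plaintext = b"sensitive records"
ciphertext = xor_cipher(plaintext, fetch_key("dataset-42"))

# A subpoena against the cloud provider yields only these encrypted
# bytes; decryption requires another trip to the offshore key server.
recovered = xor_cipher(ciphertext, fetch_key("dataset-42"))
```

The design choice here is the separation itself: the cloud provider can surrender everything it stores without compromising the data, because the decryption keys never leave the customer's jurisdiction.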

The challenge again is that this introduces some delays, though Heywood says these aren't massive. "Absolutely there's some latency, however the communications between encryption endpoints are not real chatty." In situations where nodes are expected to be persistent, this could be a problem, but it's unlikely to be an issue for a one-off analysis.

Lifehacker's World Of Servers sees me travelling to conferences around Australia and around the globe in search of fresh insights into how server and infrastructure deployment is changing in the cloud era. This week, I'm in Las Vegas for Data Center World, looking at how the role of the data centre is changing and evolving.


Comments

    Angus,

    Thanks for attending the presentation last week. I was pleased to read your blog post. As you've noted, securing big data in cloud environments is a highly complex task that needs to support a variety of use cases. At the same time, if security significantly impinges on performance, that's a problem. This is the exact reason we introduced Gazzang CloudEncrypt, a solution designed specifically for big data in the cloud. We take care of the encryption and key management, so data scientists and big data users can focus on discovering interesting and valuable insights.

    To mitigate concerns related to the Patriot Act, Gazzang CloudEncrypt supports deployment of the Key Manager to any location, including an Amazon customer's own data center. There may be some latency communicating with the on-premises key manager, but this communication is rare (only upon initialization or reboot) and is therefore a non-issue for overall performance.

    The bottom line is that while securing big data can introduce some performance overhead, the cost is often very low, and the tradeoff is absolutely imperative when dealing with sensitive data.

    Thanks Angus,

    Sam Heywood
    Sr. Director Products, Gazzang
