If you're attempting analysis of large pools of data using public cloud services, then you'll need a variety of security techniques to ensure that data is protected. Just be aware that this may introduce performance issues.
That point was raised during a presentation on the final day at Data Center World in Las Vegas, which I've been attending as part of our ongoing World Of Servers coverage. Sam Heywood, senior director of global standards at Gazzang, pointed out that while Amazon's Elastic MapReduce (EMR) service provided a cost-effective way for businesses to perform big data analysis without having to set up their own Hadoop servers, securing that data was more challenging.
A common technique in that scenario is to encrypt all data used by the service. That is a sensible move even if the data is already partially anonymised, and it is a requirement of many compliance regimes, but it necessitates complex key management. "If you're not managing the keys correctly, the encryption is useless," Heywood said.
Individual keys are required for each stage, including uploading the data, writing final results to an analytics server and downloading the data. Each individual Hadoop node also requires its own key.
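To give a sense of how quickly the key count grows, here is a minimal Python sketch of generating one independent key per stage and per node. It is an illustration only, not Gazzang's or Amazon's method: the stage labels are taken from the description above, while the node names, the `generate_keys` helper and the 32-byte key length are assumptions made for the example.

```python
import secrets

# Stage labels follow the three stages described in the article.
STAGES = ("upload", "write_results", "download")


def generate_keys(node_names, key_bytes=32):
    """Return one independent random key per stage and per Hadoop node.

    The helper name, the node names and the key length are hypothetical.
    """
    keys = {}
    for stage in STAGES:
        keys[("stage", stage)] = secrets.token_bytes(key_bytes)
    for node in node_names:
        keys[("node", node)] = secrets.token_bytes(key_bytes)
    return keys


# Hypothetical three-node cluster: three stage keys plus three node keys.
keys = generate_keys(["node-1", "node-2", "node-3"])
print(len(keys))
```

Even this toy cluster has six keys to store, rotate and revoke, which is why key management rather than the encryption itself tends to be the hard part.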
Even with all that encryption in place, international organisations still worry about the potential for Amazon to surrender data to US law enforcement authorities. "We hear a lot: curse your blasted Patriot Act. Amazon can take data at any minute," Heywood said.
One solution to that dilemma is to ensure that key management happens offshore, ideally behind your own corporate firewall. Even if Amazon hands over data in response to a subpoena, that data cannot be decrypted without access to the keys, and those keys won't be subject to US laws if stored offshore.
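The separation described above can be sketched in a few lines of Python: the cloud side holds only ciphertext and a key reference, while the key itself stays in a store you control. Everything here is hypothetical, including the `OnPremKeyStore` and `CloudRecord` names, and the HMAC-based keystream is only a stand-in so the example runs with the standard library. A real deployment would use a vetted cipher such as AES-GCM and a proper key management system.

```python
import hashlib
import hmac
import secrets
from collections import namedtuple


def _keystream_xor(key, nonce, data):
    """Illustrative stand-in for a real cipher (NOT for production use)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class OnPremKeyStore:
    """Hypothetical key store that lives behind the corporate firewall."""

    def __init__(self):
        self._keys = {}

    def create_key(self):
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)
        return key_id

    def get_key(self, key_id):
        return self._keys[key_id]


# What the cloud provider stores: a key reference, never the key itself.
CloudRecord = namedtuple("CloudRecord", ["key_id", "nonce", "ciphertext"])


def encrypt_for_cloud(store, key_id, plaintext):
    nonce = secrets.token_bytes(16)
    ciphertext = _keystream_xor(store.get_key(key_id), nonce, plaintext)
    return CloudRecord(key_id, nonce, ciphertext)


def decrypt_from_cloud(store, record):
    key = store.get_key(record.key_id)
    return _keystream_xor(key, record.nonce, record.ciphertext)


store = OnPremKeyStore()
key_id = store.create_key()
record = encrypt_for_cloud(store, key_id, b"customer analytics input")
recovered = decrypt_from_cloud(store, record)
```

Anyone who obtains `record` alone gets the ciphertext, the nonce and a key identifier, but decryption still requires a call to the key store, which in this arrangement sits outside the cloud provider's reach.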
The challenge again is that this introduces some delays, though Heywood says these aren't massive. "Absolutely there's some latency, however the communications between encryption endpoints are not real chatty." In situations where nodes are expected to be persistent, this could be a problem, but it's unlikely to be an issue for a one-off analysis.
Lifehacker's World Of Servers sees me travelling to conferences around Australia and around the globe in search of fresh insights into how server and infrastructure deployment is changing in the cloud era. This week, I'm in Las Vegas for Data Center World, looking at how the role of the data centre is changing and evolving.