Designing Systems To Manage Big Data

Big data is challenging, expensive and hard to staff. However, when it comes to implementing the computing systems needed to deliver it, a handful of key principles will make the task easier.

I've drawn these suggestions from the final presentation I attended at the Gartner IT Infrastructure Operations & Data Center Summit, which I've been covering throughout the week as part of our World Of Servers series. Gartner research vice president Roger Cox and senior analyst Sid Deshpande outlined some of the key challenges involved in adapting your existing systems to allow big data analysis.

It's fine not to feel fully across your big data strategy. "It's really an immature technology; it's evolving rapidly and there's a lot of learning going on," Cox said. The same fundamentals apply as with any large IT project: start from the premise of how this will make money. "Unless you're able to establish business value for a big data solution, it's difficult to get funding in the future," Deshpande said.

Don't rebuild your entire architecture simply for big data. "Companies that are successful are the ones that do it without radically disrupting their architecture models," Deshpande said.

Big data requires a different approach. Systems used in other large IT rollouts may not make sense. Backup is one obvious example. "You don't want to back up if you don't have to back up," Cox said. "You only really need to back up if it has real-time value going forward."

Other examples only have limited value, even in your own vertical. Deshpande singled out the recent AGIMO policy paper and praised it as an example of how approaches to big data need to be customised for a specific business. "Business outcomes for different departments will drive the technology solution," he said. "It's not a prescriptive technology paper; it's more of a suggestive paper, and every department will be different."

Prototyping is common and vital. Relatively few big data projects so far have moved beyond the prototype stage. "We're seeing so many prototypes because it's an additional expense and it's going to be a big expense," Cox said.

Life gets tough if you mix cloud and on-premises systems. "If your data is being generated on the cloud, you're better off doing analytics in the cloud as well," Deshpande said. "If you want data generated in your data centre to be analysed in the cloud, the question of paying for movement costs and worrying about how you want to shift that arises. Data location is key."

Lifehacker's World Of Servers sees me travelling to conferences around Australia and around the globe in search of fresh insights into how server and infrastructure deployment is changing in the cloud era. This week, I'm in Sydney for the Gartner Infrastructure, Operations & Data Center Summit, looking for practical guidance on developing and managing your IT infrastructure and using virtualisation effectively.

