Top 10 Rules For Working With Big Data

If you're working as an IT pro, chances are you're going to be asked to work on a big data project at some point in the near future. Make the process more productive by bearing these 10 rules in mind.


I spent last Wednesday at the AIIA's Navigating Big Data conference, where a range of speakers discussed some of the challenges involved in implementing big data in both government and business environments. I've distilled observations from throughout the day into ten key principles. If you have additional experiences or insights, we'd love to hear about them in the comments.

10. Define Your Project

Every discussion of big data begins with an argument about what the phrase 'big data' actually means. "Big data gives big headaches and there is a lot of hype out there in the marketplace," said Ian Bertram, managing vice president for Gartner. "At the moment, big data is coming up the hype cycle and it will hit the trough of disillusionment in the next couple of years. All of a sudden reality starts to hit us and big data just becomes data. But we're not there yet."

The key lesson is that you don't run a 'big data' project for its own sake; you introduce it because you can see the potential for deeper analysis of existing business data sources. "We're not talking about throwing existing analytics away," Bertram said. "We're talking about enhancing it. You need to tie all of it together."

Sometimes the trigger for a big data project will be an obvious increase in the data sources available to an organisation. "Defence and national security are generating a huge amount of data and investing a lot in large platforms," said Brenton Cooper, technical director for information superiority at the Defence Systems Innovation Centre (DSIC). "Frankly they're just drowning in the data."

However, the road to big data can also be a more measured expansion of existing analytics. "We've been focusing on getting the basics right in the more traditional BI space," said Matt McKenzie, director for business intelligence at Optus. "We've made a lot of improvements in an area that was underperforming for quite a while. It's all well and good to have a big data strategy, but the real reason we want to do it is so we have a clear understanding of who our customers are."

9. Factor In Privacy

Sifting through data can be addictive, but you need to consider whether that analysis is a legitimate use of that information. Legal concerns can be a major issue; for instance, Optus can't easily share data with its parent company SingTel because of different regulations in their main markets (Singapore and Australia).

"It comes with great responsibility," said Optus' McKenzie. "The privacy impact on telcos is heightened. There's a lot of internal caution in how we apply and gather this data, but we see it as a real enabler moving forward."

8. Don't Let Privacy Derail You

With that said, don't get overly caught up in issues around security and privacy. "We're never going to fully solve privacy and security," said Parviz Peiravi, principal architect at Intel. Those issues shouldn't be used to defer a project that has clear business benefits.

7. Make Sure You Have Sufficient Data

The more data you have, the more useful the insights will be. DSIC's Cooper notes that in a machine learning environment, a data set of 40,000 examples won't produce much useful insight, but a data set of several million entries can produce analysis "as good as a human". That doesn't mean the data has to be entirely clean and consistent to deliver results. "Even though the data available may be incomplete and noisy, you can still use it to make the decisions that you need," Cooper said.

6. Draw On Other Experts

You're unlikely to be the only organisation in your industry exploring big data, so find out what your peers have already learned. That can be an informal chat or a more structured process. "We have just recently established a cross-agency working group to consider the broader implications of big data," said Glenn Archer, first assistant secretary, policy and planning division, at AGIMO. "The ATO is taking the lead here in the context of this cross-agency working group and in establishing a government agency centre of excellence." That group has already produced a draft issues paper on big data policy in government.

5. Process Can Be A Bottleneck

The trickiest part of introducing big data isn't necessarily getting the systems up and running; it's making the information accessible and comprehensible to individuals. "An important step is looking at what the human decision maker needs," said Cooper. "A lot of what we're doing is trying to build those analytics to filter the morass of data so humans can make a decision. The bottleneck might actually be there rather than in the processing. The challenge and the opportunity is in trying to 'invert the bathtub' and make processes data-driven rather than task-driven."

4. Ensure Systems Are Extensible

Your business will have many potential sources of data -- customer databases, website interactions, social networking. Not all of these will be immediately useful, but you shouldn't build systems that can only analyse a single kind of data.

"We do need to be curators and understand and treat this stuff with care," said Gartner's Bertram. "Building a roadmap is going to be important so we can use that data in context. If social is not important to you, don't focus on it today. But build a system that allows you to bring in those other data sources as requirements demand."

Optus' McKenzie agreed: "Invest in an infrastructure that gives you options moving forward. Investing only in short-term skill sets is a concern."

3. Find The Right Staff

As we've often pointed out, there's a shortage of people with expertise in big data, both in building the architectures and in conducting the analysis. That makes it a sensible area in which to develop skills, but most observers at the conference suggested that too much product-specific knowledge might be counterproductive. "Big data as a term in itself doesn't mean anything," McKenzie said. "Anyone who is claiming to be a big data expert or a big data architect, I'm highly sceptical.

"At the moment, I'm hesitant to put a massive investment in growing new teams, because every other week we hear about new technologies. So what do we resource for? Do I invest in a product and have it become irrelevant in five months? It's important to distinguish between technology-specific skill sets and overall skills. A good architect, a good data modeller can be stretched. There's a lot of reusability you can get."

With that said, the right hires are vital. "The best technology and the most expensive data will not deliver us the right outcome unless we have the right people who can integrate that data and interpret it," said AGIMO's Archer.

2. Retrain Everyone Else As Well

You also need to ensure big data systems are understood and accessed across the business, not just by a small and specialised cadre. "It's easy to get caught up in the hype of data scientists and PhDs, but it's important that we enable our own staff to rise to the challenge of big data," McKenzie said.

"You have to get everyone else in your organisation on the same page," said Gartner's Bertram. "Without some clarity on what it is, you'll miss opportunities and incur risk. You can't have a 'wait and see' attitude."

1. Use Your Imagination

The ultimate goal of big data is to learn something which wouldn't otherwise be evident in your business. That requires imagination, confidence, and a willingness to make occasional mistakes.

"We have to look at these new innovative approaches," said Bertram. "We might trip over and fumble the ball a bit, but without understanding and learning we're not going to know the art of the possible."


