Big data is constantly spoken of as a key IT trend, but what does the phrase "big data" actually mean? Stripped of vendor hype and management panic, it amounts to this: huge volumes of seemingly disconnected data that could yield useful, actionable business insights, provided you can work out the right way to analyse it all.
The world of professional IT is filled with buzzwords, but they're often undefined, misunderstood and abused. This week, our Myths And Realities series defines some much-discussed concepts and busts some of the myths that surround them.
What Big Data Is
IT systems generate massive amounts of data these days. Some of this is carefully tracked and analysed by well-defined reporting systems (finance and payment information, for instance). However, much of it is either stored in logs that are never consulted again (website visitor records) or discarded after a short period (security camera footage).
Processing these large volumes of data and correlating them with other business information and external sources of data can lead to useful insights. Businesses might discover, for example, that particular goods are often purchased in combination, but that those combinations vary by time of day or the location of the customer. That can make it easier to cross-sell to those customers.
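To make the cross-selling example concrete, here is a minimal sketch of the simplest form of this analysis: counting which pairs of items appear in the same basket, broken down by part of the day and store location. The transaction data, the time-of-day buckets and the function names are all hypothetical, chosen for illustration; a real big data pipeline would run this kind of counting over far larger, messier inputs on distributed infrastructure.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data: (hour of purchase, store location, items in basket)
transactions = [
    (8, "city", ["coffee", "croissant", "newspaper"]),
    (9, "city", ["coffee", "croissant"]),
    (13, "city", ["sandwich", "juice"]),
    (12, "suburb", ["sandwich", "juice", "crisps"]),
    (19, "suburb", ["pasta", "wine"]),
    (20, "suburb", ["pasta", "wine", "bread"]),
]


def time_bucket(hour):
    """Map an hour (0-23) to a coarse part of the day (bucket edges are an assumption)."""
    if hour < 11:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


def co_purchase_counts(rows):
    """Count how often each pair of items is bought together, per (time bucket, location)."""
    counts = defaultdict(Counter)
    for hour, location, items in rows:
        segment = (time_bucket(hour), location)
        # set() ignores duplicate items; sorted() makes ("a", "b") and ("b", "a") the same pair
        for pair in combinations(sorted(set(items)), 2):
            counts[segment][pair] += 1
    return counts


if __name__ == "__main__":
    for segment, pairs in co_purchase_counts(transactions).items():
        print(segment, pairs.most_common(2))
```

In this toy data, coffee and croissants pair up in the city in the morning, while pasta and wine pair up in the suburbs in the evening: the same store could promote different combinations depending on when and where the customer shops.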
Big data is often discussed in terms of the "three Vs": volume (the sheer quantity of data), velocity (the speed at which it arrives) and variety (the range of sources and formats it comes from). If a project isn't handling a very large, continuous stream of data from a wide range of sources, arriving at an unpredictable pace, it isn't really big data.
The analysis process isn't straightforward: it's not simply a matter of matching up columns of information in a predictable structure that stays the same over time. Data sources need to be rated for relevance, and merely churning through large volumes of data requires significant processing power, storage and I/O. It also requires constant attention, since the analyses themselves have to adjust as the incoming data changes.
What Big Data Isn't
A rebranding of business intelligence or analytics. While some of the same basic techniques are used, big data is a separate discipline from BI or analytics. Those typically produce the same reports on a regular basis and draw from a much more tightly defined pool of data.
An easily replicated approach. Big data is dynamic; the questions that are relevant differ from business to business, and change within the same business over time.
Big Data: The Challenges To Accept
Expertise is thin on the ground. One constant theme in big data: it's hard to find people with the skills to do it well. The ideal combination is skill in scientific analysis plus confidence with large databases, and you'll pay a premium for that skill-set. Because the analysis is often very specific to your business, outsourcing isn't really an option.
Data quality is a constant pain. Your analysis depends on the quality of the incoming data, and that's going to be variable. Recognising the quality of the information — and adjusting for it when possible — is an essential step, as in any science-based endeavour.
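As an illustration of "recognising the quality of the information", here is a minimal sketch that scores each data source by how complete its records are and then keeps only complete records from sources above a threshold. The field names, the 0.8 threshold and the sample records are hypothetical assumptions; completeness is only one crude quality measure, and real projects would also check accuracy, consistency and timeliness.

```python
from collections import defaultdict

# Hypothetical schema: fields every record is expected to carry
REQUIRED_FIELDS = ("customer_id", "timestamp", "amount")


def is_complete(record):
    """True if every required field is present and not empty (0 counts as a valid value)."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def completeness_by_source(records):
    """Fraction of each source's records that are complete."""
    totals = defaultdict(int)
    complete = defaultdict(int)
    for record in records:
        source = record.get("source", "unknown")
        totals[source] += 1
        if is_complete(record):
            complete[source] += 1
    return {source: complete[source] / totals[source] for source in totals}


def usable_records(records, min_score=0.8):
    """Keep complete records, but only from sources whose completeness meets min_score."""
    scores = completeness_by_source(records)
    return [
        record
        for record in records
        if scores[record.get("source", "unknown")] >= min_score and is_complete(record)
    ]


# Hypothetical sample: a tidy web feed and a patchy point-of-sale feed
records = [
    {"source": "web", "customer_id": "c1", "timestamp": "10:00", "amount": 20},
    {"source": "web", "customer_id": "c2", "timestamp": "11:00", "amount": 15},
    {"source": "pos", "customer_id": "", "timestamp": "12:00", "amount": 5},
    {"source": "pos", "customer_id": "c3", "timestamp": None, "amount": 7},
    {"source": "pos", "customer_id": "c4", "timestamp": "13:00", "amount": 9},
]
```

Here the web feed scores 1.0 and the point-of-sale feed about 0.33, so only the two web records pass. Whether to drop a weak source outright, down-weight it, or try to repair it is exactly the kind of judgement call the analysis team has to make.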
Keep privacy and legal issues in mind. Just because you have access to the data doesn't mean you can exploit it all in the same way. Be especially wary of data crossing international borders in multi-national organisations.