Ask LH: What Is 'Big Data' And Who Is Collecting It?

Dear Lifehacker,
I've been hearing more and more about "big data". What is it, and is it something I should be worried about? Is this another way companies harvest my data and sell it?
Thanks,
Bewitched By Buzzwords

Title image made using Carlos Amarillo (Shutterstock) and phipatbig (Shutterstock). Additional photos by Tony Dowler, Intel Free Press, Cognizant Technology Solutions and Shreyank Gupta.

Dear BBB,

Big data has been a hot topic in technology circles for quite a while now (and we regularly look at big data issues in our Lifehacker IT Pro coverage). Depending on who you ask, it represents either a threat to personal privacy or a revolution in data processing and computing. We'll say this right out of the gate: "big data" means so many things to so many different people that it runs the risk of meaning nothing at all. That said, there are a few points everyone agrees on. Let's dive in from a consumer perspective. (For a more IT pro-centric approach, check out our Big Data 101: Myths And Realities post.)

The Many Meanings Of "Big Data"

Wikipedia defines big data as "any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications". It's a sensible definition, and it's the most common way you'll hear scientists, economists, and statisticians describe it. Put simply, "big data" describes huge amounts of information that are relatively easy to obtain, but so massive that they challenge current computing technologies. Big data is the problem you face when information pours in from multiple sources (computers, satellites, mobile devices, cameras, microphones, and more) and all of it needs to be moved around, stored (we're talking petabytes and exabytes of storage), and processed.

If that were all, we'd be finished. Unfortunately, "big data" has also turned into an overused marketing phrase. Software companies and IT service providers use it to signal the superiority of their products or the quality of their talent to customers (and to their competition). Startups and Silicon Valley mainstays love to claim "Our systems are ready for the challenges big data provides" or "Our data scientists know how to handle big data". Statements like those don't actually say much.

The kind of information that counts as "big data" also muddies the water. Many companies use the term when they're talking about the data they can collect and process about people, specifically their customers. That data helps them sell products more effectively, target their marketing, or simply build better products. Privacy advocates have latched on to that definition as well, campaigning against "big data" as yet another intrusion into people's private lives and personal data. In scientific and finance circles, "big data" covers everything from meteorological readings from weather stations to market data from financial exchanges around the world. All of those data sets fit the original definition, but their uses -- and the connotations associated with the people collecting that information -- are fundamentally different.

When Big Data Is A Problem

So what should you think when you hear "big data"? It depends on who's using the phrase. If some tech startup you've never heard of is proud that their "algorithm for processing cat pictures means they're capable of managing big data" and that their service "is like [x company] for [y noun]", you should probably be sceptical. It's entirely possible that the company has a revolutionary way to aggregate and make sense of all the cat pictures on the internet, but it's more likely that "big data" is just a marketing slogan.

Similarly, the term is often used to confuse you into thinking a service does something more than harvest your data for marketing purposes. When so-called "data brokers" like Acxiom, CoreLogic, or DataLogix use the phrase, they certainly have tons of data to manage, but they're using it to describe whose data they can harvest, how they can process it, and who they can sell it to.

However, if you hear a health care company talking about the challenges of handling patient records, electronic documents and research papers from thousands of branch hospitals and research institutions, you're probably looking at a legitimate, scientific use of the term. There are also companies that specialize in providing software to hospitals, financial management companies, research institutions and government agencies specifically to handle their data challenges. If you're looking at an ad for new data processing technology from IBM, Oracle, SAP or SAS, you're probably in the clear: they're using the phrase the way it was originally intended.

What Big Data Means To You

Big data may feel like far-off number crunching in a data centre somewhere, but it has real-world implications. Privacy advocates are concerned about the massive volumes of information that can be stored in easily accessed (and often insecure) databases, then sold or traded at will. With just a few scraps of information, it's not difficult for a company or government agency to build a complete picture of a person and their activities. Better still, from their point of view, they don't have to collect anything identifiable themselves, and they can use what they acquire for any purpose they choose.
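
To make the "complete picture" point concrete, here's a minimal Python sketch of record linkage, one common way separate data sets get combined. It assumes two entirely hypothetical lists: anonymous purchase records and a named directory that happen to share postcode, birth date and gender. The records, field names and the `link` function are invented for illustration; this isn't how any particular data broker works.

```python
from collections import defaultdict

# Hypothetical data set A: shopping records with no names attached.
purchases = [
    {"postcode": "2000", "birth_date": "1985-03-14", "gender": "F", "item": "running shoes"},
    {"postcode": "2000", "birth_date": "1985-03-14", "gender": "F", "item": "nicotine patches"},
    {"postcode": "3000", "birth_date": "1990-07-01", "gender": "M", "item": "baby formula"},
]

# Hypothetical data set B: a named directory (e.g. a membership list).
directory = [
    {"name": "Jane Citizen", "postcode": "2000", "birth_date": "1985-03-14", "gender": "F"},
    {"name": "John Smith", "postcode": "3000", "birth_date": "1990-07-01", "gender": "M"},
]

# Attributes that aren't names but, combined, can single a person out.
QUASI_IDENTIFIERS = ("postcode", "birth_date", "gender")


def key(record):
    """Build a join key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, people):
    """Attach names from `people` to `records` that share every quasi-identifier."""
    names_by_key = defaultdict(list)
    for person in people:
        names_by_key[key(person)].append(person["name"])

    profiles = defaultdict(list)
    for record in records:
        matches = names_by_key.get(key(record), [])
        if len(matches) == 1:  # only a unique match pins down one person
            profiles[matches[0]].append(record["item"])
    return dict(profiles)


profiles = link(purchases, directory)
print(profiles)
# {'Jane Citizen': ['running shoes', 'nicotine patches'], 'John Smith': ['baby formula']}
```

Neither list reveals anyone's shopping on its own; the join does. The sketch only keeps unique matches, so the more attributes two data sets share, the more people it can single out.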

On the bright side, the trait that makes big data so useful is also its biggest weakness: it's impersonal and lacks context. Good data doesn't guarantee good decisions. For example, Google Flu Trends did all of the right things and sourced its information from all the right places, but it still incorrectly predicted flu infection rates two years in a row. So while someone may be able to build a picture of you, the data itself still can't accurately predict your behaviour or choices. Big data may mean there's a lot of information floating around, but it still takes people with the right skill set to sift through it and make appropriate decisions based on what's been collected. Time will tell what those decisions turn out to be.

For the average person, this means two things. First, the vast volumes of information being collected about everything can be used for good or ill. Pay attention as this debate unfolds over the next few years, and remember that it's not as simple as "big data bad, privacy good". Data is just that -- information. What matters is how it's used.

Second, as with any emerging field, there's going to be a surge of interest (and opportunity) in data science. There will also be fluffy marketing, of course, diluting the phrase to the point of meaninglessness, but this is a new and evolving technology frontier -- one that can be lucrative if you're interested in picking up the skills.

Bottom Line: Don't Be Worried About Big Data, Be Worried About Who's Using It

At the end of the day, big data -- and the companies making a business out of managing it -- is paving the way for some great innovations in science, technology and medicine. More information than ever before is being collected and processed to study a wide range of topics.

On the consumer side, however, expect more of your life and lifestyle to be used to make decisions about you, often without your having any say. As companies scramble to learn more about us, even seemingly unrelated industries will suddenly become useful to one another: your shopping habits could interest health insurance companies, and your internet browsing habits could interest financial services companies. That is, unless you take steps to protect your privacy.

We hope that helps clear the air a bit, Bewitched. It's a deep topic, and because it's a developing industry, it's changing all the time. Still, it's important to separate the buzzwords from the facts, and the science from the marketing. Keep an eye on the trend, though -- it's not going away, even if the buzzword seems silly.

Cheers,
Lifehacker

Got your own question you want to put to Lifehacker? Send it using our contact form.

