Beware Of Deducing Too Much From Big Data

Big data can provide powerful insights for business decisions. But there's a clear risk to be aware of: just because you have a very large set of data doesn't mean that it isn't biased or incomplete.


Australian researcher Kate Crawford highlights the issue in a post for the Harvard Business Review (based in turn on a conference speech). As she points out:

Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.

Crawford uses as an example the 20 million-odd tweets relating to Hurricane Sandy. While these provide unusual insights into how people reacted to the disaster, they're hugely biased towards people in Manhattan. Communities that were more directly affected by the event, to the point of having no phone signal, aren't represented to the same degree. That doesn't mean the Twitter data isn't useful, but it has to be seen as representing particular communities, not a broader response.

The lesson? "As we increasingly rely on big data's numbers to speak for themselves, we risk misunderstanding the results," Crawford writes. Big data can inform our actions, but ultimately we still have to make our own decisions. Hit the link for the full post and more ideas.

The Hidden Biases in Big Data [Harvard Business Review]


Comments

    Obviously you have some interest in this area, but haven't worked in statistics at all previously. I swear half the articles you guys write are just "heres what i've been looking up today". Such an awesome job.
