Basic Errors To Avoid When Interpreting Survey Statistics

Today is not just a pretty neat date (20/10/2010), it's also World Statistics Day. While the day is designed to celebrate the collectors of national statistics, it's also a timely chance to remind ourselves of traps we often fall into when looking at survey data.

Everyone loves quoting a statistic, and no wonder: specific numbers always sound more convincing than broad generalisations. However, not all statistics are created equal — the results of a web poll are demonstrably less valid than votes in a national election — and many of us misquote data for our own ends even when the source is basically reasonable. Here are some simple issues that are worth bearing in mind whenever you consider statistics. They should be obvious, but they're often ignored.

As a working example, we're going to use a press release issued by the Enterprise Desktop Alliance (EDA) this week entitled "Macs Will Increase Their Market Share in the Enterprise by 57%". Press releases are designed to encourage journalists to write stories, and at first glance this sounds impressive and will doubtless attract coverage: it has an Apple-related angle, it suggests major growth, and it hints that the long-standing enterprise bias against installing Macs because of management or security policies might be easing.

The first trap is that the quoted percentage in the headline has been chosen for maximum impact. Based on the survey's own quoted data, the obvious reason Mac enterprise share has risen a lot in percentage terms is that it has started from a very low base: from 3.3% in 2009 to a projected 5.2% in 2011. To take a reverse but equally valid slant, less than 1 in 10 machines going into an enterprise customer is a Mac (right now, less than 1 in 20). If the overall numbers are valid (and buying intentions are reflected in actual behaviour), there's definitely going to be growth, but the overall percentages for market share give a more accurate and less dramatic picture than the 57% change headline figure.
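The contrast between the headline figure and the underlying share numbers is easy to verify. A minimal sketch (using only the 3.3% and 5.2% figures quoted above) shows how the same two data points yield a dramatic relative change but a modest absolute one:

```python
# Figures as quoted in the press release discussed above.
share_2009 = 3.3  # Mac enterprise market share in 2009, in percent
share_2011 = 5.2  # projected share in 2011, in percent

# Relative change: the headline-friendly number.
relative_change = (share_2011 - share_2009) / share_2009 * 100

# Absolute change: the same growth expressed in percentage points.
absolute_change = share_2011 - share_2009

print(f"Relative change: {relative_change:.1f}%")                     # ~57.6%
print(f"Absolute change: {absolute_change:.1f} percentage points")    # 1.9 points
```

Both numbers are "true"; the trap is that a ~57% relative increase sounds far larger than a 1.9-point rise in actual market share, especially when the starting base is small.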

The next question is whether we can trust that the data has been accurately derived. According to the EDA, it is based on a survey of 460 enterprise IT administrators in June 2010. Companies included had more than 100 machines installed, which seems a reasonable basic definition for an enterprise.

Just 8% of respondents were from outside the US, so it would make sense to qualify any comments on the data with that point. (While the US is a dominant player in tech, buying patterns in other world markets can be very different.)

The bigger problem, however, is that the survey was conducted online and promoted via the mailing lists of EDA member companies. The EDA's own mission statement is to help "deliver solutions that streamline the deployment, integration and management of the Mac in sophisticated Windows-centric IT environments". In other words, anyone aware of it or its members is likely already interested in the topic of Mac/Windows integration.

As such, any figures about the use of Macs within business might well overstate their impact — if you ran a Windows-only company, your likelihood of dealing with the EDA would be much lower. The press release does say that only 65% of the organisations surveyed actually had any Macs in place, so this might not be a major issue in practice, but it's certainly worth bearing in mind.

None of this means that the data is automatically not relevant or interesting (or that the usage of Macs in the enterprise isn't growing): it just means that it should be treated with a fair degree of caution. We haven't even looked at more technical statistical concepts such as margins of error, or the dangers of quoting averages. Simply asking "What is the basis of these survey results?", "Is there a possibility of bias?" and "Are percentage changes the most accurate measure?" highlights a lot of possible issues.

Got your own pet statistical interpretation bugbears, or useful online resources for understanding statistics better? Share them in the comments.

Lifehacker 101 is a weekly feature covering fundamental techniques that Lifehacker constantly refers to, explaining them step-by-step. Hey, we were all newbies once, right?


    It's a little ironic to have an article about poor representation of statistics where the image used is a 3D bar graph, which is fantastic at confusing the eye and not considered a good way to display data.

      Given that the graph is taken from the press release they use as an example of potentially misleading statistics, I'd say it was appropriate rather than ironic.

    I actually wrote a post on misleading "increase chance" type stats a while ago:

    Bias in the question being asked. Per one major news organisation in this country, which regularly peddles the likes of

    "Are you one of the right-thinking people like us who considers that oil companies and banks in this country should be forced to donate all their profits to charity?" Yes/No

    As a rule, I deliberately choose the response that they're NOT targeting.

    Very true. This book should be required reading for everyone who even thinks about using a statistic:

    Relative vs absolute risk. The media just LOVE relative risk. "You're twice as likely to get x if you do y!" when the absolute risk goes from one in a million to two in a million. -_-

    Decades after finishing my graduate degree I still experience "school anxiety" dreams -- always about an upcoming statistics exam. The resource I use to avoid falling into traps like those Angus cites is a book by the late dean of science journalists, Victor Cohn. His book, "News & Numbers" (still in print -- check it out on Amazon) is a simple non-math guide to understanding and using stats thrown at us every day.
