Numbers don’t tell the whole truth.
Four out of five dentists recommend sugarless gum for their patients who chew gum? Prove it. Show me the data—all the data.
It’s not that numbers lie; people do, or at least they mislead, consciously or not. People are often wrong, inexperienced, shortsighted, and biased. People bring unspoken agendas to the table: a way of looking at numbers that they hope will prove them right.
We’re all guilty of it, and yet, most people trust data. Every day, I see people mindlessly sharing, tweeting, and retweeting survey and poll data, Web analytics, and infographics as if the Almighty had handed them down like the 10 Commandments.
Some are very authoritative; others are not. And that has me wondering whether we (as a profession) bring enough skepticism into the workplace.
I thought we should take a look at just a few of the many ways data can be skewed. Perhaps you’ll agree with some of these and add a dash of skepticism to your daily routine. Better yet, perhaps you can add to this list.
1. Failing to determine whom you are surveying or what you are studying. This is where it all begins. Known as cohort selection, it’s the foundation upon which all data analysis is built. If you get this wrong, every other aspect of your analysis is flawed. Without a data set that is sufficiently representative of the whole (and large enough to ensure some level of statistical certainty), your analysis will not be accurate.
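To give that “statistical certainty” point a rough number: a common back-of-the-envelope check is the margin of error for a sample proportion. A minimal sketch (the function name and the 95 percent confidence z-value of 1.96 are my own choices for illustration):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p
    measured on n respondents, at ~95% confidence (z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 people where 50% answer "yes" is accurate to
# roughly +/- 3 percentage points; 100 people, roughly +/- 10.
print(f"n=1000: +/- {margin_of_error(0.5, 1000) * 100:.1f} points")
print(f"n=100:  +/- {margin_of_error(0.5, 100) * 100:.1f} points")
```

The formula only holds for a genuinely random, representative sample, which is exactly the cohort-selection problem above: no sample size rescues a biased cohort.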
2. Asking slanted questions. On far too many occasions, researchers and pollsters ask questions that skew the results. These questions can be leading and create false assumptions or false comparisons. There are a number of ways in which a skilled researcher can create a poll question so the results are all but a foregone conclusion. If the conclusions drawn from the poll or survey don’t pass the sniff test for me, this is the first place I like to look.
3. Presenting data in a misleading way. Often, researchers present data in a way that overemphasizes the results they want to communicate. This is frequently done visually, in the very charts and graphs that are meant to serve as shortcuts for understanding the data. Every time I hear that some effort increased a response rate by 50 percent, my knee-jerk reaction is to ask: from what to what? Increasing a conversion rate from 10 percent to 15 percent is quite different from increasing it from 2 percent to 3 percent, yet both represent 50 percent increases.
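The conversion-rate example above comes down to relative versus absolute change, which is easy to make concrete (the function names here are my own, purely for illustration):

```python
def relative_increase(old: float, new: float) -> float:
    """Percentage increase relative to the starting value."""
    return (new - old) / old * 100

def absolute_increase(old: float, new: float) -> float:
    """Increase in raw percentage points."""
    return new - old

# Both jumps are "a 50 percent increase"...
print(relative_increase(10, 15))  # 50.0
print(relative_increase(2, 3))    # 50.0
# ...but the absolute gains are very different:
print(absolute_increase(10, 15))  # 5 points
print(absolute_increase(2, 3))    # 1 point
```

Whenever a headline gives only the relative figure, ask for the baseline; without it, the number is unanchored.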
4. Implying causation where only correlation exists. In the autumn, the oak tree sheds its leaves, and the squirrels forage for food for the coming winter. The squirrels might take a cue when the leaves start to fall (probably not), but the falling leaves do not cause them to gather acorns. On a number of occasions, I have seen two observed phenomena linked as though one caused the other, without adequate proof of a causative relationship. When I see researchers implying causation where none exists (or where the causation has yet to be sufficiently proved), I run the other way, never to trust their analysis again.
What types of data skew have you seen in your career? Any great stories you can share?