Seven Rules for Reporting Polls and Research Results
by Steven S. Ross, February 11, 2008
Posted at: Stats.org
These are points that Ross gave to students at a journalism school on the reporting of polls and research.
1. In general, effects are small. So you need a lot of statistical power
That means you need large sample sizes and information on possible confounders – things that can change the results being reported, if they are not taken into account. Example: The number of new cancer cases in the US is increasing. But when the aging of our expanding population is taken into account, the chance that any specific individual will get cancer is declining.
To which we might emphasize that a sample size large enough to yield a significant result in the aggregate will not have the same power over subsets. A stratified sample of 512 cans taken from a trailer of 32 cells may be sufficient to accept or reject the trailer at a 0.025% AQL, but it cannot be so used to accept or reject each cell: the sample per cell is only 16. In the same way, a sample of 1086 of the general population will not be a sample of 1086 wise Latina women or 1086 young adult professionals; so drawing conclusions about sub-populations from it is illegitimate.
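To see how quickly precision drains away in a subset, here is a rough sketch using the textbook margin-of-error formula for a proportion. It assumes a simple random sample and the worst case of a 50/50 split; the function name and the subgroup size of 100 are made up for illustration, and the normal approximation is itself shaky at sizes as small as 16.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 1086 is the full sample from the text; 100 is a hypothetical subgroup;
# 16 is the per-cell sample from the trailer example.
for n in (1086, 100, 16):
    print(f"n = {n:4d}: about +/- {margin_of_error(n):.1%}")
```

The full sample is good to roughly three points either way; the subsets are not, which is why a finding about the whole does not carry over to its parts.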
2. You have to watch for spurious clustering
Imagine a chess or checkerboard occupying the bottom of a large cardboard box. Toss in exactly 64 grains of rice. The grains will bounce around and finally come to rest on the 64 squares of the game board. The average incidence is one grain per square. But you’re not likely to ever see that in your lifetime. Some squares will have many grains – all by chance. Likewise, some communities will report much larger-than-average incidences of certain diseases, all by chance.
To put it another way, random distributions will always generate clusters; so finding a cluster does not in itself mean much.
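The rice-on-a-chessboard experiment is easy to try without the rice. The sketch below assumes each grain lands on a square independently and with equal chance, which real rice in a real box only approximates; the function name and the seed are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter

def toss_rice(grains=64, squares=64, rng=None):
    """Drop each grain on a uniformly random square; return grains per square."""
    rng = rng or random.Random()
    counts = Counter(rng.randrange(squares) for _ in range(grains))
    return [counts[sq] for sq in range(squares)]

rng = random.Random(2008)  # fixed seed so the toss can be repeated
board = toss_rice(rng=rng)
print("empty squares:", board.count(0))
print("most grains on one square:", max(board))

# Chance of the 'average' outcome, exactly one grain on every square:
# 64! favorable arrangements out of 64**64 equally likely ones.
p_perfect = math.factorial(64) / 64**64
print(f"chance of exactly one grain per square: {p_perfect:.1e}")

# Expected number of empty squares under the same assumptions.
expected_empty = 64 * (63 / 64) ** 64
print(f"expected empty squares: {expected_empty:.1f}")
```

Under these assumptions about a third of the squares are expected to stay empty, so the rest must carry the clusters.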
3. Spurious studies, by definition, create news
Large, well-designed studies are very expensive, so persuasive studies of health issues are rare... and spurious studies, by definition, create “news” because results are unexpected.
Selection bias. Ten studies are performed by ten different researchers. Nine of them find no effect. The tenth researcher finds an effect significant at the 10% level. Which get published? Better yet, which make the evening news? That's right. "Scientists find no link!" is not the lead story at six o'clock.
4. Be skeptical of meta-analysis
The mathematical definition of a meta-analysis is the combining of raw data from many studies to gain the statistical power of a large sample, which is then analyzed as if all the data came from one place. ...a meta-analysis is often – in fact, almost always – BS-squared.
Take the ten studies above. Remember the 10% alpha risk? That is the risk that you would find a significant effect when there was none. That is, you would hear a signal that wasn't sent. At the 10% level, you would expect one positive study out of ten, which is what we postulated. But suppose we could claim a larger sample by lumping all ten studies into one? Perhaps that one study will be enough to give a positive signal for the aggregate. But we have only disguised the fact that it was a random positive. This is how the effects of second-hand tobacco smoke were "discovered." But the real problem with meta-analysis is that the terms and measurements were often made in different ways, with different definitions, and different methodologies. Combining them can be like combining apples and oranges.
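The arithmetic behind the ten-study example can be sketched in a few lines. It assumes the ten studies are independent and that there is truly no effect, so that each one has a 10% chance of a false positive; the function names are only illustrative.

```python
from math import comb

def prob_at_least_one(studies=10, alpha=0.10):
    """Chance that at least one of the studies is a false positive."""
    return 1 - (1 - alpha) ** studies

def prob_exactly(k, studies=10, alpha=0.10):
    """Binomial chance of exactly k false positives."""
    return comb(studies, k) * alpha**k * (1 - alpha) ** (studies - k)

expected_false_positives = 10 * 0.10
print("expected false positives:", expected_false_positives)
print(f"chance of at least one:   {prob_at_least_one():.3f}")
print(f"chance of exactly one:    {prob_exactly(1):.3f}")
```

On these assumptions, a lone positive among ten studies is closer to the expected result than to a discovery.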
5. Look for mechanisms when the results are unexpected
When you have unexpectedly high responses to seemingly low doses, the case is significantly bolstered by identifying a mechanism instead of looking only at statistical correlations or regressions....
The truism is that "correlation is not causation." That doesn't merely mean that it fails to prove causation; it means that It. Is. Not. Causation. Statistics can never prove a causal relation.
6. With polls, keep an eye on demographics
When it comes to polling, yes, we can take 850 in an imperfectly-drawn New Hampshire sample and split it 6 ways (young-old, rich-poor, male-female, minority-white...) and insinuate the overall statistical power of the overall sample, which isn't that great in the first place, while never once mentioning that New Hampshire's demographics and ground truths have changed a lot since the last hugely contested primary there in 2000, and that younger voters, who have only cell phones, are hard to find – and thus hard to poll. And why? Because bringing up any of this screws up the story!
There are two things wrapped up here. History may indeed make this year's sample literally incomparable to last year's, because of changes in the population, its history, the definition of terms, and so on. Think of various statistical facts about the USA that will change depending on whether or not Alaska and Hawaii are included. The other issue is that just as the operational definition of the thing being measured will affect the measurement obtained, so too will the sampling methodology. Calling people on the phone at random sounds easy; but it means you have to have a list of all phone numbers (including nowadays cell phones, throw-away no-name phones) and take account of people having more than one phone number. In the old days of land lines, pollsters would say that households with teenaged daughters were over-sampled because they often had a second phone line installed for the same house.
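One common way to handle the multiple-phone problem is to weight each response by the inverse of the number of lines that could have reached that household. The sketch below is only an illustration of that idea, not a description of what any particular pollster does; the function name and the four responses are invented.

```python
def weighted_share(responses):
    """Share answering 'yes', weighting each response by 1 / phone lines.

    responses: list of (answered_yes, phone_lines) pairs. A household with
    two lines is twice as likely to be dialed, so it gets half the weight.
    """
    weights = [1 / lines for _, lines in responses]
    yes_weight = sum(w for (yes, _), w in zip(responses, weights) if yes)
    return yes_weight / sum(weights)

# Invented data: the two 'yes' households each have a second line.
sample = [(True, 2), (True, 2), (False, 1), (False, 1)]

raw_share = sum(yes for yes, _ in sample) / len(sample)
print(f"raw share:      {raw_share:.2f}")
print(f"weighted share: {weighted_share(sample):.2f}")
```

The correction is only as good as the information on how many lines each household has, which is itself something the pollster has to ask about.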
7. PR plays on laziness - your laziness
Thinking is such hard work. That's the secret of PR. Odds are, journalists will reprint the press release on the new study or poll results rather than thinking about what could go wrong.