This week in PIG-IE we discussed the just-published paper by an all-star team of “skeptical” researchers who examined the reliability of neuroscience research.  It was a chance to take a break from our self-flagellation and see whether some of our colleagues suffer from similar problematic research practices.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience.

If you’d like to skip the particulars and go directly to an excellent overview of the paper, head over to Ed Yong’s blog.

There are too many gems in this little paper to ignore, so I’m going to highlight a few features that we thought were invaluable.  First, the opening paragraph is an almost poetic introduction to all of the integrity issues facing science, not only psychological science.  So, I quote verbatim:

“It has been claimed and demonstrated that many (and possibly most) of the conclusions drawn from biomedical research are probably false. A central cause for this important problem is that researchers must publish in order to succeed, and publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly ‘clean’ results is more likely to be published. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect. Such practices include using flexible study designs and flexible statistical analyses and running small studies with low statistical power. A simulation of genetic association studies showed that a typical dataset would generate at least one false positive result almost 97% of the time, and two efforts to replicate promising findings in biomedicine reveal replication rates of 25% or less. Given that these publishing biases are pervasive across scientific practice, it is possible that false positives heavily contaminate the neuroscience literature as well, and this problem may affect at least as much, if not even more so, the most prominent journals.”

The authors go on to show that the median statistical power of neuroscience research is an abysmal 21%.  Of course, “neuroscience” includes animal and human studies.  When broken out separately, the human fMRI studies had a median statistical power of 8%.  That’s right, 8%.  Might we suggest that the new Brain Initiative money be spent on going back and replicating the last ten years of fMRI research so we know which findings are reliable?  Heck, we gripe about our “coin flip” powered studies in social and personality psychology (50% power).  Compared to 8% power, we rock.

Here are some additional concepts, thoughts, and conclusions from their study worth noting:

1.  Excess Significance: “The phenomenon whereby the published literature has an excess of statistically significant results that are due to biases in reporting.”

2. Positive predictive value:  What the p-rep was supposed to be; “the probability that a positive research finding reflects a true effect (as in a replicable effect).”  They even provide a sensible formula for computing it.

3.  Proteus phenomenon: “The first published study is often the most biased towards an extreme result.”  This seems to be our legacy: unreliable but “breathtaking” findings that are untrue but can’t be discarded, because we seldom, if ever, publish failures to replicate.

4.  Vibration of effects:  “low-powered studies are more likely to provide a wide range of estimates of the magnitude of an effect.”
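To see what “vibration of effects” looks like in practice, here is a minimal simulation sketch.  It is not from the paper: the true effect (d = 0.3), the sample sizes, the number of simulated studies, and the function names are all assumptions chosen for illustration.  It runs many two-group studies with the same true effect and reports the range that covers the middle 95% of the estimated effect sizes.

```python
import random
import statistics


def simulate_effect_estimates(n_per_group, true_d=0.3, n_studies=1000, seed=1):
    """Simulate two-group studies and return each study's estimated Cohen's d.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        # Pooled standard deviation for two equally sized groups.
        pooled_sd = (
            (statistics.variance(control) + statistics.variance(treated)) / 2
        ) ** 0.5
        diff = statistics.mean(treated) - statistics.mean(control)
        estimates.append(diff / pooled_sd)
    return estimates


def middle_95_percent(estimates):
    """Return the (low, high) bounds covering roughly the middle 95% of estimates."""
    ordered = sorted(estimates)
    low = ordered[int(0.025 * len(ordered))]
    high = ordered[int(0.975 * len(ordered)) - 1]
    return low, high


if __name__ == "__main__":
    for n in (10, 100):
        low, high = middle_95_percent(simulate_effect_estimates(n))
        print(f"n per group = {n:3d}: estimates range from {low:+.2f} to {high:+.2f}")
```

With the small samples, the estimates should swing far more widely around the true value than with the large ones, including estimates in the wrong direction and estimates several times too large.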

Vibration of effects is really, really important because there are some in our tribe who believe that using smaller sample sizes “protects” one from reporting spuriously small effects.  In reality, the authors describe how small samples raise the Type II error rate and also lower the odds that a significant result reflects a true effect.  Underpowered studies are simply bad news.
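The link between low power and false positives runs through the positive predictive value from item 2.  The formula in the paper is PPV = ([1 − β] × R) / ([1 − β] × R + α), where 1 − β is power, α is the Type I error rate, and R is the pre-study odds that a tested effect is real.  Here is a small sketch of it; the function name is mine, and the pre-study odds of 0.25 (one true effect for every four nulls tested) are an assumption for illustration.

```python
def positive_predictive_value(power, pre_study_odds, alpha=0.05):
    """PPV = (power * R) / (power * R + alpha), with R the pre-study odds."""
    if not 0 < power <= 1:
        raise ValueError("power must be in (0, 1]")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if pre_study_odds <= 0:
        raise ValueError("pre_study_odds must be positive")
    true_positives = power * pre_study_odds
    return true_positives / (true_positives + alpha)


if __name__ == "__main__":
    assumed_odds = 0.25  # illustrative assumption: 1 true effect per 4 nulls
    for power in (0.08, 0.21, 0.50, 0.80):
        ppv = positive_predictive_value(power, assumed_odds)
        print(f"power = {power:.2f}  ->  PPV = {ppv:.2f}")
```

Under these assumed odds, a well-powered study (80%) gives a PPV of 0.80, while the PPV falls steadily as power drops toward the 8% and 21% figures discussed above.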



One Response to Schadenfreude

  1. I like the winner’s curse from their Box 1. The “lucky” scientist is somehow cursed by having an inflated effect size estimate (but paradoxically they probably have a paper in a highly esteemed journal). P-hacking also contributes to the winner’s curse, bringing us full circle to the QRPs=PEDs theme.
