Inspired by Rich Lucas’s recent analysis of the Type I error rates underlying Daryl Bem’s approach to running his ESP research, we decided to write a little missive ourselves. Please comment.
Psychological Science has a Serious Problem: The Bem Fallout
Sometime in the near future, Bem’s article showing evidence for the existence of ESP will be published in the Journal of Personality and Social Psychology, the flagship journal of our field. As could be expected, an uproar has occurred over this eventuality. To our distress, the hue and cry has focused mostly on the topic itself and not on the reasons why an article like this is possible. Our thesis is simple. This article was a fait accompli because it is the result of the standard operating procedures of psychological science. Those operating procedures are the problem. The problem is not the peer review process per se, or the reality (or lack thereof) of ESP.
The problem we have is one of evolutionary sociology. In the words of one of our esteemed colleagues, a publication is only worthy if “it shows something” (i.e., costly signaling). In no uncertain terms, “showing something” means showing that something is statistically significant. Thus, to be successful in our field we must publish a string of articles that reveal statistically significant results, even if the point of the article (e.g., ESP) is fanciful. We believe this has led to widely accepted practices that undermine our ability to build a foundation of reliable scientific findings.
From this standpoint, the publication of Bem’s article is an indictment of our widely held practices. Therefore, the action editor and reviewers are not at fault. We are. As long as we implicitly or explicitly condone these practices, we should consider much of our science no different from Bem’s article. Moreover, our concern is that if we fail to address the implications of Bem’s article constructively, we will further marginalize our field at a time when many important institutions, such as the National Institutes of Health, already question the usefulness of our scientific contributions.
To this end, we have done two things. First, below, we diagnose the problem by documenting the practices that we believe are the basis for the publication of an article like Bem’s. We consider the list only provisional. Please add to it or modify it to your liking. Second, we have proposed a new journal built on evaluation practices that will help us overcome our inertia and change our standard practices. Preferably, our current journals would change their policies to endorse these practices. We believe that if a journal is created that uses these practices, or an existing journal adopts them, it will, with time, become the most important outlet for our science, trumping even JPSP.
Problematic Practices in Psychological Science
1. Null Hypothesis Significance Testing (NHST): We compare our results to the null hypothesis despite the fact that we are almost never testing something that has not been tested before. We should be comparing our results to previous results and testing whether they differ from what has been found before, not from the null.
2. Not valuing the Null Hypothesis: Our explicit goal is to produce “knowledge”. Our system for creating knowledge is showing that something is “statistically significant”. This creates a situation in which we do not value null results, even though they are intrinsically necessary for creating knowledge because they show us where our ideas and/or constructs do not work. Without the figure-ground pattern of null and confirmatory findings, we have a cacophony of findings that add up to nothing.
3. Data churning: Running repeated experiments until you get a hit; surfing the correlation matrix in search of a finding.
4. Not replicating.
5. Not reporting your lack of replication.
6. Peeking: Checking data as it is being collected and discovering “significant” effects along the way.
7. HARKing: Hypothesizing After the Results are Known.
8. Data Topiary: The process of pruning nonsignificant findings, or findings that contradict hypotheses, followed closely by changing one’s hypotheses (see HARKing).
9. Outcome fragmentation grenade: Collect so many outcomes that something is bound to hit.
10. Betting against the house: Running underpowered studies, which can leave you with no better than a 50:50 chance of detecting an effect even when it exists.
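Several of these practices (peeking, data churning, outcome fragmentation) share one mechanism: repeated testing inflates the Type I error rate well beyond the nominal 5%. As an illustration of practice 6, here is a small simulation of our own (the function name and parameters are ours, not drawn from Bem’s studies): the data are pure noise, yet we test after every 10 observations and stop at the first “significant” result.

```python
import math
import random

def peeking_false_positive_rate(n_sims=2000, max_n=100, look_every=10, seed=1):
    """Simulate 'peeking': test after every `look_every` observations and
    stop at the first |z| > 1.96. Data are pure noise (true effect = 0),
    so every 'significant' result is a false positive."""
    random.seed(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(max_n // look_every):
            for _ in range(look_every):
                total += random.gauss(0.0, 1.0)  # true mean is exactly zero
                n += 1
            z = (total / n) * math.sqrt(n)  # z-statistic, known sd = 1
            if abs(z) > 1.96:               # nominal two-sided alpha = .05
                hits += 1
                break
    return hits / n_sims

rate = peeking_false_positive_rate()
print(f"False positive rate with peeking: {rate:.2f}")
```

With ten looks at a nominal α of .05 per look, the chance of at least one false positive per study comes out several times higher than the advertised 5%, which is how noise gets published as a finding.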
A Modest Proposal: The Journal of Reproducible Results in Social and Personality Psychology
- All original submissions must contain at least two studies: an original study and a direct replication of it.
- Any subsequent study that directly replicates the method and analyses used in the original set of studies, regardless of the results, will be published. The subsequent studies will be linked to the original study.
- When evaluating results, researchers will present point estimates, preferably in the form of an effect size indicator, and confidence intervals around that point estimate. The use of null hypothesis significance testing should be minimized.
- All data analyzed for the published studies will be submitted as an appendix and made available to the scientific community for re-analysis.
- IRB documentation, including the date of first submission, subsequent re-approvals, and number of participants run under the auspices of the IRB submission will be provided as an appendix. If there is a discrepancy between the number of participants run under the study IRB and the published research, an explanation will be required and recorded in the published manuscript.
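The third requirement above can be made concrete. As a sketch (the function name and the numbers are hypothetical, not part of the proposal), here is how a point estimate with a confidence interval might be reported for a two-group comparison, using Cohen’s d and a standard large-sample approximation to its standard error:

```python
import math

def cohens_d_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Cohen's d for two independent groups, with an approximate 95% CI
    (large-sample normal approximation to the standard error of d)."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical numbers: two groups of 50, a half-SD mean difference.
d, (lo, hi) = cohens_d_ci(mean1=10.5, sd1=2.0, n1=50, mean2=9.5, sd2=2.0, n2=50)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # d = 0.50, CI roughly [0.10, 0.90]
```

Reporting the interval, not just a p-value, makes the precision of the estimate visible to readers and to anyone attempting the direct replications the journal would require.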