Brent W. Roberts
My tweet, “Failure to replicate hurting your career? What about PhDs with no career because they were honest,” was taken by some as a personal attack on Dr. Schnall. It was not, and I apologize to Dr. Schnall if it was taken that way. The tweet referred to the field as a whole, because our current publication and promotion system does not reward the honest design and reporting of research. And this places many young investigators at a disadvantage. Let me explain.
Our publication practices reward the reporting of optimized data—the data that look the best or that can be dressed up to look nice through whatever means necessary. We have no choice given the way we incentivize our publication system. That system, which punishes null findings and rewards only statistically significant effects, means that our published science is not currently an honest portrait of how our science works. The current rash of failures to replicate famous and not-so-famous studies is simply a symptom of a system that is in dire need of reform. Moreover, students who are unwilling to work within this system—to be honest about their failures to replicate published work, for example—are punished disproportionately. They wash out, get counseled into other fields, or simply choose to leave our field of their own accord.
Of course, I could be wrong. It is possible that the majority of researchers publish all of their tests of all of their ideas somewhere, including their null findings. I’m open to that possibility. But, like many hypotheses, it should be tested and I have an idea for how to test it.
Take any one of our flagship journals and for 1 year follow a publication practice much like the one followed for the special replication issue just published. During that year, the editors agree to review and publish only manuscripts that 1) are pre-registered, 2) describe only their introduction, methods, and planned analyses, not their results, and 3) include at least one direct replication of each unique study in any given proposed package of studies. The papers would be “accepted” based on the elegance of the theory and the adequacy of the methods alone. The results would not be considered in the review process. Of course, the pre-registered studies would be “published” in a form where readers would know that the idea was proposed even if the authors did not follow through with reporting the results.
After a year, we can examine what honest science looks like. I suspect the success rate for statistically significant findings will go down dramatically, but that is only a hypothesis. Generally speaking, think of the impact this would have on our field and science in general. The journal that takes up this challenge would have the chance to show the field, and the world, what honest science looks like. It would be held up as an example for all fields of science for exactly how the process works, warts and all. And, if I’m wrong, if at the end of that year the science produced in that journal looks exactly like the pages of our current journals, I’ll not only apologize to the field, I’ll stop tweeting entirely.
Steps #1 and #2 are a great idea. Not a fan of step #3 though. You occasionally work with longitudinal data. Are you conducting direct replications of those data sets? Almost all of this “replication crisis” stuff is about experiments. It is hard for people not doing experiments to see it as relevant.
Even still, I don’t understand the insistence on direct replications. Why not just get a bigger sample size to begin with? Mathematically speaking, larger samples are more likely to replicate (by definition).
Ryne, to answer your question about conducting direct longitudinal replications, the answer is a qualified yes. Thanks to funding from NIA I’ve been able to run 3 parallel longitudinal studies that are partially overlapping–parts of each study are directly replicated across all 3 samples and each longitudinal study has unique features too. So, based on personal experience, it is possible to do.
That said, replications of longitudinal studies are almost always conceptual replications even if you use identical measures–it is seldom the case that you can run two exact longitudinal studies simultaneously. At the very least you are examining effects across different periods of history. It is the harsh reality of longitudinal research that we almost never have direct replications. This is one reason why I don’t care for the argument that we should emphasize conceptual replications. For inferential purposes, conceptual replications suck in the absence of evidence from direct replications. I know this from too much direct experience.
As for direct replications, there are many reasons to insist on them, but I’ll give the two most relevant to the “just do a large study” line of reasoning. First, as I can attest from personal experience, you may have screwed up the first study. The materials may have been flawed—typos, questions left off the survey, rating scales mislabeled, etc. The data may have been mishandled. The data analysis could have been performed incorrectly. You name it, you simply don’t know. So, if you use a large sample you might have just produced a very large screw-up.
Second, what is large? The fact of the matter is that no one knows the true effect size for any given relation, so there is no way to know going into a study what “large” should be. Take, for example, the candidate gene literature. By most standards the studies that continue to be published in our top journals seem well-powered—200, 300, even 1000 participants. Of course, we now know from GWAS studies that an adequately powered candidate gene study needs anywhere between 50,000 and 100,000 participants to reliably detect what most of us would construe as a minuscule effect. Until we have really strong evidence for the magnitude of an effect, we will be left with the uncertainty of not knowing the sampling distribution of any given effect in which we might be interested.
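The point about not knowing what “large” means can be made concrete with a standard power calculation. The sketch below (mine, not from the thread) uses the usual Fisher z approximation for the sample size needed to detect a correlation of size r with 80% power at α = .05; the example effect sizes are illustrative assumptions, and notably an r of about .01 lands in the 50K–100K range mentioned above.

```python
from statistics import NormalDist
import math

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate total N needed to detect a correlation r
    in a two-sided test, via the Fisher z transformation:
    N = ((z_alpha + z_beta) / atanh(r))^2 + 3."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    fisher_z = math.atanh(r)                       # Fisher z of the effect
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

# Illustrative effect sizes: a "large", a "small", and a
# candidate-gene-sized correlation.
for r in (0.30, 0.10, 0.01):
    print(f"r = {r}: N ≈ {n_for_correlation(r)}")
```

The steep climb from a few hundred participants at r = .10 to tens of thousands at r = .01 is exactly why a sample that looks “large” under one guess about the effect can be hopelessly underpowered under another.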
But all is not lost. Ask questions. Collect data. Make observations and conclusions from said data. Repeat. With a little time and effort we might even get close enough to the truth to do some good with the knowledge we’ve gained.
I agree with the sentiment of Ryne’s comments (and the sentiment of Brent’s, too). In addition to the very important point that many students are suffering consequences of these practices, another under-recognized issue, as Ryne mentions, is that many of these sweeping reforms do not readily accommodate different types of data collection, data which are regularly used by personality folks. As personality largely gets swept up with social psychology, it may end up marginalizing those personality psychologists among us who more closely align with sister fields other than social. I see a lack of representation across various disciplines within psychology in this conversation, and see this as especially impactful for personality psychology, where we are a field full of hybrid folks. (E.g., even much of the blog and media coverage on these issues refers to those personality psychologists involved as social psychologists.) The JRP editorial did an excellent job at taking this academic diversity into account, but I fear that many of our other journals (which combine social and personality) are going to effectively redefine practices in personality psychology in a way that will further narrow our field. And frankly, aren’t we small enough already?
It definitely makes me wonder what might be gained by paying better attention to other areas of psychology, where QRPs have been less common (at least if you believe survey data), CIs have more regularly been reported, and sample sizes have often been larger. You could argue that principles of replication apply to all (which, sure), but that ignores existing differences that, if we seek to better understand them, may help us figure out how we got into this mess in the first place.
Jennifer, I’m optimistic that we can handle that methodological diversity. Let’s consider the infinitesimal possibility that a journal like JRP would adopt my proposal for a year. Longitudinal researchers have many other outlets to choose from. We could publish in JP, JPSP, PSPB, SPPS, or elsewhere for a year. Consider the even more remote possibility that as a practice, all journals promote direct replication. The “rules” suggested above are, in reality, guides or incentives for behavior. We have the freedom to adjust them to accommodate deviations to the norm. We are, after all, our own bosses.
That said, if you look at the research psychology produces, including personality psychology, most of it could be directly replicated with the only cost being a slowdown in productivity. And, if we all slow down simultaneously because it is done across all of our journals, no one would suffer disproportionately because we would all be experiencing it at the same time.
Actually, SPPS has adopted new publication requirements that emphasize current proposals for replication reform and that, as far as I can tell, do not easily accommodate work with “challenging” data. I would be surprised if others (JPSP, PSPB) didn’t similarly follow suit, implicitly if not explicitly. I guess my concern is that it could send a message that real personality psychologists do X, and I think that would be an unfortunate consequence of current reform efforts. I definitely don’t like the idea that policy implementation would result in a huge sector of personality psychologists who “can’t” publish in JRP for a year. You know, we are (metaphorically?) large, we contain multitudes.
Perhaps most research can be replicated, but the costs of doing so vary drastically across different types of data collection. The conversation has largely focused on an area of psychology where costs of replication are really quite cheap. A more nuanced conversation about this could (and should) acknowledge the wide range of barriers researchers face when collecting other types of data, and produce a set of standards and recommendations that is flexible and accommodating, to reflect the different types of work that psychologists do. Otherwise, generalized recommendations for reform will create a new value hierarchy for types of data and research, even if unintentional. As you say, such flexibility is certainly possible and is up to us, but the conversation seems to have forged far ahead without much attention to these types of concerns.
Can’t nest this under Jennifer’s last comment, but I’d like to heartily second what she is talking about in her 2nd paragraph. The other areas of psychology (and other types of studies) besides those most commonly discussed in this replication conversation have certainly been left on the sidelines of the discussion. I’m not an exhaustive reader by any standard, but this post (http://osc.centerforopenscience.org/2014/02/27/data-trawling/) and Brent’s comment about longitudinal replications above are the only direct inclusions of longitudinal data in the discussion, that I’ve come across, which strikes me as being a shame (or perhaps marks my reading tendencies as being shameful).