The New Rules of Research

by Brent W. Roberts

A paper on one of the most important research projects of our generation came out a few weeks ago. I’m speaking, of course, of the Reproducibility Project conducted by several hundred psychologists. It is a tour de force of good science. Most importantly, it provided definitive evidence about the state of the field. Despite the fact that 97% of the original studies reported statistically significant effects, only 36% hit the magical p < .05 mark when closely replicated.

Two defenses have been raised against the effort. The first, described by some as the “move along folks, there’s nothing to see here” defense, proposes that a 36% replication rate is no big deal. It is to be expected given how tough it is to do psychological science. At one level I’m sympathetic to the argument that science is hard to do, especially psychological science. In truth, very few psychologists have even 36% of their ideas work. And, by work, I mean in the traditional sense of the word, which is to net a p value less than .05 in whatever type of study you run. On the other hand, to make this claim about published work is disingenuous. When we publish a peer-reviewed journal article, we are saying explicitly that we think the effect is real and that it will hold up. If we really believed that our published work was so ephemeral, then much of our behavior in response to the reproducibility crisis has been nonsensical. If we all knew and expected our work not to replicate most of the time, then we wouldn’t get upset when it didn’t. We have disproven that point many times over. If we thought our effects that passed the p < .05 threshold were so flimsy, we would all write caveats at the end of our papers warning other researchers to be wary of our results because they were unlikely to replicate. We never do that. If we really thought so little of our results we would not write such confident columns for the New York Times espousing our findings, stand up on the TED stage and claim such profound conclusions, or speak to the press in such glowing terms about the implications of our unreliable findings. But we do. I won’t get into the debate over whether this is a crisis or not, but please don’t pass off a 36% reproducibility rate as if it is the norm, expected, or a good thing. It is not.

The second argument, which is somewhat related, is to restate the subtle moderator idea. It is disturbingly common to hear people argue that the reason a study does not replicate is because of subtle differences in the setting, sample, or demeanor of the experimenter across labs. Invoking this is problematic for several reasons. First, it is an acknowledgment that you haven’t been keeping up with the scholarship surrounding reproducibility issues. The Many Labs 3 report addressed this hypothesis directly and could not reject the null hypothesis that such subtle moderators make no difference. Second, it means you are walking back almost every finding ever covered in an introductory psychology textbook. It makes me cringe to hear what used to be a brazen scientist, who had no qualms generalizing his or her findings based on psychology undergraduates to all humans, now claiming that their once robust effects are fragile, tender shoots that only grow on the West Coast and not in the Midwest. I’m not sure the folks invoking this argument realize that this is worse than having 64% of our findings fail to replicate. At least 36% did work. The subtle moderator take on things basically says we can ignore the remaining 36% too, because as-yet-unknown subtle moderators will render them ungeneralizable if tested a third time. While I am no fan of the over-generalization of findings based on undergraduate samples, I’m not yet willing to give up the aspiration of finding things out about humans. Yes, humans. Third, if this were such a widely accepted fact, and not something invoked only after our work fails to replicate, then again, our reactions to the failures to replicate would be different. If we never expected our work to replicate in the first place, our reactions to failures to replicate wouldn’t be as extreme as they’ve been.

One thing that has not happened much in response to the Reproducibility Report is a proposal of concrete changes to the way we do things. With that in mind, and in homage to Bill Maher, I offer a list of the “New Rules of Research[1]” that follow, at least in my estimation, from taking the results of the Reproducibility Report seriously.

  1. Direct replication is yooge (huge). Just do it. Feed the science. Feed it! Good science needs reliable findings, and direct replication is the quickest way to get them. Don’t listen to the apologists for conducting only conceptual replications. Don’t pay attention to the purists who argue that all you need is a large sample. Build direct replications into your work so that you know yourself whether your effects hold up. At the very least, doing your own direct replications will save you from the evils of sampling error. At the very most, you may catch errors in your protocol that could affect results in unforeseen ways. Then share it with us however you can. When you are done with that, do some service to the field and replicate someone else’s work.
  2. If your finding fails to replicate, the field will doubt your finding—for now. Don’t take it personally. We’re just going by base rates. After all, less than half of our studies replicate on average. If your study fails to replicate, you are in good company—the majority. The same thing goes if your study replicates. Two studies do not make a critical mass of evidence. Keep at it.
  3. Published research in top journals should have high informational value. In the parlance of the NHSTers this means high power. For the Bayesian folks, it means compelling evidence that is robust across a range of reasonable priors. Either way, we know from some nice simulations that for the typical between-subjects study this means a minimum of 165 participants for average main effects and more than 400 participants for 2×2 between-subjects interaction tests (a back-of-the-envelope power sketch follows this list). You need even more observations if you want to get fancy or reliably detect infinitesimal effect sizes (e.g., birth order and personality, genetic polymorphisms and any phenotype). We now have hundreds of studies that have failed to replicate, and the most powerful reason is the lack of informational value in the design of the original research. Many protest that the burden of collecting all of those extra participants will cost too much time, effort, and money. While it is true that increasing our average sample size will make doing our research more difficult, consider the current situation in which 64% of our studies fail to replicate and are therefore a potential waste of time to read and review because they were poorly designed to start with (e.g., small-N studies with no evidence of direct replication). We waste countless dollars and hours of our time processing, reviewing, and following up on poorly designed research. The time spent collecting more data in the first place will be well worth it if the consequence is increasing the amount of reproducible and replicable research. And the journals will love it, because we will publish less and their impact factors will inevitably go up—making us even more famous.
  4. The gold standard for our science is a pre-registered direct replication by an independent lab. A finding is not worth touting or inserting in the textbooks until a well-powered, pre-registered, direct replication is published. Well, to be honest, it isn’t worth touting until a good number of well-powered, pre-registered, direct replications have been published.
  5. The peer-reviewed paper is no longer the gold standard. We need to de-reify the publication as the unit of exaltation. We shouldn’t be winning awards, or tenure, or TED talks for single papers. Conversely, we shouldn’t be slinking away in shame if one of our studies fails to replicate. We are scientists. Our job is, in part, to figure out how the world works. Our tools are inherently flawed and will sometimes give us the wrong answer. Other times we will ask the wrong question. Often we will do things incorrectly even when our question is good. That is okay. What is not okay is to act as if our work is true just because it got published. Updating your priors should be an integral part of doing science.
  6. Don’t leave the replications to the young. Senior researchers, the ones with tenure, should be the front line of replication research—especially if it is their research that is not replicating. They are the ones who can suffer the reputational hits and not lose their paychecks. If we want the field to change quickly and effectively, the senior researchers must lead, not follow.
  7. Don’t trust anyone over 50[2]. You might have noticed that the persons most likely to protest the importance of direct replications, or who seem willing to accept a 36% replication rate as “not a crisis,” are all chronologically advanced and eminent. And why wouldn’t they want to keep the status quo? They built their careers on the one-off, counter-intuitive, amazeballs research model. You can’t expect them to abandon it overnight, can you? That said, if you are young, you might want to look elsewhere for inspiration and guidance. At this juncture, defending the status quo is like arguing to stay on board the Titanic.
  8. Stop writing rejoinders. Especially stop writing rejoinders that say 1) there were hidden, subtle moderators (that we didn’t identify in the first place), and 2) a load of my friends and their graduate students conceptually replicated my initial findings so it must be kind of real. Just show us more data. If you can reliably reproduce your own effect, show it. The more time you spend on a rejoinder instead of producing a replication of your own work, the less the field will believe your original finding.
  9. Beware of meta-analyses. As Daniël Lakens put it: bad data + good data does not equal good data. As much as it pains me to say it, since I like meta-analyses, they are no panacea. Meta-analyses are especially problematic when a bunch of data that has been p-hacked into submission is combined with some high-quality data. The most common result of this combination is an effect that is different from zero, and thus statistically significant, but strikingly small compared to the original finding. Then you see the folks who published the original finding (usually with a d of .8 or 1) trumpeting the meta-analytic result as proof that their idea holds, without facing the fact that the meta-analytic effect size is so small that they could never have detected it with the methods they used to detect it in the first place.
  10. If you want anyone to really believe your direct or conceptual replication, pre-register it. Yes, we know, there will be folks who will collect the data, then analyze it, then “pre-register” it after the fact. There will always be cheaters in every field. Nonetheless, most of us are motivated to find the truth, and eventually, if the gold standard is applied (see rule #4), we will get better estimates of the true effect. In the meantime, pre-register your own replication attempts and the field will be better for your efforts.
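As a back-of-the-envelope illustration of rule #3, here is a minimal power sketch in Python. It is not the simulations cited in the rule; the “typical” effect of d ≈ 0.43 and the assumption that an attenuated 2×2 interaction behaves like a two-group comparison at half that effect size are my own illustrative choices, and they land in the same ballpark as (but not exactly on) the ~165 and 400+ figures above.

```python
# Back-of-the-envelope power calculations (illustrative assumptions, not the
# simulations cited in rule #3): alpha = .05, 80% power, two-sided two-sample
# t-test, and a "typical" effect of d ~= 0.43 (roughly r = .21).
from math import ceil

from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()

# Per-group n needed to detect the assumed main effect in a two-cell design.
n_main = solver.solve_power(effect_size=0.43, alpha=0.05, power=0.80,
                            alternative='two-sided')
print(f"Simple two-group main effect: ~{2 * ceil(n_main)} participants total")

# Heuristic for an attenuated 2x2 interaction: the interaction contrast acts
# like a two-group comparison at roughly half the effect size, so the total
# required N roughly quadruples.
n_int = solver.solve_power(effect_size=0.43 / 2, alpha=0.05, power=0.80,
                           alternative='two-sided')
print(f"Attenuated 2x2 interaction: ~{2 * ceil(n_int)} participants total")
```

Plugging in a smaller or larger “typical” effect moves these numbers around quite a bit, which is exactly the point of designing for informational value rather than for the minimum publishable sample.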

[1] Of course, many of these are not at all new. But, given the reactions to the Reproducibility Report and the continued invocation of any reason possible to avoid doing things differently, it is clear that these rules are new to some.

[2] Yes, that includes me. And, yes, I know that there are some chronologically challenged individuals on the pro-reproducibility side of the coin. That said, among the outspoken critics of the effort I count a disproportionate number of eminent scientists without even scratching the surface.


What we are reading in PIG-IE 9-14-15

Last week we read Chabris et al. (2015), “The fourth law of behavior genetics,” another in a series of lucid papers from the GWAS consortium.

This week, with Etienne LeBel in town, we are reading the OSF’s Reproducibility Report.


Be your own replicator

by Brent W. Roberts

One of the conspicuous features of the ongoing reproducibility crisis stewing in psychology is that we have a lot of fear, loathing, defensiveness, and theorizing being expressed about direct replications. But, if the pages of our journals are any indication, we have very few direct replications being conducted.

Reacting with fear is not surprising. It is not fun to have your hard-earned scientific contribution challenged by some random researcher. Even if the replicator is trustworthy, it is scary to have your work be the target of a replication attempt. For example, one colleague was especially concerned that graduate students were now afraid to publish papers given the seeming inevitability of someone trying to replicate and tear down their work. Seeing the replication police in your rearview mirror would make anyone nervous, but especially new drivers.

Another prototypical reaction appears to be various forms of loathing. We don’t need to repeat the monikers used to describe researchers who conduct and attempt to publish direct replications. It is clear that they are not held in high esteem. Other scholars may not demean the replicators but hold equally negative attitudes towards the direct replication enterprise and deem the entire effort a waste of time. They are, in a word, too busy making discoveries to fuss with conducting direct replications.

Other researchers who are the target of failed replications have turned to writing long rejoinders. Often reflecting a surprising amount of work, these papers typically argue that while the effect of interest failed to replicate, there are dozens of conceptual replications of the phenomenon of interest.

Finally, there appears to be an emerging domain of scholarship focused on the theoretical definition and function of replications. While fascinating, and often compelling, these essays are typically not written by people conducting direct replications themselves—a seemingly conspicuous fact.

While each of these reactions is understandable, they are entirely ineffectual, especially in light of the steady stream of papers failing to replicate major and minor findings in psychology. Looking across the various efforts at replication, it is not too much of an exaggeration to say that less than 50% of our work is reproducible. Acting fearful, loathing replicators, being defensive and arguing for the status quo, or writing voluminous discourses on the theoretical nature of replication are fundamentally ineffective responses to this situation. We dither while a remarkable proportion of our work fails to be reproduced.


There is, of course, a deceptively simple solution to this situation. Be your own replicator.


It is that simple. And, I don’t mean conceptual replicator; I mean direct replicator. Don’t wait for someone to take your study down. Don’t dedicate more time writing a rejoinder than it would take to conduct a study. Replicate your work yourself.

Now, this is not much different from the position that Joe Cesario espoused, which is surprising because, as Joe can attest, I did not care for his argument when it came out. But it is clear at this juncture that there was much wisdom in his position. It is also clear that people haven’t paid it much heed. Thus, I think it merits restating.

Consider for a moment how conducting your own direct replication of your own research might change some of the interactions that have emerged over the last few years. In the current paradigm we get incredibly uncomfortable exchanges that go something like this:

Researcher R: “Dear eminent, highly popular Researcher A, I failed to replicate your study published in that high impact factor journal.”

Researcher A: “Researcher R, you are either incompetent or malicious. Also, I’d like to note that I don’t care for direct replications. I prefer conceptual replications, especially because I can identify dozens of conceptual replications of my work.”


Imagine an alternative universe in which Researcher A had a file of direct replications of the original findings. Then the conversation would go from a spitting match to something like this:

Researcher R: “Dear eminent, highly popular Researcher A, I failed to replicate your study published in that high impact factor journal.”

Researcher A: “Interesting. You didn’t get the same effect? I wonder why. What did you do?”

Researcher R: “We replicated your study as directly as we could and failed to find the same effect” (whether judged by p-values, effect sizes, confidence intervals, Bayesian priors, or whatever).

Researcher A: “We’ve reproduced the effect several times in the past. You can find the replication data on the OSF site linked to the original paper. Let’s look at how you did things and maybe we can figure this discrepancy out.”


That is a much different exchange than the ones we’ve seen so far, which have been dominated by conspicuous failures to replicate and, well, little more than vitriolic arguments over details, with little or no old or new data.

Of course, there will be protests. Some continue to argue for conceptual replications. This perspective is fine. And, let me be clear. No one to date has argued against conceptual replications per se. What has been said is that in the absence of strong proof that the original finding is robust (as in directly replicable), conceptual replications provide little evidence for the reliability and validity of an idea. That is to say, conceptual replications rock, if and when you have shown that the original finding can be reproduced.

And that is where being your own replicator is such an ingenious strategy. Not only do you inoculate yourself against the replicators, but you also bolster the validity of your conceptual replications in the process. That is a win-win situation.

And, of course, being your own direct replicator also addresses the argument that the replicators may be screw-ups. If you feel this way, fine. Be your own replicator. Show us you can get the goods. Twice. Three times. Maybe more. But, of course, make sure to pre-register your replication attempts; otherwise, some may accuse you of p-hacking your way to a direct replication.

It is also common, as noted, to see researchers respond to a failure to replicate by listing out sometimes dozens of small-sample conceptual replications of the original work. Why waste your time? The time spent crafting arguments about tenuous evidence could easily be spent conducting your own direct replication of your own work. Now that would be a convincing response. A direct replication is worth a thousand words—or a thousand conceptual replications.

Conversely, replication failures spur some to craft nuanced arguments about just what a replication is and whether there is anything that is really a “direct” replication and such. These are nice essays to read. But we’ll have time for these discussions later, after we show that some of our work actually merits discussion. Such intellectual discussions are little more than a waste of time when more than 50% of our research fails to replicate.

Some might want to argue that conducting our own direct replications would be an added burden to already inconvenienced researchers. But, let’s be honest. The JPSP publication arms race has gotten way out of hand. Researchers seemingly have to produce at least 8 different studies to even have a chance of getting into the first two sections of JPSP. What real harm would there be if you still did the same number of studies but just included 4 conceptually distinct studies each replicated once? That’s still 8 studies, but now the package would include information that would dissipate the fear of being replicated.

Another argument would be that it is almost impossible to get direct replications published. And that is correct. The only bias more foolish than the bias against null findings is the bias against the value of direct replications. As a result, it can be hard to get direct replications published in mainstream outlets. I have utopian dreams sometimes where I imagine our entire field moving past this bias. One can dream, right?

But this is no longer a real barrier. Some journals, or sections of journals, are actively fostering the publication of direct replications. Additionally, we have numerous outlets for direct replication research, whether formal ones, such as PLoS ONE or Frontiers, or less formal ones, such as PsychFileDrawer or the Open Science Framework. If you have replication data, it can find a home, and interested parties can see it. Of course, it would help even more if the data were pre-registered.

So there you have it. Be your own replicator. It is a quick, easy, entirely reasonable way of dispelling the inherent tension in the current replication crisis we are enduring.





Sample Sizes in Personality and Social Psychology

R. Chris Fraley

Imagine that you’re a young graduate student who has just completed a research project. You think the results are exciting and that they have the potential to advance the field in a number of ways. You would like to submit your research to a journal that has a reputation for publishing the highest caliber research in your field.

How would you know which journals are well regarded for publishing high-quality research?

Traditionally, scholars and promotion committees have answered this question by referencing the citation Impact Factor (IF) of journals. But as critics of the IF have noted, citation rates per se may not reflect anything informative about the quality of empirical research. A paper can receive a large number of citations in the short run because it reports surprising, debatable, or counter-intuitive findings regardless of whether the research was conducted in a rigorous manner. In other words, the citation rate of a journal may not be particularly informative concerning the quality of the research it reports.

What would be useful is a way of indexing journal quality that is based upon the strength of the research designs used in published articles rather than the citation rate of those articles alone.

In an article recently published in PLoS ONE, Simine Vazire and I attempted to do this by ranking major journals in social-personality psychology with respect to what we call their N-pact Factors (NF)–the statistical power of the studies they publish. Statistical power is defined as the probability of detecting an effect of interest when that effect actually exists. Statistical power is relevant for judging the quality of empirical research literatures because, compared to lower powered studies, studies that are highly powered are more likely to (a) detect valid effects, (b) buffer the literature against false positives, and (c) produce findings that other researchers can replicate. Although power is certainly not the only way to evaluate the quality of empirical research, the more power a study has, the better positioned it is to provide useful information and to make robust contributions to the empirical literature.
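As a rough sketch of the idea (my own illustration, not Fraley and Vazire’s actual procedure or code), an NF-style index can be approximated by estimating each published study’s power to detect a “typical” effect, assumed here to be r ≈ .20 (d ≈ .41), and then summarizing at the journal level; the sample sizes below are hypothetical.

```python
# Rough sketch of an N-pact-Factor-style index (illustration only, not Fraley &
# Vazire's code): estimate each published study's power to detect a "typical"
# effect size, then summarize power at the journal level.
import numpy as np
from statsmodels.stats.power import TTestIndPower

TYPICAL_D = 0.41   # assumed "typical" effect, roughly r = .20
solver = TTestIndPower()

# Hypothetical total sample sizes for two-group studies published in one journal.
study_total_ns = [40, 56, 80, 96, 120, 150, 220]

# Power of each study, assuming equal allocation across two groups.
power = np.array([
    solver.power(effect_size=TYPICAL_D, nobs1=n / 2, alpha=0.05,
                 alternative='two-sided')
    for n in study_total_ns
])

print("Per-study power:", np.round(power, 2))
print(f"Journal-level index (median power): {np.median(power):.2f}")
```

The same calculation run over different journals makes the contrast in the paper concrete: journals whose typical study is small will bottom out well below the conventional 80% target even under generous assumptions about effect size.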

Our analyses demonstrate that, overall, the statistical power of studies published by major journals in our field tends to be inadequate, ranging from 40% to 77% for detecting the typical kinds of effect sizes reported in social-personality psychology. Moreover, we show that there is considerable variation among journals; some journals tend to consistently publish higher power studies and have lower estimated false positive rates than others. And, importantly, we show that some journals, despite their comparatively high impact factors, publish studies that are greatly underpowered for scientific research in psychology.

We hope these rankings will help researchers and promotion committees better evaluate various journals, allow the public and the press (i.e., consumers of scientific knowledge in psychology) to have a better appreciation of the credibility of published research, and perhaps even facilitate competition among journals in a way that would improve the net quality of published research. We realize that sample size and power are not, and should not be, the gold standard in evaluating research. But we hope that this effort will be viewed as a constructive, if incomplete, contribution to improving psychological science.

Simine wrote a nice blog post about some of the issues relevant to this work. Please check it out.



Is It Offensive To Declare A Social Psychological Claim Or Conclusion Wrong?

By Lee Jussim

Science is about “getting it right” – this is so obvious that it should go without saying. However, there are many obstacles to doing so, some relatively benign (an honestly conducted study produces a quirky result), others less so (p-hacking). Over the last few years, the discussion of practices that lead us astray has focused primarily on issues of statistics, methods, and replication.

These are all justifiably important, but here I raise the possibility that other, more subjective factors distort social and personality psychology in ways that are at least as problematic. Elsewhere, I have reviewed what I now call questionable interpretive practices – how cherry-picking, double standards, blind spots, and the embedding of political values in research all lead to distorted conclusions (Duarte et al., 2014; Jussim et al., in press a, b).

But there are other interpretation problems. Ever notice how very few social psychological theories are refuted or overturned? Disconfirming theories and hypotheses (including failures to replicate, a subset of disconfirmation) should be a normal part of the advance of scientific knowledge. It is ok for you (or me, or Dr. I. V. Famous) to have reached or promoted a wrong conclusion.

In social psychology, this rarely happens. Why not? Many social psychologists seem to balk at declaring some claims “wrong.” This seems to occur primarily for three reasons. The first is that junior scholars, especially pre-tenure, may justifiably feel that potentially angering senior colleagues (who may later be called on to write letters for promotion) is not a wise move. That is the nature of the tenure beast, but it only explains the behavior of, at most, a minority. What about the rest of us?

The second reason is essentially social (i.e., not scientific). Declaring some scientific claim to be “wrong” is, I suspect, often perceived as a personal attack on the claimant. This probably occurs because it is impossible to declare some claim wrong without citing some article making the claim. Articles have authors, so declaring a claim wrong is tantamount to saying “Dr. Earnest’s claims are wrong.” This problem is further exacerbated by the fact that theories, hypotheses, and phenomena often become identified with either their originators or their apostles (prestigious researchers who popularize them). Priming social behavior? Fundamental attribution error? Bystander effect? System justification? Implicit racism? There are individual social psychologists associated with each of these ideas. To challenge the validity, or even the power or generality, of such ideas/effects/theories/hypotheses risks being interpreted as something more than a mere scientific endeavor – it risks being seen as a personal insult to the person identified with them. Thus, declaring a claim “wrong” risks being seen not as a scientific act of theory or hypothesis disconfirmation, but as a personal attack — and no one supports personal attacks.

The third reason is grounded in a distinctive philosophy of science perspective – namely, that almost every claim is true under some conditions (for explicitly articulated versions of this, see Greenwald, Pratkanis, Leippe, & Baumgardner, 1986; McGuire, 1973, 1983). As such, we have a great deal of research on the “conditions under which” some theory or hypothesis holds, but very little research providing wholesale refutation of a theory or hypothesis. I have heard apocryphal stories of prestigious researchers declaring (behind closed doors) that they only run studies to prove what they already know and that they can craft a study to confirm any hypothesis they choose. These apocrypha are not evidence – but the evidence of p-hacking in social psychology and elsewhere (e.g., Ioannidis, 2005; Simmons et al., 2011; Vul et al., 2009) raises the possibility that some unknown number of social psychologists conduct their research in a manner consistent both with these apocrypha and with the notion that everything is true under some conditions. If every claim is true under some conditions, then massive flexibility in methods and data analysis in the service of demonstrating almost any notion becomes, not a flaw to be rooted out of science, but evidence of the “skill” and “craftsmanship” of researchers, and of the “quality” of their research. In this context, declaring any scientific claim, conclusion, hypothesis, or theory “wrong” becomes unjustified. It reflects little more than ignorance of this “sophisticated” view of science, and arrogance, in the sense that no one, according to this view, can declare anything “wrong” because it is true under some conditions. As such, declaring some claim wrong can again be viewed as an offensive act.

The idea that claims cannot be “wrong” because “every claim is true under some circumstances” goes too far for two reasons. First, some claims are outright false, such as “the Sun revolves around the Earth.” Furthermore, even if two competing claims are both correct under some conditions, this does not mean they are equally true. Knowing that something is true 90% of the time is quite different from knowing it is true 10% of the time. Claiming that some phenomenon is “powerful” or “pervasive,” when the data show it is only rarely true, is wrong. Let’s say that, on average, stereotype biases in person perception are not very powerful or pervasive – which they are not (Jussim, 2012 – multiple meta-analyses yield an average estimate of r = .10 for such biases). Isn’t it better to point out that the field’s long history of declaring them powerful and pervasive is wrong (at least when the criterion is the field’s own data) than to just report the data without acknowledging its bearing on longstanding conclusions?

This reluctance to declare certain theories or hypotheses wrong risks leaving social psychology populated with a plethora of “… undead theories that are ideologically popular but have little basis in fact” (Ferguson & Heene, 2012, p. 555). This amusing phrasing cannot be easily dismissed – ask yourself, “Which theories in social psychology have ever been disconfirmed?” Indeed, Dr. Bruce Alberts, a former President of the National Research Council and Editor-in-Chief of Science, put it this way (quoted in The Economist, 2013):

“And scientists themselves need to develop a value system where simply moving on from one’s mistakes without publicly acknowledging them severely damages, rather than protects, a scientific reputation.”

I agree. It is ok to be wrong. In fact, if one engages in enough scientific research for a long enough period of time, one is almost guaranteed to be wrong about something. Good research at its best can be viewed as systematic, creative, and informed trial and error. But that includes … error! Both being wrong sometimes, and correcting wrong claims are integral parts of healthy scientific processes.

Furthermore, from a prescriptive standpoint of how science should proceed, I concur with Popper’s (1959/1968) notion that we should seek to disconfirm theories and hypotheses. Ideas left standing in the face of strong attempts at disconfirmation are those most likely to be robust and valid. Thus, rather than being something we social psychologists should shrink away from, bluntly identifying which theories and hypotheses do not (and do!) hold up to tests of logic and existing data should be a core component of how we conduct our science.



Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (2014). Political diversity will improve social psychological science. Manuscript that I hope is on the verge of being accepted for publication.

The Economist (October 19, 2013). Trouble at the lab. Retrieved on 7/8/14 from:

Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories: Publication bias and psychology’s aversion to the null. Perspectives on Psychological Science, 7, 555-561.

Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93, 216-229.

Ioannidis, J. P. A. (2005).  Why most published research findings are false. PLOS Medicine, 2, 696-701.

Jussim, L. (2012). Social perception and social reality: Why accuracy dominates bias and self-fulfilling prophecy. NY: Oxford University Press.

Jussim, L., Crawford, J. T., Anglin, S. M., & Stevens, S. T. (In press a). The politics of social psychological science II: Distortions in the social psychology of liberalism and conservatism. To appear in J. Forgas, K. Fiedler, & W. Crano (Eds.), Sydney Symposium on Social Psychology and Politics.

Jussim, L. Crawford, J. T., Stevens, S. T., & Anglin, S. M. (In press b). The politics of social psychological science I: Distortions in the social psychology of intergroup relations. To appear in P. Valdesolo & J. Graham (Eds.), Bridging Ideological Divides: Claremont Symposium on Applied Social Psychology.

McGuire, W. J. (1973). The yin and yang of progress in social psychology: Seven koan. Journal of Personality and Social Psychology, 26, 446-456.

McGuire, W. J. (1983). A contextualist theory of knowledge: Its implications for innovation reform in psychological research. Advances in Experimental Social Psychology, 16, 1-47.

Popper, K. R. (1959/1968). The logic of scientific discovery. New York: Harper & Row.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011).  False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274-290.


An apology and proposal

Brent W. Roberts

My tweet, “Failure to replicate hurting your career? What about PhDs with no career because they were honest,” was taken by some as a personal attack on Dr. Schnall. It was not, and I apologize to Dr. Schnall if it was taken that way. The tweet was in reference to the field as a whole, because our current publication and promotion system does not reward the honest design and reporting of research. And this places many young investigators at a disadvantage. Let me explain.

Our publication practices reward the reporting of optimized data—the data that look the best or that could be dressed up to look nice through whatever means necessary. We have no choice, given the way we incentivize our publication system. That system, which punishes null findings and rewards only statistically significant effects, means that our published science is not currently an honest portrait of how our science works. The current rash of failures to replicate famous and not-so-famous studies is simply a symptom of a system that is in dire need of reform. Moreover, students who are unwilling to work within this system—to be honest about their failures to replicate published work, for example—are punished disproportionately. They wash out, get counseled into other fields, or simply choose to leave our field of their own accord.

Of course, I could be wrong. It is possible that the majority of researchers publish all of their tests of all of their ideas somewhere, including their null findings. I’m open to that possibility. But, like many hypotheses, it should be tested, and I have an idea for how to test it.

Take any one of our flagship journals and, for one year, follow a publication practice much like the one used for the special replication issue just published. During that year, the editors agree to review and publish only manuscripts that 1) have been pre-registered, 2) describe only their introduction, methods, and planned analyses, not their results, and 3) contain at least one direct replication of each unique study presented in any given proposed package of studies. The papers would be “accepted” based on the elegance of the theory and the adequacy of the methods alone. The results would not be considered in the review process. Of course, the pre-registered studies would be “published” in a form where readers would know that the idea was proposed even if the authors do not follow through with reporting the results.

After a year, we can examine what honest science looks like. I suspect the success rate for statistically significant findings will go down dramatically, but that is only a hypothesis. More generally, think of the impact this would have on our field and on science in general. The journal that takes up this challenge would have the chance to show the field, and the world, what honest science looks like. It would be held up as an example for all fields of science of exactly how the process works, warts and all. And, if I’m wrong, if at the end of that year the science produced in that journal looks exactly like the pages of our current journals, I’ll not only apologize to the field, I’ll stop tweeting entirely.


Additional Reflections on Ceiling Effects in Recent Replication Research

By R. Chris Fraley

In her commentary on the Johnson, Cheung, and Donnellan (2014) replication attempt, Schnall (2014) writes that the analyses reported in the Johnson et al. (2014) paper “are invalid and allow no conclusions about the reproducibility of the original findings” because of “the observed ceiling effect.”

I agree with Schnall that researchers should be concerned with ceiling effects. When there is relatively little room for scores to move around, it is more difficult to demonstrate that experimental manipulations are effective. But are the ratings so high in Johnson et al.’s (2014) Study 1 that the study is incapable of detecting an effect if one is present?


To address this question, I programmed some simulations in R. The full details of the simulations are available online, but here is a summary of some of the key results (a simplified sketch of the simulation logic follows the list below):

  • Although there are a large number of scores on the high end of the scale in the Johnson et al. Study 1 (I’m focusing on the “Kitten” scenario in particular), the amount of compression that takes place is not sufficient to undermine the study’s ability to detect genuine effects.
  • If the true effect size for the manipulation is relatively large (e.g., Cohen’s d = -.60; see Table 1 of Johnson et al.), but we pass that through a squashing function that produces the distributions observed in the Johnson et al. study, the effect is still evident (see the Figure for a randomly selected example from the thousands of simulations conducted). And, given the sample size used in the Johnson et al. (2014) report, the authors had reasonable statistical power to detect it (70% to 84%, depending on exactly how things get parameterized).
  • Although it is possible to make the effect undetectable by compressing the scores, this requires either (a) that we assume the actual effect size is much smaller than what was originally reported, (b) that the scores be compressed so tightly that 80% or more of participants endorsed the highest response, or (c) that the effect work in the opposite direction of what was expected (i.e., that the manipulation pushes scores upward toward rather than away from the ceiling).
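For readers who just want the gist of the simulation logic (the original simulations were written in R and are more thorough; the 0-10 scale, the control mean of 9.3, and the latent d = -0.60 below are illustrative assumptions of mine, not the parameters of the original simulations), here is a minimal Python sketch: draw latent responses with a true group difference, squash them onto a bounded rating scale so that many participants land at the ceiling, and check how often a t-test on the squashed scores still detects the effect.

```python
# Minimal sketch of the ceiling-effect logic (illustrative only; the original
# simulations were written in R and explored many more parameter settings).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_once(n_per_group=100, d=-0.60, scale_max=10, control_mean=9.3):
    """One two-group study: latent effect of size d, responses rounded and
    clipped to a 0..scale_max rating scale so many scores pile up at the top."""
    control = rng.normal(loc=control_mean, scale=1.0, size=n_per_group)
    treated = rng.normal(loc=control_mean + d, scale=1.0, size=n_per_group)
    squash = lambda x: np.clip(np.round(x), 0, scale_max)
    c, t = squash(control), squash(treated)
    at_ceiling = np.mean(np.concatenate([c, t]) == scale_max)
    return stats.ttest_ind(t, c).pvalue, at_ceiling

results = [simulate_once() for _ in range(2000)]
pvals, ceiling = (np.array(x) for x in zip(*results))
print(f"Average share of responses at the ceiling: {ceiling.mean():.2f}")
print(f"Power to detect the latent d = -0.60 effect: {np.mean(pvals < 0.05):.2f}")
```

The point of the exercise is simply that a substantial pile-up of scores at the top of the scale does not, by itself, drive power to zero; how much damage the ceiling does depends on how extreme the compression is and how large the underlying effect is assumed to be.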

In short, although the Johnson et al. (2014) sample does differ from the original in some interesting ways (e.g., higher ratings), I don’t think it is clear at this point that those higher ratings produced a ceiling effect that precludes their conclusions.
