By R. Chris Fraley
In her commentary on the Johnson, Cheung, and Donnellan (2014) replication attempt, Schnall (2014) writes that the analyses reported in the Johnson et al. (2014) paper “are invalid and allow no conclusions about the reproducibility of the original findings” because of “the observed ceiling effect.”
I agree with Schnall that researchers should be concerned with ceiling effects. When there is relatively little room for scores to move around, it is more difficult to demonstrate that experimental manipulations are effective. But are the ratings so high in Johnson et al.’s (2014) Study 1 that the study is incapable of detecting an effect if one is present?
To address this question, I programmed some simulations in R. The details of the simulations are available at http://osf.io/svbtw, but here is a summary of some of the key results:
- Although a large number of scores fall at the high end of the scale in Johnson et al.’s Study 1 (I’m focusing on the “Kitten” scenario in particular), the amount of compression that takes place is not sufficient to undermine the study’s ability to detect genuine effects.
- If the true effect size for the manipulation is relatively large (e.g., Cohen’s d = -.60; see Table 1 of Johnson et al.) and we pass it through a squashing function that produces the distributions observed in the Johnson et al. study, the effect is still evident (see the Figure for a randomly selected example from the thousands of simulations conducted). Moreover, given the sample size used in the Johnson et al. (2014) report, the authors had reasonable statistical power to detect it (70% to 84%, depending on exactly how the simulation is parameterized).
- Although it is possible to make the effect undetectable by compressing the scores, this requires one of three things: (a) that the actual effect size be much smaller than what was originally reported, (b) that the scores be compressed so tightly that 80% or more of participants endorse the highest response, or (c) that the effect work in the opposite direction of what was expected (i.e., that the manipulation push scores upward toward the ceiling rather than away from it).
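For readers who want to get a feel for the logic, here is a minimal sketch of this kind of simulation. It is not the R code posted at the OSF link, and it does not reproduce the numbers above. It is written in Python using only the standard library, and several things in it are assumptions made purely for illustration: the 0 to 9 response scale, the latent standard deviation of 2 scale points, the placeholder sample size of 100 per group, and the use of simple rounding and clipping as the "squashing function." The significance test is a Welch t statistic compared against 1.96, which is a large-sample approximation to the two-tailed .05 cutoff.

```python
import math
import random
import statistics

# Assumed response scale, chosen for illustration only.
SCALE_MIN, SCALE_MAX = 0, 9


def to_rating(latent):
    """Round a latent score to the nearest scale point and clip it to the scale.

    Clipping at SCALE_MAX is what produces the pile-up at the ceiling.
    """
    return min(SCALE_MAX, max(SCALE_MIN, round(latent)))


def welch_t(a, b):
    """Welch's t statistic for two independent samples.

    Returns 0.0 if both samples are constant (zero standard error).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se


def estimate_power(d, control_mean, sd=2.0, n_per_group=100, n_sims=500, seed=1):
    """Estimate the share of simulated studies that reach |t| > 1.96.

    d            : true standardized effect on the latent (unsquashed) scale
    control_mean : latent mean of the control group, in scale points;
                   moving it toward SCALE_MAX puts more scores at the ceiling
    sd, n_per_group, n_sims : hypothetical values, not taken from any study
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [to_rating(rng.gauss(control_mean, sd)) for _ in range(n_per_group)]
        treated = [to_rating(rng.gauss(control_mean + d * sd, sd)) for _ in range(n_per_group)]
        if abs(welch_t(treated, control)) > 1.96:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # A latent control mean of 8 with sd 2 puts roughly 40% of control
    # scores at the top of the assumed scale.
    print("true d = -0.60:", estimate_power(d=-0.60, control_mean=8.0))
    print("true d =  0.00:", estimate_power(d=0.0, control_mean=8.0))
```

Raising `control_mean` pushes more of the control group onto the highest response, shrinking `d` makes the true effect smaller, and flipping the sign of `d` moves the manipulation toward the ceiling instead of away from it, so each of the three scenarios in the last bullet can be explored by changing a single argument.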
In short, although the Johnson et al. (2014) sample does differ from the original in some interesting ways (e.g., higher ratings), I don’t think it is clear at this point that those higher ratings produced a ceiling effect that precludes their conclusions.