Over the past few weeks, I’ve had several seemingly benign conversations with students about research, only to realize in retrospect that we had blithely engaged in, or proposed to engage in, several of the questionable research practices (QRPs) that lead to inflated Type 1 error rates. Having spent so much time stewing on these issues, I found the experience rather deflating. That our students could escape our graduate program without the clear message that these practices are problematic seemed a clear indication of pedagogical failure. That I could fail to teach these issues effectively was an indication that my own standard operating procedures were sorely lacking. So, I thought it would be constructive to post some readings that, if consumed, would mean that anyone still engaging in QRPs would be doing so with eyes wide shut.
A Little Background
In a perfect world, most of these issues would be covered in the basic methods course that all grad students take. A typical methods course covers issues such as internal validity, external validity (generalizability), construct validity, and statistical conclusion validity, along with topics such as ethics and the various techniques one uses in research, such as within-subjects designs, experience sampling, or growth modeling. Given the demands of a typical methods course, many issues cannot be covered in the detail they deserve. Also, we leave a lot of methods teaching to On the Job Training (OJT): we are supposed to teach our students how to do things properly as they conduct their research. Something in this combination has not gone as well as could be expected, hence the need for some supplementation.
It is clear from the blow-up of methodological imbroglios in social and personality psychology that most of our problems arise from either willful ignorance or Machiavellian abuse of null hypothesis significance testing, combined with the use of QRPs. So, the list below emphasizes the niceties of statistical conclusion validity and how to avoid QRPs, to the detriment of topics like external and construct validity.
Null Hypothesis Significance Testing (NHST), or as Cohen describes it, Statistical Hypothesis Inference Testing (SHIT)
Fraley, R. C., & Marks, M. J. (2007). The null hypothesis significance-testing debate and its implications for personality research. In R. W. Robins, R. F. Krueger, & R. C. Fraley (Eds.), Research Methods in Personality Psychology (Chap. 9, pp. 170-189). New York, NY: Guilford Press.
Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M. (2001). Psychological testing and psychological assessment. American Psychologist, 56, 128-165.
What not to do (kind of like What Not to Wear)
LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study in deficiencies in modal research practice. Review of General Psychology, 15, 371-379.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.
As I’ve noted elsewhere, I’m not entirely optimistic that reading this material is enough to protect us from conducting problematic research. The fact that we have known for five decades (yes, 50 years since Cohen’s 1962 paper) that our studies are underpowered, and yet we still conduct underpowered research, could be taken as evidence that we are impervious to influence. That said, if you do read and understand these papers and continue to blithely run underpowered studies and p-hack your way to fame and fortune, at least you do it with the knowledge that your papers may make you famous in the short run but fade away in the long run.
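The kind of Type 1 error inflation these papers describe is easy to demonstrate yourself. Here is a minimal simulation sketch (not from any of the readings above) of one classic QRP, optional stopping: peeking at the data after every batch of participants and stopping as soon as p < .05. For simplicity it assumes a two-group z-test with known variance; all function names and parameters are illustrative.

```python
import math
import random
from statistics import NormalDist

def two_sample_z_p(a, b):
    """Two-sided p-value for a two-sample z-test, assuming known sigma = 1."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def optional_stopping_run(rng, start_n=10, step=5, max_n=50, alpha=0.05):
    """One simulated study under a true null: test after every batch of
    participants, stop and declare 'significance' as soon as p < alpha."""
    a = [rng.gauss(0, 1) for _ in range(start_n)]
    b = [rng.gauss(0, 1) for _ in range(start_n)]
    while True:
        if two_sample_z_p(a, b) < alpha:
            return True  # false positive: no true effect exists
        if len(a) >= max_n:
            return False  # gave up; correctly non-significant
        a += [rng.gauss(0, 1) for _ in range(step)]
        b += [rng.gauss(0, 1) for _ in range(step)]

rng = random.Random(42)
runs = 2000
rate = sum(optional_stopping_run(rng) for _ in range(runs)) / runs
print(f"False-positive rate with optional stopping: {rate:.3f}")  # well above the nominal .05
```

With nine looks at the data, the realized false-positive rate lands in the neighborhood of two to three times the nominal 5% level, even though every individual test was run "at" alpha = .05. This is exactly the undisclosed-flexibility problem Simmons and colleagues describe.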
In closing, this list is clearly idiosyncratic and incomplete. Feel free to plug your favorite classic methods papers. It can’t hurt, can it?