Is the Research World Facing a Crisis of Reproducibility?
By Marianne Evola
A few days ago, a colleague sent me a link to an article that addressed the reproducibility crisis in science. The article described a survey conducted by Nature to assess scientists' perceptions and experiences regarding whether there is a reproducibility crisis in science. The article is worth perusing, but its title, "Is there a reproducibility crisis?", echoed a question that I have been pondering for a while. Is there really a crisis, or are we merely perceiving one when, in fact, we are observing the natural self-correcting process of the research literature? And if there are low levels of reproducibility in the published literature, is this a new problem, or has research always had difficulties with reproducibility? Most importantly, whether or not there is a reproducibility crisis, how do we improve research practice to increase the reproducibility of research?
Periodically, while watching tragedy and devastation on the news, I get the feeling that the world is falling apart and, like many people, I crave "the good old days". Later, I begin to wonder whether the world is actually so much worse than it was in "the good old days". Or are we simply inundated with horrible news from across the globe, and, as humans, are we unable to process negative news reports 24 hours a day, 365 days per year? I'm sure that there are data that argue for one view or another, but I'm not really interested in an answer. Similarly, when we hear repeated reports about the failures of science, are things worse today, or are we simply paying attention to issues such as reproducibility, misconduct and responsibility in a manner that our academic ancestors did not? The focus on responsible research and research ethics in the research world is relatively recent, dating back only about 20 to 30 years. Thus, with little data older than 20 years, it is difficult to say whether our rates of reproducibility differ markedly from historical rates. That being said, we don't need to know whether things are worse or better in order to address weaknesses in the practices and culture of the research world and improve the reproducibility of published research.
So, what is the proposed reproducibility crisis, and what did the above-referenced article say about researchers' perspectives on whether we are facing one? The reproducibility crisis has been referenced frequently in recent discussions of the research world. The concern is that a great deal of published research cannot be replicated. The concerns associated with this proposed abundance of irreproducible research include wasted resources, sloppy science, poor researchers, misconduct and a research record that cannot be trusted. I often talk to students about the research world being a community of trust. I explain that when we read the literature, we trust that the research was appropriately conducted, analyzed and represented in publication. Because we trust the literature, we can build on the published ideas of our colleagues. If we stop trusting the literature, research progress will be halted. Thus, preserving trust in the research community is critical for scientific progress. Recently, however, several publications have raised questions about the reliability of the published literature. One example is the Begley & Ellis (2012) report on the poor reproducibility of pre-clinical cancer research, which found that the reproduction rate for pre-clinical studies examining drug treatments for cancer was approximately 10 percent. A few years later, the Open Science Collaboration (2015) attempted to replicate 100 published studies in psychological science and reported that fewer than half of them could be reproduced. These reports have led to a focus on creating research strategies that will enhance our rates of reproducibility. In fact, NIH now requires researchers to describe strategies that will enhance the rigor and reproducibility of their research when they submit project applications. I fully support developing strategies that enhance the reproducibility of research results.
However, I'm hesitant to accept that we are facing a reproducibility crisis. There are a lot of reasons that research replications can fail, and seldom does the explanation involve poor research practices or research misconduct.
Years ago, I was at a conference and a senior colleague approached me to discuss problems that he was having with a classic and simple behavioral research assay. He was trying to study the reinforcing effects of drugs in the Rat Runway model (Ettenberg et al., 1981). My colleague studied drugs of abuse, but his primary training was not in behavior, so he was having difficulty using the behavioral assay effectively. Even though the assay had been used successfully by many labs and had repeatedly been reported to be an effective way to measure the reinforcing effects of drugs of abuse, my colleague could not demonstrate the effect in his lab. We sat and discussed the many behavioral parameters that could be affecting his experiments. As we talked, he realized that there were many factors that he had not considered when he began his work. He had purposely selected a seemingly simple assay but had not considered all the controls that should be addressed when conducting his experiments. As a result, his initial attempts to replicate classic experiments had failed. They did not fail because the assay had been misrepresented in the literature. Nor did they fail because his team was not following the procedures that had been reported in the literature. They failed because his team was missing rudimentary information that is part of the basic training for behavioral researchers. Ultimately, after considering all the factors that were contributing variability to his work, he returned home and successfully used the assay in his research.
There are many reasons that it can be difficult to reproduce published research results, and none of them are worthy of the label "crisis". Research is becoming increasingly specialized, and thus technical research skills are also becoming specialized. Many of those skills cannot be described in the literature. For example, when I was a brand new graduate student, I watched our lab technician give subcutaneous injections to rats without restraining them in any way, and not one rat resisted her handling or even reacted to the injection. I thought, "Wow, that's easy". Then it was my turn. Contrary to my naïve expectations, the rats struggled to get away, tried to bite me and vocalized their displeasure as I tried to hold them still for the injections. Our technician did her best to show me how to replicate her handling procedures, but I had a lot to learn about giving rats stress-free subcutaneous injections. Years and thousands of injections later, my student assistants would bring me "impossible" rats, and I would inject them effortlessly as the exasperated students watched in awe. Stable behavioral research relies on minimizing stress to animals, so stress-free injections contribute markedly to solid research. But the only way to become good at injecting animals is to practice, repeatedly. Thus, a researcher who is new to injection procedures may fail to replicate a study conducted by a seasoned researcher simply because the seasoned researcher works with unstressed animals, while the new researcher has not yet developed the technique needed to attain similarly low stress levels in their colony. These experiential differences can also contribute to differing results within a research group when the personnel performing research procedures have different levels of experience.
Even if a research group takes great pains to ensure that all personnel are trained, some skills come only from years of repetition. Nor can you effectively teach someone how to administer a stress-free injection to a rat by describing the technique in a publication. No doubt, there are countless research skills that require a great deal of practice to master, yet few labs can afford to have personnel practice to the level of mastery before they ever engage in a research project. Differences in skill development alone likely contribute to reproduction failures in research, whether between teams, within a team or even for an individual as their skills develop.
Practical limitations on one's ability to control extraneous factors may also contribute to research variability and reproduction failures. The factors that can be controlled in animal research are countless: caging, lighting, temperature, humidity, light cycles, circadian rhythms, diet, littermates, age at delivery, the method for transporting animals around the lab, odors or noise in the housing colony or experimental setting, the experience of experimenters, and the list goes on. All of these factors can affect the outcome of a study. We try to report all of the factors that we control in our publications. However, random occurrences that we cannot control could also affect our results, and we may not report them because we never knew they happened. Most labs struggle with how to address random mechanical or power failures that disrupt the environmental controls of the housing colony. Do you continue with the work when you have already invested a lot of time and money in it? Do you delay experiments? Are the animals stressed? Will it affect the experimental results? What if the failure occurred over the weekend, your team was not around, and no one knows there were problems that could affect the study? When you consider all of the factors we attempt to control in animal research, you realize how many factors can influence the research. Furthermore, you begin to realize how easily the reproducibility of a study can be compromised.
In addition, when you consider all the factors that we try to control when we work with animals, you also realize how little can be controlled by researchers who work with human research participants. Human subjects researchers can control their experiments, but they cannot control their subjects' histories, living conditions, experiences on the way to the study or diet; in fact, very little about their research subjects can be controlled. And, as with animal subjects, many of these factors can affect the results of research. Thus, factors that have nothing to do with the experimental manipulations could, in fact, influence the outcome of the research. It is easy to see how extraneous factors can contribute to failures to reproduce research studies.
Although I find it interesting to ponder the proposed reproducibility crisis, I don't think it is effective to focus on pointing fingers and placing blame. Nor do I find it useful or accurate to describe it as a crisis, because we know very little about historical versus current rates of irreproducibility in research. Rather, assessing reproducibility is useful if it is done in a manner that enables us to improve our research and publication practices. It is always beneficial to improve our work, but little is accomplished by attacking the research community.
Marianne Evola is a senior administrator in the Responsible Research area of the Office of the Vice President for Research. She is a monthly contributor to Scholarly Messenger.