It is not easy to build a clock. When my daughter got married, I decided to build a clock as an heirloom wedding present. An accomplished designer, Clayton Boyer, offers a variety of plans for clocks that can be built almost entirely out of wood.1 His website and linked YouTube videos show ample examples of others who have built these delicate machines successfully and in several artistic iterations. Presented with the set of plans, which I considered akin to a validated methods section, I set out to build a wooden clock.
When dissected into its principal components, a clock is not that complicated. First, there is a train of gears. Its role is to transmit the force generated by the weights that power the clock. The last gear drives the escapement that, in a stop-and-go motion, synchronizes all movement to the pendulum. Swinging back and forth at its own leisurely pace, the pendulum sets the time. Finally, there is another train of gears with the sole purpose of running the hour and minute hands at the appropriate ratio. These fundamental principles that underlie Boyer’s plans are beautifully described in Ward Goodrich’s book The Modern Clock published in 1905 (not a typo),2 which I read cover-to-cover. A living cell is much more complicated.3
Over a few months’ period preceding the wedding, my evenings were filled with cutting each tooth of every wheel freehand on a scroll saw and filing the pieces to perfection—or so I thought. With a few weeks to spare, I finally put all of the pieces together, added the weights and pendulum, and there it finally was, a beautiful machine that did…nothing! The gears did not move; the escapement did not click, and the pendulum quickly came to an agonizing stop. I clearly had failed to reproduce prior work. It took weeks of error hunting, filing this and that tooth to a more befitting shape, and redoing an entire wheel where the wood had warped ever so slightly. As time grew shorter, each day I took the clock apart, then reassembled it, and put it through another test. Finally, a few days before the wedding, it started running with a resounding and robust tick-tock—it was music to my ears, and I stared at its hypnotic motions for a long time!
As this example illustrates, it is far more difficult to reproduce positive results than to declare failure. Boyer’s plans were virtually perfect, yet they left room for me to introduce errors that needed to be hunted down and solved. Success required tenacity and diligence and, perhaps most importantly, the conviction (in this case based on a few hundred years of evidence) that it can be done. In the end, I developed the prerequisite expertise, a Fingerspitzengefühl (the delicate feelings we sense with the tips of our fingers) for the task at hand. Even then, when I built a second, different clock for my other daughter, it required a similar effort of fine-tuning.
Responding to the Issue
Reproducibility in scientific research has recently become a hot topic of immense importance to our community. Reports on the unreliability of published research have severely tarnished the image of our profession, as exemplified by an article entitled “How science goes wrong” published in The Economist in 2013.4 Publishing bad science is bad for all of us.
As a community we need to realize the importance of this issue and the major efforts that are underway to respond. In 2014, the ASCB convened a task force that issued a white paper,5 which includes tangible and constructive recommendations on how to improve current publication methods and standards, to ensure the publication of high-quality data. This is an excellent, must-read document, and I urge every researcher and publisher to implement its recommendations.
There is universal agreement that every paper has to contain a meaningfully complete, detailed methods section that allows others to replicate the work. Yet papers continue to be published that do not, even in the very journals that deeply engage in the dialog on reproducibility. A recent example from my area of research concerns the first report of a crystal structure of eIF2B, a huge protein complex that serves as a translation initiation factor. Published in Nature,6 this is a nicely written and exciting manuscript that conveys a major discovery in the field. But a careful reader discovers that the experimental conditions for crystal growth are missing, being referred to as “submitted.” Thus there is no access to the most basic information describing how this work was done until (and unless) a follow-up publication becomes publicly available. In this particular case, the situation was amicably resolved: My graduate students contacted the authors requesting the missing information and, after a brief back-and-forth, we promptly received a preprint of the submitted work including all of the missing information. But I am at a loss to explain how, in this day and age and in the midst of intense reproducibility discussions, the reviewers of this paper and the professional editorial team at Nature could have endorsed publication of this work with such a critical omission.
Another important and often overlooked aspect of introducing increased rigor is to remove the stigma and barriers associated with publishing negative data—i.e., data that fail to reproduce published results. Some prominent, reputable journals and publishing platforms, including PLOS ONE and F1000Research, already support and encourage such communication. Similarly, the “Reproducibility Project: Cancer”—in conjunction with the journal eLife—addresses the issue squarely.7 A team of scientists has selected a set of high-profile (i.e., highly cited) papers on cancer research with the aim of conducting an open reproducibility study. For this study, they will attempt to “conduct the experimental procedure as closely as possible to the original experiment using the same material and instrumentation, if available.” Importantly, “the replication protocol requires the core team to contact the original corresponding author to request materials and any available information that could improve the quality of the replication attempt.” This project will be conducted with complete transparency. By contrast, in 2012 Amgen researchers published that they had been unable to reproduce 47 of 53 landmark cancer papers,8 but they kept the data that may—or may not—support these alarming conclusions tightly under wraps. Before we take statements of this sort at face value, we need to evaluate the data and understand what is meant by “non-reproducible.”
The Need for Rigor in Assessing Reproducibility
The committee that wrote the ASCB White Paper on Reproducibility adopted a very insightful, multi-tier definition of reproducibility5:
- Analytical Replication attempts to reproduce the results from the same original data via reanalysis.
- Direct Replication attempts to reproduce the same results using the same conditions, materials, and methods as the original experiments.
- Systematic Replication aims to obtain the same finding of a given publication, but under different conditions (e.g., a different cell line, mouse strain, etc.).
- Conceptual Replication aims to demonstrate the validity of a concept or a finding using a different paradigm (e.g., in a divergent species).
This differentiation is important. In particular we need to keep in mind that Systematic and Conceptual Replication do not address the question of whether the work being replicated accurately reported on the outcome of a specific set of experiments. Instead, the question being asked is how far the conclusions of the original finding can be generalized. A finding can be correct in neurons but not in fibroblasts; it may be correct in mice but not translate to human biology. In the most extreme positive case—the “holy grail of validation”—a finding will be universally true. For example, after many decades of work, there is little doubt that ribosomes make proteins. Here is my bottom line: Any claim that a particular scientific finding is “not reproducible” must specify what is meant by this statement. Otherwise such claims threaten to become dangerous sniping, feeding analyses such as one that concluded that $28 billion is wasted each year in the United States on preclinical research that is not reproducible.9
Let me reiterate: It is much harder to replicate than to declare failure. What if I had used a different species of wood for making the clock than Boyer used? It might be more brittle, causing the teeth of the gears to break, or it might expand irregularly or warp as humidity changes. The clock would likely have failed. What if the antibody used for Western blotting is from a different bleed and no longer works as it used to? Or what if a company changed, unannounced to anyone, the composition of proprietary “Buffer A” in its experimental kit? There are literally thousands of variables that enter into every one of the complex experiments that characterize today’s biomedical research. It takes Fingerspitzengefühl and tenacity to work through these issues constructively. Resolving any replication issue requires active communication and open reagent exchange, as the authors of the Reproducibility Project: Cancer so appropriately state in their mission.7
Earlier this year, researchers at Amgen posted for open peer review on the F1000Research website an article that concludes that they “were unable to confirm a robust role for [the ubiquitin-specific protease] USP14 in Tau or TDP-43 degradation.”10 This submitted paper questions the results of a previous study by Dan Finley’s group at Harvard Medical School,11 which argued that inhibition of USP14 could enhance degradation of proteasomal substrates that are associated with neurodegenerative disease. Even though this was a preprint, not yet published, the message was quickly amplified in a Nature News section12 under the alarmist title “Biotech giant publishes failures to confirm high-profile science.”
At first glance, this case appears to be another example where academic research produces unreliable and misleading conclusions. However, at least in this case, there seem to be problems with the negative data reported by Amgen. First, the Amgen researchers used different expression systems, yet protein expression levels were not compared with those in the original report. This is important as every assay has an intrinsic dynamic range and proteolytic systems can be saturated. Second, the siRNA knock-down experiments shown left 25% or more of USP14 behind, and nothing can be concluded from low-efficiency knock-down experiments that yield negative results. Third, results from the original work are strongly supported in publications by others13,14 (as pointed out by Thomas Kodadek in the open review of the Amgen work), but these publications were neither cited nor discussed.
In this case it seems that conclusions drawn concerning irreproducibility are not convincing. This experience demonstrates the importance of submitting non-confirming results to peer review. Sasha Kamb, the head of research discovery at Amgen, states “we believe that interested scientists can look at our methods and results and draw their own conclusions.”12 Unfortunately, the Amgen researchers did not communicate with the Harvard group to resolve their discrepancies. Amgen also made it a condition that science journalists not contact the authors of the original work before the preprint was posted.15 The format at F1000Research now encourages the original authors to comment during its open review process of work disputing their conclusions, and we can hope a delayed dialog is still forthcoming.
What I am trying to emphasize is that substantial scrutiny and dialog are essential to assess the validity of irreproducibility claims. Yet most readers of the popular media, including the general public and our politicians who make funding decisions, hear that we have a rampant reproducibility crisis in academic research. They hear that scientists cannot be trusted and that research funds are being wasted. Generic claims that some work is “not reproducible” are harmful; they can be devastating to a project and funding, and even derail entire careers.
As a research community we have work to do—we need to continuously improve our ways of describing, standardizing, and sharing our methods and reagents, and we need to enable open discussion and responsible, rigorous publication of results, be they positive or negative. Engaging stakeholders in the assessment of results that refute prior findings is to me an essential ingredient in any recipe for success, but for this to work, scientists must be willing and supportive in helping others to reproduce their findings successfully. At best such dialogs may resolve the issue. At the least, I feel that journals publishing negative results must solicit and include comments and possible explanations by the authors of the original work.
And we must aspire to the same high standards for papers publishing negative results as we do for those publishing positive advances.
It takes hard work to make a clock, and it is much harder still to make it run.
Questions and comments are welcome and should be sent to email@example.com.
1Boyer C. www.lisaboyer.com/Claytonsite/Claytonsite1.htm.
2Goodrich WL (1905). The Modern Clock. Chicago: Hazlitt & Walker.
3Alberts B et al. (2015). Molecular Biology of the Cell. 6th ed. New York: Garland Science.
4Anonymous (Oct 19, 2013). How science goes wrong. The Economist.
5American Society for Cell Biology (2015). How can scientists enhance rigor in conducting basic research and reporting research results? A white paper from the American Society for Cell Biology. www.ascb.org/reproducibility.
6Kashiwagi K et al. (2016). Crystal structure of eukaryotic translation initiation factor 2B. Nature 531, 122–125.
7Errington TM et al. (2014). An open investigation of the reproducibility of cancer biology research. eLife 3:e04333.
8Begley CG, Ellis LM (2012). Drug development: raise standards for preclinical cancer research. Nature 483, 531–533.
9Freedman LP et al. (2015). The economics of reproducibility in preclinical research. PLOS Biol. 13:e1002165.
10Ortuno D et al. (2016). Does inactivation of USP14 enhance degradation of proteasomal substrates that are associated with neurodegenerative diseases? F1000Research 5, 137.
11Lee BH et al. (2010). Enhancement of proteasome activity by a small-molecule inhibitor of USP14. Nature 467, 179–184.
12Baker M (2016). Biotech giant publishes failures to confirm high-profile science. Nature 530, 141.
13Homma T et al. (2015). Ubiquitin specific protease 14 modulates degradation of cellular prion protein. Sci Rep 5, 11028.
14McKinnon C et al. (2016). Prion-mediated neurodegeneration is associated with early impairment of the ubiquitin-proteasome system. Acta Neuropathol. 131, 411–425.
15Kaiser J (2016). Calling all failed replication experiments. Science 351, 548.