The Perils of Reviewing Peer Review
As a recovering Federal employee, I recognize that one of the biggest challenges the government faces in funding science is being truly Darwinian in a rapidly evolving scientific environment. Engineering natural selection doesn’t come easily to government funding agencies, which often play with one hand tied behind their backs.
I served in the government at NIH as a senior scientific executive. Now, as Executive Director of ASCB, an independent scientific society, I must look at things from the outside while remaining an integral member of the scientific community. From this vantage point, I can see how even experienced scientists can fail to appreciate how the dedicated NIH employees must struggle to do the right thing. Evaluating proposals is like dancing on a floor overgrown with rules and red tape; it often seems more like a game of Twister than a waltz.
Perhaps one of the biggest challenges facing NIH is making the peer review system serve today’s science—not yesterday’s. For NIH, the charge is to deliver better health outcomes five, ten, and twenty years from today. So I applaud NIH efforts to continually review peer review.
The peer review system faces challenges at multiple levels. Perhaps the greatest is enlisting a steady supply of high-quality reviewers for the study sections. NIH needs the leaders of every field, reviewers who can focus on the substance of the grants without tinkering with minutiae, and who can craft helpful reviews so that applicants can learn from the process and, if possible, improve future proposals. There is no simple solution here because the best reviewers are usually the best scientists, the best speakers, the best mentors, and the busiest people. Thus they are often unavailable for study sections. NIH currently receives roughly 80,000 grant proposals a year, so finding qualified and objective reviewers is a never-ending game of hide-and-seek.
Recently, NIH announced an initiative that will take a fresh look at its peer review system from a slightly different angle. NIH is asking itself how Integrated Review Groups (IRGs) and study sections could be thematically organized. In particular, NIH wants to analyze input and output metrics for each study section. In other words, NIH is interested in measuring productivity, although it seems very careful not to use this word. On the input side of the equation, the idea would be to see whether the number of new applications and new awards reviewed by different study sections, controlled for field size, is balanced across study sections. On the output side, NIH wants to use various bibliometric analyses, relying mostly on citations, to analyze the performance of each study section normalized to its field of science.
Thematic updating is a question worth addressing. It is essential that the structure of the various IRGs stay relevant to the demands of ever-changing science. The first study section in NIH history was devoted to syphilis, but structuring a study section today dedicated to syphilis would clearly be less than optimal. Ensuring that study sections represent a good cross section (no pun intended!) of current biomedical science is particularly important because of the NIH system of relative ranking (by percentiles) across all IRGs in the current review round and the two preceding rounds. If the IRG structure is thematically obsolete, the risk is that the applications funded will not be of the highest novelty and quality for evolving science.
But here comes the thorniest problem: how. How do you evaluate scientific productivity? What are the best metrics? And what are the appropriate controls? The issue is particularly complex, and I am firmly convinced that there is no silver bullet, no single metric or summary statistic that can adequately express the value and relevance of a particular publication or set of publications. While such a summary statistic would be extremely helpful, it simply doesn’t exist. Science is far too complex and nuanced to be represented by a single metric. Ultimately, it is imperative to read what is in a paper and judge the content of that contribution, regardless of the citations or other metrics it has received.
Nobel laureate Renato Dulbecco once told me that if his pioneering research in cancer had been judged by the impact factors of the journals where it was published, not only would he never have won the Nobel Prize, he probably would not even have been tenured. Last week, I was talking with a bright postdoc who does research on the extracellular matrix, and he told me with great pride that he had just had two papers accepted in one day. I asked, “What were the papers about?” His response was almost automatic. He said one was in the journal of such-and-such and the other in the journal of this-and-that! I said, “Hold it. I asked you what, not where!”
Even our best and brightest are victims of a pervasive culture in science that focuses on where, rather than what and why. ASCB has expressed these sentiments on many occasions. Most recently, ASCB has been a leader in the so-called DORA insurrection (http://am.ascb.org/dora/) against the misuse of impact factors, a particularly flawed metric. Most of all, though, ASCB wants to change the culture: to help scientists focus on the important what rather than the where, and to move away from a single treacherous, flawed algorithm that runs our scientific lives and makes them more miserable than they need to be.
But back to NIH reviewing the peer review system. I applaud NIH’s effort to make sure that IRGs and study sections are aligned with present and emerging scientific areas. However, I hope NIH will stay away from inflaming a culture already overly influenced by treacherous metrics. Instead, NIH should lead the way by telling the community that content matters, not a single metric or a single algorithm. NIH should demonstrate, in realigning its peer review system, that what, not where, is what matters.