Careers depend on how research is assessed

Filling a faculty position can be a daunting process. Universities can receive hundreds of applications for a single opening. With so many people to consider, how can universities give each applicant due consideration? It's a significant challenge, and there are few easy answers. Research evaluation takes time, and faculty who serve on hiring committees must balance assessment with their other roles and responsibilities.


When time is scarce, it can be tempting to rely on the Journal Impact Factor (JIF) and other proxy measures of quality and impact. But the data show that the JIF is not a reliable tool for research assessment: not for individual articles, and certainly not for people. Citation counts within any single journal are broadly distributed, and the JIF has long been known to be a poor predictor of how many citations a given paper will receive.1 Despite this shortcoming and the well-known systemic effects of an undue fixation on JIFs,2 a recent survey of review, promotion, and tenure documents across the United States and Canada found that JIFs are still widely used to evaluate individuals. Of the research-intensive universities sampled, 40% mention the impact factor or closely related terms in their review, promotion, and tenure documents.3


There are other shortcuts that can lead to bias in decision-making during evaluation. Some of them slip quietly into the process without the reviewer even noticing. For example, it is easy to skim an academic CV and make unconscious judgments based on aggregate information that may give a less-than-accurate picture of what someone has contributed to the field. Such shortcuts include the reputation of the institution where the applicant trained, or personal attributes such as gender or ethnicity.

Impact Is Not Quality

Short-term impact is not the same thing as quality, and reliable, carefully done research that enables others to build on its results should be valued over a news headline. It takes time to understand the total value of an article, but the practices that make research trustworthy can be examined directly. For example, when a lab does not authenticate its cell lines, its research may not be reproducible by others, and trust in its results erodes. Likewise, validating antibodies increases confidence in the work by avoiding false results caused by cross-reactivity or batch-to-batch variability.4 Though it may not seem so at first, in many ways quality is more objective than impact, and that should make it easier to assess. For example, it is possible to assess objectively whether sample sizes are appropriate or whether data are shared according to the FAIR (findability, accessibility, interoperability, and reusability) principles.5

DORA

The San Francisco Declaration on Research Assessment (DORA) was launched in 2013 as a call to action to improve the ways we assess the outputs of scholarly research. It specifically discourages using the JIF or other journal-based metrics in research evaluation and highlights the need to consider the value and impact of all aspects and outputs of scholarly research. The declaration has since accrued more than 1,300 organizational and 14,000 individual signers.

Now DORA has become an initiative that is building a community of practice to improve how we evaluate researchers. A big part of this work involves reaching out to the academic community and listening to its needs, challenges, and concerns. At the ASCB | EMBO Meeting in December 2018, DORA hosted an interactive red-pen session, "How to improve research assessment for hiring and funding decisions," as part of the career enhancement programming. Participants worked in small groups to provide feedback on grant applications and faculty position postings.6 They identified minor changes to applications and CVs that would encourage reviewers to pause and reflect before making judgments, such as moving the educational history to the end of a CV or application, removing journal names from bibliographies, and recognizing preprints, data, and protocols as research outputs.


How We Assess Researchers Matters

Unintentional bias can disadvantage stereotyped groups in hiring, promotion, and funding decisions, but change is possible. At a session we organized at the 2019 American Association for the Advancement of Science meeting, Patricia Devine, a professor of psychology at the University of Wisconsin, Madison, described a randomized controlled trial at her university in which a 2.5-hour bias-intervention workshop increased the hiring of female faculty in STEMM departments.7 The composition of hiring committees and review panels is another place where institutions can take action, since a more diverse group may help achieve more equitable outcomes.

Do We Even Speak the Same Language?

The language of research assessment can be another source of frustration for applicants and reviewers. Definitions can vary from place to place. For example, what is the difference between an outcome, an impact, and an output? Broad phrases used to describe desirable applicant qualities can also be confusing. What is meant by "world-class research"? Ill-defined terms like these are open to interpretation, which means that different standards can be applied to different people. Setting clear and meaningful expectations for applicants, hiring committees, and grant panels is an important step toward achieving fairer outcomes. It is also important for institutions to think about the breadth of applicant contributions that they want to assess. For example, the University of California, Berkeley, developed a rubric to assess candidate contributions to diversity, equity, and inclusion.8 Rubrics can also increase transparency in decision-making and provide a constructive way to give applicants feedback.

Conclusion

Meaningful evaluation of researchers takes time, and changing deep-seated practices is not simple, but we owe it to the research community, and to early-career researchers in particular, to advance practical and robust approaches to assessment that address the full breadth of contributions a person makes to research. As researchers, we are trained to experiment, and exploring new ways to evaluate research will lead to a more reliable and equitable system. DORA is working with the community to build capacity and develop workable alternatives.

References

1 Larivière V et al. (2016). A simple proposal for the publication of journal citation distributions. bioRxiv doi: 10.1101/062109.

2 Curry S (February 7, 2018). Let’s move beyond the rhetoric: it’s time to change how we judge research. Nature doi: 10.1038/d41586-018-01642-w.

3 McKiernan EC et al. (2019). Use of the Journal Impact Factor in academic review, promotion, and tenure evaluations. PeerJ Preprints 7:e27638v2.

4 Baker M (May 19, 2015). Reproducibility crisis: Blame it on the antibodies. Nature doi: 10.1038/521274a.

5 GO FAIR. FAIR Principles. www.go-fair.org/fair-principles.

6 Hatch AL et al. (2019). Research assessment: Reducing bias in the evaluation of researchers. eLife https://bit.ly/2HsKE3T.

7 Devine PG et al. (2017). A gender bias habit-breaking intervention led to increased hiring of female faculty in STEMM departments. J Exp Soc Psychol doi: 10.1016/j.jesp.2017.07.002.

8 University of California, Berkeley Office for Faculty Welfare and Equity (2018). Rubric to Assess Candidate Contributions to Diversity, Equity, and Inclusion. https://bit.ly/2Ek5rVu.


Editor's Note: This version varies slightly from that originally published in the ASCB Newsletter.

About the Authors:


Anna Hatch is the DORA Community Manager. She earned a PhD in biochemistry from Dartmouth College.
Stephen Curry is a professor at Imperial College London and chair of the DORA Steering Committee.