Biostatistics is the application of statistics to understanding health and biology. It provides powerful tools for developing research questions, designing studies, refining measurements, analyzing data, and interpreting findings. In the most rigorous cell biology studies, we see the increasing use of statistical methods, but too often we see inappropriate use of statistics and over-interpretation of results. Published data indicate that involving a biostatistician is valuable for a) increasing the chances of publishing and of obtaining grant funding; b) improving knowledge in experimental design and data analytics; and c) ensuring rigor and reproducibility of research. That said, it may seem challenging to identify and recruit biostatisticians early enough in your research.
Here we put forward some advice on why and how to work with biostatisticians. We discuss how to educate yourself in the basics of statistics to improve communication with biostatisticians, when and where to find biostatisticians, and how to attract them to your research.
Statistical Best Practices
As part of your meticulous planning in experimental design, you need to include details on data collection and justify why a particular method was chosen for data analysis. You are probably already collaborating with experts in experimental biological techniques appropriate to your questions. However, you will also need to be familiar with, and correctly use, a variety of statistical methods. These include the box-and-whisker plot and the Shapiro-Wilk test to describe and test the distributional form of the data, as well as Student’s t-test, Fisher’s exact test, the Mann-Whitney U test, and analysis of variance to compare groups. Additionally, you might need advanced multivariable regression modeling approaches that adjust for confounding variables, such as survival analysis for time-to-event data and linear mixed modeling for clustered data. Because most biology articles do not involve biostatisticians as coauthors, can we conclude that biologists are knowledgeable in the appropriate use of these methodologies and related software? Or might the current “reproducibility crisis” result in part from biologists not collaborating with statisticians who have expertise in these areas?
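To make one of the methods named above concrete, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2×2 table, computed directly from the hypergeometric distribution using only the standard library. The function name `fisher_exact_two_sided` and the example counts are hypothetical and chosen purely for illustration. For real analyses, a maintained implementation such as SciPy's `scipy.stats.fisher_exact` is the more usual choice, ideally selected with a biostatistician's input.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    # Range of top-left cell values compatible with the fixed margins.
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = table_prob(a)

    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are treated/control, columns are responder/non-responder.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")  # prints "two-sided p = 0.4857"
```

With counts this small, even a seemingly clear difference between groups gives a p-value far from conventional thresholds, which is one reason sample size is worth discussing before the experiment rather than after.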
How to Educate Yourself in Statistics and Why
There are many books and tutorials that can acquaint biological researchers with statistical principles useful for experimental design. These cover the role of randomization, sample size (or power) calculation, the value of repeated measures, handling of missing data, and so on. For example, the books or book sections written by Heath (1995), Hanfelt (1997), Wardlaw (2000), Ambrosius (2007), and Bang (2010) are excellent starting points.1–5 Further statistical learning can come from deep collaboration with a biostatistician, in which case you would learn statistics and he or she would learn cell biology. Sometimes workshops are offered to provide joint training.6 Self-study in statistics and joint training provide a common language to facilitate communication when working with a biostatistician.
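As an illustration of the sample size (or power) calculation mentioned above, the following sketch estimates how many samples per group are needed to compare two group means. It rests on stated assumptions: a two-sided test, equal group sizes, a common standard deviation, and the normal approximation, which slightly underestimates the number needed for a t-test. The function name `n_per_group` and its default values are hypothetical, and a real calculation should be tailored to the study design.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    effect_size is the standardized difference: (difference in means) / (common SD).
    Assumes equal group sizes and uses the normal approximation.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical standardized effect sizes, from moderate to large.
for d in (0.5, 0.8, 1.0):
    print(f"effect size {d}: about {n_per_group(d)} samples per group")
```

Under these assumptions, halving the expected effect size roughly quadruples the required sample size, so the assumed effect is a useful thing to bring to a first conversation with a biostatistician.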
Why Work with Biostatisticians?
Cell biologists are obviously smart researchers, and they certainly can learn some statistical techniques and perform the design and analysis for some simple experiments. However, many experiments present complex scenarios for data analysis, and most statistical tests have underlying assumptions with which the data may not comply. If the data do not exactly match the assumptions for a given statistical test, biostatisticians are well equipped to find the most appropriate alternative analytic approaches. In addition to matching tests to assumptions, a biostatistician is also trained to help investigators match research goals to their hypotheses. This input helps the investigator decide which method will best test the hypothesis, how the data should be collected and shared, what result is expected from the analysis, and how to report it. In fact, working with a biostatistician not only produces better-quality research but also saves the researcher a tremendous amount of time, time that is better spent writing the next manuscript or grant application.
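As one illustration of an alternative approach when a test's distributional assumptions are in doubt, the sketch below runs an exact two-sided permutation test on the difference in group means. This is only an example of an assumption-light method, not a recommendation for any particular study. The function name `exact_permutation_test` and the data are hypothetical, and enumerating every regrouping is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    # Enumerate every way of relabeling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Count relabelings at least as extreme as the observed difference;
        # the tiny tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical measurements from two small groups.
p_value = exact_permutation_test([1, 2, 3], [4, 5, 6])
print(p_value)  # prints 0.1
```

Even with complete separation between the groups, three samples per group cannot yield a p-value below 0.1 with this test, which shows how design choices constrain what any analysis can conclude.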
How to Work with Biostatisticians
The first step in working with a biostatistician is to understand the role of a biostatistician, as discussed here, and the ethical standards that guide their work, as put forth by the American Statistical Association.7 Next, establish good communication that fosters listening and questioning by both parties. This careful communication should proceed until the biostatistician understands the project at hand and the investigator understands the statistical design and analysis plan for the project. Provide supplemental materials to each other and maintain an adequate level of discussion. Finally, develop a clear list of tasks and expectations for the project in the short and long term. Further details on this process are available in an article by Berman and Gullíon.8
When to Seek Out a Biostatistician
Seek advice from a biostatistician as early as possible. Biostatisticians are not only well trained in design and analysis, they are also able to advise on which data items to collect and how to check the quality of the data. Statistical analysis at the end of a study, however sophisticated, cannot fix a poorly designed study or an inadequately collected dataset. Engaging a biostatistician during grant application writing can also be crucial to successfully presenting the significance of the proposed research and adding innovation by using emerging statistical methodologies.
Where to Find Biostatisticians and How to Fund Them
In many cases, a biostatistician is paid for his/her collaborations on projects. However, an investigator may not have enough funding to support a biostatistician. One option is to include the statistician as a co-investigator in the grant proposal at an appropriate funding level. In addition, many institutions offer free statistical support for grant development through biostatistics shared resources. This support is funded either by a center grant or by institutional funding. Most visible are a) Biostatistics, Epidemiology, and Research Design Programs offered by Clinical and Translational Science Centers; b) Biostatistics Shared Resources offered by National Cancer Institute-designated cancer centers; and c) consulting centers run by departments of biostatistics that are used for training graduate students in biostatistics under the supervision of a senior biostatistics professor. Although these resources are often oversubscribed, investigators with a compelling research project can attract a biostatistician to collaborate by exciting him or her about their research and by showing that they are prepared, with well-organized materials and queries.
We have embedded biostatisticians in ongoing journal clubs in laboratories and have organized regular seminars where we invite biologists to present their recent work. These exchanges spark mutual interest and lead to successful, mutually beneficial collaborations between biologists and biostatisticians.
Finding the right collaborating biostatistician is worth the researchers’ time and will enhance their productivity and research quality. We have repeatedly seen biologists who approached a biostatistician with a simple study design and related sample size calculation end up with successful funding and publications by revising experimental pipelines, generating better-quality data, and improving study characteristics (higher power and lower false positive rates). We have seen teams develop and apply innovative statistical and computational tools that enabled the screening of genetic markers and biomarkers through supervised learning, cluster analysis, network construction, and improved prediction. The key to these successful collaborations is early and deep communication between biologists and biostatisticians.
Throughout, the biologist should keep in mind that collaboration is a two-way street: The biostatistician can share the excitement of cell biology, and cell biologists can join in the enjoyment of statistics.
1. Heath D (1995). An Introduction to Experimental Design and Statistics for Biology. CRC Press.
2. Hanfelt JJ (1997). Statistical approaches to experimental design and data analysis of in vivo studies. Breast Cancer Research and Treatment 46, 279–302.
3. Wardlaw AC (2000). Practical Statistics for Experimental Biologists. John Wiley and Sons.
4. Ambrosius WT (Ed.) (2007). Topics in Biostatistics. Humana Press.
5. Bang H, Zhou XK, van Epps HL, Mazumdar M (Eds.) (2010). Statistical Methods in Molecular Biology. Humana Press.
6. Hofner B, Vaas L, Lawo JP, Müller T, Sikorski J, Repsilber D (2012). Biologists meet statisticians: A workshop for young scientists to foster interdisciplinary team work. arXiv preprint arXiv:1208.5597.
8. Berman N, Gullíon C (2007). Working with a Statistician. In Topics in Biostatistics (pp 489–503). Humana Press.
About the Author:
Lihua Li is at the Institute for Healthcare Delivery Science, Tisch Cancer Institute Biostatistics Shared Facility, Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai.