Big data is coming to cell biology. Are we poised to take advantage of it? Image by Johnny Chang.

Computer engineers have served up some unimaginable surprises. Just as the steam engine (in particular the highly efficient version devised by James Watt) was key to the industrial revolution in the 18th century, computers and other digital technologies have brought us to the second machine age and will do for mental power what the Watt steam engine did for muscle power.

This is the main argument articulated in a brilliant book that I just read, The Second Machine Age, by MIT's Erik Brynjolfsson and Andrew McAfee. The exponential growth of computing applies not only to data collection, but also to data processing and data analysis. The intersection of all this is taking us to astonishing places we could not have imagined only a decade ago.

Think of Google self-driving cars, technically known as “autonomous cars.” What was once deemed an impossible engineering feat has been enabled by four fundamental advances:

  1. Very accurate digital road maps, together with a wealth of information not only about the roads themselves, but about terrain, obstacles, and other attributes.
  2. Real-time traffic information (These days I don’t even undertake my five-minute commute to work without Waze turned on to ensure I do not hit any traffic. I am too impatient for that.)
  3. Laser and radar detection systems that can quickly map all the various still and moving objects within a large radius.
  4. And finally a hell of a lot of computational power to integrate all this information in real time.

I have never ridden in a Google car, but those who have tell me it is quite an experience and that the accuracy is so impressive that one very quickly forgets that a computer is actually driving a car through traffic composed of human-controlled cars! I am only left to wonder if the system has been calibrated to deal with Italian drivers like me, but that is another problem, which I am sure engineers will be able to figure out.

So, what does this mean for cell biologists and for us biomedical scientists in general? I think it is obvious. What happened in the computer industry has enabled incredible transformations in our field, but even more importantly, our field itself is following the same exponential growth trajectory as the computer industry.

Because of the daunting complexity of biology and of the cell as the fundamental unit of life, up until only a decade or two ago most of the work in our labs was essentially descriptive, as we learned about the various subcellular components through electron micrographs and other static images. We then revisited all these amazing discoveries at a molecular and physiological level. Today we can begin to understand the cell at a systems level, thanks to the enormous amount of available data that allows us to develop machine learning algorithms to detect patterns that even the brightest mind could not identify. Think, for example, of ribosome profiling or super resolution microscopy. We can now understand relationships among cells and within the cell that were unknowable before.

Biology is being transformed by the data and with the data that biologists can now generate and analyze. Our field is becoming heavily data-driven, and although it is unlikely in the near future to be as theory-driven as physics—where the existence of the Higgs boson was forecast before the empirical evidence was available—certainly biology will come much closer to the big data, big analytical approach of physics. It will become more high-tech like astronomy, physics, and engineering, but with the extra complication of what I call the pesky Factor B (Biology), the unpredictable and mysterious ways in which Mother Nature never finishes throwing curves that astonish and surprise us.

I think there are some science policy lessons that we can immediately consider from this perspective. My instinct is that there are two major bottlenecks in our science that need to be addressed to ensure that we reap the same exponential growth as the computing industry.

The first potential bottleneck is data sharing. Think about it: Google cars can theoretically be driven by anyone today because Google Maps and traffic information are readily available on our phones in real time. Unfortunately, in biology we do not have a strong tradition of data sharing (with the exception of GenBank and a few other resources, which have been instrumental in originally launching and enabling the field of molecular biology worldwide). But today the hyper-competition between labs and the lack of a nimble data-sharing infrastructure act as disincentives, encouraging researchers to keep data closely held. This severely limits crowd-sourcing approaches to analysis.

It is true that some things are changing. The National Institutes of Health (NIH) now has several large databases, mostly on the genomics front. We can applaud the Integrative Human Microbiome Project (phase 2 of the Human Microbiome Project), which has made many important datasets available to researchers. However, most of our imaging and metabolic data are not publicly available, and therefore the community cannot capitalize effectively on work done by others through simple “plug and play” use of data in new models, experiments, or theories. We should strongly encourage NIH, the National Science Foundation, and all science agencies to be as aggressive as is reasonable with data sharing. It is key for the future of biology.

The second potential bottleneck involves workforce issues. Just as the industrial revolution created what legendary British economist John Maynard Keynes recognized as technological unemployment, it is clear that the second machine age will create a dramatic shift in employment patterns. And this will also be true in biology. There will very soon be a high demand for PhDs, postdocs, and biology professors with high-level programming skills who have a data-heavy approach to their science.

This poses an urgent challenge for our training programs, which will need to adopt an interdisciplinary focus on computational and big data skills. There are some wonderful examples around the country of programs that do this, but they are not as widespread as they should be. It will be essential for students to start very early, in college or even in high school, to think of the biological sciences as highly quantitative and highly amenable to computational approaches.

ASCB is at the forefront. This year’s Annual Meeting, which is just around the corner, perfectly captures the challenge of big data and integration at different scales, from the intracellular to the cosmic level. But we also paid attention to the bottlenecks I mentioned above. Under the leadership of President Shirley Tilghman and Program Chair Julie Theriot, the Program Committee has put together an innovative meeting that will help ASCB scientists leap forward into this new and exciting world of data integration.

If you are scared by all this, you are not alone. In many respects, this represents a sea change for our field. To help newcomers and seasoned researchers get their sea legs, the ASCB meeting will have several hands-on workshops in which participants can learn directly from the experts how to generate, use, and analyze big data in cell biology. Consider this my shameless plug for the ASCB Annual Meeting: Get ready for the second machine age of cell biology. Be there to witness how the new “steam engine” of big data will transform our research in ways that we can now hardly imagine. Don’t get stuck with the cell biology version of a plow pulled by an ox!


Stefano Bertuzzi

Dr. Stefano Bertuzzi is the Executive Director of the American Society for Cell Biology. In this position he is responsible, with the ASCB Board, for strategic planning and all operations at the Society to serve the needs of its ~9,000 members and to promote the field of cellular biology and basic science.