From bench to bytes: Learning how to code as a biologist

I am a cell biologist by training. Give me a cell culture dish or a microscope, and I’m happy. During my first two years of grad school, my work sat comfortably within the familiar realm of cancer cell biology. In my third year, however, my datasets got larger and my project took a sharp and intimidating turn toward bioinformatics. To be honest, this prospect terrified me. I am not particularly computer-savvy, and I had zero computer science or coding experience. But given enough time, practice, collaboration, and coffee, I managed to grasp the basics of bioinformatics and analyze an exciting new type of dataset that had previously been foreign to me.

More broadly, the entire field of biology is shifting toward larger and more complex datasets that require bioinformatics and machine learning to fully understand. As technologies such as single-cell sequencing, metabolomics, and live-cell imaging analysis become more common, we as scientists will need to diversify our skill sets to analyze the massive volumes of “-omics” data we can now produce. Although moving from “wet lab” to “dry lab” research may seem intimidating, I would like to share some things that helped me along the way and may be useful to other beginners as well.

Pick the language(s) you need

Many programming languages are in use, and every field has its favorites. For bioinformatics applications (like analyzing sequencing or proteomics data), however, two standouts are good choices for novice coders: R and Python. I will spare you a lengthy discussion of the differences between the two; just know that both are great languages for beginner biologists because huge communities of biologists already use them. This means that a vast array of plugins and packages has already been developed by people who know what they’re doing, i.e., you don’t need to reinvent the wheel. For example, when I wanted to analyze data from a DNA methylation array, I used an R package called ChAMP that already had exactly the functions I needed, so I didn’t have to write a whole new protocol from scratch.

Both R and Python can also be used with integrated development environments (IDEs) and code editors. Basically, IDEs and code editors add a user interface that makes your coding workspace more organized and beginner-friendly: they make it much easier to install additional packages, manage multiple scripts, files, and plots at once, autocomplete code as you type, find errors in your code more easily, and more. For R, RStudio is the go-to standard; for Python, there are many options, including Jupyter, Spyder, and PyCharm. Note that R and Python are only two of MANY programming languages, so talk to your colleagues and see which languages they are already using to decide on the best choice for you.
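To make the “don’t reinvent the wheel” idea a little more concrete, here is a minimal Python sketch using only the standard library’s `statistics` module. The replicate values are made up purely for illustration, and the helper `my_stdev` is a hypothetical hand-written function included only for comparison. This is not how ChAMP works (ChAMP is an R package and is not shown here); it just shows the general habit of reaching for an existing, tested function instead of writing your own.

```python
import statistics

# Hypothetical example data: readings from three replicate wells per condition
control = [102.0, 98.5, 101.2]
treated = [151.3, 149.8, 155.0]


def my_stdev(values):
    """Hand-written sample standard deviation (the 'reinvented wheel')."""
    n = len(values)
    mean = sum(values) / n
    return (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5


# Reusing tested, documented functions from Python's standard library instead
print("control:", statistics.mean(control), statistics.stdev(control))
print("treated:", statistics.mean(treated), statistics.stdev(treated))

# Both approaches agree here, but only one had to be written and checked by you
assert abs(my_stdev(treated) - statistics.stdev(treated)) < 1e-9
```

For a three-value list the hand-written version is easy to get right, but for a methylation array or an RNA-seq experiment, the equivalent “wheel” is an entire normalization and statistics workflow, which is exactly why established packages are worth seeking out.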

Find training resources in person when possible

When I started learning to code, I used a mix of different resources. By far my most useful resource was a grad student in our neighboring lab who is an expert in R. He pointed me to the best resources to learn from, the best packages to use, scripts he had already written for similar projects, and more, and he answered all of my many questions along the way. I strongly recommend finding a colleague at your institution who is already familiar with the language you want to learn.

I also took two courses at my home institution to learn and practice more of the coding skills I needed. Having in-person training and feedback was incredibly valuable to me, so I encourage you to look for courses or workshops as well.

Google and online help sites are your best friends

If in-person courses aren’t available, there are many online training courses (both free and paid) covering everything from the basics of “What the heck is coding?” to specifics like “How do I make a heatmap from my RNA-seq data?” Online tutorials are great resources because they almost always come with practice datasets for you to use. I used the beginner R tutorials from swirl and DataCamp, but there are plenty of other sites as well, including Codecademy, freeCodeCamp, and Coursera.

In addition, I can’t tell you how many times I’ve googled “How to do ___ in R” or “What does XYZ error message mean?” Fortunately, the internet is full of people with the same problems and plenty of opinions on how to fix them. One of the biggest online communities for coding help is Stack Overflow, which is full of Q&A pages on any and every programming problem you can think of. GitHub is also a great site for sharing your code with others on your project, or even with the wider scientific community, to get help with debugging or more feedback.

Collaborate, collaborate, collaborate!

As I mentioned above, my best training resource was a fellow grad student. Learning from others who know what they’re doing is a fantastic way to polish your skills and get constructive feedback. In addition, it’s great to have someone else proofread your scripts to make sure they make sense and are usable by other people.

And good news: if you can’t find an expert to learn from at your home institution, both R and Python (and many other languages) have plenty of online support groups where people help each other use and learn the language. Sometimes there are even city-specific groups where you can meet up in person with coders of all skill levels (e.g., Python local user groups and R-Ladies).

You don’t need to become an expert in everything coding-related

Let’s say you are planning a month-long trip to a foreign country where people don’t speak your native language. There’s no need to quit your job, study that language every day, and become 100% fluent before your trip. Instead, it would be wise to spend a little time each day learning practical phrases, like “Good morning, how are you today?” or “Where is the bathroom?” In the same way, you don’t need to become a coding expert to perform the analyses you need. Decide what you want to accomplish, and focus on the skills and commands your project requires rather than trying to absorb everything. Once you start figuring out the steps you need to take to get from dataset to paper figure, it’s easier to decide what you really need to learn and what you can skip.

Fail often, but also celebrate the small victories

Coding is hard, especially for beginners. It’s an entirely different way of thinking that’s not always intuitive to people who have spent their lives in the world of biology. But remember that it’s perfectly normal (and even good) to “fail” at programming. The stakes are incredibly low; you don’t need money or precious lab resources to practice coding, only time and patience. If your code keeps running into bugs, take the time to figure out what’s wrong and adjust. Every little mistake in coding is a learning opportunity, and a chance to find creative solutions to complex problems. And every time something goes right, be proud of your progress! So go ahead, celebrate that first data table you finally compiled, or print out that first heatmap and post it above your desk for all to admire. Even small victories in coding are major steps on the way to becoming a more well-rounded scientist.

The views and opinions expressed in this blog are the views of the author(s) and do not represent the official policy or position of ASCB.

About the Author:

Emily Summerbell is a member of the ASCB Committee for Postdocs and Students (COMPASS). She is a postdoc in the lab of Meera Murgai at the National Cancer Institute studying phenotypic plasticity in stromal cells during metastasis. Twitter: @esummerbell