The book world usually provides the metaphors for genomics, likening genes to words or chapters, but in his Keith Porter Lecture at ASCB 2015 last December, Jonathan Weissman compared the human genome to music. We have around 20,000 protein-coding genes, said Weissman, compared with the 2.3 million parts in a Boeing 787 Dreamliner. And yet from this relatively small genome, our cells assemble a dazzling array of structures, actions, and effects. Think of a piano, which has only 88 keys, suggested Weissman, a professor at the University of California, San Francisco (UCSF), and a Howard Hughes Medical Institute investigator. “And yet a piano can play this enormous diversity of music, everything from a Mozart piano concerto to Ray Charles based on which score is being played, which keys are being pressed, how loud, and in what cadence,” Weissman told the Porter audience. “The same thing is true of our genes.” In Weissman’s view, cell biologists have only just begun to hear genomic music in its full sweep and register.
New technologies, including next-generation sequencing and big-data analysis with machine-learning algorithms, have opened our ears to the music of the genome, according to Weissman. And newer tools, he said, such as CRISPR/Cas9, whose usefulness is already expanding through engineered Cas9 variants, “are going to allow us to go beyond listening to composing our own music.”
Listening to Genomes
Ribosome profiling is the listening technology most closely associated with Weissman’s lab at UCSF. Its scientific roots go back to pioneering work by Joan Steitz at Yale, which was developed further by Sandra Wolin (now at Yale) when she was in Peter Walter’s UCSF lab. It rests on an old observation: during translation, each ribosome protects about 30 nucleotides of mRNA from nuclease digestion, leaving a specific mRNA fragment, or footprint, that can then be read out by sequencing. Until recently, only a handful of these footprints could be read at a time, but next-generation sequencing changed all that, said Weissman. Now it is possible to deep-sequence hundreds of millions of footprints and, through bioinformatics, plot them as histograms along each gene. Suddenly a richly detailed panorama of the translation landscape emerges, said Weissman, revealing “what proteins are being expressed, where, and how many in the cell.”
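For readers curious about what “plotting footprints as histograms” involves computationally, here is a minimal Python sketch. It assumes the footprint reads have already been aligned to a transcript and reduced to the position of their 5' end. The 12-nucleotide shift onto the ribosome’s P site, the toy read positions, and the function names `footprint_histogram`, `codon_density`, and `translation_level` are illustrative assumptions, not the pipeline used in Weissman’s lab.

```python
from collections import Counter

# Assumed (hypothetical) offset from a footprint's 5' end to the ribosome P site.
P_SITE_OFFSET = 12

def footprint_histogram(read_starts, transcript_length, offset=P_SITE_OFFSET):
    """Per-nucleotide footprint counts along one transcript.

    Positions that land past the end of the transcript are ignored."""
    counts = Counter(start + offset for start in read_starts)
    return [counts.get(pos, 0) for pos in range(transcript_length)]

def codon_density(histogram, cds_start, cds_end):
    """Footprint counts summed per codon over the coding region [cds_start, cds_end)."""
    return [sum(histogram[i:i + 3]) for i in range(cds_start, cds_end, 3)]

def translation_level(histogram, cds_start, cds_end):
    """Mean footprints per codon: a rough proxy for how actively a gene is translated."""
    n_codons = (cds_end - cds_start) // 3
    return sum(histogram[cds_start:cds_end]) / n_codons

# Toy input: four reads whose 5' ends map to positions 0, 0, 3 and 6.
hist = footprint_histogram([0, 0, 3, 6], transcript_length=30)
print(codon_density(hist, cds_start=12, cds_end=21))  # [2, 1, 1]
```

Summing per codon rather than per nucleotide reflects the fact that a ribosome advances three nucleotides at a time; a real analysis would also calibrate the offset for each footprint length.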
But why go to such lengths to read out the translation of canonical genes that have already been thoroughly annotated? The problem, said Weissman, is that “The long canonical protein-coding regions are annotated with assumptions like they have to start with AUG or they have to be longer than 100 or so amino acids. And that’s true except when it’s not true.”
Weissman’s example of what’s not true came from work in his lab by Noam Stern-Ginossar, now at the Weizmann Institute, who analyzed two open reading frames (ORFs) in the thoroughly annotated genome of human cytomegalovirus. Read in either direction, the region yields a canonical protein, but by plotting ribosome footprints on a massively parallel scale, Stern-Ginossar found, hidden in the middle of the canonical gene, a tiny non-canonical protein product, encoded by a stretch only 25 nucleotides long, that cytomegalovirus was churning out in far larger quantities than either of the known genes. The hidden gene had been passed over because it starts with the codon CUG rather than the canonical AUG. Weissman believes there are thousands of these short peptides, or altered forms of canonical genes, still to be discovered: the “dark matter” of the genome, as he called them.
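The annotation assumptions Weissman describes can be made concrete with a toy ORF scanner. This hypothetical Python sketch finds forward-strand ORFs that begin at any allowed start codon and end at the first in-frame stop. With conventional settings (AUG only, at least 100 codons) it misses a short CUG-initiated ORF in a made-up sequence, while relaxed settings recover it. It only illustrates the annotation filter; Stern-Ginossar’s discovery came from footprint data, not from scanning sequence alone.

```python
# Hypothetical toy scanner: forward strand only; from each allowed start codon,
# read codons in frame until the first stop codon.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(seq, starts=("AUG",), min_codons=100):
    """Return (start, end, start_codon) tuples for each ORF in `seq`.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon not in starts:
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, codon))
                break  # stop at the first in-frame stop codon
    return orfs

toy = "GGCUGAAAUAGCC"  # made-up sequence with one short CUG-initiated ORF
print(find_orfs(toy))                                       # []
print(find_orfs(toy, starts=("AUG", "CUG"), min_codons=1))  # [(2, 11, 'CUG')]
```

The nested loop is quadratic in the worst case, which is acceptable for illustration; genome-scale annotation tools handle both strands, all frames, and far larger inputs.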
Translation is serious business for the cell, Weissman said. “Protein translation, especially for a rapidly growing cell, is the most energy expensive process in the cell. Half or more than half of [a bacterial cell’s] energy can go into making proteins or making the machinery that makes proteins. So the cell cares desperately how it uses these limited resources.”
Deep sequencing of ribosome-protected mRNA fragments has shed light on other aspects of cell life. One example is what Weissman called the “Ikea principle.” It was proposed by another former Weissman postdoc, Gene-Wei Li, now at the Massachusetts Institute of Technology, who studied the ribosome footprints left in Escherichia coli by translation of the ATP operon, a single polycistronic transcript that encodes the eight subunits of the F0F1 ATP synthase machine, including the axle and stator of the motor that drives ATP production. Although the subunits are encoded on a single mRNA, the cell manages to make each one in a different quantity: in Li’s analysis, 10 times as many axles as stators. This, said Weissman, mirrors the approach of Ikea, the home goods retailer, which can sell its “Lack” end table for $9.95 because it produces exactly four legs for every table top. Extra legs (or extra table tops) would be not only wasteful but also difficult to store, whether in Ikea’s warehouses or in cells, Weissman pointed out.
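Proportional synthesis of this kind can be read off footprint data with simple arithmetic: footprints per codon serve as a proxy for each gene’s synthesis rate, and dividing by a reference gene gives relative output. The sketch below assumes that proxy holds (it ignores differences in elongation speed between genes), and its function names and 20% tolerance are hypothetical. The usage lines plug in made-up counts for Ikea’s table parts rather than real ATP operon data.

```python
def synthesis_rates(footprint_counts, cds_lengths_nt):
    """Footprints per codon for each gene: a length-normalized proxy for synthesis rate."""
    return {gene: footprint_counts[gene] / (cds_lengths_nt[gene] / 3)
            for gene in footprint_counts}

def relative_to(rates, reference):
    """Express each gene's rate as a multiple of a reference gene's rate."""
    ref = rates[reference]
    return {gene: rate / ref for gene, rate in rates.items()}

def matches_stoichiometry(relative, stoichiometry, tolerance=0.2):
    """True if every relative rate is within a fractional `tolerance` of its expected copy number."""
    return all(abs(relative[gene] - n) <= tolerance * n
               for gene, n in stoichiometry.items())

# Made-up counts: "legs" draw four times the footprints per codon of "tops".
rates = synthesis_rates({"leg": 400, "top": 100}, {"leg": 300, "top": 300})
relative = relative_to(rates, "top")
print(relative)                                               # {'leg': 4.0, 'top': 1.0}
print(matches_stoichiometry(relative, {"leg": 4, "top": 1}))  # True
```

Normalizing by coding-sequence length matters because a longer gene collects more footprints per protein made; without it, long subunits would look overproduced.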
Composing Our Own Genomic Music
But cell biology is now moving beyond just listening to the genomic music. Weissman said that CRISPR/Cas9 has brought us to the point where we will be able to compose. Feeling no need to explain CRISPR/Cas9 to an audience of cell biologists, he told the ASCB attendees that the technology offers new possibilities beyond precision gene editing. Working with Jennifer Doudna, one of CRISPR’s original developers, Weissman was part of a group including Stanley Qi (then at UCSF, now at Stanford) and Wendell Lim, UCSF, that developed a catalytically “dead” version of Cas9, which they called dCas9. Instead of a programmable nuclease, dCas9 is a programmable DNA-binding protein. “This now lets us do lots of nifty things,” he said. “Essentially what we’ve done is to create a volume switch, as it were, that lets us turn up or turn down any gene or combination of genes at will.”
But dCas9 is only the starting point for many CRISPRi/a variants, which use interference (i) and activation (a) to precisely manipulate the genome. These variants will be able to block transcription, place epigenetic modifiers at any locus researchers choose, or carry green fluorescent protein to light up places in the genome without editing the underlying DNA. They will make possible drug and genetic screens on a far larger scale than before, becoming tools to road-test combinatorial drug therapies and to counter driver mutations in cancer.
These technological advances were developed in response to curiosity-driven biological questions, a “virtuous cycle” in Weissman’s view. Together, they offer researchers the prospect of a new science that is both hypothesis-driven and discovery-based. “It lets you wander in the forest to lift up a log,” said Weissman, “and see what crawls out from under it.”