- Baylor College of Medicine, Houston, TX, U.S., PhD in Cellular and Molecular Biology, 1998.
- University of Iowa, Iowa City, IA, U.S., BSc in Electrical Engineering, 1993.
- University of Iowa, Iowa City, IA, U.S., Bachelor of Music in String Bass Performance, 1991.
MY RESEARCH OVERVIEW
How Do Cells See The Genome?
Interpretation of nucleic acid sequence is a fundamental problem in molecular biology. It is now clear that only a small minority of most genomes is protein coding. In contrast, the number and variety of apparent regulatory sequences continue to grow - in humans, eclipsing the number of protein coding genes by orders of magnitude. Decoding how regulatory sequences are recognized and interpreted by cells is fundamental to dissecting gene expression mechanisms, interpreting the significance of sequence variants, and understanding the function and evolution of genomes. Despite a wealth of sequencing and expression data, however, it remains surprisingly difficult to predict gene expression patterns on the basis of primary sequence, even in “simple” genomes such as that of yeast. This shortcoming suggests that there are large gaps in available data, and/or that our conceptual models of how transcriptional and post-transcriptional regulation work are too simplistic.
This vexing problem - how do cells recognize DNA and RNA sequence, and act upon it - represents one of the grand challenges of our era, and our goal is to solve it in the general case.
We know that the problem can be solved, because cellular biochemical mechanisms can easily solve it: foreign DNA introduced into cells is typically regulated in the same way as equivalent native DNA. In living cells, many regulatory factors and processes act in concert, and we therefore believe that a complete index of protein-DNA and protein-RNA sequence motifs will likely be necessary for this task. In addition, many biological complexities will need to be incorporated into computational analyses. For example, DNA and RNA binding proteins interact with one another, and also provide an interface that selectively recruits other enzymes and structural components, thus determining the “epigenetic landscape”. Uncovering the linkages between these layers of regulation is critical to a mechanistic understanding of the functions of individual components as well as the entire system.
SCIENTIFIC RESEARCH OVERVIEW
Major projects in the lab typically involve both wet and dry lab aspects, and many are collaborative, because it often becomes essential to engage experts in diverse and unexpected fields.
Projects central to the lab include the following:
1. Improving sequence-based models of gene regulation
Basic questions in this arena encompass how regulatory sequences and even gene structures are defined, and how they are activated. We recently described a computational model that takes as input only the sequence preferences of DNA and RNA binding proteins, and accurately predicts known gene structures, as well as expression from randomly generated sequences (de Boer et al., Genome Research 2014). The model also predicts and explains the origins of non-genic transcripts, and indicates that definition of genomic elements is intimately tied to control of expression levels - a finding that is likely relevant in other genomes, e.g. human, in which the locations of regulatory elements and transcript isoforms often change between cell types. Goals of ongoing efforts include predicting transcript levels, modelling post-transcriptional regulatory mechanisms, and extending these models to other organisms.
2. Comprehensive views of protein-DNA and protein-RNA recognition
Eukaryotic genomes encode hundreds to thousands of proteins that contain sequence-specific DNA and RNA binding domains. These are the basic building blocks of gene regulation programs, and their binding motifs are an essential ingredient in the gene regulation formula. We are systematically decoding protein-DNA and protein-RNA sequence preferences across the eukaryotes (Weirauch, Yang et al., Cell 2014; Ray, Kazan, Cook, Weirauch, Najafabadi et al., Nature 2013) using a variety of approaches, including new experimental and computational techniques that we develop.
3. Dissecting the evolution of gene regulation mechanisms
Many DNA and RNA binding proteins display deep evolutionary conservation, but most lineages also contain families in which divergence is common. Striking cases include the ~700 human C2H2 zinc finger proteins, whose expansion appears to be driven by retroelements, which are highly enriched among genomic sequences bound in vivo (Najafabadi, Mnaimneh, Schmitges et al., Nature Biotechnology 2015). In C. elegans, the nuclear hormone receptors have undergone a similar expansion and diversification in binding sites (Narasimhan, Lambert et al., eLife 2015). We anticipate that global analysis of binding sites, sequence preferences, protein partners, and the impact of genetic perturbations on transcript levels will provide insight into the function and origin of newly evolved proteins.
- C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, Albu M, Weirauch MT, Radovani E, Kim PM, Greenblatt J, Frey BJ, Hughes TR. Nat Biotechnol. 2015 May;33(5):555-62.
- Determination and inference of eukaryotic transcription factor sequence specificity. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJ, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, Hughes TR. Cell. 2014 Sep 11;158(6):1431-43.
- A compendium of RNA-binding motifs for decoding gene regulation. Ray D, Kazan H, Cook KB, Weirauch MT, Najafabadi HS, Li X, Gueroussov S, Albu M, Zheng H, Yang A, Na H, Irimia M, Matzat LH, Dale RK, Smith SA, Yarosh CA, Kelly SM, Nabet B, Mecenas D, Li W, Laishram RS, Qiao M, Lipshitz HD, Piano F, Corbett AH, Carstens RP, Frey BJ, Anderson RA, Lynch KW, Penalva LO, Lei EP, Fraser AG, Blencowe BJ, Morris QD, Hughes TR. Nature. 2013 Jul 11;499(7457):172-7.