Assistant Professor  |  Principal Investigator

Artem Babaian

Department of Molecular Genetics

PhD

Address
Room 610
Clinical Interests
RNA, Computational Biology, Molecular Genetics, Virology, Open Science
Appointment Status
Cross-Appointed

Qualification

  • University of Cambridge, Banting Postdoctoral Researcher, 2021-2022.
  • Independent Researcher, 2019-2021.
  • University of British Columbia, PhD in Medical Genetics, 2012-2019.
  • McMaster University, BSc (Hons) in Molecular Biology and Genetics, 2007-2011.

MY RESEARCH OVERVIEW

The Laboratory for RNA-Based Lifeforms is an interdisciplinary research group that applies state-of-the-art computing to solve biology’s biggest problems.

Planetary-Scale Virus Surveillance Network

DNA and RNA sequencing data is growing exponentially, even outpacing Moore’s Law. Public databases currently hold 60+ million gigabytes (60 petabytes) of sequencing data from 10+ million samples, and this volume doubles every 18 months. Samples range from cancer cells in a lab at UofT to anal swabs of penguins in Antarctica, and everything in between. Alongside whatever the researchers intended to study, these datasets can also capture sequences from viruses, yet those sequences go unanalyzed.

At most 0.1% of Earth’s viruses have been identified. To characterize the full diversity of viruses on Earth, we develop computational algorithms and techniques to analyze sequencing data at petabyte scale. In effect, we recycle billions of dollars of data to drive biological discovery. Recently, in a single 11-day analysis, we discovered 130,000+ new species of RNA viruses, nearly 10x more than were previously known (including nine surprising new species of coronaviruses). Moving forward, we are developing a system to monitor this global stream of sequencing data and identify where and when pathogens of pandemic potential emerge. It is better that we find them before they find us.

Our lab upcycles the >10 million publicly available sequencing datasets. Produced and shared by the global biology community, these datasets represent more than $50 billion worth of data.

SCIENTIFIC RESEARCH OVERVIEW

Our research collective focuses on understanding the structure and function of genes through the prism of RNA. Interdisciplinary by design, we pair computational and molecular innovation in pursuit of fundamental questions.

Ultra-deep RNA Virus Discovery

The sequence biodiversity of Earth’s RNA virome is enormous and largely unexplored: at most 0.1% of RNA viruses have been described. We create the computational means for ultra-efficient virus discovery by combining modern informatics with massive (petabyte-scale) data analysis. In doing so, we are building the digital infrastructure to enable global surveillance of pathogens of pandemic potential.

A new RNA Genetics

By illuminating the depths of the “Dark Virome”, we are expanding the known diversity of RNA viruses and virus-like elements, including those thought to be modern remnants of Earth’s most primordial lifeforms. Specifically, we study RNA enzymes, or ribozymes, and structural RNA elements of unknown function. Just as the “DNA genetic code” can be read for protein-coding genes, we are learning to read the structural “RNA genetic code”, which first evolved in the early RNA World.

Deciphering the ribosome heterogeneity of cancer

The ribosome, itself a catalytic RNA molecule decorated with protein, is central to life as we have come to understand it. Yet the population genetic variation of ribosomal RNA, both natural and pathogenic (in cancer), is poorly understood. We are cataloguing the genetic and epigenetic heterogeneity of ribosomal RNA and delineating its impact on translation in normal physiology and in disease.

SELECT PUBLICATIONS