Researchers at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research have found close to one million new exons – stretches of DNA that are expressed in mature RNA – in the human genome.
There are around 20,000 protein-coding genes in humans that contain approximately 180,000 known internal exons. These protein-coding regions account for only one per cent of the entire human genome. The vast majority of what remains is a mystery – aptly referred to as the ‘dark genome’.
“We’ve started to chip away at the dark genome by finding nearly one million previously unknown exons through a method called exon trapping,” said Timothy Hughes, principal investigator on the study and professor and chair of molecular genetics in U of T’s Temerty Faculty of Medicine.
“The technique involves an assay with plasmids to find exons in DNA fragments of unknown composition,” said Hughes, who holds the Canada Research Chair in Decoding Gene Regulation and the Billes Chair of Medical Research at U of T. “While exon trapping is not widely used anymore, it proved to be effective when used in combination with high-throughput sequencing to scan the entire human genome.”
The journal Genome Research published the findings in 2023.
Exons are segments of the genome that can encode proteins to direct tissue development and biological processes within the body. Exons are considered autonomous if they don’t require external assistance to splice into a mature RNA transcript, which can then be translated into a protein.
The team behind the study set out to test the exon definition model, which guides research in molecular genetics, after questioning one of its assumptions – that clear and consistent indicators of where exons begin and end aid the accurate removal of non-protein-coding intron regions. This assumption does not seem to hold in all cases: the splicing of exons does not always go smoothly, and can result in mature RNA transcripts that contain nonfunctional components.
“Almost none of the newly discovered exons are found consistently across genomes of different species,” said Hughes. “They seem to appear in the human genome mainly due to random mutation and are unlikely to play a significant role in our biology. This is evidence that evolution in humans involves a lot of trial and error – most likely enabled by the vast size of our genome.”
It is helpful to document randomly mutated exons within the human genome because their translation could be harmful. Long noncoding RNA exons, which are autonomous but often have no known function, have been connected to the development of cancer. Of the roughly 1.25 million exons, both known and previously unknown, that the team found through exon trapping, almost four per cent were long noncoding RNA exons.
In addition, pseudoexons – exons that reside within non-coding introns – can acquire mutations that make a weak splice site stronger. The pseudoexon can then be included in a mature RNA transcript, potentially leading to disease.
“This is an interesting study that broadens our knowledge of sequences across the human genome that have the potential to be recognized as exons in transcribed RNA,” said Benjamin Blencowe, professor of molecular genetics at U of T, who was not involved in the study. “While the significance of the majority of the newly detected exons is unclear, some of them may be activated in certain contexts – for example, by disease mutations – and therefore cataloguing them is important. This study will further serve as a valuable resource facilitating ongoing efforts directed at deciphering the splicing code.”
A stronger understanding of the factors that affect exon inclusion in mature RNA can help improve programs like SpliceAI, a widely used tool for predicting splice sites and aberrant splicing. SpliceAI can be trained on new data, such as the data produced in this study, to refine its predictions.
“SpliceAI often doesn’t provide details on the characteristics of exons and has a poor ability to predict splicing in exons that aren’t already catalogued,” said Hughes. “Our exon trapping data contains biologically meaningful information that can be fed into SpliceAI and other splicing predictors to open up new paths for exploring the dark genome.”
This research was supported by the Canadian Institutes of Health Research (CIHR) and the National Institutes of Health (NIH).