Missense variants, gene mutations that change one amino acid in the encoded protein to another, are a notable and medically relevant type of genomic variant. A major barrier to determining an individual's risk for genetic disorders is ascribing disease risk to specific variants. Over 99% of missense variants in the human population are rare, meaning they have a minor allele frequency (MAF; the frequency of the second most common allele in a population) below 0.5%, and 90% are extremely rare, with a MAF below 10⁻⁶. Far less evidence is available on the pathogenicity of rare variants than on that of common variants, so better computational methods for inferring disease risk are needed. In a new study led by PhD student Yingzhou (Joe) Wu, researchers in the lab of Fritz Roth, a professor of molecular genetics and computer science in the Donnelly Centre for Cellular and Biomolecular Research, set out to better predict the pathogenicity of rare and extremely rare variants by building VARITY, a computational algorithm optimized for rare and extremely rare missense variants.
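The rarity categories above come down to a simple calculation. The sketch below is a minimal illustration, not code from the study: it computes MAF from hypothetical allele counts at a single site and applies the two cutoffs quoted in this article (below 0.5% for rare, below 10⁻⁶ for extremely rare). The function names and example counts are made up for illustration.

```python
from collections.abc import Mapping

# Cutoffs as quoted in this article (assumed here to be strict "less than" bounds).
RARE_MAF = 0.005           # 0.5%
EXTREMELY_RARE_MAF = 1e-6  # 10^-6


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """Frequency of the second most common allele at one site (hypothetical helper)."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    counts = sorted(allele_counts.values(), reverse=True)
    # A site where only one allele is observed has no minor allele.
    return counts[1] / total if len(counts) > 1 else 0.0


def rarity_class(maf: float) -> str:
    """Label a variant by MAF using the article's cutoffs (hypothetical helper)."""
    if maf < EXTREMELY_RARE_MAF:
        return "extremely rare"
    if maf < RARE_MAF:
        return "rare"
    return "common"


# Hypothetical counts: 10 copies of the missense allele among 1,000,000 alleles.
maf = minor_allele_frequency({"A": 999_990, "G": 10})
print(maf, rarity_class(maf))  # prints: 1e-05 rare
```

A variant seen 10 times in a million alleles is still only "rare" under these cutoffs; to count as extremely rare it would have to appear less than once per million alleles.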
The team built VARITY as follows. They extracted all missense variants from roughly 18,000 genes and identified around 4,000 disease-associated proteins. They then trained a machine-learning classifier on variants, and on properties ('features') of those variants, drawn from many databases. Although many sources of variant annotation were used, the model was optimized for performance on rare or extremely rare variants with high-quality pathogenicity annotations from ClinVar. After training, the researchers analyzed which features mattered most. Conservation scores, differences in physicochemical properties between the missense and wild-type amino acids, and solvent-accessible surface area were the most important contributors to predicting variant outcomes.
Most importantly, the Roth lab found that VARITY outperforms other computational methods at pinpointing rare pathogenic variants, identifying 12-13% more pathogenic variants than the other methods. When tested on de novo missense mutations in neurodevelopmental disorders, VARITY was more sensitive (had higher recall) than all the other algorithms at a stringent threshold where 90% of predictions were correct. It also outperformed other methods on rare ClinVar variants that had not been used to train the model. Future studies could improve VARITY's performance by adding features such as mode of inheritance (e.g., dominant or recessive) and mechanism (gain or loss of function). Together with further research into computational predictors, this model will help improve the accuracy of clinical genetic testing and provide further insight into genetic disorders and their mechanisms.
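The neurodevelopmental benchmark compares predictors by recall at a fixed precision. Among score thresholds at which at least 90% of the variants called pathogenic are truly pathogenic, it asks what fraction of all real pathogenic variants each method catches. The sketch below illustrates that metric under this definition. It is not the study's evaluation code, and the function name and toy scores are hypothetical.

```python
from collections.abc import Sequence


def recall_at_precision(
    scores: Sequence[float], labels: Sequence[int], min_precision: float = 0.9
) -> float:
    """Best recall over score thresholds whose precision is at least min_precision.

    labels: 1 = pathogenic, 0 = benign. A variant is called pathogenic when its
    score is >= the threshold. Returns 0.0 if no threshold reaches min_precision.
    """
    total_pathogenic = sum(labels)
    if total_pathogenic == 0:
        raise ValueError("need at least one pathogenic variant")
    # Walk from the highest score down; each distinct score is a candidate threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores fall on the same side of any threshold, so count them together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best_recall = max(best_recall, tp / total_pathogenic)
    return best_recall


# Hypothetical scores from one predictor on six labelled variants.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
labels = [1, 1, 1, 0, 1, 0]
print(recall_at_precision(scores, labels))  # prints: 0.75
```

Fixing precision caps the false-positive burden (here, at most one wrong call in ten). The comparison then reduces to which method recovers more true pathogenic variants under that cap.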