New algorithm improves ability to generate better genetic predictors

Researchers now have a new, more refined tool, Polygenic-Risk-Score (PRS) on Spark, to help them understand an individual’s genetic risk for common illnesses, including mental illnesses. In an initial validation study, the newly developed software package generated a polygenic risk score for a cohort of Canadian women that explained and predicted more of the variation in their symptoms of depression than scores generated by other software packages.

Search for single genetic factor shifting to polygenic risk scores

With the realisation that most common illnesses cannot be linked to a single genetic or environmental cause, polygenic risk scores are becoming a key focus of emerging research across many health domains. The PRS on Spark software, published in BMC Bioinformatics, was developed by a research team that included fourteen pan-Ludmer-Centre researchers at the Douglas Hospital Research Centre and the Jewish General Hospital together with colleagues from the University of British Columbia.

Lawrence M Chen, first author and a PhD candidate in McGill’s Integrated Program in Neuroscience, noted, “Polygenic risk scores can be used to describe an individual’s genetic liability for a range of complex traits, including adverse mental health outcomes.” This matters because polygenic risk scores allow researchers to move beyond the analysis of single genes (monogenic) to describe how variation across the genome (polygenic) influences an individual’s risk for adverse health outcomes. Such approaches capitalize on large-scale global initiatives, such as the Psychiatric Genomics Consortium, which describe the genetic architecture of psychiatric disorders and allow us to build better genetic predictors and then apply this knowledge in relatively small cohorts of individuals.

Ludmer Centre researcher and CIFAR Global Scholar, Dr Kieran O’Donnell, added, “The algorithms that facilitate the creation of polygenic risk scores are relatively simple; however, our new software provides users with additional options to include more genetic variants, specifically, those not usually considered by conventional approaches.”

Why be excited by polygenic risk scores?

While single genetic mutations can impact an individual’s health, most common complex disorders arise from the combined effect of multiple genetic variants across the genome, which operate in conjunction with environmental factors (such as exposure to stress, trauma, etc.) to influence health outcomes, including who develops an illness or disorder. New data-acquisition technologies for genome-wide association studies (GWAS) and their decreasing costs have expanded the availability of genetic and phenotypic data collected across hundreds of thousands of individuals. Not only is this data improving our understanding of the genetic underpinnings of mental illnesses, but it also enables researchers to better identify ‘risk variants’ and to create summary scores, or polygenic risk scores, that reflect an individual’s overall susceptibility, or genetic risk, for a specific health outcome. Tools, like PRS-on-Spark, that transform this data into a risk score are indispensable.

According to Dr O’Donnell, “Polygenic risk scores have the potential to inform personalized approaches, for example, in understanding individual differences in intervention efficacy. We hope our new software, PRS-on-Spark, will help facilitate this.”

PRS-on-Spark improves research

To increase the predictive power of a polygenic risk score, researchers need to consider as much genomic information as possible. For DNA, researchers assess genomic variation by measuring single nucleotide polymorphisms (SNPs), positions at which genomes differ by a single base. Calculating allele frequencies or genotype counts per SNP is complex, time-intensive and error-prone. In the PRS-on-Spark research, Ludmer Scientific Director Dr Celia Greenwood noted that strand ambiguity was extremely tricky: she and her colleague Dr Marie Forest spent a lot of time with Dr Chen’s team just ensuring that the alleles were counted correctly from one genotyping chip to another.
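To make the idea of a summary score concrete, a polygenic risk score is commonly defined as a weighted sum: for each SNP, the number of effect alleles a person carries (0, 1 or 2) is multiplied by the effect size reported in a GWAS, and the products are added up. The following is a minimal Python sketch of that general definition, not the PRS-on-Spark implementation; the function name, SNP identifiers and effect sizes are hypothetical and chosen only for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual.

    dosages: dict mapping SNP id -> number of effect alleles carried (0, 1 or 2)
    weights: dict mapping SNP id -> GWAS effect size (for example a log odds ratio)
    Only SNPs present in both dicts contribute to the score.
    """
    shared = [snp for snp in weights if snp in dosages]
    if not shared:
        raise ValueError("no SNPs in common between genotypes and GWAS weights")
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical SNP ids and effect sizes, for illustration only.
gwas_weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
score = polygenic_risk_score(person, gwas_weights)
```

The sketch also shows why allele counting matters: if the effect allele is miscounted for a SNP, its weight is applied to the wrong number and the score shifts accordingly.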

Finally, the number of SNPs that need to be considered is often large, rendering PRS calculations memory intensive. Dr Celia Greenwood said, “As a result, existing software packages that semi-automate this processing discard strand-ambiguous SNPs (those with A/T or C/G alleles) before calculating the polygenic risk score, potentially discarding important SNPs that may be informative for disease prediction or health outcomes.” PRS-on-Spark software provides users with the option of retaining these SNPs to build more inclusive polygenic risk scores. It is this feature that improves its effectiveness.
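Why are A/T and C/G SNPs a problem? The two alleles are complements of each other, so reading the SNP from the opposite DNA strand yields the same pair of letters, and the letters alone cannot show whether two data sets report the same allele. The Python sketch below illustrates that check under simple assumptions; the function names and return labels are hypothetical, and it does not reproduce how PRS-on-Spark itself resolves such SNPs.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose two alleles are complements."""
    return COMPLEMENT[allele1] == allele2


def match_alleles(gwas, target):
    """Compare (effect, other) allele pairs from a GWAS and a target data set.

    Returns one of: "ambiguous", "same", "swapped", "flipped",
    "flipped_swapped" or "mismatch".
    """
    gwas, target = tuple(gwas), tuple(target)
    if is_strand_ambiguous(*gwas):
        # The letters alone cannot separate a strand flip from an allele swap.
        return "ambiguous"
    flipped = tuple(COMPLEMENT[a] for a in target)
    if target == gwas:
        return "same"
    if target == gwas[::-1]:
        return "swapped"          # effect and other allele are exchanged
    if flipped == gwas:
        return "flipped"          # target is reported on the opposite strand
    if flipped == gwas[::-1]:
        return "flipped_swapped"
    return "mismatch"
```

For an A/G SNP, a target data set reporting T/C is recognisably the same SNP on the other strand. For an A/T SNP, a target reporting T/A could be either a strand flip or an allele swap, which is why the sketch only flags such SNPs; additional information is needed to place them.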

PRS-on-Spark validated on major depressive disorder

To validate and test its effectiveness, PRS-on-Spark was compared to another software program commonly used to generate polygenic risk scores, PRSice. Both were used to predict the occurrence of major depressive disorder among women participating in the Maternal Adversity, Vulnerability and Neurodevelopment (MAVAN) cohort.

The PRS-on-Spark program generally calculated polygenic risk scores more quickly than existing software, but more importantly, due to the inclusion of previously discarded strand-ambiguous SNPs, also explained more of the variation in symptoms of depression. 
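One common way to express how much variation a score explains is R², the squared correlation between the score and the outcome across individuals. The short Python sketch below computes it for a single predictor. It illustrates the general statistic only, with a hypothetical function name and made-up numbers, and is not a reproduction of the analysis in the study.

```python
def variance_explained(scores, outcomes):
    """R-squared of a simple linear regression of outcomes on scores.

    With a single predictor this equals the squared Pearson correlation.
    """
    if len(scores) != len(outcomes) or len(scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(scores)
    mean_s = sum(scores) / n
    mean_o = sum(outcomes) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(scores, outcomes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_s == 0 or var_o == 0:
        raise ValueError("scores and outcomes must both vary")
    return cov * cov / (var_s * var_o)


# Made-up values for illustration: polygenic scores and symptom measures.
example_scores = [1.0, 2.0, 3.0]
example_symptoms = [1.0, 3.0, 2.0]
r_squared = variance_explained(example_scores, example_symptoms)
```

A value near 0 means the score says little about who has more symptoms; a value of 1 would mean the outcome is a perfect linear function of the score.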

Dr O’Donnell concluded, “We are excited about these findings but there is still much more to be done. In general, PRS still explain just a fraction of an individual’s risk for most mental health outcomes. Likewise, these predictors do not work equally as well in everyone. Hopefully, by making resources such as PRS-on-Spark available to the community, we can better understand why that is, and in doing so advance our understanding of the genetic contribution to mental health.”

For a more scientific description, read the article published in BMC Bioinformatics:

Chen LM, Yao N, Garg E, Zhu Y, Nguyen TTT, Pokhvisneva I, Hari Dass SA, Unternaehrer E, Gaudreau H, Forest M, McEwen LM, MacIsaac JL, Kobor MS, Greenwood CMT, Silveira PP, Meaney MJ, O’Donnell KJ. (2018) PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores. BMC Bioinformatics. 2018 Aug 8;19(1):295. doi: 10.1186/s12859-018-2289-9.

PRS-on-Spark (PRSoS: https://github.com/MeaneyLab/PRSoS) is implemented in Apache Spark 2.0.0+ (Spark) and Python 2.7. Spark is an open source cluster-computing framework for big data processing that can be integrated into Python programming.