Designing precision tools to mine DNA data

Ludmer Centre Scientific Director Celia Greenwood has secured over $600K from the recent Genome Canada competitions for a new research project, Precision Medicine in Cellular Epigenomics.

To understand brain development, researchers need to unlock the secrets of our DNA. We can now collect data on multiple aspects of DNA, but extracting meaning from this ever-expanding data trove requires the right tools: sophisticated algorithms and software applications that automate complex analytical processes at the push of a button. Building these tools takes a highly skilled, transdisciplinary team. Funded by a new three-year Genome Canada Bioinformatics and Computational Biology grant, Dr Celia Greenwood has brought together a team of experts to advance tool development for processing DNA methylation data. The end goal of the research program is to develop tools that will have an impact on research across multiple disorders, both physical and mental.

One tool to serve multiple needs

Although most people think of them as completely different, mental, neurological and even some physical illnesses share commonalities in their origins, particularly at the epigenetic level. Like a dimmer switch, epigenetic mechanisms (small chemical modifications at the cellular level) regulate the degree to which our genes are expressed, or whether they are switched on or off altogether. One such mechanism, DNA methylation, can determine which parts of our DNA are rendered active or inactive. DNA methylation is influenced by environmental exposures (disease, contaminants, stress, adversity, etc.) and by factors such as genes, age and gender. Importantly for diagnosis and precision medicine, DNA methylation leaves marks on our genome that capture the combined influence of our genes and our environment. These marks offer insights into disease causation as well as into a person's response to therapies. Consequently, a shared research goal across multiple illnesses is to identify and map the DNA methylation markers associated with particular patient symptoms or types of environmental exposure.

Data collection easier, but analytical tools lag

While gathering genetic and DNA methylation data has become much easier, the tools to process and extract information from this fast-growing data tsunami are lagging behind. For example, with bisulfite sequencing it is now technically and financially feasible to measure DNA methylation at single-nucleotide resolution on a large scale across the genome. However, the resulting datasets are voluminous, highly interconnected and extremely noisy. They often contain imprecise measurements or missing data, which makes interpretation a challenge and severely limits the potential of bisulfite sequencing studies to describe the role of epigenetics in a specific disease.
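To make "imprecise measurements or missing data" concrete: at each CpG site, bisulfite sequencing yields a count of reads that still show a methylated C out of all reads covering the site. The minimal sketch below, with a hypothetical `CpGCounts` structure and `methylation_estimate` function, turns those counts into a methylation proportion and a naive binomial standard error, treating zero coverage as missing. It illustrates the measurement problem only; it is not part of the team's software.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Tuple


@dataclass
class CpGCounts:
    """Read counts at one CpG site in one sample (hypothetical structure)."""
    position: int    # genomic coordinate of the CpG
    methylated: int  # reads still showing C after bisulfite conversion
    total: int       # read depth: all reads covering the site


def methylation_estimate(site: CpGCounts) -> Tuple[Optional[float], Optional[float]]:
    """Return (proportion methylated, naive binomial standard error).

    A site with no coverage is missing data and is returned as (None, None).
    """
    if site.total == 0:
        return None, None
    p = site.methylated / site.total
    # Wald standard error: it shrinks only as read depth grows.
    se = sqrt(p * (1 - p) / site.total)
    return p, se


# Same 50% methylation, very different precision:
print(methylation_estimate(CpGCounts(1000, 1, 2)))      # 2 reads
print(methylation_estimate(CpGCounts(1002, 100, 200)))  # 200 reads
print(methylation_estimate(CpGCounts(1004, 0, 0)))      # no reads: missing
```

Both covered sites report 50% methylation, but the two-read estimate has a standard error ten times larger. The Wald formula is deliberately naive: it reports zero uncertainty whenever all reads agree, which is one reason dedicated methods model the read counts directly rather than the proportions.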

Dr Greenwood and her colleagues aim to address this challenge

The objective is to develop an algorithm and software package for analyzing large-scale, high-dimensional DNA methylation data that will enable researchers to extract the valuable signal hidden amongst the noise. Dr Greenwood noted, “The team is fairly optimistic, as we have already developed a prototype bisulfite sequencing method for single-sample analysis. Building on this and with the new funding, we are confident in our ability to design algorithms for multiple-sample analysis.”

A beta version of the new algorithm, called SOMNiBUS (SmOoth ModeliNg of BisUlfite Sequencing), has already been developed. It estimates locally smoothed effects[1] of factors of interest, such as exposures or treatments, which provides a conceptually straightforward route towards integrating genetic information into DNA methylation profiles. With research and development funding from Genome Canada and partners, the team will expand SOMNiBUS and move it from beta to a fully tested software application for regional modeling of bisulfite sequencing methylation data. This will lead to an improved understanding of how environmental exposures, acting through epigenetic alterations, contribute to disease or affect treatment response.
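The idea of a "locally smoothed effect" can be sketched in a few lines. The example below is not the SOMNiBUS algorithm; it swaps in plain Gaussian kernel smoothing (a Nadaraya-Watson estimator). A hypothetical `raw_effects` function computes, at each CpG, the difference in methylation proportion between exposed and unexposed groups (read counts pooled within each group), weighting each site by its smaller group depth. A hypothetical `kernel_smooth` function then replaces each site's noisy difference with a weighted average of its neighbours along the genome, so the estimated exposure effect varies smoothly across the region.

```python
from math import exp
from typing import List, Sequence, Tuple

Counts = Tuple[int, int]  # (methylated reads, total reads) at one CpG


def raw_effects(exposed: Sequence[Counts],
                unexposed: Sequence[Counts]) -> Tuple[List[float], List[float]]:
    """Per-CpG difference in methylation proportion, exposed minus unexposed.

    Returns (differences, weights). The weight is the smaller of the two group
    depths, so poorly covered sites count less; a site with zero depth in
    either group gets weight 0 and is effectively ignored.
    """
    diffs, weights = [], []
    for (m1, n1), (m0, n0) in zip(exposed, unexposed):
        if n1 == 0 or n0 == 0:
            diffs.append(0.0)
            weights.append(0.0)
        else:
            diffs.append(m1 / n1 - m0 / n0)
            weights.append(float(min(n1, n0)))
    return diffs, weights


def kernel_smooth(positions: Sequence[float], values: Sequence[float],
                  weights: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian-kernel weighted average of values around each position."""
    smoothed = []
    for t in positions:
        num = den = 0.0
        for x, v, w in zip(positions, values, weights):
            k = w * exp(-0.5 * ((t - x) / bandwidth) ** 2)
            num += k * v
            den += k
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Hypothetical region of five CpGs (counts pooled over samples per group).
positions = [100, 130, 160, 190, 220]
exposed = [(18, 20), (3, 4), (40, 50), (0, 0), (35, 50)]
unexposed = [(10, 20), (1, 4), (20, 50), (6, 10), (22, 50)]
diffs, weights = raw_effects(exposed, unexposed)
print(kernel_smooth(positions, diffs, weights, bandwidth=40.0))
```

The bandwidth (in base pairs here) is the key design choice: a larger value borrows strength from more neighbouring CpGs and suppresses noise, at the cost of blurring sharp changes in the effect. A real analysis would also adjust for other covariates rather than comparing two pooled groups.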

Noting that the multiple-sample DNA methylation algorithm could be developed using datasets from any number of diseases, Dr Greenwood said the team will begin with data from Systemic Autoimmune Rheumatic Diseases (SARD), in particular scleroderma, a rare, debilitating autoimmune rheumatic disease. Epigenetic factors are believed to play a role in scleroderma risk, progression and treatment response, which is currently unpredictable. Members of the team have been collecting multiple tissue samples from patients, pre- and post-treatment, but lack methods for regional analysis of large-scale bisulfite sequencing data that can account for or model multiple factors simultaneously (e.g. age, sex, treatment). The team will use SOMNiBUS to identify the DNA methylation patterns associated with SARD symptoms. In turn, these patterns will be used to predict treatment response or disease progression.
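What "modeling multiple factors simultaneously" means can be illustrated at a single CpG site. The sketch below fits a textbook binomial logistic regression of methylated reads on several covariates at once by Newton-Raphson, so each coefficient (for example, treatment) is estimated while holding the others (age, sex) fixed. It is not SOMNiBUS, which per the text works at the regional level; it also ignores extra-binomial variation between samples. The function names, design layout and data are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small a)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_binomial_logit(design, methylated, total, iters=25, tol=1e-10):
    """Fit logit(P(read is methylated)) = design . beta by Newton-Raphson.

    design: one row per sample; the first column is 1.0 (intercept).
    methylated, total: read counts at this CpG for each sample.
    """
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, m, n in zip(design, methylated, total):
            p = sigmoid(sum(xj * bj for xj, bj in zip(x, beta)))
            w = n * p * (1 - p)
            for i in range(k):
                score[i] += x[i] * (m - n * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)  # Newton step: info^-1 * score
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical samples: columns = intercept, standardized age, sex, treated.
design = [[1.0, a, s, t] for a, s, t in zip(
    [-1.2, -0.4, 0.3, 1.1, -0.8, 0.1, 0.6, 1.3],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1])]
meth = [12, 15, 14, 19, 22, 26, 25, 30]
depth = [40, 38, 45, 42, 41, 44, 39, 43]
print(fit_binomial_logit(design, meth, depth))  # log-odds coefficients
```

Each coefficient is a log-odds effect adjusted for the other columns; repeating such a fit independently at every CpG discards the correlation between neighbouring sites, which is the gap that regional, smoothed models aim to fill.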

According to Gerald Batist, Director of the Segal Cancer Centre and the McGill Centre for Translational Research in Cancer, “Advances in methods for analysis of epigenetic data are strongly needed. At the Segal Cancer Centre, we are characterizing tumours both before and after treatment, and are extremely interested in obtaining a better understanding of epigenetic modifications.” The Ludmer Centre is supporting SOMNiBUS development because research increasingly links exposure to early-childhood adversity with altered DNA methylation marks and with an individual's risk of developing a mental illness later in life.

Matched-funding partners, crucial to R&D


A condition of receiving the Genome Canada funding was the identification of matched financial support. Matched funding for this crucial R&D came from Scleroderma Quebec and two Montreal-based companies, My Intelligent Machines (MIM) and GPU one. MIM, which develops machine-learning methods, is supporting a trainee and will integrate SOMNiBUS into the set of services it provides to clients seeking analytic solutions for bisulfite sequencing measures of DNA methylation. GPU one, which is providing access to a high-performance computational platform, will offer opportunities for dissemination to the GPU-user community and visibility through its portal. The Lady Davis Institute is providing trainee support.

According to Genome Quebec, which also supported Dr Greenwood’s application, Québec researchers secured $18 million for 11 projects, over 30% of the available federal funding across three competitions. The strong performance by Québec researchers in Genome Canada’s most recent competitions positions Québec as a key driver of innovation.

Meet the team

The lead investigators, Dr Greenwood, a McGill researcher at the Lady Davis Institute, and Dr Karim Oualkacha at the Université du Québec à Montréal, are both Canadian leaders in statistical genetics. Co-investigators include two methodologically strong statisticians, Dr Lajmi Lakhal-Chaieb at Université Laval and Dr Aurélie Labbe at HEC Montréal. Three outstanding epigenetics researchers complete the team: Dr Marie Hudson, a McGill researcher at the Lady Davis Institute studying scleroderma and autoimmune rheumatic diseases; Dr Denise Daley at the University of British Columbia, who studies asthma and food allergy; and Ludmer researcher Dr Tie-Yuan Zhang at the Douglas Hospital, who studies maternal-foetal interactions and behavioural development.

[1] Smoothing: a process by which data points are averaged with their neighbours in a series (such as a time series) or in an image. Also referred to as filtering, as it has the effect of suppressing high-frequency signal while retaining low-frequency signal.