From Molecules and Cells to Human Health : Ideas and concepts


Date(s) 27/04/2024

Development of an AI-assisted algorithm for the prediction of novel causal genes and variants for Mendelian disorders from whole genome sequencing

By Shuji Kawaguchi

Advances in DNA sequencing technologies have enabled the rapid and cost-efficient identification of causal genes and variants for a number of diseases. This is especially true for Mendelian disorders, where patients who carry a causative variant in their genome can finally obtain a definitive diagnosis of their disease. However, even with this revolutionary technology, the success rate of genetic diagnosis via next-generation sequencing is currently only around 30% for undiagnosed Mendelian disease cases. This is in part due to the limitations of the analytical methods available to identify and prioritize causal variants from the vast amounts of sequencing data generated. Currently, the genetic diagnosis of Mendelian disorders is performed by comparing the genome of a patient to those of a large number of controls. Such comparisons generally produce a long list of genetic variants that are unique to the patient. Many of these are probably benign, and identifying the causal gene and variant can be a real challenge. To address this problem, we have developed a novel method that ranks candidate genes and variants using an AI-assisted algorithm that relies on IBM Watson’s text mining approach. As a proof of concept, we used a large whole-genome sequencing (WGS) dataset on retinitis pigmentosa (RP) with 523 cases and 2,143 controls. Our method consists of the following steps: 1) Select the inclusion criteria for variants so as to maximize the difference between the true positive rate for patients and the false positive rate for controls, based on previously known causal genes from a public database. 2) Using these inclusion criteria, create a list of candidate genes and variants. 3) Use IBM Watson to sort and prioritize this list of genes. Using this strategy on the RP WGS dataset, we were able to identify and prioritize 994 candidate genes. Notably, many of our top-ranked genes shared structural and functional features with previously known RP genes.
We also succeeded in increasing the diagnosis rate of RP from 37% to 52% by incorporating these top-ranked candidates without increasing the rate of false positives in controls. Going forward, we plan to further improve the approach by integrating other AI technologies that rely on omics or image analysis data. We also plan to develop a gene and variant registry with the aim of constructing a comprehensive infrastructure in Japan for studying the genetics of intractable diseases. In this registry, various AI technologies will be implemented to perform integrative analyses across various diseases.
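
Step 1 of the method, choosing inclusion criteria that maximize the gap between the true positive rate in patients and the false positive rate in controls, can be illustrated with a small grid search. The following is only a minimal sketch under stated assumptions: the abstract does not say which criteria were tuned, so the two filters used here (a maximum allele frequency and a minimum deleteriousness score), the function names, and the toy data are all hypothetical. Steps 2 and 3, including the IBM Watson ranking, are not modeled.

```python
from itertools import product


def carrier_rate(samples, max_af, min_score):
    """Fraction of samples with at least one variant passing both filters.

    Each sample is a list of (allele_frequency, score) pairs for variants
    that fall in previously known causal genes (hypothetical encoding).
    """
    if not samples:
        return 0.0
    hits = sum(
        any(af <= max_af and score >= min_score for af, score in variants)
        for variants in samples
    )
    return hits / len(samples)


def select_criteria(cases, controls, af_grid, score_grid):
    """Grid-search the (max_af, min_score) pair that maximizes TPR - FPR.

    Returns a tuple (tpr_minus_fpr, max_af, min_score). Ties keep the
    first pair encountered in grid order.
    """
    best = None
    for max_af, min_score in product(af_grid, score_grid):
        tpr = carrier_rate(cases, max_af, min_score)     # patients flagged
        fpr = carrier_rate(controls, max_af, min_score)  # controls flagged
        candidate = (tpr - fpr, max_af, min_score)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


# Toy data, invented purely for illustration (not from the RP dataset).
cases = [[(0.0001, 0.9)], [(0.002, 0.8)], [(0.05, 0.2)], []]
controls = [[(0.002, 0.3)], [(0.05, 0.9)], [], [(0.0001, 0.1)]]

best = select_criteria(cases, controls, af_grid=[0.001, 0.01], score_grid=[0.0, 0.5])
print(best)
```

On this toy input the search settles on the looser frequency cutoff combined with the stricter score cutoff, since that pair flags half of the cases and none of the controls. The objective, true positive rate minus false positive rate, is the quantity named in step 1; whether the original work used a grid search or some other optimizer is not stated in the abstract.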

Information about the video

  • Date of recording 06/03/2018
  • Date of publication 26/03/2018
  • Institution IHES
  • Format MP4
