Analysis and development of machine learning models for identifying driver mutations and disease-prone sites in cancer genomes

Date: 13th Oct 2023

Time: 10:00 AM

Venue: Google Meet

PAST EVENT

Details

Cancer is among the most harmful diseases. It arises from changes at the gene level and is driven mainly by somatic missense mutations that lead to uncontrolled cell division. Missense mutations play an important role in altering many cellular signalling processes. They can be classified as drivers, which confer a growth advantage on cells, or passengers, which offer little proliferative benefit.

We collected experimentally known drivers from the Cancer Gene Census and identified cancer-type-specific motifs and sequence-based properties to develop computational methods. In lung cancer, Gly, Asp, Glu, Gln, and Trp residues and the LG, QF, and TST motifs are preferred at disease-prone sites. We developed a classification method, CanProSite, to identify disease-prone sites in lung cancer. The model predicts disease-prone sites with an accuracy of 81%, and with sensitivity, specificity, and AUC of 82%, 78%, and 0.91, respectively, on 10-fold cross-validation. We then extended the study to other cancer types and developed corresponding classification methods, which achieved average sensitivity, specificity, and AUC of 95%, 87%, and 0.94, respectively, on a test set of 808 sites. These classification methods are available through a web server, MutBLESS.

Additionally, for glioblastoma we collected 9386 disease-causing (driver) mutations, selected based on recurrence in patient samples and experimental annotation as pathogenic, and 8728 neutral (passenger) mutations. We developed a machine learning-based method, GBMDriver, to distinguish between driver and passenger mutations. It showed an accuracy and AUC of 73.59% and 0.82 on 10-fold cross-validation, and 81.99% and 0.87 on a blind set of 1809 mutants.

We expanded the analysis to mutation data for 30 tumor types. Using the pan-cancer dataset from COSMIC v97, we focused on 61,364 missense mutations spanning 682 cancer-causing genes. Employing deep learning techniques, we developed cancer-specific computational models that achieved an average classification accuracy of 84.06% on 10-fold cross-validation. Features such as solvent accessibility, amino acid sequence, di- and tri-peptide motifs, network-based features, and structure-based features derived from AlphaFold structures were instrumental in discriminating driver mutations and disease-prone sites. This study demonstrates the effectiveness of our computational models in differentiating between driver and passenger mutations, and provides insights into mutation pathogenicity across diverse cancer types.
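
To make the di- and tri-peptide motif features mentioned above more concrete, here is a minimal Python sketch of how such features might be counted in a sequence window around a candidate site. The abstract does not describe the actual feature-extraction procedure, so the window size, the function names (`window_around`, `motif_counts`, `site_features`), and the toy sequence are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def window_around(sequence: str, position: int, flank: int = 5) -> str:
    """Return the residues within `flank` positions of a 1-based site.

    The window is truncated at the sequence ends. The default flank of 5
    is an assumption, not a value taken from the abstract.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    start = max(0, position - 1 - flank)
    end = min(len(sequence), position + flank)
    return sequence[start:end]


def motif_counts(window: str, k: int) -> Counter:
    """Count overlapping motifs of length k (k=2 dipeptides, k=3 tripeptides)."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def site_features(sequence: str, position: int, flank: int = 5) -> dict:
    """Collect simple sequence-based features for one candidate site."""
    window = window_around(sequence, position, flank)
    return {
        "residue": sequence[position - 1],
        "window": window,
        "dipeptides": motif_counts(window, 2),
        "tripeptides": motif_counts(window, 3),
    }


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKLGTSTQFDEWLGA"
features = site_features(toy_sequence, position=4, flank=3)
print(features["residue"], features["window"])
print(features["dipeptides"]["LG"], features["tripeptides"]["TST"])
```

Counts like these could then be combined with other descriptors, such as solvent accessibility or structure-based features, and passed to a classifier. That step is omitted here because the abstract does not specify the model architecture.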

Publications:
1. Pandey, M., and Gromiha, M. M. (2023). MutBLESS: A tool to identify disease-prone sites in cancer using deep learning. Biochim Biophys Acta Mol Basis Dis, 1869(6), 166721.
2. Pandey, M., Anoosha, P., Yesudhas, D., and Gromiha, M. M. (2022). Identification of potential driver mutations in glioblastoma using machine learning. Brief Bioinform, 23(6), bbac451.
3. Pandey, M., and Gromiha, M. M. (2021). Predicting potential residues associated with lung cancer using a deep neural network. Mutat Res, 822, 111737.
4. Pandey, M., Shah, S., and Gromiha, M. M. (2023). Computational approaches for identifying disease-causing mutations in proteins. Adv Protein Chem Struct Biol (accepted).

Speakers

Medha Pandey (BT17D027)

Department of Biotechnology