Inferring gene-disease association by an integrative analysis of eQTL genome-wide association study and protein-protein interaction data

J Wang, J Zheng, Z Wang, H Li, M Deng - Human heredity, 2019 - karger.com
Abstract
Objectives: Genome-wide association studies (GWASs) have revealed many candidate SNPs, but the mechanisms by which these SNPs influence diseases are largely unknown. To decipher the underlying mechanisms, several methods have been developed to predict disease-associated genes by integrating GWAS and eQTL data (e.g., Sherlock and COLOC). A number of studies have also incorporated information from gene networks into GWAS analysis to reprioritize candidate genes.
Methods: Motivated by these two different approaches, we have developed a statistical framework that integrates information from GWAS, eQTL, and protein-protein interaction (PPI) data to predict disease-associated genes. Our approach is based on a hidden Markov random field (HMRF) model, and we call the resulting computational algorithm GeP-HMRF (a GWAS-eQTL-PPI-based HMRF).
Results: We compared the performance of GeP-HMRF with the Sherlock, COLOC, and NetWAS methods on 9 GWAS datasets, using the disease-related genes in the MalaCards database as the standard, and found that GeP-HMRF significantly improves the prediction accuracy. We also applied GeP-HMRF to an age-related macular degeneration (AMD) dataset. Among the top 50 genes predicted by GeP-HMRF, 7 are reported by the MalaCards database to be AMD-related, with an enrichment p value of 3.61 × 10^-119. Among the top 20 genes predicted by GeP-HMRF, CFHR1, CFHR3, HTRA1, and CFH are AMD-related in the MalaCards database, and another 9 genes are supported by the literature.
Conclusions: We built a unified statistical model to predict disease-related genes by integrating GWAS, eQTL, and PPI data. Our approach outperforms Sherlock, COLOC, and NetWAS in simulation studies and on 9 GWAS datasets. Our approach can be generalized to incorporate other molecular trait data beyond eQTL and other interaction data beyond PPI.
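The abstract does not say which test produced the enrichment p value for the top-ranked genes. One common choice for this kind of overlap is a one-sided hypergeometric test, and the sketch below illustrates that idea only. The function name `enrichment_p_value` and all numbers in the usage example (background size, number of known disease genes) are hypothetical and are not taken from the paper, so the output is not expected to match the reported value.

```python
from math import comb


def enrichment_p_value(population, known, drawn, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of genes considered (assumed background)
    known:      genes in the background annotated as disease-related
    drawn:      size of the top-ranked list being tested
    hits:       annotated genes observed in the top-ranked list
    """
    if not (0 <= known <= population and 0 <= drawn <= population):
        raise ValueError("known and drawn must lie between 0 and population")
    lowest = max(hits, 0)
    highest = min(known, drawn)  # the overlap cannot exceed either set
    total = comb(population, drawn)
    # Sum exact integer counts first, then divide once to limit rounding error.
    tail = sum(
        comb(known, i) * comb(population - known, drawn - i)
        for i in range(lowest, highest + 1)
    )
    return tail / total


# Hypothetical inputs: 20,000 background genes, 100 annotated disease genes,
# and 7 of them found among the top 50 predictions.
p = enrichment_p_value(population=20000, known=100, drawn=50, hits=7)
print(f"enrichment p value (illustrative inputs): {p:.3e}")
```

Because the counts are summed as exact integers, the result stays accurate for very small tail probabilities, as long as the final value is still representable as a float.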