

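1. School of Software, Shandong University, Jinan, Shandong 250101
2. International Joint Research Institute of Artificial Intelligence, Shandong University, Jinan, Shandong 250101
3. College of Agronomy, China Agricultural University, Beijing 100083
4. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044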
Published online: 10 July 2023
TAN Hao-jiang, WANG Jun, YU Guo-xian, et al. Individual Random Walks for Gene-Phenotype Association Analysis[J/OL]. ACTA ELECTRONICA SINICA, 2023, 1-14.
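Association analysis between genes and phenotypes is important for revealing the intrinsic genetic associations of organisms. Random walk algorithms can fuse multi-omics data, aggregate the label information of first-order or higher-order neighbors, and complete the association information between different nodes in a network, which improves the accuracy of association prediction and helps uncover potential genetic associations between genes and phenotypes. However, existing random walk algorithms usually treat every node equally and ignore the differing importance of nodes, so unimportant nodes propagate excessively and degrade model performance. To address this, this paper proposes a personalized random walk algorithm based on multi-omics data fusion (individual Multiple Random Walks, iMRW). On a multi-omics heterogeneous network built from gene, miRNA, and phenotype nodes, iMRW uses the network topology to design a personalized multiple random walk strategy that assigns different walk lengths to nodes of different importance. It then combines Gaussian interaction profile kernel similarity with random walks to complete the information of different nodes and of the associations between them, and finally fuses the multi-source gene-phenotype association matrices to obtain an accurate gene-phenotype association prediction matrix. Under different experimental settings, comparisons with mainstream algorithms consistently show that iMRW achieves better prediction performance. Experimental analyses of the maize photosynthetic capacity and starch content phenotypes further confirm the practicality and effectiveness of iMRW in identifying potential gene-phenotype associations.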
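The Gaussian interaction profile (GIP) kernel similarity mentioned above can be sketched as follows. This is a minimal sketch assuming the standard GIP formulation from the association-prediction literature, K(i, j) = exp(-γ ||y_i - y_j||²) with γ = γ' / mean_i ||y_i||² and γ' = 1, where y_i is row i of a binary association matrix; the abstract does not state iMRW's exact parameters, and the function name `gip_kernel` and the toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows (interaction profiles) of an association matrix.

    Pass a gene x phenotype matrix for gene-gene similarity, or its transpose
    for phenotype-phenotype similarity. gamma_prime = 1.0 is an assumed default.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)          # ||y_i||^2 for each profile
    mean_sq_norm = sq_norms.mean()
    # Bandwidth normalized by the average profile norm; guard against an all-zero matrix
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    # ||y_i - y_j||^2 = ||y_i||^2 + ||y_j||^2 - 2 y_i . y_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dists = np.maximum(sq_dists, 0.0)          # clip round-off negatives
    return np.exp(-gamma * sq_dists)

# Hypothetical toy data: 3 genes x 3 phenotypes (1 = known association)
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
gene_sim = gip_kernel(A)      # 3 x 3 gene-gene similarity
pheno_sim = gip_kernel(A.T)   # 3 x 3 phenotype-phenotype similarity
```

Because the kernel depends only on association profiles, it can supply similarity for nodes that lack other omics-derived edges, which is why it is a common companion to random walk propagation.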
Association analysis between genes and phenotypes is crucial for revealing the inherent genetic associations of organisms. Random walk-based algorithms can fuse multi-omics data, aggregate the label information of first-order or higher-order neighbors, and complete the association information between different nodes in the network, thereby improving the accuracy of association prediction and uncovering potential genetic associations between genes and phenotypes. However, existing random walk algorithms usually treat every node equally and ignore the varying importance of different nodes; as a result, unimportant nodes can propagate excessively and model performance is compromised. To this end, an individual Multiple Random Walks (iMRW) algorithm based on multi-omics data fusion is proposed. On a heterogeneous genetic network composed of gene, miRNA, and phenotype nodes, we design an individual multiple random walk strategy based on the network topology that assigns different walk lengths to nodes of different importance. We then complete the genetic information of different nodes by fusing multi-source association matrices, Gaussian interaction profile kernel similarity, and random walks, and accurately obtain the gene-phenotype association prediction matrix. Under different experimental settings, iMRW achieves the best prediction performance compared with state-of-the-art algorithms. Case studies on maize photosynthetic capacity and starch content further confirm the usefulness and effectiveness of iMRW in identifying potential gene-phenotype associations.
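The idea of giving nodes individual walk lengths can be illustrated with a simplified bi-random walk over a gene network and a phenotype network. This is a hypothetical sketch, not the authors' implementation: the miRNA layer is omitted, and the degree-based importance rule in `steps_by_degree`, the restart weight `alpha = 0.8`, the toy graphs, and all function names are assumptions. The update follows the generic bi-random walk form (left walk `alpha * M_G @ R + (1 - alpha) * A`, right walk `alpha * R @ M_P + (1 - alpha) * A`, averaged), except that each gene row and each phenotype column stops updating once it has used its own step budget.

```python
import numpy as np

def normalize(w):
    """Symmetric normalization D^-1/2 W D^-1/2 of an adjacency matrix (isolated nodes stay zero)."""
    w = np.asarray(w, dtype=float)
    d = w.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return inv_sqrt[:, None] * w * inv_sqrt[None, :]

def steps_by_degree(w, min_steps=1, max_steps=4):
    """HYPOTHETICAL importance rule: map weighted degree linearly onto [min_steps, max_steps]."""
    d = np.asarray(w, dtype=float).sum(axis=1)
    span = d.max() - d.min()
    scaled = (d - d.min()) / span if span > 0 else np.ones_like(d)
    return np.rint(min_steps + scaled * (max_steps - min_steps)).astype(int)

def individual_birw(a, w_gene, w_pheno, alpha=0.8):
    """Bi-random walk where gene i walks left_steps[i] steps and phenotype j walks right_steps[j] steps."""
    a = np.asarray(a, dtype=float)
    m_g, m_p = normalize(w_gene), normalize(w_pheno)
    left_steps = steps_by_degree(w_gene)    # one budget per gene (row)
    right_steps = steps_by_degree(w_pheno)  # one budget per phenotype (column)
    r = a.copy()
    for t in range(1, int(max(left_steps.max(), right_steps.max())) + 1):
        left = alpha * m_g @ r + (1 - alpha) * a    # walk on the gene network
        right = alpha * r @ m_p + (1 - alpha) * a   # walk on the phenotype network
        # Rows/columns whose budget is used up keep their current scores on that side
        left = np.where((t <= left_steps)[:, None], left, r)
        right = np.where((t <= right_steps)[None, :], right, r)
        r = (left + right) / 2
    return r

# Hypothetical toy data: 4 genes x 3 phenotypes; gene 2 has no known associations
a = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]])
w_gene = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
w_pheno = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
scores = individual_birw(a, w_gene, w_pheno)  # gene 2 now receives scores from its neighbors
```

Giving low-degree nodes fewer steps limits how far their labels spread, which is one way to realize the point that unimportant nodes should not over-propagate; iMRW's actual importance measure and fusion of the miRNA layer may differ.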