DOI: 10.3724/SP.J.1206.2012.00410

Progress in Biochemistry and Biophysics (生物化学与生物物理进展) 2013/40:12 PP.1256-1264

A Method of Pathway Enrichment Analysis Based on Gene Expression Variability

Current pathway enrichment methods are mainly based on differentially expressed genes, and no enrichment method considers pathway variability (variance). We observed that, in disease phenotypes, some pathways show a significant increase or decrease in variability when described with appropriate statistics. In this article we therefore hypothesize that the variation of a single pathway differs significantly between two phenotypes. We designed fourteen statistics, each coupled with a test method, to analyze pathway variation and the significance of pathway enrichment between two phenotypes, and we compared the results with those obtained by literature retrieval. We also investigated the effect of five different data preprocessing methods on the results. The results show that RMA is stable among the five gene expression data preprocessing methods, and that pathway variation differs between the two phenotypes. According to the literature search results, the permutation test coupled with the variance of the Euclidean distance of each gene (the eleventh method) identifies significant pathways more efficiently than GSEA. In conclusion, a pathway enrichment analysis strategy based on pathway variation is feasible; it could serve as a theoretical guideline for enrichment analysis and offer new biological insights for the study of human diseases.
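The abstract does not define the eleventh statistic in detail, so the following Python sketch shows only one possible reading of "permutation test coupled with the variance of Euclidean distance of each gene". It assumes that each gene's distance is measured from the pathway's mean expression profile within a phenotype, that the test statistic is the difference of the resulting variances between the two phenotypes, and that sample labels are permuted. The function names `distance_variance` and `permutation_test`, the root-mean-square normalization, and the add-one p-value correction are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics


def distance_variance(rows, cols):
    """Variance, across genes, of each gene's distance to the pathway mean profile.

    rows: per-gene expression lists for the genes of one pathway (>= 2 genes).
    cols: indices of the samples that belong to one phenotype.
    ASSUMPTION: this definition is a guess at the statistic, not the paper's.
    """
    # Mean pathway profile over genes, one value per selected sample.
    centroid = [statistics.fmean(row[c] for row in rows) for c in cols]
    # Root-mean-square Euclidean distance of each gene from that profile;
    # dividing by the sample count keeps unequal group sizes comparable.
    dists = [
        math.sqrt(sum((row[c] - m) ** 2 for c, m in zip(cols, centroid)) / len(cols))
        for row in rows
    ]
    return statistics.variance(dists)


def permutation_test(rows, labels, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a between-phenotype variability difference.

    labels: one boolean per sample (True = first phenotype, False = second).
    """
    rng = random.Random(seed)

    def stat(lab):
        a = [i for i, flag in enumerate(lab) if flag]
        b = [i for i, flag in enumerate(lab) if not flag]
        return distance_variance(rows, a) - distance_variance(rows, b)

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling sample labels breaks any link between phenotype and variability.
        rng.shuffle(shuffled)
        if abs(stat(shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_test(rows, labels)` on a pathway's expression rows returns the observed variance difference and its estimated p-value. Repeating this for every pathway would give one p-value per pathway, which would then need a multiple-testing correction before pathways are called significant.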

Key words: variation, pathway, enrichment analysis, preprocessing method

ReleaseDate:2015-04-18 09:14:52
