DOI: 10.3724/SP.J.1005.2011.01191

Hereditas (Beijing) (遗传) 2011/33:11 PP.1191-1202

RNA-Seq and its applications: a new technology for transcriptomics

The transcriptome is the complete set of transcripts in a given type of cell or tissue at a specific developmental stage or under a specific physiological condition. Transcriptome analysis draws on information about gene structure and function to provide a comprehensive understanding of the molecular mechanisms involved in specific biological processes and diseases. Transcriptome research is being transformed by RNA-Seq, an efficient and fast approach. RNA-Seq refers to the use of high-throughput sequencing technologies to sequence cDNA libraries reverse-transcribed from all RNAs in tissues or cells; it can be used to quantify, profile, and discover RNA transcripts from sequence reads. The transcripts can then be mapped to the reference genome to obtain comprehensive genetic information, such as the location of transcription and alternative splicing status. RNA-Seq has been widely used in biological, medical, clinical, and pharmaceutical research. This review covers the principles, technical characteristics, and applications of RNA-Seq in detail and discusses its future challenges and application potential, with the aim of providing useful information for other researchers.

Key words: RNA-Seq, transcriptome, next-generation sequencing (NGS) technology

ReleaseDate:2014-07-21 16:02:14
