CN101976296B - Method for predicting plant microRNA-target interaction networks based on next-generation sequencing data


Info

Publication number
CN101976296B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN2010102816834A
Other languages
Chinese (zh)
Other versions
CN101976296A (en)
Inventor
陈铭
孟一君
苟凌峰
陈迪俊
白琳
黄冬林
克里斯汀·克鲁卡斯
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN2010102816834A
Publication of CN101976296A
Application granted
Publication of CN101976296B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for predicting plant microRNA-target interaction networks based on next-generation sequencing data. The method comprises the following steps: (1) collecting plant microRNA and genomic data; (2) processing the plant microRNA data; (3) predicting plant microRNA target sites with miRU; (4) collecting PARE signal data; (5) building the MiR-Tar module of the PmiRKB database; (6) verifying plant microRNA-target interactions with PARE signal data; and (7) constructing the plant microRNA-target interaction network. The invention integrates parallel analysis of RNA ends (PARE) data for rice and Arabidopsis and provides the PARE signals that map near the microRNA binding sites on target mRNAs. These signals can be used to determine whether a predicted microRNA-target mRNA pair reflects a genuine cleavage-based regulatory relationship, and PARE data sets from different tissues can be compared to reveal the tissue specificity of that regulation. The invention predicts the existing microRNA-target interaction networks of rice and Arabidopsis and, after further manual screening, yields a final network model of high reliability.

Description

Method for predicting plant microRNA-target interaction networks based on next-generation sequencing data
Technical field
The present invention relates to a method for predicting plant microRNA-target interaction networks based on next-generation sequencing data.
Background art
Plant microRNAs are a class of small non-coding RNAs 20-24 nucleotides long and are important regulators of gene function [10]. Once loaded into the RNA-induced silencing complex (RISC), a plant microRNA guides cleavage at the complementary site of target mRNAs with which it is highly complementary, lowering target gene expression [10]. Most target genes encode transcription factors, so the regulatory reach of plant microRNAs extends across almost the whole genome [12]. MicroRNAs therefore play vital roles in many plant biological processes, including development, stress responses and the microRNA pathway itself [12].
Research on plant microRNAs has identified large numbers of them, and dedicated microRNA databases have been built on this basis [9,19]. miRBase is a comprehensive microRNA database covering published animal and plant microRNAs; it provides information such as microRNA sequences, precursor sequences, precursor secondary structures, genomic context and references [9]. PMRD is a dedicated plant microRNA database that covers more plant species and includes a large number of predicted microRNAs without experimental validation [19]. For species with mRNA sequence data, such as rice and Arabidopsis, PMRD also lists predicted target genes [19].
As important model organisms, rice and Arabidopsis have extensive bioinformatics resources, including well-annotated genome sequences, polymorphism data and large volumes of high-throughput sequencing data [7,11,13-17]. Many of these data can be used in plant microRNA research.
Sequencing data and microarray experiments have detected large numbers of single-nucleotide polymorphisms (SNPs) among rice and Arabidopsis subspecies [7,13,15]. SNPs in a microRNA precursor can affect how the precursor folds and therefore how DCL1 recognises and cleaves it [10]. SNPs in the mature microRNA or in the target binding site can change the degree of complementarity between the microRNA and the target mRNA, and therefore the efficiency with which the microRNA cleaves that mRNA [10]. SNP data can thus be used to study how SNPs acting at the level of the microRNA pathway contribute to differences between subspecies.
Massively parallel signature sequencing (MPSS) is a high-throughput sequencing technology for studying gene expression, and large amounts of MPSS data exist for rice and Arabidopsis [14]. A plant microRNA gene is an independent transcriptional unit: like protein-coding genes, it is transcribed by RNA polymerase II and carries a 5' cap and a 3' poly(A) tail [10]. MPSS data can therefore be used to analyse microRNA gene expression. MPSS transcription signals can also inform the transcribed region and gene model of microRNA genes.
Parallel analysis of RNA ends (PARE) is a high-throughput degradome sequencing technology that sequences the 5' ends of polyadenylated 3' cleavage products, and large amounts of PARE data also exist for rice and Arabidopsis [14]. Because plant microRNAs are highly complementary to their target mRNAs, they mainly guide target cleavage, and the cleavage products can be detected by PARE [8]. PARE data can therefore be used to analyse microRNA-guided cleavage of target mRNAs. In addition, microRNA biogenesis requires cleavage by DCL1, and a microRNA may also guide cleavage of its own precursor; these cleavage events can likewise be analysed with PARE data [8].
List of references
[1] Apache HTTP Server Project: http://httpd.apache.org/.
[2] PostgreSQL: http://www.postgresql.org/.
[3] Scalable Vector Graphics: http://www.w3.org/Graphics/SVG/.
[4] Vienna RNA Package: http://www.tbi.univie.ac.at/~ivo/RNA/.
[5] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 25:3389-3402, 1997.
[6] R. Bruccoleri and G. Heinrich. Computer Applications in the Biosciences, 4:167-173, 1988.
[7] F Alex Feltus, Jun Wan, Stefan R Schulze, James C Estill, Ning Jiang, and Andrew H Paterson. An SNP resource for rice genetics and breeding based on subspecies indica and japonica genome alignments. Genome Res., 14:1812-9, 2004.
[8] Marcelo A German, Manoj Pillay, Dong-Hoon Jeong, Amit Hetawal, Shujun Luo, Prakash Janardhanan, Vimal Kannan, Linda A Rymarquis, Kan Nobuta, Rana German, Emanuele De Paoli, Cheng Lu, Gary Schroth, Blake C Meyers, and Pamela J Green. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol., 26:941-6, 2008.
[9] Sam Griffiths-Jones, Harpreet Kaur Saini, Stijn van Dongen, and Anton J Enright. miRBase: tools for microRNA genomics. Nucleic Acids Res., 36:D154-8, 2008.
[10] Matthew W. Jones-Rhoades, David P. Bartel, and Bonnie Bartel. MicroRNAs and their regulatory roles in plants. Annual Review of Plant Biology, 57:19-53, 2006.
[11] Yong-Fang Li, Yun Zheng, Charles Addo-Quaye, Li Zhang, Ajay Saini, Guru Jagadeeswaran, Michael J Axtell, Weixiong Zhang, and Ramanjulu Sunkar. Transcriptome-wide identification of microRNA targets in rice. The Plant Journal: for Cell and Molecular Biology, 2010.
[12] Allison C Mallory and Hervé Vaucheret. Functions of microRNAs and related small RNAs in plants. Nat. Genet., 38:S31-6, 2006.
[13] Kenneth L McNally, Kevin L Childs, Regina Bohnert, Rebecca M Davidson, Keyan Zhao, Victor J Ulat, Georg Zeller, Richard M Clark, Douglas R Hoen, Thomas E Bureau, Renee Stokowski, Dennis G Ballinger, Kelly A Frazer, David R Cox, Badri Padhukasahasram, Carlos D Bustamante, Detlef Weigel, David J Mackill, Richard M Bruskiewich, Gunnar Rätsch, C Robin Buell, Hei Leung, and Jan E Leach. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc. Natl. Acad. Sci. U.S.A., 106:12273-8, 2009.
[14] Mayumi Nakano, Kan Nobuta, Kalyan Vemaraju, Shivakundan Singh Tej, Jeremy W Skogen, and Blake C Meyers. Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res., 34:D731-5, 2006.
[15] Seung Yon Rhee, William Beavis, Tanya Z. Berardini, Guanghong Chen, David Dixon, Aisling Doyle, Margarita Garcia-Hernandez, Eva Huala, Gabriel Lander, Mary Montoya, Neil Miller, Lukas A. Mueller, Suparna Mundodi, Leonore Reiser, Julie Tacklind, Dan C. Weems, Yihe Wu, Iris Xu, Daniel Yoo, Jungwon Yoon, and Peifen Zhang. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl. Acids Res., 31:224-228, 2003.
[16] Qiaoping Yuan, Shu Ouyang, Aihui Wang, Wei Zhu, Rama Maiti, Haining Lin, John Hamilton, Brian Haas, Razvan Sultana, Foo Cheung, Jennifer Wortman, and C. Robin Buell. The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol., 138:18-26, 2005.
[17] Guojie Zhang, Guangwu Guo, Xueda Hu, Yong Zhang, Qiye Li, Ruiqiang Li, Ruhong Zhuang, Zhike Lu, Zengquan He, Xiaodong Fang, Li Chen, Wei Tian, Yong Tao, Karsten Kristiansen, Xiuqing Zhang, Songgang Li, Huanming Yang, Jian Wang, and Jun Wang. Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res., 20:646-54, 2010.
[18] Yuanji Zhang. miRU: an automated plant microRNA target prediction server. Nucleic Acids Res., 33:W701-4, 2005.
[19] Zhenhai Zhang, Jingyin Yu, Daofeng Li, Zuyong Zhang, Fengxia Liu, Xin Zhou, Tao Wang, Yi Ling, and Zhen Su. PMRD: plant microRNA database. Nucleic Acids Res., 38:D806-13, 2010.
Summary of the invention
The purpose of this invention is to provide a method for predicting plant microRNA-target interaction networks based on next-generation sequencing data.
The method for predicting plant microRNA-target interaction networks based on next-generation sequencing data comprises the following steps:
1) collecting plant microRNA and genomic data;
2) processing the plant microRNA data;
3) predicting plant microRNA target sites with miRU;
4) collecting PARE signal data;
5) building the MiR-Tar module of the PmiRKB database;
6) verifying plant microRNA-target interactions with PARE signal data;
7) constructing the plant microRNA-target interaction network.
The step of collecting plant microRNA and genomic data is as follows: the microRNA data for rice and Arabidopsis come from miRBase release 15. Rice has 498 mature sequences and 449 precursor sequences, and Arabidopsis has 224 mature sequences and 199 precursor sequences. Rice genomic data come from TIGR release 6.1, and Arabidopsis genomic data come from TAIR release 9.
The step of processing the plant microRNA data is as follows: the miRBase microRNA data are in EMBL format and the genome coordinate data are in GFF format. These files are parsed with Perl scripts and stored in the database, and all sequences are converted to upper case.
The step of predicting plant microRNA target sites with miRU is as follows: the rice microRNA data and the rice genome data are input, miRU is run with its default parameters, and the target sites of the rice microRNAs are predicted. The Arabidopsis microRNA data and the Arabidopsis genome data are then input in the same way, with miRU's default parameters, to predict the target sites of the Arabidopsis microRNAs.
The step of collecting PARE signal data is as follows: PARE signal data come from 10 NGSD data sets and 1 data set from Yongfang Li et al., and the raw data are normalised.
The step of building the MiR-Tar module of the PmiRKB database is as follows: PARE signal data near microRNA genes are displayed as SVG graphics. Each graphic spans 10,000 bp in total around the genomic coordinates of the microRNA precursor, and the data sets are stacked vertically so that users can compare them.
The step of verifying plant microRNA-target interactions with PARE signal data is as follows: the MiR-Tar module of the PmiRKB database is used to output graphically all microRNA-target interactions that have PARE signal support, 8,253 pairs in total. These are then manually screened and corrected, finally yielding 3,077 higher-confidence microRNA-target interactions.
The step of constructing the plant microRNA-target interaction network is as follows: the 3,077 higher-confidence microRNA-target interactions are stored in a tab-delimited text file; NeAT is used to convert this file into a general-purpose GML network file; and the yEd network visualisation tool is used to visualise the 3,077 interactions, producing the plant microRNA-target interaction network.
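The conversion done with NeAT can be approximated in a few lines. The Python sketch below is a stand-in and is not NeAT itself. It reads tab-delimited microRNA/target pairs in an assumed two-column layout (microRNA first, target second) and writes a minimal directed GML graph of the kind yEd can open. The function name `tsv_to_gml` and the column order are assumptions, not part of the original method.

```python
import csv
import io


def tsv_to_gml(tsv_text):
    """Convert 'microRNA<TAB>target' lines into a directed GML graph string."""
    node_ids = {}  # node name -> integer id, in order of first appearance
    edges = {}     # (source_id, target_id) -> None; a dict keeps order and drops duplicates
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue  # skip blank or malformed lines
        mirna, target = row[0].strip(), row[1].strip()
        for name in (mirna, target):
            node_ids.setdefault(name, len(node_ids))
        edges[(node_ids[mirna], node_ids[target])] = None

    lines = ["graph [", "  directed 1"]
    for name, node_id in node_ids.items():
        label = name.replace('"', "")  # GML labels are double-quoted strings
        lines.append(f'  node [ id {node_id} label "{label}" ]')
    for source, target in edges:
        lines.append(f"  edge [ source {source} target {target} ]")
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    pairs = "miR156a\tTARGET_A\nmiR156a\tTARGET_B\nmiR164\tTARGET_B\n"
    print(tsv_to_gml(pairs))
```

Each microRNA and each target becomes one node, and a pair that appears more than once becomes a single edge. That keeps the node and edge counts in the visualised network equal to the number of distinct molecules and interactions.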
The present invention integrates PARE data for rice and Arabidopsis and provides the PARE signals mapped near the microRNA binding sites on target mRNAs. These signals can be used to determine whether a predicted microRNA-target mRNA pair reflects a genuine cleavage-based regulatory relationship, and PARE data sets from different tissues can be compared to reveal the tissue specificity of this regulation. By integrating existing PARE data, the invention also provides PARE signals mapped onto pre-microRNAs. These can be used to monitor DCL1 processing of pri- or pre-microRNAs and self-cleavage of microRNA precursors guided by the microRNA or microRNA*, and differences between tissues can again be observed by comparing libraries. Finally, the existing microRNA-target interactions of rice and Arabidopsis are manually screened and corrected, yielding 3,077 higher-confidence microRNA-target interactions. A network model is built from these interactions and visualised, and this network model is highly reliable.
Brief description of the drawings
Fig. 1 is a simplified ER diagram of the PmiRKB database;
Fig. 2 shows cleavage of AT5G50570.1 by Arabidopsis miR156h verified with PARE signal data in the MiR-Tar module of the PmiRKB database;
Fig. 3 is a partial schematic of the predicted rice microRNA-target interaction network;
Fig. 4 is a partial schematic of the predicted Arabidopsis microRNA-target interaction network.
Detailed description of the embodiments
The method for predicting plant microRNA-target interaction networks based on next-generation sequencing data comprises the following steps:
1) collecting plant microRNA and genomic data;
2) processing the plant microRNA data;
3) predicting plant microRNA target sites with miRU;
4) collecting PARE signal data;
5) building the MiR-Tar module of the PmiRKB database;
6) verifying plant microRNA-target interactions with PARE signal data;
7) constructing the plant microRNA-target interaction network.
The step of collecting plant microRNA and genomic data is as follows: the microRNA data for rice and Arabidopsis come from miRBase release 15. They include microRNA names, microRNA sequences, precursor names, precursor sequences, precursor genome coordinates and references. Rice has 498 mature sequences and 449 precursor sequences, and Arabidopsis has 224 mature sequences and 199 precursor sequences; one precursor may give rise to several mature sequences. Rice genomic data come from TIGR release 6.1, and Arabidopsis genomic data come from TAIR release 9.
The step of processing the plant microRNA data is as follows: the miRBase microRNA data are in EMBL format and the genome coordinate data are in GFF format. These files are parsed with Perl scripts and stored in the database, and all sequences are converted to upper case. The rice MIR156f and MIR531 precursors each have two genome coordinates. To simplify the database structure, a precursor with several genome coordinates is split into one entry per coordinate: MIR156f (1), MIR156f (2), MIR531 (1) and MIR531 (2). For microRNAs without an annotated microRNA* sequence, the microRNA* sequence is chosen from the precursor secondary structure so that the duplex has a 2-nt overhang at each 3' end [10].
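The GFF side of this step can be sketched in Python under a few assumptions. The coordinate file is taken to be standard nine-column, tab-separated GFF, the precursor name is read from a `Name` or `ID` attribute, and the helper names are hypothetical. The sketch also applies the renaming rule above, under which a precursor found at several loci becomes `MIR156f (1)`, `MIR156f (2)` and so on. The original work used Perl scripts; Python is used here only for illustration.

```python
from collections import Counter, defaultdict


def parse_gff_attributes(field):
    """Parse GFF3 'key=value;...' or GTF-style 'key "value";' attribute columns."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)
        else:
            key, _, value = part.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def load_precursors(gff_lines, feature_type=None):
    """Read precursor loci; precursors at several loci get suffixes such as ' (1)'."""
    records = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if feature_type is not None and cols[2] != feature_type:
            continue
        attrs = parse_gff_attributes(cols[8])
        records.append({
            "name": attrs.get("Name") or attrs.get("ID"),
            "chrom": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
        })

    # Number the loci of any precursor name that occurs more than once.
    totals = Counter(r["name"] for r in records)
    seen = defaultdict(int)
    for r in records:
        base = r["name"]
        if totals[base] > 1:
            seen[base] += 1
            r["name"] = f"{base} ({seen[base]})"
    return records
```

Names are numbered in file order, so the first locus listed for a duplicated precursor becomes "(1)". Any stable ordering would work, provided it is applied consistently when the database is rebuilt.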
The step of predicting plant microRNA target sites with miRU is as follows: the rice microRNA data and the rice genome data are input, miRU is run with its default parameters, and the target sites of the rice microRNAs are predicted. The Arabidopsis microRNA data and the Arabidopsis genome data are then input in the same way, with miRU's default parameters, to predict the target sites of the Arabidopsis microRNAs.
The step of collecting PARE signal data is as follows: parallel analysis of RNA ends (PARE) is a high-throughput degradome sequencing technology, and PARE signal data can be used to analyse microRNA-guided cleavage of target mRNAs. PARE signal data come from 10 NGSD data sets and 1 data set from Yongfang Li et al. The raw data are normalised with the arithmetic operations provided by the database. The read count of each sequence is divided by the total number of reads (TIR) in its data set and multiplied by 1,000,000, which gives the RPM of the sequence (reads per million reads in the data set).
The step of building the MiR-Tar module of the PmiRKB database is as follows: PARE signal data near microRNA genes are displayed as SVG graphics. Each graphic spans 10,000 bp in total around the genomic coordinates of the microRNA precursor. Because this range is large, a thumbnail with a movable window is shown above the graphic, and JavaScript lets users move the window to view details. The RPM of each PARE sequence is shown as opacity, and pointing the mouse at a sequence displays its exact genome coordinates and RPM value. Data sets are stacked vertically so that users can compare them. The module also shows the pairing between the microRNA and the target mRNA over about 120 bp of the mRNA around the microRNA binding site. PARE sequences that map uniquely to this site are drawn as framed rectangles to distinguish them.
The step of verifying plant microRNA-target interactions with PARE signal data is as follows: the MiR-Tar module of the PmiRKB database is used to output graphically all microRNA-target interactions that have PARE signal support, 8,253 pairs in total. These are then manually screened and corrected, finally yielding 3,077 higher-confidence microRNA-target interactions.
The step of constructing the plant microRNA-target interaction network is as follows: the 3,077 higher-confidence microRNA-target interactions are stored in a tab-delimited text file; NeAT is used to convert this file into a general-purpose GML network file; and the yEd network visualisation tool is used to visualise the 3,077 interactions, producing the plant microRNA-target interaction network.
Example
1. Data Source
The microRNA data for rice and Arabidopsis come from miRBase [9], release 15. They include microRNA names, microRNA sequences, precursor names, precursor sequences, precursor genome coordinates and references. Rice has 498 mature sequences and 449 precursor sequences; Arabidopsis has 224 mature sequences and 199 precursor sequences. One precursor may give rise to several mature sequences. The genome coordinates of rice microRNA precursors are based on the TIGR6.0 pseudomolecules, and those of Arabidopsis microRNA precursors are based on the TAIR9 genome. Rice genomic data come from TIGR [16], release 6.1. Releases 6.1 and 6.0 differ only in the classification of a small number of genes, so the rice precursor coordinates provided by miRBase are valid for TIGR6.1. Arabidopsis genomic data come from TAIR, release 9 (see Table 1).
The rice SNP data cover 21 subspecies: 93-11, Nipponbare, Tainung 67, Li-Jiang-Xin-Tuan-Hei-Gu, M 202, Azucena, Moroberekan, Cypress, Dom-Sufid, N 22, Dular, FR13A, Aswina, Rayada, IR64-21, Shan-Huang Zhan-2, Pokkali, Swarna, Sadu-Cho, Minghui 63 and Zhenshan 97B. Nipponbare is the reference subspecies. The SNP data for 93-11 come from genome sequence alignment, and for each SNP the raw data give a 41-nt sequence around it that can be used to locate it [7]. SNPs between the other subspecies and Nipponbare were detected with resequencing microarrays combined with model-based (MB) or machine-learning (ML) methods [13]. For each SNP, the raw data give its TIGR5 pseudomolecule coordinates and a 201-nt surrounding sequence, which can be used to place the SNP on TIGR6.1. Only SNPs called by both the MB and ML methods (their intersection) are used, to ensure high reliability. The Arabidopsis SNP data cover 7 subspecies: Col-0, Bur-0, Tsu-1, Ler-1, Bay-0, Sha and Cvi-0, with Col-0 as the reference. These SNP data come from the TAIR Polymorphism database, and the raw data give the TAIR9 genome coordinates of the SNPs directly [15].
The MPSS data for rice and Arabidopsis come mainly from 35 data sets in NGSD (Next-Gen Sequence Database) [14]. The raw data give a read count for each signature sequence, which must be normalised before data sets can be compared. In addition, Guojie Zhang et al. obtained transcriptome data for rice subspecies 93-11 by high-throughput methods (2 data sets in total). These data resemble MPSS data and are likewise suitable for analysing microRNA gene transcription [17], so the 2 data sets are processed as MPSS data.
The PARE data come mainly from 10 data sets in NGSD [14], and the raw data need normalisation. In addition, the rice degradome data of Yongfang Li et al. (1 data set) resemble PARE data and can also be used to analyse microRNA-guided mRNA cleavage [11]. This data set is therefore processed as PARE data and used in constructing the plant microRNA-target interaction network (see Table 2).
Table 1: Data sources for plant microRNAs and genomes
Table 2: Plant PARE data sets
2. Data processing
PmiRKB manages its internal data with PostgreSQL [2], and all raw data are stored in the database after processing. The web interface queries the database to provide each functional module to users.
The miRBase microRNA data are in EMBL format and the genome coordinate data are in GFF format. These files are parsed with Perl scripts and stored in the database. The rice MIR156f and MIR531 precursors each have two genome coordinates. To simplify the database structure, a precursor with several genome coordinates is split into one entry per coordinate: MIR156f (1), MIR156f (2), MIR531 (1) and MIR531 (2). For microRNAs without an annotated microRNA* sequence, the microRNA* sequence is chosen from the precursor secondary structure so that the duplex has a 2-nt overhang at each 3' end [10]. All sequences are converted to upper case for consistency, which simplifies the code of the functional modules.
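The 2-nt overhang rule can be made concrete using a dot-bracket structure of the kind RNAfold produces. The sketch below uses hypothetical helper names and 0-based, inclusive coordinates. If the mature microRNA spans positions s..e, the microRNA* starts at the partner of e-2 and ends two bases past the partner of s, which leaves a 2-nt 3' overhang on both strands of the duplex. The sketch assumes a balanced, pseudoknot-free structure. It gives up (returns None) when either anchor base is unpaired; a real pipeline would need a fallback for bulges at the duplex ends.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (0-based) in a balanced dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # raises IndexError on unbalanced input
            pairs[i], pairs[j] = j, i
    return pairs


def infer_star(dot_bracket, mature_start, mature_end, overhang=2):
    """Return (start, end), 0-based inclusive, of the inferred microRNA*, or None.

    Assumes the mature 5' base and the base `overhang` nt before the mature 3' end are
    both paired, so that both strands of the duplex get a 3' overhang of `overhang` nt.
    """
    pairs = pair_table(dot_bracket)
    last_paired = mature_end - overhang
    if mature_start not in pairs or last_paired not in pairs:
        return None
    star_start = pairs[last_paired]          # pairs with the mature strand's last paired base
    star_end = pairs[mature_start] + overhang  # overhangs the mature 5' end by `overhang` nt
    if star_start > star_end or star_end >= len(dot_bracket):
        return None
    return star_start, star_end
```

The same formula works whether the mature microRNA sits on the 5' arm or the 3' arm of the hairpin. In either case the partner positions fall on the opposite arm.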
The Arabidopsis SNP data all include genome positions and are stored directly in the database after parsing with Perl scripts. The rice SNP data either lack genome positions or have positions that are incompatible with the latest pseudomolecules. The flanking sequences supplied with the raw data are therefore aligned with BLAST [5] against the TIGR6.1 pseudomolecules to determine SNP positions. Only perfect matches are accepted: if a SNP has no perfect match, or more than one, it is discarded to keep the data reliable. Some flanking sequences may match the minus strand of a pseudomolecule, in which case the SNP information is complemented before it is stored in the database. SNP bases are uniformly written in upper case. To make multi-process BLAST easier, the data are first loaded into a temporary database, and each BLAST process selects from it the SNPs whose flanking sequences begin with a particular 5'-terminal sequence. This lets the processes run in parallel without interfering with one another, and the same method is used later for the BLAST of MPSS and PARE sequences.
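The placement rule is sketched below in Python: accept a SNP only when its flanking sequence has a single perfect match, and complement it when that match is on the minus strand. Exact string search on both strands stands in for the BLAST runs described in the text, which is adequate only for small sequences. The function names and the `snp_offset` argument (the SNP's 0-based index within its flanking sequence) are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper- or lower-case DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(genome, pattern):
    """All 0-based start offsets of exact, possibly overlapping, matches."""
    hits = []
    start = genome.find(pattern)
    while start != -1:
        hits.append(start)
        start = genome.find(pattern, start + 1)
    return hits


def locate_snp(genome, flank, snp_offset, alleles):
    """Place a SNP through a unique exact match of its flanking sequence.

    genome: plus-strand pseudomolecule sequence (upper case).
    flank: sequence around the SNP; snp_offset: 0-based index of the SNP within flank.
    alleles: alleles as written on the flank's strand.
    Returns (0-based genome position, alleles on the plus strand), or None.
    """
    flank = flank.upper()
    plus = find_all(genome, flank)
    minus = find_all(genome, revcomp(flank))
    if len(plus) + len(minus) != 1:
        return None  # no perfect match, or several: discard the SNP
    if plus:
        return plus[0] + snp_offset, tuple(a.upper() for a in alleles)
    # Matched the minus strand: mirror the offset and complement the alleles.
    pos = minus[0] + len(flank) - 1 - snp_offset
    return pos, tuple(revcomp(a) for a in alleles)
```

A palindromic flank matches both strands and therefore counts as two hits, so it is discarded under the same "exactly one perfect match" rule.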
The MPSS data are in a CSV-like format, and the sequence fields are parsed and stored in the database. The data are then normalised with the arithmetic operations provided by the database. The read count of each sequence is divided by the total number of reads (TIR) in its data set and multiplied by 1,000,000, which gives the RPM of the sequence (reads per million reads in the data set). The genome positions of these MPSS sequences are determined with BLAST. MPSS sequences with no perfect match, or with multiple perfect matches, cannot be placed unambiguously and are removed from the database to keep it compact. After BLAST, the MPSS sequence strings themselves are also deleted for the same reason.
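The RPM normalisation, RPM = count / TIR × 1,000,000, can be done inside the database as the text describes. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The table layout and column names are hypothetical. The same correlated-subquery UPDATE pattern should carry over to PostgreSQL with little change.

```python
import sqlite3


def load_and_normalise(rows):
    """rows: (dataset, sequence, raw_count) tuples -> in-memory DB with RPM filled in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reads (dataset TEXT, seq TEXT, count INTEGER, rpm REAL)")
    conn.executemany("INSERT INTO reads (dataset, seq, count) VALUES (?, ?, ?)", rows)
    # RPM = count / (total reads of the same data set) * 1,000,000, computed in SQL.
    conn.execute("""
        UPDATE reads
        SET rpm = count * 1000000.0 /
                  (SELECT SUM(r2.count) FROM reads AS r2 WHERE r2.dataset = reads.dataset)
    """)
    conn.commit()
    return conn


if __name__ == "__main__":
    db = load_and_normalise([("lib1", "AAAC", 30), ("lib1", "GGGU", 70), ("lib2", "AAAC", 5)])
    for row in db.execute("SELECT dataset, seq, rpm FROM reads ORDER BY dataset, seq"):
        print(row)
```

For example, a sequence with 30 reads in a data set of 100 reads in total gets 30 / 100 × 1,000,000 = 300,000 RPM. The same sequence can then be compared across data sets of very different sizes.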
PARE data are loaded and normalised in the same way as MPSS data. When PARE sequences are aligned with BLAST against mRNAs or microRNA precursors, PARE sequences with no perfect match are removed. PARE sequences with multiple perfect matches are kept, however. Different mRNAs may be transcribed from the same gene and share long identical subsequences, and microRNAs of the same family share identical mature sequences. Two additional tables in the database record these many-to-many relationships between PARE sequences and the mRNAs or microRNA precursor positions they match.
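One way to lay out the two extra many-to-many tables is sketched below with sqlite3. The schema, table names and the `hits_for_mrna` query are hypothetical. The query also derives a per-read flag that is true when the read matches only one mRNA site. The MiR-Tar display needs this kind of information to frame uniquely mapped reads.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pare_read (id INTEGER PRIMARY KEY, seq TEXT UNIQUE, rpm REAL);
CREATE TABLE mrna (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE precursor (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Many-to-many: one read may hit several mRNAs, and one mRNA collects many reads.
CREATE TABLE pare_mrna_hit (
    read_id INTEGER REFERENCES pare_read(id),
    mrna_id INTEGER REFERENCES mrna(id),
    position INTEGER,  -- offset of the read's 5' end on the mRNA
    PRIMARY KEY (read_id, mrna_id, position)
);
-- The analogous table for microRNA precursor positions.
CREATE TABLE pare_precursor_hit (
    read_id INTEGER REFERENCES pare_read(id),
    precursor_id INTEGER REFERENCES precursor(id),
    position INTEGER,
    PRIMARY KEY (read_id, precursor_id, position)
);
"""


def hits_for_mrna(conn, mrna_name):
    """Reads on one mRNA as (seq, position, rpm, is_unique), ordered by position."""
    return conn.execute("""
        SELECT r.seq, h.position, r.rpm,
               (SELECT COUNT(*) FROM pare_mrna_hit AS h2 WHERE h2.read_id = r.id) = 1
        FROM pare_mrna_hit AS h
        JOIN pare_read AS r ON r.id = h.read_id
        JOIN mrna AS m ON m.id = h.mrna_id
        WHERE m.name = ?
        ORDER BY h.position
    """, (mrna_name,)).fetchall()
```

Here uniqueness is judged against the mRNA table only. A stricter definition could also count hits in `pare_precursor_hit`.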
In PmiRKB, the target mRNAs of each microRNA are obtained by prediction with a dynamic programming algorithm. Each mature microRNA sequence is compared with every site in all mRNA sequences, and a penalty score is computed according to the miRU scoring rules [18]. The mRNA sites with low penalties are kept and their information is stored in the database. Target mRNA names and sequences are also stored, with sequences written in upper case and every T converted to U (the mRNA sequences are derived from genome sequence and coordinates, so they contain T). (See Fig. 1.)
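The text does not spell out the miRU penalty rules [18]. As a rough illustration only, the Python sketch below scores ungapped duplexes with a hypothetical simplified scheme: 0 for Watson-Crick pairs, 0.5 for G:U wobbles and 1 for any other mismatch. It then slides the mature microRNA along an mRNA. This is not the dynamic programming procedure used by PmiRKB or miRU, because gaps and any position-dependent weights are ignored. The function names and the `max_penalty` cut-off are assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_penalty(mirna_base, target_base):
    """Hypothetical per-position penalty: 0 for Watson-Crick, 0.5 for G:U, else 1."""
    if (mirna_base, target_base) in WATSON_CRICK:
        return 0.0
    if (mirna_base, target_base) in WOBBLE:
        return 0.5
    return 1.0


def site_penalty(mirna, site):
    """Penalty of an ungapped duplex; both strings are written 5'->3'.

    The microRNA pairs antiparallel with its site, so mirna[i] faces site[-1 - i].
    """
    if len(mirna) != len(site):
        raise ValueError("an ungapped duplex needs equal lengths")
    return sum(pair_penalty(m, t) for m, t in zip(mirna, reversed(site)))


def scan_targets(mirna, mrna, max_penalty=3.0):
    """Slide the mature microRNA along an mRNA; return (0-based start, penalty) hits."""
    mirna = mirna.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")  # genome-derived mRNA sequences contain T
    k = len(mirna)
    hits = []
    for start in range(len(mrna) - k + 1):
        penalty = site_penalty(mirna, mrna[start:start + k])
        if penalty <= max_penalty:
            hits.append((start, penalty))
    return hits
```

Keeping every site under a cut-off mirrors the "keep the low-penalty sites" step in the text. A gapped dynamic programming alignment would additionally find sites with bulges, which this ungapped scan misses.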
3. Building the PmiRKB database
The MiR info module only needs to query the required information from the database and display it as HTML. MicroRNA gene clusters are found from the genome coordinates of the microRNA precursors by querying within a fixed range (20,000 bp in total around the microRNA gene). Links to external databases are provided for convenient browsing.
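Finding clustered microRNA genes reduces to an interval query. The sketch below assumes that the 20,000 bp window means 10,000 bp on each side of the precursor, set by the `flank` argument. It uses a hypothetical `Precursor` record with 1-based, inclusive coordinates. In PmiRKB the same filter would be a database query rather than a Python loop.

```python
from typing import List, NamedTuple


class Precursor(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def cluster_members(query: Precursor, precursors: List[Precursor], flank: int = 10_000) -> List[str]:
    """Names of other precursors overlapping [query.start - flank, query.end + flank]."""
    lo, hi = query.start - flank, query.end + flank
    return [
        p.name
        for p in precursors
        if p.name != query.name and p.chrom == query.chrom and p.start <= hi and p.end >= lo
    ]
```

The test `p.start <= hi and p.end >= lo` is the usual closed-interval overlap check. A precursor that only partly overlaps the window therefore still counts as part of the cluster.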
The SNP module has two parts: SNPs in microRNA precursors, and SNPs in the binding sites between microRNAs and target mRNAs. First, the SNPs in all microRNA precursors and target mRNAs are queried, and the precursor and target mRNA sequences of each subspecies are derived from them. The secondary structure of each subspecies' precursor sequence is predicted with RNAfold [4], drawn with the NAView algorithm [6] and presented in SVG format [3]. RNAfold is run with the options "-d2 -noLP". The figure marks the microRNA sequence, the microRNA* sequence, and the SNPs relative to the reference subspecies. For each subspecies, a penalty is computed for the microRNA-target binding-site sequences with the miRU scoring rules and shown in colour for easy comparison. External links for the target mRNAs are also provided so that users can view related information.
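Deriving each subspecies' precursor or binding-site sequence from the reference and its SNPs is a simple substitution, sketched below in Python. The sketch assumes plus-strand sequences with 1-based genome coordinates. SNPs are given as hypothetical (position, reference base, alternative base) tuples. Minus-strand precursors would still need reverse-complementing afterwards. The resulting sequences are what would then be folded with RNAfold or scored against the microRNA.

```python
def apply_snps(ref_seq, region_start, snps):
    """Return ref_seq with SNP alleles substituted.

    ref_seq: reference (e.g. Nipponbare or Col-0) sequence of the region, plus strand.
    region_start: 1-based genome coordinate of ref_seq[0].
    snps: iterable of (genome_pos, ref_base, alt_base) with 1-based positions.
    """
    seq = list(ref_seq.upper())
    for pos, ref, alt in snps:
        i = pos - region_start
        if not 0 <= i < len(seq):
            continue  # SNP lies outside this precursor or binding site
        if seq[i] != ref.upper():
            raise ValueError(f"reference base mismatch at {pos}: {seq[i]} != {ref}")
        seq[i] = alt.upper()
    return "".join(seq)


def sequences_by_line(ref_seq, region_start, snps_by_line):
    """Map each subspecies name to its derived sequence for one region."""
    return {line: apply_snps(ref_seq, region_start, snps) for line, snps in snps_by_line.items()}
```

Checking the reference base before substituting catches coordinate mismatches early. Such mismatches can arise, for example, from using SNP positions from one genome release against sequence from another.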
The Pri-miR module displays MPSS signals near microRNA genes as SVG graphics. Each graphic spans 10,000 bp in total around the genomic coordinates of the microRNA precursor. Because this range is large, a thumbnail with a movable window is shown above the graphic, and JavaScript lets users move the window to view details. The RPM of each MPSS sequence is shown as opacity, and pointing the mouse at a sequence displays its exact genome coordinates and RPM value. Data sets are stacked vertically so that users can compare them.
The MiR-Tar module is similar to the Pri-miR module. It also shows the pairing between the microRNA and the target mRNA, and its graphic spans about 120 bp of the mRNA around the microRNA binding site. PARE sequences that map uniquely to this site are drawn as framed rectangles to distinguish them. (See Fig. 2.)
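A minimal sketch of one such signal track is given below in Python. It assumes that reads arrive as dicts with hypothetical keys. Opacity is the read's RPM divided by the track maximum, and this linear scaling is an assumption. Uniquely mapped reads get a black outline, as in the framed rectangles described above. The hover text uses the native SVG `<title>` element rather than the JavaScript described in the text.

```python
from xml.sax.saxutils import escape


def render_track(reads, region_start, region_end, width=600, height=12, max_rpm=None):
    """Render one data set's reads in [region_start, region_end] as an SVG string.

    reads: dicts with hypothetical keys 'start', 'end' (1-based, inclusive), 'rpm',
    and optional 'unique' (read maps to a single site) and 'label'.
    """
    scale = width / (region_end - region_start + 1)
    if max_rpm is None:
        max_rpm = max((r["rpm"] for r in reads), default=0.0) or 1.0
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for r in reads:
        if r["end"] < region_start or r["start"] > region_end:
            continue  # read lies outside the displayed window
        left, right = max(r["start"], region_start), min(r["end"], region_end)
        x = (left - region_start) * scale
        w = max((right - left + 1) * scale, 1.0)  # keep very short reads visible
        opacity = min(r["rpm"] / max_rpm, 1.0)    # assumed linear RPM-to-opacity mapping
        outline = ' stroke="black" stroke-width="1"' if r.get("unique") else ""
        tooltip = escape(f'{r.get("label", "")} {r["start"]}-{r["end"]} RPM={r["rpm"]:.2f}'.strip())
        parts.append(
            f'<rect x="{x:.1f}" y="0" width="{w:.1f}" height="{height}" fill="red" '
            f'fill-opacity="{opacity:.3f}"{outline}><title>{tooltip}</title></rect>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

One track is drawn per data set. Stacking these strings vertically in a page gives the side-by-side comparison the module provides.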
The Self-reg module is similar to the MiR-Tar module. Its graphic spans the entire microRNA precursor and marks the positions of the microRNA and the microRNA*.
To help users find a specific microRNA, every module provides a case-insensitive search by microRNA name. The website interface is styled with CSS. Firefox 3.5 and Internet Explorer 8 (with an SVG plug-in installed) were used as test platforms to ensure support for both browsers.
4. Predicting plant microRNA target sites with miRU
The rice microRNA data and the rice genome data are loaded into miRU, its default parameters are selected, and the target sites of rice microRNAs are predicted. Likewise, the Arabidopsis microRNA data and the Arabidopsis genome data are loaded with miRU's default parameters to predict the target sites of Arabidopsis microRNAs.
5. Building the network model
Parallel Analysis of RNA Ends (PARE) is a high-throughput degradome sequencing technique, and PARE signal data can be used to analyse microRNA-directed cleavage of target mRNAs. The PARE signal data come from 10 data sets in NGSD and 1 data set provided by Yongfang Li. The raw data are normalized using the arithmetic operations provided by the database: the read count of each sequence is divided by the total read count of its data set and multiplied by 1,000,000, giving the sequence's RPM (reads per million reads in the data set).
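The RPM normalization described above, as a short Python sketch. `counts` is assumed to hold the read count of every sequence in one data set, so its sum is the data set's total read count; the function name is hypothetical.

```python
from typing import Dict


def to_rpm(counts: Dict[str, int]) -> Dict[str, float]:
    """RPM = read count of a sequence / total reads in its data set * 1,000,000."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("data set contains no reads")
    return {seq: c / total * 1_000_000 for seq, c in counts.items()}


print(to_rpm({"AGCUUAGC": 1, "UUGACAGA": 3}))  # {'AGCUUAGC': 250000.0, 'UUGACAGA': 750000.0}
```

Because each data set is divided by its own total, RPM values from libraries of different sequencing depth become directly comparable, which is what the stacked data-set views rely on.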
The PARE signal data around each microRNA gene are represented as an SVG figure covering 10,000 bp in total around the genomic coordinates of the microRNA precursor. Because this range is large, a thumbnail with a movable window is shown above the figure, and moving the window (implemented in JavaScript) reveals the details. The RPM of each PARE sequence is encoded as opacity, and hovering the mouse over a sequence displays its exact genomic coordinates and RPM value. Data sets are stacked vertically for easy comparison. The figure also shows the pairing between the microRNA and its target mRNA, covering about 120 bp of the mRNA around the binding site; PARE sequences that map uniquely to this site are drawn with a border around their signal rectangles to distinguish them.
The plant microRNA-target relationships are then validated with the PARE signal data. The "MiR-Tar" module of the PmiRKB database was used to plot all target-site relationships that have PARE signal, 8,253 pairs in total; after manual screening and correction, 3,077 higher-confidence microRNA-target pairs were obtained. To build the plant microRNA-target interaction network, these 3,077 pairs were stored in a tab-separated text file, which NeAT converted into the generic GML network format. The yEd network visualization tool was then used to visualize the 3,077 pairs and construct the network. Figures 3 and 4 show partial views of the rice and Arabidopsis microRNA-target networks, respectively.
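In the original work the tab-separated pair list was converted to GML with NeAT. As a stand-in, the sketch below writes a minimal directed GML graph directly: one node per microRNA or gene, one edge per pair, duplicate pairs dropped. The two-column file layout and the `tsv_to_gml` name are assumptions; the output is plain GML of the kind yEd can import.

```python
from typing import Dict, Iterable, List, Tuple


def tsv_to_gml(lines: Iterable[str]) -> str:
    """Convert 'microRNA<TAB>target' lines into a minimal directed GML graph."""
    ids: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mirna, target = line.split("\t")[:2]
        for name in (mirna, target):
            if name not in ids:
                ids[name] = len(ids)  # assign node ids in first-seen order
        edges.append((ids[mirna], ids[target]))
    out = ["graph [", "  directed 1"]
    for name, i in ids.items():
        out.append(f'  node [ id {i} label "{name}" ]')
    for s, t in dict.fromkeys(edges):  # drop duplicate pairs, keep order
        out.append(f"  edge [ source {s} target {t} ]")
    out.append("]")
    return "\n".join(out)


# Hypothetical identifiers for illustration.
print(tsv_to_gml(["miR-A\tgeneX", "miR-A\tgeneY", "miR-A\tgeneX", ""]))
```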
6. Evaluation of results
The invention integrates the PARE data of rice and Arabidopsis and provides the PARE signal mapped to target mRNAs around the microRNA binding sites. This can be used to judge whether a predicted microRNA-target mRNA pair involves genuine cleavage regulation, and comparing PARE data sets from different tissues can reveal the tissue specificity of that regulation. In addition, by integrating existing PARE data, it provides the PARE reads mapped to pre-microRNAs, which can be used to monitor DCL1 processing of pri- or pre-microRNAs and self-cleavage of a microRNA precursor by its own microRNA or microRNA*; differences between tissues can again be observed by comparing libraries. Finally, the known microRNA-target relationships of rice and Arabidopsis were manually screened and corrected to obtain 3,077 higher-confidence pairs, from which a network model was built and visualized. This network model is highly reliable and provides an important reference for future research on plant microRNA-target relationships.
Appendix: Abbreviations
RISC: RNA-Induced Silencing Complex
SNP: Single-Nucleotide Polymorphism
MPSS: Massively Parallel Signature Sequencing
NGSD: Next-Gen Sequence Database
PARE: Parallel Analysis of RNA Ends
CSV: Comma-Separated Values (file format)
RPM: Reads Per Million (reads of a sequence per million reads in its data set)
HTML: HyperText Markup Language
SVG: Scalable Vector Graphics
CSS: Cascading Style Sheets

Claims (3)

1. A method for predicting a plant microRNA-target interaction network based on next-generation sequencing data, characterized in that it comprises the following steps:
1) collecting plant microRNA and genome data;
2) processing the plant microRNA data;
3) predicting plant microRNA target sites with the miRU software;
4) collecting PARE signal data;
5) building the "MiR-Tar" module of the PmiRKB database;
6) validating plant microRNA-target relationships with the PARE signal data;
7) constructing the plant microRNA-target interaction network;
The step of collecting plant microRNA and genome data is as follows: the rice and Arabidopsis microRNA data come from miRBase release 15, comprising 498 mature sequences and 449 precursor sequences for rice and 224 mature sequences and 199 precursor sequences for Arabidopsis; the rice genome data come from the TIGR database release 6.1, and the Arabidopsis genome data come from the TAIR database release 9;
The step of processing the plant microRNA data is as follows: the microRNA data and genomic coordinate data come from miRBase release 15, the microRNA data being in EMBL format and the genomic coordinates in GFF format; the data are parsed with Perl scripts and stored in the database, with all sequences converted to upper case;
The step of building the "MiR-Tar" module of the PmiRKB database is as follows: the PARE signal data around each microRNA gene are represented as SVG figures;
The step of validating plant microRNA-target relationships with the PARE signal data is as follows: the "MiR-Tar" module of the PmiRKB database is used to plot all target-site relationships that have PARE signal, 8,253 pairs in total, which are then manually screened and corrected to obtain 3,077 higher-confidence microRNA-target pairs;
The step of constructing the plant microRNA-target interaction network is as follows: the 3,077 higher-confidence microRNA-target pairs obtained are stored in a tab-separated text file; NeAT is used to convert the text file into the generic GML network format; and the yEd network visualization tool is used to visualize these 3,077 pairs, constructing the plant microRNA-target interaction network.
2. The method for predicting a plant microRNA-target interaction network based on next-generation sequencing data according to claim 1, characterized in that the step of predicting plant microRNA target sites with the miRU software is: loading the rice microRNA data and rice genome data, selecting the default parameters of miRU, and predicting the target sites of rice microRNAs; and loading the Arabidopsis microRNA data and Arabidopsis genome data, selecting the default parameters of miRU, and predicting the target sites of Arabidopsis microRNAs.
3. The method for predicting a plant microRNA-target interaction network based on next-generation sequencing data according to claim 1, characterized in that the step of collecting PARE signal data is: the PARE signal data come from 10 data sets in the NGSD database and 1 data set from the author Yongfang Li, and the raw data are normalized.
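The data-processing step of claim 1 used Perl scripts to parse miRBase GFF coordinates and store sequences in upper case. A rough Python equivalent of the parsing, assuming tab-separated GFF columns with `key=value` attributes (quoted values and spaces after `;` are tolerated); the helper names and the keys of the returned dict are hypothetical.

```python
from typing import Dict, Optional


def parse_gff_line(line: str) -> Optional[Dict]:
    """Parse one GFF feature line; returns None for comments, blanks and short lines."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return None
    attrs: Dict[str, str] = {}
    for field in cols[8].split(";"):
        field = field.strip()
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip().strip('"')
    return {"seqid": cols[0], "type": cols[2], "start": int(cols[3]),
            "end": int(cols[4]), "strand": cols[6], "attrs": attrs}


def normalize_seq(seq: str) -> str:
    """Remove whitespace and convert to upper case before storing."""
    return "".join(seq.split()).upper()


# Hypothetical record for illustration.
rec = parse_gff_line("Chr2\t.\tmiRNA_primary_transcript\t100\t220\t.\t+\t.\tID=MI_EXAMPLE;Name=ath-MIR156a")
print(rec["seqid"], rec["start"], rec["end"], rec["attrs"]["Name"])
```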

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102816834A CN101976296B (en) 2010-09-10 2010-09-10 Method for interactive network predication in combination with plant microRNA target based on next generation of sequencing data

Publications (2)

Publication Number Publication Date
CN101976296A CN101976296A (en) 2011-02-16
CN101976296B true CN101976296B (en) 2012-05-23

Family

ID=43576181

Country Status (1)

Country Link
CN (1) CN101976296B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008522585A (en) * 2004-10-12 2008-07-03 ザ ロックフェラー ユニバーシティー MicroRNA
CN101824410B (en) * 2010-02-25 2012-05-09 浙江省农业科学院 Simple method for establishing plant artificial microRNA

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120523

Termination date: 20130910