Summary of the invention
The technical problem to be solved by the present invention is to provide a quality control method and device for RNA sequencing relating to gene expression, which provide a quality control scheme for gene sequencing through the analysis of gene expression.
One aspect of the present invention provides a quality control method for RNA sequencing relating to gene expression. The method comprises: performing digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq) separately on the sequencing fragments obtained by a sequencing technology; performing correlation analysis between the DGE result and the result of real-time quantitative PCR (qPCR, Real-time Quantitative PCR Detecting System; PCR, Polymerase Chain Reaction), and between the RNA-Seq result and the qPCR result; determining, according to the correlation analysis results, the difference between DGE and RNA-Seq in gene expression quantification, and selecting one sequencing analysis mode from DGE and RNA-Seq; and selecting one million reads (1M reads) from the analysis result obtained with the selected sequencing analysis mode to carry out a sequencing stability analysis of gene expression.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the method further comprises: performing the RNA sequencing relating to gene expression with a high-throughput sequencing technology; and removing adapter sequences and low-quality sequences from the DGE result and from the RNA-Seq result, respectively.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the gene expression of the sample fragments is sequenced repeatedly by a high-throughput sequencing technology, and the data from the repeated runs are averaged to obtain the qPCR result.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, performing correlation analysis between each of the DGE result and the RNA-Seq result and the qPCR result further comprises: when the reference gene is incomplete, performing correlation analysis between each of the DGE result and the RNA-Seq result and the qPCR result; and/or, at the same sequencing amount, comparing the number of genes detected by DGE with the number of genes detected by RNA-Seq.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the step of performing correlation analysis between each of the DGE result and the RNA-Seq result and the qPCR result when the reference gene is incomplete further comprises: dividing the reference gene into three equal parts from the 3' end to the 5' end; performing DGE and RNA-Seq separately on each of the three parts of the reference gene; and performing correlation analysis between each of the analysis results obtained and the qPCR result.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the step of comparing, at the same sequencing amount, the number of genes detected by DGE with the number detected by RNA-Seq further comprises: taking three million reads (3M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and RNA-Seq separately; taking two million reads (2M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and RNA-Seq separately; and/or taking one million reads (1M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and RNA-Seq separately; and, at each identical sequencing amount, comparing the number of genes that DGE and RNA-Seq can detect.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the step of selecting one million reads (1M reads) from the analysis result obtained with the selected sequencing analysis mode and carrying out the sequencing stability analysis of gene expression further comprises: taking one million reads (1M reads) from the DGE result and performing correlation analysis between them and the complete DGE result; and/or taking one million reads (1M reads) from the RNA-Seq result and performing correlation analysis between them and the complete RNA-Seq result.
Another aspect of the present invention provides a quality control device for RNA sequencing relating to gene expression. The device comprises: a gene expression calculation module, configured to perform digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq) separately on the sequencing fragments obtained by a sequencing technology; a correlation analysis module, configured to perform correlation analysis between each of the DGE result and the RNA-Seq result and the result of real-time quantitative PCR (qPCR); a sequencing analysis mode selection module, configured to determine, according to the correlation analysis results, the difference between DGE and RNA-Seq in gene expression quantification, and to select one sequencing analysis mode from DGE and RNA-Seq; and a sequencing stability analysis module, configured to select one million reads (1M reads) from the analysis result obtained with the selected sequencing analysis mode and to carry out a sequencing stability analysis of gene expression.
In one embodiment of the quality control device for RNA sequencing relating to gene expression provided by the present invention, the correlation analysis module further comprises: a first correlation analysis submodule, configured, when the reference gene is incomplete, to divide the reference gene into three equal parts from the 3' end to the 5' end, to perform DGE and RNA-Seq separately on each of the three parts of the reference gene, and to perform correlation analysis between each of the analysis results obtained and the qPCR result; and a second correlation analysis submodule, configured to take three million reads (3M reads) from the sequencing fragments obtained by high-throughput sequencing and perform DGE and RNA-Seq separately, to take two million reads (2M reads) from the sequencing fragments and perform DGE and RNA-Seq separately, and/or to take one million reads (1M reads) from the sequencing fragments and perform DGE and RNA-Seq separately, and, at each identical sequencing amount, to compare the number of genes that DGE and RNA-Seq can detect.
In one embodiment of the quality control device for RNA sequencing relating to gene expression provided by the present invention, the sequencing stability analysis module further comprises: a first sequencing stability analysis submodule, configured to take one million reads (1M reads) from the DGE result and to perform correlation analysis between them and the complete DGE result; and a second sequencing stability analysis submodule, configured to take one million reads (1M reads) from the RNA-Seq result and to perform correlation analysis between them and the complete RNA-Seq result.
The present invention provides a quality control method and device for RNA sequencing relating to gene expression. By performing correlation analysis and a comprehensive assessment of the gene expression analysis approaches, the approach with the higher reliability is selected, so that the accuracy of gene sequencing is reflected faithfully, industrial feasibility is ensured, and a quality control scheme is provided for production stability.
Detailed description of the embodiments
The present invention is described and illustrated more fully below with reference to the accompanying drawings and exemplary embodiments.
Fig. 1 shows a flowchart of a quality control method for RNA sequencing relating to gene expression provided by an embodiment of the present invention.
As shown in Fig. 1, the quality control method 100 for RNA sequencing relating to gene expression comprises step 102: performing digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq) separately on the sequencing fragments obtained by a sequencing technology. In this embodiment of the invention, the sequencing method may be a high-throughput sequencing technology, for example the Illumina GA Solexa sequencing technology. Solexa is a sequencing method based on sequencing-by-synthesis (SBS), in which bridge PCR is carried out on a small chip (flow cell) using a single-molecule array. Its reversible terminator chemistry allows only one base to be incorporated per cycle, each base carrying a fluorescent label; a laser then excites the fluorophore and the emitted light is captured, so that the base information is read. The experiment may use a 36-cycle single-end sequencing platform, and the RNA standard or experimental sample is sequenced both after double enzymatic digestion and after random fragmentation.
Step 104: perform correlation analysis between each of the DGE result and the RNA-Seq result and the result of real-time quantitative PCR (qPCR). The correlation analysis of the DGE and RNA-Seq results with the qPCR result is described in further detail later.
Step 106: according to the correlation analysis results, determine the difference between DGE and RNA-Seq in gene expression quantification, and select one sequencing analysis mode from DGE and RNA-Seq. For example, the difference between DGE and RNA-Seq in gene expression quantification (covering both the number of genes and the gene expression level) is assessed comprehensively. Specifically, this may include at least one of the following: comparing the correlation of the DGE and RNA-Seq results with the qPCR result at a normal sequencing amount; comparing the correlation of the DGE and RNA-Seq results with the qPCR result when the reference gene is incomplete; and comparing the number of genes that DGE and RNA-Seq can detect at the same sequencing amount. From this comprehensive analysis, the difference between DGE and RNA-Seq in gene expression quantification is obtained, and a suitable sequencing analysis mode is selected.
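The selection in step 106 can be sketched as a comparison of correlation coefficients. The snippet below is only an illustration under stated assumptions: the function name `choose_mode`, the use of the mean coefficient across samples, and the `tolerance` threshold for treating two modes as comparable are hypothetical and are not specified in the text. The example values are the coefficients reported in this description for UHRR and HBRR.

```python
def choose_mode(corr_by_mode, tolerance=0.05):
    """Pick the analysis mode whose results agree best with qPCR.

    corr_by_mode maps a mode name to its correlation coefficients with
    qPCR (one per sample). `tolerance` is a hypothetical threshold below
    which two modes are treated as comparable.
    """
    means = {mode: sum(vals) / len(vals) for mode, vals in corr_by_mode.items()}
    best = max(means, key=means.get)
    comparable = sorted(m for m, v in means.items() if means[best] - v <= tolerance)
    return best, comparable


# Coefficients reported in this description (UHRR, HBRR).
best, comparable = choose_mode({"DGE": [0.30, 0.53], "RNA-Seq": [0.91, 0.86]})
print(best, comparable)
```

If more than one mode is returned in `comparable`, either may be chosen, which mirrors the case described in step 108 where the two modes quantify expression equally well.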
Step 108: select one million reads (1M reads) from the analysis result obtained with the selected sequencing analysis mode, and carry out a sequencing stability analysis of gene expression. For example, if the comprehensive analysis above shows that RNA-Seq quantifies gene expression more accurately (that is, the expression levels obtained by RNA-Seq are closer to those obtained by qPCR), 1M reads are selected at random from the RNA-Seq result and correlated with the complete transcriptome analysis result. The random selection may consist of shuffling all reads obtained by sequencing and then taking any 1M reads from them. If DGE and RNA-Seq quantify gene expression equally well, either mode may be chosen; 1M reads are selected from the result obtained with the chosen mode and correlated with the complete transcriptome analysis result. Based on the analysis result, the stability of production sequencing is then checked and assessed to ensure the accuracy of the sequencing work. Here "checking and assessing" mainly means analyzing the reproducibility of the sequencing results: because the number of genes and the expression levels for 1M reads are known, a run that reproduces the known result poorly is unstable or incorrect.
The experimental part of digital gene expression profiling (DGE) mainly comprises a sample preparation experiment and a sequencing experiment. The main reagents and consumables are the Illumina Gene Expression Sample Prep Kit and the Solexa sequencing chip (flow cell), and the main instruments are the Illumina Cluster Station (Illumina) and the Illumina Genome Analyzer (Illumina) system. The experimental procedure is as follows. Extract 6 μg of total RNA, purify the mRNA by adsorption on Oligo(dT) magnetic beads, and synthesize double-stranded cDNA by Oligo(dT)-primed reverse transcription. The 5' end of the tag can be generated with either of two restriction endonucleases, NlaIII or DpnII; NlaIII is normally used, which recognizes and cuts the CATG sites on the cDNA. The fragments carrying the cDNA 3' end are purified by magnetic bead precipitation, and Illumina adapter 1 (sequence: ACAGGTTCAGAGTTCTACAGTCCGACATG) is ligated to their 5' ends. The junction of Illumina adapter 1 and the CATG site is the recognition site of MmeI, a restriction endonuclease whose recognition site and cleavage site are separated; it cuts 17 bp downstream of the CATG site, producing a tag that carries adapter 1. After the 3' fragment is removed by magnetic bead precipitation, Illumina adapter 2 (sequence: CAAGCAGAAGACGGCATACGANN) is ligated to the 3' end of the tag, giving a 21 bp tag library with different adapter sequences at the two ends. After 15 cycles of linear PCR amplification, the 85-base band is purified by 6% TBE PAGE gel electrophoresis. After denaturation, the single-stranded molecules are loaded and immobilized on the Solexa sequencing chip (flow cell), each molecule is amplified in situ into a single-molecule cluster that serves as a sequencing template, the four nucleotides labeled with four fluorescent colors are added, and sequencing is performed by sequencing-by-synthesis (SBS). Each channel produces millions of raw reads, with a read length of 35 bp.

In brief, the mRNA in the total RNA is enriched with Oligo(dT) magnetic beads and reverse-transcribed into double-stranded cDNA; the double-stranded cDNA is cut with NlaIII, an enzyme with a 4-base recognition site; Illumina adapter 1 is ligated; MmeI is used to cut 17 bp downstream of the CATG site nearest the 3' end; and Illumina adapter 2 is ligated to the 3' end. Primer GX1 and Primer GX2 are then added for PCR amplification. After amplification, the 85-base band is recovered from a 6% TBE PAGE gel, purified, and sequenced with the Illumina gene expression sequencing method.

The basic sequencing procedure of the experimental part of transcriptome analysis (RNA-Seq) is as follows. After total RNA is extracted from the sample, eukaryotic mRNA is enriched with magnetic beads carrying Oligo(dT) (for prokaryotes, rRNA is removed with a kit before proceeding to the next step). Fragmentation buffer is added to break the mRNA into short fragments. Using the mRNA as template, the first cDNA strand is synthesized with random hexamers; buffer, dNTPs, RNase H and DNA polymerase I are then added to synthesize the second cDNA strand. The product is purified with a QiaQuick PCR kit (Qiagen) and eluted with EB buffer, followed by end repair and ligation of the sequencing adapters. Fragments are then size-selected by agarose gel electrophoresis and finally amplified by PCR, and the resulting sequencing library is sequenced.
The correlation analysis of the DGE and RNA-Seq results with the qPCR result is described in detail below.
The correlation analysis between the DGE result and the qPCR result mainly involves the way the expression level TPM (Transcripts Per Million clean reads) is calculated in standard DGE analysis. Specifically: TPM = (number of raw clean tags for the gene / total number of clean tags in the sample) × 1,000,000 (see Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms, Peter A.C. 't Hoen, Yavuz Ariyurek, et al., Nucleic Acids Research, 15 October 2008, Vol. 36, No. 21).
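As a minimal sketch of the TPM formula above, the following function normalizes per-gene clean tag counts to tags per million. The function name `tpm` and the example gene names and counts are illustrative assumptions, not data from this description.

```python
def tpm(tag_counts):
    """TPM per gene: clean tags of the gene / total clean tags * 1,000,000."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("sample contains no clean tags")
    return {gene: count * 1_000_000 / total for gene, count in tag_counts.items()}


# Hypothetical clean tag counts for three genes in one sample.
example_counts = {"geneA": 50, "geneB": 150, "geneC": 800}
tpm_values = tpm(example_counts)
print(tpm_values)
```

Because every gene is divided by the same sample total, the TPM values of one sample always sum to one million, which makes samples of different sequencing amounts comparable.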
Fig. 2 shows the results of the correlation analysis between the DGE results and the qPCR results for the two samples of the present invention. As a rule, if the DGE data output is 3M reads, 3M reads can be taken at random from the sample sequencing data to assess the accuracy of the DGE result; the random selection may consist of shuffling all reads obtained by sequencing and then taking any 3M reads from them. Because UHRR and HBRR are RNA standard samples, their qPCR results can be downloaded, whereas the sample sequencing data cannot be downloaded and must be generated by sequencing in-house. Fig. 2(a) shows the DGE result of sample UHRR plotted against the qPCR result, and Fig. 2(b) shows the DGE result of sample HBRR plotted against the qPCR result. The UHRR used in the present invention is the Universal Human Reference RNA (UHRR) standard from Stratagene, and the HBRR is the Human Brain Reference RNA (HBRR) standard from Ambion. As shown in Fig. 2(a), the correlation coefficient between the DGE result and the qPCR result for sample UHRR is about 0.3; as shown in Fig. 2(b), the correlation coefficient for sample HBRR is about 0.53 (the number of genes detectable by DGE is 716 for both UHRR and HBRR, and the number detectable by qPCR is 687 for both).
The correlation analysis between the transcriptome analysis (RNA-Seq) result and the qPCR result mainly involves the way the expression level RPKM (Reads Per Kb per Million reads) is calculated in standard RNA-Seq analysis (see Mapping and quantifying mammalian transcriptomes by RNA-Seq, Ali Mortazavi et al., 30 May 2008, Nature Methods, Advance Online Publication). The calculation is as follows:

$$\mathrm{RPKM}(A) = \frac{10^{6}\, C}{N \cdot L / 10^{3}}$$

where RPKM(A) is the expression level of gene A, C is the number of reads uniquely aligned to gene A, N is the total number of reads uniquely aligned to the genome, and L is the number of bases in the coding region of gene A.
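The RPKM definition (reads per kilobase per million reads) with the symbols C, N and L defined above can be sketched as follows. The function name `rpkm` and the example numbers are illustrative assumptions.

```python
def rpkm(c, n, l):
    """RPKM = 10^6 * C / (N * L / 10^3) = 10^9 * C / (N * L).

    c: reads uniquely aligned to the gene
    n: total reads uniquely aligned to the genome
    l: number of bases in the coding region of the gene
    """
    if n <= 0 or l <= 0:
        raise ValueError("n and l must be positive")
    return 1e9 * c / (n * l)


# Hypothetical gene: 500 unique reads, 3M uniquely aligned reads, 2 kb coding region.
value = rpkm(c=500, n=3_000_000, l=2_000)
print(round(value, 2))
```

Dividing by gene length is what distinguishes RPKM from TPM as used for DGE: random fragmentation yields more reads from longer transcripts, whereas DGE yields one tag per transcript.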
Fig. 3 shows the results of the correlation analysis between the RNA-Seq results and the qPCR results for the two samples of the present invention. As a rule, if the RNA-Seq data output is 3M reads, 3M reads can be taken at random from the sample sequencing data to assess the accuracy of the RNA-Seq result; the random selection may consist of shuffling all reads obtained by sequencing and then taking any 3M reads from them. Fig. 3(a) shows the RNA-Seq result of sample UHRR plotted against the qPCR result, and Fig. 3(b) shows the RNA-Seq result of sample HBRR plotted against the qPCR result. The correlation coefficient between the RNA-Seq result and the qPCR result is about 0.91 for sample UHRR and about 0.86 for sample HBRR (the number of genes detectable by RNA-Seq is 872 for both UHRR and HBRR, and the number detectable by qPCR is 851 for both). It should also be noted that the correlation coefficients between RNA-Seq and qPCR computed from 3M reads extracted from samples UHRR and HBRR are the same as those computed from all the data, namely 0.91 and 0.86 respectively. This shows that the amount of sequencing data has almost no influence on quantification by RNA-Seq.
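The comparison of a sequencing-based expression estimate with qPCR can be sketched as a correlation over the genes measured by both methods. The text does not state which coefficient underlies Fig. 2 and Fig. 3, so the Pearson coefficient used here is an assumption, as are the function names and the toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_qpcr(seq_expr, qpcr_expr):
    """Correlate expression values over genes present in both data sets."""
    shared = sorted(set(seq_expr) & set(qpcr_expr))
    x = [seq_expr[g] for g in shared]
    y = [qpcr_expr[g] for g in shared]
    return len(shared), pearson(x, y)


# Toy values only; g4 and g5 are each measured by one method and are skipped.
seq_expr = {"g1": 10.0, "g2": 20.0, "g3": 30.0, "g4": 5.0}
qpcr_expr = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g5": 9.0}
n_shared, r = correlate_with_qpcr(seq_expr, qpcr_expr)
print(n_shared, round(r, 3))
```

Restricting the comparison to shared genes reflects the fact that the number of genes detectable by each method differs, as the counts quoted above indicate.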
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, before the step of performing DGE and RNA-Seq separately on the sequencing fragments obtained by the sequencing technology, adapter sequences are removed from the DGE data and from the RNA-Seq data, respectively. Further, low-quality sequences may also be removed from the adapter-trimmed data, so as to obtain reads (clean tags) that can be used for subsequent analysis.
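The two cleaning steps can be sketched as below. The text does not specify how adapters are matched or what counts as low quality, so everything concrete here is an assumption for illustration: exact matching of a 3' adapter string, Phred-style integer quality scores, the `min_q` and `max_bad_fraction` thresholds, and the placeholder adapter sequence in the example.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def is_low_quality(qual, min_q=20, max_bad_fraction=0.5):
    """Flag a read when too many bases fall below min_q (hypothetical rule)."""
    if not qual:
        return True
    bad = sum(1 for q in qual if q < min_q)
    return bad / len(qual) > max_bad_fraction


def clean_reads(reads, adapter):
    """Yield (sequence, qualities) pairs that survive both cleaning steps."""
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if seq and not is_low_quality(qual):
            yield seq, qual


# Placeholder adapter and reads; not real Illumina sequences.
raw = [("ACGTACGTTTGGAA", [30] * 14), ("ACGTAC", [5] * 6)]
cleaned = list(clean_reads(raw, adapter="TTGG"))
print(cleaned)
```

Trimming before quality filtering follows the order given in the text, so that the quality decision is made on the insert sequence only.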
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention, the gene expression of the sample fragments is sequenced repeatedly by a high-throughput sequencing technology, and the data from the repeated runs are averaged to obtain the qPCR result. For example, the qPCR data of the UHRR and HBRR samples are downloaded from GEO (Gene Expression Omnibus, a high-throughput gene expression database) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5350, where the accession number of UHRR is GSM129638, released on September 8, 2006, and the accession number of HBRR is GSM129645, released on September 8, 2006. The UHRR and HBRR samples are each subjected to repeated (for example, 4) parallel experiments, and the results of these 4 parallel experiments for the number of genes and the gene expression level are averaged and used as the qPCR quantification result.
The quality control method for RNA sequencing relating to gene expression provided by the present invention performs DGE and RNA-Seq on the sequencing fragments and comprehensively analyzes the correlation of the DGE and RNA-Seq results with the qPCR result, so as to select a suitable sequencing analysis mode for the sequencing stability analysis of gene expression. An embodiment of this quality control method can faithfully reflect the accuracy of gene sequencing, ensure industrial feasibility, and provide a quality control scheme for production stability.
Fig. 4 shows a flowchart of another embodiment of the quality control method for RNA sequencing relating to gene expression provided by the present invention.
As shown in Fig. 4, the quality control method 400 for RNA sequencing relating to gene expression comprises steps 402, 404, 405, 406 and 408. Steps 402, 406 and 408 may carry out technical content identical or similar to that of steps 102, 106 and 108 shown in Fig. 1, respectively; for brevity, that content is not repeated here.
As shown in Fig. 4, step 404 is performed after step 402: when the reference gene is incomplete, correlation analysis is performed between each of the DGE result and the RNA-Seq result and the qPCR result. Reference genes are assembled nucleotide sequences in existing databases (http://www.ncbi.nlm.nih.gov/). These nucleotide sequences exist in many versions (released by different research institutions, data centers and so on), and because each institution is limited by its technical level, the released results may differ from the true state of the gene, so the reference gene may be incomplete. For example, when the DGE result and the RNA-Seq result are each correlated with the qPCR result, the correlation analysis for a complete or an incomplete reference gene can be carried out as follows.
Specifically, in non-model animals there is reason to suspect that an incomplete reference gene sequence makes DGE quantification inaccurate. First, the complete reference gene sequences (such as the human RefSeq genes in NCBI) are divided into three equal segments starting from the 3' end; each segment is then treated as a complete reference gene sequence, and the correlation between the DGE result and the qPCR result is analyzed for each. Fig. 5 shows the results of the correlation analysis between the DGE results and the qPCR results for the trisected reference gene sequences of sample UHRR: Fig. 5(a) shows the DGE result for the first segment of sample UHRR plotted against the qPCR result, Fig. 5(b) shows the DGE result for the second segment plotted against the qPCR result, and Fig. 5(c) shows the DGE result for the third segment plotted against the qPCR result. For these three segments of sample UHRR, the correlation coefficients between the DGE result and the qPCR result are about 0.71, 0.39 and 0.33 respectively (the correlation coefficient is 0.76 when the complete gene sequences are used), and the numbers of genes detectable by DGE are 774, 596 and 435 respectively. Similarly, for gene expression analysis with RNA-Seq, the complete reference gene sequences are first divided into three equal segments starting from the 3' end; each segment is then treated as a complete reference gene sequence, and the correlation between RNA-Seq and qPCR is analyzed for each. Fig. 6 shows the results of the correlation analysis between the RNA-Seq results and the qPCR results for the trisected reference gene sequences of sample UHRR: Fig. 6(a) shows the RNA-Seq result for the first segment of sample UHRR plotted against the qPCR result, Fig. 6(b) shows the RNA-Seq result for the second segment plotted against the qPCR result, and Fig. 6(c) shows the RNA-Seq result for the third segment plotted against the qPCR result. For these three segments of sample UHRR, the correlation coefficients between the RNA-Seq result and the qPCR result are about 0.85, 0.91 and 0.84 respectively (the correlation coefficient between RNA-Seq and qPCR is 0.91 when the complete gene sequences are used), and the numbers of genes detectable by RNA-Seq are 917, 911 and 896 respectively.
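The trisection of a reference gene sequence starting from the 3' end can be sketched as follows. The sketch assumes the sequence is stored in 5' to 3' orientation and, when the length is not divisible by three, assigns the extra bases to the segments nearest the 3' end; both points, like the function name, are assumptions not stated in the text.

```python
def trisect_from_3prime(seq):
    """Split a 5'->3' sequence into three near-equal segments.

    Returns [first, second, third], where `first` is the segment at the
    3' end, matching the segment order used for Fig. 5 and Fig. 6.
    """
    n = len(seq)
    base, extra = divmod(n, 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    segments = []
    end = n
    for size in sizes:
        segments.append(seq[end - size:end])
        end -= size
    return segments


# Toy sequence; each segment would then be used as if it were a full reference.
first, second, third = trisect_from_3prime("AAAACCCGGG")
print(first, second, third)
```

Each returned segment can be passed to the alignment and quantification steps in place of the complete reference to simulate an incomplete annotation.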
DGE has inherent limitations: it cannot detect genes that contain no CATG (or GATC) site, and it tends to take the tag nearest the 3' end of each mRNA as the tag of that mRNA. It therefore places stricter requirements on the reference gene, and an incomplete reference gene sequence strongly affects the DGE result. RNA-Seq, by contrast, fragments the mRNA at random, so each mRNA yields many tags; its dependence on the reference gene is weak, and it can still provide fairly accurate expression information when the reference gene is incomplete. This shows that an incomplete gene sequence has a large influence on the DGE result and little influence on the RNA-Seq result. In other words, when the reference gene is incomplete and DGE is used, the first segment starting from the 3' end of the gene is preferably used; more preferably, RNA-Seq is used for the expression analysis of the gene fragments.
Step 405: at the same sequencing amount, compare the number of genes detected by DGE with the number detected by RNA-Seq. For example, comparing the number of genes detected by the two analysis modes at the same sequencing amount may specifically comprise: taking 3M reads at random from the sequencing fragments obtained by high-throughput sequencing and performing DGE and RNA-Seq separately; taking 2M reads at random from the sequencing fragments and performing DGE and RNA-Seq separately; and taking 1M reads at random from the sequencing fragments and performing DGE and RNA-Seq separately. The random selection may consist of shuffling all reads obtained by sequencing and then taking the corresponding number of reads from them. At the same sequencing amount, at least one of these three options may be chosen to compare the number of genes that DGE and RNA-Seq can detect. Fig. 7 shows the number of genes detected by DGE and by RNA-Seq for sample UHRR at the same sequencing amount. As shown in Fig. 7, at the same sequencing amount RNA-Seq detects about 1000 more genes than DGE.
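The comparison in step 405 can be sketched as counting distinct genes in random subsamples of equal size. The sketch assumes each read has already been assigned to a gene identifier (or `None` when unassigned); the function name, the seed parameter, and the tiny example depths standing in for 1M, 2M and 3M reads are illustrative assumptions.

```python
import random


def detected_genes(read_genes, depth, seed=0):
    """Number of distinct genes hit by `depth` randomly chosen reads."""
    if depth > len(read_genes):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sample = rng.sample(read_genes, depth)
    return len({gene for gene in sample if gene is not None})


# Toy per-read gene assignments for the two analysis modes.
dge_reads = ["g1", "g1", "g2", None, "g1", "g2"]
rnaseq_reads = ["g1", "g2", "g3", "g4", None, "g3"]
for depth in (2, 4, 6):  # stand-ins for 1M, 2M and 3M reads
    print(depth, detected_genes(dge_reads, depth), detected_genes(rnaseq_reads, depth))
```

Using the same depth for both modes keeps the comparison fair, since the number of detected genes generally grows with the number of reads.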
Fig. 8 is a flowchart of another embodiment of the quality control method for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 8, the quality control method 800 for RNA sequencing of gene expression comprises steps 802, 804, 806, 808 and 809, where steps 802, 804 and 806 may respectively carry out technical content identical or similar to that of steps 102, 104 and 106 shown in Fig. 1; for brevity, that content is not repeated here.
As shown in Fig. 8, after step 806, step 808 randomly takes one million tag data (1M reads) from the digital gene expression profiling result and performs a correlation analysis between them and the whole digital gene expression profiling result. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking 1M reads from them.
Step 809 randomly takes one million tag data (1M reads) from the transcriptome analysis result and performs a correlation analysis between them and the whole transcriptome analysis result. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking 1M reads from them.
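Steps 808 and 809 can be illustrated with the sketch below. It assumes, purely for illustration, that the analysis result is available as one gene identifier per read and that the correlation is the Pearson coefficient of per-gene read counts; the function names are hypothetical and the text does not prescribe this particular implementation.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)


def subsample_vs_whole(gene_per_read, n, seed=0):
    """Correlate per-gene counts of a random n-read subset with the whole set."""
    pool = list(gene_per_read)
    random.Random(seed).shuffle(pool)  # shuffle all reads completely
    sub = Counter(pool[:n])            # counts in the random subset
    whole = Counter(pool)              # counts in the whole result
    genes = sorted(whole)
    return pearson([sub.get(g, 0) for g in genes], [whole[g] for g in genes])
```

Because the Pearson coefficient is invariant to scaling, the raw counts of the subset and of the whole result can be compared directly without normalizing to a common read total.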
The correlation analysis in steps 808 and 809 may be carried out as follows: a known library is added to each lane (sequenced to only about 1M reads), and the stability of sequencing is checked by comparing the sequencing results of the known library. For example, the embodiment of the present invention uses a correlation analysis of a standard at 1M sequencing amount; the correlations between two runs of the 1M standard UHRR and the repeated-experiment sample UHRR are compared in Table 1.
Table 1. Correlation comparison between the data of the 1M sequencing-amount standard and the repeated experiments
Table 1 shows that, when the data of the 1M sequencing-amount standard and of the repeated experiments are compared, the gene-level correlation is very high, whether measured by the Spearman correlation coefficient or by the Pearson correlation coefficient. This indicates that the sequencing results are normal and trustworthy, that using a standard to check sequencing stability is feasible, and that a quality control scheme for gene sequencing can be provided through the analysis of gene expression. Applying the 1M-reads method to the quality control of RNA sequencing of gene expression makes it possible to assess production stability.
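For reference, the two coefficients named above differ in what they measure: the Pearson coefficient captures linear agreement between expression values, while the Spearman coefficient is the Pearson coefficient computed on ranks and so captures monotonic agreement. The sketch below is a plain-Python illustration with hypothetical function names and made-up input values; it does not reproduce the data of Table 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: expression of four genes in a standard and a repeat.
standard = [1.0, 2.0, 3.0, 4.0]
repeat = [1.0, 4.0, 9.0, 16.0]
print(pearson(standard, repeat), spearman(standard, repeat))
```

For a monotonic but non-linear relationship such as this one, the Spearman coefficient is 1 while the Pearson coefficient is slightly lower, which is one reason to report both.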
In addition, it should be noted that, for the quality control method and device of the present invention, whether the comparison is made in terms of quantitative accuracy, the number of genes detected, or dependence on the reference gene, using the RNA-Seq analysis method in the quality control scheme has the advantage of reflecting gene expression more accurately than DGE.
Fig. 9 is a structural diagram of a quality control device for RNA sequencing of gene expression provided by an embodiment of the present invention.
As shown in Fig. 9, a quality control device 900 for RNA sequencing of gene expression comprises: a gene expression calculation module 902, a correlation analysis module 904, a sequencing analysis mode selection module 906 and a sequencing stability analysis module 908.
The gene expression calculation module 902 is used to perform digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq) respectively on the sequenced fragments obtained by the sequencing technology.
The correlation analysis module 904 is used to perform a correlation analysis between the result of the digital gene expression profiling and the result of real-time quantitative PCR (qPCR), and between the result of the transcriptome analysis and the result of qPCR.
The sequencing analysis mode selection module 906 is used to judge, according to the correlation analysis results, the difference between digital gene expression profiling and transcriptome analysis in quantifying gene expression, and to select one sequencing analysis mode from digital gene expression profiling and transcriptome analysis.
The sequencing stability analysis module 908 is used to select one million tag data (1M reads) from the analysis result obtained with the selected sequencing analysis mode and to perform a sequencing stability analysis of gene expression.
Fig. 10 is a structural diagram of another embodiment of the quality control device for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 10, a quality control device 1000 for RNA sequencing of gene expression comprises: a gene expression calculation module 1002, a correlation analysis module 1004, a sequencing analysis mode selection module 1006 and a sequencing stability analysis module 1008, where the gene expression calculation module 1002, the sequencing analysis mode selection module 1006 and the sequencing stability analysis module 1008 may be functional modules identical or similar to the gene expression calculation module 902, the sequencing analysis mode selection module 906 and the sequencing stability analysis module 908 shown in Fig. 9. For brevity, they are not described again here.
As shown in Fig. 10, the correlation analysis module 1004 further comprises a first correlation analysis submodule 10041 and a second correlation analysis submodule 10042, wherein:
The first correlation analysis submodule 10041 is used, when the reference gene is incomplete, to divide the reference gene into three equal parts from the 3' end to the 5' end; to perform digital gene expression profiling and transcriptome analysis respectively on each of the three parts of the reference gene; and to perform a correlation analysis between each of the analysis results obtained and the result of real-time quantitative PCR.
The second correlation analysis submodule 10042 is used, at the same sequencing amount, to randomly take three million tag data (3M reads) from the sequenced fragments obtained by high-throughput sequencing and perform digital gene expression profiling and transcriptome analysis on them respectively; to randomly take two million tag data (2M reads) from the sequenced fragments obtained by high-throughput sequencing and perform digital gene expression profiling and transcriptome analysis on them respectively; and/or to randomly take one million tag data (1M reads) from the sequenced fragments obtained by high-throughput sequencing and perform digital gene expression profiling and transcriptome analysis on them respectively. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking the required number of reads from them. The submodule then compares, at the same sequencing amount, the numbers of genes that digital gene expression profiling and transcriptome analysis can each detect.
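The three-way division of the reference gene handled by the first correlation analysis submodule can be sketched as below. This is an illustrative assumption about the data representation only: the reference gene is taken to be a string written 5' to 3', and the function name `split_into_three_from_3prime` is hypothetical. When the length is not divisible by three, the parts differ in length by at most one base.

```python
def split_into_three_from_3prime(sequence):
    """Split a 5'->3' sequence into three near-equal parts.

    Returns the parts ordered from the 3' end to the 5' end, so the first
    element is the segment that starts at the 3' end.
    """
    n = len(sequence)
    cut1 = n // 3
    cut2 = (2 * n) // 3
    five_prime_part = sequence[:cut1]
    middle_part = sequence[cut1:cut2]
    three_prime_part = sequence[cut2:]
    return [three_prime_part, middle_part, five_prime_part]


parts = split_into_three_from_3prime("AAACCCGGG")
print(parts)  # ['GGG', 'CCC', 'AAA']
```

Each part would then be analyzed separately by DGE and RNA-Seq and correlated with the qPCR result, as described for submodule 10041.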
Fig. 11 is a structural diagram of another embodiment of the quality control device for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 11, a quality control device 1100 for RNA sequencing of gene expression comprises: a gene expression calculation module 1102, a correlation analysis module 1104, a sequencing analysis mode selection module 1106 and a sequencing stability analysis module 1108. The gene expression calculation module 1102, the correlation analysis module 1104 and the sequencing analysis mode selection module 1106 may be functional modules identical or similar to the gene expression calculation module 902, the correlation analysis module 904 and the sequencing analysis mode selection module 906 shown in Fig. 9. For brevity, they are not described again here.
As shown in Fig. 11, the sequencing stability analysis module 1108 further comprises a first sequencing stability analysis submodule 11081 and a second sequencing stability analysis submodule 11082, wherein:
The first sequencing stability analysis submodule 11081 is used to randomly take one million tag data (1M reads) from the digital gene expression profiling result and to perform a correlation analysis between them and the whole digital gene expression profiling result. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking 1M reads from them.
The second sequencing stability analysis submodule 11082 is used to randomly take one million tag data (1M reads) from the transcriptome analysis result and to perform a correlation analysis between them and the whole transcriptome analysis result. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking 1M reads from them.
The quality control device for RNA sequencing of gene expression provided by the present invention analyzes gene fragments through the gene expression calculation module, and carries out correlation analysis and comprehensive assessment through the correlation analysis module and the sequencing analysis mode selection module. It thereby selects a gene expression analysis method of higher reliability, truthfully reflects the accuracy of gene sequencing, and provides a quality control scheme for production stability.
With reference to the foregoing exemplary description of the present invention, those skilled in the art will clearly understand the aforementioned advantages of the quality control method and device for RNA sequencing of gene expression provided by the present invention: the quality control scheme provided by the present invention is applicable to high-throughput sequencing technology, can effectively assess the stability of RNA sequencing, and ensures the accuracy of the sequencing work.
The description of the present invention is provided for the purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The functional modules described in the present invention and the way they are divided merely illustrate the idea of the invention; those skilled in the art may, according to the teaching of the present invention and the needs of the practical application, freely change the division of the functional modules and their structure to achieve the same functions. The embodiments were chosen and described in order to better explain the principles and practical application of the present invention, and to enable those of ordinary skill in the art to understand the invention and thereby design various embodiments, with various modifications, suited to particular uses.