CN101914619A - RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression - Google Patents

RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression

Info

Publication number
CN101914619A
Authority
CN
China
Prior art keywords
analysis
result
gene expression
order
checking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102361769A
Other languages
Chinese (zh)
Inventor
彭智宇
韩祖晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Technology Solutions Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd
Priority to CN2010102361769A
Publication of CN101914619A
Priority to PCT/CN2011/001158 (WO2012009952A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Landscapes

  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses an RNA (ribonucleic acid) sequencing quality control method and device relating to gene expression. The method comprises the following steps: carrying out DGE (Digital Gene Expression) analysis and RNA-Seq (RNA sequencing, i.e. transcriptome) analysis, respectively, on the sequencing fragments obtained by a sequencing technology; carrying out correlation analysis between the DGE analysis result and a qPCR (quantitative Polymerase Chain Reaction) result, and between the transcriptome analysis result and the qPCR result; judging, from the correlation analysis results, the difference between DGE analysis and transcriptome analysis in gene expression quantification, and selecting one sequencing analysis mode from the two; and selecting 1M reads from the analysis results obtained with the selected sequencing analysis mode to analyze the sequencing stability of gene expression. By carrying out correlation analysis and comprehensive evaluation of the gene expression analysis methods, the invention selects the gene expression analysis method with higher reliability, ensures the accuracy of the sequencing work, and provides a quality control scheme for production stability.

Description

RNA sequencing quality control method and device relating to gene expression
Technical field
The present invention relates to the field of biotechnology, and in particular to a ribonucleic acid (RNA) sequencing quality control method and device relating to gene expression.
Background art
Gene expression refers to the process in which a gene segment of deoxyribonucleic acid (DNA) is transcribed into messenger RNA (mRNA) and the mRNA is translated into protein. With the completion of full nucleotide sequencing by the Human Genome Project (HGP), the focus of human genome research has shifted toward the function and diversity of genes, and the field has entered the post-genome era. By analyzing in parallel the expression of large numbers of genes in an individual at different growth and developmental stages or in different physiological states, the function of the corresponding genes in vivo can be studied and the mechanisms by which multiple genes cooperate at different levels can be elucidated. Such analysis plays a major role in research on the pathogenesis, diagnosis and treatment of major human diseases such as cancer and cardiovascular disease, and in drug development, and it will greatly advance research programs in both structural and functional genomics.
Gene expression has traditionally been analyzed by methods based on molecular hybridization. From the classical nucleic acid hybridization methods (Southern and Northern blotting) to present-day biochip technology, all of these methods use a known nucleic acid sequence as a probe that hybridizes to a complementary target nucleotide sequence, followed by qualitative and quantitative analysis through signal detection.
The invention of next-generation high-throughput sequencing technology is of epoch-making significance for biology, and for genomics research in particular; its high throughput makes it possible to analyze the transcriptome and genome of a species in detail and as a whole. The advent of Solexa sequencing made high-throughput, low-cost sequencing possible. Compared with the analog signal of chip technology, expression analysis based on Solexa sequencing avoids shortcomings of chips such as cross-hybridization, complex analysis models and low sensitivity. However, because the read length of high-throughput sequencing is limited, its application to de novo sequencing of unknown genomes is restricted, and this work still requires the assistance of traditional sequencing methods (whose read length can reach 850 bases). This limitation does not affect the application of high-throughput sequencing to mRNA expression profiling, microRNA expression profiling, transcriptome sequencing, chromatin immunoprecipitation (ChIP-chip) and DNA methylation.
Digital gene expression profiling (DGE, Digital Gene Expression Profiling) and transcriptome analysis (RNA-Seq) are new methods that use next-generation high-throughput sequencing and high-performance computational analysis to capture the sequences of, and accurately analyze, gene expression in a particular tissue and state of a species. As next-generation sequencing continues to develop, research on gene expression will go deeper. It is therefore necessary to evaluate the correlation of the gene expression analysis methods, so as to exclude analysis errors caused by inaccuracy or instability of the analysis method itself and to select the gene expression analysis method with higher reliability. This allows the accuracy of gene sequencing to be reflected faithfully and the evaluation to be reliable, thereby ensuring industrial feasibility and the stability of production.
Summary of the invention
The technical problem to be solved by the present invention is to provide an RNA sequencing quality control method and device relating to gene expression, which provide a quality control scheme for gene sequencing through the analysis of gene expression.
One aspect of the present invention provides a quality control method for RNA sequencing relating to gene expression. The method comprises: performing digital gene expression profiling analysis (DGE) and transcriptome analysis (RNA-Seq), respectively, on the sequencing fragments obtained by a sequencing technology; performing correlation analysis of the DGE result and of the transcriptome analysis result, respectively, with the result of real-time quantitative PCR detection (qPCR, Real-time Quantitative PCR Detecting System; PCR, Polymerase Chain Reaction); judging, according to the correlation analysis results, the difference between DGE and transcriptome analysis in gene expression quantification, and selecting one sequencing analysis mode from the two; and selecting one million reads (1M reads) from the analysis results obtained with the selected sequencing analysis mode and performing a sequencing stability analysis of gene expression.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the method further comprises: performing the RNA sequencing relating to gene expression with a high-throughput sequencing technology; and removing adapter sequences and low-quality sequences from the result of the DGE analysis and from the result of the transcriptome analysis, respectively.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the gene expression of the sample fragments is sequenced multiple times by the high-throughput sequencing technology, and the data of the multiple sequencing runs are averaged to obtain the result of the real-time quantitative PCR detection.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the step of performing correlation analysis of the DGE result and of the transcriptome analysis result, respectively, with the qPCR result further comprises: when the reference gene is incomplete, performing correlation analysis of the DGE result and of the transcriptome analysis result, respectively, with the qPCR result; and/or, at the same sequencing amount, comparing the number of genes detected in the DGE result with the number detected in the transcriptome analysis result.
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the step of performing, when the reference gene is incomplete, correlation analysis of the DGE result and of the transcriptome analysis result with the qPCR result further comprises: dividing the reference gene into three equal parts from the 3' end to the 5' end; performing DGE and transcriptome analysis, respectively, on each of the three reference gene parts; and performing correlation analysis of each analysis result obtained with the qPCR result.
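The division of a reference gene into three equal parts from the 3' end to the 5' end can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the 0-based half-open coordinates counted from the 5' end, and the rule that any remainder goes to the parts nearer the 3' end are all assumptions made for the example.

```python
def split_reference_gene(gene_length):
    """Split a gene of `gene_length` bases into three near-equal parts.

    Coordinates are 0-based, half-open, counted from the 5' end (assumption).
    Parts are returned ordered from the 3' end to the 5' end, as in the text.
    """
    if gene_length < 3:
        raise ValueError("gene must be at least 3 bases long")
    cut1 = gene_length // 3
    cut2 = (2 * gene_length) // 3
    five_to_three = [(0, cut1), (cut1, cut2), (cut2, gene_length)]
    return list(reversed(five_to_three))


def part_of_position(position, gene_length):
    """Return 0, 1 or 2: the part (counted from the 3' end) containing a base."""
    for index, (start, end) in enumerate(split_reference_gene(gene_length)):
        if start <= position < end:
            return index
    raise ValueError("position outside gene")
```

Reads could then be assigned to a part by their alignment position, and DGE or RNA-Seq expression computed separately for each part before correlating it with qPCR.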
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the step of comparing, at the same sequencing amount, the number of genes detected in the DGE result with the number detected in the transcriptome analysis result further comprises: taking three million reads (3M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and transcriptome analysis on them, respectively; taking two million reads (2M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and transcriptome analysis on them, respectively; and/or taking one million reads (1M reads) from the sequencing fragments obtained by high-throughput sequencing and performing DGE and transcriptome analysis on them, respectively; and, at each identical sequencing amount, comparing the number of genes that DGE and transcriptome analysis can detect.
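The comparison of detected gene numbers at identical sequencing amounts can be sketched as below. This is a hedged illustration only: it assumes each read has already been assigned to a gene identifier (or `None` if unassigned), that a gene counts as detected when at least `min_reads` reads hit it, and that subsampling uses Python's standard `random` module; none of these details are specified in the source.

```python
import random


def subsample_reads(read_genes, n, seed=0):
    """Draw n reads at random, without replacement, from a list of reads."""
    rng = random.Random(seed)
    return rng.sample(read_genes, n)


def detected_genes(read_genes, min_reads=1):
    """Count genes hit by at least `min_reads` reads (threshold is an assumption)."""
    counts = {}
    for gene in read_genes:
        if gene is not None:  # None marks a read not assigned to any gene
            counts[gene] = counts.get(gene, 0) + 1
    return sum(1 for c in counts.values() if c >= min_reads)


def compare_detection(dge_reads, rnaseq_reads,
                      depths=(3_000_000, 2_000_000, 1_000_000), seed=0):
    """Return {depth: (genes detected by DGE, genes detected by RNA-Seq)}."""
    result = {}
    for depth in depths:
        if depth > len(dge_reads) or depth > len(rnaseq_reads):
            continue  # skip depths that exceed the available data
        result[depth] = (
            detected_genes(subsample_reads(dge_reads, depth, seed)),
            detected_genes(subsample_reads(rnaseq_reads, depth, seed)),
        )
    return result
```

With real data the two inputs would be the gene assignments of the DGE tags and of the RNA-Seq reads from the same sample.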
In one embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention, the step of selecting one million reads (1M reads) from the analysis results obtained with the selected sequencing analysis mode and performing the sequencing stability analysis of gene expression further comprises: taking one million reads (1M reads) from the DGE analysis result and performing correlation analysis between them and the entire DGE analysis result; and/or taking one million reads (1M reads) from the transcriptome analysis result and performing correlation analysis between them and the entire transcriptome analysis result.
Another aspect of the present invention provides a quality control device for RNA sequencing relating to gene expression. The device comprises: a gene expression calculation module, configured to perform digital gene expression profiling analysis (DGE) and transcriptome analysis (RNA-Seq), respectively, on the sequencing fragments obtained by a sequencing technology; a correlation analysis module, configured to perform correlation analysis of the DGE result and of the transcriptome analysis result, respectively, with the result of real-time quantitative PCR detection (qPCR); a sequencing analysis mode selection module, configured to judge, according to the correlation analysis results, the difference between DGE and transcriptome analysis in gene expression quantification, and to select one sequencing analysis mode from the two; and a sequencing stability analysis module, configured to select one million reads (1M reads) from the analysis results obtained with the selected sequencing analysis mode and to perform a sequencing stability analysis of gene expression.
In one embodiment of the quality control device for RNA sequencing relating to gene expression provided by the invention, the correlation analysis module further comprises: a first correlation analysis submodule, configured to, when the reference gene is incomplete, divide the reference gene into three equal parts from the 3' end to the 5' end, perform DGE and transcriptome analysis, respectively, on each of the three reference gene parts, and perform correlation analysis of each analysis result obtained with the qPCR result; and a second correlation analysis submodule, configured to, at the same sequencing amount, take three million reads (3M reads) from the sequencing fragments obtained by high-throughput sequencing and perform DGE and transcriptome analysis on them, respectively, take two million reads (2M reads) from the sequencing fragments and perform DGE and transcriptome analysis on them, respectively, and/or take one million reads (1M reads) from the sequencing fragments and perform DGE and transcriptome analysis on them, respectively, and to compare, at each identical sequencing amount, the number of genes that DGE and transcriptome analysis can detect.
In one embodiment of the quality control device for RNA sequencing relating to gene expression provided by the invention, the sequencing stability analysis module further comprises: a first sequencing stability analysis submodule, configured to take one million reads (1M reads) from the DGE analysis result and perform correlation analysis between them and the entire DGE analysis result; and a second sequencing stability analysis submodule, configured to take one million reads (1M reads) from the transcriptome analysis result and perform correlation analysis between them and the entire transcriptome analysis result.
The invention provides an RNA sequencing quality control method and device relating to gene expression. By performing correlation analysis and comprehensive evaluation of the gene expression analysis methods, the gene expression analysis method with higher reliability is selected, the accuracy of gene sequencing is reflected faithfully, industrial feasibility is ensured, and a quality control scheme is provided for the stability of production.
Brief description of the drawings
Fig. 1 is a flowchart of a quality control method for RNA sequencing relating to gene expression provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the correlation analysis between the DGE analysis results and the qPCR results for two samples of the present invention, wherein Fig. 2(a) shows the DGE analysis result and the qPCR result for sample UHRR, and Fig. 2(b) shows the DGE analysis result and the qPCR result for sample HBRR;
Fig. 3 is a schematic diagram of the correlation analysis between the RNA-Seq analysis results and the qPCR results for two samples of the present invention, wherein Fig. 3(a) shows the RNA-Seq analysis result and the qPCR result for sample UHRR, and Fig. 3(b) shows the RNA-Seq analysis result and the qPCR result for sample HBRR;
Fig. 4 is a flowchart of another embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention;
Fig. 5 is a schematic diagram of the correlation analysis between the qPCR results and the DGE analysis results for the reference gene sequences of sample UHRR divided into three equal parts, wherein Fig. 5(a) shows the DGE analysis result and the qPCR result for the first segment of sample UHRR, Fig. 5(b) shows those for the second segment, and Fig. 5(c) shows those for the third segment;
Fig. 6 is a schematic diagram of the correlation analysis between the qPCR results and the RNA-Seq analysis results for the reference gene sequences of sample UHRR divided into three equal parts, wherein Fig. 6(a) shows the RNA-Seq analysis result and the qPCR result for the first segment of sample UHRR, Fig. 6(b) shows those for the second segment, and Fig. 6(c) shows those for the third segment;
Fig. 7 is a schematic diagram of the number of genes detected by DGE and by RNA-Seq for sample UHRR of the present invention at the same sequencing amount;
Fig. 8 is a flowchart of yet another embodiment of the quality control method for RNA sequencing relating to gene expression provided by the invention;
Fig. 9 is a schematic structural diagram of a quality control device for RNA sequencing relating to gene expression provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of another embodiment of the quality control device for RNA sequencing relating to gene expression provided by the invention;
Fig. 11 is a schematic structural diagram of yet another embodiment of the quality control device for RNA sequencing relating to gene expression provided by the invention.
Detailed description of the embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated.
Fig. 1 is a flowchart of a quality control method for RNA sequencing relating to gene expression provided by an embodiment of the present invention.
As shown in Fig. 1, the quality control method 100 for RNA sequencing relating to gene expression comprises step 102: digital gene expression profiling analysis (DGE) and transcriptome analysis (RNA-Seq) are performed, respectively, on the sequencing fragments obtained by a sequencing technology. In this embodiment of the invention, the sequencing method may be a high-throughput sequencing technology, for example Illumina GA Solexa sequencing. Solexa is a new sequencing method based on sequencing-by-synthesis (SBS), which uses single-molecule arrays to carry out bridge PCR on a small chip (flow cell). The new reversible terminator technology allows only one base to be synthesized at a time, labeled with a fluorescent group; the corresponding laser then excites the fluorescent group and the emitted light is captured, so that the base information is read. The experiment can use a 36 Single End sequencing platform, on which the RNA standards or experimental samples are subjected to double-digestion sequencing and random-fragmentation sequencing, respectively.
Step 104: correlation analysis is performed of the DGE result and of the transcriptome analysis result, respectively, with the result of real-time quantitative PCR detection (qPCR). The correlation analysis of the DGE and RNA-Seq results with the qPCR result is described in further detail later.
Step 106: according to the correlation analysis results, the difference between DGE and transcriptome analysis in gene expression quantification is judged, and one sequencing analysis mode is selected from the two. For example, the difference between DGE and RNA-Seq in gene expression quantification (covering both the number of genes and their expression levels) is analyzed comprehensively. Specifically, this may include at least one of the following: comparing the correlation of the DGE and RNA-Seq results with the qPCR result at the normal sequencing amount; comparing the correlation of the DGE and RNA-Seq results with the qPCR result when the reference gene is incomplete; and comparing the number of genes that DGE and RNA-Seq can detect at the same sequencing amount. From this comprehensive analysis, the difference between DGE and RNA-Seq in gene expression quantification is obtained, and a suitable sequencing analysis mode is selected.
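The selection in step 106 can be illustrated with a small decision rule. The sketch below is an assumption-laden example, not the patent's procedure: it considers only the correlation with qPCR (not the detected gene numbers), averages the coefficients over samples, and treats the two methods as comparable when the means differ by no more than a hypothetical `tolerance`; the function name and the tolerance value are invented for illustration.

```python
from statistics import mean


def choose_method(correlations, tolerance=0.05):
    """Pick the analysis mode whose results agree better with qPCR.

    `correlations` maps a sample name to (r_dge, r_rnaseq), the correlation
    coefficients of each method with qPCR. `tolerance` is a hypothetical
    threshold below which the two methods are treated as comparable.
    """
    mean_dge = mean(r[0] for r in correlations.values())
    mean_rnaseq = mean(r[1] for r in correlations.values())
    if abs(mean_dge - mean_rnaseq) <= tolerance:
        return "either"
    return "RNA-Seq" if mean_rnaseq > mean_dge else "DGE"
```

Applied to the coefficients reported in this description for UHRR (about 0.3 for DGE, 0.91 for RNA-Seq) and HBRR (about 0.53 and 0.86), this rule would select RNA-Seq.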
Step 108: one million reads (1M reads) are selected from the analysis results obtained with the selected sequencing analysis mode, and a sequencing stability analysis of gene expression is performed. For example, if the comprehensive analysis above shows that the gene expression quantification obtained with RNA-Seq is more accurate (that is, the expression levels obtained by RNA-Seq are closer to those obtained by qPCR), 1M reads are selected at random from the analysis results obtained with RNA-Seq and correlated with the entire transcriptome analysis result. The random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking any 1M of them. If the gene expression quantification obtained with DGE and with RNA-Seq is comparable, either mode may be chosen; 1M reads are selected from the analysis results obtained with the chosen mode and correlated with the entire transcriptome analysis result. Based on the analysis results, the stability of production sequencing is then checked and evaluated so as to ensure the accuracy of the sequencing work. Here, checking and evaluating mainly means analyzing the reproducibility of the sequencing results: because the number of genes and the expression levels for 1M reads are determined, poor reproducibility between a given sequencing run and the determined result indicates that the run is unstable and incorrect.
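The stability check of step 108 (shuffle all reads, take 1M, and correlate the subset with the full data) can be sketched as follows. This is a minimal illustration under stated assumptions: each read is represented by the gene it was assigned to, expression is expressed as reads per million, and the Pearson coefficient is used; the source does not name the correlation coefficient, so that choice and all function names are assumptions.

```python
import random
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def stability_correlation(read_genes, n_sub=1_000_000, seed=0):
    """Correlate per-gene expression from a random subset with the full data."""
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)          # thoroughly shuffle all reads
    subset = shuffled[:n_sub]      # then take any n_sub of them
    full_counts = Counter(read_genes)
    sub_counts = Counter(subset)
    genes = sorted(full_counts)
    # Normalise to reads per million so the two depths are comparable.
    full_expr = [full_counts[g] * 1e6 / len(read_genes) for g in genes]
    sub_expr = [sub_counts[g] * 1e6 / len(subset) for g in genes]
    return pearson(full_expr, sub_expr)
```

A coefficient close to 1 would indicate that the subset reproduces the full result; the threshold at which a run counts as unstable is not given in the source.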
The experimental part of digital gene expression profiling (DGE) mainly comprises the sample preparation experiment and the sequencing experiment. The main reagents and consumables are the Illumina Gene Expression Sample Prep Kit and the Solexa sequencing chip (flow cell), and the main instruments are the Illumina Cluster Station (Illumina) and the Illumina Genome Analyzer (Illumina) system. The experimental procedure is as follows. 6 μg of total RNA is extracted, mRNA is purified by adsorption to Oligo(dT) magnetic beads, and double-stranded cDNA is synthesized by Oligo(dT)-primed reverse transcription. The 5' end of the tag can be generated with either of two restriction endonucleases, NlaIII or DpnII; NlaIII is usually used. It recognizes and cuts the CATG sites on the cDNA; the fragments carrying the cDNA 3' end are purified by magnetic bead precipitation, and Illumina adapter 1 (sequence: ACAGGTTCAGAGTTCTACAGTCCGACATG) is ligated to their 5' ends. The junction of Illumina adapter 1 and the CATG site is the recognition site of MmeI, a restriction endonuclease whose cleavage site is separated from its recognition site; it cuts 17 bp downstream of the CATG site, which produces a tag carrying adapter 1. After the 3' fragments are removed by magnetic bead precipitation, Illumina adapter 2 (sequence: CAAGCAGAAGACGGCATACGANN) is ligated to the 3' end of the tag, giving a library of 21 bp tags with different adapter sequences at the two ends. After 15 cycles of linear PCR amplification, the 85-base band is purified by 6% TBE PAGE gel electrophoresis. After denaturation, the single-stranded molecules are loaded onto the Solexa sequencing chip (flow cell) and immobilized; each molecule is amplified in situ into a single-molecule cluster that serves as a sequencing template; four nucleotides labeled with four different fluorescent dyes are added, and sequencing is performed by sequencing-by-synthesis (SBS). Each lane produces millions of raw reads, with a read length of 35 bp.
In summary, mRNA is enriched from total RNA with Oligo(dT) beads and reverse-transcribed into double-stranded cDNA; the double-stranded cDNA is digested with NlaIII, an enzyme that recognizes a 4-base site; Illumina adapter 1 is ligated; MmeI cuts 17 bp downstream of the CATG at the 3' end; and Illumina adapter 2 is ligated at the 3' end. Primer GX1 and Primer GX2 are then added for PCR amplification. After amplification, the 85-base band is recovered from a 6% TBE PAGE gel, purified, and sequenced by the Illumina gene expression sequencing method.
The basic sequencing procedure of the transcriptome analysis (RNA-Seq) experimental part is as follows. After total RNA is extracted from the sample, eukaryotic mRNA is enriched with magnetic beads carrying Oligo(dT) (for prokaryotes, rRNA is removed with a kit before proceeding to the next step). Fragmentation buffer is added to break the mRNA into short fragments. Using the mRNA as a template, the first cDNA strand is synthesized with random hexamers; buffer, dNTPs, RNase H and DNA polymerase I are then added to synthesize the second cDNA strand. After purification with a QiaQuick PCR kit (Qiagen) and elution with EB buffer, end repair is performed and sequencing adapters are ligated. Fragments are then size-selected by agarose gel electrophoresis, PCR amplification is performed, and the resulting sequencing library is sequenced.
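The DGE tag generation described above (NlaIII cutting at CATG, retention of the fragment carrying the cDNA 3' end, and MmeI cutting 17 bp downstream of the CATG site) can be mimicked in silico. The sketch below is an illustration under assumptions: the function name is invented, the cDNA is given as a plain string in 5' to 3' orientation, and a site too close to the 3' end to yield a full 21 bp tag is simply reported as `None`.

```python
def extract_dge_tag(cdna, site="CATG", downstream=17):
    """Return the 21 bp tag (CATG + 17 bases) at the 3'-most CATG site.

    The 3'-most site is used because only the fragment carrying the cDNA
    3' end is retained on the Oligo(dT) beads. Returns None if there is no
    CATG site or fewer than `downstream` bases follow it (assumption).
    """
    position = cdna.rfind(site)  # 3'-most occurrence of the NlaIII site
    if position == -1:
        return None
    tag = cdna[position:position + len(site) + downstream]
    if len(tag) < len(site) + downstream:
        return None
    return tag
```

For example, a cDNA containing two CATG sites yields the tag that starts at the second one:

```python
cdna = "GGGCATGAAAA" + "CATG" + "ACGTACGTACGTACGTA" + "TTTT"
print(extract_dge_tag(cdna))  # CATGACGTACGTACGTACGTA
```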
Next, the correlation analysis of the DGE and RNA-Seq results with the qPCR result is described in detail.
The correlation analysis of the digital gene expression profiling (DGE) result with the qPCR result mainly involves the calculation of the expression level TPM (Transcripts Per Million clean reads) in the standard DGE analysis, namely: TPM = (number of raw clean tags assigned to each gene / total number of clean tags in the sample) × 1,000,000 (see "Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms", Peter A.C. 't Hoen, Yavuz Ariyurek, et al., Nucleic Acids Research, 15 October 2008, Vol. 36, No. 21).
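The TPM formula above translates directly into code. The following Python sketch assumes that per-gene clean tag counts are already available as a dictionary; the function names are illustrative and not taken from the source.

```python
def tpm(gene_clean_tags, total_clean_tags):
    """TPM = clean tags of the gene / total clean tags in the sample * 1,000,000."""
    if total_clean_tags <= 0:
        raise ValueError("total_clean_tags must be positive")
    return gene_clean_tags * 1_000_000 / total_clean_tags


def tpm_table(tag_counts):
    """Compute TPM for every gene in a {gene: clean tag count} mapping."""
    total = sum(tag_counts.values())
    return {gene: tpm(count, total) for gene, count in tag_counts.items()}
```

For instance, a gene with 150 clean tags in a sample of 3,000,000 clean tags has a TPM of 50.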
Fig. 2 is a schematic diagram of the correlation analysis between the DGE analysis results and the qPCR results for two samples of the present invention. In general, if the DGE data throughput is 3M reads, 3M reads can be taken at random from the sample sequencing data to analyze the accuracy of the DGE result; the random selection may consist of thoroughly shuffling all reads obtained by sequencing and then taking any 3M of them. Because UHRR and HBRR are RNA standard samples, their qPCR results can be downloaded, whereas the sample sequencing data cannot be downloaded and must be generated by sequencing in-house. Fig. 2(a) shows the DGE analysis result and the qPCR result for sample UHRR, and Fig. 2(b) shows the DGE analysis result and the qPCR result for sample HBRR. The UHRR used in the present invention is the Universal Human Reference RNA (UHRR) standard from Stratagene, and the HBRR is the Human Brain Reference RNA (HBRR) standard from Ambion. As shown in Fig. 2(a), the correlation coefficient between the DGE analysis result and the qPCR result for sample UHRR is about 0.3; as shown in Fig. 2(b), the correlation coefficient between the DGE analysis result and the qPCR result for sample HBRR is about 0.53 (the number of genes detectable in the DGE analysis is 716 for both UHRR and HBRR, and the number of genes detectable by qPCR is 687 for both).
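A correlation coefficient between sequencing-based expression and qPCR, of the kind reported above, could be computed along the following lines. This is a sketch under assumptions: the source does not state which coefficient or data transformation was used, so the Pearson coefficient on the values as given is an illustrative choice (a log transform could be applied beforehand), and restricting the comparison to genes present in both data sets is likewise an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def correlation_with_qpcr(seq_expr, qpcr_expr):
    """Correlate sequencing-based expression with qPCR over shared genes.

    Both arguments map gene name -> expression value. Returns the
    coefficient and the number of genes it was computed on.
    """
    shared = sorted(g for g in seq_expr if g in qpcr_expr)
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [seq_expr[g] for g in shared]
    ys = [qpcr_expr[g] for g in shared]
    return pearson(xs, ys), len(shared)
```

The same function can be applied to TPM values from DGE or to RPKM values from RNA-Seq.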
The correlation analysis between the transcriptome analysis (RNA-Seq) results and the qPCR results mainly involves how the expression level RPKM (Reads Per Kb per Million reads) is calculated in the standard RNA-Seq analysis. The RPKM formula (see Mapping and quantifying mammalian transcriptomes by RNA-Seq, Ali Mortazavi et al., 30 May 2008, Nature Methods, Advance Online Publication) is as follows:
$$\mathrm{RPKM}(A) = \frac{10^{6}\, C}{N L / 10^{3}}$$
where RPKM(A) is the expression level of gene A, C is the number of reads uniquely aligned to gene A, N is the total number of reads uniquely aligned to the genome, and L is the number of bases in the coding region of gene A.
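As a worked illustration of the RPKM definition, the sketch below evaluates the formula directly from C, N and L. The function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(c, n, l):
    """RPKM = 10^6 * C / (N * L / 10^3).

    c: reads uniquely aligned to the gene
    n: total uniquely aligned reads
    l: number of bases in the coding region of the gene
    """
    if n <= 0 or l <= 0:
        raise ValueError("N and L must be positive")
    return 1e6 * c / (n * l / 1e3)


# Hypothetical gene: 500 reads, 3M uniquely aligned reads in total, 2 kb coding region.
example_rpkm = rpkm(c=500, n=3_000_000, l=2_000)
```

With these numbers the gene has about 83.3 reads per kilobase per million reads; a 1 kb gene with one read among one million aligned reads has an RPKM of exactly 1.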
Fig. 3 is a schematic diagram of the correlation between the RNA-Seq results and the qPCR results for the two samples of the present invention. In general, if the RNA-Seq data throughput is 3M reads, 3M reads can be drawn at random from the sample sequencing data to assess the accuracy of the RNA-Seq results; the random selection can be done by completely shuffling all reads obtained by sequencing and then taking any 3M reads from them. Fig. 3(a) shows the RNA-Seq results of sample UHRR against the qPCR results, and Fig. 3(b) shows the RNA-Seq results of sample HBRR against the qPCR results. The correlation coefficient between the RNA-Seq results and the qPCR results is about 0.91 for sample UHRR and about 0.86 for sample HBRR (872 genes were detectable in the RNA-Seq analysis for both the UHRR and HBRR samples, and 851 genes were detectable by qPCR for both). It should also be noted that drawing 3M reads from samples UHRR and HBRR for the RNA-Seq versus qPCR correlation analysis gives the same correlation coefficients as using all of the data, namely 0.91 and 0.86 respectively. This shows that the amount of sequencing data has almost no effect, or only a very small effect, on RNA-Seq quantification.
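The correlation between a sequencing-based expression table and qPCR values can be sketched as follows. The text does not state which coefficient or data transformation was used for Figs. 2 and 3, so the use of the Pearson coefficient on untransformed values, restricted to genes present in both tables, is an assumption; all names and numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_qpcr(expression, qpcr):
    """Correlate two {gene: value} tables over the genes detected in both."""
    shared = sorted(set(expression) & set(qpcr))
    r = pearson([expression[g] for g in shared], [qpcr[g] for g in shared])
    return r, len(shared)


# Hypothetical values: gene "d" lacks a qPCR value, gene "e" lacks an RPKM value.
toy_rpkm = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 9.0}
toy_qpcr = {"a": 2.0, "b": 4.0, "c": 6.0, "e": 1.0}
toy_r, toy_shared = correlate_with_qpcr(toy_rpkm, toy_qpcr)
```

Returning the number of shared genes alongside the coefficient makes it easy to report both, as the figures do.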
In an embodiment of the quality control method for RNA sequencing of gene expression provided by the present invention, before the DGE and RNA-Seq analyses are performed on the sequencing fragments obtained by the sequencing technology, adapter sequences are removed from the data used for the digital gene expression profiling and for the transcriptome analysis, respectively; further, low-quality sequences can then be removed from the adapter-trimmed data, yielding tag data (clean tags) that can be used for subsequent analysis.
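A rough sketch of this two-stage cleaning is given below. The text does not specify the adapter sequence or the low-quality criterion, so the adapter string, the rule "drop reads with too many N bases" and the `max_n_fraction` threshold are all assumptions chosen only to make the example concrete.

```python
def clean_reads(reads, adapter, max_n_fraction=0.1):
    """Trim a 3' adapter, then drop empty reads and reads with too many N bases."""
    cleaned = []
    for seq in reads:
        idx = seq.find(adapter)
        if idx != -1:
            seq = seq[:idx]  # keep only the part before the adapter
        if not seq:
            continue         # nothing left after trimming
        if seq.count("N") / len(seq) > max_n_fraction:
            continue         # hypothetical low-quality rule
        cleaned.append(seq)
    return cleaned


# Toy reads with a made-up adapter "GGG".
toy_clean = clean_reads(
    ["ACGTACGTAAAGGG", "NNNNACGT", "GGG", "ACGTACGT"], adapter="GGG"
)
```

Production pipelines typically also use per-base quality scores; they are omitted here because the text gives no threshold.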
In an embodiment of the quality control method for RNA sequencing of gene expression provided by the present invention, the gene expression fragments of a sample are sequenced multiple times by a high-throughput sequencing technology, and the data from the multiple runs are averaged to obtain the result of real-time quantitative gene amplification fluorescence detection (qPCR). For example, the qPCR data for the UHRR and HBRR samples are downloaded from GEO (Gene Expression Omnibus) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5350, where the accession number of UHRR is GSM129638, released on 8 September 2006, and the accession number of HBRR is GSM129645, released on 8 September 2006. The UHRR and HBRR samples were each measured in multiple (e.g., 4) parallel experiments, and the results of these 4 parallel experiments for gene number and gene expression level were averaged to give the qPCR quantification result.
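Averaging the parallel experiments might be sketched as below. Restricting the average to genes measured in every replicate is an assumption, since the text does not say how genes missing from some replicates are handled; the function name and values are hypothetical.

```python
def average_replicates(replicates):
    """Average {gene: value} tables over the genes present in every replicate."""
    if not replicates:
        raise ValueError("need at least one replicate")
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)
    count = len(replicates)
    return {g: sum(rep[g] for rep in replicates) / count for g in sorted(shared)}


# Two toy replicates; gene "c" is missing from the second and is therefore dropped.
toy_mean = average_replicates(
    [{"a": 1.0, "b": 3.0, "c": 5.0}, {"a": 3.0, "b": 5.0}]
)
```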
The quality control method for RNA sequencing of gene expression provided by the present invention performs DGE and RNA-Seq analyses on the sequencing fragments and carries out a comprehensive correlation analysis of the DGE and RNA-Seq results against the qPCR results, so that a suitable sequencing analysis mode can be selected for the sequencing stability analysis of gene expression. An embodiment of this quality control method truly reflects the accuracy of gene sequencing, ensures industrial feasibility, and provides a quality control scheme for production stability.
Fig. 4 is a flowchart of another embodiment of the quality control method for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 4, the quality control method 400 for RNA sequencing of gene expression comprises steps 402, 404, 405, 406 and 408, where steps 402, 406 and 408 can carry out technical content identical or similar to that of steps 102, 106 and 108 shown in Fig. 1, respectively; for brevity, that content is not repeated here.
As shown in Fig. 4, after step 402, step 404 is performed: when the reference genes are incomplete, the result of the digital gene expression profiling and the result of the transcriptome analysis are each correlated with the result of real-time quantitative gene amplification fluorescence detection. Reference genes are assembled nucleotide sequences in existing databases (http://www.ncbi.nlm.nih.gov/). These nucleotide sequences exist in many versions (released by different research institutions, data centres and so on), and because each institution is limited by its technical capability, the released results differ to some degree from the true genes, so the reference genes may be incomplete. For example, when the DGE results and the RNA-Seq results are each correlated with the qPCR results, and the reference genes are complete or incomplete, the correlation analysis can be carried out as follows.
Specifically, in non-model animals there is reason to suspect that incomplete reference gene sequences make DGE quantification inaccurate. First, a complete reference gene sequence (e.g., the human RefSeq genes in NCBI) is divided into three equal parts starting from the 3' end; each third is then treated as if it were the complete reference gene sequence, and the DGE results are correlated with the qPCR results for each part. Fig. 5 is a schematic diagram of the correlation between the DGE results and the qPCR results for sample UHRR using the reference gene sequence divided into thirds, where Fig. 5(a) shows the result for the first segment of sample UHRR, Fig. 5(b) shows the result for the second segment, and Fig. 5(c) shows the result for the third segment. For these three partial sequences of sample UHRR, the correlation coefficients between the DGE results and the qPCR results are about 0.71, 0.39 and 0.33 respectively (the correlation coefficient is 0.76 when the complete gene sequence is used), and the numbers of genes detectable in the DGE analysis are 774, 596 and 435 respectively. Similarly, for the RNA-Seq analysis mode, the complete reference gene sequence is first divided into three equal parts starting from the 3' end, each third is treated as if it were the complete reference gene sequence, and the RNA-Seq versus qPCR correlation analysis is carried out for each part. Fig. 6 is a schematic diagram of the correlation between the RNA-Seq results and the qPCR results for sample UHRR using the reference gene sequence divided into thirds, where Fig. 6(a) shows the result for the first segment of sample UHRR, Fig. 6(b) shows the result for the second segment, and Fig. 6(c) shows the result for the third segment. For these three partial sequences of sample UHRR, the correlation coefficients between the RNA-Seq results and the qPCR results are about 0.85, 0.91 and 0.84 respectively (the correlation coefficient between RNA-Seq and qPCR is 0.91 when the complete gene sequence is used), and the numbers of genes detectable in the RNA-Seq analysis are 917, 911 and 896 respectively.
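Dividing a reference sequence into thirds starting from the 3' end can be sketched as follows. The text does not say how lengths that are not a multiple of three are handled; giving the leftover bases to the 5'-most segment is an assumption, as are the function name and the toy sequence.

```python
def split_into_thirds_from_3prime(sequence):
    """Split a 5'->3' sequence into [first, second, third], counted from the 3' end.

    Each segment is returned in its original 5'->3' orientation. Leftover bases
    (length not divisible by 3) go to the third, 5'-most segment (an assumption).
    """
    n = len(sequence)
    if n < 3:
        raise ValueError("sequence too short to split into three parts")
    size = n // 3
    first = sequence[n - size:]               # segment nearest the 3' end
    second = sequence[n - 2 * size:n - size]  # middle segment
    third = sequence[:n - 2 * size]           # segment nearest the 5' end
    return [first, second, third]


# Toy 10-base reference written 5'->3'.
toy_segments = split_into_thirds_from_3prime("AAAACCCGGG")
```

Each segment would then be used as a stand-in "complete" reference for a separate DGE or RNA-Seq quantification run.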
DGE has inherent shortcomings: it cannot detect genes that contain no CATG (or GATC) site, and it tends to take the tag closest to the 3' end of each mRNA as the label of that mRNA. It therefore places stricter requirements on the reference genes, and an incomplete reference gene sequence has a large effect on the DGE results. RNA-Seq, by contrast, fragments the mRNA at random, so many tags are obtained from each mRNA; its dependence on the reference genes is not very strong, and fairly accurate expression information can still be obtained when the reference genes are incomplete. This shows that an incomplete gene sequence has a large effect on the DGE results and little effect on the RNA-Seq results. In other words, when the reference genes are incomplete and DGE is used, it is preferable to use the first segment starting from the 3' end of the gene; and, further, it is preferable to use the RNA-Seq analysis mode for expression analysis of the gene fragments.
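The dependence of DGE on a recognition site near the 3' end can be illustrated with a small sketch. The site CATG comes from the text; the tag length of 17 bases following the site and the function name are assumptions made for illustration.

```python
def most_3prime_tag(mrna, site="CATG", tag_length=17):
    """Return the site plus the following bases at the site nearest the 3' end.

    Returns None when the sequence has no such site, i.e. the gene would be
    invisible to DGE. The tag is shorter than expected if the site lies too
    close to the 3' end.
    """
    pos = mrna.rfind(site)  # last occurrence = closest to the 3' end
    if pos == -1:
        return None
    return mrna[pos:pos + len(site) + tag_length]


# Toy mRNA with two CATG sites; only the 3'-most one yields the tag.
toy_tag = most_3prime_tag("AAACATGTTTTCATGCCCCC", tag_length=4)
no_tag = most_3prime_tag("AAAAAA")
```

If the reference sequence is truncated so that the 3'-most site is missing, the tag derived from the reference no longer matches the sequenced tag, which is consistent with the drop in correlation reported for the second and third segments.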
Step 405: at the same sequencing depth, the number of genes detected by the digital gene expression profiling is compared with the number detected by the transcriptome analysis. For example, comparing the numbers of genes detected by the DGE and RNA-Seq analysis modes at the same sequencing depth can specifically comprise: randomly taking 3M reads from the sequencing fragments obtained by high-throughput sequencing and performing the DGE and RNA-Seq analyses on them respectively; randomly taking 2M reads from the sequencing fragments and performing the DGE and RNA-Seq analyses respectively; and randomly taking 1M reads from the sequencing fragments and performing the DGE and RNA-Seq analyses respectively. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking the required number of reads from them. At the same sequencing depth, at least one of the three modes above can be chosen to compare the numbers of genes detectable by DGE and by RNA-Seq. Fig. 7 is a schematic diagram of the numbers of genes detected by DGE and RNA-Seq for sample UHRR at the same sequencing depth. As shown in Fig. 7, at the same sequencing depth RNA-Seq detects about 1000 more genes than DGE.
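The comparison in step 405 can be sketched as counting distinct genes among a random subset of reads at each depth. The input format (one gene identifier per read, `None` for reads that match no gene) and all names are assumptions; real depths would be 1M, 2M and 3M reads.

```python
import random


def detected_gene_count(read_assignments, depth, seed=None):
    """Count distinct genes hit by `depth` randomly chosen reads."""
    rng = random.Random(seed)
    shuffled = list(read_assignments)
    rng.shuffle(shuffled)
    return len({gene for gene in shuffled[:depth] if gene is not None})


def compare_methods(dge_assignments, rnaseq_assignments, depths, seed=None):
    """Return {depth: (genes detected by DGE, genes detected by RNA-Seq)}."""
    return {
        depth: (
            detected_gene_count(dge_assignments, depth, seed),
            detected_gene_count(rnaseq_assignments, depth, seed),
        )
        for depth in depths
    }


# Toy per-read gene assignments; None marks a read that matched no gene.
dge_toy = ["g1", "g1", "g2", None]
rnaseq_toy = ["g1", "g2", "g3", "g4"]
toy_comparison = compare_methods(dge_toy, rnaseq_toy, depths=[4], seed=1)
```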
Fig. 8 is a flowchart of another embodiment of the quality control method for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 8, the quality control method 800 for RNA sequencing of gene expression comprises steps 802, 804, 806, 808 and 809, where steps 802, 804 and 806 can carry out technical content identical or similar to that of steps 102, 104 and 106 shown in Fig. 1, respectively; for brevity, that content is not repeated here.
As shown in Fig. 8, after step 806, step 808 is performed: one million tag data (1M reads) are taken at random from the digital gene expression profiling result and correlated with the whole digital gene expression profiling result. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking any 1M reads from them.
Step 809: one million tag data (1M reads) are taken at random from the transcriptome analysis result and correlated with the whole transcriptome analysis result. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking any 1M reads from them.
The correlation analysis in steps 808 and 809 can be carried out as follows: a known library is added to each lane (sequenced to only about 1M reads), and sequencing stability is checked by comparing the sequencing results of the known library. For example, the embodiment of the present invention uses a correlation analysis of a 1M-read reference standard; the correlation between two 1M-read runs of the reference standard UHRR and the repeat-experiment sample UHRR is compared in Table 1.
Table 1. Data correlation between the 1M-read sequencing reference standard and the repeat experiments (the table is provided as an image, BSA00000204094700151, in the original publication)
Table 1 shows that, when the data from the 1M-read reference standard and the repeat experiments are compared, the gene-level correlation is very high whether measured by the Spearman correlation coefficient or the Pearson correlation coefficient. This indicates that the sequencing results are normal and trustworthy, that using a reference standard to check sequencing stability is feasible, and that analysing gene expression can provide a quality control scheme for gene sequencing. Applying the method with 1M reads to quality control of RNA sequencing of gene expression makes it possible to assess production stability.
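Both coefficients named in Table 1 can be computed with the standard library alone, as in the sketch below. The Spearman coefficient is taken here as the Pearson coefficient of the ranks, with tied values sharing their average rank; the function names and toy values are hypothetical and do not reproduce the data in Table 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy expression values for the same genes in a 1M-read subset and in the full data.
subset_expr = [1.0, 2.0, 3.0, 4.0]
full_expr = [1.0, 4.0, 9.0, 16.0]
toy_spearman = spearman(subset_expr, full_expr)
toy_pearson = pearson(subset_expr, full_expr)
```

In the toy data the relationship is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly lower; reporting both, as Table 1 does, separates rank agreement from linear agreement.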
In addition, it should be noted that, for the quality control method and device of the present invention, whether compared in terms of quantification accuracy, the number of genes detected, or dependence on the reference genes, adopting the RNA-Seq analysis in the quality control scheme has the advantage of reflecting gene expression more accurately than DGE.
Fig. 9 is a structural diagram of a quality control device for RNA sequencing of gene expression provided by an embodiment of the present invention.
As shown in Fig. 9, a quality control device 900 for RNA sequencing of gene expression comprises: a gene expression calculation module 902, a correlation analysis module 904, a sequencing analysis mode selection module 906 and a sequencing stability analysis module 908.
The gene expression calculation module 902 is used to perform digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq), respectively, on the sequencing fragments obtained by the sequencing technology.
The correlation analysis module 904 is used to correlate the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection (qPCR).
The sequencing analysis mode selection module 906 is used to judge, according to the correlation analysis results, the difference between the digital gene expression profiling and the transcriptome analysis in quantifying gene expression, and to select one sequencing analysis mode from the digital gene expression profiling and the transcriptome analysis.
The sequencing stability analysis module 908 is used to select one million tag data (1M reads) from the analysis result obtained by the selected sequencing analysis mode and to perform the sequencing stability analysis of gene expression.
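The selection performed by module 906 might be sketched as choosing the analysis mode whose results correlate best with qPCR. The text only says the difference is judged "according to the correlation analysis results"; taking the maximum coefficient is an assumption, and the function name is hypothetical. The example coefficients are the UHRR values quoted in the description for the complete reference sequence.

```python
def choose_analysis_mode(correlations):
    """Pick the analysis mode with the highest correlation to qPCR."""
    if not correlations:
        raise ValueError("no analysis modes to choose from")
    return max(correlations, key=correlations.get)


# UHRR coefficients reported in the description for the complete reference.
choice = choose_analysis_mode({"DGE": 0.76, "RNA-Seq": 0.91})
```

A fuller selection rule could also weigh the number of detected genes and the sensitivity to incomplete references discussed above.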
Fig. 10 is a structural diagram of another embodiment of the quality control device for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 10, a quality control device 1000 for RNA sequencing of gene expression comprises: a gene expression calculation module 1002, a correlation analysis module 1004, a sequencing analysis mode selection module 1006 and a sequencing stability analysis module 1008, where the gene expression calculation module 1002, the sequencing analysis mode selection module 1006 and the sequencing stability analysis module 1008 can be functional modules identical or similar to the gene expression calculation module 902, the sequencing analysis mode selection module 906 and the sequencing stability analysis module 908 shown in Fig. 9. For brevity, they are not described again here.
As shown in Fig. 10, the correlation analysis module 1004 further comprises a first correlation analysis submodule 10041 and a second correlation analysis submodule 10042, where:
The first correlation analysis submodule 10041 is used, when the reference genes are incomplete, to divide the reference genes evenly into three parts from the 3' end to the 5' end; to perform the digital gene expression profiling and the transcriptome analysis on the three parts of the reference genes respectively; and to correlate the analysis results obtained with the result of real-time quantitative gene amplification fluorescence detection respectively.
The second correlation analysis submodule 10042 is used, at the same sequencing depth, to randomly take three million tag data (3M reads) from the sequencing fragments obtained by high-throughput sequencing and perform the digital gene expression profiling and the transcriptome analysis on them respectively; to randomly take two million tag data (2M reads) from the sequencing fragments and perform the digital gene expression profiling and the transcriptome analysis respectively; and/or to randomly take one million tag data (1M reads) from the sequencing fragments and perform the digital gene expression profiling and the transcriptome analysis respectively. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking the required number of reads from them. At the same sequencing depth, the numbers of genes detectable by the digital gene expression profiling and by the transcriptome analysis are then compared.
Fig. 11 is a structural diagram of another embodiment of the quality control device for RNA sequencing of gene expression provided by the present invention.
As shown in Fig. 11, a quality control device 1100 for RNA sequencing of gene expression comprises: a gene expression calculation module 1102, a correlation analysis module 1104, a sequencing analysis mode selection module 1106 and a sequencing stability analysis module 1108, where the gene expression calculation module 1102, the correlation analysis module 1104 and the sequencing analysis mode selection module 1106 can be functional modules identical or similar to the gene expression calculation module 902, the correlation analysis module 904 and the sequencing analysis mode selection module 906 shown in Fig. 9. For brevity, they are not described again here.
As shown in Fig. 11, the sequencing stability analysis module 1108 further comprises a first sequencing stability analysis submodule 11081 and a second sequencing stability analysis submodule 11082, where:
The first sequencing stability analysis submodule 11081 is used to randomly take one million tag data (1M reads) from the digital gene expression profiling result and correlate them with the whole digital gene expression profiling result. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking any 1M reads from them.
The second sequencing stability analysis submodule 11082 is used to randomly take one million tag data (1M reads) from the transcriptome analysis result and correlate them with the whole transcriptome analysis result. The random selection can be done by completely shuffling all reads obtained by sequencing and then taking any 1M reads from them.
The quality control device for RNA sequencing of gene expression provided by the present invention analyses gene fragments with the gene expression calculation module, and performs correlation analysis and comprehensive assessment with the correlation analysis module and the sequencing analysis mode selection module, so as to select a gene expression analysis method with higher reliability, truly reflect the accuracy of gene sequencing, and provide a quality control scheme for production stability.
From the foregoing exemplary description of the present invention, those skilled in the art can clearly see the advantages of the quality control method and device for RNA sequencing of gene expression provided by the present invention: the quality control scheme is suitable for high-throughput sequencing technologies, can effectively assess the stability of RNA sequencing, and ensures the accuracy of sequencing work.
The description of the present invention is given for the purposes of illustration and description, and is not exhaustive or intended to limit the invention to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The functional modules described in the present invention and the way they are divided merely illustrate the idea of the invention; those skilled in the art may freely change the division of the functional modules and their structure to achieve the same functions, according to the teaching of the present invention and the needs of practical applications. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to a particular use.

Claims (10)

1. A quality control method for RNA sequencing of gene expression, characterized in that the method comprises:
performing digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq), respectively, on the sequencing fragments obtained by a sequencing technology;
correlating the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection (qPCR);
judging, according to the correlation analysis results, the difference between the digital gene expression profiling and the transcriptome analysis in quantifying gene expression, and selecting one sequencing analysis mode from the digital gene expression profiling and the transcriptome analysis;
selecting one million tag data (1M reads) from the analysis result obtained by the selected sequencing analysis mode, and performing a sequencing stability analysis of gene expression.
2. The method of claim 1, characterized in that the method further comprises:
performing the RNA sequencing of gene expression using a high-throughput sequencing technology;
removing adapter sequences and removing low-quality sequences from the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively.
3. The method of claim 1, characterized in that the gene expression fragments of a sample are sequenced multiple times by a high-throughput sequencing technology, and the data from the multiple sequencing runs are averaged to obtain the result of the real-time quantitative gene amplification fluorescence detection.
4. The method of claim 1, characterized in that correlating the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection (qPCR) further comprises:
when the reference genes are incomplete, correlating the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection; and/or
at the same sequencing depth, comparing the number of genes detected by the digital gene expression profiling with the number of genes detected by the transcriptome analysis.
5. The method of claim 4, characterized in that the step of correlating, when the reference genes are incomplete, the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection further comprises:
dividing the reference genes evenly into three parts from the 3' end to the 5' end;
performing the digital gene expression profiling and the transcriptome analysis on the three parts of the reference genes respectively;
correlating the analysis results obtained with the result of real-time quantitative gene amplification fluorescence detection respectively.
6. The method of claim 4, characterized in that the step of comparing, at the same sequencing depth, the number of genes detected by the digital gene expression profiling with the number of genes detected by the transcriptome analysis further comprises:
taking three million tag data (3M reads) from the sequencing fragments obtained by high-throughput sequencing and performing the digital gene expression profiling and the transcriptome analysis on them respectively, taking two million tag data (2M reads) from the sequencing fragments obtained by high-throughput sequencing and performing the digital gene expression profiling and the transcriptome analysis on them respectively; and/or taking one million tag data (1M reads) from the sequencing fragments obtained by high-throughput sequencing and performing the digital gene expression profiling and the transcriptome analysis on them respectively;
at the same sequencing depth, comparing the numbers of genes detectable by the digital gene expression profiling and by the transcriptome analysis.
7. The method of claim 1, characterized in that the step of selecting one million tag data (1M reads) from the analysis result obtained by the selected sequencing analysis mode and performing the sequencing stability analysis of gene expression further comprises:
taking one million tag data (1M reads) from the digital gene expression profiling result and correlating them with the whole digital gene expression profiling result; and/or
taking one million tag data (1M reads) from the transcriptome analysis result and correlating them with the whole transcriptome analysis result.
8. A quality control device for RNA sequencing of gene expression, characterized in that the device comprises:
a gene expression calculation module, used to perform digital gene expression profiling (DGE) and transcriptome analysis (RNA-Seq), respectively, on the sequencing fragments obtained by a sequencing technology;
a correlation analysis module, used to correlate the result of the digital gene expression profiling and the result of the transcriptome analysis, respectively, with the result of real-time quantitative gene amplification fluorescence detection (qPCR);
a sequencing analysis mode selection module, used to judge, according to the correlation analysis results, the difference between the digital gene expression profiling and the transcriptome analysis in quantifying gene expression, and to select one sequencing analysis mode from the digital gene expression profiling and the transcriptome analysis;
a sequencing stability analysis module, used to select one million tag data (1M reads) from the analysis result obtained by the selected sequencing analysis mode and to perform a sequencing stability analysis of gene expression.
9. The device of claim 8, characterized in that the correlation analysis module further comprises:
a first correlation analysis submodule, used, when the reference genes are incomplete, to divide the reference genes evenly into three parts from the 3' end to the 5' end; to perform the digital gene expression profiling and the transcriptome analysis on the three parts of the reference genes respectively; and to correlate the analysis results obtained with the result of real-time quantitative gene amplification fluorescence detection respectively;
a second correlation analysis submodule, used, at the same sequencing depth, to take three million tag data (3M reads) from the sequencing fragments obtained by high-throughput sequencing and perform the digital gene expression profiling and the transcriptome analysis on them respectively; to take two million tag data (2M reads) from the sequencing fragments obtained by high-throughput sequencing and perform the digital gene expression profiling and the transcriptome analysis on them respectively; and/or to take one million tag data (1M reads) from the sequencing fragments obtained by high-throughput sequencing and perform the digital gene expression profiling and the transcriptome analysis on them respectively; and, at the same sequencing depth, to compare the numbers of genes detectable by the digital gene expression profiling and by the transcriptome analysis.
10. The device of claim 8, characterized in that the sequencing stability analysis module further comprises:
a first sequencing stability analysis submodule, used to take one million tag data (1M reads) from the digital gene expression profiling result and correlate them with the whole digital gene expression profiling result;
a second sequencing stability analysis submodule, used to take one million tag data (1M reads) from the transcriptome analysis result and correlate them with the whole transcriptome analysis result.
CN2010102361769A 2010-07-22 2010-07-22 RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression Pending CN101914619A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2010102361769A CN101914619A (en) 2010-07-22 2010-07-22 RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression
PCT/CN2011/001158 WO2012009952A1 (en) 2010-07-22 2011-07-13 Quality control method and apparatus for rna sequencing of gene expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102361769A CN101914619A (en) 2010-07-22 2010-07-22 RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression

Publications (1)

Publication Number Publication Date
CN101914619A true CN101914619A (en) 2010-12-15

Family

ID=43322256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102361769A Pending CN101914619A (en) 2010-07-22 2010-07-22 RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression

Country Status (2)

Country Link
CN (1) CN101914619A (en)
WO (1) WO2012009952A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009952A1 (en) * 2010-07-22 2012-01-26 深圳华大基因科技有限公司 Quality control method and apparatus for rna sequencing of gene expression
CN104711340A (en) * 2013-12-17 2015-06-17 北京大学 Transcriptome sequencing method
CN105095686A (en) * 2014-05-15 2015-11-25 中国科学院青岛生物能源与过程研究所 High-flux transcriptome sequencing data quality control method based on multi-core CPU (Central Processing Unit) hardware
CN105734159A (en) * 2016-04-29 2016-07-06 北京泱深生物信息技术有限公司 Molecular marker of esophageal squamous carcinoma
CN105779442A (en) * 2016-05-19 2016-07-20 清华大学 Extraction method of macro-transcriptome RNA of flora utilized by natural cellulose
CN107119120A (en) * 2017-05-04 2017-09-01 河海大学常州校区 A kind of key effect molecular detecting method based on chromatin 3D conformation technologies
CN108004302A (en) * 2017-12-12 2018-05-08 中国农业科学院麻类研究所 A kind of association analysis method of transcript profile reference and its application
CN111455031A (en) * 2019-01-18 2020-07-28 中国科学院微生物研究所 Multi-group chemical sequencing and analysis method based on Nanopore sequencing technology

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837751B (en) * 2021-01-21 2024-02-09 佛山科学技术学院 High-throughput transcriptome sequencing data and trait association analysis system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009124255A2 (en) * 2008-04-04 2009-10-08 Helicos Biosciences Corporation Methods for transcript analysis
CN101914619A (en) * 2010-07-22 2010-12-15 深圳华大基因科技有限公司 RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009952A1 (en) * 2010-07-22 2012-01-26 深圳华大基因科技有限公司 Quality control method and apparatus for rna sequencing of gene expression
CN104711340A (en) * 2013-12-17 2015-06-17 北京大学 Transcriptome sequencing method
CN105095686A (en) * 2014-05-15 2015-11-25 中国科学院青岛生物能源与过程研究所 High-flux transcriptome sequencing data quality control method based on multi-core CPU (Central Processing Unit) hardware
CN105734159A (en) * 2016-04-29 2016-07-06 北京泱深生物信息技术有限公司 Molecular marker of esophageal squamous carcinoma
CN105734159B (en) * 2016-04-29 2019-04-05 北京泱深生物信息技术有限公司 The molecular marked compound of esophageal squamous cell carcinoma
CN105779442A (en) * 2016-05-19 2016-07-20 清华大学 Extraction method of macro-transcriptome RNA of flora utilized by natural cellulose
CN105779442B (en) * 2016-05-19 2019-05-28 清华大学 A kind of native cellulose using the macro transcript profile RNA of flora extracting method
CN107119120A (en) * 2017-05-04 2017-09-01 河海大学常州校区 A kind of key effect molecular detecting method based on chromatin 3D conformation technologies
CN108004302A (en) * 2017-12-12 2018-05-08 中国农业科学院麻类研究所 A kind of association analysis method of transcript profile reference and its application
CN111455031A (en) * 2019-01-18 2020-07-28 中国科学院微生物研究所 Multi-group chemical sequencing and analysis method based on Nanopore sequencing technology

Also Published As

Publication number Publication date
WO2012009952A1 (en) 2012-01-26

Similar Documents

Publication Publication Date Title
CN101914619A (en) RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression
Foley et al. Gene expression profiling of single cells from archival tissue with laser-capture microdissection and Smart-3SEQ
Wan et al. Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data
AU2014337089B2 (en) Methods and systems for genotyping genetic samples
Korpelainen et al. RNA-seq data analysis: a practical approach
CN117887804A (en) Methods and compositions for identifying or quantifying targets in biological samples
CN103014137B (en) Gene expression quantification analysis method
US11773429B2 (en) Reduction of bias in genomic coverage measurements
Wang et al. The effect of methanol fixation on single-cell RNA sequencing data
Zeng et al. Technical considerations for functional sequencing assays
AU2014324438A1 (en) Methods and system for detecting sequence variants
Reif et al. Integrated analysis of genetic, genomic and proteomic data
US11474107B2 (en) Digital analysis of molecular analytes using electrical methods
CN104404160A (en) MIT (Mitochondrion) primer design method and method for constructing planktonic animal barcode database by utilization of high-throughput sequencing
WO2017189677A1 (en) Machine learning techniques for analysis of structural variants
Schmitz et al. Quality control and evaluation of plant epigenomics data
Chamberlin et al. Variable RNA sampling biases mediate concordance of single-cell and nucleus sequencing across cell types
KR20070086080A (en) Method, program and system for the standardization of gene expression amount
CN110241191A (en) A method of mtDNA copy number and mutation are detected based on NGS simultaneously
Bhowmik et al. A review article on ChIP-Seq tools: MACS2, HOMER, SICER, PEAKANNOTATOR and MEME
Mitra et al. Statistical analyses of next generation sequencing data: an overview
US20230268024A1 (en) Computational models to analyze rna velocity
Bernasconi et al. Scenarios for the Integration of Microarray Gene Expression Profiles in COVID-19–Related Studies
Lyu et al. KAS-pipe2: a flexible toolkit for exploring KAS-seq and spKAS-seq data
Single-Molecule et al. Check Chapter 11 updates

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1150365; Country of ref document: HK)

ASS Succession or assignment of patent right (Owner name: BGI TECHNOLOGY SOLUTIONS CO., LTD.; Free format text: FORMER OWNER: BGI-SHENZHEN CO., LTD.; Effective date: 20130422)

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right (Effective date of registration: 20130422; Address after: 201, Comprehensive Building, Science and Technology Pioneer Park, Beishan Industrial Zone, Yantian District, Shenzhen, Guangdong 518083; Applicant after: BGI Technology Solutions Co., Ltd.; Address before: Beishan Industrial Zone Building, Yantian District, Shenzhen, Guangdong 518083; Applicant before: BGI-Shenzhen Co., Ltd.)

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication (Application publication date: 20101215)

REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1150365; Country of ref document: HK)