CN102622534B - A DNA high-throughput sequencing data correction method for gene expression detection - Google Patents

Publication number
CN102622534B
CN102622534B CN201210104293.9A CN201210104293A CN102622534B CN 102622534 B CN102622534 B CN 102622534B CN 201210104293 A CN201210104293 A CN 201210104293A CN 102622534 B CN102622534 B CN 102622534B
Authority
CN
China
Prior art keywords
gene expression
high-throughput
dna
sequencing
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210104293.9A
Other languages
Chinese (zh)
Other versions
CN102622534A (en)
Inventor
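冯伟兴 (Feng Weixing)
宋艳霞 (Song Yanxia)
贺波 (He Bo)
栾兴桃 (Luan Xingtao)
王科俊 (Wang Kejun)
刘晓龙 (Liu Xiaolong)
赵拓 (Zhao Tuo)
李双林 (Li Shuanglin)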
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Tengfei gene Polytron Technologies Inc
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN201210104293.9A
Publication of CN102622534A
Application granted
Publication of CN102622534B
Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention belongs to the field of molecular biological information detection. Specifically, it is a correction method that improves the accuracy of gene expression detection data obtained by DNA high-throughput sequencing. The invention comprises the following steps: (1) collecting gene expression DNA sequencing detection data and establishing a correction model for gene expression DNA high-throughput sequencing detection data; (2) collecting gene expression values measured by gene chip; (3) determining the model parameter of the gene expression high-throughput sequencing correction model by correlation analysis; (4) generating the corrected gene expression values with the correction model once the model parameter value has been determined. The invention uses the correction model to estimate and compensate for the sequence alignment mapping error present in DNA sequencing values, which reduces measurement error. It thereby improves detection accuracy while retaining the high resolution and high precision of DNA high-throughput sequencing detection data.

Description

A DNA high-throughput sequencing data correction method for gene expression detection
Technical field
The invention belongs to the field of molecular biological information detection. Specifically, it is a correction method that improves the accuracy of gene expression detection data obtained by DNA high-throughput sequencing.
Background art
With advances in experimental techniques in the information sciences, the laboratory methods for obtaining molecular biological information are also changing rapidly. Among them, DNA high-throughput sequencing, an epoch-making technology for detecting molecular biological information, can genuinely achieve high-resolution, high-precision detection of gene expression information across the whole genome.
DNA high-throughput sequencing detects gene expression by directly sequencing the target nucleotide sequences that reflect gene expression, then locating those sequences in the reference genome by sequence alignment mapping, and so obtaining the gene expression information associated with each position. Because the target nucleotide sequences are sequenced directly, high-throughput DNA sequencing significantly improves the detection resolution and detection precision of gene expression. However, because sequencing results must pass through sequence alignment mapping before they become meaningful gene expression information, high-throughput DNA sequencing measures gene expression indirectly and carries an error inherent in its principle: some sequencing results cannot be successfully mapped back to the reference genome, which introduces error into the detected information. This error makes the detected value smaller than the actual value.
Summary of the invention
The object of the invention is to provide a DNA high-throughput sequencing data correction method that compensates for the inherent sequence alignment mapping error arising when DNA sequencing data are generated in gene expression detection, and so obtains more accurate gene expression detection while retaining high resolution and high precision.
The object of the invention is achieved as follows:
The DNA high-throughput sequencing data correction method comprises the following steps:
(1) Collect gene expression DNA sequencing detection data and establish the correction model for gene expression DNA high-throughput sequencing detection data:
$$Z_i = \left(1 + \beta \times \frac{1}{C_i}\right) \times Y_i$$
where $Y_i$ is the measured high-throughput sequencing value of the expression of the $i$-th gene, $Z_i$ is the corrected expression value of the $i$-th gene, $C_i$ is the conservation value of the DNA region in which the gene lies, and $\beta$ is the model parameter;
(2) Collect gene expression values measured by gene chip;
(3) Determine the model parameter of the gene expression high-throughput sequencing correction model by correlation analysis: compute the correlation between the gene expression values produced by the correction model and the gene expression values measured by the gene chip, and take as the model parameter $\beta$ the value at which the correlation is maximal;
(4) With the model parameter value determined, use the correction model to generate the corrected gene expression values.
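Step (3) can be stated compactly as an optimization problem. The following is a sketch only: the symbol $X_i$ for the gene chip value of gene $i$, the symbol $\rho$ for the correlation measure, and the restriction $\beta \ge 0$ are notational assumptions introduced here, since the text names none of them and does not fix the type of correlation.

```latex
\beta^{*} = \arg\max_{\beta \ge 0} \; \rho\bigl(Z(\beta),\, X\bigr),
\qquad
Z_i(\beta) = \Bigl(1 + \frac{\beta}{C_i}\Bigr) Y_i
```

Under this notation, step (4) reports $Z_i(\beta^{*})$ as the corrected value for each gene.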
The beneficial effects of the invention are:
The invention uses the correction model to estimate and compensate for the sequence alignment mapping error present in DNA sequencing values, which reduces measurement error. It thereby improves detection accuracy while retaining the high resolution and high precision of DNA high-throughput sequencing detection data.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the invention;
Fig. 2 shows the distribution of conservation values of the target genes;
Fig. 3 shows the parameter optimization curves of the correction model.
Embodiment
The method of the invention is implemented as follows:
First, by analyzing the inherent error that indirect alignment mapping introduces into DNA high-throughput sequencing detection data, a correction model targeted at gene expression DNA high-throughput sequencing detection data is established;
Then, using the correlation function method, the model parameter is determined from data generated by the gene chip method, another high-throughput gene expression detection method that is complementary in principle to DNA high-throughput sequencing. This yields the final correction model for gene expression DNA high-throughput sequencing detection data. The corrected data produced by this model are more accurate than the detection data before correction.
1. Correction model for gene expression DNA high-throughput sequencing data
To obtain gene expression information, high-throughput DNA sequencing requires a step that maps the sequencing data to the reference genome. Whenever sequencing data cannot be mapped to the reference genome, for whatever reason, a sequencing error arises. Consequently, when high-throughput DNA sequencing is used to detect gene expression information, the detected expression value is often smaller than the actual value. The main source of this error is that, when the corresponding gene region contains many repetitive sequences, DNA sequencing data fail to map because the mapping is not one-to-one. The more repetitive sequences the gene region contains, the more severe the error.
On this basis, the method establishes the correction model for gene expression DNA high-throughput sequencing data shown in Equation 1:
$$Z_i = \left(1 + \beta \times \frac{1}{C_i}\right) \times Y_i \qquad (1)$$
where $Y_i$ is the measured high-throughput sequencing value of the expression of the $i$-th gene, $Z_i$ is the corrected expression value of the $i$-th gene, $C_i$ is the conservation value of the DNA region in which the gene lies, and $\beta$ is the model parameter. According to the theory of biological evolution, the higher the conservation value of a DNA region, the lower its base repetitiveness. The conservation value is therefore used here to reflect the degree of base repetition in the DNA region.
In the correction model, the generated value $Z_i$ is always greater than the measured value $Y_i$ (for positive $\beta$, $C_i$ and $Y_i$). This reflects the fact that mapping failures in high-throughput sequencing tend to make the measured value $Y_i$ smaller than the true value. In addition, the relative adjustment $\beta / C_i$ applied to obtain $Z_i$ is inversely proportional to the conservation value $C_i$: the larger $C_i$ is, the closer $Z_i$ is to $Y_i$. This is consistent with the principle that the larger the conservation value, the smaller the adjustment should be.
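The two properties just described can be checked numerically. The following is a minimal sketch of Equation 1 in Python with NumPy; the function name `correct_expression`, the input values, and the choice of `beta=0.05` are illustrative assumptions and are not taken from the patent.

```python
import numpy as np


def correct_expression(y, c, beta):
    """Apply Z_i = (1 + beta / C_i) * Y_i element-wise (Equation 1)."""
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    if np.any(c <= 0):
        # The model divides by C_i, so non-positive values are rejected here.
        raise ValueError("conservation values must be positive")
    return (1.0 + beta / c) * y


# Hypothetical inputs: three genes with the same measured value
# but different conservation values.
y = np.array([100.0, 100.0, 100.0])
c = np.array([0.1, 0.5, 1.0])
z = correct_expression(y, c, beta=0.05)
print(z)  # roughly 150, 110 and 105
```

Every corrected value exceeds its measured value, and the gene in the least conserved region receives the largest upward adjustment.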
2. Determining the parameter of the correction model
The gene chip method, another high-throughput method for detecting gene expression, is inferior to DNA high-throughput sequencing in resolution and detection precision, but it detects gene expression directly and has no sequence alignment problem. Gene chip detection data are therefore used here to correct the gene expression detection data from DNA high-throughput sequencing. Specifically, the correlation function method is used to determine the model parameter of the gene expression high-throughput sequencing correction model: different values of the model parameter β yield different corrected sequencing values of gene expression, and the correlation between the corrected values and the expression values measured by the gene chip is then computed. The value of β at which the correlation reaches its maximum is the optimal value, and the corresponding model generates more accurate gene expression sequencing data.
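The parameter search described above can be sketched as a grid search over β. This is an illustration under stated assumptions, not the patent's implementation: the Pearson correlation (via `numpy.corrcoef`), the grid of candidate values, the function names, and the synthetic data are all choices made here, since the text specifies neither the correlation measure nor the search procedure. The data are simulated so that the sequencing values are deflated exactly according to Equation 1 with a known β.

```python
import numpy as np


def correct_expression(y, c, beta):
    """Equation 1: Z_i = (1 + beta / C_i) * Y_i."""
    return (1.0 + beta / np.asarray(c, float)) * np.asarray(y, float)


def fit_beta(y, c, chip, betas):
    """Return the beta on the grid that maximizes the Pearson correlation
    between corrected sequencing values and chip values, plus the curve."""
    corrs = np.array(
        [np.corrcoef(correct_expression(y, c, b), chip)[0, 1] for b in betas]
    )
    return float(betas[int(np.argmax(corrs))]), corrs


# Synthetic data, for illustration only (not experimental results).
rng = np.random.default_rng(0)
n = 2000
c = rng.uniform(0.05, 1.0, n)           # hypothetical conservation values
truth = rng.lognormal(3.0, 1.0, n)      # hypothetical true expression
true_beta = 0.2
y = truth / (1.0 + true_beta / c)       # sequencing values, biased low
chip = truth + rng.normal(0.0, 0.05 * truth.std(), n)  # noisy chip values

betas = np.linspace(0.0, 1.0, 101)
best_beta, corrs = fit_beta(y, c, chip, betas)
print(best_beta, corrs[0], corrs.max())
```

On data simulated this way, the correlation curve rises from its value at β = 0, peaks near the β used in the simulation, and then falls, which is the over-correction behaviour the text describes.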
3. Experimental test
3.1 Data acquisition
1) Sequencing data
ChIP-seq, a DNA high-throughput sequencing technique, can measure and count the amount of Pol II protein in gene transcription regions, which directly reflects the level of gene transcription. The technique first uses ultrasound to break the DNA strands into DNA fragments, then uses a specific antibody to capture the Pol II protein bound to the DNA fragments, and then uses immunoprecipitation (IP) to isolate the antibody-bound DNA fragments. All isolated DNA fragments are then sequenced (seq) and mapped back onto the DNA by sequence alignment. Finally, the amount of Pol II protein in each gene transcription region is measured and counted according to the defined position of that region on the DNA.
This experiment uses four groups of Pol II sequencing data for gene promoter regions, from two kinds of MCF7 breast cancer cells, ordinary and drug-resistant, each before and after drug treatment. The data were obtained with ChIP-seq, which measures and counts the amount of Pol II protein in gene promoter regions and so directly reflects gene expression levels.
2) Microarray data
This experiment uses gene expression data acquired with gene chip (ChIP-chip) technology for correlation analysis with the gene expression sequencing data. These are four groups of gene expression detection data for the same two kinds of MCF7 breast cancer cells, ordinary and drug-resistant, before and after drug treatment. The gene chip is the Affymetrix Human Genome U133 Plus 2.0 Array, which can detect expression information for 38,500 human genes at once.
3) Gene conservation value data
The DNA nucleotide sequence conservation value data used in this experiment were downloaded from UCSC, a large public bioinformatics database. The conservation values were generated by aligning the genomic nucleotide sequences of 44 vertebrate species with the human genome nucleotide sequence.
4) Gene sequence data
The DNA nucleotide sequence data used in this experiment were also downloaded from the UCSC public bioinformatics database.
3.2 Correction of DNA high-throughput sequencing data
First, the gene expression sequencing data, the microarray data and the conservation value data were checked for completeness, yielding 9424 genes for which all of this information was available.
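A completeness check of this kind amounts to intersecting the gene sets of the three data sources. The sketch below assumes each source is a dictionary keyed by gene identifier and that missing values appear as `None` or NaN; the function name, the gene identifiers and the values are hypothetical.

```python
def genes_with_complete_data(seq, chip, conservation):
    """Return sorted gene IDs that have a usable value in all three tables."""

    def valid(table):
        # v == v is False only for NaN, so this drops None and NaN entries.
        return {g for g, v in table.items() if v is not None and v == v}

    return sorted(valid(seq) & valid(chip) & valid(conservation))


# Hypothetical toy tables.
seq = {"GENE_A": 120.0, "GENE_B": 35.0, "GENE_C": None}
chip = {"GENE_A": 8.1, "GENE_B": 6.4, "GENE_D": 7.0}
cons = {"GENE_A": 0.62, "GENE_B": float("nan"), "GENE_C": 0.30}

complete = genes_with_complete_data(seq, chip, cons)
print(complete)
```

Only genes that survive this filter can enter the correlation analysis, because Equation 1 needs both a sequencing value and a conservation value, and the parameter search needs the matching chip value.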
Next, the conservation values of these gene regions were calculated from the DNA nucleotide sequence conservation value data. Because the sequencing data used in this experiment measure and count the amount of Pol II protein in gene promoters, the conservation value of that same region (the promoter) is likewise used to represent the gene. The resulting distribution of conservation values is shown in Fig. 2, where the horizontal axis is the conservation value and the vertical axis is the count.
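The text does not say how per-base conservation data are reduced to a single value per promoter region. One plausible reading, shown here purely as an assumption, is the mean of the per-base scores over the region, ignoring positions with no score. The function name, the half-open interval convention and the sample scores are hypothetical.

```python
import numpy as np


def region_conservation(per_base_scores, start, end):
    """Mean per-base conservation over the half-open interval [start, end).

    Positions without a score (NaN) are ignored; NaN is returned when the
    region contains no scored position at all.
    """
    window = np.asarray(per_base_scores[start:end], dtype=float)
    window = window[~np.isnan(window)]
    if window.size == 0:
        return float("nan")
    return float(window.mean())


# Hypothetical per-base scores along a short stretch of DNA.
scores = np.array([0.0, 0.2, 0.4, np.nan, 0.6, 0.8, 1.0])
c_promoter = region_conservation(scores, 1, 6)
print(c_promoter)
```

A histogram of such per-gene values is the kind of plot Fig. 2 describes, with the conservation value on the horizontal axis and the gene count on the vertical axis.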
Finally, the gene expression DNA high-throughput sequencing data were processed with the method of the invention. During processing, the model parameter β was optimized using the correlation between the gene expression sequencing data and the gene chip detection data. The optimization process is shown in Fig. 3, which covers cells under four experimental conditions: A, ordinary breast cancer cells before drug treatment; B, ordinary breast cancer cells after drug treatment; C, drug-resistant breast cancer cells before drug treatment; D, drug-resistant breast cancer cells after drug treatment. In Fig. 3 the horizontal axis is the value of the model parameter β and the vertical axis is the correlation between the corrected gene sequencing data and the microarray data. As β increases from 0, the correlation rises rapidly and reaches an extremum at a certain value of β; as β increases further, the correlation falls because of over-correction. The figure shows that, compared with no correction, the correlation between the corrected gene sequencing data and the microarray data improves markedly, which indicates that correcting the sequencing data with the proposed method gives more reasonable results. The model corresponding to the optimal β is the final correction model for the sequencing data. Table 1 lists the optimal model parameter β obtained with this method for the four groups of gene promoter sequencing data from the two kinds of MCF7 breast cancer cells, ordinary and drug-resistant, before and after drug treatment.
Table 1. Optimal parameter values of the correction model
Because the base sequencing results obtained by DNA sequencing must be mapped by sequence alignment against the reference genome base sequence before they become meaningful gene expression information, error arises in the detected information when some sequencing results cannot be successfully mapped back to the reference genome because the mapping is not one-to-one. According to the theory of biological evolution, the higher the conservation value of a DNA region, the lower its base repetition rate and the higher the mapping success rate of DNA sequencing data in that region. The model therefore uses the conservation value to reflect the degree of base repetition in a DNA region and the alignment mapping error that results from it.
Although gene chip technology, another means of whole-genome gene expression detection, is inferior to DNA sequencing in detection resolution, it involves no alignment mapping step. The invention therefore performs correlation analysis on gene expression detection data generated through two different, independent channels, DNA sequencing and gene chip, to determine the correction model parameter and ultimately correct the gene expression detection data from DNA sequencing.

Claims (1)

1. A DNA high-throughput sequencing data correction method for gene expression detection, characterized in that it comprises the following steps:
(1) collecting gene expression DNA sequencing detection data and establishing the correction model for gene expression DNA high-throughput sequencing detection data:
$$Z_i = \left(1 + \beta \times \frac{1}{C_i}\right) \times Y_i$$
where $Y_i$ is the measured high-throughput sequencing value of the expression of the $i$-th gene, $Z_i$ is the corrected expression value of the $i$-th gene, $C_i$ is the conservation value of the DNA region in which the gene lies, and $\beta$ is the model parameter;
(2) collecting gene expression values measured by gene chip;
(3) determining the model parameter of the gene expression high-throughput sequencing correction model by correlation analysis: computing the correlation between the gene expression values produced by the correction model and the gene expression values measured by the gene chip, and taking as the model parameter $\beta$ the value at which the correlation is maximal;
(4) with the model parameter value determined, using the correction model to generate the corrected gene expression values.
CN201210104293.9A 2012-04-11 2012-04-11 A DNA high-throughput sequencing data correction method for gene expression detection Active CN102622534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210104293.9A CN102622534B (en) 2012-04-11 2012-04-11 A DNA high-throughput sequencing data correction method for gene expression detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210104293.9A CN102622534B (en) 2012-04-11 2012-04-11 A DNA high-throughput sequencing data correction method for gene expression detection

Publications (2)

Publication Number Publication Date
CN102622534A CN102622534A (en) 2012-08-01
CN102622534B true CN102622534B (en) 2015-09-30

Family

ID=46562449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210104293.9A Active CN102622534B (en) 2012-04-11 2012-04-11 A DNA high-throughput sequencing data correction method for gene expression detection

Country Status (1)

Country Link
CN (1) CN102622534B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116240272A (en) 2015-11-19 2023-06-09 赛纳生物科技(北京)有限公司 Kit or system for obtaining sequence information of polynucleotides
CN107958138B (en) * 2016-10-14 2019-06-18 赛纳生物科技(北京)有限公司 A method of reading sequence information from the original signal of high-throughput DNA sequencing
CN105893788B (en) * 2016-04-26 2018-04-17 哈尔滨工程大学 Utilize the sequencing data bearing calibration of the semiconductor microarray dataset of reference gene group information
CN106650313B (en) * 2016-09-29 2019-10-18 哈尔滨工程大学 A method of it filtering out DNA base in DNase high-flux sequence data and is inclined to sexual deviation
CN107463800B (en) * 2017-07-19 2018-05-11 东莞博奥木华基因科技有限公司 A kind of enteric microorganism information analysis method and system
CN108959851B (en) * 2018-06-12 2022-03-18 哈尔滨工程大学 Illumina high-throughput sequencing data error correction method
CN109785899B (en) * 2019-02-18 2020-01-07 东莞博奥木华基因科技有限公司 Genotype correction device and method
CN115831233B (en) * 2023-02-07 2023-05-16 杭州联川基因诊断技术有限公司 Targeted sequencing data preprocessing method, equipment and medium based on mTag

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MD20080012U (en) * 2008-04-11 2008-10-31 Gmc Ip-Holding Ltd.   Biological microchip for identifying transgenic DNA sequences and measuring complex
CN101408501A (en) * 2008-11-28 2009-04-15 长春理工大学 Method for quantitatively detecting DNA base by using near-infrared spectrum-partial least squares method
CN101492740A (en) * 2009-02-24 2009-07-29 武汉兰丁医学高科技有限公司 Correct measurement method for nucleus DNA matter content in cell quantitative investigation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
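Feng Weixing et al. "A gene transcription difference analysis model using particle swarm optimization" (《采用粒子群优化的基因转录差异分析模型》). Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》). 2010, Vol. 29 (No. 2). *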

Also Published As

Publication number Publication date
CN102622534A (en) 2012-08-01

Similar Documents

Publication Publication Date Title
CN102622534B (en) A DNA high-throughput sequencing data correction method for gene expression detection
Kekkonen et al. DNA barcode‐based delineation of putative species: efficient start for taxonomic workflows
Goremykin et al. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications
KR101325736B1 (en) Apparatus and method for extracting bio markers
Giorgi et al. Algorithm-driven artifacts in median polish summarization of microarray data
CN103984879B (en) A kind of method and system for determining testing gene group Zonal expression level
CN106033502B (en) The method and apparatus for identifying virus
Kelly et al. Microsatellites behaving badly: empirical evaluation of genotyping errors and subsequent impacts on population studies
Jones et al. An empirical assessment of a single family‐wide hybrid capture locus set at multiple evolutionary timescales in Asteraceae
CN115595371B (en) Method for determining MSI state of colorectal cancer patient and application
Simmons Relative benefits of amino‐acid, codon, degeneracy, DNA, and purine‐pyrimidine character coding for phylogenetic analyses of exons
US20220136063A1 (en) Method of predicting survival rates for cancer patients
Brozynska et al. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding
Sistrom et al. Morphological differentiation correlates with ecological but not with genetic divergence in a Gehyra gecko
Ao et al. Evaluating hepatocellular carcinoma cell lines for tumour samples using within‐sample relative expression orderings of genes
CN105243296A (en) Tumor feature gene selection method combining mRNA and microRNA expression profile chips
Schmutzer et al. Kmasker-a tool for in silico prediction of single-copy FISH probes for the large-genome species Hordeum vulgare
Sipos et al. Robust computational analysis of rRNA hypervariable tag datasets
CN104968806B (en) The method and apparatus that the information relevant with individual's mark based on gene order is provided
CN107619863A (en) Method for detecting the Presence of a cancer
Stokes et al. Transcriptomics for clinical and experimental biology research: hang on a seq
WO2020068881A9 (en) Compositions, systems, apparatuses, and methods for validation of microbiome sequence processing and differential abundance analyses via multiple bespoke spike-in mixtures
CN112786103A (en) Method and device for analyzing feasibility of target sequencing Panel for estimating tumor mutation load
CN104769133A (en) Method of improving microarray performance by strand elimination
CN108595914B (en) High-precision prediction method for tobacco mitochondrial RNA editing sites

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Feng Weixing

Inventor after: Song Yanxia

Inventor after: He Bo

Inventor after: Luan Xingtao

Inventor after: Wang Kejun

Inventor after: Liu Xiaolong

Inventor after: Zhao Tuo

Inventor after: Li Shuanglin

Inventor before: Feng Weixing

Inventor before: He Bo

Inventor before: Luan Xingtao

Inventor before: Wang Kejun

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: FENG WEIXING HE BO LUAN XINGTAO WANG KEJUN TO: FENG WEIXING SONG YANXIA HE BO LUAN XINGTAO WANG KEJUN LIU XIAOLONG ZHAO TUO LI SHUANGLIN

C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151221

Address after: 528437 Guangdong province Zhongshan Torch Development Zone, Cheung Hing Road 6 No. 8 South trade building layer

Patentee after: GUANGDONG ASCENDAS GENOMICS TECHNOLOGY CO., LTD.

Address before: 150001 Heilongjiang, Nangang District, Nantong street,, Harbin Engineering University, Department of Intellectual Property Office

Patentee before: Harbin Engineering Univ.

CP03 Change of name, title or address

Address after: 528437 Guangdong city of Zhongshan province Zhongshan Torch Development Zone, Cheung Hing Road 6 No. 8 South trade building layer

Patentee after: Guangdong Tengfei gene Polytron Technologies Inc

Address before: 528437 Guangdong province Zhongshan Torch Development Zone, Cheung Hing Road 6 No. 8 South trade building layer

Patentee before: GUANGDONG ASCENDAS GENOMICS TECHNOLOGY CO., LTD.