CN102061337B - Method and system for detecting tissue-specific differentially methylated region (tDMR) - Google Patents

Publication number: CN102061337B
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN2010105571311A
Other languages: Chinese (zh)
Other versions: CN102061337A (en)
Inventor: 余昶
Current Assignee: BGI Technology Solutions Co Ltd (the listed assignees may be inaccurate)
Original Assignee: BGI Technology Solutions Co Ltd
Application filed by: BGI Technology Solutions Co Ltd
Priority: CN2010105571311A
Publications: CN102061337A (application), CN102061337B (grant)

Abstract

The invention discloses a method and a system for detecting tissue-specific differentially methylated regions (tDMRs). The method comprises the following steps: obtaining single-site methylation information on the whole genome by whole-genome sequencing; determining seed tDMRs on the whole genome, based on preselection conditions, from the single-site methylation information; extending each seed tDMR to both sides and obtaining candidate tDMRs based on extension termination conditions; and filtering the candidate tDMRs based on filtering conditions to obtain the tDMR result. With the method and system provided by the invention, the methylation state of each site on the whole genome can be determined, so that tDMRs can be identified genome-wide, which greatly improves detection efficiency and lowers cost. Subsequent validation shows that the tDMRs found have an accuracy of 85% or more, and validation in a number of species (not limited to mammals) shows that the method has higher accuracy and sensitivity than other methods such as that of Vardhman Rakyan et al.

Description

Method and system for detecting tissue-specific differentially methylated regions
Technical field
The present invention relates to the fields of genomics and bioinformatics, and in particular to a method and system for detecting tissue-specific differentially methylated regions (tDMRs).
Background art
In mammals, DNA methylation is essential to genome function. Many whole-genome studies have shown that mammalian DNA methylation profiles are tissue-specific. tDMRs (reference [1]) are an important mechanism by which functional differences between tissues are realized, so identifying tDMRs between different tissues is of great significance for the study of genome function.
Conventional methods usually examine methylation only on individual genes or partial regions of the genome. Existing chip-based experimental methods for detecting tDMRs are complicated to operate, have a low success rate, and are costly. At present, chip technology can only study the regulatory regions of a few known genes and cannot analyze the methylation of large numbers of genes, especially unknown genes; chip fabrication and detection steps such as target-gene and probe preparation are complicated, the experimental success rate is low, and the cost is high. In the method of Vardhman Rakyan et al. [1], regions of interest are found with a chip, and a region is defined as a tDMR if its average methylation rate is above 60% in one tissue and below 40% in another. This choice of thresholds is rather arbitrary; the positive predictive value and sensitivity of the resulting tDMRs are 78% and 61%, respectively, so the error is relatively large.
References:
[1] Vardhman Rakyan, Thomas Down, Natalie Thorne, et al. An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res. Published online June 24, 2008.
[2] Petra Hajkova, Osman El-Maarri. DNA-Methylation Analysis by the Bisulfite-Assisted Genomic Sequencing Method. Methods in Molecular Biology, 2002, Volume 200, 143-154. DOI: 10.1385/1-59259-182-5:143
Summary of the invention
One technical problem to be solved by the present invention is to provide a method for detecting tissue-specific differentially methylated regions that performs tDMR detection on the whole genome with high accuracy.
The invention provides a method for detecting tissue-specific differentially methylated regions (tDMRs), applied to purposes other than disease diagnosis, comprising:
obtaining single-site methylation information on the whole genome by whole-genome sequencing;
determining seed tDMRs on the whole genome, based on preselection conditions, from the single-site methylation information on the whole genome;
extending each seed tDMR to both sides and obtaining candidate tDMRs based on extension termination conditions;
filtering the candidate tDMRs based on filtering conditions to obtain the tDMR result;
wherein determining seed tDMRs on the whole genome, based on the preselection conditions, from the single-site methylation information comprises: scanning the single-site methylation information on the whole genome with a sliding window and determining seed tDMRs on the whole genome based on the preselection conditions.
The preselection conditions comprise:
(1) the p-value of the chi-square test (combined with Fisher's exact test) is <= 0.05;
(2) there is a significant two-fold difference in methylation level; and
(3) the methylation level of at least one sample is 20% or more.
The extension termination conditions comprise:
(1) the distance between two consecutive CpGs exceeds 200 bp;
(2) the average methylation levels of the two samples differ by less than a factor of two;
(3) the methylation levels of both samples in the region are below 20%;
(4) the p-value of the chi-square test is > 0.01.
The filtering conditions comprise:
(1) FDR <= 0.05;
(2) the average coverage of the obtained tDMR region is greater than 20 reads;
(3) the coverage of each single CG site obtained is greater than 10 reads;
(4) the accuracy of a sampling-with-replacement test performed on the CpG sites in the obtained tDMR is 95% or more.
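Filtering condition (1) requires a false discovery rate for each candidate region, but the text does not say which FDR procedure is used. The following is a minimal sketch that assumes the Benjamini-Hochberg procedure applied to one p-value per candidate tDMR; the function names `benjamini_hochberg` and `keep_by_fdr` are illustrative and do not come from the patent.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Assumption: the patent only states "FDR <= 0.05"; BH is one common choice.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so that each q-value is the minimum
    # of p * n / rank over this rank and all larger ranks (keeps q monotone).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def keep_by_fdr(p_values: List[float], threshold: float = 0.05) -> List[bool]:
    """Flag the candidates whose q-value satisfies filtering condition (1)."""
    return [q <= threshold for q in benjamini_hochberg(p_values)]
```

Under this assumption, a candidate is retained for the remaining filters only when its flag is true.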
According to one embodiment of the detection method of the present invention, the single-site methylation information on the whole genome is scanned with a sliding window that is 5 CpGs long and advances in steps of 1 CpG.
According to one embodiment of the detection method of the present invention, obtaining single-site methylation information on the whole genome by whole-genome sequencing comprises: treating genomic DNA with bisulfite so that unmethylated cytosines are deaminated and converted to uracil while methylated cytosines remain unchanged; and sequencing the treated whole genome and comparing it with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
The detection method provided by the invention obtains single-site methylation information on the whole genome of each sample by whole-genome sequencing and, on that basis, analyzes two sequenced samples so that tDMRs can be extracted genome-wide. The steps of seed tDMR selection, seed tDMR extension, and candidate tDMR filtering improve the accuracy of detection.
Another technical problem to be solved by the present invention is to provide a system for detecting tissue-specific differentially methylated regions that performs tDMR detection on the whole genome with high accuracy.
The invention provides a system for detecting tissue-specific differentially methylated regions (tDMRs), comprising:
a methylation information acquisition module, configured to obtain single-site methylation information on the whole genome by whole-genome sequencing;
a seed region determination module, configured to scan the single-site methylation information on the whole genome with a sliding window and determine seed tDMRs on the whole genome based on preselection conditions;
a seed region extension module, configured to extend each seed tDMR to both sides and obtain candidate tDMRs based on extension termination conditions;
a candidate region filtering module, configured to filter the candidate tDMRs based on filtering conditions to obtain the tDMR result.
The preselection conditions comprise: (1) the p-value of the chi-square test (combined with Fisher's exact test) is <= 0.05; (2) there is a significant two-fold difference in methylation level; (3) the methylation level of at least one sample is 20% or more. The extension termination conditions comprise: (1) the distance between two consecutive CpGs exceeds 200 bp; (2) the average methylation levels of the two samples differ by less than a factor of two; (3) the methylation levels of both samples in the region are below 20%; (4) the p-value of the chi-square test is > 0.01. The filtering conditions comprise: (1) FDR <= 0.05; (2) the average coverage of the obtained tDMR region is greater than 20 reads; (3) the coverage of each single CG site obtained is greater than 10 reads; (4) the accuracy of a sampling-with-replacement test performed on the CpG sites in the obtained tDMR is 95% or more.
According to one embodiment of the detection system of the present invention, the seed region determination module scans the single-site methylation information on the whole genome with a sliding window and determines seed tDMRs on the whole genome based on the preselection conditions, where the sliding window is 5 CpGs long and advances in steps of 1 CpG.
According to one embodiment of the detection system of the present invention, the methylation information acquisition module comprises: a sodium bisulfite treatment device, configured to treat whole-genome DNA with bisulfite so that unmethylated cytosines are deaminated and converted to uracil while methylated cytosines remain unchanged; and a whole-genome alignment device, configured to sequence the treated whole genome and compare it with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
In the detection system provided by the invention, the methylation information acquisition module obtains single-site methylation information on the whole genome of each sample by whole-genome sequencing, and the subsequent modules analyze two sequenced samples on that basis, so that tDMRs can be extracted genome-wide. The seed region determination module selects seed tDMRs, the seed region extension module extends them, and the candidate region filtering module filters the candidate tDMRs, which improves the accuracy of detection.
Description of the drawings
Fig. 1 is a flowchart of one embodiment of the method for detecting tissue-specific differentially methylated regions of the present invention;
Fig. 2 is a flowchart of another embodiment of the method for detecting tissue-specific differentially methylated regions of the present invention;
Fig. 3 is a block diagram of one embodiment of the system for detecting tissue-specific differentially methylated regions of the present invention;
Fig. 4 is a block diagram of another embodiment of the system for detecting tissue-specific differentially methylated regions of the present invention.
Detailed description of the embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated. In the drawings, identical reference numerals denote identical or similar components or elements.
Fig. 1 is a flowchart of one embodiment of the method for detecting tissue-specific differentially methylated regions of the present invention.
As shown in Fig. 1, in step 102, single-site methylation information on the whole genome of each sample is obtained by whole-genome sequencing. For example, on the basis of second-generation high-throughput whole-genome sequencing, the single-site methylation information of a sample on the whole genome is obtained by bisulfite sequencing (reference [2]). After step 102, steps 104 to 108 below are used to extract the differentially methylated regions of two samples.
In step 104, seed tDMRs on the whole genomes of the two samples are determined, based on the preselection conditions, from the single-site methylation information on the whole genomes of the two samples.
In step 106, each seed tDMR is extended to both sides, and candidate tDMRs are obtained based on the extension termination conditions.
In step 108, the candidate tDMRs are filtered based on the filtering conditions to obtain the final tDMR result.
Existing chip-based tDMR detection is complicated to operate, has a low success rate, and is costly. In contrast, the above embodiment obtains single-site methylation information on the whole genome of each sample by whole-genome sequencing and, on that basis, analyzes two sequenced samples, so that tDMRs can be found genome-wide and extracted from the whole genome simply and quickly, which greatly improves detection efficiency and reduces cost. In addition, the steps of seed tDMR selection, seed tDMR extension, and candidate tDMR filtering improve the accuracy and sensitivity of detection.
Fig. 2 is a flowchart of another embodiment of the method for detecting tissue-specific differentially methylated regions of the present invention.
As shown in Fig. 2, in step 202, genomic DNA is treated with bisulfite so that unmethylated cytosines are deaminated and converted to uracil, while methylated cytosines remain unchanged.
In step 204, the treated whole genome is sequenced and compared with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
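The comparison in step 204 can be illustrated with a deliberately simplified sketch. It assumes that reads are already aligned without gaps to the top strand of the untreated reference, and that a read base of C at a reference CpG cytosine means "methylated" while T means "converted, hence unmethylated" (uracil is read as T after amplification). The function `cpg_methylation` and its input layout are hypothetical; a real pipeline would need a bisulfite-aware aligner and would handle both strands.

```python
from typing import Dict, List, Tuple


def cpg_methylation(reference: str,
                    reads: List[Tuple[int, str]]) -> Dict[int, Tuple[int, int]]:
    """Count methylated / unmethylated read bases at each reference CpG.

    reference: untreated sequence (top strand only, an assumption).
    reads: (start, sequence) pairs, ungapped and aligned to the top strand.
    Returns {cytosine position: (methylated reads, unmethylated reads)}.
    """
    # Positions of the C in every CG dinucleotide of the reference.
    counts = {i: [0, 0] for i in range(len(reference) - 1)
              if reference[i:i + 2] == "CG"}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in counts:
                if base == "C":      # protected from conversion: methylated
                    counts[pos][0] += 1
                elif base == "T":    # converted by bisulfite: unmethylated
                    counts[pos][1] += 1
    return {pos: (m, u) for pos, (m, u) in counts.items()}
```

The per-site counts produced this way are the kind of single-site methylation information that the later steps consume.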
In step 206, the single-site methylation information on the whole genome is scanned with a sliding window, and seed tDMRs on the whole genome are determined based on the preselection conditions.
The sliding window is 5 CpGs long and advances in steps of 1 CpG. The preselection conditions comprise:
(1) the p-value of the chi-square test (combined with Fisher's exact test) is <= 0.05;
when p <= 0.05, the methylation difference between the two samples in the region can be considered significant;
(2) there is a significant two-fold difference in methylation level; and
(3) the methylation level of at least one sample is 20% or more; that is, at least one of the methylation rates in a tDMR region found must be 20% or more, so that the region found has biological significance.
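The seed search of step 206 can be sketched as follows. This is an illustration under stated assumptions, not the patent's software: read counts in each window are pooled into a 2x2 table (sample by methylated/unmethylated); "combined with Fisher's exact test" is interpreted as falling back to Fisher's exact test when an expected cell count is below 5; and "two-fold difference" is interpreted as the higher level being at least twice the lower. All names are hypothetical.

```python
import math
from typing import List, Tuple

# (position, methylated_A, unmethylated_A, methylated_B, unmethylated_B)
Site = Tuple[int, int, int, int, int]


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def table_p(a: int, b: int, c: int, d: int) -> float:
    """Chi-square p-value (1 df), or Fisher's exact p for sparse tables."""
    n = a + b + c + d
    if n == 0:
        return 1.0
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    if min(expected) < 5:            # assumed rule for switching tests
        return fisher_exact_p(a, b, c, d)
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df


def find_seeds(sites: List[Site], window: int = 5,
               step: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) positions of windows meeting the preselection."""
    seeds = []
    for i in range(0, len(sites) - window + 1, step):
        win = sites[i:i + window]
        ma, ua = sum(s[1] for s in win), sum(s[2] for s in win)
        mb, ub = sum(s[3] for s in win), sum(s[4] for s in win)
        if ma + ua == 0 or mb + ub == 0:
            continue                          # no coverage in one sample
        la, lb = ma / (ma + ua), mb / (mb + ub)
        hi, lo = max(la, lb), min(la, lb)
        if hi < 0.20:                         # condition (3)
            continue
        if hi < 2 * lo:                       # condition (2)
            continue
        if table_p(ma, ua, mb, ub) <= 0.05:   # condition (1)
            seeds.append((win[0][0], win[-1][0]))
    return seeds
```

Overlapping windows that all qualify are reported separately here; merging them, if desired, would be a further design choice that the text leaves open.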
In step 208, each seed tDMR is extended to both sides to obtain candidate tDMRs. The extension termination conditions are:
(1) the distance between two consecutive CpGs exceeds 200 bp;
if the distance between two consecutive CpGs is too long, the two CpGs are only weakly correlated, so extension stops when this occurs, which helps ensure the reliability of the detection result;
(2) the average methylation levels of the two samples differ by less than a factor of two;
(3) the methylation levels of both samples in the region are below 20%;
(4) the p-value of the chi-square test is > 0.01.
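The bidirectional extension of step 208 can be sketched as below. The text does not say how the termination conditions are evaluated during extension; this sketch assumes that one CpG is added at a time and that conditions (2) to (4) are checked on the pooled read counts of the region including the new CpG. The names `should_stop` and `extend_seed` are hypothetical.

```python
import math
from typing import List, Tuple

# (position, methylated_A, unmethylated_A, methylated_B, unmethylated_B)
Site = Tuple[int, int, int, int, int]


def chi2_p(a: int, b: int, c: int, d: int) -> float:
    """Chi-square test p-value (1 df) for the table [[a, b], [c, d]]."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if min(r1, r2, c1, c2) == 0:
        return 1.0                   # degenerate table: no evidence
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return math.erfc(math.sqrt(chi2 / 2))


def should_stop(sites: List[Site], lo: int, hi: int,
                new: int, neighbor: int) -> bool:
    """True if adding site index `new` meets any termination condition."""
    if abs(sites[new][0] - sites[neighbor][0]) > 200:   # condition (1)
        return True
    region = sites[min(lo, new):max(hi, new) + 1]
    ma, ua = sum(s[1] for s in region), sum(s[2] for s in region)
    mb, ub = sum(s[3] for s in region), sum(s[4] for s in region)
    if ma + ua == 0 or mb + ub == 0:
        return True
    la, lb = ma / (ma + ua), mb / (mb + ub)
    if max(la, lb) < 2 * min(la, lb):                   # condition (2)
        return True
    if la < 0.20 and lb < 0.20:                         # condition (3)
        return True
    return chi2_p(ma, ua, mb, ub) > 0.01                # condition (4)


def extend_seed(sites: List[Site], lo: int, hi: int) -> Tuple[int, int]:
    """Extend the seed covering site indices lo..hi in both directions."""
    while hi + 1 < len(sites) and not should_stop(sites, lo, hi, hi + 1, hi):
        hi += 1
    while lo - 1 >= 0 and not should_stop(sites, lo, hi, lo - 1, lo):
        lo -= 1
    return lo, hi
```

Extending to the right first and then to the left is an arbitrary choice in this sketch; with pooled statistics the order can change the result at the margins.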
In step 210, the candidate tDMRs are filtered based on the filtering conditions, which comprise:
(1) FDR (false discovery rate) <= 0.05;
(2) the average coverage of the obtained tDMR region is greater than 20 reads (a read is a DNA sequence of a certain read length obtained by sequencing with next-generation sequencing technology);
(3) the coverage of each single CpG site obtained is greater than 10 reads;
(4) a sampling-with-replacement test is performed on the CpG sites in the obtained tDMR (that is, any one site is drawn from all the CpG sites and tested, and after the test it is returned to the population to take part in the next draw), and the accuracy of the result must be 95% or more.
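The filtering conditions can be sketched as a single predicate. Several details are assumptions, since the text does not define them: coverage is required in each of the two samples; the q-value for condition (1) is supplied by the caller; and the "accuracy" of the sampling-with-replacement test is taken to be the fraction of bootstrap resamples of the region's CpG sites in which the two-fold difference persists in the same direction. All function names are hypothetical.

```python
import random
from typing import List, Tuple

# (position, methylated_A, unmethylated_A, methylated_B, unmethylated_B)
Site = Tuple[int, int, int, int, int]


def levels(sites: List[Site]) -> Tuple[float, float]:
    """Pooled methylation level of each sample over the given sites."""
    ma, ua = sum(s[1] for s in sites), sum(s[2] for s in sites)
    mb, ub = sum(s[3] for s in sites), sum(s[4] for s in sites)
    la = ma / (ma + ua) if ma + ua else 0.0
    lb = mb / (mb + ub) if mb + ub else 0.0
    return la, lb


def bootstrap_agreement(sites: List[Site], n_boot: int = 1000,
                        seed: int = 0) -> float:
    """Fraction of resamples keeping a same-direction two-fold difference."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    la, lb = levels(sites)
    direction = la > lb
    hits = 0
    for _ in range(n_boot):
        resample = [rng.choice(sites) for _ in sites]  # with replacement
        ra, rb = levels(resample)
        if ra != rb and (ra > rb) == direction \
                and max(ra, rb) >= 2 * min(ra, rb):
            hits += 1
    return hits / n_boot


def passes_filters(sites: List[Site], q_value: float) -> bool:
    """Apply filtering conditions (1) to (4) to one candidate tDMR."""
    if not sites or q_value > 0.05:                    # condition (1)
        return False
    cov_a = [s[1] + s[2] for s in sites]
    cov_b = [s[3] + s[4] for s in sites]
    n = len(sites)
    if sum(cov_a) / n <= 20 or sum(cov_b) / n <= 20:   # condition (2)
        return False
    if min(cov_a) <= 10 or min(cov_b) <= 10:           # condition (3)
        return False
    return bootstrap_agreement(sites) >= 0.95          # condition (4)
```

The cheap coverage checks run before the bootstrap so that resampling is only paid for on regions that could still pass.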
In step 212, the final tDMR result is obtained from the filtering.
Note that although the above embodiment detects CG sites, those skilled in the art will understand that the method of the invention is equally applicable to CHH and CHG sites, where H denotes any one of A, C, and T.
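The three cytosine contexts named above can be told apart from the two bases that follow a cytosine. The sketch below is illustrative only: it looks at the top strand alone, and the function name `cytosine_context` is hypothetical.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    H stands for A, C or T. Returns None if seq[i] is not C, if the
    sequence ends too early to decide, or if an ambiguous base follows.
    """
    if i >= len(seq) or seq[i] != "C":
        return None
    following = seq[i + 1:i + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in "ACT":
        return None
    if following[1] == "G":
        return "CHG"
    if following[1] in "ACT":
        return "CHH"
    return None
```

For example, the cytosine at index 0 of "CAG" is in the CHG context, while the one at index 0 of "CAT" is in the CHH context.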
In the above embodiment, the preselection conditions, extension termination conditions, and filtering conditions for tDMRs were determined through extensive experimental research and creative work, and the accuracy is high. Subsequent validation shows that the accuracy of the tDMRs found by the above method is 85% or more.
Fig. 3 is a block diagram of one embodiment of the system for detecting tissue-specific differentially methylated regions of the present invention. As shown in Fig. 3, in this embodiment the detection system comprises a methylation information acquisition module 31, a seed region determination module 32, a seed region extension module 33, and a candidate region filtering module 34. The methylation information acquisition module 31 obtains single-site methylation information on the whole genome by whole-genome sequencing; the seed region determination module 32 determines seed tDMRs on the whole genome, based on the preselection conditions, from the single-site methylation information on the whole genome; the seed region extension module 33 extends each seed tDMR to both sides and obtains candidate tDMRs based on the extension termination conditions; the candidate region filtering module 34 filters the candidate tDMRs based on the filtering conditions to obtain the tDMR result.
In the above embodiment, the methylation information acquisition module obtains single-site methylation information on the whole genome of each sample by whole-genome sequencing, and the subsequent modules analyze two sequenced samples on that basis, so that tDMRs can be found genome-wide and extracted from the whole genome simply and quickly, which greatly improves detection efficiency and reduces cost. The seed region determination module selects seed tDMRs, the seed region extension module extends them, and the candidate region filtering module filters the candidate tDMRs, which improves the accuracy and sensitivity of detection.
In one embodiment, the seed region determination module 32 scans the single-site methylation information on the whole genome with a sliding window and determines seed tDMRs on the whole genome based on the preselection conditions, where the sliding window is 5 CpGs long and advances in steps of 1 CpG. The preselection conditions comprise: (1) the p-value of the chi-square test (combined with Fisher's exact test) is <= 0.05; (2) there is a significant two-fold difference in methylation level; (3) the methylation level of at least one sample is 20% or more. According to one embodiment of the detection system of the present invention, the extension termination conditions comprise: the distance between two consecutive CpGs exceeds 200 bp; the average methylation levels of the two samples differ by less than a factor of two; the methylation levels of both samples in the region are below 20%; the p-value of the chi-square test is > 0.01. According to one embodiment of the detection system of the present invention, the filtering conditions comprise: FDR <= 0.05; the average coverage of the obtained tDMR region is greater than 20 reads; the coverage of each single CG site obtained is greater than 10 reads; the accuracy of a sampling-with-replacement test performed on the CpG sites in the obtained tDMR is 95% or more.
In the above embodiment, the preselection conditions, extension termination conditions, and filtering conditions for tDMRs were determined through extensive experimental research and creative work, and the accuracy is high. Subsequent validation shows that the accuracy of the tDMRs found by the above method is 85% or more.
Fig. 4 is a block diagram of another embodiment of the system for detecting tissue-specific differentially methylated regions of the present invention. For modules in Fig. 4 that have the same reference numerals as in Fig. 3, refer to the corresponding description of Fig. 3; for brevity, they are not described in detail here. Compared with Fig. 3, the methylation information acquisition module 41 in Fig. 4 comprises a sodium bisulfite treatment device 411 and a whole-genome alignment device 412. The sodium bisulfite treatment device 411 treats whole-genome DNA with bisulfite so that unmethylated cytosines are deaminated and converted to uracil, while methylated cytosines remain unchanged; the whole-genome alignment device 412 sequences the treated whole genome and compares it with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
In the above embodiment, the sodium bisulfite treatment device uses bisulfite to deaminate the unmethylated cytosines in the DNA and convert them to uracil, while methylated cytosines remain unchanged; the whole-genome alignment device sequences the treated whole genome and compares it with the untreated sequence to judge whether each CpG site is methylated. The methylation state of every CpG site on the genome can thus be determined with very high reliability and precision.
Validation in many species (not limited to mammals) shows that the method and system of the present invention have higher precision and sensitivity than the method of Vardhman Rakyan et al., with very high reliability.
An application example of the present invention is described below.
The sample data used in this application example are the fasta data of the fibroblast imr90 and of YH No. 1. The download addresses of the fasta data are:
imr90: http://neomorph.salk.edu/human_methylome/data.html
YH: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=ADDF
In this application example, several of the processing steps in the technical scheme are implemented in software. The software can run under a Unix/Linux operating system and is invoked from the Unix/Linux command line. The description below also gives the command-line parameters used to run the software.
First, the data are prepared. After download, the data are processed by alignment, duplicate removal, extraction of methylation information, and similar steps to obtain the cout files of YH and imr90 (a cout file records the methylation status of cytosine (C) sites); these two cout files are the sample input files for tDMR extraction. Note that the method of the present invention can be used for any species for which cout files can be obtained and is not limited to what is listed in the embodiments, so its range of application is very wide.
The specific format of the cout file is as follows:
[Table 1: format of the cout file (image GSB00001091393500101 in the original; not reproduced)]
Step 1: select seed tDMRs
The single-site methylation information on the whole genome contained in the cout files of the two samples is scanned with a sliding window that is 5 CpGs long and advances in steps of 1 CpG. The preselection conditions are: (1) the p-value of the chi-square test (combined with Fisher's exact test) is <= 0.05; (2) there is a significant two-fold difference in methylation level; (3) the methylation level of at least one sample is 20% or more.
The command line is:
./tdmr slide -c CG YH.cout/chr$i.cout imr90.cout/chr$i.cout > outfile/CG/chr$i.CG; echo slide done;
The parameter -c specifies the cytosine context to search for. Three options are normally available: CG, CHH, and CHG. CpG sites are studied here, so CG is selected.
Output: 916949 seed tDMRs are obtained in total.
The format of the output file is as follows:
[Table 2: format of the seed tDMR output file (image GSB00001091393500111 in the original; not reproduced)]
In the above step, seed tDMRs can be searched for on each chromosome of the whole genome in parallel, which reduces computation time and improves efficiency.
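The per-chromosome parallelism described here could be driven by a small launcher such as the one sketched below. The `tdmr` program is the software of this application example and is not assumed to be publicly available; the argument list is a reconstruction of the command line given in the text, and the helper names (`slide_job`, `run_job`, `run_all`) are hypothetical. Threads are sufficient because the work happens in the child processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Job = Tuple[List[str], str]  # (argument vector, output path)


def slide_job(chrom: str, context: str = "CG") -> Job:
    """Build the seed-search command for one chromosome, e.g. 'chr1'."""
    argv = ["./tdmr", "slide", "-c", context,
            f"YH.cout/{chrom}.cout", f"imr90.cout/{chrom}.cout"]
    return argv, f"outfile/{context}/{chrom}.{context}"


def run_job(job: Job) -> int:
    """Run one command, redirecting its standard output to the output file."""
    argv, out_path = job
    with open(out_path, "w") as out:
        return subprocess.run(argv, stdout=out, check=True).returncode


def run_all(chroms: List[str],
            runner: Callable[[Job], object] = run_job,
            workers: int = 4) -> list:
    """Run the seed search for all chromosomes, `workers` at a time."""
    jobs = [slide_job(c) for c in chroms]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, jobs))
```

Passing a different `runner` makes it possible to dry-run the launcher, for example to print the commands without executing them.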
Step 2: bidirectional extension of seed tDMRs
Each seed tDMR is extended to both sides to obtain candidate tDMRs. The extension termination conditions are: (1) the distance between two consecutive CpGs exceeds 200 bp; (2) the average methylation levels of the two samples differ by less than a factor of two; (3) the methylation levels of both samples in the region are below 20%; (4) the p-value of the chi-square test is > 0.01.
The command line is:
./tdmr extend -t 8 -c CG YH.cout/chr$i.cout imr90.cout/chr$i.cout outfile/CG/chr$i.CG > outfile/CG/chr$i.CG.ext; echo extend done;
The parameter -t specifies the number of CPUs to use when the program runs.
The parameter -c specifies the cytosine context (three options are available: CG, CHH, and CHG; CG is chosen here).
Output: 279004 tDMRs after extension.
The first 10 columns of the output file are as follows:
[Table 3: first 10 columns of the extended tDMR output file (image GSB00001091393500121 in the original; not reproduced)]
The last 8 columns are as follows:
[Table 4: last 8 columns of the extended tDMR output file (image GSB00001091393500122 in the original; not reproduced)]
Step 3: filter the candidate tDMRs
The candidate tDMRs are filtered based on the filtering conditions, which comprise: (1) FDR <= 0.05; (2) the average coverage of the obtained tDMR region is greater than 20 reads; (3) the coverage of each single CG site obtained is greater than 10 reads; (4) the accuracy of a sampling-with-replacement test performed on the CpG sites in the obtained tDMR is 95% or more.
The tDMRs are filtered with standard Linux commands:
sort -k2 -n outfile/CG/chr$i.CG.ext | awk '$16>0.95{a=$1;for(i=2;i<16;i++){a=a"\t"$i}{print a}}' | uniq > outfile/CG/chr$i.CG.ext.filter; echo filter done
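For readers less familiar with awk, the pipeline above can be mirrored in a few lines of Python. This is a sketch of equivalent behavior, not part of the original software: it sorts the rows numerically on column 2, keeps rows whose column 16 exceeds 0.95, prints the first 15 columns joined by tabs, and drops adjacent duplicate lines. Tie-breaking among rows with equal column 2 may differ from `sort`, and the function name `filter_extended` is hypothetical.

```python
from typing import Iterable, List


def filter_extended(lines: Iterable[str]) -> List[str]:
    """Mirror: sort -k2 -n | awk '$16>0.95 {print cols 1..15}' | uniq."""
    rows = [ln.split() for ln in lines if ln.strip()]
    rows.sort(key=lambda r: float(r[1]))       # numeric sort on column 2
    kept: List[str] = []
    for r in rows:
        # Column 16 (index 15) is the value compared with 0.95 by awk.
        if len(r) >= 16 and float(r[15]) > 0.95:
            line = "\t".join(r[:15])           # keep only columns 1 to 15
            if not kept or kept[-1] != line:   # uniq: drop adjacent repeats
                kept.append(line)
    return kept
```

Reading `outfile/CG/chr$i.CG.ext` line by line and writing the returned lines to a file would reproduce the `.filter` output under these assumptions.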
Output: after filtering, 36924 tDMRs are finally obtained. The tDMRs are validated through the correlation between tDMRs and gene expression: all genes that occur in tDMRs are identified, bioinformatic analysis is used to determine whether the relationship between gene expression level and methylation rate agrees with the known relationship, and the agreement is counted. The accuracy of the tDMRs found by this method is 85% or more.
The output has 15 columns in total.
The first 8 columns of the output file are as follows:
[Table 5: first 8 columns of the filtered tDMR output file (not reproduced)]
The last 7 columns are as follows:
[Table 6: last 7 columns of the filtered tDMR output file (image GSB00001091393500141 in the original; not reproduced)]
In the above application example, the methylation state of every CpG site on the whole genome is determined; on this basis, the tDMR software developed can analyze two sequenced samples at the whole-genome level and find differentially methylated regions between tissues simply and quickly, which greatly improves detection efficiency and reduces cost. The processing can be run in parallel per chromosome across the whole genome, which further improves speed and efficiency.
The description of the invention is given for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.

Claims (6)

1. A method for detecting tissue-specific differentially methylated regions (tDMRs), applied to purposes other than disease diagnosis, characterized by comprising:
obtaining single-site methylation information on the whole genome by whole-genome sequencing;
determining seed tDMRs on the whole genome, based on preselection conditions, from the single-site methylation information on the whole genome;
extending each seed tDMR to both sides and obtaining candidate tDMRs based on extension termination conditions;
filtering the candidate tDMRs based on filtering conditions to obtain the tDMR result;
wherein determining seed tDMRs on the whole genome, based on the preselection conditions, from the single-site methylation information on the whole genome comprises:
scanning the single-site methylation information on the whole genome with a sliding window and determining seed tDMRs on the whole genome based on the preselection conditions;
wherein the preselection conditions comprise:
(1) the p-value of the chi-square test is <= 0.05;
(2) there is a significant two-fold difference in methylation level; and
(3) the methylation level of at least one sample is 20% or more;
the extension termination conditions comprise:
(1) the distance between two consecutive CpGs exceeds 200 bp;
(2) the average methylation levels of the two samples differ by less than a factor of two;
(3) the methylation levels of both samples in the region are below 20%;
(4) the p-value of the chi-square test is > 0.01;
the filtering conditions comprise:
(1) FDR <= 0.05;
(2) the average coverage of the obtained tDMR region is greater than 20 reads;
(3) the coverage of each single CG site obtained is greater than 10 reads;
(4) the accuracy of a sampling-with-replacement test performed on the CpG sites in the obtained tDMR is 95% or more.
2. The detection method according to claim 1, characterized in that the single-site methylation information on the whole genome is scanned with a sliding window that is 5 CpGs long and advances in steps of 1 CpG.
3. The detection method according to claim 1, characterized in that obtaining single-site methylation information on the whole genome by whole-genome sequencing comprises:
treating genomic DNA with bisulfite so that unmethylated cytosines are deaminated and converted to uracil, while methylated cytosines remain unchanged;
sequencing the treated whole genome and comparing it with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
4. A system for detecting a tissue-specific differentially methylated region (tDMR), characterized by comprising:
a methylation information acquisition module, configured to obtain single-site methylation information on the whole genome by whole-genome sequencing;
a seed region determination module, configured to scan the single-site methylation information on the whole genome with a sliding window and determine seed tDMRs on the whole genome based on a preselection condition;
a seed region extension module, configured to extend each seed tDMR toward both sides and obtain candidate tDMRs based on an extension termination condition;
a candidate region filtering module, configured to filter the candidate tDMRs based on a filtering condition to obtain the tDMR result;
wherein the extension termination condition comprises:
(1) the distance between two consecutive CpGs exceeds 200 bp;
(2) the average methylation levels of the two samples differ by less than two-fold;
(3) the methylation level of the region is less than 20% in both samples;
(4) the p-value of the chi-square test is greater than 0.01;
the preselection condition comprises:
(1) the p-value of the chi-square test is <= 0.05;
(2) the methylation levels show a significant difference of two-fold; and
(3) the methylation level of at least one sample is more than 20%;
the filtering condition comprises:
(1) FDR <= 0.05;
(2) the average coverage of the obtained tDMR region is greater than 20 reads;
(3) the coverage of each single CG site obtained is greater than 10 reads;
(4) a sampling-with-replacement test carried out on the CpG sites in the obtained tDMR gives an accuracy of more than 95%.
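The preselection condition used by the seed region determination module can be sketched as follows. The claim does not specify what the chi-square test is computed on. This sketch assumes a 2x2 table of methylated and unmethylated read counts pooled over one window in the two samples, and a Pearson chi-square test without continuity correction. The function names and the example counts are hypothetical.

```python
import math


def chi2_p_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def is_seed(meth_a: int, unmeth_a: int, meth_b: int, unmeth_b: int) -> bool:
    """Check the three preselection conditions for one window."""
    total_a, total_b = meth_a + unmeth_a, meth_b + unmeth_b
    if total_a == 0 or total_b == 0:
        return False  # assumption: windows without coverage are skipped
    level_a, level_b = meth_a / total_a, meth_b / total_b
    hi, lo = max(level_a, level_b), min(level_a, level_b)
    return (
        chi2_p_2x2(meth_a, unmeth_a, meth_b, unmeth_b) <= 0.05  # condition (1)
        and hi >= 2 * lo                                         # condition (2)
        and hi > 0.20                                            # condition (3)
    )


# Made-up read counts: 80% vs 20% methylation, and 50% vs 48%.
strong = is_seed(40, 10, 10, 40)
weak = is_seed(25, 25, 24, 26)
```

The first window differs four-fold between samples and qualifies as a seed; the second shows almost identical levels and does not.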
5. The detection system according to claim 4, characterized in that the seed region determination module scans the single-site methylation information on the whole genome with a sliding window and determines seed tDMRs on the whole genome based on the preselection condition, wherein the sliding window is 5 CpGs in length and has a step size of 1 CpG.
6. The detection system according to claim 4, characterized in that the methylation information acquisition module comprises:
a sodium bisulfite treatment device, configured to treat whole-genome DNA with bisulfite so that unmethylated cytosines are deaminated and converted into uracil, while methylated cytosines remain unchanged;
a whole-genome alignment device, configured to sequence the treated whole genome and compare it with the untreated whole-genome sequence to determine the methylated CpG sites on the whole genome.
CN2010105571311A 2010-11-24 2010-11-24 Method and system for detecting tissue-specific differentially methylated region (tDMR) Active CN102061337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105571311A CN102061337B (en) 2010-11-24 2010-11-24 Method and system for detecting tissue-specific differentially methylated region (tDMR)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105571311A CN102061337B (en) 2010-11-24 2010-11-24 Method and system for detecting tissue-specific differentially methylated region (tDMR)

Publications (2)

Publication Number Publication Date
CN102061337A (en) 2011-05-18
CN102061337B (en) 2013-11-20

Family

ID=43996843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105571311A Active CN102061337B (en) 2010-11-24 2010-11-24 Method and system for detecting tissue-specific differentially methylated region (tDMR)

Country Status (1)

Country Link
CN (1) CN102061337B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982253B (en) * 2011-09-02 2015-11-25 深圳华大基因科技服务有限公司 Methylation differential detection method and device between a kind of multisample
WO2013097061A1 (en) * 2011-12-31 2013-07-04 深圳华大基因研究院 Bs and rrbs sequencing-based bioinformatics analysis method and device
CN107451420A (en) * 2017-07-26 2017-12-08 同济大学 The differential methylation parser of purity effect is considered based on DNA methylation data
CN109979534B (en) * 2017-12-28 2021-07-09 浙江安诺优达生物科技有限公司 C site extraction method and device
CN109637583B (en) * 2018-12-20 2020-06-16 中国科学院昆明植物研究所 Method for detecting differential methylation region of plant genome
CN109841264B (en) * 2019-01-31 2022-02-18 郑州云海信息技术有限公司 Sequence comparison filtering processing method, system and device and readable storage medium
CN111627499B (en) * 2020-05-27 2020-12-08 广州市基准医疗有限责任公司 Methylation level vectorization representation and specific sequencing interval detection method and device
CN113454219B (en) * 2020-08-10 2024-03-08 华大数极生物科技(深圳)有限公司 Methylation marker for liver cancer detection and diagnosis
WO2023082140A1 (en) * 2021-11-11 2023-05-19 华大数极生物科技(深圳)有限公司 Nucleic acid detection kit for diagnosing liver cancer

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101205559A (en) * 2006-12-19 2008-06-25 上海生物芯片有限公司 Oligonucleotide chip for detecting complete genome CpG island and uses thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CpG island methylation as an early event during adenoma progression in carcinogenesis of sporadic colorectal cancer; Kim HC et al.; Journal of Gastroenterology and Hepatology; 2005-12-31; Vol. 20, No. 12; pp. 1920-1926 *
Screening techniques for differentially methylated genomic fragments; Zhao Guisen et al.; Journal of Medical Molecular Biology; 2007-12-31; Vol. 4, No. 1; pp. 91-94 *

Also Published As

Publication number Publication date
CN102061337A (en) 2011-05-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: BGI TECHNOLOGY SOLUTIONS CO., LTD.

Free format text: FORMER OWNER: BGI-SHENZHEN CO., LTD.

Effective date: 20130422

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20130422

Address after: 518083 science and Technology Pioneer Park, comprehensive building, Beishan Industrial Zone, Yantian District, Guangdong, Shenzhen 201

Applicant after: BGI Technology Solutions Co., Ltd.

Address before: Beishan Industrial Zone Building, Yantian District, Shenzhen, Guangdong Province 518083

Applicant before: BGI-Shenzhen Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant