WO2013097060A1 - DNA methylation analysis method based on MspJI digestion - Google Patents

DNA methylation analysis method based on MspJI digestion

Info

Publication number
WO2013097060A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
sequence
fragment
site
cytosine
Prior art date
Application number
PCT/CN2011/002242
Other languages
English (en)
French (fr)
Inventor
卢瀚林
王俊
汪建
杨焕明
Original Assignee
深圳华大基因研究院
深圳华大基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因研究院, 深圳华大基因科技有限公司
Priority to US14/369,447 (US20140364321A1)
Priority to PCT/CN2011/002242 (WO2013097060A1)
Publication of WO2013097060A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/30: Detection of binding sites or motifs
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Definitions

  • The invention belongs to the field of bioinformatics, and in particular relates to an efficient and accurate bioinformatics analysis method for studying plant genome methylation.
  • DNA methylation is an important aspect of epigenetics research. Many biological phenomena and processes, such as dosage compensation, DNA site polymorphism, and transposon silencing, depend on DNA methylation. Current methods that combine high-throughput sequencing with the study of DNA methylation include bisulfite sequencing (BS-sequencing), MBD, which uses proteins that bind methylated cytosine, MeDIP, which captures sites with antibodies, and RRBS, which uses site-specific digestion at methylated cytosine sites. MBD sequencing is sensitive to regions of high methylation and medium CpG density, and MeDIP sequencing is sensitive to high methylation and high CpG density; neither is precise enough.
  • BS sequencing can accurately determine the methylation state of every cytosine base and can produce a single-base-resolution DNA methylation map, but the sequencing cost is high and the data volume is large. Reduced representation bisulfite sequencing (RRBS) builds on BS sequencing by using enzymatic digestion to select part of the genome for BS sequencing. It has some cost advantage over BS sequencing, but it has difficulty enriching the abundant mCHG and mCHH methylation in plant samples.
  • To detect DNA methylation without the massive sequencing that BS sequencing requires, the present invention provides a bioinformatics analysis method for detecting DNA methylation based on MspJI digestion.
  • Enriching methylation sites by MspJI digestion does not require bisulfite treatment of the whole genome and yields only the methylation sites and their nearby sequence information, so it produces less data than whole-genome bisulfite sequencing. It is a methylation sequencing method with mild operating conditions that is relatively simple and inexpensive.
  • In one aspect, the invention provides a method for detecting genomic DNA methylation, comprising the steps of:
  • In another aspect, the invention provides a method for analyzing genomic methylation, comprising the steps of:
  • A genomic DNA sample is digested with the restriction endonuclease MspJI to obtain fragments, and fragments 28-34 bp in length are preferably collected;
  • Figure 1 is a flow chart of a specific embodiment of the present invention.
  • Figure 2 is a schematic diagram of the recognition site of the restriction endonuclease MspJI used in the present invention.
  • Figure 3 shows the genome integrity test of the Arabidopsis sample: the quality of the Arabidopsis genomic DNA used for digestion, checked by 1% agarose gel electrophoresis. The Arabidopsis genomic DNA has good integrity, with no contamination and no degradation, and can be used for the subsequent digestion reaction.
  • Figure 4 shows the recovery of the 26-38 bp fragments of MspJI-digested Arabidopsis genomic DNA on a 15% non-denaturing polyacrylamide gel. The left panel is before fragment selection and the right panel is after fragment selection. The comparison shows that gel excision recovered short fragments enriched in an appropriate range around 30 bp, which can be used for subsequent library construction.
  • Figure 5 shows the recovery of the PCR-amplified target fragments on a 2% agarose gel. The left panel is before library recovery and the right panel is after library recovery. About 150 bp is the size of the target fragment after adapter ligation and PCR amplification; recovering fragments in the 146-158 bp range correctly selects the original 26-38 bp target fragments.
  • The library constructed in this way can examine most fully methylated MspJI recognition sites, that is, symmetrically methylated CpG, CHG and CHH sites.
  • Figure 6 is a schematic of the recognition-site types (top left), the cytosine methylation types (top right), and the sequence logo of YNCGNR sites (bottom) on the Arabidopsis genome. Apart from the site types cut in one direction only, YNCGNR, YCNGR and CNNG account for the vast majority of the bidirectionally digested methylated fragments; the sequence logo in the lower panel reflects the distribution of base conservation in sequences containing the YNCGNR site.
  • Figure 7 is a schematic of the distribution trend of sites determined to be methylated cytosines on chromosome 1 of Arabidopsis thaliana.
  • Figure 8 shows the distribution of methylated cytosine sites across Arabidopsis genes and their upstream and downstream regions (top left), statistics of methylated cytosines in repeat regions (top right), and the distribution of methylated cytosines, repeat sequences, and reads in windows across the whole genome (bottom).
  • Figure 9 is a schematic of the correlation between the Arabidopsis genome methylation data obtained by the digestion-based methylation detection method and the BS sequencing data.
  • Y stands for C or T
  • R stands for A or G
  • N stands for A, C, T or G
  • A read is a sequenced fragment output by the sequencer, before assembly.
  • The present invention uses MspJI, a methylation-sensitive restriction endonuclease with distant homology to E. coli Mrr; the enzyme is commercially available, for example from New England Biolabs (NEB).
  • As shown in Figure 2, MspJI recognizes the double-stranded site of methylated CNNR (R = A/G), whose complementary strand is YNNG (Y = T/C), and cuts both strands at 9 bp and 13 bp from the R end, forming a short fragment with a 4-base 5' overhang. If the recognition site is fully methylated, bidirectional cleavage produces a fragment of 32 or 31 bp with the methylation site at its center. Enriching these fragments for sequencing and alignment reveals the positions of methylated cytosines on the genome.
  • Figure 1 shows the workflow for detecting DNA methylation according to the present invention, detailed below.
  • In step S1, although any sequencing technique commonly used in the art may be employed, SE50 sequencing is preferred because the digested fragments are relatively short.
  • The invention may also use other high-throughput sequencing technologies, such as Illumina GA sequencing, or other existing high-throughput sequencing technologies.
  • In step S2, the sequences output by the sequencer are preferably filtered to remove unqualified sequences.
  • For example, a sequence is unqualified in either of two cases: 50% or more of its bases have a sequencing quality value below a given threshold, or 10% or more of its bases are undetermined (such as N in Illumina GA sequencing results).
  • The low-quality threshold can be set by those skilled in the art according to the specific sequencing technology and sequencing environment.
  • After unqualified sequences are removed, the reads are preferably further screened, retaining complete sequences that contain no sequencing adapter and sequences that are 28-34 bp long after adapter removal.
  • The sequencing reads, that is, the digested fragments, are then located on the whole genome.
  • Because the fragments are short, some reads may fail to align or may align to multiple positions and so cannot be located. Alignment software such as Soap2.20 is therefore preferably used for two rounds of alignment: 1) set the software parameters to allow two mismatches per seed sequence and at most 4 mismatches per read, and obtain the first alignment result; 2) reset the Soap2.20 parameters so that no mismatches are allowed, and realign the reads that mapped to multiple positions or failed to map in the first round, obtaining the second alignment result; 3) merge the two results and compute the alignment rate and the unique alignment rate.
  • Other short-read mapping programs can also be used for the alignment.
  • In step S3, for uniquely aligned sequences, the position of the methylated cytosine can be confirmed from the relationship between recognition-site type and fragment length, and the sequences are classified by the sequence features around the methylated cytosine.
  • First, the MspJI cleavage characteristics are used to judge whether a uniquely aligned sequence contains a methylated cytosine: if a corresponding MspJI recognition site is found at the specific distance from the cleaved end, the cytosine in that site is a methylated cytosine.
  • Allowing for a 1-2 base fluctuation in cleavage position, the 28-34 bp digested fragments are divided into 8 fragment types containing fully methylated recognition sites (the corresponding complementary C and G positions are both methylated), namely YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR and YNNGNCNNR, and 2 fragment types containing hemi-methylated recognition sites, CNNR and YNNG. This gives 10 types in total, each corresponding to one fragment length.
  • Note that when the cleavage site and the sequence context of the methylated cytosine (CpG, CHG, CHH) are used together as classification criteria, fragments cannot be classified exactly and the types overlap (for example, a TCCGGA fragment can be either YNCGNR or YCNGR). Even so, this classification greatly facilitates searching for and locating methylated cytosine sites based on the relationship between fragment length and recognition-site type.
  • In step S4, the recognition-site type of each sequence fragment is combined with its alignment position on the Arabidopsis reference sequence (TAIR8) to locate the methylated cytosine on the genome, and the basic type of the methylated cytosine (CG, CHG or CHH) is then confirmed.
  • The distribution of each recognition site and cytosine type is counted, and the sequence features of each type are described using SeqLogo.
  • In step S5, after the methylated cytosines are confirmed and typed, the sequencing depth of each site determined to be a methylated cytosine is counted, and a file similar to the methylated single-nucleotide annotation used in BS sequencing is produced. It details the chromosome, sequence coordinate, strand, depth, digestion recognition site and cytosine type of each site determined to be a methylated cytosine. Finally, the total number and coverage of sites determined to be methylated cytosines are counted to give the methylation status from whole-genome MspJI digestion.
  • An exemplary file format similar to the methylated single-nucleotide annotation in BS sequencing is as follows:
  • Fourth column: the number of covering methylated reads; fifth column: the recognition-site type;
  • Other related analyses can also be performed by combining the characteristics of the plant genome used to analyze the distribution of methylated cytosines on the genome, for example the distribution over each gene element, in repeat-sequence regions, and in some local regions.
  • Step S1 includes DNA extraction, digestion with selection and recovery of the digested fragments, SE library construction, and sequencing.
  • After DNA was extracted from plant leaf tissue with cetyltrimethylammonium bromide (CTAB), the sample was purified by phenol:chloroform extraction and ethanol precipitation.
  • Sample quality was checked by 1% agarose gel electrophoresis (Fig. 3); after passing, the sample was digested with MspJI (purchased from New England Biolabs (NEB)).
  • The range of fragment recovery was widened here in order to detect the large number of methylated cytosines present in non-CpG form in the Arabidopsis genome.
  • Library construction followed the commonly used Illumina PE library workflow: end repair, addition of "A" to the ends, adapter ligation and PCR amplification, giving a product 146-158 bp in size. After each step the product must be recovered by phenol:chloroform extraction and ethanol precipitation before the next reaction.
  • The PCR product was checked and recovered by 2% agarose gel electrophoresis (Fig. 5) and purified with the QIAquick gel purification kit. Library quality control was performed on the Bioanalyzer system, followed by SE50 sequencing on an Illumina HiSeq2000 sequencer.
  • In step S2, the sequences output by the sequencer are preferably filtered to remove unqualified sequences, which fall into two cases: 50% or more of the bases in the sequence have a sequencing quality below 20, or 10% or more of the bases in the sequence are undetermined.
  • The reads are then preferably screened, retaining complete sequences that contain no sequencing adapter and sequences that are 28-34 bp long after adapter removal.
  • The sequencing reads, that is, the digested fragments, are then located on the whole genome. Because the fragments are short, some reads may fail to align or may align to multiple positions and so cannot be located.
  • The alignment software Soap2.20 (obtained from soap.genomics.org.cn/) is used for two rounds of alignment: 1) set the software parameters to allow two mismatches per seed sequence and at most 4 mismatches per read, and obtain the first alignment result; 2) reset the Soap2.20 parameters so that no mismatches are allowed, and realign the reads that mapped to multiple positions or failed to map in the first round, obtaining the second alignment result; 3) merge the two results and compute the alignment rate and the unique alignment rate; see Table 1.
  • Table 1 gives the total raw data for the Arabidopsis sample, the amount of data remaining after filtering and screening, and the total number of sequences that align uniquely to the Arabidopsis genome. The short digested sequences and the actual distribution of methylation sites lead to a relatively low unique alignment rate.
  • In step S3, for uniquely aligned sequences, the position of the methylated cytosine can be confirmed from the relationship between recognition-site type and fragment length, and the sequences are classified by the sequence features around the methylated cytosine.
  • The MspJI cleavage characteristics are used to judge whether a uniquely aligned sequence contains a methylated cytosine: if a corresponding MspJI recognition site is found at the specific distance from the cleaved end, the cytosine in that site is a methylated cytosine.
  • The 28-34 bp digested fragments are divided into types containing fully methylated recognition sites (the corresponding complementary C and G positions are both methylated) and types containing hemi-methylated recognition sites.
  • Table 2 shows the coverage and depth distribution on each chromosome of the sequences determined to contain methylated cytosines.
  • Table 3 shows the statistics of recognition-site types containing methylated cytosines on the uniquely aligned fragments.
  • In step S4, the recognition-site type of each sequence fragment is combined with its alignment position on the Arabidopsis reference sequence (TAIR8) to locate the methylated cytosine on the genome, and the basic type of the methylated cytosine (CG, CHG or CHH) is then confirmed.
  • The distribution of each recognition site and cytosine type was counted, and the sequence features of all captured sequences of each type were described using SeqLogo; see Figure 7.
  • Figure 7 shows the distribution trend of sites determined to be methylated cytosines on chromosome 1 of Arabidopsis thaliana. The figure shows the general trend of the distribution, and methylated cytosine sites are densely distributed near the centromere.
  • In step S5, after the methylated cytosines are confirmed and typed, the sequencing depth of each site determined to be a methylated cytosine is counted, and a file similar to the methylated single-nucleotide annotation in BS-sequence is produced. It details the chromosome, sequence coordinate, strand, depth, digestion recognition site and cytosine type of each such site. The total number and coverage of sites determined to be methylated cytosines are then counted to give the methylation status from whole-genome MspJI digestion; see Figure 8.
  • The top left of Figure 8 shows the distribution of captured methylated cytosines across all Arabidopsis genes and the 2000 bp upstream and downstream of them.
  • The overall distribution agrees with the literature: the gene region has a higher methylation level than the upstream and downstream regions, and the relative methylation level near the TSS is very low. The top right shows the distribution of all digested fragments in repeat elements; about 45% of the fragments fall within repeat elements.
  • The lower panel shows the distribution of the number of methylated cytosines, the coverage length of the sequenced fragments, and the length of repeat sequences on chromosome 1 of Arabidopsis thaliana, from which the relationship between methylated cytosines and repeat sequences can be seen.
  • Figure 9 shows the correlation, within corresponding intervals, between the mCG, mCHG and mCHH sites and the BS sequencing data (experimental steps below). The abscissa is the methylation level obtained by sequencing the digested fragments, the ordinate is the methylation level from BS sequencing, and the interval length is 50 kb. The correlation levels given in the figure show that mCG and mCHG correlate better than mCHH. This result is consistent with the understanding in the art and illustrates the effectiveness of the method.
  • The BS sequencing data were obtained from the same sample as above using the following experimental steps.
  • The DNA samples were bisulfite-treated with the ZYMO EZ DNA Methylation-Gold kit.
  • DNA was checked and recovered by 2% agarose gel electrophoresis and purified with the QIAquick gel purification kit for fragment size selection. After PCR amplification, fragment size was selected again. Library QC was performed on the Bioanalyzer. SE50 sequencing was then performed on an Illumina HiSeq2000 sequencer and the results were analyzed.

Abstract

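The present invention provides a method for detecting DNA methylation based on MspJI digestion and for performing bioinformatics analysis of genomic methylation.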

Description

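A DNA methylation analysis method based on MspJI digestion
Technical Field
The present invention belongs to the field of bioinformatics, and in particular relates to an efficient and accurate bioinformatics analysis method for studying plant genome methylation.
Background Art
DNA methylation is an important aspect of epigenetics research. Many biological phenomena and processes, such as dosage compensation, DNA site polymorphism, and transposon silencing, depend on DNA methylation. Current methods that combine high-throughput sequencing with the study of DNA methylation include bisulfite sequencing (BS-sequencing), MBD, which uses proteins that bind methylated cytosine, MeDIP, which captures sites with antibodies, and RRBS, which uses site-specific digestion at methylated cytosine sites. MBD sequencing is sensitive to regions of high methylation and medium CpG density, and MeDIP sequencing is sensitive to high methylation and high CpG density; neither is precise enough. BS sequencing can accurately determine the methylation state of every cytosine base and can produce a single-base-resolution DNA methylation map, but the sequencing cost is high and the data volume is large. Reduced representation bisulfite sequencing (RRBS) builds on BS sequencing by using enzymatic digestion to select part of the genome for BS sequencing. It has some cost advantage over BS sequencing, but it has difficulty enriching the abundant mCHG and mCHH methylation in plant samples.
Therefore, an efficient and accurate method for studying plant genome methylation is still needed.
Summary of the Invention
To detect DNA methylation without the massive sequencing that BS sequencing requires, the present invention provides a bioinformatics analysis method for detecting DNA methylation based on MspJI digestion. Enriching methylation sites by MspJI digestion does not require bisulfite treatment of the whole genome and yields only the methylation sites and their nearby sequence information, so it produces less data than whole-genome bisulfite sequencing. It is a methylation sequencing method with mild operating conditions that is relatively simple and inexpensive. We therefore designed a corresponding bioinformatics analysis method to determine the recognition sites, the methylation sites and their types in the digested fragments, and we provide examples of downstream analysis.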
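In one aspect, the present invention provides a method for detecting genomic DNA methylation, comprising the steps of:
1) digesting a genomic DNA sample with the modification-dependent restriction endonuclease MspJI to obtain fragments, preferably collecting fragments 28-34 bp in length;
2) sequencing the fragments to obtain sequenced fragments;
3) aligning the sequenced fragments to a reference genome sequence and selecting the uniquely aligned sequences;
4) for the uniquely aligned sequences, determining as methylated the reference genome positions corresponding to the C positions in YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR, YNNGNCNNR, CNNR and YNNG and to the C positions on their complementary strands.
In another aspect, the present invention provides a method for analyzing genomic methylation, comprising the following steps:
1) digesting a genomic DNA sample with the modification-dependent restriction endonuclease MspJI to obtain fragments, preferably collecting fragments 28-34 bp in length;
2) sequencing the fragments to obtain sequenced fragments;
3) aligning the sequenced fragments to a reference genome sequence and selecting the uniquely aligned sequences;
4) for the uniquely aligned sequences, determining as methylated the reference genome positions corresponding to the C positions in YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR, YNNGNCNNR, CNNR and YNNG and to the C positions on their complementary strands;
5) among the methylated cytosine sites, counting the distribution of the types CG, CHG and CHH (H being a C, A or T base);
6) annotating one or more of the following on the whole-genome map to obtain a whole-genome methylation map: the sequencing depth of each methylated cytosine site, methylated single-nucleotide annotation information, the chromosome number of each site determined to be a methylated cytosine, the position of the cytosine site, strand information, depth, digestion recognition site and cytosine type, and the total number and coverage of positions determined to be methylated cytosines.
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and examples. The specific embodiments described here serve only to explain the invention and do not limit it.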
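Brief Description of the Drawings
Figure 1 is a flow chart of a specific embodiment of the present invention.
Figure 2 is a schematic diagram of the recognition site of the restriction endonuclease MspJI used in the present invention. MspJI recognizes the double-stranded site of methylated CNNR (R = A/G) and cuts both strands at 9 bp and 13 bp from the R end, forming a short fragment with a 4-base 5' overhang. If the recognition site is fully methylated, that is, methylated MspJI recognition sites are present at corresponding positions on both strands, bidirectional cleavage produces digested fragments of 30-32 bp. The present invention focuses on fragments of this kind.
Figure 3 shows the genome integrity test of the Arabidopsis sample: the quality of the Arabidopsis genomic DNA used for digestion, checked by 1% agarose gel electrophoresis. The Arabidopsis genomic DNA has good integrity, with no contamination and no degradation, and can be used for the subsequent digestion reaction.
Figure 4 shows the recovery of the 26-38 bp fragments of MspJI-digested Arabidopsis genomic DNA on a 15% non-denaturing polyacrylamide gel. The left panel is before fragment selection and the right panel is after fragment selection. The comparison shows that gel excision recovered short fragments enriched in an appropriate range around 30 bp, which can be used for subsequent library construction.
Figure 5 shows the recovery of the PCR-amplified target fragments on a 2% agarose gel. The left panel is before library recovery and the right panel is after library recovery. About 150 bp is the size of the target fragment after adapter ligation and PCR amplification; recovering fragments in the 146-158 bp range correctly selects the original 26-38 bp target fragments. The library constructed in this way can examine most fully methylated MspJI recognition sites, that is, symmetrically methylated CpG, CHG and CHH sites.
Figure 6 is a schematic of the recognition-site types (top left), the cytosine methylation types (top right), and the sequence logo of YNCGNR sites (bottom) on the Arabidopsis genome. Apart from the site types cut in one direction only, YNCGNR, YCNGR and CNNG account for the vast majority of the bidirectionally digested methylated fragments; the sequence logo in the lower panel reflects the distribution of base conservation in sequences containing the YNCGNR site.
Figure 7 is a schematic of the distribution trend of sites determined to be methylated cytosines on chromosome 1 of Arabidopsis thaliana.
Figure 8 shows the distribution of methylated cytosine sites across Arabidopsis genes and their upstream and downstream regions (top left), statistics of methylated cytosines in repeat regions (top right), and the distribution of methylated cytosines, repeat sequences, and reads in windows across the whole genome (bottom).
Figure 9 is a schematic of the correlation between the Arabidopsis genome methylation data obtained by the digestion-based methylation detection method and the BS sequencing data.
Detailed Description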
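In the DNA sequences of the present invention:
Y represents C or T;
R represents A or G;
N represents A, C, T or G;
H represents C, A or T.
In the present invention, reads are the sequenced fragments output by the sequencer, before assembly.
The present invention uses MspJI, a methylation-sensitive restriction endonuclease with distant homology to E. coli Mrr. The enzyme is commercially available, for example from New England Biolabs (NEB).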
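The degenerate base codes defined above map directly onto regular-expression character classes. The following is a minimal sketch of that mapping, not part of the patented method; the function names `motif_to_regex` and `matching_motifs` are hypothetical. It also reproduces the overlap the description notes for the TCCGGA example, which matches both YNCGNR and YCNGR.

```python
import re

# Degenerate base codes used in the text (Y, R, N, H); plain bases map to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "N": "[ACGT]", "H": "[ACT]",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif such as 'YNCGNR' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def matching_motifs(seq: str, motifs: list[str]) -> list[str]:
    """Return the motifs (in input order) that occur anywhere in seq."""
    return [m for m in motifs if motif_to_regex(m).search(seq.upper())]


hits = matching_motifs("TCCGGA", ["YNCGNR", "YCNGR", "CYNRG"])
print(hits)
```

Because one sequence can satisfy several motifs, any per-type tally built this way counts such a fragment more than once unless a priority order is imposed.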
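As shown in Figure 2, MspJI recognizes the double-stranded site of methylated CNNR (R = A/G), whose complementary strand is YNNG (Y = T/C), and cuts both strands at 9 bp and 13 bp from the R end, forming a short fragment with a 4-base 5' overhang. If the recognition site is fully methylated, bidirectional cleavage produces a fragment of 32 or 31 bp with the methylation site at its center. Enriching these fragments for sequencing and alignment reveals the positions of methylated cytosines on the genome. Because methylation mostly occurs in fully methylated form in CpG, CHG and CHH sequences, and these sequences mainly yield 30-32 bp fragments after MspJI recognition and cleavage, we allow for the diversity of recognition-site types and a 1-2 bp fluctuation in cleavage position and take the 28-34 bp digested fragments as an example, sequencing and aligning them to obtain the sequences that contain methylation sites.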
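The fragment lengths quoted above follow from simple arithmetic on the cut positions. The sketch below is an illustration under stated assumptions, not the patent's procedure: it assumes the cuts lie 9 and 13 bases beyond the R of mCNNR (so 12 and 16 bases beyond the methylated C), that the two half-sites face each other, and that end repair fills in the 5' overhangs. The names `NEAR_CUT`, `FAR_CUT`, `overhang` and `bidirectional_span` are hypothetical.

```python
# Offsets are counted from the methylated C of an mCNNR site (C = position 0,
# R = position 3). They are derived from the text's "9 bp and 13 bp from the
# R end" and are an assumption of this sketch.
NEAR_CUT = 3 + 9   # last base kept on the strand that carries the recognised mC
FAR_CUT = 3 + 13   # last base kept on the opposite strand


def overhang() -> int:
    """Length of the 5' overhang left by one cleavage event."""
    return FAR_CUT - NEAR_CUT


def bidirectional_span(offset: int) -> int:
    """Length of the end-repaired fragment when both strands are recognised.

    offset is the distance in bases between the methylated C on the top
    strand and the methylated C on the bottom strand (1 for CG, 2 for CNG,
    3 for CNNG), assuming the two half-sites face each other.
    """
    rightmost = FAR_CUT            # set by the top-strand recognition event
    leftmost = offset - FAR_CUT    # set by the bottom-strand recognition event
    return rightmost - leftmost + 1


for name, offset in [("CG", 1), ("CNG", 2), ("CNNG", 3)]:
    print(name, bidirectional_span(offset))
```

Under these assumptions the spans come out at 32, 31 and 30 bp, in line with the 30-32 bp range given in the description.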
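Figure 1 shows the workflow for detecting DNA methylation according to the present invention, detailed below.
In step S1, although any sequencing technique commonly used in the art may be employed, SE50 sequencing is preferred because the digested fragments are relatively short. The present invention may also use other high-throughput sequencing technologies, such as Illumina GA sequencing, or other existing high-throughput sequencing technologies.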
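In step S2, the sequences output by the sequencer are preferably filtered to remove unqualified sequences. For example, a sequence is unqualified in either of two cases: 50% or more of its bases have a sequencing quality value below a given threshold, or 10% or more of its bases are undetermined (such as N in Illumina GA sequencing results). The low-quality threshold can be set by those skilled in the art according to the specific sequencing technology and sequencing environment. After unqualified sequences are removed, the reads are preferably further screened, retaining complete sequences that contain no sequencing adapter and sequences that are 28-34 bp long after adapter removal.
The sequencing reads, that is, the digested fragments, are then located on the whole genome. Because the fragments are short, some reads may fail to align or may align to multiple positions and so cannot be located. Alignment software such as Soap2.20 is therefore preferably used for two rounds of alignment: 1) set the software parameters to allow two mismatches per seed sequence and at most 4 mismatches per read, and obtain the first alignment result; 2) reset the Soap2.20 parameters so that no mismatches are allowed, and realign the reads that mapped to multiple positions or failed to map in the first round, obtaining the second alignment result; 3) merge the two results and compute the alignment rate and the unique alignment rate. Other short-read mapping programs can also be used for the alignment.
In step S3, for uniquely aligned sequences, the position of the methylated cytosine can be confirmed from the relationship between recognition-site type and fragment length, and the sequences are classified by the sequence features around the methylated cytosine. First, the MspJI cleavage characteristics are used to judge whether a uniquely aligned sequence contains a methylated cytosine: if a corresponding MspJI recognition site is found at the specific distance from the cleaved end, the cytosine in that site is a methylated cytosine. Allowing for a 1-2 base fluctuation in cleavage position, the 28-34 bp digested fragments are divided into 8 fragment types containing fully methylated recognition sites (the corresponding complementary C and G positions are both methylated), namely YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR and YNNGNCNNR, and 2 fragment types containing hemi-methylated recognition sites, CNNR and YNNG. This gives 10 types in total, each corresponding to one fragment length. Note that when the cleavage site and the sequence context of the methylated cytosine (CpG, CHG, CHH) are used together as classification criteria, fragments cannot be classified exactly and the types overlap (for example, a TCCGGA fragment can be either YNCGNR or YCNGR). Even so, this classification greatly facilitates searching for and locating methylated cytosine sites based on the relationship between fragment length and recognition-site type.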
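The read-filtering rule described for step S2 can be sketched as follows. This is an illustrative reading of the two rejection criteria and the 28-34 bp size window, not the inventors' code. The function names are hypothetical, and the default quality cut-off of 20 is borrowed from the worked example later in the description; the general text leaves the threshold to the practitioner.

```python
def passes_quality_filter(seq: str, quals: list[int], min_q: int = 20,
                          max_low_frac: float = 0.5,
                          max_n_frac: float = 0.1) -> bool:
    """Keep a read unless 50% or more of its bases are low quality
    or 10% or more of its bases are undetermined (N)."""
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False
    low_quality = sum(q < min_q for q in quals)
    undetermined = seq.upper().count("N")
    return low_quality / n < max_low_frac and undetermined / n < max_n_frac


def in_size_window(seq: str, shortest: int = 28, longest: int = 34) -> bool:
    """Keep adapter-trimmed reads whose length lies in the 28-34 bp window."""
    return shortest <= len(seq) <= longest


read = "ACGT" * 8                      # 32 bases
good = passes_quality_filter(read, [30] * 32) and in_size_window(read)
half_low = passes_quality_filter(read, [10] * 16 + [30] * 16)
print(good, half_low)
```

Adapter trimming itself is not shown; the size check is meant to run on reads after the adapter has been removed.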
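In step S4, the recognition-site type of each sequence fragment is combined with its alignment position on the Arabidopsis reference sequence (TAIR8) to locate the methylated cytosine on the genome, and the basic type of the methylated cytosine (CG, CHG or CHH) is then confirmed. The distribution of each recognition site and cytosine type is counted, and the sequence features of each type are described using SeqLogo.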
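Assigning the CG, CHG or CHH type in step S4 only requires the two reference bases downstream of the cytosine. The sketch below shows one way to do this for a forward-strand cytosine; the function names are hypothetical, positions are 0-based, and a cytosine on the reverse strand would first need the reference window reverse-complemented.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (used for reverse-strand cytosines)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cytosine_context(ref: str, pos: int) -> str:
    """Classify the cytosine at 0-based pos of ref as CG, CHG or CHH.

    Returns 'NA' when the reference ends before the context can be decided.
    """
    ref = ref.upper()
    if ref[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    first = ref[pos + 1:pos + 2]
    second = ref[pos + 2:pos + 3]
    if first == "G":
        return "CG"
    if first == "" or second == "":
        return "NA"
    return "CHG" if second == "G" else "CHH"


print(cytosine_context("TACGAA", 2), cytosine_context("CTAA", 0))
```

The two calls use the site sequences from the description's own example file (TACGAA and CTAA) and reproduce the types listed there.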
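In step S5, after the methylated cytosines are confirmed and typed, the sequencing depth of each site determined to be a methylated cytosine is counted, and a file similar to the methylated single-nucleotide annotation used in BS sequencing is produced. It details the chromosome, sequence coordinate, strand, depth, digestion recognition site and cytosine type of each site determined to be a methylated cytosine. Finally, the total number and coverage of sites determined to be methylated cytosines are counted to give the methylation status from whole-genome MspJI digestion. An exemplary file format similar to the methylated single-nucleotide annotation in BS sequencing is as follows:
Chr1 17 + 3 CNNR CTAA CHH
Chr1 24 + 3 CNNR CTAA CHH
Chr1 1649 + 8 YNCGNR TACGAA CG
Chr1 1650 - 10 YNCGNR TACGAA CG
Column 1: chromosome number;
Column 2: position of the cytosine site;
Column 3: strand information;
Column 4: number of covering methylated reads;
Column 5: recognition-site type;
Column 6: specific site sequence;
Column 7: C site type;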
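A file in the seven-column layout listed above is straightforward to load. The following is a minimal sketch that assumes whitespace-separated fields in the column order given; the `MethylSite` class and `parse_site` function are hypothetical names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class MethylSite:
    chrom: str       # column 1: chromosome number
    pos: int         # column 2: position of the cytosine site
    strand: str      # column 3: '+' or '-'
    reads: int       # column 4: number of covering methylated reads
    site_type: str   # column 5: recognition-site type, e.g. YNCGNR
    site_seq: str    # column 6: specific site sequence
    context: str     # column 7: C site type (CG, CHG or CHH)


def parse_site(line: str) -> MethylSite:
    """Parse one whitespace-separated annotation line into a MethylSite."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    chrom, pos, strand, reads, site_type, site_seq, context = fields
    return MethylSite(chrom, int(pos), strand, int(reads),
                      site_type, site_seq, context)


site = parse_site("Chr1 1649 + 8 YNCGNR TACGAA CG")
print(site)
```

Rejecting lines that do not have exactly seven fields makes rows with a dropped column fail loudly instead of shifting values into the wrong field.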
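In the present invention, other related analyses can also be performed by combining the characteristics of the plant genome used to analyze the distribution of methylated cytosines on the genome, for example the distribution over each gene element, in repeat-sequence regions, and in some local regions.
Example:
Sample: one whole-genome sample from leaf tissue of Arabidopsis thaliana ecotype Columbia;
Sequencing strategy: single ends (SE) Illumina sequencing datasets;
The specific workflow is described below with reference to Figure 1: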
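Step S1 includes DNA extraction, digestion with selection and recovery of the digested fragments, SE library construction, and sequencing. After DNA was extracted from plant leaf tissue with cetyltrimethylammonium bromide (CTAB), the sample was purified by phenol:chloroform extraction and ethanol precipitation. Sample quality was checked by 1% agarose gel electrophoresis (Figure 3); after passing, the sample was digested with MspJI (purchased from New England Biolabs (NEB)). Starting from the recommended digestion system that the NEB website provides for MspJI, the following changes were made for plant genomes: 1.5 µg of Arabidopsis genomic DNA was digested with 12 U (3 µL) of enzyme, and an oligonucleotide catalyst was added to a final concentration of 0.8 µM, which markedly improved digestion compared with before. After 16 hours of digestion, the products were run on a 15% non-denaturing polyacrylamide gel, and the 26-38 bp digested fragments, including the main band, were recovered (Figure 4), purified by a soak-and-crush method, and used to construct the DNA library. The range of fragment recovery was widened here in order to detect the large number of methylated cytosines present in non-CpG form in the Arabidopsis genome. Library construction followed the commonly used Illumina PE library workflow: end repair, addition of "A" to the ends, adapter ligation and PCR amplification, giving a product 146-158 bp in size. After each step the product must be recovered by phenol:chloroform extraction and ethanol precipitation before the next reaction. The PCR product was checked and recovered by 2% agarose gel electrophoresis (Figure 5) and purified with the QIAquick gel purification kit. Library quality control was performed on the Bioanalyzer system, followed by SE50 sequencing on an Illumina HiSeq2000 sequencer.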
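In step S2, the sequences output by the sequencer are preferably filtered to remove unqualified sequences, which fall into two cases: 50% or more of the bases in the sequence have a sequencing quality below 20, or 10% or more of the bases in the sequence are undetermined. After unqualified sequences are removed, the reads are preferably further screened, retaining complete sequences that contain no sequencing adapter and sequences that are 28-34 bp long after adapter removal.
The sequencing reads, that is, the digested fragments, are then located on the whole genome. Because the fragments are short, some reads may fail to align or may align to multiple positions and so cannot be located. The alignment software Soap2.20 (obtained from soap.genomics.org.cn/) was used for two rounds of alignment: 1) set the software parameters to allow two mismatches per seed sequence and at most 4 mismatches per read, and obtain the first alignment result; 2) reset the Soap2.20 parameters so that no mismatches are allowed, and realign the reads that mapped to multiple positions or failed to map in the first round, obtaining the second alignment result; 3) merge the two results and compute the alignment rate and the unique alignment rate; see Table 1. Table 1 gives the total raw data for the Arabidopsis sample, the amount of data remaining after filtering and screening, and the total number of sequences that finally align uniquely to the Arabidopsis genome. The short digested sequences and the actual distribution of methylation sites lead to a relatively low unique alignment rate.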
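Table 1. Statistics of Arabidopsis data output, filtering and alignment
Sample       Raw sequences   Filtered sequences   Aligned sequences    Uniquely aligned sequences
Arabidopsis  43578097        32107319 (100%)      26222436 (81.67%)    6002281 (18.69%)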
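The percentages in Table 1 are taken relative to the filtered read count. A short check, using only the counts printed in the table (the helper name `percent` is hypothetical), looks like this:

```python
# Read counts copied from Table 1.
raw_reads = 43_578_097
filtered_reads = 32_107_319
aligned_reads = 26_222_436
unique_reads = 6_002_281


def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


alignment_rate = percent(aligned_reads, filtered_reads)
unique_rate = percent(unique_reads, filtered_reads)
print(alignment_rate, unique_rate)
```

Both values reproduce the table (81.67% and 18.69%), which confirms that the denominator is the filtered set and not the raw output.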
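In step S3, for the uniquely aligned reads, the position of the methylated cytosine can be determined from the relationship between the type of enzyme recognition site and the fragment length, and the reads are classified according to the sequence features surrounding the methylated cytosine. First, the MspJI cleavage characteristics (Fig. 6) are used to decide whether a uniquely aligned read contains a methylated cytosine: if the corresponding MspJI recognition site is found at the specific distance from the cut end, the cytosine in that site is a methylated cytosine. Allowing for a 1-2 base fluctuation in the cleavage position, the 28-34 bp digested fragments are divided into 8 kinds containing a fully methylated recognition site (sites in which both the C and the C complementary to the G are methylated), namely YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR and YNNGNCNNR, and 2 kinds containing a hemi-methylated recognition site, CNNR and YNNG, giving 10 types in all; see Tables 2 and 3. Table 2 gives the coverage and depth, on each chromosome, of the reads confirmed to contain methylated cytosine. Table 3 gives statistics of the recognition-site types containing methylated cytosine on the uniquely aligned fragments. Note that the purpose of this classification is to make it easier to find and locate methylated cytosine sites from the relationship between fragment length and recognition-site type; when reads are counted, however, the same read can be counted under more than one site type (for example, a TCCGGA fragment is counted once under YNCGNR and once under YCNGR). Even so, it can be seen that site types YNCGNR and YCNGR, together with the sites cut in one direction only (CNNR and YNNG), account for a large proportion of all types.

Table 2. Distribution on each chromosome of the reads confirmed to contain methylated cytosine
Chromosome  Reads    Total length (bp)  Covered length (bp)  Depth (X)
Chr1        858094   27588022           8430027              3.27
Chr2        809360   25849788           5822740              4.44
Chr3        1224824  39586855           6872663              5.76
Chr4        923907   30126662           5460987              5.52
Chr5        1278842  41120637           7777612              5.29
ChrC        882831   28667142           246026               116.52
Total       5977858  192939106          34610055             5.57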
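The depth column of Table 2 is consistent with dividing the total read length by the length of reference sequence covered. The short check below recomputes it from the two length columns; the dictionary and function names are chosen for this example only.

```python
# (total read length in bp, covered reference length in bp), copied from Table 2
TABLE2 = {
    "Chr1": (27588022, 8430027),
    "Chr2": (25849788, 5822740),
    "Chr3": (39586855, 6872663),
    "Chr4": (30126662, 5460987),
    "Chr5": (41120637, 7777612),
    "ChrC": (28667142, 246026),
}


def mean_depth(total_bp, covered_bp):
    """Average depth over the covered positions."""
    return total_bp / covered_bp


for name, (total_bp, covered_bp) in TABLE2.items():
    print(f"{name}\t{mean_depth(total_bp, covered_bp):.2f}")

# Genome-wide figure: sum the columns first, then divide.
total = sum(t for t, _ in TABLE2.values())
covered = sum(c for _, c in TABLE2.values())
print(f"Total\t{mean_depth(total, covered):.2f}")
```

The much higher value for ChrC reflects a similar number of reads falling on a far shorter covered length.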
Table 3. Statistics of recognition-site types containing methylated cytosine on the uniquely aligned fragments
Site type   Reads    Percentage
YNCGNR      920954   15.34%
YCNGR       418696   6.98%
YNNGCNNR    183789   3.06%
YNNGNCNNR   193914   3.23%
CNNG        449739   7.49%
GNNC        226264   3.77%
CYNRG       3438     0.06%
CNYRNG      2191     0.04%
CNNR        863932   14.39%
YNNG        713926   11.89%
NA          2025438  33.74%
Total       6002281  100.00%
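The ten site types are written with IUPAC ambiguity codes (Y = C or T, R = A or G, N = any base), so each can be turned into a regular expression. The sketch below only tests whether a motif occurs anywhere in a sequence; it deliberately ignores the distance-from-cut-end rule of Fig. 6, which the real classification depends on, and the function names are invented for the example. It does show why one fragment can be counted under several types.

```python
import re

# IUPAC codes needed for the ten site types.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

SITE_TYPES = ["YNCGNR", "YCNGR", "CNNG", "GNNC", "CYNRG",
              "CNYRNG", "YNNGCNNR", "YNNGNCNNR", "CNNR", "YNNG"]


def motif_regex(motif):
    """Compile an IUPAC motif such as 'YNCGNR' into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in motif))


PATTERNS = {motif: motif_regex(motif) for motif in SITE_TYPES}


def matching_site_types(seq):
    """Return every site type whose motif occurs somewhere in seq."""
    seq = seq.upper()
    return [motif for motif, pattern in PATTERNS.items() if pattern.search(seq)]


print(matching_site_types("TCCGGA"))
```

For TCCGGA the result includes both YNCGNR and YCNGR, matching the double counting described for Table 3.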
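In step S4, according to the recognition-site type of each read, combined with its alignment position on the Arabidopsis reference sequence (TAIR8), the position of the methylated cytosine in the genome is located, and finally the basic type of the methylated cytosine (i.e. CG, CHG or CHH) is determined. The distribution of each recognition site and cytosine type is tallied, and SeqLogo is used to describe the sequence features of all captured sequences of each type; see Fig. 7. Fig. 7 shows the distribution trend of the sites identified as methylated cytosines on Arabidopsis chromosome 1. The overall trend of the distribution can be seen from the figure, and methylated cytosine sites are relatively densely distributed near the centromere.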
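Assigning the basic type (CG, CHG or CHH, with H being A, C or T) only needs the two reference bases that follow the cytosine on its own strand. The function below is an illustrative sketch: its name, the 0-based `pos` convention and the handling of sequence ends are assumptions, not details taken from the description.

```python
# Complement table used to read the minus strand.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cytosine_context(ref, pos, strand="+"):
    """Classify the cytosine at 0-based index pos of ref as CG, CHG or CHH.

    For strand '-', pos is the index of the G on the plus strand that pairs
    with the minus-strand cytosine. Returns None if the base is not a
    cytosine on the requested strand or the context cannot be resolved.
    """
    ref = ref.upper()
    if strand == "+":
        tri = ref[pos:pos + 3]
    else:
        # Reverse-complement the window ending at pos to read 5'->3' on '-'.
        tri = ref[max(pos - 2, 0):pos + 1].translate(COMPLEMENT)[::-1]
    if len(tri) < 2 or tri[0] != "C":
        return None
    if tri[1] == "G":
        return "CG"
    if len(tri) < 3 or "N" in tri:
        return None
    return "CHG" if tri[2] == "G" else "CHH"


print(cytosine_context("TACGAA", 2, "+"))
print(cytosine_context("TACGAA", 3, "-"))
print(cytosine_context("CTAA", 0, "+"))
```

The first two calls describe the two strands of the same CG dinucleotide, which is why a fully methylated CG site yields one record per strand.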
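In step S5, after the methylated cytosines have been confirmed and typed, the sequencing depth of each site identified as a methylated cytosine is counted, and a file similar to the methylated single-nucleotide annotation in BS-sequence is produced. It describes in detail, for each site identified as a methylated cytosine, the chromosome, sequence coordinate, strand, depth, enzyme recognition site, cytosine type and other information. The total number and coverage of the sites identified as methylated cytosines are also tallied, giving the genome-wide methylation status revealed by MspJI digestion; see Fig. 8.
The upper left of Fig. 8 shows the distribution of the captured methylated cytosines across all Arabidopsis genes and the 2000 bp upstream and downstream of them. The overall distribution agrees with reports in the literature: gene bodies have a higher methylation level than the upstream and downstream regions, and the relative methylation level near the TSS is very low. The upper right shows the distribution of all digested fragments among repetitive sequence elements; about 45% of the fragments fall within repetitive sequence elements. The lower panel likewise shows, for Arabidopsis chromosome 1, the distribution of the number of methylated cytosines, the covered length of the sequenced fragments and the length of repetitive sequences, from which the relationship between the distribution of methylated cytosines and repetitive sequences can be seen.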
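Counting the depth of each methylated site amounts to tallying how many reads support the same (chromosome, position, strand) call. The sketch below assumes a hypothetical upstream step that emits one tuple per supporting read; the tuple layout, the tab separator and the function names are choices made for this example.

```python
from collections import Counter


def site_depths(calls):
    """Tally supporting reads per site.

    calls -- iterable of (chrom, pos, strand, site_type, site_seq, context),
             one tuple per read that supports the site (hypothetical input).
    """
    return Counter(calls)


def format_annotation(depths):
    """Render the tallies as seven-column annotation lines, sorted by site."""
    lines = []
    for (chrom, pos, strand, site_type, site_seq, context), depth in sorted(depths.items()):
        lines.append(f"{chrom}\t{pos}\t{strand}\t{depth}\t{site_type}\t{site_seq}\t{context}")
    return lines


calls = [("Chr1", 17, "+", "CNNR", "CTAA", "CHH")] * 3
calls += [("Chr1", 1649, "+", "YNCGNR", "TACGAA", "CG")] * 8
for line in format_annotation(site_depths(calls)):
    print(line)
```

The number of distinct keys in the tally gives the total count of methylated cytosine sites.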
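An exemplary file format, similar to the methylated single-nucleotide annotation in BS-sequence, is as follows:
Chr1 17 + 3 CNNR CTAA CHH
Chr1 24 + 3 CNNR CTAA CHH
Chr1 1649 + 8 YNCGNR TACGAA CG
Chr1 1650 - 10 YNCGNR TACGAA CG
Column 1: chromosome number;
Column 2: position of the cytosine site;
Column 3: strand (plus or minus);
Column 4: number of methylated reads covering the site;
Column 5: recognition-site type;
Column 6: specific site sequence;
Column 7: C site type.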
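A record in this seven-column format can be read into a small typed structure. The parser below is a sketch under the assumption that fields are separated by whitespace; the `MethylSite` name and its field names are invented for the example.

```python
from collections import namedtuple

# One record of the seven-column annotation file.
MethylSite = namedtuple(
    "MethylSite", "chrom pos strand depth site_type site_seq context"
)


def parse_site_line(line):
    """Parse one whitespace-separated annotation line into a MethylSite."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    chrom, pos, strand, depth, site_type, site_seq, context = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    return MethylSite(chrom, int(pos), strand, int(depth), site_type, site_seq, context)


site = parse_site_line("Chr1 1649 + 8 YNCGNR TACGAA CG")
print(site.chrom, site.pos, site.depth, site.context)
```

Rejecting lines that do not have exactly seven columns catches records in which a field, such as the strand sign, has been lost.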
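Other related analyses were also performed: taking into account the characteristics of the plant genome used, the distribution of methylated cytosines across the genome was analysed, for example their distribution over each gene element, over repetitive-sequence regions and over some local regions; see Fig. 9. Fig. 9 shows the correlation, within corresponding intervals, between the mCG, mCHG and mCHH sites and BS sequencing data (for the experimental steps, see below). The abscissa is the methylation level obtained from the present digestion-based sequencing, the ordinate is the methylation level from BS sequencing, and the interval length is 50 kb. The correlation levels given in the figure show that mCG and mCHG correlate better than mCHH. This result is consistent with the existing understanding in the art and demonstrates the effectiveness of the method of the invention.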
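A window-based comparison of this kind can be sketched in two steps: bin site positions into 50 kb intervals, then correlate the per-interval values from the two methods. The description does not state which correlation coefficient was used or how the per-interval methylation level was computed, so the Pearson coefficient and the simple site count per window below are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def window_counts(positions, window=50_000):
    """Count sites per fixed-size window; keys are window indices."""
    return Counter(pos // window for pos in positions)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-window values for the same windows from the two methods.
mspji_levels = [12, 30, 45, 8]
bs_levels = [0.10, 0.28, 0.41, 0.07]
print(pearson(mspji_levels, bs_levels))
```

Running this separately for the mCG, mCHG and mCHH sites would give one coefficient per context, which is the kind of comparison shown in Fig. 9.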
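The same sample as above was subjected to the following experimental steps to obtain the BS sequencing data.
1. DNA was extracted from plant leaf tissue with cetyltrimethylammonium bromide (CTAB) and purified by phenol:chloroform extraction and ethanol precipitation. After sample quality was confirmed by 1% agarose gel electrophoresis, the DNA was sheared by sonication into fragments of 100-300 bp. Library construction followed the standard Illumina PE library workflow, comprising end repair, addition of an "A" to the ends, adapter ligation and PCR amplification; after each step the DNA had to be recovered by phenol:chloroform extraction and ethanol precipitation before proceeding to the next reaction.
2. Following the manufacturer's instructions, the DNA sample was bisulfite-treated with the ZYMO EZ DNA Methylation-Gold kit (ZYMO EZ DNA Methylation-Gold kit, commercially available from http://www.bioon.com.cn/reagent/show product.asp?id=6078).
3. The DNA was checked on and recovered from a 2% agarose gel, purified with the QIAquick gel purification kit, and subjected to fragment size selection; after PCR amplification, fragment size selection was performed again. Library quality control was performed on a Bioanalyzer system, followed by SE50 sequencing on an Illumina HiSeq2000 sequencer and analysis of the results.
The above are merely ordinary embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.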

Claims

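1. A method for detecting genomic DNA methylation, comprising the steps of:
1) digesting a genomic DNA sample with the modification-dependent restriction endonuclease MspJI to obtain fragments;
2) sequencing the fragments to obtain sequenced fragments;
3) aligning the sequenced fragments to a reference genome sequence and selecting the uniquely aligned sequences;
4) for the uniquely aligned sequences, determining as methylated the positions on the reference genome sequence that correspond to the C sites in YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR, YNNGNCNNR, CNNR and YNNG and to their complementary C sites.
2. The method of claim 1, wherein step 1) further comprises collecting fragments 28-34 bp in length after digestion.
3. The method of claim 1, wherein in step 2) the sequencing uses an Illumina Solexa, ABI SOLiD and/or Roche 454 sequencing platform.
4. The method of claim 1, wherein two rounds of alignment are performed in step 3), as follows: allowing two mismatches per seed sequence and at most 4 mismatches per sequenced read, to obtain an alignment result; with no mismatches allowed during alignment, aligning once more the reads that aligned to multiple positions or failed to align in the first round, to obtain a second alignment result; and merging the two alignment results.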
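5. A method for analysing genomic methylation, comprising the following steps:
1) digesting a genomic DNA sample with the modification-dependent restriction endonuclease MspJI to obtain fragments;
2) sequencing the fragments to obtain sequenced fragments;
3) aligning the sequenced fragments to a reference genome sequence and selecting the uniquely aligned sequences;
4) for the uniquely aligned sequences, in YNCGNR, YCNGR, CNNG, GNNC, CYNRG, CNYRNG, YNNGCNNR, YNNGNCNNR, CNNR and YNNG, taking the C sites and the C sites complementary to G as methylated C sites, thereby determining that the corresponding positions on the reference genome sequence are methylated;
5) among the above methylated cytosine sites, tallying the distribution of types CG, CHG or CHH, where H is C, A or T;
6) annotating one or more of the following items of information on a whole-genome map: the sequencing depth of each methylated cytosine site; methylated single-nucleotide annotation information; for each site identified as a methylated cytosine, the chromosome number, position of the cytosine site, strand, depth, enzyme recognition site, cytosine type and other information; and the total number and coverage of the positions identified as methylated cytosines; thereby obtaining a whole-genome methylation map.
6. The method of claim 5, wherein step 1) further comprises collecting fragments 28-34 bp in length after digestion.
7. The method of claim 5, wherein in step 2) the sequencing uses an Illumina Solexa, ABI SOLiD and/or Roche 454 sequencing platform.
8. The method of claim 5, wherein two rounds of alignment are performed in step 3), as follows: allowing two mismatches per seed sequence and at most 4 mismatches per sequenced read, to obtain an alignment result; with no mismatches allowed during alignment, aligning once more the reads that aligned to multiple positions or failed to align in the first round, to obtain a second alignment result; and merging the two alignment results.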
PCT/CN2011/002242 2011-12-31 2011-12-31 Method for analyzing DNA methylation based on MspJI cleavage WO2013097060A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/369,447 US20140364321A1 (en) 2011-12-31 2011-12-31 Method for analyzing DNA methylation based on MspJI cleavage
PCT/CN2011/002242 WO2013097060A1 (zh) 2011-12-31 2011-12-31 Method for analyzing DNA methylation based on MspJI cleavage

Publications (1)

Publication Number Publication Date
WO2013097060A1 (zh) 2013-07-04

Family

ID=48696159

Country Status (2)

Country Link
US (1) US20140364321A1 (zh)
WO (1) WO2013097060A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3518656A4 (en) * 2016-09-30 2020-09-30 Monsanto Technology LLC METHOD OF SELECTING TARGETS FOR SITE-SPECIFIC GENOME MODIFICATION IN PLANTS
CN110322928B (zh) * 2019-08-16 2022-09-13 Hohai University (Changzhou Campus) DNA methylation profile detection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2376632B1 (en) * 2008-12-23 2016-11-02 New England Biolabs, Inc. Compositions, methods and related uses for cleaving modified dna
US20100216648A1 (en) * 2009-02-20 2010-08-26 Febit Holding Gmbh Synthesis of sequence-verified nucleic acids

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
COHEN-KARNI D ET AL.: "The MspJI family of modification-dependent restriction endonucleases for epigenetic studies.", PNAS., vol. 108, no. 27, 5 July 2011 (2011-07-05), pages 11040 - 11045, XP055098783, DOI: doi:10.1073/pnas.1018448108 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016024182A1 (en) * 2014-08-13 2016-02-18 Vanadis Diagnostics Method of estimating the amount of a methylated locus in a sample
CN107109468A (zh) * 2014-08-13 2017-08-29 苏州新波生物技术有限公司 Method of estimating the amount of a methylated locus in a sample
US10174383B2 (en) 2014-08-13 2019-01-08 Vanadis Diagnostics Method of estimating the amount of a methylated locus in a sample
CN107109468B (zh) * 2014-08-13 2020-09-08 苏州新波生物技术有限公司 Method of estimating the amount of a methylated locus in a sample
US10876169B2 (en) 2014-08-13 2020-12-29 Vanadis Diagnostics Method and kit for estimating the amount of a methylated locus in a sample
WO2016061624A1 (en) * 2014-10-20 2016-04-28 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis
US10889852B2 (en) 2014-10-20 2021-01-12 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis

Also Published As

Publication number Publication date
US20140364321A1 (en) 2014-12-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11878998

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14369447

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/11/2014)

122 Ep: pct application non-entry in european phase

Ref document number: 11878998

Country of ref document: EP

Kind code of ref document: A1