WO2024114696A1 - CpG island methylation enrichment sequencing technology based on restriction enzyme digestion - Google Patents

CpG island methylation enrichment sequencing technology based on restriction enzyme digestion

Info

Publication number
WO2024114696A1
Authority
WO
WIPO (PCT)
Prior art keywords
digestion
dna
methylation
sequencing
round
Prior art date
Application number
PCT/CN2023/135179
Other languages
French (fr)
Chinese (zh)
Inventor
姜正文
方欧
王果
林芳斌
Original Assignee
天昊基因科技(苏州)有限公司
上海天昊生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天昊基因科技(苏州)有限公司, 上海天昊生物科技有限公司
Publication of WO2024114696A1

Definitions

  • The present invention relates to the field of DNA sequencing, and in particular to a CpG island methylation enrichment sequencing technology based on restriction enzyme digestion.
  • DNA methylation modification is crucial to normal gene expression and cell function, and is involved in regulating the most basic life activities such as gene expression and chromatin stability.
  • Gene methylation polymorphism is an important cause of individual phenotypic differences, and gene methylation variation can also lead to individual phenotypic abnormalities.
  • A large number of studies have shown that, compared with normal cells, the methylation profile of tumor cell genes changes extensively and significantly, and cancer-type-specific methylation variant genes have been found in different cancer types. Therefore, gene methylation variation can be used as a pan-cancer biomarker.
  • The discovery and study of methylation tumor markers rely on differential methylation analysis of genomic DNA from healthy tissues and cancer tissues.
  • WGBS Whole-genome bisulfite sequencing
  • RRBS Reduced representation bisulfite sequencing
  • Methylation detection for specific sites/regions usually uses methods such as MSP (Methylation-specific PCR), BSP (Bisulfite-Sequencing PCR) and MSRE-qPCR.
  • ctDNA (plasma cell-free tumor DNA), the focus of liquid biopsy, carries methylation information from tumor tissue, and its detection enables cancer screening, companion diagnosis and prognosis monitoring. Because ctDNA has a low copy number and is highly degraded, it is difficult to detect, and weak methylation marker signals can be masked by background noise. Early applications usually relied on TaqMan probe qPCR to detect designated methylation sites or haplotypes. In recent years, NGS-based whole-genome methylation sequencing (WGBS) and target-region methylation sequencing with probe hybridization capture have also been developed for omics-level ctDNA detection.
  • WGBS whole genome methylation sequencing
  • This type of method can integrate the methylation information of a large number of sites, train mathematical models, and maximize the detection sensitivity and specificity. At the same time, it has the characteristics of tissue tracing, so it has gradually become the mainstream technology for pan-cancer screening.
  • However, in practice, both methods have limitations.
  • Although WGBS can obtain whole-genome methylation information, it is limited by sequencing cost and therefore has a low sequencing depth. For example, a typical 90G sequencing volume yields an average sequencing depth of only about 20X, so detection accuracy and sensitivity are poor.
  • The methylation variation regions that can actually be used to indicate cell carcinogenesis account for only a very small part of the genome, so whole-genome sequencing is a strategy with poor cost-effectiveness.
  • Methylation sequencing of target regions based on probe capture does make up for the insufficient depth of WGBS, but because methylation haplotype patterns are complex (the number of haplotypes increases exponentially with the number of CpG sites in the target region), there are still many technical difficulties in achieving efficient, stable and accurate capture of the target region.
  • The additional steps of probe synthesis and target-region hybridization capture also increase the difficulty and cost of the experiment.
  • the purpose of the present invention is to provide a CpG island methylation enrichment sequencing technology based on restriction enzyme cutting.
  • a method for constructing a DNA methylation sequencing library comprising the steps of:
  • the DNA sample to be tested is selected from the following group: genomic DNA (gDNA), cell-free DNA (cfDNA), or a combination thereof.
  • In step S1), the DNA sample to be tested is fragmented, preferably to 100-600 bp, more preferably to 200-400 bp.
  • In step S2), the adapter is a methylated adapter in which all cytosines (C) are 5-methylated.
  • The adapter does not contain the sequence AATT or TTAA.
  • The adapter comprises a first strand and a second strand; the 3' end of the first strand is partially complementary to the 5' end of the second strand, and after the two strands are annealed, a single A base of the first strand protrudes.
  • The 5' end of the first strand is modified with a phosphate group.
  • The nucleotide sequence of the first strand is shown in SEQ ID NO.1.
  • The nucleotide sequence of the second strand is shown in SEQ ID NO.2.
  • In step S2), end repair and A-tailing steps are also included before adapter ligation.
  • In step S2), a sorting and/or purification step is also included after ligation of the methylated adapter.
  • the sorting step includes sorting the length of the DNA insert fragments, preferably 200-400 bp, more preferably 250-350 bp.
  • the purification step includes purifying and recovering DNA fragments with double-ends connected to methylated adapters.
  • the sorting and/or purification includes magnetic bead sorting and/or purification.
  • In step S3), the DNA is subjected to one or more rounds of AT digestion.
  • In step S3), the AT digestion is performed before CT conversion, after CT conversion, or both before and after CT conversion.
  • The non-methylated site digestion is performed before CT conversion.
  • The non-methylated site digestion is performed before AT digestion, after AT digestion, or simultaneously with AT digestion in the same reaction system.
  • step S3) further includes the step of PCR amplification.
  • step S3) comprises a step selected from the following group:
  • step S3) the AT enzyme cleavage is performed using an enzyme that can recognize and cleave AT-rich sequence DNA.
  • the enzyme is selected from the group consisting of restriction endonucleases, CRISPR gene editing enzymes, ZFNs, TALENs, meganucleases, or combinations thereof.
  • the AT digestion is performed using a restriction endonuclease whose digestion recognition site contains only A and T bases.
  • the enzyme used for the AT cleavage is selected from the following group: MluCI, MseI, SspI, PsiI, AseI, DraI, PacI, AnaI, AcsI, AgsI, ApoI, AflII, BfrI, BspTI, BstAFI, EcoRI, EcoRV, FaiI, FauNDI, HpaI, KspAI, MfeI, MspCI, MssI, MunI, NdeI, PmeI, PshBI, SaqAI, SmiI, Sse9I, TasI, Tru1I, Tru9I, TspDTI, Tsp509I, VspI, XapI, HindIII, NsiI, NspV, PagI, PciI, SfuI, SnaBI, BfrBI, ClaI, ScaI, SwaI, or a combination thereof.
  • the AT digestion is single digestion with MseI, single digestion with MluCI, or double digestion with MseI and MluCI.
  • the CT conversion is bisulfite chemical conversion or APOBEC deaminase conversion, preferably APOBEC deaminase conversion.
  • the CT conversion comprises the steps of:
  • TET2 enzyme oxidatively protects the 5mC base
  • the CT conversion comprises the steps of:
  • step S3 the non-methylated site cleavage is performed using a 5mC methylation-sensitive restriction endonuclease or a combination of endonucleases.
  • the 5mC methylation-sensitive restriction endonuclease or endonuclease combination is selected from the following group: HpaII endonuclease, BstUI endonuclease, FspI endonuclease, or a combination thereof.
  • step S3) the first round of PCR amplification is performed using a high-fidelity DNA amplification enzyme capable of amplifying templates containing uracil (U base).
  • the first round of PCR amplification is performed using adapter sequence-specific primers.
  • the first round of PCR amplification uses a forward primer nucleotide sequence as shown in SEQ ID NO.3, and a reverse primer nucleotide sequence as shown in SEQ ID NO.4.
  • the second round of PCR amplification uses a forward primer nucleotide sequence as shown in SEQ ID NO.5 and a reverse primer nucleotide sequence as shown in SEQ ID NO.6.
  • each AT digestion and/or non-methylation site digestion step may be followed by a sorting and/or purification step.
  • a DNA methylation sequencing library is provided.
  • the methylation sequencing library is constructed using the method described in the first aspect of the present invention.
  • a method for detecting DNA methylation in a sample comprising the steps of:
  • the sequencing includes using an Illumina sequencing platform or a BGI sequencing platform.
  • a method for diagnosing or predicting diseases related to abnormal DNA methylation comprising the steps of obtaining a DNA sample from a subject to be tested, and detecting methylation of the DNA sample using the method described in the first aspect of the present invention, thereby diagnosing or predicting the disease.
  • the disease is a tumor.
  • the subject is a human or non-human mammal.
  • Figure 1 shows the experimental flow chart of the CpG island methylation enrichment sequencing technology based on restriction enzyme digestion.
  • Figure 3 shows the ratio of the sequencing depth of each CpG island after the first round of AT digestion to the depth before digestion in Example 2 when the same amount of sequencing data is used, displayed in a histogram.
  • the black-highlighted portion over the horizontal-axis interval [0,1] indicates the number of CpG islands whose sequencing depth after the first round of AT digestion is lower than without digestion.
  • Figure 4 shows a histogram of sequencing depths of the three libraries treated differently in Example 3 in the commonly detected CpG island regions, with the horizontal axis representing sequencing depths.
  • the top, middle and bottom panels show the libraries with "no restriction enzyme digestion", "first round AT digestion + non-methylated site digestion" and "first round AT digestion + non-methylated site digestion + second round AT digestion", respectively.
  • the dotted line represents the mean of the sequencing depths of the libraries in these common CpG islands.
  • Figure 5 shows the enrichment effect of enzyme digestion on methylated CpG islands.
  • MseI and MluCI were used for the first round of AT digestion, and HpaII was used for non-methylated site digestion; MseI and MluCI were used for the second round of AT digestion before library construction and sequencing.
  • the results of enrichment sequencing of CpG sites in the promoter region of SEPTIN9 are shown: gray lines represent sequencing reads, NEGATIVE and POSITIVE represent the negative and positive strands, respectively; black squares represent methylated CpG sites, and white squares represent non-methylated CpG sites.
  • After extensive and in-depth research, the inventors have developed a methylation next-generation sequencing library construction method that introduces restriction enzyme digestion steps during library construction. Exploiting the fact that CpG islands are rich in C and G bases, combinations of restriction enzymes with different recognition-site characteristics are used to achieve enrichment sequencing of methylated CpG islands.
  • The present invention is suitable for research and applications targeting CpG island methylation, not only for conventional genomic DNA but also for methylome detection in trace amounts of ctDNA, providing a new solution for tumor liquid biopsy based on ctDNA methylation. The present invention was completed on this basis.
  • The present invention combines AT digestion, CT conversion and non-methylated site digestion.
  • Through the AT digestion step, the interference of a large number of AT-rich fragments can be removed, sequencing efficiency can be improved, and CpG islands can be enriched.
  • Through non-methylated site digestion, the interference of non-methylated CG fragments can be removed, sequencing efficiency can be improved, and CpG islands containing methylated sites can be enriched.
  • In the CT conversion step, on the one hand, non-methylated cytosine is converted so that only methylated cytosine is retained as C in the sequence; on the other hand, CT conversion generates new AT digestion sites in the sequence, so a second round of AT digestion can be performed to improve enrichment efficiency.
  • The terms "methylation library" and "enzyme digestion sequencing library" have the same meaning and refer to a library obtained from double-stranded DNA fragments that are end-repaired, A-tailed and ligated to Y-shaped methylated adapters, then subjected to one or more steps selected from AT digestion, non-methylated site digestion, CT conversion, or a combination thereof, and finally enriched by PCR amplification.
  • The term "AT digestion" refers to digesting the whole-genome methylation sequencing library with a restriction endonuclease whose recognition sequence contains only A/T bases.
  • Restriction endonucleases whose recognition sequence contains only A/T include any enzyme that can cut AT-rich sequences, such as but not limited to MseI (recognition site TTAA) and MluCI (recognition site AATT).
  • The term "non-methylated site digestion" refers to digesting the whole-genome methylation sequencing library with a 5mC methylation-sensitive restriction endonuclease.
  • 5mC methylation-sensitive restriction endonucleases include but are not limited to HpaII (recognition site CCGG).
  • The term "CT conversion" refers to the deamination of cytosine into uracil or thymine.
  • CT conversion can be performed using an enzyme treatment method, wherein the 5mC base is protected by TET2 enzyme oxidation, and then the unprotected C base (unmethylated C base) is completely deaminated and converted into U base using APOBEC deaminase.
  • Alternatively, unmethylated C bases can be converted into U bases by conventional chemical bisulfite treatment. Enzyme treatment is preferred because it is milder and less likely to cause DNA strand breaks.
  • AT digestion can be performed only before CT conversion, only after CT conversion, or both before and after CT conversion.
  • Non-methylated site digestion is performed before CT conversion, and can be performed separately from or together with AT digestion.
  • universal primers at both ends of the library are used to enrich DNA fragments whose insert fragments do not contain the above-mentioned digestion sites.
  • the present invention utilizes the characteristics of CpG islands being rich in C and G bases, utilizes AT digestion and non-methylated site digestion to simplify the sequencing library, retains CpG island information, and can utilize the sequence changes after CT conversion to perform a second round of AT digestion to further improve the simplification efficiency.
  • Non-methylated site digestion further enriches the CpG island information that has undergone methylation.
  • two rounds of restriction enzyme digestion can be performed to remove AT-rich regions (AT digestion) or unmethylated regions (unmethylated site digestion). Performing two rounds of digestion steps can maximize the enrichment of CpG islands that have undergone methylation modification. After CT conversion, the sequence will change, and the new sequence may therefore constitute a new AT digestion site.
  • the method of the present invention can refer to Figure 1, comprising the steps of:
  • DNA fragments are first end-repaired and A-tailed, and methylated adapters are connected to both ends of the DNA fragments;
  • the methylated adapter comprises a first strand and a second strand in which all cytosines (C) are 5-methylated; the 5' end of the first strand is modified with a phosphate group, the 3' end of the first strand is partially complementary to the 5' end of the second strand, and after the two strands are annealed, a single A base of the first strand protrudes;
  • the type of DNA that can be used for library construction can be genomic DNA (gDNA) or cell-free DNA (cfDNA);
  • a high-fidelity DNA amplification enzyme capable of amplifying templates containing uracil (U base) is used to PCR-amplify the fragments whose inserts do not contain restriction sites, yielding a single-round digestion sequencing library;
  • a second round of PCR amplification is performed using primers matching the universal adapter to enrich the fragments whose inserts still do not contain the digestion site after CT conversion, and obtain a twice-digested sequencing library;
  • the sequencing library is sequenced on a sequencing machine.
  • the sequencing platform is determined based on the connected methylated adapters.
  • the Illumina sequencing platform or the BGI sequencing platform is used.
  • the method of the present invention reduces the sequencing cost while ensuring the acquisition of effective information of CpG islands as much as possible.
  • the CpG island enrichment effect can be as high as 10 times.
  • the method of the present invention eliminates the complex, high-cost and uncertain methylation probe design and capture steps.
  • the method of the present invention is compatible with various types of DNA samples, including genomic DNA and cell-free DNA.
  • This example simulates the enrichment multiple of CpG islands in the human genome sequence (hg19) by performing the first round of AT digestion with MseI and MluCI, respectively, followed by the second round of AT digestion after CT conversion.
  • the genome sequence was divided into N fragments; the number of fragments containing the AATT or TTAA base combination in the sequence was counted, recorded as N1; the number of fragments containing any one of the AATT, TTAA, AACC, AATC, AACT, CCAA, TCAA, and CTAA base combinations in the sequence was counted, recorded as N2;
  • the sequence of human CpG island (data source: UCSC database) is divided into M fragments; the number of fragments containing the base combination of AATT or TTAA in the sequence is counted, recorded as M1; the number of fragments containing any one of the base combinations of AATT, TTAA, AACC, AATC, AACT, CCAA, TCAA, and CTAA in the sequence is counted, recorded as M2;
  • the genome simplification efficiency is positively correlated with the sliding window length regardless of single or double AT digestion.
  • the genome simplification efficiency of single-round AT digestion is more susceptible to the sliding window length, and the genome simplification efficiency of double AT digestion is higher than that of single-round digestion, but is less affected by the sliding window length.
  • the CpG island information retention ratio after single and double digestion is negatively correlated with the sliding window length, and the CpG island information retention ratio of single-round AT digestion is higher, and is less affected by the sliding window length; while the CpG island information retention ratio of double AT digestion is lower than that of single-round digestion, and is more affected by the sliding window length.
  • 5mC-AD-F [ROX]ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO.1)
  • the primer sequences are:
  • The sequencing results are shown in the following table. The number of bases aligned to the whole genome and the number aligned to CpG islands were counted separately to calculate the average sequencing depth. After one round of AT digestion, the sequencing depth in CpG islands per G of sequencing data is 1.4X, compared with 0.4X for the undigested library, so the CpG island enrichment is about 3.5-fold;
  • The sequencing data aligned to CpG islands after one round of AT digestion fall mainly into the following types: A) all CpG island information is retained, B) CpG island information is partially retained, and C) CpG island information is completely lost (Figure 2).
  • At the same sequencing volume, of the 27,949 CpG islands, 25,606 (91.6%) had a higher sequencing depth after one round of AT digestion than without digestion (Figure 3).
  • The CpG sites within a CpG island usually share a consistent methylation pattern, so even if only part of the CpG island information is retained after digestion, it can be used to infer the methylation level of the CpG island.
  • A single round of AT digestion can reduce sequencing costs by more than 70% while achieving, in more than 90% of CpG island regions, a sequencing depth no lower than that of whole-genome sequencing.
  • Example 3 CpG island enrichment sequencing after two rounds of AT digestion and non-methylated site digestion of cfDNA
  • Enzymatic Methyl-seq Conversion Module (E7125L, NEB) was used for CT conversion;
  • a. Add 400 µL TET2 Reaction Buffer to a tube of TET2 Reaction Buffer Supplement and mix well. Mark it as TET2 Reaction Buffer (reconstituted).
  • the primer sequences are the same as in Example 2.
  • After the undigested control amplification product is purified, it is used directly as the undigested control sequencing library; after the two digested, CT-converted amplification products are purified, one is used as the sequencing library for single-round AT digestion + non-methylated site digestion,
  • The sequencing data were statistically analyzed at an equal sequencing volume of 5G. Of the 27,949 CpG islands in total, 21,223 were covered by at least one read without enzyme digestion; after one round of AT digestion and non-methylated site digestion, at least one read was detected in 17,006 CpG islands, and 9,349 CpG islands had a higher sequencing depth than in the undigested library; after a second round of AT digestion, at least one read was detected in 15,542 CpG islands, and 10,967 CpG islands had a higher sequencing depth than in the undigested library.
  • The numbers of bases aligned to the whole genome and to CpG island regions were counted, and the average sequencing depth of each CpG island was calculated. Because genome simplification efficiency differs between the libraries, the statistics were restricted to the CpG islands shared by the three libraries; at a sequencing volume of 5G, the average coverage depth in these regions was about 1.68X without enzyme digestion, about 2.99X after one round of AT digestion + non-methylated site digestion, and 8.25X after two rounds of AT digestion + one round of non-methylated site digestion (Figure 4).
  • After digestion of non-methylated sites, the sequencing data are also enriched for methylated CpG islands. As shown in Figure 5, for a CpG island in the promoter region of the SEPTIN9 gene, 4 reads were detected without digestion, all of them non-methylated. After digestion of non-methylated sites, methylated reads were enriched, and the number of sequenced reads increased with the number of AT digestion rounds.
  • AT enzyme cleavage can significantly increase the sequencing depth of the CpG island region, and enzyme cleavage at non-methylated sites can enrich the regions where methylation modification occurs.
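The enrichment figures reported in Examples 2 and 3 can be cross-checked with simple arithmetic. The following is a minimal sketch that only recomputes ratios from the depths and counts quoted above; the function name `fold_enrichment` and the dictionary labels are illustrative assumptions, not part of the described method.

```python
def fold_enrichment(depth_digested: float, depth_undigested: float) -> float:
    """Ratio of CpG island sequencing depth with digestion to depth without it."""
    if depth_undigested <= 0:
        raise ValueError("undigested depth must be positive")
    return depth_digested / depth_undigested


# Example 2: depth per G of data, one round of AT digestion vs. undigested.
print(round(fold_enrichment(1.4, 0.4), 2))  # 3.5

# Example 3: mean depth in the CpG islands shared by the three libraries (5G of data).
undigested_depth = 1.68
example3_depths = {
    "one round AT + non-methylated site digestion": 2.99,
    "two rounds AT + non-methylated site digestion": 8.25,
}
for label, depth in example3_depths.items():
    print(label, round(fold_enrichment(depth, undigested_depth), 2))

# Example 2: share of CpG islands with higher depth after one round of AT digestion.
print(round(100 * 25606 / 27949, 1))  # 91.6
```

On these numbers, the second round of AT digestion raises the mean depth in the shared CpG islands from roughly 1.8-fold to roughly 4.9-fold over the undigested library.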

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Public Health (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Veterinary Medicine (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)

Abstract

The present invention provides a simple and efficient methylation sequencing library construction method. Digestion with a combination of restriction endonucleases is added to a traditional whole-genome methylation sequencing method, thus realizing enrichment sequencing of CpG islands, especially methylation-modified CpG islands. The method of the present invention reduces the sequencing cost and eliminates complex operations, and is thus suitable for research and applications targeting methylation of CpG islands.

Description

CpG island methylation enrichment sequencing technology based on restriction enzyme digestion
Technical Field
The present invention relates to the field of DNA sequencing, and in particular to a CpG island methylation enrichment sequencing technology based on restriction enzyme digestion.
Background Art
DNA methylation modification (5mC) is crucial to normal gene expression and cell function, and is involved in regulating the most basic life activities such as gene expression and chromatin stability. Gene methylation polymorphism is an important cause of individual phenotypic differences, and gene methylation variation can also lead to individual phenotypic abnormalities. A large number of studies have shown that, compared with normal cells, the methylation profile of tumor cell genes changes extensively and significantly, and cancer-type-specific methylation variant genes have been found in different cancer types. Therefore, gene methylation variation can be used as a pan-cancer biomarker.
The discovery and study of methylation tumor markers rely on differential methylation analysis of genomic DNA from healthy tissues and cancer tissues. A variety of mature technologies are available for whole-genome methylation detection, including the microarray-based Illumina 850K methylation chip; WGBS (whole-genome bisulfite sequencing) and RRBS (reduced representation bisulfite sequencing) on NGS platforms; and MeDIP and MBD-seq, which are based on antibody immunoprecipitation. Methylation detection for specific sites/regions usually uses methods such as MSP (methylation-specific PCR), BSP (bisulfite sequencing PCR) and MSRE-qPCR.
On the other hand, ctDNA (plasma cell-free tumor DNA), the focus of liquid biopsy, carries methylation information from tumor tissue, and its detection enables cancer screening, companion diagnosis and prognosis monitoring. Because ctDNA has a low copy number and is highly degraded, it is difficult to detect, and weak methylation marker signals can be masked by background noise. Early applications usually relied on TaqMan probe qPCR to detect designated methylation sites or haplotypes. In recent years, NGS-based whole-genome methylation sequencing (WGBS) and target-region methylation sequencing with probe hybridization capture have also been developed for omics-level ctDNA detection. Methods of this type can integrate the methylation information of a large number of sites, train mathematical models, and maximize detection sensitivity and specificity. They also allow tissue-of-origin tracing, and have therefore gradually become the mainstream technology for pan-cancer screening. However, in practice, both methods have limitations. Although WGBS can obtain whole-genome methylation information, it is limited by sequencing cost and therefore has a low sequencing depth. For example, a typical 90G sequencing volume yields an average sequencing depth of only about 20X, so detection accuracy and sensitivity are poor. At the same time, the methylation variation regions that can actually be used to indicate cell carcinogenesis account for only a very small part of the genome, so whole-genome sequencing is a strategy with poor cost-effectiveness. Methylation sequencing of target regions based on probe capture does make up for the insufficient depth of WGBS, but because methylation haplotype patterns are complex (the number of haplotypes increases exponentially with the number of CpG sites in the target region), there are still many technical difficulties in achieving efficient, stable and accurate capture of the target region. The additional steps of probe synthesis and target-region hybridization capture also increase the difficulty and cost of the experiment.
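The exponential growth of haplotypes mentioned above can be made concrete with a minimal sketch. It assumes that each CpG site is in one of two states (methylated or unmethylated), so a region with n CpG sites admits 2^n possible haplotypes; the function name `haplotype_count` is illustrative only.

```python
def haplotype_count(n_cpg_sites: int) -> int:
    """Number of possible methylation haplotypes for a region,
    assuming each CpG site is either methylated or unmethylated."""
    if n_cpg_sites < 0:
        raise ValueError("number of CpG sites must be non-negative")
    return 2 ** n_cpg_sites


# A capture probe set would need to tolerate this many sequence variants
# after CT conversion of a single target region.
for n in (1, 5, 10, 20):
    print(n, haplotype_count(n))
```

Under this assumption, a target region with only 20 CpG sites already has more than one million possible haplotypes, which suggests why probe design for converted DNA is difficult.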
Therefore, there is a need in the art for a simple and efficient method of capturing methylated DNA sequences.
Summary of the Invention
The purpose of the present invention is to provide a CpG island methylation enrichment sequencing technology based on restriction enzyme digestion.
In a first aspect of the present invention, a method for constructing a DNA methylation sequencing library is provided, comprising the steps of:
S1) providing a DNA sample to be tested;
S2) ligating adapter sequences to both ends of the DNA to be tested, thereby obtaining adapter-ligated DNA;
S3) performing AT digestion, non-methylated site digestion and CT conversion on the adapter-ligated DNA, thereby obtaining a digestion sequencing library.
在另一优选例中,步骤S1)中,所述待测DNA样品选自下组:基因组DNA(gDNA)、细胞游离DNA(cfDNA)、或其组合。In another preferred embodiment, in step S1), the DNA sample to be tested is selected from the following group: genomic DNA (gDNA), cell-free DNA (cfDNA), or a combination thereof.
在另一优选例中,步骤S1)中,所述待测DNA样品经过片段化处理,优选地被片段化至100-600bp,更优选为200-400bp。In another preferred embodiment, in step S1), the DNA sample to be tested is fragmented, preferably fragmented to 100-600 bp, more preferably 200-400 bp.
在另一优选例中,步骤S2)中,所述的接头为甲基化接头,其中所有胞嘧啶C均被5-甲基化修饰。In another preferred embodiment, in step S2), the linker is a methylated linker, in which all cytosine Cs are 5-methylated.
在另一优选例中,所述的接头不包含AATT或TTAA序列。In another preferred embodiment, the linker does not contain AATT or TTAA sequence.
在另一优选例中,所述的接头包含第一链和第二链,所述的第一链3'端与第二链的5'端部分碱基互补,两条链退火后,第一链单个A碱基突出。In another preferred embodiment, the linker comprises a first chain and a second chain, the 3' end of the first chain is partially base complementary to the 5' end of the second chain, and after the two chains are annealed, a single A base of the first chain protrudes.
在另一优选例中,所述的第一链5'端经磷酸基团修饰。In another preferred embodiment, the 5' end of the first chain is modified with a phosphate group.
在另一优选例中,所述的第一链核苷酸序列如SEQ ID NO.1所示,所述的第二链核苷酸序列如SEQ ID NO.2所示。In another preferred example, the first chain nucleotide sequence is shown as SEQ ID NO.1, and the second chain nucleotide sequence is shown as SEQ ID NO.2.
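The adapter constraints above (no AATT or TTAA, and partial complementarity between the 3' end of the first strand and the 5' end of the second strand) can be checked mechanically. The following is a minimal sketch, not the inventors' procedure. It uses the SEQ ID NO.1 and SEQ ID NO.2 sequences as printed in Example 2 with the dye labels omitted; the function names are hypothetical, and the pairing check assumes exactly one unpaired base at the 3' end of the first strand.

```python
# Sequences copied from Example 2 (SEQ ID NO.1 / SEQ ID NO.2), labels omitted.
STRAND1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
STRAND2 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_at_site(seq):
    """True if the sequence contains an MluCI (AATT) or MseI (TTAA) site."""
    return "AATT" in seq or "TTAA" in seq


def paired_length(strand1, strand2):
    """Length of the duplex formed by the 5' end of strand2 and the 3' region
    of strand1, assuming the last base of strand1 stays unpaired (overhang)."""
    stem = strand1[:-1]
    n = 0
    while n < min(len(stem), len(strand2)) and stem.endswith(revcomp(strand2[: n + 1])):
        n += 1
    return n


if __name__ == "__main__":
    print("AT site in strand 1:", has_at_site(STRAND1))
    print("AT site in strand 2:", has_at_site(STRAND2))
    print("paired bases:", paired_length(STRAND1, STRAND2))
```

An adapter that failed the first check would itself be cut during AT digestion, which is presumably why the constraint is stated.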
In another preferred embodiment, step S2) further includes end repair and A-tailing before adapter ligation.
In another preferred embodiment, step S2) further includes a size selection and/or purification step after ligation of the methylated adapter.
In another preferred embodiment, the size selection step selects DNA inserts by length, preferably 200-400 bp, more preferably 250-350 bp.
In another preferred embodiment, the purification step includes purifying and recovering DNA fragments that carry methylated adapters at both ends.
In another preferred embodiment, the size selection and/or purification is performed with magnetic beads.
In another preferred embodiment, in step S3), the DNA is subjected to one or more rounds of AT digestion.
In another preferred embodiment, in step S3), the AT digestion is performed only before CT conversion or only after CT conversion, or both before and after CT conversion.
In another preferred embodiment, the non-methylated site digestion is performed before CT conversion.
In another preferred embodiment, the non-methylated site digestion is performed before AT digestion, after AT digestion, or simultaneously with AT digestion in the same reaction system.
In another preferred embodiment, step S3) further includes PCR amplification.
In another preferred embodiment, step S3) comprises steps selected from the following group:
i) a. subjecting the DNA to non-methylated site digestion and AT digestion, and
b. subjecting the digested DNA to CT conversion, followed by a first round of PCR amplification, thereby obtaining a digested sequencing library; or
ii) a. subjecting the DNA to non-methylated site digestion,
b. subjecting the digested DNA to CT conversion, followed by a first round of PCR amplification, and
c. subjecting the converted DNA to AT digestion, followed by a second round of PCR amplification, thereby obtaining a digested sequencing library; or
iii) a. subjecting the DNA to non-methylated site digestion and a first round of AT digestion,
b. subjecting the digested DNA to CT conversion, followed by a first round of PCR amplification, and
c. subjecting the converted DNA to AT digestion, followed by a second round of PCR amplification, thereby obtaining a digested sequencing library.
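The three variants i) to iii) differ only in where AT digestion sits relative to CT conversion. As an illustrative sketch, they can be written as ordered step lists and checked against the ordering rules stated above (non-methylated site digestion before CT conversion; at least one AT digestion; PCR last). The step labels and function names are hypothetical shorthand, and where the text allows two digestions to run together the order shown is arbitrary.

```python
# Hypothetical labels for the operations named in variants i) to iii).
VARIANTS = {
    "i": ["unmeth_digest", "at_digest", "ct_convert", "pcr"],
    "ii": ["unmeth_digest", "ct_convert", "pcr", "at_digest", "pcr"],
    "iii": ["unmeth_digest", "at_digest", "ct_convert", "pcr", "at_digest", "pcr"],
}


def order_is_valid(steps):
    """Check a step list against the ordering constraints stated in the text."""
    if steps.count("ct_convert") != 1 or "unmeth_digest" not in steps:
        return False
    ct = steps.index("ct_convert")
    # Methylation-sensitive digestion must precede CT conversion.
    if any(s == "unmeth_digest" and i > ct for i, s in enumerate(steps)):
        return False
    # At least one AT digestion is required, and the workflow ends with PCR.
    return "at_digest" in steps and steps[-1] == "pcr"


def at_rounds(steps):
    """Return (AT rounds before CT conversion, AT rounds after it)."""
    ct = steps.index("ct_convert")
    return steps[:ct].count("at_digest"), steps[ct:].count("at_digest")


if __name__ == "__main__":
    for name, steps in VARIANTS.items():
        print(name, order_is_valid(steps), at_rounds(steps))
```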
In another preferred embodiment, in step S3), the AT digestion is performed with an enzyme that recognizes and cleaves AT-rich DNA sequences.
In another preferred embodiment, the enzyme is selected from the group consisting of restriction endonucleases, CRISPR gene-editing enzymes, ZFNs, TALENs, meganucleases, and combinations thereof.
In another preferred embodiment, the AT digestion is performed with a restriction endonuclease whose recognition site contains only A and T bases.
In another preferred embodiment, the enzyme used for the AT digestion is selected from the following group: MluCI, MseI, SspI, PsiI, AseI, DraI, PacI, AnaI, AcsI, AgsI, ApoI, AflII, BfrI, BspTI, BstAFI, EcoRI, EcoRV, FaiI, FauNDI, HpaI, KspAI, MfeI, MspCI, MssI, MunI, NdeI, PmeI, PshBI, SaqAI, SmiI, Sse9I, TasI, Tru1I, Tru9I, TspDTI, Tsp509I, VspI, XapI, HindIII, NsiI, NspV, PagI, PciI, SfuI, SnaBI, BfrBI, ClaI, ScaI, SwaI, or a combination thereof.
In another preferred embodiment, the AT digestion is a single digestion with MseI, a single digestion with MluCI, or a double digestion with MseI and MluCI.
In another preferred embodiment, in step S3), the CT conversion is bisulfite chemical conversion or APOBEC deaminase conversion, preferably APOBEC deaminase conversion.
In another preferred embodiment, the CT conversion comprises the steps of:
C1) protecting 5mC bases by TET2 oxidation;
C2) converting all unprotected C bases (C bases without methylation) to U bases using APOBEC deaminase.
In another preferred embodiment, the CT conversion comprises the step of:
converting all unmethylated C bases to U bases using bisulfite.
In another preferred embodiment, in step S3), the non-methylated site digestion is performed with a 5mC methylation-sensitive restriction endonuclease or a combination of such endonucleases.
In another preferred embodiment, the 5mC methylation-sensitive restriction endonuclease or endonuclease combination is selected from the group consisting of HpaII, BstUI, FspI, and combinations thereof.
In another preferred embodiment, in step S3), the first round of PCR amplification is performed with a high-fidelity DNA polymerase capable of amplifying templates containing uracil (U bases).
In another preferred embodiment, the first round of PCR amplification is performed with adapter-specific primers.
In another preferred embodiment, the first round of PCR amplification uses a forward primer with the nucleotide sequence shown in SEQ ID NO.3 and a reverse primer with the nucleotide sequence shown in SEQ ID NO.4.
In another preferred embodiment, the second round of PCR amplification uses a forward primer with the nucleotide sequence shown in SEQ ID NO.5 and a reverse primer with the nucleotide sequence shown in SEQ ID NO.6.
In another preferred embodiment, each AT digestion and/or non-methylated site digestion step is optionally followed by a size selection and/or purification step.
In a second aspect of the present invention, a DNA methylation sequencing library is provided, the library being constructed by the method of the first aspect of the present invention.
In a third aspect of the present invention, a method for detecting DNA methylation in a sample is provided, comprising the steps of:
1) constructing a methylation library by the method of the first aspect of the present invention; and
2) sequencing the methylation library, thereby detecting DNA methylation in the sample.
In another preferred embodiment, the sequencing is performed on an Illumina sequencing platform or a BGI sequencing platform.
In a fourth aspect of the present invention, a method for diagnosing or predicting a disease associated with abnormal DNA methylation is provided, comprising the steps of obtaining a DNA sample from a subject and detecting methylation of the DNA sample by the method of the first aspect of the present invention, thereby diagnosing or predicting the disease.
In another preferred embodiment, the disease is a tumor.
In another preferred embodiment, the subject is a human or a non-human mammal.
It should be understood that, within the scope of the present invention, the technical features described above and those described in detail below (for example, in the Examples) can be combined with one another to form new or preferred technical solutions. For the sake of brevity, these combinations are not enumerated here.
Brief Description of the Drawings
The following drawings illustrate specific embodiments of the present invention and do not limit the scope of the invention as defined by the claims.
Figure 1 shows the experimental workflow of the CpG island methylation enrichment sequencing technology based on restriction enzyme digestion.
Figure 2 shows examples, from Example 2, of the distribution of sequencing data over CpG islands after the first round of AT digestion. CpG island extents are shown as gray bars below the peak plots: A) CpG island information fully retained; B) CpG island information partially retained; C) CpG island information completely lost. In each panel the upper part shows the undigested library and the lower part shows the library after the first round of AT digestion.
Figure 3 shows, as a histogram, the ratio of each CpG island's sequencing depth after the first round of AT digestion to its depth without digestion in Example 2, at equal sequencing data volume. The black highlighted interval [0,1] on the horizontal axis corresponds to the CpG islands whose sequencing depth after the first round of AT digestion is lower than without digestion.
Figure 4 shows histograms of sequencing depth over the CpG island regions detected in common by the three differently treated libraries of Example 3; the horizontal axis is sequencing depth. From top to bottom: "no restriction digestion", "first-round AT digestion + non-methylated site digestion", and "first-round AT digestion + non-methylated site digestion + second-round AT digestion". The dashed line marks each library's mean sequencing depth over these shared CpG islands.
Figure 5 shows the enrichment of methylated CpG islands by digestion. MseI and MluCI were used for the first round of AT digestion and HpaII for non-methylated site digestion; a second round of AT digestion with MseI and MluCI was performed before library construction and sequencing. Enrichment sequencing results for CpG sites in the SEPTIN9 promoter region are shown: gray lines represent sequencing reads, NEGATIVE and POSITIVE denote the negative and positive strands, respectively; black squares represent methylated CpG sites, and white squares represent unmethylated CpG sites.
Detailed Description
After extensive and in-depth research, the inventors have developed a special method for constructing methylation next-generation sequencing libraries that introduces restriction digestion steps into library construction. By exploiting the C/G-rich character of CpG islands and combining restriction enzymes with different recognition sites and properties, the method achieves enrichment sequencing of methylated CpG islands. The invention is suitable for research on and applications of CpG island methylation; it works with conventional genomic DNA and is particularly suited to methylome analysis of trace ctDNA, providing a new solution for tumor liquid biopsy based on ctDNA methylation. The present invention was completed on this basis.
The present invention combines AT digestion, CT conversion, and non-methylated site digestion. The AT digestion step removes a large number of AT-rich fragments, improving sequencing efficiency and enriching CpG islands. Non-methylated site digestion removes unmethylated CG-containing fragments, improving sequencing efficiency and enriching CpG islands that contain methylated sites. The CT conversion step converts unmethylated cytosines so that only methylated cytosines remain in the sequence; it also creates new AT digestion sites in the sequence, so that a second AT digestion can be performed to improve enrichment efficiency.
Terms
As used herein, the terms "methylation library" and "digested sequencing library" have the same meaning and refer to a library obtained by subjecting double-stranded DNA fragments to end repair, A-tailing, and ligation of Y-shaped methylated adapters, then to one or more steps selected from AT digestion, non-methylated site digestion, CT conversion, or a combination thereof, and finally to PCR enrichment amplification.
As used herein, the term "AT digestion" refers to digesting a whole-genome methylation sequencing library with a restriction endonuclease whose recognition sequence contains only A/T. In preferred embodiments, such restriction endonucleases include any enzyme capable of cleaving AT-rich sequences, for example but not limited to MseI (recognition site TTAA) and MluCI (recognition site AATT).
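The definition of AT digestion above amounts to a simple rule: an insert survives only if it contains no recognition site of the enzymes used. A minimal sketch of that rule for MseI and MluCI follows. The function names are hypothetical, and the model only asks whether a site is present; it ignores digestion efficiency and cut positions. Both sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition sites as given in the text.
AT_SITES = {"MseI": "TTAA", "MluCI": "AATT"}


def at_sites_in(insert, enzymes=("MseI", "MluCI")):
    """Return {enzyme: [0-based start positions]} for each site found."""
    seq = insert.upper()
    hits = {}
    for name in enzymes:
        site = AT_SITES[name]
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[name] = positions
    return hits


def survives_at_digestion(insert, enzymes=("MseI", "MluCI")):
    """True if the insert carries no recognition site for the given enzymes."""
    return not at_sites_in(insert, enzymes)


if __name__ == "__main__":
    for frag in ("GCGCGGCCGCGC", "GCGCTTAAGCGC", "GGAATTAACC"):
        print(frag, at_sites_in(frag), survives_at_digestion(frag))
```

Passing `enzymes=("MseI",)` or `enzymes=("MluCI",)` models the single-enzyme digestions mentioned in the text.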
As used herein, the term "non-methylated site digestion" refers to digesting a whole-genome methylation sequencing library with a 5mC methylation-sensitive restriction endonuclease. In preferred embodiments, the 5mC methylation-sensitive restriction endonuclease includes but is not limited to HpaII (recognition site CCGG).
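To make the methylation-sensitive behavior concrete, the sketch below models HpaII on a single strand: a CCGG site is treated as cleavable unless the cytosine of its CpG (the second C) is methylated. This is a simplified assumption for illustration; the function name and the representation of methylation as a set of 0-based positions are hypothetical, and hemimethylation and cleavage kinetics are not modelled.

```python
def hpaii_cut_sites(seq, methylated=frozenset()):
    """Return 0-based start positions of CCGG sites that would be cut.

    `methylated` holds 0-based positions of 5mC on the strand as written.
    A site is considered protected when its CpG cytosine is methylated.
    """
    seq = seq.upper()
    cuts = []
    for i in range(len(seq) - 3):
        if seq[i : i + 4] == "CCGG" and (i + 1) not in methylated:
            cuts.append(i)
    return cuts


def survives_hpaii(seq, methylated=frozenset()):
    """True if no cleavable CCGG site remains in the fragment."""
    return not hpaii_cut_sites(seq, methylated)


if __name__ == "__main__":
    frag = "ATCCGGTA"
    print(hpaii_cut_sites(frag))            # unmethylated site
    print(hpaii_cut_sites(frag, {3}))       # CpG cytosine methylated
```

Under this model, fragments whose CCGG sites are all methylated pass through the digestion intact, which is the enrichment effect the definition describes.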
As used herein, the term "CT conversion" refers to the deamination of cytosine to uracil or thymine. CT conversion can be performed enzymatically: 5mC bases are first protected by TET2 oxidation, and APOBEC deaminase is then used to deaminate all unprotected C bases (C bases without methylation) to U bases. It can also be performed chemically, using conventional bisulfite treatment to convert unmethylated C bases to U bases. The enzymatic method is preferred because it is milder and less likely to cause DNA strand breaks.
In the present invention, AT digestion can be performed only before CT conversion or only after CT conversion, or both before and after CT conversion. Non-methylated site digestion is performed before CT conversion, either separately from or together with AT digestion. After digestion, universal primers matching both ends of the library are used to enrich DNA fragments whose inserts do not contain the above digestion sites.
Technical Principle
To aid understanding of the present invention, the following principle is provided. It should be understood that the scope of protection of the present invention is not limited by this principle.
The present invention exploits the C/G-rich character of CpG islands: AT digestion and non-methylated site digestion reduce the complexity of the sequencing library while retaining CpG island information, and the sequence changes produced by CT conversion allow a second round of AT digestion that further improves reduction efficiency. Non-methylated site digestion further enriches information from CpG islands that are methylated.
As shown in Figure 1, the method of the present invention can include two rounds of restriction digestion (before and after CT conversion), removing AT-rich regions (AT digestion) and unmethylated regions (non-methylated site digestion), respectively. Performing two rounds of digestion maximizes the enrichment of methylated CpG islands. CT conversion changes the sequence, and the new sequence may therefore contain new AT digestion sites.
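The last point, that CT conversion can create AT digestion sites that were not present before, is easy to see on a toy sequence. The sketch below converts unmethylated C to T on one strand (the base read after PCR) and leaves methylated C unchanged. The function names and the use of a position set for 5mC are hypothetical, and the complementary strand, where the change appears as G to A, is not modelled.

```python
def ct_convert(seq, methylated=frozenset()):
    """Convert every unmethylated C to T; C at a methylated position stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def has_at_site(seq):
    """True if the sequence contains an MluCI (AATT) or MseI (TTAA) site."""
    return "AATT" in seq or "TTAA" in seq


if __name__ == "__main__":
    frag = "GGAACCGG"
    print(has_at_site(frag))                       # no site before conversion
    print(ct_convert(frag))                        # unmethylated: AACC becomes AATT
    print(ct_convert(frag, methylated={4, 5}))     # methylated cytosines are kept
```

A fragment such as this one passes the first AT digestion, acquires an AATT site after conversion if its cytosines are unmethylated, and is then removed by the second AT digestion.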
Methylation Library Construction Method
Typically, the workflow of the method of the present invention follows Figure 1 and comprises the following steps:
1) DNA fragments are first end-repaired and A-tailed, and methylated adapters are ligated to both ends of the fragments;
2) the methylated adapter comprises a first strand and a second strand in which every cytosine (C) carries a 5-methyl modification; the 5' end of the first strand is phosphorylated, the 3' end of the first strand is partially complementary to the 5' end of the second strand, and after the two strands are annealed, a single A base of the first strand protrudes;
3) the DNA used for library construction can be genomic DNA (gDNA) or cell-free DNA (cfDNA);
4) when a library is built from genomic DNA, the DNA must first be fragmented to the target size, for example 200-400 bp or 100-600 bp;
5) when a library is built from cfDNA, no fragmentation step is needed;
6) after methylated adapter ligation, for gDNA libraries, magnetic beads are used to select a suitable insert length, for example 250-350 bp or, less preferably, 200-400 bp; for cfDNA libraries, the ligation product is purified with magnetic beads so as to recover as many fragments carrying methylated adapters at both ends as possible;
7) a first round of AT digestion is performed on the ligation product using restriction endonucleases whose recognition sites contain only A and T bases. Most preferably, a double digestion with MseI (recognition site TTAA) and MluCI (recognition site AATT) is used; less preferably, a single digestion with MseI (recognition site TTAA) or with MluCI (recognition site AATT) is used;
8) non-methylated site digestion is performed on the ligation product using a 5mC methylation-sensitive restriction endonuclease, most preferably HpaII (CCGG);
9) when both the first round of AT digestion and the non-methylated site digestion are performed, they can be carried out simultaneously as a multi-enzyme digestion in the same reaction system;
10) the digestion product is purified with magnetic beads and then subjected to CT conversion. Most preferably, the enzymatic method is used: 5mC bases are protected by TET2 oxidation, and APOBEC deaminase then deaminates all unprotected C bases (C bases without methylation) to U bases; less preferably, conventional bisulfite treatment is used to convert unmethylated C bases to U bases;
11) after CT conversion, a high-fidelity DNA polymerase capable of amplifying uracil-containing (U base) templates is used for PCR amplification, enriching fragments whose inserts contain no digestion site and yielding a single-round digestion sequencing library;
12) because CT conversion changes the sequence, a second round of AT digestion can be performed on the above product, again using restriction endonucleases whose recognition sites contain only A and T bases. Most preferably, a double digestion with MseI (recognition site TTAA) and MluCI (recognition site AATT) is used; less preferably, a single digestion with MseI (recognition site TTAA) or with MluCI (recognition site AATT) is used;
13) after the second round of AT digestion, a second round of PCR amplification is performed with primers matching the universal adapters, enriching fragments whose inserts still contain no digestion site after CT conversion and yielding a two-round digestion sequencing library;
14) the first round of AT digestion, the non-methylated site digestion, and the second round of digestion described above can be used in combination;
15) the sequencing library is sequenced on a platform chosen according to the ligated methylated adapters, preferably an Illumina sequencing platform or a BGI sequencing platform.
The main advantages of the present invention include:
1) Compared with conventional whole-genome methylation sequencing, the method of the present invention reduces sequencing cost while preserving as much useful CpG island information as possible; depending on the combination of digestions, CpG island enrichment of up to 10-fold can be achieved.
2) Compared with probe-capture-based target enrichment methylation sequencing, the method of the present invention eliminates the complex, costly, and unpredictable steps of methylation probe design and capture.
3) The method of the present invention is compatible with various types of DNA samples, including genomic DNA and cell-free DNA.
The present invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the present invention and do not limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Example 1: Simulated enrichment data for the human genome
This example simulates the fold enrichment of CpG islands in the human genome sequence (hg19) when MseI and MluCI are used for a first round of AT digestion and, after CT conversion, for a second round of AT digestion.
1) Using sliding windows of 100 bp, 200 bp, 300 bp, 400 bp, and 500 bp, as well as 170 bp (the characteristic length of ctDNA), with a step of 1 base, the genome sequence is divided into N fragments. The number of fragments whose sequence contains AATT or TTAA is counted and recorded as N1; the number of fragments whose sequence contains any of AATT, TTAA, AACC, AATC, AACT, CCAA, TCAA, or CTAA is counted and recorded as N2;
2) Using the same window lengths and step, the human CpG island sequences (data from the UCSC database) are divided into M fragments. The number of fragments whose sequence contains AATT or TTAA is counted and recorded as M1; the number of fragments whose sequence contains any of AATT, TTAA, AACC, AATC, AACT, CCAA, TCAA, or CTAA is counted and recorded as M2;
3) The genome reduction efficiency is calculated as R1=N/(N-N1) for a single round of AT digestion and R2=N/(N-N2) for two rounds of AT digestion. Genome reduction efficiency directly determines the amount of sequencing required and therefore the experimental cost;
4) The CpG island retention ratio is calculated as K1=(M-M1)/M for a single round of AT digestion and K2=(M-M2)/M for two rounds of AT digestion. The CpG island retention ratio directly determines how much useful information can be obtained;
5) The results are shown in the table below. With a 170 bp sliding window (the characteristic length of ctDNA), a single round of AT digestion gives a genome reduction efficiency of 3.9-fold and retains 77% of the CpG island information; two rounds of AT digestion give a genome reduction efficiency of 10.7-fold, but the retained CpG island information drops to 24%;
6) The simulation shows that, for both one and two rounds of AT digestion, genome reduction efficiency is positively correlated with sliding window length. The reduction efficiency of single-round AT digestion is more sensitive to window length; the reduction efficiency of two-round AT digestion is higher than that of single-round digestion but less affected by window length. Conversely, the CpG island information retention ratio after one or two rounds of digestion is negatively correlated with window length: single-round AT digestion retains a higher proportion of CpG island information and is less affected by window length, whereas two-round AT digestion retains a lower proportion and is more strongly affected by window length.
This example shows that genome reduction efficiency and CpG island information retention ratio, the two key parameters, can be tuned by adjusting the library insert length and the AT digestion strategy, and that the two must be balanced to suit actual requirements.
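The counting procedure of steps 1) to 4) can be sketched in a few lines. The code below is an illustrative reimplementation under stated assumptions, not the inventors' script: function names are hypothetical, the toy sequences stand in for hg19 and the UCSC CpG island set, and only the strand as written is scanned. The second-round motif list is the one given in the text, that is, every four-base sequence that reads AATT or TTAA once C is converted to T.

```python
ROUND1_MOTIFS = ("AATT", "TTAA")
ROUND2_MOTIFS = ROUND1_MOTIFS + ("AACC", "AATC", "AACT", "CCAA", "TCAA", "CTAA")


def count_windows(seq, window, motifs):
    """Slide a window over seq with step 1.

    Returns (total windows, windows containing at least one motif), i.e.
    (N, N1) or (N, N2) for a genome and (M, M1) or (M, M2) for CpG islands.
    """
    seq = seq.upper()
    total = max(len(seq) - window + 1, 0)
    hit = sum(
        1 for i in range(total) if any(m in seq[i : i + window] for m in motifs)
    )
    return total, hit


def reduction_efficiency(total, hit):
    """R = N / (N - N_hit); undefined if every window contains a motif."""
    return total / (total - hit)


def retention_ratio(total, hit):
    """K = (M - M_hit) / M."""
    return (total - hit) / total


if __name__ == "__main__":
    toy = "GCGCGCGCAACCGCGCGCGC"  # toy sequence, not real genomic data
    for label, motifs in (("round 1", ROUND1_MOTIFS), ("rounds 1+2", ROUND2_MOTIFS)):
        total, hit = count_windows(toy, 6, motifs)
        print(label, total, hit, retention_ratio(total, hit))
```

On real data the same functions would be applied per chromosome and per CpG island, with totals summed before computing R and K; a streaming implementation would be needed for a full genome at step 1.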
Table 1. Relationship between sliding window length, reduction efficiency, and CpG island information retention ratio
Example 2: Sequencing results for human genomic DNA after the first round of AT digestion
1) Chemically synthesize the following adapter sequences and purify them by HPLC;
5mC-AD-F: [ROX]ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO.1)
5mC-AD-R: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[HEX] (SEQ ID NO.2)
All C bases in the above sequences are 5mC;
2) Dissolve the synthesized oligos in annealing buffer (10 mM Tris-Cl pH 8.0, 1 mM EDTA, 50 mM NaCl) to a final concentration of 100 μM. Mix 25 μL of 5mC-AD-F and 25 μL of 5mC-AD-R in a 0.2 mL thin-walled tube to give 50 μL of 5mC-AD (final concentration 50 μM);
3) Anneal the 5mC-AD mixture in a PCR instrument: denature at 95°C for 10 min, cool to 25°C at 0.1°C/s, and hold at 25°C for 2 hours. The product is the prepared methylated adapter;
4) Take 30 μL of genomic DNA at 10 ng/μL and shear it into 200-400 bp fragments by Covaris sonication;
5) Use a DNA library preparation kit (E7120, NEB) for end repair, A-tailing, and methylated adapter ligation, as follows:
a. Prepare the following system for end repair and A-tailing:
Reaction conditions: 20°C for 30 minutes, then 65°C for 30 minutes
b. After end repair and A-tailing, ligate the methylated adapter (5mC-AD) by preparing the following system:
Reaction conditions: 20°C for 15 minutes;
c. Purify the ligation product with magnetic beads and elute in 50 μL H2O;
6) Take half of the ligation product directly to the PCR amplification in step 9) as an undigested control;
7) Prepare the following AT digestion system:
Reaction conditions: 37°C for 2 hours
8) Purify the AT digestion product with magnetic beads and elute in 30 μL H2O;
9) Prepare the following system on ice, mix well, centrifuge, and run on a PCR instrument to amplify and enrich the DNA fragments that contain no restriction site;
a. Reaction system:

The primer sequences are:
b. Reaction conditions:
10) After quality control of insert length on the 2100 and qPCR quantification of molar concentration, sequence the library on an Illumina NovaSeq;
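The quantities in steps 2) to 4) follow from simple dilution and ramp arithmetic. The sketch below only re-derives numbers stated in the protocol; the helper names are hypothetical and the total program time ignores instrument overhead.

```python
def mixed_concentration(stock_um, volume_ul, total_volume_ul):
    """Concentration (uM) of one component after mixing, from C1*V1 = C2*V2."""
    return stock_um * volume_ul / total_volume_ul


def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time to move from start_c to end_c at a constant ramp rate."""
    return abs(start_c - end_c) / rate_c_per_s


# Step 2: 25 uL of each 100 uM oligo in a 50 uL mixture.
each_oligo_um = mixed_concentration(100, 25, 25 + 25)

# Step 3: 95 C for 10 min, ramp to 25 C at 0.1 C/s, hold 2 h.
ramp_s = ramp_seconds(95, 25, 0.1)
program_min = (10 * 60 + ramp_s + 2 * 3600) / 60

# Step 4: 30 uL of genomic DNA at 10 ng/uL.
input_ng = 30 * 10

print(f"each strand: {each_oligo_um:.0f} uM")
print(f"ramp: {ramp_s:.0f} s, whole program: {program_min:.1f} min")
print(f"DNA input: {input_ng} ng")
```

Each strand ends up at 50 μM, which is the figure the protocol quotes for the annealed 5mC-AD, and the cooling ramp alone takes roughly 12 minutes.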
Results: Sequencing statistics are summarized in the table below. The number of bases aligned to the whole genome and the number aligned to CpG regions were counted separately, and the average sequencing depth was calculated. After one round of AT digestion, the sequencing depth in CpG islands per G of sequencing data was 1.4X, compared with 0.4X for the undigested library, a CpG island enrichment of about 3.5-fold;
Table 2. Effect of digestion on sequencing efficiency
Compared with the undigested library, the sequencing data aligned to CpG islands after one round of AT digestion fall into three types: A) CpG island information fully retained, B) CpG island information partially retained, and C) CpG island information completely lost (Figure 2). At the same sequencing volume, 25,606 of the 27,949 CpG islands (91.6%) had a higher sequencing depth after one round of AT digestion than without digestion (Figure 3).
Note that the CpG sites within a CpG island usually share a consistent methylation pattern. Therefore, even if only part of a CpG island is retained after restriction digestion, that part can be used to infer the methylation level of the island.
This example shows that a single round of AT digestion can reduce sequencing cost by more than 70% while achieving, in more than 90% of CpG islands, a sequencing depth no lower than that of whole-genome sequencing.
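The headline figures of this example can be checked from the reported depths and counts. The sketch assumes that sequencing cost scales linearly with the amount of data needed to reach a given CpG island depth, which is a simplification; the function names are hypothetical.

```python
def fold_enrichment(depth_digested, depth_control):
    """Ratio of CpG island depth per G of data, digested vs. undigested."""
    return depth_digested / depth_control


def cost_reduction(fold):
    """Fraction of sequencing saved to reach the same CpG island depth,
    assuming cost is proportional to data volume."""
    return 1 - 1 / fold


fold = fold_enrichment(1.4, 0.4)   # depths reported per G of data
saving = cost_reduction(fold)
improved = 25606 / 27949           # islands deeper after digestion / all islands

print(f"{fold:.1f}-fold enrichment, {saving:.1%} less sequencing, "
      f"{improved:.1%} of CpG islands deeper than control")
```

Under that linear-cost assumption, a 3.5-fold enrichment corresponds to a saving of about 71%, which matches the "more than 70%" stated above.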
Example 3: CpG island enrichment sequencing of cfDNA after two rounds of AT digestion and non-methylated site digestion
1) Take 30 ng of cfDNA and use a DNA library preparation kit (E7120, NEB) for end repair, A-tailing, and methylated adapter ligation, following the steps in Example 2;
2) Purify the ligation product with magnetic beads and elute in 45 μL H2O. Take 15 μL directly to the CT conversion in step 4) as an undigested control;
3) Prepare the following system to perform AT digestion and non-methylated site digestion simultaneously:
Reaction conditions: 37°C for 2 hours
4) After purifying the digestion product with magnetic beads, perform CT conversion enzymatically with the Enzymatic Methyl-seq Conversion Module (E7125L, NEB);
a. Add 400 μL of TET2 Reaction Buffer to one tube of TET2 Reaction Buffer Supplement, mix well, and label it TET2 Reaction Buffer (reconstituted);
b. Dilute the Fe(II) Solution: add 1 μL of 500 mM Fe(II) Solution (yellow cap) to 1249 μL of water, mix well, and keep on ice;
c. Prepare the reaction system on ice according to the table below, mix well, and centrifuge briefly:
d. Add 5 μL of the diluted Fe(II) Solution to the mixture, pipette up and down 10 times to mix, centrifuge briefly, and run the oxidation reaction on a PCR instrument:
Reaction conditions:
e. Transfer the oxidized sample to ice, add 1 μL of Stop Reagent, pipette up and down 10 times to mix, centrifuge briefly, and run the stop reaction on a PCR instrument:
Reaction conditions:
f. Purify the oxidation product with magnetic beads, elute in 16 μL H2O, add 4 μL HIDI, incubate for 10 min on a PCR instrument preheated to 85°C, and place on ice immediately afterwards;
g. Prepare the reaction system according to the table below, mix by pipetting, seal with film, centrifuge briefly, and run on a PCR instrument:
Reaction system:
Reaction conditions:
h. Purify the CT conversion product with magnetic beads and elute in H2O. Divide the digested CT conversion product into two equal portions for the subsequent amplification reactions;
5) Prepare the following three systems in parallel on ice, mix well, centrifuge, and run on a PCR instrument to amplify and enrich the DNA fragments that contain no restriction site, along with the undigested control;
Reaction system:

The primer sequences are the same as in Example 2.
Reaction conditions:
6) After purification, the amplification product of the undigested control is used directly as the undigested control sequencing library. Of the two purified amplification products from digested, CT-converted DNA, one is used as the sequencing library for single-round AT digestion + non-methylated site digestion, and the other is subjected to a second round of AT digestion in the following system:
7) After the digestion product is purified with magnetic beads, prepare the following system to amplify and enrich again the DNA fragments that contain no restriction site:
Primer sequences:

8) Purify the amplification product with magnetic beads; this is the sequencing library for two rounds of AT digestion + non-methylated site digestion;
9) After quality control of insert length on the 2100 and qPCR quantification of molar concentration, sequence all libraries on an Illumina NovaSeq.
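The Fe(II) working dilution in step 4b and the sample split in step 2 are straightforward to verify. The sketch below uses a hypothetical helper name and only the volumes stated in the protocol; the final Fe(II) concentration in the oxidation reaction is not computed because the reaction volume table is not reproduced here.

```python
def diluted_concentration(stock, stock_volume, diluent_volume):
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * stock_volume / (stock_volume + diluent_volume)


# Step 4b: 1 uL of 500 mM Fe(II) Solution into 1249 uL of water.
fe_mm = diluted_concentration(500, 1, 1249)
dilution_factor = 500 / fe_mm

# Step 2: 15 uL of the 45 uL eluate is set aside as the undigested control.
control_fraction = 15 / 45

print(f"diluted Fe(II): {fe_mm:.2f} mM ({dilution_factor:.0f}-fold dilution)")
print(f"control takes {control_fraction:.0%} of the ligation product")
```

The diluted Fe(II) stock is therefore 0.4 mM, a 1250-fold dilution, and one third of the ligation product serves as the undigested control.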
Results: The sequencing data were analyzed at an equal data volume of 5G per library. Of the 27,949 CpG islands in total, 21,223 were covered by at least one read without digestion. After one round of AT digestion plus non-methylated site digestion, 17,006 CpG islands were covered by at least one read, and 9,349 had a sequencing depth higher than in the undigested library. After a second round of AT digestion, 15,542 CpG islands were covered by at least one read, and 10,967 had a sequencing depth higher than in the undigested library.
The number of bases aligned to the whole genome and the number aligned to CpG island regions were counted separately, and the average sequencing depth of each CpG island was calculated. Because genome simplification efficiency differs between libraries, statistics were computed on the CpG islands shared by all three libraries. At a sequencing volume of 5G, the average coverage depth in these regions was about 1.68X without digestion, about 2.99X after one round of AT digestion + non-methylated site digestion, and 8.25X after two rounds of AT digestion + one round of non-methylated site digestion (Figure 4).
Note that non-methylated site digestion also enriches the sequencing data for methylated CpG islands. As shown in Figure 5, for a CpG island in the promoter region of the SEPTIN9 gene, all 4 reads detected without digestion were unmethylated. After non-methylated site digestion, methylated reads were enriched, and the number of reads sequenced rose with the number of AT digestion rounds.
These data show that the method of the present invention is also applicable to cfDNA: AT digestion significantly increases the sequencing depth in CpG island regions, and non-methylated site digestion enriches regions that carry methylation.
All documents mentioned in the present invention are incorporated by reference in this application, as if each were individually incorporated by reference. It should further be understood that, after reading the above teachings, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
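The counts and depths reported for the three cfDNA libraries can be put side by side as follows. The dictionary layout and function name are hypothetical; the numbers are those given in the results. Note that the detection counts refer to all 27,949 CpG islands, whereas the depths refer only to the islands shared by the three libraries, so the two columns are not directly comparable.

```python
TOTAL_ISLANDS = 27949

# detected: islands with at least one read; depth: mean depth over shared islands.
libraries = {
    "undigested": {"detected": 21223, "depth": 1.68},
    "1x AT + non-methylated site": {"detected": 17006, "depth": 2.99},
    "2x AT + non-methylated site": {"detected": 15542, "depth": 8.25},
}


def summarize(libs, total=TOTAL_ISLANDS, control="undigested"):
    """Detected fraction and depth fold-change relative to the control."""
    base = libs[control]["depth"]
    return {
        name: {
            "detected_fraction": lib["detected"] / total,
            "depth_fold": lib["depth"] / base,
        }
        for name, lib in libs.items()
    }


summary = summarize(libraries)
for name, row in summary.items():
    print(f"{name}: {row['detected_fraction']:.1%} of islands detected, "
          f"{row['depth_fold']:.2f}x depth vs. control")
```

The comparison makes the trade-off explicit: each added digestion round lowers the share of islands detected while raising the depth on the islands that remain.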
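Why non-methylated site digestion enriches methylated molecules can be illustrated with a toy model. The recognition sequences CCGG (HpaII), CGCG (BstUI), and TGCGCA (FspI) are the standard sites of the enzymes named in claim 11. The rest is an assumption made for illustration: fragments are represented as a sequence plus a set of methylated cytosine positions, and a site is treated as cut only when none of its positions is methylated, which simplifies the real enzyme-specific sensitivity rules.

```python
# Standard recognition sequences of the methylation-sensitive enzymes in claim 11.
SENSITIVE_SITES = ("CCGG", "CGCG", "TGCGCA")  # HpaII, BstUI, FspI


def is_cut(seq, methylated, sites=SENSITIVE_SITES):
    """True if any site occurrence carries no methylated position.

    Simplified rule: methylation anywhere inside a site blocks cutting.
    """
    for site in sites:
        start = seq.find(site)
        while start != -1:
            if not any(p in methylated for p in range(start, start + len(site))):
                return True
            start = seq.find(site, start + 1)
    return False


def survivors(fragments):
    """Fragments left intact, i.e. still flanked by both adapters for PCR."""
    return [frag for frag in fragments if not is_cut(frag[0], frag[1])]


fragments = [
    ("AACCGGTT", {3}),     # CCGG with a methylated internal C: protected
    ("AACCGGTT", set()),   # same site, unmethylated: cut and lost
    ("AAGGTTCC", set()),   # no sensitive site: survives regardless
]

for seq, methylated in survivors(fragments):
    print(seq, sorted(methylated))
```

In this model the unmethylated copy of a site-containing fragment is removed while the methylated copy survives, which mirrors the enrichment of methylated reads described for the SEPTIN9 promoter. Fragments without any sensitive site pass through unchanged, so the enrichment applies only where such sites are present.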

Claims (15)

  1. A method for constructing a DNA methylation sequencing library, characterized in that it comprises the steps of:
    S1) providing a DNA sample to be tested;
    S2) ligating adapter sequences to both ends of the DNA to be tested, thereby obtaining adapter-ligated DNA;
    S3) subjecting the adapter-ligated DNA to AT digestion, non-methylated site digestion, and CT conversion, thereby obtaining the sequencing library.
  2. The method according to claim 1, characterized in that, in step S1), the DNA sample to be tested is selected from the group consisting of genomic DNA (gDNA), cell-free DNA (cfDNA), and combinations thereof.
  3. The method according to claim 1, characterized in that, in step S2), the adapter is a methylated adapter in which every cytosine (C) carries a 5-methyl modification.
  4. The method according to claim 1, characterized in that step S3) comprises the following steps:
    a. performing non-methylated site digestion and AT digestion on the DNA, and
    b. performing CT conversion on the digested DNA, followed by a first round of PCR amplification, thereby obtaining a digested sequencing library.
  5. The method according to claim 1, characterized in that step S3) comprises the following steps:
    a. performing non-methylated site digestion on the DNA,
    b. performing CT conversion on the digested DNA, followed by a first round of PCR amplification, and
    c. performing AT digestion on the converted DNA, followed by a second round of PCR amplification, thereby obtaining a digested sequencing library.
  6. The method according to claim 1, characterized in that step S3) comprises the following steps:
    a. performing non-methylated site digestion and a first round of AT digestion on the DNA,
    b. performing CT conversion on the digested DNA, followed by a first round of PCR amplification, and
    c. performing AT digestion on the converted DNA, followed by a second round of PCR amplification, thereby obtaining a digested sequencing library.
  7. The method according to claim 1, characterized in that, in step S1), the AT digestion is performed with a restriction endonuclease whose recognition site contains only A and T bases.
  8. The method according to claim 7, characterized in that the AT digestion is a single digestion with MseI, a single digestion with MluCI, or a double digestion with MseI and MluCI.
  9. The method according to claim 1, characterized in that, in step S3), the CT conversion is bisulfite chemical conversion or APOBEC deaminase conversion.
  10. The method according to claim 1, characterized in that, in step S3), the non-methylated site digestion is performed with a 5mC methylation-sensitive restriction endonuclease or a combination of such endonucleases.
  11. The method according to claim 10, characterized in that the 5mC methylation-sensitive restriction endonuclease or endonuclease combination is selected from the group consisting of HpaII endonuclease, BstUI endonuclease, FspI endonuclease, and combinations thereof.
  12. A DNA methylation sequencing library, characterized in that the methylation sequencing library is constructed by the method according to claim 1.
  13. A method for detecting DNA methylation in a sample, characterized in that it comprises the steps of:
    1) constructing a methylation library by the method according to claim 1; and
    2) sequencing the methylation library, thereby detecting DNA methylation in the sample.
  14. A method for diagnosing or predicting a disease associated with abnormal DNA methylation, characterized in that it comprises the steps of: obtaining a DNA sample from a subject to be tested, and detecting methylation of the DNA sample by the method according to claim 1, thereby diagnosing or predicting the disease.
  15. The method according to claim 14, characterized in that the disease is a tumor.
PCT/CN2023/135179 2022-11-30 2023-11-29 Cpg island methylation enrichment sequencing technology based on restriction enzyme digestion WO2024114696A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211525634.X 2022-11-30
CN202211525634.XA CN115976161A (en) 2022-11-30 2022-11-30 CpG island methylation enrichment sequencing technology based on restriction enzyme digestion

Publications (1)

Publication Number Publication Date
WO2024114696A1 true WO2024114696A1 (en) 2024-06-06

Family

ID=85971249

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/135179 WO2024114696A1 (en) 2022-11-30 2023-11-29 Cpg island methylation enrichment sequencing technology based on restriction enzyme digestion

Country Status (2)

Country Link
CN (1) CN115976161A (en)
WO (1) WO2024114696A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115976161A (en) * 2022-11-30 2023-04-18 天昊基因科技(苏州)有限公司 CpG island methylation enrichment sequencing technology based on restriction enzyme digestion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109295188A (en) * 2018-11-02 2019-02-01 深圳海普洛斯医学检验实验室 It is a kind of to simplify methylation sequencing approach and application for cfDNA
CN115125624A (en) * 2021-03-25 2022-09-30 南方医科大学 Barcode adaptor and medium-throughput multiple single-cell representative DNA methylation library construction and sequencing method
CN113943779A (en) * 2021-10-15 2022-01-18 厦门万基生物科技有限公司 Enrichment method of DNA sequence with high CG content and application thereof
CN114438184A (en) * 2022-04-08 2022-05-06 昌平国家实验室 Free DNA methylation sequencing library construction method and application
CN115976161A (en) * 2022-11-30 2023-04-18 天昊基因科技(苏州)有限公司 CpG island methylation enrichment sequencing technology based on restriction enzyme digestion

Also Published As

Publication number Publication date
CN115976161A (en) 2023-04-18
