CN114438184B - Free DNA methylation sequencing library construction method and application - Google Patents

Publication number
CN114438184B
Authority
CN
China
Prior art keywords
dna
biological sample
methylation
sequencing
library
Legal status
Active
Application number
CN202210365172.3A
Other languages
Chinese (zh)
Other versions
CN114438184A
Inventor
Cao Yunlong (曹云龙)
Xie Xiaoliang (谢晓亮)
Current Assignee
Changping National Laboratory
Original Assignee
Changping National Laboratory
Application filed by Changping National Laboratory
Priority to CN202210365172.3A
Publication of CN114438184A
Priority to PCT/CN2022/103499 (WO2023193357A1)
Application granted
Publication of CN114438184B

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Abstract

The invention relates to a method for constructing a cell-free DNA methylation sequencing library and its applications, and discloses a library construction method that enables simultaneous detection of DNA methylation and fragmentation omics (fragmentomics). During library construction, single-stranded phosphodiester bond nicks in the double-stranded DNA are first repaired; linker sequences are then ligated to the double-stranded DNA; and the DNA methylation sequencing library is finally built using an enzymatic conversion method. The methods provided herein preserve the integrity of the DNA double strand and prevent strand breakage, so the fragmentomic information in the sequencing library is not corrupted; at the same time, they prevent the end-repair step from erasing DNA methylation modifications downstream of the nicks, so that accurate DNA methylation information is obtained.

Description

Free DNA methylation sequencing library construction method and application
Technical Field
The present application relates to biotechnology, and more particularly to methods of epigenetic analysis of free DNA.
Background
Cell-free DNA (cfDNA) generally consists of DNA fragments released from cells into body fluids or the surrounding environment during cellular metabolism, exocytosis, apoptosis and necrosis. Materials such as plasma, serum, urine, saliva and amniotic fluid contain trace amounts of cfDNA, which reflects in vivo information such as genetic variation, disease and aging. Extracting cfDNA and then constructing, sequencing and analyzing a library from it is an effective non-invasive or minimally invasive detection approach that is widely applied in clinical diagnosis and scientific research, for example in cancer screening and diagnosis, prenatal diagnosis, preimplantation diagnosis and aging research. Epigenetic information carried by cfDNA, such as DNA methylation/hydroxymethylation features and fragmentomic features, is increasingly regarded as a valuable molecular marker for detecting, diagnosing and/or monitoring diseases such as cancer.
DNA methylation is a chemical modification of DNA within an organism in which the hydrogen atom at carbon 5 of the pyrimidine ring of cytosine is replaced by a methyl group, forming 5-methylcytosine (5mC). Without altering the DNA sequence, this modification affects many biological processes, such as gene expression and regulation, cell division and differentiation, and cell metabolism and growth, and ultimately influences the growth and development, metabolism and reproduction, disease onset and progression, and aging and death of the organism; it is an epigenetic mechanism. To detect methylation on DNA sequences by high-throughput sequencing (NGS), methylated and unmethylated cytosines must be distinguished by specifically converting one of them into another base.
Fragmentation omics (fragmentomic) information is the set of information obtained by analyzing the fragmentation characteristics of cfDNA, for example the distribution of cfDNA across the genome, fragment length and start/end positions, fragment-end base characteristics, and fragment non-alignment characteristics. Studies have shown that cfDNA tends to be cleaved at particular genomic regions or elements, so fragmentomic information can reflect open and closed chromatin, nucleosome occupancy, transcription factor binding, epigenetic modifications and so on; it is a key type of epigenomic information and a valuable molecular marker for monitoring and diagnosing diseases such as cancer [1]. Obtaining fragmentomic information by high-throughput sequencing requires that the integrity of the DNA fragments be preserved.
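As a toy illustration of the fragmentomic features listed above, the sketch below tallies fragment lengths, start and end coordinates, and 5' end motifs from aligned fragment coordinates. It assumes 0-based, half-open (start, end) coordinates on the forward strand of a toy reference string; the function name `fragment_features` and the 4-nt motif length are illustrative choices, not part of the patent.

```python
from collections import Counter

def fragment_features(fragments, reference, motif_len=4):
    """Tally simple fragmentomic features (illustrative sketch).

    fragments: list of (start, end) pairs, 0-based and half-open, giving where
        each cfDNA fragment maps on the forward strand of `reference`.
    reference: reference sequence as a plain string (toy example).
    motif_len: number of bases read from each fragment's 5' end.
    """
    return {
        "lengths": Counter(end - start for start, end in fragments),
        "starts": Counter(start for start, _ in fragments),
        "ends": Counter(end for _, end in fragments),
        # Only the forward-strand 5' end is used here; a real analysis would
        # also take the reverse complement at the other fragment end.
        "end_motifs": Counter(reference[start:start + motif_len] for start, _ in fragments),
    }

features = fragment_features([(0, 10), (2, 14), (0, 12)], "ACGTTTGCAACCGGTTAAGG")
print(features["lengths"])     # Counter({12: 2, 10: 1})
print(features["end_motifs"])  # Counter({'ACGT': 2, 'GTTT': 1})
```

On real data the coordinates would come from paired-end alignments, and the length histogram is where the periodic peaks typical of cfDNA become visible.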
However, because cfDNA is produced by extensive nuclease degradation inside and outside cells, it has the following characteristics that make it difficult to detect:
1. cfDNA content is low (e.g., only a few nanograms per milliliter of plasma). The detection method must therefore accommodate cfDNA inputs of 1 ng or less, which requires high library construction efficiency to improve detection sensitivity.
2. cfDNA fragments are short (around 100 bp to 200 bp). First, short DNA fragments are easily lost during purification steps in library construction and cannot withstand the additional breakage caused by chemical conversion. Second, short fragments are unsuitable for whole-genome library construction methods based on transposases, random primers or fragmentation followed by ligation, and targeted-amplification library construction methods such as targeted PCR also face certain difficulties and limitations.
3. cfDNA is often severely chemically damaged. The damage arises from multiple processes: DNA damage caused by cellular physiology and pathological conditions; nuclease digestion during cell death; continuous attack by active substances such as free nucleases in the body-fluid environment (e.g., the blood circulation); and long-term storage in plasma and repeated freeze-thaw cycles. It has been found that 97-98% of cfDNA is nicked, i.e., carries single-stranded phosphodiester bond breaks in the DNA double strand [2]. With single-stranded library construction, cfDNA breaks at the nicks once the double strand is denatured, generating shorter fragments [3].
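The effect of nicks on single-stranded library construction can be illustrated with simple bookkeeping. In a double-stranded fragment, base pairing holds the pieces of a nicked strand together; after denaturation, each nicked strand falls apart. The helper `single_strand_pieces` below is hypothetical, not from the patent, and assumes each nick is given as the index of the first nucleotide after the missing phosphodiester bond.

```python
def single_strand_pieces(length, nicks):
    """Lengths of the pieces one strand yields after denaturation.

    length: strand length in nucleotides.
    nicks: positions i (0 < i < length) where the phosphodiester bond between
        nucleotide i-1 and nucleotide i is missing.
    """
    if any(not 0 < i < length for i in nicks):
        raise ValueError("nick positions must lie strictly inside the strand")
    cuts = [0] + sorted(set(nicks)) + [length]
    return [right - left for left, right in zip(cuts, cuts[1:])]

# A 166-nt strand nicked once at position 60 yields a 60-nt and a 106-nt piece,
# so a single-stranded library records two short molecules instead of one.
print(single_strand_pieces(166, [60]))  # [60, 106]
print(single_strand_pieces(166, []))    # [166]
```

The original fragment length, and with it part of the fragmentomic signal, is therefore no longer observable once a nicked strand is denatured.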
Existing DNA methylation sequencing library construction methods fall into four categories: (1) double-stranded library construction based on bisulfite conversion [4]; (2) single-stranded library construction based on bisulfite conversion (e.g., the Swift Biosciences Accel-NGS Methyl-Seq DNA Library Kit, Cat. No. 30024); (3) double-stranded library construction based on enzymatic conversion (e.g., the EM-seq technology of New England Biolabs, WO2017075436A1, and its commercial kit E7120; the TAPS technology of WO2019136413A1; and the Cabernet technology of WO2021077415); and (4) single-stranded library construction based on enzymatic conversion (e.g., CN114032287A). However, none of these four approaches fully overcomes the difficulties of cfDNA methylation detection.
Bisulfite conversion, currently the most widely used DNA methylation detection method, requires reacting DNA for several hours under harsh conditions of high salt (9 M), acidity (pH = 5) and high temperature (50-90 ℃), which denatures and fragments the DNA; most of the DNA is degraded and lost during the conversion reaction. This extensive fragmentation and loss means that bisulfite conversion cannot meet the low-input requirements of cfDNA library construction, and it aggravates the short length and nick damage of cfDNA. It is therefore difficult to develop highly sensitive cfDNA methylation detection based on bisulfite conversion.
In recent years, DNA methylation sequencing technologies based on enzymatic conversion, such as EM-seq, TAPS and Cabernet, have been developed and matured. Enzymatic conversion dispenses with harsh chemical treatment, avoids breaking and degrading DNA during conversion, and offers higher library construction efficiency, higher sensitivity and better library uniformity. However, existing enzymatic-conversion patents and commercial products do not fully solve the problems of cfDNA methylation detection. Regarding the low input amount of cfDNA, the minimum input of both EM-seq (WO2017075436A1) and TAPS (WO2019136413A1) is 5-10 ng, so neither can sequence cfDNA with high sensitivity. Cabernet (WO2021077415) can build libraries with high sensitivity from low DNA inputs, even single cells, but its transposase cannot be applied to short cfDNA fragments, and in particular it offers no solution to the cfDNA nick problem.
The chemical damage of cfDNA has so far been overlooked. In conventional sequencing without methylation detection, a double-stranded library is used; the nicks are smoothed over by end repair, sequencing is unaffected, and so no attention is paid to them. Nor has the problem been recognized in traditional bisulfite methylation sequencing: because single-stranded library construction is used, cfDNA breaks at the nicks and those pieces are lost, but the methylation data themselves are not erroneous. Existing methods based on the newer enzymatic conversion chemistry use double-stranded library construction, and in these severe methylation erasure does occur, producing marked methylation loss. For example, in methylation sequencing data that Erger et al. [5] generated with a double-stranded library construction method, the average methylation level fell steadily from about 80% to about 40% from the start of the read, and widespread methylation erasure was also observed in Read 2, but the authors did not discuss this anomaly. In the DNA methylation sequencing library and its construction and detection methods disclosed by Bamboo Stone Biology (CN114032287A), the authors observed the anomalous decline of methylation level along the read length but attributed it simply to loss of terminal single strands and sidestepped it by switching to single-stranded library construction. This does not truly solve the problem: fragmentation at nicks in the single-stranded state prevents accurate fragmentomic information from being obtained, and single-stranded ligation is inefficient, so library yields are low. Thus, to date, the effect of single-stranded nicks on DNA methylation detection has not been recognized or addressed in the field.
In summary, no current library construction method comprehensively addresses the low template input, short fragment length and extensive nicking encountered in cfDNA methylation detection, and none yields an intact DNA library from which cfDNA methylation can be measured accurately and fragmentomic information obtained.
Disclosure of Invention
After studying the characteristics of cfDNA and the principles of library construction in depth, the inventors analyzed the causes of the problems encountered in cfDNA methylation detection, made a series of targeted improvements, and provide a new cfDNA methylation sequencing method.
In one aspect, provided herein is a method of pre-processing a biological sample for constructing a sequencing library, comprising:
S1: repairing single-stranded nicks in the DNA fragments contained in the sample with a DNA ligase; and
S2: optionally, filling in gaps at the ends of the DNA fragments with a DNA polymerase,
wherein the sequencing library is used to obtain DNA methylation information data of the biological sample after sequencing or to obtain DNA methylation information and fragmentation omics information data of the biological sample.
In some embodiments, the DNA fragments are between 50 bp and 1000 bp in length; preferably between 100 bp and 500 bp; more preferably between 100 bp and 350 bp.
In some embodiments, the method further comprises fragmenting DNA molecules comprised in the biological sample to generate the DNA fragments prior to step S1.
In some embodiments, the ligase catalyzes the formation of 3',5'-phosphodiester bonds.
In some embodiments, the ligase is selected from the group consisting of HiFi Taq DNA ligase, T4 DNA ligase, Taq DNA ligase, E. coli DNA ligase and T7 DNA ligase.
In some embodiments, the biological sample is selected from an animal body fluid, a cell culture fluid, or a natural biological sample, including, but not limited to, peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph fluid, alveolar lavage fluid, amniotic fluid, blastocoel fluid, a cell culture fluid, an embryo culture fluid, a microbial culture medium, a soil leachate, and a bone meal leachate.
In some embodiments, the DNA fragments include, but are not limited to, extracellular free DNA, fragmented DNA, e.g., extracellular free DNA from a cancer subject.
In some embodiments, the amount of DNA fragments in the sample is no more than 10 ng, or no more than 1 ng, for example 0.1 ng.
In some embodiments, the DNA methylation includes, but is not limited to, methylation and/or hydroxymethylation of cytosine.
In another aspect, provided herein is a method of constructing a sequencing library, comprising:
1) providing a biological sample and pretreating it by the above pretreatment method to obtain DNA fragments free of 3',5'-phosphodiester bond nicks;
2) constructing the sequencing library from the DNA fragments free of 3',5'-phosphodiester bond nicks;
wherein the sequencing library is used to obtain DNA methylation information data of the biological sample after sequencing or to obtain DNA methylation information and fragmentation omics information data of the biological sample.
In some embodiments, step 2) further comprises:
a) performing end fill-in and ligating a linker sequence to both ends of the DNA fragments, wherein some or all of the cytosines in the linker sequence are methylated and/or hydroxymethylated cytosines; and
b) converting unmethylated cytosines in the DNA fragments into uracil by an enzymatic conversion method, so that they are read as thymine in subsequent amplification and sequencing, while methylated cytosines are read as cytosine.
In some embodiments, step 2) further comprises:
a) performing end fill-in and ligating a linker sequence to both ends of the DNA fragments;
b) converting the methylated cytosines in the resulting DNA fragments into uracil or dihydrouracil by an enzymatic conversion method, so that the uracil or dihydrouracil is read as thymine in subsequent amplification and sequencing, while unmethylated cytosines are read as cytosine.
In some embodiments, the enzymatic conversion process employs an EM-seq conversion process or a TAPS conversion process.
In some embodiments, the method further comprises adding carrier DNA to the biological sample after step a).
In some embodiments, the carrier DNA may be any DNA that does not contain the linker sequence; preferably, the carrier DNA fragments are 100-500 bp in size.
In another aspect, provided herein is a sequencing library obtained by the above method.
In another aspect, provided herein is a method of determining the location of a single-stranded nick in a DNA fragment contained in a biological sample, comprising:
1) dividing the biological sample into two parts: biological sample a and biological sample B;
2) treating biological sample A with the aforementioned pretreatment method to prepare a first sequencing library, and treating biological sample B with the same pretreatment method, except that no DNA ligase is used in the pretreatment step, to prepare a second sequencing library;
3) determining the single-stranded nick location based on the difference in DNA methylation information obtained from the first sequencing library and the second sequencing library.
In another aspect, provided herein is a method of identifying a health condition in a subject, comprising:
1) providing a biological sample from said subject;
2) constructing a sequencing library using the above pretreatment method; and
3) sequencing the sequencing library, or sequencing after hybridization capture enrichment,
wherein the DNA methylation information data of the biological sample or the DNA methylation information data and the fragmentation omics information data of the biological sample are obtained from the sequencing data and compared to normal DNA methylation information data and/or fragmentation omics information data in the population to determine the health condition of the subject.
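One simple way to compare a subject's methylation data with population norms, offered only as a hedged sketch, is a per-region z-score against healthy reference samples. The functions, region names, reference values and the 3.0 cut-off below are all hypothetical; the patent does not specify a scoring scheme.

```python
from statistics import mean, stdev

def methylation_zscores(subject, reference_panel):
    """Per-region z-scores of a subject's methylation levels.

    subject: dict mapping region name -> methylation level (0-1) of the test sample.
    reference_panel: dict mapping region name -> list of levels measured in
        healthy individuals (the "normal" population data).
    Regions with fewer than two reference values or zero spread are skipped.
    """
    scores = {}
    for region, level in subject.items():
        values = reference_panel.get(region, [])
        if len(values) < 2:
            continue
        spread = stdev(values)
        if spread == 0:
            continue
        scores[region] = (level - mean(values)) / spread
    return scores

def flag_regions(scores, threshold=3.0):
    """Regions whose absolute z-score reaches a (hypothetical) cut-off."""
    return sorted(region for region, z in scores.items() if abs(z) >= threshold)

panel = {"region_1": [0.20, 0.30, 0.25, 0.25], "region_2": [0.80, 0.80, 0.80]}
scores = methylation_zscores({"region_1": 0.90, "region_2": 0.80}, panel)
print(flag_regions(scores))  # ['region_1']
```

The same comparison could be applied to fragmentomic features, such as the fraction of fragments in a given length range, in place of methylation levels.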
The above methods, provided herein, can be used for epigenetic analysis of extracellular free DNA. The methods provided herein can also be used in the fields of genomics, medicine, diagnostics, and epigenetics research.
Drawings
FIG. 1 is a schematic flow diagram of a methylation library construction method provided herein.
FIG. 2 is a schematic flow diagram of the EM-seq transformation method.
FIG. 3 is a schematic flow diagram of the TAPS conversion process.
FIG. 4 shows the results of library methylation bias analysis of methylation libraries prepared from human peripheral blood cell-free DNA with (solid line) and without (dashed line) DNA ligase treatment. The vertical axis is the mean CpG methylation rate, plotted along each library fragment in the 5' to 3' direction (horizontal axis).
FIG. 5 shows the fragment length distribution of cell-free DNA, which exhibits peaks with a periodicity of about 170 bp.
FIG. 6 shows the distribution of library fragments generated by different library construction methods. Equal input amounts of library DNA were analyzed by capillary gel electrophoresis, and the distribution of DNA fragments in the 1-6000 bp range was examined. The library prepared by the method of the invention (upper panel) shows a periodic fragment peak pattern consistent with the characteristics of peripheral blood cell-free DNA, whereas the library prepared with the Swift Biosciences Accel-NGS Methyl-Seq DNA Library Kit (lower panel) shows only one main peak, no periodic peaks, and a large number of non-library spurious peaks.
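The per-position methylation curves described for FIG. 4 resemble what methylation-analysis tools often call an M-bias plot. A minimal sketch of the computation follows; it assumes each read has already been reduced to a list of per-position CpG calls (True, False or None), which is a hypothetical input format, and `m_bias` is an illustrative name.

```python
from collections import defaultdict

def m_bias(reads):
    """Mean CpG methylation rate at each read position (5' -> 3').

    reads: iterable of per-read call lists; element i is True (CpG at read
        position i called methylated), False (called unmethylated) or None
        (no CpG at that position).
    """
    methylated = defaultdict(int)
    observed = defaultdict(int)
    for calls in reads:
        for pos, call in enumerate(calls):
            if call is None:
                continue
            observed[pos] += 1
            methylated[pos] += int(call)
    return {pos: methylated[pos] / observed[pos] for pos in sorted(observed)}

curve = m_bias([[True, None, True, False], [True, None, False, False]])
print(curve)  # {0: 1.0, 2: 0.5, 3: 0.0}
```

A flat curve, as described for the ligase-treated library, indicates no position-dependent loss; a curve that falls toward the 3' end, as described for the untreated library, is the signature of methylation erasure.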
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art.
"DNA ligase" refers herein to an enzyme used to repair single-stranded nicks in double-stranded DNA molecules. DNA ligases generally catalyze the formation of 3',5'-phosphodiester bonds; for example, a DNA ligase can catalyze formation of a phosphodiester bond between the 5' phosphate group and the 3' hydroxyl group at a nick. DNA ligases do not normally add new nucleotides at nicks.
"DNA polymerase" as used herein refers to an enzyme that, in the presence of a primer, uses one DNA strand as a template to add dNTPs (deoxyribonucleotides) to the 3' end of the primer, thereby synthesizing the complementary strand. DNA polymerases may also have 3'-5' exonuclease activity, which provides proofreading during synthesis, and 5'-3' exonuclease activity, which serves in excision repair.
"DNA fragment" as used herein refers to a short piece of DNA, e.g., between 50 bp and 700 bp in length, e.g., between 100 bp and 500 bp, especially between 100 bp and 350 bp. The DNA fragments contained in a biological sample are usually heterogeneous, i.e., of unequal lengths, in which case the above length may refer to the average length of the fragments. The fragments may have different sequences, e.g., from different regions of the genome of the same organism, or even from different organisms. A DNA fragment may carry single-stranded nicks, and may be blunt-ended or non-blunt-ended (with a 3' overhang or a 5' overhang).
"blunt-ended DNA fragment" as used herein refers to a double-stranded DNA fragment with no 3' overhang or 5' overhang at its ends.
"DNA methylation" as used herein refers to the modification of a cytosine base in a DNA molecule or DNA fragment to 5-methylcytosine (5mC). In vertebrates, DNA methylation generally occurs at CpG sites (sites where a cytosine is immediately followed by a guanine in the 5' to 3' direction of the DNA sequence) and is catalyzed by DNA methyltransferases, which convert cytosine to 5-methylcytosine. Most CpG sites in the human genome are methylated, but those in certain regions, such as CpG islands rich in cytosine (C) and guanine (G), are usually unmethylated. CpG methylation can affect the transcriptional activity of the associated genes; for example, methylation can silence tumor suppressor genes, while demethylation can activate the expression of certain oncogenes, both of which can contribute to carcinogenesis. In addition, a small number of cytosines are modified to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) or 5-carboxylcytosine (5caC), although these occur far less often than 5mC. Unless the context indicates otherwise, references herein to methylation may also refer to the 5hmC, 5fC and 5caC modifications.
"DNA methylation information" as used herein refers to information about the methylation status of a DNA molecule or DNA fragment, including but not limited to methylation sites, methylation levels and methylation type (5mC or 5hmC). "Methylation level", also referred to herein as "degree of methylation", refers to the proportion (or frequency) of copies of a particular site in a sample that carry the methylation modification. Whether a site is methylated can be detected in several ways. Common methods include chemical or enzymatic conversion, in which either methylated or unmethylated cytosine is converted to uracil (U) or to a base with essentially the same base-pairing behavior as uracil (e.g., dihydrouracil, DHU). During subsequent amplification, the uracil is treated as thymine (T) and pairs with adenine (A), so the converted cytosine (unmethylated or methylated, depending on the method) appears as thymine in the detection result (e.g., the sequencing result). By comparison with a reference sequence, it can then be determined whether a cytosine in a DNA molecule or DNA fragment is methylated. The reference sequence may be a sequence from the same sample that has not undergone the conversion, or the corresponding sequence in a healthy population. In addition, as described below, 5mC and 5hmC can also be distinguished by certain methods. DNA methylation information is now widely used in the screening and diagnosis, particularly the early screening and diagnosis, of cancers such as lung, breast, liver and colorectal cancer.
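The comparison with a reference sequence described above can be sketched as follows. This is an illustrative sketch, not the inventors' pipeline: `cpg_sites` and `methylation_levels` are hypothetical names, reads are assumed to be ungapped alignments to the forward strand of the reference (bottom-strand CpGs are ignored), and a flag selects whether a retained C means methylated (EM-seq-style readout) or unmethylated (TAPS-style readout).

```python
def cpg_sites(reference):
    """0-based positions of the C in every CpG dinucleotide of `reference`."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]

def methylation_levels(reference, reads, c_means_methylated=True):
    """Per-CpG methylation level from converted, aligned reads.

    reads: list of (offset, sequence) pairs for ungapped alignments to the
        forward strand of `reference` (a simplifying assumption).
    c_means_methylated: True for a readout in which unmethylated C is read as T
        (EM-seq-style); False for one in which methylated C is read as T
        (TAPS-style).
    """
    counts = {pos: [0, 0] for pos in cpg_sites(reference)}  # [methylated, covered]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if pos not in counts or base not in ("C", "T"):
                continue  # not a CpG cytosine, or an uninformative base
            read_as_c = base == "C"
            counts[pos][0] += int(read_as_c == c_means_methylated)
            counts[pos][1] += 1
    return {pos: m / n for pos, (m, n) in counts.items() if n}

reference = "ACGTTCGA"
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (1, "CGTTTG")]
print(methylation_levels(reference, reads))         # CpG 1: 2/3, CpG 5: 1/3
print(methylation_levels(reference, reads, False))  # CpG 1: 1/3, CpG 5: 2/3
```

The same reads give opposite levels under the two readout conventions, which is why the conversion chemistry must be known before methylation is called.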
"fragmentation omics information" (also called fragmentomic information) as used herein refers to the set of information obtained by analyzing the fragmentation characteristics of the cell-free DNA in a sample, such as the distribution of cfDNA across the genome, fragment length and start/end positions, fragment-end base characteristics, and fragment non-alignment characteristics. Several recent studies have reported its use in cancer (e.g., lung cancer) screening.
"enzymatic conversion" as used herein refers to modifying cytosine or methylated cytosine through catalysis by particular enzymes so that methylated and unmethylated states, or 5mC and 5hmC, can be distinguished in a subsequent assay. Several enzymatic conversion methods are known in the art, including but not limited to the EM-seq conversion method and the TAPS conversion method. In enzymatic conversion, these enzymes are usually used in combination with chemical reagents.
"EM-seq conversion method" as used herein refers to the technique developed by New England Biolabs for distinguishing methylated from unmethylated cytosines by enzymatic conversion. It uses TET dioxygenases (including TET1, TET2 and TET3) to modify 5mC and 5hmC to 5-carboxylcytosine (5caC), followed by deamination of unmethylated cytosines by a cytidine deaminase, which converts them to uracil. The cytidine deaminase used includes members of the APOBEC protein family, e.g., APOBEC3A. When a DNA glycosyltransferase (GT) is also used, 5mC and 5hmC can be further distinguished. A scheme of the EM-seq conversion method is shown in FIG. 2. For more details on the EM-seq conversion method, see PCT application publication WO2017075436A, which is incorporated herein by reference in its entirety.
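The expected sequencing readout of the EM-seq conversion described above can be sketched as a string transformation. This is a simplification, not NEB's protocol: `em_seq_readout` is a hypothetical helper, one strand is modeled in isolation, conversion is assumed to be complete, and modified cytosines are supplied as a position-to-label dictionary.

```python
def em_seq_readout(seq, mods):
    """Expected read of one strand after EM-seq-style conversion and PCR.

    seq: strand sequence (uppercase A/C/G/T).
    mods: dict mapping 0-based positions of modified cytosines to "5mC" or "5hmC".
    Unmodified C is deaminated to U and read as T; 5mC/5hmC, protected from
    deamination by the preceding TET oxidation step, are read as C.
    """
    for pos in mods:
        if seq[pos] != "C":
            raise ValueError(f"position {pos} is not a cytosine")
    return "".join(
        "T" if base == "C" and i not in mods else base
        for i, base in enumerate(seq)
    )

print(em_seq_readout("ACGTCCGA", {1: "5mC"}))  # ACGTTTGA
```

Only the methylated cytosine at position 1 survives as C; both unmodified cytosines become T.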
The "TAPS (TET-assisted pyridine borane sequencing) conversion method" is somewhat similar to the EM-seq conversion method in that it also enzymatically modifies 5mC and 5hmC to 5-carboxylcytosine (5caC). However, it is not followed by a deamination step; instead, the reducing agent pyridine borane converts 5caC to dihydrouracil (DHU), while unmethylated cytosine remains unchanged. During subsequent replication or amplification, DHU is recognized by the polymerase as uracil and pairs with A. Thus, sites carrying 5mC or 5hmC are detected as T, while unmodified C is still detected as C. Similarly, when a DNA glycosyltransferase (GT) or KRuO4 is used, 5mC and 5hmC can be further distinguished. A scheme of the TAPS conversion method is shown in FIG. 3. For more details on the TAPS conversion method, see PCT application publication WO2019136413A, which is incorporated herein by reference in its entirety.
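For contrast with EM-seq, the TAPS readout can be sketched the same way. `taps_readout` is a hypothetical helper that assumes complete conversion. The `protect_5hmc` flag is an assumption modeling the glycosyltransferase option mentioned above: it takes glucosylated 5hmC to be shielded from TET oxidation, so that only 5mC is read as T.

```python
def taps_readout(seq, mods, protect_5hmc=False):
    """Expected read of one strand after TAPS-style conversion and PCR.

    seq: strand sequence (uppercase A/C/G/T).
    mods: dict mapping 0-based positions of modified cytosines to "5mC" or "5hmC".
    5mC/5hmC -> 5caC -> DHU, read as T; unmodified C is read as C.
    protect_5hmc: assume 5hmC was glucosylated beforehand and is therefore
        not converted (it stays C), so only 5mC is read as T.
    """
    for pos in mods:
        if seq[pos] != "C":
            raise ValueError(f"position {pos} is not a cytosine")
    out = []
    for i, base in enumerate(seq):
        mod = mods.get(i)
        converted = mod == "5mC" or (mod == "5hmC" and not protect_5hmc)
        out.append("T" if converted else base)
    return "".join(out)

marks = {1: "5mC", 5: "5hmC"}
print(taps_readout("ACGTCCGA", marks))                     # ATGTCTGA
print(taps_readout("ACGTCCGA", marks, protect_5hmc=True))  # ATGTCCGA
```

Comparing the two outputs localizes 5hmC: the position that changes between them (5 here) carries 5hmC rather than 5mC.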
"health condition" as used herein refers to the health status of a subject, including whether a disease is present, the magnitude of risk of a disease, whether a therapeutic agent or treatment is appropriate, the prognosis of a disease, and the like.
By "subject" is meant herein an animal, such as a mammal, including, but not limited to, humans, rodents, simians, felines, canines, equines, bovines, porcines, ovines, caprines, mammalian laboratory animals, mammalian farm animals, mammalian sport animals, and mammalian pets. The subject may be male or female and may be any suitable age subject, including infant, juvenile, adolescent, adult and geriatric subjects. In some examples, the subject is a patient. In a particular example, the subject is a human, such as a human patient. The term is often used interchangeably with "patient," "subject to be treated," and the like.
In one aspect of the invention, a method of pretreating a sample before sequencing library construction is provided. The pretreatment comprises repairing single-stranded nicks in the DNA fragments contained in the sample, followed by conventional end repair. The nicks can be repaired with a DNA ligase, and end repair can be performed with a DNA polymerase. Nick repair must precede conventional end repair because the inventors recognized for the first time, and demonstrated experimentally, that these single-stranded nicks are a major cause of lost methylation information, particularly for cell-free DNA (cfDNA) such as circulating tumor DNA. In the conventional end-repair step, the DNA polymerase performs strand replacement downstream of each nick through its 5'-3' exonuclease and polymerase activities, thereby erasing the methylation information carried by the original strand. If the ends of the DNA fragments are repaired only after the nicks have been sealed, this strand replacement cannot occur and the original methylation information is preserved.
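The erasure mechanism described above can be illustrated with a deliberately simplified model. The function `surviving_methylation` is a hypothetical sketch, not the inventors' code. It assumes that, without prior ligation, strand replacement runs from the most 5' nick all the way to the 3' end of the strand and incorporates only unmodified cytosines; real polymerases may stop earlier.

```python
def surviving_methylation(length, methylated, nicks, ligate_first):
    """Methylated positions on one strand that survive end repair (toy model).

    length: strand length in nucleotides.
    methylated: 0-based positions carrying 5mC/5hmC before library construction.
    nicks: positions i where the bond between nucleotide i-1 and i is missing.
    ligate_first: True if the nicks are sealed by DNA ligase before end repair.

    Assumption: without prior ligation, the end-repair polymerase extends from
    the 3'-OH at each nick and, by strand replacement, resynthesizes the strand
    from the nick to its 3' end with unmodified dCTP, so every mark at or
    downstream of the most 5' nick is lost.
    """
    if any(not 0 < i < length for i in nicks):
        raise ValueError("nick positions must lie strictly inside the strand")
    if ligate_first or not nicks:
        return set(methylated)
    first_nick = min(nicks)
    return {pos for pos in methylated if pos < first_nick}

marks = {10, 50, 120, 150}
print(sorted(surviving_methylation(166, marks, [80], ligate_first=False)))  # [10, 50]
print(sorted(surviving_methylation(166, marks, [80], ligate_first=True)))   # [10, 50, 120, 150]
```

Averaged over many fragments with randomly placed nicks, positions nearer a strand's 3' end are more likely to lie downstream of a nick, which matches the read-position-dependent decline in methylation that the text attributes to unrepaired nicks.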
In another aspect of the present invention, a method for constructing a sequencing library is provided, in which the library is constructed after the sample has undergone the above pretreatment. In some embodiments, library construction further comprises enzymatically converting cytosine (C) or methylated cytosine (5mC and/or 5hmC) so that a subsequent detection step can distinguish methylated from unmethylated cytosine, or 5mC from 5hmC, yielding a DNA methylation sequencing library. Sequencing this library on a sequencer yields the corresponding methylation data. As those skilled in the art will understand, both methylation data and fragmentomics data can be obtained after sequencing with the method disclosed herein. This library construction technology builds a methylation NGS library from double-stranded cfDNA fragments and distinguishes methylation states by enzymatic conversion after the single-stranded nicks have been repaired. It therefore leaves the original DNA fragments in the sample essentially intact and preserves their fragmentomics information, such as fragment length and fragment length distribution.
In another aspect of the invention, carrier DNA can be used to further reduce the amount of DNA fragments that the method requires in the sample, down to 0.1 ng or even lower.
In another aspect of the invention, the location of single-stranded nicks in DNA fragments contained in a biological sample is determined from the difference in methylation information obtained with and without the addition of DNA ligase. Because the nick position itself usually carries no 5mC or 5hmC, in most cases the nick can only be localized to a range. For example, the nick can be taken to lie between a first position and a second position downstream of it: the first position is a methylated position that is detected both with and without DNA ligase, and the second position is a position showing a methylation difference (for example, methylation is detected when DNA ligase is added but not when it is omitted).
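The bracketing rule above can be sketched as follows. This is a minimal illustration, assuming per-site methylation calls have already been aggregated for sample A (ligase added) and sample B (no ligase), for example over reads from fragments with the same coordinates; that aggregation step and the helper name `nick_interval` are hypothetical.

```python
from __future__ import annotations


def nick_interval(
    positions: list[int],
    with_ligase: dict[int, bool],
    without_ligase: dict[int, bool],
) -> tuple[int | None, int] | None:
    """Bracket a nick between two methylation sites (hypothetical helper).

    positions: site coordinates ordered 5'->3' along the fragment.
    with_ligase / without_ligase: site -> True if called methylated.
    Returns (first, second): `first` is the last upstream site methylated in
    both samples (None if there is none), `second` is the first site that is
    methylated only when ligase was added. Returns None if no site differs.
    """
    last_shared = None
    for pos in positions:
        m_plus = with_ligase.get(pos, False)
        m_minus = without_ligase.get(pos, False)
        if m_plus and m_minus:
            last_shared = pos
        elif m_plus:
            # Methylation seen only with ligase: the nick lies upstream of here.
            return (last_shared, pos)
    return None


sites = [10, 25, 40, 58, 73]
plus = {p: True for p in sites}
minus = {10: True, 25: True, 40: False, 58: False, 73: False}
print(nick_interval(sites, plus, minus))  # (25, 40)
```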
In another aspect of the invention, the health status of the subject can be determined from the obtained methylation information, and in particular, the methylation information can be combined with the fragmentation omics information to more accurately determine the health status of the subject.
The technical problems addressed by the present invention include, but are not limited to:
1. providing a method for constructing high-throughput next-generation sequencing (NGS) libraries from free DNA and sequencing them, so that DNA methylation and fragmentomics information can be determined simultaneously and accurately;
2. achieving high sensitivity, so that NGS libraries can be built efficiently from low-input samples of 1 ng or less;
3. preserving the integrity of the free DNA strands, so that the fragmentomics information in the sequencing library is not damaged;
4. avoiding the DNA breakage and methylation information loss caused by nick damage in free DNA;
5. enabling, with simple changes to the workflow, the determination of other methylation information, such as DNA hydroxymethylation, together with fragmentomics information.
The method provided by the invention can be used for, but is not limited to, epigenetic analysis of extracellular free DNA, and addresses the problems above. The method may include one or more of the following features:
1. Specific DNA ligases prevent the methylation loss caused by DNA nick damage. For DNA containing nicks, the method uses one or more DNA ligases, such as Taq DNA ligase, to seal the phosphodiester bond breaks in the template DNA duplex, preventing the duplex from breaking apart and being lost and/or its methylation information from being erased in later steps. When the nick-repaired template DNA is built into a double-stranded library by enzymatic conversion, severe methylation erasure no longer occurs at the 3' end of the library, and the detected methylation rate there recovers to close to the average.
2. Carrier DNA protects trace amounts of DNA, enabling efficient library construction and high-sensitivity methylation sequencing from trace input. Building on the enzymatic conversion method and referring to the Cabernet technology of patent publication WO2021077415, carrier DNA with similar physicochemical properties but no adaptor sequence is added after the library adaptors have been ligated to the template DNA and before enzymatic conversion. The carrier DNA then accounts for most of the loss in steps prone to DNA loss or degradation, such as purification, which effectively protects the template DNA. In addition, the single-strand purification step after conversion is omitted and the PCR amplification mix is added directly, minimizing template loss from purification.
3. Bisulfite conversion can destroy DNA fragmentomics information. The method instead combines enzymatic conversion with a double-stranded library construction strategy, so that DNA methylation is measured accurately while fragmentomics information is retained. Treatment with DNA ligase does not affect the preservation of fragmentomics information.
Overview of the method
(1) Extraction of free DNA
Free DNA refers to DNA free outside cells or nuclei, and can be extracted from various biological materials such as body fluids, in vitro cell culture fluids, and natural environments, including but not limited to: peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph fluid, alveolar lavage fluid, amniotic fluid, blastocoel fluid, cell culture fluid, embryo culture fluid, microorganism culture medium, soil leachate, bone meal leachate, etc. The free DNA can be obtained by extraction and purification.
The total amount of free DNA used for downstream sequencing analysis can be as low as the picogram to nanogram range.
(2) Gap repair of DNA
This step uses a specific DNA ligase, which may be, but is not limited to, any one or a combination of the following repair reagents and/or DNA ligases: HiFi-Taq DNA Ligase, T4 DNA Ligase, Taq DNA Ligase, T7 DNA Ligase, and E. coli DNA Ligase.
The DNA ligase and/or DNA repair reagent seals the nicks in the double-stranded template DNA, preventing strand breakage and loss and/or erasure of methylation information in subsequent steps.
(3) Adding a library adaptor sequence (adaptor) to both ends of DNA
The ends of the double-stranded DNA are filled in using a DNA polymerase or a similar enzyme, and in some instances an A (adenine) base is added to the 3' ends to facilitate adaptor ligation.
DNA ligase is then used to ligate DNA adaptors of specific sequence, in which some or all cytosines are methylated or hydroxymethylated, to both ends of the template DNA.
In some examples, the DNA linker sequence may have a biotin label on it to facilitate subsequent purification.
In some examples, a sample barcode and a single-molecule barcode can be introduced into the adaptor sequence to label the source of each DNA template and increase detection accuracy; this also allows samples from multiple sources to be pooled in subsequent steps.
(4) Addition of Carrier DNA
To prevent loss of trace template DNA over the many subsequent steps, carrier DNA with similar properties, in an amount several times that of the template DNA, is added to the template DNA solution. The carrier DNA then accounts for most of the loss in steps prone to DNA loss or degradation, such as purification, which effectively protects the template DNA.
Because the carrier DNA lacks the specific adaptor sequence, it is neither amplified nor detected in subsequent library amplification and sequencing.
(5) Methylation conversion
This step uses an enzymatic conversion method to specifically convert cytosine, methylated cytosine, and/or hydroxymethylated cytosine in the DNA. The conversion changes the base-pairing behavior of either the unmodified or the modified cytosines, so that the two are read as different bases during subsequent amplification and sequencing and can thereby be distinguished.
This step can be performed using the "EM-seq" technology of New England Biolabs. In another example, TAPS technology may be used.
(6) Library amplification and sequencing
After methylation conversion, the DNA has been heat-denatured and its base pairing altered, so it is single-stranded or partially single-stranded. Single-stranded DNA is recovered poorly during purification, leading to substantial losses. The method therefore omits the single-strand purification step and adds the PCR amplification mix directly, minimizing template loss during purification.
The library then undergoes PCR amplification. The amplification primers contain sequences complementary to the adaptor sequences above together with sequencer adaptor sequences, and may also introduce a sample barcode. Because the primers pair with the adaptors at both ends of the template DNA, amplification simultaneously produces the desired sequencing library.
The sequencing library is subjected to quantification and quality detection after purification, and then is subjected to sequencing by a sequencer.
(7) Analysis of sequencing results
When methylation conversion uses the EM-seq technique, unmethylated cytosine is read as thymine and methylated cytosine remains cytosine; when it uses the TAPS technique, methylated cytosine is read as thymine and unmethylated cytosine is unchanged. DNA methylation information is obtained by alignment to a reference genome. Because the library adaptors are added before methylation conversion, while the free DNA is still double-stranded, the fragment information is retained, and fragmentomics information such as the original fragment length, fragment distribution, upstream and downstream breakpoint positions, and end-base patterns of the free DNA can also be obtained by alignment to the reference genome. The analysis process is shown in fig. 1.
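The calling logic described above can be sketched per aligned position. This is a minimal illustration under stated assumptions: a gap-free alignment, no CpG-context filtering, and reads from the original bottom strand represented as reference G with read G/A. Real pipelines such as Bismark handle alignment and strand assignment themselves; `call_cytosine` and `methylation_level` are hypothetical names.

```python
from __future__ import annotations


def call_cytosine(ref_base: str, read_base: str, chemistry: str, strand: str = "+") -> bool | None:
    """Methylation call for one aligned position.

    Returns True (methylated), False (unmethylated) or None (not a cytosine
    position, or an unexpected base such as a SNP or sequencing error).
    strand "+": original top strand, reference C with read C or T.
    strand "-": original bottom strand, reference G with read G or A.
    """
    target, converted = ("C", "T") if strand == "+" else ("G", "A")
    if ref_base != target or read_base not in (target, converted):
        return None
    unchanged = read_base == target
    if chemistry == "EM-seq":
        return unchanged  # only modified cytosines stay C
    if chemistry == "TAPS":
        return not unchanged  # modified cytosines become T; unmodified C stays C
    raise ValueError(f"unknown chemistry: {chemistry}")


def methylation_level(ref: str, read: str, chemistry: str, strand: str = "+") -> float | None:
    """Fraction of informative cytosine positions called methylated."""
    calls = [call_cytosine(r, q, chemistry, strand) for r, q in zip(ref, read)]
    informative = [c for c in calls if c is not None]
    return sum(informative) / len(informative) if informative else None


ref = "ACGTCGCA"
print(methylation_level(ref, "ACGTTGTA", "EM-seq"))  # 1 of 3 cytosines methylated
print(methylation_level(ref, "ATGTCGCA", "TAPS"))    # same molecule read with TAPS
```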
Other applications of the method of the invention
DNA hydroxymethylation and fragmentation sequencing of free DNA
In mammals, DNA hydroxymethylation is less abundant than DNA methylation, but it has a different biological significance.
The methylation conversion described above does not distinguish DNA methylation from hydroxymethylation. If only hydroxymethylation needs to be measured, 5mC and 5hmC can be distinguished, for example by omitting the TET oxidase from the conversion while keeping the other enzymatic reactions unchanged.
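The expected readouts can be summarized as a lookup table. This is a simplified sketch: the "EM-seq without TET" row is an assumption based on the variant described above (5hmC still protected by glucosylation, 5mC deaminated like unmodified C), and the subtraction in `estimate_5mc` is a hypothetical way to combine the two measurements, not a step stated in this document.

```python
from __future__ import annotations

# Base reported by the sequencer for each cytosine species (simplified).
READOUT = {
    "EM-seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    "TAPS": {"C": "C", "5mC": "T", "5hmC": "T"},
    # Assumed variant: TET omitted, other enzymatic steps unchanged.
    "EM-seq without TET": {"C": "T", "5mC": "T", "5hmC": "C"},
}


def read_as_c(chemistry: str) -> set[str]:
    """Cytosine species that still read as C under a given chemistry."""
    return {species for species, base in READOUT[chemistry].items() if base == "C"}


def estimate_5mc(total_mod_level: float, hmc_level: float) -> float:
    """Hypothetical subtraction: (5mC + 5hmC) from EM-seq minus 5hmC alone."""
    return max(0.0, total_mod_level - hmc_level)


print(read_as_c("EM-seq without TET"))  # {'5hmC'}
print(estimate_5mc(0.80, 0.10))
```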
Enrichment method based on targeted PCR
The interested target fragments are enriched, so that the sequencing cost can be saved, and the detection sensitivity of the target gene can be improved.
In step (6) of the method overview above, PCR with primers complementary to the adaptors at both ends of the template DNA amplifies all of the template DNA. If enrichment of a gene fragment of interest is desired, targeted PCR can be used instead: a targeting primer containing a sequence complementary to the methylation-converted target gene fragment is designed and added to the PCR reaction to amplify the target sequence.
The targeting primer may or may not contain part or all of an adaptor sequence compatible with the sequencer.
Alternatively, only the upstream or only the downstream targeting primer may be added, yielding fragments that carry the targeting primer at one end and the adaptor sequence added in step (3) of the method overview at the other.
The upstream and downstream primers may also be joined end to end on a single DNA molecule to form a padlock primer.
The default adaptor-sequence primers can be added first and the targeting primers added after amplification, or both can be added at the same time; alternatively, targeting primers can be used without the default adaptor-sequence primers.
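Designing targeting primers (or capture probes) requires the converted sequence of the target rather than the genomic one. The sketch below predicts it, assuming that only CpG cytosines are methylated and that they are either all methylated or all unmethylated; the helper names are hypothetical. Because the two strands are no longer complementary after conversion, a forward primer would match a stretch of one converted strand and a reverse primer would be the reverse complement of a downstream stretch of that same converted strand.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def convert_in_silico(seq: str, cpg_methylated: bool = True, chemistry: str = "EM-seq") -> str:
    """Predict the converted sequence of one strand, written 5'->3'.

    Assumption: only cytosines in a CpG context are methylated, either all
    of them (cpg_methylated=True) or none of them.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            out.append(base)
            continue
        methylated = cpg_methylated and i + 1 < len(seq) and seq[i + 1] == "G"
        if chemistry == "EM-seq":
            out.append("C" if methylated else "T")
        elif chemistry == "TAPS":
            out.append("T" if methylated else "C")
        else:
            raise ValueError(f"unknown chemistry: {chemistry}")
    return "".join(out)


target = "ACGTTCCA"  # hypothetical target region, top strand
top = convert_in_silico(target)                         # converted top strand
bottom = convert_in_silico(reverse_complement(target))  # converted bottom strand
print(top, bottom)
```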
Enrichment method based on hybrid capture
In step (6) of the method overview above, hybridization capture enrichment can be performed before or after amplification. This requires probes containing sequences complementary to the target gene fragment after methylation conversion.
For hybridization capture, single-stranded DNA/RNA probes labeled with biotin or a similar tag are mixed with the library. Through thermal denaturation and renaturation, library molecules complementary to the target sequence hybridize to the probes, and the hybridized probes are then captured, for example with streptavidin-coated magnetic beads, to obtain the enriched library.
Obtaining gap location information without gap repair
If the ligase treatment in step (2) of the method overview is omitted and repair is carried out with a suitable polymerase, optionally together with modified nucleotides, the methylation information downstream of each nick will be altered. The nick position can then be inferred by analyzing where in the sequence methylation is erased or changed.
The invention discloses a high-sensitivity DNA methylation sequencing technology for DNA carrying single-stranded nicks. The technology first repairs DNA nick damage so that DNA methylation signals are not lost or misread. Second, it builds the library from double-stranded DNA templates, overcoming the low efficiency of single-stranded library construction. It then applies an extensively optimized enzymatic conversion that translates methylation information into altered base pairing, and uses carrier DNA to protect the template DNA during the reactions and purifications. As a result, a high-quality, high-sensitivity methylation sequencing library is obtained from extremely low-input DNA while the fragment information of the DNA is retained.
Advantages of the process of the invention include, but are not limited to:
1. minimum DNA input: as low as 0.1 ng or even lower;
2. library yield: efficiency improved by up to 4-fold;
3. library alignment rate: increased from 40% to 70%;
4. length of methylation loss at the library 3' end: reduced from 170 bp to within 40 bp.
The method can be applied to various liquid biopsy diagnostic products, such as early cancer screening, cancer gene testing, prenatal diagnosis, preimplantation diagnosis, and genetic diagnosis, and also provides strong technical support for research in related fields.
The invention is further illustrated by the following specific examples.
Example 1:
this example describes the sequencing library preparation process with peripheral blood cfDNA as an example.
1.1 extraction of cfDNA
Extraction of cfDNA can be performed using any standard means in the art.
5 mL of peripheral blood was collected in a 5 mL EDTA vacuum blood collection tube (purple cap), and plasma was separated within 4 hours as follows: centrifuge at 1600 ×g for 10 min at room temperature in a horizontal (swing-bucket) rotor. Carefully remove the upper plasma layer and transfer it to a 1.5 mL tube. Centrifuge at 6000 ×g for 10 min at 4 °C in a high-speed centrifuge, transfer the supernatant to a new 1.5 mL tube, and store it at -80 °C or on dry ice. Extract cfDNA from the separated plasma using a nucleic acid extraction/purification kit (QIAGEN QIAamp Circulating Nucleic Acid Kit, cat. no. 55114); typically 5–20 ng of cfDNA is obtained per 1 mL of plasma.
1.2 cfDNA gap repair
Prepare a ligase reaction mix (Taq DNA Ligase is used here as an example) by adding the reagents in Table 1 below to the purified cfDNA solution. The sonicated spike-in DNA is an equal-volume mixture of fully CpG-methylated pUC19 DNA (NEB E7122) and fully CpG-unmethylated lambda DNA (NEB E7123), sheared to about 300 bp by sonication and then diluted to 0.2 ng/µL for later use.
TABLE 1 ligase premix composition
Mix briefly, centrifuge, and incubate on a PCR instrument at 37–60 °C for 10–30 min.
1.3 end repair and linker attachment
Prepare an End Prep reaction by adding the reagents of the NEBNext Ultra II End Repair/dA-Tailing Module (NEB E7546L) to the repair product from the previous step, in the order shown in Table 2:
TABLE 2 End repair reagents
Mix briefly, centrifuge, and incubate on a PCR instrument at 20 °C for 30 min and then at 65 °C for 30 min.
After the reaction, add the fully cytosine-methylated adaptor sequences. To match the subsequent library amplification primers, a suggested adaptor pair is: 5'-A [5mC] A [5mC] T [5mC] T T T [5mC] [5mC] [5mC] T A [5mC] A [5mC] G A [5mC] G [5mC] T [5mC] T T [5mC] [5mC] G A T [5mC] T-3' (SEQ ID NO: 1) and 5'-phosphate-G A T [5mC] G G A A G A G [5mC] A [5mC] A [5mC] G T [5mC] T G A A [5mC] T [5mC] [5mC] A G T [5mC] A-3' (SEQ ID NO: 2). Dilute the adaptors to a total concentration of 15 µM, add 1.25 µL, and mix well.
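The adaptor pair can be checked against the sequence listing with a short script. The sketch below parses the bracketed "[5mC]" notation into a plain sequence and counts how many base pairs form between the 3' end of SEQ ID NO: 1 and the 5' end of SEQ ID NO: 2, assuming the two strands are annealed with a single-nucleotide 3' overhang; the helper names are hypothetical. For the listed sequences the script finds a 12 bp duplex plus a 3'-T overhang on SEQ ID NO: 1, consistent with a forked adaptor suited to ligation onto dA-tailed fragments.

```python
from __future__ import annotations

import re

# Adaptor strands as given in the sequence listing (SEQ ID NO: 1 and 2).
SEQ_ID_1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
SEQ_ID_2 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCA"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_modified(notation: str) -> tuple[str, list[int]]:
    """Turn 'A [5mC] A ...' notation into a plain sequence and 5mC positions."""
    tokens = re.findall(r"\[5mC\]|[ACGT]", notation)
    seq = "".join("C" if t == "[5mC]" else t for t in tokens)
    mc_positions = [i for i, t in enumerate(tokens) if t == "[5mC]"]
    return seq, mc_positions


def duplex_length(top: str, bottom: str, overhang: int = 1) -> int:
    """Count base pairs between the 3' end of `top` (excluding a 3' overhang
    of `overhang` nt) and the 5' end of `bottom`, stopping at the first mismatch."""
    n = 0
    while n < len(bottom) and n + overhang < len(top):
        if bottom[n] != top[len(top) - 1 - overhang - n].translate(_COMPLEMENT):
            break
        n += 1
    return n


print(duplex_length(SEQ_ID_1, SEQ_ID_2), SEQ_ID_1[-1])
```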
The following reagents of the kit NEBNext Ultra II Ligation Module (NEB E7595L) were then added in order (table 3):
TABLE 3 addition of linker sequence reagents
Mix briefly, centrifuge, and incubate on a PCR instrument at 20 °C for 15 min.
1.4 Carrier DNA addition and purification
Add 1 µL of carrier DNA to the ligation product from the previous step, mix briefly, and centrifuge.
Carrier DNA is prepared as follows: lambda DNA (NEB N3011) is sheared to about 300 bp by sonication and diluted to 25 ng/µL for use.
Purify the DNA using 1.8 volumes of SPRI or AMPure XP magnetic beads and elute in water.
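The text says only that carrier DNA is added at several times the template amount. With the Example 1 quantities (1 µL of 25 ng/µL carrier), the mass ratio and an approximate molar amount can be worked out as below. This is an arithmetic sketch: the 650 g/mol per base pair figure is a common approximation for dsDNA, and the helper names are hypothetical.

```python
def carrier_to_template_ratio(template_ng: float, carrier_ul: float = 1.0,
                              carrier_ng_per_ul: float = 25.0) -> float:
    """Mass ratio of carrier to template DNA (defaults taken from Example 1)."""
    return carrier_ul * carrier_ng_per_ul / template_ng


def dsdna_pmol(mass_ng: float, length_bp: float, g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate amount of dsDNA: pmol = ng * 1000 / (bp * ~650 g/mol per bp)."""
    return mass_ng * 1000.0 / (length_bp * g_per_mol_per_bp)


for template_ng in (0.1, 3.0, 10.0):
    print(template_ng, round(carrier_to_template_ratio(template_ng), 1))
print(round(dsdna_pmol(25.0, 300), 3))  # carrier: 25 ng of ~300 bp fragments
```

At 0.1 ng of input the carrier is in 250-fold mass excess, while at 10 ng it is only 2.5-fold.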
1.5 enzymatic conversion of DNA by methylation modification
The purified DNA from the previous step is enzymatically converted using the NEBNext Enzymatic Methyl-seq Conversion Module (NEB E7125L). The conversion comprises TET oxidation and glucosylation protection of the DNA, thermal denaturation of the DNA, and APOBEC deamination. After the APOBEC deamination reaction, the next step proceeds without purification.
1.6 library PCR amplification
Add PCR primers carrying the library adaptor and index tags to the reaction from the APOBEC deamination step, for example 5 µL of a NEBNext Multiplex Oligos for Illumina primer pair. Then add 2× NEBNext Q5U Master Mix (NEB M0597L) in a volume equal to the existing reaction volume. Mix briefly, centrifuge, and run the following program on a PCR instrument:
98 °C for 30 s; 7 cycles of 98 °C for 10 s, 62 °C for 30 s, and 72 °C for 90 s; 65 °C for 5 min; hold at 4 °C.
Purify the DNA using 1.1 volumes of SPRI or AMPure XP magnetic beads, elute in 1× TE buffer, and store the purified DNA at -20 °C.
1.7 library sequencing and analysis
After concentration quantification and fragment size distribution quality control, the constructed library is sequenced paired-end on an Illumina sequencer. The raw sequencing data are aligned and analyzed with Bismark software to obtain genome-wide methylation data (to avoid bases with unevenly distributed methylation rates, the first 10 bases of Read 1 and the first 40 bases of Read 2 are excluded from the methylation results). Bioinformatic analysis of the library's distribution across the genome, fragment lengths, start and end positions, 5'-end base composition, and similar features yields the fragmentomics information.
Table 4 lists example reads from the sequencing results:
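Because the adaptors were ligated to intact double-stranded fragments, the outer ends of a read pair approximately mark the ends of the original cfDNA fragment. The sketch below derives fragment lengths from toy paired-end coordinates and summarizes them; the record format, the 150 bp cutoff, and the helper names are assumptions. The methylation-side exclusion mentioned above is typically done in Bismark with the methylation extractor's `--ignore` and `--ignore_r2` options.

```python
from __future__ import annotations

from collections import Counter


def fragment_lengths(pairs: list[tuple[int, int]]) -> list[int]:
    """Fragment length from properly paired alignments.

    Each pair is (leftmost aligned base, rightmost aligned base + 1) in
    0-based half-open reference coordinates, i.e. the outer ends of the two
    mates, which approximate the original fragment ends.
    """
    return [end - start for start, end in pairs]


def length_summary(lengths: list[int], short_cutoff: int = 150) -> dict[str, float]:
    """Modal length and fraction of fragments shorter than `short_cutoff`."""
    counts = Counter(lengths)
    modal_length, _ = counts.most_common(1)[0]
    short_fraction = sum(1 for x in lengths if x < short_cutoff) / len(lengths)
    return {"modal_length": modal_length, "short_fraction": short_fraction}


pairs = [(100, 267), (500, 667), (1000, 1140), (2000, 2334)]  # toy coordinates
print(length_summary(fragment_lengths(pairs)))
```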
The sequencing results contain tens of millions to billions of reads; the amount of sequencing data varies with the sequencing instrument and flow cell chosen.
Example 2 comparison of DNA gap repair
To evaluate whether the method overcomes the methylation loss of existing enzymatic-conversion library construction methods, a parallel comparison was performed:
After free DNA was extracted from healthy human plasma, the samples were divided into two groups:
Experimental group: treated with DNA ligase (see step 1.2 of Example 1, DNA gap repair reaction);
Control group: no DNA ligase treatment (water in place of DNA ligase).
Both groups of free DNA were taken through the same double-stranded library construction steps:
End preparation was performed with the NEBNext Ultra II End Repair/dA-Tailing Module (NEB E7546L), and adaptor ligation with the NEBNext Ultra II Ligation Module (NEB E7595L) and NEBNext EM-seq Adaptor (NEB E7165).
The adaptor-ligated template DNA was purified with magnetic beads and enzymatically converted with the NEBNext Enzymatic Methyl-seq Conversion Module (NEB E7125L). The converted DNA was then amplified by PCR with 2× NEBNext Q5U Master Mix (NEB M0597L).
The constructed libraries were sequenced paired-end (2 × 150 bp) on an Illumina sequencer and aligned and analyzed with Bismark software. The methylation bias (M-bias) of each library was calculated by plotting the average methylation rate at each read position along the 5' to 3' direction of the library fragments, to assess whether methylation levels were uniform across positions. The results are shown in fig. 4.
As the figure shows, when the free DNA is not nick-repaired before methylated double-stranded library construction, the average methylation level in the sequencing library shows increasingly pronounced loss, starting at about 130 bp into Read 1 and continuing into Read 2 along the 5' to 3' direction of the library fragments (dashed line in the figure). The loss is particularly pronounced in the Read 2 data, where on average about 40% of the true methylation rate is lost. In the experimental group, DNA ligase repair (solid line in the figure) restores the methylation lost in Read 2. (The drop in methylation rate over the last 30 bp at the 3' end of the library occurs because the free DNA template itself lacks a double-stranded region at its 3' end.)
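The M-bias analysis above reduces to averaging methylation calls by read position. Bismark's methylation extractor already writes an M-bias report, so the sketch below only illustrates the computation on toy data; the input format, the baseline and tolerance used to flag the onset of loss, and the helper names are assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def m_bias(reads: list[list[tuple[int, bool]]]) -> dict[int, float]:
    """Average methylation rate per read position (1-based, 5'->3').

    Each read is a list of (position_in_read, is_methylated) calls at
    cytosines of the chosen context (e.g. CpG).
    """
    methylated: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for calls in reads:
        for pos, is_meth in calls:
            total[pos] += 1
            methylated[pos] += int(is_meth)
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


def first_drop(profile: dict[int, float], baseline: float, tolerance: float) -> int | None:
    """First position whose rate falls more than `tolerance` below `baseline`."""
    for pos, rate in profile.items():  # keys are in ascending order
        if rate < baseline - tolerance:
            return pos
    return None


reads = [[(1, True), (2, True), (3, False)], [(1, True), (2, False), (3, False)]]
profile = m_bias(reads)
print(profile, first_drop(profile, baseline=1.0, tolerance=0.2))
```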
This data demonstrates that the methods provided herein can compensate for loss of methylation due to free DNA nick damage.
Example 3 obtaining methylation information while obtaining fragmentation group information
To assess the advantages of the present method over existing methods, the method of the invention was compared side by side with a commercial kit.
Experimental group: the method of the invention, with the specific procedure described in Example 1;
Comparison group: Swift Biosciences Accel-NGS Methyl-Seq DNA Library Kit (cat. no. 33024; see the kit manual for details).
After free DNA was extracted from healthy human plasma, libraries were constructed with the two methods using the same free DNA input (3 ng) and the same number of PCR cycles (7). After the libraries were sequenced, performance was assessed from Bismark alignments. The results are shown in Table 5.
TABLE 5 comparison of the present invention and Swift method for library construction
Specifically, library DNA yield was quantified with a Qubit: the library constructed with the present method yielded 83.4 ng, far more than the library constructed with the Swift kit. The alignment rate of the present method's sequencing library was 70%, about 1.6 times the 44% of the Swift library, indicating a more effective library. The duplication rate of the present method's library was 15%, clearly lower than the 22% of the Swift library, indicating fewer duplicate fragments and better library quality. The method also retains fragmentomics information (see figs. 5 and 6). Analysis of the DNA fragments before library construction with an Agilent pulsed-field automated electrophoresis fragment analysis system (Agilent Femto Pulse) shows (fig. 5) that free DNA has a characteristic peak pattern with a period of about 170 bp, a typical fragmentation signature caused by nucleosome occupancy. Analysis of the fragments after library construction with the Agilent 5200 Fragment Analyzer System shows (fig. 6) that, compared with the library built with the Swift kit, the sequencing library built with the present method retains the fragment distribution characteristic of free DNA, so that fragmentomics data can be analyzed more reliably.
The invention shows for the first time that the severe methylation loss arising in double-stranded library construction is caused jointly by single-stranded nicks and by single-stranded ends (regions at the fragment termini where the complementary strand is missing). Loss at single-stranded ends reflects information absent from the original molecule and cannot be recovered, whereas single-stranded nicks cause methylation to be erased over most of the read length. The mechanism is as follows. In double-stranded library construction, the double-stranded DNA template must undergo end repair to produce a blunt-ended or 3'-A-overhang duplex before the adaptor is ligated. The DNA polymerase used for end repair can extend from a nick: moving 5' to 3' from the nick, it displaces or enzymatically degrades the original template strand downstream and synthesizes a replacement strand from the supplied unmodified nucleotides, so that the DNA methylation information over that stretch is erased. Because nick positions are random, no uniform criterion can identify which part of the methylation information is still valid, so this erasure cannot be corrected afterwards by bioinformatics or other means. The pretreatment method provided herein repairs nicks before library construction and thereby solves the problem of extensive methylation loss (fig. 4).
sequence listing
<110> Changping national laboratory
<120> free DNA methylation sequencing library construction method and application
<160> 4
<170> SIPOSequenceListing 1.0
<210> 1
<211> 33
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 1
acactctttc cctacacgac gctcttccga tct 33
<210> 2
<211> 32
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 2
gatcggaaga gcacacgtct gaactccagt ca 32
<210> 3
<211> 149
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 3
ttttaaaaat atttaagtaa agtagagaat gtagaaatgt tatagattat attttttgat 60
tatgatataa taaaattaga aattatagta tggaaattta aaagtttttt tttaaataat 120
tttatgttaa aaagaaattt aggtcgggt 149
<210> 4
<211> 150
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 4
atgtttgtaa ttttagtatt ttgggaggtt aaggtgggta ggttatttga ggttagtagt 60
ttaagattag tttcgttaat atggtaatat tttgttttta ttaaaaatat aaaaattagt 120
tgggtttggt ggtttatgtt tgtaatttta 150

Claims (16)

1. A method of pre-processing a biological sample for constructing a sequencing library, comprising:
S1: repairing single-stranded nicks in the DNA fragments contained in the sample with a DNA ligase; and
S2: filling in the terminal gaps of the DNA fragments with a DNA polymerase,
wherein the sequencing library is used, after sequencing, to obtain DNA methylation information data of the biological sample, or DNA methylation information and fragmentomics information data of the biological sample;
wherein the single-stranded nicks in step S1 are 3',5'-phosphodiester bond nicks.
2. The method of claim 1, wherein the DNA fragments are between 50 bp and 1000 bp in length.
3. The method of claim 1, further comprising fragmenting DNA molecules contained in the biological sample to generate the DNA fragments prior to step S1.
4. The method of claim 1, wherein the ligase has activity in catalyzing the formation of 3',5'-phosphodiester bonds.
5. The method of claim 1 or 4, wherein the ligase is selected from the group consisting of HiFi-Taq ligase, T4 DNA ligase, Taq DNA ligase, E. coli DNA ligase and T7 DNA ligase.
6. The method of claim 1, wherein the biological sample is selected from a cell culture fluid or a natural biological sample.
7. The method of claim 1, wherein the biological sample is selected from the group consisting of peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph fluid, alveolar lavage fluid, amniotic fluid, blastocoel fluid, cell culture fluid, embryo culture fluid, microbial culture medium, soil leachate, and bone meal leachate.
8. The method of claim 7, wherein the DNA fragments are selected from extracellular free DNA and fragmented DNA.
9. The method of claim 1, wherein the amount of DNA fragments in the sample is no greater than 10 ng.
10. The method of claim 1, wherein the DNA methylation comprises methylation and/or hydroxymethylation of cytosine.
11. A method of constructing a sequencing library, comprising:
1) providing a biological sample and pre-treating the biological sample by the method of any one of claims 1 to 10 to obtain DNA fragments free of 3',5'-phosphodiester bond nicks;
2) constructing the sequencing library using the DNA fragments;
wherein the sequencing library is used, after sequencing, to obtain DNA methylation information data of the biological sample, or to obtain both DNA methylation information data and fragmentomics information data of the biological sample.
12. The method of claim 11, wherein step 2) further comprises:
a) carrying out end fill-in treatment and adding a linker sequence to both ends of the DNA fragment, wherein some or all of the cytosines in the linker sequence are methylated cytosines and/or hydroxymethylated cytosines; and
b) converting unmethylated cytosines in the DNA fragment into uracil by an enzymatic conversion method, wherein the uracil is identified as thymine in subsequent amplification and sequencing, and the methylated cytosines are identified as cytosine in subsequent amplification and sequencing;
or
a) carrying out end fill-in treatment and adding a linker sequence to both ends of the DNA fragment; and
b) converting the methylated cytosines in the DNA fragment into uracil or dihydrouracil by an enzymatic conversion method, wherein the uracil or dihydrouracil is identified as thymine in subsequent amplification and sequencing, and the unmethylated cytosines are identified as cytosine in subsequent amplification and sequencing.
13. The method of claim 12, wherein the enzymatic conversion method is an EM-seq conversion method or a TAPS conversion method.
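The two alternatives in claim 12 differ in which cytosines end up being read as thymine. The following Python sketch is not taken from the patent. It models only that read-out logic for a single top strand and assumes complete conversion, no PCR errors and no strand handling. The function names `read_out` and `call_methylation` and the chemistry labels `"em-seq"` and `"taps"` are hypothetical. Under this model, the methylated linker cytosines of the first alternative would be read back as C, so the linker sequence would survive conversion.

```python
from typing import Dict, Set


def read_out(seq: str, methylated: Set[int], chemistry: str) -> str:
    """Bases expected in sequencing reads for one strand after conversion (toy model).

    seq        -- top-strand reference, uppercase A/C/G/T
    methylated -- 0-based positions of cytosines carrying 5mC or 5hmC
    chemistry  -- "em-seq": unmethylated C -> U, read as T
                  "taps":   methylated C -> DHU, read as T
    """
    if chemistry not in ("em-seq", "taps"):
        raise ValueError(f"unknown chemistry: {chemistry!r}")
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            out.append(base)  # A, G and T are not changed by conversion
        elif chemistry == "em-seq":
            out.append("C" if i in methylated else "T")
        else:  # "taps"
            out.append("T" if i in methylated else "C")
    return "".join(out)


def call_methylation(seq: str, read: str, chemistry: str) -> Dict[int, bool]:
    """Infer, for each reference C, whether it was methylated (toy model)."""
    if chemistry not in ("em-seq", "taps"):
        raise ValueError(f"unknown chemistry: {chemistry!r}")
    calls = {}
    for i, (ref, obs) in enumerate(zip(seq, read)):
        if ref == "C":
            # em-seq: a methylated C stays C; taps: a methylated C reads as T
            calls[i] = (obs == "C") if chemistry == "em-seq" else (obs == "T")
    return calls


ref = "ACGTCCGA"
meth = {1}  # only the C at index 1 is methylated
print(read_out(ref, meth, "em-seq"))  # ACGTTTGA
print(read_out(ref, meth, "taps"))    # ATGTCCGA
```

Each chemistry recovers the same calls, `{1: True, 4: False, 5: False}`, from its own reads. They differ only in which methylation state is read as T.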
14. The method of claim 12, further comprising adding carrier DNA to the biological sample after step a).
15. The method of claim 14, wherein the carrier DNA does not contain the linker sequence and the fragment size of the carrier DNA is 100-500 bp.
16. A method for determining the location of a single-stranded nick in a DNA fragment contained in a biological sample, comprising:
1) dividing the biological sample into two parts: biological sample A and biological sample B;
2) processing biological sample A by the method of any one of claims 1-10 to prepare a first sequencing library, and processing biological sample B by the method of any one of claims 1-10, with the proviso that DNA ligase is not used in step S1, to prepare a second sequencing library; and
3) determining the location of the single-stranded nick based on the difference in DNA methylation information obtained from the first sequencing library and the second sequencing library,
wherein the single-stranded nick is a single-stranded 3',5'-phosphodiester bond nick.
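Claim 16 does not specify how the difference between the two libraries is computed. The following Python sketch shows one possible comparison. It assumes that an unrepaired nick in sample B lowers the apparent methylation level of nearby cytosines relative to sample A. One plausible cause is fill-in synthesis from the nick incorporating unmethylated cytosines, but that mechanism is an assumption here and is not stated in the claims. The input format (per-cytosine read counts keyed by 0-based position on one strand of one chromosome), the `SiteCounts` class, and the thresholds `min_depth`, `min_drop` and `max_gap` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SiteCounts:
    """Read counts at one cytosine position (hypothetical input format)."""
    methylated: int  # reads calling this cytosine as methylated
    total: int       # reads covering this cytosine


def methylation_level(counts: SiteCounts) -> float:
    """Fraction of covering reads that call the cytosine methylated."""
    return counts.methylated / counts.total if counts.total else float("nan")


def candidate_nick_regions(
    lib_a: Dict[int, SiteCounts],  # first library: ligase repair in step S1
    lib_b: Dict[int, SiteCounts],  # second library: no DNA ligase in step S1
    min_depth: int = 10,    # hypothetical: minimum coverage in both libraries
    min_drop: float = 0.3,  # hypothetical: required methylation drop in B vs A
    max_gap: int = 50,      # hypothetical: max distance (bp) for merging flagged sites
) -> List[Tuple[int, int]]:
    """Return (start, end) regions where library B is markedly less methylated than A."""
    flagged = []
    for pos in sorted(lib_a.keys() & lib_b.keys()):
        a, b = lib_a[pos], lib_b[pos]
        if a.total < min_depth or b.total < min_depth:
            continue  # too little coverage to compare reliably
        if methylation_level(a) - methylation_level(b) >= min_drop:
            flagged.append(pos)

    # Merge nearby flagged positions into candidate regions.
    regions: List[Tuple[int, int]] = []
    for pos in flagged:
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], pos)
        else:
            regions.append((pos, pos))
    return regions


# Toy counts for illustration only (not data from the patent).
lib_a = {100: SiteCounts(18, 20), 120: SiteCounts(19, 20),
         130: SiteCounts(17, 20), 500: SiteCounts(18, 20), 900: SiteCounts(10, 20)}
lib_b = {100: SiteCounts(18, 20), 120: SiteCounts(6, 20),
         130: SiteCounts(5, 20), 500: SiteCounts(17, 20), 900: SiteCounts(1, 5)}
print(candidate_nick_regions(lib_a, lib_b))  # [(120, 130)]
```

In this toy example, position 900 is skipped because library B has only 5 reads there. Positions 120 and 130 both show a drop of at least 0.3, and they are 10 bp apart, so they merge into a single candidate region.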
CN202210365172.3A 2022-04-08 2022-04-08 Free DNA methylation sequencing library construction method and application Active CN114438184B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210365172.3A CN114438184B (en) 2022-04-08 2022-04-08 Free DNA methylation sequencing library construction method and application
PCT/CN2022/103499 WO2023193357A1 (en) 2022-04-08 2022-07-02 Method for constructing free dna methylation sequencing library and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210365172.3A CN114438184B (en) 2022-04-08 2022-04-08 Free DNA methylation sequencing library construction method and application

Publications (2)

Publication Number Publication Date
CN114438184A (en) 2022-05-06
CN114438184B (en) 2022-07-12

Family

ID=81358674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210365172.3A Active CN114438184B (en) 2022-04-08 2022-04-08 Free DNA methylation sequencing library construction method and application

Country Status (2)

Country Link
CN (1) CN114438184B (en)
WO (1) WO2023193357A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111154846A (en) * 2020-01-13 2020-05-15 四川大学华西医院 Detection method of methylated nucleic acid
CN114438184B (en) * 2022-04-08 2022-07-12 昌平国家实验室 Free DNA methylation sequencing library construction method and application
CN115678964B (en) * 2022-11-08 2023-07-14 广州女娲生命科技有限公司 Noninvasive screening method of embryo before implantation based on embryo culture solution
CN115976161A (en) * 2022-11-30 2023-04-18 天昊基因科技(苏州)有限公司 CpG island methylation enrichment sequencing technology based on restriction enzyme digestion
CN118308491A (en) * 2023-01-06 2024-07-09 北京昌平实验室 Single cell DNA methylation detection method

Citations (2)

Publication number Priority date Publication date Assignee Title
CA2984019A1 (en) * 2017-10-27 2019-04-27 Marie-Chantal Gregoire Double-strand DNA break quantification method
CN113897414A (en) * 2021-10-11 2022-01-07 湖南大地同年生物科技有限公司 Trace nucleic acid library construction method

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US8236499B2 (en) * 2008-03-28 2012-08-07 Pacific Biosciences Of California, Inc. Methods and compositions for nucleic acid sample preparation
WO2018195224A1 (en) * 2017-04-18 2018-10-25 Fred Hutchinson Cancer Research Center Barcoded transposases to increase efficiency of high-accuracy genetic sequencing
CN107858409B (en) * 2017-11-12 2021-05-04 深圳市易基因科技有限公司 Methylation library-building sequencing method for micro-degradation genome DNA and kit thereof
CN108166068A (en) * 2018-01-02 2018-06-15 上海美吉生物医药科技有限公司 Novel DNA library construction kit and application thereof
CN107904667A (en) * 2018-01-02 2018-04-13 上海美吉生物医药科技有限公司 Novel methylation library construction kit and application thereof
CN110669824A (en) * 2019-10-11 2020-01-10 广州迈森致远基因科技有限公司 Kit and method for methylation library construction of low-initial-amount plasma free DNA
WO2021077415A1 (en) * 2019-10-25 2021-04-29 Peking University Methylation detection and analysis of mammalian dna
CN114032287B (en) * 2021-11-24 2024-09-17 竹石生物科技(苏州)有限公司 DNA methylation sequencing library, construction method and detection method thereof
CN114438184B (en) * 2022-04-08 2022-07-12 昌平国家实验室 Free DNA methylation sequencing library construction method and application

Also Published As

Publication number Publication date
CN114438184A (en) 2022-05-06
WO2023193357A1 (en) 2023-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant