WO2024064915A2 - Methods for enriching nucleic acid target sequences - Google Patents

Methods for enriching nucleic acid target sequences

Info

Publication number
WO2024064915A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
nuclease
target
certain embodiments
molecule
Prior art date
Application number
PCT/US2023/074937
Other languages
French (fr)
Other versions
WO2024064915A3 (en)
Inventor
Anthony P. Shuber
Caitlin M. GILLEY
Rosemary Turingan Witkowski
Original Assignee
Flagship Pioneering Innovations VI, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Flagship Pioneering Innovations VI, LLC
Publication of WO2024064915A2
Publication of WO2024064915A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/686 Polymerase chain reaction [PCR]

Definitions

  • the invention relates generally to methods for enriching nucleic acid target sequences from a sample, for example, from a biological sample or from a nucleic acid library.
  • Detection of target sequences in a nucleic acid can be a challenge when the target sequence is present at a low frequency in the nucleic acid sample. Amplification and/or sequencing of target sequences can fail if such sequences occur at a low frequency. For example, circulating tumor DNA (ctDNA) levels are present at a very low frequency in most early-stage and many advanced stage cancer patients (Bettegowda et al. (2014) Sci Transl Med 6(224): p. 224ra24). Accordingly, a major challenge in the identification of ctDNA is how to identify a trace amount of ctDNAs out of a much larger proportion of total cell free DNA (cfDNA).
  • the disclosure relates to methods of enriching target sequences in a nucleic acid sample.
  • the methods include, for example, cutting a nucleic acid molecule that includes a target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target, thereby enriching the target sequence.
  • the methods can be used to enrich target sequences prior to assembling a nucleic acid library or can be used to enrich target sequences in an existing library.
  • the disclosure relates to a nucleic acid enrichment method.
  • the method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
  • the cutting step is performed by a nuclease, for example, a CRISPR-Cas nuclease.
  • the nuclease is a type II CRISPR-Cas nuclease.
  • the nuclease is a Cas9 nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
  • the cutting step is performed at room temperature.
  • the overhang is filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin or digoxigenin.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the capture domain comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to the bead.
  • the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library.
  • the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
  • the nucleic acid sample is a plasma sample.
  • the plasma sample is used directly in the nucleic acid enrichment method (for example, directly in the cutting step) without prior enrichment or purification of the nucleic acid.
  • the nucleic acid sample comprises cell free DNA (cfDNA).
  • cytosines in the cfDNA have been converted to uracils.
  • the cfDNA has been treated with bisulfite.
  • the method further comprises the step of converting methylated cytosines to uracils.
  • the method further comprises preparing a library before or after enriching the molecule that includes the target.
  • the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
  • the method further comprises amplifying the nucleic acid molecule.
  • the amplification occurs while the nucleic acid is in contact with the capture domain.
  • the method further comprises sequencing the enriched molecule.
  • the method further comprises separating the nucleic acid molecule from the capture domain.
  • the separating step comprises heat elution off of the capture domain.
  • the separating step is performed using a chemical agent.
  • the separating step is performed using mechanical disruption.
  • the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
  • the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • the disclosure relates to a method of capturing a nucleic acid molecule having a target sequence.
  • the method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and capturing the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
  • the cutting step is performed by a nuclease.
  • the nuclease is a CRISPR-Cas nuclease.
  • the nuclease is a type II CRISPR-Cas nuclease.
  • the nuclease is a Cas9 nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
  • the cutting step is performed at room temperature.
  • the overhang is filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin or digoxigenin.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the capture domain comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to the bead.
  • the nucleic acid molecule is present in a nucleic acid sequencing library, and the method captures target sequences of interest in the library.
  • the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
  • the nucleic acid sample is a plasma sample and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
  • the nucleic acid sample comprises cell free DNA (cfDNA).
  • cytosines in the cfDNA have been converted to uracils.
  • the cfDNA has been treated with bisulfite.
  • the method further comprises the step of converting methylated cytosines to uracils.
  • the method further comprises preparing a library before or after capturing the molecule that includes the target.
  • the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
  • the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises sequencing the captured molecule. In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • the disclosure relates to a nucleic acid enrichment method.
  • the method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by separating labeled molecules from unlabeled molecules.
  • the cutting step is performed by a nuclease.
  • the nuclease is a CRISPR-Cas nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas9 or Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease.
  • the nuclease is a CasX nuclease.
  • the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
  • the cutting step is performed at room temperature.
  • the overhang is filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin, digoxigenin, or a fluorophore.
  • the capture domain comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to the bead.
  • the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences of interest in the library.
  • the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
  • the nucleic acid sample is a plasma sample.
  • the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
  • the nucleic acid sample comprises cell free DNA (cfDNA).
  • cytosines in the cfDNA have been converted to uracils.
  • the cfDNA has been treated with bisulfite.
  • the method further comprises the step of converting methylated cytosines to uracils.
  • the method further comprises preparing a library before or after enriching the molecule that includes the target.
  • the method includes a wash step.
  • the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises sequencing the enriched molecule. In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • the disclosure relates to a method of producing a nucleic acid library enriched for regions of interest.
  • the method includes cutting a plurality of nucleic acid molecules comprising regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; and enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecules with capture domains.
  • the cutting step is performed by a nuclease.
  • the nuclease is a CRISPR-Cas nuclease.
  • the nuclease is a type II CRISPR-Cas nuclease.
  • the nuclease is a Cas9 nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecules that include the regions of interest.
  • the cutting step is performed at room temperature.
  • the overhangs are filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin, digoxigenin, or a fluorophore.
  • the capture domains comprise or are connected to solid supports.
  • the solid supports are beads, wells, tubes, or slides.
  • the capture domains comprise streptavidin connected to beads.
  • the method further comprises amplifying the nucleic acid molecules.
  • the amplifying is performed with primers that comprise adapters to facilitate sequencing of the nucleic acid molecules.
  • the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
  • the nucleic acid sample is a plasma sample.
  • the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
  • the nucleic acid sample comprises cell free DNA (cfDNA).
  • cytosines in the cfDNA have been converted to uracils.
  • the cfDNA has been treated with bisulfite.
  • the method further comprises the step of converting methylated cytosines to uracils.
  • the method further comprises a wash step to remove nucleic acid molecules that do not include the regions of interest.
  • the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises separating the nucleic acid molecules from the capture domains. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest.
  • the method includes obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; optionally converting methylated cytosines to uracils; adding nucleic acid adapters to the plurality of nucleic acids to form a nucleic acid library; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecules with capture domains; and amplifying the molecules that include the regions of interest to form the nucleic acid library enriched for regions of interest.
  • the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest.
  • the method includes obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecules with capture domains; removing the molecules that include the regions of interest from the capture domains; optionally converting methylated cytosines to uracils; and adding nucleic acid adapters to the plurality of nucleic acids to form the nucleic acid library enriched for regions of interest.
  • the cutting step is performed by a nuclease.
  • the nuclease is a CRISPR-Cas nuclease.
  • the nuclease is a type II CRISPR-Cas nuclease.
  • the nuclease is a Cas9 nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
  • the cutting step is performed at room temperature.
  • the overhang is filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin or digoxigenin.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the capture domain comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to the bead.
  • the disclosure relates to a nucleic acid library, produced by the methods described herein.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • the disclosure relates to a kit comprising a nuclease that cuts a nucleic acid molecule including a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; labeled dNTPs; DNA polymerase; and a capture moiety comprising a capture domain.
  • the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease.
  • the disclosure relates to a nucleic acid enrichment method comprising the steps of (a) designing a first set of guide RNAs to bind a first set of target sequences for cleavage with a first nuclease, (b) designing a second set of guide RNAs to bind a second set of target sequences for cleavage with a second nuclease, (c) adding the first and second sets of guide sequences and the first and second nucleases to a nucleic acid comprising a plurality of target sequences, (d) generating single stranded overhangs at the cleavage sites in the first and second sets of target sequences, (e) filling in each overhang with at least one labeled nucleotide; and (f) enriching the target sequences by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
  • the first nuclease or the second nuclease is a CRISPR-Cas nuclease. In certain embodiments, the first nuclease or the second nuclease is a type II or a type V CRISPR-Cas nuclease. In certain embodiments, the first nuclease or the second nuclease is a Cas9, Cas12, or CasX nuclease. In certain embodiments, the first nuclease or the second nuclease is a Cas12a/Cpf1 nuclease.
  • the first nuclease or the second nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
  • the cutting step is performed at room temperature.
  • the overhang is filled in using a DNA polymerase.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin or digoxigenin.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the capture domain comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to the bead.
  • the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library.
  • the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
  • the nucleic acid sample is a plasma sample.
  • the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
  • the nucleic acid sample comprises cell free DNA (cfDNA).
  • cytosines in the cfDNA have been converted to uracils.
  • the cfDNA has been treated with bisulfite.
  • the method further comprises preparing a library before or after enriching the molecule that includes the target. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
  • the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
  • the method further comprises amplifying the nucleic acid molecule.
  • the amplification occurs while the nucleic acid is in contact with the capture domain.
  • the method further comprises sequencing the enriched molecule.
  • the method further comprises separating the nucleic acid molecule from the capture domain.
  • the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
  • the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
  • the method further comprises an additional enrichment step.
  • the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
  • the additional enrichment step comprises hybrid capture.
  • the additional enrichment step comprises using a nucleic acid binding protein.
  • FIGURE 1 is a schematic flowchart showing a method according to the disclosure for enriching target sequences (i.e., regions of interest (ROI)) from a nucleic acid library.
  • FIGURE 2 is a schematic flowchart showing a method according to the disclosure for enriching target sequences (i.e., regions of interest (ROI)) from a sample and constructing a nucleic acid library using the enriched target sequences.
  • FIGURE 3 is a schematic of a bisulfite conversion reaction.
  • FIGURE 4 provides electrophoresis results for an experiment testing whether biotinylated dNTPs could be incorporated into a nucleic acid comprising a target sequence and enriched using streptavidin beads.
  • Bands representing a biotinylated target fragment bound to beads were seen in both the 1x and 5x polymerase (“Enzyme”) conditions and with anywhere from 10% to 100% biotinylated dNTPs.
  • The End control (lacking polymerase enzyme and biotinylated dNTPs), the no bind control (lacking Cas12, crRNA, and polymerase enzyme), and the bind control (which contained a biotinylated amplicon) did not contain biotinylated target fragment. Accordingly, streptavidin beads were capable of binding to and isolating target fragments that had incorporated biotinylated dNTPs.
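As a rough illustration of why even a low proportion of biotinylated dNTPs might suffice for capture, the sketch below estimates the chance that a filled-in overhang carries at least one labeled nucleotide. It assumes, purely for illustration, that each position is filled independently and that labeled and unlabeled nucleotides are incorporated with equal efficiency. The 4-nucleotide overhang length and the function name are hypothetical and do not come from the disclosure.

```python
def prob_at_least_one_label(labeled_fraction: float, overhang_length: int) -> float:
    """Chance that a filled-in overhang contains one or more labeled nucleotides.

    Illustrative simplification: every position is filled independently, and a
    labeled dNTP is chosen with probability `labeled_fraction`.
    """
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be between 0 and 1")
    if overhang_length < 0:
        raise ValueError("overhang_length must be non-negative")
    # Complement of the event "no position received a labeled nucleotide".
    return 1.0 - (1.0 - labeled_fraction) ** overhang_length


# Hypothetical 4-nt overhang at three labeled-dNTP proportions.
for fraction in (0.10, 0.50, 1.00):
    print(f"{fraction:.0%} labeled: {prob_at_least_one_label(fraction, 4):.3f}")
```

Under these assumptions, the labeled fraction and the overhang length together set how many cut molecules can be captured at all.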
  • FIGURE 5 provides a flow chart showing an exemplary process overview for Cas12a positive enrichment of target sequences.
  • FIGURE 6 provides a schematic of the steps of the exemplary library creation method of Example 5.
  • FIGURE 7 shows the sequencing results of a library constructed in Example 5 using the methods of the disclosure.
  • FIGURE 8 shows the sequencing results of a library constructed in Example 6 using the methods of the disclosure. As shown, target CpG-4 within a 5-plex target was successfully enriched using the methods of the disclosure.
  • the disclosure relates to methods of enriching target sequences in a nucleic acid sample.
  • the methods include, for example, cutting a nucleic acid molecule that includes a target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target, thereby enriching the target sequence.
  • the methods can be used, for example, to enrich target sequences prior to assembling a nucleic acid library or can be used to enrich target sequences in an existing library.
  • An exemplary method of enriching target sequences is shown in FIG. 1.
  • a cell-free DNA (cfDNA) sample comprising methylated nucleotides that have been converted using bisulfite treatment is used to construct a nucleic acid library.
  • sgRNAs complementary to target sequences are constructed, and the library is exposed to Cas12 and the sgRNAs.
  • the sgRNAs direct Cas12 to the target sequence, where it cleaves the DNA, leaving an overhang.
  • Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang.
  • Streptavidin beads bind to the biotinylated nucleotides of the target sequences, and any uncut, off-target sequences are washed away. A PCR amplification is then performed to amplify the enriched target sequences that have been bound to the bead.
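The cut and fill-in steps of this workflow can be sketched in code. The example below is a minimal model, assuming a staggered cut that leaves 5' overhangs (as is generally reported for Cas12a) and a polymerase that extends the recessed 3' ends. The sequence, the cut positions, and the function names are hypothetical and chosen only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in_after_staggered_cut(top: str, top_cut: int, bottom_cut: int):
    """Model a staggered cut of a duplex and the subsequent fill-in.

    `top` is the top strand written 5'->3'. The top strand is cut after index
    `top_cut` and the bottom strand after index `bottom_cut` (both in top-strand
    coordinates). With top_cut < bottom_cut, each fragment carries a 5' overhang.

    Returns the nucleotides (5'->3') that a polymerase would add to the
    recessed 3' end of the left and right fragments, respectively.
    """
    if not 0 < top_cut < bottom_cut < len(top):
        raise ValueError("expected 0 < top_cut < bottom_cut < len(top)")
    overhang = top[top_cut:bottom_cut]
    # Left fragment: the top strand's 3' end is extended across the bottom-strand overhang.
    left_added = overhang
    # Right fragment: the bottom strand's 3' end is extended across the top-strand overhang.
    right_added = reverse_complement(overhang)
    return left_added, right_added


left, right = fill_in_after_staggered_cut("AAAACCCCGGGGTTTT", top_cut=4, bottom_cut=8)
print("added to left fragment: ", left)
print("added to right fragment:", right)
```

In this model the positions added during fill-in are the only ones that can carry a label, so only cut molecules become substrates for capture.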
  • Another exemplary method of enriching target sequences is shown in FIG. 2.
  • a cell-free DNA (cfDNA) sample is exposed to Cas12 and sgRNAs complementary to target sequences in a target region.
  • the sgRNAs direct Cas12 to the target sequence, where it cleaves the DNA, leaving an overhang.
  • Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang.
  • Streptavidin beads bind to the biotinylated nucleotides present in the target region, and any uncut, off-target sequences are washed away.
  • the enriched target sequences are eluted off of the beads using heat treatment.
  • the eluted target sequences are then treated with bisulfite to preserve methylation information.
  • the bisulfite converted target sequences are used to construct a library that is enriched for target sequences.
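The bisulfite step in this workflow can be illustrated with a small in silico model. The sketch below assumes standard bisulfite chemistry, in which unmethylated cytosines are deaminated to uracil while methylated cytosines are left intact. The function name, the example sequence, and the methylated positions are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Convert unmethylated cytosines to uracil on a single DNA strand.

    `methylated_positions` holds the 0-based indices of methylated cytosines,
    which are assumed to be protected from conversion.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("U")  # unmethylated C is read as U (T after amplification)
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment in which only the cytosine at index 1 is methylated.
print(bisulfite_convert("ACGCCG", {1}))
```

Comparing the converted read with the original sequence shows which cytosines were methylated, which is how the treatment preserves methylation information.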
  • Enzymatic reactions and purification techniques are performed according to manufacturer’s specifications, as commonly accomplished in the art or as described herein.
  • the nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, biochemistry, immunology, molecular biology, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, and chemical analyses.
  • a stated range of “1 to 10” should be considered to include any and all subranges between (and inclusive of) the minimum value of 1 and the maximum value of 10; that is, all subranges beginning with a minimum value of 1 or more, e.g., 1 to 6.1, and ending with a maximum value of 10 or less, e.g., 5.5 to 10.
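The subrange convention can be expressed as a simple check. The sketch below is illustrative only; the function name and default bounds are hypothetical and mirror the “1 to 10” example.

```python
def is_within_stated_range(sub_min: float, sub_max: float,
                           range_min: float = 1.0, range_max: float = 10.0) -> bool:
    """Return True if [sub_min, sub_max] is a subrange of [range_min, range_max], endpoints included."""
    return range_min <= sub_min <= sub_max <= range_max


print(is_within_stated_range(1, 6.1))   # subrange starting at the minimum value
print(is_within_stated_range(5.5, 10))  # subrange ending at the maximum value
print(is_within_stated_range(0.5, 3))   # starts below the stated minimum
```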
  • the term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which can depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. “About” can mean a range of ±20%, ±10%, ±5%, or ±1% of a given value. The term “about” or “approximately” can mean within an order of magnitude, within 5-fold, or within 2-fold, of a value.
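Two of the readings of “about” given here, a percentage tolerance and a fold tolerance, can be written as simple checks. This is a minimal sketch; the function names and example values are hypothetical, and the fold check assumes positive values.

```python
def within_percent(value: float, reference: float, percent: float) -> bool:
    """True if `value` lies within plus or minus `percent` percent of `reference`."""
    return abs(value - reference) <= abs(reference) * percent / 100.0


def within_fold(value: float, reference: float, fold: float) -> bool:
    """True if `value` lies within `fold`-fold of `reference` (positive values only)."""
    if value <= 0 or reference <= 0 or fold < 1:
        raise ValueError("expected positive values and fold >= 1")
    return reference / fold <= value <= reference * fold


print(within_percent(10.5, 10, 10))  # inside a 10% tolerance of 10
print(within_percent(12.5, 10, 20))  # outside a 20% tolerance of 10
print(within_fold(30, 10, 5))        # inside a 5-fold tolerance of 10
```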
  • biological sample refers to any sample taken from a subject, which can reflect a biological state associated with the subject, and that includes cell free DNA.
  • a biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy.
  • a biological sample can include any tissue or material derived from a living or dead subject.
  • a biological sample can be a cell-free sample.
  • a biological sample can comprise a nucleic acid (e.g., DNA or RNA) or a fragment thereof.
  • the term “nucleic acid” can refer to deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or any hybrid or fragment thereof.
  • the nucleic acid in the sample can be a cell-free nucleic acid.
  • a sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample).
  • a biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc.
  • a biological sample can be a stool sample.
  • the majority of DNA in a biological sample that has been enriched for cell-free DNA can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free).
  • a biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis.
  • the terms “nucleic acid” and “nucleic acid molecule” are used interchangeably.
  • the terms refer to nucleic acids of any composition form, such as deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), all of which can be in single- or double-stranded form.
  • a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides.
  • a nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like).
  • a nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism).
  • nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures.
  • Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like).
  • Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules.
  • Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides.
  • Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine.
  • a nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
  • the terms “template nucleic acid” and “template nucleic acid molecule(s)” are used interchangeably.
  • the terms refer to nucleic acid that has been obtained from a sample and processed to form an immortalized library.
  • the template nucleic acid can be nucleic acid obtained directly from the sample, or nucleic acid that is derived from that obtained directly from the sample.
  • Examples of nucleic acid derived from a sample include DNA that has been reverse-transcribed from RNA obtained directly from a sample, or DNA that has been amplified from DNA obtained directly from a sample, for example, by PCR.
  • the term “cell-free nucleic acids” refers to nucleic acid molecules that can be found outside cells, in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, fecal matter, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject.
  • Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, or viruses. Examples of cell-free nucleic acids include, but are not limited to, cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA.
  • instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA), can be used. The concentration of low-abundance cfDNA can be measured, for example, using a Qubit™ Fluorometer from Thermo Fisher Scientific (Waltham, MA).
  • the term “methylation” refers to a modification of a nucleic acid where a hydrogen atom on the pyrimidine ring of a cytosine base is replaced by a methyl group, forming 5-methylcytosine.
  • Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”.
  • Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine.
  • Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine.
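  • By way of non-limiting illustration only, the CpG, CHG, and CHH sequence contexts described above can be distinguished computationally. The following Python sketch is not part of the disclosed methods; the function name `cytosine_context` and the “unknown” return value are hypothetical and chosen for illustration, and only the strand supplied (read 5' to 3') is examined.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at index i of a 5'->3' sequence.

    Returns "CpG", "CHG", or "CHH", where H is A, C, or T.
    Returns "unknown" when too few downstream bases are available
    or when a downstream base is not A, C, G, or T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1:i + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return "unknown"
```

  • In this sketch, a cytosine at the 3' end of a fragment is reported as “unknown” rather than assigned a context, which is one possible design choice among several.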
  • Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status.
  • DNA methylation anomalies compared to healthy controls
  • the “methylation index” for each genomic site (e.g., a CpG site, a region of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5'→3' direction) can refer to the proportion of sequence reads showing methylation at the site over the total number of reads covering that site.
  • the “methylation density” of a region can be the number of reads at sites within a region showing methylation divided by the total number of reads covering the sites in the region.
  • the sites can have specific characteristics (e.g., the sites can be CpG sites).
  • the “CpG methylation density” of a region can be the number of reads showing CpG methylation divided by the total number of reads covering CpG sites in the region (e.g., a particular CpG site, CpG sites within a CpG island, or a larger region).
  • the methylation density for each 100-kb bin in the human genome can be determined from the total number of unconverted cytosines (which can correspond to methylated cytosine) at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. In some embodiments, this analysis is performed for other bin sizes, e.g., 50-kb or 1-Mb, etc.
  • a region is an entire genome or a chromosome or part of a chromosome (e.g., a chromosomal arm).
  • a methylation index of a CpG site can be the same as the methylation density for a region when the region includes that CpG site.
  • the “proportion of methylated cytosines” can refer to the number of cytosine sites, “C's,” that are shown to be methylated (for example unconverted after bisulfite conversion) over the total number of analyzed cytosine residues, e.g., including cytosines outside of the CpG context, in the region.
  • the methylation index, methylation density and proportion of methylated cytosines are examples of “methylation levels.”
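  • By way of non-limiting illustration only, the methylation index and methylation density defined above can be computed from per-site read counts. The following Python sketch assumes a hypothetical data layout in which each site position maps to a pair (reads showing methylation, total reads covering the site); the names `SiteCounts`, `methylation_index`, and `methylation_density` are illustrative and do not appear in the disclosure.

```python
from typing import Dict, Tuple

# Hypothetical layout: position -> (methylated_reads, total_reads)
SiteCounts = Dict[int, Tuple[int, int]]


def methylation_index(methylated_reads: int, total_reads: int) -> float:
    """Proportion of reads showing methylation at a single site."""
    if total_reads <= 0:
        raise ValueError("site has no read coverage")
    return methylated_reads / total_reads


def methylation_density(sites: SiteCounts, start: int, end: int) -> float:
    """Reads showing methylation at sites in [start, end) divided by
    the total reads covering those sites."""
    methylated = total = 0
    for position, (m, t) in sites.items():
        if start <= position < end:
            methylated += m
            total += t
    if total == 0:
        raise ValueError("region has no covered sites")
    return methylated / total
```

  • When the region passed to `methylation_density` contains a single site, the result equals the methylation index of that site, consistent with the relationship stated above. The same function can be applied to bins of any size (e.g., 100-kb bins) by choosing `start` and `end` accordingly.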
  • Certain portions of a genome comprise regions with a high frequency of CpG sites.
  • a CpG site is a portion of a genome that has cytosine and guanine separated by only one phosphate group and is often denoted as “5' — C — phosphate — G — 3'”, or “CpG” for short.
  • Regions with a high frequency of CpG sites are commonly referred to as “CG islands” or “CGIs”. It has been found that certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells.
  • such CGIs and features of the genome are referred to herein as “cancer informative CGIs”, which are defined and described in more detail below.
  • An “informative CpG” can be specified by reference to a specific CpG site, or to a collection of one or more CpG sites by reference to a CG island that contains the collection.
  • These cancer informative CGIs tend to have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. DNA fragments from other CGIs may not express such differences.
  • methylation profile can include information related to DNA methylation for a region.
  • Information related to DNA methylation can include a methylation index of a CpG site, a methylation density of CpG sites in a region, a distribution of CpG sites over a contiguous region, a pattern or level of methylation for each individual CpG site within a region that contains more than one CpG site, and non-CpG methylation.
  • a methylation profile of a substantial part of the genome can be considered equivalent to the methylome.
  • DNA methylation in mammalian genomes can refer to the addition of a methyl group to position 5 of the heterocyclic ring of cytosine (e.g., to produce 5-methylcytosine) among CpG dinucleotides. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine.
  • Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine.
  • the terms “epitype” and “nucleic acid epitype” refer to a region of nucleic acid (i.e., DNA or RNA) containing an epigenetic variation.
  • the epigenetic variation could be methylation or non-methylation of one or more nucleotides in a region of nucleic acid.
  • the nucleotide that could be methylated or non-methylated may be a cytidine, e.g., at a CpG site (e.g., the nucleotide could be 5-methylcytidine or cytidine).
  • Exemplary CpG sites may be found in, for example, CpG islands (CGIs) shown in TABLES 1-4.
  • CpG islands (CGIs) may be regions having a length greater than 200 bp, a GC content greater than 50% and a ratio of observed to expected CpG greater than 0.6.
  • CpG islands are often found in promoter regions, where methylation is associated with transcriptional repression.
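  • By way of non-limiting illustration only, the three CGI criteria stated above (length greater than 200 bp, GC content greater than 50%, and observed-to-expected CpG ratio greater than 0.6) can be checked for a candidate sequence. The disclosure does not give a formula for the observed-to-expected ratio; the following Python sketch assumes the commonly used definition (number of CpG × length) / (number of C × number of G). The function names are hypothetical.

```python
def cpg_island_stats(seq: str) -> dict:
    """Length, GC content, and observed/expected CpG ratio of a sequence.

    Assumed formula (not given in the text):
        obs/exp = (count of "CG" * length) / (count of C * count of G)
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")
    gc_content = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def meets_cgi_criteria(seq: str, min_len: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if the sequence exceeds all three thresholds."""
    stats = cpg_island_stats(seq)
    return (stats["length"] > min_len
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)
```

  • The thresholds are exposed as parameters because other CGI definitions use different cut-offs; the defaults mirror the values recited above.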
  • a nucleic acid epitype containing one or more CpG sites may have a methylation pattern, such as any of fully non-methylated (e.g., none of the CpG sites in the epitype are methylated), partially methylated (e.g., at least one but not all of the CpG sites in the epitype are methylated), or fully methylated (e.g., all of the CpG sites in the epitype are methylated).
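  • By way of non-limiting illustration only, the three methylation patterns described above can be assigned from the per-site methylation calls of an epitype. In the following Python sketch, the input representation (one boolean per CpG site, `True` meaning methylated) and the function name `methylation_pattern` are assumptions made for illustration.

```python
from typing import Sequence


def methylation_pattern(cpg_states: Sequence[bool]) -> str:
    """Classify an epitype from per-CpG calls (True = methylated)."""
    if not cpg_states:
        raise ValueError("epitype contains no CpG sites")
    if all(cpg_states):
        return "fully methylated"
    if not any(cpg_states):
        return "fully non-methylated"
    return "partially methylated"
```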
  • the nucleotide that could be methylated or non-methylated may be adenosine (e.g., the nucleotide could be N6-methyladenosine or adenosine).
  • an amplification reaction is “template-driven” in that the reactants, either nucleotides or oligonucleotides, base pair with complements in a template polynucleotide, and this base pairing is required for the creation of reaction products.
  • template-driven reactions are primer extensions with a nucleic acid polymerase, or oligonucleotide ligations with a nucleic acid ligase.
  • Such reactions include, but are not limited to, polymerase chain reactions (PCRs), linear polymerase reactions, nucleic acid sequence-based amplifications (NASBAs), rolling circle amplifications, and the like, disclosed in the following references, each of which is incorporated herein by reference in its entirety: Mullis et al., U.S. Pat. Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et al., U.S. Pat. No. 5,210,015 (real-time PCR with “taqman” probes); Wittwer et al., U.S. Pat. No. 6,174,670; Kacian et al., U.S. Pat. No.
  • the amplification reaction is PCR.
  • An amplification reaction may be a “real-time” amplification if a detection chemistry is available that permits a reaction product to be measured as the amplification reaction progresses, e.g., “real-time PCR”, or “real-time NASBA” as described in Leone et al., Nucleic Acids Research, 26: 2150-2155 (1998), and like references.
  • the term “reaction mixture” means a solution containing all the necessary reactants for performing a reaction, which may include, but is not limited to, buffering agents to maintain pH at a selected level during a reaction, salts, co-factors, scavengers, and the like.
  • fragment refers to a portion of a larger polynucleotide molecule.
  • a polynucleotide for example, can be broken up, or fragmented into, a plurality of segments.
  • Various methods of fragmenting nucleic acid are well known in the art. These methods may be, for example, either chemical or physical or enzymatic in nature.
  • Enzymatic fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave a polynucleotide at known or unknown locations.
  • Physical fragmentation methods may involve subjecting a polynucleotide to a high shear rate.
  • High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing a DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron range.
  • Other physical methods include sonication and nebulization.
  • Combinations of physical and chemical fragmentation methods may likewise be employed, such as fragmentation by heat and ion-mediated hydrolysis. See, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”), which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range.
  • PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates.
  • the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument.
  • a double stranded target nucleic acid may be denatured at a temperature >90°C, primers annealed at a temperature in the range 50-75°C, and primers extended at a temperature in the range 72-78°C.
  • PCR encompasses derivative forms of the reaction, including, but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like.
  • Reaction volumes can range from a few hundred nanoliters, e.g., 200 nL, to a few hundred µL, e.g., 200 µL.
  • Reverse transcription PCR means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, an example of which is described in Tecott et al., U.S. Pat. No.
  • Real-time PCR means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds.
  • Nested PCR means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon.
  • “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon.
  • Asymmetric PCR means a PCR wherein one of the two primers employed is in great excess concentration so that the reaction is primarily a linear amplification in which one of the two strands of a target nucleic acid is preferentially copied.
  • the excess concentration of asymmetric PCR primers may be expressed as a concentration ratio. Typical ratios are in the range of from 10 to 100.
  • Multiplexed PCR means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g., Bernard et al., Anal. Biochem., 273: 221-228 (1999) (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified. Typically, the number of target sequences in a multiplex PCR is in the range of from 2 to 50, or from 2 to 40, or from 2 to 30.
  • “Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Quantitative PCR includes both absolute quantitation and relative quantitation of such target sequences.
  • Quantitative measurements are made using one or more reference sequences or internal standards that may be assayed separately or together with a target sequence.
  • the reference sequence may be endogenous or exogenous to a sample or specimen, and in the latter case, may comprise one or more competitor templates.
  • Typical endogenous reference sequences include segments of transcripts of the following genes: β-actin, GAPDH, β2-microglobulin, ribosomal RNA, and the like.
  • primer as used herein means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed.
  • Extension of a primer is usually carried out with a nucleic acid polymerase, such as a DNA or RNA polymerase.
  • the sequence of nucleotides added in the extension process is determined by the sequence of the template polynucleotide.
  • primers are extended by a DNA polymerase.
  • Primers usually have a length in the range of from 14 to 40 nucleotides, or in the range of from 18 to 36 nucleotides. Primers are employed in a variety of nucleic acid amplification reactions, for example, linear amplification reactions using a single primer, or polymerase chain reactions, employing two or more primers. Guidance for selecting the lengths and sequences of primers for particular applications is well known to those of ordinary skill in the art, as evidenced by the following reference that is incorporated by reference herein in its entirety: Dieffenbach, editor, PCR Primer: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Press, New York, 2003).
  • unique identifier refers to an oligonucleotide that is attached to a polynucleotide or template molecule and is used to identify and/or track the polynucleotide or template in a reaction or a series of reactions.
  • a unique identifier may be attached to the 3'- or 5'-end of a polynucleotide or template, or it may be inserted into the interior of such polynucleotide or template to form a linear conjugate, sometimes referred to herein as a “tagged polynucleotide,” or “tagged template,” or the like.
  • a unique identifier may vary widely in size and compositions; the following references, which are incorporated herein by reference in their entireties, provide guidance for selecting sets of unique identifiers appropriate for particular embodiments: Brenner, U.S. Pat. No. 5,635,400; Brenner and Macevicz, U.S. Pat. No.
  • Lengths and compositions of unique identifiers can vary widely, and the selection of particular lengths and/or compositions depends on several factors including, without limitation, how unique identifiers are used to generate a readout, e.g., via a hybridization reaction or via an enzymatic reaction, such as sequencing; whether they are labeled, e.g., with a fluorescent dye or the like; the number of distinguishable oligonucleotide identifiers required to unambiguously identify a set of polynucleotides, and the like, and how different the identifiers of a particular set must be in order to ensure reliable identification, e.g., freedom from cross hybridization or misidentification from sequencing errors.
  • unique identifiers can each have a length within a range of from about 2 to about 36 nucleotides, or from about 4 to about 30 nucleotides, or from about 8 to about 20 nucleotides, or from about 6 to about 10 nucleotides.
  • sets of unique identifiers are used, wherein each unique identifier of a set has a unique nucleotide sequence that differs from that of every other unique identifier of the same set by at least two bases; in another aspect, sets of unique identifiers are used wherein the sequence of each unique identifier of a set differs from that of every other unique identifier of the same set by at least three bases.
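  • By way of non-limiting illustration only, the requirement that every unique identifier of a set differ from every other by at least two (or three) bases can be verified by computing pairwise Hamming distances. The following Python sketch assumes that all identifiers in a set have equal length and that the set contains at least two identifiers; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("identifiers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: Sequence[str]) -> int:
    """Smallest Hamming distance between any two identifiers in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def is_valid_set(tags: Sequence[str], min_distance: int = 2) -> bool:
    """True if every pair of identifiers differs by at least min_distance bases."""
    return min_pairwise_distance(tags) >= min_distance
```

  • For example, the set `["AAAA", "AACC", "CCAA", "CCCC"]` satisfies a two-base requirement but not a three-base requirement, since several pairs differ at exactly two positions.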
  • Unique identifiers in accordance with embodiments of the invention can serve many functions.
  • unique sequence tags can include molecular barcode sequences, unique molecular identifier (UMI) sequences, or index sequences.
  • index sequences can be used to identify DNA sequences originating from a common source such as a sample type, tissue, subject, or individual.
  • barcodes or index sequences can be used for multiplex sequencing.
  • ssDNA molecules, dsDNA molecules, or damaged molecules (e.g., nicked dsDNA) contained in a cfDNA sample.
  • the unique molecular identifiers can be used to discriminate between nucleic acid mutations that arise during amplification.
  • the unique sequence tags can be present in a multi-functional nucleic acid adapter, which adapter can comprise both a unique sequence tag and a universal priming site. In some embodiments, unique sequence tags can be greater than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 nucleotides in length.
  • ssDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique sequence tags (e.g., ssDNA-specific tags, barcodes or UMIs) using an ssDNA ligation protocol and converted to dsDNA prior to preparation of a combined cfDNA library.
  • dsDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique molecular identifiers (e.g., UMIs) in a dsDNA ligation protocol using Y-shaped sequencing adapters and then ssDNA molecules can be tagged with unique identifiers (e.g., barcode or unique UMI) and converted to dsDNA.
  • the methods of the invention involve differential tagging of populations of cfDNA molecules (e.g., dsDNA molecules, ssDNA molecules, and nicked dsDNA molecules) in a sample with unique sequence tags to distinguish sequence information derived from one population of cfDNA molecules (e.g., dsDNA molecules) from sequence information derived from another population of cfDNA molecules (e.g., ssDNA molecules).
  • Analysis of all populations of cfDNA molecules (e.g., dsDNA molecules, ssDNA molecules, and nicked dsDNA molecules)
  • ssDNA molecules and/or nicked dsDNA may provide additional valuable insight for cancer detection and screening from a cfDNA sample, and/or may be more representative of tumor content in a cfDNA sample.
  • dsDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique sequence tags (e.g., UMIs) in a dsDNA ligation protocol using Y-shaped sequencing adapters (also referred to herein as “Y adapters”) and then ssDNA molecules can be tagged with unique sequence tags (e.g., barcode or unique UMI) and converted to dsDNA.
  • the incorporated unique sequence tags (e.g., UMIs) and ssDNA-specific tags (e.g., barcodes or UMIs) can be used to distinguish sequencing reads as being originally derived from dsDNA or ssDNA in a cfDNA sample.
  • the incorporated unique sequence tags are used to reduce error introduced by amplification, library preparation, and/or sequencing.
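  • By way of non-limiting illustration only, one way in which unique sequence tags can reduce error is by grouping reads that share a tag and deriving a consensus sequence for each group. The disclosure does not specify how the tags are used for this purpose; the following Python sketch assumes a simple per-position majority vote over equal-length reads, and the function name `consensus_by_umi` and the (tag, sequence) input layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def consensus_by_umi(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse reads sharing a UMI into one majority-vote consensus.

    reads: iterable of (umi, sequence); sequences sharing a UMI are
    assumed to be aligned and of equal length.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            raise ValueError(f"reads for UMI {umi} differ in length")
        # Take the most frequent base in each column of the read stack.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

  • In this sketch, a base change present in only a minority of the reads sharing a tag is treated as an amplification or sequencing error and is outvoted, whereas a change present in the original molecule appears in all reads of the group and is retained.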
  • sensitivity refers to the ability of a diagnostic assay to correctly identify subjects with a condition of interest.
  • specificity refers to the ability of a diagnostic assay to correctly identify subjects without a condition of interest.
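  • By way of non-limiting illustration only, sensitivity and specificity are conventionally quantified from counts of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP) as TP / (TP + FN) and TN / (TN + FP), respectively. These formulas are the standard definitions and are not recited in the disclosure; the function names in the following Python sketch are hypothetical.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects with the condition that the assay identifies."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no subjects with the condition")
    return true_positives / total


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of subjects without the condition that the assay identifies."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no subjects without the condition")
    return true_negatives / total
```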
  • the term “subject” refers to any living or non-living organism, including but not limited to a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human animal, a plant, a bacterium, a fungus or a protist.
  • Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, dolphin, whale and shark.
  • a subject is a male or female of any age (e.g., a man, a woman or a child).
  • Detecting a target sequence in a nucleic acid can be a challenge when the target sequence is present at a low frequency in the nucleic acid sample.
  • the instant disclosure provides methods that improve detection of nucleic acids containing a target sequence, e.g., rare target sequences, by isolating and/or enriching such target sequences in a nucleic acid sample.
  • the rare target sequence may be in a nucleic acid sequence from a cfDNA sample, such as a cfDNA sample that has been treated with bisulfite or chemical conversion to convert cytosines to uracils to preserve information regarding the methylation status of a particular nucleic acid sequence (e.g., comprising a CpG site), in a subject.
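  • By way of non-limiting illustration only, the effect of bisulfite conversion on a single strand can be modeled in silico: unmethylated cytosines are converted to uracils, while methylated cytosines remain cytosines, so that methylation status can be read back by comparison with the original sequence. The following Python sketch models one strand only and assumes complete conversion; the function names and the use of zero-based positions are assumptions made for illustration.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Convert unmethylated C to U; leave methylated C unchanged."""
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(original: str, converted: str) -> Set[int]:
    """Positions where a cytosine in the original remained C after conversion."""
    return {
        i
        for i, (o, c) in enumerate(zip(original.upper(), converted))
        if o == "C" and c == "C"
    }
```

  • Incomplete conversion, which this sketch ignores, would cause unmethylated cytosines to be miscalled as methylated.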
  • the target sequence may be indicative of the risk of developing or the presence of cancer in the subject from whom the sample was taken.
  • the target sequence is present in a nucleic acid library (e.g., the sample may be a nucleic acid library), and the methods described herein enrich the target sequence in the nucleic acid library.
  • a nucleic acid target sequence can be isolated and/or enriched by cutting a nucleic acid molecule that includes the target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target by contacting at least one of the labeled nucleotides with a capture domain, thereby isolating and/or enriching the target sequence.
  • a nucleic acid molecule that includes the target molecule can be enriched by separating labeled nucleic acids from unlabeled nucleic acids without the use of a capture domain.
  • labeled nucleic acids can be separated from unlabeled nucleic acids using a method that sorts fluorescent molecules away from non-fluorescent molecules and/or by using magnetic fields to sort a nucleic acid labeled with a magnetic label away from non-labeled nucleic acids.
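  • By way of non-limiting illustration only, the cut, fill-in, and capture steps described above can be modeled as a toy simulation showing how the fraction of target-containing molecules changes. The following Python sketch assumes idealized behavior (every target-containing molecule is cut, every overhang is labeled, every labeled molecule is captured, and no molecule carries a pre-existing overhang); the `Molecule` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Molecule:
    sequence: str
    has_overhang: bool = False
    labeled: bool = False


def cut(molecules: List[Molecule], target: str) -> None:
    """Idealized nuclease: leaves an overhang on molecules containing the target."""
    for m in molecules:
        if target in m.sequence:
            m.has_overhang = True


def fill_in(molecules: List[Molecule]) -> None:
    """Idealized fill-in: labels every molecule that has an overhang."""
    for m in molecules:
        if m.has_overhang:
            m.labeled = True


def capture(molecules: List[Molecule]) -> List[Molecule]:
    """Idealized capture: retains only labeled molecules."""
    return [m for m in molecules if m.labeled]


def target_fraction(molecules: List[Molecule], target: str) -> float:
    """Fraction of molecules whose sequence contains the target."""
    if not molecules:
        return 0.0
    return sum(target in m.sequence for m in molecules) / len(molecules)
```

  • In a real sample, efficiencies below 100% at each step and overhangs unrelated to the target would lower the enrichment relative to this idealized model.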
  • Nucleic acids containing or suspected of containing a target sequence can be contacted with a nuclease, i.e., an enzyme that (1) recognizes a target sequence and (2) cuts (cleaves) the nucleic acid molecules that contain the target sequence.
  • the nuclease cleaves the nucleic acid within the target sequence.
  • the nuclease cleaves the nucleic acid near the target sequence.
  • the nuclease may cleave the nucleic acid within about 1 nt to about 20 nt, about 2 nt to about 20 nt, about 5 nt to about 20 nt, about 10 nt to about 20 nt, about 15 nt to about 20 nt, about 1 nt to about 15 nt, about 2 nt to about 15 nt, about 5 nt to about 15 nt, about 10 nt to about 15 nt, about 1 nt to about 10 nt, about 2 nt to about 10 nt, about 5 nt to about 10 nt, about 1 nt to about 5 nt, about 2 nt to about 5
  • Nucleases suitable for use herein generate a staggered cut in the nucleic acid, leaving a single stranded overhang of unpaired nucleotides.
  • the overhang may be any length, for example, between 1 and 3 nt, between 1 and 5 nt, between 1 and 10 nt, between 1 and 15 nt, between 3 and 5 nt, between 3 and 10 nt, between 3 and 15 nt, or between 5 and 10 nt, between 5 and 15 nt, between 10 and 15 nt, or about 2 nt, about 3 nt, about 4 nt, about 5 nt, about 6 nt, about 7 nt, about 8 nt, about 9 nt, about 10 nt, about 11 nt, about 12 nt, about 13 nt, about 14 nt, or about 15 nt.
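  • By way of non-limiting illustration only, the geometry of a staggered cut can be described by the cut position on each strand, with the overhang length equal to the offset between the two positions. The following Python sketch assumes a hypothetical coordinate convention in which both cut positions are expressed as the number of base pairs to the left of the cut, counted along the top strand read 5' to 3'; the function name `staggered_cut` and the returned keys are illustrative only.

```python
def staggered_cut(top_strand: str, top_cut: int, bottom_cut: int) -> dict:
    """Describe the overhang left by cutting the two strands at different positions.

    top_cut / bottom_cut: number of base pairs left of the cut on each
    strand, in top-strand (5'->3') coordinates.
    """
    n = len(top_strand)
    if not (0 <= top_cut <= n and 0 <= bottom_cut <= n):
        raise ValueError("cut positions must lie within the molecule")
    lo, hi = sorted((top_cut, bottom_cut))
    length = hi - lo
    if length == 0:
        kind = "blunt"
    elif top_cut < bottom_cut:
        # The 3' ends are recessed, so a polymerase can extend them.
        kind = "5' overhang"
    else:
        kind = "3' overhang"
    return {
        "overhang_length": length,
        "overhang_type": kind,
        # Top-strand sequence spanning the single-stranded region.
        "single_stranded_region": top_strand[lo:hi],
    }
```

  • A 5' overhang leaves a recessed 3' end that a DNA polymerase can extend, which is the configuration amenable to fill-in with labeled nucleotides.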
  • nucleases include type II and type V CRISPR-Cas nucleases.
  • the nuclease is a Cas9 or Cas12 nuclease, or a variant thereof (see, e.g., Liu et al. (2019) Nature Communications 10; Article 5524).
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease.
  • the nuclease is a CasX nuclease.
  • the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence that binds to the target sequence.
  • two or more nucleases are used in the methods described herein. In certain embodiments, the two or more nucleases are used sequentially. In certain embodiments, the two or more nucleases are used simultaneously. In certain embodiments, two or more nucleases are used to increase the number or variety of target sequences that can be enriched. For example, in certain embodiments, if one or more target sequences is not located near a Cas12 (e.g., Cas12a) PAM site, a second nuclease (e.g., a Cas9 or a CasX nuclease) may be used if the second nuclease has a PAM site near the remaining target sequences.
  • An enrichment method may include one or more of the steps of: (1) designing guide RNAs to bind target sequences for cleavage with Cas12 (e.g., Cas12a), (2) designing guide RNAs to bind target sequences for cleavage with Cas9 (e.g., for additional target sequences that are not near a Cas12 PAM), and (3) adding a mixture of guide sequences, Cas12, and Cas9 to the nucleic acid comprising a plurality of target sequences.
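  • By way of non-limiting illustration only, the choice between nucleases based on PAM proximity can be sketched as a search for a PAM motif within a window around each target sequence. The disclosure does not recite PAM sequences; the following Python sketch assumes the commonly reported motifs TTTV (V = A, C, or G) for Cas12a and NGG for Cas9, searches only the strand supplied, and ignores the orientation of the PAM relative to the protospacer. The function names, the `window` parameter, and the preference order are hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumed PAM motifs (not recited in the text); one strand only.
PAM_PATTERNS = {
    "Cas12a": re.compile(r"TTT[ACG]"),
    "Cas9": re.compile(r"[ACGT]GG"),
}


def has_pam_near(seq: str, target_start: int, target_end: int,
                 nuclease: str, window: int = 20) -> bool:
    """True if the nuclease's PAM motif occurs within `window` nt of the target."""
    lo = max(0, target_start - window)
    hi = min(len(seq), target_end + window)
    return PAM_PATTERNS[nuclease].search(seq.upper(), lo, hi) is not None


def choose_nuclease(seq: str, target_start: int, target_end: int,
                    preference: Sequence[str] = ("Cas12a", "Cas9"),
                    window: int = 20) -> Optional[str]:
    """Return the first preferred nuclease with a nearby PAM, or None."""
    for nuclease in preference:
        if has_pam_near(seq, target_start, target_end, nuclease, window):
            return nuclease
    return None
```

  • A practical guide design tool would additionally consider both strands, the required position of the PAM relative to the spacer, and off-target binding, none of which is modeled here.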
  • an end-repair step may be performed prior to a cutting step.
  • an end-repair step can be performed to blunt-end repair any overhangs unrelated to the target sequence, prior to the cutting step.
  • the overhangs are filled in using a polymerase, such as a DNA polymerase.
  • the DNA polymerase I consists of the Klenow fragment.
  • the polymerase reaction includes nucleotides (free nucleotides), such as dNTPs, which are used by the DNA polymerase to fill in the overhangs.
  • Klenow fragment can be used in an amount of from about 0.01 units/µL to about 1 unit/µL, for example, from about 0.05 units/µL to about 0.5 units/µL, from about 0.075 units/µL to about 0.125 units/µL or at about 0.1 unit/µL.
  • the nucleotides are associated with (e.g., bound to) a label, which allows for the separation and/or isolation of the nucleic acid comprising the target sequence from other nucleic acids not containing the target sequence.
  • the label comprises a fluorophore, a magnetic moiety, biotin or digoxigenin.
  • a labeled (e.g., biotin-labeled) nucleic acid comprising a target sequence is exposed to a capture domain (e.g., avidin), forming a capture domain-label-nucleic acid target complex.
  • the capture domain can be bound to a solid support, such as a bead.
  • the beads are bound to the capture domain-label-nucleic acid target complex, which can be separated from nucleic acids that do not comprise the target sequence, e.g., by a wash step.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the solid support is a bead, a well, a tube, or a slide.
  • the steps of the methods described herein, such as the enrichment method, including the cutting step, can be performed at a variety of temperatures, including but not limited to room temperature and/or about 20°C to about 45°C, about 20°C, about 25°C, about 30°C, about 35°C, about 37°C, about 40°C, about 45°C, or any ranges therein (e.g., about 20°C to about 25°C, about 25°C to about 37°C, about 35°C to about 45°C, and so on).
  • the target sequence or a subset of target sequences in the nucleic acid can be enriched using one or more additional enrichment steps.
  • the one or more additional enrichment steps can be performed using any enrichment method known in the art. Non-limiting examples include hybrid capture and use of DNA-binding proteins to enrich a target sequence or a subset of target sequences.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to one or more additional enrichment steps to enrich for a subset of the target sequences.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to an enrichment step to enrich for the one or more (e.g., the plurality of) target sequences.
  • the method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • a target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence or the subset of target sequences to hybrid capture.
  • labeled (e.g., biotinylated) capture probes that can bind to one or more target sequences or subsets of target sequences are exposed to the nucleic acid comprising the one or more target sequences.
  • the capture probes are specific to a sequence of interest, for example, a methylation pattern of interest that can be detected as a bisulfite-converted epitype. Examples of such hybrid capture probe sets include the KAPA HyperPrep Kit and SeqCAP Epi Enrichment System from Roche Diagnostics (Pleasanton, CA).
  • Hybrid capture can be performed before or after enrichment using the targeted cutting and overhang filling method described above.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to hybrid capture to enrich for a subset of the target sequences.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to hybrid capture to enrich for the one or more (e.g., the plurality of) target sequences.
  • the method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • a target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence to nucleic acid binding proteins (also referred to herein as protein binders).
  • the nucleic acid binding protein may bind a particular sequence or may bind to methylated CpGs.
  • Exemplary nucleic acid binding proteins that bind to a particular sequence include transcription factors and nuclease-deficient CRISPR enzymes (e.g., dCas9).
  • Exemplary DNA binding proteins that bind methylated CpGs include methyl-CpG-binding domain (MBD) proteins such as MECP2 (methyl-CpG-binding protein 2), MBD1, MBD2, MBD3, MBD4, MBD5, MBD6, the Kaiso family proteins, and the SET- and Ring finger-associated (SRA) domain family.
  • the MBD protein is selected from MECP2, MBD1, MBD2, and MBD4. See, e.g., Du et al. (2015) Epigenomics 7(6): 1051-1073, incorporated by reference herein for all purposes.
  • a nucleic acid comprising a target sequence is exposed to a protein comprising a nucleic acid binding protein, which binds to a target sequence or a subset of target sequences.
  • the target sequence or subset of target sequences can be enriched by isolating the target sequence-nucleic acid binding protein complex, for example, using an antibody to the nucleic acid binding protein.
  • the nucleic acid binding protein is attached to a label.
  • the label comprises a fluorophore, biotin or digoxigenin.
  • the label binds a capture domain that can be used to isolate and/or separate the target nucleic acid or subset of target nucleic acids from a sample.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the solid support is a bead, a well, a tube, or a slide.
  • a nucleic acid comprising a target sequence is exposed to a protein comprising a methyl-CpG-binding domain (MBD), which binds to a methylated CpG within the target sequence.
  • the target sequence can be enriched by isolating the target sequence-MBD complex. Because CGIs are typically not methylated, enrichment with a nucleic acid binding protein comprising an MBD enriches for methylated fragments, for example, rare methylated fragments.
  • the MBD is MBD3, which binds to 5-hydroxymethylcytosine.
  • the method enriches for hydroxymethylcytosine-containing fragments of a CGI.
  • Enrichment using a nucleic acid binding protein can be performed before or after enrichment using the targeted cutting and overhang filling method described above.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the method comprises a second step of combining the one or more (e.g., the plurality of) target molecules with a nucleic acid binding protein to enrich for a subset of the target sequences.
  • the nucleic acid binding protein binds to the one or more (e.g., the plurality of) target molecules and the nucleic acid binding protein-target complex is isolated as described above.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to hybrid capture to enrich for the one or more (e.g., the plurality of) target sequences.
  • the method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules.
  • the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • the method can include amplifying the nucleic acid molecule, e.g., by PCR. Amplification can occur while the nucleic acid is in contact with the capture domain, or the nucleic acid can be removed from the capture domain (e.g., by heat elution, a chemical agent, mechanical disruption, or combinations thereof) prior to amplification.
  • the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest (i.e., targets).
  • regions of interest are enriched prior to making the library.
  • the library is made and then regions of interest present in the library are enriched.
  • a method of making a nucleic acid library enriched for regions of interest can include obtaining a sample comprising a plurality of nucleic acids, wherein the plurality of nucleic acids comprise regions of interest.
  • unmethylated cytosines are converted to uracils.
  • Adapters are added to the plurality of nucleic acids to form a nucleic acid library.
  • the plurality of nucleic acid molecules having regions of interest can be cut, e.g., by a nuclease, to generate single stranded overhangs at cut ends of the molecules that include the regions of interest. The overhangs are filled in, e.g.
  • nucleic acids containing the regions of interest can be amplified to form the nucleic acid library enriched for regions of interest.
  • An exemplary method of making a nucleic acid library enriched for regions of interest is shown in FIG. 1.
  • a cell-free DNA (cfDNA) sample comprising methylated nucleotides that has been converted using bisulfite treatment is used to construct a nucleic acid library.
  • sgRNAs complementary to target sequences are constructed, and the library is exposed to Cas12 and the sgRNAs.
  • the sgRNAs direct Cas12 to the target sequence, where the nuclease cleaves the DNA, leaving an overhang.
  • Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang.
  • Streptavidin beads bind to the biotinylated nucleotides of the target sequences, and any uncut, off-target sequences are washed away. A PCR amplification is then performed to amplify the enriched target sequences that have been bound to the bead.
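The cut, fill-in, capture, and wash sequence described above can be summarized as a toy model. This is a minimal sketch assuming ideal efficiency at every step (every target fragment is cut and labeled, and no background fragment is); the `Fragment` class and `enrich` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    has_target: bool
    has_overhang: bool = False
    biotinylated: bool = False


def enrich(fragments):
    """Return (captured, washed_away) after an idealized enrichment pass."""
    for frag in fragments:
        # Guided cleavage: only fragments carrying a target sequence are cut,
        # which leaves a single stranded overhang.
        frag.has_overhang = frag.has_target
        # Fill-in: the polymerase adds biotinylated dNTPs only at overhangs.
        frag.biotinylated = frag.has_overhang
    # Capture and wash: streptavidin beads retain biotinylated fragments;
    # uncut, unlabeled fragments are washed away.
    captured = [f for f in fragments if f.biotinylated]
    washed_away = [f for f in fragments if not f.biotinylated]
    return captured, washed_away
```

In practice each step has less than complete efficiency and some off-target labeling can occur, so the captured pool would be enriched for, rather than limited to, target fragments.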
  • Enriched libraries can also be made by enriching for regions of interest prior to making the library.
  • a sample is obtained which comprises a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise or are suspected to comprise regions of interest.
  • the subset of the plurality of nucleic acid molecules having regions of interest are cut to generate single stranded overhangs at cut ends of the molecules that include the regions of interest.
  • the overhangs are filled in, e.g., using a polymerase, with at least one labeled nucleotide.
  • the nucleic acids that include the regions of interest are then enriched by contacting the labeled nucleic acids with capture domains which can be used to separate and/or isolate the labeled nucleic acids from unlabeled nucleic acids.
  • the nucleic acids that include the regions of interest are removed from the capture domains.
  • the nucleic acids are treated to convert unmethylated cytosines to uracils, to preserve information about the methylation state of the nucleic acids.
  • Nucleic acid adapters are added to the plurality of nucleic acids to form the nucleic acid library enriched for regions of interest.
  • Another exemplary method of making a nucleic acid library enriched for regions of interest is shown in FIG. 2.
  • a cell-free DNA (cfDNA) sample is exposed to Cas12 and sgRNAs complementary to target sequences in a target region.
  • the sgRNAs direct Cas12 to the target sequence, where the nuclease cleaves the DNA, leaving an overhang.
  • Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang.
  • Streptavidin beads bind to the biotinylated nucleotides present in the target region, and any uncut, off-target sequences are washed away.
  • the enriched target sequences are eluted off of the beads using heat treatment.
  • the eluted target sequences are then treated with bisulfite to preserve methylation information.
  • the bisulfite converted target sequences are used to construct a library that is enriched for target sequences.
  • nucleic acids containing or suspected of containing a target sequence are contacted with a nuclease that (1) recognizes a region of interest (i.e., a target sequence) and (2) cuts (cleaves) the nucleic acid molecules that contain the target sequence.
  • the nuclease cleaves the nucleic acid within the target sequence.
  • the nuclease cleaves the nucleic acid near the target sequence.
  • the nuclease may cleave the nucleic acid within about 1 nt to about 20 nt, about 2 nt to about 20 nt, about 5 nt to about 20 nt, about 10 nt to about 20 nt, about 15 nt to about 20 nt, about 1 nt to about 15 nt, about 2 nt to about 15 nt, about 5 nt to about 15 nt, about 10 nt to about 15 nt, about 1 nt to about 10 nt, about 2 nt to about 10 nt, about 5 nt to about 10 nt, about 1 nt to about 5 nt, about 2 nt to about 5
  • Nucleases suitable for use herein generate a staggered cut in the nucleic acid, leaving a single stranded overhang of unpaired nucleotides.
  • the overhang may be any length, for example, between 1 and 3 nt, between 1 and 5 nt, between 1 and 10 nt, between 1 and 15 nt, between 3 and 5 nt, between 3 and 10 nt, between 3 and 15 nt, or between 5 and 10 nt, between 5 and 15 nt, between 10 and 15 nt, or about 2 nt, about 3 nt, about 4 nt, about 5 nt, about 6 nt, about 7 nt, about 8 nt, about 9 nt, about 10 nt, about 11 nt, about 12 nt, about 13 nt, about 14 nt, or about 15 nt.
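The relationship between a staggered cut and the overhang that is later filled in can be sketched as follows. This is a minimal model assuming a cut that leaves 5' overhangs (recessed 3' ends that a polymerase can extend); the coordinate convention, the function name, and the example cut offsets are hypothetical and are not taken from the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def staggered_cut_fill_in(top, top_cut, bottom_cut):
    """Model a staggered cut of a duplex and the resulting fill-in synthesis.

    `top` is the top strand written 5'->3'. The top strand is cut before index
    `top_cut` and the bottom strand before index `bottom_cut` (top-strand
    coordinates). With top_cut < bottom_cut, both products carry a 5' overhang.
    """
    if not 0 < top_cut < bottom_cut < len(top):
        raise ValueError("expected 0 < top_cut < bottom_cut < len(top)")
    overhang = top[top_cut:bottom_cut].upper()
    return {
        "overhang_length": len(overhang),
        # Left product: the recessed top-strand 3' end is extended, restoring
        # the original top-strand bases (written 5'->3').
        "left_fill": overhang,
        # Right product: the recessed bottom-strand 3' end is extended with the
        # reverse complement of the top-strand overhang (written 5'->3').
        "right_fill": overhang.translate(COMPLEMENT)[::-1],
    }
```

Each base listed in `left_fill` and `right_fill` is a position at which a labeled nucleotide could be incorporated, so a longer overhang offers more labeling positions per cut end.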
  • nucleases include type II and type V CRISPR-Cas nucleases.
  • the nuclease is a Cas9 or Cas12 nuclease, or a variant thereof (see, e.g., Liu et al. (2019) supra).
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease.
  • the nuclease is a CasX nuclease.
  • the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence that binds to the target sequence.
  • overhangs are filled in using a polymerase, such as a DNA polymerase.
  • the DNA polymerase is the Klenow fragment of DNA polymerase I.
  • the polymerase reaction includes nucleotides (free nucleotides), such as dNTPs, which are used by the DNA polymerase to fill in the overhangs.
  • at least one nucleotide comprises a label.
  • the label comprises a fluorophore, biotin or digoxigenin.
  • the target nucleic acid is enriched by isolating and/or separating the labeled nucleic acid.
  • the label binds to a capture domain.
  • the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
  • the capture moiety comprises or is connected to a solid support.
  • the solid support is a bead, a well, a tube, or a slide.
  • the target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence to hybrid capture.
  • hybrid capture can be performed before or after enrichment using the targeted cutting and overhang filling method described above.
  • Hybrid capture can be performed before or after addition of adaptors.
  • Hybrid capture can be performed before or after conversion of nucleotides (e.g., by bisulfite conversion).
  • hybrid capture enriches for target sequences in a genomic region of interest and targeted cutting and overhang filling enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • targeted cutting and overhang filling enriches for target sequences in a genomic region of interest and hybrid capture enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
  • the method can include amplifying the nucleic acid molecule. Amplification can occur while the nucleic acid is in contact with the capture domain, or the nucleic acid can be removed from the capture domain (e.g., by heat elution, a chemical agent, mechanical disruption, or combinations thereof) prior to amplification.
  • adaptors can be attached to a nucleic acid by any means known in the art, for example, as are used in connection with next generation sequencing (NGS).
  • adapters such as a Y adapter
  • the adapter is attached by nucleic acid amplification of the cell-free nucleic acid using a primer comprising the adapter.
  • the adapter comprises one or more of a flow cell binding site, an index, a unique molecular identifier (UMI), and a sequencing binding site.
  • the disclosure relates to a nucleic acid library, produced by the methods described herein.
  • Nucleic acids used in the methods described herein can be derived from any source, such as a sample taken from the environment or from a subject (e.g., a human subject).
  • a biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis.
  • a biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy.
  • biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, fecal, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject.
  • a biological sample can include any tissue or material derived from a living or dead subject.
  • a biological sample can be a cell-free sample.
  • a sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample).
  • a biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc.
  • the nucleic acid can be of any composition form, such as deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), and/or ribonucleic acid (RNA) and/or RNA analogs, all of which can be in single- or double-stranded form.
  • single-stranded nucleic acids can be made double stranded prior to cutting with an enzyme.
  • nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides.
  • a nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like).
  • a nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism).
  • nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures.
  • Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine.
  • a nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
  • the nucleic acid is a cell-free nucleic acid, which can be found in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, fecal, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject.
  • a plasma sample can be used directly in the methods disclosed herein (for example, in the cutting step), without prior purification or isolation of nucleic acids in the plasma.
  • Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, or viruses.
  • examples of cell-free nucleic acids include but are not limited to cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA.
  • instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA), can be used. Determining the concentration of low-abundance cfDNA can be accomplished, for example, using a Qubit Fluorometer from Thermo Fisher Scientific (Waltham, MA).
  • the majority of DNA in a biological sample that has been enriched for cell-free DNA can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free).
  • a methylated nucleic acid is a nucleic acid having a modification in which a hydrogen atom on the pyrimidine ring of a cytosine base is replaced by a methyl group, forming 5-methylcytosine.
  • Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”, which can be a target for enrichment.
  • Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine.
  • Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine (6mA).
  • Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status.
  • DNA methylation anomalies compared to healthy controls
  • the nucleic acid comprises a CpG site (i.e., cytosine and guanine separated by only one phosphate group).
  • the nucleic acid comprises a CpG island (also referred to as a “CG island” or “CGI”) or a portion thereof, which is the target for enrichment. Because certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells, detection of such CGIs can be informative of a health condition.
  • the CGI is a “cancer informative CGI”, which is defined and described in more detail below.
  • the CpG is an “informative CpG”, e.g., a “cancer informative CGI”.
  • Such CGIs may have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. Accordingly, detection of a cancer informative CGI can be informative regarding a subject’s risk of developing cancer or can be indicative that the subject has cancer.
  • Exemplary cancer informative CGIs, which can be target sequences as described herein, are identified in, e.g., Table 1 of U.S. Patent Publication 2020/0109456A1, Tables 2 and 3 of WO2022/133315, and TABLES 1-4 provided herein.
  • the nucleic acids of the invention have been treated to convert one or more unmethylated nucleotides (e.g., cytosines) to another nucleotide (a “converted nucleotide”, as used herein, such as a uracil), for example, prior to amplification.
  • one or more unmethylated cytosines are converted to a nucleotide that pairs with adenine (e.g., the unmethylated cytosine may be converted to uracil).
  • one or more unmethylated adenines are converted to a base that pairs with cytosine (e.g., the unmethylated adenine may be converted to inosine (I)).
  • one or more methylated cytosines (e.g., a 5-methylcytosine (5mC))
  • methylated cytosines are protected from conversion (e.g., deamination) during the conversion step.
  • the nucleic acid may be amplified. During amplification, the converted nucleotide pairs with its complementary nucleotide, and in the next round of amplification, the complementary nucleotide pairs with a replacement nucleotide. For example, following the conversion of an unmethylated cytosine to a uracil, the nucleic acid may be amplified such that an adenine pairs with the uracil in the first round of replication, and in the second round of replication, the adenine pairs with a thymine. Accordingly, the thymine replaces the uracil in the original nucleic acid sequence, and is referred to herein as a “replacement nucleotide”.
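The conversion and replacement steps described above can be traced in silico. The following is a minimal sketch assuming complete conversion of every unmethylated cytosine and full protection of methylated cytosines; the function names and the example sequence are hypothetical.

```python
# Base pairing used when a strand is copied; uracil templates an adenine.
PAIRING = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def convert_unmethylated_c(seq, methylated_positions):
    """Replace each unmethylated C with U; methylated positions stay as C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def copy_strand(template):
    """Return the newly synthesized complementary strand, written 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(template))


def amplify_two_rounds(converted):
    """First round pairs A with each U; second round pairs T with that A."""
    first_round = copy_strand(converted)
    second_round = copy_strand(first_round)
    return first_round, second_round
```

For the hypothetical sequence `ACGTCC` with only position 1 methylated, conversion yields `ACGTUU`, and the second-round strand reads `ACGTTT`: the methylated cytosine is retained while each converted cytosine now appears as thymine, the replacement nucleotide.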
  • the nucleic acids of the invention have been selectively deaminated.
  • Selective deamination refers to a process in which unmethylated cytosine residues are selectively deaminated over methylated cytosine (5-methylcytosine) residues.
  • deamination of cytosine forms uracil, effectively inducing a C to T point mutation to allow for detection of methylated cytosines.
  • Methods of deaminating cytosine are known in the art, and include bisulfite conversion and enzymatic conversion.
  • the enzymatic conversion comprises subjecting the nucleic acid to TET2, which oxidizes methylated cytosines, thereby protecting them, and subsequent exposure to APOBEC, which converts unprotected (i.e., unmethylated) cytosines to uracils.
  • the conversion for example, bisulfite conversion or enzymatic conversion, uses commercially available kits.
  • Bisulfite conversion can be performed using commercially available technologies, such as an EZ DNA Methylation-Gold, EZ DNA Methylation-Direct, or EZ DNA Methylation-Lightning kit (Zymo Research Corp (Irvine, California)) or EpiTect Fast available from Qiagen (Germantown, MD).
  • a kit such as APOBECSeq (NEBiolabs) or OneStep qMethyl-PCR Kit (Zymo Research Corp (Irvine, California)) is used.
  • the methods include treatment of the sample with bisulfite (e.g., sodium bisulfite, potassium bisulfite, ammonium bisulfite, magnesium bisulfite, sodium metabisulfite, potassium metabisulfite, ammonium metabisulfite, magnesium metabisulfite and the like).
  • Unmethylated cytosine is converted to uracil through a three-step process during sodium bisulfite modification. As shown in FIG.
  • the steps are sulphonation to convert cytosine to cytosine sulphonate, deamination to convert cytosine sulphonate to uracil sulphonate and alkali desulfonation to convert uracil sulphonate to uracil.
  • Conversion on methylated cytosine is much slower and is not observed at significant levels in a 4-16 hour reaction. (See Clark et al., Nucleic Acids Res., 22(15):2990-7 (1994).) If the cytosine is methylated it will remain a methylated cytosine. If the cytosine is unmethylated it will be converted to uracil.
  • When the modified strand is copied, for example, through extension of a locus specific primer, a random or degenerate primer or a primer to an adaptor, a G will be incorporated in the interrogation position (opposite the C being interrogated) if the C was methylated and an A will be incorporated in the interrogation position if the C was unmethylated and converted to U.
  • the enzymatic treatment with a cytidine deaminase enzyme is used to convert cytosine to uracil.
  • Enzymatic conversion can include an oxidation step, in which Tet methylcytosine dioxygenase 2 (TET2) catalyzes the oxidation of 5mC to 5hmC to protect methylated cytosines from conversion by subsequent exposure to a cytidine deaminase.
  • Other protection steps known in the art can be used in addition to or in place of oxidation by TET2.
  • the nucleic acid is treated with the cytidine deaminase to convert one or more unmethylated cytosines to uracils.
  • the cytidine deaminase may be APOBEC.
  • the cytidine deaminase includes activation induced cytidine deaminase (AID) and apolipoprotein B mRNA editing enzymes, catalytic polypeptide-like (APOBEC).
  • the APOBEC enzyme is selected from the human APOBEC family consisting of: APOBEC-1 (Apo1), APOBEC-2 (Apo2), AID, APOBEC-3A, -3B, -3C, -3DE, -3F, -3G, -3H and APOBEC-4 (Apo4).
  • the APOBEC enzyme is APOBEC-seq.
  • iii. Nitrite Conversion
  • nitrite treatment is used to deaminate adenine and cytosine.
  • Deamination of an A results in conversion to an inosine (I), which is read by a polymerase as a G
  • deamination of a methylated A results in a nitrosylated 6mA (6mA-NO), which causes the base to be read by a polymerase as an A.
  • Deamination of a C results in conversion to a uracil, which is read by a polymerase as a T
  • deamination of an N4-methylcytosine (4mC) to 4mC-NO or a 5-methylcytosine (5mC) to a T causes the base to be read by a polymerase as a C or a T, respectively.
  • the C to T ratio at the 5mC position is about 40% higher than other cytosine positions, allowing 5mC to be differentiated from C.
  • Guide RNAs (gRNAs, sgRNAs)
  • a “guide RNA” (“gRNA”) is a type of RNA that includes a CRISPR RNA sequence (crRNA, also referred to as a “guide sequence” or “spacer”), and, in certain embodiments, a trans-activating CRISPR RNA sequence (tracrRNA).
  • the tracrRNA if present, binds to an endonuclease (e.g., a CRISPR enzyme) and the crRNA is complementary to a target sequence.
  • the guide RNA is referred to as a single guide RNA (sgRNA), which refers to a guide RNA comprising both a crRNA and tracrRNA.
  • a guide sequence can be designed to have complementarity to a target sequence of the disclosure, where hybridization between a target sequence and a guide sequence promotes the formation of a gene editing endonuclease complex (e.g., a CRISPR complex).
  • Full complementarity may not be required, provided there is sufficient complementarity to cause hybridization and promote formation of a gene editing endonuclease complex (e.g., a CRISPR complex).
  • the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • the guide sequence and the target sequence exhibit full (100%) complementarity.
  • Optimal alignment of the polyribonucleotide to the target sequence may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina®, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
  • a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length.
  • the ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay.
  • the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein.
  • cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions.
  • Other assays are possible, and will occur to those skilled in the art.
  • the nuclease used in the methods described herein can be an endonuclease, for example, a Cas protein, that is capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang.
  • Non-limiting examples of Cas proteins that are capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang, include type V CRISPR enzymes such as Cas12, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f1, Cas12g, Cas12h, Cas12i, homologs thereof, or modified versions thereof (see, e.g., Liu et al. (2019) supra).
  • the DNA endonuclease is a Cas12 endonuclease that effects a staggered break at a locus within or near a target sequence, producing a 1-5 nt overhang.
  • Cas12 recognizes a 5’-T-rich PAM, such as TTN or TTTN.
  • the endonuclease is a Cas12a/Cpf1 endonuclease, a homolog thereof, a recombinant of the naturally occurring molecule thereof, a codon-optimized version thereof, a modified version thereof, and combinations of any of the foregoing.
  • the Cas12a/Cpf1 endonuclease can be derived from a variety of bacterial species.
  • the Cas12a/Cpf1 endonuclease is derived from Acidaminococcus bacteria or Lachnospiraceae bacteria.
  • the Cas12a/Cpf1 endonuclease is a Lachnospiraceae bacterium ND2006 Cpf1.
  • the endonuclease is a MAD7 endonuclease, a homolog thereof, a recombinant of the naturally occurring molecule thereof, a codon-optimized version thereof, a modified version thereof, and combinations of any of the foregoing.
  • MAD7 is a codon-optimized endonuclease derived from Eubacterium rectale (Inscripta, Boulder, CO). MAD7 is described in U.S. Patent No. 9,982,279.
  • Cas9 enzyme a type II CRISPR enzyme that recognizes a 3’-G-rich PAM such as NGG
  • the endonuclease is a Cas9 protein.
  • Another nuclease capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang, is a CasX nuclease.
  • CasX recognizes a 5’-TTCN PAM and is capable of creating 10-nt overhangs.
  • the endonuclease (e.g., a CRISPR enzyme) directs cleavage at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the endonuclease directs cleavage within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
  • one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system.
  • one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes.
  • a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
  • kits for enriching a target nucleic acid and/or making an enriched nucleic acid library may include a nuclease that cuts a nucleic acid molecule including a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; labeled dNTPs; DNA polymerase; and a capture moiety comprising a capture domain.
  • the kit includes a nuclease, such as a CRISPR-Cas nuclease.
  • the nuclease is a type II CRISPR-Cas nuclease.
  • the nuclease is a Cas9 nuclease.
  • the nuclease is a type V CRISPR-Cas nuclease.
  • the nuclease is a Cas12 nuclease.
  • the nuclease is a Cas12a/Cpf1 nuclease.
  • the nuclease is a MAD7 nuclease.
  • the nuclease is a CasX nuclease.
  • the DNA polymerase is DNA polymerase I.
  • the DNA polymerase I consists of the Klenow fragment.
  • the label comprises biotin, digoxigenin, a magnetic moiety or a fluorophore.
  • the capture moiety comprises avidin, streptavidin, or a DIG-binding molecule. In certain embodiments, the capture moiety comprises or is connected to a solid support.
  • Kits contemplated herein may further include a solid support, such as a bead, a well, a tube, or a slide.
  • the capture domain comprises streptavidin connected to a bead.
  • This example describes an exemplary method for target cleavage (e.g., at a CpG site) using a gene editing system (CRISPR-Cas12a), for use in an enrichment method provided herein.
  • DNA samples (either genomic DNA (gDNA) or sheared genomic DNA (shDNA)) comprising a target sequence
  • the shDNA was sheared to approximately 180bp to serve as a model for cfDNA.
  • Herring DNA lacking the target sequence of interest was used as a negative control.
  • An amplicon containing the target sequence of interest (HPRT control target, or one or six experimental CpG sites) was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
  • a human DNA sample comprising a target sequence of interest was obtained.
  • An amplicon containing the target sequence of interest was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
  • Cas12a is capable of cutting a target DNA sequence in the presence of plasma, and increasing the amount of Cas12a and crRNA in the reaction increases the efficiency of cutting to a level that is similar to the efficiency of cutting in buffer.
  • a human DNA sample comprising a target sequence of interest (CpG-4) was obtained.
  • An amplicon containing the target sequence of interest was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
  • One (1) µM Cas12a and 300 nM crRNA were incubated at room temperature for at least 10 minutes to create Cas complexes. Thirty (30) nM of the CpG-4 amplicon and 21 µL of deionized water were added to the complexes to cut the amplicon at the target site. The reaction was incubated at room temperature for 30 s, 1 min, 3 min, 5 min, or 10 min, and then 1 µL ProK was added. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules. Purified cleaved DNA was analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
  • Cas12a is capable of cutting a target DNA sequence at room temperature, with the highest cutting efficiency seen at incubation times of 5 minutes or longer (5 min and 10 min).
  • a human DNA sample comprising a target sequence of interest was obtained.
  • An amplicon containing the target sequence of interest was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
  • Cas12a (1 µM) and crRNA (300 nM) were incubated for at least 10 minutes to create Cas complexes. Amplicon was added to the complexes to cut the amplicon at the target sites with a 4-base overhang on the opposite strand to the PAM. The overhang bases were filled in using DNA Polymerase-I and 1 mM biotinylated-dNTPs and/or 1 mM unlabeled dNTPs. DNA polymerase was used at 0.1 units/µL (1x) or 0.5 units/µL (5x). Streptavidin beads were added and bound to DNA containing biotinylated dNTPs. The reaction mixture was centrifuged and the beads separated from the supernatant. Bead and supernatant samples were analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
  • This example provides an exemplary process overview for Cas12a positive enrichment of target sequences.
  • a flowchart of the experimental design is shown in FIG. 5 and a schematic of each step is shown in FIG. 6.
  • This example demonstrates successful completion of a target-sequence enriched library using CRISPR to enrich sequences of interest.
  • Cell-free DNA comprising a target of interest was blunt end-repaired by incubating cfDNA, dNTPs and Klenow fragment (3’-5’ exo-) at 37°C for 30 minutes.
  • a solid-phase reversible immobilization (SPRI) selection was used to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules.
  • Cas12a and crRNA were incubated at room temperature (25°C) for 10 minutes to create Cas complexes.
  • the target specimen was then spiked into the complexes to cut the specimen at the target sites with a 4-base overhang on the opposite strand to the PAM.
  • Biotinylated dNTPs and Klenow fragment (3’-5’ exo-; 0.1 units/µL) were then added and incubated at 37°C for 30 minutes to fill in the overhang.
  • a solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules.
  • DNA comprising the target sequence (having biotinylated dNTPs) was bound to streptavidin beads using the biotin-streptavidin interaction.
  • a series of washes removed off-target DNA molecules and the samples were enriched for on-target fragments and depleted for off-target fragments.
  • Streptavidin beads with target DNA bound were resuspended in water.
  • the bisulfite converted ssDNA was then used to create a library (“library creation”, LC) using Adaptase® technology from IDT.
  • This technology uses an enzymatic reaction resulting in unbiased addition of a truncated adapter.
  • the Adaptase® enzymatic reaction performed end-repairing, tailing of 3’ ends and ligation of first truncated adapter complement to 3’ ends simultaneously.
  • a uracil-free reverse complement to the bisulfite converted ssDNA was then generated using the truncated adapter to prime and extend.
  • a solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted ssDNA fragments, excess adapters and molecules.
  • a ligation reaction was performed, adding truncated P5 adapter to the 3’ end of the uracil-free reverse complement fragment.
  • a solid-phase reversible immobilization (SPRI) selection was used to remove unwanted ssDNA fragments, excess adapters and molecules.
  • Indexing PCR amplification was performed with a high fidelity DNA polymerase and unique, known 10-bp barcodes. Indices allow for sample multiplex for the downstream assay.
  • the product was a bisulfite converted dsDNA library with full length adapters.
  • Post-PCR, a SPRI selection was done to remove unwanted ssDNA fragments, excess primers, excess adapters and excess molecules. After library construction, the library quality and quantity were evaluated using the Agilent TapeStation and Qubit Fluorometer, respectively.
  • Sequencing of enriched library
  • Sequencing was performed using an iSeq using paired end 150x150 base sequencing with a 5% PhiX spike-in. Sequencing data generated was then demultiplexed utilizing the assigned barcode, aligned to the human genome and trimmed. The cleaned-up data was then processed through a quality pipeline to collapse duplicate reads and the sequencing data was evaluated. As shown in FIG. 7, the library exhibited a conversion efficiency of 99.04%.
  • The methods of Example 5 were repeated using gDNA as the nucleic acid source and CpG-5plex as the target, which contained multiple cut sites. Enrichment for one of the targets (CpG-4) within the CpG-5plex is shown in FIG. 8. As shown, the CpG-4 target is enriched in the resulting library, whereas a no Cas/crRNA (“no cut”) control and a library constructed using the gDNA without the enrichment steps (“no C-Select”) showed no enrichment. These results demonstrate that a library enriched for specific target sequences can be constructed using the methods of the disclosure.
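The conversion efficiency reported for the sequenced library can be illustrated with a small calculation. The sketch below is an assumption about how such a figure might be computed, not the pipeline used in the example: it treats non-CpG cytosines in the reference as unmethylated and counts how many read as T in an already-aligned read. The function name `conversion_efficiency` and the non-CpG convention are hypothetical choices made for illustration.

```python
def conversion_efficiency(reference: str, read: str) -> float:
    """Fraction of non-CpG reference cytosines that read as T in an aligned read.

    Assumes `reference` and `read` are the same length and already aligned
    without gaps (a simplification; real pipelines work from alignments).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    reference, read = reference.upper(), read.upper()
    converted = unconverted = 0
    for i, base in enumerate(reference):
        if base != "C":
            continue
        # Skip CpG cytosines: they may be methylated and legitimately unconverted.
        if i + 1 < len(reference) and reference[i + 1] == "G":
            continue
        if read[i] == "T":
            converted += 1
        elif read[i] == "C":
            unconverted += 1
    total = converted + unconverted
    if total == 0:
        raise ValueError("no informative non-CpG cytosines in reference")
    return converted / total
```

In practice the counts would be summed over all reads before dividing, so that a single short read does not dominate the estimate.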

Abstract

The invention provides methods for enriching nucleic acid target sequences from a sample, for example, from a biological sample or from a nucleic acid library.

Description

METHODS FOR ENRICHING NUCLEIC ACID TARGET SEQUENCES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/409,589, filed on September 23, 2022 and U.S. Provisional Patent Application No. 63/497,175, filed on April 19, 2023, the entire contents of each of which are incorporated herein by reference.
FIELD OF THE INVENTION
[0002] The invention relates generally to methods for enriching nucleic acid target sequences from a sample, for example, from a biological sample or from a nucleic acid library.
BACKGROUND
[0003] Detection of target sequences in a nucleic acid can be a challenge when the target sequence is present at a low frequency in the nucleic acid sample. Amplification and/or sequencing of target sequences can fail if such sequences occur at a low frequency. For example, circulating tumor DNA (ctDNA) is present at a very low frequency in most early-stage and many advanced stage cancer patients (Bettegowda et al. (2014) Sci Transl Med 6(224): p. 224ra24). Accordingly, a major challenge in the identification of ctDNA is how to identify a trace amount of ctDNA within a much larger pool of total cell free DNA (cfDNA). Several recent studies have adopted either reduced-representation bisulfite sequencing (RRBS; Guo et al. (2017) Nat Genet 49(4): p. 635-642), whole-genome bisulfite sequencing (WGBS; Li et al. (2018) Nucleic Acids Res. 46(15):e89) or methylated DNA immunoprecipitation sequencing (MeDIP-seq; Shen et al. (2018) Nature 563(7732):579-583) approaches to enrich methylated DNA sequences from a cell-free DNA sample. However, all of these techniques suffer from poor coverage in regions of interest in exchange for the availability of genome-wide information.
[0004] Accordingly, there is a need in the art for improved techniques for enriching target sequences of interest in a nucleic acid sample.
SUMMARY OF THE INVENTION
[0005] The disclosure relates to methods of enriching target sequences in a nucleic acid sample. The methods include, for example, cutting a nucleic acid molecule that includes a target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target, thereby enriching the target sequence. The methods can be used to enrich target sequences prior to assembling a nucleic acid library or can be used to enrich target sequences in an existing library.
[0006] In one aspect, the disclosure relates to a nucleic acid enrichment method. The method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
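The three steps of the method (cut, fill in, capture) can be pictured with a toy simulation. The sketch below is only a reading aid under loud simplifying assumptions: molecules are plain strings, "cutting" is modeled as finding a target substring, only cut molecules are assumed to carry a fillable overhang, and capture is perfectly specific. The names `Molecule`, `cut`, `fill_in`, `capture` and `enrich` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Molecule:
    sequence: str
    has_overhang: bool = False  # set by the cutting step
    labeled: bool = False       # set by the fill-in step


def cut(molecules, target):
    """Mark molecules containing the target as cut (overhang generated)."""
    return [replace(m, has_overhang=True) if target in m.sequence else m
            for m in molecules]


def fill_in(molecules):
    """Fill overhangs with labeled nucleotides; molecules without overhangs stay unlabeled."""
    return [replace(m, labeled=True) if m.has_overhang else m
            for m in molecules]


def capture(molecules):
    """Keep only labeled molecules, as a capture domain binding the label would."""
    return [m for m in molecules if m.labeled]


def enrich(sequences, target):
    """Run cut -> fill-in -> capture and return the retained sequences."""
    pool = [Molecule(s) for s in sequences]
    return [m.sequence for m in capture(fill_in(cut(pool, target)))]
```

For example, `enrich(["AAATTTGACCGG", "CCCCGGGG", "TTGACCAA"], "GACC")` retains the first and third sequences and discards the second, mirroring how unlabeled off-target molecules would be washed away.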
[0007] In certain embodiments, the cutting step is performed by a nuclease, for example, a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
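As an illustration of how a spacer sequence might be matched to candidate target sites for a Cas12a-type nuclease, the sketch below scans one DNA strand for a 5’ TTTN PAM followed by a protospacer and scores its identity to the spacer. It is a minimal sketch under stated assumptions: only the strand supplied is scanned (the reverse complement would need a second pass), the PAM is fixed at TTTN, and identity is a simple ungapped match fraction. The function name `find_cas12a_sites` and the `min_identity` parameter are hypothetical.

```python
def find_cas12a_sites(dna: str, spacer_rna: str, min_identity: float = 1.0):
    """Return (pam_start, protospacer, identity) for TTTN-adjacent sites.

    The spacer is given as RNA (U is treated as T) and is compared against
    the DNA immediately 3' of each TTTN PAM on the strand supplied.
    """
    dna = dna.upper()
    spacer = spacer_rna.upper().replace("U", "T")
    n = len(spacer)
    if n == 0:
        raise ValueError("spacer must not be empty")
    hits = []
    for i in range(len(dna) - (4 + n) + 1):
        if dna[i:i + 3] != "TTT":  # PAM is TTT followed by any base (N)
            continue
        protospacer = dna[i + 4:i + 4 + n]
        matches = sum(a == b for a, b in zip(spacer, protospacer))
        identity = matches / n
        if identity >= min_identity:
            hits.append((i, protospacer, identity))
    return hits
```

Lowering `min_identity` lets partially matching sites through, which is one simple way to look for potential off-target sites with the same scan.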
[0008] In certain embodiments, the cutting step is performed at room temperature.
[0009] In certain embodiments, the overhang is filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment.
[0010] In certain embodiments, the label comprises biotin or digoxigenin. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to the bead.
[0011] In certain embodiments, the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library.
[0012] In certain embodiments, the nucleic acid molecule was obtained from a nucleic acid sample from a subject. In certain embodiments, the nucleic acid sample is a plasma sample. In certain embodiments, the plasma sample is used directly in the nucleic acid enrichment method (for example, directly in the cutting step) without prior enrichment or purification of the nucleic acid.
[0013] In certain embodiments, the nucleic acid sample comprises cell free DNA (cfDNA). In certain embodiments, cytosines in the cfDNA have been converted to uracils. In certain embodiments, the cfDNA has been treated with bisulfite. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
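The effect of cytosine conversion on the sequence that is eventually read can be sketched in silico. The example below assumes standard bisulfite behavior, in which unmethylated cytosines are converted to uracil and read as thymine after amplification while 5-methylcytosines remain cytosine. It models a single strand only, and the function names and the `methylated_positions` argument are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Return the sequence as read after conversion and amplification.

    Unmethylated C -> U -> read as T; C at a methylated position stays C.
    Positions are 0-based indices into `seq`.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference: str, converted: str):
    """Positions where a reference C still reads as C, i.e. was protected."""
    return [i for i, (r, c) in enumerate(zip(reference.upper(), converted.upper()))
            if r == "C" and c == "C"]
```

For example, converting `"ACGTCCGA"` with position 1 methylated gives `"ACGTTTGA"`, and comparing that result back to the reference recovers position 1 as the methylated cytosine.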
[0014] In certain embodiments, the method further comprises preparing a library before or after enriching the molecule that includes the target.
[0015] In certain embodiments, the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
[0016] In certain embodiments, the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain.
[0017] In certain embodiments, the method further comprises sequencing the enriched molecule.
[0018] In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the separating step is performed using a chemical agent. In certain embodiments, the separating step is performed using mechanical disruption. In certain embodiments, the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
[0019] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0020] In another aspect, the disclosure relates to a method of capturing a nucleic acid molecule having a target sequence. The method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and capturing the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
[0021] In certain embodiments, the cutting step is performed by a nuclease. In certain embodiments, the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
[0022] In certain embodiments, the cutting step is performed at room temperature.
[0023] In certain embodiments, the overhang is filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment.
[0024] In certain embodiments, the label comprises biotin or digoxigenin. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to the bead.
[0025] In certain embodiments, the nucleic acid molecule is present in a nucleic acid sequencing library, and the method captures target sequences of interest in the library.
[0026] In certain embodiments, the nucleic acid molecule was obtained from a nucleic acid sample from a subject. In certain embodiments, the nucleic acid sample is a plasma sample and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
[0027] In certain embodiments, the nucleic acid sample comprises cell free DNA (cfDNA). In certain embodiments, cytosines in the cfDNA have been converted to uracils. In certain embodiments, the cfDNA has been treated with bisulfite. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
[0028] In certain embodiments, the method further comprises preparing a library before or after capturing the molecule that includes the target.
[0029] In certain embodiments, the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
[0030] In certain embodiments, the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises sequencing the captured molecule. In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
[0031] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0032] In another aspect, the disclosure relates to a nucleic acid enrichment method. The method includes cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by separating labeled molecules from unlabeled molecules.
[0033] In certain embodiments, the cutting step is performed by a nuclease. In certain embodiments, the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 or Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
[0034] In certain embodiments, the cutting step is performed at room temperature.
[0035] In certain embodiments, the overhang is filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment.
[0036] In certain embodiments, the label comprises biotin, digoxigenin, or a fluorophore. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to the bead. In certain embodiments, the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences of interest in the library.
[0037] In certain embodiments, the nucleic acid molecule was obtained from a nucleic acid sample from a subject. In certain embodiments, the nucleic acid sample is a plasma sample. In certain embodiments, the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
[0038] In certain embodiments, the nucleic acid sample comprises cell free DNA (cfDNA). In certain embodiments, cytosines in the cfDNA have been converted to uracils. In certain embodiments, the cfDNA has been treated with bisulfite. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
[0039] In certain embodiments, the method further comprises preparing a library before or after enriching the molecule that includes the target.
[0040] In certain embodiments, the method includes a wash step.
[0041] In certain embodiments, the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises sequencing the enriched molecule. In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
[0042] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0043] In another aspect, the disclosure relates to a method of producing a nucleic acid library enriched for regions of interest. The method includes cutting a plurality of nucleic acid molecules comprising regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; and enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains.
[0044] In certain embodiments, the cutting step is performed by a nuclease. In certain embodiments, the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecules that include the regions of interest.
[0045] In certain embodiments, the cutting step is performed at room temperature.
[0046] In certain embodiments, the overhangs are filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment.
[0047] In certain embodiments, the label comprises biotin, digoxigenin, or a fluorophore. In certain embodiments, the capture domains comprise or are connected to solid supports. In certain embodiments, the solid supports are beads, wells, tubes, or slides. In certain embodiments, the capture domains comprise streptavidin connected to beads. In certain embodiments, the method further comprises amplifying the nucleic acid molecules. In certain embodiments, the amplifying is performed with primers that comprise adapters to facilitate sequencing of the nucleic acid molecules.
[0048] In certain embodiments, the nucleic acid molecule was obtained from a nucleic acid sample from a subject. In certain embodiments, the nucleic acid sample is a plasma sample. In certain embodiments, the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
[0049] In certain embodiments, the nucleic acid sample comprises cell free DNA (cfDNA). In certain embodiments, cytosines in the cfDNA have been converted to uracils. In certain embodiments, the cfDNA has been treated with bisulfite. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
[0050] In certain embodiments, the method further comprises a wash step to remove nucleic acid molecules that do not include the regions of interest.
[0051] In certain embodiments, the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain. In certain embodiments, the method further comprises separating the nucleic acid molecules from the capture domains. In certain embodiments, the separating step comprises heat elution off of the capture domain. In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
[0052] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0053] In another aspect, the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest. The method includes obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; optionally converting methylated cytosines to uracils; adding nucleic acid adapters to the plurality of nucleic acids to form a nucleic acid library; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains; and amplifying the molecules that include the regions of interest to form the nucleic acid library enriched for regions of interest.
[0054] In another aspect, the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest. The method includes obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains; removing the molecules that include the regions of interest from the capture domains; optionally converting methylated cytosines to uracils; and adding nucleic acid adapters to the plurality of nucleic acids to form the nucleic acid library enriched for regions of interest.
[0055] In certain embodiments, the cutting step is performed by a nuclease. In certain embodiments, the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
[0056] In certain embodiments, the cutting step is performed at room temperature.
[0057] In certain embodiments, the overhang is filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment. In certain embodiments, the label comprises biotin or digoxigenin. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
[0058] In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to the bead. In another aspect, the disclosure relates to a nucleic acid library, produced by the methods described herein.
[0059] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0060] In another aspect, the disclosure relates to a kit comprising a nuclease that cuts a nucleic acid molecule including a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; labeled dNTPs; DNA polymerase; and a capture moiety comprising a capture domain.
[0061] In certain embodiments, the nuclease is a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease.
[0062] In another aspect, the disclosure relates to a nucleic acid enrichment method comprising the steps of (a) designing a first set of guide RNAs to bind a first set of target sequences for cleavage with a first nuclease, (b) designing a second set of guide RNAs to bind a second set of target sequences for cleavage with a second nuclease, (c) adding the first and second sets of guide sequences and the first and second nucleases to a nucleic acid comprising a plurality of target sequences, (d) generating single stranded overhangs at the cleavage sites in the first and second sets of target sequences, (e) filling in each overhang with at least one labeled nucleotide; and (f) enriching the target sequences by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
[0063] In certain embodiments, the first nuclease or the second nuclease is a CRISPR-Cas nuclease. In certain embodiments, the first nuclease or the second nuclease is a type II or a type V CRISPR-Cas nuclease. In certain embodiments, the first nuclease or the second nuclease is a Cas9, Cas12, or CasX nuclease. In certain embodiments, the first nuclease or the second nuclease is a Cas12a/Cpf1 nuclease.
[0064] In certain embodiments, the first nuclease or the second nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
[0065] In certain embodiments, the cutting step is performed at room temperature.
[0066] In certain embodiments, the overhang is filled in using a DNA polymerase. In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment.
[0067] In certain embodiments, the label comprises biotin or digoxigenin. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to the bead. In certain embodiments, the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library. In certain embodiments, the nucleic acid molecule was obtained from a nucleic acid sample from a subject. In certain embodiments, the nucleic acid sample is a plasma sample. In certain embodiments, the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
[0068] In certain embodiments, the nucleic acid sample comprises cell free DNA (cfDNA). In certain embodiments, cytosines in the cfDNA have been converted to uracils. In certain embodiments, the cfDNA has been treated with bisulfite.
[0069] In certain embodiments, the method further comprises preparing a library before or after enriching the molecule that includes the target. In certain embodiments, the method further comprises the step of converting methylated cytosines to uracils.
[0070] In certain embodiments, the method further comprises a wash step to remove nucleic acid molecules that do not include the target.
[0071] In certain embodiments, the method further comprises amplifying the nucleic acid molecule. In certain embodiments, the amplification occurs while the nucleic acid is in contact with the capture domain.
[0072] In certain embodiments, the method further comprises sequencing the enriched molecule.
[0073] In certain embodiments, the method further comprises separating the nucleic acid molecule from the capture domain. In certain embodiments, the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
[0074] In certain embodiments, the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
[0075] In certain embodiments, the method further comprises an additional enrichment step. In certain embodiments, the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences. In certain embodiments, the additional enrichment step comprises hybrid capture. In certain embodiments, the additional enrichment step comprises using a nucleic acid binding protein.
[0076] These and other aspects and features of the invention are described in the following detailed description and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0077] The foregoing and other objects, features and advantages of the invention will become apparent from the following description of preferred embodiments, as illustrated in the accompanying drawings. Like referenced elements identify common features in the corresponding drawings. The drawings are not necessarily to scale, with emphasis instead being placed on illustrating the principles of the present invention, in which:
[0078] FIGURE 1 is a schematic flowchart showing a method according to the disclosure for enriching target sequences (i.e., regions of interest (ROI)) from a nucleic acid library.
[0079] FIGURE 2 is a schematic flowchart showing a method according to the disclosure for enriching target sequences (i.e., regions of interest (ROI)) from a sample and constructing a nucleic acid library using the enriched target sequences.
[0080] FIGURE 3 is a schematic of a bisulfite conversion reaction.
[0081] FIGURE 4 provides electrophoresis results for an experiment testing whether biotinylated dNTPs could be incorporated into a nucleic acid comprising a target sequence and enriched using streptavidin beads. Bands representing a biotinylated target fragment bound to beads were seen in both the 1x and 5x polymerase (“Enzyme”) conditions and with anywhere from 10% to 100% biotinylated dNTPs. Three negative controls, “cut control” lacking polymerase enzyme and biotinylated dNTPs, “no bind control” lacking Cas12, crRNA, and polymerase enzyme, and “bind control” which contained a biotinylated amplicon, did not contain biotinylated target fragment. Accordingly, streptavidin beads were capable of binding to and isolating target fragments that had incorporated biotinylated dNTPs.
[0082] FIGURE 5 provides a flow chart showing an exemplary process overview for Cas12a positive enrichment of target sequences.
[0083] FIGURE 6 provides a schematic of the steps of the exemplary library creation method of Example 5.
[0084] FIGURE 7 shows the sequencing results of a library constructed in Example 5 using the methods of the disclosure.
[0085] FIGURE 8 shows the sequencing results of a library constructed in Example 6 using the methods of the disclosure. As shown, target CpG-4 within a 5-plex target was successfully enriched using the methods of the disclosure.
DETAILED DESCRIPTION
[0086] The disclosure relates to methods of enriching target sequences in a nucleic acid sample. The methods include, for example, cutting a nucleic acid molecule that includes a target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target, thereby enriching the target sequence. The methods can be used, for example, to enrich target sequences prior to assembling a nucleic acid library or can be used to enrich target sequences in an existing library.
[0087] An exemplary method of enriching target sequences is shown in FIG. 1. A cell-free DNA (cfDNA) sample comprising methylated nucleotides that have been converted using bisulfite treatment is used to construct a nucleic acid library. sgRNAs complementary to target sequences are constructed, and the library is exposed to Cas12 and the sgRNAs. The sgRNAs direct Cas12 to the target sequence, where it cleaves the DNA, leaving an overhang. Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang. Streptavidin beads bind to the biotinylated nucleotides of the target sequences, and any uncut, off-target sequences are washed away. A PCR amplification is then performed to amplify the enriched target sequences that have been bound to the bead.
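The capture logic of FIG. 1 can be illustrated with a minimal in silico sketch. It is a toy model, not the disclosed protocol: the `Molecule` class, the `cut_and_label` and `capture` helpers, the sequences, and the spacers are all hypothetical, and guide recognition is reduced to exact substring matching with no modeling of PAM sites, overhang length, or polymerase fill-in chemistry.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical stand-in for one library molecule."""
    name: str
    sequence: str
    biotinylated: bool = False


def cut_and_label(molecule, spacers):
    """Mark a molecule as labeled if any guide spacer matches its sequence.

    Assumption: a spacer match stands in for nuclease cutting followed by
    fill-in of the overhang with biotinylated dNTPs.
    """
    if any(spacer in molecule.sequence for spacer in spacers):
        molecule.biotinylated = True
    return molecule


def capture(molecules):
    """Keep labeled molecules (bead-bound); unlabeled ones are washed away."""
    return [m for m in molecules if m.biotinylated]


# Illustrative library: two molecules carry a target site, one does not.
library = [
    Molecule("on_target_1", "AACCGGTTACGTACGTTTGACC"),
    Molecule("off_target_1", "GGGGCCCCAAAATTTTGGGGCC"),
    Molecule("on_target_2", "TTGACCAGTCAGTCAGGTACCA"),
]
spacers = ["ACGTACGT", "AGTCAGTC"]

labeled = [cut_and_label(m, spacers) for m in library]
enriched = capture(labeled)
enriched_names = [m.name for m in enriched]
print(enriched_names)
```

In this sketch only the molecules that were cut and labeled survive the capture step, which mirrors the positive-selection principle of the method: uncut molecules carry no label and are lost in the wash.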
[0088] Another exemplary method of enriching target sequences is shown in FIG. 2. In this method, a cell-free DNA (cfDNA) sample is exposed to Cas12 and sgRNAs complementary to target sequences in a target region. The sgRNAs direct Cas12 to the target sequence, where it cleaves the DNA, leaving an overhang. Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang. Streptavidin beads bind to the biotinylated nucleotides present in the target region, and any uncut, off-target sequences are washed away. The enriched target sequences are eluted off of the beads using heat treatment. The eluted target sequences are then treated with bisulfite to preserve methylation information. The bisulfite converted target sequences are used to construct a library that is enriched for target sequences.
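How bisulfite treatment preserves methylation information can be shown with a short sketch. It relies on the standard behavior of bisulfite chemistry, in which unmethylated cytosines are deaminated to uracil (read as thymine after amplification) while 5-methylcytosines remain cytosine. The function name `bisulfite_convert`, the example sequence, and the methylated position are hypothetical, and the model covers a single strand with complete conversion.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite conversion and PCR.

    Assumptions: conversion is complete, only one strand is modeled, and
    methylated_positions holds 0-based indices of methylated cytosines.
    """
    converted = []
    for i, base in enumerate(sequence):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C -> U, read as T
        else:
            converted.append(base)  # methylated C and other bases unchanged
    return "".join(converted)


original = "ACGTCCGA"
converted = bisulfite_convert(original, methylated_positions={1})
print(converted)  # ACGTTTGA
```

Comparing `converted` with `original` recovers the methylation state: a cytosine that is still read as C was protected by methylation, whereas a C read as T was unmethylated.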
[0089] Unless otherwise defined herein, scientific and technical terms used in this application shall have the meanings that are commonly understood by those of ordinary skill in the art.
[0090] The practice of the present disclosure will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) Cold Spring Harbor Press; Oligonucleotide Synthesis (M.J. Gait, ed., 1984); Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook (J.E. Cellis, ed., 1998) Academic Press; Animal Cell Culture (R.I. Freshney, ed., 1987); Introduction to Cell and Tissue Culture (J. P. Mather and P.E. Roberts, 1998) Plenum Press; Cell and Tissue Culture: Laboratory Procedures (A. Doyle, J.B. Griffiths, and D.G. Newell, eds., 1993-1998) J. Wiley and Sons; Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells (J.M. Miller and M.P. Calos, eds., 1987); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (2001); Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, NY (2002); Harlow and Lane Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1998); Coligan et al., Short Protocols in Protein Science, John Wiley & Sons, NY (2003); Short Protocols in Molecular Biology (Wiley and Sons, 1999).
[0091] Enzymatic reactions and purification techniques are performed according to manufacturer’s specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, biochemistry, immunology, molecular biology, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, and chemical analyses.
[0092] Throughout this specification and embodiments, the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
[0093] It is understood that wherever embodiments are described herein with the language “comprising,” otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided.
[0094] Any example(s) following the term “e.g.” or “for example” is not meant to be exhaustive or limiting.
[0095] Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
[0096] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, all ranges disclosed herein are to be understood to be inclusive of the numbers defining the range and to encompass any and all subranges subsumed therein. For example, a stated range of “1 to 10” should be considered to include any and all subranges between (and inclusive of) the minimum value of 1 and the maximum value of 10; that is, all subranges beginning with a minimum value of 1 or more, e.g., 1 to 6.1, and ending with a maximum value of 10 or less, e.g., 5.5 to 10.
[0097] Where aspects or embodiments of the disclosure are described in terms of a Markush group or other grouping of alternatives, the present disclosure encompasses not only the entire group listed as a whole, but each member of the group individually and all possible subgroups of the main group, but also the main group absent one or more of the group members. The present disclosure also envisages the explicit exclusion of one or more of any of the group members in an embodiment of the disclosure.
[0098] Exemplary methods and materials are described herein, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure. The materials, methods, and examples are illustrative only and not intended to be limiting.
I. Definitions
[0099] The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
[00100] As used herein, the term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which can depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. “About” can mean a range of ±20%, ±10%, ±5%, or ±1% of a given value. The term “about” or “approximately” can mean within an order of magnitude, within 5-fold, or within 2-fold, of a value. Where a particular value is described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value can be assumed. The term “about” can have the meaning as commonly understood by one of ordinary skill in the art. The term “about” can refer to ±10%. The term “about” can refer to ±5%.
[00101] It should be understood that the expression of “at least one of” includes individually each of the recited objects after the expression and the various combinations of two or more of the recited objects unless otherwise understood from the context and use. The expression “and/or” in connection with three or more recited objects should be understood to have the same meaning unless otherwise understood from the context.
[00102] As used herein, the term “biological sample,” or “sample” refers to any sample taken from a subject, which can reflect a biological state associated with the subject, and that includes cell free DNA. A biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy. Examples of biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. A biological sample can include any tissue or material derived from a living or dead subject. A biological sample can be a cell-free sample. A biological sample can comprise a nucleic acid (e.g., DNA or RNA) or a fragment thereof. The term “nucleic acid” can refer to deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or any hybrid or fragment thereof. The nucleic acid in the sample can be a cell-free nucleic acid. A sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample). A biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc. A biological sample can be a stool sample.
In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free). A biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis.
[00103] As used herein, the terms “nucleic acid” and “nucleic acid molecule” are used interchangeably. The terms refer to nucleic acids of any composition form, such as deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), all of which can be in single- or double-stranded form. Unless otherwise limited, a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides. A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). In certain embodiments nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures. Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. A nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
[00104] As used herein, the terms “template nucleic acid” and “template nucleic acid molecule(s)” are used interchangeably. The terms refer to nucleic acid that has been obtained from a sample and processed to form an immortalized library. The template nucleic acid can be nucleic acid obtained directly from the sample, or nucleic acid that is derived from that obtained directly from the sample. Examples of nucleic acid derived from a sample include DNA that has been reverse-transcribed from RNA obtained directly from a sample, or DNA that has been amplified from DNA obtained directly from a sample, for example, by PCR.
[00105] As used herein, the term “cell-free nucleic acids” refers to nucleic acid molecules that can be found outside cells, in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject. Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, or viruses. Examples of the cell-free nucleic acids include but are not limited to cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA. In certain embodiments herein, instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA) can be used. Concentrating low-abundance cfDNA can be accomplished, for example using a Qubit™ Fluorometer from Thermo Fisher Scientific (Waltham, MA).
[00106] As used herein, the term “methylation” refers to a modification of a nucleic acid where a hydrogen atom on the pyrimidine ring of a cytosine base is converted to a methyl group, forming 5-methylcytosine. Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine. Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status. As is well known in the art, DNA methylation anomalies (compared to healthy controls) can cause different effects, which may contribute to cancer.
[00107] As used herein, the term “methylation index” for each genomic site (e.g., a CpG site, a region of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5'→3' direction) can refer to the proportion of sequence reads showing methylation at the site over the total number of reads covering that site. The “methylation density” of a region can be the number of reads at sites within a region showing methylation divided by the total number of reads covering the sites in the region. The sites can have specific characteristics (e.g., the sites can be CpG sites). The “CpG methylation density” of a region can be the number of reads showing CpG methylation divided by the total number of reads covering CpG sites in the region (e.g., a particular CpG site, CpG sites within a CpG island, or a larger region). For example, the methylation density for each 100-kb bin in the human genome can be determined from the total number of unconverted cytosines (which can correspond to methylated cytosine) at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. In some embodiments, this analysis is performed for other bin sizes, e.g., 50-kb or 1-Mb, etc. In some embodiments, a region is an entire genome or a chromosome or part of a chromosome (e.g., a chromosomal arm). A methylation index of a CpG site can be the same as the methylation density for a region when the region includes only that CpG site. The “proportion of methylated cytosines” can refer to the number of cytosine sites, “C's,” that are shown to be methylated (for example, unconverted after bisulfite conversion) over the total number of analyzed cytosine residues, e.g., including cytosines outside of the CpG context, in the region. The methylation index, methylation density and proportion of methylated cytosines are examples of “methylation levels.”
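The ratios defined in this paragraph can be illustrated with a minimal sketch. The function names and the input layout (a list of per-read methylation calls for each site) are assumptions made for illustration only and are not part of the disclosure.

```python
def methylation_index(calls):
    """Proportion of reads showing methylation at one site.

    calls: list of booleans, one per read covering the site
    (True = read shows methylation). Hypothetical input layout.
    """
    if not calls:
        return None  # no read covers the site
    return sum(calls) / len(calls)


def methylation_density(site_calls):
    """Methylated reads over all reads, pooled across the sites of a region.

    site_calls: dict mapping a site position to its list of per-read calls.
    """
    methylated = sum(sum(calls) for calls in site_calls.values())
    total = sum(len(calls) for calls in site_calls.values())
    return methylated / total if total else None


# Illustrative region with three CpG sites (made-up calls).
region = {
    100: [True, True, False, True],
    105: [False, False],
    130: [True, False, False, False],
}
index_at_100 = methylation_index(region[100])
density = methylation_density(region)
```

For a region consisting of a single site, `methylation_density` returns the same value as `methylation_index` for that site.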
[00108] Certain portions of a genome comprise regions with a high frequency of CpG sites. A CpG site is a portion of a genome that has cytosine and guanine separated by only one phosphate group and is often denoted as “5'-C-phosphate-G-3'”, or “CpG” for short. Regions with a high frequency of CpG sites are commonly referred to as “CG islands” or “CGIs”. It has been found that certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells. Such CGIs and features of the genome are referred to herein as “cancer informative CGIs”, which are defined and described in more detail below. An “informative CpG” can be specified by reference to a specific CpG site, or to a collection of one or more CpG sites by reference to a CG island that contains the collection. These cancer informative CGIs tend to have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. DNA fragments from other CGIs may not express such differences.
[00109] As used herein, the term “methylation profile” (also called methylation status) can include information related to DNA methylation for a region. Information related to DNA methylation can include a methylation index of a CpG site, a methylation density of CpG sites in a region, a distribution of CpG sites over a contiguous region, a pattern or level of methylation for each individual CpG site within a region that contains more than one CpG site, and non-CpG methylation. A methylation profile of a substantial part of the genome can be considered equivalent to the methylome. “DNA methylation” in mammalian genomes can refer to the addition of a methyl group to position 5 of the heterocyclic ring of cytosine (e.g., to produce 5-methylcytosine) among CpG dinucleotides. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine.
[00110] The terms “epitype” and “nucleic acid epitype” refer to a region of nucleic acid (i.e., DNA or RNA) containing an epigenetic variation. For example, the epigenetic variation could be methylation or non-methylation of one or more nucleotides in a region of nucleic acid. For instance, in some embodiments the nucleotide that could be methylated or non-methylated may be a cytidine, e.g., at a CpG site (e.g., the nucleotide could be 5-methylcytidine or cytidine). Exemplary CpG sites may be found in, for example, CpG islands (CGIs) shown in TABLES 1-4. CpG islands (CGIs) may be regions having a length greater than 200 bp, a GC content greater than 50% and a ratio of observed to expected CpG greater than 0.6. CpG islands are often found in promoter regions, where methylation is associated with transcriptional repression. Generally, a nucleic acid epitype containing one or more CpG sites may have a methylation pattern, such as any of fully non-methylated (e.g., none of the CpG sites in the epitype are methylated), partially methylated (e.g., at least one but not all of the CpG sites in the epitype are methylated), or fully methylated (e.g., all of the CpG sites in the epitype are methylated). In other embodiments, the nucleotide that could be methylated or non-methylated may be adenosine (e.g., the nucleotide could be N6-methyladenosine or adenosine).
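The CpG island criteria and the three methylation patterns described above can be sketched as follows. The observed-to-expected ratio is computed here with the commonly used formula (number of CpG × length) / (number of C × number of G); the disclosure does not state a formula, so that choice, along with all function names, is an assumption.

```python
def cpg_island_stats(seq):
    """Return (length, GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count over the count expected from C and G frequencies.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_content, obs_exp


def meets_cgi_criteria(seq):
    """Apply the thresholds from the text: >200 bp, GC >50%, obs/exp CpG >0.6."""
    n, gc_content, obs_exp = cpg_island_stats(seq)
    return n > 200 and gc_content > 0.5 and obs_exp > 0.6


def methylation_pattern(cpg_calls):
    """Classify an epitype from per-CpG calls (True = methylated)."""
    if not cpg_calls:
        raise ValueError("an epitype needs at least one CpG site")
    if all(cpg_calls):
        return "fully methylated"
    if not any(cpg_calls):
        return "fully non-methylated"
    return "partially methylated"
```

This sketch only tests a given sequence against the thresholds; locating islands in a genome would additionally require a windowing strategy, which is not addressed here.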
[00111] As used herein, the term “amplifying” means performing an amplification reaction. In one aspect, an amplification reaction is “template-driven” in that reactants, either nucleotides or oligonucleotides, base pair with complements in a template polynucleotide, and this base pairing is required for the creation of reaction products. In one aspect, template-driven reactions are primer extensions with a nucleic acid polymerase, or oligonucleotide ligations with a nucleic acid ligase. Such reactions include, but are not limited to, polymerase chain reactions (PCRs), linear polymerase reactions, nucleic acid sequence-based amplifications (NASBAs), rolling circle amplifications, and the like, disclosed in the following references, each of which is incorporated herein by reference in its entirety: Mullis et al., U.S. Pat. Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et al., U.S. Pat. No. 5,210,015 (real-time PCR with “taqman” probes); Wittwer et al., U.S. Pat. No. 6,174,670; Kacian et al., U.S. Pat. No. 5,399,491 (“NASBA”); Lizardi, U.S. Pat. No. 5,854,033; Aono et al., Japanese patent publ. JP 4-262799 (rolling circle amplification); and the like. In one aspect, the amplification reaction is PCR. An amplification reaction may be a “real-time” amplification if a detection chemistry is available that permits a reaction product to be measured as the amplification reaction progresses, e.g., “real-time PCR”, or “real-time NASBA” as described in Leone et al., Nucleic Acids Research, 26: 2150-2155 (1998), and like references.
[00112] A “reaction mixture” means a solution containing all the necessary reactants for performing a reaction, which may include, but is not limited to, buffering agents to maintain pH at a selected level during a reaction, salts, co-factors, scavengers, and the like.
[00113] The terms “fragment” or “segment”, as used interchangeably herein, refer to a portion of a larger polynucleotide molecule. A polynucleotide, for example, can be broken up, or fragmented into, a plurality of segments. Various methods of fragmenting nucleic acid are well known in the art. These methods may be, for example, either chemical or physical or enzymatic in nature. Enzymatic fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave a polynucleotide at known or unknown locations. Physical fragmentation methods may involve subjecting a polynucleotide to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing a DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron range. Other physical methods include sonication and nebulization. Combinations of physical and chemical fragmentation methods may likewise be employed, such as fragmentation by heat and ion-mediated hydrolysis. See, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”), which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range.
[00114] The terms “polymerase chain reaction” or “PCR”, as used interchangeably herein, mean a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors that are well-known to those of ordinary skill in the art, e.g., exemplified by the following references: McPherson et al., editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature >90°C, primers annealed at a temperature in the range 50-75°C, and primers extended at a temperature in the range 72-78°C. The term “PCR” encompasses derivative forms of the reaction, including, but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like. The particular format of PCR being employed is discernible by one skilled in the art from the context of an application. Reaction volumes can range from a few hundred nanoliters, e.g., 200 nL, to a few hundred µL, e.g., 200 µL.
“Reverse transcription PCR,” or “RT-PCR,” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, an example of which is described in Tecott et al., U.S. Pat. No. 5,168,038, the disclosure of which is incorporated herein by reference in its entirety. “Real-time PCR” means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds. There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g., Gelfand et al., U.S. Pat. No. 5,210,015 (“taqman”); Wittwer et al., U.S. Pat. Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al., U.S. Pat. No. 5,925,517 (molecular beacons); the disclosures of which are hereby incorporated by reference herein in their entireties. Detection chemistries for real-time PCR are reviewed in Mackay et al., Nucleic Acids Research, 30: 1292-1305 (2002), which is also incorporated herein by reference. “Nested PCR” means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon. “Asymmetric PCR” means a PCR wherein one of the two primers employed is in great excess concentration so that the reaction is primarily a linear amplification in which one of the two strands of a target nucleic acid is preferentially copied. The excess concentration of asymmetric PCR primers may be expressed as a concentration ratio. Typical ratios are in the range of from 10 to 100. “Multiplexed PCR” means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g., Bernard et al., Anal. Biochem., 273: 221-228 (1999) (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified.
Typically, the number of target sequences in a multiplex PCR is in the range of from 2 to 50, or from 2 to 40, or from 2 to 30. “Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Quantitative PCR includes both absolute quantitation and relative quantitation of such target sequences.
Quantitative measurements are made using one or more reference sequences or internal standards that may be assayed separately or together with a target sequence. The reference sequence may be endogenous or exogenous to a sample or specimen, and in the latter case, may comprise one or more competitor templates. Typical endogenous reference sequences include segments of transcripts of the following genes: β-actin, GAPDH, β2-microglobulin, ribosomal RNA, and the like. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references, which are incorporated by reference herein in their entireties: Freeman et al., Biotechniques, 26: 112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17: 9437-9447 (1989); Zimmerman et al., Biotechniques, 21: 268-279 (1996); Diviacco et al., Gene, 122: 3013-3020 (1992); and Becker-Andre et al., Nucleic Acids Research, 17: 9437-9446 (1989).
[00115] The term “primer” as used herein means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. Extension of a primer is usually carried out with a nucleic acid polymerase, such as a DNA or RNA polymerase. The sequence of nucleotides added in the extension process is determined by the sequence of the template polynucleotide. Usually, primers are extended by a DNA polymerase. Primers usually have a length in the range of from 14 to 40 nucleotides, or in the range of from 18 to 36 nucleotides. Primers are employed in a variety of nucleic acid amplification reactions, for example, linear amplification reactions using a single primer, or polymerase chain reactions employing two or more primers. Guidance for selecting the lengths and sequences of primers for particular applications is well known to those of ordinary skill in the art, as evidenced by the following reference that is incorporated by reference herein in its entirety: Dieffenbach, editor, PCR Primer: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Press, New York, 2003).
[00116] The terms “unique identifier”, “unique sequence tag”, “sequence tag”, “tag” or “barcode”, as used interchangeably herein, refer to an oligonucleotide that is attached to a polynucleotide or template molecule and is used to identify and/or track the polynucleotide or template in a reaction or a series of reactions. A unique identifier may be attached to the 3'- or 5'-end of a polynucleotide or template, or it may be inserted into the interior of such polynucleotide or template to form a linear conjugate, sometimes referred to herein as a “tagged polynucleotide,” or “tagged template,” or the like. A unique identifier may vary widely in size and composition; the following references, which are incorporated herein by reference in their entireties, provide guidance for selecting sets of unique identifiers appropriate for particular embodiments: Brenner, U.S. Pat. No. 5,635,400; Brenner and Macevicz, U.S. Pat. No. 7,537,897; Brenner et al., Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Church et al., European patent publication 0 303 459; Shoemaker et al., Nature Genetics, 14: 450-456 (1996); Morris et al., European patent publication 0799897A1; Wallace, U.S. Pat. No. 5,981,179; and the like. Lengths and compositions of unique identifiers can vary widely, and the selection of particular lengths and/or compositions depends on several factors including, without limitation, how unique identifiers are used to generate a readout, e.g., via a hybridization reaction or via an enzymatic reaction, such as sequencing; whether they are labeled, e.g., with a fluorescent dye or the like; the number of distinguishable oligonucleotide identifiers required to unambiguously identify a set of polynucleotides, and the like; and how different the identifiers of a particular set must be in order to ensure reliable identification, e.g., freedom from cross hybridization or misidentification from sequencing errors. In one aspect, unique identifiers can each have a length within a range of from about 2 to about 36 nucleotides, or from about 4 to about 30 nucleotides, or from about 8 to about 20 nucleotides, or from about 6 to about 10 nucleotides. In one aspect, sets of unique identifiers are used, wherein each unique identifier of a set has a unique nucleotide sequence that differs from that of every other unique identifier of the same set by at least two bases; in another aspect, sets of unique identifiers are used wherein the sequence of each unique identifier of a set differs from that of every other unique identifier of the same set by at least three bases.
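The requirement that every identifier in a set differ from every other by at least two or three bases can be checked, and a small compliant set constructed, as in the following sketch. The greedy construction and all names are illustrative assumptions; the cited references describe actual design approaches.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest number of differing bases between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def greedy_tag_set(length, min_distance, limit):
    """Collect up to `limit` tags, each at least `min_distance` bases from
    every tag already chosen (simple greedy scan in alphabetical order)."""
    chosen = []
    for candidate in product("ACGT", repeat=length):
        tag = "".join(candidate)
        if all(hamming(tag, t) >= min_distance for t in chosen):
            chosen.append(tag)
            if len(chosen) == limit:
                break
    return chosen


# Eight 6-nt tags that differ pairwise by at least three bases.
tags = greedy_tag_set(length=6, min_distance=3, limit=8)
```

A minimum distance of three bases means that a single sequencing error in a tag cannot turn it into another valid tag or into a sequence equally close to two tags. This sketch ignores other practical constraints such as GC content and homopolymer runs.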
[00117] Aspects of the invention involve the use of unique identifiers. Unique identifiers in accordance with embodiments of the invention can serve many functions. For example, unique sequence tags can include molecular barcode sequences, unique molecular identifier (UMI) sequences, or index sequences. In one embodiment, unique sequence tags (e.g., barcode or index sequences) can be used to identify DNA sequences originating from a common source such as a sample type, tissue, subject, or individual. In accordance with one embodiment, barcodes or index sequences can be used for multiplex sequencing. In one embodiment, unique sequence tags (e.g., unique molecular identifiers (UMIs)) can be used to identify unique nucleic acid sequences from a mixed nucleic acid sample. For example, differing unique molecular identifiers (e.g., UMIs) can be used to differentiate ssDNA molecules, dsDNA molecules, or damaged molecules (e.g., nicked dsDNA) contained in a cfDNA sample. In another embodiment, unique molecular identifiers (e.g., UMIs) can be used to reduce amplification bias, which is the asymmetric amplification of different targets due to differences in nucleic acid composition (e.g., high GC content). The unique molecular identifiers (UMIs) can be used to discriminate between nucleic acid mutations that arise during amplification. The unique sequence tags can be present in a multi-functional nucleic acid adapter, which adapter can comprise both a unique sequence tag and a universal priming site. In some embodiments, unique sequence tags can be greater than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 nucleotides in length.
[00118] In one embodiment, ssDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique sequence tags (e.g., ssDNA-specific tags, barcodes or UMIs) using an ssDNA ligation protocol and converted to dsDNA prior to preparation of a combined cfDNA library.
[00119] In another embodiment, dsDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique molecular identifiers (e.g., UMIs) in a dsDNA ligation protocol using Y-shaped sequencing adapters and then ssDNA molecules can be tagged with unique identifiers (e.g., barcode or unique UMI) and converted to dsDNA.
[00120] In some embodiments, the methods of the invention involve differential tagging of populations of cfDNA molecules (e.g., dsDNA molecules, ssDNA molecules, and nicked dsDNA molecules) in a sample with unique sequence tags to distinguish sequence information derived from one population of cfDNA molecules (e.g., dsDNA molecules) from sequence information derived from another population of cfDNA molecules (e.g., ssDNA molecules). Analysis of all populations of cfDNA molecules (e.g., dsDNA molecules, ssDNA molecules, and nicked dsDNA molecules) may increase the sensitivity of certain protocols, for example, a cancer screening protocol. Without being bound by theory, it is believed that ssDNA molecules and/or nicked dsDNA may provide additional valuable insight for cancer detection and screening from a cfDNA sample, and/or may be more representative of tumor content in a cfDNA sample.
[00121] In one embodiment, ssDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique sequence tags (e.g., ssDNA-specific tags, barcodes or UMIs) using an ssDNA ligation protocol and converted to dsDNA prior to preparation of a combined cfDNA library.
[00122] In another embodiment, dsDNA molecules in a mixture of dsDNA and ssDNA molecules can be tagged with unique sequence tags (e.g., UMIs) in a dsDNA ligation protocol using Y-shaped sequencing adapters (also referred to herein as “Y adapters”) and then ssDNA molecules can be tagged with unique sequence tags (e.g., barcode or unique UMI) and converted to dsDNA.
[00123] In one embodiment, the incorporated unique sequence tags and ssDNA-specific tags can be used to distinguish sequencing reads as being originally derived from dsDNA or ssDNA in a cfDNA sample.
[00124] In another embodiment, the incorporated unique sequence tags (e.g., UMIs) and ssDNA-specific tags (e.g., barcodes or UMIs) can be used to obtain fragment size information and genome position associated with sequencing reads from nicked dsDNA fragments in a cfDNA sample.
[00125] In yet another embodiment, the incorporated unique sequence tags (e.g., UMIs) and ssDNA-specific tags (e.g., barcodes or UMIs) are used to reduce error introduced by amplification, library preparation, and/or sequencing.
[00126] As used herein, the term “sensitivity” refers to the ability of a diagnostic assay to correctly identify subjects with a condition of interest. As used herein, the term “specificity” refers to the ability of a diagnostic assay to correctly identify subjects without a condition of interest.
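These two definitions correspond to the standard ratios sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), where TP, FN, TN and FP are counts of true positives, false negatives, true negatives and false positives. The sketch below applies them to made-up counts; the function names and numbers are illustrative assumptions.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of subjects with the condition that the assay identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of subjects without the condition that the assay identifies."""
    return true_neg / (true_neg + false_pos)


# Illustrative cohort: 100 subjects with the condition, 1000 without.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=950, false_pos=50)
```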
[00127] As used herein, the term “subject” refers to any living or non-living organism, including but not limited to a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human animal, a plant, a bacterium, a fungus or a protist. Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, dolphin, whale and shark. In some embodiments, a subject is a male or female of any age (e.g., a man, a woman or a child).
II. Methods of Isolating and/or Enriching Target Nucleic Acid Sequences
[00128] Detecting a target sequence in a nucleic acid can be a challenge when the target sequence is present at a low frequency in the nucleic acid sample. The instant disclosure provides methods that improve detection of nucleic acids containing a target sequence, e.g., rare target sequences, by isolating and/or enriching such target sequences in a nucleic acid sample. For example, the rare target sequence may be in a nucleic acid sequence from a cfDNA sample, such as a cfDNA sample that has been treated with bisulfite or chemical conversion to convert cytosines to uracils to preserve information regarding the methylation status of a particular nucleic acid sequence (e.g., comprising a CpG site) in a subject. The target sequence may be indicative of the risk of developing or the presence of cancer in the subject from whom the sample was taken. In certain embodiments, the target sequence is present in a nucleic acid library (e.g., the sample may be a nucleic acid library), and the methods described herein enrich the target sequence in the nucleic acid library.
a. Enrichment Using Staggered Nucleic Acid Cleavage of a Target Sequence and Overhang Fill with a Label
[00129] A nucleic acid target sequence can be isolated and/or enriched by cutting a nucleic acid molecule that includes the target sequence to form a single-stranded overhang, filling in the overhang with a label, and capturing the nucleic acid molecule that includes the target by contacting at least one of the labeled nucleotides with a capture domain, thereby isolating and/or enriching the target sequence. In other embodiments, a nucleic acid molecule that includes the target molecule can be enriched by separating labeled nucleic acids from unlabeled nucleic acids without the use of a capture domain. For example, labeled (e.g., fluorescence-labeled) nucleic acids can be separated from unlabeled nucleic acids using a method that sorts fluorescent molecules away from non-fluorescent molecules and/or by using magnetic fields to sort a nucleic acid labeled with a magnetic label away from non-labeled nucleic acids.
[00130] Nucleic acids containing or suspected of containing a target sequence can be contacted with a nuclease that (1) recognizes the target sequence and (2) cuts (cleaves) the nucleic acid molecules that contain the target sequence. In certain embodiments, the nuclease cleaves the nucleic acid within the target sequence. In certain embodiments, the nuclease cleaves the nucleic acid near the target sequence. For example, the nuclease may cleave the nucleic acid within about 1 nt to about 20 nt, about 2 nt to about 20 nt, about 5 nt to about 20 nt, about 10 nt to about 20 nt, about 15 nt to about 20 nt, about 1 nt to about 15 nt, about 2 nt to about 15 nt, about 5 nt to about 15 nt, about 10 nt to about 15 nt, about 1 nt to about 10 nt, about 2 nt to about 10 nt, about 5 nt to about 10 nt, about 1 nt to about 5 nt, or about 2 nt to about 5 nt of the target sequence.
[00131] Nucleases suitable for use herein generate a staggered cut in the nucleic acid, leaving a single stranded overhang of unpaired nucleotides. The overhang may be any length, for example, between 1 and 3 nt, between 1 and 5 nt, between 1 and 10 nt, between 1 and 15 nt, between 3 and 5 nt, between 3 and 10 nt, between 3 and 15 nt, between 5 and 10 nt, between 5 and 15 nt, between 10 and 15 nt, or about 2 nt, about 3 nt, about 4 nt, about 5 nt, about 6 nt, about 7 nt, about 8 nt, about 9 nt, about 10 nt, about 11 nt, about 12 nt, about 13 nt, about 14 nt, or about 15 nt. Exemplary nucleases include type II and type V CRISPR-Cas nucleases. In certain embodiments, the nuclease is a Cas9 or Cas12 nuclease, or a variant thereof (see, e.g., Liu et al. (2019) Nature Communications 10; Article 5524). In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence that binds to the target sequence.
[00132] In certain embodiments, two or more nucleases are used in the methods described herein. In certain embodiments, the two or more nucleases are used sequentially. In certain embodiments, the two or more nucleases are used simultaneously. In certain embodiments, two or more nucleases are used to increase the number or variety of target sequences that can be enriched. For example, in certain embodiments, if one or more target sequences is not located near a Cas12 (e.g., Cas12a) PAM site, a second nuclease (e.g., a Cas9 or a CasX nuclease) may be used if the second nuclease has a PAM site near the remaining target sequences.
[00133] An enrichment method may include one or more of the steps of: (1) designing guide RNAs to bind target sequences for cleavage with Cas12 (e.g., Cas12a), (2) designing guide RNAs to bind target sequences for cleavage with Cas9 (e.g., for additional target sequences that are not near a Cas12 PAM), and (3) adding a mixture of guide sequences, Cas12, and Cas9 to the nucleic acid comprising a plurality of target sequences.
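The fallback logic described above, preferring one nuclease and using a second where no suitable PAM is nearby, can be sketched as follows. The PAM motifs used (5'-TTTV-3' for Cas12a and 5'-NGG-3' for a Cas9 such as SpCas9), the 20 nt window, and all names are assumptions for illustration; the disclosure does not specify them. The sketch scans one strand only and ignores PAM orientation relative to the protospacer, which real guide design must take into account.

```python
import re

# Assumed PAM motifs; lookahead groups allow overlapping matches.
PAM_PATTERNS = {
    "Cas12a": re.compile(r"(?=(TTT[ACG]))"),  # TTTV, V = A, C or G
    "Cas9": re.compile(r"(?=([ACGT]GG))"),    # NGG
}


def pam_positions(seq, nuclease):
    """Start positions of the nuclease's PAM motif on the given strand."""
    return [m.start() for m in PAM_PATTERNS[nuclease].finditer(seq.upper())]


def choose_nuclease(seq, target_pos, window=20, preference=("Cas12a", "Cas9")):
    """Return the first nuclease in `preference` with a PAM within `window`
    nucleotides of `target_pos`, or None if neither has one."""
    for nuclease in preference:
        if any(abs(p - target_pos) <= window for p in pam_positions(seq, nuclease)):
            return nuclease
    return None


# Illustrative sequences (made up).
with_tttv = "ACACTTTGACACACACACACACAC"
with_ngg_only = "ACACACACAGGACACACACA"
no_pam = "ACACACACACAC"
```

Targets for which `choose_nuclease` returns `None` would need a different nuclease or a different enrichment approach.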
[00134] In certain embodiments, an end-repair step may be performed prior to a cutting step. For example, when enriching a nucleic acid that may contain overhangs, such as cfDNA, an end-repair step can be performed to blunt-end repair any overhangs unrelated to the target sequence, prior to the cutting step.
[00135] The overhangs are filled in using a polymerase, such as a DNA polymerase. In certain embodiments, the DNA polymerase is the Klenow fragment of DNA polymerase I. The polymerase reaction includes nucleotides (free nucleotides), such as dNTPs, which are used by the DNA polymerase to fill in the overhangs. Klenow fragment can be used in an amount of from about 0.01 units/µL to about 1 unit/µL, for example, from about 0.05 units/µL to about 0.5 units/µL, from about 0.075 units/µL to about 0.125 units/µL or at about 0.1 unit/µL. The nucleotides are associated with (e.g., bound to) a label, which allows for the separation and/or isolation of the nucleic acid comprising the target sequence from other nucleic acids not containing the target sequence. In certain embodiments, the label comprises a fluorophore, a magnetic moiety, biotin or digoxigenin.
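The fill-in reaction can be illustrated with a short sketch that reports which nucleotides a polymerase would add across an overhang and where a labeled nucleotide could be incorporated. It assumes a 5' overhang (a polymerase extends the recessed 3' end of the opposite strand) and assumes that only one of the four dNTPs carries the label; both assumptions, and the function names, are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fill_in(overhang_5p):
    """Nucleotides added (written 5'->3') when the recessed 3' end is
    extended across a 5' overhang given in 5'->3' orientation."""
    return overhang_5p.upper().translate(COMPLEMENT)[::-1]


def labeled_positions(overhang_5p, labeled_base="C"):
    """Positions in the filled-in stretch where the labeled dNTP is incorporated."""
    return [i for i, base in enumerate(fill_in(overhang_5p)) if base == labeled_base]


# Illustrative 5-nt overhang (made up).
added = fill_in("AGGTC")
sites = labeled_positions("AGGTC", labeled_base="C")
```

An overhang whose fill-in contains no position for the labeled base (an empty list from `labeled_positions`) would not be labeled under this single-labeled-dNTP assumption, which is one reason to check overhang composition when choosing the labeled nucleotide.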
[00136] In certain embodiments, a labeled (e.g., biotin-labeled) nucleic acid comprising a target sequence is exposed to a capture domain (e.g., avidin), forming a capture domain-label-nucleic acid target complex. The capture domain can be bound to a solid support, such as a bead. Thus, after exposure to the solid support-bound capture domain, the beads will be bound to the capture domain-label-nucleic acid target complex, and nucleic acids lacking the target sequence can be separated from the nucleic acid comprising the target sequence, e.g., by a wash step.
[00137] In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the solid support is a bead, a well, a tube, or a slide.
[00138] The steps of the methods described herein, such as the enrichment method, including the cutting step, can be performed at a variety of temperatures, including but not limited to room temperature and/or about 20°C to about 45°C, about 20°C, about 25°C, about 30°C, about 35°C, about 37°C, about 40°C, about 45°C, or any ranges therein (e.g., about 20°C to about 25°C, about 25°C to about 37°C, about 35°C to about 45°C, and so on).
b. Additional Enrichment Steps
[00139] In certain embodiments, the target sequence or a subset of target sequences in the nucleic acid can be enriched using one or more additional enrichment steps. The one or more additional enrichment steps can be performed using any enrichment method known in the art. Non-limiting examples include hybrid capture and use of DNA-binding proteins to enrich a target sequence or a subset of target sequences.
[00140] One or more additional enrichment steps can be performed before or after enrichment using a targeted cutting and overhang filling method described above. For example, in certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. The method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to one or more additional enrichment steps to enrich for a subset of the target sequences. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
[00141] In certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to an enrichment step to enrich for the one or more (e.g., the plurality of) target sequences. The method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
i. Hybrid Capture
[00142] In certain embodiments, a target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence or the subset of target sequences to hybrid capture. In hybrid capture, labeled (e.g., biotinylated) capture probes that can bind to one or more target sequences or subsets of target sequences are exposed to the nucleic acid comprising the one or more target sequences. The capture probes are specific to a sequence of interest, for example, a methylation pattern of interest that can be detected as a bisulfite-converted epitype. Examples of such hybrid capture probe sets include the KAPA HyperPrep Kit and SeqCap Epi Enrichment System from Roche Diagnostics (Pleasanton, CA).
[00143] Hybrid capture can be performed before or after enrichment using the targeted cutting and overhang filling method described above. For example, in certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. The method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to hybrid capture to enrich for a subset of the target sequences. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
[00144] In certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to hybrid capture to enrich for the one or more (e.g., the plurality of) target sequences. The method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
ii. Nucleic Acid Binding Proteins
[00145] In certain embodiments, a target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence to nucleic acid binding proteins (also referred to herein as protein binders). The nucleic acid binding protein may bind a particular sequence or may bind to methylated CpGs. Exemplary nucleic acid binding proteins that bind to a particular sequence include transcription factors and nuclease-deficient CRISPR enzymes (e.g., dCas9). Exemplary DNA binding proteins that bind methylated CpGs include methyl-CpG-binding domain (MBD) proteins such as MECP2 (methyl-CpG-binding protein 2), MBD1, MBD2, MBD3, MBD4, MBD5, MBD6, the Kaiso family proteins, and the SET- and Ring finger-associated (SRA) domain family. In certain embodiments, the MBD protein is selected from MECP2, MBD1, MBD2, and MBD4. See, e.g., Du et al. (2015) Epigenomics 7(6): 1051-1073, incorporated by reference herein for all purposes.
[00146] In certain embodiments, a nucleic acid comprising a target sequence is exposed to a protein comprising a nucleic acid binding protein, which binds to a target sequence or a subset of target sequences. The target sequence or subset of target sequences can be enriched by isolating the target sequence-nucleic acid binding protein complex, for example, using an antibody to the nucleic acid binding protein. In certain embodiments, the nucleic acid binding protein is attached to a label. In certain embodiments, the label comprises a fluorophore, biotin or digoxigenin. In certain embodiments, the label binds a capture domain that can be used to isolate and/or separate the target nucleic acid or subset of target nucleic acids from a sample. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide.
[00147] In certain embodiments, a nucleic acid comprising a target sequence is exposed to a protein comprising a methyl-CpG-binding domain (MBD), which binds to a methylated CpG within the target sequence. The target sequence can be enriched by isolating the target sequence-MBD complex. Because CGIs are typically not methylated, enrichment using an MBD would enrich for methylated fragments, for example, rare methylated fragments. In certain embodiments, the MBD is MBD3, which binds to 5-hydroxymethylcytosine. In certain embodiments, the method enriches for hydroxymethylcytosine-containing fragments of a CGI.
[00148] Enrichment using a nucleic acid binding protein can be performed before or after enrichment using the targeted cutting and overhang filling method described above. For example, in certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to a nucleic acid enrichment method that includes cutting the nucleic acid molecules that include the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the molecules that include the one or more (e.g., the plurality of) target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more (e.g., the plurality of) target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. The method comprises a second step of combining the one or more (e.g., the plurality of) target molecules with a nucleic acid binding protein to enrich for a subset of the target sequences. In this method, the nucleic acid binding protein binds to the one or more (e.g., the plurality of) target molecules and the nucleic acid binding protein-target complex is isolated as described above. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
[00149] In certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include one or more (e.g., a plurality of) target sequences to hybrid capture to enrich for the one or more (e.g., the plurality of) target sequences. The method comprises a second step of subjecting the one or more (e.g., the plurality of) target molecules to a nucleic acid enrichment method that includes cutting the one or more (e.g., the plurality of) target sequences to generate single stranded overhangs at the cut end of the target sequences or a subset of the target sequences, filling the overhangs with at least one labeled nucleotide, and enriching the molecules that include the one or more target sequences or subset of target sequences by contacting the labeled nucleotides in the molecules with the capture domains and/or separating labeled molecules from unlabeled molecules. In certain embodiments, the first step enriches for target sequences in a genomic region of interest and the second step enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
[00150] After isolating and/or separating the nucleic acid comprising a target sequence, the method can include amplifying the nucleic acid molecule, e.g., by PCR. Amplification can occur while the nucleic acid is in contact with the capture domain, or the nucleic acid can be removed from the capture domain (e.g., by heat elution, a chemical agent, mechanical disruption, or combinations thereof) prior to amplification.
III. Nucleic Acid Libraries Enriched for Regions of Interest and Methods of Making Same
[00151] In another aspect, the disclosure relates to a method for producing a nucleic acid library enriched for regions of interest (i.e., targets). In certain embodiments, regions of interest are enriched prior to making the library. In other embodiments, the library is made and then regions of interest present in the library are enriched.
[00152] A method of making a nucleic acid library enriched for regions of interest can include obtaining a sample comprising a plurality of nucleic acids, wherein the plurality of nucleic acids comprises regions of interest. In certain embodiments, unmethylated cytosines are converted to uracils. Adapters are added to the plurality of nucleic acids to form a nucleic acid library. To enrich the nucleic acid library that is initially created for regions of interest, the plurality of nucleic acid molecules having regions of interest can be cut, e.g., by a nuclease, to generate single stranded overhangs at cut ends of the molecules that include the regions of interest. The overhangs are filled in, e.g., using a polymerase, with at least one labeled nucleotide. The molecules that include the regions of interest are then enriched by contacting the labeled nucleic acids in the molecule with capture domains which can be used to separate and/or isolate the labeled nucleic acids from unlabeled nucleic acids. Nucleic acids containing the regions of interest can be amplified to form the nucleic acid library enriched for regions of interest.
[00153] An exemplary method of making a nucleic acid library enriched for regions of interest is shown in FIG. 1. A cell-free DNA (cfDNA) sample comprising methylated nucleotides, which has been converted using bisulfite treatment, is used to construct a nucleic acid library. sgRNAs complementary to target sequences are constructed, and the library is exposed to Cas12 and the sgRNAs. The sgRNAs direct Cas12 to the target sequence, where Cas12 cleaves the DNA, leaving an overhang. Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang. Streptavidin beads bind to the biotinylated nucleotides of the target sequences, and any uncut, off-target sequences are washed away. A PCR amplification is then performed to amplify the enriched target sequences that have been bound to the bead.
[00154] Enriched libraries can also be made by enriching for regions of interest prior to making the library. In this method, a sample is obtained which comprises a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprises or is suspected to comprise regions of interest. The subset of the plurality of nucleic acid molecules having regions of interest is cut to generate single stranded overhangs at cut ends of the molecules that include the regions of interest. The overhangs are filled in, e.g., using a polymerase, with at least one labeled nucleotide. The nucleic acids that include the regions of interest are then enriched by contacting the labeled nucleic acids with capture domains which can be used to separate and/or isolate the labeled nucleic acids from unlabeled nucleic acids. The nucleic acids that include the regions of interest are removed from the capture domains. In certain embodiments, the nucleic acids are treated to convert unmethylated cytosines to uracils, to preserve information about the methylation state of the nucleic acids. Nucleic acid adapters are added to the plurality of nucleic acids to form the nucleic acid library enriched for regions of interest.
[00155] Another exemplary method of making a nucleic acid library enriched for regions of interest is shown in FIG. 2. In this method, a cell-free DNA (cfDNA) sample is exposed to Cas12 and sgRNAs complementary to target sequences in a target region. The sgRNAs direct Cas12 to the target sequence, where Cas12 cleaves the DNA, leaving an overhang. Biotinylated dNTPs are added with a polymerase (Klenow fragment) to fill in the nucleotides complementary to the overhang. Streptavidin beads bind to the biotinylated nucleotides present in the target region, and any uncut, off-target sequences are washed away. The enriched target sequences are eluted off of the beads using heat treatment. The eluted target sequences are then treated with bisulfite to preserve methylation information. The bisulfite converted target sequences are used to construct a library that is enriched for target sequences.
[00156] In certain embodiments, nucleic acids containing or suspected of containing a target sequence are contacted with an enzyme that (1) recognizes a region of interest (i.e., target sequence) and (2) cuts (cleaves) the nucleic acid molecules that contain the target sequence with a nuclease. In certain embodiments, the nuclease cleaves the nucleic acid within the target sequence. In certain embodiments, the nuclease cleaves the nucleic acid near the target sequence. For example, the nuclease may cleave the nucleic acid within about 1 nt to about 20 nt, about 2 nt to about 20 nt, about 5 nt to about 20 nt, about 10 nt to about 20 nt, about 15 nt to about 20 nt, about 1 nt to about 15 nt, about 2 nt to about 15 nt, about 5 nt to about 15 nt, about 10 nt to about 15 nt, about 1 nt to about 10 nt, about 2 nt to about 10 nt, about 5 nt to about 10 nt, about 1 nt to about 5 nt, or about 2 nt to about 5 nt of the target sequence.
[00157] Nucleases suitable for use herein generate a staggered cut in the nucleic acid, leaving a single stranded overhang of unpaired nucleotides. The overhang may be any length, for example, between 1 and 3 nt, between 1 and 5 nt, between 1 and 10 nt, between 1 and 15 nt, between 3 and 5 nt, between 3 and 10 nt, between 3 and 15 nt, between 5 and 10 nt, between 5 and 15 nt, between 10 and 15 nt, or about 2 nt, about 3 nt, about 4 nt, about 5 nt, about 6 nt, about 7 nt, about 8 nt, about 9 nt, about 10 nt, about 11 nt, about 12 nt, about 13 nt, about 14 nt, or about 15 nt. Exemplary nucleases include type II and type V CRISPR-Cas nucleases. In certain embodiments, the nuclease is a Cas9 or Cas12 nuclease, or a variant thereof (see, e.g., Liu et al. (2019) supra). In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease. In certain embodiments, the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence that binds to the target sequence.
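The staggered cut and subsequent fill-in can be modeled with a short Python sketch. It represents a duplex by its top strand (5' to 3'), takes the two cut positions as inputs, and reports the nucleotides a polymerase would add at the recessed 3' end of each fragment. The cut offsets of 18 and 23 used in the usage note, the choice of labeled base, and the function name are assumptions for illustration only; actual cut positions depend on the nuclease and guide used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fill_in(top, top_cut, bottom_cut, labeled_base="T"):
    """Model fill-in of the 5' overhangs left by a staggered cut.

    top: top-strand sequence, written 5'->3'.
    top_cut / bottom_cut: cut positions in top-strand coordinates; the
        bottom strand is cut further right, so both fragments carry a
        5' overhang of length bottom_cut - top_cut.
    labeled_base: HYPOTHETICAL choice of which nucleotide carries the label.
    """
    if not 0 < top_cut < bottom_cut < len(top):
        raise ValueError("expected 0 < top_cut < bottom_cut < len(top)")
    overhang = top[top_cut:bottom_cut]
    # Left fragment: the recessed top-strand 3' end is extended, which
    # re-synthesizes the top-strand bases removed by the cut.
    left_fill = overhang
    # Right fragment: the recessed bottom-strand 3' end is extended, giving
    # the reverse complement of the overhang (written 5'->3').
    right_fill = overhang.translate(COMPLEMENT)[::-1]
    return {
        "overhang_len": len(overhang),
        "left_fill": left_fill,
        "right_fill": right_fill,
        "left_labels": left_fill.count(labeled_base),
        "right_labels": right_fill.count(labeled_base),
    }
```

For instance, `fill_in("GATTACA" * 4, 18, 23)` models a 5-nt overhang. In this toy case only one of the two fragments would incorporate a labeled T, which illustrates why the base composition of the overhang matters when a single labeled nucleotide is used.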
[00158] As described above, overhangs are filled in using a polymerase, such as a DNA polymerase. In certain embodiments, the DNA polymerase is the Klenow fragment of DNA polymerase I. The polymerase reaction includes nucleotides (free nucleotides), such as dNTPs, which are used by the DNA polymerase to fill in the overhangs. In certain embodiments, at least one nucleotide comprises a label. In certain embodiments, the label comprises a fluorophore, biotin or digoxigenin. In certain embodiments, the target nucleic acid is enriched by isolating and/or separating the labeled nucleic acid. In certain embodiments, the label binds to a capture domain. In certain embodiments, the capture domain comprises avidin, streptavidin, or a DIG-binding protein. In certain embodiments, the capture domain comprises or is connected to a solid support. In certain embodiments, the solid support is a bead, a well, a tube, or a slide.
[00159] In any of the foregoing embodiments, at any step in the method, the target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence to hybrid capture. For example, hybrid capture can be performed before or after enrichment using the targeted cutting and overhang filling method described above. Hybrid capture can be performed before or after addition of adaptors. Hybrid capture can be performed before or after conversion of nucleotides (e.g., by bisulfite conversion). In certain embodiments, hybrid capture enriches for target sequences in a genomic region of interest and targeted cutting and overhang filling enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest. In certain embodiments, targeted cutting and overhang filling enriches for target sequences in a genomic region of interest and hybrid capture enriches for a subset of the target sequences that contains a methylation pattern (e.g., an epitype) of interest.
[00160] After isolating and/or separating the nucleic acid comprising a target sequence, the method can include amplifying the nucleic acid molecule. Amplification can occur while the nucleic acid is in contact with the capture domain, or the nucleic acid can be removed from the capture domain (e.g., by heat elution, a chemical agent, mechanical disruption, or combinations thereof) prior to amplification.
[00161] In any of the foregoing embodiments, adapters can be attached to a nucleic acid by any means known in the art, for example, as are used in connection with next generation sequencing (NGS). For example, adapters, such as a Y adapter, can be attached to a nucleic acid by ligation. In certain embodiments, the adapter is attached by nucleic acid amplification of the cell-free nucleic acid using a primer comprising the adapter. In certain embodiments, the adapter comprises one or more of a flow cell binding site, an index, a unique molecular identifier (UMI), and a sequencing primer binding site. Such adapters can be used to subsequently sequence the library using NGS.
[00162] In another aspect, the disclosure relates to a nucleic acid library, produced by the methods described herein.
IV. Nucleic Acids
A. Source of Nucleic Acids
[00163] Nucleic acids used in the methods described herein can be derived from any source, such as a sample taken from the environment or from a subject (e.g., a human subject). A biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis. A biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy. Examples of biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. A biological sample can include any tissue or material derived from a living or dead subject. A biological sample can be a cell-free sample. A sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample). A biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc.
[00164] The nucleic acid can be of any composition or form, such as deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), and/or ribonucleic acid (RNA) and/or RNA analogs, all of which can be in single- or double-stranded form. In certain embodiments, single-stranded nucleic acids can be made double stranded prior to cutting with an enzyme. Unless otherwise limited, a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides. A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). In certain embodiments nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures. Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. A nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
[00165] In certain embodiments, the nucleic acid is a cell-free nucleic acid, which can be found in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject. In certain embodiments, a plasma sample can be used directly in the methods disclosed herein (for example, in the cutting step), without prior purification or isolation of nucleic acids in the plasma. Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, and viruses. Examples of the cell-free nucleic acids include but are not limited to cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA. In certain embodiments herein, instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA) can be used. The concentration of low-abundance cfDNA can be measured, for example, using a Qubit Fluorometer from Thermo Fisher Scientific (Waltham, MA).
[00166] In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free).
B. Nucleic Acids Derived From Methylated Nucleic Acids
[00167] A methylated nucleic acid is a nucleic acid having a modification in which a hydrogen atom on the pyrimidine ring of a cytosine base is replaced by a methyl group, forming 5-methylcytosine. Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”, which can be a target for enrichment. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5'-CHG-3' and 5'-CHH-3', where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine (6mA). Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status. As is well known in the art, DNA methylation anomalies (compared to healthy controls) can cause different effects, which may contribute to cancer.
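The sequence contexts named above (CpG, CHG and CHH, where H is A, C or T) can be assigned programmatically. The following Python sketch is a minimal illustration that classifies every cytosine on one strand of an input sequence; the function name and the "unknown" label for cytosines too close to the 3' end are illustrative choices, and the sketch assumes the input contains only A, C, G and T.

```python
def cytosine_contexts(seq):
    """Return (position, context) for each cytosine on the given strand.

    Context is 'CpG', 'CHG' or 'CHH' (H = A, C or T), or 'unknown' when
    too few downstream bases are available to decide.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        downstream = seq[i + 1:i + 3]
        if downstream[:1] == "G":
            ctx = "CpG"
        elif len(downstream) == 2 and downstream[1] == "G":
            ctx = "CHG"  # first downstream base is not G, so it is H
        elif len(downstream) == 2:
            ctx = "CHH"
        else:
            ctx = "unknown"
        out.append((i, ctx))
    return out
```

Applied to the toy sequence `CGCAGCATC`, the function reports a CpG at position 0, a CHG at position 2, a CHH at position 5, and an undetermined context for the terminal cytosine.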
[00168] In certain embodiments, the nucleic acid comprises a CpG site (i.e., cytosine and guanine separated by only one phosphate group). In certain embodiments, the nucleic acid comprises a CpG island (also referred to as a “CG island” or “CGI”) or a portion thereof, which is the target for enrichment. Because certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells, detection of such CGIs can be informative of a health condition. In certain embodiments, the CGI is a “cancer informative CGI”, which is defined and described in more detail below. In certain embodiments, the CpG is an “informative CpG”, e.g., a “cancer informative CGI”. Such CGIs may have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. Accordingly, detection of a cancer informative CGI can be informative regarding a subject’s risk of developing cancer or can be indicative that the subject has cancer. Exemplary cancer informative CGIs, which can be target sequences as described herein, are identified in, e.g., Table 1 of U.S. Patent Publication 2020/0109456A1, Tables 2 and 3 of WO2022/133315, and TABLES 1-4 provided herein.
C. Converting Unmethylated Nucleic Acids
[00169] In certain aspects, the nucleic acids of the invention have been treated to convert one or more unmethylated nucleotides (e.g., cytosines) to another nucleotide (a “converted nucleotide”, as used herein, such as a uracil), for example, prior to amplification. In certain embodiments, one or more unmethylated cytosines are converted to a nucleotide that pairs with adenine (e.g., the unmethylated cytosine may be converted to uracil). In certain embodiments, one or more unmethylated adenines are converted to a base that pairs with cytosine (e.g., the unmethylated adenine may be converted to inosine (I)). In certain embodiments, one or more methylated cytosines (e.g., 5-methylcytosine (5mC)) are converted to a thymine, which pairs with adenine. In certain embodiments, methylated cytosines are protected from conversion (e.g., deamination) during the conversion step.
[00170] After a nucleic acid has been treated to convert unmethylated, or, in some cases, methylated nucleotides, into another nucleotide, the nucleic acid may be amplified. During amplification, the converted nucleotide pairs with its complementary nucleotide, and in the next round of amplification, the complementary nucleotide pairs with a replacement nucleotide. For example, following the conversion of an unmethylated cytosine to a uracil, the nucleic acid may be amplified such that an adenine pairs with the uracil in the first round of replication, and in the second round of replication, the adenine pairs with a thymine. Accordingly, the thymine replaces the uracil in the original nucleic acid sequence, and is referred to herein as a “replacement nucleotide”.
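The conversion and replacement described above can be traced in silico. The following Python sketch is a simplified illustration: it converts unmethylated cytosines to uracil, copies the strand twice, and recovers the methylated positions by comparing the amplified sequence with the original. All function names are hypothetical, and for readability each copied strand is written in the same left-to-right register as its template rather than as a reverse complement.

```python
# Base-pairing partners; U (a converted cytosine) pairs with A.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def convert(seq, methylated_positions):
    """Unmethylated C -> U (converted nucleotide); methylated C is kept."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def copy_strand(template):
    """Position-matched complement of the template (not reversed)."""
    return "".join(PAIRS[base] for base in template)


def amplify_two_rounds(converted):
    """Round 1 places A opposite U; round 2 places T opposite that A."""
    first = copy_strand(converted)
    return copy_strand(first)


def call_methylation(original, amplified):
    """Positions where a C in the original is still read as C."""
    return [
        i for i, (o, a) in enumerate(zip(original, amplified))
        if o == "C" and a == "C"
    ]
```

For the toy sequence `ACGTCCGA` with only position 1 methylated, conversion yields `ACGTUUGA`, two rounds of copying yield `ACGTTTGA`, and `call_methylation` returns `[1]`: each thymine that replaces a cytosine marks an unmethylated site.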
[00171] In certain aspects, the nucleic acids of the invention have been selectively deaminated. Selective deamination refers to a process in which unmethylated cytosine residues are selectively deaminated over methylated cytosine (5-methylcytosine) residues. In certain embodiments, deamination of cytosine forms uracil, effectively inducing a C to T point mutation to allow for detection of methylated cytosines. Methods of deaminating cytosine are known in the art, and include bisulfite conversion and enzymatic conversion. In certain embodiments, the enzymatic conversion comprises subjecting the nucleic acid to TET2, which oxidizes methylated cytosines, thereby protecting them, and subsequent exposure to APOBEC, which converts unprotected (i.e., unmethylated) cytosines to uracils.
[00172] In some embodiments, the conversion, for example, bisulfite conversion or enzymatic conversion, uses commercially available kits. Bisulfite conversion can be performed using commercially available technologies, such as EZ DNA Methylation-Gold, EZ DNA Methylation-Direct or an EZ DNA Methylation-Lightning kit (Zymo Research Corp (Irvine, California)) or EpiTect Fast available from Qiagen (Germantown, MD). In another example a kit such as APOBECSeq (NEBiolabs) or OneStep qMethyl-PCR Kit (Zymo Research Corp (Irvine, California)) is used.
i. Bisulfite conversion
[00173] Bisulfite conversion is performed on DNA by denaturation using high heat, preferential deamination (at an acidic pH) of unmethylated cytosines, which are then converted to uracil by desulfonation (at an alkaline pH). Methylated cytosines remain unchanged on the single-stranded DNA (ssDNA) product. An overview of bisulfite conversion is provided in FIG. 3.
[00174] In some embodiments the methods include treatment of the sample with bisulfite (e.g., sodium bisulfite, potassium bisulfite, ammonium bisulfite, magnesium bisulfite, sodium metabisulfite, potassium metabisulfite, ammonium metabisulfite, magnesium metabisulfite and the like). Unmethylated cytosine is converted to uracil through a three-step process during sodium bisulfite modification. As shown in FIG. 3, the steps are sulphonation to convert cytosine to cytosine sulphonate, deamination to convert cytosine sulphonate to uracil sulphonate and alkali desulfonation to convert uracil sulphonate to uracil. Conversion on methylated cytosine is much slower and is not observed at significant levels in a 4-16 hour reaction. (See Clark et al., Nucleic Acids Res., 22(15):2990-7 (1994).) If the cytosine is methylated it will remain a methylated cytosine. If the cytosine is unmethylated it will be converted to uracil.
When the modified strand is copied, for example, through extension of a locus specific primer, a random or degenerate primer or a primer to an adaptor, a G will be incorporated in the interrogation position (opposite the C being interrogated) if the C was methylated and an A will be incorporated in the interrogation position if the C was unmethylated and converted to U.
When the double stranded extension product is amplified, those Cs that were converted to Us and resulted in incorporation of A in the extended primer will be replaced by Ts during amplification. Those Cs that were not converted (i.e., the methylated Cs) and resulted in the incorporation of G will be replaced by unmethylated Cs during amplification.
ii. Enzymatic conversion
[00175] In certain embodiments, enzymatic treatment with a cytidine deaminase is used to convert cytosine to uracil. Enzymatic conversion can include an oxidation step, in which Tet methylcytosine dioxygenase 2 (TET2) catalyzes the oxidation of 5mC to 5hmC to protect methylated cytosines from conversion by subsequent exposure to a cytidine deaminase. Other protection steps known in the art can be used in addition to or in place of oxidation by TET2. After the oxidation step, the nucleic acid is treated with the cytidine deaminase to convert one or more unmethylated cytosines to uracils. As with bisulfite conversion, when the modified strand is copied, a G will be incorporated in the interrogation position (opposite the C being interrogated) if the C was methylated and an A will be incorporated in the interrogation position if the C was unmethylated. When the double stranded extension product is amplified, those Cs that were converted to Us and resulted in incorporation of A in the extended primer will be replaced by Ts during amplification. Those Cs that were not modified and resulted in the incorporation of G will remain as C.
[00176] In certain embodiments the cytidine deaminase may be APOBEC. In certain embodiments the cytidine deaminase includes activation induced cytidine deaminase (AID) and apolipoprotein B mRNA editing enzymes, catalytic polypeptide-like (APOBEC). In certain embodiments, the APOBEC enzyme is selected from the human APOBEC family consisting of: APOBEC-1 (Apo1), APOBEC-2 (Apo2), AID, APOBEC-3A, -3B, -3C, -3DE, -3F, -3G, -3H and APOBEC-4 (Apo4). In certain embodiments, the APOBEC enzyme is APOBEC-seq.
iii. Nitrite Conversion
[00177] In certain embodiments, nitrite treatment is used to deaminate adenine and cytosine. Deamination of an A results in conversion to an inosine (I), which is read by a polymerase as a G, whereas deamination of a methylated A (N6-methyladenine (6mA)) results in a nitrosylated 6mA (6mA-NO), which causes the base to be read by a polymerase as an A. Deamination of a C results in conversion to a uracil, which is read by a polymerase as a T, whereas deamination of an N4-methylcytosine (4mC) to 4mC-NO or a 5-methylcytosine (5mC) to a T causes the base to be read by a polymerase as a C or a T, respectively. For 5mC bases, the C to T ratio at the 5mC position is about 40% higher than other cytosine positions, allowing 5mC to be differentiated from C. (See, Li et al. (2022) Genome Biology 23:122.)
V. Guide RNAs (gRNAs, sgRNAs)
[00178] A “guide RNA” (“gRNA”) is a type of RNA that includes a CRISPR RNA sequence (crRNA, also referred to as a “guide sequence” or “spacer”), and, in certain embodiments, a trans-activating CRISPR RNA sequence (tracrRNA). The tracrRNA, if present, binds to an endonuclease (e.g., a CRISPR enzyme) and the crRNA is complementary to a target sequence. In certain embodiments, the guide RNA is referred to as a single guide RNA (sgRNA), which refers to a guide RNA comprising both a crRNA and tracrRNA.
[00179] A guide sequence can be designed to have complementarity to a target sequence of the disclosure, where hybridization between a target sequence and a guide sequence promotes the formation of a gene editing endonuclease complex (e.g., a CRISPR complex). Full complementarity may not be required, provided there is sufficient complementarity to cause hybridization and promote formation of a gene editing endonuclease complex (e.g., a CRISPR complex). In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. In certain embodiments, the guide sequence and the target sequence exhibit full (100%) complementarity.
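As an illustration of the degree of complementarity discussed above, the following minimal sketch computes percent complementarity for the simplest case: a guide RNA and a target DNA strand of equal length, paired antiparallel without gaps. The sequences and the function name are hypothetical, and the sketch does not perform the optimal alignment step that a full comparison would use.

```python
# Watson-Crick pairs as (RNA guide base, DNA target base).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}

def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'; pairing is antiparallel, so the
    first guide base is compared with the last target base.
    """
    if len(guide_rna) != len(target_dna):
        raise ValueError("this sketch compares equal-length, ungapped sequences")
    matches = sum(
        (g, t) in PAIRS
        for g, t in zip(guide_rna.upper(), reversed(target_dna.upper()))
    )
    return 100.0 * matches / len(guide_rna)

target = "ATGCCGTAAGCTTGACCTGA"            # hypothetical 20-nt target strand
perfect_guide = "UCAGGUCAAGCUUACGGCAU"     # reverse complement of target, as RNA
one_mismatch = "GCAGGUCAAGCUUACGGCAU"      # first guide base changed

print(percent_complementarity(perfect_guide, target))  # 100.0
print(percent_complementarity(one_mismatch, target))   # 95.0
```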
[00180] Optimal alignment of the polyribonucleotide to the target sequence may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina®, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length.
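Of the algorithms listed above, Smith-Waterman local alignment is compact enough to sketch directly. The following minimal example returns only the best local alignment score for two sequences written in the same alphabet. The scoring values (match, mismatch, and linear gap penalty) are arbitrary assumptions chosen for illustration, not values taken from this disclosure.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only the previous row to save memory.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

identical = smith_waterman_score("ACGT", "ACGT")
embedded = smith_waterman_score("TTACGTTT", "GGACGTGG")
unrelated = smith_waterman_score("AAAA", "CCCC")

print(identical, embedded, unrelated)  # 8 8 0
```

The shared "ACGT" core scores the same whether or not it is flanked by unrelated bases, which is the property that makes local alignment suitable for locating a short guide-length match within a longer target.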
[00181] The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay. For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.
VI. Nucleases
[00182] The nuclease used in the methods described herein can be an endonuclease, for example, a Cas protein, that is capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang. Non-limiting examples of Cas proteins that are capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang, include type V CRISPR enzymes such as Cas12, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f1, Cas12g, Cas12h, Cas12i, homologs thereof, or modified versions thereof (see, e.g., Liu et al. (2019) supra). For example, in some embodiments, the DNA endonuclease is a Cas12 endonuclease that effects a staggered break at a locus within or near a target sequence, producing a 1-5 nt overhang. Cas12 recognizes a 5’-T-rich PAM, such as TTN or TTTN.
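To illustrate how candidate Cas12 target sites might be located from the PAM requirement described above, the following minimal sketch scans one strand of a sequence for a 5’-TTTN-3’ PAM and reports the adjacent downstream sequence. The 20-nt protospacer length, the example sequence, and the function name are assumptions made for illustration. The opposite strand can be scanned by passing its sequence separately.

```python
import re

def find_cas12a_sites(seq, spacer_len=20):
    """Return (pam_start, pam, protospacer) for each TTTN PAM on this strand
    that is followed by a full-length protospacer."""
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs (e.g. in runs of T) are found.
    for m in re.finditer(r"(?=(TTT[ACGT]))", seq):
        start = m.start()
        protospacer = seq[start + 4 : start + 4 + spacer_len]
        if len(protospacer) == spacer_len:
            sites.append((start, m.group(1), protospacer))
    return sites

example = "GGCATTTAGACCTGAAGGCTTACGGATCCAG"   # hypothetical sequence
sites = find_cas12a_sites(example)
print(sites)  # [(4, 'TTTA', 'GACCTGAAGGCTTACGGATC')]
```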
[00183] In certain embodiments, the endonuclease is a Cas12a/Cpf1 endonuclease, a homolog thereof, a recombinant of the naturally occurring molecule thereof, a codon-optimized version thereof, a modified version thereof, and combinations of any of the foregoing. The Cas12a/Cpf1 endonuclease can be derived from a variety of bacterial species. For example, in certain embodiments, the Cas12a/Cpf1 endonuclease is derived from Acidaminococcus bacteria or Lachnospiraceae bacteria. In a specific embodiment, the Cas12a/Cpf1 endonuclease is a Lachnospiraceae bacterium ND2006 Cpf1.
[00184] In certain embodiments, the endonuclease is a MAD7 endonuclease, a homolog thereof, a recombinant of the naturally occurring molecule thereof, a codon-optimized version thereof, a modified version thereof, and combinations of any of the foregoing. MAD7 is a codon-optimized endonuclease that can be derived from Eubacterium rectale (Inscripta, Boulder, CO). MAD7 is described in U.S. Patent No. 9,982,279.
[00185] In addition, it has recently been discovered that the Cas9 enzyme (a type II CRISPR enzyme that recognizes a 3’-G-rich PAM such as NGG), previously thought to create blunt cuts (i.e., breaks in DNA that do not result in an overhang), is capable of creating 1 nt, 2 nt, and 3 nt overhangs. (See, e.g., Shi et al. (2019) Cell Discovery 5:53.) Accordingly, in certain embodiments, the endonuclease is a Cas9 protein.
[00186] Another nuclease capable of cleaving DNA to effect a staggered break at the intended locus, wherein the break results in an overhang, is a CasX nuclease. (See, Liu et al. (2019) Nature 566:218-223.) CasX recognizes a 5’-TTCN PAM and is capable of creating 10-nt overhangs. (Id.)
[00187] In some embodiments, the endonuclease (e.g., a CRISPR enzyme) directs cleavage at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the endonuclease directs cleavage within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
[00188] In general, “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. In some embodiments, one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system. In some embodiments, one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
VII. Kits
[00189] Also disclosed herein are kits for enriching a target nucleic acid and/or making an enriched nucleic acid library. For example, the kit may include a nuclease that cuts a nucleic acid molecule including a target sequence to generate a single-stranded overhang at a cut end of the molecule that includes the target; labeled dNTPs; DNA polymerase; and a capture moiety comprising a capture domain.
[00190] In certain embodiments, the kit includes a nuclease, such as a CRISPR-Cas nuclease. In certain embodiments, the nuclease is a type II CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas9 nuclease. In certain embodiments, the nuclease is a type V CRISPR-Cas nuclease. In certain embodiments, the nuclease is a Cas12 nuclease. In certain embodiments, the nuclease is a Cas12a/Cpf1 nuclease. In certain embodiments, the nuclease is a MAD7 nuclease. In certain embodiments, the nuclease is a CasX nuclease.
[00191] In certain embodiments, the DNA polymerase is DNA polymerase I. In certain embodiments, the DNA polymerase I consists of the Klenow fragment. In certain embodiments, the label comprises biotin, digoxigenin, a magnetic moiety or a fluorophore. In certain embodiments, the capture moiety comprises avidin, streptavidin, or a DIG-binding molecule. In certain embodiments, the capture moiety comprises or is connected to a solid support.
[00192] Kits contemplated herein may further include a solid support, such as a bead, a well, a tube, or a slide. In certain embodiments, the capture domain comprises streptavidin connected to a bead.
[00193] Throughout the description, where apparatus, devices, and systems are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are apparatus, devices, and systems of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
[00194] Practice of the invention will be more fully understood from the following examples, which are presented herein for illustrative purposes only, and should not be construed as limiting the invention in any way.
EXAMPLES
Example 1 - Cleaving Target Sequences Using CRISPR-Cas12a
[00195] This example describes an exemplary method for target cleavage (e.g., at a CpG site) using a gene editing system (CRISPR-Cas12a), for use in an enrichment method provided herein.
Prepare target specimen
[00196] DNA samples (either genomic DNA (gDNA) or sheared genomic DNA (shDNA)) comprising a target sequence were obtained. The shDNA was sheared to approximately 180 bp to serve as a model for cfDNA. Herring DNA lacking the target sequence of interest was used as a negative control. An amplicon containing the target sequence of interest (HPRT control target, or one or six experimental CpG sites) was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
Cleaving target specimen using Cas12a
[00197] One (1) µM Cas12a and 300 nM crRNA were incubated at room temperature for at least 10 minutes to create Cas complexes. Thirty (30) nM of each amplicon generated in the purification step was added to the complexes to cut the amplicon at the target sites with a 4-base overhang on the opposite strand to the PAM. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules. Purified cleaved DNA was analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
Results
[00198] As shown in TABLE 5, Cas12a cut each target, with an average cutting efficiency of between 48% and 93%.
TABLE 5
[Table 5 image not reproduced: imgf000054_0001]
Example 2 - Cleaving Target Sequences in Plasma Using CRISPR-Cas12a
[00199] This example demonstrates that a CRISPR cleavage reaction can be performed in plasma instead of buffer, suggesting that a plasma sample can be used directly in a CRISPR cleavage reaction in connection with the methods of the disclosure.
Prepare target specimen
[00200] A human DNA sample comprising a target sequence of interest was obtained. An amplicon containing the target sequence of interest (HPRT control target) was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
Cleaving target specimen using Cas12a in plasma
[00201] One (1) µM (1x), 3 µM (3x), or 5 µM (5x) Cas12a and 300 nM (1x), 900 nM (3x), or 1.5 µM (5x) crRNA, respectively, were incubated at room temperature for at least 10 minutes to create Cas complexes. Thirty (30) nM of the HPRT amplicon and 21 µL of deionized water or plasma were added to the complexes to cut the amplicon at the target sites with a 4-base overhang on the opposite strand to the PAM. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules. Purified cleaved DNA was analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
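The 1x, 3x, and 5x conditions scale the nuclease and the crRNA by the same factor while the amplicon is held at 30 nM. The arithmetic can be sketched as follows, reading the stated Cas12a concentrations as micromolar (consistent with the 5x crRNA value, 5 × 300 nM = 1.5 µM) and treating all values as nominal concentrations. The function and dictionary key names are hypothetical.

```python
NM_PER_UM = 1000

BASE_CAS12A_NM = 1 * NM_PER_UM   # 1 uM Cas12a at 1x (assumed unit reading)
BASE_CRRNA_NM = 300              # 300 nM crRNA at 1x
AMPLICON_NM = 30                 # amplicon is the same in every condition

def condition(scale):
    """Nominal concentrations and molar excess over amplicon at a given scale."""
    cas12a = BASE_CAS12A_NM * scale
    crrna = BASE_CRRNA_NM * scale
    return {
        "cas12a_nM": cas12a,
        "crRNA_nM": crrna,
        "cas12a_per_amplicon": cas12a / AMPLICON_NM,
        "crRNA_per_amplicon": crrna / AMPLICON_NM,
    }

for scale in (1, 3, 5):
    print(scale, condition(scale))
```

Under these assumptions the crRNA is in 10-fold molar excess over the amplicon at 1x and in 50-fold excess at 5x.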
[00202] The experiment was repeated using various combinations of 3 µM (3x) or 5 µM (5x) Cas12a and 300 nM (1x), 600 nM (2x), or 900 nM (3x) crRNA.
Results
[00203] As shown in TABLE 6, Cas12a is capable of cutting a target DNA sequence in the presence of plasma, and increasing the amount of Cas12a and crRNA in the reaction increases the efficiency of cutting to a level that is similar to the efficiency of cutting in buffer.
TABLE 6
[Table 6 images not reproduced: imgf000055_0001, imgf000056_0001]
Example 3 - Cleaving Target Sequences at Room Temperature Using CRISPR-Cas12a
[00204] This example demonstrates that the methods of the disclosure can be performed at room temperature.
Prepare target specimen
[00205] A human DNA sample comprising a target sequence of interest (CpG-4) was obtained. An amplicon containing the target sequence of interest was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
Cleaving target specimen using Cas12a at room temperature
[00206] One (1) µM Cas12a and 300 nM crRNA were incubated at room temperature for at least 10 minutes to create Cas complexes. Thirty (30) nM of the CpG-4 amplicon and 21 µL of deionized water were added to the complexes to cut the amplicon at the target site. The reaction was incubated at room temperature for 30 s, 1 min, 3 min, 5 min, or 10 min, and then 1 µL of proteinase K (ProK) was added. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules. Purified cleaved DNA was analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
Results
[00207] As shown in TABLE 7, Cas12a is capable of cutting a target DNA sequence at room temperature, with the highest cutting efficiency seen at incubation times of 5 and 10 minutes (i.e., 5 minutes or longer).
TABLE 7
[Table 7 image not reproduced: imgf000057_0001]
Example 4 - Incorporation of Biotin-Labeled dNTPs at an Overhang of a CRISPR Cut Site
[00208] This experiment demonstrates that biotin-labeled dNTPs can be incorporated at the overhang of a CRISPR cut site at a target sequence.
Prepare target specimen
[00209] A human DNA sample comprising a target sequence of interest was obtained. An amplicon containing the target sequence of interest was generated using PCR (New England BioLabs® LongAmp®) and purified using solid-phase reversible immobilization (SPRI).
Cleaving target specimen using Cas12a and filling in overhangs with biotinylated dNTPs
[00210] Cas12a (1 µM) and crRNA (300 nM) were incubated for at least 10 minutes to create Cas complexes. Amplicon was added to the complexes to cut the amplicon at the target sites with a 4-base overhang on the opposite strand to the PAM. The overhang bases were filled in using DNA Polymerase I and 1 mM biotinylated dNTPs and/or 1 mM unlabeled dNTPs. DNA polymerase was used at 0.1 units/µL (1x) or 0.5 units/µL (5x). Streptavidin beads were added and bound to DNA containing biotinylated dNTPs. The reaction mixture was centrifuged and the beads separated from the supernatant. Bead and supernatant samples were analyzed using the Agilent TapeStation and Qubit™ Fluorometer to determine cutting efficiency.
Results
[00211] As shown in FIG. 4, the results of the experiment show that streptavidin beads were capable of binding to and isolating target fragments that had incorporated biotinylated dNTPs. Bands representing a biotinylated target fragment bound to beads were seen in both the 1x and 5x polymerase (“Enzyme”) conditions and with anywhere from 10% to 100% biotinylated dNTPs. Three negative controls, “cut control” lacking polymerase enzyme and biotinylated dNTPs, “no bind control” lacking Cas12, crRNA, and polymerase enzyme, and “bind control” which contained a biotinylated amplicon, did not contain biotinylated target fragment.
Example 5 - Positive Enrichment of Target Sequences using CRISPR and Library Generation
[00212] This example provides an exemplary process overview for Cas12a positive enrichment of target sequences. A flowchart of the experimental design is shown in FIG. 5 and a schematic of each step is shown in FIG. 6. This example demonstrates successful completion of a target-sequence enriched library using CRISPR to enrich sequences of interest.
End-repair cfDNA
[00213] Cell-free DNA comprising a target of interest was blunt end-repaired by incubating cfDNA, dNTPs and Klenow fragment (3’-5’ exo-) at 37°C for 30 minutes. A solid-phase reversible immobilization (SPRI) selection was used to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules.
Positive enrichment using Cas12a
[00214] Cas12a and crRNA were incubated at room temperature (25°C) for 10 minutes to create Cas complexes. The target specimen was then spiked into the complexes to cut the specimen at the target sites with a 4-base overhang on the opposite strand to the PAM. Biotinylated dNTPs and Klenow fragment (3’-5’ exo-; 0.1 units/µL) were then added and incubated at 37°C for 30 minutes to fill in the overhang. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted DNA fragments, excess enzymes, dNTPs and molecules. DNA comprising the target sequence (having biotinylated dNTPs) was bound to streptavidin beads using the biotin:streptavidin interaction. A series of washes removed off-target DNA molecules and the samples were enriched for on-target fragments and depleted for off-target fragments. Streptavidin beads with target DNA bound were resuspended in water.
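The fill-in step can be illustrated with a short sketch. Given a 4-base 5’ overhang, the polymerase extends the recessed 3’ end with the reverse complement of the overhang. The second function estimates the chance that a filled-in end carries at least one biotinylated nucleotide when only a fraction of the dNTP pool is labeled, assuming labeled and unlabeled dNTPs are incorporated with equal efficiency. That assumption, the example overhang, and the function names are hypothetical and are not taken from this disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def fill_in(overhang_5p):
    """Bases added to the recessed 3' end, written 5'->3'.

    overhang_5p is the single-stranded 5' overhang written 5'->3'; the
    polymerase reads it 3'->5' and adds the complementary bases.
    """
    return "".join(COMPLEMENT[base] for base in reversed(overhang_5p))

def prob_at_least_one_label(labeled_fraction, n_bases=4):
    """Probability that >= 1 of n_bases incorporated nucleotides is labeled,
    assuming independent incorporation at the pool fraction."""
    return 1.0 - (1.0 - labeled_fraction) ** n_bases

added = fill_in("ACGG")            # hypothetical 4-base overhang
print(added)                       # CCGT
print(round(prob_at_least_one_label(0.10), 4))  # 0.3439
print(prob_at_least_one_label(1.0))             # 1.0
```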
Unmethylated cytosines converted to uracils
[00215] Bisulfite conversion was performed on DNA bound to streptavidin beads by denaturation using high heat, preferential deamination (at an acidic pH) of unmethylated cytosines, which were then converted to uracil by desulfonation (at an alkaline pH). Methylated cytosines remained unchanged on the single-stranded DNA (ssDNA) product.
Generate library
[00216] The bisulfite converted ssDNA was then used to create a library (“library creation”, LC) using Adaptase® technology from IDT. This technology uses an enzymatic reaction resulting in unbiased addition of a truncated adapter. The Adaptase® enzymatic reaction performed end-repairing, tailing of 3’ ends and ligation of first truncated adapter complement to 3’ ends simultaneously. A uracil-free reverse complement to the bisulfite converted ssDNA was then generated using the truncated adapter to prime and extend. A solid-phase reversible immobilization (SPRI) selection was performed to remove unwanted ssDNA fragments, excess adapters and molecules. A ligation reaction was performed, adding truncated P5 adapter to the 3’ end of the uracil-free reverse complement fragment. A solid-phase reversible immobilization (SPRI) selection was used to remove unwanted ssDNA fragments, excess adapters and molecules. Indexing PCR amplification was performed with a high fidelity DNA polymerase and unique, known 10-bp barcodes. Indices allow for sample multiplexing for the downstream assay. The product was a bisulfite converted dsDNA library with full length adapters. Post-PCR, a SPRI selection was done to remove unwanted ssDNA fragments, excess primers, excess adapters and excess molecules. After library construction, the library quality and quantity were evaluated using the Agilent TapeStation and Qubit Fluorometer, respectively.
Sequencing of enriched library
[00217] Sequencing was performed on an iSeq using paired-end 150x150 base sequencing with a 5% PhiX spike-in. The sequencing data generated were then demultiplexed using the assigned barcodes, aligned to the human genome and trimmed. The cleaned-up data were then processed through a quality pipeline to collapse duplicate reads, and the sequencing data were evaluated. As shown in FIG. 7, the library exhibited a conversion efficiency of 99.04%.
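The post-sequencing steps can be sketched on toy data. The example below demultiplexes reads by a 10-base barcode, collapses exact duplicate reads, and computes a conversion efficiency. Because the text does not define how conversion efficiency was calculated, the sketch assumes one common definition: the percentage of cytosines outside a CpG context (presumed unmethylated) that are read as T. All barcodes, reads, sample names, and function names are hypothetical, and reads are assumed to be already aligned to the reference without gaps.

```python
from collections import defaultdict

def demultiplex(records, barcode_to_sample):
    """Group reads by sample; records are (barcode, read) pairs.
    Reads with an unrecognized barcode are discarded."""
    by_sample = defaultdict(list)
    for barcode, read in records:
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            by_sample[sample].append(read)
    return dict(by_sample)

def collapse_duplicates(reads):
    """Keep one copy of each exact read sequence, preserving order."""
    return list(dict.fromkeys(reads))

def conversion_efficiency(reference, reads):
    """Percent of non-CpG reference cytosines read as T (assumed definition)."""
    converted = total = 0
    for read in reads:
        for i, ref_base in enumerate(reference):
            is_cpg = reference[i : i + 2] == "CG"
            if ref_base == "C" and not is_cpg and read[i] in "CT":
                total += 1
                converted += read[i] == "T"
    return 100.0 * converted / total if total else float("nan")

reference = "ACCGTCAT"   # non-CpG cytosines at indices 1 and 5
barcodes = {"ACGTACGTAC": "sample_1", "TGCATGCATG": "sample_2"}
records = [
    ("ACGTACGTAC", "ATCGTTAT"),
    ("ACGTACGTAC", "ATCGTTAT"),   # duplicate of the previous read
    ("ACGTACGTAC", "ATCGTCAT"),   # cytosine at index 5 not converted
    ("TGCATGCATG", "ATCGTTAT"),
    ("GGGGGGGGGG", "ATCGTTAT"),   # unknown barcode, discarded
]

demux = demultiplex(records, barcodes)
unique = collapse_duplicates(demux["sample_1"])
efficiency = conversion_efficiency(reference, unique)
print(len(demux["sample_1"]), len(unique), efficiency)  # 3 2 75.0
```

Production pipelines typically collapse duplicates using alignment coordinates rather than exact sequence identity, so this sketch only illustrates the order of operations.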
Example 6 — Enrichment of a Target Region Within a Nucleic Acid Having Multiple Target Sites
The methods of Example 5 were repeated using gDNA as the nucleic acid source and CpG-5plex as the target, which contained multiple cut sites. Enrichment for one of the targets (CpG-4) within the CpG-5plex is shown in FIG. 8. As shown, the CpG-4 target is enriched in the resulting library, whereas a no Cas/crRNA (“no cut”) control and a library constructed using the gDNA without the enrichment steps (“no C-Select”) showed no enrichment. These results demonstrate that a library enriched for specific target sequences can be constructed using the methods of the disclosure.
INCORPORATION BY REFERENCE
[00218] The entire disclosure of each of the patent and scientific documents referred to herein is incorporated by reference for all purposes.
EQUIVALENTS
[00219] The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
TABLE 1 - List of CGIs
Reference Pos (hg19 coordinates)
1 chr13:108518334-108518633
2 chr6:137242315-137245442
3 chr2:177016416-177016632
4 chr5:2738953-2741237
5 chr4:111553079-111554210
6 chr15:96909815-96910030
7 chr6:42072032-42072701
8 chr10:123922850-123923542
9 chr16:86612188-86613821
10 chr19:47151768-47153125
11 chr1:110610265-110613303
12 chr5:3594467-3603054
13 chr9:126773246-126780953
14 chr3:138656627-138659107
15 chr4:4859632-4860191
16 chr10:118895963-118898037
17 chr7:103086344-103086840
18 chr19:407011-409511
19 chr10:22764708-22767050
20 chr16:86549069-86550512
21 chr9:96713325-96718186
22 chr8:139508795-139509774
23 chr2:73143055-73148260
24 chr8:26721642-26724566
25 chr9:129386112-129389231
26 chr12:49483601-49484255
27 chr16:54325040-54325703
28 chr8:72468560-72469561
29 chr18:70533965-70536871
30 chr9:98111364-98112362
31 chr1:50882997-50883426
32 chr10:88122924-88127364
33 chr11:31839353-31839813
34 chr10:101290025-101290338
35 chr6:41528266-41528900
36 chr16:51183699-51188763
37 chr5:140346105-140346931
38 chr9:23820691-23822135
39 chr20:690575-691099
40 chr1:177133392-177133846
41 chr5:45695394-45696510
42 chr2:45395869-45398186
43 chr20:48184193-48184833
44 chr6:6002471-6005125
45 chr14:101192851-101193499
chr8:4848968-4852635
chr8:53851701-53854426
chr12:186863-187610
chr5:54519054-54519628
chr6:108485671-108490539
chr3:157815581-157816095
chr11:626728-628037
chr2:177012371-177012675
chr17:59531723-59535254
chr16:55364823-55365483
chr8:99960497-99961438
chr7:42267545-42267823
chr17:14202632-14203258
chr10:102891010-102891794
chr5:174158680-174159729
chr14:33402094-33404079
chr2:177036254-177037213
chr10:106399567-106402812
chr6:166579973-166583423
chr11:123066517-123066986
chr11:44327240-44327932
chr14:95237622-95238211
chr9:102590742-102591303
chr15:76630029-75630970
chr4:24801109-24801902
chr8:97169731-97170432
chr3:6902823-6903516
chr22:48884884-48887043
chr15:45408573-45409528
chr9:100610696-100611517
chr4:174448333-174448845
chr16:20084707-20085305
chr4:174439812-174440249
chr6:10381558-10382354
chr15:35046443-35047480
chr10:119494493-119494991
chr5:72676120-72678421
chr11:44325657-44326517
chr17:46670522-46671458
chr14:92789494-92790712
chr4:174459200-174460054
chr2:80549578-80549798
chr7:153748407-153750444
chr6:1389139-1391393
chr16:49314037-49316543
chr2:105459127-105461770
chr21:38079941-38081833
chr4:174427891-174428192
chr14:60973772-60974123
chr8:99985733-99986983
chr2:63281034-63281347
chr12:101109863-101111622
chr1:119549144-119551320
chr5:38257825-38259136
chr5:54522302-54523533
chr1:165324191-165326328
chr15:33602816-33604003
chr10:118030732-118034230
chr2:45240372-45241579
chr4:174430386-174430861
chr6:50810642-50810994
chr5:122430676-122431443
chr10:109674196-109674964
chr8:97172634-97173880
chr8:11536767-11538961
chr5:180486154-180486892
chr2:38301276-38304518
chr10:1778784-1780018
chr12:54424610-54425173
chr17:46669434-46669811
chr11:8190226-8190671
chr8:25900562-25905842
chr12:81102034-81102716
chr7:27199661-27200960
chr10:119311204-119312104
chr12:130387609-130389139
chr7:155258827-155261403
chr6:117591533-117592279
chr10:111216604-111217083
chr1:29585897-29586598
chr2:144694656-144695180
chr12:48397889-48398731
chr5:2748368-2757024
chr12:114845861-114847650
chr2:80529677-80530846
chr5:1874907-1879032
chr6:100905952-100906686
chr15:96904722-96905050
chr5:134374385-134376751
chr2:66652691-66654218
chr12:54440642-54441543
chr6:108495654-108495986
chr17:70112824-70114271
chr3:87841796-87842563
chr7:96650221-96651551
chr4:110222970-110224257
chr6:78172231-78174088
chr7:155164557-155167854
chr12:113900750-113906442
chr9:112081402-112082905
chr12:114886354-114886579
chr5:3590644-3592000
chr2:119592602-119593845
chr20:21485932-21496714
chr18:11148307-11149936
chr17:46824785-46825372
chr10:100992156-100992687
chr14:36986362-36990576
chr18:55094825-55096310
chr15:96895306-96895729
chr17:36717727-36718593
chr2:223183013-223185468
chr7:30721372-30722445
chr1:53527572-53528974
chr18:56939624-56941540
chr5:175085004-175085756
chr10:50817601-50820356
chr14:60975732-60978180
chr15:89920793-89922768
chr9:122131086-122132214
chr1:217311467-217311773
chr14:38724254-38725537
chr14:61103978-61104663
chr18:73167402-73167920
chr1:50880916-50881516
chr2:241758141-241760783
chr11:31825743-31826967
chr7:27260101-27260467
chr20:41817475-41819212
chr3:238391-240140
chr7:121950249-121950927
chr5:72526203-72526497
chr15:96903311-96903711
chr10:26504383-26507434
chr6:100915602-100915883
chr1:18962842-18963481
chr3:127794369-127796136
chr7:27203915-27206462
chr8:25899335-25899692
chr12:114838312-114838889
chr6:38682949-38583265
chr11:31841315-31842003
chr4:174451828-174452962
chr9:129372737-129378106
chr2:176964062-176965509
chr2:176931575-176932663
chr12:114833911-114834210
chr11:79148358-79152200
chr2:177024501-177025692
chr5:172672311-172672971
chr7:27291119-27292197
chr1:180198119-180204975
chr14:37126786-37128274
chr2:200333687-200334172
chr14:58331676-58333121
chr3:147131066-147131333
chr13:109147798-109149019
chr14:48143433-48145589
chr6:100905444-100905697
chr17:14200579-14200996
chr6:1379693-1380014
chr1:34642382-34643024
chr2:119599059-119599299
chr2:119613031-119615565
chr4:85413997-85414874
chr9:17906419-17907488
chr12:29302034-29302954
chr20:10200088-10200384
chr8:57358126-57359415
chr10:63212495-63213009
chr2:176936246-176936809
chr11:20618197-20619920
chr18:19744936-19752363
chr14:29234889-29235908
chr17:46673532-46674181
chr4:144620822-144622218
chr16:82660651-82661813
chr3:192125821-192127994
chr2:119599458-119600966
chr22:44257942-44258612
chr19:13616752-13617267
chr3:147138916-147139564
chr9:969529-973276
chr18:55103154-55108853
chr4:174422024-174422443
chr4:57521621-57522703
chr15:79724099-79725643
chr14:37135513-37136348
chr10:23480697-23482455
chr2:45169505-45171884
chr18:30349690-30352302
chr6:99291327-99291737
chr9:21970913-21971190
chr4:107146-107898
chr12:117798076-117799448
chr2:219736132-219736592
chr10:118892161-118892639
chr11:27743472-27744564
chr12:65218245-65219143
chr12:75601081-75601752
chr7:54612324-54612558
chr6:100912071-100913337
chr10:102905714-102906693
chr8:87081653-87082046
chr6:50818180-50818431
chr1:91189139-91189400
chr2:118981769-118982466
chr10:50602989-50606783
chr17:59528979-59530266
chr4:147559205-147561901
chr1:4713989-4716555
chr13:102568425-102569495
chr16:6068914-6070401
chr22:29709281-29712013
chr10:100993820-100994188
chr6:391188-393790
chr2:176977284-176977540
chr4:4868440-4869173
chr6:137809342-137810204
chr12:54321301-54321721
chr2:105468851-105473488
chr8:55366180-55367628
chr12:72665683-72667551
chr4:54966163-54968063
chr5:134366913-134367438
chr1:226075150-226075680
chr20:17206528-17206952
chr4:172733734-172735118
chr18:55019707-55021605
chr2:162279835-162280709
chr6:1381743-1385211
chr7:103968783-103969959
chr6:150358872-150359394
chr2:119914126-119916663
chr7:27278945-27279469
chr12:114851957-114852360
chr16:24267040-24267527
chr6:7229877-7230865
chr2:45227644-45228783
chr4:174450046-174451469
chr4:154712073-154712706
chr3:22413492-22414365
chr20:21694472-21695344
chr6:1378445-1379318
chr8:70981873-70984888
chr12:53107912-53108471
chrl0:102996034-102996646 chr3:157821232-157821604 chr4:111554965-111555504 chrl3:58206526-58208930 chrl0:22634000-22634862 chr9:22005887-22006229 chr5:159399004-159399928 chr2:31805293-31806403 chr6:100903491-100903713 chr5:77268350-77268787 chrl4:85997468-85998637 chr5:92923487-92924497 chrll:64480199-64481344 chrl3:28366549-28368505 chr5:77805753-77806313 chr9:79633326-79636030 chr4:93226348-93227007 chr2:223170486-223171140 chrl:91172102-91172771 chrl:1181756-1182470 chr8:65281903-65283043 chrl0:94825546-94826320 chr6:108491033-108491410 chr21:38076752-38077685 chrl:91183240-91184540 chr3:147136903- 147137328 chrl5:96911511-96911808 chrl4:57274607-57276840 chrl3:112726281-112728419 chr2:171672310-171675447 chr8:11559596-11562956 chrl0:48438411-48439320 chrl8:59000683-59001692 chrl5:91642908-91643702 chr5:3592391-3592644 chrl9:56988313-55989741 chr6:26614013-26614851 chrll:27742059-27742273 chr3:147113608- 147114479 chrl4:57264638-57265561 chr7:155302253-155303158 chrll:31848487-31848776 chrl6:54970301-54972846 chrl9:30715549-30715753 chr9:96710811-96711717 chrl8:77557780-77558948 chr20:21686199-21687689 chrll:31847132-31847958 chrl6:86530747-86532994 chrl:203044722-203045390 chrl5:53096014-53096482 chr7:97361132-97363018 chrl4:29236835-29237832 chrl3:79182859-79183880 chrll:69517840-69519929 chrl:231296559-231297345 chrl9:8675333-8575699 chrl:63795363-63796140 chr4:90228714-90229010 chr3:62362610-62363082 chrl9:5827754-5828405 chrl0:125732220-125732843 chr9:136293566-136294160 chrl:53782394-63790471 chr4:4867386-4857673 chr9:133534534-133542394 chrl5:100913438-100914022 chrl0:101279941-101280382 chrl3:53419897-53422872 chrl:77747314-77748224 chrl4:36974548-36975425 chrl2:57618769-57619402 chr7:49813008-49815752 chr4:188916605-188916876 chrll:31831620-31839038 chr8:132052203-132054749 chr2:237071794-237078762 chr20:39994545-39995810 chrll:132812662-132813075 chr5:1707351S9-170739863 chrl:221051966-221053673 chr5:72529099-72529976 chrl4:36973169-35973740 
chr4:158141404-158141836 chrl4:103655241-103655928 chrl:65731411-65731849 chrl:38218190-38218977 chr3:128719865-128721245 chrl5:33009530-33011696 chr2:162275161-162275596 chr7:155241323-155243757 chrl9:46001830-46002686 chr6:137814355-137815202 chr7:70596228-70598382 chrl5:96959341-96960531 chrl6:66612749-66613412 chr6:110299365-110301267 chrl5:27215951-27216856 chrll:88241710-88242562 chr2:124782252-124783255 chrl7:70111979-70112308 chr2:63283936-63284147 chrl7:46800945-46801288 chr6:1393049-1394170 chr3:137489594-137491004 chrl5:60296135-60298520 chrl2:106979429-106981086 chrl2:54360374-54360660 chrl4:36991594-36992488 chr4:156129168-156130209 chr4:54975387-54976202 chr3:137482964-137484454 chrl0:118893527-118894432 chrl8:76737005-76741244 chrl0:110671724-110672326 chr5:71014917-71015715 chr6:50787285-50788091 chrl9:3868585-3869217 chr4:5894071-5895116 chrll:131780328-131781532 chr6:101846766-101847135 chrll:71952112-71952528 chr5:172663616-172664584 chr9:23822412-23822667 chr4:5891981-5892365 chrl:217310749-217311178 chrl0:108923780-108924805 chr6:100038655-100039477 chr7:121945345-121946235 chr3:147126988- 147128999 chr7:121956543-121957341 chr4:156680095-156681386 chr4:85404986-85405252 chrl:221064889-221065600 chrl7:73749618-73750178 chr8:55370170-55372525 chr6:70992040-70992912 chrl6:55513220-55513526 chr6:106433984-106434459 chrl4:29254365-29255069 chr6:33655965-33656238 chr9:19788215-19789288 chrll:115630398-115631117 chrl:34628783-34630976 chrl4:101923575-101925995 chrl7:72855621-72858012 chr2:223162946-223163912 chr4:85417659-85420799 chrl:156390403-156391581 chr3:147130342- 147130577 chr2:119602616-119604486 chr9:120175253-120177496 chr4:174443355-174443948 chr5:145724294- 145724551 chrll:32454874-32457311 chr2:176949511-176949795 chrl:18436551-18437673 chr3:26665950-26666164 chr3:170303044-170303249 chr2:223176493-223177515 chr2:182321761-182323029 chrl8:44789742-44790678 chrl7:46796234-46797292 chrl8:44772992-44775577 chr8:101117922-101118693 
chr7:27134097-27134303 chrl0:102507482-102509646 chrl9:39754973-39756540 chr7:26415746-26416891 chrl4:37116188-37117628 chr4:174421347-174421559 chr6:85472702-85474132 chr20:22557517-22559240 chr6:117198089-117198705 chrl0:71331926-71333392 chrl9:36334994-35335321 chr4:46995128-45995872 chr9:135455164-135458586 chr8:65290108-65290946 chrl0:94828102-94829040 chrl:116380359-116382364 chrl5:47476369-47477499 chr3:147115764- 147116421 chrl7:59485573-59485780 chrl0:23983366-23984978 chr2:176949993-176950336 chr9:137967110-137967727 chr2:176957054-176958279 chrll:119293320-119293943 chrll:132813562-132814395 chr2:237068071-237068834 chrl0:27547668-27548402 chr4:4866438-4866813 chr21:19617098-19617874 chrl:91185156-91185577 chrl9:15292399-15292632 chrl:145075483- 145075845 chr2:19560963-19561650 chrl4:57260878-57262123 chr8:55378928-55380186 chr6:99290279-99290771 chrl9:13124959-13125259 chrl5:27112030-27113479 chr8:145925410- 145926101 chrll:124629723-124629926 chr4:109093038-109094546 chr3:62356773-62357315 chrl4:37131181-37132785 chrl0:124905634-124906161 chr7:35296921-35298218 chrl9:36248979-36249307 chrl2:15475318-15475901 chr5:87985470-87985810 chrl2:54423427-54423712 chr7:96653467-96654199 chr2:45155195-45157049 chrl5:96896928-96897301 chrl2:58004982-58005351 chr2:176933131-176933449 chr2:176962179-176962487 chr20:25063838-25065525 chrl2:5153012-5154346 chr3:154146347-154146965
■ ' :165323486-165323811 chr21:38065179-38066185 chrl0:119000435-119001530 chrl2:45444202-45445386 chr4:158143296-158144053 chr5:76932317-76933523 chr5:172659049-172660277 chr2:223168653-223169008 chrl:248020330-248021252 chrl8:904578-909574 chrl2:127940451-127940907 chr9:135461934-135462909 chrl7:48041282-48043064 chr4:94755786-94756310 chrl0:130338695-130338994 chr2:119616133-119616825 chr2:177042751-177043444 chr2:105478600-105479188 chr5:172670829-172671824 chr2:176952695-176953297 chrl3:28549839-28550246 chrl3:112720564-112723582 chr6:100895773-100896062 chr7:136553854-136556194 chr6:127441553-127441760 chrl:119526782-119527192 chrl2:49484920-49485178 chr9:23850910-23851522 chr2:220299483-220300243 chr5:1881924-1887743 chr8:57360585-57360815 chrl8:74961556-74963822 chr5:172660720-172661133 chrl7:75277317-75278172 chrl0:99789614-99791320 chr2:176944087-176948446 chr4:154709512-154710827 chr5:140798757- 140799359 chr3:44063314-44063837 chrl5:79574830-79575211 chr2:223161531-223161919 chr6:134210639-134211218 chrl0:102899177-102899489 chrl3:79181944-79182222 chr7:71800757-71802768 chr3:186078710-186080111 chrl:24229115-24229537 chrl6:48844551-48845264 chr7:113724924-113727795 chr22:44726724-44727590 chr4:15779998-15780729 chr4:41869174-41869459 chrl:38941919-38942404 chr2:176971706-176972305 chr2:119607378-119607910 chr5:76934581-76935296 chrl2:103696090-103696418 chr5:63255044-63255407 chrl:221067447-221068185 chr2:119611296-119611881 chrl0:124907283-124911035 chrl2:114878143-114879155 chrl2:49371690-49375550 chrl7:36719544-36719938 chrl7:46696553-46696926 chr3:147142181- 147142391 chr8:9762661-9764748 chrl4:74706188-74708192 chr3:12837992-12838359 chr20:37352130-37357372 chrl0:8077829-8078378 chr4:4864456-4864834 chr4:13524062-13526083 chrl:66258440-66258918 chrll:17740789-17743779 chrl2:106975195-106975714 chr9:91792662-91793611 chrl:149333785- 149334111 chr3:170303532-170303768 chr5:72594147-72595808 chr5:145725286- 145725852 chrl0:23462224-23463889 
chr20:21689758-21690048 chrl5:53080458-53083699 chr2:154727906-154728271 chr5:170743178-170744107 chrl0:102899822-102900263 chr5:134368578-134370466 chr2:66808568-66809404 chr7:96651963-96652246 chrl:91190489-91192804 chrl7:75368688-75370506 chr4:185939222-185942747 chr7:43152020-43153340 chrl3:84453654-84453897 chr2:176956504-176956707 chr7:87563342-87564571 chr20:17208550-17208756 chr22:19746924-19747141 chr2:223159725-223160487 chrl2:131200509-131200726 chrl8:44336183-44337110 chr2:63285949-63287097 chr4:13526553-13526770 chrl5:89949373-89951130 chrl9:55815940-55816277 chrl7:50235175-50236466 chrl9:58545115-58545897 chrl2:113592203-113592620 chrl2:115109503-115110061 chr4:164264821-164265772 chrl:2772126-2772665 chr3:71834068-71834653 chrl2:5018585-5021171 chrl5:74419870-74423044 chr3:147108511- 147111703 chr5:88185224-88185589 chrl2:54354529-54355491 chrl0:101290625-101291178 chr8:11557852-11558252 chr8:105478672-105479340 chrll:20181200-20182325 chrl9:54483021-54483572 chrl3:112707804-112708696 chrl6:22824616-22826459 chr4:66536065-66536674 chr4:154713537-154714240 chr7:12151220-12151559 chrl2:119212110-119212393 chrl7:14201726- 14202052 chr20:21376358-21378245 chrl3:36045931-36046143 chrl5:60287107-60287663 chr9:100613938-100614622 chrl0:102475276-102475579 chr7:121940006-121940648 chr5:37834671-37835128 chrl:197887088-197887791 chrl2:99139386-99139769 chr6:1619093-1621094 chrl2:113917394-113918107 chrl4:24044886-24046760 chr5:77253832-77254049 chr4:85403830-85404524 chr6:166666837-166667541 chrl8:77547965-77549038 chr2:219848919-219850541 chrl7:7832532-7833164 chr5:134363092-134365146 chrl0:103043990-103044480 chr8:97171805-97172022 chr20:57089460-57090237 chrl2:114840853-114841063 chr4:66535193-66535620 chr8:85096759-85097247 chr6:10881846-10882051 chrl3:28498226-28499046 chrl:161695637-161697298 chrll:2890388-2891337 chrl7:5000369-5001205 chrl3:27334226-27335205 chrl0:22623350-22625875 chr2:157185557-157186355 chr7:20370003-20371504 chr4:961347-962155 
chrl2:49485766-49485977 chr3:62356119-62356378 chrll:14995128- 14995908 chrl2:53359192-53359507 chrl6:51168266-51169110 chrl4:57278709-57279116 chr6:37616722-37617179 chrl8:11750953-11752756 chrl9:45260352-45261809 chrl:119531991-119532196 chrl9:36523391-36523887 chrl2:52652018-52652743 chr8:49468683-49468959 chr8:9760750-9761643 chr7:19146923-19147308 chrl3:32889533-32889900 chr5:140797162- 140797701 chr21:42218489-42219222 chrl9:54411376-54411968 chr3:62354291-62355012 chrl2:113590806-113591304 chrl:225865068-225865328 chr7:130790358-130792773 chrl5:53076187-53077926 chrl:214158726-214159080 chrl2:3308812-3310270 chrl:39044059-39044561 chrl0:119312766-119313563 chrl2:65514878-65515863 chrl2:54366815-54369103 chrl2:114885105-114885418 chrl6:2228190-2230946 chrll:68622722-68623252 chr2:25499763-25500429 chr5:172661486-172662228 chrl7:46691520-46692097 chrl2:75602991-75603344 chr2:80531367-80531719 chr5:158478378-158478630 chr2:177017266-177017489 chr2:63282514-63283122 chr7:155595692-155599414 chr5:172665306-172666072 chrl2:114843022-114843610 chrl3:112758598-112760491 chr4:4858389-4858893 chrl6:55365814-55366022 chr9:96108466-96108992 chrl2:3475010-3475654 chr9:86152353-86153777 chr6:10384965-10385492 chr22:31500396-31501239 chr5:179228283-179229003 chr6:137816474-137817223 chr2:106681982-106682403 chrl4:95239375-95239679 chr7:154001964-154002281 chrl:1476093- 1476669 chrl5:89904822-89906050 chrll:89224416-89224718 chr9:100615234-100617510 chr3:172165372-172166738 chrl:202678881-202679769 chrl4:37053134-37053690 chr4:41875445-41875794 chr2:162273294-162273725 chrl:181287300-181287873 chrl3:79181327-79181614 chr8:145103285- 145108027 chr22:42305617-42307254 chr8:102505512-102506430 chrl7:74533281-74534566 chrl:214156000-214156851 chr20:2780978-2781497 chr4:4861227-4862241 chrl9:13215244-13215543 chr7:121943867-121944538 chrl7:71948478-71949255 chr2:127413696-127414171 chrl:113286332-113287172 chrl:47009575-47010132 chrl6:62069121-62070634 chrl6:3013651-3015131 
chrl8:76732970-76734765 chr4:155664819-155665833 chr6:72298274-72298528 chrl5:89147660-89149198 chrl7:33775294-33775794 chrl8:44337510-44338100 chrl0:8076002-8077261 chrl3:112717125-112717421 chrl5:89914363-89915061 chrl:228785986-228786204 chrl:156358050-156358252 chr7:751712-752150 chr3:137489051-137489409 chrl7:7905927-7907445 chrl8:35144907-35147628 chr3:9177691-9178189 chr6:10390888-10391098 chrl4:37052537-37052838 chrl:47909712-47911020 chrl3:93879245-93880877 chrl:50893468-50893745 chr7:27282085-27283136 chr4:147558231- 147558583 chrl9:13124569-13124788 chrl7:46619087-46619314 chr3:44596535-44597018 chrl4:24803678-24804353 chr2:3286324-3286530 chrl2:14134626- 14135242 chrl2:114881649-114881937 chr20:22548967-22549720 chr8:3782248S-37824008 chrl3:100641334-100642188 chr4:206377-206892 chr3:11034445-11035384 chr7:152622343-152623305 chrl0:22629360-22630328 chr4:140201064- 140201449 chrl9:46318490-46319266 chr3:121902742-121903645 chr9:77112712-77113583 chr2:114256775-114258043 chrl0:15761423-15762101 chrl:115880167-115881332 chr6:50791110-50791573 chr6:55039170-55039392 chr2:176980755-176981423 chr8:86350765-86351196 chr8:24812946-24814299 chr7:19184818-19185033 chr5:76936126-76936984 chr5:87980878-87981272 chr9:77111778-77112042 chrll:20622720-20623399 chrl:50882433-50882660 chrl7:35291899-35300875 chrl7:46675044-46675589 chr20:5296265-5297798 chr7:156871054-156871297 chr4:681313-681514 chr2:177039551-177039951 chrl7:46695325-46695553 chrl:41283840-41284591 chr9:16726859-16727273 chrl:65991001-65991811 chrl:181452706-181453073 chr8:120428398-120429178 chr3:32863174-32863415 chr4:134069152-134070442 chrl2:123754049-123754373 chr5:63256548-63257886 chr5:1879689-1879928 chrl0:118899247-118900329 chr20:2731063-2731395 chr5:134385967-134386370 chr2:177014948-177015214 chrl:67218079-67218293 chrll:65408344-65408631 chr7:156801418-156801632 chrl8:54788959-54789194 chr2:220173870-220174283 chr2:220173021-220173271 chrl2:113908887-113910681 chr6:100897080-100897621 
chrl:155290606-155291001 chr2:130763483-130763764 chrl2:129337870-129338653 chr21:34395128-34400245 chrl2:52115410-52115679 chr3:126113547-126113967 chrl6:3220438-3221356 chrl:119543056-119543454 chrl4:62279476-62280019 chrll:636906- 640628 chrl0:102893660-102895059 chr3:3840513-3842772 chrl:119529819-119530712 chr9:32782936-32783625 chrl9:1064897-1065191 chr5:54527319-54527760 chr7:156795355-156799394 chrl:155147185-155147444 chr9:37002489-37002957 chrll:69831571-69832484 chr2:128421719-128422182 chr22:38476836-38478839 chrl9:54412710-54413087 chr9:123656750-123656972 chr7:129422997-129423355 chrl9:36336275-36337138 chr2:50574045-50574817 chrl0:102975969-102978096 chr6:5996185-5996486 chr3:26664104-26664796 chr7:155170623-155170939 chr8:65286067-65286659 chrl4:37125219-37125661 chrll:65816404-65816665 chr6:41908745-41909711 chrl7:46620367-46621373 chr2:142887724- 142888553 chrl:221050448-221050864 chrl2:106974412-106974951 chrl4:57278068-57278287 chrl:67773329-67773767 chrl7:40936445-40936668 chr20:2729997-2730797 chrl2:113013099-113013529 chr7:155244046-155244357 chrl:214153214-214153668 chrl:156863415-156863711 chrl:114695136-114696672 chrl4:85996494-85996958 chr7:100823307-100823701 chr20:52789252-52790986 chr5:178421225-178422337 chrll:36397926-36399398 chrl3:36052553-36053119 chrl4:57283967-57284558 chr4:25090106-25090510 chr2:5831187-5831413 chr6:117869097-117869530 chrl9:58094739-58095764 chr4:85422929-85423190 chrl3:100547172-100547431 chr8:68864584-68864946 chrl6:49311413-49312308 chr7:19184221-19184686 chr2:19562749-19562965 chrl9:54481412-54481955 chrl0:124901907-124902617 chr3:62357639-62359774 chrll:31827696-31827921 chrl7:43037166-43037740 chr7:37955622-37956555 chr6:106429111-106429772 chr6:50682334-50683214 chr5:76923887-76924502 chr6:168841818-168843100 chr7:19145872-19146256 chr20:32856659-32857248 chrl7:79859808-79860963 chr7:95225503-95226194 chrl4:105167663-105168129 chrl7:14248391- 14248721 chrl6:84002269-84002860 chr9:104499849-104501076 
chr17:46604362-46604881 chr2:87015974-87018182 chr14:36990873-36991209 chr5:52777788-52777996 chr19:35633847-35634629 chr1:221055492-221055800 chr1:146551476-146551764 chr13:100642774-100643094 chr14:85999532-86000478 chr13:36049570-36050159 chr2:119606038-119606313 chr11:123065426-123066184 chr3:172167526-172167866 chr4:41882450-41882964 chr8:142528185-142529029 chr9:79637814-79638169 chr3:19189688-19190100 chr4:122301567-122302290 chr10:130339526-130339777 chr9:35846310-35846638 chr15:53097551-53098476 chr2:157184389-157184632 chr5:145718289-145720095 chr11:105481126-105481422 chr5:170741603-170742751 chr3:62355315-62355534 chr1:38219702-38220012 chr4:41881177-41881418 chr13:112715359-112716234 chr17:1880789-1881116 chr18:56887091-56887665 chr6:10390038-10390565 chr11:69516931-69517218 chr19:39737689-39739288 chr3:157812053-157812764 chr14:37049333-37051726 chr7:156409023-156409294 chr11:46366876-46367101 chr5:50685453-50686148 chr4:41883492-41884570 chr13:112709884-112712665 chr22:44287497-44288061 chr22:46440393-45441019 chr8:23562475-23565175 chr2:207506774-207507422 chr4:169799086-169799625 chr3:133393118-133393657 chr8:41424341-41425300 chr4:100870377-100871994 chr4:107956555-107957453 chr17:79314962-79320653 chr2:30453566-30455655 chr1:18956895-18959829 chr12:41086522-41087102 chr22:42685894-42686095 chr6:100914946-100915245
986 chr1:46951168-46951792
987 chr4:41749184-41749811
988 chrll:128419198-128419513
989 chr2:171671598-171671804
990 chrl:170630456-170630851
991 chr20:44657463-44659243
992 chr9:139096665-139096993
993 chr7:155174128-155175248
994 chrl4:36993488-36994488
995 chr3:138654837-138655363
996 chr4:5709985-5710495
997 chrl5:23157794-23158624
998 chr20:9496471-9496893
999 chr4:174437914-174438346
1000 chr5:140305712-140307193
1001 chrl5:79576059-79576270
1002 chrl4:38678245-38680937
1003 chrl0:102473206-102474026
1004 chrl7:59486727-59487132
1005 chr3:64253533-64253819
1006 chrl0:102484200-102484476
1007 chr7:27198182-27198514
1008 chr2:97192977-97193383
1009 chr9:77113709-77113927
1010 chr6:154360586-154361008
1011 chrll:44324875-44325087
1012 chr2:182521221-182521927
1013 chr7:124404700-124406189
1014 chr2:132182327-132183101
1015 chr7:101005899-101007443
1016 chr7:149744402-149746469
1017 chr8:50822270-50822860
1018 chr7:27227520-27229043
1019 chr6:134212690-134213098
1020 chrl3:36044844-36045481
1021 chrll:132934059-132934291
1022 chrl6:51189800-51190260
1023 chrl:155145342-155145938
1024 chr4:682724-683079
1025 chr5:92939795-92940216
1026 chrl0:134597357-134602649
1027 chrl:200009807-200010036
1028 chrl9:12666243-12666682
1029 chr9:97401286-97402067
1030 chr2:107103833-107104053
1031 chrl5:89910521-89912177
1032 chr5:140789094-140789762
1033 chr2:114033359-114033617
1034 chrl7:12568667-12569335
1035 chrll:68622108-68622339
1036 chrl:160340604-160340843
1037 chr7:103085710-103086132
1038 chrl5:76628998-76629207
1039 chr20:10198135-10198984
1040 chr20:44660342-44660948
1041 chrl7:35290403-35290663
1042 chrl7:933026-933236
1043 chr4:128544031-128544903
1044 chrl:50881884-50882103
1045 chrl0:125425495-125426642
1046 chrl7:46801784-46802071
1047 chrl:25255527-25259005
1048 chr3:32861141-32861429
1049 chrl7:70116274-70119998
1050 chrl0:75407413-75407706
1051 chr2:467849-468659
1052 chrll:132952538-132953307
1053 chr3:6904133-6904641
1054 chrl0:120353692-120355821
1055 chr7:20830567-20830817
1056 chrll:71950815-71951408
1057 chrl4:95240083-95240341
1058 chrl9:5829048-5829474
1059 chr20:9495253-9495597
1060 chr9:112083333-112083549
1061 chrl5:96873408-96877721
1062 chrl6:67208067-67208678
1063 chrl:175568376-175568808
1064 chr6:5999149-5999787
1065 chr3:129693127-129694841
1066 chr6:10383525-10384114
1067 chrll:636435-636668
1068 chrl:181451311-181452049
1069 chr9:135464586-135466240
1070 chrl5:60289325-60289533
1071 chrl6:49309123-49309353
1072 chrl:243646394-243646888
1073 chrl2:54071053-54071265
1074 chrl:91176404-91176701
1075 chr5:140864527-140864748
1076 chr4:47034427-47034940
1077 chr10:102489343-102491011
1078 chr10:102419147-102419668
1079 chr12:81471569-81472119
1080 chr6:50813314-50813699
1081 chr5:158526133-158526431
1082 chrl:119543821-119544339
1083 chr5:77140542-77140914
1084 chr8:23567180-23567678
1085 chrl:41831976-41832542
1086 chr2:139537692-139538650
1087 chr7:100075303-100075551
1088 chr2:176969217-176969895
1089 chr7:27284639-27286237
1090 chr5:31193952-31194419
1091 chr6:37616393-37616621
1092 chrl9:1748167-1750243
1093 chrl0:101281181-101282116
1094 chr21:31311386-31312106
1095 chr2:176973427-176973718
1096 chrl5:96900142-96900644
1097 chr7:158936507-158938492
1098 chr3:63263989-63264205
1099 chrl6:71459781-71460338
1100 chr7:155601175-155603235
1101 chrl2:54447744-54448091
1102 chrl2:53491572-53491955
1103 chrl0:16561604-16563822
1104 chrll:133994709-133995090
1105 chr2:137522460-137523696
1106 chrl7:12877270-12877773
1107 chr8:98289604-98290404
1108 chr4:185937242-185937750
1109 chr3:185911344-185912228
1110 chrl2:54378696-54380102
1111 chrl:221060850-221061071
1112 chrl2:63543636-63544967
1113 chr6:6006689-6007043
1114 chrl9:51169659-51172023
1115 chr1:1474962-1475220
1116 chrl4:54418677-54418881
1117 chr6:108497595-108497996
1118 chrl7:37764092-37764304
1119 chr4:109092578-109092839
1120 chrl:91182097-91182364
1121 chrl3:112760865-112761113
1122 chrl2:122018170-122018457
1123 chr7:142494563-142495248
1124 chr13:58203586-58204322
1125 chr1:92945907-92952609
1126 chr12:106977388-106977713
1127 chr5:76925445-76926875
1128 chrl6:3190765-3191389
1129 chrl:12123488-12124148
1130 chrl7:48545570-48546900
1131 chrl2:113916433-113916717
1132 chr4:41747508-41747944
1133 chrl9:46916587-46916862
1134 chrl5:49254984-49255564
1135 chrl9:8674332-8674764
1136 chr2:223167205-223167560
1137 chrl7:1173535-1174733
1138 chr3:75955759-75956308
1139 chr5:115697134-115697589
1140 chr8:21644908-21647845
1141 chr5:59189046-59189894
1142 chrl2:54338761-54339168
1143 chrl6:31053479-31053800
1144 chrl:50892437-50893243
1145 chrl7:40935964-40936180
1146 chrl9:44203558-44203987
1147 chr4:81109887-81110460
1148 chrl:2979275-2980758
1149 chrl6:49872449-49872926
1150 chrl:200008392-200009047
1151 chrl6:49316997-49317263
1152 chr2:114034594-114036041
1153 chr2:105480197-105480760
1154 chrl8:44777632-44778084
1155 chrl9:13213450-13213821
1156 chrl7:6616422-6617471
1157 chrl4:36977518-36977996
1158 chrl:214160798-214161034
1159 chrl:91182509-91182857
1160 chrl0:130508443-130508658
1161 chr2:154728944-154729328
1162 chrl5:89952271-89953061
1163 chrl8:55102427-55102708
1164 chr22:31198491-31199033
1165 chrl0:50821487-50821688
1166 chr7:100076454-100076785
1167 chrl8:13641584-13642415
1168 chrl8:13868532-13869026
1169 chr6:168841438-168841699
1170 chrl:61515875-61516831
1171 chr7:32110063-32110910
1172 chr7:56355508-56355798
1173 chr19:12767749-12767980
1174 chr19:19371675-19372393
1175 chr14:69256676-69257036
1176 chr17:75447477-75447821
1177 chr14:24801680-24802153
1178 chr5:148033472-148034080
1179 chrl0:125650820-125651373
1180 chrll:43568921-43569854
1181 chr22:37212769-37213467
1182 chr2:162283581-162284677
1183 chr8:130995921-130996149
1184 chrll:70508328-70508617
1185 chrl6:88943427-88943669
1186 chrl9:42891311-42891646
1187 chrl5:53079220-53079579
1188 chrl7:46690390-46691055
1189 chr4:41880224-41880500
1190 chrl:156105707-156106171
1191 chr6:5997027-5997414
1192 chrl:18964180-18964401
1193 chrl4:36983440-36983738
1194 chrl2:54445876-54446113
1195 chr5:87968635-87968907
1196 chrl:29587087-29587412
1197 chrll:60718428-60718888
1198 chr2:66672431-66673636
1199 chr4:81119095-81119391
1200 chrl0:76573195-76573507
1201 chr22:42322043-42322909
1202 chrl9:45898879-45900315
1203 chrl4:95826675-95826941
1204 chrl7:48194634-48195085
1205 chrl9:49669275-49669552
1206 chrl5:96897596-96898046
1207 chrl9:40314926-40315144
1208 chr9:120507227-120507642
1209 chr5:145722467-145722925
1210 chr3:19188246-19188772
1211 chr5:140787447-140788044
1212 chrl9:50881418-50881664
1213 chrl0:102896342-102896665
1214 chr7:53286851-53287192
1215 chrl5:89903446-89903720
1216 chrl0:23461300-23461610
1217 chr2:127783081-127783311
1218 chrll:72532612-72533774
1219 chr2:119605200-119605620
1220 chr18:12254147-12255089
1221 chr7:100817759-100817975
1222 chrl4:77736733-77737772
1223 chrl2:127212279-127212529
1224 chr2:119606569-119606826
1225 chrl:155264318-155265536
1226 chrl2:131199824-131200157
1227 chrl:91300979-91301891
1228 chr6:100909210-100909444
1229 chr6:4079052-4079443
1230 chr2:233251361-233253414
1231 chr4:960505-960836
1232 chrl9:21769189-21769786
1233 chrl0:102279162-102279730
1234 chrl2:127210778-127211651
1235 chrl2:54069625-54070177
1236 chrl5:53087211-53087488
1237 chrl3:28365545-28365785
1238 chrl2:113913615-113914322
1239 chrl4:51338712-51339146
1240 chr7:155604725-155605095
1241 chr3:62364017-62364316
1242 chr6:6008857-6009299
1243 chr3:46618307-46618669
1244 chrl7:33776553-33776888
1245 chrl2:58158855-58160000
1246 chr2:219857682-219858917
1247 chrl9:44278273-44278777
1248 chrl0:101282725-101282934
1249 chr20:2539133-2539877
1250 chrl2:58003880-58004249
1251 chrl6:51147490-51147944
1252 chrl:179544720-179545307
1253 chr2:71787430-71787897
1254 chrl0:129534410-129537366
1255 chr6:42145847-42146053
1256 chrl4:24802927-24803159
1257 chr22:29707479-29707797
1258 chr9:132459587-132460017
1259 chrl7:40937258-40937480
1260 chr4:151504011-151505085
1261 chrl:18967251-18968119
1262 chrl9:56598038-56600296
1263 chrl9:35633409-35633697
1264 chr2:171678546-171680358
1265 chr6:134638797-134639021
1266 chrl:36549554-36549965
1267 chr19:12833104-12833574
1268 chr3:137487429-137488021
1269 chr9:139715663-139716441
1270 chr6:37617863-37618147
1271 chrl7:32484007-32484280
1272 chr7:156409577-156409865
1273 chr5:11384681-11385521
1274 chr8:102504478-102504841
1275 chr20:33296514-33298242
1276 chr20:57415135-57417153
1277 chrl0:71331449-71331691
1278 chr3:75667777-75669067
1279 chrl6:67571252-67572728
1280 chrl9:36500169-36500530
1281 chr2:154729613-154729918
1282 chrl2:48399168-48399372
1283 chr4:41867385-41867586
1284 chrl7:46800533-46800746
1285 chr20:44685771-44687610
1286 chrl9:10406934-10407342
1287 chr6:108496715-108497320
1288 chr5:158523906-158524598
1289 chr9:124413512-124414193
1290 chr20:57427691-57427995
1291 chrl6:10912159-10912719
1292 chr7:149389654-149389976
1293 chrl:173638662-173639045
1294 chrl9:55597977-55598887
1295 chrl4:62279037-62279339
1296 chr3:13114627-13115245
1297 chr2:3750828-3751927
1298 chr4:85402764-85403175
1299 chrl7:74017769-74018658
1300 chr5:54523676-54523901
1301 chr7:89747892-89749036
1302 chrl8:72916107-72917233
1303 chr9:136294738-136295236
1304 chrl:201252452-201253648
1305 chr5:146888750-146889840
1306 chrl4:52734207-52735486
1307 chrl3:20875518-20876214
1308 chrl8:77560088-77560292
1309 chr2:102803672-102804556
1310 chr2:176982107-176982402
1311 chrl7:6679205-6679710
1312 chrl9:10463626-10464378
1313 chr5:140810494-140812617
1314 chr11:46299544-46300216
1315 chr11:64136814-64138187
1316 chr6:6007387-6007797
1317 chrl7:37321482-37322099
1318 chrl0:94455524-94455896
1319 chrl3:51417371-51418149
1320 chr8:11565217-11567212
1321 chrl:226127112-226127695
1322 chr2:3287874-3288228
1323 chr6:10882926-10883149
1324 chr22:19746155-19746369
1325 chr3:12838471-12838782
1326 chr9:36739534-36739782
1327 chr9:134429866-134430491
1328 chrll:70672834-70673055
1329 chrl4:24641053-24642220
1330 chr7:27283408-27283614
1331 chrl2:49182421-49182658
1332 chrl:44031286-44031853
1333 chrl:114696886-114697185
1334 chrl5:89901914-89902785
1335 chrll:65352231-65353134
1336 chr7:72838383-72838815
1337 chr22:38379093-38379964
1338 chr4:155663809-155664315
1339 chr9:100619984-100620192
1340 chr7:143582125-143582610
1341 chr7:23287221-23287508
1342 chrll:64815040-64815722
1343 chr2:87088816-87089037
1344 chr20:57426729-57427047
1345 chrl0:43428167-43429460
1346 chrl0:121577529-121578385
1347 chr4:190939801-190940591
1348 chr6:100037323-100037544
1349 chrl9:12880574-12880888
1350 chr2:171670110-171670549
1351 chr7:124404174-124404432
1352 chr7:97840559-97840845
1353 chrl9:50879606-50880094
1354 chrl:113265573-113265787
1355 chrl9:2424005-2427983
1356 chr3:127633993-127634588
1357 chrl0:50817095-50817309
1358 chr2:171676552-171676980
1359 chrl:86621278-86622871
1360 chrl:164545540-164545917
1361 chr22:19967279-19967808
1362 chr11:67350928-67351953
1363 chr20:36226617-36226841
1364 chr19:14089570-14089796
1365 chrl9:38700333-38700577
1366 chrl:18435566-18435904
1367 chr8:21905461-21905757
1368 chr2:176950595-176950846
1369 chrl7:75251958-75252180
1370 chrl5:37390175-37390380
1371 chr9:98113447-98113662
1372 chrl:40235767-40237190
1373 chr8:144811237-144811446
1374 chr8:99984584-99985072
1375 chr7:152621916-152622149
1376 chrl:40769186-40769871
1377 chrl9:2428349-2428731
1378 chrl7:15820620-15821325
1379 chr22:25081850-25082112
1380 chrl:19203874-19204234
1381 chr20:61703526-61704022
1382 chr2:237080188-237080432
1383 chrl:156338758-156339251
1384 chr1:149332993-149333389
1385 chr22:50496441-50497393
1386 chr7:27146069-27146600
1387 chrl3:100547633-100548911
1388 chr4:190939007-190939274
1389 chr7:73894815-73895110
1390 chrl9:35632356-35632572
1391 chrl6:67918679-67918909
1392 chr2:108602824-108603467
1393 chr2:238864315-238865170
1394 chr8:144808221-144810978
1395 chr8:145101631-145101834
1396 chr12:132905449-132906206
1397 chr6:99275763-99276038
1398 chr5:140800760-140801072
1399 chrl7:75242871-75243613
1400 chrl7:41278134-41278460
1401 chrl2:122016170-122017693
1402 chrl0:131264948-131265710
1403 chrl7:46631800-46632212
1404 chrl4:105167277-105167501
1405 chrl0:23982382-23982589
1406 chrl9:50931270-50931638
1407 chr3:27771638-27771942
1408 chr18:74799144-74800038
1409 chr1:21616380-21617101
1410 chr1:147782066-147782473
1411 chr7:6590563-6590957
1412 chr7:97839862-97840222
1413 chrl2:113914440-113914657
1414 chrl9:7933263-7934898
1415 chr20:22559553-22560001
1416 chrl5:53086629-53086858
1417 chrl0:94180315-94180754
1418 chr5:140052059-140053381
1419 chrl0:101287162-101287920
1420 chrl4:38677154-38677787
1421 chr22:39262338-39263211
1422 chrl8:74153239-74155073
1423 chrl5:59157045-59157594
1424 chr4:963804-964115
1425 chrll:624780-625053
1426 chr7:1362811-1363643
1427 chrl9:36246328-36247982
1428 chr5:54528095-54528404
1429 chrl2:54359658-54359906
1430 chr2:127782613-127782829
1431 chrl9:406131-405511
1432 chrl7:46697413-45697701
1433 chrl8:43608140-43608510
1434 chrl6:23724270-23724775
1435 chrl8:55922987-55924068
1436 chrl5:60291879-60292167
1437 chrl4:92788913-92789204
1438 chrl9:1108394-1109610
1439 chrll:124628367-124629590
1440 chrl:32052471-32052771
1441 chrl9:11594372-11594987
1442 chrl9:870774-871318
1443 chr2:54086775-54087266
1444 chr2:241459632-241460047
1445 chr7:127990926-127992616
1446 chrl:208132327-208133117
1447 chr7:90893567-90896683
1448 chrl:41284847-41285149
1449 chrll:32452144-32452708
1450 chr5:77146998-77147785
1451 chrl9:45901452-45901688
1452 chr7:6661875-6662695
1453 chr6:161188084-161188639
1454 chrl7:934417-935088
1455 chr11:65409636-65410127
1456 chr17:19883325-19883610
1457 chrl8:77549524-77550299
1458 chrl:38461584-38461988
1459 chrl9:10464666-10464927
1460 chrl7:70120139-70120442
1461 chr7:27147589-27148389
1462 chr2:31806545-31806782
1463 chrll:119292689-119292891
1464 chrl9:18979351-18981200
1465 chr6:42879279-42879623
1466 chrl2:130908777-130909191
1467 chrl7:46629553-46629816
1468 chrl:202162958-202163390
1469 chrl7:21367114-21367592
1470 chrl6:84001805-84002011
1471 chrl:221057463-221057757
1472 chrl7:27899511-27900067
1473 chrl5:40268581-40269061
1474 chr22:37465056-37465331
1475 chrl7:77805866-77809046
1476 chrl9:13198699-13198999
1477 chr3:184056419-184056671
1478 chr22:37911979-37912258
1479 chrl9:19368708-19369681
1480 chrll:64135815-64136381
1481 chrl8:77552401-77552603
1482 chrl9:58554354-58554587
1483 chr20:57414595-57414896
1484 chr4:190938106-190938848
1485 chr5:172110282-172111166
1486 chrl6:68480864-68482822
1487 chr9:139395020-139395287
1488 chrl2:113515164-113515970
1489 chrl:221054554-221054888
1490 chr8:144990270-145002135
1491 chr9:131154346-131155923
1492 chr6:150335525-150336278
1493 chr9:115824684-115825033
1494 chrl2:54519768-54520457
1495 chr6:35479872-35480154
1496 chrl9:3870788-3871043
1497 chrl9:48965002-48965792
1498 chr6:35479388-35479678
1499 chrl2:52408381-52408675
1500 chrl:221068782-221069159
1501 chr6:46655262-45556738
1502 chr3:55508335-55508708
1503 chr1:39980365-39981768
1504 chrl6:3067521-3068358
1505 chrl:1473107-1473342
1506 chrl0:105362549-105362827
1507 chrl7:46698880-46699083
1508 chr2:198029068-198029438
1509 chr20:17209418-17209622
1510 chrl2:49183049-49183282
1511 chrl6:58030214-58031633
1512 chrl0:94820026-94823252
1513 chrll:725596-726870
1514 chr6:170732119-170732442
1515 chr12:120835586-120835927
1516 chr20:36012595-36013439
1517 chr8:143545445-143546178
1518 chr6:27228100-27228364
1519 chr21:32624144-32624382
1520 chr9:95477296-95477708
1521 chr10:105420685-105421076
1522 chr1:1470604-1471450
1523 chr1:146552328-146552577
1524 chr19:33625467-33625805
1525 chr11:64478843-64479598
1526 chr20:57428308-57428516
1527 chr7:27182613-27185562
1528 chr19:51815157-51815458
1529 chr17:46607804-46608390
1530 chr12:52408860-52409121
1531 chr19:10405924-10406398
1532 chr11:14993452-14993661
1533 chr19:13135317-13136169
1534 chr7:750788-751237
1535 chr1:53742297-53742845
1536 chr1:200010625-200010832
1537 chr5:139138875-139139242
1538 chr17:45949676-45949885
1539 chr3:128722283-128723036
1540 chr15:89312719-89313183
1541 chr9:135039673-135039978
1542 chr19:12831793-12832225
1543 chr20:51589707-51590020
1544 chr20:3145121-3145746
1545 chr8:65710990-65711722
1546 chr11:128694084-128694688
1547 chr2:20870006-20871280
1548 chr19:18977466-18977833
1549 chr3:49947621-49948430
1550 chr6:30139718-30140263
1551 chr12:104697348-104697984
1552 chr10:105361784-105362188
1553 chr6:29894140-29895117
1554 chr4:187219320-187219745
1555 chr15:67073306-67073943
1556 chr2:220412341-220412678
1557 chr6:170730395-170730887
1558 chr9:115822071-115823416
1559 chr1:10764449-10764925
1560 chr17:46627787-46628444
1561 chr19:51601822-51602260
1562 chr19:55814067-55814278
1563 chr6:138745348-138745593
1564 chr9:124987743-124991086
1565 chr22:46318693-46319087
1566 chr16:3013016-3013228
1567 chr4:114900355-114900810
1568 chr19:1063544-1064265
1569 chr19:1110399-1110701
1570 chr7:97841636-97842005
1571 chr8:57359899-57360114
1572 chr17:72915558-72916510
1573 chr1:16860873-16862296
1574 chr17:75398284-75398527
1575 chr9:139397412-139397710
1576 chr6:33393592-33393908
1577 chr6:29595298-29595795
1578 chr12:6438272-6438931
1579 chr3:113160299-113160641
1580 chr1:55505060-55506015
1581 chr11:132951692-132952260
1582 chr4:81118137-81118603
1583 chr19:38876070-38876332
1584 chr19:58549305-58549712
1585 chr17:43472527-43474343
1586 chr9:139396205-139397040
1587 chr16:3192181-3192669
1588 chr6:33048416-33048814
1589 chr7:128555329-128556650
1590 chr19:46915311-46915802
1591 chr6:30095173-30095610

Table 2: Example CGIs
[Table content provided as images in the original publication: Figures imgf000095_0001 through imgf000109_0001]
Table 3: Additional Example CGIs
[Table content provided as images in the original publication: Figures imgf000110_0001 through imgf000128_0001]
Table 4: Additional Example CGIs
[Table content provided as images in the original publication: Figures imgf000128_0002 through imgf000138_0001]

Claims

WHAT IS CLAIMED IS:
1. A nucleic acid enrichment method, the method comprising: cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
2. The method of claim 1, wherein the cutting step is performed by a nuclease.
3. The method of claim 2, wherein the nuclease is a CRISPR-Cas nuclease.
4. The method of claim 3, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
5. The method of claim 3 or claim 4, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
6. The method of claim 5, wherein the nuclease is a Cas12a/Cpf1 nuclease.
7. The method of any one of claims 3-6, wherein the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
8. The method of any one of claims 1-7, wherein the cutting step is performed at room temperature.
9. The method of any one of claims 1-8, wherein the overhang is filled in using a DNA polymerase.
10. The method of claim 9, wherein the DNA polymerase is DNA polymerase I.
11. The method of claim 10, wherein the DNA polymerase I consists of the Klenow fragment.
12. The method of any one of claims 1-11, wherein the label comprises biotin or digoxigenin.
13. The method of any one of claims 1-12, wherein the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
14. The method of any one of claims 1-13, wherein the capture domain comprises or is connected to a solid support.
15. The method of claim 14, wherein the solid support is a bead, a well, a tube, or a slide.
16. The method of claim 15, wherein the capture domain comprises streptavidin connected to the bead.
17. The method of any one of claims 1-16, wherein the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library.
18. The method of any one of claims 1-16, wherein the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
19. The method of claim 18, wherein the nucleic acid sample is a plasma sample, and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
20. The method of claim 18 or 19, wherein the nucleic acid sample comprises cell free DNA (cfDNA).
21. The method of claim 20, wherein cytosines in the cfDNA have been converted to uracils.
22. The method of claim 20 or 19, wherein the cfDNA has been treated with bisulfite.
23. The method of any one of claims 1-16 and 18-22, wherein the method further comprises preparing a library before or after enriching the molecule that includes the target.
24. The method of any one of claims 1-20 and 23, the method further comprising the step of converting methylated cytosines to uracils.
25. The method of any one of claims 1-24, the method further comprising a wash step to remove nucleic acid molecules that do not include the target.
26. The method of any one of claims 1-25, the method further comprising amplifying the nucleic acid molecule.
27. The method of claim 26, wherein the amplification occurs while the nucleic acid is in contact with the capture domain.
28. The method of any one of claims 1-27, the method further comprising sequencing the enriched molecule.
29. The method of any one of claims 1-28, the method further comprising separating the nucleic acid molecule from the capture domain.
30. The method of claim 29, wherein the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
31. The method of claim 29 or 30, wherein the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
32. The method of any one of claims 1-31, wherein the method further comprises an additional enrichment step.
33. The method of claim 32, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
34. The method of claim 32 or 33, wherein the additional enrichment step comprises hybrid capture.
35. The method of any one of claims 32-34, wherein the additional enrichment step comprises using a nucleic acid binding protein.
36. A method of capturing a nucleic acid molecule having a target sequence, the method comprising: cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and capturing the molecule that includes the target by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
37. The method of claim 36, wherein the cutting step is performed by a nuclease.
38. The method of claim 37, wherein the nuclease is a CRISPR-Cas nuclease.
39. The method of claim 38, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
40. The method of claim 38 or claim 39, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
41. The method of claim 40, wherein the nuclease is a Cas12a/Cpf1 nuclease.
42. The method of any one of claims 36-41, wherein the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
43. The method of any one of claims 36-42, wherein the cutting step is performed at room temperature.
44. The method of any one of claims 36-43, wherein the overhang is filled in using a DNA polymerase.
45. The method of claim 44, wherein the DNA polymerase is DNA polymerase I.
46. The method of claim 45, wherein the DNA polymerase I consists of the Klenow fragment.
47. The method of any one of claims 36-46, wherein the label comprises biotin or digoxigenin.
48. The method of any one of claims 36-47, wherein the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
49. The method of any one of claims 36-48, wherein the capture domain comprises or is connected to a solid support.
50. The method of claim 49, wherein the solid support is a bead, a well, a tube, or a slide.
51. The method of claim 50, wherein the capture domain comprises streptavidin connected to the bead.
52. The method of any one of claims 36-51, wherein the nucleic acid molecule is present in a nucleic acid sequencing library, and the method captures target sequences of interest in the library.
53. The method of any one of claims 36-51, wherein the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
54. The method of claim 53, wherein the nucleic acid sample is a plasma sample, and the plasma sample is used directly in the method of capturing a nucleic acid molecule without prior enrichment or purification of the nucleic acid.
55. The method of claim 53, wherein the nucleic acid sample comprises cfDNA.
56. The method of claim 55, wherein cytosines in the cfDNA have been converted to uracils.
57. The method of claim 55 or 56, wherein the cfDNA has been treated with bisulfite.
58. The method of any one of claims 36-51 and 53-57, wherein the method further comprises preparing a library before or after capturing the molecule that includes the target.
59. The method of any one of claims 36-55 and 58, the method further comprising the step of converting methylated cytosines to uracils.
60. The method of any one of claims 36-59, the method further comprising a wash step to remove nucleic acid molecules that do not include the target.
61. The method of any one of claims 36-60, the method further comprising amplifying the nucleic acid molecule.
62. The method of claim 61, wherein the amplification occurs while the nucleic acid is in contact with the capture domain.
63. The method of any one of claims 36-62, the method further comprising sequencing the captured molecule.
64. The method of any one of claims 36-63, the method further comprising separating the nucleic acid molecule from the capture domain.
65. The method of claim 64, wherein the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
66. The method of claim 64 or 65, wherein the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
67. The method of any one of claims 36-66, wherein the method further comprises an additional enrichment step.
68. The method of claim 67, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
69. The method of claim 67 or 68, wherein the additional enrichment step comprises hybrid capture.
70. The method of any one of claims 67-69, wherein the additional enrichment step comprises using a nucleic acid binding protein.
71. A nucleic acid enrichment method, the method comprising: cutting a nucleic acid molecule that includes a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; filling in the overhang with at least one labeled nucleotide; and enriching the molecule that includes the target by separating labeled molecules from unlabeled molecules.
72. The method of claim 71, wherein the cutting step is performed by a nuclease.
73. The method of claim 72, wherein the nuclease is a CRISPR-Cas nuclease.
74. The method of claim 73, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
75. The method of claim 73 or claim 74, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
76. The method of claim 75, wherein the nuclease is a Cas12a/Cpf1 nuclease.
77. The method of any one of claims 71-76, wherein the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
78. The method of any one of claims 71-77, wherein the cutting step is performed at room temperature.
79. The method of any one of claims 71-78, wherein the overhang is filled in using a DNA polymerase.
80. The method of claim 79, wherein the DNA polymerase is DNA polymerase I.
81. The method of claim 80, wherein the DNA polymerase I consists of the Klenow fragment.
82. The method of any one of claims 71-81, wherein the label comprises biotin, digoxigenin, or a fluorophore.
83. The method of any one of claims 71-82, wherein the capture domain comprises or is connected to a solid support.
84. The method of claim 83, wherein the solid support is a bead, a well, a tube, or a slide.
85. The method of claim 84, wherein the capture domain comprises streptavidin connected to the bead.
86. The method of any one of claims 71-85, wherein the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences of interest in the library.
87. The method of any one of claims 71-86, wherein the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
88. The method of claim 87, wherein the nucleic acid sample is a plasma sample, and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
89. The method of claim 88, wherein the nucleic acid sample comprises cell free DNA (cfDNA).
90. The method of claim 89, wherein cytosines in the cfDNA have been converted to uracils.
91. The method of claim 89 or 90, wherein the cfDNA has been treated with bisulfite.
92. The method of any one of claims 71-85 and 87-91, wherein the method further comprises preparing a library before or after enriching the molecule that includes the target.
93. The method of any one of claims 71-89 and 92, the method further comprising the step of converting methylated cytosines to uracils.
94. The method of any one of claims 71-93, wherein the method includes a wash step.
95. The method of any one of claims 71-94, the method further comprising amplifying the nucleic acid molecule.
96. The method of claim 95, wherein the amplification occurs while the nucleic acid is in contact with the capture domain.
97. The method of any one of claims 71-96, the method further comprising sequencing the enriched molecule.
98. The method of any one of claims 71-97, the method further comprising separating the nucleic acid molecule from the capture domain.
99. The method of claim 98, wherein the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
100. The method of claim 98 or 99, wherein the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
101. The method of any one of claims 71-100, wherein the method further comprises an additional enrichment step.
102. The method of claim 101, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
103. The method of claim 101 or 102, wherein the additional enrichment step comprises hybrid capture.
104. The method of any one of claims 101-103, wherein the additional enrichment step comprises using a nucleic acid binding protein.
105. A method of producing a nucleic acid library enriched for regions of interest, the method comprising: cutting a plurality of nucleic acid molecules comprising regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; and enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains.
106. The method of claim 105, wherein the cutting step is performed by a nuclease.
107. The method of claim 106, wherein the nuclease is a CRISPR-Cas nuclease.
108. The method of claim 107, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
109. The method of claim 107 or claim 108, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
110. The method of claim 109, wherein the nuclease is a Cas12a/Cpf1 nuclease.
111. The method of any one of claims 105-110, wherein the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecules that include the regions of interest.
112. The method of any one of claims 105-111, wherein the cutting step is performed at room temperature.
113. The method of any one of claims 105-112, wherein the overhangs are filled in using a DNA polymerase.
114. The method of claim 113, wherein the DNA polymerase is DNA polymerase I.
115. The method of claim 114, wherein the DNA polymerase I consists of the Klenow fragment.
116. The method of any one of claims 105-115, wherein the label comprises biotin, digoxigenin, or a fluorophore.
117. The method of any one of claims 105-116, wherein the capture domains comprise or are connected to solid supports.
118. The method of claim 117, wherein the solid supports are beads, wells, tubes, or slides.
119. The method of claim 118, wherein the capture domains comprise streptavidin connected to beads.
120. The method of any one of claims 105-119, the method further comprising amplifying the nucleic acid molecules.
121. The method of claim 120, wherein the amplifying is performed with primers that comprise adapters to facilitate sequencing of the nucleic acid molecules.
122. The method of any one of claims 105-121, wherein the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
123. The method of claim 122, wherein the nucleic acid sample is a plasma sample, and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
124. The method of claim 123, wherein the nucleic acid sample comprises cell free DNA (cfDNA).
125. The method of claim 124, wherein cytosines in the cfDNA have been converted to uracils.
126. The method of claim 124 or 125, wherein the cfDNA has been treated with bisulfite.
127. The method of any one of claims 105-124, the method further comprising the step of converting methylated cytosines to uracils.
128. The method of any one of claims 105-127, the method further comprising a wash step to remove nucleic acid molecules that do not include the regions of interest.
129. The method of any one of claims 105-128, the method further comprising amplifying the nucleic acid molecule.
130. The method of claim 129, wherein the amplification occurs while the nucleic acid is in contact with the capture domain.
131. The method of any one of claims 105-130, the method further comprising separating the nucleic acid molecules from the capture domains.
132. The method of claim 131, wherein the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
133. The method of claim 131 or 132, wherein the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
134. The method of any one of claims 105-133, wherein the method further comprises an additional enrichment step.
135. The method of claim 134, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
136. The method of claim 134 or 135, wherein the additional enrichment step comprises hybrid capture.
137. The method of any one of claims 134-136, wherein the additional enrichment step comprises using a nucleic acid binding protein.
138. A method for producing a nucleic acid library enriched for regions of interest, the method comprising: obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; optionally converting methylated cytosines to uracils; adding nucleic acid adapters to the plurality of nucleic acids to form a nucleic acid library; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains; and amplifying the molecules that include the regions of interest to form the nucleic acid library enriched for regions of interest.
139. A method for producing a nucleic acid library enriched for regions of interest, the method comprising: obtaining a sample comprising a plurality of nucleic acids, wherein a subset of the plurality of nucleic acids comprise regions of interest; cutting the subset of the plurality of nucleic acid molecules having regions of interest to generate single stranded overhangs at cut ends of the molecules that include the regions of interest; filling in each overhang with at least one labeled nucleotide; and enriching the molecules that include the regions of interest by contacting the labeled nucleotides in the molecule with capture domains; removing the molecules that include the regions of interest from the capture domains; optionally converting methylated cytosines to uracils; and adding nucleic acid adapters to the plurality of nucleic acids to form the nucleic acid library enriched for regions of interest.
140. The method of claim 138 or 139, wherein the cutting is performed by a nuclease.
141. The method of claim 140, wherein the nuclease is a CRISPR-Cas nuclease.
142. The method of claim 141, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
143. The method of claim 141 or claim 142, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
144. The method of claim 143, wherein the nuclease is a Cas12a/Cpf1 nuclease.
145. The method of any one of claims 138-144, wherein the nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
146. The method of any one of claims 138-145, wherein the cutting step is performed at room temperature.
147. The method of any one of claims 138-146, wherein the overhang is filled in using a DNA polymerase.
148. The method of claim 147, wherein the DNA polymerase is DNA polymerase I.
149. The method of claim 148, wherein the DNA polymerase I consists of the Klenow fragment.
150. The method of any one of claims 138-149, wherein the label comprises biotin or digoxigenin.
151. The method of any one of claims 138-150, wherein the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
152. The method of any one of claims 138-151, wherein the capture domain comprises or is connected to a solid support.
153. The method of claim 152, wherein the solid support is a bead, a well, a tube, or a slide.
154. The method of claim 153, wherein the capture domain comprises streptavidin connected to the bead.
155. The method of any one of claims 138-154, wherein the method further comprises an additional enrichment step.
156. The method of claim 155, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
157. The method of claim 155 or 156, wherein the additional enrichment step comprises hybrid capture.
158. The method of any one of claims 155-157, wherein the additional enrichment step comprises using a nucleic acid binding protein.
159. A nucleic acid library, produced by the method of any one of claims 138-158.
160. A kit comprising: a nuclease that cuts a nucleic acid molecule including a target sequence to generate a single stranded overhang at a cut end of the molecule that includes the target; labeled dNTPs; DNA polymerase; and a capture moiety comprising a capture domain.
161. The kit of claim 160, wherein the nuclease is a CRISPR-Cas nuclease.
162. The kit of claim 161, wherein the nuclease is a type II or a type V CRISPR-Cas nuclease.
163. The kit of claim 161 or claim 162, wherein the nuclease is a Cas9, Cas12, or CasX nuclease.
164. The kit of claim 163, wherein the nuclease is a Cas12a/Cpf1 nuclease.
165. The kit of any one of claims 160-164, wherein the DNA polymerase is DNA polymerase I.
166. The kit of claim 165, wherein the DNA polymerase I consists of the Klenow fragment.
167. The kit of any one of claims 160-166, wherein the label comprises biotin, digoxigenin, or a fluorophore.
168. The kit of any one of claims 160-167, wherein the capture moiety comprises a solid support.
169. The kit of claim 168, wherein the solid support is a bead, a well, a tube, or a slide.
170. The kit of claim 169, wherein the capture domain comprises streptavidin connected to the bead.
171. A nucleic acid enrichment method comprising the steps of:
(a) designing a first set of guide RNAs to bind a first set of target sequences for cleavage with a first nuclease,
(b) designing a second set of guide RNAs to bind a second set of target sequences for cleavage with a second nuclease,
(c) adding the first and second sets of guide sequences and the first and second nucleases to a nucleic acid comprising a plurality of target sequences,
(d) generating single stranded overhangs at the cleavage sites in the first and second sets of target sequences,
(e) filling in each overhang with at least one labeled nucleotide; and
(f) enriching the target sequences by contacting at least one of the labeled nucleotides in the molecule with a capture domain.
172. The method of claim 171, wherein the first nuclease or the second nuclease is a CRISPR-Cas nuclease.
173. The method of claim 172, wherein the first nuclease or the second nuclease is a type II or a type V CRISPR-Cas nuclease.
174. The method of claim 172 or claim 173, wherein the first nuclease or the second nuclease is a Cas9, Cas12, or CasX nuclease.
175. The method of claim 174, wherein the first nuclease or the second nuclease is a Cas12a/Cpf1 nuclease.
176. The method of any one of claims 171-175, wherein the first nuclease or the second nuclease is associated with a guide RNA (gRNA) comprising a spacer sequence, wherein the spacer sequence binds to the nucleic acid molecule that includes the target sequence.
177. The method of any one of claims 171-176, wherein the cutting step is performed at room temperature.
178. The method of any one of claims 171-177, wherein the overhang is filled in using a DNA polymerase.
179. The method of claim 178, wherein the DNA polymerase is DNA polymerase I.
180. The method of claim 179, wherein the DNA polymerase I consists of the Klenow fragment.
181. The method of any one of claims 171-180, wherein the label comprises biotin or digoxigenin.
182. The method of any one of claims 171-181, wherein the capture domain comprises avidin, streptavidin, or a DIG-binding protein.
183. The method of any one of claims 171-182, wherein the capture domain comprises or is connected to a solid support.
184. The method of claim 183, wherein the solid support is a bead, a well, a tube, or a slide.
185. The method of claim 184, wherein the capture domain comprises streptavidin connected to the bead.
186. The method of any one of claims 171-185, wherein the nucleic acid molecule is present in a nucleic acid sequencing library, and the method enriches target sequences in the library.
187. The method of any one of claims 171-185, wherein the nucleic acid molecule was obtained from a nucleic acid sample from a subject.
188. The method of claim 187, wherein the nucleic acid sample is a plasma sample, and the plasma sample is used directly in the nucleic acid enrichment method without prior enrichment or purification of the nucleic acid.
189. The method of claim 187 or 188, wherein the nucleic acid sample comprises cell free DNA (cfDNA).
190. The method of claim 189, wherein cytosines in the cfDNA have been converted to uracils.
191. The method of claim 189 or 190, wherein the cfDNA has been treated with bisulfite.
192. The method of any one of claims 171-185 and 187-191, wherein the method further comprises preparing a library before or after enriching the molecule that includes the target.
193. The method of any one of claims 171-189 and 192, the method further comprising the step of converting methylated cytosines to uracils.
194. The method of any one of claims 171-193, the method further comprising a wash step to remove nucleic acid molecules that do not include the target.
195. The method of any one of claims 171-194, the method further comprising amplifying the nucleic acid molecule.
196. The method of claim 195, wherein the amplification occurs while the nucleic acid is in contact with the capture domain.
197. The method of any one of claims 171-196, the method further comprising sequencing the enriched molecule.
198. The method of any one of claims 171-197, the method further comprising separating the nucleic acid molecule from the capture domain.
199. The method of claim 198, wherein the separating step is performed using heat elution, a chemical agent, mechanical disruption, or combinations thereof.
200. The method of claim 198 or 199, wherein the method further comprises amplifying the nucleic acid after separation of the nucleic acid from the capture domain.
201. The method of any one of claims 171-200, wherein the method further comprises an additional enrichment step.
202. The method of claim 201, wherein the target sequence comprises a plurality of target sequences, and the enrichment step enriches a subset of the target sequences.
203. The method of claim 201 or 202, wherein the additional enrichment step comprises hybrid capture.
204. The method of any one of claims 201-203, wherein the additional enrichment step comprises using a nucleic acid binding protein.
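
For illustration only, the cut, fill-in, and capture steps recited in claim 1 can be modeled in silico. The following Python sketch is not part of the claimed subject matter. The names Molecule, cut_and_fill, and capture, the explicit cut offsets, and the choice of a labeled dCTP analog are all assumptions made for the example. The sketch also assumes that molecules that are not cut acquire no label, which simplifies away any pre-existing overhangs at fragment ends.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Molecule:
    name: str
    top: str          # top strand, written 5'->3'
    labels: int = 0   # labeled nucleotides incorporated during fill-in


def cut_and_fill(mol, top_cut, bottom_cut, labeled_base="C"):
    """Model a staggered cut followed by polymerase fill-in.

    top_cut and bottom_cut are hypothetical offsets in top-strand
    coordinates. With top_cut < bottom_cut, both fragments carry a
    5' overhang spanning top[top_cut:bottom_cut].
    """
    if not 0 < top_cut < bottom_cut < len(mol.top):
        raise ValueError("expected 0 < top_cut < bottom_cut < len(top)")
    overhang = mol.top[top_cut:bottom_cut]
    # Left fragment: the recessed 3' end of the top strand is extended,
    # so the newly synthesized bases equal the top-strand stretch itself.
    left_new = overhang
    # Right fragment: the recessed 3' end of the bottom strand is extended,
    # so the new bases are the complement of the same stretch.
    right_new = overhang.translate(_COMPLEMENT)
    left = Molecule(mol.name + ":left", mol.top[:bottom_cut],
                    left_new.count(labeled_base))
    right = Molecule(mol.name + ":right", mol.top[top_cut:],
                     right_new.count(labeled_base))
    return left, right


def capture(molecules):
    """Keep only molecules carrying at least one labeled nucleotide."""
    return [m for m in molecules if m.labels > 0]


target = Molecule("target", "A" * 10 + "GGCAT" + "A" * 9)
off_target = Molecule("off_target", "A" * 24)  # never cut, so never labeled
left, right = cut_and_fill(target, top_cut=10, bottom_cut=15)
enriched = capture([left, right, off_target])
print([m.name for m in enriched])  # ['target:left', 'target:right']
```

In this toy model, whether a fragment is captured depends on whether the bases added during fill-in include the labeled base, so the composition of the overhang determines which cut ends become labeled.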
PCT/US2023/074937 2022-09-23 2023-09-22 Methods for enriching nucleic acid target sequences WO2024064915A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263409589P 2022-09-23 2022-09-23
US63/409,589 2022-09-23
US202363497175P 2023-04-19 2023-04-19
US63/497,175 2023-04-19

Publications (2)

Publication Number Publication Date
WO2024064915A2 (en) 2024-03-28
WO2024064915A3 (en) 2024-05-02

Family

ID=90455317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/074937 WO2024064915A2 (en) Methods for enriching nucleic acid target sequences 2022-09-23 2023-09-22

Country Status (1)

Country Link
WO (1) WO2024064915A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014047561A1 (en) * 2012-09-21 2014-03-27 The Broad Institute Inc. Compositions and methods for labeling of agents
US11542544B2 (en) * 2016-05-11 2023-01-03 Illumina, Inc. Polynucleotide enrichment and amplification using CRISPR-Cas or Argonaute systems
CA3160186A1 (en) * 2019-11-05 2021-05-14 Pairwise Plants Services, Inc. Compositions and methods for rna-encoded dna-replacement of alleles

Also Published As

Publication number Publication date
WO2024064915A3 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
US11629379B2 (en) Single cell nucleic acid detection and analysis
JP7256748B2 (en) Methods for targeted nucleic acid sequence enrichment with application to error-corrected nucleic acid sequencing
US20220033890A1 (en) Method for highly sensitive dna methylation analysis
EP2470675B1 (en) Detection and quantification of hydroxymethylated nucleotides in a polynucleotide preparation
EP3837379B1 (en) Method of nucleic acid enrichment using site-specific nucleases followed by capture
CN109952381B (en) Method for multiplex detection of methylated DNA
JP7232643B2 (en) Deep sequencing profiling of tumors
WO2014026031A1 (en) High sensitivity mutation detection using sequence tags
CN114096680A (en) Methods and systems for detecting methylation changes in a DNA sample
US20200063213A1 (en) Methods of Amplifying DNA to Maintain Methylation Status
US20040219580A1 (en) Genome signature tags
CN114438184B (en) Free DNA methylation sequencing library construction method and application
CN113166809A (en) Method, kit, device and application for detecting DNA methylation
Tost Current and emerging technologies for the analysis of the genome-wide and locus-specific DNA methylation patterns
WO2024064915A2 (en) Methods for enriching nucleic acid target sequences
WO2024064369A1 (en) Methods for amplifying nucleic acids
WO2023287876A1 (en) Efficient duplex sequencing using high fidelity next generation sequencing reads
Zhao et al. Method for highly sensitive DNA methylation analysis
TW202417642A (en) Methylation markers for identifying cancer and the applications

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23869245; Country of ref document: EP; Kind code of ref document: A2)