WO2018144159A1 - Capture probes using positive and negative strands for duplex sequencing - Google Patents

Capture probes using positive and negative strands for duplex sequencing

Info

Publication number
WO2018144159A1
WO2018144159A1 PCT/US2017/068090 US2017068090W WO2018144159A1 WO 2018144159 A1 WO2018144159 A1 WO 2018144159A1 US 2017068090 W US2017068090 W US 2017068090W WO 2018144159 A1 WO2018144159 A1 WO 2018144159A1
Authority
WO
WIPO (PCT)
Prior art keywords
strand
sequencing
strands
plus
minus
Prior art date
Application number
PCT/US2017/068090
Other languages
French (fr)
Inventor
Clement S. Chu
Noah C. Welker
Kyle BEAUCHAMP
Carlo ARTIERI
Original Assignee
Counsyl, Inc.
Priority date
Filing date
Publication date
Application filed by Counsyl, Inc.
Publication of WO2018144159A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B20/00 Methods specially adapted for identifying library members
    • C40B20/06 Methods specially adapted for identifying library members using iterative deconvolution techniques
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention relates to methods and compositions for sequencing nucleic acids.
  • Next generation sequencing (NGS)
  • duplex sequencing methods have been developed that greatly reduce errors by independently tagging and sequencing each of the two strands of a DNA duplex (U.S. Patent App. No. 2015/0044687; Schmitt et al. (2012) Proc. Natl. Acad. Sci. USA 109:14508-13; Kennedy et al. (2013) PLoS Genet. 9:e1003794; Kennedy et al. (2014) Nature Protocols 9:2586-2606).
  • Duplex sequencing methods sequence both strands of DNA and, importantly, score a mutation only if it is present as complementary substitutions in both strands of a double-stranded DNA molecule.
  • Duplex sequencing has been found to be >10,000-fold more accurate than conventional NGS (Fox et al. (2014) Next Generat. Sequenc. & Applic. 1:106).
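The scoring rule described above (a mutation counts only when it appears as complementary substitutions on both strands) can be illustrated with a minimal Python sketch. The function name `is_duplex_supported` and its argument layout are hypothetical and are not taken from the application; the sketch assumes each strand's reference and observed base are reported in that strand's own 5'-to-3' sense.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_duplex_supported(plus_ref, plus_alt, minus_ref, minus_alt):
    """Return True only if the minus-strand change complements the plus-strand change.

    Hypothetical helper: each argument is a single base (A, C, G or T).
    """
    return (
        plus_ref != plus_alt                      # there is a substitution at all
        and COMPLEMENT[plus_ref] == minus_ref     # strands pair at the reference base
        and COMPLEMENT[plus_alt] == minus_alt     # the substitution is mirrored
    )


# A C>T change on the plus strand is scored only when the minus strand shows G>A.
print(is_duplex_supported("C", "T", "G", "A"))
print(is_duplex_supported("C", "T", "G", "G"))
```

A change seen on one strand only, as in the second call, is treated as a likely artifact and is not scored.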
  • strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other.
  • duplex sequencing methods that reduce or eliminate errors such as strand bias.
  • the sensitivity of detecting a mutation is largely dictated by the number of complete duplexes that are recovered.
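As a rough illustration of the two quantities discussed above, the following Python sketch computes the fraction of reads that come from the plus strand and counts molecules for which both strands were recovered. The function names and the input layout (a list of "+"/"-" labels, and a mapping from a molecule identifier to the set of strands observed) are assumptions made for this example only.

```python
from collections import Counter


def plus_strand_fraction(read_strands):
    """Fraction of reads derived from the plus strand; 0.5 means no strand bias."""
    counts = Counter(read_strands)
    total = counts["+"] + counts["-"]
    if total == 0:
        raise ValueError("no stranded reads supplied")
    return counts["+"] / total


def count_complete_duplexes(strands_by_molecule):
    """Count molecules for which both the plus and the minus strand were recovered."""
    return sum(
        1 for strands in strands_by_molecule.values() if {"+", "-"} <= set(strands)
    )


print(plus_strand_fraction(["+", "+", "+", "-"]))
print(count_complete_duplexes({"mol1": {"+", "-"}, "mol2": {"+"}}))
```

A fraction far from 0.5 suggests one strand is favored, and each molecule with only one recovered strand is lost as a complete duplex.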
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • sequentially enriching the plus strands and the minus strands of the sample library using capture probes comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures;
  • sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
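The sequential enrichment described above can be pictured with a toy Python model: one probe set pulls down its target strand, and the material left unbound is then exposed to the probe set for the opposite strand. This is only a sketch under idealized assumptions (perfect, fully specific capture); the fragment dictionaries, the `capture` and `sequential_enrichment` names, and the region label are all hypothetical.

```python
def capture(library, target_strand, target_region):
    """Split a library into fragments bound by the probes and fragments left unbound."""
    captured, unbound = [], []
    for fragment in library:
        on_target = (
            fragment["strand"] == target_strand
            and fragment["region"] == target_region
        )
        (captured if on_target else unbound).append(fragment)
    return captured, unbound


def sequential_enrichment(library, target_region, order=("+", "-")):
    """Enrich one strand first, then enrich the remaining unbound strands."""
    first_capture, remaining = capture(library, order[0], target_region)
    second_capture, _ = capture(remaining, order[1], target_region)
    return first_capture + second_capture


library = [
    {"id": 1, "strand": "+", "region": "target"},
    {"id": 2, "strand": "-", "region": "target"},
    {"id": 3, "strand": "+", "region": "off_target"},
]
enriched = sequential_enrichment(library, "target")
print([fragment["id"] for fragment in enriched])
```

Passing `order=("-", "+")` models the alternative in which the minus strands are enriched first.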
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
  • the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence.
  • the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence.
  • the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
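The consensus steps above can be sketched in Python as a per-position majority vote within each strand's read set, followed by a comparison of the two strand consensuses. The application does not specify a voting rule, so the `min_fraction` threshold, the use of "N" for unresolved positions, and the assumption that all reads are aligned, equal in length, and reported in the same orientation are illustrative choices only.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.7):
    """Majority-vote consensus over equal-length reads from one parent strand.

    A position becomes "N" when no base reaches min_fraction of the reads.
    """
    if not reads:
        raise ValueError("at least one read is required")
    if any(len(read) != len(reads[0]) for read in reads):
        raise ValueError("reads must have equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(first_consensus, second_consensus):
    """Keep a base only where both strand consensuses agree; otherwise emit "N".

    Assumes both inputs are already expressed in the same orientation.
    """
    if len(first_consensus) != len(second_consensus):
        raise ValueError("consensus sequences must have equal length")
    return "".join(
        a if a == b else "N" for a, b in zip(first_consensus, second_consensus)
    )


first = strand_consensus(["ACGT", "ACGT", "ACGT", "ACTT"])
second = strand_consensus(["ACGT", "ACGT", "ACGT"])
print(duplex_consensus(first, second))
```

An error present in a minority of reads from one strand is voted out at the strand level, and a disagreement between the two strand consensuses is masked at the duplex level.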
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
  • sequentially enriching the plus strands and the minus strands of the sample library using capture probes comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures;
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides
  • the sequencing adapters further comprise duplex molecular barcodes.
  • the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
  • the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence.
  • the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence.
  • the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • sequentially enriching the plus strands and the minus strands of the sample library using capture probes comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
  • sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • FIG. 1 presents a flow chart of methods of preparing PCR and sequence enriched libraries using customized sequencing adapters ligated to a double-stranded DNA (dsDNA) of interest.
  • FIG. 1A shows that the original dsDNA molecule consisted of both plus (+ ; dashed line) and minus (- ; solid line) strands.
  • Adapters consisted of two unique sequencing primer binding sites, labeled P5 and P7, and an "inline" molecular barcode used to trace back downstream sequencing reads to the dsDNA molecule from which they were derived.
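The role of the inline molecular barcode, tracing reads back to the dsDNA molecule from which they were derived, can be sketched in Python by grouping reads on a barcode prefix. The application does not state the barcode length or position here, so the fixed-length prefix, the `barcode_length` parameter, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def group_reads_by_barcode(reads, barcode_length):
    """Group reads into families keyed by an inline barcode at the start of each read.

    Assumes the first barcode_length bases of every read are the barcode and the
    remainder is the insert sequence.
    """
    families = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        families[barcode].append(insert)
    return dict(families)


families = group_reads_by_barcode(["AACGGT", "AACGGA", "TTCGGT"], barcode_length=2)
print(families)
```

Each resulting family collects the reads that share a barcode and can then be compared to build a consensus for that molecule.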
  • FIG. 1C and FIG. 1D show capture enrichment performed in parallel including biotinylated baits/probes designed to both the plus strand of the region of interest (FIG. 1C) and the minus strand of the region of interest (FIG. 1D).
  • FIG. 1E shows that enriched samples were PCR amplified in preparation for sequencing.
  • FIG. 2 illustrates strand bias as a function of (+) vs. (-) vs. (+ and -) probes.
  • FIG. 3 provides a heatmap summarizing the median molecular and duplex recovery depth from capture enrichments of six independent samples (triplicate of 20 ng and 50 ng of DNA input into library preparation).
  • duplex sequencing adapters to sequence a duplex nucleic acid molecule and to decrease or eliminate strand bias during amplification and sequencing of a duplex nucleic acid molecule.
  • Primers specific to the duplex sequencing adapters are used to amplify the first strand and the second strand of the duplex nucleic acid molecule to create a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands.
  • Enrichment methods are described wherein the plus strands and minus strands are enriched using capture probes that target substantially the same region of the plus strands and minus strands, thereby producing enriched libraries resulting from plus strand probe captures and minus strand probe captures. Further provided are nucleic acid sequencing libraries constructed using the methods described herein.
  • Reference to "about" a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to "about X" includes description of "X". Additionally, use of "about" preceding any series of numbers includes "about" each of the recited numbers in that series. For example, description referring to "about X, Y, or Z" is intended to describe "about X, about Y, or about Z."
  • An “adapter” refers to an oligonucleotide that is attached to a nucleic acid of interest and that is used for sequencing and downstream applications.
  • substantially the same region refers to the same region plus or minus up to 5, 10, 15, 20, 25,
  • a "set" of reads refers to all sequencing reads with a common parent nucleic acid strand, which may or may not have had errors introduced during sequencing or
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
  • a method of sequencing a duplex nucleic acid molecule comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • sequentially enriching the plus strands and the minus strands of the sample library using capture probes comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures;
  • sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
  • the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads;
  • the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence.
  • the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
  • a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • sequentially enriching the plus strands and the minus strands of the sample library using capture probes comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures;
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides
  • the sequencing adapters further comprise duplex molecular barcodes.
  • the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
  • the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence.
  • the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence.
  • the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an
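The consensus steps described above can be sketched as a per-position vote within each strand family, followed by a comparison of the two strand consensus sequences. This is a minimal sketch, not the method of the disclosure: the majority-vote rule, the `min_fraction` threshold, the use of `N` to mask unresolved positions, and the assumption that all reads are already aligned, equal in length, and expressed in the same orientation are all assumptions made for illustration.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.6):
    """Per-position majority vote over aligned, equal-length reads of one strand.

    A position becomes 'N' when no base reaches min_fraction of the reads
    (the threshold is an illustrative assumption).
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(first, second):
    """Keep only positions where both strand consensus sequences agree."""
    return "".join(
        a if a == b and a != "N" else "N" for a, b in zip(first, second)
    )


first = strand_consensus(["ACGTAC", "ACGTAC", "ACTTAC"])
second = strand_consensus(["ACGTAC", "ACCTAC"])
print(first, second, duplex_consensus(first, second))
```

An error seen in only one read of a family is outvoted within the strand, while a position on which the two strands disagree is masked rather than called.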
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both a plus strand and a minus strand;
  • the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
  • the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • first strand and the second strand are each represented as both plus strands and minus strands;
  • nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
  • ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
  • sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
  • sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
  • the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
  • Sequencing libraries can also be prepared, for example, by forming fragments of DNA (for example, by shearing the DNA), and attaching the duplex sequencing adapters described herein to the DNA fragments.
  • the nucleic acid molecules in the sequencing library are cell-free DNA, such as cell-free fetal DNA (also referred to as “cfDNA”) or circulating tumor DNA (also referred to as “cell-free tumor DNA,” or “ctDNA”).
  • ctDNA circulates in the blood of a cancer patient, and is generally pre-fragmented.
  • cfDNA circulates in the blood of a pregnant mother, and represents the fetal genome.
  • the fragments (for example, the ctDNA or fragments formed by fragmenting longer DNA strands) can be referred to as “inserts,” as they can be “inserted” or ligated adjacent to a sequencing adapter. In some embodiments, the inserts are inserted between two sequencing adapters.
  • RNA molecules can also be sequenced, for example by reverse transcribing the RNA molecules to form DNA molecules, which are attached to sequencing adapters.
  • Sequenced nucleic acid molecules can be used to identify variants in an allele relative to a wild-type or consensus sequence. Such variants can be, for example, a single nucleotide polymorphism (SNP), an insertion or deletion (indel), a copy-number variant, or a protein fusion variant. Identification of such variants allows for the diagnosis of genetic disease, a tumor (for example, when sequencing ctDNA), or a fetal abnormality (for example, when sequencing cfDNA).
  • duplex sequencing adapters are ligated to the nucleic acid molecules (i.e., "inserts") in the sequencing library, thereby forming a plurality of inserts bound to the duplex sequencing adapters (that is, each nucleic acid molecule is ligated to a sequencing adapter at both ends of the nucleic acid molecule).
  • the duplex nucleic acid molecules ligated to the duplex sequencing adapters are amplified, for example using polymerase chain reaction (PCR).
  • amplification primers, which can bind to primer annealing sites (for example, primer annealing sites located in the non-complementary region of the duplex sequencing adapter), are combined with the duplex nucleic acids.
  • a high-fidelity DNA polymerase is used to amplify the nucleic acid molecules, for example a DNA polymerase with an error rate of about 1 error per 500,000 base pairs or less, or about 1 error per 1 million base pairs or less.
  • a low-fidelity DNA polymerase is used.
  • the amplified nucleic acid molecules are enriched for a region of interest (such as a gene of interest).
  • one or more capture probes are combined with the amplified nucleic acid molecules.
  • the capture probes comprise a nucleic acid sequence complementary to a portion of the region of interest.
  • nucleic acid molecules with complementary regions bind to the capture probes.
  • the nucleic acid molecules binding the capture probes can then be separated from the remaining nucleic acid molecules.
  • the capture probes can be conjugated to a bead (such as a magnetic bead) or a molecular tag (such as biotin), which allows for easy separation.
  • a plurality of different capture probes is used to enrich a particular gene of interest.
  • the sequences of the capture probes are complementary to different portions of the region of interest, thereby allowing the full region of interest to be enriched.
  • An even number of each capture probe can result in sequencing depth variation because some capture probes may more efficiently bind a particular fragment of a sequence of interest than another capture probe. This may be due, for example, to a variance in the GC content within a region of interest. Large variance in sequencing depth can decrease overall sequence quality by limiting sensitivity and specificity in low sequencing depth sub-regions of a region of interest. Additionally, very deep sequencing of sub-regions is unlikely to result in any substantial increase in sequencing quality, but can add significant cost.
  • Sequencing depth variation can be reduced by balancing the amounts of the different capture probes.
  • Balanced capture probes are a set of capture probes for a sequence of interest, wherein the amount of each capture probe in the set is predetermined to account for varying efficiency of capture for each probe.
  • the balanced capture probes provide a reduction in enrichment variance for fragments in the region of interest relative to unbalanced capture probes.
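One simple way to picture probe balancing is to weight each probe inversely to its measured capture efficiency. The text only states that the amount of each probe is predetermined to account for varying capture efficiency, so the inverse-proportional rule, the function name, and the efficiency values below are all assumptions made for illustration.

```python
def balanced_probe_amounts(capture_efficiency, total_amount=1.0):
    """Split total_amount across probes, inversely to capture efficiency.

    capture_efficiency maps a probe name to a relative efficiency (> 0).
    The inverse-proportional rule is an illustrative assumption.
    """
    inverse = {probe: 1.0 / eff for probe, eff in capture_efficiency.items()}
    scale = total_amount / sum(inverse.values())
    return {probe: weight * scale for probe, weight in inverse.items()}


# Hypothetical relative efficiencies, e.g. estimated from a pilot run.
amounts = balanced_probe_amounts(
    {"probe_A": 1.0, "probe_B": 0.5, "probe_C": 0.25}
)
print(amounts)
```

Under this rule the least efficient probe receives the largest share of the pool, which pushes the expected capture yield of each sub-region toward a common value.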
  • Sequencing can be performed using any known sequencing method, such as single-molecule real-time sequencing, ion semiconductor sequencing, pyrosequencing, massively parallel signature sequencing, or sequencing-by-synthesis chemistry.
  • An exemplary method of sequencing-by-synthesis chemistry is performed using an Illumina HiSeq 2500® sequencer or an Illumina HiSeq 4000® sequencer.
  • the methods described herein can be useful for sequencing a DNA sample from a subject.
  • a blood sample can be taken from a subject, the DNA isolated from the blood, a sequence library formed by fragmenting the isolated DNA, and the DNA fragments sequenced using the molecular barcodes described herein.
  • the DNA sequencing library comprises cell-free DNA, such as ctDNA or cfDNA.
  • the fraction of cell-free DNA (such as ctDNA or cfDNA) relative to the total amount of DNA in the sample is about 0.001 to about 0.02 (such as about 0.001 to about 0.002, about 0.002 to about 0.003, about 0.003 to about 0.004, about 0.004 to about 0.005, about 0.005 to about 0.006, about 0.006 to about 0.008, about 0.008 to about 0.01, about 0.01 to about 0.012, about 0.012 to about 0.014, about 0.014 to about 0.016, or about 0.016 to about 0.02). Because the fraction of cell-free DNA in a blood sample is generally small relative to the total amount of DNA in the blood sample, sensitive detection of sequence variants is often difficult using previous techniques.
  • a sensitivity of about 0.8 or higher can be obtained when the cell-free DNA fraction is about 0.0035 or higher for de novo mutations (that is, unknown sequence variants), and a sensitivity of about 0.8 or higher can be obtained when the cell-free DNA fraction is about 0.002 or higher for known mutations (that is, known sequence variants).
  • a sensitivity of about 0.9 or higher can be obtained when the cell-free DNA fraction is about 0.005 or higher for de novo mutations, and a sensitivity of about 0.9 or higher can be obtained when the cell-free DNA fraction is about 0.003 or higher for known mutations.
  • a sensitivity of about 0.95 or higher can be obtained when the cell-free DNA fraction is about 0.006 or higher for de novo mutations, and a sensitivity of about 0.95 or higher can be obtained when the cell-free DNA fraction is about 0.004 or higher for known mutations.
  • the duplex sequencing adapters described herein further comprise a molecular barcode having a nucleic acid duplex with a predetermined or nondegenerate sequence.
  • a plurality of duplex sequencing adapters described herein comprise molecular barcodes of two or more different lengths (i.e., variable length barcodes).
  • the duplex sequencing adapter comprises a constant 3'-overhang.
  • the duplex sequencing adapter comprises a sample index.
  • the duplex sequencing adapter comprises a primer annealing site.
  • the duplex sequencing adapters described herein can be used, for example, in any of the methods described herein.
  • the duplex sequencing adapters are Y-shaped duplex sequencing adapters. In some embodiments, the duplex sequencing adapters are U-shaped duplex sequencing adapters. In some embodiments, a composition comprising a plurality of duplex sequencing adapters comprises only Y-shaped adapters or both Y-shaped adapters and U-shaped adapters.
  • Duplex sequencing adapter compositions comprise a plurality of duplex sequencing adapters, as described herein.
  • the molecular barcodes in a plurality of duplex sequencing adapters are diverse, although multiple copies of the same molecular barcode may be present in a composition comprising the plurality of duplex sequencing adapters.
  • the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500, such as between about 10 and about 400, between about 20 and about 300, between about 50 and about 200, between about 10 and about 50, between about 50 and about 100, between about 75 and about 150, between about 100 and about 200, between about 200 and about 300, between about 300 and about 400, between about 400 and about 500, or about 24, about 48, about 96, about 192, or about 384.
  • a molecular barcode in the plurality of duplex sequencing adapters has an edit distance of 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, or 8 or more from any other unique molecular barcode.
  • Edit distance refers to the minimum number of single-base substitutions, single-base insertions, and/or single-base deletions that a pair of sequences must undergo to result in complete identity between the two sequences.
  • For example, if the edit distance between a first molecular barcode and a second molecular barcode is 2, either the first molecular barcode must be mutated at least twice, the second molecular barcode must be mutated at least twice, or the first molecular barcode and the second molecular barcode must be mutated at least once each to result in identical sequences.
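The edit distance defined above is the standard Levenshtein distance, which can be computed with a short dynamic-programming routine. The sketch below is illustrative: the helper `min_pairwise_distance` and the example barcodes are hypothetical, and are included only to show how a barcode set might be checked against a minimum edit distance.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,                       # deletion from a
                    curr[j - 1] + 1,                   # insertion into a
                    prev[j - 1] + (base_a != base_b),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes in the set."""
    return min(edit_distance(x, y) for x, y in combinations(barcodes, 2))


# Hypothetical barcode set used only to exercise the functions.
print(min_pairwise_distance(["AAAA", "AATT", "TTTT"]))
```

A set passes a requirement such as "edit distance of 2 or more" when `min_pairwise_distance` returns at least that value.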
  • the duplex sequencing adapter further comprises a constant 3'-overhang, which can be adjacent to the molecular barcode in the duplex sequencing adapter.
  • the constant 3'-overhang is referred to as "constant" because the same 3'-overhang is used for each of the duplex sequencing adapters in a composition.
  • the constant 3'-overhang can comprise adenine (A), thymine (T), guanine (G), cytosine (C), uracil (U), inosine (I), or any other natural or synthetic base.
  • the 3'-overhang comprises a dinucleotide, such as a guanine-cytosine (GC) dinucleotide.
  • the constant 3'-overhang can be ligated to the nucleic acid molecule to be sequenced.
  • FIG. 2 illustrates one exemplary embodiment of a duplex sequencing adapter comprising a constant 3'-overhang.
  • the molecular barcode is ligated adjacent to the nucleic acid molecule to be sequenced, except that it may be separated by the constant 3'-overhang (and/or its complementary base(s) that may be included in the complementary strand after ligation).
  • the molecular barcodes can be of any length, for example between about 2 and about 24 bases in length. In some embodiments, the molecular barcodes are about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 bases in length. In some embodiments, a composition comprises a plurality of duplex sequencing adapters, and the duplex sequencing adapters comprise molecular barcodes of at least two different lengths, at least three different lengths, or at least four different lengths.
  • a plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first molecular barcode comprising a nucleic acid duplex n nucleotides in length; and a second duplex sequencing adapter comprising a second molecular barcode comprising a nucleic acid duplex n + x nucleotides, wherein x is not zero.
  • the plurality of duplex sequencing adapters further comprises a third duplex sequencing adapter comprising a third molecular barcode comprising a nucleic acid duplex n + y nucleotides in length, wherein y is not zero or x.
  • Variable lengths of the molecular barcodes in the plurality of duplex sequencing adapters are particularly useful, for example, when the duplex sequencing adapters comprise a constant 3'-overhang.
  • if all of the molecular barcodes were the same length, the constant 3'-overhang would be read in the same sequencing cycle, resulting in a large, non-diverse signal.
  • Such non-diverse (or low-diversity) signals can be problematic for many sequencing systems, as they can create a high level of noise that overwhelms the true signal at that position.
  • using variable length molecular barcodes ensures that no single sequencing cycle is presented with only a single base, thereby preventing loss of sequencing quality.
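The effect of barcode length on per-cycle diversity can be illustrated by counting which base each adapter presents at a given sequencing cycle. This is a toy sketch under stated assumptions: the read is modeled as barcode, then a single-base constant overhang (here `T`), then insert; insert bases are not modeled and are simply labeled `insert`; and the barcode sets are hypothetical.

```python
from collections import Counter


def cycle_composition(barcodes, cycle, overhang="T"):
    """Count the bases called at a 0-based cycle across a set of adapters.

    Assumed read layout: barcode + constant overhang + insert. Cycles that
    fall in the insert are counted under the label 'insert'.
    """
    counts = Counter()
    for barcode in barcodes:
        prefix = barcode + overhang
        counts[prefix[cycle] if cycle < len(prefix) else "insert"] += 1
    return counts


fixed_length = ["ACG", "GTA", "CAT", "TGC"]         # hypothetical set
variable_length = ["ACG", "GTAC", "CATGA", "TGC"]   # hypothetical set

print(cycle_composition(fixed_length, cycle=3))     # only the overhang base
print(cycle_composition(variable_length, cycle=3))  # a mix of bases
```

With fixed-length barcodes every read shows the overhang base at the same cycle, whereas staggering the barcode lengths spreads the overhang over several cycles.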
  • the molecular barcodes are laser-color balanced. Similar to the variable lengths of the molecular barcodes, laser-color balancing can help ensure that no single sequencing cycle is presented with only a single base when sequencing the molecular barcode. For example, some sequencing systems employ colored lasers to sequence nucleic acid molecules (for example, in some sequencing systems, a green laser is used to sequence G or T nucleotides, and a red laser is used to sequence A or C nucleotides). To avoid oversaturation of signal, resulting in sequencing quality loss, the molecular barcodes can be color balanced. In some embodiments, the molecular barcodes are laser-color balanced amongst the plurality of sequence adapters.
  • the ratio of A/C to G/T nucleotides at any given position of the molecular barcode in the plurality of sequence adapters is between about 2:1 and about 1:2 (such as about 1:1) at the
  • the molecular barcodes are laser-color balanced within any given molecular barcode.
  • the ratio of A/C to G/T nucleotides within any given molecular barcode is between about 2:1 and about 1:2 (such as about 1:1).
  • the molecular barcodes are base-composition balanced. In some embodiments, the molecular barcodes are base-composition balanced amongst the plurality of sequence adapters.
  • the proportion of adenine at any given position of the molecular barcode amongst the plurality of sequence adapters is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters;
  • the proportion of cytosine at any given position of the molecular barcodes is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters;
  • the proportion of thymine at any given position of the molecular barcodes is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.2
  • the molecular barcodes are base-composition balanced within the molecular barcode.
  • the proportion of adenine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of cytosine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of thymine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); and the proportion of guanine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25).
  • Laser-color balancing and base-composition balancing at any given position of the molecular barcode amongst the plurality of sequence adapters are preferably measured against the length of the shortest molecular barcode. This is because, in some embodiments, a constant 3'-overhang is adjacent to the molecular barcode in the duplex sequencing adapter, which can cause a strong signal for that particular nucleotide. Including the same nucleotide at the position of a longer molecular barcode that overlaps the 3'-overhang following a shorter barcode would add to the signal of the nucleotide in the 3'-overhang. Thus, in some embodiments, the molecular barcodes do not comprise the nucleotide present in the 3'-overhang at any position that would be co-sequenced with the 3'-overhang.
  • the proportion of any given nucleotide (e.g., A, T, C, or G) at any given position of the molecular barcode amongst the plurality of sequence adapters is between about 0.2 and about 0.3 (such as about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters, and between about 0.25 and about 0.4 (such as about 0.33) for any given nucleotide other than the constant 3'-overhang nucleotide at any position beyond the length of the shortest molecular barcode.
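Per-position balance across a barcode set, measured over the length of the shortest barcode as described above, can be checked with a short routine. This is an illustrative sketch: the function names and example barcodes are hypothetical, and the default bounds of 0.2 and 0.4 are taken from the ranges stated in the text.

```python
from collections import Counter


def positional_base_fractions(barcodes):
    """Fraction of A, C, G, T at each position, up to the shortest barcode."""
    shortest = min(len(barcode) for barcode in barcodes)
    fractions = []
    for pos in range(shortest):
        counts = Counter(barcode[pos] for barcode in barcodes)
        fractions.append(
            {base: counts[base] / len(barcodes) for base in "ACGT"}
        )
    return fractions


def is_position_balanced(barcodes, low=0.2, high=0.4):
    """True when every base fraction at every checked position is in range."""
    return all(
        low <= fraction <= high
        for column in positional_base_fractions(barcodes)
        for fraction in column.values()
    )


# Hypothetical variable-length set; only the first 3 positions are checked.
print(is_position_balanced(["ACG", "CGTA", "GTAC", "TAC"]))
```

Positions beyond the shortest barcode are deliberately left out, since the text treats them separately because of the overlap with the constant overhang.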
  • Laser-color balancing and base-composition balancing within any given molecular barcode can be determined by counting the fraction of different nucleotide types within any molecular barcode.
  • Base-composition balance need not be precisely balanced. For example, in molecular barcodes with a length not divisible by 4, an imperfect balance is inevitable.
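Counting nucleotide fractions within a single barcode, as described above, is straightforward to sketch. The function names are hypothetical; the A/C versus G/T grouping and the 2:1 to 1:2 bounds follow the example ranges given in the text.

```python
def color_ratio(barcode: str) -> float:
    """Ratio of A/C bases to G/T bases within one barcode."""
    ac = sum(barcode.count(base) for base in "AC")
    gt = sum(barcode.count(base) for base in "GT")
    return ac / gt if gt else float("inf")


def is_color_balanced(barcode: str, low=0.5, high=2.0) -> bool:
    """True when the A/C to G/T ratio lies between 1:2 and 2:1."""
    return low <= color_ratio(barcode) <= high


def base_fractions(barcode: str) -> dict:
    """Fraction of each of A, C, G, T within one barcode."""
    return {base: barcode.count(base) / len(barcode) for base in "ACGT"}


print(color_ratio("ACGT"), is_color_balanced("AACG"))
print(base_fractions("ACGTAC"))  # length 6, so a perfect 0.25 split is impossible
```

The last call shows the point made in the text: for a length not divisible by 4, some bases are necessarily over- or under-represented.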
  • the molecular barcodes include additional engineering features to enhance the sequencing quality.
  • the molecular barcodes do not comprise homopolymer sequences (such as two or more consecutive, identical nucleotides; three or more consecutive, identical nucleotides; four or more consecutive, identical nucleotides; five or more consecutive, identical nucleotides; or six or more consecutive, identical nucleotides).
  • the molecular barcodes are non-self-complementary (i.e., a single strand of the molecular barcode is not complementary to itself, for example so that it does not form a hairpin structure).
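The two screening criteria above can be sketched as simple sequence filters. This is a simplified illustration: the function names are hypothetical, the homopolymer run length is a parameter because the text lists several thresholds, and self-complementarity is approximated as the whole sequence equaling its own reverse complement. Detecting hairpins with partial stems and loops would need a more detailed model that is not attempted here.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_homopolymer(seq: str, min_run: int = 3) -> bool:
    """True when seq contains min_run or more consecutive identical bases."""
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, seq) is not None


def is_self_complementary(seq: str) -> bool:
    """True when seq equals its own reverse complement (simplified check)."""
    return seq == reverse_complement(seq)


def passes_screen(seq: str, min_run: int = 3) -> bool:
    """Combine both filters for a candidate barcode."""
    return not has_homopolymer(seq, min_run) and not is_self_complementary(seq)


print(passes_screen("ACGGGT"), passes_screen("ACGT"), passes_screen("ACGA"))
```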
  • the duplex sequencing adapter optionally further comprises a sample index.
  • the sample index can be used to identify the sample of origin of each read, and allows pooling of multiple samples during the same sequencing run.
  • the sample index is the same for each duplex sequencing adapter when the duplex sequencing adapter is ligated to the nucleic acid molecules, and different samples can be pooled together after ligation. Pooling of samples can occur, for example, prior to any amplification or sequencing of the nucleic acids.
  • the sample index can be of any length, for example between about 6 nucleotides and about 24 nucleotides in length (such as between about 6 nucleotides and about 12 nucleotides, between about 8 nucleotides and about 16 nucleotides, between about 12 nucleotides and about 26 nucleotides, or about 16 nucleotides and about 24 nucleotides in length).
  • the sample index comprises a first portion and a second portion, which may be on the same strand or on differing strands of the duplex sequencing adapter.
  • the first portion of the sample index or the second portion of the sample index is between about 3 nucleotides and about 12 nucleotides in length (such as between about 3 nucleotides and about 6 nucleotides, between about 6 nucleotides and about 8 nucleotides, between about 8 nucleotides and about 10 nucleotides, or about 10 nucleotides and about 12 nucleotides in length).
  • the first portion of the sample index and the second portion of the sample index are of equal length.
  • the sample index is laser-color balanced within the sample index.
  • the ratio of A/C to G/T nucleotides within any given sample index is between about 2:1 and about 1:2 (such as about 1:1).
  • the sample index is base composition balanced within the sample index.
  • the proportion of adenine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of cytosine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of thymine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); and the proportion of guanine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25).
  • the sample index includes additional engineering features to enhance the sequencing quality.
  • the sample index does not comprise homopolymer sequences (such as two or more consecutive, identical nucleotides; three or more consecutive, identical nucleotides; four or more consecutive, identical nucleotides; five or more consecutive, identical nucleotides; or six or more consecutive, identical nucleotides).
  • the sample index is not complementary to itself, for example the sample index is not a hairpin structure.
  • two or more sequencing libraries are pooled, wherein the nucleic acid molecules are ligated to a duplex sequencing adapter and wherein the individual sequencing libraries are identifiable by a unique sample index.
  • the sample indices have an edit distance of 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, or 8 or more from any other unique sample index.
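Pooling libraries that carry well-separated sample indices allows reads to be assigned back to their sample of origin even when an index contains a sequencing error. The sketch below is illustrative only: the function names, the sample sheet, and the one-mismatch tolerance are hypothetical, and Hamming distance (substitutions only) is used as a simplification of the edit distance described in the text.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed_index, sample_indices, max_mismatches=1):
    """Return the sample whose index uniquely matches, or None if ambiguous.

    sample_indices maps a sample name to its expected index sequence.
    """
    hits = [
        name
        for name, index in sample_indices.items()
        if len(index) == len(observed_index)
        and hamming(observed_index, index) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet with two well-separated indices.
samples = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}
print(assign_sample("ACGTAA", samples))  # one mismatch from sample_1's index
print(assign_sample("GGGGGG", samples))  # too far from both indices
```

Tolerating one mismatch is only safe when every pair of indices differs by at least three positions, which is one reason a minimum distance between sample indices is useful.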
  • the sample index is incorporated into a plurality of duplex sequencing adapters by combining the duplex sequencing adapters comprising a molecular barcode with an oligonucleotide comprising the complement sequence of a sample index.
  • the oligonucleotide further comprises a complementation region, which is complementary to a portion of a non-complementary region of the duplex sequencing adapter.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence, wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96).
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96).
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence, wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96).
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero.
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96).
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular
  • the edit distance between each molecular barcode is 2 or more.
  • the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification).
  • the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter).
  • the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
  • Customized sequencing adapters were ligated to a double-stranded DNA (dsDNA) of interest during a library preparation step.
  • As shown in FIG. 1A, the original dsDNA molecule consisted of both plus (+; dashed line) and minus (-; solid line) strands.
  • Adapters, in this instance, consisted of two unique sequencing primer binding sites, labeled P5 and P7, and an "inline" molecular barcode used to trace downstream sequencing reads back to the dsDNA molecule from which they were derived. Reconstructing the sequence of the original dsDNA through redundant sequencing of the original strands allowed for a high degree of error correction and an almost complete elimination of false positive mutations, which arise mainly from chemical DNA damage and sequencing artifacts.
  • Capture enrichment was performed in parallel using biotinylated baits/probes designed to both the plus strand of the region of interest (FIG. 1C) and the minus strand of the region of interest (FIG. 1D). The probes used in FIG. 1C and FIG. 1D were reverse complements of one another, but were separated spatially so as to prevent self-binding.
  • the DNA library from FIG. 1B was split and separately added to either plus strand probes (FIG. 1C) or minus strand probes (FIG. 1D).
  • blocking oligos were included that possess complementarity to the sequencing primer binding sites P5 and P7. The inclusion of blocking oligos minimized "off target" effects due to non-specific binding of these sequences to other "off target" library molecules that also contained these sequences.
  • the mixture of probes, DNA library, and blocking oligos was concentrated via SpeedVac™ (Thermo Fisher Scientific Inc.) and resuspended in hybridization buffer.
  • the hybridization mixture was heated to 95°C for 5-10 minutes and then cooled to 55-65°C overnight to allow probes to bind to the target DNA library.
  • magnetic streptavidin beads were added to the solution and incubated at 55-65°C.
  • the probe-library bound beads were washed to remove DNA that bound non-specifically, and finally resuspended in a PCR Master Mix containing primers to P5/P7.
  • Enriched samples were PCR amplified in preparation for sequencing on an Illumina HiSeq 2500 (FIG. 1E). Paired-end sequencing was performed on the enriched libraries such that plus strand probe captures (FIG. 1C) were sequenced independently from minus strand probe captures (FIG. 1D). In this way, downstream analysis could readily compare the effects of capturing with probes designed to the plus strand, the minus strand, or both (by merging data from both sequencing runs).
  • FIG. 2 shows strand bias as a function of (+) vs. (-) vs. (+ and -) probes.
  • Six technically and biologically independent samples were analyzed. In other words, six biologically distinct DNA samples were all prepared and enriched in 12 independent enrichment reactions (+ and - strand probes for each biological replicate). All (+) strand probe enrichments were consolidated and sequenced independently from (-) strand probe enrichments.
  • Strand bias was determined from sequencing reads by calculating the number of times a forward read (read from the P5 sequencing primer as shown in FIG. 1) maps to the plus (black) or minus (grey) strand following alignment to a reference genome. For this analysis, only reads mapping to the region of interest to which probes were designed were considered. Since (+) strand probe enrichments were sequenced independently from (-) strand probe enrichments, strand bias metrics were determined for each enrichment condition independently (shown as checkered blocks for (+) strand probes and hashed blocks for (-) strand probes). Further, data from both (+) and (-) strand probe enrichments was merged and shown as solid blocks (FIG. 2).
  • FIG. 3 shows a heatmap summarizing the median molecular and duplex recovery depth from capture enrichments of six independent samples (triplicate of 20ng and 50ng of DNA input into library preparation).
  • DNA samples were molecularly barcoded during library preparation and PCR such that sequencing reads could be assigned to the original duplex molecule from which they were derived.
  • Duplex depth is defined as the number of molecules for which both strands of the original molecule were sequenced, whereas molecular depth is defined as the depth of duplexes plus any molecules for which only one strand was recovered after enrichment and sequencing.
  • Libraries were captured with probes designed to a ~10 kb region of interest to both the plus and minus strands (i.e., the probes were reverse complements of one another). As such, libraries were split following PCR into separate reactions and enriched independently, in one reaction with plus strand capture probes and in the other with minus strand capture probes. These enrichments were sequenced independently and then merged for the analysis of (+) and (-) strand capture probes.
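
The depth definitions above (duplex depth counts molecules for which both strands were sequenced; molecular depth adds molecules for which only one strand was recovered) can be illustrated with a minimal Python sketch. It is not the analysis code used for FIG. 3. The `molecule_id` values and the `recovery_depths` and `merge_captures` names are hypothetical, and the sketch assumes each read has already been assigned to its original duplex molecule and original strand.

```python
from collections import defaultdict

def recovery_depths(reads):
    """Return (molecular_depth, duplex_depth) for one locus.

    `reads` is an iterable of (molecule_id, strand) pairs, where
    molecule_id is a hypothetical key identifying the original duplex
    molecule and strand is "+" or "-" for the original strand a read
    was derived from.
    """
    strands_seen = defaultdict(set)
    for molecule_id, strand in reads:
        strands_seen[molecule_id].add(strand)
    # Duplex depth: molecules with both original strands recovered.
    duplex_depth = sum(1 for s in strands_seen.values() if s == {"+", "-"})
    # Molecular depth: duplexes plus molecules with only one strand recovered.
    molecular_depth = len(strands_seen)
    return molecular_depth, duplex_depth

def merge_captures(plus_probe_reads, minus_probe_reads):
    """Pool reads from the independently sequenced (+) and (-) probe captures."""
    return list(plus_probe_reads) + list(minus_probe_reads)

# Toy data (illustrative only, not measured values).
plus_capture = [("m1", "+"), ("m2", "+"), ("m1", "+")]
minus_capture = [("m1", "-"), ("m3", "-")]
print(recovery_depths(plus_capture))
print(recovery_depths(minus_capture))
print(recovery_depths(merge_captures(plus_capture, minus_capture)))
```

In this toy data set, neither capture alone yields a complete duplex, while merging the two captures recovers molecule m1 on both strands.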

Abstract

Provided herein are methods comprising the use of duplex sequencing adapters to sequence a duplex nucleic acid molecule and to decrease or eliminate strand bias during amplification and sequencing of a duplex nucleic acid molecule. Primers specific to the duplex sequencing adapters are used to amplify the first strand and the second strand of the duplex nucleic acid molecule to create a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands. Enrichment methods are described wherein the plus strands and minus strands are enriched using capture probes that target substantially the same region of the plus strands and minus strands, thereby producing enriched libraries resulting from plus strand probe captures and minus strand probe captures. Further provided are nucleic acid sequencing libraries constructed using the methods described herein.

Description

CAPTURE PROBES USING POSITIVE AND NEGATIVE STRANDS
FOR DUPLEX SEQUENCING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Number 62/452,593, filed January 31, 2017 and titled CAPTURE PROBES USING POSITIVE AND NEGATIVE STRANDS FOR DUPLEX SEQUENCING AND METHODS OF USE THEREOF, the disclosure of which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
[0002] The present invention relates to methods and compositions for sequencing nucleic acids.
BACKGROUND
[0003] Next generation sequencing (NGS) of nucleic acids has greatly increased the rate of genomic sequencing, thereby bringing in a new era for medical diagnostics, forensics, metagenomics, and many other applications. However, these high-throughput approaches often introduce errors. These errors can arise, for example, during nucleic acid amplification or sequencing, or downstream analysis.
[0004] Traditionally, NGS platforms rely upon generation of sequence data from a single strand of DNA. As a consequence, artifactual mutations introduced during the initial rounds of PCR amplification are undetectable as errors, even with tagging techniques, if the base change is propagated to all subsequent PCR duplicates. Several types of DNA damage are highly mutagenic and may lead to this scenario. Spontaneous DNA damage arising from normal metabolic processes results in thousands of damaging events per cell per day (Lindahl & Wood (1999) Science 286:1897-1905). In addition to damage from oxidative cellular processes, further DNA damage is generated ex vivo during tissue processing and DNA extraction (Kunkel (1984) Proc. Natl. Acad. Sci. USA 81:1494-98). These damage events can result in frequent copying errors by DNA polymerases. For example, a common DNA lesion arising from oxidative damage, 8-oxo-guanine, has the propensity to incorrectly pair with adenine during complementary strand extension with an overall efficiency greater than that of correct pairing with cytosine, and thus can contribute a large frequency of artifactual G→T mutations (Shibutani et al. (1991) Nature 349:431-4). Likewise, deamination of cytosine to form uracil is a particularly common event which leads to the inappropriate insertion of adenine during PCR, thus producing artifactual C→T mutations with a frequency approaching 100% (Stiller et al. (2006) Proc. Natl. Acad. Sci. USA 103:13578-84).
[0005] To overcome limitations in sequencing accuracy, duplex sequencing methods have been developed that greatly reduce errors by independently tagging and sequencing each of the two strands of a DNA duplex (U.S. Patent App. No. 2015/0044687; Schmitt et al. (2012) Proc. Natl. Acad. Sci. USA 109:14508-13; Kennedy et al. (2013) PLoS Genet. 9:e1003794; Kennedy et al. (2014) Nature Protocols 9:2586-2606). Duplex sequencing methods sequence both strands of DNA and, importantly, score mutations only if the mutations are present as complementary substitutions in both strands of a double-stranded DNA molecule. Duplex sequencing has been found to be >10,000-fold more accurate than conventional NGS (Fox et al. (2014) Next Generat. Sequenc. & Applic. 1:106).
[0006] While duplex sequencing methods have improved accuracy compared to conventional sequencing techniques, errors still arise. For example, strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other. Accordingly, it would be desirable to develop improved duplex sequencing methods that reduce or eliminate errors such as strand bias. Further, due to the high degree of background correction afforded by duplex sequencing workflows, the sensitivity of detecting a mutation is largely dictated by the number of complete duplexes that are recovered.
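As an illustration only, strand bias at a given site can be summarized from per-strand observation counts. The sketch below is not a metric defined in this disclosure; the function names and the choice of "absolute deviation from an even 50/50 split" are assumptions made for the example.

```python
def plus_strand_fraction(plus_count: int, minus_count: int) -> float:
    """Fraction of observations that come from the plus strand."""
    total = plus_count + minus_count
    if total == 0:
        raise ValueError("no observations on either strand")
    return plus_count / total


def strand_bias(plus_count: int, minus_count: int) -> float:
    """Hypothetical bias score: 0.0 means balanced strands,
    0.5 means every observation came from a single strand."""
    return abs(plus_strand_fraction(plus_count, minus_count) - 0.5)


if __name__ == "__main__":
    # Balanced capture versus a capture dominated by the plus strand.
    print(strand_bias(50, 50))
    print(strand_bias(75, 25))
```

Under this assumed score, a site supported equally by both strands scores 0.0, and a site seen on only one strand scores 0.5.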
[0007] The disclosures of all publications, patents, and patent applications referred to herein are hereby incorporated herein by reference in their entireties.
SUMMARY OF THE INVENTION
[0008] In one embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the resulting plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
[0009] In another embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0010] In another embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In other embodiments, sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
[0011] Within the methods of sequencing a duplex nucleic acid molecule, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
[0012] Further within the methods of sequencing a duplex nucleic acid molecule, in some embodiments, amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads. In other embodiments, the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence. In other embodiments, the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence. In other embodiments, the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
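The consensus steps described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosed methods: it assumes that the reads in each set are already aligned, have equal length, and are expressed in the same reference orientation, and the function names and the `min_fraction` agreement threshold are hypothetical.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.7):
    """Build a single-strand consensus by column-wise majority vote.

    A position is reported as 'N' when no base reaches min_fraction,
    which removes errors present in only a minority of the reads.
    """
    if not reads:
        raise ValueError("empty read set")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(first_consensus, second_consensus):
    """Keep a base only where both strand consensus sequences agree."""
    if len(first_consensus) != len(second_consensus):
        raise ValueError("consensus sequences must have equal length")
    return "".join(
        a if a == b else "N"
        for a, b in zip(first_consensus, second_consensus)
    )


if __name__ == "__main__":
    first = strand_consensus(["ACGT", "ACGT", "ACGT", "ACTT"])
    second = strand_consensus(["ACGT", "ACGT", "ACGT"])
    print(duplex_consensus(first, second))
```

Masking disagreements with 'N' is one design choice among several; a pipeline could instead discard the whole duplex or apply quality-weighted voting.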
[0013] In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0014] In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0015] Within the methods of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
[0016] Further within the methods of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, in some embodiments, amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads. In other embodiments, the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence. In other embodiments, the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence. In other embodiments, the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
[0017] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
[0018] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library; and
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures.
[0019] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands; and
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures. In other embodiments, sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
[0020] Within the nucleic acid sequencing libraries produced by methods described herein, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] FIG. 1 presents a flow chart of methods of preparing PCR and sequence enriched libraries using customized sequencing adapters ligated to a double-stranded DNA (dsDNA) of interest. FIG. 1A shows that the original dsDNA molecule consisted of both plus (+; dashed line) and minus (-; solid line) strands. Adapters consisted of two unique sequencing primer binding sites, labeled P5 and P7, and an "inline" molecular barcode used to trace back downstream sequencing reads to the dsDNA molecule from which they were derived. FIG. 1B shows that, following adapter ligation, library samples were PCR amplified with primers specific to P5/P7 sequences, such that only molecules with adapters ligated to both ends were exponentially amplified (dashed lines indicate post-PCR library molecules derived from the original plus strand, while solid lines indicate post-PCR library molecules derived from the original minus strand). FIG. 1C and FIG. 1D show capture enrichment performed in parallel, including biotinylated baits/probes designed to both the plus strand of the region of interest (FIG. 1C) and the minus strand of the region of interest (FIG. 1D). FIG. 1E shows that enriched samples were PCR amplified in preparation for sequencing.
[0022] FIG. 2 illustrates strand bias as a function of (+) vs. (-) vs. (+ and -) probes.
[0023] FIG. 3 provides a heatmap summarizing the median molecular and duplex recovery depth from capture enrichments of six independent samples (triplicate of 20ng and 50ng of DNA input into library preparation).
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0024] Described herein are methods comprising the use of duplex sequencing adapters to sequence a duplex nucleic acid molecule and to decrease or eliminate strand bias during amplification and sequencing of a duplex nucleic acid molecule. Primers specific to the duplex sequencing adapters are used to amplify the first strand and the second strand of the duplex nucleic acid molecule to create a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands. Enrichment methods are described wherein the plus strands and minus strands are enriched using capture probes that target substantially the same region of the plus strands and minus strands, thereby producing enriched libraries resulting from plus strand probe captures and minus strand probe captures. Further provided are nucleic acid sequencing libraries constructed using the methods described herein.
Definitions
[0025] As used herein, the singular forms "a," "an," and "the" include the plural reference unless the context clearly dictates otherwise.
[0026] Reference to "about" a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to "about X" includes description of "X". Additionally, use of "about" preceding any series of numbers includes "about" each of the recited numbers in that series. For example, description referring to "about X, Y, or Z" is intended to describe "about X, about Y, or about Z."
[0027] It is understood that aspects and variations of the invention described herein include "consisting" and/or "consisting essentially of" aspects and variations.
[0028] An "adapter" refers to an oligonucleotide that is attached to a nucleic acid of interest and that is used for sequencing and downstream applications.
[0029] In the context of comparing regions of nucleic acid strands, "substantially the same region" as used herein refers to the same region plus or minus up to 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 base pairs.
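One possible reading of this definition can be sketched as a coordinate check. The sketch assumes that each probe target is described by a (start, end) pair on a shared reference coordinate system and that the tolerance applies to each endpoint separately; the function name and the coordinate representation are hypothetical and are not taken from this disclosure.

```python
def substantially_same_region(plus_region, minus_region, tolerance_bp=50):
    """Return True when both endpoints of the two regions differ by
    no more than tolerance_bp base pairs.

    Regions are (start, end) tuples on the same reference coordinates.
    """
    (plus_start, plus_end) = plus_region
    (minus_start, minus_end) = minus_region
    return (
        abs(plus_start - minus_start) <= tolerance_bp
        and abs(plus_end - minus_end) <= tolerance_bp
    )


if __name__ == "__main__":
    plus_target = (1000, 1120)
    minus_target = (1010, 1125)
    # The endpoints differ by 10 bp and 5 bp respectively.
    print(substantially_same_region(plus_target, minus_target, tolerance_bp=10))
    print(substantially_same_region(plus_target, minus_target, tolerance_bp=5))
```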
[0030] A "set" of reads refers to all sequencing reads with a common parent nucleic acid strand, which may or may not have had errors introduced during sequencing or amplification of the parent nucleic acid strand.
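For illustration, reads could be partitioned into such sets by keying each read on its molecular barcode and its strand of origin. The sketch below assumes that each read has already been annotated with a barcode and a parent-strand label; the tuple layout, the labels "first" and "second", and the function name are hypothetical.

```python
from collections import defaultdict


def group_reads_into_sets(reads):
    """Group reads that share a parent strand.

    Each read is a (barcode, parent_strand, sequence) tuple. Reads with
    the same barcode and parent strand are treated as one set.
    """
    read_sets = defaultdict(list)
    for barcode, parent_strand, sequence in reads:
        read_sets[(barcode, parent_strand)].append(sequence)
    return dict(read_sets)


if __name__ == "__main__":
    example_reads = [
        ("AAC", "first", "ACGT"),
        ("AAC", "second", "ACGT"),
        ("AAC", "first", "ACTT"),
        ("GGT", "first", "TTTT"),
    ]
    for key, sequences in group_reads_into_sets(example_reads).items():
        print(key, sequences)
```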
[0031] It is to be understood that one, some or all of the properties of the various embodiments described herein may be combined to form other embodiments of the present invention.
[0032] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
Duplex Sequencing Methods, Methods for Decreasing or Eliminating Strand Bias, and Nucleic Acid Sequencing Libraries
[0033] In one embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the resulting plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
[0034] In another embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0035] In another embodiment, a method of sequencing a duplex nucleic acid molecule is provided, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In other embodiments, sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
[0036] Within the methods of sequencing a duplex nucleic acid molecule, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
[0037] Further within the methods of sequencing a duplex nucleic acid molecule, in some embodiments, amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads. In other embodiments, the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads;
identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence. In other embodiments, the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence. In other embodiments, the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
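The consensus steps described above can be illustrated with a minimal sketch. It assumes the reads in each strand family are already grouped, aligned to equal length, and expressed in the same orientation; the function names, the majority-vote rule, and the `min_fraction` threshold are hypothetical choices for illustration and are not specified in this disclosure.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.6):
    """Column-wise majority vote over aligned reads from one strand family.

    Positions where no base reaches `min_fraction` (an assumed threshold)
    are masked with "N" instead of being called.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(first_strand, second_strand):
    """Keep only positions where both strand consensuses agree; mask the rest."""
    if len(first_strand) != len(second_strand):
        raise ValueError("strand consensuses must have the same length")
    return "".join(
        a if a == b and a != "N" else "N"
        for a, b in zip(first_strand, second_strand)
    )


first = strand_consensus(["ACGTAC", "ACGTAC", "ACCTAC"])
second = strand_consensus(["ACGTAC", "ACGTTC", "ACGTTC"])
duplex = duplex_consensus(first, second)
```

In this toy input, an error present in a minority of first strand reads is removed by the first strand consensus, while a position at which the two strand consensuses disagree is masked in the duplex consensus.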
[0038] In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0039] In another embodiment, a method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule is provided, comprising: ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
[0040] Within the methods of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments, the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
[0041] Further within the methods of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, in some embodiments, amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads. In other embodiments, the method further comprises constructing a first strand consensus sequence, comprising: comparing the first strand reads in the set of first strand reads; identifying and removing errors in the set of first strand reads; and constructing an error-corrected first-strand consensus sequence. In other embodiments, the method further comprises constructing a second strand consensus sequence, comprising: comparing the second strand reads in the set of second strand reads; identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence. In other embodiments, the method further comprises comparing the first strand consensus sequence and the second strand consensus sequence; identifying and removing errors in the set of first strand reads and the set of second strand reads; and constructing an error-corrected duplex consensus sequence.
[0042] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures. In another embodiment, the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
[0043] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library; and
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures.
[0044] In another embodiment, nucleic acid sequencing libraries are provided, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands; and sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures. In other embodiments, sequentially enriching comprises first enriching with capture probes targeting the minus strands followed by enriching remaining unbound strands with capture probes targeting the plus strands.
[0045] Within the nucleic acid sequencing libraries produced by methods described herein, in some embodiments, the first sequencing adapter and the second sequencing adapter are the same. In other embodiments, the first sequencing adapter and the second sequencing adapter are different. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced together. In other embodiments, the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately. In other embodiments, enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters. In other embodiments, the sequencing adapters further comprise duplex molecular barcodes. In other embodiments the duplex nucleic acid molecule is a cell-free DNA molecule, particularly a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
[0046] Sequencing libraries can also be prepared, for example, by forming fragments of DNA (for example, by shearing the DNA), and attaching the duplex sequencing adapters described herein to the DNA fragments.
[0047] In some embodiments, the nucleic acid molecules in the sequencing library are cell-free DNA, such as cell-free fetal DNA (also referred to as "cfDNA") or circulating tumor DNA (also referred to as "cell-free tumor DNA," or "ctDNA"). ctDNA circulates in the blood of a cancer patient, and is generally pre-fragmented. cfDNA circulates in the blood of a pregnant mother, and represents the fetal genome. The fragments (for example, the ctDNA or fragments formed by fragmenting longer DNA strands) can be referred to as "inserts," as they can be "inserted" or ligated adjacent to a sequencing adapter. In some embodiments, the inserts are inserted between two sequencing adapters. RNA molecules can also be sequenced, for example by reverse transcribing the RNA molecules to form DNA molecules, which are attached to sequencing adapters.
[0048] Sequenced nucleic acid molecules can be used to identify variants in an allele relative to a wild-type or consensus sequence. Such variants can be, for example, a single nucleotide polymorphism (SNP), an insertion or deletion (indel), a copy-number variant, or a protein fusion variant. Identification of such variants allows for the diagnosis of genetic disease, a tumor (for example, when sequencing ctDNA), or a fetal abnormality (for example, when sequencing cfDNA).
[0049] The duplex sequencing adapters are ligated to the nucleic acid molecules (i.e., "inserts") in the sequencing library, thereby forming a plurality of inserts bound to the duplex sequencing adapters (that is, each nucleic acid molecule is ligated to a sequencing adapter at both ends of the nucleic acid molecule).
[0050] In some embodiments, the duplex nucleic acid molecules ligated to the duplex sequencing adapters are amplified, for example using polymerase chain reaction (PCR). In some embodiments, amplification primers are combined with the duplex nucleic acids and bind to primer annealing sites (for example, primer annealing sites located in the non-complementary region of the duplex sequencing adapter). In some embodiments, a high-fidelity DNA polymerase is used to amplify the nucleic acid molecules, for example a DNA polymerase with an error rate of about 1 error per 500,000 base pairs or less, or about 1 error per 1 million base pairs or less. In some embodiments, a low-fidelity DNA polymerase is used.
[0051] In some embodiments, the amplified nucleic acid molecules are enriched for a region of interest (such as a gene of interest). For example, in some embodiments, one or more capture probes are combined with the amplified nucleic acid molecules. The capture probes comprise a nucleic acid sequence complementary to a portion of the region of interest. Thus, nucleic acid molecules with complementary regions bind to the capture probes. The nucleic acid molecules binding the capture probes can then be separated from the remaining nucleic acid molecules. For example, the capture probes can be conjugated to a bead (such as a magnetic bead) or a molecular tag (such as biotin), which allows for easy separation.
[0052] In some embodiments, a plurality of different capture probes is used to enrich a particular gene of interest. The sequences of the capture probes are complementary to different portions of the region of interest, thereby allowing the full region of interest to be enriched. Using an equal amount of each capture probe can result in sequencing depth variation because some capture probes may bind a particular fragment of a sequence of interest more efficiently than other capture probes. This may be due, for example, to a variance in the GC content within a region of interest. Large variance in sequencing depth can decrease overall sequence quality by limiting sensitivity and specificity in low sequencing depth sub-regions of a region of interest. Additionally, very deep sequencing of sub-regions is unlikely to result in any substantial increase in sequencing quality, but can add significant cost.
[0053] Sequencing depth variation can be reduced by balancing the different types of capture probes. Balanced capture probes are a set of capture probes for a sequence of interest, wherein the amount of each capture probe in the set is predetermined to account for varying efficiency of capture for each probe. Thus, the balanced capture probes provide a reduction in enrichment variance for fragments in the region of interest relative to unbalanced capture probes. O'Roak et al., Multiplex Targeted Sequencing Identifies Recurrently Mutated Genes in Autism Spectrum Disorders, Science, vol. 338, pp. 1619-1622 (2012), provides examples of balanced capture probes.
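One simple way to picture probe balancing is sketched below. It assumes that a relative capture efficiency has been measured for each probe (for example, in a pilot capture) and that probe amounts are set inversely proportional to that efficiency; the function name, the probe names, the efficiency values, and the inverse-proportional rule are all hypothetical and are not taken from this disclosure or from the cited reference.

```python
def balanced_probe_amounts(capture_efficiency, total_amount=1.0):
    """Split `total_amount` of probe across probes, inversely to efficiency.

    `capture_efficiency` maps a probe name to a positive relative efficiency.
    Less efficient probes receive proportionally more material.
    """
    if any(eff <= 0 for eff in capture_efficiency.values()):
        raise ValueError("capture efficiencies must be positive")
    weights = {probe: 1.0 / eff for probe, eff in capture_efficiency.items()}
    total_weight = sum(weights.values())
    return {
        probe: total_amount * weight / total_weight
        for probe, weight in weights.items()
    }


# Hypothetical pilot measurements for three probes tiling one region.
efficiency = {"probe_1": 1.0, "probe_2": 0.5, "probe_3": 0.25}
amounts = balanced_probe_amounts(efficiency)

# Under this simplified model, expected capture is amount x efficiency.
expected_capture = {p: amounts[p] * efficiency[p] for p in efficiency}
```

Under this simplified model the expected capture is the same for every probe, which corresponds to the reduction in enrichment variance described above.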
[0054] Sequencing can be performed using any known sequencing method, such as single-molecule real-time sequencing, ion semiconductor sequencing, pyrosequencing, massively parallel signature sequencing, or sequencing-by-synthesis chemistry. An exemplary method of sequencing-by-synthesis chemistry is performed using an Illumina HiSeq 2500® sequencer or an Illumina HiSeq 4000® sequencer.
[0055] The methods described herein can be useful for sequencing a DNA sample from a subject. For example, a blood sample can be taken from a subject, the DNA isolated from the blood, a sequencing library formed by fragmenting the isolated DNA, and the DNA fragments sequenced using the molecular barcodes described herein. In some embodiments, the DNA sequencing library comprises cell-free DNA, such as ctDNA or cfDNA. In some embodiments, the fraction of cell-free DNA (such as ctDNA or cfDNA) relative to the total amount of DNA in the sample is about 0.001 to about 0.02 (such as about 0.001 to about 0.002, about 0.002 to about 0.003, about 0.003 to about 0.004, about 0.004 to about 0.005, about 0.005 to about 0.006, about 0.006 to about 0.008, about 0.008 to about 0.01, about 0.01 to about 0.012, about 0.012 to about 0.014, about 0.014 to about 0.016, or about 0.016 to about 0.02). Because the fraction of cell-free DNA in a blood sample is generally small relative to the total amount of DNA in the blood sample, sensitive detection of sequence variants is often difficult using previous techniques. However, using the sequencing adapters and methods described herein, a sensitivity of about 0.8 or higher can be obtained when the cell-free DNA fraction is about 0.0035 or higher for de novo mutations (that is, unknown sequence variants), and a sensitivity of about 0.8 or higher can be obtained when the cell-free DNA fraction is about 0.002 or higher for known mutations (that is, known sequence variants). In some embodiments, a sensitivity of about 0.9 or higher can be obtained when the cell-free DNA fraction is about 0.005 or higher for de novo mutations, and a sensitivity of about 0.9 or higher can be obtained when the cell-free DNA fraction is about 0.003 or higher for known mutations. In some embodiments, a sensitivity of about 0.95 or higher can be obtained when the cell-free DNA fraction is about 0.006 or higher for de novo mutations, and a sensitivity of about 0.95 or higher can be obtained when the cell-free DNA fraction is about 0.004 or higher for known mutations.
Molecular Barcodes and Molecular Barcode Compositions
[0056] In some embodiments, the duplex sequencing adapters described herein further comprise a molecular barcode having a nucleic acid duplex with a predetermined or nondegenerate sequence. In some embodiments, a plurality of duplex sequencing adapters described herein comprise molecular barcodes of two or more different lengths (i.e., variable length barcodes). In some embodiments, the duplex sequencing adapter comprises a constant 3'-overhang. In some embodiments, the duplex sequencing adapter comprises a sample index. In some embodiments, the duplex sequencing adapter comprises a primer annealing site. The duplex sequencing adapters described herein can be used, for example, in any of the methods described herein.
[0057] In some embodiments, the duplex sequencing adapters are Y-shaped duplex sequencing adapters. In some embodiments, the duplex sequencing adapters are U-shaped duplex sequencing adapters. In some embodiments, a composition comprising a plurality of duplex sequencing adapters comprises only Y-shaped adapters or both Y-shaped adapters and U-shaped adapters.
[0058] Duplex sequencing adapter compositions comprise a plurality of duplex sequencing adapters, as described herein. The molecular barcodes in a plurality of duplex sequencing adapters are diverse, although multiple copies of the same molecular barcode may be present in a composition comprising the plurality of duplex sequencing adapters. For example, in some embodiments, the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500, such as between about 10 and about 400, between about 20 and about 300, between about 50 and about 200, between about 10 and about 50, between about 50 and about 100, between about 75 and about 150, between about 100 and about 200, between about 200 and about 300, between about 300 and about 400, between about 400 and about 500, or about 24, about 48, about 96, about 192, or about 384.
[0059] In some embodiments, a molecular barcode in the plurality of duplex sequencing adapters has an edit distance of 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, or 8 or more from any other unique molecular barcode. Edit distance refers to the minimum number of single-base substitutions, single-base insertions, and/or single-base deletions that a pair of sequences must undergo to result in complete identity between the two sequences. For example, if the edit distance between a first molecular barcode and a second molecular barcode is 2, either the first molecular barcode must be mutated at least twice, the second molecular barcode must be mutated at least twice, or the first molecular barcode and the second molecular barcode must be mutated at least once each to result in identical sequences.
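The edit distance defined above corresponds to the standard Levenshtein distance, which can be computed with a short dynamic program. The sketch below is illustrative only; the function names and example barcodes are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Minimum number of single-base substitutions, insertions, and deletions."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion from a
                    current[j - 1] + 1,      # insertion into a
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes in a set."""
    return min(edit_distance(x, y) for x, y in combinations(barcodes, 2))


# A hypothetical barcode set passes a threshold of 2 if this value is >= 2.
smallest = min_pairwise_distance(["AAAA", "AATT", "TTTT"])
```

A candidate barcode set could be screened by requiring `min_pairwise_distance` to meet the chosen threshold (2 or more, 3 or more, and so on).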
[0060] In some embodiments, the duplex sequencing adapter further comprises a constant 3'-overhang, which can be adjacent to the molecular barcode in the duplex sequencing adapter. The constant 3'-overhang is referred to as "constant" because the same 3'-overhang is used for each of the duplex sequencing adapters in a composition. In some embodiments, the constant 3'-overhang can comprise adenine (A), thymine (T), guanine (G), cytosine (C), uracil (U), inosine (I), or any other natural or synthetic base. In some embodiments, the 3'-overhang comprises a dinucleotide, such as a guanine-cytosine (GC) dinucleotide. The constant 3'-overhang can be ligated to the nucleic acid molecule to be sequenced. FIG. 2 illustrates one exemplary embodiment of a duplex sequencing adapter comprising a constant 3'-overhang. The molecular barcode is ligated adjacent to the nucleic acid molecule to be sequenced, except that it may be separated by the constant 3'-overhang (and/or its complementary base(s) that may be included in the complementary strand after ligation).
[0061] The molecular barcodes can be of any length, for example between about 2 and about 24 bases in length. In some embodiments, the molecular barcodes are about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 bases in length. In some embodiments, a composition comprises a plurality of duplex sequencing adapters, and the duplex sequencing adapters comprise molecular barcodes of at least two different lengths, at least three different lengths, or at least four different lengths. For example, in some embodiments, a plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first molecular barcode comprising a nucleic acid duplex n nucleotides in length; and a second duplex sequencing adapter comprising a second molecular barcode comprising a nucleic acid duplex n + x nucleotides in length, wherein x is not zero. In some embodiments, the plurality of duplex sequencing adapters further comprises a third duplex sequencing adapter comprising a third molecular barcode comprising a nucleic acid duplex n + y nucleotides in length, wherein y is not zero or x.
[0062] Variable lengths of the molecular barcodes in the plurality of duplex sequencing adapters are particularly useful, for example, when the duplex sequencing adapters comprise a constant 3'-overhang. For example, if all molecular barcodes were of the same length, the constant 3'-overhang would be read in the same sequencing cycle, resulting in a large, non-diverse signal. Such non-diverse (or low-diversity) signals can be problematic for many sequencing systems, as they can create a high level of noise that overwhelms the true signal at that position. Thus, using variable-length molecular barcodes ensures that no single sequencing cycle is presented with only a single base, thereby preventing loss of sequencing quality.
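The effect of barcode length on per-cycle diversity can be illustrated with a small sketch. It models each read as a barcode followed by a single-base constant overhang (assumed here to be T) and treats the insert bases that follow as diverse, represented by a "?" placeholder; the function name and barcode sets are hypothetical.

```python
def non_diverse_cycles(barcodes, overhang="T"):
    """Return 0-based sequencing cycles in which every read shows the same base.

    Reads are modeled as barcode + overhang. Positions past the overhang
    belong to the insert and are padded with "?", which is assumed diverse.
    """
    reads = [barcode + overhang for barcode in barcodes]
    longest = max(len(read) for read in reads)
    padded = [read.ljust(longest, "?") for read in reads]
    flagged = []
    for cycle in range(longest):
        symbols = {read[cycle] for read in padded}
        if len(symbols) == 1 and "?" not in symbols:
            flagged.append(cycle)
    return flagged


# Equal-length barcodes: the overhang is read in the same cycle for all reads.
same_length = non_diverse_cycles(["ACGT", "CGTA", "GTAC", "TACG"])

# Variable-length barcodes: the overhang is spread over several cycles.
variable_length = non_diverse_cycles(["ACG", "CGTA", "GTACA", "TACGCG"])
```

With the equal-length set, cycle 4 contains only the overhang base; with the variable-length set, no cycle is flagged.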
[0063] In some embodiments, the molecular barcodes are laser-color balanced. Similar to the variable lengths of the molecular barcodes, laser-color balancing can help ensure that no single sequencing cycle is presented with only a single base when sequencing the molecular barcode. For example, some sequencing systems employ colored lasers to sequence nucleic acid molecules (for example, in some sequencing systems, a green laser is used to sequence G or T nucleotides, and a red laser is used to sequence A or C nucleotides). To avoid oversaturation of signal, resulting in sequencing quality loss, the molecular barcodes can be color balanced. In some embodiments, the molecular barcodes are laser-color balanced amongst the plurality of sequence adapters. For example, in some embodiments, the ratio of A/C to G/T nucleotides at any given position of the molecular barcode in the plurality of sequence adapters is between about 2:1 and about 1:2 (such as about 1:1) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the molecular barcodes are laser-color balanced within any given molecular barcode. For example, in some embodiments, the ratio of A/C to G/T nucleotides within any given molecular barcode is between about 2:1 and about 1:2 (such as about 1:1).
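A per-position laser-color check of the kind described above might look like the following sketch. It uses the A/C versus G/T grouping given in the text and evaluates only positions within the length of the shortest barcode; the function name, the default bounds expressed as ratios (0.5 to 2.0), and the example barcodes are hypothetical.

```python
def color_balanced_positions(barcodes, low=0.5, high=2.0):
    """For each position up to the shortest barcode, test the A/C : G/T ratio.

    Returns one boolean per position: True when the ratio of A/C bases to
    G/T bases across the barcode set lies between `low` and `high`.
    """
    shortest = min(len(barcode) for barcode in barcodes)
    results = []
    for position in range(shortest):
        column = [barcode[position] for barcode in barcodes]
        ac = sum(base in "AC" for base in column)
        gt = sum(base in "GT" for base in column)
        # A column with no G/T bases cannot be balanced.
        results.append(gt > 0 and low <= ac / gt <= high)
    return results


balanced = color_balanced_positions(["ACGT", "CGTA", "GTAC", "TACG"])
skewed = color_balanced_positions(["ACGT", "CCTA", "GTAC", "AACG"])
```

In the second set, the first two positions carry three A/C bases for every G/T base and therefore fall outside the 2:1 bound.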
[0064] In some embodiments, the molecular barcodes are base-composition balanced. In some embodiments, the molecular barcodes are base-composition balanced amongst the plurality of sequence adapters. For example, in some embodiments, the proportion of adenine at any given position of the molecular barcode amongst the plurality of sequence adapters is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters; the proportion of cytosine at any given position of the molecular barcodes is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters; the proportion of thymine at any given position of the molecular barcodes is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters; and the proportion of guanine at any given position of the molecular barcodes is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the molecular barcodes are base-composition balanced within the molecular barcode. 
For example, in some embodiments, the proportion of adenine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of cytosine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of thymine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); and the proportion of guanine within any given molecular barcode is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25).
[0065] Laser-color balancing and base-composition balancing at any given position of the molecular barcode amongst the plurality of sequence adapters are preferably measured against the length of the shortest molecular barcode. This is because, in some embodiments, a constant 3'-overhang is adjacent to the molecular barcode in the duplex sequencing adapter, which can cause a strong signal for that particular nucleotide. Including the same nucleotide at the position of a longer molecular barcode that overlaps the 3'-overhang following a shorter barcode would add to the signal of the nucleotide in the 3'-overhang. Thus, in some embodiments, the molecular barcodes do not comprise the nucleotide present in the 3'-overhang at any position that would be co-sequenced with the 3'-overhang.
[0066] In some embodiments, the proportion of any given nucleotide (e.g., A, T, C, or G) at any given position of the molecular barcode amongst the plurality of sequence adapters is between about 0.2 and about 0.3 (such as about 0.25) at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters, and between about 0.25 and about 0.4 (such as about 0.33) for any given nucleotide other than the constant 3'-overhang nucleotide at any position beyond the length of the shortest molecular barcode.
[0067] Laser-color balancing and base-composition balancing within any given molecular barcode can be determined by counting the fraction of different nucleotide types within any molecular barcode. Base composition need not be precisely balanced. For example, in molecular barcodes with a length not divisible by 4, an imperfect balance is inevitable.
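Counting nucleotide fractions, both per position across a barcode set and within a single barcode, can be sketched as follows. The 0.2 to 0.4 bounds come from the ranges stated above; the function names and example barcodes are hypothetical.

```python
from collections import Counter


def position_proportions(barcodes):
    """Per-position base proportions, limited to the shortest barcode length."""
    shortest = min(len(barcode) for barcode in barcodes)
    table = []
    for position in range(shortest):
        counts = Counter(barcode[position] for barcode in barcodes)
        table.append({base: counts[base] / len(barcodes) for base in "ACGT"})
    return table


def within_barcode_proportions(barcode):
    """Fraction of each nucleotide type within a single barcode."""
    counts = Counter(barcode)
    return {base: counts[base] / len(barcode) for base in "ACGT"}


def is_balanced(proportions, low=0.2, high=0.4):
    """True when every base proportion lies within [low, high]."""
    return all(low <= value <= high for value in proportions.values())


per_position = position_proportions(["ACGT", "CGTA", "GTAC", "TACG"])

# A length-5 barcode cannot be perfectly balanced, but can still be in range.
five_mer = within_barcode_proportions("ACGTA")
```

The length-5 example has proportions of 0.4, 0.2, 0.2, and 0.2, which is an imperfect balance that still falls within the stated range.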
[0068] In some embodiments, the molecular barcodes include additional engineering features to enhance the sequencing quality. For example, in some embodiments, the molecular barcodes do not comprise homopolymer sequences (such as two or more consecutive, identical nucleotides; three or more consecutive, identical nucleotides; four or more consecutive, identical nucleotides; five or more consecutive, identical nucleotides; or six or more consecutive, identical nucleotides). In some embodiments, the molecular barcodes are non-self-complementary (i.e., a single strand of the molecular barcode is not complementary to itself, for example does not form a hairpin structure).
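The two design constraints above can be screened computationally. The sketch below is illustrative only: the function names, the default run length, and the stem and loop lengths are assumptions, and the stem search is a simplified proxy for hairpin formation rather than a thermodynamic prediction.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def has_homopolymer(seq, min_run=3):
    """True when seq contains min_run or more consecutive identical nucleotides."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    return re.search(r"(.)\1{%d,}" % (min_run - 1), seq) is not None

def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, and T."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_self_complementary(seq):
    """True when the whole sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)

def has_hairpin_stem(seq, stem=4, min_loop=3):
    """True when some stem-length segment has its reverse complement
    downstream, separated by at least min_loop nucleotides."""
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        if seq.find(target, i + stem + min_loop) != -1:
            return True
    return False

print(has_homopolymer("ACGGGT"))
print(has_hairpin_stem("GGGAAATCCC", stem=3))
```

A candidate barcode or sample index would be rejected when any of the three checks returns True for the chosen thresholds.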
[0069] The duplex sequencing adapter optionally further comprises a sample index. The sample index can be used to identify the sample of origin of each read, and allows pooling of multiple samples during the same sequencing run. Thus, the sample index is the same for each duplex sequencing adapter when the duplex sequencing adapter is ligated to the nucleic acid molecules, and different samples can be pooled together after ligation. Pooling of samples can occur, for example, prior to any amplification or sequencing of the nucleic acids.
[0070] The sample index can be of any length, for example between about 6 nucleotides and about 24 nucleotides in length (such as between about 6 nucleotides and about 12 nucleotides, between about 8 nucleotides and about 16 nucleotides, between about 12 nucleotides and about 26 nucleotides, or between about 16 nucleotides and about 24 nucleotides in length). In some embodiments, the sample index comprises a first portion and a second portion, which may be on the same strand or on differing strands of the duplex sequencing adapter. In some embodiments, the first portion of the sample index or the second portion of the sample index is between about 3 nucleotides and about 12 nucleotides in length (such as between about 3 nucleotides and about 6 nucleotides, between about 6 nucleotides and about 8 nucleotides, between about 8 nucleotides and about 10 nucleotides, or between about 10 nucleotides and about 12 nucleotides in length). In some embodiments, the first portion of the sample index and the second portion of the sample index are of equal length.
[0071] In some embodiments, the sample index is laser-color balanced within the sample index. For example, in some embodiments, the ratio of A/C to G/T nucleotides within any given sample index is between about 2:1 and about 1:2 (such as about 1:1). In some embodiments, the sample index is base-composition balanced within the sample index. For example, in some embodiments, the proportion of adenine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of cytosine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); the proportion of thymine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25); and the proportion of guanine within the sample index is between about 0.2 and about 0.4 (such as between about 0.2 and about 0.3, or about 0.25).
[0072] In some embodiments, the sample index includes additional engineering features to enhance the sequencing quality. For example, in some embodiments, the sample index does not comprise homopolymer sequences (such as two or more consecutive, identical nucleotides; three or more consecutive, identical nucleotides; four or more consecutive, identical nucleotides; five or more consecutive, identical nucleotides; or six or more consecutive, identical nucleotides). In some embodiments, the sample index is non-self-complementary (i.e., the sample index is not complementary to itself, for example the sample index is not a hairpin structure).
[0073] In some embodiments, two or more sequencing libraries are pooled, wherein the nucleic acid molecules are ligated to a duplex sequencing adapter and wherein the individual sequencing libraries are identifiable by a unique sample index. In some embodiments, the sample indices have an edit distance of 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, or 8 or more from any other unique sample index.
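The edit-distance requirement above can be checked for a candidate set of sample indices as sketched below. The specification does not name a particular metric; this sketch assumes the Levenshtein distance (substitutions, insertions, and deletions each cost 1), and the function names and example indices are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def min_pairwise_distance(indices):
    """Smallest edit distance between any two indices (needs two or more)."""
    return min(edit_distance(a, b) for a, b in combinations(indices, 2))

# Hypothetical sample indices for three pooled libraries.
indices = ["ACGTAC", "TGCATG", "GATCGA"]
print(min_pairwise_distance(indices) >= 2)
```

A larger minimum distance allows a read with one or more sequencing errors in the sample index to be assigned to its library of origin, or discarded, without being mistaken for another library.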
[0074] In some embodiments, the sample index is incorporated into a plurality of duplex sequencing adapters by combining the duplex sequencing adapters comprising a molecular barcode with an oligonucleotide comprising the complement sequence of a sample index. The oligonucleotide further comprises a complementation region, which is complementary to a portion of a non-complementary region of the duplex sequencing adapter. Thus, once combined, the oligonucleotide pairs with the adapter sequence, and the sample index can be incorporated using a DNA polymerase.
[0075] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0076] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence, wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0077] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0078] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96). In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0079] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0080] In some embodiments, there is provided a composition comprising a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0081] In some embodiments, there is provided a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0082] In some embodiments, there is provided a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0083] In some embodiments, there is provided a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96). In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0084] In some embodiments, there is provided a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0085] In some embodiments, there is provided a composition comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0086] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0087] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence, wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0088] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0089] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96). In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0090] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0091] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a duplex molecular barcode having a nondegenerate sequence and a constant 3'-overhang (such as a thymine nucleotide), wherein the plurality of duplex sequencing adapters comprises a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0092] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length; and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0093] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0094] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; and wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96). In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0095] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are color balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
[0096] In some embodiments, there is provided a nucleic acid sequencing library comprising a plurality of nucleic acid inserts ligated to at least one (such as two) duplex sequencing adapter randomly selected from a plurality of duplex sequencing adapters comprising a first duplex sequencing adapter comprising a first duplex molecular barcode n nucleotides in length and a constant 3'-overhang (such as a thymine nucleotide); and a second duplex sequencing adapter comprising a second duplex molecular barcode n + x nucleotides in length and the constant 3'-overhang, wherein x is not zero; wherein the number of unique molecular barcodes in the plurality of duplex sequencing adapters is between 2 and about 500 (such as between about 10 and about 400, between about 40 and about 200, between about 80 and 120, or about 96); and wherein the molecular barcodes are base-composition balanced at the corresponding position relative to the shortest molecular barcode in the plurality of duplex sequencing adapters. In some embodiments, the edit distance between each molecular barcode is 2 or more. In some embodiments, the duplex sequencing adapters comprise a sample index (which may be included in the duplex sequencing adapter before ligation or incorporated into the duplex sequencing adapter during amplification). In some embodiments, the sample index comprises a first portion and a second portion (which may be on the same nucleic acid strand or on different nucleic acid strands of the duplex sequencing adapter). In some embodiments, the plurality of sequence adapters comprises Y-shaped duplex sequencing adapters, U-shaped duplex sequencing adapters, or a combination thereof.
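By way of illustration only, the barcode constraints recited above (barcodes of differing lengths n and n + x, an edit distance of 2 or more between barcodes, and base-composition balance at each position of the shortest barcode) could be checked computationally. The following Python sketch is not part of the disclosure: the four barcode sequences and all function names are hypothetical, and the edit distance is assumed to be the Levenshtein distance.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def min_pairwise_edit_distance(barcodes: list[str]) -> int:
    """Smallest edit distance over all pairs of barcodes in the set."""
    return min(edit_distance(a, b) for a, b in combinations(barcodes, 2))


def base_composition(barcodes: list[str]) -> list[Counter]:
    """Per-position base counts over the length of the shortest barcode."""
    shortest = min(len(b) for b in barcodes)
    return [Counter(b[i] for b in barcodes) for i in range(shortest)]


# Hypothetical barcode set: two barcodes of length n = 4, two of length n + 1.
example_barcodes = ["ACGT", "CATG", "GTACA", "TGCAC"]

if __name__ == "__main__":
    print("minimum pairwise edit distance:",
          min_pairwise_edit_distance(example_barcodes))
    for position, counts in enumerate(base_composition(example_barcodes)):
        print(position, dict(counts))
```

In this toy set every base occurs once at each of the first four positions, so the set is balanced in the sense checked by `base_composition`; a real design would apply the same checks to a much larger set, such as the approximately 96 barcodes mentioned above.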
EXAMPLES
Example 1
[0097] Customized sequencing adapters were ligated to a double-stranded DNA (dsDNA) of interest during a library preparation step. As shown in FIG. 1A, the original dsDNA molecule consisted of both plus (+; dashed line) and minus (-; solid line) strands.
Adapters, in this instance, consisted of two unique sequencing primer binding sites, labeled P5 and P7, and an "inline" molecular barcode used to trace downstream sequencing reads back to the dsDNA molecule from which they were derived. Reconstructing the sequence of the original dsDNA through redundant sequencing of the original strands allowed for a high degree of error correction and an almost complete elimination of false positive mutations, which arise mainly from chemical DNA damage and sequencing artifacts.
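The error correction described above can be sketched in code. The following Python example is a minimal illustration under stated assumptions, not the procedure actually used in this Example: reads are assumed to be already grouped by molecular barcode and strand of origin, trimmed to equal length, and oriented to the same reference strand; the 0.6 majority threshold, the use of "N" to mask unresolved positions, and the function names are all hypothetical.

```python
from collections import Counter


def strand_consensus(reads: list[str], min_fraction: float = 0.6) -> str:
    """Per-position majority vote over reads derived from one original strand.

    Positions where no base reaches min_fraction are masked as 'N'.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(plus_consensus: str, minus_consensus: str) -> str:
    """Keep a base only where both strand consensuses agree; mask the rest."""
    if len(plus_consensus) != len(minus_consensus):
        raise ValueError("strand consensuses must have equal length")
    return "".join(
        p if p == m and p != "N" else "N"
        for p, m in zip(plus_consensus, minus_consensus)
    )


if __name__ == "__main__":
    # Invented reads: one sequencing error on the plus strand family, and a
    # change carried by most minus strand reads (e.g. damage on one strand).
    plus_reads = ["ACGTAC", "ACGTAC", "ACTTAC"]
    minus_reads = ["ACGTAC", "ACGTAT", "ACGTAT"]
    plus = strand_consensus(plus_reads)
    minus = strand_consensus(minus_reads)
    print(plus, minus, duplex_consensus(plus, minus))
```

The isolated error in one plus strand read is outvoted within its family, while the change supported on only one strand is masked at the duplex stage, which is how single-strand artifacts can be kept from being called as mutations.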
[0098] Following adapter ligation, library samples were PCR amplified with primers specific to P5/P7 sequences, such that only molecules with adapters ligated to both ends were exponentially amplified (see FIG. 1B). Importantly, during PCR amplification each original plus and minus strand of an original dsDNA molecule (as shown in FIG. 1A) was amplified independently such that the sequence of the original plus strand was now represented in both a plus and minus strand, and the sequence of the original minus strand was represented in both a minus and plus strand. As shown in FIG. 1B, dashed lines indicate post-PCR library molecules derived from the original plus strand, while solid lines indicate post-PCR library molecules derived from the original minus strand.
[0099] Capture enrichment was performed in parallel using biotinylated baits/probes designed to both the plus strand of the region of interest (FIG. 1C) and the minus strand of the region of interest (FIG. 1D). Probes used in FIG. 1C and FIG. 1D were reverse complements of one another, but were separated spatially so as to prevent self-binding. The DNA library from FIG. 1B was split and separately added to either plus strand probes (FIG. 1C) or minus strand probes (FIG. 1D). In addition, blocking oligos complementary to the sequencing primer binding sites P5 and P7 were included. The inclusion of blocking oligos minimized "off target" effects due to non-specific binding of these sequences to other "off target" library molecules that also contained these sequences.
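Because the plus strand and minus strand probes are reverse complements of one another, one probe set can be derived from the other computationally. The short Python sketch below illustrates that relationship; the probe sequence and function names are invented for illustration and do not correspond to any probe used in this Example.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return sequence.translate(_COMPLEMENT)[::-1]


def are_reverse_complements(probe_a: str, probe_b: str) -> bool:
    """True if the two probes would anneal to each other end to end."""
    return probe_a.upper() == reverse_complement(probe_b).upper()


if __name__ == "__main__":
    plus_strand_probe = "ATGGCCTTA"  # hypothetical probe sequence
    minus_strand_probe = reverse_complement(plus_strand_probe)
    print(minus_strand_probe)
    print(are_reverse_complements(plus_strand_probe, minus_strand_probe))
```

A pair for which `are_reverse_complements` returns `True` can hybridize to itself if combined in one reaction, which is consistent with keeping the two probe sets in separate captures as described above.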
[00100] The mixture of probes, DNA library, and blocking oligos was concentrated via SpeedVac™ (Thermo Fisher Scientific Inc.), and resuspended in hybridization buffer. The hybridization mixture was heated to 95°C for 5-10 minutes and then cooled to 55-65°C overnight to allow probes to bind to the target DNA library. Following hybridization, magnetic streptavidin beads were added to the solution and incubated at 55-65°C. The probe-library bound beads were washed to remove DNA that bound non-specifically, and finally resuspended in a PCR Master Mix containing primers to P5/P7.
[00101] Enriched samples were PCR amplified in preparation for sequencing on an Illumina HiSeq 2500 (FIG. 1E). Paired end sequencing was performed on the enriched libraries such that plus strand probe captures (FIG. 1C) were sequenced independently from minus strand probe captures (FIG. 1D). In this way, downstream analysis could readily compare the effects of capturing with probes designed to either the plus strand, the minus strand, or both (by merging data from both sequencing runs).
Example 2
[00102] FIG. 2 shows strand bias as a function of (+) vs. (-) vs. (+ and -) probes. Six technically and biologically independent samples were analyzed. In other words, six biologically distinct DNA samples were all prepared and enriched in 12 independent enrichment reactions (+ and - strand probes for each biological replicate). All (+) strand probe enrichments were consolidated and sequenced independently from (-) strand probe enrichments.
[00103] Strand bias was determined from sequencing reads by calculating the number of times a forward read (read from the P5 sequencing primer as shown in FIG. 1) maps to the plus (black) or minus (grey) strand following alignment to a reference genome. For this analysis, only reads mapping to the region of interest to which probes were designed were considered. Since (+) strand probe enrichments were sequenced independently from (-) strand probe enrichments, strand bias metrics were determined for each enrichment condition independently (shown as checkered blocks for (+) strand probes and hashed blocks for (-) strand probes). Further, data from both (+) and (-) strand probe enrichments was merged and shown as solid blocks (FIG. 2).
[00104] While there was no apparent relationship between the direction of bias and the probeset used, combining data from both (+) and (-) strand probes greatly reduced strand bias to nearly a 1:1 ratio for all samples tested. In other words, bias introduced in a (+) strand probe enrichment was almost completely offset by bias in the opposite direction in a (-) strand probe enrichment of the same sample (FIG. 2, compare checkered blocks to hashed blocks for each individual sample).
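The strand bias metric and the effect of merging the two enrichments can be illustrated with a small calculation. In the Python sketch below, the read counts are invented for illustration and are not data from FIG. 2; the class and method names are likewise hypothetical. Strand bias is expressed as the fraction of forward reads in the region of interest that map to the plus strand, so 0.5 corresponds to a 1:1 ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandCounts:
    """Forward reads in the region of interest, by mapped strand."""
    plus: int
    minus: int

    def plus_fraction(self) -> float:
        """Fraction of forward reads mapping to the plus strand."""
        total = self.plus + self.minus
        if total == 0:
            raise ValueError("no reads in the region of interest")
        return self.plus / total

    def merge(self, other: "StrandCounts") -> "StrandCounts":
        """Pool the counts from two independently sequenced enrichments."""
        return StrandCounts(self.plus + other.plus, self.minus + other.minus)


if __name__ == "__main__":
    # Invented counts: the two enrichments are biased in opposite directions.
    plus_probe_capture = StrandCounts(plus=700, minus=300)
    minus_probe_capture = StrandCounts(plus=300, minus=700)
    merged = plus_probe_capture.merge(minus_probe_capture)
    for label, counts in [("(+) probes", plus_probe_capture),
                          ("(-) probes", minus_probe_capture),
                          ("merged", merged)]:
        print(label, round(counts.plus_fraction(), 3))
```

With these illustrative numbers each enrichment alone shows a 70:30 skew, whereas the pooled counts are balanced, mirroring the offsetting behavior described in this Example.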
Example 3
[00105] FIG. 3 shows a heatmap summarizing the median molecular and duplex recovery depth from capture enrichments of six independent samples (triplicate of 20 ng and 50 ng of DNA input into library preparation). DNA samples were molecularly barcoded during library preparation and PCR such that sequencing reads could be assigned to the original duplex molecule from which they were derived. Duplex depth is defined as the number of molecules for which both strands of the original molecule were sequenced, whereas molecular depth is defined as the depth of duplexes plus any molecules for which only one strand was recovered after enrichment and sequencing.
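The two depth definitions above can be made concrete with a short sketch. The Python example below assumes that each sequencing read has already been assigned to an original molecule (for instance, by molecular barcode and alignment position) and to the strand of that molecule from which it derives; the molecule identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def recovery_depths(observations: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (molecular_depth, duplex_depth) for one genomic position.

    Each observation is (molecule_id, strand), with strand '+' or '-'
    indicating which original strand the read was derived from.
    """
    strands_seen: dict[str, set[str]] = defaultdict(set)
    for molecule_id, strand in observations:
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand label: {strand!r}")
        strands_seen[molecule_id].add(strand)
    # Duplex depth: molecules for which both original strands were sequenced.
    duplex_depth = sum(1 for s in strands_seen.values() if len(s) == 2)
    # Molecular depth: duplexes plus molecules recovered from one strand only.
    molecular_depth = len(strands_seen)
    return molecular_depth, duplex_depth


if __name__ == "__main__":
    reads = [("m1", "+"), ("m1", "-"),  # duplex
             ("m2", "+"),               # single strand only
             ("m3", "-"), ("m3", "-"),  # single strand, sequenced twice
             ("m4", "+"), ("m4", "-")]  # duplex
    print(recovery_depths(reads))
```

Repeated reads of the same strand of the same molecule do not add to either depth, since both metrics count original molecules rather than reads.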
[00106] Libraries were captured with probes designed to both the plus and minus strands of a ~10 kb region of interest (i.e., the probes were reverse complements of one another). As such, libraries were split following PCR into separate reactions and enriched independently, in one reaction with plus strand capture probes and in the other with minus strand capture probes. These enrichments were sequenced independently and then merged for the analysis of (+) and (-) strand capture probes.
[00107] In general, increases in the number of duplex molecules were observed when separate enrichments were performed with probes to both the (+) and (-) strands (FIG. 3, compare row 2 to rows 4 and 6). This represents an approximately 5-10% increase compared to performing an enrichment with probes designed to a single strand alone.

Claims

CLAIMS What is claimed is:
1. A method of sequencing a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the resulting plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
2. The method of claim 1, wherein the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
3. A method of sequencing a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
4. A method of sequencing a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
5. The method of any one of claims 1 to 4, wherein the first sequencing adapter and the second sequencing adapter are the same.
6. The method of any one of claims 1 to 4, wherein the first sequencing adapter and the second sequencing adapter are different.
7. The method of any one of claims 1 to 6, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced together.
8. The method of any one of claims 1 to 6, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately.
9. The method of any one of claims 1 to 8, wherein enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters.
10. The method of any one of claims 1 to 9, wherein the sequencing adapters further comprise molecular barcodes.
11. The method of any one of claims 1 to 10, wherein amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
12. The method of claim 11, further comprising constructing a first strand consensus sequence, comprising:
comparing the first strand reads in the set of first strand reads;
identifying and removing errors in the set of first strand reads; and
constructing an error-corrected first-strand consensus sequence.
13. The method of claim 12, further comprising constructing a second strand consensus sequence, comprising:
comparing the second strand reads in the set of second strand reads;
identifying and removing errors in the set of second strand reads; and constructing an error-corrected second-strand consensus sequence.
14. The method of claim 13, further comprising:
comparing the first strand consensus sequence and the second strand consensus sequence;
identifying and removing errors in the set of first strand reads and the set of second strand reads; and
constructing an error-corrected duplex consensus sequence.
15. The method according to any one of claims 1 to 14, wherein the duplex nucleic acid molecule is a cell-free DNA molecule.
16. The method according to claim 15, wherein the duplex nucleic acid molecule is a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
17. A method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
18. The method of claim 17, wherein the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
19. A method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library;
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures; and amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
20. A method of decreasing or eliminating strand bias during amplification and sequencing of a duplex nucleic acid molecule, comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand; independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures; and
amplifying and sequencing the plus strand probe captures and the minus strand probe captures.
21. The method of any one of claims 17 to 20, wherein the first sequencing adapter and the second sequencing adapter are the same.
22. The method of any one of claims 17 to 20, wherein the first sequencing adapter and the second sequencing adapter are different.
23. The method of any one of claims 17 to 22, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced together.
24. The method of any one of claims 17 to 22, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately.
25. The method of any one of claims 17 to 24, wherein enriching the plus strand library and the minus strand library using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters.
26. The method of any one of claims 17 to 25, wherein the sequencing adapters further comprise duplex molecular barcodes.
27. The method of any one of claims 17 to 26, wherein amplifying and sequencing the plus strand probe captures and the minus strand probe captures produce a set of first strand reads and a set of second strand reads.
28. The method of claim 27, further comprising constructing a first strand consensus sequence, comprising:
comparing the first strand reads in the set of first strand reads;
identifying and removing errors in the set of first strand reads; and
constructing an error-corrected first-strand consensus sequence.
29. The method of claim 28, further comprising constructing a second strand consensus sequence, comprising:
comparing the second strand reads in the set of second strand reads;
identifying and removing errors in the set of second strand reads; and
constructing an error-corrected second-strand consensus sequence.
30. The method of claim 29, further comprising:
comparing the first strand consensus sequence and the second strand consensus sequence;
identifying and removing errors in the set of first strand reads and the set of second strand reads; and
constructing an error-corrected duplex consensus sequence.
31. The method of any one of claims 17 to 30, wherein the duplex nucleic acid molecule is a cell-free DNA molecule.
32. The method according to claim 31, wherein the duplex nucleic acid molecule is a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
33. Nucleic acid sequencing libraries, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both a plus strand and a minus strand;
differentially enriching the plus strands and the minus strands using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
34. The nucleic acid sequencing libraries of claim 33, wherein the probes targeting substantially the same region of the plus strands and the minus strands are separated either temporally or spatially.
35. Nucleic acid sequencing libraries, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands;
splitting the sample library into a first sample enrichment library and a second sample enrichment library; and
enriching the plus strands of the first sample enrichment library using capture probes and enriching the minus strands of the second sample enrichment library using capture probes, wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library resulting from plus strand probe captures and an enriched library resulting from minus strand probe captures.
36. Nucleic acid sequencing libraries, wherein the nucleic acid sequencing libraries are produced by methods comprising:
ligating a first sequencing adapter and a second sequencing adapter to a first strand of a duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the first strand and the second sequencing adapter is ligated to a 3' end of the first strand; ligating the first sequencing adapter and the second sequencing adapter to a second strand of the duplex nucleic acid molecule, wherein the first sequencing adapter is ligated to a 5' end of the second strand and the second sequencing adapter is ligated to a 3' end of the second strand;
independently amplifying the first strand and the second strand with primers specific to the sequencing adapters, thereby creating a sample library wherein the first strand and the second strand are each represented as both plus strands and minus strands; and
sequentially enriching the plus strands and the minus strands of the sample library using capture probes, wherein sequentially enriching comprises first enriching with capture probes targeting the plus strands followed by enriching remaining unbound strands with capture probes targeting the minus strands, and wherein the capture probes target substantially the same region of the plus strands and the minus strands, thereby producing an enriched library comprising plus strand probe captures and minus strand probe captures.
37. The nucleic acid sequencing libraries of any one of claims 33 to 36, wherein the first sequencing adapter and the second sequencing adapter are the same.
38. The nucleic acid sequencing libraries of any one of claims 33 to 36, wherein the first sequencing adapter and the second sequencing adapter are different.
39. The nucleic acid sequencing libraries of any one of claims 33 to 38, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced together.
40. The nucleic acid sequencing libraries of any one of claims 33 to 38, wherein the plus strand probe captures and the minus strand probe captures are amplified and sequenced separately.
41. The nucleic acid sequencing libraries of any one of claims 33 to 40, wherein enriching the plus strands and the minus strands using capture probes further comprises minimizing non-specific binding using blocking oligonucleotides complementary to one or more of the sequencing adapters.
42. The nucleic acid sequencing libraries of any one of claims 33 to 41, wherein the sequencing adapters further comprise duplex molecular barcodes.
43. The nucleic acid sequencing libraries of any one of claims 33 to 42, wherein the duplex nucleic acid molecule is a cell-free DNA molecule.
44. The nucleic acid sequencing libraries of claim 43, wherein the duplex nucleic acid molecule is a cell-free tumor DNA molecule or a cell-free fetal DNA molecule.
PCT/US2017/068090 2017-01-31 2017-12-22 Capture probes using positive and negative strands for duplex sequencing WO2018144159A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762452593P 2017-01-31 2017-01-31
US62/452,593 2017-01-31

Publications (1)

Publication Number Publication Date
WO2018144159A1 true WO2018144159A1 (en) 2018-08-09

Family

ID=63040020

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/068090 WO2018144159A1 (en) 2017-01-31 2017-12-22 Capture probes using positive and negative strands for duplex sequencing

Country Status (1)

Country Link
WO (1) WO2018144159A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013191775A2 (en) * 2012-06-18 2013-12-27 Nugen Technologies, Inc. Compositions and methods for negative selection of non-desired nucleic acid sequences
US20150044687A1 (en) * 2012-03-20 2015-02-12 University Of Washington Through Its Center For Commercialization Methods of lowering the error rate of massively parallel dna sequencing using duplex consensus sequencing
WO2016176091A1 (en) * 2015-04-28 2016-11-03 Illumina, Inc. Error suppression in sequenced dna fragments using redundant reads with unique molecular indices (umis)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150044687A1 (en) * 2012-03-20 2015-02-12 University Of Washington Through Its Center For Commercialization Methods of lowering the error rate of massively parallel DNA sequencing using duplex consensus sequencing
WO2013191775A2 (en) * 2012-06-18 2013-12-27 Nugen Technologies, Inc. Compositions and methods for negative selection of non-desired nucleic acid sequences
WO2016176091A1 (en) * 2015-04-28 2016-11-03 Illumina, Inc. Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIs)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KENNEDY, SR ET AL.: "Detecting ultralow-frequency mutations by Duplex Sequencing", NATURE PROTOCOLS, vol. 9, no. 11, November 2014 (2014-11-01), pages 2586 - 2606, XP055390095 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020043803A1 (en) * 2018-08-28 2020-03-05 Sophia Genetics S.A. Methods for asymmetric DNA library generation and optionally integrated duplex sequencing
EP4028586A4 (en) * 2019-09-13 2023-10-04 University Health Network Detection of circulating tumor DNA using double stranded hybrid capture
EP3795685A1 (en) * 2019-09-20 2021-03-24 Sophia Genetics S.A. Methods for DNA library generation to facilitate the detection and reporting of low frequency variants
WO2021053208A1 (en) 2019-09-20 2021-03-25 Sophia Genetics S.A. Methods for DNA library generation to facilitate the detection and reporting of low frequency variants
CN116083423A (en) * 2022-05-16 2023-05-09 纳昂达(南京)生物科技有限公司 Probe for target enrichment of nucleic acid
CN116083423B (en) * 2022-05-16 2024-04-30 纳昂达(南京)生物科技有限公司 Probe for target enrichment of nucleic acid

Similar Documents

Publication Publication Date Title
Salk et al. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations
US20230416729A1 (en) Nucleic acid sequencing adapters and uses thereof
US20230332221A1 (en) Compositions and methods for identifying nucleic acid molecules
KR102210852B1 (en) Systems and methods to detect rare mutations and copy number variation
US20220348998A1 (en) Methods for labelling nucleic acids
JP6664575B2 (en) Nucleic acid molecule counting method
US20220364169A1 (en) Sequencing method for genomic rearrangement detection
US20240026440A1 (en) Methods of labelling nucleic acids
US20140336058A1 (en) Method and kit for characterizing RNA in a composition
WO2018144159A1 (en) Capture probes using positive and negative strands for duplex sequencing
CN111801427A (en) Generation of single-stranded circular DNA templates for single molecules
CN114774522A (en) Method and kit for constructing high fidelity sequencing library and application
US20200208140A1 (en) Methods of making and using tandem, twin barcode molecules
US20230399687A1 (en) Quantitative Multiplex Amplicon Sequencing System
EP3938541B9 (en) Method for sequencing a direct repeat
US20240052342A1 (en) Method for duplex sequencing
WO2024039272A1 (en) Nucleic acid amplification

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17895327; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 17895327; Country of ref document: EP; Kind code of ref document: A1