WO2013192292A1 - Massively-parallel multiplex locus-specific nucleic acid sequence analysis - Google Patents

Publication number
WO2013192292A1
WO2013192292A1 PCT/US2013/046522 US2013046522W WO2013192292A1 WO 2013192292 A1 WO2013192292 A1 WO 2013192292A1 US 2013046522 W US2013046522 W US 2013046522W WO 2013192292 A1 WO2013192292 A1 WO 2013192292A1
Authority
WO
WIPO (PCT)
Prior art keywords
barcode
primer
sequence
gene
nucleic acid
Application number
PCT/US2013/046522
Inventor
Justin Lamb
Original Assignee
Justin Lamb
Application filed by Justin Lamb

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Abstract

The efficiency of massively-parallel nucleic acid population sequencing has been significantly improved by simultaneously sequencing nucleic acids from different biological samples (i.e., different genomic populations). Improvements to the ligase detection reaction described herein combine the advantages of gene-barcoding and sample barcoding (e.g., well-barcodes). For example, amplified ligated gene-barcoded nucleic acids are incorporated with terminal restriction enzyme sites such that their resultant cleavage fragments have cohesive ends that are compatible with synthetically constructed addressing fragments containing well-barcodes. The joining of these addressing fragments and the cleaved gene-barcode fragments results in an informational read that can be sequenced in a high-throughput sequencer instrument.

Description

MASSIVELY-PARALLEL MULTIPLEX LOCUS-SPECIFIC NUCLEIC ACID SEQUENCE ANALYSIS
RELATED APPLICATIONS AND INCORPORATION BY REFERENCE
[0001] This application claims priority to US Provisional Application Serial No. 61/662,578 filed June 21, 2012.
[0002] The foregoing applications, and all documents cited therein ("appln cited documents") and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein ("herein cited documents"), and all documents cited or referenced in herein cited documents, together with any manufacturer's instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
FIELD OF THE INVENTION
[0003] The present invention relates to the simultaneous detection and quantification of multiple nucleic-acid sequences in a plurality of biological samples in parallel fashion. For example, sequence analysis may be performed using a massively-parallel multiplex reaction of locus-specific nucleic acids. It is of relevance to the fields of transcriptional profiling, genotyping and mutation analysis.
BACKGROUND
[0004] A variety of approaches allow genotyping of thousands of genomic loci and quantification of thousands of transcripts in individual biological samples (e.g., Affymetrix GeneChip, Illumina Infinium). What the field lacks is a method to analyze a relatively small number (100-1,000) of loci, perhaps identified using these genome-wide approaches to be especially informative, in a large number of biological samples rapidly and at low unit cost. Such a method would have applications including, but not limited to, large-scale validation of genetic associations with, and expression correlates of, disease states or outcomes, as well as screening of perturbagen libraries. The advent of technologies capable of sequencing hundreds of millions of individual DNA molecules simultaneously in a matter of days provides a possible solution to this problem.
[0005] Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.
SUMMARY OF THE INVENTION
[0006] The present invention relates to the simultaneous detection and quantification of multiple nucleic-acid sequences in a plurality of biological samples in parallel fashion. For example, sequence analysis may be performed using a massively-parallel multiplex reaction of locus-specific nucleic acids. It is of relevance to the fields of transcriptional profiling, genotyping and mutation analysis.
[0007] In one embodiment, the present invention contemplates an improved method for determining the presence and optionally the abundance of a target nucleotide sequence by performing a multiplex ligase detection/polymerase chain reaction on a biological sample which may comprise a plurality of target nucleotide sequences with a probe pair, wherein each probe of said probe pair may comprise a primer-binding site and a partial gene-barcode sequence, such that said partial gene-barcode sequence of said each probe hybridizes with at least a portion of said target nucleotide sequence, wherein said partial gene-barcode sequences of said each probe of said probe pair ligate together to form a gene-barcode nucleic acid, said improvement which may comprise: a) providing: i) a primer sequence which may comprise a template for at least one restriction enzyme site, wherein said primer sequence hybridizes to a terminal end or said primer-binding site of said gene-barcode nucleic acid; ii) an addressing fragment which may comprise a well-barcode sequence, a blunt terminus and a first overhang terminus; and iii) a restriction enzyme recognizing said at least one restriction enzyme site and capable of creating a second overhang terminus; b) amplifying said gene-barcode nucleic acid with said primer thereby incorporating said at least one restriction enzyme site into a gene-barcode amplicon; c) cleaving said gene-barcode amplicon with said restriction enzyme to create a gene-barcode fragment; d) ligating said addressing fragment with said gene-barcode fragment to create a final barcoded product; and e) sequencing said final barcoded product, wherein at least a partial sequence of said gene-barcode and said well-barcode contained therein are obtained, wherein said target nucleotide sequence is identified, and its presence and optionally its abundance in said biological sample is determined.
In one embodiment, the ligating is performed by a ligase enzyme selected from the group consisting of a Thermus aquaticus ligase, a Thermus thermophilus ligase, an E. coli ligase, a T4 ligase, and a Pyrococcus ligase. In one embodiment, the terminal end of said gene-barcode nucleic acid is a 3' terminal end. In one embodiment, the addressing fragment is a synthetic nucleic acid. In one embodiment, the well-barcode identifies a biological sample source of said target nucleotide sequence. In one embodiment, the gene-barcode nucleic acid may comprise a complementary sequence to said target nucleotide sequence. In one embodiment, the final barcoded product may comprise a well-barcode and a gene-barcode. In one embodiment, the well-barcode and said gene-barcode are adjacent. In one embodiment, the adjacent well-barcode and gene-barcode are separated by at least three nucleotides. In one embodiment, the restriction enzyme is an endonuclease. In one embodiment, the endonuclease is selected from the group consisting of a DraIII endonuclease and a HgaI endonuclease. In one embodiment, the gene-barcode fragment may comprise an overhang terminus. In one embodiment, the primer sequences have imperfect complementarity with at least one of said primer-binding sites in said gene-barcode nucleic acid. In one embodiment, the ligating joins said first overhang terminus and said second overhang terminus.
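The last embodiment of paragraph [0007], in which ligation joins the first overhang terminus of the addressing fragment to the second overhang terminus of the gene-barcode fragment, depends on the two single-stranded overhangs being able to base-pair. The following is a minimal Python sketch of that compatibility check, offered as an illustration only. The function names and the example overhang sequences are hypothetical and do not come from the application, and both overhangs are assumed to have the same polarity (both 3' or both 5') and to be written 5' to 3'.

```python
"""Illustrative check that two cohesive ends can anneal before ligation.

All names and example sequences are hypothetical; they are not taken
from the application.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(first: str, second: str) -> bool:
    """Return True if two same-polarity overhangs can base-pair.

    Both overhangs are written 5'->3'. They can anneal only when they
    have the same non-zero length and one is the reverse complement
    of the other.
    """
    if not first or len(first) != len(second):
        return False
    return reverse_complement(first) == second.upper()


def can_self_anneal(overhang: str) -> bool:
    """Return True if an overhang is its own reverse complement."""
    return overhangs_compatible(overhang, overhang)
```

For example, `overhangs_compatible("TCA", "TGA")` is true, whereas `overhangs_compatible("TCA", "TCA")` is false. One observation, which is not a statement from the application: an overhang with an odd number of nucleotides can never be its own reverse complement, because its middle base would have to pair with itself, so a three-nucleotide overhang cannot anneal to a second copy of itself.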
[0008] In one embodiment, the present invention contemplates an improved method for identifying one or more target nucleic acid molecules within a plurality of target nucleic acid molecules from a reaction mixture that may comprise a ligase, one or more target nucleic acid molecules, and one or more oligonucleotide probe sets, each of said probe sets including: i) a first oligonucleotide which may comprise: (a) a first target-specific portion capable of hybridizing to a corresponding target nucleic acid molecule, and (b) a first primer-specific portion; and ii) a second oligonucleotide which may comprise: (a) a second target-specific portion capable of hybridizing to said corresponding target nucleic acid molecule, and (b) a second primer-specific portion; by producing one or more ligation products which may comprise said first and second oligonucleotides after said first and said second target-specific portions of said oligonucleotides are hybridized to said corresponding target nucleic acid molecule and are ligated together, wherein each of said one or more ligation products may comprise a ligated sequence which includes: iii) said first target-specific portion of said first oligonucleotide, and said first primer-specific portion of said first oligonucleotide in a corresponding probe set; and iv) said second target-specific portion of the second oligonucleotide, and said second primer-specific portion of said second oligonucleotide in said corresponding probe set; and subjecting said one or more ligation products to one or more polymerase chain reaction cycles to produce one or more amplified ligation products, said improvement which may comprise: a) incorporating a restriction enzyme site into the 5' end or the 3' end of said amplified ligation products, to create a modified ligation product; b) contacting said modified ligation products with a restriction enzyme to produce a digested ligation product with at least one preferably cohesive end; c) appending an addressing fragment to said preferably cohesive end of said digested ligation product to produce a final barcoded product; and d) sequencing said final barcoded product, wherein at least a partial sequence of said target-specific portion is obtained, and at least a partial sequence of said addressing fragment is obtained, wherein said target nucleic acid molecule is identified.
In one embodiment, the first oligonucleotide primer-specific portion may comprise a template for a first restriction enzyme site. In one embodiment, the second oligonucleotide primer-specific portion may comprise a template for a second restriction enzyme site. In one embodiment, the primer-specific portions of said first and second oligonucleotide probe sets are universal primer binding sites. In one embodiment, the final barcoded product identifies one locus on said target nucleic acid molecule. In one embodiment, each probe of said each probe set is provided for each target locus identified in said target nucleic acid molecules. In one embodiment, the 5' end of said second target-specific portion is phosphorylated. In one embodiment, each probe of said probe set forms a ligation product when the 3' end of said first target-specific portion and the 5' end of said second target-specific portion hybridize to adjacent nucleotides of said target nucleic acid molecule. In one embodiment, the incorporating is performed by polymerase chain reaction with a primer set which may comprise a first primer with a nucleotide sequence that is substantially identical to said 5' primer-specific portion sequence and a second primer with a nucleotide sequence that is substantially complementary to the 3' primer-specific portion sequence of said ligation products. In one embodiment, the first and second primers are universal primers.
In one embodiment, the incorporating is performed by polymerase chain reaction with a primer set which may comprise a first primer with a nucleotide sequence that may comprise one or more mismatches from said 5' primer-specific portion sequence, and/or a second primer with a nucleotide sequence that may comprise one or more mismatches from the complement of the 3' primer-specific portion sequence of said ligation products. In one embodiment, the first and/or said second primers comprise a template for at least one restriction enzyme recognition site. In one embodiment, the first and said second primers are universal primers. In one embodiment, the appending is performed by a ligase enzyme. In one embodiment, the restriction enzyme site is located within said 3' primer-specific portion or said 5' primer-specific portion of said modified ligation product. In one embodiment, the restriction enzyme sites comprise a cleavage site that abuts said target-specific portion of said modified ligation products. In one embodiment, the restriction enzyme site is flanked by 3 or more nucleotides. In one embodiment, the restriction enzyme generates an overhang. In one embodiment, the overhang is at least three nucleotides. In one embodiment, the overhang is a 3' overhang. In one embodiment, the overhang is a 5' overhang. In one embodiment, the restriction enzyme generates an asymmetric end. In one embodiment, the addressing fragment may comprise a well-barcode sequence element. In one embodiment, the well-barcode sequence element identifies the biological source of said target nucleotide sequence. In one embodiment, the final barcoded product includes at least one gene-barcode element that is less than 20 nucleotides. In one embodiment, the final barcoded product includes a single well-barcode. In one embodiment, the well-barcode element is located at the 3' end of said final barcoded product.
In one embodiment, the well-barcode element is located at the 5' end of said final barcoded product. In one embodiment, the addressing fragment is a synthetic nucleic acid. In one embodiment, the addressing fragment has an asymmetric end. In one embodiment, the method further may comprise the step of counting said final barcoded products which may comprise identical gene-barcodes and identical well-barcodes to determine locus-specific abundance in a biological sample. In one embodiment, the method further may comprise the step of combining said final barcoded products having different well-barcodes before step (d). In one embodiment, the method further may comprise the step of counting said final barcoded products which may comprise identical gene-barcodes and identical well-barcodes to determine locus-specific abundance in a plurality of different biological samples. In one embodiment, the sequencing is performed on a single-molecule sequencing instrument. In one embodiment, the sequencing is massively-parallel. In one embodiment, the gene-barcode segment identifies a specific locus of said target nucleic acid molecule.
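The counting step described in paragraph [0008], in which final barcoded products sharing the same gene-barcode and the same well-barcode are tallied to estimate locus-specific abundance per sample, can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's analysis software. The read layout (gene-barcode, then an invariant anchor, then well-barcode) and the barcode lengths are hypothetical. The anchor value `TCCACTTA` is the invariant region named in the description of Figure 11, but its position relative to the two barcodes is assumed here.

```python
"""Illustrative tally of sequencing reads by (well-barcode, gene-barcode).

The read layout and barcode lengths below are hypothetical assumptions.
"""
from collections import Counter
from typing import Iterable, Optional, Tuple

ANCHOR = "TCCACTTA"  # invariant region named in the Figure 11 description
WELL_LEN = 4         # hypothetical well-barcode length
GENE_LEN = 6         # hypothetical gene-barcode length


def parse_read(
    read: str,
    anchor: str = ANCHOR,
    well_len: int = WELL_LEN,
    gene_len: int = GENE_LEN,
) -> Optional[Tuple[str, str]]:
    """Split a read into (well_barcode, gene_barcode) around the anchor.

    Assumed layout: [gene-barcode][anchor][well-barcode].
    Returns None when the anchor is missing or either barcode is truncated.
    """
    pos = read.find(anchor)
    if pos == -1 or pos < gene_len:
        return None
    well_start = pos + len(anchor)
    well = read[well_start:well_start + well_len]
    if len(well) < well_len:
        return None
    gene = read[pos - gene_len:pos]
    return well, gene


def count_products(reads: Iterable[str]) -> Counter:
    """Count informative reads per (well_barcode, gene_barcode) pair."""
    counts: Counter = Counter()
    for read in reads:
        parsed = parse_read(read)
        if parsed is not None:
            counts[parsed] += 1
    return counts
```

Each key of the returned counter pairs one sample (well-barcode) with one locus (gene-barcode), so reads pooled from many wells before sequencing can still be attributed to their sample of origin. Reads that lack the anchor are simply skipped in this sketch; a real pipeline would likely also handle sequencing errors.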
[0009] Accordingly, it is an object of the invention to not encompass within the invention any previously known product, process of making the product, or method of using the product such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product, which does not meet the written description and enablement requirements of the USPTO (35 U.S.C. § 112, first paragraph) or the EPO (Article 83 of the EPC), such that Applicants reserve the right and hereby disclose a disclaimer of any previously described product, process of making the product, or method of using the product.
[0010] It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as "comprises", "comprised", "comprising" and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean "includes", "included", "including", and the like; and that terms such as "consisting essentially of" and "consists essentially of" have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention.
[0011] These and other embodiments are disclosed or are obvious from and encompassed by, the following Detailed Description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] For a more complete understanding of the features and advantages of the present invention, reference is now made to the detailed description of the invention along with the accompanying figures.
[0013] Figure 1 depicts a representative flow diagram of a massively-parallel multiplex locus-specific nucleic-acid sequence analysis scheme.
[0014] Figure 2 depicts a representative schematic diagram of a coupled LDR/PCR (modified from Barany et al US 7,429,453, herein incorporated by reference).
[0015] Figure 3A-C depicts a representative schematic diagram of strategies for introducing well-barcode sequence elements. 3A: An exemplary well-barcode inserted into a ligation probe. 3B: An exemplary well-barcode incorporated into a PCR primer. 3C: An exemplary well-barcode ligated to a PCR product after cleavage at a restriction endonuclease site in a primer-specific portion.
[0016] Figure 4 depicts a representative schematic diagram of a restriction endonuclease site introduction into LDR/PCR products using a modified primer.
[0017] Figure 5A-F depicts one embodiment of a digestion of, and barcode ligation to, PCR products. 5A: PCR product. 5B: Expected DraIII cleavage fragments. 5C: Expected HgaI cleavage fragments. 5D: DraIII addressing fragment and corresponding DraIII cleavage fragment. 5E: HgaI addressing fragment and corresponding HgaI cleavage fragment. 5F: Gel showing results of digestion and ligation.
[0018] Figure 6 depicts exemplary data showing the effect of heat inactivation of residual polymerase on subsequent digestion and ligation reactions.
[0019] Figure 7A-C depicts one embodiment of an introduction of restriction endonuclease recognition sites with a mutant PCR primer. 7A: PCR product using mutant primer. 7B: Expected HgaI cleavage fragments. 7C: Gel showing results of digestion and ligation.
[0020] Figure 8A-D depicts one embodiment of an introduction of restriction endonuclease recognition sites with a lengthened mutant PCR primer. 8A: PCR product using lengthened mutant primer. 8B: Expected HgaI cleavage fragments. 8C: Z99-Hga addressing fragment and corresponding HgaI cleavage fragment. 8D: Gel showing results of digestion and ligation.
[0021] Figure 9A-E depicts one embodiment of an introduction of restriction endonuclease recognition sites with a mutant PCR primer and reduced annealing temperature. 9A: PCR product using mutant primer. 9B: Gel showing influence of annealing temperature. 9C: Expected DraIII cleavage fragments. 9D: Z99-Dra-P04 addressing fragment and corresponding DraIII cleavage fragment. 9E: Gel showing results of digestion and ligation.
[0022] Figure 10A-B depicts one embodiment of a purification of pools of LDR/PCR products. 10A: Gel depicting the results of pooling and lambda exonuclease digestion. 10B: Denaturing gel electrophoresis of sequencing libraries.
[0023] Figure 11A-C depicts exemplary data showing counts of LDR/PCR products with different well-barcode sequence elements. 11A: Structure of anchored reads (pink, gene-specific sequence; blue, well-barcode sequence; invariant region: TCCACTTA). 11B: Counts of strands with the same gene-barcode sequence fragments and different well-barcode sequence fragments from Library A. 11C: Counts of strands with the same gene-barcode sequence fragments and different well-barcode sequence fragments from Library B.
[0024] Figure 12A-G presents a comparison of an LDR embodiment described herein with a conventional Serial Multiplex Detection Method. 12A: Structure of long anchored reads (pink, target-specific sequence; blue, well-barcode sequence; invariant region: TCCACTTA). 12B: Plot of counts of reads with the same sequence tags and different well-barcode sequence fragments from Library X. 12C: Plot of counts of reads with the same sequence tags and different well-barcode sequence fragments from Library Y. 12D: Plot of abundance estimated by the FlexMAP method and by counts of informative reads containing the Z99S well-barcode sequence element in Library X. 12E: Plot of abundance estimated by the FlexMAP method and by counts of informative reads containing the Z97S well-barcode sequence element in Library X. 12F: Plot of abundance estimated by the FlexMAP method and by counts of informative reads containing the Z97S well-barcode sequence element in Library Y. 12G: Plot of abundance estimated by the FlexMAP method and by counts of informative reads containing the Z99S well-barcode sequence element in Library Y.
[0025] Figure 13 depicts the nucleic acid sequence of the primers, probes, oligonucleotides and polynucleotides employed herein. Synthetic polynucleotide (SEQ ID NO: 1); Primers (SEQ ID NOs: 2-3); Synthetic oligonucleotides (SEQ ID NOs: 4-7, 13-14, and 17-20); Left Probe (SEQ ID NO: 8); Right Probe (SEQ ID NO: 9); Top primer (SEQ ID NO: 10); and Bottom primers (SEQ ID NOs: 11, 12, 15, 16).
DETAILED DESCRIPTION OF THE INVENTION
[0026] To facilitate the understanding of this invention a number of terms are defined below. Terms defined herein (unless otherwise specified) have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as "a", "an" and "the" are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
[0027] As used herein, terms defined in the singular are intended to include those terms defined in the plural and vice versa.
[0028] The term "barcode" as used herein, refers to any nucleotide sequence within a given nucleic acid molecule (e.g., an addressing fragment or a probe nucleic acid) that renders that given nucleic acid molecule distinguishable (e.g., non-identical and/or unique) as compared to any other nucleic acid molecule. Such barcode nucleic acid sequences are pre-assigned to correspond to specific elements including, but not limited to, a gene locus or a biological sample collection (e.g., population). Further, preferable barcode nucleic acid sequences are not known to be wild type sequence for any gene and are considered arbitrary.
[0029] The term "addressing fragment" as used herein, refers to any nucleic acid sequence which may comprise at least one unique well-barcode and a cohesive end (e.g., a 3' overhang terminus or a 5' overhang terminus). For example, the addressing fragment may also have a primer sequence appended with a well-barcode sequence.
[0030] The term "well-barcode" refers to any nucleotide barcode sequence that is appended to any addressable fragment to uniquely identify a biological sample collection (i.e., for example, a population) from which they were derived. Well-barcode sequence elements can differ in attributes including, but not limited to, length and sequence. The composition and combinatorial nature of nucleic acid (e.g., DNA) is such that populations and/or collections of very short polynucleotides having a similar molecular weight can be derived from a very large number of non-identical species. For example, it is possible to generate 256 non-identical 4-mer polynucleotides and more than one million non-identical 10-mer polynucleotides that are easily distinguishable by direct sequencing. However, in accordance with some embodiments as described herein, if some of these polynucleotides (e.g., addressable fragments) are, in fact identical, they will differ in the nucleotide sequence of their well-barcode sequence element, and are therefore distinguishable from one another.
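The figures quoted in paragraph [0030] follow from the four-letter DNA alphabet: a barcode of length n can take 4^n distinct sequences, which gives 256 for a 4-mer and 1,048,576 for a 10-mer. A short Python sketch of that arithmetic is given below for illustration; the function names are hypothetical and nothing here is taken from the application beyond the two lengths it mentions.

```python
"""Illustrative arithmetic for the number of distinct n-mer barcodes."""
from itertools import product
from typing import Iterator

NUCLEOTIDES = "ACGT"


def barcode_space(length: int) -> int:
    """Return the number of distinct DNA sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return len(NUCLEOTIDES) ** length


def enumerate_barcodes(length: int) -> Iterator[str]:
    """Yield every DNA sequence of the given length, in lexical order."""
    for bases in product(NUCLEOTIDES, repeat=length):
        yield "".join(bases)
```

Here `barcode_space(4)` evaluates to 256 and `barcode_space(10)` to 1048576, matching the "256" and "more than one million" stated in the paragraph. Enumerating with `enumerate_barcodes` is practical only for short lengths, since the count grows by a factor of four per added nucleotide.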
[0031] The term "gene-barcode" refers to a unique nucleotide sequence that specifically identifies, and is complementary with, a target-specific nucleotide segment present within a locus nucleic acid molecule. A "partial gene-barcode" is complementary to at least a portion of a target-specific nucleic acid molecule and may be included within a ligation detection reaction probe nucleic acid. The gene-barcode sequences are pre-assigned to be complementary with specific and known locus nucleic acid sequences.
[0032] The term "gene-barcode nucleic acid" or "digested ligation products" as used herein, refers to any nucleic acid which may comprise a ligated nucleotide sequence complementary to a target-specific nucleotide sequence derived from the target-specific portions (e.g., partial gene-barcode sequences) of an LDR probe pair.
[0033] The term "gene-barcode fragment" or "target-specific ligation product" as used herein, refers to any nucleic acid which may comprise a nucleotide sequence complementary to a target-specific nucleotide sequence (e.g., a gene-barcode) that has been cleaved by a restriction enzyme (e.g., an endonuclease) and may comprise an overhang terminus (preferably a 3' overhang terminus).
[0034] The term "probe set" as used herein, refers to any pair of nucleic acids capable of hybridizing to a target-specific nucleic acid wherein the nucleic acids of the pair ligate together. Preferably, the ligation joins a first partial gene-barcode sequence on a first probe with a second partial gene-barcode sequence on a second probe.
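The probe-set definition can be made concrete with a small Python sketch in which two probes are ligated only when their target-specific portions hybridize to adjacent positions on the target. This is a simplified illustration, not the applicant's method: it requires exact complementarity, ignores reaction conditions, and all function names, parameter names, and sequences are hypothetical. All sequences are written 5' to 3', and the product is assembled in the order first primer-specific portion, first target-specific portion, second target-specific portion, second primer-specific portion.

```python
"""Illustrative ligation of an LDR probe pair on a target strand.

Exact-match hybridization only; names and sequences are hypothetical.
"""
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ligate_probe_pair(
    target: str,
    first_target_specific: str,
    second_target_specific: str,
    first_primer_portion: str = "",
    second_primer_portion: str = "",
) -> Optional[str]:
    """Return the ligation product, or None if the probes are not adjacent.

    The two target-specific portions are joined 3'-end to 5'-end. If the
    reverse complement of the joined sequence occurs as one contiguous
    stretch of the target, both probes hybridize to adjacent nucleotides
    and ligation is allowed in this simplified model.
    """
    joined = (first_target_specific + second_target_specific).upper()
    if not joined or reverse_complement(joined) not in target.upper():
        return None
    return first_primer_portion.upper() + joined + second_primer_portion.upper()
```

In this model the joined target-specific portions play the role of the gene-barcode, and a single-nucleotide gap between the two hybridization sites on the target is enough to prevent a product from forming.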
[0035] The term "final barcoded product" or "informative read" as used herein, refers to any nucleic acid sequence which may comprise a gene-barcode sequence and a well-barcode sequence. Preferably, the well-barcode sequence and the gene-barcode sequence are adjacent (i.e., for example, separated from each other by between approximately 2 - 10 nucleotides, but more preferably 3 nucleotides).
[0036] The term "target-specific sequence" as used herein, refers to a known wild-type nucleotide sequence of a specific locus.
[0037] The term "locus" as used herein, refers to any genomic region that can be targeted by a gene-barcode sequence that is complementary to a target-specific sequence residing within a particular locus. Preferably, the locus represents an operon of a gene, and more preferably at least partially includes an open reading frame.
[0038] The term "primer-specific sequence" as used herein, refers to any nucleic acid sequence that is complementary to a primer (e.g., a polymerase chain reaction primer). For example, a primer-specific sequence may reside within a target-specific sequence of a particular locus. A primer-specific sequence may be modified to contain at least one endonuclease restriction site.
[0039] The term "appending" as used herein, refers to any process or method that can join two nucleic acid sequences into a single nucleic acid sequence. Such processes may include, but are not limited to, conjugation, ligation, polymerization, or condensation.
[0040] The term "similarly-sized" as used herein when referring to nucleic acids, means a set of nucleic acids all within a specified range of length and/or molecular weight. Different sets of nucleic acids may be determined having sufficient differences in length and/or molecular weight ranges such that they may be easily distinguishable by standard isolation and purification techniques (i.e., for example, electrophoresis). For example, "similarly-sized addressing fragments" may range between 8 - 15 base pairs. For example, "similarly-sized uncleaved LDR/PCR products" may range between 70 and 90 base pairs. For example, "similarly-sized cleaved LDR/PCR products" may range between 35 - 64 base pairs.
[0041] The term "imperfect complementarity" as used herein, refers to two nucleic acid sequences that do not have perfect base-pairing matches (e.g., mis-matches) as a basis for hybridization. Nonetheless, two nucleic acid sequences having imperfect complementarity may still hybridize and undergo either ligation events, or amplification events. Imperfect complementarity may be seen in nucleic acid sequences having between approximately 95-99% identity in base-pair matches.
[0042] As used herein, the term "library" refers to a collection of nucleic acid fragments (i.e. DNA, cDNA, RNA) that is stored and propagated in a population of microorganisms through the process of molecular cloning. The application of these libraries depends on the source of the original DNA fragments. There are differences in the cloning vectors and techniques used in library preparation, but in general each DNA fragment is uniquely inserted into a cloning vector and the pool of recombinant DNA molecules is then transferred into a population of bacteria or yeast such that each organism contains on average one construct (vector plus insert). As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated (i.e. "cloned"). The term "library" may refer to a population of organisms, each of which carries a DNA molecule inserted into a cloning vector, or alternatively to the collection of all of the cloned vector molecules. A "cDNA library" represents a sample of the mRNA purified from a particular source (including a collection of cells, a particular tissue, or an entire organism) that has been converted back to a DNA template by the enzyme reverse transcriptase. A "cDNA library" therefore represents the genes that were being actively transcribed in that particular source under the physiological, developmental, or environmental conditions that existed when the mRNA was purified. cDNA libraries can be generated using techniques that promote "full-length" clones or under conditions that generate shorter fragments used for the identification of "expressed sequence tags". Applications of cDNA libraries include, discovery of novel genes, cloning of full-length cDNA molecules for in vitro study of gene function, study of the repertoire of mRNAs expressed in different cells or tissues and study of alternative mRNA splicing in different cells or tissues. 
A "genomic DNA library" is a set of clones that represent the entire genome of a given organism. The number of clones that constitute a genomic library depends on the size of the genome in question and the insert size tolerated by the particular cloning vector system. Applications of genomic libraries include determining the complete genome sequence of a given organism, generation of transgenic animals, studying the function of regulatory sequences in vitro and identifying genetic mutations that underlie diseases such as cancer, diabetes and hypertension.
[0043] As used herein, the term "polymerase chain reaction" ("PCR") refers to the method of Mullis as provided for in U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, incorporated herein by reference, that describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (in other words, denaturation, annealing and extension constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified". 
With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (for example, hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin- enzyme conjugate detection; incorporation of 32P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.
[0044] As used herein, the terms "PCR product", "PCR fragment", "amplification product" and the like refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.
[0045] As used herein, the term "ligase-detection reaction" or "LDR" refers to a method that utilizes the ability of DNA ligase to preferentially seal adjacent oligonucleotides hybridized to a target DNA molecule. For example, a bi-allelic single nucleotide polymorphism (SNP) may be "typed" by designing three probes - one common probe and two allelic probes. The common probe anneals to a target nucleic acid template immediately downstream of the nucleotide (i.e. SNP) being examined. The 3' end of one of the allelic probes may comprise a nucleotide that corresponds to the wild type allele, while the other allelic probe may comprise a nucleotide at its 3' end that corresponds to the variant allele. The two allelic probes compete to anneal to the template adjacent to the common probe. This generates a double stranded region containing a nick (i.e. a missing phosphodiester bond) at the nucleotide position being examined. Only the allelic probe with perfect complementation to the template will be ligated to the common probe by the DNA ligase. Utilization of thermostable Taq DNA Ligase enables repeated thermal cycles, resulting in a linear increase in ligation product. In some embodiments, the allelic probes can be designed to have unique lengths such that the wild type and variant ligation products can be separated on the basis of size. Alternatively, the allelic probes can be differentially labeled with fluorescent dyes to enable the ligation products to be discriminated based on color. In addition to SNP analysis this approach may be applied to the typing of a variety of genetic loci including (but not limited to) micro-deletions, insertions, translocations and inversions. In one embodiment, probes for a number of polymorphic sites can be multiplexed together, enabling several polymorphisms from a single biological sample to be typed simultaneously. Furthermore, polymorphisms located on distinct nucleic acid molecules can be examined simultaneously.
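The allele discrimination described for a bi-allelic SNP can be illustrated with a minimal sketch. The sequences, the function name `ldr_genotype` and the probe names are hypothetical and are not taken from this disclosure, and the sketch simplifies ligase behavior by requiring perfect complementarity over the entire allelic probe rather than modeling discrimination at the nick alone.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def ldr_genotype(template: str, common_probe: str, allelic_probes: dict) -> list:
    """Return the names of allelic probes that would be ligated to the common probe.

    The template and all probes are written 5'->3'; probes anneal antiparallel.
    """
    common_site = reverse_complement(common_probe)
    start = template.find(common_site)
    if start < 0:
        return []  # the common probe does not anneal, so nothing is ligated
    # On the template, the allelic probe site lies immediately 3' of the common site,
    # so the 3' end of the allelic probe sits at the nick next to the common probe.
    site_start = start + len(common_site)
    ligated = []
    for name, probe in allelic_probes.items():
        site = template[site_start:site_start + len(probe)]
        if site == reverse_complement(probe):
            ligated.append(name)
    return ligated


# Hypothetical probes: the allelic probes differ only at their 3' terminal nucleotide.
probes = {"wild_type": "CAGGCTACG", "variant": "CAGGCTACA"}
common = "TTGCCAGT"

# Hypothetical templates built as the strand complementary to the probes plus flanks.
wt_template = reverse_complement("AA" + "CAGGCTACG" + common + "CC")
var_template = reverse_complement("AA" + "CAGGCTACA" + common + "CC")

wt_call = ldr_genotype(wt_template, common, probes)
var_call = ldr_genotype(var_template, common, probes)
```

In this sketch only the allelic probe whose 3' nucleotide pairs with the template is reported as ligated, which mirrors the competition between allelic probes described above.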
[0046] As used herein, the term "LDR/PCR" refers to a method that couples the ligase detection reaction (LDR) with the polymerase chain reaction (PCR) such that the sensitivity of the PCR step is complemented by the high specificity of the LDR step.
[0047] As used herein, the term "restriction enzyme", "restriction endonuclease" or "RE" refers to an enzyme that cuts nucleic acid (i.e. double-stranded or single-stranded DNA) at specific nucleotide recognition sequences known as "restriction sites". Restriction sites typically vary between 4 and 8 nucleotides in length, and many of them are palindromic, meaning that the nitrogenous base sequence reads the same in the 5' to 3' direction on both strands. Restriction sites typically differ between restriction enzymes, producing differences in the length, sequence and strand orientation of sticky-end "overhangs" (i.e. 5' end or the 3' end). Different restriction enzymes that recognize the same sequence but cleave at different locations within it are known as "neoschizomers". Different enzymes that recognize the same sequence and cleave at the same location are known as "isoschizomers". More than 600 restriction enzymes are commercially available and are routinely used for DNA modification and manipulation in laboratories.
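For double-stranded DNA, a palindromic restriction site is one whose sequence is identical to its own reverse complement. The following minimal sketch checks that property. The example recognition sequences (GAATTC for EcoRI, GGATCC for BamHI, GGATG for FokI) are well-known sites given for illustration only and are not recited in this disclosure; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic_site(site: str) -> bool:
    """True when the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


ecori_is_palindromic = is_palindromic_site("GAATTC")  # EcoRI recognition sequence
bamhi_is_palindromic = is_palindromic_site("GGATCC")  # BamHI recognition sequence
foki_is_palindromic = is_palindromic_site("GGATG")    # FokI recognition sequence
```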
[0048] As used herein, the term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (in other words, in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.
[0049] As used herein, the terms "complementary" or "complementarity" are used in reference to "polynucleotides" and "oligonucleotides" (which are interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "C-A-G-T" is complementary to the sequence "G-T-C-A". Complementarity can be "partial" or "total". "Partial" complementarity is where one or more nucleic acid bases is not matched according to the base pairing rules. "Total" or "complete" complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids.
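The distinction between "partial" and "total" complementarity can be expressed as the fraction of positions matched under the base-pairing rules. The sketch below is illustrative only; the function name `fraction_complementary` is hypothetical, and the two strands are assumed to be written in pairing order, as in the "C-A-G-T" and "G-T-C-A" example above.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fraction_complementary(strand_a: str, strand_b: str) -> float:
    """Fraction of aligned positions that obey the base-pairing rules."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    matched = sum(1 for a, b in zip(strand_a, strand_b) if PAIRS.get(a) == b)
    return matched / len(strand_a)


total = fraction_complementary("CAGT", "GTCA")    # every base is paired
partial = fraction_complementary("CAGT", "GTCC")  # the last position is mismatched
```

A value of 1.0 corresponds to total complementarity; any lower value corresponds to partial complementarity.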
[0050] As used herein, the term "hybridize", "hybridization" or "hybridizing" refers to the pairing of complementary nucleic acids resulting in the formation of a partially or wholly complementary nucleic acid duplex by association of single strands. Hybridization usually occurs between DNA and RNA strands or previously unassociated DNA strands, but also between two RNA strands; and may be used to detect and isolate specific sequences, measure homology, or define other characteristics of one or both strands. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the melting temperature of the formed hybrid, and the G:C ratio within the nucleic acids.
[0051] As used herein, the term "oligonucleotide" refers to a short polynucleotide or a portion of a polynucleotide comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three, and usually more than ten. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof. The word "oligo" is sometimes used in place of the word "oligonucleotide".
[0052] As used herein, the term "sample" is used in its broadest sense and includes environmental and biological samples. Environmental samples include material from the environment such as soil and water. Biological samples may be animal (including human), fluid (e.g., blood, plasma and serum), solid (e.g., stool), tissue, liquid foods (e.g., milk), and solid foods (e.g., vegetables). For example, a pulmonary sample may be collected by bronchoalveolar lavage (BAL), which may comprise fluid and cells derived from lung tissues. A biological sample may comprise a cell, tissue extract, body fluid, chromosomes or extrachromosomal elements isolated from a cell, genomic DNA (in solution or bound to a solid support such as for Southern blot analysis), RNA (in solution or bound to a solid support such as for Northern blot analysis), cDNA (in solution or bound to a solid support) and the like.
[0053] The present invention relates to the simultaneous detection and quantification of multiple nucleic-acid sequences in a plurality of biological samples in parallel fashion. For example, sequence analysis may be performed using a massively-parallel multiplex reaction of locus-specific nucleic acids. It is of relevance to the fields of transcriptional profiling, genotyping and mutation analysis.
[0054] In one embodiment, the method involves providing a plurality of biological samples each potentially containing one or more target nucleic-acid sequences, and performing LDR/PCR on each biological sample. For example, LDR relies on probe pairs that are designed to be substantially complementary to the loci of interest such that adjacent probes of a probe pair ligate when brought into juxtaposition through hybridization with the target nucleic acid fragment. In PCR, the LDR ligation products are amplified using primers that are complementary to universal priming sites incorporated into the hybridization probes. Such LDR/PCR applications may be used for transcriptional profiling and genotyping applications.
[0055] In one embodiment, the present invention contemplates a method for analyzing multiple loci (i.e., for example, approximately between 100-1,000 loci) from a large number of biological samples rapidly and at low unit cost. Current next generation technologies capable of simultaneously sequencing hundreds of millions of individual DNA molecules in a single run require a matter of days to be completed (i.e., for example, Helicos HeliScope, Pacific Biosciences, Oxford Nanopore Technologies).
[0056] Although it is not necessary to understand the mechanism of an invention, it is believed that some of the embodiments described herein are compatible with these current next generation sequencing technologies and may exploit their high-throughput capabilities. For example, a process may be contemplated in which: (1) stretches of nucleic acid (e.g., a digested nucleotide sequence, such as a target-specific nucleotide sequence) representing the loci of interest are isolated from each of a plurality of biological sample collections (e.g., populations); (2) a short stretch of nucleic acid (i.e., for example a "well-barcode" sequence or a "gene-barcode" sequence) is appended to all of the molecules from each sample such as to introduce a sequence element that uniquely identifies that sample; (3) the so-appended molecules from the various biological sample collections (e.g., populations) are pooled together to constitute a single sequencing library; (4) molecules in the pooled library are individually sequenced in a massively-parallel fashion; (5) the sequence of the well-barcode segment of each molecule is used to identify the sample from which it was derived; (6) the sequence of the gene-barcode segment (e.g., a locus-specific sequence) of each molecule identifies the locus from which it was derived and its genotype; and (7) the number of identical gene-barcode segment and well-barcode segment combinations identifies the abundance of that locus (i.e. expression level or copy number) in each biological sample collection (e.g., population). See, Figure 1.
[0057] To successfully practice the above process requires solving at least the following three challenges: (1) isolating stretches of nucleic acid representing loci of interest in a manner suitable for (a) small sample input (e.g. cells cultured in one well of a 384-well plate), and (b) high-throughput, automatable implementation; (2) introduction of well-barcode sequence elements with (a) minimal additional sample manipulation, (b) a short linear placement from a gene-barcode element such that either the well-barcode or the gene-barcode is adjacent to one end of the molecule to minimize the read-length required to identify both barcode sequence elements, and (c) without the requirement for numbers of oligonucleotide probes or primers approximating the product of the number of loci of interest by the number of samples (e.g., loci # x sample #) to be analyzed simultaneously; and (3) arranging a sequencing library to contain a number of molecules sufficient for modest differences in abundance of each locus between each constituent sample to be detected.
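Steps (5) through (7) of the process contemplated in paragraph [0056] amount to decoding two barcodes per read and counting each combination. The following is a minimal sketch under stated assumptions: the read layout (well-barcode at the start of the read, a three-nucleotide spacer, then the gene-barcode), the barcode lengths, all sequences and all names are hypothetical and are chosen only for illustration.

```python
from collections import Counter

WELL_LEN = 8    # assumed well-barcode length
SPACER_LEN = 3  # assumed number of nucleotides separating the two barcodes


def count_informative_reads(reads, well_barcodes, gene_barcodes):
    """Count reads per (sample, locus) combination, skipping undecodable reads."""
    counts = Counter()
    gene_start = WELL_LEN + SPACER_LEN
    for read in reads:
        # Step (5): the well-barcode identifies the sample of origin.
        well = well_barcodes.get(read[:WELL_LEN])
        # Step (6): the gene-barcode identifies the locus.
        locus = next(
            (name for seq, name in gene_barcodes.items()
             if read.startswith(seq, gene_start)),
            None,
        )
        if well is not None and locus is not None:
            # Step (7): the tally of each combination estimates abundance.
            counts[(well, locus)] += 1
    return counts


well_barcodes = {"ACGTACGT": "well_A01", "TGCATGCA": "well_A02"}
gene_barcodes = {"GGCCTTAAGGCC": "locus_1", "AATTCCGGAATT": "locus_2"}
spacer = "CTG"

reads = [
    "ACGTACGT" + spacer + "GGCCTTAAGGCC",
    "ACGTACGT" + spacer + "GGCCTTAAGGCC",
    "TGCATGCA" + spacer + "AATTCCGGAATT",
    "NNNNNNNN" + spacer + "GGCCTTAAGGCC",  # unreadable well-barcode, skipped
]

counts = count_informative_reads(reads, well_barcodes, gene_barcodes)
```

Exact matching is used here for brevity; a practical decoder would likely also tolerate sequencing errors within the barcodes.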
[0058] More specifically, in one embodiment the present invention contemplates a method for multiplexed locus-specific nucleic acid sequence analysis and abundance measurement from a plurality of biological samples in massively-parallel fashion. In one embodiment, the method uses a single-molecule sequencing device. In one embodiment, the method may comprise an LDR/PCR step, followed by a restriction endonuclease digestion (RED) phase (i.e. LDR/PCR/RED) to create digested ligation fragments. In some embodiments, the digested ligation fragments are appended with well-barcodes. In some embodiments, the method further may comprise pooling the digested ligation fragments (e.g., gene-barcode elements) derived from a plurality of biological sample populations (e.g., collections) to which well-barcodes have been appended to constitute a sequencing library. In some embodiments, the pooled library is sequenced, preferably using next generation high-throughput sequencing technology (e.g., Illumina, SOLiD, etc.).
[0059] LDR technology has been previously reported, including variations that include an additional extension step. Barany et al. Gene 109: 1-11 (1991); Barany et al., US 6,027,889 and continuations thereof; and Oliphant et al., US 7,582,420 (all three herein incorporated by reference). Basically, LDR utilizes the ability of DNA ligase to preferentially seal two adjacent oligonucleotides while hybridized to a template DNA molecule. LDR/PCR refers to a method that couples the ligase detection reaction (LDR) with the polymerase chain reaction (PCR) such that the sensitivity of the PCR step is complemented by the high specificity of the LDR step. The LDR/PCR reaction may be performed on nucleic acid molecules derived from a plurality of biological samples, each potentially containing one or more target nucleic-acid sequences.
[0060] A number of methods for isolating stretches of nucleic acid (e.g., a nucleotide sequence) representing a locus of interest are currently available. These methods achieve isolation by differential labeling, differential amplification, differential hybridization, polymerase chain reaction (PCR), oligonucleotide ligation assay (OLA), ligase chain reaction (LCR), gap ligase chain reaction, combined ligase detection and polymerase chain reaction (LDR/PCR), and hybrid selection. LDR/PCR methods and variants thereof have been reported to provide data relevant to transcriptional profiling and genotyping applications. Barany et al. (Gene 109: 1-11 (1991); and US 6,027,889 and continuations thereof, both herein incorporated by reference); and Oliphant et al., US 7,582,420. See, Figure 2.
[0061] LDR/PCR methods can operate at high degrees of multiplexing (i.e., for example, >1,000 plex). The exponential amplification provided by the PCR step makes LDR/PCR suitable for use with small sample inputs. Furthermore, incorporation of a universal primer site in at least one LDR probe permits the use of universal primers in the subsequent PCR and maintains the relative proportionality of target loci from the starting material in the final product pool. DASL and GoldenGate methods offered by Illumina are thought to be based upon LDR/PCR.
[0062] The following description is a modification of a previously reported LDR/PCR method. Barany et al. US 7,429,453 (herein incorporated by reference). Usually, one oligonucleotide probe pair set may be provided for each target locus. Figure 2. Each probe pair set may include, but is not limited to: i) a first oligonucleotide probe having a first target-specific portion and a 5' upstream primer-specific portion ("left probe"); and ii) a second oligonucleotide probe having a second target-specific portion and a 3' downstream primer-specific portion, and being 5' phosphorylated ("right probe"). The oligonucleotide probes in a particular set are suitable for ligation together when hybridized adjacent to one another on a corresponding target-specific nucleotide sequence. However, ligation will not occur if the probes hybridize to a non-target sequence because of a large number of base-pairing mismatches.
[0063] Usually, a target-specific nucleotide sequence, a perfectly complementary probe set and a ligase are blended together to form a ligase detection reaction mixture that can be subjected to one or more ligase detection reaction cycles. These cycles include a denaturation treatment and a hybridization treatment. In the denaturation treatment any hybridized oligonucleotides are separated from the target nucleotide sequence (usually mediated by an increase in temperature). The hybridization treatment causes the probes to hybridize at adjacent positions in a base-specific manner to their respective target nucleotide sequences (usually mediated by a decrease in temperature). Once hybridized, the probes of a set ligate to one another to form a ligation product sequence.
[0064] This ligation product sequence may further contain a 5' upstream primer-specific portion, the joined target-specific portions, and a 3' downstream primer-specific portion. A set of oligonucleotide primers may be provided which may comprise an upstream primer ("top primer") complementary to the 5' upstream primer-specific portion of the ligation product sequence and a downstream primer ("bottom primer") complementary to the 3' downstream primer-specific portion of the ligation product sequence. A ligase detection reaction mixture can then be blended with this primer set and a polymerase to form a PCR mixture. The PCR mixture can be subjected to one or more PCR cycles, which includes a denaturation treatment, a hybridization treatment, and an extension treatment. During the denaturation treatment hybridized nucleic-acid sequences are separated. The hybridization treatment causes primers to hybridize to their respective complementary primer-specific portions of the ligation product sequence. During the extension treatment, hybridized primers are extended to form extension products complementary to the sequences to which the primers are hybridized.
[0065] In a first cycle of PCR, a downstream primer hybridizes to the 3' downstream primer-specific portion of a ligation product sequence and is extended to form an extension product complementary to the ligation product sequence. In subsequent cycles, the upstream primer hybridizes to the 5' upstream primer-specific portion of the extension product complementary to the ligation product sequence and the downstream primer hybridizes to the 3' downstream portion of the ligation product sequence. The resulting product is double-stranded DNA of which one strand has the same sequence as the ligation reaction product and the other strand is complementary to the ligation reaction product.
[0066] A high-throughput transcriptional profiling method has been developed and reported that is referred to as "ligation-mediated amplification" (LMA). Peck et al., "A method for high-throughput gene expression signatures analysis" Genome Biology 7: R61 (2006) (herein incorporated by reference). LMA differs from LDR/PCR by the addition of an initial step to covalently attach oligo-dT's to a solid support (i.e., for example, TurboCapture®, Qiagen). Consequently, polyadenylated RNA molecules may be captured from crude lysates of the input biological sample by these oligomers, which then serve as primers for reverse transcription.
[0067] The addition of reverse transcriptase and nucleotides under suitable reaction conditions results in the synthesis of immobilized first-strand cDNAs representing the population of mRNAs in the original biological sample suitable as targets for LDR/PCR in transcriptional-profiling applications and for genotyping of expressed sequence variants (including the special case of RNA editing). This initial step obviates the need for prior nucleic-acid isolation and purification, and allows excess and unannealed ligation probes to be removed by simple washing, thereby allowing the entire process - beginning from crude biological materials (e.g. cell cultures) - to be performed in largely automated fashion. Current LMA implementation allows the simultaneous isolation by differential amplification of nucleic-acid stretches representing 1,400 distinct loci from each of 384 crude biological samples (e.g. cells cultured in 384-well microtiter plates) in less than two days with minimal operator intervention.
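The strand relationships in the first and subsequent PCR cycles can be traced with a minimal sketch. All sequences and names are hypothetical. The sketch assumes that the downstream ("bottom") primer is the reverse complement of the 3' downstream primer-specific portion, and that the upstream ("top") primer has the same sequence as the 5' upstream primer-specific portion so that it anneals to the first-cycle extension product.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend(template: str, primer: str):
    """Return the full-length extension product if the primer anneals to the
    3' end of the template; otherwise return None."""
    if template.endswith(reverse_complement(primer)):
        return reverse_complement(template)
    return None


# Hypothetical ligation product: 5' primer portion, joined targets, 3' primer portion.
upstream_portion = "ACCTGAGTCA"
joined_targets = "TTGGCCAATCGA" + "CGATCGTTAGCA"
downstream_portion = "GGTCTTCAGA"
ligation_product = upstream_portion + joined_targets + downstream_portion

bottom_primer = reverse_complement(downstream_portion)
top_primer = upstream_portion

# First cycle: only the bottom primer can anneal to the ligation product.
first_extension = extend(ligation_product, bottom_primer)
# Later cycle: the top primer anneals to the first-cycle extension product.
second_extension = extend(first_extension, top_primer)
```

The second extension regenerates the sequence of the ligation product, so the double-stranded amplicon consists of one strand identical to the ligation product and one strand complementary to it.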
[0068] In one embodiment, the present invention contemplates a method which may comprise introducing a well-barcode nucleotide sequence into an addressable fragment. In one embodiment, the addressable fragment is derived from annealing two synthetic oligonucleotides.
[0069] Well-barcode sequence elements may be introduced between a target-specific portion (e.g., a gene -barcode) and a primer-specific portion of either a left probe or a right probe of a ligation probe pair. See, Figure 3A. This method may place a well-barcode and gene-barcode in close proximity (e.g., but not adjacent) in a final PCR product. In light of the challenges outlined above, this strategy is rendered impractical for high-throughput multiplex locus-specific sequencing by the requirement that at least one unique ligation probe must be produced for each locus in each sample in the sample pool. For example, an analysis of 100 loci in 384 samples would require at least 38,400 right probes (i.e., 100 x 384, wherein each contains a different well-barcode) and 100 left probes.
[0070] Previous research has disclosed that at least one primer with a 5 ' well-barcode may be used during LDR/PCR. Barany et al. US 6,027,889 and continuations thereof (herein incorporated by reference), and Figure 3B. This method can incorporate a well-barcode at either end of an LDR/PCR product (e.g., two well-barcodes per LDR/PCR product). However, only one primer with a well-barcode is useful to uniquely identify each additional sample included in the sample pool. Unfortunately, even if only a single well-barcode was incorporated, this method is not useable with the embodiments described herein, because the method places a primer- specific sequence between a well-barcode and a gene-barcode, thereby making an inappropriately long sequencing read length. As this construct cannot be shortened to less than approximately 20 nucleotides without compromising annealing of the primers, an unfavorable sequencing read length required to simultaneously obtain the well-barcode and the gene-barcode sequences, and also places an unfavorable proportion of non-informative bases within those reads (i.e., for example, approximately 50%).
[0071] In one embodiment, the present invention contemplates a method which may comprise introducing one or more specific restriction endonuclease recognition and/or cleavage sites into at least one primer-specific sequence of an amplified ligation product (e.g., an amplified gene -barcode nucleic acid) to create a modified amplified ligation product. In one embodiment, the method further may comprise contacting an endonuclease with a modified amplified ligation product (i.e., a gene-barcode product) for example, wherein the product is cleaved such that a cohesive termini suitable for ligation with a double-stranded addressing fragment which may comprise a well-barcode sequence is created. See, Figure 3C (overhang represents a cohesive end at the endonuclease cleavage site).
[0072] Although it is not necessary to understand the mechanism of an invention, it is believed that some embodiments of the present invention may be performed with two oligonucleotides (i.e., for example, a well-barcode sequence and a complementary LDR/PCR sequence (e.g., a gene-barcode sequence) for each additional sample included in the sample pool, and generates LDR/PCR products with terminal well-barcode elements in which the well- barcode and gene-barcode are separated by only a very small number ofnucleotides (i.e., for example, approximately three (3) nucleotides). Further, those of skill in the art will appreciate that many restriction endonucleases function effectively in PCR reaction buffers, and methods disclosed herein provide for the ligation of well-barcode elements to those digestion products in the same reaction mixture.
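The difference in oligonucleotide requirements between barcoding within the ligation probes and barcoding by ligation of an addressing fragment can be worked through for the 100-locus, 384-sample example given above. The sketch below is an estimate under stated assumptions: the function names are hypothetical, the addressing-fragment count assumes one probe pair per locus plus two oligonucleotides per sample, and universal PCR primers are not counted in either case.

```python
def oligos_for_probe_barcoding(loci: int, samples: int) -> int:
    """Well-barcode built into the right probe: one right probe per locus per
    sample, plus one left probe per locus."""
    return loci * samples + loci


def oligos_for_addressing_fragments(loci: int, samples: int) -> int:
    """Assumed count for well-barcodes appended after digestion: one probe pair
    per locus, plus two oligonucleotides per sample."""
    return 2 * loci + 2 * samples


probe_barcoding_total = oligos_for_probe_barcoding(100, 384)
addressing_total = oligos_for_addressing_fragments(100, 384)
```

Under these assumptions the first strategy scales with the product of loci and samples, whereas the second scales with their sum.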
[0073] In contrast to some embodiments of the present invention, the above described conventional techniques are limited to subjecting the products of the LDR/PCR reaction to restriction endonuclease digestion. Note that Barany et al. (US 6,027,889 and continuations thereof) teaches subjecting the products of the LDR/PCR reaction to restriction endonuclease digestion. However, this established art exploits restriction endonuclease recognition sites present in the target-specific portion of the LDR/PCR products to generate digestion fragments of sizes identifiable with a particular locus, such that these fragments can be separated by electrophoresis allowing their individual relative abundance (and the abundance of the loci from which they derive) to be estimated.
[0074] The introduction of restriction endonuclease sites in the common primer-specific portions of LDR/PCR products, and subsequent cleavage to generate fragments of the same size, as in the present method, has utility unanticipated by Barany. In one embodiment, the present invention contemplates a method which may comprise introducing at least one restriction endonuclease site in an LDR/PCR product primer-specific sequence, wherein the LDR/PCR product is cleaved to generate similarly-sized fragments.
[0075] In one embodiment, a restriction enzyme recognition and/or cleavage site is placed within an LDR/PCR product (i.e., for example, a digested ligation product and/or a gene-barcode nucleic acid) primer-specific sequence. Although it is not necessary to understand the mechanism of an invention, it is believed that such positioning ensures that any subsequent cleavage with a corresponding restriction endonuclease generates an identical end sequence (i.e., an overhang end and/or a cohesive end) in all LDR/PCR products regardless of the sequence of the target-specific portions. In one embodiment, a restriction endonuclease recognition site may be introduced into one, or both, of the primer-specific portions of PCR products. In yet another embodiment, restriction endonuclease recognition sites may be introduced into both LDR product and PCR product primer-specific sequences. In another embodiment, restriction endonuclease recognition sites are introduced to either an LDR product or a PCR product primer-specific sequence. In one embodiment, different restriction endonuclease recognition sites may be introduced into two or more subpopulations of LDR/PCR products, allowing subsets of products generated in the same LDR/PCR process to be selectively cleaved. In one embodiment, restriction endonuclease recognition sites are introduced at a position in the primer-specific portion of LDR/PCR products such that cleavage sites for the corresponding restriction endonucleases abut an adjacent target-specific portion of the LDR/PCR products. Although it is not necessary to understand the mechanism of an invention, it is believed that this particular positioning minimizes the distance between the target-specific element (i.e., for example, a "gene-barcode sequence") and a well-barcode sequence element. It is further believed that an addressing fragment which may comprise a well-barcode sequence and a gene-barcode nucleic acid (e.g., a target-specific ligation product) are subsequently ligated via the cohesive terminus created by the endonuclease, and consequently, the resultant informational read (i.e., for example, final barcoded product) may comprise both barcode sequences.
[0076] It is known to those of skill in the art that the efficiency of cleavage by restriction endonucleases may be adversely affected when a restriction endonuclease recognition site is positioned close to the end of a PCR product. To avoid this situation, some embodiments of the present invention contemplate that restriction endonuclease recognition/cleavage sites are introduced at a position in the primer-specific portion of LDR/PCR products such that the number of base pairs flanking either side of the restriction endonuclease recognition site is greater than approximately three.
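The flanking-distance condition described above can be checked computationally. The following is a minimal sketch only: the function names, the default of four flanking base pairs (one reading of "greater than approximately three"), and the EcoRI-like site used in the usage note are illustrative assumptions and are not taken from the disclosure.

```python
def flanking_bases(product: str, site: str) -> tuple:
    """Return (bases upstream, bases downstream) of the first occurrence of site."""
    start = product.find(site)
    if start < 0:
        raise ValueError("recognition site not found in product")
    end = start + len(site)
    return start, len(product) - end


def site_is_well_placed(product: str, site: str, min_flank: int = 4) -> bool:
    """True if at least min_flank bases flank the site on both sides.

    min_flank=4 is an assumed threshold, not a value stated in the text.
    """
    upstream, downstream = flanking_bases(product, site)
    return upstream >= min_flank and downstream >= min_flank
```

For example, with the hypothetical site GAATTC, the sequence ACGTGAATTCTTGACC has four bases upstream and six downstream and would pass, whereas ACGAATTCTTGACC, with only two bases upstream, would not.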
[0077] One embodiment of the present invention contemplates a method which may comprise introducing a restriction endonuclease recognition and/or cleavage site in an LDR/PCR product primer-specific sequence. See, Figure 4.
[0078] In one embodiment, the present invention contemplates a method which may comprise performing a ligase detection reaction with a template nucleic acid sequence which may comprise a 3' primer-specific portion, a 5' primer-specific portion, and a gene-barcode sequence. In one embodiment, the method further may comprise performing a polymerase chain reaction with a modified (or "bottom") primer complementary to the 3' primer-specific portion, wherein the modified 3' primer may comprise a restriction endonuclease recognition site. In one embodiment, the method creates an LDR/PCR product which may comprise a gene-barcode sequence, a 3' primer-specific portion, and a modified 5' primer sequence which may comprise the endonuclease recognition site. In one embodiment, the method further may comprise performing a polymerase chain reaction with an unmodified (or "top") primer complementary to the 3' primer-specific portion of the LDR/PCR product.
[0079] In one embodiment, LDR is performed with a right probe in which a 3' primer-specific portion contains a nucleotide sequence differing from the restriction endonuclease recognition site by at least one, but less than approximately four nucleotides. In one embodiment, PCR is performed with a downstream (or "bottom") primer complementary to the 3' downstream primer-specific portion of the resulting ligation product sequence except for the region deviating from the restriction endonuclease recognition site which is substituted in the primer with nucleotides complementary to those matching the intact restriction endonuclease recognition site. See, Figure 4.
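The mismatch constraint on the right probe can be expressed as a simple position-by-position comparison. The sketch below is illustrative only: it assumes a non-degenerate recognition site of the same length as the probe region being compared, and the function names and the example site GAATTC are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_incomplete_site(probe_region: str, site: str, max_mismatches: int = 3) -> bool:
    """True if probe_region differs from site by at least one and at most
    max_mismatches nucleotides (i.e. fewer than approximately four)."""
    return 1 <= mismatches(probe_region, site) <= max_mismatches
```

Under these assumptions, a probe region GACTTC differs from GAATTC at one position and would qualify, while a probe region identical to the site would not, because the unligated probe would then already carry an intact site.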
[0080] Those of skill in the art will know that a small number of internal (especially noncontiguous) nucleotide mismatches between a primer and a primer site will not necessarily prevent annealing and subsequent extension, but that a reduction in the annealing temperature from that suitable for a perfect-match primer/primer-site combination may be required, and that an appropriate annealing temperature can easily be determined by simple experimentation.
[0081] In one embodiment, the present invention contemplates a first PCR cycle wherein a bottom primer anneals to a non-perfectly matched 3' downstream primer-specific portion of the ligation product sequence and is extended to form an extension product complementary to the complement of the ligation product sequence. In one embodiment, subsequent PCR cycles are performed wherein a top primer (unmodified) hybridizes to a 5' upstream primer-specific portion of the extension product primed with the bottom primer and is extended to form an extension product with the same sequence as the ligation product except for the 3' downstream primer-specific portion which contains the intact restriction endonuclease recognition site templated by the bottom primer, and the bottom primer anneals to either the non-perfectly matched 3' downstream primer-specific portion of the ligation product sequence or, increasingly as the reaction proceeds, the perfectly-matched 3' downstream primer-specific portion of the extension product primed from the top primer.
[0082] These embodiments provide advantages over conventional LDR techniques in that: (1) the annealing of the bottom primer with an unligated right probe does not constitute a functional double-stranded restriction endonuclease recognition site, thereby eliminating a potential source of background signal; and (2) LDR/PCR products containing restriction endonuclease recognition sites can often be generated using legacy right probes designed and synthesized without the explicit (or fortuitous) presence of a restriction endonuclease recognition site in the 3' primer-specific portion (with only a knowledge of restriction endonuclease recognition sites and primer design capabilities common to those of skill in the art).
[0083] In one embodiment, the present invention contemplates a method which may comprise a restriction endonuclease, wherein an LDR/PCR product is cleaved. Any of a large number of enzymes may be used, provided their recognition sites are not present in the target-specific portion of a large proportion of LDR/PCR products.
[0084] However, certain attributes of an endonuclease restriction enzyme cleavage site are advantageous. In one embodiment, a restriction endonuclease used to cleave an LDR/PCR product generates an overhang of at least approximately three nucleotides. In one embodiment, a restriction endonuclease used to cleave an LDR/PCR product generates a 3' overhang. Although it is not necessary to understand the mechanism of an invention, it is believed that a 3' overhang is recessed and therefore can act as a primer and be extended during a PCR amplification. It is further believed that a 3' recessed overhang eliminates a cohesive end thereby preventing unintended ligation. In one embodiment, a restriction endonuclease used to cleave an LDR/PCR product generates asymmetric ends. Although it is not necessary to understand the mechanism of an invention, it is believed that asymmetric ends may prevent concatemerization (i.e. dimerization) of cleaved LDR/PCR products having compatible ends during a subsequent ligation phase. Further, restriction enzymes with degeneracy in their cleavage site, for example DraIII (CACNNN^GTG) and many outside cutters such as HgaI (GACGCNNNNN^NNNNN), are desirable in this regard. Other attributes of restriction enzymes known to those of skill in the art including but not limited to stability in reaction, and commercial availability may also be considered.
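The requirement that a candidate recognition site be absent from most target-specific portions can be screened in silico. The following is a minimal sketch under stated assumptions: recognition sites are written with N for any base, both strands are searched, and the function names and example target sequences are hypothetical and do not come from the disclosure.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_site(seq: str, site: str) -> bool:
    """True if site (N = any base) occurs on either strand of seq."""
    patterns = {site, revcomp(site)}
    return any(re.search(p.replace("N", "[ACGT]"), seq) for p in patterns)


def fraction_with_site(targets, site: str) -> float:
    """Fraction of target-specific sequences that contain the site."""
    targets = list(targets)
    if not targets:
        return 0.0
    return sum(contains_site(t, site) for t in targets) / len(targets)
```

An enzyme would be a reasonable candidate when `fraction_with_site` is small for the panel of target-specific portions in use. The acceptable fraction is a design choice that the text leaves open.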
[0085] In one embodiment, the present invention contemplates cleaving an LDR/PCR product with a restriction endonuclease at a restriction recognition and/or cleavage site. In one embodiment, LDR/PCR products are blended with a restriction enzyme to form a restriction endonuclease cleavage mixture. The restriction endonuclease cleavage mixture may be subjected to one heat treatment leading to cleavage of each LDR/PCR product in the primer-specific portion of these products by the action of the restriction endonuclease to yield at least two predominantly double-stranded digestion products whose termini are 5' phosphorylated. An optional second heat treatment may be performed to inactivate the endonuclease.
[0086] In one embodiment, a restriction endonuclease is added directly to a PCR product mixture to form a PCR product restriction endonuclease reaction mixture. In one embodiment, a restriction endonuclease is blended with a reaction buffer containing materials required for, to enhance the activity of, or to dilute the enzyme, to form a restriction endonuclease mix. The restriction endonuclease mix is added directly to the PCR product mixture to form a restriction endonuclease reaction mixture.
[0087] Many restriction endonucleases are active in PCR reaction buffers without modification or supplementation. Turbett and Sellner, "Digestion of PCR and RT-PCR Products with Restriction Endonucleases Without Prior Purification or Precipitation" Promega Notes Magazine 60:23-26 (1996); and Blanck et al., "Activity of Restriction Enzymes in a PCR Mix" Biochemica 3:25 (1997). Restriction endonuclease activity in PCR reaction buffers may be enhanced by the addition of certain materials known to those of skill in the art not commonly present in such buffers, or not commonly present at optimum concentrations, including, but not limited to, bovine serum albumin, sodium chloride, and dithiothreitol. These materials also provide a convenient diluent for the enzyme.
[0088] One problem that can be encountered during endonuclease restriction cleavage is when a restriction endonuclease generates a 5' overhang and is added directly to a PCR reaction mixture. In this situation, the DNA polymerase fills in the complementary 3' recessed end, thereby generating a blunt end, such that the cohesive ends necessary for a subsequent ligation phase are eliminated.
[0089] This problem can be solved by subjecting the PCR product mixture to a treatment that leads to the destruction (i.e. inactivation, denaturation, degradation) of the polymerase enzyme before the RED phase. Such treatments are known to those of skill in the art and include, but are not limited to, contacting the PCR product mixture with proteinase K (Crowe et al. "Improved Cloning Efficiency of Polymerase Chain Reaction (PCR) Products after Proteinase K Digestion" Nucleic Acids Research 19: 184 (1991)), and/or a sustained incubation at high temperature (preferably approximately one hour at approximately 100°C).
[0090] In one embodiment, the present invention contemplates a method which may comprise joining an addressing fragment to a restriction nuclease cleaved LDR/PCR product to create a final barcoded product. In one embodiment, the addressing fragment may comprise a well-barcode sequence.
[0091] In one embodiment, a restriction nuclease cleaved LDR/PCR product (henceforth an "LDR/PCR/RED product") may comprise a first double-stranded DNA fragment containing an LDR/PCR target-specific portion (e.g., a gene-barcode sequence) with at least one preferably cohesive end (i.e., for example, a 5' overhang end), and a second double-stranded DNA fragment containing the remainder of the primer-specific portion of LDR/PCR products and one cohesive end (i.e., for example, a 5' overhang end). See, Figure 3C.
[0092] In some embodiments, one or more addressing fragments are provided in an LDR/PCR reaction mixture, wherein each addressing fragment may comprise a well-barcode sequence element. In one embodiment, an addressing fragment is blended with an LDR/PCR/RED product and contacted with a ligase so as to form a barcode ligation reaction mixture, wherein an addressing fragment well-barcode sequence is joined to the target-specific portion of the LDR/PCR product, thereby forming a "final barcoded product". Although it is not necessary to understand the mechanism of an invention, it is believed that the joining of LDR/PCR/RED products with an addressing fragment is favored over rejoining of compatible LDR/PCR/RED products by providing the addressing fragment in molar excess over the LDR/PCR/RED products.
[0093] In one embodiment, the present invention contemplates contacting an addressing fragment and a ligase with a restriction endonuclease cleaved LDR/PCR product mixture. Certain materials known to those of skill in the art that are not commonly present in PCR reaction buffers or restriction endonuclease digestion buffers, or not commonly present at optimum concentrations, including, but not limited to, adenosine triphosphate and beta-nicotinamide adenine dinucleotide, may be required for ligase activity. These materials also provide a convenient diluent for the enzyme. In one embodiment, an addressing fragment and a ligase are blended with a reaction buffer containing such materials required to enhance, or dilute, ligase activity. This ligase mix is added directly to the restriction endonuclease reaction mixture after completion of the restriction endonuclease cleavage step.
[0094] In one embodiment, the present invention contemplates a method which may comprise appending an addressing fragment to an LDR/PCR/RED product derived from each biological sample in a collection of biological samples such that the resulting final barcoded products may be readily distinguished when combined (i.e. pooled) so as to be analyzed in parallel. In one embodiment, a plurality of addressing fragments is provided, each of which may comprise a distinguishable (i.e. non-identical and/or unique) well-barcode sequence element. In one embodiment, the number of addressing fragments provided is approximately equal to the number of biological samples in a population of biological samples to be analyzed in parallel in a single pool. In one embodiment, a different addressing fragment selected from a plurality of addressing fragments, each of which may comprise a distinguishable (i.e. non-identical) well-barcode sequence element, is appended to the LDR/PCR/RED products derived from each biological sample in a collection of biological samples.
[0095] In one embodiment, the present invention contemplates a composition which may comprise an addressing fragment containing a well-barcode sequence element. In one embodiment, the well-barcode sequence may comprise a cohesive end, such that the barcode sequence is capable of being joined to a cleaved LDR/PCR/RED product. In one embodiment, the addressing fragment is DNA. In one embodiment, the addressing fragment is substantially double stranded DNA. In one preferred embodiment, the cohesive end may comprise a 5' or 3' overhang and a phosphorylated 5'-terminus.
[0096] In certain circumstances including, but not limited to, increasing the proportion of informative reads generated by a single-molecule sequencing device, it may be desirable to selectively destroy, degrade, remove, and/or separate one strand of a double-stranded DNA molecule (i.e. top strand/primer, bottom strand/primer, forward strand/primer or reverse strand/primer) from its complement strand.
[0097] Those of skill in the art will be aware of a variety of methods of strand enrichment involving the introduction of a specifying moiety to one strand of a double-stranded molecule including, but not limited to, positive selection of polynucleotides containing a biotin moiety with a suitable affinity reagent (e.g. streptavidin) and negative selection (i.e. degradation) of 5'-phosphorylated polynucleotides by the action of lambda exonuclease. Higuchi and Ochman, "Production of single-stranded DNA template by exonuclease digestion following polymerase chain reaction" Nucleic Acids Research 17: 5865 (1989) (herein incorporated by reference). Inclusion of a specifying moiety in the addressing fragment provides a convenient means to introduce such a moiety to hybrid molecules containing the addressing fragment (i.e., for example, a final barcoded product). In one embodiment, the addressing fragment contains a moiety capable of specifying (i.e. facilitating) strand enrichment.
[0098] In one embodiment, the present invention contemplates a composition which may comprise an addressing fragment having a 3' cohesive end and a 5' phosphorylated nucleotide end. One problem that can be encountered during the barcode ligation phase arises when an addressing fragment has a 3' cohesive end and, at the opposite end, a 5' phosphorylated nucleotide end that is blunt-ended and double stranded, such that concatemerization (i.e., for example, dimerization) of the addressing fragments can occur by blunt-end ligation. The present invention has a solution for this problem.
[0099] In one embodiment, the present invention contemplates an asymmetric 5'- or 3'-overhang at a 5'-phosphorylated terminus. Although it is not necessary to understand the mechanism of an invention, it is believed that a 3'-overhang is more preferable because it avoids the possibility that a 5'-overhang will be filled in by residual polymerase activity carried over from the PCR step, thereby recreating a terminus available for blunt-end ligation, and lambda exonuclease initiates degradation less efficiently from 5' overhangs (New England Biolabs website).
[0100] In one embodiment, the opposite end of the addressing fragment from the cohesive end compatible with the ends of LDR/PCR/RED products has an asymmetric overhang, most preferably a 3'-overhang. In one embodiment, final barcode products are treated wherein there is a substantial enrichment of one strand of final barcoded products over the other strand of final barcoded products. In one embodiment, substantial enrichment of a first final barcode product strand over a second final barcode product strand may comprise contact with a lambda exonuclease.
[0101] Certain single-molecule sequencer instruments require the presence of recognition sequence elements for purposes including, but not limited to, primer annealing or strand capture (i.e. immobilization) in molecules to be sequenced (e.g. Helicos HeliScope RG2 version flow cell). In addition, the presence of such recognition sequence elements can obviate the need to destroy (i.e. degrade) or remove (i.e. separate) one strand of double-stranded DNA products that under some circumstances can increase the proportion of informative reads generated by a single-molecule sequencing instrument.
[0102] In one embodiment, the present invention contemplates a composition which may comprise an addressing fragment which may comprise a recognition sequence element. In one embodiment, the recognition sequence fragment is introduced into a final barcoded product. In one embodiment, the recognition sequence element is compatible with a single-molecule sequencing instrument. In one embodiment, the recognition sequence element is positioned at the end opposite to the cohesive end of an addressing fragment, wherein the cohesive end is compatible with the ends of at least one LDR/PCR/RED product. In one embodiment, the recognition sequence element is substantially single-stranded.
[0103] In one embodiment, the present invention contemplates a composition which may comprise a plurality of addressing fragments. In one embodiment, each addressing fragment in the plurality of addressing fragments may comprise a different well-barcode sequence element.
[0104] Well-barcode sequence elements can differ in attributes including, but not limited to, length and sequence. The composition and combinatorial nature of DNA is such that populations of very short polynucleotides of similar sizes can contain very large numbers of non-identical species. For example, there are 256 non-identical 4-mer polynucleotides, and more than one million possible 10-mers. In one embodiment, all addressing fragments in a plurality of addressing fragments differ only in the nucleotide sequence of their well-barcode sequence element. In one embodiment, all addressing fragments in a plurality of addressing fragments are similarly-sized. In one embodiment, all addressing fragments in a plurality of addressing fragments are similarly-sized such that the size range of final barcoded products is different from the size range of other nucleic acids, including, but not limited to, intact LDR/PCR products (i.e., not cleaved by an endonuclease enzyme), cleaved LDR/PCR/RED products, and rejoined LDR/PCR/RED products (i.e., dimers and other concatemers).
[0105] Since all LDR/PCR products are similarly-sized, and a different size from all LDR/PCR/RED products, this inherently results in all final barcoded products also being similarly sized, thereby allowing for selection and/or purification (i.e., for example, by methods including, but not limited to, gel electrophoresis).
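The combinatorial figures given above follow from 4^k possible sequences of length k. The sketch below reproduces that arithmetic and, as an illustration only, selects a set of well-barcodes that differ from one another at a minimum number of positions. The minimum-distance criterion, the greedy selection strategy, and all function names are assumptions introduced here and are not described in the disclosure.

```python
from itertools import product


def barcode_space(k: int) -> int:
    """Number of distinct DNA sequences of length k (4**k)."""
    return 4 ** k


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(k: int, n: int, min_dist: int = 2) -> list:
    """Greedily choose n k-mers that pairwise differ at >= min_dist positions.

    min_dist is an assumed design parameter intended to tolerate read errors.
    """
    chosen = []
    for bases in product("ACGT", repeat=k):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_dist for c in chosen):
            chosen.append(candidate)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")
```

For instance, `barcode_space(4)` gives 256 and `barcode_space(10)` gives 1,048,576, consistent with the figures stated in the text.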
[0106] Those of skill in the art will know of a variety of ways to construct (i.e. generate) addressing fragments including but not limited to blending together of substantially-complementary synthetic oligonucleotides under conditions allowing them to become annealed according to standard base pairing.
[0107] The relationship between the number of loci analyzed in each sample, the number of such samples pooled together, the number of informative reads per sequencing cycle (and the error rate of the sequencing device), and the resolution at which differences in abundance of loci within and between samples that can be detected, all interact in the performance of some embodiments of the present invention, such that their contributions have been empirically determined and presented herein.
[0108] In one embodiment, the present invention contemplates a method for creating a sequencing library which may comprise combining (i.e. pooling) a plurality of final barcoded products, wherein the final barcoded products are derived from a specific biological sample within a collection of biological samples.
[0109] In one embodiment, the plurality of barcode ligation phase products (e.g., final barcoded products) generated from a plurality of barcode ligation phases each performed with a different addressing fragment selected from a plurality of addressing fragments, each comprising a distinguishable (i.e. non-identical and/or unique) well-barcode sequence element, are blended (i.e. combined, pooled). In one embodiment, each barcode ligation phase from a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is performed with LDR/PCR/RED products derived from one biological sample in a collection of biological samples. In one embodiment, one or more barcode ligation phases from a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is performed with LDR/PCR/RED products derived from a biological sample not part of a collection of biological samples. In one embodiment, one or more barcode ligation phases from a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is performed with LDR/PCR/RED products derived from a non-biological (i.e. artificial, synthetic) sample.
[0110] The number of barcode ligation phases in a plurality of barcode ligation phases whose products are combined is related to parameters including, but not limited to, the number of loci targeted in the LDR phase, the number of informative reads (i.e. reads containing an identifiable well-barcode and gene-barcode sequence element) generated per sequencing cycle by a single-molecule sequencing device, and the resolution and degree of confidence (i.e. coverage) at which differences in the relative abundance or the sequence of loci targeted are detected.
[0111] In one embodiment, the number of barcode ligation phases in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the number of loci targeted in the LDR phase. In one embodiment, the number of barcode ligation phases in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the number of informative reads generated per sequencing cycle by the single-molecule sequencing device. In one embodiment, the number of barcode ligation phases in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the error rate (i.e. the frequency at which the true identity of a nucleotide in a polynucleotide is incorrectly reported) of a single-molecule sequencing device.
[0112] In one embodiment, the number of barcode ligation phases in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to resolution at which differences in the relative abundance of loci targeted are detected. In one embodiment, the number of barcode ligation phases in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the degree of confidence at which differences in the sequence of loci targeted are detected.
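The dependence of pool size on the number of loci, the number of informative reads, and the desired coverage can be illustrated with simple arithmetic. The sketch below assumes that informative reads are distributed evenly across samples and loci, which is a simplification not asserted by the disclosure. The function names and the numbers in the usage note are hypothetical.

```python
def max_pool_size(informative_reads: int, loci: int, reads_per_locus: int) -> int:
    """Largest number of samples that can be pooled while still expecting
    reads_per_locus informative reads for every locus in every sample,
    assuming uniform representation."""
    if loci <= 0 or reads_per_locus <= 0:
        raise ValueError("loci and reads_per_locus must be positive")
    return informative_reads // (loci * reads_per_locus)


def mean_reads_per_locus(informative_reads: int, samples: int, loci: int) -> float:
    """Expected informative reads per locus per sample under uniform representation."""
    if samples <= 0 or loci <= 0:
        raise ValueError("samples and loci must be positive")
    return informative_reads / (samples * loci)
```

As a hypothetical example, 10,000,000 informative reads, 1,000 loci, and a target of 100 reads per locus would permit pooling up to 100 samples under these assumptions. Uneven locus abundance or sequencing errors would reduce that number in practice.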
[0113] In one embodiment, the amount (i.e., volume and/or moles of nucleic acid) of barcode ligation phase products (e.g., final barcoded products) from each barcode ligation phase in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is equal.
[0114] In one embodiment, the amount (i.e. volume and/or moles of nucleic acid) of barcode ligation phase products from each barcode ligation phase in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is not equal. There are circumstances, including, but not limited to, desiring a different coverage for different biological samples in a collection of biological samples, under which different amounts of barcode ligation phase products may be combined.
[0115] In one embodiment, the amount (i.e. volume and/or moles of nucleic acid) of barcode ligation phase products from each barcode ligation phase in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the resolution at which differences in the relative abundance of loci targeted are detected.
[0116] In one embodiment, the amount (i.e. volume and/or moles of nucleic acid) of barcode ligation phase products from each barcode ligation phase in a plurality of barcode ligation phases whose products are blended (i.e. combined, pooled) is selected according to the degree of confidence at which differences in the sequence of loci targeted must be detected.
[0117] Generation of a sequencing library from a mixture (i.e. blend, pool) of barcode ligation phase products (e.g., final barcoded products) can involve the purification (i.e. enrichment, selection, concentration) of the final barcoded products from other reaction components and products including, but not limited to, LDR/PCR products left undigested after the RED phase, all LDR/PCR/RED products, LDR/PCR/RED products rejoined during the barcode ligation phase, addressing fragments not joined during the barcode ligation phase, and enzymes, salts and diluents. Those of skill in the art will know of many methods capable of effecting such a purification of final barcoded products including but not limited to the use of various resins, columns, and gels.
[0118] In one embodiment, nucleic-acid components are purified (i.e. separated, concentrated) from non-nucleic-acid components of a mixture (i.e. blend, pool) of barcode ligation phase products.
[0119] In one embodiment, the nucleic-acid components of a mixture (i.e. blend, pool) of barcode ligation phase products are purified (i.e. separated, concentrated) by ethanol precipitation followed by dissolution in a suitable vehicle.
[0120] In one embodiment, final barcode products are purified (i.e. separated, concentrated) from other nucleic-acid components of a mixture (i.e. blend, pool) of barcode ligation phase products.
[0121] In one embodiment, final barcode products are purified (i.e. separated, concentrated) from other nucleic-acid components of a mixture (i.e. blend, pool) of barcode ligation phase products by size. In one embodiment, the size-based purification may comprise techniques including, but not limited to, gel electrophoresis, band visualization, band excision, elution of nucleic acids from the excised band, and ethanol precipitation of the nucleic acids eluted from the excised band followed by dissolution in a suitable vehicle.
[0122] Generation of a sequencing library from a population of purified, selected, separated, concentrated and/or enriched final barcoded products from other reaction products and components from a mixture (i.e. blend, pool) of barcode ligation phase products can involve the selective destruction (i.e. degradation) or removal (i.e. separation) of one strand (i.e. top or bottom, or forward or reverse) of final barcoded products over the other strand of final barcoded products.
[0123] Certain embodiments of the present invention provide for the introduction of moieties (e.g., selection moieties) to final barcoded products allowing for such strand enrichment. Other embodiments provide for strand enrichment at the completion of the barcode ligation phase. However, it is more convenient to perform strand enrichment of final barcoded products after pooling of a plurality of barcode ligation phase products.
[0124] In one embodiment, a mixture (i.e. blend, pool) of barcode ligation phase products is subjected to a treatment substantially resulting in the enrichment of one strand of final barcoded products over the other strand of final barcoded products.
[0125] In one embodiment, final barcoded products purified (i.e., for example, selected, separated, concentrated, enriched) from a mixture (i.e. blend, pool) of barcode ligation phase products are subjected to a treatment substantially resulting in the enrichment of one strand of final barcoded products over the other strand of final barcoded products.
[0126] In one embodiment, final barcoded products purified (i.e., for example, selected, separated, concentrated, enriched) from a mixture (i.e. blend, pool) of barcode ligation phase products are contacted with lambda exonuclease under conditions resulting in the substantial enrichment of one strand of final barcoded products over the other strand of final barcoded products.
[0127] There are circumstances including, but not limited to, monitoring a single-molecule sequencing instrument performance where it may be advantageous to spike (i.e. add, supplement) polynucleotides of known composition (i.e. sequence, length) into a sequencing library which may comprise final barcoded products.
[0128] Sequencing method enhancements including, but not limited to, dilution, tailing, and denaturing may be applied to any mixture of nucleic acids described herein to improve compatibility with a given single-molecule sequencing instrument. For example, the mixture of nucleic acids may be a sequencing library, wherein the sequencing library may comprise final barcoded products as described herein before it is contacted with a single-molecule sequencing device and sequenced.
[0129] A plurality of sequence reads, wherein each sequence read may comprise an identifiable well-barcode sequence element and an identifiable target-specific segment (i.e. "gene barcode") constitute a population of informative reads. In one embodiment, a well-barcode segment sequence within each informative read is used to identify a specific biological sample from which it was derived. In one embodiment, a gene-barcode segment sequence within each informative read is used to identify a specific genomic locus from which the read was derived and its genotype. In one embodiment, the number of informative reads having an identical gene-barcode sequence determines the abundance of that locus (i.e., for example, also referred to as expression level or copy number).
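The decoding logic described in the preceding paragraph can be illustrated with a minimal sketch. It assumes that the well-barcode and gene-barcode segments have already been extracted from each informative read; the function names and the barcode labels below are hypothetical and serve only to show how read counts translate into a per-sample abundance table.

```python
from collections import Counter


def tally_informative_reads(reads):
    """Count reads per (well barcode, gene barcode) pair.

    `reads` is an iterable of (well_barcode, gene_barcode) tuples that
    have already been extracted from informative reads.
    """
    return Counter(reads)


def abundance_by_sample(counts):
    """Regroup pair counts as {well_barcode: {gene_barcode: count}}."""
    table = {}
    for (well, gene), n in counts.items():
        table.setdefault(well, {})[gene] = n
    return table


# Hypothetical barcode labels, for illustration only.
example_reads = [
    ("WELL_A", "GENE_1"),
    ("WELL_A", "GENE_1"),
    ("WELL_A", "GENE_2"),
    ("WELL_B", "GENE_1"),
]
profile = abundance_by_sample(tally_informative_reads(example_reads))
```

In this sketch the well barcode selects the sample and the count stored under each gene barcode stands in for the abundance of the corresponding locus in that sample.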
[0130] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
[0131] The present invention will be further illustrated in the following Examples which are given for illustration purposes only and are not intended to limit the invention in any way.
[0132] In the experimental disclosure that follows, the following abbreviations apply: eq. or eqs. (equivalents); M (Molar); μM (micromolar); N (Normal); mol (moles); mmol (millimoles); μmol (micromoles); nmol (nanomoles); pmoles (picomoles); g (grams); mg (milligrams); μg (micrograms); ng (nanogram); vol (volume); w/v (weight to volume); v/v (volume to volume); L (liters); ml (milliliters); μl (microliters); cm (centimeters); mm (millimeters); μm (micrometers); nm (nanometers); C (degrees Centigrade); rpm (revolutions per minute); DNA (deoxyribonucleic acid); kdal (kilodaltons).
Example 1
Digestion of and Barcode Ligation to PCR Products
[0133] A synthetic polynucleotide designed to mimic a product of an LDR containing restriction endonuclease recognition and cleavage sites (SEQ ID NO: 1) was subjected to PCR with a pair of primers (SEQ ID NO: 2 and 3) in 15 μl reactions containing 10 mM tris-HCl pH 8.3, 50 mM KCl, 3 mM MgCl2, 160 μM each of dATP, dCTP, dGTP and dTTP, 60 fmol template, 1.5 pmol of each primer, and 0.5 units Taq DNA polymerase (New England Biolabs) with incubations at 95°C for 5 minutes (initial denaturation), followed by 30 cycles of 95°C for 30 seconds (denaturation), 55°C for 30 seconds (annealing), and 72°C for 30 seconds (extension).
[0134] Eight completed PCR reactions were combined and 15 μl aliquots dispensed to seven tubes. Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and either 20 units DraIII, 2 units HgaI (both enzymes from New England Biolabs), or no enzyme was added and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes. Addressing fragments with cohesive ends compatible with the ends generated by DraIII or HgaI digestion of the PCR products were constructed by annealing two pairs of synthetic oligonucleotides (SEQ ID NO: 4 and 5; SEQ ID NO: 6 and 7). One nanomole of each oligonucleotide was combined in a 100 μl reaction containing 10 mM tris-HCl pH 7.6, 50 mM NaCl, and 1 mM EDTA. Reactions were incubated at 95°C for 5 minutes before being allowed to cool to room temperature. The final concentration of the double-stranded products was 10 μM.
[0135] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, and 20 pmol of the DraIII addressing fragment, 20 pmol of the HgaI addressing fragment, or no addressing fragment, and 400 units T4 DNA ligase (New England Biolabs), or no enzyme was added to each digestion reaction and incubated at 16°C for 60 minutes.
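The concentration of the annealed addressing fragments stated in this example follows directly from the amounts used. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the derived stock volume corresponding to 20 pmol is a calculation from the stated values rather than a figure given in the text.

```python
def concentration_uM(amount_pmol, volume_ul):
    """Return concentration in micromolar (1 pmol per microliter = 1 uM)."""
    return amount_pmol / volume_ul


# One nanomole (1000 pmol) of each oligonucleotide annealed in 100 ul.
annealed_duplex_uM = concentration_uM(1000.0, 100.0)

# Volume of that stock that would contain the 20 pmol used per ligation.
fragment_volume_ul = 20.0 / annealed_duplex_uM
```

Under these assumptions the duplex stock is 10 μM, in agreement with the stated final concentration, and 20 pmol corresponds to 2 μl of that stock.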
[0136] Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). An image of the gel is provided as Figure 5F.
[0137] Lane 1 shows the expected 90 bp PCR product. A fragment of a size consistent with digestion of the PCR product with DraIII (~64 bp) is seen in lanes 2 and 4, together with some undigested PCR product. Lanes 3 and 5 show fragments consistent with HgaI digestion (~35 bp and ~55 bp), as well as some undigested PCR product. Lane 6 indicates successful ligation of the DraIII addressing fragment with a DraIII cleavage fragment to form a 77 bp product. This result demonstrates that restriction endonuclease digestion and subsequent barcode ligation can be accomplished in a crude PCR reaction without prior or intervening purification or clean up, at least for a restriction endonuclease generating a 3' overhang. The ability to perform these two reactions in a simple 'add-only' workflow, and thus amenable to a high-throughput highly-parallel implementation, is essential for a process intended to operate upstream of a massively-parallel detection solution. However, lane 7 provides no evidence for the formation of the 70 bp product expected from direct cohesive-end joining of the HgaI addressing fragment with its corresponding HgaI cleavage product. Instead, a complex banding pattern is seen consistent with 'fill-in' of the 5' overhangs generated by this restriction endonuclease by residual polymerase activity in the reaction leading to 5'-phosphorylated blunt ends and the formation of multiple undesired ligation products.
Example 2
Inactivation of Residual Polymerase
[0138] Ten completed PCR reactions were combined and 15 μl aliquots dispensed to eight tubes. Four tubes were maintained at room temperature for 60 minutes while the other four tubes were incubated at 100°C for 60 minutes.
[0139] Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and 2 units HgaI (New England Biolabs), or no enzyme was added and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes.
[0140] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, and 20 pmol of the HgaI addressing fragment, or no addressing fragment, and 400 units T4 DNA ligase (New England Biolabs), or no enzyme was added to each digestion reaction and incubated at 16°C for 60 minutes.
[0141] Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). An image of the gel is provided as Figure 6.
[0142] Lanes 1 and 5 show the expected 90 bp PCR product, and lanes 2, 3, 6, and 7 show fragments consistent with HgaI digestion (~35 bp and ~55 bp), as well as some undigested PCR product. Heat inactivation has little discernible effect on the products generated (lanes 1-3 versus lanes 5-7). However, while lane 4 (no heat inactivation) shows the same complex banding pattern seen in Example 1 (lane 7) consistent with fill-in of the 5' overhang generated by HgaI and multiple blunt-end ligations, lane 8 (with heat inactivation) shows a banding pattern consistent with successful and specific ligation of the HgaI addressing fragment with a HgaI cleavage fragment to form a 70 bp product. This result suggests that incubation at 100°C for 60 minutes can destroy residual polymerase activity present in crude PCR reaction mixture, allowing the use of restriction endonucleases producing 5' overhangs in this process.
Example 3
Introduction of Restriction Endonuclease Recognition Sites in LDR/PCR Products using a Mutant Primer A
[0143] In an attempt to introduce restriction endonuclease recognition and cleavage site(s) into the primer-specific region of LDR/PCR products generated from a pre-existing library of LDR probe pairs, one of the primers used in the PCR phase was modified. The general formula of the left probes in the library is shown as SEQ ID NO: 8 (20 nt primer-specific portion, 24 nt hybridization tag, 20 nt target-specific portion), and that of the right probes as SEQ ID NO: 9 (5' phosphate, 20 nt target-specific portion, 20 nt primer-specific portion). Conventionally, the top primer (SEQ ID NO: 10) contains the same sequence as the primer-specific portion of the left probes, and the bottom primer (SEQ ID NO: 11) contains a sequence complementary to the primer-specific portion of the right probes. However, since the primer-specific portion of the right probes contains a short stretch (GGGTT) that differs by only two nucleotides from the HgaI recognition site (GCGTC), the use of a bottom primer modified to include these two nucleotide substitutions (SEQ ID NO: 12, "Hga primer") in the PCR phase should result in the presence of a functional HgaI site in the final LDR/PCR products.
[0144] For the LDR phase, total RNA from MCF7 (human breast epithelial cancer) cells (Ambion) was mixed with TCL lysis buffer (Qiagen) to a final concentration of 15 ng/μl, and incubated at room temperature for 5 minutes. Twenty-microliter aliquots were dispensed to wells of a TurboCapture384 plate (Qiagen), and incubated at room temperature for 60 minutes before unbound material was removed by inverting the plate onto an absorbent towel and spinning at 500 g for 60 seconds. Five microliters of a reverse-transcription mix containing 50 mM tris-HCl pH 8.3, 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol, 125 μM of each dNTP, and 40 units M-MLV reverse transcriptase (Promega) was added to each well, and incubated at 37°C for 90 minutes. Wells were emptied by centrifugation, as before, and 5 μl of a probe-annealing mix containing 20 mM tris-HCl pH 7.6, 25 mM potassium acetate, 10 mM magnesium acetate, 1 mM beta-nicotinamide adenine dinucleotide, 10 mM dithiothreitol, 0.1% Triton X-100, and 10 fmol of each of 1,400 probe pairs was added. The reactions were incubated at 95°C for 2 minutes, moved to 70°C, and gradually cooled to 40°C over a period of 6 hours. Unannealed probes were removed by centrifugation, as before, and 5 μl of a probe-ligation mix containing 20 mM tris-HCl pH 7.6, 25 mM potassium acetate, 10 mM magnesium acetate, 1 mM beta-nicotinamide adenine dinucleotide, 10 mM dithiothreitol, 0.1% Triton X-100, and 2.5 units of Taq DNA ligase (New England Biolabs) was added to each well. Reactions were incubated at 45°C for 60 minutes followed by 65°C for 10 minutes. Wells were emptied by centrifugation, as before.
[0145] For the PCR phase, 15 μl reaction mixtures containing 1x HotStar Taq PCR Buffer (Qiagen) supplemented with 850 μM MgCl2, 1.5 pmol top primer (SEQ ID NO: 10), 1.5 pmol bottom primer (SEQ ID NO: 11) or modified bottom primer (SEQ ID NO: 12), 160 μM of each dNTP, and 0.48 units of HotStar Taq DNA Polymerase (Qiagen) were added to each well and incubated at 92°C for 9 minutes (initial denaturation), followed by 29 cycles of 92°C for 60 seconds (denaturation), 60°C for 60 seconds (annealing), and 72°C for 60 seconds (extension), followed by 72°C for 5 minutes (final extension).
[0146] For each primer pair, four completed PCR reactions were combined, and 15 μl aliquots were dispensed to three tubes. All aliquots were incubated at 100°C for 60 minutes to inactivate residual polymerase.
[0147] Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and 2 units HgaI (New England Biolabs) or no enzyme was added and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes.
[0148] An addressing fragment ("Z99-Hga") with cohesive ends compatible with the ends generated by HgaI digestion of products of the PCR step using the modified bottom primer was constructed by annealing two synthetic oligonucleotides (SEQ ID NO: 13 and 14) as described for Example 1.
[0149] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, 20 pmol of the Z99-Hga addressing fragment or no addressing fragment, and 400 units T4 DNA ligase (New England Biolabs) or no enzyme was added to each digestion reaction and incubated at 16°C for 60 minutes.
[0150] Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). An image of the gel is provided in Figure 7. Lane 1 shows the expected 104 bp PCR products generated with the unmodified bottom primer. No evidence for digestion by HgaI is seen in lanes 2 and 3, consistent with the absence of a recognition site for this restriction endonuclease in these products. The 104 bp product expected when using the modified bottom primer is visible in lane 4. However, neither lane 5 nor lane 6 provides any evidence for digestion of these products by HgaI. A likely explanation is that the restriction endonuclease recognition site is positioned too close to the end of the product to be recognized by the enzyme.
Example 4
[0151] Introduction of Restriction Endonuclease Recognition Sites in LDR/PCR Products using a Mutant Primer B
[0152] The modified bottom primer from Example 3 was lengthened by 10 nucleotides at its 5' end (SEQ ID NO: 15, "long Hga primer") in an attempt to generate LDR/PCR products containing an HgaI recognition site with a longer flanking sequence than the products from Example 3.
[0153] The LDR, PCR, polymerase inactivation, digestion, and ligation steps were performed exactly as for Example 3, except that the modified bottom primer (Hga primer) was substituted with the lengthened and modified bottom primer (long Hga primer). Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). An image of the gel is provided in Figure 8.
[0154] Lanes 1-3 show the expected 104 bp PCR products generated with the unmodified bottom primer and their resistance to digestion by HgaI, exactly as anticipated and seen in Example 3. Lane 4 shows the 114 bp product expected when using the lengthened and modified bottom primer. However, unlike Example 3, fragments consistent with HgaI digestion (~86 bp) are visible in lane 5, together with some undigested product. In addition, lane 6 shows a ~99 bp product whose appearance is consistent with successful and specific ligation of the Z99-Hga addressing fragment to the HgaI cleavage fragments of LDR/PCR products containing target-specific sequences.
[0155] These results demonstrate successful introduction of functional restriction endonuclease recognition and cleavage sites in LDR/PCR products derived from a library of LDR probe pairs lacking such sites through the use of a modified bottom primer in the PCR phase.
Example 5
Introduction of Restriction Endonuclease Recognition Sites in LDR/PCR Products using a Mutant Primer C
[0156] The primer-specific portion of the right probes described in Example 3 also contains a stretch (CCTTTAGTG) that differs by only two nucleotides from the DraIII recognition site (CACNNNGTG). These two nucleotide substitutions were introduced to the bottom primer (SEQ ID NO: 16, "Dra primer") in an attempt to generate a functional DraIII site in the final LDR/PCR products.
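The two-nucleotide differences cited here and in Example 3 can be checked with a short comparison that treats N in a recognition site as matching any nucleotide. This is an illustrative sketch only: the function names are hypothetical, the sequences are compared exactly as written in the text, and the sketch does not attempt to derive the primer sequences themselves (SEQ ID NO: 12 and 16).

```python
def mismatch_positions(stretch, site):
    """Return 0-based positions where `stretch` differs from `site`.

    An 'N' in `site` is treated as matching any nucleotide.
    """
    if len(stretch) != len(site):
        raise ValueError("stretch and site must have the same length")
    return [
        i
        for i, (have, want) in enumerate(zip(stretch, site))
        if want != "N" and have != want
    ]


def substitute(stretch, site):
    """Apply the minimal substitutions that make `stretch` match `site`."""
    if len(stretch) != len(site):
        raise ValueError("stretch and site must have the same length")
    return "".join(
        have if want == "N" else want for have, want in zip(stretch, site)
    )


# Stretches and recognition sites as quoted in Examples 3 and 5.
hga_changes = mismatch_positions("GGGTT", "GCGTC")
dra_changes = mismatch_positions("CCTTTAGTG", "CACNNNGTG")
dra_mutant = substitute("CCTTTAGTG", "CACNNNGTG")
```

Both comparisons report exactly two differing positions, consistent with the two substitutions introduced into each modified bottom primer.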
[0157] The LDR and PCR phases were performed as for Example 3 except that the modified bottom primer (Hga primer) was substituted with the Dra primer. Analysis of the LDR/PCR reactions by gel electrophoresis revealed that although the anticipated 104 bp product was clearly visible when the unmodified top and bottom primers were used in the PCR, no products could be seen when the Dra primer was substituted (data not shown). One likely explanation is that the annealing temperature selected for the PCR (60°C) was too high for adequate priming to occur during the initial cycles of the reaction where the Dra primer is required to anneal to a non-perfectly complementary region of the LDR products. This possibility was tested by repeating the LDR and PCR phases using a range of annealing temperatures (50, 52, 54, 56, 58, 60°C) in the PCR. Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen).
[0158] As anticipated, PCR products of the expected size (104 bp) were only seen at lower annealing temperatures (50, 52, 54, 56°C) when the Dra primer was used (Figure 9B; compare lanes 2, 4, 6, and 8 with lanes 10 and 12).
[0159] To confirm the introduction of a functional DraIII site in LDR/PCR products, the LDR and PCR phases were performed as for Example 3 except that the Dra primer was used in place of the Hga primer, and an annealing temperature of 52°C was used in reactions containing the Dra primer.
[0160] Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and 20 units DraIII (New England Biolabs) or no enzyme was added and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes.
[0161] An addressing fragment ("Z99-Dra-P04") with cohesive ends compatible with the ends generated by DraIII digestion of products of the PCR step using the Dra primer, and 5' phosphorylated on the opposite end was constructed by annealing two synthetic oligonucleotides (SEQ ID NO: 17 and 18) as described for Example 1.
[0162] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, 20 pmol of the Z99-Dra-P04 addressing fragment or no addressing fragment, and 400 units T4 DNA ligase (New England Biolabs) or no enzyme was added to each digestion reaction and incubated at 16°C for 60 minutes.
[0163] Ten-microliter aliquots of each reaction were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). An image of the gel is provided as Figure 9E. Lane 1 shows the expected 104 bp PCR products generated with the unmodified bottom primer. No evidence for digestion by DraIII is seen in lanes 2 and 3, consistent with the absence of a recognition site for this restriction endonuclease in these products. The 104 bp product expected when using the modified bottom primer is visible in lane 4, and lane 5 shows a fragment (~89 bp) consistent with cleavage of this product by DraIII. Lane 6 shows a ~101 bp product whose appearance is consistent with successful and specific ligation of the Z99-Dra-P04 addressing fragment to the DraIII cleavage fragments of LDR/PCR products containing target-specific sequences. A small amount of unligated DraIII cleavage fragment (~89 bp) is also visible.
[0164] These results provide another example of successful introduction of functional restriction endonuclease recognition and cleavage sites in LDR/PCR products derived from a library of LDR probe pairs lacking such sites through the use of a modified bottom primer in the PCR phase. In addition, the introduction of sites for a restriction endonuclease generating 3' overhangs obviates the need (Example 4) to inactivate residual polymerase activity. This example also demonstrates the generation of barcoded LDR/PCR products in which one strand is 5' phosphorylated.
Example 6
Pooling and Purification of Differently-Barcoded LDR/PCR Products
[0165] This example shows how LDR/PCR products generated from different samples are appended with distinguishable barcode sequence elements, and pooled together to form a sequencing library.
[0166] A second addressing fragment ("Z97-Dra-P04") with cohesive ends compatible with the ends generated by DraIII digestion of products of the PCR step using the Dra primer, and 5' phosphorylated on the opposite end was constructed by annealing two synthetic oligonucleotides (SEQ ID NO: 19 and 20) as described for Example 1. This addressing fragment has the same structure as Z99-Dra-P04 but differs in the sequence of the barcode element.
[0167] The LDR phase was performed with a library of 1,400 probe pairs as described for Example 3. For the PCR phase, 15 μl reaction mixtures containing 1x HotStar Taq PCR Buffer (Qiagen) supplemented with 850 μM MgCl2, 1.5 pmol top primer (SEQ ID NO: 10), 1.5 pmol Dra primer (SEQ ID NO: 16), 160 μM of each dNTP, and 0.48 units of HotStar Taq DNA Polymerase (Qiagen) were added to each well and incubated at 92°C for 9 minutes (initial denaturation), followed by 29 cycles of 92°C for 60 seconds (denaturation), 52°C for 60 seconds (annealing), and 72°C for 60 seconds (extension), followed by 72°C for 5 minutes (final extension).
[0168] Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and 20 units DraIII (New England Biolabs) was added and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes.
[0169] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, 20 pmol of the Z99-Dra-P04 addressing fragment, and 400 units T4 DNA ligase (New England Biolabs) was added to 10 digestion reactions. Ten microliters of the same mixture in which the Z99-Dra-P04 addressing fragment was replaced by the Z97-Dra-P04 addressing fragment was added to 2 digestion reactions. All reactions were incubated at 16°C for 60 minutes. One pool ("A") was created by combining a 25 μl aliquot from each of 9 reactions containing Z99-Dra-P04 with a 25 μl aliquot from one reaction containing Z97-Dra-P04. Ten percent of the ligated LDR/PCR products in this pool are thus expected to contain a barcode sequence different from the remainder. Another pool ("B") was created by mixing together all remaining volumes of reactions containing Z99-Dra-P04, all remaining volumes of reactions containing Z97-Dra-P04, and combining 30 μl aliquots of each mixture. This pool is thus expected to contain equal amounts of ligated LDR/PCR products containing both barcode sequences.
[0170] Nucleic acids were precipitated from pool A by the addition of 25 μl of 3M sodium acetate pH 5.5, 1 μl of GlycoBlue (Ambion), and 750 μl of 100% ethanol, and incubation on ice for approximately 90 minutes. Pool B was precipitated in similar fashion using 6 μl of 3M sodium acetate pH 5.5, 0.5 μl of GlycoBlue, and 175 μl of 100% ethanol. Precipitates were collected by centrifugation at 16,000 g at 4°C for 30 minutes, washed with 900 μl 75% ethanol, collected by spinning as before, briefly air dried, and dissolved in 60 μl TE pH 8 (Ambion) or 15 μl TE pH 8 for pools A and B, respectively.
[0171] The 5' phosphorylated (bottom) strands of barcoded LDR/PCR products from pool B were selectively degraded by adding 2 μl of a buffer containing 670 μM glycine-KOH pH 9.3, 25 mM MgCl2, and 500 μg/ml bovine serum albumin to the precipitated pool B components (15 μl), followed by 1 μl of 5 units/μl lambda exonuclease (New England Biolabs), and incubating the mixture at 37°C for 30 minutes then 75°C for 10 minutes.
[0172] Four 15 μl aliquots of precipitated pool A components and the full 18 μl volume of the lambda exonuclease treated pool B components were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen) (Figure 10A). The ~101 bp double-stranded barcoded LDR/PCR products from pool A were excised from the gel, avoiding the faster-migrating ~89 bp unligated cleavage fragments (lanes 1-4), as were the 102 nt single-stranded barcoded LDR/PCR products from pool B which have an apparent size of approximately 150 bp under these non-denaturing conditions (lane 5). The gel slices were crushed, and soaked overnight at 37°C in 300 μl (pool A) or 150 μl (pool B) of a gel-elution buffer containing 500 mM ammonium acetate, 1 mM ethylenediaminetetraacetic acid (EDTA), and 0.1% sodium dodecyl sulfate. Eluted nucleic acids were precipitated by the addition of 800 μl 100% ethanol and 1.2 μl GlycoBlue (pool A) or 400 μl 100% ethanol and 0.6 μl GlycoBlue (pool B), and incubation on ice for approximately 90 minutes. Precipitates were collected by centrifugation at 16,000 g at 4°C for 30 minutes, washed with 900 μl 75% ethanol, collected by spinning as before, briefly air dried, and dissolved in 26 μl (pool A) or 13 μl (pool B) TE pH 8 to form two sequencing libraries ("A" and "B").
[0173] A 3 μl aliquot of each sequencing library was mixed with an equal volume of Gel Loading Buffer II (Ambion), heated at 95°C for 5 minutes, and resolved on a 10% polyacrylamide/TBE/urea gel (Invitrogen) together with a mixture of three synthetic oligonucleotides as size markers, and the gel was stained with SYBR Gold (Invitrogen). The image from this denaturing gel (Figure 10B) confirms the libraries to be composed primarily of molecules of the expected sizes (101 and 102 nt).
Example 7
Sequencing of Libraries, and Decoding
[0174] The sequencing libraries constructed in Example 6 were analyzed using a HeliScope single-molecule sequencer (Helicos Biosciences) according to the manufacturer's directions.
[0175] A proportion of the total number of reads generated by this sequencer are recognized as artifactual (i.e. so-called "CUAGs"), and others are prematurely terminated. There is also a not insignificant base-miscall and base-skip rate.
[0176] The following filtering process was therefore employed. First, reads were selected that contain a stretch matching at least 7 of the 8 nucleotides which may comprise the remnants of the DraIII recognition site and primer-specific region of the top strand barcoded LDR/PCR products (TCCACTTA) (see Figure 9D) (henceforth the "invariant region"), with that stretch flanked on both sides by at least 5 nucleotides (henceforth "anchored reads"). The sequence stretches immediately 3' to the invariant region are the extreme 5'-ends of the well-barcode sequences contributed by the addressing fragments. The sequence stretches immediately 5' to the invariant region are the extreme 3'-ends of the gene-specific sequences contributed by the right LDR probes (Figure 11A). Second, those anchored reads additionally showing a perfect match to the first five 5' nucleotides of one of the two well-barcode sequence elements used (i.e. Z99: CGTAG, see SEQ ID NO: 17; Z97: TGACG, see SEQ ID NO: 19) immediately 3' to the invariant region (henceforth "Z99 reads" and "Z97 reads") were selected. The number and proportions of reads at each step of this process for both libraries are shown in Table 1.
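The two-step filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software actually used: reads are assumed to be plain strings in the top-strand 5' to 3' orientation, the "at least 7 of 8" criterion is modelled as at most one substitution (base skips, which would shift the window, are not handled), the first qualifying window in a read is taken as the anchor, and all function and variable names are hypothetical.

```python
INVARIANT = "TCCACTTA"
WELL_PREFIXES = {"Z99": "CGTAG", "Z97": "TGACG"}
MIN_FLANK = 5


def find_anchor(read, invariant=INVARIANT, max_mismatches=1, min_flank=MIN_FLANK):
    """Return the start index of the invariant region, or None.

    A window qualifies if it differs from `invariant` at no more than
    `max_mismatches` positions and has at least `min_flank` nucleotides
    on each side.
    """
    k = len(invariant)
    for start in range(min_flank, len(read) - k - min_flank + 1):
        window = read[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, invariant))
        if mismatches <= max_mismatches:
            return start
    return None


def classify_read(read):
    """Return (label, gene_stretch) for an anchored read, else None.

    `label` is "Z99" or "Z97" when the five nucleotides immediately 3'
    to the invariant region match a well-barcode prefix exactly, and
    "anchored_only" otherwise. `gene_stretch` is the five nucleotides
    immediately 5' to the invariant region.
    """
    start = find_anchor(read)
    if start is None:
        return None
    gene_stretch = read[start - MIN_FLANK:start]
    downstream = read[start + len(INVARIANT):]
    for label, prefix in WELL_PREFIXES.items():
        if downstream.startswith(prefix):
            return label, gene_stretch
    return "anchored_only", gene_stretch


# Hypothetical reads, for illustration only.
exact = classify_read("GGACGTA" + "TCCACTTA" + "CGTAG" + "TT")
one_off = classify_read("GGACGTA" + "TCCAGTTA" + "TGACG" + "AA")
too_short = classify_read("ACGTA" + "TCCACTTA" + "CGTA")
```

Counting the reads returned under each label would give the anchored, Z99 and Z97 tallies of the kind summarized in Table 1.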
Table 1: Number and Proportion of Different Types of Reads
[0177] The number of anchored reads recorded in library B as a proportion of the total (44%) was almost exactly double that seen in library A (22%), consistent with degradation of the bottom (uninformative) strand during the preparation of the former. This demonstrates the value of this additional step.
[0178] Approximately two thirds of anchored reads from both libraries contained exact matches to the 5' end of one of the two well-barcode sequence elements used. A greater proportion could be recovered by allowing some degeneracy in the match.
[0179] The ratio of Z97 reads to Z99 reads observed in library A (0.18:1) was different from that expected (0.11:1). This was also the case with library B (1.82:1, observed; 1:1, expected), with Z97 reads being over-represented in both libraries.
[0180] Despite this deviation from the expected proportionality, which is likely due to stochastic differences in the concentration of nucleic acids between products of the barcode ligation phase, these data indicate that strands from one biological sample can easily be distinguished from others in a pool composed of at least ten similar biological samples using this sample-barcoding method.
[0181] The number of Z99 reads and Z97 reads containing the same five-nucleotide sequences immediately 5' to the invariant region were counted. These stretches represent the 3' ends of gene-specific sequences, and given that all of the barcoded products analyzed here derive from the same biological sample, a comparison of the frequency of these reads in the two populations provides an assessment of: a) the effect of the nucleotide sequence of the well-barcode sequence elements on the estimated abundance of their corresponding LDR/PCR products, and b) whether one sample in a pool of samples of a given complexity can faithfully recapitulate that abundance profile.
[0182] Plots of these counts for each gene-specific stretch reveal a very tight global correlation in both libraries (Figure 11B and C; r2 values of 0.991 and 0.964; library A and B, respectively). These data are consistent both with the nucleotide sequences of the two well-barcode sequence elements tested having no differential effect on abundance estimates, and abundance estimates being maintained in each sample in a pool of at least ten samples. These counts also show a range in excess of three orders of magnitude, and that reads with the highest frequencies generally contained the same gene-specific stretches in both libraries. This is consistent with a broad dynamic detection range, and high technical (i.e. sample-to-sample) quantitative reproducibility.
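The size of the deviation discussed above can be expressed as a fold difference between observed and expected ratios. The sketch below uses only the ratios and pool compositions stated in the text; the function name is hypothetical, and the fold values are simple derived quantities rather than results reported in this example.

```python
def fold_over_expected(observed_ratio, expected_ratio):
    """Return how many times larger the observed ratio is than expected."""
    return observed_ratio / expected_ratio


# Library A: one Z97 aliquot pooled with nine Z99 aliquots.
expected_a = 1 / 9
fold_a = fold_over_expected(0.18, expected_a)

# Library B: equal amounts of both barcodes expected.
fold_b = fold_over_expected(1.82, 1.0)
```

Under these assumptions Z97 reads are over-represented by a factor of roughly 1.6 in library A and 1.8 in library B.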
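A correlation of the kind reported above could be computed along the following lines. This sketch assumes that r2 denotes the squared Pearson correlation of raw counts paired by gene-specific stretch; the text does not state whether counts were transformed before plotting, so that choice, the function names, and the example counts are all assumptions made for illustration.

```python
def paired_counts(z99_counts, z97_counts):
    """Pair counts by gene-specific stretch, using 0 where one is absent."""
    stretches = sorted(set(z99_counts) | set(z97_counts))
    xs = [z99_counts.get(s, 0) for s in stretches]
    ys = [z97_counts.get(s, 0) for s in stretches]
    return xs, ys


def r_squared(xs, ys):
    """Return the squared Pearson correlation coefficient of two lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical counts keyed by five-nucleotide gene-specific stretch.
z99 = {"AAAAA": 1000, "CCCCC": 100, "GGGGG": 10}
z97 = {"AAAAA": 100, "CCCCC": 10, "GGGGG": 1}
xs, ys = paired_counts(z99, z97)
r2 = r_squared(xs, ys)
```

Because the hypothetical Z97 counts are exactly one tenth of the Z99 counts, the sketch returns an r2 of 1; real read counts would be expected to fall below that.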
Example 8
Comparison with Serial Multiplex Detection Method
[0183] This example compares the present method with an established serial multiplex-detection solution for estimation of the abundance of LDR/PCR products derived from multiple biological samples.
[0184] The reference method is based upon hybridization of labeled LDR/PCR products to complementary oligonucleotide capture probes immobilized on optically-addressed microspheres, and flow-cytometric detection, decoding and quantification of these microsphere-amplicon complexes (henceforth "FlexMAP method"). The FlexMAP method has been described in detail elsewhere (Peck et al. "A method for high-throughput gene expression signatures analysis" Genome Biology 7: R61 2006; co-pending US 61/321,298 and US 61/321,385). It requires introduction of a label to the top strand of LDR/PCR products. This is achieved through the use of a 5' modified top primer in the PCR phase. The LDR/PCR products from each biological sample are analyzed individually, in serial fashion.
[0185] The LDR phase was performed as for Example 3, except that total RNA from HeLa (human cervical adenocarcinoma) cells replaced the MCF7 RNA. For the PCR phase, 15 μl reaction mixtures containing 1x HotStar Taq PCR Buffer (Qiagen) supplemented with 850 μM MgCl2, 1.5 pmol top primer (SEQ ID NO: 10) or 5' biotinylated top primer (SEQ ID NO: 21), 1.5 pmol Dra primer (SEQ ID NO: 16), 160 μM of each dNTP, and 0.48 units of HotStar Taq DNA Polymerase (Qiagen) were added to each well and incubated at 92°C for 9 minutes (initial denaturation), followed by 29 cycles of 92°C for 60 seconds (denaturation), 52°C for 60 seconds (annealing), and 72°C for 60 seconds (extension), followed by 72°C for 5 minutes (final extension).
[0186] The abundance of 949 LDR/PCR products was estimated in each of six individual wells by the FlexMAP method. Products were generated with the biotinylated top primer. Average abundance estimates across all six samples were calculated. Each measured LDR/PCR product and its abundance were then associated with the 10 nucleotide sequence stretch (henceforth "sequence tag") at the extreme 3' end of the target-specific portion of its corresponding right LDR probe. Only 949 of the 1,398 possible LDR/PCR products were detected because of the limited multiplexing capacity of the FlexMAP method.
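The association between each LDR/PCR product and its sequence tag can be illustrated with a short sketch. It assumes the right-probe layout given in Example 3 (a 20 nt target-specific portion followed by a 20 nt primer-specific portion); the function names, probe names and probe sequences are hypothetical placeholders rather than members of the actual probe library.

```python
TARGET_LEN = 20  # target-specific portion of a right probe, per Example 3
TAG_LEN = 10     # sequence tag length used in this example


def sequence_tag(right_probe):
    """Return the last TAG_LEN nt of the probe's target-specific portion."""
    if len(right_probe) < TARGET_LEN:
        raise ValueError("probe is shorter than its target-specific portion")
    target_specific = right_probe[:TARGET_LEN]
    return target_specific[-TAG_LEN:]


def build_tag_index(right_probes):
    """Map each sequence tag to a probe name, rejecting ambiguous tags."""
    index = {}
    for name, probe in right_probes.items():
        tag = sequence_tag(probe)
        if tag in index:
            raise ValueError(f"tag {tag} shared by {index[tag]} and {name}")
        index[tag] = name
    return index


# Hypothetical probes: 20 nt target-specific + 20 nt primer-specific.
probes = {
    "probe_1": "ACGTACGTACGGATTACAGG" + "T" * 20,
    "probe_2": "TTGACCATGACCTGAAGCTA" + "T" * 20,
}
tag_index = build_tag_index(probes)
```

An index of this kind would allow abundance estimates keyed by probe to be compared with read counts keyed by sequence tag.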
[0187] Five microliters of a buffer containing 20 mM tris-HCl pH 7.9, 100 mM NaCl, 20 mM MgCl2, 2 mM dithiothreitol, 400 μg/ml bovine serum albumin, and 20 units DraIII (New England Biolabs) was added to each of sixteen wells containing LDR/PCR products (generated using the unmodified top primer) and the reactions incubated at 37°C for 60 minutes then 65°C for 20 minutes.
[0188] Shorter addressing fragments were used than in previous examples. These addressing fragments are similar in composition and sequence to Z99-Dra-P04 (Example 5) and Z97-Dra-P04 (Example 6), but the length of the barcode elements is shortened by 2 nucleotides. These addressing fragments ("Z99S-Dra-P04" and "Z97S-Dra-P04") were each constructed by annealing two synthetic oligonucleotides (seq 22 and 23, seq 24 and 25, respectively) as described in Example 1.
[0189] Ten microliters of a mixture containing 100 mM tris-HCl pH 7.5, 20 mM MgCl2, 20 mM dithiothreitol, 3 mM adenosine triphosphate, 20 pmol of the Z99S-Dra-P04 addressing fragment, and 400 units T4 DNA ligase (New England Biolabs) was added to 8 digestion reactions. Ten microliters of the same mixture in which the Z99S-Dra-P04 addressing fragment was replaced by the Z97S-Dra-P04 addressing fragment was added to 8 digestion reactions. All reactions were incubated at 16°C for 60 minutes.
[0190] All reactions containing Z99S-Dra-P04 were combined, as were all those containing Z97S-Dra-P04. One pool ("X") was created by combining a 180 μl aliquot of the former with a 20 μl aliquot of the latter. Ten percent of the ligated LDR/PCR products in this pool are thus expected to contain a barcode sequence different from the remainder. Another pool ("Y") was created by mixing together a 5 μl aliquot of the Z99S-Dra-P04 products and a 195 μl aliquot of the Z97S-Dra-P04 products. Two-point-five percent of the ligated LDR/PCR products in this pool are thus expected to contain a barcode sequence different from the remainder. This models the pooling of forty individual, differently-barcoded samples.
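The expected proportions stated for pools X and Y follow directly from the aliquot volumes. A minimal arithmetic sketch is given below; it assumes, as the text implicitly does, that the two combined ligation mixtures contain ligated products at equal concentration, and the function names are hypothetical.

```python
def minority_fraction(vol_a_ul, vol_b_ul):
    """Fraction of the pool volume contributed by the smaller aliquot."""
    return min(vol_a_ul, vol_b_ul) / (vol_a_ul + vol_b_ul)


def expected_ratio(z97s_ul, z99s_ul):
    """Expected Z97S:Z99S product ratio, assuming equal product
    concentration in both source mixtures (assumption of this sketch)."""
    return z97s_ul / z99s_ul


# Aliquot volumes in microliters, as stated in the text.
pool_x = {"Z99S": 180, "Z97S": 20}
pool_y = {"Z99S": 5, "Z97S": 195}

frac_x = minority_fraction(pool_x["Z99S"], pool_x["Z97S"])  # 20 / 200
frac_y = minority_fraction(pool_y["Z99S"], pool_y["Z97S"])  # 5 / 200
ratio_x = expected_ratio(pool_x["Z97S"], pool_x["Z99S"])    # 20 / 180
ratio_y = expected_ratio(pool_y["Z97S"], pool_y["Z99S"])    # 195 / 5
equivalent_samples_y = round(1 / frac_y)  # minority is one part in this many
```

Under these assumptions the minority barcode is one part in ten in pool X and one part in forty in pool Y, and the expected Z97S:Z99S ratios are about 0.11:1 and 39:1 respectively.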
[0191] Nucleic acids were precipitated from both pools by the addition of 20 μl of 3M sodium acetate pH 5.5, 0.8 μl of GlycoBlue (Ambion), and 600 μl of 100% ethanol, and incubation on ice for approximately 90 minutes. Precipitates were collected by centrifugation at 16,000 g at 4°C for 30 minutes, washed with 900 μl 75% ethanol, collected by spinning as before, briefly air dried, and dissolved in 60 μl TE pH 8 (Ambion) or 45 μl TE pH 8 for pools X and Y, respectively.
[0192] The 5' phosphorylated (bottom) strands of barcoded LDR/PCR products from pool Y were selectively degraded by adding 2 μl of a buffer containing 670 μM glycine-KOH pH 9.3, 25 mM MgCl2, and 500 μg/ml bovine serum albumin to each of three 15 μl aliquots of the precipitated pool Y components, followed by 1 μl of 5 units/μl lambda exonuclease (New England Biolabs), and incubating the mixtures at 37°C for 30 minutes then 75°C for 10 minutes. Precipitated pool X components (4 x 15 μl aliquots) and lambda exonuclease treated pool Y components (3 x 18 μl reactions) were resolved on a 10% polyacrylamide/TBE gel (Invitrogen) together with a 25 bp ladder (Invitrogen), and the gel was stained with SYBR Gold (Invitrogen). The 99 bp double-stranded barcoded LDR/PCR products from pool X were excised from the gel, as were the 100 nt single-stranded barcoded LDR/PCR products from pool Y which have an apparent size of approximately 150 bp under these non-denaturing conditions.
[0193] The gel slices were crushed, and soaked overnight at 37°C in 300 μl (pool X) or 150 μl (pool Y) of a gel-elution buffer containing 500 mM ammonium acetate, 1 mM ethylenediaminetetraacetic acid (EDTA), and 0.1% sodium dodecyl sulfate. Eluted nucleic acids were precipitated by the addition of 800 μl 100% ethanol and 1.2 μl GlycoBlue (pool X) or 400 μl 100% ethanol and 0.6 μl GlycoBlue (pool Y), and incubation on ice for approximately 90 minutes. Precipitates were collected by centrifugation at 16,000 g at 4°C for 30 minutes, washed with 900 μl 75% ethanol, collected by spinning as before, briefly air dried, and dissolved in 26 μl (pool X) or 13 μl (pool Y) TE pH 8 to form two sequencing libraries ("X" and "Y").
[0194] The sequencing libraries were each analyzed in one channel of a HeliScope single-molecule sequencer (Helicos Biosciences) according to the manufacturer's directions. A proportion of the total number of reads generated by this sequencer are recognized as artifactual (i.e. so-called "CUAGs"), and others are prematurely terminated. There is also a not insignificant base-miscall and base-skip rate.
[0195] The following filtering process was therefore employed: First, the population of reads containing stretches perfectly matching the 8 nucleotides which may comprise the remnants of the DraIII recognition site and primer-specific region of the top strand barcoded LDR/PCR products (TCCACTTA) (see Figure 9C) ("invariant region"), and flanked on the 3' side by at least 5 nucleotides and on the 5' side by at least 10 nucleotides (henceforth "long anchored reads") were selected (Figure 12A). The nucleotides immediately 3' to the invariant region are at the extreme 5'-ends of the well-barcode sequences contributed by the addressing fragments. The nucleotides immediately 5' to the invariant region are the extreme 3'-ends of the target-specific sequences (e.g., gene-barcode) contributed by the right LDR probes. Second, those long anchored reads additionally showing a perfect match to the first five 5' nucleotides of one of the two well-barcode sequence elements used (i.e. Z99S: CGTAG, see seq 22; Z97S: TGACG, see seq 24) immediately 3' to the invariant region (henceforth "Z99S reads" and "Z97S reads") were selected. Third, those Z99S reads and Z97S reads additionally showing a perfect match to one of 1,378 sequence tags immediately 5' to the invariant region (henceforth "informative reads") were selected. A total of 1,398 LDR/PCR products are possible with the probe-pair library used. However, two LDR/PCR products have the identical sequence tag and cannot be distinguished. Reads with this sequence tag were not selected. An additional 18 LDR/PCR products contain an internal DraIII site. These will be digested at these sites during the RED phase, and thereby lost from the product pool. The sequence tags for these products were not included in the set of those sought. This mode of failure is predictable, and alternative probe pairs for these targets can be designed and substituted easily.
The number and proportions of reads at each step of this process for both libraries are shown in Table II.
Table II: Number and Proportion of Different Types of Reads
Figure imgf000051_0001
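The three-step filter of paragraph [0195] may be sketched, for illustration only, as follows. The invariant region, the two five-nucleotide well-barcode prefixes, the tag length and the flank lengths are taken from the text. Everything else is an assumption of this sketch: the function names are hypothetical, reads are taken to be plain strings already oriented as the top strand (5' to 3'), the set of sought sequence tags is supplied by the caller, and a read with several occurrences of the invariant region is classified by the first occurrence that passes all three steps.

```python
from collections import Counter

INVARIANT = "TCCACTTA"                              # invariant region
WELL_BARCODES = {"CGTAG": "Z99S", "TGACG": "Z97S"}  # first 5 nt of each well-barcode
TAG_LEN = 10       # sequence-tag length
MIN_5P_FLANK = 10  # nucleotides required 5' of the invariant region
MIN_3P_FLANK = 5   # nucleotides required 3' of the invariant region


def classify_read(read, tags):
    """Return (well, tag) if the read is informative, otherwise None."""
    start = read.find(INVARIANT)
    while start != -1:
        end = start + len(INVARIANT)
        # Step 1: long anchored read (exact anchor with both flanks present).
        if start >= MIN_5P_FLANK and len(read) - end >= MIN_3P_FLANK:
            # Step 2: exact well-barcode prefix immediately 3' of the anchor.
            well = WELL_BARCODES.get(read[end:end + 5])
            # Step 3: exact sequence tag immediately 5' of the anchor.
            tag = read[start - TAG_LEN:start]
            if well is not None and tag in tags:
                return well, tag
        # Try the next occurrence of the invariant region, if any.
        start = read.find(INVARIANT, start + 1)
    return None


def count_informative(reads, tags):
    """Count informative reads per (well-barcode, sequence tag) pair."""
    counts = Counter()
    for read in reads:
        hit = classify_read(read, tags)
        if hit is not None:
            counts[hit] += 1
    return counts
```

Because every comparison is an exact string match, a single miscalled or skipped base in the anchor, barcode prefix or tag discards the read, which is consistent with the strict filtering described above.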
[0196] The number of anchored reads recorded in library Y as a proportion of the total (25%) was slightly less than double that seen in library X (15.2%), consistent with degradation of the bottom (uninformative) strand during the preparation of the former. A similar proportionality was seen in Example 7, although the numbers of anchored reads as a proportion of the total were higher there. This is attributable to the degeneracy allowed in the definition of anchored reads in that case. Approximately three quarters of anchored reads from both libraries contained exact matches to the 5' end of one of the two well-barcode sequence elements used. This is also similar to Example 7. Approximately 55% of anchored reads from library X are informative reads. This is 8.3% of the total number of reads. Only approximately 36% of anchored reads from library Y are informative reads. This is 8.9% of the total. This difference appears to be explained by the relatively low proportion of Z97S reads that are informative (47.5%) in library Y. This proportion is approximately two thirds in library X, and for Z99S reads in both libraries. The reason for this difference is unknown.
[0197] The ratio of all informative Z97S reads to all informative Z99S reads in library X (0.56:1) was different than expected (0.11:1). This was also the case with library Y (34:1, observed; 39:1, expected), although the deviation was much more modest and in the opposite direction. Notwithstanding, these data indicate that strands containing target identities from one biological sample can easily be distinguished from others in a pool composed of many biological samples using the present sample-barcoding method. The numbers of informative Z99S reads and informative Z97S reads containing each of the 1,378 sequence tags were counted in both libraries. Plots of these counts reveal a very tight global correlation in library X (Figure 12B; r value of 0.961), much as was seen in Example 7. The global correlation in library Y was slightly poorer (Figure 12C; r value of 0.916). This is attributable to the total number of informative Z99S reads being too low to provide sufficient resolution in the estimation of abundance of low copy-number products. Nevertheless, these data reveal that faithful relative abundance estimates for in excess of 1,000 loci can be made for more than one sample in parallel using the methods of the present invention. The data also indicate that faithful abundance estimates can be made for one sample in a pool composed of the equivalent of approximately forty samples of this same complexity using these methods. These counts also show a range of in excess of three orders of magnitude, and that the reads with the highest frequencies amongst all informative reads in both libraries generally contained the same sequence tags. This is consistent with a broad dynamic detection range, and high technical (i.e. sample-to-sample and pool-to-pool) quantitative reproducibility.
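For illustration, a global correlation between the per-tag counts of the two well-barcodes might be computed as sketched below. The specification does not state how the r values were calculated; this sketch assumes a Pearson correlation on untransformed counts, with tags that were never observed for a barcode counted as zero. The function names and the layout of `counts` (a mapping from a (well-barcode, sequence tag) pair to a read count) are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def barcode_correlation(counts, tags, well_a="Z99S", well_b="Z97S"):
    """Correlate per-tag read counts between two well-barcodes.

    counts: mapping (well, tag) -> number of informative reads
    tags:   iterable of all sequence tags sought (missing pairs count as 0)
    """
    tags = list(tags)
    xs = [counts.get((well_a, t), 0) for t in tags]
    ys = [counts.get((well_b, t), 0) for t in tags]
    return pearson_r(xs, ys)
```

Whether a logarithmic transform was applied before plotting is not stated; with counts spanning several orders of magnitude, the choice would materially affect the value obtained.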
[0198] Abundance estimates made by the methods of the present invention were compared with those produced by the FlexMAP method in the space of the 935 LDR/PCR products for which measures were available from both platforms. The total number of informative reads containing one of the 935 sequence tags is shown in Table III. Plots of abundance as measured by the FlexMAP method against the counts of informative reads from library X containing the Z99S well-barcode sequence element (Figure 12D; r value of 0.848) or the Z97S well-barcode sequence element (Figure 12E; r value of 0.785) reveal high global correlation. This degree of correlation was also seen with counts of informative reads from library Y containing the Z97S well-barcode sequence element (Figure 12F; r value of 0.801). However, the correlation was markedly poorer when considering informative reads containing the Z99S well-barcode sequence element (Figure 12G; r value of 0.783). This is attributable to the total number of informative Z99S reads being too low to provide sufficient resolution in the estimation of abundance of low copy-number products. Nevertheless, these data reveal that abundance estimates made by the methods of the present invention are generally highly correlated with abundance estimates made by an established serial multiplex detection method. They also demonstrate that abundance estimates for almost 1,000 loci that are highly correlated with estimates made by an established serial multiplex detection method can be made for more than one sample in parallel using the methods of the present invention.
[0199] The data also indicate that abundance estimates for almost 1,000 loci that are well correlated with estimates made by an established serial multiplex detection method can be made for one sample in a pool composed of the equivalent of approximately forty samples of this same complexity using these methods.
Table III: Number of Different Types of Reads Used in the Comparison
Figure imgf000053_0001
[0200] The number of informative Z99S reads from library Y used in the comparison (56,986) equates to approximately 60 reads per locus on average. This is too few to provide accurate estimates of the abundance of low copy-number products in a pool of products with a range of abundance of approximately three orders of magnitude (see Figure 12G). Reasonable resolution of the abundance of low copy-number products is achieved with an average number of reads per locus of approximately 890 (Z97S reads from library X, see Figure 12E). These data indicate that an average of approximately 1,000 reads per locus is required to provide high resolution estimates of the abundance of all products from a pool of products that span a range of abundance of approximately three orders of magnitude.
[0201] The implication of this finding is that approximately one million reads are required to accurately estimate the abundance of approximately one thousand loci with a range of abundance of approximately three orders of magnitude. The number of samples that can therefore be assayed in this fashion in parallel using the methods of the present invention is dictated by the number of informative reads that can be generated in a single channel of a given single-molecule sequencer. Only approximately 4 million informative reads were generated in each library in the present example. However, this represents less than 10% of the total number of reads (see Table II). Many uninformative reads could likely be rendered informative by allowing some degeneracy in the definition of sequence matches. The error rate, and the proportion of artifactual or truncated reads generated by this sequencer, are also likely to decrease, further increasing the number of informative reads. However, newer, more accurate sequencers, producing greater numbers of reads per channel, will likely be required to realize the full potential of the present invention.
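The read-budget reasoning of paragraphs [0200] and [0201] can be set out as simple arithmetic. The sketch below uses only figures given in the text (56,986 reads over 935 loci; approximately 1,000 reads per locus; approximately 1,000 loci; approximately 4 million informative reads per channel). The function names are hypothetical, and an even division of informative reads among samples is an assumption of the sketch.

```python
def reads_per_locus(total_reads, n_loci):
    """Average number of reads available per locus."""
    return total_reads / n_loci


def samples_per_channel(informative_reads, n_loci=1000, reads_needed_per_locus=1000):
    """Whole number of samples one channel can support, assuming reads
    are divided evenly among samples (assumption of this sketch)."""
    return informative_reads // (n_loci * reads_needed_per_locus)


# Informative Z99S reads from library Y used in the comparison.
avg_z99s_library_y = reads_per_locus(56_986, 935)

# Approximate informative reads per library in the present example.
capacity = samples_per_channel(4_000_000)
```

On these figures the Z99S sample in library Y received roughly 60 reads per locus, and a channel yielding about 4 million informative reads would support about four samples at the suggested depth.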
* * *
[0202] Having thus described in detail preferred embodiments of the present invention, it is to be understood that the invention defined by the above paragraphs is not to be limited to particular details set forth in the above description as many apparent variations thereof are possible without departing from the spirit or scope of the present invention.

Claims

WHAT IS CLAIMED IS:
1. An improved method for determining the presence and optionally the abundance of a target nucleotide sequence by performing a multiplex ligase detection/polymerase chain reaction on a biological sample comprising a plurality of target nucleotide sequences with a probe pair, wherein each probe of said probe pair comprises a primer-binding site and a partial gene-barcode sequence, such that said partial gene-barcode sequence of said each probe hybridizes with at least a portion of said target nucleotide sequence, wherein said partial gene-barcode sequences of said each probe of said probe pair ligate together to form a gene-barcode nucleic acid, said improvement comprising:
a) providing;
i) a primer sequence comprising a template for at least one restriction enzyme site, wherein said primer sequence hybridizes to a terminal end or said primer-binding site of said gene-barcode nucleic acid; and
ii) an addressing fragment comprising a well-barcode sequence, a blunt terminus and a first overhang terminus;
iii) a restriction enzyme recognizing said at least one restriction enzyme site and capable of creating a second overhang terminus;
b) amplifying said gene-barcode nucleic acid with said primer thereby incorporating said at least one restriction enzyme site into a gene-barcode amplicon;
c) cleaving said gene-barcode amplicon with said restriction enzyme to create a gene-barcode fragment;
d) ligating said addressing fragment with said gene-barcode fragment to create a final barcoded product.
e) sequencing said final barcoded product, wherein at least a partial sequence of said gene-barcode and said well-barcode contained therein are obtained, wherein said target nucleotide sequence is identified, and its presence and optionally its abundance in said biological sample is determined.
2. The method of Claim 1, wherein said ligating is performed by a ligase enzyme selected from the group consisting of a Thermus aquaticus ligase, a Thermus thermophilus ligase, an E. coli ligase, T4 ligase, and a Pyrococcus ligase.
3. The method of Claim 1, wherein said terminal end of said gene-barcode nucleic acid is a 3' terminal end.
4. The method of Claim 1, wherein said addressing fragment is a synthetic nucleic acid.
5. The method of Claim 1, wherein said well-barcode identifies a biological sample source of said target nucleotide sequence.
6. The method of Claim 1, wherein said gene-barcode nucleic acid comprises a complementary sequence to said target nucleotide sequence.
7. The method of Claim 1, where said final barcoded product comprises a well-barcode and a gene-barcode.
8. The method of Claim 7, wherein said well-barcode and said gene barcode are adjacent.
9. The method of Claim 8, wherein said adjacent well-barcode and gene-barcode are separated by at least three nucleotides.
10. The method of Claim 1, wherein said restriction enzyme is an endonuclease.
11. The method of Claim 10, wherein said endonuclease is selected from the group consisting of a Dralll endonuclease and a Hgal endonuclease.
12. The method of Claim 1, wherein said gene-barcode fragment comprises an overhang terminus.
13. The method of Claim 1, wherein said primer sequences have imperfect complementarity with at least one of said primer-binding sites in said gene-barcode nucleic acid.
14. The method of Claim 1, wherein said ligating joins said first overhang terminus and said second overhang terminus.
15. An improved method for identifying one or more target nucleic acid molecules within a plurality of target nucleic acid molecules from a reaction mixture that comprises a ligase, one or more target nucleic acid molecules, and one or more oligonucleotide probe sets, each of said probe sets including:
i) a first oligonucleotide comprising: (a) a first target-specific portion capable of hybridizing to a corresponding target nucleic acid molecule, and (b) a first primer-specific portion; and ii) a second oligonucleotide comprising: (a) a second target-specific portion capable of hybridizing to said corresponding target nucleic acid molecule, and (b) a second primer-specific portion;
by producing one or more ligation products comprising said first and second oligonucleotides after said first and said second target-specific portions of said oligonucleotides are hybridized to said corresponding target nucleic acid molecule and are ligated together, wherein each of said one or more ligation products comprises a ligated sequence which includes:
iii) said first target-specific portion of said first oligonucleotide, and said first primer-specific portion of said first oligonucleotide in a corresponding probe set and iv) said second target-specific portion of the second oligonucleotide, and said second primer-specific portion of said second oligonucleotide in said corresponding probe set;
and subjecting said one or more ligation products to one or more polymerase chain reaction cycles to produce one or more amplified ligation products, said improvement comprising:
a) incorporating a restriction enzyme site into the 5' end or the 3' end of said amplified ligation products, to create a modified ligation product;
b) contacting said modified ligation products with a restriction enzyme to produce a digested ligation product with at least one preferably cohesive end, and
c) appending an addressing fragment to said preferably cohesive end of said digested ligation product to produce a final barcoded product, and
d) sequencing said final barcoded product, wherein at least a partial sequence of said target-specific portion is obtained, and at least a partial sequence of said addressing fragment is obtained, wherein said target nucleic acid molecule is identified.
16. The method of Claim 15, wherein said first oligonucleotide primer-specific portion comprises a template for a first restriction enzyme site.
17. The method of Claim 15, wherein said second oligonucleotide primer-specific portion comprises a template for a second restriction enzyme site.
18. The method of Claim 15, wherein said primer-specific portions of said first and second oligonucleotide probes sets are universal primer binding sites.
19. The method of Claim 15, wherein said final barcoded product identifies one locus on said target nucleic acid molecule.
20. The method of Claim 15, wherein each probe of said each probe set is provided for each target locus identified in said target nucleic acid molecules.
21. The method of Claim 15, wherein the 5' end of said second-target specific portion is phosphorylated.
22. The method of Claim 15, wherein each probe of said probe set forms a ligation product when the 3' end of said first target-specific portion and the 5' end of said second target-specific portion hybridize to adjacent nucleotides of said target nucleic acid molecule.
23. The method of Claim 15, wherein said incorporating is performed by polymerase chain reaction with a primer set comprising a first primer with a nucleotide sequence that is substantially identical to said 5' primer-specific portion sequence and a second primer with a nucleotide sequence that is substantially complementary to the 3' primer-specific portion sequence of said ligation products.
24. The method of Claim 23, wherein said first and second primers are universal primers.
25. The method of Claim 15, wherein said incorporating is performed by polymerase chain reaction with a primer set comprising a first primer with a nucleotide sequence that comprises one or more mismatches from said 5' primer-specific portion sequence, and/or a second primer with a nucleotide sequence that comprises one or more mismatches from the complement of the 3' primer-specific portion sequence of said ligation products.
26. The method of Claim 25, wherein said first and/or said second primers comprise a template for at least one restriction enzyme recognition site.
27. The method of Claim 26, where said first and said second primers are universal primers.
28. The method of Claim 15, wherein said appending is performed by a ligase enzyme.
29. The method of Claim 15, wherein said restriction enzyme site is located within said 3' primer-specific portion or said 5' primer-specific portion of said modified ligation product.
30. The method of Claim 15, wherein said restriction enzyme sites comprise a cleavage site that abuts said target-specific portion of said modified ligation products.
31. The method of Claim 15, wherein said restriction enzyme site is flanked by 3 or more nucleotides.
32. The method of Claim 15, wherein said restriction enzyme generates an overhang.
33. The method of Claim 31, wherein said overhang is at least three nucleotides.
34. The method of Claim 31, wherein said overhang is a 3' overhang.
35. The method of Claim 31, wherein said overhang is a 5' overhang.
36. The method of Claim 31, wherein said restriction enzyme generates an asymmetric end.
37. The method of Claim 31, wherein said addressing fragment comprises a well-barcode sequence element.
38. The method of Claim 36, wherein said well-barcode sequence element identifies the biological source of said target nucleotide sequence.
39. The method of Claim 15, wherein said final barcoded product includes at least one gene-barcode element that is less than 20 nucleotides.
40. The method of Claim 15, wherein said final barcoded product includes a single well-barcode.
41. The method of Claim 39, wherein said well-barcode element is located at the 3' end of said final barcoded product.
42. The method of Claim 39, wherein said well-barcode element is located at the 5' end of said final barcoded product.
43. The method of Claim 15, wherein said addressing fragment is a synthetic nucleic acid.
44. The method of Claim 15, wherein said addressing fragment has an asymmetric end.
45. The method of Claim 15, further comprising the step of counting said final barcoded products comprising identical gene-barcodes and identical well-barcodes to determine locus-specific abundance in a biological sample.
46. The method of Claim 15, further comprising the step of combining said final barcoded products having different well-barcodes before step (d).
47. The method of Claim 45, further comprising the step of counting said final barcoded products comprising identical gene-barcodes and identical well-barcodes to determine locus-specific abundance in a plurality of different biological samples.
48. The method of Claim 15, wherein said sequencing is performed on a single-molecule sequencing instrument.
49. The method of Claim 15, wherein said sequencing is massively-parallel.
50. The method of Claim 15, wherein said gene-barcode segment identifies a specific locus of said target nucleic acid molecule.
PCT/US2013/046522 2012-06-21 2013-06-19 Massively-parallel multiplex locus-specific nucleic acid sequence analysis WO2013192292A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261662578P 2012-06-21 2012-06-21
US61/662,578 2012-06-21

Publications (1)

Publication Number Publication Date
WO2013192292A1 true WO2013192292A1 (en) 2013-12-27

Family

ID=48790579

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/046522 WO2013192292A1 (en) 2012-06-21 2013-06-19 Massively-parallel multiplex locus-specific nucleic acid sequence analysis

Country Status (1)

Country Link
WO (1) WO2013192292A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016018986A1 (en) * 2014-08-01 2016-02-04 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
CN105803055A (en) * 2014-12-31 2016-07-27 天昊生物医药科技(苏州)有限公司 New target gene regional enrichment method based on multiple circulation extension connection
US9567639B2 (en) 2010-08-06 2017-02-14 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US9797001B2 (en) 2013-04-17 2017-10-24 Pioneer Hi-Bred International, Inc. Methods for characterizing a target DNA sequence composition in a plant genome
WO2018127408A1 (en) * 2017-01-05 2018-07-12 Tervisetehnoloogiate Arenduskeskus As Quantifying dna sequences
US10131951B2 (en) 2010-08-06 2018-11-20 Ariosa Diagnostics, Inc. Assay systems for genetic analysis
US10308981B2 (en) 2010-08-06 2019-06-04 Ariosa Diagnostics, Inc. Assay systems for determination of source contribution in a sample
WO2019165318A1 (en) * 2018-02-22 2019-08-29 10X Genomics, Inc. Ligation mediated analysis of nucleic acids
US10533223B2 (en) 2010-08-06 2020-01-14 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US10718019B2 (en) 2011-01-25 2020-07-21 Ariosa Diagnostics, Inc. Risk calculation for evaluation of fetal aneuploidy
US11203786B2 (en) 2010-08-06 2021-12-21 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US11639928B2 (en) 2018-02-22 2023-05-02 10X Genomics, Inc. Methods and systems for characterizing analytes from individual cells or cell populations
WO2024006392A1 (en) * 2022-06-29 2024-01-04 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins
US11952626B2 (en) 2021-02-23 2024-04-09 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4965188A (en) 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US6027889A (en) 1996-05-29 2000-02-22 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using coupled ligase detection and polymerase chain reactions
WO2003033722A2 (en) * 2001-10-15 2003-04-24 Mount Sinai School Of Medicine Of New York University Nucleic acid amplification methods
WO2005068656A1 (en) * 2004-01-12 2005-07-28 Solexa Limited Nucleic acid characterisation
WO2007100243A1 (en) * 2006-03-01 2007-09-07 Keygene N.V. High throughput sequence-based detection of snps using ligation assays
US7582420B2 (en) 2001-07-12 2009-09-01 Illumina, Inc. Multiplex nucleic acid reactions
WO2013009175A1 (en) * 2011-07-08 2013-01-17 Keygene N.V. Sequence based genotyping based on oligonucleotide ligation assays

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4683202B1 (en) 1985-03-28 1990-11-27 Cetus Corp
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4683195B1 (en) 1986-01-30 1990-11-27 Cetus Corp
US4965188A (en) 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US6027889A (en) 1996-05-29 2000-02-22 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using coupled ligase detection and polymerase chain reactions
US7429453B2 (en) 1996-05-29 2008-09-30 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using coupled ligase detection and polymerase chain reactions
US7582420B2 (en) 2001-07-12 2009-09-01 Illumina, Inc. Multiplex nucleic acid reactions
WO2003033722A2 (en) * 2001-10-15 2003-04-24 Mount Sinai School Of Medicine Of New York University Nucleic acid amplification methods
WO2005068656A1 (en) * 2004-01-12 2005-07-28 Solexa Limited Nucleic acid characterisation
WO2007100243A1 (en) * 2006-03-01 2007-09-07 Keygene N.V. High throughput sequence-based detection of snps using ligation assays
WO2013009175A1 (en) * 2011-07-08 2013-01-17 Keygene N.V. Sequence based genotyping based on oligonucleotide ligation assays

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BARANY ET AL., GENE, vol. 109, 1991, pages 1 - 11
BLANCK ET AL.: "Activity of Restriction Enzymes in a PCR Mix", BIOCHEMICA, vol. 3, 1997, pages 25
CROWE ET AL.: "Improved Cloning Efficiency of Polymerase Chain Reaction (PCR) Products after Proteinase K Digestion", NUCLEIC ACIDS RESEARCH, vol. 19, 1991, pages 184
HIGUCHI; OCHMAN: "Production of single-stranded DNA template by exonuclease digestion following polymerase chain reaction", NUCLEIC ACIDS RESEARCH, vol. 17, 1989, pages 5865
PECK DAVID ET AL: "A method for high-throughput gene expression signature analysis", GENOME BIOLOGY, BIOMED CENTRAL LTD., LONDON, GB, vol. 7, no. 7, 19 July 2006 (2006-07-19), pages R61, XP021021264, ISSN: 1465-6906, DOI: 10.1186/GB-2006-7-7-R61 *
PECK ET AL.: "A method for high-throughput gene expression signatures analysis", GENOME BIOLOGY, vol. 7, 2006, pages R61
TURBETT; SELLNER: "Digestion of PCR and RT-PCR Products with Restriction Endonucleases Without Prior Purification or Precipitation", PROMEGA NOTES MAGAZINE, vol. 60, 1996, pages 23 - 26

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10533223B2 (en) 2010-08-06 2020-01-14 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US11203786B2 (en) 2010-08-06 2021-12-21 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US9567639B2 (en) 2010-08-06 2017-02-14 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
US10131951B2 (en) 2010-08-06 2018-11-20 Ariosa Diagnostics, Inc. Assay systems for genetic analysis
US10308981B2 (en) 2010-08-06 2019-06-04 Ariosa Diagnostics, Inc. Assay systems for determination of source contribution in a sample
US10718024B2 (en) 2011-01-25 2020-07-21 Ariosa Diagnostics, Inc. Risk calculation for evaluation of fetal aneuploidy
US10718019B2 (en) 2011-01-25 2020-07-21 Ariosa Diagnostics, Inc. Risk calculation for evaluation of fetal aneuploidy
US10941436B2 (en) 2013-04-17 2021-03-09 Pioneer Hi-Bred International, Inc. Methods for characterizing DNA sequence composition in a genome
US10487352B2 (en) 2013-04-17 2019-11-26 Pioneer Hi-Bred International, Inc. Methods for characterizing DNA sequence composition in a genome
US9797001B2 (en) 2013-04-17 2017-10-24 Pioneer Hi-Bred International, Inc. Methods for characterizing a target DNA sequence composition in a plant genome
US11702685B2 (en) 2013-04-17 2023-07-18 Pioneer Hi-Bred International, Inc. Methods for characterizing DNA sequence composition in a genome
WO2016018986A1 (en) * 2014-08-01 2016-02-04 Ariosa Diagnostics, Inc. Detection of target nucleic acids using hybridization
CN105803055A (en) * 2014-12-31 2016-07-27 天昊生物医药科技(苏州)有限公司 New target gene regional enrichment method based on multiple circulation extension connection
WO2018127408A1 (en) * 2017-01-05 2018-07-12 Tervisetehnoloogiate Arenduskeskus As Quantifying dna sequences
WO2019165318A1 (en) * 2018-02-22 2019-08-29 10X Genomics, Inc. Ligation mediated analysis of nucleic acids
US11639928B2 (en) 2018-02-22 2023-05-02 10X Genomics, Inc. Methods and systems for characterizing analytes from individual cells or cell populations
US11852628B2 (en) 2018-02-22 2023-12-26 10X Genomics, Inc. Methods and systems for characterizing analytes from individual cells or cell populations
US11952626B2 (en) 2021-02-23 2024-04-09 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins
WO2024006392A1 (en) * 2022-06-29 2024-01-04 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins

Similar Documents

Publication Publication Date Title
WO2013192292A1 (en) Massively-parallel multiplex locus-specific nucleic acid sequence analysis
JP5986572B2 (en) Direct capture, amplification, and sequencing of target DNA using immobilized primers
US9745614B2 (en) Reduced representation bisulfite sequencing with diversity adaptors
JP6525473B2 (en) Compositions and methods for identifying replicate sequencing reads
DK1991698T3 (en) High-throughput sequence-based detection of SNPs using ligation assays
EP3538662B1 (en) Methods of producing amplified double stranded deoxyribonucleic acids and compositions and kits for use therein
EP3077540B1 (en) Nucleic acid probe for detecting genomic fragments
KR102592367B1 (en) Systems and methods for clonal replication and amplification of nucleic acid molecules for genomic and therapeutic applications
KR20110106922A (en) Single-cell nucleic acid analysis
US20220389408A1 (en) Methods and compositions for phased sequencing
WO2018108328A1 (en) Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
EP3749782B1 (en) Generation of single-stranded circular dna templates for single molecule sequencing
JP2002517981A (en) Methods for detecting nucleic acid sequences
CN117242190A (en) Amplification of Single-stranded DNA
US11174511B2 (en) Methods and compositions for selecting and amplifying DNA targets in a single reaction mixture
WO2004053159A2 (en) Oligonucleotide guided analysis of gene expression
CA3142010A1 (en) Flexible and high-throughput sequencing of targeted genomic regions
US20220136042A1 (en) Improved nucleic acid target enrichment and related methods
JP2024035110A (en) Sensitive method for accurate parallel quantification of mutant nucleic acids
WO2023012195A1 (en) Method
WO2013140339A1 (en) Positive control for pcr

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13737011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13737011

Country of ref document: EP

Kind code of ref document: A1