WO2009052214A2 - Sequence analysis using decorated nucleic acids - Google Patents

Sequence analysis using decorated nucleic acids

Info

Publication number
WO2009052214A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acids
probes
decorated
nucleic acid
sequence
Prior art date
Application number
PCT/US2008/080045
Other languages
English (en)
Other versions
WO2009052214A3 (fr
Inventor
Radoje Drmanac
Snezana Drmanac
Original Assignee
Complete Genomics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Complete Genomics, Inc.
Publication of WO2009052214A2
Publication of WO2009052214A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6816: Hybridisation assays characterised by the detection means
    • C12Q1/6825: Nucleic acid detection involving sensors
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • the present invention provides a sequence interrogation chemistry that combines the accuracy and haplotype integrity of long-read sequencing with improved methods of preparing genomic nucleic acids and analyzing sequence information generated from those nucleic acids.
  • the present invention provides a composition that comprises a substrate comprising a plurality of locations. Each location of the substrate comprises a single molecule of stretched decorated nucleic acids. Each of the stretched nucleic acids comprises a plurality of probes, and the stretched decorated nucleic acids are positioned on the substrate in such a way that they are optically resolvable.
  • stretched decorated nucleic acids of the invention are formed by: (i) nicking a nucleic acid to form a nicked nucleic acid, (ii) adding an exonuclease to the nicked nucleic acid to form a gapped nucleic acid, and (iii) adding a first set of labeled probes to the gapped nucleic acid such that at least one of the first set of labeled probes hybridizes to single stranded areas of said gapped nucleic acid.
  • the first set of probes comprises a plurality of non-overlapping probe sequences.
  • each probe sequence comprises a unique label.
  • steps (i) through (iii) are performed simultaneously or are performed sequentially.
  • stretched decorated nucleic acids of the invention are formed by (i) providing a double stranded nucleic acid; (ii) adding a first set of recA invasive labeled probes to the double stranded nucleic acid to form D-loops within the double stranded nucleic acid, thus forming a decorated nucleic acid; and (iii) stretching the decorated nucleic acid to form a stretched decorated nucleic acid.
  • the recA invasive labeled probes comprise a plurality of non-overlapping probe sequences, and each probe sequence comprises a unique label. Such probes hybridize to sequences in the double stranded nucleic acid that are complementary to the probe sequences.
  • the present invention provides methods for detecting the presence of a target nucleic acid in a sample.
  • a substrate comprising stretched decorated nucleic acids of the invention is provided.
  • the stretched decorated nucleic acids of the invention will generally comprise a plurality of labeled probes. The order of the labeled probes on the stretched decorated nucleic acid is determined, and that order thereby indicates the presence of the target nucleic acid.
  • stretched decorated nucleic acids of the invention are used to obtain sequence information from a target nucleic acid.
  • a substrate comprising stretched decorated nucleic acids of the invention is provided.
  • the stretched decorated nucleic acids of the invention will generally comprise a plurality of labeled probes.
  • the order of the labeled probes on the stretched decorated nucleic acid is determined, and that order thereby provides sequence information for the target nucleic acid.
  • Figure 1 is a schematic illustration of an exemplary embodiment of the invention for decorating nucleic acids.
  • Genomic nucleic acid is fragmented and optionally amplified to produce the fragments illustrated in Figure 1A. These fragments are nicked with a nicking enzyme in Figure 1B, and gaps are then formed at one or more of those nicks through treatment with an exonuclease to produce the gapped nucleic acids pictured in Figure 1C.
  • Labeled probes, either a single probe or different probes, are hybridized to the gapped nucleic acids to form the decorated nucleic acids in Figure 1D.
  • the decorated nucleic acids may then optionally be repaired by filling in the gaps by treatment with a polymerase and nucleotides and/or optionally by treatment with ligase to produce the decorated nucleic acids in Figure 1E.
  • Figure 2 is a schematic illustration of an exemplary embodiment of the invention in which decorated nucleic acids (Figure 2A) are stretched on a substrate in Figure 2B.
  • the substrate can comprise nanochannels (Figure 2C) or a patterned substrate comprising linear features (Figure 2D).
  • the order of labels can be detected (Figure 2E) and then assembled to generate a whole or partial sequence (Figure 2F) of a target nucleic acid (such as a chromosome).
  • Figure 3 is a schematic illustration of an exemplary embodiment for forming decorated nucleic acids and stretching them on a substrate.
  • Genomic nucleic acid is fragmented and optionally amplified to form double stranded fragments (Figure 3A).
  • Invasive probes, either single invasive probes (Figure 3B) or invasive probe sets (Figure 3C), are applied to form D-loops (Figure 3B) or double D-loops (Figure 3C).
  • the decorated nucleic acids are stretched on substrates comprising nanochannels (Figure 3E) or on a patterned substrate comprising linear features (Figure 3F).
  • the invasive probes within the D-loops or double D-loops are extended using a polymerase to further stabilize the structure of the decorated nucleic acid and prevent the D-loop or double D-loop from destabilizing the probes and causing them to detach from the nucleic acids.
  • Figures 4A-E are schematic illustrations of different configurations in which two labeled probes can hybridize to the same gap.
  • Figures 5A-G are illustrations of different exemplary embodiments of junction structures that can be used for labeling probes in accordance with the present invention.
  • Figure 6 is an illustration of a treblor label structure that can be used to label probes in accordance with the present invention.
  • Figure 7 is an illustration of a dendrimeric non-nucleic acid label that can be used in probes in accordance with the present invention.
  • Figure 8 is an illustration of a label comprising multiple dyes conjugated to the 3' or 5' phosphate or to the nucleobase. It will be appreciated that this is an exemplary embodiment and the label may comprise a subset of these labels in any combination.
  • Figure 9 is a schematic illustration of an exemplary embodiment of the invention.
  • Genomic DNA is isolated from a drop of blood (Figure 9A). This DNA is fragmented and then decorated with labeled probes (Figure 9B). The decorated nucleic acids are then applied to a substrate, such as a nanochip (Figure 9C), which stretches the DNA.
  • the sequence of the different fragments can then be mapped based on detection of the order, and optionally the relative distance, of the probes along the fragments.
  • the sequences of the individual fragments are aligned against a reference sequence ("RefSeq", Figure 9D).
  • Figure 10 is a schematic illustration of converting consensus signatures into partial chromosome sequences.
  • the order of probes detected along a fragment (Figure 10A) provides a map of the sequences represented by those probes. This can be accomplished for multiple probe sets (Figures 10A and 10B).
  • the partial sequences obtained from each of the repetitions can be assembled to provide the partial chromosome sequence.
  • Figure 11 is a schematic illustration of assembling fragments to construct a map of chromosome molecules.
  • Figure 11A shows the multiple chromosomes that can be fragmented to form the fragments in Figure 11B.
  • These fragments are labeled with probes that bind to specific 6-mers (Figure 11C) to form decorated fragments.
  • The order of the probes along each fragment provides a signature for each fragment (Figure 11C).
  • Figure 12 is a schematic illustration of assembling consensus signatures for each haplotype chromosome. Three different chromosomes are illustrated.
  • Figure 13 is a schematic illustration of a substrate of use in the invention.
  • the exemplary substrate illustrated comprises a non-patterned region that leads into linear features.
  • Figure 14 is a schematic illustration of a substrate of use in the invention that comprises a nanopore.
  • a nucleic acid molecule can be non-linear (i.e., non-stretched) on either side of the nanopore, but the movement through the nanopore serves to stretch the nucleic acid.
  • Figure 15 is a schematic illustration of molecular beacons, which can be added to gapped nucleic acids to form decorated nucleic acids. Such molecular beacons are quenched before hybridization/attachment to the nucleic acids, and the conformational change of attaching to the nucleic acids results in a detectable signal.
  • the practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art.
  • Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used.
  • Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols.
  • the present invention is directed to compositions and methods for single molecule nucleic acid identification and detection, which finds use in a wide variety of applications as described herein.
  • the invention can be described as follows. Genomic nucleic acid, generally double stranded DNA, is obtained from cells, generally from roughly 10 to 100 cells.
  • the genomic nucleic acid is fractionated into appropriate sizes using standard techniques such as physical or enzymatic fractionation. Amplification can optionally be done as needed, although in general, the efficiency and redundancy of the present invention allows the methods to be done without amplification.
  • the genomic nucleic acid fragments are then "decorated" with labeled probes in one of two ways.
  • the fragments are nicked using nicking enzymes, and "gapped", using exonucleases, to produce single stranded gaps spaced along both strands of the double stranded target, as is generally shown in Figures 1 and 2.
  • Labeled probes are then added under conditions such that, if a label probe is perfectly complementary to the single stranded nucleic acid of the gap, the label probe hybridizes to the single strand. This forms a "decorated" target nucleic acid.
  • polymerase and dNTPs can be added to "fill in the gaps", and further optionally ligase can be added to ligate one or both ends of the probes to the rest of the target sequence.
  • a decorated nucleic acid can still contain gaps and/or nicks, depending on the configuration of the system used as well as stretching conditions; for example, vigorous stretching conditions may favor the use of decorated nucleic acids with a minimum of gaps and/or nicks, while other stretching techniques may not require the use of any filling or ligation.
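The perfect-complementarity condition for hybridizing a label probe to a single stranded gap can be sketched in a few lines of Python. This is only an illustrative model of the matching rule, not of the chemistry: the function names `reverse_complement` and `probe_sites_in_gap`, the example gap sequence, and the 6-base probes are all assumptions made for this example.

```python
# Hypothetical sketch: which positions in a single stranded gap would a probe
# match under a strict "perfect complement only" rule?
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sites_in_gap(gap: str, probe: str) -> list[int]:
    """Return 0-based start positions in `gap` (5'->3') where `probe` (5'->3')
    is perfectly complementary. A single mismatch yields no site."""
    target = reverse_complement(probe)
    k = len(target)
    return [i for i in range(len(gap) - k + 1) if gap[i:i + k] == target]


if __name__ == "__main__":
    gap = "TTACGGATCCAGT"          # assumed single stranded gap sequence
    print(probe_sites_in_gap(gap, "ATCCGT"))  # perfectly complementary probe
    print(probe_sites_in_gap(gap, "ATCCGA"))  # one-base mismatch
```

Under these assumptions the first probe finds one site in the gap and the mismatched probe finds none, which mirrors the requirement that only perfectly complementary label probes decorate the target.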
  • Another embodiment used to create decorated nucleic acids does not rely on the use of nicking and gapping techniques to decorate the nucleic acids.
  • invasive probes are used, e.g. recA probes are used.
  • RecA is a protein that binds to single stranded probes and then will hybridize to a corresponding sequence in a target double stranded sequence, forming a "bubble" or a "D-loop", as it is frequently referred to.
  • the recA probes of the invention are also labeled as outlined herein.
  • pairs of recA probes can be made, such that the probes hybridize to each strand, forming "double D-loops", again as generally depicted in Figure 3C. That is, a "Watson" recA labeled probe and a "Crick" recA labeled probe are used, preferably labeled with the same label, to deliver two labels to the target.
  • the incorporation of the recA probes along the target sequence also forms a "decorated" nucleic acid.
  • the label probes can take on a wide variety of configurations.
  • the label probes are single stranded and contain a distinguishing label for each label probe sequence; this can be done for both the nicking and recA embodiments.
  • the label probes contain more than one label, such as depicted in Figure 5, which shows a hybridization complex of several sequences that contain more than one label without significant quenching.
  • two label probes are used, which can be optionally ligated together, and may contain FRET pairs, which allows longer sequence calls.
  • the label probes may be labeled at either terminus or at an internal position, which can determine whether the label probes can be ligated at one or both ends during the decoration process.
  • the decorated nucleic acid is then "stretched".
  • this refers to the process of adding a decorated nucleic acid to a substrate in such a manner that the decorated nucleic acid is substantially linear and optically separated from other single decorated nucleic acids; that is, the order (and in some cases, the spacing) of the labels can be determined.
  • the stretched nucleic acid need not be perfectly linear; it just needs to be "straight" enough that a non-ambiguous order of labels can be determined. For example, having the decorated nucleic acid stretched in a straight or serpentine channel is sufficient.
  • After stretching, a detector is used to determine the order of the labels within the decorated nucleic acid, and, in some cases, some determination of distance between the labels, depending on the system.
  • the information is then used to create a map, or "sequence signature", as is generally depicted in Figure 9D.
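As a rough illustration of how an ordered, spaced series of labels forms a "sequence signature", the sketch below scans a fragment for the target sites of a small probe set and reports label order and spacing. It is a simplified model under stated assumptions: only one strand is scanned, distances are counted in bases rather than measured optically, and the probe sites, colors, fragment sequence, and function names are all hypothetical.

```python
# Hypothetical sketch: derive a label-order "signature" from a fragment sequence.
def sequence_signature(fragment: str, probe_colors: dict[str, str]) -> list[tuple[int, str]]:
    """Return (position, color) for every probe target site, sorted by position."""
    hits = []
    for site, color in probe_colors.items():
        start = fragment.find(site)
        while start != -1:
            hits.append((start, color))
            start = fragment.find(site, start + 1)
    return sorted(hits)


def label_order(signature: list[tuple[int, str]]) -> list[str]:
    """Order of labels along the fragment."""
    return [color for _, color in signature]


def label_spacing(signature: list[tuple[int, str]]) -> list[int]:
    """Distance in bases between consecutive labels."""
    return [b[0] - a[0] for a, b in zip(signature, signature[1:])]


if __name__ == "__main__":
    # Assumed probe target sites and label colors.
    probes = {"ACGGAT": "red", "TTGCCA": "green"}
    # Assumed fragment: site, filler, site, filler, site.
    fragment = "ACGGAT" "CCCC" "TTGCCA" "AAAAA" "ACGGAT"
    sig = sequence_signature(fragment, probes)
    print(label_order(sig), label_spacing(sig))
```

In a real system the positions would come from the detector rather than from a known sequence; the sketch only shows how order and spacing together make up the signature.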
  • This sequence signature can be compared against a reference sequence, or database of sequences, to determine any number of things, including for example the identity of the genome (e.g. pathogens). That is, each fragment will have a readout of colors in a particular order, with each color corresponding to a particular sequence.
  • fragment 1 may be red-red-yellow-green-green-blue-yellow-yellow-yellow-red
  • fragment 2 may be yellow-green-green-blue-red-yellow-blue-red-red-yellow-green
  • fragment 3 may be yellow-yellow-red-green-blue-blue-blue-yellow-red-green. Lining these up using the overlaps gives an order of fragment 2-fragment 1-fragment 3.
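The three example fragments can be lined up programmatically. The following is a minimal sketch that assumes a suffix-to-prefix overlap of at least three labels is enough to join two fragments; that threshold, the one-letter color codes, and the function names are assumptions made for illustration, and the brute-force search over permutations is only practical for a handful of fragments.

```python
# Hypothetical sketch: order color-signature fragments by suffix/prefix overlap.
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (0 if no overlap of at least `min_len` labels exists)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def order_by_overlap(signatures: dict[int, str], min_len: int = 3):
    """Try every ordering and keep the one whose neighbours all overlap,
    preferring the largest total overlap. Returns None if no ordering works."""
    best_order, best_score = None, 0
    for order in permutations(signatures):
        scores = [overlap(signatures[x], signatures[y], min_len)
                  for x, y in zip(order, order[1:])]
        if all(scores) and sum(scores) > best_score:
            best_order, best_score = order, sum(scores)
    return best_order


def merge(signatures: dict[int, str], order, min_len: int = 3) -> str:
    """Join the fragments in `order`, writing each overlapping region once."""
    consensus = signatures[order[0]]
    for prev, nxt in zip(order, order[1:]):
        k = overlap(signatures[prev], signatures[nxt], min_len)
        consensus += signatures[nxt][k:]
    return consensus


if __name__ == "__main__":
    # R = red, Y = yellow, G = green, B = blue (the fragments from the text).
    fragments = {1: "RRYGGBYYYR", 2: "YGGBRYBRRYG", 3: "YYRGBBBYRG"}
    order = order_by_overlap(fragments)
    print(order, merge(fragments, order))
```

With the fragments above, the only ordering in which every neighbouring pair overlaps is fragment 2, fragment 1, fragment 3, matching the order stated in the text.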
  • because this ordering uses the sequences of the probes, it generates a "sequence signature" that can be compared to reference sequences to identify the nucleic acid, or to confirm its identity.
  • this information can be used as well.
  • the match of sequence to a reference that contains some differences can also be used to identify changes in the target genome, e.g. a single nucleotide polymorphism (SNP) within a probe sequence (e.g. fragment 1 in this particular sample is missing the second red, although the rest of the signature is correct, indicating a change within that sequence of that particular target)
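The comparison of an observed signature against a reference can be sketched with Python's standard `difflib` module. This is an illustrative assumption about how such a comparison might be automated, not a method stated in the source; the one-letter color codes and the function name `signature_differences` are hypothetical.

```python
# Hypothetical sketch: report where an observed label signature departs
# from a reference signature.
from difflib import SequenceMatcher


def signature_differences(reference: str, observed: str) -> list[tuple[str, str, str]]:
    """Return (operation, reference_labels, observed_labels) for every
    non-matching region between the two signatures."""
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    return [
        (tag, reference[i1:i2], observed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    reference = "RRYGGBYYYR"   # fragment 1 as expected from the reference
    observed = "RYGGBYYYR"     # same fragment with one red label missing
    print(signature_differences(reference, observed))
```

Note that a label-level alignment cannot tell which of two adjacent identical labels is absent; it only reports that one red label is missing at that locus. Resolving which site changed would need the spacing information or a second probe set.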
  • compositions comprising substrates with stretched decorated nucleic acids.
  • stretched decorated nucleic acids of the invention can be used for a variety of purposes, including sequence analysis, analysis of genetic variation, detection of pathogens and detection of markers for disease.
  • the present invention provides compositions and methods utilizing stretched, decorated nucleic acids to identify and/or detect target nucleic acids in samples.
  • the sample solution may comprise any number of things, including, but not limited to, bodily fluids (including, but not limited to, blood, urine, serum, lymph, saliva, anal and vaginal secretions, perspiration and semen, of virtually any organism, with mammalian samples being preferred and human samples being particularly preferred); environmental samples (including, but not limited to, air, agricultural, water and soil samples); biological warfare agent samples; research samples (i.e. the sample may be the products of an amplification reaction, including both target and signal amplification as is generally described in PCT/US99/01705, such as a PCR amplification reaction); purified samples, such as purified genomic DNA, RNA, proteins, etc.; and raw samples (bacteria, virus, genomic DNA, etc.). As will be appreciated by those in the art, virtually any experimental manipulation may have been done on the sample.
  • stretched decorated nucleic acids are formed from genomic DNA.
  • genomic DNA is isolated from a target organism.
  • by "target organism" is meant an organism of interest; as will be appreciated, this term encompasses any organism from which nucleic acids can be obtained. Methods of obtaining nucleic acids from target organisms are well known in the art. Samples comprising genomic DNA of humans find particular use in many embodiments.
  • target nucleic acid is used to generate stretched decorated nucleic acids of the invention.
  • target nucleic acid refers to a nucleic acid of interest.
  • target nucleic acids of the invention are genomic nucleic acids.
  • Target nucleic acids include naturally occurring or genetically altered or synthetically prepared nucleic acids (such as genomic DNA from a mammalian disease model).
  • Target nucleic acids can be obtained from virtually any source and can be prepared using methods known in the art.
  • target nucleic acids can be directly isolated without amplification, or isolated by amplification using methods known in the art, including without limitation polymerase chain reaction (PCR), multiple displacement amplification (MDA), rolling circle amplification (RCA), rolling circle replication (RCR) and other amplification methodologies.
  • Target nucleic acids may also be obtained through cloning, including cloning into vehicles such as plasmids, yeast, and bacterial artificial chromosomes.
  • "nucleic acid" or "oligonucleotide" or "polynucleotide" or grammatical equivalents herein means at least two nucleotides covalently linked together.
  • a nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases, as outlined below (for example in the construction of primers and probes such as label probes), nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (Beaucage et al., Tetrahedron 49(10):1925 (1993) and references therein; Letsinger, J. Org. Chem. 35:3800 (1970); Sblul et al., Eur. J.
  • Other analog nucleic acids include those with bicyclic structures including locked nucleic acids (also referred to herein as "LNA"), Koshkin et al., J. Am. Chem. Soc. 120:13252-3 (1998); positive backbones (Dempcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995); non-ionic backbones (U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowski et al., Angew. Chem. Intl. Ed. English 30:423 (1991); Letsinger et al., J. Am. Chem.
  • nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see Jenkins et al., Chem. Soc. Rev. (1995) pp. 169-176).
  • nucleic acid analogs are described in Rawls, C & E News Jun. 2, 1997 page 35.
  • LNAs are a class of nucleic acid analogues in which the ribose ring is "locked" by a methylene bridge connecting the 2'-O atom with the 4'-C atom. All of these references are hereby expressly incorporated by reference. These modifications of the ribose-phosphate backbone may be done to increase the stability and half-life of such molecules in physiological environments. For example, PNA:DNA and LNA:DNA hybrids can exhibit higher stability and thus may be used in some embodiments.
  • the target nucleic acids may be single stranded or double stranded, as specified, or contain portions of both double stranded and single stranded sequence.
  • the nucleic acids may be DNA (including genomic and cDNA), RNA (including mRNA and rRNA) or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc.
  • many embodiments utilize substantially double stranded genomic DNA as the target nucleic acids.
  • nucleic acid includes oligonucleotides and polynucleotides.
  • Decorated nucleic acids of the invention can in some embodiments be at least 10 kb in length.
  • decorated nucleic acids of the invention are at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 750, or 1000 kb in length.
  • the decorated nucleic acids can be shorter depending on the length of the gene, e.g. 1-10 kb in length.
  • compositions of the invention comprise decorated target nucleic acids, generally stretched on a substrate.
  • by "substrate" or "solid support" or other grammatical equivalents herein is meant any material that is modified to allow "stretching" of nucleic acid molecules as described herein.
  • the substrate contains discrete individual sites (for example, nanochannels or lines) appropriate for the attachment or association of decorated nucleic acid molecules to form stretched nucleic acids and is amenable to at least one detection method. As will be appreciated by those in the art, the number of possible substrates is very large.
  • Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, and a variety of other polymers.
  • Substrates of the invention can be configured to have any convenient geometry or combination of structural features.
  • the substrates can be either rigid or flexible and can be either optically transparent or optically opaque, or have combinations of these surfaces.
  • the substrates can also be electrical insulators, conductors or semiconductors. Further the substrates can be substantially impermeable to liquids, vapors and/or gases or, alternatively, the substrates can be substantially permeable to one or more of these classes of materials.
  • the substrates fall into two different classes: substrates comprising particular geometries such as nanochannels or nanopores, as more fully discussed below, or those that have surface characteristics to allow the stretching of decorated nucleic acids, such as the use of linear patterns of surface chemistries.
  • substrates of the invention comprise nanostructures.
  • Such structures can include without limitation nanopillars, nanopores and nanochannels.
  • substrates of the invention comprise nanochannels.
  • Such substrates are known in the art.
  • U.S. Patent Nos. 7,217,562; 6,685,841; 6,518,189; 6,440,662; 6,214,246 describe nanostructures, including nanochannels, of use in accordance with the present invention.
  • These patents are hereby incorporated by reference in their entirety for all purposes and in particular for their teachings regarding the figures, legends and accompanying text describing the compositions, methods of using the compositions and methods of making the compositions.
  • in nanochannel substrates, there is a reservoir into which the decorated nucleic acids are placed. The decorated nucleic acids are then moved into nanochannels, a single molecule of decorated nucleic acid per nanochannel, to form the stretched nucleic acids, followed by detection of the order, and optionally the distance between the labels, of the incorporated probes.
  • the substrates comprise nanochannels that are generally from about 10 nanometers to about 50 nanometers in diameter.
  • Nanopore devices can provide single-molecule detection of molecules driven electrophoretically in solution through a nano-scale pore, and the sequence of nucleotides can be detected by the sequence of signals generated as each nucleotide passes through the pore.
  • Such nanopores and methods of sequencing using nanopores are known in the art and discussed in, for example, Branton et al., (2008), Nature Biotechnology, 26(10):1146-53 and in U.S. Patent Nos. 6,673,615; 7,258,838; 7,238,485; 7,189,503; 6,627,067; 6,464,842; 6,267,872 and U.S. Patent Application Nos.
  • substrates comprising essentially linear patterns of surface characteristics can be used in the compositions of the stretched nucleic acids of the invention.
  • substrates of the invention will generally comprise discrete individual sites for attachment or association of stretched decorated nucleic acid molecules of the invention. Such sites may be a pattern, i.e. a regular design or configuration, or randomly distributed.
  • Pattern in this sense includes a repeating unit cell, preferably one that allows a high density of nucleic acid molecules on the substrate.
  • the surface of the substrate is modified to allow attachment of the nucleic acid molecules at individual sites, whether or not those sites are contiguous or non-contiguous with other sites, although in many embodiments the sites are optically resolvable.
  • the surface of the substrate may be modified such that discrete sites are formed that can only have a single associated nucleic acid.
  • Such substrates have a number of advantages, including ease of preparation and the ability to produce a complete array of longer stretched nucleic acid molecules.
  • An exemplary embodiment of such substrates is illustrated in Figure 13 (which are also referred to herein as "flowthrough substrates" and "flowthrough systems").
  • the substrate 1301 comprises a non-patterned region 1302 that leads into a region patterned with linear features (1304-1307), which are interspersed with non-binding regions such as 1308.
  • Nucleic acid molecules can be flowed across the non-patterned region 1302 into the linear features 1304-1307.
  • the flow across the open region 1302 will generally result in allowing the nucleic acid molecules to "untangle" and stretch as they are moved along with the flow toward the linear features.
  • 1303 is an interface region between the non-patterned region 1302 and the linear features. This interface region may comprise patterns that conform to or lead into the linear features or the interface region may be non-patterned.
  • the linear features of such substrates can be long and very narrow.
  • such linear features are about 10 nm to about 1 µm in width.
  • such linear features are about 10 to about 100 nm in width.
  • such linear features are about 20 to about 750, about 30 to about 500, about 40 to about 400, about 50 to about 300, about 60 to about 200, and about 70 to about 100 nm in width.
  • the linear features are about 10 µm to about 5 mm in length.
  • the linear features are about 100 µm to about 1 mm in length.
  • the linear features are about 20 to about 900, about 30 to about 800, about 40 to about 700, about 50 to about 600, about 60 to about 500, about 70 to about 400, about 80 to about 300, about 90 to about 200, and about 100 to about 150 µm in length.
  • the linear features are separated by regions of the substrate surface that do not bind and/or repel nucleic acids. In an exemplary embodiment, the linear features are separated by about 30-3000 nm.
  • the linear features are separated by about 40 to about 2500, by about 50 to about 2000, by about 60 to about 1500, by about 70 to about 1000, by about 80 to about 900, by about 90 to about 800, by about 100 to about 700, by about 200 to about 600, by about 300 to about 500, and by about 400 to about 450 nm.
  • the distance between the stripes is great enough to avoid significant cross-binding, that is, one target nucleic acid binding to more than one stripe.
  • the distance between the features can also vary depending on the length of the sequence. For example, in the sequencing of "shorter" targets, e.g. cDNAs, the features can be closer together.
  • the linear features comprise positive charges that are able to attract and bind nucleic acid molecules.
  • these areas are separated by negative charges and/or hydrophobic chemistries. This can be accomplished using a wide variety of known techniques as is more fully discussed below.
  • Each linear feature may comprise one or multiple consecutive stretched decorated nucleic acid molecules.
  • a substrate of the invention may comprise over 100, 1000, 10,000, 100,000 or 1,000,000 linear features. All or a portion of such linear features may further comprise one or more stretched decorated nucleic acid molecules.
  • the nature of the patterned linear features will depend on the material of the substrate and the desired characteristics.
  • patterns on the substrate are generated using surface chemistries that result in a pattern of hydrophilic surface area (e.g. a line or stripe), optionally surrounded or separated by hydrophobic areas.
  • Alternate embodiments utilize electrostatic forces, for example linear stripes of positively charged surface chemistries, again optionally surrounded or separated by either negatively charged or neutral surface chemistries.
  • the open non-patterned region of the substrate will generally not bind or otherwise attract nucleic acids, and in some embodiments, this non-patterned region will be negatively charged and/or hydrophobic.
  • the length of this non-patterned region can be adjusted to accommodate nucleic acid type (single stranded or double stranded, RNA, DNA, etc.), nucleic acid length, and loading conditions.
  • substrates such as those illustrated in Figure 13 are enclosed in a flow cell.
  • such substrates are enclosed in a chamber or covered by a material (such as, in one non-limiting example, a glass coverslip) to create a space of about 10 nm to about 1 µm in the "z" dimension, allowing solutions containing decorated nucleic acids to flow across the non-patterned region to the linear features.
  • This movement of the decorated nucleic acids may be due to any number of forces, including gravity, an electric field, or some combination thereof.
  • the nucleic acids will become stretched and oriented in the direction of the linear features.
  • As the nucleic acids flow across and are attracted and/or bind to the linear features, they may be further stretched.
  • one end of the nucleic acid molecules may be designed to move faster across the surface or attach/adsorb to the linear feature before the other end.
  • the nucleic acid molecule, as it moves along with the flow of the solution, will be further stretched until the majority of its length is attached or otherwise associated with some or all of the remaining length of the linear feature.
  • the height ("z direction") of the non-patterned region is different than that of the region comprising the linear features.
  • Such differential heights can be designed to help direct the nucleic acid molecules across the non-patterned area of the substrate toward the area comprising the linear features.
  • Loading of nucleic acids onto these substrates can be modulated and/or controlled by the flow and/or electrical forces, including diffusion forces and surface forces exerted by areas of differential charge and/or hydrophobicity.
  • the number of nucleic acids applied to the substrate (i.e., with a loading buffer or other solution) can be adjusted to assure maximal occupancy of the linear features with non-overlapping nucleic acid molecules and thus minimize the number of empty linear features on the substrate.
  • at least 50% of the linear features of a substrate are occupied by at least one nucleic acid molecule.
  • at least 60%, 70%, 80%, 90%, and 95% of the linear features are occupied by one or more nucleic acids.
  • a nucleic acid occupying a linear feature will exclude the entrance of a second nucleic acid to that same linear feature, for example by the repelling force of the negative charge of the first nucleic acid molecule alone, or in combination with the attractive force of positive charges contained in nearby empty linear features.
  • This exclusion may be further controlled and/or modulated by the rate of flow of the nucleic acid molecule-containing solution through and/or across the substrate, by the dimensions of the linear features and the non-patterned region (particularly the height of these regions), the buffer composition, the width of the linear features, as well as other parameters apparent to one of skill in the art.
  • nucleic acids not adsorbed, attached or otherwise associated with a linear feature will be washed away or otherwise removed from the substrate before the substrate comprising the nucleic acids is used in applications, such as sequencing applications, as discussed further herein.
  • nucleic acid molecules may be continuously flowed through and/or over and/or across linear features, allowing for continuous detection and analysis of each decorated nucleic acid molecule as it travels along the linear features.
  • photolithography, electron beam lithography, nano imprint lithography, and nano printing may be used to generate such patterns on a wide variety of surfaces, e.g. Pirrung et al, U.S. patent 5,143,854; Fodor et al, U.S. patent 5,774,305; Guo, (2004) Journal of Physics D: Applied Physics, 37: R123-141; which are incorporated herein by reference for all purposes, and in particular for the figures, legends and accompanying text describing the compositions, methods of using the compositions and methods of making the compositions.
  • substrates of the invention comprise silicon dioxide wafers. Such silicon dioxide wafers may in one embodiment be patterned in accordance with methods described above and known in the art.
  • substrates comprise a plurality of locations. Such locations may be patterned on the surface using methods described above.
  • each of the locations on the substrate comprises a single stretched decorated nucleic acid.
  • the stretched decorated nucleic acids are positioned on the substrate such that they are optically resolvable.
  • Each of the plurality of locations may comprise a nanochannel or a surface comprising hydrophobic regions alternating with hydrophilic regions, as discussed further above.
  • substrates of the invention comprise a plurality of locations, but these locations do not comprise capture probes; that is, the substrate does not contain attached nucleic acids used to capture targets, as is well known in the art.
  • the substrates may be part of a cartridge system (sometimes referred to as a "biochip") that can include a variety of different additional components for functionality, including pumps, valves, reagents, additional chambers (in addition to the detection chamber), etc.
  • the present invention encompasses compositions comprising substrates with stretched decorated nucleic acids.
  • By "stretched" is meant linearized such that the order and optionally the relative distance of probes along the decorated nucleic acids can be detected.
  • stretched nucleic acids may be of any configuration such that order and optionally relative distance of probes attached to or associated with the nucleic acids can be detected, i.e., stretched nucleic acids of the invention may be in a linear configuration, but may also comprise serpentine or other curved or extended configurations.
  • the stretched nucleic acid may not be linear prior to entering the nanopore; in these embodiments, "stretched" means that the nucleic acid enters the nanopore in a linear way such that the order and optionally the relative distance of probes along the decorated nucleic acids can be detected, even though the nucleic acid may be substantially non-linear either before entering the pore or after, or both, as is generally depicted in Figure 14.
  • nucleic acids of the invention are substantially double stranded.
  • By "substantially double stranded" herein is meant that the majority of the nucleic acid is double stranded but contains one or more single stranded regions.
  • about 51% to about 99% of a "substantially double stranded" nucleic acid of the invention is double stranded.
  • about 55% to about 90%, about 60% to about 85%, about 65% to about 80%, and about 70% to about 75% of a substantially double stranded nucleic acid of the invention is double stranded.
  • gap filling and ligation of nicks may be utilized, while in other embodiments, these steps are not required to allow stretching.
  • these reactions may be terminated prior to saturation; that is, some gaps and/or nicks can be repaired but not all. It should be noted that this may also be a function of the amount or time of enzymatic exposure of the original target sequence to these enzymes; for example, high concentrations and/or long exposure times to the exonuclease can result in bigger gaps.
  • a stretched nucleic acid according to the invention comprises no gaps and/or nicks.
  • a stretched nucleic acid of the invention may comprise one or more gaps, one or more nicks, or a combination of nicks and gaps.
  • stretched nucleic acids including stretched nucleic acids comprising one or more gaps and/or nicks, particularly gaps and/or nicks that are located in close proximity on opposite strands
  • crosslinking such as, in one non-limiting example, by applying psoralen
  • stabilizing proteins bound to or cross-linked to the nucleic acids such as, in one non-limiting example, double stranded DNA stabilizing proteins
  • nucleotide analogs such as LNA or PNA, which are usually, (but are not required to be
  • the target nucleic acid when invasive probes are used, may not contain any gaps and/or nicks.
  • stretched nucleic acids of the invention are concatemers.
  • By "concatemer" is meant a nucleic acid that contains multiple copies (e.g. "monomers") of a target nucleic acid or a fragment of a target nucleic acid.
  • Such concatemers may be of particular use in analyzing shorter nucleic acids, such as cDNA or shorter fragments of genomic DNA (for example, fragments of ~10-30 kb in length).
  • concatemers are generated from these shorter nucleic acids, and each of those concatemers will include multiple copies of those shorter nucleic acids. Decorating such concatemers and then detecting the probes (using methods discussed in further detail below) can result in an increased signal to noise ratio and minimize false negatives.
  • Phi29 can be used to generate 100-300 kb long concatemers comprising, for example, 9 copies of a 30 kb length of nucleic acid or 30-90 copies of a 3 kb length of nucleic acid (3 kb is a typical length of a cDNA molecule).
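  • As a rough illustration of the arithmetic in the example above, the length of a concatemer is simply the number of copies multiplied by the monomer length. The following is a minimal sketch only; the function names are hypothetical and are not part of the described method.

```python
def concatemer_length_kb(copies: int, monomer_kb: float) -> float:
    """Total concatemer length (kb) for a given number of monomer copies."""
    if copies < 0 or monomer_kb <= 0:
        raise ValueError("copies must be >= 0 and monomer_kb must be > 0")
    return copies * monomer_kb


def copies_in_concatemer(concatemer_kb: float, monomer_kb: float) -> int:
    """Number of whole monomer copies that fit in a concatemer of given length."""
    if monomer_kb <= 0:
        raise ValueError("monomer_kb must be > 0")
    return int(concatemer_kb // monomer_kb)


if __name__ == "__main__":
    # The three cases quoted in the text: (monomer length in kb, copies).
    for monomer_kb, copies in [(30, 9), (3, 30), (3, 90)]:
        total = concatemer_length_kb(copies, monomer_kb)
        print(f"{copies} copies x {monomer_kb} kb = {total} kb")
```

  • By this arithmetic, 9 copies of a 30 kb monomer and 90 copies of a 3 kb monomer each give 270 kb, while 30 copies of a 3 kb monomer give 90 kb, so the quoted 100-300 kb range is best read as approximate.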
  • Methods of generating concatemers in accordance with the present invention are described in U.S.
  • the present invention encompasses substrates with stretched decorated nucleic acids.
  • By "decorated nucleic acid" herein is meant a nucleic acid which has at least one labeled probe incorporated into its structure.
  • By "incorporated into its structure" is meant that the labeled probe is associated with the nucleic acid, for example through hybridization to complementary regions on either or both of the nucleic acids of a double stranded target nucleic acid, through ligation, through a combination of ligation and hybridization, or through other methods known in the art for attaching labeled probes to nucleic acids, including post-hybridization cross-linking.
  • By "probe" is meant a nucleic acid, usually single stranded nucleic acid, that comprises one or more detectable label(s) as further outlined herein.
  • labels may be attached to such oligonucleotide probes at one or both ends and/or to nucleotides within the body of the oligonucleotide probe. Probes of use in the invention are discussed further below.
  • decorated nucleic acids of the invention may be fully double stranded or may contain one or more gaps and/or nicks. As discussed further below, decorated nucleic acids are "decorated" with probes.
  • Probes of use in the invention comprise any nucleic acid or set of nucleic acids associated with a detectable label that can be attached to target nucleic acids to form decorated nucleic acids of the invention.
  • probes of the invention are nucleic acids comprising sequences that will hybridize to some portion, i.e. a domain, of a target nucleic acid.
  • Probes of the present invention are designed to be complementary, and in general, perfectly complementary, to a sequence of the target sequence such that hybridization of a portion of the target sequence and probes of the present invention occurs.
  • the probes are perfectly complementary to the target sequence to which they hybridize; that is, the experiments are run under conditions that favor the formation of perfect basepairing, as is known in the art.
  • a probe that is perfectly complementary to a first domain of the target sequence could be only substantially complementary to a second domain of the same target sequence; that is, the present invention relies in many cases on the use of sets of probes, for example, sets of hexamers, that will be perfectly complementary to some target sequences and not to others.
  • the complementarity between the probe and the target need not be perfect; there may be any number of base pair mismatches, which will interfere with hybridization between the target sequence and the single stranded nucleic acids of the present invention. However, if the number of mismatches is so great that no hybridization can occur under even the least stringent of hybridization conditions, the sequence is not a complementary target sequence.
  • By "substantially complementary" herein is meant that the probes are sufficiently complementary to the target sequences to hybridize under normal reaction conditions. However, for most applications, the conditions are set to favor probe hybridization only if perfect complementarity exists.
  • probes capable of forming a Hoogsteen bond with the target nucleic acid are used. Such probes form a triplex with the target nucleic acid.
  • a probe that binds by Hoogsteen binding enters the major groove of a target nucleic acid and hybridizes with the bases located there.
  • probes used in accordance with the present invention can form both Watson-Crick and Hoogsteen bonds with the target nucleic acid.
  • Bis PNA probes, for instance, are capable of both Watson-Crick and Hoogsteen binding to a target nucleic acid molecule.
  • the probes of use in the invention are generally single stranded, but they are not so limited.
  • when the probe is a bis PNA, it can adopt a secondary structure with the target nucleic acid resulting in a triple helix conformation, with one region of the bis PNA clamp forming Hoogsteen bonds with the backbone of the target molecule and another region of the bis PNA clamp forming Watson-Crick bonds with the nucleotide bases of the target molecule.
  • Probes of use in the invention can be of any size.
  • probes are generally on the order of 100 bases or fewer in length.
  • probes of the invention are about 3 to about 20 bases in length, with probes of 5, 6, 7, 8, 9 and 10 bases finding particular use.
  • probes of the invention are about 5 to about 90, about 10 to about 80, about 15 to about 70, about 20 to about 60, about 25 to about 50, and about 30 to about 40 bases in length.
  • probes used in accordance with the invention are 6 bases long (also referred to herein as “6-mers” and/or “hexamers”).
  • the optimal length of a probe of use in the invention will depend in part on the length of the target nucleic acid to be analyzed. Longer probes can provide more sequence information but will have fewer points of hybridization along a given nucleic acid, resulting in less "coverage" of that nucleic acid. Longer probes may be useful for pathogen detection or other diagnostics.
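  • The trade-off between probe length and coverage can be illustrated numerically. The following sketch assumes, purely for illustration, a single-stranded target of random sequence with equal base frequencies, so that one specific probe of k bases is expected to find a perfect match at roughly (L - k + 1) / 4^k positions along a target of L bases. The function name is hypothetical, and real targets are not random, so the numbers indicate a trend only.

```python
def expected_sites(target_len: int, probe_len: int) -> float:
    """Expected number of perfect-match sites for one specific probe.

    Assumes a random target with equal base frequencies (an illustrative
    assumption, not a property of real genomic sequence).
    """
    if probe_len < 1:
        raise ValueError("probe_len must be >= 1")
    # Number of start positions at which a probe of this length can sit.
    positions = max(target_len - probe_len + 1, 0)
    # Probability that a given position matches one specific k-mer.
    return positions / 4 ** probe_len


if __name__ == "__main__":
    target_len = 100_000  # e.g. a 100 kb target
    for k in (5, 6, 8, 10):
        print(f"{k}-mer: about {expected_sites(target_len, k):.2f} sites")
```

  • Under these assumptions each additional base in the probe reduces the expected number of hybridization points about four-fold, which is the sense in which longer probes give less "coverage".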
  • some embodiments utilize pairs of probes, e.g.
  • Probes for use in the present invention may have both informative and non-informative (such as degenerative or universal) bases.
  • a probe set can be designed to include 3 degenerative positions and three informational positions, e.g. D-D-D-I-I-I.
  • the degenerative positions contain all four possible bases and the informational positions contain only 1 base. That is, in this example, the set comprises 64 different probes that will hybridize to any target sequence that has any base at the first three positions and the complement of the bases at the informational positions in that order.
  • each probe of this set will be labeled with the same label, such that the complement of the informational trimer can be detected.
  • the degenerative positions and the informational positions can be in any order (e.g. D-I-D-I-I-D, etc.).
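  • The probe set described above can be enumerated directly. The following is a minimal sketch, in which the function name and the "D"/"I" pattern string are illustrative conventions rather than anything defined by the invention: each "D" position is expanded to all four bases and each "I" position takes the next fixed informational base.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def expand_probe_set(pattern: str, informational: str) -> List[str]:
    """Expand a pattern of D (degenerate) and I (informational) positions.

    `informational` supplies the fixed bases for the I positions, in order.
    """
    if set(pattern) - {"D", "I"}:
        raise ValueError("pattern may contain only 'D' and 'I'")
    if pattern.count("I") != len(informational):
        raise ValueError("need one informational base per 'I' position")
    probes = []
    # One probe per combination of bases at the degenerate positions.
    for combo in product(BASES, repeat=pattern.count("D")):
        degenerate = iter(combo)
        fixed = iter(informational)
        probes.append(
            "".join(next(degenerate) if p == "D" else next(fixed) for p in pattern)
        )
    return probes


if __name__ == "__main__":
    probe_set = expand_probe_set("DDDIII", "ACG")
    print(len(probe_set), probe_set[:4])
```

  • With three degenerate positions the set contains 4 x 4 x 4 = 64 probes regardless of where the degenerate positions fall in the pattern; all 64 would carry the same label, so that the label reports the informational trimer.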
  • universal bases which hybridize to more than one base can be used.
  • inosine can be used. Any combination of these systems can be utilized.
  • Probes of the invention may be provided in one or more sets, in which probes in different sets comprise different probe sequences, and each probe sequence (unless degeneracy is utilized) within a set comprises a unique label.
  • labels of use in the invention include without limitation isotopic labels, which may be radioactive or heavy isotopes, magnetic labels, electrical labels, thermal labels, colored and luminescent dyes, enzymes and magnetic particles as well.
  • Dyes of use in the invention may be chromophores, phosphors or fluorescent dyes, which due to their strong signals provide a good signal-to-noise ratio for decoding.
  • Suitable dyes for use in the invention include, but are not limited to, fluorescent lanthanide complexes, including those of Europium and Terbium, fluorescein, rhodamine, tetramethylrhodamine, eosin, erythrosin, coumarin, methyl-coumarins, pyrene, Malachite green, stilbene, Lucifer Yellow, Cascade Blue™, Texas Red, and others described in the 6th Edition of the Molecular Probes Handbook by Richard P. Haugland, hereby expressly incorporated by reference in its entirety for all purposes and in particular for its teachings regarding labels of use in accordance with the present invention.
  • fluorescent nucleotide analogues readily incorporated into the labeling oligonucleotides include, for example, Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, Cy5-dUTP (Amersham Biosciences, Piscataway, New Jersey, USA), fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP, Texas Red®-5-dUTP, Cascade Blue®-7-dUTP, BODIPY® FL-14-dUTP, BODIPY®R-14-dUTP, BODIPY® TR-14-dUTP, Rhodamine Green™-5-dUTP, Oregon Green® 488-5-dUTP, Texas Red®-12-dUTP, BODIPY® 630/650-14-dUTP, BODIPY® 650/665-14-dUTP, Alexa Fluor® 488-5-dUTP, Alexa Fluor® 532-5-dUTP, Alexa Fluor® 5
  • fluorophores available for post-synthetic attachment include, inter alia, Alexa Fluor® 350, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY TMR, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, Cascade Blue, Cascade Yellow, Dansyl, lissamine rhodamine B, Marina Blue, Oregon Green 488, Oregon Green 514, Pacific Blue, rhodamine 6G, rhodamine green, rhodamine red, tetramethylrhodamine,
  • Suitable FRET (fluorescence resonance energy transfer) tandem fluorophores include, but are not limited to, PerCP-Cy5.5, PE-Cy5, PE-Cy5.5, PE-Cy7, PE-Texas Red, and APC-Cy7; also, PE-Alexa dyes (610, 647, 680) and APC-Alexa dyes.
  • the labels are direct or primary labels, e.g. the fluorophore is directly and usually covalently attached to the probe.
  • indirect labels can be used. That is, in these embodiments, a secondary detectable label is used.
  • a secondary label is one that is indirectly detected; for example, a secondary label can bind or react with a primary label for detection, can act on an additional product to generate a primary label (e.g. enzymes), or may allow the separation of the compound comprising the secondary label from unlabeled materials, etc.
  • Secondary labels include, but are not limited to, one of a binding partner pair such as biotin/streptavidin; chemically modifiable moieties; nuclease inhibitors, enzymes such as horseradish peroxidase, alkaline phosphatases, luciferases, etc.
  • the secondary label is a binding partner pair.
  • the label may be a hapten or antigen, which will bind its binding partner.
  • binding partner pairs include, but are not limited to: antigens (such as proteins (including peptides)) and antibodies (including fragments thereof (FAbs, etc.)); proteins and small molecules, including biotin/streptavidin; enzymes and substrates or inhibitors; other protein-protein interacting pairs; receptor-ligands; and carbohydrates and their binding partners.
  • Nucleic acid/nucleic acid binding protein pairs are also useful.
  • Preferred binding partner pairs include, but are not limited to, biotin (or imino-biotin) and streptavidin, digoxigenin and antibodies, and Prolinx™ reagents (see www.prolinxinc.com/ie4/home.hmtl).
  • the binding partner pair comprises biotin or imino-biotin and a fluorescently labeled streptavidin or anti-biotin antibody, with the former generally being preferred.
  • Imino-biotin is particularly preferred as imino-biotin disassociates from streptavidin in pH 4.0 buffer while biotin requires harsh denaturants (e.g. 6 M guanidinium HCl, pH 1.5 or 90% formamide at 95°C).
  • Other pairs of use in the invention include digoxigenin, which may be incorporated as a label and subsequently bound by a detectably labeled anti-digoxigenin antibody (e.g. fluoresceinated anti-digoxigenin).
  • an aminoallyl-dUTP residue may be incorporated into a detection oligonucleotide and subsequently coupled to an N-hydroxy succinimide (NHS) derivatized fluorescent dye, such as those listed supra.
  • any member of a conjugate pair may be incorporated into a detection oligonucleotide provided that a detectably labeled conjugate partner can be bound to permit detection.
  • Additional labels of use in the present invention include nanocrystals, sometimes referred to as Quantum Dots or Q-dots, which are known in the art and described generally for example in Bawendi et al. and C. Kagan et al.; Phys. Rev. Lett. 76, (1996), pages 1517-1520 and in U.S. Patent Nos. 6,544,732 and 7,410,810, which are hereby expressly incorporated by reference in their entirety for all purposes, and in particular for their teachings regarding nanocrystals and/or Q-dots as labels for nucleic acids, including all the discussions regarding the shell, core and polymer components, as well as conjugation strategies.
  • Suitable quantum dots include QDot 605 and QDot 650 compositions.
  • quantum dots externally conjugated with streptavidin can be used as a secondary label with a biotinylated probe. These result in labels with strong brightness; and in the present invention have been shown to give a signal similar to 3 or 4 conventional fluorophores.
  • the dots may aggregate in nanochannels, and in some instances they "blink", and thus may not be preferred in some instances.
  • Molecular antennae (also known as "light harvesting polymers") can also be used to label probes (Gaylord et al., PNAS, 2002, 99(17): 10954-10957).
  • nano-beads are used to label probes of the invention.
  • Such nano-beads are generally beads encapsulating or covalently bound to molecules of fluorescent dye.
  • Such beads are well known in the art.
  • such beads comprise plastic, glass, metal, as well as other materials or combinations of materials.
  • Labels can be attached to nucleic acids to form the labeled probes of the present invention using methods known in the art, and to a variety of locations of the nucleosides.
  • attachment can be at either or both termini of the nucleic acid, or at an internal position, or both.
  • attachment of the label may be done on a ribose of the ribose-phosphate backbone at the 2' or 3' position (the latter for use with terminal labeling), in one embodiment through an amide or amine linkage. Attachment may also be made via a phosphate of the ribose-phosphate backbone, or to the base of a nucleotide.
  • Labels can be attached to one or both ends of a probe or to any one of the nucleotides along the length of a probe.
  • probes of the invention are labeled with dyes through "treblor linkers" such as those illustrated in Figure 6.
  • Treblor linkers such as those illustrated in Figure 6 can have linkers comprising about 3 to about 15 NH2 groups attached to the 5' phosphate of a polynucleotide.
  • the linkers comprise about 4 to about 14, about 5 to about 13, about 6 to about 12, about 7 to about 11, and about 8 to about 10 NH2 groups.
  • multiple dyes are attached to each "arm" of the treblor linker.
  • about 2 to about 10 dyes are attached to each arm.
  • about 3 to about 9, about 4 to about 8, and about 5 to about 7 dyes are attached per arm. The number of dyes that can be attached to each arm will depend at least in part on the number of NH2 groups in the arms.
  • Although the treblor linker is discussed herein with respect to compounds comprising NH2 groups, it will be appreciated by one of skill in the art that a wide variety of moieties can be used in such treblor arms, selected from substituted or unsubstituted alkyl (such as alkane or alkene linkers of from about C20 to about C30), substituted or unsubstituted heteroalkyl, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, substituted or unsubstituted cycloalkyl, and substituted or unsubstituted heterocycloalkyl.
  • the arms of such treblor linkers may include poly(ethylene glycol) (PEG) groups, saturated or unsaturated aliphatic structures comprised of single or connected rings, amino acid linkers, peptide linkers, nucleic acid linkers, PNA, LNA, as well as linkers containing phosphate or phosphonate groups.
  • FRET pairs can also be used as labels in a variety of embodiments.
  • probes of the invention preferably have a large enough signal to noise ratio to allow detection of single molecules.
  • signal amplification is achieved by attaching multiple labels to a probe. These multiple labels improve the signal to noise ratio and minimize false negatives caused by failures in the individual label molecules.
  • probes of the invention comprise multiple fluorophores.
  • probes of the invention comprise at least 2-20, 3-18, 4-16, 5-14, 6-12, and 8-10 fluorophores. The multiple fluorophores on a particular probe can all be the same (i.e., the same color) or can be a combination of two or more different fluorophores.
  • Probes according to the present invention can be designed to reduce the effect of quenching and thereby further improve the brightness of the signal generated by such probes.
  • quenching is a function of distance (1/r^6), and thus the addition of multiple labels to longer probes may result in less quenching, assuming the spacing between the labels is maximized.
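  • The steep distance dependence noted above can be made concrete with a short calculation. The following sketch takes the 1/r^6 scaling at face value and assumes, for illustration only, that labels are spread evenly from one end of a probe to the other; both function names are hypothetical, and actual quenching also depends on the dyes and their environment.

```python
def relative_quenching(spacing: float, reference_spacing: float) -> float:
    """Quenching at `spacing` relative to that at `reference_spacing`.

    Assumes a pure 1/r^6 distance dependence, as stated in the text.
    Both spacings must be in the same units.
    """
    if spacing <= 0 or reference_spacing <= 0:
        raise ValueError("spacings must be positive")
    return (reference_spacing / spacing) ** 6


def max_even_spacing(probe_len_bases: int, n_labels: int) -> float:
    """Largest uniform label spacing (in bases) on a probe, end to end."""
    if n_labels < 2:
        raise ValueError("spacing is only defined for two or more labels")
    if probe_len_bases < n_labels:
        raise ValueError("probe has fewer bases than labels")
    return (probe_len_bases - 1) / (n_labels - 1)


if __name__ == "__main__":
    short = max_even_spacing(10, 4)  # four labels on a 10-base probe
    long_ = max_even_spacing(19, 4)  # four labels on a 19-base probe
    print(short, long_, relative_quenching(long_, short))
```

  • Under this simple scaling, doubling the spacing between two labels reduces the quenching term by a factor of 2^6 = 64, which is why spreading the same number of labels over a longer probe can be worthwhile.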
  • multiple labels are attached to a single probe sequence.
  • dendrimeric probes can be used.
  • By "dendrimeric probe" is meant a probe with a branched structure.
  • Dendrimeric probes fall into three general categories: a dendrimeric probe can be a hybridization complex comprising a number of different nucleic acid sequences that form additional basepairing between them, as is generally depicted in Figure 5, sometimes referred to herein as "junction structures" or "junction nucleic acid".
  • dendrimeric probes utilize branching, non-nucleic acid components such as linkers to result in the addition of multiple dyes as is generally depicted in Figures 6-8.
  • Related probes utilize branched nucleic acids, which can contain a minimal non-nucleic acid linker but are composed mostly of nucleic acids. Dendrimeric probes are generally described in U.S. Pat. No. 5,175,270, which is hereby incorporated by reference in its entirety for all purposes and in particular for its teachings regarding dendrimers.
  • Nucleic acid dendrimeric probes can comprise a wide range of structures. Exemplary structures are illustrated in Figure 5. The illustrated "junction structures" allow multiple labels to be attached to a single stranded nucleic acid probe.
  • a four-way junction structure is illustrated in Figure 5A. Four labels can be incorporated into each "arm" of the four-way junction (the labels are illustrated as stars in Figure 5). Under typical buffer conditions, the four-arm junction structure will undergo rapid conformational changes, which will provide a certain degree of separation between the different labels. When the labels are fluorescent dyes, this separation will minimize potential quenching effects between dyes.
  • a four-way junction structure according to Figure 5A, in which each arm comprises a 12-mer nucleic acid, is about 8 x 8 nm in size.
  • Figure 5B illustrates a junction structure comprising two of the four-way junctions depicted in Figure 5A. Such a structure allows incorporation of 7 labels, and is about 16 x 8 nm in size in embodiments in which each arm is a 12-mer nucleic acid. In further embodiments, multiple four-way junctions can be used in order to incorporate still greater numbers of labels into a probe. Exemplary embodiments of such structures are depicted in Figures 5C and D. It will be appreciated that different numbers and configurations of four-way junctions can be constructed, and that all such structures are encompassed by the present invention.
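  • The approximate dimensions quoted above are consistent with a simple estimate based on helix geometry. The following sketch assumes, for illustration only, that each arm is a fully extended B-form duplex with a rise of about 0.34 nm per base pair, that opposing arms lie along a straight line, and that multiple junctions are laid end to end along one axis; the function name and this layout are hypothetical simplifications.

```python
from typing import Tuple

# Approximate axial rise of B-form DNA per base pair (an assumed constant).
RISE_NM_PER_BP = 0.34


def junction_footprint_nm(arm_bp: int, n_junctions: int = 1) -> Tuple[float, float]:
    """Rough (length, width) in nm of a row of four-way junctions.

    Each junction is treated as a cross of four extended arms, so it spans
    two arm lengths in each direction; junctions are placed side by side
    along the length axis.
    """
    if arm_bp < 1 or n_junctions < 1:
        raise ValueError("arm_bp and n_junctions must be >= 1")
    arm_nm = arm_bp * RISE_NM_PER_BP
    return (2 * arm_nm * n_junctions, 2 * arm_nm)


if __name__ == "__main__":
    for n in (1, 2):
        length, width = junction_footprint_nm(12, n)
        print(f"{n} junction(s): about {length:.1f} x {width:.1f} nm")
```

  • With 12-mer arms this estimate gives roughly 8 x 8 nm for a single junction and roughly 16 x 8 nm for two junctions, in line with the approximate sizes stated for the structures described above.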
  • junction structure nucleic acid probes of the invention may comprise structures other than the four-way junctions depicted in Figures 5A-D. Exemplary embodiments of such structures are illustrated in Figures 5E-F. It will be appreciated that these are only exemplary embodiments, and that any number of nucleic acid structures which can be used with nucleic acid probes are encompassed by the present invention.
  • a multiple branched structure according to the invention comprises about 3 to about 20 arms. In a further embodiment, such a structure comprises about 5 to about 18, about 6 to about 16, about 8 to about 14, about 9 to about 12, and about 10 to about 11 arms.
  • Figure 5F depicts a structure in which asymmetrical helical ends are combined with end-labeling to provide a probe of the invention.
  • Such labels can be incorporated by hybridizing oligonucleotides attached to such labels to the single stranded regions at the 5' and 3' of the asymmetrical helical ends of each arm.
  • These asymmetrical ends may all have the same sequence, or some or all of them may have different sequences.
  • asymmetrical ends are about 6 nucleotides apart to provide a distance between labels and thus minimize quenching between labels in embodiments in which fluorescent dyes are used.
  • Figure 5G depicts a structure in which hairpin oligonucleotides are used to link all helical segments of a junction structure.
  • hairpin oligonucleotides may be a part of the junction structure, or they may be added to create the structures depicted in Figure 5G.
  • Labels can be incorporated into such structures by hybridizing oligonucleotides complementary to the loops created by the hairpin oligonucleotides or to other single stranded nucleic acid regions in the structure.
  • a structure according to Figure 5G is formed from a single stranded nucleic acid molecule that is designed to self-fold into the desired structure through self-complementary sequence regions, assisted with hairpin oligonucleotide loops.
  • structures according to Figure 5G are not limited to the number of arms depicted, and greater numbers of helical segments can be added to allow incorporation of greater numbers of labels.
  • Non-nucleic acid dendrimeric probes may also be used in accordance with the present invention.
  • a number of features are taken into consideration when designing probes for use in the present invention. As discussed above, the ability to support multiple fluorophores is an important feature for the present invention.
  • homogeneity of labeling across different probes in a set can be of importance, to ensure that probes identifying the same sequence provide a similar intensity of signal to minimize false positives or negatives.
  • Another feature of probe design is their kinetics and stability, which can be affected by their size and structure. Larger probes containing multiple fluorescent molecules, for example, will have different hybridization kinetics than smaller probes comprising only a single label. Stability of such probes can also be affected through alterations to their structure through chemical modification, including crosslinking.
  • probes of the invention are generally designed to be complementary to a target sequence.
  • a set of probes is provided in which each of the probes of the set comprises a plurality of non-overlapping probe sequences, and each probe sequence comprises a unique label.
  • the number of unique labels that can be used in a given set of probes is limited only by the number of distinguishable labels available for a particular type. For example, at present, fluorescent dyes are generally available in four different distinguishable colors. As such, a given set of probes will in one aspect of the invention comprise four different unique probe sequences. It will be appreciated that as more distinguishable labels become available, the number of distinguishable probes that can be included in a given set will increase.
  • decorated nucleic acids of the invention are produced by forming single stranded gaps within a double stranded nucleic acid.
  • Labeled probes such as those discussed above and known in the art can be hybridized to those single stranded gaps.
  • any gaps not occupied by a probe are repaired using a polymerase and dNTPs, a ligase, or a combination thereof.
  • only a subset of the gaps not occupied by a probe are repaired, resulting in a decorated nucleic acid that is partially double stranded and partially single stranded.
  • Such partially double stranded decorated nucleic acids will generally retain enough overall structural cohesion to be applied to substrates without breaking, as described below in further detail.
  • invasive probes are used, which result in structures that contain no gaps or nicks. It is also possible to utilize both types of probes, e.g. to have some invasive probes and some gap/nick probes, as well as to use multiple probes per single gap, as is more fully described below. All of these probes can be used in any combination.
  • There are a variety of probe design parameters to be used in the present invention. With regard to picking sequences for the label probes, there are several considerations. Preferred probe sets have sequences that fit the selected frequency window; that is, they are dependent on the spatial resolution of the imaging system used to detect the presence of such probes (methods of detection and systems of use for such detection are described in further detail below). In addition, the probes for any particular set should not hybridize to each other; that is, no probe is the reverse complement, or substantially the reverse complement, of another probe in that set. Similarly, no probe should overlap in sequence with another probe in the set; that is, probes should not be competing for hybridization to the same sequence of the target (or to the complements of the other probes). In general, probes that do not contain many repeating bases are also of particular use. Probes generally should not have many bases in common with other probes (or with the complements of the other probes) when shifted relative to them.
  • the general problem of testing a large number of probes for their suitability in the present invention is an NP-hard problem, with complexity proportional to k(4 m f, where m is probe length and k is the number of probes.
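The pairwise design rules above (no reverse complements within a set, few shared bases under any shift, few repeated bases) can be sketched as a simple screening function. This is a minimal illustration only: the function names and the thresholds `max_run` and `max_shared` are assumptions chosen for the example and do not come from the source, and the frequency-window criterion is not modeled.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Length of the longest stretch of one repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def max_shifted_matches(a, b):
    """Largest number of identical aligned bases over all relative shifts."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, base in enumerate(b)
            if 0 <= i + shift < len(a) and a[i + shift] == base
        )
        best = max(best, matches)
    return best


def check_probe_set(probes, max_run=3, max_shared=4):
    """Return a list of human-readable problems found in a probe set.

    max_run and max_shared are illustrative thresholds (assumptions).
    """
    problems = []
    for p in probes:
        if longest_run(p) > max_run:
            problems.append(f"{p}: run of repeated bases longer than {max_run}")
    for a, b in combinations(probes, 2):
        if a == reverse_complement(b):
            problems.append(f"{a}/{b}: reverse complement pair")
        if max_shifted_matches(a, b) > max_shared:
            problems.append(f"{a}/{b}: too many shared bases when shifted")
        if max_shifted_matches(a, reverse_complement(b)) > max_shared:
            problems.append(f"{a}/{b}: too similar to each other's complement")
    return problems
```

Screening a candidate set this way costs only a number of comparisons quadratic in the set size; it checks a given set and does not search for the best set, which is the hard part referred to above.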
  • there are also a variety of probe geometries that can be used.
  • the probes can be straight linear probes with an attached label, or can include additional sequences or polymers as described above for multiple labels.
  • "molecular beacon" or "hairpin" geometries can be used, as will be appreciated by those in the art, generally depicted in Figure 15, as described in U.S. Application Nos. 08/152,006; 08/439,819, 10/110,907 and in U.S. Patent Nos.
  • the probe sequence for hybridization to the target sequence is found in the single stranded portion of the hairpin, with labels on the complementary section. In the absence of hybridization to the target, the labels are quenched. Upon hybridization, the labels are separated, are no longer quenched, and result in a signal.
  • molecular beacons are generally designed such that the Tm of the probe:target complex is higher than that of the "closed hairpin".
  • including one or more mismatches in the stem of the hairpin may further favor binding to the full-matched target.
  • degenerate positions can be included in the part of the hairpin that does not match the target to extend binding to the target.
  • pairs of probes are used for each gap, with optional ligation.
  • two 6-mers can be used that hybridize to adjacent target sequences in the gaps, and then ligated together. These can contain either the same label, different labels, or a FRET label pair, resulting in more sequence information per gap.
  • a combinatorial ligation method may be used to probe all informative 6-mers with two sets of 64 probes, in which the probes have 3 informative bases.
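The arithmetic behind the combinatorial ligation approach can be checked with a short enumeration: two sets of 64 probes, each with 3 informative bases, yield 64 x 64 = 4096 ligation products, which is every possible 6-mer. The sketch below models only the informative bases; any degenerate or non-informative positions that real probes might carry are ignored, and the function names are illustrative.

```python
from itertools import product

BASES = "ACGT"


def informative_triplets():
    """All 4**3 = 64 combinations of three informative bases."""
    return ["".join(t) for t in product(BASES, repeat=3)]


def ligated_6mers(left_set, right_set):
    """Informative 6-mers formed by ligating one left and one right probe."""
    return {left + right for left in left_set for right in right_set}


left_probes = informative_triplets()
right_probes = informative_triplets()
all_6mers = ligated_6mers(left_probes, right_probes)
print(len(left_probes), len(right_probes), len(all_6mers))
```

The point of the design is economy: 128 probes interrogate the same sequence space that would otherwise need 4096 individually synthesized 6-mers.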
  • As an alternative to the nicking and gapping methods of incorporating labeled probes to form a decorated nucleic acid, invasive probes that rely on the use of a recombinase, such as recA, can be used.
  • By "recombinase" herein is meant a protein that, when included with an exogenous targeting polynucleotide, provides a measurable increase in the recombination frequency and/or localization frequency between the targeting polynucleotide (e.g. the label probes of the invention) and an endogenous predetermined DNA sequence.
  • recombinase refers to a family of RecA-like recombination proteins all having essentially all or most of the same functions, particularly: (i) the recombinase protein's ability to properly bind to and position targeting polynucleotides on their homologous targets and (ii) the ability of recombinase protein/targeting polynucleotide complexes to efficiently find and bind to complementary endogenous sequences.
  • recA803 (see Madiraju et al., PNAS USA 85(18):6592 (1988); Madiraju et al., Biochem. 31:10529 (1992); Lavery et al., J. Biol. Chem. 267:20648 (1992)).
  • recA-like recombinases with strand-transfer activities e.g., Fujisawa et al., (1985) Nucl. Acids Res.
  • RecA may be purified from E. coli strains, such as E. coli strains JC12772 and JC15369 (available from A. J. Clark and M. Madiraju, University of California-Berkeley, or purchased commercially). These strains contain the recA coding sequences on a "runaway" replicating plasmid vector present at a high copy number per cell.
  • the recA803 protein is a high-activity mutant of wild-type recA.
  • recombinase proteins for example, from Drosophila, yeast, plant, human, and non-human mammalian cells, including proteins with biological properties similar to recA (i.e., recA-like recombinases), such as Rad51, Rad57, dmel from mammals and yeast, and Pk-rec (see Rashid et al., Nucleic Acid Res. 25(4):719 (1997), hereby incorporated by reference).
  • the recombinase may actually be a complex of proteins, i.e. a "recombinosome".
  • recA is used.
  • recA protein is typically obtained from bacterial strains that overproduce the protein: wild-type E. coli recA protein and mutant recA803 protein may be purified from such strains.
  • recA protein can also be purchased from, for example, Pharmacia (Piscataway, N.J.) or Boehringer Mannheim (Indianapolis, Ind.).
  • RecA protein and its homologs form a nucleoprotein filament when they coat single-stranded DNA.
  • In this nucleoprotein filament, one monomer of recA protein is bound to about 3 nucleotides. This property of recA to coat single-stranded DNA is essentially sequence independent, although particular sequences favor initial loading of recA onto a polynucleotide (e.g., nucleation sequences).
  • the nucleoprotein filament(s) can be formed on essentially any DNA molecule and can be formed in cells (e.g., mammalian cells), forming complexes with both single-stranded and double-stranded DNA, although the loading conditions for dsDNA are somewhat different than for ssDNA.
  • Embodiments utilizing recA invasive probes can utilize single stranded probes, e.g. one probe per location, which forms single "D-loops", or two, forming "double D-loops".
  • single recA probes are used.
  • As single D-loops can be inherently somewhat unstable, due to the competition of the additional strand of target nucleic acid (e.g. if the probe is "Crick", it is hybridized to target "Watson", and target "Crick" will tend to kick out the probe "Crick"), it can be desirable to either use a second recA strand and/or to extend the invasive probe.
  • polymerase and dNTPs can be added during the invasion process to extend the invasion probe and thus make it more stable; see for example U.S.
  • the targeting polynucleotides form a double stranded hybrid, which may be coated with recombinase, although when the recombinase is recA, the loading conditions may be somewhat different from those used for single stranded nucleic acids.
  • the two complementary single-stranded targeting polynucleotides are usually of equal length, although this is not required.
  • the stability of the four strand hybrids of the invention is putatively related, in part, to the lack of significant unhybridized single-stranded nucleic acid, and thus significant unpaired sequences are not preferred.
  • the complementarity between the two targeting polynucleotides need not be perfect.
  • the two complementary single-stranded targeting polynucleotides are simultaneously or contemporaneously introduced into a target cell harboring a predetermined endogenous target sequence, generally with at least one recombinase protein (e.g., recA).
  • it is preferred that the targeting polynucleotides are incubated with recA or other recombinase prior to introduction to the target nucleic acid, so that the recombinase protein(s) may be "loaded" onto the targeting polynucleotide(s), to coat the nucleic acid, as is described below. Incubation conditions for such recombinase loading are described infra, and also in U.S. Ser. No.
  • a targeting polynucleotide may contain a sequence that enhances the loading process of a recombinase, for example a recA loading sequence is the recombinogenic nucleation sequence poly[d(A-C)], and its complement, poly[d(G-T)].
  • RecA protein coating of targeting polynucleotides is typically carried out as described in U.S. Ser. No. 07/910,791, filed 9 Jul. 1992 and U.S. Ser. No. 07/755,462, filed 4 Sep. 1991, and PCT US98/05223, which are incorporated herein by reference. Briefly, the targeting polynucleotide, whether double-stranded or single-stranded, is denatured by heating in an aqueous solution at 95-100°C for five minutes, then placed in an ice bath for 20 seconds to about one minute followed by centrifugation at 0°C for approximately 20 sec, before use.
  • When denatured targeting polynucleotides are not placed in a freezer at -20°C, they are usually immediately added to standard recA coating reaction buffer containing ATP-gamma-S, at room temperature, and to this is added the recA protein. Alternatively, recA protein may be included with the buffer components and ATP-gamma-S before the polynucleotides are added.
  • RecA coating of targeting polynucleotide(s) is initiated by incubating polynucleotide-recA mixtures at 37°C. for 10-15 min.
  • RecA protein concentration tested during reaction with polynucleotide varies depending upon polynucleotide size and the amount of added polynucleotide, and the ratio of recA molecule:nucleotide preferably ranges between about 3:1 and 1:3.
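As a worked illustration of the recA molecule:nucleotide ratio, the sketch below converts a polynucleotide mass concentration into a molar nucleotide concentration and then into the recA concentration range implied by ratios between 3:1 and 1:3. The average nucleotide mass of 330 g/mol is an assumption (a common approximation for single-stranded DNA), not a value stated in the source, and the function names are hypothetical.

```python
# Assumed average molar mass of one nucleotide in ssDNA (g/mol); an
# approximation introduced for this example, not taken from the source.
AVG_NUCLEOTIDE_MASS = 330.0


def nucleotide_micromolar(ng_per_ul, nt_mass=AVG_NUCLEOTIDE_MASS):
    """Convert a polynucleotide concentration in ng/ul to uM of nucleotides."""
    grams_per_liter = ng_per_ul * 1e-3  # 1 ng/ul equals 1e-3 g/L
    return grams_per_liter / nt_mass * 1e6


def reca_micromolar_range(ng_per_ul, low_ratio=1 / 3, high_ratio=3.0):
    """recA concentration range (uM) for recA:nucleotide ratios 1:3 to 3:1."""
    nt_um = nucleotide_micromolar(ng_per_ul)
    return nt_um * low_ratio, nt_um * high_ratio


low, high = reca_micromolar_range(10.0)
print(f"10 ng/ul polynucleotide: {low:.1f} to {high:.1f} uM recA")
```

Under these assumptions, 10 ng/μl of polynucleotide corresponds to roughly 30 μM nucleotides, so the lower end of the ratio range gives a recA concentration of the same order as the typical coating concentrations quoted in this section.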
  • the mM and μM concentrations of ATP-gamma-S and recA, respectively, can be reduced to one-half those used with double-stranded targeting polynucleotides (i.e., recA and ATP-gamma-S concentration ratios are usually kept constant at a specific concentration of individual polynucleotide strand, depending on whether a single- or double-stranded polynucleotide is used).
  • RecA protein coating of targeting polynucleotides is normally carried out in a standard 1x RecA coating reaction buffer.
  • 10x RecA reaction buffer (i.e., 10x AC buffer) consists of: 100 mM Tris acetate (pH 7.5 at 0 C), 20 mM magnesium acetate, 500 mM sodium acetate, 10 mM DTT, and 50% glycerol. All of the targeting polynucleotides, whether double-stranded or single-stranded, typically are denatured before use by heating to 95-100°C for five minutes, placed on ice for one minute, and subjected to centrifugation (10,000 rpm) at 0°C.
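Because the coating reaction is run in 1x buffer, the 10x AC buffer composition listed above translates into working concentrations by a simple tenfold dilution. The sketch below performs that arithmetic; the dictionary keys and function names are illustrative, and only the component concentrations are taken from the text.

```python
# Component concentrations of the 10x AC buffer as listed in the text.
AC_BUFFER_10X = {
    "Tris acetate (mM)": 100,
    "magnesium acetate (mM)": 20,
    "sodium acetate (mM)": 500,
    "DTT (mM)": 10,
    "glycerol (%)": 50,
}


def dilute(stock, fold):
    """Concentrations after diluting a stock buffer `fold`-fold."""
    return {component: value / fold for component, value in stock.items()}


def stock_volume_ul(final_volume_ul, fold=10):
    """Volume of concentrated stock needed for a given final reaction volume."""
    return final_volume_ul / fold


working_1x = dilute(AC_BUFFER_10X, 10)
for component, value in working_1x.items():
    print(f"{component}: {value:g}")
print("10x stock for a 50 ul reaction:", stock_volume_ul(50), "ul")
```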
  • Denatured targeting polynucleotides usually are added immediately to room temperature RecA coating reaction buffer mixed with ATP-gamma-S and diluted with double-distilled H2O as necessary.
  • a reaction mixture typically contains the following components: (i) 0.2-4.8 mM ATP-gamma-S; and (ii) between 1-100 ng/μl of targeting polynucleotide.
  • To this mixture is rapidly added and mixed about 1-20 μl of recA protein per 10-100 μl of reaction mixture, usually at about 2-10 mg/ml (purchased from Pharmacia or purified).
  • the final reaction volume for RecA coating of targeting polynucleotide is usually in the range of about 10-500 μl. RecA coating of targeting polynucleotide is usually initiated by incubating targeting polynucleotide-RecA mixtures at 37°C. for about 10-15 min.
  • RecA protein concentrations in coating reactions vary depending upon targeting polynucleotide size and the amount of added targeting polynucleotide: recA protein concentrations are typically in the range of 5 to 50 μM.
  • the concentrations of ATP-gamma-S and recA protein may optionally be reduced to about one-half of the concentrations used with double-stranded targeting polynucleotides of the same length: that is, the recA protein and ATP-gamma-S concentration ratios are generally kept constant for a given concentration of individual polynucleotide strands.
  • the coating of targeting polynucleotides with recA protein can be evaluated in a number of ways.
  • protein binding to DNA can be examined using band-shift gel assays (McEntee et al., (1981) J. Biol. Chem. 256:8835).
  • Labeled polynucleotides can be coated with recA protein in the presence of ATP-gamma-S and the products of the coating reactions may be separated by agarose gel electrophoresis.
  • the recA protein effectively coats single-stranded targeting polynucleotides derived from denaturing a duplex DNA.
  • targeting polynucleotide's electrophoretic mobility decreases, i.e., is retarded, due to recA-binding to the targeting polynucleotide.
  • Retardation of the coated polynucleotide's mobility reflects the saturation of targeting polynucleotide with recA protein.
  • An excess of recA monomers to DNA nucleotides is required for efficient recA coating of short targeting polynucleotides (Leahy et al., (1986) J. Biol. Chem. 261:6954).
  • a second method for evaluating protein binding to DNA is the use of nitrocellulose filter binding assays (Leahy et al., (1986) J. Biol. Chem. 261:6954; Woodbury et al., (1983) Biochemistry 22(20):4730-4737).
  • the nitrocellulose filter binding method is particularly useful in determining the dissociation-rates for protein:DNA complexes using labeled DNA.
  • DNA:protein complexes are retained on a filter while free DNA passes through the filter.
  • This assay method is more quantitative for dissociation-rate determinations because the separation of DNA:protein complexes from free targeting polynucleotide is very rapid.
  • the present invention provides methods of making compositions comprising substrates with stretched decorated nucleic acids.
  • stretched decorated nucleic acids of the invention are single or double stranded nucleic acids decorated with a plurality of probes and linearized such that the order and relative distance of those probes can be detected.
  • target nucleic acid is used to generate stretched decorated nucleic acids of the invention.
  • target nucleic acid refers to a nucleic acid of interest.
  • target nucleic acids of the invention are genomic nucleic acids.
  • Target nucleic acids include naturally occurring or genetically altered or synthetically prepared nucleic acids (such as genomic DNA from a mammalian disease model).
  • Target nucleic acids can be obtained from virtually any source and can be prepared using methods known in the art.
  • target nucleic acids can be directly isolated without amplification, or isolated by amplification using methods known in the art, including without limitation polymerase chain reaction (PCR), multiple displacement amplification (MDA), rolling circle amplification (RCA), rolling circle replication (RCR) and other amplification methodologies.
  • Target nucleic acids may also be obtained through cloning, including cloning into vehicles such as plasmids, yeast, and bacterial artificial chromosomes.
  • stretched decorated nucleic acids are formed from genomic DNA.
  • genomic DNA is isolated from a target organism.
  • By "target organism" is meant an organism of interest and, as will be appreciated, this term encompasses any organism from which nucleic acids can be obtained. Methods of obtaining nucleic acids from target organisms are well known in the art.
  • a preliminary step for making stretched decorated nucleic acids of the invention includes fragmenting nucleic acids isolated from a target organism.
  • Methods for fragmenting nucleic acids include, but are not limited to, nonspecific endonuclease digestion, restriction enzyme digestion, physical shearing (e.g., by ultrasound), and treatment with sodium hydroxide.
  • the size of fragments generated can be controlled by the extent (i.e., length of time) of mechanical or enzymatic fragmenting.
  • the size of the desired fragments may depend on the detection system used.
  • fragments for use in nanochannel applications range in size from about 1 to about 500 kb in length, with from about 10 to about 100 kb in length finding particular use in some embodiments.
  • Fragments of specific lengths can be generated using methods known in the art, for example by modifying the time used for mechanical or enzymatic fragmentation, by using restriction endonucleases whose recognition sites appear with a known frequency in certain genomes, and other methods known in the art.
  • Fragments may be separated by size, for example using gel electrophoresis, sizing columns, filters, and other methods known in the art, to obtain desired fragment lengths.
  • nested fragments are generated to produce a sample of fragmented nucleic acids for making decorated nucleic acids of the invention. Such nested fragments can be of particular use in sequencing applications in which sequence information obtained from different fragments is assembled using methods known in the art and described below in further detail.
  • nested fragments are created from starting seed fragments through deletions of defined size. Such nested fragments have exact-end deletions generated in one embodiment by first preparing a partial digest with one or a pool of frequent cutting restriction enzymes that produce 10-200 kb long fragments that start at predefined restriction enzyme recognition sequence sites approximately every 50-200 bases. An aliquot from this first digest is used directly in assays as described further herein.
  • the remainder of the fragments are subjected to deletion of a known number of bases from each end. In one embodiment this is accomplished through a number of consecutive cycles of ligation of an adaptor with a Type IIs restriction enzyme binding site to ends of the fragments and then cleavage with the Type IIs restriction enzyme.
  • the Type IIs restriction enzyme is an exact cutter.
  • By "exact cutter" is meant that the restriction endonuclease cuts at a known distance from the recognition site in all or most of the polynucleotide molecules.
  • Exact cutter endonucleases include without limitation Type IIs restriction endonucleases such as Eco57M I, Mme I, Acu I, Bpm I, BceA I, Bbv I, BciV I, BpuE I, BseM II, BseR I, Bsg I, BsmF I, BtgZ I, Eci I, EcoP15 I, Fok I, Hga I, Hph I, Mbo II, Mnl I, SfaN I, TspDT I, TspDW I, Taq II, and the like, all of which can be used to generate nested fragments according to the present invention.
  • Sequence information obtained from the fragments subjected to different numbers of cycles of deletion can be compared to aid in assembly of the full sequence of the original target nucleic acid.
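The nested-fragment scheme can be pictured as repeatedly trimming a fixed number of bases from both ends of a seed fragment, one trim per ligation and cleavage cycle. The sketch below is a toy model: `bases_per_cycle` stands in for the cut distance set by the chosen exact cutter and adaptor design, which the text does not specify, and the function name is hypothetical.

```python
def nested_fragments(seed, bases_per_cycle, cycles):
    """Return the seed fragment followed by its successive end deletions.

    Each cycle removes `bases_per_cycle` bases from both ends, modeling one
    round of adaptor ligation and exact-cutter cleavage. Trimming stops
    early if the fragment would be consumed entirely.
    """
    fragments = [seed]
    current = seed
    for _ in range(cycles):
        if len(current) <= 2 * bases_per_cycle:
            break  # nothing would remain after another deletion
        current = current[bases_per_cycle:-bases_per_cycle]
        fragments.append(current)
    return fragments


# Toy example with letters standing in for bases.
for fragment in nested_fragments("ABCDEFGHIJ", bases_per_cycle=2, cycles=2):
    print(fragment)
```

Because the end of every fragment is offset from the seed's end by a known multiple of `bases_per_cycle`, measurements on fragments from different cycles can be registered against each other during assembly.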
  • an amplification step can be applied to the population of fragmented nucleic acids to ensure that a large enough concentration of all the fragments is available for subsequent steps of creating the decorated nucleic acids of the invention and using those nucleic acids for obtaining sequence information.
  • amplification methods include without limitation: polymerase chain reaction (PCR), ligation chain reaction (sometimes referred to as oligonucleotide ligase amplification, OLA), cycling probe technology (CPT), strand displacement assay (SDA), transcription mediated amplification (TMA), nucleic acid sequence based amplification (NASBA), rolling circle amplification (RCA) (for circularized fragments), and invasive cleavage technology.
  • a sample of fragmented nucleic acids is enriched for target fragments.
  • By "target fragments" is meant fragments of interest for a particular application; for example, in clinical diagnostics, target fragments would include those fragments derived from a particular part of the genome or that contain particular marker sequences that indicate a disease or the presence of a pathogen. Methods for enriching a population of fragmented nucleic acids for target fragments are well known in the art.
  • capture probes are used to enrich for target fragments. Such capture probes will hybridize to target fragments, after which any fragments not hybridized to the capture probes can be removed from the sample. The captured target fragments can then be released from the capture probes using methods well characterized in the art.
  • an exonuclease can be applied to remove a number of bases of one end of one strand. In an exemplary embodiment, about 2000 to about 5000 bases are removed from one end of one strand. In a further embodiment, about 2100 to about 4500, about 2200 to about 4000, about 2300 to about 3500 and about 2400 to about 3000 bases are removed from one end of one strand.
  • Various enzymes are available for removing bases of one strand including without limitation Lambda exonuclease or Exonuclease III.
  • Capture probes will be able to hybridize to the resultant single stranded region of the fragment, thereby capturing fragments containing regions of interest.
  • multiple probes can be prepared and hybridized per fragment of interest to assure capture.
  • Capture probes can be designed to select for each of the two strands to hybridize targeted regions from both ends. Such consecutive "double capture" may also be performed to eliminate fragments that are of a shorter length than is desired for later uses, for example, in assays detecting specific target sequences.
  • Capture probes can be prepared in large quantities as a pool in an amount sufficient for thousands of preparations. In an exemplary embodiment, a set of 1000-10,000 capture probes are prepared for use in accordance with the present invention.
  • a sample of fragmented nucleic acids is enriched using a biotin-streptavidin system.
  • target fragments of interest are tagged with biotin and captured using streptavidin coated beads. Unbound fragments are washed away and then bound fragments can be released by cleaving the link between the bead and the fragment.
  • a sample of fragmented nucleic acids is enriched using PCR to amplify fragments with selected sequences.
  • PCR can be used to determine that all targeted fragments are present in sufficient quantities. For many applications, including detecting specific target sequences and/or polymorphisms, 10,000 or more copies can ensure proper coverage of the target regions.
  • chromosome flow sorting is used to enrich for sequences prior to fragmenting.
  • the present invention provides methods for forming stretched decorated nucleic acids of the invention.
  • decorated nucleic acids are formed by labeling nucleic acids with detectable probes.
  • probes are incorporated into the structure of a nucleic acid.
  • probes are incorporated into the structure of a nucleic acid every 0.5 - 10 kb.
  • probes are incorporated every 1000-2000, 1500-9000, 2000-8000, 2500-7000, 3000-6000, 3500-5500, and 4000-5000 bases.
  • the frequency at which probes are incorporated along the length of a nucleic acid can be influenced by the size and structure of the probes, including the size and structure of their labels, as well as by reaction conditions, including the temperature at which the reactions incorporating the probes into the nucleic acid (which are described further below) are conducted.
  • probes are incorporated into the structure of a double stranded nucleic acid.
Decoration by generating single stranded regions in a double stranded nucleic acid
  • probes are incorporated into a double stranded nucleic acid in a method in which single stranded regions are created along the length of the double stranded nucleic acid, and probes as described herein hybridize to those single stranded regions.
  • a schematic illustration of such a method is provided in FIG. 1.
  • a nicking enzyme is added to a double stranded nucleic acid to produce a nicked nucleic acid (see FIG. 1B).
  • Nicking enzymes are known in the art, and are generally altered restriction enzymes that hydrolyze only one strand of the duplex, to produce DNA molecules that are "nicked", rather than cleaved.
  • nicking enzymes of use in the present invention include without limitation Nt.CviPII; Nt.BstNBI, a naturally occurring thermostable nicking endonuclease cloned from Bacillus stearothermophilus; Nb.BsrDI and Nb.BtsI, naturally occurring large subunits of thermostable heterodimeric enzymes; Nt.AlwI, a derivative of the restriction enzyme AlwI; Nb.BbvCI; Nt.BbvCI; and Nb.BsmI, a bottom-strand specific variant of BsmI discovered from a library of random mutants, all of which are available from commercial suppliers such as New England Biolabs.
  • nicking enzymes will nick at different frequencies. For example, some nicking enzymes will nick at a frequency of about 1 nick every 100 bases. In a further embodiment, nicking enzymes are used that nick at a frequency of about 1 nick every 90, 80, 70, 60, 50, 40, 30, 20, 10 and 5 bases.
  • a nicking enzyme may nick a nucleic acid in the same location on both strands. Such a situation could result in the nucleic acid breaking in subsequent uses, particularly when stretched according to methods described below.
  • One method for preventing this possibility is to crosslink the nucleic acid to stabilize its overall structure. Such crosslinking methods are known in the art.
  • an exonuclease is applied to widen nicks created by the nicking enzyme to create gapped nucleic acids comprising a series of single stranded gaps along their lengths (FIG. 1C).
  • Exonucleases of use in the present invention are known in the art and include without limitation: RecJ, Lambda Exo, T7 Exo, ExoIII and the like. ExoIII is of use in certain aspects of the present invention, because it will cut 20 bases and release those cut bases. This release of the cut portion of the nucleic acid can be useful in generating gapped nucleic acids in accordance with the present invention.
  • probes of the invention are added to the gapped nucleic acids and hybridize to the single stranded regions created by the application of the nicking and exonuclease enzymes (FIG. 1D).
  • probes comprise probe sequences that are complementary to single stranded regions of the nucleic acids. Hybridization of the probes results in a double stranded nucleic acid "decorated" with at least one probe.
  • Although FIG. 1D depicts only probes labeled at their 3' ends, it will be understood that the present invention encompasses probes labeled at one or both ends or at one of the nucleotides within the body of the probe, as discussed further above.
  • FIG. 1D only depicts an exemplary embodiment for the sake of clarity.
  • a polymerase enzyme and nucleotides are applied to the gapped nucleic acids comprising hybridized probes to "fill in" remaining single stranded regions (represented by the asterisks in FIG. 1E).
  • all of the single stranded regions are repaired in this way. This embodiment is pictured in FIG. 1E.
  • only a subset of the single stranded regions are repaired.
  • Such partially repaired decorated nucleic acids are encompassed by the present invention.
  • Polymerases of use in this aspect of the invention are known in the art and include without limitation: Taq polymerases, E. coli DNA Polymerase I Klenow fragment, reverse transcriptases, Φ29 related polymerases including wild type Φ29 polymerase and derivatives of such polymerases, T7 DNA Polymerase, T5 DNA Polymerase, and RNA polymerases.
  • a ligase is applied to further repair the decorated nucleic acid.
  • the ligase may act on all of the nucleotides needing repair or only on a subset of such nucleotides.
  • Ligases of use in the invention are known in the art and include without limitation DNA ligase I, DNA ligase II, DNA ligase III, DNA ligase IV, E. coli DNA ligase, Taq ligase, T4 DNA ligase, T4 RNA ligase 1, T4 RNA ligase 2, and the like.
  • the application of the nicking enzyme, the exonuclease, the probes and optionally the polymerase and nucleotides and ligase is conducted in a sequential manner, i.e., first the nicking enzyme is applied to create nicks, the exonuclease is applied to widen the nicks into gaps, the probes are then applied to hybridize to the gaps and then the polymerase, nucleotides, and ligase are optionally added to repair remaining single stranded regions.
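The sequential workflow (nick, widen to gaps, hybridize probes) can be sketched as a toy string model of the top strand. Everything here is a simplifying assumption for illustration: the recognition motif, the fixed gap length, nicking immediately 3' of the motif, exact-match hybridization, and one probe per gap are all hypothetical, and overlapping gaps and the optional polymerase and ligase repair step are not modeled.

```python
def find_nicks(top, motif):
    """Positions immediately 3' of each motif occurrence on the top strand."""
    sites = []
    start = top.find(motif)
    while start != -1:
        sites.append(start + len(motif))
        start = top.find(motif, start + 1)
    return sites


def widen_to_gaps(top, nicks, gap_len):
    """Model exonuclease action as removing gap_len bases 3' of each nick."""
    return [(n, min(n + gap_len, len(top))) for n in nicks]


def decorate(top, gaps, probes):
    """Place at most one labeled probe per gap.

    `probes` maps a label to a probe sequence. A probe pairs with the exposed
    bottom strand when its sequence matches the removed top-strand stretch.
    Returns a list of (position, label) tuples.
    """
    placed = []
    for start, end in gaps:
        removed = top[start:end]
        for label, seq in probes.items():
            offset = removed.find(seq)
            if offset != -1:
                placed.append((start + offset, label))
                break
    return placed


top_strand = "TTGAGTCAACGTACCCGAGTCTTTGGAAC"
nicks = find_nicks(top_strand, "GAGTC")  # hypothetical recognition motif
gaps = widen_to_gaps(top_strand, nicks, gap_len=8)
labels = decorate(top_strand, gaps, {"red": "ACGTAC", "green": "TGGAAC"})
print(nicks, gaps, labels)
```

The output of `decorate` is the kind of information a stretched decorated nucleic acid is meant to expose: an ordered list of labels with their positions along the molecule.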
  • the nicking enzyme, the exonuclease, the probes, and optionally the polymerase, nucleotides and ligase are applied to the nucleic acid simultaneously.
  • the combined action of these elements serves to stochastically create gaps and then fill in those gaps with either a probe or nucleotides or a combination of both. Applying all of these elements at once provides a solution to the problem of finding the appropriate point along a long nucleic acid molecule for a relatively shorter probe to hybridize.
  • the combined activity of all of these enzymes thus provides a method for "scanning" along a nucleic acid molecule and stochastically inserting the appropriate probes to generate the decorated nucleic acids of the invention.
  • the simultaneously occurring nicking, gap generation, and repair will proceed at a frequency such that at any particular moment less than 10% of the nicking sites are nicked.
  • chemical nicking is used.
  • Chemical nicking can be accomplished using compounds including without limitation piperidine, dimethyl sulfate, hydrazine, as well as combinations of these and/or other chemicals apparent to one of skill in the art. Chemical nicking generally has no frequency limitation.
  • a low efficiency 3' exonuclease, such as a polymerase with 3' exonuclease activity, can be used to widen the nicks into gaps. If one or more probes are present that are complementary to the gap region, the probe will hybridize, and this hybridization will prevent the polymerase from filling in the gap with nucleotides. In a further embodiment, two probes may hybridize adjacent to each other in the gap and ligate.
  • a probe will hybridize to a complementary region of the nucleic acid, but will not have a free terminus for ligation to an adjacent nucleotide. This may in particular occur with probes that are labeled at one or both ends, as described above. For example, in FIG. 1D, the label (depicted as a circle) is attached to the 3' end of the probe. Thus, the 3' end of the probe would not be available for ligation to an adjacent nucleotide.
  • two probes will hybridize adjacent to each other in the same gap of a gapped nucleic acid. These adjacently hybridized probes may or may not ligate to each other to form a ligated complex. As shown in Figure 4, depending on where the probes are labeled, the hybridized probes may not have a free terminus available for ligation. Figures 4B-4D illustrate examples of probes that may not be able to ligate to each other even when hybridized in adjacent positions. The labels, depicted as circles, occupy one end of the probe, and the probes hybridize in such a way that their free ends are not available for ligation to each other.
  • Figure 4A and Figure 4E show examples in which adjacently hybridized probes can ligate to each other.
  • Figure 4A illustrates the situation in which the 3' end of a 5' labeled probe is free to ligate to a 5' end of a 3' labeled probe.
  • Figure 4E illustrates a situation in which the probes are labeled at an interior position, leaving both termini free for ligation. It is noted that the illustration in Figure 4 shows probes inserted only in the top strand - this was done for clarity's sake, and it will be understood by one of skill in the art that the methods discussed herein for decorating nucleic acids of the invention will produce decorated nucleic acids comprising probes on both strands.
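The ligation rule described for Figure 4 can be written as a small predicate. The following is a minimal sketch, not the method of the invention: the `Probe` type, the label-position strings, and the function name are hypothetical, and the check only models whether a label blocks a terminus (it ignores other ligation requirements such as a 5' phosphate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe record; sequence is written 5'->3'."""
    sequence: str
    label_at: str  # one of "5prime", "3prime", "internal", "both"


def can_ligate(upstream: Probe, downstream: Probe) -> bool:
    """Return True if two adjacently hybridized probes have the free
    termini needed to ligate: the upstream probe (5' side) needs a free
    3' end and the downstream probe needs a free 5' end."""
    upstream_3_free = upstream.label_at in ("5prime", "internal")
    downstream_5_free = downstream.label_at in ("3prime", "internal")
    return upstream_3_free and downstream_5_free


# Figure 4A-like case: 5'-labeled probe followed by a 3'-labeled probe.
fig_4a = can_ligate(Probe("ACGTAC", "5prime"), Probe("TTGACA", "3prime"))
# Figure 4E-like case: both probes labeled at an interior position.
fig_4e = can_ligate(Probe("ACGTAC", "internal"), Probe("TTGACA", "internal"))
# Blocked case: the upstream probe carries its label on the 3' end.
blocked = can_ligate(Probe("ACGTAC", "3prime"), Probe("TTGACA", "3prime"))
```

Under these assumptions, the first two cases allow ligation and the third does not, which matches the distinction the text draws between Figures 4A/4E and Figures 4B-4D.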
  • the hybridization reactions will be conducted under conditions such that only probes capable of ligation will remain hybridized to the target nucleic acid.
  • probes hybridizing in configurations illustrated in Figures 4B-D would not remain hybridized to the nucleic acid and would not be part of the final decorated nucleic acid generated using this method of the invention.
  • the overall stability of the nucleic acid can be increased using known methods, including crosslinking one or more positions along the length of the nucleic acid to stabilize the overall structure.
  • single stranded gaps remaining in the substantially double stranded decorated nucleic acid will comprise less than 20%, 15%, 10%, 5%, 3%, or 1% of all bases in the entire decorated nucleic acid.
  • the frequency at which probes are inserted along a nucleic acid can be estimated statistically, partly based on size of the probes. Other factors influencing this frequency are reaction conditions, including the temperature at which the nicking-gap-hybridization reactions are conducted. For full coverage of a genome, generally 1024 reactions - each including a sufficient number of target fragments from the same sample - can be reacted with four hexamer probes. In one embodiment, the target fragments have overlapping sequences, thus providing multiple reads for each base of the target nucleic acid.
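The figure of 1024 reactions follows from the number of possible hexamers: there are 4^6 = 4096 distinct 6-mers, and four probes per reaction gives 4096 / 4 = 1024 reactions. A minimal sketch of that arithmetic is shown below; the function name and the lexicographic grouping of hexamers into pools are illustrative assumptions, since the text does not specify how probes are assigned to reactions.

```python
from itertools import product


def hexamer_pools(probes_per_reaction: int = 4, k: int = 6):
    """Enumerate all k-mers over ACGT and group them into reactions.

    The grouping order here is arbitrary (lexicographic); it only
    illustrates the count, not a recommended pooling design.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [
        kmers[i:i + probes_per_reaction]
        for i in range(0, len(kmers), probes_per_reaction)
    ]


pools = hexamer_pools()
n_reactions = len(pools)  # 4096 hexamers / 4 per reaction = 1024
```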
  • An optimal concentration of enzymes and probes can be determined such that the number of molecules of exonuclease, polymerase and ligase matches the number of non-repaired gaps.
  • An exemplary combination of enzymes includes Nt.CviPII (
  • the Nt.CviPII enzyme activity varies from 25% to 100% when different buffers are used. Modification of reaction conditions or engineering of the enzyme active site may further increase the nicking frequency 2-4 fold. Similarly, temperature, salt concentration and the ratio of enzyme to DNA greatly affect ExoIII enzyme activity. In addition, enzymes with different binding or preference sequences can be used to further control and modify the process of decorating a nucleic acid. For example, DNase I nicks each DNA strand independently with preference for some sequences. Thus, reaction conditions, including buffers, temperatures, and reactant concentrations, can all be modified in the process of decorating a nucleic acid to optimize the hybridization of probes and any subsequent repair steps.
  • a plurality of nucleic acids undergoing the decoration method outlined above are separated into different aliquots at some point before the labeled probes are added to the gapped nucleic acids. This aliquoting may be done before the nicking enzyme is applied, before the exonuclease is applied, and/or before the labeled probes are added. Different sets of probes can be applied to the different aliquots, thus generating decorated nucleic acids with overlapping sequences that are decorated with different sets of probes.
  • Determining the order of the labels on the decorated nucleic acids from the different aliquots thus provides a greater amount of sequence information that can then be assembled using methods known in the art and described herein to provide sequences of larger target nucleic acids, including whole human genomes.
  • each set when different sets of probes are used, each set will generally comprise probes with probe sequences that are different from the probe sequences of the other sets. Each probe sequence within a set will also generally comprise a unique label.
  • double stranded nucleic acids are decorated through the use of invasive probes.
  • invasive probe is meant a probe that is able to enter a double stranded nucleic acid and hybridize to a complementary sequence on one of the strands, creating a "D-loop" within the structure of the nucleic acid.
  • Such invasive probes are generally associated with recombinases.
  • recombinase refers to a family of RecA-like recombination proteins all having essentially all or most of the same functions, particularly: (i) the recombinase protein's ability to properly bind to and position targeting polynucleotides (also referred to herein as invasive probes) on their homologous targets and (ii) the ability of recombinase protein/targeting polynucleotide complexes to efficiently find and bind to complementary endogenous sequences.
  • the best characterized recA protein is from E.
  • recA803 see Madiraju et al., PNAS USA 85(18):6592 (1988); Madiraju et al, Biochem. 31 :10529 (1992); Lavery et al., J. Biol. Chem. 267:20648 (1992)).
  • many organisms have recA-like recombinases with strand-transfer activities (e.g., Fugisawa et al., (1985) Nucl. Acids Res. 13: 7473; Hsieh et al., (1986) Cell 44: 885; Hsieh et al., (1989) J.
  • recombinase proteins include, for example and without limitation: recA, recA803, uvsX, and other recA mutants and recA-like recombinases (Roca, A. I. (1990) Crit. Rev. Biochem. Molec. Biol. 25:415), sep1 (Kolodner et al. (1987) Proc. Natl. Acad. Sci. (U.S.A.) 84:5560; Tishkoff et al. Molec. Cell. Biol. 11:2593), RuvC (Dunderdale et al. (1991) Nature 354:506), DST2, KEM1, XRN1 (Dykstra et al.
  • RecA may be purified from E. coli strains, such as E. coli strains JC12772 and JC15369 (available from A. J.
  • the recA803 protein is a high-activity mutant of wild-type recA.
  • the art teaches several examples of recombinase proteins, for example, from Drosophila, yeast, plant, human, and non-human mammalian cells, including proteins with biological properties similar to recA (i.e., recA-like recombinases), such as Rad51 from mammals and yeast, and Pk-rec (see Rashid et al., Nucleic Acid Res. 25(4):719 (1997), hereby incorporated by reference).
  • the recombinase used with invasive probes of the invention may actually be a complex of proteins, i.e. a "recombinosome".
  • included within the definition of a recombinase are portions or fragments of recombinases which retain recombinase biological activity, as well as variants or mutants of wild-type recombinases which retain biological activity, such as the E. coli recA803 mutant with enhanced recombinase activity.
  • recA or rad51 is used.
  • recA protein is typically obtained from bacterial strains that overproduce the protein: wild-type E. coli recA protein and mutant recA803 protein may be purified from such strains.
  • recA protein can also be purchased from, for example, Pharmacia (Piscataway, N.J.). RecA protein and its homologs form a nucleoprotein filament when coating single-stranded DNA. In this nucleoprotein filament, one monomer of recA protein is bound to about 3 nucleotides. This property of recA to coat single-stranded DNA is essentially sequence independent, although particular sequences favor initial loading of recA onto a polynucleotide (e.g., nucleation sequences).
  • the nucleoprotein filament(s) can be formed on essentially any DNA molecule and can be formed in cells (e.g., mammalian cells), forming complexes with both single-stranded and double-stranded DNA, although the loading conditions for dsDNA are somewhat different than for ssDNA.
  • the conditions used to coat targeting polynucleotides with recombinases such as recA protein are known in the art, see e.g., U.S. Pat. Nos. 5,273,881 and 5,223,414, each incorporated herein in its entirety for all purposes and in particular for all teachings related to recombinases such as recA.
  • decorated nucleic acids of the invention are formed by adding a first set of recA invasive labeled probes to the double stranded nucleic acid to form D-loops within the double stranded nucleic acid, thus forming a decorated nucleic acid.
  • the recA invasive labeled probes comprise a plurality of non-overlapping probe sequences, and each probe sequence comprises a unique label. Such probes hybridize to sequences in the double stranded nucleic acid that are complementary to the probe sequences.
  • a second set of recA invasive probes is added to the double stranded nucleic acid to form double D-loops within the double stranded nucleic acid.
  • the second set of recA invasive labeled probes are substantially complementary to the first set of recA invasive labeled probes.
  • both sets of probes may be labeled with the same color to increase fluorescence signal for each loop.
  • the two sets of invasive probes are labeled with different colors as a kind of internal control.
  • recA invasive labeled probes used in accordance with the invention comprise at least one modification selected from: locked nucleic acid, peptide nucleic acid and phosphorothioate nucleic acid. Such modifications can often strengthen the hybridization of these probes with their complementary sequences beyond what is possible with naturally occurring nucleic acids.
  • a polymerase extends the invasive probes until dideoxy is incorporated. Incorporation of dideoxy can help increase the stability of the resultant decorated nucleic acid molecules.
  • the hybridized invasive probes are extended using a polymerase.
  • This extension of the probes serves to stabilize the invasive probes and prevent the D-loop or the double D-loop from destabilizing the probes and causing them to detach from the nucleic acid. This extension is particularly useful in methods utilizing shorter probes (for example, probes of about 3-4 nucleotides in length).
  • Invasive labeled probes may comprise any of the structural aspects described above for probes of use in the invention, although generally recA invasive labeled probes will be of shorter length than probes used in other methods described herein for decorating nucleic acids.
  • hybridization conditions may be used to decorate nucleic acids in accordance with the present invention, including high, moderate and low stringency conditions; see for example Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al, hereby incorporated by reference. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993).
  • stringent conditions are selected to be about 5-10 °C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
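To make the relationship between Tm and a stringent hybridization temperature concrete, the sketch below estimates Tm for a short oligonucleotide and returns the window 5-10 °C below it. It assumes the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough textbook approximation that is not taken from this document and that ignores salt, probe concentration, and backbone modifications; the function names are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Rough Tm estimate in deg C using the Wallace rule (an assumption,
    not a formula given in the text): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(seq: str, low_offset: float = 10.0,
                     high_offset: float = 5.0) -> tuple:
    """Return (lowest, highest) hybridization temperature that lies
    5-10 deg C below the estimated Tm."""
    tm = wallace_tm(seq)
    return (tm - low_offset, tm - high_offset)


tm_example = wallace_tm("ACGTAC")            # 3 A/T and 3 G/C bases
window_example = stringent_window("ACGTAC")
```

For real probe design, a nearest-neighbor thermodynamic model under the actual buffer conditions would be a more appropriate choice than this approximation.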
  • Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30 °C for short probes (e.g.
  • Stringent conditions may also be achieved with the addition of helix destabilizing agents such as formamide.
  • the hybridization conditions may also vary when a non-ionic backbone, i.e. PNA is used, as is known in the art.
  • cross-linking agents may be added after target binding to cross-link, i.e. covalently attach, the two strands of the hybridization complex.
  • the sets may be divided into subsets that are used together in pools, as discussed in U.S. patent 6,864,052, which is hereby incorporated by reference in its entirety for all purposes and in particular for all teachings related to using sets of probes.
  • Probes from different sets may be hybridized to target sequences either simultaneously or sequentially. Probes from different sets may be hybridized as entire sets or as subsets, or pools. In one embodiment, lengths of probes in different sets are in the range of from about 3 to about 20 nucleotides. In a further embodiment, lengths of probes in different sets are in the range of from about 4 to about 18, about 5 to about 14, about 6 to about 12, about 7 to about 10 and about 8 to about 9 nucleotides. In a further aspect, probes from different sets will hybridize to adjacent positions on a target nucleic acid, and in some embodiments, these adjacently hybridized probes can be ligated, forming ligation products of lengths from about 6 to about 40 nucleotides.
  • the present invention provides methods of forming stretched decorated nucleic acids.
  • decorated nucleic acids are stretched by applying them to substrates. Substrates of use in the present invention are described in further detail above.
  • One advantage to the methods of forming decorated nucleic acids in accordance with the methods described above is that these molecules are often more flexible than naturally occurring nucleic acids but nevertheless retain enough stability to withstand the process of being "stretched". This combination of flexibility and strength is of particular use with substrates comprising nanostructures such as nanochannels.
  • decorated nucleic acids of the invention are stretched by applying them to nanochannels.
  • decorated nucleic acids of the invention are stretched by applying them to flowthrough systems.
  • flowthrough systems of the invention include a planar surface with alternating hydrophobic and hydrophilic regions.
  • the alternating hydrophobic and hydrophilic regions are in a linear pattern.
  • methods of the invention can be used to generate a plurality of different decorated nucleic acids.
  • each of a plurality of different nucleic acids can be applied to a single substrate, for example to a single nanochannel or a single lane of a flowthrough system.
  • the different decorated nucleic acids can be applied to the substrate sequentially, thus allowing sequence information from each one to be obtained in turn.
  • the plurality of different decorated nucleic acids are applied to a plurality of substrates, either singly or in groups. For example, an assembly comprising multiple nanochannels may be used, or a flowthrough system comprising multiple lanes. This embodiment increases the number of decorated nucleic acids that can be analyzed at once, thus allowing such analyses to be scaled up to high density high throughput applications.
  • double stranded decorated nucleic acids of the invention are applied to a nanostructure (e.g., a nanochannel or a nanopore) to remove one strand and to stretch the molecule, thus providing a stretched single-stranded decorated nucleic acid.
  • the narrow nanostructures prevent banding and formation of intra-strand hairpin type secondary structures.
  • double stranded decorated nucleic acids are denatured to form single stranded molecules, and then those single stranded molecules are applied to the nanostructures or other substrates described in further detail above.
  • Both double stranded and single stranded decorated nucleic acids can be analyzed for sequence information in accordance with the present invention as described more fully below.
  • the present invention provides methods of using compositions comprising substrates with stretched decorated nucleic acids.
  • compositions of the invention are used to determine the sequence of a target nucleic acid.
  • target nucleic acid is meant a nucleic acid of interest.
  • sequence of a target nucleic acid refers to a nucleic acid sequence on a single strand of nucleic acid.
  • the target sequence may be a portion of a gene, a regulatory sequence, genomic DNA, cDNA, RNA including mRNA and rRNA, or others. It may be any length, with the understanding that longer sequences are more specific. As will be appreciated by those in the art, the target sequence may take many forms.
  • probes are designed to hybridize to target sequences to determine the presence or absence of the target sequence in a sample.
  • compositions of the invention are used in de novo sequencing or resequencing applications.
  • De novo sequencing is the initial sequencing that results in the primary genetic sequence of organisms. A detailed genetic analysis of an organism is possible only after de novo sequencing has been performed.
  • Resequencing involves detecting target sequences, for example for candidate genes or other genomic regions of interest, in a sample. Such resequencing applications are a key step in detection of mutations associated with various congenital diseases. Resequencing techniques can be divided into those which test for known mutations (genotyping) and those which scan for any mutation in a given target region (variation analysis). Typical mutations being tested are single nucleotide polymorphisms (SNPs), insertions, and deletions.
  • the present invention provides methods of using compositions comprising substrates with stretched decorated nucleic acids. As discussed herein, in one exemplary aspect, compositions of the invention are used to determine the sequence of a nucleic acid or detect a target sequence. In one aspect, the present invention provides methods for analyzing single molecule genome segments of 100,000 base pairs or longer at a rate of 1 billion raw base pairs per minute with an output accuracy of greater than 99.99%.
  • the present invention provides methods of detecting the probes incorporated into stretched decorated nucleic acids. Since these probes are generally associated with a specific sequence, identification of a particular probe on a stretched decorated nucleic acid provides sequence information about that nucleic acid.
  • Methods for detecting probes associated with nucleic acids are known in the art. Such methods often involve multicolor imaging systems.
  • hardware is provided to allow detection of decorated nucleic acids of the invention.
  • the system hardware comprises three major components; the illumination system, the reaction chamber, and the detector system.
  • the detection instrument can include several features such as: adjustable laser power, electronic shutter, auto focus, and operating software.
  • Signals from decorated nucleic acids of the invention can be detected by a number of detection systems, including, but not limited to, scanning electron microscopy, near field scanning optical microscopy (NSOM), total internal reflection fluorescence microscopy (TIRFM), and the like.
  • Abundant guidance is found in the literature for applying such techniques for analyzing and detecting nanoscale structures on surfaces, as evidenced by the following references that are incorporated by reference: Reimer et al, editors, Scanning Electron Microscopy: Physics of Image Formation and Microanalysis, 2nd Edition (Springer, 1998); Nie et al, Anal.
  • decorated nucleic acids of the invention can in some aspects be stretched in nanochannels. Signals from nucleic acids in such nanochannels can be detected using any of the methods discussed above. In one embodiment, signals from stretched decorated nucleic acids in nanochannels are detected using scanning electron microscopy. Similarly, decorated nucleic acids stretched in flowthrough systems discussed herein can also be detected using any of the methods discussed above.
  • The simplest commercially available probe labeling scheme is to use single dye molecules.
  • a washing step is included in the process to ensure that almost all probe molecules that are not hybridized to the target nucleic acid are removed.
  • four dyes are used to label probe molecules.
  • dyes used are commercially available, such as the dyes in the "BigDye®" sequencing kits available from Applied Biosystems. Such dyes generally combine a donor and one acceptor. The same donor is used in all four dyes, and each different dye has a different acceptor that generates a different emission wavelength after illumination of the donor. These dyes are bright and a single excitation wavelength is generally sufficient for all four dyes. Detection of a single dye molecule in stretched DNA can be obtained with sensitive CCD cameras.
  • FRET is used for detecting probes in decorated nucleic acids of the invention.
  • FRET detects only an acceptor dye that is less than about 10 nm from the donor dye.
  • two probes will hybridize to adjacent positions of a target nucleic acid, and in some further aspects will ligate.
  • Such adjacent probes are ideally suited for FRET detection. For example, probes of about 3 nm in size that are adjacent to each other will provide a FRET-tolerated distance of about 6 nm (or another multiple of 3 nm if additional probes are used).
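The distance dependence behind this FRET argument can be illustrated with the standard Förster relation E = 1 / (1 + (r / R0)^6). The sketch below is illustrative only: the Förster radius of 5 nm is an assumed placeholder (the actual value depends on the dye pair and is not given in the text), and the function name is hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency for donor-acceptor distance r_nm.

    r0_nm is the Förster radius (distance of 50% efficiency); 5 nm is
    an assumed illustrative value, not a figure from the document.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Efficiency at the 3 nm and 6 nm spacings from the example above,
# and at the 10 nm limit mentioned for FRET detection.
for r in (3.0, 6.0, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r):.3f}")
```

Because efficiency falls with the sixth power of distance, dyes on directly adjacent probes transfer energy far more efficiently than dyes separated by 10 nm, which is why adjacent hybridization is attractive for this detection mode.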
  • Standard FRET will generally require that the probes used in the invention be labeled with pairs of existing proven dye molecules such as Cy3/Cy5.
  • molecular antennae are used with FRET detection methods.
  • these light harvesting polymers can provide a 10-fold higher light yield than regular fluorophores, increasing the accuracy of any method used to detect probes on decorated nucleic acids of the invention.
  • oligonucleotide probes are labeled with one or more fluorescent dye molecules. These probes are used to decorate nucleic acids using methods described herein.
  • Light harvesting polymers are then applied to the decorated nucleic acids.
  • Light harvesting polymers that attach to the hybridized probes will bring the molecules in close enough proximity to allow Förster resonance energy transfer (FRET) to occur between the cationic-conjugated polymer and the dye on the probe.
  • This energy transfer causes the light emitted by the complex to change from the blue emission of the light harvesting polymers on their own to the green light emitted by the fluorescent dye.
  • the ability of the light harvesting polymers to collect a large number of photons results in increased energy for transfer to the dye on the probe, thus boosting the resultant signal.
  • the dye-labeled probes comprise PNA.
  • stretched decorated nucleic acids on a substrate will be treated such that the hybridized probes will detach from the nucleic acids and then the same or different probes will hybridize to the same location on the nucleic acid.
  • temperature changes (i.e., "melting" the hybridized probes off of the nucleic acid) or changes in pH or ionic strength can be used to destabilize the probes decorating a nucleic acid such that they detach from the nucleic acid.
  • when the nucleic acid is contained in a relatively confined system, such as a nanochannel or the substrates comprising linear features described above, pools of probes can be washed over the nucleic acid, such that probes again hybridize to the same location of the nucleic acid.
  • This embodiment is particularly amenable to nucleic acids that are decorated by generating single stranded gaps in a double stranded nucleic acid through some combination of nicking and exonuclease enzymes (such methods are described in further detail above in the section entitled "Methods of making compositions of the invention").
  • Stretched decorated nucleic acids of the invention can be used to obtain sequence information of a target nucleic acid.
  • the sequence of target nucleic acids is obtained by cumulative detection of labeled probes on decorated nucleic acids of the invention.
  • decorated nucleic acids comprising a plurality of probes are analyzed for their sequence signatures.
  • sequence signature is meant a pattern of probe decoration that is in general different for different nucleic acids.
  • decorated nucleic acids of the invention are generally produced through a stochastic process in which probes are incorporated into the structure of the nucleic acid at random intervals along the length of that nucleic acid. These stochastic processes thus generate a different order and relative distance of probes in different nucleic acids. Detection of the pattern of these probes for each individual nucleic acid provides the sequence signature for that nucleic acid.
  • a signature comprises about 4 probes.
  • signatures comprise about 8, 12, 16, 24, 32, 36 or more probes.
  • sequence signatures of individual decorated nucleic acids are combined to assemble larger sequences, including entire genome sequences.
  • sequence signatures are used to provide a genomic "map" of probe identities. This map can be processed to identify the sequences represented by the probes.
  • a single signature type is used for each nucleic acid within an entire sample.
  • multiple independent signatures are determined for a portion of a sample and used as a "representative set" of the sample.
  • the present invention provides methods for extracting accurate sequence signatures within the context of a high density imaging environment and to distinguish the actual signature from background noise.
  • signatures are generated from multiple images of stretched decorated nucleic acids of the invention.
  • signatures are obtained from data on probe binding "intensity".
  • the data used to assemble a unique signature for each nucleic acid corresponds to the number of distinguishing probe features that can be used to identify the presence of each individual probe. For example, in a case where four different probes are used, each linked to a different fluorescent marker identifiable by its color, four different images or intensity graphs would be taken for each decorated nucleic acid (one for each color).
  • the multiple images must be carefully aligned so that there is no significant offset between the images. In one aspect, this is accomplished by utilizing fiducials to ensure that each image is aligned with each previous image. Another way to ensure that multiple images are aligned is by taking each of the images at the same position, for example, by changing filters or by using multiple cameras. Cameras can be adjusted at the beginning of the imaging process to best align with the substrate on which the decorated nucleic acids are stretched and to take into account the pixels of the imaging device. If multiple cycles are performed on the same stretched decorated nucleic acids, the signatures from consecutive cycles can be used independently or can be combined to provide further specificity.
  • the signature of each nucleic acid molecule is generated by identifying the presence or absence of specific probes in consecutive resolution segments of the individual molecules.
  • resolution segment is meant a distance with a predefined accuracy in ordering neighboring probe matches. This predefined accuracy will generally be about 90%, but may be higher or lower.
  • sequence signature can be generated by identification and ordering of these probes on the stretched decorated nucleic acid within each resolution unit. This type of signature is particularly useful if probe matching sites on the nucleic acid are infrequent relative to optical resolution of the probes. For example, starting from the first positive probe a signature would be determined as follows:
  • Fragment 1: BBBBBB2 - BBBBBB3 - OOOOOO - BBBBBB2 .... BBBBBB1 - BBBBBB4
  • Fragment 2: BBBBBB4 - BBBBBB1 - BBBBBB3 - OOOOOO .... BBBBBB2 - OOOOOO
  • a resolution unit is expressed in number of bases, such as 500 bases.
  • a signature generated in this way can capture information about multiple occurrences of the same probe within one resolution unit.
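The construction of a signature from detected probes and resolution units can be sketched as follows. This is a simplified illustration under stated assumptions: probe hits are given as hypothetical `(position_in_bases, probe_id)` pairs, the signature starts at the first positive probe, empty resolution units are written as `"O"`, and several probes in one unit are concatenated in positional order. The function name and data layout are not from the source.

```python
def build_signature(hits, unit=500):
    """Bin probe hits into consecutive resolution units.

    hits: iterable of (position_in_bases, probe_id) pairs (hypothetical
          representation of detected labels along a stretched molecule).
    unit: resolution unit in bases (500 is the example size in the text).
    Returns one string per resolution unit, starting at the first hit.
    """
    ordered = sorted(hits)
    if not ordered:
        return []
    origin = ordered[0][0]
    n_units = (ordered[-1][0] - origin) // unit + 1
    units = [[] for _ in range(n_units)]
    for position, probe_id in ordered:
        units[(position - origin) // unit].append(str(probe_id))
    # "O" marks a resolution unit in which no probe was detected.
    return ["".join(u) if u else "O" for u in units]


example_hits = [(120, 2), (700, 3), (1800, 2), (1900, 1)]
example_signature = build_signature(example_hits)
# -> ["2", "3", "O", "21"]
```

The last unit in the example holds two probes, showing how one resolution unit can record multiple probe occurrences.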
  • the signatures can be mapped (aligned) to one or more genomic reference sequences.
  • the mapping approach used should accommodate the possibility of large numbers of missing probe binding scores and/or rare unexpected mutation reading probes or false positive scores. Furthermore, the mapping approach should allow proper mapping of signatures for segments of 100 kb in the presence of DNA rearrangements. In one embodiment, a high mapping speed is used to match the reader output of about 100,000 or more signatures per second (1e9 or more signatures in approximately 3 hours).
  • sequence signatures are mapped to an informative reference table of genome locations of individual sequence signatures created using the information of genomic sequence data from multiple sources.
  • sequence signatures are compared to a defined set of signatures created from empirical observation.
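One way to picture mapping that tolerates missing probe scores is a sliding comparison in which an empty unit in the observed signature is treated as "unknown" rather than as a mismatch. The sketch below is a deliberately simple linear scan, not the high-speed mapping approach contemplated in the text; the function name, the `"O"` convention for empty units, and the conflict threshold are all illustrative assumptions.

```python
def best_match(observed, reference, max_conflicts=1):
    """Slide an observed signature along a reference signature.

    Units equal to "O" in the observed signature are treated as missing
    scores and never count as conflicts. Returns (offset, conflicts) for
    the placement with the fewest conflicts, or None if no placement has
    at most max_conflicts conflicts.
    """
    best = None
    for offset in range(len(reference) - len(observed) + 1):
        window = reference[offset:offset + len(observed)]
        conflicts = sum(
            1 for obs, ref in zip(observed, window)
            if obs != "O" and obs != ref
        )
        if conflicts <= max_conflicts and (best is None or conflicts < best[1]):
            best = (offset, conflicts)
    return best


reference_signature = ["2", "3", "O", "2", "1", "4", "4", "1", "3"]
observed_signature = ["2", "O", "4"]
placement = best_match(observed_signature, reference_signature)
# -> (3, 0): the observed units agree with reference units 3 to 5
```

A production mapper would more likely index the reference table (for example by hashing short runs of units) to reach the throughput figures mentioned above; that design is outside the scope of this sketch.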
  • simple signatures (in which no distance information is included) made of 4 different 6-mers are identified that have a frequency in single stranded DNA of 1 in 4^P, where P is the number of positions in the signature.
  • Signatures generated on dsDNA can be equivalent to using 8 different probes if the probes recognizing complementary DNA sequences are labeled the same way; for example labeled with the same fluorescent dye.
  • chromosomal sequences are assembled de novo using sequence signatures obtained from fragments.
  • An exemplary embodiment is illustrated in Figure 11.
  • chromosomes from two different cells are broken into fragments.
  • Four differentially labeled probes (identified by the letters Q, U, P, and E in Figure 11) are added. Each probe will bind to a specific sequence. Analysis of images taken from each fragment will provide a "signature" for each fragment.
  • the chromosomal sequences can then be constructed by aligning signatures based on probe pattern and length (Figure 11E).
  • consensus signatures can be converted into partial or complete chromosomal sequences.
  • An exemplary embodiment is illustrated in Figure 10.
  • Each probe set comprises four differentially labeled probes, and each probe within each set has a unique probe sequence.
  • the partial sequences obtained from each of the probe sets can be combined to provide the chromosomal sequence (see Figure 10C).
  • Figure 12 illustrates a further embodiment in which consensus signatures for each haplotype chromosome are assembled from the signatures obtained from the fragments.
  • signatures for a 100 kb fragment of dsDNA that have 200 matching positions of 4 different 6-mers have a frequency in nucleic acid sequence equivalent to that of a 200-base contiguous sequence, i.e., these signatures have no practical chance to match other than the true site in the human genome.
  • a signature representing a 100 kb fragment can be subdivided into twenty ~15 kb segments that are overlapped by 10 kb. Even shorter fragments such as fifty 4-6 kb segments starting at every 2 kb may be used to find matching segments in the presence of 10-20% of errors. These short signatures have low frequency of matching genomic sequences by chance, as such signatures occur extremely rarely in genomic sequence (e.g., the human genome, which has only 6 million possible signatures created from 4 different 6-mers on dsDNA).
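The subdivision into overlapping segments can be sketched as a sliding window. The helper name `overlapping_segments` and the choice to clip the last windows at the fragment end are assumptions for illustration; lengths are in kb.

```python
def overlapping_segments(length_kb, size_kb, step_kb):
    """Return (start, end) windows of size_kb that start every step_kb.

    Windows near the fragment end are clipped to the fragment length.
    """
    return [(start, min(start + size_kb, length_kb))
            for start in range(0, length_kb, step_kb)]

# Twenty ~15 kb segments, each overlapping its neighbor by 10 kb (5 kb step).
long_windows = overlapping_segments(100, 15, 5)
# Fifty ~5 kb segments starting every 2 kb.
short_windows = overlapping_segments(100, 5, 2)
print(len(long_windows), long_windows[:2])
print(len(short_windows), short_windows[:2])
```

With a 5 kb step, consecutive 15 kb windows share 10 kb, which reproduces the twenty-segment figure; a 2 kb step gives the fifty shorter segments.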
  • Different genomic reference index tables can be created for the different forms of signature, e.g., signatures created using an order of probe sequences, by resolution units, by estimated neighboring distances between probes, etc.
  • An effective way to search signatures of a given number of probes is to generate reference index tables for the entire query region of a nucleic acid, e.g., an entire human genome or a specific sub-set of the human genome.
  • a reference table can be created that has 16 million possible signatures (4^12, i.e., combinations of any of four 6-mers on each of twelve positions on a DNA segment) and for each signature all matching positions are available in the reference sequence table. Because the entire human genome has only 6 million such signatures, about 40% of such signatures will have one matching position in the genome. Genome positions for each signature can be directly found in the reference table by reviewing the index of each given signature. Each signature can be reviewed in the table in both the forward and the reverse orientation.
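A reference index table of this kind can be sketched as a dictionary keyed by the ordered probe labels. The names `build_index` and `lookup`, the single-letter probe labels and the toy reference are assumptions for illustration; the example uses 3-probe signatures to stay small, whereas the text above uses twelve positions. Reverse lookup is modeled simply as reading the label order backwards.

```python
from collections import defaultdict

def build_index(ref_hits, p=12):
    """Map every run of p consecutive probe labels to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref_hits) - p + 1):
        window = ref_hits[i:i + p]
        key = tuple(label for _, label in window)
        index[key].append(window[0][0])
    return index

def lookup(index, signature):
    """Look a signature up in both the forward and the reverse orientation."""
    sig = tuple(signature)
    return {"forward": index.get(sig, []), "reverse": index.get(sig[::-1], [])}

# Toy reference: (position, probe label) pairs in genome order.
ref_hits = [(100, "A"), (900, "C"), (1500, "B"),
            (2300, "D"), (3100, "A"), (3600, "C")]
index = build_index(ref_hits, p=3)
print(lookup(index, ["B", "C", "A"]))  # {'forward': [], 'reverse': [100]}
print(4 ** 12)                         # 16777216 possible 12-position signatures
```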
  • Reference index tables can be generated for matching signatures with deletions (non-scored 6-mers) due to errors in data or mutations in the sequence relative to the reference.
  • signatures of thirteen positive probes can be represented as thirteen signatures having twelve positive probes by deleting one probe at a time (~8% false negative rate).
  • Six million signatures for the entire human genome will generate 78 million positions to be distributed in the reference table that has 16 million possible signatures.
  • 12-probe experimental signature segments can then be checked in this table and cross-referenced to the original table to identify the most likely sequence match. A 100 kb fragment with 10-100 such overlapping signature segments will have a very low chance for false mapping. Longer signatures are generally more unique and can tolerate more missing probe scores. For example, a 15-probe experimental signature segment will find five occurrences in the genome, similarly as in the above example with twelve probe signatures (assuming an 8% missing probe rate).
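The deletion-tolerant table described above can be sketched by indexing each 13-probe reference signature under every 12-probe variant obtained by removing one probe. The function names, the single-letter labels and the two toy reference signatures are assumptions made for the example.

```python
from collections import defaultdict

def drop_one(signature):
    """All signatures obtained by deleting exactly one probe, one per position.

    Variants may repeat when two neighboring probes carry the same label.
    """
    sig = tuple(signature)
    return [sig[:i] + sig[i + 1:] for i in range(len(sig))]

def build_deletion_index(ref_signatures):
    """Index each reference signature under all of its one-deletion variants."""
    index = defaultdict(set)
    for position, sig in ref_signatures:
        for variant in drop_one(sig):
            index[variant].add(position)
    return index

ref = [(5000, tuple("ACBDACBDDABCA")), (90000, tuple("BBADCCADBACDA"))]
index = build_deletion_index(ref)

# Experimental 12-probe segment: the first reference signature with one probe missed.
full = "ACBDACBDDABCA"
observed = tuple(full[:6] + full[7:])
print(len(drop_one(ref[0][1])), sorted(index[observed]))  # 13 [5000]
print(6_000_000 * 13)                                     # 78000000 table entries
```

Each hit in this table points back to the original 13-probe entry, which is the cross-referencing step mentioned in the text.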
  • experimental signatures can be expanded in sets of shorter signatures by removing one probe at a time.
  • a 16-probe signature can be compared to 15-probe reference table by creating sixteen different 15-probe signatures.
  • the sequence determination process includes detecting probes that are positive within a segment of nucleic acid followed by sequence assembly by compiling overlapped detected sequences.
  • a long stretched nucleic acid can be envisioned to consist of a series of consecutive 500-base segments, and shifts in these series can be defined based on sequence compilation. Local (within each segment) overlapping probe sequences from all aligned signatures are compiled and the determination value for a reference or a new sequence variant can be calculated.
  • each signature is first aligned to the reference sequence according to ~160 matching probes (6 bases read every 500 bases would equal approximately an 80% detection rate).
  • each signature belongs to a nucleic acid fragment that is about 10 bases shifted from the neighboring fragment. If fragmenting is done by partial digest with a restriction enzyme that cuts frequently, fragments will start approximately every 50 bases, on average, and there will be several fragments starting at that position. Each base is covered by ~10,000 overlapping 100 kb fragments but only a specified number will have probes reading a given base at a particular point.
  • each probe is annotated as matching or not matching to the reference.
  • Each fragment that matches non-continuously to the reference sequence is also annotated to be able to collect a proper subset of probes to assemble break-point sequences.
  • each base will be covered with up to 100 overlapping probes coming from different signatures. Even with a 20% false negative rate, about 80 probes (e.g., ~6 overlapped 6-mer probes each scored in multiple overlapping 100 kb fragments) will be annotated as matching the reference sequence and would repeatedly confirm the identity of that base.
  • Some other probes may initially be scored as false positives reading a different base for that position but as long as their number is small they can be recognized as false scores.
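The redundancy argument above (up to 100 probe readings per base, about 80 of them matching at a 20% false negative rate, plus a few false scores) can be illustrated with a simple majority vote. The function name `call_base` and the `min_fraction` threshold are hypothetical; the specification does not state how the determination value is computed.

```python
from collections import Counter

def call_base(votes, min_fraction=0.5):
    """Return (consensus_base, minority_bases) from per-probe base votes.

    min_fraction is an assumed threshold: below it no consensus is called.
    """
    if not votes:
        return None, []
    counts = Counter(votes)
    base, n = counts.most_common(1)[0]
    if n / len(votes) < min_fraction:
        return None, sorted(counts)
    return base, sorted(b for b in counts if b != base)

# About 80 probes confirm the reference base; 3 stray probes read another base.
votes = ["G"] * 80 + ["T"] * 3
print(call_base(votes))  # ('G', ['T'])
```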
  • probes expected to be positive will not have significant number of occurrences for that region in the data set.
  • unexpected 6-mers that are detected for that 500 base region that are not found to match the reference sequence can be assembled by compiling their mutually overlapped sequence and their overlapping sequences with sequences surrounding the mutation site. This approach defines a bridging mutated sequence. This sequence has to have a significant number of overlapping 6-mers as a confirmation that this is a real sequence.
  • the bridging sequence provided by the probes may extend from the six overlapping probes in the case of single base mutation to over hundreds or thousands of bases.
  • This type of sequence assembly is known as "local de-novo sequence assembly."
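Compiling mutually overlapping 6-mers into a bridging sequence can be sketched with a greedy extension. This is an illustrative simplification under stated assumptions: the function name `assemble` is hypothetical, the seed is taken to be a 6-mer anchored in known flanking sequence, and extension stops as soon as the next overlap is missing or ambiguous.

```python
def assemble(kmers, seed):
    """Extend seed to the right while exactly one unused k-mer overlaps
    its last k-1 bases; stop on a dead end or an ambiguous branch."""
    k = len(seed)
    unused = set(kmers) - {seed}
    contig = seed
    while True:
        suffix = contig[-(k - 1):]
        candidates = [m for m in unused if m.startswith(suffix)]
        if len(candidates) != 1:
            return contig
        contig += candidates[0][-1]
        unused.remove(candidates[0])

# Six unexpected 6-mers that all cover one substituted base (the A at index 5).
unexpected = ["TAGGCA", "ACGTTA", "GTTAGG", "AGGCAT", "CGTTAG", "TTAGGC"]
print(assemble(unexpected, "ACGTTA"))  # ACGTTAGGCAT
```

A single-base change is covered by six overlapping 6-mers, so the assembled bridge here spans 11 bases; longer insertions would chain more probes in the same way.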
  • Several types of information can be provided to allow efficient de-novo assembly: 1) overlapping 6-mers, 2) order of small groups of 6-mers (the higher the resolution along DNA segments and more fragment ends from the overlapping DNA fragments, the smaller the groups), 3) knowledge of an exact distance between ends of some overlapped fragments that defines an exact length of sequence for a group of probes detected between two fragment ends; 4) knowledge of the fragment end sequence (1-4 bases); and 5) reference sequence match. Integration of these types of information assures unique sequence assembly and determination of the size of simple repeats.
  • the intensity of color detected for a single molecule provides sequence information.
  • the intensity of color can reflect the number of probes present on the molecule. That information can be used in conjunction with the sequence represented by that color (i.e., by the probe) to assist in assembling the sequence.
  • Tandem repeats can be detected and analyzed by using stretched decorated nucleic acids that are generated from nested fragments. As discussed further above, nested fragments can be produced through successive cycles of restriction enzyme digests. When these consecutive cycles utilize exact cutter type IIs restriction enzymes, identifying the sequence of these fragments will provide sequences that are separated by an exact number of bases. This allows determination of an exact length of tandem repeats such as mono, double and triple repeats that are located between these fragment ends.
  • decorated nucleic acids of the invention are used to detect the presence of a target sequence in a sample.
  • target nucleic acids obtained from a sample are used to generate stretched decorated nucleic acids of the invention.
  • a sample may comprise, but is not limited to, cells, tissues, bodily fluids (including, but not limited to, blood, urine, serum, lymph, saliva, anal and vaginal secretions, perspiration and semen) of virtually any organism, with mammalian samples being preferred and human samples being particularly preferred; environmental samples (including, but not limited to, air, agricultural, water and soil samples); biological warfare agent samples; research samples; purified samples, such as purified genomic DNA, RNA, proteins, etc.; raw samples (bacteria, virus, genomic DNA, etc.); as well as libraries (such as cDNA libraries generated from mRNA), amplified and synthetic nucleic acids.
  • Detection of target sequences in a sample has a number of uses, including pathogen detection and clinical diagnostics.
  • Clinical diagnostics can include without limitation detection of markers of disease, prenatal diagnostics for identification of potential developmental abnormalities, and point of care testing.
  • detection of target sequences in a sample can be used to analyze genetic variation, including SNPs, insertions, and deletions.
  • Decorated nucleic acids of the invention are particularly useful for detecting target sequences, because the information present in the intact double stranded molecules provides the contextual information necessary for generating accurate information.
  • Because decorated nucleic acids of the invention are generally longer than target nucleic acids used in conventional assays, fewer (but longer) probes can be used per reaction to detect a target sequence.
  • For targeted disease or pathway analysis such as cancer diagnostics, only a predefined set of about 100-1000 genes may be required for diagnostic testing.
  • a sample of genome fragments used to generate decorated nucleic acids of the invention is enriched for the genomic region of interest, thus reducing the required sequencing effort 10-100 fold relative to sequencing the entire genome.
  • Targeted fragment selection can optimize the nucleic acid population used in diagnostic applications by reducing or eliminating the occurrence of false negatives.
  • target sequence detection involves gene expression analysis by quantification of a specific message.
  • cDNA is generated using methods well known in the art from mRNA obtained from a sample. This cDNA can be used to generate decorated nucleic acids using methods described above. Analysis of decorated nucleic acids generated from such cDNA provides representative sequence information on the presence and relative abundance of specific splice variants within a sample. Since cDNA generated from mRNA will start at the same point, spatial localization of sequence information obtained from such cDNA is relatively straightforward. Longer probes may be of particular use in such analyses, because the optimized enriched population of target nucleic acids will make it more likely that these longer probes will have enough 'hits' to provide sufficient coverage.
  • the methods of the present invention can be used to identify individual mRNA molecules from a biological sample.
  • the majority of full-length mRNAs from any biological sample are longer than 2 kb. Since unique patterns of decoration (patterns of labeling with probes) can be generated for approximately 2 kb fragments in one reaction, the assay methods of the claimed invention are particularly useful in the high throughput, parallel processing of transcript information from a sample.
  • the resolution of detecting probes, particularly the number of bases between two bound probes that can be discriminated optically or tolerated physically, and the number of differentially labeled probes (dyes or intensity) that are used in a single assay reaction are both parameters that are of use in optimizing this aspect of the invention. These parameters can be optimized for the number and quantity of transcripts that are to be identified in a given biological sample.
  • Flow analysis of signature probes provides an efficient way to generate comprehensive digital gene expression data, including the identification of splice variants and any potential "gene editing" that occurred during transcription.
  • a single messenger per cell will be examined 100 times, on average.
  • redundancy provides highly accurate counting of each messenger from each gene, especially if 4-8 different signatures are prepared on 4-8 aliquots of ~100 cells each prepared from the same tissue.
  • decorated nucleic acids made using the methods described above are used to obtain sequence information without being stretched on a substrate. Often, decorated nucleic acids in solution can be used with certain combinations of probes and sets of probes to provide sequence information.
  • "signature probes" and "diagnostic probes" are utilized in a single assay to detect target sequences.
  • By "signature probe" is meant a probe complementary to a target sequence, generally a unique sequence, such as a genetic marker for disease.
  • By "diagnostic probe" is meant a probe comprising a sequence that is complementary to a sequence that allows the diagnostic probe to hybridize adjacent to the signature probe.
  • Diagnostic probes will generally provide information on any polymorphisms or sequences associated with a genetic mutation. For example, to detect the presence of a SNP, two differentially labeled diagnostic probes can be used, such that detection of one probe over the other indicates whether a polymorphism is present in the sample. In such assays, the diagnostic probes are differentially labeled from the signature probes to provide simultaneous identification of signature sequences and identification of diagnostic sequences (e.g., SNPs or sequences associated with a genetic mutation) on the target nucleic acids.
  • a four color system is used in the combinatorial assay of the invention: two colors are used to label signature probes and two colors are used to label diagnostic probes that identify sequence variants in a particular genomic location.
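How two differentially labeled diagnostic probes might be turned into a call at one site can be sketched as below. This is only an illustration: the function name `call_snp`, the minimum number of observations and the heterozygous band are hypothetical thresholds, not values from the specification, and the counts are assumed to come from molecules already located by their signature probes.

```python
def call_snp(ref_count, alt_count, min_reads=5, het_band=(0.3, 0.7)):
    """Call a diagnostic site from counts of the two diagnostic probe colors.

    min_reads and het_band are assumed thresholds chosen for the example.
    """
    total = ref_count + alt_count
    if total < min_reads:
        return "no call"
    alt_fraction = alt_count / total
    low, high = het_band
    if alt_fraction < low:
        return "ref/ref"
    if alt_fraction > high:
        return "alt/alt"
    return "ref/alt"

print(call_snp(18, 1))   # ref/ref
print(call_snp(9, 11))   # ref/alt
print(call_snp(0, 20))   # alt/alt
print(call_snp(1, 2))    # no call
```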
  • the number of fragments to be analyzed can be reduced by enriching for specific fragments harboring the gene or genes of interest. Methods of enriching for specific target sequences are discussed in more detail above.
  • a prenatal diagnostic panel including 100 critical genes can be interrogated simultaneously using the assay methods of the invention.
  • Each gene may have a specific site that is being interrogated, or each gene may have several diagnostic sites each.
  • a set of 2 x ~1000 probes can be used in the same assay.
  • the diagnostic sites are 500 bases or more apart to allow for individual hybridization of the probes and optimal resolution of the imaging of any hybridization events. For greater accuracy, each diagnostic site can be probed approximately 20 times on average to minimize the potential impact of false negative scores on the final results.
  • 2-probe signatures provide unique mapping of genomic fragments as short as about 20 kb (even assuming a ~20% false negative probe scoring rate).
  • One way to generate 2-probe signatures is to mix two 6-mers labeled with the same color. The other is to have less frequent probe sites and more variable distance measurements, e.g., 6-8 frequent distances.
  • the frequency of binding of the two probes is 1 in approximately 4^22, i.e., equivalent to the binding frequency of a 22-mer. 22-mers statistically occur once in a genome that is >1,000 times longer than the human genome.
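The comparison with the human genome can be checked numerically. The genome size of roughly 3 x 10^9 bases (haploid) is an assumption added here; it is not stated in the text.

```latex
\[
4^{22} = 2^{44} \approx 1.76 \times 10^{13},
\qquad
\frac{4^{22}}{3 \times 10^{9}} \approx 5.9 \times 10^{3} > 10^{3}
\]
```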
  • sequence information is obtained from decorated nucleic acids through sequencing-by-synthesis methods.
  • primers hybridize to probes on the decorated nucleic acid and their extension by a polymerase is detected.
  • Sequencing-by-synthesis methods are well known in the art and are described for example in U.S. Pat. Nos. 4,971,903; 6,828,100; 6,833,256; 6,911,345, as well as in Hyman, Anal. Biochem. 174:423 (1988); Rosenthal, International Patent Application Publication 761107 (1989); Metzker et al., Nucl. Acids Res. 22:4259 (1994); Jones, Biotechniques 22:938 (1997); Ronaghi et al., Anal. Biochem.
  • the present invention provides systems for obtaining sequence information from decorated nucleic acids.
  • systems will include assemblies that comprise nanostructures.
  • these systems will further include components for bridging the length scale extremes from the pipette to the nanochannel, from the millimeter to the nanometer scale.
  • systems of the invention include a package comprising an inlet port that leads to nanochannels, which in turn lead to an outlet port.
  • the system includes an imaging system combined with a hardware setup designed to accommodate and run a chip comprising assemblies comprising nanochannels.
  • systems include image acquisition software and a user interface to ensure ease of use and efficient data collection.
  • systems of the invention will include assemblies comprising a substrate such as that illustrated in Figure 13, which will comprise a non-patterned region and a set of linear features. Such systems may further include mechanisms for flowing nucleic acid-containing solutions across the substrate, such as pumps, electrodes, valves, as well as mechanisms for tilting the substrate to allow gravity to cause the flow of nucleic acids through the non-patterned region to the linear features.
  • such systems will also include an imaging system combined with a hardware setup designed to accommodate and run a chip comprising assemblies comprising nanochannels.
  • systems include image acquisition software and a user interface to ensure ease of use and efficient data collection.
  • Nucleic acids, for example DNA, can be isolated from a sample such as a drop of blood (Figure 9A).
  • a drop of blood will contain about 100,000 cells, from which genomic DNA can be isolated.
  • the DNA can then be fragmented into approximately 1000 fragments per chromosome. Such fragments are generally about 100 kb in length.
  • the fragments can then be dispensed with 4 or more differentially labeled probes (which will generally have about 5-7 informative bases) into a multiwell plate (Figure 9B).
  • the probes will hybridize or ligate to complementary sequences in the fragments as described in further detail above to form decorated fragments.
  • a substrate can be a nanochannel chip, although it will be appreciated that any number of substrates can be used, as described in further detail above.
  • a nanochannel chip can in some embodiments have approximately 4000 channels of 100 nm x 50 microns.
  • the DNA molecules are stretched by the flow force in the narrow channels. Imaging of the stretched DNA can be obtained - in some embodiments, such imaging is accomplished at approximately 40 frames per second. In further embodiments, multiple imagers may be used in parallel to obtain images from the stretched DNA.
  • DNA signatures can be extracted and analyzed from the images of the stretched DNA by processing the multicolor images to define the order and optionally the relative distances of the probes decorating each molecule (Figure 9D).
  • Such unique signatures will generally have approximately a 500 base resolution and can be mapped to a reference sequence ("RefSeq" in Figure 9D). A precise genome map can thus be obtained from the processed images. Complete sequence can be assembled for each parental chromosome in the patient's cells from millions of 100 kb overlapping signatures.
  • the present invention provides a composition that comprises a substrate comprising a plurality of locations. Each location of the substrate comprises a single molecule of stretched decorated nucleic acids. Each of the stretched nucleic acids comprises a plurality of probes, and the stretched decorated nucleic acids are positioned on the substrate in such a way that they are optically resolvable. In a further embodiment and in accordance with the above, each of the plurality of locations in a composition of the invention is a nanochannel.
  • the substrate comprises hydrophobic regions alternating with hydrophilic regions.
  • such alternating regions may be provided in a linear pattern.
  • stretched decorated nucleic acids of the invention are formed by: (i) nicking a nucleic acid to form a nicked nucleic acid, (ii) adding an exonuclease to the nicked nucleic acid to form a gapped nucleic acid, and (iii) adding a first set of labeled probes to the gapped nucleic acid such that at least one of the first set of labeled probes hybridizes to single stranded areas of said gapped nucleic acid.
  • the first set of probes comprises a plurality of non-overlapping probe sequences.
  • each probe sequence comprises a unique label.
  • steps (i) through (iii) are performed simultaneously or are performed sequentially.
  • a second set of labeled probes is added to the gapped nucleic acid to hybridize to single stranded areas of the gapped nucleic acid.
  • Such second set of probes may also comprise a plurality of non-overlapping probe sequences which each comprise a unique label.
  • probes from the first set and from the second set that are hybridized to adjacent positions of the gapped nucleic acid will hybridize to each other.
  • stretched decorated nucleic acids of the invention are formed by (i) providing a double stranded nucleic acid; (ii) adding a first set of recA invasive labeled probes to the double stranded nucleic acid to form D-loops within the double stranded nucleic acid, thus forming a decorated nucleic acid; and (iii) stretching the decorated nucleic acid to form a stretched decorated nucleic acid.
  • the recA invasive labeled probes comprise a plurality of non-overlapping probe sequences and each probe sequence comprises a unique label. Such probes hybridize to sequences in the double stranded nucleic acid that are complementary to the probe sequences.
  • a second set of recA invasive probes is added to the double stranded nucleic acid to form double D-loops within the double stranded nucleic acid.
  • the second set of recA invasive labeled probes are substantially complementary to the first set of recA invasive labeled probes.
  • the recA invasive labeled probes used in accordance with the invention comprise at least one modification selected from: locked nucleic acid, peptide nucleic acid and phosphorothioate nucleic acid.
  • gaps within a nucleic acid are repaired by adding a polymerase, dNTPs and a ligase. In a further embodiment, all gaps within the nucleic acid are repaired. In a still further embodiment, only a portion of the gaps are repaired.
  • the present invention provides methods for detecting the presence of a target nucleic acid by using stretched decorated nucleic acids of the invention. In one embodiment, a substrate comprising the stretched decorated nucleic acids is provided and at least one of the labeled probes of the stretched decorated nucleic acids is detected, thereby indicating the presence of the target nucleic acid.
  • decorated nucleic acids of the invention are stretched by applying the nucleic acids to a nanochannel.
  • different decorated nucleic acids may be applied to an assembly comprising a plurality of nanochannels.
  • different decorated nucleic acids may be applied sequentially through the same nanochannel.
  • decorated nucleic acids of the invention are stretched by applying the nucleic acids to a flowthrough system that comprises a surface that comprises hydrophobic regions alternating with hydrophilic regions.
  • the alternating hydrophobic and hydrophilic regions are arranged in a linear pattern.
  • different decorated nucleic acids are applied to different flowthrough systems.
  • a plurality of decorated nucleic acids are applied to the same flowthrough system.
  • the stretched decorated nucleic acids of the invention are substantially double stranded.
  • stretched decorated nucleic acids of the invention comprise probes that comprise a plurality of fluorophores.
  • stretched decorated nucleic acids of the invention comprise probes at least one of which is a dendrimeric probe.
  • the dendrimeric probe is a branched nucleic acid.
  • stretched decorated nucleic acids of the invention comprise probes that are hexamers.
  • stretched decorated nucleic acids of the invention are formed from nucleic acids obtained from a sample.
  • the sample is obtained from a target organism.
  • detecting the presence of a target nucleic acid in accordance with the present invention comprises identifying the presence of a pathogen in the sample.
  • detecting the presence of a target nucleic acid identifies the source of that target nucleic acid.
  • stretched decorated nucleic acids of the invention are used to obtain sequence information from a target nucleic acid.
  • a substrate comprising stretched decorated nucleic acids of the invention is provided.
  • the stretched decorated nucleic acids of the invention will generally comprise a plurality of labeled probes. The order of the labeled probes on the stretched decorated nucleic acid is determined, and that order thereby provides sequence information for the target nucleic acid.
  • different stretched nucleic acids comprise a different set of labeled probes, and determining the order of each of the different set of labeled probes provides information about the sequence of a target nucleic acid.
  • each set of labeled probes comprises a plurality of non-overlapping probe sequences that are different from the probe sequences of the other sets of labeled probes.
  • each probe sequence comprises a unique label.
  • the set of labeled probes comprises four hexamers.
  • a plurality of decorated nucleic acids are stretched on a single substrate and the order of probes is determined for one or more of the plurality of nucleic acids.
  • the plurality of decorated nucleic acids are obtained from a set of decorated nucleic acids.
  • the set of decorated nucleic acids are formed from the same target nucleic acid.
  • different decorated nucleic acids are stretched on different substrates, and the order of probes for one or more of the different decorated nucleic acids is determined.
  • the different decorated nucleic acids are aliquots from a set of decorated nucleic acids.
  • the set of decorated nucleic acids are formed from the same target nucleic acid.
  • Example 1 Optimizing buffers for use in decorating nucleic acids
  • a buffer that is compatible with all four of these enzymes is needed.
  • the following buffer is of use in the present invention:
  • Example 2 Restriction enzyme digestion of ApoB DNA upon treatment with nicking enzymes, exo III and probe hybridization
  • An ApoB 4.86 kb PCR fragment was nicked with Nb.BbvCI or Nt.BbvCI at 37°C for 2 hours followed by exo III treatment for 2 minutes at 22°C. The nuclease was then heat-inactivated at 80°C for 20 minutes. A 24-mer Nb.BbvCI probe was added to the exo III mixture for probe hybridization at 25°C for 50 minutes followed by 15 minutes of T4 ligase treatment. Restriction enzyme digestion demonstrates that the 24-mer Nb.BbvCI probe hybridized to the gaps generated at the Nb.BbvCI nicks, but not the gaps generated at the Nt.BbvCI nicks.
  • the decorated nucleic acids (Lambda DNA) were attached to a coverslip coated with poly(allylamine) and poly(acrylic acid), resulting in their stretching.
  • Nucleic acids were decorated with probes labeled with Qdot 605 (Invitrogen) with biotin alkyl amines attached to the 5' ends.
  • the decorated nucleic acids were stained with the intercalating dye YOYO-1.
  • YOYO-1 was imaged with 488 nm excitation and 520 nm emission.
  • the Qdot 605 was imaged with 590 nm and 630 nm emission. The images were overlaid to show co-localization of the probes and the double-stranded nucleic acid.
  • Qdots are bright and allow for visualization of probes in the expected locations of nucleic acids nicked at a specific site with Nt.BspQI and with a gap opened by limited treatment with exonuclease III.
  • the decorated nucleic acids were diluted to 100 pM in sterile TE buffer (10 mM Tris pH 8.0, 1 mM EDTA)
  • YOYO-1 iodide 300 pM
  • the coverslips were treated by multiple steps with poly(allylamine) and poly(acrylic acid) immersions to minimize background emission and provide uniform stretching of DNA molecules.
  • a segment of human genomic DNA to be sequenced has the following wild-type reference sequence as determined through compilation of empirical data:

Abstract

The present invention relates to sequence interrogation chemistry that combines the accuracy and haplotype integrity of long-read sequencing with improved methods of preparing genomic nucleic acids and of analyzing sequence information generated from those nucleic acids. The present invention also relates to compositions comprising decorated nucleic acids stretched on substrates. The present invention further relates to methods of making stretched decorated nucleic acids and to methods of using decorated nucleic acids to obtain sequence information.
PCT/US2008/080045 2007-10-15 2008-10-15 Sequence analysis using decorated nucleic acids WO2009052214A2 (fr)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US98013207P 2007-10-15 2007-10-15
US60/980,132 2007-10-15
US98030607P 2007-10-16 2007-10-16
US60/980,306 2007-10-16
US98071107P 2007-10-17 2007-10-17
US60/980,711 2007-10-17
US98104607P 2007-10-18 2007-10-18
US60/981,046 2007-10-18

Publications (2)

Publication Number Publication Date
WO2009052214A2 true WO2009052214A2 (fr) 2009-04-23
WO2009052214A3 WO2009052214A3 (fr) 2009-06-25

Family

ID=40561776

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/080045 WO2009052214A2 (fr) 2007-10-15 2008-10-15 Sequence analysis using decorated nucleic acids

Country Status (2)

Country Link
US (1) US8951731B2 (fr)
WO (1) WO2009052214A2 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
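WO2015100473A1 (fr) * 2014-01-02 2015-07-09 The University Of Queensland Sequencing method and apparatus
CN105452482A (zh) * 2013-03-13 2016-03-30 北卡罗来纳-查佩尔山大学 (The University Of North Carolina At Chapel Hill) Nanofluidic devices for the rapid mapping of whole genomes, and related analysis systems and methods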
US10471428B2 (en) 2015-05-11 2019-11-12 The University Of North Carolina At Chapel Hill Fluidic devices with nanoscale manifolds for molecular transport, related systems and methods of analysis
US10996212B2 (en) 2012-02-10 2021-05-04 The University Of North Carolina At Chapel Hill Devices and systems with fluidic nanofunnels for processing single molecules

Families Citing this family (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070190542A1 (en) * 2005-10-03 2007-08-16 Ling Xinsheng S Hybridization assisted nanopore sequencing
US8278047B2 (en) 2007-10-01 2012-10-02 Nabsys, Inc. Biopolymer sequencing by hybridization of probes to form ternary complexes and variable range alignment
AU2008308457A1 (en) * 2007-10-04 2009-04-09 Halcyon Molecular Sequencing nucleic acid polymers with electron microscopy
US10093552B2 (en) 2008-02-22 2018-10-09 James Weifu Lee Photovoltaic panel-interfaced solar-greenhouse distillation systems
US9650668B2 (en) 2008-09-03 2017-05-16 Nabsys 2.0 Llc Use of longitudinally displaced nanoscale electrodes for voltage sensing of biomolecules and other analytes in fluidic channels
WO2010028140A2 (fr) * 2008-09-03 2010-03-11 Nabsys, Inc. Use of longitudinally displaced nanoscale electrodes for voltage sensing of biomolecules and other analytes in fluidic channels
US8262879B2 (en) 2008-09-03 2012-09-11 Nabsys, Inc. Devices and methods for determining the length of biopolymers and distances between probes bound thereto
WO2010042007A1 (fr) * 2008-10-10 2010-04-15 Jonas Tegenfeldt Method for mapping the local AT/GC ratio along the length of a DNA fragment
EP2394164A4 (fr) * 2009-02-03 2014-01-08 Complete Genomics Inc Oligomer sequences mapping
US8731843B2 (en) * 2009-02-03 2014-05-20 Complete Genomics, Inc. Oligomer sequences mapping
WO2010091023A2 (fr) * 2009-02-03 2010-08-12 Complete Genomics, Inc. Indexing a reference sequence for oligomer sequence mapping
US8455260B2 (en) 2009-03-27 2013-06-04 Massachusetts Institute Of Technology Tagged-fragment map assembly
WO2010111605A2 (fr) * 2009-03-27 2010-09-30 Nabsys, Inc. Devices and methods for analyzing biomolecules and probes bound thereto
DK2511843T3 (en) * 2009-04-29 2017-03-27 Complete Genomics Inc METHOD AND SYSTEM FOR DETERMINING VARIATIONS IN A SAMPLE POLYNUCLEOTIDE SEQUENCE IN TERMS OF A REFERENCE POLYNUCLEOTIDE SEQUENCE
WO2010141131A1 (fr) 2009-06-04 2010-12-09 Lockheed Martin Corporation Multiple-sample microfluidic chip for DNA analysis
US9023769B2 (en) 2009-11-30 2015-05-05 Complete Genomics, Inc. cDNA library for nucleic acid sequencing
US8715933B2 (en) 2010-09-27 2014-05-06 Nabsys, Inc. Assay methods using nicking endonucleases
US8725422B2 (en) 2010-10-13 2014-05-13 Complete Genomics, Inc. Methods for estimating genome-wide copy number variations
CA2814720C (fr) 2010-10-15 2016-12-13 Lockheed Martin Corporation Microfluidic optical design
FR2966844A1 (fr) * 2010-11-03 2012-05-04 Vivatech Genomic analysis method
US8859201B2 (en) 2010-11-16 2014-10-14 Nabsys, Inc. Methods for sequencing a biomolecule by detecting relative positions of hybridized probes
WO2012109574A2 (fr) * 2011-02-11 2012-08-16 Nabsys, Inc. Assay methods using DNA binding proteins
WO2013039778A2 (fr) 2011-09-12 2013-03-21 The University Of North Carolina At Chapel Hill Devices with a fluid transport nanochannel intersected by a fluid sensing nanochannel and related methods
US10837879B2 (en) 2011-11-02 2020-11-17 Complete Genomics, Inc. Treatment for stabilizing nucleic acid arrays
WO2013109731A1 (fr) * 2012-01-18 2013-07-25 Singular Bio Inc. Methods for mapping bar-coded molecules for structural variation detection and sequencing
US9322054B2 (en) 2012-02-22 2016-04-26 Lockheed Martin Corporation Microfluidic cartridge
CN104350067A (zh) 2012-04-10 2015-02-11 Oxford Nanopore Technologies Ltd. Mutant lysenin pores
EP2844771A4 (fr) 2012-05-04 2015-12-02 Complete Genomics Inc Methods for determining absolute genome-wide copy number variations of complex tumors
US20150197787A1 (en) * 2012-08-02 2015-07-16 Qiagen Gmbh Recombinase mediated targeted dna enrichment for next generation sequencing
US9914966B1 (en) 2012-12-20 2018-03-13 Nabsys 2.0 Llc Apparatus and methods for analysis of biomolecules using high frequency alternating current excitation
EP2956550B1 (fr) 2013-01-18 2020-04-08 Nabsys 2.0 LLC Enhanced probe binding
US9411930B2 (en) 2013-02-01 2016-08-09 The Regents Of The University Of California Methods for genome assembly and haplotype phasing
GB2547875B (en) 2013-02-01 2017-12-13 Univ California Methods for meta-genomics analysis of microbes
EP2954069B1 (fr) * 2013-02-05 2020-11-11 Bionano Genomics, Inc. Methods for single-molecule analysis
KR20150132125A (ko) 2013-02-28 2015-11-25 The University Of North Carolina At Chapel Hill Nanofluidic devices with integrated components for the controlled capture, trapping, and transport of macromolecules and related methods of analysis
US10221450B2 (en) 2013-03-08 2019-03-05 Oxford Nanopore Technologies Ltd. Enzyme stalling method
US9328382B2 (en) 2013-03-15 2016-05-03 Complete Genomics, Inc. Multiple tagging of individual long DNA fragments
GB201313477D0 (en) 2013-07-29 2013-09-11 Univ Leuven Kath Nanopore biosensors for detection of proteins and nucleic acids
WO2014196854A1 (fr) * 2013-06-03 2014-12-11 Stichting Vu-Vumc Method and system for imaging a molecular strand
JP2015035212A (ja) * 2013-07-29 2015-02-19 Agilent Technologies, Inc. Method for finding variants from targeted sequencing panels
CN105637099B (zh) 2013-08-23 2020-05-19 MGI Tech Co., Ltd. (Shenzhen) Long fragment de novo assembly using short reads
AU2015208919B9 (en) 2014-01-22 2021-04-01 Oxford Nanopore Technologies Limited Method for attaching one or more polynucleotide binding proteins to a target polynucleotide
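CN106574300A (zh) 2014-05-02 2017-04-19 Oxford Nanopore Technologies Ltd. Method of improving the movement of a target polynucleotide with respect to a transmembrane pore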
US10526641B2 (en) 2014-08-01 2020-01-07 Dovetail Genomics, Llc Tagging nucleic acids for sequence assembly
JP6777966B2 (ja) 2015-02-17 2020-10-28 Dovetail Genomics, LLC Nucleic acid sequence assembly
US11807896B2 (en) 2015-03-26 2023-11-07 Dovetail Genomics, Llc Physical linkage preservation in DNA storage
JP7300831B2 (ja) 2015-10-19 2023-06-30 Dovetail Genomics, LLC Methods for genome assembly, haplotype phasing, and target-independent nucleic acid detection
AU2017223600B2 (en) 2016-02-23 2023-08-03 Dovetail Genomics Llc Generation of phased read-sets for genome assembly and haplotype phasing
US11186868B2 (en) 2016-03-02 2021-11-30 Oxford Nanopore Technologies Plc Mutant pore
CN116514944A (zh) 2016-04-06 2023-08-01 Oxford Nanopore Technologies Plc Mutant pore
SG11201810088SA (en) 2016-05-13 2018-12-28 Dovetail Genomics Llc Recovering long-range linkage information from preserved samples
SG11201913174PA (en) 2017-06-30 2020-01-30 Vib Vzw Novel protein pores
CN112898575B (zh) * 2019-12-03 2022-10-21 Research Institute of Tsinghua University in Shenzhen Method for preparing nucleotides modified with dendritic macromolecules
US11427855B1 (en) 2021-06-17 2022-08-30 Element Biosciences, Inc. Compositions and methods for pairwise sequencing
US11859241B2 (en) 2021-06-17 2024-01-02 Element Biosciences, Inc. Compositions and methods for pairwise sequencing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001009384A2 (fr) * 1999-07-29 2001-02-08 Genzyme Corporation Serial analysis of genetic alterations
WO2005042763A2 (fr) * 2003-10-28 2005-05-12 Bioarray Solutions Ltd. Optimization of gene expression analysis using immobilized capture probes

Family Cites Families (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994373A (en) 1983-01-27 1991-02-19 Enzo Biochem, Inc. Method and structures employing chemically-labelled polynucleotide probes
US4719179A (en) 1984-11-30 1988-01-12 Pharmacia P-L Biochemicals, Inc. Six base oligonucleotide linkers and methods for their use
US4883750A (en) 1984-12-13 1989-11-28 Applied Biosystems, Inc. Detection of specific sequences in nucleic acids
US5175270A (en) * 1986-09-10 1992-12-29 Polyprobe, Inc. Reagents for detecting and assaying nucleic acid sequences
US6270961B1 (en) 1987-04-01 2001-08-07 Hyseq, Inc. Methods and apparatus for DNA sequencing and DNA identification
US5202231A (en) 1987-04-01 1993-04-13 Drmanac Radoje T Method of sequencing of genomes by hybridization of oligonucleotide probes
US5525464A (en) 1987-04-01 1996-06-11 Hyseq, Inc. Method of sequencing by hybridization of oligonucleotide probes
US5124246A (en) 1987-10-15 1992-06-23 Chiron Corporation Nucleic acid multimers and amplified nucleic acid hybridization assays using same
US6147198A (en) 1988-09-15 2000-11-14 New York University Methods and compositions for the manipulation and characterization of individual nucleic acid molecules
US6150089A (en) 1988-09-15 2000-11-21 New York University Method and characterizing polymer molecules or the like
JPH02176457A (ja) 1988-09-15 1990-07-09 Carnegie Inst Of Washington Pulse-oriented electrophoresis
US5720928A (en) 1988-09-15 1998-02-24 New York University Image processing and analysis of individual nucleic acid molecules
US6610256B2 (en) 1989-04-05 2003-08-26 Wisconsin Alumni Research Foundation Image processing and analysis of individual nucleic acid molecules
US5091302A (en) 1989-04-27 1992-02-25 The Blood Center Of Southeastern Wisconsin, Inc. Polymorphism of human platelet membrane glycoprotein iiia and diagnostic and therapeutic applications thereof
US5744101A (en) 1989-06-07 1998-04-28 Affymax Technologies N.V. Photolabile nucleoside protecting groups
US5143854A (en) 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
US6416952B1 (en) 1989-06-07 2002-07-09 Affymetrix, Inc. Photolithographic and other means for manufacturing arrays
US5800992A (en) 1989-06-07 1998-09-01 Fodor; Stephen P.A. Method of detecting nucleic acids
US6346413B1 (en) 1989-06-07 2002-02-12 Affymetrix, Inc. Polymer arrays
US5427930A (en) * 1990-01-26 1995-06-27 Abbott Laboratories Amplification of target nucleic acids using gap filling ligase chain reaction
CA2036946C (fr) 1990-04-06 2001-10-16 Kenneth V. Deugau Indexing linker molecules
US5223414A (en) 1990-05-07 1993-06-29 Sri International Process for nucleic acid hybridization and amplification
US5273881A (en) 1990-05-07 1993-12-28 Daikin Industries, Ltd. Diagnostic applications of double D-loop formation
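JP3080178B2 (ja) 1991-02-18 2000-08-21 Toyobo Co., Ltd. Method for amplifying nucleic acid sequences and reagent kit therefor
JP3085409B2 (ja) 1991-03-29 2000-09-11 Toyobo Co., Ltd. Method for detecting target nucleic acid sequences and reagent kit therefor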
US6589726B1 (en) 1991-09-04 2003-07-08 Metrigen, Inc. Method and apparatus for in situ synthesis on a solid support
US5474796A (en) 1991-09-04 1995-12-12 Protogene Laboratories, Inc. Method and apparatus for conducting an array of chemical reactions on a support surface
US5218196A (en) 1991-09-05 1993-06-08 Frost Controls, Inc. Light curtain system with system and watchdog microcontrollers
PT969102E (pt) 1991-09-24 2008-03-25 Keygene Nv Primers, kits and sets of restriction fragments used in selective restriction fragment amplification
US5632957A (en) 1993-11-01 1997-05-27 Nanogen Molecular biological diagnostic systems including electrodes
GB9214873D0 (en) 1992-07-13 1992-08-26 Medical Res Council Process for categorising nucleotide sequence populations
WO1994016104A1 (fr) * 1993-01-08 1994-07-21 Ctrc Research Foundation Color imaging system for use in molecular biology
US6401267B1 (en) 1993-09-27 2002-06-11 Radoje Drmanac Methods and compositions for efficient nucleic acid sequencing
FR2716263B1 (fr) * 1994-02-11 1997-01-17 Pasteur Institut Method for aligning macromolecules by passage of a meniscus and applications in a method for detecting, separating and/or assaying a macromolecule in a sample.
SE9400522D0 (sv) 1994-02-16 1994-02-16 Ulf Landegren Method and reagent for detecting specific nucleotide sequences
US5641658A (en) 1994-08-03 1997-06-24 Mosaic Technologies, Inc. Method for performing amplification of nucleic acid with two primers bound to a single solid support
US6013445A (en) 1996-06-06 2000-01-11 Lynx Therapeutics, Inc. Massively parallel signature sequencing by ligation of encoded adaptors
US6362002B1 (en) 1995-03-17 2002-03-26 President And Fellows Of Harvard College Characterization of individual polymer molecules based on monomer-interface interactions
US5866337A (en) 1995-03-24 1999-02-02 The Trustees Of Columbia University In The City Of New York Method to detect mutations in a nucleic acid using a hybridization-ligation procedure
JP4511636B2 (ja) 1995-04-03 2010-07-28 Wisconsin Alumni Research Foundation Method for measuring physical characteristics of nucleic acids by microscopic imaging
US8142708B2 (en) 1995-04-03 2012-03-27 Wisconsin Alumni Research Foundation Micro fluidic system for single molecule imaging
US5750341A (en) 1995-04-17 1998-05-12 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US5851769A (en) * 1995-09-27 1998-12-22 The Regents Of The University Of California Quantitative DNA fiber mapping
US6518189B1 (en) 1995-11-15 2003-02-11 Regents Of The University Of Minnesota Method and apparatus for high density nanostructures
US6143495A (en) 1995-11-21 2000-11-07 Yale University Unimolecular segment amplification and sequencing
CA2238003C (fr) 1995-12-01 2005-02-22 Innogenetics N.V. Impedance-measurement detection system and method of producing it
US5867266A (en) 1996-04-17 1999-02-02 Cornell Research Foundation, Inc. Multiple optical channels for chemical analysis
US5851804A (en) 1996-05-06 1998-12-22 Apollon, Inc. Chimeric kanamycin resistance gene
GB9620209D0 (en) 1996-09-27 1996-11-13 Cemu Bioteknik Ab Method of sequencing DNA
US6309824B1 (en) 1997-01-16 2001-10-30 Hyseq, Inc. Methods for analyzing a target nucleic acid using immobilized heterogeneous mixtures of oligonucleotide probes
US6297006B1 (en) 1997-01-16 2001-10-02 Hyseq, Inc. Methods for sequencing repetitive sequences and for determining the order of sequence subfragments
US5948653A (en) 1997-03-21 1999-09-07 Pati; Sushma Sequence alterations using homologous recombination
AU6846798A (en) 1997-04-01 1998-10-22 Glaxo Group Limited Method of nucleic acid sequencing
US5888737A (en) 1997-04-15 1999-03-30 Lynx Therapeutics, Inc. Adaptor-based sequence analysis
US20040229221A1 (en) 1997-05-08 2004-11-18 Trustees Of Columbia University In The City Of New York Method to detect mutations in a nucleic acid using a hybridization-ligation procedure
US6136537A (en) 1998-02-23 2000-10-24 Macevicz; Stephen C. Gene expression analysis
DK1056884T3 (da) * 1998-02-27 2002-02-25 Pamgene Bv Method for the non-specific amplification of nucleic acid
US6004755A (en) 1998-04-07 1999-12-21 Incyte Pharmaceuticals, Inc. Quantitative microarray hybridizaton assays
US6355419B1 (en) * 1998-04-27 2002-03-12 Hyseq, Inc. Preparation of pools of nucleic acids based on representation in a sample
US6255469B1 (en) 1998-05-06 2001-07-03 New York University Periodic two and three dimensional nucleic acid structures
WO2000004193A1 (fr) 1998-07-20 2000-01-27 Yale University Method for detecting nucleic acids using target-mediated ligation of bipartite primers
US6787308B2 (en) 1998-07-30 2004-09-07 Solexa Ltd. Arrayed biomolecules and their use in sequencing
EP1105529B2 (fr) 1998-07-30 2013-05-29 Illumina Cambridge Limited Arrayed biomolecules and their use in sequencing
WO2000014282A1 (fr) 1998-09-04 2000-03-16 Lynx Therapeutics, Inc. Method of screening for genetic polymorphism
US6267872B1 (en) 1998-11-06 2001-07-31 The Regents Of The University Of California Miniature support for thin films containing single channels or nanopores and methods for using same
EP2145963A1 (fr) 1999-01-06 2010-01-20 Callida Genomics, Inc. Enhanced sequencing by hybridization using pools of probes
CA2356697C (fr) 1999-01-06 2010-06-22 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
GB9901475D0 (en) 1999-01-22 1999-03-17 Pyrosequencing Ab A method of DNA sequencing
US6514768B1 (en) 1999-01-29 2003-02-04 Surmodics, Inc. Replicable probe array
US6544732B1 (en) 1999-05-20 2003-04-08 Illumina, Inc. Encoding and decoding of array sensors utilizing nanocrystals
US6573369B2 (en) 1999-05-21 2003-06-03 Bioforce Nanosciences, Inc. Method and apparatus for solid state molecular analysis
EP2383776B1 (fr) 1999-06-22 2015-02-25 President and Fellows of Harvard College Solid state nanopore device for evaluating biopolymers
US6464842B1 (en) 1999-06-22 2002-10-15 President And Fellows Of Harvard College Control of solid state dimensional features
US7258838B2 (en) 1999-06-22 2007-08-21 President And Fellows Of Harvard College Solid state molecular probe device
US7501245B2 (en) 1999-06-28 2009-03-10 Helicos Biosciences Corp. Methods and apparatuses for analyzing polynucleotide sequences
US6818395B1 (en) 1999-06-28 2004-11-16 California Institute Of Technology Methods and apparatus for analyzing polynucleotide sequences
US6472156B1 (en) 1999-08-30 2002-10-29 The University Of Utah Homogeneous multiplex hybridization analysis by color and Tm
US7244559B2 (en) 1999-09-16 2007-07-17 454 Life Sciences Corporation Method of sequencing a nucleic acid
WO2001023610A2 (fr) 1999-09-29 2001-04-05 Solexa Ltd. Polynucleotide sequencing
US6297016B1 (en) 1999-10-08 2001-10-02 Applera Corporation Template-dependent ligation with PNA-DNA chimeric probes
US6500620B2 (en) 1999-12-29 2002-12-31 Mergen Ltd. Methods for amplifying and detecting multiple polynucleotides on a solid phase support
DE60143723D1 (de) 2000-02-07 2011-02-03 Illumina Inc Nucleic acid detection methods using universal priming
EP1255865B1 (fr) 2000-02-07 2007-04-18 Illumina, Inc. Method for detecting nucleic acids using universal primers
US6913884B2 (en) 2001-08-16 2005-07-05 Illumina, Inc. Compositions and methods for repetitive use of genomic DNA
AU2001241723A1 (en) 2000-02-25 2001-09-03 Affymetrix, Inc. Methods for multi-stage solid phase amplification of nucleic acids
US20020004204A1 (en) 2000-02-29 2002-01-10 O'keefe Matthew T. Microarray substrate with integrated photodetector and methods of use thereof
US6413722B1 (en) 2000-03-22 2002-07-02 Incyte Genomics, Inc. Polymer coated surfaces for microarray applications
WO2002065515A2 (fr) 2001-02-14 2002-08-22 Science & Technology Corporation @ Unm Nanostructured separation and analysis devices
EP2801624B1 (fr) 2001-03-16 2019-03-06 Singular Bio, Inc Arrays and methods of use
EP1414840A4 (fr) 2001-03-27 2005-04-13 Univ Delaware Genomics applications for modified oligonucleotides
US7338760B2 (en) * 2001-10-26 2008-03-04 Ntu Ventures Private Limited Sample preparation integrated chip
DE10153829A1 (de) 2001-11-05 2003-05-28 Bayer Ag Assay based on doped nanoparticles
GB2382137A (en) 2001-11-20 2003-05-21 Mats Gullberg Nucleic acid enrichment
US7011945B2 (en) 2001-12-21 2006-03-14 Eastman Kodak Company Random array of micro-spheres for the analysis of nucleic acids
US20040002090A1 (en) 2002-03-05 2004-01-01 Pascal Mayer Methods for detecting genome-wide sequence variations associated with a phenotype
EP1572860B1 (fr) 2002-04-16 2018-12-05 Princeton University Gradient structures interfacing microfluidic and nanofluidic elements, and methods for their fabrication and use
US20050019776A1 (en) 2002-06-28 2005-01-27 Callow Matthew James Universal selective genome amplification and universal genotyping system
EP1556506A1 (fr) 2002-09-19 2005-07-27 The Chancellor, Masters And Scholars Of The University Of Oxford Molecular arrays and single molecule detection
JP2006517798A (ja) 2003-02-12 2006-08-03 Genizon Svenska AB Methods and means for nucleic acid sequencing
CN1791682B (zh) 2003-02-26 2013-05-22 Callida Genomics, Inc. Random array DNA analysis by hybridization
CA2518452A1 (fr) * 2003-03-11 2004-09-23 Gene Check, Inc. RecA-assisted allele-specific oligonucleotide extension method for detecting mutations, SNPs (single nucleotide polymorphisms) and specific sequences
EP1685380A2 (fr) 2003-09-18 2006-08-02 Parallele Bioscience, Inc. System and methods for enhancing signal-to-noise ratios of microarray-based measurements
GB0324456D0 (en) 2003-10-20 2003-11-19 Isis Innovation Parallel DNA sequencing methods
EP2202322A1 (fr) 2003-10-31 2010-06-30 AB Advanced Genetic Analysis Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
US7169560B2 (en) 2003-11-12 2007-01-30 Helicos Biosciences Corporation Short cycle methods for sequencing polynucleotides
GB0402895D0 (en) 2004-02-10 2004-03-17 Solexa Ltd Arrayed polynucleotides
ATE463584T1 (de) 2004-02-19 2010-04-15 Helicos Biosciences Corp Methods for analyzing polynucleotide sequences
AU2005216549A1 (en) 2004-02-27 2005-09-09 President And Fellows Of Harvard College Polony fluorescent in situ sequencing beads
US20050214840A1 (en) 2004-03-23 2005-09-29 Xiangning Chen Restriction enzyme mediated method of multiplex genotyping
US7238485B2 (en) 2004-03-23 2007-07-03 President And Fellows Of Harvard College Methods and apparatus for characterizing polynucleotides
GB2413796B (en) 2004-03-25 2006-03-29 Global Genomics Ab Methods and means for nucleic acid sequencing
US20050260609A1 (en) 2004-05-24 2005-11-24 Lapidus Stanley N Methods and devices for sequencing nucleic acids
US7635562B2 (en) 2004-05-25 2009-12-22 Helicos Biosciences Corporation Methods and devices for nucleic acid sequence determination
US20070117104A1 (en) 2005-11-22 2007-05-24 Buzby Philip R Nucleotide analogs
US20060012793A1 (en) 2004-07-19 2006-01-19 Helicos Biosciences Corporation Apparatus and methods for analyzing samples
US7276720B2 (en) 2004-07-19 2007-10-02 Helicos Biosciences Corporation Apparatus and methods for analyzing samples
WO2006073504A2 (fr) 2004-08-04 2006-07-13 President And Fellows Of Harvard College Wobble sequencing
GB0422551D0 (en) 2004-10-11 2004-11-10 Univ Liverpool Labelling and sequencing of nucleic acids
WO2006055521A2 (fr) 2004-11-16 2006-05-26 Helicos Biosciences Corporation Optical train and method for TIRF single molecule detection and analysis
EP2239342A3 (fr) 2005-02-01 2010-11-03 AB Advanced Genetic Analysis Corporation Reagents, methods and libraries for bead-based sequencing
ATE529734T1 (de) 2005-04-06 2011-11-15 Harvard College Molecular characterization with carbon nanotube control
US8445194B2 (en) 2005-06-15 2013-05-21 Callida Genomics, Inc. Single molecule arrays for genetic and chemical analysis
US7666593B2 (en) 2005-08-26 2010-02-23 Helicos Biosciences Corporation Single molecule sequencing of captured nucleic acids
WO2007133831A2 (fr) 2006-02-24 2007-11-22 Callida Genomics, Inc. High throughput genome sequencing on DNA arrays
US7960105B2 (en) 2005-11-29 2011-06-14 National Institutes Of Health Method of DNA analysis using micro/nanochannel
EP1987162A4 (fr) 2006-01-23 2009-11-25 Population Genetics Technologi Nucleic acid analysis using sequence tokens
WO2007092538A2 (fr) 2006-02-07 2007-08-16 President And Fellows Of Harvard College Methods for making nucleotide probes for sequencing and synthesis
SG10201405158QA (en) 2006-02-24 2014-10-30 Callida Genomics Inc High throughput genome sequencing on dna arrays
US7910302B2 (en) 2006-10-27 2011-03-22 Complete Genomics, Inc. Efficient arrays of amplified polynucleotides
US20090111705A1 (en) 2006-11-09 2009-04-30 Complete Genomics, Inc. Selection of dna adaptor orientation by hybrid capture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BJÖRK PER ET AL: "Single molecular imaging and spectroscopy of conjugated polyelectrolytes decorated on stretched aligned DNA." NANO LETTERS OCT 2005, vol. 5, no. 10, October 2005 (2005-10), pages 1948-1953, XP002525621 ISSN: 1530-6984 *
BJÖRK PER ET AL: "Soft lithographic printing of patterns of stretched DNA and DNA/electronic polymer wires by surface-energy modification and transfer." SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) AUG 2006, vol. 2, no. 8-9, August 2006 (2006-08), pages 1068-1074, XP002525620 ISSN: 1613-6829 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10996212B2 (en) 2012-02-10 2021-05-04 The University Of North Carolina At Chapel Hill Devices and systems with fluidic nanofunnels for processing single molecules
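CN105452482A (zh) * 2013-03-13 2016-03-30 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis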
US9970898B2 (en) 2013-03-13 2018-05-15 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis
US10106848B2 (en) 2013-03-13 2018-10-23 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis
US10571428B2 (en) 2013-03-13 2020-02-25 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis
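CN105452482B (zh) * 2013-03-13 2020-03-06 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis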
US11067537B2 (en) 2013-03-13 2021-07-20 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis
US11307171B2 (en) 2013-03-13 2022-04-19 The University Of North Carolina At Chapel Hill Nanofluidic devices for the rapid mapping of whole genomes and related systems and methods of analysis
WO2015100473A1 (fr) * 2014-01-02 2015-07-09 The University Of Queensland Sequencing method and apparatus
US10471428B2 (en) 2015-05-11 2019-11-12 The University Of North Carolina At Chapel Hill Fluidic devices with nanoscale manifolds for molecular transport, related systems and methods of analysis

Also Published As

Publication number Publication date
US20090111115A1 (en) 2009-04-30
US8951731B2 (en) 2015-02-10
WO2009052214A3 (fr) 2009-06-25

Similar Documents

Publication Publication Date Title
US8951731B2 (en) Sequence analysis using decorated nucleic acids
US11835437B2 (en) Treatment for stabilizing nucleic acid arrays
AU698553B2 (en) Parallel primer extension approach to nucleic acid sequence analysis
JP2022122964A (ja) Method for detecting a target nucleic acid in a sample
US9267172B2 (en) Efficient base determination in sequencing reactions
AU2007249635B2 (en) High throughput genome sequencing on DNA arrays
EP2227563B1 (fr) Efficient base determination in sequencing reactions
US10072287B2 (en) Methods of targeted sequencing
US6153379A (en) Parallel primer extension approach to nucleic acid sequence analysis
WO2008058282A2 (fr) Methods and compositions for large-scale analysis of nucleic acids using DNA deletions
US10174368B2 (en) Methods and systems for sequencing long nucleic acids
EP2610351B1 (fr) Efficient base determination in sequencing reactions
US7001722B1 (en) Parallel primer extension approach to nucleic acid sequence analysis
AU2014250690B2 (en) High throughput genome sequencing on DNA arrays
AU2013202989A1 (en) Efficient base determination in sequencing reactions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08839496

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A OF 30-06-2010)

122 Ep: pct application non-entry in european phase

Ref document number: 08839496

Country of ref document: EP

Kind code of ref document: A2