WO2013074632A1 - Mismatch nucleotide purification and identification - Google Patents

Mismatch nucleotide purification and identification

Info

Publication number
WO2013074632A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
mismatch
heterohybrid
fragments
nucleotide
Prior art date
Application number
PCT/US2012/065018
Other languages
French (fr)
Inventor
Floyd D. Rose
Andrew C. Hiatt
Original Assignee
Rose Floyd D
Hiatt Andrew C
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rose Floyd D, Hiatt Andrew C
Publication of WO2013074632A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 - Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2521/00 - Reaction characterised by the enzymatic activity
    • C12Q2521/30 - Phosphoric diester hydrolysing, i.e. nuclease
    • C12Q2521/301 - Endonuclease
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2525/00 - Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
    • C12Q2525/10 - Modifications characterised by
    • C12Q2525/186 - Modifications characterised by incorporating a non-extendable or blocking moiety
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/156 - Polymorphic or mutational markers

Definitions

  • MISMATCH NUCLEOTIDE PURIFICATION AND IDENTIFICATION, the disclosure of which is hereby incorporated herein by reference.
  • the human nuclear genome comprises approximately 3 x 10^9 base pairs of DNA.
  • the nucleotide differences, when comparing the genome of one individual to that of another individual, may be less than 0.02% of the total.
  • the primary differences between human genomes are single nucleotide polymorphisms (SNPs) occurring at single nucleotides. These polymorphisms also account for the heterozygosity of allelic DNA in multiploid genomes. It has been estimated that in genomic DNA single base-pair variations may be found at approximately 1200-nucleotide intervals, suggesting that there may be 2-3 x 10^6 SNPs in total. However, since individual genomes will have the majority of these SNPs in common (and these are therefore not SNPs relative to each other), the actual number of SNPs when comparing the genomes of two individuals is probably far lower.
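The two estimates above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text (3 x 10^9 base pairs, one variation per roughly 1200 nucleotides, and pairwise differences below 0.02%); the function and variable names are illustrative and do not come from the source.

```python
# Figures quoted in the text above.
GENOME_SIZE_BP = 3e9          # approximate size of the human nuclear genome
SNP_INTERVAL_BP = 1200        # approximate spacing of single base-pair variations
MAX_PAIRWISE_FRACTION = 0.0002  # "less than 0.02%" between two individuals


def estimated_snp_count(genome_size_bp, interval_bp):
    """Total SNP sites implied by one variation every `interval_bp` bases."""
    return genome_size_bp / interval_bp


def pairwise_upper_bound(genome_size_bp, fraction):
    """Upper bound on differing positions between two individual genomes."""
    return genome_size_bp * fraction


total_snps = estimated_snp_count(GENOME_SIZE_BP, SNP_INTERVAL_BP)
pairwise_max = pairwise_upper_bound(GENOME_SIZE_BP, MAX_PAIRWISE_FRACTION)

print(f"SNP sites implied by the 1200-nucleotide interval: {total_snps:.2e}")
print(f"Upper bound on pairwise differences at 0.02%:      {pairwise_max:.2e}")
```

The first figure falls inside the 2-3 x 10^6 range given above, and the second is several times smaller, which is consistent with the statement that two individuals differ at far fewer positions than the total number of known SNP sites.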
  • RFLPs restriction fragment length polymorphisms
  • Southern blotting technique
  • PCR polymerase chain reaction
  • SSCP single-strand conformational polymorphisms
  • DGGE denaturing gradient gel electrophoresis
  • HET heteroduplex analysis
  • CCM chemical cleavage analysis
  • the present invention encompasses high-throughput methods for identifying the complete set of SNPs in a genome of interest that may be due to allelic heterozygosity or may result from a comparison of a target DNA to a reference DNA whose sequence is known or substantially known.
  • One aspect of the present invention is directed to a method of determining the sequence of a DNA, comprising: a) preparing single stranded DNA fragments from a polyploid organism; b) allowing the fragments to re-anneal and form double stranded heterohybrid DNA fragments wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; c) distinguishing formation of heterohybrid DNA containing a mismatch from formation of DNA which is perfectly complementary; d) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA (which may be conducted in a single or multiple consecutive or non-consecutive steps); and e) determining the identity of the mismatched nucleotide(s), and optionally the sequence of the mismatch region (i.e., the DNA sequence in the vicinity of the mismatch in d)), thus allowing elucidation of the
  • Another aspect of the present invention is directed to a method of determining the sequence of a DNA, comprising: a) preparing single stranded fragments of a first DNA having a substantially known sequence; b) preparing single stranded fragments of a second DNA having an unknown sequence; c) contacting the single stranded fragments of a) or copies thereof, and the single stranded fragments of b) or copies thereof under conditions that allow formation of heterohybrid DNA, wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; d) distinguishing formation of heterohybrid DNA containing a mismatch from formation of heterohybrid DNA which is perfectly complementary; e) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA (which may be conducted in a single or multiple consecutive or non-consecutive steps); and f) determining the identity of the mismatched nu
  • the method may be carried out by any one of the following sequences of steps.
  • the method may entail the steps of: a) creating double-stranded restriction fragments or double-stranded fragments derived from mechanical shear of genomic DNA; b) modifying the 3' hydroxyl groups of all of the fragments with a blocking moiety or group; c) separating the double stranded DNA to create a single stranded DNA population capable of randomly re-annealing to reform double stranded DNA fragments; d) allowing the DNA to re-anneal, thus forming the heterohybrid DNA wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch, and (d') reacting the heterohybrid DNA with a mismatch recognition protein-based system, thus creating a population of double strand DNA fragments wherein one strand contains at least one break in its phosphodiester bonds in the mismatch
  • the target DNA is re-annealed under stringent conditions with a reference DNA sample, or with itself.
  • the reference DNA sample comprises genomic DNA from a known or substantially known sequence of DNA referred to as the reference standard or reference DNA.
  • the target DNA sample is genomic DNA containing an unknown number of mutations or SNPs relative to the reference DNA and is referred to as target DNA.
  • Target DNA is processed in the same fashion as the reference DNA. For example, if the reference DNA is digested with a particular restriction enzyme or enzymes, the target DNA is preferably digested with the same enzyme(s).
  • both the target and the reference DNAs are obtained from the same source such as a polyploid organism, and are then melted and re-annealed, wherein the mismatched DNA sites may be derived from the polymorphisms inherent to allelic heterozygosity.
  • the hybrids that form that contain mismatch regions are recognized and endonucleolytically cleaved on one or both sides of the mismatch region by mismatch recognition protein-based systems.
  • the re-annealed heterohybrid DNA may be created from single stranded fragments of target and reference DNA that are obtained either directly from fragmented genomic DNA or indirectly from amplified or cloned fragments of genomic DNA. These heterohybrids in solution may be reacted with one or more mismatch repair enzymes under conditions in which the repair enzyme(s) remains attached to the mismatch region of the heterohybrid for a sufficient period of time to allow for further manipulation of the enzyme-DNA complex. Examples of further manipulation include purification, precipitation, hybridization, and denaturation.
  • heterohybrid DNA that is perfectly complementary (and which does not contain at least one mismatch) is first separated from the population of heterohybrid DNA that does contain a mismatch, and in subsequent step(s), each of the heterohybrid DNAs containing a mismatch are separated from each other, e.g., by arraying on a solid support such that the individual fragments containing a mismatch can be sequenced and the localization of the mismatch in the genome can be determined.
  • the purified fragments are cloned into appropriate vectors for propagation in, for example, microorganisms.
  • DNA amplification of the total population of fragments can unambiguously identify mismatch nucleotides and thereby confirm the presence and location of mismatched nucleotide pairs.
  • the present invention thus provides a simplified approach for determining the entirety of polymorphisms in a genome or combination of genomes.
  • the scission at the site of the mismatch in the re-annealed heterohybrid DNA is created by a detectably labeled ATE enzyme ("all-type nicking enzyme") and results in the covalent attachment of the enzyme to the nicked strand.
  • ATE enzyme all-type nicking enzyme
  • DNA topoisomerase I is a ubiquitous enzyme that relieves DNA torsional stress by introducing a break in the phosphodiester bond between the mismatch nucleotide and the nucleotide immediately on the 5' side of the mismatch. The enzyme becomes covalently attached to the free 3' hydroxyl via a phosphotyrosine moiety.
  • the resulting fluorescent signal indicates the presence of a mismatch in a heterohybrid DNA. A greater intensity of the fluorescent signal also indicates that there is more than one mismatch in the DNA fragment.
  • Because ATEs can be strand-selective based on local nucleotides, the strand containing the nick can ultimately be identified by comparison to the known sequence (which may be contained in a database), once the sequence in the vicinity of the mismatch is obtained.
  • Covalently bound ATE can be removed by proteolysis or by the activity of a tyrosyl phosphodiesterase.
  • the resulting 3'-phosphorylated nick is reconstituted to a 3'-hydroxyl by a polynucleotide kinase phosphatase; the restored 3'-hydroxyl may thus serve as a substrate for DNA polymerase and hence for sequencing.
  • the mismatch repair enzyme is attached to a first binding moiety or group, e.g., biotin, and the further manipulation involves contacting the enzyme-DNA complex with a second binding moiety that forms a complex with the first binding moiety (e.g., streptavidin), and which is attached to a solid support.
  • heterohybrid DNA containing mismatches can be first identified in solution by enzyme binding and subsequently purified by affinity chromatography. These mismatch-containing heterohybrid DNAs can then be used as the starting material for amplification, immobilization, and sequencing.
  • Some enzymes, e.g., topoisomerases, do not covalently modify a mismatch nucleotide but attach to the adjacent nucleotide on the 5' side of the mismatch nucleotide.
  • the enzyme (e.g., topoisomerase)-DNA complexes are modified by the steps of (a) modifying substantially all of the free 3' hydroxyl groups with blocking groups, followed by (b) exposing the 3' hydroxyl group of one or both mismatch nucleotides, and (c) covalently modifying the mismatch nucleotide 3' hydroxyl group to allow for purification and identification of the individual mismatch nucleotides and the DNA sequence in the vicinity of the mismatch.
  • the population of re-annealed heterohybrid DNA fragments is treated with a mismatch endonuclease that cuts both strands of DNA at the 3' end of the mismatch.
  • This treatment results in the creation of two double-stranded fragments derived from the mismatch-containing fragment.
  • the new fragments each have a single nucleotide extending from the 3' end of one strand, and each of these nucleotides is a mismatch nucleotide.
  • Another aspect of the present invention is directed to a chip having affixed thereto, directly or indirectly, a plurality (typically in the order of thousands to millions) of DNA fragments (e.g., restriction fragments) of known sequence. These fragments, preferably single stranded, were purified by virtue of the presence of at least one mismatched nucleotide contained therein, and thus serve as the annealing templates for similarly processed DNA.
  • the annealed DNA thus captured on the chip can be used to readily identify the presence of mismatched nucleotides in other target DNAs. These represent the mutational fingerprint of an individual genome, combined genomes, heritable traits or disease states.
  • the present application is believed to provide a solution to current and emerging needs that face the biotechnology industry and particularly the fields of genomics, pharmacogenomics, drug discovery, food characterization and genotyping.
  • the method of the present invention has potential application in, for example: nucleic acid sequencing and re-sequencing, diagnostics and screening, gene expression monitoring, genetic diversity profiling, whole genome polymorphism discovery and scoring, the SNPasome (array of SNPs), whole genome sequence determination, and the evolution and propagation of specific types of mutations in individuals and populations.
  • the present invention encompasses high-throughput methods for identifying all of the mismatches contained in a re-annealed genome or combination of genomes (e.g., from different sources).
  • high-throughput refers to a system for rapidly modifying and assaying large numbers of distinct DNA samples at the same time.
  • genomic DNA is re-annealed with itself or other genomic DNA of known or unknown sequence.
  • a "known DNA sequence" as referred to herein refers to a sequence of nucleotides comprising a gene, a set of genes, or a genome where the nucleotide sequence is substantially or entirely known such that oligonucleotides complementary to repeating units of the gene, set of genes, or genome can be synthesized. Examples of such repeating units include, but are not limited to, SNPs and restriction sites.
  • An "unknown DNA sequence" is a gene, set of genes, or a genome that contains an unknown population of single or multiple nucleotide differences in comparison to a known DNA sequence.
  • the methods of the present invention take advantage of the differences between physico-chemical properties of DNA hybrids between almost-identical (but not completely identical) DNA strands and DNA hybrids that are perfectly complementary.
  • the heterohybrid DNAs include double stranded fragments that contain a mismatched pair of nucleotides that is embedded in an otherwise perfectly matched hybrid.
  • mismatches are formed under controlled conditions and are chemically and/or enzymatically modified.
  • the sequences adjacent to, and including, the mismatch (referred to herein as the mismatch region) are then determined.
  • the mismatch region may include any number of bases, typically from 1 to about 1000 bases.
  • Embodiments of the present invention may encompass the steps of: 1) preparing re-annealed DNA derived from a single genome or multiple genomes of known or unknown sequence wherein re-annealing can occur before or after fragmentation by, for example, restriction enzyme digestion or mechanical shearing; 2) cleaving one or both of the DNA strands at mismatches to form a single-stranded nick or a new DNA fragment in the vicinity of the mismatch; and 3) purifying the fragments containing mismatch nucleotides such that each mismatch nucleotide is separated from the other mismatch nucleotide; followed by 4) determining the precise identity of the mismatch nucleotide and the DNA sequence in the vicinity of the mismatch nucleotide.
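The logic of steps 2) to 4) can be illustrated in silico. The following is a toy sketch only: it models a re-annealed heterohybrid as two aligned strings, flags every position that is not a Watson-Crick pair, and extracts the surrounding mismatch region. The function names, the example sequences, and the flank size are hypothetical and are not part of the disclosed method, which operates on physical DNA with enzymes.

```python
# Watson-Crick pairs; anything else at an aligned position is a mismatch.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def aligned_complement(top_strand):
    """Bottom strand written 3'->5', so index i pairs with top_strand[i]."""
    return "".join(COMPLEMENT[base] for base in top_strand)


def find_mismatches(top_strand, bottom_strand):
    """Return (index, top_base, bottom_base) for every non-Watson-Crick pair."""
    if len(top_strand) != len(bottom_strand):
        raise ValueError("strands must be the same length to be compared")
    return [
        (i, t, b)
        for i, (t, b) in enumerate(zip(top_strand, bottom_strand))
        if (t, b) not in WATSON_CRICK
    ]


def mismatch_region(top_strand, position, flank):
    """Sequence of the top strand within `flank` bases of the mismatch."""
    start = max(0, position - flank)
    end = min(len(top_strand), position + flank + 1)
    return top_strand[start:end]


# Hypothetical reference fragment and a target that differs at one position.
reference_top = "ATGCGTACCTGA"
target_top = "ATGCGAACCTGA"
target_bottom = aligned_complement(target_top)

# Heterohybrid: reference top strand annealed to the target bottom strand.
mismatches = find_mismatches(reference_top, target_bottom)
regions = [mismatch_region(reference_top, i, flank=3) for i, _, _ in mismatches]
print(mismatches, regions)
```

A perfectly complementary hybrid (the reference annealed to its own complement) yields an empty list, which mirrors the distinction the method draws between matched and mismatched heterohybrids.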
  • the nucleic acids to be used can be obtained using methods well known and documented in the art. For example, a nucleic acid sample such as total DNA, genomic DNA, or cDNA can be obtained by such methods, and fragments can be generated therefrom by, for example, limited restriction enzyme digestion or by mechanical means.
  • the DNA for use in the present invention may include genomic DNA derived from two different genomes (i.e., two different individuals), replaced with DNA derived from a PCR reaction, or replaced with DNA derived from a reverse transcription of mRNAs (cDNA).
  • Genomes may include multiple prevalent versions, which contain alterations in sequence relative to each other that cause no discernible pathological effect. Such variations are designated "polymorphisms" or "allelic variants". Most preferably, genomic DNA from a single individual is used for the second DNA sample of unknown sequence. This ensures that, statistically, hybrids formed between the first, second, or same genomic DNA sample will be perfectly matched except in the region of the mutation, where discrete mismatch regions will form. In some applications, it is desired to detect polymorphisms. In these cases, appropriate sources for the DNA sample will be selected accordingly. Depending upon what method is used subsequently to detect mismatches, the DNA may also be chemically or enzymatically modified, e.g., to remove or add methyl groups.
  • Heterozygous DNA from a single polyploid genomic DNA may be isolated from any cell source or body tissue (e.g., fluid) of a plant, animal or human.
  • cell sources include blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy.
  • Cellular sources from plants include leaf, stem, root, flower, or cultured cells.
  • Body fluids include blood, urine, cerebrospinal fluid, and tissue exudates at the site of infection or inflammation.
  • DNA may be extracted from the cell source or body tissue using any one of the numerous methods that are standard in the art. It will be understood that the particular method used to extract DNA will depend on the nature of the source.
  • the amount of DNA to be extracted for use in the inventive methods is typically in the range of at least 5 pg (corresponding to about 1 cell equivalent of a genome size of 3 x 10^9 base pairs).
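The relationship between DNA mass and genome copies can be sketched as follows. This is an illustrative calculation, not part of the source: it assumes an average mass of about 650 g/mol per base pair, a commonly used approximation, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # ASSUMPTION: approximate average mass of one base pair


def genome_mass_pg(genome_size_bp):
    """Mass in picograms of one double-stranded copy of a genome."""
    grams = genome_size_bp * AVG_BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12  # grams -> picograms


def genome_copies(dna_mass_pg, genome_size_bp):
    """Number of genome copies contained in a given mass of DNA."""
    return dna_mass_pg / genome_mass_pg(genome_size_bp)


one_copy_pg = genome_mass_pg(3e9)
copies_in_5_pg = genome_copies(5.0, 3e9)
print(f"one copy of a 3 x 10^9 bp genome: {one_copy_pg:.2f} pg")
print(f"copies in 5 pg: {copies_in_5_pg:.2f}")
```

Under this assumption one copy of a 3 x 10^9 bp genome weighs a little over 3 pg, so 5 pg corresponds to between one and two copies, which is of the same order as the single cell equivalent quoted above.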
  • variable amounts of DNA may be extracted.
  • the reference and target DNAs to be combined may be obtained from any cell source or body fluid.
  • the reference DNA is obtained from a single cell source for comparison to different target DNAs.
  • a single source of reference DNA could be obtained, for example, from human cells in culture.
  • reference or target DNA can be obtained from an individual with a particular disease, such as cancer.
  • Reference and target DNAs obtained from individuals with diagnosed diseases or clinical symptoms or genetic traits could be used to ultimately identify unique SNP profiles associated with diseases, symptoms, or traits. In this way, multiple reference and target DNAs can be obtained, each of which will contain the SNP profile relating to a disease, a particular symptom or a trait.
  • the DNA may be employed for use in the inventive methods without further manipulation.
  • DNA fragments suitable for the present invention can be prepared by a variety of techniques well known to those of skill in the art. For example, a restriction endonuclease recognizing a six-base DNA sequence will cleave genomic DNA into fragments that have an average size of 4^6 (4096) bases. In a preferred embodiment, genomic DNA is fragmented by a restriction endonuclease recognizing 6 bases and cutting the DNA to leave 3' overhanging sticky ends (e.g., KpnI leaves 5'GTAC3' ends).
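The 4^6 figure follows from treating the genome as a random sequence in which each of the four bases is equally likely at every position. The sketch below makes that assumption explicit; real genomes are not random, so the numbers are only a first approximation, and the function names are illustrative.

```python
def expected_fragment_length(site_length):
    """Average spacing of a recognition site in a random, equiprobable sequence."""
    return 4 ** site_length


def expected_fragment_count(genome_size_bp, site_length):
    """Approximate number of fragments from a complete digest of the genome."""
    return genome_size_bp / expected_fragment_length(site_length)


six_cutter_length = expected_fragment_length(6)
six_cutter_fragments = expected_fragment_count(3e9, 6)
print(six_cutter_length, round(six_cutter_fragments))
```

For a 3 x 10^9 bp genome this suggests on the order of several hundred thousand fragments from a six-base cutter, which gives a sense of the size of the fragment population that is subsequently melted and re-annealed.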
  • a "3' overhanging nucleotide" as used herein refers to a 3' terminal nucleotide that has no juxtaposed complementary nucleotide.
  • a KpnI digestion of DNA results in four 3' overhanging nucleotides.
  • a "5' overhanging nucleotide" as used herein refers to a 5' terminal nucleotide that has no juxtaposed complementary nucleotide.
  • an EcoRI digestion of DNA results in four 5' overhanging nucleotides. Restriction endonucleases that produce fragments having blunt ends may also be useful.
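The two kinds of overhang can be derived from where each enzyme cuts within its palindromic site. The following sketch uses the well-known cut positions of KpnI (GGTAC^C) and EcoRI (G^AATTC); the data structure, function names, and the example sequence are illustrative only, and only the top strand is modelled.

```python
# Recognition site and the number of bases 5' of the cut on the top strand.
ENZYMES = {
    "KpnI": ("GGTACC", 5),   # GGTAC^C
    "EcoRI": ("GAATTC", 1),  # G^AATTC
}


def overhang(enzyme):
    """Return (overhang type, overhanging bases) for a palindromic site."""
    site, cut = ENZYMES[enzyme]
    mirrored_cut = len(site) - cut  # position of the bottom-strand cut
    bases = site[min(cut, mirrored_cut):max(cut, mirrored_cut)]
    if cut > mirrored_cut:
        kind = "3'"
    elif cut < mirrored_cut:
        kind = "5'"
    else:
        kind = "blunt"
    return kind, bases


def digest(top_strand, enzyme):
    """Split the top strand (5'->3') at every occurrence of the enzyme's site."""
    site, cut = ENZYMES[enzyme]
    fragments, start = [], 0
    position = top_strand.find(site)
    while position != -1:
        fragments.append(top_strand[start:position + cut])
        start = position + cut
        position = top_strand.find(site, position + 1)
    fragments.append(top_strand[start:])
    return fragments


example = "AAAGGTACCTTTGAATTCGGG"  # hypothetical sequence with one site for each enzyme
print(overhang("KpnI"), digest(example, "KpnI"))
print(overhang("EcoRI"), digest(example, "EcoRI"))
```

In both cases the overhang is four nucleotides long; the only difference is whether the unpaired bases end in a 3' or a 5' terminus, which matters later because the method blocks and then selectively exposes 3' hydroxyl groups.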
  • the DNA fragments prepared may be optionally separated into different sizes of DNA fragments. Using agarose gel electrophoresis followed by gel melting and DNA capture, fragments in the range of, for example, 50-10,000 base pairs, can be isolated and used for the re-annealing procedures. Preferably, DNA sizing of this sort is performed after the step of DNA melting and re-annealing.
  • the DNA fragments may be amplified by PCR, preferably before melting and re-annealing.
  • Amplification provides the advantage of increasing the amount of either specific DNA or total sequences within the DNA sequence population.
  • the amplified regions may be specified by the choice of particular flanking sequences for use as primers.
  • the primers can be ligated to the ends of each restriction fragment.
  • the length of DNA sequence that can be amplified typically ranges from 80 bp up to about 30 kbp (Saiki et al., 1988, Science, 239:487).
  • the use of amplification primers that are modified by, e.g., biotinylation can allow for the selective incorporation of the modification into the amplified target DNA.
  • Nucleic acid template refers to an entity that includes or contains the DNA to be amplified or sequenced.
  • the DNA to be amplified or sequenced can also be provided in a double stranded form.
  • “DNA templates” of the invention may be single or double stranded DNA.
  • the DNA templates to be used in the methods of the present invention can be of variable lengths, typically at least 50 base pairs in length and in some embodiments up to about 30,000 base pairs in length.
  • the nucleotides making up the DNA templates may be naturally occurring or non-naturally occurring nucleotides.
  • the DNA templates of the invention not only comprise the DNA to be amplified but may in addition contain at the 5' and 3' end short sequences that are complementary to synthetic oligonucleotides.
  • Re-annealing of single stranded fragments involves separating the two strands of restricted DNA fragments using, for example, heat. Each of the strands derived from the restriction fragments is then re-annealed at a lower temperature giving rise to heterohybrid DNA that contains perfectly complementary double strand DNA and double strand fragments containing mismatched nucleotide pairs.
  • Hybridization and re-annealing reactions according to the present invention may be performed under high stringency conditions, which in general entails carrying out the reactions in solutions ranging from about 10 mM NaCl to about 600 mM NaCl, and at temperatures ranging from about 37°C to about 65°C. It will be understood that the stringency of a hybridization reaction is determined by both the salt concentration and the temperature. Thus, a hybridization performed in 10 mM salt at 37°C may be of similar stringency to one performed in 500 mM salt at 65°C.
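The trade-off between salt and temperature can be made concrete with an empirical melting-temperature estimate. The sketch below is not taken from the source: it assumes a widely quoted approximation for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 675/N, and it assumes a 1000 bp fragment with 50% GC content. Stringency is expressed here as the gap between the estimated Tm and the hybridization temperature, which is one common way of comparing conditions.

```python
import math


def melting_temp_c(na_molar, gc_percent, length_bp):
    """ASSUMED empirical Tm estimate (deg C) for a long DNA duplex."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 675.0 / length_bp)


def stringency_gap_c(na_molar, temp_c, gc_percent=50.0, length_bp=1000):
    """Degrees below the estimated Tm; a smaller gap means higher stringency."""
    return melting_temp_c(na_molar, gc_percent, length_bp) - temp_c


low_salt_gap = stringency_gap_c(na_molar=0.010, temp_c=37.0)   # 10 mM, 37 deg C
high_salt_gap = stringency_gap_c(na_molar=0.500, temp_c=65.0)  # 500 mM, 65 deg C
print(f"10 mM / 37 C:  {low_salt_gap:.1f} C below Tm")
print(f"500 mM / 65 C: {high_salt_gap:.1f} C below Tm")
```

Under these assumptions the two conditions sit almost the same distance below the estimated Tm, which agrees with the statement above that they may be of similar stringency.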
  • any hybridization conditions may be used that form perfect hybrids between precisely complementary sequences and mismatch loops between non-complementary sequences in the same molecules.
  • the re-annealing process is initiated by cooling the DNA to a temperature that yields an optimum proportion of double stranded DNA, e.g., 50° to 70°C.
  • re-annealing reactions are performed in about 600mM NaCl at about 65°C in solution.
  • "mismatched nucleotide pair" or "mismatched nucleotide pairs" or "mismatched nucleotides" as used herein refers to a pair of nucleotides contained in opposite strands of a largely complementary double strand DNA that are juxtaposed opposite to each other but comprise nucleotide pairs that are not GC or AT. Examples of mismatched nucleotide pairs are GG, CC, AA, TT, GA, GT, CA, and CT.
  • "mismatch nucleotide" refers to a single nucleotide that is one of the nucleotides in a mismatched nucleotide pair.
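The list of eight mismatched pairs given above is exhaustive, as a short enumeration shows. This sketch treats a pair as unordered (GA and AG are the same mismatch) and removes the two Watson-Crick pairs; the function name is illustrative.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def mismatched_pairs():
    """All unordered base pairs that are not AT or GC."""
    return [
        "".join(pair)
        for pair in combinations_with_replacement(BASES, 2)
        if frozenset(pair) not in WATSON_CRICK
    ]


pairs = mismatched_pairs()
print(len(pairs), pairs)
```

Four bases give ten unordered pairs, and removing AT and GC leaves eight, the same set as listed in the definition.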
  • "Unmatched nucleotide" or "unmatched nucleotides" as used herein refers to one or more nucleotides contained in one strand of a double strand DNA that does not have a juxtaposed nucleotide on the opposite strand.
  • An example of unmatched nucleotides is a single stranded DNA loop that protrudes from double stranded DNA.
  • the fragmented double-stranded DNA and/or the double-stranded heterohybrid DNA that is formed by the re-annealing or the hybridization reaction may be treated to block substantially all or all of 3' free ends so that they cannot serve as substrates for further enzymatic modification such as by RNA or DNA ligases or polymerases.
  • Suitable blocking methods include, without limitation, removal of 5'-phosphate groups, homopolymeric tailing of 3'-ends with nucleotides, monomeric tailing of 3'-ends with dideoxynucleotides, and ligation of modified double-stranded oligonucleotides to the ends of the duplex.
  • Enzymes for modifying 3' hydroxyls include, generally, polymerases and transferases.
  • modification of 3'-hydroxyls can be by a chemical reaction.
  • the reaction of a 3' hydroxyl group with a reactive phosphoramidite will result in modification of the 3'-hydroxyl.
  • modification of all 3'-hydroxyls is accomplished with terminal deoxynucleotidyl transferase (TdT) and one or more deoxynucleotide triphosphates (dNTP).
  • the deoxynucleotide triphosphates are dideoxy nucleotide triphosphates (ddNTP).
  • the modification of all 3' hydroxyls is accomplished with TdT and one or more deoxynucleotide triphosphates that have a removable blocking group on their 3'-hydroxyl.
  • reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
  • cleavage refers to scission of the covalent linkage joining the 3' hydroxyl and 5' phosphate that join two ribose moieties resulting in a free 3' hydroxyl and 5' phosphate.
  • cleavage may occur at some predetermined distance from either boundary of the mismatch region, and may occur on either strand.
  • the mismatch region as used herein thus encompasses from 1 (typically) to about 1000 bases from the borders of the mismatch.
  • the mismatch region is comprised of ~25 base pairs. Knowing the sequence, for example, of the 25 base pairs of DNA in the mismatch region allows for the unambiguous determination of the genomic location of the mismatch by analysis of a genomic sequence database of reference DNA.
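A rough calculation shows why about 25 base pairs is enough to locate a mismatch region. The sketch below assumes a random genome with equiprobable bases, which ignores repeats and duplicated regions, so it is an optimistic estimate; the function names are illustrative.

```python
def expected_random_hits(read_length, genome_size_bp):
    """Expected chance occurrences of one specific sequence in a random genome."""
    return genome_size_bp / 4 ** read_length


def shortest_unique_length(genome_size_bp):
    """Smallest length whose number of possible sequences reaches the genome size."""
    length = 1
    while 4 ** length < genome_size_bp:
        length += 1
    return length


hits_25 = expected_random_hits(25, 3e9)
minimum_length = shortest_unique_length(3e9)
print(f"expected chance matches for 25 bases: {hits_25:.1e}")
print(f"shortest length with 4^k >= genome size: {minimum_length}")
```

Under this idealization a 25-base sequence is expected to occur by chance far less than once in a 3 x 10^9 bp genome, and even a length in the mid-teens would nominally suffice, so 25 bases leaves a wide margin. Repetitive regions of a real genome are the main exception.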
  • mismatch recognition and cleavage protein-based systems suitable for use in the present invention include single and double strand mismatch nicking proteins, mismatch repair proteins, nucleotide excision repair proteins, mismatch nucleases, chemical modification, and combinations thereof. These embodiments are described below.
  • mismatch recognition and modification proteins used in practicing the present invention may be derived from any species, including bacterial (e.g., E. coli) and humans, or combinations thereof.
  • functional homologs for a given protein exist across phylogeny.
  • a "functional homolog" of a given protein as used herein is another protein that can functionally substitute for the first protein, either in vivo or in a cell-free reaction.
  • Mismatch repair proteins: A number of different enzyme systems exist across phylogeny to repair mismatches that form during DNA replication.
  • In E. coli, one system involves the MutY gene product, which recognizes A/G mismatches and cleaves the A-containing strand (Tsai-Wu et al., J. Bacteriol. 178:1902 (1991)).
  • Another system in E. coli utilizes the coordinated action of the MutS, MutL, and MutH proteins to recognize errors in newly-synthesized DNA strands specifically by virtue of their transient state of under-methylation (prior to their being acted upon by dam methylase in the normal course of replication).
  • Cleavage typically occurs at a hemi-methylated GATC site within 1-2 kb of the mismatch, followed by exonucleolytic cleavage of the strand in either a 3'-5' or 5'-3' direction from the nick to the mismatch. In vivo, this is followed by re-synthesis involving DNA polymerase III holoenzyme and other factors (Cleaver, Cell, 76:1-4 (1994)).
  • Non-limiting examples of useful mismatch repair proteins from other organisms include those derived from Salmonella typhimurium (MutS, MutL) (purified MutS, MutL, and MutH are used to cleave mismatch regions (Su et al., Proc. Natl. Acad. Sci. USA 83:5057 (1986); Grilley et al., J. Biol. Chem. 264:1000 (1989))); Streptococcus pneumoniae (HexA,
  • mismatches are identified by nicking one of the strands in the immediate vicinity of the mismatch, for example between the mismatch nucleotide and the next nucleotide on the 5' side.
  • the all-type nicking enzyme (ATE) from human HeLa cells or calf thymus can nick DNA at the first phosphodiester bond 5' to all 8 possible mismatched bases.
  • the strand disparity of this nicking is influenced by the neighboring nucleotide sequences.
  • the ATE covalently binds the 3' end of the DNA product to form a cleavable complex.
  • Topoisomerase I introduces transient DNA single-strand breaks by forming a catalytic intermediate in which a covalent bond is generated between an enzyme tyrosine residue (Tyr723 for human topoisomerase I) and the 3'-end of the broken DNA.
  • tyrosyl-DNA phosphodiesterase-1 (Tdp1)
  • Polynucleotide kinase phosphatase is then used to regenerate the 3' hydroxyl to create a substrate for DNA polymerase immediately 5' of the mismatch.
  • mismatch specific nucleases In plants, mismatch specific nucleases have been described that cleave DNA strands on the 3' side of the mismatch. Some of these nucleases cut only one strand resulting in mismatch termination on one strand and some of the nucleases cut both strands resulting in the creation of new DNA fragments (Sokurenko et al . , N.A.R. 2001, 29:elll; Zhang et al., Genetic Testing and Molecular Biomarkers 2009, 13:97-103) .
  • heteroduplexes are incubated with a mismatch endonuclease enzyme derived from celery (Cel I and Cel II mismatch endonucleases) that is purified essentially as described in U.S. Patent 7,129,075.
  • Incubations may be performed in, e.g., 20 mM Hepes pH 7.5, 10 mM NaCl, 3 mM MgCl2, varying amounts of DNA and Cel nuclease at 37°C for up to 1 hour.
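The incubation buffer above can be assembled from concentrated stocks with the usual C1·V1 = C2·V2 dilution arithmetic. The following is a minimal sketch only: the final concentrations come from the text, while the stock concentrations, the 50 µL reaction volume, and the function name `stock_volumes_ul` are hypothetical values chosen for illustration.

```python
def stock_volumes_ul(final_volume_ul, final_mM, stock_mM):
    """Return the volume (in µL) of each stock needed, using C1*V1 = C2*V2."""
    volumes = {}
    for name, target in final_mM.items():
        volumes[name] = final_volume_ul * target / stock_mM[name]
    return volumes


# Final concentrations taken from the text (mM).
FINAL_MM = {"Hepes pH 7.5": 20.0, "NaCl": 10.0, "MgCl2": 3.0}

# Hypothetical stock concentrations (mM); not specified in the source.
STOCK_MM = {"Hepes pH 7.5": 1000.0, "NaCl": 1000.0, "MgCl2": 100.0}

REACTION_UL = 50.0  # hypothetical reaction volume

volumes = stock_volumes_ul(REACTION_UL, FINAL_MM, STOCK_MM)
# Volume left over for DNA, Cel nuclease, and water.
remaining_ul = REACTION_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL")
print(f"DNA + enzyme + water: {remaining_ul:.2f} µL")
```

With these assumed stocks, the buffer components take up a small part of the reaction, leaving most of the volume for DNA, enzyme, and water.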
  • Nucleotide excision repair proteins: In E. coli, four proteins, designated UvrA, UvrB, UvrC, and UvrD, interact to repair nucleotides that are damaged by UV light or otherwise chemically modified (Sancar, Science 266:1954, 1994), and also to repair mismatches (Huang et al., Proc. Natl. Acad. Sci. USA 91:12213 (1994)).
  • UvrA, an ATPase, forms an A2B1 complex with UvrB, binds the site of the lesion, unwinds and kinks the DNA, and causes a conformational change in UvrB that allows it to bind tightly to the lesion site.
  • UvrA then dissociates from the complex, allowing UvrC to bind.
  • UvrB catalyzes an endonucleolytic cleavage at the fifth phosphodiester bond 3' from the lesion;
  • UvrC then catalyzes a similar cleavage at the eighth phosphodiester bond 5' from the lesion.
  • UvrD (helicase II) releases the excised oligomer.
  • DNA polymerase I displaces UvrB and fills in the excision gap, and the patch is ligated.
  • mismatch-containing duplexes formed between DNA strands are treated with a combination (e.g., mixture) of UvrA, UvrB, UvrC, with or without UvrD.
  • the proteins may be purified from wild-type E. coli, or from E. coli or other appropriate host cells containing recombinant genes encoding the proteins, and are formulated in compatible buffers and concentrations .
  • the final product is a heterohybrid DNA containing a single-stranded gap covering the site of the mismatch.
  • Excision repair proteins for use in the present invention may be derived from E. coli (as described above) or from any organism containing appropriate functional homologs .
  • useful homologs include those derived from S. cerevisiae (RAD1, 2, 3, 4, 10, 14, and 25) and humans (XPF, XPG, XPD, XPC, XPA, ERCC1, and XPB) (Sancar, Science 266:1954 (1994)).
  • the excised patch comprises an oligonucleotide extending 5 nucleotides from the 3' end of the lesion and 24 nucleotides from the 5' end of the lesion.
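The incision positions described above determine which nucleotides are excised. The sketch below turns them into index arithmetic on a strand read 5' to 3'. It is an illustration under stated assumptions: the function names and the 0-based indexing are hypothetical, the lesion is treated as a single nucleotide, and the reading of "5 nucleotides from the 3' end and 24 nucleotides from the 5' end" as counts of flanking nucleotides is an interpretation of the text.

```python
def excised_by_bonds(lesion, bonds_5prime, bonds_3prime):
    """Inclusive (start, end) indices excised when cuts are given as
    phosphodiester bond counts on each side of the lesion.

    The n-th bond 5' of the lesion lies between nucleotides
    lesion-n and lesion-n+1; the n-th bond 3' of the lesion lies
    between nucleotides lesion+n-1 and lesion+n.
    """
    start = lesion - bonds_5prime + 1
    end = lesion + bonds_3prime - 1
    return start, end


def excised_by_flanks(lesion, nt_5prime, nt_3prime):
    """Inclusive (start, end) indices excised when the patch is given as
    numbers of flanking nucleotides on each side of the lesion."""
    return lesion - nt_5prime, lesion + nt_3prime


def length(window):
    start, end = window
    return end - start + 1


lesion_index = 30  # hypothetical lesion position in a longer strand

# E. coli Uvr system: 8th bond 5' and 5th bond 3' of the lesion.
uvr_window = excised_by_bonds(lesion_index, bonds_5prime=8, bonds_3prime=5)

# Homolog system as read here: 24 nt on the 5' side, 5 nt on the 3' side.
homolog_window = excised_by_flanks(lesion_index, nt_5prime=24, nt_3prime=5)

print(uvr_window, length(uvr_window))
print(homolog_window, length(homolog_window))
```

Under these assumptions the Uvr incisions release a 12-nucleotide oligomer, while the second pattern removes a 30-nucleotide patch, so the single-stranded gap left around the mismatch differs considerably between the two systems.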
  • Aboussekhra et al., Cell 80:859 (1995) disclose a reconstituted in vitro system for nucleotide excision repair using purified components derived from human cells.
  • Mismatch-containing heterohybrid DNA may be chemically modified by treatment with osmium tetroxide (for mispaired thymidines) and hydroxylamine (for mispaired cytosines), using procedures that are well known in the art (see, e.g., Grompe, Nature Genetics 5:111 (1993); and Saleeba et al . , Meth. Enzymol. 217:288 (1993)).
  • the chemically modified DNA is contacted with excision repair proteins (as described above).
  • the hydroxylamine- or osmium-modified bases are recognized as damaged bases in need of repair, one of the DNA strands is selectively cleaved, and the product is a gapped heteroduplex as above .
  • Resolvases are enzymes that catalyze the resolution of branched DNA intermediates that form during recombination events (including Holliday structures, cruciforms, and loops) via recognition of bends, kinks, or DNA deviations (Youil et al . , Proc. Natl. Acad. Sci. USA 92:87 (1995)).
  • Endonuclease VII derived from bacteriophage T4 (T4E7) recognizes mismatch regions of from one to about 50 bases and produces double-stranded breaks within six nucleotides from the 3' border of the mismatch region.
  • T4E7 may be isolated from, e.g., a recombinant E.
  • Endonuclease I of bacteriophage T7 (T7E1) can be isolated using a polyhistidine purification tag sequence (Mashal et al., Nature Genetics 9:177 (1995)).
  • the fragments resulting from various mismatch nucleases may have 5' or 3' extensions of one or more nucleotides, one or more of which is a mismatch nucleotide. These fragments result from mismatch repair enzymes that introduce either a double or single strand break at or near the site of the mismatch.
  • the mismatch nuclease cleaves a phosphodiester bond on either the 5' or 3' side of one or both mismatch nucleotides (Table 1).
  • each mismatch nucleotide is first purified away from all other DNA fragments as well as from the other mismatch nucleotide of the mismatched pair.
  • purification of each fragment is accomplished by virtue of the unmodified 3' hydroxyl group that is produced by the mismatch nuclease, with or without an intermediate enzymatic step (Table 1) .
  • identifying, purifying and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a single strand nick on the 3' side of a mismatch nucleotide (embodiment 1 in Table 1) .
  • the procedures may include the following steps: a) covalently modifying the newly exposed 3 ' hydroxyl of the mismatch nucleotide.
  • An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification, e.g., with a purification tag.
  • the dideoxy nucleotide may be attached to a linker moiety and a binding moiety such as with biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent.
  • the covalent modification of the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3 ' hydroxyl on that strand of DNA.
  • reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
  • This fragment of DNA can optionally be captured (e.g., on a solid support) and sequenced if desired.
  • An example of a single strand-specific nuclease for this step is Bal31 nuclease .
  • The mismatch nucleotides may be separated by, for example, heating to the point of melting and strand separation, whereby one of the single strand DNAs remains attached to the solid support and the other strand is released into solution. Prior to melting, DNA linkers may optionally be attached to the ends of the double strand DNA to facilitate mismatch identification and sequencing.
  • the removable group blocking the 3' hydroxyl, if present, is removed in order to regenerate the free 3' hydroxyl. An azidomethyl blocking group, for example, can be removed using tris(2-carboxyethyl)phosphine in aqueous solution.
  • mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a single strand nick on the 5' side of a mismatch nucleotide (embodiment 2 in Table 1) .
  • the procedure may include the following steps: a) covalently modifying the newly exposed 3 ' hydroxyl of the nucleotide adjacent to the mismatch nucleotide.
  • An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification.
  • the dideoxy nucleotide may be comprised of a linker moiety and a binding moiety such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent.
  • the covalent modification of nucleotide adjacent to the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3 ' hydroxyl on that strand of DNA.
  • reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
  • An example of a single strand-specific nuclease for this step is Bal31 nuclease.
  • the captured DNA fragment if attached by a scissile linker as described above, can optionally be released and sequenced if desired.
  • the released fragment containing the mismatch nucleotides is optionally re-modified and re-captured using the aforementioned techniques.
  • the released fragment can be modified by the attachment of linkers to facilitate subsequent mismatch nucleotide separation and sequencing.
  • linkers optionally contain the functionalities useful for purification (e.g., biotin) and sequencing (e.g., oligonucleotide primer annealing sites).
  • the biotinylated fragment containing the mismatch nucleotides is preferably recaptured by, for example, binding to streptavidin-Sepharose .
  • The mismatch nucleotides may be separated by, for example, heating to the point of melting and strand separation, whereby one of the single strand DNAs remains attached and the other strand is released and captured. Prior to melting, DNA linkers may optionally be attached to the ends of the double strand DNA to facilitate mismatch identification and sequencing.
  • the removable 3' hydroxyl blocking group, if present, is removed in order to regenerate the free 3' hydroxyl. An azidomethyl blocking group, for example, can be removed using tris(2-carboxyethyl)phosphine in aqueous solution.
  • DNA strands are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual strands of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
  • mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a double strand break on the 3' side of a mismatch nucleotide (embodiment 3 in Table 1) .
  • the procedure may include the following steps: a) covalently modifying the newly exposed 3' hydroxyls of each mismatch nucleotide.
  • An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification.
  • the dideoxy nucleotide may be attached to a linker moiety and a binding moiety, such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent.
  • the covalent modification of the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3' hydroxyl on that strand of DNA.
  • reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups .
  • the covalent modification of the mismatch nucleotide is by the attachment of a DNA linker that has an aforementioned attached biotin group on a scissile linker and/or optionally a removable blocking group on its 3' terminal hydroxyl group.
  • c) Releasing of the captured fragments for example, by the use of a reducing agent (e.g., 10-100 mM mercaptoethanol) if a disulfide scissile linker was used for capture.
  • the removable 3' hydroxyl blocking group, if present, is removed in order to regenerate the free 3' hydroxyl. An azidomethyl blocking group, for example, can be removed using tris(2-carboxyethyl)phosphine in aqueous solution.
  • the individual DNA fragments containing mismatch nucleotides are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual fragments of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
  • purifying, identifying, and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a double strand break on the 5' side of each mismatch nucleotide (embodiment 4 in Table 1) .
  • the procedure may include the following steps: a) covalently modifying the newly exposed 3 ' hydroxyl of the nucleotides that were adjacent to the mismatch nucleotides in the original blocked DNA fragment.
  • An example of covalent modification includes the addition of a dideoxy nucleotide that preferably is itself modified to allow for purification.
  • the dideoxy nucleotide may be attached to a linker moiety and a binding moiety, such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent.
  • the covalent modification of the nucleotides that had been adjacent to the mismatch nucleotides is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This would allow for the subsequent regeneration of a 3' hydroxyl on that strand of DNA.
  • reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
  • a reducing agent, e.g., 10-100 mM mercaptoethanol
  • the removable 3' hydroxyl blocking group, if present, is removed in order to regenerate the free 3' hydroxyl. An azidomethyl blocking group, for example, can be removed using tris(2-carboxyethyl)phosphine in aqueous solution.
  • the two DNA fragments derived from the mismatch nucleotide cutting may be ligated to a linker DNA population that contains all four nucleotides at one terminus of the linker protruding as single overhanging nucleotides.
  • the body of the linker may be the same and the terminal nucleotide (either 5' or 3') will be either A, G, C, or T, thus constituting four subpopulations of linkers.
  • a linker may have the specific sequence 5' AAGGCCTT 3'.
  • 5' AAGGCCTTN 3' could have a 3' TTCCGGAA 5' complementary strand comprising linkers with a protruding N at the 3' end of one strand.
  • the linkers may contain an overhanging 3' nucleotide containing a 3'-hydroxyl as well as a recessed nucleotide containing a 5'-phosphate.
  • the linkers may contain an overhanging 5' nucleotide containing a 5'-phosphate as well as a recessed nucleotide containing a 3'-hydroxyl.
  • the linkers may optionally contain covalently attached or removable fluorescent moieties wherein the excitation wavelength of the fluorescent moiety is specific for a particular overhanging nucleotide and can thereby indicate the identity of the mismatch nucleotide after ligation of the appropriate complementary nucleotide (attached to the linker) to the fragment containing the mismatch nucleotide.
  • each mismatch nucleotide introduces the complementary nucleotide attached to the linker.
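The linker population described above can be written out explicitly. The sketch below builds the four subpopulations from the example body 5' AAGGCCTT 3' with a single protruding 3' nucleotide, and infers the mismatch nucleotide as the complement of the overhang on the linker that ligated. The function names are hypothetical, and the inference assumes that ligation succeeds only when the linker overhang pairs with the mismatch nucleotide.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
LINKER_BODY = "AAGGCCTT"  # example linker body from the text, written 5'->3'


def complement_strand(seq):
    """Base-by-base complement, written in the opposite (3'->5') orientation."""
    return "".join(COMPLEMENT[base] for base in seq)


def linker_subpopulations(body=LINKER_BODY):
    """Return {overhang: (top 5'->3', bottom 3'->5')} for the four linkers.

    The bottom strand pairs with the body only, so the final nucleotide
    of the top strand protrudes as a single-nucleotide 3' overhang.
    """
    bottom = complement_strand(body)
    return {n: (body + n, bottom) for n in "AGCT"}


def mismatch_base_from_linker(overhang):
    """Infer the mismatch nucleotide as the complement of the ligated overhang."""
    return COMPLEMENT[overhang]


linkers = linker_subpopulations()
for overhang, (top, bottom) in linkers.items():
    print(f"5' {top} 3' / 3' {bottom} 5' -> reports mismatch base "
          f"{mismatch_base_from_linker(overhang)}")
```

If each subpopulation carries its own fluorescent moiety, as the text suggests, the label of the ligated linker reports the overhang and therefore the mismatch nucleotide.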
  • a different linker may also be ligated to the other end of the fragment that comprises either a blunt end or a restriction enzyme sticky end .
  • the linkers may optionally include or be attached to other functional moieties, e.g., binding moieties allowing for purification or solid phase attachment.
  • An example of a moiety for purification is biotin.
  • Examples of moieties for solid phase attachment are amino groups and thiol groups.
  • the attached linkers are suitable for amplification of the DNA fragment, for example, by the use of the polymerase chain reaction.
  • the linkers are suitable for adapting to any of the various DNA sequencing technologies.
  • the blocked DNA fragments containing a nick on the 3' or 5' side of a mismatch are further modified using a mechanical shearing device (Joneja et al., BioTechniques 46:553-556 (2009)) that preferentially breaks DNA at the site of a nick.
  • the resulting two fragments are then purified and modified by the aforementioned techniques to allow for mismatch nucleotide identification and sequencing in the vicinity of the mismatch.
  • Other methods of identifying sites of mismatched nucleotides and the adjacent DNA sequence in the mismatch region include but are not limited to using glycosylases and polymerases. These enzymes function directly on mismatches or at DNA nicks in a mismatch region since the mismatch base is either destroyed (glycosylases) or changed to become the complement nucleotide of the opposite strand. This results in an ambiguous identification of mismatch nucleotides and does not provide confirmation or identification of a true mismatched nucleotide pair. The separation of the mismatch nucleotide pair allows for sequence determination that does not change the identity of the mismatch nucleotide. In addition, the purification of mismatch fragments greatly reduces the total amount of DNA and number of fragments to be evaluated.
  • the modification involves a template-independent attachment of a nucleotide triphosphate to the 3' end of the fragments.
  • the dNTP attachment is catalyzed by terminal deoxynucleotidyl transferase (TdT) and the nucleotide triphosphate contains a covalently attached biotin moiety.
  • biotinylated dNTP is a dideoxy nucleotide and the linker to biotin is a scissile linker that can be cleaved chemically or enzymatically .
  • the scissile linker contains a disulfide group that can be cleaved by a reducing agent.
  • genomic DNA is first digested with a restriction enzyme (for example, KpnI, which will leave overhanging 3' sticky ends comprised of 5'GTAC3' and an average fragment length of 4096 bp).
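The 4096 bp average follows from the six-base recognition site: if the four bases are equally frequent and independent (an idealization, not a property of real genomes), a given 6-mer occurs once every 4^6 = 4096 positions on average. The sketch below shows that arithmetic and a simple top-strand digestion. It relies on the known KpnI specificity (GGTAC^C, leaving a 3' GTAC overhang); the function names and the test sequence are hypothetical.

```python
import re

KPNI_SITE = "GGTACC"  # KpnI recognition sequence
KPNI_CUT_OFFSET = 5   # top strand is cut after GGTAC, leaving a 3' GTAC overhang


def expected_fragment_length(site_length):
    """Average spacing of a site in random sequence with equal base frequencies."""
    return 4 ** site_length


def kpni_fragments(seq):
    """Split the top strand of seq at every KpnI cut position."""
    cuts = [m.start() + KPNI_CUT_OFFSET for m in re.finditer(KPNI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(expected_fragment_length(len(KPNI_SITE)))

example = "AAAGGTACCTTTGGTACCAAA"  # hypothetical sequence with two sites
fragments = kpni_fragments(example)
print(fragments)
```

Each internal fragment ends in ...GGTAC on the top strand, which is where the four-base 3' overhang described in the text comes from.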
  • the fragments are then modified at all or substantially all of the 3'-hydroxyls by the action of TdT and ddNTP.
  • Fragmented DNA containing modified 3'-ends is then heated until single strands of DNA are predominant, followed by cooling the sample to a temperature where high stringency re-annealing occurs.
  • the 3'-hydroxyl modification using TdT (the "blocking step") is performed after the melting and re-annealing of the fragments.
  • the re-annealed double stranded fragments are then reacted with a mismatch nuclease that produces a nick or gap on one strand at or near the 3'-end of a mismatch nucleotide.
  • the newly exposed 3'-hydroxyl is then modified using TdT and a biotinylated ddNTP (the "biotinylation step") where the linker attaching biotin to the dideoxynucleotide is a scissile linker that can be cleaved by enzymatic or chemical means.
  • the biotinylated fragments are captured on a biotin-binding column (for example, streptavidin-Sepharose).
  • the captured fragments may be released using a dilute reducing agent (e.g., 10-100 mM dithiothreitol or mercaptoethanol).
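The logic of the blocking, nicking, biotinylation, and capture steps can be illustrated with a toy model. This is a sketch of the bookkeeping only, not a simulation of the chemistry: the `Fragment` class, the counts of free 3'-hydroxyls, and the assumption that every enzymatic step goes to completion are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    has_mismatch: bool
    free_3prime_ends: int = 2  # two 3'-hydroxyls on a double-stranded fragment
    biotin_tags: int = 0


def block(fragment):
    """Blocking step: TdT plus ddNTP caps every free 3'-hydroxyl."""
    fragment.free_3prime_ends = 0


def nick(fragment):
    """Mismatch nuclease exposes one new 3'-hydroxyl, only at a mismatch."""
    if fragment.has_mismatch:
        fragment.free_3prime_ends += 1


def biotinylate(fragment):
    """Biotinylation step: TdT adds a biotinylated ddNTP to each free 3'-hydroxyl."""
    fragment.biotin_tags += fragment.free_3prime_ends
    fragment.free_3prime_ends = 0


def capture(fragments):
    """Streptavidin capture keeps only fragments carrying biotin."""
    return [f.name for f in fragments if f.biotin_tags > 0]


def run(pool, with_blocking=True):
    for fragment in pool:
        if with_blocking:
            block(fragment)
        nick(fragment)
        biotinylate(fragment)
    return capture(pool)


def make_pool():
    return [
        Fragment("homoduplex_1", has_mismatch=False),
        Fragment("heteroduplex", has_mismatch=True),
        Fragment("homoduplex_2", has_mismatch=False),
    ]


captured_with_block = run(make_pool(), with_blocking=True)
captured_without_block = run(make_pool(), with_blocking=False)
print(captured_with_block)
print(captured_without_block)
```

In this model only the mismatch-containing fragment is captured when the blocking step is included, whereas omitting it lets every fragment acquire biotin at its original ends, which is why the restriction-generated 3'-hydroxyls are capped first.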
  • the eluted DNA fragments can then be ligated directly into an appropriate vector.
  • the purified fragments can be digested with a single strand specific nuclease (for example, S1, mung bean or Bal31 nuclease) and then cloned into an appropriate vector (e.g., with SmaI/KpnI termini).
  • single strand region of the purified fragments can be filled in by the action of a polymerase in the presence of dNTPs followed by ligation into an appropriate vector.
  • linker DNAs are added to the ends of the purified fragments to allow for cloning, attachment to a solid support sequencing, or attachment of primer sites for cloning, sequencing or PCR.
  • Solid support refers to any solid surface to which nucleic acids can be covalently attached, such as for example latex beads, dextran beads, polystyrene, polypropylene surface, polyacrylamide gel, gold surfaces, glass surfaces and silicon wafers.
  • the solid support is a glass surface.
  • “Chemically-modifiable functional group” refers to a group such as for example, a phosphate group, a carboxylic or aldehyde moiety, a thiol, an amine, or a hydroxyl group.
  • Nucleic acid coordinate refers to a discrete area containing multiple copies of a nucleic acid strand or a synthetic oligonucleotide of known sequence. Multiple copies of the complementary strand to the nucleic acid strand may also be present in the same coordinate. The multiple copies of the nucleic acid strands making up the coordinates are generally immobilized on a solid support and may be in a single or double stranded form.
  • the attachment of the oligonucleotide primer as well as the extended nucleic acid template on the solid support is thermostable at the temperature to which the support may be subjected to during the nucleic acid amplification reaction, for example temperatures of up to approximately 100°C, for example approximately 94°C.
  • the attachment is covalent in nature.
  • the covalent binding of synthetic primers to the solid support is induced by a crosslinking or grafting agent such as for example 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC), succinic anhydride, phenyldiisothiocyanate or maleic anhydride, or a hetero-bifunctional crosslinker such as for example m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS),
  • SMCC: succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate
  • GMBS: N-γ-maleimidobutyryloxy-succinimide ester
  • Preferred crosslinking reagents for use in the present invention are s-SIAB, s-MBS and EDC.
  • s-MBS is a maleimide-succinimide hetero-bifunctional cross-linker
  • s-SIAB is an iodoacetyl-succinimide hetero-bifunctional cross-linker. Both linkers are capable of forming a covalent bond with SH groups and with primary amino groups, respectively.
  • EDC is a carbodiimide reagent that mediates covalent attachment of phosphate and amino groups.
  • the solid support has a derivatized surface.
  • Derivatized surface refers to a surface which has been modified with chemically reactive groups, for example amino, thiol or acrylate groups.
  • the derivatized surface of the solid support is subsequently modified with bifunctional crosslinking groups to provide a functionalized surface, preferably with reactive crosslinking groups.
  • Functionalized surface refers to a derivatized surface which has been modified with specific functional groups, for example the maleic or succinic functional moieties.
  • the solid support may be any solid surface to which nucleic acids can be attached, such as for example latex beads, dextran beads, polystyrene, polypropylene surface, polyacrylamide gel, gold surfaces, glass surfaces and silicon wafers.
  • the solid support is a glass surface and the attachment of nucleic acids thereto is a covalent attachment .
  • crosslinkers such as succinic anhydride, phenyldiisothiocyanate (Guo et al . , (1994)), or maleic anhydride (Yang et al., (1998)).
  • Another widely used crosslinker is 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC).
  • EDC chemistry was first described by Gilham et al. (1968) who attached DNA templates to paper (cellulose) via the 5' end terminal phosphate group. Using EDC chemistry, other supports have been used such as, latex beads (Wolf et al .
  • oligonucleotide primers need to be specifically attached at their 5' ends to the solid surface, preferably glass.
  • the glass surface can be derivatized with reactive amino groups by silanization using amino-alkoxy silanes .
  • Suitable silane reagents include aminopropyltrimethoxysilane, aminopropyltriethoxysilane and 4-aminobutyltriethoxysilane .
  • Glass surfaces can also be derivatized with other reactive groups, such as acrylate or epoxy using epoxysilane, acrylatesilane and acrylamidesilane .
  • nucleic acid molecules or oligonucleotides having a chemically modifiable functional group at their 5' end are covalently attached to the derivatized surface by a crosslinking reagent such as those described above .
  • the derivatization step can be followed by attaching a bifunctional cross-linking reagent to the surface amino groups thereby providing a modified functionalized surface.
  • Nucleic acid molecules colony primers or nucleic acid templates
  • thiol or amino groups are then reacted with the functionalized surface forming a covalent linkage between the nucleic acid and the glass .
  • the oligonucleotide primers are generally modified at the 5' end by a phosphate group or by a primary amino group (for EDC grafting reagent) or a thiol group (for s-SIAB or s-MBS linkers).
  • another aspect of the invention provides a solid support, to which there is attached a plurality of oligonucleotide primers or nucleic acids .
  • a plurality of nucleic acid templates is attached to the solid support, such as glass.
  • the attachment of the oligonucleotide primers to the solid support is covalent.
  • a yet further aspect of the invention provides an apparatus for carrying out the methods of the invention.
  • Such apparatus may include for example a plurality of nucleic acid templates and oligonucleotide primers of the invention bound, preferably covalently, to a solid support as outlined above, together with a nucleic acid polymerase, a plurality of nucleotide precursors such as those described above, a proportion of which may be detectably labeled, and a means for controlling temperature.
  • the apparatus may include for example a solid support comprising one or more nucleic acids.
  • the apparatus may also contain a detecting means for detecting and distinguishing signals from individual nucleic acids arrayed on the solid support according to the methods of the present invention.
  • a detecting means may contain a charge-coupled device operatively connected to a magnifying device such as a microscope as described above.
  • any apparatuses of the invention are provided in an automated form. 454 pyrosequencing (Roche Diagnostics), SOLiD Sequencing (Applied Biosystems), and Helioscope sequencing (Helios Inc.) are examples of available automated sequencing systems.
  • DNA from one source is annealed to itself or to DNA from another source (e.g., individual) to form mismatch regions and then treated with a mismatch recognition protein-based system, e.g., mismatch nicking proteins, mismatch repair proteins, excision repair proteins, mismatch nucleases, chemical modification and cleavage reagents, or combinations of such agents.
  • This treatment introduces single-stranded breaks at predetermined locations on one or both sides of a mismatched nucleotide and may cause the selective excision of single-stranded fragment covering the mismatch region. Alternatively, the treatment results in a single nick being introduced at the 5' end of the mismatch.
  • the resulting structure is a nicked or gapped heteroduplex in which the gap may be from about 5 to about 1000 bases in length, depending on the mismatch recognition system used. In the case of a nick, no gap is formed but a free 3' hydroxyl is present at the site of the mismatch.
  • sequence determination may be carried out using any appropriate sequencing technique.
  • one technique of sequence determination that may be used in the present invention involves hybridizing an appropriate primer, sometimes referred to herein as a "sequencing primer", with the nucleic acid template to be sequenced, extending the primer and detecting the nucleotides used to extend the primer.
  • the nucleic acid used to extend the primer is detected before a further nucleotide is added to the growing nucleic acid chain, thus allowing base-by-base in situ nucleic acid sequencing.
  • the linker for one end may contain the sticky end of the restriction site (e.g., KpnI). Attachment of a linker at the other end may exploit complementarity to the two base 3' overhang (e.g., 5'NTN3'). At either end an attached linker may also contain any sequence required to impart a useful annealing site for, for example, attachment to a solid support, direct sequencing, or directional cloning.
  • nucleotides with fluorescent reversible 3' terminators allow each cycle of a sequencing reaction to occur simultaneously for all coordinates in the presence of all four nucleotides (A, C, T, and G) .
  • the polymerase is able to select the correct base to incorporate, with the natural competition among all four alternatives leading to higher accuracy than methods where only one nucleotide is present in the reaction mix at a time.
  • Sequences where a particular base is repeated e.g., homopolymers
  • the simultaneous sequencing of the thousands of clusters present on the solid support is accomplished by recording the unique fluorescent signal for each nucleotide at each position during every cycle of the process. After recording, the fluorescent terminators are removed by a chemical reaction, for example by the addition of a low pH solution, so that the next round of polymerase additions can proceed.
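The cycle structure described above (all four terminator nucleotides present, one base incorporated per cluster per cycle, signal recorded, terminator removed) can be sketched as a loop. This is an idealized illustration: the function name is hypothetical, templates are written 3' to 5' so that the read comes out 5' to 3', and error-free incorporation and detection are assumed.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(templates_3to5, cycles):
    """Read every cluster in parallel, one base per cycle.

    templates_3to5 maps a cluster name to its template written 3'->5'.
    The reversible 3' terminator limits each cycle to a single
    incorporation, so homopolymer runs are read one base at a time.
    """
    reads = {name: "" for name in templates_3to5}
    for cycle in range(cycles):
        # Incorporation and imaging: each cluster adds the one
        # complementary nucleotide and its signal is recorded.
        for name, template in templates_3to5.items():
            if cycle < len(template):
                reads[name] += COMPLEMENT[template[cycle]]
        # Terminator removal would happen here, before the next cycle.
    return reads


clusters = {"cluster_1": "TTTG", "cluster_2": "ACGT"}  # hypothetical templates
reads = sequence_by_synthesis(clusters, cycles=4)
print(reads)
```

Because only one nucleotide is added per cycle, the run of three identical bases in the first cluster is resolved as three separate calls rather than one larger signal.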
  • the detection of incorporated nucleotides is facilitated by including one or more labeled nucleotides in the primer extension reaction.
  • Any appropriate detectable label may be used, for example a fluorophore, radiolabel etc.
  • a fluorescent label is used.
  • the same or different labels may be used for each different type of nucleotide. Where the label is a fluorophore and the same labels are used for each different type of nucleotide, each incorporated nucleotide can provide a cumulative increase in signal detected at a particular wavelength. If different labels are used then these signals may be detected at the different appropriate wavelengths. If desired, a mixture of labeled and unlabelled nucleotides is provided.
  • In order to allow the hybridization of an appropriate sequencing primer to the nucleic acid template to be sequenced, the nucleic acid template should normally be in a single stranded form. If the nucleic acid templates making up the nucleic acid colonies are present in a double stranded form they can be processed to provide single stranded nucleic acid templates using methods well known in the art, for example by denaturation, cleavage etc.
  • the sequencing primers which are hybridized to the nucleic acid template and used for primer extension are preferably short oligonucleotides, generally ranging from 15 to 25 nucleotides in length.
  • the sequence of the primers is designed so that they hybridize to part of the nucleic acid template to be sequenced, preferably under stringent conditions.
  • the sequence of the primers used for sequencing may have the same or similar sequences to that of the colony primers used to generate the nucleic acid colonies of the invention.
  • the sequencing primers may be provided in solution or in an immobilized form.
  • primer extension is carried out, for example using a nucleic acid polymerase and a supply of nucleotides, at least some of which are provided in labeled form, and under conditions suitable for primer extension if a suitable nucleotide is provided. Examples of nucleic acid polymerases and nucleotides which may be used are described above.
  • a washing step is performed in order to remove unincorporated nucleotides which may interfere with subsequent steps.
  • the nucleic acid colony is monitored in order to determine whether a labeled nucleotide has been incorporated into an extended primer.
  • the primer extension step may then be repeated in order to determine the next and subsequent nucleotides incorporated into an extended primer .
  • any device allowing detection, and preferably quantification, of the appropriate label, for example fluorescence or radioactivity, may be used for sequence determination. If the label is fluorescent, a CCD camera, optionally attached to a magnifying device (as described above), may be used. In fact the devices used for the sequence determining aspects of the present invention may be the same as those described above for monitoring the amplified nucleic acid colonies.
  • the detection system is preferably used in combination with an analysis system in order to determine the number and nature of the nucleotides incorporated at each cluster after each step of primer extension. This analysis, which may be carried out immediately after each primer extension step, or later using recorded data, allows the sequence of the nucleic acid template within a given cluster to be determined.
  • If the sequence being determined is unknown, the nucleotides applied to a given cluster are usually applied in a chosen order which is then repeated throughout the analysis, for example dATP, dTTP, dCTP, dGTP. If, however, the sequence being determined is known and is being re-sequenced, for example to analyze whether or not small differences in sequence from the known sequence are present, the sequencing determination process may be made quicker by adding the nucleotides at each step in the appropriate order, chosen according to the known sequence. Differences from the given sequence are thus detected by the lack of incorporation of certain nucleotides at particular stages of primer extension. Thus full or partial sequences of the amplified nucleic acid templates making up particular nucleic acid colonies may be determined using the methods of the present invention.
  • the full or partial sequence of more than one nucleic acid can be determined by determining the full or partial sequence of the amplified nucleic acid templates present in more than one nucleic acid coordinate. Preferably a plurality of sequences is determined simultaneously.
  • the attachment of the oligonucleotide primer as well as the extended nucleic acid template on the solid support is thermostable at the temperature to which the support may be subjected to during the nucleic acid amplification reaction, for example temperatures of up to approximately 100°C, for example approximately 94°C.
  • the attachment is covalent in nature.
  • the heteroduplexes are incubated with an appropriate DNA polymerase enzyme in the presence of dideoxynucleotides.
  • Suitable enzymes for use in this step include without limitation DNA polymerase I, DNA polymerase III holoenzyme, T4 DNA polymerase, and T7 DNA polymerase. The only requirement is that the enzyme be capable of accurate DNA synthesis using the gapped heteroduplex as a substrate.
  • the methods of the present invention are particularly suitable for high-throughput analysis of DNA, i.e., the rapid and simultaneous processing of genomic DNAs derived from an individual. Furthermore, in contrast to other methods for de novo mutation detection, the methods of the present invention are suitable for the simultaneous analysis of a large number of DNA fragments in a single reaction. This is referred to as "multiplex" analysis.
  • the manipulations involved in practicing the methods of the present invention lend themselves to automation, e.g., using multiwell formats as a solid support or as a receptacle for, e.g., beads; robotics to perform sequential incubations and washes; and, finally, automated sequencing using commercially available automated DNA sequencers.
  • populations of for example organisms or cells or tissues can be identified by the amplification of the sample DNA into coordinates, followed by the DNA sequencing of the specific "tags" for each individual genetic entity.
  • the genetic diversity of the sample can be defined by counting the number of tags from each individual entity.
  • the expressed mRNA molecules of a tissue or organism under investigation are converted into cDNA molecules which are amplified into sets of colonies for DNA sequencing.
  • the frequency of coordinates coding for a given mRNA is proportional to the frequency of the mRNA molecules present in the starting tissue.
  • a whole genome slide where the entire genome of a living organism is represented in a number of DNA colonies, numerous enough to contain all the sequences of that genome, may be prepared using the methods of the invention.
  • the genome slide is the genetic card of any living organism. Genetic cards have applications in medical research and genetic identification of living organisms of industrial value.
  • the present invention may also be used to carry out whole genome sequencing where the entire genome of a living organism is amplified as sets of coordinates for extensive DNA sequencing.
  • Whole genome sequencing allows for example, 1) a precise identification of the genetic strain of any living organism; 2) discovery of novel genes encoded within the genome; and 3) discovery of novel genetic polymorphisms.
  • nucleic acid tags can be incorporated into the nucleic acid templates and amplified, and different nucleic acid tags can be used for each cellular source or organism/individual.
  • When the sequence of the amplified nucleic acid is determined, the sequence of the tag may also be determined and the origin of the sample identified.
  • a further aspect of the invention provides the use of the methods of the invention, or the nucleic acid colonies of the invention, or the plurality of nucleic acid templates of the invention, or the solid supports of the invention, for providing nucleic acid molecules for sequencing and re-sequencing, gene expression monitoring, genetic diversity profiling, diagnosis, screening, whole genome sequencing, whole genome polymorphism discovery and scoring and the preparation of whole genome slides (i.e., the whole genome of an individual on one support), or any other applications involving the amplification of nucleic acids or the sequencing thereof.
  • a yet further aspect of the invention provides a kit for use in sequencing, re-sequencing, gene expression monitoring, genetic diversity profiling, diagnosis, screening, whole genome sequencing, whole genome polymorphism discovery and scoring, or any other applications involving the amplification of nucleic acids or the sequencing thereof.
  • This kit contains a plurality of nucleic acid templates and colony primers of the invention bound to a solid support, such as a chip as outlined above.
  • a chip having affixed thereto, directly or indirectly, a plurality (typically in the order of thousands to millions) of DNA fragments (e.g., restriction fragments) of known sequence.
  • fragments preferably single stranded, were purified by virtue of the presence of at least one mismatched nucleotide contained therein, and thus serve as the annealing templates for similarly processed DNA.
  • the annealed DNA thus captured on the chip can be used to readily identify the presence of mismatched nucleotides in other target DNAs. These represent the mutational fingerprint of an individual genome, combined genomes, heritable traits or disease states.
  • Example 1 Purification and sequencing of mismatch nucleotides from PCR amplified DNA.
  • IgG immunoglobulin heavy chain cDNA of known sequence that is cloned into the Zero Blunt PCR Cloning Vector (Invitrogen) is PCR amplified using appropriate primers that introduce a KpnI and a PstI site into the PCR product.
  • the PCR reaction uses error-prone Taq polymerase.
  • Control IgG DNA (unamplified) is derived from the purified vector using the Qiagen Miniprep Kit and is digested with KpnI and PstI, and the resulting IgG cDNA insert is purified by agarose gel electrophoresis and the Qiagen Gel Extraction Kit.
  • the PCR product is digested with KpnI and PstI, electrophoresed in an agarose gel, and purified using the Qiagen Gel Extraction Kit.
  • the PCR DNA and the purified insert DNA from the vector are separately subjected to the same procedures described below.
  • the DNA is reacted with dideoxy ATP and 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2, at 37°C for 1 hour.
  • the reaction products are then purified using a Qiagen DNeasy Kit capture column.
  • the categories of products of the nuclease reaction are expected to be 1) double stranded fragments with a dideoxy modified KpnI or PstI sticky end (5' GTACA-dd 3', 5' TGCAA-dd 3', the dideoxy adenosine having been added in the blocking step) at one end of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl at the other end of the fragment; 2) double stranded fragments with a dideoxy adenosine modified KpnI or PstI sticky end at both ends of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl within the fragment (single strand nick); 3) a large population of perfectly complementary fragments with the dideoxy adenosine modified KpnI sticky ends at one end of the fragment and a dideoxy adenosine modified PstI sticky end at the
  • [ 0141 ] The fragment population is then treated with 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2 and dideoxy UTP [-S-S-] biotin at 37°C for 1 hour.
  • [-S-S-] refers to a linker strand containing ~15-20 C- or N-linked atoms and a disulfide group within the strand.
  • the fragments are then treated with a single-strand specific nuclease (Bal31 nuclease) to release an unmodified double strand fragment from category 2) above.
  • Biotinylated fragments are then purified by passing through a 1ml column of Streptavidin-Sepharose.
  • the captured fragments are eluted with 50-100mM dithiothreitol or beta-mercaptoethanol.
  • the DNA fragments are again purified using the Qiagen DNeasy Kit capture column.
  • the purified fragments are modified by attachment of DNA linkers to allow for cloning and/or sequencing.
  • the first linker has a sticky end that is complementary to either the KpnI or PstI sticky ends with the attached ddATP (i.e., 3' TCATG 5' or 3' TACGT 5').
  • the other end of the linker is blunt and the total linker length is ~15-20 nucleotides in order to remain double-stranded during ligation.
  • T4 DNA ligase is added to the linker/fragment mix, covalently joining the 3' hydroxyl of the linker with the 5' phosphate of the fragment.
  • Example 2 Purification of mismatch nucleotides from genomic DNA.
  • Genomic DNA (~25 µg) is isolated from human red blood cells (San Diego blood bank) using a Qiagen DNeasy Tissue Kit. The resulting fragment sizes range up to ~50 kb and average ~20-30 kb, as judged by agarose gel electrophoresis. The genomic DNA is then digested with KpnI restriction endonuclease at 37°C for 4 hours. The restriction fragments less than ~8 kb are then purified from an agarose gel using a Qiagen Gel Extraction Kit.
  • the DNA is reacted with dideoxy ATP and 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2, at 37°C for 1 hour.
  • the reaction products are then purified using a Qiagen DNeasy Kit capture column.
  • the categories of fragments of the nuclease reaction are expected to be 1) double stranded fragments with a dideoxy modified KpnI sticky end (5' GTACA-dd 3', the dideoxy adenosine having been added in the blocking step) at one end of the fragment and a 3' protruding mismatch nucleotide with a 3'-hydroxyl at the other end of the fragment; 2) double stranded fragments with a dideoxy modified blunt end at one end of the fragment and a 3' protruding mismatch nucleotide with a 3'-hydroxyl at the other end of the fragment; 3) double stranded fragments with a dideoxy modified KpnI sticky end (5' GTACA-dd 3', the dideoxy adenosine having been added in the blocking step) at both ends of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl within the fragment (single
  • [ 0151 ] The fragment population is then treated with 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2 and dideoxy UTP [-S-S-] biotin at 37°C for 1 hour.
  • [-S-S-] refers to a linker strand containing ~15-20 C- or N-linked atoms and a disulfide group within the strand.
  • the fragments are treated with a single-strand specific nuclease (Bal31 nuclease) to release an unmodified double strand fragment from category 3) above.
  • Biotinylated fragments are then purified by passing through a 1 ml column of Streptavidin-Sepharose.
  • the captured fragments are eluted with 50-100mM dithiothreitol or beta-mercaptoethanol.
  • the DNA fragments are again purified using the Qiagen DNeasy Kit capture column.
  • the purified fragments from Example 2 are then modified by the attachment of appropriate DNA linkers or primers to the KpnI sticky ends, the blunt ends and the fragments containing the mismatch nucleotide with the 3' dideoxy UTP using ligation procedures that are well known in the art.
  • Linkers and primers facilitate the covalent attachment of fragments to solid supports to allow for multiplex sequencing by any of a variety of techniques that are well known in the art.
  • the linkers can facilitate the ligation of fragments into vectors that enable cloning and non-multiplex sequencing or for PCR amplification reactions.
  • the DNA sequences of the attached linkers also serve to identify the position of the mismatch nucleotide relative to the linker for unambiguous mismatch nucleotide identification.
  • the DNA sequences of the fragments serve to align the fragments to the known sequence of the human genome and to determine the position and identity of the mismatch nucleotide. In many cases, another fragment derived from the original genomic restriction fragment will be aligned on the same restriction fragment in a genome database.
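The final alignment step described above, placing a purified fragment on a known reference sequence and reading off the position and identity of the mismatch nucleotide, can be sketched as follows. This is a minimal illustration only, not the method of the examples: it assumes short sequences, an ungapped sliding-window comparison, and made-up sequences; the function name `best_ungapped_alignment` is hypothetical, and a practical pipeline would rely on an established genome aligner.

```python
def best_ungapped_alignment(reference, fragment):
    """Slide `fragment` along `reference` and keep the offset with the
    fewest differing positions (first such offset wins on ties).

    Returns (offset, mismatches), where mismatches is a list of
    (reference_position, reference_base, fragment_base) tuples.
    Returns (None, None) if the fragment is longer than the reference.
    """
    best_offset, best_mismatches = None, None
    for offset in range(len(reference) - len(fragment) + 1):
        window = reference[offset:offset + len(fragment)]
        mismatches = [
            (offset + i, ref_base, frag_base)
            for i, (ref_base, frag_base) in enumerate(zip(window, fragment))
            if ref_base != frag_base
        ]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


# Illustrative (invented) sequences: the fragment differs from the
# reference window at a single base.
reference = "TTGGTACCATGCAAGCTTAGGC"
fragment = "CATGGAAG"
offset, mismatches = best_ungapped_alignment(reference, fragment)
for position, ref_base, frag_base in mismatches:
    print(f"fragment aligned at {offset}: position {position} is "
          f"{ref_base} in the reference and {frag_base} in the fragment")
```

Keeping the first best offset is a simplification; repeated sequence in a real genome would require reporting ambiguous placements rather than silently choosing one.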

Abstract

Disclosed are methods for identifying mismatched nucleotides such as SNPs in a DNA of interest. The methods entail a) preparing single stranded DNA fragments from one or more sources, allowing the fragments to re-anneal and form double stranded heterohybrid DNA fragments that include perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatched pair of nucleotides, distinguishing formation of heterohybrid DNA containing a mismatch from formation of DNA which is perfectly complementary, separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA and determining the identity of the mismatched nucleotide(s), and optionally the DNA sequence in the vicinity of the mismatch, which may allow elucidation of the sequence of the DNA, e.g., the composition and location of allelic polymorphism(s) in a DNA of interest. Kits and compositions of matter for practicing the methods are also provided.

Description

MISMATCH NUCLEOTIDE PURIFICATION AND IDENTIFICATION
CROSS-REFERENCE TO RELATED APPLICATION
[ 0001 ] The present application claims the benefit of the filing date of U.S. Provisional Application No. 61/559,410, filed November 14, 2011, entitled MISMATCH NUCLEOTIDE PURIFICATION AND IDENTIFICATION, the disclosure of which is hereby incorporated herein by reference.
BACKGROUND OF THE INVENTION
[ 0002 ] The human nuclear genome is comprised of ~3 × 10^9 base pairs of DNA. The nucleotide differences, when comparing the genome of one individual to that of another individual, may be less than 0.02% of the total. The primary differences between human genomes are single nucleotide polymorphisms (SNPs) occurring at single nucleotides. These polymorphisms also account for the heterozygosity of allelic DNA in multiploid genomes. It has been estimated that in genomic DNA single base-pair variations may be found at approximately 1200-nucleotide intervals, suggesting that there may be 2-3 × 10^6 SNPs total. However, since individual genomes will have in common the majority of these SNPs (and are therefore not SNPs relative to each other), the actual number of SNPs when comparing the genomes of two individuals is probably far lower.
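The arithmetic behind these estimates can be checked with a short sketch. It uses only the figures quoted above (a genome of about 3 × 10^9 base pairs, one variation per roughly 1200 nucleotides, and a pairwise difference below 0.02%); the variable and function names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 x 10^9 base pairs (figure quoted above)
SNP_INTERVAL_BP = 1200           # one variation per ~1200 nucleotides (quoted above)
MAX_DIFF_PER_10000 = 2           # "less than 0.02%" expressed as 2 per 10,000


def estimated_total_snps(genome_size_bp, interval_bp):
    """Number of variable sites if one occurs every `interval_bp` bases."""
    return genome_size_bp // interval_bp


def max_pairwise_differences(genome_size_bp, per_10000):
    """Upper bound on differing positions between two genomes."""
    return genome_size_bp * per_10000 // 10_000


total_snps = estimated_total_snps(GENOME_SIZE_BP, SNP_INTERVAL_BP)
pairwise_bound = max_pairwise_differences(GENOME_SIZE_BP, MAX_DIFF_PER_10000)
print(f"estimated SNPs in total: {total_snps:,}")
print(f"upper bound on differences between two genomes: {pairwise_bound:,}")
```

The first figure falls inside the stated 2-3 × 10^6 range, and the second is several times smaller, which is consistent with the remark that two individuals share most SNP sites.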
[ 0003 ] The total number of SNPs that an individual possesses, as well as their positions in the genome, is different for each individual. Because of their abundance and low mutation rate, SNPs are the markers of choice in association studies to identify the genetic risk factors in common diseases (Risch and Merikangas 1996; Kruglyak 1999). As a result of several large initiatives, several million single base-pair variations have been deposited in public and commercial databases. For example, rs4420638 near ApoE has a powerful association with late-onset Alzheimer's disease and rs333 (aka CCR5Delta32) is a well-known SNP associated with HIV. The ability to easily and rapidly detect such alterations in DNA sequences could be central to the diagnosis of genetic diseases and to the identification of clinically significant variants of disease-causing microorganisms.
[ 0004 ] One method for the molecular analysis of genetic variation involves the detection of restriction fragment length polymorphisms (RFLPs) using the Southern blotting technique (Southern, J. Mol. Biol. 98:503-517 (1975)). Since this approach is relatively cumbersome, new methods have been developed, some of which are based on the polymerase chain reaction (PCR). These include: RFLP analysis using PCR (Chehab et al., Nature 329:293-294 (1987); Rommens et al., Am. J. Hum. Genet. 45:395-396 (1990)), the creation of artificial RFLPs using primer-specified restriction-site modification (Haliassos et al., Nucl. Acids Res. 17:3606 (1989)), allele-specific amplification (ASA) (Newton et al., Nucl. Acids Res. 17:2503-2516 (1989)), oligonucleotide ligation assay (OLA) (Landegren et al., Science 241:1077-1080 (1988)), primer extension (Sokolov, Nucl. Acids Res. 18:3671 (1989)), artificial introduction of restriction sites (AIRS) (Cohen et al., Nature 334:119-121 (1988)), allele-specific oligonucleotide hybridization (ASO) (Wallace et al., Nucl. Acids Res. 9:879-895 (1981)) and their variants. Together with robotics, these techniques for direct mutation and analysis have helped in reducing cost and increasing throughput when only a limited number of mutations need to be analyzed for efficient diagnostic analysis.
[0005] These methods are limited in their applicability to complex mutational analysis. To achieve adequate detection frequencies for rare mutations using the above methods, large numbers of mutations must be screened. For example, in cystic fibrosis, a recessive disorder affecting 1 in 2000-2500 live births in the United States, more than 225 presumed disease-causing mutations have been identified. Furthermore, multiple mutations may be present in a single affected individual, and may be spaced within a few base pairs of each other. These phenomena present unique difficulties in designing clinical screening methods that can accommodate large numbers of sample DNAs.
[0006] To identify previously unknown mutations within a gene, other methodologies have been developed, including: single-strand conformational polymorphisms (SSCP) (Orita et al., Proc. Natl. Acad. Sci. USA 85:2766-2770 (1989)), denaturing gradient gel electrophoresis (DGGE) (Meyers et al., Nature 313:495-498 (1985)), heteroduplex analysis (HET) (Keen et al., Trends Genet. 7:5 (1991)), chemical cleavage analysis (CCM) (Cotton et al., Proc. Natl. Acad. Sci. 85:4397-4401 (1988)), complete sequencing of the target sample (Maxam et al., Methods Enzymol. 55:499-560 (1980), Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)), mismatch purification and evaluation (U.S. Patents 7,033,757 and 6,340,566) and mismatch nicking and sequencing (Peter et al., BioTechniques 35:702-707 (2004); U.S. Patents 5,571,676 and 5,824,471).
[0007] While all of these approaches may identify sequences where a mismatch may have existed, they are not capable of directly identifying the specific nucleotide(s) that are in the mismatch. As a result, there is a need in the art for methods that can identify mismatch nucleotides and thereby confirm that a true mismatch exists at a certain location in the genome. Simply sequencing a mismatch site gives an ambiguous result since any nucleotide can exist in three different mismatch configurations. Identification of the precise nucleotides resulting in a mismatch ultimately requires a non-degrading methodology that can readily be applied to an entire genome or combination of genomes. Thus, there is a further need in the art for a simplified, relatively low-cost method that allows for the efficient analysis of large numbers of DNA samples for the presence of previously unidentified mutations or sequence alterations and that further allows for determining the sequence of the DNA in the vicinity of the mismatch or sequence alteration.
SUMMARY OF THE INVENTION
[ 0008 ] The present invention encompasses high-throughput methods for identifying the complete set of SNPs in a genome of interest that may be due to allelic heterozygosity or result from a comparison of a target DNA to a reference DNA whose sequence is known or substantially known. One aspect of the present invention is directed to a method of determining the sequence of a DNA, comprising: a) preparing single stranded DNA fragments from a polyploid organism; b) allowing the fragments to re-anneal and form double stranded heterohybrid DNA fragments wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; c) distinguishing formation of heterohybrid DNA containing a mismatch from formation of DNA which is perfectly complementary; d) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA (which may be conducted in a single or multiple consecutive or non-consecutive steps); and e) determining the identity of the mismatched nucleotide(s), and optionally the sequence of the mismatch region (i.e., the DNA sequence in the vicinity of the mismatch in d)), thus allowing elucidation of the sequence of the nucleic acid as well as the composition and location of the allelic polymorphism(s) in the DNA of the polyploid organism. The various genetic alterations identified by the present method include additions, deletions, or substitutions of one or more nucleotides.
[ 0009 ] Another aspect of the present invention is directed to a method of determining the sequence of a DNA, comprising: a) preparing single stranded fragments of a first DNA having a substantially known sequence; b) preparing single stranded fragments of a second DNA having an unknown sequence; c) contacting the single stranded fragments of a) or copies thereof, and the single stranded fragments of b) or copies thereof under conditions that allow formation of heterohybrid DNA, wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; d) distinguishing formation of heterohybrid DNA containing a mismatch from formation of heterohybrid DNA which is perfectly complementary; e) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA (which may be conducted in a single or multiple consecutive or non-consecutive steps); and f) determining the identity of the mismatched nucleotide(s), and optionally the sequence of the mismatch region (i.e., the DNA sequence in the vicinity of the mismatch in d)), thus allowing elucidation of the sequence of the second DNA.
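Steps c) through e) of this aspect can be pictured with a small in-silico sketch. It is a toy model under stated assumptions, not the claimed wet-lab procedure: each heterohybrid is taken to be one reference top strand annealed to the target bottom strand of the same fragment, fragments are matched by a made-up identifier, all sequences are invented, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatch_positions(top, bottom):
    """Both strands are written 5'->3'. Returns 0-based positions on `top`
    whose partner on `bottom` is not the Watson-Crick complement."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length in this toy model")
    bottom_3to5 = bottom[::-1]  # re-orient so index i pairs with top[i]
    return [i for i, (t, b) in enumerate(zip(top, bottom_3to5))
            if COMPLEMENT[t] != b]


def partition_heterohybrids(reference_top, target_bottom):
    """Split annealed fragments into perfectly complementary ones and
    ones containing at least one mismatch."""
    perfect, mismatched = [], {}
    for frag_id, top in reference_top.items():
        bottom = target_bottom.get(frag_id)
        if bottom is None:
            continue  # no partner strand available for this fragment
        positions = mismatch_positions(top, bottom)
        if positions:
            mismatched[frag_id] = positions
        else:
            perfect.append(frag_id)
    return perfect, mismatched


# Invented fragments: the target carries one substitution in frag2.
reference_top = {"frag1": "GGTACCATGC", "frag2": "TTGACCGTAA"}
target_bottom = {
    "frag1": reverse_complement("GGTACCATGC"),
    "frag2": reverse_complement("TTGATCGTAA"),
}
perfect, mismatched = partition_heterohybrids(reference_top, target_bottom)
print("perfectly complementary:", perfect)
print("containing a mismatch:", mismatched)
```

In the method itself this discrimination is performed biochemically by mismatch recognition proteins; the sketch only makes explicit what "perfectly complementary" and "containing a mismatch" mean for a given pair of strands.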
[ 0010 ] In various embodiments, the method may be carried out by any one of the following sequences of steps. For example, the method may entail the steps of: a) creating double-stranded restriction fragments or double-stranded fragments derived from mechanical shear of genomic DNA; b) modifying the 3' hydroxyl groups of all of the fragments with a blocking moiety or group; c) separating the double stranded DNA to create a single stranded DNA population capable of randomly re-annealing to reform double stranded DNA fragments; d) allowing the DNA to re-anneal, thus forming the heterohybrid DNA wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch, and (d') reacting the heterohybrid DNA with a mismatch recognition protein-based system, thus creating a population of double strand DNA fragments wherein one strand contains at least one break in its phosphodiester bonds in the mismatch region (i.e., the vicinity of a mismatched nucleotide pair); or e) allowing the single stranded DNA to re-anneal, thus forming the heterohybrid DNA wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch, and (e') reacting the heterohybrid DNA with a mismatch recognition protein-based system, thus creating a population of double strand DNA fragments wherein both strands of the heterohybrid DNA containing a mismatch are cleaved in the mismatch region (i.e., the vicinity of the mismatched nucleotide pair) such that the heterohybrid DNA containing a mismatch becomes at least two fragments; f) purifying the fragments containing the mismatch nucleotides or mismatched pair of nucleotides from all other heterohybrid DNA fragments (which may be conducted in a single or multiple consecutive or non-consecutive steps); and g) identifying the mismatched nucleotides and the adjacent DNA sequence in each DNA fragment.
[ 0011 ] In practicing the present invention, the target DNA is re-annealed under stringent conditions with a reference DNA sample, or with itself. Typically, the reference DNA sample comprises genomic DNA from a known or substantially known sequence of DNA referred to as the reference standard or reference DNA. The target DNA sample is genomic DNA containing an unknown number of mutations or SNPs relative to the reference DNA and is referred to as target DNA. Target DNA is processed in the same fashion as the reference DNA. For example, if the reference DNA is digested with a particular restriction enzyme or enzymes, the target DNA is preferably digested with the same enzyme(s). In another embodiment of the method, both the target and the reference DNAs are obtained from the same source such as a polyploid organism, and are then melted and re-annealed, wherein the mismatched DNA sites may be derived from the polymorphisms inherent to allelic heterozygosity. The hybrids that form that contain mismatch regions are recognized and endonucleolytically cleaved on one or both sides of the mismatch region by mismatch recognition protein-based systems.
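The point that target and reference DNA should be digested with the same enzyme can be illustrated with a simplified digest. The sketch assumes KpnI (recognition sequence GGTACC, cut after the fifth base on the strand shown), models only one strand, and uses invented sequences; the function name `digest` is illustrative.

```python
KPNI_SITE = "GGTACC"
KPNI_CUT_OFFSET = 5  # GGTAC^C: cut after the fifth base of the site


def digest(seq, site=KPNI_SITE, cut_offset=KPNI_CUT_OFFSET):
    """Cut `seq` (one strand, 5'->3') at every occurrence of `site` and
    return the resulting fragments in order."""
    fragments, start = [], 0
    idx = seq.find(site)
    while idx != -1:
        cut = idx + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        idx = seq.find(site, idx + 1)
    fragments.append(seq[start:])
    return fragments


# Invented sequences: the target differs from the reference by one base
# that lies outside the recognition sites.
reference = "AAAGGTACCTTTTGGTACCGG"
target = "AAAGGTACCTTTAGGTACCGG"

reference_fragments = digest(reference)
target_fragments = digest(target)
print(reference_fragments)
print(target_fragments)
```

Because both samples are cut at the same sites, the fragments have matching boundaries and can re-anneal end to end, leaving the single substitution as an internal mismatch. A substitution inside a recognition site would instead change the fragment pattern, which is the basis of the RFLP approach mentioned in the background.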
[ 0012 ] The re-annealed heterohybrid DNA may be created from single stranded fragments of target and reference DNA that are obtained either directly from fragmented genomic DNA or indirectly from amplified or cloned fragments of genomic DNA. These heterohybrids in solution may be reacted with one or more mismatch repair enzymes under conditions in which the repair enzyme(s) remains attached to the mismatch region of the heterohybrid for a sufficient period of time to allow for further manipulation of the enzyme-DNA complex. Examples of further manipulation include purification, precipitation, hybridization, and denaturation.
[ 0013 ] In order to vastly simplify the entire process and avoid any procedures that would obscure the precise identification of a mismatch nucleotide (such as by the application of polymerase activity for nick translating, filling in, amplifying or otherwise copying the DNA), fragments containing each mismatch nucleotide of the mismatched pair of nucleotides are separated from the much larger population of perfectly complementary fragments, and from each other. In preferred embodiments of the methods of the present invention, heterohybrid DNA that is perfectly complementary (and which does not contain at least one mismatch) is first separated from the population of heterohybrid DNA that does contain a mismatch, and in subsequent step(s), each of the heterohybrid DNAs containing a mismatch are separated from each other, e.g., by arraying on a solid support such that the individual fragments containing a mismatch can be sequenced and the localization of the mismatch in the genome can be determined. Alternatively, the purified fragments are cloned into appropriate vectors for propagation in, for example, microorganisms. By first purifying each individual mismatch nucleotide derived from a mismatched nucleotide pair, DNA amplification of the total population of fragments (a common approach of multiplex sequencing techniques) can unambiguously identify mismatch nucleotides and thereby confirm the presence and location of mismatched nucleotide pairs. The present invention thus provides a simplified approach for determining the entirety of polymorphisms in a genome or combination of genomes.
[ 0014 ] In one embodiment of the method, the scission at the site of the mismatch in the re-annealed heterohybrid DNA is created by a detectably labeled ATE enzyme ("all-type nicking enzyme") and results in the covalent attachment of the enzyme to the nicked strand. For example, DNA topoisomerase I is a ubiquitous enzyme that relieves DNA torsional stress by introducing a break in the phosphodiester bond between the mismatch nucleotide and the nucleotide immediately on the 5' side of the mismatch. The enzyme becomes covalently attached to the free 3' hydroxyl via a phosphotyrosine moiety. Using a fluorescent ATE, the resulting fluorescent signal indicates the presence of a mismatch in a heterohybrid DNA. A greater intensity of the fluorescent signal also indicates that there is more than one mismatch in the DNA fragment. Since ATEs can be strand-selective based on local nucleotides, the strand containing the nick can be ultimately identified by comparison to the known sequence (which may be contained in a database), once the sequence in the vicinity of the mismatch is obtained. Covalently bound ATE can be removed by proteolysis or by the activity of a tyrosyl phosphodiesterase. The resulting 3'-phosphorylated nick is reconstituted to a 3'-hydroxyl by a polynucleotide kinase phosphatase, which may thus serve as a substrate for DNA polymerase and hence for sequencing.
[ 0015 ] In another embodiment, the mismatch repair enzyme is attached to a first binding moiety or group, e.g., biotin, and the further manipulation involves contacting the enzyme-DNA complex with a second binding moiety that forms a complex with the first binding moiety (e.g., streptavidin), and which is attached to a solid support. By this method, heterohybrid DNA containing mismatches can be first identified in solution by enzyme binding and subsequently purified by affinity chromatography. These mismatch-containing heterohybrid DNAs can then be used as the starting material for amplification, immobilization, and sequencing.
[ 0016 ] Some enzymes, e.g., topoisomerases, do not covalently modify a mismatch nucleotide but attach to the adjacent nucleotide on the 5' side of the mismatch nucleotide. Thus, in a further embodiment of the present invention, the enzyme (e.g., topoisomerase)-DNA complexes are modified by the steps of (a) modifying substantially all of the free 3' hydroxyl groups with blocking groups, followed by (b) exposing the 3' hydroxyl group of one or both mismatch nucleotides, and (c) covalently modifying the mismatch nucleotide 3' hydroxyl group to allow for purification and identification of the individual mismatch nucleotides and the DNA sequence in the vicinity of the mismatch.
[ 0017 ] In other embodiments, the population of re-annealed heterohybrid DNA fragments is treated with a mismatch endonuclease that cuts both strands of DNA at the 3 ' end of the mismatch. This treatment results in the creation of two double-stranded fragments derived from the mismatch-containing fragment. The new fragments each have a single nucleotide extending from the 3' end of one strand, and each of these nucleotides is a mismatch nucleotide.
[ 0018 ] Another aspect of the present invention is directed to a chip having affixed thereto, directly or indirectly, a plurality (typically in the order of thousands to millions) of DNA fragments (e.g., restriction fragments) of known sequence. These fragments, preferably single stranded, were purified by virtue of the presence of at least one mismatched nucleotide contained therein, and thus serve as the annealing templates for similarly processed DNA. The annealed DNA thus captured on the chip can be used to readily identify the presence of mismatched nucleotides in other target DNA's. These represent the mutational fingerprint of an individual genome, combined genomes, heritable traits or disease states.
[ 0019 ] The present application is believed to provide a solution to current and emerging needs that face the biotechnology industry and particularly the fields of genomics, pharmacogenomics, drug discovery, food characterization and genotyping. Thus the method of the present invention has potential application in for example: nucleic acid sequencing and re-sequencing, diagnostics and screening, gene expression monitoring, genetic diversity profiling, whole genome polymorphism discovery and scoring, the SNPasome (array of SNPs), whole genome sequence determination, and the evolution and propagation of specific types of mutations in individuals and populations .
DETAILED DESCRIPTION OF THE INVENTION
[ 0020 ] All patent applications, patents, and literature references cited in this specification are hereby incorporated by reference in their entirety. In case of conflict, the present description, including definitions, will control.
[ 0021 ] The present invention encompasses high-throughput methods for identifying all of the mismatches contained in a re-annealed genome or combination of genomes (e.g., from different sources). As used herein, the term high-throughput refers to a system for rapidly modifying and assaying large numbers of distinct DNA samples at the same time.
[ 0022 ] In practicing the methods of the present invention, single-stranded genomic DNA is re-annealed with itself or other genomic DNA of known or unknown sequence. A "known DNA sequence" as referred to herein refers to a sequence of nucleotides comprising a gene, a set of genes, or a genome where the nucleotide sequence is substantially or entirely known such that oligonucleotides complementary to repeating units of the gene, set of genes, or genome can be synthesized. Examples of such repeating units include but are not limited to, for example, SNPs and restriction sites. An "unknown DNA sequence" is a gene, set of genes, or a genome that contains an unknown population of single or multiple nucleotide differences in comparison to a known DNA sequence.
[ 0023 ] The methods of the present invention take advantage of the differences between physico-chemical properties of DNA hybrids between almost-identical (but not completely identical) DNA strands and DNA hybrids that are perfectly complementary. When a sequence alteration is present, the heterohybrid DNAs include double stranded fragments that contain a mismatched pair of nucleotides that is embedded in an otherwise perfectly matched hybrid. According to the present invention, mismatches are formed under controlled conditions and are chemically and/or enzymatically modified. The sequences adjacent to, and including, the mismatch (referred to herein as the mismatch region) are then determined. Depending upon the mismatch recognition method used, the mismatch region may include any number of bases, typically from 1 to about 1000 bases.
[ 0024 ] Embodiments of the present invention may encompass the steps of: 1) preparing re-annealed DNA derived from a single genome or multiple genomes of known or unknown sequence wherein re-annealing can occur before or after fragmentation by, for example, restriction enzyme digestion or mechanical shearing; 2) cleaving one or both of the DNA strands at mismatches to form a single-stranded nick or a new DNA fragment in the vicinity of the mismatch; and 3) purifying the fragments containing mismatch nucleotides such that each mismatch nucleotide is separated from the other mismatch nucleotide; followed by 4) determining the precise identity of the mismatch nucleotide and the DNA sequence in the vicinity of the mismatch nucleotide.
[ 0025 ] These steps are described in detail below.
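The comparison underlying steps 2) to 4) can be illustrated in silico. The following is a minimal sketch only, not the enzymatic method of the invention: it assumes two equal-length strands of a re-annealed duplex are already known as text, and it reports each mismatched pair together with a short flanking window standing in for the mismatch region. The names `Mismatch`, `find_mismatches`, and `flank` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Watson-Crick partners; any other juxtaposed pair is treated as a mismatch.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass
class Mismatch:
    position: int     # 0-based index along the top strand
    top_base: str     # mismatch nucleotide on the top strand
    bottom_base: str  # mismatch nucleotide on the bottom strand
    region: str       # top-strand sequence around the mismatch


def find_mismatches(top, bottom, flank=12):
    """Return all mismatched pairs in a duplex.

    `top` is written 5'->3'. `bottom` is the opposite strand written
    3'->5', so that bottom[i] is juxtaposed with top[i].
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    found = []
    for i, (t, b) in enumerate(zip(top, bottom)):
        if COMPLEMENT[t] != b:
            start = max(0, i - flank)
            found.append(Mismatch(i, t, b, top[start:i + flank + 1]))
    return found


# Toy duplex (hypothetical sequence) with a single G/T mismatch at index 8.
top = "GATTACAGGCTTAACG"
bottom = "CTAATGTCTGAATTGC"
hits = find_mismatches(top, bottom, flank=3)
```

In this toy duplex, `hits` holds one entry describing a G opposite a T, which corresponds to one of the mismatched nucleotide pairs that the cleavage and purification steps are intended to isolate.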
NUCLEIC ACIDS AND SOURCES THEREOF
[ 0026 ] The nucleic acids to be used can be obtained using methods well known and documented in the art, for example, by obtaining a nucleic acid sample such as total DNA, genomic DNA, or cDNA and generating fragments therefrom by, for example, limited restriction enzyme digestion or by mechanical means. Thus, the DNA for use in the present invention may include genomic DNA derived from two different genomes (i.e., two different individuals). Alternatively, the genomic DNA may be replaced with DNA derived from a polymerase chain reaction (PCR) or with DNA derived from reverse transcription of mRNAs (cDNA).
[ 0027 ] Genomes may include multiple prevalent versions, which contain alterations in sequence relative to each other that cause no discernible pathological effect. Such variations are designated "polymorphisms" or "allelic variants". Most preferably, genomic DNA from a single individual is used for the second DNA sample of unknown sequence. This ensures that, statistically, hybrids formed between the first, second, or same genomic DNA sample will be perfectly matched except in the region of the mutation, where discrete mismatch regions will form. In some applications, it is desired to detect polymorphisms. In these cases, appropriate sources for the DNA sample will be selected accordingly. Depending upon what method is used subsequently to detect mismatches, the DNA may also be chemically or enzymatically modified, e.g., to remove or add methyl groups.
PREPARATION OF NUCLEIC ACID CONTAINING MISMATCHES
[ 0028 ] Heterozygous DNA from a single polyploid genomic DNA may be isolated from any cell source or body tissue (e.g., fluid) of a plant, animal or human. Non-limiting examples of cell sources include blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy. Cellular sources from plants include leaf, stem, root, flower, or cultured cells. Body fluids include blood, urine, cerebrospinal fluid, and tissue exudates at the site of infection or inflammation. DNA may be extracted from the cell source or body tissue using any one of the numerous methods that are standard in the art. It will be understood that the particular method used to extract DNA will depend on the nature of the source. The amount of DNA to be extracted for use in the inventive methods is typically in the range of at least 5 pg (corresponding to about 1 cell equivalent of a genome size of 3×10^9 base pairs). In some applications, such as, for example, detection of sequence alterations in the genome of a microorganism, variable amounts of DNA may be extracted.
[ 0029 ] Likewise, the reference and target DNAs to be combined may be obtained from any cell source or body fluid. In one embodiment, the reference DNA is obtained from a single cell source for comparison to different target DNAs. A single source of reference DNA could be obtained, for example, from human cells in culture. Alternatively, reference or target DNA can be obtained from an individual with a particular disease, such as cancer. Reference and target DNAs obtained from individuals with diagnosed diseases or clinical symptoms or genetic traits could be used to ultimately identify unique SNP profiles associated with diseases, symptoms, or traits. In this way, multiple reference and target DNAs can be obtained, each of which will contain the SNP profile relating to a disease, a particular symptom or a trait.
[ 0030 ] Once extracted, the DNA may be employed for use in the inventive methods without further manipulation.
[ 0031 ] DNA fragments suitable for the present invention can be prepared by a variety of techniques well known to those of skill in the art. For example, a restriction endonuclease recognizing a six-base DNA sequence will cleave genomic DNA into fragments that have an average size of 4^6 (4096) bases. In a preferred embodiment, genomic DNA is fragmented by a restriction endonuclease recognizing 6 bases and cutting the DNA to leave 3' overhanging sticky ends (e.g., KpnI leaves 5'GTAC3' ends). A "3' overhanging nucleotide" as used herein refers to a 3' terminal nucleotide that has no juxtaposed complementary nucleotide. For example, a KpnI digestion of DNA results in four 3' overhanging nucleotides. A "5' overhanging nucleotide" as used herein refers to a 5' terminal nucleotide that has no juxtaposed complementary nucleotide. For example, an EcoRI digestion of DNA results in four 5' overhanging nucleotides. Restriction endonucleases that produce fragments having blunt ends may also be useful.
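The fragment-size arithmetic and the top-strand cut positions can be sketched as follows. This is an illustrative calculation only: it assumes a random sequence with equal base frequencies for the 4^6 estimate, and it models only the top strand (KpnI cutting GGTAC^C and EcoRI cutting G^AATTC). The function names and the example sequence are hypothetical.

```python
def expected_fragment_length(site_length, alphabet_size=4):
    """Average spacing between sites in a random, equal-frequency sequence."""
    return alphabet_size ** site_length


def digest(sequence, site, cut_offset):
    """Split the top strand at every occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site that stay
    on the upstream fragment (5 for KpnI GGTAC^C, 1 for EcoRI G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


average = expected_fragment_length(6)
kpn_fragments = digest("AAGGTACCTTTTGGTACCAA", "GGTACC", 5)
```

Because KpnI cuts near the 3' end of its site on each strand, the top-strand fragments end in ...GGTAC, leaving the four-nucleotide GTAC 3' overhang described above once the complementary strand is taken into account.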
[ 0032 ] Another method of providing double strand fragments from genomic DNA involves mechanical shearing (Joneja et al., BioTechniques 45:553-556 2009) .
[ 0033 ] The DNA fragments prepared may be optionally separated into different sizes of DNA fragments. Using agarose gel electrophoresis followed by gel melting and DNA capture, fragments in the range of, for example, 50-10,000 base pairs, can be isolated and used for the re-annealing procedures. Preferably, DNA sizing of this sort is performed after the step of DNA melting and re-annealing.
[ 0034 ] The DNA fragments may be amplified by PCR, preferably before melting and re-annealing. Amplification provides the advantage of increasing the amount of either specific DNA or total sequences within the DNA sequence population. The amplified regions may be specified by the choice of particular flanking sequences for use as primers. Alternatively, the primers can be ligated to the ends of each restriction fragment. The length of DNA sequence that can be amplified typically ranges from 80 bp up to about 30 kbp (Saiki et al . , 1988, Science, 239:487) . Furthermore, the use of amplification primers that are modified by, e.g., biotinylation, can allow for the selective incorporation of the modification into the amplified target DNA.
[ 0035 ] "Nucleic acid template" as used herein refers to an entity that includes or contains the DNA to be amplified or sequenced. The DNA to be amplified or sequenced can also be provided in a double stranded form. Thus, "DNA templates" of the invention may be single or double stranded DNA. The DNA templates to be used in the methods of the present invention can be of variable lengths, typically at least 50 base pairs in length and in some embodiments up to about 30,000 base pairs in length. The nucleotides making up the DNA templates may be naturally occurring or non-naturally occurring nucleotides. The DNA templates of the invention not only comprise the DNA to be amplified but may in addition contain at the 5' and 3' end short sequences that are complementary to synthetic oligonucleotides.
[ 0036 ] Melting or denaturing DNA to yield single strands suitable for re-annealing or hybridization is preferably accomplished using heat, e.g., 95° C or higher.
[ 0037 ] Re-annealing of single stranded fragments involves separating the two strands of restricted DNA fragments using, for example, heat. Each of the strands derived from the restriction fragments is then re-annealed at a lower temperature giving rise to heterohybrid DNA that contains perfectly complementary double strand DNA and double strand fragments containing mismatched nucleotide pairs.
Hybridization and re-annealing reactions according to the present invention may be performed under high stringency conditions, which in general entails carrying out the reactions in solutions ranging from about 10mM NaCl to about 600mM NaCl, and at temperatures ranging from about 37°C to about 65°C. It will be understood that the stringency of a hybridization reaction is determined by both the salt concentration and the temperature. Thus, a hybridization performed in 10mM salt at 37°C may be of similar stringency to one performed in 500mM salt at 65°C. For the purposes of the present invention, any hybridization conditions may be used that form perfect hybrids between precisely complementary sequences and mismatch loops between non-complementary sequences in the same molecules. The re-annealing process is initiated by cooling the DNA to a temperature that yields an optimum proportion of double stranded DNA, e.g., 50° to 70°C. Preferably, re-annealing reactions are performed in about 600mM NaCl at about 65°C in solution.
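The trade-off between salt and temperature can be illustrated with a widely used empirical estimate of duplex melting temperature, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N. The sketch below is not taken from the present description: the formula is a general approximation (commonly considered less reliable at high salt), and the 41% GC content and 4096 bp fragment length are assumed values for illustration. Stringency is expressed here as the number of degrees the reaction temperature lies below the estimated Tm.

```python
import math


def duplex_tm_celsius(na_molar, gc_percent, length_bp):
    """Empirical melting temperature estimate for long DNA duplexes."""
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent - 675.0 / length_bp)


def degrees_below_tm(temp_c, na_molar, gc_percent=41.0, length_bp=4096):
    """Distance of the reaction temperature below Tm (smaller = more stringent).

    gc_percent and length_bp defaults are assumptions for illustration.
    """
    return duplex_tm_celsius(na_molar, gc_percent, length_bp) - temp_c


low_salt = degrees_below_tm(37.0, 0.010)   # 10 mM NaCl at 37 C
high_salt = degrees_below_tm(65.0, 0.500)  # 500 mM NaCl at 65 C
```

Under these assumptions both conditions sit roughly 28 °C below the estimated Tm, which is consistent with the statement that the two conditions may be of similar stringency.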
[ 0038 ] "Mismatched nucleotide pair" or "mismatched nucleotide pairs" or "mismatched nucleotides" as used herein refers to a pair of nucleotides contained in opposite strands of a largely complementary double strand DNA that are juxtaposed opposite to each other but comprise nucleotide pairs that are not GC or AT. Examples of mismatched nucleotide pairs are GG, CC, AA, TT, GA, GT, CA, and CT.
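The eight mismatched nucleotide pairs listed above follow from enumerating every unordered pair of the four bases and removing the two complementary pairs. A minimal sketch of that enumeration (the function name is hypothetical):

```python
from itertools import combinations_with_replacement

# The two complementary pairs, stored without regard to order.
WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def mismatched_pairs(bases="GCAT"):
    """List every unordered base pair that is not GC or AT."""
    pairs = []
    for a, b in combinations_with_replacement(bases, 2):
        if frozenset((a, b)) not in WATSON_CRICK:
            pairs.append(a + b)
    return pairs


pairs = mismatched_pairs()
```

Four bases give ten unordered pairs, and excluding GC and AT leaves the eight mismatches named in the definition.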
[ 0039 ] "Mismatch nucleotide" as used herein refers to a single nucleotide that is one of the nucleotides in a mismatched nucleotide pair.
[ 0040 ] "Unmatched nucleotide" or "unmatched nucleotides" as used herein refers to one or more nucleotides contained in one strand of a double strand DNA that does not have a juxtaposed nucleotide on the opposite strand. An example of unmatched nucleotides is a single stranded DNA loop that protrudes from double stranded DNA.
[ 0041 ] The fragmented double-stranded DNA and/or the double-stranded heterohybrid DNA that is formed by the re-annealing or the hybridization reaction may be treated to block substantially all or all of 3' free ends so that they cannot serve as substrates for further enzymatic modification such as by RNA or DNA ligases or polymerases. Suitable blocking methods include, without limitation, removal of 5 '-phosphate groups, homopolymeric tailing of 3 ' -ends with nucleotides, monomeric tailing of 3 ' -ends with dideoxynucleotides, and ligation of modified double-stranded oligonucleotides to the ends of the duplex.
[ 0042 ] Enzymes for modifying 3' hydroxyls include, generally, polymerases and transferases. Alternatively, modification of 3'-hydroxyls can be by a chemical reaction. For example, the reaction of a 3' hydroxyl group with a reactive phosphoramidite will result in modification of the 3'-hydroxyl. In a preferred embodiment, modification of all 3'-hydroxyls is accomplished with terminal deoxynucleotidyl transferase (TdT) and one or more deoxynucleotide triphosphates (dNTP). In a further preferred embodiment, the deoxynucleotide triphosphates are dideoxy nucleotide triphosphates (ddNTP). In another embodiment, the modification of all 3' hydroxyls is accomplished with TdT and one or more deoxynucleotide triphosphates that each have a removable blocking group on the 3'-hydroxyl. Examples of reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
MISMATCH RECOGNITION AND CLEAVAGE
[ 0043 ] Re-annealed DNA fragments are treated so that one or both DNA strands are cleaved within the mismatch region. "Phosphodiester bond break" as used herein refers to scission of the covalent linkage joining the 3' hydroxyl and 5' phosphate that join two ribose moieties resulting in a free 3' hydroxyl and 5' phosphate. Depending on the method used for mismatch recognition and cleavage (see below), cleavage may occur at some predetermined distance from either boundary of the mismatch region, and may occur on either strand. The mismatch region as used herein thus encompasses from 1 (typically) to about 1000 bases from the borders of the mismatch.
[0044] In a preferred embodiment, the mismatch region is comprised of about 25 base pairs. Knowing the sequence, for example, of the 25 base pairs of DNA in the mismatch region allows for the unambiguous determination of the genomic location of the mismatch by analysis of a genomic sequence database of reference DNA.
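A rough calculation suggests why a region of about 25 base pairs is generally sufficient to locate a mismatch. The sketch below assumes a random genome with equal base frequencies and counts both strands; real genomes contain repeats, so this is an idealized estimate rather than a guarantee of uniqueness. The function names are hypothetical.

```python
def expected_chance_matches(k, genome_bp, strands=2):
    """Expected number of chance occurrences of one k-mer in a random genome."""
    return strands * genome_bp / 4 ** k


def shortest_expected_unique_length(genome_bp, strands=2):
    """Smallest k for which fewer than one chance occurrence is expected."""
    k = 1
    while expected_chance_matches(k, genome_bp, strands) >= 1:
        k += 1
    return k


chance_25 = expected_chance_matches(25, 3e9)
min_k = shortest_expected_unique_length(3e9)
```

For a genome of 3×10^9 base pairs the expected number of chance matches to a 25-mer is far below one under these assumptions, so a 25 base pair mismatch region leaves a wide margin over the idealized minimum length.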
[0045] Non-limiting examples of mismatch recognition and cleavage protein-based systems suitable for use in the present invention include single and double strand mismatch nicking proteins, mismatch repair proteins, nucleotide excision repair proteins, mismatch nucleases, chemical modification, and combinations thereof. These embodiments are described below.
[0046] The mismatch recognition and modification proteins used in practicing the present invention may be derived from any species, including bacterial (e.g., E. coli) and humans, or combinations thereof. Typically, functional homologs for a given protein exist across phylogeny. A "functional homolog" of a given protein as used herein is another protein that can functionally substitute for the first protein, either in vivo or in a cell-free reaction.
[0047] Mismatch repair proteins: A number of different enzyme systems exist across phylogeny to repair mismatches that form during DNA replication. In E. coli, one system involves the MutY gene product, which recognizes A/G mismatches and cleaves the A-containing strand (Tsai-Wu et al., J. Bacteriol. 178:1902 (1991)). Another system in E. coli utilizes the coordinated action of the MutS, MutL, and MutH proteins to recognize errors in newly-synthesized DNA strands specifically by virtue of their transient state of under-methylation (prior to their being acted upon by dam methylase in the normal course of replication). Cleavage typically occurs at a hemi-methylated GATC site within 1-2 kb of the mismatch, followed by exonucleolytic cleavage of the strand in either a 3'-5' or 5'-3' direction from the nick to the mismatch. In vivo, this is followed by re-synthesis involving DNA polymerase III holoenzyme and other factors (Cleaver, Cell, 76:1-4 (1994)).
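Candidate cleavage sites for the methyl-directed system can be located by scanning for GATC within a window around the mismatch. The following sketch is illustrative only: it works on one strand as text, does not model methylation state or strand discrimination, and uses a default 2000-base window to reflect the 1-2 kb range noted above. The function name and parameters are hypothetical.

```python
def gatc_sites_near(sequence, mismatch_pos, window=2000):
    """Return start indices of GATC sites lying fully within the window.

    The window spans `window` bases on either side of `mismatch_pos`
    (0-based), clipped to the ends of the sequence.
    """
    lo = max(0, mismatch_pos - window)
    hi = min(len(sequence), mismatch_pos + window + 1)
    sites = []
    pos = sequence.find("GATC", lo)
    while pos != -1 and pos + 4 <= hi:
        sites.append(pos)
        pos = sequence.find("GATC", pos + 1)
    return sites


# Toy sequence (hypothetical) with GATC at indices 0, 14, and 28.
toy = "GATC" + "A" * 10 + "GATC" + "A" * 10 + "GATC"
nearby = gatc_sites_near(toy, mismatch_pos=16, window=5)
all_sites = gatc_sites_near(toy, mismatch_pos=16)
```

With a narrow window only the site at index 14 is reported, whereas the default window returns all three sites in the toy sequence.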
[ 0048 ] Non-limiting examples of useful mismatch repair proteins from other organisms include those derived from Salmonella typhimurium (MutS, MutL) (purified MutS, MutL, and MutH are used to cleave mismatch regions (Su et al., Proc. Natl. Acad. Sci. USA 83:5057 (1986); Grilley et al., J. Biol. Chem. 264:1000 (1989))); Streptococcus pneumoniae (HexA, HexB); Saccharomyces cerevisiae ("all-type", MSH2, MLH1, MSH3); Schizosaccharomyces pombe (SWI4); mouse (rep1, rep3); and human ("all-type", hMSH2, hMLH1, hPMS1, hPMS2, duc1). The "all-type" mismatch repair system from human or yeast cells can also be used (Chang et al., Nuc. Acids Res. 19:4761 (1991); Yang et al., J. Biol. Chem. 266:6480 (1991)).
[ 0049 ] In one embodiment, mismatches are identified by nicking one of the strands in the immediate vicinity of the mismatch, for example between the mismatch nucleotide and the next nucleotide on the 5' side. The all-type nicking enzyme (ATE) from human HeLa cells or calf thymus can nick DNA at the first phosphodiester bond 5' to all 8 possible mismatched bases. The strand disparity of this nicking is influenced by the neighboring nucleotide sequences. After nicking, the ATE covalently binds the 3' end of the DNA product to form a cleavable complex. Topoisomerases I introduce transient DNA single-strand breaks by forming a catalytic intermediate in which a covalent bond is generated between an enzyme tyrosine residue (Tyr723 for human topoisomerase I) and the 3'-end of the broken DNA. In a further embodiment, tyrosyl-DNA phosphodiesterase-1 (Tdp1) then removes tyrosine from complexes in which the amino acid is linked to the 3'-end of DNA fragments. Polynucleotide kinase phosphatase is then used to regenerate the 3' hydroxyl to create a substrate for DNA polymerase immediately 5' of the mismatch.
[ 0050 ] In plants, mismatch specific nucleases have been described that cleave DNA strands on the 3' side of the mismatch. Some of these nucleases cut only one strand resulting in mismatch termination on one strand and some of the nucleases cut both strands resulting in the creation of new DNA fragments (Sokurenko et al., N.A.R. 2001, 29:e111; Zhang et al., Genetic Testing and Molecular Biomarkers 2009, 13:97-103). In a preferred embodiment, heteroduplexes are incubated with a mismatch endonuclease enzyme derived from celery (CelI and CelII mismatch endonucleases) that is purified essentially as described in U.S. Patent 7,129,075. Incubations may be performed in, e.g., 20mM Hepes pH 7.5, 10mM NaCl, 3mM MgCl2, varying amounts of DNA and Cel nuclease at 37°C for up to 1 hour.
[ 0051 ] Nucleotide excision repair proteins: In E. coli, four proteins, designated UvrA, UvrB, UvrC, and UvrD, interact to repair nucleotides that are damaged by UV light or otherwise chemically modified (Sancar, Science 266:1954, 1994), and also to repair mismatches (Huang et al., Proc. Natl. Acad. Sci. USA 91:12213 (1994)). UvrA, an ATPase, makes an A2B1 complex with UvrB, binds the site of the lesion, unwinds and kinks the DNA, and causes a conformational change in UvrB that allows it to bind tightly to the lesion site. UvrA then dissociates from the complex, allowing UvrC to bind. UvrB catalyzes an endonucleolytic cleavage at the fifth phosphodiester bond 3' from the lesion; UvrC then catalyzes a similar cleavage at the eighth phosphodiester bond 5' from the lesion. Finally, UvrD (helicase II) releases the excised oligomer. In vivo, DNA polymerase I displaces UvrB and fills in the excision gap, and the patch is ligated.
[ 0052 ] In one embodiment of the present invention, mismatch-containing duplexes formed between DNA strands are treated with a combination (e.g., mixture) of UvrA, UvrB, UvrC, with or without UvrD. As described above, the proteins may be purified from wild-type E. coli, or from E. coli or other appropriate host cells containing recombinant genes encoding the proteins, and are formulated in compatible buffers and concentrations . The final product is a heterohybrid DNA containing a single-stranded gap covering the site of the mismatch.
[ 0053 ] Excision repair proteins for use in the present invention may be derived from E. coli (as described above) or from any organism containing appropriate functional homologs. Non-limiting examples of useful homologs include those derived from S. cerevisiae (RAD1, 2, 3, 4, 10, 14, and 25) and humans (XPF, XPG, XPD, XPC, XPA, ERCC1, and XPB) (Sancar, Science 266:1954 (1994)). When the human homologs are used, the excised patch comprises an oligonucleotide extending 5 nucleotides from the 3' end of the lesion and 24 nucleotides from the 5' end of the lesion. Aboussekhra et al. (Cell 80:859 (1995)) disclose a reconstituted in vitro system for nucleotide excision repair using purified components derived from human cells.
[ 0054 ] Chemical Mismatch Recognition: Mismatch-containing heterohybrid DNA may be chemically modified by treatment with osmium tetroxide (for mispaired thymidines) and hydroxylamine (for mispaired cytosines), using procedures that are well known in the art (see, e.g., Grompe, Nature Genetics 5:111 (1993); and Saleeba et al . , Meth. Enzymol. 217:288 (1993)). In one embodiment, the chemically modified DNA is contacted with excision repair proteins (as described above). The hydroxylamine- or osmium-modified bases are recognized as damaged bases in need of repair, one of the DNA strands is selectively cleaved, and the product is a gapped heteroduplex as above .
[0055] Resolvases: Resolvases are enzymes that catalyze the resolution of branched DNA intermediates that form during recombination events (including Holliday structures, cruciforms, and loops) via recognition of bends, kinks, or DNA deviations (Youil et al . , Proc. Natl. Acad. Sci. USA 92:87 (1995)). For example, Endonuclease VII derived from bacteriophage T4 (T4E7) recognizes mismatch regions of from one to about 50 bases and produces double-stranded breaks within six nucleotides from the 3' border of the mismatch region. T4E7 may be isolated from, e.g., a recombinant E. coli that over-expresses gene 49 of T4 phage (Kosak et al., Eur. J. Biochem. 194:779 (1990)). Another suitable resolvase for use in the present invention is Endonuclease I of bacteriophage T7 (T7E1), which can be isolated using a polyhistidine purification tag sequence (Mashal et al., Nature Genetics 9:177 (1995) ) .
PURIFICATION OF MISMATCH NUCLEOTIDES
[0056] The fragments resulting from various mismatch nucleases may have 5' or 3' extensions of one or more nucleotides, one or more of which is a mismatch nucleotide. These fragments result from mismatch repair enzymes that introduce either a double or single strand break at or near the site of the mismatch. In one set of embodiments, the mismatch nuclease cleaves a phosphodiester bond on either the 5' or 3' side of one or both mismatch nucleotides (Table 1).
Table 1. Purification of reaction products from mismatch endonucleases .
[Table 1 is reproduced as an image in the original publication.]
mismatch nucleotide by, for example, using a polymerase to sequence at the single strand nick or gap or at the double strand fragment termini, each mismatch nucleotide is first purified away from all other DNA fragments as well as from the other mismatch nucleotide of the mismatched pair. In one embodiment, purification of each fragment is accomplished by virtue of the unmodified 3' hydroxyl group that is produced by the mismatch nuclease, with or without an intermediate enzymatic step (Table 1) .
[0058] In one embodiment, identifying, purifying and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a single strand nick on the 3' side of a mismatch nucleotide (embodiment 1 in Table 1) .
[0059] The procedures may include the following steps: a) covalently modifying the newly exposed 3 ' hydroxyl of the mismatch nucleotide. An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification, e.g., with a purification tag. For example, the dideoxy nucleotide may be attached to a linker moiety and a binding moiety such as with biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent. In a further embodiment of the invention, the covalent modification of the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3 ' hydroxyl on that strand of DNA. Examples of reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
[0060] b) Capturing the nicked DNA fragment and separating it from all other DNA fragments by, for example, contacting the DNA solution that has been modified to be biotinylated with streptavidin-Sepharose followed by washing away all other fragments .
[0061] c) Exposing the captured DNA fragment to a single strand-specific nuclease in order to cleave the phosphodiester bond opposite the nick on the other DNA strand and release a fragment of DNA that does not contain a mismatch nucleotide. This fragment of DNA can optionally be captured (e.g., on a solid support) and sequenced if desired. An example of a single strand-specific nuclease for this step is Bal31 nuclease .
[ 0062 ] d) Separating the two mismatch nucleotides and evaluating them independently in order to identify each mismatch nucleotide and determine the sequence of the mismatch region including the mismatch nucleotide. An example of separating the mismatch nucleotides is by, for example, heating to the point of melting and strand separation, whereby one of the single strand DNAs remains attached to the solid support and the other strand is released into solution. Prior to melting, DNA linkers may optionally be attached to the ends of the double strand DNA to facilitate mismatch identification and sequencing. In another embodiment, the removable group blocking the 3' hydroxyl, if present, is removed in order to regenerate the free 3' hydroxyl. Removal of an azidomethyl blocking group, for example, is by the use of Tris(2-carboxyethyl)phosphine in aqueous solution.
[ 0063 ] e) Sequencing of the two DNA strands. Amplification of the DNA strands can optionally be performed using the attached linkers and/or attachment of additional linkers. In a preferred embodiment, the individual DNA strands containing mismatch nucleotides are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual strands of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
[ 0064 ] In another embodiment, purifying, identifying, and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a single strand nick on the 5' side of a mismatch nucleotide (embodiment 2 in Table 1) .
[0065] The procedure may include the following steps: a) covalently modifying the newly exposed 3 ' hydroxyl of the nucleotide adjacent to the mismatch nucleotide. An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification. For example, the dideoxy nucleotide may be comprised of a linker moiety and a binding moiety such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent. In a further embodiment of the invention, the covalent modification of nucleotide adjacent to the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3 ' hydroxyl on that strand of DNA. Examples of reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
[0066] b) Capturing the nicked DNA fragment and separating it from all other DNA fragments by, for example, contacting the DNA solution that has been modified to be biotinylated with streptavidin-Sepharose followed by washing away all other fragments .
[0067] c) Exposing the captured DNA fragment to a single strand-specific nuclease in order to cleave the phosphodiester bond opposite the nick on the other DNA strand and release a fragment of DNA that contains both mismatch nucleotides. An example of a single strand-specific nuclease for this step is Bal31 nuclease. The captured DNA fragment, if attached by a scissile linker as described above, can optionally be released and sequenced if desired. The released fragment containing the mismatch nucleotides is optionally re-modified and re-captured using the aforementioned techniques.
Alternatively, the released fragment can be modified by the attachment of linkers to facilitate subsequent mismatch nucleotide separation and sequencing. These linkers optionally contain the functionalities useful for purification (e.g., biotin) and sequencing (e.g., oligonucleotide primer annealing sites). The biotinylated fragment containing the mismatch nucleotides is preferably recaptured by, for example, binding to streptavidin-Sepharose .
[0068] d) Separating and evaluating the two mismatch nucleotides independently in order to identify each mismatch nucleotide and determine the sequence of the mismatch region, including the mismatch nucleotide. An example of separating the mismatch nucleotides is by, for example, heating to the point of melting and strand separation, whereby one of the single strand DNAs remains attached and the other strand is released and captured. Prior to melting, DNA linkers may optionally be attached to the ends of the double strand DNA to facilitate mismatch identification and sequencing. In another embodiment, the removable group blocking the 3' hydroxyl, if present, is removed in order to regenerate the free 3' hydroxyl. Removal of an azidomethyl blocking group, for example, is by the use of Tris(2-carboxyethyl)phosphine in aqueous solution.
[0069] e) Sequencing the two DNA strands. Amplification of the DNA strands can optionally be performed using the attached linkers and/or attachment of additional linkers. In a preferred embodiment, the individual DNA strands containing mismatch nucleotides are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual strands of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
[0070] In another embodiment, purifying, identifying, and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a double strand break on the 3' side of a mismatch nucleotide (embodiment 3 in Table 1).
[0071] The procedure may include the following steps: a) covalently modifying the newly exposed 3' hydroxyls of each mismatch nucleotide. An example of covalent modification includes the addition of a dideoxy nucleotide that preferably has itself been modified to allow for purification. For example, the dideoxy nucleotide may be attached to a linker moiety and a binding moiety, such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent. In a further embodiment of the invention, the covalent modification of the mismatch nucleotide is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This allows for the subsequent regeneration of a 3' hydroxyl on that strand of DNA. Examples of reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups. In a still further embodiment of the invention, the covalent modification of the mismatch nucleotide is by the attachment of a DNA linker that has an aforementioned attached biotin group on a scissile linker and/or optionally a removable blocking group on its 3' terminal hydroxyl group.
[0072] b) Capturing the two DNA fragments and separating them from all other DNA fragments by, for example, contacting the DNA solution that has been modified to be biotinylated with streptavidin-Sepharose followed by washing away all other fragments.
[0073] c) Releasing the captured fragments, for example, by the use of a reducing agent (e.g., 10-100 mM mercaptoethanol) if a disulfide scissile linker was used for capture. In another embodiment, the removable 3' hydroxyl blocking group, if present, is removed in order to regenerate the free 3' hydroxyl. Removal of an azidomethyl blocking group, for example, is by the use of Tris(2-carboxyethyl)phosphine in aqueous solution.
[0074] d) Sequencing the two DNA strands. Amplification of the DNA strands can optionally be performed using the attached linkers and/or attachment of additional linkers. In a preferred embodiment, the individual DNA fragments containing mismatch nucleotides are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual fragments of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
[0075] In another embodiment, purifying, identifying, and sequencing in the mismatch region containing one or both mismatch nucleotides embedded in the blocked DNA fragment occurs after a mismatch nuclease produces a double strand break on the 5' side of each mismatch nucleotide (embodiment 4 in Table 1).
[0076] The procedure may include the following steps: a) covalently modifying the newly exposed 3' hydroxyl of the nucleotides that were adjacent to the mismatch nucleotides in the original blocked DNA fragment. An example of covalent modification includes the addition of a dideoxy nucleotide that preferably is itself modified to allow for purification. For example, the dideoxy nucleotide may be attached to a linker moiety and a binding moiety, such as biotin, the linker preferably containing a scissile function such as, for example, a disulfide that can be cleaved with a reducing agent. In a further embodiment of the invention, the covalent modification of the nucleotides that had been adjacent to the mismatch nucleotides is by the attachment of another nucleotide that has a removable blocking group on its 3' end. This would allow for the subsequent regeneration of a 3' hydroxyl on that strand of DNA. Examples of reversible blocking groups on nucleotides include, but are not limited to, azidomethyl groups.
[0077] b) Capturing the two DNA fragments and separating them from all other DNA fragments by, for example, contacting the DNA solution that has been modified to be biotinylated with streptavidin-Sepharose followed by washing away all other fragments.
[0078] c) Releasing the captured fragments, for example, by the use of a reducing agent (e.g., 10-100 mM mercaptoethanol) if a disulfide scissile linker is used for capture. In another embodiment, the removable 3' hydroxyl blocking group, if present, is removed in order to regenerate the free 3' hydroxyl. Removal of an azidomethyl blocking group, for example, is by the use of Tris(2-carboxyethyl)phosphine in aqueous solution.
[ 0079 ] d) Sequencing the two DNA strands. Amplification of the DNA strands can optionally be performed using already attached linkers and/or attachment of additional linkers. In a preferred embodiment, the individual DNA fragments containing mismatch nucleotides are amplified and sequenced separately either as separately cloned nucleic acids or as separate clusters of amplified DNA or as separate individual fragments of DNA. In all cases, sequencing will reveal the identity of the mismatch nucleotide and the sequence of other nucleotides in the mismatch region.
[0080] In a further embodiment of the invention, the two DNA fragments derived from the mismatch nucleotide cutting (embodiments 3 and 4, Table 1) may be ligated to a linker DNA population that contains all four nucleotides at one terminus of the linker protruding as single overhanging nucleotides. The body of the linker may be the same and the terminal nucleotide (either 5' or 3') will be either A, G, C, or T, thus constituting four subpopulations of linkers. For example, a linker may have the specific sequence 5' AAGGCCTT 3'. The complementary strand of the linker could be 3' TTCCGGAAN 5' (N = any nucleotide) comprising linkers with a protruding N at the 5' end of one strand. Similarly, 5' AAGGCCTTN 3' could have a 3' TTCCGGAA 5' complementary strand comprising linkers with a protruding N at the 3' end of one strand. For ligation to mismatch fragments containing 3' overhanging mismatch nucleotides (embodiment 3, Table 1), the linkers may contain an overhanging 3' nucleotide containing a 3'-hydroxyl as well as a recessed nucleotide containing a 5'-phosphate. For ligation to mismatch fragments containing 5' overhanging mismatch nucleotides (embodiment 4, Table 1), the linkers may contain an overhanging 5' nucleotide containing a 5'-phosphate as well as a recessed nucleotide containing a 3'-hydroxyl.
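The four linker subpopulations described above can be enumerated in a few lines. The following is only an illustrative sketch: the function names are hypothetical, the linker body is the example sequence given in the text, and the selection step assumes that the linker which ligates is the one whose overhanging nucleotide is the Watson-Crick partner of the overhanging mismatch nucleotide.

```python
BODY = "AAGGCCTT"  # example linker body from the text, written 5'->3'
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement, keeping left-to-right order."""
    return "".join(COMPLEMENT[b] for b in seq)


def linkers_with_5prime_overhang(body=BODY):
    """Four linkers whose lower strand carries a protruding 5' nucleotide N.

    Each value is (top strand 5'->3', bottom strand 3'->5').
    """
    bottom = complement(body)  # 3' TTCCGGAA 5'
    return {n: (body, bottom + n) for n in "AGCT"}


def linkers_with_3prime_overhang(body=BODY):
    """Four linkers whose top strand carries a protruding 3' nucleotide N."""
    bottom = complement(body)
    return {n: (body + n, bottom) for n in "AGCT"}


def linker_for_mismatch(mismatch_base, linkers):
    """Pick the subpopulation whose overhang pairs with the mismatch base
    (an assumption of this sketch, not a statement of the ligation chemistry)."""
    return linkers[COMPLEMENT[mismatch_base]]
```

Because the body is identical across subpopulations, the identity of the ligated linker's terminal nucleotide is the only variable, which is what allows it to report on the mismatch nucleotide.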
[ 0081 ] The linkers may optionally contain covalently attached or removable fluorescent moieties wherein the excitation wavelength of the fluorescent moiety is specific for a particular overhanging nucleotide and can thereby indicate the identity of the mismatch nucleotide after ligation of the appropriate complementary nucleotide (attached to the linker) to the fragment containing the mismatch nucleotide.
[0082] The ligation reaction at the site of each mismatch nucleotide introduces the complementary nucleotide attached to the linker. In a further embodiment, a different linker may also be ligated to the other end of the fragment that comprises either a blunt end or a restriction enzyme sticky end.
[0083] In a further embodiment, the linkers may optionally include or be attached to other functional moieties, e.g., binding moieties, allowing for purification or solid phase attachment. An example of a moiety for purification is biotin. Examples of moieties for solid phase attachment are amino groups and thiol groups.
[0084] In a preferred embodiment, the attached linkers are suitable for amplification of the DNA fragment, for example, by the use of the polymerase chain reaction. In a further preferred embodiment, the linkers are suitable for adapting to any of the various DNA sequencing technologies.
[0085] In a further embodiment of the invention, the blocked DNA fragments containing a nick on the 3' or 5' side of a mismatch (embodiments 1 and 2, Table 1) are further modified using a mechanical shearing device (Joneja et al., BioTechniques 46:553-556 (2009)) that preferentially breaks DNA at the site of a nick. The resulting two fragments are then purified and modified by the aforementioned techniques to allow for mismatch nucleotide identification and sequencing in the vicinity of the mismatch.
[0086] Other methods of identifying sites of mismatched nucleotides and the adjacent DNA sequence in the mismatch region include but are not limited to using glycosylases and polymerases. These enzymes function directly on mismatches or at DNA nicks in a mismatch region since the mismatch base is either destroyed (glycosylases) or changed to become the complement nucleotide of the opposite strand. This results in an ambiguous identification of mismatch nucleotides and does not provide confirmation or identification of a true mismatched nucleotide pair. The separation of the mismatch nucleotide pair allows for sequence determination that does not change the identity of the mismatch nucleotide. In addition, the purification of mismatch fragments greatly reduces the total amount of DNA and number of fragments to be evaluated.
[0087] There are numerous ways to detectably modify a 3' hydroxyl to enable purification (Hegde et al., Cell Res. 2008, 18(1):27-47). In a preferred embodiment, the modification involves a template-independent attachment of a nucleotide triphosphate to the 3' end of the fragments. In a further preferred embodiment, the dNTP attachment is catalyzed by terminal deoxynucleotidyl transferase (TdT) and the nucleotide triphosphate contains a covalently attached biotin moiety. In a further preferred embodiment the biotinylated dNTP is a dideoxy nucleotide and the linker to biotin is a scissile linker that can be cleaved chemically or enzymatically. In a further preferred embodiment, the scissile linker contains a disulfide group that can be cleaved by a reducing agent.
[0088] In one embodiment of the invention (embodiments 1 and 2, Table 1), genomic DNA is first digested with a restriction enzyme (for example, KpnI, which will leave overhanging 3' sticky ends comprised of 5'GTAC3' and an average fragment length of 4096 bp). The fragments are then modified at all or substantially all of the 3'-hydroxyls by the action of TdT and ddNTP. Fragmented DNA containing modified 3'-ends is then heated until single strands of DNA are predominant, followed by cooling the sample to a temperature where high stringency re-annealing occurs.
[0089] In some embodiments, the 3'-hydroxyl modification using TdT (the "blocking step") is performed after the melting and re-annealing of the fragments.
[0090] The re-annealed double stranded fragments are then reacted with a mismatch nuclease that produces a nick or gap on one strand at or near the 3'-end of a mismatch nucleotide. The newly exposed 3'-hydroxyl is then modified using TdT and a biotinylated ddNTP (the "biotinylation step") where the linker attaching biotin to the dideoxynucleotide is a scissile linker that can be cleaved by enzymatic or chemical means.
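The 4096 bp average fragment length follows from the six-base recognition site: in random sequence with equal base frequencies, a given 6-mer is expected once every 4^6 positions. The sketch below illustrates that arithmetic and a simple in silico digestion of one strand. It assumes the KpnI recognition sequence GGTACC cut after GGTAC, which is what produces the 3' GTAC overhang mentioned above; the function names are hypothetical.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence
KPNI_CUT = 5          # top strand cut after GGTAC, leaving a 3' GTAC overhang


def expected_fragment_length(site):
    """Mean site spacing in random DNA with equal base frequencies."""
    return 4 ** len(site)


def digest(seq, site=KPNI_SITE, cut=KPNI_CUT):
    """Split the top strand of `seq` at every occurrence of `site`."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut])
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments
```

Real genomes are not random sequence, so the observed mean fragment length will generally differ from this idealized figure.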
[0091] Purification of the biotinylated fragments is then accomplished by passing the reaction mixture through a biotin binding column (for example, streptavidin-Sepharose). After washing away unbound DNA, the captured fragments may be released using a dilute reducing agent (e.g., 10-100 mM dithiothreitol or mercaptoethanol). The eluted DNA fragments can then be ligated directly into an appropriate vector. Preferably, the purified fragments can be digested with a single strand specific nuclease (for example, S1, mung bean or Bal31 nuclease) and then cloned into an appropriate vector (e.g., with SmaI/KpnI termini). Alternatively, the single strand region of the purified fragments can be filled in by the action of a polymerase in the presence of dNTPs followed by ligation into an appropriate vector. In a further embodiment, linker DNAs are added to the ends of the purified fragments to allow for cloning, attachment to a solid support sequencing, or attachment of primer sites for cloning, sequencing or PCR.
[0092] In another embodiment of the invention (embodiments 3 and 4, Table 1) the same purification procedure as for embodiments 1 and 2 is followed. In this embodiment, however, the mismatch nuclease activity results in two double strand fragments derived from the original mismatch-containing restriction fragment. The purified fragments contain either sticky end or blunt termini at one end and at the other end have either a 3' or 5' protrusion of nucleotides containing the mismatch base. The protrusion can be from 1 to 2000 nucleotides. The use of a single strand specific nuclease in this case eliminates the mismatch nucleotide. Filling in a 5' single strand protrusion may be desirable.
DNA IMMOBILIZATION
[ 0093 ] "Solid support" as used herein refers to any solid surface to which nucleic acids can be covalently attached, such as for example latex beads, dextran beads, polystyrene, polypropylene surface, polyacrylamide gel, gold surfaces, glass surfaces and silicon wafers. Preferably the solid support is a glass surface.
[0094] "Means for attaching nucleic acids to a solid support," as used herein refers to any chemical or non-chemical attachment moieties and methods, including chemically-modifiable functional groups. "Attachment" relates to immobilization of nucleic acid on solid supports by either a covalent attachment or via irreversible passive adsorption or via affinity between molecules (for example, immobilization on an avidin-coated surface by biotinylated molecules). The attachment must be of sufficient strength that it cannot be removed by washing with water or aqueous buffer under DNA-denaturing conditions.
[0095] "Chemically-modifiable functional group" as used herein refers to a group such as for example, a phosphate group, a carboxylic or aldehyde moiety, a thiol, an amine, or a hydroxyl group.
[0096] "Nucleic acid coordinate" or "coordinate" as used herein refers to a discrete area containing multiple copies of a nucleic acid strand or a synthetic oligonucleotide of known sequence. Multiple copies of the complementary strand to the nucleic acid strand may also be present in the same coordinate. The multiple copies of the nucleic acid strands making up the coordinates are generally immobilized on a solid support and may be in a single or double stranded form.
[0097] Preferably, the attachment of the oligonucleotide primer as well as the extended nucleic acid template on the solid support is thermostable at the temperature to which the support may be subjected to during the nucleic acid amplification reaction, for example temperatures of up to approximately 100°C, for example approximately 94°C. Preferably the attachment is covalent in nature.
[0098] In a yet further embodiment of the invention, the covalent binding of synthetic primers to the solid support is induced by a crosslinking or grafting agent such as for example 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC), succinic anhydride, phenyldiisothiocyanate or maleic anhydride, or a hetero-bifunctional crosslinker such as for example m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), N-succinimidyl[4-iodoacetyl]aminobenzoate (SIAB), succinimidyl 4-[N-maleimidomethyl]cyclohexane-1-carboxylate (SMCC), N-γ-maleimidobutyryloxy-succinimide ester (GMBS), succinimidyl-4-[p-maleimidophenyl]butyrate (SMPB) and the corresponding sulfo (water-soluble) compounds. Preferred crosslinking reagents for use in the present invention are s-SIAB, s-MBS and EDC. s-MBS is a maleimide-succinimide hetero-bifunctional cross-linker and s-SIAB is an iodoacetyl-succinimide hetero-bifunctional cross-linker. Both linkers are capable of forming a covalent bond respectively with SH groups and primary amino groups. EDC is a carbodiimide reagent that mediates covalent attachment of phosphate and amino groups.
[ 0099 ] In a yet further embodiment of the invention the solid support has a derivatized surface. "Derivatized surface" as used herein refers to a surface which has been modified with chemically reactive groups, for example amino, thiol or acrylate groups. In a yet further embodiment the derivatized surface of the solid support is subsequently modified with bifunctional crosslinking groups to provide a functionalized surface, preferably with reactive crosslinking groups. "Functionalized surface" as used herein refers to a derivatized surface which has been modified with specific functional groups, for example the maleic or succinic functional moieties.
[0100] The solid support may be any solid surface to which nucleic acids can be attached, such as for example latex beads, dextran beads, polystyrene, polypropylene surface, polyacrylamide gel, gold surfaces, glass surfaces and silicon wafers. Preferably the solid support is a glass surface and the attachment of nucleic acids thereto is a covalent attachment.
[0101] Other approaches for the attachment of oligonucleotides to solid surfaces use crosslinkers, such as succinic anhydride, phenyldiisothiocyanate (Guo et al., (1994)), or maleic anhydride (Yang et al., (1998)). Another widely used crosslinker is 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC). EDC chemistry was first described by Gilham et al. (1968) who attached DNA templates to paper (cellulose) via the 5' end terminal phosphate group. Using EDC chemistry, other supports have been used such as latex beads (Wolf et al. 1987, Lund et al. 1988), polystyrene microwells (Rasmussen et al. 1991), controlled-pore glass (Ghosh et al. 1987) and dextran molecules (Gingeras et al. 1987). The condensation of 5' amino-modified oligonucleotides with carbodiimide mediated reagent has been described by Chu et al. (1983), and by Egan et al. (1982) for 5' terminal phosphate modification group.
[0102] In order to generate nucleic acid clusters via a solid phase amplification reaction, oligonucleotide primers need to be specifically attached at their 5' ends to the solid surface, preferably glass. The glass surface can be derivatized with reactive amino groups by silanization using amino-alkoxy silanes. Suitable silane reagents include aminopropyltrimethoxysilane, aminopropyltriethoxysilane and 4-aminobutyltriethoxysilane. Glass surfaces can also be derivatized with other reactive groups, such as acrylate or epoxy using epoxysilane, acrylatesilane and acrylamidesilane. Following derivatization of the support surface, nucleic acid molecules or oligonucleotides having a chemically modifiable functional group at their 5' end, for example phosphate, thiol or amino groups, are covalently attached to the derivatized surface by a crosslinking reagent such as those described above.
[0103] Alternatively, the derivatization step can be followed by attaching a bifunctional cross-linking reagent to the surface amino groups thereby providing a modified functionalized surface. Nucleic acid molecules (colony primers or nucleic acid templates) having 5'-phosphate, thiol or amino groups are then reacted with the functionalized surface forming a covalent linkage between the nucleic acid and the glass.
[0104] Representative cross-linking and grafting reagents that can be used for covalent DNA/oligonucleotide grafting on the solid support are described above.
[0105] The oligonucleotide primers are generally modified at the 5' end by a phosphate group or by a primary amino group (for EDC grafting reagent) or a thiol group (for s-SIAB or s-MBS linkers).
[0106] Thus, another aspect of the invention provides a solid support, to which there is attached a plurality of oligonucleotide primers or nucleic acids. Preferably a plurality of nucleic acid templates is attached to the solid support, such as glass. Preferably the attachment of the oligonucleotide primers to the solid support is covalent.
[0107] A yet further aspect of the invention provides an apparatus for carrying out the methods of the invention. Such apparatus may include for example a plurality of nucleic acid templates and oligonucleotide primers of the invention bound, preferably covalently, to a solid support as outlined above, together with a nucleic acid polymerase, a plurality of nucleotide precursors such as those described above, a proportion of which may be detectably labeled, and a means for controlling temperature. Alternatively, the apparatus may include for example a solid support comprising one or more nucleic acids. Preferably the apparatus may also contain a detecting means for detecting and distinguishing signals from individual nucleic acids arrayed on the solid support according to the methods of the present invention. For example such a detecting means may contain a charge-coupled device operatively connected to a magnifying device such as a microscope as described above.
[0108] Preferably any apparatus of the invention is provided in an automated form. 454 pyrosequencing (Roche Diagnostics), SOLiD Sequencing (Applied Biosystems), and HeliScope sequencing (Helicos BioSciences) are examples of available automated sequencing systems.
SEQUENCE DETERMINATION
[0109] In one embodiment of the present invention, DNA from one source (e.g., individual) is annealed to itself or to DNA from another source (e.g., individual) to form mismatch regions and then treated with a mismatch recognition protein-based system, e.g., mismatch nicking proteins, mismatch repair proteins, excision repair proteins, mismatch nucleases, chemical modification and cleavage reagents, or combinations of such agents. This treatment introduces single-stranded breaks at predetermined locations on one or both sides of a mismatched nucleotide and may cause the selective excision of a single-stranded fragment covering the mismatch region. Alternatively, the treatment results in a single nick being introduced at the 5' end of the mismatch. The resulting structure is a nicked or gapped heteroduplex in which the gap may be from about 5 to about 1000 bases in length, depending on the mismatch recognition system used. In the case of a nick, no gap is formed but a free 3' hydroxyl is present at the site of the mismatch.
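To make the notion of a mismatch region concrete, the following sketch locates non-Watson-Crick pairs in a heteroduplex formed from one strand of each source. It is a simplified illustration only: both strands are assumed to be written 5'->3', of equal length and fully aligned, and the function names are hypothetical. It models the positions a mismatch recognition system would act on, not the enzymology itself.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' sequence."""
    return "".join(PAIRS[b] for b in reversed(seq))


def find_mismatches(top, bottom):
    """Return (position, top_base, bottom_base) for each non-Watson-Crick pair.

    Positions are 0-based along the top strand.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    partner = bottom[::-1]  # bottom strand read 3'->5', aligned with top
    return [
        (i, t, b)
        for i, (t, b) in enumerate(zip(top, partner))
        if PAIRS[t] != b
    ]
```

For example, annealing the top strand ACGTAC to the bottom strand of a variant whose top strand is ACGGAC yields a single T/C mismatch at position 3.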
[0110] In methods of the present invention that include the additional step of determining the sequence of at least one of the mismatched nucleic acids, the sequence determination may be carried out using any appropriate sequencing technique. For example, one technique of sequence determination that may be used in the present invention involves hybridizing an appropriate primer, sometimes referred to herein as a "sequencing primer", with the nucleic acid template to be sequenced, extending the primer and detecting the nucleotides used to extend the primer. Preferably the nucleotide used to extend the primer is detected before a further nucleotide is added to the growing nucleic acid chain, thus allowing base-by-base in situ nucleic acid sequencing.
[0111] In one embodiment of the present invention in which the purified fragments contain two base 3' nucleotide extensions (i.e., the mismatch nucleotide and the dideoxy nucleotide) double strand linkers are readily ligated to the fragments. The linker for one end may contain the sticky end of the restriction site (e.g., KpnI). Attachment of a linker at the other end may exploit complementarity to the two base 3' overhang (e.g., 5'NTN3'). At either end an attached linker may also contain any sequence required to impart a useful annealing site for, for example, attachment to a solid support, direct sequencing, or directional cloning.
[0112] Specially designed nucleotides with fluorescent reversible 3' terminators allow each cycle of a sequencing reaction to occur simultaneously for all coordinates in the presence of all four nucleotides (A, C, T, and G). In each cycle, the polymerase is able to select the correct base to incorporate, with the natural competition among all four alternatives leading to higher accuracy than methods where only one nucleotide is present in the reaction mix at a time. Sequences where a particular base is repeated (e.g., homopolymers) are addressed like any other sequence and resolved with high accuracy. The simultaneous sequencing of the thousands of clusters present on the solid support is accomplished by recording the unique fluorescent signal for each nucleotide at each position during every cycle of the process. After recording, the fluorescent terminators are removed, e.g., by a chemical reaction such as the addition of a low pH solution, so that the next round of polymerase additions can proceed.
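The per-cycle readout described above can be pictured as picking the brightest of four fluorescence channels at each cycle for a given cluster. The sketch below is a toy illustration under stated assumptions: the channel order, the intensity tuples and the function name are all hypothetical, and real base callers also correct for channel cross-talk and signal decay.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels


def call_bases(cycles):
    """Call one base per cycle from four channel intensities.

    `cycles` is a list of 4-tuples (A, C, G, T intensities), one tuple per
    sequencing cycle for a single cluster.
    """
    read = []
    for intensities in cycles:
        # Exactly one terminator is added per cycle, so take the brightest channel.
        best = max(range(4), key=lambda i: intensities[i])
        read.append(CHANNELS[best])
    return "".join(read)
```

Because exactly one base is incorporated per cycle, a homopolymer run simply appears as the same channel winning in consecutive cycles.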
[ 0113 ] The detection of incorporated nucleotides is facilitated by including one or more labeled nucleotides in the primer extension reaction. Any appropriate detectable label may be used, for example a fluorophore, radiolabel etc. Preferably a fluorescent label is used. The same or different labels may be used for each different type of nucleotide. Where the label is a fluorophore and the same labels are used for each different type of nucleotide, each incorporated nucleotide can provide a cumulative increase in signal detected at a particular wavelength. If different labels are used then these signals may be detected at the different appropriate wavelengths. If desired, a mixture of labeled and unlabelled nucleotides is provided.
[0114] In order to allow the hybridization of an appropriate sequencing primer to the nucleic acid template to be sequenced, the nucleic acid template should normally be in a single stranded form. If the nucleic acid templates making up the nucleic acid colonies are present in a double stranded form they can be processed to provide single stranded nucleic acid templates using methods well known in the art, for example by denaturation, cleavage etc.
[ 0115 ] The sequencing primers which are hybridized to the nucleic acid template and used for primer extension are preferably short oligonucleotides, generally ranging from 15 to 25 nucleotides in length. The sequence of the primers is designed so that they hybridize to part of the nucleic acid template to be sequenced, preferably under stringent conditions. The sequence of the primers used for sequencing may have the same or similar sequences to that of the colony primers used to generate the nucleic acid colonies of the invention. The sequencing primers may be provided in solution or in an immobilized form.
[ 0116 ] Once the sequencing primer has been annealed to the nucleic acid template to be sequenced by subjecting the nucleic acid template and sequencing primer to appropriate conditions (which are determined by methods well known in the art), primer extension is carried out, for example using a nucleic acid polymerase and a supply of nucleotides, at least some of which are provided in labeled form, and under conditions suitable for primer extension if a suitable nucleotide is provided. Examples of nucleic acid polymerases and nucleotides which may be used are described above.
[0117] Preferably after each primer extension step a washing step is performed in order to remove unincorporated nucleotides which may interfere with subsequent steps. Once the primer extension step has been carried out, the nucleic acid colony is monitored in order to determine whether a labeled nucleotide has been incorporated into an extended primer. The primer extension step may then be repeated in order to determine the next and subsequent nucleotides incorporated into an extended primer.
[0118] Any device allowing detection and preferably quantification of the appropriate label, for example fluorescence or radioactivity, may be used for sequence determination. If the label is fluorescent, a CCD camera, optionally attached to a magnifying device (as described above), may be used. In fact the devices used for the sequence determining aspects of the present invention may be the same as those described above for monitoring the amplified nucleic acid colonies.
[0119] The detection system is preferably used in combination with an analysis system in order to determine the number and nature of the nucleotides incorporated at each cluster after each step of primer extension. This analysis, which may be carried out immediately after each primer extension step, or later using recorded data, allows the sequence of the nucleic acid template within a given cluster to be determined.
[0120] If the sequence being determined is unknown, the nucleotides applied to a given cluster are usually applied in a chosen order which is then repeated throughout the analysis, for example dATP, dTTP, dCTP, dGTP. If, however, the sequence being determined is known and is being re-sequenced, for example to analyze whether or not small differences in sequence from the known sequence are present, the sequencing determination process may be made quicker by adding the nucleotides at each step in the appropriate order, chosen according to the known sequence. Differences from the given sequence are thus detected by the lack of incorporation of certain nucleotides at particular stages of primer extension. Thus full or partial sequences of the amplified nucleic acid templates making up particular nucleic acid colonies may be determined using the methods of the present invention.
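The repeated fixed-order addition of nucleotides can be simulated as follows. This is a minimal sketch, not a description of any particular instrument: it assumes unterminated nucleotides, so a run of identical template bases is extended within a single addition and reported as a count (the "number and nature" of incorporated nucleotides), and the function name and the `max_flows` safety limit are hypothetical.

```python
from itertools import cycle, islice

FLOW_ORDER = "ATCG"  # dATP, dTTP, dCTP, dGTP, the example order in the text
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_flows(template, flow_order=FLOW_ORDER, max_flows=1000):
    """Simulate stepwise primer extension against `template`.

    `template` is the stretch downstream of the primer, written in the order
    the polymerase reads it. Returns (flows, read), where `flows` is a list of
    (nucleotide, number incorporated) and `read` is the extended strand.
    """
    pos = 0
    flows = []
    read = []
    for base in islice(cycle(flow_order), max_flows):
        if pos >= len(template):
            break
        count = 0
        # Incorporate while the offered nucleotide pairs with the template.
        while pos < len(template) and PAIRS[template[pos]] == base:
            count += 1
            pos += 1
        flows.append((base, count))
        read.append(base * count)
    return flows, "".join(read)
```

When the sequence is already known, the same loop could be driven by a nucleotide order derived from that sequence instead of the fixed cycle, so that a zero count flags a difference from the expected sequence.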
[0121] In a further embodiment of the present invention, the full or partial sequence of more than one nucleic acid can be determined by determining the full or partial sequence of the amplified nucleic acid templates present in more than one nucleic acid coordinate. Preferably a plurality of sequences is determined simultaneously.
[ 0122 ] Reliability of the sequence determination of nucleic acids using the methods of the present invention is enhanced due to the fact that large numbers of each nucleic acid to be sequenced are provided within each nucleic acid coordinate of the invention. If desired, further improvements in reliability can be obtained by providing a plurality of nucleic acid colonies containing the same nucleic acid template to be sequenced, then determining the sequence for each of the plurality of colonies and comparing the sequences thus determined.
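Comparing the sequences obtained from several colonies carrying the same template can be as simple as a per-position majority vote. The sketch below assumes equal-length, already aligned reads; the function name is hypothetical and no quality weighting is applied.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus of equal-length reads from replicate colonies."""
    if not reads:
        raise ValueError("need at least one read")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be the same length")
    # zip(*reads) yields one column of bases per template position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
```

With three replicate reads, a single discordant base call at any position is outvoted by the other two.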
[ 0123 ] Preferably the attachment of the oligonucleotide primer as well as the extended nucleic acid template on the solid support is thermostable at the temperature to which the support may be subjected to during the nucleic acid amplification reaction, for example temperatures of up to approximately 100°C, for example approximately 94°C. Preferably the attachment is covalent in nature.
[0124] To determine the nucleotide sequence of the nicked or excised region (including the mismatch), the heteroduplexes are incubated with an appropriate DNA polymerase enzyme in the presence of dideoxynucleotides. Suitable enzymes for use in this step include without limitation DNA polymerase I, DNA polymerase III holoenzyme, T4 DNA polymerase, and T7 DNA polymerase. The only requirement is that the enzyme be capable of accurate DNA synthesis using the gapped heteroduplex as a substrate. The presence of dideoxynucleotides, as in a Sanger sequencing reaction, ensures that a nested set of premature termination products will be produced, and that resolution of these products by, e.g., gel electrophoresis, will display the DNA sequence across the gap.
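The way a nested set of termination products displays the sequence can be illustrated with a small model. In this sketch, which makes simplifying assumptions (one reaction per dideoxynucleotide, every possible termination product present, hypothetical function names), each reaction yields products whose lengths mark the positions of that base, and reading all products from shortest to longest recovers the newly synthesized strand.

```python
def termination_products(synthesized):
    """Map each ddNTP reaction to the lengths of its terminated products.

    `synthesized` is the strand made across the gap, written 5'->3'.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by reading products from shortest to longest."""
    by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(by_length[n] for n in sorted(by_length))
```

For a synthesized strand GATTC, the T reaction contains products of lengths 3 and 4, and reading the four lanes in order of length returns GATTC.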
High-Throughput Applications
[ 0125 ] The methods of the present invention are particularly suitable for high-throughput analysis of DNA, i.e., the rapid and simultaneous processing of genomic DNAs derived from an individual. Furthermore, in contrast to other methods for de novo mutation detection, the methods of the present invention are suitable for the simultaneous analysis of a large number of DNA fragments in a single reaction. This is referred to as "multiplex" analysis. The manipulations involved in practicing the methods of the present invention lend themselves to automation, e.g., using multiwell formats as a solid support or as a receptacle for, e.g., beads; robotics to perform sequential incubations and washes; and, finally, automated sequencing using commercially available automated DNA sequencers.
[ 0126 ] For use of the present invention in diagnostics and screening, whole genomes or fractions of genomes may be amplified into colonies for DNA sequencing of known single nucleotide polymorphisms (SNPs). SNP identification has application in medical genetic research to identify genetic risk factors associated with diseases. SNP genotyping will also have diagnostic applications in pharmacogenomics for the identification and treatment of patients with specific medications.
[ 0127 ] For use of the present invention in genetic diversity profiling, populations of, for example, organisms, cells or tissues can be identified by amplifying the sample DNA into coordinates, followed by DNA sequencing of the specific "tags" for each individual genetic entity. In this way, the genetic diversity of the sample can be defined by counting the number of tags from each individual entity.
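By way of non-limiting illustration, the tag counting described above can be sketched in Python as follows. The function name `tag_profile` and the example tag strings are hypothetical and serve only to show the counting step; they are not part of the described methods.

```python
from collections import Counter


def tag_profile(tag_reads):
    """Return the fraction of sequenced coordinates carrying each tag."""
    counts = Counter(tag_reads)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    # Each string stands for the tag sequence read from one coordinate.
    reads = ["AAGT", "AAGT", "CCTA", "GGAC"]
    profile = tag_profile(reads)
    print(len(profile), "distinct entities")
    print(profile)
```

The number of distinct tags estimates how many genetic entities are present, and the fractions estimate their relative abundance in the sample.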
[ 0128 ] For use of the present invention in gene expression monitoring, the expressed mRNA molecules of a tissue or organism under investigation are converted into cDNA molecules which are amplified into sets of colonies for DNA sequencing. The frequency of coordinates coding for a given mRNA is proportional to the frequency of the mRNA molecules present in the starting tissue. Applications of gene expression monitoring are in biomedical research.
[ 0129 ] A whole genome slide, where the entire genome of a living organism is represented in a number of DNA colonies, numerous enough to contain all the sequences of that genome, may be prepared using the methods of the invention. The genome slide is the genetic card of any living organism. Genetic cards have applications in medical research and genetic identification of living organisms of industrial value.
[ 0130 ] The present invention may also be used to carry out whole genome sequencing where the entire genome of a living organism is amplified as sets of coordinates for extensive DNA sequencing. Whole genome sequencing allows, for example, 1) a precise identification of the genetic strain of any living organism; 2) discovery of novel genes encoded within the genome; and 3) discovery of novel genetic polymorphisms.
[ 0131 ] The applications of the present invention are not limited to an analysis of nucleic acid samples from a single cellular source or organism/individual. For example, nucleic acid tags can be incorporated into the nucleic acid templates and amplified, and different nucleic acid tags can be used for each cellular source or organism/individual. Thus, when the sequence of the amplified nucleic acid is determined, the sequence of the tag may also be determined and the origin of the sample identified.
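By way of non-limiting illustration, assigning sequenced templates to their source by tag can be sketched in Python as follows. The sketch assumes, purely for illustration, that the tag is a fixed-length prefix of each read; the function name `demultiplex`, the tag sequences and the source labels are hypothetical.

```python
def demultiplex(reads, tag_to_source, tag_length):
    """Group reads by the source identified from a fixed-length leading tag."""
    by_source = {}
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        # Reads whose tag is not recognized are kept apart rather than guessed.
        source = tag_to_source.get(tag, "unassigned")
        by_source.setdefault(source, []).append(insert)
    return by_source


if __name__ == "__main__":
    tags = {"ACGT": "individual_1", "TTAG": "individual_2"}
    reads = ["ACGTTTGA", "TTAGCCCA", "GGGGAAAA"]
    print(demultiplex(reads, tags, tag_length=4))
```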
[ 0132 ] Thus, a further aspect of the invention provides the use of the methods of the invention, or the nucleic acid colonies of the invention, or the plurality of nucleic acid templates of the invention, or the solid supports of the invention, for providing nucleic acid molecules for sequencing and re-sequencing, gene expression monitoring, genetic diversity profiling, diagnosis, screening, whole genome sequencing, whole genome polymorphism discovery and scoring and the preparation of whole genome slides (i.e., the whole genome of an individual on one support), or any other applications involving the amplification of nucleic acids or the sequencing thereof.
[0133] A yet further aspect of the invention provides a kit for use in sequencing, re-sequencing, gene expression monitoring, genetic diversity profiling, diagnosis, screening, whole genome sequencing, whole genome polymorphism discovery and scoring, or any other applications involving the amplification of nucleic acids or the sequencing thereof. This kit contains a plurality of nucleic acid templates and colony primers of the invention bound to a solid support, such as a chip as outlined above. In one embodiment, the kit comprises a chip having affixed thereto, directly or indirectly, a plurality (typically on the order of thousands to millions) of DNA fragments (e.g., restriction fragments) of known sequence. These fragments, preferably single stranded, are purified by virtue of the presence of at least one mismatched nucleotide contained therein, and thus serve as the annealing templates for similarly processed DNA. The annealed DNA thus captured on the chip can be used to readily identify the presence of mismatched nucleotides in other target DNAs. These represent the mutational fingerprint of an individual genome, combined genomes, heritable traits or disease states.
[0134] Example 1. Purification and sequencing of mismatch nucleotides from PCR amplified DNA.
[0135] An IgG immunoglobulin heavy chain cDNA of known sequence that is cloned into the Zero Blunt PCR Cloning Vector (Invitrogen) is PCR amplified using appropriate primers that introduce a KpnI and a PstI site into the PCR product. The PCR reaction uses error-prone Taq polymerase. Control IgG DNA (unamplified) is derived from the purified vector using the Qiagen Miniprep Kit and is digested with KpnI and PstI, and the resulting IgG cDNA insert is purified by agarose gel electrophoresis and the Qiagen Gel Extraction Kit.
[ 0136 ] The PCR product is digested with KpnI and PstI, electrophoresed in an agarose gel and purified using the Qiagen Gel Extraction Kit. The PCR DNA and the purified insert DNA from the vector are separately subjected to the same procedures described below.
[ 0137 ] The DNA fragments are melted to produce complementary single strand fragments by heating in an Eppendorf tube at 98°C in 1ml of 0.1mM EDTA, 10mM Tris-Cl pH 8.0 for 10 minutes in a heating block. The heating block is then turned off to allow the temperature to drop. When it reaches 90°C the Eppendorf tube is opened and 5M NaCl is added to a final concentration of 500mM NaCl. DNA fragments are allowed to re-anneal overnight in the cooling heat block and are then purified by agarose gel electrophoresis and recovered using the Qiagen Gel Extraction Kit. The gel purification step is intended to eliminate aggregated or concatemeric DNA that may have resulted from misaligned re-annealing and to limit the DNA in subsequent steps to the 1.5 kb cDNA band.
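By way of non-limiting illustration, the volume of 5M NaCl stock needed to reach the stated final concentration can be estimated as sketched below. The calculation assumes that the starting buffer contains no NaCl and that volumes are additive; the function name `stock_volume_to_add` is hypothetical.

```python
def stock_volume_to_add(initial_volume_ul, stock_molar, final_molar):
    """Volume of stock (in microliters) to add to reach the final concentration.

    Derived from stock_molar * v = final_molar * (initial_volume_ul + v),
    assuming the starting solution contains none of the solute.
    """
    if not 0 <= final_molar < stock_molar:
        raise ValueError("final concentration must be below the stock concentration")
    return initial_volume_ul * final_molar / (stock_molar - final_molar)


if __name__ == "__main__":
    # 1 ml of buffer, 5 M NaCl stock, 500 mM final concentration.
    volume = stock_volume_to_add(1000.0, 5.0, 0.5)
    print(f"add about {volume:.0f} microliters of 5 M NaCl")
```

Under these assumptions roughly 111 microliters of stock are added to the 1ml sample; the simpler one-tenth-volume estimate (100 microliters) slightly undershoots because the added stock also increases the total volume.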
[ 0138 ] In order to block all of the 3' hydroxyl groups in the re-annealed fragments, the DNA is reacted with dideoxy ATP and 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2, at 37°C for 1 hour. The reaction products are then purified using a Qiagen DNeasy Kit capture column.
[ 0139 ] Scission of DNA strands adjacent to mismatched nucleotides is performed in 20mM Tris-HCl, pH 7.4, 25mM KCl, 10mM MgCl2, 200µg DNA substrate, 100U DNA ligase, and an optimized amount of CelII nuclease (Sokurenko, E.V. et al., N.A.R. 2001, 29:e111; Zhang, C. et al., Genetic Testing and Molecular Biomarkers 13:97-103, 2009). The reaction is incubated at 42°C for 20 minutes and terminated by the addition of 2µL 0.5M EDTA. The reaction products are then purified using a Qiagen DNeasy Kit capture column.
[ 0140 ] The categories of products of the nuclease reaction are expected to be 1) double stranded fragments with a dideoxy modified KpnI or PstI sticky end (5' GTACA-dd 3', 5' TGCAA-dd 3', the dideoxy adenosine having been added in the blocking step) at one end of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl at the other end of the fragment; 2) double stranded fragments with a dideoxy adenosine modified KpnI or PstI sticky end at both ends of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl within the fragment (single strand nick); 3) a large population of perfectly complementary fragments with the dideoxy adenosine modified KpnI sticky end at one end of the fragment and a dideoxy adenosine modified PstI sticky end at the other end of the fragment.
[ 0141 ] The fragment population is then treated with 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2 and dideoxy UTP [-S-S-] biotin at 37°C for 1 hour. [-S-S-] refers to a linker strand containing ~15-20 C- or N-linked atoms and a disulfide group within the strand. The fragments are then treated with a single-strand specific nuclease (Bal31 nuclease) to release an unmodified double strand fragment from category 2) above. After the TdT reaction, 5M NaCl is added to the buffer to a final concentration of 200mM to eliminate any non-specific binding during purification. Biotinylated fragments are then purified by passing through a 1ml column of Streptavidin-Sepharose. The captured fragments are eluted with 50-100mM dithiothreitol or beta-mercaptoethanol. The DNA fragments are again purified using the Qiagen DNeasy Kit capture column.
[ 0142 ] The purified fragments are modified by attachment of DNA linkers to allow for cloning and/or sequencing. The first linker has a sticky end that is complementary to either the KpnI or PstI sticky ends with the attached ddATP (i.e., 3' TCATG 5' or 3' TACGT 5'). The other end of the linker is blunt and the total linker length is ~15-20 nucleotides in order to remain double-stranded during ligation. T4 DNA ligase is added to the linker/fragment mix, covalently joining the 3' hydroxyl of the linker with the 5' phosphate of the fragment.
[ 0143 ] An additional linker set is used for ligation to the mismatch nucleotide site that consists of 5' NU-dd 3' where N is any of the four nucleotides comprising the mismatch nucleotide and U-dd is the added dideoxy uridine used for purification. The two base annealing portion of the linker is therefore 5' AN 3'. The other end of the linker set is blunt and the total linker length is ~15-20 nucleotides in order to remain double-stranded during ligation. T4 DNA ligase is added to the linker/fragment mix, covalently joining the 3' hydroxyl of the linker with the 5' phosphate of the fragment.
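By way of non-limiting illustration, the four members of such a linker set can be enumerated as sketched below. The sketch assumes standard antiparallel Watson-Crick pairing, with the added dideoxy uridine pairing with adenine, so that the base following the A in each annealing portion is the complement of the mismatch nucleotide N; the function name `annealing_portion` is hypothetical.

```python
# Pairing partners; U (the added dideoxy uridine) pairs with A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def annealing_portion(fragment_end):
    """Return the linker bases (5'->3') that pair with a fragment 3' end (5'->3')."""
    # Strands pair antiparallel, so the partner is the reverse complement.
    return "".join(COMPLEMENT[base] for base in reversed(fragment_end))


if __name__ == "__main__":
    for n in "ACGT":
        fragment_end = n + "U"  # 5' N U-dd 3'
        print(f"fragment end 5' {fragment_end}-dd 3' -> "
              f"linker 5' {annealing_portion(fragment_end)} 3'")
```

Every member of the set begins with A, which pairs with the dideoxy uridine, and the four members differ only in the second base, so a mix of all four covers any mismatch nucleotide.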
[ 0144 ] The resulting blunt end fragments are then ligated into the Zero Blunt vector and used for bacterial transformation. One hundred colonies are each used to inoculate 3 ml of LB broth and grown overnight with shaking at 37°C. Plasmid from each culture is purified using the Qiagen Miniprep Kit and is sent to the Retrogen Inc. (San Diego) sequencing facility. Analysis of the sequences reveals that each of the fragments that is purified and cloned contains a mismatch nucleotide. None of the sequenced plasmids contains a perfectly complementary DNA insert compared to the original IgG sequence. In the control experiment, using the perfectly complementary sequence, no bacterial colonies are identified that contain the cDNA. Background colonies are all due to re-ligated vector.
[0145] Example 2. Purification of mismatch nucleotides from genomic DNA.
[0146] Genomic DNA (~25 µg) is isolated from human red blood cells (San Diego blood bank) using a Qiagen DNeasy Tissue Kit. The resulting fragment sizes range up to ~50 kb and average ~20-30 kb, as judged by agarose gel electrophoresis. The genomic DNA is then digested with KpnI restriction endonuclease at 37°C for 4 hours. The restriction fragments less than ~8 kb are then purified from an agarose gel using a Qiagen Gel Extraction Kit.
[0147] The DNA fragments are melted to produce complementary single strand fragments by heating in an Eppendorf tube at 98°C in 1ml of 0.1mM EDTA, 10mM Tris-Cl pH 8.0 for 10 minutes in a heating block. The heating block is then turned off to allow the temperature to drop. When it reaches 90°C the Eppendorf tube is opened and 5M NaCl is added to a final concentration of 500mM NaCl. DNA fragments are allowed to re-anneal overnight in the cooling heat block and are then purified by agarose gel electrophoresis and recovered using the Qiagen Gel Extraction Kit. The purification step is intended to eliminate aggregated or concatemeric DNA that may have resulted from misaligned re-annealing and to limit the size of DNA in subsequent steps to under ~8 kb.
[0148] In order to block all of the 3' hydroxyl groups in the re-annealed fragments, the DNA is reacted with dideoxy ATP and 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2, at 37°C for 1 hour. The reaction products are then purified using a Qiagen DNeasy Kit capture column.
[ 0149 ] Scission of DNA strands adjacent to mismatched nucleotides is performed in 20mM Tris-HCl, pH 7.4, 25mM KCl, 10mM MgCl2, 200µg DNA substrate, 100U DNA ligase, and an optimized amount of CelII nuclease (Sokurenko, E.V. et al., N.A.R. 2001, 29:e111; Zhang et al., Genetic Testing and Molecular Biomarkers 13:97-103 (2009)). The reaction is incubated at 42°C for 20 minutes and terminated by the addition of 2µL 0.5M EDTA. The reaction products are then purified using a Qiagen DNeasy Kit capture column.
[ 0150 ] The categories of fragments of the nuclease reaction are expected to be 1) double stranded fragments with a dideoxy modified KpnI sticky end (5' GTACA-dd 3', the dideoxy adenosine having been added in the blocking step) at one end of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl at the other end of the fragment; 2) double stranded fragments with a dideoxy modified blunt end at one end of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl at the other end of the fragment; 3) double stranded fragments with a dideoxy modified KpnI sticky end (5' GTACA-dd 3', the dideoxy adenosine having been added in the blocking step) at both ends of the fragment and a 3' protruding mismatch nucleotide with a 3' hydroxyl within the fragment (single strand nick); 4) a large population of perfectly complementary fragments with the dideoxy modified KpnI sticky end (5' GTACA-dd 3', the dideoxy adenosine having been added in the blocking step) at one or both ends of the fragment and a dideoxy modified blunt end at one or both ends of the fragment.
[ 0151 ] The fragment population is then treated with 100 units of TdT in 20mM Tris-acetate, 50mM potassium acetate, 10mM magnesium acetate, pH 7.9, supplemented with 0.25mM CoCl2 and dideoxy UTP [-S-S-] biotin at 37°C for 1 hour. [-S-S-] refers to a linker strand containing ~15-20 C- or N-linked atoms and a disulfide group within the strand. The fragments are treated with a single-strand specific nuclease (Bal31 nuclease) to release an unmodified double strand fragment from category 3) above. After the TdT reaction, 5M NaCl is added to the buffer to a final concentration of 200mM to eliminate any non-specific binding during purification. Biotinylated fragments are then purified by passing through a 1ml column of Streptavidin-Sepharose. The captured fragments are eluted with 50-100mM dithiothreitol or beta-mercaptoethanol. The DNA fragments are again purified using the Qiagen DNeasy Kit capture column.
[ 0152 ] The purified fragments from Example 2 are then modified by the attachment of appropriate DNA linkers or primers to the KpnI sticky ends, the blunt ends and the fragments containing the mismatch nucleotide with the 3' dideoxy UTP using ligation procedures that are well known in the art. Linkers and primers facilitate the covalent attachment of fragments to solid supports to allow for multiplex sequencing by any of a variety of techniques that are well known in the art. Alternatively, the linkers can facilitate the ligation of fragments into vectors that enable cloning and non-multiplex sequencing or for PCR amplification reactions.
[ 0153 ] The DNA sequences of the attached linkers also serve to identify the position of the mismatch nucleotide relative to the linker for unambiguous mismatch nucleotide identification.
[ 0154 ] The DNA sequences of the fragments serve to align the fragments to the known sequence of the human genome and to determine the position and identity of the mismatch nucleotide. In many cases, another fragment derived from the original genomic restriction fragment will be aligned on the same restriction fragment in a genome database.
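By way of non-limiting illustration, once a fragment has been placed on a reference sequence, the position and identity of a mismatch nucleotide can be reported as sketched below. The sketch assumes that the alignment offset is already known and that the alignment is ungapped; in practice a sequence alignment tool would supply the placement. The function name `find_mismatches` and the example sequences are hypothetical.

```python
def find_mismatches(reference, fragment, offset):
    """List (reference position, reference base, fragment base) for each mismatch.

    offset is the 0-based reference position at which the fragment starts.
    """
    window = reference[offset:offset + len(fragment)]
    if len(window) != len(fragment):
        raise ValueError("fragment extends beyond the reference sequence")
    return [
        (offset + i, ref_base, frag_base)
        for i, (ref_base, frag_base) in enumerate(zip(window, fragment))
        if ref_base != frag_base
    ]


if __name__ == "__main__":
    reference = "GGTACCATGCA"
    fragment = "ACGAT"  # placed at reference position 3
    print(find_mismatches(reference, fragment, offset=3))
```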
[ 0155 ] Citations of Publications Referenced Herein:
Kruglyak, Nat. Genet. 22:139-144 (1999).
Risch et al., Science 273:1516-1517 (1996).
Lu et al., Genomics 14:249-255 (1992).
Su et al., Genome 31:104-111 (1992).
Landegren et al., Science 241:1077-1080 (1988).
Mashal et al., Nature Genetics 9:177 (1995).
Maxam et al., Methods Enzymol. 65:499-560 (1980).
Mayall et al., J. Med. Genet. 27:558 (1990).
Meyers et al., Nature 313:495-498 (1985).
Newton et al., Nuc. Acids Res. 17:2503-2516 (1989).
Orita et al., Proc. Natl. Acad. Sci. USA 86:2766-2770 (1989).
Pease et al., Proc. Natl. Acad. Sci. USA 91:5022 (1994).
Richards et al., Human Mol. Gen. 2:159 (1993).
Rommens et al., Am. J. Genet. 46:395-396 (1990).
Saleeba et al., Meth. Enzymol. 217:288 (1993).
Sancar, Science 266:1954 (1994).
Shuber et al., Human Molecular Genetics 2:153-158 (1993).
Sokolov, Nucl. Acids Res. 18:3671 (1989).
Southern, J. Mol. Biol. 98:503-517 (1975).
Su et al., Proc. Natl. Acad. Sci. USA 83:5057 (1986).
Thompson and Thompson, Genetics in Medicine, 5th Ed.
Tsai-Wu et al., J. Bacteriol. 178:1902 (1991).
Wallace et al., Nucl. Acids Res. 9:879-895 (1981).
Yeh et al., J. Biol. Chem. 266:6480 (1991).
Youil et al., Proc. Natl. Acad. Sci. USA 92:87 (1995).
Aboussekhra et al., Cell 80:859 (1995).
Chang et al., Nuc. Acids Res. 19:4761 (1991).
Chehab et al., Nature 329:293-294 (1987).
Cleaver, Cell 76:1-4 (1994).
Cohen et al., Nature 334:119-121 (1988).
Cotton et al., Proc. Natl. Acad. Sci. 85:4397-4401 (1988).
Grilley et al., J. Biol. Chem. 264:1000 (1989).
Haliassos et al., Nucleic Acids Research 17:3606 (1989).
Huang et al., Proc. Natl. Acad. Sci. USA 91:12213 (1994).
Keen et al., Trends Genet. 7:5 (1991).
Kosak et al., Eur. J. Biochem. 194:779 (1990).
Joneja et al., BioTechniques 46:553-556 (2009).
[ 0156 ] All publications cited in the specification, both patent publications and non-patent publications, are indicative of the level of skill of those skilled in the art to which this invention pertains. Any publication not already incorporated by reference herein is herein incorporated by reference to the same extent as if each individual publication were specifically and individually indicated as being incorporated by reference.
[ 0157 ] Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

We claim:
1. A method of purifying, tagging or identifying mismatched nucleotides in DNA, comprising:
a) creating double stranded fragments of each of a first DNA molecule and a second DNA molecule;
b) separating the double stranded DNA fragments of the first and second DNA molecules into single strands to create a population of single stranded DNA fragments;
c) allowing re-annealing of the single stranded DNA fragments, thus producing a population of double stranded DNA fragments which comprises heterohybrid DNA comprising perfectly complementary heterohybrid DNA fragments and heterohybrid DNA fragments containing at least one mismatched pair of nucleotides;
d) prior to step b) or after step c) modifying the 3' hydroxyl groups of the double stranded fragments with blocking groups;
e) reacting the heterohybrid DNA with a nuclease that cleaves adjacent to one or both of the mismatched nucleotides;
f) covalently modifying one or both of the mismatched nucleotides with an entity that facilitates isolation and/or sequencing of the nucleotides of the DNA fragment in the vicinity of the mismatched nucleotide;
g) purifying the covalently modified fragments produced in f) from heterohybrid DNA comprising perfectly complementary heterohybrid DNA fragments via the covalent modification of the mismatched nucleotide; and
h) determining the identity of the mismatched nucleotide and the sequence of the DNA fragment in a mismatch region comprising the mismatched nucleotide.
2. The method of claim 1, wherein the first DNA molecule and the second DNA molecule are isolated or derived from the same source.
3. The method of claim 1, wherein the first DNA molecule, the second DNA molecule or both comprises genomic DNA.
4. The method of claim 1, wherein the first DNA molecule comprises genomic DNA containing an unknown number of mutations or single nucleotide polymorphisms relative to the second DNA molecule which is of known sequence.
5. The method of claim 1, wherein the first DNA molecule, the second DNA molecule or both comprises DNA derived from a PCR polymerase reaction on genomic DNA.
6. The method of claim 1, wherein the first DNA molecule or the second DNA molecule is of substantially known or a known sequence.
7. The method of claim 1, wherein the first DNA molecule, the second DNA molecule or both comprises cDNA.
8. The method of claim 1, wherein the first DNA molecule, the second DNA molecule, or both, is isolated or derived from a human cancer cell.
9. The method of claim 1, wherein a) comprises mechanical shearing.
10. The method of claim 1, wherein a) comprises reacting the first and second DNA molecules with a restriction endonuclease.
11. The method of claim 10, wherein each of the first and second DNA molecules is reacted with the same restriction endonuclease.
12. The method of claim 1, further comprising, prior to c), i) amplifying the single strands of DNA produced in b).
13. The method of claim 1, wherein c) is conducted under high stringency conditions.
14. The method of claim 1, wherein d) comprises reacting the population of double stranded DNA fragments produced in c) with a terminal deoxynucleotidyl transferase and at least one deoxynucleotide triphosphate.
15. The method of claim 14, wherein the at least one deoxynucleotide triphosphate comprises a removable blocking group on a 3'-hydroxyl thereof.
16. The method of claim 1, wherein e) comprises reacting the heterohybrid DNA with a single- or double-strand mismatch nicking protein, a mismatch repair protein, a nucleotide excision repair protein, a mismatch nuclease, or a combination of two or more thereof.
17. The method of claim 16, wherein the mismatch nuclease comprises CelI, CelII or a combination thereof.
18. The method of claim 1, wherein e) comprises chemically modifying the heterohybrid DNA prior to the reacting.
19. The method of claim 18, wherein the chemically modifying comprises treating the heterohybrid DNA with osmium tetroxide and hydroxylamine.
20. The method of claim 1, wherein the nuclease in e) is an all-type nicking (ATE) enzyme.
21. The method of claim 1, wherein the nuclease in e) is attached to a first binding group via a scissile linker moiety and wherein g) comprises contacting a DNA-nuclease complex formed in e) with a second binding moiety bound to a solid support.
22. The method of claim 1, wherein the nuclease of e) cleaves one or both strands of double stranded heterohybrid DNA within about 25 nucleotides of the mismatched nucleotides.
23. The method of claim 1, wherein the nuclease in e) is a single-strand specific nuclease and produces a single-strand nick on the 3' side of the mismatched nucleotide, thus exposing the 3' hydroxyl of the mismatched nucleotide, and wherein the method further comprises i) covalently modifying the exposed 3' hydroxyl of the mismatch nucleotide with a purification tag to allow for separation of the heterohybrid DNA fragment containing the nick, and wherein g) and h) comprise separating the nicked heterohybrid DNA fragment from other heterohybrid DNA fragments in the population, exposing the separated DNA fragment to a single-strand specific nuclease which cleaves a phosphodiester bond on the other DNA strand of the separated DNA fragment opposite the nick, separating the strands of the DNA fragment containing the two mismatch nucleotides, and determining the sequence of the mismatch region comprising the mismatched nucleotide.
24. The method of claim 1, wherein the nuclease in e) is a single-strand specific nuclease and produces a single-strand nick on the 5' side of the mismatched nucleotide, thus exposing the 3' hydroxyl adjacent to the mismatched nucleotide, and wherein the method further comprises i) covalently modifying the exposed 3' hydroxyl adjacent to the mismatch nucleotide with a purification tag to allow for separation of the heterohybrid DNA fragment containing the nick, and wherein g) and h) comprise separating the nicked heterohybrid DNA fragment from other heterohybrid DNA fragments in the population, exposing the separated DNA fragment to a single-strand specific nuclease which cleaves a phosphodiester bond on the other DNA strand of the separated DNA fragment opposite the nick, separating the strands of the DNA fragment containing the two mismatch nucleotides, and determining the sequence of the mismatch region comprising the mismatched nucleotide.
25. The method of claim 23 or 24, wherein the nuclease is a topoisomerase I which is detectably labeled.
26. The method of claim 23 or 24, wherein the purification tag in i) comprises a linker moiety and a binding moiety.
27. The method of claim 26, wherein the binding moiety is biotin, and wherein the separating of the heterohybrid DNA fragment containing the nick is conducted by contacting the heterohybrid DNA with streptavidin bound to a solid support.
28. The method of claim 23 or 24, wherein the purification tag in i) comprises a nucleotide containing a removable blocking group at its 3' end.
29. The method of claim 28, wherein the removable blocking group is an amidomethyl group.
30. The method of claim 23 or 24, further comprising j) amplifying each of the separated strands of the DNA fragment containing the two mismatch nucleotides, prior to the determining step.
31. The method of claim 1, wherein the nuclease in e) is a double-strand specific nuclease and produces a double-strand break on the 3' side of each of the mismatched nucleotides, thus producing two heterohybrid DNA fragments, each of which contains a mismatch nucleotide, exposing the 3' hydroxyls of each of the mismatched nucleotides, and wherein the method further comprises i) covalently modifying the exposed 3' hydroxyls of each of the mismatch nucleotides with a purification tag to allow for separation of the two heterohybrid DNA fragments containing the breaks, and wherein g) and h) comprise separating the heterohybrid DNA fragments containing the mismatch nucleotides from other heterohybrid DNA fragments in the population, separating the strands of the two heterohybrid DNA fragments containing the two mismatch nucleotides, and determining the sequence of the mismatch region comprising the mismatched nucleotide.
32. The method of claim 1, wherein the nuclease in e) is a double-strand specific nuclease and produces a double-strand break on the 5' side of each of the mismatched nucleotides, thus exposing the 3' hydroxyls of the nucleotides adjacent to each of the mismatched nucleotides, and wherein the method further comprises i) covalently modifying the exposed 3' hydroxyls with a purification tag to allow for separation of the two heterohybrid DNA fragments containing the mismatch nucleotides, and wherein g) and h) comprise separating the two heterohybrid DNA fragments containing the mismatch nucleotides from other heterohybrid DNA fragments in the population, separating the strands of the two heterohybrid DNA fragments containing the mismatch nucleotides, and determining the sequence of the mismatch region comprising the mismatched nucleotide.
33. The method of claim 31 or 32, wherein each of the two heterohybrid DNA fragments containing a mismatch nucleotide are ligated to a linker DNA population comprising four subpopulations of double stranded DNA molecules comprising a single nucleotide overhang (N) at the 5' or 3' terminus thereof, wherein for the four subpopulations (N) represents A, T, C and G, respectively.
34. The method of claim 33, wherein the linker DNA population contains a fluorescent moiety, which for each of the four subpopulations has a different excitation wavelength.
35. A method of determining the sequence of a DNA molecule, comprising: a) preparing single stranded DNA fragments from a polyploid organism; b) allowing the fragments to re-anneal and form double stranded heterohybrid DNA fragments wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; c) distinguishing formation of heterohybrid DNA containing a mismatch from formation of DNA which is perfectly complementary; d) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA; and e) determining the identity of the mismatched nucleotide(s).
36. The method of claim 35, further comprising determining the sequence of a mismatch region that contains the mismatched nucleotide.
37. A method of determining the sequence of a DNA molecule, comprising: a) preparing single stranded fragments of a first DNA molecule having a substantially known sequence; b) preparing single stranded fragments of a second DNA molecule having an unknown sequence; c) contacting the single stranded fragments of a) or copies thereof, and the single stranded fragments of b) or copies thereof, under conditions that allow formation of heterohybrid DNA, wherein the heterohybrid DNA comprises perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch; d) distinguishing formation of heterohybrid DNA containing a mismatch from formation of heterohybrid DNA which is perfectly complementary; e) separating or purifying each heterohybrid DNA containing a mismatch from all other heterohybrid DNA; and f) determining the identity of the mismatched nucleotide(s).
38. The method of claim 37, further comprising determining the sequence of a mismatch region that contains the mismatched nucleotide, thus allowing elucidation of the sequence of the second DNA.
39. A method of determining the sequence of a DNA molecule, comprising: a) creating double-stranded restriction fragments or double-stranded fragments derived from mechanical shear of genomic DNA; b) modifying 3' hydroxyl groups of all of the fragments with a blocking moiety; c) separating the double stranded DNA fragments, thus producing a single stranded DNA population capable of randomly re-annealing to reform double stranded DNA fragments; d) allowing the DNA to re-anneal, thus forming heterohybrid DNA, wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch, (d') reacting the heterohybrid DNA with a mismatch recognition protein-based system, thus creating a population of double strand DNA fragments wherein one strand contains at least one break in its phosphodiester bonds in a region containing the mismatch (the mismatch region); f) purifying the fragments containing the mismatch nucleotide or mismatched pair of nucleotides from all other heterohybrid DNA fragments; and g) identifying the mismatched nucleotides and the adjacent DNA sequence in each DNA fragment.
40. A method of determining the sequence of a DNA molecule, comprising: a) creating double-stranded restriction fragments or double-stranded fragments derived from mechanical shear of genomic DNA; b) modifying 3' hydroxyl groups of all of the fragments with a blocking moiety; c) separating the double stranded DNA fragments, thus producing a single stranded DNA population capable of randomly re-annealing to reform double stranded DNA fragments; e) allowing the single stranded DNA to re-anneal, thus forming the heterohybrid DNA wherein the heterohybrid DNA includes perfectly complementary heterohybrid DNA and heterohybrid DNA containing a mismatch, and (e') reacting the heterohybrid DNA with a mismatch recognition protein-based system, thus creating a population of double strand DNA fragments wherein both strands of the heterohybrid DNA containing a mismatch are cleaved in a region containing the mismatch, such that the heterohybrid DNA is cleaved into two fragments, each of which contains a mismatched nucleotide; f) purifying the fragments containing the mismatch nucleotide or mismatched pair of nucleotides from all other heterohybrid DNA fragments; and g) identifying the mismatched nucleotides and the adjacent DNA sequence in each DNA fragment.
41. A method of identifying mismatched nucleotides in DNA, comprising:
a) digesting genomic DNA, copies of genomic DNA, or cDNA with a restriction endonuclease, thus producing double stranded DNA fragments; b) separating the double stranded DNA fragments thus producing single stranded DNA fragments;
c) allowing re-annealing of the single stranded DNA fragments, thus producing a population of double stranded DNA comprising perfectly complementary heterohybrid DNA fragments and heterohybrid DNA fragments containing at least one mismatched pair of nucleotides, optionally followed by purifying the population to remove aggregate and/or concatameric DNA formed from misaligned re-annealing and DNA fragments greater than a predetermined length;
d) modifying the 3' hydroxyl groups of the population of c) with a blocking group comprising a dideoxy nucleotide triphosphate;
e) reacting the population of d) with a nuclease that cleaves DNA adjacent to one or both of the mismatched pair of nucleotides, thus producing heterohybrid DNA fragments comprising at both ends thereof, the dideoxy nucleotide triphosphate and a 3' protruding mismatch nucleotide with a 3' hydroxyl within the fragment;
f) reacting the heterohybrid DNA fragments of d) with a dideoxy nucleotide triphosphate linked to a first binding moiety via a scissile linker group, such that the heterohybrid DNA fragments of e) become linked to the first binding moiety;
g) separating the heterohybrid DNA fragments linked to the first binding moiety from the population of double stranded DNA by contacting the heterohybrid DNA fragments with a second binding moiety that binds the first moiety, wherein the second binding moiety is affixed to a solid support;
h) releasing the heterohybrid DNA of g) from the solid support; i) attaching a DNA linker or primer to the thus-released heterohybrid DNA fragments containing the mismatched pair of nucleotides; and
j) covalently attaching the heterohybrid DNA fragments of i) to a solid support followed optionally by multiplex sequencing of the DNA fragments or
k) ligating the heterohybrid DNA fragments into vectors followed by cloning and non-multiplex sequencing or amplification followed by sequencing.
42. A solid support having affixed thereto, directly or indirectly, a plurality of DNA fragments of known sequence, which are preferably single stranded, and were purified via the presence of at least one mismatched nucleotide contained therein.
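
The comparison at the heart of claims 37 and 38 can be illustrated in silico. The following is a minimal sketch, not the claimed wet-lab method: it assumes the two fragments are already aligned, of equal length, and written in the same sense (a real heterohybrid pairs complementary strands), and the function names `find_mismatches` and `apply_mismatches` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Tuple

# (position, base in the known sequence, base in the unknown sequence)
Mismatch = Tuple[int, str, str]


def find_mismatches(reference: str, sample: str) -> List[Mismatch]:
    """Toy stand-in for steps d) to f) of claim 37.

    Assumes both fragments are pre-aligned and of equal length, and
    reports every position at which the two sequences differ.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes aligned fragments of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def apply_mismatches(reference: str, mismatches: List[Mismatch]) -> str:
    """Idea of claim 38: known sequence plus identified mismatches
    gives the inferred sequence of the second DNA."""
    bases = list(reference)
    for pos, ref_base, sample_base in mismatches:
        if bases[pos] != ref_base:
            raise ValueError(f"mismatch record at {pos} does not fit the reference")
        bases[pos] = sample_base
    return "".join(bases)


# Hypothetical example fragments (not taken from the patent).
reference_fragment = "ATGCCGTA"
sample_fragment = "ATGTCGTA"

mismatches = find_mismatches(reference_fragment, sample_fragment)
inferred = apply_mismatches(reference_fragment, mismatches)

print(mismatches)  # [(3, 'C', 'T')]
print(inferred == sample_fragment)  # True
```

The sketch only records substitutions; insertions and deletions, which would also produce mismatch regions in a heterohybrid, are outside its scope.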
PCT/US2012/065018 2011-11-14 2012-11-14 Mismatch nucleotide purification and identification WO2013074632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161559410P 2011-11-14 2011-11-14
US61/559,410 2011-11-14

Publications (1)

Publication Number Publication Date
WO2013074632A1 (en) 2013-05-23

Family

ID=48430114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/065018 WO2013074632A1 (en) 2011-11-14 2012-11-14 Mismatch nucleotide purification and identification

Country Status (1)

Country Link
WO (1) WO2013074632A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5679522A (en) * 1989-05-12 1997-10-21 Duke University Methods of analysis and manipulation of DNA utilizing mismatch repair systems
WO2003070977A2 (en) * 2002-02-21 2003-08-28 Nanogen Recognomics Gmbh Method for detecting single nucleotide polymorphisms
US6924104B2 (en) * 2000-10-27 2005-08-02 Yale University Methods for identifying genes associated with diseases or specific phenotypes
US20100285970A1 (en) * 2009-03-31 2010-11-11 Rose Floyd D Methods of sequencing nucleic acids

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11286519B2 (en) 2013-12-11 2022-03-29 Accuragen Holdings Limited Methods and compositions for enrichment of amplification products
US11859246B2 (en) 2013-12-11 2024-01-02 Accuragen Holdings Limited Methods and compositions for enrichment of amplification products
US11597973B2 (en) 2013-12-11 2023-03-07 Accuragen Holdings Limited Compositions and methods for detecting rare sequence variants
WO2015127058A1 (en) * 2014-02-19 2015-08-27 Hospodor Andrew Efficient encoding and storage and retrieval of genomic data
US11578359B2 (en) 2015-10-09 2023-02-14 Accuragen Holdings Limited Methods and compositions for enrichment of amplification products
EP3458586A4 (en) * 2016-05-16 2019-10-30 Accuragen Holdings Limited Method of improved sequencing by strand identification
US11427866B2 (en) 2016-05-16 2022-08-30 Accuragen Holdings Limited Method of improved sequencing by strand identification
EP3458586B1 (en) 2016-05-16 2022-12-28 Accuragen Holdings Limited Method of improved sequencing by strand identification
EP3458586A1 (en) * 2016-05-16 2019-03-27 Accuragen Holdings Limited Method of improved sequencing by strand identification
US11643683B2 (en) 2016-08-15 2023-05-09 Accuragen Holdings Limited Compositions and methods for detecting rare sequence variants
WO2019075383A1 (en) * 2017-10-13 2019-04-18 The Charles Stark Draper Laboratory, Inc. Hybridization immunoprecipitation sequencing (hip-seq)
US11802306B2 (en) 2017-10-13 2023-10-31 The Charles Stark Draper Laboratory, Inc. Hybridization immunoprecipitation sequencing (HIP-SEQ)
US11203782B2 (en) 2018-03-29 2021-12-21 Accuragen Holdings Limited Compositions and methods comprising asymmetric barcoding

Similar Documents

Publication Publication Date Title
US11697843B2 (en) Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing
US20210277459A1 (en) Preparation of templates for methylation analysis
US20190024141A1 (en) Direct Capture, Amplification and Sequencing of Target DNA Using Immobilized Primers
US20220042090A1 (en) PROGRAMMABLE RNA-TEMPLATED SEQUENCING BY LIGATION (rSBL)
JP2024060054A (en) Method for identifying and enumerating nucleic acid sequence, expression, copy, or DNA methylation changes using a combination of nucleases, ligases, polymerases, and sequencing reactions
WO2013074632A1 (en) Mismatch nucleotide purification and identification
US20100120034A1 (en) Methylation analysis of mate pairs
KR102592367B1 (en) Systems and methods for clonal replication and amplification of nucleic acid molecules for genomic and therapeutic applications
GB2533882A (en) Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
WO2011071923A2 (en) Multi-sample indexing for multiplex genotyping
US7579155B2 (en) Method for identifying the sequence of one or more variant nucleotides in a nucleic acid molecule
KR102398479B1 (en) Copy number preserving rna analysis method
US20220389408A1 (en) Methods and compositions for phased sequencing
CN114901818A (en) Methods of targeted nucleic acid library formation
JP2007530026A (en) Nucleic acid sequencing
WO2013192292A1 (en) Massively-parallel multiplex locus-specific nucleic acid sequence analysis
JP2002517981A (en) Methods for detecting nucleic acid sequences
US20180245132A1 (en) Methods for hybridization based hook ligation
US20230374574A1 (en) Compositions and methods for highly sensitive detection of target sequences in multiplex reactions
US20100285970A1 (en) Methods of sequencing nucleic acids

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12850660

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12850660

Country of ref document: EP

Kind code of ref document: A1