WO2003066882A2 - Method and apparatus for validating DNA sequences without sequencing - Google Patents

Method and apparatus for validating DNA sequences without sequencing

Info

Publication number
WO2003066882A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
fragments
mass
double stranded
acid sequence
Prior art date
Application number
PCT/US2003/003643
Other languages
English (en)
Other versions
WO2003066882A3 (fr)
Inventor
Gregory T. Went
Original Assignee
Tethys Bioscience, Inc.
Priority date
Filing date
Publication date
Application filed by Tethys Bioscience, Inc.
Priority to AU2003215083A
Publication of WO2003066882A2
Publication of WO2003066882A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6872: Methods for sequencing involving mass spectrometry
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the field of this invention is nucleic acid molecule sequence classification, identification or determination; more particularly it is the validation of large fragments of nucleic acid or genes in a sample without performing de novo sequencing, as well as methods for screening nucleic acids for polymorphisms or mutations by analyzing fragmented nucleic acids using mass spectrometry.
  • the sequence of the human genome contains approximately 3 x 10^9 nucleotides, essentially all of which is publicly available as a result of the Human Genome Project. However, this is a consensus sequence derived from the genomic sequence of relatively few individuals, and the heterogeneity and complexity of both the sequence polymorphisms and the splicing patterns of the human genome have heretofore been inadequately explored and characterized.
  • Sequencing by hybridization has been proposed (See, e.g., U.S. Patent Nos. 6,451,996, 5,667,972, 6,018,041, 5,510,270, 5,871,928, and 6,300,063), but is inefficient at determining exon order and inadequate in resolving power. More recently, mass spectrometry has been used to sequence nucleic acids (See, e.g., U.S. Patent Nos. 6,268,131 and 6,140,053) and to identify mutations in nucleic acids (See, e.g., U.S. Patent Nos. 6,051,378 and 6,500,621) but none of these methods are cost effective at validating large numbers of these larger DNA fragments.
  • Any improved method for sequence validation will apply to other genomes as well. For all of the above purposes, a rapid, low cost means of validating large fragments of DNA would have a major impact on nucleic acids research and diagnostics.
  • The general availability of wild type sequence for the mammalian and pathogen genomes of interest creates a new application, namely sequence validation.
  • Genetic polymorphisms such as mutations can manifest themselves in several forms, such as point mutations, wherein a single base is changed to one of the three other bases, deletions, wherein one or more bases are removed from a nucleic acid sequence and the bases flanking the deleted sequence are directly linked to each other, and insertions, wherein new bases are inserted at a particular point in a nucleic acid sequence adding additional length to the overall sequence. Large insertions and deletions, often the result of chromosomal recombination and rearrangement events, can lead to partial or complete loss of a gene. Of these forms of mutation, in general the most difficult type of mutation to screen for and detect is the point mutation, because it represents the smallest degree of molecular change.
  • the methods of the invention separate (via fragmentation, for example) the nucleic acid molecule sample into overlapping fragments and independently validate the molecular weight of each fragment and of its corresponding plus and minus strands. Owing to the extremely low probability of compensating variants, a fragment whose mass exactly matches that of the wild type sequence can be readily assumed to be invariant. Only the small number of fragments harboring variant masses need be sequenced in detail, drastically reducing the time and cost of sequence validation.
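The strand-level mass check described above can be illustrated with a minimal sketch. The residue masses are standard average values for deoxynucleotides within a chain, not figures from this document, and the end chemistry (5'-phosphate, 3'-hydroxyl, blunt ends) and the example sequences are assumptions chosen for illustration.

```python
# Average residue masses (Da) of deoxynucleotides inside a DNA chain.
# Standard reference values; they are an assumption, not data from the source.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
WATER = 18.02  # terminal H and OH; assumes 5'-phosphate and 3'-hydroxyl ends

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the minus strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_mass(seq: str) -> float:
    """Average mass of one linear strand."""
    return sum(RESIDUE_MASS[base] for base in seq) + WATER


def duplex_strand_masses(plus: str) -> tuple[float, float]:
    """Return (plus strand mass, minus strand mass) for a blunt-ended fragment."""
    return strand_mass(plus), strand_mass(reverse_complement(plus))


# Hypothetical wild type fragment and a variant with A -> T at the third base.
wild = duplex_strand_masses("GGATCCAT")
variant = duplex_strand_masses("GGTTCCAT")
shift = tuple(round(v - w, 2) for v, w in zip(variant, wild))
print(shift)  # mass shift of the plus strand and of the minus strand
```

In this toy case the plus strand becomes lighter and the minus strand heavier by the same amount, so the two strand masses carry information that the summed duplex mass alone would not.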
  • the present invention therefore, allows for the rapid validation of sequence of a nucleic acid molecule, and concomitant determination of any sequence polymorphisms, without the need to sequence the portion of nucleic acids that do not vary from the wild type sequence.
  • the present invention provides a method for validating the sequence of a nucleic acid or detecting polymorphisms within a nucleic acid without sequencing the entirety of the nucleic acid.
  • In one aspect, the present invention provides methods of validating the sequence of a test double stranded nucleic acid, by contacting the test double stranded nucleic acid with one or more separation means, such that two or more double stranded nucleic acid fragments are generated from said test nucleic acid; generating one or more output signals from each of the fragments, the output signals including a representation of the molecular mass of each of the fragments; and comparing the one or more output signals with a set of output signals known or predicted to be produced by a nucleic acid of identical sequence to the test nucleic acid, whereby the sequence of the test nucleic acid is validated.
  • the separation means is a recognition means. In the practice of the invention, each recognition means recognizes a different target nucleotide subsequence or a different set of target nucleotide subsequences of the test nucleic acid.
  • the test nucleic acid is contacted with one or more recognition means that are restriction enzymes, such as restriction endonucleases.
  • the output signals are derived from mass spectrometry.
  • Methods of mass spectrometry of the present invention include, but are not limited to, ion cyclotron resonance mass spectrometry, electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, matrix-assisted laser desorption ionization mass spectrometry, quadrupole ion trap mass spectrometry, magnetic/electric sector mass spectrometry and time-of-flight mass spectrometry.
  • An optional aspect of the invention is the inclusion of internal calibrants or internal self-calibrants in the set of nonrandom length fragments to be analyzed by mass spectrometry to provide improved mass accuracy.
  • the target double stranded nucleic acid is DNA or double stranded RNA.
  • Sources of DNA include genomic DNA, cDNA, and DNA generated by polymerase chain reaction (PCR).
  • the method may be repeated one, two, three or more times, under conditions such that the size of each of the two or more nucleic acid fragments is decreased with each repetition.
  • the two or more double stranded nucleic acid fragments generated are each under a certain length, e.g., under 500 bases, 200 bases, 100 bases, 50 bases, or 20 bases in length.
  • Another aspect of the invention provides a method for identifying all or substantially all of the DNA fragments encoding polymorphisms in a test double stranded nucleic acid, the method including contacting the test double stranded nucleic acid with one or more separation means, such that two or more double stranded nucleic acid fragments are generated from the test nucleic acid; generating one or more output signals from each of the fragments, the output signal including a representation of the molecular mass of each of the fragments; and comparing the one or more output signals with a set of output signals of a reference nucleic acid of identical sequence, whereby a difference in the one or more output signals of one or more nucleic acid fragments indicates a difference in the sequence of the one or more nucleic acid fragments, thereby identifying all or substantially all of the DNA fragments encoding polymorphisms in the test nucleic acid.
  • the method further includes identifying the one or more nucleic acid fragments having the polymorphism; and repeating the method one or more times, under conditions such that the size of each of the two or more nucleic acid fragments is decreased with each repetition.
  • the method further includes sequencing the nucleic acid fragments with output signals different from the output signals of the reference nucleic acid.
  • the invention provides a method for detecting a polymorphism in a target nucleic acid, the method including obtaining from the target nucleic acid a population of nucleic acid fragments in double stranded form, wherein the population essentially comprises the entirety of fragments generated from non-randomly fragmenting a double-stranded target nucleic acid, and determining the molecular masses of each of the double-stranded nucleic acid fragments of the population.
  • the method further includes comparing the molecular mass of each of the double-stranded nucleic acid fragments with the molecular masses known or predicted to be produced by a double stranded reference nucleic acid; and sequencing the nucleic acid fragments with molecular masses different from the molecular masses of the reference nucleic acid.
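The comparison step above, in which only fragments with unexpected masses go on to sequencing, might be sketched as follows. The function name, the fragment labels, the example masses and the 0.5 Da tolerance are all hypothetical; a real tolerance would depend on the instrument.

```python
def compare_masses(predicted: dict[str, float],
                   observed: list[float],
                   tol_da: float = 0.5) -> tuple[list[str], list[float]]:
    """Match predicted fragment masses to observed masses within a tolerance.

    Returns (predicted fragments with no matching peak,
             observed peaks that no predicted fragment explains).
    """
    unmatched_observed = sorted(observed)
    missing = []
    for name, mass in sorted(predicted.items(), key=lambda kv: kv[1]):
        hit = next((m for m in unmatched_observed if abs(m - mass) <= tol_da), None)
        if hit is None:
            missing.append(name)            # candidate for sequencing
        else:
            unmatched_observed.remove(hit)  # each peak explains one fragment
    return missing, unmatched_observed


# Hypothetical reference masses and measured masses (Da).
predicted = {"F1": 12345.6, "F2": 23456.7, "F3": 30111.2}
observed = [12345.7, 23465.7, 30111.0]
to_sequence, unexplained = compare_masses(predicted, observed)
print(to_sequence, unexplained)
```

Here only F2 lacks a matching peak, so it alone would be passed on for conventional sequencing, while the unexplained peak hints at the mass of the variant fragment.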
  • Another aspect of the invention provides a method for detecting a variation in a nucleic acid sequence among two individuals, the method including independently contacting a first nucleic acid from a first individual and a second nucleic acid from a second individual with one or more separation means, such that two or more double stranded nucleic acid fragments are generated from each of the first nucleic acid and the second nucleic acid; generating one or more output signals from each of the fragments, the output signal including a representation of the molecular mass of each of the fragments; and comparing the one or more output signals generated from the first nucleic acid with the one or more output signals generated from the second nucleic acid, whereby a variation in a nucleic acid sequence among two individuals is detected.
  • Another aspect of the invention provides a method for determining paternity of an offspring, the method including independently contacting a first nucleic acid from a first individual and a second nucleic acid from a second individual with one or more separation means, such that two or more double stranded nucleic acid fragments are generated from each of the first nucleic acid and the second nucleic acid; generating one or more output signals from each of the fragments, the output signal including a representation of the molecular mass of each of the fragments; and comparing the one or more output signals generated from the first nucleic acid with the one or more output signals generated from the second nucleic acid, thereby determining the paternity of the first individual relative to the second individual.
  • a further aspect of the invention includes a method for identifying a polymorphism in a target double stranded nucleic acid, the method including the steps of contacting the target double stranded nucleic acid with one or more restriction enzymes, such that two or more double stranded nucleic acid fragments are generated from the target nucleic acid; determining the molecular masses of each of the double-stranded nucleic acid fragments; comparing the molecular masses of each of the double-stranded nucleic acid fragments with the molecular masses of the double-stranded nucleic acid fragments known or predicted to be produced by a double stranded reference nucleic acid of identical sequence to the target nucleic acid; repeating these steps one or more times, under conditions such that the size of each of the two or more nucleic acid fragments is decreased with each repetition; and sequencing the nucleic acid fragment(s) with molecular masses different from the molecular masses of the double-stranded nucleic acid fragments of the reference nu
  • Another aspect of this invention is a processor for analyzing nucleic acid sequences.
  • a processor for analyzing nucleic acid sequences comprising a selecting module that enables a user to select one or more textual strings corresponding to one or more genes; in response to the user's selection, a providing module that provides a first set of nucleic acid sequence fragments comprising the fragments predicted to be generated by contacting a first double stranded nucleic acid molecule with at least one separation means, said first set of nucleic acid sequence fragments associated with the selected one or more textual strings; an evaluating module that evaluates each of the first set of nucleic acid sequence fragments to predict the mass of each fragment of the first set of nucleic acid sequence fragments; a retrieving module that retrieves experimental results comprising the mass of each of a second set of nucleic acid sequence fragments, said second set of nucleic acid sequence fragments generated by contacting a second double stranded nucleic acid molecule with said at least one separation means; a
  • the processor may further comprise a storing module that stores the results of the validation.
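The module chain described above (selecting, providing, evaluating, retrieving, validating, storing) can be sketched as a handful of small functions. Everything named here is hypothetical: the in-memory dictionaries stand in for a reference database and an instrument results store, the gene name and sequence are invented, only the plus strand is weighed, and the cut is modeled as a single top-strand position inside an AGCT site.

```python
import re

# Standard average residue masses (Da); an assumption, not data from the source.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
WATER = 18.02  # assumes 5'-phosphate and 3'-hydroxyl ends

# Hypothetical stand-ins for a reference database and measured results.
REFERENCE_DB = {"geneX": "AAGCTTTTAGCTGG"}
EXPERIMENTAL_DB = {"geneX": [973.6, 2166.5, 1269.8]}
VALIDATION_STORE = {}


def select_gene(name):
    """Selecting module: map a textual gene name to its reference sequence."""
    return REFERENCE_DB[name]


def provide_fragments(seq, site="AGCT", offset=2):
    """Providing module: predict fragments from cutting inside every site."""
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0, *cuts, len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]


def evaluate_mass(fragment):
    """Evaluating module: predicted plus strand mass of one fragment."""
    return sum(RESIDUE_MASS[base] for base in fragment) + WATER


def retrieve_results(name):
    """Retrieving module: fetch the measured fragment masses."""
    return EXPERIMENTAL_DB[name]


def validate(fragments, measured, tol_da=0.5):
    """Validating module: a fragment passes if some measured mass is within tolerance."""
    return [(f, any(abs(evaluate_mass(f) - m) <= tol_da for m in measured))
            for f in fragments]


def run(name):
    """Run the chain end to end and store the report (storing module)."""
    report = validate(provide_fragments(select_gene(name)), retrieve_results(name))
    VALIDATION_STORE[name] = report
    return report


print(run("geneX"))
```

Keeping each module a plain function makes it straightforward to swap the dictionary lookups for real database or instrument interfaces without touching the comparison logic.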
  • the separation means can be a recognition means, such as a restriction endonuclease, preferably a type 2 restriction endonuclease.
  • the process for evaluating the mass of each fragment preferably comprises performing mass spectrometry on each fragment.
  • Applicable means of mass spectrometry can include ion cyclotron resonance mass spectrometry, electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, matrix-assisted laser desorption ionization mass spectrometry, quadrupole ion trap mass spectrometry, magnetic/electric sector mass spectrometry and time-of-flight mass spectrometry.
  • the nucleic acid is DNA; alternatively, the nucleic acid can be double stranded RNA.
  • a further aspect of this invention includes a method for analyzing nucleic acid sequences comprising enabling a user to select one or more textual strings corresponding to one or more genes; in response to the user's selection, providing a first set of nucleic acid sequence fragments associated with the selected one or more textual strings, said first set of nucleic acid sequence fragments comprising the fragments predicted to be generated by contacting a first double stranded nucleic acid molecule with at least one separation means; evaluating each of the first set of nucleic acid sequence fragments to predict the mass of each of the first set of nucleic acid sequence fragments; retrieving experimental results comprising the mass of each of a second set of nucleic acid sequence fragments, said second set of nucleic acid sequence fragments generated by contacting a second double stranded nucleic acid molecule with said at least one separation means; and validating each of the first set of nucleic acid sequence fragments by evaluating the mass of each of the first set of nucleic acid sequence fragments against the mass
  • the method may further comprise a step of storing the results of the validation.
  • the separation means can be a recognition means, such as a restriction endonuclease, preferably a type 2 restriction endonuclease.
  • the process for evaluating the mass of each fragment preferably comprises performing mass spectrometry on each fragment.
  • Applicable means of mass spectrometry can include ion cyclotron resonance mass spectrometry, electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, matrix-assisted laser desorption ionization mass spectrometry, quadrupole ion trap mass spectrometry, magnetic/electric sector mass spectrometry and time-of-flight mass spectrometry.
  • the nucleic acid is DNA; alternatively, the nucleic acid can be double stranded RNA.
  • a processor for analyzing nucleic acid sequences comprising selecting means that enables a user to select one or more textual strings corresponding to one or more genes; in response to the user's selection, providing means that provides the mass of each fragment of a first set of nucleic acid sequence fragments associated with the selected one or more textual strings; evaluating means that evaluates each of the first set of nucleic acid sequence fragments to predict the mass of each fragment of the first set of nucleic acid sequence fragments for at least one separation means; retrieving means that retrieves experimental results comprising the mass of each fragment in a second set of nucleic acid sequence fragments for said at least one separation means; validating means that validates the first set of nucleic acid sequence fragments by evaluating the mass of each fragment of the first set of nucleic acid sequence fragments against the experimental results of the mass of each fragment of the second
  • a further aspect of this invention provides a processor readable medium for analyzing nucleic acid sequences, said medium comprising a first processor readable program code for enabling a user to select one or more textual strings corresponding to one or more genes; in response to the user's selection, a second processor readable program code for providing a first set of nucleic acid sequence fragments associated with the selected one or more textual strings; a third processor readable program code for evaluating each of the first set of nucleic acid sequence fragments to calculate the mass of each fragment of the first set of nucleic acid sequence fragments, said first set of nucleic acid sequence fragments comprising the fragments predicted to be generated by contacting a first double stranded nucleic acid molecule with at least one separation means; a fourth processor readable program code for retrieving experimental results of the determination of the mass of each fragment of a second set of nucleic acid sequence fragments, said second set of nucleic acid sequence fragments comprising the fragments generated by contacting a second double stranded nucleic
  • Figure 1a depicts the nucleic acid sequence of a Pan1 nucleic acid (SEQ ID NO: 1) isolated from hamster.
  • Figure 1b depicts the nucleic acid sequence of Pan2 (SEQ ID NO: 2) isolated from hamster.
  • Figure 2 demonstrates the pairwise sequence alignment of Pan1 and Pan2 nucleic acids.
  • Figure 3 indicates the predicted AciI and HaeIII restriction enzyme sites within Pan1 and Pan2 cDNAs. The hatched boxes below the genes indicate regions of sequence divergence between Pan1 and Pan2 sequences.
  • Figure 4 is a schematic representation of an embodiment of the sequence validation method of the present invention using a Pan1 cDNA amplicon.
  • Figure 5a is a partial ESI-FTICR-MS spectrum (M/Z of 952.5-957.5) of RE fragments derived from Pan1-like cDNAs.
  • Figure 5b is the deconvolution and analysis of the same partial ESI-FTICR-MS spectrum of RE fragments derived from Pan1-like cDNAs.
  • Figure 6a is a partial ESI-FTICR-MS spectrum (M/Z of 1017.5-1027.0) of RE fragments derived from Pan1-like cDNAs.
  • Figure 6b is the deconvolution and analysis of the same partial ESI-FTICR-MS spectrum of RE fragments derived from Pan1-like cDNAs.
  • Figure 7 is a schematic representation of an embodiment of the polymorphism scanning method of the present invention using genomic DNA (gDNA).
  • Figure 8 is a schematic representation of an embodiment of the polymorphism scanning method of the present invention using the CFTR exon and intron junction regions.
  • Figure 9 depicts an embodiment of the invention where multiple separation means, in this instance restriction endonuclease digestion, of double stranded DNA yields complete coverage of the sequence of the Pan1 gene, overcoming any lower limits of resolution in current mass spectrometry methods.
  • multiple restriction endonucleases are employed and samples are run in tandem.
  • Figure 10 depicts a flow diagram demonstrating an embodiment of the clone validation system of the invention.
  • Figure 11 depicts a flow diagram demonstrating an embodiment of the method of building a nucleic acid reference database, in this instance a method of building a cDNA reference database.
  • Figure 12 depicts a flow diagram demonstrating an embodiment of the method for predicting fragments of cleaved nucleic acid molecules, in this instance a method of predicting restriction enzyme-cleaved fragments of a cDNA sample.
  • Figure 13 depicts a flow diagram demonstrating an embodiment of the method of generating nucleic acid fragments from clones by contacting nucleic acid molecules with separation means, in this instance contacting clones containing the nucleic acid molecules with restriction enzymes.
  • Figure 14 depicts a flow diagram demonstrating an embodiment of the method of generating fragment data for comparison of predicted and experimentally derived fragment sets.
  • Figure 15 depicts a flow diagram demonstrating an embodiment of the method of comparing the predicted and experimentally derived fragment sets.
  • Figure 16 depicts a flow diagram describing an embodiment of the clone validation system of the invention.
  • Figure 17 depicts a flow diagram describing a second embodiment of the clone validation system of the invention.
  • the present invention is directed in part to methods of validating the entire sequence of nucleic acids and for localizing polymorphisms in nucleic acid sequences derived from PCR, expression cloning, genomic cloning and the like using mass spectrometry.
  • the methods described herein can be performed iteratively in order to confirm the sequence of the nucleic acid without sequencing the nucleic acid or, alternatively, to provide detailed information about the nature and location of polymorphisms in the target nucleic acid.
  • the method and apparatus are especially useful for the analysis and validation of fragments ranging from approximately 1 kb up to approximately 100 kb, but may be adapted for even higher weight fragments.
  • the present invention involves obtaining from a target nucleic acid, using a variety of nonrandom fragmentation techniques, a set of two or more double stranded nucleic acid fragments and comparing the set of fragments with a set of fragments known or predicted to be produced by a double stranded reference nucleic acid of identical sequence to the predicted sequence of the target nucleic acid.
  • the reference nucleic acid may be, e.g., the wild type nucleic acid or may be a nucleic acid having a consensus sequence, i.e., a composite sequence generated by averaging two or more nucleic acid sequences. Most wild type sequences for the genes and genomes of interest are known and are stored in databases.
  • Wild type refers to a standard or reference nucleotide sequence to which variations are compared. As defined, any variation from wild type is considered a mutation, including naturally occurring sequence polymorphisms, insertions, deletions, substitutions, and inversions. The term mutation encompasses all the above-listed types of differences from wild type nucleic acid sequence.
  • the target nucleic acid can be single-stranded or double-stranded DNA, RNA or hybrids thereof, from any source, preferably from a mammalian source, e.g., a human, although any source from which one is capable of isolating nucleic acids can be used in the methods described herein, including pathogens and viruses. Uncommon DNA structures including triple stranded and quadruple stranded DNA are also included in the present invention.
  • the target nucleic acid of the present invention can also be synthesized by methods known to those skilled in the art. When the target nucleic acid is RNA, the RNA is preferably made double-stranded.
  • the target nucleic acid can be an RNA DNA hybrid, wherein either strand can be designated the plus or forward (+) strand and the other, the minus or reverse (-) strand.
  • the target nucleic acid is generally a nucleic acid which must be screened to determine all or substantially all of the polymorphisms, such as mutations.
  • the corresponding target nucleic acid derived from a wild type source is referred to as a reference nucleic acid.
  • the target nucleic acids can be obtained from a source sample containing nucleic acids and can be produced from the nucleic acid by PCR amplification or other amplification technique.
  • the target nucleic acids can be of any size capable of being fragmented by a separation means, e.g., a restriction enzyme.
  • Nonrandom length fragments are nucleic acid molecules generated by nonrandom fragmentation of a target nucleic acid molecule by any separation means, such that two or more double stranded nucleic acid fragments are generated.
  • nonrandom length fragment set(s) generated from the target nucleic acid molecule is(are) compared against reference fragment set(s) prepared from a predicted fragmentation of a reference nucleic acid molecule to validate the sequence of the target nucleic acid molecule.
  • the preferred method of comparing the nonrandom length fragment set(s) to the reference fragment set(s) is to determine the masses of sets of nonrandom length fragments, and to determine the mass of essentially every fragment resulting from the fragmentation of the target double stranded nucleic acid.
  • the methods described herein preferably use mass spectrometry to determine the masses of the set or sets of nonrandom length fragments and compare the output of mass spectrometry to the predicted output of the reference fragment set.
  • the resolving power of the mass spectral analyses of the present invention allows the detection of a very small mass change (on the order of 0.4 Da or smaller) in a nonrandom length fragment, while the mass change of a single base substitution is at least 9 Da (representing a change from A to T).
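The 9 Da figure for an A to T change can be checked with a short calculation over all six possible base exchanges. The residue masses used are standard average values for deoxynucleotides within a chain and are an assumption, not numbers given in this document.

```python
from itertools import combinations

# Standard average residue masses (Da); an assumption, not data from the source.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

# Absolute single-strand mass shift for every pair of exchanged bases.
shifts = {
    f"{a}<->{b}": round(abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]), 2)
    for a, b in combinations("ACGT", 2)
}

for pair, delta in sorted(shifts.items(), key=lambda kv: kv[1]):
    print(f"{pair}: {delta:.2f} Da")

smallest = min(shifts, key=shifts.get)
print("smallest shift:", smallest, shifts[smallest])
```

With these values the A/T exchange is the smallest shift, at about 9 Da, and the C/G exchange the largest, at about 40 Da, so a mass measurement good to a fraction of a dalton would separate any single substitution on a strand.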
  • the methods described herein do not require sequencing of the target nucleic acid in order to confirm that the target nucleic acid has the identical sequence of the reference nucleic acid, or alternatively, to identify the nature and presence of all or substantially all of the mutations within the target nucleic acid. Instead, the methods of the present invention allow the comparison of the individual masses of a set of nucleic acid fragments derived from a target nucleic acid with masses of nucleic acid fragments known or predicted to be produced by a double stranded reference nucleic acid of identical sequence to the predicted sequence of the target nucleic acid.
  • By identifying a nucleic acid fragment from the target nucleic acid whose mass differs from the masses of the reference nucleic acid fragments, a nucleic acid fragment containing a polymorphism can be detected.
  • the methods of the present invention can be performed iteratively, such that the size of the nucleic acid fragment containing a polymorphism is successively reduced with each repetition.
  • the specific nature and location of the polymorphism can then be identified by conventional sequencing methods, e.g., Sanger sequencing using dideoxy termination and denaturing gel electrophoresis (Sanger, F., Nicklen, S. & Coulson, A. R. Proc. Natl. Acad. Sci.
  • the nonrandom fragmentation techniques of the invention are any methods of fragmenting nucleic acids that provide a defined set of nonrandom length fragments, where that set of nonrandom length fragments may be reproducibly obtained by using the same nonrandom fragmentation method on the same target nucleic acid or its wild type version.
  • the methods used for nonrandom fragmentation are designed to optimize the ease of analyzing the resulting fragment set mass spectral data, e.g., by obtaining a range of fragment sizes that avoids significant overlap of mass peaks.
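One way to screen a candidate digest for overlapping mass peaks is to sort the predicted fragment masses and flag neighbors that sit closer together than a chosen gap. The function name, the fragment labels, the masses and the 1 Da gap below are hypothetical values for illustration only.

```python
def crowded_pairs(masses: dict[str, float], min_gap_da: float) -> list[tuple[str, str]]:
    """Return pairs of fragments whose predicted masses differ by less than min_gap_da."""
    ordered = sorted(masses.items(), key=lambda kv: kv[1])
    return [
        (name_a, name_b)
        for (name_a, mass_a), (name_b, mass_b) in zip(ordered, ordered[1:])
        if mass_b - mass_a < min_gap_da
    ]


# Hypothetical predicted fragment masses (Da) for two candidate digests.
digest_a = {"a1": 10000.0, "a2": 10000.3, "a3": 25000.0}
digest_b = {"b1": 8000.0, "b2": 12000.3, "b3": 25000.0}

print(crowded_pairs(digest_a, min_gap_da=1.0))
print(crowded_pairs(digest_b, min_gap_da=1.0))
```

A digest that returns an empty list, like the second one here, would be the easier one to interpret, since every predicted fragment has its own resolvable peak.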
  • the nonrandom fragmentation techniques of the invention include enzymatic nonrandom fragmentation techniques such as digestion with restriction endonucleases or structure-specific endonucleases, and specific chemical cleavage.
  • the methods of the present invention are useful to validate the sequence of a nucleic acid such as a cDNA cloned into a plasmid or other vector, without de novo sequencing, e.g., Sanger or hybridization sequencing.
  • FT-ICR MS, as disclosed in the application, is focused on analyzing cDNAs for mass variations compared to appropriate reference sequence cDNAs.
  • DNA, cDNA, synthetic DNA, and RNA can also be amplified by PCR; templates for PCR include previously isolated cDNA clones, cloned libraries of cDNAs, and RNA derived from appropriate cell or tissue sources which is reverse transcribed into cDNA.
  • all PCR primers will be preferably positioned in unique, non-repetitive sequence stretches and anneal to their respective complementary strand at similar thermodynamic stability to enable amplification conditions to be uniform for all amplicons.
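A rough way to check that a set of primers anneals with similar stability is to compare estimated melting temperatures. The sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), which is a coarse approximation for short oligonucleotides and is swapped in here for illustration; the source does not name a method. The primer sequences and the 4 °C limit are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def tm_spread(primers: list[str]) -> int:
    """Difference between the highest and lowest estimated Tm in a primer set."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms)


# Hypothetical primers for three amplicons.
primers = ["ATGCATGCATGCATGCAT", "ATGGCATCCATGCATGAT", "TTGCAAGCATGGATCCAT"]
for p in primers:
    print(p, wallace_tm(p))

# Hypothetical acceptance limit for running all amplicons under one protocol.
print("uniform conditions plausible:", tm_spread(primers) <= 4)
```

A nearest-neighbor thermodynamic model would give better estimates; the point of the sketch is only the uniformity check across the primer set.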
  • primers can be located either in the vector or within the cDNA insert itself.
  • RNAs isolated from cells or tissues will necessitate that the primers be located within the cognate cDNA that results from the RT reaction.
  • a series of minimally overlapping amplicons e.g., each 2 kb in length
  • relevant aspects of the cDNA e.g. 5' UTR and ORF
  • Amplicons will be generated by PCR using a high fidelity, thermostable DNA polymerase or fragments thereof (Klenow-like), e.g., Pful DNA polymerase, which lack both non-templated nucleotide polymerization activity and 3' exonuclease activity.
  • the size of the nucleic acids to be validated may be greater than 10 kilobases.
  • Nucleic acids including putative full-length or partial cDNA-derived amplicons, whose size is within the resolving range of FT-ICR will be analyzed for mass variation without fragmentation.
  • the present invention anticipates mass analysis of unfragmented nucleic acids of 200 bases or more, and contemplates analyzing larger nucleic acids (e.g., nucleic acids greater than 250, 300, 400, 500, 750 and 1000 bases in length).
  • Nucleic acids can be analyzed either individually or as mixtures with other nucleic acids that are also within the resolving range of FT-ICR. Preparation of mixtures of nucleic acids is particularly useful when PCR, including multiplexed PCR, is used to generate nucleic acids for validation.
  • nucleic acids whose size is beyond the resolving range of FT-ICR will be fragmented prior to analysis for mass variation. Fragmentation of nucleic acids will be done using one or more sequence-specific DNA hydrolases, e.g., restriction enzymes, universal enzymes, etc., whose recognition site is small and therefore occurs frequently in double stranded DNA. Examples include simple four base cutters like AluI, discontinuous four base cutters like HinfI (GANTC), and other restriction enzymes with slightly larger restriction sites due to sequence degeneracy, e.g., PspGI, which cuts at the sequence CCWGG.
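The frequent-cutter digestion described above can be sketched in silico. The following is a minimal illustration, not the applicant's implementation: it assumes an uppercase, linear, top-strand sequence, models only top-strand cut positions, and uses the commonly cited cut offsets for the three named enzymes (AG^CT, G^ANTC, ^CCWGG). The names `ENZYMES`, `cut_positions`, and `fragment_lengths` are hypothetical.

```python
import re

# IUPAC degenerate codes needed for the sites below (N = any base, W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}

# Recognition site and top-strand cut offset (bases from the start of the site).
# These offsets are assumptions stated in the lead-in, not taken from the source text.
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT
    "HinfI": ("GANTC", 1),  # G^ANTC
    "PspGI": ("CCWGG", 0),  # ^CCWGG
}


def cut_positions(seq, site, offset):
    """Return top-strand cut positions for every (possibly overlapping) site."""
    # A zero-width lookahead lets overlapping occurrences of the site be found.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")
    return [m.start() + offset for m in pattern.finditer(seq)]


def fragment_lengths(seq, enzyme_names):
    """Digest a linear sequence with one or more enzymes; return fragment lengths in order."""
    cuts = {0, len(seq)}
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    ordered = sorted(cuts)
    return [b - a for a, b in zip(ordered, ordered[1:])]
```

For example, `fragment_lengths("TTAGCTTTGACTCAACCAGGTT", ["AluI", "HinfI", "PspGI"])` returns `[4, 5, 6, 7]`. The predicted lengths can then be checked against the size range that the mass spectrometer resolves.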
  • the nucleic acids will be digested using one or more restriction enzymes to cleave the DNA such that the sizes of the expected restriction enzyme fragments are within the range of resolution and can be unambiguously distinguished from other fragments within the digest by fragment mass determinations utilizing a mass spectrometer (MS), preferably utilizing ESI-FTICR, that determines m/z with high range, resolution, and accuracy, e.g., 200 bp, 30,000 (M/ΔM), and >0.01%, respectively.
  • nucleic acids, PCR amplicons or restriction enzyme fragments derived from the nucleic acids are analyzed by MS to determine first, the M/Z value for each resolvable amplicon/RE fragment and then, the mass for each nucleic acid or restriction enzyme fragment as appropriate.
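The step from m/z values to a fragment mass can be illustrated with a small charge-state calculation. This sketch assumes negative-ion electrospray, where a species of neutral mass M carrying z negative charges appears at m/z = M/z minus one proton mass, and it assumes the peaks passed in are adjacent charge states of the same species. The function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_adjacent(mz_high, mz_low):
    """Charge z of the peak at mz_high, given the neighbouring peak (charge z + 1) at mz_low.

    Assumes negative-ion mode, where each peak is [M - zH]^z-, so m/z = M/z - PROTON.
    """
    return round((mz_low + PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass M from a negative-ion m/z value and its charge."""
    return z * (mz + PROTON)


def deconvolute(mz_series):
    """Average the neutral mass implied by a series of adjacent charge states of one species."""
    peaks = sorted(mz_series, reverse=True)  # highest m/z corresponds to the lowest charge
    z0 = charge_from_adjacent(peaks[0], peaks[1])
    masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(peaks)]
    return sum(masses) / len(masses)
```

Averaging over several charge states is one simple design choice; a production pipeline would more likely fit the whole charge envelope at once.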
  • the mass determination for each nucleic acid or restriction enzyme fragment is compared to the expected values from the corresponding nucleic acid reference sequence.
  • the nucleic acid reference sequence may be present in a database containing known or predicted nucleic acid sequences.
  • if the mass of the test nucleic acids, or of restriction enzyme fragments derived from a test nucleic acid, is identical to that expected for a nucleic acid or a restriction enzyme fragment derived from the reference sequence, the sequence of the test nucleic acid is validated.
  • analyses that reveal mass differences between one or more test nucleic acids or restriction enzyme fragments and the corresponding reference nucleic acid denote variant nucleic acids having a sequence different from the reference sequence.
  • the variant nucleic acid or a restriction enzyme fragment is sequenced either completely or within an interval that will encompass the restriction enzyme fragment(s) of variant mass so as to determine the cause of the mass aberration at the molecular level.
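The comparison of measured masses against reference masses might look like the sketch below. It is an illustration under stated assumptions: each expected fragment is paired with the closest observed mass, and a tolerance of 100 ppm (0.01%) is used as the acceptance window. The fragment names and the function `compare_masses` are hypothetical.

```python
def compare_masses(observed, expected, tol_ppm=100.0):
    """Match each expected fragment mass to the closest observed mass.

    observed : list of measured masses (Da)
    expected : dict mapping fragment name -> mass predicted from the reference sequence
    Returns (validated, report); report rows are (name, expected, observed, ppm_error, ok).
    """
    report = []
    for name, exp_mass in expected.items():
        obs_mass = min(observed, key=lambda m: abs(m - exp_mass))
        ppm = (obs_mass - exp_mass) / exp_mass * 1e6
        report.append((name, exp_mass, obs_mass, ppm, abs(ppm) <= tol_ppm))
    # The sequence is treated as validated only if every expected fragment is matched.
    validated = all(row[4] for row in report)
    return validated, report
```

Fragments whose row is flagged as not matching are the candidates for targeted sequencing or for a further round of fragmentation.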
  • those region(s) are selected for further mass spectral analysis, either by generating restriction enzyme fragments encompassing the regions or by amplifying sub-regions using PCR, or by other means described herein.
  • the target nucleic acid to which the methods of the invention are applied can be any gene or fragment thereof, a nucleic acid generated by PCR, a cDNA contained within a vector, or all or a portion of a chromosome.
  • the target nucleic acid can be of any length that is capable of being acted upon by a separation means such as one or more restriction enzymes.
  • Target nucleic acids can be, e.g., from about 200 bases to greater than 100,000 bases. No prior amplification or selection of the target nucleic acid is required to practice the methods of the present invention. Alternatively, the target nucleic acid is synthetic.
  • the source of the nucleic acid is any nucleic acid-containing entity, including a whole organism, an organ, a tissue, a cell, a sub-cellular fraction, nucleic acids purified or obtained from biological materials and the like.
  • the nucleic acid source can also be a non-biological material to which a biological material has been contacted, such as an article of clothing contacted with a body fluid, e.g., blood, saliva, tears, urine, perspiration, semen, or vaginal secretions.
  • the nonrandom length fragments generated by the methods of the present invention are of a size capable of being accurately measured by mass spectrometry.
  • the fragment size is under 1,000 bases.
  • the fragment size can also be under about 500, 200, 100, 75, 50, 20 or 10 bases.
  • fragmentation methods that produce a set of random length fragments are not desirable due to the limited reproducibility of such fragments, the limited information available from mass spectrometry analysis of such fragments, and the likelihood of spectral overlap from randomly generated fragments.
  • a set of nonrandom length fragments is preferably generated ranging in length from 10-1000 bases, preferably from about 20 to about 200 bases in length.
  • the range of lengths serves to better separate and resolve the fragment peaks in the resulting mass spectrum.
  • subsequent iterations of the validation or polymorphism detection methods use progressively smaller length fragments. For example, a first set of nonrandom length fragments is generated ranging in length from 100 to 200 bases in length and analyzed using ESI-FTICR MS. A second set of nonrandom length fragments is then generated ranging in length from about 60 to about 100 bases in length and analyzed using ESI-FTICR MS.
  • a third set of nonrandom length fragments is then generated ranging in length from about 20 to about 40 bases in length and analyzed using ESI-FTICR MS.
  • a fourth set of nonrandom length fragments is then generated ranging in length from about 10 to about 20 bases in length and analyzed using ESI-FTICR MS.
  • the resulting polymorphism-containing fragment is then sequenced by standard methods well known in the art.
  • a schematic of a representative process is illustrated in Figure 1. In this manner, a target nucleic acid 2,000 bases in length could be analyzed with a coverage of 3x, to a window of 20 base pairs on average, by 4 iterations of the methods of the invention.
  • Fragmentation of target nucleic acids can be accomplished using a number of means, including cleavage with one or more DNA restriction endonucleases targeting specific sequences within double-stranded DNA, chemical cleavage at structure-specific and/or base-specific locations, polymerase incorporation of modified nucleotides that create cleavage sites when incorporated, and targeted structure-specific and/or sequence-specific nuclease treatment.
  • the restriction enzymes used are Type II enzymes, which cut DNA at defined positions close to or within their recognition sequences and generally produce discrete restriction fragments and distinct gel banding patterns.
  • Type II enzymes cleave DNA within their recognition sequences, e.g., Hha I, Hind III and Not I.
  • Most Type II enzymes recognize DNA sequences that are symmetric because they bind to DNA as homodimers, but a few (e.g., BbvC I: CCTCAGC) recognize asymmetric DNA sequences because they bind as heterodimers.
  • Some enzymes recognize continuous sequences (e.g., EcoR I: GAATTC) in which the two half-sites of the recognition sequence are adjacent, while others recognize discontinuous sequences (e.g., Bgl I: GCCNNNNNGGC) in which the half-sites are separated.
  • type II enzymes useful in the present invention cleave outside of their recognition sequence to one side. These enzymes are usually referred to as "type IIs" and include, e.g., Fok I and Alw I. These enzymes are intermediate in size, 400-650 amino acids in length, and they recognize sequences that are continuous and asymmetric. They comprise two distinct domains, one for DNA binding, the other for DNA cleavage. They are thought to bind to DNA as monomers for the most part, but to cleave DNA cooperatively, through dimerization of the cleavage domains of adjacent enzyme molecules. For this reason, some type IIs enzymes are much more active on DNA molecules that contain multiple recognition sites.
  • type IIs enzymes are preferred in situations wherein non-type IIs enzymes cannot generate a suitable set of nonrandom length fragments, such as in cases of low-complexity DNA, genomic DNA with Alu or other repeats, or polynucleotide repeats (e.g., AAAAAAAAA).
  • type II enzymes useful in the present invention are large, combination restriction-and-modification enzymes, 850-1250 amino acids in length, in which the two enzymatic activities reside in the same protein chain. These enzymes cleave outside of their recognition sequences; those that recognize continuous sequences (e.g., Eco57 I: CTGAAG) cleave on just one side; those that recognize discontinuous sequences (e.g., Bcg I: CGANNNNNNTGC) cleave on both sides, releasing a small fragment containing the recognition sequence.
  • the amino acid sequences of these enzymes are varied but their organization is consistent.
  • multiple rounds of nucleic acid fragmentation and mass spectral analysis are performed, in which the size of the fragmented nucleic acids decreases with each successive round of fragmentation.
  • Multiple restriction enzymes are useful to generate nucleic acid fragments of specific, pre-determined lengths that maximize resolution of the mass spectrometry.
  • the double stranded nucleic acid fragments derived from the fragmentation process can be used directly in mass spectrometry without purification.
  • the fragmented nucleic acids can be purified.
  • the molecular masses of essentially all of the nucleic acid fragments generated by fragmentation are determined. As such it is generally unnecessary to remove any nucleic acid fragments prior to mass determination.
  • the preferred types of mass spectrometry used in the invention include ion cyclotron resonance mass spectrometry, electrospray ionization Fourier transform ion cyclotron resonance (ESI-FTICR) mass spectrometry, matrix-assisted laser desorption ionization (MALDI) mass spectrometry, quadrupole ion trap mass spectrometry, magnetic/electric sector mass spectrometry and time-of-flight mass spectrometry.
  • a preferred method of mass spectrometry is ESI-FTICR.
  • the methods are conducted to accurately determine the masses of a set of nonrandom length fragments and this data is correlated to a reference set of fragments to determine the presence or absence of a polymorphism, followed by optional characterization of any polymorphism present.
  • An advance of the present invention is the ability to perform mass spectrometric determination of the members of a set of double-stranded nonrandom length fragments, optionally in an iterative manner, such that the sequence validity of a nucleic acid can be determined without sequencing the entire nucleic acid.
  • The preferred method of mass spectrometry is ESI-FTICR MS, in part because of the ability to determine the molecular masses of both strands of double stranded DNA simultaneously.
  • ESI is the more gentle ionization procedure, producing denatured but intact positive and negative strands.
  • Other MS techniques like MALDI are less preferred owing to the complex fragmentation patterns and the lack of resolving power for all the mass fragments.
  • Mass spectrometers are typically calibrated using analytes of known mass. A mass spectrometer can then analyze an analyte of unknown mass with an associated mass accuracy and precision. However, the calibration, and associated mass accuracy and precision, for a given mass spectrometry system (including MALDI-TOF MS) can be significantly improved if analytes of known mass are contained within the sample containing the analyte(s) of unknown mass(es). The inclusion of these known mass analytes within the sample is referred to as use of internal calibrants. External calibrants, i.e. analytes of known mass that are not mixed in with the set of nonrandom length fragments of unknown mass and simultaneously analyzed in a mass spectrometer, are analyzed separately.
  • External calibrants can also be used to improve mass accuracy, but because they are not analyzed simultaneously with the set of fragments of unknown mass, they will not increase mass accuracy as much as internal calibrants do.
  • Another disadvantage of using external calibrants is that it requires an extra sample to be analyzed by the mass spectrometer.
  • In MALDI-TOF MS, generally only two calibrant molecules are needed for complete calibration, although sometimes three or more calibrants are used.
  • In ESI-FTICR, the abundance of internal calibrants is sufficient, although a high molecular weight calibrant is often added to help with the automatic detection of peaks in the samples. All of the embodiments of the invention described herein can be performed with the use of internal calibrants to provide improved mass accuracy.
  • Using the methods described herein, one can obtain a mass spectrum with numerous mass peaks corresponding to the set of nonrandom length fragments (NLFs) of the gene or target nucleic acid under study. If no mutation is present in the target nucleic acid, all of the mass peaks corresponding to the nonrandom length fragments will be at mass-to-charge ratios associated with the set of NLFs from the wild type target nucleic acid. However, if the target nucleic acid contains a mutation, usually no more than one or two of the mass peaks will be shifted in mass, leaving the majority of mass peaks at unaltered locations.
  • a self-calibration algorithm uses these unmutated or nonpolymorphic NLFs for internal calibration to optimize the mass accuracy for analysis of the NLFs containing a mutation, thus requiring no added calibrant(s), simplifying the calibration, and avoiding potential spectral overlaps. In a given sample, however, it will not be known a priori which mass peaks, if any, are altered or shifted from their expected masses for the wild type NLFs.
  • the self-calibration algorithm begins by dividing up the observed mass peaks into subsets, each subset consisting of all but one or two of the observed mass peaks. Each data subset has a different one or two mass peaks deleted from consideration. For each subset, the algorithm divides the subset further into a first group of two or three masses which are then used to generate a new set of calibration constants, and a second group which will serve as an internal consistency check on those new constants.
  • the internal consistency check begins by calculating the mass difference between the m/z values calculated for the second group of mass peaks and the values corresponding to reasonable choices for the associated wild-type NLFs. The internal consistency check can thus take the form of a chi-square minimization where the key parameter is this mass difference.
  • the algorithm finds which data subset has the lowest sum of the squares of these mass differences resulting in a choice of optimized calibration constants associated with group one of this data subset.
  • the mass-to-charge ratios are determined for the mass peaks omitted from the data subset; these are the nonrandom length fragments suspected to contain a mutation.
  • the differences from the observed mass peaks for the wild type NLFs are then used to determine whether a mutation has occurred, and if so, what the nature of this mutation is (e.g. the exact type of deletion, insertion, or point mutation). This self-calibration procedure should yield a mass accuracy of approximately 1 part in 10,000.
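The self-calibration procedure described above can be sketched as a leave-out search. The following is a simplified illustration, not the patented algorithm itself: it assumes a linear calibration (true = a * observed + b), which stands in for whatever calibration law the instrument actually uses, and it assumes the observed peaks have already been paired one-to-one with expected wild-type masses. The names `fit_line` and `self_calibrate` and the parameters `n_omit` and `n_fit` are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (exact when only two points are given)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def self_calibrate(observed, expected, n_omit=1, n_fit=2):
    """Find the omitted peak(s) whose removal gives the most self-consistent calibration.

    observed[i] is the measured value paired with expected[i] (wild-type prediction).
    Returns (omitted indices, (a, b), calibrated shifts of the omitted peaks).
    """
    indices = range(len(observed))
    best = None
    for omitted in combinations(indices, n_omit):
        kept = [i for i in indices if i not in omitted]
        for fit_group in combinations(kept, n_fit):
            # Group one generates the calibration constants.
            a, b = fit_line([observed[i] for i in fit_group],
                            [expected[i] for i in fit_group])
            # Group two is the internal consistency check (sum of squared residuals).
            check = [i for i in kept if i not in fit_group]
            sse = sum((a * observed[i] + b - expected[i]) ** 2 for i in check)
            if best is None or sse < best[0]:
                best = (sse, omitted, a, b)
    _, omitted, a, b = best
    shifts = {i: a * observed[i] + b - expected[i] for i in omitted}
    return omitted, (a, b), shifts
```

The calibrated shift reported for each omitted peak is the quantity that would then be interpreted as a deletion, insertion, or point mutation. Setting `n_omit=2` covers the case where two fragments carry a change, at the cost of a larger search.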
  • the present invention also provides a system for validating a target double stranded nucleic acid molecule and optionally identifying unique features (i.e., mutations) therein.
  • the validation system is based on a database of fragments of predicted, wild type nucleic acid molecules against which the fragments of the target double stranded nucleic acid molecule are compared.
  • the flow diagram in Figure 10 describes an embodiment of the validation system applied to one embodiment of the invention, validation of a cDNA sequence.
  • the system initially comprises having a user make a selection of one or more genes of interest, followed by the acquisition of or creation of cDNA clone samples for the selected gene(s).
  • Upon receiving and recording a request to perform a validation for the cDNA clone samples, the system branches into two activities.
  • in the first activity, cDNA samples are fragmented using fragmentation means, e.g., by contact of cDNA with various restriction enzymes, and masses are determined for sense and anti-sense strands of DNA.
  • in the second activity in silico calculations are performed to predict cDNA fragmentation based upon the desired genes and the restriction enzyme(s) to be applied, resulting in algorithmic calculations of the masses for sense and anti-sense strands of DNA.
  • the resulting data sets are merged to compare the observed results with the predicted results. Gene matching and validation conclusions can then be drawn from the comparisons.
  • This invention also provides a reference database of wild type nucleic acid sequences.
  • the reference database can be generated from the available nucleic acid sequence databases such as GenBank, EMBL, DDBJ, PDB, GSS, BDGP (the Drosophila genome project), the CuraGen GeneCalling® database and the Celera Discovery System.
  • the database can be generated from experimental sequence analysis of wild type genes.
  • the database of the invention is designed to be non-redundant in order to simplify the downstream analysis, which can be confused if multiple, redundant entries are found in the database.
  • the flow diagram in Figure 11 depicts one such procedure for developing a reference database.
  • the cDNA Reference Database (Ref DB) is a database of putative genes and predicted fragment information that would be expected by experimentally applying separation means, such as restriction enzymes (REs), to cDNA samples.
  • the Ref DB is used during the clone validation to compare observed cDNA (digested) fragments against predicted fragments.
  • the process for building the Ref DB begins with a selection of genes for which fragment predictions will be carried out. If information about the gene is found (i.e., is available in public or commercial sequence databases), a search is performed to find cDNA sequence information for the gene. If cDNA sequence information is located, the cDNA sequence is captured and the gene will be marked to indicate that real cDNA information exists.
  • If cDNA sequence information is not found, the genomic DNA (gDNA) sequence information is obtained, and cDNA will be predicted from the gDNA, using an algorithm to predict introns and exons, and then assembling the exons into a predicted cDNA sequence. Following the cDNA prediction process, the gene will be marked as predicted cDNA.
  • After the cDNA information has been determined for a gene, that information is stored in the Ref DB. Then, applying desired sets of REs, a process predicts the digested fragments that would result from experimentally applying the REs to a real cDNA sample (see "Predict RE-Cleaved Fragments" section for more details). Each predicted fragment is stored in the Ref DB with references to the source cDNA and the REs that were used in the prediction.
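One possible layout for such a Ref DB is sketched below with an in-memory SQLite database. The schema, table names, and helper functions are assumptions made for illustration only; the source does not specify a storage format. The `source` column records whether the cDNA was taken from a database ("real") or assembled from predicted exons ("predicted").

```python
import sqlite3

# Hypothetical schema: one row per cDNA, one row per predicted fragment.
SCHEMA = """
CREATE TABLE cdna (
    cdna_id   INTEGER PRIMARY KEY,
    gene      TEXT NOT NULL,
    sequence  TEXT NOT NULL,
    source    TEXT NOT NULL CHECK (source IN ('real', 'predicted'))
);
CREATE TABLE fragment (
    fragment_id INTEGER PRIMARY KEY,
    cdna_id     INTEGER NOT NULL REFERENCES cdna(cdna_id),
    enzymes     TEXT NOT NULL,      -- REs used for the prediction, e.g. 'AciI+HaeIII'
    start       INTEGER NOT NULL,   -- 0-based start position within the cDNA
    length      INTEGER NOT NULL
);
"""


def build_ref_db():
    """Create an empty in-memory reference database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_cdna(conn, gene, sequence, source):
    """Store a cDNA and return its identifier."""
    cur = conn.execute(
        "INSERT INTO cdna (gene, sequence, source) VALUES (?, ?, ?)",
        (gene, sequence, source),
    )
    return cur.lastrowid


def add_fragments(conn, cdna_id, enzymes, fragments):
    """Store predicted fragments given as (start, length) pairs."""
    conn.executemany(
        "INSERT INTO fragment (cdna_id, enzymes, start, length) VALUES (?, ?, ?, ?)",
        [(cdna_id, enzymes, start, length) for start, length in fragments],
    )


def fragments_for(conn, gene, enzymes):
    """Retrieve predicted fragments for a gene and enzyme set, ordered by position."""
    rows = conn.execute(
        "SELECT f.start, f.length FROM fragment f JOIN cdna c ON c.cdna_id = f.cdna_id "
        "WHERE c.gene = ? AND f.enzymes = ? ORDER BY f.start",
        (gene, enzymes),
    )
    return rows.fetchall()
```

Keeping the enzyme set on each fragment row allows several digests of the same cDNA to coexist, which is what the clone validation step needs when it retrieves predictions for a particular RE combination.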
  • an optimal set (or global set) of separation means, preferably REs, is selected to generate overlapping fragments from which the entire target sequence can be covered.
  • knowing the overhangs on the 3' and 5' ends allows for the exact determination of the composition of each strand.
  • the resulting single strand mass can be directly computed from the composition multiplied by the monoisotopic molecular weight of each nucleotide:
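As an illustration of this calculation, the sketch below sums per-base residue masses over the composition and adds one water for the chain termini. It assumes a strand with a 5'-phosphate and a 3'-hydroxyl, as left by restriction digestion, with an option for a 5'-hydroxyl strand. The residue formulas are those of the deoxynucleoside monophosphates minus one water. The function names are hypothetical.

```python
# Monoisotopic atomic masses (Da).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "P": 30.9737620}

# Elemental formulas of the deoxynucleotide residues as they occur within a DNA chain
# (nucleoside monophosphate minus one water).
RESIDUE = {
    "A": {"C": 10, "H": 12, "N": 5, "O": 5, "P": 1},
    "C": {"C": 9, "H": 12, "N": 3, "O": 6, "P": 1},
    "G": {"C": 10, "H": 12, "N": 5, "O": 6, "P": 1},
    "T": {"C": 10, "H": 13, "N": 2, "O": 7, "P": 1},
}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(ATOM[element] * count for element, count in formula.items())


RESIDUE_MASS = {base: formula_mass(f) for base, f in RESIDUE.items()}
WATER = formula_mass({"H": 2, "O": 1})
HPO3 = formula_mass({"H": 1, "P": 1, "O": 3})


def strand_mass(seq, five_prime_phosphate=True):
    """Monoisotopic mass of a single DNA strand from its base composition."""
    composition = {base: seq.count(base) for base in "ACGT"}
    mass = sum(n * RESIDUE_MASS[base] for base, n in composition.items()) + WATER
    if not five_prime_phosphate:
        mass -= HPO3  # a 5'-hydroxyl strand lacks the terminal phosphate
    return mass
```

Because the mass depends only on the composition, the two strands of a fragment with overhanging ends generally have different masses, which is why the overhangs must be known.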
  • test nucleic acid fragments can be generated by contacting the sample with the identical fragmentation means used to generate the database fragment set.
  • the test nucleic acid fragment set is then subject to mass analysis, preferably by mass spectrometric methods, to determine the mass ranges of the test nucleic acid fragment set.
  • Mass range data can be stored as numerical values in a table or displayed in a graphical representation. Comparison of data from the generated test set with the fragment database set allows for validation of the sequence of the test nucleic acid molecule.
  • a variety of statistical approaches can be applied in order to select which table of predicted RE fragment masses is the best fit, including non-linear regression analysis, neural network-type clustering, or a Bayesian analysis.
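A very simple Bayesian-style selection among candidate tables can be sketched as follows. This is a deliberately minimal stand-in for the approaches named above: it assumes equal prior probability for every candidate, a Gaussian measurement error with a single standard deviation `sigma`, and pairing of each observed mass with its nearest predicted mass. The candidate names and function names are hypothetical.

```python
import math


def log_likelihood(observed, predicted, sigma):
    """Gaussian log-likelihood (up to a constant), pairing each observed mass
    with its nearest predicted mass."""
    total = 0.0
    for m in observed:
        nearest = min(predicted, key=lambda p: abs(p - m))
        total += -0.5 * ((m - nearest) / sigma) ** 2
    return total


def rank_candidates(observed, candidates, sigma=1.0):
    """Return (posterior, name) pairs sorted from best to worst, assuming equal priors."""
    logs = {name: log_likelihood(observed, masses, sigma)
            for name, masses in candidates.items()}
    top = max(logs.values())
    # Subtracting the maximum before exponentiating avoids numerical underflow.
    weights = {name: math.exp(value - top) for name, value in logs.items()}
    norm = sum(weights.values())
    return sorted(((w / norm, name) for name, w in weights.items()), reverse=True)
```

The constant term of the Gaussian is dropped because it is identical for every candidate when `sigma` and the number of observed masses are the same.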
  • the invention also provides a method for predicting cleaved nucleic acid fragments, which process predicts the results of experimentally combining sets of REs with a particular nucleic acid sample, in particular a cDNA sample.
  • the prediction process begins with the gene sequence for the cDNA, and for each desired RE, predicts the cleavage sites and the resulting fragments that would be expected in experimental work, both for the sense and anti-sense strands of the DNA.
  • the user can determine the fragment starting position, length, nucleotide base composition, and molecular weight. All of the predicted fragment information is stored in the Ref DB.
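The per-strand prediction can be sketched for a single enzyme that leaves overhanging ends. The example below assumes a linear duplex, an uppercase sense-strand sequence, and a palindromic recognition site (so that scanning the sense strand finds every site). The cut offsets are passed in explicitly; for HinfI (G^ANTC) they would be 1 on the sense strand and 4 on the anti-sense strand, both in sense-strand coordinates. The function names and record fields are hypothetical.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_fragments(seq, site_regex, top_offset, bottom_offset):
    """Predict the sense and anti-sense strand of each fragment of a linear duplex.

    site_regex    : regular expression for the recognition site on the sense strand
    top_offset    : cut position on the sense strand, in bases from the site start
    bottom_offset : cut position on the anti-sense strand, in sense-strand coordinates
    """
    starts = [m.start() for m in re.finditer("(?=" + site_regex + ")", seq)]
    top = [0] + [s + top_offset for s in starts] + [len(seq)]
    bottom = [0] + [s + bottom_offset for s in starts] + [len(seq)]
    records = []
    for i in range(len(top) - 1):
        sense = seq[top[i]:top[i + 1]]
        antisense = revcomp(seq[bottom[i]:bottom[i + 1]])
        records.append({
            "start": top[i],
            "length": len(sense),
            "sense": sense,
            "antisense": antisense,
            "sense_composition": dict(Counter(sense)),
            "antisense_composition": dict(Counter(antisense)),
        })
    return records
```

Each record carries the starting position, length, and base composition of both strands; the molecular weight of a strand then follows directly from its composition.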
  • the invention also provides a system for experimentally generating fragments from cDNA clone samples.
  • a user logs into the system and reviews the queue for sample processing requests, and then receives incoming cDNA samples.
  • the samples are advanced to the queue for performing RE separation laboratory work, and then the samples are stored in a refrigeration unit until the experimental work will begin.
  • the RE fragmentation laboratory process consists of three steps. The first step is focused on preparing reagent plates, consisting of RE pairs and buffer. The second step consists of combining the contents of the reagent plates with a plate that contains the cDNA sample.
  • the third step is to let the combined sample/reagent plate sit for several hours (generally overnight) at an appropriate temperature, e.g., 37° centigrade.
  • the final step is conducted in a manner to allow the RE pairs to cleave the cDNA sample and result in fragmentation of the cDNA.
  • the samples are ready for mass spectrometry, which can be done by the user or sent to a supplier of mass spectrometry sequencing services.
  • the purpose of the mass spectrometry sequencing aspect of the invention is to generate observed fragment data that can be used to identify the gene represented by the nucleic acid, in particular the cDNA, sample.
  • an additional aspect of this invention is the provision of nucleic acid fragment data, in particular gene fragment data for genes of interest.
  • the initial data consists of multiple charge patterns.
  • the next step is to transform the data into a simplified pattern such that peak finding can be performed for each fragment and the base composition can be determined for the fragment based upon the number of bases and the molecular weight of the fragment. With determinant fragment data established, the fragment sets can be packaged by, e.g., cDNA sample and RE.
  • This invention further provides a system for comparing observed (experimental) fragment mass data with the mass data generated from the method for producing predicted fragments of the nucleic acid molecule of interest, preferably a gene.
  • the observed fragments are aligned against putative genes using one or more local sequence alignment tools such as BLAST and Smith-Waterman.
  • a histogram is generated for the observed fragments based upon the number of fragments that fall within a set of fragment length ranges.
  • predicted fragments for the same cDNA are retrieved from the Ref DB, aligned, and a histogram is generated for the predicted fragments based upon the number of fragments that fall within a set of fragment length ranges.
  • the observed and predicted fragments, along with their respective histograms are presented to a user in a viewer tool.
  • the viewer tool allows the user to visually examine the match between observed fragments and predicted fragments. Using the viewer tool, in the vast majority of cases, the user will be able to determine whether the experimental data sufficiently matches the predicted data to infer the identity of (validate) the cDNA sample.
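The histogram comparison that supports the viewer can be sketched as simple binning by fragment length. The bin edges and the distance measure below are illustrative assumptions; the source does not specify the ranges used or how the two histograms are scored. The function names are hypothetical.

```python
from bisect import bisect_right


def length_histogram(lengths, edges):
    """Count fragments per length range; bin i covers edges[i] <= length < edges[i + 1].

    Lengths outside the overall range of edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for length in lengths:
        i = bisect_right(edges, length) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def histogram_distance(observed, predicted, edges):
    """Sum of absolute per-bin count differences (0 means identical histograms)."""
    h_obs = length_histogram(observed, edges)
    h_pred = length_histogram(predicted, edges)
    return sum(abs(o - p) for o, p in zip(h_obs, h_pred))
```

A distance of zero only shows that the length distributions agree; the user's visual inspection of the aligned fragments remains the deciding step described above.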
  • a clone validation system 100 may include or otherwise access data from, for example, predicted restriction map database 102 and experimental results database 104.
  • Predicted restriction map database 102 may include predicted restriction maps of one or more nucleic acid sequence fragments (e.g., cDNA, portion of genomic DNA, etc.).
  • Experimental results database 104 may include, for example, experimentally observed data of restriction maps of one or more nucleic acid sequence fragments (e.g., cDNA, portion of genomic DNA, etc.).
  • the restriction maps of both predicted restriction map database 102 and experimental results database 104 may include a plurality of cleaving sites for one or more restriction endonucleases (e.g., EcoRI).
  • the cleaving sites may be organized for sense strands of one or more DNA fragments. In another embodiment, the cleaving sites may be organized for anti-sense strands of one or more DNA fragments. In yet another embodiment, the cleaving sites may be organized for the pair of strands of one or more DNA fragments.
  • Both predicted restriction map database 102 and experimental results database 104 may also include, for example, but not limited to, an identification number, base composition (e.g., proportion of guanine), and molecular weight for each of the stored nucleic acid sequence fragments corresponding to the restriction map.
  • the experimental database 104 may be coupled to a sequencing machine 106.
  • the experimental database 104 may be coupled to a plurality of pieces of equipment in a laboratory 108.
  • clone validation system 100 may be coupled to or otherwise access data from one or more public databases (e.g., GenBank) and/or one or more proprietary databases (e.g., Celera Genome Database). Clone validation system 100 may also be coupled to web server 114 and mail server 116.
  • Both web server 114 and mail server 116 may obtain data from clone validation system 100, process the data and enable one or more remote users 101a-n to access the processed data through a web site 120.
  • mail server may enable one or more remote users to access the processed data through a non-web based electronic mail system (not shown in figure).
  • clone validation system may be coupled to wide area network (WAN) 122 and local area network (LAN) (not shown in figures).
  • Clone validation system 100 may also be coupled to one or more output means 124 (e.g., display). A user 101 may obtain results using the one or more output means 124.
  • clone validation system 100 may include a plurality of modules including, for example, clone selection module 202, restriction mapping module 204, clone identification module 206, data organization module 208, search module 210, validation module 212, output module 214, customer identification module 216, and storage module 218.
  • Clone selection module 202 may enable a user to select one or more genes and identify nucleic acid sequence fragments corresponding to the user selected genes.
  • Restriction mapping module 204 may predict one or more cleaving sites for one or more separation means in the nucleic acid sequence fragments corresponding to the user selected genes.
  • restriction mapping module 204 may predict one or more cleaving sites for one or more separation means specified by a user. This prediction may be performed by one or more user selectable algorithms (e.g., neural network algorithm, etc.) in the system 100.
  • mass determination module 205 (not shown in figure) is included to calculate the mass of the fragments corresponding to the user selected genes using one or more mass determining algorithms.
  • Clone identification module 206 may enable a user to assign an identification code (e.g., an alphanumeric code) for nucleic acid sequence fragments corresponding to the user selected genes. Clone identification module 206 may also identify the position of restriction enzyme binding sites, and calculate the composition of As, Ts, Gs, and Cs and the molecular weight for nucleic acid sequence fragments corresponding to the user selected genes.
  • Data organization module 208 may organize the data, for example, identification code, molecular weight, etc., in a user specified manner. The organized data may be presented to a user through a display of output means 124.
  • Search module 210 may enable a user to search for unique nucleic acid sequences associated with the sequences of the user selected genes.
  • search module 210 may enable a user to search for nucleic acid sequences, preferably cDNA sequences, associated with the user selected genes.
  • search module 210 may enable a user to search for genomic sequence fragments including introns, and exons associated with the user selected genes.
  • search module 210 may enable a user to search for regulatory sequences associated with the user selected genes.
  • Validation module 212 may validate the nucleic acid sequences of the user selected genes by evaluating the predicted data for cleaving portions against experimentally observed data for cleaving portions. In one embodiment, this evaluation may be performed by, for example, probabilistic modeling of predicted data versus experimental data. In another embodiment, this evaluation may be performed by one or more user selectable validation algorithms in the system 100. In one embodiment, a validation algorithm in the system 100 may correspond to a plurality of processes, for example, but not limited to, obtaining user requests for validation of one or more clones (e.g., genes, sequence fragments), predicting restriction sites in the one or more clones, retrieving experimental results of the restriction sites, and statistically analyzing predicted restriction sites against experimental results of the restriction sites.
  • The validation module 212 may validate the nucleic acid sequences corresponding to the user selected genes by evaluating the predicted mass of the nucleic acid fragments corresponding to the user selected genes against the experimentally observed mass data stored in the experimental results database 104.
  • The system 100 may determine the divergence in the nucleic acid fragments corresponding to the user selected genes based on this evaluation and identify the fragments that may need further validation by sequencing.
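The comparison of predicted and observed fragment masses described above can be sketched as follows. This is a minimal illustration, not the patent's validation algorithm: the `PredictedFragment` class, the fragment identifiers, the `flag_divergent` function, and the 100 ppm (0.01%) matching tolerance are all assumptions introduced here, and the observed masses are illustrative values chosen near masses quoted elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass
class PredictedFragment:
    fragment_id: str       # hypothetical identifier, not from the source
    predicted_mass: float  # predicted strand mass in Da


def ppm_error(observed: float, predicted: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - predicted) / predicted * 1e6


def flag_divergent(predicted, observed_masses, tolerance_ppm=100.0):
    """Return ids of predicted fragments with no observed mass within tolerance."""
    divergent = []
    for frag in predicted:
        matched = any(
            abs(ppm_error(mass, frag.predicted_mass)) <= tolerance_ppm
            for mass in observed_masses
        )
        if not matched:
            divergent.append(frag.fragment_id)
    return divergent


predicted = [
    PredictedFragment("bstni_3prime_plus", 24439.051),
    PredictedFragment("tsp509i_nlaiv_plus", 12273.955),
    PredictedFragment("tsp509i_nlaiv_minus", 11227.901),
]
observed = [23526.914, 12273.956, 11227.900]  # illustrative observed masses

# Fragments listed here are candidates for further validation by sequencing.
needs_sequencing = flag_divergent(predicted, observed)
print(needs_sequencing)
```

Only the fragment whose predicted mass has no observed counterpart is reported, which mirrors the idea of sequencing only the divergent regions.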
  • Output module 214 may output the results of the validation and enable a user to identify unique features, for example, but not limited to, single nucleotide polymorphisms (SNPs), micro-satellites, mini-satellites, etc.
  • Output module 214 may enable a user to identify candidate genes for the nucleic acid sequences corresponding to the user selected genes.
  • Storage module 218 may store the results of search, validation, and output for the nucleic acid sequences corresponding to the user selected genes.
  • A user may be able to store predicted restriction sites for each of the nucleic acid sequence fragments analyzed by the system 100.
  • Customer identification module 216 may store user data, including, for example, user log-in, password, etc., of a plurality of users using clone validation system 100. Customer identification module 216 may also track activities of a user, for example, time logged-in, time logged-out, duration of usage of clone validation system 100, etc.
  • The invention provides a method for medical decision making based on the presence or absence of a gene of interest in the test double stranded nucleic acid molecule.
  • Such medical decision making can comprise diagnosis of a genetic-based disorder, a chromosomal aneuploidy, or a genetic predisposition to a disease state.
  • Pan1 and Pan2 cDNAs are subjected to restriction enzyme digestion using AciI and HaeIII.
  • A restriction enzyme map of each cDNA digested with AciI and HaeIII is provided in Figure 3.
  • The region within each cDNA amplicon that encodes divergent sequence relative to its counterpart is shown with a cross-hatched black rectangle below the depiction of the gene. Only those Pan2-derived restriction enzyme fragments that either span or partially overlap the specified divergent segment(s) will fail to validate the mass fragment pattern expected for a Pan1 sequence and, consequently, will result in one or more fragments with mass variation when compared to the Pan1 reference sequence. The same result will occur when comparing Pan1-derived restriction enzyme fragments with fragments expected from a Pan2 reference sequence.
  • Tables 1 and 2 provide a list of RE fragments resulting from single and double digestion of Pan1 and Pan2 cDNA with AciI (C'CGC) and HaeIII (GG'CC) and the expected molecular weights of the plus and minus strands for each fragment.
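Predicting the fragments of a single or double digest, as tabulated above, can be sketched in a few lines. This is an assumption-laden illustration rather than the method used to build Tables 1 and 2: the toy sequence is invented, the function names are hypothetical, only the plus strand is considered, and only palindromic recognition sites are handled (a non-palindromic site such as AciI's C'CGC would also require searching the reverse complement).

```python
import re

# IUPAC codes needed for the recognition sites used in this document.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def cut_positions(sequence, site, cut_offset):
    """Plus-strand cut positions, counted as bases 5' of each cut."""
    # A lookahead lets overlapping occurrences of the site be found.
    pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    return [m.start() + cut_offset for m in re.finditer(pattern, sequence.upper())]


def digest(sequence, enzymes):
    """Return plus-strand fragments for a list of (site, cut_offset) pairs."""
    cuts = set()
    for site, offset in enzymes:
        cuts.update(cut_positions(sequence, site, offset))
    inner = sorted(c for c in cuts if 0 < c < len(sequence))
    bounds = [0] + inner + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


HAEIII = ("GGCC", 2)   # GG'CC
MSEI = ("TTAA", 1)     # T'TAA
BSTNI = ("CCWGG", 2)   # CC'WGG

toy = "ATGGCCATTAACGCCTGGAT"  # invented 20-base sequence, for illustration only

single = digest(toy, [HAEIII])
double = digest(toy, [HAEIII, MSEI])
triple = digest(toy, [HAEIII, MSEI, BSTNI])
print([len(f) for f in single], [len(f) for f in double], [len(f) for f in triple])
```

Each additional enzyme subdivides the fragments of the previous digest, which is why double digests yield more, shorter fragments than single digests.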
  • A schematic illustration of the method used to analyze the Pan1 and Pan2 cDNAs using ESI-FTICR is provided in Figure 4. Amplification of cDNAs performed herein may be omitted or modified as required. Fragmented Pan1 and Pan2 cDNAs are prepared and spectra are generated using ESI-FTICR-MS; the spectra can be deconvoluted using standard deconvolution means and compared to identify the region of Pan1 or Pan2 for each resulting fragment mass.
  • Figure 5a shows aligned partial spectra over the M/Z range from 952.5 to 957.5 for restriction enzyme digests of Pan1 and Pan2 cDNAs.
  • Figure 6a shows aligned partial spectra over the M/Z range from 1017.5 to 1027.0 for RE digests of Pan1 and Pan2 cDNAs.
  • In Pan2, a unique molecular ion, (M-29H+)29−, exists at an M/Z of 1023.790.
  • Deconvolution and analysis of this portion of the aligned spectra, shown in Figure 6b, lowers the background and simplifies the pattern.
  • For the (M-H+)1− ion, the monoisotopic molecular weight is measured to be 29,689.929 daltons.
  • The following example demonstrates a method of the invention for detecting polymorphisms in the CFTR gene using mass variation identification.
  • The present invention allows the analysis of an entire gene for mass variation.
  • The gene may be associated with a specific disease, such as the human cystic fibrosis transmembrane conductance regulator (CFTR) gene.
  • The gene may be analyzed for the presence of single nucleotide polymorphisms (SNPs) in nucleic acids derived from a subject (test nucleic acid or test DNA) or population of subjects.
  • DNA fragments are derived from a minimally tiled set of overlapping amplicons generated by PCR of human genomic DNA. These amplicons may be of any size suitable for overlapping analysis, such as about 500 bases, 1 kb, 2 kb, or greater.
  • The exon organization of the CFTR gene is presented in Table 3. Exon lengths greater than 150 bases are indicated in bold in Table 3.
  • A set of minimally overlapping amplicons is designed such that, when amplified by PCR from genomic DNA, the complete gene is available for sequence validation based on mass analysis.
  • Each amplicon will encode one or more introns and one or more exons.
  • Primers can be positioned in either introns or exons but will preferably be positioned in unique, non-repetitive sequence stretches within introns.
  • A schematic illustration of the method described in this example is provided in Figure 7a.
  • Figure 7b demonstrates the detectable changes in restriction enzyme fragment length caused by two mutations in exon 10 of the CFTR gene.
  • The CFTR exon 10 is amplified to generate a 280 basepair amplicon (SEQ ID NO: XXX).
  • The delta 508 mutation of CFTR exon 10 results in a change at nucleotides 184-186, and the delta 507 mutation of CFTR exon 10 results in a change at nucleotides 181-184.
  • The alterations in restriction enzyme fragment length can be observed when the CFTR exon 10 amplicon is digested with a single restriction enzyme or two restriction enzymes.
  • Digestion of the wild-type amplicon with BstNI generates a restriction enzyme fragment that is 122 bases in length from the 3'-most BstNI site to the 3' end of the amplicon (plus strand), while the corresponding restriction enzyme fragment resulting from digestion of either the delta 508 or delta 507 mutant amplicon with BstNI is 119 bases in length (plus strand), a 3 base decrease that can be detected by the mass spectrometric methods of the present invention.
  • Table 4 provides the approximate location of forward and reverse primers and the exons that are included within the analysis so as to generate a tiling set of approximately 2 kb amplicons.
  • Amplicons are generated by PCR using a high fidelity, thermostable DNA polymerase or fragments thereof (Klenow-like), e.g., Pfu DNA polymerase, which lack both non-templated nucleotide polymerization activity and 3' exonuclease activity.
  • Amplicons can be generated simultaneously as part of one or more multiplex PCR reactions. Alternatively, amplicons can be generated individually and then optionally mixed with other amplicons in a predetermined manner prior to DNA fragmentation.
  • The amplicons will be fragmented using one or more sequence specific DNA hydrolases, e.g., restriction enzymes, universal enzymes, etc., whose recognition site is small and therefore occurs frequently in double stranded DNA.
  • Amplicons are digested using one or more restriction enzymes to cleave the DNA such that the resulting fragments are less than, e.g., 100 bp in length.
  • The amplicons are singly digested or, alternatively, mixed in different combinations such that mix 1, comprised of two or more amplicons, is digested with a unique combination of restriction enzymes (REs), e.g., RE 1-3, and mix 2, also comprised of two or more amplicons, is digested with a combination of REs, e.g., RE 1, 3, and 4.
  • Additional amplicon mixes are assembled and digested appropriately to generate restriction enzyme fragments that can be unambiguously distinguished from other fragments within the digest by fragment mass determinations utilizing mass spectrometers (MS), preferably utilizing ESI-FTICR, that determine M/Z with high range, resolution, and accuracy, e.g., ≥200 bp, 30,000, and >0.01%, respectively.
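Whether the fragments of a proposed amplicon mix can be told apart by mass can be checked before the digest is run. The sketch below is one possible way to do this under stated assumptions: the `ambiguous_pairs` function and the fragment labels are hypothetical, the resolving power of 30,000 quoted above is applied directly to strand masses rather than to individual M/Z peaks (a simplification), and the entry labelled `E_hypothetical` is an invented mass added only to show a collision.

```python
from itertools import combinations


def ambiguous_pairs(fragments, resolving_power=30000.0):
    """Return label pairs whose mass gap is below mean_mass / resolving_power.

    fragments: dict mapping a fragment label to its strand mass in Da.
    """
    pairs = []
    for (label_a, mass_a), (label_b, mass_b) in combinations(sorted(fragments.items()), 2):
        smallest_resolvable_gap = ((mass_a + mass_b) / 2) / resolving_power
        if abs(mass_a - mass_b) < smallest_resolvable_gap:
            pairs.append((label_a, label_b))
    return pairs


mix = {
    "A_wt_plus": 24439.051,
    "B_mutant_plus": 23526.914,
    "C_plus": 12273.955,
    "D_minus": 11227.901,
    "E_hypothetical": 12274.200,  # invented mass, 0.245 Da away from C_plus
}

collisions = ambiguous_pairs(mix)
print(collisions)
```

A mix that returns an empty list under this test is a candidate for pooled analysis; a mix with collisions could be re-split or digested with a different enzyme combination.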
  • Example 3 Detection of Polymorphism in Entire Gene Regions.
  • The following example demonstrates the methods of the invention applied to detection of polymorphisms in the CFTR coding and splice regions using mass variation identification.
  • The present invention allows the detection of putative mutations, variants, or polymorphisms within a gene of interest, such as the CFTR gene, and can be focused on the exons and proximal intron regions encoding splice junctions.
  • Referring to Table 3, a set of non-overlapping amplicons is designed such that, when amplified by PCR from genomic DNA, the entirety of the exons and their respective proximal intron junctions are available for sequence validation and polymorphism detection based on mass analysis.
  • Each amplicon encodes a single exon and proximal segments of both upstream and downstream flanking introns.
  • The forward primer is positioned in the upstream intron and the reverse primer is positioned in the downstream intron relative to the exon to be amplified. All primers are preferably positioned in unique, non-repetitive sequence stretches within introns and anneal to their respective complementary strand at similar thermodynamic stability to enable amplification conditions to be uniform for all amplicons.
  • Table 5 provides the approximate location of forward and reverse primers for each amplicon, the exon that is included within the respective amplicon, and the size of the resulting amplicon.
  • Amplicons are generated by PCR using a high fidelity, thermostable DNA polymerase or fragments thereof (Klenow-like), e.g., Pfu DNA polymerase, which lack both non-templated nucleotide polymerization activity and 3' exonuclease activity. Multiple amplicons are generated simultaneously as part of one or more multiplex PCR reactions. Alternatively, amplicons are generated individually and then optionally mixed with other amplicons in a predetermined manner for DNA fragmentation.
  • Table 6 demonstrates the detectable changes in restriction enzyme fragment length caused by two mutations in exon 10 of the CFTR gene.
  • The CFTR exon 10 can be amplified to generate a 210 basepair amplicon.
  • The delta 508 mutation of CFTR exon 10 results in a 207 basepair amplicon, and the delta 507 mutation of CFTR exon 10 likewise results in a 207 basepair amplicon.
  • The alterations in restriction enzyme fragment length can be observed when the CFTR exon 10 amplicon is digested with a single restriction enzyme or two restriction enzymes. Masses differing between wild-type CFTR exon 10 and the delta 508 and delta 507 mutations are indicated in bold.
  • Digestion of the wild-type amplicon with BstNI generates a restriction enzyme fragment that is 79 bases in length from the 3'-most BstNI site to the 3' end of the amplicon (plus strand), with a monoisotopic mass of 24439.051 Da, while the corresponding restriction enzyme fragment resulting from digestion of either the delta 508 or delta 507 mutant amplicon with BstNI is 76 bases in length (plus strand), with a monoisotopic mass of 23526.914 Da, a 3 base decrease that results in a decrease in mass of 912.137 Da.
  • Table 6. Restriction digests of the 210 bp CFTR exon 10 amplicon. Each digest is tabulated under the columns: Termini; Strand; Strand Length (wt, Δ508, Δ507); Strand Mass, monoisotopic (wt, Δ508, Δ507).
  • BstNI (CC'WGG) cuts at 120 and 131 bp, generating fragments of 120, 11, and 79 bp.
  • MseI (T'TAA) cuts at 80 and 140 bp, generating fragments of 80, 60, and 70 bp.
  • NlaIV (GGN'NCC) cuts at 62 and 135 bp, generating fragments of 62, 73, and 75 bp.
  • Tsp509I ('AATT) cuts at 77 and 95 bp, generating fragments of 77, 18, and 115 bp.
  • BstNI (CC'WGG) and MseI (T'TAA) cut at 80, 120, 131, and 140 bp, generating fragments of 80, 40, 11, 9, and 70 bp.
  • BstNI (CC'WGG) and NlaIV (GGN'NCC) cut at 62, 120, 131, and 135 bp, generating fragments of 62, 58, 11, 4, and 75 bp.
  • BstNI (CC'WGG) and Tsp509I ('AATT) cut at 77, 95, 120, and 131 bp, generating fragments of 77, 18, 25, 11, and 79 bp.
  • MseI (T'TAA) and NlaIV (GGN'NCC) double digest.
  • MseI (T'TAA) and Tsp509I ('AATT) double digest.
  • NlaIV (GGN'NCC) and Tsp509I ('AATT) double digest.
  • Termini Tsp509I-NlaIV: plus strand, length 40 (wt, Δ508, Δ507), monoisotopic mass 12273.955 Da (wt, Δ508, Δ507); minus strand, length 36 (wt, Δ508, Δ507), monoisotopic mass 11227.901 Da (wt, Δ508, Δ507).
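The fragment lengths listed for the 210 bp exon 10 amplicon follow directly from the cut positions, and can be recomputed as a consistency check. In this sketch the function name and dictionary layout are illustrative choices; the cut positions, amplicon lengths, and masses are the values quoted in the text. The mutant calculation assumes that the 3 base deletion lies 3' of the last BstNI site, so the cut positions themselves do not move, which is consistent with the text's statement that only the 3'-terminal fragment shortens.

```python
def fragment_lengths(cut_sites, amplicon_length):
    """Plus-strand fragment lengths for cuts given as distances from the 5' end."""
    bounds = [0] + sorted(cut_sites) + [amplicon_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


WILD_TYPE_LENGTH = 210  # bp, wild-type exon 10 amplicon
MUTANT_LENGTH = 207     # bp, delta 508 or delta 507 amplicon

digests = {
    "BstNI": [120, 131],
    "MseI": [80, 140],
    "NlaIV": [62, 135],
    "Tsp509I": [77, 95],
    "BstNI+MseI": [80, 120, 131, 140],
    "BstNI+NlaIV": [62, 120, 131, 135],
    "BstNI+Tsp509I": [77, 95, 120, 131],
}

for name, cuts in digests.items():
    print(name, fragment_lengths(cuts, WILD_TYPE_LENGTH))

# Assumption: the deletion is 3' of the last BstNI site, so cut positions are unchanged.
mutant_bstni = fragment_lengths(digests["BstNI"], MUTANT_LENGTH)

# Mass shift of the 3'-terminal BstNI fragment (plus strand), values from the text.
mass_shift = 24439.051 - 23526.914
print(mutant_bstni, round(mass_shift, 3))
```

Every wild-type digest sums to 210 bp, and the mutant BstNI digest reproduces the 79 to 76 base change described above.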
  • CFTR amplicons whose size is within the resolving range of FT-ICR are analyzed for mass variation without fragmentation. These amplicons will be examined for mass variation either individually or as mixtures with other amplicons that are also within the resolving range of the FT-ICR.
  • Amplicons whose size is beyond the resolving range of FT-ICR are fragmented prior to analysis for mass variation, as described supra.
  • Amplicons are digested using one or more restriction enzymes to cleave the DNA such that the resulting fragments are less than, e.g., about 100 bp in length.
  • The amplicons are singly digested or, alternatively, mixed in different combinations such that mix 1, comprised of two or more amplicons, is digested with a combination of restriction enzymes, e.g., RE 1-3.
  • mix 2 also comprised of two or more amplicons, is digested with a combination of restriction enzymes, e.g.
  • Additional amplicon mixes are assembled and digested appropriately to generate RE fragments whose sizes are within the range of resolution by mass spectrometry and can be unambiguously distinguished from other fragments within the digest by fragment mass determinations utilizing mass spectrometers (MS), preferably utilizing ESI-FTICR.
  • Mass spectrometers such as these are able to determine M/Z with high range, resolution, and accuracy, e.g., ≥200 bp, 30,000, and >0.01%, respectively.
  • Genomic DNA may be obtained from the parents, siblings, and other first-degree relatives in addition to the test subject (the proband). Accordingly, amplification of the exons and splice regions of the CFTR gene is performed for each member in the family for which genomic DNA is available. Once amplified, each set of amplicons for individual family members is fragmented, analyzed by ESI-FTICR, and then compared to a reference set of amplicons derived from genomic DNA of known sequence or, alternatively, compared to a database containing masses of predicted amplicons.
  • Mass analyses that reveal differences between one or more amplicons (and resulting RE fragments) derived from test DNAs and the appropriate reference set of amplicons (and resulting RE fragments) will denote variant amplicons that encode a sequence different than that of the reference sequence. Furthermore, variant and invariant amplicons derived from the test subject (proband) should be consistent with Mendelian inheritance. Exceptions to this prediction may arise due to somatic mutations within the discordant amplicon. When mass variant amplicon mixes are identified, the mass analysis determination is repeated with individual amplicons that comprised the original amplicon mix to ascertain which amplicon or amplicons show mass variation. After identifying individual amplicons that fail to validate the reference sequence, those amplicons will be sequenced either completely or within intervals that will encompass restriction enzyme fragments of variant mass when compared to the standards predicted by the reference sequence.
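The family comparison described above can be sketched as a simple classification of the proband's fragment masses. This is an illustrative sketch only: the function names, the 100 ppm tolerance, and the family data are assumptions made up for the example (the masses reuse values quoted earlier, and the 9000.000 Da entry is invented). A variant mass that matches neither the reference nor any relative is reported as unexplained, which is where the text suggests a somatic mutation may be involved.

```python
def matches(mass, candidates, tolerance_ppm=100.0):
    """True if mass lies within tolerance_ppm of any candidate mass."""
    return any(abs(mass - c) / c * 1e6 <= tolerance_ppm for c in candidates)


def classify_variants(proband, reference, relatives, tolerance_ppm=100.0):
    """Split the proband's non-reference masses into (inherited, unexplained)."""
    inherited, unexplained = [], []
    for mass in proband:
        if matches(mass, reference, tolerance_ppm):
            continue  # this fragment validates the reference sequence
        if any(matches(mass, rel, tolerance_ppm) for rel in relatives):
            inherited.append(mass)
        else:
            unexplained.append(mass)
    return inherited, unexplained


# Hypothetical family data, for illustration only.
reference = [24439.051, 12273.955, 11227.901]
mother = [24439.051, 23526.914, 12273.955, 11227.901]  # carries the variant fragment
father = [24439.051, 12273.955, 11227.901]
proband = [24439.051, 23526.914, 12273.955, 11227.901, 9000.000]

inherited, unexplained = classify_variants(proband, reference, [mother, father])
print(inherited, unexplained)
```

Amplicons containing either kind of variant mass would still be sequenced as described; the classification only indicates which variants are consistent with Mendelian inheritance.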

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a system comprising methods for determining, without sequencing, the sequence of a biologically or non-biologically derived nucleic acid. The methods preferably compare the molecular masses of subsequences generated from the target sequence with molecular masses predicted by a database lookup step. The invention relates to computer-implemented methods for analyzing the experimental results and determining any subregion of the nucleic acid containing at least one variation.
PCT/US2003/003643 2002-02-06 2003-02-06 Method and apparatus for validating DNA sequences without sequencing WO2003066882A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003215083A AU2003215083A1 (en) 2002-02-06 2003-02-06 Method and apparatus for validating dna sequences without sequencing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35464002P 2002-02-06 2002-02-06
US60/354,640 2002-02-06

Publications (2)

Publication Number Publication Date
WO2003066882A2 true WO2003066882A2 (fr) 2003-08-14
WO2003066882A3 WO2003066882A3 (fr) 2004-01-22

Family

ID=27734402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/003643 WO2003066882A2 (fr) 2002-02-06 2003-02-06 Method and apparatus for validating DNA sequences without sequencing

Country Status (2)

Country Link
AU (1) AU2003215083A1 (fr)
WO (1) WO2003066882A2 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1740719A1 (fr) * 2004-04-09 2007-01-10 Trustees Of Boston University Method for de novo detection of sequences in nucleic acids: target sequencing by fragmentation
EP2145180A1 (fr) * 2007-04-13 2010-01-20 Sequenom, Inc. Comparative sequence analysis processes and systems
CN113658635A (zh) * 2021-08-24 2021-11-16 北京诺禾致源科技股份有限公司 Automatic determination method and device for nucleic acid detection results, and application thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6140053A (en) * 1996-11-06 2000-10-31 Sequenom, Inc. DNA sequencing by mass spectrometry via exonuclease degradation
US6277573B1 (en) * 1995-03-17 2001-08-21 Sequenom, Inc. DNA diagnostics based on mass spectrometry
US6303309B1 (en) * 1996-05-13 2001-10-16 Sequenom, Inc. Method for dissociating biotin complexes
US6566055B1 (en) * 1996-09-19 2003-05-20 Sequenom, Inc. Methods of preparing nucleic acids for mass spectrometric analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1740719A1 (fr) * 2004-04-09 2007-01-10 Trustees Of Boston University Method for de novo detection of sequences in nucleic acids: target sequencing by fragmentation
EP1740719A4 (fr) * 2004-04-09 2008-03-05 Univ Boston Method for de novo detection of sequences in nucleic acids: target sequencing by fragmentation
US7470517B2 (en) 2004-04-09 2008-12-30 Trustees Of Boston University Method for de novo detection of sequences in nucleic acids: target sequencing by fragmentation
AU2005233598B2 (en) * 2004-04-09 2010-09-30 Trustees Of Boston University Method for De novo detection of sequences in nucleic acids:target sequencing by fragmentation
US7807375B2 (en) 2004-04-09 2010-10-05 Trustees Of Boston University Method for de novo detection of sequences in nucleic acids: target sequencing by fragmentation
EP2145180A1 (fr) * 2007-04-13 2010-01-20 Sequenom, Inc. Comparative sequence analysis processes and systems
EP2145180A4 (fr) * 2007-04-13 2011-11-16 Sequenom Inc Comparative sequence analysis processes and systems
AU2008240143B2 (en) * 2007-04-13 2013-10-03 Agena Bioscience, Inc. Comparative sequence analysis processes and systems
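CN113658635A (zh) * 2021-08-24 2021-11-16 北京诺禾致源科技股份有限公司 Automatic determination method and device for nucleic acid detection results, and application thereof
CN113658635B (zh) * 2021-08-24 2023-09-29 北京诺禾致源科技股份有限公司 Automatic determination method and device for nucleic acid detection results, and application thereof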

Also Published As

Publication number Publication date
AU2003215083A8 (en) 2003-09-02
AU2003215083A1 (en) 2003-09-02
WO2003066882A3 (fr) 2004-01-22

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP