WO2008147879A1 - Automated method and device for dna isolation, sequence determination, and identification - Google Patents

Automated method and device for DNA isolation, sequence determination, and identification

Info

Publication number
WO2008147879A1
WO2008147879A1 PCT/US2008/064519 US2008064519W WO2008147879A1 WO 2008147879 A1 WO2008147879 A1 WO 2008147879A1 US 2008064519 W US2008064519 W US 2008064519W WO 2008147879 A1 WO2008147879 A1 WO 2008147879A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sample
biological sample
dna
present
Prior art date
Application number
PCT/US2008/064519
Other languages
French (fr)
Inventor
Ryan Golhar
Original Assignee
Ryan Golhar
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ryan Golhar
Publication of WO2008147879A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • the rapid identification of the nucleic acid sequences present in a complex biological sample has many practical applications. For example, the ability to rapidly identify the presence of pathogens in a biological sample, via their DNA or RNA signature, would be of enormous importance for the identification of hazardous bioagents or the diagnosis of disease in human patients.
  • pathogen identification requires specimen culturing or detection with pathogen-specific antibodies, both of which are not possible for all types of infections.
  • Molecular diagnostic methods involve detecting the hybridization of pathogen DNA or RNA present in the sample to known probes using DNA chips. Such methods are limited to the detection of known pathogens; thus, as pathogens mutate, the pathogenic DNA may no longer hybridize to existing probes and new probes must be developed.
  • Alternative methods of pathogen identification include nucleic acid sequencing of DNA or RNA present in the sample.
  • current sequencing methodologies for pathogen identification are based on Sanger DNA sequencing, which both requires amplification of the target nucleic acid and allows only a single nucleotide sequence to be identified from each sequencing reaction. Sanger sequencing is performed on a single known DNA fragment of interest. Thus, amplification and sequencing of the target nucleic acid implies a priori knowledge of the pathogen contained within the sample.
  • none of these current detection methods are capable of seamless, integrated operation.
  • the present invention provides novel methods, software and devices for the rapid identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample.
  • the present invention involves: a) isolating nucleic acid from a biological sample; b) sequencing the nucleic acid within the sample using single-molecule sequencing technology; and c) analyzing the derived nucleic acid sequences by comparison to reference sequence(s), for example, in a database.
  • the present invention has many uses in areas that would require a rapid and integrated molecular diagnostic identification system.
  • the present invention allows extremely rapid and accurate detection and identification of bioagents compared to existing methods. Furthermore, this rapid detection and identification is possible even when sample material is impure.
  • the invention is useful in a wide variety of fields, including, but not limited to, medical diagnosis and pharmacogenetic analysis (including: diagnosis of infectious diseases and conditions; cancer diagnosis based on mutations and polymorphisms; drug resistance and susceptibility testing; screening for and/or diagnosis of genetic diseases and conditions), germ warfare (allowing immediate identification of the bioagent and appropriate treatment), environmental testing (e.g., detection and discrimination of pathogenic vs. non-pathogenic bacteria in soil, water or other samples), agricultural testing (e.g., detection of livestock infection, produce contamination), veterinary testing, and forensics (e.g., rapid detection of bioagents for molecular fingerprinting).
  • the present invention can be used to detect and classify any bioagent containing nucleic acid (e.g., DNA), including bacteria, viruses, fungi and toxins.
  • the information obtained is used to determine practical information needed for countermeasures, including toxin genes, pathogenicity islands and antibiotic resistance genes.
  • the methods can be used to identify natural or deliberate engineering events including chromosome fragment swapping, molecular breeding (gene shuffling), DNA mutations (preventing DNA chip or primer hybridization) and emerging infectious diseases.
  • the invention has several advantages that include, but are not limited to, the following: providing integrated methods for the rapid identification of any nucleic acid sequence or nucleic acid-containing biological organism present in a complex biological sample, directly from the sample and without the need for amplification of the nucleic acid; providing software for identifying the source organism of any deduced nucleic acid sequence; and providing devices capable of performing the integrated processing of complex biological samples to determine the identity and predicted source of any nucleic acid present.
  • Figure 1 depicts an environment suitable for practicing an embodiment of the present invention.
  • Figure 2 depicts an alternative distributed environment suitable for practicing an embodiment of the present invention.
  • Figure 3 depicts a flowchart of a sequence of steps that may be followed by an embodiment of the present invention to predict bioagents present in a nucleic acid sequence isolated from a biological sample and subjected to a single molecule sequencing operation.
  • bioagent refers to any organism, living or dead, or a nucleic acid derived from such an organism.
  • examples of bioagents include, but are not limited to, cells (including but not limited to human clinical samples, bacterial cells and other pathogens), viruses, toxin genes and bioregulating compounds. Samples may be alive or dead or in a vegetative state (for example, vegetative bacteria or spores) and may be encapsulated or bioengineered.
  • sample refers to any form of matter capable of containing a bioagent.
  • samples include, but are not limited to, blood, animal tissue, sputum, urine, cell culture medium, water, leaf spot, soil, plant tissue, paleontology samples, forensic samples, food, and powders.
  • “nucleic acid” and “single-stranded nucleic acid” refer to RNA or RNA-containing molecules as well as DNA or DNA-containing molecules.
  • RNA refers to a polymer of ribonucleotides.
  • “DNA” or “DNA molecule” or “deoxyribonucleic acid molecule” refers to a polymer of deoxyribonucleotides.
  • DNA and RNA can be synthesized naturally (e.g., by DNA replication or transcription of DNA, respectively). RNA can be post-transcriptionally modified. DNA and RNA can also be chemically synthesized.
  • DNA and RNA can be single-stranded (i.e., ssDNA and ssRNA, respectively), or multi-stranded (e.g., double-stranded, i.e., dsDNA and dsRNA, respectively), i.e., duplexed or annealed.
  • nucleic acid sequence refers to the ordering of the individual nucleotides in a DNA or RNA polymer.
  • single-molecule sequencing refers to any method of determining the sequence of an individual nucleic acid molecule without the need for prior amplification.
  • compare, when used with respect to nucleic acid sequences, refers to the alignment of one or more nucleic acid sequences to establish a percentage identity or similarity (identity and similarity will be used interchangeably) using, for example, a mathematical algorithm. To determine the percent identity of two nucleic acid sequences (or of two amino acid sequences), the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the first sequence or second sequence for optimal alignment).
  • the nucleotides (or amino acid residues) at corresponding nucleotide (or amino acid) positions are then compared. When a position in the first sequence is occupied by the same residue as the corresponding position in the second sequence, then the molecules are identical at that position.
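The position-by-position comparison described above can be sketched as follows. This is a minimal illustration, assuming the two sequences have already been aligned to equal length with `-` marking gaps; the function name `percent_identity` and the choice to count gap columns in the denominator are assumptions made for the example, not details taken from the application.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumption: inputs are pre-aligned, equal-length strings using '-' for gaps.
    Gap columns count toward the alignment length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

For instance, `percent_identity("ACGT", "ACGA")` compares four columns, three of which are identical.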
  • the alignment can be generated over a certain portion of the sequence (i.e., a local alignment).
  • a non-limiting example of a local alignment algorithm utilized for the comparison of sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-68, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-77. Such an algorithm is incorporated into the BLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10.
  • the alignment can be optimized by introducing appropriate gaps and percentage identity determined over the length of the aligned sequence (i.e., a gapped alignment).
  • Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402.
  • the alignment is optimized by introducing appropriate gaps and percent identity is determined over the entire length of the sequences aligned (i.e., a global alignment).
  • a preferred, non-limiting example of a mathematical algorithm utilized for the global comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0) which is part of the GCG sequence alignment software package.
  • Another global alignment algorithm is that of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453. Various aspects of the invention are described in further detail in the following subsections.
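To make the idea of a gapped global alignment concrete, here is a minimal sketch of the textbook Needleman-Wunsch dynamic program with a linear gap penalty. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration; this is not the linear-space Myers and Miller variant used by ALIGN, and it is not presented as the application's own implementation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The aligned strings returned here have equal length, so percent identity can then be computed over the entire alignment, as described for the global case.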
  • the present invention provides novel methods, software algorithms and devices for the rapid identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample.
  • the present invention involves: a) isolating nucleic acid from a biological sample; b) sequencing the totality of nucleic acid within the sample using single-molecule sequencing technology; and c) analyzing the derived nucleic acid sequences by comparison to a database.
  • the invention provides methods for the identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample.
  • a sample suspected of containing a bioagent capable of causing a disease or disorder is obtained.
  • a blood sample is obtained from a human patient suspected of having contracted an infectious, bioagent-induced disease.
  • the total nucleic acid content of either sample is extracted from the sample by art-recognized means and subjected to a single-molecule sequencing reaction.
  • the resultant nucleic acid sequence data is then searched against reference sequences in databases using a software algorithm and the predicted source of the nucleic acid reported.
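The search-and-report step described above can be sketched in a few lines. This is only an illustration: shared k-mer counting is used here as a simple stand-in for an alignment-based search such as BLAST, and the function names, the `k` and `min_shared` parameters, and the `"unidentified"` label are all assumptions made for the example.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_sources(reads, references, k: int = 8, min_shared: int = 1) -> Counter:
    """Tally, per reference name, how many reads it best explains.

    reads:      iterable of sequence strings from the sequencing run
    references: dict mapping a source name to its reference sequence
    A read goes to the reference sharing the most k-mers with it; reads
    sharing fewer than min_shared k-mers with every reference are tallied
    as "unidentified".
    """
    ref_index = {name: kmers(seq.upper(), k) for name, seq in references.items()}
    tally = Counter()
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        best_name, best_shared = None, 0
        for name, ref_kmers in ref_index.items():
            shared = len(read_kmers & ref_kmers)
            if shared > best_shared:
                best_name, best_shared = name, shared
        if best_name is not None and best_shared >= min_shared:
            tally[best_name] += 1
        else:
            tally["unidentified"] += 1
    return tally
```

Calling `predict_sources(reads, references).most_common()` would give a ranked list of predicted sources for reporting.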
  • the invention provides a physical medium that holds computer-executable instructions for identifying bioagents present in a biological sample.
  • the medium holds instructions for receiving at least one result of a single molecule sequencing reaction conducted on nucleic acid in a biological sample.
  • the medium further holds computer-executable instructions for comparing the received nucleic acid sequence obtained from the single molecule sequencing reaction to one or more reference sequences contained in a database in order to predict at least one bioagent present in the biological sample.
  • the invention provides devices for the identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample.
  • a device is contacted with a sample and said device performs all the combined functions of the invention in an integrated manner, i.e., nucleic acid extraction, single-molecule sequencing, database searching and source organism reporting.
  • the invention provides a means to acquire patient-specific, as well as general, population-based data concerning the genetic basis of diseases and disorders.
  • the invention provides a means to acquire gene expression analysis data indicative of a change in physiological status of an organism. In another aspect, the invention provides a means to acquire epidemiological data.
  • the invention provides methods for performing pharmacogenomics. In another aspect, the invention provides a means for testing livestock animals for diseases such as foot-and-mouth disease and mad cow disease.
  • the present invention provides methods and devices for the identification of nucleic acid molecules contained within a biological sample.
  • exemplary samples include, but are not limited to, blood, animal tissue, sputum, urine, cell culture medium, water, leaf spot, soil, plant tissue, paleontology samples, forensic samples, food or any form of matter capable of containing bioagents or nucleic acid.
  • Several independent sources of nucleic acid may exist in the sample.
  • in human blood, human DNA and RNA will be present in white blood cells, in addition to the nucleic acid present in any infectious bioagents that may be present.
  • bioagent is any organism, living or dead, or a nucleic acid derived from such an organism.
  • examples of bioagents include, but are not limited to, cells (including but not limited to human clinical samples, bacterial cells and other pathogens), viruses, toxin genes and bioregulating compounds. Samples may be alive or dead or in a vegetative state (for example, vegetative bacteria or spores) and may be encapsulated or bioengineered.
  • Bacterial biological warfare bioagents capable of being detected by the present methods include, but are not limited to, Bacillus anthracis (anthrax), Yersinia pestis (pneumonic plague), Francisella tularensis (tularemia), Brucella suis, Brucella abortus, Brucella melitensis (undulant fever), Burkholderia mallei (glanders), Burkholderia pseudomallei (melioidosis), Salmonella typhi (typhoid fever), Rickettsia typhi (endemic typhus), Rickettsia prowazekii (epidemic typhus), Coxiella burnetii (Q fever), Rhodobacter capsulatus, Chlamydia pneumoniae, Escherichia coli, Shigella dysenteriae, Shigella flexneri, Bacillus cereus, Clostridium botulinum, Pseudomonas aeruginosa, Legionella pneumophila, Borrelia burgdorferi (Lyme disease), and Vibrio cholerae.
  • Biological warfare fungus bioagents include, but are not limited to, Coccidioides immitis (coccidioidomycosis).
  • Biological warfare toxin genes capable of being detected by the methods of the present invention include, but are not limited to, those for botulinum toxin, T-2 mycotoxins, ricin, staphylococcal enterotoxin B, Shiga toxin, abrin, aflatoxin, Clostridium perfringens epsilon toxin, conotoxins, diacetoxyscirpenol, tetrodotoxin, and saxitoxin.
  • Biological warfare viral bioagents are mostly RNA viruses (positive-strand and negative-strand), with the exception of smallpox.
  • Every RNA virus is a family of related viruses (quasispecies). These viruses mutate rapidly and the potential for engineered strains (natural or deliberate) is very high. RNA viruses cluster into families that have conserved RNA structural domains on the viral genome (e.g., virion components, accessory proteins) and conserved housekeeping genes that encode core viral proteins including, for single-stranded positive-strand RNA viruses, RNA-dependent RNA polymerase, double-stranded RNA helicase, chymotrypsin-like and papain-like proteases and methyltransferases.
  • Examples of (-)-strand RNA viruses include arenaviruses (e.g., Sabia virus, Lassa fever, Machupo, Argentine hemorrhagic fever, Flexal virus), bunyaviruses (e.g., hantavirus, nairovirus, phlebovirus, Hantaan virus, Crimean-Congo hemorrhagic fever, Rift Valley fever), and mononegavirales (e.g., filovirus, paramyxovirus, Ebola virus, Marburg, equine morbillivirus).
  • (+)-strand RNA viruses include picornaviruses (e.g., coxsackievirus, echovirus, human coxsackievirus A, human echovirus, human enterovirus, human poliovirus, hepatitis A virus, human parechovirus, human rhinovirus), astroviruses (e.g., human astrovirus), caliciviruses (e.g., Chiba virus, Chitta virus, human calicivirus, Norwalk virus), nidovirales (e.g., human coronavirus, human torovirus), flaviviruses (e.g., dengue viruses, Japanese encephalitis virus, Kyasanur forest disease virus, Murray Valley encephalitis virus, Rocio virus, St.
  • togaviruses (e.g., Chikungunya virus, Eastern equine encephalitis virus, Mayaro virus, O'nyong-nyong virus, Ross River virus, Venezuelan equine encephalitis virus, Rubella virus, hepatitis E virus).
  • the present invention can employ at least partial purification of target nucleic acid molecules. All art-recognized methods of nucleic acid extraction and purification are contemplated. Exemplary methods include those commercialized by QIAGEN or PROMEGA. Nucleic acid purification on nanoengineered surfaces (as exemplified in U.S. patent application US20060166223) is also contemplated. In cases where biological samples are desiccated, the sample will, where necessary, be solubilized using appropriate art-recognized solvents to facilitate nucleic acid extraction.
  • 5. Single Molecule Sequencing
  • the present invention involves nucleic acid sequencing at the single molecule level.
  • Several art-recognized methods of single-molecule sequencing have been developed (see U.S. patent application US2006000400730 and U.S. patents 7,169,560; 6,221,592; 6,905,586; 6,524,829; 6,242,193; 6,221,592; and 6,136,543).
  • Single molecule sequencing is a powerful tool capable of elucidating sequence-specific information on a single nucleic acid template.
  • the ability to conduct single template sequencing allows the identification of subtle, often rare, changes in nucleic acids that are important as the underlying basis for diseases such as cancer and others.
  • Single molecule sequencing also provides the ability to rapidly analyze a multitude of single nucleic acid templates, from a single sample, in parallel and with a high degree of precision.
  • individual labeled nucleotides are added sequentially by a polymerase to a growing complement strand. A label is detected as each nucleotide is added to the strand and the template sequence is determined.
  • the invention comprises exposing a nucleic acid primer to a template sequence in the presence of a polymerase and at least one labeled nucleotide base that is capable of hybridizing with a template nucleic acid downstream of the hybridized primer.
  • Nucleotide bases may be selected from the common Watson-Crick bases, adenine, thymine, cytosine, guanine, and uracil, or may be modifications of those bases, such as peptide nucleic acids, ribonucleotides, or nucleotides modified to incorporate a detectable label (e.g., with linkers or adapters). As each nucleotide is added to the growing complement strand, its label is detected and its position on the template is noted.
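The label-by-label read-out described above yields the bases of the growing complement strand in their order of incorporation; the template sequence follows by base pairing. The sketch below shows that final step under stated assumptions: a DNA template, the four standard DNA bases only, and incorporation events already decoded into a string of base letters. The names `COMPLEMENT` and `template_from_incorporations` are hypothetical.

```python
# Watson-Crick pairing for DNA (assumption: no modified bases, no RNA template).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def template_from_incorporations(incorporated: str) -> str:
    """Infer the template strand (written 5'->3') from the bases detected as
    they were added, in order, to the growing complement strand (5'->3').

    Because the two strands are antiparallel, the template is the reverse
    complement of the incorporated sequence.
    """
    return "".join(COMPLEMENT[base] for base in reversed(incorporated))
```

For example, if the detected labels correspond to A, T, G, C in that order, the inferred template read 5'->3' is `GCAT`.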
  • nucleic acid templates include DNA, RNA and RNA/DNA hybrids.
  • the invention comprises passing a single-stranded nucleic acid through a nano-pore.
  • as the ssDNA travels through the nano-pore, it passes over four nano-probes, each containing one of the four nucleotide bases. Each time a probe hybridizes with the ssDNA, the signal is detected and the template sequence is determined.
  • the present invention provides devices for the identification of nucleic acid molecules and nucleic acid-containing bioagents contained within a biological sample.
  • the device contains an integrated means of nucleic acid purification, single molecule sequencing, and sequence analysis.
  • the device is portable, preferably handheld.
  • the device may also include a microfabricated biopsy instrument as exemplified in U.S. patent application 2003/0119176A1.
  • the device connects wirelessly to a computer.
  • the device is part of a remotely controlled vehicle.
  • the device is capable of being operated by remote control.
  • the device is disposable.
  • the device is biodegradable.
  • the device is designed and/or packaged for home use, hospital use, or police/military use.
  • the present invention allows for the acquisition of patient-specific, as well as general, population-based data concerning the genetic basis of diseases and disorders.
  • Cancer is an example of a disease or disorder that has a strong genetic basis.
  • Complete sequencing of large numbers of tumors using single molecule sequencing provides a catalog of somatic cell mutations (including, without limitation, deletions, additions, amplifications, rearrangements, substitutions, losses, translocations, methylation, and other alterations of genomic DNA) that are useful to diagnose, evaluate, prognose, and treat patients.
  • a catalog of disease-related mutations and other alterations is a powerful diagnostic tool useful to rapidly categorize samples sequenced from future patients.
  • single molecule sequencing allows one to identify previously-unknown mutations that may be associated with cancer.
  • single molecule sequencing on pooled samples allows rapid identification of deletions, amplifications, and other changes that are indicative of cancer, even if the specific mutational change is not known.
  • tumor DNA is obtained and prepared using standard methods. Approximately 10-fold coverage of each genomic region is sequenced. Using single molecule sequencing, the genome of the cancer tissue is rapidly sequenced. Mutations, insertions, deletions, rearrangements, and other alterations present in the tumor DNA are detected. Sequence assembly is accomplished using standard alignment techniques, such as BLAST (www.ncibi.nlm.nih.gov), incorporated by reference herein. Tumor sequences are compared to known sequences for either normal or cancer tissue or to consensus sequences in order to identify changes associated with cancer. Newly discovered genomic changes (i.e., those not previously associated with cancer) are cataloged and become known to be associated with a particular disease over time. Thus, patients are rapidly and accurately diagnosed based upon their individual genomic complement, either before or at the time of symptomatic presentation of a disease.
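As a rough illustration of what roughly tenfold coverage implies, the number of reads required can be estimated from the standard relation coverage = (number of reads × read length) / genome length. The sketch below assumes a uniform read length; the function name and the example genome size and read length are illustrative assumptions, not values taken from the application.

```python
import math


def reads_for_coverage(coverage: float, genome_length: int, read_length: int) -> int:
    """Minimum number of reads of a fixed length needed to reach the given
    average coverage: ceil(coverage * genome_length / read_length)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * genome_length / read_length)


# Hypothetical example: 10-fold coverage of a 3,000,000,000-base genome
# with 1,000-base reads.
n_reads = reads_for_coverage(10, 3_000_000_000, 1_000)
```

The estimate is an average; actual per-region coverage varies, so a real protocol would typically target a margin above the nominal figure.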
  • DNA is isolated from a patient's tumor or other diseased sample and is compared to normal DNA from the same patient.
  • Whole genome sequencing of both the tumor and normal DNA may be done rapidly on a parallel basis using single molecule sequencing as described above.
  • Genome portions of interest include, for example, sequences associated with a known or candidate tumor suppressor gene or oncogene, or intronic sequences containing repeats that are susceptible to amplification by defective cellular machinery.
  • following sequence determination, a comparison is made between tumor and normal sequence. Differences between the tumor and normal sequences are identified as tumor-related mutations. In effect, any difference between the two is likely indicative of disease because all somatic cells should have the same sequence.
  • patient tumor sequence may be compared to a normal banked or consensus sequence instead of the patient's own normal DNA.
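The tumor-versus-normal comparison described above can be sketched as a column-by-column scan of an existing pairwise alignment. This is a minimal illustration, assuming both sequences are already aligned to equal length with `-` marking gaps; the function name and the tuple layout `(position, kind, normal_base, tumor_base)` are assumptions made for the example.

```python
def call_differences(normal_aln: str, tumor_aln: str):
    """List differences between an aligned normal and tumor sequence.

    Returns tuples (position, kind, normal_base, tumor_base), where position
    is the 1-based coordinate in the normal sequence of the most recently
    consumed normal base (an insertion is reported at the base it follows).
    """
    if len(normal_aln) != len(tumor_aln):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    pos = 0
    for n, t in zip(normal_aln, tumor_aln):
        if n != "-":
            pos += 1  # advance only when the normal sequence contributes a base
        if n == t:
            continue
        if n == "-":
            differences.append((pos, "insertion", "", t))
        elif t == "-":
            differences.append((pos, "deletion", n, ""))
        else:
            differences.append((pos, "substitution", n, t))
    return differences
```

The same function applies whether the normal sequence comes from the patient's own DNA or from a banked or consensus sequence.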
  • broad-based disease susceptibility testing is performed using single molecule sequencing on pooled genomic samples.
  • the number of positive samples i.e., those with a mutation present
  • Bulk sequencing likely would not detect mutations in pooled samples.
  • any positive sample is detected with digital precision.
  • genomic samples from a predetermined number of patients are collected, pooled and sequenced using single molecule sequencing techniques as described above.
  • Single molecule sequencing is done through large tracts of the genome, and mutations derived from any source are detected in the pooled sample.
  • Deviations are detected using single molecule sequencing with fewer cells than in bulk sequencing because individual DNA molecules are sequenced instead of an amalgam of cells that typically provide the basis for bulk sequencing assays as, for example, in assays for loss of heterozygosity.
  • data from a pooled experiment is useful for determining the frequency and distribution of mutations in a given population, without identifying the owners of specific mutations.
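Because each single-molecule read derives from one DNA molecule, the frequency of a mutation in a pooled sample can be estimated by simple counting, without recording which patient contributed which read. The sketch below illustrates this for a single genomic position; the function name and the input format (one observed base per read covering the position) are assumptions made for the example.

```python
from collections import Counter


def allele_frequencies(bases_at_position):
    """Fraction of reads carrying each base at one genomic position.

    bases_at_position: iterable with one base call per single-molecule read
    that covers the position. Returns an empty dict if there are no reads.
    """
    counts = Counter(bases_at_position)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in counts.items()}
```

For example, nine reads showing `A` and one showing `G` give frequencies of 0.9 and 0.1, so the minority variant is counted rather than averaged away.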
  • the rapid results provided by the invention also allow sequencing to detect familial mutations. For example, if it is determined that a patient has a mutation indicative of a cancer, certain forms of which have a strong familial link (e.g., breast cancer, colon cancer), primary siblings typically are not tested unless specified criteria are met. Single molecule sequencing not only identifies the underlying mutation in the primary patient, but allows rapid, cost-effective sequencing of relatives who also might carry the mutation.
  • Tumor typing may involve determining a genetic profile for a particular patient's tumor in order to guide treatment or other decisions.
  • the standard treatment for patients with colon cancer is the drug 5-Fluorouracil (5FU).
  • while 5FU works to reduce tumors in many colon cancer patients, it actually accelerates tumor growth in a class of patients who have Hereditary Non-Polyposis Colorectal Cancer (HNPCC).
  • HNPCC is a familial form of colon cancer with a distinct genetic profile that is ascertainable by sequencing cellular DNA.
  • it is particularly important to know a colon cancer patient's genetic profile in order to determine the most effective treatment for that patient.
  • Single molecule sequencing is useful to make that determination because it is rapid, reliable, and effectively digital, and therefore promptly indicates the presence or absence of the relevant genetic event(s).
  • Methods of the invention make possible the rapid and accurate identification of tumor-related mutations, so that an appropriate treatment may be selected or an inappropriate treatment avoided.
  • the invention provides gene expression analysis data.
  • Alteration in expression constructs is often indicative of a change in physiological status. Changes in expression patterns reflect cellular activities as well as disease state. Expression sequence analysis provides insight into the specialized activities of cells from different organs or of different types. Thus, expression analysis reveals aspects of the immune repertoire that are not apparent on a gross level. In one embodiment of the invention, a sequence determination is made with respect to the total antibody repertoire expressed by B-cells. In another embodiment of the invention, a sequence determination is made with respect to the T-cell receptor repertoire expressed by T-cells. Single molecule sequencing offers rapid, high-throughput sequencing that reveals specific detail as to which immune cells are active, and the likely epitopes against which they function.
  • Single molecule sequencing also provides an immune fingerprint that is used to identify an infection based upon the specifics of a patient's immune response.
  • the immune fingerprint generated using single molecule sequencing is compared to a database of collected immune sequence data in order to identify an infection. New infections are tracked through the appearance of new sequence specificities either alone or in combination with other diagnostic techniques. Isolation of immune cells is well-known in the art, and application of the present invention to sequencing a patient's immune cell complement is contemplated by the present invention.
  • the invention also provides epidemiological data.
  • an appropriate patient sample is obtained and DNA in the sample is sequenced.
  • the patient's genomic DNA is excluded.
  • a catalog is compiled comprising a fingerprint of the DNA (or RNA in other preferred embodiments) present in samples obtained from a multiplicity of patients.
  • Each patient's disease status then is correlated with specific sequence information obtained from the patient's sample. In this way, diagnostic accuracy and verifiability are improved, as a patient's disease status is confirmed by comparing the patient's DNA to sequences in the database.
  • whole genome sequencing is optional. In some circumstances, it is necessary only to sequence sufficient nucleic acid to establish a fingerprint for comparison with future samples. In one embodiment, ubiquitous epidemiology is performed.
  • patient DNA is routinely sequenced and stored for disease identification and comparison with future samples to identify and track new disease outbreaks.
  • a patient who presents with a new DNA profile i.e., containing a sequence that is not in the database
  • Future patients presenting with the same nucleic acid profile are tracked.
  • potential epidemic outbreaks are controlled.
  • no a priori assumptions are necessary. A novel sequence will immediately be identified as such, and appropriate monitoring can be put in place.
  • the invention provides methods and devices for performing pharmacogenomics. Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic failure by altering the relation between dose and blood concentration of the pharmacologically active drug. Thus, a physician or clinician may consider applying knowledge obtained in relevant pharmacogenomics studies in determining whether to administer a therapeutic agent as well as tailoring the dosage and/or therapeutic regimen of treatment with a therapeutic agent.
  • Pharmacogenomics deals with clinically significant hereditary variations in the response to drugs due to altered drug disposition and abnormal action in affected persons. See, for example, Eichelbaum, M. et al. (1996) Clin. Exp. Pharmacol. Physiol. 23(10-11): 983-985 and Linder, M.W. et al. (1997) Clin. Chem. 43(2):254-266
  • the methods of the invention provide information regarding patient genome sequence which is used to select patients or patient subpopulations for treatment with FDA-approved therapies e.g., antibody, small molecule or peptide therapies.
  • FIGS. 1-3 illustrate aspects of the hardware and software environment utilized by the present invention to perform the bioagent prediction.
  • FIG. 1 depicts an environment suitable for practicing an embodiment of the present invention.
  • a computing device 102 holds a database 104 or other storage structure containing reference sequences 105 and an analysis facility 106.
  • the computing device 102 may be a server, workstation, laptop, personal computer, PDA or other computing device equipped with one or more processors and able to execute the analysis facility 106 discussed herein.
  • the analysis facility 106 is preferably implemented in software although, in an alternate implementation, the logic may also be implemented in hardware.
  • the analysis facility 106 operates on and analyzes results of single molecule sequencing reactions 122 that are received from a biological sample acquisition apparatus 120.
  • the biological sample acquisition apparatus conducts single molecule sequencing operations on nucleic acid isolated from a biological sample.
  • the biological sample acquisition apparatus 120 is a handheld device in wireless communication with the computing device 102.
  • the analysis facility 106 programmatically compares the results of the single molecule sequencing reaction 122 to the reference sequences 105 contained in the database 104 in order to generate a listing of predicted bioagents 144 that are present in the biological sample under consideration.
  • the comparison of the results of the single molecule sequencing operation 122 to the reference sequences 105 in order to predict bioagents present in a biological sample is performed programmatically without any user input.
  • the analysis facility 106 prompts a user for parameters controlling the comparison via the user interface 142.
  • the listing of the predicted bioagents 144 may be displayed to a user via a user interface 142 displayed on a display device 140 that is in communication with the computing device 102. It will be appreciated that the listing of predicted bioagents 144 may also be stored for later use and/or display to a user.
  • the user interface 142 may also be utilized to enable a user to configure the parameters of the comparison operation performed by the analysis facility 106. Those skilled in the art will recognize that many other configurations are also possible within the scope of the present invention.
  • Figure 2 depicts an alternative distributed environment 200 suitable for practicing an embodiment of the present invention.
  • a first computing device 202 may be used to execute an analysis facility 204.
  • the first computing device 202 may communicate over a network 250 with a second computing device 210 holding reference sequences 212.
  • the network 250 may be the Internet, a local area network (LAN), a wide area network (WAN), an intranet, an internet, a wireless network or some other type of network over which the first computing device 202 and the second computing device 210 can communicate.
  • the analysis facility 204 on the first computing device 202 may communicate over the network 250 with a biological sample acquisition apparatus 230 that generates results data 232 from a single molecule sequencing reaction performed on nucleic acid isolated from a biological sample.
  • the analysis facility 204 may store a listing of predicted bioagents that is generated by a comparison of the results of the single molecule sequencing reaction and the reference sequences 212. The storage may occur on the first computing device 202 or at a location remote from the first computing device that is accessible over the network 250.
  • Figure 3 is a flowchart of a sequence of steps that may be followed by an embodiment of the present invention to predict bioagents present in a biological sample.
  • the sequence begins by providing a biological sample (step 302).
  • the sample may be a previously acquired sample or may be a sample that is obtained immediately in advance of the bioagent prediction process that is discussed herein being performed.
  • Nucleic acid is then isolated from the biological sample (step 304) and a single molecule sequencing reaction is conducted on the isolated nucleic acid (step 306) as discussed above.
  • the results of the single molecule sequencing reaction are compared to reference sequences (step 308) and a listing of predicted bioagents that are present in the biological sample is generated.
  • the listing of predicted bioagents may then be displayed to a user or stored for later retrieval (step 310).
  • Embodiments of the present invention may be provided as one or more computer-readable programs embodied on or in one or more mediums.
  • the mediums may be a floppy disk, a hard disk, a compact disc, a digital versatile disc, a flash memory card, a PROM, an MRAM, a RAM, a ROM, or a magnetic tape.
  • the computer-readable programs may be implemented in any programming language. Some examples of languages that can be used include FORTRAN, C, C++, C#, Python, Perl or Java.
  • the software programs may be stored on or in one or more mediums as object code.
  • Hardware acceleration may be used and all or a portion of the code may run on a FPGA, an Application Specific Integrated Processor (ASIP), or an Application Specific Integrated Circuit (ASIC).
  • the code may run in a virtualized environment such as in a virtual machine. Multiple virtual machines running the code may be resident on a single processor.
  • the methods and devices disclosed herein can be used to screen fetal mRNA or DNA, present in maternal blood, for disease-associated mutations as an alternative to amniocentesis.
  • the invention provides nucleic acid sequence information for making diagnostic kits, or chips.
  • the nucleic acid sequence information or methodology disclosed herein can be used for forensic applications.
  • the methods and devices disclosed herein can be used for research purposes, for example genetic research on the distribution or migration of human populations.
  • the methods and devices disclosed herein can be used in paleontology, for example to identify and catalogue nucleic acid sequences contained in ancient biological samples.
  • the methods and devices disclosed herein can be used for environmental analysis to determine the bioagent profile of a particular ecosystem.
  • the methods and devices disclosed herein can be used in agriculture.
  • the methods and devices of the invention are used to determine the bioagent profile of soil.
  • the methods and devices of the invention are used to determine the nucleic acid sequences present in plant samples and thereby assess whether they have been infected with disease-causing bioagents or have been modified by genetic engineering.
  • the methods and devices disclosed herein are used to determine genetic fingerprinting information about a subject and thereby uniquely identify them.
  • the invention provides business methods for commercializing nucleic acid sequences suitable for use in, for example, the making of devices, diagnostic chips, kits, networks, and pharmaceuticals for diagnosing and treating disease.
  • the practice of the present invention can employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology, cell culture, and any necessary computer or electronic related technology that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D.N. Glover, Ed. 1985); Oligonucleotide Synthesis (M.J. Gait, Ed. 1984); PCR Handbook Current Protocols in Nucleic Acid Chemistry, Beaucage, Ed.
  • the following example describes a novel method for determining the presence of an unknown bioagent in a patient sample.
  • a patient sample is obtained. If necessary several types of sample encompassing all potential areas of infection may be obtained and processed together.
  • Nucleic acid is extracted from the sample(s) using art-recognized means. Single-nucleotide sequencing of the nucleic acid is then performed to determine the sequences of nucleic acids present in the sample. The deduced nucleic acid sequence data is compared against databases of known nucleic acid sequences, for example, using a mathematical algorithm, and the percentage identity with known sequences is reported.
  • a DEVICE FOR DETECTING A BIOAGENT IN A PATIENT SUSPECTED OF HAVING CONTRACTED AN INFECTIOUS BIOAGENT-INDUCED DISEASE The following example describes a novel device for determining the presence of an unknown bioagent in a patient sample.
  • a patient sample is obtained. If necessary several types of sample encompassing all potential areas of infection may be obtained and processed together.
  • the sample is contacted with a device which proceeds to perform the following steps in an integrated operation: a) isolate nucleic acid from the sample in sufficient purity to perform single-nucleotide sequencing; b) perform single-nucleotide sequencing to determine the sequences of all nucleic acids isolated from the sample; c) compare the deduced nucleic acid sequences against databases of known nucleic acid sequences using, for example, a mathematical algorithm to determine the percentage identity of the deduced nucleic acid with known sequences; d) report the best sequence matches for all known (infectious) bioagents.
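By way of non-limiting illustration, steps (c) and (d) of the integrated operation described above may be sketched as follows. The sketch assumes that steps (a) and (b) have already produced a list of read sequences; the function names, the toy reference sequences, and the use of a simple ungapped sliding-window comparison are assumptions made for illustration only and are not the algorithm of any particular embodiment.

```python
def best_ungapped_identity(read, reference):
    """Slide the read along the reference and return the highest
    percent identity found over any ungapped window."""
    read, reference = read.upper(), reference.upper()
    n = len(read)
    if n == 0 or n > len(reference):
        return 0.0
    best = 0
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches)
    return 100.0 * best / n


def report_matches(reads, reference_db, top_n=1):
    """Step (c): score each read against every reference sequence.
    Step (d): keep the top_n best-scoring references per read."""
    report = {}
    for read in reads:
        scored = sorted(
            ((best_ungapped_identity(read, seq), name)
             for name, seq in reference_db.items()),
            reverse=True,
        )
        report[read] = [(name, round(pct, 1)) for pct, name in scored[:top_n]]
    return report


# Hypothetical toy database and reads (not real bioagent sequences).
toy_db = {
    "agent_A": "ATGGCGTACGTTAGCCTAGGCTA",
    "agent_B": "TTGACCGGTAAACCGGTTTACGA",
}
toy_reads = ["GTACGTTAGC", "CCGGTAAACG"]
print(report_matches(toy_reads, toy_db))
```

A practical implementation would more likely rely on an indexed, gapped search such as BLAST; the exhaustive scan shown here is chosen only because it makes the percentage identity computation explicit.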

Abstract

A processing technique, associated method, product description, and related software are disclosed for achieving rapid identification of DNA from a single or multiple organisms contained within an organic sample such as blood, tissue, sputum, urine, cell culture, water, leaf spot, or any other form suitable for containing DNA. A biological sample is provided in a container from which the DNA contained within the sample is isolated and purified. The purified DNA is then sequenced using a single-molecule DNA sequencing technique. The resulting DNA sequences are identified by comparing the sequences to a DNA database. The resulting database matches are then reported.

Description

AUTOMATED METHOD AND DEVICE FOR DNA ISOLATION, SEQUENCE DETERMINATION, AND IDENTIFICATION
Related Information
This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/931,285, filed May 22, 2007. The contents of this application and of any patents, patent applications, and references cited throughout this specification are hereby incorporated by reference in their entireties.
Background of the Invention
The rapid identification of the nucleic acid sequences present in a complex biological sample has many practical applications. For example, the ability to rapidly identify the presence of pathogens in a biological sample, via their DNA or RNA signature, would be of enormous importance for the identification of hazardous bioagents or the diagnosis of disease in human patients.
The majority of current methods for pathogen identification require specimen culturing or detection with pathogen-specific antibodies, neither of which is possible for all types of infections. Molecular diagnostic methods involve detecting the hybridization of pathogen DNA or RNA present in the sample to known probes using DNA chips. Such methods are limited to the detection of known pathogens; thus, as pathogens mutate, the pathogenic DNA may no longer hybridize to existing probes and new probes must be developed. Alternative methods of pathogen identification include nucleic acid sequencing of DNA or RNA present in the sample. However, current sequencing methodologies for pathogen identification are based on Sanger DNA sequencing, which requires amplification of the target nucleic acid and allows only a single nucleotide sequence to be identified from each sequencing reaction. Sanger sequencing is performed on a single known DNA fragment of interest. Thus, amplification and sequencing of the target nucleic acid implies a priori knowledge of the pathogen contained within the sample. Moreover, none of these current detection methods is capable of seamless, integrated operation.
There is therefore a need in the art for alternative methods and devices for the rapid identification of nucleic acid sequences present in biological samples.
Summary of the Invention
The present invention provides novel methods, software and devices for the rapid identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample. The present invention involves: a) isolating nucleic acid from a biological sample; b) sequencing the nucleic acid within the sample using single-molecule sequencing technology; and c) analyzing the derived nucleic acid sequences by comparison to reference sequence(s), for example, in a database.
The present invention has many uses in areas that would require a rapid and integrated molecular diagnostic identification system. The present invention allows extremely rapid and accurate detection and identification of bioagents compared to existing methods. Furthermore, this rapid detection and identification is possible even when sample material is impure. Thus, the invention is useful in a wide variety of fields, including, but not limited to, medical diagnosis and pharmacogenetic analysis (including: diagnosis of infectious diseases and conditions; cancer diagnosis based on mutations and polymorphisms; drug resistance and susceptibility testing; screening for and/or diagnosis of genetic diseases and conditions), germ warfare (allowing immediate identification of the bioagent and appropriate treatment), environmental testing (e.g., detection and discrimination of pathogenic vs. non-pathogenic bacteria in soil, water or other samples), agricultural testing (e.g., detection of livestock infection, produce contamination), veterinary testing, and forensics (e.g., rapid detection of bioagents for molecular fingerprinting).
The present invention can be used to detect and classify any bioagent containing nucleic acid (e.g., DNA), including bacteria, viruses, fungi and toxins. As one example, where the bioagent is a biological threat, the information obtained is used to determine practical information needed for countermeasures, including toxin genes, pathogenicity islands and antibiotic resistance genes. In addition, the methods can be used to identify natural or deliberate engineering events including chromosome fragment swapping, molecular breeding (gene shuffling), DNA mutations (preventing DNA chip or primer hybridization) and emerging infectious diseases. Accordingly, the invention has several advantages that include, but are not limited to, the following: providing integrated methods for the rapid identification of any nucleic acid sequence or nucleic acid-containing biological organisms present in a complex biological sample directly from the sample without the need for amplification of the nucleic acid; providing software for identifying the source organism of any deduced nucleic acid sequence; and providing devices capable of performing the integrated processing of complex biological samples to determine the identity and predicted source of any nucleic acid present.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
Brief Description of the Figures
Figure 1. Depicts an environment suitable for practicing an embodiment of the present invention;
Figure 2. Depicts an alternative distributed environment suitable for practicing an embodiment of the present invention;
Figure 3. Depicts a flowchart of a sequence of steps that may be followed by an embodiment of the present invention to predict bioagents present in a nucleic acid sequence isolated from a biological sample and subjected to a single molecule sequencing operation.
Detailed Description of the Invention
In order to provide a clear understanding of the specification and claims, the following definitions are provided below.
Definitions
So that the invention may be more readily understood, certain terms are first defined.
The term "bioagent" refers to any organism, living or dead, or a nucleic acid derived from such an organism. Examples of bioagents include, but are not limited to, cells (including but not limited to human clinical samples, bacterial cells and other pathogens), viruses, toxin genes and bioregulating compounds. Samples may be alive or dead or in a vegetative state (for example, vegetative bacteria or spores) and may be encapsulated or bioengineered.
The term "sample" refers to any form of matter capable of containing a bioagent. Examples of samples include, but are not limited to, blood, animal tissue, sputum, urine, cell culture medium, water, leaf spot, soil, plant tissue, paleontology samples, forensic samples, food, and powders.
The terms "nucleic acid" and "single-stranded nucleic acid" refer to RNA or RNA-containing molecules as well as DNA or DNA-containing molecules. The term "RNA" refers to a polymer of ribonucleotides. The term "DNA" or "DNA molecule" or "deoxyribonucleic acid molecule" refers to a polymer of deoxyribonucleotides. DNA and RNA can be synthesized naturally (e.g., by DNA replication or transcription of DNA, respectively). RNA can be post-transcriptionally modified. DNA and RNA can also be chemically synthesized. DNA and RNA can be single-stranded (i.e., ssRNA and ssDNA, respectively), or multi-stranded (e.g., double stranded, i.e., dsRNA and dsDNA, respectively), i.e., duplexed or annealed.
The term "nucleic acid sequence" refers to the ordering of the individual nucleotides in a DNA or RNA polymer. The term "single-molecule sequencing" refers to any method of determining the sequence of an individual nucleic acid molecule without the need for prior amplification. The term "compare", when used with respect to nucleic acid sequences, refers to the alignment of one or more nucleic acid sequences to establish a percentage identity or similarity (identity and similarity will be used interchangeably) using, for example, a mathematical algorithm. To determine the percent identity of two nucleic acid sequences (or of two amino acid sequences), the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the first sequence or second sequence for optimal alignment). The nucleotides (or amino acid residues) at corresponding nucleotide (or amino acid) positions are then compared. When a position in the first sequence is occupied by the same residue as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % homology = # of identical positions (+ # of substitutions for bases or amino acids)/total # of positions x 100), optionally penalizing the score for the number of gaps introduced and/or length of gaps introduced. The alignment can be generated over a certain portion of the sequence (i.e., a local alignment). A non-limiting example of a local alignment algorithm utilized for the comparison of sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-68, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-77. Such an algorithm is incorporated into the BLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10.
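As a non-limiting worked example of the percent identity calculation described above, the following sketch counts identical positions over the total number of aligned positions for two sequences that have already been aligned, with "-" marking a gap. The function name and the `count_gaps` option are hypothetical and introduced here only for illustration; the option reflects one possible way of penalizing the score for introduced gaps.

```python
def percent_identity(aligned_a, aligned_b, count_gaps=True):
    """Percent identity of two pre-aligned sequences of equal length.

    identical positions / total positions x 100. When count_gaps is True,
    a column containing a gap counts as a (non-identical) position; when
    False, gap columns are ignored entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    positions = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column with no residues carries no information
        if x == "-" or y == "-":
            if count_gaps:
                positions += 1  # gap column lowers the identity
            continue
        positions += 1
        if x == y:
            identical += 1
    return 100.0 * identical / positions if positions else 0.0


# Four identical columns, one mismatch, one gap column.
print(percent_identity("ACGT-A", "ACCTTA"))
print(percent_identity("ACGT-A", "ACCTTA", count_gaps=False))
```

Whether gap columns are included in the denominator changes the reported percentage, so the convention used should be stated whenever identity values are compared across analyses.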
The alignment can be optimized by introducing appropriate gaps and percentage identity determined over the length of the aligned sequence (i.e., a gapped alignment). To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402. In another embodiment, the alignment is optimized by introducing appropriate gaps and percent identity is determined over the entire length of the sequences aligned (i.e., a global alignment). A preferred, non-limiting example of a mathematical algorithm utilized for the global comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0) which is part of the GCG sequence alignment software package. Another global alignment algorithm is that of Needleman-Wunsch, (1970) J. Mol. Biol. 48:443-453. Various aspects of the invention are described in further detail in the following subsections.
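For illustration only, the global alignment approach referred to above may be sketched as a minimal Needleman-Wunsch dynamic program. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for simplicity; this sketch is not the ALIGN program or Gapped BLAST and omits the affine gap penalties that such tools typically use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix with the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


print(needleman_wunsch("ACGT", "AGT"))
```

The aligned strings returned by such a routine can be passed directly to a percent identity calculation, since both outputs have equal length and use "-" for gaps.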
1. Overview
The present invention provides novel methods, software algorithms and devices for the rapid identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample. The present invention involves: a) isolating nucleic acid from a biological sample; b) sequencing the totality of nucleic acid within the sample using single-molecule sequencing technology; and c) analyzing the derived nucleic acid sequences by comparison to a database.
In one aspect, the invention provides methods for the identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample. In one embodiment, a sample suspected of containing a bioagent capable of causing a disease or disorder is obtained. In a related embodiment, a blood sample is obtained from a human patient suspected of having contracted an infectious, bioagent-induced disease. The total nucleic acid content of either sample is extracted by art-recognized means and subjected to a single-molecule sequencing reaction. The resultant nucleic acid sequence data is then searched against reference sequences in databases using a software algorithm and the predicted source of the nucleic acid is reported.
In another aspect, the invention provides a physical medium that holds computer-executable instructions for identifying bioagents present in a biological sample. The medium holds instructions for receiving at least one result of a single molecule sequencing reaction conducted on nucleic acid in a biological sample. The medium further holds computer-executable instructions for comparing the received nucleic acid sequence obtained from the single molecule sequencing reaction to one or more reference sequences contained in a database in order to predict at least one bioagent present in the biological sample.
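One way such computer-executable instructions might be organized is sketched below, purely as an assumption for illustration. Reference sequences are indexed by short subsequences (k-mers), each received read votes for the reference sharing the most k-mers with it, and reads that match no reference are set aside as unidentified. The function names, the k-mer length, and the `min_hits` threshold are hypothetical and are not taken from the specification.

```python
from collections import Counter, defaultdict


def build_kmer_index(references, k=6):
    """Map every k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def predict_bioagents(reads, index, k=6, min_hits=3):
    """Return (predicted, unmatched).

    predicted maps a reference name to the number of reads assigned to it;
    unmatched lists reads sharing fewer than min_hits k-mers with any
    reference, which may indicate a sequence absent from the database.
    """
    predicted = Counter()
    unmatched = []
    for read in reads:
        read = read.upper()
        hits = Counter()
        for i in range(len(read) - k + 1):
            for name in index.get(read[i:i + k], ()):
                hits[name] += 1
        best = hits.most_common(1)
        if best and best[0][1] >= min_hits:
            predicted[best[0][0]] += 1
        else:
            unmatched.append(read)
    return dict(predicted), unmatched


# Hypothetical toy references and reads (not real bioagent sequences).
refs = {
    "agent_A": "ATGGCGTACGTTAGCCTAGGCTA",
    "agent_B": "TTGACCGGTAAACCGGTTTACGA",
}
index = build_kmer_index(refs)
print(predict_bioagents(["GCGTACGTTAGC", "GACCGGTAAACC", "AAAAAAAAAAAA"], index))
```

Exact k-mer matching is fast but intolerant of sequencing errors and mutations, so in practice it would typically serve as a prefilter ahead of an alignment-based comparison.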
In another aspect, the invention provides devices for the identification of any nucleic acid sequence or nucleic acid-containing bioagent present in a biological sample. In one embodiment a device is contacted with a sample and said device performs all the combined functions of the invention in an integrated manner i.e., nucleic acid extraction, single-molecule sequencing, database searching and source organism reporting.
In another aspect, the invention provides a means to acquire patient-specific, as well as general, population-based data concerning the genetic basis of diseases and disorders.
In another aspect, the invention provides a means to acquire gene expression analysis data indicative of a change in physiological status of an organism. In another aspect, the invention provides a means to acquire epidemiological data.
In another aspect, the invention provides methods for performing pharmacogenomics. In another aspect, the invention provides a means for testing livestock animals for diseases such as foot-and-mouth disease and mad cow disease.
2. Selecting a Biological Sample
The present invention provides methods and devices for the identification of nucleic acid molecules contained within a biological sample. Exemplary samples include, but are not limited to, blood, animal tissue, sputum, urine, cell culture medium, water, leaf spot, soil, plant tissue, paleontology samples, forensic samples, food or any form of matter capable of containing bioagents or nucleic acid. Several independent sources of nucleic acid may exist in the sample. In the case of human blood, human DNA and RNA will be present in white blood cells, in addition to the nucleic acid present in any infectious bioagents that may be present.
3. Bioagents
The present invention provides methods and devices for the identification of bioagents via the presence of their nucleic acids. In the context of the present invention, a "bioagent" is any organism, living or dead, or a nucleic acid derived from such an organism. Examples of bioagents include, but are not limited to, cells (including but not limited to human clinical samples, bacterial cells and other pathogens), viruses, toxin genes and bioregulating compounds. Samples may be alive or dead or in a vegetative state (for example, vegetative bacteria or spores) and may be encapsulated or bioengineered.
Bacterial biological warfare bioagents capable of being detected by the present methods include, but are not limited to, Bacillus anthracis (anthrax), Yersinia pestis (pneumonic plague), Francisella tularensis (tularemia), Brucella suis, Brucella abortus, Brucella melitensis (undulant fever), Burkholderia mallei (glanders), Burkholderia pseudomallei (melioidosis), Salmonella typhi (typhoid fever), Rickettsia typhi (endemic typhus), Rickettsia prowazekii (epidemic typhus), Coxiella burnetii (Q fever), Rhodobacter capsulatus, Chlamydia pneumoniae, Escherichia coli, Shigella dysenteriae, Shigella flexneri, Bacillus cereus, Clostridium botulinum, Pseudomonas aeruginosa, Legionella pneumophila, Borrelia burgdorferi (Lyme disease), and Vibrio cholerae.
Biological warfare fungus bioagents include, but are not limited to, Coccidioides immitis (coccidioidomycosis). Biological warfare toxin genes capable of being detected by the methods of the present invention include, but are not limited to, botulism, T-2 mycotoxins, ricin, staph enterotoxin B, shigatoxin, abrin, aflatoxin, Clostridium perfringens epsilon toxin, conotoxins, diacetoxyscirpenol, tetrodotoxin, and saxitoxin. Biological warfare viral bioagents are mostly RNA viruses (positive-strand and negative-strand), with the exception of smallpox. Every RNA virus is a family of related viruses (quasispecies). These viruses mutate rapidly and the potential for engineered strains (natural or deliberate) is very high. RNA viruses cluster into families that have conserved RNA structural domains on the viral genome (e.g., virion components, accessory proteins) and conserved housekeeping genes that encode core viral proteins including, for single strand positive strand RNA viruses, RNA-dependent RNA polymerase, double stranded RNA helicase, chymotrypsin-like and papain-like proteases and methyltransferases.
Examples of (-)-strand RNA viruses include arenaviruses (e.g., sabia virus, lassa fever, Machupo, Argentine hemorrhagic fever, flexal virus), bunyaviruses (e.g., hantavirus, nairovirus, phlebovirus, hantaan virus, Congo-crimean hemorrhagic fever, rift valley fever), and mononegavirales (e.g., filovirus, paramyxovirus, ebola virus, Marburg, equine morbilli virus).
Examples of (+)-strand RNA viruses include picornaviruses (e.g., coxsackievirus, echovirus, human coxsackievirus A, human echovirus, human enterovirus, human poliovirus, hepatitis A virus, human parechovirus, human rhinovirus), astroviruses (e.g., human astrovirus), caliciviruses (e.g., chiba virus, chitta virus, human calicivirus, Norwalk virus), nidovirales (e.g., human coronavirus, human torovirus), flaviviruses (e.g., dengue viruses, Japanese encephalitis virus, Kyasanur forest disease virus, Murray Valley encephalitis virus, Rocio virus, St. Louis encephalitis virus, West Nile virus, yellow fever virus, hepatitis C virus) and togaviruses (e.g., Chikungunya virus, Eastern equine encephalitis virus, Mayaro virus, O'nyong-nyong virus, Ross River virus, Venezuelan equine encephalitis virus, Rubella virus, hepatitis E virus).
4. Nucleic Acid Extraction
The present invention can employ at least partial purification of target nucleic acid molecules. All art-recognized methods of nucleic acid extraction and purification are contemplated. Exemplary methods include those commercialized by QIAGEN or PROMEGA. Nucleic acid purification on nanoengineered surfaces (as exemplified in U.S. patent application US20060166223) is also contemplated. Where biological samples are desiccated, the sample will be solubilized, where necessary, using appropriate art-recognized solvents to facilitate nucleic acid extraction.
5. Single Molecule Sequencing
The present invention involves nucleic acid sequencing at the single molecule level. Several art-recognized methods of single-molecule sequencing have been developed (see U.S. patent application US2006000400730 and U.S. patents 7,169,560; 6,221,592; 6,905,586; 6,524,829; 6,242,193; and 6,136,543). Single molecule sequencing is a powerful tool capable of elucidating sequence-specific information on a single nucleic acid template. The ability to conduct single template sequencing allows the identification of subtle, often rare, changes in nucleic acids that are important as the underlying basis for diseases such as cancer and others.
Single molecule sequencing also provides the ability to rapidly analyze a multitude of single nucleic acid templates, from a single sample, in parallel and with a high degree of precision. Using an isolated nucleic acid sequence as the substrate, individual labeled nucleotides are added sequentially by a polymerase to a growing complement strand. A label is detected as each nucleotide is added to the strand and the template sequence is determined.
In one embodiment, the invention comprises exposing a nucleic acid primer to a template sequence in the presence of a polymerase and at least one labeled nucleotide base that is capable of hybridizing with a template nucleic acid downstream of the hybridized primer. Nucleotide bases may be selected from the common Watson-Crick bases, adenine, thymine, cytosine, guanine, and uracil, or may be modifications of those bases, such as peptide nucleic acids, ribonucleotides, or nucleotides modified to incorporate a detectable label (e.g., with linkers or adapters). As each nucleotide is added to the growing complement strand, its label is detected and its position on the template is noted. Once a sufficient number of nucleotides have been incorporated, a sequence is determined. Methods of the invention facilitate rapid whole genome sequencing. Methods of the invention, however, also contemplate partial genome sequencing to obtain template or fingerprint sequences, thereby facilitating even more rapid sequence comparisons. Suitable nucleic acid templates include DNA, RNA and RNA/DNA hybrids.
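The readout logic of the embodiment described above, in which a label is detected as each nucleotide is added to the growing complement strand, may be illustrated with the following minimal sketch. It assumes a DNA template and a hypothetical one-label-per-base scheme; the label names in `LABEL_TO_BASE` are invented for illustration and do not correspond to any particular chemistry or instrument.

```python
# Hypothetical mapping from a detected label to the incorporated base.
LABEL_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(detected_labels):
    """Bases of the growing strand, in order of incorporation (5' to 3')."""
    return "".join(LABEL_TO_BASE[label] for label in detected_labels)


def template_from_labels(detected_labels):
    """Template sequence (5' to 3') implied by the incorporated bases.

    The template is the reverse complement of the synthesized strand,
    because the two strands pair in antiparallel orientation.
    """
    synthesized = complement_strand(detected_labels)
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


labels = ["green", "green", "blue", "yellow"]
print(complement_strand(labels), template_from_labels(labels))
```

For an RNA template read by incorporation of deoxyribonucleotides, the same logic would apply with uracil substituted for thymine in the reported template.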
In another embodiment, the invention comprises passing a single-stranded nucleic acid through a nano-pore. As the ssDNA travels through the nano-pore, the ssDNA passes over 4 nano-probes each containing one of the four nucleotide bases. Each time a probe hybridizes with the ssDNA, the signal is detected and the template sequence is determined.
6. Devices
In another aspect, the present invention provides devices for the identification of nucleic acid molecules and nucleic acid-containing bioagents contained within a biological sample. In one embodiment, the device contains an integrated means of nucleic acid purification, single molecule sequencing, and sequence analysis.
Embodiments where any one or more of the aforementioned functions are performed outside or remote from the device are also contemplated. In another embodiment the device is portable, preferably handheld. In another embodiment, the device may also include a microfabricated biopsy instrument as exemplified in U.S. patent application 2003/0119176A1. In another embodiment, the device connects wirelessly to a computer. In another embodiment, the device is part of a remotely controlled vehicle. In another embodiment, the device is capable of being operated by remote control. In another embodiment, the device is disposable. In another embodiment the device is biodegradable. In another embodiment, the device is designed and/or packaged for home use, hospital use, or police/military use.
7. Genomic DNA Analysis
In another aspect, the present invention allows for the acquisition of patient-specific, as well as general, population-based data concerning the genetic basis of diseases and disorders. Cancer is an example of a disease or disorder that has a strong genetic basis. Complete sequencing of large numbers of tumors using single molecule sequencing provides a catalog of somatic cell mutations (including, without limitation, deletions, additions, amplifications, rearrangements, substitutions, losses, translocations, methylation, and other alterations of genomic DNA) that are useful to diagnose, evaluate, prognose, and treat patients. A catalog of disease-related mutations and other alterations is a powerful diagnostic tool useful to rapidly categorize samples sequenced from future patients. Moreover, single molecule sequencing allows one to identify previously-unknown mutations that may be associated with cancer. Finally, single molecule sequencing on pooled samples allows rapid identification of deletions, amplifications, and other changes that are indicative of cancer, even if the specific mutational change is not known.
Analysis of genomic DNA using single molecule sequencing provides an approach that allows rapid identification of a genomic change present in a sample in low amounts. The ability to quickly and accurately perform rare-event detection is of great significance for the early diagnosis of cancer. Many cancers, if detected early, are treatable, and if detected too late may not be treatable. Cancer begins as somatic cell mutations accumulate in a very small initial population of cells. In samples typically obtained for genomic analysis, cancer or precancer cells are in very low abundance compared to healthy somatic cells. Bulk mutation detection mechanisms typically fail to detect these rare-event changes. A digital technique, such as single molecule sequencing, allows rapid sequencing through mutations in multiple single templates. This, in turn, allows the detection of the rare-event mutations underlying cancer or precancer.
In one embodiment of the invention, tumor DNA is obtained and prepared using standard methods. Approximately 10 times coverage of each genomic region is sequenced. Using single molecule sequencing, the genome of the cancer tissue is rapidly sequenced. Mutations, insertions, deletions, rearrangements, and other alterations present in the tumor DNA are detected. Sequence assembly is accomplished using standard alignment techniques, such as BLAST (www.ncbi.nlm.nih.gov), incorporated by reference herein. Tumor sequences are compared to known sequences for either normal or cancer tissue or to consensus sequences in order to identify changes associated with cancer. Newly discovered genomic changes (i.e., those not previously associated with cancer) are cataloged and become known to be associated with a particular disease over time. Thus, patients are rapidly and accurately diagnosed based upon their individual genomic complement, either before or at the time of symptomatic presentation of a disease.
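The contrast drawn above between bulk and digital detection can be illustrated with a small Python sketch. It assumes that reads covering one genomic position have already been aligned, so that each read contributes one base call at that position. The function names and the read counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def variant_fractions(read_bases, reference_base):
    """Return {base: fraction of reads} for non-reference bases at one position."""
    counts = Counter(read_bases)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items() if b != reference_base}


def consensus_base(read_bases):
    """Return the most common base, as a stand-in for a bulk (averaged) call."""
    return Counter(read_bases).most_common(1)[0][0]


# Illustrative data: 98 single-molecule reads carry the reference base, 2 carry a variant.
reads_at_site = ["G"] * 98 + ["A"] * 2

print(consensus_base(reads_at_site))          # G  (the rare variant is hidden)
print(variant_fractions(reads_at_site, "G"))  # {'A': 0.02}
```

Counting individual reads keeps the low-abundance variant visible, whereas the consensus call reports only the majority base.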
In another embodiment of the invention, DNA is isolated from a patient's tumor or other diseased sample and is compared to normal DNA from the same patient. Whole genome sequencing of both the tumor and normal DNA may be done rapidly on a parallel basis using single molecule sequencing as described above. Alternatively, only portions of the genome are sequenced and compared. Genome portions of interest include, for example, sequences associated with a known or candidate tumor suppressor gene or oncogene, or intronic sequences containing repeats that are susceptible to amplification by defective cellular machinery. Following sequence determination, a comparison is made between tumor and normal sequence. Differences between the tumor and normal sequences are identified as tumor-related mutations. In effect, any difference between the two likely is indicative of disease because all somatic cells should have the same sequence. Detection of a variation from the normal somatic cell sequence, indicating that a population of cells containing abnormal sequences is present, results in a positive diagnosis. Alternatively, patient tumor sequence may be compared to a normal banked or consensus sequence instead of the patient's own normal DNA.
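The tumor-versus-normal comparison described above can be sketched as a position-by-position difference. The sketch assumes the two sequences have already been aligned to equal length (for example, with gaps written as "-"). The function name and example sequences are hypothetical.

```python
def sequence_differences(normal, tumor):
    """List (position, normal_base, tumor_base) where two aligned sequences differ.

    Positions are 0-based. Both inputs are assumed to be pre-aligned to the
    same length; alignment itself is outside the scope of this sketch.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, n, t) for i, (n, t) in enumerate(zip(normal, tumor)) if n != t]


normal_seq = "ACGTACGT"
tumor_seq = "ACGAACGT"
print(sequence_differences(normal_seq, tumor_seq))  # [(3, 'T', 'A')]
```

An empty result corresponds to the case in which tumor and normal sequences agree over the compared region.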
In another embodiment broad-based disease susceptibility testing is performed using single molecule sequencing on pooled genomic samples. For example, in a large population, the number of positive samples (i.e., those with a mutation present) is relatively small. Bulk sequencing likely would not detect mutations in pooled samples. Using high-resolution single molecule sequencing, however, any positive sample is detected with digital precision. Thus, according to the invention, genomic samples from a predetermined number of patients (the number of patients does not matter for purposes of the invention) are collected, pooled and sequenced using single molecule sequencing techniques as described above. Single molecule sequencing is done through large tracts of the genome, and mutations derived from any source are detected in the pooled sample. To determine the source of a mutation or mutations, the original collection of individual patient samples is divided in half, re-pooled, and resequenced. This process continues until a unique identification of the affected patient or patients is possible. Due to the rapidity of single molecule sequencing, it is possible to perform multiple sequencing steps in a matter of minutes, hours or days. Using single molecule sequencing, pooled sequences, when compared to a consensus sequence, readily identify losses or amplifications in genomic DNA. All somatic cells will have not only the same sequence but will also be present in the same amounts. Deviations are detected using single molecule sequencing with fewer cells than in bulk sequencing because individual DNA molecules are sequenced instead of an amalgam of cells that typically provide the basis for bulk sequencing assays as, for example, in assays for loss of heterozygosity. In a related embodiment, data from a pooled experiment is useful for determining the frequency and distribution of mutations in a given population, without identifying the owners of specific mutations.
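The divide, re-pool, and resequence procedure described above resembles a recursive halving search. The following Python sketch is one possible reading of it, not a definitive implementation. The callable `pool_is_positive` is a hypothetical stand-in for physically pooling the listed samples, sequencing the pool, and checking whether a mutation is detected.

```python
def find_positive_samples(sample_ids, pool_is_positive):
    """Identify positive samples by repeatedly halving, re-pooling, and re-testing.

    pool_is_positive(samples) is a hypothetical callable that returns True when
    the pooled samples contain at least one mutation-bearing sample.
    """
    samples = list(sample_ids)
    if not samples or not pool_is_positive(samples):
        return []
    if len(samples) == 1:
        return samples
    mid = len(samples) // 2
    return (find_positive_samples(samples[:mid], pool_is_positive)
            + find_positive_samples(samples[mid:], pool_is_positive))


# Illustrative run: 16 patients, one of whom carries the mutation.
patients = [f"P{i:02d}" for i in range(16)]
carriers = {"P07"}
runs = []


def simulated_pool_test(samples):
    runs.append(len(samples))  # record each pooled sequencing run
    return any(s in carriers for s in samples)


print(find_positive_samples(patients, simulated_pool_test))  # ['P07']
print(len(runs))  # 9 pooled runs rather than 16 individual ones
```

The same routine handles more than one affected patient, because both halves of a positive pool are re-tested.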
The rapid results provided by the invention also allow sequencing to detect familial mutations. For example, if it is determined that a patient has a mutation indicative of a cancer, certain forms of which have a strong familial link (e.g., breast cancer, colon cancer), primary siblings typically are not tested unless specified criteria are met. Single molecule sequencing not only identifies the underlying mutation in the primary patient, but allows rapid, cost-effective sequencing of relatives who also might carry the mutation.
The invention is also useful to perform tumor typing. Tumor typing may involve determining a genetic profile for a particular patient's tumor in order to guide treatment or other decisions. For example, the standard treatment for patients with colon cancer is the drug 5-Fluorouracil (5FU). Although 5FU works to reduce tumors in many colon cancer patients, it actually accelerates tumor growth in a class of patients who have Hereditary Non-Polyposis Colorectal Cancer (HNPCC). HNPCC is a familial form of colon cancer with a distinct genetic profile that is ascertainable by sequencing cellular DNA. Thus, to avoid tumor acceleration in potential HNPCC patients, it is particularly important to know a colon cancer patient's genetic profile in order to determine the most effective treatment for that patient. Single molecule sequencing is useful to make that determination because it is rapid, reliable, and effectively digital, and therefore promptly indicates the presence or absence of the relevant genetic event(s). Methods of the invention make possible the rapid and accurate identification of tumor-related mutations, so that an appropriate treatment may be selected or an inappropriate treatment avoided.
8. Expression Analysis
In another aspect, the invention provides gene expression analysis data.
Alteration in expression constructs is often indicative of a change in physiological status. Changes in expression patterns reflect cellular activities as well as disease state. Expression sequence analysis provides insight into the specialized activities of cells from different organs or of different types. Thus, expression analysis reveals aspects of the immune repertoire that are not apparent on a gross level. In one embodiment of the invention, a sequence determination is made with respect to the total antibody repertoire expressed by B-cells. In another embodiment of the invention, a sequence determination is made with respect to the T-cell receptor repertoire expressed by T-cells. Single molecule sequencing offers rapid, high-throughput sequencing that reveals specific detail as to which immune cells are active, and the likely epitopes against which they function. Single molecule sequencing also provides an immune fingerprint that is used to identify an infection based upon the specifics of a patient's immune response. The immune fingerprint generated using single molecule sequencing is compared to a database of collected immune sequence data in order to identify an infection. New infections are tracked through the appearance of new sequence specificities either alone or in combination with other diagnostic techniques. Isolation of immune cells is well-known in the art, and application of the present invention to sequencing a patient's immune cell complement is contemplated by the present invention.
9. Epidemiology
In another aspect the invention also provides epidemiological data. In a preferred embodiment, an appropriate patient sample is obtained and DNA in the sample is sequenced. Optionally, the patient's genomic DNA is excluded. A catalog is compiled comprising a fingerprint of the DNA (or RNA in other preferred embodiments) present in samples obtained from a multiplicity of patients. Each patient's disease status then is correlated with specific sequence information obtained from the patient's sample. In this way, diagnostic accuracy and verifiability is improved, as a patient's disease status is confirmed by comparing the patient's DNA to sequences in the database. As mentioned above, whole genome sequencing is optional. In some circumstances, it is necessary only to sequence sufficient nucleic acid to establish a fingerprint for comparison with future samples. In one embodiment, ubiquitous epidemiology is performed. In this case patient DNA is routinely sequenced and stored for disease identification and comparison with future samples to identify and track new disease outbreaks. For example, a patient who presents with a new DNA profile (i.e., containing a sequence that is not in the database) may be diagnosed with a new condition. Future patients presenting with the same nucleic acid profile are tracked. In this way, potential epidemic outbreaks are controlled. With respect to new diseases, no a priori assumptions are necessary. A novel sequence will immediately be identified as such, and appropriate monitoring can be put in place.
10. Pharmacogenomics
In another aspect, the invention provides methods and devices for performing pharmacogenomics. Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic failure by altering the relation between dose and blood concentration of the pharmacologically active drug. Thus, a physician or clinician may consider applying knowledge obtained in relevant pharmacogenomics studies in determining whether to administer a therapeutic agent as well as tailoring the dosage and/or therapeutic regimen of treatment with a therapeutic agent.
Pharmacogenomics deals with clinically significant hereditary variations in the response to drugs due to altered drug disposition and abnormal action in affected persons. See, for example, Eichelbaum, M. et al. (1996) Clin. Exp. Pharmacol. Physiol. 23(10-11): 983-985 and Linder, M.W. et al. (1997) Clin. Chem. 43(2):254-266. In one embodiment, the methods of the invention provide information regarding patient genome sequence which is used to select patients or patient subpopulations for treatment with FDA-approved therapies, e.g., antibody, small molecule or peptide therapies.
11. Hardware and Software Environment
As noted above, the embodiments of the present invention programmatically analyze the results of a single molecule sequencing reaction in order to predict bioagents present in a biological sample. Figures 1-3 discuss aspects of the hardware and software environment utilized by the present invention to perform the bioagent prediction.
Figure 1 depicts an environment suitable for practicing an embodiment of the present invention. A computing device 102 holds a database 104 or other storage structure containing reference sequences 105 and an analysis facility 106. The computing device 102 may be a server, workstation, laptop, personal computer, PDA or other computing device equipped with one or more processors and able to execute the analysis facility 106 discussed herein. The analysis facility 106 is preferably implemented in software although in an alternate implementation, the logic may also be implemented in hardware. The analysis facility 106 operates on and analyzes results of single molecule sequencing reactions 122 that are received from a biological sample acquisition apparatus 120. The biological sample acquisition apparatus conducts single molecule sequencing operations on nucleic acid isolated from a biological sample. In one embodiment, the biological sample acquisition apparatus 120 is a handheld device in wireless communication with the computing device 102. The analysis facility 106 programmatically compares the results of the single molecule sequencing reaction 122 to the reference sequences 105 contained in the database 104 in order to generate a listing of predicted bioagents 144 that are present in the biological sample under consideration. In one implementation, the comparison of the results of the single molecule sequencing operation 122 to the reference sequences 105 in order to predict bioagents present in a biological sample is performed programmatically without any user input. In alternate implementations, the analysis facility 106 prompts a user for parameters controlling the comparison via the user interface 142.
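As a rough illustration of the comparison performed by the analysis facility 106, the following Python sketch matches sequencing reads against named reference sequences by shared k-mers and returns a listing of predicted bioagents. The k-mer approach, the function names, and the parameters `k` and `min_shared` are assumptions made for this sketch. The description above does not specify a particular comparison algorithm.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_bioagents(reads, reference_sequences, k=8, min_shared=1):
    """Return reference names sharing k-mers with the reads, best match first.

    k and min_shared play the role of user-configurable comparison parameters;
    both are hypothetical and chosen only for illustration.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    scores = {}
    for name, ref in reference_sequences.items():
        shared = len(read_kmers & kmers(ref, k))
        if shared >= min_shared:
            scores[name] = shared
    return sorted(scores, key=lambda n: (-scores[n], n))


# Made-up reference sequences and reads, for illustration only.
references = {
    "agent_A": "ATGCGTACGTTAGCCGATAG",
    "agent_B": "TTTTGGGGCCCCAAAATTTT",
}
reads = ["CGTACGTTAGCC", "GGGGTTTTAAAA"]
print(predict_bioagents(reads, references))  # ['agent_A']
```

Calling the function with default parameters corresponds to the implementation that runs without user input, while passing `k` or `min_shared` corresponds to the alternate implementation in which a user supplies comparison parameters.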
The listing of the predicted bioagents 144 may be displayed to a user via a user interface 142 displayed on a display device 140 that is in communication with the computing device 102. It will be appreciated that the listing of predicted bioagents 144 may also be stored for later use and/or display to a user. The user interface 142 may also be utilized to enable a user to configure the parameters of the comparison operation performed by the analysis facility 106. Those skilled in the art will recognize that many other configurations are also possible within the scope of the present invention.
Figure 2 depicts an alternative distributed environment 200 suitable for practicing an embodiment of the present invention. A first computing device 202 may be used to execute an analysis facility 204. The first computing device 202 may communicate over a network 250 with a second computing device 210 holding reference sequences 212. The network 250 may be the Internet, a local area network (LAN), a wide area network (WAN), an intranet, an internet, a wireless network or some other type of network over which the first computing device 202 and the second computing device 210 can communicate. The analysis facility 204 on the first computing device 202 may communicate over the network 250 with a biological sample acquisition apparatus 230 that generates results data 232 from a single molecule sequencing reaction performed on nucleic acid isolated from a biological sample. The analysis facility 204 may store a listing of predicted bioagents that is generated by a comparison of the results of the single molecule sequencing reaction and the reference sequences 212. The storage may occur on the first computing device 202 or at a location remote from the first computing device that is accessible over the network 250. Alternatively, the listing of predicted bioagents may be displayed to a user.
It should be recognized that Figure 2 depicts only a single distributed configuration and many other distributed configurations are possible within the scope of the present invention.
Figure 3 is a flowchart of a sequence of steps that may be followed by an embodiment of the present invention to predict bioagents present in a biological sample. The sequence begins by providing a biological sample (step 302). The sample may be a previously acquired sample or may be a sample that is obtained immediately in advance of the bioagent prediction process that is discussed herein being performed. Nucleic acid is then isolated from the biological sample (step 304) and a single molecule sequencing reaction is conducted on the isolated nucleic acid (step 306) as discussed above. The results of the single molecule sequencing reaction are compared to reference sequences (step 308) and a listing of predicted bioagents that are present in the biological sample is generated. The listing of predicted bioagents may then be displayed to a user or stored for later retrieval (step 310).
Embodiments of the present invention may be provided as one or more computer-readable programs embodied on or in one or more mediums. The mediums may be a floppy disk, a hard disk, a compact disc, a digital versatile disc, a flash memory card, a PROM, an MRAM, a RAM, a ROM, or a magnetic tape. In general, the computer-readable programs may be implemented in any programming language. Some examples of languages that can be used include FORTRAN, C, C++, C#, Python, Perl or Java. The software programs may be stored on or in one or more mediums as object code. Hardware acceleration may be used and all or a portion of the code may run on a FPGA, an Application Specific Integrated Processor (ASIP), or an Application Specific Integrated Circuit (ASIC). The code may run in a virtualized environment such as in a virtual machine. Multiple virtual machines running the code may be resident on a single processor.
Since certain changes may be made without departing from the scope of the present invention, it is intended that all matter contained in the above description or shown in the accompanying drawings be interpreted as illustrative and not in an exclusive sense. Practitioners of the art will realize that the sequence of steps and architectures depicted in the figures may be altered without departing from the scope of the present invention and that the illustrations contained herein are singular examples of a multitude of possible depictions of the present invention.
12. Other Applications of the Technology of the Invention
In another aspect, the methods and devices disclosed herein can be used to screen fetal mRNA or DNA, present in maternal blood, for disease-associated mutations as an alternative to amniocentesis.
In another aspect, the invention provides nucleic acid sequence information for making diagnostic kits, or chips. In another aspect, the nucleic acid sequence information or methodology disclosed herein can be used for forensic applications.
In another aspect, the methods and devices disclosed herein can be used for research purposes, for example genetic research on the distribution or migration of human populations.
In another aspect, the methods and devices disclosed herein can be used in paleontology, for example to identify and catalogue nucleic acid sequences contained in ancient biological samples.
In another aspect, the methods and devices disclosed herein can be used for environmental analysis to determine the bioagent profile of a particular ecosystem.
In another aspect, the methods and devices disclosed herein can be used in agriculture. In one embodiment the methods and devices of the invention are used to determine the bioagent profile of soil. In another embodiment the methods and devices of the invention are used to determine the nucleic acid sequences present in plant samples and thereby assess whether they have been infected with disease-causing bioagents or have been modified by genetic engineering.
In another aspect, the methods and devices disclosed herein are used to determine genetic fingerprinting information about a subject and thereby uniquely identify them. In another aspect, the invention provides business methods for commercializing nucleic acid sequences suitable for use in, for example, the making of devices, diagnostic chips, kits, networks, and pharmaceuticals for diagnosing and treating disease.
Exemplification
Throughout the examples, the following materials and methods were used unless otherwise stated.
Materials and Methods
In general, the practice of the present invention can employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology, cell culture, and any necessary computer or electronic related technology that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D.N. Glover, Ed. 1985); Oligonucleotide Synthesis (M.J. Gait, Ed. 1984); PCR Handbook Current Protocols in Nucleic Acid Chemistry, Beaucage, Ed. John Wiley & Sons (1999) (Editor); Oxford Handbook of Nucleic Acid Structure, Neidle, Ed., Oxford Univ Press (1999); PCR Protocols: A Guide to Methods and Applications, Innis et al., Academic Press (1990); PCR Essential Techniques: Essential Techniques, Burke, Ed., John Wiley & Son Ltd (1996); The PCR Technique: RT-PCR, Siebert, Ed., Eaton Pub. Co. (1998).
EXAMPLE 1
A METHOD FOR DETECTING A BIOAGENT IN A PATIENT SUSPECTED OF HAVING CONTRACTED AN INFECTIOUS BIOAGENT-INDUCED DISEASE
The following example describes a novel method for determining the presence of an unknown bioagent in a patient sample. A patient sample is obtained. If necessary several types of sample encompassing all potential areas of infection may be obtained and processed together.
Nucleic acid is extracted from the sample(s) using art recognized means. A single-nucleotide sequencing of the nucleic acid is then performed to determine the sequences of nucleic acids present in the sample. The deduced nucleic acid sequence data is compared against databases of known nucleic acid sequences, for example, using a mathematical algorithm and the percentage identity with known sequences is reported.
From these sequence identities all bioagents present in the sample are deduced and known infectious bioagents identified.
EXAMPLE 2
A DEVICE FOR DETECTING A BIOAGENT IN A PATIENT SUSPECTED OF HAVING CONTRACTED AN INFECTIOUS BIOAGENT-INDUCED DISEASE
The following example describes a novel device for determining the presence of an unknown bioagent in a patient sample.
A patient sample is obtained. If necessary several types of sample encompassing all potential areas of infection may be obtained and processed together. The sample is contacted with a device which proceeds to perform the following steps in an integrated operation: a) isolate nucleic acid from the sample in sufficient purity to perform single-nucleotide sequencing; b) perform single-nucleotide sequencing to determine the sequences of all nucleic acids isolated from the sample; c) compare the deduced nucleic acid sequences against databases of known nucleic acid sequences using, for example, a mathematical algorithm to determine the percentage identity of the deduced nucleic acid with known sequences; d) report the best sequence matches for all known (infectious) bioagents.
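Steps c) and d) can be illustrated with a minimal Python sketch of a percentage identity calculation. The sketch uses a simple ungapped sliding comparison, which is an assumption made here for brevity. The example above leaves the mathematical algorithm open, and a practical system would more likely rely on an established alignment tool. All names and sequences below are hypothetical.

```python
def percent_identity(query, subject):
    """Best ungapped percent identity of query placed anywhere along subject."""
    if not query or len(query) > len(subject):
        return 0.0
    best = 0
    for start in range(len(subject) - len(query) + 1):
        window = subject[start:start + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        best = max(best, matches)
    return 100.0 * best / len(query)


def best_matches(query, database):
    """Rank database entries by percent identity to query, best first."""
    scored = [(name, percent_identity(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Made-up database of known sequences, for illustration only.
database = {
    "agent_A": "ATGCGTACGTTAGCCGATAG",
    "agent_B": "TTTTGGGGCCCCAAAATTTT",
}
print(percent_identity("ACGTTAGG", database["agent_A"]))  # 87.5
print(best_matches("ACGTTAGC", database)[0])              # ('agent_A', 100.0)
```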
Equivalents
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

Claims

What is claimed
1. A method for identifying a bioagent(s) present in a biological sample, the method comprising the steps of: isolating nucleic acid from a biological sample; conducting a single molecule sequencing reaction on the plurality of nucleic acid in said sample; comparing nucleic acid sequences obtained in said conducting step to one or more reference sequences contained in a database to predict the bioagent(s) present in the sample.
2. The method of claim 1, wherein said biological sample is any form of matter containing nucleic acid.
3. The method of claim 1, wherein said biological sample is blood or another bodily derived fluid.
4. The method of claim 1, wherein said biological sample is obtained from tissue.
5. The method of claim 1, wherein said biological sample is obtained from soil.
6. The method of claim 1, wherein said biological sample is suspected to contain a hazardous bioagent.
7. A device capable of performing the method of claim 1.
8. The method of claim 1 wherein one or more steps are carried out programmatically.
9. A method for detecting nucleic acids indicative of a disease state in a sample, the method comprising the steps of: isolating nucleic acid from a biological sample suspected to contain a nucleic acid that would not be expected to be present in the sample if the individual from whom it was obtained were healthy; conducting a single molecule sequencing reaction on nucleic acid in said sample; and comparing nucleic acid sequences obtained in said conducting step to one or more reference sequences that represent nucleic acids that are not expected to be present in a sample obtained from a healthy individual, thereby identifying nucleic acids in said sample that are indicative of a disease state.
10. The method of claim 9, wherein said biological sample is blood or another bodily-derived fluid.
11. The method of claim 9, wherein said biological sample is obtained from tissue.
12. The method of claim 9, wherein said reference sequences represent a mutation that is indicative of cancer or precancer.
13. The method of claim 9, wherein said reference sequences represent an infectious disease agent.
14. The method of claim 9, wherein said heterogeneous sample comprises nucleic acid derived from multiple cell types.
15. The method of claim 9, wherein said mutation is a mutation or an insertion or a deletion.
16. The method of claim 9, wherein said biological sample is maternal blood.
17. The method of claim 9, wherein said reference nucleic acid is fetal DNA or RNA.
18. The method of claim 9, wherein said comparing step identifies the presence of nucleic acids derived from multiple organisms in a pooled sample.
19. A device capable of performing the method of claim 9.
20. The method of claim 9, wherein one or more steps are carried out programmatically.
21. A method for identifying a patient or patient subpopulation amenable to therapy wherein the patient or patient subpopulation is first identified as in need of such therapy according to the methods of any one of the preceding claims.
22. A physical medium holding computer-executable instructions for predicting bioagents present in a biological sample, the medium comprising: instructions for receiving at least one result of a single molecule sequencing reaction conducted on nucleic acid in a biological sample; and instructions for comparing at least one nucleic acid sequence obtained in said at least one result to one or more reference sequences contained in a database to predict at least one bioagent present in the biological sample.
23. The medium of claim 22 wherein the medium further comprises instructions for receiving parameters controlling said comparing from a user prior to performing said comparing.
PCT/US2008/064519 2007-05-22 2008-05-22 Automated method and device for dna isolation, sequence determination, and identification WO2008147879A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US93128507P 2007-05-22 2007-05-22
US60/931,285 2007-05-22

Publications (1)

Publication Number Publication Date
WO2008147879A1 true WO2008147879A1 (en) 2008-12-04

Family

ID=40075490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/064519 WO2008147879A1 (en) 2007-05-22 2008-05-22 Automated method and device for dna isolation, sequence determination, and identification

Country Status (1)

Country Link
WO (1) WO2008147879A1 (en)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8236503B2 (en) 2008-11-07 2012-08-07 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
US8583380B2 (en) 2008-09-05 2013-11-12 Aueon, Inc. Methods for stratifying and annotating cancer drug treatment options
US8628927B2 (en) 2008-11-07 2014-01-14 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US8691510B2 (en) 2008-11-07 2014-04-08 Sequenta, Inc. Sequence analysis of complex amplicons
US8748103B2 (en) 2008-11-07 2014-06-10 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
CN104106072A (en) * 2011-12-08 2014-10-15 皇家飞利浦有限公司 Biological cell assessment using whole genome sequence and oncological therapy planning using same
EP2566984A4 (en) * 2010-05-07 2015-05-20 Univ Leland Stanford Junior Measurement and comparison of immune diversity by high-throughput sequencing
US9043160B1 (en) 2009-11-09 2015-05-26 Sequenta, Inc. Method of determining clonotypes and clonotype profiles
US9150905B2 (en) 2012-05-08 2015-10-06 Adaptive Biotechnologies Corporation Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US9181590B2 (en) 2011-10-21 2015-11-10 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
US9365901B2 (en) 2008-11-07 2016-06-14 Adaptive Biotechnologies Corp. Monitoring immunoglobulin heavy chain evolution in B-cell acute lymphoblastic leukemia
US9499865B2 (en) 2011-12-13 2016-11-22 Adaptive Biotechnologies Corp. Detection and measurement of tissue-infiltrating lymphocytes
US9506119B2 (en) 2008-11-07 2016-11-29 Adaptive Biotechnologies Corp. Method of sequence determination using sequence tags
US9528160B2 (en) 2008-11-07 2016-12-27 Adaptive Biotechnologies Corp. Rare clonotypes and uses thereof
US9708657B2 (en) 2013-07-01 2017-07-18 Adaptive Biotechnologies Corp. Method for generating clonotype profiles using sequence tags
US9809813B2 (en) 2009-06-25 2017-11-07 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US9824179B2 (en) 2011-12-09 2017-11-21 Adaptive Biotechnologies Corp. Diagnosis of lymphoid malignancies and minimal residual disease detection
US9909180B2 (en) 2013-02-04 2018-03-06 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
EP3209791A4 (en) * 2014-10-22 2018-06-06 Ibis Biosciences, Inc. Bacterial epigenomic analysis
US10058839B2 (en) 2013-03-15 2018-08-28 Lineage Biosciences, Inc. Methods and compositions for tagging and analyzing samples
US10066265B2 (en) 2014-04-01 2018-09-04 Adaptive Biotechnologies Corp. Determining antigen-specific t-cells
US10072283B2 (en) 2010-09-24 2018-09-11 The Board Of Trustees Of The Leland Stanford Junior University Direct capture, amplification and sequencing of target DNA using immobilized primers
US10077478B2 (en) 2012-03-05 2018-09-18 Adaptive Biotechnologies Corp. Determining paired immune receptor chains from frequency matched subunits
US10150996B2 (en) 2012-10-19 2018-12-11 Adaptive Biotechnologies Corp. Quantification of adaptive immune cell genomes in a complex mixture of cells
US10221461B2 (en) 2012-10-01 2019-03-05 Adaptive Biotechnologies Corp. Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US10246701B2 (en) 2014-11-14 2019-04-02 Adaptive Biotechnologies Corp. Multiplexed digital quantitation of rearranged lymphoid receptors in a complex mixture
US10323276B2 (en) 2009-01-15 2019-06-18 Adaptive Biotechnologies Corporation Adaptive immunity profiling and methods for generation of monoclonal antibodies
US10385475B2 (en) 2011-09-12 2019-08-20 Adaptive Biotechnologies Corp. Random array sequencing of low-complexity libraries
US10392663B2 (en) 2014-10-29 2019-08-27 Adaptive Biotechnologies Corp. Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from a large number of samples
US10428325B1 (en) 2016-09-21 2019-10-01 Adaptive Biotechnologies Corporation Identification of antigen-specific B cell receptors
US11041202B2 (en) 2015-04-01 2021-06-22 Adaptive Biotechnologies Corporation Method of identifying human compatible T cell receptors specific for an antigenic target
US11047008B2 (en) 2015-02-24 2021-06-29 Adaptive Biotechnologies Corporation Methods for diagnosing infectious disease and determining HLA status using immune repertoire sequencing
US11066705B2 (en) 2014-11-25 2021-07-20 Adaptive Biotechnologies Corporation Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
US11248253B2 (en) 2014-03-05 2022-02-15 Adaptive Biotechnologies Corporation Methods using randomer-containing synthetic molecules
US11254980B1 (en) 2017-11-29 2022-02-22 Adaptive Biotechnologies Corporation Methods of profiling targeted polynucleotides while mitigating sequencing depth requirements
US11965211B2 (en) 2018-10-08 2024-04-23 Aqtual, Inc. Methods for sequencing samples

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005014850A2 (en) * 2003-08-06 2005-02-17 University Of Massachusetts Systems and methods for analyzing nucleic acid sequences
US20050153284A1 (en) * 2000-06-30 2005-07-14 Zeno Foldes-Papp Single molecule sequencing method
US20050260614A1 (en) * 2000-07-07 2005-11-24 Susan Hardin Methods for real-time single molecule sequence determination
US20060046258A1 (en) * 2004-02-27 2006-03-02 Lapidus Stanley N Applications of single molecule sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GREENLEAF W.J. ET AL.: "Single-molecule, motion-based DNA sequencing using RNA polymerase", SCIENCE, vol. 313, 11 August 2006 (2006-08-11), pages 801 *
HARRIS T.D. ET AL.: "Single-Molecule DNA Sequencing of a Viral Genome", SCIENCE, vol. 320, no. 5872, 4 April 2008 (2008-04-04), pages 106 - 109 *

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8583380B2 (en) 2008-09-05 2013-11-12 Aueon, Inc. Methods for stratifying and annotating cancer drug treatment options
US8236503B2 (en) 2008-11-07 2012-08-07 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
US8748103B2 (en) 2008-11-07 2014-06-10 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US9217176B2 (en) 2008-11-07 2015-12-22 Sequenta, Llc Methods of monitoring conditions by sequence analysis
US8691510B2 (en) 2008-11-07 2014-04-08 Sequenta, Inc. Sequence analysis of complex amplicons
US9228232B2 (en) 2008-11-07 2016-01-05 Sequenta, LLC. Methods of monitoring conditions by sequence analysis
US8795970B2 (en) 2008-11-07 2014-08-05 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
US10155992B2 (en) 2008-11-07 2018-12-18 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US10519511B2 (en) 2008-11-07 2019-12-31 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US9523129B2 (en) 2008-11-07 2016-12-20 Adaptive Biotechnologies Corp. Sequence analysis of complex amplicons
US10246752B2 (en) 2008-11-07 2019-04-02 Adaptive Biotechnologies Corp. Methods of monitoring conditions by sequence analysis
US9512487B2 (en) 2008-11-07 2016-12-06 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US9528160B2 (en) 2008-11-07 2016-12-27 Adaptive Biotechnologies Corp. Rare clonotypes and uses thereof
US8628927B2 (en) 2008-11-07 2014-01-14 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US8507205B2 (en) 2008-11-07 2013-08-13 Sequenta, Inc. Single cell analysis by polymerase cycling assembly
US10266901B2 (en) 2008-11-07 2019-04-23 Adaptive Biotechnologies Corp. Methods of monitoring conditions by sequence analysis
US10760133B2 (en) 2008-11-07 2020-09-01 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US9506119B2 (en) 2008-11-07 2016-11-29 Adaptive Biotechnologies Corp. Method of sequence determination using sequence tags
US9347099B2 (en) 2008-11-07 2016-05-24 Adaptive Biotechnologies Corp. Single cell analysis by polymerase cycling assembly
US9365901B2 (en) 2008-11-07 2016-06-14 Adaptive Biotechnologies Corp. Monitoring immunoglobulin heavy chain evolution in B-cell acute lymphoblastic leukemia
US11001895B2 (en) 2008-11-07 2021-05-11 Adaptive Biotechnologies Corporation Methods of monitoring conditions by sequence analysis
US9416420B2 (en) 2008-11-07 2016-08-16 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US10323276B2 (en) 2009-01-15 2019-06-18 Adaptive Biotechnologies Corporation Adaptive immunity profiling and methods for generation of monoclonal antibodies
US11214793B2 (en) 2009-06-25 2022-01-04 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US9809813B2 (en) 2009-06-25 2017-11-07 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US11905511B2 (en) 2009-06-25 2024-02-20 Fred Hutchinson Cancer Center Method of measuring adaptive immunity
US9043160B1 (en) 2009-11-09 2015-05-26 Sequenta, Inc. Method of determining clonotypes and clonotype profiles
EP2566984A4 (en) * 2010-05-07 2015-05-20 Univ Leland Stanford Junior Measurement and comparison of immune diversity by high-throughput sequencing
US9290811B2 (en) 2010-05-07 2016-03-22 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US10774382B2 (en) 2010-05-07 2020-09-15 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US9234240B2 (en) 2010-05-07 2016-01-12 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US10196689B2 (en) 2010-05-07 2019-02-05 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US10072283B2 (en) 2010-09-24 2018-09-11 The Board Of Trustees Of The Leland Stanford Junior University Direct capture, amplification and sequencing of target DNA using immobilized primers
US10385475B2 (en) 2011-09-12 2019-08-20 Adaptive Biotechnologies Corp. Random array sequencing of low-complexity libraries
US9279159B2 (en) 2011-10-21 2016-03-08 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
US9181590B2 (en) 2011-10-21 2015-11-10 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
US20140330162A1 (en) * 2011-12-08 2014-11-06 Koninklijke Philips N.V. Biological cell assessment using whole genome sequence and oncological therapy planning using same
CN104106072A (en) * 2011-12-08 2014-10-15 皇家飞利浦有限公司 Biological cell assessment using whole genome sequence and oncological therapy planning using same
US9824179B2 (en) 2011-12-09 2017-11-21 Adaptive Biotechnologies Corp. Diagnosis of lymphoid malignancies and minimal residual disease detection
US9499865B2 (en) 2011-12-13 2016-11-22 Adaptive Biotechnologies Corp. Detection and measurement of tissue-infiltrating lymphocytes
US10077478B2 (en) 2012-03-05 2018-09-18 Adaptive Biotechnologies Corp. Determining paired immune receptor chains from frequency matched subunits
US9150905B2 (en) 2012-05-08 2015-10-06 Adaptive Biotechnologies Corporation Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US10214770B2 (en) 2012-05-08 2019-02-26 Adaptive Biotechnologies Corp. Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US9371558B2 (en) 2012-05-08 2016-06-21 Adaptive Biotechnologies Corp. Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US10894977B2 (en) 2012-05-08 2021-01-19 Adaptive Biotechnologies Corporation Compositions and methods for measuring and calibrating amplification bias in multiplexed PCR reactions
US10221461B2 (en) 2012-10-01 2019-03-05 Adaptive Biotechnologies Corp. Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US11180813B2 (en) 2012-10-01 2021-11-23 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US10150996B2 (en) 2012-10-19 2018-12-11 Adaptive Biotechnologies Corp. Quantification of adaptive immune cell genomes in a complex mixture of cells
US10774383B2 (en) 2013-02-04 2020-09-15 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US9909180B2 (en) 2013-02-04 2018-03-06 The Board Of Trustees Of The Leland Stanford Junior University Measurement and comparison of immune diversity by high-throughput sequencing
US10058839B2 (en) 2013-03-15 2018-08-28 Lineage Biosciences, Inc. Methods and compositions for tagging and analyzing samples
US10722858B2 (en) 2013-03-15 2020-07-28 Lineage Biosciences, Inc. Methods and compositions for tagging and analyzing samples
US11161087B2 (en) 2013-03-15 2021-11-02 Lineage Biosciences, Inc. Methods and compositions for tagging and analyzing samples
US9708657B2 (en) 2013-07-01 2017-07-18 Adaptive Biotechnologies Corp. Method for generating clonotype profiles using sequence tags
US10526650B2 (en) 2013-07-01 2020-01-07 Adaptive Biotechnologies Corporation Method for genotyping clonotype profiles using sequence tags
US10077473B2 (en) 2013-07-01 2018-09-18 Adaptive Biotechnologies Corp. Method for genotyping clonotype profiles using sequence tags
US11248253B2 (en) 2014-03-05 2022-02-15 Adaptive Biotechnologies Corporation Methods using randomer-containing synthetic molecules
US10066265B2 (en) 2014-04-01 2018-09-04 Adaptive Biotechnologies Corp. Determining antigen-specific t-cells
US10435745B2 (en) 2014-04-01 2019-10-08 Adaptive Biotechnologies Corp. Determining antigen-specific T-cells
US11261490B2 (en) 2014-04-01 2022-03-01 Adaptive Biotechnologies Corporation Determining antigen-specific T-cells
EP3209791A4 (en) * 2014-10-22 2018-06-06 Ibis Biosciences, Inc. Bacterial epigenomic analysis
US10392663B2 (en) 2014-10-29 2019-08-27 Adaptive Biotechnologies Corp. Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from a large number of samples
US10246701B2 (en) 2014-11-14 2019-04-02 Adaptive Biotechnologies Corp. Multiplexed digital quantitation of rearranged lymphoid receptors in a complex mixture
US11066705B2 (en) 2014-11-25 2021-07-20 Adaptive Biotechnologies Corporation Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
US11047008B2 (en) 2015-02-24 2021-06-29 Adaptive Biotechnologies Corporation Methods for diagnosing infectious disease and determining HLA status using immune repertoire sequencing
US11041202B2 (en) 2015-04-01 2021-06-22 Adaptive Biotechnologies Corporation Method of identifying human compatible T cell receptors specific for an antigenic target
US10428325B1 (en) 2016-09-21 2019-10-01 Adaptive Biotechnologies Corporation Identification of antigen-specific B cell receptors
US11254980B1 (en) 2017-11-29 2022-02-22 Adaptive Biotechnologies Corporation Methods of profiling targeted polynucleotides while mitigating sequencing depth requirements
US11965211B2 (en) 2018-10-08 2024-04-23 Aqtual, Inc. Methods for sequencing samples

Similar Documents

Publication Publication Date Title
WO2008147879A1 (en) Automated method and device for dna isolation, sequence determination, and identification
JP7051900B2 (en) Methods and systems for the generation and error correction of unique molecular index sets with non-uniform molecular lengths
Cummings et al. Using DNA microarrays to study host-microbe interactions.
EP2518162B1 (en) Multitag sequencing and ecogenomics analysis
AU2006320541B2 (en) Methods and systems for designing primers and probes
US20120129794A1 (en) Apparati, methods, and compositions for universal microbial diagnosis, detection, quantification, and specimen-targeted therapy
CN108138227A (en) Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIs)
US20140155283A1 (en) Microarray for detecting viable organisms
JP2013531983A (en) Nucleic acids for multiplex biological detection and methods of use and production thereof
JP2003021630A (en) Method of providing clinical diagnosing service
CA3067418C (en) Methods for accurate computational decomposition of dna mixtures from contributors of unknown genotypes
WO2013173774A2 (en) Molecular inversion probes
Schürch et al. Genomic tracing of epidemics and disease outbreaks
US20080228406A1 (en) System and method for fungal identification
US20220251669A1 (en) Compositions and methods for assessing microbial populations
Alford et al. DNA analysis in forensics, disease and animal/plant identification
US20200082911A1 (en) Analysis method, information processing apparatus, gene analysis system and non-transitory storage medium
Leong et al. State-Wide genomic and epidemiological analyses of Vancomycin-Resistant Enterococcus faecium in Tasmania’s public hospitals
JP2024504062A (en) chromosome interactions
US7531309B2 (en) PCR based capsular typing method
WO2023058522A1 (en) Method for analyzing structural polymorphism, primer pair set, and method for designing primer pair set
WO2021192395A1 (en) Method and program for calculating base methylation levels
Mandal et al. Rapid Microbial Genome Sequencing Techniques and Applications
WO2024030342A1 (en) Methods and compositions for nucleic acid analysis
CN108271396A (en) Genetic testing for predicting resistance of Stenotrophomonas species against antimicrobial agents

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 08769611; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 08769611; Country of ref document: EP; Kind code of ref document: A1