WO2013173774A2 - Molecular inversion probes - Google Patents

Molecular inversion probes

Info

Publication number
WO2013173774A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
probe
sequence
probes
strains
Application number
PCT/US2013/041675
Other languages
English (en)
Other versions
WO2013173774A3 (fr)
Inventor
Philip Rolfe
Alexander SHEH
Original Assignee
Pathogenica, Inc.
Application filed by Pathogenica, Inc.
Publication of WO2013173774A2
Publication of WO2013173774A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701 Specific hybridization probes
    • C12Q1/706 Specific hybridization probes for hepatitis
    • C12Q1/707 Specific hybridization probes for hepatitis non-A, non-B Hepatitis, excluding hepatitis D
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • Detection of different organisms can be important in many applications, such as clinical diagnosis (for example, detection of viruses, parasites, bacteria, and fungi), clinical monitoring (for example, viral/bacterial load, pathogen biomarkers, and biomarkers of a host or subject), environmental biosurveillance (for example, hospital-acquired infections, biological agents, and controlled genetically modified organisms), and biological safety (detection of contaminants or foreign organisms in the blood supply, biologic agents, food/water, agriculture, livestock pathogen surveillance and breeding, genetically modified crop pathogens and breeding, biodefense such as large-volume air/water supply and surface swabs, and rapid identification from blood samples). Detection can be performed by sequencing and used for viral or cancer detection, such as HCV screening, as well as for pathogen genotyping.
  • Pathogen genotyping can be used to sub-type or identify the genotype of a pathogen, to identify the pathogen as a high or low risk variant, quantify the level of pathogen in a sample or host, detect drug resistant variants, or identify toxin genes of the pathogen.
  • viral genotyping can be used for HIV or HCV genotyping.
  • Infectious diseases can be caused by a wide variety of pathogens, including viruses, bacteria, archaea, fungi, and other eukaryotes (both single cellular and multicellular), many of which can be cultured only with great difficulty or not at all, hindering detection and selection of proper clinical intervention.
  • the invention can be directed to sets of nucleic acid probes for multiplex detection of hepatitis viruses and methods of using the probes.
  • a single-stranded or predominately single-stranded nucleic acid probe for identifying one or more hepatitis viruses, or variants, mutations, subtypes, or strains thereof, in a sample, the probe comprising a first probe sequence that hybridizes to a 5' end of a target sequence in a genomic region of one or more hepatitis viruses, or variants, mutations, subtypes, or strains thereof; a second probe sequence that hybridizes to a 3' end of the target sequence of the one or more hepatitis viruses, or variants, mutations, subtypes, or strains thereof; and a backbone sequence between the first and second probe sequences; wherein the first probe sequence can be an odd-numbered SEQ ID NO selected from SEQ ID NO: 1 to SEQ ID NO: 873, and the second probe sequence has a SEQ ID NO that is one number greater than the SEQ ID NO of the first probe sequence.
  • the backbone sequence comprises a detectable moiety and a primer-binding sequence.
  • the backbone sequence is selected from SEQ ID NO: 875 or SEQ ID NO: 876.
  • the probe is a molecular inversion probe (MIP).
  • the backbone sequence further comprises a second primer-binding sequence.
  • the detectable moiety is a barcode sequence.
  • composition comprising a plurality of probes according to claim 1, wherein one or more probes in the plurality of probes bind to a target sequence.
  • some of the genomic regions chosen as target sequences are known to be highly conserved such that each strain or variant tends to contain a single version of the region, thus allowing for strain or mutant identification.
  • the mutants confer resistance to an antiviral drug or class of drugs.
  • the plurality of probes when implemented in an assay, allows for detecting and distinguishing at least 2 nucleotide variants resulting in amino acid substitutions within virus proteins.
  • the nucleotide variants resulting in amino acid substitutions within virus proteins include codons predicted to modulate antiviral drug resistance.
  • said assay has at least 99% specificity or sensitivity in identifying said hepatitis viruses, or variants, mutations, subtypes, or strains thereof.
  • the plurality of probes, when implemented in an assay allows for detecting and distinguishing over 2 nucleotide variants resulting in synonymous codons.
  • the plurality of probes, when implemented in an assay allows for estimating codon usage by virus variants.
  • the plurality of probes, when implemented in an assay allows for detecting and distinguishing at least 2 different strains, mutants, variants, or subtypes of a hepatitis virus.
  • the hepatitis virus includes hepatitis A virus (HAV), hepatitis B virus (HBV), hepatitis C virus (HCV), hepatitis D virus (HDV), and hepatitis E virus (HEV).
  • HAV hepatitis A virus
  • HBV hepatitis B virus
  • HCV hepatitis C virus
  • HDV hepatitis D virus
  • HEV hepatitis E virus
  • the at least 2 different strains, mutants, variants, or subtypes of a hepatitis virus include all known hepatitis virus strains, mutants, variants, or subtypes.
  • the at least 2 different strains, mutants, variants, or subtypes of HCV comprise HCV-1a and HCV-1b.
  • each probe sequence in the group consisting of all sequences from SEQ ID NO: 1 to SEQ ID NO: 874 can be contained in at least one probe in the plurality of probes.
  • a composition further comprises extracted nucleic acids from a test sample.
  • the extracted nucleic acids comprise RNA or DNA.
  • the RNA is operable to derive cDNA therefrom.
  • the extracted nucleic acids can be from a biological sample.
  • the biological sample can be from a human patient.
  • a composition further comprises at least one sample internal calibration standard nucleic acid.
  • a composition further comprises at least one probe that specifically hybridizes with the sample internal calibration standard nucleic acid.
  • the plurality of probes includes at least 2 probes.
  • the plurality of probes when implemented in an assay, allows for detecting and distinguishing at least 2 resistance mutations.
  • the resistance mutations confer resistance to two or more drugs.
  • the resistance mutations confer resistance to two or more classes of drugs.
  • the resistance mutations comprise all known resistance mutations of hepatitis virus.
  • the resistance mutations comprise mutations within the genes NS3, NS5a, NS5b, and combinations thereof.
  • some of the target sequences are known to be highly variable such that each strain or substrain can contain a different version of the region, thus enabling strain or substrain identification and differentiation.
  • kits comprising any of the compositions described herein, reagents, and instructions for use to capture the target sequences.
  • the kit further comprises reagents for DNA extraction.
  • the probes, reagents, and instructions allow the capture reaction to be performed in a single tube. In one embodiment, the probes, reagents, and instructions allow the capture reaction to be performed in less than three hours.
  • a method of detecting the presence of one or more strains or mutations of hepatitis virus in a test sample from a host comprising: a) contacting a test sample with any of the compositions described herein to form a mixture; b) capturing one or more regions of interest in a hepatitis virus genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the hepatitis virus genome to form one or more circularized probes; and c) detecting the one or more captured regions of interest, thereby identifying the presence and/or quantifying the amount of the one or more strains or mutations of hepatitis virus.
  • the circularized probe can be detected by hybridization.
  • the hybridization can be to a microarray comprising at least one feature that specifically hybridizes to the circularized probe.
  • the test sample can be obtained from a human subject.
  • the test sample can be a biopsy, blood draw, pin prick, saliva, or other tissue.
  • the method further comprises determining the genotype of said host from said sample.
  • the identification and/or quantification results and the genotype of the host are reported to a physician. In one embodiment, determining the genotype of said host and identifying the presence and/or quantifying the amount of the one or more strains or mutations of hepatitis virus are performed simultaneously. In one embodiment, the identification and/or quantification results are reported to a physician immediately after the physician orders the results from the determination of the genotype of the host.
  • an additional non-host organism is identified.
  • an additional strain or mutant of said hepatitis virus is identified.
  • the one or more regions of interest can be captured by polymerase-dependent extension from the 3' terminus of the first probe sequence of a probe in the plurality of probes.
  • the one or more regions of interest can be captured by sequence-specific ligation of a linking oligonucleotide.
  • the method further comprises the step of amplifying the one or more circularized probes to form a plurality of amplicons containing one or more captured regions of interest.
  • the method further comprises the step of treating the mixture with a nuclease to remove linear nucleic acids between steps (b) and (c).
  • the method further comprises the step of linearizing the one or more circularized probes by cleavage with a site-specific endonuclease.
  • the one or more regions of interest are sequenced, either completely, from one end, or from both ends, such that a mutation within one or more regions of interest will be detected if it is present in at least 5%, 1%, 0.1%, or 0.01% of the population of cells and viruses present in the sample.
  • the one or more regions of interest provide information about the presence, characteristics, mutations, strains, or status of viruses if the viruses are present in at least 5%, 1%, 0.1%, or 0.01% of the population of cells and viruses present in the sample.
  • the methods further comprise the step of sequencing the one or more regions of interest or one or more parts thereof. In one embodiment, the sequencing can be dideoxy sequencing or next-generation sequencing.
  • some portion of a plurality of the regions of interest or parts thereof are sequenced simultaneously and then mapped to a database of reference sequences to determine the most likely identities of the strains or mutants present in the sample. In one embodiment, some portion of a plurality of the regions of interest or parts thereof are sequenced simultaneously and then assembled into one or more consensus sequences.
  • the method further comprises the steps of: (a) comparing the sequences of the one or more captured regions of interest to a database of sequences of known hepatitis virus strains or to a database of known hepatitis virus mutations, and (b) compiling the results of the comparisons.
  • the database comprises all known hepatitis virus strains, resistance mutations, or both.
  • the sequences of the one or more captured regions of interest are compared to the database individually.
  • the method further comprises the steps of (a) generating one or more consensus sequences of the one or more captured regions of interest from sequences determined using two or more probes, (b) comparing the consensus sequence to a database of sequences of known hepatitis virus strains, a database of known hepatitis virus mutations, or a combination thereof, and (c) determining the abundance or proportion of one or more strains and/or mutations present in the sample based on the results of the comparing step.
  • the one or more probes comprise the same single-stranded nucleic acid probes hybridized to a first and second target sequence in the hepatitis virus genome.
  • the one or more probes comprise different single-stranded nucleic acid probes hybridized to a first and second target sequence in the hepatitis virus genome.
  • the sequence of interest contains one or more of a mutation, a single nucleotide polymorphism (SNP), an insertion, a deletion, or an indel.
  • the method further comprises the step of analyzing the sequence of the captured region of interest with respect to the sequence of known hepatitis virus genomes and a model of sequencing errors to estimate the proportion or abundance of one or more hepatitis virus strains, mutations, or both present in the sample.
  • the methods further comprise the step of adding a sample internal calibration standard to the test sample. In one embodiment, the method further comprises the steps of adding a probe that specifically hybridizes with the sample internal calibration standard and detecting the sample internal calibration standard. In some embodiments, the methods further comprise the step of formatting the results to inform physician decision making. In one embodiment, the formatting includes providing an estimated quantity of one or more hepatitis virus strains of interest, mutations, sets of mutations of interest, or a combination thereof. In one embodiment, the formatted results comprise a therapeutic recommendation based on the one or more hepatitis virus strains detected.
  • a method of treating a subject infected with hepatitis virus comprising performing any of the methods described herein and administering a suitable therapy to the subject based on the at least one hepatitis virus strain detected.
  • a method of diagnosing a subject as being infected with hepatitis virus comprising performing any of the methods described herein and diagnosing the subject as being infected with hepatitis virus if at least one hepatitis virus strain can be detected.
  • any of the methods provided herein simultaneously detect and genotype a hepatitis virus, while also assaying for, or detecting, a set of pathogens common in virally infected individuals.
  • the method also detects the genotype of one or more human loci known or suspected to influence the choice or effectiveness of an antiviral treatment. In one embodiment, some of the hepatitis virus genotypes are known or suspected to influence the effectiveness of an antiviral treatment.
  • any of the methods provided herein allow for prediction of drug resistance of a hepatitis virus infection using viral markers present at less than 500 copies per ml of blood. In some aspects, any of the methods provided herein allow for prediction of drug resistance of a hepatitis virus infection using viral markers present at between 10 and 10,000 copies per ml of blood.
  • any of the methods provided herein can achieve 99% or greater sensitivity and specificity for hepatitis virus detection.
  • any of the methods provided herein can detect any pair of hepatitis virus strains from some list and determine which pair of strains is present.
  • any of the methods provided herein can predict liver cancer caused by hepatitis virus with greater than 80% sensitivity and 80% specificity based on the presence of hepatitis virus strains.
  • any of the methods provided herein can predict liver cancer caused by hepatitis virus with greater than 80% sensitivity and 80% specificity based on the presence of hepatitis virus strains and a human genotype.
  • In some aspects, any of the methods provided herein can simultaneously detect any combination of at least two hepatitis virus strains in addition to detecting the human's genotype.
  • any of the methods provided herein can simultaneously detect any combination of at least two hepatitis virus strains in addition to detecting at least two other viral infections.
  • any of the methods provided herein can simultaneously detect any combination of at least two hepatitis virus strains in addition to detecting the human's genotype and the presence of at least two other viral infections.
  • the reaction of any of the methods provided herein is performed in a single tube.
  • the hepatitis virus includes all known hepatitis virus strains, mutants, variants, or subtypes.
  • the resistance mutations include all known hepatitis virus drug resistance mutations.
  • the hepatitis virus includes HCV-1a and HCV-1b strains.
  • Figure 1 depicts a schematic of the workflow from total blood RNA.
  • Figure 2 depicts selected coverage of resistance markers using an HCV probe set.
  • Figure 3 depicts selected coverage of resistance markers using an HCV probe set.
  • Figure 4A depicts a plot of codon number vs. alternate reads/total reads of a sample for selected observed NS3 mutations.
  • the codon numbers depicted from left to right are 36, 40, 41, 43, 54, 55, 72, 80,
  • Figure 4B depicts a plot of codon number vs. alternate reads/total reads of a sample for NS3 mutations associated with drug resistance.
  • the codon numbers depicted from left to right are 36, 41, 43, 54, 55, 80, 109, 153, 155, 156, 168, 170, and 176.
  • Figure 4C depicts a plot of codon number vs. alternate reads/total reads of a sample for selected observed NS5b mutations.
  • the codon numbers depicted from left to right are 50, 71, 79, 95, 98, 117, 142,
  • Figure 4D depicts a plot of codon number vs. alternate reads/total reads of a sample for NS5b mutations associated with drug resistance.
  • the codon numbers depicted from left to right are 50, 55, 71,
  • Figure 5A depicts a plot of codon number vs. alternate reads/total reads of a sample for NS3 mutations associated with drug resistance.
  • the codon numbers depicted from left to right are 36, 41, 43,
  • Figure 5B depicts a plot of codon number vs. alternate reads/total reads of a sample for NS5a mutations associated with drug resistance.
  • the codon numbers depicted from left to right are 23, 28, 30, 31, 32, 58, and 93.
  • Figure 5C depicts a plot of codon number vs. alternate reads/total reads of a sample for NS5b mutations associated with drug resistance.
  • the codon numbers depicted from left to right are 50, 55, 71,
  • Figure 6A depicts a plot of codon number vs. alternate reads/total reads of a sample for selected observed NS3 mutations.
  • the codon numbers depicted from left to right are 36, 41, 43, 54, 55, 61, 62, 64,
  • Figure 6B depicts a plot of codon number vs. alternate reads/total reads of a sample for selected observed NS5a mutations.
  • the codon numbers depicted from left to right are 17, 23, 25, 28, 30, 31, 32,
  • Figure 6C depicts a plot of codon number vs. alternate reads/total reads of a sample for selected observed NS5b mutations.
  • the codon numbers depicted from left to right are 50, 71, 85, 96, 114, 130, 142, 168, 270, 288, 316, 368, 395, 414, 419, 426, 440, 442, 447, 451, 495, 499, 555, and 558.
  • Figure 7 depicts a table of selected mutations in HCV genes detected in clinical samples and the respective associated drug resistance.
  • Figure 8 depicts plots of correlation between runs of samples and a table of the frequency at which specific codons were detected in samples.
  • the Pathogenica BioDetection™ system encompasses a library of >75,000 DxSeq probes that select and report unique DNA sequences of target genes/mutations. This technology enables the sequencing of dozens to thousands of genomic regions that may be present in a sample.
  • the Pathogenica platform can be sequencer-agnostic: the DxSeq probes can be used in conjunction with all commercially available sequencing platforms.
  • the assay can be performed in a single tube, in under 3 hours, and the probe technology can provide significant gene region fold coverage (500 to >50,000 read depth) which can be used to determine quantitative variant information.
  • Objectives: To develop an assay to genotype HCV and to detect NS3, NS5a, and NS5b drug resistance mutations in HCV genotype 1a and 1b clinical samples using Pathogenica DxSeq probe technology.
  • Results: The HCV DxSeq assay enabled genotyping of HCV viral variants present within the sample, correctly identifying HCV-1a and HCV-1b infections.
  • DxSeq probes were able to sequence regions previously reported to confer antiviral drug resistance in both NS5a and NS5b of genotype 1. Resistance locus capture size was between 150 and 250 bases, and read depth ranged from 50- to 4,756-fold across 215 probes. More than 20 nucleotide variants resulting in amino acid substitutions within viral proteins were identified, including codons predicted to modulate antiviral drug resistance, and more than 50 nucleotide variants generating synonymous codons were sequenced. Quantitative variant information provided by DxSeq probes allowed precise percentage estimates of codon usage by viral variants within these patient samples.
  • the HCV DxSeq assay enables sensitive, simultaneous sequencing of a broad range of genotype and resistance loci from a patient sample in a single-tube assay. This low-cost technology can enable broader application of new sequencing platforms to clinical genotyping.
  • a system for detecting an organism such as a pathogen, as well as a method for using the system to identify and detect the organism.
  • the system can comprise a probe or plurality of probes.
  • Embodiments of the present invention include optimized nucleic acid probes, and methods of using them, that enable the skilled artisan to simultaneously detect a plurality of organisms in a complex mixture, without the need for culturing.
  • the invention can be based, at least in part, on the discovery of sequences, identified from large sets of query hepatitis C virus (HCV) sequences such as whole genomes, that can be used in multiplex diagnostic assays that dramatically reduce assay time and cost compared to conventional diagnostics.
  • nucleic acids and methods of the invention enable the skilled artisan to identify hepatitis C virus and differentiate between closely related strains thereof based on the sequence of regions containing, for example, single nucleotide polymorphisms (SNPs), insertions, deletions, or indels (sites where a colocalized insertion and deletion has occurred, resulting in a net gain or loss in nucleotides).
  • a further advantage of the nucleic acid probes and methods of the invention can be the ability to interrogate specific host loci in parallel with detecting infectious agents, e.g., for host genotyping.
  • nucleic acid probes and methods of the invention may be further multiplexed and used in automated systems, such as microplates, for high throughput processing of large numbers of samples by centralized laboratory, hospital, and/or diagnostic facilities.
  • a composition comprising a plurality of probes can comprise extracted nucleic acids from a test sample.
  • the extracted nucleic acids may be from a biological sample.
  • the biological sample may be from a human patient.
  • a composition comprising a plurality of probes can comprise at least one sample internal calibration standard nucleic acid.
  • a composition comprising a plurality of probes can comprise at least one probe that specifically hybridizes with the sample internal calibration standard nucleic acid.
  • kits comprising the composition comprising a plurality of probes as described herein and instructions for use.
  • a kit can also comprise reagents for obtaining a sample (e.g., swabs), and/or reagents for extracting DNA, and/or enzymes, such as polymerase and/or ligase to capture a region of interest.
  • a system for detection of an organism can comprise a mixture or probe set comprising a plurality of probes.
  • the target organism for a particular probe may be any organism, such as a viral, bacterial, fungal, archaeal, or eukaryotic organism, including single-celled and multicellular eukaryotes.
  • a target organism can be a pathogen.
  • a probe can be a sequence that hybridizes to another sequence.
  • a probe can be a linear, unbranched polynucleic acid.
  • a probe can comprise two homologous probe sequences separated by a backbone sequence, where the first homologous probe sequence can be at a first terminus of the nucleic acid and the second homologous probe sequence can be at the second terminus of the nucleic acid, and where the probe can be capable of circularizing capture of a region of interest of at least 2 nucleotides. In circularizing capture, the probe becomes circularized by incorporating the sequence complementary to a region of interest.
  • One aspect of the invention provides mixtures of circularizing capture probes suitable for sensitive, rapid, and highly specific detection of one or more hepatitis A, B, and/or C viruses in complex samples.
  • Basic design principles for circularizing probes such as simple molecular inversion probes (MIPs), as well as related capture probes, are known in the art and described in, for example, Nilsson et al., Science, 265:2085-88 (1994); Hardenbol et al., Genome Res., 15:269-75 (2005); Akhras et al., PLoS One, 2(9):e915 (2007); Porreca et al., Nature Methods, 4:931-36 (2007); Deng et al., Nat. Biotechnol., 27(4):353-60 (2009); U.S. Patent Nos. 7,700,323 and 6,858,412; and International Publications WO/1999/049079 and WO/1995/022623.
  • a probe set can include a large number of probes, e.g., 10, 20, 30, 40, 50, 100, 200, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000, 20000, 40000, 80000, or more.
  • a probe set can include one or more probes directed to a large number of different target organisms, e.g., at least 10, 20, 40, 60, 80, 100, 150, 200, 250, or more different target organisms.
  • a mixture including one or more probes to a plurality of target organisms can contain only one probe to a target organism.
  • a mixture can contain more than one probe to a target organism, such as about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more probes for a target organism.
  • a mixture can further include probes with homologous probe sequences that specifically hybridize to the host genome for applications such as host genotyping.
  • Mixtures can further comprise sample internal calibration standards.
  • Mixtures can comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 60, 80, 100, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 3000, 4000, 5000, 10000, 20000, 40000, 80000, or more probes.
  • Mixtures can capture at least one sequence for each of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 60, 80, 100, 150, 200, 250, 300, 400, 500, or 1000 different strains of HCV.
  • a mixture can comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 65, 70, 75, or 80 probes comprising the homologous probe sequence pairs listed in Table 1.
  • Each probe in the plurality of probes can detect the different organisms, pathogens, different strains, variants or subtypes of a pathogen or different strains, variants or sub-types of different pathogens, with high specificity, sensitivity, or both.
  • Sensitivity can be determined by: (number of true positives)/(number of true positives + number of false negatives).
  • the specificity can be determined by: (number of true negatives)/(number of true negatives + number of false positives).
  • each probe may detect or distinguish different organisms, different pathogens, different strains, variants or sub-types of a pathogen or different strains, variants or sub-types of different pathogens with at least 70% sensitivity, specificity, or both, such as at least 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89%, or at least 90%, sensitivity, specificity, or both in an assay.
  • Each probe may detect or distinguish different organisms, different pathogens, different strains, variants or sub-types of a pathogen or different strains, variants or sub-types of different pathogens with at least 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% sensitivity, specificity, or both in an assay.
  • the confidence level for determining the specificity, sensitivity, or both in an assay may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99%.
  • a plurality of probes can be used to detect at least 20, 30, 40, 50, 60, 65, 70, 80, 90 or 100 loci for a phenotype of an organism in at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, or 2000 strains, variants or subtypes.
  • a plurality of probes can be used to detect approximately 65 viral drug resistance loci in approximately 1500 strains.
  • a plurality of probes can be used to detect approximately 20 or more viral drug/phenotype loci in approximately 50 or more strains.
  • a plurality of probes can be used for species or strain detection or distinction in approximately 20 or 50 or more viral genomes.
  • the plurality of probes can also be used to detect human or livestock diseases, such as detecting approximately 20 or more bacterial and viral species or types, such as detecting HCV strains.
  • Probes in a mixture can have similar bulk properties (such as homologous probe sequence length, homologous probe sequence Tm, length of the captured region of interest, and lack of secondary structure) or fall within ranges of similar values.
  • a T m of homologous probe sequences in a mixture of probes can be within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 °C of each other, or can have the same T m .
  • Homologous probe sequences in a mixture of probes can all be within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide in length of each other.
  • the length of the region of interest between the target sequences of probes in a mixture may vary over a range of values, such as from 2 to 20, 20 to 100, 20 to 200, 40 to 300, 100 to 300, 100 to 500, or 80 to 500 nucleotides.
  • a length of the region of interest between the target sequences of probes in a mixture can be from 100 to 489 nucleotides. Regions of interest can be within 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length of each other.
  • Barcode lengths may also vary, and can be within 25, 20, 15, 10, or 5 nucleotides of each other. Barcodes can be the same length.
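The Tm and length constraints above can be checked on a candidate mixture with a short script. This is a minimal sketch assuming the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, as a rough stand-in for a proper nearest-neighbour Tm model; the function names and default tolerances are hypothetical.

```python
def wallace_tm(seq):
    """Approximate Tm (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).
    A rough rule of thumb only; nearest-neighbour models are more accurate."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def mixture_is_uniform(arms, max_tm_spread=10, max_len_spread=3):
    """Check that all homologous probe sequences in a mixture have Tm values
    within max_tm_spread deg C and lengths within max_len_spread nucleotides."""
    tms = [wallace_tm(a) for a in arms]
    lengths = [len(a) for a in arms]
    return (max(tms) - min(tms) <= max_tm_spread
            and max(lengths) - min(lengths) <= max_len_spread)

# The arm pair listed as SEQ ID 81 and SEQ ID 82 in Table 1
arms = ["GCCCAATAGACCCCTGGTCTGC", "GGTCGACATTGGTGTACATCT"]
print([wallace_tm(a) for a in arms])             # [72, 62]
print(mixture_is_uniform(arms))                   # True (Tm spread is exactly 10)
print(mixture_is_uniform(arms, max_tm_spread=5))  # False
```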
  • a plurality of probes can detect at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, or 2000 different organisms or pathogens.
  • a plurality of probes can detect at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1250, 1500, 1750, or 2000 different strains, variants or sub-types of a pathogen or different strains or sub-types of different pathogens.
  • a probe set can be used to identify and/or detect at least 2 different viral, bacterial, or fungal strains.
  • a probe set can identify at least 50 different organisms, such as 50 different pathogens, or 50 different strains or subtypes of a pathogen, such as HCV.
  • a probe set can comprise probes capable of detecting a single molecule of a pathogen, thereby detecting, distinguishing or identifying the pathogen.
  • Mixtures can comprise capture reaction products and amplification reaction products from different test samples, as further described below.
  • different capture reaction products and/or amplification reaction products can be combined and multiplexed before detection, i.e., for concurrent detection. This can be accomplished using barcode sequences that identify the test samples.
  • capture reaction products from test sample A can include a sample A-specific barcode.
  • capture reaction products from sample B can include a sample B-specific barcode.
  • all sequences in the sample A capture reaction products can be identified by the presence of the sample A-specific barcode sequence.
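After pooled products are sequenced, reads can be sorted back to their test samples by barcode. The sketch below assumes the sample barcode occupies the first `barcode_len` bases of each read and uses exact matching; real read layouts (for example, a separate index read) may differ, and the names here are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, sample_barcodes, barcode_len):
    """Assign reads to samples by an exact-match barcode prefix.
    sample_barcodes maps sample name -> barcode; barcodes must be unique."""
    lookup = {bc: sample for sample, bc in sample_barcodes.items()}
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = lookup.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)          # unknown barcode or sequencing error
        else:
            assigned[sample].append(read[barcode_len:])  # strip the barcode
    return dict(assigned), unassigned

reads = ["ACGTGGGA", "TGCACCTT", "AAAATTTT"]
assigned, unassigned = demultiplex(reads, {"A": "ACGT", "B": "TGCA"}, 4)
print(assigned)    # {'A': ['GGGA'], 'B': ['CCTT']}
print(unassigned)  # ['AAAATTTT']
```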
  • Mixtures can contain sample internal calibration nucleic acids (SICs).
  • Known quantities of one or more SICs can be included in a mixture provided by the invention. At least 1, 2, 3, 4, 5, 6, 7, 8, 10, 15, 20, 25, or 30 different SICs can be included in the mixture. There can be about 2, 3, 4, or more different SICs in a mixture.
  • SICs can have a nucleotide composition characteristic of pathogenic DNA targets and can be present in specific molar quantities that allow for reconstruction of a calibration curve for quality control, such as for the processing and sequencing steps for each individual test sample.
  • SICs can make up approximately 10% (molar quantity) of nucleic acids in a mixture, for example, 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20% (molar) of nucleic acids in the mixture.
  • Different SICs can be present in different concentrations, for example, in a dilution series, over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 200, 500, 1000, 5000, 10000, 50000, or 100000-fold concentration range from the most dilute to the most concentrated SICs in 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, or 50 steps.
  • SICs can be present in a sample such as a mixture of probes and a test sample, a capture reaction, a capture reaction product, an amplification reaction, or an amplification reaction product, at concentrations of 5, 25, 100, 250 or more copies/ml.
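A dilution series of SICs spanning a given fold range in a given number of steps can be laid out evenly on a log scale. This is a minimal sketch; it assumes "steps" means the number of distinct SIC concentrations and that equal fold-spacing is wanted, neither of which the text specifies.

```python
def dilution_series(top, fold_range, steps):
    """Concentrations evenly spaced on a log scale from `top` down to
    `top / fold_range`, returned as `steps` values (most concentrated first)."""
    if steps < 2 or fold_range < 1:
        raise ValueError("need steps >= 2 and fold_range >= 1")
    ratio = fold_range ** (1 / (steps - 1))  # constant fold change between steps
    return [top / ratio ** i for i in range(steps)]

# Hypothetical: four SICs covering a 1000-fold range below 1000 copies/mL
series = dilution_series(1000, 1000, 4)
print([round(c, 6) for c in series])  # [1000.0, 100.0, 10.0, 1.0]
```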
  • By detecting the predetermined concentration of the SICs (for example, by using probes directed to the SICs), the skilled artisan can estimate the concentration of an organism of interest, such as a virus, in a test sample. This can be accomplished by correlating the frequency with which a captured sequence is detected with the volume of the sample from which the nucleic acids were obtained. Thus, an organism count per unit volume (e.g., copies/mL for liquid samples such as blood or urine) can be estimated for each organism detected.
  • the concentration of SICs and probes directed to the SICs can be adjusted empirically so that sequences of SICs detected in a capture reaction product and/or amplification reaction product make up about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, or 30% of sequences in the mixture. SICs can make up 10-20% of sequence reads. The number of SIC sequence reads in a sequencing reaction can be quantitatively evaluated to ensure that sample processing occurs within pre-defined parameters.
  • the pre-defined parameters can include one or more of the following: reproducibility within two standard deviations relative to all samples sequenced during a particular run, empirically determined criteria for reliable sequencing data, such as base calling reliability, error scores, percentage composition of total sequencing reads for each probe per target organism, no greater than about 15% deviation of GC or AU-rich SICs within a sequencing run.
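The SIC-based estimate and read-fraction check can be sketched as a log-log calibration curve. This assumes read counts scale linearly with input on log axes, fits an ordinary least-squares line to the SICs, inverts it for a target organism, and checks that SIC reads fall in the 10-20% window; the counts and function names are hypothetical.

```python
import math

def fit_calibration(sic_copies, sic_reads):
    """Least-squares line log10(reads) = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in sic_copies]
    ys = [math.log10(r) for r in sic_reads]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def estimate_copies(reads, slope, intercept):
    """Invert the calibration line to estimate copies/mL for a target organism."""
    return 10 ** ((math.log10(reads) - intercept) / slope)

def sic_fraction_ok(sic_reads, total_reads, low=0.10, high=0.20):
    """QC check that SIC reads make up the intended share of all reads."""
    return low <= sic_reads / total_reads <= high

# Hypothetical run: SICs at 10 to 10000 copies/mL, reads proportional to input.
slope, intercept = fit_calibration([10, 100, 1000, 10000], [20, 200, 2000, 20000])
print(round(estimate_copies(500, slope, intercept)))  # 250
print(sic_fraction_ok(22220, 150000))                 # True (about 14.8%)
```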
  • SIC DNA in a sample can also comprise the same barcode(s) corresponding to unique samples, such as particular patient samples.
  • SICs may comprise a region of interest as defined above, where the region of interest can be modified to further comprise a sequence heterologous to the region of interest.
  • a sequence heterologous to the region of interest in the SICs can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, or 40 contiguous bases, or more.
  • Each probe in the probe set can comprise the same or different backbone size, sequence, chemistries, configuration of barcodes and sequences, specific sequences for probe enrichment, target sites for probe cleavage, hybridization arm physical and chemical properties, probe identification regions, and/or low structure optimized design.
  • a probe may be selected after screening key loci for pathogenicity and/or drug susceptibility of a pathogen, and a genetic fingerprint or genotype containing key phenotypic information can be generated for each sub-strain.
  • the phenotype can be any characteristic, such as drug resistance, virulence or breeding phenotypes.
  • a probe set can comprise a first probe for genotyping a host target sequence and a second probe for detecting or identifying a pathogen.
  • a second probe can distinguish different strains, variants or sub-types of the pathogen.
  • a probe set can genotype or detect different strains or subtypes of HCV and the genotype of the individual or host with HCV, or an immuno-compromised individual.
  • Aspects of the invention provide one or more probes for multiplex analysis of test samples, including hepatitis virus detection and hepatitis viral genotyping in a biological sample from a patient.
  • Each probe arm sequence in the group consisting of all sequences from SEQ ID NO: 1 to SEQ ID NO: 874 can be contained in at least one probe in the plurality of probes.
  • Each probe arm sequence in the group consisting of all even-numbered sequences from SEQ ID NO: 2 to SEQ ID NO: 874 can be contained in at least one probe in the plurality of probes.
  • Each probe arm sequence in the group consisting of all odd-numbered sequences from SEQ ID NO: 1 to SEQ ID NO: 873 can be contained in at least one probe in the plurality of probes.
  • Probes in a mixture may be selected such that the mixture can comprise a subset of the full group of probes encompassed by the probe arm sequence pairs provided in Table 1, so as to detect a particular subset of hepatitis A, B, C, D, or E strains. Probes in a mixture may also be selected such that the mixture can comprise a subset of the full group of probes encompassed by the probe arm sequence pairs provided in
  • Probes in a mixture may include one or more probes each having a probe arm sequence pair, such as a homologous probe arm sequence pair provided in Table 1, so as to detect one or more particular hepatitis A, B, C, D, or E strains and to detect one or more drug resistance mutations in a certain set of HCV strains.
  • a probe can comprise a first sequence that hybridizes to a 5' end of a target sequence and a second sequence that hybridizes to a 3' end of a target sequence, wherein the target sequence can be used to identify, detect, or distinguish an organism, such as a pathogen.
  • Probes in the mixture can each comprise a first and a second homologous probe sequence, separated by a backbone sequence, that specifically hybridize to a first and a second sequence, respectively (such as sequences 3' and/or 5' to a target sequence), in the genome of at least one target organism.
  • First and second homologous probe sequences may not be complementary to a target sequence, but can ligate to the 5' and 3' termini of a target nucleic acid, such as a microRNA, and can possess appropriate chemical groups for compatibility with a nucleic acid-ligating enzyme, such as phosphorylated or adenylated 5' termini, and free 3' hydroxyl groups.
  • a probe can be capable of circularizing capture of a region of interest.
  • Homologous probe sequences, or the sequences of the probe that hybridize or can be homologous to the 3' and/or 5' region of a target sequence, can specifically hybridize to target sequences in the genome of their respective target organism, but may not specifically hybridize to any sequence in the genome of a predetermined set of sequenced organisms (the exclusion set).
  • the 'homologous probe sequences' can be designed specifically to not substantially hybridize to any sequence within a defined set of genomes, such as an exclusion set.
  • the exclusion set can include the host's genome.
  • An exclusion set can also include a plurality of viral, eukaryotic, prokaryotic, and archaeal genomes.
  • a plurality of viral, eukaryotic, prokaryotic, and archaeal genomes in the exclusion set may comprise sequenced genomes from commensal, non- virulent, or nonpathogenic organisms.
  • An exclusion set for all probes in a mixture can share a common subset of sequenced genomes comprising, for example, a host genome and commensal, non-virulent, or non-pathogenic organisms.
  • the exclusion set can vary between probes in the mixture so that each probe in the mixture may not specifically hybridize with the target sequence of any other probe in the mixture.
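Screening a candidate arm against an exclusion set can be sketched as a search of both strands of each excluded genome. This exact-substring test is only a crude stand-in for a similarity search such as MegaBLAST (which also reports near matches); the toy genomes and function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def exclusion_hits(arm, exclusion_set):
    """Names of exclusion-set genomes containing the arm or its reverse
    complement as an exact substring. An arm with any hit would be rejected."""
    fwd, rc = arm.upper(), revcomp(arm)
    return [name for name, genome in exclusion_set.items()
            if fwd in genome.upper() or rc in genome.upper()]

exclusion_set = {                  # hypothetical toy sequences
    "host": "GGGAAACGTGGG",        # contains the reverse complement AAACGT
    "commensal": "CCCACGTTTCCC",   # contains the arm itself
    "other": "GGGGGGGGGGGG",
}
print(exclusion_hits("ACGTTT", exclusion_set))  # ['host', 'commensal']
```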
  • Sequences 3' and/or 5' to a target sequence can be separated by a region of interest (e.g., the target sequence) of at least two nucleotides. They can be separated by at least 5, 6, 7, 8, 9, 10, 12, 14, 18, 20, 25, 30, 50, 75, 100, 150, 200, 300, 400, 600, 1200, 1500, 2500, or more nucleotides.
  • the first and second target sequences can be separated by no more than 5, 6, 7, 8, 9, 10, 12, 14, 18, 20, 25, 30, 50, 75, 100, 150, 200, 300, 400, 600, 1200, 1500, or 2500 nucleotides.
  • the probes in the probe set can each comprise homologous probe sequences which can be substantially free of secondary structure, may not contain long strings of a single nucleotide (e.g., they have fewer than 7, 6, 5, 4, 3, or 2 consecutive identical bases), can be at least about 8 bases (e.g., 8, 10, 12, 14, 16, 18, 20, 22, 24, 25, 26, 27, 28, 30, or 32 bases in length), and can have a Tm in the range of 50-72°C (e.g., about 53, 54, 55, 56, 57, 58, 59, 60, 61, or 62°C).
  • First and second homologous probe sequences can be about the same length and have the same Tm.
  • a length and Tm of the first and second homologous probe sequences can differ.
  • Homologous probe sequences in each probe may also be selected to occur below a certain threshold number of times in the target organism's genome (e.g., fewer than 20, 10, 5, 4, 3, or 2 times).
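The arm rules above (minimum length, no long homopolymer runs, few target sites in the genome) can be checked per candidate. This is a minimal sketch: it counts overlapping exact matches on both strands, and the default limits are picked from the ranges listed but are otherwise assumptions, as are the function names.

```python
import re

def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_run(seq):
    """Length of the longest stretch of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())),
               default=0)

def count_sites(arm, genome):
    """Overlapping exact matches of the arm on both strands of the genome."""
    g = genome.upper()
    def hits(s):
        return len(re.findall("(?=" + re.escape(s) + ")", g))  # lookahead allows overlaps
    fwd, rc = arm.upper(), revcomp(arm)
    return hits(fwd) if fwd == rc else hits(fwd) + hits(rc)   # palindromes counted once

def arm_ok(arm, genome, min_len=8, run_limit=5, site_limit=2):
    """Length >= min_len, every homopolymer run shorter than run_limit,
    and fewer than site_limit matching sites in the target genome."""
    return (len(arm) >= min_len
            and longest_run(arm) < run_limit
            and count_sites(arm, genome) < site_limit)

print(arm_ok("ACGTACGTAC", "TTTTACGTACGTACTTTT"))    # True: one site, no long runs
print(arm_ok("AAAAAACGTTGC", "TTTTAAAAAACGTTGCTT"))  # False: run of six A's
```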
  • the probes may further comprise a backbone sequence, which can contain a detectable moiety and a primer binding site, between the homologous probe sequences.
  • the probe/target duplexes can be suitable substrates for polymerase-dependent incorporation of at least two nucleotides on the probe (on the extension arm), and/or ligase-dependent circularization of the probes (either by circularizing a polymerase-extended probe or by sequence-dependent ligation of a linking polynucleotide that spans the region of interest).
  • a capture reaction can be a process where one or more probes contacted with a test sample has potentially undergone circularizing capture of a region of interest, wherein the first and second homologous probe sequences in the probe have specifically hybridized to their respective target sequence in the test sample to capture the region of interest between the first and second target sequences of the probe.
  • a capture reaction may produce no circularized products containing a region of interest if none of the organisms targeted by the probes were present in the sample.
  • Capture reaction products can be the mixture of nucleic acids produced by completing a capture reaction with a test sample.
  • An amplification reaction can be the process of amplifying capture reaction products.
  • An amplification reaction product can be the mixture of nucleic acids produced by completing an amplification reaction with a capture reaction product.
  • First and second homologous probe sequences need not be complementary to the target sequence, but can ligate to the 5' and 3' termini of a target nucleic acid, such as small RNAs and microRNAs, and can possess appropriate chemical groups for compatibility with a nucleic acid-ligating enzyme, such as phosphorylated or adenylated 5' termini and free 3' hydroxyl groups.
  • a probe with an adenylated 5' end and a free 3'-OH can be ligated near-simultaneously to a small RNA fragment containing compatible ligation ends in one step.
  • a probe may capture a small target nucleic acid in a two-step process, wherein a probe with an adenylated 5' end and a blocked 3' end (e.g., a dideoxynucleotide-blocked end) may be ligated to the target small RNA. This may occur by initial removal of an RNA base within the probe by guided RNase H2 digestion, and subsequent near-simultaneous ligation of the now 3'-OH-terminated probe to the small RNA.
  • the probe may be ligated to the 5'-adenylated probe site, and then the blocked 3' end of the probe may be digested by RNase H2 to generate a free 3'-OH for ligation.
  • the backbone sequence of the probes may include a detectable moiety and a primer-binding sequence.
  • a backbone sequence of the probes can comprise a second primer.
  • a detectable moiety can be a barcode.
  • a backbone can further comprise a cleavage site, such as a restriction endonuclease recognition sequence.
  • a backbone can contain non-Watson-Crick nucleotides, including, for example, abasic furan moieties, and the like.
  • Probes provided by the invention can include a probe backbone sequence between the first and second homologous probe sequences.
  • a backbone sequence can be at least 15, 20, 25, 30, 35, 40, 45, 50, 70, 90, 100, 120, 140, 150, 160, 180, 200, or 400 bases, or more.
  • the backbone sequence may include a detectable moiety.
  • a detectable moiety can be a probe- specific sequence, such as a barcode for identification of a specific probe or set of probes.
  • a backbone sequence can comprise one or more primer-binding sites.
  • a backbone can include two primer-binding sites.
  • Each backbone primer-binding site may comprise one or more universal sequences that, for example, can be used to amplify all circularized probes in a mixture.
  • a primer-binding site can also contain one or more probe-specific sequences, such as a barcode, for identification and/or amplification of a specific probe or set of probes.
  • a backbone can comprise one or more 2'OMethyl nucleotide residues, artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), or 2'OMethyl, abasic furans, or LNA nucleotides, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more LNAs or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% 2'OMethyl, abasic furans, or LNA nucleotides, to confer greater reactivity or inertness in the hybridization reaction, provide resistance to enzymatic activities such as polymerase-mediated strand displacement or nuclease cleavage, to serve as inhibitors of spurious amplification events, or to act as target sites for trans-acting nucleic acid oligonucleotides such as PCR primers or biotinylated capture probes.
  • a backbone sequence can comprise a cleavage site.
  • a cleavage site can be a restriction endonuclease recognition site.
  • a backbone sequence can comprise one or more detectable moieties.
  • One or more detectable moieties can each be independently selected from a barcode sequence and a primer-binding sequence.
  • a backbone sequence of a probe in a composition comprising a plurality of probes can differ from the backbone sequence of one or more other probes in the composition.
  • a backbone sequence can comprise the sequence
  • a homologous probe sequence can be a portion of a probe provided by the invention that specifically hybridizes to a target sequence present in the genome of an organism or virus, such as hepatitis A, B, C, D, or E virus.
  • the terms “homologous probe sequence,” “probe arm,” “homer,” and “probe homology region” are used interchangeably herein to refer to homologous probe sequences that may specifically hybridize to target genomic sequences.
  • a target sequence can be a nucleic acid sequence on a single strand of nucleic acid in the genome of an organism of interest.
  • Homologous probe sequences in the probes can be the probe pairs listed in Table 1.
  • the term “hybridizes” can refer to sequence-specific interactions between nucleic acids by Watson-Crick base-pairing (A with T or U and G with C).
  • “Specifically hybridizes” can mean a nucleic acid hybridizes to a target sequence with a Tm of not more than 14 °C below that of a perfect complement to the target sequence.
  • Table 1: Each row indicates a homologous probe sequence pair for a probe, such as an individual probe.
  • SEQ ID 81: GCCCAATAGACCCCTGGTCTGC; SEQ ID 82: GGTCGACATTGGTGTACATCT
  • SEQ ID 171: CTGTCAGGGATGATACTGCCTTC; SEQ ID 172: CAAATGACCACTAGGTCATCTC
  • SEQ ID 197: GCTACCCCTGCTATCACCCCG; SEQ ID 198: GGAGAAGAGTTGTCCGTGAAC
  • SEQ ID 221: CATGTCCGGTGATCTCAGCTCC; SEQ ID 222: GCCTTATTTCCACGTATTCCTC
  • SEQ ID 223: GGTCCAGGTCACAACATTGGTAG; SEQ ID 224: TTGGGCCTTAATGTAGCAAGT
  • SEQ ID 235: AACGTAAAGCCTCTCAGTGAGGG; SEQ ID 236: ACAGATAACGACTAGGTCGTC
  • SEQ ID 239: ACATGTCCAGTAATTTCAGCTCCA; SEQ ID 240: CTTACTTCCACGTATTCCTCTG
  • SEQ ID 295: CAGCGAGTGTGCATAATGCCAT; SEQ ID 296: TGCCTTATTTCCACATATTCCTC
  • SEQ ID 311: GTGAGGCTGCTGAGTATGGTAGT; SEQ ID 312: GCGTTGGCAGGATACAAAGG
  • SEQ ID 327: GGTAAAGGTCTGAGGAGCCGC; SEQ ID 328: ATAAAGTCCACCGCTTTAGCC
  • SEQ ID 439: GCAAGCTCCCTCTATTGTCGCC; SEQ ID 440: ATAGTAGTTTCCATGGACTCAAC
  • SEQ ID 459: AAGTCGTCGCCACACACGAGCA; SEQ ID 460: ATGTTATGAGCTCCAAGTCGTATTC
  • SEQ ID 497: GCTTCCTCTACGGATAGCAAGTTAGCC; SEQ ID 498: AGCCATGATGGTAGTGTCTATT
  • SEQ ID 533: CTAGTTGTCAGTACGCCGCTCG; SEQ ID 534: CCTCCGTGAAGGCTCTCAGGCTCGC
  • SEQ ID 543: ATGTTATCAGCTCCAGGTCGTATTC; SEQ ID 544: ACATGATGATGTTGCCTAGCCAGGA
  • SEQ ID 557: GCCTACGTAGAGCCGTTCCG; SEQ ID 558: AACATGTCATAGTCCTTGATGTTGG
  • SEQ ID 559: GCATCTACGGTAGCCACATGAC; SEQ ID 560: AATGCTCGCAGTGACGCAGTGTC
  • SEQ ID 639: GCATAATGCCGTCTCCTCGC; SEQ ID 640: AACTTATAGTTCGGCGCAGG
  • SEQ ID 831: AGTACAGTACACACCCAGTCCCA; SEQ ID 832: ATCTTCATAGACCCGTTCTTTAC
  • a bridge nucleic acid may be employed, wherein at least a first portion of the bridge nucleic acid can be capable of hybridizing to the capture probe, and at least a second portion of the bridge nucleic acid (which may overlap with the first portion) can be capable of simultaneously or sequentially hybridizing to the target nucleic acid, thereby enhancing the efficiency of ligation of the capture probe to the target.
  • Homology between two sequences may be determined by any means known in the art, including pairwise alignment, dot-matrix, and dynamic programming, and by FASTA (Lipman and Pearson, Science, 227: 1435-41 (1985) and Lipman and Pearson, PNAS, 85: 2444-48 (1988)), BLAST (McGinnis & Madden, Nucleic Acids Res., 32:W20-W25 (2004) (current BLAST reference, describing, inter alia, MegaBlast); Zhang et al, J. Comput.
  • the methods provided herein can comprise screening candidate sets of sequences by MegaBLAST against one or more annotated genomes.
  • a sequence can specifically hybridize when it hybridizes to a target sequence under stringent hybridization conditions.
  • Stringent hybridization conditions can refer to hybridizing nucleic acids in 6xSSC and 1% SDS at 65 °C, with a first wash for 10 minutes at about 42 °C with about 20% (v/v) formamide in 0.1xSSC, and a subsequent wash with 0.2xSSC and 0.1% SDS at 65 °C.
  • Alternate hybridization conditions can include different hybridization and/or wash temperatures of about 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 68, 69, or 70 °C or other hybridization conditions as disclosed in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 3rd edition (2001), which is incorporated herein by reference.
  • a hybridization temperature can be greater than 60 °C, e.g., 60-65 °C.
  • An organism can be any biological entity with a genome, including viruses, bacteria, archaea, and eukaryotes, including plants, fungi, protists, and animals.
  • a region of interest can refer to the sequence between the nearest termini of the two target sequences of the homologous probe sequences in a probe.
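Given the two target sequences of a probe, the region of interest can be pulled out of a reference sequence as the stretch between their nearest termini. This sketch assumes both targets are written in the reference's own orientation, takes the first occurrence of the upstream target, and enforces the two-nucleotide minimum stated earlier; the function name is hypothetical.

```python
def region_of_interest(genome, upstream_target, downstream_target, min_len=2):
    """Return the sequence between the nearest termini of two target sites,
    or None if they are not found in order or are closer than min_len."""
    g = genome.upper()
    start = g.find(upstream_target.upper())
    if start == -1:
        return None
    roi_start = start + len(upstream_target)          # first base after target 1
    roi_end = g.find(downstream_target.upper(), roi_start)
    if roi_end == -1 or roi_end - roi_start < min_len:
        return None
    return g[roi_start:roi_end]

print(region_of_interest("AAAACCCCGGTTTTGGGG", "CCCC", "GGGG"))  # GGTTTT
print(region_of_interest("AACCCCGGGGAA", "CCCC", "GGGG"))        # None (adjacent sites)
```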
  • Homologous probe sequences in a probe provided by the invention can readily be adapted for use as a conventional primer pair in a polymerase chain reaction (PCR) to specifically amplify a region of interest from a viral sequence.
  • Conventional primer pairs can refer to a pair of linear nucleic acid primers each member of which can comprise sequences corresponding to one of the two homologous probe sequences in a probe provided by the invention, which can be capable of exponential amplification of a region of interest comprising at least two nucleotides.
  • These conventional primer pairs can be encompassed by and can be a part of the present invention.
  • conventional primer pairs can be oriented with their 3' ends facing each other to facilitate exponential amplification.
  • Conventional primer pairs can comprise a barcode sequence.
  • Conventional primer pairs can comprise universal sequences, including, for example, sequences that hybridize to adaptamer primers.
  • the probes and conventional primer pairs provided by the invention may comprise the naturally occurring conventional nucleotides A, C, G, T, and U (in deoxyribose and/or ribose forms) as well as modified nucleotides such as 2'-O-methyl-modified nucleotides (Dunlap et al, Biochemistry. 10(13):2581-7 (1971)), artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer) (Chakravorty, et al. Methods Mol Biol.
  • 5' or 3' homologous probe sequences of a probe provided by the invention can comprise, at their respective termini, a photocleavable blocking group, such as PC-biotin.
  • a probe can comprise a photocleavable blocking group at its 5' terminus to block ligation until photoactivation.
  • A probe can comprise at its 3' terminus a photocleavable blocking group to block polymerase-dependent extension or n-mer oligonucleotide ligation until photoactivation.
  • a 5'-most nucleotide of a probe provided by the invention can comprise an adenylated nucleotide to improve ligation and/or hybridization efficiency. See, e.g., Hogrefe et al, J. Biol. Chem. 265 (10): 5561-5566, (1990).
  • at a 5' end of the 5' homologous probe region (e.g., H2, the ligation arm), the 5'-terminal nucleotide can be an LNA.
  • a barcode can be used to refer to a nucleotide sequence that uniquely identifies a molecule or class of related molecules.
  • Suitable barcode sequences that may be used in the probes of the invention may include, for example, sequences corresponding to customized or prefabricated nucleic acid arrays, such as n-mer arrays as described in U.S. Patent No. 5,445,934 to Fodor et al. and U.S. Patent No. 5,635,400 to Brenner.
  • An n-mer barcode may be at least 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400 or 500 nucleotides, e.g., from 18 to 20, 21, 22, 23, 24, or 25 nucleotides.
  • An n-mer barcode can be from 6 to 8 nucleotides.
  • An n-mer barcode can be from 10 to 12 nucleotides.
  • Barcodes can include sequences designed so that more than 1, 2, 3, 4, or 5 sequencing errors would be required for one barcode to be misread as another.
  • a probe may not contain a barcode, while a primer that can be used to amplify a circularized probe can contain a barcode.
  • to generate barcode sequences, for each barcode size K, all 4^K possible barcodes may be generated from the four DNA nucleotides, A, T, G, and C, using a Perl script.
  • This set of barcodes represents the total number of unique sequence combinations possible for a sequence of length K using 4 nucleotide variations. Barcodes consisting entirely of one nucleotide, e.g., TTTTTT, can then optionally be removed using a pattern-matching Perl script.
  • Further filtering steps may include removal of barcodes that contain runs of more than 3 identical nucleotides, e.g., TGGGGT, or runs of identical nucleotides interrupted by only one nucleotide, for instance, GGGTGG.
  • Barcodes containing palindromes or inverted repeats with a propensity to form secondary structure through self-hybridization may be filtered using a Perl script designed to identify such self- complementarity.
  • a set of candidate barcodes may be further filtered such that every barcode contains at least some number of base differences compared to any other barcode.
  • barcodes may be selected to be an edit distance of two nucleotides apart (i.e., differing in sequence by two nucleotides) to ensure that a single sequencing error does not cause barcode mis-identification.
  • Selecting barcodes for a mixture of probes used to test a sample from a patient may involve choosing a combination of barcodes that provides greater than 5% and not more than 50% representation of each nucleotide at each position in the barcode sequence within the pool. This can be achieved by random addition and removal of barcodes to and from a pooled set, using a Perl script, until the specified conditions are met.
  • Barcodes whose reverse complement sequence is also present within the barcode pool may also be eliminated.
  • Barcodes used in the probes can correspond to those on the Tag3 or Tag4 barcode arrays by AFFYMETRIX™. Further discussion of barcode systems can be found in Frank, BMC Bioinformatics, 10:362 (2009; 13 pages), Pierce et al., Nature Methods, 3:601-03 (2006) (including web supplements), and Pierce et al., Nature Protocols, 2:2958-74 (2007).
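The barcode design steps above can be approximated in a few lines. This is a simplified sketch, not the Perl scripts referred to: it enumerates all 4^K sequences, applies the run and self-complementarity filters (treating only perfect palindromes as self-complementary), keeps barcodes greedily using Hamming distance (substitutions only) as a stand-in for edit distance, drops reverse-complement pairs, and checks per-position base balance separately rather than by random addition and removal. All function names are hypothetical.

```python
import itertools
import re

def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def passes_filters(bc):
    if len(set(bc)) == 1:                  # single-nucleotide barcode, e.g. TTTTTT
        return False
    if re.search(r"(.)\1{3}", bc):        # more than 3 identical bases, e.g. TGGGGT
        return False
    if re.search(r"(.)\1{2}.\1{2}", bc):  # runs split by one base, e.g. GGGTGG
        return False
    if bc == revcomp(bc):                  # perfect palindrome (self-complementary)
        return False
    return True

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select_barcodes(k, min_dist=2):
    """Greedy pass over all 4**k sequences, keeping barcodes that pass the
    filters, differ from every kept barcode at >= min_dist positions, and
    whose reverse complement is not already in the pool."""
    pool = []
    for letters in itertools.product("ACGT", repeat=k):
        bc = "".join(letters)
        if not passes_filters(bc) or revcomp(bc) in pool:
            continue
        if all(hamming(bc, kept) >= min_dist for kept in pool):
            pool.append(bc)
    return pool

def position_balance_ok(pool, low=0.05, high=0.50):
    """True if each base makes up more than `low` and at most `high`
    of the pool at every barcode position."""
    n = len(pool)
    for column in zip(*pool):
        for base in "ACGT":
            if not low < column.count(base) / n <= high:
                return False
    return True

pool = select_barcodes(6)
print(len(pool), pool[:3])
print(position_balance_ok(["ACGT", "CGTA", "GTAC", "TACG"]))  # True
```

A single substitution cannot turn one kept barcode into another when every pair differs at two or more positions, so such an error yields an unassigned read rather than a misassigned one.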
  • a barcode can be sample-specific, e.g., a barcode can be a patient-specific barcode. More than one barcode can be assigned per patient sample, allowing replicate samples for each patient to be performed within the same sequencing reaction. By using sample nucleic acid-specific barcodes, it can be possible both to multiplex reactions as described in the present application and to detect cross-contamination between test samples that did not use a defined repertoire of specific barcodes.
  • a barcode can be a temporal barcode, i.e., a barcode that specifies a particular period of time. By using a temporal barcode, it can be possible to detect carry-over or contamination on an assay instrument, such as a sequencing instrument, between runs on different days. Sample and/or temporal barcodes may be used to automatically detect cross-contamination between samples and/or days and, for example, instruct an instrument operator to clean and/or decontaminate a sample handling system, such as a sequencing instrument.
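Temporal-barcode checking can be sketched as sorting unexpected barcodes into those seen in earlier runs (likely carry-over) and those never seen. This assumes each run is tagged with its own temporal barcode and that earlier runs' barcodes are kept on file; the function and argument names are hypothetical.

```python
from collections import Counter

def contamination_report(observed, expected, previous_runs, min_reads=1):
    """Split unexpected barcodes into likely carry-over (used in an earlier run)
    and unknown barcodes. Raise min_reads to ignore barcodes seen only rarely,
    e.g. single reads caused by sequencing errors."""
    counts = Counter(observed)
    carry_over, unknown = {}, {}
    for bc, n in counts.items():
        if bc in expected or n < min_reads:
            continue
        if bc in previous_runs:
            carry_over[bc] = n
        else:
            unknown[bc] = n
    return carry_over, unknown

# Hypothetical: today's run used AACC; GGTT was the previous day's temporal barcode.
carry, unknown = contamination_report(
    ["AACC", "AACC", "GGTT", "CATG"], expected={"AACC"}, previous_runs={"GGTT"})
print(carry, unknown)  # {'GGTT': 1} {'CATG': 1}
if carry:
    print("Possible carry-over: clean the sample handling system before the next run")
```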
  • a barcode sequence can also be a primer-binding sequence.
  • An amplification primer can include both universal and probe-specific sequences.
  • a universal sequence can be internal (i.e., 3') to probe-specific regions, or universal sequence(s) can be external (i.e., 5' to probe-specific regions).
  • Universal and probe-specific sequences can be adjacent, or they can be separated by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, or 50 nucleotides, or more.
  • Universal primer-binding sequences in a backbone sequence can serve as a hybridizing template for longer adaptamer primers.
  • An adaptamer primer can be a primer that hybridizes to universal primer sequences in a capture reaction product to facilitate amplification of the capture reaction product and further comprise a sample-specific barcode sequence, such as a sequence 5' to the universal primer hybridizing region of the adaptamer primer.
  • Adaptamer primers can be used, for example, to incorporate sample-specific barcodes on amplification reaction products to allow further multiplexing of samples after completing a capture reaction and an amplification reaction. The addition of sample-specific barcodes can allow for multiple capture and/or amplification reaction products to be pooled before detection by, for example, sequencing.
  • Adaptamer primers can further include universal sequences that hybridize to a sequencing primer.
  • a detectable moiety may be associated with the backbone sequence. It may be bound to the polynucleotide sequence, as in the case of direct labels, such as fluorescent (e.g., quantum dots, small molecules, or fluorescent proteins), chemical, or protein-based labels. Alternatively, a detectable moiety may be incorporated within the polynucleotide sequence, as in the case of nucleic acid labels, such as modified nucleotides or probe-specific sequences, such as barcodes. Quantum dots are known in the art and are described in, e.g., International Publication No. WO 03/003015.
  • kits comprising a probe set as disclosed herein.
  • a kit can comprise one or more of the following: reagents for obtaining a sample (e.g., swabs), reagents for extracting DNA, enzymes (such as polymerase and/or ligase to capture a region of interest), reagents for amplifying the region of interest, reagents for purifying the DNA or amplified or captured regions of interest (e.g., a purification cartridge), buffers, and sequencing reagents.
  • a kit may be a low-throughput kit, such as a kit for a small number of samples (e.g., fewer than 50 samples, such as 8 to 48 samples).
  • a kit may be a high-throughput kit, such as a kit for a large number of samples (e.g., 50 or more samples, such as 96 to 1536 samples).
  • RNA template: There can be numerous applications of MIP technology in healthcare diagnostics, environmental surveillance, food and water safety, agricultural plant and animal breeding, and human genetics that may require capture from an RNA template:
  • RNA viruses (infectious agents causing disease): RNA viruses can be the etiological agents behind many common diseases that can be important health concerns. Some commonly known RNA viruses include human immunodeficiency virus (HIV), the agent causing acquired immune deficiency syndrome (AIDS); rotavirus, an agent causing diarrhea; poliovirus, the causative agent of polio; hepatitis C virus (HCV), an agent causing hepatitis; the influenza virus, the agent causing influenza; and SARS coronavirus, the agent behind severe acute respiratory syndrome (SARS).
  • RNA viruses as causes of morbidity and mortality in cancer. It has been established that infectious agents, including viruses, can promote carcinogenesis. RNA viruses have been linked to several forms of cancer, including T-cell leukemia (human T-lymphotropic virus type 1, HTLV-1) and liver cancer (hepatitis C virus, HCV).
  • Bacterial gene expression. Many healthcare or surveillance applications look for the expression of drug resistance or toxicity genes to determine the clinical or safety significance of bacterial presence. While sequencing or detection of genomic DNA can reveal the presence of resistance or virulence genes, such genes might be turned off or hyper-expressed due to mutations elsewhere in the genome. Quantifying the transcript level of the gene provides a simpler and more accurate method of determining the gene's expression level.
  • Pathogen detection. A pathogenic organism (generally a bacterium or fungus) may be clinically significant though present at only a few copies in a given clinical sample (e.g., in a severely infected patient, only a few hundred bacteria may be present in 1 mL of blood). This low copy count presents a challenge to a diagnostic method that detects the genomic DNA of the organism. By detecting the RNA transcripts of highly expressed genes, however, the system may have thousands of template molecules to work with, easing the technical challenge of detecting the organism at low levels.
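The copy-number argument above can be made concrete with a Poisson sampling model. The sketch below assumes that template molecules enter the capture reaction at random (Poisson), that one genome copy per cell is available for DNA capture, and that a single captured molecule suffices for detection. The cell density, the reaction volume, the 1,000 transcripts per cell, and the function names are hypothetical illustrations, not figures from this disclosure.

```python
import math

def prob_at_least_one(mean_copies):
    """Poisson probability that a reaction receives at least one template molecule."""
    return 1.0 - math.exp(-mean_copies)

def mean_copies_in_reaction(organisms_per_ml, copies_per_organism, volume_ml, recovery=1.0):
    """Expected template molecules entering the capture reaction (all inputs hypothetical)."""
    return organisms_per_ml * copies_per_organism * volume_ml * recovery

# Hypothetical: 300 bacteria per mL, extract equivalent to 0.01 mL of blood per reaction.
dna = mean_copies_in_reaction(300, 1, 0.01)     # one genome copy per cell
rna = mean_copies_in_reaction(300, 1000, 0.01)  # assumed 1,000 transcripts per cell
print(f"DNA target: mean {dna:.0f} copies, P(>=1 copy) = {prob_at_least_one(dna):.3f}")
print(f"RNA target: mean {rna:.0f} copies, P(>=1 copy) = {prob_at_least_one(rna):.3f}")
```

Under these assumptions roughly 1 in 20 DNA-based reactions would contain no template at all (e^-3 is about 0.05), whereas the transcript-based reaction essentially always contains template. Real detection limits also depend on extraction recovery and capture efficiency, which the sketch reduces to a single `recovery` factor.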
  • RNA is also useful in human disease because some diseases or conditions are characterized by the inclusion or exclusion of particular exons in the transcript. Such a condition cannot be detected by sequencing the genomic DNA and must be detected by examining the RNA transcript.
  • Gene expression. In many applications, the level (either absolute or relative to other genes) of expression of one or more genes that indicate the phenotype of the cell or cells in the sample can be detected.
  • the phenotype may be
  • Conventional capture from RNA subjects the RNA to a reverse transcription step that converts the RNA into a cDNA.
  • This step may be general, using random primers or a poly-T primer, or it may use a target-specific primer to convert only specific regions of RNA.
  • the methods provided herein do not require a distinct reverse-transcription step in order to capture (and thus detect, sequence, and quantify) RNA. Instead, the reverse-transcription generates the circular DNA molecules themselves without any intermediate. This direct capture reduces the assay's cost, time to result, and labor requirements.
  • the circular DNA molecules produced by these methods may be used in any of the same ways as would circular DNA molecules produced from a DNA template, including, but not limited to:
  • the molecules may be hybridized to a microarray
  • the molecules may be amplified by rolling circle amplification.
  • the probe(s) can contain a backbone between the two probe arms and the backbone can include a binding site for a primer used in the RCA.
  • the molecules may be amplified by PCR.
  • the probes can contain a backbone between the two probe arms and the backbone can include binding sites for two primers oriented in opposite directions and suitable to amplify the section of the circularized molecules containing the probe arms and capture region
  • the molecules may be cleaved if the probes contain a backbone between the probe arms where the backbone contains a cleavage site. Cleavage of the circularized probes is typically performed after the circularized molecules are purified from the input reaction mixture (e.g., by exonuclease digestion of the linear template molecules) so as to make PCR amplification or hybridization easier or more efficient.
  • the molecules may be sequenced directly if the backbone contains suitable sequencing adapters.
  • the molecules may be quantified by qPCR using primers against any of the backbone, probe
  • One embodiment provides direct capture from RNA that does not require an intermediate step for the separate formation of a complementary DNA (cDNA) product.
  • the method can utilize specific concentrations of MnSO4, MgCl2, and dNTPs, as well as multiple enzymes, to accomplish this result.
  • Another embodiment provides direct capture from RNA, through a DNA:RNA hybrid state, that does not require an intermediate step for the separate formation of a cDNA product.
  • Another embodiment provides a method based on the use of direct RNA capture to detect viruses, bacteria, or fungi from a clinical sample where the viral, bacterial, or fungal molecules are present at a level below that which could be detected by using DNA capture.
  • One embodiment provides a method of generating a circular DNA molecule from a probe containing a backbone flanked by two target-specific binding arms using an RNA template such that the resulting molecule contains the backbone, probe arms, and reverse complement of the targeted RNA sequence.
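The structure of the product described above can be checked with a small sequence model. This sketch assumes the probe is written 5'→3' as ligation arm + backbone + extension arm, that the ligation arm pairs with an upstream target site and the extension arm with a downstream site, and that the 3' end of the extension arm is extended across the gap (copying the RNA) until it meets the 5' end of the ligation arm. The function name, sequences, and backbone are hypothetical; mismatches, secondary structure, and enzyme behavior are not modeled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A", "N": "N"}

def revcomp(seq):
    """Reverse complement written in DNA bases (a U in an RNA template pairs with A)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def circularize_on_rna(template, ligation_arm, backbone, extension_arm):
    """Return the circular product, written 5'->3' starting at the ligation arm."""
    dna_template = template.upper().replace("U", "T")
    site_a = revcomp(ligation_arm)    # upstream site bound by the ligation arm
    site_b = revcomp(extension_arm)   # downstream site bound by the extension arm
    start_a = dna_template.find(site_a)
    if start_a < 0:
        raise ValueError("ligation arm has no binding site in the template")
    end_a = start_a + len(site_a)
    start_b = dna_template.find(site_b, end_a)
    if start_b < 0:
        raise ValueError("extension arm has no binding site downstream of the ligation arm")
    gap = dna_template[end_a:start_b]
    gap_fill = revcomp(gap)  # strand synthesized from the RNA by the RT-capable polymerase
    # Ligation joins the 3' end of the gap fill to the 5' end of the ligation arm.
    return ligation_arm.upper() + backbone.upper() + extension_arm.upper() + gap_fill

# Hypothetical RNA target and probe.
circle = circularize_on_rna("GAUUACACCUGAGUUGGCAU", "TGTAATC", "CTCGAG", "ATGCCAA")
print(circle)
```

Reading the circle from the start of the extension arm gives the reverse complement of the whole targeted RNA span followed by the backbone, which is the composition stated above (backbone, probe arms, and reverse complement of the targeted sequence).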
  • Another embodiment provides a method based on the use of a polymerase capable of synthesizing a complementary cDNA strand from an RNA template, and a ligase capable of ligating DNA to DNA to circularize a template.
  • One embodiment can be based on the use of Tth polymerase and T4 ligase to generate the circular DNA molecules.
  • One embodiment can be based on the use of Tth polymerase and Taq ligase to generate the circular DNA molecules.
  • One embodiment can be based on the use of truncated or specifically mutagenized polymerase or ligase enzymes, such as mutants derived from, or related to, Tth polymerase or Taq ligase, to generate the circular DNA molecules.
  • One embodiment can be based on the use of the circular DNA molecules from an RNA template to detect variants or mutations present in an RNA transcript in a clinical sample, or a sample containing both human and non-human nucleic acid.
  • One embodiment can be based on the use of the circular DNA molecules from an RNA template to detect spliced RNA transcripts in a clinical sample, or a sample containing both human and nonhuman nucleic acid.
  • One embodiment involves deployment of a library of more than 1 and up to 10,000 DNA molecules that form a complementary DNA product from RNA without an intermediate step for the formation of cDNA.
  • One embodiment provides a combination of two or more of the following methods: (i) direct RNA capture, (ii) cDNA capture, (iii) dsDNA capture, or (iv) protein-conjugated DNA capture, to quantify nucleic acids and/or proteins in a biological sample.
  • One embodiment combines any of the above embodiments where the backbone of the circular molecule contains binding sites for primers that can be used to amplify, label, and/or sequence the population of molecules produced from a sample.
  • Also provided herein is a method of using one or more probes disclosed herein, such as one or more probe sets, for detecting, identifying, or distinguishing one or more organisms, such as HCV.
  • the method can comprise identifying an organism with a plurality of probes; the plurality of probes can detect at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, or 2000 different pathogens.
  • a plurality of probes can detect at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1250, 1500, 1750, or 2000 different strains, variants or sub-types of a pathogen or different strains or sub-types of different pathogens.
  • the method can comprise detecting or distinguishing different organisms, different pathogens, different strains, variants or sub-types of a pathogen or different strains, variants or sub-types of different pathogens, with at least 70% sensitivity, specificity, or both, such as with at least 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% sensitivity, specificity, or both, such as with at least 90% sensitivity, specificity, or both.
  • Each probe may detect or distinguish different organisms, different pathogens, different strains or sub-types of a pathogen or different strains or sub-types of different pathogens with at least 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% sensitivity, specificity, or both, in an assay.
  • a combination of probes may be used for detecting or distinguishing different organisms, different pathogens, different strains, variants or sub-types of a pathogen or different strains, variants or sub-types of different pathogens, with at least 70% sensitivity, specificity, or both, such as with at least 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% sensitivity, specificity, or both, such as with at least 90% sensitivity, specificity, or both.
  • the confidence level for determining the specificity, sensitivity, or both may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99%.
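Sensitivity and specificity claims at a stated confidence level depend on how many samples were tested, not just on the point estimate. The sketch below computes both metrics from a 2x2 table against a reference method and attaches a two-sided Wilson score interval. The choice of the Wilson interval, the function names, and the validation counts are assumptions for illustration; the disclosure does not specify an interval method.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Two-sided Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("need at least one trial")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def sensitivity_specificity(tp, fn, tn, fp, confidence=0.95):
    """Point estimates plus Wilson intervals from true/false positive and negative counts."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn, confidence)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp, confidence)),
    }

# Hypothetical validation counts versus a reference method.
result = sensitivity_specificity(tp=95, fn=5, tn=98, fp=2)
for metric, (estimate, (lo, hi)) in result.items():
    print(f"{metric}: {estimate:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With these hypothetical counts the sensitivity point estimate is 95%, but the lower bound of the 95% interval is about 89%, so 100 positives alone would not support a claim of at least 90% sensitivity at that confidence level.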
  • a method for detecting the presence of one or more target organisms can comprise contacting a sample suspected of containing at least one target organism with any of the probe sets disclosed herein, capturing a region of interest of the at least one target organism (e.g., by polymerization and/or ligation) to form a circularized probe, and detecting the captured region of interest, thereby detecting the presence of the one or more target organisms.
  • a captured region of interest may be amplified to form a plurality of amplicons (e.g., by PCR).
  • a sample can be treated with nucleases to remove the linear nucleic acids after probe-circularizing capture of the region of interest.
  • a circularized probe can be linearized, e.g., by nuclease treatment.
  • a circularized probe molecule can be sequenced directly by any means known in the art, without amplification.
  • a circularized probe can be contacted by an oligonucleotide that primes polymerase-mediated extension of the molecules to generate sequences complementary to that of the circularized probe.
  • a circularized probe molecule can be enriched from the reaction solution by means of a secondary-capture oligonucleotide capture probe.
  • a secondary-capture oligonucleotide capture probe may comprise a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe.
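Whether a secondary-capture oligonucleotide meets the "at least 6 nucleotides" criterion can be screened in silico. This sketch finds the longest contiguous stretch of the oligonucleotide that is complementary to the circularized probe, unrolling the circle so that sites spanning the ligation junction are found. The function names, the exact-match criterion, and the 6-nt default are illustrative assumptions; real hybridization also depends on melting temperature and conditions, and the biotin moiety is not modeled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def longest_complementary_run(oligo, circle):
    """Length of the longest contiguous stretch of `oligo` that can pair with the circle."""
    site = revcomp(oligo)  # sequence the oligo would pair with, written on the circle's strand
    circle = circle.upper()
    unrolled = circle + circle[: max(len(site) - 1, 0)]  # cover the ligation junction
    best = 0
    prev = [0] * (len(unrolled) + 1)
    for i in range(1, len(site) + 1):
        cur = [0] * (len(unrolled) + 1)
        for j in range(1, len(unrolled) + 1):
            if site[i - 1] == unrolled[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common substring ending here
                best = max(best, cur[j])
        prev = cur
    return min(best, len(circle))

def can_capture(oligo, circle, min_len=6):
    """True if the oligo has at least `min_len` contiguous complementary nucleotides."""
    return longest_complementary_run(oligo, circle) >= min_len

circle = "TGTAATCCTCGAGATGCCAACTCAGG"  # hypothetical circularized probe
print(longest_complementary_run("TACACCTG", circle), can_capture("TACACCTG", circle))
```

The example oligonucleotide pairs with a site that straddles the end and start of the written sequence, which is why the circle is unrolled before searching.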
  • the nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe may include 1, 2, 4, 8, 16, 32 or more nucleotides of the polymerase-extended capture product.
  • a probe and/or captured region of interest can be sequenced by any means known in the art, such as polymerase-dependent sequencing (including dideoxy sequencing, pyrosequencing, and sequencing by synthesis) or ligase-based sequencing (e.g., polony sequencing).
  • the sequencing can be by Sanger sequencing or "next generation" (Next-gen) sequencing.
  • sequencing can be by second-generation or third-generation sequencing methods, such as those using commercial platforms including Illumina, 454, SOLiD, Ion Torrent, PacBio, Oxford, Life Technologies QDot, or any other available sequencing platform.
  • An organism can be identified from a sample, such as a sample from a host, and the organism being identified can be a pathogen.
  • a sample can be a biological sample, such as from a mammal, such as a human.
  • a genotype of the host can be identified or detected from the sample or another sample from the host. Identification of one or more organisms (such as one or more pathogens, such as different pathogens or subtypes or strains of pathogens), can be used to select one or more therapeutics or treatments for the host. Identification of one or more organisms (such as one or more pathogens, such as different pathogens or subtypes or strains of pathogens), can be used to stratify the host into a therapeutic group, such as for a particular drug treatment or clinical trial. HCV strain identification can be used to stratify a host into a cancer therapeutic group or to select a cancer treatment.
  • Identification of one or more organisms (such as one or more pathogens, such as different pathogens or subtypes or strains of pathogens) and the genotype of a host can be used to select one or more therapeutics or treatments for the host. Identification of one or more organisms (such as one or more pathogens, such as different pathogens or subtypes or strains of pathogens) and the genotype of the host can be used to stratify the host into a therapeutic group, such as for a particular drug treatment or clinical trial.
  • Also provided herein is a method for identifying an organism, such as a genetic signature of an organism or a subtype or strain of a pathogen, in a short timeframe or with a fast turnaround time.
  • a genotype of an individual or host can also be identified within the short timeframe. For example, the identification of a pathogen in a sample can be completed in less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 hours. The steps from contacting the sample with one or more probes to identifying the organism by sequencing can be performed in less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 hours.
  • The steps from contacting the sample with the probe through identifying the organism (such as one or more pathogens) by sequencing and transmitting the results to a health care professional (such as a clinician or physician) can be performed in less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 hours.
  • From contacting the sample with the probe to identifying the organism (such as one or more pathogens) by sequencing, transmitting the results to a health care professional (such as a clinician or physician), and selection of a therapeutic can be performed in less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 hours.
  • Also provided herein is a method for simultaneous quantification and identification of an organism, such as identifying one or more subtypes or substrains of a pathogen. Multiplexing is also provided herein, wherein multiple pathogens, or substrains or subtypes of pathogens, can be detected simultaneously.
  • Conversion of sequence data to a quantitative report can be performed by using selected validated parameters. Any software known in the art can be used for any of the methods disclosed herein.
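One simple way to turn per-probe read counts into a quantitative report is to drop probes below a minimum read count and then sum the remaining counts per organism. In this sketch the `min_reads` threshold stands in for the "selected validated parameters"; its value, the probe-to-organism mapping, and the function name are hypothetical.

```python
from collections import defaultdict

def summarize(probe_counts, probe_to_organism, min_reads=10):
    """Collapse per-probe read counts into per-organism counts and relative abundances.

    Probes with fewer than `min_reads` reads are treated as not detected.
    """
    totals = defaultdict(int)
    for probe, count in probe_counts.items():
        organism = probe_to_organism.get(probe)
        if organism is not None and count >= min_reads:
            totals[organism] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {org: (n, n / grand_total) for org, n in sorted(totals.items())}

# Hypothetical read counts for four probes targeting two strains.
counts = {"p1": 90, "p2": 30, "p3": 5, "p4": 80}
mapping = {"p1": "strain_A", "p2": "strain_A", "p3": "strain_B", "p4": "strain_B"}
for organism, (reads, fraction) in summarize(counts, mapping).items():
    print(f"{organism}: {reads} reads ({fraction:.0%})")
```

Probe `p3` falls below the threshold and is excluded, so the report is built from 200 reads in total.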
  • aspects of the invention further include a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample, comprising: contacting a test sample with the composition comprising a plurality of probes as described herein to form a mixture; capturing a region of interest in a hepatitis A, B, C, D, or E virus genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the hepatitis virus genome to form a circularized probe; and detecting the captured region of interest, thereby detecting the presence of the one or more strains of hepatitis A, B, C, D, or E virus.
  • aspects of the invention also include a method of detecting the genotype of one or more strains of hepatitis C virus (HCV) in a test sample, comprising: contacting a test sample with the composition comprising a plurality of probes as described herein to form a mixture; capturing a region of interest in an HCV genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the HCV genome to form a circularized probe; and determining the sequence of the captured region of interest, thereby detecting the genotype of each of the one or more strains of HCV.
  • a region of interest can be captured by polymerase dependent extension from the 3' terminus of a probe in the plurality of probes.
  • a region of interest can be captured by sequence-specific ligation of a linking
  • a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample can include the step of amplifying a circularized probe to form a plurality of amplicons containing the captured region of interest.
  • a method of detecting the presence of one or more strains of hepatitis virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample can include the step of treating the mixture with a nuclease to remove linear nucleic acids between the steps of capturing and detecting a region of interest.
  • a method can include the step of linearizing the circularized probe by cleavage with a site-specific endonuclease.
  • a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample can include the step of sequencing the region of interest.
  • the method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample can further include the step of comparing the sequence of the captured region of interest to the sequence of known hepatitis A, B, C, D, or E virus genomes.
  • a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample can include the step of analyzing the sequence of the captured region of interest with respect to the sequence of known hepatitis A, B, C, D, or E virus genomes and a model of sequencing errors to estimate the proportions or abundances of the hepatitis A, B, C, D, or E strains in the test sample.
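The abundance estimate described above, which combines known strain sequences with a model of sequencing errors, can be sketched as an expectation-maximization (EM) mixture fit. This is one common approach, not necessarily the one used here. It assumes reads are already aligned to equal-length reference sequences for each known strain and that errors are independent substitutions at a single per-base rate; the strain names, sequences, and 1% error rate are hypothetical.

```python
import math

def read_loglik(read, ref, error_rate):
    """Log-likelihood of an aligned read given a reference under a uniform substitution-error model."""
    if len(read) != len(ref):
        raise ValueError("sketch assumes pre-aligned reads of the reference length")
    match, mismatch = math.log(1 - error_rate), math.log(error_rate / 3)
    return sum(match if r == s else mismatch for r, s in zip(read, ref))

def estimate_proportions(reads, references, error_rate=0.01, iterations=200):
    """EM estimate of the fraction of reads originating from each reference strain."""
    names = list(references)
    weights = [1.0 / len(names)] * len(names)
    # Per-read, per-strain log-likelihoods do not change between iterations.
    table = [[read_loglik(r, references[n], error_rate) for n in names] for r in reads]
    for _ in range(iterations):
        totals = [0.0] * len(names)
        for row in table:
            # E-step: posterior over strains for this read, via log-sum-exp for stability.
            logs = [math.log(w) + ll if w > 0 else -math.inf for w, ll in zip(weights, row)]
            top = max(logs)
            post = [math.exp(x - top) for x in logs]
            norm = sum(post)
            for j, p in enumerate(post):
                totals[j] += p / norm
        # M-step: new proportions are the mean posterior membership.
        weights = [t / len(reads) for t in totals]
    return dict(zip(names, weights))

refs = {"strain_A": "ACGTACGTAC", "strain_B": "ACGTTCGTAC"}  # hypothetical references
reads = ["ACGTACGTAC"] * 70 + ["ACGTTCGTAC"] * 30
print(estimate_proportions(reads, refs))
```

With 70 reads matching `strain_A` and 30 matching `strain_B`, the estimate lands close to, but not exactly at, 0.7/0.3, because the error model always leaves some posterior probability on a closely related strain.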
  • a method of detecting the genotype of one or more strains of HCV in a test sample can include the step of comparing the sequence of the captured region of interest to a database of known HCV mutations.
  • a database of known HCV mutations can be a database of known HCV drug resistance mutations.
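Comparing a captured sequence against a mutation database reduces, in its simplest form, to calling differences from a reference and looking each one up. This sketch works at the nucleotide level on pre-aligned, equal-length sequences; the reference, the captured sequence, and the database entry are hypothetical placeholders, not real HCV resistance mutations, and a practical tool would typically translate codons and report amino-acid substitutions.

```python
def call_variants(captured, reference):
    """List (1-based position, reference base, observed base) for every mismatch."""
    if len(captured) != len(reference):
        raise ValueError("sketch assumes pre-aligned sequences of equal length")
    return [(i + 1, ref, alt)
            for i, (ref, alt) in enumerate(zip(reference, captured)) if ref != alt]

def flag_resistance(captured, reference, resistance_db):
    """Return the variants that appear in a (hypothetical) resistance-mutation table."""
    hits = []
    for pos, ref, alt in call_variants(captured, reference):
        label = resistance_db.get((pos, alt))
        if label is not None:
            hits.append((pos, ref, alt, label))
    return hits

reference = "ACGTACGTAC"                          # hypothetical reference region
captured = "ACGAACGTTC"                           # hypothetical captured sequence
db = {(4, "A"): "hypothetical_resistance_1"}      # placeholder database entry
print(call_variants(captured, reference))
print(flag_resistance(captured, reference, db))
```

Only the difference at position 4 is reported as resistance-associated; the difference at position 9 is called but has no database entry.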
  • a method of detecting the genotype of one or more strains of HCV in a test sample can include the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV genotypes and a model of sequencing errors to estimate the proportions or abundances of one or more strains of HCV in a test sample.
  • the sequencing may be dideoxy sequencing.
  • a sequence of interest can contain one or more single nucleotide polymorphisms (SNPs), insertions, deletions, and/or indels.
  • a circularized probe can be detected by hybridization.
  • the hybridization may be to a microarray including at least one feature that specifically hybridizes to the circularized probe.
  • a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus or of determining the genotype of one or more strains of HCV in a test sample can include the step of adding a sample internal calibration standard to the test sample.
  • a method can further comprise the steps of adding a probe that specifically hybridizes with the sample internal calibration standard and detecting the sample internal calibration standard.
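A spiked internal calibration standard allows read counts to be converted into approximate copy numbers: if a known number of standard molecules was added, the ratio of target reads to standard reads scales that number. The sketch assumes the target and standard are captured and amplified with a known relative efficiency (1.0 meaning equal); the function name, the efficiency parameter, and the counts are hypothetical.

```python
def estimate_copies(target_reads, standard_reads, standard_copies_added, relative_efficiency=1.0):
    """Estimate target copies from the ratio of target reads to internal-standard reads.

    relative_efficiency is the (hypothetical, empirically determined) capture-plus-amplification
    efficiency of the target probe relative to the standard's probe.
    """
    if standard_reads <= 0:
        raise ValueError("internal standard not detected; the run cannot be quantified")
    return (target_reads / standard_reads) * standard_copies_added / relative_efficiency

# Hypothetical run: 500 standard copies spiked in, 1,000 standard reads, 4,000 target reads.
print(estimate_copies(4000, 1000, 500))  # 2000.0
```

Failing loudly when the standard is absent reflects the role of the standard as a run-level control: without it, the ratio is undefined.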
  • a method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus or of determining the genotype of one or more strains of HCV in a test sample can include the step of formatting the results to inform physician decision making. Formatting can include providing an estimated quantity of one or more hepatitis A, B, C, D, or E strains or HCV genotypes of interest. Formatted results can comprise a therapeutic recommendation based on the one or more hepatitis A, B, C, D, or E strains or HCV genotypes detected.
  • aspects of the invention further include a method of treating a subject infected with hepatitis A, B, C, D, or E virus comprising performing the method of detecting the presence of one or more strains of hepatitis A, B, C, D, or E virus in a test sample as provided herein and administering a suitable therapy to the subject based on the at least one hepatitis A, B, C, D, or E virus strain detected.
  • aspects of the invention further include a method of treating a subject infected with HCV, comprising performing the method of determining the genotype of one or more strains of HCV in a test sample as provided herein and administering a suitable therapy to the subject based on the at least one HCV genotype detected.
  • the invention provides a method for detecting the presence of one or more hepatitis A, B, C, D, or E viruses (HAV, HBV, HCV, HDV, and HEV, respectively) by contacting a sample suspected of containing at least one such virus with a mixture of probes of the invention, capturing a region of interest of the at least one virus (e.g., by polymerization and/or ligation) to form, for example, a circularized probe, and detecting the captured region of interest, thereby detecting the presence of the one or more hepatitis A, B, C, D, or E viruses.
  • a captured region of interest may be amplified to form a plurality of amplicons (e.g., by PCR).
  • a sample can be treated with nucleases to remove the linear nucleic acids after probe-circularizing capture of the region of interest.
  • a circularized probe can be linearized, e.g., by nuclease treatment.
  • a circularized probe molecule can be sequenced directly by any means known in the art, without amplification.
  • a circularized probe can be contacted by an oligonucleotide that primes polymerase-mediated extension of the molecules to generate sequences complementary to that of the circularized probe, including from at least one to as many as 1 million or more concatemerized copies of the original circular probe.
  • a circularized probe molecule can be enriched from the reaction solution by means of a secondary-capture oligonucleotide capture probe.
  • a secondary-capture oligonucleotide capture probe may comprise a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence designed to hybridize to at least 6 nucleotides of a circularized probe.
  • the nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe may include 1, 2, 4, 8, 16, 32 or more nucleotides of the polymerase-extended capture product.
  • a probe and/or captured region of interest can be sequenced by any means known in the art, such as polymerase-dependent sequencing (including, dideoxy sequencing, pyrosequencing, and sequencing by synthesis) or ligase based sequencing (e.g., polony sequencing).
  • Methods of detecting the presence of one or more hepatitis viruses further comprise the step of formatting the results to facilitate physician decision making by, for example, providing one or more graphical displays.
  • the invention provides a method of treating a subject suspected of being infected with a hepatitis A, B, C, D, or E virus, comprising detecting at least one hepatitis A, B, C, D, or E virus by the methods of the invention and administering a suitable therapeutic treatment based on the at least one hepatitis virus detected.
  • the invention also provides a method of treating a subject suspected of being infected with an HCV strain carrying a drug resistance mutation, comprising detecting at least one HCV drug resistant genotype by the methods of the invention and administering a suitable therapeutic treatment based on the at least one HCV drug resistant genotype detected.
  • the invention provides methods of detecting the presence of one or more HCV strains in a test sample.
  • the methods can comprise the step of contacting a mixture comprising probes described above with any of the test samples described above in a capture reaction, as defined above.
  • a mixture comprising probes can be contacted with nucleic acids extracted from a test sample such as blood, along with a polymerase enzyme and nucleotide triphosphates (NTPs), and can capture at least one region of interest by polymerase-dependent extension of at least one homologous probe sequence in the mixture.
  • a polymerase-dependent extension of a homologous probe sequence can be followed by ligation of the end of the extended homologous probe sequence to the end of the other homologous probe sequence to produce a circularized probe containing a region of interest from the genome of an HAV, HBV, or HCV strain.
  • a ligation reaction can occur while the target arm is hybridized to the target.
  • a target arm can be dissociated from the target and ligated in solution under reaction conditions favoring self-ligation over trans-ligation to other probe molecules, for example, a dilute ligation solution.
  • a sample may be treated with endonuclease to digest single-stranded linear DNA.
  • Primers complementary to the probe backbone may amplify the MIP into dsDNA for sequencing.
  • amplification primers at this stage may contain sample-specific nucleotide barcode sequences, e.g., they may be adaptamer primers.
  • a unique primer:barcode sequence may therefore identify each test sample. For example, a panel of 100 probes can be contacted with 50 individual test samples.
  • the homologous probe sequences detected in a sequence read can identify a strain of hepatitis A, B, C, D, or E or a drug resistance genotype of a strain of HCV.
  • Each test sample amplification reaction can be done with one unique probe set.
  • Each barcode within the amplification primer can act as an identifier for a patient. Therefore, 50 pairs of amplification primers (one for each amplification reaction product) and one panel of probes (e.g., probes for hepatitis A, B, and C distinction, for HCV genotyping, or both) are required for a 50-sample multiplex assay.
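After pooling, each sequence read must be assigned back to a sample (via the primer barcode) and a probe (via the homologous probe sequences). This sketch assumes a hypothetical read layout of a fixed-length sample barcode immediately followed by a probe arm, exact matching, and a prefix-free set of probe arms; the barcodes, arm sequences, and names are illustrative only.

```python
from collections import Counter

def demultiplex(reads, sample_barcodes, probe_arms, barcode_len=6):
    """Count reads per (sample, probe); reads laid out as barcode + probe arm + insert."""
    counts = Counter()
    unassigned = 0
    for read in reads:
        sample = sample_barcodes.get(read[:barcode_len])
        rest = read[barcode_len:]
        probe = next((name for name, arm in probe_arms.items() if rest.startswith(arm)), None)
        if sample is None or probe is None:
            unassigned += 1  # unknown barcode or unrecognized probe arm
        else:
            counts[(sample, probe)] += 1
    return counts, unassigned

barcodes = {"AAAAAA": "patient_01", "CCCCCC": "patient_02"}  # hypothetical
arms = {"probe_1": "GATTACA", "probe_2": "TGCATGC"}          # hypothetical
reads = ["AAAAAAGATTACATTT", "AAAAAAGATTACAGGG", "CCCCCCTGCATGCAAA", "GGGGGGGATTACA"]
counts, unassigned = demultiplex(reads, barcodes, arms)
print(dict(counts), unassigned)
```

In the 50-sample, 100-probe example above, the same procedure would fill a 50 x 100 table of counts from a single pooled sequencing run.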
  • Each test sample can be contacted with a unique set of probes, e.g., a panel.
  • Amplification reaction products for each test sample can be pooled.
  • the homologous probe sequences and capture sequence identify both the target organism (e.g., an HCV strain) and test sample, since each test sample can be contacted with a unique probe set.
  • Conventional primer pairs (i.e., comprising homologous probe sequences and a probe recognition sequence) can be contacted with sample nucleic acids to amplify a region of interest using low cycle numbers (e.g., 10 or fewer cycles) to reduce amplification artifacts.
  • probes directed to the probe recognition sequence of the conventional primer pair amplification products can be applied.
  • Polymerase extension and ligation captures the homologous probe sequences of the conventional primer pair and the intervening region of interest.
  • Unique barcoded probe sequences can allow for sample (e.g., patient) multiplexing. Sequence reads can comprise homologous probe sequences (identifying an organism of interest) and barcodes (associated with a sample, e.g., patient). In the example of a 100 probe panel and 50 test samples, each HCV strain has a pair of homologous probe sequences, which identify the strain of interest. Each test sample can be contacted with a unique probe set. Each barcode within the probe backbone can be used to act as a sample identifier. Therefore, for example, 50 sets of probes with 100 probes in each can be used.
  • Polymerases for use in the methods provided by the invention include Taq polymerase (Lawyer et al., J. Biol. Chem., 264:6427-6437 (1989); GenBank accession P19821), including the 5'→3' nuclease-deficient "Stoffel" fragment described in Lawyer et al., PCR Meth. Appl., 2:275-287 (1993), PHUSION™ high-fidelity recombinant polymerase (NEB), and Pyrococcus furiosus (Pfu) polymerase (see, e.g., U.S. Patent No.
  • a polymerase can be 5'→3' nuclease deficient, such as the Stoffel fragment of Taq polymerase, which further lacks 3'→5' proofreading activity.
  • Polymerases lacking 5'→3' exonuclease activity may be generated by means known in the art, for example, based on methods of screening or rational design.
  • polymerase variants can be designed based on sequence alignments of one or more polymerases to the Stoffel fragment of Taq and/or by "threading" a sequence through a solved polymerase structure (e.g., MMDB IDs 56530, 81884 and 81885).
  • a polymerase for use in the methods of the invention can be a non-displacing polymerase, such as Pfu, T4 DNA polymerase, or T7 DNA polymerase.
  • a polymerase for use in the methods provided by the invention can be a polymerase suitable for isothermal amplification, and capture and/or amplification reactions can be performed isothermally, e.g., by controlling metal ion concentration and/or using particular polymerases and/or additional enzymes, such as helicases or nicking enzymes (as in primer generation RCA and EXPAR). See, e.g., U.S. Patent No. 6,566,103, Murakami et al, Nucl. Acid.
  • Polymerases for use in isothermal amplification include, for example, Bst, Bsu, and phi29 DNA polymerases, and E. coli DNA polymerase I.
  • a mixture of probes can be contacted with nucleic acids extracted from a test sample, a ligase enzyme, and a pool of n-mer oligonucleotides in a capture reaction, as defined above.
  • the n-mer oligonucleotides can be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, or 25 nucleotides long. They can be random hexamers. They can be polynucleotides whose length equals that of the region of interest between the first and second target sequences that hybridize to the homologous probe sequence.
  • the n-mer oligonucleotide contains 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 locked nucleic acids (LNAs) or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% LNAs.
  • the ligase enzyme ligates the n-mer oligonucleotides with the probes provided by the invention to produce a circularized probe containing a region of interest from HCV.
  • Primers complementary to the probe backbone amplify the probe into dsDNA for sequencing.
  • Amplification primers can be adaptamer primers and contain sample-identifying barcode sequences, such as for multiplexing. A unique barcode sequence can therefore identify each sample in a multiplex.
  • Each strain of HCV can be identified by the unique combination of homologous probe sequences and ligated n-mer in a sequence read.
  • Ligases for use in the methods of the invention include T4, T7, and thermostable ligases, such as Taq ligase (as disclosed in Takahashi et al., J. Biol. Chem., 259:10041-47 (1984), and International Publication WO 91/17239), and AMPLIGASE™.
  • Mixtures comprising pairs of conventional PCR primers (conventional primer pairs provided by the invention; e.g., SEQ ID NOs. 1 and 2; SEQ ID NOs. 873 and 874) can be contacted with sample nucleic acids to amplify a region of interest between two target regions in HCV.
  • a limited number of amplification steps can be performed. Fewer than 25, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4, 3, or 2 cycles of amplification can be performed.
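One reason to limit cycle numbers follows from the arithmetic of exponential amplification: copies grow as N0(1 + E)^n, so any difference in per-cycle efficiency E between templates compounds with every cycle. The sketch below computes copy numbers, cycles needed to reach a target amount, and the fold distortion between two templates; the efficiencies and copy numbers are hypothetical, and the model ignores the plateau phase.

```python
import math

def copies_after(n0, cycles, efficiency=1.0):
    """Expected copies after exponential amplification; efficiency 1.0 means perfect doubling."""
    return n0 * (1 + efficiency) ** cycles

def cycles_needed(n0, target, efficiency=1.0):
    """Smallest whole number of cycles that reaches at least `target` copies."""
    return math.ceil(math.log(target / n0) / math.log(1 + efficiency))

def bias_ratio(eff_a, eff_b, cycles):
    """Fold change in the A:B ratio when two templates amplify at different efficiencies."""
    return ((1 + eff_a) / (1 + eff_b)) ** cycles

print(copies_after(100, 10))            # 100 starting copies, 10 perfect cycles
print(cycles_needed(100, 1e5))          # cycles to reach 100,000 copies
for n in (10, 25):
    print(n, round(bias_ratio(0.95, 0.85, n), 2))
```

With hypothetical efficiencies of 0.95 and 0.85, the representation of the two templates is distorted about 1.7-fold after 10 cycles but about 3.7-fold after 25, which illustrates why fewer cycles help keep read counts proportional to input.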
  • a mixture of conventional primer pairs can be contacted with nucleic acids extracted from a test sample, a polymerase, and nucleotide triphosphates to amplify the region of interest.
  • Primers binding to universal probe recognition sequence (e.g., a barcode) in the conventional primer pairs can introduce nucleotide barcodes, and recognition sites for next-generation DNA sequencing technology primers.
  • conventional primer pairs can be used in a variety of additional methods.
  • conventional primer pairs may be contacted with a sample nucleic acid suspected of containing at least one target nucleic acid.
  • PCR may be used to amplify the region of interest directly from a sample nucleic acid.
  • Conventional primer pairs may be used to amplify capture reaction products, e.g., one or more circularized probes.
  • a sample nucleic acid suspected of containing a region of interest can be amplified using a conventional primer pair and then contacted with a probe provided by the invention for circularizing capture.
  • Conventional primer pairs can be contacted with a sample nucleic acid and modified nucleotides, such as biotinylated nucleotides.
  • the resulting capture or amplification reaction products can then be isolated by affinity capture, for example, with streptavidin substrates, for subsequent processing, e.g., circularizing capture with the probes provided by the invention.
  • a single conventional primer may be used for linear amplification of a region of interest in a sample nucleic acid, and the amplification product can then be contacted with a probe provided by the invention for circularizing capture.
  • a single conventional primer containing a 5' biotin moiety may be used to amplify a target sequence and then be enriched from the sample using streptavidin capture for sequencing by, for example, direct sequencing using either specific conventional primer pairs provided by the invention, or by random hexamer priming, or may be used for circularizing capture using probes provided by the invention.
  • Methods that comprise a capture reaction can further comprise the step of contacting the capture reaction product with one or more exonucleases to remove linear nucleic acids.
  • the exonuclease includes at least one of exo I, exo III, exo VII, and exo V.
  • the exonuclease can be up to a 100:1, 50:1, 25:1, 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, 1:25, 1:50, or 1:100 (unit to unit) mixture of exonuclease I and exonuclease III.
  • Methods of the invention can further comprise the step of amplifying capture reaction products in an amplification reaction.
  • Methods for amplifying nucleic acids are known in the art and include the polymerase chain reaction (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202 and McPherson and Moller, PCR (The Basics), Taylor & Francis, 2nd edition (March 30, 2006)), OLA (oligonucleotide ligation amplification) (see, e.g., U.S. Patent Nos. 5,185,243, 5,679,524, and 5,573,907), rolling-circle amplification ("RCA," described in Baner et al., Nuc.
  • The amplification can be a linear amplification, such as RCA.
  • Capture reaction products, e.g., circularized probes, can be used as templates in an RCA to generate long, linear repeating ssDNA products.
  • the RCA reaction may comprise contacting a sample with modified nucleotides, such as biotinylated nucleotides, LNA nucleotides or artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), to facilitate affinity enrichment and purification.
  • Amplification reaction products comprising linear repeating ssDNA can be contacted with a conventional primer provided by the invention to produce short extensions of double-stranded DNA with a length of 2, 3, 4, 5, 6, 7, 10, 15, 20, 30, 40, 50, 75, 100, or 500 nucleotides.
  • the length of extension may be controlled by the duration of the extension step at the optimum elongation temperature for the polymerase, e.g., 5, 10, 15, 20, 40, or 60 seconds at temperatures including 37, 42, 45, 68, 72, or 74 °C.
  • the length of extension can be controlled by adding to the reaction nucleotide analogues that prevent further elongation, such as dideoxycytidine, or nucleotides with a 3' modification, such as biotin or a carbon spacer terminated with an amino group.
  • a primer can be contacted with a linear repeating ssDNA RCA amplification reaction product and extended by a polymerase for a single cycle of PCR, to generate a short single-stranded DNA containing the sequence complementary to the repeating unit of the RCA product.
  • the primer contacted with a linear repeating ssDNA RCA amplification reaction product can produce a dsDNA region comprising a restriction enzyme cleavage site. Accordingly, when the primer hybridizes to the linear repeating ssDNA RCA amplification reaction product to form a double-stranded DNA region, the amplification reaction product can be contacted with the restriction enzyme to produce shorter fragments.
  • An amplification reaction can use adaptamer primers.
  • the amplification reaction can use sample-specific primers, that is, primers that hybridize to sequences present in the probe that identify the sample.
  • a low number of amplification cycles can be used to avoid amplification artifacts, e.g., fewer than 25, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4, 3, or 2 cycles.
  • Methods provided herein may comprise the step of contacting sample nucleic acids, capture reaction products, or amplification reaction products with a secondary-capture oligonucleotide capture probe, which can comprise a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence that is able to hybridize to the sample nucleic acids, capture reaction products, or amplification reaction products.
  • a secondary-capture oligonucleotide, such as a biotinylated oligonucleotide, may be used to enrich its target nucleic acids by affinity purification.
  • a biotinylated oligonucleotide may specifically hybridize to a captured sequence (i.e., it can be complementary to a region of interest), a homologous probe sequence, or a backbone sequence, such as a barcode sequence.
  • a biotinylated probe may be extended on sample nucleic acids, capture reaction products or amplification reaction products using thermophilic or mesophilic polymerases.
  • the method can comprise contacting a capture reaction product with a biotinylated oligonucleotide for enrichment of specific capture reaction products using the biotin-streptavidin interaction.
  • Sequences captured by the methods of the invention can be detected by any means, including, for example, array hybridization or direct sequencing. Captured sequences may be detected by sequencing without amplification. Numerous sequencing methods are known in the art, can be used in the method of the invention, and are reviewed in, e.g., U.S. Patent No. 6,946,249 and Metzker, Nat. Reviews, Genetics, 11:31-46 (2010); Ansorge, Nat. Biotechnol., 25(4):195-203 (2009); Shendure and Ji, Nat. Biotechnol., 26(10):1135-45 (2008); Shendure et al., Nat. Rev. Genet., 5:335-44 (2004).
  • the sequencing methods can rely on the specificity of either a DNA polymerase or a DNA ligase and include, e.g., pyrosequencing, base extension sequencing (single base stepwise extensions), multi-base sequencing by synthesis (including, e.g., sequencing with terminally-labeled nucleotides), and wobble sequencing, which can be ligation-based.
  • Extension sequencing is disclosed in, e.g., U.S. Patent No. 5,302,509.
  • Exemplary embodiments of terminal-phosphate-labeled nucleotides and methods of using them are described in, e.g., U.S. Patent No. 7,361,466; U.S. Patent Publication No. 2007/0141598, published Jun.
  • Ligase-based sequencing methods are disclosed in, for example, U.S. Patent No. 5,750,341, PCT publication WO 06/073504, and Shendure et al., Science 309:1728-1732 (2005).
  • Sequencing technology used in the methods provided by the invention includes Sanger sequencing, microelectrophoretic sequencing, nanopore sequencing, sequencing by hybridization (e.g., array-based sequencing), real-time observation of single molecules, and cyclic-array sequencing, including pyrosequencing (e.g., 454 SEQUENCING®; see, e.g., Margulies et al., Nature, 437:376-380 (2005)) and ILLUMINA® or SOLEXA® sequencing (see, e.g., Turcatti et al., Nucleic Acids Res., 36, e25 (2008); see also U.S. Patent Nos.
  • Capture probes can contain sequences that facilitate processing for sequencing by a certain sequencing technology, such as sequences that can serve as anchor sites for sequencing by synthesis, primer sites for sequencing reaction initiation, or restriction enzyme sites that allow cleavage for improved ligation of oligonucleotide adaptors for sequencing of the particular amplicon.
  • Circularized capture probes can be contacted by oligonucleotides which prime polymerase-mediated extension of the capture probes to generate sequences complementary to that of the circularized probe, including from at least one to one million or more concatemerized copies of the original circular probe.
  • homologous probe sequences can be used in the probes provided by the invention, as well as in conventional primer pairs.
  • homologous probe sequences can be about 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases.
  • a region of interest between the target sequences of a probe or conventional primer pair can be about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases.
  • Probes described herein may be circularized by polymerase-dependent synthesis and ligation, or by ligation of n-mer oligonucleotides of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases.
  • a region of interest can be about 7 bases and homologous probe sequences can be 10 or 12 bases.
  • a 7-mer oligonucleotide comprising a locked nucleic acid can be ligated to a probe provided by the invention, and a 7-mer oligonucleotide can comprise at least 1, 2, 3, 4, 5, 6, or 7 locked nucleic acids (LNAs).
  • Capture or amplification reaction products may be sequenced by emulsion droplet sequencing by synthesis as disclosed in, for example, Binladen et al., PLoS One, 2(2):e197 (2007).
  • Capture products may be amplified by RCA to generate higher copy numbers of capture product within a single DNA molecule in order to facilitate emulsion of captured DNA for emulsion PCR and sequencing by synthesis. See, e.g., Drmanac et al., Science 327(5961):78-81 (2010).
  • Capture reaction products and/or amplification reaction products containing different samples can be combined before detection.
  • Capture and/or amplification reaction products can be combinatorially pooled before detection, e.g., an MxN array of individual capture reaction products and/or amplification reaction products can be pooled by row and column, and the pools can be detected. Results from row and column pools can then be deconvolved to provide results for individual samples. Higher dimensional arrays and pools may be used analogously.
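The row-and-column pooling above can be illustrated with a short sketch. It assumes, for illustration only, that each pool yields a simple positive/negative call; the function names are hypothetical. A sample is a candidate only when both its row pool and its column pool are positive, so an M×N array needs M+N pooled detections rather than M×N.

```python
def pool_results(grid):
    """Simulate pooled calls: a row or column pool is positive if any member is."""
    rows = [any(row) for row in grid]
    cols = [any(row[j] for row in grid) for j in range(len(grid[0]))]
    return rows, cols


def deconvolve(row_pos, col_pos):
    """Return candidate positive (row, col) positions.

    A sample is a candidate only if its row pool and its column pool are both
    positive. With several positives, some candidates can be false and would
    need retesting.
    """
    return [
        (i, j)
        for i, r in enumerate(row_pos) if r
        for j, c in enumerate(col_pos) if c
    ]


if __name__ == "__main__":
    # 3 x 4 array with a single positive sample at row 1, column 2.
    grid = [
        [False, False, False, False],
        [False, False, True, False],
        [False, False, False, False],
    ]
    rows, cols = pool_results(grid)
    print(deconvolve(rows, cols))  # [(1, 2)]
```

When two positives sit on a diagonal, the intersection yields four candidates, which is why higher-dimensional pools or follow-up tests may be needed once several samples are positive.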
  • Capture reaction products and/or amplification reaction products can contain identifying barcode sequences.
  • Amplification primers can contain sample-specific barcode sequences.
  • the sample source of sequences contained in pools of capture reaction products and/or amplification reaction products can be identified by their barcode sequences.
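Assigning pooled reads back to samples by barcode can be sketched as below. This is an illustration under assumptions not stated in the text: the barcode occupies the first bases of each read, all barcodes have the same length, and up to one mismatch is tolerated. For single-mismatch correction to be unambiguous, barcodes would need a pairwise distance of at least 3. Barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by sample using a leading barcode (hypothetical read layout).

    barcodes maps barcode sequence -> sample name. Reads matching no barcode,
    more than one barcode, or shorter than a barcode are kept whole under
    "unassigned".
    """
    length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        if len(read) < length:
            bins["unassigned"].append(read)
            continue
        prefix = read[:length]
        hits = [sample for bc, sample in barcodes.items()
                if hamming(prefix, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[length:])  # trim the barcode
        else:
            bins["unassigned"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
reads = ["ACGTGGGG", "TGCATTTT", "ACGAGGGG", "CCCCAAAA"]
print(demultiplex(reads, barcodes))
# {'sample1': ['GGGG', 'GGGG'], 'sample2': ['TTTT'], 'unassigned': ['CCCCAAAA']}
```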
  • the methods provided by the invention may also include directly detecting a particular nucleic acid in a capture reaction product or amplification reaction product, such as a particular target amplicon or set of amplicons.
  • mixtures of the invention can comprise specialized probe sets, including TAQMAN™, which uses a hydrolyzable probe containing detectable reporter and quencher moieties that can be released by a DNA polymerase with 5' to 3' exonuclease activity (U.S. Patent No. 5,538,848); molecular beacon, which uses a hairpin probe with reporter and quenching moieties at opposite termini (U.S. Patent No. 5,925,517); FRET primers, which use a pair of adjacent primers with fluorescent donor and acceptor moieties, respectively (U.S. Patent No. 6,174,670); and LIGHTUP™, a single short probe that fluoresces only when bound to the target (U.S. Patent No. 6,329,144).
  • SCORPION™ (U.S. Patent No. 6,326,145)
  • SIMPLEPROBES™ (U.S. Patent No. 6,635,427)
  • Amplicon-detecting probes can be designed according to the particular detection modality used, and as discussed in the above-referenced patents.
  • a quantitative, real-time PCR assay to detect a particular capture reaction product or amplification reaction product may be performed on the ILLUMINA® Eco™ Real-Time PCR System.
  • Methods disclosed herein can comprise using sample internal calibration nucleic acid (SICs) to estimate the concentration of a hepatitis strain in a test sample. This can be done by calibrating the frequency of a sequence from a hepatitis strain to the known concentration of the SICs to provide an estimated concentration of the viral strain in the test sample. An estimated concentration of the viral strain can be compared to a database of reference concentrations of hepatitis strains associated with a disease state and/or likely clinical diagnoses.
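The SIC calibration step can be illustrated with a simple proportional model. This sketch assumes read counts scale linearly with input concentration and that the SIC and each hepatitis strain are captured with equal efficiency; in practice a per-probe efficiency correction may be needed. Strain names, read counts, and units are hypothetical.

```python
def estimate_concentrations(read_counts, sic_name, sic_concentration):
    """Scale each strain's read count by the known SIC concentration.

    Assumes (hypothetically) equal capture and sequencing efficiency for the
    SIC and every strain, so concentration is proportional to reads.
    """
    sic_reads = read_counts.get(sic_name, 0)
    if sic_reads <= 0:
        raise ValueError("no SIC reads observed; cannot calibrate")
    return {
        name: sic_concentration * count / sic_reads
        for name, count in read_counts.items()
        if name != sic_name
    }


counts = {"SIC": 4000, "HCV-1a": 12000, "HCV-3a": 800}  # hypothetical read counts
print(estimate_concentrations(counts, "SIC", sic_concentration=1000.0))
# {'HCV-1a': 3000.0, 'HCV-3a': 200.0}
```

The resulting estimates would then be compared against reference concentrations; no reference values are implied here.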
  • Methods of the invention can further comprise steps of formatting results to inform physician decision making.
  • Results can refer to the outcome of detecting a target organism and include, e.g., binary (e.g., +/-) detection as well as estimates of concentration, and may be based on, inter alia, the result of sequencing a capture reaction product or amplification reaction product.
  • Formatting can comprise presenting an estimate of the concentration of an organism in a test sample, optionally including statistical confidence intervals. Formatting can further comprise color-coding of the results. Formatting can include recommendations for therapeutic intervention, including, for example, hospitalization, probiotic treatment, antibiotic treatments, and chemotherapy. Formatting can comprise one or more of the following: references to peer-reviewed medical literature and database statistics of empirically defined sample results.
  • Conversion of raw sequence data may occur in three stages, namely (1) the processing of raw instrument data and conversion into aligned sequencing reads, (2) statistical interpretation of read data, and (3) providing output and storage in archives.
  • Processing of raw data from raw instrument readout to sequence information that can be associated with a location in a hepatitis genome may involve at least the following steps:
  • integrating sequence readout (reads) and quality score files either before or during alignment.
  • The sequencing platform generates quality scores that capture errors and identify the decay of sequence quality with read length.
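Per-base quality scores are commonly reported on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below assumes Phred+33-encoded FASTQ quality strings and a hypothetical Q20 cutoff; it decodes scores, computes the expected number of errors in a read, and trims the low-quality 3' tail that reflects the decay of quality with read length.

```python
def decode_quality(qual_string, offset=33):
    """Convert a FASTQ quality string (Phred+33 by default) to integer scores."""
    return [ord(ch) - offset for ch in qual_string]


def phred_to_error_prob(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(qual_string, offset=33):
    """Sum of per-base error probabilities: the expected number of errors in the read."""
    return sum(phred_to_error_prob(q) for q in decode_quality(qual_string, offset))


def trim_low_quality_tail(seq, qual_string, min_q=20, offset=33):
    """Trim bases from the 3' end while their quality is below min_q."""
    scores = decode_quality(qual_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual_string[:end]


print(trim_low_quality_tail("ACGTAC", "IIII5#"))  # ('ACGTA', 'IIII5')
```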
  • Statistical analysis and interpretation then can proceed to account for all statistically significant hits against all genomes and optionally subclassify hits by regions of interest, such as resistance loci or unique identifiers of an HCV strain.
  • Sequencing reads may align to target genomic DNA with near-perfect matching through the probe arm region, while the alignment in the polymerase-extended region may reveal sequence variation through this region, which allows assignment of these amplicon sequences to different strains.
  • Some reads may map to regions common to one or more strains. As an example, most reads may align to all of strains A, B, C, and D and are therefore common. In contrast, other reads may be unique to specific strains (e.g., the subset of reads aligning only to strain D). Quantitative models can be used to predict the distribution of common reads and unique reads in order to provide a quantitative estimate of the proportion of each unique pathogen present in the sample.
  • Statistical analysis can include simple summary statistics, such as hit density for all hepatitis strains, where hit density can be the number of hits in a window of sequence divided by the number of high-quality reads. It can be recorded by sequence coordinates in the pathogen sequence or by a combination of a "region of interest" ID and the distance from its center.
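Hit density as defined above can be computed over fixed, non-overlapping windows along a reference genome. This is a minimal sketch; the window size and the use of alignment start coordinates as hit positions are illustrative assumptions.

```python
from collections import Counter


def hit_density(hit_positions, window, n_high_quality_reads):
    """Hits per window divided by the number of high-quality reads.

    hit_positions: alignment start coordinates of reads on one reference genome.
    Returns {window_start: density}, keyed by sequence coordinate.
    """
    if n_high_quality_reads <= 0:
        raise ValueError("need at least one high-quality read")
    counts = Counter(pos // window for pos in hit_positions)
    return {w * window: c / n_high_quality_reads for w, c in sorted(counts.items())}


print(hit_density([10, 20, 150, 160, 170], window=100, n_high_quality_reads=1000))
# {0: 0.002, 100: 0.003}
```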
  • classification methodologies may be used to provide accurate assignment of samples to hepatitis types or HCV genotypes.
  • The available toolbox includes maximum likelihood and Bayesian approaches, linear discriminant-based methodologies, and neural network approaches. The methods may employ any one or a combination of these approaches.
  • Further approaches include hidden Markov models (HMMs), Parzen windows, multivariate regression, and support vector machines (SVMs).
  • Disclosed methods can employ one or more of these approaches evaluated against reference data sets in order to achieve maximum specificity and sensitivity.
  • Final analysis may depend on running many samples both on a system of the invention and on a "gold standard" reference. From this, one can examine the properties of these data and of the assays and implement fixed analysis algorithms. These algorithms need not be truly fixed, but can instead adapt themselves to incoming data. This prior analysis can be run several times over the life cycle of a system of the invention. Statistical interpretation as implemented above can depend on prior analysis performed on powerful computational resources.
  • Initial analysis generates algorithmic recipes for analysis and interpretation, which can then be deployed into a system of the invention.
  • a goal of sequencing and subsequent analysis following a capture reaction using a set of probes can be to determine the set of hepatitis types or HCV genotypes whose DNA is present in a sample.
  • a further goal can be to determine the relative quantities of those strains in the sample.
  • Methods of analysis may rely on a model for the probability of errors in sequencing reads and a model for mutations arising between related strains.
  • the simplest version of these models may treat all errors or changes as having equal probability, where that probability may be derived from data or chosen based on a researcher's best guess.
  • More advanced models may learn the probabilities of different types of errors from sequencing datasets of known template material using the same machine, sample preparation, and analysis software.
  • Other advanced models may learn the probabilities of mutations based on sets of known strains from public databases of genes or genomes, private databases of genes or genomes, or from unassembled or partially assembled collections of sequencing reads.
  • the set of expected read sequences may be computed.
  • Each expected read sequence may be derived from one probe and one genome, thus the number of expected read sequences may be the product of the number of genomes and the number of probes.
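The set of expected reads can be generated in silico by locating both probe arms in every genome and taking the span between them. This is a minimal sketch under simplifying assumptions: exact arm matching, arms written in the same orientation as the genome string, and no reverse-complement search. The toy genomes and probe are hypothetical.

```python
def expected_product(genome, left_arm, right_arm, max_gap=200):
    """Return the region spanned by one probe on one genome, or None.

    Simplifications (assumptions): exact matches only, arms in genome
    orientation, and only the first left-arm hit is considered.
    """
    i = genome.find(left_arm)
    if i < 0:
        return None
    gap_start = i + len(left_arm)
    j = genome.find(right_arm, gap_start)
    if j < 0 or j - gap_start > max_gap:
        return None
    return genome[i:j + len(right_arm)]


def expected_reads(genomes, probes):
    """One entry per (probe, genome) pair, so len(result) == len(probes) * len(genomes)."""
    return {
        (p_name, g_name): expected_product(seq, left, right)
        for p_name, (left, right) in probes.items()
        for g_name, seq in genomes.items()
    }


genomes = {"strainA": "TTTACGTAAAGGCCCTTT", "strainB": "TTTACGTAGAGGCCCTTT"}
probes = {"p1": ("ACGT", "GCCC")}
print(expected_reads(genomes, probes))
# {('p1', 'strainA'): 'ACGTAAAGGCCC', ('p1', 'strainB'): 'ACGTAGAGGCCC'}
```

Pairs where a probe does not capture a genome map to None, which keeps the table size equal to the product of the number of probes and genomes.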
  • the reads may be aligned against the set of expected reads.
  • the method may compute the probability that the read (or pair of reads) can be derived from each expected product.
  • the method may then compute the set of all organisms or strains that might be present in the sample as the union of the organisms/strains from all expected products to which a read aligns with greater than a selected minimum probability, for example, 0.1, 0.01, 0.001, 0.0001, or lower.
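The per-read probability and the union over candidate strains can be sketched with the simplest error model mentioned above, in which every substitution is equally likely. This illustration assumes reads and expected products of equal length (no indels), a hypothetical per-base error rate of 1%, and a uniform prior over expected products.

```python
def read_likelihood(read, expected, error_rate=0.01):
    """P(read | expected product) under a uniform, substitution-only error model."""
    if expected is None or len(read) != len(expected):
        return 0.0  # indels are not modeled in this sketch
    d = sum(a != b for a, b in zip(read, expected))
    return (1 - error_rate) ** (len(read) - d) * (error_rate / 3) ** d


def candidate_strains(reads, expected, min_prob=0.01, error_rate=0.01):
    """Union of strains whose expected products get posterior > min_prob for some read.

    expected maps (probe, strain) -> expected read sequence (or None); all
    expected products are treated as equally likely a priori.
    """
    present = set()
    for read in reads:
        likes = {key: read_likelihood(read, seq, error_rate) for key, seq in expected.items()}
        total = sum(likes.values())
        if total == 0:
            continue  # read is explained by no expected product
        for (_probe, strain), like in likes.items():
            if like / total > min_prob:
                present.add(strain)
    return present


expected = {("p1", "strainA"): "ACGTAAAGGCCC", ("p1", "strainB"): "ACGTAGAGGCCC"}
print(candidate_strains(["ACGTAAAGGCCC"], expected))  # {'strainA'}
print(sorted(candidate_strains(["ACGTAAAGGCCC"], expected, min_prob=0.001)))  # ['strainA', 'strainB']
```

Lowering the threshold admits strainB because a single sequencing error could turn its expected product into the observed read.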
  • Methods of analysis can further determine the relative proportion or abundance of each strain, such that the proportions or abundances maximize the probability of the observed set of sequencing reads, given: (1) the probabilities of each read aligning to each expected read; (2) a prior probability of observing each strain in the sample (for this type of probability, each strain can be equally likely); and (3) a prior probability of the number of strains that can be present.
  • each number of strains may be equally likely.
  • the probability of the number of strains may follow a Dirichlet distribution.
  • Methods of analysis can determine the relative proportions or abundances of organisms via a "Mixture Model.”
  • Hidden variables in the model can be the proportions or abundances of the strains and the assignments of sequencing reads to expected reads (where each observed read can be assigned to a single expected read).
  • a variety of methods, including Expectation-Maximization, Gibbs Sampling, and Metropolis-Hastings, may be used to find the values of these hidden variables that maximize the probability of the data given the hidden variables and the priors on the hidden variables.
  • Methods can also incorporate unknown strains or genotypes into the Mixture Model by using the probabilities of mutations. Genomes of unknown strains can be generated based on observed reads that contain one or more mismatches to all known hepatitis genomes. The previously unknown genome may be added to the mixture with the same probability as a known genome.
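One way to fit the mixture model is Expectation-Maximization, alternating between soft assignment of reads to strains and re-estimation of the strain proportions. The sketch below is a simplification, assuming per-read likelihoods have already been combined over each strain's expected reads and that all strains are equally likely a priori; the toy likelihood matrix mirrors the common-versus-unique read situation described earlier.

```python
def em_proportions(likelihoods, n_iter=500, tol=1e-10):
    """Estimate strain proportions by Expectation-Maximization.

    likelihoods[i][k] is P(read i | strain k). With a uniform prior on the
    proportions, this converges to a maximum-likelihood mixture.
    """
    n_strains = len(likelihoods[0])
    pi = [1.0 / n_strains] * n_strains  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * n_strains
        used = 0
        for row in likelihoods:
            # E-step: responsibility of each strain for this read
            denom = sum(p * lk for p, lk in zip(pi, row))
            if denom == 0:
                continue  # read is explained by no strain; ignore it
            used += 1
            for k in range(n_strains):
                totals[k] += pi[k] * row[k] / denom
        if used == 0:
            raise ValueError("no read is compatible with any strain")
        # M-step: new proportions are the mean responsibilities
        new_pi = [t / used for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Hypothetical data: 6 reads unique to strain A, 2 unique to B, 8 common to both.
toy = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[1.0, 1.0]] * 8
print([round(p, 3) for p in em_proportions(toy)])  # [0.75, 0.25]
```

The common reads end up split in the same 3:1 ratio as the unique reads, which is the maximum-likelihood answer for this toy case.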
  • Some embodiments also correct for multiple testing. Without limitation as to any one technique, the objective can be to eliminate false positives and false negatives. The false positive rate (FPR) and false discovery rate (FDR) are among the most promising corrections because they are adaptable to any system. Thresholds can be updated over time as additional cases are tested. Exemplary embodiments categorize a sample as (1) a significant hit, (2) an inconclusive hit, (3) lack of hit or missing pathogen, or (4) poor sample quality or data error.
  • Output of results can occur in parallel (1) to a company server, (2) to XML and HL7 formats, e.g., for deposit in a hospital system, in an electronic medical record (EMR) system, or in other HL7- or XML-capable storage systems, for use in existing health record frameworks, and/or (3) to physician-friendly graphical and text formats, e.g., graphs, tables, summary text, and possibly annotated web formats linking to reference information.
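As one concrete FDR procedure, the Benjamini-Hochberg step-up method can be combined with the four-way categorization above. This is a sketch, not the disclosed implementation: how per-strain p-values are obtained is left open, and the 0.2 cutoff separating inconclusive hits from non-hits is a hypothetical placeholder.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return True for each hypothesis rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            max_rank = rank  # largest rank satisfying the BH condition
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= max_rank
    return rejected


def categorize(p_value, rejected, qc_passed, inconclusive_p=0.2):
    """Map one test onto the four categories; inconclusive_p is a hypothetical cutoff."""
    if not qc_passed:
        return "poor sample quality or data error"
    if rejected:
        return "significant hit"
    if p_value < inconclusive_p:
        return "inconclusive hit"
    return "lack of hit"


p = [0.001, 0.02, 0.04, 0.3]  # hypothetical per-strain p-values
flags = benjamini_hochberg(p)
print([categorize(pv, r, qc_passed=True) for pv, r in zip(p, flags)])
# ['significant hit', 'significant hit', 'inconclusive hit', 'lack of hit']
```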
  • Output formats can be arbitrary, e.g., simple text, spreadsheet data, binary data objects, encrypted and/or compressed files.
  • a complete record may involve all or some of these linked to a diagnostic test via unique identifiers. They may be assembled into a coherent object or may be accessible via a search for the unique identifier.
  • a further aspect of the invention provides methods of making the mixtures of probes provided by the invention.
  • the methods comprise providing a reference genome and an exclusion set of genomes.
  • the sequence of the reference genome can be sliced (in silico) into n-mer strings of about 18-50 nucleotides.
  • the sliced n-mer strings can be screened to eliminate redundant sequences, sequences with secondary structure, repetitive sequences (e.g., strings with more than 4 consecutive identical nucleotides), and sequences with a Tm outside of a predetermined range (e.g., outside of 50-72°C).
  • the screened n-mers can be further screened to identify homologous probe sequences by eliminating n-mers that specifically hybridize to a sequence in the genome in the exclusion set of genomes (e.g., if a pairwise alignment contains 19 of 20 matches in an n-mer, such as a 25-mer) or occurs in the genome of the target organism more than a specified number of times.
  • a homologous probe sequence can occur only once in the genome of the target organism. For target organisms with a single-stranded genome, the homologous probe sequence may occur only once in the complement of the genome of the target organism.
  • the homologous probe sequences can be filtered so as to specifically hybridize to the genome of the additional sequenced variant(s) resulting in a probe that groups related organisms.
  • Homologous probe sequences may be filtered so as to not specifically hybridize to the genome of the sequenced variant (e.g., the sequenced variant can be part of the exclusion set), resulting in a probe that can discriminate between related organisms.
  • These filter processes can be iterated for each target organism to be detected by the particular mixture.
  • Candidate homologous probe sequences can be screened to eliminate those that can specifically hybridize with other probes in the mixture.
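The in silico slicing and screening steps can be sketched as below. This is a simplified illustration, assuming 20-mers, a crude GC-content melting-temperature formula in place of a nearest-neighbor model, ungapped forward-strand comparison against the exclusion genomes, and an identity cutoff of 0.95 (19 of 20 bases). The secondary-structure and probe-probe cross-hybridization screens are omitted, and the reference sequence is hypothetical.

```python
import re

# More than 4 consecutive identical nucleotides, i.e. a run of 5 or more.
HOMOPOLYMER = re.compile(r"A{5,}|C{5,}|G{5,}|T{5,}")


def approx_tm(seq):
    """Rough GC-content Tm estimate in degrees C (not a nearest-neighbor calculation)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


def max_identity(probe, genome):
    """Best fraction of matching bases over all ungapped placements of probe in genome."""
    n = len(probe)
    best = 0
    for i in range(len(genome) - n + 1):
        best = max(best, sum(a == b for a, b in zip(probe, genome[i:i + n])))
    return best / n


def candidate_probes(reference, exclusion, k=20, tm_range=(50.0, 72.0), identity_cutoff=0.95):
    """Slice the reference into k-mers and keep those passing every screen."""
    counts = {}
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    kept = []
    for kmer, n in counts.items():
        if n > 1:
            continue  # redundant: occurs more than once in the target genome
        if HOMOPOLYMER.search(kmer):
            continue  # repetitive sequence
        if not tm_range[0] <= approx_tm(kmer) <= tm_range[1]:
            continue  # Tm outside the predetermined range
        if any(max_identity(kmer, g) >= identity_cutoff for g in exclusion):
            continue  # e.g. 19 of 20 bases match an exclusion genome
        kept.append(kmer)
    return kept


ref = "ACGTACGTTGCAGCTAGCTA"  # hypothetical 20-nt reference
print(candidate_probes(ref, exclusion=[]))     # ['ACGTACGTTGCAGCTAGCTA']
print(candidate_probes(ref, exclusion=[ref]))  # []
```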
  • Probe selection can be based on a database of different strains of a pathogen, such as a database comprising more than 1500 strains of HIV, more than 100 HCV sub-types, or both, and optionally with additional strains or subtypes of other pathogens.
  • Probes for HCV can be selected by partitioning all available HCV genomes (for example, 1500+ genomes) into subsets based on sequence similarity. For each subset, candidate probe sets can be generated that capture all strains. A filter can then be applied for specificity against human, microbial, viral, and fungal genomes.
  • Probe arms can hybridize to the target nucleic acid molecule, surrounding the capture region; a polymerase extension fills in the gap between the arms and a ligase creates a circular molecule out of the extended probe.
  • primers amplify the captured probes.
  • the primers contain a 3' end homologous to the backbone (forward) and its reverse complement (reverse primer).
  • the 5' of the primer may contain a sequencing adapter for a particular next generation sequencing platform and may also contain a barcode sequence between the 5' and 3' segments such that multiple samples, each amplified with primers containing a sample-specific barcode, can be multiplexed into a single sequencing run.
  • each MIP captures a well-defined region of the target sequence (compare to hybridization capture methods, which yield a variety of molecules centered around the target).
  • MIPs may be applied to DNA or RNA templates, though different enzymes may be required. Pathogenica has extensive data on the application to DNA templates as well as data testing different polymerases, ligases, and reaction conditions. Application to RNA templates can be critical to determining whether a detected viral genome was present as a transcript, viral particle, or was integrated into the host genome.
  • Provided herein are RNA sequencing methods that allow target regions to be picked, whether the target regions have defined ends (MIPs and PCR) or are centered on a probe sequence (hybridization). Also provided herein are methods for applying sequencing to pathogen detection, quantification, and genotyping. The result may be a given sensitivity/specificity or an upper bound on the time to a result. The present methods allow for sequencing DNA from both the viral genomes in a sample and the human genomic DNA, as well as DNA from multiple categories of pathogenic or non-human organisms in a single sample.
  • Some embodiments provide a method that performs several tests as a group in a single tube, in certain embodiments only tests that use next-generation sequencing as a readout, which allows experiments to detect, genotype, and/or quantify more than, e.g., 50 strains or organisms at once, or to genotype more than, e.g., 30 loci at once.
  • the methods disclosed herein can provide multiple advantages. For example, they provide an advantage over a multiplex PCR assay that uses three tubes, because the present methods use one tube, which allows for a significant reduction in reaction cost and an increase in the number of samples that can be run at once.
  • Provided herein are methods for picking a minimal set of probes, primers, or regions to sequence.
  • the methods provide a process of picking the smallest set of probes/primers/regions that answers a given clinical question, which reduces the number of target regions needed; each additional target region requires more reagents (more PCR primers or more probes) and more sequencing.
  • a variant of the probe/primer picking algorithm allows the process to pick the smallest set of probes/primers that can observe a set of genomic loci across a set of input genomes. For example, we want to pick the smallest set of probes that can enable us to sequence a set of drug resistance mutations across all known HCV strains.
  • One aspect of this algorithm can be determining whether a candidate probe or primer can work against a given genome or whether there are too many mismatches. As before, the methods may rely on computing an approximation.
  • Some clinical tests based on the methods disclosed herein may rely on the ability to determine or approximate the number of input template molecules (genomes) in a sample.
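Choosing the smallest probe set that observes every target locus in every genome is an instance of the set-cover problem, for which finding an exact minimum is NP-hard. A standard approximation is the greedy heuristic sketched below; it is offered as an illustration, not as the algorithm of the disclosure. The coverage sets (which probe works on which genome at which locus) are assumed to have been computed beforehand, e.g., with a mismatch threshold, and the strain and locus names are hypothetical.

```python
def greedy_probe_set(coverage, targets):
    """Greedy set cover: repeatedly pick the probe observing the most uncovered targets.

    coverage: probe name -> set of (genome, locus) pairs the probe can observe.
    targets:  set of (genome, locus) pairs that must all be observed.
    Returns a small, though not necessarily minimal, list of probes.
    """
    remaining = set(targets)
    chosen = []
    while remaining:
        best = max(coverage, key=lambda p: len(coverage[p] & remaining))
        gain = coverage[best] & remaining
        if not gain:
            raise ValueError(f"targets no probe can observe: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gain
    return chosen


coverage = {
    "probeA": {("s1", "L1"), ("s2", "L1")},
    "probeB": {("s3", "L1")},
    "probeC": {("s1", "L1"), ("s2", "L1"), ("s3", "L1")},
    "probeD": {("s1", "L2"), ("s2", "L2"), ("s3", "L2")},
}
targets = set().union(*coverage.values())
print(greedy_probe_set(coverage, targets))  # ['probeC', 'probeD']
```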
  • a two-step method can be used to calculate the number of template molecules in a sample from the sequencing read counts: 1) each sample that we sequence has a known quantity of a control sequence added to it, such as a GFP sequence.
  • the first step in analyzing sequencing reads can be to normalize the counts based on the number of reads that came from the control sequence. This normalization accounts for the fact that we might put more material from sample A than from sample B into the sequencing reaction. 2) since different MIPs (or primer pairs or hybridization capture probes) might work with different efficiencies, the second step of the quantification process can be to normalize between probes.
  • This normalization relies on experiments in which fixed amounts of different templates were sequenced, and might reveal, e.g., that a probe against a first strain of HCV produces 2 circularized MIPs per template but a probe against a second HCV strain produces 3. Thus, the count for the first HCV probe might be divided by 2 and the count for the second HCV probe divided by 3 to produce comparable viral load counts for the two strains.
  • Some embodiments use a fixed quantity of GFP as the control sequence and a variable quantity of one or more HCV, HPV, or HIV strains. Some samples may contain only GFP and viral DNA, while others also include a human background. After the sequencing reads are separated by sample, the method calculates the ratio of viral reads to GFP reads and plots that ratio against the number of viral template molecules in the reaction. Such plots can indicate generally excellent agreement between the viral/GFP ratio and the input template quantity.
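The two normalization steps can be combined into one function. This is a minimal sketch, assuming a linear relationship between reads and molecules, a known number of control (e.g., GFP) molecules spiked into each sample, and per-probe efficiencies measured on known inputs and expressed relative to the control. The probe names and counts are hypothetical; efficiencies of 2 and 3 products per template follow the example above.

```python
def templates_per_probe(probe_counts, control_reads, control_copies, efficiency):
    """Two-step normalization of raw read counts to estimated template molecules.

    Step 1: scale by the spiked-in control (e.g., GFP): control_copies molecules
            were added and produced control_reads reads in this sample.
    Step 2: divide by each probe's measured efficiency, expressed here as
            products per template relative to the control (an assumption).
    """
    if control_reads <= 0:
        raise ValueError("no control reads; cannot normalize this sample")
    molecules_per_read = control_copies / control_reads
    return {probe: count * molecules_per_read / efficiency[probe]
            for probe, count in probe_counts.items()}


counts = {"HCV_strain1_probe": 4000, "HCV_strain2_probe": 6000}  # hypothetical
eff = {"HCV_strain1_probe": 2.0, "HCV_strain2_probe": 3.0}
print(templates_per_probe(counts, control_reads=1000, control_copies=500, efficiency=eff))
# {'HCV_strain1_probe': 1000.0, 'HCV_strain2_probe': 1000.0}
```

Although the raw counts differ by 50%, both probes report the same template number once their efficiencies are accounted for.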
  • high throughput sequencing offers a relatively unique ability to detect and genotype the pathogen DNA and the human DNA in a sample from a single reaction.
  • genotyping the pathogen and human would require multiple tests, potentially doubling (or more) the expense compared to simply detecting a pathogen.
  • the methods disclosed herein enable simultaneous genotyping with minimal added cost and often no added labor. Other selection/enrichment technologies would also enable these tests.
  • the methods disclosed herein provide for simultaneously detecting or genotyping multiple pathogens. For example, the methods provide for:
  • the mixtures of the present invention can be processed essentially as described in these references for capture reactions (to form capture reaction products), amplification reactions (to form amplification reaction products), and sequencing of the capture and/or amplification reaction products.
  • the methods disclosed in these and other references are only exemplary and are in no way limiting of the present invention.
  • genomic DNA can be extracted from frozen pellets of fibroblast, iPS or hES cells using Qiagen DNeasy columns, and can be bisulfite-converted with the Zymo DNA Methylation Gold Kit (Zymo Research). Bisulfite conversion may be used in the methods of the invention to study, for example, DNA methylation, but is not necessary.
  • Padlock probes can be combined to a total concentration of 60 nM with 200 ng of bisulfite-converted genomic DNA and mixed in 10 µl 1x Ampligase Buffer (Epicentre), denatured at 95 °C for 10 min, then hybridized at 55 °C for 18 h, after which 1 µl gap-filling mix (200 µM dNTPs, 2 U AmpliTaq Stoffel Fragment (ABI) and 0.5 units Ampligase (Epicentre) in 1x Ampligase buffer) can be added to the reaction.
  • The reactions can be incubated at 55 °C for 4 h, followed by five cycles of incubation at 95 °C for 1 min and then at 55 °C for 4 h.
  • A 2 µl exonuclease mix (containing 10 U/µl exonuclease I and 100 U/µl exonuclease III; USB) can then be added, and the reactions incubated at 37 °C for 2 h and then inactivated at 95 °C for 5 min.
  • The 10-µl circularization products can be amplified by PCR in 100 µl reactions with 200 nM AmpF6.2-SoL primer, 200 nM AmpR6.2-SoL primer, 0.4x SybrGreen I, and 50 µl iProof High-Fidelity Master Mix (Bio-Rad) at 98 °C for 30 s; eight cycles of 98 °C for 10 s, 58 °C for 20 s, and 72 °C for 20 s; 14 cycles of 98 °C for 10 s and 72 °C for 20 s; and 72 °C for 3 min.
  • Amplicons of the expected size range (344–394 bp) can be purified by 6% PAGE (6% TBE gel; Invitrogen).
  • Purified PCR products from the four probe sets on the same template DNA can then be pooled in equimolar ratio and reamplified in 4 × 100 µl reactions with 4 µl template (10–15 ng/µl), 200 µM dNTPs, 20 µM dUTP, 200 nM AmpF6.3 primer, 200 nM AmpR6.3 primer, 0.4x SybrGreen I, and 200 µl 2x Taq Master Mix (NEB) at 94 °C for 3 min; 8 cycles of 94 °C for 45 s, 55 °C for 45 s, and 72 °C for 45 s; and 72 °C for 3 min.
  • PCR amplicons can be purified with Qiaquick columns and then digested with MmeI restriction endonuclease: ~3.6 nmol purified PCR amplicons, 16 units of MmeI (2 U/µl; NEB), and 100 µM SAM in 1x NEB Buffer 4 at 37 °C for 1 h.
  • The digestions can again be column-purified and digested with 3 U USER enzyme (1 U/µl) at 37 °C for 2 h, then with 10 units S1 nuclease (10 U/µl; Invitrogen) in 1x S1 nuclease buffer at 37 °C for 10 min.
  • Fragmented DNA can be column-purified and end-repaired at 25 °C for 45 min in 25-µl reactions containing 2.5 µl 10x buffer, 2.5 µl dNTP mix (2.5 mM each), 2.5 µl ATP (10 mM), 1 µl end-repair enzyme mix (Epicentre), and 15 µl DNA.
  • Approximately 100–500 ng of the end-repaired DNA can be ligated with 60 µM Solexa sequencing adaptors in 30 µl of 1x QuickLigase Buffer (NEB) with 1 µl QuickLigase for 15 min at 25 °C.
  • Ligation products of 150–175 bp can be size-selected by 6% PAGE and then amplified by PCR in 100 µl reactions with 15 µl template, 200 nM Solexa PCR primers, 0.8x SybrGreen I, and 50 µl iProof High-Fidelity Master Mix (Bio-Rad) at 98 °C for 30 s; 12 cycles of 98 °C for 10 s, 65 °C for 20 s, and 72 °C for 20 s; and 72 °C for 3 min.
  • The PCR amplicons can then be purified with Qiaquick PCR purification columns and sequenced on an Illumina Genome Analyzer.
  • Genomic DNA (e.g., test sample DNA) can be combined with the probes (each at a probe:gDNA molar ratio of 100:1; numbers change accordingly for other ratios) in 1x Ampligase buffer (Epicentre).
  • A gap-filling and sealing mix (5.4 µM dNTPs [for the 100x ratio; numbers change accordingly for 1x, 10x, 1000x, and 10,000x], two units of Taq Stoffel fragment [Applied Biosystems], and 2.5 units of Ampligase [Epicentre] in Ampligase storage buffer [Epicentre]) can be added, and the reaction can be incubated for 15 min, 1 h, 1 d, 2 d, or 5 d at 60 °C. The reaction can also be cycled: after 1 d at 60 °C, 10 cycles of 2 min at 95 °C followed by 2 h at 60 °C can be applied.
  • The incubation temperature can then be lowered to 37 °C, and 2 µL of Exonuclease I (20 units/µL) and 2 µL of Exonuclease III (200 units/µL) (both from USB) immediately added; the reaction can be incubated for 2 h at 37 °C followed by 5 min at 94 °C.
  • The circles can then be amplified in two 100-µL PCR reactions, each with 50 µL of 2x iQ SYBR Green Supermix (Bio-Rad), 10 µL of circle template (from the previous step), and 40 pmol each of forward and reverse primers (IDT).
  • The PCR program can be 3 min at 96 °C; three cycles of 30 s at 95 °C, 30 s at 60 °C, and 30 s at 72 °C; 10 cycles of 30 s at 95 °C and 1 min at 72 °C; and a final 5 min at 72 °C.
  • The desired PCR products can be gel-purified and quantified.
  • Then, 10–20 fmol of DNA may be sequenced on both the Illumina Genome Analyzer version 1 and the updated version 2, using a custom primer.
  • The mixtures of the invention can contain sample nucleic acids.
  • The nucleic acids may be obtained from any test sample, such as a biological sample.
  • The nucleic acids obtained from the test sample may be of varying degrees of purity, such as at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95, 96, 97, 98, or 99% organic matter by weight.
  • The sample nucleic acids can be extracted from a test sample.
  • The sample nucleic acids may be further processed, for example, to allow detection of methylation state. For an overview of detecting genome-wide methylation sites, see Deng (2009) (describing MIP capture of CpG islands and bisulfite sequencing to map methylation sites).
  • Test samples may be from any source and include swabs or extracts of any surface, or biological samples, such as patient samples.
  • Patients may be of any age, including adults, adolescents, and infants.
  • Biological samples from a subject or patient may include whole cells, tissues, or organs, or biopsies comprising tissues originating from any of the three primordial germ layers: ectoderm, mesoderm, or endoderm.
  • Exemplary cell or tissue sources include skin, heart, skeletal muscle, smooth muscle, kidney, liver, lungs, bone, pancreas, central nervous tissue, peripheral nervous tissue, circulatory tissue, lymphoid tissue, intestine, spleen, thyroid, connective tissue, or gonad.
  • Test samples may be assayed immediately or, alternatively, processed by mixing, chemical treatment, fixation/preservation, freezing, or culturing.
  • Biological samples from a subject include blood, pleural fluid, milk, colostrum, lymph, serum, plasma, urine, cerebrospinal fluid, synovial fluid, saliva, semen, tears, and feces.
  • The biological sample can be blood.
  • Other samples include swabs, washes, lavages, discharges, or aspirates (such as nasal, oral, nasopharyngeal, oropharyngeal, esophageal, gastric, rectal, or vaginal swabs, washes, lavages, discharges, or aspirates), and combinations thereof, including combinations with any of the preceding biopsy materials.
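The viral/GFP read-ratio quantification described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the disclosure: it assumes per-sample viral and GFP read counts have already been obtained after demultiplexing, and it fits a straight line to log10(ratio) versus log10(input copies) by ordinary least squares, which is one common way to check the agreement the text describes. The function names and the titration counts are hypothetical.

```python
import math


def viral_gfp_ratio(viral_reads: int, gfp_reads: int) -> float:
    """Ratio of viral reads to GFP control reads for one sample."""
    if viral_reads <= 0 or gfp_reads <= 0:
        # Zero counts cannot be log-transformed; such samples need separate handling.
        raise ValueError("read counts must be positive")
    return viral_reads / gfp_reads


def fit_log_log(ratios: list[float], copies: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of log10(ratio) = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(r) for r in ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_copies(ratio: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate input template copies from a new ratio."""
    return 10 ** ((math.log10(ratio) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical titration: input viral copies -> (viral reads, GFP reads).
    titration = {100: (50, 1000), 1_000: (500, 1000), 10_000: (5000, 1000)}
    copies = list(titration)
    ratios = [viral_gfp_ratio(v, g) for v, g in titration.values()]
    slope, intercept = fit_log_log(ratios, copies)
    print(f"slope={slope:.3f}")
    print(f"estimated copies for ratio 2.0: {estimate_copies(2.0, slope, intercept):.0f}")
```

On log-log axes, a slope close to 1 indicates that the viral/GFP ratio rises in proportion to the input template; the fitted line can then serve as a standard curve for samples of unknown load.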
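The 100:1 probe:gDNA molar ratio and the dNTP scaling in the gap-filling and sealing mix of the second protocol involve simple arithmetic that a short sketch can check. It rests on assumptions that are not stated in the protocol: each target locus is present once per haploid genome, one haploid human genome weighs about 3.3 pg, and the dNTP concentration scales linearly from 5.4 µM at the 100x ratio (one reading of "numbers change accordingly"). The function names are hypothetical, and the 200 ng input is borrowed from the first protocol purely for illustration.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
HAPLOID_GENOME_PG = 3.3    # assumed mass of one haploid human genome, in picograms


def genome_copies(gdna_ng: float, genome_pg: float = HAPLOID_GENOME_PG) -> float:
    """Number of haploid genome copies in a given mass of genomic DNA."""
    return gdna_ng * 1000.0 / genome_pg  # ng -> pg, then divide by mass per copy


def probe_fmol_per_probe(gdna_ng: float, probe_to_gdna: float = 100.0) -> float:
    """Femtomoles of each probe needed for the requested probe:gDNA molar ratio.

    Assumes one target site per haploid genome copy.
    """
    probe_molecules = genome_copies(gdna_ng) * probe_to_gdna
    return probe_molecules / AVOGADRO * 1e15  # molecules -> mol -> fmol


def dntp_um(probe_to_gdna: float, um_at_100x: float = 5.4) -> float:
    """dNTP concentration in µM, assuming linear scaling from 5.4 µM at the 100x ratio."""
    return um_at_100x * probe_to_gdna / 100.0


if __name__ == "__main__":
    for ratio in (1, 10, 100, 1000, 10_000):
        fmol = probe_fmol_per_probe(200.0, ratio)
        print(f"{ratio:>6}x: {fmol:.4g} fmol per probe, {dntp_um(ratio):.4g} µM dNTPs")
```

Under these assumptions, 200 ng of human DNA holds roughly 6 × 10^4 haploid copies, so even the 100x ratio calls for only about 0.01 fmol of each probe. This is why such protocols are usually specified as molar ratios rather than absolute probe amounts.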
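The thermocycling programs in the protocols above are easier to check when written down as data. The sketch below encodes the first-round amplification of the 10-µl circularization products (98 °C for 30 s; 8 three-step cycles; 14 two-step cycles; 72 °C for 3 min) and computes the cycle count and total hold time. The `Step`/`Stage` structure and helper names are a hypothetical representation rather than any instrument's file format, and the totals cover programmed hold times only, not ramp times, which depend on the cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple[Step, ...]


def hold_seconds(program: list[Stage]) -> int:
    """Sum of programmed hold times, excluding instrument ramp time."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps) for stage in program)


def amplification_cycles(program: list[Stage]) -> int:
    """Count cycles in multi-step stages (single-step initial and final holds excluded)."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


# First-round amplification of the circularization products, as listed in the protocol.
CIRCLE_PCR = [
    Stage(1, (Step(98, 30),)),                             # initial denaturation
    Stage(8, (Step(98, 10), Step(58, 20), Step(72, 20))),  # three-step cycles
    Stage(14, (Step(98, 10), Step(72, 20))),               # two-step cycles
    Stage(1, (Step(72, 180),)),                            # final extension
]

if __name__ == "__main__":
    cycles = amplification_cycles(CIRCLE_PCR)
    minutes = hold_seconds(CIRCLE_PCR) / 60
    print(f"{cycles} cycles, {minutes:.1f} min of programmed hold time")
```

The same structure can hold the other programs (for example, the 8-cycle reamplification at 94/55/72 °C or the 12-cycle Solexa library PCR), which makes it simple to compare cycle numbers across rounds when tuning amplification bias.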

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Communicable Diseases (AREA)
  • Immunology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Virology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method comprising forming a molecular inversion probe (MIP; a single-stranded linear molecule containing two target-binding arms, which may be separated by a backbone sequence) to selectively target a nucleotide template from any source (viral, prokaryotic, and eukaryotic) in order to obtain information such as the quantity of transcripts and sequence data.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261649048P 2012-05-18 2012-05-18
US61/649,048 2012-05-18

Publications (2)

Publication Number Publication Date
WO2013173774A2 (fr) 2013-11-21
WO2013173774A3 (fr) 2014-03-13

Family

ID=49584477

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/041675 WO2013173774A2 (fr) 2012-05-18 2013-05-17 Molecular inversion probes

Country Status (1)

Country Link
WO (1) WO2013173774A2 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030053987A1 (en) * 1996-06-11 2003-03-20 Donnelly John J. Synthetic hepatitis c genes
US20030211467A1 (en) * 1999-12-21 2003-11-13 Schlauder George G. Methods and compositions for detecting hepatitis E virus
US20100184205A1 (en) * 2006-12-05 2010-07-22 Issac Bentwich Nucleic acids involved in viral infection
US20110059513A1 (en) * 2007-12-20 2011-03-10 Hvidovre Hospital Efficient cell culture system for hepatitis c virus genotype 1a and 1b
US20110150922A1 (en) * 2007-08-16 2011-06-23 Chrontech Pharma Ab Immunogen platform
WO2011156795A2 (fr) * 2010-06-11 2011-12-15 Pathogenica, Inc. Nucleic acids for multiplex organism detection and methods of use and making the same

Cited By (19)

Publication number Priority date Publication date Assignee Title
US10883145B2 (en) 2014-04-11 2021-01-05 The Trustees Of The University Of Pennsylvania Compositions and methods for metagenome biomarker detection
CN106414775A (zh) * 2014-04-11 2017-02-15 宾夕法尼亚大学董事会 Compositions and methods for metagenome biomarker detection
EP3129503A4 (fr) * 2014-04-11 2018-02-21 The Trustees Of The University Of Pennsylvania Compositions and methods for analyte detection
CN104178586B (zh) * 2014-09-01 2015-12-30 亚能生物技术(深圳)有限公司 Nucleic acid membrane strip and kit for HBV genotyping and drug-resistance mutation detection
CN104178586A (zh) * 2014-09-01 2014-12-03 亚能生物技术(深圳)有限公司 Nucleic acid membrane strip and kit for HBV genotyping and drug-resistance mutation detection
WO2016197065A1 (fr) * 2015-06-03 2016-12-08 The General Hospital Corporation Long adapter single-stranded oligonucleotide (LASSO) probes for capturing and cloning complex libraries
US20180171386A1 (en) * 2015-06-03 2018-06-21 The General Hospital Corporation Long Adapter Single Stranded Oligonucleotide (LASSO) Probes to Capture and Clone Complex Libraries
US20210108249A1 (en) * 2015-06-03 2021-04-15 The General Hospital Corporation Long Adapter Single Stranded Oligonucleotide (LASSO) Probes to Capture and Clone Complex Libraries
EP4354140A2 (fr) 2016-02-26 2024-04-17 The Board of Trustees of the Leland Stanford Junior University Multiplexed single-molecule RNA visualization with a two-probe proximity ligation system
WO2017147483A1 (fr) 2016-02-26 2017-08-31 The Board Of Trustees Of The Leland Stanford Junior University Multiplexed single-molecule RNA visualization with a two-probe proximity ligation system
EP4015647A1 (fr) 2016-02-26 2022-06-22 The Board of Trustees of the Leland Stanford Junior University Multiplexed single-molecule RNA visualization with a two-probe proximity ligation system
US11008608B2 (en) 2016-02-26 2021-05-18 The Board Of Trustees Of The Leland Stanford Junior University Multiplexed single molecule RNA visualization with a two-probe proximity ligation system
KR20180118561A (ko) * 2017-04-21 2018-10-31 이화여자대학교 산학협력단 Nucleic acid molecule template for detecting target point-mutation genes, and genetic testing method using the same
KR102330252B1 (ko) * 2017-04-21 2021-11-24 이화여자대학교 산학협력단 Nucleic acid molecule template for detecting target point-mutation genes, and genetic testing method using the same
CN106939360A (zh) * 2017-05-24 2017-07-11 贵州金域医学检验中心有限公司 PCR amplification primers, kit, and detection method for detecting NS5B mutations in HCV subtype 2a
CN110462060B (zh) * 2017-12-08 2022-05-03 10X基因组学有限公司 Methods and compositions for labeling cells
CN110462060A (zh) * 2017-12-08 2019-11-15 10X基因组学有限公司 Methods and compositions for labeling cells
CN111235315A (zh) * 2020-03-13 2020-06-05 苏州智享众创孵化管理有限公司 Method capable of simultaneously detecting multiple genotypes of hepatitis E virus
CN112063690A (zh) * 2020-09-18 2020-12-11 北京求臻医学检验实验室有限公司 Construction method and application of a single-molecule-probe multiplex targeted capture library

Also Published As

Publication number Publication date
WO2013173774A3 (fr) 2014-03-13

Legal Events

Code 121: Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 13790030; country of ref document: EP; kind code of ref document: A2

Code 122: Ep: PCT application non-entry in European phase
Ref document number: 13790030; country of ref document: EP; kind code of ref document: A2