WO2018089978A1 - Nucleic acid quantification compositions and methods - Google Patents

Info

Publication number
WO2018089978A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
seq
dna
dna construct
target gene
Prior art date
Application number
PCT/US2017/061469
Other languages
French (fr)
Inventor
Francisco Moya FLORES
Pamela CAMEJO
Original Assignee
Wisconsin Alumni Research Foundation
Priority date
Filing date
Publication date
Application filed by Wisconsin Alumni Research Foundation
Publication of WO2018089978A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6851: Quantitative amplification
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing

Definitions

  • Next-Gen sequencing has revolutionized the analysis of nucleic acids in complex samples.
  • Next-Gen sequencing has enabled the rapid analysis of genetic material recovered from complex clinical and environmental samples. Such an analysis has furthered our understanding of the identity and diversity of microbial communities throughout our environment.
  • OTUs: Operational Taxonomic Units
  • an increase in the relative abundance of an OTU in one sample compared to another sample may be due to an increase in the absolute abundance of the OTU in the sample or a decrease in the absolute abundance of one or more other OTUs in the sample.
  • synthetic DNA constructs are provided.
  • the synthetic DNA constructs may include a standard identifier sequence flanked by a first primer binding site having substantial sequence identity to a first portion of a target gene and a second primer binding site having substantial sequence identity to a second portion of the target gene, wherein the standard identifier sequence does not have substantial sequence identity to the target gene.
  • synthetic DNA constructs between 100 and 1000 nucleotides in length and including SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 2.
  • the synthetic DNA construct includes SEQ ID NO: 1 or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1.
  • DNA standard solutions are provided.
  • the DNA standard solutions may include any one of the DNA constructs described herein and a buffer.
  • methods for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample include (a) obtaining the sample, and (b) adding a predetermined amount of any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein to the sample.
  • the methods may further include (c) amplifying the target gene and the DNA construct in the sample to produce a plurality of amplicons, (d) sequencing the amplicons, and (e) determining the absolute abundance of the at least one DNA nucleic acid in the sample from the sequenced amplicons.
  • kits are also provided.
  • the kits may include any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein.
  • the kits may further include a primer set capable of amplifying a target gene and the synthetic DNA construct or DNA standard solution.
  • Fig. 1 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations in one sample of wastewater.
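  • Fig. 2 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations (10^7, 10^8 and 10^9 copies/reaction) in two samples of wastewater.
  • Fig. 3 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations (10^7 and 10^8 copies/reaction) in two samples of wastewater.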
  • the present inventors provide compositions, methods, and kits that may be used, for example, to compare how the diversity and abundances of OTUs vary across two or more samples.
  • the present inventors disclose compositions and methods for detecting and quantifying the microbial composition of a sample using Next Generation Sequencing ("Next-Gen") tools.
  • Next-Gen: Next Generation Sequencing
  • the inventors propose an improved method for quantifying OTUs across two or more samples by adding a linear, synthetic DNA construct of a known quantity prior to DNA extraction and amplicon sequencing.
  • Such an improved method, by providing the ability to recognize which OTUs are changing in absolute abundance, would not only facilitate pathogen detection in an environmental sample but may also allow the identification of the source of pathogen contamination in a particular environment such as a lake or food manufacturing process.
  • the synthetic DNA constructs may include a standard identifier sequence flanked by a first primer binding site and a second primer binding site.
  • the synthetic DNA constructs may include a fragment of a target gene from a species or cell type that is not expected to be in a sample.
  • the synthetic DNA constructs may be single-stranded or double-stranded. Accordingly, the synthetic DNA constructs may include the synthetic DNA constructs disclosed herein or the reverse complement sequence of the synthetic DNA constructs disclosed herein.
  • the synthetic DNA constructs may be between 50-2000, 100-1500, 100-1000, 150-1000, 200-700, or 300-600 nucleotides in length, or any ranges therein.
  • the synthetic DNA constructs comprise, consist essentially of, or consist of SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to SEQ ID NO: 2.
  • the first primer binding site may have substantial sequence identity to a first portion of a target gene and the second primer binding site may have substantial sequence identity to a second portion of the target gene.
  • the first primer binding site and the second primer binding site are separated by no more than 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500, 2700, 3000, 4000, or 5000 nucleotides within the target gene.
  • sequence identity refers to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity for a nucleic acid sequence may be determined as understood in the art. (See, e.g., U.S. Patent No. 7,396,664). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST), which is available from several sources, including the NCBI, Bethesda, Md., at its website.
  • NCBI: National Center for Biotechnology Information
  • BLAST: Basic Local Alignment Search Tool
  • the BLAST software suite includes various sequence analysis programs including "blastn," which is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at the NCBI website.
  • the first primer binding site and/or the second primer binding site are substantially identical to at least about 15, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, or 110 nucleotides of the target gene.
  • the first primer binding site and/or the second primer binding site are substantially identical to no more than about 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 nucleotides of the target gene.
  • the first primer binding site and/or the second primer binding site are substantially identical to between 15 to 50 nucleotides of the target gene.
  • the first primer binding site may include SEQ ID NO: 3, SEQ ID NO: 5, or both and the second primer binding site may include SEQ ID NO: 13, SEQ ID NO: 14, or both.
  • the first primer binding site may include SEQ ID NO: 9 or SEQ ID NO: 11, or both and the second primer binding site may include SEQ ID NO: 10 or SEQ ID NO: 12 or both.
  • As used herein, "substantially identical to" or "substantial identity" when referring to the first primer binding site and/or the second primer binding site of the present invention means a polynucleotide sequence identity of at least 40%. Suitable polynucleotide identity when referring to the first primer binding site and/or the second primer binding site of the present invention can be any value between 40% and 100%. Preferably, the sequence identity of the first primer binding site to the first portion of the target gene is 100% and the sequence identity of the second primer binding site to the second portion of the target gene is 100%. As used herein, "substantially identical to" or "substantial identity" when referring to the standard identifier sequence of the present invention means a polynucleotide sequence identity of at least 97%, 98%, or 99%.
  • the "target gene" in accordance with the present invention may be any gene that could be used to identify a particular cell type or microbial organism.
  • the target gene may be a protein-coding gene used as a phylogenetic marker including, without limitation, the EF-Tu, fusA, gyrB, ileS, lepA, leuS, pyrG, recA, recG, rplB, or rpoB gene or portions of any gene thereof. See, e.g., http://fepte.cme.msu.edu/index.spr for exemplary phylogenetic markers. Protein-coding genes that may be used for phylogenetic analysis are well known in the art and may be used in accordance with the present invention.
  • the target gene may be a prokaryotic or eukaryotic ribosomal RNA gene.
  • the target gene may be the 16S rRNA gene or a portion thereof such as the V3-V4 region of the 16S rRNA gene.
  • the 16S rRNA gene codes for a ribosomal RNA that is a component of the 30S small subunit of prokaryotic ribosomes.
  • the 16S rRNA gene is commonly used in the art to identify and phylogenetically classify bacteria.
  • ribosomal RNA genes may be used in accordance with the present invention including, without limitation, the 23S rRNA gene or portions thereof (prokaryotes), the 5S rRNA gene or portions thereof (prokaryotes), or the 18S rRNA gene or portions thereof (eukaryotes).
  • the "standard identifier sequence" may be any nucleotide sequence that does not have substantial sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived. As defined above, this may include nucleotide sequences that have less than 97% sequence identity to any portion of the target gene, including the portion of the target gene residing between the first and second primer binding sites.
  • the standard identifier sequence ensures that some embodiments of the synthetic DNA constructs described herein are distinctly different in sequence from the target gene from which the first primer binding site and the second primer binding site are derived. For example, the present inventors have developed a synthetic standard identifier sequence represented in SEQ ID NO: 1.
  • the standard identifier sequence in SEQ ID NO: 1 was designed to ensure consistent amplification with the amplicons being sequenced in the samples in the Examples.
  • the standard identifier sequence in SEQ ID NO: 1 also does not have substantial sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived, which ensures the synthetic DNA construct of SEQ ID NO: 1 can be readily identified among the target genes being sequenced in the samples in the Examples.
  • the standard identifier sequence in SEQ ID NO: 1 corresponds to a 16S rRNA construct encoding the target sites of two different sets of primers used for amplicon sequencing. This is an artificial 16S rRNA construct, not existing in any known bacterium.
  • the amplicon sequence generated by these sets of primers is different enough from any other 16S rRNA sequence to be differentiated during downstream bioinformatics analysis.
  • the closest 16S rRNA sequence corresponds to Thermosulfidibacter takaii (NR_041547), sharing only ~91% sequence identity. Since OTUs are usually clustered based on 97% identity, the construct generated here will be recognized as a different OTU during the analysis.
  • the standard identifier sequence may have no more than 97%, 95%, 90%, 85%, 80%, 75%, 70%, or 65% sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived.
  • the standard identifier sequence is between 100-1950, 150-1450, 200-950, 250-650, or 350-550 nucleotides in length.
  • the standard identifier sequence will not hybridize with the primers directed to the first primer binding site and/or the second primer binding site of the synthetic DNA construct.
  • the standard identifier sequence may lack secondary structure or repetitive sequences.
  • the GC content of the standard identifier sequence is between 40-60%.
  • the present inventors also disclose synthetic DNA constructs that are a fragment of a target gene from a species or cell type that is not expected to be in a sample.
  • the synthetic DNA construct in SEQ ID NO: 2 was derived from the 16S rRNA gene of Thermus thermophilus, which was not expected to be found in the samples analyzed in the Examples.
  • the sequence of this synthetic DNA construct is expected to be amplified to a similar extent as the other 16S rRNA genes in the sample given that it is an additional 16S rRNA gene.
  • this synthetic DNA construct could be unambiguously identified after being sequenced because the Thermus thermophilus 16S rRNA gene shares less than 97% sequence identity with the closest 16S rRNA gene in the sample.
  • DNA standard solutions may include any one of the DNA constructs described herein and a buffer.
  • a "buffer" may include any buffer used to buffer a DNA containing solution (i.e., a DNA buffer).
  • Suitable DNA buffers may include, without limitation, water, TE buffer, or other Tris-based buffers.
  • the DNA standard solution may include at least 1 x 10^6 copies of the DNA construct. In some embodiments, the DNA standard solution may include between about 1 x 10^6 and about 1 x 10^9 copies of the DNA construct.
  • the copies of the DNA construct in the DNA standard solution may be in any volume convenient for introducing the DNA standard solution into a sample including, without limitation, 1 µL, 5 µL, 10 µL, 25 µL, 50 µL, 100 µL, 200 µL, 500 µL, 1000 µL, or more.
  • the DNA standard solution may be about 10^11 copies/µL and then diluted to a desired concentration.
  • Methods for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample include (a) obtaining the sample, and (b) adding a predetermined amount of any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein to the sample.
  • the methods may further include (c) amplifying the target gene and the DNA construct in the sample to produce a plurality of amplicons, (d) sequencing the amplicons, and (e) determining the absolute abundance of the at least one DNA nucleic acid in the sample from the sequenced amplicons.
  • sample may be any type of sample containing a target gene.
  • the sample may be a clinical sample including, without limitation, a gut, fecal, blood, urine, synovial fluid, or saliva sample.
  • the sample may be an environmental sample including, without limitation, a water sample or a soil sample.
  • the water sample may be obtained from a lake, drinking water source, a wastewater source, a food production process, or a beverage production process.
  • the water sample may be a freshwater sample or a saltwater sample.
  • in accordance with the present methods, the "predetermined amount" of the synthetic DNA construct may be between 1 x 10^2 and 2 x 10^13 copies per sample, or any range therein. In some embodiments, the predetermined amount of the synthetic DNA construct may be between about 1 x 10^6 and about 1 x 10^9 copies per sample. In some embodiments, the predetermined amount of the synthetic DNA construct may be at least 1 x 10^6 copies per sample.
  • the present methods may further include extracting the at least one DNA nucleic acid from the sample after adding the synthetic DNA construct to the sample and prior to the amplification step (c).
  • Methods for extracting DNA from cells in a sample are generally known in the art. In the Examples, the inventors use a Mo Bio Soil DNA Extraction kit or PowerSoil® DNA Isolation Kit Catalog No. 12888-50 & 12888-100. Similar DNA extraction procedures have been previously described in Camejo PY, Owen BR, Martirano J, Ma J, Kapoor V, Santo Domingo J, McMahon KD, Noguera DR, 2016.
  • the present methods could be performed without DNA extraction through, for example, the use of chaotropic agents for the disruption of cells and preservation of nucleic acids under buffered conditions.
  • the present methods may include amplifying the target gene and the synthetic DNA construct in the sample to produce a plurality of amplicons.
  • amplifying refers to increasing the quantity of the target gene and synthetic DNA construct in a sample using amplification techniques that are well-known in the art. Such methods include, without limitation, PCR, isothermal transcription-based amplification, rolling-circle amplification, and strand displacement amplification.
  • the target gene and the synthetic DNA construct are amplified using PCR with a primer set selected from SEQ ID NOs: 3-6.
  • real time PCR is used to amplify and quantify the amplicons.
  • the various amplicons can be differentiated using, for example, distinct TaqMan or other probes.
  • sequencing platforms can be used to "sequence the amplicons" in the present methods.
  • sequencing platforms include, without limitation, Illumina's Next-Gen sequencing technology, or sequencing technologies provided by Ion Torrent, Oxford Nanopore, Pacific Biosciences, Sanger sequencing, or Roche/454.
  • the absolute abundance of the at least one DNA nucleic acid in the sample may be determined from the sequenced amplicons.
  • the number of amplicons of each OTU may be normalized by the number of the synthetic DNA construct sequences detected and the number of average copies of the sequence (16S rRNA or genes) typically found in the genome of each genus.
  • kits are also provided.
  • the kits may include any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein.
  • the kits may further include a primer set capable of amplifying a target gene and the synthetic DNA construct.
  • the primer set may be any primer set selected from SEQ ID NOs: 3-6. It is also envisioned that the present kits may include the components of the Nextera XT Index Kit from Illumina.
  • Unless otherwise specified or indicated by context, the terms "a", "an", and "the" mean "one or more."
  • For example, "a protein" or "an RNA" should be interpreted to mean "one or more proteins" or "one or more RNAs," respectively.
  • Example 2: Internal Synthetic DNA Standards. Synthetic, linear DNA standards were added to a water sample before total DNA extraction. Two internal DNA standards were tested in the Examples described herein: (1) Patho001_16S_rRNA_InternalStandard - a 656 bp synthetic 16S rRNA sequence (SEQ ID NO: 1); and (2) Thermus_thermophilus_16S_rRNA_InternalStandard - Region 321 to 890 of Thermus thermophilus (NR_113293) 16S rRNA sequence (SEQ ID NO: 2).
  • Each DNA standard was created by conventional DNA synthesis methods, stored in a circular vector, cloned into competent cells, and subsequently amplified by PCR using the primer set M13 forward (-20): GTAAAACGACGGCCAG (SEQ ID NO: 7) and M13 reverse: CAGGAAACAGCTATGAC (SEQ ID NO: 8), followed by purification.
  • the number of copies of internal standard added to the DNA extraction step was calculated based on the standard DNA concentration via spectrophotometric measurements and the length of the molecule.
  • the two internal DNA standards may be targeted by two different primer pairs: (1) Primer pair 1: S-D-Bact-0341-b-S-17 (5'-CCTACGGGNGGCWGCAG)
  • Primer pair 2 515F-Y (5'-GTGYCAGCMGCCGCGGTAA)
  • the closest 16S rRNA sequence to the PCR products generated with both primer pairs and the Patho001_16S_rRNA_InternalStandard corresponds to Thermosulfidibacter takaii (NR_041547), with 91% identity. This degree of differentiation from other 16S rRNA sequences means that the sequence will be clustered as a different organism during the sequencing analysis.
  • Example 3: DNA processing and extraction. Prior to DNA extraction, filters were thawed and shattered in liquid nitrogen. Internal DNA standards (described above) were added to each sample. Concentrations between 1 x 10 - 1 x 10^11 copies of each internal DNA standard were tested per sample.
  • DNA was extracted using the Mo Bio Soil DNA Extraction kit or PowerSoil® DNA Isolation Kit Catalog No. 12888-50 & 12888-100. Similar DNA extraction procedures have been previously described (1. Camejo PY, Owen BR, Martirano J, Ma J, Kapoor V, Santo Domingo J, McMahon KD, Noguera DR, 2016. Candidatus Accumulibacter phosphatis clades enriched under cyclic anaerobic and microaerobic conditions simultaneously use different electron acceptors. Water Res. 2016 Oct 1; 102: 125-37).
  • a lysis tube was prepared for each sample, consisting of a sterile 2 ml screw-cap Eppendorf tube containing powerbeads (Mo-Bio, Carlsbad, CA), 60 µL of C1 Lysis Solution (Mo-Bio, Carlsbad, CA), and internal standards (described below). Samples were bead-beaten for 1 min at medium speed (3.5) and centrifuged for 30 sec at 10,000 x g, and the supernatant was saved in a new Eppendorf tube. Downstream processing was done following the manufacturer's instructions.
  • DNA concentrations in all samples were determined using the Qubit™ kit (buffer, dye and two standards) from Thermo Fisher Scientific.
  • PCR was used to amplify the 16S V3-V4 region in the extracted DNA samples.
  • the primers were selected from the Klindworth et al. publication (Klindworth A, Pruesse E, Schweer
  • Amplified DNA was purified using AMPure XP beads to purify the 16S V3 and V4 amplicons away from free primers and primer dimer species. Agarose gel electrophoresis was used to quality check the amplicon products.
  • a fluorometric quantification method that uses dsDNA binding dyes was used to quantify the libraries.
  • pooled libraries were denatured with NaOH, diluted with hybridization buffer, and then heat denatured before MiSeq sequencing. Each run included a minimum of 5% PhiX to serve as an internal control for these low diversity libraries.
  • Example 10 Data analysis and identification of potential sources of pathogenicity
  • the data files with the multiplexed paired-end reads sequences are processed using open source bioinformatics tools. Briefly, reads were merged, aligned, filtered (quality score, chimera checking and removal of sequences failing to align) and binned into operational taxonomic units
  • Example 12 DNA Constructs Added to Sample vs. Average % Reads in the Sample
  • Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) in two wastewater samples. Duplicates were performed resulting in 24 sample points.
  • the results show a linear relationship between the amount of construct added to the wastewater sample and the number of sequences identified per sample. For each independent sample analyzed, the correlation coefficients are 0.97, 0.98, 0.99 and 0.98.
  • the addition of 10^7 to 10^8 copies of construct/reaction is sufficient to capture its presence in an environmental sample.
  • the Patho001_16S_rRNA_IS (SEQ ID NO: 1) standard showed less sample to sample variability than the Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2) standard. See Fig. 3.
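
The linearity check reported above for the Examples can be reproduced on one's own data with a Pearson correlation coefficient. The following is a minimal sketch only: the function name `pearson_r` is hypothetical, and the copy numbers and read percentages are made-up illustrative values, not results from the Examples.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) values: construct copies added per reaction and the
# percentage of total reads classified as the construct.
copies_added = [1e7, 1e8, 1e9]
percent_reads = [0.1, 1.0, 9.5]

r = pearson_r(copies_added, percent_reads)
print(f"r = {r:.4f}")
```

A coefficient close to 1 indicates that the fraction of reads assigned to the construct scales linearly with the amount added, which is the property the internal standard relies on.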

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Wood Science & Technology (AREA)
  • Hematology (AREA)
  • Biotechnology (AREA)
  • Urology & Nephrology (AREA)
  • Microbiology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Cell Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biophysics (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention generally relates to synthetic DNA constructs and methods of using such constructs for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample.

Description

NUCLEIC ACID QUANTIFICATION COMPOSITIONS AND METHODS
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
The present application claims the benefit of priority to United States Provisional Patent Application No. 62/421,445, filed on November 14, 2016, the content of which is incorporated herein by reference in its entirety.
SEQUENCE LISTING
This application is being filed electronically via EFS-Web and includes an electronically submitted Sequence Listing in .txt format. The .txt file contains a sequence listing entitled "2017-11-14_5671-00076_ST25" created on November 14, 2017 and is 4,956 bytes in size. The Sequence Listing contained in this .txt file is part of the specification and is hereby incorporated by reference herein in its entirety.
INTRODUCTION
Next-Generation ("Next-Gen") sequencing has revolutionized the analysis of nucleic acids in complex samples. For example, in the field of metagenomics, Next-Gen sequencing has enabled the rapid analysis of genetic material recovered from complex clinical and environmental samples. Such an analysis has furthered our understanding of the identity and diversity of microbial communities throughout our environment.
Although current metagenomics methodologies have increased our understanding of the microbial composition of complex samples, these methodologies suffer from a number of limitations. One important limitation to current methodologies is that they merely provide the relative abundances of Operational Taxonomic Units (OTUs). Because the abundances of OTUs are relative, one cannot determine the extent or directionality of changes in any particular OTU in a sample when comparing two or more samples. For instance, two samples may have the same relative abundance of an OTU and yet one sample may have a greater absolute number of that OTU if it had a higher total number of microbial cells. Likewise, an increase in the relative abundance of an OTU in one sample compared to another sample may be due to an increase in the absolute abundance of the OTU in the sample or a decrease in the absolute abundance of one or more other OTUs in the sample. There thus remains a need in the art for new nucleic acid analysis compositions and methodologies that may be used to determine the absolute abundance of an OTU (or genomic sequence) in a sample. Such absolute abundances would facilitate the accurate comparison of a particular OTU (or genomic sequence) across two or more samples.
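
The ambiguity of relative abundances can be made concrete with a small numeric sketch. The counts below are made-up values chosen for illustration, and the helper name `relative_abundance` is hypothetical. OTU_1 has the same absolute count in both samples, yet its relative abundance doubles because OTU_2 declines.

```python
def relative_abundance(counts):
    """Convert absolute counts per OTU into fractions of the sample total."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}

# Hypothetical absolute counts (e.g. cells per mL) in two samples.
sample_a = {"OTU_1": 1_000, "OTU_2": 3_000}
sample_b = {"OTU_1": 1_000, "OTU_2": 1_000}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

# OTU_1 is unchanged in absolute terms, but its relative abundance differs.
print(rel_a["OTU_1"], rel_b["OTU_1"])
```

From the relative values alone, one cannot tell whether OTU_1 increased or OTU_2 decreased, which is the gap an absolute measurement is meant to close.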
SUMMARY
In one aspect, synthetic DNA constructs are provided. The synthetic DNA constructs may include a standard identifier sequence flanked by a first primer binding site having substantial sequence identity to a first portion of a target gene and a second primer binding site having substantial sequence identity to a second portion of the target gene, wherein the standard identifier sequence does not have substantial sequence identity to the target gene. In one embodiment, synthetic DNA constructs between 100 and 1000 nucleotides in length and including SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 2. In one embodiment, the synthetic DNA construct includes SEQ ID NO: 1 or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1.
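
As a rough illustration of the 95% identity criterion, percent identity can be computed over a pairwise alignment as matching columns divided by total alignment columns. This is a minimal sketch that assumes the two sequences have already been aligned and padded with '-' for gaps; the function name and the toy sequences are hypothetical, and alignment tools may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must come from the same pairwise alignment: equal length,
    with '-' marking gaps. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy alignments, not taken from the sequence listing.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in ten columns
print(percent_identity("ACGT-ACG", "ACGTTACG"))      # one gap in eight columns
```

A candidate polynucleotide would meet the 95% embodiment when the value computed against SEQ ID NO: 1 or SEQ ID NO: 2 is at least 95.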
In another aspect of the present invention, DNA standard solutions are provided. The DNA standard solutions may include any one of the DNA constructs described herein and a buffer.
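
Preparing a standard solution with a known number of copies requires converting a measured DNA concentration into copies per volume; the Examples do this from spectrophotometric measurements and the length of the molecule. The sketch below shows one common way to make that conversion. The average mass of 650 g/mol per base pair of double-stranded DNA is an assumption not stated in the source, the 10 ng/µL reading is a made-up value, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair

def copies_per_microliter(conc_ng_per_ul, length_bp):
    """Estimate dsDNA copies per microliter from mass concentration and length."""
    if conc_ng_per_ul < 0 or length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mole = length_bp * AVG_BP_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO

def dilution_factor(stock_copies_per_ul, target_copies_per_ul):
    """Fold dilution needed to bring a stock down to a target concentration."""
    if target_copies_per_ul <= 0:
        raise ValueError("target concentration must be > 0")
    return stock_copies_per_ul / target_copies_per_ul

# Hypothetical reading of 10 ng/uL for a 656 bp construct.
stock = copies_per_microliter(10.0, 656)
print(f"{stock:.2e} copies/uL")
print(f"dilute {dilution_factor(stock, 1e8):.0f}-fold to reach 1e8 copies/uL")
```

For single-stranded constructs the per-nucleotide mass differs, so the constant would need to be adjusted accordingly.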
In another aspect, methods for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample are also provided. The methods include (a) obtaining the sample, and (b) adding a predetermined amount of any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein to the sample. Optionally, the methods may further include (c) amplifying the target gene and the DNA construct in the sample to produce a plurality of amplicons, (d) sequencing the amplicons, and (e) determining the absolute abundance of the at least one DNA nucleic acid in the sample from the sequenced amplicons.
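
Step (e) can be sketched as a simple proportion: the reads assigned to an OTU are scaled by the ratio of construct copies added to construct reads recovered, then divided by the average number of target gene copies per genome. This is a minimal sketch under the assumption that the construct and the target amplicons are extracted, amplified, and sequenced with equal efficiency; the function name, parameter names, and numbers are hypothetical.

```python
def absolute_abundance(otu_reads, standard_reads, standard_copies_added,
                       gene_copies_per_genome=1.0):
    """Estimate genomes of an OTU in the sample from spike-in normalized reads.

    otu_reads              reads assigned to the OTU
    standard_reads         reads assigned to the synthetic DNA construct
    standard_copies_added  construct copies added to the sample before extraction
    gene_copies_per_genome average target gene copies per genome for the taxon
    """
    if standard_reads <= 0:
        raise ValueError("no construct reads recovered; cannot normalize")
    if gene_copies_per_genome <= 0:
        raise ValueError("gene copies per genome must be > 0")
    # Each construct read represents this many template copies in the sample.
    copies_per_read = standard_copies_added / standard_reads
    gene_copies = otu_reads * copies_per_read
    return gene_copies / gene_copies_per_genome

# Hypothetical run: 1e8 construct copies added, 1,000 construct reads recovered,
# 5,000 reads for an OTU whose genus averages 4 copies of the 16S rRNA gene.
estimate = absolute_abundance(5_000, 1_000, 1e8, gene_copies_per_genome=4)
print(f"{estimate:.3g} genomes per sample")
```

Because the same construct amount is added to every sample, estimates computed this way can be compared across samples even when total read depth differs.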
In a further aspect, kits are also provided. The kits may include any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein. Optionally, the kits may further include a primer set capable of amplifying a target gene and the synthetic DNA construct or DNA standard solution.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations in one sample of wastewater.
Fig. 2 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations (10^7, 10^8 and 10^9 copies/reaction) in two samples of wastewater.
Fig. 3 shows the average percentage of total reads classified as each construct (Patho001_16S_rRNA_IS (SEQ ID NO: 1) or Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) when using different construct concentrations (10^7 and 10^8 copies/reaction) in two samples of wastewater.
DETAILED DESCRIPTION
The present inventors provide compositions, methods, and kits that may be used, for example, to compare how the diversity and abundances of OTUs vary across two or more samples. In the Examples, the present inventors disclose compositions and methods for detecting and quantifying the microbial composition of a sample using Next Generation Sequencing ("Next-Gen") tools. The inventors propose an improved method for quantifying OTUs across two or more samples by adding a linear, synthetic DNA construct of a known quantity prior to DNA extraction and amplicon sequencing. Such an improved method, by providing the ability to recognize which OTUs are changing in absolute abundance, would not only facilitate pathogen detection in an environmental sample but may also allow the identification of the source of pathogen contamination in a particular environment such as a lake or food manufacturing process.
In one aspect, synthetic DNA constructs are provided. The synthetic DNA constructs may include a standard identifier sequence flanked by a first primer binding site and a second primer binding site. Alternatively, in some embodiments, the synthetic DNA constructs may include a fragment of a target gene from a species or cell type that is not expected to be in a sample. The synthetic DNA constructs may be single-stranded or double-stranded. Accordingly, the synthetic DNA constructs may include the synthetic DNA constructs disclosed herein or the reverse complement sequence of the synthetic DNA constructs disclosed herein.
The synthetic DNA constructs may be between 50-2000, 100-1500, 100-1000, 150-1000, 200-700, or 300-600 nucleotides in length, or any ranges therein. In some embodiments, the synthetic DNA constructs comprise, consist essentially of, or consist of SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to SEQ ID NO: 2.
The first primer binding site may have substantial sequence identity to a first portion of a target gene and the second primer binding site may have substantial sequence identity to a second portion of the target gene. Suitably, the first primer binding site and the second primer binding site are separated by no more than 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500, 2700, 3000, 4000, or 5000 nucleotides within the target gene.
As used herein, the terms "% sequence identity," "percent identity," "% identity," and "sequence identity" refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity for a nucleic acid sequence may be determined as understood in the art. (See, e.g., U.S. Patent No. 7,396,664). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including "blastn," that is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at the NCBI website.
In some embodiments, the first primer binding site and/or the second primer binding site are substantially identical to at least about 15, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, or 110 nucleotides of the target gene. In some embodiments, the first primer binding site and/or the second primer binding site are substantially identical to no more than about 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 nucleotides of the target gene. Preferably, the first primer binding site and/or the second primer binding site are substantially identical to between 15 and 50 nucleotides of the target gene.
In some embodiments, the first primer binding site may include SEQ ID NO: 3, SEQ ID NO: 5, or both and the second primer binding site may include SEQ ID NO: 13, SEQ ID NO: 14, or both. In some embodiments the first primer binding site may include SEQ ID NO: 9 or SEQ ID NO: 11, or both and the second primer binding site may include SEQ ID NO: 10 or SEQ ID NO: 12 or both.
As used herein "substantially identical to" or "substantial identity" when referring to the first primer binding site and/or the second primer binding site of the present invention means a polynucleotide sequence identity of at least 40%. Suitable polynucleotide identity when referring to the first primer binding site and/or the second primer binding site of the present invention can be any value between 40% and 100%. Preferably, the sequence identity of the first primer binding site to the first portion of the target gene is 100% and the sequence identity of the second primer binding site to the second portion of the target gene is 100%. As used herein "substantially identical to" or "substantial identity" when referring to the standard identifier sequence of the present invention means a polynucleotide sequence identity of at least 97%, 98%, or 99%.
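The percent identity definition and the two thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by an external tool (for example blastn) and are supplied as equal-length strings with "-" marking gaps. The function names are hypothetical, and the choice of denominator (all aligned columns, including gap-versus-residue columns) is one common convention rather than one stated in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both sequences carry the same residue.

    Assumes pre-aligned, equal-length inputs. Columns where both sequences
    have a gap are skipped; gap-versus-residue columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


# Thresholds taken from the definitions above.
PRIMER_SITE_THRESHOLD = 40.0   # "substantial identity" for primer binding sites
IDENTIFIER_THRESHOLD = 97.0    # "substantial identity" for the standard identifier


def is_substantially_identical(identity_pct: float, threshold_pct: float) -> bool:
    """True when the identity meets or exceeds the applicable threshold."""
    return identity_pct >= threshold_pct
```

Under this sketch, a standard identifier sequence sharing about 91% identity with its closest target gene would not be substantially identical to it, because 91 is below the 97% threshold.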
The "target gene" in accordance with the present invention may be any gene that could be used to identify a particular cell type or microbial organism. The target gene may be a protein-coding gene used as a phylogenetic marker including, without limitation, the EF-Tu, fusA, gyrB, ileS, lepA, leuS, pyrG, recA, recG, rplB, or rpoB gene, or portions of any such gene. See, e.g., http://fungene.cme.msu.edu/index.spr for exemplary phylogenetic markers. Protein-coding genes that may be used for phylogenetic analysis are well known in the art and may be used in accordance with the present invention. The target gene may be a prokaryotic or eukaryotic ribosomal RNA gene. As used in the Examples, the target gene may be the 16S rRNA gene or a portion thereof such as the V3-V4 region of the 16S rRNA gene. The 16S rRNA gene codes for a ribosomal RNA that is a component of the 30S small subunit of prokaryotic ribosomes. The 16S rRNA gene is commonly used in the art to identify and phylogenetically classify bacteria. As will be appreciated by those in the art, other ribosomal RNA genes may be used in accordance with the present invention including, without limitation, the 23S rRNA gene or portions thereof (prokaryotes), the 5S rRNA gene or portions thereof (prokaryotes), or the 18S rRNA gene or portions thereof (eukaryotes).
As used herein, the "standard identifier sequence" may be any nucleotide sequence that does not have substantial sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived. As defined above, this may include nucleotide sequences that have less than 97% sequence identity to any portion of the target gene, including the portion of the target gene residing between the first and second primer binding sites. The standard identifier sequence ensures that some embodiments of the synthetic DNA constructs described herein are distinctly different in sequence from the target gene from which the first primer binding site and the second primer binding site are derived. For example, the present inventors have developed a synthetic standard identifier sequence represented in SEQ ID NO: 1. The standard identifier sequence in SEQ ID NO: 1 was designed to ensure consistent amplification with the amplicons being sequenced in the samples in the Examples. The standard identifier sequence in SEQ ID NO: 1 also does not have substantial sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived, which ensures the synthetic DNA construct of SEQ ID NO: 1 can be readily identified among the target genes being sequenced in the samples in the Examples. The standard identifier sequence in SEQ ID NO: 1 corresponds to a 16S rRNA construct encoding the target sites of two different sets of primers used for amplicon sequencing. This is an artificial 16S rRNA construct, not existing in any known bacterium. The amplicon sequence generated by these sets of primers is different enough from any other 16S rRNA sequence to be differentiated during downstream bioinformatics analysis. The closest 16S rRNA sequence corresponds to Thermosulfidibacter takaii (NR_041547), sharing only ~91% sequence identity. Since OTUs are usually clustered based on 97% identity, the construct generated here will be recognized as a different OTU during the analysis.
In some embodiments, the standard identifier sequence may have no more than 97%, 95%, 90%, 85%, 80%, 75%, 70%, or 65% sequence identity to the target gene from which the first primer binding site and the second primer binding site are derived. Suitably, the standard identifier sequence is between 100-1950, 150-1450, 200-950, 250-650, or 350-550 nucleotides in length.
Preferably, the standard identifier sequence will not hybridize with the primers directed to the first primer binding site and/or the second primer binding site of the synthetic DNA construct. The standard identifier sequence may lack secondary structure or repetitive sequences. Suitably, the GC content of the standard identifier sequence is between 40-60%.
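As a simple illustration of the GC content criterion above, the following sketch computes the GC percentage of a candidate standard identifier sequence and checks it against the 40-60% range. The function names are hypothetical, and the sketch assumes an unambiguous sequence containing only A, C, G, and T.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence of A, C, G, and T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_in_range(seq: str, low: float = 40.0, high: float = 60.0) -> bool:
    """True when GC content falls within the inclusive range [low, high]."""
    return low <= gc_content(seq) <= high
```

A candidate sequence failing this check could be redesigned before synthesis; screening for secondary structure and repeats would require separate tools not sketched here.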
Alternatively, the present inventors also disclose synthetic DNA constructs that are a fragment of a target gene from a species or cell type that is not expected to be in a sample. For example, the synthetic DNA construct in SEQ ID NO: 2 was derived from the 16S rRNA gene of Thermus thermophilus, which was not expected to be found in the samples analyzed in the Examples. The sequence of this synthetic DNA construct is expected to be amplified to a similar extent as the other 16S rRNA genes in the sample given that it is an additional 16S rRNA gene. Furthermore, this synthetic DNA construct could be unambiguously identified after being sequenced because the Thermus thermophilus 16S rRNA gene shares less than 97% sequence identity with the closest 16S rRNA gene in the sample.
In another aspect of the present invention, DNA standard solutions are provided. The DNA standard solutions may include any one of the DNA constructs described herein and a buffer. As described herein, a "buffer" may include any buffer used to buffer a DNA containing solution (i.e., a DNA buffer). Suitable DNA buffers may include, without limitation, water, TE buffer, or other Tris-based buffers.
In some embodiments, the DNA standard solution may include at least 1 x 10^6 copies of the DNA construct. In some embodiments, the DNA standard solution may include between about 1 x 10^6 to about 1 x 10^9 copies of the DNA construct. The copies of the DNA construct in the DNA standard solution may be in any volume convenient for introducing the DNA standard solution into a sample including, without limitation, 1 μL, 5 μL, 10 μL, 25 μL, 50 μL, 100 μL, 200 μL, 500 μL, 1000 μL, or more. The DNA standard solution may be about 10^11 copies/μL and then diluted to a desired concentration.
Methods for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample are also provided. The methods include (a) obtaining the sample, and (b) adding a predetermined amount of any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein to the sample. Optionally, the methods may further include (c) amplifying the target gene and the DNA construct in the sample to produce a plurality of amplicons, (d) sequencing the amplicons, and (e) determining the absolute abundance of the at least one DNA nucleic acid in the sample from the sequenced amplicons.
As used herein, a "sample" may be any type of sample containing a target gene. The sample may be a clinical sample including, without limitation, a gut, fecal, blood, urine, synovial fluid, or saliva sample. The sample may be an environmental sample including, without limitation, a water sample or a soil sample. The water sample may be obtained from a lake, drinking water source, a wastewater source, a food production process, or a beverage production process. The water sample may be a freshwater sample or a saltwater sample.
In accordance with the present methods, the "predetermined amount" of the synthetic DNA construct may be between 1 x 10^2 - 2 x 10^13 copies per sample, or any range therein. In some embodiments, the predetermined amount of the synthetic DNA construct may be between about 1 x 10^6 to about 1 x 10^9 copies per sample. In some embodiments, the predetermined amount of the synthetic DNA construct may be at least 1 x 10^6 copies per sample.
The present methods may further include extracting the at least one DNA nucleic acid from the sample after adding the synthetic DNA construct to the sample and prior to the amplification step (c). Methods for extracting DNA from cells in a sample are generally known in the art. In the Examples, the inventors use a Mo Bio Soil DNA Extraction kit or PowerSoil® DNA Isolation Kit Catalog No. 12888-50 & 12888-100. Similar DNA extraction procedures have been previously described in Camejo PY, Owen BR, Martirano J, Ma J, Kapoor V, Santo Domingo J, McMahon KD, Noguera DR, 2016. Candidatus Accumulibacter phosphatis clades enriched under cyclic anaerobic and microaerobic conditions simultaneously use different electron acceptors. Water Res. 2016 Oct 1; 102: 125-37. Alternatively, the present methods could be performed without DNA extraction through, for example, the use of chaotropic agents for the disruption of cells and preservation of nucleic acids under buffered conditions.
The present methods may include amplifying the target gene and the synthetic DNA construct in the sample to produce a plurality of amplicons. As used herein, "amplifying" refers to increasing the quantity of the target gene and synthetic DNA construct in a sample using amplification techniques that are well-known in the art. Such methods include, without limitation, PCR, isothermal transcription-based amplification, rolling-circle amplification, and strand displacement amplification. In some embodiments, the target gene and the synthetic DNA construct are amplified using PCR with a primer set selected from SEQ ID NOs: 3-6. In one embodiment, real time PCR is used to amplify and quantify the amplicons. The various amplicons can be differentiated using, for example, distinct TaqMan or other probes.
Several sequencing platforms can be used to "sequence the amplicons" in the present methods. Such sequencing platforms include, without limitation, Illumina's Next-Gen sequencing technology, or sequencing technologies provided by Ion Torrent, Oxford Nanopore, Pacific Biosciences, Sanger sequencing, or Roche/454.
Following sequencing, the absolute abundance of the at least one DNA nucleic acid in the sample may be determined from the sequenced amplicons. The number of amplicons of each OTU may be normalized by the number of the synthetic DNA construct sequences detected and the number of average copies of the sequence (16S rRNA or genes) typically found in the genome of each genus.
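The normalization described above can be sketched as follows. The disclosure does not give an explicit formula, so the arithmetic here is an assumption: reads of an OTU are scaled by the ratio of construct copies added to construct reads detected, and the result is divided by the average number of target gene copies per genome for the genus. The function and parameter names are hypothetical, and the sketch assumes the construct and the target gene are extracted and amplified with similar efficiency.

```python
def absolute_abundance(
    otu_reads: float,
    standard_reads: float,
    standard_copies_added: float,
    gene_copies_per_genome: float = 1.0,
) -> float:
    """Estimate genome (cell) equivalents of an OTU in the spiked sample.

    Assumed formula (not stated explicitly in the source):
        gene copies = otu_reads / standard_reads * standard_copies_added
        cells       = gene copies / gene_copies_per_genome
    """
    if standard_reads <= 0:
        raise ValueError("the internal standard was not detected")
    if gene_copies_per_genome <= 0:
        raise ValueError("gene copies per genome must be positive")
    gene_copies = otu_reads / standard_reads * standard_copies_added
    return gene_copies / gene_copies_per_genome
```

For example, an OTU with twice as many reads as a construct spiked at 1 x 10^8 copies, belonging to a genus averaging four 16S rRNA gene copies per genome, would be estimated at 5 x 10^7 genome equivalents in the sample.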
Kits are also provided. The kits may include any one of the synthetic DNA constructs described herein or any one of the DNA standard solutions described herein. Optionally, the kits may further include a primer set capable of amplifying a target gene and the synthetic DNA construct. In some embodiments, the primer set may be any primer set selected from SEQ ID NOs: 3-6. It is also envisioned that the present kits may include the components of the Nextera XT Index Kit from Illumina.
The present disclosure is not limited to the specific details of construction, arrangement of components, or method steps set forth herein. The compositions and methods disclosed herein are capable of being made, practiced, used, carried out and/or formed in various ways that will be apparent to one of skill in the art in light of the disclosure that follows. The phraseology and terminology used herein is for the purpose of description only and should not be regarded as limiting to the scope of the claims. Ordinal indicators, such as first, second, and third, as used in the description and the claims to refer to various structures or method steps, are not meant to be construed to indicate any specific structures or steps, or any particular order or configuration to such structures or steps. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to facilitate the disclosure and does not imply any limitation on the scope of the disclosure unless otherwise claimed. No language in the specification, and no structures shown in the drawings, should be construed as indicating that any non-claimed element is essential to the practice of the disclosed subject matter. The use herein of the terms "including," "comprising," or "having," and variations thereof, is meant to encompass the elements listed thereafter and equivalents thereof, as well as additional elements. Embodiments recited as "including," "comprising," or "having" certain elements are also contemplated as "consisting essentially of" and "consisting of" those certain elements.
Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if a concentration range is stated as 1% to 50%, it is intended that values such as 2% to 40%, 10% to 30%, or 1% to 3%, etc., are expressly enumerated in this specification. These are only examples of what is specifically intended, and all possible combinations of numerical values between and including the lowest value and the highest value enumerated are to be considered to be expressly stated in this disclosure. Use of the word "about" to describe a particular recited amount or range of amounts is meant to indicate that values very near to the recited amount are included in that amount, such as values that could or naturally would be accounted for due to manufacturing tolerances, instrument and human error in forming measurements, and the like. All percentages referring to amounts are by weight unless indicated otherwise.
No admission is made that any reference, including any non-patent or patent document cited in this specification, constitutes prior art. In particular, it will be understood that, unless otherwise stated, reference to any document herein does not constitute an admission that any of these documents forms part of the common general knowledge in the art in the United States or in any other country. Any discussion of the references states what their authors assert, and the applicant reserves the right to challenge the accuracy and pertinence of any of the documents cited herein. All references cited herein are fully incorporated by reference in their entirety, unless explicitly indicated otherwise. The present disclosure shall control in the event there are any disparities between any definitions and/or description found in the cited references.
Unless otherwise specified or indicated by context, the terms "a", "an", and "the" mean "one or more." For example, "a protein" or "an RNA" should be interpreted to mean "one or more proteins" or "one or more RNAs," respectively.
As used herein, "about," "approximately," "substantially," and "significantly" will be understood by persons of ordinary skill in the art and will vary to some extent on the context in which they are used. If there are uses of these terms which are not clear to persons of ordinary skill in the art given the context in which they are used, "about" and "approximately" will mean plus or minus <10% of the particular term and "substantially" and "significantly" will mean plus or minus >10% of the particular term.
The following examples are meant only to be illustrative and are not meant as limitations on the scope of the invention or of the appended claims.
EXAMPLES
We describe a method for preparing water samples retrieved from natural water bodies, water wells and wastewater treatment plants, for sequencing the variable V3-V4 regions of the 16S rRNA gene. The methods described herein can also be used for sequencing other regions with different region-specific primers. The steps were adapted from the Illumina 16S Metagenomic Sequencing Library Preparation document # 15044223 Rev B.
Example 1 - Sample collection
Water was filtered through a 0.22 μm pore-size, 142 mm diameter Supor membrane filter (Pall, Port Washington, NY) in duplicate for total community metagenomics. Sampled filters were stored in screw cap tubes and frozen at -80°C until extraction. Large volume samples were stored at 4°C until they were processed and then stored at -80°C.
Example 2 - Internal Synthetic DNA Standards
Synthetic, linear DNA standards were added to a water sample before total DNA extraction. Two internal DNA standards were tested in the Examples described herein: (1) Patho001_16S_rRNA_InternalStandard - a 656 bp synthetic 16S rRNA sequence (SEQ ID NO: 1); and (2) Thermus_thermophilus_16S_rRNA_InternalStandard - Region 321 to 890 of Thermus thermophilus (NR_113293) 16S rRNA sequence (SEQ ID NO: 2). Each DNA standard was created by conventional DNA synthesis methods, stored in a circular vector, cloned into competent cells, subsequently amplified by PCR using the primer set M13 forward (-20): GTAAAACGACGGCCAG (SEQ ID NO: 7) and M13 reverse: CAGGAAACAGCTATGAC (SEQ ID NO: 8), and purified. The number of copies of internal standard added to the DNA extraction step was calculated based on the standard DNA concentration via spectrophotometric measurements and the length of the molecule.
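The copy number calculation mentioned above can be sketched as follows. The exact formula used in the Examples is not given, so this sketch assumes the widely used conversion for double-stranded DNA with an average mass of 660 g/mol per base pair; the function name and that constant are assumptions rather than details taken from the Examples.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def dsdna_copies(mass_ng: float, length_bp: int, g_per_mol_per_bp: float = 660.0) -> float:
    """Number of double-stranded DNA molecules in a given mass.

    copies = mass (g) * Avogadro's number / (length in bp * g/mol per bp)
    The 660 g/mol per bp average is an assumed approximation.
    """
    if length_bp <= 0:
        raise ValueError("length must be positive")
    mass_g = mass_ng * 1e-9
    return mass_g * AVOGADRO / (length_bp * g_per_mol_per_bp)


def copies_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Copies per microliter for a stock of known concentration in ng/μL."""
    return dsdna_copies(conc_ng_per_ul, length_bp)
```

Under these assumptions, 1 ng of the 656 bp standard corresponds to roughly 1.4 x 10^9 copies, so a spectrophotometric concentration in ng/μL can be converted directly to copies/μL before dilution.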
The two internal DNA standards (SEQ ID NOs: 1 and 2) may be targeted by two different primer pairs:
(1) Primer pair 1: S-D-Bact-0341-b-S-17 (5'-CCTACGGGNGGCWGCAG) (SEQ ID NO: 3) and S-D-Bact-0785-a-A-21 (5'-GACTACHVGGGTATCTAATCC) (SEQ ID NO: 4)
(2) Primer pair 2: 515F-Y (5'-GTGYCAGCMGCCGCGGTAA) (SEQ ID NO: 5) and 926R (5'-CCGYCAATTYMTTTRAGTTT) (SEQ ID NO: 6)
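The primers listed above contain degenerate IUPAC bases (N, W, H, V, Y, M, R). As an illustration of how such primers can be located on a construct in silico, the following sketch expands each degenerate code into a regular-expression character class and reports the span between the forward site and the reverse-primer site. It requires exact matches with no mismatches, which is stricter than real primer annealing; the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the degenerate codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD_341 = "CCTACGGGNGGCWGCAG"      # SEQ ID NO: 3
REV_785 = "GACTACHVGGGTATCTAATCC"  # SEQ ID NO: 4


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, degenerate codes included."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def amplicon_span(template: str, forward: str, reverse: str):
    """Return (start, end) of the predicted amplicon on the template, or None.

    The forward primer is searched on the given strand; the reverse primer
    is searched as its reverse complement on the same strand.
    """
    template = template.upper()
    fwd = primer_regex(forward).search(template)
    rev = primer_regex(reverse_complement(reverse)).search(template)
    if fwd is None or rev is None or rev.start() < fwd.end():
        return None
    return fwd.start(), rev.end()
```

Applied to a construct sequence, a non-None result indicates that both primer binding sites are present in the expected orientation; the same check can be repeated with SEQ ID NOs: 5 and 6.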
The closest 16S rRNA sequence to the PCR products generated with both primer pairs and the Patho001_16S_rRNA_InternalStandard (SEQ ID NO: 1) corresponds to Thermosulfidibacter takaii (NR_041547), with 91% identity. This degree of differentiation from other 16S rRNA sequences means that the sequence will be clustered as a different organism during the sequencing analysis.
Example 3 - DNA processing and extraction
Prior to DNA extraction, filters were thawed and shattered in liquid nitrogen. Internal DNA standards (described above) were added to each sample. Concentrations between 1 X 10 - 1 X 10^11 copies of each Internal DNA standard were tested per sample.
DNA was extracted using the Mo Bio Soil DNA Extraction kit or PowerSoil® DNA Isolation Kit Catalog No. 12888-50 & 12888-100. Similar DNA extraction procedures have been previously described (Camejo PY, Owen BR, Martirano J, Ma J, Kapoor V, Santo Domingo J, McMahon KD, Noguera DR, 2016. Candidatus Accumulibacter phosphatis clades enriched under cyclic anaerobic and microaerobic conditions simultaneously use different electron acceptors. Water Res. 2016 Oct 1; 102: 125-37). Briefly, a lysis tube was prepared for each sample, consisting of a sterile 2 ml screw-cap Eppendorf tube containing powerbeads (Mo-Bio, Carlsbad, CA), 60 μL of C1 Lysis Solution (Mo-Bio, Carlsbad, CA), and internal standards (described above). Samples were bead-beaten for 1 min at medium speed (3.5) and centrifuged for 30 sec at 10,000 x g, and the supernatant was saved in a new Eppendorf tube. Downstream processing was done following the manufacturer's instructions.
Example 4 - DNA Quantification
DNA concentrations in all samples were determined using the Qubit™ kit (buffer, dye and two standards) from Thermo Fisher Scientific.
Example 5 - DNA Normalization
All samples were diluted to 5 ng/μL with TE buffer (1X).
Example 6 - PCR Amplification of 16S rRNA gene
PCR was used to amplify the 16S V3-V4 region in the extracted DNA samples. The primers were selected from the Klindworth et al. publication (Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, et al. (2013) Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res 41(1).) See SEQ ID NOs: 3-4. Master mixes with both PCR primers and the KAPA mix were made including Amplicon PCR Forward Primer 1 μM, Amplicon PCR Reverse Primer 1 μM, and 2x KAPA HiFi HotStart ReadyMix. DNA (5 ng/μL) from each sample was added to each reaction and thermocycled using the following parameters:
a. 95°C for 3 minutes
b. 25 cycles of:
c. 95°C for 30 seconds
d. 55°C for 30 seconds
e. 72°C for 30 seconds
f. 72°C for 5 minutes
g. Hold at 4°C
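The thermocycling parameters above can be captured as a small data structure, which is convenient for record keeping and for estimating run time. This is only an illustrative sketch: the class and constant names are hypothetical, and the time estimate ignores ramping between temperatures and the final 4°C hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# Values transcribed from steps a-f above.
INITIAL_DENATURATION = Step(95, 180)
CYCLE = (Step(95, 30), Step(55, 30), Step(72, 30))
N_CYCLES = 25
FINAL_EXTENSION = Step(72, 300)


def programmed_seconds() -> int:
    """Total programmed hold time, excluding ramp times and the 4°C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds
```

With these values the programmed hold time is 2730 seconds, or 45.5 minutes, before ramp times are added.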
Amplified DNA was purified using AMPure XP beads to purify the 16S V3 and V4 amplicons away from free primers and primer dimer species. Agarose gel electrophoresis was used to quality check the amplicon products.
Example 7 - Index PCR
An index PCR reaction was used to attach dual indices and Illumina sequencing adapters using the Nextera XT Index Kit.
Example 8 - Library Quantification, Normalization, and Pooling
A fluorometric quantification method that uses dsDNA binding dyes was used to quantify the libraries. DNA concentrations in nM were calculated for each sample, based on the size of DNA amplicons as determined by an Agilent Technologies 2100 Bioanalyzer trace: (concentration in ng/μL) / (660 g/mol x average library size) x 10^6 = concentration in nM. For example: (15 ng/μL) / (660 g/mol x 500) x 10^6 = 45 nM
Concentrated final libraries were diluted using Resuspension Buffer (RSB) or 10 mM Tris pH 8.5 to 4 nM. Diluted DNA from each library was aliquoted (5 μL) and mixed for pooling libraries with unique indices. Depending on coverage needs, up to 96 libraries were pooled for one MiSeq run.
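The concentration conversion and the dilution to 4 nM described in this Example can be sketched as follows. The conversion follows the formula given above; the dilution helper applies the standard C1V1 = C2V2 relationship, which is an assumption about how the dilution was calculated. Function names are hypothetical.

```python
def library_nM(conc_ng_per_ul: float, avg_size_bp: float, g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a library concentration from ng/μL to nM.

    nM = (ng/μL) / (660 g/mol * average library size in bp) * 10^6
    """
    if avg_size_bp <= 0:
        raise ValueError("average library size must be positive")
    return conc_ng_per_ul / (g_per_mol_per_bp * avg_size_bp) * 1e6


def dilution_volumes(stock_nM: float, target_nM: float, final_volume_ul: float):
    """Return (stock volume, diluent volume) in μL using C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul
```

For the worked example above, a 15 ng/μL library averaging 500 bp evaluates to about 45.5 nM, which the Example reports as 45 nM.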
Example 9 - Library Denaturing and MiSeq Loading and Sequencing
In preparation for cluster generation and sequencing, pooled libraries were denatured with NaOH, diluted with hybridization buffer, and then heat denatured before MiSeq sequencing. Each run included a minimum of 5% PhiX to serve as an internal control for these low diversity libraries.
Example 10 - Data analysis and identification of potential sources of pathogenicity
The data files with the multiplexed paired-end read sequences were processed using open source bioinformatics tools. Briefly, reads were merged, aligned, filtered (quality score, chimera checking and removal of sequences failing to align) and binned into operational taxonomic units (OTUs) with 97% identity. Then, taxonomic classification of the most representative sequences from each OTU was done using the public SILVA (Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO, 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 41 (D1): D590-D596) and PATRIC databases (Wattam AR, Abraham D, Dalay O, Disz TL, Sobral BW, et al, 2014. PATRIC, the bacterial bioinformatics database and analysis resource. Nucl Acids Res 42 (D1): D581-D591.). Subsequently, the number of 16S rRNA sequences of each OTU was normalized by the number of internal standard 16S rRNA sequences detected and the number of average 16S rRNA sequences typically found in each genus. Estimation of the number of cells per volume of water was carried out by averaging three replicates.
Example 11 - Internal DNA Standard Concentration Needed for Reliable Detection of the Standard
Using the methods described herein, we tested four different concentrations (0, 10^2, 10^4 and 10^6 copies/DNA extraction reaction) of the two internal DNA constructs (Patho001_16S_rRNA_IS (SEQ ID NO: 1) and Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) in wastewater samples. Duplicates were performed.
As shown in Fig. 1, in the wastewater samples tested, both DNA constructs were only detected when a concentration of 10^6 copies/reaction was used. This suggests that the minimum number of copies/reaction needed to be added to wastewater samples for detection was somewhere in the range of 10^4-10^6 with the methods disclosed herein. No reads from the sequencing reactions corresponding to either DNA construct were detected when the initial concentrations were 0, 10^2 or 10^4.
99.74% and 99.70% of sequences retrieved from Patho001_16S_rRNA_IS and Thermus_thermophilus_16S_rRNA_IS, respectively, were correctly identified using the free open source bioinformatics program "QIIME" with a modified SILVA database that includes the sequence of the hybrid construct Patho001_16S_rRNA_IS.
Example 12 - DNA Constructs Added to Sample vs. Average % Reads in the Sample
Using the methods described herein and the two previously mentioned DNA constructs (Patho001_16S_rRNA_IS (SEQ ID NO: 1) and Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)), we increased the concentration of these internal standards to analyze their presence at different concentrations when added to wastewater samples. We tested three different concentrations (10^7, 10^8 and 10^9 copies/DNA extraction reaction) of the two internal DNA constructs (Patho001_16S_rRNA_IS (SEQ ID NO: 1) and Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2)) in two wastewater samples. Duplicates were performed resulting in 24 sample points.
As shown in Fig. 2, the results show a linear relationship between the amount of construct added to the wastewater sample and the number of sequences identified per sample. For each independent sample analyzed, the correlation coefficients are 0.97, 0.98, 0.99 and 0.98. For any sample processed, we identified that the addition of 10^7 to 10^8 copies of construct/reaction is sufficient to capture its presence in an environmental sample. Surprisingly, when the DNA construct was added to the sample at 10^8 copies, the Patho001_16S_rRNA_IS (SEQ ID NO: 1) standard showed less sample to sample variability than the Thermus_thermophilus_16S_rRNA_IS (SEQ ID NO: 2) standard. See Fig. 3.
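A correlation coefficient of the kind reported above can be computed with a short routine such as the following sketch. It implements the ordinary Pearson coefficient; the Example does not state which coefficient was used or whether the values were log-transformed first, so both are assumptions, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences has zero variance")
    return sxy / math.sqrt(sxx * syy)
```

In use, xs would hold the copies of construct added per reaction (or their log10 values) and ys the corresponding number or percentage of reads classified as the construct for one sample.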

CLAIMS
We claim:
1. A synthetic DNA construct comprising a standard identifier sequence flanked by a first primer binding site having substantial sequence identity to a first portion of a target gene and a second primer binding site having substantial sequence identity to a second portion of the target gene, wherein the standard identifier sequence does not have substantial sequence identity to the target gene.
2. The DNA construct of claim 1, wherein the target gene is a prokaryotic ribosomal RNA gene.
3. The DNA construct of any one of the preceding claims, wherein the target gene is selected from the group consisting of the 16S rRNA gene, 23S rRNA gene, and 5S rRNA gene.
4. The DNA construct of any one of the preceding claims, wherein the target gene is the 16S rRNA gene.
5. The DNA construct of any one of the preceding claims, wherein the target gene is the V3-V4 region of the 16S rRNA gene.
6. The DNA construct of any one of the preceding claims, wherein the standard identifier sequence lacks secondary structure or repetitive sequences.
7. The DNA construct of any one of the preceding claims, wherein the standard identifier sequence is between 50-500 nucleotides in length.
8. The DNA construct of any one of the preceding claims, wherein the DNA construct is between 100-1000 nucleotides in length.
9. The DNA construct of any one of the preceding claims, wherein the DNA construct is double-stranded.
10. The DNA construct of any one of the preceding claims, wherein the first primer binding site comprises SEQ ID NO: 3 or SEQ ID NO: 5 and the second primer binding site comprises SEQ ID NO: 13 or SEQ ID NO: 14.
11. The DNA construct of claim 10, wherein the first primer binding site comprises SEQ ID NO: 9 or SEQ ID NO: 11 and the second primer binding site comprises SEQ ID NO: 10 and SEQ ID NO: 12.
12. The DNA construct of any one of the preceding claims, wherein the DNA construct comprises SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 2.
13. A DNA standard solution comprising any one of the DNA constructs of claims 1-12 and a buffer.
14. The DNA standard solution of claim 13, wherein the DNA standard solution comprises at least 1x10^6 copies of the DNA construct.
15. A method for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample, the method comprising:
(a) obtaining the sample,
(b) adding a predetermined amount of any one of the DNA constructs of claims 1-12 or the DNA standard solutions of claims 13-14 to the sample.
16. The method of claim 15, further comprising:
(c) amplifying the target gene and the DNA construct in the sample to produce a plurality of amplicons,
(d) sequencing the amplicons, and
(e) determining the absolute abundance of the at least one DNA nucleic acid in the sample from the sequenced amplicons.
17. The method of claim 16, further comprising extracting the at least one DNA nucleic acid from the sample after adding the DNA construct to the sample and prior to the amplification step (c).
18. The method of any one of claims 15-17, wherein the sample is selected from the group consisting of a water sample, a soil sample, and a gut sample.
19. The method of any one of claims 15-18, wherein the sample comprises a water sample from a drinking water source, a wastewater source, a food production process, or a beverage production process.
20. The method of any one of claims 15-19, wherein the predetermined amount of the DNA construct is between about 1x10^6 - 1x10^10 copies per sample.
21. The method of any one of claims 15-20, wherein the target gene and the DNA construct are amplified using a primer set selected from SEQ ID NOs: 3-6.
22. The method of any one of claims 15-21, wherein the amplicons are sequenced using Illumina Next-Gen sequencing technology.
23. A kit comprising any one of the DNA constructs of claims 1-12 or the DNA standard solutions of claims 13-14.
24. The kit of claim 23, further comprising a primer set capable of amplifying a target gene.
25. The kit of claim 24, wherein the target gene is a prokaryotic ribosomal RNA gene.
26. The kit of claim 25, wherein the target gene is selected from the group consisting of the 16S rRNA gene, the 23S rRNA gene, and the 5S rRNA gene.
27. The kit of any one of claims 24-26, wherein the primer set is selected from SEQ ID NOs: 3-6.
28. A kit for performing any one of the methods described herein.
29. A synthetic DNA construct comprising SEQ ID NO: 1, SEQ ID NO: 2, a polynucleotide having at least 95% sequence identity to SEQ ID NO: 1, or a polynucleotide having at least 95% sequence identity to SEQ ID NO: 2, wherein the DNA construct is between 100-1000 nucleotides in length.
30. The synthetic DNA construct of claim 29, wherein the synthetic DNA construct comprises SEQ ID NO: 1 or a polynucleotide having at least 98% sequence identity to SEQ ID NO: 1.
31. A method for determining the absolute abundance of at least one DNA nucleic acid comprising a target gene in a sample, the method comprising:
(a) obtaining the sample,
(b) adding a predetermined amount of the synthetic DNA construct of claim 29 to the sample.
PCT/US2017/061469 2016-11-14 2017-11-14 Nucleic acid quantification compositions and methods WO2018089978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662421445P 2016-11-14 2016-11-14
US62/421,445 2016-11-14

Publications (1)

Publication Number Publication Date
WO2018089978A1 (en) 2018-05-17

Family

ID=62110664

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/061469 WO2018089978A1 (en) 2016-11-14 2017-11-14 Nucleic acid quantification compositions and methods

Country Status (1)

Country Link
WO (1) WO2018089978A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3959337A4 (en) * 2019-04-24 2023-08-30 Genepath Diagnostics Inc. Method for detecting specific nucleic acids in samples

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030027135A1 (en) * 2001-03-02 2003-02-06 Ecker David J. Method for rapid detection and identification of bioagents

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DATABASE NCBI [O] Nucleotide; 13 June 2011 (2011-06-13), "Thermus thermophilus strain SSW-06 16S ribosomal RNA gene , partial sequence", XP055484698, Database accession no. JN115058 *
DATABASE NCBI Nucleotide; 14 November 2006 (2006-11-14), "Thermosulfidibacter takaii gene for 16S rRNA, partial sequence, strain: ABI70S6", XP055484700, Database accession no. AB282756 *
HEDLUND ET AL.: "Potential role of Thermus thermophilus and T. oshimai in high rates of nitrous oxide (N2O) production in ~80° C hot springs in the US Great Basin", GEOBIOLOGY, vol. 9, no. 6, 27 September 2011 (2011-09-27), pages 471 - 480, XP055484692 *
NUNOURA ET AL.: "Thermosulfidibacter takaii gen. nov., sp. nov., a thermophilic, hydrogen-oxidizing, sulfur-reducing chemolithoautotroph isolated from a deep-sea hydrothermal field in the Southern Okinawa Trough", INT J SYST EVOL MICROBIOL, vol. 58, 1 March 2008 (2008-03-01), pages 659 - 665, XP055484728 *
SRINIVASAN ET AL.: "Use of 16S rRNA Gene for Identification of a Broad Range of Clinically Relevant Bacterial Pathogens", PLOS ONE, vol. 10, no. 2, 1 January 2015 (2015-01-01), pages e0117617, XP055253245 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17869401

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17869401

Country of ref document: EP

Kind code of ref document: A1