WO2012064739A2 - Microbial enrichment primers - Google Patents

Info

Publication number
WO2012064739A2
Authority
WO
WIPO (PCT)
Application number
PCT/US2011/059783
Other languages
French (fr)
Other versions
WO2012064739A3 (en)
Inventor
Lan QUAN
Alexander Solovyov
W. Ian Lipkin
Original Assignee
The Trustees Of Columbia University In The City Of New York
Application filed by The Trustees Of Columbia University In The City Of New York
Publication of WO2012064739A2
Publication of WO2012064739A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection

Definitions

  • rRNA: ribosomal RNA
  • the invention relates to the field of microbe detection.
  • the invention relates to microbe detection in an organism using transcriptome libraries.
  • the invention provides a composition comprising 20 or more nucleic acid sequences, wherein each of the 20 nucleic acid sequences comprises a different hexamer sequence selected from the group consisting of the hexamer sequences in Table 1, provided that at least one nucleic acid sequence does not comprise a hexamer sequence selected from the group consisting of the hexamer sequences in Table 4.
  • each different hexamer sequence is selected from the group consisting of the hexamer sequences in Table 2. In some embodiments, each different hexamer sequence is selected from the group consisting of hexamer sequences in Table 3.
  • the composition comprises 200 or more nucleic acid sequences. In some embodiments, the composition comprises 800 or more nucleic acid sequences.
  • each nucleic acid further comprises a tail sequence 5' to the hexamer sequence, wherein the tail sequence is about 10 to about 22 nucleotides in length, wherein the tail sequence is separated from the hexamer sequence by 0 to 10 nucleotides, and wherein each nucleic acid comprises the same tail sequence.
  • each nucleic acid in the composition has the same length or substantially the same length. In some embodiments, each nucleic acid has the hexamer sequence in the same position in the nucleic acid.
  • each nucleic acid sequence is a primer. In some embodiments, each nucleic acid sequence is DNA, RNA, PNA, LNA, GNA or TNA.
  • the invention provides a method for designing a primer set for amplification of microbial nucleic acids in an organism comprising: (a) sequencing the transcriptome of an organism; (b) identifying highly expressed genes of the organism from the plurality of sequence reads generated in step (a); (c) providing a first primer library, wherein each primer comprises a different hexamer sequence; and (d) removing primers from the first primer library that are predicted to anneal to the RNA of the organism's highly expressed genes to generate a second primer library, provided that primers expected to anneal to RNA predicted to form a secondary structure are not removed from the first primer library.
  • primers comprising hexamer sequences with perfect sequence matches to the regions giving a substantial number of reads are removed from the first primer library.
  • the substantial number of reads is more than 1% of the relative coverage depth.
  • steps (b) or (d) are performed by a computer.
  • the organism's highly expressed genes comprise 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase, and NADH dehydrogenase. In some embodiments, the organism's highly expressed genes further comprise one or more additional oxidative phosphorylation genes.
  • the transcriptome of the organism is sequenced using unbiased high throughput sequencing.
  • the second primer library comprises 20 or more primers. In some embodiments, the second primer library comprises 800 or more primers. In some embodiments, the second primer library comprises 1600 or more primers.
  • the method further comprises separating the primers in the second primer library into two primer sets.
  • one primer set is used to generate a first cDNA strand from total RNA and the other primer set is used to generate a second cDNA strand from the first cDNA strand.
  • the organism is a eukaryote. In some embodiments, the organism is a human.
  • the method further comprises producing the second primer library.
  • the invention provides a method for amplifying a microbial nucleic acid comprising: (a) providing a sample from an organism; (b) isolating total RNA from the sample; (c) reverse-transcribing total RNA from the sample using a set of forward strand primers to provide a first cDNA strand, wherein the forward strand primers are designed not to amplify the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts, and wherein the forward strand primers comprise primers complementary to regions of the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts that are predicted to form secondary RNA structure; (d) replicating the first cDNA strand using a set of reverse strand primers to provide a second cDNA strand
  • the forward strand primers and the reverse strand primers are further designed not to amplify the RNA transcripts of one or more additional oxidative phosphorylation genes.
  • the forward strand primers and the reverse strand primers each comprise a different hexamer sequence. In some embodiments, the forward strand primers and the reverse strand primers each comprise the same specific tail sequence.
  • At least one of the double-strand cDNA primers is complementary to the specific tail sequence or a portion thereof.
  • each forward strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences in Table 2.
  • each reverse strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences in Table 3.
  • the method further comprises determining whether the organism is infected with a microbe.
  • the invention provides a composition comprising 800 or more nucleic acids, wherein each nucleic acid has the structure H-(N)a-ST, wherein H is a nucleotide sequence of 5 to 7 nucleotides; N is a random nucleotide; a is an integer from 0 to 12; and ST is a nucleotide sequence of 10 to 22 nucleotides, provided that each nucleic acid in the composition has a different sequence H.
  • H is a nucleotide sequence of 6 nucleotides. In some embodiments, in at least one nucleic acid sequence H is not selected from the group consisting of the hexamer sequences in Table 4. In some embodiments, H is selected from the group consisting of the hexamer sequences in Table 2. In some embodiments, H is selected from the group consisting of the hexamer sequences in Table 3.
  • a is 1.
  • the composition comprises at least 1600 nucleic acids.
  • ST is a nucleotide sequence of 17 nucleotides.
  • the invention provides a kit comprising a composition of any of the aspects and embodiments described above and instructions for use.
  • FIG. 1 is a schematic of a secondary DNA structure impeding the ability to synthesize cDNA
  • FIG. 2 is an illustration of the coverage depth for the 28S rRNA gene by unbiased 454 high-throughput sequencing
  • FIG. 3A is an illustration of the first strand cDNA synthesis using forward strand microbial enrichment (FS-MEP) primers;
  • FIG. 3B is an illustration of the second strand cDNA synthesis using reverse strand microbial enrichment (RS-MEP) primers
  • FIG. 3C is an illustration of the PCR amplification of double stranded cDNA generated with the FS-MEP and RS-MEP primers;
  • FIGS. 4A and 4B are illustrations explaining coverage depth
  • FIG. 5 is an illustration of the predicted secondary structure of the 28S human ribosomal RNA sequence at 65 °C using the RNAfold program from the Vienna-RNA package;
  • Figs. 6A-6D are illustrations of the relative coverage depth for human host genes 12S, 16S, 18S and 28S rRNA in 45 UHTS experiments using random primers;
  • Figs. 7A-7D are illustrations of the raw coverage depths for human host genes 12S, 16S, 18S and 28S rRNA in 45 UHTS experiments using random primers;
  • Fig. 8 is a schematic of the microbe RNA detection and amplification using the microbial enrichment primer procedure; and
  • Figs. 9A and 9B are illustrations of the raw coverage depth of Lujo virus for its S and L segments, respectively, using the MEP primers and using random primers.
  • As used herein, the recitation of a numerical range for a variable is intended to convey that the invention may be practiced with the variable equal to any of the values within that range. Thus, for a variable which is inherently discrete, the variable can be equal to any integer value within the numerical range, including the end-points of the range.
  • for a variable which is inherently continuous, the variable can be equal to any real value within the numerical range, including the end-points of the range.
  • a variable which is described as having values between 0 and 2 can take the values 0, 1 or 2 if the variable is inherently discrete, and can take the values 0.0, 0.1, 0.01, 0.001, or any other real values ≥ 0 and ≤ 2 if the variable is inherently continuous.
  • “microbe” refers to bacteria, viruses, fungi, parasites, and other infectious agents that are capable of infecting a host.
  • “host” or “organism” refers to any selected source that is infected or could potentially be infected with a microbe.
  • organisms include vertebrates, invertebrates, mammals, humans, dogs, cats, cattle, pigs, sheep, rabbits, mice, rats, birds, reptiles, amphibians, fish, insects, plants, tissue cultures, and cell cultures.
  • “sample” refers to material obtained from an organism. Non-limiting examples of samples include tissue, body fluids, blood, saliva, sperm, and cells.
  • the coverage depth of a sequence region is the number of sequence reads which contain this region (i.e., they "cover” it). It does not have units as it is a count of reads. It is measured using the 454 reference mapper application. To increase the sensitivity of detection of a microbe infection, it is desirable to increase coverage depth (number of reads) for microbial sequences and to decrease it for the common host sequences.
  • the present disclosure is directed to nucleic acid sequences as described herein for amplification of microbial nucleic acids from samples.
  • Amplification can be performed by any suitable method known in the art, for example but not limited to polymerase chain reaction (“PCR”), real-time polymerase chain reaction (“RT-PCR”), and transcription mediated amplification (“TMA”).
  • PCR: polymerase chain reaction
  • RT-PCR: real-time polymerase chain reaction
  • TMA: transcription mediated amplification
  • UHTS refers to Unbiased High Throughput Sequencing.
  • “MEP primers,” “Microbial Enrichment Primers,” or “microbe enrichment primers” refer to a set of primers designed or obtained using the methods described herein, which take into account the highly expressed host genes and the secondary structure of the RNA transcripts to be amplified.
  • “forward strand microbial enrichment primers” and “FS-MEP” refer to microbial enrichment primers, or a set of microbial enrichment primers, that are used to generate a first cDNA strand from the total RNA.
  • FS-MEP have the formula H-N-ST described above, where H is a nucleic acid hexamer (i.e., six nucleotides), N is a random nucleotide, and ST is a Specific Tail Sequence.
  • the forward strand microbial enrichment primers are designed not to amplify transcripts from the highly expressed genes of an organism and include primers that are complementary to regions of the organism's RNA predicted to form a secondary structure.
  • RS-MEP: reverse strand microbial enrichment primers
  • RS-MEP are of the formula H-N-ST described above where H is a nucleic acid hexamer (i.e., six nucleotides), N is a random nucleotide and ST is a Specific Tail Sequence.
  • H is a nucleic acid hexamer (i.e., six nucleotides)
  • N is a random nucleotide
  • ST is a Specific Tail Sequence.
  • the hexamers (H) of the RS-MEP are complementary to the hexamers (H) of the FS-MEP.
  • the invention relates to a composition comprising 20 or more nucleic acid sequences, wherein each of the 20 nucleic acid sequences comprises a different hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1-1662, SEQ ID NOs: 1663-2490, or SEQ ID NOs: 2491-3324, provided that at least one nucleic acid sequence does not comprise a hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 3325-4822.
  • the composition comprising 20 or more nucleic acid sequences comprises 200 or more nucleic acid sequences.
  • the composition comprising 20 or more nucleic acid sequences comprises 800 or more nucleic acid sequences.
  • the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid further comprises a tail sequence 5' to the hexamer sequence, wherein the tail sequence is about 10 to about 22 nucleotides in length, wherein the tail sequence is separated from the hexamer sequence by 0 to 10 nucleotides, and wherein each nucleic acid comprises the same tail sequence.
  • the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence has the same length or substantially the same length.
  • the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid has the hexamer sequence in the same position in the nucleic acid.
  • the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence is a primer. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence is DNA, RNA, PNA, LNA, GNA or TNA.
  • the invention relates to a method for designing a primer set for amplification of microbial nucleic acids in an organism comprising: (a) sequencing RNA molecules in the transcriptome of an organism to generate a plurality of RNA sequence reads, (b) identifying a set of high copy redundant sequence reads from the plurality of sequence reads of step (a), wherein each high copy redundant sequence read is a sequence read representing at least about 1% of the total number of sequence reads in the plurality of sequence reads of step (a), (c) identifying a set of highly expressed RNA molecules comprising a sequence having at least about 95% sequence identity to a high copy redundant sequence read, (d) providing a first primer library, wherein each primer comprises a different hexamer sequence; and (e) generating a second primer library by removing primers from the first primer library predicted to anneal to one or more of the RNA molecules in the set of highly expressed RNA molecules identified in step (c), provided that primers predicted to anne
  • any one of steps (b), (c) or (e) is performed by a computer.
  • the set of highly expressed RNA molecules identified in step (c) comprises any one of 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase, or NADH dehydrogenase.
  • the set of highly expressed RNA molecules identified in step (c) comprises one or more oxidative phosphorylation genes.
  • the sequencing of step (a) is unbiased high throughput sequencing.
  • the second primer library comprises 20 or more primers.
  • the second primer library comprises at least 800 primers.
  • the second primer library comprises at least 1600 primers.
  • the method further comprises a step of generating DNA primers having the sequences of the primers of the second primer library.
  • the method further comprises steps of (i) generating cDNA from an RNA test sample using the second primer library and (ii) generating cDNA complementary to the cDNA generated in step (i).
  • the organism is a eukaryote. In certain embodiments, the organism is a human.
  • RNAs are subsequently reverse-transcribed into cDNA and amplified before being sequenced on the 454-Roche sequencing platform. Reverse transcription is conducted using random primers, ensuring an unbiased process in which all RNA molecules present in the sample are represented in the cDNA library. Accordingly, an RNA species that is highly abundant in a sample is expected to generate a higher number of reads. In practice, the 454-Roche sequencing platform generates a high, but limited, number of sequences (typically a few hundred thousand).
  • ribosomal RNA (rRNA) accounts for about 90-95% of the RNA species in total RNA.
  • rRNA: ribosomal RNA
  • the remaining 5-10% of the sequencing reads will identify other host genes and, potentially, the genetic signatures of microbes.
  • the host genetic material constitutes the vast majority of the sequence reads, thus reducing the chance of detecting a microbe in tissue samples.
  • 18S and 28S rRNA, mitochondrial 12S and 16S rRNA and oxidative phosphorylation (OxPhos) genes are known to be highly expressed genes.
  • the oxidative phosphorylation genes include, but are not limited to, ATP synthase, cytochrome, fumarate reductase, NADH dehydrogenase, and polyphosphate kinase genes.
  • The NSR primer collection was selected from in silico analyses of ribosomal RNA sequences present in public databases. All possible hexamer sequences with perfect sequence matches to human mitochondrial (12S, 16S) and cytoplasmic (18S, 28S) rRNA sequences were removed from the initial set of random hexamer primers (4096 primers) (1).
  • the NSR primer collection comprises 749 hexamers providing unbiased cDNA libraries.
  • the method for designing primers described herein is based, in part, on analyses of host sequences detected in unbiased high throughput sequencing experiments rather than in silico analyses of sequences in public databases.
  • the advantages of the methods described herein include: (1) a more uniform coverage depth for microbe sequences; (2) depletion of a larger number of host genes; and/or (3) provision of a universal primer kit for a set of closely related organisms.
  • FIG. 2 shows the analysis of a 454 high-throughput sequencing experiment with the random primer set. Coverage depth was determined for all positions in the 28S ribosomal RNA sequence using the 454 reference mapper application. Fig. 2 shows that many reads come from specific regions, while some other regions do not contribute any reads. This pattern was observed repeatedly across many experiments.
  • This invention is based, in part, on the surprising discovery of a strong correlation between the presence of secondary structures in the target RNAs and low coverage depth. It is presumed that strong secondary structures impede binding of random hexamers to the RNA template and the synthesis of the complementary DNA (Fig. 1). Thus, it is postulated that only primers that hybridize to regions of the target gene's RNA that do not form secondary structure are able to generate sequencing reads. These observations allow the number of primers removed from the set of random primers to be minimized. If each primer contains a random hexamer sequence NNNNNN, the set of random primers comprises 4^6 = 4096 primers.
  • The primary structure of an organism's RNA determines which hexamers can bind to it with maximal efficiency. This happens when there is an exact match between the hexamer sequence and the sequence of the host RNA to which it binds.
  • the secondary RNA structure analysis of an organism's RNA reveals that not all possible binding sites are of equal importance and that strong secondary structures impede binding of random hexamers to the RNA template.
  • analysis of the 28S rRNA secondary structure suggests that the absence of host 28S rRNA reads in specific regions is correlated with strong secondary structures in those regions.
  • a set of primers obtained using this method is referred to as set of MEP primers.
  • RNA secondary structure prediction can be performed using either M-fold (Matthews et al. (1999), J. Mol. Biol. 288:911-940) or RNA Structure 2.52. M-fold can be accessed through the internet at
  • RNA secondary structure prediction programs include RNAfold, RNAstructure and UNAFold.
  • the MEP primer design is based on analyses of host sequences detected in UHTS experiments rather than in silico analyses of sequences in public databases.
  • the primer removal algorithm in the MEP primer design takes into account not only the primer sequence but also RNA secondary structures. This approach allows the depletion of a larger set of host genes without reducing the complexity of the cDNA library.
  • Microbial Enrichment Primers
  • each MEP primer has the following structure: H-N-ST.
  • each primer in a primer set has the same Specific Tail, but a different Hexamer. In general, all of the primers in the set have the same length or substantially the same length.
  • H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 1.
  • ATACGC SEQ ID: 207 | TGACTG SEQ ID: 761 | GGTAGG SEQ ID: 1315
  • ATACGT SEQ ID: 209 | TGATAC SEQ ID: 763 | GGTATG SEQ ID: 1317
  • ATATAC SEQ ID: 212 | TGCAGG SEQ ID: 766 | GGTCTA SEQ ID: 1320
  • ATCACC SEQ ID: 216 | TGCCAA SEQ ID: 770 | GGTTAT SEQ ID: 1324
  • ATCTAC SEQ ID: 222 | TGCGCA SEQ ID: 776 | GTACCA SEQ ID: 1330
  • ATCTCA SEQ ID: 223 | TGCTGG SEQ ID: 777 | GTACCT SEQ ID: 1331
  • ATGCAA SEQ ID: 229 | TGGAGC SEQ ID: 783 | GTAGGG SEQ ID: 1337
  • ATGTCA SEQ ID: 239 | TGTAAC SEQ ID: 793 | GTATTT SEQ ID: 1347
  • ATGTGC SEQ ID: 241 | TGTAGT SEQ ID: 795 | GTCACT SEQ ID: 1349
  • ATTATC SEQ ID: 244 | TGTCGA SEQ ID: 798
  • CTCTGT SEQ ID: 455 | [illegible] SEQ ID: 1009 | TGGCAG SEQ ID: 1563
  • CTGCAA SEQ ID: 458 | CAACGT SEQ ID: 1012 | TGGTAC SEQ ID: 1566
  • CTGCAG SEQ ID: 459 | CAACTG SEQ ID: 1013 | TGGTCT SEQ ID: 1567
  • CTGCCA SEQ ID: 460 | CAAGGC SEQ ID: 1014 | TGTAAG SEQ ID: 1568
  • CTGGCC SEQ ID: 461 | CAATAC SEQ ID: 1015 | TGTACC SEQ ID: 1569
  • CTGGGA SEQ ID: 462 | CAATCG SEQ ID: 1016 | TGTAGT SEQ ID: 1570
  • CTGTAA SEQ ID: 463 | CAATTC SEQ ID: 1017 | TGTATA SEQ ID: 1571
  • CTGTAC SEQ ID: 464 | CACATT SEQ ID: 1018 | TGTATC SEQ ID: 1572
  • CTGTCG SEQ ID: 466 | CACCGA SEQ ID: 1020 | TGTCAC SEQ ID: 1574
  • CTGTGA SEQ ID: 467 | CACCGT SEQ ID: 1021 | TGTCAT SEQ ID: 1575
  • CTTGTA SEQ ID: 470 | CACGCT SEQ ID: 1024 | TGTCTA SEQ ID: 1578
  • GAAGCC SEQ ID: 478 | CAGAGT SEQ ID: 1032 | TGTGTG SEQ ID: 1586
  • GACTCG SEQ ID: 500 | CATGTA SEQ ID: 1054 | TTATGT SEQ ID: 1608
  • GACTGT SEQ ID: 502 | CATTAG SEQ ID: 1056 | TTCACG SEQ ID: 1610
  • GAGACA SEQ ID: 503 | CATTGA SEQ ID: 1057 | TTCACT SEQ ID: 1611
  • GCCCAA SEQ ID: 554 | CGGGAC SEQ ID: 1108 | TTTTTT SEQ ID: 1662
  • H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 2.
  • the hexamers in Table 2 can be used for forward sense primers, for example in the FS-MEP primers.
  • H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 3.
  • the hexamers in Table 3 can be used for reverse sense primers, for example in the RS-MEP primers.
  • H comprises more than six nucleotides. In some embodiments, H comprises fewer than six nucleotides. In some embodiments, N is 1, 2, or 3 random nucleotides.
  • ST is a specific tail sequence of 12 to 22 nucleotides. In some embodiments, ST is a specific tail sequence selected from the group consisting of 12 to 15, 13 to 16, 14 to 17, 15 to 18, 16 to 19, 17 to 20, 18 to 21, and 19 to 22 nucleotides. In some embodiments, ST is a specific tail sequence selected from the group consisting of 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, and 22 nucleotides. In some embodiments, ST is a specific tail sequence of 17 nucleotides.
  • the present disclosure relates to nucleic acid sequences having the formula X-H-N-ST-Y, where H is a nucleic acid hexamer; N is 1, 2, or 3 random nucleotides; X and Y are 0 to 3 random nucleotides; and ST is a specific tail sequence of 12 to 22 nucleotides.
  • H is more than six nucleotides. In other embodiments, H is fewer than six nucleotides.
  • N is 1, 2, or 3 random nucleotides.
  • X is 0, 1, 2, or 3 random nucleotides.
  • Y is 0, 1, 2, or 3 random nucleotides.
  • X and Y are the same number of random nucleotides. In some embodiments, X and Y are both 0 random nucleotides.
  • all primers in a set or in composition have substantially the same length. In some embodiments, all primers in a set or in a composition have the same length. In some embodiments, all primers in a set have the same specific tail.
  • a primer set described herein can comprise anywhere between 20-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, or 1900-2000 primers, such that each primer contains a different hexamer sequence.
  • FP: forward primers
  • RP: reverse primers
  • the number of primers in a MEP primer set will vary based on which genes are highly expressed in the organism and on the desired sensitivity of the high-throughput sequencing experiment. In general, a larger primer library is expected to increase the sensitivity of microbe detection.
  • a set of forward strand primers is a set of 800-900 primers.
  • a set of reverse strand primers is a set of 800-900 primers.
  • the Hexamer is a hexamer selected from Table 1.
  • a primer set includes at least one primer in which the hexamer sequence is not a sequence selected from Table 4.
  • a set of forward strand primers includes from 20-100, 100-200, 200-400, 400-600, 600-800, or 800-10000 primers in which at least one hexamer sequence is not a sequence selected from the hexamer sequences in Table 4.
  • a set of reverse strand primers includes from 20-100, 100-200, 200-400, 400-600, 600-800, or 800-10000 primers in which at least one hexamer sequence is not a sequence selected from the hexamer sequences in Table 5.
  • a set of forward strand (FS-MEP) primers includes primers comprising a hexamer (H) sequence selected from the group consisting of hexamers in Table 2.
  • a set of reverse strand (RS-MEP) primers includes primers comprising a hexamer sequence selected from the group consisting of hexamers in Table 3.
  • H is a hexamer sequence selected from Table 2. This embodiment can be useful in a set of forward strand (FS-MEP) primers.
  • FS-MEP: forward strand microbial enrichment primers
  • H is a hexamer selected from Table 3. This embodiment can be useful in a set of reverse strand (RS-MEP) primers.
  • RS-MEP: reverse strand microbial enrichment primers
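The coverage depth defined above (the number of reads that cover a given position) can be illustrated with a small difference-array computation. This is a minimal sketch, assuming each read has already been mapped to the reference and is represented as a 0-based, end-exclusive (start, end) interval; the function name `coverage_depth` and that input format are hypothetical and do not reflect the output of the 454 reference mapper application.

```python
def coverage_depth(ref_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Count, for every reference position, how many mapped reads cover it.

    Hypothetical input format: each alignment is a (start, end) interval,
    0-based with an exclusive end, already placed on the reference.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(ref_length, end)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum over the difference array gives the per-position depth.
    depth = []
    running = 0
    for pos in range(ref_length):
        running += diff[pos]
        depth.append(running)
    return depth


if __name__ == "__main__":
    # Three hypothetical reads on a 10-nt reference.
    print(coverage_depth(10, [(0, 5), (3, 8), (7, 10)]))
```

For the three toy reads this prints `[1, 1, 1, 2, 2, 1, 1, 2, 1, 1]`. Positions whose depth stays at 0 correspond to regions that contribute no reads, the pattern seen for parts of the 28S rRNA in FIG. 2.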
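Step (b) of the primer design method described above keeps only sequence reads that each represent at least about 1% of all reads. A minimal sketch follows, assuming that "redundant" means reads with identical sequences, so that copies can be counted exactly; the function name `high_copy_reads` is hypothetical. Step (c), matching these reads to host RNA molecules at about 95% sequence identity, would in practice need an alignment tool and is not shown.

```python
from collections import Counter


def high_copy_reads(reads: list[str], min_fraction: float = 0.01) -> dict[str, int]:
    """Return identical-sequence reads whose copy number is >= min_fraction of all reads.

    Assumption: redundancy is judged by exact sequence identity; the default
    1% threshold follows step (b) of the design method.
    """
    total = len(reads)
    if total == 0:
        return {}
    counts = Counter(reads)  # collapse identical reads into copy numbers
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}


if __name__ == "__main__":
    # 100 hypothetical reads: one dominant host-like sequence and two rare ones.
    reads = ["GGGGCC"] * 96 + ["ACGTAC"] * 3 + ["TTAGCA"]
    print(high_copy_reads(reads, min_fraction=0.05))
```

With a 5% cut-off only the dominant sequence survives (`{'GGGGCC': 96}`); at the default 1% all three sequences qualify, which shows how strongly the threshold controls the size of the "highly expressed" set.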
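The in silico NSR selection described above (removing, from all 4^6 = 4096 hexamers, those with perfect matches to human rRNA) can be sketched as a set difference over 6-mers. This is an illustrative sketch, not the published NSR procedure: it assumes a hexamer primer anneals wherever its reverse complement occurs in the RNA, treats U as T, and uses hypothetical function names.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int = 6) -> list[str]:
    """Every possible k-mer over A/C/G/T (4**k of them; 4096 for hexamers)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def filter_perfect_matches(host_rnas: list[str], k: int = 6) -> list[str]:
    """Drop hexamers that could anneal with a perfect match to any host RNA."""
    host_sites = set()
    for rna in host_rnas:
        seq = rna.upper().replace("U", "T")  # compare RNA and DNA on one alphabet
        host_sites.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumed binding rule: a primer anneals where its reverse complement occurs.
    return [h for h in all_kmers(k) if reverse_complement(h) not in host_sites]


if __name__ == "__main__":
    kept = filter_perfect_matches(["AAAAAAAG"])  # toy stand-in for an rRNA sequence
    print(len(all_kmers()), len(kept))
```

The toy "rRNA" contains two 6-mers, so two hexamers (`TTTTTT` and `CTTTTT`) are removed and 4094 of 4096 remain. Note that this rule ignores secondary structure entirely, which is exactly the point on which the MEP design departs from it.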
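The structure-prediction programs named above (M-fold, RNAfold, RNAstructure, UNAFold) minimize free energy using thermodynamic parameters. Purely to illustrate what "marking which positions are paired" means, the sketch below swaps in the Nussinov base-pair maximization algorithm, a different and much cruder technique with no energies or temperature dependence; it is not what those programs compute, and the function names and the minimum hairpin loop of 3 nucleotides are assumptions.

```python
def nussinov_pairs(seq: str, min_loop: int = 3) -> list[tuple[int, int]]:
    """Maximize the number of nested base pairs (Watson-Crick plus G-U wobble).

    Toy model only: no stacking energies or temperature dependence.
    """
    s = seq.upper().replace("T", "U")
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(s)
    # dp[i][j] = maximum number of pairs in s[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (s[i], s[j]) in allowed:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Traceback: recover one optimal set of pairs.
    pairs: list[tuple[int, int]] = []
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (s[i], s[j]) in allowed and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


def paired_mask(seq: str, min_loop: int = 3) -> list[bool]:
    """True at positions predicted (by the toy model) to be base-paired."""
    mask = [False] * len(seq)
    for i, j in nussinov_pairs(seq, min_loop):
        mask[i] = mask[j] = True
    return mask
```

For a small hairpin such as `GGGAAAUCCC` the toy model finds three nested pairs, so six of the ten positions are flagged as paired. A per-position mask of this kind is the input a structure-aware primer filter needs; for real rRNA, output from one of the thermodynamic programs would be used instead.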
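The MEP removal rule described above differs from the NSR approach in one respect: a hexamer is removed only if it can bind a highly expressed host RNA at a site that is not locked in secondary structure. The following minimal sketch rests on several assumptions. A per-position paired/unpaired prediction is taken as given (from any structure-prediction tool). A primer is assumed to anneal where its reverse complement occurs. A 6-nt site counts as structured when at least `min_paired` of its positions are paired. That threshold and all names are hypothetical choices, not values from this disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hexamers_to_remove(host_rna: str, paired: list[bool],
                       k: int = 6, min_paired: int = 4) -> set[str]:
    """Hexamers with at least one perfect-match binding site that is accessible."""
    rna = host_rna.upper().replace("U", "T")
    if len(paired) != len(rna):
        raise ValueError("paired mask must have one entry per nucleotide")
    removable = set()
    for i in range(len(rna) - k + 1):
        if sum(paired[i:i + k]) < min_paired:  # site judged accessible (unstructured)
            removable.add(reverse_complement(rna[i:i + k]))
    return removable


def design_library(first_library: list[str],
                   host_rnas: list[tuple[str, list[bool]]]) -> list[str]:
    """Second library = first library minus hexamers that bind accessible host sites."""
    remove: set[str] = set()
    for rna, mask in host_rnas:
        remove |= hexamers_to_remove(rna, mask)
    return [h for h in first_library if h not in remove]


if __name__ == "__main__":
    # Toy host RNA: an unstructured A-run followed by a fully paired C-run.
    host = ("AAAAAACCCCCC", [False] * 6 + [True] * 6)
    print(design_library(["TTTTTT", "GGGGGG", "ACGTAC"], [host]))
```

Here `TTTTTT` is removed because it matches the accessible A-run, while `GGGGGG` is kept even though it perfectly matches the C-run, because that site is predicted to be paired. This retention of primers whose only host sites are structured is what keeps the library larger than a sequence-only filter would.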
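The statement that the hexamers of the RS-MEP are complementary to those of the FS-MEP can be read as reverse complementarity. Under that assumption the exclusion rules line up. An FS-MEP hexamer H binds a host RNA site that reads as the reverse complement of H. A second-strand primer that binds the cDNA copy of a host site must carry the host RNA sequence itself. Therefore, taking the reverse complement of every retained FS hexamer yields RS hexamers that avoid the same host sites. The sketch and its function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, read backwards)."""
    return seq.translate(COMPLEMENT)[::-1]


def rs_hexamers_from_fs(fs_hexamers: list[str]) -> list[str]:
    """Hypothetical derivation of RS-MEP hexamers as reverse complements of FS-MEP hexamers."""
    return [reverse_complement(h) for h in fs_hexamers]


if __name__ == "__main__":
    print(rs_hexamers_from_fs(["ATACGC", "TGACTG"]))
```

For the two Table 1 hexamers used as input this prints `['GCGTAT', 'CAGTCA']`. Because the reverse complement is its own inverse, the mapping is one-to-one and the RS set has the same size as the FS set.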

Abstract

A primer set, referred to as the Microbial Enrichment Primers (MEP), that generates cDNA libraries depleted of ribosomal and mitochondrial sequences is provided. The primer set is useful in a method that enriches for microbial nucleic acids in samples. The method enhances the sensitivity of microbe detection.

Description

MICROBIAL ENRICHMENT PRIMERS
[0001] This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.
[0002] This invention was made with government support under AI070411 and AI057158 awarded by the National Institute of Allergy and Infectious Diseases. The government has certain rights in the invention.
[0003] This application claims the benefit of and priority to U.S. provisional patent application Ser. No. 61/411,142 filed November 8, 2010 and U.S. provisional patent application Ser. No. 61/424,276 filed December 17, 2010, the disclosures of each of which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND OF THE INVENTION
[0004] Microarray analysis and unbiased high throughput sequencing of nucleic acids from tissue specimens are greatly impeded by the presence of a large quantity of host RNA, especially ribosomal RNA (rRNA), which accounts for about 90-95% of the RNA species in total RNA.
[0005] There is a need to devise primer design methods useful for enriching the presence of microbial nucleic acids, and enhancing sensitivity of microarrays and high throughput sequencing analysis. There is also a need to reduce costs associated with the use of such platforms in surveillance, diagnosis and discovery of micro-organisms and other microbes in research and medicine. This invention addresses these needs.
FIELD OF THE INVENTION
[0006] The invention relates to the field of microbe detection. In particular, the invention relates to microbe detection in an organism using transcriptome libraries.
SUMMARY OF THE INVENTION
[0007] It is understood that any of the embodiments described below can be combined in any desired way, and any embodiment or combination of embodiments can be applied to each of the aspects described below.
[0008] In one aspect, the invention provides a composition comprising 20 or more nucleic acid sequences, wherein each of the 20 nucleic acid sequences comprises a different hexamer sequence selected from the group consisting of the hexamer sequences in Table 1, provided that at least one nucleic acid sequence does not comprise a hexamer sequence selected from the group consisting of the hexamer sequences in Table 4.
[0009] In some embodiments, each different hexamer sequence is selected from the group consisting of the hexamer sequences in Table 2. In some embodiments, each different hexamer sequence is selected from the group consisting of the hexamer sequences in Table 3.
[0010] In some embodiments, the composition comprises 200 or more nucleic acid sequences. In some embodiments, the composition comprises 800 or more nucleic acid sequences.
[0011] In some embodiments, each nucleic acid further comprises a tail sequence 5' to the hexamer sequence, wherein the tail sequence is about 10 to about 22 nucleotides in length, wherein the tail sequence is separated from the hexamer sequence by 0 to 10 nucleotides, and wherein each nucleic acid comprises the same tail sequence.
[0012] In some embodiments, each nucleic acid in the composition has the same length or substantially the same length. In some embodiments, each nucleic acid has the hexamer sequence in the same position in the nucleic acid.
[0013] In some embodiments, each nucleic acid sequence is a primer. In some embodiments, each nucleic acid sequence is DNA, RNA, PNA, LNA, GNA or TNA.
[0014] In another aspect, the invention provides a method for designing a primer set for amplification of microbial nucleic acids in an organism comprising: (a) sequencing the transcriptome of an organism to generate a plurality of sequence reads; (b) identifying highly expressed genes of the organism from the plurality of sequence reads generated in step (a); (c) providing a first primer library, wherein each primer comprises a different hexamer sequence; and (d) removing primers from the first primer library that are predicted to anneal to the RNA of the organism's highly expressed genes to generate a second primer library, provided that primers predicted to anneal to regions of the RNA predicted to form secondary structure are not removed from the first primer library.
[0015] In some embodiments, primers comprising hexamer sequences with perfect sequence matches to the regions giving a substantial number of reads are removed from the first primer library. In some embodiments, a substantial number of reads is more than 1% of the relative coverage depth.
[0016] In some embodiments, steps (b) or (d) are performed by a computer.
[0017] In some embodiments, the organism's highly expressed genes comprise 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase. In some embodiments, the organism's highly expressed genes further comprise one or more additional oxidative phosphorylation genes.
[0018] In some embodiments, the transcriptome of the organism is sequenced using unbiased high throughput sequencing.
[0019] In some embodiments, the second primer library comprises 20 or more primers. In some embodiments, the second primer library comprises 800 or more primers. In some embodiments, the second primer library comprises 1600 or more primers.
[0020] In some embodiments, the method further comprises separating the primers in the second primer library into two primer sets.
[0021] In some embodiments, one primer set is used to generate a first cDNA strand from total RNA and the other primer set is used to generate a second cDNA strand from the first cDNA strand.
[0022] In some embodiments, the organism is a eukaryote. In some embodiments, the organism is a human.
[0023] In some embodiments, the method further comprises producing the second primer library.
[0024] In another aspect, the invention provides a method for amplifying a microbial nucleic acid comprising: (a) providing a sample from an organism; (b) isolating total RNA from the sample; (c) reverse-transcribing the total RNA using a set of forward strand primers to provide a first cDNA strand, wherein the forward strand primers are designed not to amplify the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts, and wherein the forward strand primers comprise primers complementary to regions of the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts predicted to form secondary RNA structure; (d) replicating the first cDNA strand using a set of reverse strand primers to provide a second cDNA strand, wherein the reverse strand primers are designed not to amplify the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts, and wherein the reverse strand primers comprise primers complementary to regions of the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts predicted to form secondary RNA structure; (e) allowing the first and the second cDNA strands to anneal to provide double-stranded cDNA; and (f) amplifying the double-stranded cDNA by polymerase chain reaction using double-stranded cDNA primers.
[0025] In some embodiments, the forward strand primers and the reverse strand primers are further designed not to amplify the RNA transcripts of one or more additional oxidative phosphorylation genes.
[0026] In some embodiments, the forward strand primers and the reverse strand primers each comprise a different hexamer sequence. In some embodiments, the forward strand primers and the reverse strand primers each comprise the same specific tail sequence.
[0027] In some embodiments, at least one of the double-strand cDNA primers is complementary to the specific tail sequence or a portion thereof.
[0028] In some embodiments, each forward strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences in Table 2. In some embodiments, each reverse strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences in Table 3.
[0029] In some embodiments, the method further comprises determining whether the organism is infected with a microbe.
[0030] In another aspect, the invention provides a composition comprising 800 or more nucleic acids, wherein each nucleic acid has the structure H-(N)a-ST, wherein H is a nucleotide sequence of 5 to 7 nucleotides; N is a random nucleotide and (N)a is a run of a such nucleotides; a is an integer from 0 to 12; and ST is a nucleotide sequence of 10 to 22 nucleotides, provided that each nucleic acid in the composition has a different sequence H.
[0031] In some embodiments, H is a nucleotide sequence of 6 nucleotides. In some embodiments, in at least one nucleic acid, H is not selected from the group consisting of the hexamer sequences in Table 4.
[0032] In some embodiments, H is selected from the group consisting of the hexamer sequences in Table 2. In some embodiments, H is selected from the group consisting of the hexamer sequences in Table 3.
[0033] In some embodiments, a is 1. In some embodiments, the composition comprises at least 1600 nucleic acids. In some embodiments, ST is a nucleotide sequence of 17 nucleotides.
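As a minimal Python sketch of the primer structure described in [0030], the functions below assemble primer order strings from a hexamer H, a spacer of a degenerate N positions and a shared tail ST. The sketch assumes, following [0011], that the tail lies 5' of the hexamer, so each string is written 5'→3' as ST, then the N positions, then H; it also restricts H to the six-nucleotide case of [0031]. The function names and the 17-nt tail in the usage note are hypothetical placeholders, not sequences from the disclosure.

```python
from typing import List, Sequence

BASES = set("ACGT")


def build_primer(hexamer: str, tail: str, a: int = 1) -> str:
    """Write one primer 5'->3': tail, then `a` degenerate N positions, then hexamer."""
    if len(hexamer) != 6 or not set(hexamer) <= BASES:
        raise ValueError("H must be six bases from A/C/G/T")
    if not 10 <= len(tail) <= 22:
        raise ValueError("ST must be 10 to 22 nucleotides")
    if not 0 <= a <= 12:
        raise ValueError("a must be an integer from 0 to 12")
    # "N" is the IUPAC code for any base, i.e. a random position at synthesis.
    return tail + "N" * a + hexamer


def build_library(hexamers: Sequence[str], tail: str, a: int = 1) -> List[str]:
    """One primer per hexamer; all primers share the tail but differ in H."""
    if len(set(hexamers)) != len(hexamers):
        raise ValueError("each nucleic acid must have a different sequence H")
    return [build_primer(h, tail, a) for h in hexamers]
```

For example, with the placeholder tail TAIL = "ACGTACGTACGTACGTA", build_primer("ATACAT", TAIL) returns "ACGTACGTACGTACGTANATACAT" (24 nt for a = 1). Because every primer shares the same ST and a, all primers in the library have the same length, consistent with [0012].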
[0034] In another aspect, the invention provides a kit comprising a composition of any of the aspects and embodiments described above and instructions for use.
BRIEF DESCRIPTION OF THE DRAWINGS
[0035] Fig. 1 is a schematic of an RNA secondary structure impeding cDNA synthesis;
[0036] Fig. 2 is an illustration of the coverage depth for the 28S rRNA gene by unbiased 454 high-throughput sequencing;
[0037] Fig. 3A is an illustration of the first strand cDNA synthesis using forward strand microbial enrichment (FS-MEP) primers;
[0038] Fig. 3B is an illustration of the second strand cDNA synthesis using reverse strand microbial enrichment (RS-MEP) primers;
[0039] Fig. 3C is an illustration of the PCR amplification of double stranded cDNA generated with the FS-MEP and RS-MEP primers;
[0040] Figs. 4A and 4B are illustrations explaining coverage depth;
[0041] Fig. 5 is an illustration of the predicted secondary structure of the human 28S ribosomal RNA sequence at 65 °C using the RNAfold program from the Vienna-RNA package;
[0042] Figs. 6A-6D are illustrations of the relative coverage depth for human host genes 12S, 16S, 18S and 28S rRNA in 45 UHTS experiments using random primers;
[0043] Figs. 7A-7D are illustrations of the raw coverage depths for human host genes 12S, 16S, 18S and 28S rRNA in 45 UHTS experiments using random primers;
[0044] Fig. 8 is a schematic of microbe RNA detection and amplification using the microbial enrichment primer procedure; and
[0045] Figs. 9A and 9B are illustrations of the raw coverage depth of Lujo virus for its S and L segments, respectively, using the MEP primers and using random primers.
DETAILED DESCRIPTION OF THE INVENTION
[0046] The patent and scientific literature referred to herein establishes knowledge that is available to those of skill in the art. The issued U.S. patents, allowed applications, published foreign applications, and references, including GenBank database sequences, that are cited herein are hereby incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference.
Definitions and Abbreviations
[0047] As used herein, the recitation of a numerical range for a variable is intended to convey that the invention may be practiced with the variable equal to any of the values within that range. Thus, for a variable which is inherently discrete, the variable can be equal to any integer value within the numerical range, including the end-points of the range.
Similarly, for a variable which is inherently continuous, the variable can be equal to any real value within the numerical range, including the end-points of the range. As an example, and without limitation, a variable which is described as having values between 0 and 2 can take the values 0, 1 or 2 if the variable is inherently discrete, and can take the values 0.0, 0.1, 0.01, 0.001, or any other real values ≥ 0 and ≤ 2 if the variable is inherently continuous.
[0048] As used herein, unless specifically indicated otherwise, the word "or" is used in the inclusive sense of "and/or" and not the exclusive sense of "either/or."
[0049] The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise.
[0050] The term "about" is used herein to mean approximately, in the region of, roughly, or around. When the term "about" is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term "about" is used herein to modify a numerical value above and below the stated value by a variance of 20%.
[0051] The term "microbe" as used herein refers to bacteria, viruses, fungi, parasites, and other infectious agents that are capable of infecting a host. As used herein, the term "host" or "organism" refers to any selected source that is infected or could potentially be infected with a microbe. Non-limiting examples of organisms include vertebrates, invertebrates, mammals, humans, dogs, cats, cattle, pigs, sheep, rabbits, mice, rats, birds, reptiles, amphibians, fish, insects, plants, tissue cultures, and cell cultures. As used herein, the term "sample" refers to material obtained from an organism. Non-limiting examples of samples include tissue, body fluids, blood, saliva, sperm and cells.
[0052] The coverage depth of a sequence region is the number of sequence reads which contain this region (i.e., they "cover" it). It does not have units as it is a count of reads. It is measured using the 454 reference mapper application. To increase the sensitivity of detection of a microbe infection, it is desirable to increase coverage depth (number of reads) for microbial sequences and to decrease it for the common host sequences.
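The 454 reference mapper reports coverage depth directly; to make the definition concrete, the minimal Python sketch below recomputes it from read alignment coordinates with a difference array. It assumes each mapped read is given as a 0-based, half-open (start, end) interval on the reference, which is a hypothetical input format chosen for illustration.

```python
from typing import Iterable, List, Tuple


def coverage_depth(ref_len: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Number of reads covering each reference position (a unitless count)."""
    diff = [0] * (ref_len + 1)
    for start, end in alignments:
        # Clip each read to the reference and skip empty intervals.
        start, end = max(0, start), min(ref_len, end)
        if start < end:
            diff[start] += 1  # read begins covering here
            diff[end] -= 1    # read stops covering here
    depth, running = [], 0
    for i in range(ref_len):
        running += diff[i]
        depth.append(running)
    return depth
```

For a 10-nt reference and reads at (0, 4), (2, 6) and (3, 4), coverage_depth returns [1, 1, 2, 3, 1, 1, 0, 0, 0, 0]: positions 6 to 9 are covered by no read, which is the kind of gap Fig. 2 shows across parts of 28S rRNA.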
[0053] In certain aspects, the present disclosure is directed to nucleic acid sequences as described herein for amplification of microbial nucleic acids from samples. Amplification can be performed by any suitable method known in the art, for example, but not limited to, polymerase chain reaction ("PCR"), real-time polymerase chain reaction ("RT-PCR"), and transcription mediated amplification ("TMA"). Nucleic acid amplification methods are disclosed in detail by Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. ("Sambrook"), which is incorporated herein by reference.
[0054] UHTS refers to Unbiased High Throughput Sequencing.
[0055] As used herein, "MEP primers" or "Microbial Enrichment Primers" or "microbe enrichment primers" refers to a set of primers designed or obtained using the methods described herein, which take into account the highly expressed host genes and the secondary structure of the RNA transcripts to be amplified.
[0056] As used herein, the phrases "forward strand microbial enrichment primers" and "FS-MEP" refer to microbial enrichment primers or a set of microbial enrichment primers used to generate a first cDNA strand from the total RNA. In some embodiments, FS-MEP are of the formula H-N-ST described above, where H is a nucleic acid hexamer (i.e., six nucleotides), N is a random nucleotide, and ST is a Specific Tail Sequence. The forward strand microbial enrichment primers are designed not to amplify transcripts from the highly expressed genes of an organism and include primers that are complementary to regions of the organism's RNA predicted to form secondary structure.
[0057] As used herein, the phrases "reverse strand microbial enrichment primers" and "RS-MEP" refer to microbial enrichment primers or a set of microbial enrichment primers used to generate a second cDNA strand that is complementary to the first cDNA strand generated using FS-MEP. In some embodiments, RS-MEP are of the formula H-N-ST described above where H is a nucleic acid hexamer (i.e., six nucleotides), N is a random nucleotide and ST is a Specific Tail Sequence. In some embodiments, the hexamers (H) of the RS-MEP are complementary to the hexamers (H) of the FS-MEP.
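Deriving RS-MEP hexamers from FS-MEP hexamers can be sketched in Python as below. This assumes that "complementary" means the reverse complement, i.e. the sequence that pairs antiparallel with the FS-MEP hexamer; the helper names are hypothetical.

```python
from typing import List, Sequence

# Translation table for Watson-Crick base complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the order."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def rs_hexamers_from_fs(fs_hexamers: Sequence[str]) -> List[str]:
    """Hypothetical helper: RS-MEP hexamers as reverse complements of FS-MEP ones."""
    return [reverse_complement(h) for h in fs_hexamers]
```

For example, ATACAT maps to ATGTAT. A palindromic hexamer such as GTCGAC is its own reverse complement, so under this assumption it would appear in both primer sets.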
[0058] In one aspect, the invention relates to a composition comprising 20 or more nucleic acid sequences, wherein each of the 20 nucleic acid sequences comprises a different hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1-1662, SEQ ID NOs: 1663-2490, or SEQ ID NOs: 2491-3324, provided that at least one nucleic acid sequence does not comprise a hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 3325-4822. In certain embodiments, the composition comprising 20 or more nucleic acid sequences comprises 200 or more nucleic acid sequences. In certain embodiments, the composition comprising 20 or more nucleic acid sequences comprises 800 or more nucleic acid sequences. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid further comprises a tail sequence 5' to the hexamer sequence, wherein the tail sequence is about 10 to about 22 nucleotides in length, wherein the tail sequence is separated from the hexamer sequence by 0 to 10 nucleotides, and wherein each nucleic acid comprises the same tail sequence. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence has the same length or substantially the same length. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid has the hexamer sequence in the same position in the nucleic acid. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence is a primer. In certain embodiments, the composition comprising 20 or more nucleic acid sequences is a composition wherein each nucleic acid sequence is DNA, RNA, PNA, LNA, GNA or TNA.
[0059] In one aspect, the invention relates to a method for designing a primer set for amplification of microbial nucleic acids in an organism comprising: (a) sequencing RNA molecules in the transcriptome of an organism to generate a plurality of RNA sequence reads; (b) identifying a set of high copy redundant sequence reads from the plurality of sequence reads of step (a), wherein each high copy redundant sequence read is a sequence read representing at least about 1% of the total number of sequence reads in the plurality of sequence reads of step (a); (c) identifying a set of highly expressed RNA molecules comprising a sequence having at least about 95% sequence identity to a high copy redundant sequence read; (d) providing a first primer library, wherein each primer comprises a different hexamer sequence; and (e) generating a second primer library by removing from the first primer library primers predicted to anneal to one or more of the RNA molecules in the set of highly expressed RNA molecules identified in step (c), provided that primers predicted to anneal to a region of an RNA predicted to have a strong secondary structure capable of impeding binding of a hexamer to the RNA are not removed from the first primer library. In certain embodiments, any one of steps (b), (c) or (e) is performed by a computer. In certain embodiments, the set of highly expressed RNA molecules identified in step (c) comprises any one of 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase, or NADH dehydrogenase. In certain embodiments, the set of highly expressed RNA molecules identified in step (c) comprises one or more oxidative phosphorylation genes. In certain embodiments, the sequencing of step (a) is unbiased high throughput sequencing. In certain embodiments, the second primer library comprises 20 or more primers. In certain embodiments, the second primer library comprises at least 800 primers. In certain embodiments, the second primer library comprises at least 1600 primers. In certain embodiments, the method further comprises a step of generating DNA primers having the sequences of the primers of the second primer library. In certain embodiments, the method further comprises steps of (i) generating cDNA from an RNA test sample using the second primer library and (ii) generating cDNA complementary to the cDNA generated in step (i). In certain embodiments, the organism is a eukaryote. In certain embodiments, the organism is a human.
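Step (b) of this design method, identifying high copy redundant reads, might be sketched in Python as follows. This is a simplification under stated assumptions: reads are collapsed only when they are identical strings, whereas real 454 reads carry sequencing errors and would need similarity clustering (for example at the roughly 95% identity named in step (c)). The function name and the 1% default are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence


def high_copy_reads(reads: Sequence[str], min_fraction: float = 0.01) -> Dict[str, int]:
    """Reads whose identical copies make up at least `min_fraction` of all reads."""
    total = len(reads)
    if total == 0:
        return {}
    counts = Counter(reads)  # exact-match collapsing only (simplifying assumption)
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}
```

In a run of 204 reads where one sequence occurs 3 times, another once, and the rest are unique, only the sequence seen 3 times (about 1.5% of reads) passes the 1% threshold.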
Identification of Microbes in Tissue Samples
[0060] Microbes in tissue samples can be identified by detecting their genetic signatures against the genetic background of a host. Traditionally, tissue samples are processed to recover genetic information as ribonucleic acid (RNA). The RNAs are then reverse-transcribed into cDNA and amplified before being sequenced on the 454-Roche sequencing platform. Reverse transcription is conducted using random primers to keep the process unbiased, which ensures that all RNA molecules present in the sample are represented in the cDNA library. Consequently, an RNA species that is highly abundant in a sample is expected to generate a higher number of reads. In practice, the 454-Roche sequencing platform generates a high but limited number of sequences (typically a few hundred thousand).
Interference of Host Sequences with Microbe Detection
[0061] Identification of microbial nucleic acids in tissue specimens by UHTS is greatly impeded by the presence of a large quantity of host RNA, especially ribosomal RNA (rRNA), which accounts for about 90-95% of the RNA species in total RNA. Thus, 90 to 95% of the sequence reads will derive from rRNA, and the remaining 5-10% will identify other host genes and, potentially, genetic signatures of microbes. Host genetic material therefore constitutes the vast majority of the sequence reads, reducing the chance of detecting a microbe in tissue samples.
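The read-budget argument above is simple arithmetic. Assuming a hypothetical run of 300,000 reads (within the few hundred thousand reads a 454 run typically yields), the small Python sketch below splits it into rRNA-derived reads and all remaining reads; the function name is illustrative.

```python
from typing import Tuple


def expected_read_split(total_reads: int, rrna_percent: int) -> Tuple[int, int]:
    """Split a run into rRNA-derived reads and every other read (integer arithmetic)."""
    rrna = total_reads * rrna_percent // 100
    return rrna, total_reads - rrna
```

At 90% rRNA this gives (270000, 30000) and at 95% it gives (285000, 15000), so only 15,000 to 30,000 reads remain to cover all other host transcripts and any microbe.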
[0062] 18S and 28S rRNA, mitochondrial 12S and 16S rRNA and oxidative phosphorylation (OxPhos) genes are known to be highly expressed genes. The oxidative phosphorylation genes include, but are not limited to, ATP synthase, cytochrome, fumarate reductase, NADH dehydrogenase, and polyphosphate kinase genes.
[0063] Armour et al. recently developed a procedure that allows generation of rRNA-depleted cDNA libraries using a selective collection of hexamers (1). The 'not-so-random' (NSR) primer collection was selected from in silico analyses of ribosomal RNA sequences present in public databases. All possible hexamer sequences with perfect sequence matches to human mitochondrial (12S, 16S) and ribosomal (18S, 28S) rRNA sequences were removed from the initial set of random hexamer primers (4096 primers) (1). The NSR primer collection comprises 749 hexamers providing unbiased cDNA libraries.
[0064] Despite the proffered efficiency of the method of Armour (1), additional highly expressed genes (in particular, mitochondrial OxPhos genes) contribute to the large number of host sequence reads identified in UHTS experiments. However, removing additional primers from the NSR primer collection to deplete additional highly expressed host genes would significantly decrease the number of NSR primers and the library complexity, adversely affecting the unbiased process and reducing the chance of identifying an unknown microbe.
[0065] In contrast with the method of Armour et al., the method for designing primers described herein is based, in part, on analyses of host sequences detected in unbiased high throughput sequencing experiments rather than on in silico analyses of sequences in public databases.
[0066] The advantages of the methods described herein include: (1) a more uniform coverage depth for microbe sequences; (2) depletion of a larger number of host genes; and/or (3) provision of a universal primer kit for a set of closely related organisms.
[0067] An analysis of 45 UHTS experiments (i.e., using random primers) generated with the 454-Roche sequencing platform revealed non-uniform coverage depth of host genes, such as ribosomal (18S rRNA and 28S rRNA) and mitochondrial (12S rRNA, 16S rRNA, ATP synthase, NADH dehydrogenase) genes. Fig. 2 shows the analysis of a 454 high-throughput sequencing experiment with the random primer set. Coverage depth was determined for all positions in the 28S ribosomal RNA sequence using the 454 reference mapper application. Fig. 2 shows that many reads come from specific regions, while other regions contribute no reads. This pattern was observed repeatedly across many experiments.
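The sequence-only filtering principle behind an NSR-style collection can be sketched in Python as below. This is not Armour's actual pipeline: it assumes that a "perfect sequence match" means the primer would anneal, i.e. the primer's reverse complement occurs in the rRNA, and that rRNA sequences are supplied as plain strings (U is read as T). Function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def all_hexamers() -> List[str]:
    """All 4**6 = 4096 hexamers over A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def hexamers_in(seqs: Iterable[str]) -> Set[str]:
    """Every 6-mer occurring in the given RNA/DNA sequences (U read as T)."""
    found: Set[str] = set()
    for s in seqs:
        s = s.upper().replace("U", "T")
        for i in range(len(s) - 5):
            found.add(s[i:i + 6])
    return found


def nsr_like_filter(rrna_seqs: Iterable[str]) -> List[str]:
    """Keep hexamers with no perfectly complementary site in any rRNA sequence."""
    sites = hexamers_in(rrna_seqs)
    return [h for h in all_hexamers() if reverse_complement(h) not in sites]
```

Because every site counts regardless of how accessible it is, full-length rRNA inputs remove a large share of the 4096 hexamers; this is the loss of library complexity the next paragraph describes.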
Secondary RNA Structure Impact on Primer Selection
[0068] This invention is based, in part, on the surprising discovery of a strong correlation between the presence of secondary structures in the target RNAs and low coverage depth. It is presumed that strong secondary structures impede binding of random hexamers to the RNA template and thus the synthesis of complementary DNA (Fig. 1). Accordingly, it is postulated that only primers that hybridize to regions of the target RNA that do not form secondary structure are able to generate sequencing reads. These observations allow the number of primers removed from the set of random primers to be minimized. If each primer contains a random hexamer sequence NNNNNN, the set of random primers comprises 4^6 = 4096 primers.
[0069] The primary structure of an organism's RNA determines which hexamers can bind to it with maximal efficiency. This happens when there is an exact match between the hexamer sequence and the sequence of the host RNA to which it binds. In one embodiment, this occurs when there is exact complementarity between the hexamer sequence and the sequence of the host RNA to which it binds. However, secondary structure analysis of an organism's RNA reveals that not all possible binding sites are of equal importance and that strong secondary structures impede binding of random hexamers to the RNA template. For example, in silico analyses of 28S rRNA secondary structure suggest that the absence of host 28S rRNA reads in specific regions is correlated with strong secondary structures in those regions. Based on these findings, only those primers that can anneal to the host ribosomal RNA (12S, 16S, 18S, 28S) and mitochondrial RNA regions identified in the UHTS results were excluded from the initial primer library. A set of primers obtained using this method is referred to as a set of MEP primers.
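The coverage-guided exclusion described above can be sketched in Python as follows: start from all 4096 hexamers and drop only those that would anneal to a host window that actually produced reads. Two assumptions are labeled loudly because the text does not define them: "relative coverage depth" is taken here as a window's minimum depth divided by the transcript's peak depth, and a window counts as covered when that ratio exceeds 1% (after [0015]). Function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def all_hexamers() -> List[str]:
    return ["".join(p) for p in product("ACGT", repeat=6)]


def covered_site_primers(rna: str, depth: Sequence[int],
                         min_relative_depth: float = 0.01) -> Set[str]:
    """Hexamer primers that would anneal to a well-covered 6-nt window of `rna`."""
    seq = rna.upper().replace("U", "T")
    if len(depth) != len(seq):
        raise ValueError("need one coverage value per base")
    peak = max(depth, default=0) or 1  # avoid division by zero
    hits: Set[str] = set()
    for i in range(len(seq) - 5):
        # Assumed definition: window minimum relative to the transcript's peak depth.
        if min(depth[i:i + 6]) / peak > min_relative_depth:
            hits.add(reverse_complement(seq[i:i + 6]))
    return hits


def mep_like_library(transcripts: Sequence[Tuple[str, Sequence[int]]],
                     min_relative_depth: float = 0.01) -> List[str]:
    """All hexamers minus those hitting covered windows of any host transcript."""
    remove: Set[str] = set()
    for rna, depth in transcripts:
        remove |= covered_site_primers(rna, depth, min_relative_depth)
    return [h for h in all_hexamers() if h not in remove]
```

With a toy transcript AAAAAACCCCCC whose first six bases have depth 10 and last six have depth 0, only TTTTTT is removed; GGGGGG, whose only site lies in the uncovered region, stays in the library. Hexamers whose sites fall only in read-free (presumably structured) regions are spared, which is what keeps the library larger than a sequence-only filter would.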
[0070] Many methods can be used to determine the secondary structure of RNA. A number of algorithms predict RNA secondary structures based on thermodynamic parameters and energy calculations. For example, RNA secondary structure can be predicted using the RNAfold program from the Vienna-RNA package (Hofacker et al., "Fast Folding and Comparison of RNA Secondary Structures," Monatshefte für Chemie 125, 167-188 (1994)). Alternatively or in addition, RNA secondary structure prediction can be performed using either M-fold (Mathews et al. (1999), J. Mol. Biol. 288:911-940) or RNA Structure 2.52. M-fold can be accessed through the internet at mfold.bioinfo.rpi.edu/download/ (last visited 12/16/2010). Additional software for RNA secondary structure prediction includes programs such as RNAfield, RNAstructure and UNAFold.
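The structure-aware proviso of [0059] (do not remove primers whose sites lie in strongly structured regions) can be sketched in Python from a dot-bracket string of the kind RNAfold prints, where "(" and ")" mark paired bases and "." marks unpaired ones. The threshold is a hypothetical choice: a 6-nt window counts as structured only when at least `min_paired` of its bases are paired (all six by default). A primer is flagged for removal if any of its sites is accessible.

```python
from typing import List, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_mask(dot_bracket: str) -> List[bool]:
    """True where a dot-bracket string marks a base as paired."""
    return [c in "()" for c in dot_bracket]


def accessible_site_primers(rna: str, dot_bracket: str, min_paired: int = 6) -> Set[str]:
    """Hexamer primers with at least one accessible (not strongly paired) site."""
    seq = rna.upper().replace("U", "T")
    if len(seq) != len(dot_bracket):
        raise ValueError("structure and sequence lengths differ")
    mask = paired_mask(dot_bracket)
    hits: Set[str] = set()
    for i in range(len(seq) - 5):
        # Fewer than `min_paired` paired bases: treat the window as accessible.
        if sum(mask[i:i + 6]) < min_paired:
            hits.add(reverse_complement(seq[i:i + 6]))
    return hits
```

For the hairpin GGGGGGAAAACCCCCC with structure ((((((....)))))), the primers CCCCCC and GGGGGG anneal only inside the stem and are therefore not flagged, while TCCCCC, whose site GGGGGA overlaps the loop, is flagged for removal.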
[0071] In contrast to Armour's work, the MEP primer design is based on analyses of host sequences detected in UHTS experiments rather than on in silico analyses of sequences in public databases. The primer removal algorithm in the MEP primer design takes into account not only the primer sequence but also RNA secondary structure. This approach allows depletion of a larger set of host genes without affecting cDNA library complexity.
Microbial Enrichment Primers (MEP)
[0072] In some embodiments, each MEP primer has the following structure:
Hexamer + N + Specific Tail
where Hexamer represents a sequence of 6 nucleotides, N is a spacer of 0 to 10 nucleotides, and Specific Tail is a sequence of 10-22 nucleotides. In some embodiments, each primer in a primer set has the same Specific Tail but a different Hexamer. In general, all of the primers in the set have the same length or substantially the same length.
[0073] In some embodiments, H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 1.
TABLE 1
[Table 1 pages imgf000014 to imgf000018 are available only as images.]
ATACAT  SEQ ID: 204    TGACAA  SEQ ID: 758    GGTACG  SEQ ID: 1312
ATACCG  SEQ ID: 205    TGACGT  SEQ ID: 759    GGTACT  SEQ ID: 1313
ATACCT  SEQ ID: 206    TGACTC  SEQ ID: 760    GGTAGA  SEQ ID: 1314
ATACGC  SEQ ID: 207    TGACTG  SEQ ID: 761    GGTAGG  SEQ ID: 1315
ATACGG  SEQ ID: 208    TGAGAC  SEQ ID: 762    GGTATC  SEQ ID: 1316
ATACGT  SEQ ID: 209    TGATAC  SEQ ID: 763    GGTATG  SEQ ID: 1317
ATAGCA  SEQ ID: 210    TGCAAG  SEQ ID: 764    GGTATT  SEQ ID: 1318
ATAGTC  SEQ ID: 211    TGCACA  SEQ ID: 765    GGTCCA  SEQ ID: 1319
ATATAC  SEQ ID: 212    TGCAGG  SEQ ID: 766    GGTCTA  SEQ ID: 1320
ATATCC  SEQ ID: 213    TGCAGT  SEQ ID: 767    GGTGAT  SEQ ID: 1321
ATATCG  SEQ ID: 214    TGCATA  SEQ ID: 768    GGTGCC  SEQ ID: 1322
ATATTC  SEQ ID: 215    TGCATC  SEQ ID: 769    GGTTAC  SEQ ID: 1323
ATCACC  SEQ ID: 216    TGCCAA  SEQ ID: 770    GGTTAT  SEQ ID: 1324
ATCAGC  SEQ ID: 217    TGCCAT  SEQ ID: 771    GGTTGT  SEQ ID: 1325
ATCATA  SEQ ID: 218    TGCCGC  SEQ ID: 772    GTAATA  SEQ ID: 1326
ATCCGT  SEQ ID: 219    TGCCTA  SEQ ID: 773    GTAATG  SEQ ID: 1327
ATCGAC  SEQ ID: 220    TGCGAA  SEQ ID: 774    GTACAG  SEQ ID: 1328
ATCGCA  SEQ ID: 221    TGCGAC  SEQ ID: 775    GTACAT  SEQ ID: 1329
ATCTAC  SEQ ID: 222    TGCGCA  SEQ ID: 776    GTACCA  SEQ ID: 1330
ATCTCA  SEQ ID: 223    TGCTGG  SEQ ID: 777    GTACCT  SEQ ID: 1331
ATCTGG  SEQ ID: 224    TGGAAA  SEQ ID: 778    GTACGC  SEQ ID: 1332
ATCTTC  SEQ ID: 225    TGGAAG  SEQ ID: 779    GTACGT  SEQ ID: 1333
ATCTTG  SEQ ID: 226    TGGAAT  SEQ ID: 780    GTACTA  SEQ ID: 1334
ATGACA  SEQ ID: 227    TGGACA  SEQ ID: 781    GTAGAC  SEQ ID: 1335
ATGAGG  SEQ ID: 228    TGGACC  SEQ ID: 782    GTAGAT  SEQ ID: 1336
ATGCAA  SEQ ID: 229    TGGAGC  SEQ ID: 783    GTAGGG  SEQ ID: 1337
ATGCAT  SEQ ID: 230    TGGCAG  SEQ ID: 784    GTAGTG  SEQ ID: 1338
ATGCCG  SEQ ID: 231    TGGCAT  SEQ ID: 785    GTATAT  SEQ ID: 1339
ATGCGA  SEQ ID: 232    TGGCGC  SEQ ID: 786    GTATCA  SEQ ID: 1340
ATGCTC  SEQ ID: 233    TGGCGT  SEQ ID: 787    GTATCG  SEQ ID: 1341
ATGCTG  SEQ ID: 234    TGGGAC  SEQ ID: 788    GTATCT  SEQ ID: 1342
ATGGAC  SEQ ID: 235    TGGGAG  SEQ ID: 789    GTATGA  SEQ ID: 1343
ATGGTG  SEQ ID: 236    TGGTAC  SEQ ID: 790    GTATGC  SEQ ID: 1344
ATGTAA  SEQ ID: 237    TGGTAT  SEQ ID: 791    GTATTA  SEQ ID: 1345
ATGTAC  SEQ ID: 238    TGGTCT  SEQ ID: 792    GTATTC  SEQ ID: 1346
ATGTCA  SEQ ID: 239    TGTAAC  SEQ ID: 793    GTATTT  SEQ ID: 1347
ATGTCG  SEQ ID: 240    TGTACT  SEQ ID: 794    GTCACG  SEQ ID: 1348
ATGTGC  SEQ ID: 241    TGTAGT  SEQ ID: 795    GTCACT  SEQ ID: 1349
ATGTTG  SEQ ID: 242    TGTATC  SEQ ID: 796    GTCAGG  SEQ ID: 1350
ATTACG  SEQ ID: 243    TGTCAT  SEQ ID: 797    GTCATG  SEQ ID: 1351
ATTATC  SEQ ID: 244    TGTCGA  SEQ ID: 798    GTCCAC  SEQ ID: 1352
ATTCAA  SEQ ID: 245    TGTCGC  SEQ ID: 799    GTCCCA  SEQ ID: 1353
ATTCAC  SEQ ID: 246    TGTCGG  SEQ ID: 800    GTCCTC  SEQ ID: 1354
ATTCGA  SEQ ID: 247    TGTCGT  SEQ ID: 801    GTCGAC  SEQ ID: 1355
ATTCTC  SEQ ID: 248    TGTCTT  SEQ ID: 802    GTCGAT  SEQ ID: 1356
ATTGAT  SEQ ID: 249    TGTGCA  SEQ ID: 803    GTCGCA  SEQ ID: 1357
ATTGCC  SEQ ID: 250    TGTGCG  SEQ ID: 804    GTCGTT  SEQ ID: 1358
ATTGGA  SEQ ID: 251    TGTGGA  SEQ ID: 805    GTCTAA  SEQ ID: 1359
ATTTCC  SEQ ID: 252    TGTTCG  SEQ ID: 806    GTCTAC  SEQ ID: 1360
ATTTCG  SEQ ID: 253    TGTTGC  SEQ ID: 807    GTCTAG  SEQ ID: 1361
[Table 1 pages imgf000020 to imgf000023 are available only as images.]
CTCTGC SEQ ID: 454 ATTGTC SEQ ID: 1008 TGGAGT SEQ ID: 1562
CTCTGT SEQ ID: 455 SEQ ID: 1009 TGGCAG SEQ ID: 1563
CTGAGA SEQ ID: 456 ATTTTA SEQ ID: 1010 TGGCTA SEQ ID: 1564
CTGAGT SEQ ID: 457 CAACAT SEQ ID: 1011 TGGGAC SEQ ID: 1565
CTGCAA SEQ ID: 458 CAACGT SEQ ID: 1012 TGGTAC SEQ ID: 1566
CTGCAG SEQ ID: 459 CAACTG SEQ ID: 1013 TGGTCT SEQ ID: 1567
CTGCCA SEQ ID: 460 CAAGGC SEQ ID: 1014 TGTAAG SEQ ID: 1568
CTGGCC SEQ ID: 461 CAATAC SEQ ID: 1015 TGTACC SEQ ID: 1569
CTGGGA SEQ ID: 462 CAATCG SEQ ID: 1016 TGTAGT SEQ ID: 1570
CTGTAA SEQ ID: 463 CAATTC SEQ ID: 1017 TGTATA SEQ ID: 1571
CTGTAC SEQ ID: 464 CACATT SEQ ID: 1018 TGTATC SEQ ID: 1572
CTGTAG SEQ ID: 465 CACCAT SEQ ID: 1019 TGTATT SEQ ID: 1573
CTGTCG SEQ ID: 466 CACCGA SEQ ID: 1020 TGTCAC SEQ ID: 1574
CTGTGA SEQ ID: 467 CACCGT SEQ ID: 1021 TGTCAT SEQ ID: 1575
CTTACA SEQ ID: 468 CACGAA SEQ ID: 1022 TGTCGC SEQ ID: 1576
CTTCGT SEQ ID: 469 CACGAC SEQ ID: 1023 TGTCGT SEQ ID: 1577
CTTGTA SEQ ID: 470 CACGCT SEQ ID: 1024 TGTCTA SEQ ID: 1578
GAAAAA SEQ ID: 471 CACGGC SEQ ID: 1025 TGTCTC SEQ ID: 1579
GAAACA SEQ ID: 472 CACGTC SEQ ID: 1026 TGTCTT SEQ ID: 1580
GAAACG SEQ ID: 473 CACGTT SEQ ID: 1027 TGTGAC SEQ ID: 1581
GAAATG SEQ ID: 474 CACTCG SEQ ID: 1028 TGTGCA SEQ ID: 1582
GAACGA SEQ ID: 475 CACTGG SEQ ID: 1029 TGTGCG SEQ ID: 1583
GAACGT SEQ ID: 476 CAGACT SEQ ID: 1030 TGTGGA SEQ ID: 1584
GAACTT SEQ ID: 477 CAGAGA SEQ ID: 1031 TGTGTC SEQ ID: 1585
GAAGCC SEQ ID: 478 CAGAGT SEQ ID: 1032 TGTGTG SEQ ID: 1586
GAAGTC SEQ ID: 479 CAGATG SEQ ID: 1033 TGTTAA SEQ ID: 1587
GAATAC SEQ ID: 480 CAGCAT SEQ ID: 1034 TGTTAC SEQ ID: 1588
GAATCA SEQ ID: 481 CAGCTG SEQ ID: 1035 TGTTAT SEQ ID: 1589
GAATTG SEQ ID: 482 CAGGAA SEQ ID: 1036 TGTTCC SEQ ID: 1590
GACAAC SEQ ID: 483 CAGGCT SEQ ID: 1037 TGTTTC SEQ ID: 1591
GACAAG SEQ ID: 484 CAGGTA SEQ ID: 1038 TGTTTG SEQ ID: 1592
GACAAT SEQ ID: 485 CAGTAG SEQ ID: 1039 TTAACG SEQ ID: 1593
GACACA SEQ ID: 486 CAGTAT SEQ ID: 1040 TTAATG SEQ ID: 1594
GACAGA SEQ ID: 487 CAGTCA SEQ ID: 1041 TTACAG SEQ ID: 1595
GACAGC SEQ ID: 488 CAGTCT SEQ ID: 1042 TTACAT SEQ ID: 1596
GACATA SEQ ID: 489 CAGTGG SEQ ID: 1043 TTACCG SEQ ID: 1597
GACCAC SEQ ID: 490 CAGTGT SEQ ID: 1044 TTACCT SEQ ID: 1598
GACCAG SEQ ID: 491 CATACG SEQ ID: 1045 TTACGT SEQ ID: 1599
GACCAT SEQ ID: 492 CATAGA SEQ ID: 1046 TTACTC SEQ ID: 1600
GACCTT SEQ ID: 493 CATATG SEQ ID: 1047 TTACTG SEQ ID: 1601
GACGAA SEQ ID: 494 CATCCG SEQ ID: 1048 TTAGCG SEQ ID: 1602
GACGAT SEQ ID: 495 CATCGA SEQ ID: 1049 TTAGGC SEQ ID: 1603
GACGCG SEQ ID: 496 CATCGG SEQ ID: 1050 TTAGGG SEQ ID: 1604
GACGGA SEQ ID: 497 CATCGT SEQ ID: 1051 TTATCG SEQ ID: 1605
GACGTA SEQ ID: 498 CATCTC SEQ ID: 1052 TTATCT SEQ ID: 1606
GACGTG SEQ ID: 499 CATGCG SEQ ID: 1053 TTATGC SEQ ID: 1607
GACTCG SEQ ID: 500 CATGTA SEQ ID: 1054 TTATGT SEQ ID: 1608
GACTCT SEQ ID: 501 CATTAC SEQ ID: 1055 TTATTT SEQ ID: 1609
GACTGT SEQ ID: 502 CATTAG SEQ ID: 1056 TTCACG SEQ ID: 1610
GAGACA SEQ ID: 503 CATTGA SEQ ID: 1057 TTCACT SEQ ID: 1611
GCCCAA SEQ ID: 554 CGGGAC SEQ ID: 1108 TTTTTT SEQ ID: 1662
[0074] In some embodiments, H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 2. The hexamers in Table 2 can be used for forward sense primers, for example in the FS-MEP primers.
TABLE 2
SEQ ID: 2304 to SEQ ID: 2353
SEQ ID: 2404 to SEQ ID: 2453
TGGTCT SEQ ID: 2454
TGTAAC SEQ ID: 2455
TGTACT SEQ ID: 2456
TGTAGT SEQ ID: 2457
TGTATC SEQ ID: 2458
TGTCAT SEQ ID: 2459
TGTCGA SEQ ID: 2460
TGTCGC SEQ ID: 2461
TGTCGG SEQ ID: 2462
TGTCGT SEQ ID: 2463
TGTCTT SEQ ID: 2464
TGTGCA SEQ ID: 2465
TGTGCG SEQ ID: 2466
TGTGGA SEQ ID: 2467
TGTTCG SEQ ID: 2468
TGTTGC SEQ ID: 2469
TTAAC SEQ ID: 2470
TTACAT SEQ ID: 2471
TTAGAA SEQ ID: 2472
TTAGAC SEQ ID: 2473
TTATTA SEQ ID: 2474
TTCCGG SEQ ID: 2475
TTCCTG SEQ ID: 2476
TTCGAG SEQ ID: 2477
TTCGCA SEQ ID: 2478
TTCGTG SEQ ID: 2479
TTGAAA SEQ ID: 2480
TTGACG SEQ ID: 2481
TTGCAG SEQ ID: 2482
TTGCCT SEQ ID: 2483
TTGCGA SEQ ID: 2484
TTGCTT SEQ ID: 2485
TTGGAA SEQ ID: 2486
TTGTCG SEQ ID: 2487
TTGTGC SEQ ID: 2488
TTTCCG SEQ ID: 2489
TTTGTC SEQ ID: 2490
[0075] In some embodiments, H is a nucleic acid hexamer (i.e., a hexanucleotide) having a hexanucleotide sequence selected from the sequences in Table 3. The hexamers in Table 3 can be used for reverse sense primers, for example in the RS-MEP primers.
TABLE 3
SEQ ID: 2772 GTTATC SEQ ID: 3050
SEQ ID: 2773 GTTCGG SEQ ID: 3051
SEQ ID: 2774 GTTGCG SEQ ID: 3052
SEQ ID: 2775 GTTGGC SEQ ID: 3053
SEQ ID: 2776 GTTGGG SEQ ID: 3054
SEQ ID: 2777 GTTGTC SEQ ID: 3055
SEQ ID: 2778 GTTGTG SEQ ID: 3056
SEQ ID: 2779 GTTTAT SEQ ID: 3057
SEQ ID: 2780 GTTTCA SEQ ID: 3058
SEQ ID: 2781 GTTTCG SEQ ID: 3059
SEQ ID: 2782 GTTTGC SEQ ID: 3060
SEQ ID: 2783 GTTTTG SEQ ID: 3061
SEQ ID: 2784 GTTTTT SEQ ID: 3062
SEQ ID: 2785 TAAAAT SEQ ID: 3063
SEQ ID: 2786 TAACCG SEQ ID: 3064
SEQ ID: 2787 TAACGT SEQ ID: 3065
SEQ ID: 2788 TAACC SEQ ID: 3066
SEQ ID: 2789 TAAGTT SEQ ID: 3067
SEQ ID: 2790 TAATAA SEQ ID: 3068
SEQ ID: 2791 TAATCT SEQ ID: 3069
SEQ ID: 2792 TAATGC SEQ ID: 3070
SEQ ID: 2793 TAATTG SEQ ID: 3071
SEQ ID: 2794 TACAAG SEQ ID: 3072
SEQ ID: 2795 TACACG SEQ ID: 3073
SEQ ID: 2796 TACAGC SEQ ID: 3074
SEQ ID: 2797 TACAGG SEQ ID: 3075
SEQ ID: 2798 TACAGT SEQ ID: 3076
SEQ ID: 2799 TACATA SEQ ID: 3077
SEQ ID: 2800 TACATC SEQ ID: 3078
SEQ ID: 2801 TACCAA SEQ ID: 3079
SEQ ID: 2802 TACCGA SEQ ID: 3080
SEQ ID: 2803 TACCTG SEQ ID: 3081
SEQ ID: 2804 TACGCT SEQ ID: 3082
SEQ ID: 2805 TACGGC SEQ ID: 3083
SEQ ID: 2806 TACGGG SEQ ID: 3084
SEQ ID: 2807 TACGGT SEQ ID: 3085
SEQ ID: 2808 TACGTA SEQ ID: 3086
SEQ ID: 2809 TACGTC SEQ ID: 3087
SEQ ID: 2810 TACGTT SEQ ID: 3088
SEQ ID: 2811 TACTCG SEQ ID: 3089
SEQ ID: 2812 TACTGC SEQ ID: 3090
SEQ ID: 2813 TACTGT SEQ ID: 3091
SEQ ID: 2814 TACTTA SEQ ID: 3092
SEQ ID: 2815 TAGACG SEQ ID: 3093
SEQ ID: 2816 AGA SEQ ID: 3094
SEQ ID: 2817 TAGCAC SEQ ID: 3095
SEQ ID: 2818 TAGCCG SEQ ID: 3096
SEQ ID: 2819 TAGCGC SEQ ID: 3097
SEQ ID: 2820 TAGCGT SEQ ID: 3098
SEQ ID: 2821 TAGGCA SEQ ID: 3099
TAGGTG SEQ ID: 3100
TAGTGC SEQ ID: 3101
TAGTGT SEQ ID: 3102
TATACG SEQ ID: 3103
TATATC SEQ ID: 3104
TATATG SEQ ID: 3105
TATCGA SEQ ID: 3106
TATCGC SEQ ID: 3107
TATCGG SEQ ID: 3108
TATCGT SEQ ID: 3109
TATCTT SEQ ID: 3110
TATGAT SEQ ID: 3111
TATGCA SEQ ID: 3112
TATGCG SEQ ID: 3113
TATGGA SEQ ID: 3114
TATGGC SEQ ID: 3115
TATGGG SEQ ID: 3116
TATGTA SEQ ID: 3117
TATGTC SEQ ID: 3118
TATGTT SEQ ID: 3119
TATTCG SEQ ID: 3120
TATTGG SEQ ID: 3121
TATTGT SEQ ID: 3122
TATTTG SEQ ID: 3123
TCACAG SEQ ID: 3124
TCACCG SEQ ID: 3125
TCACGC SEQ ID: 3126
TCACGG SEQ ID: 3127
TCACGT SEQ ID: 3128
TCACTT SEQ ID: 3129
TCAGAG SEQ ID: 3130
TCAGGC SEQ ID: 3131
TCAGGT SEQ ID: 3132
TCAGTG SEQ ID: 3133
TCATGA SEQ ID: 3134
TCATGC SEQ ID: 3135
TCATGG SEQ ID: 3136
TCATGT SEQ ID: 3137
TCCAAC SEQ ID: 3138
TCCACA SEQ ID: 3139
TCCACG SEQ ID: 3140
TCCAGA SEQ ID: 3141
TCCATT SEQ ID: 3142
TCCCAG SEQ ID: 3143
TCCCTT SEQ ID: 3144
TCCGAG SEQ ID: 3145
TCCGTC SEQ ID: 3146
TCCTGG SEQ ID: 3147
TCCTGT SEQ ID: 3148
TCCTTG SEQ ID: 3149
SEQ ID: 3150 to SEQ ID: 3199
SEQ ID: 2644 GCGTAG SEQ ID: 2922 TGATTG SEQ ID: 3200
SEQ ID: 2645 GCGTAT SEQ ID: 2923 TGCAAC SEQ ID: 3201
SEQ ID: 2646 GCGTCT SEQ ID: 2924 TGCACA SEQ ID: 3202
SEQ ID: 2647 GCTACT SEQ ID: 2925 TGCACC SEQ ID: 3203
SEQ ID: 2648 GCTAGC SEQ ID: 2926 TGCAGG SEQ ID: 3204
SEQ ID: 2649 GCTAGT SEQ ID: 2927 TGCATC SEQ ID: 3205
SEQ ID: 2650 GCTATC SEQ ID: 2928 TGCCCG SEQ ID: 3206
SEQ ID: 2651 GCTCCA SEQ ID: 2929 TGCCGG SEQ ID: 3207
SEQ ID: 2652 GCTCGA SEQ ID: 2930 TGCCGT SEQ ID: 3208
SEQ ID: 2653 GCTCGG SEQ ID: 2931 TGCGAC SEQ ID: 3209
SEQ ID: 2654 GCTGAT SEQ ID: 2932 TGCGAT SEQ ID: 3210
SEQ ID: 2655 GCTGCC SEQ ID: 2933 TGCGCA SEQ ID: 3211
SEQ ID: 2656 GCTGCT SEQ ID: 2934 TGCGCG SEQ ID: 3212
SEQ ID: 2657 GCTGTA SEQ ID: 2935 TGCGCT SEQ ID: 3213
SEQ ID: 2658 GCTGTC SEQ ID: 2936 TGCGTA SEQ ID: 3214
SEQ ID: 2659 GCTGTG SEQ ID: 2937 TGCTAC SEQ ID: 3215
SEQ ID: 2660 GCTGTT SEQ ID: 2938 TGCTAG SEQ ID: 3216
SEQ ID: 2661 GCTTAT SEQ ID: 2939 TGCTAT SEQ ID: 3217
SEQ ID: 2662 GCTTCG SEQ ID: 2940 TGCTCC SEQ ID: 3218
SEQ ID: 2663 GCTTCT SEQ ID: 2941 TGCTGA SEQ ID: 3219
SEQ ID: 2664 GGAAAT SEQ ID: 2942 TGCTGG SEQ ID: 3220
SEQ ID: 2665 GGAAGC SEQ ID: 2943 TGCTGT SEQ ID: 3221
SEQ ID: 2666 GGACGT SEQ ID: 2944 TGCTTT SEQ ID: 3222
SEQ ID: 2667 GGACTG SEQ ID: 2945 TGGACT SEQ ID: 3223
SEQ ID: 2668 GGACTT SEQ ID: 2946 TGGAGT SEQ ID: 3224
SEQ ID: 2669 GGAGCT SEQ ID: 2947 TGGCAG SEQ ID: 3225
SEQ ID: 2670 GGAGGC SEQ ID: 2948 TGGCTA SEQ ID: 3226
SEQ ID: 2671 GGATAT SEQ ID: 2949 TGGGAC SEQ ID: 3227
SEQ ID: 2672 GGATGT SEQ ID: 2950 TGGTAC SEQ ID: 3228
SEQ ID: 2673 GGCAAC SEQ ID: 2951 TGGTCT SEQ ID: 3229
SEQ ID: 2674 GGCAAT SEQ ID: 2952 TGTAAG SEQ ID: 3230
SEQ ID: 2675 GGCAC SEQ ID: 2953 TGTACC SEQ ID: 3231
SEQ ID: 2676 GGCAGA SEQ ID: 2954 TGTAGT SEQ ID: 3232
SEQ ID: 2677 GGCATC SEQ ID: 2955 TGTATA SEQ ID: 3233
SEQ ID: 2678 GGCCAG SEQ ID: 2956 TGTATC SEQ ID: 3234
SEQ ID: 2679 GGCCTG SEQ ID: 2957 TGTATT SEQ ID: 3235
SEQ ID: 2680 GGCCTT SEQ ID: 2958 TGTCAC SEQ ID: 3236
SEQ ID: 2681 GGC AG SEQ ID: 2959 TGTCAT SEQ ID: 3237
SEQ ID: 2682 GGCTCC SEQ ID: 2960 TGTCGC SEQ ID: 3238
SEQ ID: 2683 GGCTGT SEQ ID: 2961 TGTCGT SEQ ID: 3239
SEQ ID: 2684 GGCTTC SEQ ID: 2962 TGTCTA SEQ ID: 3240
SEQ ID: 2685 GGGACC SEQ ID: 2963 TGTCTC SEQ ID: 3241
SEQ ID: 2686 GGGACT SEQ ID: 2964 TGTCTT SEQ ID: 3242
SEQ ID: 2687 GGGCTC SEQ ID: 2965 TGTGAC SEQ ID: 3243
SEQ ID: 2688 GGGGGT SEQ ID: 2966 TGTGCA SEQ ID: 3244
SEQ ID: 2689 GGGGTA SEQ ID: 2967 TGTGCG SEQ ID: 3245
SEQ ID: 2690 GGGGTG SEQ ID: 2968 TGTGGA SEQ ID: 3246
SEQ ID: 2691 GGGGTT SEQ ID: 2969 TGTGTC SEQ ID: 3247
SEQ ID: 2692 GGGTAC SEQ ID: 2970 TGTGTG SEQ ID: 3248
SEQ ID: 2693 GGGTAG SEQ ID: 2971 TGTTAA SEQ ID: 3249
SEQ ID: 2694 GGGTGC SEQ ID: 2972 TGTTAC SEQ ID: 3250
SEQ ID: 2695 GGTACC SEQ ID: 2973 TGTTAT SEQ ID: 3251
SEQ ID: 2696 GGTACG SEQ ID: 2974 TGTTCC SEQ ID: 3252
SEQ ID: 2697 GGTACT SEQ ID: 2975 TGTTTC SEQ ID: 3253
SEQ ID: 2698 GGTAGA SEQ ID: 2976 TGTTTG SEQ ID: 3254
SEQ ID: 2699 GGTAGG SEQ ID: 2977 TTAACG SEQ ID: 3255
SEQ ID: 2700 GGTATC SEQ ID: 2978 TTAATG SEQ ID: 3256
SEQ ID: 2701 GGTATG SEQ ID: 2979 TTACAG SEQ ID: 3257
SEQ ID: 2702 GGTATT SEQ ID: 2980 TTACAT SEQ ID: 3258
SEQ ID: 2703 GGTCCA SEQ ID: 2981 TTACCG SEQ ID: 3259
SEQ ID: 2704 GGTCTA SEQ ID: 2982 TTACCT SEQ ID: 3260
SEQ ID: 2705 GGTGAT SEQ ID: 2983 TTACGT SEQ ID: 3261
SEQ ID: 2706 GGTGCC SEQ ID: 2984 TTACTC SEQ ID: 3262
SEQ ID: 2707 GGTTAC SEQ ID: 2985 TTACTG SEQ ID: 3263
SEQ ID: 2708 GGTTAT SEQ ID: 2986 TTAGCG SEQ ID: 3264
SEQ ID: 2709 GGTTGT SEQ ID: 2987 TTAGGC SEQ ID: 3265
SEQ ID: 2710 GTAATA SEQ ID: 2988 TTAGGG SEQ ID: 3266
SEQ ID: 2711 GTAATG SEQ ID: 2989 TTATCG SEQ ID: 3267
SEQ ID: 2712 GTACAG SEQ ID: 2990 TTATCT SEQ ID: 3268
SEQ ID: 2713 GTACAT SEQ ID: 2991 TTATGC SEQ ID: 3269
SEQ ID: 2714 GTACCA SEQ ID: 2992 TTATGT SEQ ID: 3270
SEQ ID: 2715 GTACCT SEQ ID: 2993 TTATTT SEQ ID: 3271
SEQ ID: 2716 GTACGC SEQ ID: 2994 TTCACG SEQ ID: 3272
SEQ ID: 2717 GTACGT SEQ ID: 2995 TTCACT SEQ ID: 3273
SEQ ID: 2718 GTACTA SEQ ID: 2996 TTCAGG SEQ ID: 3274
SEQ ID: 2719 GTAGAC SEQ ID: 2997 TTCATC SEQ ID: 3275
SEQ ID: 2720 GTAGAT SEQ ID: 2998 TTCATG SEQ ID: 3276
SEQ ID: 2721 GTAGGG SEQ ID: 2999 TTCCAA SEQ ID: 3277
SEQ ID: 2722 GTAGTG SEQ ID: 3000 TTCCAC SEQ ID: 3278
SEQ ID: 2723 GTATAT SEQ ID: 3001 TTCCTG SEQ ID: 3279
SEQ ID: 2724 GTATCA SEQ ID: 3002 TTCGAC SEQ ID: 3280
SEQ ID: 2725 GTATCG SEQ ID: 3003 TTCGAG SEQ ID: 3281
SEQ ID: 2726 GTATCT SEQ ID: 3004 TTCGCA SEQ ID: 3282
SEQ ID: 2727 GTATGA SEQ ID: 3005 TTCGGC SEQ ID: 3283
SEQ ID: 2728 GTATGC SEQ ID: 3006 TTCGGT SEQ ID: 3284
SEQ ID: 2729 GTATTA SEQ ID: 3007 TTCGTC SEQ ID: 3285
SEQ ID: 2730 GTATTC SEQ ID: 3008 TTCTAA SEQ ID: 3286
SEQ ID: 2731 GTATTT SEQ ID: 3009 TTCTGG SEQ ID: 3287
SEQ ID: 2732 GTCACG SEQ ID: 3010 TTGAAT SEQ ID: 3288
SEQ ID: 2733 GTCACT SEQ ID: 3011 TTGAGA SEQ ID: 3289
SEQ ID: 2734 GTCAGG SEQ ID: 3012 TTGAGT SEQ ID: 3290
SEQ ID: 2735 GTCATG SEQ ID: 3013 TTGATG SEQ ID: 3291
SEQ ID: 2736 GTCCAC SEQ ID: 3014 TTGCAC SEQ ID: 3292
SEQ ID: 2737 GTCCCA SEQ ID: 3015 TTGCAG SEQ ID: 3293
SEQ ID: 2738 GTCCTC SEQ ID: 3016 TTGCAT SEQ ID: 3294
SEQ ID: 2739 GTCGAC SEQ ID: 3017 TTGCGA SEQ ID: 3295
SEQ ID: 2740 GTCGAT SEQ ID: 3018 TTGCGG SEQ ID: 3296
SEQ ID: 2741 GTCGCA SEQ ID: 3019 TTGCTA SEQ ID: 3297
SEQ ID: 2742 GTCGTT SEQ ID: 3020 TTGGAG SEQ ID: 3298
SEQ ID: 2743 GTCTAA SEQ ID: 3021 TTGGCA SEQ ID: 3299
SEQ ID: 2744 GTCTAC SEQ ID: 3022 TTGGCC SEQ ID: 3300
SEQ ID: 2745 GTCTAG SEQ ID: 3023 TTGGGC SEQ ID: 3301
SEQ ID: 2746 GTCTAT SEQ ID: 3024 TTGTAT SEQ ID: 3302
SEQ ID: 2747 GTCTCA SEQ ID: 3025 TTGTCA SEQ ID: 3303
SEQ ID: 2748 GTCTCT SEQ ID: 3026 TTGTCC SEQ ID: 3304
SEQ ID: 2749 GTCTGA SEQ ID: 3027 TTGTCG SEQ ID: 3305
SEQ ID: 2750 GTCTTA SEQ ID: 3028 TTGTGC SEQ ID: 3306
SEQ ID: 2751 GTGAAT SEQ ID: 3029 TTGTGT SEQ ID: 3307
SEQ ID: 2752 GTGACA SEQ ID: 3030 TTGTTA SEQ ID: 3308
SEQ ID: 2753 GTGATA SEQ ID: 3031 TTGTTT SEQ ID: 3309
SEQ ID: 2754 GTGCCC SEQ ID: 3032 TTTAGG SEQ ID: 3310
SEQ ID: 2755 GTGCCG SEQ ID: 3033 TTTAGT SEQ ID: 3311
SEQ ID: 2756 GTGCGA SEQ ID: 3034 TTTATG SEQ ID: 3312
SEQ ID: 2757 GTGCGC SEQ ID: 3035 TTTCAA SEQ ID: 3313
SEQ ID: 2758 GTGCGT SEQ ID: 3036 TTTCCA SEQ ID: 3314
SEQ ID: 2759 GTGCTA SEQ ID: 3037 TTTCCG SEQ ID: 3315
SEQ ID: 2760 GTGCTG SEQ ID: 3038 TTTCCT SEQ ID: 3316
SEQ ID: 2761 GTGGTC SEQ ID: 3039 TTTGAG SEQ ID: 3317
SEQ ID: 2762 GTGGTT SEQ ID: 3040 TTTGCG SEQ ID: 3318
SEQ ID: 2763 GTGTCG SEQ ID: 3041 TTTGCT SEQ ID: 3319
SEQ ID: 2764 GTGTCT SEQ ID: 3042 TTTGGC SEQ ID: 3320
SEQ ID: 2765 GTGTGG SEQ ID: 3043 TTTTCC SEQ ID: 3321
SEQ ID: 2766 GTTAAC SEQ ID: 3044 TTTTGC SEQ ID: 3322
SEQ ID: 2767 GTTACA SEQ ID: 3045 TTTTTC SEQ ID: 3323
SEQ ID: 2768 GTTACT SEQ ID: 3046 SEQ ID: 3324
[0076] In some embodiments of the primers of formula H-N-ST, H comprises more than six nucleotides. In some embodiments, H comprises fewer than six nucleotides. In some embodiments, N is 1, 2, or 3 random nucleotides. In some embodiments, ST is a specific tail sequence of 12 to 22 nucleotides. In some embodiments, ST is a specific tail sequence with a length selected from the group consisting of 12 to 15, 13 to 16, 14 to 17, 15 to 18, 16 to 19, 17 to 20, 18 to 21, and 19 to 22 nucleotides. In some embodiments, ST is a specific tail sequence with a length selected from the group consisting of 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, and 22 nucleotides. In some embodiments, ST is a specific tail sequence of 17 nucleotides.
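As a worked example of the ranges in [0076], assume H is exactly a hexamer (|H| = 6) and write |·| for length in nucleotides. The total length L of an H-N-ST primer is then bounded as shown; embodiments in which H is longer or shorter than six nucleotides shift these bounds accordingly.

```latex
\begin{aligned}
L &= |H| + |N| + |ST|, \qquad |H| = 6,\; 1 \le |N| \le 3,\; 12 \le |ST| \le 22 \\
  &\Rightarrow\; 6 + 1 + 12 = 19 \;\le\; L \;\le\; 31 = 6 + 3 + 22 \\
|ST| = 17 &\Rightarrow\; 24 \le L \le 26
\end{aligned}
```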
[0077] In some embodiments, the present disclosure relates to nucleic acid sequences having the formula X-H-N-ST-Y, where H is a nucleic acid hexamer; N is 1, 2, or 3 random nucleotides; X and Y are each 0 to 3 random nucleotides; and ST is a specific tail sequence of 12 to 22 nucleotides. In some embodiments, H comprises more than six nucleotides. In other embodiments, H comprises fewer than six nucleotides. In some embodiments, N is 1, 2, or 3 random nucleotides. In some embodiments, X is 0, 1, 2, or 3 random nucleotides. In other embodiments, Y is 0, 1, 2, or 3 random nucleotides. In some embodiments, X and Y consist of the same number of random nucleotides. In some embodiments, X and Y are both 0 random nucleotides.
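The X-H-N-ST-Y layout of [0077] can be sketched as a small Python helper that writes one primer design as a single string. This is a hypothetical illustration, not the method of the disclosure: it assumes random positions are written with the IUPAC degenerate code N (any base), joins the parts in the order the formula is written (this section does not state a 5'-to-3' orientation), fixes H at six bases, and uses a made-up placeholder tail.

```python
"""Hypothetical sketch: write an X-H-N-ST-Y primer design as one string.

Assumptions (not from the source): random positions use the IUPAC code "N",
parts are joined in the written order of the formula, H is exactly six bases,
and the tail in the demo is a placeholder.
"""

VALID_BASES = set("ACGT")


def build_xhnsty(hexamer: str, tail: str, n_len: int = 1,
                 x_len: int = 0, y_len: int = 0) -> str:
    """Return an X-H-N-ST-Y design string after checking the [0077] ranges."""
    hexamer, tail = hexamer.upper(), tail.upper()
    if len(hexamer) != 6 or not set(hexamer) <= VALID_BASES:
        raise ValueError("H must be six A/C/G/T bases in this sketch")
    if not 1 <= n_len <= 3:
        raise ValueError("N must be 1, 2, or 3 random nucleotides")
    if not (0 <= x_len <= 3 and 0 <= y_len <= 3):
        raise ValueError("X and Y must each be 0 to 3 random nucleotides")
    if not 12 <= len(tail) <= 22 or not set(tail) <= VALID_BASES:
        raise ValueError("ST must be 12 to 22 A/C/G/T bases")
    # Each random position (X, N, Y) is written as a degenerate "N".
    return "N" * x_len + hexamer + "N" * n_len + tail + "N" * y_len


if __name__ == "__main__":
    placeholder_tail = "ACGTACGTACGTACGTA"  # hypothetical 17-nt tail, not from the source
    print(build_xhnsty("ATGGTG", placeholder_tail))
    print(build_xhnsty("ATGGTG", placeholder_tail, n_len=2, x_len=1, y_len=1))
```

Writing random positions as N mirrors how a degenerate position is commonly specified when ordering oligonucleotides; the synthesized pool then holds a mixture of bases at each N, so one design string stands for many physical primer molecules.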
[0078] In some embodiments, all primers in a set or in a composition have substantially the same length. In some embodiments, all primers in a set or in a composition have the same length. In some embodiments, all primers in a set have the same specific tail.
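A minimal sketch of checking the conditions in [0078] on a primer set. Primers are represented as hypothetical (H, N, ST) tuples; the `max_spread` tolerance for "substantially the same length" is an assumption, since this section does not quantify it (`max_spread=0` corresponds to identical lengths).

```python
from typing import Iterable, Tuple

Primer = Tuple[str, str, str]  # hypothetical (H, N, ST) representation


def check_set(primers: Iterable[Primer], max_spread: int = 0) -> Tuple[bool, bool]:
    """Return (lengths_ok, same_tail) for a primer set.

    lengths_ok: longest and shortest primers differ by at most max_spread nt.
    same_tail: every primer carries the identical specific tail ST.
    """
    primers = list(primers)
    if not primers:
        raise ValueError("the primer set is empty")
    lengths = [len(h) + len(n) + len(st) for h, n, st in primers]
    lengths_ok = max(lengths) - min(lengths) <= max_spread
    same_tail = len({st for _, _, st in primers}) == 1
    return lengths_ok, same_tail


if __name__ == "__main__":
    tail = "ACGTACGTACGTACGTA"  # hypothetical placeholder tail
    print(check_set([("ATGGTG", "N", tail), ("TGGTAC", "N", tail)]))
```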
[0079] In some embodiments, a primer set described herein can comprise 20-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, or 1900-2000 primers, such that each primer contains a different hexamer sequence. Generally, about half of these primers will be used as Forward Primers (FP) and about half as Reverse Primers (RP).
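The composition described in [0079] can be summarized with a short hypothetical helper. It takes (hexamer, role) pairs and reports whether the set size falls within the 20 to 2000 window spanned by the listed ranges, whether every hexamer is different, and how close the forward-primer share is to one half. The string role labels "FP"/"RP" and the 10% tolerance for "about half" are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union


def summarize_set(primers: Iterable[Tuple[str, str]],
                  tolerance: float = 0.10) -> Dict[str, Union[int, float, bool]]:
    """Summarize (hexamer, role) pairs, where role is "FP" or "RP"."""
    primers = list(primers)
    roles = Counter(role for _, role in primers)
    if set(roles) - {"FP", "RP"}:
        raise ValueError("each role must be 'FP' or 'RP'")
    size = len(primers)
    fp_fraction = roles["FP"] / size if size else 0.0
    return {
        "size": size,
        "size_in_range": 20 <= size <= 2000,  # window spanned by the [0079] ranges
        "hexamers_distinct": len({h for h, _ in primers}) == size,
        "fp_fraction": fp_fraction,
        "about_half_fp": abs(fp_fraction - 0.5) <= tolerance,  # assumed tolerance
    }
```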
[0080] The number of primers in a MEP primer set will vary based on which genes are highly expressed in the organism and on the desired sensitivity of the high-throughput sequencing experiment. In general, a larger primer library is expected to increase the sensitivity of microbe detection.
[0081] In some embodiments, a set of forward strand primers (FS-MEP) is a set of 800-900 primers. In some embodiments, a set of reverse strand primers (RS-MEP) is a set of 800-900 primers.
[0082] In some embodiments, the hexamer is a hexamer selected from Table 1. In some embodiments, a primer set includes at least one primer in which the hexamer sequence is not a sequence selected from Table 4.
TABLE 4
SEQ ID: 3837 CGATAT SEQ ID: 4336
SEQ ID: 3838 CGATGA SEQ ID: 4337
SEQ ID: 3839 CGCAAC SEQ ID: 4338
SEQ ID: 3840 CGCACA SEQ ID: 4339
SEQ ID: 3841 CGCACT SEQ ID: 4340
SEQ ID: 3842 CGCATC SEQ ID: 4341
SEQ ID: 3843 CGCGTT SEQ ID: 4342
SEQ ID: 3844 CGCTGT SEQ ID: 4343
SEQ ID: 3845 CGGATG SEQ ID: 4344
SEQ ID: 3846 CGGATT SEQ ID: 4345
SEQ ID: 3847 CGGCAT SEQ ID: 4346
SEQ ID: 3848 CGGCCT SEQ ID: 4347
SEQ ID: 3849 CGGTAT SEQ ID: 4348
SEQ ID: 3850 CGGTCT SEQ ID: 4349
SEQ ID: 3851 CGGTTA SEQ ID: 4350
SEQ ID: 3852 CGTAAT SEQ ID: 4351
SEQ ID: 3853 CGTACT SEQ ID: 4352
SEQ ID: 3854 CGTATC SEQ ID: 4353
SEQ ID: 3855 CGTATG SEQ ID: 4354
SEQ ID: 3856 CGTCAA SEQ ID: 4355
SEQ ID: 3857 CGTCGA SEQ ID: 4356
SEQ ID: 3858 CGTGAC SEQ ID: 4357
SEQ ID: 3859 CGTGCT SEQ ID: 4358
SEQ ID: 3860 CGTGGT SEQ ID: 4359
SEQ ID: 3861 CGTGTA SEQ ID: 4360
SEQ ID: 3862 CGTTAC SEQ ID: 4361
SEQ ID: 3863 CGTTGT SEQ ID: 4362
SEQ ID: 3864 CGTTTC SEQ ID: 4363
SEQ ID: 3865 CTAAAG SEQ ID: 4364
SEQ ID: 3866 CTAACG SEQ ID: 4365
SEQ ID: 3867 CTACAG SEQ ID: 4366
SEQ ID: 3868 CTACGG SEQ ID: 4367
SEQ ID: 3869 CTAGAC SEQ ID: 4368
SEQ ID: 3870 CTAGCG SEQ ID: 4369
SEQ ID: 3871 CTAGCT SEQ ID: 4370
SEQ ID: 3872 CTAGGC SEQ ID: 4371
SEQ ID: 3873 CTAGGT SEQ ID: 4372
SEQ ID: 3874 CTATAA SEQ ID: 4373
SEQ ID: 3875 CTATCG SEQ ID: 4374
SEQ ID: 3876 CTCACA SEQ ID: 4375
SEQ ID: 3877 CTCATG SEQ ID: 4376
SEQ ID: 3878 CTCCTG SEQ ID: 4377
SEQ ID: 3879 CTCCTT SEQ ID: 4378
SEQ ID: 3880 CTCGAA SEQ ID: 4379
SEQ ID: 3881 CTCGAG SEQ ID: 4380
SEQ ID: 3882 CTCGTT SEQ ID: 4381
SEQ ID: 3883 CTCTAC SEQ ID: 4382
SEQ ID: 3884 CTCTAT SEQ ID: 4383
SEQ ID: 3885 CTCTCA SEQ ID: 4384
SEQ ID: 3886 CTCTGC SEQ ID: 4385
SEQ ID: 3388 GTGAAT SEQ ID: 3887 CTCTGT SEQ ID: 4386
SEQ ID: 3389 GTGACA SEQ ID: 3888 CTGAGC SEQ ID: 4387
SEQ ID: 3390 GTGACT SEQ ID: 3889 CTGATA SEQ ID: 4388
SEQ ID: 3391 GTGAGA SEQ ID: 3890 CTGATT SEQ ID: 4389
SEQ ID: 3392 GTGAGC SEQ ID: 3891 CTGCAA SEQ ID: 4390
SEQ ID: 3393 GTGCAA SEQ ID: 3892 CTGCCT SEQ ID: 4391
SEQ ID: 3394 GTGCAT SEQ ID: 3893 CTGCGC SEQ ID: 4392
SEQ ID: 3395 GTGCTA SEQ ID: 3894 CTGCTA SEQ ID: 4393
SEQ ID: 3396 GTGGAT SEQ ID: 3895 CTGCTG SEQ ID: 4394
SEQ ID: 3397 GTGGCA SEQ ID: 3896 CTGGTA SEQ ID: 4395
SEQ ID: 3398 GTGTAT SEQ ID: 3897 CTGGTC SEQ ID: 4396
SEQ ID: 3399 GTGTCT SEQ ID: 3898 CTGTAG SEQ ID: 4397
SEQ ID: 3400 GTTAAC SEQ ID: 3899 CTGTCG SEQ ID: 4398
SEQ ID: 3401 GTTAAG SEQ ID: 3900 CTGTGC SEQ ID: 4399
SEQ ID: 3402 GTTGCA SEQ ID: 3901 CTGTGT SEQ ID: 4400
SEQ ID: 3403 GTTGCC SEQ ID: 3902 CTTAAC SEQ ID: 4401
SEQ ID: 3404 GTTGCG SEQ ID: 3903 CTTACA SEQ ID: 4402
SEQ ID: 3405 GTTGCT SEQ ID: 3904 CTTACG SEQ ID: 4403
SEQ ID: 3406 GTTGGA SEQ ID: 3905 CTTAGG SEQ ID: 4404
SEQ ID: 3407 GTTGGC SEQ ID: 3906 CTTATA SEQ ID: 4405
SEQ ID: 3408 GTTGTA SEQ ID: 3907 CTTATC SEQ ID: 4406
SEQ ID: 3409 GTTGTG SEQ ID: 3908 CTTATT SEQ ID: 4407
SEQ ID: 3410 GTTTAA SEQ ID: 3909 CTTCCA SEQ ID: 4408
SEQ ID: 3411 TAAAAA SEQ ID: 3910 CTTCTA SEQ ID: 4409
SEQ ID: 3412 TAAAAT SEQ ID: 3911 CTTCTC SEQ ID: 4410
SEQ ID: 3413 TAAAGA SEQ ID: 3912 CTTGAG SEQ ID: 4411
SEQ ID: 3414 TAAATA SEQ ID: 3913 CTTGCA SEQ ID: 4412
SEQ ID: 3415 TAACAA SEQ ID: 3914 CTTTAC SEQ ID: 4413
SEQ ID: 3416 TAACCG SEQ ID: 3915 CTTTAT SEQ ID: 4414
SEQ ID: 3417 TAACGA SEQ ID: 3916 CTTTCA SEQ ID: 4415
SEQ ID: 3418 TAAC C SEQ ID: 3917 GAAATC SEQ ID: 4416
SEQ ID: 3419 TAAGAC SEQ ID: 3918 GAATAT SEQ ID: 4417
SEQ ID: 3420 TAAGGC SEQ ID: 3919 GAATCG SEQ ID: 4418
SEQ ID: 3421 TAAGTA SEQ ID: 3920 GACACC SEQ ID: 4419
SEQ ID: 3422 TAAGTG SEQ ID: 3921 GACATA SEQ ID: 4420
SEQ ID: 3423 TAATGG SEQ ID: 3922 GACC A SEQ ID: 4421
SEQ ID: 3424 TAATGT SEQ ID: 3923 GACGCC SEQ ID: 4422
SEQ ID: 3425 TACAAT SEQ ID: 3924 GACG A SEQ ID: 4423
SEQ ID: 3426 TACACG SEQ ID: 3925 GACTAG SEQ ID: 4424
SEQ ID: 3427 TACAGC SEQ ID: 3926 GACTCC SEQ ID: 4425
SEQ ID: 3428 ACAG SEQ ID: 3927 GACTCG SEQ ID: 4426
SEQ ID: 3429 TACATC SEQ ID: 3928 GACTGC SEQ ID: 4427
SEQ ID: 3430 TACATG SEQ ID: 3929 GACTTG SEQ ID: 4428
SEQ ID: 3431 TACCAG SEQ ID: 3930 GACTTT SEQ ID: 4429
SEQ ID: 3432 TACCCC SEQ ID: 3931 GAGAAT SEQ ID: 4430
SEQ ID: 3433 TACCCT SEQ ID: 3932 GAGACG SEQ ID: 4431
SEQ ID: 3434 TACCGA SEQ ID: 3933 GAGATA SEQ ID: 4432
SEQ ID: 3435 TACCTG SEQ ID: 3934 GAGATC SEQ ID: 4433
SEQ ID: 3436 TACGAT SEQ ID: 3935 GAGCAT SEQ ID: 4434
SEQ ID: 3437 TACGCA SEQ ID: 3936 GAGGCT SEQ ID: 4435
SEQ ID: 3438 TACGTC SEQ ID: 4436
SEQ ID: 3439 TACGTT SEQ ID: 4437
SEQ ID: 3440 TACTCA SEQ ID: 4438
SEQ ID: 3441 TAGAAG SEQ ID: 4439
SEQ ID: 3442 TAGACC SEQ ID: 4440
SEQ ID: 3443 TAGAGA SEQ ID: 4441
SEQ ID: 3444 TAGCAA SEQ ID: 4442
SEQ ID: 3445 TAGCAC SEQ ID: 4443
SEQ ID: 3446 TAGCAG SEQ ID: 4444
SEQ ID: 3447 TAGCCA SEQ ID: 4445
SEQ ID: 3448 TAGGTC SEQ ID: 4446
SEQ ID: 3449 TAGGTG SEQ ID: 4447
SEQ ID: 3450 TAGTAC SEQ ID: 4448
SEQ ID: 3451 TAGTCG SEQ ID: 4449
SEQ ID: 3452 TAGTGA SEQ ID: 4450
SEQ ID: 3453 TAGTGC SEQ ID: 4451
SEQ ID: 3454 TATAAC SEQ ID: 4452
SEQ ID: 3455 TATAAG SEQ ID: 4453
SEQ ID: 3456 TATACA SEQ ID: 4454
SEQ ID: 3457 TATAGA SEQ ID: 4455
SEQ ID: 3458 TATCAG SEQ ID: 4456
SEQ ID: 3459 TATCTC SEQ ID: 4457
SEQ ID: 3460 TATGAT SEQ ID: 4458
SEQ ID: 3461 TATGGC SEQ ID: 4459
SEQ ID: 3462 TATGTA SEQ ID: 4460
SEQ ID: 3463 TATGTC SEQ ID: 4461
SEQ ID: 3464 TATGTG SEQ ID: 4462
SEQ ID: 3465 TATTAA SEQ ID: 4463
SEQ ID: 3466 TATTAC SEQ ID: 4464
SEQ ID: 3467 TATTCA SEQ ID: 4465
SEQ ID: 3468 TCAATG SEQ ID: 4466
SEQ ID: 3469 TCAGAC SEQ ID: 4467
SEQ ID: 3470 TCAGCA SEQ ID: 4468
SEQ ID: 3471 TCATAT SEQ ID: 4469
SEQ ID: 3472 TCATCG SEQ ID: 4470
SEQ ID: 3473 TCATGA SEQ ID: 4471
SEQ ID: 3474 TCCACA SEQ ID: 4472
SEQ ID: 3475 TCCATC SEQ ID: 4473
SEQ ID: 3476 TCCCAT SEQ ID: 4474
SEQ ID: 3477 TCCGAT SEQ ID: 4475
SEQ ID: 3478 TCGACA SEQ ID: 4476
SEQ ID: 3479 TCGACG SEQ ID: 4477
SEQ ID: 3480 TCGAGC SEQ ID: 4478
SEQ ID: 3481 TCGAGT SEQ ID: 4479
SEQ ID: 3482 TCGATA SEQ ID: 4480
SEQ ID: 3483 TCGATG SEQ ID: 4481
SEQ ID: 3484 TCGCAA SEQ ID: 4482
SEQ ID: 3485 TCGCAC SEQ ID: 4483
SEQ ID: 3486 TCGGTG SEQ ID: 4484
SEQ ID: 3487 TCGTAT SEQ ID: 4485
SEQ ID: 4586 to SEQ ID: 4735
SEQ ID: 3788 CAGTCT SEQ ID: 4287 TTCGGC SEQ ID: 4786
SEQ ID: 3789 CAGTGG SEQ ID: 4288 TTCTAA SEQ ID: 4787
SEQ ID: 3790 CAGTGT SEQ ID: 4289 TTCTTC SEQ ID: 4788
SEQ ID: 3791 CAGTTC SEQ ID: 4290 TTGAAT SEQ ID: 4789
SEQ ID: 3792 CATACA SEQ ID: 4291 TTGAGA SEQ ID: 4790
SEQ ID: 3793 CATACG SEQ ID: 4292 TTGAGG SEQ ID: 4791
SEQ ID: 3794 CATACT SEQ ID: 4293 TTGAGT SEQ ID: 4792
SEQ ID: 3795 CATAGA SEQ ID: 4294 TTGATG SEQ ID: 4793
SEQ ID: 3796 CATAGG SEQ ID: 4295 TTGCAC SEQ ID: 4794
SEQ ID: 3797 CATCAT SEQ ID: 4296 TTGCAG SEQ ID: 4795
SEQ ID: 3798 CATCCG SEQ ID: 4297 TTGCAT SEQ ID: 4796
SEQ ID: 3799 CATCCT SEQ ID: 4298 TTGCCG SEQ ID: 4797
SEQ ID: 3800 CATCGA SEQ ID: 4299 TTGCGA SEQ ID: 4798
SEQ ID: 3801 CATCGG SEQ ID: 4300 TTGCGG SEQ ID: 4799
SEQ ID: 3802 CATCGT SEQ ID: 4301 TTGCTA SEQ ID: 4800
SEQ ID: 3803 CATGCG SEQ ID: 4302 TTGGCA SEQ ID: 4801
SEQ ID: 3804 CATGTA SEQ ID: 4303 TTGGGC SEQ ID: 4802
SEQ ID: 3805 CATTAC SEQ ID: 4304 TTGTAT SEQ ID: 4803
SEQ ID: 3806 CATTAG SEQ ID: 4305 TTGTCA SEQ ID: 4804
SEQ ID: 3807 CATTCA SEQ ID: 4306 TTGTCG SEQ ID: 4805
SEQ ID: 3808 CATTGA SEQ ID: 4307 TTGTGC SEQ ID: 4806
SEQ ID: 3809 CATTGC SEQ ID: 4308 TTGTGT SEQ ID: 4807
SEQ ID: 3810 CCACAA SEQ ID: 4309 TTGTTA SEQ ID: 4808
SEQ ID: 3811 CCAGAT SEQ ID: 4310 TTGTTT SEQ ID: 4809
SEQ ID: 3812 CCATCA SEQ ID: 4311 TTTACA SEQ ID: 4810
SEQ ID: 3813 CCATCC SEQ ID: 4312 TTTAGG SEQ ID: 4811
SEQ ID: 3814 CCATCG SEQ ID: 4313 TTTATG SEQ ID: 4812
SEQ ID: 3815 CCATTA SEQ ID: 4314 TTTCAA SEQ ID: 4813
SEQ ID: 3816 CCCATC SEQ ID: 4315 TTTCCT SEQ ID: 4814
SEQ ID: 3817 CCCCTG SEQ ID: 4316 TTTCGC SEQ ID: 4815
SEQ ID: 3818 CCCGCA SEQ ID: 4317 TTTGAG SEQ ID: 4816
SEQ ID: 3819 CCGACA SEQ ID: 4318 TTTGCG SEQ ID: 4817
SEQ ID: 3820 CCGCTT SEQ ID: 4319 TTTGCT SEQ ID: 4818
SEQ ID: 3821 CCGGTT SEQ ID: 4320 TTTGGC SEQ ID: 4819
SEQ ID: 3822 CCGTAT SEQ ID: 4321 TTTTCC SEQ ID: 4820
SEQ ID: 3823 CCGTTT SEQ ID: 4322 TTTTGC SEQ ID: 4821
TTTTTA SEQ ID: 4822
[0083] In some embodiments, a set of forward strand primers includes 20-100, 100-200, 200-400, 400-600, 600-800, or 800-10000 primers in which at least one hexamer sequence is not a sequence selected from the hexamer sequences in Table 4. In some embodiments, a set of reverse strand primers includes 20-100, 100-200, 200-400, 400-600, 600-800, or 800-10000 primers in which at least one hexamer sequence is not a sequence selected from the hexamer sequences in Table 5.
[0084] In some embodiments, a set of forward strand (FS-MEP) primers includes primers having a hexamer (H) sequence selected from the group consisting of the hexamers in Table 2. In some embodiments, a set of reverse strand (RS-MEP) primers includes primers having a hexamer sequence selected from the group consisting of the hexamers in Table 3.
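Reading the condition in [0082] and [0083] as "at least one primer in the set has a hexamer outside the excluded table", it can be checked as below. `EXCLUDED_SAMPLE` holds only five hexamers copied from the Table 4 listing in this section and stands in for the full table; the function names are hypothetical.

```python
from typing import Iterable, Set

# Illustrative subset of Table 4 hexamers (copied from the listing above), not the full table.
EXCLUDED_SAMPLE: Set[str] = {"CGATAT", "CGATGA", "CGCAAC", "CGCACA", "CGCACT"}


def count_outside(hexamers: Iterable[str], excluded: Set[str]) -> int:
    """Number of hexamers (case-insensitive) that are not in the excluded set."""
    return sum(1 for h in hexamers if h.upper() not in excluded)


def meets_condition(hexamers: Iterable[str], excluded: Set[str]) -> bool:
    """True if at least one hexamer in the set is not in the excluded set."""
    return count_outside(hexamers, excluded) >= 1
```

The count returned by `count_outside` is what would be compared against ranges such as 20-100 or 100-200 primers in [0083].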
[0085] In some embodiments, H is a hexamer sequence selected from Table 2. This embodiment can be useful in a set of forward strand (FS-MEP) primers.
TABLE 5
SEQ ID: 5511 to SEQ ID: 5560
CACAAC SEQ ID: 5061 GCCTAG SEQ ID: 5311 TTGCGA SEQ ID: 5561
CACACA SEQ ID: 5062 GCCTGA SEQ ID: 5312 TTGCGT SEQ ID: 5562
CACAGC SEQ ID: 5063 GCCTTG SEQ ID: 5313 TTGCTT SEQ ID: 5563
CACAGG SEQ ID: 5064 GCGAAA SEQ ID: 5314 TTGGAA SEQ ID: 5564
C AC A SEQ ID: 5065 GCGACA SEQ ID: 5315 TTGTAA SEQ ID: 5565
CACCGA SEQ ID: 5066 GCGATA SEQ ID: 5316 TTGTAC SEQ ID: 5566
CACGAT SEQ ID: 5067 GCGATT SEQ ID: 5317 TTGTAT SEQ ID: 5567
CACTGA SEQ ID: 5068 GCGCAG SEQ ID: 5318 TTGTGC SEQ ID: 5568
CAGAAT SEQ ID: 5069 GCGCAT SEQ ID: 5319 TTGTGG SEQ ID: 5569
CAGAGA SEQ ID: 5070 GCGCTA SEQ ID: 5320 TTTATA SEQ ID: 5570
CAGATT SEQ ID: 5071 GCGTAT SEQ ID: 5321 TTTTAT SEQ ID: 5571
CAGCAC SEQ ID: 5072 GCGTCT SEQ ID: 5322
[0086] In some embodiments, H is a hexamer selected from Table 3. This embodiment can be useful in a set of reverse strand (RS-MEP) primers.
TABLE 6
SEQ ID 5852 | TATAAA SEQ ID 6101
SEQ ID 5853 | TATAAT SEQ ID 6102
SEQ ID 5854 | TATACA SEQ ID 6103
SEQ ID 5855 | TATACG SEQ ID 6104
SEQ ID 5856 | TATATC SEQ ID 6105
SEQ ID 5857 | TATATG SEQ ID 6106
SEQ ID 5858 | TATCCT SEQ ID 6107
SEQ ID 5859 | TATCGA SEQ ID 6108
SEQ ID 5860 | TATCGC SEQ ID 6109
SEQ ID 5861 | TATCGG SEQ ID 6110
SEQ ID 5862 | TATCGT SEQ ID 6111
SEQ ID 5863 | TATCTC SEQ ID 6112
SEQ ID 5864 | TATCTT SEQ ID 6113
SEQ ID 5865 | TATGAG SEQ ID 6114
SEQ ID 5866 | TATGAT SEQ ID 6115
SEQ ID 5867 | TATGCA SEQ ID 6116
SEQ ID 5868 | TATGCG SEQ ID 6117
SEQ ID 5869 | TATGGC SEQ ID 6118
SEQ ID 5870 | TATGGG SEQ ID 6119
SEQ ID 5871 | TATGTC SEQ ID 6120
SEQ ID 5872 | TATGTT SEQ ID 6121
SEQ ID 5873 | TATTCG SEQ ID 6122
SEQ ID 5874 | TATTGG SEQ ID 6123
SEQ ID 5875 | TATTGT SEQ ID 6124
SEQ ID 5876 | TATTTA SEQ ID 6125
SEQ ID 5877 | TATTTC SEQ ID 6126
SEQ ID 5878 | TATTTG SEQ ID 6127
SEQ ID 5879 | TCAATC SEQ ID 6128
SEQ ID 5880 | TCACAG SEQ ID 6129
SEQ ID 5881 | TCACAT SEQ ID 6130
SEQ ID 5882 | TCACCG SEQ ID 6131
SEQ ID 5883 | TCACGG SEQ ID 6132
SEQ ID 5884 | TCACGT SEQ ID 6133
SEQ ID 5885 | TCACTA SEQ ID 6134
SEQ ID 5886 | TCACTC SEQ ID 6135
SEQ ID 5887 | TCAGAG SEQ ID 6136
SEQ ID 5888 | TCAGGC SEQ ID 6137
SEQ ID 5889 | TCAGGT SEQ ID 6138
SEQ ID 5890 | TCAGTG SEQ ID 6139
SEQ ID 5891 | TCATCC SEQ ID 6140
SEQ ID 5892 | TCATCG SEQ ID 6141
SEQ ID 5893 | TCATGA SEQ ID 6142
SEQ ID 5894 | TCATGC SEQ ID 6143
SEQ ID 5895 | TCATGG SEQ ID 6144
SEQ ID 5896 | TCATGT SEQ ID 6145
SEQ ID 5897 | TCATTC SEQ ID 6146
SEQ ID 5898 | TCATTT SEQ ID 6147
SEQ ID 5899 | TCCAAC SEQ ID 6148
SEQ ID 5900 | TCCACA SEQ ID 6149
SEQ ID 5901 | TCCAGA SEQ ID 6150
SEQ ID 5902 | TCCCAG SEQ ID 6151
SEQ ID 5903 | TCCTGT SEQ ID 6152
SEQ ID 5904 | TCCTTG SEQ ID 6153
SEQ ID 5905 | TCGAAT SEQ ID 6154
SEQ ID 5906 | TCGACC SEQ ID 6155
SEQ ID 5907 | TCGACG SEQ ID 6156
SEQ ID 5908 | TCGACT SEQ ID 6157
SEQ ID 5909 | TCGAGC SEQ ID 6158
SEQ ID 5910 | TCGAGT SEQ ID 6159
SEQ ID 5911 | TCGATC SEQ ID 6160
SEQ ID 5912 | TCGCAA SEQ ID 6161
SEQ ID 5913 | TCGCAT SEQ ID 6162
SEQ ID 5914 | TCGCGT SEQ ID 6163
SEQ ID 5915 | TCGGAC SEQ ID 6164
SEQ ID 5916 | TCGGTA SEQ ID 6165
SEQ ID 5917 | TCGGTG SEQ ID 6166
SEQ ID 5918 | TCGTCG SEQ ID 6167
SEQ ID 5919 | TCGTCT SEQ ID 6168
SEQ ID 5920 | TCGTGT SEQ ID 6169
SEQ ID 5921 | TCGTTA SEQ ID 6170
SEQ ID 5922 | TCGTTC SEQ ID 6171
SEQ ID 5923 | TCGTTG SEQ ID 6172
SEQ ID 5924 | TCTACG SEQ ID 6173
SEQ ID 5925 | TCTAGG SEQ ID 6174
SEQ ID 5926 | TCTATA SEQ ID 6175
SEQ ID 5927 | TCTCAC SEQ ID 6176
SEQ ID 5928 | TCTCAG SEQ ID 6177
SEQ ID 5929 | TCTCAT SEQ ID 6178
SEQ ID 5930 | TCTCGT SEQ ID 6179
SEQ ID 5931 | TCTCTA SEQ ID 6180
SEQ ID 5932 | TCTCTG SEQ ID 6181
SEQ ID 5933 | TCTGCG SEQ ID 6182
SEQ ID 5934 | TCTGCT SEQ ID 6183
SEQ ID 5935 | TCTGTC SEQ ID 6184
SEQ ID 5936 | TCTGTT SEQ ID 6185
SEQ ID 5937 | TCTTAT SEQ ID 6186
SEQ ID 5938 | TCTTCG SEQ ID 6187
SEQ ID 5939 | TCTTCT SEQ ID 6188
SEQ ID 5940 | TCTTGT SEQ ID 6189
SEQ ID 5941 | TCTTTA SEQ ID 6190
SEQ ID 5942 | TGAAGT SEQ ID 6191
SEQ ID 5943 | TGAATA SEQ ID 6192
SEQ ID 5944 | TGAATC SEQ ID 6193
SEQ ID 5945 | TGACAT SEQ ID 6194
SEQ ID 5946 | TGACCG SEQ ID 6195
SEQ ID 5947 | TGACTT SEQ ID 6196
SEQ ID 5948 | TGAGAT SEQ ID 6197
SEQ ID 5949 | TGAGCG SEQ ID 6198
SEQ ID 5950 | TGAGGG SEQ ID 6199
SEQ ID 5951 | TGAGTA SEQ ID 6200
SEQ ID 5952 | TGATAA SEQ ID 6201
SEQ ID 5953 | TGATAC SEQ ID 6202
SEQ ID 5954 | TGATCA SEQ ID 6203
SEQ ID 5955 | TGATTC SEQ ID 6204
SEQ ID 5956 | TGATTG SEQ ID 6205
SEQ ID 5957 | TGCAAC SEQ ID 6206
SEQ ID 5958 | TGCACA SEQ ID 6207
SEQ ID 5959 | TGCACC SEQ ID 6208
SEQ ID 5960 | TGCAGG SEQ ID 6209
SEQ ID 5961 | TGCATC SEQ ID 6210
SEQ ID 5962 | TGCCAC SEQ ID 6211
SEQ ID 5963 | TGCCGG SEQ ID 6212
SEQ ID 5964 | TGCCGT SEQ ID 6213
SEQ ID 5965 | TGCGAC SEQ ID 6214
SEQ ID 5966 | TGCGCA SEQ ID 6215
SEQ ID 5967 | TGCGCT SEQ ID 6216
SEQ ID 5968 | TGCGTA SEQ ID 6217
SEQ ID 5969 | TGCTAC SEQ ID 6218
SEQ ID 5970 | TGCTAG SEQ ID 6219
SEQ ID 5971 | TGCTAT SEQ ID 6220
SEQ ID 5972 | TGCTCC SEQ ID 6221
SEQ ID 5973 | TGCTGA SEQ ID 6222
SEQ ID 5974 | TGCTGG SEQ ID 6223
SEQ ID 5975 | TGCTGT SEQ ID 6224
SEQ ID 5976 | TGCTTT SEQ ID 6225
SEQ ID 5977 | TGGACT SEQ ID 6226
SEQ ID 5978 | TGGAGT SEQ ID 6227
SEQ ID 5979 | TGGCAG SEQ ID 6228
SEQ ID 5980 | TGGCTA SEQ ID 6229
SEQ ID 5981 | TGGGAC SEQ ID 6230
SEQ ID 5982 | TGGTAC SEQ ID 6231
SEQ ID 5983 | TGGTAT SEQ ID 6232
SEQ ID 5984 | TGGTCT SEQ ID 6233
SEQ ID 5985 | TGTAAG SEQ ID 6234
SEQ ID 5986 | TGTACC SEQ ID 6235
SEQ ID 5987 | TGTAGT SEQ ID 6236
SEQ ID 5988 | TGTATA SEQ ID 6237
SEQ ID 5989 | TGTATC SEQ ID 6238
SEQ ID 5990 | TGTATT SEQ ID 6239
SEQ ID 5991 | TGTCAC SEQ ID 6240
SEQ ID 5992 | TGTCAT SEQ ID 6241
SEQ ID 5993 | TGTCGA SEQ ID 6242
SEQ ID 5994 | TGTCGC SEQ ID 6243
SEQ ID 5995 | TGTCGT SEQ ID 6244
SEQ ID 5996 | TGTCTT SEQ ID 6245
SEQ ID 5997 | TGTGAC SEQ ID 6246
SEQ ID 5998 | TGTGCA SEQ ID 6247
SEQ ID 5999 | TGTGGA SEQ ID 6248
SEQ ID 6000 | TGTGTC SEQ ID 6249
SEQ ID 6001 | TGTGTG SEQ ID 6250
SEQ ID 6002 | TGTTAA SEQ ID 6251
SEQ ID 6003 | TGTTAT SEQ ID 6252
SEQ ID 6004 | TGTTCG SEQ ID 6253
SEQ ID 6005 | TGTTTC SEQ ID 6254
SEQ ID 6006 | TGTTTG SEQ ID 6255
SEQ ID 6007 | TTAAAC SEQ ID 6256
SEQ ID 6008 | TTAA A SEQ ID 6257
SEQ ID 6009 | TTACAA SEQ ID 6258
SEQ ID 6010 | TTACAT SEQ ID 6259
SEQ ID 6011 | TTACCG SEQ ID 6260
SEQ ID 6012 | TTACCT SEQ ID 6261
SEQ ID 6013 | TTACGG SEQ ID 6262
SEQ ID 6014 | TTACGT SEQ ID 6263
SEQ ID 6015 | TTACTC SEQ ID 6264
SEQ ID 6016 | TTACTG SEQ ID 6265
SEQ ID 6017 | TTAGCG SEQ ID 6266
SEQ ID 6018 | TTAGGC SEQ ID 6267
SEQ ID 6019 | TTAGGG SEQ ID 6268
SEQ ID 6020 | TTATCG SEQ ID 6269
SEQ ID 6021 | TTATCT SEQ ID 6270
SEQ ID 6022 | TTATGC SEQ ID 6271
SEQ ID 6023 | TTATGT SEQ ID 6272
SEQ ID 6024 | TTATTG SEQ ID 6273
SEQ ID 6025 | TTATTT SEQ ID 6274
SEQ ID 6026 | TTCACG SEQ ID 6275
SEQ ID 6027 | TTCAGG SEQ ID 6276
SEQ ID 6028 | TTCATC SEQ ID 6277
SEQ ID 6029 | TTCATG SEQ ID 6278
SEQ ID 6030 | TTCCAA SEQ ID 6279
SEQ ID 6031 | TTCCTG SEQ ID 6280
SEQ ID 6032 | TTCGAC SEQ ID 6281
SEQ ID 6033 | TTCGCA SEQ ID 6282
SEQ ID 6034 | TTCGCT SEQ ID 6283
SEQ ID 6035 | TTCGGC SEQ ID 6284
SEQ ID 6036 | TTCTAA SEQ ID 6285
SEQ ID 6037 | TTCTTC SEQ ID 6286
SEQ ID 6038 | TTGAAT SEQ ID 6287
SEQ ID 6039 | TTGAGA SEQ ID 6288
SEQ ID 6040 | TTGAGG SEQ ID 6289
SEQ ID 6041 | TTGAGT SEQ ID 6290
SEQ ID 6042 | TTGATG SEQ ID 6291
SEQ ID 6043 | TTGCAC SEQ ID 6292
SEQ ID 6044 | TTGCAG SEQ ID 6293
SEQ ID 6045 | TTGCAT SEQ ID 6294
SEQ ID 6046 | TTGCCG SEQ ID 6295
SEQ ID 6047 | TTGCGA SEQ ID 6296
SEQ ID 6048 | TTGCGG SEQ ID 6297
SEQ ID 6049 | TTGCTA SEQ ID 6298
SEQ ID 6050 | TTGGCA SEQ ID 6299
SEQ ID 6051 | TTGGGC SEQ ID 6300
CA AG SEQ ID: 5803 | GTGGTC SEQ ID 6052 | TTGTAT SEQ ID 6301
CATTCA SEQ ID: 5804 | GTGGTT SEQ ID 6053 | TTGTCA SEQ ID 6302
CATTGA SEQ ID: 5805 | GTGTCT SEQ ID 6054 | TTGTCG SEQ ID 6303
CATTGC SEQ ID: 5806 | GTTAAC SEQ ID 6055 | TTGTGC SEQ ID 6304
CCACAA SEQ ID: 5807 | GTTACT SEQ ID 6056 | TTGTGT SEQ ID 6305
CCAGAT SEQ ID: 5808 | GTTAGA SEQ ID 6057 | TTGTTA SEQ ID 6306
CCATCA SEQ ID: 5809 | GTTAGC SEQ ID 6058 | TTGTTT SEQ ID 6307
CCATCC SEQ ID: 5810 | GTTATA SEQ ID 6059 | TTTACA SEQ ID 6308
CCATCG SEQ ID: 5811 | GTTATC SEQ ID 6060 | TTTAGG SEQ ID 6309
CCATTA SEQ ID: 5812 | GTTCGG SEQ ID 6061 | TTTATG SEQ ID 6310
CCCATC SEQ ID: 5813 | GTTGCG SEQ ID 6062 | TTTCAA SEQ ID 6311
CCCCTG SEQ ID: 5814 | GTTGTG SEQ ID 6063 | TTTCCT SEQ ID 6312
CCCGCA SEQ ID: 5815 | GTTTAT SEQ ID 6064 | TTTCGC SEQ ID 6313
CCGACA SEQ ID: 5816 | GTTTCA SEQ ID 6065 | TTTGAG SEQ ID 6314
CCGCTT SEQ ID: 5817 | GTTTGC SEQ ID 6066 | TTTGCG SEQ ID 6315
CCGGTT SEQ ID: 5818 | GTTTTG SEQ ID 6067 | TTTGCT SEQ ID 6316
CCGTAT SEQ ID: 5819 | TAAAAT SEQ ID 6068 | TTTGGC SEQ ID 6317
CCGTTT SEQ ID: 5820 | TAAACA SEQ ID 6069 | TTTTCC SEQ ID 6318
TTTTGC SEQ ID 6319
TTTTTA SEQ ID 6320
[0087] In some embodiments, a primer set includes at least one primer in which the hexamer sequence H is not a sequence selected from Table 4.
[0088] The primers can be labeled by any suitable molecule and/or label known in the art. Non-limiting examples of suitable labels include fluorescent tags suitable for use in Real Time PCR amplification, for example TaqMan™, SYBR™ Green, TAMRA™ and/or FAM probes, radiolabels, and so forth. In certain embodiments, the primer further comprises a detectable non-isotopic label selected from the group consisting of a fluorescent molecule, a chemiluminescent molecule, an enzyme, a cofactor, an enzyme substrate, and a hapten.
Generation of a Set of MEP Primers
[0089] The selection of hexamers for use in MEP primer sets, as well as the selection of MEP primers, is not microbe specific. It can depend, however, on the host or the host tissue, because these factors determine which host genes are highly expressed. In some embodiments, primers that amplify host ribosomal and mitochondrial sequences are removed from the initial set of 4,096 (4^6) random hexamers. For example, sequence data generated over the course of 800 high-throughput sequencing runs on the 454 sequencing platform were analyzed. These analyses show which host regions were amplified by the entire set of random hexamers. Primers that anneal to host ribosomal and mitochondrial sequences are then removed from the initial primer library to generate a second primer library. The second library comprises a set of hexamer-containing primers that enrich high-throughput sequencing reads for bacterial, viral, fungal, or parasite nucleic acids in a given organism.
[0090] In some embodiments, the primer library is generated as follows: (a) sequence reads representing all rRNA (12S, 16S, 18S, 28S) and mitochondrial RNA (ATP synthase, NADH dehydrogenase) present in 454 unbiased high-throughput sequencing (UHTS) experiments are selected; (b) all hexamers (e.g., 3,268 primers) that anneal to the host sequences identified in step (a) are excluded, except those that anneal to regions of rRNA and mitochondrial RNA predicted to form secondary structure; and (c) the remaining 828 hexamer primers, which do not bind to the host sequences identified in step (a), are synthesized.
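The counts in steps (b) and (c) reconcile with the size of the complete random-hexamer library. With four possible bases at each of six positions there are 4^6 candidate hexamers, so excluding the 3,268 host-annealing hexamers leaves the 828 that are synthesized:

```latex
\[
4^{6} = 4096, \qquad 4096 - 3268 = 828
\]
```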
[0091] In some embodiments, the performance of the MEP primers can be validated in real-time PCR experiments confirming depletion of host sequences, and in 454 pyrosequencing experiments. In still another embodiment, sensitivity for detection of viral sequences is increased more than 1000-fold.
Amplification of Microbial Nucleic Acids with MEP Primers
[0092] Microbial nucleic acids can be amplified as follows. Total RNA can be extracted from the sample using commercially available kits (e.g., Qiagen, Valencia, CA). Contaminating genomic DNA is removed, for example with DNase I (Qiagen, Valencia, CA) or by any other procedure or reagent that removes DNA but not RNA. The resulting RNA is reverse transcribed using the forward-strand microbial enrichment primer set (FS-MEP) to generate first-strand cDNA, as illustrated in Fig. 3A. The reverse transcriptase is then inactivated or removed. Next, second-strand cDNA synthesis is carried out with the reverse-strand microbial enrichment primer set (RS-MEP) in the presence of a polymerase, as illustrated in Fig. 3B. The resulting double-stranded cDNA is amplified, for example using Extend primers (Roche Diagnostics, Branchburg, NJ), as illustrated in Fig. 3C. Other suitable primers can also be used to amplify the double-stranded cDNA, as long as they hybridize to the Specific Tail sequence or a portion thereof under PCR conditions.
Detection of Microbial Nucleic Acids With MEP Primers
[0093] The nucleic acid sequences of the present disclosure are used as primers to amplify microbial nucleic acid in a sample from a host, without significantly amplifying the host's RNA in the sample. If the nucleic acid sequences successfully amplify nucleic acid in the sample, then the sample contains microbial nucleic acid, indicating that the sample is infected with microbes. This method is referred to herein as the "MEP protocol" or the "microbial enrichment primer protocol."
[0094] An exemplary method includes providing a sample from an organism containing RNA and reverse transcribing the RNA in the sample using FS-MEP primers to form a first cDNA strand. The method further includes generating a second cDNA strand that is complementary to the first cDNA strand using RS-MEP primers whose hexamer sequences are complementary to the hexamer sequences of the FS-MEP primers. The method further includes PCR amplifying the double-stranded cDNA using a first amplification primer having the same sequence as the specific tail (ST) of the FS-MEP primers and a second amplification primer having a sequence complementary to that specific tail, thereby generating a population of double-stranded cDNA products. The method further includes determining the presence or absence of the amplified cDNA products, wherein the presence of the amplified cDNA products indicates the presence of a microbial nucleic acid in the sample.
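The amplification primers in [0094] are defined relative to the Specific Tail. The following is a minimal Python sketch that assumes "complementary" means the reverse complement read 5' to 3'; the patent does not spell this out. It shows how the second amplification primer, and an RS-MEP hexamer matching a given FS-MEP hexamer, could be derived. `reverse_complement` is a hypothetical helper, not part of any published tool.

```python
# Complement table; degenerate N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; degenerate N is left as N."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Specific Tail (Extend primer) sequence as given in [00109].
SPECIFIC_TAIL = "GTTTCCCAGTAGGTCTC"

# First amplification primer: same sequence as the tail.
first_amplification_primer = SPECIFIC_TAIL
# Second amplification primer: complementary to the tail, written 5' to 3'.
second_amplification_primer = reverse_complement(SPECIFIC_TAIL)

# An RS-MEP hexamer complementary to the FS-MEP hexamer AACAAG.
print(reverse_complement("AACAAG"))   # CTTGTT
print(second_amplification_primer)    # GAGACCTACTGGGAAAC
```

Taking the reverse complement twice returns the original sequence. This gives a quick sanity check when tabulating paired FS-MEP and RS-MEP hexamers.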
[0095] In some embodiments, the PCR amplification is done using real-time PCR. In some embodiments, the first and second amplification primers are labeled. The primers can be labeled by any suitable molecule and/or label known in the art. Non-limiting examples of suitable labels include a radiolabel, a detectable non-isotopic label, a fluorescent molecule, a chemiluminescent molecule, an enzyme, a cofactor, an enzyme substrate, and a hapten.
[0096] In some embodiments, the method is used to detect RNA from a microbe selected from the group consisting of a bacterium, a virus, a fungus, and a parasite. In some embodiments, the microbe is a virus, an arenavirus, or Lujo virus. In some embodiments, the sample is from a subject, a mammal, or a human. In other embodiments, the sample is a biological sample. In some embodiments, the biological sample is from a tissue culture or a cell culture. In some embodiments, the sample is tissue, bodily fluids, blood, sperm, saliva, or cells. In some embodiments, the sample is from a host. Non-limiting examples of hosts include organisms, vertebrates, invertebrates, mammals, humans, dogs, cats, cattle, pigs, sheep, rabbits, mice, rats, birds, reptiles, amphibians, fish, insects, plants, tissue cultures, and cell cultures.
Method of Making Microbial Enrichment Primers
[0097] Once the MEP primers are designed, they can be synthesized using standard methods, such as solid-phase synthesis using the phosphoramidite method.
Microbial Enrichment Primer Kits
[0098] In some embodiments, the invention provides a kit comprising a set of MEP primers and instructions for use. In some embodiments, the set of MEP primers is a set of forward-strand MEP (FS-MEP) primers. In some embodiments, the set of MEP primers is a set of reverse-strand MEP (RS-MEP) primers. In some embodiments, the set of MEP primers is a set of forward-strand MEP (FS-MEP) and reverse-strand MEP (RS-MEP) primers.
EXAMPLES
[0099] This invention is further illustrated by the following example, which should not be construed as limiting. Those skilled in the art will recognize, or be able to ascertain, using no more than routine experimentation, numerous equivalents to the specific substances and procedures described herein. Such equivalents are intended to be encompassed in the scope of the claims that follow the examples below.
Example 1
Unbiased High Throughput Sequencing (UHTS) Experiments
[00100] Forty-five UHTS experiments from human tissue samples (retina, pancreas, liver, heart, lung, brain, plasma, serum, throat swab, cerebrospinal fluid, brain, liver, heart) were analyzed. The libraries were generated using random octamer primers according to the protocol previously described (2-3). Briefly, total RNA was extracted and digested with DNase I to eliminate human chromosomal DNA. RNA preparations were amplified by reverse transcriptase PCR (RT-PCR) using random primers. Amplification products were pooled and sequenced on the GS FLX platform (454 Life Sciences, Branford, CT), but DNA fragmentation was omitted (4).
Microbe Enrichment Primer (MEP) Design
[00101] Selection of the microbial enrichment primers is based on the analysis of sequence data generated over the course of 45 UHTS sequencing runs on the 454-Roche sequencing platform.
[00102] Step 1. For each library, sequence reads identified for human 18S (NR_003286) and 28S (NR_003287) rRNA and for mitochondrial 12S (NC_012920, nt 648-1601), 16S (NC_012920, nt 1671-3229), ATP synthase (NC_012920), and NADH dehydrogenase (ND2, ND4, ND6) (NC_012920) are selected and mapped to the reference sequences.
[00103] Step 2. The read distribution for each host gene listed in Step 1 is determined by coverage depth. The coverage depth for a given nucleotide in a reference sequence is evaluated as the number of reads that align to the reference sequence and cover this given nucleotide (Figs. 4A-B). Fig. 4A illustrates the read mapping procedure. Reads identified by 454 UHTS (Read 1 through 5) are aligned to the reference sequence (Ref. seq). The coverage depth (Depth) for a given nucleotide in the reference sequence is evaluated as the number of reads that align to the reference sequence and cover this given nucleotide. In this example, the coverage depth for a given nucleotide ranges from 0 to 5. Fig. 4B is an illustrative plot of the coverage depth along the reference sequence, which makes it possible to visualize the number of reads in each region.
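The coverage-depth definition above amounts to counting, at each reference position, how many aligned reads span it. The sketch below assumes each read's alignment has already been reduced to 0-based, half-open (start, end) coordinates. These coordinates are hypothetical, since the patent does not specify an aligner or file format. The five example reads are invented to mirror Fig. 4A, where depth ranges from 0 to 5.

```python
def coverage_depth(ref_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Number of aligned reads covering each reference position.

    Each alignment is a (start, end) pair in 0-based, half-open reference
    coordinates. A difference array records +1 where a read starts and -1
    where it ends; a running sum then gives the depth at every position.
    """
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, ref_length)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for pos in range(ref_length):
        running += diff[pos]
        depth.append(running)
    return depth

# Five hypothetical reads on a 12-nt reference (compare Fig. 4A).
reads = [(0, 6), (2, 8), (3, 10), (3, 7), (4, 9)]
print(coverage_depth(12, reads))  # [1, 1, 2, 4, 5, 5, 4, 3, 2, 1, 0, 0]
```

The difference array keeps the cost linear in the number of reads plus the reference length. This avoids incrementing every covered position separately for every read.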
[00104] Fig. 5 shows the predicted secondary structure of the human 28S ribosomal RNA sequence at 65 °C, computed with the RNAfold program from the Vienna RNA package (Hofacker et al., "Fast Folding and Comparison of RNA Secondary Structures," Monatshefte für Chemie 125, 167-188 (1994)). GC-rich regions form strong secondary-structure stems, and no reads were identified in these regions in 454 unbiased high-throughput sequencing experiments.
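The patent predicts secondary structure with RNAfold. A much cruder screen, which is not the method described here, is to scan a reference sequence for GC-rich windows, since the paragraph above associates GC-rich regions with strong stems and missing reads. In the sketch below, the function name, window size, step, and 75% cutoff are all illustrative assumptions.

```python
def gc_fraction_windows(seq: str, window: int = 20, step: int = 1) -> list[tuple[int, float]]:
    """Return (window_start, GC fraction) for each sliding window of `seq`.

    RNA input is accepted: U is treated as T. Windows that would run past
    the end of the sequence are not reported.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper().replace("U", "T")
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results

# Flag windows that are at least 75% GC (an illustrative cutoff).
windows = gc_fraction_windows("GGGCCCAUAU", window=4, step=2)
gc_rich = [start for start, frac in windows if frac >= 0.75]
print(gc_rich)  # [0, 2]
```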
[00105] A plot of the coverage depth as a function of the nucleotide position provides a quantitative measure of the non-uniformity of the read distribution. Figs. 6A-6D illustrate for all 45 UHTS experiments the normalized coverage depth for human genes 12S, 16S, 18S and 28S rRNA, respectively, when random primers are used. Figs. 7A-7D illustrate the raw coverage depth (number of reads) for the same experiments as Figs. 6A-6D, respectively. Identical analyses were conducted for mitochondrial host genes.
[00106] Step 3. Regions giving a large number of reads (arbitrary cutoff: >1% of relative coverage depth) were selected. Figs. 6A-6D and Table 7 show the regions selected for 12S, 16S, 18S and 28S. Table 7 summarizes the regions giving a large number of reads (>1% of relative coverage depth) for the ribosomal genes, and Figs. 6A-6D show the location of those regions. R1 to R6 indicate the different regions selected for each ribosomal gene.
TABLE 7
Gene name | Accession number | Regions selected
12S | NC_012920 | Complete gene
16S | NC_012920 | R1, R2, R3
18S | NC_012920 | R1, R2
28S | [illegible] | R1, R2, [illegible], R4, [illegible]
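Step 3 keeps regions whose relative coverage exceeds an arbitrary 1% cutoff. The patent does not define how coverage is normalized. The sketch below assumes that relative depth is each position's depth divided by the summed depth over the gene, and it returns contiguous runs above the cutoff as half-open (start, end) intervals. `select_regions` is a hypothetical name.

```python
def select_regions(depth: list[int], threshold: float = 0.01) -> list[tuple[int, int]]:
    """Return half-open (start, end) runs where relative depth > threshold.

    Assumption: relative depth = depth at a position / total depth over the
    gene. The patent does not state which normalization was used.
    """
    total = sum(depth)
    if total == 0:
        return []
    regions: list[tuple[int, int]] = []
    start = None
    for pos, d in enumerate(depth):
        above = d / total > threshold
        if above and start is None:
            start = pos                      # a high-coverage run begins
        elif not above and start is not None:
            regions.append((start, pos))     # the run ended at pos - 1
            start = None
    if start is not None:                    # run extends to the gene end
        regions.append((start, len(depth)))
    return regions

depth = [1, 1, 2, 4, 5, 5, 4, 3, 2, 1, 0, 0]
print(select_regions(depth, threshold=0.10))  # [(3, 8)]
```

If the normalization is instead relative to the peak depth, dividing by `max(depth)` in place of `sum(depth)` is a one-line change. Either choice is a guess at what "relative coverage depth" means here.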
[00107] Step 4. All possible hexamer sequences with perfect sequence matches to the host regions selected in Step 3 were excluded. Focusing on these specific host regions makes it possible to exclude only a minimal number of primers from the random hexamer library.
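Step 4 can be sketched as a set difference between all 4^6 hexamers and those with a perfect match in the selected regions. Which strand counts as a "match" is an assumption here:

- A primer that anneals to host RNA (as in reverse transcription with FS-MEP primers) is excluded when its reverse complement occurs in a region.
- A primer that anneals to first-strand cDNA (RS-MEP) is excluded when its own sequence occurs in a region.

The function names are hypothetical. This toy sketch is not expected to reproduce the primer counts reported in [00108].

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMP)[::-1]

def hexamers_in(seq: str, k: int = 6) -> set[str]:
    """All k-mers occurring in seq (RNA U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def enrichment_hexamers(regions: list[str], anneal_to_sense: bool = True) -> list[str]:
    """Hexamers with no perfect match to any selected host region.

    anneal_to_sense=True: the primer anneals to host RNA, so a hexamer is
    excluded when its reverse complement occurs in a region (assumed FS-MEP case).
    anneal_to_sense=False: the primer anneals to first-strand cDNA, so a
    hexamer is excluded when it occurs in a region as written (assumed RS-MEP case).
    """
    host: set[str] = set()
    for region in regions:
        kmers = hexamers_in(region)
        host |= {reverse_complement(h) for h in kmers} if anneal_to_sense else kmers
    library = {"".join(p) for p in product("ACGT", repeat=6)}  # 4**6 = 4096
    return sorted(library - host)

# Toy example: a poly-A host region contains only the hexamer AAAAAA.
fs = enrichment_hexamers(["AAAAAAA"])
print(len(fs), "TTTTTT" in fs)  # 4095 False
```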
[00108] The resulting microbe enrichment primer set is composed of 828 hexamers for the forward sense primers (FS-MEP) and 834 hexamers for the reverse sense primers (RS-MEP). Each primer consists of a hexamer followed by a random base (N) and a specific tail:
Hexamer + N + Specific Tail
[00109] The Specific Tail sequence (Extend primer) was GTTTCCCAGTAGGTCTC. Sequences of suitable hexamers are provided in Tables 1, 2 and 3.
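Table 8 lists each primer with the tail first, then the random base N, then the hexamer. The sketch below assumes this order runs 5' to 3', which would place the hexamer at the 3' end where it can prime. Under that assumption, a primer can be assembled from a hexamer as shown. The helper names are hypothetical. `expand_degenerate` simply lists the four concrete oligonucleotides that the single degenerate N position represents.

```python
SPECIFIC_TAIL = "GTTTCCCAGTAGGTCTC"  # Specific Tail (Extend primer) from [00109]

def mep_primer(hexamer: str, tail: str = SPECIFIC_TAIL) -> str:
    """Assemble tail + N + hexamer, written 5' to 3' as assumed for Table 8."""
    hexamer = hexamer.upper()
    if len(hexamer) != 6 or set(hexamer) - set("ACGT"):
        raise ValueError(f"not a DNA hexamer: {hexamer!r}")
    return f"{tail}N{hexamer}"

def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by N positions (A, C, G or T each)."""
    i = primer.find("N")
    if i == -1:
        return [primer]
    return [seq for base in "ACGT"
            for seq in expand_degenerate(primer[:i] + base + primer[i + 1:])]

primer = mep_primer("AACAAG")
print(primer)                          # GTTTCCCAGTAGGTCTCNAACAAG
print(len(expand_degenerate(primer)))  # 4
```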
[00110] The forward strand (FS-MEP) primer set used in this experiment is provided in Table 8.
TABLE 8
GTTTCCCAGTAGGTCTC N AACAAG SEQ ID NO: 6335
GTTTCCCAGTAGGTCTC N AACAGC SEQ ID NO: 6336
GTTTCCCAGTAGGTCTC N AACATA SEQ ID NO: 6337
GTTTCCCAGTAGGTCTC N AACCAC SEQ ID NO: 6338
GTTTCCCAGTAGGTCTC N AACCCC SEQ ID NO: 6339
GTTTCCCAGTAGGTCTC N AACCGG SEQ ID NO: 6340
GTTTCCCAGTAGGTCTC N AACCGT SEQ ID NO: 6341
GTTTCCCAGTAGGTCTC N AACGAC SEQ ID NO: 6342
GTTTCCCAGTAGGTCTC N AACGAG SEQ ID NO: 6343
GTTTCCCAGTAGGTCTC N AACGAT SEQ ID NO: 6344
GTTTCCCAGTAGGTCTC N AACGCG SEQ ID NO: 6345
GTTTCCCAGTAGGTCTC N AACGTA SEQ ID NO: 6346
GTTTCCCAGTAGGTCTC N AACGTG SEQ ID NO: 6347
GTTTCCCAGTAGGTCTC N AACGTT SEQ ID NO: 6348
GTTTCCCAGTAGGTCTC N AACTAT SEQ ID NO: 6349
GTTTCCCAGTAGGTCTC N AACTCG SEQ ID NO: 6350
GTTTCCCAGTAGGTCTC N AACTGA SEQ ID NO: 6351
GTTTCCCAGTAGGTCTC N AACTTA SEQ ID NO: 6352
GTTTCCCAGTAGGTCTC N AACTTG SEQ ID NO: 6353
GTTTCCCAGTAGGTCTC N AAGACA SEQ ID NO: 6354
GTTTCCCAGTAGGTCTC N AAGACG SEQ ID NO: 6355
GTTTCCCAGTAGGTCTC N AAGATA SEQ ID NO: 6356
GTTTCCCAGTAGGTCTC N AAGCAG SEQ ID NO: 6357
GTTTCCCAGTAGGTCTC N AAGCGC SEQ ID NO: 6358
GTTTCCCAGTAGGTCTC N AAGCGG SEQ ID NO: 6359
GTTTCCCAGTAGGTCTC N AAGGAA SEQ ID NO: 6360
GTTTCCCAGTAGGTCTC N AAGGCC SEQ ID NO: 6361
GTTTCCCAGTAGGTCTC N AAGGGA SEQ ID NO: 6362
GTTTCCCAGTAGGTCTC N AAGGGT SEQ ID NO: 6363
GTTTCCCAGTAGGTCTC N AAGTCA SEQ ID NO: 6364
GTTTCCCAGTAGGTCTC N AAGTCC SEQ ID NO: 6365
GTTTCCCAGTAGGTCTC N AAGTCT SEQ ID NO: 6366
GTTTCCCAGTAGGTCTC N AAGTGA SEQ ID NO: 6367
GTTTCCCAGTAGGTCTC N AAGTGT SEQ ID NO: 6368
GTTTCCCAGTAGGTCTC N AATAAG SEQ ID NO: 6369
GTTTCCCAGTAGGTCTC N AATACA SEQ ID NO: 6370
GTTTCCCAGTAGGTCTC N AATACC SEQ ID NO: 6371
GTTTCCCAGTAGGTCTC N AATATC SEQ ID NO: 6372
GTTTCCCAGTAGGTCTC N AATCAG SEQ ID NO: 6373
GTTTCCCAGTAGGTCTC N AATCCG SEQ ID NO: 6374
GTTTCCCAGTAGGTCTC N AATCTA SEQ ID NO: 6375
GTTTCCCAGTAGGTCTC N AATGGC SEQ ID NO: 6376
GTTTCCCAGTAGGTCTC N AATGTG SEQ ID NO: 6377
GTTTCCCAGTAGGTCTC N AATTCG SEQ ID NO: 6378
GTTTCCCAGTAGGTCTC N ACAACC SEQ ID NO: 6379
GTTTCCCAGTAGGTCTC N ACAACG SEQ ID NO: 6380
GTTTCCCAGTAGGTCTC N ACAAGA SEQ ID NO: 6381
GTTTCCCAGTAGGTCTC N ACAAGT SEQ ID NO: 6382
GTTTCCCAGTAGGTCTC N ACAATA SEQ ID NO: 6383
GTTTCCCAGTAGGTCTC N ACAATT SEQ ID NO: 6384
GTTTCCCAGTAGGTCTC N ACACAA SEQ ID NO: 6385
GTTTCCCAGTAGGTCTC N ACACAG SEQ ID NO: 6386
GTTTCCCAGTAGGTCTC N ACACGA SEQ ID NO: 6387
GTTTCCCAGTAGGTCTC N ACACTA SEQ ID NO: 6388
GTTTCCCAGTAGGTCTC N ACACTG SEQ ID NO: 6389
GTTTCCCAGTAGGTCTC N ACAGAG SEQ ID NO: 6390
GTTTCCCAGTAGGTCTC N ACAGCA SEQ ID NO: 6391
GTTTCCCAGTAGGTCTC N ACAGCC SEQ ID NO: 6392
GTTTCCCAGTAGGTCTC N ACAGCG SEQ ID NO: 6393
GTTTCCCAGTAGGTCTC N ACAGCT SEQ ID NO: 6394
GTTTCCCAGTAGGTCTC N ACAGGA SEQ ID NO: 6395
GTTTCCCAGTAGGTCTC N ACAGTA SEQ ID NO: 6396
GTTTCCCAGTAGGTCTC N ACATAA SEQ ID NO: 6397
GTTTCCCAGTAGGTCTC N ACATAT SEQ ID NO: 6398
GTTTCCCAGTAGGTCTC N ACATCC SEQ ID NO: 6399
GTTTCCCAGTAGGTCTC N ACATGA SEQ ID NO: 6400
GTTTCCCAGTAGGTCTC N ACATTT SEQ ID NO: 6401
GTTTCCCAGTAGGTCTC N ACCAAT SEQ ID NO: 6402
GTTTCCCAGTAGGTCTC N ACCACG SEQ ID NO: 6403
GTTTCCCAGTAGGTCTC N ACCACT SEQ ID NO: 6404
GTTTCCCAGTAGGTCTC N ACCAGG SEQ ID NO: 6405
GTTTCCCAGTAGGTCTC N ACCATT SEQ ID NO: 6406
GTTTCCCAGTAGGTCTC N ACCCCC SEQ ID NO: 6407
GTTTCCCAGTAGGTCTC N ACCCCT SEQ ID NO: 6408
GTTTCCCAGTAGGTCTC N ACCGAA SEQ ID NO: 6409
GTTTCCCAGTAGGTCTC N ACCGGT SEQ ID NO: 6410
GTTTCCCAGTAGGTCTC N ACCGTA SEQ ID NO: 6411
GTTTCCCAGTAGGTCTC N ACCTAG SEQ ID NO: 6412
GTTTCCCAGTAGGTCTC N ACCTCG SEQ ID NO: 6413
GTTTCCCAGTAGGTCTC N ACCTGA SEQ ID NO: 6414
GTTTCCCAGTAGGTCTC N ACGAAG SEQ ID NO: 6415
GTTTCCCAGTAGGTCTC N ACGACA SEQ ID NO: 6416
GTTTCCCAGTAGGTCTC N ACGAGA SEQ ID NO: 6417
GTTTCCCAGTAGGTCTC N ACGAGC SEQ ID NO: 6418
GTTTCCCAGTAGGTCTC N ACGAGT SEQ ID NO: 6419
GTTTCCCAGTAGGTCTC N ACGATA SEQ ID NO: 6420
GTTTCCCAGTAGGTCTC N ACGATC SEQ ID NO: 6421
GTTTCCCAGTAGGTCTC N ACGATG SEQ ID NO: 6422
GTTTCCCAGTAGGTCTC N ACGATT SEQ ID NO: 6423
GTTTCCCAGTAGGTCTC N ACGCGA SEQ ID NO: 6424
GTTTCCCAGTAGGTCTC N ACGCGG SEQ ID NO: 6425
GTTTCCCAGTAGGTCTC N ACGCGT SEQ ID NO: 6426
GTTTCCCAGTAGGTCTC N ACGCTA SEQ ID NO: 6427
GTTTCCCAGTAGGTCTC N ACGGCA SEQ ID NO: 6428
GTTTCCCAGTAGGTCTC N ACGGCT SEQ ID NO: 6429
GTTTCCCAGTAGGTCTC N ACGGTG SEQ ID NO: 6430
GTTTCCCAGTAGGTCTC N ACGTAA SEQ ID NO: 6431
GTTTCCCAGTAGGTCTC N ACGTAC SEQ ID NO: 6432
GTTTCCCAGTAGGTCTC N ACGTAT SEQ ID NO: 6433
GTTTCCCAGTAGGTCTC N ACGTCC SEQ ID NO: 6434
GTTTCCCAGTAGGTCTC N ACGTGA SEQ ID NO: 6435
GTTTCCCAGTAGGTCTC N ACGTGT SEQ ID NO: 6436
GTTTCCCAGTAGGTCTC N ACGTTA SEQ ID NO: 6437
GTTTCCCAGTAGGTCTC N ACGTTG SEQ ID NO: 6438
GTTTCCCAGTAGGTCTC N ACTACA SEQ ID NO: 6439
GTTTCCCAGTAGGTCTC N ACTACT SEQ ID NO: 6440
GTTTCCCAGTAGGTCTC N ACTAGC SEQ ID NO: 6441
GTTTCCCAGTAGGTCTC N ACTCAA SEQ ID NO: 6442
GTTTCCCAGTAGGTCTC N ACTCCA SEQ ID NO: 6443
GTTTCCCAGTAGGTCTC N ACTCGA SEQ ID NO: 6444
GTTTCCCAGTAGGTCTC N ACTCGT SEQ ID NO: 6445
GTTTCCCAGTAGGTCTC N ACTCTG SEQ ID NO: 6446
GTTTCCCAGTAGGTCTC N ACTGAT SEQ ID NO: 6447
GTTTCCCAGTAGGTCTC N ACTGCA SEQ ID NO: 6448
GTTTCCCAGTAGGTCTC N ACTGGC SEQ ID NO: 6449
GTTTCCCAGTAGGTCTC N ACTGTA SEQ ID NO: 6450
GTTTCCCAGTAGGTCTC N ACTGTC SEQ ID NO: 6451
GTTTCCCAGTAGGTCTC N ACTTCA SEQ ID NO: 6452
GTTTCCCAGTAGGTCTC N ACTTCT SEQ ID NO: 6453
GTTTCCCAGTAGGTCTC N AGAAGC SEQ ID NO: 6454
GTTTCCCAGTAGGTCTC N AGAATC SEQ ID NO: 6455
GTTTCCCAGTAGGTCTC N AGACAC SEQ ID NO: 6456
GTTTCCCAGTAGGTCTC N AGACAG SEQ ID NO: 6457
GTTTCCCAGTAGGTCTC N AGACAT SEQ ID NO: 6458
GTTTCCCAGTAGGTCTC N AGACCA SEQ ID NO: 6459
GTTTCCCAGTAGGTCTC N AGACCG SEQ ID NO: 6460
GTTTCCCAGTAGGTCTC N AGACGA SEQ ID NO: 6461
GTTTCCCAGTAGGTCTC N AGACGC SEQ ID NO: 6462
GTTTCCCAGTAGGTCTC N AGACTC SEQ ID NO: 6463
GTTTCCCAGTAGGTCTC N AGACTG SEQ ID NO: 6464
GTTTCCCAGTAGGTCTC N AGAGAC SEQ ID NO: 6465
GTTTCCCAGTAGGTCTC N AGATAA SEQ ID NO: 6466
GTTTCCCAGTAGGTCTC N AGATAC SEQ ID NO: 6467
GTTTCCCAGTAGGTCTC N AGATCC SEQ ID NO: 6468
GTTTCCCAGTAGGTCTC N AGATGC SEQ ID NO: 6469
GTTTCCCAGTAGGTCTC N AGATGT SEQ ID NO: 6470
GTTTCCCAGTAGGTCTC N AGATTA SEQ ID NO: 6471
GTTTCCCAGTAGGTCTC N AGATTC SEQ ID NO: 6472
GTTTCCCAGTAGGTCTC N AGCAAA SEQ ID NO: 6473
GTTTCCCAGTAGGTCTC N AGCAAT SEQ ID NO: 6474
GTTTCCCAGTAGGTCTC N AGCACG SEQ ID NO: 6475
GTTTCCCAGTAGGTCTC N AGCAGA SEQ ID NO: 6476
GTTTCCCAGTAGGTCTC N AGCAGC SEQ ID NO: 6477
GTTTCCCAGTAGGTCTC N AGCATT SEQ ID NO: 6478
GTTTCCCAGTAGGTCTC N AGCCAT SEQ ID NO: 6479
GTTTCCCAGTAGGTCTC N AGCCTC SEQ ID NO: 6480
GTTTCCCAGTAGGTCTC N AGCCTG SEQ ID NO: 6481
GTTTCCCAGTAGGTCTC N AGCGCA SEQ ID NO: 6482
GTTTCCCAGTAGGTCTC N AGCGGT SEQ ID NO: 6483
GTTTCCCAGTAGGTCTC N AGCGTA SEQ ID NO: 6484
GTTTCCCAGTAGGTCTC N AGCGTG SEQ ID NO: 6485
GTTTCCCAGTAGGTCTC N AGCTCC SEQ ID NO: 6486
GTTTCCCAGTAGGTCTC N AGGAAA SEQ ID NO: 6487
GTTTCCCAGTAGGTCTC N AGGAAC SEQ ID NO: 6488
GTTTCCCAGTAGGTCTC N AGGACG SEQ ID NO: 6489
GTTTCCCAGTAGGTCTC N AGGAGC SEQ ID NO: 6490
GTTTCCCAGTAGGTCTC N AGGCCG SEQ ID NO: 6491
GTTTCCCAGTAGGTCTC N AGGTAA SEQ ID NO: 6492
GTTTCCCAGTAGGTCTC N AGGTAC SEQ ID NO: 6493
GTTTCCCAGTAGGTCTC N AGGTCT SEQ ID NO: 6494
GTTTCCCAGTAGGTCTC N AGGTGC SEQ ID NO: 6495
GTTTCCCAGTAGGTCTC N AGTAAC SEQ ID NO: 6496
GTTTCCCAGTAGGTCTC N AGTAAT SEQ ID NO: 6497
GTTTCCCAGTAGGTCTC N AGTACC SEQ ID NO: 6498
GTTTCCCAGTAGGTCTC N AGTACG SEQ ID NO: 6499
GTTTCCCAGTAGGTCTC N AGTATC SEQ ID NO: 6500
GTTTCCCAGTAGGTCTC N AGTATT SEQ ID NO: 6501
GTTTCCCAGTAGGTCTC N AGTCCA SEQ ID NO: 6502
GTTTCCCAGTAGGTCTC N AGTCCC SEQ ID NO: 6503
GTTTCCCAGTAGGTCTC N AGTCGA SEQ ID NO: 6504
GTTTCCCAGTAGGTCTC N AGTCGC SEQ ID NO: 6505
GTTTCCCAGTAGGTCTC N AGTCGT SEQ ID NO: 6506
GTTTCCCAGTAGGTCTC N AGTCTG SEQ ID NO: 6507
GTTTCCCAGTAGGTCTC N AGTGAA SEQ ID NO: 6508
GTTTCCCAGTAGGTCTC N AGTGAC SEQ ID NO: 6509
GTTTCCCAGTAGGTCTC N AGTGAT SEQ ID NO: 6510
GTTTCCCAGTAGGTCTC N AGTGCC SEQ ID NO: 6511
GTTTCCCAGTAGGTCTC N AGTGCT SEQ ID NO: 6512
GTTTCCCAGTAGGTCTC N AGTGGT SEQ ID NO: 6513
GTTTCCCAGTAGGTCTC N AGTTCG SEQ ID NO: 6514
GTTTCCCAGTAGGTCTC N AGTTGC SEQ ID NO: 6515
GTTTCCCAGTAGGTCTC N ATAAAC SEQ ID NO: 6516
GTTTCCCAGTAGGTCTC N ATAAAG SEQ ID NO: 6517
GTTTCCCAGTAGGTCTC N ATAACA SEQ ID NO: 6518
GTTTCCCAGTAGGTCTC N ATAACC SEQ ID NO: 6519
GTTTCCCAGTAGGTCTC N ATAACT SEQ ID NO: 6520
GTTTCCCAGTAGGTCTC N ATAAGC SEQ ID NO: 6521
GTTTCCCAGTAGGTCTC N ATACAA SEQ ID NO: 6522
GTTTCCCAGTAGGTCTC N ATACAG SEQ ID NO: 6523
GTTTCCCAGTAGGTCTC N ATACAT SEQ ID NO: 6524
GTTTCCCAGTAGGTCTC N ATACCG SEQ ID NO: 6525
GTTTCCCAGTAGGTCTC N ATACCT SEQ ID NO: 6526
GTTTCCCAGTAGGTCTC N ATACGC SEQ ID NO: 6527
GTTTCCCAGTAGGTCTC N ATACGG SEQ ID NO: 6528
GTTTCCCAGTAGGTCTC N ATACGT SEQ ID NO: 6529
GTTTCCCAGTAGGTCTC N ATAGCA SEQ ID NO: 6530
GTTTCCCAGTAGGTCTC N ATAGTC SEQ ID NO: 6531
GTTTCCCAGTAGGTCTC N ATATAC SEQ ID NO: 6532
GTTTCCCAGTAGGTCTC N ATATCC SEQ ID NO: 6533
GTTTCCCAGTAGGTCTC N ATATCG SEQ ID NO: 6534
GTTTCCCAGTAGGTCTC N ATATTC SEQ ID NO: 6535
GTTTCCCAGTAGGTCTC N ATCACC SEQ ID NO: 6536
GTTTCCCAGTAGGTCTC N ATCAGC SEQ ID NO: 6537
GTTTCCCAGTAGGTCTC N ATCATA SEQ ID NO: 6538
GTTTCCCAGTAGGTCTC N ATCCGT SEQ ID NO: 6539
GTTTCCCAGTAGGTCTC N ATCGAC SEQ ID NO: 6540
GTTTCCCAGTAGGTCTC N ATCGCA SEQ ID NO: 6541
GTTTCCCAGTAGGTCTC N ATCTAC SEQ ID NO: 6542
GTTTCCCAGTAGGTCTC N ATCTCA SEQ ID NO: 6543
GTTTCCCAGTAGGTCTC N ATCTGG SEQ ID NO: 6544
GTTTCCCAGTAGGTCTC N ATCTTC SEQ ID NO: 6545
GTTTCCCAGTAGGTCTC N ATCTTG SEQ ID NO: 6546
GTTTCCCAGTAGGTCTC N ATGACA SEQ ID NO: 6547
GTTTCCCAGTAGGTCTC N ATGAGG SEQ ID NO: 6548
GTTTCCCAGTAGGTCTC N ATGCAA SEQ ID NO: 6549
GTTTCCCAGTAGGTCTC N ATGCAT SEQ ID NO: 6550
GTTTCCCAGTAGGTCTC N ATGCCG SEQ ID NO: 6551
GTTTCCCAGTAGGTCTC N ATGCGA SEQ ID NO: 6552
GTTTCCCAGTAGGTCTC N ATGCTC SEQ ID NO: 6553
GTTTCCCAGTAGGTCTC N ATGCTG SEQ ID NO: 6554
GTTTCCCAGTAGGTCTC N ATGGAC SEQ ID NO: 6555
GTTTCCCAGTAGGTCTC N ATGGTG SEQ ID NO: 6556
GTTTCCCAGTAGGTCTC N ATGTAA SEQ ID NO: 6557
GTTTCCCAGTAGGTCTC N ATGTAC SEQ ID NO: 6558
GTTTCCCAGTAGGTCTC N ATGTCA SEQ ID NO: 6559
GTTTCCCAGTAGGTCTC N ATGTCG SEQ ID NO: 6560
GTTTCCCAGTAGGTCTC N ATGTGC SEQ ID NO: 6561
GTTTCCCAGTAGGTCTC N ATGTTG SEQ ID NO: 6562
GTTTCCCAGTAGGTCTC N ATTACG SEQ ID NO: 6563
GTTTCCCAGTAGGTCTC N ATTATC SEQ ID NO: 6564
GTTTCCCAGTAGGTCTC N ATTCAA SEQ ID NO: 6565
GTTTCCCAGTAGGTCTC N ATTCAC SEQ ID NO: 6566
GTTTCCCAGTAGGTCTC N ATTCGA SEQ ID NO: 6567
GTTTCCCAGTAGGTCTC N ATTCTC SEQ ID NO: 6568
GTTTCCCAGTAGGTCTC N ATTGAT SEQ ID NO: 6569
GTTTCCCAGTAGGTCTC N ATTGCC SEQ ID NO: 6570
GTTTCCCAGTAGGTCTC N ATTGGA SEQ ID NO: 6571
GTTTCCCAGTAGGTCTC N ATTTCC SEQ ID NO: 6572
GTTTCCCAGTAGGTCTC N ATTTCG SEQ ID NO: 6573
GTTTCCCAGTAGGTCTC N ATTTTA SEQ ID NO: 6574
GTTTCCCAGTAGGTCTC N CAAAAC SEQ ID NO: 6575
GTTTCCCAGTAGGTCTC N CAAACA SEQ ID NO: 6576
GTTTCCCAGTAGGTCTC N CAAATA SEQ ID NO: 6577
GTTTCCCAGTAGGTCTC N CAACCT SEQ ID NO: 6578
GTTTCCCAGTAGGTCTC N CAACGA SEQ ID NO: 6579
GTTTCCCAGTAGGTCTC N CAACGT SEQ ID NO: 6580
GTTTCCCAGTAGGTCTC N CAAGGA SEQ ID NO: 6581
GTTTCCCAGTAGGTCTC N CAAGTC SEQ ID NO: 6582
GTTTCCCAGTAGGTCTC N CAAGTT SEQ ID NO: 6583
GTTTCCCAGTAGGTCTC N CAATAT SEQ ID NO: 6584
GTTTCCCAGTAGGTCTC N CAATCA SEQ ID NO: 6585
GTTTCCCAGTAGGTCTC N CAATGC SEQ ID NO: 6586
GTTTCCCAGTAGGTCTC N CAATGT SEQ ID NO: 6587
GTTTCCCAGTAGGTCTC N CACAAC SEQ ID NO: 6588
GTTTCCCAGTAGGTCTC N CACACA SEQ ID NO: 6589
GTTTCCCAGTAGGTCTC N CACAGC SEQ ID NO: 6590
GTTTCCCAGTAGGTCTC N CACAGG SEQ ID NO: 6591
GTTTCCCAGTAGGTCTC N CACAGT SEQ ID NO: 6592
GTTTCCCAGTAGGTCTC N CACATT SEQ ID NO: 6593
GTTTCCCAGTAGGTCTC N CACCCC SEQ ID NO: 6594
GTTTCCCAGTAGGTCTC N CACCGA SEQ ID NO: 6595
GTTTCCCAGTAGGTCTC N CACCTA SEQ ID NO: 6596
GTTTCCCAGTAGGTCTC N CACGAG SEQ ID NO: 6597
GTTTCCCAGTAGGTCTC N CACGAT SEQ ID NO: 6598
GTTTCCCAGTAGGTCTC N CACTGA SEQ ID NO: 6599
GTTTCCCAGTAGGTCTC N CAGAAT SEQ ID NO: 6600
GTTTCCCAGTAGGTCTC N CAGAGA SEQ ID NO: 6601
GTTTCCCAGTAGGTCTC N CAGCAA SEQ ID NO: 6602
GTTTCCCAGTAGGTCTC N CAGCAC SEQ ID NO: 6603
GTTTCCCAGTAGGTCTC N CAGCAG SEQ ID NO: 6604
GTTTCCCAGTAGGTCTC N CAGCAT SEQ ID NO: 6605
GTTTCCCAGTAGGTCTC N CAGCCT SEQ ID NO: 6606
GTTTCCCAGTAGGTCTC N CAGCGT SEQ ID NO: 6607
GTTTCCCAGTAGGTCTC N CAGCTG SEQ ID NO: 6608
GTTTCCCAGTAGGTCTC N CAGGAA SEQ ID NO: 6609
GTTTCCCAGTAGGTCTC N CAGGCC SEQ ID NO: 6610
GTTTCCCAGTAGGTCTC N CAGGTA SEQ ID NO: 6611
GTTTCCCAGTAGGTCTC N CAGTAA SEQ ID NO: 6612
GTTTCCCAGTAGGTCTC N CAGTAT SEQ ID NO: 6613
GTTTCCCAGTAGGTCTC N CAGTCC SEQ ID NO: 6614
GTTTCCCAGTAGGTCTC N CAGTCT SEQ ID NO: 6615
GTTTCCCAGTAGGTCTC N CAGTGC SEQ ID NO: 6616
GTTTCCCAGTAGGTCTC N CAGTTG SEQ ID NO: 6617
GTTTCCCAGTAGGTCTC N CATAAA SEQ ID NO: 6618
GTTTCCCAGTAGGTCTC N CATAAC SEQ ID NO: 6619
GTTTCCCAGTAGGTCTC N CATACC SEQ ID NO: 6620
GTTTCCCAGTAGGTCTC N CATACG SEQ ID NO: 6621
GTTTCCCAGTAGGTCTC N CATAGG SEQ ID NO: 6622
GTTTCCCAGTAGGTCTC N CATATA SEQ ID NO: 6623
GTTTCCCAGTAGGTCTC N CATATC SEQ ID NO: 6624
GTTTCCCAGTAGGTCTC N CATATG SEQ ID NO: 6625
GTTTCCCAGTAGGTCTC N CATCAA SEQ ID NO: 6626
GTTTCCCAGTAGGTCTC N CATCCG SEQ ID NO: 6627
GTTTCCCAGTAGGTCTC N CATCTG SEQ ID NO: 6628
GTTTCCCAGTAGGTCTC N CATGAA SEQ ID NO: 6629
GTTTCCCAGTAGGTCTC N CATGAC SEQ ID NO: 6630
GTTTCCCAGTAGGTCTC N CATGAG SEQ ID NO: 6631
GTTTCCCAGTAGGTCTC N CATGAT SEQ ID NO: 6632
GTTTCCCAGTAGGTCTC N CATGCT SEQ ID NO: 6633
GTTTCCCAGTAGGTCTC N CATTAA SEQ ID NO: 6634
GTTTCCCAGTAGGTCTC N CATTAC SEQ ID NO: 6635
GTTTCCCAGTAGGTCTC N CCAATA SEQ ID NO: 6636
GTTTCCCAGTAGGTCTC N CCACAT SEQ ID NO: 6637
GTTTCCCAGTAGGTCTC N CCACGT SEQ ID NO: 6638
GTTTCCCAGTAGGTCTC N CCACTG SEQ ID NO: 6639
GTTTCCCAGTAGGTCTC N CCAGAA SEQ ID NO: 6640
GTTTCCCAGTAGGTCTC N CCAGAT SEQ ID NO: 6641
GTTTCCCAGTAGGTCTC N CCAGCA SEQ ID NO: 6642
GTTTCCCAGTAGGTCTC N CCAGGA SEQ ID NO: 6643
GTTTCCCAGTAGGTCTC N CCAGTG SEQ ID NO: 6644
GTTTCCCAGTAGGTCTC N CCATAA SEQ ID NO: 6645
GTTTCCCAGTAGGTCTC N CCATAT SEQ ID NO: 6646
GTTTCCCAGTAGGTCTC N CCATGA SEQ ID NO: 6647
GTTTCCCAGTAGGTCTC N CCCAAC SEQ ID NO: 6648
GTTTCCCAGTAGGTCTC N CCCAAT SEQ ID NO: 6649
GTTTCCCAGTAGGTCTC N CCCAGA SEQ ID NO: 6650
GTTTCCCAGTAGGTCTC N CCCATA SEQ ID NO: 6651
GTTTCCCAGTAGGTCTC N CCCCCT SEQ ID NO: 6652
GTTTCCCAGTAGGTCTC N CCCCTT SEQ ID NO: 6653
GTTTCCCAGTAGGTCTC N CCCGGT SEQ ID NO: 6654
GTTTCCCAGTAGGTCTC N CCCGTA SEQ ID NO: 6655
GTTTCCCAGTAGGTCTC N CCCTAA SEQ ID NO: 6656
GTTTCCCAGTAGGTCTC N CCCTAC SEQ ID NO: 6657
GTTTCCCAGTAGGTCTC N CCCTAG SEQ ID NO: 6658
GTTTCCCAGTAGGTCTC N CCCTCA SEQ ID NO: 6659
GTTTCCCAGTAGGTCTC N CCGAAC SEQ ID NO: 6660
GTTTCCCAGTAGGTCTC N CCGAAT SEQ ID NO: 6661
GTTTCCCAGTAGGTCTC N CCGAGC SEQ ID NO: 6662
GTTTCCCAGTAGGTCTC N CCGATA SEQ ID NO: 6663
GTTTCCCAGTAGGTCTC N CCGATG SEQ ID NO: 6664
GTTTCCCAGTAGGTCTC N CCGCAA SEQ ID NO: 6665
GTTTCCCAGTAGGTCTC N CCGCGT SEQ ID NO: 6666
GTTTCCCAGTAGGTCTC N CCGGCA SEQ ID NO: 6667
GTTTCCCAGTAGGTCTC N CCGGTT SEQ ID NO: 6668
GTTTCCCAGTAGGTCTC N CCGTAC SEQ ID NO: 6669
GTTTCCCAGTAGGTCTC N CCGTAG SEQ ID NO: 6670
GTTTCCCAGTAGGTCTC N CCGTGA SEQ ID NO: 6671
GTTTCCCAGTAGGTCTC N CCGTTG SEQ ID NO: 6672
GTTTCCCAGTAGGTCTC N CCTAAA SEQ ID NO: 6673
GTTTCCCAGTAGGTCTC N CCTAAG SEQ ID NO: 6674
GTTTCCCAGTAGGTCTC N CCTAAT SEQ ID NO: 6675
GTTTCCCAGTAGGTCTC N CCTACC SEQ ID NO: 6676
GTTTCCCAGTAGGTCTC N CCTACG SEQ ID NO: 6677
GTTTCCCAGTAGGTCTC N CCTAGA SEQ ID NO: 6678
GTTTCCCAGTAGGTCTC N CCTATC SEQ ID NO: 6679
GTTTCCCAGTAGGTCTC N CCTGAA SEQ ID NO: 6680
GTTTCCCAGTAGGTCTC N CCTGAC SEQ ID NO: 6681
GTTTCCCAGTAGGTCTC N CCTGCA SEQ ID NO: 6682
GTTTCCCAGTAGGTCTC N CCTGGC SEQ ID NO: 6683
GTTTCCCAGTAGGTCTC N CGAAAC SEQ ID NO: 6684
GTTTCCCAGTAGGTCTC N CGAACG SEQ ID NO: 6685
GTTTCCCAGTAGGTCTC N CGAACT SEQ ID NO: 6686
GTTTCCCAGTAGGTCTC N CGAAGA SEQ ID NO: 6687
GTTTCCCAGTAGGTCTC N CGAAGC SEQ ID NO: 6688
GTTTCCCAGTAGGTCTC N CGAATA SEQ ID NO: 6689
GTTTCCCAGTAGGTCTC N CGAATC SEQ ID NO: 6690
GTTTCCCAGTAGGTCTC N CGACAA SEQ ID NO: 6691
GTTTCCCAGTAGGTCTC N CGACAG SEQ ID NO: 6692
GTTTCCCAGTAGGTCTC N CGACCA SEQ ID NO: 6693
GTTTCCCAGTAGGTCTC N CGACGA SEQ ID NO: 6694
GTTTCCCAGTAGGTCTC N CGAGCG SEQ ID NO: 6695
GTTTCCCAGTAGGTCTC N CGAGTA SEQ ID NO: 6696
GTTTCCCAGTAGGTCTC N CGAGTC SEQ ID NO: 6697
GTTTCCCAGTAGGTCTC N CGAGTG SEQ ID NO: 6698
GTTTCCCAGTAGGTCTC N CGATAA SEQ ID NO: 6699
GTTTCCCAGTAGGTCTC N CGATAC SEQ ID NO: 6700
GTTTCCCAGTAGGTCTC N CGATAG SEQ ID NO: 6701
GTTTCCCAGTAGGTCTC N CGATAT SEQ ID NO: 6702
GTTTCCCAGTAGGTCTC N CGATCT SEQ ID NO: 6703
GTTTCCCAGTAGGTCTC N CGATGT SEQ ID NO: 6704
GTTTCCCAGTAGGTCTC N CGATTC SEQ ID NO: 6705
GTTTCCCAGTAGGTCTC N CGATTG SEQ ID NO: 6706
GTTTCCCAGTAGGTCTC N CGCAAA SEQ ID NO: 6707
GTTTCCCAGTAGGTCTC N CGCAAC SEQ ID NO: 6708
GTTTCCCAGTAGGTCTC N CGCACG SEQ ID NO: 6709
GTTTCCCAGTAGGTCTC N CGCAGA SEQ ID NO: 6710
GTTTCCCAGTAGGTCTC N CGCAGT SEQ ID NO: 6711
GTTTCCCAGTAGGTCTC N CGCATA SEQ ID NO: 6712
GTTTCCCAGTAGGTCTC N CGCATG SEQ ID NO: 6713
GTTTCCCAGTAGGTCTC N CGCGCA SEQ ID NO: 6714
GTTTCCCAGTAGGTCTC N CGCTAA SEQ ID NO: 6715
GTTTCCCAGTAGGTCTC N CGCTAG SEQ ID NO: 6716
GTTTCCCAGTAGGTCTC N CGCTGT SEQ ID NO: 6717
GTTTCCCAGTAGGTCTC N CGGAAA SEQ ID NO: 6718
GTTTCCCAGTAGGTCTC N CGGAGC SEQ ID NO: 6719
GTTTCCCAGTAGGTCTC N CGGAGT SEQ ID NO: 6720
GTTTCCCAGTAGGTCTC N CGGATG SEQ ID NO: 6721
GTTTCCCAGTAGGTCTC N CGGCAC SEQ ID NO: 6722
GTTTCCCAGTAGGTCTC N CGGGCA SEQ ID NO: 6723
GTTTCCCAGTAGGTCTC N CGGTAA SEQ ID NO: 6724
GTTTCCCAGTAGGTCTC N CGGTCA SEQ ID NO: 6725
GTTTCCCAGTAGGTCTC N CGGTGA SEQ ID NO: 6726
GTTTCCCAGTAGGTCTC N CGGTTA SEQ ID NO: 6727
GTTTCCCAGTAGGTCTC N CGTAAG SEQ ID NO: 6728
GTTTCCCAGTAGGTCTC N CGTAAT SEQ ID NO: 6729
GTTTCCCAGTAGGTCTC N CGTACC SEQ ID NO: 6730
GTTTCCCAGTAGGTCTC N CGTACG SEQ ID NO: 6731
GTTTCCCAGTAGGTCTC N CGTAGA SEQ ID NO: 6732
GTTTCCCAGTAGGTCTC N CGTATA SEQ ID NO: 6733
GTTTCCCAGTAGGTCTC N CGTATG SEQ ID NO: 6734
GTTTCCCAGTAGGTCTC N CGTCAT SEQ ID NO: 6735
GTTTCCCAGTAGGTCTC N CGTCCG SEQ ID NO: 6736
GTTTCCCAGTAGGTCTC N CGTCGA SEQ ID NO: 6737
GTTTCCCAGTAGGTCTC N CGTCGT SEQ ID NO: 6738
GTTTCCCAGTAGGTCTC N CGTCTA SEQ ID NO: 6739
GTTTCCCAGTAGGTCTC N CGTCTC SEQ ID NO: 6740
GTTTCCCAGTAGGTCTC N CGTGAA SEQ ID NO: 6741
GTTTCCCAGTAGGTCTC N CGTGAC SEQ ID NO: 6742
GTTTCCCAGTAGGTCTC N CGTGAT SEQ ID NO: 6743
GTTTCCCAGTAGGTCTC N CGTGGA SEQ ID NO: 6744
GTTTCCCAGTAGGTCTC N CGTGTA SEQ ID NO: 6745
GTTTCCCAGTAGGTCTC N CGTTAG SEQ ID NO: 6746
GTTTCCCAGTAGGTCTC N CGTTGC SEQ ID NO: 6747
GTTTCCCAGTAGGTCTC N CGTTGG SEQ ID NO: 6748
GTTTCCCAGTAGGTCTC N CGTTGT SEQ ID NO: 6749
GTTTCCCAGTAGGTCTC N CTAACT SEQ ID NO: 6750
GTTTCCCAGTAGGTCTC N CTAATG SEQ ID NO: 6751
GTTTCCCAGTAGGTCTC N CTACAG SEQ ID NO: 6752
GTTTCCCAGTAGGTCTC N CTACCC SEQ ID NO: 6753
GTTTCCCAGTAGGTCTC N CTACGC SEQ ID NO: 6754
GTTTCCCAGTAGGTCTC N CTACGT SEQ ID NO: 6755
GTTTCCCAGTAGGTCTC N CTACTG SEQ ID NO: 6756
GTTTCCCAGTAGGTCTC N CTACTT SEQ ID NO: 6757
GTTTCCCAGTAGGTCTC N CTAGAC SEQ ID NO: 6758
GTTTCCCAGTAGGTCTC N CTAGCA SEQ ID NO: 6759
GTTTCCCAGTAGGTCTC N CTAGCC SEQ ID NO: 6760
GTTTCCCAGTAGGTCTC N CTATCT SEQ ID NO: 6761
GTTTCCCAGTAGGTCTC N CTATGC SEQ ID NO: 6762
GTTTCCCAGTAGGTCTC N CTCAAA SEQ ID NO: 6763
GTTTCCCAGTAGGTCTC N CTCAAG SEQ ID NO: 6764
GTTTCCCAGTAGGTCTC N CTCATC SEQ ID NO: 6765
GTTTCCCAGTAGGTCTC N CTCCAT SEQ ID NO: 6766
GTTTCCCAGTAGGTCTC N CTCCGT SEQ ID NO: 6767
GTTTCCCAGTAGGTCTC N CTCCTA SEQ ID NO: 6768
GTTTCCCAGTAGGTCTC N CTCGAG SEQ ID NO: 6769
GTTTCCCAGTAGGTCTC N CTCGCT SEQ ID NO: 6770
GTTTCCCAGTAGGTCTC N CTCGGA SEQ ID NO: 6771
GTTTCCCAGTAGGTCTC N CTCGGT SEQ ID NO: 6772
GTTTCCCAGTAGGTCTC N CTCTGA SEQ ID NO: 6773
GTTTCCCAGTAGGTCTC N CTCTGC SEQ ID NO: 6774
GTTTCCCAGTAGGTCTC N CTCTGT SEQ ID NO: 6775
GTTTCCCAGTAGGTCTC N CTGAGA SEQ ID NO: 6776
GTTTCCCAGTAGGTCTC N CTGAGT SEQ ID NO: 6777
GTTTCCCAGTAGGTCTC N CTGCAA SEQ ID NO: 6778
GTTTCCCAGTAGGTCTC N CTGCAG SEQ ID NO: 6779
GTTTCCCAGTAGGTCTC N CTGCCA SEQ ID NO: 6780
GTTTCCCAGTAGGTCTC N CTGGCC SEQ ID NO: 6781
GTTTCCCAGTAGGTCTC N CTGGGA SEQ ID NO: 6782
GTTTCCCAGTAGGTCTC N CTGTAA SEQ ID NO: 6783
GTTTCCCAGTAGGTCTC N CTGTAC SEQ ID NO: 6784
GTTTCCCAGTAGGTCTC N CTGTAG SEQ ID NO: 6785
GTTTCCCAGTAGGTCTC N CTGTCG SEQ ID NO: 6786
GTTTCCCAGTAGGTCTC N CTGTGA SEQ ID NO: 6787
GTTTCCCAGTAGGTCTC N CTTACA SEQ ID NO: 6788
GTTTCCCAGTAGGTCTC N CTTCGT SEQ ID NO: 6789
GTTTCCCAGTAGGTCTC N CTTGTA SEQ ID NO: 6790
GTTTCCCAGTAGGTCTC N GAAAAA SEQ ID NO: 6791
GTTTCCCAGTAGGTCTC N GAAACA SEQ ID NO: 6792
GTTTCCCAGTAGGTCTC N GAAACG SEQ ID NO: 6793
GTTTCCCAGTAGGTCTC N GAAATG SEQ ID NO: 6794
GTTTCCCAGTAGGTCTC N GAACGA SEQ ID NO: 6795
GTTTCCCAGTAGGTCTC N GAACGT SEQ ID NO: 6796
GTTTCCCAGTAGGTCTC N GAACTT SEQ ID NO: 6797
GTTTCCCAGTAGGTCTC N GAAGCC SEQ ID NO: 6798
GTTTCCCAGTAGGTCTC N GAAGTC SEQ ID NO: 6799
GTTTCCCAGTAGGTCTC N GAATAC SEQ ID NO: 6800
GTTTCCCAGTAGGTCTC N GAATCA SEQ ID NO: 6801
GTTTCCCAGTAGGTCTC N GAATTG SEQ ID NO: 6802
GTTTCCCAGTAGGTCTC N GACAAC SEQ ID NO: 6803
GTTTCCCAGTAGGTCTC N GACAAG SEQ ID NO: 6804
GTTTCCCAGTAGGTCTC N GACAAT SEQ ID NO: 6805
GTTTCCCAGTAGGTCTC N GACACA SEQ ID NO: 6806
GTTTCCCAGTAGGTCTC N GACAGA SEQ ID NO: 6807
GTTTCCCAGTAGGTCTC N GACAGC SEQ ID NO: 6808
GTTTCCCAGTAGGTCTC N GACATA SEQ ID NO: 6809
GTTTCCCAGTAGGTCTC N GACCAC SEQ ID NO: 6810
GTTTCCCAGTAGGTCTC N GACCAG SEQ ID NO: 6811
GTTTCCCAGTAGGTCTC N GACCAT SEQ ID NO: 6812
GTTTCCCAGTAGGTCTC N GACCTT SEQ ID NO: 6813
GTTTCCCAGTAGGTCTC N GACGAA SEQ ID NO: 6814
GTTTCCCAGTAGGTCTC N GACGAT SEQ ID NO: 6815
GTTTCCCAGTAGGTCTC N GACGCG SEQ ID NO: 6816
GTTTCCCAGTAGGTCTC N GACGGA SEQ ID NO: 6817
GTTTCCCAGTAGGTCTC N GACGTA SEQ ID NO: 6818
GTTTCCCAGTAGGTCTC N GACGTG SEQ ID NO: 6819
GTTTCCCAGTAGGTCTC N GACTCG SEQ ID NO: 6820
GTTTCCCAGTAGGTCTC N GACTCT SEQ ID NO: 6821
GTTTCCCAGTAGGTCTC N GACTGT SEQ ID NO: 6822
GTTTCCCAGTAGGTCTC N GAGACA SEQ ID NO: 6823
GTTTCCCAGTAGGTCTC N GAGACT SEQ ID NO: 6824
GTTTCCCAGTAGGTCTC N GAGCCC SEQ ID NO: 6825
GTTTCCCAGTAGGTCTC N GAGCGG SEQ ID NO: 6826
GTTTCCCAGTAGGTCTC N GAGGAC SEQ ID NO: 6827
GTTTCCCAGTAGGTCTC N GAGTAA SEQ ID NO: 6828
GTTTCCCAGTAGGTCTC N GAGTCG SEQ ID NO: 6829
GTTTCCCAGTAGGTCTC N GAGTGC SEQ ID NO: 6830
GTTTCCCAGTAGGTCTC N GAGTTA SEQ ID NO: 6831
GTTTCCCAGTAGGTCTC N GATAAC SEQ ID NO: 6832
GTTTCCCAGTAGGTCTC N GATACA SEQ ID NO: 6833
GTTTCCCAGTAGGTCTC N GATACC SEQ ID NO: 6834
GTTTCCCAGTAGGTCTC N GATACG SEQ ID NO: 6835
GTTTCCCAGTAGGTCTC N GATAGC SEQ ID NO: 6836
GTTTCCCAGTAGGTCTC N GATATA SEQ ID NO: 6837
GTTTCCCAGTAGGTCTC N GATCAT SEQ ID NO: 6838
GTTTCCCAGTAGGTCTC N GATCGA SEQ ID NO: 6839
GTTTCCCAGTAGGTCTC N GATCGC SEQ ID NO: 6840
GTTTCCCAGTAGGTCTC N GATCTC SEQ ID NO: 6841
GTTTCCCAGTAGGTCTC N GATCTT SEQ ID NO: 6842
GTTTCCCAGTAGGTCTC N GATGAA SEQ ID NO: 6843
GTTTCCCAGTAGGTCTC N GATGCA SEQ ID NO: 6844
GTTTCCCAGTAGGTCTC N GATGCC SEQ ID NO: 6845
GTTTCCCAGTAGGTCTC N GATGCG SEQ ID NO: 6846
GTTTCCCAGTAGGTCTC N GATGTA SEQ ID NO: 6847
GTTTCCCAGTAGGTCTC N GATTCA SEQ ID NO: 6848
GTTTCCCAGTAGGTCTC N GATTTC SEQ ID NO: 6849
GTTTCCCAGTAGGTCTC N GCAAAA SEQ ID NO: 6850
GTTTCCCAGTAGGTCTC N GCAAAC SEQ ID NO: 6851
GTTTCCCAGTAGGTCTC N GCAACT SEQ ID NO: 6852
GTTTCCCAGTAGGTCTC N GCAATT SEQ ID NO: 6853
GTTTCCCAGTAGGTCTC N GCACAA SEQ ID NO: 6854
GTTTCCCAGTAGGTCTC N GCACAG SEQ ID NO: 6855
GTTTCCCAGTAGGTCTC N GCACCC SEQ ID NO: 6856
GTTTCCCAGTAGGTCTC N GCACTA SEQ ID NO: 6857
GTTTCCCAGTAGGTCTC N GCAGAG SEQ ID NO: 6858
GTTTCCCAGTAGGTCTC N GCAGAT SEQ ID NO: 6859
GTTTCCCAGTAGGTCTC N GCAGCA SEQ ID NO: 6860
GTTTCCCAGTAGGTCTC N GCAGCT SEQ ID NO: 6861
GTTTCCCAGTAGGTCTC N GCAGTA SEQ ID NO: 6862
GTTTCCCAGTAGGTCTC N GCAGTC SEQ ID NO: 6863
GTTTCCCAGTAGGTCTC N GCATAA SEQ ID NO: 6864
GTTTCCCAGTAGGTCTC N GCATAC SEQ ID NO: 6865
GTTTCCCAGTAGGTCTC N GCATAT SEQ ID NO: 6866
GTTTCCCAGTAGGTCTC N GCATCT SEQ ID NO: 6867
GTTTCCCAGTAGGTCTC N GCATGA SEQ ID NO: 6868
GTTTCCCAGTAGGTCTC N GCATGG SEQ ID NO: 6869
GTTTCCCAGTAGGTCTC N GCATTA SEQ ID NO: 6870
GTTTCCCAGTAGGTCTC N GCCAAA SEQ ID NO: 6871
GTTTCCCAGTAGGTCTC N GCCATA SEQ ID NO: 6872
GTTTCCCAGTAGGTCTC N GCCATT SEQ ID NO: 6873
GTTTCCCAGTAGGTCTC N GCCCAA SEQ ID NO: 6874
GTTTCCCAGTAGGTCTC N GCCCTA SEQ ID NO: 6875
GTTTCCCAGTAGGTCTC N GCCGAA SEQ ID NO: 6876
GTTTCCCAGTAGGTCTC N GCCTAA SEQ ID NO: 6877
GTTTCCCAGTAGGTCTC N GCCTGA SEQ ID NO: 6878
GTTTCCCAGTAGGTCTC N GCCTTG SEQ ID NO: 6879
GTTTCCCAGTAGGTCTC N GCGACA SEQ ID NO: 6880
GTTTCCCAGTAGGTCTC N GCGACC SEQ ID NO: 6881
GTTTCCCAGTAGGTCTC N GCGAGT SEQ ID NO: 6882
GTTTCCCAGTAGGTCTC N GCGATA SEQ ID NO: 6883
GTTTCCCAGTAGGTCTC N GCGATT SEQ ID NO: 6884
GTTTCCCAGTAGGTCTC N GCGCAC SEQ ID NO: 6885
GTTTCCCAGTAGGTCTC N GCGCAG SEQ ID NO: 6886
GTTTCCCAGTAGGTCTC N GCGCAT SEQ ID NO: 6887
GTTTCCCAGTAGGTCTC N GCGCTA SEQ ID NO: 6888
GTTTCCCAGTAGGTCTC N GCGGAA SEQ ID NO: 6889
GTTTCCCAGTAGGTCTC N GCGGCC SEQ ID NO: 6890
GTTTCCCAGTAGGTCTC N GCGTAC SEQ ID NO: 6891
GTTTCCCAGTAGGTCTC N GCGTCC SEQ ID NO: 6892
GTTTCCCAGTAGGTCTC N GCGTCT SEQ ID NO: 6893
GTTTCCCAGTAGGTCTC N GCGTGA SEQ ID NO: 6894
GTTTCCCAGTAGGTCTC N GCGTGT SEQ ID NO: 6895
GTTTCCCAGTAGGTCTC N GCTAAC SEQ ID NO: 6896
GTTTCCCAGTAGGTCTC N GCTAGC SEQ ID NO: 6897
GTTTCCCAGTAGGTCTC N GCTCGA SEQ ID NO: 6898
GTTTCCCAGTAGGTCTC N GCTGGT SEQ ID NO: 6899
GTTTCCCAGTAGGTCTC N GCTGTA SEQ ID NO: 6900
GTTTCCCAGTAGGTCTC N GCTTCC SEQ ID NO: 6901
GTTTCCCAGTAGGTCTC N GGAAAA SEQ ID NO: 6902
GTTTCCCAGTAGGTCTC N GGAAAG SEQ ID NO: 6903
GTTTCCCAGTAGGTCTC N GGAACA SEQ ID NO: 6904
GTTTCCCAGTAGGTCTC N GGACAA SEQ ID NO: 6905
GTTTCCCAGTAGGTCTC N GGACGT SEQ ID NO: 6906
GTTTCCCAGTAGGTCTC N GGAGAT SEQ ID NO: 6907
GTTTCCCAGTAGGTCTC N GGAGCA SEQ ID NO: 6908
GTTTCCCAGTAGGTCTC N GGAGCC SEQ ID NO: 6909
GTTTCCCAGTAGGTCTC N GGAGCG SEQ ID NO: 6910
GTTTCCCAGTAGGTCTC N GGAGCT SEQ ID NO: 6911
GTTTCCCAGTAGGTCTC N GGCACC SEQ ID NO: 6912
GTTTCCCAGTAGGTCTC N GGCAGC SEQ ID NO: 6913
GTTTCCCAGTAGGTCTC N GGCGAT SEQ ID NO: 6914
GTTTCCCAGTAGGTCTC N GGCGTC SEQ ID NO: 6915
GTTTCCCAGTAGGTCTC N GGGAGA SEQ ID NO: 6916
GTTTCCCAGTAGGTCTC N GGGAGC SEQ ID NO: 6917
GTTTCCCAGTAGGTCTC N GGGCAC SEQ ID NO: 6918
GTTTCCCAGTAGGTCTC N GGTACA SEQ ID NO: 6919
GTTTCCCAGTAGGTCTC N GGTACC SEQ ID NO: 6920
GTTTCCCAGTAGGTCTC N GGTCAA SEQ ID NO: 6921
GTTTCCCAGTAGGTCTC N GGTCCC SEQ ID NO: 6922
GTTTCCCAGTAGGTCTC N GGTCGA SEQ ID NO: 6923
GTTTCCCAGTAGGTCTC N GGTGCA SEQ ID NO: 6924
GTTTCCCAGTAGGTCTC N GGTGGA SEQ ID NO: 6925
GTTTCCCAGTAGGTCTC N GGTGTC SEQ ID NO: 6926
GTTTCCCAGTAGGTCTC N GGTTGC SEQ ID NO: 6927
GTTTCCCAGTAGGTCTC N GTAACA SEQ ID NO: 6928
GTTTCCCAGTAGGTCTC N GTAACC SEQ ID NO: 6929
GTTTCCCAGTAGGTCTC N GTAACG SEQ ID NO: 6930
GTTTCCCAGTAGGTCTC N GTAAGT SEQ ID NO: 6931
GTTTCCCAGTAGGTCTC N GTAATG SEQ ID NO: 6932
GTTTCCCAGTAGGTCTC N GTACAT SEQ ID NO: 6933
GTTTCCCAGTAGGTCTC N GTACCA SEQ ID NO: 6934
GTTTCCCAGTAGGTCTC N GTACCC SEQ ID NO: 6935
GTTTCCCAGTAGGTCTC N GTACCT SEQ ID NO: 6936
GTTTCCCAGTAGGTCTC N GTACGA SEQ ID NO: 6937
GTTTCCCAGTAGGTCTC N GTACGT SEQ ID NO: 6938
GTTTCCCAGTAGGTCTC N GTACTC SEQ ID NO: 6939
GTTTCCCAGTAGGTCTC N GTAGAC SEQ ID NO: 6940
GTTTCCCAGTAGGTCTC N GTAGAG SEQ ID NO: 6941
GTTTCCCAGTAGGTCTC N GTAGCA SEQ ID NO: 6942
GTTTCCCAGTAGGTCTC N GTATCA SEQ ID NO: 6943
GTTTCCCAGTAGGTCTC N GTATGT SEQ ID NO: 6944
GTTTCCCAGTAGGTCTC N GTCACA SEQ ID NO: 6945
GTTTCCCAGTAGGTCTC N GTCACG SEQ ID NO: 6946
GTTTCCCAGTAGGTCTC N GTCATC SEQ ID NO: 6947
GTTTCCCAGTAGGTCTC N GTCCAT SEQ ID NO: 6948
GTTTCCCAGTAGGTCTC N GTCCCA SEQ ID NO: 6949
GTTTCCCAGTAGGTCTC N GTCCCC SEQ ID NO: 6950
GTTTCCCAGTAGGTCTC N GTCCCG SEQ ID NO: 6951
GTTTCCCAGTAGGTCTC N GTCGAA SEQ ID NO: 6952
GTTTCCCAGTAGGTCTC N GTCGAC SEQ ID NO: 6953
GTTTCCCAGTAGGTCTC N GTCGAG SEQ ID NO: 6954
GTTTCCCAGTAGGTCTC N GTCGAT SEQ ID NO: 6955
GTTTCCCAGTAGGTCTC N GTCGCA SEQ ID NO: 6956
GTTTCCCAGTAGGTCTC N GTCGTG SEQ ID NO: 6957
GTTTCCCAGTAGGTCTC N GTCGTT SEQ ID NO: 6958
GTTTCCCAGTAGGTCTC N GTCTAC SEQ ID NO: 6959
GTTTCCCAGTAGGTCTC N GTCTAG SEQ ID NO: 6960
GTTTCCCAGTAGGTCTC N GTGAAT SEQ ID NO: 6961
GTTTCCCAGTAGGTCTC N GTGACA SEQ ID NO: 6962
GTTTCCCAGTAGGTCTC N GTGACT SEQ ID NO: 6963
GTTTCCCAGTAGGTCTC N GTGCAA SEQ ID NO: 6964
GTTTCCCAGTAGGTCTC N GTGCAT SEQ ID NO: 6965
GTTTCCCAGTAGGTCTC N GTGCTA SEQ ID NO: 6966
GTTTCCCAGTAGGTCTC N GTGGAA SEQ ID NO: 6967
GTTTCCCAGTAGGTCTC N GTGGAT SEQ ID NO: 6968
GTTTCCCAGTAGGTCTC N GTGTCG SEQ ID NO: 6969
GTTTCCCAGTAGGTCTC N GTGTCT SEQ ID NO: 6970
GTTTCCCAGTAGGTCTC N GTTAAC SEQ ID NO: 6971
GTTTCCCAGTAGGTCTC N GTTAAG SEQ ID NO: 6972
GTTTCCCAGTAGGTCTC N GTTAGA SEQ ID NO: 6973
GTTTCCCAGTAGGTCTC N GTTGCA SEQ ID NO: 6974
GTTTCCCAGTAGGTCTC N GTTGCC SEQ ID NO: 6975
GTTTCCCAGTAGGTCTC N GTTGCG SEQ ID NO: 6976
GTTTCCCAGTAGGTCTC N GTTGCT SEQ ID NO: 6977
GTTTCCCAGTAGGTCTC N GTTGGA SEQ ID NO: 6978
GTTTCCCAGTAGGTCTC N TAAAAT SEQ ID NO: 6979
GTTTCCCAGTAGGTCTC N TAACAA SEQ ID NO: 6980
GTTTCCCAGTAGGTCTC N TAACCG SEQ ID NO: 6981
GTTTCCCAGTAGGTCTC N TAACGA SEQ ID NO: 6982
GTTTCCCAGTAGGTCTC N TAACTC SEQ ID NO: 6983
GTTTCCCAGTAGGTCTC N TAAGAC SEQ ID NO: 6984
GTTTCCCAGTAGGTCTC N TAAGGC SEQ ID NO: 6985
GTTTCCCAGTAGGTCTC N TAAGTA SEQ ID NO: 6986
GTTTCCCAGTAGGTCTC N TAATGC SEQ ID NO: 6987
GTTTCCCAGTAGGTCTC N TAATGG SEQ ID NO: 6988
GTTTCCCAGTAGGTCTC N TAATGT SEQ ID NO: 6989
GTTTCCCAGTAGGTCTC N TACAAT SEQ ID NO: 6990
GTTTCCCAGTAGGTCTC N TACACG SEQ ID NO: 6991
GTTTCCCAGTAGGTCTC N TACAGC SEQ ID NO: 6992
GTTTCCCAGTAGGTCTC N TACAGT SEQ ID NO: 6993
GTTTCCCAGTAGGTCTC N TACATC SEQ ID NO: 6994
GTTTCCCAGTAGGTCTC N TACATG SEQ ID NO: 6995
GTTTCCCAGTAGGTCTC N TACCAG SEQ ID NO: 6996
GTTTCCCAGTAGGTCTC N TACCCC SEQ ID NO: 6997
GTTTCCCAGTAGGTCTC N TACCCT SEQ ID NO: 6998
GTTTCCCAGTAGGTCTC N TACCGA SEQ ID NO: 6999
GTTTCCCAGTAGGTCTC N TACCTG SEQ ID NO: 7000
GTTTCCCAGTAGGTCTC N TACGAG SEQ ID NO: 7001
GTTTCCCAGTAGGTCTC N TACGAT SEQ ID NO: 7002
GTTTCCCAGTAGGTCTC N TACGCA SEQ ID NO: 7003
GTTTCCCAGTAGGTCTC N TACGCT SEQ ID NO: 7004
GTTTCCCAGTAGGTCTC N TACGTA SEQ ID NO: 7005
GTTTCCCAGTAGGTCTC N TACGTC SEQ ID NO: 7006
GTTTCCCAGTAGGTCTC N TACTCA SEQ ID NO: 7007
GTTTCCCAGTAGGTCTC N TACTGT SEQ ID NO: 7008
GTTTCCCAGTAGGTCTC N TACTTC SEQ ID NO: 7009
GTTTCCCAGTAGGTCTC N TAGAAG SEQ ID NO: 7010
GTTTCCCAGTAGGTCTC N TAGACA SEQ ID NO: 7011
GTTTCCCAGTAGGTCTC N TAGACC SEQ ID NO: 7012
GTTTCCCAGTAGGTCTC N TAGACT SEQ ID NO: 7013
GTTTCCCAGTAGGTCTC N TAGAGA SEQ ID NO: 7014
GTTTCCCAGTAGGTCTC N TAGATC SEQ ID NO: 7015
GTTTCCCAGTAGGTCTC N TAGCAA SEQ ID NO: 7016
GTTTCCCAGTAGGTCTC N TAGCAC SEQ ID NO: 7017
GTTTCCCAGTAGGTCTC N TAGCAG SEQ ID NO: 7018
GTTTCCCAGTAGGTCTC N TAGCCA SEQ ID NO: 7019
GTTTCCCAGTAGGTCTC N TAGCGT SEQ ID NO: 7020
GTTTCCCAGTAGGTCTC N TAGGTC SEQ ID NO: 7021
GTTTCCCAGTAGGTCTC N TAGTAC SEQ ID NO: 7022
GTTTCCCAGTAGGTCTC N TATAAC SEQ ID NO: 7023
GTTTCCCAGTAGGTCTC N TATACA SEQ ID NO: 7024
GTTTCCCAGTAGGTCTC N TATACG SEQ ID NO: 7025
GTTTCCCAGTAGGTCTC N TATAGT SEQ ID NO: 7026
GTTTCCCAGTAGGTCTC N TATCAC SEQ ID NO: 7027
GTTTCCCAGTAGGTCTC N TATCAG SEQ ID NO: 7028
GTTTCCCAGTAGGTCTC N TATCTC SEQ ID NO: 7029
GTTTCCCAGTAGGTCTC N TATGGA SEQ ID NO: 7030
GTTTCCCAGTAGGTCTC N TATGGC SEQ ID NO: 7031
GTTTCCCAGTAGGTCTC N TATGTA SEQ ID NO: 7032
GTTTCCCAGTAGGTCTC N TATGTC SEQ ID NO: 7033
GTTTCCCAGTAGGTCTC N TATTAC SEQ ID NO: 7034
GTTTCCCAGTAGGTCTC N TCAAGT SEQ ID NO: 7035
GTTTCCCAGTAGGTCTC N TCAATG SEQ ID NO: 7036
GTTTCCCAGTAGGTCTC N TCAGAC SEQ ID NO: 7037
GTTTCCCAGTAGGTCTC N TCAGCA SEQ ID NO: 7038
GTTTCCCAGTAGGTCTC N TCATAC SEQ ID NO: 7039
GTTTCCCAGTAGGTCTC N TCATCG SEQ ID NO: 7040
GTTTCCCAGTAGGTCTC N TCATGA SEQ ID NO: 7041
GTTTCCCAGTAGGTCTC N TCCACA SEQ ID NO: 7042
GTTTCCCAGTAGGTCTC N TCCAGA SEQ ID NO: 7043
GTTTCCCAGTAGGTCTC N TCCATA SEQ ID NO: 7044
GTTTCCCAGTAGGTCTC N TCCATC SEQ ID NO: 7045
GTTTCCCAGTAGGTCTC N TCCCAT SEQ ID NO: 7046
GTTTCCCAGTAGGTCTC N TCCGAT SEQ ID NO: 7047
GTTTCCCAGTAGGTCTC N TCCGTT SEQ ID NO: 7048
GTTTCCCAGTAGGTCTC N TCCTAC SEQ ID NO: 7049
GTTTCCCAGTAGGTCTC N TCCTTC SEQ ID NO: 7050
GTTTCCCAGTAGGTCTC N TCGACG SEQ ID NO: 7051
GTTTCCCAGTAGGTCTC N TCGAGC SEQ ID NO: 7052
GTTTCCCAGTAGGTCTC N TCGAGT SEQ ID NO: 7053
GTTTCCCAGTAGGTCTC N TCGATA SEQ ID NO: 7054
GTTTCCCAGTAGGTCTC N TCGATG SEQ ID NO: 7055
GTTTCCCAGTAGGTCTC N TCGCAA SEQ ID NO: 7056
GTTTCCCAGTAGGTCTC N TCGCAC SEQ ID NO: 7057
GTTTCCCAGTAGGTCTC N TCGCAG SEQ ID NO: 7058
GTTTCCCAGTAGGTCTC N TCGCCA SEQ ID NO: 7059
GTTTCCCAGTAGGTCTC N TCGGTG SEQ ID NO: 7060
GTTTCCCAGTAGGTCTC N TCGTAT SEQ ID NO: 7061
GTTTCCCAGTAGGTCTC N TCTAAC SEQ ID NO: 7062
GTTTCCCAGTAGGTCTC N TCTACC SEQ ID NO: 7063
GTTTCCCAGTAGGTCTC N TCTACG SEQ ID NO: 7064
GTTTCCCAGTAGGTCTC N TCTAGG SEQ ID NO: 7065
GTTTCCCAGTAGGTCTC N TCTATG SEQ ID NO: 7066
GTTTCCCAGTAGGTCTC N TCTCAA SEQ ID NO: 7067
GTTTCCCAGTAGGTCTC N TCTCGC SEQ ID NO: 7068
GTTTCCCAGTAGGTCTC N TCTCTG SEQ ID NO: 7069
GTTTCCCAGTAGGTCTC N TCTGAG SEQ ID NO: 7070
GTTTCCCAGTAGGTCTC N TCTGCC SEQ ID NO: 7071
GTTTCCCAGTAGGTCTC N TCTGGA SEQ ID NO: 7072
GTTTCCCAGTAGGTCTC N TCTGTG SEQ ID NO: 7073
GTTTCCCAGTAGGTCTC N TCTGTT SEQ ID NO: 7074
GTTTCCCAGTAGGTCTC N TCTTAT SEQ ID NO: 7075
GTTTCCCAGTAGGTCTC N TGAAAC SEQ ID NO: 7076
GTTTCCCAGTAGGTCTC N TGAAAT SEQ ID NO: 7077
GTTTCCCAGTAGGTCTC N TGACAA SEQ ID NO: 7078
GTTTCCCAGTAGGTCTC N TGACGT SEQ ID NO: 7079
GTTTCCCAGTAGGTCTC N TGACTC SEQ ID NO: 7080
GTTTCCCAGTAGGTCTC N TGACTG SEQ ID NO: 7081
GTTTCCCAGTAGGTCTC N TGAGAC SEQ ID NO: 7082
GTTTCCCAGTAGGTCTC N TGATAC SEQ ID NO: 7083
GTTTCCCAGTAGGTCTC N TGCAAG SEQ ID NO: 7084
GTTTCCCAGTAGGTCTC N TGCACA SEQ ID NO: 7085
GTTTCCCAGTAGGTCTC N TGCAGG SEQ ID NO: 7086
GTTTCCCAGTAGGTCTC N TGCAGT SEQ ID NO: 7087
GTTTCCCAGTAGGTCTC N TGCATA SEQ ID NO: 7088
GTTTCCCAGTAGGTCTC N TGCATC SEQ ID NO: 7089
GTTTCCCAGTAGGTCTC N TGCCAA SEQ ID NO: 7090
GTTTCCCAGTAGGTCTC N TGCCAT SEQ ID NO: 7091
GTTTCCCAGTAGGTCTC N TGCCGC SEQ ID NO: 7092
GTTTCCCAGTAGGTCTC N TGCCTA SEQ ID NO: 7093
GTTTCCCAGTAGGTCTC N TGCGAA SEQ ID NO: 7094
GTTTCCCAGTAGGTCTC N TGCGAC SEQ ID NO: 7095
GTTTCCCAGTAGGTCTC N TGCGCA SEQ ID NO: 7096
GTTTCCCAGTAGGTCTC N TGCTGG SEQ ID NO: 7097
GTTTCCCAGTAGGTCTC N TGGAAA SEQ ID NO: 7098
GTTTCCCAGTAGGTCTC N TGGAAG SEQ ID NO: 7099
GTTTCCCAGTAGGTCTC N TGGAAT SEQ ID NO: 7100
GTTTCCCAGTAGGTCTC N TGGACA SEQ ID NO: 7101
GTTTCCCAGTAGGTCTC N TGGACC SEQ ID NO: 7102
GTTTCCCAGTAGGTCTC N TGGAGC SEQ ID NO: 7103
GTTTCCCAGTAGGTCTC N TGGCAG SEQ ID NO: 7104
GTTTCCCAGTAGGTCTC N TGGCAT SEQ ID NO: 7105
GTTTCCCAGTAGGTCTC N TGGCGC SEQ ID NO: 7106
GTTTCCCAGTAGGTCTC N TGGCGT SEQ ID NO: 7107
GTTTCCCAGTAGGTCTC N TGGGAC SEQ ID NO: 7108
GTTTCCCAGTAGGTCTC N TGGGAG SEQ ID NO: 7109
GTTTCCCAGTAGGTCTC N TGGTAC SEQ ID NO: 7110
GTTTCCCAGTAGGTCTC N TGGTAT SEQ ID NO: 7111
GTTTCCCAGTAGGTCTC N TGGTCT SEQ ID NO: 7112
GTTTCCCAGTAGGTCTC N TGTAAC SEQ ID NO: 7113
GTTTCCCAGTAGGTCTC N TGTACT SEQ ID NO: 7114
GTTTCCCAGTAGGTCTC N TGTAGT SEQ ID NO: 7115
GTTTCCCAGTAGGTCTC N TGTATC SEQ ID NO: 7116
GTTTCCCAGTAGGTCTC N TGTCAT SEQ ID NO: 7117
GTTTCCCAGTAGGTCTC N TGTCGA SEQ ID NO: 7118
GTTTCCCAGTAGGTCTC N TGTCGC SEQ ID NO: 7119
GTTTCCCAGTAGGTCTC N TGTCGG SEQ ID NO: 7120
GTTTCCCAGTAGGTCTC N TGTCGT SEQ ID NO: 7121
GTTTCCCAGTAGGTCTC N TGTCTT SEQ ID NO: 7122
GTTTCCCAGTAGGTCTC N TGTGCA SEQ ID NO: 7123
GTTTCCCAGTAGGTCTC N TGTGCG SEQ ID NO: 7124
GTTTCCCAGTAGGTCTC N TGTGGA SEQ ID NO: 7125
GTTTCCCAGTAGGTCTC N TGTTCG SEQ ID NO: 7126
GTTTCCCAGTAGGTCTC N TGTTGC SEQ ID NO: 7127
GTTTCCCAGTAGGTCTC N TTAACA SEQ ID NO: 7128
GTTTCCCAGTAGGTCTC N TTACAT SEQ ID NO: 7129
GTTTCCCAGTAGGTCTC N TTAGAA SEQ ID NO: 7130
GTTTCCCAGTAGGTCTC N TTAGAC SEQ ID NO: 7131
GTTTCCCAGTAGGTCTC N TTATTA SEQ ID NO: 7132
GTTTCCCAGTAGGTCTC N TTCCGG SEQ ID NO: 7133
GTTTCCCAGTAGGTCTC N TTCCTG SEQ ID NO: 7134
GTTTCCCAGTAGGTCTC N TTCGAG SEQ ID NO: 7135
GTTTCCCAGTAGGTCTC N TTCGCA SEQ ID NO: 7136
GTTTCCCAGTAGGTCTC N TTCGTG SEQ ID NO: 7137
GTTTCCCAGTAGGTCTC N TTGAAA SEQ ID NO: 7138
GTTTCCCAGTAGGTCTC N TTGACG SEQ ID NO: 7139
GTTTCCCAGTAGGTCTC N TTGCAG SEQ ID NO: 7140
GTTTCCCAGTAGGTCTC N TTGCCT SEQ ID NO: 7141
GTTTCCCAGTAGGTCTC N TTGCGA SEQ ID NO: 7142
GTTTCCCAGTAGGTCTC N TTGCTT SEQ ID NO: 7143
GTTTCCCAGTAGGTCTC N TTGGAA SEQ ID NO: 7144
GTTTCCCAGTAGGTCTC N TTGTCG SEQ ID NO: 7145
GTTTCCCAGTAGGTCTC N TTGTGC SEQ ID NO: 7146
GTTTCCCAGTAGGTCTC N TTTCCG SEQ ID NO: 7147
GTTTCCCAGTAGGTCTC N TTTGTC SEQ ID NO: 7148
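Each row in the listings above follows the same layout: the constant 5' tag GTTTCCCAGTAGGTCTC, a single degenerate position written N, and a 3' hexamer, followed by its SEQ ID number. The sketch below reads one row and, assuming N follows the usual IUPAC convention of standing for any of A, C, G or T, lists the four concrete 24-nt oligonucleotides that the row represents. The helper names (`parse_entry`, `expand`) are hypothetical and are offered only as an illustration of the row layout, not as a procedure described in this application.

```python
TAG = "GTTTCCCAGTAGGTCTC"  # constant 5' tag shared by every row in this listing
BASES = "ACGT"             # assumed IUPAC meaning of N: any one of A, C, G, T


def parse_entry(row: str) -> tuple[str, int]:
    """Return (hexamer, SEQ ID number) from a listing row such as
    'GTTTCCCAGTAGGTCTC N ATGTAA SEQ ID NO: 6557'."""
    seq_part, sep, id_part = row.partition("SEQ ID NO:")
    if not sep:
        raise ValueError(f"no SEQ ID in row: {row!r}")
    tokens = seq_part.split()
    if tokens[:2] != [TAG, "N"]:
        raise ValueError(f"unexpected primer layout: {row!r}")
    # Join the remaining tokens so a hexamer printed with stray spaces is read whole.
    hexamer = "".join(tokens[2:])
    if len(hexamer) != 6 or set(hexamer) - set(BASES):
        raise ValueError(f"not a DNA hexamer: {hexamer!r}")
    # Spaces inside the number are also removed before conversion.
    return hexamer, int(id_part.replace(" ", ""))


def expand(hexamer: str) -> list[str]:
    """List the concrete oligos TAG + base + hexamer, one per possible base at N."""
    return [TAG + base + hexamer for base in BASES]


if __name__ == "__main__":
    hexamer, seq_id = parse_entry("GTTTCCCAGTAGGTCTC N ATGTAA SEQ ID NO: 6557")
    for oligo in expand(hexamer):
        print(seq_id, oligo, len(oligo))
```

Keeping the degenerate base as a separate expansion step mirrors how the listing writes N apart from the hexamer, so each numbered row corresponds to one degenerate primer rather than four separate sequences.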
[00111] The reverse strand (RS-MEP) primer set used in this experiment is provided in Table 9.
TABLE 9
GTTTCCCAGTAGGTCTC N ACATTA SEQ ID NO: 7186
GTTTCCCAGTAGGTCTC N ACATTG SEQ ID NO: 7187
GTTTCCCAGTAGGTCTC N ACCACT SEQ ID NO: 7188
GTTTCCCAGTAGGTCTC N ACCAGC SEQ ID NO: 7189
GTTTCCCAGTAGGTCTC N ACCCTT SEQ ID NO: 7190
GTTTCCCAGTAGGTCTC N ACCGAG SEQ ID NO: 7191
GTTTCCCAGTAGGTCTC N ACCGCT SEQ ID NO: 7192
GTTTCCCAGTAGGTCTC N ACCGGG SEQ ID NO: 7193
GTTTCCCAGTAGGTCTC N ACCGGT SEQ ID NO: 7194
GTTTCCCAGTAGGTCTC N ACGAAG SEQ ID NO: 7195
GTTTCCCAGTAGGTCTC N ACGACA SEQ ID NO: 7196
GTTTCCCAGTAGGTCTC N ACGACG SEQ ID NO: 7197
GTTTCCCAGTAGGTCTC N ACGACT SEQ ID NO: 7198
GTTTCCCAGTAGGTCTC N ACGCCA SEQ ID NO: 7199
GTTTCCCAGTAGGTCTC N ACGCGG SEQ ID NO: 7200
GTTTCCCAGTAGGTCTC N ACGCGT SEQ ID NO: 7201
GTTTCCCAGTAGGTCTC N ACGCTG SEQ ID NO: 7202
GTTTCCCAGTAGGTCTC N ACGGAG SEQ ID NO: 7203
GTTTCCCAGTAGGTCTC N ACGGAT SEQ ID NO: 7204
GTTTCCCAGTAGGTCTC N ACGGCA SEQ ID NO: 7205
GTTTCCCAGTAGGTCTC N ACGGTC SEQ ID NO: 7206
GTTTCCCAGTAGGTCTC N ACGGTT SEQ ID NO: 7207
GTTTCCCAGTAGGTCTC N ACGTAC SEQ ID NO: 7208
GTTTCCCAGTAGGTCTC N ACGTAG SEQ ID NO: 7209
GTTTCCCAGTAGGTCTC N ACGTAT SEQ ID NO: 7210
GTTTCCCAGTAGGTCTC N ACGTCA SEQ ID NO: 7211
GTTTCCCAGTAGGTCTC N ACGTCC SEQ ID NO: 7212
GTTTCCCAGTAGGTCTC N ACGTTC SEQ ID NO: 7213
GTTTCCCAGTAGGTCTC N ACGTTG SEQ ID NO: 7214
GTTTCCCAGTAGGTCTC N ACGTTT SEQ ID NO: 7215
GTTTCCCAGTAGGTCTC N ACTACA SEQ ID NO: 7216
GTTTCCCAGTAGGTCTC N ACTCAG SEQ ID NO: 7217
GTTTCCCAGTAGGTCTC N ACTCCG SEQ ID NO: 7218
GTTTCCCAGTAGGTCTC N ACTCGA SEQ ID NO: 7219
GTTTCCCAGTAGGTCTC N ACTCGC SEQ ID NO: 7220
GTTTCCCAGTAGGTCTC N ACTCGG SEQ ID NO: 7221
GTTTCCCAGTAGGTCTC N ACTCGT SEQ ID NO: 7222
GTTTCCCAGTAGGTCTC N ACTGCA SEQ ID NO: 7223
GTTTCCCAGTAGGTCTC N ACTGTA SEQ ID NO: 7224
GTTTCCCAGTAGGTCTC N ACTTAC SEQ ID NO: 7225
GTTTCCCAGTAGGTCTC N ACTTGT SEQ ID NO: 7226
GTTTCCCAGTAGGTCTC N AGAAGT SEQ ID NO: 7227
GTTTCCCAGTAGGTCTC N AGACAC SEQ ID NO: 7228
GTTTCCCAGTAGGTCTC N AGACCA SEQ ID NO: 7229
GTTTCCCAGTAGGTCTC N AGACCT SEQ ID NO: 7230
GTTTCCCAGTAGGTCTC N AGACGC SEQ ID NO: 7231
GTTTCCCAGTAGGTCTC N AGACTG SEQ ID NO: 7232
GTTTCCCAGTAGGTCTC N AGACTT SEQ ID NO: 7233
GTTTCCCAGTAGGTCTC N AGAGCA SEQ ID NO: 7234
GTTTCCCAGTAGGTCTC N AGAGGC SEQ ID NO: 7235
GTTTCCCAGTAGGTCTC N AGATCG SEQ ID NO: 7236
GTTTCCCAGTAGGTCTC N AGATGC SEQ ID NO: 7237
GTTTCCCAGTAGGTCTC N AGATTT SEQ ID NO: 7238
GTTTCCCAGTAGGTCTC N AGCAAC SEQ ID NO: 7239
GTTTCCCAGTAGGTCTC N AGCACA SEQ ID NO: 7240
GTTTCCCAGTAGGTCTC N AGCACT SEQ ID NO: 7241
GTTTCCCAGTAGGTCTC N AGCATG SEQ ID NO: 7242
GTTTCCCAGTAGGTCTC N AGCCGT SEQ ID NO: 7243
GTTTCCCAGTAGGTCTC N AGCGAG SEQ ID NO: 7244
GTTTCCCAGTAGGTCTC N AGCGCT SEQ ID NO: 7245
GTTTCCCAGTAGGTCTC N AGCGTA SEQ ID NO: 7246
GTTTCCCAGTAGGTCTC N AGCTCC SEQ ID NO: 7247
GTTTCCCAGTAGGTCTC N AGCTGC SEQ ID NO: 7248
GTTTCCCAGTAGGTCTC N AGCTGT SEQ ID NO: 7249
GTTTCCCAGTAGGTCTC N AGGCAA SEQ ID NO: 7250
GTTTCCCAGTAGGTCTC N AGGCAC SEQ ID NO: 7251
GTTTCCCAGTAGGTCTC N AGGCTG SEQ ID NO: 7252
GTTTCCCAGTAGGTCTC N AGGGGG SEQ ID NO: 7253
GTTTCCCAGTAGGTCTC N AGGGGT SEQ ID NO: 7254
GTTTCCCAGTAGGTCTC N AGGGTA SEQ ID NO: 7255
GTTTCCCAGTAGGTCTC N AGGTAC SEQ ID NO: 7256
GTTTCCCAGTAGGTCTC N AGGTAT SEQ ID NO: 7257
GTTTCCCAGTAGGTCTC N AGGTTG SEQ ID NO: 7258
GTTTCCCAGTAGGTCTC N AGTACA SEQ ID NO: 7259
GTTTCCCAGTAGGTCTC N AGTAGT SEQ ID NO: 7260
GTTTCCCAGTAGGTCTC N AGTATT SEQ ID NO: 7261
GTTTCCCAGTAGGTCTC N AGTCAC SEQ ID NO: 7262
GTTTCCCAGTAGGTCTC N AGTCTA SEQ ID NO: 7263
GTTTCCCAGTAGGTCTC N AGTCTC SEQ ID NO: 7264
GTTTCCCAGTAGGTCTC N AGTGGT SEQ ID NO: 7265
GTTTCCCAGTAGGTCTC N AGTTAG SEQ ID NO: 7266
GTTTCCCAGTAGGTCTC N AGTTCG SEQ ID NO: 7267
GTTTCCCAGTAGGTCTC N ATAAGA SEQ ID NO: 7268
GTTTCCCAGTAGGTCTC N ATACCA SEQ ID NO: 7269
GTTTCCCAGTAGGTCTC N ATACGA SEQ ID NO: 7270
GTTTCCCAGTAGGTCTC N ATACGT SEQ ID NO: 7271
GTTTCCCAGTAGGTCTC N ATACTG SEQ ID NO: 7272
GTTTCCCAGTAGGTCTC N ATAGTG SEQ ID NO: 7273
GTTTCCCAGTAGGTCTC N ATAGTT SEQ ID NO: 7274
GTTTCCCAGTAGGTCTC N ATATCG SEQ ID NO: 7275
GTTTCCCAGTAGGTCTC N ATATGC SEQ ID NO: 7276
GTTTCCCAGTAGGTCTC N ATATGG SEQ ID NO: 7277
GTTTCCCAGTAGGTCTC N ATATGT SEQ ID NO: 7278
GTTTCCCAGTAGGTCTC N ATATTG SEQ ID NO: 7279
GTTTCCCAGTAGGTCTC N ATATTT SEQ ID NO: 7280
GTTTCCCAGTAGGTCTC N ATCAAT SEQ ID NO: 7281
GTTTCCCAGTAGGTCTC N ATCACG SEQ ID NO: 7282
GTTTCCCAGTAGGTCTC N ATCACT SEQ ID NO: 7283
GTTTCCCAGTAGGTCTC N ATCATG SEQ ID NO: 7284
GTTTCCCAGTAGGTCTC N ATCCAC SEQ ID NO: 7285
GTTTCCCAGTAGGTCTC N ATCGAC SEQ ID NO: 7286
GTTTCCCAGTAGGTCTC N ATCGCC SEQ ID NO: 7287
GTTTCCCAGTAGGTCTC N ATCGGA SEQ ID NO: 7288
GTTTCCCAGTAGGTCTC N ATCGGC SEQ ID NO: 7289
GTTTCCCAGTAGGTCTC N ATCGTA SEQ ID NO: 7290
GTTTCCCAGTAGGTCTC N ATCGTC SEQ ID NO: 7291
GTTTCCCAGTAGGTCTC N ATCGTG SEQ ID NO: 7292
GTTTCCCAGTAGGTCTC N ATCGTT SEQ ID NO: 7293
GTTTCCCAGTAGGTCTC N ATCTCC SEQ ID NO: 7294
GTTTCCCAGTAGGTCTC N ATCTGC SEQ ID NO: 7295
GTTTCCCAGTAGGTCTC N ATCTGG SEQ ID NO: 7296
GTTTCCCAGTAGGTCTC N ATCTTT SEQ ID NO: 7297
GTTTCCCAGTAGGTCTC N ATGACA SEQ ID NO: 7298
GTTTCCCAGTAGGTCTC N ATGACG SEQ ID NO: 7299
GTTTCCCAGTAGGTCTC N ATGAGT SEQ ID NO: 7300
GTTTCCCAGTAGGTCTC N ATGATC SEQ ID NO: 7301
GTTTCCCAGTAGGTCTC N ATGCAC SEQ ID NO: 7302
GTTTCCCAGTAGGTCTC N ATGCAT SEQ ID NO: 7303
GTTTCCCAGTAGGTCTC N ATGCCA SEQ ID NO: 7304
GTTTCCCAGTAGGTCTC N ATGCGC SEQ ID NO: 7305
GTTTCCCAGTAGGTCTC N ATGCTG SEQ ID NO: 7306
GTTTCCCAGTAGGTCTC N ATGGAC SEQ ID NO: 7307
GTTTCCCAGTAGGTCTC N ATGGAG SEQ ID NO: 7308
GTTTCCCAGTAGGTCTC N ATGGCA SEQ ID NO: 7309
GTTTCCCAGTAGGTCTC N ATGGGA SEQ ID NO: 7310
GTTTCCCAGTAGGTCTC N ATGGTC SEQ ID NO: 7311
GTTTCCCAGTAGGTCTC N ATGTAA SEQ ID NO: 7312
GTTTCCCAGTAGGTCTC N ATGTAC SEQ ID NO: 7313
GTTTCCCAGTAGGTCTC N ATGTAT SEQ ID NO: 7314
GTTTCCCAGTAGGTCTC N ATGTCT SEQ ID NO: 7315
GTTTCCCAGTAGGTCTC N ATGTGG SEQ ID NO: 7316
GTTTCCCAGTAGGTCTC N ATTACG SEQ ID NO: 7317
GTTTCCCAGTAGGTCTC N ATTACT SEQ ID NO: 7318
GTTTCCCAGTAGGTCTC N ATTAGG SEQ ID NO: 7319
GTTTCCCAGTAGGTCTC N ATTCAC SEQ ID NO: 7320
GTTTCCCAGTAGGTCTC N ATTCCA SEQ ID NO: 7321
GTTTCCCAGTAGGTCTC N ATTCGG SEQ ID NO: 7322
GTTTCCCAGTAGGTCTC N ATTCTG SEQ ID NO: 7323
GTTTCCCAGTAGGTCTC N ATTGCT SEQ ID NO: 7324
GTTTCCCAGTAGGTCTC N ATTGGG SEQ ID NO: 7325
GTTTCCCAGTAGGTCTC N ATTGGT SEQ ID NO: 7326
GTTTCCCAGTAGGTCTC N ATTGTA SEQ ID NO: 7327
GTTTCCCAGTAGGTCTC N ATTGTC SEQ ID NO: 7328
GTTTCCCAGTAGGTCTC N ATTTCA SEQ ID NO: 7329
GTTTCCCAGTAGGTCTC N ATTTTA SEQ ID NO: 7330
GTTTCCCAGTAGGTCTC N CAACAT SEQ ID NO: 7331
GTTTCCCAGTAGGTCTC N CAACGT SEQ ID NO: 7332
GTTTCCCAGTAGGTCTC N CAACTG SEQ ID NO: 7333
GTTTCCCAGTAGGTCTC N CAAGGC SEQ ID NO: 7334
GTTTCCCAGTAGGTCTC N CAATAC SEQ ID NO: 7335
GTTTCCCAGTAGGTCTC N CAATCG SEQ ID NO: 7336
GTTTCCCAGTAGGTCTC N CAATTC SEQ ID NO: 7337
GTTTCCCAGTAGGTCTC N CACATT SEQ ID NO: 7338
GTTTCCCAGTAGGTCTC N CACCAT SEQ ID NO: 7339
GTTTCCCAGTAGGTCTC N CACCGA SEQ ID NO: 7340
GTTTCCCAGTAGGTCTC N CACCGT SEQ ID NO: 7341
GTTTCCCAGTAGGTCTC N CACGAA SEQ ID NO: 7342
GTTTCCCAGTAGGTCTC N CACGAC SEQ ID NO: 7343
GTTTCCCAGTAGGTCTC N CACGCT SEQ ID NO: 7344
GTTTCCCAGTAGGTCTC N CACGGC SEQ ID NO: 7345
GTTTCCCAGTAGGTCTC N CACGTC SEQ ID NO: 7346
GTTTCCCAGTAGGTCTC N CACGTT SEQ ID NO: 7347
GTTTCCCAGTAGGTCTC N CACTCG SEQ ID NO: 7348
GTTTCCCAGTAGGTCTC N CACTGG SEQ ID NO: 7349
GTTTCCCAGTAGGTCTC N CAGACT SEQ ID NO: 7350
GTTTCCCAGTAGGTCTC N CAGAGA SEQ ID NO: 7351
GTTTCCCAGTAGGTCTC N CAGAGT SEQ ID NO: 7352
GTTTCCCAGTAGGTCTC N CAGATG SEQ ID NO: 7353
GTTTCCCAGTAGGTCTC N CAGCAT SEQ ID NO: 7354
GTTTCCCAGTAGGTCTC N CAGCTG SEQ ID NO: 7355
GTTTCCCAGTAGGTCTC N CAGGAA SEQ ID NO: 7356
GTTTCCCAGTAGGTCTC N CAGGCT SEQ ID NO: 7357
GTTTCCCAGTAGGTCTC N CAGGTA SEQ ID NO: 7358
GTTTCCCAGTAGGTCTC N CAGTAG SEQ ID NO: 7359
GTTTCCCAGTAGGTCTC N CAGTAT SEQ ID NO: 7360
GTTTCCCAGTAGGTCTC N CAGTCA SEQ ID NO: 7361
GTTTCCCAGTAGGTCTC N CAGTCT SEQ ID NO: 7362
GTTTCCCAGTAGGTCTC N CAGTGG SEQ ID NO: 7363
GTTTCCCAGTAGGTCTC N CAGTGT SEQ ID NO: 7364
GTTTCCCAGTAGGTCTC N CATACG SEQ ID NO: 7365
GTTTCCCAGTAGGTCTC N CATAGA SEQ ID NO: 7366
GTTTCCCAGTAGGTCTC N CATATG SEQ ID NO: 7367
GTTTCCCAGTAGGTCTC N CATCCG SEQ ID NO: 7368
GTTTCCCAGTAGGTCTC N CATCGA SEQ ID NO: 7369
GTTTCCCAGTAGGTCTC N CATCGG SEQ ID NO: 7370
GTTTCCCAGTAGGTCTC N CATCGT SEQ ID NO: 7371
GTTTCCCAGTAGGTCTC N CATCTC SEQ ID NO: 7372
GTTTCCCAGTAGGTCTC N CATGCG SEQ ID NO: 7373
GTTTCCCAGTAGGTCTC N CATGTA SEQ ID NO: 7374
GTTTCCCAGTAGGTCTC N CATTAC SEQ ID NO: 7375
GTTTCCCAGTAGGTCTC N CATTAG SEQ ID NO: 7376
GTTTCCCAGTAGGTCTC N CATTGA SEQ ID NO: 7377
GTTTCCCAGTAGGTCTC N CATTGG SEQ ID NO: 7378
GTTTCCCAGTAGGTCTC N CATTTC SEQ ID NO: 7379
GTTTCCCAGTAGGTCTC N CCAGAT SEQ ID NO: 7380
GTTTCCCAGTAGGTCTC N CCAGCA SEQ ID NO: 7381
GTTTCCCAGTAGGTCTC N CCATGC SEQ ID NO: 7382
GTTTCCCAGTAGGTCTC N CCATTA SEQ ID NO: 7383
GTTTCCCAGTAGGTCTC N CCATTG SEQ ID NO: 7384
GTTTCCCAGTAGGTCTC N CCCGTA SEQ ID NO: 7385
GTTTCCCAGTAGGTCTC N CCGACA SEQ ID NO: 7386
GTTTCCCAGTAGGTCTC N CCGATT SEQ ID NO: 7387
GTTTCCCAGTAGGTCTC N CCGCGT SEQ ID NO: 7388
GTTTCCCAGTAGGTCTC N CCGCTT SEQ ID NO: 7389
GTTTCCCAGTAGGTCTC N CCGGAA SEQ ID NO: 7390
GTTTCCCAGTAGGTCTC N CCGGTT SEQ ID NO: 7391
GTTTCCCAGTAGGTCTC N CCGTAT SEQ ID NO: 7392
GTTTCCCAGTAGGTCTC N CCGTGT SEQ ID NO: 7393
GTTTCCCAGTAGGTCTC N CCGTTT SEQ ID NO: 7394
GTTTCCCAGTAGGTCTC N CCTAGA SEQ ID NO: 7395
GTTTCCCAGTAGGTCTC N CCTATG SEQ ID NO: 7396
GTTTCCCAGTAGGTCTC N CCTCAT SEQ ID NO: 7397
GTTTCCCAGTAGGTCTC N CCTGCA SEQ ID NO: 7398
GTTTCCCAGTAGGTCTC N CCTGGT SEQ ID NO: 7399
GTTTCCCAGTAGGTCTC N CCTGTG SEQ ID NO: 7400
GTTTCCCAGTAGGTCTC N CGAAAT SEQ ID NO: 7401
GTTTCCCAGTAGGTCTC N CGAACA SEQ ID NO: 7402
GTTTCCCAGTAGGTCTC N CGAATT SEQ ID NO: 7403
GTTTCCCAGTAGGTCTC N CGACAA SEQ ID NO: 7404
GTTTCCCAGTAGGTCTC N CGACAC SEQ ID NO: 7405
GTTTCCCAGTAGGTCTC N CGACAG SEQ ID NO: 7406
GTTTCCCAGTAGGTCTC N CGACAT SEQ ID NO: 7407
GTTTCCCAGTAGGTCTC N CGACTC SEQ ID NO: 7408
GTTTCCCAGTAGGTCTC N CGAGGA SEQ ID NO: 7409
GTTTCCCAGTAGGTCTC N CGAGTC SEQ ID NO: 7410
GTTTCCCAGTAGGTCTC N CGAGTT SEQ ID NO: 7411
GTTTCCCAGTAGGTCTC N CGATAT SEQ ID NO: 7412
GTTTCCCAGTAGGTCTC N CGATGA SEQ ID NO: 7413
GTTTCCCAGTAGGTCTC N CGATTG SEQ ID NO: 7414
GTTTCCCAGTAGGTCTC N CGCAAC SEQ ID NO: 7415
GTTTCCCAGTAGGTCTC N CGCACA SEQ ID NO: 7416
GTTTCCCAGTAGGTCTC N CGCATC SEQ ID NO: 7417
GTTTCCCAGTAGGTCTC N CGCGTC SEQ ID NO: 7418
GTTTCCCAGTAGGTCTC N CGCGTG SEQ ID NO: 7419
GTTTCCCAGTAGGTCTC N CGCGTT SEQ ID NO: 7420
GTTTCCCAGTAGGTCTC N CGCTGT SEQ ID NO: 7421
GTTTCCCAGTAGGTCTC N CGGAAA SEQ ID NO: 7422
GTTTCCCAGTAGGTCTC N CGGATA SEQ ID NO: 7423
GTTTCCCAGTAGGTCTC N CGGATG SEQ ID NO: 7424
GTTTCCCAGTAGGTCTC N CGGATT SEQ ID NO: 7425
GTTTCCCAGTAGGTCTC N CGGCAT SEQ ID NO: 7426
GTTTCCCAGTAGGTCTC N CGGCCT SEQ ID NO: 7427
GTTTCCCAGTAGGTCTC N CGGGAC SEQ ID NO: 7428
GTTTCCCAGTAGGTCTC N CGGGGT SEQ ID NO: 7429
GTTTCCCAGTAGGTCTC N CGGTAT SEQ ID NO: 7430
GTTTCCCAGTAGGTCTC N CGGTCT SEQ ID NO: 7431
GTTTCCCAGTAGGTCTC N CGGTTA SEQ ID NO: 7432
GTTTCCCAGTAGGTCTC N CGGTTG SEQ ID NO: 7433
GTTTCCCAGTAGGTCTC N CGTAAT SEQ ID NO: 7434
GTTTCCCAGTAGGTCTC N CGTACT SEQ ID NO: 7435
GTTTCCCAGTAGGTCTC N CGTAGA SEQ ID NO: 7436
GTTTCCCAGTAGGTCTC N CGTAGG SEQ ID NO: 7437
GTTTCCCAGTAGGTCTC N CGTATA SEQ ID NO: 7438
GTTTCCCAGTAGGTCTC N CGTATC SEQ ID NO: 7439
GTTTCCCAGTAGGTCTC N CGTATG SEQ ID NO: 7440
GTTTCCCAGTAGGTCTC N CGTCAA SEQ ID NO: 7441
GTTTCCCAGTAGGTCTC N CGTCCT SEQ ID NO: 7442
GTTTCCCAGTAGGTCTC N CGTCGA SEQ ID NO: 7443
GTTTCCCAGTAGGTCTC N CGTCTA SEQ ID NO: 7444
GTTTCCCAGTAGGTCTC N CGTCTT SEQ ID NO: 7445
GTTTCCCAGTAGGTCTC N CGTGAC SEQ ID NO: 7446
GTTTCCCAGTAGGTCTC N CGTGCG SEQ ID NO: 7447
GTTTCCCAGTAGGTCTC N CGTGCT SEQ ID NO: 7448
GTTTCCCAGTAGGTCTC N CGTGGT SEQ ID NO: 7449
GTTTCCCAGTAGGTCTC N CGTGTA SEQ ID NO: 7450
GTTTCCCAGTAGGTCTC N CGTGTC SEQ ID NO: 7451
GTTTCCCAGTAGGTCTC N CGTTAC SEQ ID NO: 7452
GTTTCCCAGTAGGTCTC N CGTTCG SEQ ID NO: 7453
GTTTCCCAGTAGGTCTC N CGTTGT SEQ ID NO: 7454
GTTTCCCAGTAGGTCTC N CGTTTC SEQ ID NO: 7455
GTTTCCCAGTAGGTCTC N CGTTTT SEQ ID NO: 7456
GTTTCCCAGTAGGTCTC N CTAACG SEQ ID NO: 7457
GTTTCCCAGTAGGTCTC N CTACAG SEQ ID NO: 7458
GTTTCCCAGTAGGTCTC N CTACGG SEQ ID NO: 7459
GTTTCCCAGTAGGTCTC N CTAGAC SEQ ID NO: 7460
GTTTCCCAGTAGGTCTC N CTAGCG SEQ ID NO: 7461
GTTTCCCAGTAGGTCTC N CTAGGT SEQ ID NO: 7462
GTTTCCCAGTAGGTCTC N CTATCG SEQ ID NO: 7463
GTTTCCCAGTAGGTCTC N CTATGT SEQ ID NO: 7464
GTTTCCCAGTAGGTCTC N CTCATG SEQ ID NO: 7465
GTTTCCCAGTAGGTCTC N CTCCCA SEQ ID NO: 7466
GTTTCCCAGTAGGTCTC N CTCGAA SEQ ID NO: 7467
GTTTCCCAGTAGGTCTC N CTCGAC SEQ ID NO: 7468
GTTTCCCAGTAGGTCTC N CTCGAG SEQ ID NO: 7469
GTTTCCCAGTAGGTCTC N CTCGGT SEQ ID NO: 7470
GTTTCCCAGTAGGTCTC N CTCGTT SEQ ID NO: 7471
GTTTCCCAGTAGGTCTC N CTCTAC SEQ ID NO: 7472
GTTTCCCAGTAGGTCTC N CTCTGC SEQ ID NO: 7473
GTTTCCCAGTAGGTCTC N CTCTGT SEQ ID NO: 7474
GTTTCCCAGTAGGTCTC N CTGATA SEQ ID NO: 7475
GTTTCCCAGTAGGTCTC N CTGATT SEQ ID NO: 7476
GTTTCCCAGTAGGTCTC N CTGCAA SEQ ID NO: 7477
GTTTCCCAGTAGGTCTC N CTGCCA SEQ ID NO: 7478
GTTTCCCAGTAGGTCTC N CTGCGC SEQ ID NO: 7479
GTTTCCCAGTAGGTCTC N CTGCTA SEQ ID NO: 7480
GTTTCCCAGTAGGTCTC N CTGCTG SEQ ID NO: 7481
GTTTCCCAGTAGGTCTC N CTGCTT SEQ ID NO: 7482
GTTTCCCAGTAGGTCTC N CTGGTA SEQ ID NO: 7483
GTTTCCCAGTAGGTCTC N CTGGTC SEQ ID NO: 7484
GTTTCCCAGTAGGTCTC N CTGTAG SEQ ID NO: 7485
GTTTCCCAGTAGGTCTC N CTGTAT SEQ ID NO: 7486
GTTTCCCAGTAGGTCTC N CTGTCG SEQ ID NO: 7487
GTTTCCCAGTAGGTCTC N CTGTCT SEQ ID NO: 7488
GTTTCCCAGTAGGTCTC N CTGTGC SEQ ID NO: 7489
GTTTCCCAGTAGGTCTC N CTGTGT SEQ ID NO: 7490
GTTTCCCAGTAGGTCTC N CTTAAC SEQ ID NO: 7491
GTTTCCCAGTAGGTCTC N CTTACG SEQ ID NO: 7492
GTTTCCCAGTAGGTCTC N CTTAGG SEQ ID NO: 7493
GTTTCCCAGTAGGTCTC N CTTATT SEQ ID NO: 7494
GTTTCCCAGTAGGTCTC N CTTCCA SEQ ID NO: 7495
GTTTCCCAGTAGGTCTC N CTTCGT SEQ ID NO: 7496
GTTTCCCAGTAGGTCTC N CTTCTA SEQ ID NO: 7497
GTTTCCCAGTAGGTCTC N CTTGAG SEQ ID NO: 7498
GTTTCCCAGTAGGTCTC N CTTGCA SEQ ID NO: 7499
GTTTCCCAGTAGGTCTC N CTTGTC SEQ ID NO: 7500
GTTTCCCAGTAGGTCTC N CTTGTT SEQ ID NO: 7501
GTTTCCCAGTAGGTCTC N CTTTAT SEQ ID NO: 7502
GTTTCCCAGTAGGTCTC N CTTTCC SEQ ID NO: 7503
GTTTCCCAGTAGGTCTC N GAAATC SEQ ID NO: 7504
GTTTCCCAGTAGGTCTC N GAAGAT SEQ ID NO: 7505
GTTTCCCAGTAGGTCTC N GAAGGA SEQ ID NO: 7506
GTTTCCCAGTAGGTCTC N GAAGTA SEQ ID NO: 7507
GTTTCCCAGTAGGTCTC N GAATAT SEQ ID NO: 7508
GTTTCCCAGTAGGTCTC N GAATCG SEQ ID NO: 7509
GTTTCCCAGTAGGTCTC N GACAAA SEQ ID NO: 7510
GTTTCCCAGTAGGTCTC N GACACC SEQ ID NO: 7511
GTTTCCCAGTAGGTCTC N GACATA SEQ ID NO: 7512
GTTTCCCAGTAGGTCTC N GACCTA SEQ ID NO: 7513
GTTTCCCAGTAGGTCTC N GACGCC SEQ ID NO: 7514
GTTTCCCAGTAGGTCTC N GACGTA SEQ ID NO: 7515
GTTTCCCAGTAGGTCTC N GACTAT SEQ ID NO: 7516
GTTTCCCAGTAGGTCTC N GACTCG SEQ ID NO: 7517
GTTTCCCAGTAGGTCTC N GACTGC SEQ ID NO: 7518
GTTTCCCAGTAGGTCTC N GACTTC SEQ ID NO: 7519
GTTTCCCAGTAGGTCTC N GACTTG SEQ ID NO: 7520
GTTTCCCAGTAGGTCTC N GACTTT SEQ ID NO: 7521
GTTTCCCAGTAGGTCTC N GAGAAT SEQ ID NO: 7522
GTTTCCCAGTAGGTCTC N GAGACG SEQ ID NO: 7523
GTTTCCCAGTAGGTCTC N GAGATA SEQ ID NO: 7524
GTTTCCCAGTAGGTCTC N GAGATC SEQ ID NO: 7525
GTTTCCCAGTAGGTCTC N GAGCAT SEQ ID NO: 7526
GTTTCCCAGTAGGTCTC N GAGGAT SEQ ID NO: 7527
GTTTCCCAGTAGGTCTC N GAGGCA SEQ ID NO: 7528
GTTTCCCAGTAGGTCTC N GAGGCT SEQ ID NO: 7529
GTTTCCCAGTAGGTCTC N GAGTAC SEQ ID NO: 7530
GTTTCCCAGTAGGTCTC N GAGTCA SEQ ID NO: 7531
GTTTCCCAGTAGGTCTC N GAGTCT SEQ ID NO: 7532
GTTTCCCAGTAGGTCTC N GAGTTA SEQ ID NO: 7533
GTTTCCCAGTAGGTCTC N GATAAT SEQ ID NO: 7534
GTTTCCCAGTAGGTCTC N GATACA SEQ ID NO: 7535
GTTTCCCAGTAGGTCTC N GATACT SEQ ID NO: 7536
GTTTCCCAGTAGGTCTC N GATAGG SEQ ID NO: 7537
GTTTCCCAGTAGGTCTC N GATATC SEQ ID NO: 7538
GTTTCCCAGTAGGTCTC N GATATG SEQ ID NO: 7539
GTTTCCCAGTAGGTCTC N GATATT SEQ ID NO: 7540
GTTTCCCAGTAGGTCTC N GATCGT SEQ ID NO: 7541
GTTTCCCAGTAGGTCTC N GATCTA SEQ ID NO: 7542
GTTTCCCAGTAGGTCTC N GATGAC SEQ ID NO: 7543
GTTTCCCAGTAGGTCTC N GATGAG SEQ ID NO: 7544
GTTTCCCAGTAGGTCTC N GATGCA SEQ ID NO: 7545
GTTTCCCAGTAGGTCTC N GATGGA SEQ ID NO: 7546
GTTTCCCAGTAGGTCTC N GATGTA SEQ ID NO: 7547
GTTTCCCAGTAGGTCTC N GATTCA SEQ ID NO: 7548
GTTTCCCAGTAGGTCTC N GATTCG SEQ ID NO: 7549
GTTTCCCAGTAGGTCTC N GATTCT SEQ ID NO: 7550
GTTTCCCAGTAGGTCTC N GCAACA SEQ ID NO: 7551
GTTTCCCAGTAGGTCTC N GCAACC SEQ ID NO: 7552
GTTTCCCAGTAGGTCTC N GCAACG SEQ ID NO: 7553
GTTTCCCAGTAGGTCTC N GCAACT SEQ ID NO: 7554
GTTTCCCAGTAGGTCTC N GCACAA SEQ ID NO: 7555
GTTTCCCAGTAGGTCTC N GCACAC SEQ ID NO: 7556
GTTTCCCAGTAGGTCTC N GCACAT SEQ ID NO: 7557
GTTTCCCAGTAGGTCTC N GCACCG SEQ ID NO: 7558
GTTTCCCAGTAGGTCTC N GCACCT SEQ ID NO: 7559
GTTTCCCAGTAGGTCTC N GCACTC SEQ ID NO: 7560
GTTTCCCAGTAGGTCTC N GCACTG SEQ ID NO: 7561
GTTTCCCAGTAGGTCTC N GCAGAG SEQ ID NO: 7562
GTTTCCCAGTAGGTCTC N GCAGCG SEQ ID NO: 7563
GTTTCCCAGTAGGTCTC N GCATAG SEQ ID NO: 7564
GTTTCCCAGTAGGTCTC N GCATCT SEQ ID NO: 7565
GTTTCCCAGTAGGTCTC N GCATTA SEQ ID NO: 7566
GTTTCCCAGTAGGTCTC N GCATTG SEQ ID NO: 7567
GTTTCCCAGTAGGTCTC N GCCAGT SEQ ID NO: 7568
GTTTCCCAGTAGGTCTC N GCCATA SEQ ID NO: 7569
GTTTCCCAGTAGGTCTC N GCCATT SEQ ID NO: 7570
GTTTCCCAGTAGGTCTC N GCCTTA SEQ ID NO: 7571
GTTTCCCAGTAGGTCTC N GCGACA SEQ ID NO: 7572
GTTTCCCAGTAGGTCTC N GCGACT SEQ ID NO: 7573
GTTTCCCAGTAGGTCTC N GCGAGA SEQ ID NO: 7574
GTTTCCCAGTAGGTCTC N GCGATC SEQ ID NO: 7575
GTTTCCCAGTAGGTCTC N GCGCCA SEQ ID NO: 7576
GTTTCCCAGTAGGTCTC N GCGCTT SEQ ID NO: 7577
GTTTCCCAGTAGGTCTC N GCGGAT SEQ ID NO: 7578
GTTTCCCAGTAGGTCTC N GCGGCA SEQ ID NO: 7579
GTTTCCCAGTAGGTCTC N GCGTAG SEQ ID NO: 7580
GTTTCCCAGTAGGTCTC N GCGTAT SEQ ID NO: 7581
GTTTCCCAGTAGGTCTC N GCGTCT SEQ ID NO: 7582
GTTTCCCAGTAGGTCTC N GCTACT SEQ ID NO: 7583
GTTTCCCAGTAGGTCTC N GCTAGC SEQ ID NO: 7584
GTTTCCCAGTAGGTCTC N GCTAGT SEQ ID NO: 7585
GTTTCCCAGTAGGTCTC N GCTATC SEQ ID NO: 7586
GTTTCCCAGTAGGTCTC N GCTCCA SEQ ID NO: 7587
GTTTCCCAGTAGGTCTC N GCTCGA SEQ ID NO: 7588
GTTTCCCAGTAGGTCTC N GCTCGG SEQ ID NO: 7589
GTTTCCCAGTAGGTCTC N GCTGAT SEQ ID NO: 7590
GTTTCCCAGTAGGTCTC N GCTGCC SEQ ID NO: 7591
GTTTCCCAGTAGGTCTC N GCTGCT SEQ ID NO: 7592
GTTTCCCAGTAGGTCTC N GCTGTA SEQ ID NO: 7593
GTTTCCCAGTAGGTCTC N GCTGTC SEQ ID NO: 7594
GTTTCCCAGTAGGTCTC N GCTGTG SEQ ID NO: 7595
GTTTCCCAGTAGGTCTC N GCTGTT SEQ ID NO: 7596
GTTTCCCAGTAGGTCTC N GCTTAT SEQ ID NO: 7597
GTTTCCCAGTAGGTCTC N GCTTCG SEQ ID NO: 7598
GTTTCCCAGTAGGTCTC N GCTTCT SEQ ID NO: 7599
GTTTCCCAGTAGGTCTC N GGAAAT SEQ ID NO: 7600
GTTTCCCAGTAGGTCTC N GGAAGC SEQ ID NO: 7601
GTTTCCCAGTAGGTCTC N GGACGT SEQ ID NO: 7602
GTTTCCCAGTAGGTCTC N GGACTG SEQ ID NO: 7603
GTTTCCCAGTAGGTCTC N GGACTT SEQ ID NO: 7604
GTTTCCCAGTAGGTCTC N GGAGCT SEQ ID NO: 7605
GTTTCCCAGTAGGTCTC N GGAGGC SEQ ID NO: 7606
GTTTCCCAGTAGGTCTC N GGATAT SEQ ID NO: 7607
GTTTCCCAGTAGGTCTC N GGATGT SEQ ID NO: 7608
GTTTCCCAGTAGGTCTC N GGCAAC SEQ ID NO: 7609
GTTTCCCAGTAGGTCTC N GGCAAT SEQ ID NO: 7610
GTTTCCCAGTAGGTCTC N GGCACT SEQ ID NO: 7611
GTTTCCCAGTAGGTCTC N GGCAGA SEQ ID NO: 7612
GTTTCCCAGTAGGTCTC N GGCATC SEQ ID NO: 7613
GTTTCCCAGTAGGTCTC N GGCCAG SEQ ID NO: 7614
GTTTCCCAGTAGGTCTC N GGCCTG SEQ ID NO: 7615
GTTTCCCAGTAGGTCTC N GGCCTT SEQ ID NO: 7616
GTTTCCCAGTAGGTCTC N GGCTAG SEQ ID NO: 7617
GTTTCCCAGTAGGTCTC N GGCTCC SEQ ID NO: 7618
GTTTCCCAGTAGGTCTC N GGCTGT SEQ ID NO: 7619
GTTTCCCAGTAGGTCTC N GGCTTC SEQ ID NO: 7620
GTTTCCCAGTAGGTCTC N GGGACC SEQ ID NO: 7621
GTTTCCCAGTAGGTCTC N GGGACT SEQ ID NO: 7622
GTTTCCCAGTAGGTCTC N GGGCTC SEQ ID NO: 7623
GTTTCCCAGTAGGTCTC N GGGGGT SEQ ID NO: 7624
GTTTCCCAGTAGGTCTC N GGGGTA SEQ ID NO: 7625
GTTTCCCAGTAGGTCTC N GGGGTG SEQ ID NO: 7626
GTTTCCCAGTAGGTCTC N GGGGTT SEQ ID NO: 7627
GTTTCCCAGTAGGTCTC N GGGTAC SEQ ID NO: 7628
GTTTCCCAGTAGGTCTC N GGGTAG SEQ ID NO: 7629
GTTTCCCAGTAGGTCTC N GGGTGC SEQ ID NO: 7630
GTTTCCCAGTAGGTCTC N GGTACC SEQ ID NO: 7631
GTTTCCCAGTAGGTCTC N GGTACG SEQ ID NO: 7632
GTTTCCCAGTAGGTCTC N GGTACT SEQ ID NO: 7633
GTTTCCCAGTAGGTCTC N GGTAGA SEQ ID NO: 7634
GTTTCCCAGTAGGTCTC N GGTAGG SEQ ID NO: 7635
GTTTCCCAGTAGGTCTC N GGTATC SEQ ID NO: 7636
GTTTCCCAGTAGGTCTC N GGTATG SEQ ID NO: 7637
GTTTCCCAGTAGGTCTC N GGTATT SEQ ID NO: 7638
GTTTCCCAGTAGGTCTC N GGTCCA SEQ ID NO: 7639
GTTTCCCAGTAGGTCTC N GGTCTA SEQ ID NO: 7640
GTTTCCCAGTAGGTCTC N GGTGAT SEQ ID NO: 7641
GTTTCCCAGTAGGTCTC N GGTGCC SEQ ID NO: 7642
GTTTCCCAGTAGGTCTC N GGTTAC SEQ ID NO: 7643
GTTTCCCAGTAGGTCTC N GGTTAT SEQ ID NO: 7644
GTTTCCCAGTAGGTCTC N GGTTGT SEQ ID NO: 7645
GTTTCCCAGTAGGTCTC N GTAATA SEQ ID NO: 7646
GTTTCCCAGTAGGTCTC N GTAATG SEQ ID NO: 7647
GTTTCCCAGTAGGTCTC N GTACAG SEQ ID NO: 7648
GTTTCCCAGTAGGTCTC N GTACAT SEQ ID NO: 7649
GTTTCCCAGTAGGTCTC N GTACCA SEQ ID NO: 7650
GTTTCCCAGTAGGTCTC N GTACCT SEQ ID NO: 7651
GTTTCCCAGTAGGTCTC N GTACGC SEQ ID NO: 7652
GTTTCCCAGTAGGTCTC N GTACGT SEQ ID NO: 7653
GTTTCCCAGTAGGTCTC N GTACTA SEQ ID NO: 7654
GTTTCCCAGTAGGTCTC N GTAGAC SEQ ID NO: 7655
GTTTCCCAGTAGGTCTC N GTAGAT SEQ ID NO: 7656
GTTTCCCAGTAGGTCTC N GTAGGG SEQ ID NO: 7657
GTTTCCCAGTAGGTCTC N GTAGTG SEQ ID NO: 7658
GTTTCCCAGTAGGTCTC N GTATAT SEQ ID NO: 7659
GTTTCCCAGTAGGTCTC N GTATCA SEQ ID NO: 7660
GTTTCCCAGTAGGTCTC N GTATCG SEQ ID NO: 7661
GTTTCCCAGTAGGTCTC N GTATCT SEQ ID NO: 7662
GTTTCCCAGTAGGTCTC N GTATGA SEQ ID NO: 7663
GTTTCCCAGTAGGTCTC N GTATGC SEQ ID NO: 7664
GTTTCCCAGTAGGTCTC N GTATTA SEQ ID NO: 7665
GTTTCCCAGTAGGTCTC N GTATTC SEQ ID NO: 7666
GTTTCCCAGTAGGTCTC N GTATTT SEQ ID NO: 7667
GTTTCCCAGTAGGTCTC N GTCACG SEQ ID NO: 7668
GTTTCCCAGTAGGTCTC N GTCACT SEQ ID NO: 7669
GTTTCCCAGTAGGTCTC N GTCAGG SEQ ID NO: 7670
GTTTCCCAGTAGGTCTC N GTCATG SEQ ID NO: 7671
GTTTCCCAGTAGGTCTC N GTCCAC SEQ ID NO: 7672
GTTTCCCAGTAGGTCTC N GTCCCA SEQ ID NO: 7673
GTTTCCCAGTAGGTCTC N GTCCTC SEQ ID NO: 7674
GTTTCCCAGTAGGTCTC N GTCGAC SEQ ID NO: 7675
GTTTCCCAGTAGGTCTC N GTCGAT SEQ ID NO: 7676
GTTTCCCAGTAGGTCTC N GTCGCA SEQ ID NO: 7677
GTTTCCCAGTAGGTCTC N GTCGTT SEQ ID NO: 7678
GTTTCCCAGTAGGTCTC N GTCTAA SEQ ID NO: 7679
GTTTCCCAGTAGGTCTC N GTCTAC SEQ ID NO: 7680
GTTTCCCAGTAGGTCTC N GTCTAG SEQ ID NO: 7681
GTTTCCCAGTAGGTCTC N GTCTAT SEQ ID NO: 7682
GTTTCCCAGTAGGTCTC N GTCTCA SEQ ID NO: 7683
GTTTCCCAGTAGGTCTC N GTCTCT SEQ ID NO: 7684
GTTTCCCAGTAGGTCTC N GTCTGA SEQ ID NO: 7685
GTTTCCCAGTAGGTCTC N GTCTTA SEQ ID NO: 7686
GTTTCCCAGTAGGTCTC N GTGAAT SEQ ID NO: 7687
GTTTCCCAGTAGGTCTC N GTGACA SEQ ID NO: 7688
GTTTCCCAGTAGGTCTC N GTGATA SEQ ID NO: 7689
GTTTCCCAGTAGGTCTC N GTGCCC SEQ ID NO: 7690
GTTTCCCAGTAGGTCTC N GTGCCG SEQ ID NO: 7691
GTTTCCCAGTAGGTCTC N GTGCGA SEQ ID NO: 7692
GTTTCCCAGTAGGTCTC N GTGCGC SEQ ID NO: 7693
GTTTCCCAGTAGGTCTC N GTGCGT SEQ ID NO: 7694
GTTTCCCAGTAGGTCTC N GTGCTA SEQ ID NO: 7695
GTTTCCCAGTAGGTCTC N GTGCTG SEQ ID NO: 7696
GTTTCCCAGTAGGTCTC N GTGGTC SEQ ID NO: 7697
GTTTCCCAGTAGGTCTC N GTGGTT SEQ ID NO: 7698
GTTTCCCAGTAGGTCTC N GTGTCG SEQ ID NO: 7699
GTTTCCCAGTAGGTCTC N GTGTCT SEQ ID NO: 7700
GTTTCCCAGTAGGTCTC N GTGTGG SEQ ID NO: 7701
GTTTCCCAGTAGGTCTC N GTTAAC SEQ ID NO: 7702
GTTTCCCAGTAGGTCTC N GTTACA SEQ ID NO: 7703
GTTTCCCAGTAGGTCTC N GTTACT SEQ ID NO: 7704
GTTTCCCAGTAGGTCTC N GTTAGA SEQ ID NO: 7705
GTTTCCCAGTAGGTCTC N GTTAGC SEQ ID NO: 7706
GTTTCCCAGTAGGTCTC N GTTATA SEQ ID NO: 7707
GTTTCCCAGTAGGTCTC N GTTATC SEQ ID NO: 7708
GTTTCCCAGTAGGTCTC N GTTCGG SEQ ID NO: 7709
GTTTCCCAGTAGGTCTC N GTTGCG SEQ ID NO: 7710
GTTTCCCAGTAGGTCTC N GTTGGC SEQ ID NO: 7711
GTTTCCCAGTAGGTCTC N GTTGGG SEQ ID NO: 7712
GTTTCCCAGTAGGTCTC N GTTGTC SEQ ID NO: 7713
GTTTCCCAGTAGGTCTC N GTTGTG SEQ ID NO: 7714
GTTTCCCAGTAGGTCTC N GTTTAT SEQ ID NO: 7715
GTTTCCCAGTAGGTCTC N GTTTCA SEQ ID NO: 7716
GTTTCCCAGTAGGTCTC N GTTTCG SEQ ID NO: 7717
GTTTCCCAGTAGGTCTC N GTTTGC SEQ ID NO: 7718
GTTTCCCAGTAGGTCTC N GTTTTG SEQ ID NO: 7719
GTTTCCCAGTAGGTCTC N GTTTTT SEQ ID NO: 7720
GTTTCCCAGTAGGTCTC N TAAAAT SEQ ID NO: 7721
GTTTCCCAGTAGGTCTC N TAACCG SEQ ID NO: 7722
GTTTCCCAGTAGGTCTC N TAACGT SEQ ID NO: 7723
GTTTCCCAGTAGGTCTC N TAACTC SEQ ID NO: 7724
GTTTCCCAGTAGGTCTC N TAAGTT SEQ ID NO: 7725
GTTTCCCAGTAGGTCTC N TAATAA SEQ ID NO: 7726
GTTTCCCAGTAGGTCTC N TAATCT SEQ ID NO: 7727
GTTTCCCAGTAGGTCTC N TAATGC SEQ ID NO: 7728
GTTTCCCAGTAGGTCTC N TAATTG SEQ ID NO: 7729
GTTTCCCAGTAGGTCTC N TACAAG SEQ ID NO: 7730
GTTTCCCAGTAGGTCTC N TACACG SEQ ID NO: 7731
GTTTCCCAGTAGGTCTC N TACAGC SEQ ID NO: 7732
GTTTCCCAGTAGGTCTC N TACAGG SEQ ID NO: 7733
GTTTCCCAGTAGGTCTC N TACAGT SEQ ID NO: 7734
GTTTCCCAGTAGGTCTC N TACATA SEQ ID NO: 7735
GTTTCCCAGTAGGTCTC N TACATC SEQ ID NO: 7736
GTTTCCCAGTAGGTCTC N TACCAA SEQ ID NO: 7737
GTTTCCCAGTAGGTCTC N TACCGA SEQ ID NO: 7738
GTTTCCCAGTAGGTCTC N TACCTG SEQ ID NO: 7739
GTTTCCCAGTAGGTCTC N TACGCT SEQ ID NO: 7740
GTTTCCCAGTAGGTCTC N TACGGC SEQ ID NO: 7741
GTTTCCCAGTAGGTCTC N TACGGG SEQ ID NO: 7742
GTTTCCCAGTAGGTCTC N TACGGT SEQ ID NO: 7743
GTTTCCCAGTAGGTCTC N TACGTA SEQ ID NO: 7744
GTTTCCCAGTAGGTCTC N TACGTC SEQ ID NO: 7745
GTTTCCCAGTAGGTCTC N TACGTT SEQ ID NO: 7746
GTTTCCCAGTAGGTCTC N TACTCG SEQ ID NO: 7747
GTTTCCCAGTAGGTCTC N TACTGC SEQ ID NO: 7748
GTTTCCCAGTAGGTCTC N TACTGT SEQ ID NO: 7749
GTTTCCCAGTAGGTCTC N TACTTA SEQ ID NO: 7750
GTTTCCCAGTAGGTCTC N TAGACG SEQ ID NO: 7751
GTTTCCCAGTAGGTCTC N TAGATT SEQ ID NO: 7752
GTTTCCCAGTAGGTCTC N TAGCAC SEQ ID NO: 7753
GTTTCCCAGTAGGTCTC N TAGCCG SEQ ID NO: 7754
GTTTCCCAGTAGGTCTC N TAGCGC SEQ ID NO: 7755
GTTTCCCAGTAGGTCTC N TAGCGT SEQ ID NO: 7756
GTTTCCCAGTAGGTCTC N TAGGCA SEQ ID NO: 7757
GTTTCCCAGTAGGTCTC N TAGGTG SEQ ID NO: 7758
GTTTCCCAGTAGGTCTC N TAGTGC SEQ ID NO: 7759
GTTTCCCAGTAGGTCTC N TAGTGT SEQ ID NO: 7760
GTTTCCCAGTAGGTCTC N TATACG SEQ ID NO: 7761
GTTTCCCAGTAGGTCTC N TATATC SEQ ID NO: 7762
GTTTCCCAGTAGGTCTC N TATATG SEQ ID NO: 7763
GTTTCCCAGTAGGTCTC N TATCGA SEQ ID NO: 7764
GTTTCCCAGTAGGTCTC N TATCGC SEQ ID NO: 7765
GTTTCCCAGTAGGTCTC N TATCGG SEQ ID NO: 7766
GTTTCCCAGTAGGTCTC N TATCGT SEQ ID NO: 7767
GTTTCCCAGTAGGTCTC N TATCTT SEQ ID NO: 7768
GTTTCCCAGTAGGTCTC N TATGAT SEQ ID NO: 7769
GTTTCCCAGTAGGTCTC N TATGCA SEQ ID NO: 7770
GTTTCCCAGTAGGTCTC N TATGCG SEQ ID NO: 7771
GTTTCCCAGTAGGTCTC N TATGGA SEQ ID NO: 7772
GTTTCCCAGTAGGTCTC N TATGGC SEQ ID NO: 7773
GTTTCCCAGTAGGTCTC N TATGGG SEQ ID NO: 7774
GTTTCCCAGTAGGTCTC N TATGTA SEQ ID NO: 7775
GTTTCCCAGTAGGTCTC N TATGTC SEQ ID NO: 7776
GTTTCCCAGTAGGTCTC N TATGTT SEQ ID NO: 7777
GTTTCCCAGTAGGTCTC N TATTCG SEQ ID NO: 7778
GTTTCCCAGTAGGTCTC N TATTGG SEQ ID NO: 7779
GTTTCCCAGTAGGTCTC N TATTGT SEQ ID NO: 7780
GTTTCCCAGTAGGTCTC N TATTTG SEQ ID NO: 7781
GTTTCCCAGTAGGTCTC N TCACAG SEQ ID NO: 7782
GTTTCCCAGTAGGTCTC N TCACCG SEQ ID NO: 7783
GTTTCCCAGTAGGTCTC N TCACGC SEQ ID NO: 7784
GTTTCCCAGTAGGTCTC N TCACGG SEQ ID NO: 7785
GTTTCCCAGTAGGTCTC N TCACGT SEQ ID NO: 7786
GTTTCCCAGTAGGTCTC N TCACTT SEQ ID NO: 7787
GTTTCCCAGTAGGTCTC N TCAGAG SEQ ID NO: 7788
GTTTCCCAGTAGGTCTC N TCAGGC SEQ ID NO: 7789
GTTTCCCAGTAGGTCTC N TCAGGT SEQ ID NO: 7790
GTTTCCCAGTAGGTCTC N TCAGTG SEQ ID NO: 7791
GTTTCCCAGTAGGTCTC N TCATGA SEQ ID NO: 7792
GTTTCCCAGTAGGTCTC N TCATGC SEQ ID NO: 7793
GTTTCCCAGTAGGTCTC N TCATGG SEQ ID NO: 7794
GTTTCCCAGTAGGTCTC N TCATGT SEQ ID NO: 7795
GTTTCCCAGTAGGTCTC N TCCAAC SEQ ID NO: 7796
GTTTCCCAGTAGGTCTC N TCCACA SEQ ID NO: 7797
GTTTCCCAGTAGGTCTC N TCCACG SEQ ID NO: 7798
GTTTCCCAGTAGGTCTC N TCCAGA SEQ ID NO: 7799
GTTTCCCAGTAGGTCTC N TCCATT SEQ ID NO: 7800
GTTTCCCAGTAGGTCTC N TCCCAG SEQ ID NO: 7801
GTTTCCCAGTAGGTCTC N TCCCTT SEQ ID NO: 7802
GTTTCCCAGTAGGTCTC N TCCGAG SEQ ID NO: 7803
GTTTCCCAGTAGGTCTC N TCCGTC SEQ ID NO: 7804
GTTTCCCAGTAGGTCTC N TCCTGG SEQ ID NO: 7805
GTTTCCCAGTAGGTCTC N TCCTGT SEQ ID NO: 7806
GTTTCCCAGTAGGTCTC N TCCTTG SEQ ID NO: 7807
GTTTCCCAGTAGGTCTC N TCGAAT SEQ ID NO: 7808
GTTTCCCAGTAGGTCTC N TCGACA SEQ ID NO: 7809
GTTTCCCAGTAGGTCTC N TCGACC SEQ ID NO: 7810
GTTTCCCAGTAGGTCTC N TCGACG SEQ ID NO: 7811
GTTTCCCAGTAGGTCTC N TCGACT SEQ ID NO: 7812
GTTTCCCAGTAGGTCTC N TCGAGC SEQ ID NO: 7813
GTTTCCCAGTAGGTCTC N TCGAGT SEQ ID NO: 7814
GTTTCCCAGTAGGTCTC N TCGATC SEQ ID NO: 7815
GTTTCCCAGTAGGTCTC N TCGCAA SEQ ID NO: 7816
GTTTCCCAGTAGGTCTC N TCGCAT SEQ ID NO: 7817
GTTTCCCAGTAGGTCTC N TCGCGT SEQ ID NO: 7818
GTTTCCCAGTAGGTCTC N TCGGAT SEQ ID NO: 7819
GTTTCCCAGTAGGTCTC N TCGGTA SEQ ID NO: 7820
GTTTCCCAGTAGGTCTC N TCGGTG SEQ ID NO: 7821
GTTTCCCAGTAGGTCTC N TCGTAC SEQ ID NO: 7822
GTTTCCCAGTAGGTCTC N TCGTCG SEQ ID NO: 7823
GTTTCCCAGTAGGTCTC N TCGTCT SEQ ID NO: 7824
GTTTCCCAGTAGGTCTC N TCGTGT SEQ ID NO: 7825
GTTTCCCAGTAGGTCTC N TCGTTA SEQ ID NO: 7826
GTTTCCCAGTAGGTCTC N TCGTTC SEQ ID NO: 7827
GTTTCCCAGTAGGTCTC N TCGTTG SEQ ID NO: 7828
GTTTCCCAGTAGGTCTC N TCTAAC SEQ ID NO: 7829
GTTTCCCAGTAGGTCTC N TCTACG SEQ ID NO: 7830
GTTTCCCAGTAGGTCTC N TCTAGC SEQ ID NO: 7831
GTTTCCCAGTAGGTCTC N TCTAGG SEQ ID NO: 7832
GTTTCCCAGTAGGTCTC N TCTATG SEQ ID NO: 7833
GTTTCCCAGTAGGTCTC N TCTCAG SEQ ID NO: 7834
GTTTCCCAGTAGGTCTC N TCTCCC SEQ ID NO: 7835
GTTTCCCAGTAGGTCTC N TCTCGT SEQ ID NO: 7836
GTTTCCCAGTAGGTCTC N TCTCTA SEQ ID NO: 7837
GTTTCCCAGTAGGTCTC N TCTCTG SEQ ID NO: 7838
GTTTCCCAGTAGGTCTC N TCTGCG SEQ ID NO: 7839
GTTTCCCAGTAGGTCTC N TCTGCT SEQ ID NO: 7840
GTTTCCCAGTAGGTCTC N TCTGGA SEQ ID NO: 7841
GTTTCCCAGTAGGTCTC N TCTGGG SEQ ID NO: 7842
GTTTCCCAGTAGGTCTC N TCTGTC SEQ ID NO: 7843
GTTTCCCAGTAGGTCTC N TCTTCG SEQ ID NO: 7844
GTTTCCCAGTAGGTCTC N TCTTGT SEQ ID NO: 7845
GTTTCCCAGTAGGTCTC N TGAAGT SEQ ID NO: 7846
GTTTCCCAGTAGGTCTC N TGAATC SEQ ID NO: 7847
GTTTCCCAGTAGGTCTC N TGACAT SEQ ID NO: 7848
GTTTCCCAGTAGGTCTC N TGACCG SEQ ID NO: 7849
GTTTCCCAGTAGGTCTC N TGACGA SEQ ID NO: 7850
GTTTCCCAGTAGGTCTC N TGACTT SEQ ID NO: 7851
GTTTCCCAGTAGGTCTC N TGAGAT SEQ ID NO: 7852
GTTTCCCAGTAGGTCTC N TGAGGG SEQ ID NO: 7853
GTTTCCCAGTAGGTCTC N TGAGTA SEQ ID NO: 7854
GTTTCCCAGTAGGTCTC N TGAGTC SEQ ID NO: 7855
GTTTCCCAGTAGGTCTC N TGATAC SEQ ID NO: 7856
GTTTCCCAGTAGGTCTC N TGATTC SEQ ID NO: 7857
GTTTCCCAGTAGGTCTC N TGATTG SEQ ID NO: 7858
GTTTCCCAGTAGGTCTC N TGCAAC SEQ ID NO: 7859
GTTTCCCAGTAGGTCTC N TGCACA SEQ ID NO: 7860
GTTTCCCAGTAGGTCTC N TGCACC SEQ ID NO: 7861
GTTTCCCAGTAGGTCTC N TGCAGG SEQ ID NO: 7862
GTTTCCCAGTAGGTCTC N TGCATC SEQ ID NO: 7863
GTTTCCCAGTAGGTCTC N TGCCCG SEQ ID NO: 7864
GTTTCCCAGTAGGTCTC N TGCCGG SEQ ID NO: 7865
GTTTCCCAGTAGGTCTC N TGCCGT SEQ ID NO: 7866
GTTTCCCAGTAGGTCTC N TGCGAC SEQ ID NO: 7867
GTTTCCCAGTAGGTCTC N TGCGAT SEQ ID NO: 7868
GTTTCCCAGTAGGTCTC N TGCGCA SEQ ID NO: 7869
GTTTCCCAGTAGGTCTC N TGCGCG SEQ ID NO: 7870
GTTTCCCAGTAGGTCTC N TGCGCT SEQ ID NO: 7871
GTTTCCCAGTAGGTCTC N TGCGTA SEQ ID NO: 7872
GTTTCCCAGTAGGTCTC N TGCTAC SEQ ID NO: 7873
GTTTCCCAGTAGGTCTC N TGCTAG SEQ ID NO: 7874
GTTTCCCAGTAGGTCTC N TGCTAT SEQ ID NO: 7875
GTTTCCCAGTAGGTCTC N TGCTCC SEQ ID NO: 7876
GTTTCCCAGTAGGTCTC N TGCTGA SEQ ID NO: 7877
GTTTCCCAGTAGGTCTC N TGCTGG SEQ ID NO: 7878
GTTTCCCAGTAGGTCTC N TGCTGT SEQ ID NO: 7879
GTTTCCCAGTAGGTCTC N TGCTTT SEQ ID NO: 7880
GTTTCCCAGTAGGTCTC N TGGACT SEQ ID NO: 7881
GTTTCCCAGTAGGTCTC N TGGAGT SEQ ID NO: 7882
GTTTCCCAGTAGGTCTC N TGGCAG SEQ ID NO: 7883
GTTTCCCAGTAGGTCTC N TGGCTA SEQ ID NO: 7884
GTTTCCCAGTAGGTCTC N TGGGAC SEQ ID NO: 7885
GTTTCCCAGTAGGTCTC N TGGTAC SEQ ID NO: 7886
GTTTCCCAGTAGGTCTC N TGGTCT SEQ ID NO: 7887
GTTTCCCAGTAGGTCTC N TGTAAG SEQ ID NO: 7888
GTTTCCCAGTAGGTCTC N TGTACC SEQ ID NO: 7889
GTTTCCCAGTAGGTCTC N TGTAGT SEQ ID NO: 7890
GTTTCCCAGTAGGTCTC N TGTATA SEQ ID NO: 7891
GTTTCCCAGTAGGTCTC N TGTATC SEQ ID NO: 7892
GTTTCCCAGTAGGTCTC N TGTATT SEQ ID NO: 7893
GTTTCCCAGTAGGTCTC N TGTCAC SEQ ID NO: 7894
GTTTCCCAGTAGGTCTC N TGTCAT SEQ ID NO: 7895
GTTTCCCAGTAGGTCTC N TGTCGC SEQ ID NO: 7896
GTTTCCCAGTAGGTCTC N TGTCGT SEQ ID NO: 7897
GTTTCCCAGTAGGTCTC N TGTCTA SEQ ID NO: 7898
GTTTCCCAGTAGGTCTC N TGTCTC SEQ ID NO: 7899
GTTTCCCAGTAGGTCTC N TGTCTT SEQ ID NO: 7900
GTTTCCCAGTAGGTCTC N TGTGAC SEQ ID NO: 7901
GTTTCCCAGTAGGTCTC N TGTGCA SEQ ID NO: 7902
GTTTCCCAGTAGGTCTC N TGTGCG SEQ ID NO: 7903
GTTTCCCAGTAGGTCTC N TGTGGA SEQ ID NO: 7904
GTTTCCCAGTAGGTCTC N TGTGTC SEQ ID NO: 7905
GTTTCCCAGTAGGTCTC N TGTGTG SEQ ID NO: 7906
GTTTCCCAGTAGGTCTC N TGTTAA SEQ ID NO: 7907
GTTTCCCAGTAGGTCTC N TGTTAC SEQ ID NO: 7908
GTTTCCCAGTAGGTCTC N TGTTAT SEQ ID NO: 7909
GTTTCCCAGTAGGTCTC N TGTTCC SEQ ID NO: 7910
GTTTCCCAGTAGGTCTC N TGTTTC SEQ ID NO: 7911
GTTTCCCAGTAGGTCTC N TGTTTG SEQ ID NO: 7912
GTTTCCCAGTAGGTCTC N TTAACG SEQ ID NO: 7913
GTTTCCCAGTAGGTCTC N TTAATG SEQ ID NO: 7914
GTTTCCCAGTAGGTCTC N TTACAG SEQ ID NO: 7915
GTTTCCCAGTAGGTCTC N TTACAT SEQ ID NO: 7916
GTTTCCCAGTAGGTCTC N TTACCG SEQ ID NO: 7917
GTTTCCCAGTAGGTCTC N TTACCT SEQ ID NO: 7918
GTTTCCCAGTAGGTCTC N TTACGT SEQ ID NO: 7919
GTTTCCCAGTAGGTCTC N TTACTC SEQ ID NO: 7920
GTTTCCCAGTAGGTCTC N TTACTG SEQ ID NO: 7921
GTTTCCCAGTAGGTCTC N TTAGCG SEQ ID NO: 7922
GTTTCCCAGTAGGTCTC N TTAGGC SEQ ID NO: 7923
GTTTCCCAGTAGGTCTC N TTAGGG SEQ ID NO: 7924
GTTTCCCAGTAGGTCTC N TTATCG SEQ ID NO: 7925
GTTTCCCAGTAGGTCTC N TTATCT SEQ ID NO: 7926
GTTTCCCAGTAGGTCTC N TTATGC SEQ ID NO: 7927
GTTTCCCAGTAGGTCTC N TTATGT SEQ ID NO: 7928
GTTTCCCAGTAGGTCTC N TTATTT SEQ ID NO: 7929
GTTTCCCAGTAGGTCTC N TTCACG SEQ ID NO: 7930
GTTTCCCAGTAGGTCTC N TTCACT SEQ ID NO: 7931
GTTTCCCAGTAGGTCTC N TTCAGG SEQ ID NO: 7932
GTTTCCCAGTAGGTCTC N TTCATC SEQ ID NO: 7933
GTTTCCCAGTAGGTCTC N TTCATG SEQ ID NO: 7934
GTTTCCCAGTAGGTCTC N TTCCAA SEQ ID NO: 7935
GTTTCCCAGTAGGTCTC N TTCCAC SEQ ID NO: 7936
GTTTCCCAGTAGGTCTC N TTCCTG SEQ ID NO: 7937
GTTTCCCAGTAGGTCTC N TTCGAC SEQ ID NO: 7938
GTTTCCCAGTAGGTCTC N TTCGAG SEQ ID NO: 7939
GTTTCCCAGTAGGTCTC N TTCGCA SEQ ID NO: 7940
GTTTCCCAGTAGGTCTC N TTCGGC SEQ ID NO: 7941
GTTTCCCAGTAGGTCTC N TTCGGT SEQ ID NO: 7942
GTTTCCCAGTAGGTCTC N TTCGTC SEQ ID NO: 7943
GTTTCCCAGTAGGTCTC N TTCTAA SEQ ID NO: 7944
GTTTCCCAGTAGGTCTC N TTCTGG SEQ ID NO: 7945
GTTTCCCAGTAGGTCTC N TTGAAT SEQ ID NO: 7946
GTTTCCCAGTAGGTCTC N TTGAGA SEQ ID NO: 7947
GTTTCCCAGTAGGTCTC N TTGAGT SEQ ID NO: 7948
GTTTCCCAGTAGGTCTC N TTGATG SEQ ID NO: 7949
GTTTCCCAGTAGGTCTC N TTGCAC SEQ ID NO: 7950
GTTTCCCAGTAGGTCTC N TTGCAG SEQ ID NO: 7951
GTTTCCCAGTAGGTCTC N TTGCAT SEQ ID NO: 7952
GTTTCCCAGTAGGTCTC N TTGCGA SEQ ID NO: 7953
GTTTCCCAGTAGGTCTC N TTGCGG SEQ ID NO: 7954
GTTTCCCAGTAGGTCTC N TTGCTA SEQ ID NO: 7955
GTTTCCCAGTAGGTCTC N TTGGAG SEQ ID NO: 7956
GTTTCCCAGTAGGTCTC N TTGGCA SEQ ID NO: 7957
GTTTCCCAGTAGGTCTC N TTGGCC SEQ ID NO: 7958
GTTTCCCAGTAGGTCTC N TTGGGC SEQ ID NO: 7959
GTTTCCCAGTAGGTCTC N TTGTAT SEQ ID NO: 7960
GTTTCCCAGTAGGTCTC N TTGTCA SEQ ID NO: 7961
GTTTCCCAGTAGGTCTC N TTGTCC SEQ ID NO: 7962
GTTTCCCAGTAGGTCTC N TTGTCG SEQ ID NO: 7963
GTTTCCCAGTAGGTCTC N TTGTGC SEQ ID NO: 7964
GTTTCCCAGTAGGTCTC N TTGTGT SEQ ID NO: 7965
GTTTCCCAGTAGGTCTC N TTGTTA SEQ ID NO: 7966
GTTTCCCAGTAGGTCTC N TTGTTT SEQ ID NO: 7967
GTTTCCCAGTAGGTCTC N TTTAGG SEQ ID NO: 7968
GTTTCCCAGTAGGTCTC N TTTAGT SEQ ID NO: 7969
GTTTCCCAGTAGGTCTC N TTTATG SEQ ID NO: 7970
GTTTCCCAGTAGGTCTC N TTTCAA SEQ ID NO: 7971
GTTTCCCAGTAGGTCTC N TTTCCA SEQ ID NO: 7972
GTTTCCCAGTAGGTCTC N TTTCCG SEQ ID NO: 7973
GTTTCCCAGTAGGTCTC N TTTCCT SEQ ID NO: 7974
GTTTCCCAGTAGGTCTC N TTTGAG SEQ ID NO: 7975
GTTTCCCAGTAGGTCTC N TTTGCG SEQ ID NO: 7976
GTTTCCCAGTAGGTCTC N TTTGCT SEQ ID NO: 7977
GTTTCCCAGTAGGTCTC N TTTGGC SEQ ID NO: 7978
GTTTCCCAGTAGGTCTC N TTTTCC SEQ ID NO: 7979
GTTTCCCAGTAGGTCTC N TTTTGC SEQ ID NO: 7980
GTTTCCCAGTAGGTCTC N TTTTTC SEQ ID NO: 7981
GTTTCCCAGTAGGTCTC N TTTTTT SEQ ID NO: 7982
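Every entry in the listing above has the same layout: a fixed 17-nt 5' tag (GTTTCCCAGTAGGTCTC), one degenerate position N, and a 3' hexamer. The Python sketch below assembles an entry from its hexamer and expands N into concrete sequences. The helper names `primer` and `expand_degenerate` are hypothetical, and reading N as "any of A, C, G or T" follows the standard IUPAC convention rather than anything stated in this section.

```python
from itertools import product

# Fixed 5' tag shared by every entry in the listing (17 nt).
TAG = "GTTTCCCAGTAGGTCTC"

def primer(hexamer: str) -> str:
    """Return the full primer as written in the listing: tag + N + hexamer."""
    hexamer = hexamer.upper()
    if len(hexamer) != 6 or set(hexamer) - set("ACGT"):
        raise ValueError(f"not a DNA hexamer: {hexamer!r}")
    return TAG + "N" + hexamer

def expand_degenerate(seq: str) -> list[str]:
    """Expand each IUPAC N into A, C, G and T, returning every concrete sequence."""
    choices = ["ACGT" if base == "N" else base for base in seq]
    return ["".join(bases) for bases in product(*choices)]

p = primer("CAATAC")          # SEQ ID NO: 7335
print(p, len(p))              # GTTTCCCAGTAGGTCTCNCAATAC 24
print(expand_degenerate(p))   # four 24-mers differing only at position 18
```

Keeping the tag as a constant makes it easy to check a transcribed entry against the listing format before ordering oligonucleotides.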
Generation of a Library of MEP Primers
[00112] Total RNA (0.5 μg) from tissue samples was treated with DNase I (Ambion DNA-free, Austin, TX, USA) and reverse transcribed using SuperScript III (Invitrogen, Carlsbad, CA, USA) with the FS-MEP (forward strand microbe enrichment primers), as illustrated in Fig. 8. The reverse transcription reaction was incubated for 30 min at 40 °C, followed by an inactivation step of 15 min at 75 °C. Second-strand cDNA synthesis was carried out using RS-MEP (reverse strand microbe enrichment primers) and the 3'→5' exo⁻ Klenow fragment (New England BioLabs, Ipswich, MA). The purified double-stranded cDNA was amplified using the Extend primers (Fig. 3C); the resulting purified PCR product represents the library. PCR products were ligated to linkers for sequencing on a GS FLX sequencer (454 Life Sciences, Branford, CT, USA).
[00113] Sequence data generated over the course of 45 UHTS experiments performed on the 454-Roche sequencing platform were analyzed. For each library, sequence reads identified as human 18S and 28S rRNA, mitochondrial 12S and 16S rRNA, ATP synthase, and NADH dehydrogenase (ND2, ND4, ND6) were selected and mapped to the reference sequences. The read distribution for each host gene was determined by coverage depth: the coverage depth at a given nucleotide of a reference sequence is the number of reads that align to that reference sequence and cover that nucleotide (Figs. 2A-2B).
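The coverage-depth definition above can be computed directly from read alignments. The sketch below is an illustration, not the authors' pipeline. It assumes each aligned read has already been reduced to a 0-based, end-exclusive (start, end) interval on the reference (a hypothetical input format), and it uses a difference array so that the cost is linear in reference length plus read count.

```python
def coverage_depth(ref_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-nucleotide coverage depth.

    alignments: (start, end) of each aligned read on the reference,
    0-based and end-exclusive (hypothetical input format).
    Depth at position i = number of reads whose interval covers i.
    """
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, ref_length)  # clip to the reference
        if start < end:
            diff[start] += 1   # read begins covering here
            diff[end] -= 1     # read stops covering here
    depth, running = [], 0
    for i in range(ref_length):
        running += diff[i]     # prefix sum turns interval edges into depth
        depth.append(running)
    return depth

print(coverage_depth(10, [(0, 4), (2, 6), (3, 10)]))
# [1, 1, 2, 3, 2, 2, 1, 1, 1, 1]
```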
[00114] Plotting coverage depth as a function of nucleotide position gives a quantitative measure of the read distribution and reveals the regions that are amplified (Fig. 4B). Figs. 6A-6D illustrate the normalized coverage depth for the human 12S, 16S, 18S and 28S rRNA genes across all 45 UHTS experiments. Regions giving a large number of reads (arbitrary cutoff: >1% of relative coverage depth) were identified. The coverage depth pattern fluctuates slightly between UHTS experiments and is tissue independent. Identical analyses were conducted for mitochondrial host genes.
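The ">1% of relative coverage depth" rule can be expressed as a threshold over a normalized depth profile. The text does not define the normalization, so this sketch assumes relative depth at a position is its depth divided by the summed depth over the whole gene. It returns contiguous runs above the cutoff as 0-based, end-exclusive (start, end) pairs; the function name is hypothetical.

```python
def high_coverage_regions(depth: list[int], cutoff: float = 0.01) -> list[tuple[int, int]]:
    """Return (start, end) runs, end-exclusive, where relative depth exceeds cutoff.

    Relative depth is ASSUMED to be depth[i] / sum(depth); the source only
    says ">1% of relative coverage depth" without defining the normalization.
    """
    total = sum(depth)
    if total == 0:
        return []
    regions, start = [], None
    for i, d in enumerate(depth):
        above = d / total > cutoff
        if above and start is None:
            start = i                      # a run above the cutoff opens
        elif not above and start is not None:
            regions.append((start, i))     # the run closes
            start = None
    if start is not None:                  # run reaches the end of the gene
        regions.append((start, len(depth)))
    return regions

depth = [1] * 100 + [50] * 10 + [1] * 100
print(high_coverage_regions(depth))   # [(100, 110)]
```

Under a different normalization (for example, dividing by the maximum depth) the same function applies after changing one line, so the cutoff definition should be confirmed before reuse.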
[00115] The analysis revealed non-uniform coverage depth of host genes, both ribosomal (18S rRNA, 28S rRNA) and mitochondrial (12S rRNA, 16S rRNA, ATP synthase, NADH dehydrogenase), even though random primers were used to generate the cDNA libraries (2-3) (Figs. 6A-6D and 7A-7D). Investigation of this non-uniformity revealed a strong correlation between the presence of RNA secondary structure (e.g., high G-C content) and the absence of amplification of those regions.
[00116] It was postulated that regions with strong secondary structure impede cDNA synthesis. Predicting the secondary structure of a gene therefore makes it possible to predict, at least qualitatively, the distribution of reads throughout the gene (5). Accordingly, only primers whose sequence matches the target gene in a region lacking secondary structure are able to generate sequence reads. All possible hexamer sequences with perfect matches to the host regions giving >1% of relative coverage depth were excluded from the primer library (Figs. 6A-6D, Table 1). Because only these specific host regions are targeted, a minimal number of primers is excluded from the random hexamer library, which increases the sensitivity of microbial RNA detection.
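The exclusion step amounts to set subtraction over the 4,096 possible hexamers. The sketch below assumes the high-coverage host regions are already available as sequence strings. Which strand orientation applies to FS-MEP versus RS-MEP is also an assumption here: a primer that anneals to a region has the region's reverse-complement sequence, so the `as_reverse_complement` flag (a hypothetical parameter) switches between the two readings. The sketch is not expected to reproduce the exact 828/834-hexamer sets.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hexamers_in(seq: str, k: int = 6) -> set[str]:
    """Every k-mer that occurs in seq, i.e. perfect matches to some window."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def enrichment_library(host_regions: list[str],
                       as_reverse_complement: bool = False,
                       k: int = 6) -> set[str]:
    """All 4**k k-mers minus those perfectly matching a high-coverage host region.

    as_reverse_complement=True removes the k-mers that would anneal to the
    region (the region's reverse complement). The strand mapping to
    FS-MEP / RS-MEP is an ASSUMPTION, not taken from the source.
    """
    library = {"".join(p) for p in product("ACGT", repeat=k)}
    for region in host_regions:
        region = region.upper().replace("U", "T")   # accept RNA sequences
        target = revcomp(region) if as_reverse_complement else region
        library -= hexamers_in(target, k)
    return library

lib = enrichment_library(["ACGTACGT"])
print(len(lib))   # 4093: ACGTAC, CGTACG and GTACGT were removed
```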
[00117] The resulting microbe enrichment primer (MEP) set is composed of 828 hexamers for the forward strand MEP primers (FS-MEP) and 834 hexamers for the reverse strand MEP primers (RS-MEP).
Microbe Enrichment Primer (MEP) Synthesis
[00118] Each microbe enrichment primer was synthesized individually by Eurofins MWG Operon (Huntsville, AL, USA). Oligonucleotides were desalted and resuspended in water to 100 μM before pooling at equimolar concentrations.
[00119] Table 10 shows real-time PCR results for the 18S and 28S ribosomal RNA genes of a liver specimen using either random octamer primers (standard protocol) or MEP primers (MEP protocol).
TABLE 10
As can be seen from Table 10, the MEP protocol reduced the number of unwanted reads from the 18S rRNA gene approximately 7,000-fold, while the number of unwanted reads from the 28S rRNA gene was reduced approximately 200-fold.
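This section does not state how the fold reductions were derived from the real-time PCR data. Under the common comparative-Ct convention, and assuming near-perfect amplification efficiency (template doubling each cycle), a fold change corresponds to a shift in cycle threshold of log base (1 + efficiency) of the fold. The sketch below converts the reported figures into approximate cycle shifts under that assumption only.

```python
import math

def ct_shift_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Cycle-threshold shift corresponding to a given fold change.

    ASSUMES exponential amplification with per-cycle factor (1 + efficiency),
    so fold = (1 + efficiency) ** delta_ct.
    """
    return math.log(fold) / math.log(1.0 + efficiency)

for gene, fold in [("18S rRNA", 7000), ("28S rRNA", 200)]:
    print(f"{gene}: ~{ct_shift_for_fold(fold):.1f} cycles")
# 18S rRNA: ~12.8 cycles
# 28S rRNA: ~7.6 cycles
```

With lower efficiency the same fold change corresponds to a larger Ct shift, so these values are only indicative.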
[00120] To evaluate the performance of the MEP primers (MEP method), sequences obtained from cDNA libraries prepared with random primers (standard method) were compared with sequences obtained from a cDNA library prepared with MEP primers from a liver sample infected with Lujo virus (6). After isolation, total RNA (250 to 500 ng) was reverse transcribed using SuperScript III first-strand synthesis reverse transcriptase (Invitrogen, Carlsbad, CA) and the FS-MEP primers. Second-strand cDNA synthesis was performed using the exo⁻ Klenow fragment (New England BioLabs, Ipswich, MA) and the reverse strand MEP (RS-MEP) primers. Purified double-stranded cDNAs were amplified using Expand plus enzyme and Extend primers (Roche Diagnostics, Branchburg, NJ).
[00121] Comparison of rRNA and viral sequences between the two cDNA libraries revealed a 67% depletion of host rRNA and a dramatic increase in the number of viral reads with the MEP protocol, as seen in Table 11.
TABLE 11
[00122] Figs. 9A-9B show the raw coverage depth of Lujo virus for its S and L segments, respectively, using the MEP primers and using random primers.
[00123] The results show that MEP primers provide efficient enrichment of microbial sequences. The methods described can be used to improve the design of primers and probes for use in microbial diagnostics. MEP primer sets can be used to generate libraries efficiently enriched for bacterial, viral, fungal, or other parasite sequences.
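Host depletion and microbial enrichment between two libraries can be summarized with two simple ratios. The definitions below are assumptions, since the source does not say how the 67% figure was computed: depletion is taken as the relative drop in the fraction of reads assigned to host rRNA, and enrichment as the ratio of viral read fractions, each normalized to its library's total read count. The numbers in the example are hypothetical and are not the Table 11 values.

```python
def percent_depletion(host_frac_standard: float, host_frac_mep: float) -> float:
    """Relative reduction (%) in the fraction of reads assigned to host rRNA."""
    if host_frac_standard <= 0:
        raise ValueError("the standard library must contain host reads")
    return 100.0 * (1.0 - host_frac_mep / host_frac_standard)

def fold_enrichment(target_reads_mep: int, total_mep: int,
                    target_reads_std: int, total_std: int) -> float:
    """Ratio of target (e.g. viral) read fractions, MEP library vs. standard library."""
    return (target_reads_mep / total_mep) / (target_reads_std / total_std)

# Hypothetical inputs for illustration only; NOT the values in Table 11.
print(f"{percent_depletion(0.50, 0.20):.0f}% host rRNA depletion")
print(f"{fold_enrichment(300, 100_000, 30, 100_000):.1f}-fold viral enrichment")
```

Normalizing by total reads matters because the two libraries may be sequenced to different depths.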
[00124] Experimental methods:
[00125] Step 1: A first strand of cDNA was prepared. Mix 1 was prepared by mixing RNA template, water, and primers according to Table 12 below, and heating for 5 minutes at 65 °C.
Table 12: Mix 1
[00126] Mix 2 was prepared according to Table 13 below. Mix 1 was then added to Mix 2 and the combination was heated for 30 minutes at 40°C, then for 15 minutes at 75°C.
Table 13: Mix 2 (table image not reproduced)
[00127] Step 2: RNase H (Invitrogen, Carlsbad, CA) was added to the combined Mix 1 and Mix 2 reaction, which was heated for 20 minutes at 37 °C and then for 5 minutes at 70 °C.
[00128] Step 3: The combined Mix 1 and Mix 2 reaction was purified using the QIAquick™ purification kit (Qiagen, Valencia, CA) and eluted in 30 μL of water, yielding purified first-strand cDNA.
[00129] Step 4: The purified first-strand cDNA from Step 3 was mixed with the components listed in Table 14 and heated for 30 minutes at 37 °C, yielding double-stranded cDNA.
Table 14: Second-strand cDNA synthesis (table image not reproduced)
[00130] Step 5: The double-stranded cDNA from Step 4 was purified using the QIAquick™ purification kit (Qiagen, Valencia, CA) and eluted in 30 μL of water, yielding purified double-stranded cDNA.
[00131] Step 6: The purified double-stranded cDNA from Step 5 was prepared as described in Table 4. PCR was performed according to Table 15.
Table 15: Purified double-stranded cDNA (table images not reproduced)
[00132] Step 7: After the PCR reaction was completed, the resulting material was purified using the QIAquick™ purification kit (Qiagen, Valencia, CA) and eluted in 30 μL of water.
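The incubation times and temperatures stated in paragraphs [00125]-[00129] can be collected into a small data structure, for example to print a bench schedule or check total incubation time. This Python sketch records only the holds given in the text; reagent volumes (Tables 12-14) and the PCR cycling conditions (Table 15) are not available here and are omitted. The class, field, and label names are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incubation:
    step: str      # protocol step the hold belongs to (illustrative label)
    minutes: int
    celsius: int


# Holds as stated in [00125]-[00129]; purification steps 3, 5 and 7 have no hold.
PROGRAM = [
    Incubation("Step 1: Mix 1 (RNA template, water, primers)", 5, 65),
    Incubation("Step 1: Mix 1 + Mix 2, first hold", 30, 40),
    Incubation("Step 1: Mix 1 + Mix 2, second hold", 15, 75),
    Incubation("Step 2: after RNase H addition, first hold", 20, 37),
    Incubation("Step 2: after RNase H addition, second hold", 5, 70),
    Incubation("Step 4: second-strand synthesis", 30, 37),
]


def total_minutes(program: list[Incubation]) -> int:
    """Sum of incubation times, excluding handling and purification."""
    return sum(hold.minutes for hold in program)


if __name__ == "__main__":
    for hold in PROGRAM:
        print(f"{hold.step}: {hold.minutes} min at {hold.celsius} °C")
    print(f"Total incubation time: {total_minutes(PROGRAM)} min")
```

With the values above, the holds sum to 105 minutes of incubation before PCR.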
References
All of the references below are incorporated herein by reference in their entirety for the specific purpose for which they are mentioned in the text.
1. Armour et al., 2009. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat Methods 6:647-9.
2. Cox-Foster et al., 2007. A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318:283-7.
3. Palacios et al., 2008. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med 358:991-8.
4. Margulies et al., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-80.
5. Hofacker et al., 1994. Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167-88.
6. Briese et al., 2009. Genetic detection and characterization of Lujo virus, a new hemorrhagic fever associated arenavirus from southern Africa. PLoS Pathog 5:e1000455.

Claims

1. A composition comprising 20 or more nucleic acid sequences, wherein each of the 20 nucleic acid sequences comprises a different hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1-1662, provided that at least one nucleic acid sequence does not comprise a hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 3325-4822.
2. The composition of claim 1, wherein each different hexamer sequence is selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1663- 2490.
3. The composition of claim 1, wherein each different hexamer sequence is selected from the group consisting of hexamer sequences of any of SEQ ID NOs: 2491-3324.
4. The composition of any one of claims 1-3 comprising 200 or more nucleic acid sequences.
5. The composition of any one of claims 1-3 comprising 800 or more nucleic acid sequences.
6. The composition of any one of claims 1-5, wherein each nucleic acid further comprises a tail sequence 5' to the hexamer sequence, wherein the tail sequence is about 10 to about 22 nucleotides in length, wherein the tail sequence is separated from the hexamer sequence by 0 to 10 nucleotides, and wherein each nucleic acid comprises the same tail sequence.
7. The composition of any one of claims 1-6, wherein each nucleic acid sequence has the same length or substantially the same length.
8. The composition of any one of claims 1-7, wherein each nucleic acid has the hexamer sequence in the same position in the nucleic acid.
9. The composition of any one of claims 1-8, wherein each nucleic acid sequence is a primer.
10. The composition of any one of claims 1-9, wherein each nucleic acid sequence is DNA, RNA, PNA, LNA, GNA or TNA.
11. A method for designing a primer set for amplification of microbial nucleic acids in an organism comprising:
(a) sequencing the transcriptome of an organism;
(b) identifying highly expressed genes of the organism from the plurality of sequence reads identified in step (a);
(c) providing a first primer library, wherein each primer comprises a different hexamer sequence; and
(d) removing primers from the first primer library that are predicted to anneal to the RNA of the organism's highly expressed genes to generate a second primer library, provided that primers expected to anneal to the RNA predicted to form a secondary structure are not removed from the first primer library.
12. The method of claim 11, wherein primers comprising hexamer sequences with perfect sequence matches to the regions giving a substantial number of reads are removed from the first primer library.
13. The method of claim 12, wherein the substantial number of reads is more than 1% of the relative coverage depth.
14. The method of any one of claims 11-13, wherein steps (b) or (d) are performed by a computer.
15. The method of any one of claims 11-14, wherein the organism's highly expressed genes comprise 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase, and NADH dehydrogenase.
16. The method of claim 15, wherein the organism's highly expressed genes further comprise one or more additional oxidative phosphorylation genes.
17. The method of any one of claims 11-16, wherein the transcriptome of the organism is sequenced using unbiased high-throughput sequencing.
18. The method of any one of claims 11-17, wherein the second primer library comprises 20 or more primers.
19. The method of any one of claims 11-18, wherein the second primer library comprises 800 or more primers.
20. The method of any one of claims 11-19, wherein the second primer library comprises 1600 or more primers.
21. The method of any one of claims 11-20, further comprising separating the primers in the second primer library into two primer sets.
22. The method of claim 21, wherein one primer set is used to generate a first cDNA strand from total RNA and the other primer set is used to generate a second cDNA strand from the first cDNA strand.
23. The method of any one of claims 11-22, wherein the organism is a eukaryote.
24. The method of claim 23, wherein the organism is a human.
25. The method of any one of claims 11-24, further comprising producing the second primer library.
26. A method for amplifying a microbial nucleic acid comprising:
(a) providing a sample from an organism;
(b) isolating total RNA from the sample;
(c) reverse-transcribing total RNA from the sample using a set of forward strand primers to provide a first cDNA strand, wherein the forward strand primers are designed not to amplify the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts, wherein the forward strand primers comprise primers complementary to regions predicted to form secondary RNA structure of the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts;
(d) replicating the first cDNA strand using a set of reverse strand primers to provide a second cDNA strand, wherein the reverse strand primers are designed not to amplify the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts, wherein the reverse strand primers comprise primers complementary to regions predicted to form secondary RNA structure of the organism's 18S rRNA, 28S rRNA, 12S rRNA, 16S rRNA, ATP synthase and NADH dehydrogenase RNA transcripts;
(e) allowing the first and the second cDNA strands to anneal to provide double-stranded cDNA; and
(f) amplifying the double-stranded cDNA by polymerase chain reaction using double-strand cDNA primers.
27. The method of claim 26, wherein the forward strand primers and the reverse strand primers are further designed not to amplify the RNA transcripts of one or more additional oxidative phosphorylation genes.
28. The method of any one of claims 26-27, wherein the forward strand primers and the reverse strand primers each comprise a different hexamer sequence.
29. The method of any one of claims 26-28, wherein the forward strand primers and the reverse strand primers each comprise the same specific tail sequence.
30. The method of any one of claims 26-29, wherein at least one of the double-strand cDNA primers is complementary to the specific tail sequence or a portion thereof.
31. The method of any one of claims 26-30, wherein each forward strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1663-2490.
32. The method of any one of claims 26-30, wherein each reverse strand primer comprises a hexamer sequence selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 2491-3324.
33. The method of any one of claims 26-32, further comprising determining whether the organism is infected with a microbe.
34. A composition comprising 800 or more nucleic acids, wherein each nucleic acid has the structure: H-Na-ST wherein H is a nucleotide sequence of 5 to 7 nucleotides; N is a random nucleotide; a is an integer from 0 to 12; and ST is a nucleotide sequence from 10-22 nucleotides, provided that each nucleic acid in the composition has a different sequence H.
35. The composition of claim 34, wherein H is a nucleotide sequence of 6 nucleotides.
36. The composition of any one of claims 34-35, wherein in at least one nucleic acid sequence H is not selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 3325-4822.
37. The composition of any one of claims 34-36, wherein H is selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 1663-2490.
38. The composition of any one of claims 34-36, wherein H is selected from the group consisting of the hexamer sequences of any of SEQ ID NOs: 2491-3324.
39. The composition of any one of claims 34-38, wherein a is 1.
40. The composition of any one of claims 34-39, comprising at least 1600 nucleic acids.
41. The composition of any one of claims 34-40, wherein ST is a nucleotide sequence of 17 nucleotides.
42. A kit comprising a composition of any one of claims 1-10 and 34-41 and instructions for use.
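The design method of claim 11, steps (c)-(d), and the tailed primer format of claim 6 can be sketched in Python as follows. The sketch assumes that step (b) has already produced the highly expressed transcripts, and that a secondary-structure prediction (for example, from an RNA folding program of the kind described in reference 5) is supplied as a per-base paired/unpaired mask. The claims do not define how a partly structured binding site is handled; here a site is treated as inaccessible only when all six of its bases are predicted to be paired, which is an assumption. The read-coverage filter of claims 12-13 is not implemented. All function names are hypothetical, and the tail passed to `add_tail` must be supplied by the user, because the MEP tail sequence is not given in this section.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def all_hexamers() -> list[str]:
    """First primer library: all 4**6 = 4096 DNA hexamers."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def accessible_binders(transcript: str, paired: list[bool], k: int = 6) -> set[str]:
    """Hexamers predicted to anneal to an unstructured site on one transcript.

    transcript: RNA written 5'->3' (U is treated as T).
    paired: per-base flags, True where the folding prediction pairs the base
    (assumed input; the structure predictor itself is not shown).
    """
    seq = transcript.upper().replace("U", "T")
    if len(paired) != len(seq):
        raise ValueError("paired mask must match transcript length")
    binders = set()
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        if set(site) - set("ACGT"):
            continue  # skip sites containing ambiguous bases
        if all(paired[i:i + k]):
            continue  # assumption: a fully paired site is not accessible
        # A primer anneals antiparallel, so it is the site's reverse complement.
        binders.add(reverse_complement(site))
    return binders


def design_library(targets: list[tuple[str, list[bool]]]) -> list[str]:
    """Second primer library: hexamers with no accessible site on any target."""
    excluded: set[str] = set()
    for transcript, paired in targets:
        excluded |= accessible_binders(transcript, paired)
    return [h for h in all_hexamers() if h not in excluded]


def add_tail(hexamer: str, tail: str, spacer: int = 1) -> str:
    """5' tail + random-N spacer + 3' hexamer, within the ranges of claim 6."""
    if not 10 <= len(tail) <= 22:
        raise ValueError("tail must be 10-22 nt")
    if not 0 <= spacer <= 10:
        raise ValueError("spacer must be 0-10 nt")
    return tail + "N" * spacer + hexamer
```

Placing the tail 5' of the hexamer follows claim 6, and the default spacer of one random base corresponds to a = 1 in claim 39. A set of tailed primers would be produced by applying `add_tail` to every hexamer returned by `design_library`.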
PCT/US2011/059783 (WO2012064739A2), priority date 2010-11-08, filing date 2011-11-08: Microbial enrichment primers

Applications Claiming Priority

Application Number Priority Date Filing Date
US 61/411,142 2010-11-08 2010-11-08
US 61/424,276 2010-12-17 2010-12-17

Publications

Publication Number Publication Date
WO2012064739A2 2012-05-18
WO2012064739A3 2012-07-19

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5770402A (en) * 1995-04-05 1998-06-23 Board Of Regents, The University Of Texas System DNA encoding macrophage inflammatory protein-1γ
US6068991A (en) * 1997-12-16 2000-05-30 Bristol-Myers Squibb Company High expression Escherichia coli expression vector
US20030224357A1 (en) * 2000-06-07 2003-12-04 Santalucia John Method and system for predicting nucleic acid hybridization thermodynamics and computer-readable storage medium for use therein
US20030229044A1 (en) * 2002-03-29 2003-12-11 Lawrence Steinman Use of statins and other immunomodulatory agents in the treatment of autoimmune disease
US20080187969A1 (en) * 2005-10-27 2008-08-07 Rosetta Inpharmatics Llc Nucleic acid amplification using non-random primers
US20100029511A1 (en) * 2007-10-26 2010-02-04 Rosetta Inpharmatics Llc Cdna synthesis using non-random primers
US20100120022A1 (en) * 2004-01-27 2010-05-13 Michal Ayalon-Soffer Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10017810B2 (en) 2012-05-10 2018-07-10 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US10718009B2 (en) 2012-05-10 2020-07-21 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US11781179B2 (en) 2012-05-10 2023-10-10 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
EP3099820A4 (en) * 2014-01-27 2018-01-03 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
US10450597B2 (en) 2014-01-27 2019-10-22 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
EP4219744A3 (en) * 2014-01-27 2023-08-30 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
US11807897B2 (en) 2014-01-27 2023-11-07 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
GB2531741A (en) * 2014-10-28 2016-05-04 Bisn Laboratory Services Ltd Molecular and bioinformatics methods for direct sequencing
CN112927756A (en) * 2019-12-06 2021-06-08 深圳华大基因科技服务有限公司 Method and device for identifying transcriptome rRNA pollution source and method for improving rRNA pollution
CN112927756B (en) * 2019-12-06 2023-05-30 深圳华大基因科技服务有限公司 Method and device for identifying rRNA pollution source of transcriptome and method for improving rRNA pollution

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 11839600; country of ref document: EP; kind code of ref document: A2
NENP Non-entry into the national phase in: DE
122 EP: PCT application non-entry in European phase. Ref document number: 11839600; country of ref document: EP; kind code of ref document: A2