WO2013175164A1 - Characterization, classification and identification of microorganisms - Google Patents

Characterization, classification and identification of microorganisms

Info

Publication number
WO2013175164A1
Authority
WO
WIPO (PCT)
Prior art keywords
microorganism
str
repeated nucleotide
sample
escherichia coli
Prior art date
Application number
PCT/GB2013/000239
Other languages
French (fr)
Inventor
Colin P. GODDARD
David Hugh Williams
Arthur Keith Turner
John Wain
Original Assignee
Discuva Limited
Priority date
Filing date
Publication date
Priority claimed from GBGB1209122.9A external-priority patent/GB201209122D0/en
Priority claimed from GB201307683A external-priority patent/GB201307683D0/en
Application filed by Discuva Limited
Publication of WO2013175164A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • C12Q1/6869 Methods for sequencing

Definitions

  • the methods and systems of the invention find application in the characterization, classification or identification of any microorganism, including bacteria, viruses and fungi.
  • the methods and systems of the invention may be applied to the characterization, classification or identification of: (a) Gram-positive, Gram-negative and/or Gram-variable bacteria; (b) spore-forming bacteria; (c) non-spore forming bacteria; (d) filamentous bacteria; (e) intracellular bacteria; (f) obligate aerobes; (g) obligate anaerobes; (h) facultative anaerobes; (i) microaerophilic bacteria and/or (j) opportunistic bacterial pathogens.
  • the invention is used in the characterization, classification or identification of one or more bacteria of the following genera: Acinetobacter (e.g. A.
  • Aeromonas e.g. A. hydrophila
  • Bacillus e.g. B. anthracis
  • Bacteroides e.g. B. fragilis
  • Bordetella e.g. B. pertussis
  • Borrelia e.g. B. burgdorferi
  • Brucella e.g. B. abortus, B. canis, B. melitensis and B. suis
  • Burkholderia e.g. B. cepacia complex
  • Campylobacter e.g. C. jejuni
  • Chlamydia e.g. C. trachomatis, C. suis and C. muridarum
  • Chlamydophila e.g. C. pneumoniae, C. pecorum, C. psittaci, C. abortus, C. felis and
  • Citrobacter e.g. C. freundii
  • Clostridium e.g. C. botulinum, C. difficile, C. perfringens and C. tetani
  • Corynebacterium e.g. C. diphtheriae and C. glutamicum
  • Enterobacter e.g. E. cloacae and E. aerogenes
  • Enterococcus e.g. E. faecalis and E. faecium
  • Escherichia e.g. E. coli
  • Flavobacterium
  • Francisella e.g. F. tularensis
  • Fusobacterium e.g. F. necrophorum
  • Haemophilus e.g. H. somnus, H. influenzae and H. parainfluenzae
  • Helicobacter e.g. H. pylori
  • Klebsiella e.g. K. oxytoca and K.
  • Legionella e.g. L. pneumophila
  • Leptospira e.g. L. interrogans
  • Listeria e.g. L. monocytogenes
  • Moraxella e.g. M. catarrhalis
  • Morganella e.g. M. morganii
  • Mycobacterium e.g. M. leprae and M. tuberculosis
  • Mycoplasma e.g. M. pneumoniae
  • Neisseria e.g. N. gonorrhoeae and N. meningitidis
  • Pasteurella e.g. P. multocida
  • Peptostreptococcus
  • Prevotella
  • Proteus e.g. P. mirabilis and P. vulgaris
  • Pseudomonas
  • Serratia e.g. S. marcescens
  • Shigella e.g. S. flexneri, S. dysenteriae
  • Staphylococcus e.g. S. aureus, S. haemolyticus, S. intermedius, S. epidermidis and S. saprophyticus
  • Stenotrophomonas e.g. S. maltophilia
  • Streptococcus e.g. S. agalactiae, S. mutans, S. pneumoniae and S. pyogenes
  • Treponema e.g. T. pallidum
  • Vibrio e.g. V. cholerae
  • Yersinia e.g. Y. pestis
  • the invention may be used in the characterization, classification or identification of multidrug-resistant bacteria, including, but not limited to, penicillin-resistant, methicillin-resistant, quinolone-resistant, macrolide-resistant and/or vancomycin-resistant bacterial strains, including for example penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Streptococcus pneumoniae; penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Staphylococcus aureus; penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Streptococcus pyogenes; and penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant enterococci.
  • the invention may also be used to target strains of Staphylococcus aureus (SA) that are usually multidrug resistant (MR), for example any of C-MRSA1, C-MRSA2, C-MRSA3, C-MRSA4, Belgian MRSA, Swiss MRSA and any of the EMRSA strains.
  • the invention may be used in the characterization, classification or identification of high G+C Gram-positive bacteria.
  • high G+C Gram-positive bacteria is a term of art defining a particular class of evolutionarily related bacteria.
  • the class includes Micrococcus spp. (e.g. M. luteus) and Mycobacterium spp. (for example a fast- or slow-growing mycobacterium, e.g. M. tuberculosis, M. leprae, M. smegmatis or M. bovis)
  • Streptomyces spp. e.g. S. rimosus and S. coelicolor
  • Corynebacterium spp. e.g. C. glutamicum
  • the invention may be used in the characterization, classification or identification of low G+C Gram-positive bacteria.
  • low G+C Gram-positive bacteria is a term of art defining a particular class of evolutionarily related bacteria.
  • the class includes members of the Firmicutes phylum, including for example Staphylococcus spp. and Bacillus spp.
  • nucleotide 12-mer sequences (12 base long sub-sequences) from a genomic sequence derived from a test microorganism are sorted into alphabetical order. This may be achieved by any suitable algorithm, for example a Radix-type sort.
  • the 12-mer sequence and its repeat count are stored.
  • the repeat statistics are printed to an output file for the test genome.
  • the process is then repeated, and separate output files generated, for all genomes of interest.
  • in this case, the total number of repeats represents the frequency of repeats, because all genomes are of equivalent size. For genomes of variable size (or where partial genome sequences are used), frequencies rather than absolute numbers may be used.
  • An all-against-all comparison is then performed on the output files generated as described above.
  • the repeats from a specific test genome are compared with the repeats in all other genomes.
  • a score is compiled for the repeats from the test genome against the other genomes where the same repeated sequence occurs.
  • the output below shows the results when the test genome Acinetobacter baumannii AB0057, is compared to a range of other bacterial genomes from Klebsiella pneumoniae, Escherichia coli, Neisseria gonorrhoeae, S. Typhimurium, S. Typhi strains and other Acinetobacter baumannii strains.
  • the numbers on the left show the count of all repeated sequences shared by the test strain and each of the compared strains. The higher the number, the more the compared genomes have in common.
  • Acinetobacter baumannii AB307-0294 4 Acinetobacter baumannii AB307-0294 4
  • Neisseria gonorrhoeae NCCP11945 3 Acinetobacter baumannii AB0057
  • Example 2 Identification of different micro-organisms in a mixed culture
  • the method searches for abundant repeats that are completely specific for a strain or subtype.
  • Acinetobacter baumannii AB0057 has 1493 12-mer sequences that are repeated 10 or more times. Of these, 385 are shared with Acinetobacter baumannii AYE, only 367 with Acinetobacter baumannii AB307-0294, and 6 or fewer with each of the other tested strains listed below.
  • in other words, Acinetobacter baumannii AB0057 has 1493 12-mers that are repeated 10 or more times, 385 of which are shared with Acinetobacter baumannii AYE.
  • the number of "repeat hits” (385) is the number of distinct 12-mers, repeated 10 or more times, that the two strains have in common.
  • the “repeat count” (4780) is the total number of these repeated 12-mers that the two strains have in common.
  • Example 4: using FASTA data of a completed genome sequence (with any identifying information removed), comparison with high-quality genome sequences available from the NCBI site (http://www.ncbi.nlm.nih.gov/genome/) gave the following identification parameters. Sequence one: the test genome is Escherichia coli O104:H4 str. 2011C-3493.
  • the operator was blind to the identity of the organism from which the genome data was generated.
  • the invention was able to correctly identify the species from an unknown bacterial sequence and place in order of similarity other members of that species for which an appropriate sequence was available.
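The sort-and-count procedure described above (12-mers sorted into alphabetical order, for example with a radix-type sort, and each repeated 12-mer stored with its repeat count) can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation. Each n-mer is packed into an integer using a hypothetical 2-bit code (A=0, C=1, G=2, T=3), so numeric order coincides with alphabetical order. Windows containing non-ACGT characters are skipped, and only the forward strand is counted. All function names are illustrative.

```python
# Hypothetical sketch: count n-mers by radix-sorting 2-bit codes, then
# reading off runs of equal values. Assumes A/C/G/T only (other windows
# are skipped) and counts the forward strand only.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def encode_nmers(seq: str, n: int = 12) -> list[int]:
    """Pack every length-n window of seq into a 2n-bit integer."""
    seq = seq.upper()
    codes = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if any(base not in ENCODE for base in window):
            continue  # skip windows containing N or other ambiguity codes
        value = 0
        for base in window:
            value = (value << 2) | ENCODE[base]
        codes.append(value)
    return codes


def radix_sort(values: list[int], key_bits: int, digit_bits: int = 8) -> list[int]:
    """Stable LSD radix sort of non-negative integers below 2**key_bits."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values


def decode(value: int, n: int) -> str:
    """Inverse of the 2-bit packing used by encode_nmers."""
    bases = []
    for _ in range(n):
        bases.append(DECODE[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def repeated_nmers(seq: str, n: int = 12, r: int = 10) -> dict[str, int]:
    """Map each n-mer with copy number >= r to that copy number, alphabetically."""
    ordered = radix_sort(encode_nmers(seq, n), key_bits=2 * n)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1  # extend the run of identical codes
        if j - i >= r:
            result[decode(ordered[i], n)] = j - i
        i = j
    return result
```

Calling `repeated_nmers(genome, n=12, r=10)` returns the repeat statistics for one genome. For genomes of different sizes, dividing each copy number by the sequence length is one way to obtain the frequencies mentioned above.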

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Disclosed is a method for identifying a test microorganism comprising the steps of: (a) sequencing complete or partial genomic DNA from the test microorganism; (b) identifying repeated nucleotide n-mer sequences present at a copy number≥r, and (c) comparing the repeated nucleotide n-mer sequences identified in step (b) with those of one or more reference microorganism(s).

Description

CHARACTERIZATION, CLASSIFICATION AND IDENTIFICATION OF MICROORGANISMS
Field of the Invention
The present invention relates to methods for characterizing microorganisms (including bacteria, viruses and fungi) for various purposes, including classification, identification, diagnosis and sub-typing for epidemiology and virulence determination. The methods involve sequencing complete or partial genomic DNA from a test microorganism and identifying repeated nucleotide n-mer sequences present at a copy number≥r. The repeated nucleotide n-mer sequences are then compared with those present in one or more reference microorganism(s). The results of the comparison can be used to characterize, classify or identify the test microorganism. The methods of the invention find application in the identification of bacterial pathogens in biological samples, including mixed cultures, and complex communities derived from clinical samples for the diagnosis of infectious diseases and other purposes.
Background to the Invention
Conventional determinative bacteriology relies largely on the characterization of phenotypic properties of pure cultures obtained from specimens after cultivation and isolation of bacteria on appropriate, usually selective, laboratory media. Newer phenotypic methods using mass spectrometry still depend upon purified isolates, and their discrimination is at the same level as the 16S methods described below.
More accurate identification, typing and sub-typing traditionally uses extended biochemical tests involving growth on a range of substrates, serotyping with antibodies (typically raised in rabbits) against specific antigens on the surface of the microorganism, or differential lysis by bacteriophage. Such methods are time-consuming and labour-intensive, and the convergence of phenotypic traits among unrelated bacteria can complicate analysis and lead to errors.
The ever-increasing amount of sequence data from microorganisms, including partial sequences, targeted sequencing, sample sequencing and whole genome sequence data generated by new or next-generation sequencing methods (NGS, being rapid or high-throughput methods used in place of Sanger sequencing), has made various molecular approaches more tractable. Common examples of such approaches include comparative sequencing of PCR-amplified 16S ribosomal RNA genes (rDNA), isotopic or fluorescently labelled hybridization probes (molecular beacons), or reverse transcription of ribosomal RNA (rRNA) and amplification (RT-PCR, or "Eberwine-type" amplification) used in conjunction with hybridization probes or sequencing. Currently, 16S rRNA or the genes thereof (rDNA) comprise the largest set of gene-specific sequence data. However, relevant information for other targets is also accumulating rapidly, in part because of complete genome sequencing efforts. These targets include CRISPRs (spacer regions between large, often non-perfect repeats, as used in spoligotyping of Mycobacterium tuberculosis), Multi Locus Variable Number Tandem Repeat (VNTR) Analysis (MLVA, including MIRU typing for Mycobacterium tuberculosis), 5S rRNA, 23S rRNA, rRNA spacer regions, RNase P RNA and the housekeeping genes used for multi-locus sequence typing (MLST). For example, MLVA is carried out by estimating the number of defined repeats in a tandem chain; this is usually done by size analysis of PCR products, although long-read sequencing of the large regions involved is becoming more accepted. NGS is not used because the repeat chains are too large.
Existing sequence-based approaches are limited by problems arising from gene-target selection. For example, members of the Streptococcus mitis group, which includes Streptococcus pneumoniae, have indistinguishable 16S rRNA gene sequences, and many bacterial species do not have sufficient variation in the classic VNTR loci. Moreover, accurate identification requires high-quality, comprehensive reference libraries.
It has now been realised that there are inherent limitations to sequence-based microbial identification based on particular targets (e.g. 16S rRNA and VNTR loci), and that these problems, which arise from sequence errors and read length in both the test organism data and the sequence databases, can be overcome by characterizing test microorganisms on the basis of genome-wide repeated oligomeric nucleic acid sequences defined by: (a) the length n of the repeated oligomeric nucleotide sequence (n-mer); and (b) the copy number threshold r of the repeated nucleotide n-mer sequences.
The present inventors have discovered that the use of repeated sequence motifs provides a very accessible readout of "relatedness" which is tolerant of short-read sequence data, mismatches and sequencing errors. Changing the length of the n-mer or the minimum copy number r permits the resolution of the technique to be adjusted, permitting identification at the level of genus, species or strain, as well as the construction of dendrograms showing overall relatedness between the test microorganism(s) and a number of reference species/strains/subtypes/isolates.
The identity (nucleotide sequence) and frequency of the repeated n-mer sequences are potentially unique (and so distinctive for particular species or strains). The gap distance between n-mers can also be exploited for species identification and sub-species (strain) recognition, since it allows the frequency of each n-mer to be calculated and hence the numbers of each unique repeat within the genome to be compared.
The invention therefore finds particular application in clinical diagnostic technology (using collated reference genomes) from both isolates and sequence data generated directly from clinical specimens.
Summary of the Invention
In a first aspect of the present invention, there is provided a method for identifying a test microorganism comprising the steps of:
(a) sequencing complete or partial genomic DNA from the test microorganism;
(b) identifying repeated nucleotide n-mer sequences present at a copy number≥r, and
(c) comparing the repeated nucleotide n-mer sequences identified in step (b) with those of one or more reference microorganism(s).
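Steps (b) and (c) can be illustrated with a minimal sketch, assuming the test and reference genome sequences are already available as strings. It uses Python's `collections.Counter` for brevity rather than a radix sort, and reports for each reference the two quantities used in the examples: repeat hits (distinct repeated n-mers shared with the test genome) and a repeat count. The source does not give an exact formula for the repeat count, so this sketch assumes the sum, over shared n-mers, of the smaller of the two copy numbers. All function names are hypothetical.

```python
from collections import Counter


def repeat_profile(seq: str, n: int = 12, r: int = 10) -> dict[str, int]:
    """Step (b): n-mers present at copy number >= r, with their copy numbers."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return {kmer: c for kmer, c in counts.items() if c >= r}


def compare(test: dict[str, int], reference: dict[str, int]) -> tuple[int, int]:
    """Step (c): (repeat hits, repeat count) shared by two repeat profiles.

    repeat hits  = number of distinct repeated n-mers present in both.
    repeat count = sum over those n-mers of the smaller copy number
                   (an assumed definition; the source leaves it open).
    """
    shared = test.keys() & reference.keys()
    return len(shared), sum(min(test[k], reference[k]) for k in shared)


def rank_references(test_seq: str, references: dict[str, str],
                    n: int = 12, r: int = 10) -> list[tuple[str, int, int]]:
    """Rank named reference sequences by shared repeats with the test sequence."""
    test = repeat_profile(test_seq, n, r)
    scores = []
    for name, ref_seq in references.items():
        hits, count = compare(test, repeat_profile(ref_seq, n, r))
        scores.append((hits, count, name))
    scores.sort(reverse=True)  # most repeat hits first, then by repeat count
    return [(name, hits, count) for hits, count, name in scores]
```

The ranked list plays the role of the "place in order of similarity" output described in the examples; pairwise scores from an all-against-all run could likewise feed a dendrogram.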
In another aspect, there is provided a method for characterizing a microorganism comprising the steps of: (a) sequencing complete or partial genomic DNA from the microorganism;
(b) identifying repeated nucleotide n-mer sequences present at a copy number≥r, and optionally
(c) determining the gap length between the repeated nucleotide n-mer sequences identified in step (b); and/or
(d) estimating the total number of n-mers in the genome directly.
In step (c) (above), the gap length may be used to derive the frequency of the n-mer (e.g. number per kilobase) and/or the pattern of frequencies for several n-mers.
Frequency determinations find particular application in embodiments where partial genomic DNA is sequenced in step (a).
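Step (c) can be illustrated with a short sketch that finds the start positions of a given repeated n-mer, computes the gaps between successive occurrences, and converts the mean gap into an occurrence rate per kilobase. Estimating frequency as 1000 divided by the mean gap is an assumption made for illustration. Because it needs only local spacing, it can be applied to partial sequence as well as to a complete genome. The function names are hypothetical.

```python
def occurrence_positions(seq: str, nmer: str) -> list[int]:
    """0-based start positions of nmer in seq, overlapping matches included."""
    positions = []
    start = seq.find(nmer)
    while start != -1:
        positions.append(start)
        start = seq.find(nmer, start + 1)  # advance by 1 so overlaps are found
    return positions


def gap_lengths(positions: list[int]) -> list[int]:
    """Distances between the start positions of successive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def frequency_per_kb(gaps: list[int]) -> float:
    """Assumed estimator: occurrences per 1000 bp = 1000 / mean gap length."""
    if not gaps:
        return 0.0
    return 1000.0 * len(gaps) / sum(gaps)
```

Applying `frequency_per_kb` to several n-mers yields the "pattern of frequencies" referred to above.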
In another aspect, there is provided a computer system for use in a method as defined in any one of the preceding claims comprising: (a) a test microorganism database comprising the sequence and copy number of repeated nucleotide n-mer sequences present at a copy number≥r, and (b) a reference database of complete or partial genomic DNA sequences of a plurality of reference microorganisms.
The reference database may be in-house, one of many commercially available databases, or public (e.g. part of library websites or open-access research facilities, or from service laboratories accredited for the generation of accredited data for human or animal medical purposes, the food industry, industrial manufacture, etc.).
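One possible layout for the two databases of the computer system is sketched below with Python's built-in `sqlite3` module. The table and column names, and the choice to recompute reference repeat profiles from the stored sequences on each query, are assumptions for illustration only; precomputing reference profiles is an obvious alternative.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: (a) test microorganism repeats, (b) reference genomes.
SCHEMA = """
CREATE TABLE test_repeats (
    nmer        TEXT PRIMARY KEY,
    copy_number INTEGER NOT NULL
);
CREATE TABLE reference_genomes (
    organism TEXT PRIMARY KEY,
    sequence TEXT NOT NULL  -- complete or partial genomic DNA
);
"""


def nmer_counts(seq: str, n: int) -> Counter:
    """Copy number of every length-n window in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def load_test(conn: sqlite3.Connection, seq: str, n: int = 12, r: int = 10) -> None:
    """Store the test organism's n-mers present at copy number >= r."""
    rows = [(k, c) for k, c in nmer_counts(seq, n).items() if c >= r]
    conn.executemany("INSERT INTO test_repeats VALUES (?, ?)", rows)


def shared_repeat_hits(conn: sqlite3.Connection, n: int = 12, r: int = 10) -> dict[str, int]:
    """Per reference: how many stored test n-mers also reach copy number >= r."""
    test_nmers = [row[0] for row in conn.execute("SELECT nmer FROM test_repeats")]
    hits = {}
    for organism, sequence in conn.execute("SELECT organism, sequence FROM reference_genomes"):
        ref_counts = nmer_counts(sequence, n)
        hits[organism] = sum(1 for k in test_nmers if ref_counts[k] >= r)
    return hits
```

Usage: open a connection (e.g. `sqlite3.connect(":memory:")`), run `conn.executescript(SCHEMA)`, insert reference sequences, call `load_test`, then `shared_repeat_hits`.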
The threshold copy number is preferably greater than 10, but may be selected to vary the resolution of the method. Accordingly, r may be chosen according to the needs and preferences of the user, so that the threshold copy number may be 5, 10, 15, 20, 25, 30, 35, 40 or greater than 40.
The length n of the repeated nucleotide sequence is preferably about 12 (e.g. exactly 12), but may be selected to vary the resolution of the method. Accordingly, n may be chosen according to the needs and preferences of the user, so that the length of the repeated nucleotide sequence may be 6-24, 8-16 or 10-14.
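The interplay between n and r can be made concrete with a simple null model (an assumption for illustration, not an analysis from the source): in a uniformly random sequence of length L, a specific n-mer is expected to occur (L - n + 1)/4^n times. For a hypothetical 5 Mb genome this is about 76 occurrences per 8-mer but only about 0.3 per 12-mer (there are 4^12 = 16,777,216 possible 12-mers). So with n = 12, a copy number of 10 or more is unlikely to arise by chance, whereas with n = 8 nearly every n-mer would exceed such a threshold.

```python
def expected_chance_copies(genome_length: int, n: int) -> float:
    """Expected occurrences of one specific n-mer in a uniformly random sequence.

    Null model (assumed): bases are independent and each of A, C, G, T has
    probability 1/4, so each of the genome_length - n + 1 windows matches a
    given n-mer with probability 4**-n.
    """
    windows = max(genome_length - n + 1, 0)
    return windows / 4 ** n


# Compare candidate n-mer lengths for a hypothetical 5 Mb genome.
for n in (8, 10, 12, 14):
    print(n, round(expected_chance_copies(5_000_000, n), 3))
```

Real genomes are not random, so this only indicates the order of magnitude at which chance repeats stop dominating.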
The methods of the invention find utility in a wide range of applications, including clinical diagnostics, epidemiology, and food microbiology. The methods are particularly suited to the direct analysis of samples (including clinical samples), where complete or partial genomic DNA from microorganisms present in a sample is extracted and analysed without a preliminary culturing step (so greatly reducing the time needed for e.g. clinical diagnosis).
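For mixed cultures and direct sample analysis, Example 2 describes searching for abundant repeats that are completely specific for a strain or subtype. A minimal sketch of that idea follows, under assumptions not stated in the source. A reference's markers are taken to be its repeated n-mers (copy number ≥ r) that do not reach copy number ≥ r in any other reference. A reference is reported as present when at least `min_hits` of its markers appear anywhere in the sample reads. The function names and the `min_hits` parameter are hypothetical.

```python
from collections import Counter


def repeat_set(seq: str, n: int, r: int) -> set[str]:
    """Distinct n-mers occurring at least r times in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return {k for k, c in counts.items() if c >= r}


def specific_markers(references: dict[str, str], n: int = 12, r: int = 10) -> dict[str, set[str]]:
    """For each reference, repeated n-mers not repeated (>= r) in any other reference."""
    profiles = {name: repeat_set(seq, n, r) for name, seq in references.items()}
    markers = {}
    for name, own in profiles.items():
        others = set().union(*(p for other, p in profiles.items() if other != name))
        markers[name] = own - others
    return markers


def detect(sample_reads: list[str], markers: dict[str, set[str]],
           n: int = 12, min_hits: int = 1) -> dict[str, int]:
    """Number of each reference's markers seen in the reads, if >= min_hits."""
    seen = set()
    for read in sample_reads:
        seen.update(read[i:i + n] for i in range(len(read) - n + 1))
    found = {}
    for name, m in markers.items():
        hits = len(m & seen)
        if hits >= min_hits:
            found[name] = hits
    return found
```

Raising `min_hits` trades sensitivity for robustness against sequencing errors that happen to create a marker n-mer.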
Other aspects of the invention are as defined in the claims attached hereto.
Detailed Description of the Invention
Definitions and general preferences
Where used herein and unless specifically indicated otherwise, the following terms are intended to have the following meanings in addition to any broader (or narrower) meanings the terms might enjoy in the art:
Unless otherwise required by context, the use herein of the singular is to be read to include the plural and vice versa. The term "a" or "an" used in relation to an entity is to be read to refer to one or more of that entity. As such, the terms "a" (or "an"), "one or more," and "at least one" are used interchangeably herein.
As used herein, the term "comprise," or variations thereof such as "comprises" or "comprising," is to be read to indicate the inclusion of any recited integer (e.g. a feature, element, characteristic, property, method/process step or limitation) or group of integers (e.g. features, elements, characteristics, properties, method/process steps or limitations) but not the exclusion of any other integer or group of integers. Thus, as used herein the term "comprising" is inclusive or open-ended and does not exclude additional, unrecited integers or method/process steps.
The phrase "consisting essentially of" is used herein to require the specified integer(s) or steps as well as those which do not materially affect the character or function of the claimed invention.
As used herein, the term "consisting" is used to indicate the presence of the recited integer (e.g. a feature, element, characteristic, property, method/process step or limitation) or group of integers (e.g. features, elements, characteristics, properties, method/process steps or limitations) alone.
The terms Gram-negative bacterium and Gram-positive bacterium are terms of art defining two distinct classes of bacteria on the basis of certain cell wall staining characteristics.
The term low G+C Gram-positive bacterium is a term of art defining a particular subclass of evolutionarily related bacteria within the Gram-positives on the basis of the composition of the bases in the DNA. The subclass includes Streptococcus spp., Staphylococcus spp., Listeria spp., Bacillus spp., Clostridium spp., Enterococcus spp. and Lactobacillus spp.
The term high G+C Gram-positive bacterium is a term of art defining a particular subclass of evolutionarily related bacteria within the Gram-positives on the basis of the composition of the bases in the DNA. The subclass includes actinomycetes (actinobacteria) including Actinomyces spp., Arthrobacter spp., Corynebacterium spp., Frankia spp., Micrococcus spp., Micromonospora spp., Mycobacterium spp., Nocardia spp., Propionibacterium spp. and Streptomyces spp.
Exemplary DNA sequencing methods for use according to the invention
Any suitable high-throughput sequencing technique can be used for sequencing the genomic DNA of the test microorganism. Many commercially available sequencing platforms are suitable for use in the methods of the invention, and others are in development but not yet in use. Sequencing-by-synthesis (SBS)-based sequencing platforms are particularly suitable for use in the methods of the invention: for example, the Illumina™ system, which generates millions of relatively short sequence reads (54, 75 or 100 bp), is particularly preferred. Other suitable techniques include methods based on reversible dye-terminators.
Exemplary bacterial targets of the methods of the invention
The methods and systems of the invention find application in the characterization, classification or identification of any microorganism, including bacteria, viruses and fungi. Preferred are methods and systems applied to the characterisation, classification or identification of bacteria, and especially pathogenic bacteria of clinical importance.
Thus, the methods and systems of the invention may be applied to the characterization, classification or identification of: (a) Gram-positive, Gram-negative and/or Gram-variable bacteria; (b) spore-forming bacteria; (c) non-spore-forming bacteria; (d) filamentous bacteria; (e) intracellular bacteria; (f) obligate aerobes; (g) obligate anaerobes; (h) facultative anaerobes; (i) microaerophilic bacteria and/or (j) opportunistic bacterial pathogens. In certain embodiments, the invention is used in the characterization, classification or identification of one or more bacteria of the following genera: Acinetobacter (e.g. A. baumannii); Aeromonas (e.g. A. hydrophila); Bacillus (e.g. B. anthracis); Bacteroides (e.g. B. fragilis); Bordetella (e.g. B. pertussis); Borrelia (e.g. B. burgdorferi); Brucella (e.g. B. abortus, B. canis, B. melitensis and B. suis); Burkholderia (e.g. B. cepacia complex); Campylobacter (e.g. C. jejuni); Chlamydia (e.g. C. trachomatis, C. suis and C. muridarum); Chlamydophila (e.g. C. pneumoniae, C. pecorum, C. psittaci, C. abortus, C. felis and C. caviae); Citrobacter (e.g. C. freundii); Clostridium (e.g. C. botulinum, C. difficile, C. perfringens and C. tetani); Corynebacterium (e.g. C. diphtheriae and C. glutamicum); Enterobacter (e.g. E. cloacae and E. aerogenes); Enterococcus (e.g. E. faecalis and E. faecium); Escherichia (e.g. E. coli); Flavobacterium; Francisella (e.g. F. tularensis); Fusobacterium (e.g. F. necrophorum); Haemophilus (e.g. H. somnus, H. influenzae and H. parainfluenzae); Helicobacter (e.g. H. pylori); Klebsiella (e.g. K. oxytoca and K. pneumoniae); Legionella (e.g. L. pneumophila); Leptospira (e.g. L. interrogans); Listeria (e.g. L. monocytogenes); Moraxella (e.g. M. catarrhalis); Morganella (e.g. M. morganii); Mycobacterium (e.g. M. leprae and M. tuberculosis); Mycoplasma (e.g. M. pneumoniae); Neisseria (e.g. N. gonorrhoeae and N. meningitidis); Pasteurella (e.g. P. multocida); Peptostreptococcus; Prevotella; Proteus (e.g. P. mirabilis and P. vulgaris); Pseudomonas (e.g. P. aeruginosa); Rickettsia (e.g. R. rickettsii); Salmonella (e.g. S. Typhi and S. Typhimurium); Serratia (e.g. S. marcescens); Shigella (e.g. S. flexneri, S. dysenteriae and S. sonnei); Staphylococcus (e.g. S. aureus, S. haemolyticus, S. intermedius, S. epidermidis and S. saprophyticus); Stenotrophomonas (e.g. S. maltophilia); Streptococcus (e.g. S. agalactiae, S. mutans, S. pneumoniae and S. pyogenes); Treponema (e.g. T. pallidum); Vibrio (e.g. V. cholerae) and Yersinia (e.g. Y. pestis).
The invention may be used in the characterization, classification or identification of multidrug-resistant bacteria, including, but not limited to, penicillin-resistant, methicillin-resistant, quinolone-resistant, macrolide-resistant and/or vancomycin-resistant bacterial strains, including for example penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Streptococcus pneumoniae; penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Staphylococcus aureus; penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant Streptococcus pyogenes; and penicillin-, methicillin-, macrolide-, vancomycin- and/or quinolone-resistant enterococci. Thus, the invention may also be used to target strains of Staphylococcus aureus (SA) which are usually multidrug-resistant (MR), for example selected from any of C-MRSA1, C-MRSA2, C-MRSA3, C-MRSA4, Belgian MRSA, Swiss MRSA and any of the EMRSA strains.
The invention may be used in the characterization, classification or identification of high G+C Gram-positive bacteria. The term "high G+C Gram-positive bacteria" is a term of art defining a particular class of evolutionarily related bacteria. The class includes Micrococcus spp. (e.g. M. luteus), Mycobacterium spp. (for example a fast- or slow-growing mycobacterium, e.g. M. tuberculosis, M. leprae, M. smegmatis or M. bovis), Streptomyces spp. (e.g. S. rimosus and S. coelicolor) and Corynebacterium spp. (e.g. C. glutamicum).
The invention may be used in the characterization, classification or identification of low G+C Gram-positive bacteria. The term "low G+C Gram-positive bacteria" is a term of art defining a particular class of evolutionarily related bacteria. The class includes members of the Firmicutes phylum, including for example Staphylococcus spp. and Bacillus spp.
Exemplification
The invention will now be described with reference to specific Examples. These are merely exemplary and for illustrative purposes only: they are not intended to be limiting in any way to the scope of the monopoly claimed or to the invention described. These examples constitute the best mode currently contemplated for practicing the invention.
Example 1
All nucleotide 12-mer sequences (12 base long sub-sequences) from a genomic sequence derived from a test microorganism are sorted into alphabetical order. This may be achieved by any suitable algorithm, for example a Radix-type sort.
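The sorting step can be sketched as follows. This is a minimal Python illustration, not the applicant's implementation: it assumes each 12-mer is packed into a 24-bit integer (A=0, C=1, G=2, T=3, so numeric order matches alphabetical order), skips 12-mers containing ambiguous bases such as N, and does not merge reverse complements. The helper names `encode_kmers`, `radix_sort` and `decode_kmer` are hypothetical.

```python
# Sketch: list every 12-mer of a genome and sort them alphabetically with an
# LSD radix sort. The 2-bit packing A=0, C=1, G=2, T=3 is an assumed encoding.

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq: str, k: int = 12) -> list[int]:
    """Return the 2k-bit integer code of every k-mer in seq.

    k-mers containing a character other than A, C, G or T are skipped.
    Because codes follow A < C < G < T, numeric order equals alphabetical order.
    """
    mask = (1 << (2 * k)) - 1
    codes = []
    value = 0
    run = 0  # number of consecutive valid bases ending at the current position
    for base in seq.upper():
        bits = BASE_CODE.get(base)
        if bits is None:  # ambiguous base: restart the sliding window
            value, run = 0, 0
            continue
        value = ((value << 2) | bits) & mask
        run += 1
        if run >= k:
            codes.append(value)
    return codes


def radix_sort(values: list[int], key_bits: int) -> list[int]:
    """Stable least-significant-digit radix sort, 8 bits per pass."""
    for shift in range(0, key_bits, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values


def decode_kmer(code: int, k: int = 12) -> str:
    """Convert a 2k-bit code back into its k-mer string."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    genome = "ACGTACGTACGTACGTACGT"
    for code in radix_sort(encode_kmers(genome), key_bits=24):
        print(decode_kmer(code))
```

With 24-bit keys, three 8-bit passes are enough, so the sort runs in time linear in genome length; identical 12-mers end up adjacent, which is what the repeat-counting step relies on.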
When a 12-mer is found that is repeated and the number of repeats exceeds a set threshold, the 12-mer sequence and its repeat count are stored. At the end of the sorting process, the repeat statistics are printed to an output file for the test genome. The process is then repeated, and separate output files generated, for all genomes of interest. The total number of repeats, in this case, represents the frequency of repeats because all genomes are of equivalent size. For genomes of variable size (or where partial genome sequences are used), frequencies rather than absolute numbers may be used.
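A sketch of the repeat-counting step follows, assuming the k-mers have already been sorted (here with Python's built-in `sorted` to keep the block self-contained). It keeps k-mers whose copy number is at least the threshold, matching the "repeated 10 or more times" criterion of the later Examples and the copy number ≥ r of the claims. The per-megabase scaling used to turn counts into frequencies, the tab-separated output layout and all function names are assumptions, since the source does not specify them.

```python
# Sketch: count repeated 12-mers in a sorted k-mer list and keep those whose
# copy number reaches the threshold r. Per-megabase normalisation is assumed.


def sorted_kmers(seq: str, k: int = 12) -> list[str]:
    """All k-mers of seq in alphabetical order (built-in sort stands in for a radix sort)."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(kmers: list[str], threshold: int) -> dict[str, int]:
    """Scan a sorted k-mer list; identical k-mers are adjacent, so each run is one k-mer."""
    repeats = {}
    i = 0
    while i < len(kmers):
        j = i
        while j < len(kmers) and kmers[j] == kmers[i]:
            j += 1
        if j - i >= threshold:  # copy number >= r
            repeats[kmers[i]] = j - i
        i = j
    return repeats


def per_megabase(repeats: dict[str, int], genome_length: int) -> dict[str, float]:
    """Convert copy numbers to copies per million bases of input sequence."""
    return {kmer: n * 1_000_000 / genome_length for kmer, n in repeats.items()}


def to_tsv(repeats: dict[str, int]) -> str:
    """Render repeat statistics as 'kmer<TAB>copies' lines, most frequent first."""
    rows = sorted(repeats.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{kmer}\t{n}" for kmer, n in rows)
```

Normalizing by sequence length only matters when genomes differ in size or only partial sequences are available; for equal-sized genomes the raw counts already rank the same way.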
An all-against-all comparison is then performed on the output files generated as described above: the repeats from a specific test genome are compared with the repeats in each of the other genomes, and a score is compiled for the test genome against every genome in which the same repeated sequences occur.
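One way to express the all-against-all comparison is sketched below. It assumes that the score for a pair of genomes is simply the number of distinct repeated k-mers present in both output files, which yields the "count, name" layout of the listings below with the test genome's own total first. `all_against_all` and `format_ranking` are hypothetical names, and the tie-breaking rule (test genome first, then alphabetical) is an illustrative choice.

```python
# Sketch: all-against-all comparison of per-genome repeat sets. The score for a
# pair is assumed to be the number of distinct repeated k-mers they share.


def all_against_all(repeats_by_genome: dict[str, set[str]]) -> dict[str, list[tuple[int, str]]]:
    """For each test genome, rank every genome by the number of shared repeats."""
    results = {}
    for test, test_repeats in repeats_by_genome.items():
        ranking = []
        for other, other_repeats in repeats_by_genome.items():
            shared = len(test_repeats & other_repeats)
            if shared > 0:  # genomes with no shared repeats are omitted
                ranking.append((shared, other))
        # Highest score first; on ties the test genome itself comes first, then by name.
        ranking.sort(key=lambda pair: (-pair[0], pair[1] != test, pair[1]))
        results[test] = ranking
    return results


def format_ranking(test: str, ranking: list[tuple[int, str]]) -> str:
    """Render a ranking as a header line followed by 'count, name' lines."""
    return "\n".join([test] + [f"{score}, {name}" for score, name in ranking])
```

Because the score is a set intersection it is symmetric: the count for genome X against genome Y equals the count for Y against X, as the paired entries in the listings show.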
The output below shows the results when the test genome Acinetobacter baumannii AB0057, is compared to a range of other bacterial genomes from Klebsiella pneumoniae, Escherichia coli, Neisseria gonorrhoeae, S. Typhimurium, S. Typhi strains and other Acinetobacter baumannii strains.
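The numbers on the left show the count of all repeated sequences shared by the test strain and each of the compared strains; the first entry, where the test strain is compared with itself, is its own total number of repeated sequences. The higher the number, the more the compared genomes have in common.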
Acinetobacter baumannii AB0057
1493, Acinetobacter baumannii AB0057
385, Acinetobacter baumannii AYE
367, Acinetobacter baumannii AB307-0294
6, Klebsiella pneumoniae 342
4, Klebsiella pneumoniae subsp. pneumoniae HS11286
4, Escherichia coli APEC 01
3, Escherichia coli 536
3, Neisseria gonorrhoeae NCCP11945
3, Typhimurium str. T000240
2, Typhi str. CT18
2, Typhimurium str. D23580
2, Escherichia coli 55989
2, Typhimurium str. UK-1
1, Typhi str. P-stx-12
1, Pseudomonas aeruginosa PA7
1, Typhi str. Ty2
The process is then repeated for all of the other Klebsiella pneumoniae, Escherichia coli, Neisseria gonorrhoeae, Pseudomonas aeruginosa, Typhimurium, Typhi and Acinetobacter baumannii strains. The full output is shown below:
Acinetobacter baumannii AB0057
1493, Acinetobacter baumannii AB0057
385, Acinetobacter baumannii AYE
367, Acinetobacter baumannii AB307-0294
6, Klebsiella pneumoniae 342
4, Klebsiella pneumoniae subsp. pneumoniae HS11286
4, Escherichia coli APEC 01
3, Escherichia coli 536
3, Neisseria gonorrhoeae NCCP11945
3, Typhimurium str. T000240
2, Typhi str. CT18
2, Typhimurium str. D23580
2, Escherichia coli 55989
2, Typhimurium str. UK-1
1, Typhi str. P-stx-12
1, Pseudomonas aeruginosa PA7
1, Typhi str. Ty2
Acinetobacter baumannii AB307-0294
1689, Acinetobacter baumannii AB307-0294
1548, Acinetobacter baumannii AYE
367, Acinetobacter baumannii AB0057
8, Escherichia coli 536
8, Typhimurium str. T000240
8, Escherichia coli APEC 01
7, Typhimurium str. D23580
7, Escherichia coli 55989
7, Typhimurium str. UK-1
6, Klebsiella pneumoniae subsp. pneumoniae HS11286
3, Neisseria gonorrhoeae NCCP11945
2, Pseudomonas aeruginosa PA7
2, Klebsiella pneumoniae 342
1, Typhi str. CT18
1, Typhi str. P-stx-12
1, Typhi str. Ty2
Acinetobacter baumannii AYE
3314, Acinetobacter baumannii AYE
1548, Acinetobacter baumannii AB307-0294
385, Acinetobacter baumannii AB0057
8, Typhimurium str. T000240
8, Escherichia coli 536
8, Escherichia coli APEC 01
7, Typhimurium str. D23580
7, Escherichia coli 55989
7, Typhimurium str. UK-1
6, Klebsiella pneumoniae subsp. pneumoniae HS11286
4, Neisseria gonorrhoeae NCCP11945
2, Pseudomonas aeruginosa PA7
2, Klebsiella pneumoniae 342
1, Typhi str. P-stx-12
1, Typhi str. Ty2
Escherichia coli 536
604, Escherichia coli 536
468, Escherichia coli APEC 01
406, Escherichia coli 55989
251, Klebsiella pneumoniae subsp. pneumoniae HS11286
233, Klebsiella pneumoniae 342
212, Typhimurium str. T000240
207, Typhimurium str. D23580
207, Typhimurium str. UK-1
204, Pseudomonas aeruginosa PA7
199, Typhi str. CT18
177, Typhi str. P-stx-12
177, Typhi str. Ty2
128, Pseudomonas aeruginosa LESB58
8, Acinetobacter baumannii AYE
8, Acinetobacter baumannii AB307-0294
4, Neisseria gonorrhoeae NCCP11945
3, Acinetobacter baumannii AB0057
2, Neisseria gonorrhoeae FA 1090
Escherichia coli 55989
784, Escherichia coli 55989
419, Escherichia coli APEC 01
406, Escherichia coli 536
286, Klebsiella pneumoniae 342
285, Klebsiella pneumoniae subsp. pneumoniae HS11286
252, Pseudomonas aeruginosa PA7
248, Typhimurium str. T000240
241, Typhimurium str. D23580
241, Typhimurium str. UK-1
222, Typhi str. CT18
200, Typhi str. P-stx-12
200, Typhi str. Ty2
149, Pseudomonas aeruginosa LESB58
7, Acinetobacter baumannii AYE
7, Acinetobacter baumannii AB307-0294
6, Neisseria gonorrhoeae NCCP11945
3, Neisseria gonorrhoeae FA 1090
2, Acinetobacter baumannii AB0057
Escherichia coli APEC 01
638, Escherichia coli APEC 01
468, Escherichia coli 536
419, Escherichia coli 55989
264, Klebsiella pneumoniae subsp. pneumoniae HS11286
255, Klebsiella pneumoniae 342
232, Pseudomonas aeruginosa PA7
224, Typhimurium str. T000240
220, Typhimurium str. D23580
220, Typhimurium str. UK-1
208, Typhi str. CT18
191, Typhi str. P-stx-12
191, Typhi str. Ty2
143, Pseudomonas aeruginosa LESB58
8, Acinetobacter baumannii AYE
8, Acinetobacter baumannii AB307-0294
4, Acinetobacter baumannii AB0057
3, Neisseria gonorrhoeae NCCP11945
1, Neisseria gonorrhoeae FA 1090
Klebsiella pneumoniae 342
8745, Klebsiella pneumoniae 342
5681, Pseudomonas aeruginosa PA7
4294, Klebsiella pneumoniae subsp. pneumoniae HS11286
2719, Pseudomonas aeruginosa LESB58
782, Typhimurium str. T000240
768, Typhimurium str. D23580
762, Typhimurium str. UK-1
691, Typhi str. Ty2
688, Typhi str. P-stx-12
686, Typhi str. CT18
286, Escherichia coli 55989
255, Escherichia coli APEC 01
233, Escherichia coli 536
85, Neisseria gonorrhoeae FA 1090
76, Neisseria gonorrhoeae NCCP11945
6, Acinetobacter baumannii AB0057
2, Acinetobacter baumannii AYE
2, Acinetobacter baumannii AB307-0294
Klebsiella pneumoniae subsp. pneumoniae HS11286
8009, Klebsiella pneumoniae subsp. pneumoniae HS11286
5484, Pseudomonas aeruginosa PA7
4294, Klebsiella pneumoniae 342
2737, Pseudomonas aeruginosa LESB58
771, Typhimurium str. T000240
750, Typhimurium str. D23580
746, Typhimurium str. UK-1
660, Typhi str. CT18
652, Typhi str. Ty2
647, Typhi str. P-stx-12
285, Escherichia coli 55989
264, Escherichia coli APEC 01
251, Escherichia coli 536
84, Neisseria gonorrhoeae FA 1090
81, Neisseria gonorrhoeae NCCP11945
6, Acinetobacter baumannii AYE
6, Acinetobacter baumannii AB307-0294
4, Acinetobacter baumannii AB0057
Neisseria gonorrhoeae FA 1090
2039, Neisseria gonorrhoeae FA 1090
1189, Neisseria gonorrhoeae NCCP11945
186, Pseudomonas aeruginosa PA7
101, Pseudomonas aeruginosa LESB58
85, Klebsiella pneumoniae 342
84, Klebsiella pneumoniae subsp. pneumoniae HS11286
41, Typhimurium str. D23580
41, Typhimurium str. T000240
41, Typhimurium str. UK-1
36, Typhi str. P-stx-12
36, Typhi str. CT18
36, Typhi str. Ty2
3, Escherichia coli 55989
2, Escherichia coli 536
1, Escherichia coli APEC 01
Neisseria gonorrhoeae NCCP11945
1827, Neisseria gonorrhoeae NCCP11945
1189, Neisseria gonorrhoeae FA 1090
181, Pseudomonas aeruginosa PA7
99, Pseudomonas aeruginosa LESB58
81, Klebsiella pneumoniae subsp. pneumoniae HS11286
76, Klebsiella pneumoniae 342
43, Typhimurium str. D23580
42, Typhimurium str. T000240
42, Typhimurium str. UK-1
34, Typhi str. CT18
32, Typhi str. P-stx-12
32, Typhi str. Ty2
6, Escherichia coli 55989
4, Acinetobacter baumannii AYE
4, Escherichia coli 536
3, Acinetobacter baumannii AB0057
3, Acinetobacter baumannii AB307-0294
3, Escherichia coli APEC 01
Pseudomonas aeruginosa LESB58
9908, Pseudomonas aeruginosa LESB58
9891, Pseudomonas aeruginosa PA7
2737, Klebsiella pneumoniae subsp. pneumoniae HS11286
2719, Klebsiella pneumoniae 342
493, Typhimurium str. T000240
478, Typhimurium str. D23580
477, Typhimurium str. UK-1
439, Typhi str. Ty2
437, Typhi str. P-stx-12
430, Typhi str. CT18
149, Escherichia coli 55989
143, Escherichia coli APEC 01
128, Escherichia coli 536
101, Neisseria gonorrhoeae FA 1090
99, Neisseria gonorrhoeae NCCP11945
Pseudomonas aeruginosa PA7
55494, Pseudomonas aeruginosa PA7
9891, Pseudomonas aeruginosa LESB58
5681, Klebsiella pneumoniae 342
5484, Klebsiella pneumoniae subsp. pneumoniae HS11286
734, Typhimurium str. T000240
708, Typhimurium str. D23580
703, Typhimurium str. UK-1
644, Typhi str. Ty2
639, Typhi str. CT18
637, Typhi str. P-stx-12
252, Escherichia coli 55989
232, Escherichia coli APEC 01
204, Escherichia coli 536
186, Neisseria gonorrhoeae FA 1090
181, Neisseria gonorrhoeae NCCP11945
2, Acinetobacter baumannii AYE
2, Acinetobacter baumannii AB307-0294
1, Acinetobacter baumannii AB0057
Typhi str. CT18
2390, Typhi str. CT18
1759, Typhi str. Ty2
1758, Typhi str. P-stx-12
734, Typhimurium str. T000240
722, Typhimurium str. UK-1
720, Typhimurium str. D23580
686, Klebsiella pneumoniae 342
660, Klebsiella pneumoniae subsp. pneumoniae HS11286
639, Pseudomonas aeruginosa PA7
430, Pseudomonas aeruginosa LESB58
222, Escherichia coli 55989
208, Escherichia coli APEC 01
199, Escherichia coli 536
36, Neisseria gonorrhoeae FA 1090
34, Neisseria gonorrhoeae NCCP11945
2, Acinetobacter baumannii AB0057
1, Acinetobacter baumannii AB307-0294
Typhi str. P-stx-12
2226, Typhi str. P-stx-12
2216, Typhi str. Ty2
1758, Typhi str. CT18
688, Klebsiella pneumoniae 342
647, Klebsiella pneumoniae subsp. pneumoniae HS11286
637, Pseudomonas aeruginosa PA7
635, Typhimurium str. T000240
630, Typhimurium str. D23580
630, Typhimurium str. UK-1
437, Pseudomonas aeruginosa LESB58
200, Escherichia coli 55989
191, Escherichia coli APEC 01
177, Escherichia coli 536
36, Neisseria gonorrhoeae FA 1090
32, Neisseria gonorrhoeae NCCP11945
1, Acinetobacter baumannii AYE
1, Acinetobacter baumannii AB0057
1, Acinetobacter baumannii AB307-0294
Typhi str. Ty2
2239, Typhi str. Ty2
2216, Typhi str. P-stx-12
1759, Typhi str. CT18
691, Klebsiella pneumoniae 342
652, Klebsiella pneumoniae subsp. pneumoniae HS11286
644, Pseudomonas aeruginosa PA7
638, Typhimurium str. T000240
631, Typhimurium str. D23580
631, Typhimurium str. UK-1
439, Pseudomonas aeruginosa LESB58
200, Escherichia coli 55989
191, Escherichia coli APEC 01
177, Escherichia coli 536
36, Neisseria gonorrhoeae FA 1090
32, Neisseria gonorrhoeae NCCP11945
1, Acinetobacter baumannii AYE
1, Acinetobacter baumannii AB0057
1, Acinetobacter baumannii AB307-0294
Typhimurium str. D23580
1250, Typhimurium str. D23580
1223, Typhimurium str. UK-1
1205, Typhimurium str. T000240
768, Klebsiella pneumoniae 342
750, Klebsiella pneumoniae subsp. pneumoniae HS11286
720, Typhi str. CT18
708, Pseudomonas aeruginosa PA7
631, Typhi str. Ty2
630, Typhi str. P-stx-12
478, Pseudomonas aeruginosa LESB58
241, Escherichia coli 55989
220, Escherichia coli APEC 01
207, Escherichia coli 536
43, Neisseria gonorrhoeae NCCP11945
41, Neisseria gonorrhoeae FA 1090
7, Acinetobacter baumannii AYE
7, Acinetobacter baumannii AB307-0294
2, Acinetobacter baumannii AB0057
Typhimurium str. T000240
1299, Typhimurium str. T000240
1217, Typhimurium str. UK-1
1205, Typhimurium str. D23580
782, Klebsiella pneumoniae 342
771, Klebsiella pneumoniae subsp. pneumoniae HS11286
734, Typhi str. CT18
734, Pseudomonas aeruginosa PA7
638, Typhi str. Ty2
635, Typhi str. P-stx-12
493, Pseudomonas aeruginosa LESB58
248, Escherichia coli 55989
224, Escherichia coli APEC 01
212, Escherichia coli 536
42, Neisseria gonorrhoeae NCCP11945
41, Neisseria gonorrhoeae FA 1090
8, Acinetobacter baumannii AYE
8, Acinetobacter baumannii AB307-0294
3, Acinetobacter baumannii AB0057
Typhimurium str. UK-1
1250, Typhimurium str. UK-1
1223, Typhimurium str. D23580
1217, Typhimurium str. T000240
762, Klebsiella pneumoniae 342
746, Klebsiella pneumoniae subsp. pneumoniae HS11286
722, Typhi str. CT18
703, Pseudomonas aeruginosa PA7
631, Typhi str. Ty2
630, Typhi str. P-stx-12
477, Pseudomonas aeruginosa LESB58
241, Escherichia coli 55989
220, Escherichia coli APEC 01
207, Escherichia coli 536
42, Neisseria gonorrhoeae NCCP11945
41, Neisseria gonorrhoeae FA 1090
7, Acinetobacter baumannii AYE
7, Acinetobacter baumannii AB307-0294
2, Acinetobacter baumannii AB0057
Example 2: Identification of different micro-organisms in a mixed culture
To identify different micro-organisms in a mixed culture, the method searches for abundant repeats that are completely specific for a strain or subtype.
Acinetobacter baumannii AB0057 has 1493 12-mer sequences that are repeated 10 or more times. Of these, 385 are shared with Acinetobacter baumannii AYE, only 367 with Acinetobacter baumannii AB307-0294, and 6 or fewer with each of the other tested strains listed below.
Thus, there are more than 700 repeated sequences that can be used to differentiate Acinetobacter baumannii AB0057 from Acinetobacter baumannii AYE and Acinetobacter baumannii AB307-0294, and more than 1450 repeats that could identify it amongst all of the other tested strains.
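The selection of strain-specific repeats can be sketched as a set difference: the repeats of the target strain minus every repeat seen in the strains it must be distinguished from. With the counts above, removing the repeats shared with AYE and AB307-0294 leaves at least 1493 - 385 - 367 = 741 repeats (exactly 741 only if the two shared sets are disjoint), consistent with the "more than 700" figure. The detection helper then reports a strain as present in a mixed-culture sample when a chosen fraction of its specific repeats is found among the sample's repeated k-mers; that cut-off (`min_fraction`), the toy strain labels and the function names are illustrative assumptions, not values from the Examples.

```python
# Sketch: pick repeats specific to one strain and use them to flag strains in a
# mixed-culture sample. The min_fraction cut-off is a hypothetical parameter.
from __future__ import annotations

from typing import Iterable


def specific_repeats(target: str, repeats_by_genome: dict[str, set[str]],
                     against: Iterable[str] | None = None) -> set[str]:
    """Repeats of `target` absent from every genome in `against` (default: all others)."""
    others = against if against is not None else [g for g in repeats_by_genome if g != target]
    excluded = set().union(*(repeats_by_genome[g] for g in others))
    return repeats_by_genome[target] - excluded


def detect_strains(sample_repeats: set[str], markers: dict[str, set[str]],
                   min_fraction: float = 0.5) -> list[str]:
    """Strains for which at least min_fraction of their specific repeats occur in the sample."""
    found = []
    for strain, strain_markers in markers.items():
        if not strain_markers:  # a strain with no specific repeats cannot be detected this way
            continue
        fraction = len(strain_markers & sample_repeats) / len(strain_markers)
        if fraction >= min_fraction:
            found.append(strain)
    return sorted(found)
```

Restricting `against` to the closest relatives (here the other A. baumannii strains) keeps more markers than excluding every reference genome, at the cost of weaker discrimination from distant taxa.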
Acinetobacter baumannii AB0057
1493, Acinetobacter baumannii AB0057
385, Acinetobacter baumannii AYE
367, Acinetobacter baumannii AB307-0294
6, Klebsiella pneumoniae 342
4, Klebsiella pneumoniae subsp. pneumoniae HS11286
4, Escherichia coli APEC 01
3, Escherichia coli 536
3, Neisseria gonorrhoeae NCCP11945
3, Typhimurium str. T000240
2, Typhi str. CT18
2, Typhimurium str. D23580
2, Escherichia coli 55989
2, Typhimurium str. UK-1
1, Typhi str. P-stx-12
1, Pseudomonas aeruginosa PA7
1, Typhi str. Ty2
Example 3: Repeat hits and repeat counts
The above example shows that Acinetobacter baumannii AB0057 has 1493 12-mers that are repeated 10 or more times, and that 385 of these are shared with Acinetobacter baumannii AYE.
If the copy numbers of these 385 common 12-mers are taken into account, the two strains share a total of 4780 repeated 12-mers. The number of "repeat hits" (385) is the number of distinct 12-mers, repeated 10 or more times, that the two strains have in common. The "repeat count" (4780) is the total number of copies of these shared repeated 12-mers.
Thus, both the number of shared distinct repeated nucleotide n-mer sequences (the "repeat hits") and the number of copies of all shared repeated nucleotide n-mer sequences (the "repeat count") are useful statistics in the methods of the invention.
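Both statistics can be computed from the per-genome tables of repeated k-mers and their copy numbers. The Example does not say how the copy numbers of a shared 12-mer are combined between the two strains; this sketch assumes the smaller of the two copy numbers (the copies both genomes have in common), and summing the test genome's copies would be an equally plausible reading. `repeat_hits_and_count` is a hypothetical name.

```python
# Sketch: "repeat hits" and "repeat count" for a pair of genomes. How the copy
# numbers of a shared k-mer are combined is not stated; min() is an assumption.


def repeat_hits_and_count(test: dict[str, int], reference: dict[str, int]) -> tuple[int, int]:
    """Return (repeat hits, repeat count) for two {k-mer: copy number} tables.

    repeat hits  = number of distinct repeated k-mers present in both tables
    repeat count = total copies of those shared k-mers, counting for each the
                   smaller of its two copy numbers (assumed convention)
    """
    shared = test.keys() & reference.keys()
    hits = len(shared)
    count = sum(min(test[kmer], reference[kmer]) for kmer in shared)
    return hits, count


if __name__ == "__main__":
    test_genome = {"AAAACCCCGGGG": 12, "ACACACACACAC": 10, "TTTTGGGGCCCC": 30}
    reference = {"AAAACCCCGGGG": 15, "TTTTGGGGCCCC": 11, "GGGGGGAAAAAA": 40}
    print(repeat_hits_and_count(test_genome, reference))  # (2, 23)
```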
Example 4
Using FASTA data of a completed genome sequence (with any identifying information removed), comparison with high-quality genome sequences available from the NCBI site (http://www.ncbi.nlm.nih.gov/genome/) gave the following identification parameters.
Sequence one: test is Escherichia coli O104:H4 str. 2011C-3493
3498, jwTest 1
3443, NC_010468 Escherichia coli ATCC 8739
3422, NC_017625 Escherichia coli
3395, NC_012947 Escherichia coli
3316, NC_016902 Escherichia coli
3222, NC_007779 Escherichia coli
3174, NC_000913 Escherichia coli
3170, NC_012759 Escherichia coli
3158, NC_017638 Escherichia coli
3145, NC_017663 Escherichia coli
3122, NC_017633 Escherichia coli
Sequence two: test is Pseudomonas aeruginosa NCGM2.S1
331660, jwTest 2
254203, NC_002516 Pseudomonas aeruginosa
231687, NC_009656 Pseudomonas aeruginosa
The operator was blind to the identity of the organism from which the genome data were generated. The method therefore correctly identified the species from an unknown bacterial sequence and ranked, in order of similarity, the other members of that species for which a suitable reference sequence was available.
Equivalents
The foregoing description details presently preferred embodiments of the present invention. Numerous modifications and variations in practice thereof are expected to occur to those skilled in the art upon consideration of these descriptions. Those modifications and variations are intended to be encompassed within the claims appended hereto.

Claims

1. A method for identifying a test microorganism comprising the steps of: (a) sequencing complete or partial genomic DNA from the test microorganism;
(b) identifying repeated nucleotide n-mer sequences present at a copy number ≥ r; and
(c) comparing the repeated nucleotide n-mer sequences identified in step (b) with those of one or more reference microorganism(s).
2. The method of claim 1 wherein step (c) comprises counting the number (or determining the frequency) of the repeated nucleotide n-mer sequences identified in step (b) that are shared with one or more reference microorganism(s).
3. The method of claim 2 further comprising the step of identifying the test microorganism on the basis of the degree to which the repeated nucleotide n-mer sequences are shared with the one or more reference microorganism(s).
4. The method of claim 2 or claim 3 wherein in step (c) the number of shared distinct repeated nucleotide n-mer sequences is counted, or the frequency thereof determined.
5. The method of claim 2 or claim 3 wherein in step (c) the number of shared copies of all repeated nucleotide n-mer sequences is counted, or the frequency thereof determined.
6. The method of claim 2 or claim 3 wherein in step (c) both the number of shared distinct repeated nucleotide n-mer sequences and the number of copies of all shared repeated nucleotide n-mer sequences are counted, or the frequencies thereof determined.
7. The method of any one of the preceding claims wherein step (c) comprises determining whether the repeated nucleotide n-mer sequences identified in step (b) include a distinctive repeated nucleotide n-mer sequence that is unique to a particular reference microorganism.
8. The method of claim 7 further comprising the step of identifying the test microorganism on the basis of the presence of the distinctive repeated nucleotide n-mer sequence identified in step (c).
9. The method of any one of the preceding claims further comprising the step of determining the gap length between said repeated nucleotide n-mer sequences.
10. The method of any one of the preceding claims wherein n = 6-24.
11. The method of claim 10 wherein n = 8-16.
12. The method of claim 11 wherein n = 10-14.
13. The method of claim 12 wherein n is about 12.
14. The method of claim 13 wherein n = 12.
15. The method of any one of the preceding claims wherein r is 10, 15, 20, 25, 30, 35 or 40.
16. The method of any one of the preceding claims for identifying one or more microorganism(s) present in a mixed culture.
17. The method of claim 16 for identifying two or more microorganisms present in a mixed culture.
18. The method of any one of the preceding claims wherein the microorganism is from: (a) a clinical isolate; (b) a biological sample, for example a stool sample, blood sample, urine sample, saliva sample or swab; (c) an environmental sample; or (d) a food sample.
19. The method of any one of the preceding claims wherein step (c) comprises comparing the repeated nucleotide n-mer sequences identified in step (b) with those of a plurality of reference microorganisms.
20. The method of claim 19 wherein the plurality of reference microorganisms comprises bacterial pathogens.
21. The method of claim 19 or claim 20 wherein the reference microorganisms are selected from: (a) Gram-positive, Gram-negative and/or Gram-variable bacteria; (b) spore-forming bacteria; (c) non-spore-forming bacteria; (d) filamentous bacteria; (e) intracellular bacteria; (f) obligate aerobes; (g) obligate anaerobes; (h) facultative anaerobes; (i) microaerophilic bacteria and/or (j) opportunistic bacterial pathogens.
22. The method of any one of claims 19-21 wherein the reference microorganisms are selected from: Acinetobacter (e.g. A. baumannii); Aeromonas (e.g. A. hydrophila); Bacillus (e.g. B. anthracis); Bacteroides (e.g. B. fragilis); Bordetella (e.g. B. pertussis); Borrelia (e.g. B. burgdorferi); Brucella (e.g. B. abortus, B. canis, B. melitensis and B. suis); Burkholderia (e.g. B. cepacia complex); Campylobacter (e.g. C. jejuni); Chlamydia (e.g. C. trachomatis, C. suis and C. muridarum); Chlamydophila (e.g. C. pneumoniae, C. pecorum, C. psittaci, C. abortus, C. felis and C. caviae); Citrobacter (e.g. C. freundii); Clostridium (e.g. C. botulinum, C. difficile, C. perfringens and C. tetani); Corynebacterium (e.g. C. diphtheriae and C. glutamicum); Enterobacter (e.g. E. cloacae and E. aerogenes); Enterococcus (e.g. E. faecalis and E. faecium); Escherichia (e.g. E. coli); Flavobacterium; Francisella (e.g. F. tularensis); Fusobacterium (e.g. F. necrophorum); Haemophilus (e.g. H. somnus, H. influenzae and H. parainfluenzae); Helicobacter (e.g. H. pylori); Klebsiella (e.g. K. oxytoca and K. pneumoniae); Legionella (e.g. L. pneumophila); Leptospira (e.g. L. interrogans); Listeria (e.g. L. monocytogenes); Moraxella (e.g. M. catarrhalis); Morganella (e.g. M. morganii); Mycobacterium (e.g. M. leprae and M. tuberculosis); Mycoplasma (e.g. M. pneumoniae); Neisseria (e.g. N. gonorrhoeae and N. meningitidis); Pasteurella (e.g. P. multocida); Peptostreptococcus; Prevotella; Proteus (e.g. P. mirabilis and P. vulgaris); Pseudomonas (e.g. P. aeruginosa); Rickettsia (e.g. R. rickettsii); Salmonella (e.g. serotypes Typhi and Typhimurium); Serratia (e.g. S. marcescens); Shigella (e.g. S. flexneri, S. dysenteriae and S. sonnei); Staphylococcus (e.g. S. aureus, S. haemolyticus, S. intermedius, S. epidermidis and S. saprophyticus); Stenotrophomonas (e.g. S. maltophilia); Streptococcus (e.g. S. agalactiae, S. mutans, S. pneumoniae and S. pyogenes); Treponema (e.g. T. pallidum); Vibrio (e.g. V. cholerae) and Yersinia (e.g. Y. pestis).
23. A method for characterizing a microorganism comprising the steps of: (a) sequencing complete or partial genomic DNA from the microorganism; (b) identifying repeated nucleotide n-mer sequences present at a copy number ≥ r; and optionally
(c) determining the gap length between the repeated nucleotide n-mer sequences identified in step (b).
24. A method for classifying or identifying a microorganism comprising characterizing the microorganism according to the method of claim 23.
25. The method of any one of the preceding claims wherein the microorganism is a bacterium.
26. The method of any one of the preceding claims further comprising the step of extracting complete or partial genomic DNA from the test microorganism directly from a sample prior to the sequencing step (a).
27. The method of any one of claims 1-25 further comprising the step of culturing the test microorganism from a sample prior to the sequencing step (a).
28. The method of claim 26 or 27 wherein the sample is selected from: (a) a clinical isolate; (b) a biological or clinical sample, for example a stool sample, blood sample, urine sample, saliva sample or swab; (c) an environmental sample; or (d) a food sample.
29. A method for diagnosing infectious disease comprising the method of any one of the preceding claims.
30. A method for characterising the commensal flora in a human or non-human animal comprising the method of any one of the preceding claims.
31. A method for identifying and sub-typing bacteria comprising the method of any one of the preceding claims.
32. A computer system for use in a method as defined in any one of the preceding claims comprising: (a) a test microorganism database comprising the sequence and copy number of repeated nucleotide n-mer sequences present at a copy number ≥ r; and
(b) a reference database of complete or partial genomic DNA sequences of a plurality of reference microorganisms.
PCT/GB2013/000239 2012-05-24 2013-05-23 Characterization, classification and identification of microorganisms WO2013175164A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB1209122.9 2012-05-24
GBGB1209122.9A GB201209122D0 (en) 2012-05-24 2012-05-24 Characterization classification and identification of microorganisms
GB1307683.1 2013-04-29
GB201307683A GB201307683D0 (en) 2013-04-29 2013-04-29 Characterization, classification and identification of microorganisms

Publications (1)

Publication Number Publication Date
WO2013175164A1 true WO2013175164A1 (en) 2013-11-28

Family

ID=48626075

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016203246A1 (en) * 2015-06-17 2016-12-22 Isis Innovation Limited Method
CN112481402A (en) * 2020-12-29 2021-03-12 上海国际旅行卫生保健中心(上海海关口岸门诊部) Primer group for M.tuberculosis MLST typing detection based on Sanger sequencing and application thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020120408A1 (en) * 2000-09-06 2002-08-29 Kreiswirth Barry N. System and method for tracking and controlling infections

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020120408A1 (en) * 2000-09-06 2002-08-29 Kreiswirth Barry N. System and method for tracking and controlling infections

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CIAMMARUCONI ANDREA ET AL: "Fieldable genotyping of Bacillus anthracis and Yersinia pestis based on 25-loci Multi Locus VNTR Analysis", BMC MICROBIOLOGY, BIOMED CENTRAL LTD, GB, vol. 8, no. 1, 29 January 2008 (2008-01-29), pages 21, XP021033319, ISSN: 1471-2180 *
DENOEUD FRANCE ET AL: "Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 5, no. 1, 12 January 2004 (2004-01-12), pages 4, XP021000621, ISSN: 1471-2105, DOI: 10.1186/1471-2105-5-4 *
GUR-ARIE R ET AL: "Simple sequence repeats in Escherichia coli: abundance, distribution, composition and polymorphism", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US, vol. 10, 1 January 2000 (2000-01-01), pages 62 - 71, XP002963866, ISSN: 1088-9051 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2016203246A1 (en) * 2015-06-17 2016-12-22 Isis Innovation Limited Method
CN112481402A (en) * 2020-12-29 2021-03-12 上海国际旅行卫生保健中心(上海海关口岸门诊部) Primer group for M. tuberculosis MLST typing detection based on Sanger sequencing and application thereof
CN112481402B (en) * 2020-12-29 2024-03-22 上海国际旅行卫生保健中心(上海海关口岸门诊部) Mycobacterium tuberculosis MLST typing detection primer group based on Sanger sequencing and application thereof

Similar Documents

Publication Publication Date Title
Das et al. Understanding molecular identification and polyphasic taxonomic approaches for genetic relatedness and phylogenetic relationships of microorganisms
Sibley et al. Molecular methods for pathogen and microbial community detection and characterization: current and potential application in diagnostic microbiology
O'Sullivan Methods for analysis of the intestinal microflora
Eigner et al. Performance of a matrix-assisted laser desorption ionization-time-of-flight mass spectrometry system for the identification of bacterial isolates in the clinical routine laboratory
Shakya et al. Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities
Woo et al. Usefulness of the MicroSeq 500 16S ribosomal DNA-based bacterial identification system for identification of clinically significant bacterial isolates with ambiguous biochemical profiles
Vilo et al. Evaluation of the RDP classifier accuracy using 16S rRNA gene variable regions
WO2018217852A1 (en) Crispr based tool for characterizing bacterial serovar diversity
Raina et al. A polyphasic taxonomic approach for designation and description of novel microbial species
Mishra et al. Molecular revolution in the diagnosis of microbial brain abscesses
CN111534622B (en) Bacteroides rapid detection method based on high-throughput sequencing and application
Vandamme Taxonomy and classification of bacteria
Lau et al. Gene amplification and sequencing for bacterial identification
CN109797438A (en) A kind of joint component and library constructing method quantifying sequencing library building for the variable region 16S rDNA
Hoshino et al. Differential diagnostic assays for discriminating mycobacteria, especially for nontuberculous mycobacteria: what does the future hold?
CN112063702A (en) Method for analyzing and identifying clinical problematic strain by 16S rRNA gene sequence
Christensen et al. Ribosomal DNA sequencing: experiences from use in the Danish National Reference Laboratory for Identification of Bacteria
Bhattacharyya et al. Rapid identification and phylogenetic classification of diverse bacterial pathogens in a multiplexed hybridization assay targeting ribosomal RNA
WO2013175164A1 (en) Characterization, classification and identification of microorganisms
Teng et al. Evaluation of 16SpathDB 2.0, an automated 16S rRNA gene sequence database, using 689 complete bacterial genomes
Dingle et al. Molecular strain typing and characterisation of toxigenic Clostridium difficile
Godreuil et al. Which species concept for pathogenic bacteria?: An E-Debate
Fournier et al. Bacterial genomes
Faniyan et al. Analyzing bacterial species from different environments using direct 16S rRNA gene sequencing methods
Willner et al. Metagenomics and community profiling: culture-independent techniques in the clinical laboratory

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13728781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/05/2015)

122 EP: PCT application non-entry in European phase

Ref document number: 13728781

Country of ref document: EP

Kind code of ref document: A1