WO2006048677A1 - Analysis of mass spectra for rapid microbial identification - Google Patents

Analysis of mass spectra for rapid microbial identification

Info

Publication number
WO2006048677A1
WO2006048677A1 PCT/GB2005/004292 GB2005004292W WO2006048677A1 WO 2006048677 A1 WO2006048677 A1 WO 2006048677A1 GB 2005004292 W GB2005004292 W GB 2005004292W WO 2006048677 A1 WO2006048677 A1 WO 2006048677A1
Authority
WO
WIPO (PCT)
Prior art keywords
database
data
templates
replicates
records
Prior art date
Application number
PCT/GB2005/004292
Other languages
French (fr)
Inventor
Majeed Soufian
Original Assignee
Majeed Soufian
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Majeed Soufian
Priority to EP05807934A (EP1815373A1)
Publication of WO2006048677A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Definitions

  • the field of the invention is the analysis of complex data such as that derived from mass spectrometry and, in particular, the analysis, storage and comparison of spectra derived from biological samples.
  • mass spectrometry and in particular matrix-assisted laser desorption-ionisation, time of flight (MALDI-ToF) mass spectrometry in the analysis of biological macromolecules is well-known.
  • MALDI-ToF matrix-assisted laser desorption-ionisation, time of flight
  • ICM intact cell mass spectrometry
  • US patent application US2002/0138210 relates to a method of compensating for changes in 'fingerprint spectra' of micro-organisms resulting from environmental changes, based on comparison with observed changes in the spectra of related and unrelated organisms subjected to similar environmental changes using principal component analysis (PCA).
  • PCA principal component analysis
  • the methods that may be used to detect differences in the spectra are artificial intelligence methods, including neural networks and fuzzy logic.
  • fuzzy logic especially using fuzzy sets
  • neural networks neurofuzzy and intelligent systems
  • the radial base function of the spectral data may be applied across a neural network, which may be used to analyse pattern distributions of radial base functions of the local kernel clusters using Cover's Theorem. Two important points flow from this use of Cover's Theorem.
  • a non-linear transformation φ of input patterns, X, to a Euclidean measurement space, φ: X → E^d, might transform a complex pattern classification problem into a linearly separable one.
  • the high dimensionality of measurement space E^d compared to the input space means that a complex pattern classification problem cast in this high dimensional space is more likely to be linearly separable than in a low-dimensional input space.
  • a search engine based on this approach was developed (Manchester Metropolitan University Search Engine - MUSE).
  • a method of improving the quality of a database for use with such a search method was described in International Application WO 01/67295. The method comprises determining a single searchable reference point for a plurality of replicate samples of each item, establishing the co-ordinates of the replicate reference points in high dimensional space and thereafter determining a single reference co-ordinate for the cluster of replicate reference points for initial searching and/or comparison.
  • In MUSE whole data are taken into account and transformed into a higher dimensional space; this includes noise and disturbance created by the instrument. Therefore the final results in MUSE are dependent on the quality of the instrument too.
  • Pathology labs are well equipped to rapidly detect at species level, but clinical diagnosis often requires identification at strain level.
  • the species Salmonella has over 2000 different strains, and the correct identification of Escherichia coli and Staphylococcus aureus has crucial importance for effective treatment planning and public health policy.
  • it presently takes 2-7 days to identify a meningococcal infection to strain level, following sub-culturing, which itself can take between 6-48 hours.
  • the invention provides a novel method of analysing complex datasets, such as mass spectrometric data, applicable to detection, identification, typing and analysis.
  • steps 1 and 2 are well-known.
  • the invention concerns steps 3 and 4. It will be clear to one of skill in the art that the methods disclosed are of more general application than the field of identification of micro-organisms alone.
  • the invention provides a method for creating a reference database of orthonormal mass spectrum data.
  • the spectra are MALDI-ToF spectra derived from intact cells. It is also preferred that the spectra are derived from micro-organisms, more preferably from pathogenic micro-organisms, most preferably pathogenic bacteria including Staphylococcus aureus (including antibiotic resistant strains, such as methicillin-resistant Staphylococcus aureus or MRSA), Escherichia coli (including pathogenic strains such as O157), Salmonella sp., Enterococcus faecium and Listeria monocytogenes.
  • the method comprises the following steps.
  • the MALDI-ToF spectrum of a set of replicates is acquired.
  • the generated spectra may be inspected for any error and consistent replicates selected.
  • a series of templates of meaningful data and regions of interest (such as peaks) for each record are defined using selected replicate spectra for each record. This is done by setting a number of thresholds dividing data into non-null data (above the threshold) and null data (below the threshold). Preferably 2 to 10 thresholds are set, more preferably 5 to 7.
  • the number of contiguous non-null data regions in all replicates is determined.
  • Missing non-null data in one or more replicates may be searched for by lowering the threshold.
  • the degree of similarity of the templates is examined by means of a dendrogram across the replicates and most similar ones are selected.
  • In steps 5-7 it is possible to set a different threshold, higher than the threshold above, if the minimum number of templates that will be selected is at least equal to less than 3 times higher than the number of records in the database.
  • the range of masses used in the expanding templates is determined by using a histogram of templates across the whole range of records in the database.
  • Common templates are selected from the population of the same strains in the database records. These common templates are taken to represent that strain in the database and are termed the 'basic PeaksCell'.
  • PeaksCells are expanded or modelled across all obtained mass ranges and expanded models of templates are defined as PeaksCells.
  • the PeaksCell could have different dimensions ranging from 1 to any desired accuracy.
  • a simple model of the contiguous non-null data, PeaksCell or basic PeaksCell such as boxes, spikes or curvilinear co-ordinates
  • the PeaksCells are obtained by expanding the region of meaningful data in terms of any localised functions (e.g. Gaussians).
  • the expanded model of each strain in the database is expressed as a vector.
  • the vectors are then transformed to an orthonormal (orthogonal) set of vectors (Ri), which span a vector space (G) and have no projection on each other.
  • Ri orthonormal (orthogonal) set of vectors
  • G vector space
  • new vectors are in the directions of the components with no projection on any other vectors in old space, and are called invariants. Although it is possible to transform them in any other direction, this has no benefit.
  • the dimension that new transformed vectors span in the new space is equal to the dimension of the old space and is less than the dimension of each vector itself. In this way, the dimension of each vector is much less than the dimension of the original spectral data.
  • the invention provides a method of comparing unknown microbes with records in the reference database for the purpose of identification.
  • x is projected onto the orthonormal set of vectors (Ri), giving the component of x along each vector Ri (x·Ri).
  • fuzzy calculus may be used as a means of projection.
  • the membership function corresponding to the record in the database which attains the largest value provides the identification.
  • the invention provides a search engine which analyses spectral data for the identification of Intact Cell MALDI (ICM) fingerprints of micro-organisms at strain level rapidly and accurately by use of one or more of the methods herein described.
  • ICM Intact Cell MALDI
  • the invention provides a method for discovering similarities in sources of epidemic infections and identifying the spread of pathogenic micro-organisms within a society and for facilitating the limitation of such outbreaks and cross infection control according to the methods herein described and comprising the additional steps of finding and selecting templates in a population of similar strains in the database records which are not common with each other. These uncommon templates represent the differences within said strains in the database.
  • the vectors obtained by the methods described are then transformed to an orthonormal (orthogonal) set of vectors, say (R'i), which span a vector space (G') and have no projection on each other. In this way new vectors are in the directions of the components with no projection on any other vectors in old space, and are called variants as they point to the differences at strain level.
  • the invention provides a method of identifying a micro-organism comprising the methods herein described.
  • the invention provides a computer program for performing the methods herein described, recorded on a data carrier. Also provided is a computer programmed to carry out one or more of said methods. Further provided is an apparatus for analysing mass spectra comprising a computer programmed to carry out said methods.
  • BioCypherTM A search engine known as BioCypherTM has been developed as an intelligent search/analysis engine that has demonstrated capability of learning and interpreting complex spectral data.
  • Figure 2 (a) ICM and selected threshold
  • Figure 3 a curvilinear coordinate (r, θ, z) model of the spectrum of Figures 1 and 2.
  • Figure 4 (a) a complex peak and
  • Figure 6 shows distinguishing biomarkers in different strains of MRSA
  • Figure 7 shows two replicates of the same strain of MRSA
  • Figure 8 shows the results of 13 isolates of 3 different strains analysed by conventional techniques, including MUSE.
  • 1: Vancomycin res. Enterococcus, 2: Vancomycin res. Enterococcus, 3: Vancomycin res. Enterococcus, 4: Enterococcus faecium, 5: Enterococcus faecium, 6: Enterococcus faecium, 7: Enterococcus faecium, 8: Enterococcus faecium, 9: Enterococcus faecium, 10: Enterococcus faecium, 11: Enterococcus faecium, 49: Listeria monocytogenes, 50: Listeria monocytogenes
  • Figure 9 shows 3 strains of vancomycin resistant Enterococcus
  • Figure 10 shows 8 strains of Enterococcus faecium
  • Figure 11 shows the results of Figure 8 analysed by means of simple 'boxes' as in Figure
  • FIG. 12 BioCypher analysis of 3 strains of vancomycin resistant Enterococcus
  • the invention provides a new search and analysis tool designed to identify unknown spectra obtained from whole cells.
  • BioCypherTM operates at a much lower dimension than original ICM data, whilst at the same time retaining the required higher discrimination power to resolve complex data.
  • BioCypherTM does not require that ICM data have the same dimensions or be compatible with each other, since it does not need to work in a fixed dimension. It works in an adaptive way.
  • BioCypherTM The basic concept behind BioCypherTM is that working in a lower dimension and achieving higher discrimination at the same time is possible by adaptively focusing on a particular region of interest and not the whole universe of discourse. This method of focusing and magnifying a region of interest avoids the limitations imposed by Cover's theorem.
  • BioCypherTM examines ICM data and sets a threshold, which breaks the data into a series of regions of data surrounded by common background (null data). By doing this, BioCypherTM effectively reduces the dimensionality of ICM data and at the same time increases the reproducibility of the whole system.
  • the threshold is a value representing the noise level in the system, and null data are all data below the threshold.
  • the selection of the threshold is important and implies a compromise between filtering of random noise and handling the undesirable effect of shift in ICM data. Lower values result in more robust handling of shift and take more small meaningful variations into account, but at the same time accept more noise and interference.
  • PeaksCellTM Spikes of negligible area which include 5 or fewer datapoints are difficult or impossible to model and so are not included as PeaksCells.
  • PeaksCellTM means a region of data in a mass spectrum, which lies above a defined threshold and which is delimited along the x axis by regions of null data lying below the threshold.
  • PeaksCellTM is a term used to describe and model non-null data surrounded by null data. Note that a PeaksCellTM is neither only a peak, nor numerical data which describe ICM data. It may be considered as a dimension that contains or generates other dimensions. It is a model which describes non-null data surrounded by null data, in the sense that it can be described by a few parameters with much lower dimensionality than the non-null data, and it can be visualised, for example in a curvilinear coordinate system (see Figure 3).
  • 'BioVariant'TM means a series of PeaksCells belonging to one orthonormal dataset from a transformed spectrum in a set of replicates.
  • 'BioInvariant'TM means a set of PeaksCellsTM common to a set of replicate data from identical samples. A histogram of all such sets, for all records in the database, defines informative mass regions that should be spanned and transformed or modelled, for example in terms of localised functions.
  • With the PeaksCellTM model of ICM data it is possible to handle uncertainties and variation in ICM data, since noise is excluded or minimised in system calculations and it has been observed that shift has much less effect in PeaksCellTM sets than in the raw spectrum itself.
  • PeaksCellsTM also provide a clearer and simpler visualisation of ICM data. It is possible to perform a rough search and analysis of an ICM database using even conventional clustering, multivariate statistical analysis or pattern recognition techniques including intelligent systems, fuzzy logic, neural networks or neurofuzzy systems, although the application of the appropriate technique depends on how PeaksCellTMs are modelled.
  • the invention discloses the following methods. 1. If PeaksCellTMs are considered as a series or array of numerical parameters, then they may be modelled or presented in an appropriate curvilinear coordinate system such as (r, θ, z; Figure 3), and conventional multivariate statistical analysis can be applied directly to a concatenation of arrays including all arrays of numerical parameters of the PeaksCellTM models.
  • PeaksCellTMs may be most efficiently used by integrating the models as a part of the above pattern recognition models.
  • PeaksCellTMs can be considered as a special first layer of them.
  • the discrimination power of the search engine and analysis is dependent on how PeaksCellTMs are presented and interpreted. This is like focusing on a specific area of map and increasing magnification power for that area.
  • a PeaksCellTM can be broken further into its components. Two ways are considered here for such 'diffraction' of a PeaksCellTM as follows:
  • PeaksCellTM The first way can be used mostly without affecting the PeaksCellTM itself.
  • This kind of PeaksCellTM is called a primary PeaksCellTM.
  • the number of primary components of a PeaksCellTM is normally determined by instrument accuracy. It can, however, be increased without increasing instrument accuracy by adding more primary components satisfying the boundary conditions of PeaksCellTMs.
  • PeaksCellTM itself might be concealed (partly or wholly) and as a result new primary PeaksCellTMs might be created and replace it. This is because of effects such as double charge ions or PSD that have been included in ICM data. The new PeaksCellTMs might themselves undergo further diffraction of first type.
  • the invention provides specific criteria for considering differences in the origin of ICM data and taking into account which parts of the data should be considered together. This includes: a) parts of data that are related to similar effects of the same pathogens, b) parts of data which are not related to anything or are related to everything, c) parts of data that are related to differences within the same pathogens, and d) parts of data that are related to different pathogens.
  • PeaksCellTMs themselves can be divided according to the variations they may present in a population of replicates of known micro-organisms. All PeaksCellTMs which show constant presentation in the population of the replicates of a known micro-organism (group a) are defined as BioInvariantTM, and the remaining PeaksCellTMs within that population (group c) are called BioVariantTM.
  • the BioInvariantTMs in a database of mixed pathogens can themselves be divided further into those that are nearly common and those that are not common at all (group d).
  • BioVariantTMs in a population of the same known pathogens can be divided further into those that are common in a subgroup of the population and those that are not common at all (group b).
  • BioInvariantTM groups a, b and d
  • BioVariantTM groups b and c
  • BioCypherTM to analyse PeaksCellTMs by employing "human"-inspired techniques for analysing, typing, and classifying biological patterns. This may be achieved as follows.
  • the degree of belongingness of 'A' to 'B' is determined by membership functions of the projected patterns.
  • BioInvariantTM (groups a, b and d) may be used for discriminating and identification of micro-organisms in handling contamination and mix-culture cases.
  • BioVariantTMs (points b and c) are useful for discovering the source of epidemic infections, epidemiology and cross-infection control.
  • membership functions can have values other than null and one, they introduce the degree of belongingness of 'A' to other biological patterns in the database, such as 'C', 'D' etc. This is common for biological patterns, which have a degree of similarity and overlap.
  • each record may be considered and the analysis above repeated.
  • the results may be presented in the form of a membership function matrix, which can be used for pictorial purposes, such as drawing a dendrogram.
  • a maximising decision is defined as a point in space of the alternatives at which the membership functions of decision attains its maximum value.
  • negation of membership functions when we are interested to know which biological patterns should not be considered, or whether the unknown pattern exists in the database or not. In this case we will have: 0) j J j

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the analysis of complex data such as that derived from mass spectrometry and, in particular, the analysis, storage and comparison of spectra derived from biological samples. The invention discloses methods for the efficient analysis and comparison of large datasets such as those generated by MALDI-TOF MS of intact cells, which allow the identification of species and individual strains of micro-organisms present even in mixed culture samples.

Description

Identification of Micro-organisms
Field of Invention
The field of the invention is the analysis of complex data such as that derived from mass spectrometry and, in particular, the analysis, storage and comparison of spectra derived from biological samples.
Background
The use of mass spectrometry, and in particular matrix-assisted laser desorption-ionisation, time of flight (MALDI-ToF) mass spectrometry in the analysis of biological macromolecules is well-known. Although originally used in the analysis of semi-purified solutions of peptides, this approach has more recently been applied to the analysis of peptides and other macromolecules and fragments thereof derived directly from whole cells, especially bacterial cells, with the objective of using it to allow identification of species or strains of interest (Krishnamurthy and Ross, 1996, Rapid Commun Mass Spectrom 10: 1992; Claydon et al, 1996, Nature Biotechnology 14: 1584; Hettick et al, 2004, Anal Chem 76: 5769).
One major problem with the use of intact cell mass spectrometry (ICM) is the complexity and poor reproducibility of the spectra generated from such complex mixtures of biological macromolecules and the wide variety of contaminants. The identification of informative peaks and patterns from such heterogeneous spectra and the frequently poor and variable signal-to-noise ratios present significant technical challenges, not least in terms of the very large datasets represented by individual stored spectra. Any workable system for recognising spectra and thereby identifying particular species, strains or mutants present must involve some form of comparative database comprising stored spectra and be capable of analysing and comparing a very large number of features in order to identify matching spectra.
In addition to the problem of the sheer size of the datasets involved, the reproducibility of data generated from ICM techniques suffers from shift (especially at higher masses), uncertainty in amplitude value, and variations due to sample condition (age, culture, preparation etc) unrelated to the species itself. Such trivial or artefactual differences between spectra are difficult for any analytical system to deal with since the rules for deciding on the significance of the variations are complex and subtle.
A variety of analytical approaches have been attempted including conventional multivariate statistical analysis and pattern recognition techniques such as intelligent systems, fuzzy logic, and neural networks. Some instrument makers have used modified Jaccard methods and RMS (i.e. effective averaging of spectrum) but without effective success. Such methods involve large computational tasks, which may be practicable for small data sets but may take days for a full comparison to be made.
International patent application WO 00/29987 discloses the generation of mass spectra from whole cells; the peaks from these are compared with predicted molecular masses from sequence databases. There is no disclosure of the concept of building a database of complex spectra derived from whole cells, nor of any computing method to analyse such spectra and assign identity.
International patent application WO 01/79523 describes a scoring algorithm used in conjunction with mass spectrometric analysis. However, the application requires the use of a proteome database, rather than a database of reference spectra, and the method relies on determining a probability of observing false matches between compared spectra.
US patent application US2002/0138210 relates to a method of compensating for changes in 'fingerprint spectra' of micro-organisms resulting from environmental changes, based on comparison with observed changes in the spectra of related and unrelated organisms subjected to similar environmental changes using principal component analysis (PCA). Among the methods that may be used to detect differences in the spectra are artificial intelligence methods, including neural networks and fuzzy logic.
The problem with fuzzy logic (especially using fuzzy sets), neural networks, neurofuzzy and intelligent systems is that they cannot resolve the difference between similar micro-organisms within the required resolution and accuracy (strain level) as they are either too fuzzy or too rigid.
International application WO 00/28573 describes a method of comparing complex datasets, especially mass spectra, based on the principle of defining a plurality of datapoints to be compared across a complete range of data, converting each datapoint to a vector spatial function, said function being characteristic of the position/shape and/or relative intensity of the data at that point, assembling the vector spatial functions for the data range in question as a cluster and then determining the kernel function of said cluster. A radial base function of the cluster kernel of the sample is then determined, which is compared with the radial base function of the cluster kernel of the other data items in the database. The radial base function of the spectral data may be applied across a neural network, which may be used to analyse pattern distributions of radial base functions of the local kernel clusters using Cover's Theorem. Two important points flow from this use of Cover's Theorem.
1. A non-linear transformation φ of input patterns, X, to a Euclidean measurement space, φ: X → E^d, might transform a complex pattern classification problem into a linearly separable one.
2. The high dimensionality of measurement space E^d compared to the input space means that a complex pattern classification problem cast in this high dimensional space is more likely to be linearly separable than in a low-dimensional input space.
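The two points above can be illustrated with a toy problem. The following is a minimal sketch, not part of the cited applications: it uses the XOR pattern and a hypothetical map `phi` from a 2-D input space to a 3-D measurement space (both chosen here purely for illustration), and a brute-force grid search for a separating hyperplane in each space.

```python
from itertools import product

# XOR patterns: a standard example that is not linearly separable in 2-D.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

def phi(x):
    """Hypothetical non-linear map from the 2-D input space to a 3-D space."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def separates(w, b, points, labels):
    """True if the hyperplane w.p + b = 0 puts every point on its label's side."""
    for p, label in zip(points, labels):
        score = sum(wi * pi for wi, pi in zip(w, p)) + b
        if (score > 0) != (label == 1):
            return False
    return True

def grid_search(points, labels, dim):
    """Try every (w, b) on a coarse grid; return the first separator or None."""
    grid = [v / 2 for v in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
    for params in product(grid, repeat=dim + 1):
        w, b = params[:dim], params[dim]
        if separates(w, b, points, labels):
            return w, b
    return None

found_2d = grid_search(X, y, 2)                       # original input space
found_3d = grid_search([phi(x) for x in X], y, 3)     # after the non-linear map
print("2-D separator:", found_2d)
print("3-D separator:", found_3d)
```

No hyperplane separates the classes in the original space, whereas one exists after the map, for example w = (1, 1, -2), b = -0.5. The price is the extra dimension, which is the trade-off the following paragraphs discuss.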
A search engine based on this approach was developed (Manchester Metropolitan University Search Engine - MUSE). A method of improving the quality of a database for use with such a search method was described in International Application WO 01/67295. The method comprises determining a single searchable reference point for a plurality of replicate samples of each item, establishing the co-ordinates of the replicate reference points in high dimensional space and thereafter determining a single reference co-ordinate for the cluster of replicate reference points for initial searching and/or comparison.
The use of MUSE, based on Cover's theorem and using fuzzy set overlaps for transferring data to a higher dimensional space, has had some success in dealing with the above problems. Fuzzy sets were also used in the integration of neural networks with fuzzy logic to develop a so-called 'neurofuzzy' system. However, there remain significant limitations in MUSE. It is true that increasing dimensionality helps in achieving higher discrimination in dealing with ICM data but this is at the cost of the 'curse of dimensionality', which means that the number of calculations required increases rapidly with the increase of dimensionality. The calculation time required is significant and limits the practicality of this approach. There are also implications for the error created in the transformation to a higher dimensional space and computations in that space. For example, if an error of order 0.01 ppm (i.e., 10^-8) is created in each calculation for typical ICM data of dimension 10^4, the final cumulative error, depending on the size of the calculation, would be around 100 ppm, which is comparable with the error created by the ICM process itself. This also places an upper limit on the use of MUSE.
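The error estimate quoted above can be checked with a short calculation. This is a rough sketch under assumptions that are not stated in the source: one error of relative size ε is incurred per dimension, and the N errors add linearly (a worst case).

```latex
\varepsilon = 0.01\ \text{ppm} = 0.01 \times 10^{-6} = 10^{-8}, \qquad N = 10^{4}

\varepsilon_{\text{total}} \approx N\,\varepsilon = 10^{4} \times 10^{-8} = 10^{-4} = 100\ \text{ppm}
```

Under these assumptions the cumulative error grows in proportion to the dimension, so each tenfold increase in dimensionality costs roughly a tenfold increase in cumulative error.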
A further problem with the application of ICM analysis to practical problems in health care, such as diagnosis and epidemiology, is the high probability of samples containing cells of more than one species or strain. The presence of unrelated ICM data due to impurities and mixed-culture samples leads to major difficulties even for relatively sophisticated analytical approaches such as MUSE, since, by transforming all data to a higher dimensional space, there is then no means of separating the misleading data.
In MUSE whole data are taken into account and transformed into a higher dimensional space; this includes noise and disturbance created by the instrument. Therefore the final results in MUSE are dependent on the quality of the instrument too.
Because of the way that MUSE works, no particular region of interest is selected, and a whole map or universe of discourse must be magnified and focused. This demands a higher dimensional space, a higher number of calculations and the overall performance and discrimination power of the search engine is limited by errors created by the high number of calculations and required time for performing these calculations. Therefore the search engine is not always able to deliver the required magnification for the whole map or required discrimination power for typing of some micro-organisms.
It is an object of the invention to be able to identify and detect micro-organisms at strain level. Pathology labs are well equipped to rapidly detect at species level, but clinical diagnosis often requires identification at strain level. For example, the species Salmonella has over 2000 different strains, and the correct identification of Escherichia coli and Staphylococcus aureus has crucial importance for effective treatment planning and public health policy. At present it takes 2-7 days to identify a meningococcal infection to strain level, following sub-culturing, which itself can take between 6 and 48 hours. It is an object of the invention to provide a method of analysing spectral data, generated from bombarding whole bacterial cells (after 18 hour cultures) on the intact cell MALDI-TOF-MS, much more quickly.
Statement of Invention
The invention provides a novel method of analysing complex datasets, such as mass spectrometric data, applicable to detection, identification, typing and analysis.
There are four clear steps in the process as applied to identifying micro-organisms.
1. Isolating and preparing the test micro-organism for testing
2. Creating the spectra
3. Creating and using a database of reference spectra
4. Analysing the spectra for identification
Methods for steps 1 and 2 are well-known. The invention concerns steps 3 and 4. It will be clear to one of skill in the art that the methods disclosed are of more general application than the field of identification of micro-organisms alone.
In one aspect, the invention provides a method for creating a reference database of orthonormal mass spectrum data. Preferably the spectra are MALDI-ToF spectra derived from intact cells. It is also preferred that the spectra are derived from micro-organisms, more preferably from pathogenic micro-organisms, most preferably pathogenic bacteria including Staphylococcus aureus (including antibiotic resistant strains, such as methicillin-resistant Staphylococcus aureus or MRSA), Escherichia coli (including pathogenic strains such as O157), Salmonella sp, Enterococcus faecium and Listeria monocytogenes. The method comprises the following steps.
1. For each record in the database, the MALDI-ToF spectrum of a set of replicates (identical samples) is acquired. Preferably 2 to 20, and more preferably 5 to 10 replicates are acquired.
2. Optionally, the generated spectra may be inspected for any error and consistent replicates selected.
3. A series of templates of meaningful data and regions of interest (such as peaks) for each record are defined using selected replicate spectra for each record. This is done by setting a number of thresholds dividing data into non-null data (above the threshold) and null data (below the threshold). Preferably 2 to 10 thresholds are set, more preferably 5 to 7.
4. At the first (highest) threshold, the number of contiguous non-null data regions in all replicates is determined.
5. Missing non-null data in one or more replicates (as compared with the others) may be searched for by lowering the threshold.
6. Lower thresholds are selected until all replicates produce similar templates or the minimum threshold is reached.
7. The degree of similarity of the templates is examined by means of a dendrogram across the replicates and most similar ones are selected.
As an alternative to steps 5-7, it is possible to set a different threshold which is higher than above threshold if the minimum number of templates that will be selected is at least equal to less than 3 times higher than number of records in the database.
8. The range of masses used in the expanding templates is determined by using a histogram of templates across the whole range of records in the database.
9. Common templates are selected from the population of the same strains in the database records. These common templates are taken to represent that strain in the database and are termed the 'basic PeaksCell'.
10. Templates are expanded or modelled across all obtained mass ranges and expanded models of templates are defined as PeaksCells. In this way, depending on the complexity of the expanded model, the PeaksCell could have different dimensions ranging from 1 to any desired accuracy. In a database of very different micro-organisms and when identification to strain level is not required, a simple model of the contiguous non-null data, PeaksCell or basic PeaksCell (such as boxes, spikes or curvilinear co-ordinates) would be sufficient for identification purposes. For accurate identification of closely related organisms at strain level, the PeaksCells are obtained by expanding the region of meaningful data in terms of any localised functions (e.g. Gaussians).
11. The expanded model of each strain in the database is expressed as a vector.
12. The vectors are then transformed to an orthonormal (orthogonal) set of vectors (Ri), which span vector space (G), which have no projection on each other. In this way new vectors are in the directions of the components with no projection on any other vectors in old space, and are called invariants. Although it is possible to transform them in any other direction, this has no benefit. The dimension that new transformed vectors span in the new space is equal to the dimension of the old space and is less than the dimension of each vector itself. In this way, the dimension of each vector is much less than the dimension of the original spectral data.
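Steps 11 and 12 can be illustrated with a minimal sketch. The specification does not name the orthonormalisation procedure, so the use of Gram-Schmidt here is an assumption, as are the function names and the example record vectors, which are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalise(vectors, tol=1e-12):
    """Build an orthonormal set from record vectors by Gram-Schmidt.

    Assumption: Gram-Schmidt is one possible choice; the specification
    only requires that the resulting vectors have no projection on
    each other.
    """
    basis = []
    for v in vectors:
        w = [float(c) for c in v]
        # Remove the part of w that lies along each vector already kept.
        for r in basis:
            proj = dot(w, r)
            w = [wc - proj * rc for wc, rc in zip(w, r)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("record vectors are linearly dependent")
        basis.append([wc / norm for wc in w])
    return basis


# Hypothetical expanded models of three strains, one vector per record.
records = [
    [1.0, 0.1, 0.0, 0.0],
    [0.1, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.1],
]
R = orthonormalise(records)
```

With three records of four components each, the result is three mutually orthogonal unit vectors, so the span has dimension three, which is less than the dimension of each vector, as step 12 states.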
In a second aspect, the invention provides a method of comparing unknown microbes with records in the reference database for the purpose of identification.
1. For an unknown spectrum, the method as described above in steps 1 to 12 is followed, resulting in a vector (x) in vector space G.
2. x is projected onto the orthonormal set of vectors (Ri), giving the component of x in each vector Ri (x.Ri).
3. In the case of a pure culture and if the unknown microbe strain exists in the database, it is expected that all major components will be in the direction of the vector corresponding to the record (micro-organism and strain) which identifies the unknown.
However, in situations where there are variations and uncertainty resulting in very small components in the remaining projections, fuzzy calculus may be used as a means of projection. In this case, the membership function corresponding to the record in the database which attains the largest value provides the identification.
In the case of an unknown not present in the database, projection of x produces small components in any direction.
In cases of contamination or mixed sample cultures, there will be more than one major component in the direction of vectors corresponding to the records of micro-organisms in the database, which identify the unknown micro-organisms. In other words, there would be more than one major membership function corresponding to the records of the database.
In a third aspect, the invention provides a search engine which analyses spectral data for the identification of Intact Cell MALDI (ICM) fingerprints of micro-organisms at strain level rapidly and accurately by use of one or more of the methods herein described.
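The three outcomes of the projection test in the second aspect (pure culture, unknown absent from the database, mixed culture) can be sketched as follows. The cut-off of 0.5 for a "major" component, the normalisation by the length of x, and all names are assumptions made for illustration; the specification gives no numerical criterion.

```python
import math


def identify(x, basis, major=0.5):
    """Return indices of database vectors Ri carrying a major share of x.

    Components x.Ri are divided by |x| so that the hypothetical
    cut-off `major` does not depend on overall signal intensity.
    """
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        return []
    components = [sum(xc * rc for xc, rc in zip(x, r)) / norm for r in basis]
    return [i for i, c in enumerate(components) if abs(c) >= major]


# Hypothetical orthonormal set for three records in a 4-dimensional space.
basis = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

pure = identify([0.9, 0.05, 0.0, 0.0], basis)    # one major component
mixed = identify([0.7, 0.7, 0.0, 0.0], basis)    # two major components
absent = identify([0.0, 0.0, 0.0, 1.0], basis)   # no major component
```

An empty result corresponds to an unknown that is not present in the database, and a result with more than one index corresponds to contamination or a mixed culture.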
In a fourth aspect, the invention provides a method for discovering similarities in sources of epidemic infections and identifying the spread of pathogenic micro-organisms within a society and for facilitating the limitation of such outbreaks and cross infection control according to the methods herein described and comprising the additional steps of finding and selecting templates in a population of similar strains in the database records which are not common with each other. These uncommon templates represent the differences within said strains in the database. The vectors obtained by the methods described are then transformed to an orthonormal (orthogonal) set of vectors, say (FTi), which span a vector space (G') and which have no projection on each other. In this way the new vectors are in the directions of the components with no projection on any other vectors in the old space, and are called variants as they point to the differences at strain level.
In a further aspect, the invention provides a method of identifying a micro-organism comprising the methods herein described.
In a final aspect, the invention provides a computer program for performing the methods herein described, recorded on a data carrier. Also provided is a computer programmed to carry out one or more of said methods. Further provided is an apparatus for analysing mass spectra comprising a computer programmed to carry out said methods.
Detailed description of the invention
A search engine known as BioCypher™ has been developed as an intelligent search/analysis engine that has demonstrated the capability of learning and interpreting complex spectral data.
The invention will now be described in more detail, with reference to the following drawings.
Figure 1 : Outline of ICM process and Biocypher analysis
Figure 2: (a) ICM and selected threshold;
(b) simple boxes are used to encapsulate selected peaks;
(c) a simplified model of (b) depicting only the centres of the selected peaks.
Figure 3: a curvilinear coordinate (r θ z) model of the spectrum of Figures 1 and 2.
Figure 4: (a) a complex peak and
(b) its internal imaginary components
Figure 5: (a) a complex peak and
(b) its external imaginary components
Figure 6: shows distinguishing biomarkers in different strains of MRSA
Figure 7: shows two replicates of the same strain of MRSA
Figure 8: shows the results of 13 isolates of 3 different strains analysed by conventional techniques, including MUSE. 1: Vancomycin res. Enterococcus, 2: Vancomycin res. Enterococcus, 3: Vancomycin res. Enterococcus, 4: Enterococcus faecium, 5: Enterococcus faecium, 6: Enterococcus faecium, 7: Enterococcus faecium, 8: Enterococcus faecium, 9: Enterococcus faecium, 10: Enterococcus faecium, 11: Enterococcus faecium, 49: Listeria monocytogenes, 50: Listeria monocytogenes
Figure 9: shows 3 strains of vancomycin resistant Enterococcus
Figure 10: shows 8 strains of Enterococcus faecium
Figure 11: shows the results of Figure 8 analysed by means of simple 'boxes' as in Figure 2(b), which is an improvement but does not yet have the required accuracy produced by BioCypher
Figure 12: BioCypher analysis of 3 strains of vancomycin resistant Enterococcus
The invention provides a new search and analysis tool designed to identify unknown spectra obtained from whole cells. BioCypher™ operates at a much lower dimension than the original ICM data, whilst at the same time retaining the higher discrimination power required to resolve complex data. BioCypher™ does not require ICM data to have the same dimensions or to be compatible with each other, since it does not need to work in a fixed dimension. It works in an adaptive way.
The basic concept behind BioCypher™ is that working in a lower dimension and achieving higher discrimination at the same time is possible by adaptively focusing on a particular region of interest and not the whole universe of discourse. This method of focusing and magnifying a region of interest avoids the limitations imposed by Cover's theorem.
BioCypher™ examines ICM data and sets a threshold, which breaks the data into a series of regions of data surrounded by common background (null data). By doing this, BioCypher™ effectively reduces the dimensionality of ICM data and at the same time increases the reproducibility of the whole system. The threshold is a value that quantifies the noise level in the system, and null data are all data below the threshold. The selection of the threshold is important and implies a compromise between filtering random noise and handling the undesirable effect of shift in ICM data. Lower values result in more robust handling of shift and take more small meaningful variations into account, but at the same time accept more noise and interference. Higher values filter out noise but result in less robust handling of shift and ignore small meaningful variations (although it may be possible to analyse noise to derive meaningful information from it). Note that, to avoid spikes in ICM data, not all data above the threshold are considered for a new PeaksCell™. Spikes of negligible area which include 5 or fewer datapoints are difficult or impossible to model and so are not included as PeaksCells.
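The thresholding described above can be sketched as a single pass over the intensity values. The function name, the index-based return format and the parameter name are assumptions; only the rule that spikes of 5 or fewer datapoints are discarded comes from the text.

```python
def find_regions(intensity, threshold, max_spike_points=5):
    """Return (start, end) index pairs, end exclusive, of contiguous
    non-null data, i.e. runs of samples strictly above `threshold`.

    Runs containing `max_spike_points` or fewer samples are treated as
    spikes and dropped.
    """
    regions = []
    start = None
    # A sentinel equal to the threshold closes a run that reaches the end.
    for i, value in enumerate(list(intensity) + [threshold]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start > max_spike_points:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical spectrum: an 8-point region and a 2-point spike.
spectrum = [0.0] * 30
for i in range(5, 13):
    spectrum[i] = 10.0
spectrum[20] = spectrum[21] = 10.0

kept = find_regions(spectrum, threshold=1.0)
```

Lowering `threshold` in repeated calls is one way to explore the compromise between noise rejection and robustness to shift that the paragraph describes.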
'PeaksCell'™ means a region of data in a mass spectrum which lies above a defined threshold and which is delimited along the x axis by regions of null data lying below the threshold. PeaksCell™ is a term used to describe and model non-null data surrounded by null data. Note that a PeaksCell™ is neither merely a peak nor merely numerical data describing ICM data. It may be considered as a dimension that contains or generates other dimensions. It is a model which describes non-null data surrounded by null data, in the sense that it can be described by a few parameters with much lower dimensionality than the non-null data themselves, and can be visualised, for example, in a curvilinear coordinate system (see Figure 3).
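As an illustration of describing a region "by a few parameters", the following sketch reduces one region of non-null data to four numbers. The choice of parameters (intensity-weighted centre, width, height, total intensity) and the function name are assumptions and are not taken from the specification.

```python
def describe_region(mz, intensity, start, end):
    """Summarise samples start..end-1 of a spectrum as a small dict.

    `mz` and `intensity` are parallel sequences; `start` and `end`
    delimit one region of non-null data (end exclusive).
    """
    xs = mz[start:end]
    ys = intensity[start:end]
    total = sum(ys)
    # Intensity-weighted mean mass of the region.
    centre = sum(x * y for x, y in zip(xs, ys)) / total
    return {
        "centre": centre,
        "width": xs[-1] - xs[0],
        "height": max(ys),
        "total": total,
    }


# Hypothetical data: ten samples with one symmetric peak.
mz = [1000 + i for i in range(10)]
intensity = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0]
cell = describe_region(mz, intensity, 2, 7)
```

Here five data points are replaced by four parameters; for real regions spanning hundreds of samples the reduction in dimensionality is correspondingly larger.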
'BioVariant'™ means a series of PeaksCells belonging to one orthonormal dataset from a transformed spectrum in a set of replicates.
'BioInvariant'™ means a set of PeaksCells™ common to a set of replicate data from identical samples. A histogram of all such sets, for all records in the database, defines informative mass regions that should be spanned and transformed or modelled, for example in terms of localised functions.
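The histogram mentioned in this definition can be sketched as a count of how many database records contribute a PeaksCell to each mass bin. The bin width, the minimum record count and the representation of each record as a list of PeaksCell centre masses are assumptions made for illustration.

```python
from collections import Counter


def informative_bins(records, bin_width=50.0, min_records=2):
    """Return (low, high) mass ranges populated by enough records.

    `records` holds one list of PeaksCell centre masses per database
    record. Each record is counted at most once per bin.
    """
    counts = Counter()
    for centres in records:
        counts.update({int(c // bin_width) for c in centres})
    return sorted(
        (b * bin_width, (b + 1) * bin_width)
        for b, n in counts.items()
        if n >= min_records
    )


# Hypothetical centre masses for three records.
records = [
    [1010.0, 2520.0],
    [1030.0, 4100.0],
    [1049.0, 2530.0],
]
ranges = informative_bins(records)
```

The returned ranges are candidates for the mass regions that are subsequently spanned and modelled.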
The problem is then how to obtain an appropriate model of each PeaksCell™. It is important to note that the complexity of the descriptive models, and the precision with which PeaksCell™s can be defined by them, will determine the resolution achievable. There are a number of model structures available for applying to a PeaksCell™, each with its own advantages and disadvantages. It is possible to select the most suitable, and there is no general restriction limiting all PeaksCell™s to the same selected model structure. It is within the capability of a person skilled in the art to select the appropriate model.
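One of the possible model structures is a single localised function. The sketch below estimates a Gaussian from the intensity-weighted moments of a region; the moment-based estimate is an assumption chosen for simplicity, and a least-squares fit would be an equally valid alternative. All names are hypothetical.

```python
import math


def gaussian(x, amplitude, mean, sigma):
    """Value of a Gaussian with the given parameters at x."""
    return amplitude * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def gaussian_from_moments(mz, intensity):
    """Estimate (amplitude, mean, sigma) of one region from its moments.

    The mean and variance are intensity-weighted; the amplitude is
    taken as the largest sample in the region.
    """
    total = sum(intensity)
    mean = sum(x * y for x, y in zip(mz, intensity)) / total
    var = sum(y * (x - mean) ** 2 for x, y in zip(mz, intensity)) / total
    return max(intensity), mean, math.sqrt(var)


# Hypothetical region: a sampled Gaussian centred on m/z 2000.
mz = [2000 + 0.5 * i for i in range(-40, 41)]
intensity = [gaussian(x, 100.0, 2000.0, 3.0) for x in mz]
amplitude, mean, sigma = gaussian_from_moments(mz, intensity)
```

The 81 samples of the region are replaced by three parameters. A region containing overlapping peaks would need a more complex model, which is consistent with the remark that different PeaksCell™s need not share one model structure.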
With a reasonable PeaksCell™ model of ICM data, it is possible to handle uncertainties and variation in ICM data, since noise is excluded or minimised in system calculations and it has been observed that shift has much less effect in PeaksCell™ sets than in the raw spectrum itself.
Further, it is possible to resolve problems regarding incompatibility and dimensionality differences in ICM data (created by using different instrument resolutions), because PeaksCell™s are not dependent on the number or dimension of data.
PeaksCells™ also provide a clearer and simpler visualisation of ICM data. It is possible to perform a rough search and analysis of an ICM database using even conventional clustering, multivariate statistical analysis or pattern recognition techniques including intelligent systems, fuzzy logic, neural networks or neurofuzzy systems, although the application of the appropriate technique depends on how PeaksCell™s are modelled. The invention discloses the following methods.
1. If PeaksCell™s are considered as a series or array of numerical parameters, then they may be modelled or presented in an appropriate curvilinear coordinate system (such as r θ z, Figure 3), and conventional multivariate statistical analysis can then be applied directly to a concatenation of arrays including all arrays of numerical parameters of the PeaksCell™ models.
2. For pattern recognition techniques, including intelligent systems, fuzzy logic, neural networks or neurofuzzy systems, PeaksCell™s may be most efficiently used by integrating the models as a part of the above pattern recognition models. For example, PeaksCell™s can be considered as a special first layer of them.
The discrimination power of the search engine and analysis is dependent on how PeaksCell™s are presented and interpreted. This is like focusing on a specific area of map and increasing magnification power for that area. A PeaksCell™ can be broken further into its components. Two ways are considered here for such 'diffraction' of a PeaksCell™ as follows:
1. Superposition of constituent components inside the PeaksCell™
2. Superposition of constituent components outside the PeaksCell™
The first way can be used mostly without affecting the PeaksCell™ itself. This kind of PeaksCell™ is called a primary PeaksCell™. The number of primary components of a PeaksCell™ is normally determined by instrument accuracy. It can, however, be increased without increasing instrument accuracy by adding more primary components satisfying the boundary conditions of the PeaksCell™.
However in the second way the PeaksCell™ itself might be concealed (partly or wholly) and as a result new primary PeaksCell™s might be created and replace it. This is because of effects such as double charge ions or PSD that have been included in ICM data. The new PeaksCell™s might themselves undergo further diffraction of first type.
The invention provides specific criteria for considering differences in the origin of ICM data and for taking into account which parts of the data should be considered together. These include: a) parts of data that are related to similar effects of the same pathogens, b) parts of data which are not related to anything or are related to everything, c) parts of data that are related to differences within the same pathogens, and d) parts of data that are related to different pathogens.
Therefore PeaksCell™s themselves can be divided according to the variations they may present in a population of replicates of known micro-organisms. All PeaksCell™s which show constant presentation in the population of the replicates of a known micro-organism (group a) are defined as BioInvariant™, and the remaining PeaksCell™s within that population (group c) are called BioVariant™. The BioInvariant™s in a database of mixed pathogens can themselves be divided further into those that are nearly common and those that are not common at all (group d). Similarly, BioVariant™s in a population of the same known pathogens can be divided further into those that are common in a subgroup of the population and those that are not common at all (group b).
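The first division described above, into PeaksCell™s present in every replicate and the remainder, can be sketched with set operations. Representing each PeaksCell™ by a hashable label (here a nominal mass) is an assumption; how two PeaksCell™s from different replicates are judged to be the same is not shown.

```python
def split_by_presence(replicates):
    """Split PeaksCell labels into (invariant, variant) sets.

    `replicates` is a non-empty list of sets, one per replicate of the
    same known micro-organism. Labels found in every replicate are
    invariant; labels found in only some replicates are variant.
    """
    invariant = set.intersection(*replicates)
    variant = set.union(*replicates) - invariant
    return invariant, variant


# Hypothetical nominal masses of PeaksCells in three replicates.
replicates = [
    {1000, 2500, 4100},
    {1000, 2500},
    {1000, 2500, 5200},
]
invariant, variant = split_by_presence(replicates)
```

Applying the same split across records of different pathogens, rather than across replicates of one, would give the further subdivisions into groups b and d.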
These divisions are especially important since they allow for more discrimination power for handling contamination and mixed cultures, and for epidemiology applications and cross-infection control, as follows;
i. the use of BioInvariant™ (groups a, b and d) for discriminating and identifying micro-organisms in cases of contamination and mixed cultures;
ii. the use of BioVariant™ (groups b and c) for discovering similarities in the source of epidemic infections and identifying the spread of micro-organisms in a society (such as hospitals) for handling outbreaks and cross-infection control.
Microbiologists and epidemiologists normally use learning capability rather than algorithms for typing and classification problems. One possibility is for BioCypher™ to analyse PeaksCell™s by employing "human" inspired techniques for analysing, typing, and classifying biological patterns. This may be achieved as follows.
1. Definition of specific attributes of the spectra of the biological patterns as words. This involves extraction or identification of PeaksCell™ (BioInvariant™s and BioVariant™s) features and representing them as linguistic variables.
2. Each of the linguistic variables defines a single membership function of the spectral pattern under investigation.
3. When all the attributes of an unknown spectral pattern (say A) are correctly assigned to all the PeaksCell™s of a known pattern (say B) in the database, then A is a full member of B.
$$R^{(j)}:\ \text{If } x_1 \text{ is } F_1^{j} \text{ and } \dots \text{ and } x_n \text{ is } F_n^{j} \text{ then } v \text{ is } G^{j}$$
(where $x_i$, $i = 1, 2, \dots, n$, are PeaksCell™s (BioInvariant™ or BioVariant™) defined by linguistic variables $F_i^{j}$ and $G^{j}$. Note that the dimension of $x_i$ depends on how PeaksCell™s are built).
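A rule of this form can be evaluated as sketched below. The triangular membership function, the use of the minimum as the fuzzy "and", and the reduction of each PeaksCell™ to a single centre mass are all assumptions; the specification leaves the form of the membership functions open.

```python
def triangular(x, centre, half_width):
    """Triangular membership: 1 at `centre`, 0 beyond +/- `half_width`."""
    return max(0.0, 1.0 - abs(x - centre) / half_width)


def rule_strength(observed, rule, half_width=5.0):
    """Degree to which 'x_1 is F_1 and ... and x_n is F_n' holds.

    `observed` holds the values x_1..x_n of the unknown pattern and
    `rule` the centres of F_1..F_n for one database record, in the
    same order. The conjunction is taken as the minimum membership.
    """
    return min(
        triangular(x, f, half_width) for x, f in zip(observed, rule)
    )


# Hypothetical record with PeaksCells centred at m/z 1000 and 2500.
record = [1000.0, 2500.0]
full = rule_strength([1000.0, 2500.0], record)
partial = rule_strength([1002.5, 2500.0], record)
none = rule_strength([1010.0, 2500.0], record)
```

A strength of 1 corresponds to A being a full member of B, and intermediate values give the partial degrees of belongingness discussed in the text.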
The degree of belongingness of 'A' to 'B' is determined by membership functions of the projected patterns. As mentioned above, BioInvariant™s (groups a, b and d) may be used for discriminating and identifying micro-organisms in handling contamination and mixed-culture cases. BioVariant™s (groups b and c) are useful for discovering the source of epidemic infections, epidemiology and cross-infection control.
As the membership functions can have values other than zero and one, they introduce the degree of belongingness of 'A' to other biological patterns in the database, such as 'C', 'D' etc. This is common for biological patterns, which have a degree of similarity and overlap.
In analysing a database each record may be considered and the analysis above repeated. When all records have been considered individually against the rest of records in the database, the results may be presented in the form of a membership function matrix, which can be used for pictorial purposes, such as drawing a dendrogram.
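The membership function matrix can be sketched as follows. For brevity, the degree of membership of record A in record B is taken here as the fraction of B's PeaksCell™ labels that also occur in A; this crisp stand-in for the fuzzy membership functions, and all names, are assumptions.

```python
def membership(a, b):
    """Fraction of the labels of record `b` that also occur in `a`."""
    return len(a & b) / len(b) if b else 0.0


def membership_matrix(records):
    """Entry [i][j] is the membership of record i in record j."""
    return [[membership(a, b) for b in records] for a in records]


# Hypothetical PeaksCell labels for three database records.
records = [{1, 2, 3}, {1, 2}, {7, 8}]
matrix = membership_matrix(records)
```

The matrix is not symmetric in general, since A may contain every PeaksCell™ of B without the converse being true. A dendrogram could be drawn from it with any hierarchical clustering routine after converting memberships to distances, which is not shown here.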
If this is repeated for only an unknown (or test sample) against all of the records in an ICM database, then the highest degree of belongingness determines the appropriate candidate for the unknown pattern. A maximising decision is defined as a point in the space of the alternatives at which the membership function of the decision attains its maximum value. It is also possible to use the negation of membership functions when the aim is to know which biological patterns should not be considered, or whether or not the unknown pattern exists in the database. In this case we will have:
$$R^{(j)}:\ \text{If } x_1 \text{ is } F_1^{j} \text{ and } \dots \text{ and } x_n \text{ is } F_n^{j} \text{ then } v \text{ is NOT } G^{j}$$
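The maximising decision and the negated rule can be sketched as below. The acceptance level of 0.5 and the use of the standard fuzzy complement (one minus the membership) for NOT are assumptions, and the record names are hypothetical.

```python
def best_match(memberships, min_degree=0.5):
    """Return (name, degree) of the record with the highest membership.

    `memberships` maps record names to degrees in [0, 1]. If even the
    best degree is below `min_degree`, the name is None, which
    suggests that the unknown is not in the database.
    """
    name = max(memberships, key=memberships.get)
    degree = memberships[name]
    return (name, degree) if degree >= min_degree else (None, degree)


def negate(memberships):
    """Degrees for 'v is NOT G' using the complement 1 - membership."""
    return {name: 1.0 - mu for name, mu in memberships.items()}


# Hypothetical degrees of belongingness of an unknown to three records.
degrees = {"A": 0.2, "B": 0.9, "C": 0.4}
match = best_match(degrees)
excluded = negate(degrees)
```

Records with a high negated degree are those that need not be considered further for the unknown pattern.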
Using the linguistic rules, above, and calculus of these linguistic rules, it is possible to make checks on the completeness, interaction, consistency, and generality of the database.

Claims

1. A method of creating a reference database representing records of mass spectra comprising the steps:
a) acquiring, for each record in the database, the spectrum of a set of replicates;
b) defining a series of templates of meaningful data and regions of interest for each record using selected replicate spectra by setting a plurality of thresholds dividing data into non-null data (above the threshold) and null data (below the threshold);
c) determining, at the highest threshold, the number of contiguous non-null data regions in all replicates;
d) selecting progressively lower thresholds until all replicates produce similar templates or a minimum threshold is reached;
e) examining the degree of similarity of the templates by means of a dendrogram applied across the replicates;
f) determining the range of masses used in the expanding templates by using a histogram of templates across the whole range of records in the database;
g) selecting common templates from the population of the same strains in the database records and defining these as the 'basic PeaksCell' representing that strain in the database;
h) expanding templates across all obtained mass ranges and defining them as PeaksCells;
i) expressing an expanded model of each strain in the database as a vector (x);
j) transforming the vector (x) to an orthonormal set of vectors (Ri) spanning vector space (G).
2. The method of claim 1 , wherein 2 to 10 replicates are acquired.
3. The method of claim 2, wherein 5 to 7 replicates are acquired.
4. A method of comparing an unknown spectrum with records in a reference database according to claim 1, for the purpose of identification, further comprising the steps:
a) projecting x onto the orthonormal set of vectors (Ri), giving the component of x in each vector Ri (x.Ri);
b) comparing the direction of the major component with the records in the reference database to identify a corresponding record.
5. A method of identifying a micro-organism comprising the steps of either of claims 1 or 4.
6. A computer program for performing the method of any preceding claim, recorded on a data carrier.
7. A computer programmed to carry out the method of any of claims 1 to 5.
8. An apparatus for analysing mass spectra comprising a computer according to claim 7.
PCT/GB2005/004292 2004-11-05 2005-11-07 Analysis of mass spectra for rapid microbial identification WO2006048677A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP05807934A EP1815373A1 (en) 2004-11-05 2005-11-07 Analysis of mass spectra for rapid microbial identification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0424519A GB0424519D0 (en) 2004-11-05 2004-11-05 Identification of micro-organisms
GB0424519.7 2004-11-05

Publications (1)

Publication Number Publication Date
WO2006048677A1 true WO2006048677A1 (en) 2006-05-11

Family

ID=33523262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/004292 WO2006048677A1 (en) 2004-11-05 2005-11-07 Analysis of mass spectra for rapid microbial identification

Country Status (3)

Country Link
EP (1) EP1815373A1 (en)
GB (1) GB0424519D0 (en)
WO (1) WO2006048677A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078739A1 (en) * 2001-10-05 2003-04-24 Surromed, Inc. Feature list extraction from data sets such as spectra

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JARMAN K H ET AL: "Extracting and visualizing matrix-assisted laser desorption/ionization time-of-flight mass spectral fingerprints.", RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM. 1999, vol. 13, no. 15, 1999, pages 1586 - 1594, XP002365812, ISSN: 0951-4198 *
SAHOTA RACHHPAL S ET AL: "Vector representation, feature selection, and fingerprinting: An application of pattern recognition to pyrolysis-gas chromatography/mass spectrometry of nucleosides", ANALYTICAL CHEMISTRY, vol. 65, no. 1, 1993, pages 70 - 77, XP002365811, ISSN: 0003-2700 *
WIEMER J C ET AL: "Bioinformatics in proteomics: application, terminology, and pitfalls", PATHOLOGY RESEARCH AND PRACTICE, GUSTAV FISCHER, STUTTGART, DE, vol. 200, no. 2, 30 April 2004 (2004-04-30), pages 173 - 178, XP004959034, ISSN: 0344-0338 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2762557A1 (en) * 2011-10-18 2014-08-06 Shimadzu Corporation Cell identification device and program
EP2762557A4 (en) * 2011-10-18 2015-03-25 Shimadzu Corp Cell identification device and program
US10385379B2 (en) 2011-10-18 2019-08-20 Shimadzu Corporation Cell identification device and program
US10550418B2 (en) 2011-10-18 2020-02-04 Shimadzu Corporation Cell identification device and program

Also Published As

Publication number Publication date
GB0424519D0 (en) 2004-12-08
EP1815373A1 (en) 2007-08-08

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2005807934

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005807934

Country of ref document: EP