WO2005015199A1

WO2005015199A1 - Methods and systems for chromatography/mass-spectrometry analysis

Info

Publication number: WO2005015199A1
Application number: PCT/EP2004/007340
Authority: WO
Inventors: Anders Kaplan; Lennart Bjorkesten
Original assignee: Amersham Biosciences Ab
Priority date: 2003-07-21
Filing date: 2004-07-06
Publication date: 2005-02-17
Also published as: GB0316942D0; GB2404193A

Abstract

The method and measurement system according to the invention perform combined Chromatography and Mass Spectrometry (C/MS) analysis. The method comprises the steps of: performing a C/MS analysis (300); generating at least one first elution profile (305), wherein one dimension is an elution time of the chromatography, one dimension is mass to charge ratio (m/z), and at least one dimension is a signal intensity, and the signal from each biomolecule species is dispersed, forming a plurality of signal peaks associated with each biomolecule species in the elution profile; and a step of automated alignment of elution profiles, correlating the variations in intensity indicative of specific biomolecule species over at least two of the first elution profiles. The automated alignment produces a set of aligned elution profiles, thereby facilitating an annotation process and/or a matching process.

Description

METHODS AND SYSTEMS FOR CHROMATOGRAPHY/MASS-SPECTROMETRY ANALYSIS

TECHNICAL FIELD OF INVENTION

The present invention relates to the study of biological samples containing a mixture of biomolecules, e.g. peptides, in order to identify, characterise and quantify individual biomolecules, and more particularly to methods and systems for profiling the relative abundance of at least some of the individual biomolecules across different experimental and biological conditions.

BACKGROUND OF THE INVENTION

A widespread method of studying protein content in biological samples is by using two-dimensional gel electrophoresis in combination with mass spectrometry, see for example, Kennedy, S., Toxicol. Lett. 2001, 120, 379-384. Two-dimensional gel electrophoresis is limited to the analysis of molecules with a molecular mass greater than approximately 10 kDa and there are no well-established methods to globally address the content of proteins and peptides below this limit.

Many of these smaller protein and peptide molecules play an important role in biological processes, and a method to routinely analyse peptide content in biological samples would therefore be a significant advance. Liquid chromatography (LC) coupled with mass spectrometry (MS) has emerged as a promising tool in proteomics, capable of dealing with the inherent complexity of biological samples, and an increasing number of reports have been published illustrating the usefulness of combining LC and MS. It is suggested in "A neuroproteomic approach to targeting neuropeptides in the brain", Sköld K, Svensson M, Kaplan A, Björkesten L, Åström J and Andrén P, Proteomics, 2, 447-454, that neuropeptides in the mass range of 300-5000 Da can be analysed by on-line nanoscale capillary reversed phase liquid chromatography (CRP LC) followed by electrospray ionisation quadrupole-time of flight mass spectrometry (ESI Q-TOF MS). The article describes how the relative abundance of individual biomolecules across samples representing different experimental and biological conditions can be profiled and differences between the samples shown. Samples containing biomolecules were run through nanoscale CRP LC and ESI Q-TOF MS. Each run resulted in an elution profile. Each individual data point in the elution profile represented an intensity value, or ion count, obtained from the MS detector for a particular chromatographic elution time and a particular m/z value. 3D representations of these elution profiles were drawn in which the y-axis showed the m/z ratio, the x-axis showed the elution time and the z-axis represented ion counts. Comparisons between the different samples were performed by manually selecting similar regions on the 3D representations of the different samples, integrating the ion counts within the regions and comparing the integrated ion counts of corresponding regions.

A problem with this method is that it is time-consuming and labour intensive. Because the elution times of samples in LC columns may vary from run to run, it is not possible to simply overlay different representations of elution profiles on top of each other. Instead, the corresponding regions on the different representations have to be manually identified, selected and marked so that they can be compared to each other.

An automated method of processing LC/MS-data with multiway analysis (PCA and PARAFAC modelling) is suggested in "Chromatographic alignment and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid chromatographic-mass spectroscopy data" by D. Bylund et al, J. of Chromatography A, 961 (2002) 237-244. The aim of this method is to identify and quantify peptides directly from LC/MS elution profiles or to detect differences from a reference sample. To facilitate the use of, for example, PARAFAC modelling, elution profiles are, in a preprocessing step, automatically aligned with respect to the elution time by warping and dynamic programming. The step of auto alignment is based on a method disclosed in "Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping" by N.P.N Nielsen et al, J. of Chromatography A, 805 (1998) 17-35. The method of Bylund et al. is shown to work with a small number of peptide species (up to 7 reported) present in a sample. However, a sample in a typical study may contain hundreds of peptide species and millions of data points. Such a sample would, if analyzed with the method according to Bylund et al., require a factorisation of a very large matrix, a mathematical operation that is known to be very computationally heavy and that in most realistic experimental situations would create unacceptably long processing times. In addition, the factorisation process may not lead to a unique identification of a peptide, but to a linear combination of peptides, or noise.

The above mentioned studies illustrate the usefulness of LC/MS investigations. However, to make LC/MS-based analysis a method that can be routinely used for analysing peptide content in biological samples further requirements have to be met. Most importantly, the method has to be able to screen a large amount of data and profile the relative abundance of some of the individual biomolecules across different experimental and biological conditions. Due to the vast amount of data produced in a typical experiment, the method needs to be at least partly automated.

Furthermore, an attractive method needs to provide means for confirmation and validation of the result. This will be of special importance in fully automated methods and/or if advanced statistical methods like multivariate analysis are used, since these usually powerful analysis methods can in certain cases yield doubtful or misleading results even if the statistical measures indicate a high accuracy. In these cases an ability to trace a high level result back to its origins in the source data would be of high value.

SUMMARY OF THE INVENTION

The objective problem is to provide a method and measurement system of analysing LC/MS data for profiling the relative abundance of some of the individual biomolecules across different experimental and biological conditions adapted for the vast amount of data typically appearing in real experiments. Furthermore, it should be possible to trace back high level results to their origins in the source data.

The problem is solved by the method as defined in claim 1, the measurement system as defined in claim 15 and the computer program product defined in claim 16. Further improved methods and measurement systems have the features mentioned in the respective dependent claims.

The method of performing a combined Chromatography and Mass Spectrometry analysis (C/MS) according to the present invention comprises the steps of:

-performing a C/MS analysis;

-generating a plurality of first elution profiles, at least one elution profile per sample. The first elution profiles are multidimensional representations of the data resulting from the C/MS analysis wherein one dimension is the elution time of the chromatography, one dimension is mass to charge ratio (m/z), and at least one dimension is a signal intensity. A characteristic variation, typically a peak, in the signal intensity is an indication of the existence of a specific biomolecule. The method is characterized in that it comprises the steps of:

-generating peptide maps of the elution profiles in which data from the same biomolecule species are reassembled;

-matching the peptide maps, the matching linking the peptide species across the different elution profiles; and a step of aligning elution profiles, comprising an automated alignment between at least two first elution profiles for correlating the variations in intensity indicative of specific biomolecule species over at least two of the first elution profiles. The automated alignment produces a set of aligned elution profiles, thereby facilitating the annotation process and/or the matching process. The alignment is, for example, adapted to compensate for variations in elution time between the first elution profiles.

In one embodiment of the method according to the present invention, the fact that the first elution profiles are aligned is utilised to facilitate the annotation process. A first elution profile is annotated from scratch, giving a reference peptide map. The reference peptide map is used to annotate and match the rest of the individual elution profiles in the set of aligned elution profiles. The annotation and matching is, thanks to the alignment, reduced to a simple projection of the reference map onto the individual elution profiles.

In another embodiment of the invention, data from the individual elution profiles of the set of aligned elution profiles are comprised in a consensus profile, the consensus profile representing characteristic intensity variations appearing in the aligned elution profiles. The consensus profile can enhance the signals of the individual elution profiles and thereby facilitate the annotation. The consensus profile may also be used to annotate essentially all biomolecule species appearing in the first elution profiles in a single annotation process. In a further embodiment data from at least two aligned elution profiles are comprised in a differential profile, the differential profile indicating differences in the intensity variations appearing in the at least two first elution profiles.

One advantage afforded by the present invention is that the automated alignment makes it possible to screen a large amount of data and profile the relative abundance of some biomolecule species across different samples.

A further advantage is that the enhancement in the signal intensity afforded by the consensus profile can be used to detect weak signals typically corresponding to biomolecule species with low abundance.

Another advantage is that in the method according to the present invention it is possible to trace a high level result back to its origins in the source data.

BRIEF DESCRIPTION OF THE FIGURES

The features and advantages of the present invention outlined above are described more fully below in the detailed description in conjunction with the drawings where like reference numerals refer to like elements throughout, in which:

FIG. 1 is a schematic block diagram illustrating a system to practise the method of the invention;

FIG. 2a) is an example of an elution profile produced by the system of FIG. 1, and FIGS. 2b) and 2c) illustrate the signal dispersion caused by different isotopes and charge states;

FIG. 3 is a flowchart illustrating the main steps of the method according to the invention;

FIG. 4a) shows a reference elution profile and FIG. 4b) shows a target elution profile;

FIG. 5 shows how the method according to the invention is used to generate a consensus elution profile from an aligned set of elution profiles; and

FIG. 6 shows how the method according to the invention is used to generate a differential elution profile from an aligned set of elution profiles.

DETAILED DESCRIPTION

A Chromatography/Mass-Spectrometry (C/MS) analysis of a biological system is typically performed by running a plurality of samples, representing different conditions in the biological system under study, through a combination of C/MS instrumentation. The chromatography can be seen as a separation method and the mass spectrometry as a method of detection. Currently the most used and most promising method for the separation of biomolecules is Liquid Chromatography (LC). However, other separation methods may also be used, for example Gas Chromatography (GC). The method and measurement system of the present invention will be described using, but are not limited to, liquid chromatography.

An instrumental setup, schematically illustrated in FIG. 1, suitable for performing LC/MS analysis according to the method of the present invention, comprises a sample inlet 110, a carrier inlet 115, a flow control unit 120, at least one chromatography column 125, a mass spectrometer interface 130, a mass spectrometer 135, a controlling means such as a control unit 140 and an analysing means such as an analysis unit 145. The liquid chromatograph typically comprises a reversed phase column and is commercially available from, for example, LC Packings, Amsterdam, The Netherlands or ThermoFinnigan, San Jose, USA. The mass spectrometer may preferably operate according to the time of flight (TOF) or triple stage quadrupole (TSQ) principles, but other MS devices are conceivable. Commercially available spectrometers and electrospray units suitable for the measurement system according to the invention are available from, for example, Micromass, Manchester, UK and ThermoFinnigan, San Jose, USA. The controlling means 140 and analysing means 145 are typically realized by a PC or PCs with high computational and storage capacity, as the computational loads will be substantial. The controlling means 140 and analysing means 145 are in communication with the chromatography column 125 and the MS 135, and possibly with other units (not shown) responsible for sample preparation or transportation, for example.

The method according to the invention is preferably at least partly automated and implemented as a software program or a plurality of software program modules stored and executed in the controlling means 140 and/or analysing means 145. Using the above exemplified instrumental setup, elution profiles of the type described in the background section may be produced. An example of an elution profile is depicted in FIG. 2a, having the m/z ratio represented on the y-axis, the elution time t on the x-axis, and the z-axis representing ion counts I. Each biomolecule species in the sample will typically, as will be further described below, produce characteristic variations, peaks, in the z-dimension. Due to, for example, the existence of different isotopes and different charge states, each biomolecule species will typically cause a plurality of peaks. As appreciated by those skilled in the art, an instrumental setup adapted for producing elution profiles with the described characteristics may be realized in a number of ways, and the above should be regarded as a non-limiting example of an instrumental setup adapted for performing the method according to the present invention.

In the description, the use of the method and the measurement arrangement according to the present invention will be exemplified by the analysis of peptides in a biological system. Peptides are of special interest due to their importance in many biological processes. However, the methods and measurement systems according to the present invention are not limited to the study of peptides. A wide range of biomolecules, especially molecules with masses smaller than 10 kDa, can advantageously be analyzed with the method and measurement system disclosed herein. The term biomolecules should be interpreted as including both single biomolecules and biomolecule complexes.

A proteomic experiment typically includes a plurality of varieties, e.g. a treated group and a control group of subjects, i.e. patients, animals, colonies etc., generating a large and diverse data set. The LC/MS analysis can be pictured as dispersing the signal from each peptide species in the elution time and m/z dimensions. The typically large data set and the dispersion of the signal constitute an information handling problem. In the method according to the invention the vast amount of data is handled by alternately using refined data representations, the original elution profiles, and peptide maps generated from elution profiles. The refined data representations are, for example, a consensus elution profile combining the data of several elution profiles or a differential profile highlighting differences between individual elution profiles. Preferably, throughout the method, although refined data representations are used, the raw data and the links between the raw and refined data are always preserved, in order to be able to "go back" to confirm a result and to be able to perform further analysis on the data already collected or to initiate further analysis processes. The preservation of raw data and the possibility to alternately use refined data and the corresponding original raw data are useful for checking the reliability of the results generated by a method in accordance with the present invention. In one embodiment of a method according to the present invention, regions of interest, corresponding to peptides showing an interesting variation over a set of samples, may be selected based on the variation behaviour before the peptides have been identified. In said method a region with an interesting signal variation between different profiles is detected and selected as a region of interest for further analysis, without attempting to identify the peptides before the selection.

As discussed above, the LC/MS analysis can be pictured as a dispersion of the signal from each peptide species in the elution time and m/z dimensions, and each peptide species will typically yield a plurality of peaks in the elution profile. If the resolution of the mass spectrometer is high enough, different isotopes of the same peptide species will be separated in the elution profile. Characteristic "isotope ladders" 205 can be seen in the elution profiles, as exemplified in FIG. 2b. Another type of dispersion of the signal is introduced by the experimental method. The commonly used electrospray interface of the mass spectrometer often produces several kinds of molecule-adduct ion complexes with varying numbers of adduct ions. These are referred to as different charge states of the peptide. As the mass spectrometer measures the mass-to-charge ratio, not just the mass, these different charge states will end up at different positions in the elution profiles. Hence one peptide species may appear in several charge states, each consisting of several molecule isotopes, as illustrated in FIG. 2c. For a peptide species of mass M, containing i additional neutrons and aggregated with z adduct ions (charge state z), peaks may be expected at:

$$(m/z)_{i,z} = \frac{M + i + z}{z} = \frac{M + i}{z} + 1 \qquad \text{(eq. 1)}$$

wherein it is assumed that the spacing between isotopes and the adduct ion mass are precisely 1 Da. As indicated in the figure the "distance" between different isotopes of the same peptide species will be 1/z.
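As an illustration of eq. 1, the following minimal Python sketch lists the expected peak positions for a few isotopes and charge states. The function names and the example mass of 1000 Da are hypothetical and chosen only for illustration; as in the text, the isotope spacing and the adduct ion mass are assumed to be exactly 1 Da.

```python
def expected_mz(mass, n_neutrons, charge):
    """Expected m/z for mass M, i additional neutrons and z adduct ions (eq. 1).

    Assumption (as in the text): isotope spacing and adduct ion mass are
    both exactly 1 Da.
    """
    if charge < 1:
        raise ValueError("charge state must be a positive integer")
    return (mass + n_neutrons) / charge + 1.0


def peak_ladder(mass, max_neutrons, charges):
    """Map each charge state z to its isotope ladder of expected m/z values."""
    return {
        z: [expected_mz(mass, i, z) for i in range(max_neutrons + 1)]
        for z in charges
    }


# Hypothetical peptide of mass 1000 Da, isotopes i = 0..2, charge states 1 and 2.
ladder = peak_ladder(mass=1000.0, max_neutrons=2, charges=(1, 2))
for z, peaks in ladder.items():
    print(z, peaks)
```

Within each charge state the neighbouring isotope peaks are separated by 1/z, which is the spacing of the "isotope ladders" referred to above.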

Whether the different isotopes can be distinguished depends on the mass spectrometer resolution, which may in turn depend on m/z. A peptide species will typically appear in the elution profile with separated isotopes, i.e. well-defined peaks, for the charge states with low z, and as less well-defined "blobs" including several isotopes for higher z. In order to, for example, compare the abundances of certain peptide species between different samples, it is in most applications advantageous to reassemble, or link, all peaks originating from the same peptide species. The aim of the reassembling is to generate a peptide map corresponding to an elution profile. In the peptide map all dispersed signals relating to each peptide species in one elution profile are, if possible, brought together.

To be able to compare the relative abundance between different peptide species and/or the changes in abundances of certain peptides between different experimental and biological conditions, typically represented by different samples (and hence different elution profiles), it is necessary to also link peptide species across different samples represented by individual elution profiles and peptide maps thus forming a global annotation. The global annotation is preferably achieved by an automated matching process as will be described below.

Even though the theoretical relation between different peaks of the same peptide species is known according to the above, the generation of peptide maps and the matching are in practice not trivial tasks. The complications arise from several factors. In a typical sample a large number of different peptides are present, and peaks may be very close or overlapping, making it difficult, given the experimental uncertainties, to ascribe, for example, the correct charge states to a specific peptide. In addition, typically not all the charge states are represented and their relations are not known. Noise will always be present, both as a background noise level and as spurious noise peaks. The noise may lead to falsely identified peptide peaks. One complication of special importance is caused by experimental variations, most pronounced as unpredictable variations in the elution time. Elution profiles from identical samples may be shifted and/or compressed or expanded in the elution time when compared to each other. The method according to the present invention offers an automated alignment process that overcomes the problems associated with the variations in the elution times and hence greatly simplifies the global annotation. The process can further be used to enhance the peaks of the elution profiles, which can facilitate the compilation of the peptide map. The process may also be used to highlight differences between elution profiles and to transfer a preferably verified annotation from one reference elution profile to other elution profiles. The annotation and the matching processes are typically very time consuming and/or computationally heavy. The method according to the present invention, using the automated alignment as a pre-processing step before the annotation and matching, facilitates the annotation/matching in a plurality of ways, as described above.
The use of the automated alignment, representing different embodiments of the invention, comprises, but is not limited to the following cases:

A. The automated alignment process is used to produce a consensus profile. The consensus profile is used for the annotation. By combining signals from the elution profiles the peaks are enhanced and the signal-to-noise ratio is higher than in the individual profiles. This simplifies the annotation. Another important simplification is that all peptide species are annotated at the same time, greatly reducing the time needed for the annotation. The peptide map of the consensus profile is either used as it is or is projected on to the individual elution profiles, thus providing an annotation for each profile.

B. The automated alignment process is used to produce a differential profile. The differential profile is used to highlight interesting variations between the individual elution profiles.

C. The annotation is performed on one elution profile from the set of aligned profiles, producing a reference peptide map, which is then projected on to the rest of the elution profiles in the set of aligned profiles. Only one elution profile needs to be annotated from scratch, so the time needed for the annotation of all elution profiles is significantly reduced.

D. As in the previous embodiment a reference peptide map is used. In this embodiment the reference peptide map is produced in a previous run or retrieved from a library of reference peptide maps, for example, and used to annotate, or compare with, the elution profiles of the present run. This alternative could be particularly useful if a large number of runs with similar experimental and biological conditions are performed, in which a high degree of similarity between the individual elution profiles is to be expected. Alternatively, the reference is not a peptide map but a reference elution profile, which is automatically aligned with the elution profiles of the present run. By comparing the reference with the elution profiles, for example with the aid of a differential elution profile, changes may be detected without annotating the elution profiles of the present run. This could be particularly useful in quality controls of batches in drug processing, for example. As long as the elution profiles correspond to the reference no further analysis is needed. If a discrepancy is detected, further analysis and control of that batch may be initiated.

E. The automated alignment process is used to facilitate the matching of peptide maps across the different elution profiles. All elution profiles have been individually annotated and individual peptide maps generated, and a more elaborate matching process of the peptide maps than the above projection is required. The fact that the elution profiles are aligned may significantly improve the quality of the matching process.

The above embodiments may be combined with each other and/or extended in various ways. In an experiment with, for example, two distinctively different groups of samples, such as a control group and a treated group, it may be useful to use two different reference elution profiles or peptide maps. It is also possible to use the simple matching by projection according to embodiments A, C and D as a preliminary matching and use the more elaborate matching of embodiment E only if the preliminary matching seems to be insufficient. The more elaborate matching can also be used to check whether the projection yields relevant results. Because raw and refined data are preferably always kept in methods according to the present invention, such verifications are possible.
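The matching by projection of embodiments A, C and D can be illustrated by the following minimal Python sketch. It assumes, purely for illustration, that an aligned elution profile is stored as a list of rows (one per elution-time index, each holding ion counts per m/z bin) and that a peptide map is a dictionary from a peptide label to rectangular regions in index coordinates. This data layout and all names are hypothetical and are not prescribed by the method.

```python
def integrate_region(profile, region):
    """Sum ion counts inside a rectangular region of an elution profile.

    profile: list of rows, one per elution-time index, each a list of ion
             counts per m/z bin.
    region:  (t_start, t_stop, mz_start, mz_stop) as half-open index ranges.
    """
    t0, t1, m0, m1 = region
    return sum(sum(row[m0:m1]) for row in profile[t0:t1])


def project_reference_map(reference_map, aligned_profiles):
    """Apply one reference peptide map to every aligned profile.

    reference_map:    {peptide_label: [region, ...]} (hypothetical layout).
    aligned_profiles: {sample_name: profile}, all on the same time/m/z grid.
    Returns {peptide_label: {sample_name: integrated ion count}}.
    """
    return {
        label: {
            sample: sum(integrate_region(profile, r) for r in regions)
            for sample, profile in aligned_profiles.items()
        }
        for label, regions in reference_map.items()
    }


# Two small aligned profiles (3 time points x 4 m/z bins) and one annotated peptide.
control = [[0, 0, 0, 0], [0, 5, 1, 0], [0, 3, 0, 0]]
treated = [[0, 0, 0, 0], [0, 9, 2, 0], [0, 6, 0, 0]]
reference_map = {"peptide_1": [(1, 3, 1, 3)]}
abundances = project_reference_map(
    reference_map, {"control": control, "treated": treated}
)
print(abundances)
```

Because the same regions are reused for every profile, the projection only gives meaningful results when the profiles have already been aligned in the elution time dimension.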

The main steps of the method according to the present invention, which will be described with reference to the flowchart of FIG. 3, are:

-performing a C/MS analysis 300.

-generating first elution profiles 305.

-automated alignment 310 of first elution profiles adapted to compensate for experimental variations in elution time and to facilitate the production of peptide maps. The automated alignment between at least two first elution profiles correlates the variations in signal strength indicative of specific peptide species over at least two of the first elution profiles. At least one set of aligned elution profiles is produced.

-optionally generating refined data representations 315 from the set of aligned elution profiles, by using the alignment between the first elution profiles. Examples of refined data representations include a consensus elution profile comprising a summation of the features of the individual first elution profiles, and a differential profile representing differential data between a plurality of first elution profiles adapted to emphasize differences in signal variation over the first elution profiles.

-generating peptide maps 320 by an annotation process in which data from the same peptide species is reassembled. Peptide maps may be formed from either the first elution profiles or a refined data representation, and the annotation utilises the fact that the elution profiles have been aligned in the automated alignment step 310.

-matching 330 the peptide species of the peptide maps to each other, utilising the fact that the elution profiles have been aligned. The matching links the peptide species across the different elution profiles, for example representing different experimental and biological conditions, and produces a global annotation.

-optionally performing measurement and evaluation 335 for profiling the relative abundance of some of the individual peptide species across different experimental and biological conditions. The abundance profiles are based on the global annotation obtained in the preceding steps.
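The refined data representations of step 315 can be sketched as follows. This is a minimal illustration only: it assumes that the aligned profiles share one time/m/z grid and are stored as lists of rows, that the consensus profile is a plain element-wise summation, and that the differential profile is a plain element-wise subtraction. The function names are hypothetical, and other ways of combining the profiles are conceivable.

```python
def consensus_profile(aligned_profiles):
    """Element-wise sum of aligned profiles (all must share one grid)."""
    first = aligned_profiles[0]
    return [
        [sum(p[t][m] for p in aligned_profiles) for m in range(len(first[0]))]
        for t in range(len(first))
    ]


def differential_profile(profile_a, profile_b):
    """Element-wise difference a - b; non-zero cells mark intensity changes."""
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(profile_a, profile_b)
    ]


# Two small aligned profiles (2 time points x 2 m/z bins).
run_1 = [[0, 2], [4, 1]]
run_2 = [[1, 2], [6, 0]]
consensus = consensus_profile([run_1, run_2])
difference = differential_profile(run_2, run_1)
print(consensus)
print(difference)
```

Summation reinforces peaks that occur at the same position in every aligned profile, while the difference is close to zero wherever two profiles agree.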

The steps of the method will be described in detail below:

Performing a C/MS analysis 300 and generating first elution profiles 305

Two or more peptide-containing samples are run through a combination of LC/MS instrumentation according to the setup described above. The samples could typically represent different conditions in a biological system being studied. The simplest case is a differential experiment aiming at highlighting peptide species for which there is a large change in abundance between two different experimental conditions. A more advanced experimental design involves more than two conditions and/or introduces replication, i.e., the use of more than one sample per experimental condition. By the use of well-established statistical methods it is possible to assign statistical significance to abundance changes between the different conditions.

The measurement system according to FIG. 1 is used for carrying out the method according to the invention. Each run results in an elution profile. Each individual data point in the elution profile represents an intensity value, or ion count, obtained from the MS detector for a particular chromatographic elution time and a particular m/z value. 3D representations of these elution profiles can be drawn in which the y-axis shows the m/z ratio, the x-axis shows the elution time and the z-axis represents ion counts. In certain cases, depending on the characteristics of the measurement system, a re-sampling is needed to compensate for shifts in the m/z-spectra. This is an established and well-known procedure. The generating step typically produces a set of first elution profiles in which a characteristic variation in the signal intensity is an indication of the existence of a, or part of a, specific peptide species.
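One possible in-memory representation of such an elution profile is sketched below. It assumes, for illustration only, that the detector output is available as (elution time, m/z, ion count) triples and that a regular grid with fixed bin widths is adequate. The function name, its parameters and the example values are hypothetical, and the simple binning shown here is not the re-sampling procedure referred to above.

```python
def build_elution_profile(events, t_step, mz_step, n_t, n_mz,
                          t_min=0.0, mz_min=0.0):
    """Accumulate ion counts on a regular (elution time, m/z) grid.

    events: iterable of (elution_time, mz, ion_count) triples.
    Returns a list of n_t rows, each holding n_mz accumulated ion counts.
    Events falling outside the grid are ignored.
    """
    profile = [[0.0] * n_mz for _ in range(n_t)]
    for t, mz, count in events:
        ti = int((t - t_min) // t_step)
        mi = int((mz - mz_min) // mz_step)
        if 0 <= ti < n_t and 0 <= mi < n_mz:
            profile[ti][mi] += count
    return profile


# Four hypothetical detector readings; the last lies outside the grid.
events = [(0.2, 500.1, 3), (0.7, 500.4, 2), (1.5, 501.2, 7), (9.0, 500.0, 1)]
profile = build_elution_profile(
    events, t_step=1.0, mz_step=1.0, n_t=2, n_mz=2, mz_min=500.0
)
print(profile)
```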

Automated alignment of elution profiles 310

The first elution profiles are aligned in order to facilitate further comparison between the profiles. It is typically not necessary to perform any non-trivial corrections in the m/z-dimension. In the LC elution time dimension, however, the experimental variation between runs may be substantial and is of a non-trivial nature. Elution profiles from identical samples may, for example, be shifted and/or compressed or expanded in the elution time dimension when compared to each other. To be able to compare different elution profiles and to detect common features in them it is therefore necessary to use a more elaborate alignment procedure. The alignment may be based on the methods described by Nielsen et al, used for aligning multiple wavelength chromatograms in the field of chemometrics.

According to the method of the present invention one elution profile is taken to be the alignment reference, and the other elution profiles are denoted targets. The target profiles are then aligned, one by one, to the alignment reference profile. The choice of reference is not important as long as the elution profiles are fairly similar. The goal of the pairwise alignment is to distort the target profile in the elution time dimension in such a way that the similarity to the reference profile is maximised. This is accomplished as follows: The reference profile and the target profile are divided into a number of consecutive segments, the boundary points of which are called nodes. The nodes of the target profile can now be moved, so that the segment between two nodes is linearly stretched or compressed, which is also known as "warping". The warping is performed in such a way that the integrated signal intensity in each segment is unaffected. The warping that gives the highest degree of conformance between the pair of elution profiles defines the optimal alignment. Preferably a dynamic programming algorithm is applied to find the optimal set of node positions, called the warping scheme. Naturally the use of other optimisation algorithms is also conceivable.
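The warping of segments between nodes can be illustrated by the following minimal Python sketch. It works on a one-dimensional signal (for example one m/z channel of a profile) and assumes that each input bin is a uniform block of signal, which is one simple way of keeping the integrated intensity of a segment unchanged. The function names and the node representation are hypothetical.

```python
def warp_segment(values, new_length):
    """Stretch or compress a 1-D segment to new_length bins, preserving its sum.

    Each input bin is treated as a uniform block of signal; every output bin
    receives the part of those blocks that falls inside its rescaled extent.
    """
    if new_length < 1 or not values:
        raise ValueError("segment and target length must be non-empty")
    scale = len(values) / new_length          # input bins per output bin
    warped = []
    for j in range(new_length):
        lo, hi = j * scale, (j + 1) * scale   # extent in input coordinates
        total = 0.0
        k = int(lo)
        while k < len(values) and k < hi:
            overlap = min(hi, k + 1) - max(lo, k)
            total += values[k] * overlap
            k += 1
        warped.append(total)
    return warped


def warp_by_nodes(signal, target_nodes, reference_nodes):
    """Warp each target segment to the length of its reference counterpart."""
    warped = []
    target_pairs = zip(target_nodes, target_nodes[1:])
    reference_pairs = zip(reference_nodes, reference_nodes[1:])
    for (t0, t1), (r0, r1) in zip(target_pairs, reference_pairs):
        warped.extend(warp_segment(signal[t0:t1], r1 - r0))
    return warped


# A target signal whose first segment (3 bins) is stretched to 4 bins and
# whose second segment (5 bins) is compressed to 4 bins.
signal = [0, 1, 5, 1, 0, 0, 2, 0]
aligned = warp_by_nodes(signal, target_nodes=[0, 3, 8], reference_nodes=[0, 4, 8])
print(aligned, sum(aligned))
```

For a full elution profile the same operation would be applied along the elution time axis of every m/z channel, using the same node positions for all channels.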

A detailed example of an alignment algorithm, suitable for use in the method according to the present invention, will now be described.

Example of alignment algorithm

pre-processing:

-signal level equalisation (optional). A linear transformation is applied to the signal level in the target elution profile, in order to compensate for sample volume differences.

-manual selection of a region of interest (ROI) in the reference profile. Any features outside the ROI will be ignored in the subsequent steps. The purpose of this step is to avoid the alignment of data collected before the first component elutes and after the last component has eluted.

piecewise linear warping

The ROI of the reference elution profile and the target profile are divided into a user-specified number of segments, N (N is typically 20-50). The segment boundaries of the reference profile are fixed throughout the warping procedure, while those of the target profile are flexible. FIG. 4 illustrates a possible arrangement of segments, in which the reference elution profile, FIG. 4a), is compared to the target elution profile, FIG. 4b).

Each target segment is warped, i.e. stretched or compressed, to the same size as its reference counterpart so that the two can be compared. A re-sampling algorithm is used for the warping, which preserves the integrated signal level. This has obvious advantages when it comes to quantitative comparisons.

optimisation

The object is to find the set of target segment boundaries that maximises the similarity between the reference profile and the warped target profile.

There are N+1 degrees of freedom in this optimisation problem. To start with, the block of target segments can be translated to any position along the time axis, as long as it remains fully within the profile. The position of the block along the time axis defines the left boundary of the first target segment. Following this, there are N further segment boundaries to be positioned (because gaps are not allowed between segments).

A maximum distortion parameter τ is introduced to limit the problem size. It enters the optimisation as a requirement that, for each segment, the size of the warped target segment may differ from the size of the corresponding reference segment by no more than the distortion permitted by τ. The optimisation method of choice is linear programming. A similarity measure, p, is introduced and the optimisation becomes the problem of maximising:

$$f = \sum_i p(T_i, R_i) \qquad \text{(eq. 3)}$$

wherein $T_i$ is the warped target segment, and $R_i$ the corresponding reference segment.
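The piecewise linear warping and the search for the warping scheme can be illustrated with a minimal sketch. It is not the implementation of the invention: the function names `warp_segment`, `similarity` and `align` are hypothetical, the Pearson correlation is assumed as the similarity measure p, the distortion limit is assumed to take the form |l - L| <= τ·L for a target segment of length l and a reference segment of length L, and dynamic programming is used for the search. Elution profiles are assumed to be arrays with elution time along the first axis and m/z along the second.

```python
import numpy as np

def warp_segment(seg, new_len):
    """Stretch or compress a (time x m/z) segment to new_len time samples.

    The re-sampling works on the cumulative signal, so the integrated
    intensity of every m/z channel is preserved.
    """
    seg = np.asarray(seg, dtype=float)
    old_len = seg.shape[0]
    cum = np.vstack([np.zeros((1, seg.shape[1])), np.cumsum(seg, axis=0)])
    old_edges = np.arange(old_len + 1)
    new_edges = np.linspace(0.0, old_len, new_len + 1)
    return np.column_stack([
        np.diff(np.interp(new_edges, old_edges, cum[:, k]))
        for k in range(seg.shape[1])
    ])

def similarity(a, b):
    """Assumed similarity measure p: Pearson correlation (0 if undefined)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0

def align(reference, target, n_segments, tau):
    """Find target node positions maximising f = sum_i p(T_i, R_i)."""
    ref_nodes = np.linspace(0, reference.shape[0], n_segments + 1).astype(int)
    n_t = target.shape[0]
    score = np.full((n_segments + 1, n_t + 1), -np.inf)
    back = np.zeros((n_segments + 1, n_t + 1), dtype=int)
    score[0, :] = 0.0  # the block of segments may start anywhere
    for i in range(n_segments):
        ref_seg = reference[ref_nodes[i]:ref_nodes[i + 1]]
        size = ref_seg.shape[0]
        slack = int(np.floor(tau * size))  # assumed form of the tau limit
        for x in range(n_t + 1):
            if score[i, x] == -np.inf:
                continue
            for length in range(max(1, size - slack), size + slack + 1):
                y = x + length
                if y > n_t:
                    break  # the block must stay within the profile
                s = score[i, x] + similarity(
                    warp_segment(target[x:y], size), ref_seg)
                if s > score[i + 1, y]:
                    score[i + 1, y] = s
                    back[i + 1, y] = x
    if not np.isfinite(score[n_segments]).any():
        raise ValueError("no feasible warping scheme for these settings")
    end = int(np.argmax(score[n_segments]))
    nodes = [end]
    for i in range(n_segments, 0, -1):  # backtrack the node positions
        nodes.append(int(back[i, nodes[-1]]))
    nodes.reverse()
    warped = np.vstack([
        warp_segment(target[nodes[i]:nodes[i + 1]],
                     ref_nodes[i + 1] - ref_nodes[i])
        for i in range(n_segments)
    ])
    return nodes, warped, float(score[n_segments, end])
```

With τ = 0 the sketch reduces to a pure translation of the target; larger values of τ allow each segment to be stretched or compressed, at the cost of a larger search space.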

Generating refined data representations 315

When elution profiles have been aligned to the same alignment reference, a consensus profile can be generated that combines all features of the input profiles. The intensity of each data point in the consensus profile is calculated as a function of the corresponding points in the input profiles. Suitable functions include, for example, the arithmetic mean, the geometric mean, the median and the maximum. Illustrated in FIG. 5 is a set of aligned elution profiles 505 and the consensus profile 510 combining all features of the individual elution profiles.
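As an illustration of the point-wise combination described above, the following sketch computes a consensus profile from a set of aligned profiles. The function name `consensus_profile` and the `how` argument are hypothetical; the profiles are assumed to be arrays of identical shape (elution time x m/z).

```python
import numpy as np

def consensus_profile(aligned_profiles, how="mean"):
    """Combine aligned profiles point by point into one consensus profile."""
    stack = np.stack([np.asarray(p, dtype=float) for p in aligned_profiles])
    if how == "mean":
        return stack.mean(axis=0)
    if how == "median":
        return np.median(stack, axis=0)
    if how == "max":
        return stack.max(axis=0)
    if how == "geometric":
        # log(0) is -inf, which correctly yields a geometric mean of 0
        with np.errstate(divide="ignore"):
            return np.exp(np.log(stack).mean(axis=0))
    raise ValueError(f"unknown combination function: {how}")
```

The maximum keeps any feature that is present in at least one profile, whereas the mean and the median suppress features that occur in only a few profiles.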

A differential profile representing differential data between the aligned elution profiles can be generated in the same manner. The differential profile emphasizes differences in signal variation over the first elution profiles. For the differential profile suitable functions include the variance, max-min value, and max/min ratio, for example. Illustrated in FIG. 6 is a set of aligned elution profiles 605 and the differential profile 610 highlighting the variations between the individual elution profiles.
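The differential functions named above can be sketched in the same point-wise manner. The function name `differential_profile`, the `how` argument and the small `floor` value that guards the ratio against division by zero are assumptions made for this illustration only.

```python
import numpy as np

def differential_profile(aligned_profiles, how="range", floor=1e-12):
    """Highlight point-wise differences between aligned profiles."""
    stack = np.stack([np.asarray(p, dtype=float) for p in aligned_profiles])
    if how == "variance":
        return stack.var(axis=0)
    hi = stack.max(axis=0)
    lo = stack.min(axis=0)
    if how == "range":   # max-min value
        return hi - lo
    if how == "ratio":   # max/min ratio, with the minimum floored
        return hi / np.maximum(lo, floor)
    raise ValueError(f"unknown differential function: {how}")
```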

Generating peptide maps by an annotation process 320

Peptide patterns from the aligned elution profiles are produced by an annotation process. The annotation reassembles the signal variation originating from the same peptide species. The annotation process can be performed in various ways, including manual, computer-aided and fully automated processes. A preferred fully automated annotation process will be described below as a non-limiting example. An inventive fully automated annotation process is disclosed in the patent application "Method and a system for annotation of biomolecule patterns in chromatography/mass-spectrometry analysis", by the same inventors and applicant as the present application, which is hereby incorporated by reference. The automated annotation process automatically reassembles signals originating from the same peptide species dispersed in the elution profile and appearing as a plurality of peaks. The peaks typically range from well-defined to weak and diffuse for the same peptide species. The automated annotation process generates a peptide map for each elution profile.

The automated annotation algorithm starts by detecting primary features presumably corresponding to peaks in the signal variation of the elution profile. Primary features may comprise e.g. local maxima in the signal intensity, seeds from thresholding, morphological operations or positions selected by analysis of gradients. Spots are compact areas of high intensity, which are detected starting from the primary features. Spots may correspond to individual isotopic peaks, or to isotopic peak clusters when the instrument resolution is not good enough to separate them. Spots may also originate from noise and data acquisition artefacts. The primary feature detection and spot detection steps make use of the local surroundings of the data points in both the m/z and elution time dimensions. A spot must have at least a predefined extension in both dimensions. In that way noise peaks, for example, are avoided. When a spot is found, attempts are made to put it into context, i.e., to find additional traces of the peptide species that gave rise to the spot in the elution profile. As previously described, these traces are highly structured; the spot corresponds to a certain charge state and possibly a certain molecule isotope of the peptide species, and there may also be spots for other molecule isotopes and additional charge states. If a labelling method is used, there may also be spots corresponding to differently labelled versions of the same peptide species. Thus, a peptide map entry for the peptide species is constructed, starting from a single spot. This step is carried out for each spot.
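The spot detection step can be illustrated with a deliberately simple sketch. It assumes that primary features are obtained by thresholding, that a spot is a 4-connected region of data points above the threshold, and that the predefined extension is given as a minimum number of samples in each dimension. The function name `detect_spots` and its parameters are hypothetical, and the construction of peptide map entries from the spots is not covered.

```python
from collections import deque

import numpy as np

def detect_spots(profile, threshold, min_time_extent=2, min_mz_extent=2):
    """Return bounding boxes (t0, t1, m0, m1), end-exclusive, of spots."""
    profile = np.asarray(profile, dtype=float)
    above = profile > threshold           # seeds from thresholding
    seen = np.zeros(above.shape, dtype=bool)
    n_t, n_m = above.shape
    spots = []
    for t, m in zip(*np.nonzero(above)):
        if seen[t, m]:
            continue
        # grow the connected region starting from this seed
        queue = deque([(int(t), int(m))])
        seen[t, m] = True
        cells = []
        while queue:
            i, j = queue.popleft()
            cells.append((i, j))
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n_t and 0 <= b < n_m and above[a, b] and not seen[a, b]:
                    seen[a, b] = True
                    queue.append((a, b))
        t0, t1 = min(c[0] for c in cells), max(c[0] for c in cells) + 1
        m0, m1 = min(c[1] for c in cells), max(c[1] for c in cells) + 1
        # reject regions that are too small in either dimension (noise)
        if t1 - t0 >= min_time_extent and m1 - m0 >= min_mz_extent:
            spots.append((t0, t1, m0, m1))
    return spots
```

An isolated high data point, such as a noise spike, forms a region of extension one in both dimensions and is therefore discarded with the default settings.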

The last step in the process is a refinement step, where duplicate entries are removed and overlaps are resolved. A peptide species may be detected several times by the algorithm (e.g. once for each charge state), which leads to duplicate entries in the peptide map. Such duplication is detected by systematic comparison and duplicate entries are removed either automatically or manually. There may also be regions where two or more peptide species overlap, due to insufficient chromatographic separation. A region where there is a large overlap between two peptide species cannot be used for measurements of the amounts of either species, and may therefore have to be removed from the map entries of both species or otherwise be indicated as being unreliable.

Matching peptide maps 330

The aim of the matching step 330 is to generate the global annotation which is needed for the abundance profiles for individual peptides across different samples. Because the method according to the invention operates on elution profiles that have been aligned, the matching becomes computationally trivial or at least significantly simpler. A reference peptide map has been generated in the automated annotation step 325 from either an individual elution profile or from a consensus elution profile. Alternatively, a previously generated and stored reference may be used.

In the above described embodiments A, C, and D of the inventive method, the matching is reduced to a simple projection and is obtained at the same time as the annotation. In fact, the annotation and matching can in these cases, apart from the generation of the reference peptide map, be regarded as having merged into one step. The merged matching and annotation will in these embodiments comprise projecting the reference peptide map onto the rest of the elution profiles in the set of aligned elution profiles. These embodiments are advantageous to use if the samples are expected to have a relatively high degree of similarity.

In embodiment E a more elaborate matching process will be needed. In this embodiment all elution profiles have been individually annotated and individual peptide maps generated, and a true matching of the peptide maps is needed. This embodiment is particularly useful if the samples are expected to be of a more heterogeneous character, for example if one peptide or a group of peptides exists only in some of the samples. In certain applications the number of peptides in one map will not be very large (typically on the order of 100 - 10,000) and the mass spectrometer can give a very accurate and specific mass measurement for each peptide. Thus, many peptides will have unique masses, and can therefore be unambiguously matched. In these cases, and since the elution profiles are aligned, the matching of the peptide maps will also in this embodiment be a simple projection of the peptide map of one elution profile onto another elution profile.

In other cases the unique masses of individual peptides cannot be fully resolved and clusters will be formed. These clusters must be resolved in order to get the global annotation. This is preferably achieved by treating the matching process as an optimization problem. Those skilled in the art will appreciate that many different optimization methods may be used for this type of problem, including greedy algorithms, simulated annealing, dynamic programming or genetic algorithms.

Whatever the optimization method, the differences in elution time between elution profiles have to be taken into account, typically as an unknown variable in the optimization, and large differences will affect the amount of calculation needed as well as the quality and reliability of the result. Hence, regardless of the optimization method, the optimization will be significantly simplified and/or faster, since the variation in elution time can be ignored, or at least assumed to be within narrow limits, provided the elution profiles have been aligned.

An example of a matching algorithm, suitable for taking advantage of the aligned peptide maps, will now be described.

Example of matching algorithm

The algorithm takes two or more peptide maps as input. The output is a match table, holding one column for each peptide map. The rows of the table correspond to unique peptides. Non-empty table cells represent a mapping from a unique peptide (table row) to a peptide in a particular map (table column). An empty table cell indicates that a unique peptide does not match any peptide in a particular peptide map. For each peptide in each map, the mass (M/z and usually M) and the elution time are known.

The matching is performed in two steps. Both steps may employ a greedy algorithm. A greedy algorithm is not optimal, but scales well with problem size. Other algorithms such as simulated annealing or genetic algorithms could also be employed.

1. Cluster formation:

A cluster is a putative row in the match table. In the first step, the optimal cluster for each peptide is found, at this stage ignoring conflicts with other clusters.

All peptide maps are joined to form a large peptide list. The list is sorted with respect to M (or M/z if charges are not available). For each entry in the list, the optimal cluster is identified by exhaustive search (within a mass tolerance). The optimal cluster for a given list entry (i.e., peptide) is defined as the best-scoring cluster that contains that particular list entry (called the reference) and at most one list entry from each of the other maps, fulfilling the requirements: a) the mass difference between the peptide and the reference must be within a predefined limit, and b) the peptide does not belong to a selected cluster (see below).

Each cluster is assigned a score, which is calculated as the sum of all pairwise elution time difference scores within the cluster:

wherein $t_i - t_j$ is the pairwise elution time difference. The parameter τ is interpreted as the largest time difference that is considered a perfect match. A score of 1 is considered a perfect match between two peptides, and 0 an infinitely bad match. A cluster must not contain a pair with zero score. In this example of a matching algorithm, the fact that the peptide maps are aligned makes it possible to choose a small τ, which will improve the quality of the matching.

Alternatively, as mentioned above, since the difference $t_i - t_j$ can be assumed to be small, the matching algorithm can be simplified and possibly made faster.

2. Selection of clusters:

In the second step the clusters are sorted with respect to score. The following procedure is then iterated as long as there are any clusters left:

a) the best-scoring cluster is found using linear search.

b) the cluster formation algorithm of step 1 is run on that cluster again. If the score has decreased, it is assumed that some of the peptides in the cluster now belong to a selected cluster; the cluster score is updated and the procedure restarts. It may also happen that the score increases; this is due to the non-optimality of the greedy algorithm and is ignored.

c) the best-scoring cluster is selected, i.e., copied to the match table.
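The two steps can be sketched as follows. This is only an illustration under stated assumptions: each peptide map is taken to be a list of (mass, elution time) pairs, the pairwise score is assumed to be 1 for |t_i - t_j| <= τ and τ/|t_i - t_j| otherwise (a hypothetical form that merely satisfies the stated properties of the score), and the names `pair_score` and `match_peptide_maps` are invented for this example. Sorting the joined list by mass, which mainly speeds up the search, is omitted.

```python
import itertools

def pair_score(dt, tau):
    """Assumed score: 1 within tau, decaying towards 0 for large dt."""
    dt = abs(dt)
    return 1.0 if dt <= tau else tau / dt

def match_peptide_maps(maps, mass_tol, tau):
    """Return a match table: one row per unique peptide, one column per map.

    A cell holds the index of the peptide in that map, or None if empty.
    """
    selected = set()  # peptides already copied to the match table

    def cluster_score(members):
        return sum(pair_score(maps[a][i][1] - maps[b][j][1], tau)
                   for (a, i), (b, j) in itertools.combinations(members, 2))

    def form_cluster(ref):
        """Step 1: best cluster containing ref, found by exhaustive search."""
        ref_map, ref_idx = ref
        ref_mass = maps[ref_map][ref_idx][0]
        options = []
        for k, peptide_map in enumerate(maps):
            if k == ref_map:
                continue
            candidates = [(k, i) for i, (mass, _) in enumerate(peptide_map)
                          if abs(mass - ref_mass) <= mass_tol
                          and (k, i) not in selected]
            options.append(candidates + [None])  # None: no entry from map k
        best = (0.0, (ref,))
        for combo in itertools.product(*options):
            members = (ref,) + tuple(c for c in combo if c is not None)
            score = cluster_score(members)
            if score > best[0]:
                best = (score, members)
        return best

    clusters = {(k, i): form_cluster((k, i))
                for k, peptide_map in enumerate(maps)
                for i in range(len(peptide_map))}
    table = []
    while clusters:  # Step 2: selection of clusters
        ref = max(clusters, key=lambda r: clusters[r][0])  # a) linear search
        if ref in selected:
            del clusters[ref]
            continue
        fresh = form_cluster(ref)                          # b) re-run step 1
        if fresh[0] < clusters[ref][0]:
            clusters[ref] = fresh  # some members were taken; update, restart
            continue
        row = [None] * len(maps)                           # c) select
        for k, i in fresh[1]:
            row[k] = i
        table.append(row)
        selected.update(fresh[1])
        del clusters[ref]
    return table
```

Because the peptide maps are aligned, a small τ can be passed to such a routine, which narrows the set of plausible partners for each peptide.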

This exemplary algorithm may preferably be extended in several ways, for example with a limitation on how well the elution times must match in order to make a valid match. A simple way of achieving this is to append a cut-off threshold to the cluster formation requirements. Alternatively, dynamic thresholds, for example based on a statistical measure of how well all peptides match, can be used.

Abundance measurement 335

For each elution profile with an associated peptide map, the signal intensity over the data points belonging to each peptide species in the map can be integrated. This yields an intensity measurement for each peptide species, and (optionally) for its charge states and molecule isotopes.
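A minimal sketch of this integration is given below. It assumes that a peptide map is represented as a dictionary from a peptide identifier to a boolean mask with the same shape as the elution profile; this representation and the function name `peptide_abundances` are assumptions made for the illustration.

```python
import numpy as np

def peptide_abundances(profile, peptide_map):
    """Integrate the signal intensity over the data points of each peptide."""
    profile = np.asarray(profile, dtype=float)
    return {peptide: float(profile[np.asarray(mask, dtype=bool)].sum())
            for peptide, mask in peptide_map.items()}
```

Separate masks per charge state or molecule isotope would give the optional finer-grained measurements in the same way.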

A data point in an elution profile is a measurement of the number of ions that were detected in a certain mass-to-charge ratio interval, during a certain time interval. Provided that the ions all come from the same peptide species, this can be regarded as a measurement of the amount of the species in the sample. Measurements cannot be compared directly between species, because different molecule species are ionised to different extents in the mass spectrometer. However, the previously mentioned investigation by Skold et al. indicates that the measurements are at least repeatable. Since the peptide species are matched, the relative abundance of peptide species between the different samples can be established.

Certain measures can be taken to further increase the accuracy of the abundance and relative abundance: a normalisation procedure can be applied to e.g. compensate for uneven sample loadings among the LC/MS runs; and internal standards (spikes), i.e. known amounts of certain peptide species, can be added to the samples before the LC/MS analysis.

In each experiment there will be a large number of elution profiles, yielding a large number of abundance measurements. These measurements have a high degree of structure. There is the peptide species - charge state - isotope relation, to begin with, which may be aggregated to reduce the number of measurements. There is also the experimental design that relates the runs to each other and adds a number of factors/dimensions to the data set. In many cases further analysis of the data will facilitate the interpretation. This kind of data is preferentially analysed by multivariate statistical methods, for example ANOVA (Analysis of Variance), PCA (Principal Components Analysis) and FA (Factor Analysis). Various regression methods can also prove useful for model building. The analysis may be performed using dedicated, custom-built software, or by general-purpose statistical and data analysis packages such as SAS (SAS Institute Inc, Cary, NC, USA) or Spotfire (Spotfire, U.S. Headquarters, Somerville, MA, USA).
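The normalisation against an internal standard can be illustrated with a short sketch. It assumes that the matched abundances are arranged as a table with one row per unique peptide and one column per LC/MS run, and that one row holds the spiked standard; the function names and this layout are hypothetical.

```python
import numpy as np

def normalise_to_standard(abundance, standard_row):
    """Divide every run (column) by the abundance of the spike in that run."""
    abundance = np.asarray(abundance, dtype=float)
    return abundance / abundance[standard_row]

def relative_abundance(normalised, reference_run=0):
    """Express each peptide relative to its level in the reference run."""
    normalised = np.asarray(normalised, dtype=float)
    return normalised / normalised[:, [reference_run]]
```

The resulting table of relative abundances is the kind of input on which the multivariate methods mentioned above would operate.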

In the automated alignment of the present invention, and also in the annotation and matching process, the original data of the elution profiles is preferably preserved, as well as the correlations between the refined data and the original data. In addition, the method is very visual, and is preferably visualised with the aid of computer graphics, for example showing how peptide maps are projected onto elution profiles. This gives an ability to visualise the steps of the method as well as to confirm and verify a high-level result against the original data, for example to check the consistency of a global annotation with the first elution profiles. This is of special importance if, for example, advanced statistical methods are needed for the abundance measurement. Such advanced methods, however powerful, may in certain cases produce doubtful results even if the statistical measure indicates a high accuracy. In these cases the ability to trace the result back to the original data, and the visual nature of the results and of interim results such as elution profiles and peptide maps, are of high value.

The above embodiments are intended only to illustrate the present invention and are not intended to limit the scope of protection claimed by the following claims.

Claims

1. A method of performing a combined Chromatography and Mass Spectrometry analysis (C/MS) on at least two samples for characterization of biomolecules in the samples, which method comprises the steps of:

-performing an C/MS analysis (300);

-generating a plurality of first elution profiles (305), at least one elution profile per sample, which first elution profiles are multidimensional representations of the data resulting from the C/MS analysis wherein one dimension is the elution time of the chromatography, and one dimension is mass to charge ratio (m/z), and at least one dimension a signal intensity, and in which elution profiles a characteristic variation in the signal intensity is an indication of the existence of a, or part of a, specific biomolecule; and which method is characterized in that the method comprises the steps of:

-generating peptide maps (320) of the elution profiles in which data from the same biomolecule species are reassembled;

-matching the peptide maps (330), the matching linking the peptide species across the different elution profiles; and

-aligning (310) elution profiles, said alignment step comprises an automated alignment between at least two first elution profiles for correlating the variations in intensity indicative of specific biomolecules species over at least two of the first elution profiles, said automated alignment producing a set of aligned elution profiles, whereby facilitating the annotation process and/or the matching process.

2. The C/MS analysis method according to claim 1, wherein the first elution profiles are aligned in the elution time dimension, the alignment being adapted for compensating for variations in elution time between the first elution profiles.

3. The C/MS analysis method according to claim 2, wherein the alignment comprises the steps of:

-selecting one profile of the first elution profile as a reference profile, and another elution profile as a target profile;

-dividing the reference profile and the target profile into a number of consecutive segments,

-warping the segments of the target profile, to achieve an essentially optimal correspondence to the signal variation of the reference profile.

4. The C/MS analysis method according to claim 3, wherein data from the individual elution profiles of the set of aligned elution profiles are comprised in a consensus profile (315), the consensus profile representing characteristic intensity variations appearing in the aligned elution profiles.

5. The C/MS analysis method according to claim 4, wherein the consensus profile comprises maximum values of the characteristic intensity variations appearing in the aligned elution profiles.

6. The C/MS analysis method according to claim 4, wherein the consensus profile comprises a weighted average of the characteristic intensity variations appearing in the aligned elution profiles.

7. The C/MS analysis method according to any of claims 1 to 6, wherein a first annotation is performed on a first elution profile or the consensus profile and the annotation generates a reference peptide map, which reference peptide map is used to annotate and match, in a second combined annotation and matching, at least one of the individual elution profiles from the set of aligned elution profiles, said second combined annotation and matching being a projection of the reference peptide map onto individual elution profiles.

8. The C/MS analysis method according to any of claims 1 to 6, wherein a first annotation is performed on a plurality of first elution profiles and the first annotation generates a plurality of peptide maps, each corresponding to an elution profile, and wherein the matching step comprises matching of peptide mass lists comprising the masses of peptide species and their elution times.

9. The C/MS analysis method according to claim 8, wherein matching utilises the fact that the individual elution profiles are aligned to increase the quality of the matching.

10. The C/MS analysis method according to any of claims 4 to 6, wherein the annotation is performed on a consensus profile, the consensus profile giving an enhancement of the signal variations of the individual elution profiles of the set of aligned elution profiles, whereby the result of the annotation is improved.

11. The C/MS analysis method according to any of claims 4 to 6, wherein the annotation is performed on a consensus profile, the consensus profile comprising essentially all signal variations of the individual elution profiles of the set of aligned elution profiles, whereby essentially all peptide species of the individual elution profiles of the set of aligned elution profiles are annotated in the single annotation of the consensus profile.

12. The C/MS analysis method according to any of claims 2 to 6, wherein a first annotation is performed on one of the profiles of the set of aligned profiles and the result of the first annotation is used in the annotation of individual elution profiles from the set of aligned elution profiles.

13. The C/MS analysis method according to claim 3, wherein data from at least two aligned elution profiles are comprised in a differential profile (315), the differential profile indicating differences in the intensity variations appearing in the at least two first elution profiles.

14. The C/MS analysis method according to any of claims 1 to 13, wherein an elution profile is aligned with and compared with a reference elution profile and then analysed in a first analysis without taking the annotation and/or matching step, and the annotation and/or matching step is taken only if the first analysis indicates that these steps are needed.

15. A measurement system for performing a combined Chromatography and Mass Spectrometry analysis (C/MS) on at least one sample for characterization of biomolecules species in the sample, wherein the measurement system comprises at least one chromatography column (125), a mass spectrometer interface (130), a mass spectrometer (135) and means for control and analysis (140,145), and the measurement system is adapted to:

-perform an C/MS analysis;

-generate a plurality of first elution profiles, at least one elution profile per sample, which first elution profiles are multidimensional representations of the data resulting from the C/MS analysis wherein one dimension is the elution time of the chromatography, and one dimension is mass to charge ratio (m/z), and at least one dimension a signal intensity, and in which elution profiles a characteristic variation in the signal intensity is an indication of the existence of a, or part of a, specific biomolecule; and which measurement system is characterized in that the method comprises:

-means for generating peptide maps of the elution profiles in which data from the same biomolecule species are reassembled;

-means for matching the peptide maps, the matching linking the peptide species across the different elution profiles; and

-means for aligning elution profiles, said alignment comprises an automated alignment between at least two first elution profiles for correlating the variations in intensity indicative of specific biomolecules species over at least two of the first elution profiles, said automated alignment producing a set of aligned elution profiles, whereby facilitating the annotation process and/or the matching process.

16. Computer program products directly loadable into the internal memory of a processing means within the means for controlling and analysing (140,145), comprising the software code means adapted for controlling the steps of any of the claims 1 to 14.

17. Computer program products stored on a computer usable medium, comprising readable program adapted for causing a processing means in a processing unit within the means for controlling and analysing (140,145), to control an execution of the steps of any of the claims 1 to 14.