CA2651934A1 - Mass spectrometry biomarker assay - Google Patents


Publication number
CA2651934A1
CA2651934A1 CA002651934A CA2651934A CA2651934A1 CA 2651934 A1 CA2651934 A1 CA 2651934A1 CA 002651934 A CA002651934 A CA 002651934A CA 2651934 A CA2651934 A CA 2651934A CA 2651934 A1 CA2651934 A1 CA 2651934A1
Authority
CA
Canada
Prior art keywords
biomarker
sample
signal
mass
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002651934A
Other languages
French (fr)
Inventor
Toshihide Nishimura
Atsushi Ogiwara
Takeshi Kawamura
Takao Kawakami
Yutaka Kyono
Mitsuhiro Kanazawa
Fredrik Nyberg
Gyoergy Marko-Varga
Hisase Anyoji
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AstraZeneca UK Ltd
Medical Proteoscope Co Ltd
Original Assignee
Astrazeneca Uk Limited
Toshihide Nishimura
Atsushi Ogiwara
Takeshi Kawamura
Takao Kawakami
Yutaka Kyono
Mitsuhiro Kanazawa
Fredrik Nyberg
Gyoergy Marko-Varga
Hisase Anyoji
Medical Proteoscope Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Astrazeneca Uk Limited, Toshihide Nishimura, Atsushi Ogiwara, Takeshi Kawamura, Takao Kawakami, Yutaka Kyono, Mitsuhiro Kanazawa, Fredrik Nyberg, Gyoergy Marko-Varga, Hisase Anyoji, Medical Proteoscope Co Ltd

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/74 Optical detectors
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G01N30/7233 Mass spectrometers interfaced to liquid or supercritical fluid chromatograph

Abstract

The invention provides a method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of: (a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected; (b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass; (c) storing those signals whose masses correlate with a reference biomarker; (d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals; (d) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function; (e) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.

Description

Mass Spectrometry Biomarker Assay

The present invention relates to an assay for biomarkers. In particular, the invention describes a multiplex assay capable of automatically screening for the presence of biomarkers in samples by mass spectrometry.

Various biological markers, known as biomarkers, have been identified and studied through the application of biochemistry and molecular biology to medical and toxicological states. Biomarkers can be discovered in both tissues and biofluids, where blood is the most common biofluid used in biomarker studies.

Biomarkers may have predictive power, and as such may be used to predict or detect the presence, level, type or stage of particular conditions or diseases (including the presence or level of particular microorganisms or toxins), the susceptibility (including genetic susceptibility) to particular conditions or diseases, or the response to particular treatments (including drug treatments). It is thought that biomarkers will play an increasingly important role in the future of drug discovery and development, by improving the efficiency of research and development programs. Biomarkers can be used as diagnostic agents, monitors of disease progression, monitors of treatment and predictors of clinical outcome. For example, various biomarker research projects are attempting to identify markers of specific cancers and of specific cardiovascular and immunological diseases.
Intact proteins can be assayed in a number of ways utilizing both gel-based as well as liquid phase separation technologies. Two-dimensional gel electrophoresis is used with solubilised protein mixtures where the proteins are separated based upon charge and size.
The proteins are separated such that both isomeric forms and post-translational modifications are resolved. Quantitation of the proteins is made by staining techniques, where both pre- and post-staining techniques can be applied. Metabolic labelling also allows the linear range to be extended up to 5 orders of magnitude, offering sensitivities within the femtomolar range. Protein identification is performed from excised gel spots.
The proteins are digested after chemical degradation and modification. The resulting peptide mixtures are extracted from the isolated gel sample and subsequently identified by mass spectrometry.
Multidimensional HPLC (High Performance Liquid Chromatography) can be used as a good alternative for separating proteins or peptides. The protein or peptide mixture is passed through a succession of chromatographic stationary phases or dimensions, which gives a higher resolving power. HPLC is flexible for many experimental approaches, and various stationary and mobile phases can be selected for their suitability in resolving specific protein or peptide classes of interest and for compatibility with each other and with downstream mass spectrometric methods of detection and identification.
High Performance Liquid Chromatography is currently the best methodology for solute separations which also allows for automated operation with a high degree of reproducibility. On-line configurations of these types of multi-mechanism separation platforms are commonly applied within proteomics studies.

Mass spectrometry (MS) is also an essential element of the proteomics field. In fact, MS is the major tool used to study and characterize purified proteins in this field. The interface link between proteomics and MS, displaying hundreds or thousands of proteins, is made by gel technology, where high resolution can be reached on a single gel.
Researchers are successfully harnessing the power of MS to supersede the two-dimensional gels that originally gave proteomics its impetus.
The application and development of mass spectrometry (MS) to identify proteins or peptides separated via liquid phase separation techniques and/or gel-based separation techniques have led to significant technological advances in protein and peptide expression analysis. There are two main methods for the mass spectrometric characterization of proteins and peptides: matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI). Using various approaches, MALDI and ESI ion sources can be combined with time-of-flight (TOF) or other types of mass spectrometric analyzers to determine the mass or the sequence of peptides.

In MALDI, peptides are co-crystallized with the matrix, and pulsed with lasers. This treatment vaporizes and ionizes the peptides. The molecular weights (masses) of the charged peptides are then determined in a TOF analyzer. In this device, an electric field accelerates the charged molecules toward a detector, and the differences in the length of time it takes ionized peptides to reach the detector (their time-of-flight) reveal the molecular weights of the peptides; smaller peptides reach the detector more quickly. This method generates mass profiles of the peptide mixtures - that is, profiles of the molecular weights and amounts of peptides in the mixture. These profiles can then be used to identify known proteins from protein sequence databases.
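
The relation between mass and flight time described above can be illustrated with a minimal Python sketch of an idealised linear TOF analyzer. It assumes that an ion of charge z acquires kinetic energy z·e·U in the accelerating field and then drifts at constant velocity over a field-free length L, so that t = L·sqrt(m / (2·z·e·U)). The function name, the 20 kV accelerating voltage and the 1 m drift length are illustrative assumptions and are not taken from the present description.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Ideal drift time in seconds for an ion in a linear TOF analyzer.

    mass_da: ion mass in daltons; charge: number of elementary charges;
    accel_voltage: accelerating potential in volts (assumed value);
    drift_length: field-free drift length in metres (assumed value).
    """
    kinetic_energy = charge * E_CHARGE * accel_voltage      # z * e * U
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return drift_length / velocity


# Two singly charged peptides under the same (assumed) conditions.
light = flight_time(1000.0, 1, 20_000.0, 1.0)
heavy = flight_time(4000.0, 1, 20_000.0, 1.0)
print(f"1000 Da: {light * 1e6:.2f} us, 4000 Da: {heavy * 1e6:.2f} us")
```

Under these assumptions the flight time scales with the square root of m/z, so a peptide of four times the mass takes twice as long to reach the detector.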

By making an ESI-MS interface to liquid chromatography (LC/MS/MS), the eluting peptides from the LC-column are introduced into the ion source of the mass spectrometer.
A voltage is applied to a very fine needle. The needle then sprays droplets into a mass spectrometric analyzer where the droplets evaporate and peptide ions are released corresponding to a variety of charge states that are fragmented and from where the sequence can be determined. In LC/MS/MS, researchers use microcapillary LC devices to initially separate peptides.

Mass spectrometry (MS) is a valuable analytical technique because it measures an intrinsic property of a bio-molecule, its mass, with very high sensitivity. MS can therefore be used to measure a wide range of molecule types (proteins, peptides, or any other bio-molecules) and a wide range of sample types/biological materials.
Correct sample preparation is known to be crucial for the MS signal generation and spectra resolution and sensitivity. Sample preparation is therefore a crucial area for overall feasibility and sensitivity of the analysis.

Proteins are bio-macromolecules that are difficult to separate by liquid phase chromatographic separation techniques, due to the unfavorable mass transfer within the particles of the chromatographic column material, the stationary phase.
However, proteins can be rendered into smaller units (peptides or polypeptides) by breaking the peptide bond joining two adjacent amino acids. This can be accomplished by enzymatic cleavage by proteases, proteins that are capable of interacting with and cleaving peptide bonds in other proteins. Trypsin is the protease most commonly used in protein expression analysis studies. After the enzymatic degradation, the resulting complex mixture of peptides will be separated and fractionated by capillary chromatography. All peptides that are the sum of the digested proteins in the sample will be unresolved at this stage.
The peptides that have been generated from the corresponding protein will not be separated as one unit in the chromatographic fractionation step, but rather will be separated together with the resulting peptides from all other proteins in the sample. The highly resolved and separated peptides eluting from the capillary will be fractionated most commonly based upon charge and hydrophobicity. The separated peptides are introduced on-line from the chromatographic part of the platform into the mass spectrometer, thereby circumventing possible contaminations. The masses (m/z) of the peptides are then determined, in order to capture all the peptides present in that given time window. Next, a number of peptide masses are selected for sequencing (MS/MS), based upon their abundance in the given time window.
This is performed by a new ion sampling interface for an electrospray ionization ion trap mass spectrometer system. The interface uses linear quadrupoles as ion guides and ion traps to enhance the performance of the trap. Trapping ions in the linear quadrupoles is demonstrated to improve the duty cycle of the system. Dipolar excitation of ions trapped in a linear quadrupole is used to eject unwanted ions.

After the first appearance of successful instrumentation in 1990, ion trap mass spectrometry with electrospray ionization (ESI) has become a widely used tool for trace analysis. Electrospray is a gentle source that can ionize important analytes such as peptides and proteins. Highly charged ions produced in ESI can extend the range of mass analyzers. Trap mass spectrometers have favorable capabilities such as flexible tandem MS capability (MSn). In this ionization process, the precursor ion is activated by acceleration into a mass-selective linear ion trap under conditions whereby some of the fragment ions formed are unstable within the trap. After a time delay the stability parameters of the ion trap are changed to allow capture of fragments that were previously unstable. The result is a product ion spectrum that originates from precursor ions with a modified internal energy distribution. It is possible to follow the evolution of the precursor internal energy distribution for many milliseconds after admittance of the precursor ions into the linear ion trap. Time-delayed fragmentation product ion spectra typically display reduced sequential fragmentation products, leading to spectra that are more easily interpreted. Several experimental parameters important to time-delayed fragmentation have been identified and are discussed. The technique has applications for both small precursor ions and multiply charged peptides.

Tandem mass spectrometry (MS/MS) is at the heart of most modern mass spectrometric investigations of complex mixtures. The fragmentation involves activation of a precursor ion via collisions with a target gas and may produce charged and neutral fragments. The nature of the fragment ions, as well as their intensities, is often indicative of the structure of the precursor ion and thus can yield useful information for the identification of unknown analytes, as well as providing a useful screening technique for different classes of analytes. Activation via multiple collisions both prolongs the activation time and enables higher energies to be deposited into precursor ions. Higher collision gas pressures also imply higher collision relaxation rates.
Whilst the combination of protein separation by 2D gel electrophoresis and analysis by mass spectrometry have been established to be useful for biomarker analysis, multiplex systems capable of analysing several biomarkers are currently at the experimental stage.

Many diseases have been shown to be associated with a complex pattern of biomarkers, which may be diagnostic for the disease or indicative of the response to drug treatment by a patient. These patterns often involve several biomarkers, requiring multiple simultaneous analyses. There is a need, therefore, for a system capable of assaying multiple biomarkers simultaneously. Ideally, the system could be automated.

Summary of the Invention

The invention provides an assay for biomarkers in a biological sample which is automated and accurate. The assay relies on mass spectrometry to identify biomarkers, and is referred to herein as the mass spectrometry biomarker assay (MSBA).

The invention provides a method for determining the presence of one or more polypeptide biomarkers in a test sample, preferably a human test sample, although non-human test samples may also be used. The sample is typically confined in a volume of a biofluid containing naturally occurring proteins and peptides contained within an amount of tissue, blood, or other clinically obtained specimens.

The method preferably comprises the following steps:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database holding a master set of biomarker masses from a known disease or biological alteration, to form a correlation between each test sample signal and a biomarker from the master set of biomarkers within the reference database, and discarding those test signals whose masses do not correlate to a reference biomarker mass in the master data set;
(c) storing those test sample signals whose masses correlate with a reference biomarker in the master data;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(d) measuring the intensity of each stored, positively correlating test signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(e) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
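
By way of illustration only, the mass correlation and discarding of steps (b) and (c) might be sketched in Python as follows. The description does not state how closely an observed mass must agree with a reference mass, so the parts-per-million tolerance `tol_ppm`, the function names and the tuple layout of the signals are all assumptions of this sketch.

```python
from bisect import bisect_left


def match_mass(observed_mass, reference_masses, tol_ppm=10.0):
    """Return the index of the nearest reference mass within tol_ppm, else None.

    reference_masses must be sorted in ascending order. The ppm tolerance is
    an assumed matching criterion, not one given in the description.
    """
    if not reference_masses:
        return None
    i = bisect_left(reference_masses, observed_mass)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(reference_masses)]
    best = min(candidates, key=lambda j: abs(reference_masses[j] - observed_mass))
    ppm = abs(reference_masses[best] - observed_mass) / reference_masses[best] * 1e6
    return best if ppm <= tol_ppm else None


def filter_signals(signals, reference_masses, tol_ppm=10.0):
    """Keep signals that correlate with a reference mass; discard the rest.

    signals: list of (retention_time_index, mass) tuples.
    Returns a list of (retention_time_index, mass, reference_index) tuples.
    """
    stored = []
    for rti, mass in signals:
        ref_index = match_mass(mass, reference_masses, tol_ppm)
        if ref_index is not None:
            stored.append((rti, mass, ref_index))
    return stored


reference = [500.0, 1000.0, 1500.0]
observed = [(1.0, 1000.005), (2.0, 1200.0), (3.0, 499.999)]
stored = filter_signals(observed, reference)
print(stored)
```

In this sketch the signal at mass 1200.0 has no reference mass within tolerance and is discarded, while the other two are stored together with the index of the reference biomarker they correlate with.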

Preferably, the method of the invention uses the master data set in the test sample screening phase.

Advantageously, the method filters and screens mass and sequence identities of data sets that are based on each of the unique properties of charge, mass and sequence spectra associated with certain identified protein sequences in the master data set.

In a first aspect, therefore, the invention provides a method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass;
(c) storing those signals whose masses correlate with a reference biomarker;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(d) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(e) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
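
The spectral confirmation of step (d) calls for "a similarity measure" without naming one at this point. As a hedged illustration, the following Python sketch uses cosine similarity between binned spectra, a commonly used choice that is substituted here as an assumption. The bin width, the acceptance level `min_similarity` and the function names are likewise hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins of the given (assumed) width."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity in [0, 1] between two spectra of (m/z, intensity) peaks."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def confirm(signal_peaks, reference_peaks, min_similarity=0.8):
    """Treat a stored signal as positively correlating if similar enough."""
    return cosine_similarity(signal_peaks, reference_peaks) >= min_similarity


reference_peaks = [(175.1, 100.0), (262.1, 50.0), (375.2, 80.0)]
same_pattern = [(mz, 2.0 * intensity) for mz, intensity in reference_peaks]
unrelated = [(120.0, 10.0), (300.5, 5.0)]
print(confirm(same_pattern, reference_peaks), confirm(unrelated, reference_peaks))
```

Cosine similarity is insensitive to overall intensity scaling, so the comparison depends on the fragmentation pattern and not on the amount of peptide present; quantitation is handled separately in the subsequent intensity scoring step.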

The method of the invention allows users to analyse, simultaneously, hundreds or thousands of biomarkers in a sample. The method relies on a database of biomarkers, which have been shown to be associated with a disease, which comprises mass and spectral data for each of the biomarkers and allows the said biomarkers to be identified precisely by the MSBA software in a given sample. By screening the peptides present in a sample and eliminating undesired sequences on the basis of the retention time index, which correlates with the time of arrival of the peptide at the MS detector, upwards of 30,000 sequences can be analysed in minutes and given biomarkers identified with high confidence. The method is automatable, high-throughput and operable by relatively unskilled technicians.
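
The elimination of undesired sequences on the basis of the retention time index could, for example, take the form of a simple window filter. The Python sketch below is an assumption about how such a filter might look; the window half-width, the function name and the example values are hypothetical and do not come from the description.

```python
def prefilter_by_rti(signals, expected_rti, window=0.5):
    """Keep (rti, mass) signals whose retention time index lies within
    `window` of at least one expected reference value (assumed criterion)."""
    return [
        (rti, mass)
        for rti, mass in signals
        if any(abs(rti - ref) <= window for ref in expected_rti)
    ]


signals = [(10.2, 1000.5), (25.0, 842.4), (40.1, 1500.7)]
kept = prefilter_by_rti(signals, expected_rti=[10.0, 40.0], window=0.5)
print(kept)
```

Because the filter needs only one comparison per signal and reference value, it can discard most signals cheaply before the more expensive spectral matching is attempted.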

The sample can be subjected to MS analysis without prior separation procedures. In such an embodiment, the sample is preferably analysed by direct infusion using static nano-electrospray principles, flow injection analysis or flow injection with sample enrichment.
Advantageously, the sample is processed prior to MS analysis, preferably to separate sample components prior to loading them into the MS. For example, the sample processing comprises sample separation by single- or multi-phase high-pressure liquid chromatography (HPLC).
The MS system itself is preferably electrospray ionisation (ESI) MS, matrix-assisted laser desorption ionisation - time of flight (MALDI-TOF) MS or surface enhanced laser desorption ionisation - time of flight (SELDI-TOF) MS.

The method according to the invention is advantageously automated and performed under computer control. Identification of biomarkers in a sample is made by comparison with reference data for said biomarkers; preferably, reference mass and MS spectral data for a plurality of biomarkers are stored on a computer.

Reference MS spectra for a defined biomarker are preferably averaged spectra derived from actual measured data by a clustering calculation.

The method of the invention may be implemented in two ways; using internal standards to provide a reference for quantitating signal intensity, and without such standards. Thus, in one embodiment, one or more internal standards are added to the sample prior to analysis by MS. Preferably, the internal standards are labelled.

In such an implementation of the invention, the absolute signal intensity for each biomarker signal is scored by measuring the biomarker signal intensity and comparing it to the signal intensity of one or more known internal standards.
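
A minimal Python sketch of this comparison is given below. It assumes single-point calibration, in which the biomarker and its internal standard respond equally in the mass spectrometer, so that the intensity ratio multiplied by the known spiked amount gives the biomarker amount. The function name, the units and the example values are illustrative assumptions.

```python
def absolute_amount(biomarker_intensity, standard_intensity, standard_amount):
    """Estimate the biomarker amount from its signal ratio to an internal standard.

    standard_amount is the known quantity of standard added to the sample;
    the result is in the same units. Equal instrument response for biomarker
    and standard is assumed.
    """
    if standard_intensity <= 0:
        raise ValueError("internal standard signal must be positive")
    return biomarker_intensity / standard_intensity * standard_amount


# Hypothetical example: 50 fmol of standard spiked into the sample.
amount = absolute_amount(3.0e6, 1.5e6, 50.0)
print(f"estimated biomarker amount: {amount:.1f} fmol")
```

Where several internal standards are used, the same calculation may be repeated for each biomarker and standard pair.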

In the alternative implementation, the sample is processed without the addition of internal standards. In such an embodiment, the relative signal intensity is scored by measuring the ratio between the individual biomarker signal intensities in a patient and the reference signal intensity for a patient group.
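
The ratio described above may be sketched in Python as follows. The description does not state at this point how the reference signal intensity for a patient group is computed; the sketch assumes the median across the group, and the function name and data layout are hypothetical.

```python
from statistics import median


def relative_intensities(patient_intensities, group_intensities):
    """Ratio of each patient biomarker intensity to a group reference.

    patient_intensities: dict mapping biomarker name to intensity.
    group_intensities: dict mapping biomarker name to a list of intensities
    measured across the patient group. The group median is an assumed
    choice of reference intensity.
    """
    ratios = {}
    for name, intensity in patient_intensities.items():
        reference = median(group_intensities[name])
        ratios[name] = intensity / reference if reference > 0 else float("nan")
    return ratios


patient = {"A": 200.0, "B": 50.0}
group = {"A": [80.0, 100.0, 120.0], "B": [100.0, 100.0, 400.0]}
ratios = relative_intensities(patient, group)
print(ratios)
```

A ratio above 1 indicates a signal higher than the group reference and a ratio below 1 a lower one; the median is used in this sketch because it is less sensitive to outlying patients than the mean.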

A biomarker can be described as "a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention". A biomarker is any identifiable and measurable indicator associated with a particular condition or disease where there is a correlation between the presence or level of the biomarker and some aspect of the condition or disease (including the presence of, the level or changing level of, the type of, the stage of, the susceptibility to the condition or disease, or the responsiveness to a drug used for treating the condition or disease). The correlation may be qualitative, quantitative, or both qualitative and quantitative. Typically a biomarker is a compound, compound fragment or group of compounds. Such compounds may be any compounds found in or produced by an organism, including proteins (and peptides), nucleic acids and other compounds.

The sample may be any biological substance of interest, but is advantageously a biological tissue and preferably a biological fluid such as blood or plasma.

The method of the invention relies upon correlation of observed MS signals with reference masses and MS spectra of known biomarkers. The reference data is preferably stored on a computer server, which allows the entire procedure to be carried out under computer control.

Signals are correlated to reference standards by comparison, for example using computational functions as described herein. Preferably, signals are characterised as "positive" or "negative" according to whether a threshold level of similarity is achieved; signals which are negative and do not achieve the threshold level of similarity are discarded in the MSBA process, whilst those signals which are positive are matched with biomarkers and result in a diagnosis of the presence of said biomarkers in a biological sample.

Signal intensity is measured with reference to known control standards added to the biological sample, or by comparison with a reference intensity calculated across a patient group, depending on the implementation of the MSBA assay.

In the case of implementation with standards, the MSBA scoring of the biomarker signals is calculated by the ratio between the signal of the biomarker present in the sample and the internal standard added to the sample. All biomarkers in a multiplex assay will be analysed the same way, resulting in a final MSBA scoring factor.

In order to have absolute quantitation built into the MSBA methodology, the use of internal calibrant standards is preferred. Such standards are for instance isotope labelled, making the assay read-out highly accurate in terms of protein sequence, as well as advantageous in terms of absolute quantitation. Built-in calibration sequences within the MSBA screening will allow the measurement of absolute protein biomarker levels in blood, or any other clinical sample.

A new method is provided, composed of multiple linked steps, for detecting and quantifying protein sequence biomarkers with a multiplex read-out where the expression levels of, but not restricted to, 2-100 biomarkers can be mapped in one single MSBA read-out. The MSBA system is built on a liquid phase platform that can handle single line diagnostic mapping, or a multiple flow configuration with simultaneous parallel processing of samples, thereby increasing the capacity and throughput of the system. The detection mode of the MSBA method is the accurate mass identification and sequence determination and subsequent quantitation by mass spectrometry.

This methodology may be applied to any type of biological sample that is in, or can be transformed into, a liquid form. The MSBA methodology can also process samples from any type of cellular or biotechnology process where, for instance, kinetic profiles over time are measured. This analysis over time is performed by subsequent sample introduction into the MSBA platform automatically over time. The entire analysis capability of the MSBA diagnostic profiling is under computer control, including the mass signal evaluation, the sequence analysis, the multiplex quantitation by weighing discrimination and finally the MSBA SCORE diagnosis. All of the intermediate steps within the MSBA cycle run on this platform are evaluated by dedicated algorithms that enable accurate decision making from the massive amount of data generated in each cycle of MSBA analysis from any given biological sample.

The MSBA method results in the identification of specific peptides, as well as biochemically modified variants thereof, present as separate entities or present within complex mixtures of proteins and peptides. Each peptide may be defined by a specific sequence of amino acids that can be selectively identified by either its precise mass or its unique immuno-affinity binding properties to a given immunological reagent.
The method allows the identification of statistically significant protein identities and modified versions thereof. Moreover, it is possible to measure relative quantities of each biomarker in the sample even without the use of internal standards.
Alternatively, absolute quantitations can be made of each biomarker separately in any given biological sample by the use of internal standards, where these internal standards (e.g. n=1-20) are the protein sequences of the biomarkers. These internal standards can then be made as cold amino acid sequences, or as isotope labeled amino acid sequences. The standards have identical sequences to the selected biomarkers, with the possible exception of the labelling.
The method combines several key steps which result in the specific processing, separation, isolation, and identification of unique protein sequences present in a biological material sample. The method may be applied to human clinical samples. The method may also be applied to samples derived from non-human animals.
We provide a multi-step method for determining the identity of unique protein sequences presented for example as atomic mass units of entities from a biological sample that has been proven to have a quantitative alteration in a given multiplex biomarker group, the size of which can range for example between 2-100, of a given sample.
We moreover provide a method to determine or confirm that the biomarkers in any specific biological sample have the multiplex quantitative shift of a biomarker set of protein sequences that is pre-determined, in clinical, cellular, or any other type of sample.
This quantitative alteration is finally calculated by the MSBA algorithms to generate an MSBA SCORE that will be the diagnostic read-out.

Further, statistically significant similarities may be detected and registered as unique protein sequence identities or multiple-peptide identities. Determining statistically significant similarities involves using publicly available protein and gene sequence databases as well as algorithms developed specifically to meet the demands of the MSBA methodology.
The integration of process steps for biomarker identification is advantageous.
The integrated process relies on the following principles: 1) high quality biomedical clinical material, 2) reproducible and high speed sample processing with subsequent liquid phase separations, 3) accurate quantitative and qualitative determination of a multiplex set of biomarkers and 4) algorithms that will control the data generation and calculate and allow the isolation of the biomarkers in the multiplex protein sequence group, one by one.

Brief Description of the Figures

Figure 1 shows a schematic illustration of the MSBA principle.

Figure 2 illustrates in more detail the data handling procedures involved in MSBA.

Figure 3 shows a mass spectrum from a blood sample from a lung disease patient.
Multiple biomarkers are identified in the sample.

Figure 4 shows an example of biomarlcer annotation made fonn the multiplex assay, presented by the MS spectrum where the biomarker was recognised by the MSBA
software, and the follow up MS/MS spectrum that represents the resulting CVLFPYGGCQGNGNK biomarker.

Figure 5 shows an example of evaluating the predictability of an MSBA model with 11 biomarker signals on sample data of 19 patients, as described in Example 2. 10 cases and 9 controls were used as if they were blinded samples. The MSBA score for each subject was calculated using Eq.5. In this example, subjects whose MSBA score was equal or greater than 1 were diagnosed as cases (red circles). Otherwise the subject was considered to be a control (blue circles). The prediction accuracy was 100%.

Figure 6 shows the auto-discrimination results using an MSBA model with 10 signals on sample data of 96 patients, as described in Example 3. Each dot represents a patient, and the vertical axis represents the discriminant score (z), calculated using Eq. 6. If this score was >0, it was interpreted as a case of the disease (red circles). Otherwise the subject was considered to be a control (blue circles). The prediction accuracy was 83.3%.

Detailed Description of the Invention

Biomarkers

The FDA definition of biomarker is "a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention".

As used herein, the term "biomarker" refers to a polypeptide which can be used to monitor the presence or the progress of a disease, consistent with the above FDA definition.
Biomarkers can be used as diagnostic agents, monitors of disease progression, monitors of treatment and predictors of clinical outcome. For example, various biomarker research projects are attempting to identify markers of specific cancers and of specific cardiovascular and immunological diseases.

Some of these disease-associated proteins may be identified as novel drug targets and some may be useful as biomarkers of disease progression. Such biomarkers may be used to improve clinical development of a new drug or to develop new diagnostics for the particular disease.

Disease-associated proteins are known in the art, and their use as biomarkers for the disease is established. Such biomarkers can be monitored by means of the present invention. Novel disease-associated proteins, however, may be identified.
Detection of disease-associated proteins may be achieved, for example, by the following method.
Protein samples are taken from single patients or groups of patients. These samples may be cells, tissues, or biological fluids that are processed to extract and enrich protein and/or peptide constituents. Typically the process entails partitioning into solution phase but may also include the establishment of protein and/or peptide components attached to solid matrixes. After separation and analysis (proteomics, peptidomics), protein expression fingerprints are produced for both diseased and healthy subjects by qualitative and quantitative measurement. These fingerprints may be used as unique identifiers to distinguish individuals and/or establish and/or track certain natural or disease processes.
These prototype fingerprints are established for each individual sample/subject and are recorded as numerical values in a computer database. The fingerprints are then analysed using bioinformatic tools to identify and select the proteins or peptides that are present in the prototype fingerprints and whose expression may or may not be differentially present in the samples derived from the healthy and diseased subjects. These proteins/peptides are then further characterised and detailed profiles are produced which identify the characteristic physical properties of the proteins or peptides.
Either a single protein/peptide or groups of proteins/peptides may be determined to be significantly associated with certain natural or diseased processes.
Mass Spectrometry

Mass spectrometry is the method of choice for the analysis of proteins and peptides.
Modern biomarker discovery research employs two major mass spectrometry principles: MALDI-TOF (matrix assisted laser desorption ionisation time of flight) mass spectrometry, where the proteins are analysed in a crystalline state, and ESI (electrospray ionisation) mass spectrometry, where the proteins are analysed in liquid state. In addition, a surface enhanced chip application of MALDI named surface-enhanced laser desorption ionisation (SELDI) has been used extensively in biomarker discovery studies.
See, for example, Petricoin et al., Lancet, 16, 572-577, 2002; Alexe et al., Proteomics. 2004, 4766-4783; and Liotta et al., Endocr Relat Cancer. 2004 Dec;11(4):585-7.

The surface-enhanced laser desorption/ionization (SELDI)-TOF-MS technology uses chromatographic surfaces coupled to the assay target plate. The protein-bound material on the plates is then directly analyzed by MALDI-MS. SELDI assays peptides and proteins predominantly in the low molecular mass range. This technology is applicable to major- to medium-abundance peptides and proteins where a suitable upfront purification scheme is not integrated. The SELDI technology leads primarily to a pattern from which sequencing can be performed using MALDI-TOF-TOF identification of peptides.

Multi-mechanism separation platforms enable high resolution peptide separation configured on-line with electrospray ionization mass spectrometry, or off-line with ionization principles such as matrix assisted laser desorption ionization mass spectrometry. See, for example, Aebersold, R. & Goodlett, D.R. Chem. Rev. 2001, 101, 269-295; Mann et al., Annu. Rev. Biochem. 2001, 70, 437-473; Wolters et al., Anal. Chem. 73, 5683-5690 (2001); and Washburn et al., Nat. Biotechnol. 19, 242-247 (2001).
Mass spectrometry (MS) is also an essential element of the proteomics field. In fact, MS is the major tool used to study and characterise protein structure and sequence within this field. See Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198-207 (2003); Steen, H. and M. Mann (2004). Nat Rev Mol Cell Biol 5(9): 711; and Olsen, J. V. and M. Mann (2004) Proc Natl Acad Sci U S A 101(37): 13417-22.
Researchers are successfully harnessing the power of MS to supersede the two-dimensional gels that originally gave proteomics its impetus. Using ESI and liquid chromatography (LC)/MS/MS, a voltage is applied to a very fine needle that contains the peptide mixture eluting from the LC column. The needle then sprays droplets into a mass spectrometric analyzer where the droplets evaporate and peptide ions are released. In LC/MS/MS, researchers use microcapillary LC devices to initially separate peptides.

Mass spectrometry (MS) is a valuable analytical technique because it measures an intrinsic property of a bio-molecule, its mass, with very high sensitivity. MS can therefore be used to measure a wide range of molecule types (proteins, peptides, or any other bio-molecules) in a wide range of sample types/biological materials.

Correct sample preparation can influence MS signal generation and spectrum resolution and sensitivity. High resolution separation systems such as single-dimensional high-pressure Liquid Chromatography (LC) and multidimensional Liquid Chromatography (LC/LC) can be directly interfaced with Mass Spectrometry. This interface allows fast automated acquisition and collection of large data sets that represent both quantification as well as sequence information within the mass spectra generated. This integrated shotgun proteomics technology is known as MudPIT (Multidimensional Protein Identification Technology). See Eng et al., J Am Soc Mass Spectrom 1994, 5: 976-989; Link et al., Nat Biotechnol. 1999 Jul;17(7):676-82; Washburn et al., Nat Biotechnol. 2001 Mar;19(3):242-7; Lin et al., American Genomic/Proteomic Technology, 2001 1(1): 38-46; and Tabb et al., J. Proteome Res. 2002 1:21-26.

In the shotgun proteomics approach, peptides generated by specific protein digesting enzymes such as trypsin and other endo- and exo-peptidases/proteinases are analysed rather than intact proteins. This fragmentation offers definite advantages because even very large proteins with varying physical and chemical characteristics, such as very hydrophobic or very basic proteins, can be analysed. Such protein classes can otherwise be difficult to handle. These proteins will give rise to peptide mixtures of sufficient size and number to allow accurate protein annotation and identification. However, since several peptides are generated from each respective protein, the complexity of the mixture to be analyzed is increased.
Consequently, considerable instrument time and computing power are needed for the shotgun approach.
However, the wealth of protein expression information is extensive, and is generated in a fully automated setting with simultaneous real-time protein identification.

In order to be able to handle different patient samples which present varying degrees of disease, different methods can be applied to adjust and align the resulting liquid chromatography chromatograms and mass spectra. This normalisation can be performed by a software approach whereby the total signal generated from the entire experiment is used and compared to that of the various patient samples analyzed. The mean values and commonalities of all the signals will be aligned to allow differential quantitations.
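The total-signal normalisation described above can be sketched as follows. This is a minimal illustration only, assuming that each sample is scaled so that its summed intensity equals the mean summed intensity of all samples; the function name, sample names and intensity values are hypothetical, and the alignment actually used by the MSBA software is not specified here.

```python
def normalise_to_total_signal(samples):
    """Scale each sample so that its total signal equals the mean total signal.

    `samples` maps a sample name to a list of signal intensities.
    Assumes every sample has a non-zero total signal.
    """
    totals = {name: sum(values) for name, values in samples.items()}
    target = sum(totals.values()) / len(totals)  # mean total signal
    return {
        name: [v * target / totals[name] for v in values]
        for name, values in samples.items()
    }


# Hypothetical intensities: patient_b was measured at half the overall signal.
samples = {
    "patient_a": [100.0, 300.0, 600.0],
    "patient_b": [50.0, 150.0, 300.0],
}
normalised = normalise_to_total_signal(samples)
```

After scaling, both hypothetical samples carry the same total signal, so the remaining differences between individual signals can be compared directly.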

In a second approach, a pre-determined amount of peptide standard is added to the sample. This addition will be made before the digestion of the samples, after it, or both. The standards used will be the actual biomarker sequences, synthesized with or without isotope labelling, and spiked into the samples.
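A minimal sketch of quantitation against a spiked standard follows. It assumes that the standard and the endogenous biomarker peptide give the same signal response per unit amount, which is the usual working assumption for an isotope-labelled copy of the same sequence; the function name, the intensities and the unit (fmol) are hypothetical.

```python
def quantify_with_internal_standard(analyte_intensity, standard_intensity, standard_amount):
    """Estimate the analyte amount from the intensity ratio to a spiked standard.

    Assumes equal response factors for the analyte and the standard.
    """
    if standard_intensity <= 0:
        raise ValueError("the internal standard signal must be positive")
    return analyte_intensity / standard_intensity * standard_amount


# Hypothetical reading: 50 fmol of standard spiked, analyte signal twice as intense.
amount_fmol = quantify_with_internal_standard(2.4e5, 1.2e5, 50.0)
```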

The use of labelling technologies within the proteomics field for quantitative clinical protein regulation studies is highly common. Various labelling techniques have been developed and applied utilizing a variety of binding chemistries. Amongst the most commonly used labels within the proteomics field are the ICAT and ITRAQ labels. See Parker et al., Mol. Cell. Proteomics, 625-659, 3, 2004; Ross et al., Cell. Proteomics, 3, 1153-1169, 2004; and DeSouza et al., J. Proteome Res., 2005, 4, 377-386.
Sample separation

Single- or multidimensional HPLC (High Performance Liquid Chromatography) will be used as the preferred alternative for separating proteins or peptides. The protein or peptide mixture is passed through a succession of chromatographic stationary phases or dimensions which gives a higher resolving power. HPLC is adaptable for many experimental approaches and various stationary and mobile phases can be selected for their suitability in resolving specific protein or peptide classes of interest and for compatibility with each other and with downstream mass spectrometric methods of detection and identification. HPLC is used to separate clinical samples that have been digested by a proteolytic enzyme where the corresponding enzyme products, the peptide mixtures, are generated. Sample preparation procedures are applied to protein samples such as blood, tissue, or any other type of biofluid. See, for instance, Schulte et al., Expert Rev. Diagn., 5(2), 2005, 145-157; Chertov et al., Expert Rev. Diagn., 5(2), 2005, 139-145; Adkins et al., Mol. Cell. Proteomics, 1 (12), 2002 947-955; Pieper et al., Proteomics. 2003 Jul;3(7):1345-64; and Anderson, N. L. & Anderson, N. G. Mol. Cell Proteom. 2002 1, 845-867.

The corresponding peptide mixture is passed through a succession of chromatographic stationary phases or dimensions which gives a high resolving power. HPLC is flexible for many experimental approaches; in the setting of the present invention an optimization is made that specifically eliminates the high abundance fraction of proteins expressed in human blood samples, whereby enrichment is made of proteins in the medium- and low-abundance region. The separation of peptides and proteins is based on the peptide sequence, the functional groups of the peptide sequence, as well as the physical properties.

MSBA-OPERATION PRINCIPLES

Prior to exposing samples to MSBA, a sample handling and preparation step is required in most cases. The aim of introducing this step prior to the MSBA methodology is to eliminate interfering agents and matrix components, thereby facilitating improved overall detectability, resulting in an increase in annotation as well as overall sensitivity. However, in certain embodiments sample preparation can be dispensed with, particularly if the biomarker is in higher abundance and the sample is of low complexity. Those skilled in the art will be able to determine whether a preparation step is essential.

The MSBA platform can be operated in a number of different ways, predominantly determined by the nature of the sample and its complexity.

The biomarker protein sequences are determined qualitatively and quantitatively in the patient sample by multiplex analysis. Both labelled and unlabelled MSBA principles can be applied, employing configurations of the MSBA assay according to two possible principles: the internal standard addition principle and the no internal standard principle.

General Methodology

After sample preparation, the sample is injected into the MSBA platform. Next, the following operations are undertaken:

Firstly, the biomarker MS signals need to be identified within the sample.

A predefined list of biomarker masses (+/- 1 Dalton), each correlated with the retention time index and corresponding mass of the respective biomarker, is screened for in the biofluid sample. The relative retention time indexes obtained in most MSBA assays are defined in minutes and have a variability of about +/- 2%, although this figure may vary.

These steps are performed by real-time mass spectral matching to the MSBA reference spectra repository, as illustrated in Figure 1.

Next, a comparison is made of the masses in the MS spectra with the masses of the reference list of biomarkers, to about +/- 1 Dalton.

When the biomarker candidate mass is identified as that of a biomarker having a matching MS spectrum within the reference list, within +/- 1 Da, the information therein is saved on the MSBA server. If the mass is incorrect, the MSBA screening saves no spectral file to the server.
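The screening step can be illustrated with a short sketch. It assumes that a measured signal is kept only when both its mass lies within +/- 1 Da of a reference mass and its retention time lies within +/- 2% of the reference retention time; the class and function names are hypothetical, and the two reference entries reuse retention time / m/z pairs from the 11-signal list given later in this document.

```python
from dataclasses import dataclass


@dataclass
class ReferenceBiomarker:
    name: str              # hypothetical label
    mass: float            # reference MS value (m/z)
    retention_time: float  # reference retention time, in minutes


def match_signal(mass, retention_time, references, mass_tol=1.0, rt_tol=0.02):
    """Return the reference entries matched by a measured (mass, retention time) pair."""
    hits = []
    for ref in references:
        mass_ok = abs(mass - ref.mass) <= mass_tol
        rt_ok = abs(retention_time - ref.retention_time) <= rt_tol * ref.retention_time
        if mass_ok and rt_ok:
            hits.append(ref)
    return hits


references = [
    ReferenceBiomarker("signal_1", 485.2, 11.6),
    ReferenceBiomarker("signal_2", 608.1, 12.5),
]

saved = match_signal(485.9, 11.7, references)    # inside both windows: kept
ignored = match_signal(490.0, 11.6, references)  # mass off by more than 1 Da: dropped
```

Only signals returned by `match_signal` would go forward to spectral matching; everything else is discarded without being stored.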

These operations are performed by a file sharing and inter-process communication (such as client-server-type communication) mechanism.

STEP 2A;
When the mass identity in the MS-spectra is identified, mass identification and sequence identity analysis is initiated.
The pattern matching step within the MSBA software will identify a certain similarity measure, for example the cosine correlation. Using the similarity measure, the correct protein sequence is confirmed. This confirmation is made by spectral matching.
The spectral matching is performed by comparison of the sample spectra and the reference spectra in the MSBA database. For a positive identity at this stage a cosine correlation factor of 0.8 or higher is required in order to confirm the accurate protein sequence.
Equivalent threshold values for alternative similarity measures will be apparent to those skilled in the art.

The reference spectral comparison and evaluation is performed in the following way.

The MS/MS spectrum is represented as a list of doublets (m, v), where m represents the mass-to-charge ratio and v the ion signal intensity value. By binning m with the interval $m_U$ ($m_U$ = the actual width of the bin), the MS/MS spectrum can also be expressed by a vector $V = \{v_1, \ldots, v_n\}$, where its length (n) equals the number of bins, and the value of each element is the sum of the intensities of all signals within each bin. This is a profile representation of an MS/MS spectrum.

The cosine correlation (S) of two different MS/MS spectra ($V_1$, $V_2$) can be calculated according to Eq 1:

$$S = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert} \qquad \text{(Eq 1)}$$

The value of S varies from 0 to 1. A value of 0 signifies that the two vectors are completely independent; a value of 1 signifies that the direction of the two vectors is the same.
Note that the two MS/MS spectra vectors must have the same binning, i.e., if the binning of m of one vector is 500-501, 501-502, ... 1999-2000, then the other spectrum must be binned in the same manner. Consequently, the length of the two vectors must be the same.

In order to judge whether measured MS signals are the correct biomarker or not, the MS/MS portion of the measured signals is extracted and compared with the MS/MS reference spectrum of the sample by using, for example, the cosine correlation described above.

If the cosine correlation value S is equal to or greater than a pre-defined threshold value, for example 0.8, then the measured signals are judged to be derived from the putative biomarker in the reference set.
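The binning and the cosine correlation of Eq 1 can be sketched as follows. This is an illustration only: the 1 m/z bin width and the 500-2000 range follow the binning example in the text, while the function names and the peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width bins."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            vector[int((mz - mz_min) // bin_width)] += intensity
    return vector


def cosine_correlation(v1, v2):
    """Eq 1: dot product of two equally binned spectra divided by their norms."""
    if len(v1) != len(v2):
        raise ValueError("spectra must share the same binning")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm > 0 else 0.0


# Hypothetical measured and reference MS/MS peak lists.
measured = bin_spectrum([(500.4, 10.0), (620.2, 40.0), (750.9, 5.0)], 500, 2000)
reference = bin_spectrum([(500.6, 12.0), (620.7, 35.0), (901.3, 4.0)], 500, 2000)

s = cosine_correlation(measured, reference)
is_match = s >= 0.8  # threshold given in the text
```

Because intensities are non-negative, S stays between 0 and 1, and two spectra that differ only in overall scale give a value of 1.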

The following section describes how to construct reference spectra that are obtained as a group specific spectrum from many individual patients.

For each candidate biomarker, once such biomarker is established, several MS/MS spectra should be collected to construct a reference MS/MS spectrum map. This is an averaged spectrum from actual and measured data sets and is obtained by a clustering calculation.

An example of the construction of such reference spectrum is as follows:
(1) Collection of multiple MS/MS spectra for the targeted biomarker. These MS/MS spectra must be confirmed to be derived from the target biomarker by MASCOT, SEQUEST or other programs with a given confidence level.

(2) The similarity of each collected MS/MS spectrum is investigated with the above-mentioned similarity measure. This is performed by a clustering calculation using the similarity measure. The clustering calculation is performed to a point where the similarity measure decreases to the pre-defined threshold value. Following these clustering calculations, a summary list of all remaining protein sequence ions within the MS/MS spectra is generated. The next step is the removal of the remaining protein sequence ions in the summary list from the cluster calculation.

(3) Using the established and qualified summary MS/MS spectrum from the cluster, it is possible to calculate the arithmetic average for each element of the spectrum vector. The averaged vector can now be used as the MS/MS reference spectrum.

(4) During the clustering process, it is possible to come up with a result that generates more than one reference from the patient group.

a) In that case, there is more than one cluster that contains a difference in the MS/MS spectral profile map. The criterion set for these situations is that these groups need to have reliable target biomarker identification. It is then possible to generate and make use of more than one reference spectrum for one target. Such cases would appear if there are MS/MS fragment ions that are different in the groups but correlate to the same annotated protein.

b) It is also possible to arrive at situations with the cluster analysis data where there is a difference in the biomarker profile map. That would mean that several individual groups can be established from the patient cohort, e.g. different phenotypes. In these cases, the comparative multiplex pattern will be phenotype specific. However, there will also be possibilities of biomarker overlaps between the phenotypes.
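Steps (1) to (4) can be sketched in a simplified form. The clustering calculation itself is not specified in the text, so this sketch substitutes a simple greedy scheme: each confirmed spectrum joins the first cluster whose current average it matches with a cosine correlation of at least 0.8, and otherwise starts a new cluster. The function names and the four-bin spectra are hypothetical.

```python
import math


def cosine(v1, v2):
    """Cosine correlation of two equally binned spectrum vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1) * sum(b * b for b in v2))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors):
    """Element-wise arithmetic average of equally long vectors (step 3)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_reference_spectra(spectra, threshold=0.8):
    """Greedy clustering stand-in; returns one averaged reference per cluster."""
    clusters = []
    for spectrum in spectra:
        for members in clusters:
            if cosine(spectrum, mean_vector(members)) >= threshold:
                members.append(spectrum)
                break
        else:
            clusters.append([spectrum])  # no similar cluster yet: start a new one
    return [mean_vector(members) for members in clusters]


# Hypothetical binned MS/MS spectra already confirmed for one target biomarker.
spectra = [
    [10.0, 0.0, 5.0, 0.0],
    [12.0, 0.0, 4.0, 0.0],
    [0.0, 8.0, 0.0, 6.0],   # a different fragment-ion pattern
    [11.0, 1.0, 6.0, 0.0],
]
references = build_reference_spectra(spectra)
```

With these inputs the third spectrum forms its own cluster, which corresponds to case (4): more than one reference spectrum for a single target.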

These confirmation algorithms will be applied and used in real time within the high-throughput screening operations of the MSBA platform, exemplified below.

The mass spectrometer (for example the Finnigan LTQ), once a positive biomarker mass has been identified, will stay on that mass target in order to make repeated scans of the biomarker ion signal. The number of scans will be dependent on the score match generated for each particular protein sequence, but will be aligned to the positive identity of the biomarker. The scanning window will be determined automatically by the MSBA software.

The criterion for a positive correlation is a cosine correlation similarity measure equal to or higher than 0.8.
The next step will be to make a statistically significant identification of the protein sequence by utilizing commercial search engines such as MASCOT or SEQUEST, or any other search engine, with the protein databases, to confirm that it is the correct biomarker identity.
The MSBA system will only store and archive those signals and data files that are within the mass and sequence area of the biomarkers. All other data generated from the assay are not transferred to the MSBA database.

Calculation of the multiplex biomarker assay read-out

The calculation of the multiplex biomarker assay read-out is performed by the application of the MSBA algorithm, which consists of a discrimination function that will calculate the diagnostic MSBA score.

A discrimination function is defined as a function of $x_1, \ldots, x_n$, where $x_i$ represents an absolute or relative signal intensity of the i:th biomarker. The output of a discrimination function must be either a positive or a negative value according to the diagnosis result. For example, if the diagnosis is positive, the output value of the discrimination function must be positive, and vice versa.
For example, the discrimination function used is outlined in Eq 2:

$$\sum_{i=1}^{n} a_i x_i - a_0 x_{total} \qquad \text{(Eq 2)}$$

where n is the number of multiple biomarkers used for the diagnosis, $x_i$ is the absolute or relative signal intensity of the i:th biomarker, and $x_{total}$ is the total signal intensity of the MS measurement.

There is also a weight factor included in the algorithms of the MSBA software.

The vector $\{a_1, \ldots, a_n, a_0\}$ is a weight vector that determines the direction of the normal vector of a separating hyperplane that divides the n-dimensional signal intensity space into two: diagnosis positive and diagnosis negative. An example of the procedure to determine the weight vector is described afterward; however, various kinds of algorithms, e.g. Support Vector Machine, Artificial Neural Network, and others, can be used to determine the weight vector.
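As an illustration of Eq 2, the sketch below evaluates the discrimination function for one measurement. The weights, intensities and total signal are hypothetical values chosen only to show the arithmetic; in practice the weight vector would come from a training procedure such as the one described later.

```python
def discrimination_score(intensities, weights, a0, total_intensity):
    """Eq 2: weighted sum of biomarker intensities minus a0 times the total signal."""
    if len(intensities) != len(weights):
        raise ValueError("one weight is needed per biomarker signal")
    weighted_sum = sum(a * x for a, x in zip(weights, intensities))
    return weighted_sum - a0 * total_intensity


# Hypothetical weight vector {a_1, a_2, a_3, a_0} and measurement.
weights = [0.5, -0.2, 1.0]
a0 = 0.01
score = discrimination_score([120.0, 80.0, 30.0], weights, a0, 5000.0)

# The sign of the output gives the diagnosis.
diagnosis = "positive" if score > 0 else "negative"
```

Here the weighted sum is 74 and the total-signal term is 50, so the score is 24 and the sample falls on the diagnosis-positive side of the hyperplane.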

Another example of a discrimination function is included in the MSBA algorithms and is defined as follows:

$$\alpha f_p(x_1, \ldots, x_n) - \beta f_n(x_1, \ldots, x_n) \qquad \text{(Eq 3)}$$

The functions $f_p$ and $f_n$ are arbitrary functions that give a measure of either similarity or distance between a set of measured biomarkers in a patient to be diagnosed and sets of reference biomarker signals in the MSBA server. $f_p$ denotes the similarity or distance function from the diagnosis-positive references, and $f_n$ denotes that from the diagnosis-negative ones.

$\alpha$ and $\beta$ are coefficients that can be used to unequally weight the diagnosis-positive and the diagnosis-negative metrics.

If the functions $f_p$ and $f_n$ give similarity measures, then a patient sample will be diagnosed as positive when Eq 3 generates a positive value. If the functions give distance measures, a positive value of Eq 3 means that the diagnosis is negative.

An example of such a function is the Euclidean distance:

$$\sqrt{\sum_{i=1}^{n} (y_i - x_i)^2} \qquad \text{(Eq 4)}$$

where $y_i$ is the i:th biomarker signal intensity in a patient to be diagnosed, and $x_i$ is the i:th biomarker signal intensity of the reference set.
Another example is the standard error of the predicted value in the regression:

$$\sqrt{\frac{1}{n(n-2)}\left[\, n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 - \frac{\left( n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) \right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \,\right]} \qquad \text{(Eq 5)}$$

where n is the number of biomarker signals, $x_i$ is the measured i:th signal intensity of a patient sample, and $y_i$ is the predicted value from each $x_i$ by using a linear regression line that was calculated by the least squares fitting between the measured $x_i$'s and the reference signals.
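Eq 3 to Eq 5 can be sketched together. The sketch rests on two assumptions that should be checked against the original equations: Eq 3 is read as the weighted difference between the positive-reference and negative-reference measures, and Eq 5 is read as the usual standard error of a least-squares regression of one intensity series on another. All function names and numerical values are hypothetical.

```python
import math


def euclidean_distance(y, x):
    """Eq 4: distance between patient intensities y and reference intensities x."""
    return math.sqrt(sum((yi - xi) ** 2 for yi, xi in zip(y, x)))


def standard_error_of_estimate(x, y):
    """Eq 5 as read here: standard error of y around its least-squares line on x."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three signals are needed")
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("x values must not all be equal")
    residual = (n * syy - sy ** 2) - (n * sxy - sx * sy) ** 2 / denom
    return math.sqrt(max(residual, 0.0) / (n * (n - 2)))


def eq3_score(patient, positive_ref, negative_ref, f, alpha=1.0, beta=1.0):
    """Eq 3: alpha * f_p minus beta * f_n, with the same measure f for both."""
    return alpha * f(patient, positive_ref) - beta * f(patient, negative_ref)


# Hypothetical three-signal patient and reference sets.
patient = [4.0, 2.0, 7.0]
positive_ref = [4.0, 3.0, 6.0]
negative_ref = [1.0, 1.0, 2.0]

score = eq3_score(patient, positive_ref, negative_ref, euclidean_distance)
# With a distance measure, a positive value means diagnosis negative.
diagnosis = "negative" if score > 0 else "positive"
```

The hypothetical patient lies much closer to the positive reference than to the negative one, so the distance-based score is negative and the diagnosis is positive.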

The entire software scheme, including the algorithms that control each specific step within the process, is outlined in Figure 2.

Biomarker annotation and quantitation

The MSBA assay platform builds on:
A) a separating principle
B) a non-separating principle

In the case of the non-separating principle, we are able to make the biomarker annotations and quantitations by:

(a) Direct MS-analysis
(i) direct infusion of the biological sample by using static nano-electrospray principles. Disposable nanospray needles are used, where each nano-electrospray needle will only be exposed to one biological sample, thereby circumventing sample overload and memory effects.

(ii) flow injection analysis mode where the sample is injected as a plug. The sample volume chosen within the plug is directly related to the signal intensity of the respective biomarker protein sequence. It is also possible for low abundant biomarkers to use large (several ml) sample injection volumes, thereby reaching a saturation (steady state) of the ion signal efficiency of the mass spectrometer.

(iii) flow injection with sample enrichment by MS-analysis utilizing a chromatographic solid phase extraction enrichment column. This step allows for simultaneous clean-up, by elimination of matrix components within the sample, and trace enrichment of biomarkers.
The advantage of this approach is that it is possible to analyze biomarkers from sample origins with high complexity, e.g. tissue extracts.
Additionally, in the sample enrichment mode (iii), we are able to generate signal amplification factors ranging from, but not restricted to, 2-500. This approach will also improve both the detectability of biomarkers expressed at low levels and the accuracy of the protein sequence annotation.
B) In separating analysis, liquid chromatography (LC) integrated biomarker identification relies upon the high resolving power of LC that can be operated in the single column mode (see Figure x) or in the multi-column mode utilizing column switching where the samples are analyzed in a sequential mode, thereby improving the sample throughput.

MSBA Programming

Here is an example code of the core part to calculate a weight vector (or a model) and to predict diagnosis using the Support Vector Machine algorithm. The code is written in the R language. A model (model) will be constructed from a training data set (train), and then will be utilized to generate a prediction (pred) for a given test data set (test). Data sets train and test are data frames containing a plural number of data points, each of which consists of an object Diag containing the diagnosis result (categorical value: either "Positive" or "Negative"; the values are empty in the case of prediction), and a vector containing the signal intensity for each biomarker and the total signal intensity. The MSBA programming will rely on data generated from the protein sequence screening performed on the two patient groups from which the biomarkers have been generated.

The programming within the MSBA software is performed in the following way:

    library(e1071)
    model <- svm(Diag ~ ., data = train)
    ## the above call calculates the model, but it may be too simple to reflect Eq 2.
    ## selected support vectors are: model$SV
    ## and weight vectors are: model$coefs
    pred <- predict(model, test)

EXAMPLES

The following examples are illustrations from a lung cancer study that was performed by LC-MS protein profiling in human blood samples. Two patient groups were analysed, the CASE and the CONTROL cancer group, with differential protein expression analysis.

Experimental details

From each patient, approximately 6 ml of blood was taken into a sampling tube containing heparin sodium salt and was mixed by inverting the tube 2-3 times. It was then subjected to centrifugation at 2,000 x g for 10 min at 4 °C. Three ml of plasma was obtained from the supernatant. The sample was stored frozen at -80 °C. Next, the proteins were extracted from the plasma and were subjected to tryptic digestion after depleting abundant human plasma albumin and IgG. Aliquots of the fractionated plasma sample were then analyzed by LC-MS, and sequenced by MS/MS, as previously described [WO06100446].

The MSBA Plot

The multiplex biomarker summary plot presents the multiplex expression data of the patient biomarkers within the lung cancer study.

The 10-multiplex biomarker diagnostic read-out generated from the MSBA methodology illustrates each and every biomarker separately (see Figure 3). The quantitative difference, i.e. the fold change difference that is already known and stored within the MSBA database (see Figure 1), is used together with the qualitative differences to assay the biomarkers. Biomarker data generated from the lung cancer study correctly identifies all of these patients as positive from the diagnostic multiplex MSBA read-out.
The scoring of all of these 10 patients was found to be in the range of 0.80-1.0.

Figure 4 shows an example of biomarker annotation made from the multiplex assay, presented by the MS spectrum where the biomarker was recognised by the MSBA software, and the follow up MS/MS spectrum that represents the resulting CVLFPYGGCQGNGNK biomarker. The MSBA matching, using the reference biomarker spectra in the MSBA database and applying cosine correlation, shows the cosine correlation factor to be equal to or higher than 0.8.

Table 1 presents the details of the MSBA-data generation, where pre-defined masses of the regulated biomarkers are analyzed.

[Table 1: MSBA data generation for the pre-defined masses of the regulated biomarkers (table content not legible in the source).]

Another example described as follows is derived from a lung disease CASE-CONTROL
study that was perfonned by LC-MS protein profiling in huinan blood samples.
Two patient groups were analyzed, the CASE and the CONTROL lung disease group with differential protein expression analysis.

Experimental details Procedures of plasma sample collection and preparation were the same as above described example.

MSBA model building and evaluation
Data from 46 patient samples, consisting of 10 cases and 36 controls, were used to construct MSBA models. We constructed 5 different MSBA models containing 14, 8, 26, 8, and 11 signals, respectively. After combining the 5 models, it was found that the final (5th) MSBA model had the dominant discrimination ability. Thus we used only the 5th model in the following step. The LC-MS information (retention time (min) / MS value (m/z)) of the 11 signals of the final model was as follows: 11.6/485.2, 12.5/608.1, 18.3/547.0, 20.2/681.3, 21.1/575.1, 21.3/531.5, 25.5/561.6, 23.1/514.5, 32.5/682.2, 44.0/985.2, 48.7/945.8.

In order to evaluate the predictability of the model, we tried to predict CASE / CONTROL using 10 cases and 9 controls as if they were blinded samples. According to the MSBA diagnosis procedure described above (also illustrated in Figure 2), the 11-multiplex biomarker signals were identified for each test sample, and each peak signal was quantified. From the quantities of all 11 multiplex signals, the MSBA score for each subject was calculated using the formula described above (Eq.5). In this example, subjects whose MSBA score was equal to or greater than 1 were diagnosed as cases (Table 2, MSBA diagnosis). Consequently, all the samples were predicted correctly, i.e. the discrimination ability was 100% (see Figure 5).
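As an illustration of the thresholding step, the following minimal Python sketch applies the cut-off of 1 to the MSBA scores listed in Table 2 and compares the result with the clinical diagnosis. The scores and diagnoses are copied from Table 2; the names `TABLE_2`, `THRESHOLD` and `msba_diagnosis` are hypothetical and chosen only for this sketch. The calculation of the score itself (Eq.5) is not reproduced here.

```python
# (sample, MSBA score, clinical diagnosis) copied from Table 2
TABLE_2 = [
    ("SBJ201", 0.542, "control"), ("SBJ202", 1.586, "case"),
    ("SBJ203", 3.688, "case"),    ("SBJ204", 1.964, "case"),
    ("SBJ205", 3.339, "case"),    ("SBJ206", 0.682, "control"),
    ("SBJ207", 4.430, "case"),    ("SBJ208", 0.241, "control"),
    ("SBJ209", 1.947, "case"),    ("SBJ210", 3.456, "case"),
    ("SBJ211", 0.483, "control"), ("SBJ212", 2.835, "case"),
    ("SBJ213", 0.268, "control"), ("SBJ214", 0.522, "control"),
    ("SBJ215", 0.332, "control"), ("SBJ216", 1.091, "case"),
    ("SBJ217", 0.840, "control"), ("SBJ218", 0.269, "control"),
    ("SBJ219", 1.691, "case"),
]

THRESHOLD = 1.0  # scores equal to or greater than 1 are diagnosed as cases


def msba_diagnosis(score, threshold=THRESHOLD):
    """Return 'case' when the score reaches the threshold, else 'control'."""
    return "case" if score >= threshold else "control"


# Count the samples for which the MSBA diagnosis agrees with the clinical one
correct = sum(msba_diagnosis(score) == clinical for _, score, clinical in TABLE_2)
accuracy = correct / len(TABLE_2)
print(f"{correct}/{len(TABLE_2)} samples predicted correctly ({accuracy:.0%})")
```

With the Table 2 values, every thresholded diagnosis agrees with the clinical diagnosis, which is consistent with the 100% discrimination reported above.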

The following example was also derived from the lung disease CASE-CONTROL study performed by LC-MS protein profiling of human blood samples with two patient groups (CASE and CONTROL). In this example, another set of multiplex biomarkers was used to construct an MSBA model, with a different and much larger patient dataset.

Experimental details
Procedures of plasma sample collection and preparation were the same as in the example described above.

MSBA model building and evaluation
Data from 96 patient samples, consisting of 21 cases and 75 controls, were used as the training dataset. As the number of samples increased, sample variability also increased. We therefore first applied the Smirnov test to remove outlier signals. As a result, 5 samples that contained a large number of outliers were also removed from the analysis set. Using the result of a t-test, we constructed an initial MSBA model containing 100 candidate biomarker signals. Then, by recursively applying discriminant analysis and removing the signal with the minimum contribution to the discrimination model, we finally obtained an MSBA model with 10 signals. See Table 3 for the list of the 10 multiplex marker signals.
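The recursive elimination described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used: the text does not specify the discriminant method or how the contribution of a signal is measured, so the sketch assumes a two-class Fisher linear discriminant and takes the contribution of a signal to be the absolute discriminant weight multiplied by the pooled standard deviation. All function names and the synthetic data are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lda_weights(X, y, cols, ridge=1e-6):
    """Fisher weights w = Sw^-1 (mean_case - mean_control) and pooled SDs."""
    groups = {g: [r for r, lab in zip(X, y) if lab == g] for g in (0, 1)}
    means = {g: [sum(r[c] for r in rows) / len(rows) for c in cols]
             for g, rows in groups.items()}
    k = len(cols)
    Sw = [[0.0] * k for _ in range(k)]
    for g, rows in groups.items():
        for r in rows:
            d = [r[c] - m for c, m in zip(cols, means[g])]
            for i in range(k):
                for j in range(k):
                    Sw[i][j] += d[i] * d[j]
    dof = len(X) - 2  # pooled within-class degrees of freedom
    for i in range(k):
        for j in range(k):
            Sw[i][j] /= dof
        Sw[i][i] += ridge  # small ridge term for numerical stability
    diff = [a - b for a, b in zip(means[1], means[0])]
    w = solve(Sw, diff)
    sd = [Sw[i][i] ** 0.5 for i in range(k)]
    return w, sd


def backward_eliminate(X, y, n_keep):
    """Repeatedly refit and drop the signal with the smallest contribution."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w, sd = lda_weights(X, y, cols)
        contrib = [abs(wi) * si for wi, si in zip(w, sd)]
        cols.pop(contrib.index(min(contrib)))
    return cols


# Synthetic demonstration data (not from the study): signals 0 and 2 differ
# between the two groups, signals 1 and 3 are pure noise.
rng = random.Random(0)
y = [0] * 50 + [1] * 50
X = [[rng.gauss(5.0 * lab, 1.0), rng.gauss(0.0, 1.0),
      rng.gauss(-5.0 * lab, 1.0), rng.gauss(0.0, 1.0)] for lab in y]
selected = backward_eliminate(X, y, n_keep=2)
print("retained signal indices:", selected)
```

In the example of the text, the same kind of loop would start from the 100 candidate signals and stop when 10 remain.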

In this example, to calculate the discriminant score (z), we used another scoring function, given by the following equation.

$$ z = \sum_{i} a_i x_i + C $$ (Eq.6)

($x_i$: signal intensity; $a_i$ and $C$: coefficients listed in Table 3)

If the score value is positive, it is interpreted as CASE, otherwise as CONTROL.
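A minimal Python sketch of Eq.6 is given below. The coefficients and the constant are copied from Table 3; the function names and the intensity vectors are hypothetical, and the scale on which the signal intensities must be expressed is not stated in this passage, so the input values are for illustration only. The sketch treats a score of exactly 0 as CASE, following the "equal or greater than 0" rule used in the evaluation of this example.

```python
# Coefficients a_i (signals 1 to 10) and constant C copied from Table 3
COEFFS = [13.00464, 17.09257, 2.55337, 3.74875, -32.15044,
          -18.14263, 2.41990, -29.15126, -17.91525, 5.34939]
C = -5.92638


def discriminant_score(intensities):
    """Eq.6: z = sum_i a_i * x_i + C for one sample."""
    if len(intensities) != len(COEFFS):
        raise ValueError("expected one intensity per model signal")
    return sum(a * x for a, x in zip(COEFFS, intensities)) + C


def classify(z):
    """Scores equal to or greater than 0 are read as CASE."""
    return "case" if z >= 0 else "control"


# Hypothetical intensity vectors, for illustration only
all_zero = [0.0] * 10
only_first = [1.0] + [0.0] * 9

z0 = discriminant_score(all_zero)    # equals C
z1 = discriminant_score(only_first)  # equals a_1 + C
print(z0, classify(z0))
print(z1, classify(z1))
```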

In order to evaluate the predictability of the model, we tried to predict CASE / CONTROL using the same dataset (21 cases + 75 controls, 96 in total). According to the MSBA diagnosis procedure described above (also illustrated in Figure 2), the multiplex biomarker signals were identified for each sample, and each peak signal was quantified. From the quantities of all 10 multiplex signals, the MSBA score for each subject was calculated using the formula described above (Eq.6). In this example, subjects whose MSBA score was equal to or greater than 0 were diagnosed as cases. Table 4 and Figure 6 show the auto-discrimination results. In Figure 6, each dot represents a patient, and the vertical axis represents the discriminant score (z). In this example, sensitivity was 85.7%, specificity was 82.7%, and prediction accuracy was 83.3%.
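The reported figures can be checked with the short Python sketch below. The four counts were tallied from the MSBA diagnosis and clinical diagnosis columns of Table 4 (21 clinical cases and 75 clinical controls); the variable names are hypothetical.

```python
# Counts tallied from the diagnosis columns of Table 4
TP, FN = 18, 3    # clinical cases diagnosed as case / as control
TN, FP = 62, 13   # clinical controls diagnosed as control / as case

sensitivity = TP / (TP + FN)                 # 18 / 21
specificity = TN / (TN + FP)                 # 62 / 75
accuracy = (TP + TN) / (TP + FN + TN + FP)   # 80 / 96

print(f"sensitivity = {sensitivity:.1%}")  # 85.7%
print(f"specificity = {specificity:.1%}")  # 82.7%
print(f"accuracy    = {accuracy:.1%}")     # 83.3%
```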

Table 2

Sample#   MSBA score   MSBA diagnosis   Clinical diagnosis
SBJ201    0.542        control          control
SBJ202    1.586        Case             Case
SBJ203    3.688        Case             Case
SBJ204    1.964        Case             Case
SBJ205    3.339        Case             Case
SBJ206    0.682        control          control
SBJ207    4.430        Case             Case
SBJ208    0.241        control          control
SBJ209    1.947        Case             Case
SBJ210    3.456        Case             Case
SBJ211    0.483        control          control
SBJ212    2.835        Case             Case
SBJ213    0.268        control          control
SBJ214    0.522        control          control
SBJ215    0.332        control          control
SBJ216    1.091        Case             Case
SBJ217    0.840        control          control
SBJ218    0.269        control          control
SBJ219    1.691        Case             Case

Table 3

i     RT      MZ      a_i
1     36.44   758.5   13.00464
2     37.04   509.2   17.09257
3     37.81   611.2   2.55337
4     39.68   652.3   3.74875
5     39.84   671.2   -32.15044
6     39.97   523.1   -18.14263
7     50.35   974.3   2.41990
8     52.94   758.7   -29.15126
9     59.27   800.8   -17.91525
10    59.41   786.7   5.34939
C = -5.92638

Table 4

Sample#  z-score    MSBA diagnosis   Clinical diagnosis   Sample#  z-score    MSBA diagnosis   Clinical diagnosis
SBJ301   -0.6753    control          case                 SBJ349   9.3784     case             case
SBJ302   -11.1851   control          control              SBJ350   9.4884     case             case
SBJ303   -6.5025    control          control              SBJ351   -8.1428    control          control
SBJ304   -5.9519    control          control              SBJ352   -9.2091    control          control
SBJ305   -6.0320    control          control              SBJ353   -4.8211    control          control
SBJ306   -2.6594    control          control              SBJ354   22.5948    control          control
SBJ307   4.9968     case             control              SBJ355   -8.7626    control          control
SBJ308   -9.1398    control          control              SBJ356   10.5248    control          control
SBJ309   19.1731    case             control              SBJ357   6.5157     case             case
SBJ310   -7.1342    control          control              SBJ358   -4.0407    control          control
SBJ311   -8.4585    control          control              SBJ359   25.0186    case             case
SBJ312   -6.7038    control          control              SBJ360   4.9111     case             control
SBJ313   -3.6724    control          control              SBJ361   -6.5589    control          control
SBJ314   -5.8528    control          control              SBJ362   -3.6577    control          control
SBJ315   8.8020     case             control              SBJ363   11.7341    case             control
SBJ316   -2.6856    control          control              SBJ364   1.7725     case             control
SBJ317   -7.5656    control          control              SBJ365   -8.0470    control          control
SBJ318   -1.9188    control          control              SBJ366   0.0504     case             control
SBJ319   9.9846     case             case                 SBJ367   2.5321     case             case
SBJ320   -4.2516    control          control              SBJ368   38.8935    case             control
SBJ321   6.9731     case             control              SBJ369   6.2574     case             case
SBJ322   -6.2063    control          control              SBJ370   -1.2407    control          control
SBJ323   10.9300    case             control              SBJ371   -7.9560    control          control
SBJ324   -5.6956    control          control              SBJ372   -8.4344    control          control
SBJ325   2.3495     case             case                 SBJ373   -8.2976    control          control
SBJ326   3.4061     case             control              SBJ374   10.4185    control          control
SBJ327   -3.8970    control          control              SBJ375   -6.5147    control          control
SBJ328   -0.7222    control          control              SBJ376   2.6958     case             case
SBJ329   2.3875     case             case                 SBJ377   -5.1208    control          control
SBJ330   -1.6660    control          control              SBJ378   4.0404     case             case
SBJ331   -1.2825    control          control              SBJ379   3.0357     case             case
SBJ332   -8.4760    control          control              SBJ380   10.3597    case             case
SBJ333   -3.8589    control          control              SBJ381   -3.2779    control          control
SBJ334   -3.9578    control          control              SBJ382   -6.5229    control          control
SBJ335   -2.6746    control          case                 SBJ383   14.4020    case             control
SBJ336   -5.2540    control          control              SBJ384   -6.7897    control          control
SBJ337   2.9959     case             case                 SBJ385   24.2026    control          case
SBJ338   -5.6114    control          control              SBJ386   -8.7277    control          control
SBJ339   -2.9495    control          control              SBJ387   6.4477     case             case
SBJ340   -4.8496    control          control              SBJ388   -2.8463    control          control
SBJ341   -9.7494    control          control              SBJ389   -8.2789    control          control
SBJ342   -8.3938    control          control              SBJ390   -3.9892    control          control
SBJ343   -6.4816    control          control              SBJ391   -9.9261    control          control
SBJ344   5.4095     case             case                 SBJ392   3.9925     case             case
SBJ345   7.5053     case             case                 SBJ393   -4.5450    control          control
SBJ346   -7.3492    control          control              SBJ394   -0.3773    control          control
SBJ347   -11.2632   control          control              SBJ395   -4.7587    control          control
SBJ348   124.7669   case             control              SBJ396   -3.9212    control          control

Claims (17)

1. A method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass;
(c) storing those signals whose masses correlate with a reference biomarker;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(e) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(f) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
2. A method according to claim 1, wherein the test sample is subjected to MS analysis without prior separation procedures.
3. A method according to claim 2, wherein the test sample is analysed by direct infusion using static nano-electrospray principles, flow injection analysis or flow injection with sample enrichment.
4. A method according to claim 1, wherein the test sample is processed prior to MS analysis.
5. A method according to claim 4, wherein the sample processing comprises sample separation by single- or multi-phase high-pressure liquid chromatography (HPLC).
6. A method according to any preceding claim, wherein the MS is electrospray ionisation (ESI) MS, matrix-assisted laser desorption ionisation - time of flight (MALDI-TOF) MS or surface enhanced laser desorption ionisation - time of flight (SELDI-TOF) MS.
7. A method according to any preceding claim, wherein reference mass and MS spectral data for a plurality of biomarkers are stored in electronic or paper form.
8. A method according to any preceding claim, wherein reference MS spectra for a defined biomarker are averaged spectra from actual and measured data obtained by a clustering calculation.
9. A method according to any preceding claim, wherein one or more internal standards of reference peptides are added to the sample prior to analysis by MS.
10. A method according to claim 9, wherein the internal standards are labelled with a molecular tag.
11. A method according to claim 9, wherein the internal standards are labelled and included in the master data set.
12. A method according to claim 11, wherein the absolute signal intensity is scored by measuring the biomarker signal intensity and comparing it to the signal intensity of one or more known internal standards.
13. A method according to any one of claims 1 to 9, wherein the sample is processed without the addition of internal standards.
14. A method according to claim 13, wherein the relative signal intensity is scored by measuring the ratio between the individual biomarker signal intensities in a patient and the reference signal intensity for a patient group.
15. A method according to claim 13, which is fully automated.
16. A method according to claim 1, wherein the discrimination function to calculate the score from MS signal intensity optionally includes the use of any clinical variables such as clinical examination results and/or phenotyping of clinical observation and/or medical records.
17. A diagnostic method for determining the presence of a disease which comprises comparing the protein sequence biomarkers of a test sample with reference biomarkers, wherein the reference biomarkers comprise peptides identified in Table 1.
CA002651934A 2006-06-13 2007-06-13 Mass spectrometry biomarker assay Abandoned CA2651934A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0611669.3 2006-06-13
GBGB0611669.3A GB0611669D0 (en) 2006-06-13 2006-06-13 Mass spectrometry biomarker assay
PCT/GB2007/002187 WO2007144606A2 (en) 2006-06-13 2007-06-13 Mass spectrometry biomarker assay

Publications (1)

Publication Number Publication Date
CA2651934A1 true CA2651934A1 (en) 2007-12-21

Family

ID=36745797

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002651934A Abandoned CA2651934A1 (en) 2006-06-13 2007-06-13 Mass spectrometry biomarker assay

Country Status (10)

Country Link
EP (1) EP2035829A2 (en)
JP (1) JP2009540319A (en)
KR (1) KR20090068199A (en)
CN (1) CN101611313A (en)
AU (1) AU2007258970A1 (en)
BR (1) BRPI0711967A2 (en)
CA (1) CA2651934A1 (en)
GB (1) GB0611669D0 (en)
MX (1) MX2008015158A (en)
WO (1) WO2007144606A2 (en)

Also Published As

Publication number Publication date
JP2009540319A (en) 2009-11-19
MX2008015158A (en) 2009-04-30
AU2007258970A1 (en) 2007-12-21
EP2035829A2 (en) 2009-03-18
BRPI0711967A2 (en) 2012-01-24
WO2007144606A2 (en) 2007-12-21
KR20090068199A (en) 2009-06-25
GB0611669D0 (en) 2006-07-19
CN101611313A (en) 2009-12-23
WO2007144606A3 (en) 2008-04-03

Legal Events

Date Code Title Description
FZDE Discontinued