AU2004239416A1 - Methods and applications of biomarker profiles in the diagnosis and treatment of breast cancer - Google Patents

Info

Publication number
AU2004239416A1
Authority
AU
Australia
Prior art keywords
breast
breast cancer
biomolecules
subjects
cancer
Legal status
Abandoned
Application number
AU2004239416A
Inventor
Jorn Meuer
Jan Wiemer
Current Assignee
EUROPROTEOME AG
Original Assignee
EUROPROTEOME AG
Priority claimed from EP03090153A (external priority, patent EP1477803A1)
Application filed by EUROPROTEOME AG
Publication of AU2004239416A1
Abandoned (current legal status)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57438 Specifically defined cancers of liver, pancreas or kidney
    • G01N33/57419 Specifically defined cancers of colon

Description

WO 2004/102188 PCT/EP2004/005292

Methods and Applications of Biomarker Profiles in the Diagnosis and Treatment of Breast Cancer

The present invention provides biomolecules and the use of these biomolecules for the differential diagnosis of breast cancer and/or non-malignant diseases of the breast. In specific embodiments, the biomolecules are characterised by mass profiles generated by contacting a test and/or biological sample with an anion exchange surface under specific binding conditions and detecting said biomolecules using gas phase ion spectrometry. The biomolecules used according to the invention are preferably proteins or polypeptides. Furthermore, preferred test and/or biological samples are blood serum samples of human origin.

BACKGROUND TO THE INVENTION

The incidence of breast cancer, a leading cause of death in women, has been gradually increasing in the United States over the last thirty years. Despite improvements in the rates of screening and early detection, as well as advancements in cancer therapies and improved lifestyles, breast cancer remains the most common cancer (other than skin cancer) among women. In 2003, the number of new cases of breast cancer in women was estimated to be about 212,600.

While the exact mechanism of tumourigenesis for most breast cancers is largely unknown, research has shown that patients exposed to certain risk factors are more likely than others to develop some form of breast cancer. Some of the strongest risk factors include: increasing age, with women over 60 at the highest risk of developing breast cancer; a familial and/or personal history of breast cancer; the reproductive and menstrual history of a woman, including age at menarche (<12 years of age), age at first child-bearing and age at menopause (>55 years of age); hormone replacement therapy; and genetic factors such as the breast cancer gene (BRCA) family.
For example, research has shown that the tumour suppressor genes BRCA1 and BRCA2 contribute to familial breast cancer in 5% to 10% of breast cancer cases. Germ-line mutations within these two loci are associated with a 50 to 85% lifetime risk of breast and/or ovarian cancer [Marcus et al. (1996) Hereditary Breast Cancer: Pathobiology, Prognosis, and BRCA1 and BRCA2 Gene Linkage. Cancer 77:697-709; Casey, G. (1997) The BRCA1 and BRCA2 Breast Cancer Genes. Curr. Opin. Oncol. 9:88-93]. Whereas the cumulative lifetime risk of breast cancer for women who carry a mutant BRCA1 gene is predicted to be approximately 92%, the risk for the non-carrier majority is estimated to be approximately 10%. Other risk factors, such as obesity after menopause, alcohol intake, breast density and ethnicity, have also been linked to the development of breast cancer. Although obesity and alcohol intake are associated with an increased risk, prospective studies have not yet shown that avoiding these risk factors actually prevents the development of the disease.

Currently there are only a handful of treatments available for specific types of breast cancer. Despite scientific and medical advancements, such therapies provide no guarantee of success. For therapies to reach their maximum efficacy, early detection of malignancy, including the ability to differentiate between malignant and non-malignant disease, is required. In addition, a reliable assessment of the cancer's severity is needed. For example, patients diagnosed with early breast cancer have a five-year relative survival rate greater than 90%, as compared to about 20% for patients diagnosed with distantly metastasised breast cancer (American Cancer Society statistics).

Currently, the best initial detection methods for early breast cancer are palpation of the breast (physical examination) and mammography.
Although a physical examination of the breast may be a very good initial indicator, this diagnostic method must be used in parallel with other methods, since detected lesions may be either benign or malignant, and some lesions are too small to detect by palpation alone. Mammography, in contrast, is able to detect a breast tumour before it can be discovered by physical examination, but this diagnostic method has its own limitations. For example, mammography's predictive value depends on the observer's skill and the quality of the mammogram. In addition, 80 to 93% of suspicious mammograms are false positives, and 10 to 15% of women with breast cancer have false negative mammograms. Clearly, new diagnostic methods that offer more sensitive and specific detection of early breast cancer are needed.

Not only should such methods offer more sensitive and specific detection of early breast cancer, they should also be able to determine the stage to which the patient's disease has progressed; stage determination has potential prognostic value and provides criteria for designing optimal therapy. The advantage of pathological staging of breast cancer over clinical staging is that it provides a more accurate prognosis of the disease, the disadvantage being that this method is invasive. Conversely, clinical staging could become the more attractive approach if it were at least as accurate as pathological staging, since it does not depend on an invasive procedure to obtain tissue for evaluation.

Early detection and staging of breast cancer could be improved by detecting new markers in serum or urine. Such markers could be mRNA or protein markers expressed by cells originating from the primary tumour in the breast but residing in blood, bone marrow or lymph nodes, and could serve as sensitive indicators of tumour development and/or metastasis to these distal organs.
For example, specific protein antigens and mRNA associated with breast epithelial cells have been detected by immunohistochemical techniques and RT-PCR, respectively, in the bone marrow, lymph nodes and blood of breast cancer patients.

Currently, the serum tumour markers most commonly used for breast cancer detection are carcinoembryonic antigen (CEA) and CA 15-3. Limitations of CEA include the absence of elevated serum levels in about 40% of women with metastatic disease. CA 15-3 has a similar limitation, since this marker can also be negative in a significant number of patients with progressive disease and therefore fails to predict metastasis. Furthermore, both CEA and CA 15-3 can be elevated in non-malignant, benign conditions, giving rise to false positive results. These serum tumour markers evidently lack the sensitivity and specificity required to be effective in detecting early-stage breast cancer in a large population, reaching only 23% sensitivity and 69% specificity. In addition, the US Food and Drug Administration has approved the tumour markers CA 15-3 and CA 27.29 only for monitoring therapeutic treatment in cases of advanced-stage breast cancer. Clearly, new serological biomarkers that could be used individually, or in combination with an existing modality, for cost-effective screening of breast cancer are still urgently needed.

Currently, many groups are using proteomic technologies to compare protein levels in breast cancer patients with those in non-diseased subjects, in the hope of discovering such new serological biomarkers. Until recently, the standard method of proteome analysis was two-dimensional (2D) gel electrophoresis, which is an invaluable tool for the separation and identification of biomarkers. This method is also effective for identifying aberrantly expressed proteins in a variety of tissue samples.
Unfortunately, the analysis of data generated by 2D gel electrophoresis is labour-intensive and requires large quantities of material, rendering it impractical for routine clinical use.

A more practical tool has become available with the introduction of SELDI (surface-enhanced laser desorption/ionisation), a modification of MALDI-TOF (matrix-assisted laser desorption/ionisation time-of-flight) mass spectrometry that allows the simultaneous analysis of multiple biomarkers within one sample. Small amounts of biomarkers can be bound directly to a biochip carrying spots with different types of chromatographic material, including those with hydrophobic, hydrophilic, cation-exchanging and anion-exchanging characteristics. This approach has proven very useful for identifying biomarkers and biomarker patterns (profiles) in various biological fluids (Ciphergen Inc.).
To date, specific serological biomarkers for the detection of breast cancer have been identified using the above-mentioned SELDI technology (patents WO0223200 and WO03058198 from Ciphergen). Unfortunately, due to the nature of the sample testing, the biomarkers identified can only be used to diagnose a patient as having breast cancer versus not having the disease at all. For example, whereas the test samples analysed in WO03058198 (Ciphergen) and WO0223200 (Ciphergen) were taken from patients with late-stage breast cancer (stages III and IV), the control samples were taken from patients with undetectable breast cancer. The biomarkers identified are not grade-specific and cannot detect the disease at its earliest stages (stages I and II); they therefore do not allow effective patient-specific diagnosis and/or treatment of the disease. Moreover, serological biomarkers that can specifically differentiate between the presence of a given breast cancer and a non-malignant disease of the breast have not yet been identified.

Again, there is a critical need to develop a simple, non-invasive, reliable and inexpensive method for the effective detection of breast cancer at its early stages. Preferably, such a diagnostic method should be able to detect early-stage breast cancer as well as distinguish between the later stages or grades of the disease. Furthermore, such a diagnostic tool should be able to differentiate between breast cancer and a non-malignant disease of the breast. With such valuable information, clinicians would be able to tailor patient therapies for optimum treatment of the disease.

The present invention addresses this need with the development of a non-invasive diagnostic tool for the differential diagnosis of breast cancer and/or a non-malignant disease of the breast.
SUMMARY OF THE INVENTION

The present invention relates to methods for the differential diagnosis of breast cancer and/or non-malignant diseases of the breast by detecting one or more differentially expressed biomolecules within a test sample of a given subject and comparing the results with samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, or subjects having a non-malignant disease of the breast, wherein the comparison allows for the differential diagnosis of the subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer, or having a non-malignant disease of the breast.

The present invention provides an in vitro method for the differential diagnosis of breast cancer and/or a non-malignant disease of the breast, comprising obtaining a test sample from a subject, contacting the test sample with a biologically active surface under specific binding conditions, allowing biomolecules present within the test sample to bind to the biologically active surface, detecting one or more bound biomolecules using mass spectrometry, thereby generating a mass profile of said test sample, transforming the data into a computer-readable form, and comparing said mass profile against a database containing mass profiles specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having metastasised breast cancer, or subjects having a non-malignant disease of the breast, wherein the comparison allows for the differential diagnosis of the subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer, or having a non-malignant disease of the breast.
In one embodiment the invention provides a database comprising mass profiles of biological samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, or subjects having a non-malignant disease of the breast.

Within the same embodiment, the database is generated by obtaining biological samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, and subjects having non-malignant diseases of the breast; contacting said biological samples with a biologically active surface under specific binding conditions; allowing the biomolecules within the biological samples to bind to said biologically active surface; detecting one or more bound biomolecules using mass spectrometry, thereby generating a mass profile of said biological samples; transforming the data into a computer-readable form; and applying a mathematical algorithm to classify the mass profiles as specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having metastasised breast cancer, and subjects having a non-malignant disease of the breast.
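The comparison of a test mass profile against the class-specific profiles in the database can be illustrated with a minimal sketch. It assumes that every mass profile has already been reduced to a vector of peak intensities over one fixed, shared list of peak clusters, and it uses a nearest-centroid rule (Euclidean distance to the mean profile of each class). This rule is a stand-in chosen for brevity, not the patent's algorithm; the figures describe a random forest classifier. The function names and toy intensities are hypothetical.

```python
import math
from statistics import fmean


def class_centroids(database):
    """Average the reference profiles of each class.

    `database` maps a class label (e.g. "healthy", "breast_cancer") to a list
    of profiles; each profile is a list of peak intensities over the same,
    fixed set of peak clusters.
    """
    return {
        label: [fmean(column) for column in zip(*profiles)]
        for label, profiles in database.items()
    }


def classify(profile, centroids):
    """Return the class label whose centroid is closest (Euclidean) to `profile`.

    Nearest-centroid rule: an illustrative stand-in, not the classifier
    described in the figures.
    """
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Toy reference database with invented intensities for three peak clusters.
database = {
    "healthy": [[1.0, 0.2, 3.0], [1.2, 0.1, 2.8]],
    "breast_cancer": [[0.3, 1.5, 1.0], [0.5, 1.7, 1.2]],
}
centroids = class_centroids(database)
print(classify([0.4, 1.4, 1.1], centroids))  # breast_cancer
```

In practice such a database would hold one entry per diagnostic group named in the text (healthy, precancerous lesion, breast cancer, metastasised breast cancer, non-malignant disease).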
In specific embodiments, the present invention provides biomolecules having a molecular mass selected from the group consisting of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, and 28313 Da ± 142 Da. The biomolecules having said molecular masses are detected by contacting a test and/or biological sample with a biologically active surface comprising an adsorbent under specific binding conditions and are further analysed by gas phase ion spectrometry. Preferably the adsorbent comprises positively charged quaternary ammonium groups (anion exchange surface).

In specific embodiments, the invention provides specific binding conditions for the detection of biomolecules within a sample.
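Each listed biomolecule is specified as a mass with a tolerance window, and the tolerance is roughly 0.5% of the mass (e.g. 1506 Da ± 8 Da, 28313 Da ± 142 Da). A minimal sketch of assigning observed peaks to these windows follows. It uses only a handful of the listed (mass, tolerance) pairs, and the function name is hypothetical. Because some windows overlap (13776 Da ± 69 Da and 13812 Da ± 69 Da, for example), a peak may match more than one listed mass, so the sketch returns every hit.

```python
# A subset of the (mass, tolerance) pairs listed above, in Da.
BIOMARKER_WINDOWS = [
    (1506, 8),
    (1533, 8),
    (12656, 63),
    (13776, 69),
    (13812, 69),
    (28313, 142),
]


def match_peaks(observed_masses, windows=BIOMARKER_WINDOWS):
    """Map each observed peak mass (Da) to the listed masses whose window
    (mass - tolerance .. mass + tolerance) contains it.
    Peaks that fall in no window are left out of the result."""
    matches = {}
    for observed in observed_masses:
        hits = [mass for mass, tol in windows if abs(observed - mass) <= tol]
        if hits:
            matches[observed] = hits
    return matches


# 12656.2 is the signal highlighted in Figure 1; 13800.0 falls in two windows.
print(match_peaks([12656.2, 13800.0, 5000.0]))
# {12656.2: [12656], 13800.0: [13776, 13812]}
```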
In preferred embodiments, a sample is diluted 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine, and then diluted again 1:10 in a binding buffer consisting of 0.1 M Tris-HCl and 0.02% Triton X-100 at pH 8.5, at 0 to 4 °C. The treated sample is then contacted with a biologically active surface comprising positively charged (cationic) quaternary ammonium groups (anion exchanging) and incubated for 120 minutes at 20 to 24 °C, and the bound biomolecules are detected using gas phase ion spectrometry.

In an alternative embodiment, the invention provides a method for the differential diagnosis of breast cancer and/or a non-malignant disease of the breast comprising detecting one or more differentially expressed biomolecules within a sample. This method comprises obtaining a test sample from a subject, contacting said sample with a binding molecule specific for a differentially expressed polypeptide, and detecting an interaction between the binding molecule and its specific polypeptide, wherein the detection of an interaction indicates the presence or absence of said polypeptide, thereby allowing for the differential diagnosis of the subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer, or having a non-malignant disease of the breast. Preferably, the binding molecules are antibodies specific for said polypeptides.
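The 1:5 and 1:10 dilutions described for the preferred binding conditions combine multiplicatively. The sketch below works through the arithmetic under one stated assumption: that "1:n" means one volume of sample in n total volumes. Under the other common reading (one volume of sample plus n volumes of buffer), the overall factor would be 1/6 × 1/11 = 1/66 instead of 1/50. The helper name is hypothetical.

```python
from fractions import Fraction


def serial_dilution(*steps):
    """Fraction of the original sample remaining after successive 1:n dilutions.

    ASSUMPTION: '1:n' means one volume of input in n total volumes.
    """
    fraction = Fraction(1)
    for n in steps:
        fraction /= n
    return fraction


# Serum after the 1:5 denaturation step and the 1:10 binding-buffer step.
serum_fraction = serial_dilution(5, 10)

# Urea carried over from the 7 M denaturation buffer: the buffer is 4/5 of the
# first mixture, and that mixture is then diluted 1:10 into binding buffer.
urea_molar = 7 * Fraction(4, 5) * serial_dilution(10)

print(serum_fraction)     # 1/50
print(float(urea_molar))  # 0.56
```

Under this reading, the serum is diluted 50-fold overall, and about 0.56 M urea remains in the mixture applied to the anion exchange surface.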
The biomolecules related to the invention, having a molecular mass selected from the group consisting of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da, may include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies, carbohydrates, lipids, and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins). Preferably said biomolecules are proteins, polypeptides, or fragments thereof.
In yet another embodiment, the invention provides a method for the identification of biomolecules within a sample, provided that the biomolecules are proteins, polypeptides or fragments thereof, comprising: chromatography and fractionation; analysis of the fractions for the presence of said differentially expressed proteins and/or fragments thereof using a biologically active surface; further analysis using mass spectrometry to obtain the amino acid sequences of said proteins and/or fragments thereof; and searching amino acid sequence databases of known proteins to identify said differentially expressed proteins by amino acid sequence comparison. Preferably the method of chromatography is high performance liquid chromatography (HPLC) or fast protein liquid chromatography (FPLC). Furthermore, the mass spectrometry used is selected from the group of matrix-assisted laser desorption ionisation/time of flight (MALDI-TOF), surface enhanced laser desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS, or ESI-MS.

Furthermore, the invention provides kits for the differential diagnosis of breast cancer and/or a non-malignant disease of the breast.
The test or biological samples used according to the invention may be of blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract origin. Preferably, the test and/or biological samples are blood serum samples and are isolated from subjects of mammalian origin, preferably of human origin.

DESCRIPTION OF FIGURES

Figure 1. Comparison of protein mass spectra detected using the above-mentioned SAX2 ProteinChip arrays for samples isolated from patients with breast cancer (T1 and T2) and from patients suffering from non-malignant diseases of the breast (C1 and C2). A shows an overview of the mass range 11 - 20 kDa. B shows the boxed section of A. The mass signal at m/z 12,656.2 is highlighted. Its variable importance score ranks second within the classifier.

Figure 2A. Development of out-of-bag error. During the training process of the final classifier, the out-of-bag error decreased to about 27%. The out-of-bag error is typically higher than the resulting test error, as class assignment is conducted on the basis of only about 1/3 of the generated trees.

Figure 2B. Out-of-bag estimation of ROC curve for the final classifier. The out-of-bag estimates of sensitivity and specificity presented in Table 3 are extended over the entire range of sensitivity and specificity. This is done by varying the percentage of decision trees voting "positive" that is necessary for assigning a case to class "positive". The diagonal represents the average random classifier, which assigns cases randomly to class "positive" or "negative". The circle marks the pair of sensitivity and specificity of Table 3.

Figure 3. Decision tree complexity. The histogram visualizes the distribution of decision tree complexity in the final random forest classifier.
Here, decision tree complexity is measured by the number of terminal nodes.

Figure 4. Voting distribution. The histogram shows how frequently trees of the final classifier vote for class "positive". For each case (patient), only the votes of those trees for which the case is "out-of-bag" are collected. For each case, votes are normalized as follows: (number of votes for class "positive" - number of votes for class "negative") / (number of trees for which the case is "out-of-bag"). Dashed vertical lines correspond to quantiles at 0%, 25%, 50%, 75%, and 100%.
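The out-of-bag quantities in Figures 2A, 2B and 4 can be sketched in a few functions. The sketch assumes that, for each case, the votes of the trees for which that case was out-of-bag are already available (True for "positive"); training the forest itself is not shown. `normalized_vote` follows the Figure 4 formula, `roc_points` follows the Figure 2B procedure of varying the fraction of positive votes needed to call a case "positive", and `expected_oob_fraction` shows where the "about 1/3 of the generated trees" in Figure 2A comes from, assuming standard bootstrap sampling of n cases with replacement. All names are hypothetical.

```python
def normalized_vote(oob_votes):
    """Figure 4 normalization for one case.

    `oob_votes` holds only the votes of trees for which this case was
    out-of-bag (True = "positive"). The result lies in [-1, 1].
    """
    positive = sum(oob_votes)
    negative = len(oob_votes) - positive
    return (positive - negative) / len(oob_votes)


def roc_points(cases, thresholds):
    """(sensitivity, specificity) pairs for each threshold, where a case is
    called "positive" when the fraction of its out-of-bag trees voting
    "positive" is at least the threshold (Figure 2B procedure).

    `cases` is a list of (oob_votes, truly_positive) pairs; both classes
    must be present, otherwise a denominator below would be zero.
    """
    points = []
    for threshold in thresholds:
        tp = fn = tn = fp = 0
        for votes, truly_positive in cases:
            called_positive = sum(votes) / len(votes) >= threshold
            if truly_positive:
                tp += called_positive
                fn += not called_positive
            else:
                fp += called_positive
                tn += not called_positive
        points.append((tp / (tp + fn), tn / (tn + fp)))
    return points


def expected_oob_fraction(n):
    """Probability that one case is absent from a bootstrap sample of n cases
    drawn with replacement (ASSUMED sampling scheme). Tends to 1/e, about
    0.37, so each case is out-of-bag for roughly 1/3 of the trees."""
    return (1 - 1 / n) ** n


cases = [([True, True, False], True), ([False, False, True], False)]
print(normalized_vote([True, True, True, False]))  # 0.5
print(roc_points(cases, [0.0, 0.5, 1.01]))  # [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Since the normalized vote equals 2 × (fraction of positive votes) - 1, thresholding either quantity traces the same ROC curve.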
Figure 5A - E. Scatter plots of peak clusters belonging to differentially expressed proteins included in the classifier. Peak clusters are aligned along the vertical axis, e.g. M1516.00 denotes the peak cluster with characteristic mass 1516 Da. The horizontal axis shows the raw relative signal intensity of the peaks in the examined serum samples. Here, "raw" refers to the non-logarithmic and not additionally normalized intensities; see Figures 6 and 7 for further intensity transformations.
• T (Tumour): Breast cancer & DCIS patients' serum samples.
• C (Control): Healthy & diseased control patients' serum samples.

Figure 6A - E. Scatter plots of peak clusters belonging to differentially expressed proteins included in the classifier. Peak clusters are aligned along the vertical axis, e.g. M1516.00 denotes the peak cluster with characteristic mass 1516 Da. The horizontal axis shows the logarithmic normalized relative signal intensity of the peaks in the examined serum samples. For each mass, intensities were first shifted to entirely positive values and then normalized by dividing the intensity values by the average intensity of that mass. Finally, the base 2 logarithm was taken. Accordingly, a logarithmic normalized relative intensity of zero corresponds to the mean peak cluster intensity, and logarithmic normalized relative intensities of +1 and -1 correspond to two-fold over- and under-expression relative to the mean peak cluster intensity, respectively.
• T (Tumour): Breast cancer & DCIS patients' serum samples.
• C (Control): Healthy & diseased control patients' serum samples.

Figure 7A - E. Additionally scaled scatter plots of peak clusters belonging to differentially expressed proteins included in the classifier. Peak clusters are aligned along the vertical axis, e.g. M1516.00 denotes the peak cluster with characteristic mass 1516 Da.
As in Figure 6, the horizontal axis shows the logarithmic normalized relative signal intensity of the peaks in the examined serum samples. However, intensities were additionally (shifted and) scaled so that the intensities of each peak cluster cover the entire horizontal range. Thereby, the minimum and maximum intensities of all masses are aligned on the left and right edges of the plot, respectively. This better visualizes the extent of class overlap.
• T (Tumour): Breast cancer & DCIS patients' serum samples.
• C (Control): Healthy & diseased control patients' serum samples.

DESCRIPTION OF THE INVENTION

It is to be understood that the present invention is not limited to the particular materials, methods or equipment described, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.
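The intensity transformations described for Figures 6 and 7 can be sketched as follows. The captions do not state how far the intensities were shifted to make them positive, so the shift rule here (add -minimum + offset when the minimum is not positive) is an assumption, as are the function names. Applying `min_max_scale` to the output of `log_normalize` reproduces the additional scaling used for Figure 7, which places each peak cluster's minimum and maximum at the left and right edges of the plot.

```python
import math
from statistics import fmean


def log_normalize(intensities, offset=1.0):
    """Figure 6-style transform for the intensities of one peak cluster.

    1. Shift to entirely positive values. ASSUMPTION: if the minimum is not
       positive, add (-minimum + offset); otherwise leave values unchanged.
    2. Divide by the mean of the shifted values.
    3. Take log2, so 0 is the cluster mean and +1 / -1 are two-fold
       over- / under-expression.
    """
    low = min(intensities)
    shift = -low + offset if low <= 0 else 0.0
    shifted = [x + shift for x in intensities]
    mean = fmean(shifted)
    return [math.log2(x / mean) for x in shifted]


def min_max_scale(values):
    """Figure 7-style scaling: smallest value -> 0.0, largest -> 1.0."""
    low, high = min(values), max(values)
    if high == low:  # constant cluster: nothing to spread out
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


raw = [1.0, 2.0, 4.0, 1.0]  # invented intensities for one peak cluster
print(log_normalize(raw))  # [-1.0, 0.0, 1.0, -1.0]
print(min_max_scale(log_normalize(raw)))  # [0.0, 0.5, 1.0, 0.0]
```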
It should be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "an antibody" is a reference to one or more antibodies and derivatives thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Although any materials and methods, or equipment, comparable to those specifically described herein can be used to practice or test the present invention, the preferred equipment, materials and methods are described below. All publications mentioned herein are cited for the purpose of describing and disclosing protocols, reagents, and current state-of-the-art technologies that might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to precede such disclosure by virtue of prior invention.

Definitions

The term "biomolecule" refers to a molecule produced by a cell or living organism. Such molecules include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, nucleic acids, polynucleotides, polypeptides, proteins, carbohydrates, lipids, and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins). Furthermore, the terms "nucleotide" or "polynucleotide" refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These terms also refer to DNA or RNA of genomic or synthetic origin, which may be single-stranded or double-stranded and may represent the sense or the antisense strand, as well as to peptide nucleic acids (PNAs) and to any DNA-like or RNA-like material.
The term "fragment" refers to a portion of a polypeptide (parent) sequence that comprises at least 10 consecutive amino acid residues and retains a biological activity and/or some functional characteristics of the parent polypeptide, e.g. antigenicity or structural domain characteristics.

The terms "biological sample" and "test sample" refer to all biological fluids and excretions isolated from any given subject. In the context of the invention such samples include, but are not limited to, blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract samples.

The term "specific binding" refers to the binding reaction between a biomolecule and a specific "binding molecule". Binding molecules related to the invention include, but are not limited to, proteins, peptides, nucleotides, nucleic acids, hormones, amino acids, sugars, fatty acids, steroids, polynucleotides, carbohydrates, lipids, or a combination thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins). Furthermore, a binding reaction is considered to be specific when the interaction between said molecules is substantial. In the context of the invention, a binding reaction is considered substantial when the reaction that takes place between said molecules is at least two times the background. Moreover, the term "specific binding conditions" refers to reaction conditions, such as pH, salt, detergent and other conditions known to those skilled in the art, that permit the binding of said molecules.

The term "interaction" relates to the direct or indirect binding or alteration of biological activity of a biomolecule.

The term "differential diagnosis" refers to a diagnostic decision between healthy and different disease states, including various stages of a specific disease.
A subject is diagnosed as healthy or as suffering from a specific disease, or a specific stage of a disease, based on a set of hypotheses that allow for the distinction between healthy and one or more stages of the disease. The choice between healthy and one or more stages of disease depends on a significant difference between the hypotheses. Under the same principle, a "differential diagnosis" may also refer to a diagnostic decision between one disease type and another (e.g. breast cancer vs. a non-malignant disease of the breast).

The term "breast cancer" refers to a malignant neoplastic lesion of the breast within a given subject, wherein the neoplasm is defined according to its type, stage and/or grade. The various stages of a cancer may be identified using staging systems known to those skilled in the art [e.g. the Union Internationale Contre le Cancer (UICC) system or the American Joint Committee on Cancer (AJCC) system]. It is to be understood that "breast cancer" is also referred to as "mammary cancer" or "a carcinoma of the breast". Within the context of the invention, breast cancer includes both in situ (non-invasive) and invasive breast cancers. In situ (non-invasive) breast cancers include ductal and lobular carcinoma in situ (DCIS and LCIS, respectively), whereas invasive breast cancers encompass infiltrating diseases such as invasive ductal, lobular and papillary carcinomas, and medullary, colloid and tubular carcinomas.

The term "a non-malignant disease of the breast" refers to a lesion of the breast that does not exhibit the malignant neoplastic physiological, biochemical and/or morphological properties known to those skilled in the art. Such diseases include, but are not limited to, inflammatory and proliferative lesions, fibrocystic changes within mammary tissue, as well as benign disorders of the breast.
Within the context of the invention, inflammatory lesions encompass acute, periductal and granulomatous mastitis, duct ectasia and fat necrosis, whereas proliferative lesions include epithelial hyperplasia (atypical ductal and lobular hyperplasia), sclerosing adenosis and small duct papillomas. Also included in the invention are benign disorders of the glandular tissue (mastopathy), papillomas (large duct, intraductal) and fibroadenomas.

The term "healthy individual" refers to a subject possessing good health. Such a subject demonstrates an absence of any disease within the breast, preferably an absence of a non-malignant disease of the breast or breast cancer.

The term "precancerous lesion of the breast" refers to a biological change within the breast such that it becomes susceptible to the development of a cancer. More specifically, a precancerous lesion of the breast is a preliminary stage of a breast cancer. Causes of a precancerous lesion may include, but are not limited to, genetic predisposition and exposure to cancer-causing agents (carcinogens); such cancer-causing agents include agents that cause genetic damage and induce neoplastic transformation of a cell. Furthermore, the phrase "neoplastic transformation of a cell" refers to an alteration in normal cell physiology and includes, but is not limited to, self-sufficiency in growth signals, insensitivity to growth-inhibitory (anti-growth) signals, evasion of programmed cell death (apoptosis), limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis.

The phrase "differentially present" refers to differences in the quantity of a biomolecule (of a particular apparent molecular mass) present in a sample from a subject as compared to a comparable sample.
For example, a biomolecule may be present at an elevated level, present at a decreased level, or absent in samples of subjects having breast cancer compared to samples of subjects who do not have a cancer of the breast. Therefore, in the context of the invention, the term "differentially present biomolecule" refers to the quantity of the biomolecule (of a particular apparent molecular mass) present within a sample taken from a subject having a breast cancer or a non-malignant disease of the breast as compared to a comparable sample taken from a healthy subject. Within the context of the invention, a biomolecule is differentially present between two samples if the quantity of said biomolecule in one sample is significantly different (defined statistically) from the quantity of said biomolecule in the other sample.

The term "diagnostic assay" can be used interchangeably with "diagnostic method" and refers to the detection of the presence or nature of a pathologic condition. Diagnostic assays differ in their sensitivity and specificity. Within the context of the invention, the sensitivity of a diagnostic assay is defined as the percentage of diseased subjects who test positive for a breast cancer or a non-malignant disease of the breast; these subjects are considered "true positives". Subjects having either a breast cancer or a non-malignant disease of the breast who are not detected by the diagnostic assay are considered "false negatives". Subjects who show no disease, whether a breast cancer or a non-malignant disease of the breast, and who test negative in the diagnostic assay are considered "true negatives". Furthermore, the specificity of a diagnostic assay, as used herein, is defined as 1 minus the false positive rate, where the "false positive rate" is defined as the proportion of those subjects free of a non-malignant disease of the breast or a breast cancer who nevertheless test positive in said assay.
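The definitions of sensitivity and specificity above reduce to simple counts over a labelled validation set. The following Python sketch is illustrative only; the function name and the example labels are hypothetical and are not taken from the patent.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    y_true: 1 = diseased (breast cancer or non-malignant breast disease), 0 = disease-free.
    y_pred: 1 = assay positive, 0 = assay negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    specificity = 1 - false_positive_rate  # identical to tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical assay results for 4 diseased and 4 disease-free subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # sensitivity = 75%, specificity = 75%
```

Here one diseased subject is missed (a false negative) and one disease-free subject is flagged (a false positive), so both figures are 3 out of 4.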
The term "adsorbent" refers to any material that is capable of accumulating (binding) a given biomolecule. The adsorbent typically coats a biologically active surface and is composed of a single material or a plurality of different materials that are capable of binding a biomolecule. Such materials include, but are not limited to, anion exchange materials, cation exchange materials, metal chelators, polynucleotides, oligonucleotides, peptides, antibodies, etc.

The term "biologically active surface" refers to any two- or three-dimensional extension of a material that biomolecules can bind to, or interact with, due to the specific biochemical properties of this material and those of the biomolecules. Such biochemical properties include, but are not limited to, ionic character (charge), hydrophobicity or hydrophilicity.

The term "binding molecule" refers to a molecule that displays an affinity for another molecule. Within the context of the invention, such molecules may include, but are not limited to, nucleotides, amino acids, sugars, fatty acids, steroids, nucleic acids, polypeptides, carbohydrates, lipids, and combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins). Preferably, such binding molecules are antibodies.

The term "solution" refers to a homogeneous mixture of two or more substances. Solutions may include, but are not limited to, buffers, substrate solutions, elution solutions, wash solutions, detection solutions, standardisation solutions, chemical solutions, solvents, etc. Furthermore, other solutions known to those skilled in the art are also included herein.

The term "mass profile" refers to a mass spectrum as a characteristic property of a given sample or group of samples. Such a profile, when compared to the mass profile of a second sample or group of samples, allows for differentiation between the two samples. In the context of
In the context of 35 the invention, the mass profile is obtained by treating the biological sample as follows. The sample is diluted it 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1%. DTT, and 2% ampholine and subsequently diluted 1:10 in binding buffer WO 2004/102188 PCT/EP2004/005292 consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5. Thus pre-treated sample is applied to a biologically active surface comprising positively charged quaternary ammonium groups (anion exchange surface) and incubated for 120 minutes. The biomolecules bound to the surface are analysed by gas phase ion spectrometry as described in another section. All but the dilution 5 steps are performed at 20 to 24C. Dilution steps are performed at 0 to 4*C. The phrase "apparent molecular mass" refers to the molecular mass value in Dalton (Da) of a biomolecule as it may appear in a given method of investigation, e.g. size exclusion chromatography, gel electrophoresis, or mass spectrometry. 10 The term "chromatography" refers to any method of separating biomolecules within a given sample such that the original native state of a given biomolecule is retained. Separation of a biomolecule from other biomolecules within a given sample for the purpose of enrichment, purification and/or analysis, may be achieved by methods including, but not limited to, size 15 exclusion chromatography, ion exchange chromatography, hydrophobic and hydrophilic interaction chromatography, metal affinity chromatography, wherein "metal" refers to metal ions (e.g. nickel, copper, gallium, or zinc) of all chemically possible valences, or ligand affmity chromatography wherein "ligand" refers to binding molecules, preferably proteins, antibodies, or DNA. Generally, chromatography uses biologically active surfaces as adsorbents to 20 selectively accumulate certain biomolecules. 
The term "mass spectrometry" refers to a method comprising employing an ionization source to generate gas phase ions from a biological entity of a sample presented on a biologically active surface, and detecting the gas phase ions with a mass spectrometer.

The phrase "laser desorption mass spectrometry" refers to a method comprising the use of a laser as an ionization source to generate gas phase ions from a biomolecule presented on a biologically active surface, and detecting the gas phase ions with a mass spectrometer.

The term "mass spectrometer" refers to a gas phase ion spectrometer that includes an inlet system, an ionisation source, an ion optic assembly, a mass analyser and a detector.

Within the context of the invention, the terms "detect", "detection" or "detecting" refer to the identification of the presence, absence or quantity of a biomolecule.

The term "energy absorbing molecule" or "EAM" refers to a molecule that absorbs energy from an energy source in a mass spectrometer, thereby enabling desorption of a biomolecule from a biologically active surface. Cinnamic acid derivatives, sinapinic acid and dihydroxybenzoic acid are frequently used as energy absorbing molecules in laser desorption of biomolecules. See U.S. Pat. No. 5,719,060 (Hutchens & Yip) for a further description of energy absorbing molecules.

The term "training set" refers to a subset of the entire available data set. This subset is typically randomly selected and is used solely for the purpose of classifier construction. The term "test set" refers to the subset of the entire available data set consisting of those entries not included in the training set. Test data are used to evaluate classifier performance.

The term "decision tree" refers to a flow-chart-like tree structure employed for classification. Decision trees consist of repeated splits of a data set into subsets.
Each split consists of a simple rule applied to one variable, e.g. "if the value of 'variable 1' is larger than 'threshold 1' then go left, else go right". Accordingly, the given feature space is partitioned into a set of rectangles, with each rectangle assigned to one class.

The terms "ensemble", "tree ensemble" or "ensemble classifier" can be used interchangeably and refer to a classifier that consists of many simpler elementary classifiers; e.g. an ensemble of decision trees is a classifier consisting of decision trees. The result of the ensemble classifier is obtained by combining all the results of its constituent classifiers, e.g. by majority voting that weights all constituent classifiers equally. Majority voting is especially reasonable in the case of bagging, where constituent classifiers are then naturally weighted by the frequency with which they are generated.

The term "competitor" refers to a variable that can be used as an alternative splitting rule in a decision tree. Within the context of the invention, the competitor is the apparent molecular mass of a given biomolecule. In each step of decision tree construction, only the variable yielding the best data split is selected. Competitors are non-selected variables with similar but lower performance than the selected variable. They point in the direction of alternative decision trees.

The term "surrogate" refers to a splitting rule that closely mimics the action of the primary split. A surrogate is a variable that can substitute for a selected decision tree variable, e.g. in the case of missing values. Not only must a good surrogate split the parent node into descendant nodes similar in size and composition to the primary descendant nodes, it must also match the primary split on the specific cases that go to the left and right child nodes.
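To make the training/test, split-rule and ensemble definitions concrete, here is a minimal Python sketch of a bagged ensemble of one-split decision trees ("stumps") combined by equal-weight majority voting and scored on a held-out test set. It is a toy illustration under stated assumptions, not the classifier used in the patent: the variable names `m8939` and `m4161`, the synthetic intensities and all function names are hypothetical, and real decision-tree software grows deeper trees and also records competitors and surrogates.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly partition rows into a training set and the disjoint remaining test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Fit a one-split tree: 'if x[var] > threshold then label_above else label_below'.

    rows: list of (features, label); features maps variable name -> peak intensity.
    Candidate thresholds are midpoints between consecutive observed values.
    """
    best = None
    for var in rows[0][0]:
        values = sorted({x[var] for x, _ in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            above = [y for x, y in rows if x[var] > threshold]
            below = [y for x, y in rows if x[var] <= threshold]
            label_above, label_below = majority(above), majority(below)
            errors = sum(y != label_above for y in above) + sum(y != label_below for y in below)
            if best is None or errors < best[0]:
                best = (errors, var, threshold, label_above, label_below)
    if best is None:  # no variable takes two distinct values: predict the majority class
        label = majority([y for _, y in rows])
        return (None, 0.0, label, label)
    return best[1:]


def predict_stump(stump, x):
    var, threshold, label_above, label_below = stump
    if var is None:
        return label_below
    return label_above if x[var] > threshold else label_below


def fit_bagged_ensemble(rows, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_ensemble(ensemble, x):
    """Combine the constituent stumps by equal-weight majority voting."""
    return majority(predict_stump(stump, x) for stump in ensemble)


# Synthetic intensities for two hypothetical variables (named after peak masses).
rng = random.Random(42)
rows = []
for _ in range(20):
    rows.append(({"m8939": rng.uniform(6, 10), "m4161": rng.uniform(0, 10)}, "cancer"))
    rows.append(({"m8939": rng.uniform(0, 4), "m4161": rng.uniform(0, 10)}, "non-cancer"))

train, test = train_test_split(rows)
ensemble = fit_bagged_ensemble(train)
accuracy = sum(predict_ensemble(ensemble, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

An odd number of stumps avoids ties in the two-class vote. In this toy data only `m8939` separates the groups, so `m4161` plays the role of a weaker, non-selected variable.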
The terms "peak" and "signal" may be used interchangeably and refer to any signal which is generated by a biomolecule when under investigation using a specific method, for example chromatography, mass spectrometry, or any type of spectroscopy such as Ultraviolet/Visible Light (UV/Vis) spectroscopy, Fourier Transform Infrared (FTIR) spectroscopy, Electron Paramagnetic Resonance (EPR) spectroscopy or Nuclear Magnetic Resonance (NMR) spectroscopy.

Within the context of the invention, the terms "peak" and "signal" refer to the signal generated by a biomolecule of a certain molecular mass hitting the detector of a mass spectrometer, thus generating a signal intensity which correlates with the amount or concentration of said biomolecule in a given sample. A peak or signal is defined by two values: an apparent molecular mass value (m/z) and an intensity value generated as described. The mass value is an elemental characteristic of a biological entity, whereas the intensity value corresponds to a certain amount or concentration of a biological entity with the corresponding apparent molecular mass value; thus "peak" and "signal" always refer to the properties of this biological entity.

The term "cluster" refers to a signal or peak present in a certain set of mass spectra or mass profiles obtained from different samples belonging to two or more different groups (e.g. cancer and non-cancer). Within the set, signals belonging to a cluster can differ in their intensities, but not in their apparent molecular masses.

The term "variable" refers to a cluster which is subjected to a statistical analysis aiming at a classification of samples into two or more different sample groups (e.g. cancer and non-cancer) by using decision trees, wherein the sample feature relevant for classification is the intensity value of the variables in the analysed samples.
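Because measured masses scatter slightly from spectrum to spectrum, forming clusters in practice requires a mass tolerance. The sketch below groups peaks from several spectra using an assumed relative tolerance of 0.5 %, chosen because the ± windows listed for the biomarkers in this document are roughly 0.5 % of each mass; the function name, data layout and example peaks are hypothetical. Each resulting row of intensities corresponds to one "variable" as defined above.

```python
def cluster_peaks(spectra, rel_tol=0.005):
    """Group peaks from several mass spectra into clusters of matching apparent mass.

    spectra: dict sample_id -> list of (mass_in_Da, intensity) peaks.
    Peaks are visited in order of increasing mass; a peak joins the current cluster if it lies
    within rel_tol (relative) of the cluster's first (lowest) mass, otherwise it starts a new one.
    Returns a list of (mean_mass, {sample_id: intensity}) with 0.0 where a sample lacks the peak.
    """
    peaks = sorted(
        (mass, sample_id, intensity)
        for sample_id, peak_list in spectra.items()
        for mass, intensity in peak_list
    )
    clusters = []
    for mass, sample_id, intensity in peaks:
        if clusters and mass <= clusters[-1][0][0] * (1 + rel_tol):
            clusters[-1].append((mass, sample_id, intensity))
        else:
            clusters.append([(mass, sample_id, intensity)])

    table = []
    for members in clusters:
        mean_mass = sum(m for m, _, _ in members) / len(members)
        row = {sample_id: 0.0 for sample_id in spectra}
        for _, sample_id, intensity in members:
            row[sample_id] = max(row[sample_id], intensity)  # keep the strongest peak per sample
        table.append((mean_mass, row))
    return table


# Hypothetical peak lists from three samples.
spectra = {
    "s1": [(4161.0, 12.0), (8939.0, 3.0)],
    "s2": [(4165.0, 10.0)],
    "s3": [(4158.0, 11.0), (8945.0, 2.5)],
}
for mass, row in cluster_peaks(spectra):
    print(f"{mass:.1f} Da: {row}")
```

Sample `s2` has no peak near 8940 Da, so it receives an intensity of 0.0 in that cluster rather than being dropped.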
Detailed Description of the invention

a) Diagnostics

The present invention relates to methods for the differential diagnosis of breast cancer and/or a non-malignant disease of the breast by detecting one or more differentially expressed biomolecules within a test sample of a given subject and comparing the results with samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, or subjects having a non-malignant disease of the breast, wherein the comparison allows for the differential diagnosis of a subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer or having a non-malignant disease of the breast.
In one aspect of the invention, a method for the differential diagnosis of a breast cancer and/or a non-malignant disease of the breast comprises: obtaining a test sample from a given subject; contacting said sample with an adsorbent present on a biologically active surface under specific binding conditions; allowing the biomolecules within the test sample to bind to said adsorbent; detecting one or more bound biomolecules using a detection method, wherein the detection method generates a mass profile of said sample; transforming the mass profile data into a computer-readable form; and comparing the mass profile of said sample with a database containing mass profiles from comparable samples specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, or subjects having a non-malignant disease of the breast. A comparison of mass profiles allows the medical practitioner to determine whether a subject is healthy or has a precancerous lesion of the breast, a breast cancer, a metastasised breast cancer or a non-malignant disease of the breast, based on the presence, absence or quantity of specific biomolecules.
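The comparison step can take many forms, and this passage does not prescribe one. As a hedged illustration only, the sketch below turns a peak list into a fixed-length intensity vector over reference cluster masses (one possible "computer-readable form") and assigns it to the reference group whose mean profile is nearest in Euclidean distance. The group names follow the text, but the reference vectors, masses, tolerance and function names are invented for illustration.

```python
import math


def to_vector(peaks, reference_masses, rel_tol=0.005):
    """Convert a peak list [(mass, intensity), ...] into intensities at fixed reference masses.

    A peak matches a reference mass if it lies within rel_tol (relative) of it; the strongest
    matching peak is kept, and 0.0 marks an absent peak.
    """
    vector = []
    for ref in reference_masses:
        hits = [intensity for mass, intensity in peaks if abs(mass - ref) <= ref * rel_tol]
        vector.append(max(hits) if hits else 0.0)
    return vector


def nearest_group(sample_vector, database):
    """Return the group whose mean reference profile is closest in Euclidean distance.

    database: group name -> list of reference vectors (same layout as sample_vector).
    """
    best_group, best_distance = None, math.inf
    for group, vectors in database.items():
        centroid = [sum(column) / len(column) for column in zip(*vectors)]
        distance = math.dist(sample_vector, centroid)
        if distance < best_distance:
            best_group, best_distance = group, distance
    return best_group


reference_masses = [2053.0, 4161.0, 8939.0]  # illustrative subset of cluster masses
database = {  # invented reference profiles, two per group
    "healthy": [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "breast cancer": [[3.0, 8.0, 0.0], [4.0, 9.0, 1.0]],
    "metastasised breast cancer": [[2.0, 6.0, 9.0], [3.0, 7.0, 8.0]],
}
sample = to_vector([(4163.0, 8.5), (2050.0, 3.2)], reference_masses)
print(sample, nearest_group(sample, database))
```

A nearest-mean rule is only one of many possible comparisons; the decision-tree ensembles defined earlier in this document are another.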
In more than one embodiment, a single biomolecule or a combination of more than one biomolecule selected from the group having an apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da may be detected within a given sample. Detection of a single biomolecule or a combination of more than one biomolecule of the invention depends on the specific sample pre-treatment conditions, the pH of the binding conditions, and the type of biologically active surface used for the detection of biomolecules.
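Assigning an observed peak to one of the listed biomolecules is a window test: listed mass - tolerance ≤ observed mass ≤ listed mass + tolerance. The Python sketch below uses a hypothetical subset of the listed (mass, tolerance) pairs; because some windows overlap (e.g. 13776 Da ± 69 Da and 13812 Da ± 69 Da), it returns every matching marker rather than a single one. The names `MARKERS` and `match_markers` are illustrative.

```python
# Hypothetical subset of the (apparent mass, tolerance) pairs listed above, in Da.
MARKERS = [
    (1506, 8), (2053, 10), (4161, 21), (8939, 45), (9377, 47),
    (10682, 53), (13776, 69), (13812, 69), (14014, 70), (28313, 142),
]


def match_markers(observed_masses, markers=MARKERS):
    """Map each observed peak mass to every listed marker whose window contains it."""
    matches = {}
    for observed in observed_masses:
        hits = [mass for mass, tol in markers if mass - tol <= observed <= mass + tol]
        if hits:  # peaks outside every window are ignored
            matches[observed] = hits
    return matches


print(match_markers([2049.5, 4180.0, 5000.0, 8940.2, 13790.0]))
# {2049.5: [2053], 4180.0: [4161], 8940.2: [8939], 13790.0: [13776, 13812]}
```

The 5000 Da peak falls outside every window in this subset and is therefore not reported.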
For example, prior to the detection of the biomolecules described herein, a given sample is pre-treated by diluting it 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% ampholine. The denatured sample is then diluted 1:10 in a specific binding buffer (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5), applied to a biologically active surface comprising positively charged (cationic) quaternary ammonium groups, and incubated under specific buffer conditions (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5) to allow binding of said biomolecules to the above-mentioned biologically active surface.

According to the invention, a biomolecule with an apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da is detected by: diluting the biological sample 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine, and then 1:10 in binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5, both dilutions at 0 to 4 °C; applying the thus treated sample to a biologically active surface comprising positively charged (cationic) quaternary ammonium groups (anion exchanging); incubating for 120 minutes at 20 to 24 °C; and subjecting the bound biomolecules to gas phase ion spectrometry as described in another section.

A biomolecule of the invention may include any molecule that is produced by a cell or living organism and may have any biochemical property (e.g. phosphorylated proteins, glycosylated proteins, positively charged molecules, negatively charged molecules, hydrophobicity, hydrophilicity), but preferably has biochemical properties that allow binding of the biomolecule to a biologically active surface comprising positively charged quaternary ammonium groups after denaturation in 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine and dilution in 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4 °C, followed by incubation on said biologically active surface for 120 minutes at 20 to 24 °C. Such molecules include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies, carbohydrates, lipids, and combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins). Preferably, a biomolecule may be a nucleotide, polynucleotide, peptide, protein or a fragment thereof. Even more preferred are peptide or protein biomolecules or fragments thereof.

The methods for detecting these biomolecules have many applications.
For example, a single biomolecule or a combination of more than one biomolecule selected from the group having an apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da can be measured to differentiate between healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having a metastasised breast cancer or subjects with a non-malignant disease of the breast; these biomolecules are thus useful as an aid in the diagnosis of a breast cancer and/or a non-malignant disease of the breast within a subject. Alternatively, said biomolecules may be used to diagnose a subject as healthy.

To illustrate, suppose that a biomolecule having an apparent molecular mass of about
8940 Da is present only in biological samples from patients having a metastasised breast cancer. Mass profiling of two test samples from different subjects, X and Y, reveals the presence of a biomolecule with an apparent molecular mass of about 8940 Da in the sample from subject X and the absence of said biomolecule in the sample from subject Y. The medical practitioner is able to diagnose subject X as potentially having a metastasised breast cancer and subject Y as not having a metastasised breast cancer.

In yet another example, three biomolecules having apparent molecular masses of about 2053 Da, 4161 Da and 10682 Da are present in varying quantities in samples specific for precancerous lesions and "early" breast cancers. The biomolecule of 2053 Da is present at higher levels in samples specific for precancerous lesions of the breast than in those specific for "early" breast cancers. The biomolecule of 4161 Da is detected in samples from subjects having "early" breast cancers but not in those having a precancerous lesion, whereas the biomolecule of 10682 Da is present in about the same quantity in both sample types. None of these three biomolecules is present in samples from healthy subjects, in which only the biomolecules of 14014 Da and 9377 Da are found. Analysis of a test sample reveals the presence of biomolecules having molecular masses of 10682 Da, 2053 Da and 4161 Da. Comparison of the quantities of the biomolecules within said sample reveals that the biomolecule of 2053 Da is present at lower levels than those found in samples from subjects having a precancerous lesion. The medical practitioner is able to diagnose the test subject as having an "early" breast cancer. These examples are solely used for the purpose of clarification and are not intended to limit the scope of this invention.
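The worked examples can be written as explicit decision rules. This sketch is illustrative only: the function name, the reference level for the 2053 Da peak and the detection limit are hypothetical parameters, and the rules simply restate the examples rather than a validated diagnostic procedure.

```python
def classify_worked_example(intensity, precancer_level_2053, detection_limit=0.0):
    """Toy rules mirroring the worked examples; NOT a validated diagnostic classifier.

    intensity: dict apparent mass (Da) -> measured peak intensity; missing masses count as absent.
    precancer_level_2053: hypothetical typical intensity of the 2053 Da peak in precancerous lesions.
    """
    def present(mass):
        return intensity.get(mass, 0.0) > detection_limit

    if present(8940):
        return "possible metastasised breast cancer"
    if present(2053) and present(10682):
        if not present(4161):
            return "precancerous lesion of the breast"
        if intensity[2053] < precancer_level_2053:
            return "early breast cancer"
        return "indeterminate"
    if present(9377) and present(14014):
        return "healthy profile"
    return "indeterminate"


# Test sample from the second example: all three peaks, 2053 Da below the precancerous level.
print(classify_worked_example({2053: 4.0, 4161: 6.0, 10682: 5.0}, precancer_level_2053=9.0))
```

Profiles that fit none of the example patterns return "indeterminate" rather than being forced into a class.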
In another aspect of the invention, an immunoassay can be used to determine the presence or absence of a biomolecule within a test sample of a subject. The presence or absence of a biomolecule within a sample can be detected using the various immunoassay methods known to those skilled in the art (e.g. ELISA, western blots). If a biomolecule is present in the test sample, it will form an antibody-biomolecule complex with an antibody that specifically binds said biomolecule under suitable incubation conditions. The amount of antibody-biomolecule complex can be determined by comparison with a standard.

Thus, the invention provides a method for the differential diagnosis of a breast cancer and/or a non-malignant disease of the breast comprising the detection of one or more differentially expressed biomolecules within a sample. This method comprises obtaining a test sample from a subject, contacting said sample with a binding molecule specific for a differentially expressed polypeptide, and detecting an interaction between the binding molecule and its specific polypeptide, wherein the detection of an interaction indicates the presence or absence of said polypeptide, thereby allowing for the differential diagnosis of a subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer and/or having a non-malignant disease of the breast. Binding molecules include, but are not limited to, proteins, peptides, nucleotides, nucleic acids, hormones, amino acids, sugars, fatty acids, steroids, polynucleotides, carbohydrates, lipids, or a combination thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins), compounds or synthetic molecules. Preferably, binding molecules are antibodies specific for biomolecules selected from the group having an
apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.

In another aspect of the invention, a method for detecting the differential presence or absence of one or more biomolecules selected from the group having an apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da in a test sample of a subject involves contacting the test sample with a compound or agent capable of detecting said biomolecule such that the presence or absence of said biomolecule is directly and/or indirectly labelled. For example, a fluorescently labelled secondary antibody can be used to detect a primary antibody bound to its specific biomolecule. Furthermore, such detection methods can be used to detect a variety of biomolecules within a test sample both in vitro and in vivo.

For example, in vivo, antibodies or fragments thereof may be utilised for the detection of a biomolecule in a biological sample, comprising applying a labelled antibody directed against a given biomolecule of the invention to said sample under conditions that favour an interaction between the labelled antibody and its corresponding protein. Depending on the nature of the biological sample, it is possible to determine not only the presence of a biomolecule, but also its cellular distribution.
For example, in a blood serum sample only the serum level of a given biomolecule can be detected, whereas its level of expression and its cellular localisation can be detected in histological samples. It will be obvious to those skilled in the art that a wide variety of methods can be modified in order to achieve such detection.

For example, an antibody coupled to an enzyme is detected using a chromogenic substrate that is recognised and cleaved by the enzyme to produce a chemical moiety, which is readily detected by spectrometric, fluorimetric or visual means. Enzymes used for labelling include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase and acetylcholinesterase. Detection may also be accomplished by visual comparison of the extent of the enzymatic reaction of a substrate with that of similarly prepared standards. Alternatively, radiolabelled antibodies can be detected using a gamma or scintillation counter, or by autoradiography. In another example, fluorescently labelled antibodies are detected based on the level at which the attached compound fluoresces following exposure to a given wavelength. Fluorescent compounds typically used in antibody labelling include, but are not limited to, fluorescein isothiocyanate, rhodamine, phycoerythrin, phycocyanin, allophycocyanin, o-phthalaldehyde and fluorescamine. In yet another example, antibodies coupled to a chemi- or bioluminescent compound can be detected by determining the presence of luminescence.
Such compounds include, but are not limited to, luminol, isoluminol, theromatic acridinium ester, imidazole, acridinium salt, oxalate ester, luciferin, luciferase and aequorin.

Furthermore, in vivo techniques for the detection of a biomolecule of the invention include introducing into a subject a labelled antibody directed against a given polypeptide or fragment thereof.

In more than one embodiment of the invention, the test sample used for the differential diagnosis of a breast cancer and/or a non-malignant disease of the breast within a subject may be of blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract origin. Preferably, test samples are of blood, blood serum, plasma, urine, excreta, prostatic fluid, biopsy, ascites, lymph or tissue extract origin. More preferred are blood, blood serum, plasma, urine, excreta, biopsy, lymph or tissue extract samples. Even more preferred are blood serum, urine, excreta or biopsy samples. Overall preferred are blood serum samples.

Furthermore, test samples used for the methods of the invention are isolated from subjects of mammalian origin, preferably of primate origin. Even more preferred are subjects of human origin.

In addition, the methods of the invention described herein for the differential diagnosis of healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer or subjects having a non-malignant disease of the breast may be combined with other diagnostic methods to improve the outcome of the differential diagnosis. Such other diagnostic methods are known to those skilled in the art.
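Comparing the extent of an enzymatic reaction with similarly prepared standards, as mentioned for enzyme-labelled antibodies in this section, amounts to reading an unknown off a standard curve. The Python sketch below is a minimal illustration under stated assumptions: the function name and the standard values are hypothetical, the signal is assumed to rise strictly with concentration, and simple piecewise-linear interpolation is used (a fitted calibration model, e.g. a four-parameter logistic, is often preferred in practice).

```python
from bisect import bisect_left


def interpolate_concentration(standards, signal):
    """Estimate a concentration by piecewise-linear interpolation over a
    standard curve (hypothetical helper, for illustration only).

    standards: (concentration, signal) pairs from similarly prepared standards.
    Raises ValueError if the signal is not monotonic or out of range.
    """
    points = sorted(standards)  # order the standards by concentration
    signals = [s for _, s in points]
    if any(later <= earlier for earlier, later in zip(signals, signals[1:])):
        raise ValueError("standard signals must rise strictly with concentration")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the range covered by the standards")
    i = bisect_left(signals, signal)
    if signals[i] == signal:  # exact hit on a standard
        return float(points[i][0])
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    # linear interpolation between the two bracketing standards
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
```

With hypothetical standards `[(0, 50), (10, 250), (20, 450), (40, 850)]` (signal in arbitrary units), a reading of 350 lies halfway between the 10 and 20 standards and is estimated as 15.0. Refusing to extrapolate beyond the outermost standards is a deliberate choice, because readings outside the calibrated range are unreliable.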
b) Database

In another aspect of the invention, a database comprising mass profiles specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, or subjects having a non-malignant disease of the breast is generated by contacting biological samples isolated from the above-mentioned subjects with an adsorbent on a biologically active surface under specific binding conditions, allowing the biomolecules within said samples to bind said adsorbent, detecting one or more bound biomolecules using a detection method that generates a mass profile of said sample, transforming the mass profile data into a computer-readable form, and applying a mathematical algorithm to classify the mass profile as specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer or subjects having a non-malignant disease of the breast.

According to the invention, the classification of said mass profiles is performed using the "CART" decision tree approach [classification and regression trees; Breiman et al. (1984) Classification and regression trees. Wadsworth International, Belmont, California], which is known to those skilled in the art. Furthermore, bagging of classifiers is applied to overcome the typical instability of forward variable selection procedures, thereby increasing overall classifier performance [Breiman L. (1996) Bagging Predictors. Machine Learning 24: 123-140].
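To make the bagging step concrete, the following Python sketch assumes each mass profile has already been reduced to a fixed-length vector of peak intensities. To stay short and dependency-free it bags depth-one CART trees (single Gini-impurity splits, or "stumps") rather than full, pruned CART trees, and it omits forward variable selection; it therefore illustrates the technique of Breiman (1996) and is not the classifier used in the source. All function names and data are hypothetical. In practice a library implementation, e.g. scikit-learn's `BaggingClassifier` wrapping a `DecisionTreeClassifier`, would normally be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Choose the single (feature, threshold) split that minimises weighted
    Gini impurity, as CART does at each node.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every feature is constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_bagged(X, y, n_models=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Aggregate the ensemble by majority vote."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]
```

Averaging many trees grown on bootstrap resamples damps the sensitivity of any single tree to which samples happen to be in the training set. Breiman (1996) describes bagging as a remedy for exactly this instability.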
In more than one embodiment, one or more biomolecules selected from the group having an apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da may be detected within a given biological sample. Detection of said biomolecules of the invention is based on specific sample pre-treatment conditions, the pH of the binding conditions, and the type of biologically active surface used for the detection of biomolecules.
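Because each biomolecule above is defined only by a nominal mass and a tolerance window, converting a detected peak list into a computer-readable profile can be sketched as a window lookup. The Python below is a hypothetical illustration. It includes only the first six markers from the list; the remaining markers would be appended in the same form. The peak data are invented. Some windows in the full list overlap (e.g. 4831 Da ± 24 Da and 4874 Da ± 24 Da), so the sketch assigns a peak to the closest nominal mass. Treating an unmatched marker as intensity 0.0 is likewise an assumption.

```python
from typing import Dict, List, Optional, Tuple

# (nominal mass in Da, tolerance in Da): first entries of the marker list only;
# the remaining markers would be appended in the same form.
MARKERS: List[Tuple[int, int]] = [
    (1506, 8), (1533, 8), (1623, 8), (1975, 10), (2017, 10), (2053, 10),
]


def match_marker(mass: float) -> Optional[int]:
    """Return the nominal mass of the marker whose window contains `mass`,
    choosing the closest one if windows overlap, or None if nothing matches."""
    hits = [(abs(mass - m), m) for m, tol in MARKERS if abs(mass - m) <= tol]
    return min(hits)[1] if hits else None


def profile_vector(peaks: List[Tuple[float, float]]) -> Dict[int, float]:
    """Map (observed mass, intensity) peaks onto marker features.

    Markers without a matching peak get 0.0 (treated as absent); if several
    peaks fall into one window, the most intense peak is kept."""
    profile = {m: 0.0 for m, _ in MARKERS}
    for mass, intensity in peaks:
        marker = match_marker(mass)
        if marker is not None:
            profile[marker] = max(profile[marker], intensity)
    return profile
```

For example, a peak at 1509.5 Da falls inside the 1506 Da ± 8 Da window, whereas a peak at 1700 Da matches no marker and is ignored. The resulting fixed-length vector is the kind of input a classifier such as CART expects.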
Within the context of the invention, biomolecules within a given sample are bound to an adsorbent on a biologically active surface under specific binding conditions. For example, the biomolecules within a given sample are applied to a biologically active surface comprising positively charged quaternary ammonium groups (cationic) and incubated with 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 to allow specific binding. Biomolecules that bind to said biologically active surface under these conditions are negatively charged molecules. It should be noted that although the biomolecules of the invention are bound to a cationic adsorbent comprising positively charged quaternary ammonium groups, the biomolecules are also capable of binding other types of adsorbents, as described in another section, using binding conditions known to those skilled in the art. Accordingly, some embodiments of the invention are not limited to the use of cationic adsorbents.
According to the invention, a biomolecule with a molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da is detected by diluting the biological sample 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine, and then 1:10 in a binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4 °C; applying the sample thus treated to a biologically active surface comprising positively charged (cationic) quaternary ammonium groups (anion exchanging); incubating for 120 minutes at 20 to 24 °C; and subjecting the bound biomolecules to gas phase ion spectrometry as described in another section.
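The two-step pre-treatment implies an overall dilution of the sample and some carry-over of denaturant into the binding step. The Python sketch below works through that arithmetic. It assumes that "1:5" and "1:10" each mean one part in five (or ten) parts of total volume. The text does not say whether the ratio is part-to-total or part-to-diluent, and under the other reading the numbers change. The helper names are hypothetical.

```python
from fractions import Fraction


def serial_dilution(ratios):
    """Fraction of the original sample left after serial dilutions.
    ASSUMPTION: each ratio n means 1 part in n parts total volume ('1:n')."""
    fraction = Fraction(1)
    for n in ratios:
        fraction /= n
    return fraction


def carried_over(stock_molar, added_fraction, later_ratios):
    """Final concentration of a buffer component that makes up
    `added_fraction` of the volume at one step and is then diluted further."""
    return stock_molar * added_fraction * serial_dilution(later_ratios)


# Step 1: 1 part sample + 4 parts denaturation buffer (buffer = 4/5 of volume).
# Step 2: that mixture diluted 1:10 in binding buffer.
sample_fraction = serial_dilution([5, 10])                 # 1/50
urea_final = carried_over(Fraction(7), Fraction(4, 5), [10])
thiourea_final = carried_over(Fraction(2), Fraction(4, 5), [10])
```

Under this reading the sample is diluted 50-fold overall. The mixture applied to the surface still contains about 0.56 M urea (14/25 M) and 0.16 M thiourea, which is worth keeping in mind when adapting the binding step.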
In one embodiment of the invention, biological samples used to generate a database of mass profiles for healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer or subjects having a non-malignant disease of the breast may be of blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract origin. Preferably, biological samples are of blood, blood serum, plasma, urine, excreta, prostatic fluid, biopsy, ascites, lymph or tissue extract origin. More preferred are blood, blood serum, plasma, urine, excreta, biopsy, lymph or tissue extract samples. Even more preferred are blood serum, urine, excreta or biopsy samples. Overall preferred are blood serum samples.

Furthermore, the biological samples related to the invention are isolated from subjects considered to be healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer or having a non-malignant disease of the breast. Said subjects are of mammalian origin, preferably of primate origin. Even more preferred are subjects of human origin.

A subject of the invention that is said to have a precancerous lesion of the breast displays preliminary stages of a cancer, wherein a cell and/or tissue has become susceptible to the development of a cancer as a result of a genetic predisposition, exposure to a cancer-causing agent (carcinogen), or both.

A genetic predisposition may include a predisposition for an autosomal dominant inherited cancer syndrome, which is generally indicated by a strong family history of an uncommon cancer and/or an association with a specific marker phenotype, a familial cancer (e.g.
a familial relapsing non-malignant disease of the breast), wherein an evident clustering of cancer is observed but the role of inherited predisposition may not be clear, or an autosomal recessive syndrome characterised by chromosomal or DNA instability. Cancer-causing agents include agents that cause genetic damage and induce neoplastic transformation of a cell. Such agents fall into three categories: 1) chemical carcinogens such as alkylating agents, polycyclic aromatic hydrocarbons, aromatic amines, azo dyes, nitrosamines and amides, asbestos, vinyl chloride, chromium, nickel, arsenic, and naturally occurring carcinogens (e.g. aflatoxin B1); 2) radiation such as ultraviolet (UV) and ionising radiation, including electromagnetic (e.g. X-rays, γ-rays) and particulate radiation (e.g. α and β particles, protons, neutrons); 3) viral and microbial carcinogens such as human papillomavirus (HPV), Epstein-Barr virus (EBV), hepatitis B virus (HBV), human T-cell leukaemia virus type 1 (HTLV-1), or Helicobacter pylori. In addition, environmental factors have been implicated in the predisposition to breast cancer. Such factors are known to those skilled in the art and include, but are not limited to, smoking, chronic alcohol intake, and the consumption of a high-energy diet rich in fats. Furthermore, breast cancer arises with greater frequency in patients with a chronic non-malignant disease of the breast.

Within the context of the invention, cancers of the breast are also referred to as mammary cancers or carcinomas of the breast. Breast cancers of the invention include both in situ (non-invasive) and invasive breast cancers.
In situ (non-invasive) breast cancers include ductal and lobular carcinoma in situ (DCIS and LCIS, respectively), whereas invasive breast cancers encompass infiltrating diseases such as invasive ductal, lobular and papillary carcinomas, and medullary, colloid and tubular carcinomas. Furthermore, breast cancers of the invention may be of various stages, wherein staging is based on the size of the primary lesion, its extent of spread to regional lymph nodes, and the presence or absence of blood-borne metastases (metastatic breast cancers). The various stages of a breast cancer may be identified using staging systems known to those skilled in the art [e.g. the Union Internationale Contre le Cancer (UICC) system or the American Joint Committee on Cancer (AJCC) system]. Also included are different grades of said breast cancers, wherein the grade of a cancer is based on the degree of differentiation of the epithelial cells within the lining of the breast and on the number of mitoses, as a correlate of a neoplasm's aggressiveness.

A subject said to have a non-malignant disease of the breast has a lesion of the breast that does not exhibit the malignant neoplastic physiological, biochemical and/or morphological properties known to those skilled in the art. Such diseases include, but are not limited to, inflammatory and proliferative lesions, fibrocystic changes within mammary tissue, and benign disorders of the breast. Within the context of the invention, inflammatory lesions encompass acute, periductal and granulomatous mastitis, duct ectasia and fat necrosis, whereas proliferative lesions include epithelial hyperplasia (atypical ductal and lobular hyperplasia), sclerosing adenosis and small duct papillomas. Also included in the invention are benign disorders of the glandular tissue (mastopathy), papillomas (large duct, intraductal) and fibroadenomas.
Healthy individuals, as related to certain embodiments of the invention, are those that are in good health and show no breast cancer and no non-malignant disease of the breast.

c) Biomolecules

The differential expression of biomolecules in samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having a breast cancer, subjects having a metastasised breast cancer, and subjects having a non-malignant disease of the breast allows for the differential diagnosis of a breast cancer and/or a non-malignant disease of the breast within a subject.

Biomolecules are said to be specific for a particular clinical state (e.g. healthy, precancerous lesion of the breast, breast cancer, metastasised breast cancer, non-malignant disease of the breast) when they are present at different levels in samples taken from subjects in one clinical state compared with samples taken from subjects in other clinical states (e.g. in subjects with a precancerous lesion of the breast vs. in subjects with a metastasised breast cancer). Biomolecules may be present at elevated levels, at decreased levels, or be altogether absent in a sample taken from a subject in a particular clinical state. For example, biomolecules A and B are found at elevated levels in samples isolated from healthy subjects compared with samples isolated from subjects having a precancerous lesion of the breast, a breast cancer, a metastatic breast cancer or a non-malignant disease of the breast, whereas biomolecules X, Y and Z are found at elevated levels and/or more frequently in samples isolated from subjects having a precancerous lesion of the breast than in samples from subjects in good health or having a breast cancer, a metastasised breast cancer or a non-malignant disease of the breast.
Biomolecules A and B are therefore said to be specific for healthy subjects, whereas biomolecules X, Y and Z are specific for subjects having a precancerous lesion of the breast.

Accordingly, the differential presence of one or more biomolecules in a test sample compared with samples from healthy subjects or from subjects with a precancerous lesion of the breast, a breast cancer, a metastasised breast cancer or a non-malignant disease of the breast, or the mere detection of one or more biomolecules in the test sample, provides useful information on the probability that the subject being tested has a precancerous lesion of the breast, a breast cancer, a metastasised breast cancer or a non-malignant disease of the breast. This probability depends on whether the quantity of one or more biomolecules in a test sample taken from said subject is statistically significantly different from the quantity of the same biomolecule(s) in biological samples taken from healthy subjects or from subjects having a precancerous lesion of the breast, a breast cancer, a metastasised breast cancer or a non-malignant disease of the breast.

A biomolecule of the invention may be any molecule that is produced by a cell or living organism and may have any biochemical property (e.g. phosphorylation, positive or negative charge, hydrophobicity, hydrophilicity), but preferably has biochemical properties that allow it to bind to a biologically active surface comprising positively charged quaternary ammonium groups after denaturation in 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine, dilution in 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4 °C, and incubation on said biologically active surface for 120 minutes at 20 to 24 °C.
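The text does not specify which statistical test establishes a "statistically significant" difference in biomolecule quantity between groups. One common, distribution-free choice for peak intensities is the Mann-Whitney U test, sketched below in Python using the large-sample normal approximation. Ties receive mid-ranks, but the variance is not tie-corrected and no continuity correction is applied. These are simplifications, and for small groups an exact test such as `scipy.stats.mannwhitneyu` is preferable. The function name and the intensity values are hypothetical.

```python
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    a, b: peak intensities of one biomolecule in two groups of subjects.
    Returns (U statistic for sample a, two-sided p-value)."""
    if not a or not b:
        raise ValueError("both groups need at least one value")
    pooled = sorted((v, g) for g, sample in ((0, a), (1, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values, which share the mean rank
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

For two completely separated groups of five, e.g. intensities `[1, 2, 3, 4, 5]` vs. `[6, 7, 8, 9, 10]`, U is 0 and the approximate p-value is below 0.05. Identical groups give p = 1. Testing many masses at once would additionally call for a multiple-testing correction.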
Such molecules include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies, carbohydrates, lipids, and combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins). Preferably, a biomolecule is a nucleotide, polynucleotide, peptide, protein or a fragment thereof. Even more preferred are peptide or protein biomolecules.

The biomolecules of the invention can be detected based on specific sample pre-treatment conditions, the pH of the binding conditions, the type of biologically active surface used for the detection of biomolecules within a given sample, and their molecular mass. For example, prior to the detection of the biomolecules described herein, a given sample is pre-treated by diluting it 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine. The denatured sample is then diluted 1:10 in 0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5, applied to a biologically active surface comprising positively charged quaternary ammonium groups (cationic), and incubated under specific buffer conditions (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5) to allow binding of said biomolecules to the above-mentioned biologically active surface. It should be noted that although the biomolecules of the invention are detected using a cationic adsorbent comprising positively charged quaternary ammonium groups, together with specific pre-treatment and binding conditions, the biomolecules are capable of binding other types of adsorbents, as described below, using alternative pre-treatment and binding conditions known to those skilled in the art. Accordingly, some embodiments of the invention are not limited to the use of cationic adsorbents.
The biomolecules of the invention include biomolecules having a molecular mass selected from the group consisting of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.

According to the invention, a biomolecule with a molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da is detected by diluting the biological sample 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% Ampholine, and then 1:10 in a binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4 °C; applying the sample thus treated to a biologically active surface comprising positively charged (cationic) quaternary ammonium groups (anion exchanging); incubating for 120 minutes at 20 to 24 °C; and subjecting the bound biomolecules to gas phase ion spectrometry as described in another section.

Although said biomolecules were first identified in blood serum samples, their detection is not limited to this sample type. The biomolecules may also be detected in other sample types, such as blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract. Preferably, samples are of blood, blood serum, plasma, urine, excreta, prostatic fluid, biopsy, ascites, lymph or tissue extract origin.
More preferred are blood, blood serum, plasma, urine, excreta, biopsy, lymph or tissue extract samples. Even more preferred are blood serum, urine, excreta or biopsy samples. Overall preferred are blood serum samples.
Since the biomolecules can be sufficiently characterised by their mass and by biochemical characteristics such as the type of biologically active surface they bind to or the pH of the binding conditions, it is not necessary to establish their identity in order to detect them in a sample. It should be noted that molecular mass and binding properties are characteristic properties of these biomolecules and not limitations on the means of detection or isolation. Furthermore, using the methods described herein or other methods known in the art, the absolute identity of the markers can be determined. This is important when one wishes to develop and/or screen for specific binding molecules, or to develop an assay for the detection of said biomolecules using specific binding molecules.

d) Biologically Active Surfaces

In one embodiment of the invention, biologically active surfaces include, but are not restricted to, surfaces that contain adsorbents such as quaternary ammonium groups (anion exchange surfaces), carboxylate groups (cation exchange surfaces), alkyl or aryl chains (hydrophobic interaction, reverse phase chemistry), groups such as nitrilotriacetic acid that immobilise metal ions such as nickel, gallium, copper or zinc (metal affinity interaction), or biomolecules such as proteins, preferably antibodies, or nucleic acids, preferably protein binding sequences, covalently bound to the surface via carbonyl diimidazole moieties or epoxy groups (specific affinity interaction). Preferred are adsorbents comprising anion exchange surfaces.

These surfaces may be located on matrices such as polysaccharides, e.g. Sepharose (for anion exchange or hydrophobic interaction surfaces), or on solid metals (e.g. antibodies coupled to magnetic beads). Surfaces may also include gold-plated surfaces such as those used for Biacore Sensor Chip technology.
Other surfaces known to those skilled in the art are also included within the scope of the invention. Biologically active surfaces are able to adsorb biomolecules such as amino acids, sugars, fatty acids, steroids, nucleic acids, polynucleotides, polypeptides, carbohydrates, lipids, and combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins).

In another embodiment, devices that use biologically active surfaces to selectively adsorb biomolecules may be chromatography columns for Fast Protein Liquid Chromatography (FPLC) and High Pressure Liquid Chromatography (HPLC), in which the matrix (e.g. a polysaccharide) carrying the biologically active surface is packed into vessels (usually referred to as "columns") made of glass, steel, or synthetic materials such as polyetheretherketone (PEEK). In yet another embodiment, devices that use biologically active surfaces to selectively adsorb biomolecules may be metal strips carrying thin layers of the biologically active surface on one or more spots of the strip surface, to be used as probes for gas phase ion spectrometry analysis, for example the SAX2 ProteinChip array (Ciphergen Biosystems, Inc.) for SELDI analysis.

e) Mass Profiling

In one embodiment, the mass profile of a sample may be generated using an array-based assay in which the biomolecules of a given sample are bound by biochemical or affinity interactions to an adsorbent present on a biologically active surface located on a solid platform ("array" or "probe"). After the biomolecules have bound to the adsorbent, they are detected using gas phase ion spectrometry. Biomolecules or other substances bound to the adsorbents on the probes can be analyzed using a gas phase ion spectrometer, e.g. a mass spectrometer, an ion mobility spectrometer, or a total ion current measuring device. The quantity and characteristics of the biomolecule can be determined using gas phase ion spectrometry.
Other substances in addition to the biomolecule of interest can also be detected by gas phase ion spectrometry.

In one embodiment, a mass spectrometer can be used to detect biomolecules on the probe. In a typical mass spectrometer, a probe with a biomolecule is introduced into an inlet system of the mass spectrometer. The biomolecule is then ionized by an ionization source, such as a laser, fast atom bombardment, or plasma. The generated ions are collected by an ion optic assembly, and a mass analyzer then disperses and analyzes the passing ions. Within the scope of this invention, the ionization source that ionizes the biomolecule is a laser. The ions exiting the mass analyzer are detected by an ion detector, which translates information on the detected ions into mass-to-charge ratios. Detection of the presence of a biomolecule or other substances typically involves detection of signal intensity, which in turn can reflect the quantity and character of a biomolecule bound to the probe.

In another embodiment, the mass profile of a sample may be generated using a liquid chromatography (LC)-based assay in which the biomolecules of a given sample are bound by biochemical or affinity interactions to an adsorbent located in a vessel made of glass, steel, or synthetic material, known to those skilled in the art as a chromatography column. The biomolecules are eluted from the biologically active surface by washing the vessel with appropriate solutions known to those skilled in the art. Such solutions include, but are not limited to, buffers, e.g. tris(hydroxymethyl)aminomethane hydrochloride (Tris-HCl), buffers containing salt, e.g. sodium chloride (NaCl), or organic solvents, e.g. acetonitrile. Biomolecule mass profiles are generated by feeding the biomolecules eluting from the column directly into a mass spectrometer via an electrospray device (LC/ESI-MS).
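The ion detector reports mass-to-charge ratios rather than masses. For protonated ions [M + zH]^z+, the neutral mass is M = z·(m/z) − z·m_p, where m_p ≈ 1.007276 Da is the proton mass. Laser desorption (SELDI/MALDI) spectra are dominated by singly charged ions. Electrospray (LC/ESI-MS), by contrast, typically produces a series of charge states, and for two adjacent peaks of the same molecule the charge z of the higher-m/z peak follows from z = (m/z_(z+1) − m_p) / (m/z_z − m/z_(z+1)). The Python sketch below applies these relations. It assumes protonation is the only adduct, and the helper names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M + zH]^z+ ion observed at m/z = mz."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


def mz_for(mass: float, charge: int) -> float:
    """Expected m/z of the [M + zH]^z+ ion of a neutral mass M."""
    return mass / charge + PROTON_MASS


def charge_from_adjacent_peaks(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the peak at higher m/z (mz_z), given the neighbouring
    peak of the same molecule carrying one extra proton (mz_z_plus_1)."""
    z = (mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1)
    return round(z)  # rounding absorbs small measurement errors
```

For a hypothetical 10000 Da protein, the 5+ and 6+ ions appear near m/z 2001.0 and 1667.7. `charge_from_adjacent_peaks` recovers z = 5 from that pair, and `neutral_mass` then returns approximately 10000 Da.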
Conditions that promote binding of biomolecules to an adsorbent are known to those skilled in the art (reference) and ordinarily include parameters such as pH, the concentration of salt or organic solvent, or other competitors for binding of the biomolecule to the adsorbent. Within the scope of the invention, incubation temperatures range from 0 to 100 °C, preferably from 4 to 60 °C, and most preferably from 15 to 30 °C. Varying additional parameters, such as incubation time, the concentration of detergent, e.g. 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), or of reducing agents, e.g. dithiothreitol (DTT), is also known to those skilled in the art. Various degrees of binding can be accomplished by combining the above conditions as needed, as will be readily apparent to those skilled in the art.

f) Methods for detecting biomolecules within a sample

In yet another aspect, the invention relates to methods for detecting differentially present biomolecules in a test sample and/or biological sample. Within the context of the invention, any suitable method can be used to detect one or more of the biomolecules described herein. For example, gas phase ion spectrometry can be used. This technique includes, e.g., laser desorption/ionization mass spectrometry. Preferably, the test and/or biological sample is prepared prior to gas phase ion spectrometry, e.g. by pre-fractionation, two-dimensional gel electrophoresis or high performance liquid chromatography, to assist detection of said biomolecules. Detection of said biomolecules can also be achieved using methods other than gas phase ion spectrometry. For example, immunoassays can be used to detect the biomolecules within a sample.

In one embodiment, the test and/or biological sample is prepared prior to contacting a biologically active surface and is in aqueous form.
Examples of said samples include, but are not limited to, blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, tears, saliva, sweat, ascites, cerebrospinal fluid, milk, lymph, or tissue extract samples. Furthermore, solid test and/or biological samples, such as excreta or biopsy samples, can be solubilised in or admixed with an eluent using methods known to those skilled in the art such that said samples may be easily applied to a biologically active surface. Test and/or biological samples in aqueous form can be further prepared using specific denaturing solutions (pre-treatment) containing, e.g., sodium dodecyl sulphate (SDS), mercaptoethanol or urea. For example, a test and/or biological sample of the invention can be denatured prior to contacting a biologically active surface comprising quaternary ammonium groups by diluting said sample 1:5 with a buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT and 2% ampholine.
The sample is contacted with a biologically active surface using any technique, including bathing, soaking, dipping, spraying, washing over, or pipetting. Generally, a volume of sample containing from a few attomoles to 100 picomoles of a biomolecule in about 1 to 500 µl is sufficient for detecting binding of the biomolecule to the adsorbent.

The pH value of the solvent in which the sample contacts the biologically active surface is a function of the specific sample and the selected biologically active surface. Typically, a sample is contacted with a biologically active surface at pH values between 0 and 14, preferably between about 4 and 10, more preferably between 4.5 and 9.0, and most preferably at pH 8.5. The pH value depends on the type of adsorbent present on a biologically active surface and can be adjusted accordingly.

The sample can contact the adsorbent present on a biologically active surface for a period of time sufficient to allow the marker to bind to the adsorbent. Typically, the sample and the biologically active surface are contacted for between about 1 second and about 12 hours, preferably between about 30 seconds and about 3 hours, and most preferably for 120 minutes.

The temperature at which the sample contacts the biologically active surface (incubation temperature) is a function of the specific sample and the selected biologically active surface. Typically, the incubation temperature is between 0 and 100 °C, preferably between 4 and 37 °C, and most preferably between 20 and 24 °C.

For example, a biologically active surface comprising quaternary ammonium groups (anion exchange surface) will bind the biomolecules described herein when the pH value is between 6.5 and 9.0. Optimal binding of the biomolecules of the present invention occurs at a pH of 8.5.
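As a rough orientation, the sample amounts given above (from a few attomoles to 100 picomoles of a biomolecule in about 1 to 500 µl) span a very wide concentration range. The sketch below simply divides amount by volume for the two extremes; taking 1 attomole as the lower amount is an illustrative assumption standing in for "a few attomoles", and the constant names are hypothetical.

```python
ATTOMOLE = 1e-18    # mol
PICOMOLE = 1e-12    # mol
MICROLITRE = 1e-6   # l


def molar_concentration(amount_mol: float, volume_l: float) -> float:
    """Concentration in mol/l (M) for a given amount (mol) in a given volume (l)."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return amount_mol / volume_l


# Most concentrated case: 100 pmol in 1 µl
upper = molar_concentration(100 * PICOMOLE, 1 * MICROLITRE)
# Most dilute case (illustrative lower bound): 1 amol in 500 µl
lower = molar_concentration(1 * ATTOMOLE, 500 * MICROLITRE)

print(f"upper bound: {upper:.1e} M")  # 1.0e-04 M, i.e. 100 µM
print(f"lower bound: {lower:.1e} M")  # 2.0e-15 M, i.e. 2 fM
```

The roughly eleven orders of magnitude between the two cases illustrate why adsorbent capture on the surface, rather than the starting volume alone, governs detectability.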
In this anion exchange example, the sample is contacted with the surface for 120 minutes at a temperature of 20 to 24 °C.

Following contact of a sample or sample solution with a biologically active surface, it is preferred to remove any unbound biomolecules so that only the bound biomolecules remain on the biologically active surface. Unbound biomolecules are removed by methods known to those skilled in the art, such as bathing, soaking, dipping, rinsing, spraying, or washing the biologically active surface with an eluent or a washing solution. A microfluidics process is preferably used when a washing solution such as an eluent is introduced to small spots of adsorbents on the biologically active surface. Typically, the washing solution can be at a temperature of between 0 and 100 °C, preferably between 4 and 37 °C, and most preferably between 20 and 24 °C.

Washing solutions or eluents used to wash the unbound biomolecules from a biologically active surface include, but are not limited to, organic solutions and aqueous solutions such as buffers, wherein a buffer may contain detergents, salts, or reducing agents in appropriate concentrations known to those skilled in the art. Aqueous solutions are preferred for washing biologically active surfaces. Exemplary aqueous solutions include, but are not limited to, HEPES buffer, Tris buffer, phosphate buffered saline (PBS), and modifications thereof. The selection of a particular washing solution or eluent depends on other experimental conditions (e.g., types of adsorbents used or biomolecules to be detected) and can be determined by those of skill in the art. For example, if a biologically active surface comprising a quaternary ammonium group as adsorbent (anion exchange surface) is used, then an aqueous solution, such as a Tris buffer, may be preferred.
In another example, if a biologically active surface comprising a carboxylate group as adsorbent (cation exchange surface) is used, then an aqueous solution, such as an acetate buffer, may be preferred.

Optionally, an energy absorbing molecule (EAM), e.g. in solution, can be applied to biomolecules or other substances bound on the biologically active surface by spraying, pipetting or dipping. Applying an EAM can be done after unbound materials are washed off the biologically active surface. Exemplary energy absorbing molecules include, but are not limited to, cinnamic acid derivatives, sinapinic acid and dihydroxybenzoic acid.

Once the biologically active surface is free of any unbound biomolecules, adsorbent-bound biomolecules are detected using gas phase ion spectrometry. The quantity and characteristics of a biomolecule can be determined using said method. Furthermore, said biomolecules can be analyzed using a gas phase ion spectrometer such as a mass spectrometer, an ion mobility spectrometer, or a total ion current measuring device. Other gas phase ion spectrometers known to those skilled in the art are also included.

In one embodiment, mass spectrometry can be used to detect biomolecules of a given sample present on a biologically active surface. Such methods include, but are not limited to, matrix assisted laser desorption ionization/time-of-flight (MALDI-TOF), surface-enhanced laser desorption ionization/time-of-flight (SELDI-TOF), liquid chromatography coupled with MS, MS-MS, or ESI-MS. Typically, biomolecules are analysed by introducing a biologically active surface carrying said biomolecules into the mass spectrometer and ionizing said biomolecules to generate ions that are collected and analysed.
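Both MALDI-TOF and SELDI-TOF, named above, separate ions by their flight time. Under idealized assumptions (ions start at rest, gain kinetic energy zeU in a short acceleration field of voltage U, then drift a field-free length L), energy conservation zeU = mv²/2 gives t = L * sqrt(m / (2zeU)), so flight time grows with the square root of the mass-to-charge ratio. The sketch below evaluates this relation and its inverse. The 20 kV and 1 m settings are illustrative values, not instrument parameters from this document, and real instruments are calibrated empirically.

```python
import math

DALTON_KG = 1.66053906660e-27        # kg per unified atomic mass unit
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def flight_time(mass_da: float, charge: int, voltage_v: float, drift_length_m: float) -> float:
    """Idealized drift time (s) of an ion accelerated from rest through voltage_v.

    z*e*U = m*v**2/2 gives v = sqrt(2*z*e*U/m), and t = L/v.
    """
    m = mass_da * DALTON_KG
    v = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage_v / m)
    return drift_length_m / v


def mass_from_time(t_s: float, charge: int, voltage_v: float, drift_length_m: float) -> float:
    """Invert flight_time(): m = 2*z*e*U*(t/L)**2, returned in Da."""
    m = 2 * charge * ELEMENTARY_CHARGE * voltage_v * (t_s / drift_length_m) ** 2
    return m / DALTON_KG


if __name__ == "__main__":
    U, L = 20_000.0, 1.0  # illustrative settings only
    for mass in (1506.0, 9160.0, 28313.0):
        t = flight_time(mass, 1, U, L)
        print(f"{mass:8.0f} Da -> {t * 1e6:6.2f} µs")
```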
In a preferred embodiment, the biomolecules present in a sample are detected using gas phase ion spectrometry, and more preferably, using mass spectrometry. In one embodiment, matrix assisted laser desorption/ionization ("MALDI") mass spectrometry can be used. In MALDI, the sample is partially purified to obtain a fraction that essentially consists of a biomolecule by employing separation methods such as two-dimensional gel electrophoresis (2D-gel) or high performance liquid chromatography (HPLC).

In another embodiment, surface-enhanced laser desorption/ionization mass spectrometry ("SELDI") can be used. SELDI uses a substrate comprising adsorbents to capture biomolecules, which can then be directly desorbed and ionized from the substrate surface during mass spectrometry. Since the substrate surface in SELDI captures biomolecules, a sample need not be partially purified as in MALDI. However, depending on the complexity of a sample and the type of adsorbents used, it may be desirable to prepare a sample to reduce its complexity prior to SELDI analysis.

For example, biomolecules bound to a biologically active surface can be introduced into an inlet system of the mass spectrometer. The biomolecules are then ionized by an ionization source such as a laser, fast atom bombardment, or plasma. The generated ions are then collected by an ion optic assembly, and a mass analyzer disperses the passing ions. The ions exiting the mass analyzer are detected by a detector and translated into mass-to-charge ratios. Detection of the presence of a biomolecule typically involves detection of its specific signal intensity, which reflects the quantity and character of said biomolecule.

In a preferred embodiment, a laser desorption time-of-flight mass spectrometer is used with the probe of the present invention. In laser desorption mass spectrometry, biomolecules bound to a biologically active surface are introduced into an inlet system.
Biomolecules are desorbed and ionized into the gas phase by a laser. The ions generated are then collected by an ion optic assembly. These ions are accelerated through a short high-voltage field and allowed to drift into a high vacuum chamber of a time-of-flight mass analyzer. At the far end of the high vacuum chamber, the accelerated ions collide with a detector surface at varying times. Since the time of flight is a function of the mass-to-charge ratio of the ions, the elapsed time between ionization and impact can be used to identify the presence or absence of molecules of a specific mass.

The detection of biomolecules described herein can be enhanced using certain selectivity conditions (e.g., types of adsorbents used or washing solutions). In a preferred embodiment, the same or substantially the same selectivity conditions that were used to discover the biomolecules can be used in the methods for detecting a biomolecule in a sample. Combining the laser desorption time-of-flight mass spectrometer with other components described herein, in a mass spectrometer assembly that employs various means of desorption, acceleration, detection, time measurement, etc., is known to those skilled in the art.

Data generated by desorption and detection of markers can be analyzed with the use of a programmable digital computer. The computer generally includes a readable medium that stores program code. Part of the code can be devoted to storing in memory the location of each feature on a biologically active surface, the identity of the adsorbent at that feature and the elution conditions used to wash the adsorbent. Using this information, the program can then identify the set of features on the biologically active surface defining certain selectivity characteristics (e.g. types of adsorbent and eluents used).
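The bookkeeping described above (the location of each feature, the adsorbent at that feature and the elution conditions used to wash it) can be sketched as a small record type plus a selection step that groups features sharing one selectivity condition. The field names and the example labels are hypothetical; the text does not specify a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """One addressable spot on a biologically active surface (hypothetical layout)."""
    location: str       # spot address on the array, e.g. "A1"
    adsorbent: str      # e.g. "quaternary ammonium" (anion exchange)
    wash_solution: str  # elution conditions used to wash the adsorbent


def select_features(features: List[Feature], adsorbent: str, wash_solution: str) -> List[Feature]:
    """Return the features that share the given adsorbent and wash solution."""
    return [f for f in features
            if f.adsorbent == adsorbent and f.wash_solution == wash_solution]


if __name__ == "__main__":
    surface = [
        Feature("A1", "quaternary ammonium", "Tris buffer, pH 8.5"),
        Feature("A2", "quaternary ammonium", "Tris buffer, pH 8.5"),
        Feature("B1", "carboxylate", "acetate buffer"),
    ]
    chosen = select_features(surface, "quaternary ammonium", "Tris buffer, pH 8.5")
    print([f.location for f in chosen])
```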
The computer also contains code that receives, as input, data on the strength of the signal at various molecular masses from a particular addressable location on the biologically active surface. These data can indicate the number of biomolecules detected, as well as the signal strength and the determined molecular mass for each biomolecule detected.

Data analysis can include the steps of determining the signal strength (e.g., peak height) of a detected biomolecule and removing "outliers" (data deviating from a predetermined statistical distribution). For example, the observed peaks can be normalized, a process whereby the height of each peak relative to some reference is calculated. The reference can be, for example, the background noise generated by the instrument and chemicals (e.g., the energy absorbing molecule), which is set to zero on the scale. The signal strength detected for each biomolecule can then be displayed as a relative intensity on the desired scale (e.g., 100). Alternatively, a standard may be introduced with the sample so that a peak from the standard can be used as a reference to calculate the relative intensities of the signals observed for each biomolecule detected.

The computer can transform the resulting data into various display formats. In one format, referred to as "spectrum view", a standard spectral view can be displayed, depicting the quantity of a biomolecule reaching the detector at each particular molecular mass. In another format, referred to as "scatter plot", only the peak height and mass information are retained from the spectrum view, yielding a cleaner image and making biomolecules with nearly identical molecular mass easier to distinguish.
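The normalization and "scatter plot" reduction described above can be sketched as follows. The assumptions are ours, not the source's: the background level is estimated as the median intensity and set to zero, a peak is a point higher than its left neighbour and not lower than its right one, and heights are scaled so the tallest peak equals 100. Outlier removal is not shown; to normalize against an internal standard instead, one would divide by the standard's peak height rather than by the tallest peak.

```python
from statistics import median
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (molecular mass in Da, intensity)


def scatter_plot_peaks(spectrum: Spectrum, scale: float = 100.0,
                       min_height: float = 0.0) -> Spectrum:
    """Reduce a spectrum to (mass, relative peak height) pairs."""
    if len(spectrum) < 3:
        return []
    # Background (assumed = median intensity) is set to zero on the relative scale
    background = median([intensity for _, intensity in spectrum])
    corrected = [(mass, max(intensity - background, 0.0)) for mass, intensity in spectrum]
    # Keep local maxima only: this is the peak height and mass information
    peaks = [
        corrected[i]
        for i in range(1, len(corrected) - 1)
        if corrected[i][1] > corrected[i - 1][1]
        and corrected[i][1] >= corrected[i + 1][1]
        and corrected[i][1] > min_height
    ]
    if not peaks:
        return []
    tallest = max(height for _, height in peaks)
    return [(mass, scale * height / tallest) for mass, height in peaks]


if __name__ == "__main__":
    raw = [(1000.0, 1.0), (1001.0, 1.0), (1002.0, 11.0), (1003.0, 1.0),
           (1004.0, 6.0), (1005.0, 1.0), (1006.0, 1.0)]
    print(scatter_plot_peaks(raw))  # [(1002.0, 100.0), (1004.0, 50.0)]
```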
Using any of the above display formats, it can be readily determined from the signal display whether a biomolecule having a particular molecular mass is detected in a sample. Preferred biomolecules of the invention are biomolecules with an apparent molecular mass of about 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.

Moreover, from the strength of the signal, the amount of a biomolecule bound on the biologically active surface can be determined.

g) Identification of proteins

In the event that the biomolecules of the invention are proteins, the present invention comprises a method for the identification of these proteins, especially by obtaining their amino acid sequence.
This method comprises the purification of said proteins from the complex biological sample (blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, tears, saliva, sweat, ascites, cerebrospinal fluid, milk, lymph, or tissue extract samples) by fractionating said sample using techniques known to one of ordinary skill in the art, most preferably protein chromatography (FPLC, HPLC). The biomolecules of the invention include those proteins with a molecular mass selected from 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.
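Deciding whether a detected peak corresponds to one of the listed molecular masses amounts to checking whether the observed mass falls inside the listed ± window. The sketch below does this for a few (mass, tolerance) pairs copied from the list above; a complete implementation would carry every listed entry, and the function and constant names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A few (nominal mass, tolerance) pairs in Da copied from the list above.
LISTED_MASSES: Sequence[Tuple[int, int]] = (
    (1506, 8), (1533, 8), (1623, 8), (4107, 21),
    (9160, 46), (9221, 46), (28313, 142),
)


def match_listed_masses(observed_mass: float,
                        listed: Sequence[Tuple[int, int]] = LISTED_MASSES) -> List[int]:
    """Return every listed nominal mass whose ± window contains observed_mass."""
    return [mass for mass, tol in listed if abs(observed_mass - mass) <= tol]


if __name__ == "__main__":
    for peak in (1500.0, 9165.0, 3000.0):
        print(peak, match_listed_masses(peak))
```

A peak can fall into more than one window when listed masses lie close together, which is why the function returns a list rather than a single match.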
Furthermore, the method comprises the analysis of the fractions for the presence and purity of said proteins by the method which was used to identify them as differentially expressed biomolecules, for example two-dimensional gel electrophoresis, SELDI mass spectrometry or MALDI mass spectrometry, but most preferably MALDI mass spectrometry.

The method also comprises an analysis of the purified proteins aimed at revealing their amino acid sequence. This analysis may be performed using techniques in mass spectrometry known to those skilled in the art.

In one embodiment, this analysis may be performed using peptide mass fingerprinting, revealing information about the specific peptide mass profile after proteolytic digestion of the investigated protein.

In another embodiment, this analysis may preferably be performed using post-source decay (PSD) or ESI-MS, but most preferably ESI-MS, revealing mass information about all possible fragments of the investigated protein or proteolytic peptides thereof, and thereby leading to the amino acid sequence of the investigated protein or proteolytic peptide thereof.

The information revealed by the aforementioned techniques can be used to query web-based search engines, such as MS Fit (Protein Prospector, http://prospector.ucsf.edu) for information obtained from peptide mass fingerprinting, MS Tag (Protein Prospector, http://prospector.ucsf.edu) for information obtained from PSD, or Mascot (www.matrixscience.com) for information obtained from MS/MS and peptide mass fingerprinting, in order to align the obtained results with data available in public protein sequence databases, such as SwissProt (http://us.expasy.org/sprot/), NCBI (http://www.ncbi.nlm.nih.gov/BLAST/) or EMBL (http://srs.embl-heidelberg.de:8000/srs5/), which yields reliable information about the identity of said proteins.
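Peptide mass fingerprinting compares measured peptide masses with masses predicted for database sequences. As an illustration of the prediction side only, the sketch below digests a sequence in silico with the common trypsin rule (cleave after K or R, but not before P) and computes monoisotopic [M+H]+ masses. It ignores missed cleavages and modifications, which search engines such as MS Fit or Mascot handle; the sequence in the usage example is arbitrary and the function names are hypothetical.

```python
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (N- and C-terminal H and OH)
PROTON = 1.00728   # for the singly protonated [M+H]+ ion


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    for pep in tryptic_digest("GASPKVTRPEDFK"):
        print(f"{pep:10s} {mh_plus(pep):.4f}")
```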
This information may comprise, if available, the complete amino acid sequence, the calculated molecular mass, the structure, the enzymatic activity, the physiological function, and the gene expression of the investigated proteins.

h) Kits

In yet another aspect, the invention provides kits using the methods of the invention as described in the section Diagnostics for the differential diagnosis of a breast cancer or a non-malignant disease of the breast, wherein the kits are used to detect the biomolecules of the present invention.

The methods used to detect the biomolecules of the invention can also be used to determine whether a subject is at risk of developing a breast cancer or has developed a breast cancer. Such methods may also be employed in the form of a diagnostic kit comprising an antibody specific to a biomolecule of the invention or a biologically active surface described herein, which may be conveniently used, for example, in clinical settings to diagnose patients exhibiting symptoms or a family history of a non-steroid dependent cancer. Such diagnostic kits also include solutions and materials necessary for the detection of a biomolecule of the invention, and instructions for using the kit based on the above-mentioned methods.
The biomolecules of the invention include those proteins with a molecular mass selected from 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.

For example, the kits can be used to detect one or more of the differentially present biomolecules described above in a test sample of a subject. The kits of the invention have many applications. For example, the kits can be used to determine whether a subject is healthy or has a precancerous lesion of the breast, a breast cancer, a metastasized breast cancer or a non-malignant disease of the breast, thus aiding the diagnosis of a breast cancer and/or a non-malignant disease of the breast.
In another example, the kits can be used to identify compounds that modulate expression of said biomolecules.

In one embodiment, a kit comprises an adsorbent on a biologically active surface, wherein the adsorbent is suitable for binding one or more biomolecules of the invention, a denaturation solution for the pre-treatment of a sample, a binding solution and a washing solution, or instructions for making a denaturation solution, binding solution, or washing solution, wherein the combination allows for the detection of a biomolecule using gas phase ion spectrometry. Such kits can be prepared from the materials described in the previous sections (e.g., denaturation buffer, binding buffer, adsorbents, washing solutions, etc.).

In some embodiments, the kit may comprise a first substrate comprising an adsorbent thereon (e.g., a particle functionalized with an adsorbent) and a second substrate onto which the first substrate can be positioned to form a probe, which is removably insertable into a gas phase ion spectrometer. In other embodiments, the kit may comprise a single substrate, which is in the form of a removably insertable probe with adsorbents on the substrate.

In another embodiment, a kit comprises a binding molecule that specifically binds to a biomolecule related to the invention, a detection reagent, appropriate solutions and instructions on how to use the kit. Such kits can be prepared from the materials described above, and other materials known to those skilled in the art. A binding molecule used within such a kit may include, but is not limited to, proteins, peptides, nucleotides, nucleic acids, hormones, amino acids, sugars, fatty acids, steroids, polynucleotides, carbohydrates, lipids, or a combination thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins), compounds or synthetic molecules. Preferably, a binding molecule used in said kit is an antibody.
In either embodiment, the kit may optionally further comprise a standard or control information so that the test sample can be compared with the control information standard to determine whether the test amount of a marker detected in a sample is a diagnostic amount consistent with a diagnosis of a breast cancer or a non-malignant disease of the breast.

Each recorded measurement reading is accompanied by a margin of deviation. This statistical imprecision is well known to those skilled in the art. Within the scope of the present invention, the margin of deviation is exclusively device-specific, i.e. it is caused by the type of analytical device used, which is preferably a mass spectrometer. The accuracy of the recorded measurement reading is specified by a fixed percentage. Within the meaning of the present invention, each disclosed molecular mass represents the mean of a range that extends about ± 0.5 % on either side of it.

Furthermore, slight differences appear in the molecular mass values reported for the same protein in parallel patent applications disclosing cancer biomarkers. There are three reasons for this. First, each molecular mass results from the analysis of samples belonging to a different type of cancer; the origin of the sample, the cellular status, the environmental conditions of the gathered tissue, etc. influence the measurements. Secondly, the given molecular mass of each biomarker is an averaged value calculated from the data of numerous samples of each cancer type. Thirdly, measuring errors are also conceivable, for example due to sample preparation.

The above statements are further illustrated by examples, which should not be construed as limiting with regard to the type of disease, the number of given molecular masses or in any other way.
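The ± 0.5 % device-specific margin stated above can be turned into an explicit mass window. The sketch below assumes the margin is rounded half up to whole daltons; under that assumption it reproduces, for example, 1506 ± 8 Da, 2017 ± 10 Da, 9160 ± 46 Da and 28313 ± 142 Da from the lists, although a few listed tolerances differ slightly from this rule. The function names are hypothetical.

```python
import math
from typing import Tuple

RELATIVE_MARGIN = 0.005  # ± 0.5 %, as stated in the text


def margin_da(mass_da: float, relative: float = RELATIVE_MARGIN) -> int:
    """Device-specific margin of deviation, rounded half up to whole Da (assumption)."""
    return math.floor(mass_da * relative + 0.5)


def mass_window(mass_da: float) -> Tuple[float, float]:
    """Lower and upper bound of the range whose mean is mass_da."""
    m = margin_da(mass_da)
    return (mass_da - m, mass_da + m)


if __name__ == "__main__":
    for mass in (1506, 2017, 9160, 28313):
        print(mass, margin_da(mass), mass_window(mass))
```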
The following molecular masses [Da] of biomolecules are regarded as equivalent:
(i) 1516 ± 8 (epithelial cancer) and 1506 ± 8 (breast cancer)
(ii) 1535 ± 8 (epithelial cancer) and 1533 ± 8 (breast cancer)
(iii) 1624 ± 8 (pancreatic cancer) and 1623 ± 8 (breast cancer)
(iv) 2020 ± 10 (epithelial cancer), 2020 ± 10 (colorectal cancer), 2020 ± 10 (pancreatic cancer) and 2017 ± 10 (breast cancer)
(v) 2050 ± 10 (epithelial cancer), 2049 ± 10 (colorectal cancer) and 2053 ± 10 (breast cancer)
(vi) 2270 ± 11 (colorectal cancer), 2271 ± 11 (pancreatic cancer) and 2268 ± 11 (breast cancer)
(vii) 3326 ± 17 (colorectal cancer) and 3328 ± 17 (breast cancer)
(viii) 3946 ± 20 (epithelial cancer), 3946 ± 20 (colorectal cancer), 3951 ± 20 (pancreatic cancer) and 3951 ± 20 (breast cancer)
(ix) 4104 ± 21 (epithelial cancer), 4103 ± 21 (colorectal cancer), 4108 ± 20 (pancreatic cancer) and 4107 ± 21 (breast cancer)
(x) 4151 ± 21 (epithelial cancer) and 4161 ± 21 (breast cancer)
(xi) 4242 ± 21 (colorectal cancer), 4249 ± 21 (pancreatic cancer) and 4245 ± 21 (breast cancer)
(xii) 4298 ± 21 (epithelial cancer), 4295 ± 21 (colorectal cancer), 4307 ± 21 (pancreatic cancer) and 4295 ± 21 (breast cancer)
(xiii) 4360 ± 22 (epithelial cancer), 4359 ± 22 (colorectal cancer), 4364 ± 22 (pancreatic cancer) and 4363 ± 22 (breast cancer)
(xiv) 4477 ± 22 (epithelial cancer), 4476 ± 22 (colorectal cancer), 4480 ± 22 (pancreatic cancer) and 4476 ± 22 (breast cancer)
(xv) 4607 ± 23 (colorectal cancer), 4614 ± 23 (pancreatic cancer) and 4614 ± 23 (breast cancer)
(xvi) 4719 ± 24 (colorectal cancer), 4725 ± 24 (pancreatic cancer) and 4725 ± 24 (breast cancer)
(xvii) 4830 ± 24 (colorectal cancer), 4836 ± 24 (pancreatic cancer) and 4831 ± 24 (breast cancer)
(xviii) 4867 ± 24 (epithelial cancer), 4865 ± 24 (colorectal cancer), 4875 ± 24 (pancreatic cancer) and 4874 ± 24 (breast cancer)
(xix) 4958 ± 25 (epithelial cancer), 4963 ± 25 (colorectal cancer), 4969 ± 25 (pancreatic cancer) and 4962 ± 25 (breast cancer)
(xx) 5112 ± 26 (colorectal cancer), 5119 ± 26 (pancreatic cancer) and 5115 ± 26 (breast cancer)
(xxi) 5491 ± 27 (epithelial cancer), 5493 ± 27 (colorectal cancer), 5497 ± 27 (pancreatic cancer) and 5497 ± 27 (breast cancer)
(xxii) 5650 ± 28 (epithelial cancer), 5648 ± 28 (colorectal cancer), 5657 ± 28 (pancreatic cancer) and 5655 ± 28 (breast cancer)
(xxiii) 5854 ± 29 (colorectal cancer), 5857 ± 29 (pancreatic cancer) and 5863 ± 29 (breast cancer)
(xxiv) 6449 ± 32 (epithelial cancer), 6446 ± 32 (colorectal cancer), 6458 ± 32 (pancreatic cancer) and 6454 ± 32 (breast cancer)
(xxv) 6644 ± 33 (colorectal cancer) and 6655 ± 33 (breast cancer)
(xxvi) 6897 ± 35 (colorectal cancer), 6908 ± 35 (pancreatic cancer) and 6906 ± 35 (breast cancer)
(xxvii) 7001 ± 35 (epithelial cancer), 6999 ± 35 (colorectal cancer), 7013 ± 35 (pancreatic cancer) and 7012 ± 35 (breast cancer)
(xxviii) 7575 ± 38 (colorectal cancer) and 7591 ± 38 (breast cancer)
(xxix) 7969 ± 40 (epithelial cancer), 8001 ± 40 (pancreatic cancer) and 7998 ± 40 (breast cancer)
(xxx) 8232 ± 41 (epithelial cancer), 8215 ± 41 (colorectal cancer), 8237 ± 41 (pancreatic cancer) and 8230 ± 41 (breast cancer)
(xxxi) 8474 ± 42 (colorectal cancer), 8494 ± 42 (pancreatic cancer) and 8487 ± 42 (breast cancer)
(xxxii) 8574 ± 43 (colorectal cancer), 8596 ± 43 (pancreatic cancer) and 8589 ± 43 (breast cancer)
(xxxiii) 8711 ± 44 (epithelial cancer), 8702 ± 44 (colorectal cancer), 8717 ± 44 (pancreatic cancer) and 8717 ± 44 (breast cancer)
(xxxiv) 8780 ± 44 (colorectal cancer), 8794 ± 44 (pancreatic cancer) and 8792 ± 44 (breast cancer)
(xxxv) 8922 ± 45 (colorectal cancer), 8942 ± 45 (pancreatic cancer) and 8939 ± 45 (breast cancer)
(xxxvi) 9143 ± 46 (colorectal cancer), 9163 ± 46 (pancreatic cancer) and 9160 ± 46 (breast cancer)
(xxxvii) 9201 ± 46 (colorectal cancer), 9220 ± 46 (pancreatic cancer) and 9221 ± 46 (breast cancer)
(xxxviii) 9359 ± 47 (colorectal cancer), 9382 ± 47 (pancreatic cancer) and 9377 ± 47 (breast cancer)
(xxxix) 9425 ± 47 (colorectal cancer), 9443 ± 47 (pancreatic cancer) and 9446 ± 47 (breast cancer)
(xl) 9641 ± 48 (colorectal cancer), 9652 ± 48 (pancreatic cancer) and 9661 ± 48 (breast cancer)
(xli) 9718 ± 49 (colorectal cancer), 9741 ± 49 (pancreatic cancer) and 9737 ± 49 (breast cancer)
(xlii) 9930 ± 50 (colorectal cancer) and 9955 ± 50 (breast cancer)
(xliii) 10215 ± 51 (colorectal cancer), 10233 ± 51 (pancreatic cancer) and 10232 ± 51 (breast cancer)
(xliv) 10440 ± 52 (colorectal cancer), 10455 ± 52 (pancreatic cancer) and 10464 ± 52 (breast cancer)
(xlv) 10665 ± 53 (epithelial cancer), 10748 ± 54 (pancreatic cancer) and 10682 ± 53 (breast cancer)
(xlvi) 11464 ± 57 (colorectal cancer), 11488 ± 57 (pancreatic cancer) and 11414 ± 57 (breast cancer)
(xlvii) 11547 ± 58 (colorectal cancer), 11558 ± 58 (pancreatic cancer) and 11567 ± 58 (breast cancer)
(xlviii) 11693 ± 58 (colorectal cancer), 11713 ± 58 (pancreatic cancer) and 11723 ± 58 (breast cancer)
(xlix) 12504 ± 62 (epithelial cancer) and 12492 ± 62 (breast cancer)
(l) 12669 ± 63 (epithelial cancer), 12619 ± 63 (colorectal cancer), 12648 ± 63 (pancreatic cancer) and 12656 ± 63 (breast cancer)
(li) 13632 ± 68 (colorectal cancer) and 13652 ± 68 (breast cancer)
(lii) 13784 ± 69 (colorectal cancer), 13800 ± 69 (pancreatic cancer) and 13776 ± 69 (breast cancer)
(liii) 13824 ± 69 (pancreatic cancer) and 13812 ± 69 (breast cancer)
(liv) 13989 ± 70 (epithelial cancer), 13983 ± 70 (colorectal cancer) and 14014 ± 70 (breast cancer)
(lv) 14206 ± 71 (pancreatic cancer) and 14082 ± 70 (breast cancer)
(lvi) 14798 ± 74 (colorectal cancer), 14829 ± 74 (pancreatic cancer) and 14821 ± 74 (breast cancer)
(lvii) 15140 ± 76 (colorectal cancer), 15168 ± 76 (pancreatic cancer) and 15160 ± 76 (breast cancer)
(lviii) 15350 ± 77 (colorectal cancer), 15378 ± 77 (pancreatic cancer) and 15367 ± 77 (breast cancer)
(lix) 15879 ± 79 (colorectal cancer), 15858 ± 79 (pancreatic cancer) and 15909 ± 79 (breast cancer)
(lx) 15959 ± 80 (epithelial cancer), 15957 ± 80 (colorectal cancer), 15984 ± 80 (pancreatic cancer) and 15975 ± 80 (breast cancer)
(lxi) 16164 ± 81 (epithelial cancer), 16164 ± 81 (colorectal cancer), 16200 ± 81 (pancreatic cancer) and 16202 ± 81 (breast cancer)
(lxii) 17279 ± 86 (epithelial cancer), 17263 ± 86 (colorectal cancer) and 17288 ± 86 (breast cancer)
(lxiii) 17406 ± 87 (epithelial cancer), 17397 ± 87 (colorectal cancer), 17426 ± 87 (pancreatic cancer) and 17416 ± 87 (breast cancer)
(lxiv) 17630 ± 88 (epithelial cancer), 17617 ± 88 (colorectal cancer) and 17638 ± 88 (breast cancer)
(lxv) 17890 ± 89 (colorectal cancer), 17932 ± 89 (pancreatic cancer) and 17961 ± 89 (breast cancer)
(lxvi) 18133 ± 91 (epithelial cancer), 18115 ± 91 (colorectal cancer), 18153 ± 91 (pancreatic cancer) and 18146 ± 91 (breast cancer)
(lxvii) 17890 ± 89 (colorectal cancer), 17932 ± 89 (pancreatic cancer) and 17961 ± 90 (breast cancer)
(lxviii) 18647 ± 93 (pancreatic cancer) and 18656 ± 93 (breast cancer)
(lxix) 22338 ± 112 (colorectal cancer) and 22383 ± 112 (breast cancer)
(lxx) 22466 ± 113 (colorectal cancer) and 22496 ± 113 (breast cancer)
(lxxi) 22676 ± 114 (colorectal cancer) and 22710 ± 114 (breast cancer)
(lxxii) 23166 ± 116 (pancreatic cancer) and 23218 ± 116 (breast cancer)
(lxxiii) 28055 ± 140 (colorectal cancer), 28009 ± 140 (pancreatic cancer) and 28119 ± 141 (breast cancer)
(lxxiv) 28259 ± 141 (colorectal cancer), 28124 ± 141 (pancreatic cancer) and 28313 ± 142 (breast cancer)

In all examples, each recorded measurement reading overlaps with every other reading of the same example within its margin of deviation. A further calculation of averaged values which incorporates the matching molecular masses of each type of cancer is known to those skilled in the art.
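The overlap criterion stated above can be made explicit: two readings m1 ± d1 and m2 ± d2 overlap when |m1 - m2| <= d1 + d2. The sketch below checks this for every pair of readings in one example; the function names are hypothetical and the readings in the usage example are items (i) and (lv) of the equivalence list.

```python
from itertools import combinations
from typing import Sequence, Tuple

Reading = Tuple[float, float]  # (molecular mass in Da, margin of deviation in Da)


def intervals_overlap(a: Reading, b: Reading) -> bool:
    """True if the windows m1 ± d1 and m2 ± d2 share at least one mass value."""
    (m1, d1), (m2, d2) = a, b
    return abs(m1 - m2) <= d1 + d2


def all_pairwise_overlap(readings: Sequence[Reading]) -> bool:
    """True if every pair of readings in one example overlaps."""
    return all(intervals_overlap(a, b) for a, b in combinations(readings, 2))


if __name__ == "__main__":
    example_i = [(1516, 8), (1506, 8)]       # item (i)
    example_lv = [(14206, 71), (14082, 70)]  # item (lv)
    print(all_pairwise_overlap(example_i), all_pairwise_overlap(example_lv))
```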
By applying the formulas on which error calculation by means of weights (weighted average) is based, the following generalized results are obtained for the aforementioned examples:
(i) 1511 ± 8
(ii) 1534 ± 8
(iii) 1624 ± 8
(iv) 2019 ± 10
(v) 2051 ± 10
(vi) 2270 ± 11
(vii) 3327 ± 17
(viii) 3949 ± 20
(ix) 4106 ± 21
(x) 4155 ± 21
(xi) 4245 ± 21
(xii) 4299 ± 21
(xiii) 4362 ± 22
(xiv) 4477 ± 22
(xv) 4612 ± 23
(xvi) 4723 ± 24
(xvii) 4832 ± 24
(xviii) 4870 ± 24
(xix) 4963 ± 25
(xx) 5115 ± 26
(xxi) 5495 ± 27
(xxii) 5653 ± 28
(xxiii) 5858 ± 29
(xxiv) 6452 ± 32
(xxv) 6650 ± 33
(xxvi) 6904 ± 35
(xxvii) 7006 ± 35
(xxviii) 7583 ± 38
(xxix) 7989 ± 40
(xxx) 8229 ± 41
(xxxi) 8485 ± 42
(xxxii) 8586 ± 43
(xxxiii) 8712 ± 44
(xxxiv) 8789 ± 44
(xxxv) 8934 ± 45
(xxxvi) 9155 ± 46
(xxxvii) 9214 ± 46
(xxxviii) 9373 ± 47
(xxxix) 9438 ± 47
(xl) 9651 ± 48
(xli) 9732 ± 49
(xlii) 9943 ± 50
(xliii) 10227 ± 51
(xliv) 10453 ± 52
(xlv) 10698 ± 53
(xlvi) 11455 ± 57
(xlvii) 11557 ± 58
(xlviii) 11710 ± 59
(xlix) 12498 ± 62
(l) 12648 ± 63
(li) 13642 ± 68
(lii) 13787 ± 69
(liii) 13818 ± 69
(liv) 13995 ± 70
(lv) 14144 ± 71
(lvi) 14816 ± 74
(lvii) 15156 ± 76
(lviii) 15365 ± 77
(lix) 15882 ± 78
(lx) 15969 ± 80
(lxi) 16183 ± 81
(lxii) 17277 ± 86
(lxiii) 17411 ± 87
(lxiv) 17628 ± 88
(lxv) 17928 ± 90
(lxvi) 18137 ± 91
(lxvii) 18415 ± 92
(lxviii) 18652 ± 93
(lxix) 22361 ± 112
(lxx) 22481 ± 113
(lxxi) 22693 ± 114
(lxxii) 23192 ± 116
(lxxiii) 28061 ± 140
(lxxiv) 28232 ± 141

The present invention is further illustrated by the following examples, which should not be construed as limiting in any way. The contents of all cited references (including literature references, issued patents and published patent applications), as cited throughout this application, are hereby expressly incorporated by reference.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA and immunology, which are known to those skilled in the art. Such techniques are explained fully in the literature.

Examples

Example 1. Sample collection.

Serum samples were obtained from a total of 216 individuals: 147 samples from women suffering from a given disease of the breast, courtesy of the Department of Gynaecology and Obstetrics at the University of Heidelberg in Heidelberg, Germany; and 69 serum samples from healthy individuals, courtesy of both the "Deutsches Rotes Kreuz (DRK)" in Berlin, Germany, and the "GENICA study group" in Bonn, Germany.

The serum samples from women suffering from a breast disease were further subdivided by the type of disease and the stage to which it had progressed, e.g. non-malignant disease (mastopathy and other conditions), DCIS or breast cancer (Table 1). Serum samples were collected from the patients directly before surgery. At this time, a primary diagnosis was made based on standard techniques, e.g. mammography, magnetic resonance imaging (MRI) and/or other means for the detection of diseases of the breast. In most cases the final diagnosis was confirmed by histological evaluation after surgery. In about 30% of the cases surgery was not possible due to the advanced stage of cancer. Follow-up data for all breast cancer patients are currently being collected and will be available for later studies.

Example 2. ProteinChip Array analysis.
ProteinChip Arrays of the SAX2 type (strong anion exchanger) were mounted in a bioprocessor (Ciphergen Biosystems, Inc.), a device that holds up to 12 ProteinChips and facilitates their processing. The ProteinChips were pre-incubated in the bioprocessor twice for 15 minutes with 200 µl binding buffer (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5). 10 µl of serum sample was diluted 1:5 in a buffer (7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, 2% ampholine) and again diluted 1:10 in the binding buffer. Then, 300 µl of this mixture (equivalent to 6 µl of the original serum sample) was applied directly to the spots of the SAX2 ProteinChips. Between dilution steps, and prior to application to the spots, the sample was kept on ice (at 0 °C). After incubation for 120 minutes at 20 to 24 °C, the chips were incubated with 200 µl binding buffer before 2 × 0.5 µl EAM solution (20 mg/ml sinapinic acid in 50% acetonitrile and 0.5% trifluoroacetic acid) was applied to the spots. After air-drying for 10 min, the ProteinChips were placed in the ProteinChip Reader (ProteinChip Biology System II, Ciphergen Biosystems, Inc.), and time-of-flight spectra were generated by laser shots collected in the positive mode at an average laser intensity of 215 and a detector sensitivity of 8. Sixty laser shots were averaged per spectrum.

Calibration of mass accuracy was performed using the following mixture of mass standard calibrant proteins: Dynorphin A (porcine, 209-225, 2147.50 Da), Beta-endorphin (human, 61-91, 3465.00 Da), Insulin (bovine, 5733.58 Da) and Cytochrome c (bovine, 12230.90 Da) at a concentration of 1.21 pmol/µl, and Myoglobin (equine cardiac, 16951.50 Da) at a concentration of 5.16 pmol/µl. 0.5 µl of this mixture was applied to a single spot of an H4 ProteinChip array.
After air-drying of the drop, 2 × 1 µl matrix solution (a saturated solution of sinapinic acid in 50% acetonitrile, 0.5% trifluoroacetic acid) was applied to the spot. The drop was allowed to air-dry for 10 min after each application of matrix solution. The ProteinChip was placed in the ProteinChip Reader (Biology System II, Ciphergen Biosystems, Inc.), and time-of-flight spectra were generated by laser shots collected in the positive mode at laser intensity 210 and a detector sensitivity of 8. Sixty laser shots were averaged per spectrum. Subsequently, time-of-flight values were correlated with the molecular masses of the standard proteins, and calibration was performed according to the instrument manual.

Example 3. Peak detection and data analysis.

The data were analysed by automatic peak detection and alignment using the operating software of the ProteinChip Biology System II, the ProteinChip Software Version 3.1 (Ciphergen Biosystems, Inc.). Figure 1 compares protein mass spectra detected with the above-mentioned SAX2 ProteinChip arrays for samples from patients suffering from a non-malignant disease of the breast (C1 and C2) and from patients with breast cancer (T1 and T2).

The m/z values of all mass spectra selected for the analysis ranged between 1500 Da and 30000 Da. Smaller masses were not used because artefacts from the "Energy Absorbing Molecule" (EAM, "matrix") could not be excluded, and higher masses were not detected under the chosen experimental conditions. First, baseline subtraction was performed for each mass spectrum, followed by external calibration with the calibration equation generated on Oct 01, 2003 (most of the spectra were recorded under this calibration) and subsequent internal calibration using the mass signal at 6655.0 Da, which is present in all spectra with a signal-to-noise ratio of at least 5.
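The external calibration (correlating time-of-flight values with the calibrant masses) and a single-point internal correction at 6655.0 Da can be sketched as below. This is a sketch under stated assumptions: it uses the idealised linear time-of-flight model sqrt(m/z) = a·t + b fitted by least squares, which is not necessarily the calibration equation used by the ProteinChip Software; the multiplicative internal correction is likewise an assumed model. The flight times are synthetic values generated from assumed constants, not measured data, and all function names are hypothetical.

```python
import math

# Calibrant masses from Example 2 (Da); singly charged ions assumed, so m/z = m
CALIBRANTS = [2147.50, 3465.00, 5733.58, 12230.90, 16951.50]


def fit_tof_calibration(flight_times, masses):
    """Least-squares fit of sqrt(m) = a * t + b (idealised linear TOF model)."""
    ys = [math.sqrt(m) for m in masses]
    n = len(flight_times)
    mean_t = sum(flight_times) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(flight_times, ys))
    sxx = sum((t - mean_t) ** 2 for t in flight_times)
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


def tof_to_mass(t, a, b):
    """Convert a flight time to m/z with the fitted calibration."""
    return (a * t + b) ** 2


def internal_recalibrate(masses, observed_ref, true_ref=6655.0):
    """Hypothetical single-point correction: rescale so the reference peak lands on true_ref."""
    factor = true_ref / observed_ref
    return [m * factor for m in masses]


# Synthetic flight times (arbitrary units) from assumed constants a = 0.002, b = 0.5
A_TRUE, B_TRUE = 0.002, 0.5
times = [(math.sqrt(m) - B_TRUE) / A_TRUE for m in CALIBRANTS]
a, b = fit_tof_calibration(times, CALIBRANTS)
t_ref = (math.sqrt(6655.0) - B_TRUE) / A_TRUE
print(round(tof_to_mass(t_ref, a, b), 2))  # 6655.0
```

On noiseless synthetic data the fit recovers the assumed constants exactly; with real spectra the residuals of the five calibrants would indicate how well the chosen model fits.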
Then, the spectra were normalised to the intensity of the total ion current in the range from 1500 to 50000 Da. Finally, automatic peak detection was applied as previously described by Adam et al., using the "Biomarker Wizard" tool of the ProteinChip Software Version 3.1 (Ciphergen Biosystems, Inc.). The following settings were chosen for peak detection by "Biomarker Wizard":
a) auto-detect peaks to cluster,
b) first pass signal/noise = 5,
c) minimum peak threshold: 2% of all spectra (peak present in at least 5 of 15 DCIS samples),
d) deletion of user-detected peaks below threshold,
e) cluster mass window: ±0.3% of mass,
f) second pass signal/noise = 2.
Using these settings, 91 signal clusters were identified. The following clusters were deleted because of defective peak recognition: m/z 1553.25, 9598.64, 14211.2, 16139.6, 17161.4, 18879.3, 22979.5, 23455.3 and 27570.4 Da. The clusters m/z 1508.07, 2020.59, 4303.89.3, and 4614.06, 18408.2, and 23174.5 Da were changed into 1506, 2017, 3660, 4295, 4611, 18430 and 23210 Da, respectively, because of defective cluster mass centring. In total, 82 signal clusters were thus obtained.

The cluster information (containing sample ID and sample group, cluster mass values and cluster signal intensities for each spectrum) was exported to an interchangeable data format (a .csv table) using the "Sample group statistics" function of the "Biomarker Wizard" tool of the ProteinChip Software Version 3.1. In this format, the data were subjected to statistical analyses (see Examples 4 to 6).

Example 4. Classifier construction.

Classifiers with a binary target variable (cancer versus non-cancer) were constructed as follows. First, as a proof of principle, classifiers were constructed and evaluated by stratified 10-fold cross-validation.
The data set was partitioned into 10 approximately equal-sized subsets in which the two classes are represented in about the same proportion as in the overall data set. Then, 10 classifiers were constructed, each using only 9/10 of the data by excluding one subset in turn. Classifier performance was determined on the excluded test subset. Thereby, each available case was used 9 times for classifier construction and once for classifier evaluation. Test results were pooled to determine overall sensitivity and specificity. Second, a final classifier was constructed on the basis of all available cases. This classifier was evaluated using out-of-bag error estimates, see below.

Classifiers were constructed as decision tree ensembles to overcome the typical instabilities of simple forward variable selection procedures such as single decision trees, thereby improving overall classifier performance on independent test data, see e.g. Breiman L. (1996) Bagging Predictors, Machine Learning, Vol. 24, No. 2, pp. 123-140. The results of the present invention were generated using the "random forest" approach; see the following references, available at ftp://ftp.stat.berkeley.edu/pub/users/breiman/:

Breiman L. (2001a) Random Forests. Machine Learning, 45(1):5-32.
Breiman L. (2001b) Wald Lecture I: Machine Learning.
Breiman L. (2001c) Wald Lecture II: Looking Inside the Black Box.
Breiman L. (2003) Manual - Setting Up, Using, and Understanding Random Forests V4.0.

The generated random forest classifiers consisted of 1000 exploratory decision trees, i.e. maximally grown decision trees consisting of pure terminal nodes only.
The high number of decision trees was used (1) to ensure the best classification performance, i.e. saturation of the test error at the lowest possible level, see Figure 2A, and (2) to obtain a sound statistical basis for determining variable importance. Decision tree generation was based on bootstrap sub-samples resulting from 98 random selections of cases with replacement from each class, so that both classes were weighted equally. Nodes were split by applying the Gini splitting rule to random subsets of 8 randomly selected variables (masses).

Example 5. Classifier structure.

The final classifier consists of 1000 decision trees, each typically consisting of about 25 terminal nodes, see Figure 3. For each variable, variable importance was determined as the total decrease in node impurity achieved by splits on this variable, averaged over all trees. The high number of trees ensures a sound statistical basis for variable importance. Node impurity was measured by the Gini index. Table 4 shows all variables ranked according to importance in the final random forest classifier.
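The sampling and splitting ingredients described in Examples 4 and 5 can be sketched in plain Python as below. This is not the random forest software used to generate the results; under stated assumptions it only illustrates (i) stratified assignment of cases to 10 folds, (ii) the class-balanced bootstrap of 98 cases drawn with replacement per class together with the resulting out-of-bag cases, (iii) the random choice of 8 candidate masses per node split, and (iv) the Gini impurity decrease of a split. Weighting the children by their share of the parent node is one common convention and an assumption here; all function names are hypothetical.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Assign each case to one of k folds so that class proportions are roughly preserved."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k  # deal the cases of this class round-robin
    return folds


def balanced_bootstrap(labels, n_per_class=98, rng=None):
    """Draw n_per_class cases with replacement from each class; return (in-bag, out-of-bag)."""
    rng = rng or random.Random(0)
    in_bag = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        in_bag.extend(rng.choice(idx) for _ in range(n_per_class))
    oob = sorted(set(range(len(labels))) - set(in_bag))
    return in_bag, oob


def candidate_features(n_features, mtry=8, rng=None):
    """Random subset of mtry variables (masses) considered at one node split."""
    rng = rng or random.Random(0)
    return rng.sample(range(n_features), mtry)


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease of a split; children weighted by their share of the parent node."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


# 108 negative and 108 positive cases, as in Tables 2 and 3
labels = [0] * 108 + [1] * 108
folds = stratified_folds(labels)
in_bag, oob = balanced_bootstrap(labels)
print(Counter(labels[i] for i in in_bag))  # Counter({0: 98, 1: 98})
print(gini_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5
```

In a full forest, a tree would be grown on each `in_bag` sample, choosing at every node the best split among `candidate_features(82)`; summing `gini_decrease` per variable over all splits and averaging over trees gives an importance score of the kind reported in Table 4.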
The high classification performance of random forests is based on the high degree of independence of the underlying low-bias single decision trees. This independence is established by two stochastic processes: (1) bootstrapping, which introduces variations of the training data, and (2) the random restriction to small variable subsets for each node split.

The classification result of the final random forest classifier is determined by majority vote: each case is assigned to the class for which most single decision trees vote. The more decision trees vote for a given class, the higher the probability that the case actually belongs to that class. Figure 4 visualizes how normalized votes for class "positive" are distributed. Votes were determined by an out-of-bag approach to estimate the distribution of votes on independent test data. Vote normalization was performed as follows: (number of votes for class "positive" - number of votes for class "negative") / (number of trees for which the considered case is "out-of-bag"). Normalized votes range from -1 (all votes for class "negative") to +1 (all votes for class "positive"). Cases that are difficult to classify have normalized votes around zero; cases with an especially clear classification result have a high absolute value of the normalized vote. In Figure 4, dashed vertical lines correspond to the quantiles at 0%, 25%, 50%, 75% and 100%, thereby illustrating which values of normalized votes are typical for clear voting results (e.g. below the 25% quantile and above the 75% quantile) and for non-clear voting results (between the 25% and 75% quantiles).

Example 6. Classification performance.

Classification performance was estimated by two different methods: 1) 10-fold cross-validation in the proof-of-principle framework and 2) out-of-bag estimation for the final classifier. The confusion matrix obtained from cross-validation is presented in Table 2.
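The normalized vote of Example 5 and the performance figures of Example 6 are simple ratios. The sketch below, using hypothetical helper names, computes the normalized vote for one case, sensitivity and specificity from a 2 × 2 confusion matrix (here the counts of Table 2: 73 true negatives, 35 false positives, 25 false negatives, 83 true positives), and an ROC curve with a trapezoidal AUC obtained by sweeping the voting threshold. The vote fractions used for the ROC functions are made-up illustration values, not data from this study.

```python
def normalized_vote(pos_votes, neg_votes):
    """(positive - negative votes) / number of out-of-bag trees; ranges from -1 to +1."""
    n_oob = pos_votes + neg_votes
    return (pos_votes - neg_votes) / n_oob


def sensitivity_specificity(tn, fp, fn, tp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_curve(pos_fractions, labels):
    """ROC points (1 - specificity, sensitivity) for thresholds swept from 1 down to 0.

    A case is called positive when its fraction of positive votes exceeds the threshold,
    so the majority-vote rule corresponds to the threshold 0.5.
    """
    thresholds = sorted(set(pos_fractions) | {0.0, 1.0}, reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds + [-1.0]:  # -1 ensures the final point is (1, 1)
        tp = sum(1 for f, y in zip(pos_fractions, labels) if f > thr and y == 1)
        fp = sum(1 for f, y in zip(pos_fractions, labels) if f > thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


sens, spec = sensitivity_specificity(tn=73, fp=35, fn=25, tp=83)
print(f"{100 * spec:.2f} % specificity, {100 * sens:.2f} % sensitivity")
# 67.59 % specificity, 76.85 % sensitivity
```

The same helper applied to the counts of Table 3 (74, 34, 25, 83) reproduces the 68.52% specificity reported for the final classifier.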
Estimated performance was 67.59% specificity and 76.85% sensitivity. The confusion matrix obtained for the final classifier from out-of-bag estimation is presented in Table 3. It yields a slightly higher specificity of 68.52% at the same sensitivity of 76.85%.

For the final classifier, the progression and success of learning are visualized in Figure 2A. The out-of-bag error is the proportion of misclassified cases in the entire data set. For the classification of each case (patient), only those trees are applied that were constructed independently of that case, i.e. for which the considered case was not in the bootstrap sub-sample used for training. Such cases are called "out-of-bag" cases.

Table 3 states classifier performance on the basis of out-of-bag estimation and majority voting for the final classifier. A case is assigned to class "positive" if more than 50% of the decision trees vote for this class. By varying the 50% threshold from 0% to 100%, we obtain an out-of-bag estimate of the ROC curve of the final classifier, see Figure 2B. The ROC curve extrapolates the performance of the generated classifier to neighbouring sensitivity and specificity ranges, thereby visualizing the possible trade-off between sensitivity and specificity.

The out-of-bag ROC curve is a valid estimate of the ROC performance of the final classifier on unseen test data, because the out-of-bag error was not used for classifier tuning. Instead, training parameters were chosen in accordance with Breiman L. (2001a) and in order to obtain reasonable statistics for variable importance, see Table 4. The obtained AUC value is 0.79.

Summary

Currently, many groups are using proteomic technologies to comparatively analyse differences in protein levels between diseased and non-diseased patients in the hope of discovering serological biomarkers that will aid in disease diagnosis.
One such technology is surface-enhanced laser desorption ionization (SELDI), a modification of matrix-assisted laser desorption ionization/time of flight (MALDI-TOF). This mass spectrometry technique allows the simultaneous analysis of multiple biomarkers within a biological sample. When coupled with decision tree ensembles of varying complexity, it can lead to the identification of biomarker patterns (classifiers) that correctly classify a patient as healthy or as having a given disease. In the context of this invention, the biomarker profiles (biomolecule molecular masses) listed in Table 4 are able to classify a patient as either healthy, having a non-malignant disease of the breast, or having DCIS (early-stage cancer) or breast cancer, with 76.85% sensitivity and 67.59% to 68.52% specificity (see Tables 2 and 3). The higher the sensitivity and specificity of a biomarker pattern, the more likely it is to determine a patient's diagnosis accurately.

Herein, classification performance was estimated by two different approaches, cross-validation and out-of-bag estimation. Both approaches yielded similar performance estimates, see Tables 2 and 3, respectively. The progressive success of classifier generation is shown in Figure 2A, from which it is evident that the out-of-bag error decreases as the number of decision trees increases. Classification performance can be extended to the entire range of sensitivity and specificity and visualized as an ROC curve, see Figure 2B. The classifiers are ensemble classifiers, i.e. they consist of many single decision trees of varying complexity. Figure 3 visualizes decision tree complexity by the number of nodes of each single decision tree. The importance of a single mass in an ensemble classifier is determined by summing its partitioning success (the decrease in node impurity achieved by splits on that mass) and averaging over all trees. This yields the ranked list of masses shown in Table 4.
For some patients, out-of-bag voting is clear, e.g. all trees vote for the same class, while for other patients the decision is close, e.g. 51% of trees vote for class "negative" and 49% for class "positive". The entire gradual distribution of voting results is shown in Figure 4.

The present analysis applies the "random forest" approach, an extension of bagging. In addition to varying the data set at the level of included cases ("bootstrapping"), this approach restricts feature selection in each partitioning step to random feature subsets. As a result, the generated decision trees vary considerably and are more independent of each other. Accordingly, averaging over many decision trees yields better overall classification performance.

Based on this information, the biomarker patterns can be used to develop a comprehensive diagnostic tool for breast cancer detection. Furthermore, such a diagnostic tool will provide the practising clinician with a basis on which to design a more personalised therapy program for a given patient, thereby improving the overall prognosis of the patient.

Table 1. Distribution of serum samples from patients with a given breast disease.

Disease                          Number of samples
Non-malignant   Mastopathy       25
                Other*           14
Malignant       DCIS             15
                T1               37
                T2               38
                T3                8
                T4               10

*fat necrosis, sclerosing adenosis, fibroadenoma, and small duct and intraductal papillomas

Table 2. Confusion matrix by cross-validation using the respective test datasets.

                     predicted negative   predicted positive
actual negative              73                   35
actual positive              25                   83

Table 3. Confusion matrix by out-of-bag estimation for the final classifier.

                     predicted negative   predicted positive
actual negative              74                   34
actual positive              25                   83

Table 4. Variable importance. The table presents the variable importance for all masses, i.e.
the total decrease in node impurity achieved by a variable during final classifier construction, averaged over all trees. Masses are ranked according to their importance (read down each column pair).

mass       importance   mass       importance   mass       importance
M12656.2   6.49         M14082.5   1.22         M4476.36   0.89
M11414.5   5.42         M10464.2   1.14         M9661.47   0.88
M15909.3   4.41         M6454.34   1.13         M22495.9   0.88
M15366.8   3.46         M11723.3   1.12         M2017.00   0.88
M14820.7   3.19         M4363.33   1.11         M2267.52   0.87
M15159.6   3.1          M3327.77   1.07         M28312.7   0.86
M1533.37   2.89         M23217.8   1.06         M7011.50   0.86
M4962.25   2.53         M17637.6   1.05         M8230.23   0.85
M7590.86   2.43         M4831.37   1.05         M4614.05   0.85
M2607.17   2.34         M17961.4   1.04         M4295.00   0.85
M1506.00   2.22         M9445.67   1.04         M8589.09   0.84
M3507.81   1.76         M4161.22   1.03         M8791.50   0.82
M2052.78   1.76         M10232.1   1.02         M4724.85   0.8
M16202.0   1.64         M3950.65   0.99         M9954.84   0.8
M6654.56   1.56         M17503.5   0.98         M22709.5   0.78
M15975.3   1.45         M18145.8   0.97         M9376.93   0.76
M5114.84   1.44         M1974.74   0.97         M8939.44   0.75
M8717.36   1.44         M13811.8   0.96         M4873.64   0.71
M4107.32   1.39         M17416.3   0.96         M28118.6   0.68
M7998.40   1.36         M12491.8   0.95         M14013.7   0.68
M22383.2   1.32         M8486.84   0.94         M18430.0   0.67
M5654.92   1.31         M11566.7   0.94         M6905.53   0.66
M5863.17   1.3          M13776.5   0.93         M13651.7   0.62
M3660.00   1.29         M10681.8   0.92         M18656.4   0.61
M9736.95   1.29         M17287.5   0.92         M9220.92   0.53
M9160.26   1.28         M4244.87   0.89
M5497.21   1.27         M1623.22   0.89
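To use masses such as those in Table 4 as classifier inputs, a new spectrum has to be normalised and its peaks mapped onto the same clusters. The sketch below illustrates the two operations named in Example 3 under explicit assumptions: total-ion-current normalisation is taken to rescale each spectrum so that its summed intensity equals the mean total across spectra (the restriction to the 1500 to 50000 Da range is assumed to have been applied beforehand), and a peak is assigned to the nearest cluster whose centre lies within the ±0.3% cluster mass window. The ProteinChip Software may implement both steps differently; the function names and example intensities are hypothetical.

```python
def tic_normalize(spectra):
    """Scale each spectrum (list of intensities) so its total ion current equals the mean TIC."""
    tics = [sum(s) for s in spectra]
    mean_tic = sum(tics) / len(tics)
    return [[x * mean_tic / tic for x in s] for s, tic in zip(spectra, tics)]


def assign_cluster(peak_mass, cluster_masses, window=0.003):
    """Return the nearest cluster mass within +/- window (relative to the cluster), or None."""
    best = None
    for c in cluster_masses:
        diff = abs(peak_mass - c)
        if diff <= window * c and (best is None or diff < abs(peak_mass - best)):
            best = c
    return best


# A few cluster masses taken from Table 4
clusters = [12656.2, 11414.5, 15909.3, 15366.8, 14820.7]
print(assign_cluster(12660.0, clusters))  # 12656.2
print(assign_cluster(12800.0, clusters))  # None
```

A peak at 12800 Da lies about 144 Da from the nearest cluster, well outside its ±38 Da window, so it is left unassigned rather than forced onto a cluster.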

Claims (18)

1. A method for the differential diagnosis of a breast cancer and/or a non-malignant disease of the breast, in vitro, comprising:
a) obtaining a test sample from a subject,
b) contacting the test sample with a biologically active surface under specific binding conditions,
c) allowing the biomolecules within the test sample to bind to said biologically active surface,
d) detecting bound biomolecules using a detection method, wherein the detection method generates a mass profile of said test sample,
e) transforming the mass profile into a computer-readable form, and
f) comparing the mass profile of e) with a database containing mass profiles specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having metastasised breast cancer, or subjects having a non-malignant disease of the breast,
wherein said comparison allows for the differential diagnosis of a subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer and/or a non-malignant disease of the breast.
2. The method of claim 1, wherein the database is generated by
a) obtaining biological samples from healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having metastasised breast cancer, and subjects having non-malignant disease of the breast,
b) contacting said biological samples with a biologically active surface under specific binding conditions,
c) allowing the biomolecules within the biological samples to bind to said biologically active surface,
d) detecting bound biomolecules using a detection method, wherein the detection method generates mass profiles of said biological samples,
e) transforming the mass profiles into a computer-readable form,
f) applying a mathematical algorithm to classify the mass profiles in e) as specific for healthy subjects, subjects having a precancerous lesion of the breast, subjects having breast cancer, subjects having metastasised breast cancer, and subjects having non-malignant disease of the breast.
3. The method of claim 1, wherein the biomolecules are characterized by:
a) diluting a sample 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, 2% Ampholine, at 0 to 4 °C,
b) further diluting said sample 1:10 with a binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5, at 0 to 4 °C,
c) contacting the sample with a biologically active surface comprising positively charged quaternary ammonium groups,
d) incubating the treated sample with said biologically active surface for 120 minutes at temperatures between 20 and 24 °C at pH 8.5,
e) and analysing the bound biomolecules by gas phase ion spectrometry.
4. The method of claim 1, wherein the detection method is mass spectrometry.
5. The method of claim 4, wherein the method of mass spectrometry is selected from the group of matrix-assisted laser desorption ionization/time of flight (MALDI-TOF), surface enhanced laser desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS, or ESI-MS.
6. The method of claim 1, wherein the biologically active surface comprises an adsorbent selected from the group of quaternary ammonium groups, carboxylate groups, groups with alkyl or aryl chains, groups such as nitrilotriacetic acid that immobilize metal ions, or proteins, antibodies, or nucleic acids.
7. The method of claim 1, wherein the mass profiles comprise a panel of one or more differentially expressed biomolecules.
8. The method of claim 7, wherein the biomolecules are selected from a group having the apparent molecular mass of 1506 Da ± 8 Da, 1533 Da ± 8 Da, 1623 Da ± 8 Da, 1975 Da ± 10 Da, 2017 Da ± 10 Da, 2053 Da ± 10 Da, 2268 Da ± 11 Da, 2607 Da ± 13 Da, 3328 Da ± 17 Da, 3508 Da ± 18 Da, 3660 Da ± 18 Da, 3951 Da ± 20 Da, 4107 Da ± 21 Da, 4161 Da ± 21 Da, 4245 Da ± 21 Da, 4295 Da ± 21 Da, 4363 Da ± 22 Da, 4476 Da ± 22 Da, 4614 Da ± 23 Da, 4725 Da ± 24 Da, 4831 Da ± 24 Da, 4874 Da ± 24 Da, 4962 Da ± 25 Da, 5115 Da ± 26 Da, 5497 Da ± 27 Da, 5655 Da ± 28 Da, 5863 Da ± 29 Da, 6454 Da ± 32 Da, 6655 Da ± 33 Da, 6906 Da ± 35 Da, 7012 Da ± 35 Da, 7591 Da ± 38 Da, 7998 Da ± 40 Da, 8230 Da ± 41 Da, 8487 Da ± 42 Da, 8589 Da ± 43 Da, 8717 Da ± 44 Da, 8792 Da ± 44 Da, 8939 Da ± 45 Da, 9160 Da ± 46 Da, 9221 Da ± 46 Da, 9377 Da ± 47 Da, 9446 Da ± 47 Da, 9661 Da ± 48 Da, 9737 Da ± 49 Da, 9955 Da ± 50 Da, 10232 Da ± 51 Da, 10464 Da ± 52 Da, 10682 Da ± 53 Da, 11414 Da ± 57 Da, 11567 Da ± 58 Da, 11723 Da ± 59 Da, 12492 Da ± 62 Da, 12656 Da ± 63 Da, 13652 Da ± 68 Da, 13776 Da ± 69 Da, 13812 Da ± 69 Da, 14014 Da ± 70 Da, 14082 Da ± 70 Da, 14821 Da ± 74 Da, 15160 Da ± 76 Da, 15367 Da ± 77 Da, 15909 Da ± 78 Da, 15975 Da ± 80 Da, 16202 Da ± 81 Da, 17288 Da ± 86 Da, 17416 Da ± 87 Da, 17504 Da ± 88 Da, 17638 Da ± 88 Da, 17961 Da ± 90 Da, 18146 Da ± 91 Da, 18430 Da ± 92 Da, 18656 Da ± 93 Da, 22383 Da ± 112 Da, 22496 Da ± 113 Da, 22710 Da ± 114 Da, 23218 Da ± 116 Da, 28119 Da ± 141 Da, or 28313 Da ± 142 Da.
9. A method for the identification of differentially expressed biomolecules, wherein the biomolecules of any of claims 1-8 are proteins, comprising:
a) chromatography and fractionation,
b) analysis of fractions for the presence of said differentially expressed proteins and/or fragments thereof, using a biologically active surface,
c) further analysis using mass spectrometry to obtain the amino acid sequences of said proteins and/or fragments thereof, and
d) searching amino acid sequence databases of known proteins to identify said differentially expressed proteins by amino acid sequence comparison.
10. The method of claim 9, wherein the method of chromatography is selected from high performance liquid chromatography (HPLC) or fast protein liquid chromatography (FPLC).
11. The method of claim 9, wherein the mass spectrometry used is selected from the group of matrix-assisted laser desorption ionization/time of flight (MALDI-TOF), surface enhanced laser desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS, or ESI-MS.
12. A method for the differential diagnosis of a breast cancer and/or a malignant disease of the breast, in vitro, comprising detection of one or more differentially expressed biomolecules, wherein the biomolecules are polypeptides, comprising:
a) obtaining a test sample from a subject,
b) contacting said sample with a binding molecule specific for a differentially expressed polypeptide identified in claims 9-11,
c) detecting the presence or absence of said polypeptide(s), wherein the presence or absence of said polypeptide(s) allows for the differential diagnosis of a subject as healthy, having a precancerous lesion of the breast, having a breast cancer, having a metastasised breast cancer and/or a non-malignant disease of the breast.
13. The method of any one of claims 1-12, wherein the test sample is a blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract sample.
14. The method of any one of claims 1-12, wherein the biological sample is a blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract sample.
15. The method of any one of claims 1-12, wherein the subject is of mammalian origin.
16. The method of claim 15, wherein the subject is of human origin.
17. A kit for the diagnosis of a breast cancer and/or a non-malignant disease of the breast within a subject using the method of any one of claims 1-11 and 13-16, comprising a denaturation solution, a binding solution, a washing solution, a biologically active surface comprising an adsorbent, and instructions to use the kit.
18. A kit for the diagnosis of a breast cancer or a non-malignant disease of the breast within a subject using the method of any one of claims 12-16, comprising a solution, binding molecule, detection substrate, and instructions to use the kit.
AU2004239416A 2003-05-15 2004-05-17 Methods and applications of biomarker profiles in the diagnosis and treatment of breast cancer Abandoned AU2004239416A1 (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
EP03090141 2003-05-15
EP03090141.7 2003-05-15
US47277203P 2003-05-23 2003-05-23
EP03090153A EP1477803A1 (en) 2003-05-15 2003-05-23 Serum protein profiling for the diagnosis of epithelial cancers
EP03090153.2 2003-05-23
US60/472,772 2003-05-23
US52558303P 2003-11-24 2003-11-24
EP03090401.5 2003-11-24
US60/525,583 2003-11-24
EP03090401 2003-11-24
EP03090460.1 2003-12-30
EP03090460 2003-12-30
US53419704P 2004-01-02 2004-01-02
US60/534,197 2004-01-02
PCT/EP2004/005292 WO2004102188A1 (en) 2003-05-15 2004-05-17 Methods and applications of biomarker profiles in the diagnosis and treatment of breast cancer

Publications (1)

Publication Number Publication Date
AU2004239416A1 true AU2004239416A1 (en) 2004-11-25

Family

ID=56290563

Family Applications (2)

Application Number Title Priority Date Filing Date
AU2004239418A Abandoned AU2004239418A1 (en) 2003-05-15 2004-05-17 Biomarkers for the differential diagnosis of pancreatitis and pancreatic cancer
AU2004239416A Abandoned AU2004239416A1 (en) 2003-05-15 2004-05-17 Methods and applications of biomarker profiles in the diagnosis and treatment of breast cancer

Family Applications Before (1)

Application Number Title Priority Date Filing Date
AU2004239418A Abandoned AU2004239418A1 (en) 2003-05-15 2004-05-17 Biomarkers for the differential diagnosis of pancreatitis and pancreatic cancer

Country Status (4)

Country Link
EP (2) EP1629278A1 (en)
AU (2) AU2004239418A1 (en)
CA (2) CA2525740A1 (en)
WO (2) WO2004102189A1 (en)

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
WO2002023200A2 (en) * 2000-09-11 2002-03-21 Ciphergen Biosystems, Inc. Human breast cancer biomarkers
US6855554B2 (en) * 2001-09-21 2005-02-15 Board Of Regents, The University Of Texas Systems Methods and compositions for detection of breast cancer

Also Published As

Publication number Publication date
CA2525725A1 (en) 2004-11-25
WO2004102188A1 (en) 2004-11-25
AU2004239418A1 (en) 2004-11-25
WO2004102189A1 (en) 2004-11-25
EP1673623A1 (en) 2006-06-28
EP1629278A1 (en) 2006-03-01
CA2525740A1 (en) 2004-11-25

Similar Documents

Publication Publication Date Title
EP1838867B1 (en) Apolipoprotein a-ii isoform as a biomarker for prostate cancer
US20170089906A1 (en) Biomarkers for ovarian cancer
Seibert et al. Advances in clinical cancer proteomics: SELDI-ToF-mass spectrometry and biomarker discovery
Zhang et al. Mass spectrometry‐based “omics” technologies in cancer diagnostics
EP1934603A2 (en) Biomarker for prostate cancer
US20100047847A1 (en) Methods for diagnosing ovarian cancer
US20090227692A1 (en) Biomarkers for breast cancer
US9766246B2 (en) SRM/MRM assay for subtyping lung histology
AU2004279326A1 (en) Method for diagnosing head and neck squamous cell carcinoma
EP1477803A1 (en) Serum protein profiling for the diagnosis of epithelial cancers
AU2004239416A1 (en) Methods and applications of biomarker profiles in the diagnosis and treatment of breast cancer
CA2525743A1 (en) Differential diagnosis of colorectal cancer and other diseases of the colon
Fung et al. Biomarkers for Ovarian Cancer

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period