EP1704416A2 - Protein expression profiling and breast cancer prognosis - Google Patents

Protein expression profiling and breast cancer prognosis

Info

Publication number
EP1704416A2
EP1704416A2 EP05702409A EP05702409A EP1704416A2 EP 1704416 A2 EP1704416 A2 EP 1704416A2 EP 05702409 A EP05702409 A EP 05702409A EP 05702409 A EP05702409 A EP 05702409A EP 1704416 A2 EP1704416 A2 EP 1704416A2
Authority
EP
European Patent Office
Prior art keywords
protein
breast
proteins
cytokeratin
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05702409A
Other languages
German (de)
French (fr)
Inventor
Jocelyne Jacquemier
François BERTUCCI
Daniel Birnbaum
Stéphane DEBONO
Rebecca Tagett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institut Paoli-Calmettes
Ipsogen
Institut National de la Sante et de la Recherche Medicale INSERM
Original Assignee
Institut Paoli-Calmettes
Ipsogen
Institut National de la Sante et de la Recherche Medicale INSERM
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institut Paoli-Calmettes, Ipsogen, Institut National de la Sante et de la Recherche Medicale INSERM

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57415 Specifically defined cancers of breast
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/46 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans from vertebrates
    • G01N2333/47 Assays involving proteins of known structure or function as defined in the subgroups
    • G01N2333/4701 Details
    • G01N2333/4739 Cyclin; Prad 1
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/46 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans from vertebrates
    • G01N2333/47 Assays involving proteins of known structure or function as defined in the subgroups
    • G01N2333/4701 Details
    • G01N2333/4742 Keratin; Cytokeratin
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/705 Assays involving receptors, cell surface antigens or cell surface determinants
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/705 Assays involving receptors, cell surface antigens or cell surface determinants
    • G01N2333/70567 Nuclear receptors, e.g. retinoic acid receptor [RAR], RXR, nuclear orphan receptors
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/705 Assays involving receptors, cell surface antigens or cell surface determinants
    • G01N2333/71 Assays involving receptors, cell surface antigens or cell surface determinants for growth factors; for growth regulators
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/90 Enzymes; Proenzymes
    • G01N2333/914 Hydrolases (3)
    • G01N2333/948 Hydrolases (3) acting on peptide bonds (3.4)
    • G01N2333/95 Proteinases, i.e. endopeptidases (3.4.21-3.4.99)
    • G01N2333/964 Proteinases, i.e. endopeptidases (3.4.21-3.4.99) derived from animal tissue
    • G01N2333/96425 Proteinases, i.e. endopeptidases (3.4.21-3.4.99) derived from animal tissue from mammals
    • G01N2333/96427 Proteinases, i.e. endopeptidases (3.4.21-3.4.99) derived from animal tissue from mammals in general
    • G01N2333/9643 Proteinases, i.e. endopeptidases (3.4.21-3.4.99) derived from animal tissue from mammals in general with EC number
    • G01N2333/96466 Cysteine endopeptidases (3.4.22)

Definitions

  • the present invention relates to protein analysis and, in particular, to protein expression profiling of breast tumors and cancers.
  • Adjuvant systemic therapy has a favorable impact on survival in patients with early breast cancer. 1,2
  • the decision to give or withhold such therapy is based upon a series of histoclinical prognostic criteria reviewed in consensus conferences (i.e., the National Institutes of Health (NIH) and St-Gallen). 3,4
  • the heterogeneity of breast tumors remains poorly understood.
  • clinical treatment decisions on whether to treat patients with node-negative breast cancer by surgery and radiotherapy alone, or in combination with adjuvant chemotherapy are currently being made with scant information on patient risk for metastatic relapse. Additionally, identifying among the patients who receive chemotherapy those who will benefit and those who will not benefit from standard anthracyclin-based protocols remains elusive.
  • DNA arrays have recently significantly contributed to enhance understanding of the molecular complexity of breast cancer. 6
  • Several studies have demonstrated the potential clinical utility of gene expression signatures defined by the combined RNA expression of a few tens of genes. These signatures have led to the development of a new molecular taxonomy of disease, including the identification of previously indistinguishable prognostic subclasses. 7-15
  • the clinical impact of these tests on disease management must be subsequently evaluated in large retrospective and prospective studies of adequate statistical power on fully annotated patient samples, followed by the development of gene expression-based diagnostics adapted to the clinical setting.
  • Unfortunately the cost, technical complexity, and interpretation of DNA microarray technology still complicate investigation with cancer specimens and are currently unsuitable for routine use in the standard clinical setting.
  • TMA tissue microarray
  • the aim of the present invention is to provide means capable of analyzing histopathologic features of breast disease, in particular of classifying breast cancers into prognostically relevant subclasses.
  • the present invention provides a protein expression signature identified by protein expression profiling and which may be used for analysing histopathologic features of breast disease as well as methods for carrying out such analysis.
  • protein expression profiling may be a clinically useful approach to assess breast cancer heterogeneity and prognosis in patients with stage I, II, or III disease.
  • the invention provides in one aspect a method for analyzing differential protein expression associated with histopathologic features of breast disease, in particular breast tumours, e.g., breast carcinomas, comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues or cells, said pool comprising all or part of a protein set comprising Afadin, Aurora A, α-Catenin, β-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
  • By "Cytokeratin 5/6" is meant Cytokeratin 5 and/or Cytokeratin 6. The same applies to "Cytokeratin 8/18".
  • Table 0 lists the proteins of the present invention and their corresponding amino-acid sequences (SEQ ID NO. 1 to 52). These proteins are identified by their common names (first column) in the methods, libraries, sets, pools etc. of the invention. Other names in the literature which designate the same proteins (aliases, synonyms etc.) are covered as well, and are incorporated herein by reference.
  • the present invention may also define these proteins by their amino-acid (polypeptidic) sequences (SEQ ID NO.), or portions or modifications thereof in accordance with the definition of "protein" provided below.
  • the invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues comprising a protein set comprising: Aurora A, α-Catenin, β-Catenin, Cyclin D1, Cytokeratin 8/18, ERBB2, ERBB3, Estrogen receptor, FGFR1, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC2.
  • the invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues comprising a protein set comprising: Afadin, Aurora A, α-Catenin, BCL2, Cyclin D1, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC2, TACC3.
  • the pool of proteins comprises a protein set comprising Afadin, Aurora A, α-Catenin, β-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
  • the pool of proteins comprises a protein set comprising all proteins of Table 0 above.
  • the method further comprises at least one of the following embodiments: - the detection of overexpression of at least one, preferably at least two, three or all of the following proteins: EGFR, P53, Ki67, FGFR1, ERBB2, ERBB3, ERBB4, Cyclin D1, Cyclin E, Cytokeratin 5/6.
  • Estrogen Receptor, FHIT, GATA3, Mucin 1, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3, Afadin, Aurora A, α-Catenin, β-Catenin, BCL2, Cytokeratin 8/18, E-Cadherin.
  • a further object of the invention is to provide a protein library useful for the molecular characterization of histopathologic features of breast disease comprising or corresponding to a pool of protein sequences, over or under expressed, in breast tissue or cells, said pool corresponding to the protein sets previously described.
  • said protein libraries may be immobilized on a solid support which may be preferably selected from the group comprising nylon membrane, nitrocellulose membrane, polyvinylidene difluoride, glass slide, glass beads, polystyrene plates, membranes on glass support, silicon chip or gold chip.
  • the present invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising : a) obtaining breast tissue cells from a patient, and b) measuring in the tissue cells obtained in step (a) over or underexpression of proteins of a library as previously described.
  • the detection of over or under expression of the pool of protein may be carried out on breast tumor cell lines.
  • the proteins may be directly or indirectly labeled before reaction step (b) with a label which may be selected from the group comprising radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.
  • one or more specific labels are used for each protein of the library according to the invention.
  • a person skilled in the art will be able to choose appropriate labels and labelling methods to carry out the invention.
  • the measuring of over or under expression of proteins may be carried out on cells or tissue, frozen or embedded in any appropriate material, e.g., paraffin, e.g., on a tissue microarray.
  • Various known methods of the prior art may be used, e.g., ImmunoHistoChemistry (IHC) technologies.
  • the measuring of over or under expression of proteins may also be carried out by the use of, e.g., protein (micro)arrays, antibody (micro)arrays, antigen (micro)arrays or any other appropriate technology, e.g., by using the previously defined supports.
  • the method for analysing differential protein expression of the invention further comprises: a) obtaining a control sample; b) measuring in the control sample obtained in step (a) the expression level of each protein corresponding to the library according to the invention; c) comparing the expression level of each protein with the level of the equivalent protein in breast tissue cells from a patient, or in cell lines.
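The control comparison in steps (a) to (c) can be illustrated with a minimal sketch. It assumes each protein's expression level has already been reduced to a single number (for example an IHC staining score); the function name `call_expression`, the `tolerance` parameter and the example values are hypothetical and do not come from the patent.

```python
def call_expression(patient, control, tolerance=0.0):
    """Label each protein as 'over', 'under' or 'similar' relative to control.

    patient, control: dicts mapping protein name -> numeric expression level.
    tolerance: hypothetical margin below which a difference is ignored.
    """
    calls = {}
    for protein, level in patient.items():
        if protein not in control:
            continue  # no reference level available for this protein
        diff = level - control[protein]
        if diff > tolerance:
            calls[protein] = "over"
        elif diff < -tolerance:
            calls[protein] = "under"
        else:
            calls[protein] = "similar"
    return calls


# Illustrative values only, not data from the study.
patient = {"ERBB2": 3.0, "Estrogen receptor": 0.0, "Ki67": 1.0}
control = {"ERBB2": 1.0, "Estrogen receptor": 2.0, "Ki67": 1.0}
calls = call_expression(patient, control)
print(calls)
```

How the control level is obtained (same patient, pooled patients, or reference proteins) is left open here, as it is in the text.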
  • the present invention is useful for detecting, diagnosing, staging, monitoring, predicting, preventing conditions associated with breast cancer. It is particularly useful for predicting clinical outcome of breast cancer and/or predicting occurrence of metastatic relapse and/or determining the stage or aggressiveness of a breast disease in at least 50%, e.g., at least 55%, e.g., at least 60%, e.g., at least 65%, e.g., at least 70%, e.g., at least 75%, e.g., at least 80%, e.g., at least 85%, e.g., at least 90%, e.g., at least 95%, e.g., 100% of the patients.
  • the invention is also useful for selecting more appropriate doses and/or schedule of chemotherapeutics and/or biopharmaceuticals and/or radiation therapy to circumvent toxicities in a patient.
  • the invention is also useful for selecting appropriate doses and/or schedule of chemotherapeutics and/or (bio)pharmaceuticals, and/or targeted agents, among which one may cite Aromatase Inhibitors (e.g., Exemestane, Anastrozole, Letrozole), Anti-estrogens (e.g., Fulvestrant, Tamoxifen), Taxanes (e.g., Paclitaxel, Docetaxel), Anthracyclines (e.g., Doxorubicin, Cyclophosphamide), CHOP (Doxorubicin, Cyclophosphamide, Oncovin, prednisone when taken in combination).
  • Iressa gefitnib, ZD1839, anti-EGFR, PDGFR, c-kit, Astra-
  • GRH Novartis
  • PD-183805 RTK inhibitor, Pfizer
  • EMD72000 (anti-EGFR/VEGF ab, MerckKgaA) ; CI-1033 (HER2/neu & EGF-R dual inhibitor, Pfizer); EGF10004;
  • anti-breast cancer agents are described by Awada et al. in "The pipeline of new anticancer agents for breast cancer treatment in 2003" Critical Reviews in Oncology/Hematology 48 (2003) 45-63, the content of which is incorporated herein by reference.
  • breast tissue cells may be obtained from a patient regardless of whether or not said patient has received a neo-adjuvant or adjuvant, e.g., systemic, therapy.
  • treated or untreated cell lines may be used.
  • breast tissue cells may be obtained from a patient regardless of estrogen receptor (ER) expression.
  • the present invention provides a method for treating a patient with a breast cancer comprising (i) the implementation of a method for analysing differential protein expression according to the invention on a sample from said patient, and (ii) determining a treatment for this patient based on the analysis of differential protein expression profile obtained in step i).
  • the present invention relates to a method for analyzing differential protein expression associated with histopathologic features of breast disease according to the invention wherein the detection of the overexpression or underexpression of said pool of protein in breast tissues comprises the detection of the overexpression or underexpression of nucleic acids coding for said proteins.
  • the present invention further relates to a nucleic acids library useful for the molecular characterization of histopathologic features of breast disease comprising nucleic acids coding for the over or underexpressed proteins according to the invention, or equivalents thereof.
  • the sequences of the nucleic acids of the library according to the invention are easily available to a person skilled in the art, who may, for example, use printed publications describing said sequences and/or public databases, e.g., the National Center for Biotechnology Information (NCBI) database, that provide such sequences as well.
  • aggressiveness of cancer refers to cancer growth rate or potential to metastasise; a so-called “aggressive cancer” will grow or metastasise rapidly or significantly affect overall health status and quality of life
  • adjuvant therapy refers to treatment involving radiation, chemotherapy (drug treatment), biologic therapy (vaccines) or hormone therapy, or any combination given after primary treatment.
  • antibody is intended to include whole antibodies, e.g., of any isotype, and includes fragments thereof which are also specifically reactive with a vertebrate, e.g., mammalian, protein. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. Thus, the term includes segments generated by proteolytic cleavage or recombinantly prepared portions of an antibody molecule capable of selectively reacting with a certain protein.
  • Non-limiting examples of such proteolytic and/or recombinant fragments include Fab, F(ab')2, Fab', Fv, and single chain antibodies (scFv) containing a V[L] and/or V[H] domain joined by a peptide linker.
  • the scFv's may be covalently or non-covalently linked to form antibodies having two or more binding sites.
  • Antibodies may include polyclonal, monoclonal, or other purified preparations of antibodies and recombinant antibodies.
  • associated with refers to a disease in a subject which is caused by, contributed to by, or causative of an abnormal level of expression of a protein.
  • control comprises for example proteins from a sample of the same patient or from a pool of different patients, or selected among reference proteins which may be already known to be over or under expressed.
  • the expression level of said control can be an average or an absolute value of the expression of reference proteins. These values may be processed in order to accentuate the difference relative to the expression of the proteins according to the invention.
  • the analysis of the over or under expression of proteins can be carried out on a sample such as biological material derived from any mammalian cells, including cell lines, xenografts, human tissues, preferably breast tissue, etc.
  • the method according to the invention may be performed on a sample from, e.g., cell lines, healthy donors, patients or an animal (for example for veterinary application or preclinical studies).
  • directly or indirectly labeled includes proteins the sub-constituents of which, i.e., amino acids or amino acid groups or atoms, are themselves labeled (directly), as well as proteins labeled through the intermediary of any element able to recognize and bind to the targeted protein, e.g., an antibody.
  • Equivalent includes nucleic acids encoding functionally equivalent proteins.
  • Equivalent nucleotide sequences will include sequences that differ by one or more nucleotide substitutions, additions or deletions, such as allelic variants; and will, therefore, include sequences that differ from the nucleotide sequence of the nucleic acids of the invention because of the degeneracy of the genetic code.
  • "good-prognosis" and "poor-prognosis" respectively refer to favorable (e.g., remission) or unfavorable (e.g., metastasis, death) patient clinical outcome.
  • histopathologic features of breast diseases includes diseases, disorders or conditions known to affect, lethally or not, breast cells and/or tissues, including but not limited to breast tumours, for example i) non cancerous breast diseases, for example, hyperplasias, metaplasias, fibroadenomas, fibrocystic disease, papillomas, sclerosing adenosis or preneoplastic conditions, or ii) breast cancer.
  • As "breast cancer" one may cite: A) noninvasive breast cancers including i) ductal carcinoma in situ (also called intraductal carcinoma or DCIS), consisting of cancer cells in the lining of the duct, and ii) lobular carcinoma in situ, or LCIS (also known as lobular neoplasia); B) invasive cancers, occurring when cancer cells spread beyond the basement membrane which covers the underlying connective tissue in the breast, and which include i) infiltrating ductal carcinoma, which penetrates the wall of a duct, and ii) infiltrating lobular carcinoma, which spreads through the wall of a lobule and may sometimes appear in both breasts, sometimes in several separate locations.
  • ImmunoHistoChemistry refers to methods using histochemical localization of immunoreactive substances using antibodies as reagents on cells or tissues by technologies such as, but not limited to flow cytometry, ELISA, Western and Southwestern Blot Analysis, and frozen and paraffin-embedded samples.
  • Nucleic acids refers to polynucleotides, e.g., isolated, such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA).
  • the term should also be understood to include, as equivalents, analogs of RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides.
  • ESTs, chromosomes, cDNAs, mRNAs, and rRNAs are representative examples of molecules that may be referred to as nucleic acids.
  • over or underexpression may comprise the detection of difference in the expression of the proteins according to the present invention in relation to at least one control.
  • predicting clinical outcome refers to the ability of a skilled artisan to classify patients into at least two classes, "good prognosis" and "bad prognosis", showing significantly different long-term Metastasis Free Survival (MFS).
  • Protein refers to a polypeptide with a primary, secondary, tertiary or quaternary structure, or any portion or modification, e.g., a mutant, or isoform thereof.
  • a "portion” or “modification” of a protein retains at least one biological or antigenic characteristic of a native (wild-type) protein.
  • Protein microarray refers to a spatially defined and separated collection of individual proteins immobilised on a solid surface.
  • Treating as used herein is intended to encompass treating as well as ameliorating at least one symptom of the condition or disease.
  • Figure 1 represents hierarchical clustering analysis of global protein expression profiles in breast cancer as measured by IHC on TMA.
  • Colored bars to the right and colored branches in the dendrogram indicate the locations of 3 sample clusters of interest zoomed in C.
  • B Dendrogram of proteins. Two major clusters "P1" (basal/stem cells) and "P2" (luminal/glandular cells) are identified and further divided into 4 smaller clusters designated "proliferation", "mitosis", "ER-related" and "adhesion" cluster, respectively.
  • C Expanded view of selected sample clusters showing a partial grouping of tumors with similar histological type (LOB: lobular, DUC: ductal, OTH: other, MIX: mixed; blue bar) or ER status (positive, red bar and negative, orange bar).
  • Figure 2 represents classification of 552 breast cancer samples based on the expression of the 21-protein discriminator set identified by supervised analysis.
  • Each row of the data matrix (left panel) represents a sample and each column represents a protein. Immunostaining results are depicted according to the color scale used in Figure 1.
  • the 21 proteins, listed above the matrix (ER*: mean of three independent ER analyses), are ordered from left to right according to decreasing ΔP (ΔP is the difference between the probability of positive staining and the probability of negative staining in non-metastatic samples).
  • Tumor samples are numbered from 1 to 552 and are ordered from top to bottom according to their increasing "Metastasis Score" (right panel).
  • the orange dashed line indicates the threshold 0 that separates the two classes of samples, "poor-prognosis" (under the line) and "good-prognosis" (above the line).
  • the middle panel indicates the occurrence (black square) or not (white square) of metastatic relapse for each patient.
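The ordering and thresholding described for Figure 2 can be sketched as follows. This is only an illustration under stated assumptions: ΔP is read as P(positive staining) minus P(negative staining), both estimated within the non-metastatic samples; staining is treated as a simple positive/negative boolean; and a score exactly equal to the threshold is assigned to the good-prognosis class. The formula of the Metastasis Score itself is not given in this excerpt, so the sketch takes the score as an input. The function names `delta_p` and `prognosis_class` are hypothetical.

```python
def delta_p(stains_non_metastatic):
    """P(positive) - P(negative) among non-metastatic samples.

    stains_non_metastatic: list of booleans, True for positive staining.
    """
    n = len(stains_non_metastatic)
    if n == 0:
        raise ValueError("at least one non-metastatic sample is required")
    positives = sum(1 for s in stains_non_metastatic if s)
    negatives = n - positives
    return (positives - negatives) / n


def prognosis_class(metastasis_score, threshold=0.0):
    """Samples above the threshold fall in the poor-prognosis class.

    Samples are ordered by increasing score and the poor-prognosis class
    lies under the threshold line, i.e. at the higher scores.
    The handling of a score equal to the threshold is an assumption.
    """
    return "poor-prognosis" if metastasis_score > threshold else "good-prognosis"


# Illustrative values only.
example_delta = delta_p([True, True, True, False])
example_classes = [prognosis_class(s) for s in (-0.2, 0.7)]
print(example_delta, example_classes)
```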
  • Figure 3 represents Kaplan-Meier analysis of the metastasis-free survival of patients with breast cancer according to the molecular classification based on the 21-protein expression signature or the St-Gallen and the NIH consensus criteria.
  • Patients were classified in the "good-prognosis" class or the "poor-prognosis" class using the 21-protein signature identified by supervised analysis (A, B, E and F) or in the "low risk" class or the "high risk" class using the St-Gallen and the NIH consensus criteria (C and D).
  • the P-values are calculated using the log-rank test.
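For readers unfamiliar with the Kaplan-Meier analysis used in Figure 3, the product-limit estimate of metastasis-free survival can be sketched in a few lines. This is a generic textbook estimator, not the authors' code; the function name `kaplan_meier` and the follow-up data are hypothetical. The log-rank test used for the P-values is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each patient (e.g., months).
    events: True if metastatic relapse was observed at that time,
            False if the patient was censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                relapses += 1
            removed += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per class (for example good-prognosis versus poor-prognosis) gives the pair of curves that a log-rank test would then compare.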
  • Figure 4 represents expression of proteins studied by IHC on tissue microarrays (TMA).
  • C Examples of IHC staining for 5 proteins with differential expression in cancer tissue (bottom) compared with normal tissue (top).
  • clustering allowed the identification of four major coherent protein clusters designated according to the function of most included proteins: "ER-related cluster”, “adhesion cluster”, “mitosis cluster” and “proliferation cluster”.
  • Correlated expression of proteins may be due to different mechanisms such as coregulation (e.g., ER/BCL2 30), functional interaction (e.g., STK6/Taxins 27,28), phenotypic association (e.g., ERBB2/P53 31) or chromosomal location (e.g., FGFR1/TACC1, located on 8p11).
  • this cluster also included CDH3/P-Cadherin, present in a "basal cluster" identified in gene expression analyses 9 and previously shown to be overexpressed in a subgroup of breast carcinomas associated with higher proliferation rates and aggressive behavior. 35
  • Hierarchical clustering sorted tumors into three clusters that correlated with relevant histoclinical parameters, including histological type, SBR grade, ER status, ERBB2 status and the presence or absence of peritumoral vascular emboli. Correlations were found between the characteristics of these tumor clusters and their protein expression profiles.
  • the high number of grade III tumors in cluster B agreed with the frequent strong expression of the "proliferation" cluster - which included ERBB2 - and the "mitosis" cluster in these tumors.
  • 99% of cluster A1 samples were ER-positive, and showed a frequent strong expression of the "ER-related" cluster and low expression of the "proliferation" cluster.
  • the tumor clusters also correlated with a breast cancer classification recently proposed in two series of analyses that provided a new conceptual framework of mammary oncogenesis.
  • phenotypic analyses have established a three-cell phenotypic classification of breast cancer cells.
  • basal cells contain mammary gland progenitor cells able to give rise to both "luminal" and "myoepithelial" cells. 38
  • Progenitor cells express type II keratins CK5 and 6.
  • differentiated "luminal” cells express type II keratin CK8 and type I keratin CK18, which are also observed in normal simple and glandular epithelia.
  • Luminal cells also express ER.
  • Cluster B may consist of tumors with basal/progenitor, ER-negative characteristics, i.e. strong expression of CK5/6 and proliferation markers.
  • A2 tumors, with an intermediate profile, may represent a transitory "baso-luminal" stage, or consist of tumors that have lost ER function. It can be expected that luminal A1 tumors, in which the bulk of cells are more differentiated and express ER-related cluster proteins, are of better prognosis, whereas more undifferentiated and proliferative basal B tumors are associated with poor prognosis. The significant differences in clinical outcome observed between the three defined tumor clusters in this study are consistent with this model and recent studies. 9-11,41 In addition, we show that lobular carcinomas are luminal-like tumors, and consist of differentiated luminal cells that express CK8/18.
  • this prognostic signature was validated in an independent set of 184 patients, showing its robustness.
  • Our discriminator set included 10 proteins coded by genes identified across recent gene expression studies,7-15 as well as other proteins with an unclear role in disease progression and sensitivity to systemic therapy.
  • the prognostic value of the signature was increasingly accurate with the addition of other proteins as evidenced by univariate and multivariate analyses, further highlighting the strength of large-scale molecular analyses for understanding tumor heterogeneity through the identification of expression signatures.
  • the classification based on the 21-protein predictor was associated with a highly significant difference in clinical outcome.
  • the 5-year MFS was 90% for ER-positive patients from the "good-prognosis class", and 58% for ER-positive patients from the "poor-prognosis class", suggesting our 21-protein set may provide more accurate clinical information than ER status alone, possibly reflecting functional differences in the ER pathway. Additionally, our molecular classification conserved its predictive impact for patients independent of adjuvant systemic therapy. Since distant metastasis may be influenced by adjuvant therapy, we separately analyzed the 186 patients who did not receive any chemo- and hormone therapy, as well as the 133 patients who exclusively received adjuvant chemotherapy, with an anthracycline-based regimen in most cases.
  • the 21-protein signature may facilitate the selection of appropriate treatment options in early breast cancer patients. It may be an important clinical tool to circumvent unnecessary, toxic and costly treatment of node-negative patients, and it may help to select, among patients who need adjuvant chemotherapy, those who might benefit from a standard protocol and those who would be candidates for another protocol or another form of systemic therapy.
  • Clinical annotation of each sample included patient age, axillary lymph node status, pathological tumor size, Scarff-Bloom-Richardson (SBR) grade, peritumoral vascular invasion, estrogen receptor (ER), progesterone receptor (PR) and ERBB2 status as evaluated by IHC with positivity cut-off values of 1% for hormone receptors and with 2 or 3+ score (HercepTest kit scoring guidelines) for ERBB2.
  • SBR Scarff-Bloom-Richardson
  • PR progesterone receptor
  • the characteristics of patients are listed in Table 1 (see first column only).
  • Table 1 Histoclinical characteristics of 552 breast cancer patients, according to the membership to the "good-prognosis” or the “poor-prognosis class” as defined using the expression of the 21-protein set.
  • CI denotes confidence interval
  • the median follow-up was 57 months (range, 2 to 182) after diagnosis for the 450 patients who did not experience metastatic relapse as a first event, 37 months (range, 4 to 151) for the 102 patients with metastasis as first event, and 51 months (range, 2 to 182) for all patients.
  • the 5-year MFS rate was 80% [95%CI 76.2 - 83.7].
  • TMAs were prepared as previously described,25 with slight modifications. For each tumor, three representative areas from the primary tumor were carefully selected from a hematoxylin-eosin stained section of a donor block. Core cylinders with a diameter of 0.6 mm each were punched from each of these areas and deposited into three separate recipient paraffin blocks using a specific arraying device (Beecher Instruments, Silver Spring, MD). The TMA technique allows the analysis of tumors and controls under identical experimental conditions. In addition to tumor tissues, the recipient block also received 10 normal breast tissue samples from 10 healthy women who underwent reductive mammary surgery and pellets from nine mammary cell lines.
  • the selection of the proteins was done according to the following criteria: known or potential importance in breast cancer and availability of a corresponding antibody that performed well in IHC on paraffin-embedded tissues. Twenty-six proteins were selected, including hormone receptors (ER, PR), subclass markers (Cytokeratins), oncogenes and proliferation proteins (ERBB family members, BCL2, Cyclins, MIB1, FGFR1, Aurora A, TACCs), tumor suppressors (P53, FHIT), adhesion molecules (Cadherins, Catenins, Afadin), proteins from oncogenes of amplified genomic regions (ERBB2, CCND1, STK6), and other potential prognostic markers identified in specific studies or previous DNA microarray experiments (CCNE, GATA3, MUC1).
  • Mmab mouse monoclonal antibody
  • Rpab rabbit polyclonal antibody
  • DTRS Dako target retrieval solution.
  • IHC Immunohistochemical analysis
  • IHC was carried out on 5-μm sections of tissue fixed in alcohol formalin for 24 h and embedded in paraffin. Sections were deparaffinized in Histolemon (Carlo Erba Reagenti, Rodano, Italy) and rehydrated in graded alcohol. Antigen retrieval was accomplished by incubating the sections in pre-treatment solutions depending on the antibody used. Pretreatment conditions are listed in Table 2. The reactions were carried out using an autoimmunostainer (Dako Autostainer).
  • Staining was performed at room temperature as follows: rehydrated tissues were washed in phosphate buffer, followed by quenching of endogenous peroxidase activity by treatment with 0.1% H2O2; slides were incubated with blocking serum (Dako) for 30 min, then with the affinity-purified antibody for one hour. After washes, slides were sequentially incubated with biotinylated antibody against rabbit IgG for 20 min, followed by streptavidin-conjugated peroxidase (Dako LSAB®2 kit), then visualized with Diaminobenzidine (3-amino-9-ethylcarbazole).
  • the IHC scores were recorded as negative (negative staining) or positive (weak or strong positive staining).
  • the classifier was derived through training on a subset of chosen samples (2/3 of population, learning set) and then validated on the remaining subset (1/3 of population, validation set). The assignment of samples to each set was random, but the ratio between tumors with and without metastatic relapse was preserved. An exhaustive testing comprising all combinations of 1 to 5 proteins, as well as the complementary combinations of 21 to 25 proteins was performed to assess their ability to classify tumors into 2 classes ("poor-prognosis" and "good-prognosis") in agreement with their clinical outcome.
  • the number of misclassifications was defined as the number of X tumors classified in the "good-prognosis class" plus the number of Y tumors classified in the "poor-prognosis class".
  • the best classifier protein-set was that with the minimal rate of misclassified tumors.
  • the prognostic power of the classifier was tested on the validation set by classifying the remaining independent tumors using the same approach. Finally, it was assessed on the whole population. For each tumor set, the prognostic impact was further estimated by univariate analyses that compared the rate of metastatic relapses within the two molecularly defined classes of tumors (Fisher exact test) .
  • Figure 1B displays the dendrogram of related proteins.
  • the three interpretations of ER staining made independently by two pathologists were highly correlated (R² between 0.87 and 0.96) (Figure 1C, middle and bottom panels).
  • Two major protein clusters, designated "P1" and "P2", were identified (Figure 1B).
  • ER-related cluster of ER-associated proteins
  • Afadin an "adhesion cluster”
  • the fourth cluster (hereafter designated "proliferation cluster"), defined by the routinely used marker Ki67/MIB1, revealed that proteins such as EGFR, ERBB2, P53 and the G1 cyclin CCNE are preferentially overexpressed in tumors undergoing rapid growth.
  • the combined protein expression patterns defined two major clusters of tumors designated cluster A (462 cases) and cluster B (89 cases) in Figure 1 (1 case that clustered outside of the 2 clusters was excluded from further analysis).
  • Cluster A could be further subdivided into two subclusters, A1 (393 cases) and A2 (69 cases).
  • cluster A1 tumors displayed a strong expression of the "ER cluster" and the "adhesion cluster" and a low expression of the "proliferation cluster" in most cases, whereas the "mitosis cluster" was strongly expressed in ~50% of samples.
  • cluster B tumors displayed overall a low expression of the "ER cluster” but a strong expression of the three other protein clusters.
  • Cluster A2 included ER-positive and ER-negative tumors that displayed an intermediate profile characterized overall by strong expression of the "adhesion cluster” and a low expression of the "ER cluster", the "proliferation cluster” and the "mitosis cluster”.
  • in cluster A1, 41% of cases were grade I and 15% were grade III, compared with 23% and 35% in cluster A2, and 7% and 63% in cluster B, respectively (p < 0.0001, Chi-2 test).
  • cluster B samples were more likely to be ERBB2-positive (2+ or 3+ in IHC, 36% of cases) compared with 8% in cluster A1 and 12% in cluster A2 (p < 0.0001, Chi-2 test).
  • cluster A1 samples were more likely to be ER-positive (99% of cases) compared with 35% in cluster A2 and 10% in cluster B (p < 0.0001, Chi-2 test).
  • the learning set of samples allowed the identification of a combination of proteins (protein expression signature) that correlated with long-term MFS.
  • the number of proteins in the "metastatic predictor” was optimized by iteratively testing all combinations of 1 to 5 proteins and the complementary combinations of 21 to 25 proteins and by assessing their ability for correct classification of samples using a "Metastatic Score".
  • the optimal combination for these tumors contained 21 proteins (Figure 2C). Examples of IHC staining for these 21 proteins are shown in Figure 4B.
  • Samples from the learning set were ordered using the "Metastatic Score". Two classes of samples ("poor-prognosis class", positive scores, and "good-prognosis class", negative scores) were defined using a cut-off value of 0.
  • (OR 6.1 [95% CI 3.3 - 11.3], p < 0.0001, Fisher exact test).
  • ΔP is the difference between the probability of positive staining and the probability of negative staining in non-metastatic samples.
  • the orange dashed line indicates the threshold 0 that separates the two classes, "good-prognosis" (above the line) and "poor-prognosis" (under the line).
  • the histoclinical factors that correlated with MFS were pathological tumor size (≤20 mm, >20 mm), tumor grade (SBR I, II, III), number of positive axillary lymph nodes (0, 1-3, ≥4), and peritumoral vascular invasion (negative, positive).
  • the parameters entered in the model were dichotomised and included the classification based on the discriminator 21-protein set ("good-prognosis class" and "poor-prognosis class"), age of patients (≤50 years, >50 years), number of positive axillary lymph nodes (0, 1-3, ≥4), pathological tumor size (≤20 mm, >20 mm), tumor grade (SBR I, II, III), estrogen receptor status (negative, positive), progesterone receptor status (negative, positive), peritumoral vascular invasion (negative, positive), chemotherapy (delivery or not), hormone therapy (delivery or not) and each of the proteins (negative, positive) significantly associated with survival in univariate analyses.
  • Results are shown in Table 4.
  • Several independent factors predictive of distant metastasis as first event were evidenced, including the prognosis signature based on the 21-protein combination, pathological size of tumors, axillary lymph node status (only when dichotomized ≤3 vs >3), Ki67/MIB1 status and delivery of hormone therapy.
  • the 21-protein signature was the strongest predictor, with a hazard ratio of 2.2 for "poor-prognosis class" patients compared to "good-prognosis class" patients ([95% CI 1.25 - 3.89], p < 0.0001).
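
The classifier search and evaluation described in the excerpts above (exhaustive testing of protein combinations, a random 2/3 vs 1/3 split that preserves the relapse ratio, a misclassification count, and an odds ratio with a Fisher exact test) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: all function names, the toy cohort and the 2x2 table are hypothetical, and reading the "X" and "Y" tumors of the misclassification definition as relapsing and non-relapsing tumors is an assumption.

```python
import math
import random


def count_candidate_sets(n_proteins=26, max_small=5):
    """Number of protein sets of size 1..5 plus their complements (21..25)."""
    small = sum(math.comb(n_proteins, k) for k in range(1, max_small + 1))
    return 2 * small  # every small set has exactly one complementary set


def stratified_split(tumors, learn_fraction=2 / 3, seed=0):
    """Random 2/3 vs 1/3 split that preserves the relapse / no-relapse ratio."""
    rng = random.Random(seed)
    learning, validation = [], []
    for status in (True, False):
        group = [t for t in tumors if t["relapse"] is status]
        rng.shuffle(group)
        cut = round(len(group) * learn_fraction)
        learning += group[:cut]
        validation += group[cut:]
    return learning, validation


def misclassified(tumors):
    """Assumed reading: relapsing tumors called 'good' plus non-relapsing called 'poor'."""
    return sum((t["predicted"] == "poor") != t["relapse"] for t in tumors)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is at most as likely as the observed one.
    return sum(p for p in map(prob, support) if p <= observed * (1 + 1e-9))


# Hypothetical cohort: 30 relapsing and 120 non-relapsing tumors (invented numbers).
tumors = [{"id": i, "relapse": i < 30} for i in range(150)]
learning, validation = stratified_split(tumors)

# Hypothetical 2x2 table: rows = predicted class (poor, good),
# columns = observed outcome (relapse, no relapse).
table = (3, 1, 1, 3)
print(count_candidate_sets(), len(learning), len(validation))
print(odds_ratio(*table), round(fisher_exact(*table), 4))
```

With 26 proteins, sizes 1 to 5 contribute 83,681 sets and their complements of sizes 21 to 25 contribute the same number, so an exhaustive search of this kind scores 167,362 candidate sets, which is why it remains tractable.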
  • References :
  • Early Breast Cancer Trialists' Collaborative Group. Tamoxifen for early breast cancer: an overview of the randomised trials. Lancet 1998;
  • Boecker W, Buerger H. Evidence of progenitor cells of glandular and myoepithelial cell lineages in the human adult female breast epithelium: a new progenitor (adult stem) cell concept. Cell Prolif 2003; 36 Suppl 1:73-84.

Abstract

A method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues or cells, said pool comprising all or part, for example one, two, three or more of a protein set comprising: Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3, Cytokeratin 6, Cytokeratin 18, Ang1, AuroraB, BCRP1, CathepsinD, CD10, CD44, CK14, Cox2, FGF2, GATA4, Hif1a, MMP9, MTA1, NM23, NRG1a, NRG1beta, P27, Parkin, PLAU, S100, SCRIBBLE, Smooth Muscle Actin, THBS1, TIMP1.

Description

Protein expression profiling and breast cancer prognosis
I - Field of the invention
The present invention relates to protein analysis and, in particular, to protein expression profiling of breast tumors and cancers.
II — Background
Adjuvant systemic therapy has a favorable impact on survival in patients with early breast cancer.1,2 The decision to give or withhold such therapy is based upon a series of histoclinical prognostic criteria reviewed in consensus conferences (i.e., National Institutes of Health, NIH, and St. Gallen).3,4 However, despite the establishment of standardized criteria, the heterogeneity of breast tumors remains poorly understood. For example, clinical decisions on whether to treat patients with node-negative breast cancer by surgery and radiotherapy alone, or in combination with adjuvant chemotherapy, are currently being made with scant information on patient risk for metastatic relapse. Additionally, identifying among the patients who receive chemotherapy those who will benefit and those who will not benefit from standard anthracycline-based protocols remains elusive. However, the relatively limited efficacy of current protocols (~30-40% failure rate) and the increasing availability of new therapies make this issue clinically important. Furthermore, the development of molecularly targeted drugs such as trastuzumab (Herceptin™), a monoclonal antibody against the ERBB2 tyrosine kinase receptor, is needed.5 With few exceptions, such as estrogen receptor and ERBB2 receptor, the available molecular markers are of limited value in clinical practice.
High-throughput molecular technologies, such as DNA arrays, have recently contributed significantly to the understanding of the molecular complexity of breast cancer.6 Several studies have demonstrated the potential clinical utility of gene expression signatures defined by the combined RNA expression of a few tens of genes. These signatures have led to the development of a new molecular taxonomy of disease, including the identification of previously indistinguishable prognostic subclasses.7-15 The clinical impact of these tests on disease management must subsequently be evaluated in large retrospective and prospective studies of adequate statistical power on fully annotated patient samples, followed by the development of gene expression-based diagnostics adapted to the clinical setting. Unfortunately, the cost, technical complexity, and interpretation of DNA microarray technology still complicate investigation with cancer specimens and make it currently unsuitable for routine use in the standard clinical setting. Issues that must be addressed prior to validation and integration of this technology into clinical pathology laboratories include the requirement for high-quality RNA extracted from unfixed tissues, intra-tumoral heterogeneity of excised patient samples, and bias resulting from the asymmetry of variables, with a number of hybridized samples greatly inferior to the number of genes being tested, leading to non-trivial statistical problems. Finally, the sensitivity, specificity, reproducibility and technical feasibility outside large academic centers will have to be addressed, and experimental conditions will have to be standardized and data compared in multi-center clinical trials. Additional opportunities to validate and/or identify prognostic expression signatures are provided by alternative high-throughput approaches, which may be used either separately or in combination with DNA microarrays.
One of these is the tissue microarray (TMA) technique,16-18 which allows for the simultaneous study of hundreds of tumor specimens at the DNA, RNA or protein level. Immunohistochemistry (IHC) is applicable to paraffin-embedded samples that constitute the bulk of pathology archives, avoiding the requirement for high-quality RNA extracted from frozen specimens. IHC is relatively inexpensive, straightforward and well established in standard clinical pathology laboratories. Thus, IHC on TMA may be a practical approach both in validation studies and in routine testing. However, analytical classification methods to efficiently process and interpret multiple target IHC data have not been previously developed. Recent studies have shown the reliability of hierarchical clustering for classifying cancers when applied to IHC TMA data of a significant range of markers.19-24 However, none addressed the prognostic issue.
The aim of the present invention is to provide means capable of analyzing histopathologic features of breast disease, in particular of classifying breast cancers into prognostically relevant subclasses. After exhaustive testing on a retrospective panel of 552 early breast cancer samples we have found that this classification was possible by analyzing a consistent set of proteins. Classification of samples, based on this multidimensional protein data set, was first done using classical unsupervised hierarchical clustering. We then developed a supervised bioinformatic method that further improved the classification as compared with usual prognostic factors.
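
As a rough illustration of the unsupervised step mentioned above, the following sketch clusters binary IHC profiles (1 = positive staining, 0 = negative staining) by agglomerative hierarchical clustering. The Hamming distance, the average-linkage rule, the toy profiles and all function names are assumptions made for illustration; the text does not specify which distance or linkage was used.

```python
from itertools import combinations


def hamming(p, q):
    """Fraction of proteins whose staining status differs between two tumors."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        # i < j always holds, so popping j leaves index i untouched.
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]


# Hypothetical profiles over (ER, PR, CK5/6, Ki67); the values are invented.
profiles = {
    "T1": (1, 1, 0, 0),
    "T2": (1, 1, 0, 1),
    "T3": (0, 0, 1, 1),
    "T4": (0, 0, 1, 0),
}
groups = cluster(profiles, 2)
print(groups)
```

In this toy example the two ER-positive profiles end up in one group and the two CK5/6-positive profiles in the other, which mirrors the kind of luminal-like versus basal-like separation the clustering is meant to reveal.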
III - Summary of the invention
The present invention provides a protein expression signature identified by protein expression profiling and which may be used for analysing histopathologic features of breast disease as well as methods for carrying out such analysis. In particular, protein expression profiling may be a clinically useful approach to assess breast cancer heterogeneity and prognosis in patients with stage I, II, or III disease. It may be used both for breast tumor management in clinical settings and as a research tool in academic laboratories.
The invention provides in one aspect a method for analyzing differential protein expression associated with histopathologic features of breast disease, in particular breast tumours, e.g., breast carcinomas, comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues or cells, said pool comprising all or part of a protein set comprising Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
By "all or part" is meant 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51 or 52 proteins.
By "Cytokeratin 5/6" is meant Cytokeratin 5 and/or Cytokeratin 6. The same is applicable to "Cytokeratin 8/18".
The following table displays the proteins of the present invention and their corresponding amino-acid sequences (SEQ ID NO. 1 to 52). These proteins are identified by their common names (first column) in the methods, libraries, sets, pools etc. of the invention. Other names in the literature which designate the same proteins (aliases, synonyms etc.) are covered as well, and are incorporated herein by reference. The present invention may also define these proteins by their amino-acid (polypeptidic) sequences (SEQ ID NO.), or portions or modifications thereof in accordance with the definition of "protein" provided below.
Table 0
"Over or underexpression of a pool of protein" means that overexpression of certain proteins is detected simultaneously with the underexpression of others of said proteins. "Simultaneously" means concurrent with or within a biologic or functionally relevant period of time during which the overexpression of a protein may be followed by the underexpression of another protein, or conversely, e.g., because both expressions are directly or indirectly correlated.
In a further aspect, the invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising a protein set comprising: Aurora A, a-Catenin, b-Catenin, Cyclin D1, Cytokeratin 8/18, ERBB2, ERBB3, Estrogen receptor, FGFR1, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC2.
In a further aspect, the invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising a protein set comprising: Afadin, Aurora A, a-Catenin, BCL2, Cyclin D1, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC2, TACC3.
According to a preferred embodiment the pool of protein comprises a protein set comprising Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
According to another embodiment the pool of protein comprises a protein set comprising all proteins of the Table 0 above. The method further comprises at least one of the following embodiments:
- the detection of overexpression of at least one, preferably at least two, three or all of the following proteins: EGFR, P53, Ki67, FGFR1, ERBB2, ERBB3, ERBB4, Cyclin D1, Cyclin E, Cytokeratin 5/6;
- the detection of overexpression of at least one, preferably at least two, three or all of the following proteins: Estrogen Receptor, FHIT, GATA3, Mucin 1, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3, Afadin, Aurora A, α-Catenin, β-Catenin, BCL2, Cytokeratin 8/18, E-Cadherin.
A further object of the invention is to provide a protein library useful for the molecular characterization of histopathologic features of breast disease comprising or corresponding to a pool of protein sequences, over or under expressed, in breast tissue or cells, said pool corresponding to the protein sets previously described.
Preferably, said protein libraries may be immobilized on a solid support which may be preferably selected from the group comprising nylon membrane, nitrocellulose membrane, polyvinylidene difluoride, glass slide, glass beads, polystyrene plates, membranes on glass support, silicon chip or gold chip.
In a further aspect, the present invention provides a method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising: a) obtaining breast tissue cells from a patient, and b) measuring in the tissue cells obtained in step (a) over or underexpression of proteins of a library as previously described. Alternatively to breast tissue cells from a patient, the detection of over or under expression of the pool of protein may be carried out on breast tumor cell lines. The proteins may be directly or indirectly labeled before reaction step (b) with a label which may be selected from the group comprising radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels. Advantageously, one or more specific labels are used for each protein of the library according to the invention. A person skilled in the art will be able to choose appropriate labels and labelling methods to carry out the invention. For example, one may use a label selected from the group comprising, but not limited to: biotin, digoxigenin.
The measuring of over or under expression of proteins may be carried out on cells or tissue, frozen or embedded in any appropriate material, e.g., paraffin, e.g., tissue microarray. Various known methods of the prior art may be used, e.g., immunohistochemistry (IHC) technologies. The measuring of over or under expression of proteins may also be carried out by the use of, e.g., protein (micro)arrays, antibody (micro)arrays, antigen (micro)arrays or any other appropriate technology, e.g., by using the previously defined supports.
According to an advantageous embodiment, the method for analysing differential protein expression of the invention further comprises: a) obtaining a control sample; b) measuring in the control sample obtained in step (a) the expression level of each protein corresponding to the library according to the invention; c) comparing the expression level of each protein with the level of the equivalent protein in breast tissue cells from a patient, or in cell lines.
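
The control comparison in steps (a) to (c) can be sketched as below. The two-fold threshold, the numeric expression scale, the example values and the function name are illustrative assumptions; the text does not prescribe how expression levels are quantified or which cut-off defines over or under expression.

```python
def call_expression(patient, control, fold=2.0):
    """Label each protein as over-, under- or normally expressed versus control.

    `patient` and `control` map protein names to positive expression levels
    (arbitrary units). `fold` is an assumed threshold, not taken from the text.
    """
    calls = {}
    for protein, level in patient.items():
        ratio = level / control[protein]
        if ratio >= fold:
            calls[protein] = "over"
        elif ratio <= 1 / fold:
            calls[protein] = "under"
        else:
            calls[protein] = "unchanged"
    return calls


# Invented levels for three proteins of the set.
control = {"ERBB2": 1.0, "Ki67": 2.0, "BCL2": 4.0}
patient = {"ERBB2": 3.5, "Ki67": 2.2, "BCL2": 1.0}
calls = call_expression(patient, control)
print(calls)
```

For semi-quantitative IHC scores, the ratio could be replaced by a difference in score categories; the structure of the comparison stays the same.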
The present invention is useful for detecting, diagnosing, staging, monitoring, predicting, preventing conditions associated with breast cancer. It is particularly useful for predicting clinical outcome of breast cancer and/or predicting occurrence of metastatic relapse and/or determining the stage or aggressiveness of a breast disease in at least 50%, e.g., at least 55%, e.g., at least 60%, e.g., at least 65%, e.g., at least 70%, e.g., at least 75%, e.g., at least 80%, e.g., at least 85%, e.g., at least 90%, e.g., at least 95%, e.g., 100% of the patients. The invention is also useful for selecting more appropriate doses and/or schedule of chemotherapeutics and/or biopharmaceuticals and/or radiation therapy to circumvent toxicities in a patient.
In particular, the invention is also useful for selecting appropriate doses and/or schedule of chemotherapeutics and/or (bio)pharmaceuticals, and/or targeted agents, among which one may cite Aromatase Inhibitors (e.g., Exemestane, Anastrozole, Letrozole), Anti-estrogens (e.g., Fulvestrant, Tamoxifen), Taxanes (e.g., Paclitaxel, Docetaxel), Anthracyclines (e.g., Doxorubicin, Cyclophosphamide), CHOP (Doxorubicin, Cyclophosphamide, Oncovin, prednisone when taken in combination). Other drugs like Velcade™, 5-Fluorouracil, Vinblastine, Gemcitabine, Methotrexate, Goserelin, Irinotecan, Thiotepa, Topotecan or Toremifene may be cited as well.
For targeted therapies, one may cite Iressa (gefitinib, ZD1839, anti-EGFR, PDGFR, c-kit, Astra-Zeneca); ABX-EGFR (anti-EGFR, Abgenix/Amgen); Zarnestra (FTI, J & J/Ortho-Biotech); Herceptin (anti-HER2/neu, Genentech); Avastin (bevacizumab, anti-VEGF antibody, Genentech); Tarceva (erlotinib, OSI-774, RTK inhibitor, Genentech-Roche); ZD66474 (anti-VEGFR, Astra-Zeneca); Erbitux (IMC-225, cetuximab, anti-EGFR, Imclone/BMS); Oncolar (anti-GRH, Novartis); PD-183805 (RTK inhibitor, Pfizer); EMD72000 (anti-EGFR/VEGF ab, Merck KGaA); CI-1033 (HER2/neu & EGF-R dual inhibitor, Pfizer); EGF10004; Herzyme (anti-HER2 ab, Medizyme Pharmaceuticals); Corixa (Microsphere delivery of HER2/neu vaccine, Medarex).
Further relevant anti-breast cancer agents are described by Awada et al. in "The pipeline of new anticancer agents for breast cancer treatment in 2003" Critical Reviews in Oncology/Hematology 48 (2003) 45-63, the content of which is incorporated herein by reference.
Advantageously, in a method according to the present invention, breast tissue cells may be obtained from a patient regardless of whether said patient has or has not received a neo-adjuvant or adjuvant, e.g., systemic, therapy. Similarly, treated or untreated cell lines may be used.
Advantageously, in a method according to the present invention, breast tissue cells may be obtained from a patient regardless of ER expression.
In a further aspect, the present invention provides a method for treating a patient with a breast cancer comprising (i) the implementation of a method for analysing differential protein expression according to the invention on a sample from said patient, and (ii) determining a treatment for this patient based on the analysis of differential protein expression profile obtained in step i).
In a further aspect, the present invention relates to a method for analyzing differential protein expression associated with histopathologic features of breast disease according to the invention wherein the detection of the overexpression or underexpression of said pool of protein in breast tissues comprises the detection of the overexpression or underexpression of nucleic acids coding for said proteins.
The present invention further relates to a nucleic acids library useful for the molecular characterization of histopathologic features of breast disease comprising nucleic acids coding for the over or underexpressed proteins according to the invention, or equivalent thereof. The sequences of the nucleic acids of the library according to the invention are easily available to a person skilled in the art, who may, for example, use printed publications describing said sequences and/or public databases, e.g., the National Center for Biotechnology Information (NCBI) database, that provide such sequences as well. The content of the NCBI database is available via the internet at the following address: http://www.ncbi.nlm.nih.gov/.
Definitions
"aggressiveness of cancer" refers to cancer growth rate or potential to metastasise; a so-called "aggressive cancer" will grow or metastasise rapidly or significantly affect overall health status and quality of life.
"adjuvant therapy" refers to treatment involving radiation, chemotherapy (drug treatment), biologic therapy (vaccines) or hormone therapy, or any combination given after primary treatment.
"antibody" is intended to include whole antibodies, e.g., of any isotype, and includes fragments thereof which are also specifically reactive with a vertebrate, e.g., mammalian, protein. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. Thus, the term includes segments generated by proteolytic cleavage or recombinantly prepared portions of an antibody molecule capable of selectively reacting with a certain protein. Non-limiting examples of such proteolytic and/or recombinant fragments include Fab, F(ab')2, Fab', Fv, and single chain antibodies (scFv) containing a V[L] and/or V[H] domain joined by a peptide linker. The scFv's may be covalently or non-covalently linked to form antibodies having two or more binding sites. Antibodies may include polyclonal, monoclonal, or other purified preparations of antibodies and recombinant antibodies.
"associated with" refers to a disease in a subject which is caused by, contributed to by, or causative of an abnormal level of expression of a protein.
"control" comprises for example proteins from a sample of the same patient or from a pool of different patients, or selected among reference proteins which may be already known to be over or under expressed. The expression level of said control can be an average or an absolute value of the expression of reference proteins. These values may be processed in order to accentuate the difference relative to the expression of the proteins according to the invention. The analysis of the over or under expression of proteins can be carried out on a sample such as biological material derived from any mammalian cells, including cell lines, xenografts, human tissues, preferably breast tissue, etc. The method according to the invention may be performed on samples from, e.g., cell lines, healthy donors, patients or animals (for example for veterinary applications or preclinical studies).
"directly or indirectly labeled" includes proteins the sub-constituents of which, i.e., amino acids or amino acid groups or atoms, are themselves labeled (directly), as well as proteins labeled through any intermediate element able to recognize and bind to the targeted protein, e.g., an antibody.
"equivalent" includes nucleic acids encoding functionally equivalent proteins. Equivalent nucleotide sequences will include sequences that differ by one or more nucleotide substitutions, additions or deletions, such as allelic variants; and will, therefore, include sequences that differ from the nucleotide sequence of the nucleic acids of the invention because of the degeneracy of the genetic code.
"good-prognosis" and "poor-prognosis" respectively refer to favorable (e.g., remission) or unfavorable (e.g., metastasis, death) patient clinical outcome.
"histopathologic features of breast diseases" includes diseases, disorders or conditions known to affect, lethally or not, breast cells and/or tissues, including but not limited to breast tumours, for example i) non-cancerous breast diseases, for example hyperplasias, metaplasias, fibroadenomas, fibrocystic disease, papillomas, sclerosing adenosis or preneoplastic conditions, or ii) breast cancer. As "breast cancer" one may cite: A) noninvasive breast cancers, including i) ductal carcinoma in situ (also called intraductal carcinoma or DCIS), consisting of cancer cells in the lining of the duct, and ii) lobular carcinoma in situ, or LCIS (also known as lobular neoplasia); B) invasive cancers, occurring when cancer cells spread beyond the basement membrane which covers the underlying connective tissue in the breast, and which include i) infiltrating ductal carcinoma, which penetrates the wall of a duct, and ii) infiltrating lobular carcinoma, which spreads through the wall of a lobule and may sometimes appear in both breasts, sometimes in several separate locations.
"ImmunoHistoChemistry (IHC)" refers to methods using histochemical localization of immunoreactive substances using antibodies as reagents on cells or tissues by technologies such as, but not limited to flow cytometry, ELISA, Western and Southwestern Blot Analysis, and frozen and paraffin-embedded samples.
"Nucleic acids" refers to polynucleotides, e.g., isolated, such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include, as equivalents, analogs of RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides. ESTs, chromosomes, cDNAs, mRNAs, and rRNAs are representative examples of molecules that may be referred to as nucleic acids.
"over or underexpression" may comprise the detection of difference in the expression of the proteins according to the present invention in relation to at least one control.
"predicting clinical outcome" refers to the ability for a skilled artisan to classify patients into at least two classes, "good prognosis" and "bad prognosis", showing significantly different long-term Metastasis Free Survival (MFS).
"Protein" refers to a polypeptide with a primary, secondary, tertiary or quaternary structure, or any portion or modification, e.g., a mutant, or isoform thereof. A "portion" or "modification" of a protein retains at least one biological or antigenic characteristic of a native (wild-type) protein.
"Protein microarray" refers to a spatially defined and separated collection of individual proteins immobilised on a solid surface.
"Treating" as used herein is intended to encompass treating as well as ameliorating at least one symptom of the condition or disease.
IV - Description of the figures
Figure 1 represents hierarchical clustering analysis of global protein expression profiles in breast cancer as measured by IHC on TMA. A/ Graphical representation of hierarchical clustering results based on expression profiles of 26 proteins in 552 early breast cancer samples. Each row represents a sample and each column represents a protein. Immunostaining results are depicted according to a color scale: red or brown for strong or moderate positive staining, respectively, green for negative staining, gray for missing data. Dendrograms of samples (to the left of matrix) and proteins (above matrix) represent overall similarities in expression profiles. Three major clusters of tumors (A1, A2 and B) are shown (A1 and A2 correspond to luminal cells; B corresponds to basal cells). Colored bars to the right and colored branches in the dendrogram indicate the locations of 3 sample clusters of interest zoomed in C. B/ Dendrogram of proteins. Two major clusters "P1" (basal/stem cells) and "P2" (luminal/glandular cells) are identified and further divided into 4 smaller clusters designated "proliferation", "mitosis", "ER-related" and "adhesion" cluster, respectively. C/ Expanded view of selected sample clusters showing a partial grouping of tumors with similar histological type (LOB: lobular, DUC: ductal, OTH: other, MIX: mixed; blue bar) or ER status (positive, red bar and negative, orange bar).
Figure 2 represents classification of 552 breast cancer samples based on the expression of the 21-protein discriminator set identified by supervised analysis.
A and B/ Correlations between the molecular grouping based on the combined expression of the 21 proteins and the occurrence of metastatic relapse in the learning (A) and the validation (B) set of samples. C/ Supervised classification of all 552 samples using the 21-protein expression signature. Each row of the data matrix (left panel) represents a sample and each column represents a protein. Immunostaining results are depicted according to the color scale used in Figure 1. The 21 proteins, listed above the matrix (ER*: mean of three independent ER analyses), are ordered from left to right according to decreasing ΔP (ΔP is the difference between the probability of positive staining and the probability of negative staining in non-metastatic samples). Tumor samples are numbered from 1 to 552 and are ordered from top to bottom according to their increasing "Metastasis Score" (right panel). The orange dashed line indicates the threshold 0 that separates the two classes of samples, "poor-prognosis" (under the line) and "good-prognosis" (above the line). The middle panel indicates the occurrence (black square) or not (white square) of metastatic relapse for each patient.
Figure 3 represents Kaplan-Meier analysis of the metastasis-free survival of patients with breast cancer according to the molecular classification based on the 21-protein expression signature or the St-Gallen and the NIH consensus criteria.
Patients (pts) were classified in the "good-prognosis" class or the "poor-prognosis" class using the 21-protein signature identified by supervised analysis (A, B, E and F) or in the "low risk" class or the "high risk" class using the St-Gallen and the NIH consensus criteria (C and D). The P-values are calculated using the log-rank test. A/ Survival of all 552 patients. B/ Survival of 292 patients with node-negative cancer (N-) and 255 patients with node-positive cancer (N+). The difference of survival is significant between the "good-prognosis" class and the "poor-prognosis" class for the node-negative patients, as well as for the node-positive patients. In contrast, survival is not significantly different between the node-positive patients from the "good-prognosis class" and the node-negative patients from the "poor-prognosis class". C/ Survival of 292 patients with node-negative cancer (N-) according to the St-Gallen criteria. D/ Survival of 292 patients with node-negative cancer (N-) according to the NIH criteria. E/ Survival of 186 patients without any adjuvant chemotherapy (CT) and hormone therapy (HT). F/ Survival of 133 patients who received adjuvant chemotherapy (CT) without hormone therapy (HT).
Figure 4 represents expression of proteins studied by IHC on tissue microarrays (TMA). A/ Representative Hematoxylin-Eosin and Saffron staining of a paraffin block section (25 x 30 mm²) from a TMA containing 552 early breast cancer cases with 0.6 mm tumor cores. B/ Immunohistochemical staining of a tumor core for the 21 proteins identified by supervised analysis (magnification x200). C/ Examples of IHC staining for 5 proteins with differential expression in cancer tissue (bottom) compared with normal tissue (top). 1, FHIT expression in cytoplasm in normal lobules, down-regulation in cancer sample (arrow); 2, Apical normal expression of MUC1, down-regulation and mislocalization in the cytoplasm of cancer sample (arrow); 3, Absence of ERBB2 expression in normal lobule (arrow), overexpression on the cytoplasmic membrane in positive cancer sample (arrow); 4, Absence of nuclear expression of Cyclin D1 in normal lobules (arrow), overexpression in nucleus of positive cancer sample (arrow); 5, Normal myoepithelial cells are immunostained by P-Cadherin (arrow), overexpression in cancer sample (arrow). Magnification is x400.
V - Detailed description of the invention
We have combined IHC and TMA to measure the expression levels of selected proteins in a consecutive series of 552 patients with early stage breast cancer. Our aim was to determine protein combinations to refine tumor classification and improve the prognostic classification of disease.
V.1) Protein expression profiling identifies subclasses of breast cancer
Analysis and interpretation of the large amount of data generated (552 samples and 26 antibodies, ~14,000 data points) required the development of bioinformatic tools. As a first step, we applied pre-existing unsupervised hierarchical clustering algorithms as previously reported [19-24]. Two recent studies on breast cancer analyzed the expression of 15 proteins in 166 tumors [22] and 13 proteins in 107 samples [19], respectively. Several of these markers were included in the present work (BCL2, ER, PR, ERBB2, EGFR, Cyclins, Cytokeratins, MIB1, P53), allowing for direct comparison of results. In our analysis, clustering allowed the identification of four major coherent protein clusters designated according to the function of most included proteins: "ER-related cluster", "adhesion cluster", "mitosis cluster" and "proliferation cluster". Correlated expression of proteins may be due to different mechanisms such as coregulation (e.g., ER/BCL2 [30]), functional interaction (e.g., STK6/Taxins [27, 28]), phenotypic association (e.g., ERBB2/P53 [31]) or chromosomal location (e.g., FGFR1/TACC1 located on 8p11). Some co-expressed proteins were previously reported in RNA or protein expression profiling studies. For example, ER, PR, BCL2 and GATA3 clustered together [8-10, 13]. This "ER-related cluster" was negatively correlated with the "mitosis" and "proliferation" clusters, in agreement with the higher proliferation index in ER-negative tumors [32] and the known proliferation-differentiation balance in carcinomas.
The "ER-related cluster" was close to the "adhesion cluster" that included other markers that may correlate positively with ER expression such as FHIT [33], CK8/18 [19, 22], CCND1 [34] and MUC1 [8]. Our "proliferation cluster" had some similarities to that identified by others with the common presence of P53, Ki67, CCNE, ERBB2 and CK5/6 [19] or CCNE, ERBB2, EGFR and CK5/6 [22]. Interestingly, this cluster also included CDH3/P-Cadherin, present in a "basal cluster" identified in gene expression analyses [9] and previously shown to be overexpressed in a subgroup of breast carcinomas associated with higher proliferation rates and aggressive behavior [35]. Hierarchical clustering sorted tumors into three clusters that correlated with relevant histoclinical parameters, including histological type, SBR grade, ER status, ERBB2 status and the presence or absence of peritumoral vascular emboli. Correlations were found between the characteristics of these tumor clusters and their protein expression profiles. For example, the high number of grade III tumors in cluster B, as well as the high number of ERBB2-positive samples, agreed with the frequent strong expression of the "proliferation" cluster (which included ERBB2) and the "mitosis" cluster in these tumors. Conversely, 99% of cluster A1 samples were ER-positive, and showed a frequent strong expression of the "ER-related" cluster and low expression of the "proliferation cluster" [32]. Interestingly, the tumor clusters also correlated with a breast cancer classification recently proposed in two series of analyses that provided a new conceptual framework of mammary oncogenesis. First, phenotypic analyses have established a three-cell phenotypic classification of breast cancer cells [22, 36, 37]. These authors suggested that biomarkers such as the intermediate filament cytokeratins (CK), encoded by a large number of keratin genes, are able to distinguish between distinct cell subpopulations within the mammary gland epithelial compartment.
It has been proposed that "basal" cells contain mammary gland progenitor cells able to give rise to both "luminal" and "myoepithelial" cells [38] (see [39] for review). Progenitor cells express type II keratins CK5 and 6. In contrast, differentiated "luminal" cells express type II keratin CK8 and type I keratin CK18, which are also observed in normal simple and glandular epithelia. Luminal cells also express ER [10, 11]. Use of tissue microarray screening has confirmed this emerging theory [19, 22]. Second, recent gene expression analyses using DNA microarrays have led to a similar identification of subclasses of breast tumors that corresponded to the phenotypic classification [9-11]. These experiments concur in establishing a distinction between several types of epithelial cells in the mammary gland. The origin of the breast malignant cell remains unknown. Two major types of breast cancer may derive from basal/progenitor or luminal cells, respectively. Alternatively, most tumors may originate from pluripotent stem cells and reach different stages of differentiation [40]. Our results support this new classification model. Tumor cluster A1 may be approximated to a cluster of luminal cell-like tumors, with frequent strong expression of ER and CK8/18. Cluster B may consist of tumors with basal/progenitor, ER-negative characteristics, i.e. strong expression of CK5/6 and proliferation markers. A2 tumors, with an intermediate profile, may represent a transitory "baso-luminal" stage, or consist of tumors that have lost ER function. It can be expected that luminal A1 tumors, in which the bulk of cells are more differentiated and express ER-related cluster proteins, are of better prognosis, whereas more undifferentiated and proliferative basal B tumors are associated with poor prognosis.
The significant differences in clinical outcome observed between the three defined tumor clusters in this study are consistent with this model and recent studies [9-11, 41]. In addition, we show that lobular carcinomas are luminal-like tumors, and consist of differentiated luminal cells that express CK8/18.
V.2) Protein expression profiling predicts clinical outcome of breast cancer
Thus classical unsupervised hierarchical clustering applied to all tested proteins was able to identify biologically and clinically relevant classes of breast cancer. Recently, supervised methods have been successfully applied to gene expression data analysis in parallel with unsupervised approaches [42]. In a second step, we thus developed a supervised method to identify the best combination within 26 proteins that would further improve the prognostic classification. To our knowledge, our study is the first application of such supervised methods to large-scale IHC data. We identified a 21-protein set which optimally classified patients into two classes ("good-prognosis class" and "poor-prognosis class") with significantly different long-term MFS. Initially identified in a random learning set of 368 patients, this prognostic signature was validated in an independent set of 184 patients, showing its robustness. Our discriminator set included 10 proteins coded by genes identified across recent gene expression studies [7-15], as well as other proteins with unclear role in disease progression and sensitivity to systemic therapy. The prognostic value of the signature was increasingly accurate with the addition of other proteins as evidenced by univariate and multivariate analyses, further highlighting the strength of large-scale molecular analyses for understanding tumor heterogeneity through the identification of expression signatures. The classification based on the 21-protein predictor was associated with a highly significant difference in clinical outcome. The 5-year MFS was
90% for patients of the "good-prognosis class" and only 62% for patients of the "poor-prognosis class". When compared in multivariate analysis with classical prognostic factors and with each tested protein separately, our classification performed significantly better for predicting the occurrence of metastatic relapse. Such prognostic association persisted when applied to patients with lymph node-positive and lymph node-negative cancer. Interestingly, the MFS of node-negative patients from the "poor-prognosis class" was similar to that of node-positive patients from the "good-prognosis class". Notably, our molecular classification performed better than that defined by St-Gallen and NIH criteria for node-negative patients. This finding is of particular significance, since ~75% of node-negative patients who are candidates for adjuvant chemotherapy based on the St-Gallen/NIH criteria are currently thought to be over-treated. In the present study, our 21-protein predictor assigned fewer node-negative patients to the "poor-prognosis class", and their clinical outcome was more frequently unfavorable than it was for patients assigned to the high-risk class defined by St-Gallen or NIH criteria. Our predictor also performed well in patients irrespective of ER status. The 5-year MFS was 90% for ER-positive patients from the "good-prognosis class", and 58% for ER-positive patients from the "poor-prognosis class", suggesting our 21-protein set may provide more accurate clinical information than ER status alone, possibly reflecting functional differences in the ER pathway. Additionally, our molecular classification conserved its predictive impact for patients independent of adjuvant systemic therapy. Since distant metastasis may be influenced by adjuvant therapy, we separately analyzed the 186 patients who did not receive any chemo- and hormone therapy, as well as the 133 patients who exclusively received adjuvant chemotherapy, with an anthracycline-based regimen in most cases.
Interestingly, we found within the group of 186 untreated patients an odds ratio of 7.45 for metastatic relapse in the "poor-prognosis class" when compared with patients of the "good-prognosis class". Similar discrimination was observed within the 133 patients treated with chemotherapy alone, with a corresponding odds ratio of 3. Thus, the 21-protein signature may facilitate the selection of appropriate treatment options in early breast cancer patients. It may be an important clinical tool to avoid unnecessary, toxic and costly treatment of node-negative patients, and it may help to select, among patients who need adjuvant chemotherapy, those who might benefit from the standard protocol and those who would be candidates for another protocol or another form of systemic therapy.
VI - Materials and Methods
VI.1) Patients and histological samples
A consecutive series of 552 women with early (stage I, II or III) breast cancer treated at the Institut Paoli-Calmettes before December 1999 was studied using the TMA technology. The stage of disease was defined according to TNM classification (Union Internationale Contre le Cancer, UICC, TNM, 5th edition). Patients with locally advanced, inflammatory or metastatic disease, or with previous history of cancer were not included. Tumors were invasive adenocarcinomas including, according to the WHO histological typing, 388 ductal carcinomas (70%), 72 lobular (13%), 24 mixed (4%), 40 tubular (8%), 8 medullary (1%) and 20 other types (4%). Clinical annotation of each sample included patient age, axillary lymph node status, pathological tumor size, Scarff-Bloom-Richardson (SBR) grade, peritumoral vascular invasion, estrogen receptor (ER), progesterone receptor (PR) and ERBB2 status as evaluated by IHC with positivity cut-off values of 1% for hormone receptors and with a 2+ or 3+ score (HercepTest kit scoring guidelines) for ERBB2. The characteristics of patients are listed in Table 1 (see first column only).
Table 1. Histoclinical characteristics of 552 breast cancer patients, according to the membership to the "good-prognosis" or the "poor-prognosis class" as defined using the expression of the 21-protein set.
*, as defined using the 21-protein signature;
**, P-values for the comparison of numbers of patients were calculated using the Chi-2 test, and P-values for the comparison of metastasis-free survival (MFS) were calculated using the log-rank test; NS, not significant; ***, calculated, for the 450 patients who did not experience metastatic relapse as a first event, from the date of diagnosis to the time of last follow-up;
CI denotes confidence interval.
Patients were treated according to the following guidelines: all had primary surgery that included complete resection of breast tumor (modified radical mastectomy in 28% of cases and lumpectomy in 72%) and axillary lymph node dissection; 96% of patients (including 100% of those treated with breast-conservative surgery) received adjuvant local-regional radiotherapy; 47% were given adjuvant chemotherapy (anthracycline-based regimen in most cases), and 42% received adjuvant hormone treatment (tamoxifen in most cases). After completion of local-regional treatment, patients were evaluated at least twice per year for the first 5 years and at least annually thereafter. The median follow-up was 57 months (range, 2 to 182) after diagnosis for the 450 patients who did not experience metastatic relapse as a first event, 37 months (range, 4 to 151) for the 102 patients with metastasis as first event, and 51 months (range, 2 to 182) for all patients. The 5-year MFS rate was 80% [95%CI 76.2 - 83.7].
VI.2) Tissue microarrays construction
TMAs were prepared as previously described [25] with slight modifications. For each tumor, three representative areas from the primary tumor were carefully selected from a hematoxylin-eosin stained section of a donor block. Core cylinders with a diameter of 0.6 mm each were punched from each of these areas and deposited into three separate recipient paraffin blocks using a specific arraying device (Beecher Instruments, Silver Spring, MD). The technique of TMA allows the analysis of tumors and controls under identical experimental conditions. In addition to tumor tissues, the recipient block also received 10 normal breast tissue samples from 10 healthy women who underwent reductive mammary surgery and pellets from nine mammary cell lines. Five-μm sections of the resulting TMA block were made and used for IHC analysis after transfer onto glass slides. We previously assessed the reliability of the method by comparison with the standard immunohistochemical method for the usual prognostic parameters; the value of the kappa test was 0.95 [25].
VI.3) Selection of the 26 markers
The selection of the proteins was done according to the following criteria: known or potential importance in breast cancer and availability of a corresponding antibody that performed well in IHC on paraffin-embedded tissues. Twenty-six proteins were selected including hormone receptors (ER, PR), subclass markers (Cytokeratins), oncogenes and proliferation proteins (ERBB family members, BCL2, Cyclins, MIB1, FGFR1, Aurora A, Taxins), tumor suppressors (P53, FHIT), adhesion molecules (Cadherins, Catenins, Afadin), proteins from oncogenes of amplified genomic regions (ERBB2, CCND1, STK6), and other potential prognostic markers identified in specific studies or previous DNA microarray experiments (CCNE, GATA3, MUC1). Twelve out of the 26 proteins were mentioned as potential significant genes in RNA expression profiling studies in breast cancer [6-15]. The characteristics of the antibodies used are listed in Table 2. When available, several antibodies were studied for comparison, and only the reagents that gave the best quality data were kept for the global analysis.
Table 2. Proteins tested by immunohistochemistry on TMAs and characteristics of the corresponding antibodies.
Mmab: mouse monoclonal antibody; Rpab: rabbit polyclonal antibody; DTRS: Dako target retrieval solution.
VI.4) Immunohistochemical analysis
IHC was carried out on five-μm sections of tissue fixed in alcohol formalin for 24 h and embedded in paraffin. Sections were deparaffinized in Histolemon (Carlo Erba Reagenti, Rodano, Italy) and rehydrated in graded alcohol. Antigen retrieval was accomplished by incubating the sections in pre-treatment solutions depending on the antibody used. Pretreatment conditions are listed in Table 2. The reactions were carried out using an autoimmunostainer (Dako Autostainer). Staining was performed at room temperature as follows: rehydrated tissues were washed in phosphate buffer, followed by quenching of endogenous peroxidase activity by treatment with 0.1% H2O2; slides were incubated with blocking serum (Dako) for 30 min, then with the affinity-purified antibody for one hour. After washes, slides were sequentially incubated with biotinylated antibody against rabbit IgG for 20 min followed by streptavidin-conjugated peroxidase (Dako LSAB®2 kit), then visualized with Diaminobenzidine (3-amino-9-ethylcarbazole). Slides were counter-stained with hematoxylin, coverslipped using Aquatex (Merck, Darmstadt, Germany) mounting solution, then evaluated under a light microscope by two pathologists. The results were expressed in terms of percentage (P) and intensity (I) of positive cells as previously described [25]. For each sample, the mean of the score of a minimum of two core biopsies was calculated. The results were then scored by the quick score (Q) (Q = P x I), except for ERBB2 status, which was evaluated with the Dako scale (HercepTest™ kit scoring guidelines). The quick score allowed tumors to be separated into two or three classes.
Homogeneous classes were defined by grouping samples with an equivalent staining level according to the distribution curves as described [25]. Two classes (negative and positive) were defined for Afadin, α and β Catenins, BCL2, Cyclins D1 and E, Cytokeratins 5/6 and 8/18, EGFR, ERBB3, ERBB4, FGFR1, GATA3, MIB1, P53, P-Cadherin, PR and TACC3, with a positivity cut-off value of Q = 1, except for Cyclin D1 and MIB1 with a positivity cut-off value of 10 and 20, respectively. Three classes were defined (negative, moderate and strong staining) for Aurora A, E-Cadherin, ER, FHIT, MUC1, TACC1, and TACC2, with negative (Q = 0), moderate (0 < Q ≤ 100) or strong expression (100 < Q ≤ 300). For ERBB2, three classes (0/1+, 2+, 3+) were obtained with the Dako scale.
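The quick-score rules above can be illustrated with a minimal Python sketch. It assumes that the intensity I is graded from 0 to 3 (inferred only from the stated upper bound Q ≤ 300) and that a sample is positive when Q is at or above the cut-off; the function names are hypothetical and do not come from the original study.

```python
from statistics import mean


def quick_score(percent_positive, intensity):
    """Quick score Q = P x I (P: % of positive cells; I: intensity, assumed 0..3)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("P must be a percentage between 0 and 100")
    if not 0 <= intensity <= 3:
        raise ValueError("I is assumed to lie between 0 and 3")
    return percent_positive * intensity


def sample_score(core_scores):
    """Mean quick score over the cores of one sample (at least two cores)."""
    if len(core_scores) < 2:
        raise ValueError("at least two core biopsies are required")
    return mean(core_scores)


def two_class(q, cutoff=1):
    """Negative/positive call; 'positive when Q >= cutoff' is an assumption."""
    return "positive" if q >= cutoff else "negative"


def three_class(q):
    """Negative (Q = 0), moderate (0 < Q <= 100) or strong (100 < Q <= 300)."""
    if not 0 <= q <= 300:
        raise ValueError("Q must lie between 0 and 300")
    if q == 0:
        return "negative"
    return "moderate" if q <= 100 else "strong"
```

For example, two cores scored (P = 50, I = 2) and (P = 70, I = 2) give a mean Q of 120, which falls in the strong class; a Cyclin D1 or MIB1 call would pass cutoff=10 or cutoff=20 to two_class.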
VI.5) Data analysis
A combination of exploratory unsupervised and supervised bioinformatic methods was used to analyze these immunohistochemical profiles. First, we applied unsupervised hierarchical clustering similar to that used in gene expression profiling studies. Data were reformatted using the following scoring system: -2 designated negative staining, 1 weakly positive staining, 2 strongly positive staining and missing data were left blank in the scored table. Hierarchical clustering investigates relationships between samples and between proteins, based on the similarity of sample immunoreactive scores. We used the Cluster program (average-linkage with Pearson correlation as similarity metric) and results were displayed with the TreeView software [26]. We then performed supervised analysis to identify the protein set that best distinguished between two classes of samples with different clinical outcome. To simplify the analyses, the IHC scores were recorded as negative (negative staining) or positive (weakly and strongly positive staining). The classifier was derived through training on a subset of chosen samples (2/3 of population, learning set) and then validated on the remaining subset (1/3 of population, validation set). The assignment of samples to each set was random, but the ratio between tumors with and without metastatic relapse was preserved. An exhaustive testing comprising all combinations of 1 to 5 proteins, as well as the complementary combinations of 21 to 25 proteins, was performed to assess their ability to classify tumors into 2 classes ("poor-prognosis" and "good-prognosis") in agreement with their clinical outcome. Using the protein expression scores of each combination, we developed a "Metastasis Scoring" system that assigned to each tumor a probability to belong to the "poor-prognosis class" or the "good-prognosis class".
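As an illustration of the sampling design described above, the following Python sketch draws a random 2/3 - 1/3 split that preserves the ratio of tumors with and without metastatic relapse, and counts the protein combinations covered by the exhaustive testing. It is a sketch under stated assumptions, not the original software: the function names, the fixed seed and the rounding rule are hypothetical.

```python
import random
from math import comb


def stratified_split(samples, relapsed, learning_fraction=2 / 3, seed=0):
    """Randomly split samples into learning and validation sets while keeping
    the proportion of relapsed tumors the same in both sets.

    samples:  list of sample identifiers
    relapsed: dict mapping each identifier to True (metastatic relapse) or False
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    learning, validation = [], []
    for status in (True, False):
        group = [s for s in samples if relapsed[s] == status]
        rng.shuffle(group)
        cut = round(len(group) * learning_fraction)
        learning.extend(group[:cut])
        validation.extend(group[cut:])
    return learning, validation


def count_tested_combinations(n_proteins=26):
    """Number of protein subsets of size 1 to 5 and 21 to 25 among n_proteins."""
    sizes = list(range(1, 6)) + list(range(21, 26))
    return sum(comb(n_proteins, k) for k in sizes)
```

With 552 samples of which 102 relapsed, this split yields 368 learning and 184 validation samples, which matches the set sizes reported in the text, and the exhaustive testing covers 167,362 combinations of the 26 proteins.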
Consider a combination of N proteins $P_1, \ldots, P_N$ (where N ranges from 1 to 5 and 21 to 26) and two predefined classes X, Y of tumors within the learning set: $X = \{X_1, \ldots, X_L\}$ includes samples with metastatic relapse during the follow-up and $Y = \{Y_1, \ldots, Y_M\}$ includes samples without any metastatic relapse. For each protein combination tested, one tumor is represented as a ternary vector (e.g. $X_1 = (X_1(P_1), \ldots, X_1(P_N))$) where each component is scored 0 for missing data or +1/-1 for positive/negative IHC staining. Every tumor Z has a score S(Z) defined as follows. For each protein $P_i$, we compute the frequencies of +1/-1 value in the X class (adjusted to avoid a 0 probability):

\[ f_X^i(+1) = \frac{\mathrm{card}\{k : X_k(P_i) = +1\} + 1}{\mathrm{card}\{k : X_k(P_i) \neq 0\} + 2}, \qquad f_X^i(-1) = \frac{\mathrm{card}\{k : X_k(P_i) = -1\} + 1}{\mathrm{card}\{k : X_k(P_i) \neq 0\} + 2} \]

where, for instance, $\mathrm{card}\{k : X_k(P_i) = +1\}$ is the number of X tumors with positive IHC staining for protein $P_i$. Similarly we compute the frequencies $f_Y^i(+1)$ and $f_Y^i(-1)$ in the Y class and we define $f_X^i(0) = f_Y^i(0) = 1$. The Metastasis Score of tumor Z is the log ratio of the joint probabilities:

\[ S(Z) = \log\Big(\prod_{i=1}^{N} f_X^i(Z(P_i))\Big) - \log\Big(\prod_{i=1}^{N} f_Y^i(Z(P_i))\Big) \]

Samples were then sorted according to their S(Z) score. The natural threshold that divides the population in 2 classes is S = 0: if S(Z) > 0 then Z is more similar to the class X and is predicted to belong to the "poor-prognosis class" and if S(Z) < 0 then Z is more similar to the class Y and is predicted to belong to the "good-prognosis class". The number of misclassifications (error rate) was defined as the number of X tumors classified in the "good-prognosis class" plus the number of Y tumors classified in the "poor-prognosis class". The best classifier protein-set was that with the minimal rate of misclassified tumors. Once identified, the prognostic power of the classifier was tested on the validation set by classifying the remaining independent tumors using the same approach. Finally, it was assessed on the whole population.
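The Metastasis Score can be sketched in Python as follows. This is an illustrative reading of the definition above, not the original implementation: it assumes that the adjustment "to avoid a 0 probability" adds 1 to each count and 2 to the number of scored tumors, that the frequency of a missing value (0) is 1 in both classes, and that a tumor with S(Z) exactly equal to 0 is left unclassified (the text does not define this case). All function names are hypothetical.

```python
from math import log


def class_frequencies(tumors, n_proteins):
    """Adjusted frequencies of +1 / -1 staining per protein within one class.

    tumors: list of ternary vectors with +1 (positive), -1 (negative), 0 (missing)
    Returns one dict per protein: {+1: f(+1), -1: f(-1), 0: 1.0}.
    """
    freqs = []
    for i in range(n_proteins):
        pos = sum(1 for t in tumors if t[i] == 1)
        neg = sum(1 for t in tumors if t[i] == -1)
        scored = pos + neg  # tumors with a non-missing value for protein i
        freqs.append({
            1: (pos + 1) / (scored + 2),   # assumed add-one adjustment
            -1: (neg + 1) / (scored + 2),
            0: 1.0,                        # missing data do not change the score
        })
    return freqs


def metastasis_score(z, freq_x, freq_y):
    """S(Z): log of the joint probability under class X minus that under class Y."""
    return sum(log(fx[v]) - log(fy[v]) for v, fx, fy in zip(z, freq_x, freq_y))


def predict(z, freq_x, freq_y):
    """Threshold at S = 0; the tie case is an assumption of this sketch."""
    s = metastasis_score(z, freq_x, freq_y)
    if s > 0:
        return "poor-prognosis"
    if s < 0:
        return "good-prognosis"
    return "unclassified"


def misclassifications(x_tumors, y_tumors, freq_x, freq_y):
    """X tumors called good-prognosis plus Y tumors called poor-prognosis."""
    wrong_x = sum(predict(t, freq_x, freq_y) == "good-prognosis" for t in x_tumors)
    wrong_y = sum(predict(t, freq_x, freq_y) == "poor-prognosis" for t in y_tumors)
    return wrong_x + wrong_y
```

Summing per-protein log ratios is equivalent to taking the log ratio of the two products, and it avoids numerical underflow when many proteins are combined.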
For each tumor set, the prognostic impact was further estimated by univariate analyses that compared the rate of metastatic relapses within the two molecularly defined classes of tumors (Fisher exact test).
VI.6) Statistical methods
Distributions of molecular markers and other categorical variables were compared using either the standard Chi-2 test or the Fisher exact test. The follow-up was calculated from the date of diagnosis to the time of metastasis as first event or time of last follow-up for censored patients. The end point was the metastasis-free survival (MFS), calculated from the date of diagnosis, first metastasis being scored as an event. All other patients were censored at the time of the last follow-up, death, recurrence of local or regional disease, or development of a second primary cancer, including contralateral breast cancer. Survival curves were derived from Kaplan-Meier estimates and were compared by log-rank test. The influence of molecular grouping, adjusted for other factors including classical prognostic factors and significant IHC measurements, was assessed in multivariate analysis by Cox proportional hazards models. Survival rates and odds ratios (OR) are presented with their 95% confidence intervals (95%CI). Statistical tests were two-sided at the 5% level of significance. All statistical tests were done using SAS Version 8.02.
VII - Results
VII.1) Protein expression profiling of breast cancers using tissue microarrays
The expression of 26 proteins was studied by IHC on TMA containing 552 early stage breast tumor samples and controls (Figure 4A). As expected, staining for all antibodies was homogeneous among the 10 normal breast samples (data not shown), but much more heterogeneous for tumor samples. Compared to normal samples, 16 proteins were underexpressed in cancerous tissues in 12% (for MUC1) to 60% (for Aurora A) of cases, and 10 proteins were overexpressed in 11% (for Ki67/MIB1) to 66% (for ERBB4) of cases. Examples of IHC staining are shown in Figure 4 (panels B and C). Results are summarized in Table 3.
Table 3. Expression of proteins tested by immunohistochemistry in 552 early breast cancers deposited on TMA and Kaplan-Meier analysis of the metastasis-free survival (MFS).
*, as compared to 10 normal breast samples.
**, P-values for the comparison of MFS were calculated using the log-rank test.
CI denotes confidence interval.
VII.2) Unsupervised hierarchical classification of 552 breast tumors upon protein expression profiling
VII.2.1) Hierarchical clustering
The overall expression patterns for the 552 samples were first analyzed with hierarchical clustering. Results are displayed in a color-coded matrix in Figure 1A. The clustering algorithm orders proteins on the horizontal axis and samples on the vertical axis on the basis of similarity of their expression profiles. This similarity is shown as a dendrogram where the length of the branch between two elements reflects their degree of relatedness. Protein expression scores are represented according to a color scale: red for strong positive staining, brown for weak positive staining and green for negative staining. Despite significantly heterogeneous expression, such combinatorial analysis and color display highlighted groups of correlated proteins across correlated samples. Figure 1B displays the dendrogram of related proteins. As expected, the three interpretations of ER staining made independently by two pathologists were highly correlated (R2 between 0.87 and 0.96) (Figure 1C, middle and bottom panels). Furthermore, there was a high degree of concordance for expression of ER between IHC on full sections and on TMA (p<0.0001, Chi-2 test). Two major protein clusters, designated "P1" and "P2", were identified (Figure 1B). These clusters were further divided into smaller sub-groups including a cluster (hereafter designated "ER-related cluster") of ER-associated proteins (PR, BCL2, GATA3) and an "adhesion cluster" (E-Cadherin, _-Catenin, Afadin). We have demonstrated that Aurora A (STK6) and Taxins (TACC1-3) are interacting partners involved in cell division.27 This was reflected in the formation of a third cluster (hereafter designated "mitosis cluster").
The fourth cluster (hereafter designated "proliferation cluster"), defined by the routinely used marker Ki67/MIB1, revealed that proteins such as EGFR, ERBB2, P53 and the G1 cyclin CCNE are preferentially overexpressed in tumors undergoing rapid growth. The combined protein expression patterns defined two major clusters of tumors designated cluster A (462 cases) and cluster B (89 cases) in Figure 1 (1 case that clustered outside of the 2 clusters was excluded from further analysis). Cluster A could be further subdivided into two subclusters, A1 (393 cases) and A2 (69 cases). Globally, cluster A1 tumors displayed a strong expression of the "ER cluster" and the "adhesion cluster" and a low expression of the "proliferation cluster" in most cases, whereas the "mitosis cluster" was strongly expressed in ~50% of samples. In general, cluster B tumors displayed a low expression of the "ER cluster" but a strong expression of the three other protein clusters. Cluster A2 included ER-positive and ER-negative tumors that displayed an intermediate profile characterized overall by strong expression of the "adhesion cluster" and a low expression of the "ER cluster", the "proliferation cluster" and the "mitosis cluster".
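The similarity measure behind this clustering can be illustrated with a minimal sketch of Pearson correlation between two staining profiles scored -2, 1 or 2, with blank entries represented as `None`, followed by an average-linkage similarity between two groups of profiles. This is an assumption-laden illustration, not the Cluster program itself, whose handling of missing values and of linkage may differ; the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the positions where both profiles are scored."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0  # not enough shared data to estimate a correlation
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise Pearson similarity between members of two clusters."""
    sims = [pearson(a, b) for a in cluster_a for b in cluster_b]
    return sum(sims) / len(sims)


# Toy profiles: -2 negative, 1 weak positive, 2 strong positive, None missing
tumor_1 = [-2, None, 2, 1]
tumor_2 = [-2, 2, 2, 1]
tumor_3 = [2, 1, -2, None]
print(average_linkage([tumor_1], [tumor_2, tumor_3]))
```

In an agglomerative procedure, the two clusters with the highest average similarity are merged at each step, and the order of the merges yields the dendrogram.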
VII.2.2) Correlation with histoclinical parameters and survival
We identified correlations between tumor clusters and relevant biopathological parameters. In each cluster, the most frequent histological type was the ductal type; however, in cluster A1, 19% of samples were of the lobular type compared with 12% in cluster A2 and only 7% in cluster B (p=0.03, Chi-2 test). Figure 1C (top panel) shows, within cluster A1, a subcluster of 24 tumors that includes 21 lobular or mixed (lobular/ductal) carcinomas with low expression of E-Cadherin, consistent with a previous report.29 Correlation also existed with SBR grade: in cluster A1, 41% of cases were grade I and 15% were grade III compared with 23% and 35% in cluster A2, and 7% and 63% in cluster B, respectively (p<0.0001, Chi-2 test). In cluster B, samples were more likely to be ERBB2-positive (2+ or 3+ in IHC, 36% of cases) compared with 8% in cluster A1 and 12% in cluster A2 (p<0.0001, Chi-2 test). Conversely, cluster A1 samples were more likely to be ER-positive (99% of cases) compared with 35% in cluster A2 and 10% in cluster B (p<0.0001, Chi-2 test). Finally, peritumoral vascular emboli were more frequent in A2 tumors (53% of cases) than in B (37%) and A1 (35%) tumors (p=0.02, Chi-2 test). Interestingly, no correlation was found with age of patients, pathological size of tumors, or axillary lymph node status. Importantly, the tumor clusters correlated with clinical outcome. With a median follow-up of 57 months, the 5-year MFS was significantly different (p<0.0001, log-rank test) between cluster A1 (54 metastases, 86% MFS [95%CI 82.1 - 89.9]), cluster A2 (21 metastases, 68% MFS [95%CI 56.5 - 79.9]) and cluster B (26 metastases, 66% MFS [95%CI 54.3 - 77.6]) (data not shown).
VII.3) Supervised analysis and clinical outcome
We developed a supervised analysis method to search for smaller sets of discriminator proteins that might improve our prognostic classification.
Analysis was conducted using two equivalent but independent tumor sets (learning and validation sets).
VII.3.1) Supervised analysis and classification of patients
The learning set of samples (n=368) allowed the identification of a combination of proteins (protein expression signature) that correlated with long-term MFS. The number of proteins in the "metastatic predictor" was optimized by iteratively testing all combinations of 1 to 5 proteins and the complementary combinations of 21 to 25 proteins and by assessing their ability to classify samples correctly using a "Metastatic Score". The optimal combination for these tumors contained 21 proteins (Figure 2C). Examples of IHC staining for these 21 proteins are shown in Figure 4B. Samples from the learning set were ordered using the "Metastatic Score". Two classes of samples ("poor-prognosis class", positive scores, and "good-prognosis class", negative scores) were defined using a cut-off value of 0. As shown in Figure 2A, the classifier predicted the actual clinical outcome of patients rather successfully: 47 out of the 128 patients (37%) with a positive score displayed metastatic relapse whereas only 21 out of the 240 (9%) with a negative score experienced metastasis during follow-up (odds ratio, OR=6.1 [95%CI 3.3 - 11.3], p<0.0001, Fisher exact test). We then tested the ability of this multiprotein signature to predict prognosis in an independent set of 184 patients (validation set). Using the same threshold for the "Metastatic Score" as previously described, we identified two classes of patients that strongly correlated with clinical outcome. There were 24 metastatic relapses out of the 63 patients (38%) in the "poor-prognosis class" and only 10 out of the 121 (8%) in the "good-prognosis class" (odds ratio, OR=6.8 [95%CI 2.8 - 17.3], p<0.0001, Fisher exact test) (Figure 2B). These results confirmed and validated the predictive capacity and robustness of our 21-protein signature. When all 552 cases (learning and validation cases) were analyzed together, the predictor correlated well with long-term MFS.
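The odds ratios quoted for the learning and validation sets follow directly from the 2x2 tables of predicted class against observed relapse. The sketch below recomputes them with the standard library only; it uses the plain sample odds ratio and a two-sided Fisher exact test that sums all tables no more probable than the observed one. This is one common definition of the two-sided test and is an assumption here; the exact procedure used by the statistical package is not specified in the text. The function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with p_obs are counted
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-7))


# Rows: poor-prognosis, good-prognosis; columns: relapse, no relapse
learning = (47, 128 - 47, 21, 240 - 21)
validation = (24, 63 - 24, 10, 121 - 10)

for name, table in (("learning", learning), ("validation", validation)):
    print(name, round(odds_ratio(*table), 1), fisher_exact_two_sided(*table))
```

Rounded to one decimal place, the two sample odds ratios reproduce the reported values of 6.1 and 6.8.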
Figure 2C shows the expression profiles of the 21 proteins in the 552 tumors in a color-coded matrix. Samples are ordered from top to bottom according to their increasing "Metastatic Score" and proteins from left to right according to decreasing ΔP (ΔP is the difference between the probability of positive staining and the probability of negative staining in non-metastatic samples). The orange dashed line indicates the threshold 0 that separates the two classes, "good-prognosis" (above the line) and "poor-prognosis" (under the line).
VII.3.2) Correlation of molecular classification with histoclinical parameters and survival
Table 1 (see the three last columns) shows the characteristics of patients in each class. The histoclinical parameters significantly associated with this classification were SBR grade (p<0.0001, Chi-2 test), hormone receptor status (p<0.0001, Fisher exact test), ERBB2 status (p<0.0001, Fisher exact test), and whether patients received adjuvant chemotherapy (p=0.001, Fisher exact test) or hormone therapy (p<0.0001, Fisher exact test). There was no correlation with patient age, tumor size, or number of involved lymph nodes. In contrast, a strong correlation with clinical outcome was observed (Figure 2C): 65 of 194 patients (34%) assigned to the "poor-prognosis class" displayed metastatic relapse whereas only 37 of 358 (10%) assigned to the "good-prognosis class" experienced metastasis during follow-up (odds ratio, OR=4.4 [95%CI 2.7 - 7.0], p<0.0001, Fisher exact test). The 5-year MFS was 62% [95%CI 54.7 - 70.0] in the "poor-prognosis class", and 90% [95%CI 86.0 - 93.3] in the "good-prognosis class" (p<0.0001, log-rank test) (Figure 3A).
VII.3.3) Survival and lymph node status
Our protein expression signature also classified the 255 patients with node-positive disease into two classes that correlated with clinical outcome. In the "good-prognosis class", 28 out of 158 patients experienced metastatic relapse during follow-up as compared with 43 out of 97 in the "poor-prognosis class" (odds ratio, OR=3.7 [95%CI 2.0 - 6.8], p<0.0001, Fisher exact test) (Figure 3B). The same was true for the 292 patients with node-negative breast cancer. In this group, the odds ratio for metastasis was 6.5 ([95%CI 2.7 - 16.8], p<0.0001, Fisher exact test) among the 93 women from the "poor-prognosis class", as compared with the 199 women from the "good-prognosis class" (Figure 3B).
As shown, there was no significant difference for MFS between the 158 node-positive patients from the "good-prognosis class" and the 93 node-negative patients from the "poor-prognosis class" (p=0.142, log-rank test). We compared our prognostic classification of node-negative patients with those provided by the consensus criteria established during the St-Gallen and NIH conferences.3, These criteria classified all 292 patients into two groups (low risk versus high risk) (Figures 3C and 3D). Our multiprotein signature classified many more patients into the "good-prognosis class" (199 vs 80 vs 43, respectively) and fewer patients into the "poor-prognosis class" (93 vs 209 vs 245) as compared with the St-Gallen and NIH classifications. Interestingly, the percentage of metastatic relapse was similar in the low-risk classes (4.5% vs 5% vs 7%, respectively), but greater in the high-risk classes (24% vs 13% vs 11%, respectively). In fact, the low-risk group and the high-risk group defined according to consensus criteria could be further subdivided into prognostic subgroups when the 21-protein signature was applied (data not shown).
VII.3.4) Survival and estrogen receptor status
The same analysis was separately applied to ER-positive and ER-negative tumors. In the ER-positive group (n=422), 35 of 345 patients from the "good-prognosis class" displayed metastatic relapse as compared with 29 of 77 from the "poor-prognosis class" (odds ratio, OR=5.4 [95%CI 2.8 - 9.9], p<0.0001, Fisher exact test). The corresponding 5-year MFS were 90% [95%CI 85.9 - 93.3] and 58% [95%CI 45.4 - 70.6], respectively (p<0.0001, log-rank test) (data not shown). The same trend was observed, although not significant (p=0.21, log-rank test), for the 129 ER-negative tumors with 5-year MFS of 91% [95%CI 76.0 - 100.0] and 66% [95%CI 56.0 - 75.1], respectively.
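The 5-year MFS figures reported in these sections are Kaplan-Meier estimates. As a reminder of how such an estimate is built from follow-up times and censoring indicators, here is a minimal sketch on invented toy data (the six follow-up times below are hypothetical and unrelated to the study cohort; the function names are also hypothetical).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    times  : follow-up in months from diagnosis
    events : True if metastasis was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = sum(1 for tt, e in data if tt == t and e)
        n_leaving = sum(1 for tt, _ in data if tt == t)
        if n_events:
            # Patients censored at time t still count as at risk at t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
        i += n_leaving
    return curve


def survival_at(curve, t):
    """Estimated probability of remaining metastasis-free up to time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up data (months) for six patients
toy_times = [12, 24, 24, 36, 60, 72]
toy_events = [True, False, True, True, False, False]
toy_curve = kaplan_meier(toy_times, toy_events)
print(survival_at(toy_curve, 60))  # estimated 5-year MFS on the toy data
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the estimate differs from the crude proportion of relapse-free patients.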
VII.3.5) Survival and adjuvant systemic therapy
Since the occurrence of metastatic relapse may be influenced by the delivery of adjuvant systemic therapy, the classification based on our 21-protein signature was applied to 186 women who received neither chemotherapy nor hormone therapy after local-regional treatment. Importantly, the 21-protein signature successfully predicted prognosis in these patients: 6 metastatic relapses of 119 patients in the "good-prognosis class" and 19 of 67 in the "poor-prognosis class" (odds ratio, OR=7.4 [95%CI 2.6 - 23.9], p<0.0001, Fisher exact test) (Figure 3E). Similar results were observed when we focused on the 133 patients who received adjuvant chemotherapy without hormone therapy. In the "good-prognosis class", 12 of the 58 patients displayed metastatic relapse whereas 33 of 75 experienced metastasis in the "poor-prognosis class" (odds ratio, OR=3 [95%CI 1.3 - 7.2], p=0.006, Fisher exact test) (Figure 3F).
VII.3.6) Uni- and multivariate prognostic analysis
We finally compared the prognostic ability of our molecular grouping of tumors with classical histoclinical factors and individual protein markers. In univariate analysis, the histoclinical factors that correlated with MFS (p<0.05, log-rank test) were pathological tumor size (≤20 mm, >20), tumor grade (SBR I, II, III), number of positive axillary lymph nodes (0, 1-3, ≥4), and peritumoral vascular invasion (negative, positive). Proteins significantly correlated with MFS were BCL2 (p<0.0001), GATA3 (p=0.0006), MIB1 (p<0.0001), ER (p<0.0001), PR (p=0.0007), P53 (p=0.003) and - Catenin (p=0.005) (Table 4).
Table 4. Cox proportional-hazards multivariate analyses in metastasis-free survival (n=552)
CI denotes confidence interval.
The influence on the risk of distant metastasis of our multiprotein-based grouping, adjusted for other prognostic factors, was assessed in multivariate analysis by the Cox proportional hazards model. The parameters entered in the model were dichotomised and included the classification based on the discriminator 21-protein set ("good-prognosis class" and "poor-prognosis class"), age of patients (≤50 years, >50 years), number of positive axillary lymph nodes (0, 1-3, ≥4), pathological tumor size (≤20 mm, >20), tumor grade (SBR I, II, III), estrogen receptor status (negative, positive), progesterone receptor status (negative, positive), peritumoral vascular invasion (negative, positive), chemotherapy (delivery or not), hormone therapy (delivery or not) and each of the proteins (negative, positive) significantly associated with survival in univariate analyses. Results are shown in Table 4. Several independent factors predictive of distant metastasis as first event were identified, including the prognosis signature based on the 21-protein combination, pathological size of tumors, axillary lymph node status (only when dichotomized ≤3 vs >3), Ki67/MIB1 status and delivery of hormone therapy. However, the 21-protein signature was the strongest predictor with a hazard ratio of 2.2 for "poor-prognosis class" patients, compared to "good-prognosis class" patients ([95%CI 1.25 - 3.89], p<0.0001).
References:
1. Tamoxifen for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 1998; 351:1451-67.
2. Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 1998; 352:930-42.
3. Eifel P, Axelson JA, Costa J, et al. National Institutes of Health Consensus Development Conference Statement: adjuvant therapy for breast cancer, November 1-3, 2000. J Natl Cancer Inst 2001; 93:979-89.
4. Goldhirsch A, Glick JH, Gelber RD, Coates AS, Senn HJ. Meeting highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. Seventh International Conference on Adjuvant Therapy of Primary Breast Cancer. J Clin Oncol 2001; 19:3817-27.
5. Leyland-Jones B. Trastuzumab: hopes and realities. Lancet Oncol 2002; 3:137-44.
6. Bertucci F, Viens P, Hingamp P, Nasser V, Houlgatte R, Birnbaum D. Breast cancer revisited using DNA array-based gene expression profiling. Int J Cancer 2003; 103:565-71.
7. Bertucci F, Houlgatte R, Benziane A, et al. Gene expression profiling of primary breast carcinomas using arrays of candidate genes. Hum Mol Genet 2000; 9:2981-2991.
8. Bertucci F, Nasser V, Granjeaud S, et al. Gene expression profiles of poor-prognosis primary breast cancer correlate with survival. Hum Mol Genet 2002; 11:863-72.
9. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000; 406:747-52.
10. Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003; 100:8418-23.
11. Sotiriou C, Neo SY, McShane LM, et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci U S A 2003; 100:10393-8.
12. van de Vijver MJ, He YD, van't Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002; 347:1999-2009.
13. van 't Veer LJ, Dai H, van De Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; 415:530-6.
14. Huang E, Cheng SH, Dressman H, et al. Gene expression predictors of breast cancer outcomes. Lancet 2003; 361:1590-6.
15. Cheng Q, Lau WM, Tay SK, Chew SH, Ho TH, Hui KM. Identification and characterization of genes involved in the carcinogenesis of human squamous cell cervical carcinoma. Int J Cancer 2002; 98:419-26.
16. Hoos A, Cordon-Cardo C. Tissue microarray profiling of cancer specimens and cell lines: opportunities and limitations. Lab Invest 2001; 81:1331-8.
17. Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998; 4:844-7.
18. Richter J, Wagner U, Kononen J, et al. High-throughput tissue microarray analysis of cyclin E gene amplification and overexpression in urinary bladder cancer. Am J Pathol 2000; 157:787-94.
19. Callagy G, Cattaneo E, Daigo Y, et al. Molecular classification of breast carcinomas using tissue microarrays. Diagn Mol Pathol 2003; 12:27-34.
20. Hsu FD, Nielsen TO, Alkushi A, et al. Tissue microarrays are an effective quality assurance tool for diagnostic immunohistochemistry. Mod Pathol 2002; 15:1374-80.
21. Liu CL, Prapong W, Natkunam Y, et al. Software tools for high-throughput analysis and archiving of immunohistochemistry staining data obtained with tissue microarrays. Am J Pathol 2002; 161:1557-65.
22. Korsching E, Packeisen J, Agelopoulos K, et al. Cytogenetic alterations and cytokeratin expression patterns in breast cancer: integrating a new model of breast differentiation into cytogenetic pathways of breast carcinogenesis. Lab Invest 2002; 82:1525-33.
23. Alkushi A, Irving J, Hsu F, et al. Immunoprofile of cervical and endometrial adenocarcinomas using a tissue microarray. Virchows Arch 2003; 442:271-7.
24. Nielsen TO, Hsu FD, O'Connell JX, et al. Tissue Microarray Validation of Epidermal Growth Factor Receptor and SALL2 in Synovial Sarcoma with Comparison to Tumors of Similar Histology. Am J Pathol 2003; 163:1449-56.
25. Ginestier C, Charaffe-Jauffret E, Bertucci F, et al. Interest and limitations of tissue-microarrays for validation of breast tumor markers selected upon cDNA array analysis. Am J Pathol 2002; 161:1223-1233.
26. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998; 95:14863-8.
27. Conte N, Delaval B, Ginestier C, et al. The TACC1-chTOG-Aurora A protein complex in breast cancer. Oncogene in press.
28. Giet R, McLean D, Descamps S, et al. Drosophila Aurora A kinase is required to localize D-TACC to centrosomes and to regulate astral microtubules. J Cell Biol 2002; 156:437-51.
29. Droufakou S, Deshmane V, Roylance R, Hanby A, Tomlinson I, Hart IR. Multiple ways of silencing E-cadherin gene expression in lobular carcinoma of the breast. Int J Cancer 2001; 92:404-8.
30. Teixeira C, Reed JC, Pratt MA. Estrogen promotes chemotherapeutic drug resistance by a mechanism involving Bcl-2 proto-oncogene expression in human breast cancer cells. Cancer Res 1995; 55:3902-7.
31. Menard S, Fortis S, Castiglioni F, Agresti R, Balsari A. HER2 as a prognostic factor in breast cancer. Oncology 2001; 61:67-72.
32. Fisher ER, Osborne CK, McGuire WL, et al. Correlation of primary breast cancer histopathology and estrogen receptor content. Breast Cancer Res Treat 1981; 1:37-41.
33. Ginestier C, Bardou VJ, Popovici C, et al. Loss of FHIT protein expression is a marker of adverse evolution in good prognosis localized breast cancer. Int J Cancer 2003; 107:854-62.
34. Hui R, Cornish AL, McClelland RA, et al. Cyclin D1 and estrogen receptor messenger RNA levels are positively correlated in primary breast cancer. Clin Cancer Res 1996; 2:923-8.
35. Paredes J, Milanezi F, Reis-Filho JS, Leitao D, Athanazio D, Schmitt F. Aberrant P-cadherin expression: is it associated with estrogen-independent growth in breast cancer? Pathol Res Pract 2002; 198:795-801.
36. Lakhani SR, Chaggar R, Davies S, et al. Genetic alterations in 'normal' luminal and myoepithelial cells of the breast. J Pathol 1999; 189:496-503.
37. Dontu G, Al-Hajj M, Abdallah WM, Clarke MF, Wicha MS. Stem cells in normal breast development and breast cancer. Cell Prolif 2003; 36 Suppl 1:59-72.
38. Lakhani SR, O'Hare MJ. The mammary myoepithelial cell--Cinderella or ugly sister? Breast Cancer Res 2001; 3:1-4.
39. Boecker W, Buerger H. Evidence of progenitor cells of glandular and myoepithelial cell lineages in the human adult female breast epithelium: a new progenitor (adult stem) cell concept. Cell Prolif 2003; 36 Suppl 1:73-84.
40. Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, Clarke MF. Prospective identification of tumorigenic breast cancer cells. Proc Natl Acad Sci U S A 2003; 100:3983-8.
41. van de Rijn M, Perou CM, Tibshirani R, et al. Expression of cytokeratins 17 and 5 identifies a group of breast carcinomas with poor clinical outcome. Am J Pathol 2002; 161:1991-6.
42. Brazma A, Vilo J. Gene expression data analysis. FEBS Lett 2000; 480:17-24.

Claims

1) A method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues or cells, said pool comprising all or part, for example one, two, three or more of a protein set comprising: Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3, Cytokeratin 6, Cytokeratin 18, Ang1, AuroraB, BCRP1, CathepsinD, CD10, CD44, CK14, Cox2, FGF2, GATA4, Hif1a, MMP9, MTA1, NM23, NRG1a, NRG1beta, P27, Parkin, PLAU, S100, SCRIBBLE, Smooth Muscle Actin, THBS1, TIMP1.
2) A method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of proteins in breast tissues or cells, said pool comprising all or part, for example one, two, three or more of a protein set comprising: Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
3) A method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising a protein set comprising: Afadin, Aurora A, a-Catenin, BCL2, Cyclin D1, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC2, TACC3.
4) The method according to claims 1 to 3 wherein the pool comprises a protein set comprising: Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cyclin D1, Cyclin E, Cytokeratin 5/6, Cytokeratin 8/18, E-Cadherin, EGFR, ERBB2, ERBB3, ERBB4, Estrogen receptor, FGFR1, FHIT, GATA3, Ki67, Mucin 1, P53, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3.
5) The method according to claim 1 to 4 comprising the detection of overexpression of the following proteins: EGFR, P53, Ki67, FGFR1, ERBB2, ERBB3, ERBB4, Cyclin D1, Cyclin E, Cytokeratin 5/6.
6) The method according to claim 1 to 5 comprising the detection of underexpression of the following proteins: Estrogen Receptor, FHIT, GATA3, Mucin 1, P-Cadherin, Progesterone receptor, TACC1, TACC2, TACC3, Afadin, Aurora A, a-Catenin, b-Catenin, BCL2, Cytokeratin 8/18, E-Cadherin.
7) A protein library useful for the molecular characterization of histopathologic features of breast disease comprising or corresponding to a pool of protein sequences, over or under expressed, in breast tissue or cells, said pool corresponding to the protein defined in any of claims 1 to 6.
8) A protein library according to Claim 7 immobilized on a solid support.
9) A protein library according to claim 7 or 8 wherein the support is selected from the group comprising nylon membrane, nitrocellulose membrane, polyvinylidene difluoride, glass slide, glass beads, polystyrene plates, membranes on glass support, silicon chip or gold chip.
10) A method for analyzing differential protein expression associated with histopathologic features of breast disease comprising the detection of the overexpression or underexpression of a pool of protein in breast tissues comprising: a) obtaining breast tissue cells from a patient, and b) measuring in the tissue cells obtained in step (a) over or underexpression of proteins of a library according to any of Claims 7 to 9.
11) The method according to Claim 10 wherein said proteins are directly or indirectly labeled before reaction step (b).
12) The method according to claim 11 wherein the label is selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.
13) The method according to claim 12 wherein one or more specific label are used for each protein of a library according to any of Claims 7 to 9.
14) The method according to any of claims 10 to 13, wherein said measuring of over or under expression of proteins is carried out on tissue microarray.
15) The method according to any of claims 10 to 14, wherein the measuring of over or under expression of protein is carried out by ImmunoHistoChemistry (IHC) technologies.
16) A method according to claim 10 wherein the detection of over or under expression of the pool of protein is alternatively carried out on breast tumor cell lines.
17) The method according to any of claims 10 to 16 further comprising a) obtaining a control sample b) measuring in the control sample obtained in step (a) expression level of each protein corresponding to library according to any of Claims 7 to 9 c) comparing expression level of each protein with the level of equivalent protein in a tissue sample according to claim 10.
18) A method according to any of claims 1 to 6 or 10 to 17, for detecting, diagnosing, staging, monitoring, predicting, preventing conditions associated with breast cancer.
19) The method according to any of claims 1 to 6 or 10 to 17 for predicting clinical outcome of breast cancer.
20) The method according to any of claims 1 to 6 or 10 to 17 for predicting occurrence of metastatic relapse.
21) The method according to claim 18 for determining the stage or aggressiveness of a breast cancer.
22) A method according to any of claims 1 to 6 or 10 to 21 wherein the breast tissue sample is obtained from a patient regardless of whether said patient has received a neo adjuvant or an adjuvant therapy.
23) The method according to claim 22 wherein the breast tissue sample is obtained from a patient who has received an adjuvant therapy.
24) The method according to claim 22 wherein the breast tissue sample is obtained from a patient who has not received an adjuvant therapy.
25) A method for treating a patient with a breast cancer comprising (i) the implementation of a method according to any of claims 1 to 6 or 10 to 24 on a sample from said patient, and (ii) determining a treatment for this patient based on the analysis of differential protein expression profile obtained with said method.
26) A method for analyzing differential protein expression associated with histopathologic features of breast disease according to claim 1 to 6 wherein the detection of the overexpression or underexpression of said pool of protein in breast tissues comprises the detection of the overexpression or underexpression of nucleic acids coding for said proteins.
27) A nucleic acids library useful for the molecular characterization of histopathologic features of breast disease comprising nucleic acids according to claim 26.
EP05702409A 2004-01-16 2005-01-17 Protein expression profiling and breast cancer prognosis Withdrawn EP1704416A2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US53741204P 2004-01-16 2004-01-16
US3629805A 2005-01-14 2005-01-14
PCT/IB2005/000261 WO2005071419A2 (en) 2004-01-16 2005-01-17 Protein expression profiling and breast cancer prognosis

Publications (1)

Publication Number Publication Date
EP1704416A2 true EP1704416A2 (en) 2006-09-27

Family

ID=34810500

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05702409A Withdrawn EP1704416A2 (en) 2004-01-16 2005-01-17 Protein expression profiling and breast cancer prognosis

Country Status (2)

Country Link
EP (1) EP1704416A2 (en)
WO (1) WO2005071419A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110643571A (en) * 2019-10-22 2020-01-03 康妍葆(北京)干细胞科技有限公司 Application of human keratin 6A in stem cell culture and product

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BRPI0515562A (en) * 2004-09-22 2008-07-29 Tripath Imaging Inc methods and compositions for assessing breast cancer prognosis
US20070134687A1 (en) * 2005-09-12 2007-06-14 Aurelium Biopharma Inc. Focused microarray and methods of diagnosing cancer
US8119655B2 (en) 2005-10-07 2012-02-21 Takeda Pharmaceutical Company Limited Kinase inhibitors
EA200970361A1 (en) 2006-10-09 2010-02-26 Такеда Фармасьютикал Компани Лимитед KINASE INHIBITORS
WO2009103790A2 (en) 2008-02-21 2009-08-27 Universite Libre De Bruxelles Method and kit for the detection of genes associated with pik3ca mutation and involved in pi3k/akt pathway activation in the er-positive and her2-positive subtypes with clinical implications
WO2012078365A2 (en) * 2010-12-10 2012-06-14 Nuclea Biotechnologies, Inc. Biomarkers for prediction of breast cancer
ES2609249T3 (en) * 2011-01-11 2017-04-19 Inserm - Institut National De La Santé Et De La Recherche Médicale Methods of predicting the outcome of a cancer in a patient analyzing gene expression
JP6008305B2 (en) 2012-05-02 2016-10-19 公益財団法人がん研究会 Small compounds targeting TACC3
KR101882755B1 (en) * 2015-02-27 2018-07-27 연세대학교 산학협력단 Apparatus and method for evaluating the prognosis and the need for chemotherapy in the treatment of breast cancer
EP3797173A2 (en) * 2018-05-21 2021-03-31 Nanostring Technologies, Inc. Molecular gene signatures and methods of using same

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002253878A1 (en) * 2001-01-25 2002-08-06 Gene Logic, Inc. Gene expression profiles in breast tissue

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2005071419A3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110643571A (en) * 2019-10-22 2020-01-03 康妍葆(北京)干细胞科技有限公司 Application of human keratin 6A in stem cell culture and product
CN110643571B (en) * 2019-10-22 2021-07-27 康妍葆(北京)干细胞科技有限公司 Application of human keratin 6A in stem cell culture and product

Also Published As

Publication number Publication date
WO2005071419A3 (en) 2006-02-23
WO2005071419A2 (en) 2005-08-04

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060712

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)

17Q First examination report despatched

Effective date: 20080715

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110427