WO2012145607A2

WO2012145607A2 - Specific copy number aberrations as predictors of breast cancer

Info

Publication number: WO2012145607A2
Application number: PCT/US2012/034421
Authority: WO
Inventors: Patricia THOMPSON; Melissa BONDY; Gordon Mills; Li Zhang; Spyros VACHIDIS
Original assignee: Board Of Regents, The University Of Texas System; Arizona Board Of Regents On Behalf Of University Of Arizona
Priority date: 2011-04-20
Filing date: 2012-04-20
Publication date: 2012-10-26
Also published as: WO2012145607A3

Abstract

Methods and compositions for the prognosis and classification of cancer, especially breast cancer, are provided. For example, in certain aspects methods for cancer prognosis using copy number analysis of selected biomarkers are described. In further aspects, copy number information may be used to predict metastasis risk and help treatment decision-making.

Description

DESCRIPTION

SPECIFIC COPY NUMBER ABERRATIONS AS PREDICTORS OF BREAST CANCER

BACKGROUND OF THE INVENTION

[0001] This application claims priority to U.S. Application No. 61/477,529 filed on April 20, 2011, the entire disclosure of which is specifically incorporated herein by reference in its entirety without disclaimer.

[0002] This invention was made with government support under P50CA116199 and R01CA089608 awarded by the National Institutes of Health. The government has certain rights in the invention.

1. Field of the Invention

[0003] The present invention relates generally to the fields of oncology, molecular biology, cell biology, and cancer. More particularly, it concerns cancer prognosis or treatment using molecular markers.

2. Description of Related Art

[0004] Cancer represents the phenotypic end-point of multiple genetic lesions that endow cells with a full range of biological properties required for tumorigenesis. Indeed, a hallmark genomic feature of many cancers, including, for example, B cell cancer, lung cancer, breast cancer, ovarian cancer, pancreatic cancer, and colon cancer, is the presence of numerous complex chromosome structural aberrations, including non-reciprocal translocations, amplifications, and deletions. A better understanding of the fundamental biology of chromosome structural aberrations in cancer may not only improve prognostication but also offer new individualized therapeutic options.

[0005] However, despite many attempts to establish pre-treatment prognostic markers to understand the clinical biology of patients with cancer, validated clinical or biomarker parameters are still lacking in many respects. Therefore, there remains a need to discover novel prognostic markers for cancer patients, especially breast cancer patients.

SUMMARY OF THE INVENTION

[0006] Certain aspects of the invention are based, in part, on the discovery of biomarker copy number alterations that predict prognosis (e.g., risk of recurrence) in breast cancer patients. The biomarkers significantly improved the performance of current prognostication, independent of currently used clinical tumor subtyping methods. The application of these biomarkers may improve prognostication for patient management and improve patient classification for treatment decision-making, including use of chemotherapy, selection of specific treatment agents, and intensity and modes of surveillance for disease recurrence tailored to the patient's tumor genotype.

[0007] Therefore, certain aspects of the present invention overcome major deficiencies in the art by providing novel methods for providing a breast cancer prognosis in a subject determined to have a breast cancer. The prognosis may be poor or favorable. In certain aspects of the invention, a poor prognosis may indicate a high risk of recurrence, poor survival, a higher chance of cancer progression or metastasis, or a low response to or a poor clinical outcome after a conventional therapy such as surgery, chemotherapy, and/or radiation therapy. In other aspects, a favorable prognosis may comprise a low risk of recurrence, a higher chance of survival, a lower chance of cancer progression or metastasis, or a high response to or a favorable clinical outcome after a conventional therapy. In a particular aspect, the favorable prognosis may comprise a higher chance of survival as compared with a reference level. The poor or favorable prognosis may be determined as compared to a reference level. Such a reference level may be obtained from an individual or a cohort group of subjects, such as a mean or average level of survival or risk of recurrence.

[0008] The method may comprise determining the number of copies of one or more biomarkers in a breast cancer sample of the patient. For example, the biomarkers may comprise one or more genomic regions selected from the group consisting of 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33. In a particular aspect, the biomarkers may comprise four genomic regions including 8p22, 11q13.5, 22q11.1-q11.21, and Xp21.1-p21.2.

[0009] In an alternative aspect, the biomarkers may comprise one or more genomic regions selected from the group consisting of 8p22, 11q13.5, 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, 17q21.33, 16p11.2, 10p13, 12p13, 20q13, and Xq28. In a particular aspect, the biomarkers may comprise at least two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, or 35 genomic regions, or any intermediate range thereof.

[0010] Using prognostic variable information including the number of copies so determined as compared to a reference level, a prognosis may be provided for the patient. The copy number information in the breast cancer sample may be compared to a reference level. For example, a poor prognosis may be provided if the breast cancer sample has a decrease in the number of copies of 8p22, 22q11.1-q11.21, and Xp21.1-p21.2 and an increase in the number of copies of 11q13.5, compared to the number of copies in a control sample. In other aspects, a favorable prognosis may be provided if the breast cancer sample has an increase in the number of copies of 8p22, 22q11.1-q11.21, and Xp21.1-p21.2 and a decrease in the number of copies of 11q13.5, compared to the number of copies in a control sample.
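The four-region rule of paragraph [0010] can be expressed as a lookup against per-region calls. The following is a minimal Python sketch, assuming each region has already been called as -1 (loss), 0 (no change), or +1 (gain) relative to a control sample. The function name `four_marker_prognosis`, the region key strings, and the `"indeterminate"` label for tumors that match neither pattern are illustrative assumptions, not part of the described method.

```python
from typing import Mapping

# Calls are coded -1 (loss), 0 (no change), +1 (gain) relative to a control sample.
# Pattern from paragraph [0010]: losses at three regions plus a gain at 11q13.5 -> poor.
POOR_PATTERN = {
    "8p22": -1,
    "22q11.1-q11.21": -1,
    "Xp21.1-p21.2": -1,
    "11q13.5": +1,
}


def four_marker_prognosis(calls: Mapping[str, int]) -> str:
    """Classify a tumor with a hypothetical encoding of the four-region rule.

    Regions missing from `calls` are treated as no change (0).
    """
    observed = [calls.get(region, 0) for region in POOR_PATTERN]
    poor = list(POOR_PATTERN.values())
    if observed == poor:
        return "poor"
    # The favorable pattern is the exact mirror image of the poor pattern.
    if observed == [-direction for direction in poor]:
        return "favorable"
    # Partial patterns are not addressed by [0010]; this label is an assumption.
    return "indeterminate"


print(four_marker_prognosis(
    {"8p22": -1, "22q11.1-q11.21": -1, "Xp21.1-p21.2": -1, "11q13.5": 1}
))
```

Treating an absent region as "no change" keeps the sketch tolerant of incomplete panels; a stricter implementation might instead reject samples with missing calls.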

[0011] In further aspects, a poor prognosis may be provided if the breast cancer sample has two or more high-risk events, a high-risk event being a decrease in the number of copies of 8p22, 22q11.1-q11.21, Xp21.1-p21.2, 1p12, 12q13.13, or 13q12.3, or an increase in the number of copies of 10p13, 11q13.5, 12p13, 16p11.2, 20q13, Xq28, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, or 17q21.33, compared to a reference level or the number of copies in a control sample. A favorable prognosis may be provided if the breast cancer sample has two or more low-risk events, a low-risk event being an increase in the number of copies of 8p22, 22q11.1-q11.21, Xp21.1-p21.2, 1p12, 12q13.13, or 13q12.3, or a decrease in the number of copies of 10p13, 11q13.5, 12p13, 16p11.2, 20q13, Xq28, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, or 17q21.33, compared to a reference level or the number of copies in a control sample.

[0012] To improve the accuracy of a prognosis, a prognostic value may be provided based on the copy number determination. For example, the prognostic value may be summarized by combining the coefficients of a Cox proportional hazards regression model for each biomarker. In a particular aspect, a boosting strategy may be used to fit proportional subdistribution hazards models for copy number and recurrence, treating chromosomal segments in a dose-specific fashion (-1 (loss), 0 (no change), and +1 (gain)). The prognostic values of patients may be used to define at least three groups: intermediate risk (tumors that show no copy number changes in the selected biomarkers; risk index = 0), high risk (tumors with risk index > 0), and low risk (tumors with risk index < 0).
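The event counting of [0011] and the risk index of [0012] can be sketched together. In this minimal Python sketch the risk index is taken to be a linear predictor, the sum of each region's coefficient times its dose-coded call (-1/0/+1), which is one reading of "combining coefficients". The coefficients in `HYPOTHETICAL_BETA` are placeholders with a fixed magnitude of 0.5 and a sign chosen so that a high-risk event raises the index; they are not fitted values, and the text gives none. The X-chromosome gain locus is keyed as Xq28, following [0009].

```python
from typing import Dict, Mapping, Tuple

# High-risk direction per region, following paragraph [0011]:
# -1 = loss is the high-risk event, +1 = gain is the high-risk event.
RISK_DIRECTION: Dict[str, int] = {
    "8p22": -1, "22q11.1-q11.21": -1, "Xp21.1-p21.2": -1,
    "1p12": -1, "12q13.13": -1, "13q12.3": -1,
    "10p13": 1, "11q13.5": 1, "12p13": 1, "16p11.2": 1, "20q13": 1,
    "Xq28": 1, "2p11.1": 1, "3q13.12": 1, "10p11.21": 1, "10q23.1": 1,
    "11p15": 1, "14q13.2-13.3": 1, "17q21.33": 1,
}

# HYPOTHETICAL coefficients: magnitude 0.5, sign taken from RISK_DIRECTION.
# Real coefficients would come from the fitted hazards model.
HYPOTHETICAL_BETA: Dict[str, float] = {r: 0.5 * d for r, d in RISK_DIRECTION.items()}


def count_events(calls: Mapping[str, int]) -> Tuple[int, int]:
    """Return (high-risk events, low-risk events) for calls coded -1/0/+1."""
    high = sum(1 for r, d in RISK_DIRECTION.items() if calls.get(r, 0) == d)
    low = sum(1 for r, d in RISK_DIRECTION.items() if calls.get(r, 0) == -d)
    return high, low


def risk_index(calls: Mapping[str, int], betas: Mapping[str, float]) -> float:
    """Linear predictor: sum of coefficient times dose-coded copy number call."""
    return sum(beta * calls.get(region, 0) for region, beta in betas.items())


def risk_group(index: float, tol: float = 1e-9) -> str:
    """Map a risk index to the three groups described in [0012]."""
    if index > tol:
        return "high"
    if index < -tol:
        return "low"
    return "intermediate"


calls = {"8p22": -1, "11q13.5": 1}  # loss at 8p22, gain at 11q13.5
print(count_events(calls), risk_group(risk_index(calls, HYPOTHETICAL_BETA)))
```

With equal-magnitude placeholder coefficients, one high-risk and one low-risk event cancel to an index of 0; with fitted coefficients such exact cancellation is unlikely, and [0012] reserves the intermediate group for tumors with no changes at the selected regions.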

[0013] In further aspects, the biomarker information may be combined with other prognostic variable information. The other information may include clinical information, such as, but not limited to, patient age, tumor stage, tumor size, tumor subtypes, lymph node status, nuclear grade, ER status, progesterone receptor (PR) status, or primary treatment status, to generate a prognosis.

[0014] In certain aspects, the breast cancer is an early-stage breast cancer. For example, the breast cancer may be at stage 0, I, or II. In a further aspect, the method may comprise determining the subtype of the breast cancer. The breast cancer may be determined to be Her2-positive (Her2⁺), estrogen receptor-negative (ER⁻), high Ki67 (Ki67^high), ER⁺/Ki67^high, or triple negative (ER⁻/PR⁻/Her2⁻).

[0015] For example, the breast cancer may be determined to be Her2⁺. For the prognosis of a breast cancer determined to be Her2⁺, the method may comprise determining the copy numbers of biomarkers comprising the genomic region 17q21.23. The biomarkers may also comprise one or more additional markers at sites neighboring 17q21.23. The method may further comprise providing a poor prognosis if the breast cancer sample has a decrease in the number of copies of 17q21.23, compared to the number of copies in a control sample. In other aspects, a favorable prognosis may be provided if the breast cancer sample has an increase in the number of copies of 17q21.23, compared to the number of copies in a control sample.

[0016] For predicting risk of metastasis to specified tissue sites, copy number information may be determined or obtained. For example, there may be provided methods of predicting risk of metastasis to the brain in a patient determined to have a breast cancer. The method may comprise determining the number of copies of one or more biomarkers in a breast cancer sample of the patient. The biomarkers may comprise one or more genomic regions selected from the group consisting of 3q29, 6p22.3, 6p23, 10p14, 10p13, 11p13, 11q13.1, 11q13.5, 11q21, 13q12.11, 13q12.12, 14q12, 14q21.1, 19q13.11, 19q13.33, 19q13.41, 19q13.42, 19q13.43, Xq28, 4p12, 4q12, 4q21.1, 4q21.21, 5q14.2, 5q14.3, 5q15, 5q21.1, 5q21.2, 5q21.3, 5q22.1, 5q22.2, 5q22.3, 5q23.1, 5q23.2, 5q23.3, 8p22, 11p12, 12q13.13, 14q32.12, 14q32.13, 14q32.2, 14q32.31, 17q23.1, and 17q23.2. In particular, the biomarkers may comprise one or more genomic regions selected from the group consisting of 11q21, 13q12.11, 13q12.12, 14q21.1, 19q13.33, 19q13.41, 19q13.42, 19q13.43, 4q21.1, 4q21.21, 11p12, 17q23.1, and 17q23.2. In a more particular aspect, the biomarkers may comprise two genomic regions including 3q29 and 10p13.

[0017] In a further aspect, the method may comprise predicting risk of metastasis to the brain using prognostic variable information comprising the number of copies so determined. For example, a high risk of metastasis to the brain may be predicted if the breast cancer sample has an increase in the number of copies of one or more genomic regions selected from the group consisting of 3q29, 6p22.3, 6p23, 10p14, 10p13, 11p13, 11q13.1, 11q13.5, 11q21, 13q12.11, 13q12.12, 14q12, 14q21.1, 19q13.11, 19q13.33, 19q13.41, 19q13.42, 19q13.43, and Xq28, or if the breast cancer sample has a decrease in the number of copies of one or more genomic regions selected from the group consisting of 4p12, 4q12, 4q21.1, 4q21.21, 5q14.2, 5q14.3, 5q15, 5q21.1, 5q21.2, 5q21.3, 5q22.1, 5q22.2, 5q22.3, 5q23.1, 5q23.2, 5q23.3, 8p22, 11p12, 12q13.13, 14q32.12, 14q32.13, 14q32.2, 14q32.31, 17q23.1, and 17q23.2, compared to the number of copies in a control sample. In a particular aspect, the method may comprise predicting a high risk of metastasis to the brain if the breast cancer sample has an increase in the number of copies of 3q29 and 10p13, compared to the number of copies in a control sample. The method may further comprise predicting a low risk of metastasis to the brain if the breast cancer sample has a decrease in the number of copies of 3q29 and 10p13, compared to the number of copies in a control sample.
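Paragraph [0017] describes a broad rule over two region lists and a narrow rule using only 3q29 and 10p13. A minimal Python sketch of both follows, assuming the same -1/0/+1 calls as elsewhere. The function names and the `"not classified"` label for mixed or unchanged pairs are hypothetical. The first gain region is keyed as 3q29 to match the two-marker rule, because chromosome 3 has no p29 band in standard cytogenetic nomenclature.

```python
from typing import Mapping

# Region lists transcribed from paragraph [0017]; calls are coded -1/0/+1.
BRAIN_GAIN_REGIONS = (
    "3q29", "6p22.3", "6p23", "10p14", "10p13", "11p13", "11q13.1", "11q13.5",
    "11q21", "13q12.11", "13q12.12", "14q12", "14q21.1", "19q13.11", "19q13.33",
    "19q13.41", "19q13.42", "19q13.43", "Xq28",
)
BRAIN_LOSS_REGIONS = (
    "4p12", "4q12", "4q21.1", "4q21.21", "5q14.2", "5q14.3", "5q15", "5q21.1",
    "5q21.2", "5q21.3", "5q22.1", "5q22.2", "5q22.3", "5q23.1", "5q23.2",
    "5q23.3", "8p22", "11p12", "12q13.13", "14q32.12", "14q32.13", "14q32.2",
    "14q32.31", "17q23.1", "17q23.2",
)


def any_brain_risk_event(calls: Mapping[str, int]) -> bool:
    """Broad rule: a gain in any gain region or a loss in any loss region."""
    return any(calls.get(r, 0) == 1 for r in BRAIN_GAIN_REGIONS) or any(
        calls.get(r, 0) == -1 for r in BRAIN_LOSS_REGIONS
    )


def two_marker_brain_risk(calls: Mapping[str, int]) -> str:
    """Narrow rule using only 3q29 and 10p13."""
    pair = (calls.get("3q29", 0), calls.get("10p13", 0))
    if pair == (1, 1):
        return "high"
    if pair == (-1, -1):
        return "low"
    return "not classified"  # mixed or unchanged pairs are not addressed in [0017]


print(two_marker_brain_risk({"3q29": 1, "10p13": 1}), any_brain_risk_event({"5q15": -1}))
```

The broad rule is sensitive by construction (any single event flags the tumor), whereas the two-marker rule only classifies tumors whose 3q29 and 10p13 calls agree.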

[0018] In certain aspects, there may also be provided methods of providing a prognosis for a patient determined to have a breast cancer by mammography. The method may be used to improve breast cancer screening and to differentiate indolent, low-risk, lymph node-negative tumors detected by mammography from more aggressive tumors in early detection. The method may comprise determining the number of copies of one or more biomarkers in a breast cancer sample of a patient determined to have a breast cancer by mammography. The biomarkers may comprise at least five genomic regions including 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32.

[0019] A breast cancer prognosis may be provided using prognostic variable information including the number of copies so determined. For example, a poor prognosis may be provided if the breast cancer sample has an increase in the number of copies of 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32, compared to the number of copies in a control sample. In other aspects, a favorable prognosis may be provided if the breast cancer sample has a decrease in the number of copies of 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32, compared to the number of copies in a control sample. In a particular aspect, the prognosis predicts disease-free survival or a response to chemotherapy. For example, the method may further comprise developing a treatment plan comprising chemotherapy for the patient if a poor prognosis is provided, which indicates a more aggressive tumor than the indolent tumors usually detected by mammography.
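The five-region rule for mammography-detected tumors in [0018]-[0019] follows the same pattern as the other rules. This minimal Python sketch assumes -1/0/+1 calls per region; the function name and the `"not classified"` label for combinations the text does not specify are hypothetical.

```python
from typing import Mapping

# Five regions from paragraphs [0018]-[0019]; calls are coded -1/0/+1.
SCREEN_REGIONS = ("2p11.2", "3q27.1-29", "8q24.13", "11p13", "20q13.13-13.32")


def screen_detected_prognosis(calls: Mapping[str, int]) -> str:
    """Hypothetical encoding of the five-region rule for screen-detected tumors."""
    values = [calls.get(region, 0) for region in SCREEN_REGIONS]
    if all(v == 1 for v in values):
        return "poor"  # gain at all five regions
    if all(v == -1 for v in values):
        return "favorable"  # loss at all five regions
    return "not classified"  # other combinations are not specified in the text


print(screen_detected_prognosis({r: 1 for r in SCREEN_REGIONS}))
```

Per [0019], a "poor" result would be the trigger for considering a treatment plan that includes chemotherapy; the sketch stops at the classification step.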

[0020] The copy number information may be combined with other prognostic variable information, such as clinical information, to provide a prognosis. Non-limiting examples of clinical information include patient age, tumor stage, tumor size, tumor subtypes, lymph node status, nuclear grade, ER status, progesterone receptor (PR) status, or primary treatment status. In particular aspects, the prognostic variable information further comprises the clinical information of patient age, tumor size, and lymph node status.

[0021] In further aspects, there may be provided methods of treating a patient determined to have a breast cancer by mammography. The methods may comprise obtaining copy number information of one or more biomarkers in a breast cancer sample in a patient determined to have a breast cancer by mammography. For example, the biomarkers may comprise five genomic regions including 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32. The methods may further comprise treating the patient based on the copy number information. For example, the methods may comprise administering chemotherapy if the breast cancer sample has an increase in the number of copies of the five genomic regions, or administering a treatment other than chemotherapy if the breast cancer sample has a decrease in the number of copies of the five genomic regions, compared to the number of copies in a control sample or a reference level.

[0022] In certain aspects, a reference level used for comparison may be a reference level of copy numbers from a control sample, such as non-cancerous tissue from the same subject. Alternatively, the reference level may be a reference level of copy numbers from a control sample of a different subject or group of subjects, who may not have cancer. For example, the reference level of copy numbers may be a copy number level obtained from tissue of a subject or group of subjects without cancer, or a copy number level obtained from noncancerous tissue of a subject or group of subjects with cancer. The reference level may be a single value or a range of values. The reference level of copy numbers can be determined using any method known to those of ordinary skill in the art. In some embodiments, the reference level is an average level of copy numbers determined from a cohort of subjects with or without cancer. The reference level may also be depicted graphically as an area on a graph.
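Paragraph [0022] allows the reference level to be a cohort average or a range of values. One simple way to realize a range, sketched here in Python, is the cohort mean plus or minus `k` standard deviations, with a sample called as an increase or decrease when it falls outside that range. The choice of mean ± k·SD, the default `k = 2.0`, and the control values below are all illustrative assumptions; the text does not say how the range is set.

```python
import statistics
from typing import Sequence, Tuple


def reference_range(control_values: Sequence[float], k: float = 2.0) -> Tuple[float, float]:
    """Reference range as mean +/- k standard deviations of a control cohort.

    `k` is an illustrative choice. statistics.stdev needs at least two values.
    """
    mean = statistics.fmean(control_values)
    sd = statistics.stdev(control_values)
    return mean - k * sd, mean + k * sd


def call_against_reference(value: float, ref: Tuple[float, float]) -> int:
    """Return +1 (increase), -1 (decrease), or 0 (within the reference range)."""
    low, high = ref
    if value > high:
        return 1
    if value < low:
        return -1
    return 0


# Hypothetical copy number estimates for one region in non-cancerous control tissue.
controls = [1.9, 2.0, 2.1, 2.0, 1.95, 2.05]
ref = reference_range(controls)
print([call_against_reference(v, ref) for v in (1.0, 2.0, 3.0)])
```

The -1/0/+1 output matches the dose coding used for the prognostic rules, so these calls could feed directly into any of the classification sketches.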
[0023] In further aspects, there may be provided methods that comprise obtaining a sample from the subject. For assessing biomarker copy numbers, the sample may be serum, saliva, a biopsy, or a needle aspirate. In a further aspect, the sample may be paraffin-embedded or frozen. In a particular aspect, the sample may be preserved, particularly as a formalin-fixed, paraffin-embedded (FFPE) sample.

[0024] The method may further comprise isolating nucleic acids from the subject's cancer. In particular aspects, the method may comprise assaying nucleic acids of the subject's cancer, in particular for one or more of the biomarkers described above.

[0025] The skilled artisan will understand that any method known in the art for assessing copy numbers can be used in the present methods and compositions. The testing to assess copy number of the nucleic acids may include, but is not limited to, in situ hybridization (such as fluorescent in situ hybridization (FISH)), polymerase chain reaction (PCR) analysis, reverse transcription polymerase chain reaction (RT-PCR) analysis, Southern blotting, Northern blotting, array-based methods (such as single nucleotide polymorphism (SNP)-based analysis or comparative genomic hybridization), and/or sequencing methods. In a particular aspect, the determination may comprise the use of an array, such as a molecular inversion probe (MIP) array.

[0026] In an alternative aspect, there may be provided methods that comprise analyzing a predetermined copy number profile. The predetermined copy number profile may be obtained from a lab, a service provider, or a technician.

[0027] In a further aspect, the method may comprise recording the copy number determination in a tangible medium. For example, such a tangible medium may be a computer-readable medium, such as a computer-readable disk, a solid state memory device, an optical storage device, or the like; more specifically, a storage device such as a hard drive, a Compact Disk (CD) drive, a floppy disk drive, a tape drive, a random access memory (RAM), etc.

[0028] Based on the prognosis information, the methods may comprise reporting the copy number determination and/or prognosis to the subject, a health care payer, a physician, an insurance agent, or an electronic system.

[0029] Certain aspects of the methods also comprise methods for treating subjects with a predetermined copy number profile of one or more biomarkers as described above. In further aspects, the methods may comprise prescribing or administering a treatment to the subject: for example, a conventional therapy such as surgery, chemotherapy, and/or radiation therapy if a favorable prognosis is provided, or an alternative treatment other than surgery, chemotherapy, and radiation therapy if a poor prognosis is provided.

[0030] Embodiments discussed in the context of methods and/or compositions of the invention may be employed with respect to any other method or composition described herein. Thus, an embodiment pertaining to one method or composition may be applied to other methods and compositions of the invention as well.

[0031] As used herein the terms "encode" or "encoding" with reference to a nucleic acid are used to make the invention readily understandable by the skilled artisan; however, these terms may be used interchangeably with "comprise" or "comprising" respectively.

[0032] As used herein in the specification, "a" or "an" may mean one or more. As used herein in the claim(s), when used in conjunction with the word "comprising", the words "a" or "an" may mean one or more than one.

[0033] The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or." As used herein "another" may mean at least a second or more.

[0034] Throughout this application, the term "about" is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.

[0035] Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0036] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.

[0037] FIGS. 1A-1C. Time-to-recurrence using data for all breast cancers by the 19 individual copy number imbalances identified in the variable selection process. The black line = no change in copy number, dashes = loss, and dots = gain.

[0038] FIG. 2. Comparison of clinical and marker-based prognostic models. The performance of the clinical, 19-marker, and combined (clinical + marker) prognostic models by C-indices: a) all tumors, b) ER-positive only, c) ER-negative only, d) ER⁺/Ki67^low, e) ER⁺/Ki67^high, f) HER2⁺, and g) TNBC. The clinical model includes age at diagnosis, tumor size, and lymph node status. The subtype model includes tumors classified as ER⁺/Ki67^low, ER⁺/Ki67^high, ERBB2 gene amplified (HER2⁺), and triple negative (TNBC [ER⁻/PR⁻/HER2⁻]). The closed square indicates performance in the training set with the 95% confidence interval for the estimate, and the open square represents performance in the test set.

[0039] FIGS. 3A-3B. Time-to-recurrence for all breast cancers by CNI marker-only risk groups. Risk groups are defined as low-risk CNIs, no copy number variation at the 19 markers (no CNV), and high-risk CNIs: FIG. 3A is the training set and FIG. 3B is the test set.

[0040] FIGS. 4A-4B. Pattern of copy number imbalances and their frequency across risk subtypes. FIG. 4A. Frequency and type of CN imbalances (dark shade = gain and light shade = loss) across the entire genome for the three marker-based risk groups (low-risk CNIs, no CNV, and high-risk CNIs). FIG. 4B. Copy number gain/loss frequencies for the 19 markers for the low- and high-risk CNI-defined groups.

[0041] FIG. 5. Kaplan-Meier analysis of the recurrence probability by marker-only risk categories for ER-negative cases and chemotherapy. Left panel is ER negative (All); middle panel is ER negative (Chemotherapy); and right panel is ER negative (No Chemotherapy).

[0042] FIG. 6. The five panels show the percentage of samples showing gain (darker shade) or loss (lighter shade) for all 971 tumors (top) and individually for each clinical subtype. The horizontal black lines at the top (and bottom) of a panel associated with a particular clinical subtype indicate regions showing statistically significant increase in gain (and loss) frequencies (FDR < 0.01) for this subtype compared to the other subtypes.

[0043] FIG. 7. Time-dependent receiver operating characteristic (ROC) curves with the area under the curve (AUC) for the full models (markers plus clinical covariates) compared to the clinical-only and marker-only models for 5-year (Panels a, c) and 10-year (Panels b, d) recurrence probability for all breast cancers (a, b) and ER-negative cases only (c, d).

[0044] FIG. 8. Kaplan-Meier (KM) curves of the two selected markers.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

[0045] Certain embodiments of the invention relate to determination of a genetic profile and use of the profile in cancer prognosis and personalized treatment. Certain aspects of the present invention are based, in part, on the identification of a genetic profile that is associated with clinical outcome and could therefore serve as a clinical test to predict prognosis in cancer patients, especially breast cancer patients. In particular aspects, the genetic profile could be defined by one or more copy number changes or imbalances, which could significantly improve risk prediction for recurrence, independent of tumor marker-defined subtypes, or prediction of metastasis sites, such as bone or brain. For example, the presence of copy number gains and losses within the same genomic loci that confer opposing effects on tumor behavior may be particularly useful for recurrence risk classification, especially for refining risk among the intrinsically more aggressive tumors (e.g., ER-negative cancers), where a clinical need remains. In further aspects, measurement of tumor genotypes such as copy number changes may improve discrimination between indolent and aggressive screen-detected (e.g., mammogram-detected) tumors and aid patient and physician decision-making about the use of cancer treatments.

[0046] "Prognosis" refers to a prediction of how a patient will progress, and whether there is a chance of recovery. "Cancer prognosis" generally refers to a forecast or prediction of the probable course or outcome of the cancer.
As used herein, cancer prognosis may include the forecast or prediction of any one or more of the following: duration of survival of a patient susceptible to or diagnosed with a cancer, duration of recurrence-free survival, duration of progression free survival of a patient susceptible to or diagnosed with a cancer, response rate in a group of patients susceptible to or diagnosed with a cancer, duration of response in a patient or a group of patients susceptible to or diagnosed with a cancer, and/or likelihood of metastasis or recurrence in a patient susceptible to or diagnosed with a cancer. Prognosis may also include prediction of favorable responses to cancer treatments, such as a conventional cancer therapy like chemotherapy.

[0047] A favorable or poor prognosis may, for example, be assessed in terms of patient survival, likelihood of disease recurrence or disease metastasis. Patient survival, disease recurrence and metastasis may for example be assessed in relation to a defined time point, e.g. at a given number of years after a cancer treatment (e.g. surgery to remove one or more tumors) or after initial diagnosis. In one embodiment, a favorable or poor prognosis may be assessed in terms of overall survival or disease free survival.

[0048] By "subject" or "patient" is meant any single subject for which therapy is desired, including humans, cattle, dogs, guinea pigs, rabbits, chickens, and so on. Also intended to be included as a subject are any subjects involved in clinical research trials not showing any clinical sign of disease, or subjects involved in epidemiological studies, or subjects used as controls.

I. Biomarkers

[0049] In certain aspects, methods and compositions are disclosed that use copy number information of biomarkers for cancer prognosis and treatment optimization. In certain aspects, a set of biomarkers has been identified whose copy number changes may discriminate risk of recurrence among early-stage breast tumors, which is particularly relevant to ER-negative and intrinsically more aggressive tumors. Further, the presence of specific copy number changes may promote or, in some cases, limit tumor spread.

[0050] In certain aspects of the invention, a set of specific alterations (losses or gains of genomic loci) was identified that improves discrimination of outcomes between patients. Copy number gains and losses were identified using high-density molecular inversion probe arrays on stage I/II breast tumors (n=971), and a boosting strategy was applied to fit proportional subdistribution hazards models for copy number and recurrence, treating chromosomal segments in a dose-specific fashion, for example, -1 (loss), 0 (no change), and +1 (gain). The concordance index (C-index) was used to compare prognostic accuracy between a training set (n=728) and a test set (n=243) and across models.
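The C-index in [0050] measures how often a model assigns higher risk to the patient who recurs earlier. The sketch below is plain Harrell's C for right-censored data, written in Python as an illustration of the metric. It is a stand-in: the models described here are subdistribution (competing-risks) hazards models, for which concordance is usually computed with a competing-risks variant, and the text does not specify the exact estimator. For simplicity, pairs with tied observation times are skipped.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored time-to-event data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and an observed time strictly shorter than subject j's. The pair is
    concordant when the higher-risk subject failed first; risk ties count 0.5.
    Pairs with tied times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: years to recurrence or censoring, event flag, model risk score.
print(harrell_c_index([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; comparing the statistic on training and held-out test sets, as in [0050], guards against overfitting the marker selection.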

[0051] Prognostic copy number changes indicating a poor prognosis were identified, including, but not limited to, losses at 1p12, 8p22, 12q13.13, 13q12.3, 22q11, and Xp21 and gains at 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, 17q21.33, 10p13, 11q13.5, 12p13, 16p11.2, 20q13, and Xq28. A model combining these 19 copy number imbalance biomarkers (CNIs) with clinical covariates outperformed the tumor subtypes (C-index, train[test] = 0.72[0.71] ± 0.02 vs. 0.62[0.62] ± 0.02, respectively; p < 10⁻⁶). Surprisingly, the CNI-based high- and low-risk groups showed equally extensive genome instability with features of aggressiveness, but different clinical outcomes.

[0052] A copy number change-based analysis may significantly improve prognostication for breast cancer, especially in more aggressive forms such as estrogen receptor (ER)-negative and intrinsically more aggressive (HER2⁺ and Ki67^high) cases. For example, ER-negative cases assigned a CNI-based low-risk score experienced lower recurrence (hazard ratio (HR) = 0.06; 95% CI, 0.01-0.42; p = 0.005) compared to cases with no copy number changes in the 19 loci. Notably, combining the copy number prognostic score with clinical covariates achieved significant gains in prognostic accuracy in the highly aggressive ER-negative tumors over known clinical prognosticators, independent of treatment. Unlike ER-positive disease, these are cancers for which no strong marker-based subclassifiers currently exist.

[0053] Most strikingly, only four loci (8p22, 11q13.5, 22q11.1-q11.21, and Xp21.1-p21.2) are sufficient to distinguish patients at high risk of recurrence and poor clinical outcome. Moreover, loss of six selected loci (8p22, 22q11.1-q11.21, Xp21.1-p21.2, 1p12, 12q13.13, and 13q12.3) may correlate with a higher risk of early recurrence compared to gain of the same loci.
These loci may contain tumor suppressor genes or noncoding RNAs. They may be targets of key genomic events and may be used to facilitate clinical decision-making.

II. Copy Number Changes

[0054] Currently, the clinical course of cancer patients is predicted mostly with crude clinical measurements that are largely based on patient status, such as tumor subtypes for breast cancer patients. What is lacking is a more tumor-based, biologic assessment that can be used to predict clinical outcomes in these patients and be considered in tailoring more personalized therapeutic regimens. Copy number changes of genomic loci identified in certain aspects of the present invention may be used to predict prognosis of cancer patients, especially breast cancer patients.

A. Copy number changes

[0055] Copy number changes, copy number imbalances, or copy number variations, a form of structural variation, may refer to alterations of the DNA of a genome that result in the cell having an abnormal number of copies of one or more sections of the DNA. Copy number changes correspond to relatively large regions of the genome that have been deleted (fewer than the normal number) or duplicated (more than the normal number) on certain chromosomes. For example, a chromosome that normally has sections in the order A-B-C-D might instead have sections A-B-C-C-D (a duplication of "C") or A-B-D (a deletion of "C").
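The A-B-C-D example above can be made concrete by counting how many copies of each reference section appear on the altered chromosome. This minimal Python sketch (the function name is hypothetical) reports the net gain or loss per section; it deliberately ignores section order, so it captures copy number changes but not inversions or translocations.

```python
from collections import Counter
from typing import Dict, Sequence


def segment_copy_changes(reference: Sequence[str], observed: Sequence[str]) -> Dict[str, int]:
    """Net change in copies of each reference section on one chromosome.

    Positive values are duplications, negative values deletions. Section
    order is ignored, so inversions and translocations are not detected.
    """
    ref_counts = Counter(reference)
    obs_counts = Counter(observed)
    return {section: obs_counts[section] - ref_counts[section] for section in ref_counts}


normal = ["A", "B", "C", "D"]
print(segment_copy_changes(normal, ["A", "B", "C", "C", "D"]))  # duplication of C
print(segment_copy_changes(normal, ["A", "B", "D"]))            # deletion of C
```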

[0056] This variation accounts for roughly 12% of human genomic DNA and each variation may range from about one kilobase (1,000 nucleotide bases) to several megabases in size. Copy number changes contrast with single-nucleotide polymorphisms (SNPs), which affect only one single nucleotide base.

[0057] Copy number changes may be either inherited or caused by de novo mutation. A recently proposed mechanism for some CNVs was fork stalling and template switching, a replication misstep; this model was subsequently superseded by microhomology-mediated break-induced replication (MMBIR).

[0058] Copy number changes can be caused by structural rearrangements of the genome such as deletions, duplications, inversions, and translocations. Low copy repeats (LCRs), which are region-specific repeat sequences, are susceptible to such genomic rearrangements, resulting in copy number changes. Factors such as size, orientation, percentage similarity and the distance between the copies influence the susceptibility of LCRs to genomic rearrangement. Segmental duplications (SDs) map near ancestral duplication sites, a phenomenon called duplication shadowing: the observation of an approximately 10-fold increased probability of duplication in regions flanking duplications compared with other, random regions.

[0059] Copy number changes can be limited to a single gene or include a contiguous set of genes. CNVs can result in too many or too few copies of dosage-sensitive genes, which may be responsible for a substantial amount of human phenotypic variability, complex behavioral traits and disease susceptibility. Increasing the copy number of a particular gene can increase expression of the protein it encodes.

[0060] Humans are diploid, so the normal copy number for the non-sex chromosomes (autosomes) is two. A deletion is a loss of genetic material. The deletion can be heterozygous (copy number of 1) or homozygous (copy number of 0, nullisomy). Microdeletion syndromes are examples of constitutional disorders due to small deletions in germline DNA. Deletions in tumor cells may represent the inactivation of a tumor suppressor gene, and may have diagnostic, prognostic, or therapeutic implications. A copy number gain represents the gain of genetic material. If the gain is of just one additional copy of a segment of DNA, it may be called a duplication. If there is one extra copy of an entire chromosome, it may be called a trisomy.

[0061] Copy number gains or losses in tumor cells may have diagnostic, prognostic, or therapeutic implications. Technically, an amplification is a type of copy number gain in which the copy number is greater than 10. In the context of cancer biology, amplifications are often seen in oncogenes. An amplification could indicate a worse prognosis, help categorize the tumor, or indicate drug eligibility.
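The copy-state vocabulary of paragraphs [0060] and [0061] can be summarized as a small lookup. The sketch below is a minimal illustration, assuming integer absolute copy numbers for an autosomal locus in a diploid genome; the function name and label strings are hypothetical, and only the cutoff of more than 10 copies for an amplification is taken from the text above.

```python
def copy_state(copy_number: int) -> str:
    """Label an integer autosomal copy number using the terms defined above.

    Hypothetical helper; assumes a diploid reference (normal copy number = 2).
    """
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number == 0:
        return "homozygous deletion (nullisomy)"
    if copy_number == 1:
        return "heterozygous deletion"
    if copy_number == 2:
        return "normal"
    if copy_number == 3:
        # One additional copy of a segment is called a duplication.
        return "duplication (gain of one copy)"
    if copy_number > 10:
        # Amplification: a copy number gain with more than 10 copies.
        return "amplification"
    return "copy number gain"


for cn in (0, 1, 2, 3, 6, 12):
    print(cn, copy_state(cn))
```

Copy numbers of 4 to 10 fall into the generic "copy number gain" label, since the text names only the single-copy duplication and the >10 amplification explicitly.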

B. Detecting copy number changes

[0062] A karyotype is the characteristic chromosome complement of a eukaryotic species. A karyotype is typically presented as an image of the chromosomes from a single cell arranged from largest (chromosome 1) to smallest (chromosome 22), with the sex chromosomes (X and/or Y) shown last. Historically, karyotypes have been obtained by staining cells after they have been chemically arrested during cell division. Karyotypes have been used for several decades to identify chromosomal abnormalities in both germline and cancer cells. Conventional karyotypes can assess the entire genome for changes in chromosome structure and number, but the resolution is relatively coarse, with a detection limit of 5-10 Mb.

[0063] Copy number analysis usually refers to the process of analyzing data produced by a test for DNA copy number variation in a patient's sample. Such analysis helps detect chromosomal copy number variation that may cause, or increase the risk of, various critical disorders. Copy number variation can be discovered by cytogenetic techniques such as fluorescence in situ hybridization and comparative genomic hybridization, and by high-resolution array-based tests based on array comparative genomic hybridization (aCGH) and SNP array technologies. Array-based methods have been accepted as the most efficient in terms of their resolution and high-throughput nature; they are also referred to as virtual karyotyping. Recent advances in DNA sequencing technology have further enabled the identification of copy number changes by next-generation sequencing.

[0064] For example, BAC (bacterial artificial chromosome)-based aCGH arrays may be used for DNA copy number analysis. This platform is used to identify gross deletions or amplifications in DNA. Such anomalies are common in cancer, and their detection can also be used to diagnose many developmental disorders. Data produced by such platforms are usually of low to medium resolution in terms of genome coverage. Usually, this technology produces log-ratio measurements that represent the deviation of a patient's copy number state from normal. These measurements are then analyzed, and those that differ significantly from zero are declared to represent a part of a chromosome with an anomaly (an abnormal copy number state). Positive log ratios indicate a region of DNA copy number gain, and negative log ratios mark a region of DNA copy number loss. In BAC arrays, even a single data point can be declared an indication of a copy number gain or a copy number loss.
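The log-ratio logic described above can be sketched as follows. This is a minimal illustration, not the calling procedure of any particular platform: the fixed cutoff of ±0.3 is an arbitrary, illustrative value, the conversion to copy number assumes a pure diploid tumor sample (copy number ≈ 2 × 2^log2 ratio), and the function names are hypothetical. Real analyses typically estimate noise per array and segment neighboring probes before calling.

```python
import math


def log2_ratio(test_intensity: float, reference_intensity: float) -> float:
    """Log2 of the test/reference signal for one probe (e.g., one BAC clone)."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def call_probe(ratio: float, threshold: float = 0.3) -> str:
    """Call gain/loss when the log2 ratio departs from zero by more than `threshold`.

    The default threshold is a hypothetical, illustrative value.
    """
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "no change"


def estimated_copy_number(ratio: float, ploidy: int = 2) -> float:
    """Convert a log2 ratio to copy number, assuming a pure diploid sample."""
    return ploidy * 2 ** ratio


# One copy lost (log2(1/2) = -1), no change (0), one copy gained (log2(3/2) ~ 0.585).
for test, ref in [(500.0, 1000.0), (1000.0, 1000.0), (1500.0, 1000.0)]:
    r = log2_ratio(test, ref)
    print(f"log2 ratio {r:+.3f} -> {call_probe(r)}, ~{estimated_copy_number(r):.1f} copies")
```

In practice, tumor impurity and aneuploidy compress log ratios toward zero, which is one reason fixed cutoffs like this are only a starting point.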

[0065] Virtual karyotyping (also called array comparative genomic hybridization, chromosomal microarray analysis (CMA), microarray-based comparative genomic hybridization, array CGH, a-CGH, aCGH, or molecular karyotyping; when SNP-based arrays are used, also SNP array karyotyping, molecular allelokaryotyping, or SOMA) may be used to detect genomic copy number variations at a higher resolution than conventional karyotyping or chromosome-based comparative genomic hybridization (CGH).

[0066] Platforms that have emerged for generating high-resolution karyotypes in silico from disrupted DNA include, for example, array comparative genomic hybridization (arrayCGH) and SNP arrays. Conceptually, the arrays are composed of hundreds to millions of probes that are complementary to a region of interest in the genome. The disrupted DNA from the test sample is fragmented, labeled, and hybridized to the array. Specialized software uses the hybridization signal intensity of each probe to generate a log ratio of test/normal for each probe on the array. Knowing the address of each probe on the array and the address of each probe in the genome, the software lines up the probes in chromosomal order and reconstructs the genome in silico.

[0067] Virtual karyotypes have dramatically higher resolution than conventional cytogenetics. The actual resolution depends on the density of probes on the array. At the time of filing, the Affymetrix SNP6.0 was the highest-density commercially available array for virtual karyotyping applications. It contains 1.8 million polymorphic and non-polymorphic markers for a practical resolution of 10-20 kb, about the size of a gene. This is approximately 1000-fold greater resolution than that of karyotypes obtained by conventional cytogenetics.
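The in silico reconstruction step of paragraph [0066] amounts to joining each probe's array address to its genomic address and sorting. The sketch below assumes a hypothetical probe annotation format (probe ID mapped to chromosome and base-pair position) and per-probe test and reference intensities; the coordinates are made up for illustration, and the code does not reproduce any vendor's software.

```python
import math
from typing import NamedTuple

# Karyotype display order: chromosomes 1-22, then the sex chromosomes.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24})


class Probe(NamedTuple):
    probe_id: str
    chrom: str
    position: int      # base-pair coordinate of the probe in the genome
    log2_ratio: float  # log2(test / reference) signal for this probe


def reconstruct_in_silico(annotation, test, reference):
    """Return probes in chromosomal order with their test/reference log2 ratios.

    annotation: dict probe_id -> (chrom, position)   (hypothetical format)
    test, reference: dict probe_id -> hybridization intensity
    """
    probes = [
        Probe(pid, chrom, pos, math.log2(test[pid] / reference[pid]))
        for pid, (chrom, pos) in annotation.items()
    ]
    # Sort by chromosome rank, then by position along the chromosome.
    return sorted(probes, key=lambda p: (CHROM_ORDER[p.chrom], p.position))


# Illustrative, made-up probe addresses and intensities.
annotation = {
    "p3": ("X", 31_000_000),
    "p1": ("8", 17_500_000),
    "p2": ("1", 120_000_000),
    "p4": ("8", 2_000_000),
}
test = {"p1": 480.0, "p2": 1010.0, "p3": 520.0, "p4": 990.0}
reference = {"p1": 1000.0, "p2": 1000.0, "p3": 1000.0, "p4": 1000.0}

for probe in reconstruct_in_silico(annotation, test, reference):
    print(probe.chrom, probe.position, round(probe.log2_ratio, 2))
```

Once probes are in genomic order, adjacent probes with similar log ratios can be merged into segments, which is where the practical resolution (several consecutive probes) comes from.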

[0068] Array-based karyotyping can be done with several different platforms, both laboratory-developed and commercial. The arrays themselves can be genome-wide (probes distributed over the entire genome) or targeted (probes for genomic regions known to be involved in a specific disease) or a combination of both. Further, arrays used for karyotyping may use non-polymorphic probes, polymorphic probes (i.e., SNP-containing), or a combination of both. Non-polymorphic probes can provide only copy number information, while SNP arrays can provide both copy number and loss-of-heterozygosity (LOH) status in one assay. The probe types used for non-polymorphic arrays include cDNA, BAC clones (e.g., BlueGnome), and oligonucleotides (e.g., Agilent, Santa Clara, CA, USA or Nimblegen, Madison, WI, USA). Commercially available oligonucleotide SNP arrays can be solid phase (Affymetrix, Santa Clara, CA, USA) or bead-based (Illumina, San Diego, CA, USA). Despite the diversity of platforms, ultimately they all use genomic DNA from disrupted cells to recreate a high resolution karyotype in silico. The end product does not yet have a consistent name, and has been called virtual karyotyping, digital karyotyping, molecular allelokaryotyping, and molecular karyotyping. Other terms used to describe the arrays used for karyotyping include SOMA (SNP oligonucleotide microarrays) and CMA (chromosome microarray). Some consider all platforms to be a type of array comparative genomic hybridization (arrayCGH), while others reserve that term for two-dye methods, and still others segregate SNP arrays because they generate more and different information than two-dye arrayCGH methods.

III. Tissue Samples

[0069] In certain aspects, to carry out the method of the invention, a sample may be obtained from the subject. In a particular embodiment, said sample is a tumor tissue sample or portion thereof. In a more particular embodiment, said tumor tissue sample is a breast tumor tissue sample from a patient suffering from or determined to have breast cancer. Said sample can be obtained by conventional methods, e.g., biopsy, using methods well known to those of ordinary skill in the related medical arts. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, or other art-known cell-separation methods. Tumor cells can additionally be obtained by fine needle aspiration cytology.

[0070] Samples can be obtained from subjects who have or have not previously been diagnosed with breast cancer, or from subjects who are receiving or have previously received anti-breast cancer treatment. In a particular embodiment, samples can be obtained from patients who have not previously received any anti-breast cancer treatment.

[0071] To simplify conservation and handling, the samples can be formalin-fixed and paraffin-embedded, or first frozen and then embedded in a cryosolidifiable medium, such as OCT compound, through immersion in a highly cryogenic medium that allows for rapid freezing. In a particular embodiment, the presence of the copy number changes may be determined using nucleic acids obtained from fresh tissue or from a biopsy or fine needle aspiration cytology.

[0072] Other tissue samples are envisaged, such as a formalin-fixed, paraffin-embedded tissue sample, depending on their availability. Fixed and paraffin-embedded tissue samples are widely used storable or archival tissue samples in the field of oncology. Nucleic acid may be isolated from an archival pathological sample or biopsy sample that is first deparaffinized. An exemplary deparaffinization method involves washing the paraffinized sample with an organic solvent, such as xylene. Deparaffinized samples can be rehydrated with an aqueous solution of a lower alcohol. Suitable lower alcohols include, for example, methanol, ethanol, propanols, and butanols. Deparaffinized samples may be rehydrated, for example, with successive washes with lower alcoholic solutions of decreasing concentration. Alternatively, the sample is simultaneously deparaffinized and rehydrated. The sample may then be lysed and the nucleic acid extracted from it. As an illustrative, non-limiting example, tissue selected for fixation and paraffin embedding can be fixed in 10% buffered formalin for 16 to 48 hours and then embedded in paraffin following conventional techniques. Nucleic acid quality is, however, a particular concern when analyzing formalin-fixed tissue samples.

[0073] Because of the variability of the cell types in diseased-tissue biopsy material, and the variability in sensitivity of the diagnostic methods used, the sample size required for analysis may range from 1, 10, 50, 100, 200, 300, 500, 1,000, 5,000, 10,000, to 50,000 or more cells. The appropriate sample size may be determined based on the cellular composition and condition of the biopsy or cytology, and the standard preparative steps for this determination and subsequent isolation of the nucleic acid for use in the invention are well known to one of ordinary skill in the art.

IV. Nucleic Acid Extraction and Amplification

[0074] Using standard methods, the biological sample may be treated to physically or mechanically disrupt tissue or cell structure, to release intracellular components into an aqueous or organic solution to prepare nucleic acids for further analysis. The nucleic acids may be extracted from the sample by procedures known to the skilled person and commercially available. In a particular embodiment, the total DNA extracted from tissue samples will represent the working material suitable for subsequent detection of the genetic marker of interest.

[0075] The term "nucleic acid" refers to a multimeric compound comprising nucleosides or nucleoside analogues which have nitrogenous heterocyclic bases, or base analogues, which are linked by phosphodiester bonds to form a polynucleotide.

[0076] The term "DNA" refers to deoxyribonucleic acid. A DNA sequence is a deoxyribonucleic sequence. DNA is a long polymer of nucleotides and encodes the sequence of the amino acid residues in proteins using the genetic code.

[0077] Once the sample has been obtained and the total DNA has been extracted, amplification of nucleic acid may be carried out to produce sufficient sample material for further detection procedures. Several techniques can be used for producing sufficient starting material, including polymerase chain reaction (PCR), degenerate primer PCR using one or several sets of primers, and rolling circle amplification. Examples of techniques sufficient to direct persons of skill through in vitro amplification methods are found in Mullis et al., U.S. Pat. No. 4,683,202 (1987), and Innis et al. (1990). Commercially available kits for genomic PCR amplification are known in the art; see, e.g., the Advantage-GC Genomic PCR Kit (Clontech). Additionally, the T4 gene 32 protein (Boehringer Mannheim), for example, can be used to improve the yield of long PCR products.

[0078] In a particular embodiment, the amplification of the DNA is carried out by means of PCR. The general principles and conditions for amplification and detection of nucleic acids, such as using PCR, are well known to the skilled person in the art.

V. Cancer Prognosis

[0079] Prognosis is a medical term to describe the likely outcome of an illness. A prognosis may include expected time, function, and a description of the disease course such as progressive decline, intermittent crisis, or sudden, unpredictable crisis.

[0080] A prognosis may be a prediction of outcome and of the probability of progression-free survival (PFS) or disease-free survival (DFS). These predictions are based on knowledge of breast cancer patients with a similar classification. A prognosis is an estimate, as patients with the same classification will survive for different amounts of time, and classifications are not always precise. Survival is usually expressed as the number of months (or years) that 50% of patients survive (the median survival), or as the percentage of patients who are alive after 1, 5, 15, and 20 years. Prognosis may be important for treatment decisions because patients with a good prognosis are usually offered less invasive treatments, such as lumpectomy and radiation or hormone therapy, while patients with a poor prognosis are usually offered more aggressive treatment, such as more extensive mastectomy and one or more chemotherapy drugs.
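Survival figures such as the median survival and the percentage of patients alive at fixed time points are commonly estimated with the Kaplan-Meier product-limit method, which accounts for patients who are censored (e.g., still alive at last follow-up). The sketch below is a minimal pure-Python version applied to a small, made-up cohort; it is a generic illustration, not the analysis used in the study described above.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve.

    times: follow-up time for each patient (e.g., months)
    events: True if the event (death or recurrence) was observed, False if censored
    Returns (time, estimated survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival drops to 50% or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort: months of follow-up and whether the event occurred.
times = [6, 12, 12, 20, 24, 30, 36, 48]
events = [True, True, False, True, False, True, False, True]
curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why a simple proportion of survivors would understate survival when follow-up is incomplete.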

[0081] In numerous embodiments of the present invention, the presence of copy number changes in selected genomic regions may be detected in a biological sample. In some embodiments, the biological sample comprises a tissue sample from a tissue suspected of containing cancerous cells. Human genomic DNA samples can be obtained by any means known in the art. The nucleic acid-containing specimen used for detection of copy number changes may be from any source and may be extracted by a variety of techniques such as those described by Ausubel et al. (1995) or Sambrook et al. (2001).

[0082] The methods of certain aspects of the invention can be used to evaluate individuals known or suspected to have cancer, particularly breast cancer, or as a routine clinical test, e.g., in an individual not necessarily suspected to have cancer. Further diagnostic assays can be performed to confirm or determine the status of cancer in the individual.

[0083] Further, the present methods may be used to assess the efficacy of a course of treatment. For example, the efficacy of an anti-cancer treatment can be assessed by monitoring copy number changes of the marker sequences described herein over time in a mammal having cancer, particularly breast cancer.

[0084] In some embodiments, the methods may be used in conjunction with additional prognostic or diagnostic methods, e.g., detection of other cancer markers, etc. Accordingly, detection of any one or more of the copy number biomarkers as described above can be used either alone, or in combination with other markers, for the diagnosis or prognosis of cancer.

[0085] The methods of certain aspects of the present invention can be used to determine the optimal course of treatment in a mammal with cancer. For example, a certain copy number profile associated with a poor prognosis can indicate a reduced survival expectancy of a mammal with cancer, particularly breast cancer, thereby indicating a more aggressive treatment for the mammal. In addition, a correlation can be readily established between the copy number profile of selected biomarkers, as described herein, and the relative efficacy of one or another anti-cancer agent. Such analyses can be performed, e.g., retrospectively, i.e., by detecting copy numbers in one or more of the biomarkers in samples taken previously from mammals that have subsequently undergone one or more types of anti-cancer therapy, and correlating the known efficacy of the treatment with the copy number profile of one or more of the biomarkers as described above.

[0086] In making a diagnosis, prognosis, risk assessment, classification, detection of recurrence or selection of therapy based on the copy number profile of two or more of the biomarkers, the quantity of copy number changes may be combined to provide an integrated analysis. In a particular embodiment, a prognostic value may be generated based on the copy number determination. For example, the prognostic value may be provided by employing a linear combination of the dosage of target copy numbers with coefficients computed from a regression analysis. Based on the value of the prognostic score, a patient group may be categorized into appropriate risk groups, for example, low (prognostic score < 0), intermediate (prognostic score = 0, no copy number changes for any of the selected markers), and high (prognostic score > 0). For example, a prognostic score for a 3-probe set breast cancer biomarker can take the form of:

[0087]

$$\text{Prognostic score} = a_1 \cdot s_1 \cdot \mathrm{CN}_1 + a_2 \cdot s_2 \cdot \mathrm{CN}_2 + a_3 \cdot s_3 \cdot \mathrm{CN}_3$$

Here s_i is "-1" for copy number loss, "0" for no change in copy number, and "+1" for copy number gain at biomarker i; CN_i is the copy number of biomarker i; and a_1, a_2 and a_3 are coefficients of specific biomarker changes derived from a regression analysis, such as a Cox proportional hazards regression model.

[0088] For example, suppose that through a prior analysis the coefficients were found to be a1 = 1, a2 = 1 and a3 = -2, and that a prognostic score < 0 was found to correspond to a low risk of recurrence. For a breast cancer patient with a tumor having the following characteristics:

[0089] Copy number of biomarker 1=4 (gain);

[0090] Copy number of biomarker 2=2 (no change);

[0091] Copy number of biomarker 3=4 (gain).

[0092] The prognostic score may be computed as:

$$\text{Prognostic score} = 1 \cdot (+1) \cdot 4 + 1 \cdot 0 \cdot 2 + (-2) \cdot (+1) \cdot 4 = 4 + 0 - 8 = -4 < 0,$$

so this patient would be considered at low risk of distant recurrence.
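The scoring rule of paragraphs [0086] to [0092] can be written as a short function. This sketch assumes that each biomarker's status (-1 loss, 0 no change, +1 gain) has already been called and that the coefficients come from a prior regression such as a Cox model; the function and variable names are hypothetical. It reproduces the worked example above.

```python
from typing import Sequence


def prognostic_score(coefficients: Sequence[float],
                     statuses: Sequence[int],
                     copy_numbers: Sequence[float]) -> float:
    """Sum of a_i * s_i * CN_i over the selected biomarkers.

    s_i is -1 (copy number loss), 0 (no change) or +1 (copy number gain).
    """
    if not (len(coefficients) == len(statuses) == len(copy_numbers)):
        raise ValueError("need one coefficient, status and copy number per biomarker")
    if any(s not in (-1, 0, 1) for s in statuses):
        raise ValueError("status must be -1, 0 or +1")
    return sum(a * s * cn for a, s, cn in zip(coefficients, statuses, copy_numbers))


def risk_group(score: float) -> str:
    """Map the score to the low / intermediate / high groups of paragraph [0086]."""
    if score < 0:
        return "low"
    if score > 0:
        return "high"
    return "intermediate"


# Worked example: a1=1, a2=1, a3=-2; biomarkers 1 and 3 gained (4 copies),
# biomarker 2 unchanged (2 copies).
score = prognostic_score([1, 1, -2], [+1, 0, +1], [4, 2, 4])
print(score, risk_group(score))  # -4 low
```

Keeping the coefficients as an input, rather than hard-coding them, reflects that they are fitted in a prior analysis and would change with the biomarker panel.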

[0093] If there are both positive and negative coefficients, the prognostic score may alternatively be expressed as a ratio, such as a ratio of copy numbers. For test implementation, ratio predictors may have certain advantages over linear-combination predictors.

[0094] In some embodiments, the methods comprise recording a diagnosis, prognosis, risk assessment or classification, based on the copy number status determined from an individual. Any type of recordation is contemplated, including electronic recordation, e.g., by a computer.

VI. Breast Cancer

[0095] Certain embodiments of the present invention provide for determination of copy number changes in a subject's cancer. The copy number information may be used for cancer prognosis, assessment, classification and/or treatment. Cancers which may be examined by a method described herein may include, but are not limited to, glioma, gliosarcoma, anaplastic astrocytoma, medulloblastoma, lung cancer, small cell lung carcinoma, cervical carcinoma, colon cancer, rectal cancer, chordoma, throat cancer, Kaposi's sarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, colorectal cancer, endometrial cancer, ovarian cancer, breast cancer, pancreatic cancer, prostate cancer, renal cell carcinoma, hepatic carcinoma, bile duct carcinoma, choriocarcinoma, seminoma, testicular tumor, Wilms' tumor, Ewing's tumor, bladder carcinoma, angiosarcoma, endotheliosarcoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland sarcoma, papillary sarcoma, papillary adenosarcoma, cystadenosarcoma, bronchogenic carcinoma, medullary carcinoma, mastocytoma, mesothelioma, synovioma, melanoma, leiomyosarcoma, rhabdomyosarcoma, neuroblastoma, retinoblastoma, oligodendroglioma, acoustic neuroma, hemangioblastoma, meningioma, pinealoma, ependymoma, craniopharyngioma, epithelial carcinoma, embryonic carcinoma, squamous cell carcinoma, basal cell carcinoma, fibrosarcoma, myxoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, leukemia, and the metastatic lesions secondary to these primary tumors.

[0096] In particular aspects, the methods and compositions may be used in breast cancer. Breast cancer (malignant breast neoplasm) is cancer originating from breast tissue, most commonly from the inner lining of milk ducts or the lobules that supply the ducts with milk. Cancers originating from ducts are known as ductal carcinomas; those originating from lobules are known as lobular carcinomas.

[0097] The size, stage, rate of growth, and other characteristics of the tumor determine the kinds of treatment. Treatment may include surgery, drugs (hormonal therapy and chemotherapy), radiation and/or immunotherapy. Surgical removal of the tumor provides the single largest benefit, with surgery alone being capable of producing a cure in many cases. To somewhat increase the likelihood of long-term disease-free survival, several chemotherapy regimens are commonly given in addition to surgery. Most forms of chemotherapy kill cells that are dividing rapidly anywhere in the body, and as a result cause temporary hair loss and digestive disturbances. Radiation may be added to kill any cancer cells in the breast that were missed by the surgery, which usually extends survival somewhat, although radiation exposure to the heart may cause heart failure in the future. Some breast cancers are sensitive to hormones such as estrogen and/or progesterone, which makes it possible to treat them by blocking the effects of these hormones.

A. Breast Cancer Classification

[0098] Breast cancers can be classified by different schemata. Each aspect influences treatment response and prognosis. A description of a breast cancer would optimally include multiple classification aspects, as well as other findings, such as signs found on physical exam. Classification aspects may include stage (TNM), pathology, grade, receptor status, and the presence or absence of genes as determined by DNA testing, and any of these classifications may be combined with the CNI-based markers described herein. In particular aspects, the copy number information of selected biomarkers may be used within a selected breast cancer subtype to improve prognostication.

1. Stage

[0099] The TNM classification for breast cancer is based on the size of the tumor (T), whether or not the tumor has spread to the lymph nodes (N) in the armpits, and whether the tumor has metastasized (M) (i.e., spread to a more distant part of the body). Larger size, nodal spread, and metastasis correspond to a higher stage number and a worse prognosis.

[00100] The main stages are:

[00101] Stage 0 is a pre-cancerous or marker condition, either ductal carcinoma in situ (DCIS) or lobular carcinoma in situ (LCIS).

[00102] Stages 1-3 are defined as 'early' cancer with a good prognosis.

[00103] Stage 4 is defined as 'advanced' and/or 'metastatic' cancer with a poor prognosis.

2. Histopathology

[00104] Breast cancer is usually classified primarily by its histological appearance. Most breast cancers are derived from the epithelium lining the ducts or lobules, and these cancers are classified as ductal or lobular carcinoma. Carcinoma in situ is the growth of low-grade cancerous or precancerous cells within a particular tissue compartment, such as the mammary duct, without invasion of the surrounding tissue. In contrast, invasive carcinoma does not confine itself to the initial tissue compartment and invades the surrounding tissue.

3. Grade (Bloom-Richardson grade)

[00105] When cells become differentiated, they take different shapes and forms to function as part of an organ. Cancerous cells lose that differentiation. In cancer grading, tumor cells are generally classified as well differentiated (low grade), moderately differentiated (intermediate grade), and poorly differentiated (high grade). Poorly differentiated cancers have a worse prognosis.

4. Receptor status

[00106] Cells have receptors on their surface and in their cytoplasm and nucleus. Chemical messengers such as hormones bind to these receptors, and this causes changes in the cell. Breast cancer cells may or may not have three important receptors: estrogen receptor (ER), progesterone receptor (PR), and HER2/neu.

[00107] ER⁺ cancer cells depend on estrogen for their growth, so they can be treated with drugs that block estrogen effects (e.g., tamoxifen), and they generally have a better prognosis.

[00108] HER2⁺ breast cancer had a worse prognosis, but HER2⁺ cancer cells respond to drugs such as the monoclonal antibody trastuzumab (in combination with conventional chemotherapy), and this has improved the prognosis significantly. Cells with none of these receptors are called basal-like or triple negative.

5. DNA changes

[00109] DNA assays of various types, including DNA microarrays, have compared normal cells to breast cancer cells. The specific changes in a particular breast cancer, such as copy number changes, can be used to classify the cancer in several ways and may assist in choosing the most effective treatment for that DNA type.

B. Breast Cancer Detection

[00110] The first noticeable symptom of breast cancer is typically a lump that feels different from the rest of the breast tissue. More than 80% of breast cancer cases are discovered when the woman feels a lump. The earliest breast cancers are detected by a mammogram (American Cancer Society, 2007). Lumps found in lymph nodes located in the armpits can also indicate breast cancer.

[00111] Indications of breast cancer other than a lump may include changes in breast size or shape, skin dimpling, nipple inversion, or spontaneous single-nipple discharge. Pain ("mastodynia") is an unreliable indicator of the presence or absence of breast cancer, but may be indicative of other breast health issues.

[00112] Inflammatory breast cancer is a special type of breast cancer that can pose a substantial diagnostic challenge. Symptoms may resemble a breast inflammation and may include pain, swelling, nipple inversion, warmth and redness throughout the breast, as well as an orange-peel texture to the skin referred to as peau d'orange.

[00113] Another reported symptom complex of breast cancer is Paget's disease of the breast. This syndrome presents as eczematoid skin changes such as redness and mild flaking of the nipple skin. As Paget's disease advances, symptoms may include tingling, itching, increased sensitivity, burning, and pain. There may also be discharge from the nipple. Approximately half of women diagnosed with Paget's disease also have a lump in the breast.

[00114] In rare cases, what initially appears as a fibroadenoma (hard, movable lump) could in fact be a phyllodes tumor. Phyllodes tumors are formed within the stroma (connective tissue) of the breast and contain glandular as well as stromal tissue. Phyllodes tumors are not staged in the usual sense; they are classified on the basis of their appearance under the microscope as benign, borderline, or malignant.

[00115] Occasionally, breast cancer presents as metastatic disease, that is, cancer that has spread beyond the original organ. Metastatic breast cancer will cause symptoms that depend on the location of metastasis. Common sites of metastasis include bone, liver, lung and brain (Lacroix, 2006). Unexplained weight loss can occasionally herald an occult breast cancer, as can symptoms of fevers or chills. Bone or joint pains can sometimes be manifestations of metastatic breast cancer, as can jaundice or neurological symptoms. These symptoms are "non-specific", meaning they can also be manifestations of many other illnesses.

[00116] Most symptoms of breast disorder do not turn out to represent underlying breast cancer. Benign breast diseases such as mastitis and fibroadenoma of the breast are more common causes of breast disorder symptoms. The appearance of a new symptom should be taken seriously by both patients and their doctors, because of the possibility of an underlying breast cancer at almost any age.

1. Breast Cancer Screening

[00117] Breast cancer screening refers to testing otherwise-healthy women for breast cancer in an attempt to achieve an earlier diagnosis. The assumption is that early detection will improve outcomes. A number of screening tests have been employed, including clinical and self breast exams, mammography, genetic screening, ultrasound, and magnetic resonance imaging.

[00118] A clinical or self breast exam involves feeling the breast for lumps or other abnormalities. Research evidence does not support the effectiveness of either type of breast exam, because by the time a lump is large enough to be found, it is likely to have been growing for several years and will soon be large enough to be found without an exam (Kosters and Gotzsche, 2003). Mammographic screening for breast cancer uses x-rays to examine the breast for any uncharacteristic masses or lumps. The Cochrane Collaboration in 2009 concluded that mammograms reduce mortality from breast cancer by 15 percent but also result in unnecessary surgery and anxiety, leading to its view that it is not clear whether mammography screening does more good or harm (Gotzsche and Nielsen, 2009). Nevertheless, many national organizations recommend regular mammography. For the average woman, the U.S. Preventive Services Task Force recommends mammography every two years for women between the ages of 50 and 74. The Task Force points out that, in addition to unnecessary surgery and anxiety, the risks of more frequent mammograms include a small but significant increase in breast cancer induced by radiation.

[00119] In women at high risk, such as those with a strong family history of cancer, mammography screening is recommended at an earlier age, and additional testing may include genetic screening for the BRCA genes and/or magnetic resonance imaging.

2. Breast Cancer Diagnosis

[00120] While screening techniques are useful in determining the possibility of cancer, further testing is necessary to confirm whether a lump detected on screening is cancer, as opposed to a benign alternative such as a simple cyst.

[00121] Very often the results of noninvasive examination, mammography and additional tests that are performed in special circumstances such as ultrasound or MR imaging are sufficient to warrant excisional biopsy as the definitive diagnostic and curative method.

[00122] Both mammography and clinical breast exam, also used for screening, can indicate an approximate likelihood that a lump is cancer, and may also detect some other lesions. When these tests are inconclusive, Fine Needle Aspiration and Cytology (FNAC) may be used. FNAC, which may be done in a GP's office using local anesthetic if required, involves attempting to extract a small portion of fluid from the lump. Clear fluid makes the lump highly unlikely to be cancerous, but bloody fluid may be sent off for inspection under a microscope for cancerous cells. Together, these three tools can be used to diagnose breast cancer with a good degree of accuracy.

[00123] Other options for biopsy include core biopsy, where a section of the breast lump is removed, and excisional biopsy, where the entire lump is removed.

[00124] In addition, vacuum-assisted breast biopsy (VAB) may help diagnose breast cancer among women with a mammographically detected breast lesion.

C. Breast Cancer Treatment

[00125] Breast cancer is usually treated with surgery and then possibly with chemotherapy or radiation, or both. Hormone receptor-positive cancers are treated with long-term hormone-blocking therapy. Treatments are given with increasing aggressiveness according to the prognosis and risk of recurrence.

[00126] Stage 1 cancers (and DCIS) have an excellent prognosis and are generally treated with lumpectomy and sometimes radiation. HER2⁺ cancers should be treated with the trastuzumab (Herceptin) regimen. Chemotherapy is uncommon for other types of stage 1 cancers.

[00127] Stage 2 and 3 cancers with a progressively poorer prognosis and greater risk of recurrence are generally treated with surgery (lumpectomy or mastectomy with or without lymph node removal), chemotherapy (plus trastuzumab for HER2⁺ cancers) and sometimes radiation (particularly following large cancers, multiple positive nodes or lumpectomy).

[00128] Stage 4 (metastatic) cancer, i.e., cancer that has spread to distant sites, has a poor prognosis and is managed by various combinations of all treatments, including surgery, radiation, chemotherapy, and targeted therapies. The 10-year survival rate is 5% without treatment and 10% with optimal treatment.

D. Other Prognosis Factors

[00129] In addition to biomarkers identified herein, other prognostic factors may be used in combination with copy number information, including staging (i.e., tumor size, location, grade, whether disease has spread to other parts of the body, or marker status), recurrence of the disease, or age of patient.

[00130] Stage is the most important of these factors, as it takes into consideration size, local involvement, lymph node status, and whether metastatic disease is present. The higher the stage at diagnosis, the poorer the prognosis. The stage is raised by the invasiveness of disease to lymph nodes, chest wall, skin or beyond, and by the aggressiveness of the cancer cells. The stage is lowered by the presence of cancer-free zones and close-to-normal cell behavior (grading). Size is not a factor in staging unless the cancer is invasive. For example, Ductal Carcinoma In Situ (DCIS) involving the entire breast will still be stage zero and consequently has an excellent prognosis, with a 10-year disease-free survival of about 98%.

[00131] Grading is based on how biopsied, cultured cells behave. The closer cancer cells are to normal, the slower their growth and the better the prognosis. If cells are not well differentiated, they appear immature, divide more rapidly, and tend to spread. Well-differentiated tumors are given a grade of 1 and moderately differentiated tumors a grade of 2, while poorly differentiated or undifferentiated tumors are given a higher grade of 3 or 4 (depending upon the scale used).

[00132] The most widely used grading system is the Nottingham modification of the Bloom-Richardson system, which grades breast carcinomas by adding up scores for tubule formation, nuclear pleomorphism, and mitotic count, each of which is given 1 to 3 points. The total determines the final grade as follows: 3-5 points = Grade I; 6-7 points = Grade II; and 8-9 points = Grade III.
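The Nottingham point totals above amount to a simple lookup. The following Python sketch only restates the cut-offs given in the preceding paragraph; the function name `nottingham_grade` is illustrative and is not taken from any pathology software.

```python
def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> int:
    """Return the Nottingham (modified Bloom-Richardson) grade, 1 to 3.

    Each component score must be 1, 2 or 3, as described in the text.
    """
    scores = {"tubule formation": tubules,
              "nuclear pleomorphism": pleomorphism,
              "mitotic count": mitoses}
    for name, value in scores.items():
        if value not in (1, 2, 3):
            raise ValueError(f"{name} score must be 1, 2 or 3, got {value}")
    total = tubules + pleomorphism + mitoses   # ranges from 3 to 9
    if total <= 5:
        return 1   # 3-5 points: Grade I
    if total <= 7:
        return 2   # 6-7 points: Grade II
    return 3       # 8-9 points: Grade III


print(nottingham_grade(2, 3, 2))   # 7 points -> Grade II, prints 2
```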

[00133] The presence of estrogen and progesterone receptors in the cancer cell is important in guiding treatment. Those who do not test positive for these specific receptors will not be able to respond to hormone therapy, and this can affect their chance of survival depending upon what treatment options remain, the exact type of the cancer, and how advanced the disease is.

[00134] In addition to hormone receptors, there are other cell surface proteins that may affect prognosis and treatment. HER2 status directs the course of treatment. Patients whose cancer cells are positive for HER2 have more aggressive disease and may be treated with the 'targeted therapy', trastuzumab (Herceptin), a monoclonal antibody that targets this protein and improves the prognosis significantly. Tumors overexpressing the Wnt signaling pathway co-receptor low-density lipoprotein receptor-related protein 6 (LRP6) may represent a distinct subtype of breast cancer and a potential treatment target.

VII. Kits

[00135] Certain aspects of this invention also provide kits for the detection and/or quantification of the biomarker copy numbers using the methods described herein.

[00136] In a certain aspect, there may be provided a kit for predicting prognosis in a patient with or determined to have a breast cancer, said kit comprising one or more nucleic acid probes which selectively bind to a target polynucleotide sequence on one or more selected genomic regions, under conditions in which the probe forms a stable hybridization complex with the target polynucleotide sequence(s). Said probes can be directly or indirectly labeled. Thus, in a particular embodiment of the invention, the probes may be directly labeled. In another particular embodiment of the invention, the probe may be indirectly labeled. In a particular embodiment of the invention the nucleic acid probes may be attached to a solid surface. In another particular embodiment, the attached probes may be a member of a nucleic acid array.

[00137] The kit can include one or more containers for the probe or probes. In some embodiments, the kit contains separate containers, dividers or compartments for the probes and the informational material. For example, the probe or probes can be contained in vials and the informational material can be contained in a plastic sleeve or packet. In other embodiments, the separate elements of the kit are contained within a single, undivided container. For example, the probes are contained in different vials that have attached thereto the informational material in the form of a label. In some embodiments, the kit includes a plurality (e.g., a pack) of individual containers. For example, the kit includes a plurality of vials for the different probes and informational material thereof.

[00138] Kits may comprise a container with a label. Suitable containers include, for example, bottles, vials, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. The container may hold a composition which includes a probe that is useful for prognostic or non-prognostic applications, such as described above. The label on the container may indicate that the composition is used for a specific prognostic or non-prognostic application, and may also indicate directions for either in vivo or in vitro use, such as those described above. The kit of the invention will typically comprise the container described above and one or more other containers comprising materials desirable from a commercial and user standpoint, including buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

VIII. Examples

[00139] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.

Example 1 - Selective copy number imbalances predict clinical outcome in breast cancer

[00140] Molecular inversion probe (MIP)-determined CNIs and their association with tumor characteristics and tumor marker-defined tumor subtypes. CNIs were determined using MIP-based arrays in 971 stage I/II breast cancer patients. All subtypes showed recurrent gains at 1q, 8p-q, 11q, 14q, 16p, and 20q, with recurrent losses at 8p (FIG. 6). Recurrent gains at 16p13 and losses at 16q12 were present in estrogen receptor (ER)-positive tumors; however, recurrent losses at 16p36, 6q14, and 20q11.1-13.33, along with gains at 7p21.3-21.1, were present but limited to ER⁺/Ki67^high tumors. Gains in 4q13.3-21.21 and 17q11.1-23.2 (containing the ERBB2 gene) were common among HER2⁺ tumors. When separated on ER status, HER2⁺/ER⁺ tumors were similar to HER2⁺/ER⁻ tumors in the extent and type of CNIs, with the exception of a significantly higher proportion of ER⁺ tumors exhibiting gains in 17q distal to the ERBB2 locus (45.7%) and gains at 8p12 (28.4%), compared with ER⁻ tumors (20.8% and 12.2%, respectively; FDR > 0.05). Triple negative breast cancers (TNBCs) showed numerous recurrent CNIs not common in other subtypes, including losses at 3p12.3-12, 14q13.3, 15q12-14, and Xp22.21-11.23, and gains at 1p12, 6q16.2-23.1, 7p22.1-q35, 9p24.3-21.3, 11p13-12, 12p13.33-11.2, 13q33.3-34, and 18p11.32-11.21. TNBCs also exhibited extensive losses on chromosome 4 (4p16.1-q35.2) and loss of the 5q arm. Recurrent gains at 3q22.3-29, 5p15.33-13.1, and 17q23.2-25.3, and losses at 9p21.2-21.1 and 13q14.2-31.1, were present among the more clinically aggressive subtypes (ER⁺/Ki67^high, HER2⁺, and TNBC), but not in the less aggressive ER⁺/Ki67^low group.

[00141] CNIs as prognostic markers for breast cancer recurrence. Using a training set (n=723) and the CoxBoost algorithm described by Binder and Schumacher (2008) to build a Cox proportional hazards model from the high-dimensional segment data, several CNIs relevant to recurrence were identified, such as imbalances at 1p12, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 12q13.13, 13q12.3, 14q13.2-13.3, 17q21.33, 22q11, and Xp21, consistent with previous studies reporting a higher risk of recurrence among breast tumors (Bergamaschi et al., 2006; Chin et al., 2006).

[00142] FIG. 1 shows the time-to-recurrence for each of the individual markers. Detailed information on these CNIs, correlated segments, and genes contained within the segments is provided in Table 1.

Table 1. Start and Stop Boundaries and Genes of the 19 CNIs associated with recurrence of early-stage breast cancer

Cytoband | Start | Stop | Gene symbols | R²
12p13.32 | 3394093 | 3630092 | PRMT8 | 1.00
12q13.13 | 50493755 | 51600159 | ACVRL1, ANKRD33, ACVR1B, GRASP, C12orf44, NR4A1, KRT7, KRT80, KRT81, KRT83, KRT86, KRT6B, KRT75, KRT82, KRT84, KRT85, KRT5, KRT6A, KRT6C, KRT71, KRT72, KRT74, KRT1, KRT2, KRT73, KRT77, KRT3, KRT4, KRT76, KRT78, KRT79 | 1.00
13q12.3 | 28554115 | 29380652 | KIAA0774, KIAA0774, SLC7A1, UBL3 | 1.00
13q12.3 | 29380653 | 29705278 | | 0.97
14q13.2-q13.3 | 35380230 | 36252346 | MBIP, NKX2-1, SFTA3, NKX2-8, PAX9 | 1.00
16p11.2 | 31526202 | 35843070 | C16orf58, C16orf67, ERAF, SLC5A2, TGFB1I1, ZNF267, ZNF720, HERC2P4, LOC440366, LOC729355, SLC6A10P, TP53TG3 | 1.00
17q21.33 | 47411130 | 48137311 | | 1.00
20q13.33 | 59100794 | 59456750 | | 0.96
20q13.33 | 59456751 | 59788832 | CDH4 | 1.00
20q13.33 | 59788833 | 60260005 | GTPBP5, LSM14B, PSMA7, SS18L1, TAF4 | 0.95
22q11.1-q11.21 | 15236255 | 16625906 | CCT8L2, CECR8, XKR3, psiTPTE22, GAB4, CECR6, CECR7, IL17RA, CECR1, CECR4, CECR5, ATP6V1E1, CECR2, SLC25A18, BCL2L13, BID | 1.00
Xp21.1-p21.2 | 30907133 | 32653344 | DMD, DMD, DMD, DMD | 1.00
Xp21.1 | 32653345 | 33748113 | | 0.95
Xq28 | 151081086 | 151871524 | GABRA3, MAGEA10, MAGEA5, CETN2, CSAG1, CSAG2, CSAG3, GABRQ, MAGEA12, MAGEA2, MAGEA2B, MAGEA3, MAGEA6, NSDHL, PNMA5, ZNF185 | 1.00

Regions shown are expanded to include segments that were highly correlated (at the 0.95 or higher correlation level) with the one selected by the CoxBoost algorithm.

[00143] Selective CNIs predict clinical outcome and improve prognostication for both ER-positive and ER-negative breast cancers. Next, to assess the independent and joint effects of the selected CNIs and clinical covariates in determining risk of recurrence, clinical models were compared with models that considered the 19 CNI markers alone or in combination with clinical variables, including tumor subtypes defined as ER⁺/Ki67^low, ER⁺/Ki67^high, ERBB2 gene amplified (HER2⁺), and triple negative (TNBC [ER⁻/PR⁻/HER2⁻]). Performance of the various models was evaluated by computing the concordance index (C-index) for each model. The C-index measures the agreement between the risk predicted by a model and the observed recurrence outcome: it is the probability that, of two comparable patients, the one assigned the higher predicted risk experiences recurrence first. A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.
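To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is an illustrative stand-in and not necessarily the exact estimator implementation used in the study; the function name `harrell_c_index` and the patient values are hypothetical. Pairs with tied event times are simply excluded, and tied predictions count one half.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if censored
    risk   -- predicted risk score (higher = expected to recur sooner)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue                  # a censored patient cannot anchor a pair
        for j in range(len(times)):
            if times[i] < times[j]:   # i recurred while j was still at risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four hypothetical patients; the third is censored at t = 6.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.4, 0.6]))  # 0.6
```

In the usage line, five pairs are comparable and three are ranked correctly, giving 0.6.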

[00144] The performance of the models for all cancers in the training and test sets is compared in FIG. 2, panel a. Shown are the C-indices for the training set ± 95% confidence interval and the estimate for the test set, to assess the degree of overfitting of the models. Applied across all breast cancers, the 19 marker-only model is a significantly stronger predictor of recurrence (C-index = 0.68 ± 0.03; p < 0.001) than either the multivariate model with the clinical covariates alone (C-index = 0.61 ± 0.02) or the model with clinical covariates plus tumor marker-defined tumor subtypes (C-index = 0.62 ± 0.02). Though not significantly different from the model with only the 19 CNIs (p = 0.13), the full model containing the clinical covariates, tumor subtypes, and 19 markers performed best in both the training (C-index = 0.72) and test (C-index = 0.71) sets. Details of the final multivariate model containing the markers, tumor subtypes, and clinical characteristics are given in Table 2.

Table 2. Full multivariate Cox model, based on the training set (n=728)

Factor | HR (95% CI) | P value
20q13.33 gain | 1.27 (0.88-1.83) | 0.21
Xq28 gain | 1.87 (1.19-2.94) | 0.007
1p12 loss | 1.96 (1.10-3.45) | 0.02
8p22 loss | 1.52 (1.09-2.08) | 0.016
10p11.21 gain | 0.24 (0.11-0.53) | <0.001
12q13.13 loss | 1.96 (0.98-3.85) | 0.06
16p11.2 loss | 1.54 (0.98-2.38) | 0.06
17q21.33 gain | 0.31 (0.18-0.54) | <0.001
22q11.1-q11.21 loss | 1.82 (1.10-3.01) | 0.02
Xp21.1-p21.2 loss | 2.78 (1.72-4.55) | <0.001

The clinical covariates shown were selected from a step-wise model selection procedure that minimizes the AIC, except for age at diagnosis.
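Each row of Table 2 is a Cox hazard ratio HR = exp(β) with a 95% confidence interval exp(β ± 1.96·SE). Assuming the intervals were Wald intervals computed on the log scale (the usual Cox convention, not stated explicitly in the text), the coefficient, its standard error and the two-sided p value can be back-calculated as in the hypothetical helper below.

```python
import math
from typing import Tuple


def wald_from_hr_ci(hr: float, lower: float, upper: float,
                    z_crit: float = 1.96) -> Tuple[float, float, float, float]:
    """Back-calculate a Cox coefficient, its standard error, the Wald z
    statistic and a two-sided p value from a hazard ratio and its 95% CI.

    Assumes the CI was computed on the log scale as exp(beta +/- z_crit * SE).
    """
    beta = math.log(hr)                                      # log hazard ratio
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # CI width / 3.92
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal tail
    return beta, se, z, p_value


# Xq28 gain in Table 2: HR 1.87 (1.19-2.94), reported P = 0.007
beta, se, z, p = wald_from_hr_ci(1.87, 1.19, 2.94)
print(f"log-HR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because the published HRs and interval limits are rounded, p values recovered this way agree with the table only approximately.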

[00145] FIG. 2 compares the C-index obtained with the full model with that of the clinical covariate models for all cases (FIG. 2, panel a) and for each of the clinically recognized subgroups and tumor subtypes. When the cases were separated by ER status (FIG. 2, panels b and c), lymph node status (data not shown), HER2 status (FIG. 2, panel f), Ki67 (data not shown), or tumor subtype (FIG. 2, panels d-g), the CNI-based model significantly improved prognostication in all clinically defined subgroups, with the greatest gain in prognostic accuracy observed among the intrinsically more aggressive ER-negative tumors (C-index = 0.78 versus 0.63; p = 0.0009), ER⁺/Ki67^high tumors (C-index = 0.71 versus 0.50; p = 0.0016), and HER2⁺ tumors (C-index = 0.78 versus 0.64; p = 0.014).

[00146] Risk index based on CNIs and recurrence probability. To classify individuals based on the markers alone and to gain some insight into how the set of CNIs relates to clinical characteristics, three risk categories (low; 'no CNI', meaning no copy number imbalance in any of the 19 markers; and high) were created using the hazard of recurrence associated with the 19 CNI states, as described in the Materials and Methods. FIGS. 3A-3B show the time-to-recurrence for all breast cancers grouped by the marker-only risk categories of low (15.8% of cases), no CNI (46.2% of cases), and high risk (38% of cases) for the training (FIG. 3A) and test (FIG. 3B) sets. Table 3 shows the association between the clinical characteristics and the marker-only risk groups.

Table 3. Association Between Clinicopathological Characteristics and Marker-based Risk Signatures

Characteristic, n (%) | Low Risk (n=153) | No CNV Risk (n=449) | High Risk (n=369) | P value
Age (years) | | | | 0.04
<40 | 14 (9.2) | 50 (11.1) | 50 (13.6)
40-50 | 42 (27.5) | 131 (29.2) | 115 (31.2)
50-60 | 41 (26.8) | 110 (24.5) | 111 (30.1)
>60 | 56 (36.6) | 155 (34.5) | 91 (24.7)
NA | 0 (0) | 3 (0.7) | 2 (0.5)
Race | | | | 0.70
White | 110 (71.9) | 334 (74.4) | 271 (73.4)
African American | 21 (13.7) | 51 (11.4) | 53 (14.4)
Hispanic | 21 (13.7) | 59 (13.1) | 43 (11.7)
NA | 1 (0.7) | 5 (1.1) | 2 (0.5)
Lymph node status | | | | 0.50
Negative | 87 (56.9) | 268 (59.7) | 210 (56.9)
Positive | 62 (40.5) | 167 (37.2) | 154 (41.7)
NA | 4 (2.6) | 14 (3.1) | 5 (1.4)
Tumor size (cm) | | | | <0.001
<1 | 14 (9.2) | 84 (18.7) | 31 (8.4)
1-2 | 71 (46.4) | 201 (44.8) | 165 (44.7)
>2 | 63 (41.2) | 141 (31.4) | 165 (44.7)
NA | 5 (3.3) | 23 (5.1) | 8 (2.2)
Histologic grade | | | | <0.001
I | 11 (7.2) | 58 (12.9) | 23 (6.2)
II | 70 (45.8) | 238 (53) | 169 (45.8)
III | 65 (42.5) | 110 (24.5) | 161 (43.6)
NA | 7 (4.6) | 43 (9.6) | 16 (4.3)
ER status | | | | <0.001
Negative | 41 (26.8) | 114 (25.4) | 138 (37.4)
Positive | 110 (71.9) | 331 (73.7) | 225 (61)
NA | 2 (1.3) | 4 (0.9) | 6 (1.6)
Ki67 status (%) | | | | <0.001
<17 | 63 (41.2) | 277 (61.7) | 155 (42)
>17 | 69 (45.1) | 106 (23.6) | 176 (47.7)
NA | 21 (13.7) | 66 (14.7) | 38 (10.3)
HER2 status | | | | 0.002
Negative | 121 (79.1) | 307 (88.7) | 296 (80.2)
Positive | 32 (20.9) | 39 (11.3) | 73 (19.8)
Chemotherapy | | | | 0.008
No | 84 (54.9) | 234 (52.1) | 162 (43.9)
Yes | 62 (40.5) | 184 (41) | 191 (51.8)
NA | 7 (4.6) | 31 (6.9) | 16 (4.3)
Subtype | | | | <0.001
ER⁺/Ki67^low | 51 (33.3) | 225 (50.1) | 113 (30.6)
ER⁺/Ki67^high | 32 (20.9) | 53 (11.8) | 71 (19.2)
HER2⁺ | 32 (20.9) | 53 (11.8) | 73 (19.8)
TNBC | 24 (15.7) | 71 (15.8) | 89 (24.1)
NA (n=84) | 14 (9.2) | 47 (10.5) | 23 (6.2)

[00147] There were no differences among the three groups by race/ethnicity, lymph node status, or hormone therapy (p = 0.34). Counterintuitively, tumors in both the low- and high-risk categories were significantly more likely than those in the no CNV risk group to be larger, of high histological grade, HER2-positive, and Ki67^high. In contrast, patients in the high-risk group were more likely to have ER-negative tumors, to receive chemotherapy, and to be younger than those in the low- or intermediate-risk groups. Interestingly, the high- and low-risk groups show similar global copy number changes (FIG. 4, panel a) and a similar within-group frequency of HER2 amplification (~20%) (see Table 3). As expected, the low- and high-risk groups differ in the composition and nature (gain or loss) of the 19 selected markers (FIG. 4, panel b).

[00148] CNI-based risk index identifies a low risk group among ER-negative patients.

The full model containing the 19 markers and the clinical covariates showed greater improvements in prognostication for the intrinsically more aggressive ER-negative tumors than for ER-positive tumors (FIG. 2). Significant improvements in prognostication relative to the clinical covariates were also observed for ER⁺/Ki67^high, HER2⁺, and TNBC tumors when these were assessed separately. To explore the importance of the CNIs as prognostic markers for the clinically more aggressive tumors, and recognizing treatment heterogeneity in the sample, the performance of the marker-only risk score among ER-negative cases was assessed by treatment status (FIG. 5). As shown, there is a strong relationship between risk score and time-to-recurrence in ER-negative patients, independent of treatment. For the marker-based models, ~14% (41 of 293 ER-negative cases) of patients experienced a very low hazard of recurrence (HR = 0.06; 95% CI, 0.01-0.42; p = 0.005) relative to the group with no imbalances in the 19 markers, independent of treatment with chemotherapy. Thus, the low-risk group appears to derive no treatment benefit from the addition of chemotherapy. As for the effect of chemotherapy in ER-negative 'high risk' individuals, a comparison of the Kaplan-Meier curves for chemotherapy versus no-chemotherapy recipients showed no significant differences for either the marker-only high-risk group or the clinical + marker high-risk group (log-rank p-values of 0.497 and 0.248, respectively). These results support a prognostic rather than predictive nature of the CNI-based risk classifier.

[00149] Prediction accuracy of the CNI-based models and recurrence probability at 5 and 10 years of follow-up. Lastly, time-dependent receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) were derived for the full models containing both markers and clinical covariates and compared with those of the clinical-only and marker-only models for 5- and 10-year recurrence probability (FIG. 7). Using all the data, the average AUC for the full model was 0.71 and 0.68 at 5 and 10 years, respectively, compared with 0.64 and 0.60 for the clinical model and 0.65 and 0.64 for the marker-only model. For ER-negative cases, the average AUC for the full model was 0.74 and 0.73 at 5 and 10 years, respectively, compared with 0.63 and 0.64 for the clinical model and 0.72 and 0.70 for the marker-only model. These results illustrate the potential contribution of CNIs to improved prognostic accuracy, particularly among ER-negative cases.

[00150] Four recurrent CNIs as strongest predictors of breast cancer recurrence. To understand how the arbitrarily chosen number of iterations (n=100) in the variable selection process influenced the performance of the model for recurrence, a sensitivity analysis was conducted by varying the number of iterations. The model did not significantly improve beyond that achieved with 50 iterations and the selection of 10 CNI markers (Table 4), with the majority of the model performance explained by 8p22, 11q13.5, 22q11, and Xp21.
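The chemotherapy comparison in paragraph [00148] rests on a two-group log-rank test. The sketch below shows the standard statistic (observed minus expected events in one group, with hypergeometric variance, referred to a chi-square distribution with 1 degree of freedom); it is a generic illustration with hypothetical names, not the software used in the study.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test (e.g., chemotherapy yes = 1, no = 0).

    Returns the chi-square statistic (1 degree of freedom) and its p value.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0   # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i] == 1)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk
                 if times[i] == t and events[i] and group[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups are not comparable")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square(1)
    return chi2, p_value
```

The p value uses the identity P(χ²₁ > x) = erfc(√(x/2)), so only the standard library is needed.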

Table 4. Effect of the number of CoxBoost iterations on the number of selected markers and on the performance of the Cox model in the test set

# Iterations | # of selected markers | C-index of full model on the test data set (n=243)
10 | 4 | 0.68
30 | 10 | 0.71
50 | 12 | 0.71
100 | 19 | 0.71
200 | 35 | 0.71

[00151] CNI-based marker predicts breast cancer recurrence for HER2⁺ patients. A recurrent gain in 17q21.33 among HER2⁺ tumors was found to be significantly associated with reduced risk of recurrence. Genomic aberrations in the long arm of chromosome 17 are frequent events in human cancers and have been extensively studied for their association with poor outcomes (Mani et al., 2008; Celebiler Cavusoglu et al., 2009; Honeth et al., 2008). This is the first report showing a reduced hazard of recurrence (HR = 0.31; 95% CI, 0.18-0.54) for tumors bearing an amplification in a relatively narrow region in 17q21.33 previously suggested to harbor MYST2 as a putative oncogene of significance in HER2⁺ tumors (Letessier et al., 2006). The reasons for the reduced hazard among tumors bearing this amplicon in this study are unclear. On additional examination, this marker was not highly correlated with the segment containing the topoisomerase II alpha (TOP2A) gene located at 17q21.1 (Spearman correlation = 0.28), which has been implicated as a modifier of anthracycline sensitivity in breast tumors (Oakman et al., 2009). Additional investigation of the regional effects of chromosome 17 alterations and other identified loci will allow a more thorough examination of the impact of correlated segments adjacent to this region, information that is potentially lost in the boosting strategy used.

[00152] The CNI marker model performed well both within and between the tumor subtypes, supporting others' observations that a sizable number of breast tumors appear to share features of more than one of the clinical or expression-based subtypes (Sotiriou and Piccart, 2007). Strengths of this study include the large sample of early-stage tumors with long-term follow-up and the high-quality copy number data derived from novel MIP technology. The CoxBoost method allowed Cox models to be fitted by likelihood-based boosting for a single survival endpoint. This approach uses component-wise likelihood-based boosting, is especially suited to models with a large number of predictors, and allows for mandatory covariates with unpenalized parameter estimates.
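The idea of component-wise likelihood-based boosting can be illustrated with a deliberately simplified Python sketch, written in the spirit of Binder and Schumacher (2008) but not reproducing the CoxBoost R package. At each step every covariate receives a candidate one-step penalized Newton update of the Cox partial log-likelihood, and only the covariate with the largest penalized score statistic is updated. The names, the penalty value of 10 and the toy data are assumptions for illustration; mandatory unpenalized covariates and penalty tuning are omitted.

```python
import math
from typing import List, Sequence, Tuple


def _score_and_information(times, events, X, eta, k):
    """Cox partial-likelihood score U_k and information I_k for covariate k,
    evaluated at the current linear predictor eta (Breslow risk sets)."""
    score, information = 0.0, 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # only observed recurrences contribute terms
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        weights = [math.exp(eta[j]) for j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * X[j][k] for w, j in zip(weights, risk_set))
        s2 = sum(w * X[j][k] ** 2 for w, j in zip(weights, risk_set))
        mean_k = s1 / s0
        score += X[i][k] - mean_k
        information += s2 / s0 - mean_k ** 2
    return score, information


def componentwise_cox_boost(times: Sequence[float], events: Sequence[int],
                            X: Sequence[Sequence[float]], steps: int = 100,
                            penalty: float = 10.0) -> Tuple[List[float], List[int]]:
    """Simplified component-wise likelihood-based boosting for a Cox model.

    Each step updates only the covariate with the largest penalized score
    statistic U_k**2 / (I_k + penalty), by U_k / (I_k + penalty).
    Returns the coefficients and the order in which covariates were chosen.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n          # current linear predictor X @ beta
    selected: List[int] = []
    for _ in range(steps):
        best_k, best_stat, best_update = 0, -1.0, 0.0
        for k in range(p):
            u, info = _score_and_information(times, events, X, eta, k)
            stat = u * u / (info + penalty)
            if stat > best_stat:
                best_k, best_stat, best_update = k, stat, u / (info + penalty)
        beta[best_k] += best_update
        eta = [eta[i] + best_update * X[i][best_k] for i in range(n)]
        selected.append(best_k)
    return beta, selected


# Toy data: covariate 0 tracks early recurrence, covariate 1 is noise.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[5, 1], [4, 0], [3, 1], [2, 0], [1, 1], [0, 0]]
beta, order = componentwise_cox_boost(times, events, X, steps=1)
print(order)  # [0]
```

Because only one coefficient moves per step, the number of steps acts as the main tuning parameter, which is why the iteration count discussed in Table 4 controls how many markers are selected.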

Materials and Methods

[00153] Ethics Statement. This study included banked samples dating from 1985-1999 and was approved by the Institutional Review Board of the University of Texas M.D. Anderson Cancer Center (MDACC) with waiver of consent for passive follow up of deceased patients. For all patients who were alive during the study period, patients were contacted and consented in writing for participation in the study.

[00154] Patient population and breast tumor specimens. Breast tumors (n=1,003) for which complete clinical and follow-up data and adequate tumor DNA from formalin-fixed, paraffin-embedded (FFPE) tissue blocks were available were identified from the Early Stage Breast Cancer Repository (ESBCR) at MDACC. The cohort is a retrospective study of 2,409 women diagnosed with pathologic stage I or II breast cancer and surgically treated at MDACC between 1985 and 2000. Criteria for eligibility and cohort details have been previously reported (Brewster et al., 2007). Clinical information, including patient's age, stage, tumor size, lymph node status, nuclear grade, ER and progesterone receptor (PR) status, and primary treatment, including surgery, radiation therapy, chemotherapy, and endocrine therapy, was abstracted from medical charts (Table 5). The group selected for copy number determination was enriched to include all African American (n=196) and Hispanic (n=208) patients, and a random sample of non-Hispanic white patients oversampled for recurrences (n=808).

Table 5. Clinical characteristics of all stage I/II breast cancer patients with MIP-derived copy number

Characteristic | White (N=715) | Black (N=125) | Hispanic (N=123) | Other (N=8) | Total (N=971)
Age (yrs), mean (s.d.) | 55 (12.7) | 54 (13.8) | 52.1 (10.9) | 51.8 (8.7) | 54.4 (12.6)
Year of diagnosis, # (%)
>1995 | 304 (42.5) | 45 (36) | 56 (45.5) | 0 | 405 (41.7)
1990-1994 | 227 (31.8) | 42 (33.6) | 38 (30.9) | 0 | 307 (31.6)
1985-1989 | 174 (24.3) | 36 (28.8) | 27 (22) | 0 | 237 (24.4)
Missing | 10 (1.4) | 2 (1.6) | 2 (1.6) | 8 (100) | 22 (2.3)
Stage, # (%)
I | 224 (31.4) | 33 (26.4) | 47 (38.2) | 0 | 303 (31.2)
IIa | 337 (47.1) | 54 (43.2) | 49 (39.9) | 1 (12.5) | 441 (45.4)
IIb | 151 (21.1) | 37 (29.6) | 26 (21.1) | 7 (87.5) | 221 (22.8)
Missing | 3 (0.4) | 1 (0.8) | 1 (0.8) | 0 | 5 (0.01)
Intrinsic subtype, # (%)
ER⁺, Ki67^low | 310 (43.3) | 33 (26.4) | 45 (36.6) | 1 (12.5) | 389 (40.1)
ER⁺, Ki67^high | 110 (15.4) | 23 (18.4) | 19 (15.5) | 4 (50) | 156 (16.1)
HER2⁺ | 107 (15) | 25 (20) | 24 (19.5) | 2 (25) | 158 (16.4)
Triple negative | 123 (17.2) | 34 (27.2) | 26 (21.1) | 1 (12.5) | 184 (18.8)
Missing | 65 (9.1) | 10 (8) | 9 (7.3) | 0 | 84 (8.6)
Nuclear grade*, # (%)
1-2 | 434 (60.7) | 60 (48) | 75 (61) | 0 | 569 (58.6)
3 | 234 (32.7) | 58 (46.4) | 44 (35.8) | 0 | 336 (34.6)
Missing | 47 (6.6) | 7 (5.6) | 4 (3.2) | 8 (100) | 66 (6.8)
Tumor size, # (%)
< 2 cm | 419 (58.6) | 58 (46.4) | 89 (72.4) | 0 | 566 (58.3)
> 2 cm | 276 (38.6) | 62 (49.6) | 31 (25.2) | 0 | 369 (38.0)
Missing | 20 (2.8) | 5 (4) | 3 (2.4) | 8 (100) | 36 (3.7)
Lymph node status, # (%)
Negative | 409 (57.2) | 82 (65.6) | 74 (60.2) | 0 | 565 (58.2)
Positive | 295 (41.3) | 41 (32.8) | 47 (38.2) | 0 | 383 (39.4)
Missing | 11 (1.5) | 2 (1.6) | 2 (1.6) | 8 (100) | 23 (2.4)

* Nuclear grade was determined by the Modified Black's method.

[00155] Definition of tumor subtypes. Four tumor subtypes were approximated from clinically validated immunohistochemical (IHC) analyses of ER, PR, HER2, and Ki67: ER⁺/Ki67^low, ER⁺/Ki67^high, ERBB2 gene amplified (HER2⁺), and triple negative (TNBC [ER⁻/PR⁻/HER2⁻]). For the derivation of IHC-based tumor subtypes, ER and PR positivity was defined as > 1% staining. ER-positive tumors were further subclassified by mitotic index using Ki67 and a clinical threshold of >17% to approximate luminal B tumors (Nielsen et al., 2004). HER2-positive tumors were defined as a separate group because of the prognostic and predictive significance of HER2 amplification. ER and PR status were obtained from medical records (primary source) and tissue microarray studies (secondary source); the agreement in ER and PR status between the two sources was 84.8% and 76.4%, respectively. Molecular inversion probe (MIP) array-based HER2 copy number proved superior to IHC (area under the receiver operating characteristic curve, 0.94) and equivalent to fluorescence in situ hybridization (FISH) for assessment of ERBB2 (HER2) gene amplification. Thus, in cases that were ambiguous by IHC and for which FISH data were unavailable, copy number was used to determine HER2 amplification.
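The subtype rules above can be written as a short decision function. This Python sketch is an illustration under stated assumptions, not the study's code: HER2⁺ takes precedence over ER status (consistent with HER2⁺/ER⁺ tumors being discussed as part of the HER2⁺ group), the "> 1%" and ">17%" thresholds are read as strict inequalities, and ER⁻/PR⁺/HER2⁻ tumors, which fit none of the four groups, return `None`. The function name `ihc_subtype` is hypothetical.

```python
from typing import Optional


def ihc_subtype(er_pct: float, pr_pct: float, her2_amplified: bool,
                ki67_pct: Optional[float],
                receptor_cutoff: float = 1.0,
                ki67_cutoff: float = 17.0) -> Optional[str]:
    """Approximate one of the four IHC-defined subtypes.

    er_pct, pr_pct, ki67_pct -- percentage of positively stained tumor cells
    her2_amplified           -- ERBB2 amplification (IHC, FISH or copy number)
    Returns None when the tumor cannot be placed in one of the four groups.
    """
    er_positive = er_pct > receptor_cutoff    # "> 1% staining", read literally
    pr_positive = pr_pct > receptor_cutoff
    if her2_amplified:
        return "HER2+"                         # HER2+ kept as a separate group
    if er_positive:
        if ki67_pct is None:
            return None                        # cannot split ER+ without Ki67
        return "ER+/Ki67-high" if ki67_pct > ki67_cutoff else "ER+/Ki67-low"
    if not pr_positive:
        return "TNBC"                          # ER-/PR-/HER2-
    return None                                # ER-/PR+/HER2- fits no group


print(ihc_subtype(80, 50, False, 30))   # ER+/Ki67-high
```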

[00156] DNA extraction. Tumor DNA was extracted from FFPE tissues and processed for copy number analyses as described previously (Wang et al., 2009). Briefly, 5-10 macrodissected 5-μm tumor sections were pooled and treated three times with proteinase K in ATL Tissue Lysis Buffer (Qiagen, Valencia, CA). Following lysis, the samples were applied to uncoated Argylla Particles™ (Argylla Technologies, Tucson, AZ) and processed according to the manufacturer's recommendations (available on the World Wide Web at argylla.com).

[00157] Molecular inversion probe-based arrays for copy number measurement. DNA from FFPE tumor and a 10% matched normal sample was prepared (10 ng/μL) and shipped to the Affymetrix™ MIP laboratory for copy number measurement. The laboratory was blinded to all sample and subject information, including the identity of duplicates. The MIP assay has been described in detail (Wang et al., 2009; Hardenbol et al., 2005; Hardenbol et al., 2003; Wang et al., 2005; Ji et al., 2006), including platform validation using representative but independent samples from the ESBCR (Wang et al., 2005). Data quality was assessed using the sample two-point relative standard error (2p-RSE), as previously described (Wang et al., 2007). The majority (95%) of FFPE tumor samples applied to the MIP arrays passed the 2p-RSE threshold.

[00158] Determination of copy number change. DNA copy number differences were analyzed using AsCNAR software (http://genome.umin.jp) for SNP mapping data. Data collected from matched normal samples were used to normalize the copy number data. For each sample, full-genome MIP quantifications (330K MIPs) were generated. To reduce the data dimension, the running median within groups of 25 consecutive MIPs was computed, yielding 13,175 data points per sample. The Circular Binary Segmentation algorithm (Olshen et al., 2004) was used to convert the data to a list of segments for each sample. Copy number differences were analyzed with the R package DNAcopy (www.r-project.org) using thresholds of 2.5 for one copy gained and 1.5 for one copy lost. The parameter alpha (significance level for acceptance of change-points) used in the segmentation algorithm was set to 0.01. Consecutive segments were combined if their gain/loss calls agreed for at least 99.5% of the samples. This procedure yielded 1,593 segments representing the entire genome. Comparisons of copy number patterns across different demographic, clinical, and tumor subtype groups were performed by Fisher's exact test, chi-square test, or Wilcoxon rank-sum test, as appropriate, with random permutations of the samples to incorporate a false discovery rate adjustment for multiple comparisons.
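Three of the data-reduction steps above (medians over groups of 25 consecutive MIPs, gain/loss calls at copy number 2.5 and 1.5, and merging of neighbouring segments whose calls agree in at least 99.5% of samples) can be sketched in plain Python. This is not AsCNAR or DNAcopy, and circular binary segmentation itself is not reproduced. The following are assumptions: groups are non-overlapping (330K / 25 ≈ 13,200, close to the 13,175 points reported), threshold boundaries are inclusive, and segments passed to `merge_segments` lie in genomic order on one chromosome. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def block_medians(values: Sequence[float], block: int = 25) -> List[float]:
    """Median of each run of `block` consecutive probe values.

    The final run may be shorter than `block` and is kept as-is.
    """
    return [statistics.median(values[i:i + block])
            for i in range(0, len(values), block)]


def call_copy_number(copy_number: float, gain: float = 2.5,
                     loss: float = 1.5) -> int:
    """+1 for a one-copy gain, -1 for a one-copy loss, 0 for no change."""
    if copy_number >= gain:
        return 1
    if copy_number <= loss:
        return -1
    return 0


def merge_segments(calls: List[List[int]],
                   min_agreement: float = 0.995) -> List[List[int]]:
    """Merge neighbouring segments whose calls agree across samples.

    calls[s][j] is the -1/0/+1 call for segment s in sample j. Returns groups
    of original segment indices; each group becomes one merged segment.
    """
    if not calls:
        return []
    groups = [[0]]
    for s in range(1, len(calls)):
        previous, current = calls[s - 1], calls[s]
        agreement = sum(a == b for a, b in zip(previous, current)) / len(current)
        if agreement >= min_agreement:
            groups[-1].append(s)     # calls agree in >= 99.5% of samples
        else:
            groups.append([s])       # start a new merged segment
    return groups
```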

[00159] Development of prediction models with copy number data. The entire sample was randomly split into two groups: 75% (n=728) for training and 25% (n=243) for testing. The primary endpoint of the study was time-to-breast cancer recurrence, defined as the occurrence of local lymph node or breast recurrence; metastasis to contralateral breast, chest wall or other sites; or second primary breast cancer. Patients not known to have a recurrence at the date of last contact were censored. Univariate Cox proportional hazards regression models were used to evaluate the associations between tumor characteristics and treatment variables and time-to-recurrence. [00160] To integrate information on copy number, the CoxBoost algorithm was applied for fitting a Cox proportional hazards model with high-dimensional covariates to select CNIs relevant to recurrence (Binder and Schumacher, 2008). It is important to note that 100 iterations were arbitrarily chosen, which yielded 19 CNI markers that were used throughout model building. Next, a backwards elimination procedure was used to fit a multivariate Cox proportional hazards model with clinical covariates, considering those that were associated with time-to-recurrence in univariate analysis. Finally, the selected CNIs and clinical covariates from the above two steps were combined with tumor subtype and backward elimination was applied with Cox proportional hazards modeling to derive the final multivariate model for recurrence. Internal validation of the final multivariate model was performed to confirm that results were not spurious and to assess the performance of the resulting models with respect to potential overfitting. Specifically, for the training data set, prediction performance was evaluated using bootstrap .632+ estimates of prediction error curves. 
To assess model performance, the concordance index (C-index) (Therneau and Grambsch, 2000) was used to compare the strength of the various models by fitting the same multivariate models to the test set. The C-index estimates were also used to compare differences between the individual models using the two-sample t-test.
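Harrell's C-index, a common form of the concordance index, is the fraction of usable patient pairs in which the patient with the higher predicted risk has the earlier event. The following is a minimal O(n²) sketch. It assumes that a pair is usable only when the earlier time is an observed event and the two times differ (tied event times are skipped for simplicity), and that tied risk scores count as half-concordant.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i had the event first; j was still at risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no usable pairs")
    return concordant / comparable


# Usage: higher risk scores for earlier recurrences give perfect concordance.
c = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking. The same function can be applied to the test-set predictions of each model.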

[00161] Creation of risk group classifiers. The coefficients of the Cox model fitted to the training data (n=723), which included the 19 markers each coded as -1, 0, or +1, were used to define three groups: intermediate risk (tumors that show no event for any of the 19 markers, risk index = 0), high risk (tumors with risk index > 0), and low risk (tumors with risk index < 0).
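The risk-index rule above can be sketched as a weighted sum of marker states. This sketch assumes that each marker is coded -1 (loss), 0 (no change) or +1 (gain) and weighted by its Cox coefficient. The coefficients in the usage lines are hypothetical placeholders, not the fitted values from the training data.

```python
from typing import Sequence


def risk_index(coefs: Sequence[float], states: Sequence[int]) -> float:
    """Sum of Cox coefficients times marker states coded -1, 0 or +1."""
    if len(coefs) != len(states):
        raise ValueError("need exactly one state per marker")
    if any(s not in (-1, 0, 1) for s in states):
        raise ValueError("marker states must be -1, 0 or +1")
    return sum(b * s for b, s in zip(coefs, states))


def risk_group(coefs: Sequence[float], states: Sequence[int]) -> str:
    """Map the risk index to the three groups defined in the text."""
    score = risk_index(coefs, states)
    if score > 0:
        return "high"
    if score < 0:
        return "low"
    return "intermediate"  # e.g. no copy-number event in any marker


coefs = [0.8, -0.5, 0.3]  # hypothetical coefficients for three markers
print(risk_group(coefs, [0, 0, 0]))   # intermediate
print(risk_group(coefs, [1, 0, 0]))   # high (index 0.8)
print(risk_group(coefs, [0, 1, -1]))  # low (index -0.5 - 0.3 = -0.8)
```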

[00162] Time-dependent ROC curves for recurrence. The discrimination potential of these models (clinical-only, markers-only, and clinical + markers) was summarized by calculating ROC curves for cumulative recurrence incidence at 5 and 10 years (Heagerty et al., 2000). A ROC curve is the plot of sensitivity versus 1 - specificity of the dichotomized test X > c over all possible values of c, where X is a risk indicator. A time-dependent ROC curve can be produced by estimating time-dependent sensitivity and specificity:

$$\mathrm{Sensitivity}(c, t) = P\{X > c \mid D(t) = 1\}$$

$$\mathrm{Specificity}(c, t) = P\{X \le c \mid D(t) = 0\}$$

where D(t) is 1 if an event (recurrence) occurred up to time t, and 0 otherwise. For each of the three models, the log-hazard values estimated by the Cox model were used as the risk indicator X for the ROC curve computation. The R package survivalROC was used (via http://cran.r-project.org/web/packages/survivalROC/index.html).
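The definitions above can be turned into a simple empirical estimate at a fixed horizon t. The sketch below is a deliberate simplification and not the estimator in survivalROC, which uses Kaplan-Meier or nearest-neighbour weighting to handle censoring. Here, patients with a recurrence by t are treated as cases and patients still followed and event-free after t as controls, while patients censored before t are simply dropped. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def cases_and_controls(t: float, times: Sequence[float], events: Sequence[int],
                       marker: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split marker values into cases (D(t) = 1) and controls (D(t) = 0) at horizon t."""
    cases = [x for x, ti, e in zip(marker, times, events) if e == 1 and ti <= t]
    controls = [x for x, ti in zip(marker, times) if ti > t]
    # Subjects censored at or before t (e == 0, ti <= t) have unknown D(t) and are dropped.
    return cases, controls


def sensitivity_specificity(c: float, t: float, times: Sequence[float],
                            events: Sequence[int],
                            marker: Sequence[float]) -> Tuple[float, float]:
    cases, controls = cases_and_controls(t, times, events, marker)
    sens = sum(x > c for x in cases) / len(cases)
    spec = sum(x <= c for x in controls) / len(controls)
    return sens, spec


def auc_at(t: float, times: Sequence[float], events: Sequence[int],
           marker: Sequence[float]) -> float:
    """Area under the time-dependent ROC curve: P(X_case > X_control), ties count 1/2."""
    cases, controls = cases_and_controls(t, times, events, marker)
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else (0.5 if a == b else 0.0)
    return total / (len(cases) * len(controls))


# Usage: hypothetical follow-up times (years), recurrence indicators, log-hazards.
times = [1, 2, 6, 7, 3]
events = [1, 1, 0, 1, 0]
log_hazard = [0.9, 0.7, 0.2, 0.1, 0.5]
print(sensitivity_specificity(0.8, 5, times, events, log_hazard))
print(auc_at(5, times, events, log_hazard))
```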

Example 2 - Copy Number Imbalances Between Screen and Symptom-Detected Breast Cancers and Impact on Disease-free Survival

[00163] Characteristics of the ESBCR Patient Population. As shown in Table 6, the majority of breast tumors were detected as a result of symptoms (70%) rather than screening (30%). Among screen-detected tumors, 13.8% were diagnosed between 1985 and 1989 and 54.6% between 1995 and 2000. As anticipated, screen-detected tumors were more likely than symptom-detected tumors to be smaller (p<0.001), lymph node negative (p<0.001) and of low histological grade (p<0.001). Screen-detected tumors were also more likely to be luminal A (51%), whereas symptom-detected tumors were more likely to be Her2-positive (19%) and triple negative (21%). Consequently, screen-detected tumors were more likely to be treated with adjuvant endocrine therapy, and symptom-detected tumors were more likely to be treated with adjuvant chemotherapy.

Table 6. Patient and Clinical Characteristics of Study Population

Factor Screen-detected (N=247), n (%) Symptom-detected (N=603), n (%)

Year at diagnosis p-value<.001

1985-1989 34 (13.8) 169 (28)

1990-1994 75 (30.4) 192 (31.8)

>=1995 135 (54.6) 226 (37.5)

Missing 3 (1.2) 16 (2.7)

Age at diagnosis p-value=0.005

40-49 55 (22.3) 201 (33.3)

50-59 82 (33.2) 188 (31.2)

60-70 60 (24.3) 130 (21.6)

>70 50 (20.2) 84 (13.9)

Ethnicity p-value=0.128

White 193 (78.1) 434 (72)

Black 23 (9.4) 85 (14)

Hispanic 30 (12.1) 77 (12.8)

Other 1 (0.4) 7 (1.2)

Tumor size (cm) p-value<.001

<=2 193 (78.1) 300 (49.8)

2-5 47 (19.1) 260 (43.1)

>5 1 (0.4) 16 (2.7)

Missing 6 (2.4) 27 (4.4)

Nodal status p-value=.061

Negative 160 (64.8) 342 (56.7)

Positive 84 (34) 244 (40.5)

Missing 3 (1.2) 17 (2.8)

Nuclear grade p-value=0.002

I 35 (14.2) 52 (8.6)

II 131 (53) 292 (48.4)

III 62 (25.1) 217 (36)

Missing 19 (7.7) 42 (7)

Histology p-value=0.22

Invasive ductal 225 (91.1) 562 (93.2)

Invasive lobular 20 (8.1) 35 (5.8)

Other 2 (0.8) 6 (1)

Tumor subtype p-value<.001

Luminal A 127 (51.4) 214 (35.5)

Luminal B 35 (14.3) 104 (17.2)

Her2 positive 29 (11.7) 114 (18.9)

Triple Negative 28 (11.3) 126 (20.9)

Missing 28 (11.3) 45 (7.5)

Chemotherapy p-value<.001

None 170 (68.8) 290 (48.1)

Anthracycline 40 (16.2) 205 (34)

Anthracycline + Taxane 27 (10.9) 72 (11.9)

Other 4 (1.7) 13 (2.2)

Missing 6 (2.4) 23 (3.8)

Hormone therapy p-value=.003

Tamoxifen 136 (55.1) 254 (42.1)

Other 3 (1.2) 17 (2.8)

None 107 (43.3) 321 (53.2)

Missing 1 (0.4) 11 (1.8)
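To illustrate how a p-value in Table 6 is obtained, the tumor-size rows can be tested with a Pearson chi-square test of independence. This sketch assumes that the Missing row is excluded, since the text does not say how missing values were handled. It relies on the fact that with 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2). The expected count in the screen-detected >5 cm cell is only about 5, so the large-sample approximation is borderline for that cell.

```python
import math
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Tumor size rows of Table 6; columns are (screen-detected, symptom-detected).
tumor_size = [
    [193, 300],  # <=2 cm
    [47, 260],   # 2-5 cm
    [1, 16],     # >5 cm
]
chi2 = pearson_chi_square(tumor_size)
dof = (len(tumor_size) - 1) * (len(tumor_size[0]) - 1)  # = 2
p_value = math.exp(-chi2 / 2)  # exact chi-square tail probability when dof == 2
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

The statistic is about 56, and p is far below 0.001. This is consistent with the p-value <.001 reported for tumor size.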

[00164] Copy Number Imbalances in Screen-detected and Symptom-detected Breast Cancers. CNIs involving 22 segments on 5 chromosomes (Immonen-Raiha et al., 2005; Shen et al., 2005; Friedrich et al., 2009; Letessier et al., 2006; Olshen et al., 2004) that were statistically significantly associated with method of breast cancer detection were identified (Table 7). These 22 segments passed a false discovery rate threshold of 0.05. Copy number gains were observed more frequently than losses in the segments on these five chromosomes. Recurrent copy number gains in the chromosomal regions 3q29, 8q24.13 and 20q13.13-q13.32 were significantly more common among symptom-detected than screen-detected tumors (p-value <0.0001). Ingenuity Pathway Analysis was used to identify breast cancer-related gene functions corresponding to the segments identified in this study (Ingenuity® Systems, world wide web via ingenuity.com).
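The segments above were selected at a 0.05 false discovery rate, estimated with sample permutations. As a simpler illustration of what an FDR cut-off does, the sketch below uses the Benjamini-Hochberg step-up adjustment instead. Benjamini-Hochberg is a swapped-in standard method, not the permutation-based procedure used in the study, and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


segment_p = [0.01, 0.04, 0.03, 0.5]  # hypothetical per-segment p-values
q = benjamini_hochberg(segment_p)
selected = [k for k, qk in enumerate(q) if qk <= 0.05]  # segments passing FDR 0.05
```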

Table 7. Copy Number Imbalances Associated with Method of Breast Cancer Detection

(p<0.0001)

[00165] The Effect of Clinical Variables and CNIs on Breast Cancer-Specific Survival. Table 8 shows the HR for the association between method of breast cancer detection and breast cancer disease-free survival, first unadjusted and then adjusted for each clinical factor individually. In the unadjusted analysis, patients with screen-detected breast tumors had a statistically significant 34% improvement in disease-free survival compared with patients with symptom-detected tumors. The HR for the effect of screening on disease-free survival was attenuated from 0.66 (95% CI, 0.50-0.88) to 0.88 (95% CI, 0.63-1.21) after adjustment for age at diagnosis, tumor size and nodal status. The HR was further attenuated to 0.97 (95% CI, 0.69-1.35) with the inclusion of the 5 CNIs. The Freedman estimate of the proportion of the survival advantage associated with screen detection that was attributable to the clinical variables (age at diagnosis, tumor size, nodal status) was 68% (p=.02). Including the 5 CNIs increased the Freedman statistic to 93% (p=.01). The Freedman estimate of the proportion of the disease-free survival advantage from screen detection attributable to the 5 CNIs alone was 20%.

Table 8. Attenuation of the Association between Method of Detection and Disease-free Survival After Adjusting for Clinical Variables and Copy Number Imbalances (CNIs)

Methods

[00166] Patient Population. The Early Stage Breast Cancer Repository (ESBCR) is a retrospective cohort of 2,409 women diagnosed with American Joint Committee on Cancer pathologic stage I or II breast cancer and surgically treated at MD Anderson Cancer Center between 1985 and 2000. Criteria for eligibility and cohort details have been previously described (Brewster et al., 2007). Briefly, detailed clinical information, including patient age, stage, tumor size, lymph node status, nuclear grade, estrogen receptor (ER) and progesterone receptor (PR) status, and primary treatment (surgery, radiation therapy, chemotherapy and endocrine therapy), was abstracted from medical charts. Patients' formalin-fixed paraffin-embedded (FFPE) breast tumor specimens were accessed from the MD Anderson Cancer Center Department of Pathology Tumor Bank. Screen-detected tumors were defined as those detected via a screening mammogram. Symptom-detected breast tumors were those identified through patient-reported symptoms, including palpable breast lump, breast pain or nipple discharge. Follow-up information for patients in the ESBCR is obtained by direct review of the medical records and by linkage to the MD Anderson Cancer Center Tumor Registry, which mails annual follow-up letters to each registered patient known to be alive to determine their clinical status. For patients who fail to respond to the letters, the Tumor Registry checks the Social Security Death Index and the Texas Bureau of Vital Statistics.

[00167] The ESBCR patients selected for the MIP assay of DNA copy number were enriched to include all African American (n=196) and Hispanic (n=208) patients, plus a random sample of Caucasian patients oversampled for recurrences (n=808). A total of 241 patients were excluded from the analyses because of insufficient tumor for DNA extraction, DNA extraction failure, or MIP assay failure. Women younger than 40 years at the time of breast cancer diagnosis were also excluded (n=121). A total of 850 patients were included in the final study analyses.

[00168] Definition of Tumor Subtypes. Tumor subtypes were approximated from clinically validated immunohistochemical (IHC) analyses of ER, PR, HER2, and Ki67 (Cheang et al., 2009). ER and PR were interpreted as positive when >5% of the tumor nuclei were positive. ER and PR status were obtained from two sources: medical records (primary source) and tissue microarray (secondary source). Agreement between the two sources was 84.8% for ER status and 76.4% for PR status. MIP array-based HER2 copy number proved superior to IHC (area under the receiver operating characteristic curve, 0.94) and equivalent to fluorescence in situ hybridization (FISH) for detecting HER2 gene amplification. Thus, MIP copy number was used to determine HER2 amplification in cases that were ambiguous by IHC (2+) and for which FISH data were unavailable. Tumors were classified into approximated subtypes based on IHC and FISH results: luminal A (ER-positive or PR-positive, Her2-negative, Ki67 <17%), luminal B (ER-positive or PR-positive, Her2-negative, Ki67 >17%), Her2-positive (ER-negative and PR-negative, Her2-positive) and triple negative (ER-negative, PR-negative and Her2-negative).
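The classification rules above can be written as a small function. This is a sketch under stated assumptions. ER and PR are given as percentages of positive tumor nuclei, and HER2 status has already been resolved (from IHC, FISH or MIP copy number, as described). A Ki67 value of exactly 17% is treated as high, because the text leaves the boundary unspecified. ER/PR-positive, HER2-positive tumors, which none of the four listed rules covers, are returned as None. The helper name is hypothetical.

```python
from typing import Optional

RECEPTOR_CUTOFF_PCT = 5.0  # ER/PR positive when > 5% of tumor nuclei stain
KI67_CUTOFF_PCT = 17.0     # Ki67 cut-off quoted in the text


def approximate_subtype(er_pct: float, pr_pct: float, her2_positive: bool,
                        ki67_pct: float) -> Optional[str]:
    """Approximate the breast tumor subtype from IHC/FISH results (hypothetical helper)."""
    hr_positive = er_pct > RECEPTOR_CUTOFF_PCT or pr_pct > RECEPTOR_CUTOFF_PCT
    if hr_positive and not her2_positive:
        # Assumption: Ki67 exactly at the cut-off is treated as high (luminal B).
        return "luminal A" if ki67_pct < KI67_CUTOFF_PCT else "luminal B"
    if not hr_positive and her2_positive:
        return "Her2-positive"
    if not hr_positive and not her2_positive:
        return "triple negative"
    return None  # ER/PR-positive and Her2-positive: not covered by the four rules


print(approximate_subtype(er_pct=60, pr_pct=0, her2_positive=False, ki67_pct=10))
```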

[00169] Tumor DNA Extraction. DNA was extracted from FFPE tissues and processed for copy number analyses as described previously (Wang et al., 2009). Briefly, 5-10 macrodissected tumor sections (5 μm) were pooled and treated three times with proteinase K in ATL Tissue Lysis Buffer (Qiagen, Valencia, CA). Following lysis, the samples were applied to uncoated Argylla Particles™ (Argylla Technologies, Tucson, AZ) and processed according to the manufacturer's recommendations (http://www.argylla.com).

[00170] Molecular Inversion Probe Based Arrays for Copy Number Measurement. DNA from FFPE tumor and a 10% matched normal sample was prepared (10 ng/μL) and shipped for copy number measurement to the Affymetrix™ MIP laboratory, which was blinded to all sample information, including matched normals and duplicates. The MIP assay has been described in detail (Wang et al., 2009; Hardenbol et al., 2005; Hardenbol et al., 2003; Wang et al., 2005; Ji et al., 2006), including platform validation using representative but independent samples from the ESBCR (Wang et al., 2005). Data quality was assessed using the sample two-point relative standard error (2p-RSE) (Wang et al., 2007). The majority (95%) of FFPE tumor samples applied to the MIP arrays passed the 2p-RSE threshold.

[00171] Determination of Copy Number Change. DNA copy number differences were analyzed using AsCNAR software (http://genome.umin.jp) for SNP mapping data. Data collected from matched normal samples were used to normalize the copy number data. For each sample, full-genome MIP quantifications (330K MIPs) were generated. To reduce the data dimension, the running median was computed within groups of 25 consecutive MIPs, yielding 13,175 data points per sample. Circular Binary Segmentation (Olshen et al., 2004) was used to convert the data to a list of segments for each sample. DNA copy number differences were analyzed with the R package DNAcopy (www.r-project.org) using thresholds of 2.5 for one copy gained and 1.5 for one copy lost. The parameter alpha (significance level for acceptance of change-points) used in the segmentation algorithm was set to 0.01. Consecutive segments were combined if their gain/loss calls agreed for at least 99.5% of the samples. This procedure yielded 1,593 segments, representing the entire genome.
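The segment-merging rule (combine consecutive segments whose gain/loss calls agree in at least 99.5% of samples) can be sketched as follows. This sketch rests on several assumptions that the text does not state. Calls are coded -1/0/+1, each segment is compared with the immediately preceding one, and segments are never merged across chromosome boundaries. The function name is hypothetical.

```python
from typing import List, Sequence


def merge_segments(calls: Sequence[Sequence[int]], chroms: Sequence[str],
                   min_agreement: float = 0.995) -> List[List[int]]:
    """Group consecutive segment indices whose per-sample calls agree often enough.

    calls[s][k] is the call (-1, 0 or +1) of segment s in sample k;
    chroms[s] is the chromosome of segment s.
    """
    groups: List[List[int]] = []
    for s in range(len(calls)):
        if groups and chroms[s] == chroms[s - 1]:
            prev, cur = calls[s - 1], calls[s]
            agreement = sum(a == b for a, b in zip(prev, cur)) / len(cur)
            if agreement >= min_agreement:
                groups[-1].append(s)  # extend the current merged segment
                continue
        groups.append([s])  # start a new merged segment
    return groups


# Usage: 1,000 hypothetical samples; segments 0 and 1 disagree in only 3 samples.
seg_a = [0] * 1000
seg_b = [0] * 997 + [1] * 3
seg_c = [1] * 1000
print(merge_segments([seg_a, seg_b, seg_c], ["chr1", "chr1", "chr1"]))
```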

[00172] Statistical Methods. Frequency tables of patients' clinical characteristics by method of detection were compared using the chi-square test. Fisher's exact test (two-tailed) with random permutations of the samples was used to compare CNIs between tumors detected by screening and those detected by symptoms. The results were adjusted for multiple comparisons using a false discovery rate (Westfall and Young, 1993). Breast cancer disease-free survival was calculated from the date of diagnosis to the primary endpoint of the study, defined as the occurrence of local lymph node or breast recurrence, metastasis to the contralateral breast, chest wall or other sites, or breast cancer-related death. Patients not known to have a breast cancer event at the date of last contact were censored. A Cox proportional hazards regression model was used to estimate the hazard ratio (HR) and 95% confidence interval (CI), first for the association between method of detection and disease-free survival and then adjusted for each of the clinical variables (age at diagnosis, year at diagnosis, ethnicity, tumor size, lymph node status, treatment, tumor subtype, nuclear grade, histology, and the CNIs found to differ between screen- and symptom-detected tumors). The method of Freedman et al. was used to calculate the percentage of the effect of method of detection on disease-free survival accounted for by these variables (Freedman et al., 1992). The Freedman estimate is defined as $P = 100\,(1 - a/b)$, where $a$ is the adjusted and $b$ the unadjusted logarithm of the hazard ratio. The null hypothesis is that the Freedman statistic equals 0%, meaning that the adjustment makes no difference. Significance at the 0.05 level implies that adjusting for the variable significantly explains part of the association between method of detection and breast cancer disease-free survival.
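Plugging the hazard ratios reported in paragraph [00165] into the Freedman formula gives a quick check of the reported percentages. This sketch uses the rounded HRs as printed, so it reproduces the reported values only approximately.

```python
import math


def freedman_percent(hr_unadjusted: float, hr_adjusted: float) -> float:
    """Freedman P = 100 * (1 - a/b), with a, b the adjusted and unadjusted log HRs."""
    a = math.log(hr_adjusted)
    b = math.log(hr_unadjusted)
    return 100.0 * (1.0 - a / b)


clinical_only = freedman_percent(0.66, 0.88)  # age, tumor size, nodal status
clinical_cnis = freedman_percent(0.66, 0.97)  # the same variables plus the 5 CNIs
print(round(clinical_only), round(clinical_cnis))  # 69 93
```

The first value (about 69%) is close to, but not exactly, the reported 68%, presumably because the published HRs are rounded to two decimals. The second value matches the reported 93%.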

Example 3 - Brain metastasis survival analysis of breast cancer patients

[00173] A total of 961 breast cancer subjects had complete brain metastasis information and MIP data (Table 9). Effects of clinical factors on the survival of these patients were assessed with univariable Cox proportional hazards models (Table 10). Of the nine candidate clinical factors, only tumor subtype was selected by the forward, backward, or stepwise selection methods (overall C-index = 0.91). Two copy number imbalance markers, 328 and 813 (3q29 and 10p13; Table 11), were first identified as predictors of metastasis to the brain. Copy number frequencies of markers 328 and 813 are shown in Table 12. Kaplan-Meier (KM) curves for the two selected markers are shown in FIG. 8. Loss of these two markers was associated with a higher survival rate. Additional markers were identified as the number of iteration steps was increased; these are listed in Table 13.
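The Kaplan-Meier curves referred to above use the product-limit estimator. The following is a minimal sketch, with event = 1 for brain metastasis and 0 for censoring. The toy data are hypothetical and the function name is illustrative. Curves for tumors with and without a given marker change would be obtained by calling it separately on each group.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (event time, survival just after that time)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # Gather everyone whose follow-up ends at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # subjects censored at t still count as at risk at t
    return curve


# Usage: hypothetical follow-up times (years) and brain-metastasis indicators.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```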

Table 9 Patient information

Frequency Percent

Tumor size

missing 34

Nuclear grade

I-II 564 62.88

III 333 37.12

missing 64

Stage

I 304 31.63

II 657 68.37

Race

White 707 74.19

Black 124 13.01

Hispanic 122 12.80

missing 8

Radiation

Yes 408 43.54

No 529 56.46

missing 24

Endocrine

Yes 420 44.87

No 516 55.13

missing 25

Chemotherapy

No chemo 476 52.31

Anthracycline 320 35.16

Anthracycline & Taxane 114 12.53

missing 51

Lymph node

Negative 562 59.79

Positive 378 40.21

missing 21

Table 10 Univariable Cox proportional hazards models of clinical factors

Factor P-value Characteristic Hazard ratio 95% CI

Tumor size 0.0307 <2cm(ref.) 1

(34 missing) >=2cm 1.926 1.063 3.489

Nuclear grade 0.0662 I & II(ref.) 1

(64 missing) III 1.764 0.963 3.231

Stage 0.0477 II(ref.) 1

I 0.479 0.231 0.993

Race 0.1024 White(ref.) 1

(8 missing) Black 2.182 1.063 4.480

Hispanic 1.333 0.553 3.215

Radiation therapy 0.5465 No(ref.) 1

(24 missing) Yes 1.200 0.664 2.169

Endocrine therapy 0.1320 No(ref.) 1

(25 missing) Yes 0.605 0.314 1.163

Chemotherapy 0.2624 No(ref.) 1

(51 missing) Anthracycline 1.324 0.695 2.524

Anthracycline & taxane 2.051 0.854 4.927

Lymph node 0.8466 Negative(ref.) 1

(21 missing) Positive 1.061 0.581 1.94

Table 11. Start and Stop Boundaries and Genes of the two CNIs associated with brain metastasis of breast cancer patients.

Marker 328 (chromosome 3, cytoband q29): MUC20, MUC4, SDHALP2

Marker 813 (chromosome 10, cytoband p13; start 12450045, stop 16084813): CAMK1D, CCDC3, OPTN, C10orf49, MCM10, PHYH, BEND7, SEPHS1, PRPF18, FRMD4A, FAM107B, ARMETL1, HSPA14, DCLRE1C, MEIG1, SUV39H2, ACBD7, C10orf111, NMT2, OLAH, RPP38, FAM171A1, ITGA8, C10orf97

Table 12 Copy number frequencies of the identified markers 328 and 813

Table 13 Markers that predict brain metastasis

Iteration step=50

Risk of Gain: Found in Example 1

Marker 648 chr8 p22 Yes

Marker 650 chr8 p22 Yes

Marker 888 chr11 p12 No (11p15.1-p15.2)

Marker 1013 chr12 q13.13 Yes

Marker 1137 chr14 q32.12, q32.13, q32.2, q32.31 No (14q13.2-13.3)

Marker 1323 chr17 q23.1, q23.2 No (17q21.33)

[00174] A Cox proportional hazards regression model was used to estimate the hazard ratio (HR) and 95% confidence interval (CI) for the association of the selected markers, alone or in combination with selected clinical factors, with brain metastasis. The multivariable model treating the two selected markers as continuous variables is shown in Table 14. The model adjusted by adding the selected clinical factors is shown in Table 15.

Table 14 The multivariable model only containing the four markers (by R)

Table 15 The multivariable model containing both of the selected clinical factor and markers

[00175] When tumor subtype was retained in the model and a stepwise selection method was applied to the 24 markers in Table 13, ten markers were selected (Table 16). Compared with Example 1, seven new markers were found.

Table 16 The multivariable model for the ten markers

Markers (n=881) HR with 95% CI p-value Found in Example 1

Marker 948 (11q21) 8.778(2.619,29.413) 0.0004 No

Marker 1047 (13q12.11-q12.12) 7.134(1.893,26.883) 0.0037 No

Marker 1114 (14q21.1) 5.194(2.227,12.117) 0.0001 No

Marker 1443 (19q13.33, q13.41, q13.42, q13.43) 3.651(1.299,10.261) 0.0140 No

Marker 1591 (Xq28) 3.906(1.529,9.975) 0.0044

Marker 364 (4q21.1, q21.21) 0.087(0.021,0.364) 0.0008 No

Marker 888 (11p12) 0.227(0.077,0.675) 0.0076 No

Marker 1013 (12q13.13) 0.505(0.170,1.500) 0.2189

Marker 1323 (17q23.1, q23.2) 0.311(0.099,0.974) 0.0450 No

Subtypes 0.0203

ER⁺, Ki67 low (ref.) 1

ER⁺, Ki67 high 4.659(1.753,12.382)

Her2⁺ 3.211(1.131,9.114)

TNBC 2.675(1.011,7.080)

C-index=0.847

* * *

[00176] All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

REFERENCES

The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.

U.S. Patent 4,683,202

American Cancer Society, In: Cancer Facts & Figures, 2007.

Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, N.Y., 1995.

Bergamaschi et al., Genes Chromosomes Cancer, 45:1033-1040, 2006.

Binder and Schumacher, BMC Bioinformatics, 9:14, 2008.

Brewster et al., J. Clin. Oncol., 25(28):4438-44, 2007.

Celebiler Cavusoglu et al., Cancer Sci., 100:2341-2345, 2009.

Cheang et al., J. Natl. Cancer Inst., 101:736-50, 2009.

Chin et al., Cancer Cell, 10:529-541, 2006.

Freedman et al., Stat. Med., 11:167-78, 1992.

Friedrich et al., Anal. Quant. Cytol. Histol., 31:101-8, 2009.

Gotzsche and Nielsen, Cochrane Database Syst. Rev., (4):CD001877, 2009.

Hardenbol et al., Genome Res., 15:269-275, 2005.

Hardenbol et al., Nat. Biotechnol., 21:673-678, 2003.

Heagerty et al., Biometrics, 56:337-344, 2000.

Honeth et al., Breast Cancer Res., 10:R53, 2008.

Immonen-Raiha et al., Cancer, 103:474-82, 2005.

Innis et al., In: PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, Calif., 1990.

Ji et al., Cancer Res., 66:7910-7919, 2006.

Kosters and Gotzsche, Cochrane Database Syst. Rev., (2):CD003373, 2003.

Lacroix, Endocr. Relat. Cancer, 13(4):1033-1067, 2006.

Letessier et al., BMC Cancer, 6:245, 2006.

Mani et al., Cell, 133:704-15, 2008.

Nielsen et al., Clin. Cancer Res., 10:5367-5374, 2004.

Oakman et al., Cancer Treatment Rev., 35:662-667, 2009.

Olshen et al., Biostatistics, 5:557-572, 2004.

Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Lab. Press, 2001.

Shen et al., J. Natl. Cancer Inst., 97:1195-203, 2005.

Sotiriou and Piccart, Nat. Rev. Cancer, 7:545-553, 2007.

Therneau and Grambsch, In: Modeling Survival Data: Extending the Cox Model, Springer-Verlag, 2000.

Wang et al., BMC Med. Genomics, 2:8, 2009.

Wang et al., Genome Biol., 8:R246, 2007.

Wang et al., Nucleic Acids Res., 33:e183, 2005.

Westfall and Young, In: Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, NY, Wiley, 1993.

Claims

1. A method for providing a report for use in evaluating a patient determined to have a breast cancer, comprising: assaying for a copy number imbalance (CNI) of one or more biomarkers in a breast cancer sample of the patient, wherein the biomarkers comprise: one or more genomic regions selected from the group consisting of 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33, or two or more genomic regions selected from the group consisting of 8p22, 11q13.5, 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, 17q21.33, 16p11.2, 10p13, 12p13, 20q13, and Xq28; and providing a report that the patient has an increased or decreased CNI with respect to the assayed biomarkers.

2. The method of claim 1, wherein the one or more biomarkers comprise one or more genomic regions selected from the group consisting of 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33.

3. The method of claim 2, wherein the one or more biomarkers comprise four genomic regions including 8p22, 11q13.5, 22q11.1-q11.21, and Xp21.1-p21.2.

4. The method of claim 1, wherein the one or more biomarkers comprise two or more genomic regions selected from the group consisting of 8p22, 11q13.5, 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, 17q21.33, 16p11.2, 10p13, 12p13, 20q13, and Xq28.

5. The method of claim 1, wherein the one or more biomarkers comprise the genomic region 17q21.23.

6. The method of claim 1, further comprising generating a prognostic value of the patient from determination of the number of copies of two or more of the genomic regions.

7. The method of claim 6, wherein the one or more biomarkers comprise ten or more of the genomic regions.

8. The method of claim 1, wherein the breast cancer is at stage 0, I or II.

9. The method of claim 1, wherein assaying for a copy number imbalance comprises single nucleotide polymorphism (SNP) analysis or the use of an array.

10. The method of claim 1, wherein the breast cancer sample is a preserved sample.

11. A method of providing a prognosis of a breast cancer patient, comprising: obtaining a report that the patient has an increased or decreased copy number imbalance (CNI) with respect to one or more biomarkers, wherein the biomarkers comprise: one or more genomic regions selected from the group consisting of 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33, or two or more genomic regions selected from the group consisting of 8p22, 11q13.5, 22q11.1-11.21, Xp21.1-21.2, 1p12, 12q13.13, 13q12.3, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, 17q21.33, 16p11.2, 10p13, 12p13, 20q13, and Xq28; providing a poor breast cancer prognosis if the breast cancer sample has a decrease in the number of copies selected from the group consisting of 8p22, 22q11.1-q11.21 and Xp21.1-p21.2 and an increase in the number of copies of 11q13.5; providing a favorable breast cancer prognosis if the breast cancer sample has an increase in the number of copies selected from the group consisting of 8p22, 22q11.1-q11.21 and Xp21.1-p21.2 and a decrease in the number of copies of 11q13.5, compared to the number of copies in a control sample; providing a poor breast cancer prognosis if the breast cancer sample has two or more high-risk events of a decrease in the number of copies of 8p22, 22q11.1-q11.21, Xp21.1-p21.2, 1p12, 12q13.13 and 13q12.3, and an increase in the number of copies of 10p13, 11q13.5, 12p13, 16p11.2, 20q13, Xp28, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33; providing a favorable breast cancer prognosis if the breast cancer sample has two or more low-risk events of an increase in the number of copies of 8p22, 22q11.1-q11.21, Xp21.1-p21.2, 1p12, 12q13.13 and 13q12.3, and a decrease in the number of copies of 10p13, 11q13.5, 12p13, 16p11.2, 20q13, Xp28, 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-13.3, and 17q21.33; providing a poor prognosis if the breast cancer sample has a decrease in the number of copies of 17q21.23; or providing a favorable prognosis if the breast cancer sample has an increase in the number of copies of 17q21.23, compared to the number of copies in a control sample; and reporting the prognosis to the patient.

12. The method of claim 11, further comprising classifying a patient as low-risk, intermediate-risk (no copy number changes in the biomarkers) or high-risk using the prognostic values.

13. The method of claim 11, wherein the breast cancer is determined to be Her2-positive (Her2⁺), estrogen receptor-negative (ER⁻), high Ki67 (Ki67^high), ER⁺/Ki67^high, or triple negative (ER⁻/PR⁻/Her2⁻).

14. The method of claim 11, wherein the prognosis predicts a risk of breast cancer recurrence, response to cancer treatments, or metastasis risk.

15. The method of claim 1, wherein the prognostic variable information further comprises clinical information of patient age, tumor stage, tumor size, tumor subtypes, lymph node status, nuclear grade, ER status, progesterone receptor (PR) status, or primary treatment status.

16. A method for providing a report for use in predicting risk of metastasis to the brain in a patient determined to have a breast cancer, comprising: assaying for a copy number imbalance of one or more biomarkers in a breast cancer sample of the patient, wherein the biomarkers comprise one or more genomic regions selected from the group consisting of 3p29, 6p22.3, 6p23, 10p14, 10p13, 11p13, 11q13.1, 11q13.5, 11q21, 13q12.11, 13q12.12, 14q12, 14q21.1, 19q13.11, 19q13.33, 19q13.41, 19q13.42, 19q13.43, Xq28, 4p12, 4q12, 4q21.1, 4q21.21, 5q14.2, 5q14.3, 5q15, 5q21.1, 5q21.2, 5q21.3, 5q22.1, 5q22.2, 5q22.3, 5q23.1, 5q23.2, 5q23.3, 8p22, 11p12, 12q13.13, 14q32.12, 14q32.13, 14q32.2, 14q32.31, 17q23.1 and 17q23.2; and providing a report that the patient has an increased or decreased CNI with respect to the assayed biomarkers.

17. The method of claim 16, wherein the biomarkers comprise two genomic regions 3q29 and 10p13.

18. A method of providing a prognosis of a breast cancer patient with respect to risk of metastasis, comprising: obtaining a report that the patient has an increased or decreased copy number imbalance (CNI) with respect to one or more biomarkers, wherein the biomarkers comprise one or more genomic regions selected from the group consisting of 3p29, 6p22.3, 6p23, 10p14, 10p13, 11p13, 11q13.1, 11q13.5, 11q21, 13q12.11, 13q12.12, 14q12, 14q21.1, 19q13.11, 19q13.33, 19q13.41, 19q13.42, 19q13.43, Xq28, 4p12, 4q12, 4q21.1, 4q21.21, 5q14.2, 5q14.3, 5q15, 5q21.1, 5q21.2, 5q21.3, 5q22.1, 5q22.2, 5q22.3, 5q23.1, 5q23.2, 5q23.3, 8p22, 11p12, 12q13.13, 14q32.12, 14q32.13, 14q32.2, 14q32.31, 17q23.1 and 17q23.2; and predicting a high risk of metastasis to the brain if the breast cancer sample has an increase in the number of copies of one or more genomic regions selected from the group consisting of 3p29, 6p22.3, 6p23, 10p14, 10p13, 11p13, 11q13.1, 11q13.5, 11q21, 13q12.11, 13q12.12, 14q12, 14q21.1, 19q13.11, 19q13.33, 19q13.41, 19q13.42, 19q13.43, and Xq28, or the breast cancer sample has a decrease in the number of copies of one or more genomic regions selected from the group consisting of 4p12, 4q12, 4q21.1, 4q21.21, 5q14.2, 5q14.3, 5q15, 5q21.1, 5q21.2, 5q21.3, 5q22.1, 5q22.2, 5q22.3, 5q23.1, 5q23.2, 5q23.3, 8p22, 11p12, 12q13.13, 14q32.12, 14q32.13, 14q32.2, 14q32.31, 17q23.1 and 17q23.2; or predicting a high risk of metastasis to the brain if the breast cancer sample has an increase in the number of copies of 3q29 and 10p13; or providing a low risk of metastasis to the brain if the breast cancer sample has a decrease in the number of copies of 3q29 and 10p13, compared to the number of copies in a control sample; and reporting the prognosis to the patient.

19. A method for providing a report for use in providing a prognosis for a patient determined to have a breast cancer by mammography, comprising: assaying for a copy number imbalance in the number of copies of one or more biomarkers in a breast cancer sample of a patient determined to have a breast cancer by mammography, wherein the biomarkers comprise five genomic regions selected from the group consisting of 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32; and providing a report that the patient has an increased or decreased CNI with respect to the assayed biomarkers.

20. A method of providing a prognosis for a patient determined to have a breast cancer by mammography, comprising: obtaining a report that the patient has an increased or decreased copy number imbalance (CNI) with respect to five genomic regions comprising 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32; providing a poor breast cancer prognosis if the breast cancer sample has an increase in the number of copies of 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32; or providing a favorable breast cancer prognosis if the breast cancer sample has a decrease in the number of copies of 2p11.2, 3q27.1-29, 8q24.13, 11p13, and 20q13.13-13.32, compared to the number of copies in a control sample; and reporting the prognosis to the patient.

21. The method of claim 20, wherein the prognosis report predicts disease-free survival or a response to chemotherapy.

22. The method of claim 20, wherein the prognosis report further comprises a treatment plan comprising chemotherapy for the patient if a poor prognosis is provided.

23. A method of treating a breast cancer patient comprising, obtaining a prognosis report for the patient in accordance with any one of claims 11, 18 or 19 and treating the patient for breast cancer based on said report.