WO2013059152A2 - Methods and kits for selection of a treatment for breast cancer - Google Patents

Methods and kits for selection of a treatment for breast cancer

Info

Publication number
WO2013059152A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
breast cancer
responder
patient
gene expression
Application number
PCT/US2012/060351
Other languages
French (fr)
Other versions
WO2013059152A3 (en
Inventor
Jason B. NIKAS
Walter C. Low
Paul A. Burgio
Original Assignee
Applied Informatic Solutions, Inc.
Application filed by Applied Informatic Solutions, Inc.
Publication of WO2013059152A2
Publication of WO2013059152A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • Breast cancer is the leading cancer in women in the United States in terms of annual incidence rate (~207,090 new cases/yr.) and the second most lethal cancer (~39,840 deaths/yr.) [American Cancer Society, 2010; Jemal et al., 2010].
  • Treatment may entail lumpectomy or mastectomy and removal of some of the axillary lymph nodes, and it may involve chemotherapy (with taxol or other chemotherapeutic agents), before or after surgery, hormone therapy, or radiation [American Cancer Society, 2010].
  • our prognostic biomarker test for breast cancer can have a significant impact in the area of pharmacogenomics.
  • Pharmaceutical companies may utilize our prognostic biomarker test to develop new chemotherapeutic treatments which would be specifically aimed at those breast cancer patients that do not respond to the current taxol-based chemotherapy, and which would seek to restore the gene networks that are characteristically aberrant in that subpopulation of patients.
  • One embodiment provides a method to determine if a breast cancer patient is a treatment responder or a non-responder comprising measuring the level of expression of at least one gene in a sample from the patient, wherein the level of expression of the at least one gene in the sample is an indication that the subject is a treatment responder or a non-responder to breast cancer chemotherapy.
  • the sample is a tumor cell sample obtained from a tumor or from blood.
  • One embodiment provides a method for diagnosing breast cancer in a subject comprising: measuring the level of expression of at least one gene in a test sample from a subject and comparing the level of expression with the level of expression of the at least one gene in a control sample from a healthy subject, wherein a higher or lower level of expression of the gene in the test sample compared with the level of expression in the control sample is an indication that the subject will respond or not respond to a breast cancer treatment.
  • the mRNA levels are measured.
  • the protein levels are measured.
  • the gene expression levels are measured by microarray analysis.
  • An embodiment provides a method of identifying markers in an individual correlated with the individual's likelihood of being a responder or nonresponder comprising: assaying genetic material from the individual for the expression level of at least one gene, wherein the expression level of the at least one gene is associated with the likelihood of the patient being a responder or nonresponder to a breast cancer treatment.
  • the measurement of gene expression provides a diagnosis which indicates that the subject/patient will respond to treatment. In another embodiment, the measurement of gene expression provides a diagnosis that the subject/patient will not respond to treatment.
  • the subject/patient is a mammal, such as a human. In one embodiment, a health care provider is informed. In another embodiment, the subject/patient is treated for breast cancer. In embodiments, the treatment is selected based on whether the patient has a tumor that is a responder or a nonresponder. In some embodiments, if the patient is a nonresponder, the patient is not treated with preoperative chemotherapy.
  • a method of selecting a treatment for a subject having breast cancer comprises determining whether a subject having breast cancer is likely to have short term or long term survival by a method comprising measuring the level of gene expression of at least a set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 in a sample comprising breast cancer cells from the subject; inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; determining whether the subject is likely to have long term survival (responder) by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival (nonresponder) by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying
  • a method of selecting a treatment for a subject having breast cancer comprises determining whether a subject having breast cancer is likely to have short term or long term survival by a method comprising measuring the level of gene expression of at least a set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK in a sample comprising breast cancer cells from the subject; inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; determining whether the subject is likely to have long term survival by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying a value between the 99% confidence interval of a mean output score of a first
  • the methods further comprise treating a subject likely to have long term survival with standard chemotherapy (T/FAC).
  • T/FAC comprises paclitaxel, 5-fluorouracil, doxorubicin, and cyclophosphamide.
  • the method further comprises treating a subject likely to have short term survival (nonresponder) with therapy in addition to or in place of standard chemotherapy.
  • an alternative therapy comprises a therapy selected from the group consisting of antiangiogenesis compounds, taxane analogues, tubulin binding agents, and ubiquitination inhibitors.
  • a subject likely to have short term survival is treated with an inhibitor of a protein selected from the group consisting of CCND1, RARA, UBE2J1, and combinations thereof.
  • the disclosure provides a method for selecting a treatment for a subject that has breast cancer, the method comprising: calculating an output score, using a computing device, by inputting gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK, into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and displaying the output score, using a computing device.
  • the method further comprises determining whether the output score is greater than or equal to or less than a cutoff value, using a computing device; and displaying whether the subject is likely to be a short term or long term survivor.
  • the status of the patient is communicated to a health care worker, optionally with a recommendation for treatment.
  • the treatment options include standard chemotherapy, an alternative therapy selected from the group consisting of antiangiogenesis compounds, taxane analogues, tubulin binding agents, and ubiquitination inhibitors, and/or an inhibitor of a protein selected from the group consisting of CCND1, RARA, UBE2J1, and combinations thereof.
  • a nonresponder is not treated with preoperative chemotherapy but may be treated with chemotherapy post surgery.
  • a kit for selecting a treatment for a breast cancer patient comprises or consists essentially of a primer or a probe or both that specifically hybridizes to each gene of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1.
  • the kit consists essentially of reagents for detecting expression of the first set of genes and contains other reagents such as primers or probes for housekeeping genes, positive controls and/or negative controls.
  • a kit comprises or consists essentially of: a primer or a probe or both that specifically hybridizes to each gene of a first set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK.
  • the kit contains no more than 200 primers or probes or both, no more than 175 primers, probes or both, no more than 150 primers, probes or both, no more than 125 primers, probes or both, no more than 100 primers, probes or both, no more than 75 primers, probes or both, no more than 50 primers, probes or both, no more than 25 primers, probes or both, or no more than 15 primers, probes or both.
  • a kit further comprises a non-transitory computer readable storage medium having computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: calculating an output score by inputting gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK or both from a sample from the patient into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to provide an output score.
  • the disclosure provides a computing device comprising a processing unit; and a system memory connected to the processing unit, the system memory including instructions that, when executed by the processing unit, cause the processing unit to: calculate an output score by inputting gene expression levels of a set of genes comprising a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK from a sample, into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and display the output score.
  • a computing device comprising a processing unit; and a system memory connected to the processing unit, the system memory including instructions that, when executed by the
  • the system memory includes instructions that, when executed by the processing unit, cause the processing unit to determine whether the output score is greater than or equal to or less than a cutoff value, and to display whether the subject is likely to be a short term or long term survivor.
  • system memory further includes instructions for making a recommendation for treatment options.
  • those recommendations include: for a responder standard chemotherapy prior to surgery, and for a nonresponder no chemotherapy or an alternative chemotherapy prior to surgery.
  • Figure 4: Results and performance assessment of the F2 breast cancer prognostic biomarker during the Discovery Study.
  • the AUC value, the probability of significance (P), and the mean group F2 score and standard deviation for both groups (R & NR) are also shown.
  • Figure 7 shows a flow diagram for an analytic method for determining the prognosis of a breast cancer tissue sample.
  • each of the following terms has the meaning associated with it in this section.
  • the articles “a” and “an” are used herein to refer to one or to more than one (i.e. to at least one) of the grammatical object of the article.
  • "an element" means one element or more than one element.
  • a "subject” or “patient” is a vertebrate, including a mammal, such as a human. Mammals include, but are not limited to, humans, farm animals, sport animals and pets.
  • binding refers to the adherence of molecules to one another, such as, but not limited to, enzymes to substrates, ligands to receptors, antibodies to antigens, DNA binding domains of proteins to DNA, and DNA or RNA strands to complementary strands.
  • Binding partner refers to a molecule capable of binding to another molecule.
  • biological sample refers to samples obtained from a subject, including, but not limited to, skin, hair, tissue, blood, plasma, serum, cells, sweat, saliva, feces, tissue, biopsy samples, and/or urine.
  • complementary sequence refers to the nucleic acid strand that is related to the base sequence in another nucleic acid strand by the Watson-Crick base-pairing rules. In general, two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G, respectively, of the other sequence.
  • RNA sequences can also include complementary G/U or U/G base pairs.
  • deoxyribonucleic acid and DNA as used herein mean a polymer composed of deoxyribonucleotides.
  • determining, assessing, and assaying are used interchangeably and include both quantitative and qualitative determinations.
  • the use of the word “detect” and its grammatical variants refers to measurement of the species without quantification, whereas use of the word “determine” or “measure” with their grammatical variants are meant to refer to measurement of the species with quantification.
  • the terms “detect” and “identify” are used interchangeably herein.
  • health care provider or worker includes either an individual or an institution that provides preventive, curative, promotional or rehabilitative health care services to a subject, such as a patient.
  • the data is provided to a health care provider so that they may use it in their diagnosis/treatment of the patient.
  • BLAST protein searches can be performed with the XBLAST program (designated "blastn” at the NCBI web site) or the NCBI “blastp” program, using the following parameters: expectation value 10.0, BLOSUM62 scoring matrix to obtain amino acid sequences homologous to a protein molecule described herein.
  • Gapped BLAST can be utilized as described in Altschul et al.
  • PSI-Blast or PHI-Blast can be used to perform an iterated search which detects distant relationships between molecules and relationships between molecules which share a common pattern.
  • the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used.
  • the percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically exact matches are counted.
  • a "substantially homologous amino acid sequences" or “substantially identical amino acid sequences” includes those amino acid sequences which have at least about 92%, or at least about 95% homology or identity, including at least about 96% homology or identity, including at least about 97% homology or identity, including at least about 98% homology or identity, and at least about 99% or more homology or identity to an amino acid sequence of a reference antibody chain.
  • Amino acid sequence similarity or identity can be computed by using the BLASTP and TBLASTN programs which employ the BLAST (basic local alignment search tool) 2.0.14 algorithm. The default settings used for these programs are suitable for identifying substantially similar amino acid sequences for purposes of the present invention.
  • substantially homologous nucleic acid sequence or “substantially identical nucleic acid sequence” means a nucleic acid sequence corresponding to a reference nucleic acid sequence wherein the corresponding sequence encodes a peptide having substantially the same structure and function as the peptide encoded by the reference nucleic acid sequence; e.g., where only changes in amino acids not significantly affecting the peptide function occur.
  • the substantially identical nucleic acid sequence encodes the peptide encoded by the reference nucleic acid sequence.
  • the percentage of identity between the substantially similar nucleic acid sequence and the reference nucleic acid sequence is at least about 50%, 65%, 75%, 85%, 92%, 95%, 99% or more.
  • Substantial identity of nucleic acid sequences can be determined by comparing the sequence identity of two sequences, for example by physical/chemical methods (i.e., hybridization) or by sequence alignment via computer algorithm.
  • Isolated or purified generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, chromosome, etc.) such that the substance comprises the majority percent of the sample in which it resides.
  • a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample.
  • Techniques for purifying polynucleotides, polypeptides and intact chromosomes of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography, sorting, and sedimentation according to density.
  • nucleic acid and “polynucleotide” are used interchangeably herein to describe a polymer of any length, e.g., greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, usually up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions.
  • hybrid refers to a double-stranded nucleic acid molecule formed by hybridization between complementary nucleotides.
  • nucleoside and nucleotide are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
  • nucleoside and nucleotide include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • oligonucleotide denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length. Oligonucleotides are usually synthetic and, in many embodiments, are under 50 nucleotides in length. Each oligonucleotide may have any suitable length.
  • the length of the oligonucleotide may be between 60 nucleotides and 200 nucleotides (inclusive), between 80 nucleotides and 200 nucleotides, between 100 nucleotides and 200 nucleotides, between 125 nucleotides and 200 nucleotides, or between 150 nucleotides and 200 nucleotides.
  • the oligonucleotide may have a length of at least 60 nucleotides, at least 80 nucleotides, at least 100 nucleotides, or at least 150 nucleotides, and in certain embodiments, the oligonucleotide may have a length no greater than 200 nucleotides, no greater than 175 nucleotides, or no greater than 160 nucleotides. Oligonucleotides having such nucleotide lengths may be prepared using any suitable method, for example, using de novo DNA synthesis techniques known to those of ordinary skill in the art, such as solid-phase DNA synthesis techniques.
  • oligonucleotides can be designed with the aid of a computer, based on the sequence of the genome and/or a region of interest. Oligonucleotides can serve as primers or probes in accord with standard methods. Primers and probes can be designed with publicly available software such as PRIME, Primer3, Webprimer, Genefisher, OLIGO Primer analysis software, and PROBER.
  • primer refers to a nucleic acid capable of acting as a point of initiation of synthesis along a complementary strand when conditions are suitable for synthesis of a primer extension product.
  • the synthesizing conditions include the presence of four different bases and at least one polymerization-inducing agent such as reverse transcriptase or DNA polymerase. These are present in a suitable buffer, which may include constituents which are co-factors or which affect conditions such as pH and the like at various suitable temperatures.
  • a primer is preferably a single strand sequence, such that amplification efficiency is optimized, but double stranded sequences can be utilized.
  • Primers are typically at least about 15 nucleotides. In embodiments, primers can have a length of anywhere from 15 to 2000 nucleotides. In embodiments, primers have a melting temp of at least 50°C, 52°C, 55°C, 58°C, 60°C, or 65°C.
  • a probe refers to a nucleic acid that hybridizes to a target sequence.
  • a probe includes about eight nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 40 nucleotides, about 50 nucleotides, about 60 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 90 nucleotides, about 100 nucleotides, about 110 nucleotides, about 115 nucleotides, about 120 nucleotides, about 130 nucleotides, about 140 nucleotides, about 150 nucleotides, about 175 nucleotides, about 187 nucleotides, about 200 nucleotides, about 225 nucleotides, or about 250 nucleotides.
  • probes have a melting temp of at least 50°C, 52°C, 55°C, 58°C, 60°C, or 65°C.
  • a probe can further include a detectable label.
  • Detectable labels include, but are not limited to, a fluorophore (e.g., Texas Red®, fluorescein isothiocyanate, etc.) and a hapten (e.g., biotin).
  • a detectable label can be covalently attached directly to a probe oligonucleotide, e.g., located at the probe's 5' end or at the probe's 3' end.
  • a probe including a fluorophore may also further include a quencher, e.g., Black Hole Quencher™, Iowa Black™, etc.
  • ribonucleic acid and "RNA” as used herein mean a polymer composed of ribonucleotides.
  • stringent assay conditions refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., probes and targets, of sufficient complementarity to provide for the desired level of specificity in the assay while being incompatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity.
  • stringent assay conditions refer to the combination of hybridization and wash conditions.
  • a “stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization are sequence dependent, and are different under different environmental parameters.
  • Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include, e.g., hybridization in a buffer comprising 50% formamide, 5xSSC, and 1% SDS at 42°C, or hybridization in a buffer comprising 5xSSC and 1% SDS at 65°C, both with a wash of 0.2xSSC and 0.1% SDS at 65°C.
  • Exemplary stringent hybridization conditions can also include hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37°C, and a wash in 1xSSC at 45°C.
  • hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C, and washing in 0.1xSSC/0.1% SDS at 68°C can be employed.
  • Additional stringent hybridization conditions include hybridization at 60°C or higher and 3 x SSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42°C in a solution containing 30% formamide, 1M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5.
  • incubation at 42°C in a solution containing 30% formamide, 1M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5 can be utilized to provide conditions of similar stringency.
  • the stringency of the wash conditions can determine whether a nucleic acid is specifically hybridized to a probe.
  • Wash conditions used to identify nucleic acids may include, e.g., a salt concentration of about 0.02 M at pH 7 and a temperature of about 20°C to about 40°C; or, a salt concentration of about 0.15 M NaCl at 72°C for about 15 minutes; or, a salt concentration of about 0.2xSSC at a temperature of about 30°C to about 50°C for about 2 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2xSSC containing 1% SDS at room temperature for 15 minutes and then washed twice by 0.1xSSC containing 0.1% SDS at 37°C for 15 minutes; or, equivalent conditions.
  • Stringent conditions for washing can also be, e.g., 0.2xSSC/0.1% SDS at 42°C. See Sambrook, Ausubel, or Tijssen for detailed descriptions of equivalent hybridization and wash conditions and for reagents and buffers, e.g., SSC buffers and equivalent reagents and conditions.
  • Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions is considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, whereby “substantially no more” is meant less than about 5-fold more, typically less than about 3-fold more.
  • Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.
  • fragment is a portion of an amino acid sequence, comprising at least one amino acid, or a portion of a nucleic acid sequence comprising at least one nucleotide.
  • fragment and “segment” are used interchangeably herein.
  • fragment as applied to a protein or peptide, can ordinarily be at least about 3-15 amino acids in length, at least about 15-25 amino acids, at least about 25-50 amino acids in length, at least about 50-75 amino acids in length, at least about 75-100 amino acids in length, and greater than 100 amino acids in length.
  • fragment as applied to a nucleic acid, may ordinarily be at least about 20 nucleotides in length, typically, at least about 50 nucleotides, more typically, from about 50 to about 100 nucleotides, at least about 100 to about 200 nucleotides, at least about 200 nucleotides to about 300 nucleotides, at least about 300 to about 350, at least about 350 nucleotides to about 500 nucleotides, at least about 500 to about 600, at least about 600 nucleotides to about 620 nucleotides, at least about 620 to about 650, or the nucleic acid fragment will be greater than about 650 nucleotides in length.
  • standard refers to something used for comparison, such as a control or a healthy subject.
  • a method of selecting a treatment for a subject that has breast cancer comprises: a) determining whether the subject is likely to have short term or long term survival by a method comprising i) measuring the level of gene expression of at least a set of genes in a sample comprising breast cancer cells from the subject; ii) inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; iii) determining whether the subject is likely to have long term survival by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying a value between the 99% confidence interval of the mean output score of a first set of samples from subjects known to have short term survival and the 99%
  • the set of genes comprises at least the genes CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1.
  • the set of genes comprises at least the genes ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK.
  • Each of the genes identified herein as useful in determining a short term or long term survivor can have one or more variants that are known and primers and probes can be designed to detect all variants and/or each variant.
  • Variants include those nucleic acids or proteins that have a "substantially homologous nucleic acid sequence", a "substantially identical nucleic acid sequence", a "substantially homologous amino acid sequence", or a "substantially identical amino acid sequence". Such variants are either known or may be readily determined.
  • methods for detecting breast cancer biomarkers in a biological sample.
  • the biomarkers are determined by gene expression of a first set of genes or a second set of genes. Detection of the biomarkers is useful to identify patients that are responders or nonresponders to a treatment.
  • the treatment is preoperative chemotherapy. Identifying patients that are responders or nonresponders provides for the ability to apply a different treatment to those identified as nonresponders, screen for compounds that may be more effective on those breast cancer tumors that are non responsive to standard preoperative chemotherapy, and stratify patients for treatment either therapeutically or during clinical trials.
  • the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes in a sample from the patient, wherein the level of expression of the CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes indicates that the subject is a responder or non-responder.
  • the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes in a sample from the patient, wherein the level of expression of the ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes indicates that the subject is a responder or non-responder.
  • the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient; inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score; and comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder.
  • the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: inputting the levels of gene expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient into a predictive function to obtain a score; comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and optionally, communicating the identification of the patient as a responder or nonresponder to a user such as a health care professional.
  • the disclosure provides a nontransitory computer readable medium or computing device implemented method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module; b) inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score in a scoring module; c) optionally, comparing the score to a cutoff value in a diagnostic module to identify
  • Samples from a subject having breast cancer are analyzed for gene expression.
  • the subject is a vertebrate, including a mammal, such as a human. Mammals include, but are not limited to, humans, farm animals, sport animals and pets. Samples are obtained from the subject including without limitation skin, hair, tissue, blood, plasma, serum, cells, sweat, saliva, feces, and/or urine. In a specific embodiment, the sample is from a biopsy of suspected breast cancer or from tumor cells found in the blood.
  • the sample may be analyzed for the presence of cancerous or precancerous cells using methods known to those of skill in the art.
  • Breast cancers are optionally classified histologically. Infiltrating or invasive ductal cancer is the most common breast cancer histologic type and comprises 70% to 80% of all cases. Types of breast cancer include
  • breast cancer cell samples are also optionally staged, typically prior to treatment.
  • the American Joint Committee on Cancer (AJCC) staging system provides a strategy for grouping patients with respect to prognosis.
  • Therapeutic decisions are formulated in part according to staging categories but primarily according to tumor size, lymph node status, estrogen-receptor and progesterone-receptor levels in the tumor tissue, human epidermal growth factor receptor 2 (HER2/neu) status, menopausal status, and the general health of the patient.
  • Stage 0 describes noninvasive (in situ) breast cancer.
  • Ductal carcinoma in situ (DCIS) is an example of stage 0 cancer.
  • Stage I is an early stage of invasive breast cancer in which: the tumor measures no more than 2 centimeters (cm) in diameter (3/4 inch); and no lymph nodes are involved, that is, the cancer hasn't spread outside the breast.
  • Stage II describes invasive breast cancers in which one of the following is true: the tumor measures less than 2 cm (3/4 inch) but has spread to lymph nodes under the arm; no tumor is found in the breast, but breast cancer cells are found in lymph nodes under the arm; the tumor is between 2 and 5 cm (about 3/4 to 2 inches) and may or may not have spread to lymph nodes under the arm; or the tumor is larger than 5 cm (2 inches) but hasn't spread to any lymph nodes.
  • Stage III breast cancers are subdivided into three categories— IIIA, IIIB and IIIC— based on a number of criteria.
  • stage III cancers haven't spread to distant sites.
  • a stage IIIA tumor is larger than 5 cm (2 inches) and has spread to one to three lymph nodes under the arm.
  • Other stage IIIA tumors may be any size and have spread into multiple lymph nodes. The lymph nodes clump and attach to one another or to the surrounding tissue.
  • a tumor of any size has spread to tissues near the breast— the skin and chest muscles— and may have spread to lymph nodes within the breast or under the arm.
  • Stage IIIB also includes inflammatory breast cancer, an uncommon but aggressive type of breast cancer.
  • Stage IIIC cancer is a tumor of any size that has spread: to 10 or more lymph nodes under the arm; to lymph nodes above or beneath the collarbone and near the neck; or to lymph nodes within the breast itself and to lymph nodes under the arm.
  • Stage IV breast cancer has spread to distant parts of the body, such as the lungs, liver, bones or brain.
  • the breast cancer sample may optionally be analyzed for the presence or absence of one or more markers including estrogen receptor positive, progesterone receptor positive, hormone receptor negative, the presence or absence of Her2, and the presence or absence of both Her2 and hormonal receptors.
  • Triple negative breast cancer is a breast cancer type that lacks hormonal receptors and Her2.
  • the sample is analyzed for gene expression in accord with the methods described herein.
  • the gene expression analysis provides for identification of the sample of breast cancer cells as a responder to chemotherapy or a nonresponder to chemotherapy.
  • the gene expression analysis is complementary to other information regarding the breast cancer cells and provides a measure of risk assessment that is independent of age, ethnicity, stage of cancer, and receptor status of the cancer.
  • genes have been demonstrated herein to be prognostic of breast cancer. Two sets of genes have been identified as providing classification of the breast cancer cells as responders or nonresponders. These genes are characterized by multiple transcript sequences that are known or are readily identifiable by searching for sequences related to the known sequences. Within a particular species, gene or transcript sequences for a particular gene are those that have at least 80% sequence identity in the coding sequence. For example, several of the genes and the gene products are known to have isoforms. Transcripts for such isoforms are included within the scope of detecting the gene expression of a particular gene.
  • the first set of genes includes the following: CCND1, such as cyclin D1, also known as: BCL1; PRAD1; U21B31; D11S287E, as exemplified by a target sequence 208712_at and reference sequence gI 77628157; CELSR1, such as cadherin, EGF LAG seven-pass G-type receptor 1, also known as: ME2; FMI2; CDHF9; HFMI2; DKFZp434P0729, exemplified by a target sequence 41660_at and reference sequence gI 656966; DKFZp566H0824, also known as hypothetical LOC54744, exemplified as a target sequence 207470_at, and gI 23273884; FAAH, such as fatty acid amide hydrolase, also known as FAAH-1; MGC102823; MGC138146, exemplified by a target sequence 20423 l_
  • immunoglobulin kappa variable 1-5 also known as: V1; L12; IGKV; L12a; IGKV15; MGC22745; MGC32715; MGC88810, exemplified by a target sequence 214768_x_at, and gI 19718803; LAMA5, such as laminin, alpha 5, also known as: KIAA1907, exemplified by a target sequence 210150_s_at, and gI 21264601; OXCT1, such as 3-oxoacid CoA transferase 1, also known as: OXCT; SCOT, exemplified by a target sequence 202780_at, and gI 112382246;
  • RARA, such as retinoic acid receptor, alpha, also known as: RAR; NR1B1, exemplified by an Affymetrix target sequence 216300_x_at, and gI 300388174; and UBE2J1, such as ubiquitin-conjugating enzyme E2, J1, U, also known as: UBC6; Ubc6p; CGI-76; NCUBE1; HSPC153; HSPC205; NCUBE-1; HSU93243; MGC12555, exemplified by a target sequence 217825_s_at, and gI 37577121.
  • Each of these genes can be described by a number of transcripts that are identified in databases such as Gene in Genbank, the Unigene database, and the Image id database.
  • the target sequences define a unique region for detecting each gene as identified by Affymetrix for their whole genome array U133A.
  • polynucleotide sequences comprising the target sequences are detected as described herein and can be used to design primers or other types of probes such as aptamers that can be utilized in other methods that detect gene expression levels using methods and available programs for primer or probe design.
  • a second set of genes includes the following: ESR1, such as estrogen receptor 1, also known as: ER; ESR; Era; ESRA; NR3A1; DKFZp686N23123, exemplified as a target sequence 205225_at, and gI 170295748; BTG3, such as BTG family, member 3, also known as: ANA; TOB5; TOFA; TOB55; MGC8928, exemplified by a target sequence 213134_x_at, and gI 195963405; ODC1, such as ornithine decarboxylase 1, also known as: ODC, exemplified by a target sequence 200790_at, and gI 4505488; MCM5, such as minichromosome maintenance complex component 5, also known as: CDC46; MGC5315; P1-CDC46, exemplified by a target sequence 216237_s_at, and gI 143770796; TTK,
  • SLC7A8, such as solute carrier family 7 (amino acid transporter light chain, L system), member 8, also known as: LAT2; LPI-PC1, exemplified by a target sequence 216092_s_at, and gI 33286427; MCM5, such as minichromosome maintenance complex component 5, also known as: CDC46; MGC5315; P1-CDC46, exemplified by a target sequence 201755_at, and gI 143770796; and MELK, such as maternal embryonic leucine zipper kinase, also known as: HPK38; KIAA0175, exemplified by a target sequence 204825_at, and gI 41281490.
  • Each of these genes can be described by a number of transcripts that are identified in databases such as Gene in Genbank, the Unigene database, and the Image id database.
  • the target sequences define a unique region for detecting each gene as identified by Affymetrix for their whole genome array U133A.
  • polynucleotide sequences comprising the target sequences are detected as described herein and can be used to design primers or other types of probes such as aptamers that can be utilized in other methods that detect gene expression levels using methods and available programs for primer or probe design.
  • expression of IGKV1-5, OXCT1, and UBE2J1 is increased and the expression of CCND1, CELSR1, DKFZp566H0824, FAAH, LAMA5, and RARA is decreased.
  • the expression of ESR1, NKAIN1, IDUA, and SLC7A8 is decreased and the expression of BTG3, ODC1, MCM5, TTK, SLC43A3, TXNDC5, MCM5, and MELK is increased.
  • the gene in the responders is either underexpressed or overexpressed as compared to the nonresponders.
  • detecting gene expression levels as up or down regulated allows for classification of a patient as a responder or nonresponder.
  • the gene expression levels can be analyzed using standard statistical methods and/or as described herein.
  • the expression of only these genes is analyzed except for the analysis of any control gene or nucleic acid sequences to ensure the functionality of the assay.
  • a subarray or smaller subset of genes that includes the first and/or second set of genes can be analyzed, provided that 200 genes or fewer are analyzed.
  • about 9 to 200 genes are analyzed, about 9 to 150 genes are analyzed, about 9 to 100 genes are analyzed, about 9 to 50 genes are analyzed, or about 9 to 25 genes are analyzed.
  • the other genes that are analyzed may include other known markers for breast cancer or other known markers for ovarian cancer such as LYPLA2, TUBA3C, ACTB, MED13L, OSBPL8, EED, and PKP4 and/or SSR1, USP5, ACTB, HLCS, NDUFB1, LYPLA2, TUBA3C, MED13L, and EED.
  • the expression of the nucleic acid such as mRNA of the genes of interest is determined.
  • Methods for detecting gene expression from biological samples are known. A number of different approaches are described herein.
  • probes or primers to detect gene expression of each gene or the target sequence for each gene are designed to specifically detect expression of CCND1 versus the other listed genes.
  • the primers and probes are designed to specifically identify a CCND1 gene regardless of whether the gene has sequence variation.
  • Levels of mRNA can be quantitatively measured by Northern blotting.
  • a sample of RNA is separated on an agarose gel and hybridized to a radio-labeled RNA probe that is
  • the radio-labeled RNA is then detected by an
  • RT-PCR first generates a DNA template, called cDNA, from the mRNA by reverse transcription. This cDNA template is then used for qPCR, where the fluorescence of a probe changes as the DNA amplification process progresses.
  • qPCR can produce an absolute measurement such as number of copies of mRNA, typically in units of copies per nanolitre of homogenized tissue or copies per cell. qPCR is very sensitive (detection of a single mRNA molecule is possible).
  • Another approach is to individually tag single mRNA molecules with fluorescent barcodes (nanostrings), which can be detected one-by-one and counted for direct digital quantification (Krassen Dimitrov, NanoString Technologies).
  • DNA microarrays can be used to determine the transcript levels for many genes at once (expression profiling). Recent advances in microarray technology allow for the
  • tag based technologies like Serial analysis of gene expression (SAGE), which can provide a relative measure of the cellular concentration of different mRNAs, can be used.
  • the level of expression can be determined using RNA sequencing technology.
  • RNA sequencing technology involves high throughput sequencing of cDNA. mRNA is isolated and reverse transcribed to form a library of cDNA. The cDNA is fragmented to a specific size and optionally may be detectably labeled. The fragments are sequenced and the full sequence is assembled in accord with different platforms such as provided by Illumina, 454 Sequencing or SOLiD sequencing. In addition, mRNA can be sequenced directly (without conversion to cDNA) using protocols available from Helicos.
  • the expression of the protein from the genes of interest is
  • the most commonly used method is to perform a Western blot against the protein of interest - this gives information on the size of the protein in addition to its identity.
  • a sample (often cellular lysate) is separated on a polyacrylamide gel, transferred to a membrane and then probed with an antibody to the protein of interest.
  • Other methods include, for example, enzyme-linked immunosorbent assay (ELISA), lateral flow test, latex agglutination, other forms of immunochromatography, Western blot, and/or magnetic immunoassay.
  • Reagents to detect the molecules of interest can be produced by methods available to an art worker or purchased commercially.
  • a method for selecting a treatment of a subject with breast cancer comprises inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score.
  • the gene expression analysis of the genes of interest is applied to the equations provided herein.
  • the method is a computer implemented method.
  • the data obtained from a microarray analysis with regards to the gene expression of the genes of interest is first normalized and then applied to the equations provided.
  • gene expression analysis is obtained using probes and methods as designed by Affymetrix U133A chip.
  • Other arrays may also be utilized and functions derived that provide classification of responders and nonresponders in accord with the methods described herein. However, it is not necessary or even desirable to utilize a full array of 10,000, 20,000 or even 30,000 genes.
  • a subarray can be constructed to detect gene expression of at least the first set of genes or the second set of genes or both.
  • gene expression values can be converted to values of the Affymetrix gene expression analysis algorithm using known methods. For example, the gene expression analysis can be run in parallel using PCR or RNA sequencing and using the Affymetrix U133 chip and software. The gene expression values for each gene from PCR or RNA sequencing can be compared to the values generated using the Affymetrix system and a conversion factor identified. Gene expression levels for each gene generated by PCR or RNA sequencing can then be converted to the output of the Affymetrix algorithm using the conversion factor before the gene expression levels for each gene are input into the function.
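One way to read the conversion step above is as a per-gene scaling factor estimated from samples measured on both platforms. The following is a minimal sketch under that assumption; the function names and the choice of a least-squares slope through the origin are hypothetical and are not specified in the disclosure.

```python
def conversion_factor(other_platform, affymetrix):
    """Estimate a per-gene factor mapping PCR or RNA-seq values onto the
    Affymetrix scale (least-squares slope through the origin; an assumption)."""
    if len(other_platform) != len(affymetrix) or not other_platform:
        raise ValueError("need paired, non-empty measurements")
    numerator = sum(x * y for x, y in zip(other_platform, affymetrix))
    denominator = sum(x * x for x in other_platform)
    if denominator == 0:
        raise ValueError("other-platform values are all zero")
    return numerator / denominator


def to_affymetrix_scale(value, factor):
    """Convert one other-platform value using the estimated factor."""
    return value * factor
```

A separate factor would be estimated for each gene from the parallel runs, and the converted values would then be passed to the predictive function.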
  • Gene expression values are obtained from, for example, an Affymetrix array or subarray, and are normalized using an algorithm that corrects for background and normalizes the probe intensity.
  • the MAS5 or RMA algorithms are used to correct the raw gene expression data.
  • the MAS5 algorithm provides for measuring the intensities for each probe on the array to generate CEL files.
  • the background level across the area is computed using a weighted sum of individual zone backgrounds. Background is then subtracted from spot intensity.
  • the probe intensity is then corrected for stray signal by subtracting out the signal from a mismatched probe pair that is paired with the perfect match probe.
  • the signal for each probe in the probe set is used to calculate an average of the signals from each probe.
  • the RMA algorithm is a model based approach including background correction, quantile normalization, and modeling probe specific effects across multiple arrays using a median polish method for fitting the model.
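The quantile normalization step named above can be illustrated in isolation. This is a simplified sketch, not the RMA implementation: background correction and the median polish fit are omitted, ties are broken by position, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per array.
    Each value is replaced by the mean, across arrays, of the values that share
    its rank within their own array.
    """
    if not arrays or not arrays[0]:
        raise ValueError("need at least one non-empty array")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized
```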
  • the same algorithm can be used or combinations of the algorithms can be used.
  • These algorithms are available through Affymetrix as well as at Libaffy (at moffit.usf.edu).
  • Other software may also be compatible for analyzing gene expression including that of Biotique (XRAY), Genomematrix ChipInspector, JMP Genomics, Arraystar form
  • all of the variables were normalized using the RMA algorithm.
  • variables are weighted in each function as follows:
  • variable 2 = [(variable 2)^(-1)]*10^2;
  • variable 3 = [(variable 3)^(-3.8)]*10^4;
  • variable 4 = [(variable 4)^(-5)]*10^5;
  • variable 5 = [(variable 5)^(-1.5)]*10^2;
  • variable 6 = [(variable 6)^(-1)]*10^2;
  • variable 9 = [(variable 9)^(-6.2)]*10^4.
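Read with * as multiplication and ^ as exponentiation, each weighting above raises a variable to a negative power and scales the result. The sketch below applies those weightings, assuming the scale terms are powers of ten (10^2, 10^4, 10^5) and that each variable is a positive normalized expression value; the dictionary layout and function name are hypothetical.

```python
# (exponent, scale) for each weighted variable index, taken from the list above.
WEIGHTS = {
    2: (-1.0, 1e2),
    3: (-3.8, 1e4),
    4: (-5.0, 1e5),
    5: (-1.5, 1e2),
    6: (-1.0, 1e2),
    9: (-6.2, 1e4),
}


def weight_variable(index, value):
    """Return value**exponent * scale for a variable with a listed weighting."""
    if index not in WEIGHTS:
        raise KeyError(f"no weighting listed for variable {index}")
    if value <= 0:
        raise ValueError("expression value must be positive")
    exponent, scale = WEIGHTS[index]
    return (value ** exponent) * scale
```

The weighted variables would then be combined by the predictive function, which is not reproduced here.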
  • the F1 function is shown below.
  • the * denotes multiplication and the ^ symbol denotes an exponent.
  • the above F2 transcripts indicated with (R) are obtained from RMA processing of the raw intensity data (CEL files), whereas those indicated with (M) are obtained from MAS5 processing of the raw intensity data (CEL files).
  • the variables are weighted as follows:
  • the function F2 is below.
  • a score is obtained. The score is then compared to a cutoff value. If the score is less than the cutoff value it indicates that the breast cancer cells are representative of a responder. If the score is greater than or equal to the cutoff value it indicates the breast cancer cells are representative of a nonresponder.
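The comparison rule above is simple enough to state directly in code. This is a minimal sketch; the function name and the string labels are illustrative choices.

```python
def classify(score, cutoff):
    """Scores below the cutoff indicate a responder; scores at or above
    the cutoff indicate a nonresponder."""
    return "responder" if score < cutoff else "nonresponder"
```

Note that a score exactly equal to the cutoff is classified as a nonresponder.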
  • Cutoff values are determined according to standard methods. In an embodiment, a cutoff value is determined by dividing a set of patients with known status as a responder or
  • each group containing responders and nonresponders, analyzing gene expression levels for a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patients in the first group, determining the score of each of the known responder and nonresponder samples, calculating the halfway point between the mean score of the responder group and the mean score of the nonresponder group, and identifying this halfway point as the cutoff value.
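The halfway-point rule described above can be sketched as follows, assuming the scores of the known responders and nonresponders in the first group have already been computed. The function name is hypothetical.

```python
from statistics import mean


def midpoint_cutoff(responder_scores, nonresponder_scores):
    """Cutoff halfway between the mean responder score and the
    mean nonresponder score."""
    return (mean(responder_scores) + mean(nonresponder_scores)) / 2
```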
  • the predictive capability of the cutoff value is tested using the second group of patients with the samples blinded. If the accuracy falls within at least 80% no further action is necessary. If the cutoff value does not provide the required degree of accuracy, then the cutoff value is adjusted using standard statistical methods. The cutoff is set within that difference between the 99% confidence interval of the groups and adjusted up or down from the
  • the cutoff is moved away from the middle point from the group that has the larger standard deviation and closer to the other group (the one with the smaller standard deviation).
  • the cutoff value is determined by a method comprising calculating an optimal point on the ROC curve based on the 50 scores of the 50 original subjects used in the discovery study [optimal point is defined as the point with the highest sensitivity and the lowest false positive rate (1 - specificity)] for a first group of short term survivors and a second group of long term survivors. That optimal point (the score of one of the 50 original subjects), which represents, according to ROC curve analysis, the best cutoff point for all of the 50 original subjects' scores, may itself be used as the cutoff point.
  • a cutoff value is selected that provides the highest specificity and lowest rates of false positives. In a specific embodiment this value can be determined by analyzing the score of each patient in a group of patients using ROC curve analysis.
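A ROC-based selection of this kind can be sketched by treating each observed score as a candidate cutoff and keeping the one that best balances sensitivity against the false positive rate. The disclosure does not name a specific criterion; the sketch below assumes Youden's J (sensitivity minus false positive rate) as the balance measure, and treats samples scoring at or above the cutoff as predicted positives. The function name is hypothetical.

```python
def roc_optimal_cutoff(scores, is_positive):
    """Return the observed score that maximizes sensitivity - false positive rate.

    scores: output scores of subjects with known status.
    is_positive: True for the class expected to score at or above the cutoff.
    """
    positives = sum(1 for p in is_positive if p)
    negatives = len(is_positive) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one subject in each class")

    best_cutoff, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        true_pos = sum(1 for s, p in zip(scores, is_positive) if p and s >= candidate)
        false_pos = sum(1 for s, p in zip(scores, is_positive) if not p and s >= candidate)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:  # ties keep the lowest candidate
            best_cutoff, best_j = candidate, j
    return best_cutoff
```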
  • the cutoff value for F1 is about 4.7 and for F2 it is about 13.7.
  • a health care provider can select a treatment appropriate for the individual patient.
  • the health care provider may not utilize preoperative chemotherapy such as taxol or tamoxifen.
  • the health care provider may want to perform surgery and radiation immediately after analysis or employ alternative chemotherapy before or after surgery.
  • the methods of the disclosure may be employed to determine further treatment with standard chemotherapy will be beneficial, that is, will the cells of the recurrent tumor be classified as a responder or
  • the health care worker may select one or more standard therapy options. These standard therapy options include chemotherapy, surgery, and/or radiation. Standard chemotherapeutic options include treatment with one or more of cyclophosphamide, Taxol, Platinum, Carboplatin, Cisplatin, Gemcitabine, Topotecan, Oxaliplatin, Doxorubicin, Paclitaxel, Docetaxel, and combinations thereof.
  • the health care worker may select a more aggressive treatment in addition to or in place of the standard chemotherapy.
  • treatment includes treatment with a cancer vaccine, angiogenesis inhibitors, tubulin binding inhibitors, taxane analogs, actin polymerization inhibitors, adoptive cell therapy, and protein ubiquitination inhibitors.
  • the chemotherapy treatment includes treatment with an inhibitor of CCND1, RARA, UBE2J1, and combinations thereof.
  • the methods of the invention may be employed on a set of patients to identify a responder group or a nonresponder group in a clinical trial, for example.
  • a new therapeutic agent it is useful to know whether the therapeutic agent has different effects in the responder population versus the nonresponder population.
  • a group of patients having breast cancer are identified as responders or nonresponders and are then treated with a potential therapeutic agent. Safety and efficacy of the drug are assessed in responder and nonresponder populations.
  • Another aspect of the disclosure includes methods for screening therapeutic agents. Identification of breast cancer tissue samples as nonresponders and responders can be used to screen therapeutic effectiveness of the potential therapeutic agent on both types of patient populations.
  • cell lines may be developed from breast cancer tissue from nonresponders and responders using standard methods in order to provide for high-throughput analysis.
  • a method for screening agents for treating breast cancer comprises a) contacting a breast cancer sample identified as a nonresponder or responder with a potential agent for treating breast cancer; and b) determining whether the agent decreases the growth or spread of the breast cancer sample, or changes the gene expression profile of the first set of genes, the second set of genes or both.
  • the method further comprises identifying a breast cancer sample as from a responder or nonresponder by determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient.
  • the potential therapeutic agents are those that interact with any one of the genes CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient. Examples of such agents are listed in Table 1 under known drugs or chemicals.
  • Drugs or chemicals similar to those known drugs in mechanism of action may be screened using nonresponder and responder breast cancer cells or cell lines as a measure of their efficacy in each of the patient groups.
  • Other drugs or agents may also be those that are selected to act on other genes that are known to interact with any of the genes in the first or second set of genes as described in Table 1.
  • the genes in the first and/or second set of genes are targets to develop new therapeutics which can be tested on breast cancer cells identified as responders or nonresponders.
  • High throughput assays such as multiwell plate assays or arrays with cells attached to nanobeads can be utilized to test a number of therapeutic compounds for any effects on the responder or nonresponder cell types with regard to inhibition of cell growth, cell death, or change in gene expression of one or more of the genes of the first set of genes, the second set of genes or both. Those agents effective on both the responder and nonresponder population may be selected for further development. In other embodiments, an agent effective on either a responder or nonresponder cell type is selected and the patient group is sorted as responders and nonresponders for further testing of the agent effective in the respective responder or nonresponder cell type.
  • a computing device implemented method to identify whether a breast cancer patient is a treatment responder or non-responder comprises: a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module; and b) inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score in a scoring module.
  • the method further comprises c) comparing the score to a cutoff value in a diagnostic module to identify the patient as a treatment responder or nonresponder.
  • the method further comprises d) communicating the identification of the patient as a treatment responder or nonresponder to a user.
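The receiver, scoring, and diagnostic modules described above can be sketched as three small functions chained together. This is an illustrative arrangement only: the function names are hypothetical, and the predictive function is passed in as a placeholder callable because the actual F1 and F2 functions are not reproduced here.

```python
from typing import Callable, List, Mapping, Sequence

FIRST_SET = ("CCND1", "CELSR1", "DKFZp566H0824", "FAAH", "IGKV1-5",
             "LAMA5", "OXCT1", "RARA", "UBE2J1")


def receiver_module(expression: Mapping[str, float],
                    genes: Sequence[str]) -> List[float]:
    """a) Receive expression levels and return them in a fixed gene order."""
    missing = [g for g in genes if g not in expression]
    if missing:
        raise ValueError(f"missing expression levels for: {missing}")
    return [float(expression[g]) for g in genes]


def scoring_module(levels: Sequence[float],
                   predictive_function: Callable[[Sequence[float]], float]) -> float:
    """b) Input the expression levels into a predictive function to obtain a score."""
    return predictive_function(levels)


def diagnostic_module(score: float, cutoff: float) -> str:
    """c) Compare the score to the cutoff value."""
    return "responder" if score < cutoff else "nonresponder"


def identify_patient(expression, genes, predictive_function, cutoff) -> str:
    """Run steps a) to c); step d) would communicate the returned status to a user."""
    levels = receiver_module(expression, genes)
    return diagnostic_module(scoring_module(levels, predictive_function), cutoff)
```

In use, `predictive_function` would be the stored function for the chosen gene set and `cutoff` its associated cutoff value.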
  • Fig.7 is a flowchart illustrating a
  • a method for selecting a treatment for a subject that has breast cancer comprises calculating an output score, using a computing device, by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and displaying the output score, using a computing device.
  • the method further comprises determining whether the output score is greater than or equal to or less than a cutoff value, using a computing device; and displaying whether the subject is likely to be a short term survivor if the output score is greater than or equal to the cutoff value or long term survivor if the output score is less than the cutoff value.
  • the biological samples are processed at a health care facility and the gene expression values are determined and then those values are sent to a remote location for further analysis via the internet or wireless communication systems.
  • the gene expression and analysis of the gene expression data is done at a single location.
  • the computing device can include a single computing device, such as a server computer. In other embodiments, the computing device can include multiple computing devices configured to communicate with one another over a network (not shown).
  • the computing device can store multiple databases within memory. The databases stored on the computing device can be organized by clinic, practicing clinician, programmer identification code, or any other desired category.
  • Gene expression information can be sent to the remote computing system or another data storage device.
  • the communication process initializes and begins at a start module and proceeds to a connect operation.
  • the connect operation communicatively couples the stored information of the health care provider to the remote computing system, for example, via a cabled
  • a transfer operation transmits gene expression data from the health care provider to the computing device.
  • the transfer operation encrypts the data before
  • the communication process can complete and end at a stop module.
  • the data is optionally normalized and then inputted into a stored function 1 , function 2 or both in order to obtain a score.
  • the score is then compared to a cutoff value. A score greater than or equal to the cutoff value is identified as a non responder and below the cutoff value as a responder.
  • the status of the analysis of the sample as a responder or non responder is communicated back to the health care provider using a similar process over cabled connection, a wireless local area network (WLAN or Wi-Fi) connection, a cellular network, a wireless personal area network (WPAN) connection, e.g., BLUETOOTH®, or any desired communication link.
  • the status may also be associated with a treatment recommendation that is also communicated to a health care worker.
  • the treatment recommendation includes standard chemotherapy, surgery, and/or radiation.
  • the recommendation includes immediate surgery and radiation with no preoperative chemotherapy and/ or surgery and radiation with alternative chemotherapy before or after surgery.
  • the remote computing device includes a computer, a server, storage devices, mobile devices, and the like for receiving the level of gene expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module.
  • the gene expression levels for each of the genes are inputted into function 1 or function 2 or both as described herein to provide a score.
  • the score is then compared to a cutoff value in the diagnostic module; if the score is greater than or equal to the cutoff value, the sample is identified as a nonresponder, and if it falls below the cutoff value, the sample is identified as a responder.
  • the identification of the sample as a responder or nonresponder is stored in a database and communicated to the user via internet communication to another computer or mobile device.
  • the user is a health care professional.
  • the health care provider chooses a course of treatment appropriate to the responder or nonresponder group.
  • the health care provider may choose to eliminate any preoperative chemotherapy and treat the patient by surgery and radiation, optionally followed by chemotherapy post surgery and radiation.
  • the detection, prognosis and/or diagnosis method can employ the use of a processor/computer device. For example, a general purpose computer system comprising a processor coupled to program memory storing computer program code to implement the method, to working memory, and to interfaces such as a conventional computer screen, keyboard, mouse, and printer, as well as other interfaces, such as a network interface, and software interfaces including a database interface, finds use in one embodiment described herein.
  • a computing device comprises a processing unit; and a system memory connected to the processing unit, the system memory including instructions that, when executed by the processing unit, cause the processing unit to: calculate an output score by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and display the output score.
  • the system memory includes instructions that when executed by the processing unit, cause the processing unit to determine whether the output score is greater than or equal to or less than a cutoff value; and displaying whether the subject is likely to be a short term survivor if the output score is greater than or equal to the cutoff value or long term survivor if the output score is less than the cutoff value.
  • the set of genes comprises at least the genes CCND1 , CELSR1 ,
  • the function is selected from the group consisting of function 1 , or function 2, or both.
  • the computer system accepts user input from a data input device, such as a keyboard, input data file, or network interface, or another system, such as the system interpreting, for example, the microarray or PCR data, and provides an output to an output device such as a printer, display, network interface, or data storage device.
  • Input device for example a network interface, receives an input comprising detection of the proteins/nucleic acids described herein and/or quantification of those compounds.
  • the output device provides an output such as a display, including one or more numbers and/or a graph depicting the detection and/or quantification of the compounds.
  • Computer system is coupled to a data store which stores data generated by the methods described herein. This data is stored for each measurement and/or each subject; optionally a plurality of sets of each of these data types is stored corresponding to each subject.
  • One or more computers/processors may be used, for example, as a separate machine, for example, coupled to computer system over a network, or may comprise a separate or integrated program running on computer system. Whichever method is employed these systems receive data and provide data regarding detection/diagnosis in return.
  • the disclosure provides a computing device or a nontransitory computer readable medium with instructions to implement the methods of the disclosure.
  • the computer readable medium includes CD, DVD, flash drive, external hard drive, and mobile device.
  • the computing device includes a receiver module for receiving gene expression data, an optional normalization module for normalizing gene expression data, a scoring module for inputting the gene expression data into function 1 , function 2 or both, and calculating a score, an optional diagnostic module for comparing the score to a cutoff value and identifying a score above the cutoff value as a responder and below a cutoff value as a nonresponder, and an optional communication module for communicating the identification of the sample as a nonresponder or responder to a user.
  • the communication module may communicate to a user through a graphical interface on a computer or mobile device or through a cabled connection, a wireless local area network (WLAN or Wi-Fi) connection, a cellular network, a wireless personal area network (WPAN) connection, e.g., BLUETOOTH®, or any desired
  • Instructions can also be stored on a nontransitory computer readable medium.
  • the instructions provide for a computer implemented method comprising a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient; b) inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score; c) comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and d) communicating the identification of the patient as a treatment responder or nonresponder to a
  • kits for identifying a patient or set of patients as a responder or nonresponder include reagents for detecting the gene expression levels of the first set of genes or the second set of genes or both.
  • the reagents include primers or probes.
  • the probes may be attached to a surface.
  • the primers or probes may be detectably labeled.
  • the kit includes reagents for conducting PCR.
  • Primers and probes that specifically detect expression of each of the first set of genes and the second set of genes can be readily designed by using known methods and/or publicly available software.
  • the primers and probes are not designed to bind to 3' poly A regions or other repetitive or nonunique sequences.
  • the primers or probes specifically bind to or hybridize under stringent conditions to each gene or exemplary target sequence for each gene identified in Table 1 and provided in Tables 6-26 and do not cross hybridize to other genes.
  • target regions for each gene have been identified that are known to uniquely identify each of the genes.
  • Probes or primers can be designed that detect or amplify all or a part of the target sequence.
  • One of skill in the art will recognize that such primers and probes may amplify a number of different gene sequences corresponding to each gene including allelic variants, SNPs, splice variants and the like.
  • Labels include radioactive isotopes, fluorescent moieties, other dyes, biotin, and molecular beacons.
  • a kit contains no more than 200 sets of primers and/or no more than 200 probes, no more than 150 probes and/or sets of primers, no more than 100 probes and/or sets of primers, no more than 50 probes and/or sets of primers, no more than 25 probes and/or 25 sets of primers, no more than 9 probes and/or sets of primers.
  • a subarray may be prepared by attaching at least one probe that specifically binds to and hybridizes to each gene of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes to a surface.
  • the kit can provide a subarray or primers/probes that detect a smaller subset of genes that includes the first and/or second set of genes, providing that 200 genes or less are analyzed.
  • about 9 to 200 genes are analyzed, about 9 to 150 genes are analyzed, about 9 to 100 genes are analyzed, about 9 to 50 genes are analyzed, or about 9 to 25 genes are analyzed.
  • the other genes that are analyzed may include other known markers for breast cancer or other known markers for cancer such as ovarian cancer including LYPLA2, TUBA3C, ACTB, MED13L, OSBPL8, EED, and PKP4 and/or SSR1, USP5, ACTB, HLCS, NDUFB1, LYPLA2, TUBA3C, MED13L, and EED.
  • the kit includes at least one set of primers or probes that is specific for one or more housekeeping genes expressed in both responder and nonresponder breast cancer cells and/or that is expressed in normal breast tissue.
  • the kit includes at least one set of primers or probes that is specific for another transcript of at least one gene of the first set of genes or the second set of genes or both. For example, in the second set of genes, detection of two different transcripts of MCM5 is determined. The use of two different transcripts serves as an internal control to determine if the subarray or PCR is functioning properly. Other genes of the first set or second set are known to have alternative transcripts and can serve as controls.
  • the kit includes a control that includes one or more of the gene sequences of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes.
  • the kit includes a control that is breast cancer cells from a known responder and/or a known non responder.
  • the kit can comprise or consist essentially of other reagents for detecting the gene expression level of the identified genes.
  • the kit may also contain primers or probes for detecting one or more housekeeping genes as a positive control.
  • the kit does not contain probes for any other genes that are predictive of short term or long term survivorship of breast cancer other than the genes identified herein.
  • the kit further comprises instructions for inputting the gene expression values into function 1, or function 2, or combinations thereof to obtain an output score.
  • the instructions further provide comparing the output score for each function to a cutoff value and determining if the subject is likely to have long term survival if the output score is less than the cutoff value or if the subject is likely to have short term survival if the subject has an output score greater than or equal to the cutoff value for each function.
  • a kit further comprises a nontransitory computer readable storage medium having computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: calculating an output score by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer as described previously herein.
  • the nontransitory computer readable storage medium having computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: comparing the output score to a cutoff value and displaying whether the subject is likely to have long term survival if the output score is less than the cutoff value or if the subject is likely to have short term survival if the subject has an output score greater than or equal to the cutoff value for each function.
  • the kit includes a nontransitory computer readable medium with instructions for analyzing the gene expression data identified above.
  • the instructions provide for a computer implemented method comprising a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient; b) inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score; c) optionally, comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and d) optionally, communicating
  • the kit includes instructions for communicating the gene expression information to a remote computing device.
  • the kit includes instructions for comparing the score from function 1 or function 2 to a cutoff value.
  • Pathological complete response to chemotherapy was defined as the absence of any residual invasive cancer at the breast site and at the nearest axillary lymph node site.
  • Ninety three subjects were able to complete the aforementioned treatment protocol in terms of dosage and frequency (number of administered courses) of taxol and the other drugs. Of those 93 subjects, 20 responded to the taxol-based chemotherapy and had no residual invasive cancer at the end of the six-month treatment, whereas the remaining 73 did not do so.
  • the cut-off score of the F1 prognostic biomarker model was determined by taking into account the results of the following two analyses: (1) calculation of the optimal point on the ROC curve based on the 50 scores of the 50 original subjects used in the discovery study
  • Table 4 shows the observed mean F1 scores of the two groups (R and NR) of the 43 unknown subjects in the validation study. As can be seen, both of those group mean scores, as observed in the validation study with the 43 unknown subjects, fall within the 99.99% confidence interval of the respective group mean scores as predicted in the discovery study (Table 3).
  • IPA (Ingenuity Pathway Analysis)
  • the CCND1 (also cyclin D1) gene encodes a protein that belongs to the cyclin family, the members of which are regulators of CDK kinases.
  • Overexpression of the CCND1 gene, which alters cell cycle progression, has been observed in a variety of tumors and may contribute to tumorigenesis.
  • the CCND1 gene has been observed to interact with the BRCA1 and BRCA2 genes, known to be familial breast and ovarian cancer susceptibility genes.
  • Over-expression of CCND1 has been shown to play a crucial role in the development and progression of several types of cancers, such as breast, esophageal, bladder, and lung cancer.
  • CCND1 has been linked to the development of resistance to endocrine drugs in breast cancer cells.
  • Over-expression of CCND1 has also been shown to contribute to the progression of breast tumor cells to invasive carcinomas.
  • the CCND1 gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders)
  • the LAMA5 (laminin, alpha 5) gene encodes a protein that belongs to the alpha subfamily of laminin proteins, which constitute a major component of basement membranes, and which affect tissue development in many organs.
  • Over-expression of the LAMA5 gene has been observed in various types of cancer, such as glioma, melanoma, hepatocellular carcinoma, lung adenocarcinoma, breast cancer, ovarian cancer, etc., especially in connection with tumor cell migration and invasiveness.
  • In addition to oncogenesis and metastatic colonization,
  • the gene FAAH (fatty acid amide hydrolase) encodes a protein that is responsible for the hydrolysis of a number of primary and secondary fatty acid amides.
  • over-expression of the FAAH gene resulted in cell invasion and cell migration in prostate carcinoma cells.
  • tumor over-expression of FAAH has been associated with prostate cancer severity and outcome, and it has been shown that antiproliferative effects could be observed in prostate cancer cell lines by inhibiting the FAAH enzyme.
  • FAAH was significantly over-expressed in the subjects that failed to respond to the T/FAC treatment.
  • the FAAH gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
  • the RARA gene encodes a protein (retinoic acid receptor alpha) that regulates transcription.
  • over-expression of RARA has been shown to induce cell proliferation via direct up-regulation of c-MYC in mice.
  • Over-expression of RARA has also been observed in human ovarian tumor cells.
  • ER receptor α and that of RARA are coordinated; more specifically, over-expression of the former induces over-expression of the latter in ER-positive breast cancer cells. More interestingly, however, regarding our findings, it has also been observed that the crucial biological effects exerted by RARA on human breast cancer cells are mediated regardless of the ER status of those cells.
  • the RARA gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
  • the CELSR1 (cadherin, EGF LAG seven-pass G-type receptor 1) gene encodes a protein that is a member of the flamingo subfamily, which is part of the cadherin superfamily.
  • the flamingo cadherins are located at the plasma membrane and are thought to be receptors involved in contact-mediated cell communication. In squamous cell carcinoma cells, it has been shown that over-expressed G protein-coupled receptor proteins, via communication with EGFR
  • the IGKV1-5 (immunoglobulin kappa variable 1-5) gene encodes a protein whose molecular function is antigen binding, and which is involved in complement activation, innate immune response, and in regulation of immune response, in general. Although little is known about the exact function of IGKV1-5, it has been shown that it is expressed in leukocytes in human peripheral blood, and that various types of cancer cells effect significant reduction of the expression of immune-response related genes, such as those involved in antigen presentation pathway, genes in the B-cell receptor complex, genes in the human leukocyte antigen (HLA) class, etc.
  • the Affymetrix HG-U133A probe set 207470_at corresponds to DKFZp566H0824 (hypothetical LOC54744). According to our results, this unknown gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
  • the UBE2J1 (ubiquitin-conjugating enzyme E2, J1, U) gene encodes a protein that is a member of the E2 ubiquitin-conjugating enzyme family. The modification of proteins with ubiquitin is an important cellular mechanism that targets abnormal or short-lived proteins for degradation.
  • BRCA1 via its binding to UBE2J1 , as well as to other members of the E2 family, directs the synthesis of specific polyubiquitin chain linkages. Given that BRCA1 functions as tumor suppressor and plays a role in DNA damage repair, it follows that an abnormal down-regulation of BRCA1 would most likely entail a down-regulation of UBE2J1.
  • the UBE2J1 gene was significantly under-expressed in the NR group (non-responders) relative to the R group (responders).
  • the OXCT1 (3-oxoacid CoA transferase 1) gene encodes a protein that is a mitochondrial matrix enzyme and plays a central role in ketone metabolism.
  • HRAS, a well-known oncogene involved in many different types of cancer, suppresses the expression of OXCT1.
  • over-expression of HRAS in breast cancer tumors can be constitutively mediated via deregulation of HER2, ER, EGFR, and other receptors. That, therefore, over-expression of HRAS in aggressive breast tumor cells leads to suppression of the expression of OXCT1 accords with our finding: the OXCT1 gene was significantly under-expressed in the NR group (non-responders) relative to the R group
  • breast cancer prognostic tests currently in the market not only have limited accuracy (sensitivity and specificity < 80%) but also limited applicability: they can be administered only to specific combinations of the aforementioned three hormone receptors, that is to say, they can be administered to a small subset of the population of the breast cancer patients.
  • Physicians will have the ability to identify with a high degree of accuracy both the responders and the non-responders to current chemotherapy at the outset (at the time of the biopsy and prior to the commencement of chemotherapy).
  • aforementioned nine important genes can assist pharmaceutical companies to test and develop new analogs of chemotherapeutic agents or new cocktails of small molecules that can modulate most, if not all, of those nine genes.
  • Pathological complete response to chemotherapy was defined as the absence of any residual invasive cancer at the breast site and at the nearest axillary lymph node site.
  • Two hundred and sixty six subjects were able to complete the aforementioned treatment protocol in terms of dosage and frequency (number of administered courses) of taxol and the other drugs.
  • 58 responded to the taxol-based chemotherapy and had no residual invasive cancer at the end of the six-month treatment, whereas the remaining 208 did not do so.
  • prognostic biomarker F2
  • This prognostic biomarker is a complex mathematical function of twelve genes, as shown immediately below.
  • AAAACATGGGCGCTTACACTGTTGC SEQ ID NO:47

Abstract

The present invention relates to methods for the prognosis of breast cancer, in particular methods to distinguish between breast cancer patients who will respond or not respond to therapy.

Description

METHODS AND KITS FOR SELECTION OF A TREATMENT FOR BREAST
CANCER
This application is being filed on 16 October 2012, as a PCT International Patent application and claims priority to U.S. Patent Application Serial No. 61/548,041 filed on 17 October 2011 and U.S. Patent Application Serial No. 61/641,532 filed on 02 May 2012, the disclosures of which are incorporated herein by reference in their entireties.
Background of the Invention
Breast cancer is the leading cancer in women in the United States in terms of annual incidence rate (~ 207,090 new cases/yr.) and the second most lethal cancer (~ 39,840 deaths / yr.) [American Cancer Society, 2010; Jemal et al., 2010]. Treatment may entail lumpectomy or mastectomy and removal of some of the axillary lymph nodes, and it may involve chemotherapy (with taxol or other chemotherapeutic agents), before or after surgery, hormone therapy, or radiation [American Cancer Society, 2010].
There are currently prognostic breast biomarkers that can guide treatment based on the estrogen receptor, progesterone receptor, or HER2 status of the patient. However, no biomarkers are available that can predict which breast cancer patients will respond to chemotherapy regardless of the status of the three above mentioned receptors with both sensitivity and specificity > 80%, as mandated by the latest FDA requirements. Thus, there is a need to identify additional biomarkers for diagnosis and prognosis of breast cancer.
Summary of the Invention
We have developed two sets of prognostic biomarkers for breast cancer. Our prognostic biomarker for breast cancer for the US-based study demonstrated an overall sensitivity of 90.00 % and an overall specificity of 91.78 % (AUC = 96.16 %) regardless of the estrogen receptor, progesterone receptor, or HER2 status of the patient.
Our prognostic biomarker for the global study demonstrated an overall sensitivity of 84.5% and an overall specificity of 85.1% (AUC = 88.7%), again regardless of the estrogen receptor, progesterone receptor, or HER2 status of the patient. Our prognostic biomarker tests for breast cancer empower physicians to identify at the time of the biopsy, and prior to the administration of any chemotherapeutic treatment, those patients who will respond to the six-month taxol-based chemotherapy and will have no residual invasive cancer, as well as those patients who will not respond. The ability to identify the responders and the non-responders to the taxol-based chemotherapy at the outset (time of biopsy) regardless of age, ethnicity, status of hormone receptors (estrogen, progesterone, or HER2), or stage of cancer will have a significant impact on guiding the patient and their oncologist on the most appropriate treatment options.
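As an illustration only, the reported sensitivity and specificity figures can be checked against the cohort sizes stated elsewhere in this document (20 responders and 73 non-responders in the US-based study; 58 responders and 208 non-responders in the global study). The sketch below assumes that responders are the positive class and that the correctly classified counts (18, 67, 49, 177) are the integers that reproduce the reported percentages; those counts are back-calculated assumptions and are not stated in the source.

```python
def sensitivity(true_positives: int, positives_total: int) -> float:
    """Fraction of actual positives (here: responders) correctly identified."""
    return true_positives / positives_total


def specificity(true_negatives: int, negatives_total: int) -> float:
    """Fraction of actual negatives (here: non-responders) correctly identified."""
    return true_negatives / negatives_total


# Cohort sizes are from the document; correct-call counts are ASSUMED
# (back-calculated so that the reported percentages are reproduced).
us_sens = sensitivity(18, 20)         # US-based study, 20 responders
us_spec = specificity(67, 73)         # US-based study, 73 non-responders
global_sens = sensitivity(49, 58)     # global study, 58 responders
global_spec = specificity(177, 208)   # global study, 208 non-responders

print(f"US study:     sensitivity {us_sens:.2%}, specificity {us_spec:.2%}")
print(f"Global study: sensitivity {global_sens:.1%}, specificity {global_spec:.1%}")
```

Under these assumptions the computed ratios round to the percentages reported above, which suggests the summary figures are internally consistent with the stated cohort sizes.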
In addition to the area of personalized medicine, our prognostic biomarker test for breast cancer can have a significant impact in the area of pharmacogenomics. Pharmaceutical companies may utilize our prognostic biomarker test to develop new chemotherapeutic treatments which would be specifically aimed at those breast cancer patients that do not respond to the current taxol-based chemotherapy, and which would seek to restore the gene networks that are characteristically aberrant in that subpopulation of patients.
One embodiment provides a method to determine if a breast cancer patient is a treatment responder or a non-responder comprising measuring the level of expression of at least one gene in a sample from the patient, wherein the level of expression of the at least one gene in the sample is an indication that the subject is a treatment responder or a non-responder to breast cancer chemotherapy. In embodiments, the sample is a tumor cell sample obtained from a tumor or from blood.
One embodiment provides a method for diagnosing breast cancer in a subject comprising: measuring the level of expression of at least one gene in a test sample from a subject and comparing the level of expression with the level of expression of the at least one gene in a control sample from a healthy subject, wherein a higher or lower level of expression of the gene in the test sample compared with the level of expression in the control sample is an indication that the subject will respond or not respond to a breast cancer treatment. In one embodiment, the mRNA levels are measured. In another embodiment, the protein levels are measured. In one embodiment, the gene expression levels are measured by microarray analysis.
An embodiment provides a method of identifying markers in an individual correlated with the individual's likelihood of being a responder or nonresponder comprising: assaying genetic material from the individual for the expression level of at least one gene, wherein the expression levels of at least one gene are associated with the likelihood of the patient being a responder or nonresponder to a breast cancer treatment.
In one embodiment, the measurement of gene expression provides a diagnosis which indicates that the subject/patient will respond to treatment. In another embodiment, the measurement of gene expression provides a diagnosis that the subject/patient will not respond to treatment. In one embodiment, the subject/patient is a mammal, such as a human. In one embodiment, a health care provider is informed. In another embodiment, the subject/patient is treated for breast cancer. In embodiments, the treatment is selected based on whether the patient has a tumor that is a responder or a nonresponder. In some embodiments, if the patient is a nonresponder, the patient is not treated with preoperative chemotherapy.
In embodiments, a method of selecting a treatment for a subject having breast cancer comprises determining whether a subject having breast cancer is likely to have short term or long term survival by a method comprising measuring the level of gene expression of at least a set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 in a sample comprising breast cancer cells from the subject; inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; determining whether the subject is likely to have long term survival (responder) by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival (nonresponder) by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying a value between the 99% confidence interval of a mean output score of a first set of samples from subjects known to have short term survival and the 99% confidence interval of a mean output score of a second set of samples from subjects known to have long term survival; and optionally, displaying whether the output score is greater than or equal to the cutoff value or less than the cutoff value to a health care worker so that the health care worker can select a treatment for the subject.
In embodiments, a method of selecting a treatment for a subject having breast cancer comprises determining whether a subject having breast cancer is likely to have short term or long term survival by a method comprising measuring the level of gene expression of at least a set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK in a sample comprising breast cancer cells from the subject; inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; determining whether the subject is likely to have long term survival by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying a value between the 99% confidence interval of a mean output score of a first set of samples from subjects known to have short term survival and the 99% confidence interval of a mean output score of a second set of samples from subjects known to have long term survival; and optionally, displaying whether the output score is greater than or equal to the cutoff value or less than the cutoff value to a health care worker so that the health care worker can select a treatment for the subject.
In embodiments, the methods further comprise treating a subject likely to have long term survival with standard chemotherapy (T/FAC). In embodiments, standard chemotherapy (T/FAC) comprises paclitaxel, 5-fluorouracil, doxorubicin, and cyclophosphamide. In embodiments, the method further comprises treating a subject likely to have short term survival (nonresponder) with therapy in addition to or in place of standard chemotherapy. In embodiments, an alternative therapy comprises a therapy selected from the group consisting of antiangiogenesis compounds, taxane analogues, tubulin binding agents, and ubiquitination inhibitors. In embodiments, a subject likely to have short term survival is treated with an inhibitor of a protein selected from the group consisting of CCND1, RARA, UBE2J1, and combinations thereof.
In yet other embodiments, the disclosure provides a method for selecting a treatment for a subject that has breast cancer, the method comprising: calculating an output score, using a computing device, by inputting gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK, into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and displaying the output score, using a computing device. In embodiments, the method further comprises determining whether the output score is greater than or equal to or less than a cutoff value, using a computing device; and displaying whether the subject is likely to be a short term or long term survivor. In embodiments, the status of the patient is communicated to a health care worker, optionally with a recommendation for treatment. In embodiments, the treatment options include standard chemotherapy, an alternative therapy selected from the group consisting of antiangiogenesis compounds, taxane analogues, tubulin binding agents, and ubiquitination inhibitors, and/or an inhibitor of a protein selected from the group consisting of CCND1, RARA, UBE2J1, and combinations thereof. In other embodiments, a nonresponder is not treated with preoperative chemotherapy but may be treated with chemotherapy post surgery.
In another aspect, the disclosure provides kits for selecting a treatment for a breast cancer patient. In embodiments, a kit comprises or consists essentially of a primer or a probe or both that specifically hybridizes to each gene of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1. In embodiments, the kit consists essentially of reagents for detecting expression of the first set of genes and contains other reagents such as primers or probes for housekeeping genes, positive controls and/or negative controls. In other embodiments, a kit comprises or consists essentially of: a primer or a probe or both that specifically hybridizes to each gene of a first set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK.
In embodiments, the kit contains no more than 200 primers or probes or both, no more than 175 primers, probes or both, no more than 150 primers, probes or both, no more than 125 primers, probes or both, no more than 100 primers, probes or both, no more than 75 primers, probes or both, no more than 50 primers, probes or both, no more than 25 primers, probes or both, or no more than 15 primers, probes or both.
In embodiments, a kit further comprises a non transitory computer readable storage medium having computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: calculating an output score by inputting gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK or both from a sample from the patient into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to provide an output score.
In embodiments, the disclosure provides a computing device comprising a processing unit; and a system memory connected to the processing unit, the system memory including instructions that, when executed by the processing unit, cause the processing unit to: calculate an output score by inputting gene expression levels of a set of genes comprising a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1, or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK from a sample, into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and display the output score. In yet another embodiment, the system memory includes instructions that, when executed by the processing unit, cause the processing unit to determine whether the output score is greater than or equal to or less than a cutoff value; and display whether the subject is likely to be a short term or long term survivor. In embodiments, the system memory further includes instructions for making a recommendation for treatment options. In embodiments, those recommendations include: for a responder, standard chemotherapy prior to surgery, and for a nonresponder, no chemotherapy or an alternative chemotherapy prior to surgery.
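The instructions described for the computing device can be sketched as follows. This is only a minimal illustration under stated assumptions: the name `assess`, the returned dictionary layout, and the stand-in `toy_score` are hypothetical, and the predictive function itself is passed in as a parameter because its form is not given in this passage. The cutoff of 4.6683 is the F1 cut-off score reported in the figure descriptions.

```python
from typing import Callable, Mapping

# First set of genes as listed in the disclosure.
FIRST_GENE_SET = ("CCND1", "CELSR1", "DKFZp566H0824", "FAAH", "IGKV1-5",
                  "LAMA5", "OXCT1", "RARA", "UBE2J1")


def assess(expression: Mapping[str, float],
           score_function: Callable[[Mapping[str, float]], float],
           cutoff: float,
           genes=FIRST_GENE_SET) -> dict:
    """Calculate an output score, compare it with the cutoff, and
    return the status together with a treatment recommendation."""
    missing = [g for g in genes if g not in expression]
    if missing:
        raise ValueError(f"missing expression levels for: {missing}")
    levels = {g: expression[g] for g in genes}
    score = score_function(levels)
    if score < cutoff:
        # Below the cutoff: likely long term survivor (responder).
        status = "responder"
        recommendation = "standard chemotherapy prior to surgery"
    else:
        # At or above the cutoff: likely short term survivor (nonresponder).
        status = "nonresponder"
        recommendation = ("no chemotherapy or an alternative "
                          "chemotherapy prior to surgery")
    return {"score": score, "status": status, "recommendation": recommendation}


def toy_score(levels: Mapping[str, float]) -> float:
    """Hypothetical stand-in ONLY; this is not the disclosed function."""
    return sum(levels.values()) / len(levels)


if __name__ == "__main__":
    sample = {gene: 1.0 for gene in FIRST_GENE_SET}  # made-up expression levels
    print(assess(sample, toy_score, cutoff=4.6683))
```

Passing the function as an argument keeps the comparison and recommendation logic independent of whichever predictive function and gene set are actually used.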
Brief Description of the Drawings
Figure 1. Results and performance assessment of the F1 breast cancer prognostic biomarker during the Discovery Study. As can be seen in this scatter plot and bar graph, of the 10 original responders, 9 had F1 scores that were lower than the cut-off score of 4.6683, and they were, therefore, classified correctly as responders (R) [Sensitivity = (9/10) = 0.900]. Of the 40 original non-responders, 36 had F1 scores that were higher than the cut-off score of 4.6683, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (36/40) = 0.900]. The AUC value, the probability of significance (P), and the mean group F1 score and standard deviation for both groups (R & NR) are also shown.
Figure 2. Results and performance assessment of the F1 breast cancer prognostic biomarker during the Validation Study. As can be seen in this scatter plot and bar graph, of the 10 unknown responders (new subjects and different from the original ones), 9 had F1 scores that were lower than the cut-off score of 4.6683, and they were, therefore, classified correctly as responders (R) [Sensitivity = (9/10) = 0.900]. Of the 33 unknown non-responders (new subjects and different from the original ones), 31 had F1 scores that were higher than the cut-off score of 4.6683, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (31/33) = 0.939]. The AUC value, the probability of significance (P), and the mean group F1 score and standard deviation for both groups (R & NR) are also shown.
Figure 3. Overall results and performance assessment of the F1 breast cancer prognostic biomarker from the Discovery & Validation Studies. As can be seen in this scatter plot and bar graph, of the 20 responders, 18 had F1 scores that were lower than the cut-off score of 4.6683, and they were, therefore, classified correctly as responders (R) [Sensitivity = (18/20) = 0.900]. Of the 73 non-responders, 67 had F1 scores that were higher than the cut-off score of 4.6683, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (67/73) = 0.918]. The AUC value, the probability of significance (P), and the mean group F1 score and standard deviation for both groups (R & NR) are also shown.
Figure 4. Results and performance assessment of the F2 breast cancer prognostic biomarker during the Discovery Study. As can be seen in this scatter plot and bar graph, of the 38 original responders, 32 had F2 scores that were lower than the cut-off score of 13.69, and they were, therefore, classified correctly as responders (R) [Sensitivity = (32/38) = 0.842]. Of the 134 original non-responders, 110 had F2 scores that were higher than the cut-off score of 13.69, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (110/134) = 0.821]. The AUC value, the probability of significance (P), and the mean group F2 score and standard deviation for both groups (R & NR) are also shown.
Figure 5. Results and performance assessment of the F2 breast cancer prognostic biomarker during the Validation Study. As can be seen in this scatter plot and bar graph, of the 20 unknown responders (new subjects and different from the original ones), 17 had F2 scores that were lower than the cut-off score of 13.69, and they were, therefore, classified correctly as responders (R) [Sensitivity = (17/20) = 0.850]. Of the 74 unknown non-responders (new subjects and different from the original ones), 67 had F2 scores that were higher than the cut-off score of 13.69, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (67/74) = 0.905]. The AUC value, the probability of significance (P), and the mean group F2 score and standard deviation for both groups (R & NR) are also shown.
Combining the results of the F2 breast cancer prognostic biomarker from the Discovery and the Validation Studies, Figure 6 below depicts the overall performance of the F2 prognostic biomarker.
Figure 6. Overall results and performance assessment of the F2 breast cancer prognostic biomarker from the Discovery & Validation Studies. As can be seen in this scatter plot and bar graph, of the 58 responders, 49 had F2 scores that were lower than the cut-off score of 13.69, and they were, therefore, classified correctly as responders (R) [Sensitivity = (49/58) = 0.845]. Of the 208 non-responders, 177 had F2 scores that were higher than the cut-off score of 13.69, and they were, therefore, classified correctly as non-responders (NR) [Specificity = (177/208) = 0.851]. The AUC value, the probability of significance (P), and the mean group F2 score and standard deviation for both groups (R & NR) are also shown.
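The sensitivity and specificity values quoted in the figure descriptions follow directly from the reported counts, and the overall figures are the sums of the Discovery and Validation counts. The short sketch below recomputes them; the helper name `rate` and the dictionary layout are illustrative only, while the counts are those stated above.

```python
def rate(correct: int, total: int) -> float:
    """Fraction of a group classified correctly, rounded to three decimals."""
    return round(correct / total, 3)


# (responders correct, responders total, non-responders correct, non-responders total)
COUNTS = {
    "F1 Discovery": (9, 10, 36, 40),
    "F1 Validation": (9, 10, 31, 33),
    "F2 Discovery": (32, 38, 110, 134),
    "F2 Validation": (17, 20, 67, 74),
}


def combine(a: tuple, b: tuple) -> tuple:
    """Add Discovery and Validation counts element by element."""
    return tuple(x + y for x, y in zip(a, b))


COUNTS["F1 Overall"] = combine(COUNTS["F1 Discovery"], COUNTS["F1 Validation"])
COUNTS["F2 Overall"] = combine(COUNTS["F2 Discovery"], COUNTS["F2 Validation"])

if __name__ == "__main__":
    for name, (r_ok, r_n, nr_ok, nr_n) in COUNTS.items():
        sensitivity = rate(r_ok, r_n)      # responders correctly classified
        specificity = rate(nr_ok, nr_n)    # non-responders correctly classified
        print(f"{name}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```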
Figure 7 shows a flow diagram for an analytic method for determining the prognosis of a breast cancer tissue sample.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, several embodiments with regards to methods and materials are described herein.
As used herein, each of the following terms has the meaning associated with it in this section. The articles "a" and "an" are used herein to refer to one or to more than one (i.e. to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element. A "subject" or "patient" is a vertebrate, including a mammal, such as a human. Mammals include, but are not limited to, humans, farm animals, sport animals and pets.
The term "about," as used herein, means approximately, in the region of, roughly, or around. When the term "about" is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term "about" is used herein to modify a numerical value above and below the stated value by a variance of 10%. In one aspect, the term "about" means plus or minus 20% of the numerical value of the number with which it is being used. Therefore, about 50% means in the range of 45%-55%. Numerical ranges recited herein by endpoints include all numbers and fractions subsumed within that range (e.g. 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term "about."
The term "binding" refers to the adherence of molecules to one another, such as, but not limited to, enzymes to substrates, ligands to receptors, antibodies to antigens, DNA binding domains of proteins to DNA, and DNA or RNA strands to complementary strands. "Binding partner," as used herein, refers to a molecule capable of binding to another molecule.
The term "biological sample," or "patient sample" as used herein, refers to samples obtained from a subject, including, but not limited to, skin, hair, tissue, blood, plasma, serum, cells, sweat, saliva, feces, tissue, biopsy samples, and/or urine.
The term "complementary," "complement," or "complementary nucleic acid sequence" refers to the nucleic acid strand that is related to the base sequence in another nucleic acid strand by the Watson-Crick base-pairing rules. In general, two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G respectively, of the other sequence. RNA sequences can also include complementary G/U or U/G base pairs.
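The anti-parallel pairing rule in this definition can be illustrated with a short sketch for DNA sequences. The function names are illustrative, and the sketch handles only the four standard DNA bases; RNA and G/U wobble pairs are deliberately left out.

```python
# Watson-Crick pairing for DNA: A<->T and G<->C.
_DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq` in the anti-parallel sense,
    written 5' to 3'."""
    return seq.upper().translate(_DNA_PAIRS)[::-1]


def is_complementary(a: str, b: str) -> bool:
    """True when `b` (5' to 3') is the exact reverse complement of `a`."""
    return reverse_complement(a) == b.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(is_complementary("ATGC", "GCAT"))
```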
The terms "comprises", "comprising", and the like can have the meaning ascribed to them in U.S. Patent Law and can mean "includes", "including" and the like.
The terms "deoxyribonucleic acid" and "DNA" as used herein mean a polymer composed of deoxyribonucleotides. The terms "determining," "measuring," "assessing," and "assaying" are used interchangeably and include both quantitative and qualitative determinations. The use of the word "detect" and its grammatical variants refers to measurement of the species without quantification, whereas use of the word "determine" or "measure" with their grammatical variants is meant to refer to measurement of the species with quantification. The terms "detect" and "identify" are used interchangeably herein.
As used herein, "health care provider or worker" includes either an individual or an institution that provides preventive, curative, promotional or rehabilitative health care services to a subject, such as a patient. In one embodiment, the data is provided to a health care provider so that they may use it in their diagnosis/treatment of the patient.
As used herein, "homology" is used synonymously with "identity." The determination of percent identity between two nucleotide or amino acid sequences can be accomplished using a mathematical algorithm. For example, a mathematical algorithm useful for comparing two sequences is the algorithm of Karlin and Altschul (1990), modified as in Karlin and Altschul (1993). This algorithm is incorporated into the NBLAST and XBLAST programs of Altschul, et al., and can be accessed, for example at the National Center for Biotechnology Information (NCBI) world wide web site. BLAST nucleotide searches can be performed with the NBLAST program (designated "blastn" at the NCBI web site), using the following parameters: gap penalty = 5; gap extension penalty = 2; mismatch penalty = 3; match reward = 1; expectation value 10.0; and word size = 11 to obtain nucleotide sequences homologous to a nucleic acid described herein. BLAST protein searches can be performed with the XBLAST program (designated "blastn" at the NCBI web site) or the NCBI "blastp" program, using the following parameters: expectation value 10.0, BLOSUM62 scoring matrix to obtain amino acid sequences homologous to a protein molecule described herein. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. Alternatively, PSI-Blast or PHI-Blast can be used to perform an iterated search which detects distant relationships between molecules and relationships between molecules which share a common pattern. When utilizing BLAST, Gapped BLAST, PSI-Blast, and PHI-Blast programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically exact matches are counted.
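Counting exact matches, with or without gaps, can be sketched for two sequences that have already been aligned. This is an illustration only and not the BLAST algorithm: the function name, the `-` gap character, and the choice of denominator are assumptions, and other conventions for the denominator are in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent of aligned positions that are exact matches.

    Both inputs must come from the same alignment and so have equal length.
    With count_gaps=False, columns containing a gap ('-') are excluded
    from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    if not count_gaps:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    if not columns:
        return 0.0
    # Only exact residue matches are counted; gap-to-gap columns are not.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    a, b = "ACGT-A", "ACCTGA"
    print(percent_identity(a, b))                     # gap column counted
    print(percent_identity(a, b, count_gaps=False))   # gap column excluded
```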
As used herein, "substantially homologous amino acid sequences" or "substantially identical amino acid sequences" include those amino acid sequences which have at least about 92%, or at least about 95% homology or identity, including at least about 96% homology or identity, including at least about 97% homology or identity, including at least about 98% homology or identity, and at least about 99% or more homology or identity to an amino acid sequence of a reference antibody chain. Amino acid sequence similarity or identity can be computed by using the BLASTP and TBLASTN programs which employ the BLAST (basic local alignment search tool) 2.0.14 algorithm. The default settings used for these programs are suitable for identifying substantially similar amino acid sequences for purposes of the present invention.
"Substantially homologous nucleic acid sequence" or "substantially identical nucleic acid sequence" means a nucleic acid sequence corresponding to a reference nucleic acid sequence wherein the corresponding sequence encodes a peptide having substantially the same structure and function as the peptide encoded by the reference nucleic acid sequence; e.g., where only changes in amino acids not significantly affecting the peptide function occur. In one embodiment, the substantially identical nucleic acid sequence encodes the peptide encoded by the reference nucleic acid sequence. The percentage of identity between the substantially similar nucleic acid sequence and the reference nucleic acid sequence is at least about 50%, 65%, 75%, 85%, 92%, 95%, 99% or more. Substantial identity of nucleic acid sequences can be determined by comparing the sequence identity of two sequences, for example by physical/chemical methods (i.e., hybridization) or by sequence alignment via computer algorithm.
"Isolated" or "purified" generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, chromosome, etc.) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides, polypeptides and intact chromosomes of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography, sorting, and sedimentation according to density.
The terms "nucleic acid" and "polynucleotide" are used interchangeably herein to describe a polymer of any length, e.g., greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, usually up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. The term "hybrid" refers to a double-stranded nucleic acid molecule formed by hybridization between complementary nucleotides.
The terms "nucleoside" and "nucleotide" are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms "nucleoside" and "nucleotide" include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
The term "oligonucleotide" as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length. Oligonucleotides are usually synthetic and, in many embodiments, are under 50 nucleotides in length. Each oligonucleotide may have any suitable length. For example, the length of the oligonucleotide may be between 60 nucleotides and 200 nucleotides (inclusive), between 80 nucleotides and 200 nucleotides, between 100 nucleotides and 200 nucleotides, between 125 nucleotides and 200 nucleotides, or between 150 nucleotides and 200 nucleotides. In some cases, the oligonucleotide may have a length of at least 60 nucleotides, at least 80 nucleotides, at least 100 nucleotides, or at least 150 nucleotides, and in certain embodiments, the oligonucleotide may have a length no greater than 200 nucleotides, no greater than 175 nucleotides, or no greater than 160 nucleotides. Oligonucleotides having such nucleotide lengths may be prepared using any suitable method, for example, using de novo DNA synthesis techniques known to those of ordinary skill in the art, such as solid-phase DNA synthesis techniques. Often, such oligonucleotides can be designed with the aid of a computer, based on the sequence of the genome and/or a region of interest. Oligonucleotides can serve as primers or probes in accord with standard methods. Primers and probes can be designed with publicly available software such as PRIME , Primer3, Webprimer, Genefisher, OLIGO Primer analysis software, and PROBER.
The term "primer" refers to a nucleic acid capable of acting as a point of initiation of synthesis along a complementary strand when conditions are suitable for synthesis of a primer extension product. The synthesizing conditions include the presence of four different bases and at least one polymerization-inducing agent such as reverse transcriptase or DNA polymerase. These are present in a suitable buffer, which may include constituents which are co-factors or which affect conditions such as pH and the like at various suitable temperatures. A primer is preferably a single strand sequence, such that amplification efficiency is optimized, but double stranded sequences can be utilized. Primers are typically at least about 15 nucleotides. In embodiments, primers can have a length of anywhere from 15 to 2000 nucleotides. In embodiments, primers have a melting temp of at least 50°C, 52°C, 55°C, 58°C, 60°C, or 65°C.
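The melting temperature thresholds above can be applied as a simple filter over candidate primers. The disclosure does not state how melting temperature is estimated, so the sketch below assumes the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is a rough approximation generally applied only to short oligonucleotides. The function names and the example sequence are illustrative.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature in degrees C with the Wallace rule:
    2 degrees per A or T plus 4 degrees per G or C (assumed method)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G, and T")
    return 2 * at + 4 * gc


def meets_tm_threshold(primer: str, min_tm: float = 50.0) -> bool:
    """True when the estimated Tm is at least `min_tm` degrees C."""
    return wallace_tm(primer) >= min_tm


if __name__ == "__main__":
    candidate = "ATGCATGCATGCATGCATGC"  # made-up 20-nucleotide example
    print(wallace_tm(candidate), meets_tm_threshold(candidate, 58.0))
```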
The term "probe" refers to a nucleic acid that hybridizes to a target sequence. In some embodiments, a probe includes about eight nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 40 nucleotides, about 50 nucleotides, about 60 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 90 nucleotides, about 100 nucleotides, about 110 nucleotides, about 115 nucleotides, about 120 nucleotides, about 130 nucleotides, about 140 nucleotides, about 150 nucleotides, about 175 nucleotides, about 187 nucleotides, about 200 nucleotides, about 225 nucleotides, and about 250 nucleotides. In embodiments, probes have a melting temp of at least 50°C, 52°C, 55°C, 58°C, 60°C, or 65°C. A probe can further include a detectable label. Detectable labels include, but are not limited to, a fluorophore (e.g., Texas-Red®, fluorescein isothiocyanate, etc.) and a hapten (e.g., biotin). A detectable label can be covalently attached directly to a probe oligonucleotide, e.g., located at the probe's 5' end or at the probe's 3' end. A probe including a fluorophore may also further include a quencher, e.g., Black Hole Quencher™, Iowa Black™, etc.
The terms "ribonucleic acid" and "RNA" as used herein mean a polymer composed of ribonucleotides.
The term "stringent assay conditions" as used herein refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., probes and targets, of sufficient complementarity to provide for the desired level of specificity in the assay while being incompatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. The term stringent assay conditions refer to the combination of hybridization and wash conditions.
A "stringent hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations) are sequence dependent, and are different under different environmental parameters. Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include, e.g., hybridization in a buffer comprising 50% formamide, 5xSSC, and 1% SDS at 42°C, or hybridization in a buffer comprising 5xSSC and 1% SDS at 65°C, both with a wash of 0.2xSSC and 0.1% SDS at 65°C. Exemplary stringent hybridization conditions can also include hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37°C, and a wash in 1xSSC at 45°C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C, and washing in 0.1xSSC/0.1% SDS at 68°C can be employed. Yet additional stringent hybridization conditions include hybridization at 60°C or higher and 3xSSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42°C in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide conditions of similar stringency.
In certain embodiments, the stringency of the wash conditions can determine whether a nucleic acid is specifically hybridized to a probe. Wash conditions used to identify nucleic acids may include, e.g., a salt concentration of about 0.02 M at pH 7 and a temperature of about 20°C to about 40°C; or, a salt concentration of about 0.15 M NaCl at 72°C for about 15 minutes; or, a salt concentration of about 0.2xSSC at a temperature of about 30°C to about 50°C for about 2 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2xSSC containing 1% SDS at room temperature for 15 minutes and then washed twice by 0.1xSSC containing 0.1% SDS at 37°C for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be, e.g., 0.2xSSC/0.1% SDS at 42°C. See Sambrook, Ausubel, or Tijssen for detailed descriptions of equivalent hybridization and wash conditions and for reagents and buffers, e.g., SSC buffers and equivalent reagents and conditions.
Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions are considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, whereby "substantially no more" is meant less than about 5-fold more, typically less than about 3-fold more. Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.
A "fragment" or "segment" is a portion of an amino acid sequence, comprising at least one amino acid, or a portion of a nucleic acid sequence comprising at least one nucleotide. The terms "fragment" and "segment" are used interchangeably herein. As used herein, the term "fragment," as applied to a protein or peptide, can ordinarily be at least about 3-15 amino acids in length, at least about 15-25 amino acids, at least about 25-50 amino acids in length, at least about 50-75 amino acids in length, at least about 75-100 amino acids in length, and greater than 100 amino acids in length. As used herein, the term "fragment" as applied to a nucleic acid, may ordinarily be at least about 20 nucleotides in length, typically, at least about 50 nucleotides, more typically, from about 50 to about 100 nucleotides, at least about 100 to about 200 nucleotides, at least about 200 nucleotides to about 300 nucleotides, at least about 300 to about 350, at least about 350 nucleotides to about 500 nucleotides, at least about 500 to about 600, at least about 600 nucleotides to about 620 nucleotides, at least about 620 to about 650, and or the nucleic acid fragment will be greater than about 650 nucleotides in length.
The term "standard," as used herein, refers to something used for comparison, such as control or a healthy subject.
Methods of selecting a treatment
The disclosure provides methods for selecting a treatment for a subject having breast cancer. In embodiments, a method of selecting a treatment for a subject that has breast cancer comprises: a) determining whether the subject is likely to have short term or long term survival by a method comprising i) measuring the level of gene expression of at least a set of genes in a sample comprising breast cancer cells from the subject; ii) inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score; iii) determining whether the subject is likely to have long term survival by determining if the output score is less than a cutoff value or whether the subject is likely to have short term survival by determining if the output score is greater than or equal to the cutoff value, wherein the cutoff value is a value determined by identifying a value between the 99% confidence interval of the mean output score of a first set of samples from subjects known to have short term survival and the 99% confidence interval of the mean output score of a second set of samples from subjects known to have long term survival; and iv) optionally, displaying whether the output score is greater than or equal to the cutoff value or less than the cutoff value to a health care worker so that the health care worker can select a treatment for the subject.
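One way a cutoff value lying between the two 99% confidence intervals might be identified is sketched below. Several choices here are assumptions not stated in the passage: the interval is computed with a normal approximation (a t-based interval may be more appropriate for small groups), the midpoint of the gap between the intervals is taken as the cutoff, and the function names and example scores are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, confidence=0.99):
    """Confidence interval of the mean output score (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return center - half_width, center + half_width


def cutoff_between(long_term_scores, short_term_scores, confidence=0.99):
    """Pick a value between the interval of the long term group (lower scores)
    and the interval of the short term group (higher scores)."""
    _, long_upper = mean_confidence_interval(long_term_scores, confidence)
    short_lower, _ = mean_confidence_interval(short_term_scores, confidence)
    if long_upper >= short_lower:
        raise ValueError("confidence intervals overlap; no value lies between them")
    # Assumption: use the midpoint of the gap between the two intervals.
    return (long_upper + short_lower) / 2


if __name__ == "__main__":
    # Made-up output scores for illustration only.
    long_term = [3.0, 3.5, 4.0, 3.2, 3.8]
    short_term = [6.0, 6.5, 7.0, 6.2, 6.8]
    print(cutoff_between(long_term, short_term))
```

Any value in the gap satisfies the stated condition; the midpoint is only one simple choice.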
In embodiments, the set of genes comprises at least the genes CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1. In other embodiments the set of genes comprises at least the genes ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK. Each of the genes identified herein as useful in determining a short term or long term survivor can have one or more known variants, and primers and probes can be designed to detect all variants and/or each variant. Variants include those nucleic acids or proteins that have a "substantially homologous nucleic acid sequence" or "substantially identical nucleic acid sequence," or a "substantially homologous amino acid sequence" or "substantially identical amino acid sequence." Such variants are either known or may be readily determined.
In one aspect of the disclosure, methods are provided for detecting breast cancer biomarkers in a biological sample. The biomarkers are determined by gene expression of a first set of genes or a second set of genes. Detection of the biomarkers is useful to identify patients that are responders or nonresponders to a treatment. In some embodiments, the treatment is preoperative chemotherapy. Identifying patients that are responders or nonresponders provides for the ability to apply a different treatment to those identified as nonresponders, screen for compounds that may be more effective on those breast cancer tumors that are nonresponsive to standard preoperative chemotherapy, and stratify patients for treatment either therapeutically or during clinical trials.
In an embodiment, the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes in a sample from the patient, wherein the level of expression of the CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes indicates that the subject is a responder or non-responder.
In another embodiment, the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes in a sample from the patient, wherein the level of expression of the ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes indicates that the subject is a responder or non-responder.
In yet another embodiment, the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient; inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score; and comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder.
In yet another embodiment, the disclosure provides a method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: inputting the levels of gene expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient into a predictive function to obtain a score; comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and optionally, communicating the identification of the patient as a responder or nonresponder to a user such as a health care professional.
In a further embodiment, the disclosure provides a nontransitory computer readable medium or computing device implemented method to identify whether a breast cancer patient is a treatment responder or non-responder comprising: a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module; b) inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score in a scoring module; c) optionally, comparing the score to a cutoff value in a diagnostic module to identify the patient as a treatment responder or nonresponder; and d) communicating the identification of the patient as a treatment responder or nonresponder to a user.
Subjects and Samples
Samples from a subject having breast cancer are analyzed for gene expression. The subject is a vertebrate, including a mammal, such as a human. Mammals include, but are not limited to, humans, farm animals, sport animals and pets. Samples obtained from the subject include, without limitation, skin, hair, tissue, blood, plasma, serum, cells, sweat, saliva, feces, and/or urine. In a specific embodiment, the sample is from a biopsy of suspected breast cancer or from tumor cells found in the blood.
In some embodiments, the sample may be analyzed for the presence of cancerous or precancerous cells using methods known to those of skill in the art. Breast cancers are optionally classified histologically. Infiltrating or invasive ductal cancer is the most common breast cancer histologic type and comprises 70% to 80% of all cases. Types of breast cancer include Carcinoma, NOS (not otherwise specified), Ductal, Intraductal (in situ), Invasive with predominant intraductal component, Invasive, NOS, Comedo, Inflammatory, Medullary with lymphocytic infiltrate, Mucinous (colloid), Papillary, Scirrhous, Tubular, Lobular In situ, Lobular Invasive with predominant in situ component, Lobular Invasive, and Undifferentiated carcinoma. In some embodiments, breast cancer cell samples are also optionally staged, typically prior to treatment. The American Joint Committee on Cancer (AJCC) staging system provides a strategy for grouping patients with respect to prognosis. Therapeutic decisions are formulated in part according to staging categories but primarily according to tumor size, lymph node status, estrogen-receptor and progesterone-receptor levels in the tumor tissue, human epidermal growth factor receptor 2 (HER2/neu) status, menopausal status, and the general health of the patient.
Breast cancer stages range from 0 to IV, with many subcategories. Lower numbers indicate earlier stages of cancer, while higher numbers reflect late-stage cancers. Stage 0 describes noninvasive (in situ) breast cancer. Ductal carcinoma in situ (DCIS) is an example of stage 0 cancer. Stage I is an early stage of invasive breast cancer in which: the tumor measures no more than 2 centimeters (cm) in diameter (3/4 inch); and no lymph nodes are involved, that is, the cancer has not spread outside the breast. Stage II describes invasive breast cancers in which one of the following is true: the tumor measures less than 2 cm (3/4 inch) but has spread to lymph nodes under the arm; no tumor is found in the breast, but breast cancer cells are found in lymph nodes under the arm; the tumor is between 2 and 5 cm (about 3/4 to 2 inches) and may or may not have spread to lymph nodes under the arm; or the tumor is larger than 5 cm (2 inches) but has not spread to any lymph nodes.
Stage III breast cancers are subdivided into three categories (IIIA, IIIB and IIIC) based on a number of criteria. By definition, stage III cancers have not spread to distant sites. For example, a stage IIIA tumor is larger than 5 cm (2 inches) and has spread to one to three lymph nodes under the arm. Other stage IIIA tumors may be any size and have spread into multiple lymph nodes. The lymph nodes clump and attach to one another or to the surrounding tissue. In stage IIIB breast cancer, a tumor of any size has spread to tissues near the breast (the skin and chest muscles) and may have spread to lymph nodes within the breast or under the arm. Stage IIIB also includes inflammatory breast cancer, an uncommon but aggressive type of breast cancer. Stage IIIC cancer is a tumor of any size that has spread: to 10 or more lymph nodes under the arm; to lymph nodes above or beneath the collarbone and near the neck; or to lymph nodes within the breast itself and to lymph nodes under the arm.
Stage IV breast cancer has spread to distant parts of the body, such as the lungs, liver, bones or brain. In addition to typing and staging of breast cancer, the breast cancer sample may optionally be analyzed for the presence or absence of one or more markers including estrogen receptor positive, progesterone receptor positive, hormone receptor negative, the presence or absence of Her2, and the presence or absence of both Her2 and hormonal receptors. Triple negative breast cancer is a breast cancer type that lacks hormonal receptors and Her2.
Once a biological sample of the suspected cancer cells is obtained and, optionally, analyzed for type, stage, and receptor type, the sample is analyzed for gene expression in accord with the methods described herein. The gene expression analysis provides for identification of the sample of breast cancer cells as a responder to chemotherapy or a nonresponder to chemotherapy. The gene expression analysis is complementary to other information regarding the breast cancer cells and provides a measure of risk assessment that is independent of age, ethnicity, stage of cancer, and receptor status of the cancer.
Detecting Gene Expression
The expression of certain genes has been demonstrated herein to be prognostic of breast cancer. Two sets of genes have been identified as providing classification of the breast cancer cells as responders or nonresponders. These genes are characterized by multiple transcript sequences that are known or are readily identifiable by searching for sequences related to the known sequences. Within a particular species, gene or transcript sequences for a particular gene are those that have at least 80% sequence identity in the coding sequence. For example, several of the genes and the gene products are known to have isoforms. Transcripts for such isoforms are included within the scope of detecting the gene expression of a particular gene.
The first set of genes includes the following: CCND1, such as cyclin D1, also known as: BCL1; PRAD1; U21B31; D11S287E, as exemplified by a target sequence 208712_at and reference sequence gI 77628157; CELSR1, such as cadherin, EGF LAG seven-pass G-type receptor 1, also known as: ME2; FMI2; CDHF9; HFMI2; DKFZp434P0729, exemplified by a target sequence 41660_at and reference sequence gI 656966; DKFZp566H0824, also known as hypothetical LOC54744, exemplified by a target sequence 207470_at, and gI 23273884; FAAH, such as fatty acid amide hydrolase, also known as: FAAH-1; MGC102823; MGC138146, exemplified by a target sequence 204231_s_at, and gI 62739402; IGKV1-5, such as immunoglobulin kappa variable 1-5, also known as: V1; L12; IGKV; L12a; IGKV15; MGC22745; MGC32715; MGC88810, exemplified by a target sequence 214768_x_at, and gI 19718803; LAMA5, such as laminin, alpha 5, also known as: KIAA1907, exemplified by a target sequence 210150_s_at, and gI 21264601; OXCT1, such as 3-oxoacid CoA transferase 1, also known as: OXCT; SCOT, exemplified by a target sequence 202780_at, and gI 112382246; RARA, such as retinoic acid receptor, alpha, also known as: RAR; NR1B1, exemplified by an Affymetrix target sequence 216300_x_at, and gI 300388174; and UBE2J1, such as ubiquitin-conjugating enzyme E2, J1, U, also known as: UBC6; Ubc6p; CGI-76; NCUBE1; HSPC153; HSPC205; NCUBE-1; HSU93243; MGC12555, exemplified by a target sequence 217825_s_at, and gI 37577121.
Each of these genes can be described by a number of transcripts that are identified in databases such as Gene in Genbank, the Unigene database, and the Image id database. The target sequences define a unique region for detecting each gene as identified by Affymetrix for their whole genome array U133A. In embodiments, polynucleotide sequences comprising the target sequences are detected as described herein and can be used to design primers or other types of probes such as aptamers that can be utilized in other methods that detect gene expression levels using methods and available programs for primer or probe design.
A second set of genes includes the following: ESR1, such as estrogen receptor 1, also known as: ER; ESR; Era; ESRA; NR3A1; DKFZp686N23123, exemplified by a target sequence 205225_at, and gI 170295748; BTG3, such as BTG family, member 3, also known as: ANA; TOB5; TOFA; TOB55; MGC8928, exemplified by a target sequence 213134_x_at, and gI 195963405; ODC1, such as ornithine decarboxylase 1, also known as: ODC, exemplified by a target sequence 200790_at, and gI 4505488; MCM5, such as minichromosome maintenance complex component 5, also known as: CDC46; MGC5315; P1-CDC46, exemplified by a target sequence 216237_s_at, and gI 143770796; TTK, such as TTK protein kinase, also known as: ESK; PYT; CT96; MPS1; MPS1L1; FLJ38280, exemplified by a target sequence 204822_at, and gI 262399360; NKAIN1, such as Na+/K+ transporting ATPase interacting 1, also known as: FAM77C; FLJ12650, exemplified by a target sequence 219438_at, and gI 296317327; IDUA, such as iduronidase, alpha-L-, also known as: IDA; MPS1, exemplified by a target sequence 205059_s_at, and gI 110611238; SLC43A3, such as solute carrier family 43, member 3, also known as: EEG1; FOAP-13; PRO1659; SEEEG-1; DKFZp762A227, exemplified by a target sequence 213113_s_at, and gI 46410928; TXNDC5, such as thioredoxin domain containing 5 (endoplasmic reticulum), also known as: ERP46; HCC-2; STRF8; PDIA15; UNQ364; ENDOPDI; MGC3178, exemplified by a target sequence 221253_s_at, and gI 313482855; SLC7A8, such as solute carrier family 7 (amino acid transporter light chain, L system), member 8, also known as: LAT2; LPI-PC1, exemplified by a target sequence 216092_s_at, and gI 33286427; MCM5, such as minichromosome maintenance complex component 5, also known as: CDC46; MGC5315; P1-CDC46, exemplified by a target sequence 201755_at, and gI 143770796; and MELK, such as maternal embryonic leucine zipper kinase, also known as: HPK38; KIAA0175, exemplified by a target sequence 204825_at, and gI 41281490.
Each of these genes can be described by a number of transcripts that are identified in databases such as Gene in Genbank, the Unigene database, and the Image id database. The target sequences define a unique region for detecting each gene as identified by Affymetrix for their whole genome array U133A. In embodiments, polynucleotide sequences comprising the target sequences are detected as described herein and can be used to design primers or other types of probes such as aptamers that can be utilized in other methods that detect gene expression levels using methods and available programs for primer or probe design.
In embodiments, in samples from patients classified as responders as described herein, expression of IGKV1-5, OXCT1, and UBE2J1 is increased and the expression of CCND1, CELSR1, DKFZp566H0824, FAAH, LAMA5, and RARA is decreased. In other embodiments, the expression of ESR1, NKAIN1, IDUA, and SLC7A8 is decreased and the expression of BTG3, ODC1, MCM5, TTK, SLC43A3, TXNDC5, MCM5, and MELK is increased. Each gene is either underexpressed or overexpressed in the responders as compared to the nonresponders. In embodiments, detecting gene expression levels as up or down regulated allows for classification of a patient as a responder or nonresponder. The gene expression levels can be analyzed using standard statistical methods and/or as described herein.
In embodiments, the expression of only these genes is analyzed, except for the analysis of any control gene or nucleic acid sequences to ensure the functionality of the assay. In other embodiments, a subarray or smaller subset of genes that includes the first and/or second set of genes can be analyzed, provided that 200 genes or fewer are analyzed. In embodiments about 9 to 200 genes are analyzed, about 9 to 150 genes are analyzed, about 9 to 100 genes are analyzed, about 9 to 50 genes are analyzed, or about 9 to 25 genes are analyzed. The other genes that are analyzed may include other known markers for breast cancer or other known markers for ovarian cancer such as LYPLA2, TUBA3C, ACTB, MED13L, OSBPL8, EED, and PKP4 and/or SSR1, USP5, ACTB, HLCS, NDUFB1, LYPLA2, TUBA3C, MED13L, and EED.
Characteristics of the genes are described in Table 1 below.
[Table 1: presented as images imgf000025_0001 through imgf000032_0001 in the original publication.]
Determination of Expression Levels
In one embodiment, the expression of the nucleic acid, such as mRNA, of the genes of interest is determined. Methods for detecting gene expression from biological samples are known. A number of different approaches are described herein. In embodiments, probes or primers to detect gene expression of each gene or the target sequence for each gene are designed to specifically detect expression of that gene, for example CCND1, as distinguished from the other listed genes. However, the primers and probes are designed to identify the gene, for example a CCND1 gene, regardless of whether the gene has sequence variation.
Levels of mRNA can be quantitatively measured by Northern blotting. A sample of RNA is separated on an agarose gel and hybridized to a radio-labeled RNA probe that is complementary to the target sequence. The radio-labeled RNA is then detected by an autoradiograph.
Another approach for measuring mRNA abundance is polymerase chain reaction. RT-PCR first generates a DNA template, called cDNA, from the mRNA by reverse transcription. This cDNA template is then used for qPCR, where the fluorescence of a probe changes as the DNA amplification process progresses. With a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA, typically in units of copies per nanolitre of homogenized tissue or copies per cell. qPCR is very sensitive (detection of a single mRNA molecule is possible).
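The standard-curve quantification mentioned above can be sketched as follows. This is a minimal illustration only: it assumes a linear relationship between the threshold cycle (Ct) and log10 of the copy number, and the dilution series, Ct values, and function names are hypothetical and not taken from the disclosure.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the fitted curve to estimate the copy number of an unknown sample."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series of a standard with known copy numbers.
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [30.0, 26.7, 23.4, 20.1]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)
```

An unknown sample's Ct can then be passed to `copies_from_ct` together with the fitted `slope` and `intercept` to estimate its copy number.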
Another approach is to individually tag single mRNA molecules with fluorescent barcodes (nanostrings), which can be detected one-by-one and counted for direct digital quantification (Krassen Dimitrov, NanoString Technologies).
Also, DNA microarrays can be used to determine the transcript levels for many genes at once (expression profiling). Recent advances in microarray technology allow for the quantification, on a single array, of transcript levels for every known gene in several organisms' genomes, including humans. However, preferably a subarray allowing for the detection of 200 genes or fewer is employed.
Also, "tag based" technologies like Serial analysis of gene expression (SAGE), which can provide a relative measure of the cellular concentration of different mRNAs, can be used.
In other embodiments, the level of expression can be determined using RNA sequencing technology. RNA sequencing technology involves high throughput sequencing of cDNA. mRNA is isolated and reverse transcribed to form a library of cDNA. The cDNA is fragmented to a specific size and optionally may be detectably labeled. The fragments are sequenced and the full sequence is assembled in accord with different platforms such as those provided by Illumina, 454 Sequencing or SOLiD sequencing. In addition, mRNA can be sequenced directly (without conversion to cDNA) using protocols available from Helicos.
In one embodiment, the expression of the protein from the genes of interest is determined. For genes encoding proteins, the expression level can be directly assessed by a number of means with some clear analogies to the techniques for mRNA quantification.
The most commonly used method is to perform a Western blot against the protein of interest; this gives information on the size of the protein in addition to its identity. A sample (often cellular lysate) is separated on a polyacrylamide gel, transferred to a membrane and then probed with an antibody to the protein of interest. Other methods include, for example, enzyme-linked immunosorbent assay (ELISA), lateral flow test, latex agglutination, other forms of immunochromatography, and/or magnetic immunoassay.
Reagents to detect the molecules of interest (such as an mRNA, cDNA, a nucleic acid probe, or antibodies) can be produced by methods available to an art worker or purchased commercially.
Mathematical Analysis
In embodiments, a method for selecting a treatment of a subject with breast cancer comprises inputting the expression levels of the set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer to obtain an output score. In one embodiment, the gene expression analysis of the genes of interest is applied to the equations provided herein. In embodiments, the method is a computer implemented method.
In one embodiment, the data obtained from a microarray analysis (gene expression profile) with regards to the gene expression of the genes of interest is first normalized and then applied to the equations provided. In a specific embodiment, when using the following functions, gene expression analysis is obtained using probes and methods as designed by Affymetrix U133A chip. Other arrays may also be utilized and functions derived that provide classification of responders and nonresponders in accord with the methods described herein. However, it is not necessary or even desirable to utilize a full array of 10,000, 20,000 or even 30,000 genes. A subarray can be constructed to detect gene expression of at least the first set of genes or the second set of genes or both. If the expression levels of genes are to be quantified by PCR a conversion of the functions/equations for the gene expression values from PCR as compared to microarray can be readily obtained. Gene expression values obtained using an Affymetrix array technology correlate with expression levels as determined by PCR. (Pepper et al, BMC Bioinformatics 2007, 8:273)
If gene expression analysis is conducted using PCR or RNA sequencing, the gene expression values can be converted to values of the Affymetrix gene expression analysis algorithm using known methods. For example, the gene expression analysis can be run in parallel using PCR or RNA sequencing and using the Affymetrix U133 chip and software. The gene expression values for each gene from PCR or RNA sequencing can be compared to the values generated using the Affymetrix system and a conversion factor identified. Gene expression levels for each gene generated by PCR or RNA sequencing can then be converted to the output of the Affymetrix algorithm using the conversion factor before inputting gene expression levels for each gene into the function.
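The conversion step described above might be sketched as follows. The disclosure does not specify the form of the conversion factor; this sketch assumes, purely for illustration, a per-gene multiplicative factor computed as the mean ratio of the Affymetrix value to the PCR (or RNA sequencing) value over samples measured in parallel on both platforms. The function names are hypothetical.

```python
def conversion_factor(pcr_values, array_values):
    """Per-gene factor: mean of array/PCR ratios over paired samples.

    Assumed form only; a regression-based mapping could be used instead.
    """
    ratios = [a / p for p, a in zip(pcr_values, array_values)]
    return sum(ratios) / len(ratios)

def to_array_scale(pcr_value, factor):
    """Convert a new PCR-derived value to the array scale before scoring."""
    return pcr_value * factor
```

For example, paired measurements of one gene in two samples, `[2.0, 4.0]` by PCR and `[5.0, 10.0]` on the array, give a factor of 2.5, so a new PCR value of 3.0 would be converted to 7.5 before being input into the function.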
Gene expression values are obtained from, for example, an Affymetrix array or subarray, and are normalized using an algorithm that corrects for background and normalizes the probe intensity. In embodiments, the MAS5 or RMA algorithms are used to correct the raw gene expression data. The MAS5 algorithm provides for measuring the intensities for each probe on the array to generate CEL files. The background level across the area is computed using a weighted sum of individual zone backgrounds. Background is then subtracted from spot intensity. The probe intensity is then corrected for stray signal by subtracting out the signal from a mismatched probe that is paired with the perfect match probe. The signal for each probe in the probe set is used to calculate an average of the signals from each probe. The RMA algorithm is a model based approach including background correction, quantile normalization, and modeling probe specific effects across multiple arrays using a median polish method for fitting the model. In each set of genes the same algorithm can be used or combinations of the algorithms can be used. These algorithms are available through Affymetrix as well as through Libaffy (at moffit.usf.edu). Other software may also be compatible for analyzing gene expression, including that of Biotique (XRAY), Genomatix ChipInspector, JMP Genomics, ArrayStar from DNASTAR, Expressionist from Genedata, and Rosetta Resolver System.
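One of the RMA steps named above, quantile normalization, can be illustrated with a short sketch. This is not the RMA implementation distributed by Affymetrix or Libaffy; it shows only the normalization idea (each array is forced to share the same intensity distribution), omits background correction and median polish, and handles tied intensities naively. The function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean intensity at each rank across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization, every array contains the same set of values (the rank means), assigned according to each probe's rank within its own array.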
Once the gene expression data has been normalized, the values for each gene are inputted into function 1 or function 2 as described below.
For function 1, the variables are defined as the expression values of the genes:
X1 = 208712_at = CCND1;
X2 = 41660_at = CELSR1;
X3 = 207470_at = DKFZp566H0824;
X4 = 204231_s_at = FAAH;
X5 = 214768_x_at = IGKV1-5;
X6 = 210150_s_at = LAMA5;
X7 = 202780_at = OXCT1;
X8 = 216300_x_at = RARA; and
X9 = 217825_s_at = UBE2J1.
In an embodiment, all of the variables were normalized using the RMA algorithm.
In an embodiment, the variables are weighted in each function as follows:
Y1 = X1;
Y2 = [(X2)^(-1)]*10^2;
Y3 = [(X3)^(-3.8)]*10^4;
Y4 = [(X4)^(-5)]*10^5;
Y5 = [(X5)^(-1.5)]*10^2;
Y6 = [(X6)^(-1)]*10^2;
Y7 = [(X7)^(-4)]*10^4;
Y8 = [(X8)^(-1.8)]*10^2;
Y9 = [(X9)^(-6.2)]*10^4.
In an embodiment, the F1 function is shown below. The * denotes multiplication and the ^ symbol denotes an exponent.
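The weighting of the F1 variables listed above can be sketched in code. This covers only the transformation of each normalized expression value X into its weighted value Y; it does not reproduce how the Y values are combined in F1. The scale factors are assumed to be powers of ten (10^2, 10^4, 10^5), and the dictionary and function names are hypothetical.

```python
# (exponent, scale) pairs for Y = (X ** exponent) * scale, keyed by gene symbol.
# Reading the scale factors as powers of ten is an assumption.
F1_TRANSFORMS = {
    "CCND1": (1.0, 1.0),            # Y1 = X1
    "CELSR1": (-1.0, 1e2),          # Y2
    "DKFZp566H0824": (-3.8, 1e4),   # Y3
    "FAAH": (-5.0, 1e5),            # Y4
    "IGKV1-5": (-1.5, 1e2),         # Y5
    "LAMA5": (-1.0, 1e2),           # Y6
    "OXCT1": (-4.0, 1e4),           # Y7
    "RARA": (-1.8, 1e2),            # Y8
    "UBE2J1": (-6.2, 1e4),          # Y9
}

def weight_f1_variables(expression):
    """Map normalized expression values (by gene symbol) to weighted values."""
    return {
        gene: (expression[gene] ** exponent) * scale
        for gene, (exponent, scale) in F1_TRANSFORMS.items()
    }
```

The input dictionary is expected to hold one normalized (for example, RMA-processed) value per gene symbol.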
For the F2 function, the variables are defined as the expression values of the genes:
X10 = 205225_at (M) = ESR1;
X11 = 213134_x_at (R) = BTG3;
X12 = 200790_at (R) = ODC1;
X13 = 216237_s_at (R) = MCM5;
X14 = 204822_at (R) = TTK;
X15 = 219438_at (R) = NKAIN1;
X16 = 205059_s_at (R) = IDUA;
X17 = 213113_s_at (R) = SLC43A3;
X18 = 221253_s_at (R) = TXNDC5;
X19 = 216092_s_at (M) = SLC7A8;
X20 = 201755_at (M) = MCM5; and
X21 = 204825_at (M) = MELK.
In embodiments, the above F2 transcripts indicated with (R) are obtained from RMA processing of the raw intensity data (CEL files), whereas those indicated with (M) are obtained from MAS5 processing of the raw intensity data (CEL files). In an embodiment, the variables are weighted as follows:
Y10 = ln(X10)
Y11 = [(X11)^(0.1)]*10
Y12 = (X12)^(0.4)
Y13 = (X13)^(0.2)
Y14 = [(X14)^(0.05)]*10
Y15 = [(X15)^(-0.7)]*10
Y16 = [(X16)^(0.1)]*10
Y17 = [(X17)^(0.05)]*10
Y18 = X18
Y19 = ln(X19)
Y20 = ln(X20)
Y21 = ln(X21)
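The F2 weighting listed above can be sketched in the same manner. The transforms are keyed by probe set identifier rather than gene symbol because MCM5 is represented by two probe sets. Only the per-variable transformation is shown, not the F2 function itself; the dictionary and function names are hypothetical.

```python
import math

# One transform per probe set, following the Y10 to Y21 definitions above.
F2_TRANSFORMS = {
    "205225_at": math.log,                       # Y10, ESR1 (MAS5)
    "213134_x_at": lambda x: (x ** 0.1) * 10,    # Y11, BTG3
    "200790_at": lambda x: x ** 0.4,             # Y12, ODC1
    "216237_s_at": lambda x: x ** 0.2,           # Y13, MCM5
    "204822_at": lambda x: (x ** 0.05) * 10,     # Y14, TTK
    "219438_at": lambda x: (x ** -0.7) * 10,     # Y15, NKAIN1
    "205059_s_at": lambda x: (x ** 0.1) * 10,    # Y16, IDUA
    "213113_s_at": lambda x: (x ** 0.05) * 10,   # Y17, SLC43A3
    "221253_s_at": lambda x: x,                  # Y18, TXNDC5
    "216092_s_at": math.log,                     # Y19, SLC7A8 (MAS5)
    "201755_at": math.log,                       # Y20, MCM5 (MAS5)
    "204825_at": math.log,                       # Y21, MELK (MAS5)
}

def weight_f2_variables(expression):
    """Map normalized expression values (by probe set) to weighted values."""
    return {probe: f(expression[probe]) for probe, f in F2_TRANSFORMS.items()}
```

The input dictionary is expected to hold, for each probe set, the value produced by the processing indicated for that probe set (RMA or MAS5).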
In an embodiment, the function F2 is below.
Figure imgf000038_0001
In embodiments, once the gene expression values are inputted into function 1 or 2 or both a score is obtained. The score is then compared to a cutoff value. If the score is less than the cutoff value it indicates that the breast cancer cells are representative of a responder. If the score is greater than or equal to the cutoff value it indicates the breast cancer cells are representative of a nonresponder.
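The comparison described above reduces to a single threshold test, sketched below. The function name and the returned labels are illustrative choices; the cutoff passed in is whatever value has been established for the function being used.

```python
def classify(score, cutoff):
    """Return 'responder' if score < cutoff, otherwise 'nonresponder'."""
    return "responder" if score < cutoff else "nonresponder"
```

Note that a score exactly equal to the cutoff is classified as a nonresponder, consistent with the "greater than or equal to" condition stated above.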
Cutoff values are determined according to standard methods. In an embodiment, a cutoff value is determined by dividing a set of patients with known status as a responder or nonresponder into two groups, each group containing responders and nonresponders; analyzing gene expression levels for a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from each breast cancer patient in the first group; determining the score of each of the known responder and nonresponder samples; calculating the halfway point between the mean score of the responders and the mean score of the nonresponders; and identifying this halfway point as the cutoff value. The predictive capability of the cutoff value is tested using the second group of patients with the samples blinded. If the accuracy is at least 80%, no further action is necessary. If the cutoff value does not provide the required degree of accuracy, then the cutoff value is adjusted using standard statistical methods. The cutoff is set within the gap between the 99% confidence intervals of the two groups and adjusted up or down from the aforementioned middle point according to the magnitude of the standard deviations of the two groups, i.e., the cutoff is moved away from the group that has the larger standard deviation and closer to the other group (the one with the smaller standard deviation).
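The midpoint procedure described above can be sketched as follows. The sketch covers the halfway-point calculation on the first group and the accuracy check on the second group; the adjustment based on standard deviations is not shown. Function names and example scores are hypothetical.

```python
def midpoint_cutoff(responder_scores, nonresponder_scores):
    """Halfway point between the mean responder and mean nonresponder scores."""
    mean_r = sum(responder_scores) / len(responder_scores)
    mean_n = sum(nonresponder_scores) / len(nonresponder_scores)
    return (mean_r + mean_n) / 2

def accuracy(scores, is_responder, cutoff):
    """Fraction of samples whose call (score < cutoff means responder) is correct."""
    correct = sum((s < cutoff) == truth for s, truth in zip(scores, is_responder))
    return correct / len(scores)

# Hypothetical first-group scores used to set the cutoff.
cutoff = midpoint_cutoff([2.0, 4.0], [6.0, 8.0])
# Hypothetical second-group scores, unblinded afterward for evaluation.
validation_accuracy = accuracy([3.0, 7.0, 5.5, 4.0], [True, False, True, True], cutoff)
```

In this made-up example the cutoff is 5.0 and the validation accuracy is 0.75, which would fall short of the 80% criterion and prompt an adjustment of the cutoff.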
In another embodiment, the cutoff value is determined by a method comprising calculating an optimal point on the ROC curve based on the 50 scores of the 50 original subjects used in the discovery study [optimal point is defined as the point with the highest sensitivity and the lowest false positive rate (1-specificity)] for a first group of short term survivors and a second group of long term survivors. That optimal point (the score of one of the 50 original subjects), which represents, according to ROC curve analysis, the best cutoff point for all of the 50 original subjects' scores, may itself be used as the cutoff point. In embodiments, a cutoff value is selected that provides the highest specificity and lowest rates of false positives. In a specific embodiment this value can be determined by analyzing the score of each patient in a group of patients using ROC curve analysis.
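The ROC-based selection described above might be sketched as follows. The disclosure defines the optimal point as having the highest sensitivity and the lowest false positive rate but does not say how the two are traded off; this sketch assumes, for illustration, that the candidate maximizing sensitivity minus false positive rate (Youden's J statistic) is chosen. Each observed score is tried as a candidate cutoff, with scores at or above the cutoff called positive. The function name is hypothetical.

```python
def roc_optimal_cutoff(scores, is_positive):
    """Return the observed score that maximizes sensitivity - false positive rate."""
    positives = sum(is_positive)
    negatives = len(scores) - positives
    best_cutoff, best_j = None, -1.0
    for candidate in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, is_positive) if p and s >= candidate)
        fp = sum(1 for s, p in zip(scores, is_positive) if not p and s >= candidate)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = candidate, j
    return best_cutoff
```

Because only observed scores are tried as candidates, the returned cutoff is always the score of one of the subjects, as in the embodiment described above.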
In embodiments, the cutoff value for F1 is about 4.7 and for F2 it is about 13.7.
Once a sample is identified as a responder or nonresponder, a health care provider can select a treatment appropriate for the individual patient. In embodiments, if the patient's breast cancer sample is identified as a nonresponder, then the health care provider may not utilize preoperative chemotherapy such as taxol or tamoxifen. In that circumstance, the health care provider may want to perform surgery and radiation immediately after analysis or employ alternative chemotherapy before or after surgery. For recurrence of cancer, the methods of the disclosure may be employed to determine whether further treatment with standard chemotherapy will be beneficial, that is, whether the cells of the recurrent tumor will be classified as a responder or nonresponder. Since the methods of the disclosure can be employed regardless of age, ethnicity, stage of cancer, or receptor status, this method provides an additional complementary risk assessment to determine treatment options for the patient.
In embodiments, where the output score indicates that the subject is likely to be a long term survivor, the health care worker may select one or more standard therapy options. These standard therapy options include chemotherapy, surgery, and/or radiation. Standard chemotherapeutic options include treatment with one or more of cyclophosphamide, Taxol, Platinum, Carboplatin, Cisplatin, Gemcitabine, Topotecan, Oxaliplatin, Doxorubicin, Paclitaxel, Docetaxel, and combinations thereof.
In embodiments, where the output score indicates that the subject is likely to be a short term survivor, the health care worker may select a more aggressive treatment in addition to or in place of the standard chemotherapy. Such treatment includes treatment with a cancer vaccine, angiogenesis inhibitors, tubulin binding inhibitors, taxane analogs, actin polymerization inhibitors, adoptive cell therapy, and protein ubiquitination inhibitors. Examples of compounds that can be utilized include Avastin, Votrient, exemestane, leucovorin, carmustine, rituximab, cytarabine, vincristine, filgrastim, etoposide, fludarabine, zileuton, everolimus, tretinoin, fulvestrant, sirolimus, and troglitazone or any chemotherapeutic compounds in Table 1. In embodiments, the chemotherapy treatment includes treatment with an inhibitor of CCND1, RARA, UBE2J1, and combinations thereof.
In some embodiments, the methods of the invention may be employed on a set of patients to identify a responder group or a nonresponder group in a clinical trial, for example. When testing a new therapeutic agent, it is useful to know whether the therapeutic agent has different effects in the responder population versus the nonresponder population. Using the methods of the disclosure, a group of patients having breast cancer are identified as responders or nonresponders and are then treated with a potential therapeutic agent. Safety and efficacy of the drug are assessed in the responder and nonresponder populations.
Methods for screening therapeutic agents
Another aspect of the disclosure includes methods for screening therapeutic agents. Identification of breast cancer tissue samples as nonresponders and responders can be used to screen the therapeutic effectiveness of a potential therapeutic agent on both types of patient populations. In some embodiments, cell lines may be developed from breast cancer tissue of nonresponders and responders using standard methods in order to provide for high-throughput analysis.
In an embodiment, a method for screening agents for treating breast cancer comprises a) contacting a breast cancer sample identified as a nonresponder or responder with a potential agent for treating breast cancer; and b) determining whether the agent decreases the growth or spread of the breast cancer sample, or changes the gene expression profile of the first set of genes, the second set of genes, or both.
In embodiments, the method further comprises identifying a breast cancer sample as from a responder or nonresponder by determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient.
In embodiments, the potential therapeutic agents are those that interact with any one of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a sample from the patient. Examples of such agents are listed in Table 1 under known drugs or chemicals. Drugs or chemicals similar to those known drugs in mechanism of action may be screened using nonresponder and responder breast cancer cells or cell lines as a measure of their efficacy in each of the patient groups. Other drugs or agents may also be those that are selected to act on other genes that are known to interact with any of the genes in the first or second set of genes as described in Table 1. The genes in the first and/or second set of genes are targets for developing new therapeutics, which can be tested on breast cancer cells identified as responders or nonresponders.
High throughput assays such as multiwell plate assays or arrays with cells attached to nanobeads can be utilized to test a number of therapeutic compounds for any effects on the responder or nonresponder cell types with regard to inhibition of cell growth, cell death, or change in gene expression of one or more of the genes of the first set of genes, the second set of genes, or both. Those agents effective on both the responder and nonresponder population may be selected for further development. In other embodiments, an agent effective on either the responder or the nonresponder cell type is selected, and the patient group is sorted into responders and nonresponders for further testing of the agent in the respective responder or nonresponder cell type.
Computer Implemented methods
In one aspect of the disclosure, a computing device implemented method is provided. In embodiments, a computing device implemented method to identify whether a breast cancer patient is a treatment responder or nonresponder comprises: a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module; and b) inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score in a scoring module. In embodiments, the method further comprises c) comparing the score to a cutoff value in a diagnostic module to identify the patient as a treatment responder or nonresponder. In yet another embodiment, the method further comprises d) communicating the identification of the patient as a treatment responder or nonresponder to a user. For example, Fig. 7 is a flowchart illustrating a communication process to transfer patient information from the health care provider to the computing device.
In embodiments, a method for selecting a treatment for a subject that has breast cancer comprises calculating an output score, using a computing device, by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and displaying the output score, using a computing device. In embodiments, the method further comprises determining whether the output score is greater than or equal to or less than a cutoff value, using a computing device; and displaying whether the subject is likely to be a short term survivor if the output score is greater than or equal to the cutoff value or long term survivor if the output score is less than the cutoff value.
In embodiments, the biological samples are processed at a health care facility, the gene expression values are determined there, and those values are then sent to a remote location for further analysis via the internet or wireless communication systems. In other embodiments, the gene expression measurement and the analysis of the gene expression data are done at a single location.
In some embodiments, the computing device can include a single computing device, such as a server computer. In other embodiments, the computing device can include multiple computing devices configured to communicate with one another over a network (not shown). The computing device can store multiple databases within memory. The databases stored on the computing device can be organized by clinic, practicing clinician, programmer identification code, or any other desired category.
Gene expression information can be sent to the remote computing system or another data storage device. The communication process initializes and begins at a start module and proceeds to a connect operation. The connect operation communicatively couples the stored information of the health care provider to the remote computing system, for example, via a cabled connection, a wireless local area network (WLAN or Wi-Fi) connection, a cellular network, a wireless personal area network (WPAN) connection, e.g., BLUETOOTH®, or any desired communication link. A transfer operation transmits gene expression data from the health care provider to the computing device. In an embodiment, the transfer operation encrypts the data before transmitting the data between the devices. The communication process can complete and end at a stop module. Once the gene expression data is transferred to a remote computing device, the data is optionally normalized and then inputted into a stored function 1, function 2, or both in order to obtain a score. In embodiments, the score is then compared to a cutoff value. A score greater than or equal to the cutoff value is identified as a nonresponder and a score below the cutoff value as a responder. In embodiments, the status of the analysis of the sample as a responder or nonresponder is communicated back to the health care provider using a similar process over a cabled connection, a wireless local area network (WLAN or Wi-Fi) connection, a cellular network, a wireless personal area network (WPAN) connection, e.g., BLUETOOTH®, or any desired communication link.
In embodiments, once the status of a patient's sample as a responder or nonresponder is determined, the status may also be associated with a treatment recommendation that is also communicated to a health care worker. For a responder, the treatment recommendation includes standard chemotherapy, surgery, and/or radiation. For a nonresponder, the recommendation includes immediate surgery and radiation with no preoperative chemotherapy and/or surgery and radiation with alternative chemotherapy before or after surgery.
The remote computing device includes a computer, a server, storage devices, mobile devices, and the like for receiving the level of gene expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient at a receiver module. The gene expression levels for each of the genes are inputted into function 1 or function 2 or both as described herein to provide a score. In embodiments, the score is then compared to a cutoff value in the diagnostic module; if the score is greater than or equal to the cutoff value, the sample is identified as a nonresponder, and if it falls below the cutoff value, the sample is identified as a responder. In embodiments, the identification of the sample as a responder or nonresponder is stored in a database and communicated to the user via internet communication to another computer or mobile device. In embodiments, the user is a health care professional.
Once the health care professional receives the identification, the health care provider chooses a course of treatment appropriate to the responder or nonresponder group. In embodiments, if the patient is identified as a nonresponder to chemotherapy, the health care provider may choose to eliminate any preoperative chemotherapy and treat the patient by surgery and radiation, optionally followed by chemotherapy post surgery and radiation.
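The comparison and recommendation steps described above can be sketched as follows. This is a minimal illustration in Python, assuming the score has already been produced by function 1 (whose form is not reproduced in this section). The names `classify_score`, `build_report`, and `RECOMMENDATIONS` are hypothetical, the recommendation strings paraphrase the text above, and the default cutoff is the F1 cut-off score of 4.6683 reported in Example 1.

```python
F1_CUTOFF = 4.6683  # cut-off score reported for the F1 model in Example 1

# Recommendation text paraphrased from the description; illustrative only.
RECOMMENDATIONS = {
    "responder": "standard chemotherapy, surgery, and/or radiation",
    "nonresponder": (
        "immediate surgery and radiation with no preoperative chemotherapy, "
        "and/or surgery and radiation with alternative chemotherapy "
        "before or after surgery"
    ),
}


def classify_score(score, cutoff=F1_CUTOFF):
    """Score >= cutoff is a nonresponder; score < cutoff is a responder."""
    return "nonresponder" if score >= cutoff else "responder"


def build_report(score, cutoff=F1_CUTOFF):
    """Bundle the status and its associated recommendation for the user."""
    status = classify_score(score, cutoff)
    return {
        "score": score,
        "cutoff": cutoff,
        "status": status,
        "recommendation": RECOMMENDATIONS[status],
    }
```

A score exactly equal to the cutoff is treated as a nonresponder, matching the "greater than or equal to" wording of the description.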
Computer/Processor
The detection, prognosis and/or diagnosis method can employ the use of a processor/computer device. For example, a general purpose computer system comprising a processor coupled to program memory storing computer program code to implement the method, to working memory, and to interfaces such as a conventional computer screen, keyboard, mouse, and printer, as well as other interfaces, such as a network interface, and software interfaces including a database interface, finds use in one embodiment described herein.
In embodiments, a computing device comprises a processing unit; and a system memory connected to the processing unit, the system memory including instructions that, when executed by the processing unit, cause the processing unit to: calculate an output score by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer; and display the output score. In embodiments, the system memory includes instructions that, when executed by the processing unit, cause the processing unit to determine whether the output score is greater than or equal to or less than a cutoff value; and to display whether the subject is likely to be a short term survivor if the output score is greater than or equal to the cutoff value or a long term survivor if the output score is less than the cutoff value.
In embodiments, the set of genes comprises at least the genes CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 or at least the genes ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK. In embodiments, the function is selected from the group consisting of function 1, function 2, or both. The computer system accepts user input from a data input device, such as a keyboard, input data file, or network interface, or another system, such as the system interpreting, for example, the microarray or PCR data, and provides an output to an output device such as a printer, display, network interface, or data storage device. The input device, for example a network interface, receives an input comprising detection of the proteins/nucleic acids described herein and/or quantification of those compounds. The output device provides an output such as a display, including one or more numbers and/or a graph depicting the detection and/or quantification of the compounds.
The computer system is coupled to a data store which stores data generated by the methods described herein. This data is stored for each measurement and/or each subject; optionally a plurality of sets of each of these data types is stored corresponding to each subject. One or more computers/processors may be used, for example, as a separate machine coupled to the computer system over a network, or may comprise a separate or integrated program running on the computer system. Whichever method is employed, these systems receive data and provide data regarding detection/diagnosis in return.
In embodiments, the disclosure provides a computing device or a nontransitory computer readable medium with instructions to implement the methods of the disclosure. The computer readable medium includes a CD, DVD, flash drive, external hard drive, or mobile device.
The computing device includes a receiver module for receiving gene expression data, an optional normalization module for normalizing gene expression data, a scoring module for inputting the gene expression data into function 1, function 2, or both, and calculating a score, an optional diagnostic module for comparing the score to a cutoff value and identifying a score greater than or equal to the cutoff value as a nonresponder and a score below the cutoff value as a responder, and an optional communication module for communicating the identification of the sample as a nonresponder or responder to a user. The communication module may communicate to a user through a graphical interface on a computer or mobile device or through a cabled connection, a wireless local area network (WLAN or Wi-Fi) connection, a cellular network, a wireless personal area network (WPAN) connection, e.g., BLUETOOTH®, or any desired communication link. Instructions can also be stored on a nontransitory computer readable medium. The instructions provide for a computer implemented method comprising a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient; b) inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score; c) comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and d) communicating the identification of the patient as a treatment responder or nonresponder to a user. The functions are stored on the computer readable medium and include function 1 or function 2. Optionally, the instructions can include instructions to normalize raw gene expression data by correcting for background and probe intensity.
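The chain of modules described above (receiver, optional normalization, scoring, diagnostic) can be sketched as follows. This is a minimal sketch in Python, not the disclosed implementation: the names `receive` and `run_pipeline` are hypothetical, and the predictive function and normalization step are passed in as callables because the form of function 1 and function 2 is not given in this section.

```python
# First set of genes as listed in the description.
FIRST_SET = ("CCND1", "CELSR1", "DKFZp566H0824", "FAAH", "IGKV1-5",
             "LAMA5", "OXCT1", "RARA", "UBE2J1")


def receive(levels, genes=FIRST_SET):
    """Receiver module: keep the expected genes; fail if any is missing."""
    missing = [g for g in genes if g not in levels]
    if missing:
        raise ValueError(f"missing expression levels for: {missing}")
    return {g: float(levels[g]) for g in genes}


def run_pipeline(levels, predictive_function, cutoff, normalize=None):
    """Run receiver -> (optional) normalization -> scoring -> diagnostic.

    predictive_function stands in for function 1 or function 2 and must be
    supplied by the caller; normalize is an optional callable on the data.
    """
    data = receive(levels)
    if normalize is not None:
        data = normalize(data)
    score = predictive_function(data)
    # Diagnostic module: score >= cutoff is a nonresponder.
    status = "nonresponder" if score >= cutoff else "responder"
    return {"score": score, "status": status}
```

Keeping the predictive function as a parameter lets the same pipeline serve either gene set or either function; the communication module would then transmit the returned dictionary to the user.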
Kits
Another aspect of the disclosure provides kits for identifying a patient or set of patients as a responder or nonresponder. In embodiments, the kits include reagents for detecting the gene expression levels of the first set of genes or the second set of genes or both. In embodiments, the reagents include primers or probes. In embodiments, the probes may be attached to a surface. In embodiments, the primers or probes may be detectably labeled. In embodiments, the kit includes reagents for conducting PCR.
Primers and probes that specifically detect expression of each of the first set of genes and the second set of genes can be readily designed by using known methods and/or publicly available software. In embodiments, the primers and probes are not designed to bind to 3' poly A regions or other repetitive or nonunique sequences. In embodiments, the primers or probes specifically bind to or hybridize under stringent conditions to each gene or exemplary target sequence for each gene identified in Table 1 and provided in Tables 6-26 and do not cross hybridize to other genes. In embodiments, target regions for each gene have been identified that are known to uniquely identify each of the genes. Probes or primers can be designed that detect or amplify all or a part of the target sequence. One of skill in the art will recognize that such primers and probes may amplify a number of different gene sequences corresponding to each gene, including allelic variants, SNPs, splice variants, and the like.
Methods for preparing primers and probes with detectable labels are known to those of skill in the art. Labels include radioactive isotopes, fluorescent moieties, other dyes, biotin, and molecular beacons.
In embodiments, a kit contains no more than 200 sets of primers and/or no more than 200 probes, no more than 150 probes and/or sets of primers, no more than 100 probes and/or sets of primers, no more than 50 probes and/or sets of primers, no more than 25 probes and/or 25 sets of primers, or no more than 9 probes and/or sets of primers. In a specific embodiment, a subarray may be prepared by attaching to a surface at least one probe that specifically binds to and hybridizes to each gene of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes.
In embodiments, the kit can provide a subarray or primers/probes that detect a smaller subset of genes that includes the first and/or second set of genes, provided that 200 genes or fewer are analyzed. In embodiments, about 9 to 200 genes are analyzed, about 9 to 150 genes are analyzed, about 9 to 100 genes are analyzed, about 9 to 50 genes are analyzed, or about 9 to 25 genes are analyzed. The other genes that are analyzed may include other known markers for breast cancer or other known markers for cancer such as ovarian cancer, including LYPLA2, TUBA3C, ACTB, MED13L, OSBPL8, EED, and PKP4 and/or SSR1, USP5, ACTB, HLCS, NDUFB1, LYPLA2, TUBA3C, MED13L, and EED.
In embodiments, the kit includes at least one set of primers or a probe that is specific for one or more housekeeping genes expressed in both responder and nonresponder breast cancer cells and/or expressed in normal breast tissue.
In embodiments, the kit includes at least one set of primers or a probe that is specific for another transcript of at least one gene of the first set of genes or the second set of genes or both. For example, in the second set of genes, two different transcripts of MCM5 are detected. The use of two different transcripts serves as an internal control to determine if the subarray or PCR is functioning properly. Other genes of the first set or second set are known to have alternative transcripts and can serve as controls.
In embodiments, the kit includes a control that includes one or more of the gene sequences of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes.
In embodiments, the kit includes a control that is breast cancer cells from a known responder and/or a known nonresponder.
In embodiments, the kit can comprise or consist essentially of other reagents for detecting the gene expression level of the identified genes. In embodiments, the kit may also contain primers or probes for detecting one or more housekeeping genes as a positive control. In embodiments, the kit does not contain probes for any other genes that are predictive of short term or long term survivorship of breast cancer other than the genes identified herein.
In embodiments, the kit further comprises instructions for inputting the gene expression values into function 1, function 2, or combinations thereof to obtain an output score. The instructions further provide for comparing the output score for each function to a cutoff value and determining that the subject is likely to have long term survival if the output score is less than the cutoff value or that the subject is likely to have short term survival if the output score is greater than or equal to the cutoff value for each function.
In embodiments, a kit further comprises a nontransitory computer readable storage medium having computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: calculating an output score by inputting gene expression levels of a set of genes into a function that provides a predictive relationship between gene expression levels of the set of genes and short term or long term survival of subjects having breast cancer as described previously herein.
In embodiments, the nontransitory computer readable storage medium has computer-executable instructions that, when executed by a computing device, cause the computing device to perform a step comprising: comparing the output score to a cutoff value and displaying whether the subject is likely to have long term survival if the output score is less than the cutoff value or short term survival if the output score is greater than or equal to the cutoff value for each function.
In embodiments, the kit includes a nontransitory computer readable medium with instructions for analyzing the gene expression data identified above. The instructions provide for a computer implemented method comprising a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient; b) inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score; c) optionally, comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder; and d) optionally, communicating the identification of the patient as a treatment responder or nonresponder to a user. The functions are stored on the computer readable medium and include function 1 or function 2. Optionally, the instructions can include instructions to normalize raw gene expression data by correcting for background and probe intensity.
In other embodiments, the kit includes instructions for communicating the gene expression information to a remote computing device. Optionally, the kit includes instructions for comparing the score from function 1 or function 2 to a cutoff value.
Examples
The following examples are provided in order to demonstrate and further illustrate certain embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.
Example 1 - Discovery of Breast Cancer Biomarkers for the U.S.-based Study
The objectives of this scientific study are:
• To determine whether those patients with breast cancer who will respond to the current taxol-based chemotherapy can be identified and differentiated from those patients who will not do so, regardless of age, ethnicity, and status of hormone receptors (estrogen, progesterone, or HER2), by performing global gene expression analysis on tumor tissue obtained at the time of biopsy.
• To identify and develop one or more prognostic biomarkers that can successfully accomplish the preceding objective by employing the novel bioinformatic platform technology of Applied Informatic Solutions, Inc., developed by Dr. Jason B. Nikas.
• To validate the developed biomarkers by using an independent set of patients - more specifically, by using patients that were new and different from the original patients and came from five different clinical centers around the world.
Study Design
This study was conducted using microarray data acquired from the Gene Expression Omnibus of the National Center for Biotechnology Information (GEO Accession Number: GSE20271). Previous results associated with these data were published by Tabchy et al. [Clin Cancer Res 2010; 16(21):5351-61 (PMID: 20829329)] [3].
Patients with breast cancer (clinical stages I-III) were recruited in five different clinical centers around the world [M. D. Anderson Cancer Center, Houston, TX, USA; Lyndon B. Johnson General Hospital, Houston, TX, USA; Instituto Nacional de Enfermedades Neoplasicas, Lima, Peru; Centro Medico Nacional de Occidente, Guadalajara, Mexico; and Grupo Espanol de Investigacion en Cancer de Mama, Spain]. All subjects first underwent biopsy, and tumor tissue obtained thus was analyzed for global gene expression using the GeneChip array U133A by Affymetrix. Histological diagnosis of invasive cancer and status of the estrogen, progesterone, and HER2 receptors were also determined from tissue obtained from the biopsy. Following biopsy, all subjects were treated with chemotherapy comprising the following drugs and dosage protocol: weekly paclitaxel (80 mg/m²/wk) × 12 courses followed by 5-fluorouracil (500 mg/m²), doxorubicin (50 mg/m²), and cyclophosphamide (500 mg/m²) all on day 1 repeated in 21-day cycles × 4 courses. Following the completion of the aforementioned chemotherapy, all subjects underwent surgery (modified radical mastectomy or lumpectomy and sentinel lymph node biopsy or axillary node dissection) in order to determine whether a subject experienced pathological complete response to chemotherapy or whether residual invasive cancer was still present. Pathological complete response to chemotherapy was defined as the absence of any residual invasive cancer at the breast site and at the nearest axillary lymph node site. Ninety-three subjects were able to complete the aforementioned treatment protocol in terms of dosage and frequency (number of administered courses) of taxol and the other drugs. Of those 93 subjects, 20 responded to the taxol-based chemotherapy and had no residual invasive cancer at the end of the six-month treatment, whereas the remaining 73 did not do so.
Discovery Study: Of the 93 subjects, we randomly selected 50 [10 responders (R) and 40 non-responders (NR)] in such a way that the proportions of clinical stages remained the same. Moreover, subjects with all possible combinations of receptor classifications (ER, PR, and HER2) were included in both the discovery and the validation study with approximately equal proportions.
Validation Study: The remaining 43 subjects [10 responders (R) and 33 non-responders (NR)] were used with the sole and express purpose of validating our prognostic biomarker model. The clinical information about the samples is shown in Table 2.
Table 2
Figure imgf000052_0001
Validation
R 1 0 0 1 0 0 1 0 1 1 0
NR 2 0 0 0 2 0 2 0 2 0 2
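The random division of subjects into discovery and validation groups while preserving the proportions of a clinical attribute can be illustrated with a stratified split. The text does not state how the selection was actually carried out, so the following Python sketch is only one plausible approach; the function name `stratified_split`, the tuple layout of a subject, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(subjects, key, fraction, seed=0):
    """Randomly assign about `fraction` of each stratum to the first group.

    subjects: any sequence of records; key: function mapping a record to
    its stratum (for example, response status and clinical stage).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for s in subjects:
        strata[key(s)].append(s)
    first, second = [], []
    for members in strata.values():
        members = members[:]            # do not mutate the caller's data
        rng.shuffle(members)
        k = round(len(members) * fraction)  # rounding may shift a stratum by one
        first.extend(members[:k])
        second.extend(members[k:])
    return first, second
```

Because each stratum is split separately, the proportions of the stratifying attribute are approximately the same in both groups.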
The original raw intensity data (CEL files) were processed using the RMA algorithm in the Expression Console software by Affymetrix. A series of statistical methods, part of the bioinformatic technology platform of Applied Informatic Solutions, Inc., developed by Dr. Jason B. Nikas, was employed to reduce the dimensionality of the data. Briefly, we performed ROC curve analysis in order to assess the discriminating capability of all variables with respect to our two groups, namely, R (responders) and NR (non-responders). In the final round, we selected only those variables with an AUC > 0.770. Fourteen variables fulfilled this criterion. From the aforementioned 14 most significant variables, 9 became the input variables to the complex mathematical function F1, also referred to here as a super variable. We should point out that one other super variable was generated employing the remaining variables among the aforementioned 14 most significant variables, but, following final assessment, it proved to be not as robust as F1, and it is consequently not presented here. The 9 input variables (transcripts) to the F1 super variable correspond to 9 different genes. From the final pool of the most significant variables identified thus, using the mathematical theory of super variables and mathematical modeling, developed by Dr. Jason B. Nikas, we were able to generate one super variable (F1), a complex mathematical function of nine genes.
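The AUC-based filtering step can be illustrated with a short Python sketch. It is not the platform described above, only a minimal example of ranking variables by AUC and keeping those above the 0.770 threshold. The names `auc` and `select_variables` are hypothetical, and treating the AUC as direction-agnostic (a variable may be higher in either group) is an assumption.

```python
def auc(pos, neg):
    """AUC as the fraction of (pos, neg) pairs with pos > neg; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_variables(data, labels, threshold=0.770):
    """Keep variables whose AUC for separating R from NR exceeds threshold.

    data: dict mapping variable name -> list of values, one per subject.
    labels: list of "R" / "NR" labels in the same subject order.
    """
    selected = {}
    for name, values in data.items():
        r = [v for v, lab in zip(values, labels) if lab == "R"]
        nr = [v for v, lab in zip(values, labels) if lab == "NR"]
        a = auc(r, nr)
        a = max(a, 1.0 - a)  # assumption: direction of the difference is ignored
        if a > threshold:
            selected[name] = a
    return selected
```

The pairwise definition used here is equivalent to the area under the empirical ROC curve and avoids any dependency on an external library.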
Computer programs
Computer programs were written using MATLAB® 2011b by The MathWorks, Inc., Natick, MA, USA.
Findings
In the Discovery Study, one prognostic biomarker (F1) was developed. This prognostic biomarker is a complex mathematical function of nine genes, as shown immediately below.
F1 = f(CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, UBE2J1) (Equation 1)
A wealth of statistical information pertaining to the performance and assessment of our F1 prognostic biomarker during both the Discovery Study and the Validation Study is shown below in Figure 1 and Figure 2. The combined results are shown in Figure 3.
Results
As was mentioned earlier, from the total number of 93 subjects [20 responders (R) and 73 non-responders (NR)] used in this study, we randomly selected 50 subjects [10 responders (R) and 40 non-responders (NR)] for the development and training of the prognostic biomarker model (F1); and we will henceforward refer to those 50 subjects as the 50 original subjects. After the development of the prognostic biomarker model, we assessed its accuracy using the aforementioned 50 original subjects, which were employed for its development. This constitutes an important first step in the assessment of a prognostic test.
The cut-off score of the F1 prognostic biomarker model was determined by taking into account the results of the following two analyses: (1) calculation of the optimal point on the ROC curve based on the 50 scores of the 50 original subjects used in the discovery study [optimal point is defined as the point with the highest sensitivity and the lowest false positive rate (1-specificity)] and (2) calculation of the 99.99% confidence intervals for the mean F1 scores of the two groups (R and NR) and their respective standard deviations. Based on that, the cut-off score of the F1 model was determined to be 4.6683. If a subject has an F1 score less than 4.6683, then that subject is classified as an R (responder); otherwise (F1 score ≥ 4.6683), that subject is classified as an NR (non-responder).
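The first of the two analyses, locating the optimal point on the ROC curve, can be sketched as follows. The text does not specify how "highest sensitivity and lowest false positive rate" were combined into a single criterion, so this Python sketch assumes one common choice, maximizing sensitivity minus false positive rate (Youden's J). The function name `best_cutoff` is hypothetical, and the confidence-interval analysis is not covered.

```python
def best_cutoff(r_scores, nr_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A subject is called R (the target group) when score < cutoff and NR
    when score >= cutoff, mirroring the classification rule in the text.
    """
    candidates = sorted(set(r_scores) | set(nr_scores))
    best = None
    for c in candidates:
        sens = sum(s < c for s in r_scores) / len(r_scores)
        spec = sum(s >= c for s in nr_scores) / len(nr_scores)
        j = sens + spec - 1.0  # equals sensitivity - false positive rate
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]
```

When several candidate cutoffs tie on J, this sketch keeps the lowest one; a different tie-breaking rule would be equally defensible.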
As can be seen from Figure 1, the F1 model correctly identified (9/10) R subjects and (36/40) NR subjects. Assuming that we are interested in identifying the responders (R) to the T/FAC chemotherapy, our target group is the R group and our reference group is the NR group. It follows, then, that for the discovery study, the F1 model exhibited a sensitivity = 9/10 = 0.900 and a specificity = 36/40 = 0.900.
Figure 1 and Table 3 show all pertinent statistical results of the F1 prognostic biomarker model in connection with the discovery study in great detail. Table 3 is shown below.
Table 3: Discovery Study
Figure imgf000055_0001
Validation study
As was mentioned earlier, from the total number of 93 subjects [20 responders (R) and 73 non-responders (NR)] used in this study, we had randomly segregated 43 subjects [10 responders (R) and 33 non-responders (NR)] for the sole and express purpose of testing our prognostic biomarker model. Those 43 unknown subjects were completely extraneous to the model, that is to say they were new and different from the original 50 subjects used for the development of the model, and they had never before been encountered by it. This, validation by unknown and different subjects, constitutes the most important test in the assessment of a prognostic test.
As can be seen from Figure 2 and Table 4, our prognostic biomarker model (F1) correctly identified (9/10) R subjects and (31/33) NR subjects from the total of 43 unknown subjects used in the validation study. More specifically, 9/10 R subjects had F1 scores that were less than the 4.6683 cut-off value, and 31/33 NR subjects had F1 scores that were ≥ 4.6683. Therefore, in connection with the validation study, the sensitivity of the F1 prognostic model was (9/10) = 0.900, and the specificity was (31/33) = 0.939.
Table 4, in addition to other pertinent statistical results of our prognostic biomarker model, shows the observed mean F1 scores of the two groups (R and NR) of the 43 unknown subjects in the validation study. As can be seen, both of those group mean scores, as observed in the validation study with the 43 unknown subjects, fall within the 99.99% confidence interval of the respective group mean scores as predicted in the discovery study (Table 3).
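The consistency check described above (does a validation group mean fall inside the 99.99% confidence interval obtained in the discovery study?) can be illustrated as follows. This is a minimal sketch under stated assumptions: the text does not say how the interval was computed, so a normal-approximation interval is used here (a t-based interval would be wider for small groups), and the function names are hypothetical. The actual scores are in Tables 3 and 4 and are not reproduced.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, level=0.9999):
    """Normal-approximation confidence interval for the mean of `scores`.

    Assumption: the source does not specify the interval method.
    """
    n = len(scores)
    m = mean(scores)
    se = stdev(scores) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return m - z * se, m + z * se


def falls_within(value, interval):
    """True if `value` lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high
```

Given the discovery-study scores of one group, `mean_confidence_interval` returns the interval, and `falls_within` then tests the mean observed for the same group in the validation study.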
Table 4: Validation Study
Figure imgf000055_0002
Overall prognostic biomarker model performance
If we combined the discovery study results with those of the validation study, then the overall performance of our F1 prognostic biomarker model would be as follows. Overall sensitivity = 0.900 (18/20 R subjects) and overall specificity = 0.918 (67/73 NR subjects). Figure 3 and Table 5 depict those overall results, along with additional pertinent statistical results of the F1 prognostic biomarker model.
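The pooled figures follow directly from the counts reported for the two studies. The short sketch below reproduces that arithmetic; the counts are taken from the text, while the tuple layout and function names are illustrative only.

```python
# (correctly identified R, total R, correctly identified NR, total NR)
discovery = (9, 10, 36, 40)
validation = (9, 10, 31, 33)


def sens_spec(counts):
    """Sensitivity and specificity, with R as the target group."""
    true_r, n_r, true_nr, n_nr = counts
    return true_r / n_r, true_nr / n_nr


def pool(*studies):
    """Add the counts of several studies column by column."""
    return tuple(sum(column) for column in zip(*studies))


overall = pool(discovery, validation)
for name, counts in [("discovery", discovery),
                     ("validation", validation),
                     ("overall", overall)]:
    sensitivity, specificity = sens_spec(counts)
    print(f"{name}: sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Pooling the counts rather than averaging the two specificities weights each study by its number of subjects, which is why the overall specificity is 67/73 and not the mean of 0.900 and 0.939.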
Table 5: Overall Results (Discovery & Validation Studies)
Figure imgf000056_0001
Significant genes
In connection with the aforementioned 9 significant genes that constitute the input variables to the F1 function (Equation 1), we conducted an Ingenuity Pathway Analysis (IPA) search. We sought to ascertain information about those 9 genes pertaining to their known interactions with other genes; their known interactions with drugs, chemicals, and/or hormones; and their known associations with various types of cancer as derived from the findings of peer-reviewed scientific studies.
The CCND1 (also cyclin D1) gene encodes a protein that belongs to the cyclin family, the members of which are regulators of CDK kinases. Over-expression of the CCND1 gene, which alters cell cycle progression, has been observed in a variety of tumors and may contribute to tumorigenesis. Moreover, the CCND1 gene has been observed to interact with the BRCA1 and BRCA2 genes, known to be familial breast and ovarian cancer susceptibility genes. Over-expression of CCND1 has been shown to play a crucial role in the development and progression of several types of cancers, such as breast, esophageal, bladder, and lung cancer. Furthermore, and more importantly, over-expression of CCND1 has been linked to the development of resistance to endocrine drugs in breast cancer cells. Over-expression of CCND1 has also been shown to contribute to the progression of breast tumor cells to invasive carcinomas. We found that the CCND1 gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The LAMA5 (laminin, alpha 5) gene encodes a protein that belongs to the alpha subfamily of laminin proteins, which constitute a major component of basement membranes, and which affect tissue development in many organs. Over-expression of the LAMA5 gene has been observed in various types of cancer, such as glioma, melanoma, hepatocellular carcinoma, lung adenocarcinoma, breast cancer, ovarian cancer, etc., especially in connection with tumor cell migration and invasiveness. In addition to oncogenesis and metastatic colonization, over-expression of laminin has been linked to cytotoxic drug resistance in the case of lung cancer cell lines. Furthermore, in the case of breast cancer cells, it has been shown that over-expression of laminin inhibits estrogen action and leads to resistance to hormonal drugs without the loss of hormone receptors on the part of the breast cancer cells. We found that the LAMA5 gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The gene FAAH (fatty acid amide hydrolase) encodes a protein that is responsible for the hydrolysis of a number of primary and secondary fatty acid amides. In connection with cancer, it has been observed that over-expression of the FAAH gene resulted in cell invasion and cell migration in prostate carcinoma cells. Moreover, tumor over-expression of FAAH has been associated with prostate cancer severity and outcome, and it has been shown that antiproliferative effects could be observed in prostate cancer cell lines by inhibiting the FAAH enzyme. In connection with breast cancer, and more specifically regarding treatment response, it has been observed that FAAH was significantly over-expressed in the subjects that failed to respond to the T/FAC treatment. We found that the FAAH gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The RARA gene encodes a protein (retinoic acid receptor alpha) that regulates transcription. In the case of acute promyelocytic leukemia, over-expression of RARA has been shown to induce cell proliferation via direct up-regulation of c-MYC in mice. Over-expression of RARA has also been observed in human ovarian tumor cells. In connection with human breast cancer cells, it has been widely observed that the expression of estrogen receptor α (ERα) and that of RARA are coordinated; more specifically, over-expression of the former induces over-expression of the latter in ER-positive breast cancer cells. More interestingly, however, regarding our findings, it has also been observed that the crucial biological effects exerted by RARA on human breast cancer cells are mediated regardless of the ER status of those cells. We found that the RARA gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The CELSR1 (cadherin, EGF LAG seven-pass G-type receptor 1) gene encodes a protein that is a member of the flamingo subfamily, which is part of the cadherin superfamily. The flamingo cadherins are located at the plasma membrane and are thought to be receptors involved in contact-mediated cell communication. In squamous cell carcinoma cells, it has been shown that over-expressed G protein-coupled receptor proteins, via communication with EGFR (epidermal growth factor receptor) signaling systems, induce cell proliferation and migration. In the case of breast cancer cells, it has been observed that CELSR1 interacts with estrogen receptor (ER). Our findings show that the CELSR1 gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The IGKV1-5 (immunoglobulin kappa variable 1-5) gene encodes a protein whose molecular function is antigen binding, and which is involved in complement activation, innate immune response, and regulation of immune response in general. Although little is known about the exact function of IGKV1-5, it has been shown that it is expressed in leukocytes in human peripheral blood, and that various types of cancer cells effect significant reduction of the expression of immune-response related genes, such as those involved in the antigen presentation pathway, genes in the B-cell receptor complex, genes in the human leukocyte antigen (HLA) class, etc. More specifically, in connection with breast cancer, it has been observed that down-regulation of immune-response related genes was significantly associated with tumor progression, nodal involvement, lymphatic invasion, and risk of breast cancer recurrence. We found that the IGKV1-5 gene was significantly under-expressed in the NR group (non-responders) relative to the R group (responders).
The Affymetrix HG-U133A probe set 207470_at corresponds to DKFZp566H0824 (hypothetical LOC54744). According to our results, this unknown gene was significantly over-expressed in the NR group (non-responders) relative to the R group (responders).
The UBE2J1 (ubiquitin-conjugating enzyme E2, J1, U) gene encodes a protein that is a member of the E2 ubiquitin-conjugating enzyme family. The modification of proteins with ubiquitin is an important cellular mechanism that targets abnormal or short-lived proteins for degradation. It has been shown that BRCA1, via its binding to UBE2J1, as well as to other members of the E2 family, directs the synthesis of specific polyubiquitin chain linkages. Given that BRCA1 functions as a tumor suppressor and plays a role in DNA damage repair, it follows that an abnormal down-regulation of BRCA1 would most likely entail a down-regulation of UBE2J1. The UBE2J1 gene was significantly under-expressed in the NR group (non-responders) relative to the R group (responders).
The OXCT1 (3-oxoacid CoA transferase 1) gene encodes a protein that is a mitochondrial matrix enzyme and plays a central role in ketone metabolism. Among other biological processes, OXCT1 is involved in adipose tissue development and cellular lipid metabolism. It has been observed that HRAS, a well-known oncogene involved in many different types of cancer, suppresses the expression of OXCT1. In connection with breast cancer, it has been shown that 69% of breast cancer tumors exhibit an over-expression of HRAS, which is associated positively with disease progression and lymph node involvement and negatively with response to treatment. It has also been shown that over-expression of HRAS in breast cancer tumors can be constitutively mediated via deregulation of HER2, ER, EGFR, and other receptors. The suppression of OXCT1 expression by over-expressed HRAS in aggressive breast tumor cells therefore accords with our finding: the OXCT1 gene was significantly under-expressed in the NR group (non-responders) relative to the R group (responders).
Discussion
We are not aware of the existence of any prognostic tests that can predict which breast cancer patients will respond to chemotherapy with both sensitivity and specificity of 80% as mandated by the latest FDA requirements. Having employed 50 subjects [10 responders (R) and 40 non-responders (NR)], we were able to develop a prognostic test that, based on global gene expression analysis of tumor tissue collected during biopsy and prior to the commencement of chemotherapy, can identify with high accuracy those patients with breast cancer (clinical stages I-III) who will respond to the T/FAC chemotherapy and will experience pathological complete response (Responders), as well as those breast cancer patients (clinical stages I-III) who will not do so (Non-Responders). Following validation with 43 unknown (new and different) subjects [10 responders (R) and 33 non-responders (NR)], our prognostic test (F1) exhibited an overall sensitivity = 0.900 (18/20 R subjects) and overall specificity = 0.918 (67/73 NR subjects).
Furthermore, we are equally unaware of the existence of any prognostic tests that can predict which breast cancer patients will respond to chemotherapy with both sensitivity and specificity ≥ 90% and, at the same time, regardless of the status of the three hormone receptors (ER, PR, and HER2). As can be seen from the information and results shown in Table 2, our prognostic test with both sensitivity and specificity ≥ 90% is applicable, and can be administered, to all breast cancer patients independently of the status of the hormone receptors ER, PR, and HER2, as well as of the ethnicity and age of the patients. In contrast, other breast cancer prognostic tests currently on the market have not only limited accuracy (sensitivity and specificity < 80%) but also limited applicability: they can be administered only to patients with specific combinations of the aforementioned three hormone receptors, that is to say, to a small subset of the population of breast cancer patients. That also means that a large fraction of the women with breast cancer cannot avail themselves of those prognostic tests and, therefore, cannot make accurate decisions about the treatment and management of their disease.
The clinical significance of our prognostic test in the field of breast cancer can be summarized in the following.
(1) Our prognostic test could be applied to all breast cancer patients regardless of receptor status, age, or ethnicity.
(2) Physicians will have the ability to identify with a high degree of accuracy both the responders and the non-responders to current chemotherapy at the outset (at the time of the biopsy and prior to the commencement of chemotherapy).
(3) Alternative therapies may be provided at the beginning to those patients identified as non-responders to chemotherapy, thus saving valuable and critical time and increasing the probability of a favorable outcome.
(4) In connection with providing the non-responders to the T/FAC chemotherapy with effective drugs, our prognostic test and the findings of our study pertaining to the aforementioned nine important genes can assist pharmaceutical companies to test and develop new analogs of chemotherapeutic agents or new cocktails of small molecules that can modulate most, if not all, of those nine genes.
Example 2 - Discovery of Breast Cancer Biomarkers for the Global Study
This study was conducted using microarray data acquired from the Gene Expression Omnibus of the National Center for Biotechnology Information (GEO Accession Numbers: GSE20271 & GSE20194). Previous results associated with these data were published by Tabchy et al. [Clin Cancer Res 2010; 16(21):5351-61 (PMID: 20829329)] and by Popovici et al. [Breast Cancer Res 2010; 12(1):R5 (PMID: 20064235)] [Tabchy et al., 2010; Popovici et al., 2010].
Patients with breast cancer (clinical stages I-III) were recruited in five different clinical centers around the world [M. D. Anderson Cancer Center, Houston, TX, USA; Lyndon B. Johnson General Hospital, Houston, TX, USA; Instituto Nacional de Enfermedades Neoplasicas, Lima, Peru; Centro Medico Nacional de Occidente, Guadalajara, Mexico; and Grupo Espanol de Investigacion en Cancer de Mama, Spain]. All subjects first underwent biopsy, and tumor tissue obtained thus was analyzed for global gene expression using the GeneChip array U133A by Affymetrix. Histological diagnosis of invasive cancer and status of the estrogen, progesterone, and HER2 receptors were also determined from tissue obtained from the biopsy. Following biopsy, all subjects were treated with chemotherapy comprising the following drugs and dosage protocol: weekly paclitaxel (80 mg/m²/wk) × 12 courses followed by 5-fluorouracil (500 mg/m²), doxorubicin (50 mg/m²), and cyclophosphamide (500 mg/m²) all on day 1 repeated in 21-day cycles × 4 courses. Following the completion of the aforementioned chemotherapy, all subjects underwent surgery (modified radical mastectomy or lumpectomy and sentinel lymph node biopsy or axillary node dissection) in order to determine whether a subject experienced pathological complete response to chemotherapy or whether residual invasive cancer was still present. Pathological complete response to chemotherapy was defined as the absence of any residual invasive cancer at the breast site and at the nearest axillary lymph node site. Two hundred and sixty-six subjects were able to complete the aforementioned treatment protocol in terms of dosage and frequency (number of administered courses) of taxol and the other drugs. Of those 266 subjects, 58 responded to the taxol-based chemotherapy and had no residual invasive cancer at the end of the six-month treatment, whereas the remaining 208 did not do so.
Discovery Study: Of the 266 subjects, we randomly selected 172 [38 responders (R) and 134 non-responders (NR)] in such a way that the proportions of clinical stages remained the same.
Validation Study: The remaining 94 subjects [20 responders (R) and 74 non-responders (NR)] were used with the sole and express purpose of validating our prognostic biomarker model.
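A random split that keeps the proportions of clinical stages the same in both parts is commonly done by stratifying on stage. The sketch below shows one way to do that; it is an illustration under assumptions, not the procedure actually used, which the text does not describe. The function name, the `seed` parameter, and the shape of the subject records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(subjects, fraction, key, seed=0):
    """Randomly split `subjects` so that each stratum contributes
    roughly `fraction` of its members to the first part."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[key(subject)].append(subject)
    first, second = [], []
    for group in strata.values():
        rng.shuffle(group)  # random order within the stratum
        k = round(len(group) * fraction)
        first.extend(group[:k])
        second.extend(group[k:])
    return first, second


# Hypothetical records: (subject id, clinical stage)
subjects = [(i, "I") for i in range(6)]
subjects += [(i, "II") for i in range(6, 18)]
subjects += [(i, "III") for i in range(18, 30)]
discovery, validation = stratified_split(subjects, 0.5, key=lambda s: s[1])
```

For the study described here the fraction would be about 172/266, and the key could combine stage with response status if both proportions were to be preserved.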
The original raw intensity data (CEL files) were processed using the RMA and MAS5 algorithms in the Expression Console software by Affymetrix.
A series of statistical methods, part of the bioinformatic technology platform of Applied Informatic Solutions, Inc., developed by Dr. Jason B. Nikas, was employed to reduce the dimensionality of the data. From the final pool of the most significant variables identified thus, using the mathematical theory of super variables and mathematical modeling, developed by Dr. Jason B. Nikas, we were able to generate one super variable (F2), a complex mathematical function of twelve genes.
Findings
In the Discovery Study, one prognostic biomarker (F2) was developed. This prognostic biomarker is a complex mathematical function of twelve genes, as shown immediately below.
F2 = f(ES 1, BTG3, ODC1, MCMS, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, MCM5, E LK)
A wealth of statistical information pertaining to the performance and assessment of our F2 prognostic biomarker during both the Discovery Study and the Validation Study is shown below in Figure 4 and Figure 5. The combined results are shown in Figure 6. The cutoff value for function F2 was determined as described herein. Patient samples having an output score less than 13.69 are responders and those with a score greater than or equal to 13.69 are nonresponders.
Bibliography
American Cancer Society (2010). Cancer Facts & Figures 2010. Atlanta: American Cancer Society.
Jemal A, Siegel R, Xu J, Ward E. (2010). Cancer statistics. CA Cancer J Clin; 60:277-300.
Nikas J.B., Keene C.D., and Low W.C. (2010). Comparison of Analytical Mathematical Approaches for Identifying Key Nuclear Magnetic Resonance Spectroscopy Biomarkers in the Diagnosis and Assessment of Clinical Change of Diseases. Journal of Comparative Neurology, 518: 4091-4112.
Nikas J.B. and Low W.C. (2011). ROC-Supervised Principal Component Analysis in Connection with the Diagnosis of Diseases. American Journal of Translational Research, 3(2): 180-196.
Nikas J.B. and Low W.C. (2011). Application of Clustering Analyses to the Diagnosis of Huntington Disease in Mice and Other Diseases with Well-Defined Group Boundaries. Computer Methods and Programs in Biomedicine, doi: 10.1016/j.cmpb.2011.03.004.
Popovici V, Chen W, Gallas BG, et al. (2010). Effect of training-sample size and classification difficulty on the accuracy of genomic predictors. Breast Cancer Research; 12:R5.
Tabchy A, Valero V, Vidaurre T, et al. (2010). Evaluation of a 30-Gene Paclitaxel, Fluorouracil, Doxorubicin, and Cyclophosphamide Chemotherapy Response Predictor in a Multicenter Randomized Trial in Breast Cancer. Clin Cancer Res; 16:5351-5361.
All publications, patents and patent applications are incorporated herein by reference. While in the foregoing specification this invention has been described in relation to certain preferred embodiments thereof, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the invention is susceptible to additional embodiments and that certain of the details described herein may be varied considerably without departing from the basic principles of the invention.
Table 6: X1: CCND1
Reference Sequence: NM_053056.2 GI:77628152; SEQ ID NO: 1
1 cacacggact acaggggagt tttgttgaag ttgcaaagtc ctggagcctc cagagggctg 61 tcggcgcagt agcagcgagc agcagagtcc gcacgctccg gcgaggggca gaagagcgcg 121 agggagcgcg gggcagcaga agcgagagcc gagcgcggac ccagccagga cccacagccc 181 tccccagctg cccaggaaga gccccagcca tggaacacca gctcctgtgc tgcgaagtgg 241 aaaccatccg ccgcgcgtac cccgatgcca acctcctcaa cgaccgggtg ctgcgggcca 301 tgctgaaggc ggaggagacc tgcgcgccct cggtgtccta cttcaaatgt gtgcagaagg 361 aggtcctgcc gtccatgcgg aagatcgtcg ccacctggat gctggaggtc tgcgaggaac 421 agaagtgcga ggaggaggtc ttcccgctgg ccatgaacta cctggaccgc ttcctgtcgc 481 tggagcccgt gaaaaagagc cgcctgcagc tgctgggggc cacttgcatg ttcgtggcct 541 ctaagatgaa ggagaccatc cccctgacgg ccgagaagct gtgcatctac accgacaact 601 ccatccggcc cgaggagctg ctgcaaatgg agctgctcct ggtgaacaag ctcaagtgga 661 acctggccgc aatgaccccg cacgatttca ttgaacactt cctctccaaa atgccagagg 721 cggaggagaa caaacagatc atccgcaaac acgcgcagac cttcgttgcc ctctgtgcca 781 cagatgtgaa gttcatttcc aatccgccct ccatggtggc agcggggagc gtggtggccg 841 cagtgcaagg cctgaacctg aggagcccca acaacttcct gtcctactac cgcctcacac 901 gcttcctctc cagagtgatc aagtgtgacc cggactgcct ccgggcctgc caggagcaga 961 tcgaagccct gctggagtca agcctgcgcc aggcccagca gaacatggac cccaaggccg 1021 ccgaggagga ggaagaggag gaggaggagg tggacctggc ttgcacaccc accgacgtgc 1081 gggacgtgga catctgaggg cgccaggcag gcgggcgcca ccgccacccg cagcgagggc 1 141 ggagccggcc ccaggtgctc ccctgacagt ccctcctctc cggagcattt tgataccaga 1201 agggaaagct tcattctcct tgttgttggt tgttttttcc tttgctcttt cccccttcca
1261 tctctgactt aagcaaaaga aaaagattac ccaaaaactg tctttaaaag agagagagag 1321 aaaaaaaaaa tagtatttgc ataaccctga gcggtggggg aggagggttg tgctacagat 1381 gatagaggat tttatacccc aataatcaac tcgtttttat attaatgtac ttgtttctct
1441 gttgtaagaa taggcattaa cacaaaggag gcgtctcggg agaggattag gttccatcct 1501 ttacgtgttt aaaaaaaagc ataaaaacat tttaaaaaca tagaaaaatt cagcaaacca 1561 tttttaaagt agaagagggt tttaggtaga aaaacatatt cttgtgcttt tcctgataaa
1621 gcacagctgt agtggggttc taggcatctc tgtactttgc ttgctcatat gcatgtagtc 1681 actttataag tcattgtatg ttattatatt ccgtaggtag atgtgtaacc tcttcacctt
1741 attcatggct gaagtcacct cttggttaca gtagcgtagc gtgcccgtgt gcatgtcctt 1801 tgcgcctgtg accaccaccc caacaaacca tccagtgaca aaccatccag tggaggtttg 1861 tcgggcacca gccagcgtag cagggtcggg aaaggccacc tgtcccactc ctacgatacg 1921 ctactataaa gagaagacga aatagtgaca taatatattc tatttttata ctcttcctat
1981 ttttgtagtg acctgtttat gagatgctgg ttttctaccc aacggccctg cagccagctc 2041 acgtccaggt tcaacccaca gctacttggt ttgtgttctt cttcatattc taaaaccatt
2101 ccatttccaa gcactttcag tccaataggt gtaggaaata gcgctgtttt tgttgtgtgt 2161 gcagggaggg cagttttcta atggaatggt ttgggaatat ccatgtactt gtttgcaagc 2221 aggactttga ggcaagtgtg ggccactgtg gtggcagtgg aggtggggtg tttgggaggc 2281 tgcgtgccag tcaagaagaa aaaggtttgc attctcacat tgccaggatg ataagttcct 2341 ttccttttct ttaaagaagt tgaagtttag gaatcctttg gtgccaactg gtgtttgaaa
2401 gtagggacct cagaggttta cctagagaac aggtggtttt taagggttat cttagatgtt 2461 tcacaccgga aggtttttaa acactaaaat atataattta tagttaaggc taaaaagtat 2521 atttattgca gaggatgttc ataaggccag tatgatttat aaatgcaatc tccccttgat 2581 ttaaacacac agatacacac acacacacac acacacacaa accttctgcc tttgatgtta 2641 cagatttaat acagtttatt tttaaagata gatcctttta taggtgagaa aaaaacaatc
2701 tggaagaaaa aaaccacaca aagacattga ttcagcctgt ttggcgtttc ccagagtcat ctgattggac aggcatgggt gcaaggaaaa ttagggtact caacctaagt tcggttccga tgaattctta tcccctgccc cttcctttaa aaaacttagt gacaaaatag acaatttgca catcttggct atgtaattct tgtaattttt atttaggaag tgttgaaggg aggtggcaag agtgtggagg ctgacgtgtg agggaggaca ggcgggagga ggtgtgagga ggaggctccc gaggggaagg ggcggtgccc acaccgggga caggccgcag ctccattttc ttattgcgct gctaccgttg acttccaggc acggtttgga aatattcaca tcgcttctgt gtatctcttt cacattgttt gctgctattg gaggatcagt tttttgtttt acaatgtcat atactgccat gtactagttt tagttttctc ttagaacatt gtattacaga tgcctttttt gtagtttttt ttttttttat gtgatcaatt ttgacttaat gtgattactg ctctattcca aaaaggttgc tgtttcacaa tacctcatgc ttcacttagc catggtggac ccagcgggca ggttctgcct gctttggcgg gcagacacgc gggcgcgatc ccacacaggc tggcgggggc cggccccgag gccgcgtgcg tgagaaccgc gccggtgtcc ccagagacca ggctgtgtcc ctcttctctt ccctgcgcct gtgatgctgg gcacttcatc tgatcggggg cgtagcatca tagtagtttt tacagctgtg ttattctttg cgtgtagcta tggaagttgc ataattatta ttattattat tataacaagt gtgtcttacg tgccaccacg gcgttgtacc tgtaggactc tcattcggga tgattggaat agcttctgga atttgttcaa gttttgggta tgtttaatct gttatgtact agtgttctgt ttgttattgt tttgttaatt acaccataat gctaatttaa agagactcca aatctcaatg aagccagctc acagtgctgt gtgccccggt cacctagcaa gctgccgaac caaaagaatt tgcaccccgc tgcgggccca cgtggttggg gccctgccct ggcagggtca tcctgtgctc ggaggccatc tcgggcacag gcccaccccg ccccacccct ccagaacacg gctcacgctt acctcaacca tcctggctgc ggcgtctgtc tgaaccacgc gggggccttg agggacgctt tgtctgtcgt gatggggcaa gggcacaagt cctggatgtt gtgtgtatcg agaggccaaa ggctggtggc aagtgcacgg ggcacagcgg agtctgtcct gtgacgcgca agtctgaggg tctgggcggc gggcggctgg gtctgtgcat ttctggttgc accgcggcgc ttcccagcac caacatgtaa ccggcatgtt tccagcagaa gacaaaaaga caaacatgaa agtctagaaa taaaactggt aaaaccccaa aaaaaaaaaa aaaa
Exemplary Target Sequence for HG-U133A:208712_AT
SEQ ID NO: 2
gttttgggtatgtttaatctgttatgtactagtgttctgtttgttattgttttgttaatt
acaccataatgctaatttaaagagactccaaatctcaatgaagccagctcacagtgctgt
gtgccccggtcatctagcaagctgccgaaccaaaagaatttgcaccccgctgcgggccca cgtggttggggccctgccctggcagggtcatcctgtgctcggaggccatctcgggcacag gcccaccccgccccacccctccagaacacggctcacgcttacctcaaccatcctggctgc ggcgtctgtctgaaccacgcgggggccttgagggacgctttgtctgtcgtgatggggcaa gggcacaagtcctggatgttgtgtgtatcgagaggccaaaggctggtggcaagtgcacgg ggcacagcggagtctgtcctgtgacgcgcaagtctgagggtctgggcggcg
Sample Probes:
SEQ ID NO: 3
GTTTTGGGTATGTTTAATCTGTTAT
SEQ ID NO: 4
CGCAAGTCTGAGGGTCTGGGCGGCG
Table 7: X2: CELSR1
Reference Sequence: NM_014246.1 GI:7656966; SEQ ID NO: 5
1 atggcgccgc cgccgccgcc cgtgctgccc gtgctgctgc tcctggccgc cgccgccgcc 61 ctgccggcga tggggctgcg agcggccgcc tgggagccgc gcgtacccgg cgggacccgc 121 gccttcgccc tccggcccgg ctgtacctac gcggtgggcg ccgcttgcac gccccgggcg 181 ccgcgggagc tgctggacgt gggccgcgat gggcggctgg caggacgtcg gcgcgtctcg 241 ggcgcggggc gcccgctgcc gctgcaagtc cgcttggtgg cccgcagtgc cccgacggcg 301 ctgagccgcc gcctgcgggc gcgcacgcac cttcccggct gcggagcccg tgcccggctc 361 tgcggaaccg gtgcccggct ctgcggggcg ctctgcttcc ccgtccccgg cggctgcgcg 421 gccgcgcagc attcggcgct cgcagctccg accaccttac ccgcctgccg ctgcccgccg 481 cgccccaggc cccgctgtcc cggccgtccc atctgcctgc cgccgggcgg ctcggtccgc 541 ctgcgtctgc tgtgcgccct gcggcgcgcg gctggcgccg tccgggtggg actggcgctg 601 gaggccgcca ccgcggggac gccctccgcg tcgccatccc catcgccgcc cctgccgccg 661 aacttgcccg aagcccgggc ggggccggcg cgacgggccc ggcggggcac gagcggcaga 721 gggagcctga agtttccgat gcccaactac caggtggcgt tgtttgagaa cgaaccggcg 781 ggcaccctca tcctccagct gcacgcgcac tacaccatcg agggcgagga ggagcgcgtg 841 agctattaca tggaggggct gttcgacgag cgctcccggg gctacttccg aatcgactct 901 gccacgggcg ccgtgagcac ggacagcgta ctggaccgcg agaccaagga gacgcacgtc 961 ctcagggtga aagccgtgga ctacagtacg ccgccgcgct cggccaccac ctacatcact 1021 gtcttggtca aagacaccaa cgaccacagc ccggtcttcg agcagtcgga gtaccgcgag 1081 cgcgtgcggg agaacctgga ggtgggctac gaggtgctga ccatccgcgc cagcgaccgc 1 141 gactcgccca tcaacgccaa cttgcgttac cgcgtgttgg ggggcgcgtg ggacgtcttc 1201 cagctcaacg agagctctgg cgtggtgagc acacgggcgg tgctggaccg ggaggaggcg 1261 gccgagtacc agctcctggt ggaggccaac gaccaggggc gcaatccggg cccgctcagt 1321 gccacggcca ccgtgtacat cgaggtggag gacgagaacg acaactaccc ccagttcagc 1381 gagcagaact acgtggtcca ggtgcccgag gacgtggggc tcaacacggc tgtgctgcga 1441 gtgcaggcca cggaccggga ccagggccag aacgcggcca ttcactacag catcctcagc 1501 gggaacgtgg ccggccagtt ctacctgcac tcgctgagcg ggatcctgga tgtgatcaac 1561 cccttggatt tcgaggatgt ccagaaatac tcgctgagca ttaaggccca ggatgggggc 1621 cggcccccgc tcatcaattc ttcaggggtg gtgtctgtgc aggtgctgga tgtcaacgac 1681 aacgagccta tctttgtgag 
cagccccttc caggccacgg tgctggagaa tgtgcccctg 1741 ggctaccccg tggtgcacat tcaggcggtg gacgcggact ctggagagaa cgcccggctg 1801 cactatcgcc tggtggacac ggcctccacc tttctggggg gcggcagcgc tgggcctaag 1861 aatcctgccc ccacccctga cttccccttc cagatccaca acagctccgg ttggatcaca 1921 gtgtgtgccg agctggaccg cgaggaggtg gagcactaca gcttcggggt ggaggcggtg 1981 gaccacggct cgccccccat gagctcctcc accagcgtgt ccatcacggt gctggacgtg 2041 aatgacaacg acccggtgtt cacgcagccc acctacgagc ttcgtctgaa tgaggatgcg 2101 gccgtgggga gcagcgtgct gaccctgcag gcccgcgacc gtgacgccaa cagtgtgatt 2161 acctaccagc tcacaggcgg caacacccgg aaccgctttg cactcagcag ccagagaggg 2221 ggcggcctca tcaccctggc gctacctctg gactacaagc aggagcagca gtacgtgctg 2281 gcggtgacag catccgacgg cacacggtcg cacactgcgc atgtcctaat caacgtcact 2341 gatgccaaca cccacaggcc tgtctttcag agctcccatt acacagtgag tgtcagtgag 2401 gacaggcctg tgggcacctc cattgctacc ctcagtgcca acgatgagga cacaggagag 2461 aatgcccgca tcacctacgt gattcaggac cccgtgccgc agttccgcat tgaccccgac 2521 agtggcacca tgtacaccat gatggagctg gactatgaga accaggtcgc ctacacgctg 2581 accatcatgg cccaggacaa cggcatcccg cagaaatcag acaccaccac cctagagatc 2641 ctcatcctcg atgccaatga caatgcaccc cagttcctgt gggatttcta ccagggttcc 2701 atctttgagg atgctccacc ctcgaccagc atcctccagg tctctgccac ggaccgggac 2761 tcaggtccca atgggcgtct gctgtacacc ttccagggtg gggacgacgg cgatggggac Table 7 cont'd
2821 ttctacatcg agcccacgtc cggtgtgatt cgcacccagc gccggctgga ccgggagaat 2881 gtggccgtgt acaacctttg ggctctggct gtggatcggg gcagtcccac tccccttagc 2941 gcctcggtag aaatccaggt gaccatcttg gacattaatg acaatgcccc catgtttgag 3001 aaggacgaac tggagctgtt tgttgaggag aacaacccag tggggtcggt ggtggcaaag 3061 attcgtgcta acgaccctga tgaaggccct aatgcccaga tcatgtatca gattgtggaa 3121 ggggacatgc ggcatttctt ccagctggac ctgctcaacg gggacctgcg tgccatggtg 3181 gagctggact ttgaggtccg gcgggagtat gtgctggtgg tgcaggccac gtcggctccg 3241 ctggtgagcc gagccacggt gcacatcctt ctcgtggacc agaatgacaa cccgcctgtg 3301 ctgcccgact tccagatcct cttcaacaac tatgtcacca acaagtccaa cagtttcccc 3361 accggcgtga tcggctgcat cccggcccat gaccccgacg tgtcagacag cctcaactac 3421 accttcgtgc agggcaacga gctgcgcctg ttgctgctgg accccgccac gggcgaactg 3481 cagctcagcc gcgacctgga caacaaccgg ccgctggagg cgctcatgga ggtgtctgtg 3541 tctgatggca tccacagcgt cacggccttc tgcaccctgc gtgtcaccat catcacggac 3601 gacatgctga ccaacagcat cactgtccgc ctggagaaca tgtcccagga gaagttcctg 3661 tccccgctgc tggccctctt cgtggagggg gtggccgccg tgctgtccac caccaaggac 3721 gacgtcttcg tcttcaacgt ccagaacgac accgacgtca gctccaacat cctgaacgtg 3781 accttctcgg cgctgctgcc tggcggcgtc cgcggccagt tcttcccgtc ggaggacctg 3841 caggagcaga tctacctgaa tcggacgctg ctgaccacca tctccacgca gcgcgtgctg 3901 cccttcgacg acaacatctg cctgcgcgag ccctgcgaga actacatgaa gtgcgtgtcc 3961 gttctgcgat tcgacagctc cgcgcccttc ctcagctcca ccaccgtgct cttccggccc 4021 atccacccca tcaacggcct gcgctgccgc tgcccgcccg gcttcaccgg cgactactgc 4081 gagacggaga tcgacctctg ctactccgac ccgtgcggcg ccaacggccg ctgccgcagc 4141 cgcgagggcg gctacacctg cgagtgcttc gaggacttca ctggagagca ctgtgaggtg 4201 gatgcccgct caggccgctg tgccaacggg gtgtgcaaga acgggggcac ctgcgtgaac 4261 ctgctcatcg gcggcttcca ctgcgtgtgt cctcctggcg agtatgagag gccctactgt 4321 gaggtgacca ccaggagctt cccgccccag tccttcgtca ccttccgggg cctgagacag 4381 cgcttccact tcaccatctc cctcacgttt gccactcagg aaaggaacgg cttgcttctc 4441 tacaacggcc gcttcaatga gaagcacgac ttcatcgccc tggagatcgt ggacgagcag 4501 
gtgcagctca ccttctctgc aggcgagaca acaacgaccg tggcaccgaa ggttcccagt 4561 ggtgtgagtg acgggcggtg gcactctgtg caggtgcagt actacaacaa gcccaatatt 4621 ggccacctgg gcctgcccca tgggccgtcc ggggaaaaga tggccgtggt gacagtggat 4681 gattgtgaca caaccatggc tgtgcgcttt ggaaaggaca tcgggaacta cagctgcgct 4741 gcccagggca ctcagaccgg ctccaagaag tccctggatc tgaccggccc tctactcctg 4801 gggggtgtcc ccaacctgcc agaagacttc ccagtgcaca accggcagtt cgtgggctgc 4861 atgcggaacc tgtcagtcga cggcaaaaat gtggacatgg ccggattcat cgccaacaat 4921 ggcacccggg aaggctgcgc tgctcggagg aacttctgcg atgggaggcg gtgtcagaat 4981 ggaggcacct gtgtcaacag gtggaatatg tatctgtgtg agtgtccact ccgattcggc 5041 gggaagaact gtgagcaagc catgcctcac ccccagctct tcagcggtga gagcgtcgtg 5101 tcctggagtg acctgaacat catcatctct gtgccctggt acctggggct catgttccgg 5161 acccggaagg aggacagcgt tctgatggag gccaccagtg gtgggcccac cagctttcgc 5221 ctccagatcc tgaacaacta cctccagttt gaggtgtccc acggcccctc cgatgtggag 5281 tccgtgatgc tgtccgggtt gcgggtgacc gacggggagt ggcaccacct gctgatcgag 5341 ctgaagaatg ttaaggagga cagtgagatg aagcacctgg tcaccatgac cttggactat 5401 gggatggacc agaacaaggc agatatcggg ggcatgcttc ccgggctgac ggtaaggagc 5461 gtggtggtcg gaggcgcctc tgaagacaag gtctccgtgc gccgtggatt ccgaggctgc 5521 atgcagggag tgaggatggg ggggacgccc accaacgtcg ccaccctgaa catgaacaac 5581 gcactcaagg tcagggtgaa ggacggctgt gatgtggacg acccctgtac ctcgagcccc 5641 tgtcccccca atagccgctg ccacgacgcc tgggaggact acagctgcgt ctgtgacaaa 5701 gggtaccttg gaataaactg tgtggatgcc tgtcacctga acccctgcga gaacatgggg 5761 gcctgcgtgc gctcccccgg ctccccgcag ggctacgtgt gcgagtgtgg gcccagtcac 5821 tacgggccgt actgtgagaa caaactcgac cttccgtgcc ccagaggctg gtgggggaac 5881 cccgtctgtg gaccctgcca ctgtgccgtc agcaaaggct ttgatcccga ctgtaataag 5941 accaacggcc agtgccaatg caaggagaat tactacaagc tcctagccca ggacacctgt 6001 ctgccctgcg actgcttccc ccatggctcc cacagccgca cttgcgacat ggccaccggg 6061 cagtgtgcct gcaagcccgg cgtcatcggc cgccagtgca accgctgcga caacccgttt 6121 gccgaggtca ccacgctcgg ctgtgaagtg atctacaatg gctgtcccaa agcatttgag 6181 gccggcatct 
ggtggccaca gaccaagttc gggcagccgg ctgcggtgcc atgccctaag 6241 ggatccgttg gaaatgcggt ccgacactgc agcggggaga agggctggct gcccccagag 6301 ctctttaact gtaccaccat ctccttcgtg gacctcaggg ccatgaatga gaagctgagc 6361 cgcaatgaga cgcaggtgga cggcgccagg gccctgcagc tggtgagggc gctgcgcagt 6421 gctacacagc acacgggcac gctctttggc aatgacgtgc gcacggccta ccagctgctg 6481 ggccacgtcc ttcagcacga gagctggcag cagggcttcg acctggcagc cacgcaggac 6541 gccgactttc acgaggacgt catccactcg ggcagcgccc tcctggcccc agccaccagg 6601 gcggcgtggg agcagatcca gcggagcgag ggcggcacgg cacagctgct ccggcgcctc 6661 gagggctact tcagcaacgt ggcacgcaac gtgcggcgga cgtacctgcg gcccttcgtc 6721 atcgtcaccg ccaacatgat tcttgctgtc gacatctttg acaagttcaa ctttacggga 6781 gccagggtcc cgcgattcga caccatccat gaagagttcc ccagggagct ggagtcctcc 6841 gtctccttcc cagccgactt cttcagacca cctgaagaaa aagaaggccc cctgctgagg 6901 ccggctggcc ggaggaccac cccgcagacc acgcgcccgg ggcctggcac cgagagggag 6961 gccccgatca gcaggcggag gcgacaccct gatgacgctg gccagttcgc cgtcgctctg 7021 gtcatcattt accgcaccct ggggcagctc ctgcccgagc gctacgaccc cgaccgtcgc 7081 agcctccggt tgcctcaccg gcccatcatt aataccccga tggtgagcac gctggtgtac 7141 agcgaggggg ctccgctccc gagacccctg gagaggcccg tcctggtgga gttcgccctg 7201 ctggaggtgg aggagcgaac caagcctgtc tgcgtgttct ggaaccactc cctggccgtt 7261 ggtgggacgg gagggtggtc tgcccggggc tgcgagctcc tgtccaggaa ccggacacat 7321 gtcgcctgcc agtgcagcca cacagccagc tttgcggtgc tcatggatat ctccaggcgt 7381 gagaacgggg aggtcctgcc tctgaagatt gtcacctatg ccgctgtgtc cttgtcactg 7441 gcagccctgc tggtggcctt cgtcctcctg agcctggtcc gcatgctgcg ctccaacctg 7501 cacagcattc acaagcacct cgccgtggcg ctcttcctct ctcagctggt gttcgtgatt 7561 gggatcaacc agacggaaaa cccgtttctg tgcacagtgg ttgccatcct cctccactac 7621 atctacatga gcacctttgc ctggaccctc gtggagagcc tgcatgtcta ccgcatgctg 7681 accgaggtgc gcaacatcga cacggggccc atgcggttct actacgtcgt gggctggggc 7741 atcccggcca ttgtcacagg actggcggtc ggcctggacc cccagggcta cgggaacccc 7801 gacttctgct ggctgtcgct tcaagacacc ctgatttgga gctttgcggg gcccatcgga 7861 gctgttataa tcatcaacac 
agtcacttct gtcctatctg caaaggtttc ctgccaaaga 7921 aagcaccatt attatgggaa aaaagggatc gtctccctgc tgaggaccgc attcctcctg 7981 ctgctgctca tcagcgccac ctggctgctg gggctgctgg ctgtgaaccg cgatgcactg 8041 agctttcact acctcttcgc catcttcagc ggcttacagg gccccttcgt cctccttttc 8101 cactgcgtgc tcaaccagga ggtccggaag cacctgaagg gcgtgctcgg cgggaggaag 8161 ctgcacctgg aggactccgc caccaccagg gccaccctgc tgacgcgctc cctcaactgc 8221 aacaccacct tcggtgacgg gcctgacatg ctgcgcacag acttgggcga gtccaccgcc 8281 tcgctggaca gcatcgtcag ggatgaaggg atccagaagc tcggcgtgtc ctctgggctg 8341 gtgaggggca gccacggaga gccagacgcg tccctcatgc ccaggagctg caaggatccc 8401 cctggccacg attccgactc agatagcgag ctgtccctgg atgagcagag cagctcttac 8461 gcctcctcac actcgtcaga cagcgaggac gatggggtgg gagctgagga aaaatgggac 8521 ccggccaggg gcgccgtcca cagcaccccc aaaggggacg ctgtggccaa ccacgttccg 8581 gccggctggc ccgaccagag cctggctgag agtgacagtg aggaccccag cggcaagccc 8641 cgcctgaagg tggagaccaa ggtcagcgtg gagctgcacc gcgaggagca gggcagtcac 8701 cgtggagagt accccccgga ccaggagagc gggggcgcag ccaggcttgc tagcagccag 8761 cccccagagc agaggaaagg catcttgaaa aataaagtca cctacccgcc gccgctgacg 8821 ctgacggagc agacgctgaa gggccggctc cgggagaagc tggccgactg tgagcagagc 8881 cccacatcct cgcgcacgtc ttccctgggc tctggcggcc ccgactgcgc catcacagtc 8941 aagagccctg ggagggagcc ggggcgtgac cacctcaacg gggtggccat gaatgtgcgc 9001 actgggagcg cccaggccga tggctccgac tctgagaaac cgtgaggcaa gcccgtcacc 9061 ccacacaggc tgcggcatca ccctcagacc ttggagccca aggggccact gcccttgaag 9121 tggagtgggc ccagagtgtg gcggtcccca tggtggcagc cccccgactg atcatccaga 9181 cacaaaggtc ttggttctcc caggagctca gggcctgtca gacctggtga caagtgccaa 9241 aggccacagg catgagggag gcgtggacca ctgggccagc accgctgagt cctaagactg 9301 cagtcaaagc cagaactgag aggggacccc agactgggcc cagaggctgg ccagagttca 9361 ggaacgccgg gcacagacca aagaccgcgg tccagccccg cccaggcggg catctcatgg 9421 cagtgcggac ccgtggctgg cagcccgggc agtcctttgc aaaggcaccc cttgtcttaa 9481 aatcacttcg ctatgtggga aaggtggaga tacttttata tatttgtatg ggactctgag 9541 gaggtgcaac ctgtatatat attgcattcg 
tgctgacttt gttatcccga gagatccatg 9601 caatgatctc ttgctgtctt ctctgtcaag attgcacagt tgtacttgaa tctggcatgt 9661 gttgacgaaa ctggtgcccc agcagatcaa aggtgggaaa tacgtcagca gtggggctaa 9721 aaccaagcgg ctagaagccc tacagctgcc ttcggccagg aagtgaggat ggtgtgggcc 9781 ctccccgccg gccccctggg tccccagtgt tcgctgtgtg tgcgtttgtc ctctgctgcc 9841 atctgccccg gctgtgtgaa ttcaagacag ggcagtgcag cactaggcag gtgtgaggag 9901 ccctgctgag gtcactgtgg ggcacggttg ccacacggct gtcatttttc acctggtcat 9961 tctgtgacca ccaccccctc ccctcaccgc ctcccaggtg gcccgggagc tgcaggtggg 10021 gatggctttg tcctttgctc ctgctccccg tgggacctgg gaccttaaag cgttgcaggt 10081 tcctgatttg gacagaggtg tggggccttc caggccgtta catacctcct gccaattctc 10141 taactctctg agactgcgag gatctccagg cagggttctc ccctctggag tctgaccaat 10201 tacttcattt tgcttcaaat ggccaattgt gcagagggac aaagccacag ccacactctt 10261 caacggttac caaactgttt ttggaaattc acaccaaggt cgggcccact gcaggcagct 10321 ggcacagcgt ggcccgaggg gctgtggaac gggtcccgga actgtcagac atgtttgatt 10381 ttagcgtttc ctttgttctt caaatcaggt gcccaaataa gtgatcagca cagctgcttc 10441 caaataggag aaaccataaa ataggatgaa aatcaagtaa aatgcaaaga tgtccacact 10501 gttttaaact tgaccctgat gaaaatgtga gcactgttag cagatgccta tgggagagga 10561 aaagcgtatc tgaaaatggt ccaggacagg aggatgaaat gagatcccag agtcctcaca 10621 cctgaatgaa ttatacatgt gccttaccag gtgagtggtc tttcgaagat aaaaaactct 10681 agtcccttta aacgtttgcc cctggcgttt cctaagtacg aaaaggtttt taagtcttcg 10741 aacagtctcc tttcatgact ttaacaggat tctgccccct gaggtgtaat ttttttgttc 10801 tatttttttc cacgtactcc acagccaaca tcacgaggtg taatttttaa tttgatcaga 10861 actgttacca aaaaacaact gtcagtttta ttgagatggg aaaaatgtaa acctattttt 10921 attacttaag actttatggg agagattaga cactggaggt ttttaacaga acgtgtattt 10981 attaatgttc aaaacactgg aattacaaat gagaagagtc tacaataaat taagattttt 1 1041 gaatttgtac ttctgcggtg ctggtttttc tccacaaaca cccccgcccc tccccatgcc 1 1 101 cagggtggcc gtggaaggga cggtttacgg acgtgcagct gagctgtccg tgtcccatgc 1 1 161 tccctcagcc agtggaacgt gccggaactt tttgtccatt ccctagtagg cctgccacag 1 1221 cctagatggg 
cagtttttgt ctttcaccaa atttgaggac tttttttttt tgccattatt
11281 tcttcagttt tcttttcttg cactgatctt tctcctctcc ttctgtgact ccagtgactc
11341 agacgttaga cctcttgatg ttttcccact ggtccctgag gctctgttc
Table 7 cont'd
Exemplary Target Sequence for HG-U133A:41660_AT; SEQ ID NO: 6
gatgtccacactgttttaaacttgacnnnnnnnnnnnnnnnagcactgttagcagatgcc
tatgggagaggaaaagcgtatctgaaaatggtccaggacaggaggatgaaatgagatccc
agagtcctcacacctgaatgaattatacatgtgccttaccaggtgagtggtctttcgaag
ataaaaaactctagtccctttaaacgtttgcccctggcgtttcctaagtacgaaaaggtt
tttaagtcttcgaacagtctcctttcatgactttaacaggattctgccccctgaggtgta
atttttttgttctatttttttccacgtactccacagccaacatcacgaggtgtaattttt
Sample Probes:
SEQ ID NO: 7
GATGTCCACACTGTTTTAAACTTGA
SEQ ID NO: 8
TGCCCCTGGCGTTTCCTAAGTACGA
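The sample probes listed above are 25-base sequences drawn from the exemplary target sequence. As a minimal sketch, and not part of the disclosed method, the following Python snippet shows one way to confirm that a probe occurs in a target excerpt. The helper names `normalize` and `probe_offset` are hypothetical, and the excerpt is only the fourth and fifth lines of the SEQ ID NO: 6 listing, not the full target.

```python
def normalize(seq: str) -> str:
    """Lower-case a sequence and drop all whitespace so line breaks do not matter."""
    return "".join(seq.split()).lower()


def probe_offset(target: str, probe: str) -> int:
    """Return the 0-based offset of the probe within the target, or -1 if absent."""
    return normalize(target).find(normalize(probe))


# Two consecutive lines copied from the SEQ ID NO: 6 listing (an excerpt only).
TARGET_EXCERPT = """
ataaaaaactctagtccctttaaacgtttgcccctggcgtttcctaagtacgaaaaggtt
tttaagtcttcgaacagtctcctttcatgactttaacaggattctgccccctgaggtgta
"""

# Sample probe SEQ ID NO: 8 as printed in the listing.
PROBE_SEQ_ID_8 = "TGCCCCTGGCGTTTCCTAAGTACGA"

offset = probe_offset(TARGET_EXCERPT, PROBE_SEQ_ID_8)
print("probe found" if offset >= 0 else "probe not found")
```

The comparison is case-insensitive because the listing prints targets in lower case and probes in upper case. It performs an exact string match only, so masked `n` positions in a target are not treated as wildcards.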
Table 8:X3: DKFZp566H0824 (aka FAM117B, ALS2CR13, FLJ38771, Mapkll, MGC90831)
Reference Sequence: BC029383.1 GI:23273884; SEQ ID NO: 9
1 tccctgtaag actcttgttc tcatttcaca ctggcgcgtc cccggggctg gggaaatcat
61 tttcattttt gagagtaacc ttaactaatt aaatggatca agggactctt tgtgttttgt
121 tttcattttg ttttgttttt tactttaatg tacagataaa ttaagggaaa taataattac
181 cttgaataat tttgattctc gtaactaatt ttccacctat gtgtgtttag ctcaattttg
241 aaatggttca ctgaaagatt tccaaatttc ttatcactga aactgagtta tttcactttt
301 atgactgtta gtgactcctc tgttttatgt atttctggtg cagtataatt cttctatgtc
361 tgtgtgtctg tgatagatct cagttcaagt tagcgtttcc cgagcaccca tcttgtgtca
421 ggcactctgc taatgattgc gcaggttaca aggtttagag tggccacatc ccctgttcac
481 ttgcagttta gagtccagtc ggaggaaagg gaggagggga atcatgttat agacataact
541 tgatatcctg caggactcag tgatgtatga gccacaaaaa ggtcacagct gagtgagaga
601 gttggtaatc atggaaacag gcgcatcgga aactcgcttc ctatgggtca gtaaagtgtg
661 tatgaggtgg tgtggatgac agcacataga gctggagggg tgggccagca gcaaccagga
721 gaaggcgttc caggtgtctg cacccaagga gatcctgggg tcagaggcag gacatgtggg
781 aaaaaccatt ggtccctttt acccagggaa ggactcaaga agccgaacgt ggtaggagat
841 ggaggctccc aagccctggc agacgttaca gcagcgctga gcactttcag gaccaggact
901 ccaaggtcca gctctggaac gcccctctct gccctgactt tggttccttc atcagcagag
961 gggctcctgg gctatgggct caagtctgaa gtcaccttaa agagaaatct ctacctttct
1021 gttctccttc attgctgagg attttgactt gtgttgaaaa gtttctgaag ctttctgcag
1081 ctggaaaatc aagcttttaa aaaagctctt gatgggccag gcttggtggc tcacgcctgt
1141 aatcccagca ctttaggagg ccaaggcagg tggatcacga gatcaggagt ttgagactag
1201 cctggtcaac atggtgaaac gccatcccta ctaaaaatac aaaaattagc caggtgcggt
1261 ggcacgtgcc tatagttcca gctgctcagg aggttgaggg agaagaatca cttaaaccca
1321 ggaggtggag gttgcagtga gctgagacca tgccactgca ctccagcctg ggcaacagag
1381 caaggctctg ttttgtttcg caaaaaaaaa aaaaaaaaaa aaaaaaaaaa a
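The reference sequences in these tables are printed in a GenBank-style layout: a position number, then blocks of ten bases. As a rough sketch under that assumption, the snippet below recovers the contiguous sequence by discarding everything except base letters. The function name `strip_positions` is hypothetical, and the excerpt is the first two lines of the SEQ ID NO: 9 listing only.

```python
import re


def strip_positions(listing: str) -> str:
    """Keep only base letters (a, c, g, t, n); drop position numbers and spacing."""
    return re.sub(r"[^acgtn]", "", listing.lower())


# First two lines of the SEQ ID NO: 9 reference sequence (an excerpt only).
LISTING_EXCERPT = """
1 tccctgtaag actcttgttc tcatttcaca ctggcgcgtc cccggggctg gggaaatcat
61 tttcattttt gagagtaacc ttaactaatt aaatggatca agggactctt tgtgttttgt
"""

seq = strip_positions(LISTING_EXCERPT)
print(len(seq))
```

Because the filter removes digits wherever they fall, it also tolerates position numbers that were split by extraction (for example `1 141`). It should be applied to sequence lines only: stray words such as table captions contain the letters a, c, g, t and n and would otherwise leak into the result.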
Exemplary Target Sequence for HG-U133A:207470_AT
SEQ ID NO: 10
gataggagatggaggctcccaagccctggcagacgttacagcagcgctgagcactttcag
gaccaggactccaaggtccagctctggaacgcccctctctgccctgactttggttccttc
atcagcagaggggctcctgggctatgggctcaagtctgaagtcaccttaaagagaaatct
ctacctttctgttctccttcattgctgaggattttgacttgtgttgaaaagtttctgaag
ctttctgcagctggaaaatcaagcttttaaaaaagctcttgatgggccaggcttggtggc
tcacgcctgtaatcccagcactttaggaggccaaggcaggtggatcatgagatcaggagt
ttgagactagcctggtcagcatggtgaaacgccatccctactaaggatac
Sample Probes:
SEQ ID NO: 11
GATAGGAGATGGAGGCTCCCAAGCC
SEQ ID NO: 12
GAAACGCCATCCCTACTAAGGATAC
Table 9:X4: FAAH
Reference Sequence: BC093632.1 GI:62739402; SEQ ID NO: 13
1 aggcagcagc aggctgaagg gatcatggtg cagtacgagc tgtgggccgc gctgcctggc 61 gcctccgggg tcgccctggc ctgctgcttc gtggcggcgg ccgtggccct gcgctggtcc 121 gggcgccgga cggcgcgggg cgcggtggtc cgggcgcgac agaggcagcg agcgggcctg 181 gagaacatgg acagggcggc gcagcgcttc cggctccaga acccagacct ggactcagag 241 gcgctgctag ccctgcccct gcctcagctg gtgcagaagt tacacagtag agagctggcc 301 cctgaggccg tgctcttcac ctatgtggga aaggcctggg aagtgaacaa agggaccaac 361 tgtgtgacct cctatctggc tgactgtgag actcagctgt ctcaggcccc aaggcagggc 421 ctgctctatg gcgtccctgt gagcctcaag gagtgcttca cctacaaggg ccaggactcc 481 acgctgggct tgagcctgaa tgaaggggtg ccggcggagt gcgacagcgt agtggtgcat 541 gtgctgaagc tgcagggtgc cgtgcccttc gtgcacacca atgttccaca gtccatgttc 601 agctatgact gcagtaaccc cctctttggc cagaccgtga acccatggaa gtcctccaaa 661 agcccagggg gctcctcagg gggtgaaggg gccctcatcg ggtctggagg ctcccccctg 721 ggcttaggca ctgatatcgg aggcagcatc cgcttcccct cctccttctg cggcatctgc 781 ggcctcaagc ccacagggaa ccgcctcagc aagagtggcc tgaagggctg tgtctatgga 841 caggaggcag tgcgtctctc cgtgggcccc atggcccggg acgtggagag cctggcactg 901 tgcctgcgag ccctgctgtg cgaggacatg ttccgcttgg accccactgt gcctcccttg 961 cccttcagag aagaggtcta caccagctct cagcccctgc gtgtggggta ctatgagact 1021 gacaactata ccatgccctc cccggccatg aggcgggccg tgctggagac caaacagagc 1081 cttgaggctg cggggcacac gctggttccc ttcttgccaa gcaacatacc ccatgctctg 1 141 gagaccctgt caacaggtgg gctcttcagt gatggtggcc acaccttcct acagaacttc 1201 aaaggtgatt tcgtggaccc ctgcctgggg gacctggtct caattctgaa gcttccccaa 1261 tggcttaaag gactgctggc cttcctggtg aagcctctgc tgccaaggct gtcagctttc 1321 ctcagcaaca tgaagtctcg ttcggctgga aaactctggg aactgcagca cgagatcgag 1381 gtgtaccgca aaaccgtgat tgcccagtgg agggcgctgg acctggatgt ggtgctgacc 1441 cccatgctgg cccctgctct ggacttgaat gccccaggca gggccacagg ggccgtcagc 1501 tacactatgc tgtacaactg cctggacttc cctgcagggg tggtgcctgt caccacggtg 1561 actgctgagg acgaggccca gatggaacat tacaggggct actttgggga tatctgggac 1621 aagatgctgc agaagggcat gaagaagagt gtggggctgc cggtggccgt gcagtgtgtg 1681 gctctgccct ggcaagaaga 
gttgtgtctg cggttcatgc gggaggtgga gcgactgatg 1741 acccctgaaa agcagtcatc ctgatggctc tggctccaga ggacctgaga ctcacactct 1801 ctgcagccca g
Exemplary Target Sequence for HG-U133A:204231_S_AT; SEQ ID NO: 14
cctgctctggacttgaatgccccaggcagggccacaggggccgtcagctacactatgctg
tacaactgcctggacttccctgcaggggtggtgcctgtcaccacggtgactgctgaggac
gaggcccagatggaacattacaggggctactttggggatatctgggacaagatgctgcag
aagggcatgaagaagagtgtggggctgccggtggccgtgcagtgtgtggctctgccctgg
caagaagagttgtgtctgcggttcatgcgggaggtggagcgactgatgacccctgaaaag
cagtcatcctgatggctctggctccagaggacctgagactcacactctctgcagcccagc
ctagtcagggcacagctgccctgctgccacagcaaggaaatgtcctgcatggggcagagg
cttccgtgtcctctcccccaaccccctgcaagaagcgccgactccctgagtctggacctc
catccctgctctggtcccctctcttcgtcctgatccctccacccccatgtggcagcccat
gggt
Sample Probes:
CCTGCTCTGGACTTGAATGCCCCAG (SEQ ID NO: 15)
CACCCCCATGTGGCAGCCCATGGGT (SEQ ID NO: 16)
Table 10:X5: IGKV1-5
Reference Sequence: NG_000834.1 GI: 19718803
SEQ ID NO: 17
1 atggacatga gggtccccgc tcagctcctg gggctcctgc tgctctggct cccaggtaag 61 gaaggagaac actaggaatt tactcagccc agtgtgctca gtactgcctg gttattcagg 121 gaagtcttcc tataatatga tcaatagtat gaatatttgt gtttctattt ccaatctcag 181 gtgccaaatg tgacatccag atgacccagt ctccttccac cctgtctgca tctgtaggag 241 acagagtcac catcacttgc cgggccagtc agagtattag tagctggttg gcctggtatc 301 agcagaaacc agggaaagcc cctaagctcc tgatctataa ggcgtctagt ttagaaagtg 361 gggtcccatc aaggttcagc ggcagtggat ctgggacaga attcactctc accatcagca 421 gcctgcagcc tgatgatttt gcaacttatt actgccaaca gtataatagt tattct
Exemplary Target Sequence for HG-U133A:214768_X_AT
SEQ ID NO: 18
aatgctctgggtctctggatccagtggggnatattgtgatgactcagtctccactctccc tgcccgtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcntgc atnntaatggatacaactatttggattggtacctgcagaagccagggcagtctccacagc tcctgatctatttgggttctaatcgggcctccggggtcccngacaggttcagtggcagtg gatcaggcacagattttacactgaaaatcagcagagtggaggctgaggatgttggggttt attactgcatgcaagctctacaaactcctcngacnttnggccangggaccaagntggana tcaaacgaactgtggctgcaccatct
Sample Probes:
SEQ ID NO: 19
AATGCTCTGGGTCTCTGGATCCAGT
SEQ ID NO: 20
CAAACGAACTGTGGCTGCACCATCT
Table 11:X6: LAMA5
Reference Sequence: NM_005560.3 GI:21264601
SEQ ID NO: 21
1 agacccgccg ggctcccgcc gcgcgcgctg tccctggagc tcggggacgc ggcccggagc 61 cgggaagatg gcgaagcggc tctgcgcggg gagcgcactg tgtgttcgcg gcccccgggg 121 ccccgcgccg ctgctgctgg tcgggctggc gctgctgggc gcggcgcggg cgcgggagga 181 ggcgggcggc ggcttcagcc tgcacccgcc ctacttcaac ctggccgagg gcgcccgcat 241 cgccgcctcc gcgacctgcg gagaggaggc cccggcgcgc ggctccccgc gccccaccga 301 ggacctttac tgcaagctgg tagggggccc cgtggccggc ggcgacccca accagaccat 361 ccggggccag tactgtgaca tctgcacggc tgccaacagc aacaaggcac accccgcgag 421 caatgccatc gatggcacgg agcgctggtg gcagagtcca ccgctgtccc gcggcctgga 481 gtacaacgag gtcaacgtca ccctggacct gggccaggtc ttccacgtgg cctacgtcct 541 catcaagttt gccaactcac cccggccgga cctctgggtg ctggagcggt ccatggactt 601 cggccgcacc taccagccct ggcagttctt tgcctcctcc aagagggact gtctggagcg 661 gttcgggcca cagacgctgg agcgcatcac acgggacgac gcggccatct gcaccaccga 721 gtactcacgc atcgtgcccc tggagaacgg agagatcgtg gtgtccctgg tgaacggacg 781 tccgggcgcc atgaatttct cctactcgcc gctgctacgt gagttcacca aggccaccaa 841 cgtccgcctg cgcttcctgc gtaccaacac gctgctgggc catctcatgg ggaaggcgct 901 gcgggacccc acggtcaccc gccggtatta ttacagcatc aaggatatca gcatcggagg 961 ccgctgtgtc tgccacggcc acgcggatgc ctgcgatgcc aaagacccca cggacccgtt 1021 caggctgcag tgcacctgcc agcacaacac ctgcgggggc acctgcgacc gctgctgccc 1081 cggcttcaat cagcagccgt ggaagcctgc gactgccaac agtgccaacg agtgccagtc 1 141 ctgtaactgc tacggccatg ccaccgactg ttactacgac cctgaggtgg accggcgccg 1201 cgccagccag agcctggatg gcacctatca gggtgggggt gtctgtatcg actgccagca 1261 ccacaccacc ggcgtcaact gtgagcgctg cctgcccggc ttctaccgct ctcccaacca 1321 ccctctcgac tcgccccacg tctgccgccg ctgcaactgc gagtccgact tcacggatgg 1381 cacctgcgag gacctgacgg gtcgatgcta ctgccggccc aacttctctg gggagcggtg 1441 tgacgtgtgt gccgagggct tcacgggctt cccaagctgc tacccgacgc cctcgtcctc 1501 caatgacacc agggagcagg tgctgccagc cggccagatt gtgaattgtg actgcagcgc 1 61 ggcagggacc cagggcaacg cctgccggaa ggacccaagg gtgggacgct gtctgtgcaa 1621 acccaacttc caaggcaccc attgtgagct ctgcgcgcca gggttctacg gccccggctg 1681 ccagccctgc cagtgttcca 
gccctggagt ggccgatgac cgctgtgacc ctgacacagg 1741 ccagtgcagg tgccgagtgg gcttcgaggg ggccacatgt gatcgctgtg cccccggcta 1801 ctttcacttc cctctctgcc agttgtgtgg ctgcagccct gcaggaacct tgcccgaggg 1861 ctgcgatgag gccggccgct gcctatgcca gcctgagttt gctggacctc attgtgaccg 1921 gtgccgccct ggctaccatg gtttccccaa ctgccaagca tgcacctgcg accctcgggg 1981 agccctggac cagctctgtg gggcgggagg tttgtgccgc tgccgccccg gctacacagg 2041 cactgcctgc caggaatgca gccccggctt tcacggcttc cccagctgtg tcccctgcca 2101 ctgctctgct gaaggctccc tgcacgcagc ctgtgacccc cggagtgggc agtgcagctg 2161 ccggccccgt gtgacggggc tgcggtgtga cacatgtgtg cccggtgcct acaacttccc 2221 ctactgcgaa gctggctctt gccaccctgc cggtctggcc ccagtggatc ctgcccttcc 2281 tgaggcacag gttccctgta tgtgccgggc tcacgtggag gggccgagct gtgaccgctg 2341 caaacctggg ttctggggac tgagccccag caaccccgag ggctgtaccc gctgcagctg 2401 cgacctcagg ggcacactgg gtggagttgc tgagtgccag ccgggcaccg gccagtgctt 2461 ctgcaagccc cacgtgtgcg gccaggcctg cgcgtcctgc aaggatggct tctttggact 2521 ggatcaggct gactattttg gctgccgcag ctgccggtgt gacattggcg gtgcactggg 2581 ccagagctgt gaaccgagga cgggcgtctg ccggtgccgc cccaacaccc agggccccac 2641 ctgcagcgag cctgcgaggg accactacct cccggacctg caccacctgc gcctggagct 2701 ggaggaggct gccacacctg agggtcacgc cgtgcgcttt ggcttcaacc ccctcgagtt Table 1 1 cont'd
2761 cgagaacttc agctggaggg gctacgcgca gatggcacct gtccagccca ggatcgtggc 2821 caggctgaac ctgacctccc ctgacctttt ctggctcgtc ttccgatacg tcaaccgggg 2881 ggccatgagt gtgagcgggc gggtctctgt gcgagaggag ggcaggtcgg ccacctgcgc 2941 caactgcaca gcacagagtc agcccgtggc cttcccaccc agcacggagc ctgccttcat 3001 caccgtgccc cagaggggct tcggagagcc ctttgtgctg aaccctggca cctgggccct 3061 gcgtgtggag gccgaagggg tgctcctgga ctacgtggtt ctgctgccta gcgcatacta 3121 cgaggcggcg ctcctgcagc tgcgggtgac tgaggcctgc acataccgtc cctctgccca 3181 gcagtctggc gacaactgcc tcctctacac acacctcccc ctggatggct tcccctcggc 3241 cgccgggctg gaggccctgt gtcgccagga caacagcctg ccccggccct gccccacgga 3301 gcagctcagc ccgtcgcacc cgccactgat cacctgcacg ggcagtgatg tggacgtcca 3361 gcttcaagtg gcagtgccac agccaggccg ctatgcccta gtggtggagt acgccaatga 3421 ggatgcccgc caggaggtgg gcgtggccgt gcacacccca cagcgggccc cccagcaggg 3481 gctgctctcc ctgcacccct gcctgtacag caccctgtgc cggggcactg cccgggatac 3541 ccaggaccac ctggctgtct tccacctgga ctcggaggcc agcgtgaggc tcacagccga 3601 acaggcacgc ttcttcctgc acggggtcac tctggtgccc attgaggagt tcagcccgga 3661 gttcgtggag ccccgggtca gctgcatcag cagccacggc gcctttggcc ccaacagtgc 3721 cgcctgtctg ccctcgcgct tcccaaagcc gccccagccc atcatcctca gggactgcca 3781 ggtgatcccg ctgccgcccg gcctcccgct gacccacgcg caggatctca ctccagccat 3841 gtccccagct ggaccccgac ctcggccccc caccgctgtg gaccctgatg cagagcccac 3901 cctgctgcgt gagccccagg ccaccgtggt cttcaccacc catgtgccca cgctgggccg 3961 ctatgccttc ctgctgcacg gctaccagcc agcccacccc accttccccg tggaagtcct 4021 catcaacgcc ggccgcgtgt ggcagggcca cgccaacgcc agcttctgtc cacatggcta 4081 cggctgccgc accctggtgg tgtgtgaggg ccaggccctg ctggacgtga cccacagcga 4141 gctcactgtg accgtgcgtg tgcccaaggg ccggtggctc tggctggatt atgtactcgt 4201 ggtccctgag aacgtctaca gctttggcta cctccgggag gagcccctgg ataaatccta 4261 tgacttcatc agccactgcg cagcccaggg ctaccacatc agccccagca gctcatccct 4321 gttctgccga aacgctgctg cttccctctc cctcttctat aacaacggag cccgtccatg 4381 tggctgccac gaagtaggtg ctacaggccc cacgtgtgag cccttcgggg gccagtgtcc 4441 
ctgccatgcc catgtcattg gccgtgactg ctcccgctgt gccaccggat actggggctt 4501 ccccaactgc aggccctgtg actgcggtgc ccgcctctgt gacgagctca cgggccagtg 4561 catctgcccg ccacgcacca tcccgcccga ctgcctgctg tgccagcccc agacctttgg 4621 ctgccacccc ctggtcggct gtgaggagtg taactgctca gggcccggca tccaggagct 4681 cacagaccct acctgtgaca cagacagcgg ccagtgcaag tgcagaccca acgtgactgg 4741 gcgccgctgt gatacctgct ctccgggctt ccatggctac ccccgctgcc gcccctgtga 4801 ctgtcacgag gcgggcactg cgcctggcgt gtgtgacccc ctcacagggc agtgctactg 4861 taaggagaac gtgcagggcc ccaaatgtga ccagtgcagc cttgggacct tctcactgga 4921 tgctgccaac cccaaaggtt gcacccgctg cttctgcttt ggggccacgg agcgctgccg 4981 gagctcgtcc tacacccgcc aggagttcgt ggatatggag ggatgggtgc tgctgagcac 5041 tgaccggcag gtggtgcccc acgagcggca gccagggacg gagatgctcc gtgcagacct 5101 gcggcacgtg cctgaggctg tgcccgaggc tttccccgag ctgtactggc aggccccacc 5 161 ctcctacctg ggggaccggg tgtcatccta cggtgggacc ctccgttatg aactgcactc 5221 agagacccag cggggagatg tctttgtccc catggagagc aggccggatg tggtgctgca 5281 gggcaaccag atgagcatca cattcctgga gccggcatac cccacgcctg gccacgttca 5341 ccgtgggcag ctgcagctgg tggaggggaa cttccggcat acggagacgc gcaacactgt 5401 gtcccgcgag gagctcatga tggtgctggc cagcctggag cagctgcaga tccgtgccct 5461 cttctcacag atctcctcgg ctgtcttcct gcgcagggtg gcactggagg tggccagccc 5521 agcaggccag ggggccctgg ccagcaatgt ggagctgtgc ctgtgccccg ccagctaccg 5581 gggggactca tgccaggaat gtgcccccgg cttctatcgg gacgtcaaag gtctcttcct 5641 gggccgatgt gtcccttgtc agtgccatgg acactcagac cgctgcctcc ctggctctgg e 1 1 cont'd
cgtctgtgtg gactgccagc acaacaccga aggggcccac tgtgagcgct gccaggctgg cttcgtgagc agcagggacg accccagcgc cccctgtgtc agctgcccct gccccctctc agtgccttcc aacaacttcg ccgagggctg tgtcctgcga ggcggccgca cccagtgcct ctgcaaacct ggttatgcag gtgcctcctg cgagcggtgt gcgcccggat tctttgggaa cccactggtg ctgggcagct cctgccagcc atgcgactgc agcggcaacg gtgaccccaa cttgctcttc agcgactgcg accccctgac gggcgcctgc cgtggctgcc tgcgccacac cactgggccc cgctgcgaga tctgtgcccc cggcttctac ggcaacgccc tgctgcccgg caactgcacc cggtgcgact gtaccccatg tgggacagag gcctgcgacc cccacagcgg gcactgcctg tgcaaggcgg gcgtgactgg gcggcgctgt gaccgctgcc aggagggaca ttttggtttc gatggctgcg ggggctgccg cccgtgtgct tgtggaccgg ccgccgaggg ctccgagtgc cacccccaga gcggacagtg ccactgccga ccagggacca tgggacccca gtgccgcgag tgtgcccctg gctactgggg gctccctgag cagggctgca ggcgctgcca gtgccctggg ggccgctgtg accctcacac gggccgctgc aactgccccc cggggctcag cggggagcgc tgcgacacct gcagccagca gcatcaggtg cctgttccag gcgggcctgt gggccacagc atccactgtg aagtgtgtga ccactgtgtg gtcctgctcc tggatgacct ggaacgggcc ggcgccctcc tccccgccat tcacgagcaa ctgcgtggca tcaatgccag ctccatggcc tgggcccgtc tgcacaggct gaacgcctcc atcgctgacc tgcagagcca gctccggagc cccctgggcc cccgccatga gacggcacag cagctggagg tgctggagca gcagagcaca agcctcgggc aggacgcacg gcggctaggc ggccaggccg tggggacccg agaccaggcg agccaattgc tggccggcac cgaggccaca ctgggccatg cgaagacgct gttggcggcc atccgggctg tggaccgcac cctgagcgag ctcatgtccc agacgggcca cctggggctg gccaatgcct cggctccatc aggtgagcag ctgctccgga cactggccga ggtggagcgg ctgctctggg agatgcgggc ccgggacctg ggggccccgc aggcagcagc tgaggctgag ttggctgcag cacagagatt gctggcccgg gtgcaggagc agctgagcag cctctgggag gagaaccagg cactggccac acaaacccgc gaccggctgg cccagcacga ggccggcctc atggacctgc gagaggcttt gaaccgggca gtggacgcca cacgggaggc ccaggagctc aacagccgca accaggagcg cctggaggaa gccctgcaaa ggaagcagga gctgtcccgg gacaatgcca ccctgcaggc cactctgcat gcggctaggg acaccctggc cagcgtcttc agattgctgc acagcctgga ccaggctaag gaggagctgg agcgcctcgc cgccagcctg gatggggctc ggaccccact gctgcagagg atgcagacct tctccccggc gggcagcaag 
ctgcgtctag tggaggccgc cgaggcccac gcacagcagc tgggccagct ggcactcaat ctgtccagca tcatcctgga cgtcaaccag gaccgcctca cccagagggc catcgaggcc tccaacgcct acagccgcat cctgcaggcc gtgcaggctg ccgaggatgc tgctggccag gccctgcagc aggcggacca cacgtgggcg acggtggtgc ggcagggcct ggtggaccga gcccagcagc tcctggccaa cagcactgca ctagaagagg ccatgctcca ggaacagcag aggctgggcc ttgtgtgggc tgccctccag ggtgccagga cccagctccg agatgtccgg gccaagaagg accagctgga ggcgcacatc caggcggcgc aggccatgct tgccatggac acagacgaga caagcaagaa gatcgcacat gccaaggctg tggctgctga agcccaggac accgccaccc gtgtgcagtc ccagctgcag gccatgcagg agaatgtgga gcggtggcag ggccagtacg agggcctgcg gggccaggac ctgggccagg cagtgcttga cgcaggccac tcagtgtcca ccctggagaa gacgctgccc cagctgctgg ccaagctgag catcctggag aaccgtgggg tgcacaacgc cagcctggcc ctgtccgcca gcattggccg cgtgcgagag ctcattgccc aggcccgggg ggctgccagt aaggtcaagg tgcccatgaa gttcaacggg cgctcagggg tgcagctgcg caccccacgg gatcttgccg accttgctgc ctacactgcc ctcaagttct acctgcaggg cccagagcct gagcctgggc agggtaccga ggatcgcttt gtgatgtaca tgggcagccg ccaggccact ggggactaca tgggtgtgtc tctgcgtgac aagaaggtgc actgggtgta tcagctgggt gaggcgggcc ctgcagtcct aagcatcgat gaggacattg gggagcagtt cgcagctgtc agcctggaca ggactctcca gtttggccac atgtccgtca cagtggagag acagatgatc caggaaacca agggtgacac Table 1 1 cont'd
8641 ggtggcccct ggggcagagg ggctgctcaa cctgcggcca gacgacttcg tcttctacgt 8701 cggggggtac cccagtacct tcacgccccc tcccctgctt cgcttccccg gctaccgggg 8761 ctgcatcgag atggacacgc tgaatgagga ggtggtcagc ctctacaact tcgagaggac 8821 cttccagctg gacacggctg tggacaggcc ttgtgcccgc tccaagtcga ccggggaccc 8881 gtggctcacg gacggctcct acctggacgg caccggcttc gcccgcatca gcttcgacag 8941 tcagatcagc accaccaagc gcttcgagca ggagctgcgg ctcgtgtcct acagcggggt 9001 gctcttcttc ctgaagcagc agagccagtt cctgtgcttg gccgtgcaag aaggcagcct 9061 cgtgctgttg tatgactttg gggctggcct gaaaaaggcc gtcccactgc agcccccacc 9121 gcccctgacc tcggccagca aggcgatcca ggtgttcctg ctggggggca gccgcaagcg 9181 tgtgctggtg cgtgtggagc gggccacggt gtacagcgtg gagcaggaca atgatctgga 9241 gctggccgac gcctactacc tggggggcgt gccgcccgac cagctgcccc cgagcctgcg 9301 acggctcttc cccaccggag gctcagtccg tggctgcgtc aaaggcatca aggccctggg 9361 caagtatgtg gacctcaagc ggctgaacac gacaggcgtg agcgccggct gcaccgccga 9421 cctgctggtg gggcgcgcca tgactttcca tggccacggc ttccttcgcc tggcgctctc 9481 gaacgtggca ccgctcactg gcaacgtcta ctccggcttc ggcttccaca gcgcccagga 9541 cagtgccctg ctctactacc gggcgtcccc ggatgggcta tgccaggtgt ccctgcagca 9601 gggccgtgtg agcctacagc tcctgaggac tgaagtgaaa actcaagcgg gcttcgccga 9661 tggtgccccc cattacgtcg ccttctacag caatgccacg ggagtctggc tgtatgtcga 9721 tgaccagctc cagcagatga agccccaccg gggaccaccc cccgagctcc agccgcagcc 9781 tgaggggccc ccgaggctcc tcctgggagg cctgcctgag tctggcacca tttacaactt 9841 cagtggctgc atcagcaacg tcttcgtgca gcggctcctg ggcccacagc gcgtatttga 9901 tctgcagcag aacctgggca gcgtcaatgt gagcacgggc tgtgcacccg ccctgcaagc 9961 ccagaccccg ggcctggggc ctagaggact gcaggccacc gcccggaagg cctcccgccg 10021 cagccgtcag cccgcccggc atcctgcctg catgctgccc ccacacctca ggaccacccg 10081 agactcctac cagtttgggg gttccctgtc cagtcacctg gagtttgtgg gcatcctggc 10141 ccgacatagg aactggccca gtctctccat gcacgtcctc ccgcgaagct cccgaggcct 10201 cctcctcttc actgcccgtc tgaggcccgg cagcccctcc ctggcgctct tcctgagcaa 10261 tggccacttc gttgcacaga tggaaggcct cgggactcgg ctccgcgccc agagccgcca 10321 
gcgctcccgg cctggccgct ggcacaaggt ctccgtgcgc tgggagaaga accggatcct 10381 gctggtgacg gacggggccc gggcctggag ccaggagggg ccgcaccggc agcaccaggg 10441 ggcagagcac ccccagcccc acaccctctt tgtgggcggc ctcccggcca gcagccacag 10501 ctccaaactt ccggtgaccg tcgggttcag cggctgtgtg aagagactga ggctgcacgg 10561 gaggcccctg ggggccccca cacggatggc aggggtcaca ccctgcatct tgggccccct 10621 ggaggcgggc ctgttcttcc caggcagcgg gggagttatc actttagacc tcccaggagc 10681 tacactgcct gatgtgggcc tggaactgga ggtgcggccc ctggcagtca ccggactgat 10741 cttccacttg ggccaggccc ggacgccccc ctacttgcag ttgcaggtga ccgagaagca 10801 agtcctgctg cgggcggatg acggagcagg ggagttctcc acgtcagtga cccgcccctc 10861 agtgctgtgt gatggccagt ggcaccggct agcggtgatg aaaagcggga atgtgctccg 10921 gctggaggtg gacgcgcaga gcaaccacac cgtgggcccc ttgctggcgg ctgcagctgg 10981 tgccccagcc cctctgtacc tcgggggcct gcctgagccc atggccgtgc agccctggcc 1 1041 ccccgcctac tgcggctgca tgaggaggct ggcggtgaac cggtcccccg tcgccatgac 1 1 101 tcgctctgtg gaggtccacg gggcagtggg ggccagtggc tgcccagccg cctaggacac 1 1 161 agccaacccc ggcccctggt caggcccctg cagctgcctc acaccgcccc ttgtgctcgc 1 1221 ctcataggtg tctatttgga ctctaagctc tacgggtgac agatcttgtt tctgaagatg 1 1281 gtttaagtta tagcttctta aacgaaagaa taaaatactg caaaatgttt ttatatttgg
11341 cccttccacc catttttaat tgtgagagat ttgtcaccaa tcatcactgg ttcctcctta
11401 aaaattaaaa agtaacttct gtgtaa
Table 11 cont'd
Exemplary Target Sequence for HG-U133A:210150_S_AT
SEQ ID NO: 22
tcagtgctgtgtgatggccagtggcaccggctagcggtgatgaaaagcgggaatgtgctc cggctggaggtggacgcgcagagcaaccacaccgtgggccccttgctggcggctgcagct ggtgccccagcccctctgtacctcgggggcctgcctgagcccatggccgtgcagccctgg ccccccgcctactgcggctgcatgaggaggctggcggtgaaccggtcccccgtcgccatg actcgctctgtggaggtccacggggcagtgggggccagtggctgcccagccgcctaggac acagccaaccccggcccctggtcaggcccctgcagctgcctcacaccgccccttgtgctc gcctcataggtgtctatttggactctaagctctacgggtgacagatcttgtttctgaaga tggtttaagttatagcttcttaaacgaaagaataaaatactgcaaaatgtttttatattt ggcccttccacccatttttaattgtgagagatttgtcaccaatcatcactggttcctcct taaaaat
Sample Probes:
SEQ ID NO: 23
TCAGTGCTGTGTGATGGCCAGTGGC
SEQ ID NO: 24
TCATCACTGGTTCCTCCTTAAAAAT
Table 12:X7: OXCT1
Reference Sequence: NM_000436.3 GI: 112382246
SEQ ID NO: 25
1 atcgaagcta aagggctgac agaagggctg tagcggcaag tgtgcgccaa cacagctgca 61 cacgcccaga cacctgaggc gctgagagga acttccccgc gatccgcccc ggctgcacag 121 agcctcttct ccccgccggg gccccgccca gcctcacttc ttttaaaagc agcagccagc 181 agcgcgccgg tggcgtgcgg ctcgaaaggc cgggaggacg tcatcgacgc gctgtcgagc 241 ctccagcccg cccgggtttc cttcgcagtc gcgcaccgac gctcaaacgc gcgctccaac 301 ccgcagcctc ctcctgcctc accgcccgaa gatggcggct ctcaaactcc tctcctccgg 361 gcttcggctc tgcgcctctg cccgcggatc tggggcaacc tggtacaagg gatgtgtttg 421 ttccttttcc accagtgctc atcgccatac caagttttat acagatccag tagaagctgt 481 aaaagacatc cctgatggtg ccacggtttt ggttggtggt tttgggctat gtggaattcc 541 agagaatctt atagatgctt tactgaaaac tggagtaaaa ggactaactg cagtcagcaa 601 caatgcaggg gttgacaatt ttggtttggg gcttttgctt cggtccaagc agataaaacg 661 catggtctct tcatatgtgg gagaaaatgc agaatttgaa cgacagtact tatctggtga 721 attagaagtg gagctgacac cacagggcac acttgcagag aggatccgtg caggcggggc 781 tggagttcct gcattttaca ccccaacagg gtatgggacc ctggtacaag aaggaggatc 841 gcccatcaaa tacaacaaag atggcagtgt tgccattgcc agtaagccaa gagaggtgag 901 ggagttcaat ggtcagcact ttattttgga ggaagcaatt acaggggatt ttgctttggt 961 gaaagcctgg aaggcggacc gagcaggaaa cgtgattttc aggaaaagtg caaggaattt 1021 caacttgcca atgtgcaaag ctgcagaaac cacagtggta gaggttgaag aaattgtgga 1081 tattggagca tttgctccag aagacatcca tattcctcag atttatgtac atcgccttat 1 141 aaagggagaa aaatatgaga aaagaattga gcgtttatca atccggaaag agggagatgg 1201 ggaagccaaa tctgctaaac ctggagatga cgtaagggaa cgaatcatca agagggccgc 1261 tcttgagttt gaggatggca tgtatgctaa tttgggcata ggaatccctc tcctggccag 1321 caattttatc agcccaaata taactgttca tcttcaaagt gaaaatggag ttctgggttt 1381 gggtccatat ccacgacaac atgaagctga tgcagatctc atcaatgcag gcaaggaaac 1441 agttactatt cttccaggag cctctttttt ctccagcgat gaatcatttg caatgattag 1501 aggtggacac gtcgatctga caatgctagg agcgatgcag gtttccaaat atggtgacct 1561 ggctaactgg atgatacctg ggaagatggt gaaaggaatg ggaggtgcta tggatttagt 1621 gtccagtgcg aaaaccaaag tggtggtcac catggagcat tctgcaaagg gaaatgcaca 1681 taaaatcatg gagaaatgta 
cattaccatt gactggaaag caatgtgtca accgcattat 1741 tactgaaaag gctgtgtttg atgtggacaa gaagaaaggg ttgactctga ttgagctctg 1801 ggaaggcctg acagtggatg acgtacagaa gagtactggg tgtgattttg cagtttcacc 18 1 aaaactcatg ccaatgcagc agatcgcaaa ttgaaatatg gatatttgta ccaggctgcg 1921 tgtttttcat tttaaacaca caagatttaa ttgaaaggac atcaataatc ataattgtgt 1981 atttaacagg tggtttttta ttagttttct tgtgtttcag actttatgca gccatataaa 2041 ctgttctcta ggcatgctgt gacattttaa taaaaagcaa aaggagcatt tataattatc 2101 tcatttgtta aggctgagaa ggttgttttt ataataggta attatattga atgcattttc 2161 actgaatatg gtatgtatgc taaattatat gaacctttcc ccaagaaggg ccctagaaat 2221 tgatgtggct ttcctcttaa atattaatta ttagtcctga aagaaagata acatatgtga 2281 tttttgtggt taggagagtt gctgtcatga ttgttttttc ttcagcctcc tctgactttt 2341 cttttggggc ttcagatttt atgattacat cttgtccccc tagaacatcc cccttcctcc 2401 catactgctt ttaaacagat gcccaagaag gcaagcagga atgcctcttg tgggggaggg 2461 cagggagaaa taactagttc aaaccaacta tctatctatg ctttgcaaag actaaggcgt 2521 attataggaa gagggctaga aacctaactg attcttctca gttttctcat tttaaaacag 2581 cccagtattc ctttgtatcc tcaagggtcc ttgagaatac ttctgttatt gagaccctgt 2641 gggctacttg tactgtacct cctctcaagc caagaagggc tgtgggataa tttaccatga 2701 atccttagta gcaatgacag cagagttaaa aaataaaagg tgttttactt tcaggctctt Table 12 cont'd
2761 gttttggttc agaggagatt ttaaatattg aatgacactt ctacagaaca acggtttttc 2821 ttctgccaag gctacttcct ttaacgaagt gcctttaatt cagccttatc caactaggga 2881 aaataatgtt ggacaagtct aggatttgaa gagtcagtga acttttagtg tcagggaata 2941 aacatggtgg gtagattagg tttgaaaaaa acttccttag aggtatttat tctcaatacc 3001 tgacaggggc ccatgggaat gacttcagaa gcatcccgga taatagatgg gtaaaaagtc 3061 taggcaccct gaagaacagg tgagacagct ggcctctgga cagaggtagg catagtacag 3121 tacgatatat cattcctctg gtcctaaata tacaaactta ttcatgtttt taggtgatga 3181 tggtcattga aactcacttc ttttcaggtg tagctacaat tgtgtaatgt acaatattag 3241 agaaaggaca ggctttttat gagtaacaca caccatatat aaaacagcct ttctggctga 3301 ccacatggtt aaatgcatac cttcccagta ctggggggaa aaaatgaccc ttcttagaat 3361 gtgcaagttc catgagagta atatattgat atgattttga aaagaattgt tgatagttac 3421 atcttcaaac ttatcattcc agtatgcatc tttaagataa tgtgattcta agtagatgac 3481 tttatattct tgattaaaga gtgctataca tgttaagaaa tgcattaagg aatacaataa 3541 atattctaaa gtgatgtaaa aaaaaaaaaa aa
Exemplary Target Sequence for HG-U133A:202780_AT
SEQ ID NO: 26
ttctcaatacctgacaggggcccatgggaatgacttcagaagcatcccggataatagatg ggtaaaaagtctaggcaccctgaagaacaggtgagacagctggcctctggacagaggtag gcatagtacagtacgatatatcattcctctggtcctaaatatacaaacttattcatgttt
ttaggtgatgatggtcattgaaactcacttcttttcaggtgtagctacaattgtgtaatg
tacaatattagagaaaggacaggctttttatgagtaacacacaccatatataaaacagcc tttctggctgaccacatggttaaatgcataccttcccagtactggggggaaaatgaccct tcttagaatgtgcaagttccatagagtaatatattgatatgattttgaaaagaattgttg
atagttacatcttcaaacttatcattccagtatgcatctt
Sample Probes:
SEQ ID NO: 27
TTCTCAATACCTGACAGGGGCCCAT
SEQ ID NO: 28
AACTTATCATTCCAGTATGCATCTT
Table 13:X8: RARA
Reference Sequence: NM_000964.3 GI:300388174
SEQ ID NO: 29
1 gtgcctcttg cagcagccta acccagaagc aggggggaat cctgaatcga gctgagaggg 61 cttccccggt tctcctggga accccatcgg ccccctgcca gcacacacct gagcagcatc 121 acaggacatg gccccctcag ccacctagct ggggcccatc taggagtggc atcttttttg 181 gtgccctgaa ggccagctct ggaccttccc aggaaaagtg ccagctcaca gaactgcttg 241 accaaaggac cggctcttga gacatccccc aacccacctg gcccccagct agggtggggg 301 ctccaggaga ctgagattag cctgccctct ttggacagca gctccaggac agggcgggtg 361 ggctgaccac ccaaacccca tctgggccca ggccccatgc cccgaggagg ggtggtctga 421 agcccaccag agccccctgc cagactgtct gcctcccttc tgactgtggc cgcttggcat 481 ggccagcaac agcagctcct gcccgacacc tgggggcggg cacctcaatg ggtacccggt 541 gcctccctac gccttcttct tcccccctat gctgggtgga ctctccccgc caggcgctct 601 gaccactctc cagcaccagc ttccagttag tggatatagc acaccatccc cagccaccat 661 tgagacccag agcagcagtt ctgaagagat agtgcccagc cctccctcgc caccccctct 721 accccgcatc tacaagcctt gctttgtctg tcaggacaag tcctcaggct accactatgg 781 ggtcagcgcc tgtgagggct gcaagggctt cttccgccgc agcatccaga agaacatggt 841 gtacacgtgt caccgggaca agaactgcat catcaacaag gtgacccgga accgctgcca 901 gtactgccga ctgcagaagt gctttgaagt gggcatgtcc aaggagtctg tgagaaacga 961 ccgaaacaag aagaagaagg aggtgcccaa gcccgagtgc tctgagagct acacgctgac 1021 gccggaggtg ggggagctca ttgagaaggt gcgcaaagcg caccaggaaa ccttccctgc 1081 cctctgccag ctgggcaaat acactacgaa caacagctca gaacaacgtg tctctctgga 1 141 cattgacctc tgggacaagt tcagtgaact ctccaccaag tgcatcatta agactgtgga 1201 gttcgccaag cagctgcccg gcttcaccac cctcaccatc gccgaccaga tcaccctcct 1261 caaggctgcc tgcctggaca tcctgatcct gcggatctgc acgcggtaca cgcccgagca 1321 ggacaccatg accttctcgg acgggctgac cctgaaccgg acccagatgc acaacgctgg 1381 cttcggcccc ctcaccgacc tggtctttgc cttcgccaac cagctgctgc ccctggagat 1441 ggatgatgcg gagacggggc tgctcagcgc catctgcctc atctgcggag accgccagga 1501 cctggagcag ccggaccggg tggacatgct gcaggagccg ctgctggagg cgctaaaggt 1561 ctacgtgcgg aagcggaggc ccagccgccc ccacatgttc cccaagatgc taatgaagat 1621 tactgacctg cgaagcatca gcgccaaggg ggctgagcgg gtgatcacgc tgaagatgga 1681 gatcccgggc tccatgccgc 
ctctcatcca ggaaatgttg gagaactcag agggcctgga 1741 cactctgagc ggacagccgg ggggtggggg gcgggacggg ggtggcctgg cccccccgcc 1801 aggcagctgt agccccagcc tcagccccag ctccaacaga agcagcccgg ccacccactc 1861 cccgtgaccg cccacgccac atggacacag ccctcgccct ccgccccggc ttttctctgc 1921 ctttctaccg accatgtgac cccgcaccag ccctgccccc acctgccctc ccgggcagta 1981 ctggggacct tccctggggg acggggaggg aggaggcagc gactccttgg acagaggcct 2041 gggccctcag tggactgcct gctcccacag cctgggctga cgtcagaggc cgaggccagg 2101 aactgagtga ggcccctggt cctgggtctc aggatgggtc ctgggggcct cgtgttcatc 2161 aagacacccc tctgcccagc tcaccacatc ttcatcacca gcaaacgcca ggacttggct 2221 cccccatcct cagaactcac aagccattgc tccccagctg gggaacctca acctcccccc 2281 tgcctcggtt ggtgacagag ggggtgggac aggggcgggg ggttccccct gtacataccc 2341 tgccatacca accccaggta ttaattctcg ctggttttgt ttttatttta atttttttgt
2401 tttgattttt ttaataagaa ttttcatttt aagcacattt atactgaagg aatttgtgct
2461 gtgtattggg gggagctgga tccagagctg gagggggtgg gtccggggga gggagtggct 2521 cggaaggggc ccccactctc ctttcatgtc cctgtgcccc ccagttctcc tcctcagcct 2581 tttcctcctc agttttctct ttaaaactgt gaagtactaa ctttccaagg cctgccttcc
2641 cctccctccc actggagaag ccgccagccc ctttctccct ctgcctgacc actgggtgtg 2701 gacggtgtgg ggcagccctg aaaggacagg ctcctggcct tggcacttgc ctgcacccac
2761 catgaggcat ggagcagggc agagcaaggg ccccgggaca gagtlttccc agacctggct 2821 cctcggcaga gctgcctccc gtcagggccc acatcatcta ggctccccag cccccactgt 2881 gaaggggctg gccaggggcc cgagctgccc ccacccccgg cctcagccac cagcaccccc 2941 atagggcccc cagacaccac acacatgcgc gtgcgcacac acacaaacac acacacactg 3001 gacagtagat gggccgacac acacttggcc cgagttcctc catttccctg gcctgccccc 3061 cacccccaac ctgtcccacc cccgtgcccc ctccttaccc cgcaggacgg gcctacaggg 3121 gggtctcccc tcacccctgc acccccagct gggggagctg gctctgcccc gacctccttc 3181 accaggggtt ggggcccctt cccctggagc ccgtgggtgc acctgttact gttgggcttt 3241 ccactgagat ctactggata aagaataaag ttctatttat tctaaaaaaa aaaaaaaaaa 3301 a
Exemplary Target Sequence for HG-U133A:216300_X_AT
SEQ ID NO: 30
ggagggaggaggcagcgactccttgngacagaggcctgggccctcagtggactgcctgct cccacagcctgggctgacgtcagaggccgaggccaggaactgagtgaggcccctggtcct gggtctcaggatgggtcctgggggcctcgtgttcatcaagacnnnnntctgcccagctca ccacatcttcatcaccagcaaacgccaggacttggctcccccatcctcagaactcacaag
ccattgctcaacctcccccctgcctcggttggtgacagagggggtgggacaggggcgggg ggttccccctgtacataccctgccataccaaccccaggtattaattctcgctggttttgt
ttttattttaattgtttttgtgttagattatttttaataagaa
tttatactgaaggaatttgtgctgtgtattggggggagctggatccagagctggaggggg
tgggtccgg
Sample Probes:
SEQ ID NO: 31
GGAGGGAGGAGGCAGCGACTCCTTG
SEQ ID NO: 32
CCAGAGCTGGAGGGGGTGGGTCCGG
Table 14:X9: UBE2J1
Reference Sequence: NM_016021.2 GI:37577121
SEQ ID NO: 33
1 gcggccgcgg cagggctggg cctgcgacta cccgaggagg ctgacctcca gcccgggcgc 61 ccggttcagc gccgccccgg ccggcgccgg tgcctgccag gcactcaggg aggcgggggc 121 gcagtggagg aggcggcgcc atcgcgaagc gagcgcctcg cccgcactca gccttgccac 181 cccgcccgca gtccaggctg gactgggcgg catttgccga ggctcctcgg ccaggccccg 241 tccgcccgag ccgcgctgag acccgggcag cggccgcgtg gagaggaggt ggcagcggcc 301 cgggaggccg gagccaagcc agcgacccac catggagacc cgctacaacc tgaagagtcc 361 ggctgttaaa cgtttaatga aagaagcggc agaattgaaa gatccaacag atcattacca 421 tgcgcagcct ttagaggata acctttttga atggcacttc acggttagag ggcccccaga 481 ctccgatttt gatggaggag tttatcacgg gcggatagta ctgccaccag agtatcccat 541 gaaaccacca agcattattc tcctaacggc taatggtcga tttgaagtgg gcaagaaaat 601 ctgtttgagc atctcaggcc atcatcctga aacttggcag ccttcgtgga gtataaggac 661 agcattatta gccatcattg ggtttatgcc aacaaaagga gagggagcca taggttctct 721 agattacact cctgaggaaa gaagagcact tgccaaaaaa tcacaagatt tctgttgtga 781 aggatgtggc tctgccatga aggatgtcct gttgccttta aaatctggaa gcgattcaag 841 ccaagctgac caagaagcca aagaactggc taggcaaata agctttaagg cagaagtcaa 901 ttcatctgga aagactatct ctgagtcaga cttaaaccac tctttttcac taactgattt
961 acaagatgat atacctacaa cattccaggg tgctacggcc agtacatcgt acggactcca 1021 gaattcctca gcagcatcct ttcatcaacc tacccaacct gtagctaaga atacctccat 1081 gagccctcga cagcgccggg cccagcagca gagtcagaga aggttgtcta cttcaccaga 1 141 tgtaatccag ggccaccagc caagagacaa ccacactgat catggtgggt cagctgtact 1201 gattgtcatc ctgactttgg cattggcagc tcttatattc cgacgaatat atctggcaaa 1261 cgaatacata tttgactttg agttataata tggttttgtg acttatgagc tgtgactcaa 1321 ctgcttcatt aaacattctg cattgggtat aatctaagaa ttgtttacaa aaagattatt 1381 ttgtatttac ccttcattcc tttttttgat ccttgtaagt ttagtataaa tatatctaga
1441 cattcagact gtgtctagca gttacgtcct gcttaaaggg actagaagtc aaagttcctt 1501 gtctcactat ttgatctgct ttgcagggaa ataacttgtt ttttctcatg tttcatcttc
1561 tttttatgta aatttgtaat actttcctat attgcccttt gaaatttttg gataaaagat
1621 gatgttttaa gttccaatga gtattactag ttactcaata ccacttattg agtactctgt 1681 ttctacgtat gtagaatgta tagggataga agagttgaaa agggaaagca aaactcctca 1741 agtagcttcc ttaaaatgtc attcatagga gatgtactgg aattgctcat tctgtgactt 1801 tatttgtgtc ctaaacattc ttcagtgaaa ataattttat ttcagtcaaa catttatgag
1861 gaaatgagat cacatctttg tcactggatg ctacttgaag agggagtact ttgtaaccac 1921 tttgatatgc tgttatcacc accccctgcc ctctgctgcc ataatcacac aaatttaaaa 1981 agaaagaaaa cagtcttcca tagattttta aggaagaaag ggcccaagcc aggagatcgc 2041 ttggttttct tccagaagtt aaatgggggg atctgaagat ttgaatgttt ggtctgcttt 2101 gaaatgtatg tcttttggga tgtattatat gcctagcttt ataatcagta taaattttaa
2161 ttattccagg aatatgcata atattgaaat atttcatgtc ctattttaat agaaaacctc 2221 agggcccaag taacagtgat agaagttaga aaaaccttta cttagaattg tccacctagt 2281 cagagcccaa gaaagaattt tcagtggaaa aatcaatata taacttagtg ctagctagcg 2341 ccacagactc tagtagataa tattatcatc ataatggctg gtgaaaccat ataatcacag 2401 aaaaacattg ccttcagcat gttcagttcg cagcactgag ggcactcttg agggtgttgt 2461 taatgaagat ttaattttta aatacaggtg gttccaagct ttcaaatagg ttatgctcca 2521 aaagtgttat ttgtaagtta atttttttac aagtcaaaca atgttggaag tggtatttag
2581 gttctagatc ggtccacgaa agttagccca tatgtatatc ttgaatagta taggggaggg 2641 tattcataaa gtccttatgt ggttttaact aagtgaaatt atggacaaga gaaataattg 2701 taaaatcgtc ttaaaggaaa atttaatttt tactcctgtt tatgggacat tcgttctatt
2761 aactgtcaga cacaatttct gttttcatct gagagccagt tttcctttat ttctacatct 2821 aaaataagaa catattgtac actattatat aatacagaat tgtcttaaac tttaataaat 2881 tcgcatttta aaggtgttta cagattattt tttatatctg tagctgaatt tgttaaagtc 2941 taaaaagctc aaggacttta tgaagatctc attatatgag gaaaatcata ggttaccatt 3001 ttataactct attgccataa gaaaatacac tctaaaatct tgatttgaaa catattagaa 3061 accttgattc agtgctcagt ggtctcctag taagaagtca ccgacggtag cgtcatatga 3121 gaagaaagaa atccccacca cctcaacctc tgctgagatt gtgtgctagg aacagccttc 3181 cctccgtttc ccctcagtca aacttgagcc agcctctgga tcgatgtgat cttattgcat 3241 gtttccatgg ggtgtaccta tactttaagc caatcctgct gcattcactg ctaagttaaa 3301 taaaaagcca agaagatttt gcactgtgca gatcctttgc tatctgactt gcatctcttc 3361 ccccacctgt cagctagcca cctgcttgtt tgtgttggga tattttttag cacctgaagc 3421 accatctgaa aggggcacca ttttcttctt ccctttgatc tcacatatgc tccctaaaaa 3481 tccttaagtt gtcaatctga tccccagtgt gaggttaatg agcaaaattg gtctttgggg 3541 ccctttttgt ccaagcccca ctgaaaggcc tcttcagaaa actattatct ttaaagccct 3601 actttaactc cttaattcca gcatacagct aaaactggat gtatattctg gcaagtaaag 3661 gctgaggact cctctttaat cctcagatct agataactca tgacatttta tttgaccaac 3721 atagcacatg atgagatatc aaggtaatta aaatagcatg cttgaaaaaa aaatacgtaa 3781 tctgtttcac ctgtaactgt ttaagccaat aaacttttca aaatttatgt aatgtggggc 3841 ttttatgtag cactttacgt tttcatgctg cttattgttt tattctactg aaaaaaatga 3901 atttcaagat tctcaacttt tttaatttca aaaattgttt attgttttga ctataggaat 3961 acaaaatttc ctattttggg agaataagaa ctctttttgt catttttggc tatgaataaa 4021 ctttctggtc ttttgagacc acccattttt atagatcaga atcagaaaac aggtaaacct 4081 cactcacaca tttggactca tttgaacaaa aatctaggcc aaaatactga aaagcctatg 4141 tgttttttta attggaagta tatgtaaggt taatgcattt agtgaacgtg actaacaaag 4201 actaatgtgc acattaacag atgtactttt taaggtttta tgggaggctg tgcattgctc 4261 aaaagctgtt gggaacgcct tctgaacagt tgccttcaga actagtttga gctgctcaat 4321 aaaaccagtg actttactca taaaaaaaaa aaaaaaaaaa
Exemplary Target Sequence for HG-U133A:217825_S_AT
SEQ ID NO: 34
taccattttataactctattgccataagaaaatacactctaaaatcttgatttgaaacat
attagaaaccttgattcagtgctcagtggtctcctagtaagaagtcaccgacggtagcgt catatgagaagaaagaaatccccaccacctcaacctctgctgagattgtgtgctaggaac agccttccctccgtttcccctcagtcaaacttgagccagcctctggatcgatgtgatctt attgcatgtttccatggggtgtacctatactttaagccaatcctgctgcattcactgcta agttaaataaaaagccaagaagaaaaaaaaaattttgcactgtgcagatcctttgctatc tgacttgcatc
Sample Probes:
SEQ ID NO: 35
TACCATTTTATAACTCTATTGCCAT
SEQ ID NO: 36
GATCCTTTGCTATCTGACTTGCATC
Table 15:X10: ESR1
Reference Sequence: NM_000125.3 GI:170295798
SEQ ID NO: 37
1 aggagctggc ggagggcgtt cgtcctggga ctgcacttgc tcccgtcggg tcgcccggct 61 tcaccggacc cgcaggctcc cggggcaggg ccggggccag agctcgcgtg tcggcgggac 121 atgcgctgcg tcgcctctaa cctcgggctg tgctcttttt ccaggtggcc cgccggtttc 181 tgagccttct gccctgcggg gacacggtct gcaccctgcc cgcggccacg gaccatgacc 241 atgaccctcc acaccaaagc atctgggatg gccctactgc atcagatcca agggaacgag 301 ctggagcccc tgaaccgtcc gcagctcaag atccccctgg agcggcccct gggcgaggtg 361 tacctggaca gcagcaagcc cgccgtgtac aactaccccg agggcgccgc ctacgagttc 421 aacgccgcgg ccgccgccaa cgcgcaggtc tacggtcaga ccggcctccc ctacggcccc 481 gggtctgagg ctgcggcgtt cggctccaac ggcctggggg gtttcccccc actcaacagc 541 gtgtctccga gcccgctgat gctactgcac ccgccgccgc agctgtcgcc tttcctgcag 601 ccccacggcc agcaggtgcc ctactacctg gagaacgagc ccagcggcta cacggtgcgc 661 gaggccggcc cgccggcatt ctacaggcca aattcagata atcgacgcca gggtggcaga 721 gaaagattgg ccagtaccaa tgacaaggga agtatggcta tggaatctgc caaggagact 781 cgctactgtg cagtgtgcaa tgactatgct tcaggctacc attatggagt ctggtcctgt 841 gagggctgca aggccttctt caagagaagt attcaaggac ataacgacta tatgtgtcca 901 gccaccaacc agtgcaccat tgataaaaac aggaggaaga gctgccaggc ctgccggctc 961 cgcaaatgct acgaagtggg aatgatgaaa ggtgggatac gaaaagaccg aagaggaggg
1021 agaatgttga aacacaagcg ccagagagat gatggggagg gcaggggtga agtggggtct 1081 gctggagaca tgagagctgc caacctttgg ccaagcccgc tcatgatcaa acgctctaag 1141 aagaacagcc tggccttgtc cctgacggcc gaccagatgg tcagtgcctt gttggatgct 1201 gagcccccca tactctattc cgagtatgat cctaccagac ccttcagtga agcttcgatg 1261 atgggcttac tgaccaacct ggcagacagg gagctggttc acatgatcaa ctgggcgaag 1321 agggtgccag gctttgtgga tttgaccctc catgatcagg tccaccttct agaatgtgcc 1381 tggctagaga tcctgatgat tggtctcgtc tggcgctcca tggagcaccc agggaagcta 1441 ctgtttgctc ctaacttgct cttggacagg aaccagggaa aatgtgtaga gggcatggtg 1501 gagatcttcg acatgctgct ggctacatca tctcggttcc gcatgatgaa tctgcaggga 1561 gaggagtttg tgtgcctcaa atctattatt ttgcttaatt ctggagtgta cacatttctg 1621 tccagcaccc tgaagtctct ggaagagaag gaccatatcc accgagtcct ggacaagatc 1681 acagacactt tgatccacct gatggccaag gcaggcctga ccctgcagca gcagcaccag 1741 cggctggccc agctcctcct catcctctcc cacatcaggc acatgagtaa caaaggcatg 1801 gagcatctgt acagcatgaa gtgcaagaac gtggtgcccc tctatgacct gctgctggag 1861 atgctggacg cccaccgcct acatgcgccc actagccgtg gaggggcatc cgtggaggag 1921 acggaccaaa gccacttggc cactgcgggc tctacttcat cgcattcctt gcaaaagtat 1981 tacatcacgg gggaggcaga gggtttccct gccacggtct gagagctccc tggctcccac 2041 acggttcaga taatccctgc tgcattttac cctcatcatg caccacttta gccaaattct 2101 gtctcctgca tacactccgg catgcatcca acaccaatgg ctttctagat gagtggccat 2161 tcatttgctt gctcagttct tagtggcaca tcttctgtct tctgttggga acagccaaag 2221 ggattccaag gctaaatctt tgtaacagct ctctttcccc cttgctatgt tactaagcgt 2281 gaggattccc gtagctcttc acagctgaac tcagtctatg ggttggggct cagataactc 2341 tgtgcattta agctacttgt agagacccag gcctggagag tagacatttt gcctctgata 2401 agcacttttt aaatggctct aagaataagc cacagcaaag aatttaaagt ggctccttta 2461 attggtgact tggagaaagc taggtcaagg gtttattata gcaccctctt gtattcctat 2521 ggcaatgcat ccttttatga aagtggtaca ccttaaagct tttatatgac tgtagcagag 2581 tatctggtga ttgtcaattc attcccccta taggaataca aggggcacac agggaaggca 2641 gatcccctag ttggcaagac tattttaact tgatacactg cagattcaga tgtgctgaaa 2701 gctctgcctc tggctttccg gtcatgggtt ccagttaatt catgcctccc atggacctat
2761 ggagagcagc aagttgatct tagttaagtc tccctatatg agggataagt tcctgatttt 2821 tgtttttatt tttgtgttac aaaagaaagc cctccctccc tgaacttgca gtaaggtcag 2881 cttcaggacc tgttccagtg ggcactgtac ttggatcttc ccggcgtgtg tgtgccttac 2941 acaggggtga actgttcact gtggtgatgc atgatgaggg taaatggtag ttgaaaggag 3001 caggggccct ggtgttgcat ttagccctgg ggcatggagc tgaacagtac ttgtgcagga 3061 ttgttgtggc tactagagaa caagagggaa agtagggcag aaactggata cagttctgag 3121 gcacagccag acttgctcag ggtggccctg ccacaggctg cagctaccta ggaacattcc 3 181 ttgcagaccc cgcattgccc tttgggggtg ccctgggatc cctggggtag tccagctctt 3241 cttcatttcc cagcgtggcc ctggttggaa gaagcagctg tcacagctgc tgtagacagc 3301 tgtgttccta caattggccc agcaccctgg ggcacgggag aagggtgggg accgttgctg 3361 tcactactca ggctgactgg ggcctggtca gattacgtat gcccttggtg gtttagagat 3421 aatccaaaat cagggtttgg tttggggaag aaaatcctcc cccttcctcc cccgccccgt 3481 tccctaccgc ctccactcct gccagctcat ttccttcaat ttcctttgac ctataggcta 3541 aaaaagaaag gctcattcca gccacagggc agccttccct gggcctttgc ttctctagca 3601 caattatggg ttacttcctt tttcttaaca aaaaagaatg tttgatttcc tctgggtgac 3661 cttattgtct gtaattgaaa ccctattgag aggtgatgtc tgtgttagcc aatgacccag 3721 gtgagctgct cgggcttctc ttggtatgtc ttgtttggaa aagtggattt cattcatttc 3781 tgattgtcca gttaagtgat caccaaagga ctgagaatct gggagggcaa aaaaaaaaaa 3841 aaagttttta tgtgcactta aatttgggga caattttatg tatctgtgtt aaggatatgt 3901 ttaagaacat aattcttttg ttgctgtttg tttaagaagc accttagttt gtttaagaag 3961 caccttatat agtataatat atattttttt gaaattacat tgcttgttta tcagacaatt 4021 gaatgtagta attctgttct ggatttaatt tgactgggtt aacatgcaaa aaccaaggaa 4081 aaatatttag tttttttttt tttttttgta tacttttcaa gctaccttgt catgtataca
4141 gtcatttatg cctaaagcct ggtgattatt catttaaatg aagatcacat ttcatatcaa 4201 cttttgtatc cacagtagac aaaatagcac taatccagat gcctattgtt ggatactgaa 4261 tgacagacaa tcttatgtag caaagattat gcctgaaaag gaaaattatt cagggcagct 4321 aattttgctt ttaccaaaat atcagtagta atatttttgg acagtagcta atgggtcagt 4381 gggttctttt taatgtttat acttagattt tcttttaaaa aaattaaaat aaaacaaaaa 4441 aaaatttcta ggactagacg atgtaatacc agctaaagcc aaacaattat acagtggaag 4501 gttttacatt attcatccaa tgtgtttcta ttcatgttaa gatactacta catttgaagt 4561 gggcagagaa catcagatga ttgaaatgtt cgcccagggg tctccagcaa ctttggaaat 4621 ctctttgtat ttttacttga agtgccacta atggacagca gatattttct ggctgatgtt 4681 ggtattgggt gtaggaacat gatttaaaaa aaaactcttg cctctgcttt cccccactct 4741 gaggcaagtt aaaatgtaaa agatgtgatt tatctggggg gctcaggtat ggtggggaag 4801 tggattcagg aatctgggga atggcaaata tattaagaag agtattgaaa gtatttggag 4861 gaaaatggtt aattctgggt gtgcaccagg gttcagtaga gtccacttct gccctggaga 4921 ccacaaatca actagctcca tttacagcca tttctaaaat ggcagcttca gttctagaga 4981 agaaagaaca acatcagcag taaagtccat ggaatagcta gtggtctgtg tttcttttcg 5041 ccattgccta gcttgccgta atgattctat aatgccatca tgcagcaatt atgagaggct 5101 aggtcatcca aagagaagac cctatcaatg taggttgcaa aatctaaccc ctaaggaagt 5161 gcagtctttg atttgatttc cctagtaacc ttgcagatat gtttaaccaa gccatagccc 5221 atgccttttg agggctgaac aaataaggga cttactgata atttactttt gatcacatta 5281 aggtgttctc accttgaaat cttatacact gaaatggcca ttgatttagg ccactggctt 5341 agagtactcc ttcccctgca tgacactgat tacaaatact ttcctattca tactttccaa 5401 ttatgagatg gactgtgggt actgggagtg atcactaaca ccatagtaat gtctaatatt 5461 cacaggcaga tctgcttggg gaagctagtt atgtgaaagg caaatagagt catacagtag 5521 ctcaaaaggc aaccataatt ctctttggtg caggtcttgg gagcgtgatc tagattacac 5581 tgcaccattc ccaagttaat cccctgaaaa cttactctca actggagcaa atgaactttg 5641 gtcccaaata tccatctttt cagtagcgtt aattatgctc tgtttccaac tgcatttcct Table 15 cont'd
5701 ttccaattga attaaagtgt ggcctcgttt ttagtcattt aaaattgttt tctaagtaat 5761 tgctgcctct attatggcac ttcaattttg cactgtcttt tgagattcaa gaaaaatttc 5821 tattcttttt tttgcatcca attgtgcctg aacttttaaa atatgtaaat gctgccatgt 5881 tccaaaccca tcgtcagtgt gtgtgtttag agctgtgcac cctagaaaca acatattgtc 5941 ccatgagcag gtgcctgaga cacagacccc tttgcattca cagagaggtc attggttata 6001 gagacttgaa ttaataagtg acattatgcc agtttctgtt ctctcacagg tgataaacaa 6061 tgctttttgt gcactacata ctcttcagtg tagagctctt gttttatggg aaaaggctca 6121 aatgccaaat tgtgtttgat ggattaatat gcccttttgc cgatgcatac tattactgat 61 81 gtgactcggt tttgtcgcag ctttgctttg tttaatgaaa cacacttgta aacctctttt 6241 gcactttgaa aaagaatcca gcgggatgct cgagcacctg taaacaattt tctcaaccta 6301 tttgatgttc aaataaagaa ttaaactaaa
Exemplary Target Sequence for HG-U133A:205225_AT
SEQ ID NO: 38
attgctgcctctattatggcacttcaattttgcactgtcttttgagattcaagaaaaatt
tctattcatttttttgcatccaattgtgcctgaacttttaaaatatgtaaatgctgccat
gttccaaacccatcgtcagtgtgtgtgtttagagctgtgcaccctagaaacaacatactt gtcccatgagcaggtgcctgagacacagacccctttgcattcacagagaggtcattggtt atagagacttgaattaataagtgacattatgccagtttctgttctctcacaggtgataaa caatgctttttgtgcactacatactcttcagtgtagagctcttgttttatgggaaaaggc tcaaatgccaaattgtgtttgatggattaatatgcccttttgccgatgcatactattact gatgtgactcggttttgtcgcagctttgctttgtttaatgaaacacacttgtaaacctct tttgcactttgaaaaagaatccagcgggatgctcgagcacctgtaaacaatt
Sample Probes:
SEQ ID NO: 39
ATTGCTGCCTCTATTATGGCACTTC
SEQ ID NO: 40
GATGCTCGAGCACCTGTAAACAATT
Table 16:X11: BTG3
Reference Sequence: NM_001130914.1 GI:195963405
SEQ ID NO: 41
1 ccctcttccg ggccgcgagc cccctgcgcg ccgctttggg gctgcgctca ctcgtgtgcg
61 cgctcgtccg cccgccagtc ctctcaacgc gcgcttggcc gcccgacgac gcgggagccg 121 cacgcgccgg acgaggctcg ctgcgctccc tgttgcccag cgcgggcccg ttgaggcgga 181 gccctcagtt cccggccagg acacggtctg ggccgccgaa tctccggccg aagagcggcg 241 gcggcagcgg cgggaaaaaa atgaagaatg aaattgctgc cgttgtcttc tttttcacaa 301 ggctagttcg aaaacatgat aagttgaaaa aagaggcagt tgagaggttt gctgagaaat 361 tgaccctaat acttcaagaa aaatataaaa atcactggta tccagaaaaa ccatcgaaag 421 gacaggccta cagatgtatt cgtgtcaata aatttcagag agttgatcct gatgtcctga
481 aagcctgtga aaacagctgc atcttgtata gtgacctggg cttgccaaag gagctcactc 541 tctgggtgga cccatgtgag gtgtgctgtc gtagagatgg ggtttcacca tgttggccag 601 actgctctca aactcctgac ctcgtgatcc gcccgccttg gcctcccaaa gcgctggatt 661 acaggcgtga gccactgcgc ccggcctcct cctttttgat tatgtatgga gagaaaaaca 721 atgcattcat tgttgccagc tttgaaaata aagatgagaa caaggatgag atctccagga
781 aagttaccag ggcccttgat aaggttacct ctgattatca ttcaggatcc tcttcttcag
841 atgaagaaac aagtaaggaa atggaagtga aacccagttc ggtgactgca gccgcaagtc 901 ctgtgtacca gatttcagaa cttatatttc cacctcttcc aatgtggcac cctttgccca
961 gaaaaaagcc aggaatgtat cgagggaatg gccatcagaa tcactatcct cctcctgttc 1021 catttggtta tccaaatcag ggaagaaaaa ataaaccata tcgcccaatt ccagtgacat 1081 gggtacctcc tcctggaatg cattgtgacc ggaatcactg gattaatcct cacatgttag 1141 cacctcacta acttcgtttt tgattgtgtt ggtgtcatgt tgagaaaaag gtagaataaa
1201 ccttactaca cattaaaagt taaaagttct tactaatagt agtgaagtta gatgggccaa
1261 accatcaaac ttatttttat agaagttatt gagaataatc tttcttaaaa aatatatgca
1321 ctttagatat tgatatagtt tgagaaattt tattaaagtt agtcaagtgc ctaagttttt
1381 aatattggac ttgagtattt atatattgtg catcaactct gttggatacg agaacactgt
1441 agaagtggac gatttgttct agcacctttg agaatttact ttatggagcg tatgtaagtt
1501 atttatatac aaggaaatct attttatgtc gttgtttaag agaattgtgt gaaatcatgt
1561 agttgcaaat aaaaaatagt ttgaggcatg acaaaa
Exemplary Target Sequence for HG-U133A:213134_X_AT
SEQ ID NO: 42
aaataaaccatatcgcccaattccagtgacatgggtacctcctcctggaatgcattgtga
ccggaatcactggattaatcctcacatgttagcacctcactaacttcgtttttgattgtg
ttggtgtcatgttgagaaaaaggtagaataaaccttactacacattaaaagttaaaagtt
cttactaatagtagtgaagttagatgggccaaaccatcaaacttatttttatagaagtta
ttgagaataatctttcttaaaaaatatatgcactttagatattgatatagtttgagaaat
tttattaaagttagtcaagtgcctaagtttttaatattggacttgagtatttatatattg
tgcatcaactctgttggatacgagaacactgtagaagtggacgatttgttctagcacctt
tgagaatttacttta
Sample Probes:
SEQ ID NO: 43
AAATAAACCATATCGCCCAATTCCA
SEQ ID NO: 44
CTAGCACCTTTGAGAATTTACTTTA
Table 17:X12: ODC1
Reference Sequence: NM_002539.1 GI:4505488; SEQ ID NO: 45
1 gtcagtccct cctgtagccg ccgccgccgc cgcccgccgc ccctctgcca gcagctccgg 61 cgccacctcg ggccggcgtc tccggcgggc gggagccagg cgctgacggg cgcggcgggg 121 gcggccgagc gctcctgcgg ctgcgactca ggctccggcg tctgcgcttc cccatggggc 181 tggcctgcgg cgcctgggcg ctctgagatt gtcactgctg ttccaagggc acacgcagag 241 ggatttggaa ttcctggaga gttgcctttg tgagaagctg gaaatatttc tttcaattcc 301 atctcttagt tttccatagg aacatcaaga aatcatgaac aactttggta atgaagagtt 361 tgactgccac ttcctcgatg aaggttttac tgccaaggac attctggacc agaaaattaa 421 tgaagtttct tcttctgatg ataaggatgc cttctatgtg gcagacctgg gagacattct 481 aaagaaacat ctgaggtggt taaaagctct ccctcgtgtc accccctttt atgcagtcaa 541 atgtaatgat agcaaagcca tcgtgaagac ccttgctgct accgggacag gatttgactg 601 tgctagcaag actgaaatac agttggtgca gagtctgggg gtgcctccag agaggattat 661 ctatgcaaat ccttgtaaac aagtatctca aattaagtat gctgctaata atggagtcca 721 gatgatgact tttgatagtg aagttgagtt gatgaaagtt gccagagcac atcccaaagc 781 aaagttggtt ttgcggattg ccactgatga ttccaaagca gtctgtcgtc tcagtgtgaa 841 attcggtgcc acgctcagaa ccagcaggct ccttttggaa cgggcgaaag agctaaatat 901 cgatgttgtt ggtgtcagct tccatgtagg aagcggctgt accgatcctg agaccttcgt 961 gcaggcaatc tctgatgccc gctgtgtttt tgacatgggg gctgaggttg gtttcagcat 1021 gtatctgctt gatattggcg gtggctttcc tggatctgag gatgtgaaac ttaaatttga 1081 agagatcacc ggcgtaatca acccagcgtt ggacaaatac tttccgtcag actctggagt 1 141 gagaatcata gctgagcccg gcagatacta tgttgcatca gctttcacgc ttgcagttaa 1201 tatcattgcc aagaaaattg tattaaagga acagacgggc tctgatgacg aagatgagtc 1261 gagtgagcag acctttatgt attatgtgaa tgatggcgtc tatggatcat ttaattgcat 1321 actctatgac cacgcacatg taaagcccct tctgcaaaag agacctaaac cagatgagaa 1381 gtattattca tccagcatat ggggaccaac atgtgatggc ctcgatcgga ttgttgagcg 1441 ctgtgacctg cctgaaatgc atgtgggtga ttggatgctc tttgaaaaca tgggcgctta 1501 cactgttgct gctgcctcta cgttcaatgg cttccagagg ccgacgatct actatgtgat 1561 gtcagggcct gcgtggcaac tcatgcagca attccagaac cccgacttcc cacccgaagt 1621 agaggaacag gatgccagca ccctgcctgt gtcttgtgcc tgggagagtg ggatgaaacg 1681 ccacagagca gcctgtgctt 
cggctagtat taatgtgtag atagcactct ggtagctgtt 1741 aactgcaagt ttagcttgaa ttaagggatt tggggggacc atgtaactta attactgcta 1801 gttttgaaat gtctttgtaa gagtagggtc gccatgatgc agccatatgg aagactagga 1861 tatgggtcac acttatctgt gttcctatgg aaactatttg aatatttgtt ttatatggat
1921 ttttattcac tcttcagaca cgctactcaa gagtgcccct cagctgctga acaagcattt 1981 gtagcttgta caatggcaga atgggccaaa agcttagtgt tgtgacctgt ttttaaaata 2041 aagtatcttg aaataattag gc
Exemplary Target Sequence for HG-U133A:200790_AT
SEQ ID NO: 46
aaaacatgggcgcttacactgttgctgctgcctctacgttcaatggcttccagaggccga
cgatctactatgtgatgtcagggcctgcgtggcaactcatgcagcaattccagaaccccg
acttcccacccgaagtagaggaacaggatgccagcaccctgcctgtgtcttgtgcctggg agagtgggatgaaacgccacagagcagcctgtgcttcggctagtattaatgtgtagatag
cactctggtagctgttaactgcaagtttagcttgaattaagggatttggggggaccatgt
aacttaattactgctagttttgaaatgtctttgtaagagtagggtcgccatgatgcagcc
atatggaagactaggatatgggtcacacttatctgtgttcctatggaaactatttgaata
tttgttttatatggatttttattcactcttcagacacgctactcaagagtgcccct
Sample Probes:
AAAACATGGGCGCTTACACTGTTGC (SEQ ID NO:47)
AGACACGCTACTCAAGAGTGCCCCT (SEQ ID NO:48)
Table 18:X13: MCM5
Reference Sequence: NM_006739.3 GI:143770796;SEQ ID NO: 49
1 accgcctctt gtttttcccg cgaaactcgg cggctgagcg tggaggttct tgtctcccct 61 ggtttgtgaa gtgcggaaaa ccagaggcgc agtcatgtcg ggattcgacg atcctggcat 121 tttctacagc gacagcttcg ggggcgacgc ccaggccgac gaggggcagg cccgcaaatc 181 gcagctgcag aggcgcttca aggagttcct gcggcagtac cgagtgggca ccgaccgcac 241 gggcttcacc ttcaaataca gggatgaact caagcggcat tacaacctgg gggagtactg 301 gattgaggtg gagatggagg atctggccag ctttgatgag gacctggccg actacttgta 361 caagcagcca gccgagcacc tgcagctgct ggaggaagct gccaaggagg tagctgatga 421 ggtgacccgg ccccggcctt ctggggagga ggtgctccag gacatccagg tcatgctcaa 481 gtcggacgcc agcccttcca gcattcgtag cctgaagtcg gacatgatgt cacacctggt 541 gaagatccct ggcatcatca tcgcggcctc tgcggtccgt gccaaggcca cccgcatctc 601 tatccagtgc cgcagctgcc gcaacaccct caccaacatt gccatgcgcc ctggcctcga 661 gggctatgcc ctgcccagga agtgcaacac agatcaggct gggcgcccca aatgcccatt 721 ggacccgtac ttcatcatgc ccgacaaatg caaatgcgtg gacttccaga ccctgaagct 781 gcaggagctg cctgatgcag tcccccacgg ggagatgccc agacacatgc agctctactg 841 cgacaggtac ctgtgtgaca aggtcgtccc tgggaacagg gttaccatca tgggcatcta 901 ctccatcaag aagtttggcc tgactaccag caggggccgt gacagggtgg gcgtgggcat 961 ccgaagctcc tacatccgtg tcctgggcat ccaggtggac acagatggct ctggccgcag 1021 ctttgctggg gccgtgagcc cccaggagga ggaggagttc cgtcgcctgg ctgccctccc 1081 aaatgtctat gaggtcatct ccaagagcat cgccccctcc atctttgggg gcacagacat 1 141 gaagaaggcc attgcctgcc tgctctttgg gggctcccga aagaggctcc ctgatggact 1201 tactcgccga ggagacatca acctgctgat gctaggggac cctgggacag ccaagtccca 1261 gcttctgaag tttgtggaga agtgttctcc cattggggta tacacgtctg ggaaaggcag 1321 cagcgcagct ggactgacag cctcggtgat gagggaccct tcgtcccgga atttcatcat 1381 ggagggcgga gccatggtcc tggccgatgg tggggtcgtc tgtattgacg agtttgacaa 1441 gatgcgagaa gatgaccgtg tggcaatcca cgaagccatg gagcagcaga ccatctctat 1501 cgccaaggct gggatcacca ccaccctgaa ctcccgctgc tccgtcctgg ctgctgccaa 1561 ctcagtgttc ggccgctggg atgagacgaa gggggaggac aacattgact tcatgcccac 1621 catcttgtcg cgcttcgaca tgatcttcat cgtcaaggat gagcacaatg aggagaggga 1681 tgtgatgctg gccaagcatg 
tcatcactct gcacgtgagc gcactgacac agacacaggc 1741 tgtggagggc gagattgacc tggccaagct gaagaagttt attgcctact gccgagtgaa 1801 gtgtggcccc cggctgtcag cagaggctgc agagaaactg aagaaccgct acatcatcat 1861 gcggagcggg gcccgtcagc acgagaggga cagtgaccgc cgctccagca tccccatcac 1921 tgtgcggcag ctggaggcca ttgtgcgcat cgcggaagcc ctcagcaaga tgaagctgca 1981 gcccttcgcc acagaggcag atgtggagga ggccctgcgg ctcttccaag tgtccacgtt 2041 ggatgctgcc ttgtccggta ccctgtcagg ggtggagggc ttcaccagcc aggaggacca 2101 ggagatgctg agccgcatcg agaagcagct caagcgccgc tttgccattg gctcccaggt 2161 gtctgagcac agcatcatca aggacttcac caagcagaaa tacccggagc acgccatcca 2221 caaggtgctg cagctcatgc tgcggcgcgg cgagatccag catcgcatgc agcgcaaggt 2281 tctctaccgc ctcaagtgag tcgcgccgcc tcactggact catggactcg cccacgcctc 2341 gcccctcctg ccgctgcctg ccattgacaa tgttgctggg acctctgcct ccccactgca 2401 gccctcgaac ttcccaggca ccctcctttc tgccccagag gaaggagctg tagtgtcctg 2461 ctgcctctgg gcgcccgcct ctagcgcggt tctgggaagt gtgcttttgg catccgttaa 2521 taataaagcc acggtgtgtt caggtaaaaa aaaaaaaaaa aaaaaaaa Table 18 cont'd
Exemplary Target Sequence for HG-U133A:216237_S_AT
SEQ ID NO: 50
cggcgagatccagcatcgcatgcagcgcaaggttctctaccgcctcaagtgagtcgcgcc gcctcactggactcatggactcgcccacgcctcgcccctcctgccgctgcctgccattga caatgttgctgggacctctgcctccccactgcagccctcgaacttcccaggcaccctcct ttctgccccagaggaaggagctgtagtgtcctgctgcctctgggcgcccgcctctagcgc ggttctgggaagtgtgcttttggcatccg
Sample Probes:
SEQ ID NO: 51
CGGCGAGATCCAGCATCGCATGCAG
SEQ ID NO: 52
CTGGGAAGTGTGCTTTTGGCATCCG
Table 19:X14: TTK
Reference Sequence: NM_001166691.1 GI:262399360;SEQ ID NO: 53
1 agcattgacc aataggagac cgtagtgata gcgacgggga aattcaaacg tgtttgcgga 61 aaggagtttg ggttccatct tttcatttcc ccagcgcagc tttctgtagt ttttttctta
121 gaaatggaat ccgaggattt aagtggcaga gaattgacaa ttgattccat aatgaacaaa 181 gtgagagaca ttaaaaataa gtttaaaaat gaagacctta ctgatgaact aagcttgaat 241 aaaatttctg ctgatactac agataactcg ggaactgtta accaaattat gatgatggca 301 aacaacccag aggactggtt gagtttgttg ctcaaactag agaaaaacag tgttccgcta 361 agtgatgctc ttttaaataa attgattggt cgttacagtc aagcaattga agcgcttccc 421 ccagataaat atggccaaaa tgagagtttt gctagaattc aagtgagatt tgctgaatta 481 aaagctattc aagagccaga tgatgcacgt gactactttc aaatggccag agcaaactgc 541 aagaaatttg cttttgttca tatatctttt gcacaatttg aactgtcaca aggtaatgtc 601 aaaaaaagta aacaacttct tcaaaaagct gtagaacgtg gagcagtacc actagaaatg 661 ctggaaattg ccctgcggaa tttaaacctc caaaaaaagc agctgctttc agaggaggaa 721 aagaagaatt tatcagcatc tacggtatta actgcccaag aatcattttc cggttcactt 781 gggcatttac agaataggaa caacagttgt gattccagag gacagactac taaagccagg 841 tttttatatg gagagaacat gccaccacaa gatgcagaaa taggttaccg gaattcattg 901 agacaaacta acaaaactaa acagtcatgc ccatttggaa gagtcccagt taaccttcta 961 aatagcccag attgtgatgt gaagacagat gattcagttg taccttgttt tatgaaaaga 1021 caaacctcta gatcagaatg ccgagatttg gttgtgcctg gatctaaacc aagtggaaat 1081 gattcctgtg aattaagaaa tttaaagtct gttcaaaata gtcatttcaa ggaacctctg 1 141 gtgtcagatg aaaagagttc tgaacttatt attactgatt caataaccct gaagaataaa 1201 acggaatcaa gtcttctagc taaattagaa gaaactaaag agtatcaaga accagaggtt 1261 ccagagagta accagaaaca gtggcaatct aagagaaagt cagagtgtat taaccagaat 1321 cctgctgcat cttcaaatca ctggcagatt ccggagttag cccgaaaagt taatacagag 1381 aaacatacca cttttgagca acctgtcttt tcagtttcaa aacagtcacc accaatatca 1441 acatctaaat ggtttgaccc aaaatctatt tgtaagacac caagcagcaa taccttggat 1501 gattacatga gctgttttag aactccagtt gtaaagaatg actttccacc tgcttgtcag 1561 ttgtcaacac cttatggcca acctgcctgt ttccagcagc aacagcatca aatacttgcc 1621 actccacttc aaaatttaca ggttttagca tcttcttcag caaatgaatg catttcggtt 1681 aaaggaagaa tttattccat attaaagcag ataggaagtg gaggttcaag caaggtattt 1741 caggtgttaa atgaaaagaa acagatatat gctataaaat atgtgaactt agaagaagca 1801 gataaccaaa 
ctcttgatag ttaccggaac gaaatagctt atttgaataa actacaacaa 1861 cacagtgata agatcatccg actttatgat tatgaaatca cggaccagta catctacatg 1921 gtaatggagt gtggaaatat tgatcttaat agttggctta aaaagaaaaa atccattgat 1981 ccatgggaac gcaagagtta ctggaaaaat atgttagagg cagttcacac aatccatcaa 2041 catggcattg ttcacagtga tcttaaacca gctaactttc tgatagttga tggaatgcta 2101 aagctaattg attttgggat tgcaaaccaa atgcaaccag atacaacaag tgttgttaaa 2161 gattctcagg ttggcacagt taattatatg ccaccagaag caatcaaaga tatgtcttcc 2221 tccagagaga atgggaaatc taagtcaaag ataagcccca aaagtgatgt ttggtcctta 2281 ggatgtattt tgtactatat gacttacggg aaaacaccat ttcagcagat aattaatcag 2341 atttctaaat tacatgccat aattgatcct aatcatgaaa ttgaatttcc cgatattcca 2401 gagaaagatc ttcaagatgt gttaaagtgt tgtttaaaaa gggacccaaa acagaggata 2461 tccattcctg agctcctggc tcatccatat gttcaaattc aaactcatcc agttaaccaa 2521 atggccaagg gaaccactga agaaatgaaa tatgttctgg gccaacttgt tggtctgaat 2581 tctcctaact ccattttgaa agctgctaaa actttatatg aacactatag tggtggtgaa 2641 agtcataatt cttcatcctc caagactttt gaaaaaaaaa ggggaaaaaa atgatttgca 2701 gttattcgta atgtcagata ccacctataa aatatattgg actgttatac tcttgaatcc 2761 ctgtggaaat ctacatttga agacaacatc actctgaagt gttatcagca aaaaaaattc Table 19 cont'd
2821 agtagattat ctttaaaaga aaactgtaaa aatagcaacc acttatggca ctgtatatat 2881 tgtagacttg ttttctctgt tttatgctct tgtgtaatct acttgacatc attttactct 2941 tggaatagtg ggtggatagc aagtatattc taaaaaactt tgtaaataaa gttttgtggc 3001 taaaatgaca ctaacattt
Exemplary Target Sequence for HG-U133A:204822_AT
SEQ ID NO: 54
agaggatatccattcctgagctcctggctcatccatatgttcaaattcaaactcatccag ttaaccaaatggccaagggaaccactgaagaaatgaaatatgttctgggccaacttgttg gtctgaattctcctaactccattttgaaagctgctaaaactttatatgaacactatagtg gtggtgaaagtcataattcttcatcctccaagacttttgaaaaaaaaaggggaaaaaaat gatttgcagttattcgtaatgtcagataggaggtataaaatatattggactgttatactc ttgaatccctgtggaaatctacatttgaagacaacatcactctgaagtgttatcagcaaa aaaaattcagtgagattatctttaaaagaaaactgtaaaaatagcaaccacttatggcac tgtatatattgtagacttgttttctctgttttatgctcttgtgtaatctacttgacatca
ttttactct
Sample Probes:
SEQ ID NO: 55
AGAGGATATCCATTCCTGAGCTCCT
SEQ ID NO: 56
AATCTACTTGACATCATTTTACTCT
Table 20: X15: NKAIN1
Reference Sequence: NM_024522.2 GI:2 317327;SEQ ID NO: 57
1 agtgctgctc tgcgctgcgc cgcgctcggg gctcgctctc cttgctccgc gctccccgcc 61 agccgccccg gggcaggagg cgcgcctgac ggacggcccg ctagacaaag gaggcgcggc 121 tcggcggggc cagcgcgcgg acggacggac catggactcg gagcgcgggc ggccggcccc 181 agccttgggg accggacact cccgggcccg gccctaggcg cccggccccg ccgcccggcg 241 cgcccagcgg ggaggacgtg gagcccgcgc ggcgcgagca ggcggcggcc gcggagcaag 301 aagggcgccg cggcgtgcgg cccgcgcagc ccccggagcc atgggcaagt gcagcgggcg 361 ctgcacgctg gtcgccttct gctgcctgca gctggtggct gcgctggagc ggcagatctt 421 tgacttcctg ggctaccagt gggctcccat cctagccaac ttcctgcaca tcatggcagt 481 catcctgggc atctttggca ccgtgcagta ccgctcccgg tacctcatcc tgtatgcagc 541 ctggctggtg ctctgggttg gctggaatgc atttatcatc tgcttctact tggaggttgg
601 acagctgtcc caggaccggg acttcatcat gaccttcaac acatccctgc accgctcctg 661 gtggatggag aatgggccag gctgcctggt gacacctgtt ctgaactccc gcctggctct 721 ggaggaccac catgtcatct ctgtcactgg ctgcctgctt gactacccct acattgaagc 781 cctcagcagc gccctgcaga tcttcctggc actgttcggc ttcgtgttcg cctgctacgt 841 gagcaaagtg ttcctggagg aggaggacag ctttgacttc atcggcggct ttgactccta 901 cggataccag gcgccccaga agacgtcgca tttacagctg cagcctctgt acacgtcggg 961 gtagcctctg ccccgcgccc accccggcgc ctcgccctgg gctgaccgca gctgccgcga 1021 gctcgggcca aggcgcaggc gtgtccccct ggtggcccgc gcgctcactg cagcctgtgc 1081 ccaaccccgc gtctgcatct ggagatgcgg acttggacgt ggacttggac ttggacttgg 1 141 atttgagctt ggctcttcgc agcccggact tcggaggagt ggggcggggc gggggagggg 1201 caccacgggt tttttgtttt ttgtttgttt gtttttaatc tcagccttgg cgtgagctgg
1261 ggccttcctc tcttctccag cctctccctt tcactcttca cccagcatcc tgcccccctg 1321 tccaaaaaca gcaggacatc agacccatcc catcccacca cactcactca ccagctctgg 1381 ggaaagctac tgtgaactag gagcaggatt cctgggttct aatcgcaggt ccatcactga 1441 ctgtgacgtc tagcaaagcc cttgccctct ctgagcctcg gtttccgcac ctcaagtaat 1501 taatccctta gcaaatggac tcttttagac ttctcattta actcaattcc ctgagctaga
1561 ctgggattaa aattctcatt ttgcagtaca ttaaaactga ggcccagaga tgtgatttgc 1621 ttgaggccac acagctagat ttttggtgga agtgggcctt gaacacagtg tactttctgc 1681 agtttctgac tgtaaaaccc agtgtctgct ctctgagttc catttccaag cccccctcca 1741 tcttggacct atgtggtctc caccatattc acacaccacc accaccactt gccaatgcct 1801 ctcttaaagc aatataccca ttcgttctct tattgggaac tggatggatg aagccccaaa 1861 ttcagcccca cccacagaga agccttccta cactcagcct ctgtccaccc ttggcaaatc 1921 tttcaagctc tctcctccag gaaagtgggg ccccaactca gtcactccac ccccttccag 1981 gtccctgagg ctggttctac tgtatcccca tcacctccac aactccactc acccctgacg 2041 gctccatcca cctcaccagt tggaaggctt gtggtttcag agaggagcaa tgctggtcag 2101 cgctgcccag actccagtgt ttacagatca ccagcattta caaccaatcc aatggccaga 2161 agcctcctct aacaagccca gaaggagttc tgaaggggca gatgggggtg tgagtagtcg 2221 gggagtcggg attgccagca ccctcaccct tccttggggg caagtagagg tgagaacact 2281 ttccccacct ccctccacag acactcctga ggacgctgca tcccacgcac tgcctggtgc 2341 gtccatagag agaggatcag gtctcagcat ttcatctgtg aaagaggcat ggccctgggt 2401 tagaaaggag ggcaggagac atggaggaac tggggggcac ccagatggtg cagatggttt 2461 gcacacctga gcctgtctgt ggtgaccatt ccgctcctct cccactaccc tccaatctat 2521 cattccctac tctctaaggc caaaatatcc tgagcaaggc tggcaacccc accccaccat 2581 cccaaatgca agcagccagg cccaggagtt cctctggccc ccacaggcat ggagctccca 2641 gctggtgggt acagcttgag aggggggcag ctccctcagg ctaagctact gcccttcact 2701 gggccagccc tgcctccagc cctcacctct ctcaccccaa ctctccccca agcccctttc 2761 tactcaacgg gtgtagccac tggtgctttg aagccttttg tttttataag atggtttttg Table 20 cont'd
2821 caaggggacc aggttctctt ttcactggga ccttgcaagg aggggagtgc tctcctggtt 2881 tctgtgcagg cgggttgatt aaagatggtg ttttcttctc taaaaaaaaa
Exemplary Target Sequence for HG-U133_PLUS_2:219438_AT
SEQ ID NO: 58
actgcctggtgcgtccatagagagaggatcaggtctcagcatttcatctgtgaaagaggc atggccctgggttagaaaggagggcaggagacatggaggaactggggggcacccagatgg tgcagatggtttgcacacctgagcctgtctgtggtgaccattccgctcctctcccactac cctccaatctatcattccctactctctaaggccaaaatatcctgagcaaggctggcaacc ccaccccaccatcccaaatgcaagcagccaggcccaggagttcctctggcccccacaggc atggagctcccagctggtgggtacagcttgagaggggggcagctccctcaggctaagcta ctgcccttcactgggccagccctgcctccagccctcacctctctcaccccaactctcccc caagcccctttctactcaacgggtgtagccactggtgctttgaagccttttgtttttata agatggtttttgcaaggggaccaggttctcttttcactgggaccttgcaagg
Sample Probes:
SEQ ID NO: 59
ACTGCCTGGTGCGTCCATAGAGAGA
SEQ ID NO: 60
CTCTTTTCACTGGGACCTTGCAAGG
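In the table above, the two sample probes appear to be the first and last 25 bases of the exemplary target sequence. The following is a minimal sketch of how that relationship could be checked; the helper names `normalize` and `probe_offsets` are illustrative assumptions and are not part of the disclosure.

```python
# Exemplary target sequence (SEQ ID NO: 58), copied from the table above.
# Whitespace between the 60-base groups is ignored by normalize().
TARGET_SEQ_58 = """
actgcctggtgcgtccatagagagaggatcaggtctcagcatttcatctgtgaaagaggc
atggccctgggttagaaaggagggcaggagacatggaggaactggggggcacccagatgg
tgcagatggtttgcacacctgagcctgtctgtggtgaccattccgctcctctcccactac
cctccaatctatcattccctactctctaaggccaaaatatcctgagcaaggctggcaacc
ccaccccaccatcccaaatgcaagcagccaggcccaggagttcctctggcccccacaggc
atggagctcccagctggtgggtacagcttgagaggggggcagctccctcaggctaagcta
ctgcccttcactgggccagccctgcctccagccctcacctctctcaccccaactctcccc
caagcccctttctactcaacgggtgtagccactggtgctttgaagccttttgtttttata
agatggtttttgcaaggggaccaggttctcttttcactgggaccttgcaagg
"""

PROBE_59 = "ACTGCCTGGTGCGTCCATAGAGAGA"  # SEQ ID NO: 59
PROBE_60 = "CTCTTTTCACTGGGACCTTGCAAGG"  # SEQ ID NO: 60


def normalize(seq):
    """Remove all whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def probe_offsets(target, probe):
    """Return every 0-based offset at which the probe occurs in the target."""
    t, p = normalize(target), normalize(probe)
    offsets = []
    i = t.find(p)
    while i != -1:
        offsets.append(i)
        i = t.find(p, i + 1)
    return offsets


if __name__ == "__main__":
    for name, probe in (("SEQ ID NO: 59", PROBE_59), ("SEQ ID NO: 60", PROBE_60)):
        print(name, "found at offsets", probe_offsets(TARGET_SEQ_58, probe))
```

The same check could be applied to the other tables by substituting their target and probe sequences.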
Table 21: X16: IDUA
Reference Sequence: NM_000203.3 GI:110611238;SEQ ID NO: 61
1 gtcacatggg gtgcgcgccc agactccgac ccggaggcgg aaccggcagt gcagcccgaa 61 gccccgcagt ccccgagcac gcgtggccat gcgtcccctg cgcccccgcg ccgcgctgct 121 ggcgctcctg gcctcgctcc tggccgcgcc cccggtggcc ccggccgagg ccccgcacct 181 ggtgcatgtg gacgcggccc gcgcgctgtg gcccctgcgg cgcttctgga ggagcacagg 241 cttctgcccc ccgctgccac acagccaggc tgaccagtac gtcctcagct gggaccagca 301 gctcaacctc gcctatgtgg gcgccgtccc tcaccgcggc atcaagcagg tccggaccca 361 ctggctgctg gagcttgtca ccaccagggg gtccactgga cggggcctga gctacaactt 421 cacccacctg gacgggtacc tggaccttct cagggagaac cagctcctcc cagggtttga 481 gctgatgggc agcgcctcgg gccacttcac tgactttgag gacaagcagc aggtgtttga 541 gtggaaggac ttggtctcca gcctggccag gagatacatc ggtaggtacg gactggcgca 601 tgtttccaag tggaacttcg agacgtggaa tgagccagac caccacgact ttgacaacgt 661 ctccatgacc atgcaaggct tcctgaacta ctacgatgcc tgctcggagg gtctgcgcgc 721 cgccagcccc gccctgcggc tgggaggccc cggcgactcc ttccacaccc caccgcgatc 781 cccgctgagc tggggcctcc tgcgccactg ccacgacggt accaacttct tcactgggga 841 ggcgggcgtg cggctggact acatctccct ccacaggaag ggtgcgcgca gctccatctc 901 catcctggag caggagaagg tcgtcgcgca gcagatccgg cagctcttcc ccaagttcgc 961 ggacaccccc atttacaacg acgaggcgga cccgctggtg ggctggtccc tgccacagcc 1021 gtggagggcg gacgtgacct acgcggccat ggtggtgaag gtcatcgcgc agcatcagaa 1081 cctgctactg gccaacacca cctccgcctt cccctacgcg ctcctgagca acgacaatgc 1 141 cttcctgagc taccacccgc accccttcgc gcagcgcacg ctcaccgcgc gcttccaggt 1201 caacaacacc cgcccgccgc acgtgcagct gttgcgcaag ccggtgctca cggccatggg 1261 gctgctggcg ctgctggatg aggagcagct ctgggccgaa gtgtcgcagg ccgggaccgt 1321 cctggacagc aaccacacgg tgggcgtcct ggccagcgcc caccgccccc agggcccggc 1381 cgacgcctgg cgcgccgcgg tgctgatcta cgcgagcgac gacacccgcg cccaccccaa 1441 ccgcagcgtc gcggtgaccc tgcggctgcg cggggtgccc cccggcccgg gcctggtcta 1501 cgtcacgcgc tacctggaca acgggctctg cagccccgac ggcgagtggc ggcgcctggg 1561 ccggcccgtc ttccccacgg cagagcagtt ccggcgcatg cgcgcggctg aggacccggt 1621 ggccgcggcg ccccgcccct tacccgccgg cggccgcctg accctgcgcc ccgcgctgcg 1681 gctgccgtcg cttttgctgg 
tgcacgtgtg tgcgcgcccc gagaagccgc ccgggcaggt 1741 cacgcggctc cgcgccctgc ccctgaccca agggcagctg gttctggtct ggtcggatga 1801 acacgtgggc tccaagtgcc tgtggacata cgagatccag ttctctcagg acggtaaggc 1861 gtacaccccg gtcagcagga agccatcgac cttcaacctc tttgtgttca gcccagacac 1921 aggtgctgtc tctggctcct accgagttcg agccctggac tactgggccc gaccaggccc 1981 cttctcggac cctgtgccgt acctggaggt ccctgtgcca agagggcccc catccccggg 2041 caatccatga gcctgtgctg agccccagtg ggttgcacct ccaccggcag tcagcgagct 2101 ggggctgcac tgtgcccatg ctgccctccc atcaccccct ttgcaatata tttttatatt
2161 ttattatttt cttttatatc ttggtaaaaa aaaaaaaaaa aaa
Exemplary Target Sequence for HG-U133A:205059_S_AT
SEQ ID NO: 62
gacccaagggcagctggttctggtctggtcggatgaacacgtgggctccaagtgcctgtg
gacatacgagatccagttctctcaggacggtaaggcgtacaccccggtcagcaggaagcc
atcgaccttcaacctctttgtgttcagcccagacacaggtgctgtctctggctcctaccg
agttcgagccctggactactgggc
Sample Probes:
SEQ ID NO: 63
GACCCAAGGGCAGCTGGTTCTGGTC
SEQ ID NO: 64
GAGTTCGAGCCCTGGACTACTGGGC
Table 22: X17: SLC43A3
Reference Sequence: NM_014096.2 GI:46410928;SEQ ID NO: 65
1 ggtccccttt cgggcgccat ggggcgccga gcgcggcctg gcccctcggg ctcctctgcg 61 gggagggcag gccgcaggct ggagcggggt gcggaggctg gcggggagcg gcccccggag 121 gctttcctgg tagaagttga tgcgaggaag ggcggcgggg accaggggac ggtattcaga 181 attcgagcgc aggagctccg cttctccacc tgctcccggg gagctattgg gatccagaga 241 atcacccgct gatggttttt gcccaggcct gaaacaacca gagagctacg ggaaaggaag 301 ggcttggctt gccagaggaa ttttccaagt gctcaaacgc caggcttacg gcgcctgtga 361 tccgtccagg aggacaaagt gggatttgaa gatccactcc acttctgctc atggcgggcc 421 agggcctgcc cctgcacgtg gccacactgc tgactgggct gctggaatgc ctgggctttg 481 ctggcgtcct ctttggctgg ccttcactag tgtttgtctt caagaatgaa gattacttta
541 aggatctgtg tggaccagat gctgggccga ttggcaatgc cacagggcag gctgactgca 601 aagcccagga tgagaggttc tcactcatct tcaccctggg gtccttcatg aacaacttca 661 tgacattccc cactggctac atctttgacc ggttcaagac caccgtggca cgcctcatag 721 ccatattttt ctacaccacc gccacactca tcatagcctt cacctctgca ggctcagccg 781 tgctgctctt cctggccatg ccaatgctca ccattggggg aatcctgttt ctcatcacca 841 acctgcagat tgggaaccta tttggccaac accgttcgac catcatcact ctgtacaatg 901 gagcatttga ctcttcctcg gcagtcttcc ttattattaa gcttctttat gaaaaaggca
961 tcagcctcag ggcctccttc atcttcatct ctgtctgcag tacctggcat gtagcacgca 1021 ctttcctcct gatgccccgg gggcacatcc catacccact gccccccaac tacagctatg 1081 gcctgtgccc tgggaatggc accacaaagg aagagaagga aacagctgag catgaaaaca 1 141 gggagctaca gtcaaaggag ttcctttcag cgaaggaaga gaccccaggg gcagggcaga 1201 agcaggaact ccgctccttc tggagctacg ctttctctcg gcgctttgcc tggcacctgg 1261 tgtggctgtc tgtgatacag ttgtggcact acctcttcat tggcactctc aactccttgc 1321 tgaccaacat ggccggtggg gacatggcac gagtcagcac ctacacaaat gcctttgcct 1381 tcactcagtt cggagtgctg tgtgccccct ggaatggcct gctcatggac cggcttaaac 1441 agaagtacca gaaggaagca agaaagacag gttcctccac tttggcggtg gccctctgct 1501 cgacggtgcc ttcgctggcc ctgacatccc tgctgtgcct gggcttcgcc ctctgtgcct 1561 cagtccccat cctccctctc cagtacctca ccttcatcct gcaagtgatc agccgctcct 1621 tcctctatgg gagcaacgcg gccttcctca cccttgcttt cccttcagag cactttggca 1681 agctctttgg gctggtgatg gccttgtcgg ctgtggtgtc tctgctccag ttccccatct 1741 tcaccctcat caaaggctcc cttcagaatg acccatttta cgtgaatgtg atgttcatgc 1801 ttgccattct tctgacattc ttccacccct ttctggtata tcgggaatgc cgtacttgga 1861 aagaaagtcc ctctgcaatt gcatagttca gaagccctca cttttcagcc ccgaggatgg 1921 ttttgttcat cttccaccac ctttgaggac ctcgtgtccc aaaagacttt gcctatccca 1981 gcaaaacaca cacacacaca cacacacaca caaaataaag acacacaagg acgtctgcgc 2041 agcaagaaaa gaatctcagt tgccaagcag attgatatca cacagactca aagcaaaggc 2101 atgtggaact tctttatttc aaaacagaag tgtctccttg cacttagcct tggcagaccc 2161 ttgactccag gggagatgac ctgggggagg aagtgtgtca actatttctt taggcctgtt 2221 tggctccgaa gcctatatgt gcctggatcc tctgccacgg gttaaatttt caggtgaaga 2281 gtgaggttgt catggcctca gctatgcttc ctggctctcc ctcaagagtg cagccttggc 2341 tagagaactc acagctctgg gaaaaagagg agcagacagg gttccctggg cccagtctca 2401 gcccagccac tgatgctgga tgaccttggc ctgaccctgg tctggtctca gaatcacttt 2461 tcccatctgt aaaattgaga tgaattttgg tgttgaaagt tcttcctgga gcagatgtcc 2521 tagaaggttt taggaatagt gacagagtca ggccacccca agggccatgg gagccagctg 2581 acctgcttga ccgaaggatt tctgacagac tatctttggg gatgttttca agaagggata 2641 
taagttattt actttgggca tttaaaagaa aatttctctc gggaataatt ttatagaaaa 2701 ataaagcttc tgtgtctaag gcaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2761 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2821 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaa
Exemplary Target Sequence for HG-U133A:213113_S_AT
SEQ ID NO: 66
tccttgcacttagccttggcagacccttgactccaggggagatgacctgggggaggaagt gtgtcaactatttctttaggcctgtttggctccgaagcctatatgtgcctggatcctctg ccacgggttaaattttcaggtgaagagtgaggttgtcatggcctcagctatgcttcctgg ctctccctcaagagtgcagccttggctagagaactcacagctctgggaaaaagaggagca gacagggttccctgggcccagtctcagcccagccactgatgctggatgaccttggcctga ccctggtctggtctcagaatcacttttcccatctgtaaaattgagatgaattttggtgtt gaaagttcttcctggagcagatgtcctagaaggttttaggaatagtgacagagtcaggcc accccaagggccatgggagccagctgacctgcttgaccgaaggatt
Sample Probes:
SEQ ID NO: 67
TCCTTGCACTTAGCCTTGGCAGACC
SEQ ID NO: 68
AGCTGACCTGCTTGACCGAAGGATT
Table 23: X18: TXNDC5
Reference Sequence: NM_001145549.2 GI:313482855;SEQ ID NO: 69
1 aattcaaccg cctcttgcac ctcggcaccg agggagggga aggtggggtc gtcgcccttt 61 cgggcagccg ggagtccaaa tgtcaccccg cggtccctgc ccagcgcccc aaacttcctg 121 tgccggccgg acgcgcggcc tgcccgtggg ccacgtgcac tcaccagagc ggccttgctg 181 ctgccgcggc caccggggtc ggctgggaca gactgcgggc acgtcccctt ccagaggctt 241 taactgaaaa atagaaccca ggaaggtgtg gacactgcca gcggctgcag ccgacttgga 301 atgacctggg agacaaatac aacagcatgg aagatgccaa agtctatgtg gctaaagtgg 361 actgcacggc ccactccgac gtgtgctccg cccagggggt gcgaggatac cccaccttaa 421 agcttttcaa gccaggccaa gaagctgtga agtaccaggg tcctcgggac ttccagacac 481 tggaaaactg gatgctgcag acactgaacg aggagccagt gacaccagag ccggaagtgg 541 aaccgcccag tgcccccgag ctcaagcaag ggctgtatga gctctcagca agcaactttg 601 agctgcacgt tgcacaaggc gaccacttta tcaagttctt cgctccgtgg tgtggtcact 661 gcaaagccct ggctccaacc tgggagcagc tggctctggg ccttgaacat tccgaaactg 721 tcaagattgg caaggttgat tgtacacagc actatgaact ctgctccgga aaccaggttc 781 gtggctatcc cactcttctc tggttccgag atgggaaaaa ggtggatcag tacaagggaa 841 agcgggattt ggagtcactg agggagtacg tggagtcgca gctgcagcgc acagagactg 901 gagcgacgga gaccgtcacg ccctcagagg ccccggtgct ggcagctgag cccgaggctg 961 acaagggcac tgtgttggca ctcactgaaa ataacttcga tgacaccatt gcagaaggaa 1021 taaccttcat caagttttat gctccatggt gtggtcattg taagactctg gctcctactt 1081 gggaggaact ctctaaaaag gaattccctg gtctggcggg ggtcaagatc gccgaagtag 1 141 actgcactgc tgaacggaat atctgcagca agtattcggt acgaggctac cccacgttat 1201 tgcttttccg aggagggaag aaagtcagtg agcacagtgg aggcagagac cttgactcgt 1261 tacaccgctt tgtcctgagc caagcgaaag acgaacttta ggaacacagt tggaggtcac 1321 ctctcctgcc cagctcccgc accctgcgtt taggagttca gtcccacaga ggccactggg 1381 ttcccagtgg tggctgttca gaaagcagaa catactaagc gtgaggtatc ttctttgtgt 1441 gtgtgttttc caagccaaca cactctacag attctttatt aagttaagtt tctctaagta 1501 aatgtgtaac tcatggtcac tgtgtaaaca ttttcagtgg cgatatatcc cctttgacct 1561 tctcttgatg aaatttacat ggtttccttt gagactaaaa tagcgttgag ggaaatgaaa 1621 ttgctggact atttgtggct cctgagttga gtgattttgg tgaaagaaag cacatccaaa 1681 gcatagttta cctgcccacg 
agttctggaa aggtggcctt gtggcagtat tgacgttcct 1741 ctgatcttaa ggtcacagtt gactcaatac tgtgttggtc cgtagcatgg agcagattga 1801 aatgcaaaaa cccacacctc tggaagatac cttcacggcc gctgctggag cttctgttgc 1861 tgtgaatact tctctcagtg tgagaggtta gccgtgatga aagcagcgtt acttctgacc 1921 gtgcctgagt aagagaatgc tgatgccata actttatgtg tcgatacttg tcaaatcagt 1981 tactgttcag gggatccttc tgtttctcac ggggtgaaac atgtctttag ttcctcatgt 2041 taacacgaag ccagagccca catgaactgt tggatgtctt ccttagaaag ggtaggcatg 2101 gaaaattcca cgaggctcat tctcagtatc tcattaactc attgaaagat tccagttgta 2161 tttgtcacct ggggtgacaa gaccagacag gctttcccag gcctgggtat ccagggaggc 2221 tctgcagccc tgctgaaggg ccctaactag agttctagag tttctgattc tgtttctcag 2281 tagtcctttt agaggcttgc tatacttggt ctgcttcaag gaggtcgacc ttctaatgta 2341 tgaagaatgg gatgcatttg atctcaagac caaagacaga tgtcagtggg ctgctctggc 2401 cctggtgtgc acggctgtgg cagctgttga tgccagtgtc ctctaactca tgctgtcctt 2461 gtgattaaac acctctatct cccttgggaa taagcacata caggcttaag ctctaagata 2521 gataggtgtt tgtcctttta ccatcgagct acttcccata ataaccactt tgcatccaac 2581 actcttcacc cacctcccat acgcaagggg atgtggatac ttggcccaaa gtaactggtg 2641 gtaggaatct tagaaacaag accacttata ctgtctgtct gaggcagaag ataacagcag 2701 catctcgacc agcctctgcc ttaaaggaaa tctttattaa tcacgtatgg ttcacagata 2761 attctttttt taaaaaaacc caacctccta gagaagcaca actgtcaaga gtcttgtaca Table 23 cont'd
2821 cacaacttca gctttgcatc acgagtcttg tattccaaga aaatcaaagt ggtacaattt 2881 gtttgtttac actatgatac tttctaaata aactcttttt ttttaaaagt ctggtctttc 2941 cttcaatgtt acagcaaaac agatataaaa tagacaataa attatagttt atatttacaa 3001 aaaaagctgt aagtgcaaac agttgtagat tataaatgta ttatttaatc agtttagtat 3061 gaaattgcct tcccagtaca tgattgtgaa aaagacattt agaaaatatt ctaaaattta 3121 atctgagcct cactttctac aagggaaatc atgatttccg ttcataaaca gcatgctcat 3181 ccccctaaca ccatt
Exemplary Target Sequence for HG-U133A:221253_S_AT
SEQ ID NO: 70
tgtgcacggctgtggcagctgttgatgccagtgtcctctaactcatgctgtccttgtgat taaacacctctatctcccttgggaataagcacatacaggcttaagctctaagatagatag gtgtttgtccttttaccatcgagctacttcccataataaccactttgcatccaacactct tcacccacctcccatacgcaaggggatgtggatacttggcccaaagtaactggtggtagg aatcttagaaacaagaccacttatactgtctgtctgaggcagaagataacagcagcatct cgaccagcctctgccttaaaggaaatctttattaatcacgtatggttcacagataattct ttttttaaaaaaacccaacctcctagagaagcacaactgtcaagagtcttgtacacacaa cttcagctttgcatcacgagtcttgt
Sample Probes:
SEQ ID NO: 71
TGTGCACGGCTGTGGCAGCTGTTGA
SEQ ID NO: 72
TTCAGCTTTGCATCACGAGTCTTGT
Table 24: X19: SLC7A8
Reference Sequence: NM_012244.2 GI:33286427;SEQ ID NO: 73
1 tcccgaaacc agagggatgg ggccggctgt gcagtagaac ggggatcgaa aagaggaaaa 61 caagggcacg aagaccagcg agaaagaaga ggacacctgg gaaaggcgga agcagaagac 121 ggggaaggga aaagaaaccc atagcaggtg gaaaccagat ctagagcaac accgtcaggt 181 tcacagtttg tttttctaga agagaagaaa gtacctgagg attgctcttt tttcctaccg
241 ttaatgaaaa ctacttttgt cttcatcata aaagaaaaaa ctaaggggag gtaaaggcag 301 tctcctgttt tattaggggg agaggtgaag ggaaatccag gctcactttc tgaataagcc 361 actgcctggt gcacagagca gaaccatcct ggtttctgaa gacacatccc tttcagcaga 421 attccagccg gagtcgctgg cacagttcta tttttatatt taaatgtatg tctcccctgg
481 cctttttttt tttttttttt tttagcaaca cttttcttgt ttgtaaacgc gagtgaccag
541 aaagtgtgaa tgcggagtag gaatattttt cgtgttctct tttatctgct tgcctttttt
601 agagagtagc agtggttcct atttcggaaa aggacgttct aattcaaagc tctctcccaa 661 tatatttaca cgaatacgca tttagaaagg gaggcagctt ttgaggttgc aatcctactg 721 agaaggatgg aagaaggagc caggcaccga aacaacaccg aaaagaaaca cccaggtggg 781 ggcgagtcgg acgccagccc cgaggctggt tccggagggg gcggagtagc cctgaagaaa 841 gagatcggat tggtcagtgc ctgtggtatc atcgtaggga acatcatcgg ctctggaatc 901 tttgtctcgc caaagggagt gctggagaat gctggttctg tgggccttgc tctcatcgtc 961 tggattgtga cgggcttcat cacagttgtg ggagccctct gctatgctga actcggggtc 1021 accatcccca aatctggagg tgactactcc tatgtcaagg acatcttcgg aggactggct 1081 gggttcctga ggctgtggat tgctgtgctg gtgatctacc ccaccaacca ggctgtcatc 1 141 gccctcacct tctccaacta cgtgctgcag ccgctcttcc ccacctgctt ccccccagag 1201 tctggccttc ggctcctggc tgccatctgc ttattgctcc tcacatgggt caactgttcc 1261 agtgtgcggt gggccacccg ggttcaagac atcttcacag ctgggaagct cctggccttg 1321 gccctgatta tcatcatggg gattgtacag atatgcaaag gagagtactt ctggctggag 1381 ccaaagaatg catttgagaa tttccaggaa cctgacatcg gcctcgtcgc actggctttc 1441 cttcagggct cctttgccta tggaggctgg aactttctga attacgtgac tgaggagctt 1501 gttgatccct acaagaacct tcccagagcc atcttcatct ccatcccact ggtcacattt 1561 gtgtatgtct ttgccaatgt cgcttatgtc actgcaatgt ccccccagga gctgctggca 1621 tccaacgccg tcgctgtgac ttttggagag aagctcctag gagtcatggc ctggatcatg 1681 cccatttctg ttgccctgtc cacatttgga ggagttaatg ggtctctctt cacctcctct 1741 cggctgttct tcgctggagc ccgagagggc caccttccca gtgtgttggc catgatccac 1801 gtgaagcgct gcaccccaat cccagccctg ctcttcacat gcatctccac cctgctgatg ] 861 ctggtcacca gcgacatgta cacactcatc aactatgtgg gcttcatcaa ctacctcttc 1921 tatggggtca cggttgctgg acagatagtc cttcgctgga agaagcctga tatcccccgc 1981 cccatcaaga tcaacctgct gttccccatc atctacttgc tgttctgggc cttcctgctg 2041 gtcttcagcc tgtggtcaga gccggtggtg tgtggcattg gcctggccat catgctgaca 2101 ggagtgcctg tctatttcct gggtgtttac tggcaacaca agcccaagtg tttcagtgac 2161 ttcattgagc tgctaaccct ggtgagccag aagatgtgtg tggtcgtgta ccccgaggtg 2221 gagcggggct cagggacaga ggaggctaat gaggacatgg aggagcagca gcagcccatg 2281 taccaaccca 
ctcccacgaa ggacaaggac gtggcggggc agccccagcc ctgaggacca 2341 ccattccctg gctactctct ccttcctccc ccttttatcc tacctccctg ccttggtcct 2401 gccaacacat gcgagtacac acacacccct ctctctgctt ttgtcaggca gtggtaggac 2461 tttggtgtgg gtggtgagaa attgtaaaca aaaactgaca ttcataccca aagaaccagc 2521 ctctcacccc agggtccatg tcccaggccc cactccagtg ctgcccacac tcccagctgc 2581 tggaggagag gggagatgcc aaggtgccct gcaggacctc cctccgggcc acaccctcag 2641 ctgcctcttc aggaaccgga gctcattact gccttccctc ccagggaggc cccttcagag 2701 aggagaggcc acaggagctg cattgtgggg ggacaggctc aagcaattct gtccccatca 2761 aggggtcagc tggagagacc caagacccta tctgttcacc agggacccaa aatccaaggg Table 24 cont'd
2821 gatgcttccc tctgccctct ttcctgcccc tccccatcat acctgcaccc accccagcca 2881 gggctccctg tccagaattc ggttctcctc aggacgccaa ctcccagagc taaggaccaa 2941 ggagaagaac agcctctcca cccccaagcc aggcggttga ggaacatatt gagaaaggtt 3001 cagattgcag aaacccagcc ctgcccctgc ctcctgcatc cagcccccaa catggtgcca 3061 aagcttccag aagccaaaaa gcttctgatt tttaaggtag tgggcatctc tctcctaatg 3121 acgaagctgc tcagcaactc cacctgcccg ccgcaggaag gagcagtccc ctgctatccc 3181 tgcagccact cccagcacac ccgcacacag ccagcaccac cgcccccacc gtgcacttct 3241 cctctctggg ccttggcttg ggaccaggta cgaaggatcc ccaagccctt caggcctgag 3301 atcagagcca gatcagcctt aagtcacctc ccatccaaga acttggccta aaaatactcc 3361 cctatttcta accctcagga cggatctgat attaaatgcc ttccctggga ggaagggtgc 3421 tttccccctc cctagaggtg cccattccat accctgggag actgaggaga gcattggctg 3481 aagcccagtt cctttcccat ccatccccaa ctccaataat cccccactcc tcgcaggtct 3541 cagtgtcatg ctgtcttggg gcagggtgaa agggtagtgg cagcagggcg cccactctgg 3601 agatcctcaa aaaaggccct cctctgtggc tggcagcctc tgacctttcc ctgggcttca 3661 aaggaaggct atggagtttg ctgtgggccc tgcaaccttc ccagccactc ctgctgcact 3721 aaggacttag gatcctttta tcacaaatcg ggattctctc ccccaccccg aattctgtct 3781 gcttaaactg gaatacacag gagcccttcc tggcctggat ggtgtctccc agcttccccg 3841 cccagcttgc ccaccccata gttggtgaga tgccaagttt ggtctgagtt gtgacccctt 3901 cagagtagat gcccggcagg ctggggttgg cccctggagg gtcaggggac catcttctta 3961 ttccctcttt tctcattcct ccaacttcct cccctccttc aattattttt ttgtaaagtt
4021 gatgccttac tttttggata aatatttttg aagctggtat ttctatttct tttggatttt
4081 ttttaatgta aggttgtttt gggggatgga gttagaacct taatgataat ttctttcgtt 4141 tggtgtaggt tttagagatt tgttttgtgg agaggttttt ttcttttgat gtaataaaat 4201 ttaaaatgga aatgaaaaaa aaaaaaaaaa aaaaaaa
Exemplary Target Sequence for 216092_s_at
SEQ ID NO: 74
tgacctttccctgggcttcaaaggaaggctatggagtttgctgtgggccctgcaaccttc ccagccactcctgctgcactaaggacttaggatccttttataacaaagtccaccccgaat tctgtctgcttaaactggaatacacaggagcccttcctggcctggatggtgtctcccagc ttccccgcccagcttgcccaccccatagttggtgagatgccaagtttggtctgagttgtg accccttcagagtagatgcccggcaggctggggttggcccctggagggtcaggggaccat cttcttattccctcttttctcattcctccaacttcctcccctccttcaattatttttttg
taaagttgatgccttactttttggataaatatttttgaagctggtatttctatttctttt
ggattttttttaatgtaaggttgttttgggggatggagttagaaccttaatgataatttc
tttcgtttggtgtaggttttagagatttgttttgtggagaggttt
Sample Probes:
SEQ ID NO: 75
TGACCTTTCCCTGGGCTTCAAAGGA
SEQ ID NO: 76
AGAGATTTGTTTTGTGGAGAGGTTT
Table 25: X20: MCM5
Reference Sequence: NM_006739.3 GI:143770796;SEQ ID NO: 77
1 accgcctctt gtttttcccg cgaaactcgg cggctgagcg tggaggttct tgtctcccct 61 ggtttgtgaa gtgcggaaaa ccagaggcgc agtcatgtcg ggattcgacg atcctggcat 121 tttctacagc gacagcttcg ggggcgacgc ccaggccgac gaggggcagg cccgcaaatc 181 gcagctgcag aggcgcttca aggagttcct gcggcagtac cgagtgggca ccgaccgcac 241 gggcttcacc ttcaaataca gggatgaact caagcggcat tacaacctgg gggagtactg 301 gattgaggtg gagatggagg atctggccag ctttgatgag gacctggccg actacttgta 361 caagcagcca gccgagcacc tgcagctgct ggaggaagct gccaaggagg tagctgatga 421 ggtgacccgg ccccggcctt ctggggagga ggtgctccag gacatccagg tcatgctcaa 481 gtcggacgcc agcccttcca gcattcgtag cctgaagtcg gacatgatgt cacacctggt 541 gaagatccct ggcatcatca tcgcggcctc tgcggtccgt gccaaggcca cccgcatctc 601 tatccagtgc cgcagctgcc gcaacaccct caccaacatt gccatgcgcc ctggcctcga 661 gggctatgcc ctgcccagga agtgcaacac agatcaggct gggcgcccca aatgcccatt 721 ggacccgtac ttcatcatgc ccgacaaatg caaatgcgtg gacttccaga ccctgaagct 781 gcaggagctg cctgatgcag tcccccacgg ggagatgccc agacacatgc agctctactg 841 cgacaggtac ctgtgtgaca aggtcgtccc tgggaacagg gttaccatca tgggcatcta 901 ctccatcaag aagtttggcc tgactaccag caggggccgt gacagggtgg gcgtgggcat 961 ccgaagctcc tacatccgtg tcctgggcat ccaggtggac acagatggct ctggccgcag 1021 ctttgctggg gccgtgagcc cccaggagga ggaggagttc cgtcgcctgg ctgccctccc 1081 aaatgtctat gaggtcatct ccaagagcat cgccccctcc atctttgggg gcacagacat 1 141 gaagaaggcc attgcctgcc tgctctttgg gggctcccga aagaggctcc ctgatggact 1201 tactcgccga ggagacatca acctgctgat gctaggggac cctgggacag ccaagtccca 1261 gcttctgaag tttgtggaga agtgttctcc cattggggta tacacgtctg ggaaaggcag 1321 cagcgcagct ggactgacag cctcggtgat gagggaccct tcgtcccgga atttcatcat 1381 ggagggcgga gccatggtcc tggccgatgg tggggtcgtc tgtattgacg agtttgacaa 1441 gatgcgagaa gatgaccgtg tggcaatcca cgaagccatg gagcagcaga ccatctctat 1501 cgccaaggct gggatcacca ccaccctgaa ctcccgctgc tccgtcctgg ctgctgccaa 1561 ctcagtgttc ggccgctggg atgagacgaa gggggaggac aacattgact tcatgcccac 1621 catcttgtcg cgcttcgaca tgatcttcat cgtcaaggat gagcacaatg aggagaggga 1681 tgtgatgctg gccaagcatg 
tcatcactct gcacgtgagc gcactgacac agacacaggc 1741 tgtggagggc gagattgacc tggccaagct gaagaagttt attgcctact gccgagtgaa 1801 gtgtggcccc cggctgtcag cagaggctgc agagaaactg aagaaccgct acatcatcat 1861 gcggagcggg gcccgtcagc acgagaggga cagtgaccgc cgctccagca tccccatcac 1921 tgtgcggcag ctggaggcca ttgtgcgcat cgcggaagcc ctcagcaaga tgaagctgca 1981 gcccttcgcc acagaggcag atgtggagga ggccctgcgg ctcttccaag tgtccacgtt 2041 ggatgctgcc ttgtccggta ccctgtcagg ggtggagggc ttcaccagcc aggaggacca 2101 ggagatgctg agccgcatcg agaagcagct caagcgccgc tttgccattg gctcccaggt 2161 gtctgagcac agcatcatca aggacttcac caagcagaaa tacccggagc acgccatcca 2221 caaggtgctg cagctcatgc tgcggcgcgg cgagatccag catcgcatgc agcgcaaggt 2281 tctctaccgc ctcaagtgag tcgcgccgcc tcactggact catggactcg cccacgcctc 2341 gcccctcctg ccgctgcctg ccattgacaa tgttgctggg acctctgcct ccccactgca 2401 gccctcgaac ttcccaggca ccctcctttc tgccccagag gaaggagctg tagtgtcctg 2461 ctgcctctgg gcgcccgcct ctagcgcggt tctgggaagt gtgcttttgg catccgttaa 2521 taataaagcc acggtgtgtt caggtaaaaa aaaaaaaaaa aaaaaaaa
Exemplary Target Sequence for HG-U133A:201755_AT
SEQ ID NO: 78
catcgcggaagccctcagcaagatgaagctgcagcccttcgccacagaggcagatgtgga
ggaggccctgcggctcttccaagtgtccacgttggatgctgccttgtccggtaccctgtc aggggtggagggcttcaccagccaggaggaccaggagatgctgagccgcatcgagaagca gctcaagcgccgctttgccattggctcccaggtgtctgagcacagcatcatcaaggactt caccaagcagaaatacccggagcacgccatccacaaggtgctgcagctcatgctgcggcg cggcgagatccagcatcgcatgcagcgcaaggttctctaccgcctcaagtgagtcgcgcc gctcactggactcatggactcgccacgctcgccctccttgccgctgcctgccattgacaa tgttgctgggacctctgcctccccactgcagccctcgaacttcccaggcaccctcctttc tgccccagaggaaggagctgtagtgtcctgctgcctctgggcgcccgctctagcgggttc tgggaa
Sample Probes:
SEQ ID NO: 79
CATCGCGGAAGCCCTCAGCAAGATG
SEQ ID NO: 80
CGCCCGCTCTAGCGGGTTCTGGGAA
Table 26: X21: MELK
Reference Sequence: NM_014791.2 GI:41281490;SEQ ID NO: 81
1 cgaaaagatt cttaggaacg ccgtaccagc cgcgtctctc aggacagcag gcccctgtcc 61 ttctgtcggg cgccgctcag ccgtgccctc cgcccctcag gttctttttc taattccaaa 121 taaacttgca agaggactat gaaagattat gatgaacttc tcaaatatta tgaattacat 181 gaaactattg ggacaggtgg ctttgcaaag gtcaaacttg cctgccatat ccttactgga 241 gagatggtag ctataaaaat catggataaa aacacactag ggagtgattt gccccggatc 301 aaaacggaga ttgaggcctt gaagaacctg agacatcagc atatatgtca actctaccat 361 gtgctagaga cagccaacaa aatattcatg gttcttgagt actgccctgg aggagagctg 421 tttgactata taatttccca ggatcgcctg tcagaagagg agacccgggt tgtcttccgt 481 cagatagtat ctgctgttgc ttatgtgcac agccagggct atgctcacag ggacctcaag 541 ccagaaaatt tgctgtttga tgaatatcat aaattaaagc tgattgactt tggtctctgt 601 gcaaaaccca agggtaacaa ggattaccat ctacagacat gctgtgggag tctggcttat 661 gcagcacctg agttaataca aggcaaatca tatcttggat cagaggcaga tgtttggagc 721 atgggcatac tgttatatgt tcttatgtgt ggatttctac catttgatga tgataatgta 781 atggctttat acaagaagat tatgagagga aaatatgatg ttcccaagtg gctctctccc 841 agtagcattc tgcttcttca acaaatgctg caggtggacc caaagaaacg gatttctatg 901 aaaaatctat tgaaccatcc ctggatcatg caagattaca actatcctgt tgagtggcaa 961 agcaagaatc cttttattca cctcgatgat gattgcgtaa cagaactttc tgtacatcac 1021 agaaacaaca ggcaaacaat ggaggattta atttcactgt ggcagtatga tcacctcacg 1081 gctacctatc ttctgcttct agccaagaag gctcggggaa aaccagttcg tttaaggctt 1 141 tcttctttct cctgtggaca agccagtgct accccattca cagacatcaa gtcaaataat 1201 tggagtctgg aagatgtgac cgcaagtgat aaaaattatg tggcgggatt aatagactat 1261 gattggtgtg aagatgattt atcaacaggt gctgctactc cccgaacatc acagtttacc 1321 aagtactgga cagaatcaaa tggggtggaa tctaaatcat taactccagc cttatgcaga 1381 acacctgcaa ataaattaaa gaacaaagaa aatgtatata ctcctaagtc tgctgtaaag 1441 aatgaagagt actttatgtt tcctgagcca aagactccag ttaataagaa ccagcataag 1501 agagaaatac tcactacgcc aaatcgttac actacaccct caaaagctag aaaccagtgc 1561 ctgaaagaaa ctccaattaa aataccagta aattcaacag gaacagacaa gttaatgaca 1621 ggtgtcatta gccctgagag gcggtgccgc tcagtggaat tggatctcaa ccaagcacat 1681 atggaggaga ctccaaaaag 
aaagggagcc aaagtgtttg ggagccttga aagggggttg 1741 gataaggtta tcactgtgct caccaggagc aaaaggaagg gttctgccag agacgggccc 1801 agaagactaa agcttcacta taatgtgact acaactagat tagtgaatcc agatcaactg 1861 ttgaatgaaa taatgtctat tcttccaaag aagcatgttg actttgtaca aaagggttat 1921 acactgaagt gtcaaacaca gtcagatttt gggaaagtga caatgcaatt tgaattagaa 1981 gtgtgccagc ttcaaaaacc cgatgtggtg ggtatcagga ggcagcggct taagggcgat 2041 gcctgggttt acaaaagatt agtggaagac atcctatcta gctgcaaggt ataattgatg 2101 gattcttcca tcctgccgga tgagtgtggg tgtgatacag cctacataaa gactgttatg 2161 atcgctttga ttttaaagtt cattggaact accaacttgt ttctaaagag ctatcttaag 2221 accaatatct ctttgttttt aaacaaaaga tattattttg tgtatgaatc taaatcaagc 2281 ccatctgtca ttatgttact gtctttttta atcatgtggt tttgtatatt aataattgtt
2341 gactttctta gattcacttc catatgtgaa tgtaagctct taactatgtc tctttgtaat 2401 gtgtaatttc tttctgaaat aaaaccattt gtgaatataa aaaaaaaaaa aaaaaaaaaa 2461 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa a
Exemplary Target Sequence for HG-U133A:204825_AT
SEQ ID NO: 82
atgtggtgggtatcaggaggcagcggcttaagggcgatgcctgggtttacaaaagattag tggaagacatcctatctagctgcaaggtataattgatggattcttccatcctgccggatg agtgtgggtgtgatacagcctacataaagactgttatgatcgctttgattttaaagttca ttggaactaccaacttgtttctaaagagctatcttaagaccaatatctctttgtttttaa acaaaagatattattttgtgtatgaatctaaatcaagcccatctgtcattatgttactgt cttttttaatcatgtggttttgtatattaataattgttgactttcttagattcacttcca tatgtgaatgtaagctcttaactatgtctctttgtaa
Sample Probes:
SEQ ID NO: 83
AAGACTGTTATGATCGCTTTGATTT
SEQ ID NO: 84
GGAGGCAGCGGCTTAAGGGCGATGC

Claims

WHAT IS CLAIMED IS:
1. An in vitro method to identify whether a breast cancer patient is a treatment responder or non-responder, comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes in a tumor or blood sample from the patient, wherein the level of expression of the CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes indicates that the subject is a responder or non-responder.
2. An in vitro method to identify whether a breast cancer patient is a treatment responder or non-responder, comprising: determining the expression level of a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes in a tumor or blood sample from the patient, wherein the level of expression of the ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes indicates that the subject is a responder or non-responder.
3. An in vitro method to identify whether a breast cancer patient is a treatment responder or non-responder, comprising: determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a tumor or blood sample from the patient; inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score; and comparing the score to a cutoff value to identify the patient as a treatment responder or nonresponder.
4. A computing device implemented method to identify whether a breast cancer patient is a treatment responder or non-responder, comprising: a) receiving gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a tumor or blood sample from the breast cancer patient at a receiver module; b) inputting the levels of gene expression of the first set of genes or the second set of genes into a predictive function to obtain a score in a scoring module; c) comparing the score to a cutoff value in a diagnostic module to identify the patient as a treatment responder or nonresponder; and d) communicating the identification of the patient as a treatment responder or nonresponder to a user.
5. The methods of any one of claims 1 to 4, wherein the expression level of the first set of genes and the second set of genes is determined by measuring mRNA levels of the genes in the sample.
6. The methods of any one of claims 1 to 4, wherein the gene expression levels are measured by microarray analysis or PCR.
7. The methods of any one of claims 1 to 6, wherein the breast cancer patient is in a set of breast cancer patients.
8. The method of claim 7, wherein the set of breast cancer patients is treated with an agent that treats breast cancer.
9. The method of any one of claims 1 to 6, further comprising treating the patient identified as a nonresponder without any preoperative chemotherapy.
10. The method of any one of claims 3 to 4, wherein the predictive function is
Figure imgf000108_0001
wherein
Y1 = X1,
Y2 = [(X2)^(-1)] * 10^2,
Y3 = [(X3)^(-3.8)] * 10^4,
Y4 = [(X4)^(-5)] * 10^5,
Y5 = [(X5)^(-1.5)] * 10^2,
Y6 = [(X6)^(-1)] * 10^2,
Y7 = [(X7)^(-4)] * 10^4,
Y8 = [(X8)^(-1.8)] * 10^2, and
Y9 = [(X9)^(-6.2)] * 10^4, and
X1 = 208712_at = CCND1,
X2 = 41660_at = CELSR1,
X3 = 207470_at = DKFZp566H0824,
X4 = 204231_s_at = FAAH,
X5 = 214768_x_at = IGKV1-5,
X6 = 210150_s_at = LAMA5,
X7 = 202780_at = OXCT1,
X8 = 216300_x_at = RARA, and
X9 = 217825_s_at = UBE2J1.
11. The method of any one of claims 3 to 4, wherein the predictive function is
Figure imgf000109_0001
wherein
Y10 = ln(X10),
Y11 = [(X11)^(0.1)] * 10,
Y12 = (X12)^(0.4),
Y13 = (X13)^(0.2),
Y14 = [(X14)^(0.05)] * 10,
Y15 = [(X15)^(-0.7)] * 10,
Y16 = [(X16)^(0.1)] * 10,
Y17 = [(X17)^(0.05)] * 10,
Y18 = X18,
Y19 = ln(X19),
Y20 = ln(X20), and
Y21 = ln(X21),
and wherein
X10 = 205225_at (M) = ESR1;
X11 = 213134_x_at (R) = BTG3;
X12 = 200790_at (R) = ODC1;
X13 = 216237_s_at (R) = MCM5;
X14 = 204822_at (R) = TTK;
X15 = 219438_at (R) = NKAIN1;
X16 = 205059_s_at (R) = IDUA;
X17 = 213113_s_at (R) = SLC43A3;
X18 = 221253_s_at (R) = TXNDC5;
X19 = 216092_s_at (M) = SLC7A8;
X20 = 201755_at (M) = MCM5; and
X21 = 204825_at (M) = MELK.
12. The method of claim 3 or claim 4, further comprising processing the gene expression level data for each gene of the first or second set of genes using the RMA or MAS 5 algorithm.
13. The method of claim 3 or claim 4, further comprising determining the cutoff value by determining a first set of scores from a set of samples from known responders and a second set of scores from known nonresponders and setting the cutoff value at the midpoint of the distance between the mean of the first set of scores and the mean of the second set of scores; and validating the cutoff value for accuracy by comparing another set of scores from known and unknown samples and determining whether those samples are properly classified.
14. A method for screening agents for treating breast cancer, comprising:
a) identifying a breast cancer sample as from a responder or nonresponder by determining the expression level of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a breast cancer or blood sample from the patient; and b) contacting a breast cancer identified as a nonresponder with a potential agent for treating breast cancer; and c) determining whether the agent decreases breast cancer growth.
15. The method of claim 14, further comprising selecting the agent that inhibits the growth or spread of breast cancer from nonresponders.
16. A kit for the prognosis of breast cancer comprising: reagents for detecting the expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes in a tumor sample from the patient.
17. The kit of claim 18, wherein the reagents comprise one or more probes, each probe hybridizes to a different gene in the sets of genes.
18. The kit of claim 19, wherein the one or more probes are attached to a surface.
19. The kit of claim 18, wherein the reagents comprise one or more primers, each primer hybridizes to a different gene in the set of genes.
20. The kit of any one of claims 16 to 19, further comprising a computer readable medium comprising instructions to receive gene expression of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient, inputting the levels of gene expression of a first set of genes or a second set of genes into a predictive function to obtain a score; c) comparing the score to a cutoff value in a diagnostic module to identify the patient as a treatment responder or nonresponder; and d) communicating the identification of the patient as a treatment responder or nonresponder to a user.
21. The kit of any one of claims 16 to 20, further comprising instructions for communicating gene expression levels of a first set of genes comprising CCND1, CELSR1, DKFZp566H0824, FAAH, IGKV1-5, LAMA5, OXCT1, RARA, and UBE2J1 genes or a second set of genes comprising ESR1, BTG3, ODC1, MCM5, TTK, NKAIN1, IDUA, SLC43A3, TXNDC5, SLC7A8, and MELK genes or both sets of genes obtained from a sample from the breast cancer patient to a computing device over a wireless network.
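The variable transformations recited in claim 10 can be illustrated with a minimal Python sketch. It rests on stated assumptions: the scale factors that appear in the extracted text as "102", "104", and "105" are read here as the powers of ten 10^2, 10^4, and 10^5, and the names TRANSFORMS and transform_first_set are hypothetical. The predictive function itself is given only as an image in the claims, so the sketch stops at the transformed values Y1 to Y9 and does not compute a score.

```python
# Exponent and scale factor for each gene of the first set (claim 10).
# ASSUMPTION: the scale factors are powers of ten (10^2, 10^4, 10^5).
TRANSFORMS = {
    "CCND1": (1.0, 1.0),            # Y1 = X1
    "CELSR1": (-1.0, 1e2),          # Y2
    "DKFZp566H0824": (-3.8, 1e4),   # Y3
    "FAAH": (-5.0, 1e5),            # Y4
    "IGKV1-5": (-1.5, 1e2),         # Y5
    "LAMA5": (-1.0, 1e2),           # Y6
    "OXCT1": (-4.0, 1e4),           # Y7
    "RARA": (-1.8, 1e2),            # Y8
    "UBE2J1": (-6.2, 1e4),          # Y9
}


def transform_first_set(expression):
    """Map gene -> expression level X to gene -> transformed value Y.

    `expression` must hold a strictly positive level for every gene in
    TRANSFORMS, because negative exponents are undefined at zero.
    """
    transformed = {}
    for gene, (power, scale) in TRANSFORMS.items():
        x = expression[gene]
        if x <= 0:
            raise ValueError(f"expression level for {gene} must be positive")
        transformed[gene] = (x ** power) * scale
    return transformed
```

Keeping the exponents and scale factors in one table makes it easy to compare them line by line with the claim text and to correct them if the powers-of-ten reading turns out to be wrong.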
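The cutoff determination of claim 13 and the comparison step of claims 3 and 4 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the scores are taken as already computed by a predictive function, and the handling of a score exactly equal to the cutoff is an arbitrary choice that the claims do not specify. Which side of the cutoff counts as "responder" is inferred from the two group means rather than assumed.

```python
from statistics import mean


def midpoint_cutoff(responder_scores, nonresponder_scores):
    """Return (cutoff, responders_high) from scores of known samples.

    The cutoff is the midpoint between the two group means (claim 13).
    responders_high is True when responders score higher on average.
    """
    r_mean = mean(responder_scores)
    n_mean = mean(nonresponder_scores)
    if r_mean == n_mean:
        raise ValueError("group means coincide; no cutoff separates them")
    return (r_mean + n_mean) / 2.0, r_mean > n_mean


def classify(score, cutoff, responders_high):
    """Label a score by the side of the cutoff it falls on.

    ASSUMPTION: a score equal to the cutoff is treated as 'not above'.
    """
    above = score > cutoff
    return "responder" if above == responders_high else "nonresponder"


def validation_accuracy(labeled_scores, cutoff, responders_high):
    """Fraction of (score, known_label) pairs that are classified correctly."""
    hits = sum(
        classify(score, cutoff, responders_high) == label
        for score, label in labeled_scores
    )
    return hits / len(labeled_scores)
```

A typical use would derive the cutoff from one set of known samples and then call validation_accuracy on a separate set of known samples, which mirrors the validation step the claim describes.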
PCT/US2012/060351 2011-10-17 2012-10-16 Methods and kits for selection of a treatment for breast cancer WO2013059152A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161548041P 2011-10-17 2011-10-17
US61/548,041 2011-10-17
US201261641532P 2012-05-02 2012-05-02
US61/641,532 2012-05-02

Publications (2)

Publication Number Publication Date
WO2013059152A2 (en) 2013-04-25
WO2013059152A3 (en) 2015-06-25

Family

ID=48141604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/060351 WO2013059152A2 (en) 2011-10-17 2012-10-16 Methods and kits for selection of a treatment for breast cancer

Country Status (1)

Country Link
WO (1) WO2013059152A2 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040225449A1 (en) * 1999-06-28 2004-11-11 Bevilacqua Michael P. Systems and methods for characterizing a biological condition or agent using selected gene expression profiles
CA2528669A1 (en) * 2003-06-09 2005-01-20 The Regents Of The University Of Michigan Compositions and methods for treating and diagnosing cancer
US8065093B2 (en) * 2004-10-06 2011-11-22 Agency For Science, Technology, And Research Methods, systems, and compositions for classification, prognosis, and diagnosis of cancers
US20110166838A1 (en) * 2008-06-16 2011-07-07 Sividon Diagnostics Algorithms for outcome prediction in patients with node-positive chemotherapy-treated breast cancer
US20110165566A1 (en) * 2009-07-09 2011-07-07 Wittliff James L Methods of optimizing treatment of breast cancer
EP2507396A4 (en) * 2009-12-01 2013-06-19 Precision Therapeutics Inc Multi drug response markers for breast cancer cells

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115779090A (en) * 2022-11-30 2023-03-14 江苏省人民医院(南京医科大学第一附属医院) Application of substance for improving UBE2J1 expression in preparation of anti-cancer drugs
CN115779090B (en) * 2022-11-30 2024-02-20 江苏省人民医院(南京医科大学第一附属医院) Application of substance for improving UBE2J1 expression in preparation of anticancer drugs

Also Published As

Publication number Publication date
WO2013059152A3 (en) 2015-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12841452

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/08/2014)

122 Ep: pct app. not ent. europ. phase

Ref document number: 12841452

Country of ref document: EP

Kind code of ref document: A2