WO2018174861A1 - Methods and compositions for detecting early stage breast cancer with RNA-seq expression profiling

Info

Publication number
WO2018174861A1
WO2018174861A1 PCT/US2017/023475 US2017023475W WO2018174861A1 WO 2018174861 A1 WO2018174861 A1 WO 2018174861A1 US 2017023475 W US2017023475 W US 2017023475W WO 2018174861 A1 WO2018174861 A1 WO 2018174861A1
Authority
WO
WIPO (PCT)
Prior art keywords
breast cancer
reagents
sample
biomarker
target analytes
Prior art date
Application number
PCT/US2017/023475
Other languages
French (fr)
Inventor
Bruce Xuefeng Ling
Limin Chen
Shiying Hao
Original Assignee
Mprobe Inc.
Priority date
Filing date
Publication date
Application filed by Mprobe Inc.
Priority to PCT/US2017/023475
Publication of WO2018174861A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to expression profiling to differentiate early stage breast cancer patients from normal subjects.
  • BC: breast cancer
  • CA125: cancer antigen-125
  • CEA: carcinoembryonic antigen
  • CA199: antigen-199
  • RNA-seq: RNA sequencing
  • Serum RNAs and proteins found to correlate with tumor status and/or patient survival are increasingly being applied as diagnostic and prognostic indicators in various carcinomas.
  • RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with the microarray platform, RNA-seq has less of the background noise caused by image analysis and is more sensitive in detecting low-abundance transcripts and transcripts with higher fold changes in expression. In this invention, we use RNA-seq to find biomarkers for the early detection of breast cancer.
  • methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 3, or any sub-combinations thereof, in a sample from a subject.
  • methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention.
  • biomarkers are selected from Table 3, or any sub-combinations thereof.
  • a method comprises detecting the level of one or more biomarkers in a sample from a subject.
  • a method of monitoring breast cancer (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having 50 biomarker proteins from breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL
  • N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, methods comprise panels of any combination of the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C,
  • methods comprise comparing biomarker(s) level to a reference value/range or a threshold. In some embodiments, deviation of the biomarker(s) level from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
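The comparison against a reference range described above can be sketched as follows. This is a minimal illustration only: the `ReferenceRange` type, the function names, and all numeric values are hypothetical and do not come from the source, which does not specify reference values for any biomarker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical reference range for one biomarker (arbitrary units)."""
    low: float
    high: float


def compare_to_reference(level: float, ref: ReferenceRange) -> str:
    """Report whether a measured level is below, within, or above the range."""
    if level < ref.low:
        return "below"
    if level > ref.high:
        return "above"
    return "within"


def flag_deviations(levels: dict, references: dict) -> dict:
    """Compare every measured biomarker that has a reference range."""
    return {
        name: compare_to_reference(level, references[name])
        for name, level in levels.items()
        if name in references
    }


# Illustrative values only; not taken from the patent.
references = {"MMP11": ReferenceRange(1.0, 4.0), "FHL1": ReferenceRange(2.0, 6.0)}
levels = {"MMP11": 5.2, "FHL1": 1.1}
print(flag_deviations(levels, references))
```

Whether a deviation in a given direction is indicative of disease depends on the biomarker; that interpretation step is deliberately left out of this sketch.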
  • each biomarker may be a protein biomarker.
  • the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
  • each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
  • each biomarker capture reagent may be an antibody or an aptamer.
  • a biomarker is an RNA transcript.
  • the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
  • each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
  • each biomarker capture reagent may be a nucleic acid probe.
  • the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.).
  • the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
  • methods further comprise treating the subject for breast cancer.
  • treating the subject for breast cancer comprises a treatment regimen of administering one or more of chemotherapy, radiation, surgery, etc.
  • biomarkers described herein are monitored before, during, and/or after treatment.
  • methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from breast cancer, but not providing interventional treatment of the breast cancer.
  • palliative care is pursued in place of breast cancer treatment.
  • palliative care is provided in addition to treatment for breast cancer.
  • a method comprises detecting the level of one or more breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2,
  • the method further comprises measuring the level of one or more of the biomarkers at a second time point.
  • breast cancer severity is improving (e.g., declining) if the level of said biomarkers is improved at the second time point relative to the first time point.
  • biomarkers or panels thereof provide a prognosis regarding the future course of a breast cancer in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.).
  • treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.) are made based on the detection and/or quantification of one or more (e.g., 1, 2, 3, 4, 5) of the biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300
  • kits are provided.
  • a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD,
  • a kit comprises N capture/detection reagents.
  • N is 1 to 50.
  • N is 2 to 50.
  • N is 3 to 50.
  • N is 4 to 50.
  • N is 5 to 50.
  • At least one of the 51 biomarker proteins is selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4,
  • compositions comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, at least five capture/detection reagents that each specifically bind to a different biomarker selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1,
  • FIG. 1: The analysis procedure for RNA sequencing data. Each step and the packages used in alignment, quantification, and differential expression (DE) analysis are described in this figure.
  • FIG. 2: Scatterplot of calculated probabilities of breast cancer with the selected 50-gene panel.
  • the model was trained with the Random Forest algorithm; 732/491 case/control samples (of 814/546 in total) were randomly selected to train the model.
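The random selection of training samples described above can be sketched as below. This is an assumption-laden illustration: the sample identifiers, the seeds, and the `split_train_holdout` helper are hypothetical, and the source does not say how the selection was implemented or what was done with the samples left out. Only the counts (732 of 814 cases, 491 of 546 controls, roughly 90% of each group) come from the text. Fitting the Random Forest itself is not shown.

```python
import random


def split_train_holdout(sample_ids, n_train, seed=0):
    """Randomly pick n_train samples for training; return (train, remainder)."""
    if not 0 <= n_train <= len(sample_ids):
        raise ValueError("n_train must be between 0 and the number of samples")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers; real sample IDs are not given in the source.
cases = [f"case_{i}" for i in range(814)]
controls = [f"control_{i}" for i in range(546)]

train_cases, rest_cases = split_train_holdout(cases, 732, seed=1)
train_controls, rest_controls = split_train_holdout(controls, 491, seed=2)

print(len(train_cases), len(rest_cases))        # 732 82
print(len(train_controls), len(rest_controls))  # 491 55
```

Splitting cases and controls separately keeps the case/control ratio of the training set close to that of the full cohort.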
  • breast cancer biomarkers are provided.
  • By a “biomarker” or “marker” it is meant a molecular entity whose representation in a sample is associated with a disease phenotype.
  • By “breast cancer” it is meant any cancerous growth arising from the breast, for example, Ductal Carcinoma In Situ, Invasive Ductal Carcinoma, Triple Negative Breast Cancer, Inflammatory Breast Cancer, Medullary Carcinoma, Tubular Carcinoma, Mucinous Carcinoma, and the like, as known in the art or as described herein.
  • By a breast cancer “biomarker” or “breast cancer marker” it is meant a molecular entity whose representation in a sample is associated with a breast cancer phenotype, e.g., the presence of breast cancer, the stage of breast cancer, a prognosis associated with the breast cancer, the predictability of the breast cancer being responsive to a therapy, etc.
  • the marker may be said to be differentially represented in a sample having a breast cancer phenotype.
  • Breast cancer biomarkers include proteins that are differentially represented in a breast cancer phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc.
  • By a “gene” or “recombinant gene” it is meant a nucleic acid comprising an open reading frame that encodes the protein.
  • the boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus.
  • a transcription termination sequence may be located 3' to the coding sequence.
  • a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell), and associated regulatory sequences, and may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like.
  • The terms “gene product” or “expression product” are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA; and the polypeptide translation products of such RNA transcripts, i.e. the amino acid product encoded by a gene.
  • a gene product can be, for example, an RNA transcript of the gene, e.g. an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, full length polypeptide, splice variants of the full length polypeptide, post-translationally modified polypeptide, and fragments of the gene product, e.g. peptides, etc.
  • an elevated level of marker or marker activity may be associated with the breast cancer phenotype.
  • a reduced level of marker or marker activity may be associated with the breast cancer phenotype.
  • T is used to categorize the pathology of the tumor:
  • TX: Primary tumor cannot be assessed.
  • T0: No evidence of primary tumor.
  • Tis: Carcinoma in situ (DCIS, LCIS, or Paget disease of the nipple with no associated tumor mass).
  • T1 (includes T1a, T1b, and T1c): Tumor is 2 cm (3/4 of an inch) or less across.
  • T2: Tumor is more than 2 cm but not more than 5 cm (2 inches) across.
  • T3: Tumor is more than 5 cm across.
  • T4 (includes T4a, T4b, T4c, and T4d): Tumor of any size growing into the chest wall or skin. This includes inflammatory breast cancer.
  • N describes the pathology of local lymph nodes:
  • NX: The regional lymph nodes cannot be evaluated.
  • N0: The cancer has not spread to the regional lymph nodes.
  • N1: Cancer has spread to 1 to 3 axillary (underarm) lymph node(s), and/or tiny amounts of cancer are found in internal mammary lymph nodes (those near the breast bone) on sentinel lymph node biopsy.
  • N2: Cancer has spread to 4 to 9 lymph nodes under the arm, or cancer has enlarged the internal mammary lymph nodes.
  • N3: Any of the following:
  • N3a: either cancer has spread to 10 or more axillary lymph nodes, with at least one area of cancer spread greater than 2 mm, OR cancer has spread to the lymph nodes under the collar bone (infraclavicular nodes), with at least one area of cancer spread greater than 2 mm.
  • N3b: either cancer is found in at least one axillary lymph node (with at least one area of cancer spread greater than 2 mm) and has enlarged the internal mammary lymph nodes, OR cancer has spread to 4 or more axillary lymph nodes (with at least one area of cancer spread greater than 2 mm), and tiny amounts of cancer are found in internal mammary lymph nodes on sentinel lymph node biopsy.
  • N3c: Cancer has spread to the lymph nodes above the collar bone (supraclavicular nodes) with at least one area of cancer spread greater than 2 mm.
  • M describes the extent, if any, of metastasis:
  • M0: The disease has not metastasized.
  • M1: Cancer has spread to distant organs (most often to the bones, lungs, brain, or liver).
  • stage 0 stage 1
  • stage II stage 2
  • Table 1: The TNM classification for staging of breast cancer. Columns: Stage, T, N, M.
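The size-based T categories listed above can be expressed as a small lookup, sketched here for illustration only. The function and parameter names are hypothetical, representing "no evidence of primary tumor" as a size of 0 is an assumption of this sketch, and the sub-categories (T1a to T1c, T4a to T4d) are not modelled. It is not a clinical staging tool.

```python
from typing import Optional


def t_category(size_cm: Optional[float],
               invades_chest_wall_or_skin: bool = False,
               in_situ: bool = False) -> str:
    """Map tumor findings to a T category using the cut-offs quoted in the text."""
    if invades_chest_wall_or_skin:
        return "T4"   # any size growing into the chest wall or skin
    if in_situ:
        return "Tis"  # carcinoma in situ
    if size_cm is None:
        return "TX"   # primary tumor cannot be assessed
    if size_cm < 0:
        raise ValueError("size_cm must be non-negative")
    if size_cm == 0:
        return "T0"   # assumption: size 0 encodes 'no evidence of primary tumor'
    if size_cm <= 2:
        return "T1"   # 2 cm or less across
    if size_cm <= 5:
        return "T2"   # more than 2 cm but not more than 5 cm
    return "T3"       # more than 5 cm


def m_category(distant_spread: bool) -> str:
    """M0 if the disease has not metastasized, M1 otherwise."""
    return "M1" if distant_spread else "M0"


print(t_category(1.5), t_category(3.0), t_category(6.2), m_category(False))
```

The N categories depend on several findings at once (node counts, node location, size of the spread), so they are left out of this sketch.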
  • a biomarker level is detected using a capture reagent.
  • the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support.
  • Capture reagent is selected based on the type of analysis to be conducted.
  • Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, FV fragments, single chain FV fragments, nucleic acids, lectins, ligand-binding receptors, affybodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
  • biomarker presence or level is detected using a biomarker/capture reagent complex.
  • the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.
  • biomarker presence or level is detected directly from the biomarker in a biological sample.
  • biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample.
  • capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support.
  • a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example, quantum dots.
  • an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
  • the fluorescent label is a fluorescent dye molecule.
  • the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance.
  • the dye molecule includes an AlexaFluor molecule, such as, for example, AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700.
  • the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different AlexaFluor molecules.
  • the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
  • Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats.
  • for example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of
  • a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level.
  • Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), Pyrogallol (1,2,3-trihydroxybenzene), Lucigenin, peroxyoxalates, Aryl oxalates, Acridinium esters, dioxetanes, and others.
  • the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing).
  • the enzyme catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence.
  • Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
  • the detection method is a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a
  • multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
  • the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
  • Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
  • a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
  • mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed with qPCR).
  • RT-PCR is used to create a cDNA from the mRNA.
  • the cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell.
  • Northern blots, microarrays, RNA-seq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling; Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
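The standard-curve quantification mentioned above for qPCR can be sketched as follows. The sketch assumes the usual qPCR readout, a threshold cycle (Ct) that falls linearly with the log10 of the starting copy number; the term Ct, the function names, and all numbers are assumptions for illustration and are not taken from the source. The fit is an ordinary least-squares line written out by hand so the example needs only the standard library.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of a standard with known copy numbers.
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_ct = [30.1, 26.8, 23.5, 20.2]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
estimate = copies_from_ct(25.15, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimate:.0f}")
```

Converting the estimate to copies per cell, as the text mentions, would additionally require the number of cells that contributed RNA to the reaction.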
  • Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format.
  • monoclonal antibodies and fragments are often used because of their specific epitope recognition.
  • Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
  • Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
  • Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected.
  • the response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
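Reading an unknown sample off an immunoassay standard curve, as described above, can be sketched with piecewise-linear interpolation between the calibrators. This is one simple choice made for illustration; the source does not say which curve model is used, and the function name and all values below are hypothetical. The sketch assumes the signal increases monotonically with concentration.

```python
def interpolate_concentration(signal, standards):
    """Estimate concentration from signal by linear interpolation.

    standards: list of (known_concentration, measured_signal) pairs.
    Raises ValueError if the signal lies outside the calibrated range.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by signal
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal outside the range of the standard curve")
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
    raise ValueError("standard curve needs at least two points")


# Hypothetical calibrators: (concentration in ng/mL, absorbance).
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05), (100.0, 1.85)]
print(interpolate_concentration(0.65, standards))
```

Signals outside the calibrated range are rejected rather than extrapolated, since the curve is only established between the lowest and highest standards.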
  • ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody, and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (I-125) or fluorescence.
  • Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
  • Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays.
  • ELISA enzyme-linked immunosorbent assay
  • FRET fluorescence resonance energy transfer
  • TR-FRET time resolved-FRET
  • biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary
  • detectable labels can be, without limitation, fluorescent, luminescent, or radioactive, or they may absorb visible or ultraviolet light.
  • detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
  • Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
  • the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods.
  • one or more capture reagent/s specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide,
  • the cell sample is produced from a cell block.
  • one or more capture reagent/s specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagent/s in a buffered solution. In another embodiment, fixing and dehydrating are replaced with freezing.

Data Analysis and Reporting

  • results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.).
  • Results, analyses, and/or data e.g., signature, disease score, diagnosis, recommended course, etc. are identified and/or reported as an
  • a result may be produced by receiving or generating data (e.g., test results) and transforming the data to provide an outcome or result.
  • An outcome or result may be determinative of an action to be taken.
  • results determined by methods described herein can be independently verified by further or repeat testing.
  • analysis results are reported (e.g., to a health care professional (e.g., laboratory technician or manager, physician, nurse, or assistant, etc.), patient, researcher, investigator, etc.).
  • a result is provided on a peripheral, device, or component of an apparatus.
  • an outcome is provided by a printer or display.
  • an outcome is reported in the form of a report.
  • an outcome can be displayed in a suitable format that facilitates downstream use of the reported information.
  • Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood of response to treatment, etc.). Such a representation reflects information not determinable in the absence of the method steps described herein. Converting biologic data into understandable characteristics of a subject allows actions to be taken in response to such information.
  • a downstream individual upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject is made.
  • receiving a report refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis.
  • the report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service and the like).
  • the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form.
  • the file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file or a medical record file.
  • a report may be encrypted to prevent unauthorized viewing.
  • systems and methods described herein transform data from one form into another form (e.g., from biomarker levels to diagnostic/prognostic determination, etc.).
  • the terms “transformed”, “transformation”, and grammatical derivations or equivalents thereof refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
  • any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein.
  • the biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein.
  • any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
  • a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained.
  • one or more instructions for manually performing the above steps by a human can be provided.
  • a kit comprises a solid support, a capture reagent, and a signal generating material.
  • the kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.
  • kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample.
  • Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.
  • kits are provided for the analysis of glioma, wherein the kits comprise PCR primers for one or more biomarkers described herein.
  • a kit may further include instructions for use and correlation of the biomarkers.
  • kits may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA.
  • the kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
  • a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs.
  • an algorithm or computer program assigns a score for each biomarker quantified based on said comparison and, in some embodiments, combines the assigned scores for each biomarker quantified to obtain a total score.
  • an algorithm or computer program compares the total score with a predetermined score, and uses the comparison to determine a diagnosis/prognosis.
  • one or more instructions for manually performing the above steps by a human can be provided.
  • therapy is administered to treat breast cancer.
  • therapy is administered to treat complications of breast cancer (e.g., surgery, radiation, chemotherapy).
  • treatment comprises palliative care.
  • methods of monitoring treatment of glioma are provided.
  • the present methods of detecting biomarkers are carried out at a time 0.
  • the method is carried out again at a time 1, and optionally, a time 2, and optionally, a time 3, etc., in order to monitor the progression of breast cancer or to monitor the effectiveness of one or more treatments of breast cancer.
  • Time points for detection may be separated by, for example at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or by 1 year or more.
  • a treatment regimen is altered based upon the results of monitoring (e.g., upon determining that a first treatment is ineffective).
  • the level of intervention may be altered.
  • Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.
  • the raw count RNA sequencing data for breast cancer patients were downloaded from GDC data portal.
  • the patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal.
  • the SRA RNA sequencing data for normal breast tissue were downloaded from GTEx data portal through dbGaP (Table 2).
  • the two data sets were then manually curated based on the available stage and grade information from patient clinical data.
  • Genomic sequencing pipeline for RNA sequencing data.
  • the entire RNA-seq pipeline was divided into two parts for GTEx data: alignment and quantification ( Figure 1).
  • the alignment step consists of: SRA to bam conversion using SRA Toolkit (SRA Toolkit development team), bam to fastq conversion using Biobambam (Tischler G et al., 2014), and fastq to aligned bam conversion using STAR (Alex D et al., 2016).
  • the quantification step consists of: quality improvement filtering using Fixmate
  • the gene expression profile is then pre-filtered based on the mean expression per gene.
  • the filtered profile is then normalized using quantile metric and is converted into log2 scale.
  • the ComBat package (from edgeR, http://www.r-project.org) is then used to perform further normalization between GDC case, GDC control, and GTEx control to minimize the difference between normal controls from the two databases (Figure 1). Differentiated gene selection.
  • the normalized gene profile is then analyzed by linear model using R package 'limma' (http://www.r-project.org/).
  • the 50 genes with relatively low p-values and relatively large absolute value of log2 fold change were selected as our panel.
  • the selected gene expression profile was firstly normalized to z-score across all the samples.
  • Receiver-operator characteristic (ROC) analysis was conducted ( Figure 3) to evaluate the ability of the selected gene expression profile in differentiating the subjects in the testing cohort with early stage breast cancer patients from those normal samples. This process was repeated 500 times using bootstrapping algorithm to get more accurate evaluation of the model.
  • RNA sequencing data for early stage breast cancer tissue and normal breast tissue were downloaded from GDC and GTEx data portal.
  • the patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal.
  • the normal breast tissue data from GTEx were processed using developed RNA-seq pipeline. Statistical results for fifty-one selected genes.
  • the 50 genes used to differentiate of early stage
  • the Random Forest based risk model stratified all subjects in training and testing cohorts into two levels of risk for progression as discussed above (normal or early stage). 50 selected genes profiles (normalized) were used as the model input. The risk scores of breast cancer were calculated by the model ( Figure 2). We use 0.5 as the cutoff threshold.
  • Unsupervised hierarchical clustering analysis was applied to the selected genes profiles to visually depict the association of the disease status with the abundance patterns of these genes profiles (Figure 4). This analysis demonstrated two major clusters reflecting normal samples and early stage breast cancer samples. The error rate of the unsupervised clustering is 4.19%, which reinforced the effectiveness of the selected gene profiles for breast cancer assessment.
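The normalization and evaluation steps listed above can be illustrated with a short sketch. This is not the pipeline used in the study: it only shows z-score standardization of one gene across samples, application of the 0.5 cutoff to model risk scores, and a rank-based estimate of the area under the ROC curve. The function names, the toy values, the use of the population standard deviation, and the choice of "greater than or equal to" at the cutoff are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression values across all samples."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def classify(risk_scores, cutoff=0.5):
    """Call a sample 'early stage' when its risk score meets the cutoff (assumed >=)."""
    return ["early stage" if s >= cutoff else "normal" for s in risk_scores]


def roc_auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Toy inputs (hypothetical): raw counts for one gene and model risk scores.
log2_expr = [math.log2(count + 1) for count in [3, 7, 15, 31]]
risk = [0.91, 0.62, 0.40, 0.08]
truth = [1, 1, 0, 0]  # 1 = early stage breast cancer, 0 = normal

print(zscore(log2_expr))
print(classify(risk))
print(roc_auc(risk, truth))
```

In practice the risk scores would come from the trained Random Forest model rather than being typed in, and the AUC would be averaged over the repeated bootstrap or cross-validation fits.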

Abstract

Breast cancer markers, breast cancer marker panels, and methods for obtaining a breast cancer marker level representation for a sample are provided, based upon RNA-seq expression profiling. These compositions and methods find use in a number of applications, including, for example, diagnosing breast cancer, prognosing breast cancer, monitoring a subject with breast cancer, and determining a treatment for breast cancer. In addition, systems, devices, and kits thereof that find use in practicing the subject methods are provided.

Description

FIELD OF THE INVENTION
The present invention relates to expression profiling to differentiate early stage breast cancer patients from normal subjects.
BACKGROUND OF THE INVENTION
Breast cancer (BC) is the most frequently diagnosed cancer and the second leading cause of cancer-related death among American women. As BC is a systemic disease at diagnosis, chemotherapy and hormonal therapy are usually given to eradicate any potential presence of occult micro-metastasis after radical surgery, reducing the risk of relapse and improving overall survival according to validated prognostic factors. However, outcomes for patients with metastatic disease remain poor, with a median overall survival time of two to three years. A lack of effective treatment options, which rely heavily on timely diagnosis, contributes to poor survival in early-stage BC patients. Novel biomarkers are urgently needed to detect early stage BC. However, many identified biomarkers, such as cancer antigen-125 (CA125), carcinoembryonic antigen (CEA), and antigen-199 (CA199), have little clinical value due to low sensitivity, specificity, and reproducibility.
Serum RNAs and proteins found to correlate with tumor status and/or patient survival are increasingly being applied as diagnostic and prognostic indicators in various carcinomas. RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with microarray platforms, RNA-seq has less background noise due to image analysis and is more sensitive in detecting transcripts with low abundance or higher fold change in expression. In this invention, we use RNA-seq to find biomarkers for early detection of breast cancer.
SUMMARY OF THE INVENTION
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 3, or any sub-combinations thereof, in a sample from a subject.
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention. In some embodiments, biomarkers are selected from Table 3, or any sub-combinations thereof. In some embodiments, a method comprises detecting the level of one or more biomarkers in a sample from a subject. In some embodiments, a method of monitoring breast cancer (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having 50 biomarker proteins from breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1 or any sub-combinations thereof), and detecting the level of each of the N biomarker proteins of the panel in a sample from the subject. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50.
In some embodiments, methods comprise panels of any combination of the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1 or any sub-combinations thereof), in addition to any other breast cancer biomarkers.
In some embodiments, methods comprise comparing biomarker(s) level to a reference value/range or a threshold. In some embodiments, deviation of the biomarker(s) level from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
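One way to picture such a comparison is sketched below, following the general scheme in which each quantified biomarker is compared to a predetermined cutoff, assigned a score, and the scores are combined into a total that is compared with a predetermined score. The cutoff values, the direction of change assigned to each marker, the one-point-per-marker rule, and the predetermined total of 2 are hypothetical placeholders chosen only for illustration; they are not values disclosed herein.

```python
# Hypothetical cutoffs on a z-score scale. "up" means a level above the cutoff
# scores a point; "down" means a level below the cutoff scores a point.
# Both the numbers and the directions are illustrative assumptions.
CUTOFFS = {
    "TPX2": (1.0, "up"),
    "MMP11": (1.0, "up"),
    "FHL1": (-1.0, "down"),
}


def marker_score(gene, level):
    """Return 1 if the level deviates past the marker's cutoff, else 0."""
    cutoff, direction = CUTOFFS[gene]
    if direction == "up":
        return 1 if level > cutoff else 0
    return 1 if level < cutoff else 0


def total_score(levels):
    """Sum the per-marker scores for every measured marker that has a cutoff."""
    return sum(marker_score(g, v) for g, v in levels.items() if g in CUTOFFS)


def meets_predetermined_score(levels, predetermined_score=2):
    """Compare the total score with a predetermined score (assumed >=)."""
    return total_score(levels) >= predetermined_score


sample = {"TPX2": 2.1, "MMP11": 0.3, "FHL1": -1.8}
print(total_score(sample), meets_predetermined_score(sample))
```

A weighted sum or a trained classifier could replace the simple count without changing the overall structure of the comparison.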
In any of the embodiments described herein, each biomarker may be a protein biomarker. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be an antibody or an aptamer.
In some embodiments, a biomarker is an RNA transcript. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be a nucleic acid probe.
In any of the embodiments described herein, the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.). In some embodiments, the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
In any of the embodiments described herein, the methods may further comprise treating the subject for breast cancer. In some embodiments, treating the subject for breast cancer comprises a treatment regimen of administering one or more of chemotherapy, radiation, surgery, etc. In some embodiments, biomarkers described herein are monitored before, during, and/or after treatment.
In some embodiments, methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from breast cancer, but not providing interventional treatment of the breast cancer. In some embodiments, when embodiments herein indicate a low likelihood of success in treating breast cancer, palliative care is pursued in place of breast cancer treatment. In some embodiments, palliative care is provided in addition to treatment for breast cancer.
In some embodiments, methods of monitoring progression or severity of breast cancer and/or monitoring effectiveness of treatment in a subject are provided. In some embodiments, a method comprises detecting the level of one or more breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1 or any sub-combinations thereof) in a sample from the subject at a first time point. In some embodiments, the method further comprises measuring the level of one or more of the biomarkers at a second time point. In some embodiments, breast cancer severity is improving (e.g., declining) if the level of said biomarkers is improved at the second time point relative to the first time point.
In some embodiments, biomarkers or panels thereof provide a prognosis regarding the future course of breast cancer in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.). In some embodiments, treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.) are made based on the detection and/or quantification of one or more (e.g., 1, 2, 3, 4, 5) of the biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1 or any sub-combinations thereof).
In some embodiments, kits are provided. In some embodiments, a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1). In some embodiments, a kit comprises N capture/detection reagents. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, N is 3 to 50. In some embodiments, N is 4 to 50. In some embodiments, N is 5 to 50. In some embodiments, at least one of the 51 biomarker proteins is selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1).
In some embodiments, compositions are provided comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, or at least five capture/detection reagents that each specifically bind to a different biomarker selected from the breast cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1).
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be best understood from the following detailed description when read in conjunction with the accompanying drawings. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.
Figure 1. The analysis procedure of RNA sequencing data. Each step and packages used in alignment, quantification, and DE analysis are described in this figure.
Figure 2. Scatterplot of calculated probabilities of breast cancer with the selected 50-gene panel. The model was trained with the Random Forest algorithm; 732/491 case/control samples (814/546 in total) were randomly selected to train the model.
Figure 3. ROC curves for models of breast cancer assessment with selected biomarker profile evaluated on early stage patients versus normal subjects. Average true positive rate was calculated with 500 10-fold cross validation fits of the model.
Figure 4. Unsupervised hierarchical cluster analysis with heat map shows the abundance pattern of selected biomarkers of early stage breast cancer patients versus normal subjects.
DETAILED DESCRIPTION OF THE INVENTION
Breast cancer markers and panels
In some aspects of the invention, breast cancer biomarkers are provided. By a "biomarker" or "marker" it is meant a molecular entity whose representation in a sample is associated with a disease phenotype. By "breast cancer" it is meant any cancerous growth arising from the breast, for example, Ductal Carcinoma In Situ, Invasive Ductal Carcinoma, Triple Negative Breast Cancer, Inflammatory Breast Cancer, Medullary Carcinoma, Tubular Carcinoma, Mucinous Carcinoma, and the like, as known in the art or as described herein. Thus, by a breast cancer "biomarker" or "breast cancer marker" it is meant a molecular entity whose representation in a sample is associated with a breast cancer phenotype, e.g., the presence of breast cancer, the stage of breast cancer, a prognosis associated with the breast cancer, the predictability of the breast cancer being responsive to a therapy, etc. In other words, the marker may be said to be differentially represented in a sample having a breast cancer phenotype.
Breast cancer biomarkers include proteins that are differentially represented in a breast cancer phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc. By a "gene" or "recombinant gene" it is meant a nucleic acid comprising an open reading frame that encodes for the protein. The boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to the coding sequence. In addition, a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell), and associated regulatory sequences, and may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like. The terms "gene product" and "expression product" are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA; and the polypeptide translation products of such RNA transcripts, i.e., the amino acid product encoded by a gene. A gene product can be, for example, an RNA transcript of the gene, e.g., an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, full length polypeptide, splice variants of the full length polypeptide, post-translationally modified polypeptide, and fragments of the gene product, e.g., peptides, etc. In some instances, an elevated level of marker or marker activity may be associated with the breast cancer phenotype. In other instances, a reduced level of marker or marker activity may be associated with the breast cancer phenotype.
Breast cancer stage
We summarized breast cancer staging information (Table 1) based on the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines: Breast Cancer, Version 2.2016.
T is used to categorize the pathology of the tumor (TX: Primary tumor cannot be assessed. T0: No evidence of primary tumor. Tis: Carcinoma in situ (DCIS, LCIS, or Paget disease of the nipple with no associated tumor mass). T1 (includes T1a, T1b, and T1c): Tumor is 2 cm (3/4 of an inch) or less across. T2: Tumor is more than 2 cm but not more than 5 cm (2 inches) across. T3: Tumor is more than 5 cm across. T4 (includes T4a, T4b, T4c, and T4d): Tumor of any size growing into the chest wall or skin. This includes inflammatory breast cancer.);
N describes the pathology of local lymph nodes (NX: The regional lymph nodes cannot be evaluated. N0: The cancer has not spread to the regional lymph nodes. N1: Cancer has spread to 1 to 3 axillary (underarm) lymph node(s), and/or tiny amounts of cancer are found in internal mammary lymph nodes (those near the breast bone) on sentinel lymph node biopsy. N2: Cancer has spread to 4 to 9 lymph nodes under the arm, or cancer has enlarged the internal mammary lymph nodes. N3: Any of the following: N3a: either: Cancer has spread to 10 or more axillary lymph nodes, with at least one area of cancer spread greater than 2 mm, OR Cancer has spread to the lymph nodes under the collar bone (infraclavicular nodes), with at least one area of cancer spread greater than 2 mm. N3b: either: Cancer is found in at least one axillary lymph node (with at least one area of cancer spread greater than 2 mm) and has enlarged the internal mammary lymph nodes, OR Cancer has spread to 4 or more axillary lymph nodes (with at least one area of cancer spread greater than 2 mm), and tiny amounts of cancer are found in internal mammary lymph nodes on sentinel lymph node biopsy. N3c: Cancer has spread to the lymph nodes above the collar bone (supraclavicular nodes) with at least one area of cancer spread greater than 2 mm.);
and M describes the extent, if any, of metastasis (M0: The disease has not metastasized. M1: Cancer has spread to distant organs (most often to the bones, lungs, brain, or liver)).
By early stage breast cancer, it is meant stage 0, stage I and stage II.
Table 1. The TNM classification for staging of breast cancer.
Stage         T           N                    M
Stage 0       Tis         N0                   M0
Stage IA      T1          N0                   M0
Stage IB      T0 or T1    N1mi                 M0
Stage IIA     T0 or T1    N1 (but not N1mi)    M0
Stage IIA     T2          N0                   M0
Stage IIB     T2          N1                   M0
Stage IIB     T3          N0                   M0
Stage IIIA    T0 to T2    N2                   M0
Stage IIIA    T3          N1 or N2             M0
Stage IIIB    T4          N0 to N2             M0
Stage IIIC    Any T       N3                   M0
Stage IV      Any T       Any N                M1
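The staging table can also be read as a lookup from a (T, N, M) triple to a stage group, with early stage defined as stage 0, I, or II. The sketch below is one possible encoding of the table for illustration only. The function names and the string codes are assumptions, as is the choice to treat N1mi like N1 for T2 and larger tumors; sub-categories such as T1a or N3b would need to be collapsed to their parent category before calling it.

```python
def tnm_stage(t, n, m):
    """Map T, N, M categories to a stage group following Table 1.

    Expected codes: t in Tis/T0/T1/T2/T3/T4, n in N0/N1mi/N1/N2/N3, m in M0/M1.
    """
    if m == "M1":
        return "IV"
    if n == "N3":
        return "IIIC"
    # Assumption: N1mi only changes the stage for T0/T1; otherwise treat it as N1.
    n_major = "N1" if n == "N1mi" else n
    if t == "T4":
        return "IIIB"
    if (t in ("T0", "T1", "T2") and n_major == "N2") or (
        t == "T3" and n_major in ("N1", "N2")
    ):
        return "IIIA"
    if t == "Tis" and n == "N0":
        return "0"
    if t == "T1" and n == "N0":
        return "IA"
    if t in ("T0", "T1") and n == "N1mi":
        return "IB"
    if (t in ("T0", "T1") and n == "N1") or (t == "T2" and n == "N0"):
        return "IIA"
    if (t == "T2" and n_major == "N1") or (t == "T3" and n == "N0"):
        return "IIB"
    raise ValueError(f"combination not covered by Table 1: {t} {n} {m}")


def is_early_stage(stage):
    """Early stage breast cancer is stage 0, stage I, or stage II."""
    return stage in {"0", "IA", "IB", "IIA", "IIB"}


print(tnm_stage("T2", "N1", "M0"), is_early_stage(tnm_stage("T2", "N1", "M0")))
```

A lookup of this kind could be used to split curated clinical records into early stage and later stage groups before model training.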
Detection of Biomarkers and Determination of Biomarker Levels
The presence of a biomarker or a biomarker level for the biomarkers described herein can be detected using any of a variety of analytical methods. In one embodiment, a biomarker level is detected using a capture reagent. In various embodiments, the capture reagent is exposed to the biomarker in solution or is exposed to the biomarker while the capture reagent is immobilized on a solid support. In other embodiments, the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support. The capture reagent is selected based on the type of analysis to be conducted. Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, FV fragments, single chain FV fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
In some embodiments, biomarker presence or level is detected using a biomarker/capture reagent complex. In some embodiments, the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.
In some embodiments, biomarker presence or level is detected directly from the biomarker in a biological sample.
In some embodiments, biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample. In some embodiments of the multiplexed format, capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support. In some embodiments, a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example quantum dots. In some embodiments, an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
In some embodiments, the fluorescent label is a fluorescent dye molecule. In some embodiments, the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance. In some embodiments, the dye molecule includes an AlexaFluor molecule, such as, for example, AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different AlexaFluor molecules. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats. For example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of Fluorescence Spectroscopy, by J. R. Lakowicz, Springer Science+Business Media, Inc., 2004. See Bioluminescence & Chemiluminescence: Progress & Current Applications; Philip E. Stanley and Larry J. Kricka editors, World Scientific Publishing Company, January 2002.
In one or more embodiments, a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level. Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
In some embodiments, the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing). Generally, the enzyme catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence. Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
In some embodiments, the detection method is a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a measurable signal. In some embodiments, multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
In some embodiments, the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
Determination of Biomarker Levels Using Gene Expression Profiling
Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
Thus, in some embodiments, a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
In some embodiments, mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, RNA-seq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
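The standard-curve comparison described above can be sketched as follows. This is a minimal illustration, not the method used in this disclosure: the dilution series, Ct values, and function names are hypothetical, and it assumes the usual linear relationship between threshold cycle (Ct) and log10 of template copies.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies for a sample."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series (illustrative values only).
STANDARD_LOG10_COPIES = [7.0, 6.0, 5.0, 4.0, 3.0]
STANDARD_CT = [14.00, 17.32, 20.64, 23.96, 27.28]

slope, intercept = fit_standard_curve(STANDARD_LOG10_COPIES, STANDARD_CT)
unknown_copies = copies_from_ct(22.30, slope, intercept)
```

Dividing the estimated copy number by the number of cells that contributed RNA to the reaction would give a copies-per-cell figure; that cell count has to come from a separate measurement.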
Determination of Biomarker Levels Using Immunoassays
Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format. To improve specificity and sensitivity of an assay method based on immuno-reactivity, monoclonal antibodies and fragments are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected. The response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
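Reading an unknown off a standard curve can be sketched as below. This is an illustrative sketch under stated assumptions: the standards are hypothetical, the signal is assumed to increase with concentration, and interpolation is done linearly in log10(concentration) between the two bracketing standards. Fitted models such as a four-parameter logistic curve are common alternatives and are not shown.

```python
import math


def interpolate_concentration(signal, standards):
    """Estimate concentration from signal using bracketing standards.

    standards: list of (concentration, signal) pairs with concentration > 0.
    Returns None when the signal lies outside the range of the standards,
    because extrapolating beyond the curve is unreliable.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    if signal < points[0][1] or signal > points[-1][1]:
        return None
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            if s_hi == s_lo:
                return c_lo
            fraction = (signal - s_lo) / (s_hi - s_lo)
            log_c = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical standards: (concentration, measured signal).
STANDARDS = [(1.0, 0.10), (10.0, 0.50), (100.0, 1.30), (1000.0, 2.10)]

estimate = interpolate_concentration(0.90, STANDARDS)
out_of_range = interpolate_concentration(3.00, STANDARDS)
```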
Numerous immunoassay formats have been designed. ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (I-125) or fluorescence.
Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like. Methods of detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label. The products of reactions catalyzed by appropriate enzymes (where the detectable label is an enzyme; see above) can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
Determination of Biomarkers Using Histology/Cytology Methods
In some embodiments, the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods. In some embodiments, one or more capture reagent/s specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide, permeabilizing the cell sample, treating for analyte retrieval, staining, destaining, washing, blocking, and reacting with one or more capture reagent/s in a buffered solution. In another embodiment, the cell sample is produced from a cell block.
In some embodiments, one or more capture reagent/s specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagent/s in a buffered solution. In another embodiment, fixing and dehydrating are replaced with freezing.
Data Analysis and Reporting
In some embodiments, the results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.). Results, analyses, and/or data (e.g., signature, disease score, diagnosis, recommended course, etc.) are identified and/or reported as an outcome/result of an analysis. A result may be produced by receiving or generating data (e.g., test results) and transforming the data to provide an outcome or result. An outcome or result may be determinative of an action to be taken. In some embodiments, results determined by methods described herein can be independently verified by further or repeat testing.
In some embodiments, analysis results are reported (e.g., to a health care professional (e.g., laboratory technician or manager, physician, nurse, or assistant, etc.), patient, researcher, investigator, etc.). In some embodiments, a result is provided on a peripheral, device, or component of an apparatus. For example, sometimes an outcome is provided by a printer or display. In some embodiments, an outcome is reported in the form of a report. Generally, an outcome can be displayed in a suitable format that facilitates downstream use of the reported information. Non-limiting examples of formats suitable for use for reporting and/or displaying data, characteristics, etc. include text, outline, digital data, a graph, graphs, a picture, a pictograph, a chart, a bar graph, a pie-graph, a diagram, a flow chart, a scatter plot, a map, a histogram, a density chart, a function graph, a circuit diagram, a block diagram, a bubble map, a constellation diagram, a contour diagram, a cartogram, spider chart, Venn diagram, nomogram, and the like, and combinations of the foregoing. Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood corresponding to treatment, etc.). Such a representation reflects information not determinable in the absence of the method steps described herein. Converting biologic data into understandable characteristics of a subject allows actions to be taken in response to such information.
In some embodiments, a downstream individual (e.g., clinician, patient, etc.), upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject is made.
The term "receiving a report" as used herein refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis. The report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service and the like). In some embodiments the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form. The file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file or a medical record file. A report may be encrypted to prevent unauthorized viewing.
As noted above, in some embodiments, systems and methods described herein transform data from one form into another form (e.g., from biomarker levels to diagnostic/prognostic determination, etc.). In some embodiments, the terms "transformed", "transformation", and grammatical derivations or equivalents thereof, refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
Kits
Any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein. The biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein. Furthermore, any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
In some embodiments, a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained. Alternatively, rather than one or more computer program products, one or more instructions for manually performing the above steps by a human can be provided.
In some embodiments, a kit comprises a solid support, a capture reagent, and a signal generating material. The kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.
The kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample. Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data. In some embodiments, kits are provided for the analysis of breast cancer, wherein the kits comprise PCR primers for one or more biomarkers described herein. In some embodiments, a kit may further include instructions for use and correlation of the biomarkers. In some embodiments, a kit may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA. The kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
For example, a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs. In some embodiments, an algorithm or computer program assigns a score for each biomarker quantified based on said comparison and, in some embodiments, combines the assigned scores for each biomarker quantified to obtain a total score. Further, in some embodiments, an algorithm or computer program compares the total score with a predetermined score, and uses the comparison to determine a diagnosis/prognosis. Alternatively, rather than one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided.
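The cutoff, per-biomarker score, and total-score steps described above can be sketched as follows. This is an illustrative sketch only: the cutoff values, the sample levels, the one-point-per-biomarker scoring rule, and the predetermined score of 3 are all hypothetical. The gene symbols and the direction of change are taken from Table 3.

```python
def score_biomarker(level, cutoff, direction):
    """Return 1 if the level is on the disease-associated side of the cutoff."""
    if direction == "up":
        return 1 if level > cutoff else 0
    if direction == "down":
        return 1 if level < cutoff else 0
    raise ValueError("direction must be 'up' or 'down'")


def total_score(levels, cutoffs):
    """Sum the per-biomarker scores for one test sample."""
    return sum(score_biomarker(levels[gene], cutoff, direction)
               for gene, (cutoff, direction) in cutoffs.items())


def classify(levels, cutoffs, predetermined_score):
    """Compare the total score with a predetermined score."""
    total = total_score(levels, cutoffs)
    call = "positive" if total >= predetermined_score else "negative"
    return total, call


# Hypothetical cutoffs (arbitrary units); direction follows the sign of LogFC.
CUTOFFS = {
    "PLK1": (8.0, "up"),
    "NEK2": (6.5, "up"),
    "CA4": (5.0, "down"),
    "HSPB6": (9.0, "down"),
}

# Hypothetical measured levels for one test sample.
sample_levels = {"PLK1": 9.2, "NEK2": 7.1, "CA4": 6.3, "HSPB6": 7.4}

total, call = classify(sample_levels, CUTOFFS, predetermined_score=3)
```

A weighted sum or a trained classifier could replace the simple count; the structure of compare, score, combine, and compare again would stay the same.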
Methods of Treatment
In some embodiments, following a determination that a subject suffers from breast cancer, the subject is appropriately treated. In some embodiments, therapy is administered to treat breast cancer. In some embodiments, therapy is administered to treat complications of breast cancer (e.g., surgery, radiation, chemotherapy). In some embodiments, treatment comprises palliative care.
In some embodiments, methods of monitoring treatment of breast cancer are provided. In some embodiments, the present methods of detecting biomarkers are carried out at a time 0. In some embodiments, the method is carried out again at a time 1, and optionally, a time 2, and optionally, a time 3, etc., in order to monitor the progression of breast cancer or to monitor the effectiveness of one or more treatments of breast cancer. Time points for detection may be separated by, for example, at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or by 1 year or more. In some embodiments, a treatment regimen is altered based upon the results of monitoring (e.g., upon determining that a first treatment is ineffective). In some embodiments, the level of intervention may be altered.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.
EXAMPLE 1
Materials and methods
Data collection and pre-processing.
The raw count RNA sequencing data for breast cancer patients were downloaded from the GDC data portal. The patient clinical data, including specific tumor stage and grade, were also downloaded from the GDC data portal. The SRA RNA sequencing data for normal breast tissue were downloaded from the GTEx data portal through dbGaP (Table 2). The two data sets were then manually curated and categorized based on the stage and grade information available for each sample in the patient clinical data (Table 2). In this patent, we used 814 early stage breast cancer samples and 546 normal breast samples as our data set for early stage breast cancer biomarker detection.
Table 2. Data sets used for RNA sequencing. (Table provided as an image in the original publication.)
Genomic sequencing pipeline for RNA sequencing data.
The entire RNA-seq pipeline for GTEx data was divided into two parts: alignment and quantification (Figure 1). The alignment step consists of SRA to bam conversion using the SRA Toolkit (SRA Toolkit development team), bam to fastq conversion using Biobambam (Tischler G et al., 2014), and fastq to aligned bam conversion using STAR (Alex D et al., 2016). The quantification step consists of quality improvement filtering using Fixmate (http://broadinstitute.github.io/picard/), sorting and quality filtering using samtools (Li H et al., 2009), and sequence counting using HTSeq (Simon A et al., 2014). The quantification step outputs raw gene counts for the GTEx data, which are combined with the GDC gene profile for further downstream analysis.
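The final step, combining the GTEx raw counts with the GDC gene profile, can be sketched as below. This is a hedged illustration rather than the pipeline actually used: the in-memory layout (gene ID mapped to per-sample counts), the placeholder gene and sample IDs, the source prefixes, and the assumption that gene IDs may differ only by a trailing version suffix are all hypothetical.

```python
def strip_version(gene_id):
    """Drop a trailing '.N' version suffix, if present (assumed convention)."""
    return gene_id.split(".", 1)[0]


def merge_counts(gdc_counts, gtex_counts):
    """Merge two count tables, keeping only genes present in both.

    Each table maps gene_id -> {sample_id: raw_count}. Sample IDs are
    prefixed with their source so the origin stays visible downstream.
    """
    gdc = {strip_version(gene): row for gene, row in gdc_counts.items()}
    gtex = {strip_version(gene): row for gene, row in gtex_counts.items()}
    merged = {}
    for gene in sorted(set(gdc) & set(gtex)):
        row = {"GDC:" + sample: count for sample, count in gdc[gene].items()}
        row.update({"GTEx:" + sample: count
                    for sample, count in gtex[gene].items()})
        merged[gene] = row
    return merged


# Placeholder data; real tables would hold tens of thousands of genes.
gdc_example = {
    "GENE1.3": {"caseA": 120, "caseB": 98},
    "GENE2.1": {"caseA": 0, "caseB": 5},
}
gtex_example = {
    "GENE1.5": {"normA": 300},
    "GENE3.2": {"normA": 12},
}

merged = merge_counts(gdc_example, gtex_example)
```

Restricting the merge to shared genes avoids introducing missing values, at the cost of dropping genes annotated in only one source.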
Normalizations of RNA sequencing data.
The gene expression profile is then pre-filtered based on the mean expression per gene. The filtered profile is then quantile normalized and converted to log2 scale. The Combat package (from edgeR, http://www.r-project.org) is then used to perform further normalization between GDC case, GDC control, and GTEx control samples to minimize the difference between normal controls from the two databases (Figure 1).
Differentiated gene selection.
The normalized gene profile is then analyzed with a linear model using the R package 'limma' (http://www.r-project.org/). The 50 genes with relatively low p-values and relatively large absolute values of log2 fold change were selected as our panel.
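The normalization and selection steps above can be illustrated with a small pure-Python sketch. It is not the limma or Combat implementation: it only shows quantile normalization, a log2 transform, and ranking genes by absolute log2 fold change. The toy matrix, the pseudocount of 1, the ordinal handling of ties, and the function names are assumptions; batch correction and moderated p-values are not reproduced.

```python
import math


def quantile_normalize(matrix):
    """Give every sample the same distribution of values.

    matrix: list of samples, each a list of gene values in the same order.
    The value at each rank is replaced by the mean of that rank across
    samples. Ties are broken by position (a simplification).
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    sorted_samples = [sorted(sample) for sample in matrix]
    rank_means = [sum(s[r] for s in sorted_samples) / n_samples
                  for r in range(n_genes)]
    normalized = []
    for sample in matrix:
        order = sorted(range(n_genes), key=lambda g: sample[g])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_transform(matrix, pseudocount=1.0):
    """Convert to log2 scale; the pseudocount avoids log2(0)."""
    return [[math.log2(v + pseudocount) for v in sample] for sample in matrix]


def log2_fold_changes(log_matrix, is_case):
    """Mean log2 expression in cases minus mean in controls, per gene."""
    cases = [s for s, flag in zip(log_matrix, is_case) if flag]
    controls = [s for s, flag in zip(log_matrix, is_case) if not flag]
    return [sum(s[g] for s in cases) / len(cases)
            - sum(s[g] for s in controls) / len(controls)
            for g in range(len(log_matrix[0]))]


def rank_genes(gene_names, logfc, top_n):
    """Return the top_n genes with the largest absolute log2 fold change."""
    ranked = sorted(zip(gene_names, logfc),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:top_n]


# Toy data: 4 samples x 3 genes; the first two samples are cases.
RAW = [[10.0, 200.0, 50.0], [20.0, 400.0, 100.0],
       [300.0, 15.0, 60.0], [150.0, 30.0, 120.0]]

normalized = quantile_normalize(RAW)
logfc = log2_fold_changes(log2_transform(normalized),
                          [True, True, False, False])
top = rank_genes(["G0", "G1", "G2"], logfc, top_n=2)
```

In practice the ranking would also require a low p-value from the linear model, as the text describes; here only the fold-change half of that criterion is shown.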
Random forest analysis.
The selected gene expression profile was first normalized to z-scores across all samples. The z-scores of the gene expression profiles for the samples randomized to the statistical training cohort (n=1223) were then analyzed by Random Forest analysis using the R package 'randomForest' (http://www.r-project.org/). All subjects in the training cohort were subsequently assigned to one of two possible subgroups (normal and early stage). With the trained model applied to both the training cohort and the testing cohort (n=137), the probability of each sample belonging to each subgroup can be calculated (Figure 2). Receiver-operator characteristic (ROC) analysis was conducted (Figure 3) to evaluate the ability of the selected gene expression profile to differentiate subjects with early stage breast cancer from normal samples in the testing cohort. This process was repeated 500 times using a bootstrapping algorithm to obtain a more accurate evaluation of the model.
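Two pieces of the evaluation above, the z-score normalization and the area under the ROC curve, can be sketched without any external package. This is an illustration under stated assumptions, not the randomForest workflow itself: the classifier is not reproduced, the scores in the example are hypothetical, and the bootstrap shown here resamples only the evaluation set rather than repeating model training.

```python
import math
import random


def z_scores(values):
    """Standardize one gene's values across samples (sample std, n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def roc_auc(scores, labels):
    """Probability that a positive sample outscores a negative one.

    labels: 1 for early stage, 0 for normal. Ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc(scores, labels, n_rounds=500, seed=0):
    """Mean AUC over bootstrap resamples of the evaluation set."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if 0 < sum(resampled_labels) < n:  # need both classes present
            aucs.append(roc_auc([scores[i] for i in idx], resampled_labels))
    return sum(aucs) / len(aucs)


# Hypothetical predicted probabilities and true labels.
example_scores = [0.9, 0.2, 0.3, 0.1]
example_labels = [1, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```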
Heat map.
Unsupervised hierarchical clustering analysis was performed (Figure 4) to visually depict the association between the disease status and the abundance pattern of the selected gene profiles. This analysis was used to demonstrate the effectiveness of the selected gene panel in differentiating early stage breast cancer from normal subjects.
EXAMPLE 2
Results
Data collection, pre-processing.
The RNA sequencing data for early stage breast cancer tissue and normal breast tissue were downloaded from the GDC and GTEx data portals. The patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal. The normal breast tissue data from GTEx were processed using the developed RNA-seq pipeline.
Statistical results for the fifty selected genes.
A linear model from the R package "limma" was applied to the gene profiles of early stage breast cancer and normal samples. P-values and log2 fold changes for each selected gene are shown in Table 3.
Table 3. The 50 genes used to differentiate early stage breast cancer patients from normal subjects.
Gene symbol LogFC FDR
HSPB6 -4.664683236 1.61E-88
FHL1 -4.283651601 1.84E-53
MYOC -5.973265213 6.91E-65
UBE2T 3.065997635 5.30E-57
TPX2 3.482868788 4.37E-47
KIF4A 3.956147517 5.60E-58
MMP11 5.722113465 4.14E-51
CHRDL1 -5.200270693 7.32E-49
ASF1B 3.124992643 4.64E-60
KIF20A 3.547264873 8.73E-54
NEK2 4.508772418 4.92E-87
CENPF 3.282401967 5.02E-54
KCNIP2 -4.684448692 5.72E-88
ADRA1A -4.199336624 2.15E-67
ITIH5 -4.13688626 4.28E-40
HJURP 3.584516363 1.39E-60
COL10A1 6.78475046 6.21E-89
HIF3A -4.740324073 6.87E-67
PKMYT1 4.152012808 4.50E-85
LYVE1 -4.829889971 4.75E-57
TROAP 3.541006192 6.09E-46
NUSAP1 3.155133303 1.58E-40
ABCA8 -4.577150586 2.68E-68
NUF2 3.634755036 5.56E-66
DTL 3.107514288 3.50E-80
CDC25C 3.48718754 2.68E-83
CD300LG -6.397763598 5.06E-40
VEGFD -5.830793241 3.02E-59
BTNL9 -4.312507345 5.68E-59
KIAA0101 3.405680204 5.14E-66
PLK1 3.253257045 1.19E-90
CA4 -6.397703457 3.30E-92
SCARA5 -6.274723992 7.78E-66
TNXB -4.3798869 4.35E-55
SDPR -4.675953069 2.02E-71
BUB1 3.304113833 1.45E-61
CDK1 3.063279645 2.98E-88
HSD17B13 -4.604341118 1.69E-43
ANGPTL7 -5.090691675 9.86E-76
HPSE2 -4.483599221 1.14E-72
UBE2C 4.161178377 2.21E-47
TMEM132C -5.605232486 6.04E-71
SLC2A4 -4.144164384 1.31E-39
IQGAP3 3.828114653 2.01E-52
MME -4.095890031 8.13E-73
PLPP4 4.785268223 8.18E-44
LINC01614 5.009777715 1.10E-39
AL035610.1 -4.42011048 7.13E-79
CARMN -4.102379377 1.03E-56
UHRF1 3.642290885 5.55E-56
Performance of transcriptomics profile-based prognostic algorithm
The Random Forest based risk model stratified all subjects in the training and testing cohorts into two levels of risk for progression as discussed above (normal or early stage). The normalized profiles of the 50 selected genes were used as the model input. The risk scores of breast cancer were calculated by the model (Figure 2). We used 0.5 as the cutoff threshold.
The c statistic of the model measured on the testing cohort was 1 (Figure 3).
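Applying the 0.5 cutoff to model risk scores can be sketched as follows. The risk scores and true labels are hypothetical and chosen only to exercise the code, and treating a score exactly equal to the cutoff as "early stage" is an assumption, since the text does not say how ties are handled.

```python
def classify_scores(risk_scores, cutoff=0.5):
    """Label each sample by comparing its risk score with the cutoff."""
    return ["early stage" if score >= cutoff else "normal"
            for score in risk_scores]


def confusion_counts(predicted, actual, positive="early stage"):
    """Count true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, truth in zip(predicted, actual):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def sensitivity_specificity(counts):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
    specificity = counts["tn"] / (counts["tn"] + counts["fp"])
    return sensitivity, specificity


# Hypothetical risk scores and true labels for six samples.
scores = [0.92, 0.81, 0.47, 0.12, 0.05, 0.66]
truth = ["early stage", "early stage", "early stage",
         "normal", "normal", "normal"]

calls = classify_scores(scores)
counts = confusion_counts(calls, truth)
sensitivity, specificity = sensitivity_specificity(counts)
```

A c statistic of 1 means every early stage sample scored higher than every normal sample, so some cutoff separates the two groups perfectly; whether 0.5 is such a cutoff depends on the actual score distribution.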
Unsupervised hierarchical clustering with transcriptomics profiles
Unsupervised hierarchical clustering analysis was applied to the selected gene profiles to visually depict the association of disease status with the abundance patterns of these gene profiles (Figure 4). This analysis demonstrated two major clusters reflecting normal samples and early stage breast cancer samples. The error rate of the unsupervised clustering was 4.19%, which reinforces the effectiveness of the selected gene profiles for breast cancer assessment.
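One way to compute an error rate for an unsupervised clustering is sketched below. The text does not state how the 4.19% figure was derived, so this is an assumption: each cluster is assigned the majority label of its members, and every sample that disagrees with its cluster's majority counts as an error. The cluster assignments and labels in the example are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_error_rate(cluster_ids, true_labels):
    """Fraction of samples that disagree with their cluster's majority label."""
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1
    # Samples matching the majority label of their cluster count as correct.
    correct = sum(counter.most_common(1)[0][1]
                  for counter in label_counts.values())
    return 1.0 - correct / len(true_labels)


# Hypothetical two-cluster result for eight samples.
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
labels = ["normal", "normal", "normal", "early", "early",
          "early", "normal", "early"]

error_rate = clustering_error_rate(clusters, labels)
```

With majority voting, two clusters can in principle receive the same label; a stricter variant would force a one-to-one mapping between clusters and classes.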

Claims

What is claimed is:
1. A method, comprising detecting the level of one or more target analytes, but fewer than 50 target analytes, in a sample from a subject to be tested for breast cancer, one or more of the target analytes being selected from the group consisting of HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1, wherein a change in the expression of these genes is associated with early stage breast cancer.
2. The method of claim 1, further comprising detecting one or more additional target analytes.
3. The method of claim 2, comprising detecting three or more target analytes being selected from the group consisting of HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1.
4. The method of claim 2, comprising detecting ten or more target analytes.
5. The method of claim 1, wherein the sample is a blood product selected from whole blood; plasma; serum; and filtered, concentrated, fractionated or diluted samples of the preceding.
6. The method of claim 1, wherein the sample is a biopsy tissue.
7. The method of claim 1, wherein the method comprises contacting the sample with a set of capture reagents, wherein each capture reagent specifically binds to a different target analyte being detected.
8. The method of claim 7, wherein each capture reagent is an antibody.
9. The method of claim 7, wherein each capture reagent is a nucleic acid probe.
10. Reagents comprising capture reagents for the detection of two or more target analytes, but fewer than 50 target analytes, two or more of the target analytes being selected from the group consisting of HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1.
11. The reagents of claim 10, wherein said capture reagents are antibodies.
12. The reagents of claim 10, wherein said capture reagents are nucleic acid probes.
13. A kit comprising the reagents of claim 10 and one or more additional reagents for carrying out an assay in a sample from a subject.
14. The reagents of claim 10, comprising capture reagents for detecting three or more target analytes selected from the group consisting of HSPB6, FHL1, MYOC, UBE2T, TPX2, KIF4A, MMP11, CHRDL1, ASF1B, KIF20A, NEK2, CENPF, KCNIP2, ADRA1A, ITIH5, HJURP, COL10A1, HIF3A, PKMYT1, LYVE1, TROAP, NUSAP1, ABCA8, NUF2, DTL, CDC25C, CD300LG, VEGFD, BTNL9, KIAA0101, PLK1, CA4, SCARA5, TNXB, SDPR, BUB1, CDK1, HSD17B13, ANGPTL7, HPSE2, UBE2C, TMEM132C, SLC2A4, IQGAP3, MME, PLPP4, LINC01614, AL035610.1, CARMN, UHRF1.
PCT/US2017/023475 2017-03-21 2017-03-21 Methods and compositions for detecting early stage breast cancer with rna-seq expression profiling WO2018174861A1 (en)

Publications (1)

Publication Number Publication Date
WO2018174861A1 true WO2018174861A1 (en) 2018-09-27

Family

ID=63586061

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/023475 WO2018174861A1 (en) 2017-03-21 2017-03-21 Methods and compositions for detecting early stage breast cancer with rna-seq expression profiling

Country Status (1)

Country Link
WO (1) WO2018174861A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111269981A (en) * 2020-02-17 2020-06-12 中国医科大学附属盛京医院 Application of TROAP in preparing prognosis product for detecting breast cancer patient treated by endocrine
CN114150059A (en) * 2020-09-07 2022-03-08 香港城市大学深圳研究院 MCM3 related breast cancer biomarker kit, diagnosis system and related application thereof
CN114181937A (en) * 2021-12-08 2022-03-15 浙江中医药大学 shRNA molecule for silencing human LINC01614 expression and application thereof
US11767526B2 (en) 2019-01-23 2023-09-26 Regeneron Pharmaceuticals, Inc. Treatment of ophthalmic conditions with angiopoietin-like 7 (ANGPTL7) inhibitors
US11845989B2 (en) 2019-01-23 2023-12-19 Regeneron Pharmaceuticals, Inc. Treatment of ophthalmic conditions with angiopoietin-like 7 (ANGPTL7) inhibitors
US11865134B2 (en) 2021-02-26 2024-01-09 Regeneron Pharmaceuticals, Inc. Treatment of inflammation with glucocorticoids and angiopoietin-like 7 (ANGPTL7) inhibitors

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100009858A1 (en) * 2006-07-28 2010-01-14 Chundsell Medicals Ab Embryonic stem cell markers for cancer diagnosis and prognosis
US20110217297A1 (en) * 2010-03-03 2011-09-08 Koo Foundation Sun Yat-Sen Cancer Center Methods for classifying and treating breast cancers
US20140018253A1 (en) * 2012-04-05 2014-01-16 Oregon Health And Science University Gene expression panel for breast cancer prognosis

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100009858A1 (en) * 2006-07-28 2010-01-14 Chundsell Medicals Ab Embryonic stem cell markers for cancer diagnosis and prognosis
US20110217297A1 (en) * 2010-03-03 2011-09-08 Koo Foundation Sun Yat-Sen Cancer Center Methods for classifying and treating breast cancers
US20140018253A1 (en) * 2012-04-05 2014-01-16 Oregon Health And Science University Gene expression panel for breast cancer prognosis

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11767526B2 (en) 2019-01-23 2023-09-26 Regeneron Pharmaceuticals, Inc. Treatment of ophthalmic conditions with angiopoietin-like 7 (ANGPTL7) inhibitors
US11845989B2 (en) 2019-01-23 2023-12-19 Regeneron Pharmaceuticals, Inc. Treatment of ophthalmic conditions with angiopoietin-like 7 (ANGPTL7) inhibitors
CN111269981A (en) * 2020-02-17 2020-06-12 中国医科大学附属盛京医院 Application of TROAP in preparing prognosis product for detecting breast cancer patient treated by endocrine
CN114150059A (en) * 2020-09-07 2022-03-08 香港城市大学深圳研究院 MCM3 related breast cancer biomarker kit, diagnosis system and related application thereof
CN114150059B (en) * 2020-09-07 2024-04-12 香港城市大学深圳研究院 MCM3 related breast cancer biomarker kit, diagnosis system and related application thereof
US11865134B2 (en) 2021-02-26 2024-01-09 Regeneron Pharmaceuticals, Inc. Treatment of inflammation with glucocorticoids and angiopoietin-like 7 (ANGPTL7) inhibitors
CN114181937A (en) * 2021-12-08 2022-03-15 浙江中医药大学 shRNA molecule for silencing human LINC01614 expression and application thereof
CN114181937B (en) * 2021-12-08 2024-01-23 浙江中医药大学 shRNA molecule for silencing human LINC01614 expression and application thereof

Similar Documents

Publication Publication Date Title
WO2018174861A1 (en) Methods and compositions for detecting early stage breast cancer with rna-seq expression profiling
CN108957006B (en) Non-alcoholic fatty liver disease (NAFLD) and non-alcoholic steatohepatitis (NASH) biomarkers and uses thereof
CN108603887B (en) Non-alcoholic fatty liver disease (NAFLD) and non-alcoholic steatohepatitis (NASH) biomarkers and uses thereof
JP7136697B2 (en) A biomarker for the detection of breast cancer in women with dense breasts
US20120143805A1 (en) Cancer Biomarkers and Uses Thereof
CN107406510B (en) Prostate antigen standard substance and application thereof
CN113234830B (en) Product for lung cancer diagnosis and application
EP2748356A2 (en) Renal cell carcinoma biomarkers and uses thereof
Qi et al. Concordance of the 21-gene assay between core needle biopsy and resection specimens in early breast cancer patients
WO2015164616A1 (en) Biomarkers for detection of tuberculosis
WO2018140049A1 (en) Methods and compositions for detecting early stage ovarian cancer with rnaseq expression profiling
US20210072245A1 (en) Biomarkers for detection of breast cancer
WO2018174862A1 (en) Methods and compositions for detecting early stage bladder cancer with rna-seq expression profiling
WO2018174863A1 (en) Methods and composition for detecting early stage colon cancer with rna-seq expression profiling
US20160138110A1 (en) Glioma biomarkers
WO2016123058A1 (en) Biomarkers for detection of tuberculosis risk
CN113444796B (en) Biomarkers associated with lung cancer and their use in diagnosing cancer
US20180356419A1 (en) Biomarkers for detection of tuberculosis risk
US20230071234A1 (en) Nonalcoholic Steatohepatitis (NASH) Biomarkers and Uses Thereof
WO2018174859A1 (en) Methods and compositions for detection of early stage lung squamous cell carcinoma with rnaseq expression profiling
WO2018174860A1 (en) Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling
WO2023059854A1 (en) Lung cancer prediction and uses thereof
WO2024064322A2 (en) Methods of assessing tobacco use status
EP4356140A2 (en) Renal insufficiency prediction and uses thereof
WO2021024009A1 (en) Methods and compositions for providing colon cancer assessment using protein biomarkers

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17902134; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase. Ref country code: DE

122 EP: PCT application non-entry in European phase. Ref document number: 17902134; Country of ref document: EP; Kind code of ref document: A1