WO2019245587A1 - Methods and compositions for the analysis of cancer biomarkers - Google Patents

Methods and compositions for the analysis of cancer biomarkers

Info

Publication number
WO2019245587A1
Authority
WO
WIPO (PCT)
Prior art keywords
cancer
specimen
biomarkers
seq
breast cancer
Prior art date
Application number
PCT/US2018/039163
Other languages
English (en)
Inventor
Brandon STEELMAN
Julia Meyer
Original Assignee
Clear Gene, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clear Gene, Inc.
Priority to EP18923183.0A (EP3810807A4)
Priority to PCT/US2018/039163 (WO2019245587A1)
Priority to AU2018428853A (AU2018428853A1)
Priority to CA3103572A (CA3103572A1)
Publication of WO2019245587A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • Molecular tests can detect residual disease after a treatment.
  • the presence of residual disease indicates that the treatment did not completely eliminate a tumor, where treatment may include surgery, radiotherapy, chemotherapy, endocrine therapy, or targeted molecular therapy.
  • Positive surgical margins are defined as tumor cells on the surface of an excised tissue specimen. Since the surface of the excised specimen is topologically equivalent to the wall of the incision, tumor cells on the surface of the specimen indicate the presence of residual tumor in a patient after surgical treatment.
  • Pathologic Complete Response (pCR) is defined as the absence of residual tumor in tissue from patients who were previously diagnosed with invasive cancer. pCR is used as a primary endpoint to determine the success of emerging breast cancer treatments in the neoadjuvant setting.
  • innovative clinical trial designs have validated pathologic complete response (pCR) as a surrogate endpoint, and are now validating pCR as a therapeutic endpoint.
  • RNA-based test suitable for analysis of tumor margins from surgical samples for residual disease, or for analysis of residual disease in post-treatment cancer patients from other samples.
  • the disclosure provides a method of distinguishing a cancer from adjacent healthy tissue, said method comprising: (a) obtaining a specimen from a human subject, (b) detecting a presence of a set of markers in said specimen by performing an amplification reaction in a plurality of polynucleotides from said specimen, wherein said set of markers is selected from the group consisting essentially of: Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain
  • said plurality of polynucleotides comprise RNA, cDNA, or DNA.
  • the detecting comprises using a DNA-intercalating dye or a fluorescent probe, such as a TaqMan probe.
  • said amplification reaction is a PCR reaction, such as a qPCR reaction or an RTqPCR reaction.
  • said method can distinguish said cancer in at least 10 ng of said plurality of polynucleotides from said specimen. In some instances, said method can distinguish said cancer in at least 250 cells of said specimen.
  • said amplification reaction uses at least one primer sequence that has at least 90% identity to SEQ ID NO: 1- SEQ ID NO: 356, for example to convert RNA into cDNA and/or to amplify a cDNA.
  • the specimen is a frozen specimen, a fresh specimen, or a fixed specimen.
  • the specimen is a biopsy specimen, such as a liquid biopsy, a solid tissue biopsy, or a surgical excision.
  • said specimen is obtained by imprint cytology, with for example a touch-preparation.
  • said specimen is obtained by scrape preparation, a nipple aspiration, or a ductal lavage.
  • said cancer is breast cancer, including, but not limited to, invasive adenocarcinoma, invasive ductal breast cancer, and invasive lobular breast cancer.
  • said method distinguishes said breast cancer from adjacent healthy tissue with greater than 90% accuracy, greater than 90% sensitivity, or greater than 90% specificity.
  • said method quantitates an amount of said cancer.
  • said method further comprises outputting a percentage of said plurality of polynucleotides expressing said markers from said specimen.
  • the method further comprises comparing said set of markers from said specimen to said set of markers from said control specimen, such as a second specimen from said human subject or a synthetic nucleotide control.
  • the method further comprises performing a second assay to distinguish said cancer, such as an immunohistochemistry assay.
  • said threshold level of said MMP11 is 1,000 copies per microliter.
  • said threshold level of said IBSP is 25 copies per microliter.
  • said threshold level of said COL10A1 is 700 copies per microliter.
  • said set of markers is selected from the group consisting of: Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1).
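The three threshold levels quoted above can be read as a simple per-marker comparison. The following is a minimal sketch under stated assumptions: the text does not say how the three comparisons are combined into a single call, nor whether a value equal to the threshold counts as positive, so the `min_markers` rule, the `>=` comparison, and the sample values are all hypothetical.

```python
# Threshold levels quoted in the text, in copies per microliter.
THRESHOLDS = {"MMP11": 1000.0, "IBSP": 25.0, "COL10A1": 700.0}


def markers_above_threshold(copies_per_ul):
    """Return the markers whose measured level meets or exceeds its threshold.

    Treating "equal to the threshold" as positive is an assumption.
    """
    return sorted(
        gene
        for gene, threshold in THRESHOLDS.items()
        if copies_per_ul.get(gene, 0.0) >= threshold
    )


def call_specimen(copies_per_ul, min_markers=1):
    """Hypothetical decision rule: positive if at least `min_markers` pass."""
    return len(markers_above_threshold(copies_per_ul)) >= min_markers


# Hypothetical measurements, for illustration only.
sample = {"MMP11": 1500.0, "IBSP": 10.0, "COL10A1": 650.0}
print(markers_above_threshold(sample))
print(call_specimen(sample))
```

Raising `min_markers` makes the hypothetical rule stricter; which combination rule, if any, the applicants intended is not stated in this excerpt.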
  • said amplification reaction can be a singleplex reaction or a multiplex reaction.
  • the disclosure provides a kit comprising at least one primer sequence that has at least 90% identity to any one of SEQ ID NO: 1- SEQ ID NO: 356, and a buffer system.
  • said buffer system is a PCR buffer system.
  • the kits further comprise a DNA-intercalating dye or a fluorescent probe, such as a TaqMan compatible probe.
  • the kit also comprises a negative control sample, a positive control sample, or a synthetic nucleotide control.
  • the disclosure provides isolated nucleic acid comprising a primer sequence that has at least 90%, at least 95%, or at least 99% identity to SEQ ID NO: 1- SEQ ID NO: 356.
  • the disclosure provides a method of identifying a biomarker for a cancer comprising: (a) analyzing, by a computer system, a cohort of biomarkers from a population of subjects afflicted with a cancer; (b) applying, by said computer system, a first filter to said cohort of said biomarkers to identify a first subset of biomarkers from said cohort that has at least a 3-fold higher expression level in said cancer as compared to a healthy control biomarker; (c) applying, by said computer system, a second filter to said first subset of biomarkers to identify a second subset of biomarkers that have a false discovery rate for said cancer that is less than 0.000001; and (d) applying, by said computer system, a correlation based filter selection to said second subset of biomarkers to identify the biomarkers that classify the largest number of different types of said cancer.
  • said correlation based filter is an anti-correlation based method.
  • the method further comprises using the identified biomarkers as features input into a machine learning algorithm that distinguishes clinical specimens based on predefined attributes.
  • said cancer is breast cancer, including, but not limited to, invasive adenocarcinoma, invasive ductal breast cancer, and invasive lobular breast cancer.
  • said one or more biomarkers identify said cancer with greater than 90% accuracy, greater than 90% sensitivity, or greater than 90% specificity.
  • said one or more biomarkers are therapeutic targets.
  • said false discovery rate is a p-value for said cancer that is less than 0.0000001.
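Steps (b) and (c) of the biomarker identification method described above amount to two successive filters. The sketch below illustrates only those two steps, assuming the fold change and false discovery rate have already been computed per gene; the gene names, the values, and the `select_candidates` helper are hypothetical, and the correlation based selection of step (d) is not implemented here.

```python
def select_candidates(stats, min_fold_change=3.0, max_fdr=1e-6):
    """Apply the fold-change filter, then the false-discovery-rate filter.

    `stats` maps a gene name to a dict with precomputed "fold_change"
    (tumor versus healthy) and "fdr" values.
    """
    # First filter: at least a 3-fold higher expression level in the cancer.
    first_subset = {
        gene: s for gene, s in stats.items() if s["fold_change"] >= min_fold_change
    }
    # Second filter: false discovery rate below the cutoff.
    second_subset = {
        gene: s for gene, s in first_subset.items() if s["fdr"] < max_fdr
    }
    return sorted(second_subset)


# Hypothetical summary statistics, for illustration only.
stats = {
    "GENE_A": {"fold_change": 8.0, "fdr": 1e-12},
    "GENE_B": {"fold_change": 2.0, "fdr": 1e-20},  # fails the fold-change filter
    "GENE_C": {"fold_change": 5.0, "fdr": 1e-3},   # fails the FDR filter
    "GENE_D": {"fold_change": 3.0, "fdr": 1e-9},
}
print(select_candidates(stats))  # ['GENE_A', 'GENE_D']
```

The cutoffs are keyword arguments because the text quotes two different values for the second filter (0.000001 and 0.0000001).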
  • FIGURE 1 is a diagram illustrating positive versus clear surgical margins.
  • FIGURE 2A is a Volcano plot of 20,253 mRNAs in 1,014 samples.
  • RNA Seq was used to analyze 1,014 samples from early-stage tumors and healthy samples from adjacent tissue. Selected genes had the highest Correlation-based Feature Selection scores among genes that passed the p-value threshold (dashed horizontal line) and fold-change threshold (dashed vertical line).
  • FIGURE 2B panels (a-c) are cumulative frequency plots of 1,536 patient samples that show that a 3-gene set (MMP11, COL10A1, IBSP) is overexpressed in samples from early-stage tumors relative to adjacent healthy tissue.
  • the genes have comparable distributions on RNA Seq samples (a), a subset of samples that were also analyzed by microarray (b), and a subset that were also analyzed by RTqPCR (c). These results confirm that expression is not platform-specific.
  • Panels (c-e) are 2D-Density maps illustrating the advantage of a multianalyte test over a single biomarker. Separation of tumor and healthy improves as we progress from RNA Seq to Microarray to custom RTqPCR.
  • FIGURE 3 is a chart showing a Principal Component Analysis (PCA) of all available microarray probes, which shows a clear demarcation between tumor (left dots) and healthy samples (right dots).
  • PCA: Principal Component Analysis
  • FIGURE 4 depicts receiver-operator characteristic (ROC) curves of classifiers for a 3-gene set including MMP11, COL10A1, and IBSP.
  • ROC curves show the tradeoff between sensitivity and specificity over all possible thresholds.
  • the solid dark line shows performance of the 3-gene test on 939 cross-validated RNA Seq samples.
  • FIGURE 5 illustrates an error plot of a 3-gene set (MMP11, COL10A1, IBSP) in 939 RNA Seq samples.
  • error plots set the threshold based on the tradeoff between Type I and Type II errors.
  • Type I errors: False Positives
  • Type II errors: False Negatives
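The tradeoff between Type I and Type II errors described above can be made concrete by sweeping a score threshold. This is a minimal sketch, assuming a classifier score where higher means "more tumor-like"; the scores, the candidate thresholds, and the choice to minimize the sum of the two error rates are hypothetical and are not taken from the source.

```python
def error_rates(tumor_scores, healthy_scores, threshold):
    """Return (Type I rate, Type II rate) at a given score threshold.

    Type I (false positive): a healthy sample scored at or above the threshold.
    Type II (false negative): a tumor sample scored below the threshold.
    """
    false_pos = sum(score >= threshold for score in healthy_scores)
    false_neg = sum(score < threshold for score in tumor_scores)
    return false_pos / len(healthy_scores), false_neg / len(tumor_scores)


def best_threshold(tumor_scores, healthy_scores, candidates):
    """Pick the candidate threshold with the smallest summed error rate."""
    return min(
        candidates,
        key=lambda t: sum(error_rates(tumor_scores, healthy_scores, t)),
    )


# Hypothetical classifier scores, for illustration only.
tumor = [0.9, 0.8, 0.7, 0.55]
healthy = [0.1, 0.2, 0.3, 0.6]
candidates = [0.25, 0.5, 0.75]

chosen = best_threshold(tumor, healthy, candidates)
print(chosen, error_rates(tumor, healthy, chosen))  # 0.5 (0.25, 0.0)
```

A clinical setting might weight the two error types unequally; the equal weighting here is only the simplest choice.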
  • FIGURE 7A and FIGURE 7B depict charts showing analytic validation of qPCR assays for MMP11 RNA using clinical-grade reagents.
  • FIGURE 7A panel a depicts amplification plots of 20 microliter qPCR reactions. Reactions covered 12 concentrations of synthetic cDNA template (1.1 million copies per microliter to 0 copies per microliter), including 10-fold dilutions for 6 high concentrations (5 technical replicates) and 2-fold dilutions for 5 low concentrations (7 technical replicates). One concentration point overlapped in the high and low concentration series. Each primer pair includes 24 replicates of no-template controls. Error bars at each cycle represent 95% CI of technical replicates.
  • FIGURE 7A panel b depicts fluorescence versus cycle plots to determine Ct for MMP11.
  • a 4-parameter linear model was fitted to 5 technical replicates (circles). The maximum of the second derivative was used to define the Ct (CtD2).
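The second-derivative definition of Ct (CtD2) above can be sketched numerically. The example assumes the fitted curve is a 4-parameter logistic (sigmoid), which is a common shape for amplification curves but is an assumption here, since the text only says "4-parameter linear model"; the parameter values are hypothetical, and the fitting step itself is omitted.

```python
import math


def logistic4(cycle, baseline, plateau, midpoint, scale):
    """Assumed 4-parameter logistic: fluorescence as a function of cycle."""
    return baseline + (plateau - baseline) / (
        1.0 + math.exp(-(cycle - midpoint) / scale)
    )


def ct_d2(curve, first_cycle=1.0, last_cycle=40.0, step=0.01):
    """Cycle at which the numerical second derivative of `curve` is largest."""
    best_cycle, best_value = first_cycle, float("-inf")
    n_steps = int(round((last_cycle - first_cycle) / step))
    for i in range(1, n_steps):
        x = first_cycle + i * step
        # Central finite-difference estimate of the second derivative.
        second = (curve(x + step) - 2.0 * curve(x) + curve(x - step)) / (step * step)
        if second > best_value:
            best_cycle, best_value = x, second
    return best_cycle


# Hypothetical fitted parameters, for illustration only.
params = dict(baseline=0.1, plateau=10.0, midpoint=25.0, scale=1.5)
ct = ct_d2(lambda c: logistic4(c, **params))
print(round(ct, 2))
```

For a logistic curve the maximum of the second derivative lies at `midpoint - scale * ln(2 + sqrt(3))`, so the numerical result can be checked against that closed form.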
  • FIGURE 7B panel c depicts threshold cycle versus template dilution plots to calculate linear range.
  • the linear range is defined as the range of concentrations where CtD2 fit a straight line with R-squared >0.995. Red lines indicate 95% Confidence Intervals calculated from 200 bootstraps.
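The linear-range criterion above (a straight-line fit with R-squared above 0.995) can be illustrated with an ordinary least-squares fit of Ct against log10 template concentration. This is a sketch only: the Ct values are hypothetical, the concentrations merely mimic a 10-fold dilution series starting at 1.1 million copies per microliter, and the bootstrap confidence intervals mentioned in the text are not reproduced.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical 10-fold dilution series and Ct values, for illustration only.
copies_per_ul = [1.1e6, 1.1e5, 1.1e4, 1.1e3, 1.1e2, 1.1e1]
ct_values = [14.1, 17.4, 20.8, 24.1, 27.5, 30.8]

log_copies = [math.log10(c) for c in copies_per_ul]
slope, intercept, r_squared = fit_line(log_copies, ct_values)
in_linear_range = r_squared > 0.995
print(round(slope, 2), in_linear_range)
```

In practice the range would be narrowed by dropping the highest or lowest concentrations until the criterion holds; that search is left out of this sketch.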
  • FIGURE 7B panel d depicts melt plots that confirm the specificity of the primers. Increasing temperature denatures PCR amplicons, which decreases fluorescence.
  • a single peak of the negative first derivative confirms the presence of a single amplicon.
  • the peak corresponds to the expected melting temperature (dashed line).
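The melt-curve check described above (a single peak in the negative first derivative, located at the expected melting temperature) can be sketched with finite differences. The temperatures, fluorescence values, noise floor `min_height`, expected melting temperature, and tolerance below are all hypothetical.

```python
def negative_derivative(temps, fluorescence):
    """Return (midpoint temperature, -dF/dT) pairs by finite differences."""
    points = []
    for i in range(len(temps) - 1):
        d_temp = temps[i + 1] - temps[i]
        d_fluor = fluorescence[i + 1] - fluorescence[i]
        points.append(((temps[i] + temps[i + 1]) / 2.0, -d_fluor / d_temp))
    return points


def find_peaks(points, min_height):
    """Local maxima of -dF/dT that reach `min_height` (assumed noise floor)."""
    peaks = []
    for i in range(1, len(points) - 1):
        prev_value = points[i - 1][1]
        temp, value = points[i]
        next_value = points[i + 1][1]
        if value > prev_value and value >= next_value and value >= min_height:
            peaks.append(temp)
    return peaks


# Hypothetical melt data, for illustration only.
temps = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90]
fluor = [100, 99, 98, 96, 92, 80, 40, 12, 6, 5, 5]

peaks = find_peaks(negative_derivative(temps, fluor), min_height=5.0)
expected_tm, tolerance = 81.0, 1.0  # hypothetical expected melting temperature
single_amplicon = len(peaks) == 1 and abs(peaks[0] - expected_tm) <= tolerance
print(peaks, single_amplicon)  # [81.0] True
```

A second peak above the noise floor would suggest an additional product such as a primer dimer, which is why the peak count is checked before the temperature.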
  • FIGURE 7A and FIGURE 7B panels e-h depict charts showing analytic validation of qPCR assays for IBSP RNA, as for MMP11. All assays used clinical-grade reagents.
  • Panel e depicts amplification plots of 20 microliter qPCR reactions. Reactions covered 12 concentrations of synthetic cDNA template (1.1M to 0 copies per microliter), including 10-fold dilutions for 6 high concentrations (5 technical replicates) and 2-fold dilutions for 5 low concentrations (7 technical replicates). One concentration point overlapped in the high and low concentration series.
  • Each primer pair includes 24 replicates of no-template controls. Error bars at each cycle represent 95% Confidence Intervals of technical replicates.
  • FIGURE 7A Panel f depicts fluorescence versus cycle plots to determine Ct for IBSP.
  • a 4-parameter linear model was fitted to 5 technical replicates (circles). The maximum of the second derivative was used to define the Ct (CtD2).
  • FIGURE 7B panel g depicts threshold cycle versus template dilution plots to calculate linear range.
  • the linear range is defined as the range of concentrations where CtD2 fit a straight line with R-squared >0.995. Red lines indicate 95% Confidence Intervals calculated from 200 bootstraps.
  • FIGURE 7B panel h depicts melt plots that demonstrate the specificity of the primers. Increasing temperature denatures PCR amplicons, which decreases fluorescence. A single peak of the negative first derivative confirms the presence of a single amplicon. The peak corresponds to the expected melting temperature (dashed line).
  • FIGURE 7A and FIGURE 7B panels i-l depict analytic validation of qPCR assays for COL10A1 RNA, as for MMP11. All assays use clinical-grade reagents.
  • FIGURE 7A panel i depicts amplification plots of 20 microliter qPCR reactions. Reactions covered 12 concentrations of synthetic cDNA template (1.1M to 0 copies per microliter), including 10-fold dilutions for 6 high concentrations (5 technical replicates) and 2-fold dilutions for 5 low concentrations (7 technical replicates). One concentration point overlapped in the high and low concentration series. Each primer pair included 24 replicates of no-template controls. Error bars at each cycle represent 95% Confidence Intervals of technical replicates.
  • FIGURE 7A Panel j depicts fluorescence versus cycle plots to determine Ct for COL10A1.
  • a 4-parameter linear model was used to fit all 5 technical replicates (circles). The maximum of the second derivative (green curve) was used to define the Ct (CtD2).
  • FIGURE 7A panel k depicts threshold cycle versus template dilution plots to calculate linear range. The linear range is defined as the range of concentrations where CtD2 fit a straight line with R-squared >0.995. Red lines indicate 95% Confidence Intervals calculated from 200 bootstraps.
  • Panel l depicts melt plots that confirm the specificity of the primers. Increasing temperature denatures PCR amplicons, which decreases fluorescence (black line). A single peak of the negative first derivative (red line) confirms the presence of a single amplicon. The peak corresponds to the expected melting temperature (dashed line).
  • FIGURE 8A, FIGURE 8B, and FIGURE 8C are graphs depicting absolute quantification (RT-qPCR) of the 3 RNAs in the 3-gene set (MMP11, COL10A1, IBSP) in 22 patient samples using Tukey Boxplots.
  • In the Tukey Boxplots, the thick center line represents the mean and boxes show the interquartile range (Q1-Q3). Cumulative Frequency plots show the distribution of expression in tumor and healthy samples.
  • Panel b depicts absolute quantification
  • FIGURE 9 is a graph depicting a Receiver-Operator Characteristic (ROC) Curve of the 3-Gene Classifier. ROC curves show the tradeoff between sensitivity and specificity over all possible thresholds. The 3-gene classifier uses Random Forest to distinguish between tumor and adjacent healthy tissue. Performance estimates are based on 5-fold cross validation of 22 samples that were analyzed with the disclosed RTqPCR assays.
  • ROC: Receiver-Operator Characteristic
  • FIGURE 10 depicts a plot showing Generalized Linear Model (glm) (dashed line) sample discrimination using IBSP RNA in 22 patient samples, analyzed by the disclosed RTqPCR assay.
  • the disclosed RTqPCR assays can resolve a greater difference in analytes than RNA Seq.
  • the disclosed assays perform so well that a simple linear model can correctly classify 100% of the analyzed samples using a single biomarker.
  • RNA Seq required a complex combination of 3 biomarkers, and still did not achieve 100% accuracy.
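The idea of a simple linear model separating tumor from healthy samples on a single biomarker can be sketched as a one-variable logistic regression, which is one kind of generalized linear model. Everything specific below is an assumption: the text does not say which link function or fitting procedure was used, the eight log10 expression values are hypothetical rather than the 22 patient samples, and plain gradient descent stands in for whatever statistical software the applicants used.

```python
import math


def fit_logistic(xs, labels, learning_rate=0.5, iterations=2000):
    """Fit p(tumor) = sigmoid(w * (x - mean_x) + b) by gradient descent."""
    mean_x = sum(xs) / len(xs)
    centered = [x - mean_x for x in xs]
    w, b = 0.0, 0.0
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(centered, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= learning_rate * grad_w / len(xs)
        b -= learning_rate * grad_b / len(xs)
    return w, b, mean_x


def predict_tumor(model, x):
    """Return True when the fitted tumor probability is at least 0.5."""
    w, b, mean_x = model
    return 1.0 / (1.0 + math.exp(-(w * (x - mean_x) + b))) >= 0.5


# Hypothetical log10(copies per microliter) values: 0 = healthy, 1 = tumor.
log_copies = [0.5, 0.8, 1.0, 1.2, 2.5, 2.8, 3.0, 3.3]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(log_copies, labels)
calls = [predict_tumor(model, x) for x in log_copies]
print(calls)
```

Because the hypothetical groups do not overlap, the fitted boundary falls between them and every sample is classified correctly; overlapping real data would not behave this cleanly.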
  • FIGURE 11 depicts a plot showing Generalized Linear Model (glm) (dashed line) sample discrimination using MMP11 RNA in 22 patient samples, analyzed by the disclosed RTqPCR assay.
  • glm: Generalized Linear Model
  • FIGURE 12 shows a chart depicting a Tumor Probability Score calculated using the 3-gene classifier described in EXAMPLE 1.
  • the 3-gene classifier uses the Random Forest algorithm to calculate a Tumor Probability Score (T) from zero to one.
  • Panel a shows the T score for RNA Seq samples from 901 tumors (black) and 113 adjacent healthy samples (grey).
  • Panel b shows the T score for RTqPCR samples from 11 tumors (black) and 11 adjacent healthy samples (grey).
  • FIGURE 13 shows a computer system that is programmed or otherwise configured to implement methods provided herein.
  • pCR has quickly become the primary endpoint for ~50% of enrolling phase II rectal cancer trials, and 45% of phase III preoperative breast cancer trials.
  • Unpublished results from the I-SPY 2 TRIAL of high-risk breast cancer patients indicate that pCR was statistically associated with 3-year outcomes on pooled patients across all treatment arms. After 3 years, patients who achieved pCR had a 6% recurrence risk (event-free survival), compared to 24% recurrence risk for those who did not achieve pCR.
  • Positive margins are defined as malignant cells that touch the cut surface of a specimen (FIG. 1b), indicating residual tumor in the bed of the incision. Positive margins increase the risk of recurrence and disease-specific mortality.
  • Type I errors are known as false positives. False positives have proven a significant barrier in the adoption of analysis of tumor margins by microscopy/histology; in a previous study of lumpectomy margin analysis by Tang et al., only 149 (32%) of 462 positive microscopy results actually had residual tumor along the margin. See, e.g., Tang R, Coopey SB, Specht MC, Lei L, Gadd MA, Hughes KS, Brachtel EF, Smith BL. Lumpectomy specimen margins are not reliable in predicting residual disease in breast conserving surgery. Am J Surg. 2015 Jul;210(1):93-8.
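The figures quoted from Tang et al. can be restated as a positive predictive value. The short calculation below uses only the two counts given in the text (149 confirmed out of 462 positive microscopy results); the function name is illustrative.

```python
def positive_predictive_value(confirmed_positives, all_positive_results):
    """Fraction of positive test results that were truly positive."""
    return confirmed_positives / all_positive_results


ppv = positive_predictive_value(149, 462)
false_positive_count = 462 - 149
false_positive_fraction = 1.0 - ppv

print(f"PPV: {ppv:.1%}")  # PPV: 32.3%
print(f"False positives: {false_positive_count} ({false_positive_fraction:.1%})")
```

Read this way, roughly two of every three positive margin calls in that study were not confirmed by residual tumor, which is the overtreatment concern the text raises.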
  • Type II errors: False Negatives
  • Type II error rate ≤5% represents a 75-100% improvement over existing methods.
  • exclusive focus on Type II errors would be insufficient; high Type I errors (False Positives) would result in overtreatment. Surgeons may even avoid using a test with high Type I errors (False Positives) because it would trigger unnecessary reexcisions.
  • Type I errors: False Positives
  • Mammography is the most widely used screening modality for the detection of breast cancer. There is conflicting evidence about whether screening mammography decreases breast cancer mortality. The evidence is strongest for women aged 50 to 69 years. However, screening in all age groups is also associated with harms. Harms can include unnecessary invasive procedures for patients who do not have breast cancer, and overdiagnosis, which is the detection of tumors that are not clinically significant. The error rates for mammography in women younger than 50 are so high relative to the incidence of invasive breast cancers that the benefit of mammography is uncertain for women between 40 and 49 years old. In 2014, the Canadian National Breast Screening Study completed 25 years of follow-up and found no survival benefit associated with screening mammograms for women of all ages. While it is debatable how these findings should be applied to individual patients, it is clear that screening technologies are insufficient.
  • This group of technologies includes molecular breast imaging, ultrasound, and magnetic resonance imaging.
  • Described herein is a method for analysis of residual tumor cells.
  • the method and kits disclosed herein can identify improved treatment regimens. Accordingly, disclosed herein are post-operative devices and methods for obtaining and analyzing gene expression from cells from patient samples (e.g. from an excisional surgical biopsy) for residual disease.
  • a panel of one to three cDNAs can serve as biomarkers to distinguish invasive breast cancer from adjacent healthy tissue with an accuracy of 96-100%.
  • the disclosed 3-gene test had a 96% Accuracy, 96% Sensitivity, and 94% Specificity.
  • On an independent test set of 75 RNA Seq samples, the 3-gene test had a 97% Accuracy, 98% Sensitivity, 96% Specificity, 98% Positive Predictive Value, and 96% Negative Predictive Value.
  • TCGA: The Cancer Genome Atlas
  • mRNAs are promising biomarkers because changes in cell and tissue morphology necessarily involve changes in gene activity and are therefore ideally situated to improve margin analysis. Moreover, we can now catalog tumor mRNAs across the genome. Finally, clinical labs routinely perform sensitive nucleic acid tests, positioning this qPCR assay for rapid adoption.
  • Prosigna® (PAM50 gene expression test) has 510(k) clearance from the FDA as a prognostic test for the risk of recurrence, in conjunction with clinical factors.
  • the PAM50 strategy of using genes that are downregulated in tumors could therefore not be used to detect rare tumor cells. Since our clinical indication involves detecting tumor cells in a population of healthy cells, we validated tumor-specific mRNAs with high expression in tumors.
  • Described herein is a method for analysis of residual tumor cells.
  • the method and kits disclosed herein can identify complete excision of malignant tissue from patients.
  • Accordingly, disclosed herein are post-operative devices and methods for obtaining and analyzing gene expression from cells from patient samples (e.g. on the surface of surgical specimens) for residual disease.
  • Nucleic acid tests for residual tumor cells provide a powerful solution to address positive surgical margins when combined with methods to acquire samples from the surface of a surgical sample.
  • Described herein is a method for analysis of rare tumor cells.
  • the method and kits disclosed herein can identify rare cancer cells, even when those tumor cells are not found in the context of healthy tissue. Accordingly, disclosed herein are screening devices and methods for obtaining and analyzing gene expression from cells from patient samples (e.g. nipple aspirates from ductal lavage) for disease. Disclosed herein are also adjuvant devices and methods to determine whether a screening test result warrants further investigation.
  • the term “subject” or “patient” can include human or non-human animals.
  • the methods described herein are applicable to both human and veterinary disease and animal models.
  • Preferred subjects are “patients,” e.g., living humans that are receiving medical care for a disease or condition (e.g., cancer). This includes persons with no defined illness who are being investigated for signs of pathology.
  • the methods described herein are particularly useful for the evaluation of patients having or suspected of having breast adenocarcinomas.
  • Biomarkers broadly refer to any characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacologic responses to therapeutic intervention.
  • biomarker specifically refers to biomarkers that have biophysical properties, which allow their measurements in biological samples (e.g., plasma, serum, lavage, biopsy).
  • biomarker is used interchangeably with “molecule biomarker” or “molecular markers.”
  • biomarkers include nucleic acid biomarkers (e.g., oligonucleotides or polynucleotides), peptides or protein biomarkers, lipids, and lipopolysaccharide markers.
  • polynucleotide or “nucleic acid” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides, that comprise purine and/or pyrimidine bases, or other naturally modified nucleotide bases.
  • Polynucleotides of the embodiments of the invention include sequences of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or DNA copies of ribonucleic acid (cDNA), all of which may be isolated from natural sources, recombinantly produced, or artificially synthesized.
  • the polynucleotides and nucleic acids may exist as single-stranded or double-stranded.
  • primer refers to an oligonucleotide which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH.
  • the primer may be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent.
  • the exact length of the primer will depend upon many factors, including temperature, source of primer and the method used. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-35 or more nucleotides, although it may vary for certain biomarkers or
  • Biological sample as used herein is a sample of biological tissue or chemical fluid that is suspected of containing a biomarker or an analyte of interest.
  • the sample may be an ex vivo sample or in vivo sample.
  • Samples include, for example, tissue biopsies, e.g., from the breast or any other tissue suspected to be affected by, for instance, a metastasis of a cancer.
  • the biopsy can be a liquid biopsy or a solid tissue biopsy.
  • the sample can be a surgical excision from a tissue margin or another area suspected to be affected.
  • a sample may be suspended or dissolved in, e.g., buffers, extractants, solvents, and the like.
  • the terms sample and specimen can be used interchangeably herein.
  • Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
  • the term “about” as used herein refers to a range that is 15% plus or minus from a stated numerical value within the context of the particular usage. For example, about 10 would include a range from 8.5 to 11.5.
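The definition of "about" above is a plus-or-minus 15% band around the stated value. A minimal sketch of that arithmetic follows; the function names are illustrative, and treating both endpoints as included in the range is an assumption.

```python
def about_range(value, tolerance=0.15):
    """Return the (low, high) range implied by "about `value`"."""
    delta = abs(value) * tolerance
    return value - delta, value + delta


def is_about(candidate, value, tolerance=0.15):
    """True when `candidate` falls inside the "about" range (endpoints included)."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high


low, high = about_range(10)
print(low, high)          # 8.5 11.5
print(is_about(11, 10))   # True
print(is_about(12, 10))   # False
```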
  • the molecules are circulating molecules.
  • the molecules are expressed in the cytoplasm of blood, endothelial, or organ cells. In some cases, the molecules are expressed on the surface of blood, endothelial, or organ cells.
  • a sample can be any material containing tissues, cells, nucleic acids, genes, gene fragments, expression products, polypeptides, exosomes, gene expression products, or gene expression product fragments of a subject to be tested.
  • a sample can include but is not limited to, tissue, cells, or biological material from cells or derived from cells of an individual.
  • the sample can be a heterogeneous or homogeneous population of cells or tissues.
  • the sample can be a fluid that is acellular or depleted of cells (e.g., serum).
  • the sample is from a single patient.
  • the method comprises analyzing multiple samples at once, e.g., via massively parallel multiplex expression analysis on protein arrays or the like.
  • the sample may be obtained using any suitable method.
  • the sample may be obtained by a minimally-invasive method, e.g., venipuncture or ductal lavage.
  • the sample obtained by venipuncture may comprise whole blood or a component thereof (e.g. serum, white blood cells).
  • Ductal lavage may be performed by e.g. the method described in US20020058887A1, which is incorporated by reference herein.
  • the sample may be obtained by an invasive method, such as by biopsy.
  • Biopsies could include core biopsies, punch biopsies, incisional biopsies and excisional biopsies.
  • a sample obtained by surgical excision may comprise a subsection of an excised tissue chunk (e.g.
  • a sample obtained by surgical excision may comprise a cell-dissociated or homogenized chunk of some or all of the excised tissue.
  • a sample obtained by surgical excision may comprise a surface sample of excised tissue.
  • a surface sample of excised tissue may comprise a “touch prep” sample which reflects the population of cells along the margins of the excised tissue (e.g. tumor).
  • obtaining a sample comprises directly isolating a sample from a patient. In some embodiments, obtaining a sample comprises obtaining a sample previously isolated from a patient. In some embodiments, obtaining a sample comprises obtaining polynucleotides isolated from a sample previously isolated from a patient.
  • the cellular specimen may be obtained using imprint cytology acquisition strategies, one form of which is a ‘touch prep’ or similar method.
  • A ‘touch prep’ is known as a type of imprint cytology.
  • The ‘touch prep’ method may involve smearing or spreading the obtained cellular specimen onto a slide or a plurality of slides.
  • The ‘touch prep’ method may involve pressing the slide to the biological sample.
  • The ‘touch prep’ method may involve pressing the slide to the excised tissue.
  • the ‘touch prep’ method may involve pressing the slide to a tissue on or within the subject.
  • the ‘touch prep’ method may involve pressing the slide to an area, wall or margin surrounding a tissue or biological sample on or within the subject.
  • The ‘touch prep’ method may involve pressing the slide to an area, wall or margin surrounding a site where a tissue was excised. Touch prep may be performed in, e.g.
  • The ‘touch prep’ method may be performed in a few seconds per slide.
  • The ‘touch prep’ method may be performed by a surgeon, a nurse, an assistant, a cytopathologist, a person with no medical training or the subject.
  • The ‘touch prep’ method may be operated manually.
  • The ‘touch prep’ method may be operated automatically by a machine.
  • The ‘touch prep’ method may be performed intraoperatively to detect or rule out malignant cells along the surgical margin (e.g. during a breast lumpectomy).
  • the excised tissue may be pressed against a sample collection unit which is a glass slide coated with poly-Lysine, or other surface.
  • the cellular specimen obtained by a touch prep method may be used to determine the presence or absence of malignant cells along the margin of excised tissue.
  • the surface comprises a sample collection unit.
  • the sample is then applied to a sample input unit of a device.
  • the touch prep sample may be obtained according to the methodology described in US20040030263A1, which is incorporated by reference herein.
  • the samples comprise tissue samples and are prepared by tumor dissociation/homogenization. In some embodiments, this is accomplished using the Miltenyi Biotec Tumor Dissociation Kit in combination with a gentleMACS Tissue Dissociator to homogenize tissue samples in a sterile environment.
  • the Tissue Dissociator uses disposable Miltenyi M tubes with rotor-stators that are built into the tube lids. Frozen samples may be used to achieve more consistent yields.
  • Tissue is added to cell lysis buffer directly in the disruptor tube. After dissociation and lysis, RNA is isolated using an RNA isolation kit, such as the Qiagen RNeasy Mini Kit.
  • This method can isolate high-quality RNA from both tumor and adipose-based tissues. Larger specimens may be divided into smaller pieces depending on maximum tissue input. If tissue dissociation alone does not collect enough high-quality RNA for RT-qPCR, samples may be pre-incubated with enzymatic treatments (e.g. collagenases). Enzymatic treatments may also be applied during mechanical dissociation, an approach which others have validated for the gentleMACS Tissue Dissociator.
  • the methods or compositions herein are capable of detecting breast cancer in a sample from a cancer patient, detecting residual breast cancer in a sample from a cancer patient (e.g. post-chemotherapy/radiation/surgery), or distinguishing between breast cancer and surrounding healthy breast tissue.
  • the detection is based on a minimal amount of polynucleotides or nucleic acids isolated from a sample.
  • the minimal amount of polynucleotides or nucleic acids isolated from the sample is at least 10 ng, 50 ng, 100 ng, 200 ng, 500 ng, 1 mg, 2 mg, 3 mg, 4 mg, 5 mg, 10 mg, 15 mg, 20 mg, 50 mg, or 500 ng.
  • the methods or compositions herein are capable of detecting residual cancer in a sample from a patient, or distinguishing between cancer and surrounding healthy tissue, based on a minimal weight of tissue sample used to isolate polynucleotides or nucleic acids.
  • the minimal amount of tissue sample is at least 100 ng, 200 ng, 500 ng, 1 mg, 2 mg, 3 mg, 4 mg, 5 mg, 10 mg, 15 mg, 20 mg, 50 mg, 100 mg, 200 mg, 300 mg, or 500 mg.
  • biomarker refers to a measurable indicator of some biological state or condition.
  • a biomarker can be a substance found in a subject, a quantity of the substance, or some other indicator.
  • a biomarker can be the amount of a protein and/or other gene expression products in a sample.
  • a biomarker is a total level of protein in a sample.
  • a biomarker is a total level of a particular type of nucleic acid (e.g. RNA, cDNA) in a sample.
  • a biomarker is a therapeutic target, or an indicator of response to therapy.
  • the methods, compositions and systems as described here also relate to the use of biomarker panels for purposes of research, identification, diagnosis, classification, treatment or to otherwise characterize the status of cancer in a patient.
  • Sets of biomarkers useful for classifying biological samples are provided, as well as methods of obtaining such sets of biomarkers.
  • the pattern of levels of biomarkers in a panel (also known as a signature) is determined from a control sample or population and then used to evaluate the signature of the same panel of biomarkers in an experimental sample or population, such as by a measure of similarity between the sample signature and the reference signature.
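The comparison of a sample signature with a reference signature can be sketched as follows. This is a minimal illustration only: the source does not name a similarity measure, so Pearson correlation is an assumption, as are the function name `pearson_similarity` and the dictionary layout.

```python
import math


def pearson_similarity(sample, reference):
    """Pearson correlation between two signatures given as {gene: level} dicts.

    Only biomarkers present in both signatures are compared.
    The choice of Pearson correlation is an assumption, not taken from the source.
    """
    genes = sorted(set(sample) & set(reference))
    if len(genes) < 2:
        raise ValueError("need at least two shared biomarkers")
    x = [sample[g] for g in genes]
    y = [reference[g] for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a signature with zero variance has no defined correlation")
    return cov / (sd_x * sd_y)
```

A value near 1 indicates that the sample follows the same pattern as the reference; any decision cutoff applied to this value would have to be established separately.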
  • the panels of biomarkers described herein are useful for the detection of breast cancer (e.g. detection of positive surgical margins on a biopsy sample, detection of residual disease in a cancer patient post-radiation/chemotherapy/surgery, or detection of disease in a patient suspected of having cancer).
  • the breast cancer is invasive adenocarcinoma, invasive ductal breast cancer, invasive lobular breast cancer, or a combination thereof.
  • the breast cancer is HER2 positive, ER (estrogen receptor) positive, or PR (progesterone receptor) positive, or a combination thereof.
  • the breast cancer is HER2 negative, ER (estrogen receptor) negative, or PR (progesterone receptor) negative.
  • the methods herein comprise measuring expression levels of genes selected from the group consisting essentially of Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1).
  • the methods herein comprise measuring expression levels of genes selected from the group consisting of Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1). In some embodiments the methods herein comprise measuring expression levels of genes selected from the group consisting essentially of Matrix Metallopeptidase 11 (MMP11) and integrin binding sialoprotein (IBSP). In some embodiments the methods herein comprise measuring expression levels of genes selected from the group consisting of Matrix Metallopeptidase 11 (MMP11) and integrin binding sialoprotein (IBSP).
  • the biomarkers that form the basis for the 3-gene test described herein (MMP11, IBSP, and COL10A1) are particularly useful in that their expression is higher (upregulated) in cancerous tissues than in normal tissues.
  • the fraction of a sample that must contain cancerous cells for the sample to be labeled as positive is much lower than for a test that depends on genes that have decreased expression (downregulated) in cancerous tissue.
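The reasoning about upregulated versus downregulated genes can be made concrete with a simple mixing calculation. This sketch assumes that the measured level of a mixed sample is the cell-fraction-weighted average of the cancer and normal levels; that linear model, the function names, and the 50-fold and 2-fold figures in the usage note are illustrative assumptions, not values from the source.

```python
def mixed_ratio(cancer_fraction, fold_change):
    """Level of a mixed sample relative to pure normal tissue.

    fold_change is the cancer level divided by the normal level
    (greater than 1 for upregulated genes, less than 1 for downregulated genes).
    Assumes linear mixing of the two cell populations.
    """
    if not 0.0 <= cancer_fraction <= 1.0:
        raise ValueError("cancer_fraction must be between 0 and 1")
    return (1.0 - cancer_fraction) + cancer_fraction * fold_change


def min_fraction_for_ratio(target_ratio, fold_change):
    """Smallest cancer fraction at which the mixture reaches target_ratio.

    Returns None when no fraction between 0 and 1 can reach the target.
    """
    if fold_change == 1.0:
        return None
    fraction = (target_ratio - 1.0) / (fold_change - 1.0)
    return fraction if 0.0 <= fraction <= 1.0 else None
```

Under these assumptions, a gene that is 50-fold upregulated doubles the measured level when about 2% of the cells are cancerous (`min_fraction_for_ratio(2.0, 50.0)`), whereas a gene that is 50-fold downregulated halves the measured level only when about 51% of the cells are cancerous (`min_fraction_for_ratio(0.5, 0.02)`).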
  • the methods, compositions and systems as described here also relate to the use of a biomarker test for research, identification, diagnosis, classification, treatment or to otherwise characterize the status of cancer in a patient, wherein the level of at least one of Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1) is higher in said cancer than in healthy tissue.
  • the levels of at least two of Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1) are higher in said cancer than in healthy tissue.
  • the levels of each of Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1) are higher in said cancer than in healthy tissue.
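The "at least one", "at least two", and "each of" variants above can be expressed as a single rule with a configurable minimum number of elevated genes. This is a sketch under stated assumptions: the function names, the `min_fold` margin, and the input layout are hypothetical and are not taken from the source.

```python
PANEL = ("MMP11", "IBSP", "COL10A1")


def count_elevated(sample_levels, healthy_levels, min_fold=1.0):
    """Count panel genes whose sample level exceeds min_fold times the healthy level."""
    return sum(
        1 for gene in PANEL
        if sample_levels[gene] > min_fold * healthy_levels[gene]
    )


def is_positive(sample_levels, healthy_levels, min_genes=1, min_fold=1.0):
    """Return True when at least min_genes of the three panel genes are elevated.

    min_genes=1, 2, or 3 corresponds to the three variants described above.
    """
    if not 1 <= min_genes <= len(PANEL):
        raise ValueError("min_genes must be between 1 and 3")
    return count_elevated(sample_levels, healthy_levels, min_fold) >= min_genes
```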
  • the methods, kits, and systems disclosed herein may comprise specifically detecting, profiling, or quantitating biomolecules (e.g., nucleic acids, DNA, RNA, polypeptides, etc.) that are within the biological samples to determine an expression profile.
  • genomic expression products, including RNA, or polypeptides may be isolated from the biological samples.
  • nucleic acids, DNA, RNA, polypeptides may be isolated from a cell-free source.
  • nucleic acids, DNA, RNA, polypeptides may be isolated from cells derived from the cancer patient.
  • the molecules detected are derived from molecules endogenously present in the sample via an enzymatic process (e.g., cDNA derived from reverse transcription of RNA from the biological sample followed by amplification).
  • Expression profiles are preferably measured at the nucleic acid level, meaning that levels of mRNA or nucleic acid derived therefrom (e.g., cDNA or RNA) are measured.
  • An expression profile refers to the expression levels of a plurality of genes in a sample.
  • a nucleic acid derived from mRNA means a nucleic acid synthesized using mRNA as a template. Methods of isolation and amplification of mRNA are described in, e.g., Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, (P. Tijssen, ed.) Elsevier, N.Y. (1993).
  • the amplification is performed under conditions that approximately preserve the relative proportions of mRNA in the original samples, such that the levels of the amplified nucleic acids can be used to establish phenotypic associations representative of the mRNAs.
  • expression levels are determined by direct detection of nucleic acids.
  • Such methods include, e.g., gel or capillary electrophoresis, wherein specifically amplified DNA is detected by its intrinsic fluorescence/absorbance, or by complexing with a suitable absorbent or fluorescent DNA-binding dye.
  • Such methods can be used alongside PCR or RT-PCR with forward and reverse primers against specific genes to detect levels of genes within nucleic acids isolated from a sample.
  • expression levels are determined by NanoString™ assay.
  • NanoString™ based assays are described in U.S. Patent Nos. 8,415,102, 8,519,115, and 7,919,237, which are herein incorporated by reference in their entirety.
  • NanoString's NCOUNTER technology is a variation on the DNA microarray. It uses molecular "barcodes" and microscopic imaging to detect and count up to several hundred unique transcripts in one hybridization reaction. Each color-coded barcode is attached to a single target-specific probe corresponding to a target of interest.
  • the protocol typically includes hybridization
  • the protocol is carried out with a prep station, which is an automated fluidic instrument that immobilizes code set complexes for data collection, and a digital analyzer, which derives data by counting fluorescent barcodes.
  • Code set complexes are custom-made or pre-designed sets of color-coded probes pre-mixed with a set of system controls.
  • Probes for the barcode-based assay can be designed according to desired variables such as melting temperature (Tm) and specificity for the template mRNA/cDNA to be detected.
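As an illustration of screening candidate probe sequences by melting temperature, the following sketch uses the Wallace rule, a rough estimate for short oligonucleotides (Tm ≈ 2 °C per A/T plus 4 °C per G/C). The source does not specify how Tm is calculated, so the choice of rule and the function name are assumptions; probe design software generally uses nearest-neighbor thermodynamic models instead.

```python
def wallace_tm(sequence):
    """Estimate melting temperature (degrees C) of a short DNA oligo by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). Intended only as a rough estimate
    for oligos of roughly 14 to 20 nucleotides.
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count
```

For example, an arbitrary 16-mer with eight A/T and eight G/C bases, such as `"ACGTACGTACGTACGT"` (not one of the sequences disclosed herein), gives an estimate of 48 °C.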
  • expression levels are determined by so-called “real time amplification” methods, also known as quantitative PCR (qPCR) or Taqman.
  • The basis for this method is monitoring the formation of amplification product during a PCR reaction with a template, using oligonucleotide probes/oligos specific for a region of the template to be detected.
  • qPCR or Taqman are used immediately following a reverse-transcriptase reaction performed on isolated cellular mRNA; this variety serves to quantitate the levels of individual mRNAs during qPCR.
  • Taqman uses a dual-labeled fluorogenic oligonucleotide probe.
  • the dual labeled fluorogenic probe used in such assays is typically a short (ca. 20-25 bases) polynucleotide that is labeled with two different fluorescent dyes.
  • the 5’ terminus of the probe is typically attached to a reporter dye and the 3’ terminus is attached to a quenching dye. Whether or not it is labelled, the qPCR probe is designed to have at least substantial sequence complementarity with a site on the target mRNA or a nucleic acid derived therefrom.
  • Upstream and downstream PCR primers that bind to flanking regions of the locus are also added to the reaction mixture.
  • When the probe is intact, energy transfer between the two fluorophores occurs and the quencher quenches emission from the reporter.
  • the probe is cleaved by the 5’ nuclease activity of a nucleic acid polymerase such as Taq polymerase, thereby releasing the reporter from the polynucleotide-quencher and resulting in an increase of reporter emission intensity which can be measured by an appropriate detector.
  • mRNA levels can also be measured without amplification by hybridization to a probe, for example, using a branched nucleic acid probe, such as a QuantiGene® Reagent System from Panomics.
  • This format of test is particularly useful for the multiplex detection of multiple genes from a single sample reaction, as each fluorophore/quencher pair attached to an individual probe may be spectrally orthogonal to the other probes used in the reaction such that multiple probes (each directed against a different gene product) can be detected during the amplification/detection reaction.
  • qPCR can also be performed without a dual-labeled fluorogenic probe by using a fluorescent dye (e.g. SYBR Green) specific for dsDNA, whose signal reflects the accumulation of dsDNA amplified by specific upstream and downstream oligonucleotide primers.
  • the increase in fluorescence during the amplification reaction is followed on a continuous basis and can be used to quantify the amount of mRNA being amplified.
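One common way to turn a continuously monitored fluorescence curve into a number is to record the fractional cycle at which the signal first crosses a fixed threshold (often called Ct or Cq). The sketch below illustrates that idea with linear interpolation between adjacent cycles; the function name, the input layout, and the use of a fixed threshold are assumptions, and instrument software typically applies baseline correction first.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches threshold.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    previous = None
    for index, value in enumerate(fluorescence):
        cycle = index + 1
        if value >= threshold:
            if previous is None:
                # Already at or above threshold at the first reading.
                return float(cycle)
            # Linear interpolation between the previous cycle and this one.
            return (cycle - 1) + (threshold - previous) / (value - previous)
        previous = value
    return None
```

A lower threshold cycle corresponds to more starting template, which is why this value can serve as the input for relative quantification.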
  • the levels of particular genes may be expressed relative to one or more internal control genes measured from the same sample using the same detection methodology.
  • Internal control genes may include so-called “housekeeping” genes (e.g. ACTB, B2M, UBC, GAPD and HPRT1).
  • the one or more internal control genes are TTC5, C2orf44, or Chr3.
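Expressing a target gene relative to internal control genes is often done with the ΔCt approach, sketched below. This is an assumption about one possible calculation, not the method defined by the source: it averages the control Ct values arithmetically and assumes a fixed amplification efficiency (2.0 means perfect doubling per cycle). The function name and parameters are hypothetical.

```python
def relative_expression(ct_target, ct_references, efficiency=2.0):
    """Relative expression of a target gene versus internal control genes.

    delta_ct = ct_target - mean(ct_references)
    relative level = efficiency ** (-delta_ct)
    """
    if not ct_references:
        raise ValueError("at least one reference Ct value is required")
    mean_reference = sum(ct_references) / len(ct_references)
    delta_ct = ct_target - mean_reference
    return efficiency ** (-delta_ct)
```

For instance, a target Ct of 24.0 measured against control Ct values of 26.0 and 28.0 gives a ΔCt of -3 and a relative level of 8.0 (illustrative numbers only).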
  • a “pre-amplification” step is performed on cDNA transcribed from cellular RNA prior to the quantitatively monitored PCR reaction. This serves to increase signal in conditions where the natural level of the RNA/cDNA to be detected is very low. Suitable methods for pre-amplification include but are not limited to LM-PCR, PCR with random oligonucleotide primers (e.g. random hexamer PCR), PCR with poly-A specific primers, and any combination thereof.
  • an RT-PCR step is first performed to generate cDNA from cellular RNA.
  • amplification by RT-PCR can either be general (e.g. amplification with partially/fully degenerate oligonucleotide primers) or targeted (e.g. amplification with oligonucleotide primers directed against specific genes which are to be analyzed at a later step).
  • expression levels are determined by sequencing, such as by RNA sequencing or by DNA sequencing (e.g., of cDNA generated from reverse-transcribing RNA (e.g., mRNA) from a sample). Sequencing may also be general (e.g. with amplification using partially/fully degenerate oligonucleotide primers) or targeted (e.g. with amplification using oligonucleotide primers directed against specific genes which are to be analyzed at a later step). Sequencing may be performed by any available method or technique.
  • Sequencing methods may include: Next Generation sequencing, high-throughput sequencing, pyrosequencing, classic Sanger sequencing methods, sequencing-by-ligation, sequencing by synthesis, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next generation sequencing, single molecule sequencing by synthesis (SMSS) (Helicos), Ion Torrent Sequencing Machine (Life Technologies/Thermo-Fisher), massively-parallel sequencing, clonal single molecule Array (Solexa), shotgun sequencing, Maxam-Gilbert sequencing, primer walking, and any other sequencing methods known in the art.
  • Measuring gene expression levels may comprise reverse transcribing RNA (e.g., mRNA) within a sample in order to produce cDNA.
  • the cDNA may then be measured using any of the methods described herein (e.g., qPCR, sequencing, etc.).
  • expression levels of genes can be determined at the protein level, meaning that levels of proteins encoded by the genes discussed above are measured.
  • immunoassays, such as sandwich, competitive, or non-competitive assay formats, may be used to generate a signal that is related to the presence or amount of a protein analyte of interest.
  • Immunoassays such as, but not limited to, lateral flow, enzyme-linked immunoassays (ELISA), radioimmunoassays (RIAs), and competitive binding assays may be utilized.
  • Numerous formats for antibody arrays employing antibodies have been described.
  • Other ligands having specificity for a particular protein target can also be used, such as synthetic antibodies.
  • the methods provided herein can detect the presence of residual disease, such as a positive margin on a surgical cancer biopsy, or the presence of disease in a sample from a cancer patient, with a high degree of accuracy, sensitivity, and/or specificity.
  • the accuracy e.g., for detecting residual disease, or distinguishing between residual disease and surrounding healthy tissue is at least about 75%, 80%, 85%, 90%, 91%, 92%, 93%,
  • the sensitivity e.g., for detecting residual disease, or distinguishing between residual disease and surrounding healthy tissue
  • the specificity e.g., for detecting residual disease, or distinguishing between residual disease and surrounding healthy tissue
  • the positive predictive value e.g, for detecting residual disease, or distinguishing between residual disease and surrounding healthy tissue of the method at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%,
  • the AUC after thresholding in any of the methods provided herein may be at least about 0.7, 0.75, 0.8, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 0.995, or 0.999.
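For reference, the area under the ROC curve (AUC) can be computed directly from per-sample scores as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below shows that standard rank-based definition; how the source applies thresholding before computing AUC is not specified, so this is only a generic illustration with a hypothetical function name.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve from scores of known-positive and known-negative samples.

    Equivalent to the normalized Mann-Whitney U statistic.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```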
  • the methods disclosed herein have a positive predictive value of at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%.
  • the methods disclosed herein have a negative predictive value of at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%.
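The performance measures listed above all follow from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name and the returned dictionary keys are illustrative choices, and any counts passed to it would come from a validation study, which is not reproduced here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic performance measures from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Undefined ratios are returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }
```

Note that positive and negative predictive values depend on how common the disease is in the tested population, whereas sensitivity and specificity do not.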
  • the methods, compositions, systems and kits provided herein can be used to detect, diagnose, predict or monitor a condition of a pregnant patient.
  • the methods, compositions, systems and kits described herein provide information to a medical practitioner that can be useful in making a therapeutic decision.
  • Therapeutic decisions can include decisions to: continue with a particular therapy, modify a particular therapy, alter the dosage of a particular therapy, stop or terminate a particular therapy, alter the frequency of a therapy, introduce a new therapy, introduce a new therapy to be used in combination with a current therapy, or any combination of the above.
  • the methods provided herein can be applied in an experimental setting, e.g., a clinical trial.
  • the guidance of a test result herein e.g.
  • a test result herein e.g. presence of residual disease
  • the guidance of a test result herein may be used to indicate the location of a further tumor excision to be performed on the patient (e.g. in the case where the test is used in combination with multiple touch prep samples derived as described above to indicate where surgical margins have been insufficient in an excised sample).
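When several touch prep samples are taken from different faces of an excised specimen, the per-sample results can be mapped back to locations. The sketch below shows one hypothetical bookkeeping scheme; the margin labels, the function name, and the boolean result format are assumptions and are not defined in the source.

```python
def margins_for_reexcision(margin_results):
    """Return the labels of margins whose touch prep sample tested positive.

    margin_results maps a margin label (e.g. "superior", "medial")
    to True when residual disease was detected on that margin.
    """
    return sorted(label for label, positive in margin_results.items() if positive)
```

For example, `margins_for_reexcision({"superior": False, "medial": True, "lateral": True})` returns `["lateral", "medial"]`, identifying the faces where further excision might be considered.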
  • serial testing such as serial non-invasive tests, serial minimally-invasive tests (e.g., blood draws, ductal lavage), or some combination thereof.
  • the cancer patient is monitored as needed using the methods described herein.
  • the cancer patient can be monitored weekly, monthly, or at any pre-specified intervals.
  • the cancer patient is monitored at least once every 24 hours.
  • the cancer patient is monitored at least once every 1 day to 30 days.
  • the cancer patient is monitored at intervals of at least 1 day.
  • the cancer patient is monitored at intervals of at most 30 days. In some instances the cancer patient is monitored at least once every 1 day to 5 days, 1 day to 10 days, 1 day to 15 days, 1 day to 20 days, 1 day to 25 days, 1 day to 30 days, 5 days to 10 days, 5 days to 15 days, 5 days to 20 days, 5 days to 25 days, 5 days to 30 days, 10 days to 15 days,
  • the cancer patient is monitored at least once every 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 28, 29, 30 or 31 days. In some instances, the cancer patient is monitored at least once every 1, 2, 3, or 6 months.
  • the primers disclosed herein, such as a pair of primers as described herein, specifically a forward primer (“F”) and a reverse primer (“R”) for both strands to be detected, can be in a composition in amounts effective to permit detection of native, mutant, reference, or control sequences. Detection of native, mutant, reference, or control sequences is accomplished using any of the methods described herein or known by one of ordinary skill in the art for detecting a specific nucleic acid molecule in a sample.
  • the primers disclosed herein may be provided as part of a kit.
  • a kit can also comprise buffers, nucleotide bases and other compositions to be used in hybridization and/or amplification reactions. In other cases, the primers described herein may be part of a device.
  • a panel of nucleic acids is detected in a sample from a patient.
  • a panel of one to three cDNAs can serve as biomarkers to distinguish invasive breast cancer from adjacent healthy tissue.
  • a panel of one to three cDNAs can serve to detect residual breast cancer post-chemotherapy, post-radiation treatment, or post-surgical excision of tumor(s).
  • Such cDNA panels may comprise IBSP, MMP11, and/or COL10A1 cDNA.
  • a panel may comprise two or three genes selected from IBSP, MMP11, and COL10A1, which can be amplified using the primers disclosed herein.
  • the relative levels of cDNA panels may be assessed relative to the cDNA levels of a reference gene panel.
  • Such reference gene panel may comprise TTC5 and/or C2orf44, which can be amplified using the primers disclosed herein.
  • the single genes or gene panels are compared to a negative control for genomic DNA, for example, chr3 gDNA, which can be amplified using the primers disclosed herein.
  • the nucleic acids disclosed herein may be used as a biomarker.
  • a portion of the cDNA sequence of MMP11, IBSP, or COL10A1 may be used as a biomarker to detect cancer.
  • sequence of an MMP11 cDNA is according to:
  • sequence of an IBSP cDNA is according to:
  • sequence of a COL10A1 cDNA is according to:
  • AAAA SEQ ID NO:359
  • FIG. 13 shows a computer system 1301 that is programmed or otherwise configured to identify biomarkers for a cancer, such as a breast cancer.
  • the computer system 1301 can regulate various aspects of the analysis of the present disclosure; for example, it can analyze a cohort of biomarkers from a population of subjects afflicted with a cancer; it can identify a first subset from said cohort of said biomarkers that has at least a 3-fold higher expression level in said cancer as compared to tissue samples that do not contain cancer, such as a healthy control; it can identify a second subset from said first subset of said biomarkers that have a false discovery rate of less than 10⁻⁶; it can use at least one biomarker from said second subset of said biomarkers as input for a machine learning algorithm such as correlation feature selection (CFS); and it can further output one or more biomarkers that identify said cancer.
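The first two filtering steps of this workflow (fold change, then false discovery rate) can be sketched as below. Several points are assumptions rather than details from the source: the Benjamini-Hochberg procedure is used to estimate the false discovery rate, the adjustment is computed over all tested biomarkers before filtering, the cutoff is read as 1e-6, and the record layout and function names are hypothetical. The subsequent feature-selection step (e.g. CFS) is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_candidates(records, min_fold=3.0, max_fdr=1e-6):
    """Select biomarkers that pass the fold-change and FDR filters.

    records is a list of (name, cancer_mean, normal_mean, p_value) tuples.
    min_fold=3.0 follows the text above; max_fdr=1e-6 is an assumed reading.
    """
    q_values = benjamini_hochberg([r[3] for r in records])
    selected = []
    for (name, cancer_mean, normal_mean, _), q in zip(records, q_values):
        passes_fold = normal_mean > 0 and cancer_mean / normal_mean >= min_fold
        if passes_fold and q < max_fdr:
            selected.append(name)
    return selected
```

The names returned by `select_candidates` would then serve as the input set for the feature-selection algorithm.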
  • the computer system 1301 can be an electronic device of a user or
  • the computer system 1301 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1305, which can be a single-core or multi-core processor, or a plurality of processors for parallel processing.
  • the computer system 1301 also includes memory or memory location 1310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1315 (e.g., hard disk), communication interface 1320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1325, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 1310, storage unit 1315, interface 1320 and peripheral devices 1325 are in communication with the CPU 1305 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 1315 can be a data storage unit (or data repository) for storing data.
  • the computer system 1301 can be operatively coupled to a computer network (“network”) 1330 with the aid of the communication interface 1320.
  • the network 1330 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 1330 in some cases is a telecommunication and/or data network.
  • the network 1330 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 1330, in some cases with the aid of the computer system 1301, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1301 to behave as a client or a server.
  • the system can train a number of classifiers that identify breast cancer.
  • the CPU 1305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 1310.
  • the instructions can be directed to the CPU 1305, which can subsequently program or otherwise configure the CPU 1305 to implement methods of the present disclosure. Examples of operations performed by the CPU 1305 can include fetch, decode, execute, and writeback.
  • the CPU 1305 can be part of a circuit, such as an integrated circuit.
  • One or more other components of the system 1301 can be included in the circuit.
  • the circuit is an application specific integrated circuit (ASIC).
  • the storage unit 1315 can store files, such as drivers, libraries and saved programs.
  • the storage unit 1315 can store user data, e.g., user preferences and user programs.
  • the computer system 1301 in some cases can include one or more additional data storage units that are external to the computer system 1301, such as located on a remote server that is in communication with the computer system 1301 through an intranet or the Internet.
  • the computer system 1301 can communicate with one or more remote computer systems through the network 1330.
  • the computer system 1301 can communicate with a remote computer system of a user; e.g., it can access electronic data from the TCGA project.
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 1301 via the network 1330.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1301, such as, for example, on the memory 1310 or electronic storage unit 1315.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 1305.
  • the code can be retrieved from the storage unit 1315 and stored on the memory 1310 for ready access by the processor 1305.
  • the electronic storage unit 1315 can be precluded, and machine-executable instructions are stored on memory 1310.
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
  • aspects of the systems and methods provided herein can be embodied in programming.
  • Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
  • another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
  • a machine-readable medium such as computer-executable code
  • a tangible storage medium such as computer-executable code
  • Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
  • Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the computer system 1301 can include or be in communication with an electronic display 1335 that comprises a user interface (UI) 1340 for providing, for example, an output listing one or more biomarkers that identify a cancer, such as breast cancer.
  • Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
  • Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
  • An algorithm can be implemented by way of software upon execution by the central processing unit 1305.
  • kits comprising any of the primers and reagents for detecting the 3-gene panel of biomarkers described in this application.
  • the kits may comprise at least one primer sequence that has at least 90% identity to any one of SEQ ID NO: 1- SEQ ID NO: 356, and a buffer solution/system.
  • the kit comprises at least one forward primer that has at least 90% identity to any one of SEQ ID NO: 1-40, SEQ ID NO: 56-150, or SEQ ID NO: 228-248 and at least one reverse primer that has at least 90% identity to any one of SEQ ID NO: 41-85, SEQ ID NO: 151-227, or SEQ ID NO: 249-279.
  • the kit comprises at least one forward reference primer that has at least 90% identity to any one of SEQ ID NO:280-293 or SEQ ID NO:311-324 and at least one reverse reference primer that has at least 90% identity to any one of SEQ ID NO: 294-310 or SEQ ID NO: 325-338.
  • the kit comprises at least one forward positive control primer that has at least 90% identity to any one of SEQ ID NO:339- 347 and at least one reverse positive control primer that has at least 90% identity to any one of SEQ ID NO: 348-356.
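The "at least 90% identity" criterion used for the primers above can be illustrated with a minimal sketch. It assumes the simplest reading, an ungapped position-by-position comparison of two equal-length sequences; the disclosure does not state how identity is computed, and the two 20-mers below are hypothetical and are not taken from SEQ ID NO: 1-356.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    a, b = candidate.upper(), reference.upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_identity(candidate: str, reference: str, minimum: float = 90.0) -> bool:
    """True when the candidate reaches the minimum percent identity."""
    return percent_identity(candidate, reference) >= minimum


# Hypothetical 20-mers for illustration only.
REFERENCE = "ACGTACGTACGTACGTACGT"
CANDIDATE = "ACGTACGTACCTACGTACGA"  # differs from REFERENCE at two positions

identity = percent_identity(CANDIDATE, REFERENCE)
```

A gapped alignment would be needed to compare primers of different lengths; that case is outside this sketch.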
  • the kit comprises at least one forward and reverse primer sequence for each of IBSP, MMP11, and COL10A1 that has at least 90% identity to any of the primer combinations in Table 13, Table 14, and Table 15. In some embodiments, the kit comprises at least one forward and reverse primer sequence for each of TTC5 and C2orf44 that has at least 90% identity to any of the primer combinations in Table 16 and Table 17. In some embodiments, the kit comprises at least one forward and reverse primer sequence for chr3 gDNA that has at least 90% identity to any of the primer
  • kits further comprise a DNA-intercalating dye or a fluorescent probe, such as a TaqMan compatible probe.
  • a TaqMan compatible probe may comprise a short oligonucleotide sequence designed to hybridize to the desired gene, in combination with a 5’-fluorophore and a 3’ -quencher attached to either end of the
  • the kit also comprises a negative control sample, a positive control sample, or a synthetic nucleotide control.
  • kits can further comprise a set of reagents for a polymerase chain reaction.
  • reagents for a polymerase chain reaction include a suitable thermostable DNA polymerase (e.g. Taq polymerase, which may be a hot-start polymerase to improve fidelity) solution, a solution of 4 dNTPs (e.g. dATP, dTTP, dGTP, dCTP), a buffer solution, DNAse-free water, and/or solutions of PCR stabilizers/enhancers.
  • Buffers are prepared at the pH optimum for the enzyme and may additionally comprise salts such as KCl, NaCl, and/or MgCl2, reducing agents such as DTT or β-ME, detergents such as Triton-X or Tween-20, and/or glycerol as useful for function of the enzyme.
  • Stabilizers/additives may include agents such as DMSO, betaine monohydrate, formamide, MgCl2, glycerol, BSA, tween-20,
  • a polymerase chain reaction kit may include suitable fluorescent DNA-binding dyes such as SYBR Green, ethidium bromide, or EvaGreen.
  • the set of reagents can be for a reverse-transcriptase polymerase chain reaction.
  • Reagents for a reverse-transcriptase polymerase chain reaction include a suitable reverse transcriptase (such as Moloney murine leukemia virus, M-MLV, reverse transcriptase) solution, a solution of 4 dNTPs (e.g. dATP, dTTP, dGTP, dCTP), a buffer solution, an RNAse inhibitor solution, and/or RNAse-free water.
  • the kit can further comprise written instructions for a use thereof. Such instructions may include instructions for isolating/preparing the sample, operating
  • the kit can further comprise components for touch-prep.
  • Such components include poly-D-lysine coated glass slides, an RNA isolation kit, and/or spin columns (suitable for isolation/purification of RNA) and collection tubes.
  • a minimal RNA isolation kit may comprise an RNAse-free sample disruption buffer solution, solutions of RNA isolation reagents (e.g. Trizol or phenol/chloroform or phenol/chloroform/isoamyl alcohol mixtures), RNAse-free DNase, and/or a solution of an RNAse inhibitor.
  • the kit can further comprise components for tumor/tissue dissociation.
  • Such components include a) solutions of enzymes for extracellular matrix (ECM) or other protein degradation such as collagenase, trypsin, elastase, hyaluronidase, and/or papain; b) solutions for lysis of red blood cells from tissue (e.g. a hypotonic lysis buffer); c) a tissue dissociator (e.g. Miltenyi gentleMACS Octo Tissue Dissociator); d) a stabilization buffer (e.g.
  • a lysis buffer a buffered solution, optionally hypotonic, containing ionic or nonionic detergents such as Triton X-100, tween-20, beta-octyl glucoside, and/or SDS.
  • the kit is a kit for the detection of positive surgical margins.
  • a kit includes components such as instructions, primers or primer
  • the kit consists of instructions, touch prep components as described above, reagents for polymerase chain reaction, and reagents for reverse-transcriptase polymerase chain reaction.
  • the kit is a kit for detection of molecular complete response (mCR).
  • mCR molecular complete response
  • a kit includes components such as instructions, primers or primer combinations outlined above (e.g. forward and reverse primers for each target gene, forward and reverse primers for a reference gene, and forward and reverse primers for a gDNA control gene), components for tumor/tissue dissociation as described above, an RNA isolation kit, and/or spin columns suitable for isolation/purification of RNA.
  • Example 1 A 3-gene test for residual disease/pathologic complete response
  • cDNA was prepared from clinical samples and qPCR was performed according to standard protocols.
  • standard protocols and kits for cDNA preparation/q-PCR are known to those of skill in the art.
  • Exemplary protocols include those from ThermoFisher (e.g. Manual for Power SYBR® Green RNA-to-CT™ 1-Step Kit, Part Number 4391003 Rev. D; Manual for EXPRESS One-Step SuperScript® qRT-PCR Kits, Rev. Date: 28 June 2010, Manual part no. A10327), BioRad (e.g.
  • This test was designed to detect early-stage tumors, but the analysis also included 175 late-stage tumors (T3-T4) as a second independent test set.
  • novel biomarkers for cancer were identified when a computer system was used to analyze a cohort of biomarkers from the aforementioned population of subjects afflicted with a cancer.
  • the method identified a first subset of biomarkers that had at least a 3-fold higher expression level in said cancer as compared to a healthy control biomarker; and a second subset from said first subset of said biomarkers that provided a false discovery rate for said cancer that was less than 0.000001.
  • the markers identified were used to train a machine learning algorithm and were experimentally validated.
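The two-stage selection described above (a fold-change filter followed by a false discovery rate cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions: the Benjamini-Hochberg procedure is assumed for the FDR adjustment because the text does not name one, fold change is computed as a simple ratio of mean expression values, and the gene names and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_biomarkers(genes, min_fold=3.0, max_fdr=1e-6):
    """genes: dicts with 'name', 'tumor' and 'healthy' (positive mean expression), and 'p'."""
    fdr = benjamini_hochberg([g["p"] for g in genes])
    # First subset: at least min_fold higher expression in tumor than in healthy tissue.
    first = [(g, q) for g, q in zip(genes, fdr) if g["tumor"] / g["healthy"] >= min_fold]
    # Second subset: members of the first subset whose adjusted p-value is below max_fdr.
    return [g["name"] for g, q in first if q < max_fdr]


# Hypothetical genes and values, for illustration only.
GENES = [
    {"name": "GENE_A", "tumor": 80.0, "healthy": 10.0, "p": 1e-9},  # passes both filters
    {"name": "GENE_B", "tumor": 20.0, "healthy": 10.0, "p": 1e-8},  # fold change too small
    {"name": "GENE_C", "tumor": 90.0, "healthy": 10.0, "p": 0.5},   # not significant
]

selected = select_biomarkers(GENES)
```

Both cutoffs are exposed as parameters so the same function can be rerun with a stricter or looser filter.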
  • the method identified a 3-gene set of markers from a plurality of biomarkers from 939 RNA Seq samples.
  • the method was tested on two independent RNA Seq test sets (TABLE 19A and TABLE 19B).
  • the selected 3-gene set of markers correctly classified 96.2% of 939 samples in the Cross-validation Set (early stage, AJCC TNM Tumor Stages T1-T2). Since these results were unexpected, we tested whether the performance estimates from cross-validation were inflated by potential modeling errors (e.g. overfitting). First, a suite of negative controls did not detect any modeling errors in the cross validation. Second, the classifier was trained on all 939 samples in the cross-validation set, and tested on a hold-out set of 75 samples.
  • the 3-gene Random Forest test correctly classified 97.3% of 75 early-stage samples in one independent test set, and correctly classified 94.3% of 175 late-stage samples in a second independent test set. Performance was not significantly affected by race, ethnicity, tumor stage, or ER/PR/Her2 status.
  • overfit models have higher performance from resampling estimates like cross validation than on independent validation sets.
  • cross validation estimates and performance on the independent validation set were within the 95% confidence intervals.
  • a subset of biomarkers that had a large mean difference between groups, with two clearly separated distributions, was first identified using a computer system. In addition, to detect tumor cells in a population of healthy cells, additional biomarkers that had a higher level of expression in tumors than healthy samples were selected. To identify such biomarkers, genes with a log2(fold-change) > 3 and genes with a False Discovery Rate (adjusted p-value) of p < 10⁻⁶ were identified. The method identified a first subset of biomarkers that were overexpressed in tumors (FIG.2A).
  • CFS Correlation-based Feature Selection
  • the 6 algorithms were the support vector algorithm SMO, Naive Bayes, J48 Decision Tree, Lazy-IBk, the Multilayer Perceptron neural network, and Random Forest.
  • the Random Forest ensemble machine learning method was used in the remaining experiments.
  • Principal Component Analysis (PCA) (FIG.3) suggests a rationale for why the disclosed 3-gene set of markers had higher performance than existing breast cancer disease classifiers.
  • Existing classifiers attempt to identify subgroups among the cluster of tumor samples. This leads to the strongest performance being focused on distinguishing the two most prominent groups, such as tumor and healthy (FIG. 3).
  • the 3-gene test had an accuracy of 94-97% when analyzed on 3 sets of RNA
  • RNA Seq samples from early-stage breast tumors and adjacent healthy tissue were divided into a Cross-validation set of 939 samples and an Independent Test Set of 75 samples (TABLE 19). The 3-gene test correctly classified 96.2% of the samples in the Cross-validation set (TABLE 20). The Area Under the Receiver Operator Characteristic Curve (AUC ROC) was 0.990 (95% CI: 0.997-1.000) (FIG.4).
  • the 3-gene test has equivalent performance on the early-stage Independent Test Set: 97.3% Accuracy, 0.998 AUC ROC (95% CI: 0.992-1.000), 98.0% Sensitivity, 96.0% Specificity, 98.0% Positive Predictive Value, and 96.0% Negative Predictive Value.
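The performance figures quoted for the 75-sample Independent Test Set all derive from a 2x2 confusion matrix. The sketch below shows the arithmetic. The counts used (49 true positives, 1 false negative, 24 true negatives, 1 false positive) are hypothetical: they were chosen only because they reproduce the reported percentages, and the actual per-class counts are not given in this passage.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # tumors correctly called tumor
        "specificity": tn / (tn + fp),   # healthy samples correctly called healthy
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for a 75-sample test set (50 tumor, 25 healthy).
metrics = diagnostic_metrics(tp=49, fn=1, tn=24, fp=1)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With these counts the rounded values are 97.3% accuracy, 98.0% sensitivity, 96.0% specificity, 98.0% PPV, and 96.0% NPV.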
  • T3 and T4 (later stage) samples were also tested. 175 late-stage primary tumors were used as a second independent test set. In this analysis, the classifier correctly detects 94.3% of late-stage tumors. In all test sets, the 3-gene test performed equally well regardless of racial groups or clinical subtypes (ER, PR, Her2) (see TABLE 19).
  • Our classifier combines a 3-gene set of markers including MMP11, IBSP, and COL10A1 using the Random Forest machine learning algorithm.
  • Negative Control I: Randomized, Fictitious Class Labels to Detect Overfitting
  • The biomarker selection workflow was performed on a dataset with randomized class labels to detect overfitting. Using the existing classification of samples in the dataset as either Tumor or Healthy, markers were randomly assigned to a fictitious class or to a gene expression class (Class A or B). The workflow was repeated in the same manner used to develop the 3-gene test, this time trying to distinguish Class A from B. The 3 best genes were selected in each of 10 cross-validation folds, and Random Forest was used to train a classifier for each fold. Subsequently, 10 independent test sets were used to determine performance of the 10 models. By performing the disclosed workflow on a dataset with randomized class labels, the strategy detects overfitting. FIG.6, Negative Control I (Random Class) clearly shows no evidence of overfitting: 0.51 Area Under the ROC Curve, 51.9% Accuracy, 51.6%
  • FIG.6 0.733 AUC ROC, 72.6% Accuracy, 73.8% Sensitivity, 61.8% Specificity; however, randomly selected genes perform much worse than our 3-Gene Test. These data demonstrate that randomly selected genes do not provide an adequate set of biomarkers.
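The randomized-label control (Negative Control I) can be sketched in a few lines. The idea is that a sound workflow should score near chance once class labels are shuffled, so a high score on shuffled labels would point to overfitting. This sketch swaps in a one-dimensional nearest-centroid classifier in place of the Random Forest and gene selection used in the disclosure, to stay within the standard library; the expression values, seed, and fold count are hypothetical.

```python
import random


def nearest_centroid_cv_accuracy(values, labels, folds=5):
    """Accuracy of a 1-D nearest-centroid classifier under k-fold cross validation."""
    n = len(values)
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, n, folds))
        train = [(values[i], labels[i]) for i in range(n) if i not in test_idx]
        # One centroid (mean expression) per class, computed on the training folds only.
        centroids = {}
        for cls in set(lab for _, lab in train):
            members = [v for v, lab in train if lab == cls]
            centroids[cls] = sum(members) / len(members)
        for i in test_idx:
            predicted = min(centroids, key=lambda c: abs(values[i] - centroids[c]))
            correct += predicted == labels[i]
    return correct / n


# Hypothetical expression values: tumors near 10-12, healthy samples near 1-3.
values, labels = [], []
for i in range(20):
    values += [10.0 + 0.1 * i, 1.0 + 0.1 * i]
    labels += ["Tumor", "Healthy"]

real_accuracy = nearest_centroid_cv_accuracy(values, labels)

# Negative control: same data, class labels shuffled with a fixed seed.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
control_accuracy = nearest_centroid_cv_accuracy(values, shuffled)
```

On the true labels the two groups are perfectly separated, so the cross-validated accuracy is 1.0, while the shuffled-label control is expected to land near 0.5.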
  • the disclosed one-step RTqPCR assay uses targeted primers to reverse transcribe RNA into cDNA, followed by qPCR amplification of cDNA and detection using a DNA-intercalating dye. Synthetic templates were utilized to optimize the concentration of each primer (titrations of primer concentrations) and annealing temperatures (temperature gradients). Some RNA primers were designed to span exon junctions. For exon-spanning primers, genomic DNA from HeLa cells was used to verify that RNA quantification is not impacted by the presence of genomic DNA.
  • RNA from 3 invasive breast tumors was used to test each primer pair.
  • the testing evaluated 3 tumor-specific genes (IBSP, MMP11, and COL10A1), 2 reference genes (C2orf44 and TTC5), and a control to detect genomic DNA, chr3 gDNA.
  • All RNA experiments also included a positive control for each primer pair.
  • qPCR assays were used to analyze RNA from 22 clinical samples (11 pairs of invasive breast adenocarcinomas and adjacent healthy samples).
  • FIGURE 7A and FIGURE 7B depict charts showing analytic validation of qPCR assays using clinical-grade reagents.
  • FIGURE 7A panel a depicts amplification plots of 20 microliter qPCR reactions. 12 concentrations of synthetic cDNA template (1.1 million to 0 copies per microliter), including 10-fold dilutions for 6 high concentrations (5 technical replicates) and 2-fold dilutions for 5 low concentrations (7 technical replicates).
  • Each primer pair includes 24 replicates of no-template controls. Error bars at each cycle represent 95% CI of technical replicates.
  • FIGURE 7A panel b depicts fluorescence versus cycle plots to determine Ct for MMP11.
  • a 4-parameter linear model was fitted to 5 technical replicates (circles). The maximum of the second derivative was used to define the Ct (CtD2).
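Defining Ct as the cycle at which the second derivative of the fluorescence curve is greatest (CtD2) can be illustrated as below. This is a sketch under assumptions: it skips the model-fitting step and instead samples a hypothetical, already-fitted sigmoid curve once per cycle, approximates the second derivative with central differences, and refines the maximum with a three-point parabola. The curve parameters are illustrative and not taken from the disclosure.

```python
import math


def sigmoid_fluorescence(cycle, baseline=0.0, plateau=1.0, midpoint=25.0, slope=1.5):
    """Hypothetical fitted amplification curve (four parameters)."""
    return baseline + (plateau - baseline) / (1.0 + math.exp(-(cycle - midpoint) / slope))


def ct_second_derivative_max(fluorescence):
    """fluorescence[i] is the reading at cycle i + 1; returns the CtD2 estimate."""
    d2 = [
        fluorescence[i - 1] - 2.0 * fluorescence[i] + fluorescence[i + 1]
        for i in range(1, len(fluorescence) - 1)
    ]
    k = max(range(len(d2)), key=d2.__getitem__)
    cycle = float(k + 2)  # d2[k] is centered on list index k + 1, i.e. cycle k + 2
    if 0 < k < len(d2) - 1:
        left, mid, right = d2[k - 1], d2[k], d2[k + 1]
        denom = left - 2.0 * mid + right
        if denom != 0.0:
            # Vertex of the parabola through the three points around the maximum.
            cycle += 0.5 * (left - right) / denom
    return cycle


readings = [sigmoid_fluorescence(c) for c in range(1, 41)]  # cycles 1 to 40
ct_d2 = ct_second_derivative_max(readings)
```

For this curve the estimate falls about two cycles before the sigmoid midpoint at cycle 25, which is where the amplification signal begins to accelerate most sharply.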
  • FIGURE 7B panel c depicts threshold cycle versus template dilution plots to calculate linear range.
  • the linear range is defined as the range of concentrations where CtD2 fit a straight line with R-squared >0.995. Red lines indicate 95% Confidence Intervals calculated from 200 bootstraps.
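The linear-range criterion (CtD2 fits a straight line with R-squared > 0.995) can be sketched as follows. The search strategy is an assumption: starting from the highest concentration, dilution points are added one at a time for as long as the fit of Ct against log10(concentration) stays above the cutoff. The dilution series below is hypothetical, and the bootstrap confidence intervals mentioned above are not reproduced.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def linear_range(log10_conc, ct, min_r2=0.995, min_points=3):
    """Extend from the highest concentration while R^2 stays above min_r2."""
    end = None
    for stop in range(min_points, len(ct) + 1):
        if r_squared(log10_conc[:stop], ct[:stop]) > min_r2:
            end = stop
        else:
            break
    if end is None:
        return None
    return log10_conc[0], log10_conc[end - 1]


# Hypothetical dilution series: Ct rises 3.3 cycles per 10-fold dilution,
# then flattens at the lowest concentration.
LOG10_COPIES = [6, 5, 4, 3, 2, 1, 0]
CT_VALUES = [15.0, 18.3, 21.6, 24.9, 28.2, 31.5, 32.0]

assay_range = linear_range(LOG10_COPIES, CT_VALUES)
```

Here the first six points lie on a line, and adding the seventh drops R-squared to about 0.984, so the reported range runs from 10^6 down to 10^1 copies.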
  • FIGURE 7B panel d depicts melt plots to confirm specificity of the primers. Increasing temperature denatures PCR amplicons, which decreases fluorescence. A single peak of the negative first derivative confirms the presence of a single amplicon. The peak corresponds to the expected melting temperature (dashed line).
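The melt-curve check described above (a single peak in the negative first derivative of fluorescence with respect to temperature) can be sketched like this. The melt curve is synthetic, with a hypothetical melting temperature of 82.25 °C, and the rule that a peak must reach 20% of the tallest peak to count is an assumption added to ignore noise.

```python
import math


def melt_peaks(temps, fluorescence, min_fraction=0.2):
    """Temperatures at which -dF/dT has a local maximum of meaningful height."""
    mids = [(temps[i] + temps[i + 1]) / 2 for i in range(len(temps) - 1)]
    neg_d = [
        -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        for i in range(len(temps) - 1)
    ]
    floor = min_fraction * max(neg_d)
    return [
        mids[i]
        for i in range(1, len(neg_d) - 1)
        if neg_d[i] > neg_d[i - 1] and neg_d[i] >= neg_d[i + 1] and neg_d[i] >= floor
    ]


# Synthetic melt curve: fluorescence falls as the amplicon denatures near 82.25 C.
temps = [70.0 + 0.5 * i for i in range(51)]  # 70.0 to 95.0 in 0.5 degree steps
fluor = [1.0 / (1.0 + math.exp(t - 82.25)) for t in temps]

peaks = melt_peaks(temps, fluor)
single_amplicon = len(peaks) == 1
```

A primer pair producing a second product, such as primer dimers, would show an additional peak at a different temperature and fail the `single_amplicon` check.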
  • FIGURE 7A and FIGURE 7B panels e-h depict charts showing analytic validation of qPCR assays for IBSP RNA, as for MMP11. All assays used clinical-grade reagents.
  • Panel e depicts amplification plots of 20 microliter qPCR reactions. 12 concentrations of synthetic cDNA template (1.1M to 0 copies per microliter), including 10-fold dilutions for 6 high concentrations (5 technical replicates) and 2-fold dilutions for 5 low concentrations (7 technical replicates).
  • concentration point overlapped in the high and low concentration series.
  • Each primer pair includes 24 replicates of no-template controls. Error bars at each cycle represent 95% Confidence Intervals of technical replicates.
  • FIGURE 7A Panel f depicts fluorescence versus cycle plots to determine Ct for IBSP.
  • a 4-parameter linear model was fitted to 5 technical replicates (circles). The maximum of the second derivative was used to define the Ct (CtD2).
  • FIGURE 7B panel g depicts threshold cycle versus template dilution plots to calculate linear range.
  • the linear range is defined as the range of concentrations where CtD2 fit a straight line with R-squared >0.995. Red lines indicate 95% Confidence Intervals calculated from 200 bootstraps.
  • FIG.9 shows the ROC curve for a 3-gene test using Random Forest.
  • IBSP RNA and MMP11 RNA can unexpectedly be used in combination in a Generalized Linear Model of the Binomial Family to correctly classify 100% of samples (EXAMPLE 7).
  • Example 4 Performance of the 3-gene test using the Random Forest (RF) machine learning algorithm, as determined by 5-fold cross validation.
  • RNA from 11 tumor samples and 11 healthy samples was analyzed using the disclosed clinical-grade RTqPCR assays.
  • Example 5 Performance of the 3-gene test using a Generalized Linear Model.
  • RNA from 11 tumor samples and 11 healthy samples was analyzed using the disclosed clinical-grade RTqPCR assays.
  • Resampling was used to estimate performance and statistical parameters of a test generated using a Generalized Linear Model in the binomial family. Five-fold cross validation showed that the 3-gene glm test had an accuracy of 100%, as shown in TABLE 26.
  • RNA from 11 tumor samples and 11 healthy samples was analyzed using the disclosed clinical-grade RTqPCR assays.
  • Example 7 Performance of the 2-gene test using a Generalized Linear Model (glm).
  • RNA from 11 tumor samples and 11 healthy samples was analyzed using the disclosed clinical-grade RTqPCR assays.
  • the two genes were IBSP and MMP11. Resampling was used to estimate performance and statistical parameters of a test generated using a Generalized Linear Model (glm). Five-fold cross validation showed that the 2-gene glm test had an accuracy of 100%, as shown in TABLE 29.
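A Generalized Linear Model in the binomial family with the default logit link is logistic regression. The sketch below fits one on two features, standing in for IBSP and MMP11 expression, by plain gradient ascent. Everything specific here is an assumption: the expression values are hypothetical, the fitting method is a simple substitute for a statistics package, and accuracy is checked on the training data rather than by the five-fold cross validation reported above.

```python
import math


def fit_logistic(features, labels, lr=0.5, steps=500):
    """Fit a binomial GLM with a logit link by gradient ascent on centered features."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    weights, bias = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(centered, labels):
            score = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-score))
            for j in range(d):
                grad_w[j] += (y - p) * x[j]
            grad_b += y - p
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return means, weights, bias


def predict(model, sample):
    """Return 1 (tumor) when the fitted probability exceeds 0.5, else 0 (healthy)."""
    means, weights, bias = model
    score = bias + sum(w * (v - m) for w, v, m in zip(weights, sample, means))
    return 1 if score > 0 else 0


# Hypothetical log-scale expression pairs (gene 1, gene 2).
HEALTHY = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (0.9, 1.1)]
TUMOR = [(3.0, 3.2), (2.8, 3.1), (3.3, 2.9), (3.1, 3.4)]
X = HEALTHY + TUMOR
y = [0] * len(HEALTHY) + [1] * len(TUMOR)

model = fit_logistic(X, y)
training_accuracy = sum(predict(model, s) == t for s, t in zip(X, y)) / len(y)
```

When the two classes are perfectly separable, as in this toy data, the coefficients keep growing with more iterations, which is one reason a 100% accuracy on a small sample set should be confirmed on independent samples.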
  • the markers and methods can be used to improve the evaluation of surgical margins.
  • cells are collected from the surface of a surgical specimen and the disclosed assays are used to detect the disclosed markers.
  • a number of methods can be used to collect cells from the surface of a surgical specimen.
  • cells can be collected using a functionalized surface, such as a poly-lysine coated touch imprint cytology slide.
  • Cells could also be collected using a membrane, such as a nitrocellulose membrane.
  • cells could be collected using a sharp or blunt instrument, such as scrape preparations, which are routinely performed for pathologic examination.
  • the markers and methods could be used to screen patients for invasive breast cancer. Specimens can be collected using nipple aspirates or ductal lavage, where the mammary ducts and glands are flushed with fluid and aspirated, sometimes following brief hormonal stimulation. Existing screening methods suffer from poor sensitivity or specificity, and often expose patients to radiation. Ductal lavage is the preferred screening method for some surgeons, because it directly samples the ducts and glands that give rise to epithelial tumors like adenocarcinomas. However, the analysis of rare tumor cells is not ideal.
  • Microscopic detection of tumors has the best performance when the tumor is analyzed in the context of its surrounding healthy tissue.
  • the name histo-pathology derives from the Greek histos, meaning tissue.
  • Ductal lavage is therefore a promising screening strategy that is currently limited by the microscopic analysis required to detect rare or isolated breast cancer cells.
  • Molecular analysis is particularly well suited to solve this problem because it does not rely on visual analysis, and does not require tumor to be evaluated in the context of healthy tissue. The disclosed markers and methods could therefore be used as a screening tool to determine whether there are invasive cancer cells present in screened patients.
  • Biopsies could include core biopsies, punch biopsies, incisional biopsies and excisional biopsies.
  • biopsy samples did not collect a sufficient amount of cells, or the tissue architecture has been disrupted, making it challenging to reach a definitive histopathologic or cytological diagnosis.
  • These challenging cases are prime examples of the advantage of molecular analysis. Molecular analysis does not require abundant tissue, and does not require intact tissue structures in order to detect the disclosed signatures of invasive cancer.
  • Example 10 Identifying pre-cancerous lesions
  • the disclosed markers and methods can be used to establish a new diagnostic paradigm for pre-cancerous lesions.
  • Lesions like ductal carcinoma in situ (DCIS) and lobular carcinoma in situ (LCIS) are currently considered pre-cancerous lesions or risk factors for invasive cancer. In only some cases do they develop into invasive cancer, but there is currently no way to identify which lesions have invasive potential. Moreover, precursor lesions are only analyzed by a few microscopic sections.
  • the current diagnostic paradigm for precancerous lesions is based on whether a pathologist happens to observe cells that penetrate the basement membrane on the few slides that they examine. There is therefore thought to be a subset of pre-cancerous lesions with undiagnosed invasive potential.
  • tissue or biopsy specimens can be morcellated, digested
  • the disclosed biomarkers represent a strategy to stratify patients by their risk for developing invasive cancer.
  • Pathologic Complete Response is the absence of residual cancer in a solid tissue specimen, obtained from a patient who was previously diagnosed with invasive cancer. pCR is used as a surrogate endpoint for solid tumor neoadjuvant therapies.
  • FDA guidance acknowledges that there is an “uncertain relationship between pCR and long term outcome,” and emphasizes the possibility “that a neoadjuvant trial could fail to demonstrate a significant difference in pCR rates and result in abandoned development of a drug that is, in fact, active in the adjuvant or metastatic setting.”
  • a 2016 analysis found that pCR is the primary endpoint of ~50% of enrolling phase II rectal cancer trials, and 45% of phase III preoperative breast cancer trials.
  • Histopathology has been the best way to examine tumors for over a century, but it is not ideal to hunt for minimal residual disease (MRD). While FDA guidance documents emphasize the importance of comprehensive sectioning, sampling by pathology is woefully underpowered to provide a statistically meaningful analysis of the specimen (e.g. in practice, only a few sections are used to hunt for elusive residual tumor).
  • Embodiment 1 A method of distinguishing a cancer from adjacent healthy tissue, said method comprising: a) obtaining a specimen from a human subject, b) collecting a sample from said specimen, c) detecting a presence of a set of markers in said sample by performing an amplification reaction in a plurality of polynucleotides from said sample, wherein said set of markers is selected from the group consisting essentially of: Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1); and d) distinguishing said cancer when a threshold level of said set of markers is detected.
  • MMP11 Matrix Metallopeptidase 11
  • IBSP integrin binding sialoprotein
  • COL10A1 collagen type X alpha 1 chain
  • Embodiment 2 The method of Embodiment 1, wherein said amplification reaction is a PCR reaction.
  • Embodiment 3 The method of Embodiment 1, wherein said PCR reaction is a qPCR reaction.
  • Embodiment 4 The method of Embodiment 1, wherein said PCR reaction is a RTqPCR reaction.
  • Embodiment 5 The method of Embodiment 1, wherein said method can distinguish said cancer in at least 10 ng of said plurality of polynucleotides from said sample.
  • Embodiment 6 The method of Embodiment 1, wherein said method can distinguish said cancer in at least 100 mg of said sample.
  • Embodiment 7 The method of Embodiment 1, wherein said amplification reaction uses at least one primer sequence that has at least 90% identity to SEQ ID NO: 1- SEQ ID NO: 356.
  • Embodiment 8 The method of Embodiment 1, wherein said sample is frozen.
  • Embodiment 9 The method of Embodiment 1, wherein said sample is a biopsy sample.
  • Embodiment 10 The method of Embodiment 9, wherein said biopsy is a liquid biopsy.
  • Embodiment 11 The method of Embodiment 9, wherein said biopsy is a solid tissue biopsy.
  • Embodiment 12 The method of Embodiment 1, wherein said cancer is breast cancer.
  • Embodiment 13 The method of Embodiment 12, wherein said breast cancer is invasive breast cancer.
  • Embodiment 14 The method of Embodiment 13, wherein said method distinguishes said breast cancer from adjacent healthy tissue with greater than 96% accuracy.
  • Embodiment 15 The method of Embodiment 13, wherein said method distinguishes said breast cancer from adjacent healthy tissue with greater than 96% sensitivity.
  • Embodiment 16 The method of Embodiment 13, wherein said method distinguishes said breast cancer from adjacent healthy tissue with greater than 94% specificity.
  • Embodiment 17 The method of Embodiment 1, wherein said cancer is a urothelial carcinoma.
  • Embodiment 18 The method of Embodiment 1, further comprising outputting a percentage of said plurality of polynucleotides expressing said markers from said sample.
  • Embodiment 19 The method of Embodiment 1, further comprising comparing said set of markers from said sample to said set of markers from a control sample.
  • Embodiment 20 The method of Embodiment 19, wherein said control sample is a second sample from said human subject.
  • Embodiment 21 The method of Embodiment 21, further comprising performing a second assay to distinguish said cancer.
  • Embodiment 22 The method of Embodiment 21, wherein said second assay is an immunohistochemistry assay.
  • Embodiment 23 The method of Embodiment 1, wherein said threshold level of said MMP11 is 1,000 copies.
  • Embodiment 24 The method of Embodiment 1, wherein said threshold level of said IBSP is 25 copies.
  • Embodiment 25 The method of Embodiment 1, wherein said threshold level of said COL10A1 is 700 copies.
  • Embodiment 26 The method of Embodiment 1, wherein said set of markers is selected from the group consisting of: Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1).
  • Embodiment 27 A kit comprising, at least one primer sequence that has at least 90% identity to SEQ ID NO: 1- SEQ ID NO: 356 and a buffer system.
  • Embodiment 28 The kit of Embodiment 27, wherein said buffer system is a PCR buffer system.
  • Embodiment 29 Isolated nucleic acid comprising a primer sequence that has at least 90% identity to SEQ ID NO: 1- SEQ ID NO: 356.
  • Embodiment 30 A method of identifying a biomarker for a cancer
  • Embodiment 31 The method of Embodiment 30, wherein said cancer is breast cancer.
  • Embodiment 32 The method of Embodiment 31, wherein said breast cancer is invasive breast cancer.
  • Embodiment 33 The method of Embodiment 30, wherein said one or more biomarkers identify said cancer with greater than 96% accuracy.
  • Embodiment 34 The method of Embodiment 30, wherein said one or more biomarkers identify said cancer with greater than 96% sensitivity.
  • Embodiment 35 The method of Embodiment 30, wherein said one or more biomarkers identify said cancer with greater than 94% specificity.
  • Embodiment 36 The method of Embodiment 30, wherein said training set comprises one or more markers selected from the group consisting essentially of: Matrix Metallopeptidase 11 (MMP11), integrin binding sialoprotein (IBSP), and collagen type X alpha 1 chain (COL10A1).
  • Embodiment 37 A method of diagnosing a cancer in a human subject, said method comprising:
  • Embodiment 38 A method of detecting Matrix Metallopeptidase 11 (MMP11) in a human subject, said method comprising:
  • Embodiment 39 A method of detecting integrin binding sialoprotein (IBSP) in a human subject, said method comprising:
  • Embodiment 40 A method of detecting collagen type X alpha 1 chain (COL10A1) in a human subject, said method comprising:
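Returning to the copy-number thresholds of Embodiments 23 to 25, a threshold check could be sketched as below. It takes those embodiments to refer to MMP11 (1,000 copies), IBSP (25 copies), and COL10A1 (700 copies) respectively. Two points are assumptions, since the embodiments do not specify them: a marker counts as detected when its copy number is at or above the threshold, and the rule for combining the three markers (all versus any) is left as a parameter. The sample values are hypothetical.

```python
# Threshold copy numbers as read from Embodiments 23 to 25.
THRESHOLD_COPIES = {"MMP11": 1000, "IBSP": 25, "COL10A1": 700}


def markers_at_threshold(copies: dict) -> dict:
    """Per-marker flag: True when the measured copies reach the threshold."""
    return {gene: copies.get(gene, 0) >= limit for gene, limit in THRESHOLD_COPIES.items()}


def call_sample(copies: dict, require_all: bool = True) -> bool:
    """Combine the per-marker flags; the combination rule is an assumption."""
    flags = markers_at_threshold(copies)
    return all(flags.values()) if require_all else any(flags.values())


# Hypothetical measurements for one sample.
SAMPLE = {"MMP11": 1500, "IBSP": 10, "COL10A1": 900}

flags = markers_at_threshold(SAMPLE)
strict_call = call_sample(SAMPLE, require_all=True)
lenient_call = call_sample(SAMPLE, require_all=False)
```

In this example MMP11 and COL10A1 reach their thresholds and IBSP does not, so the two combination rules give different calls, which shows why the rule needs to be fixed before the thresholds are applied.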


Abstract

The present invention relates to improved methods, compositions, and kits for the analysis of minimal residual solid tumors.
PCT/US2018/039163 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers WO2019245587A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP18923183.0A EP3810807A4 (fr) 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers
PCT/US2018/039163 WO2019245587A1 (fr) 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers
AU2018428853A AU2018428853A1 (en) 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers
CA3103572A CA3103572A1 (fr) 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2018/039163 WO2019245587A1 (fr) 2018-06-22 2018-06-22 Methods and compositions for the analysis of cancer biomarkers

Publications (1)

Publication Number Publication Date
WO2019245587A1 true WO2019245587A1 (fr) 2019-12-26

Family

ID=68984201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/039163 WO2019245587A1 (fr) Methods and compositions for the analysis of cancer biomarkers 2018-06-22 2018-06-22

Country Status (4)

Country Link
EP (1) EP3810807A4 (fr)
AU (1) AU2018428853A1 (fr)
CA (1) CA3103572A1 (fr)
WO (1) WO2019245587A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11060149B2 (en) 2014-06-18 2021-07-13 Clear Gene, Inc. Methods, compositions, and devices for rapid analysis of biological markers
US11401558B2 (en) 2015-12-18 2022-08-02 Clear Gene, Inc. Methods, compositions, kits and devices for rapid analysis of biological markers

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017106790A1 (fr) * 2015-12-18 2017-06-22 Clear Gene, Inc. Methods, compositions, kits and devices for rapid analysis of biological markers


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3810807A4 *
ZHANG, X ET AL.: "Insights into the Distinct Roles of MMP-11 in Tumor Biology and Future Therapeutics (Review)", INTERNATIONAL JOURNAL OF ONCOLOGY, vol. 48, no. 5, 18 February 2016 (2016-02-18) - May 2016 (2016-05-01), pages 1783 - 1793, XP055665387 *

Also Published As

Publication number Publication date
AU2018428853A1 (en) 2021-01-14
CA3103572A1 (fr) 2019-12-26
EP3810807A1 (fr) 2021-04-28
EP3810807A4 (fr) 2022-01-19

Similar Documents

Publication Publication Date Title
JP6246845B2 (ja) Method for quantifying prostate cancer prognosis using gene expression
Wittenberger et al. DNA methylation markers for early detection of women’s cancer: promise and challenges
JP6140202B2 (ja) Gene expression profiles for predicting breast cancer prognosis
Tam et al. Robust global microRNA expression profiling using next-generation sequencing technologies
Farragher et al. RNA expression analysis from formalin fixed paraffin embedded tissues
Li et al. Serum circulating human mRNA profiling and its utility for oral cancer detection
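JP6666852B2 (ja) Gene expression panel for prognosis of prostate cancer recurrence
JP6285009B2 (ja) Compositions for detecting and determining the prognosis of prostate cancer, and methods for such detection and determination
WO2008070301A9 (fr) Predicting lung cancer survival using gene expression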
US11661632B2 (en) Compositions and methods for diagnosing lung cancers using gene expression profiles
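JP2018524972A (ja) Methods and compositions for diagnosing or detecting lung cancer
JP2020519296A (ja) DNA methylation and mutation analysis methods for bladder cancer surveillance
JP2023524016A (ja) RNA markers and methods for identifying colon cell proliferative disorders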
US20180371553A1 (en) Methods and compositions for the analysis of cancer biomarkers
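WO2019245587A1 (fr) Methods and compositions for the analysis of cancer biomarkers
JP6611411B2 (ja) Pancreatic cancer detection kit and detection method
CN112921083A (zh) Gene markers in the evaluation of intestinal polyps and colorectal cancer
JP2022524382A (ja) Methods for predicting prostate cancer and uses thereof
JP2017502699A (ja) Lung cancer determination using miRNA ratios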
US20210079479A1 (en) Compostions and methods for diagnosing lung cancers using gene expression profiles
WO2014057279A1 (fr) Micro-RNA biomarkers for prostate cancer
US11427874B1 (en) Methods and systems for detection of prostate cancer by DNA methylation analysis
WO2015121663A1 (fr) Biomarkers for prostate cancer
JP2024519082A (ja) DNA methylation biomarkers for hepatocellular carcinoma
Hendriks et al. Detection of high-grade prostate cancer using a urinary molecular

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18923183

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3103572

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018428853

Country of ref document: AU

Date of ref document: 20180622

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 2018923183

Country of ref document: EP