WO2018174860A1 - Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling - Google Patents

Info

Publication number
WO2018174860A1
Authority
WO
WIPO (PCT)
Prior art keywords
lung adenocarcinoma
reagents
sample
biomarker
target analytes
Prior art date
Application number
PCT/US2017/023474
Other languages
French (fr)
Inventor
Xuefeng Bruce LING
Limin Chen
Shiying Hao
Original Assignee
Mprobe Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mprobe Inc.
Priority to PCT/US2017/023474
Publication of WO2018174860A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57423 Specifically defined cancers of lung
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/60 Complex ways of combining multiple protein biomarkers for diagnosis

Definitions

  • the present invention relates to expression profiling to differentiate early stage lung adenocarcinoma patients from normal subjects.
  • Lung cancer is the leading cause of cancer mortality in the United States and worldwide.
  • Lung adenocarcinoma is the most common histological subtype of lung cancer in most countries, accounting for almost half of lung cancers.
  • although diagnostic methods and treatments have markedly improved in recent years, the 5- and 10-year survival rates remain at ~15% and ~7%, respectively.
  • the lack of appropriate molecular diagnostic tools for early detection represents a major clinical obstacle. Therefore, identification of biomarkers that can detect early stage lung adenocarcinoma may improve treatment strategies and survival rate.
  • RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with microarray platforms, RNA-seq has less background noise, which in microarrays arises from image analysis, and is more sensitive in detecting low-abundance transcripts and transcripts with higher fold changes in expression. In this invention, RNA-seq is used to find biomarkers for the early detection of lung adenocarcinoma.
  • methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 2, or any sub-combinations thereof, in a sample from a subject.
  • methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention.
  • biomarkers are selected from Table 2, or any subcombinations thereof.
  • a method comprises detecting the level of one or more biomarkers in a sample from a subject.
  • a method of monitoring lung adenocarcinoma (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having 50 biomarker proteins from lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1,
  • N is 2 to 50.
  • methods comprise panels of any combination of the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2,
  • methods comprise comparing biomarker(s) level to a reference value/range or a threshold. In some embodiments, deviation of the biomarker(s) level from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
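The comparison against a reference range described above can be sketched as follows. This is a minimal illustration only: the `ReferenceRange` type, the `flag_deviations` function, and every numeric value are hypothetical and do not come from the application, which does not state reference values or units.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical reference interval for one biomarker (arbitrary units)."""
    low: float
    high: float


def flag_deviations(levels: Dict[str, float],
                    references: Dict[str, ReferenceRange]) -> Dict[str, str]:
    """Label each measured biomarker as below, within, or above its range.

    Biomarkers without a reference range are skipped rather than guessed.
    """
    flags: Dict[str, str] = {}
    for name, level in levels.items():
        ref = references.get(name)
        if ref is None:
            continue
        if level < ref.low:
            flags[name] = "below"
        elif level > ref.high:
            flags[name] = "above"
        else:
            flags[name] = "within"
    return flags


# Illustrative values only; real ranges would come from a control cohort.
references = {
    "FHL1": ReferenceRange(low=4.0, high=8.0),
    "TOP2A": ReferenceRange(low=1.0, high=3.0),
}
measured = {"FHL1": 2.0, "TOP2A": 9.5, "CAV1": 5.0}
flags = flag_deviations(measured, references)
```

Whether a deviation in a given direction indicates disease depends on the marker, so an interpretation step would have to follow this one.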
  • each biomarker may be a protein biomarker.
  • the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
  • each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
  • each biomarker capture reagent may be an antibody or an aptamer.
  • a biomarker is an RNA transcript.
  • the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
  • each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
  • each biomarker capture reagent may be a nucleic acid probe.
  • the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.).
  • the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
  • methods further comprise treating the subject for lung adenocarcinoma.
  • treating the subject for lung adenocarcinoma comprises a treatment regimen of administering one or more
  • biomarkers described herein are monitored before, during, and/or after treatment.
  • methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from lung adenocarcinoma, but not providing interventional treatment of the lung adenocarcinoma.
  • palliative care is pursued in place of lung adenocarcinoma treatment.
  • palliative care is provided in addition to treatment for lung adenocarcinoma.
  • a method comprises detecting the level of one or more lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD
  • the method further comprises measuring the level of one or more of the biomarkers at a second time point.
  • lung adenocarcinoma severity is improving (e.g., declining) if the level of said biomarkers is improved at the second time point relative to the first time point.
  • biomarkers or panels thereof provide a prognosis regarding the future course of lung adenocarcinoma in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.).
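The two-time-point comparison can be sketched as follows. The sketch assumes that "improved" means the level moved away from its disease-associated direction, in line with the statements elsewhere in this document that either an elevated or a reduced level may be associated with the phenotype. The function name `summarize_trend`, the direction flags, and all values are hypothetical; the application does not assign a direction to any individual marker in this extract.

```python
from typing import Dict


def summarize_trend(first: Dict[str, float],
                    second: Dict[str, float],
                    elevated_in_disease: Dict[str, bool]) -> Dict[str, str]:
    """Classify each biomarker measured at both time points.

    A marker counts as improved when its level moved opposite to the
    direction associated with disease (assumed, per-marker input).
    """
    shared = first.keys() & second.keys() & elevated_in_disease.keys()
    trend: Dict[str, str] = {}
    for name in sorted(shared):
        delta = second[name] - first[name]
        if delta == 0:
            trend[name] = "unchanged"
        elif (delta < 0) == elevated_in_disease[name]:
            trend[name] = "improved"
        else:
            trend[name] = "worsened"
    return trend


# Illustrative levels and directions only (not taken from the application).
first_visit = {"TOP2A": 8.0, "FHL1": 2.0, "CAV1": 3.0}
second_visit = {"TOP2A": 5.5, "FHL1": 1.5, "CAV1": 3.0}
directions = {"TOP2A": True, "FHL1": False, "CAV1": False}
trend = summarize_trend(first_visit, second_visit, directions)
```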
  • treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.) are made based on the detection and/or quantification of one or more (e.g., 1, 2, 3, 4, 5) of the biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B,
  • kits are provided.
  • a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1,
  • a kit comprises N capture/detection reagents.
  • N is 1 to 50.
  • N is 2 to 50.
  • N is 3 to 50.
  • N is 4 to 50.
  • N is 5 to 50.
  • at least one of the 50 biomarker proteins is selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371
  • compositions comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, or at least five capture/detection reagents that each specifically bind to a different biomarker selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5
  • FIG. 1: The analysis procedure for RNA sequencing data. Each step and the packages used in alignment, quantification, and differential expression (DE) analysis are described in this figure.
  • FIG. 2: Scatterplot of calculated probabilities of lung adenocarcinoma with the selected 50-gene panel.
  • the model was trained with the Random Forest algorithm; 371 cases and 564 controls (of 417 and 634 in total, respectively) were randomly selected to train the model.
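The random selection of training samples can be sketched as below, using only the group sizes stated above (371 of 417 cases, 564 of 634 controls). The sample identifiers, the fixed seed, and the helper name `split_group` are assumptions made for illustration, and the extract does not say how the remaining 46 cases and 70 controls were used; they are treated here as a held-out set. The Random Forest fit itself is not shown; a library such as scikit-learn (`RandomForestClassifier`) would be one way to do it.

```python
import random
from typing import List, Sequence, Tuple


def split_group(sample_ids: Sequence[str], n_train: int,
                rng: random.Random) -> Tuple[List[str], List[str]]:
    """Randomly pick n_train samples for training; return (train, held_out)."""
    if not 0 <= n_train <= len(sample_ids):
        raise ValueError("n_train must be between 0 and the group size")
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


rng = random.Random(0)  # fixed seed is an assumption, for reproducibility

# Placeholder identifiers; the group sizes are the ones stated in the text.
cases = [f"case_{i}" for i in range(417)]
controls = [f"control_{i}" for i in range(634)]

train_cases, held_cases = split_group(cases, 371, rng)
train_controls, held_controls = split_group(controls, 564, rng)
```

Splitting cases and controls separately keeps the case/control proportion of the training set close to that of the full cohort (about 89% of each group).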
  • lung adenocarcinoma biomarkers are provided.
  • by a “biomarker” or “marker” it is meant a molecular entity whose representation in a sample is associated with a disease phenotype.
  • by “lung adenocarcinoma” it is meant a subtype of non-small cell lung cancer that is often diagnosed in an outer area of the lung and arises from the secretory (glandular) cells located in the epithelium lining the bronchi.
  • by a “lung adenocarcinoma biomarker” or “lung adenocarcinoma marker” it is meant a molecular entity whose representation in a sample is associated with a lung adenocarcinoma phenotype, e.g., the presence of lung adenocarcinoma, the stage of lung adenocarcinoma, a prognosis associated with the lung adenocarcinoma, the predictability of the lung adenocarcinoma being responsive to a therapy, etc.
  • the marker may be said to be differentially represented in a sample having a lung adenocarcinoma phenotype.
  • Lung adenocarcinoma biomarkers include proteins that are differentially represented in a lung adenocarcinoma phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc.
  • by a “gene” or “recombinant gene” it is meant a nucleic acid comprising an open reading frame that encodes the protein. The boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to the coding sequence.
  • a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell), and associated regulatory sequences, and may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like.
  • the terms “gene product” and “expression product” are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA, and the polypeptide translation products of such RNA transcripts, i.e., the amino acid product encoded by a gene.
  • a gene product can be, for example, an RNA transcript of the gene, e.g. an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, full length polypeptide, splice variants of the full length polypeptide, post-translationally modified polypeptide, and fragments of the gene product, e.g.
  • an elevated level of marker or marker activity may be associated with the lung adenocarcinoma phenotype.
  • a reduced level of marker or marker activity may be associated with the lung adenocarcinoma phenotype.
  • T is used to categorize the pathology of the tumor (TX: Primary tumor cannot be assessed, or tumor proven by the presence of malignant cells in sputum or bronchial washings but not visualized by imaging or
  • adenocarcinoma, Tis (SCIS): squamous cell carcinoma; T1: Tumor 3 cm or less in greatest dimension, surrounded by lung or visceral pleura, without bronchoscopic evidence of invasion more proximal than the lobar bronchus (i.e., not in the main bronchus); the uncommon superficial spreading tumor of any size with its invasive component limited to the bronchial wall, which may extend proximal to the main bronchus, is also classified as T1a; T1mi: Minimally invasive adenocarcinoma; T1a: Tumor 1 cm or less in greatest dimension; T1b: Tumor more than 1 cm but not more than 2 cm in greatest dimension; T1c: Tumor more than 2 cm but not more than 3 cm in greatest dimension; T2: Tumor more than 3 cm but not more than 5 cm; or tumor with any of the following features (T2 tumors with these features are classified T2a if 4 cm or less or if size cannot be determined and as
  • T1mi minimally invasive adenocarcinoma
  • Tis tumor in situ.
  • a biomarker level is detected using a capture reagent.
  • the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support.
  • Capture reagent is selected based on the type of analysis to be conducted.
  • Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, Fv fragments, single chain Fv fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
  • biomarker presence or level is detected using a
  • the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.
  • biomarker presence or level is detected directly from the biomarker in a biological sample.
  • biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample.
  • capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support.
  • a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example quantum dots.
  • an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
  • the fluorescent label is a fluorescent dye molecule.
  • the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance.
  • the dye molecule includes an AlexaFluor molecule, such as, for example, AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700.
  • the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different AlexaFluor molecules.
  • the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
  • Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats.
  • for example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of
  • a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level.
  • Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
  • the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing).
  • the enzyme catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence.
  • Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
  • the detection method is a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a
  • multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
  • the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
  • Determination of Biomarker Levels Using Gene Expression Profiling
  • Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
  • a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
  • mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR).
  • qPCR produces fluorescence as the DNA amplification process progresses.
  • qPCR can produce an absolute measurement such as number of copies of mRNA per cell.
  • Northern blots, microarrays, RNAseq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
  • Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format.
  • monoclonal antibodies and fragments thereof are often used because of their specific epitope recognition.
  • Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
  • Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
  • Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected.
  • the response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
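Reading an unknown sample off a standard curve can be sketched as follows. This is a minimal sketch that assumes piecewise-linear interpolation between neighbouring standards and a signal that rises strictly with concentration; the text does not specify a curve model, and real immunoassays often use a fitted model instead (for example a four-parameter logistic). The function name and all numbers are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def interpolate_concentration(standards: List[Tuple[float, float]],
                              signal: float) -> float:
    """Estimate concentration from signal by linear interpolation.

    standards: (known concentration, measured signal) pairs; the signal
    is assumed to increase strictly with concentration.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        # Extrapolating beyond the standards is unreliable, so refuse.
        raise ValueError("signal outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Illustrative standards: concentration (arbitrary units) vs. absorbance.
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05), (100.0, 2.05)]
estimate = interpolate_concentration(standards, 0.65)
```

With these illustrative standards, a signal of 0.65 falls halfway between the 10 and 50 unit standards, so the estimate is about 30 units.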
  • ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody, and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (¹²⁵I) or fluorescence.
  • Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
  • Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays.
  • biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary
  • Methods of detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label.
  • the products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light.
  • detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
  • Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
  • the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods.
  • one or more capture reagent/s specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide,
  • the cell sample is produced from a cell block.
  • one or more capture reagent/s specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagent/s in a buffered solution.
  • fixing and dehydrating are replaced with freezing.
  • results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.).
  • Results, analyses, and/or data (e.g., signature, disease score, diagnosis, recommended course, etc.) are identified and/or reported as an
  • a result may be produced by receiving or generating data (e.g., test results) and transforming the data to provide an outcome or result.
  • An outcome or result may be determinative of an action to be taken.
  • results determined by methods described herein can be independently verified by further or repeat testing.
  • analysis results are reported (e.g., to a health care professional (e.g., laboratory technician or manager, physician, nurse, or assistant, etc.), patient, researcher, investigator, etc.).
  • a result is provided on a peripheral, device, or component of an apparatus.
  • an outcome is provided by a printer or display.
  • an outcome is reported in the form of a report.
  • an outcome can be displayed in a suitable format that facilitates downstream use of the reported information.
  • Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood of responding to treatment, etc.).
  • Such a representation reflects information not determinable in the absence of the method steps described herein. Converting biological data into understandable characteristics of a subject allows actions to be taken in response to such information.
  • a downstream individual (e.g., clinician, patient, etc.), upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject, is made.
  • receiving a report refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis.
  • the report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service and the like).
  • the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form.
  • the file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file or a medical record file.
  • a report may be encrypted to prevent unauthorized viewing.
  • systems and methods described herein transform data from one form into another form (e.g., from biomarker levels to diagnostic/prognostic determination, etc.).
  • the terms “transformed”, “transformation”, and grammatical derivations or equivalents thereof refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
  • any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein.
  • the biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein.
  • any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
  • a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained.
  • one or more instructions for manually performing the above steps by a human can be provided.
  • a kit comprises a solid support, a capture reagent, and a signal generating material. The kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.
  • kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample.
  • Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.
  • kits are provided for the analysis of glioma, wherein the kits comprise PCR primers for one or more biomarkers described herein.
  • a kit may further include instructions for use and correlation of the biomarkers.
  • a kit may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA.
  • the kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
  • a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs.
  • an algorithm or computer program assigns a score for each biomarker quantified based on said comparison and, in some embodiments, combines the assigned scores for each biomarker quantified to obtain a total score.
  • an algorithm or computer program compares the total score with a predetermined score, and uses the comparison to determine a diagnosis/prognosis.
  • one or more instructions for manually performing the above steps by a human can be provided.
  • following a determination that a subject suffers from lung adenocarcinoma, the subject is appropriately treated.
  • therapy is administered to treat lung adenocarcinoma.
  • therapy is administered to treat complications of lung adenocarcinoma (e.g., surgery, radiation, chemotherapy).
  • treatment comprises palliative care.
  • methods of monitoring treatment of lung adenocarcinoma are provided.
  • the present methods of detecting biomarkers are carried out at a time 0.
  • the method is carried out again at a time 1, and optionally, a time 2, and optionally, a time 3, etc., in order to monitor the progression of lung adenocarcinoma or to monitor the effectiveness of one or more treatments of lung adenocarcinoma.
  • Time points for detection may be separated by, for example at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or by 1 year or more.
  • a treatment regimen is altered based upon the results of monitoring (e.g., upon determining that a first treatment is ineffective).
  • the level of intervention may be altered.
  • the raw count RNA sequencing data for lung adenocarcinoma patients were downloaded from the GDC data portal.
  • the patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal.
  • the SRA RNA sequencing data for normal lung tissue were downloaded from GTEx data portal through dbGaP (Table 2).
  • the two data sets were then manually curated based on the available stage and grade information from patient clinical data.
  • Genomic sequencing pipeline for RNA sequencing data.
  • the entire RNAseq pipeline was divided into two parts for GTEx data: alignment and quantification (Figure 1).
  • the alignment step consists of: SRA to bam conversion using the SRA Toolkit (SRA Toolkit development team), bam to fastq conversion using Biobambam (Tischler G et al., 2014), and fastq to aligned bam conversion using STAR (Alex D et al., 2016).
  • the quantification step consists of: quality improvement filtering using Fixmate (http://broadinstitute.github.io/picard/), sorting and quality filtering using samtools (Li H et al., 2009), and sequence counting using HTSeq (Simon A et al., 2014).
  • the output of the quantification step is the gene raw counts for GTEx data, which are combined with the GDC gene profile for further downstream analysis.
  • the gene expression profile is then pre-filtered based on the mean expression per gene.
  • the filtered profile is then normalized using quantile metric and is converted into log2 scale.
  • the Combat package (from edgeR, http://www.r-project.org) is then used to perform further normalization between GDC case, GDC control, and GTEx control to minimize the difference between normal controls from two databases (Figure 1).
  • the normalized gene profile is then analyzed by linear model using R package 'limma' (http://www.r-project.org).
  • the 50 genes with relatively low p-values and relatively large absolute value of log2 fold change were selected as our panel.
  • the selected gene expression profile was firstly normalized to z-score across all the samples.
  • the probability of each sample in each subgroup can be calculated ( Figure 2).
  • Receiver-operator characteristic (ROC) analysis was conducted (Figure 3) to evaluate the ability of the selected gene expression profile to differentiate early stage lung adenocarcinoma patients from normal samples in the testing cohort. This process was repeated 500 times using a bootstrapping algorithm to obtain a more accurate evaluation of the model.
  • RNA sequencing data for early stage lung adenocarcinoma tissue and normal lung tissue were downloaded from GDC and GTEx data portal.
  • the patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal.
  • the normal lung tissue data from GTEx were processed using the developed RNA-seq pipeline.
  • the 50 genes used to differentiate early stage lung adenocarcinoma.
  • Unsupervised hierarchical clustering analysis was applied to the selected gene profiles to visually depict the association of the disease status with the abundance patterns of these gene profiles (Figure 4). This analysis demonstrated two major clusters reflecting normal samples and early stage lung cancer samples. The error rate of the unsupervised clustering was 0.381%, which reinforces the effectiveness of the selected gene profiles for lung cancer assessment.
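The normalization steps described above (pre-filtering on mean expression, quantile normalization, log2 conversion, and per-gene z-scoring across samples) can be illustrated with a minimal sketch. It is not the pipeline used in the experiments, which relied on R packages. The function names, the `min_mean` cutoff, and the pseudocount of 1 added before the log2 step are assumptions made only for illustration, and ties are broken simply by sort order.

```python
import math
from statistics import mean, pstdev


def prefilter(matrix, min_mean=1.0):
    """Keep gene rows whose mean count reaches min_mean (hypothetical cutoff)."""
    return [row for row in matrix if mean(row) >= min_mean]


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of gene rows)."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the values holding each rank across samples.
    rank_means = [mean(sorted_cols[s][r] for s in range(n_samples))
                  for r in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            out[g][s] = rank_means[rank]
    return out


def log2_transform(matrix, pseudocount=1.0):
    """Convert to log2 scale; the pseudocount is an assumption to avoid log2(0)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def zscore_rows(matrix):
    """Scale each gene (row) to zero mean and unit variance across samples."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


# Toy counts: 4 genes x 3 samples (made-up numbers, not study data).
counts = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
profile = zscore_rows(log2_transform(quantile_normalize(prefilter(counts))))
```

After quantile normalization every sample column holds the same set of values, which is the property that makes samples from different sources comparable before batch correction.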

Abstract

Lung adenocarcinoma markers, lung adenocarcinoma marker panels, and methods for obtaining a lung adenocarcinoma marker level representation for a sample are provided, based upon RNAseq expression profiling. These compositions and methods find use in a number of applications, including, for example, diagnosing lung adenocarcinoma, prognosing lung adenocarcinoma, monitoring a subject with lung adenocarcinoma, and determining a treatment for lung adenocarcinoma. In addition, systems, devices, and kits thereof that find use in practicing the subject methods are provided.

Description

FIELD OF THE INVENTION
The present invention relates to expression profiling to differentiate early stage lung adenocarcinoma patients from normal subjects.
BACKGROUND OF THE INVENTION
Lung cancer is the leading cause of cancer mortality in the United States and worldwide. Lung adenocarcinoma is the most common histological subtype of lung cancer in most countries, accounting for almost half of lung cancers. Although diagnostic methods and treatments have markedly improved in recent years, the 5- and 10-year survival rates remain at <15 and <7%, respectively. At present, the lack of appropriate molecular diagnostic tools for early detection represents a major clinical obstacle. Therefore, identification of biomarkers that can detect early stage lung adenocarcinoma may improve treatment strategies and survival rate.
RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with the microarray platform, RNA-seq has less background noise due to image analysis and is more sensitive in detecting transcripts with low abundance or higher fold change in expression. In this invention, we use RNA-seq to find biomarkers for early detection of lung adenocarcinoma.
SUMMARY OF THE INVENTION
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 2, or any sub-combinations thereof, in a sample from a subject.
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention. In some embodiments, biomarkers are selected from Table 2, or any sub-combinations thereof. In some embodiments, a method comprises detecting the level of one or more biomarkers in a sample from a subject.
In some embodiments, a method of monitoring lung adenocarcinoma (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having 50 biomarker proteins from lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1, or any sub-combinations thereof), and detecting the level of each of the N biomarker proteins of the panel in a sample from the subject. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, methods comprise panels of any combination of the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1, or any sub-combinations thereof), in addition to any other lung adenocarcinoma biomarkers.
In some embodiments, methods comprise comparing biomarker(s) level to a reference value/range or a threshold. In some embodiments, deviation of the biomarker(s) level from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
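As an illustration only, the comparison to a threshold can be organized the way the kit software is described elsewhere in this document: each quantified biomarker is compared with a predetermined cutoff and assigned a score, the scores are combined into a total, and the total is compared with a predetermined score. In the sketch below, the cutoff values, the direction assigned to each marker, the example levels, and the function names are all hypothetical and are not taken from the disclosed data.

```python
def score_biomarkers(levels, cutoffs):
    """Assign 1 to each marker whose level deviates past its cutoff, else 0.

    levels:  {marker: measured level}
    cutoffs: {marker: (cutoff, direction)}, where direction is "up" when a
             level above the cutoff counts, or "down" when a level below counts.
    """
    scores = {}
    for marker, (cutoff, direction) in cutoffs.items():
        level = levels[marker]
        if direction == "up":
            scores[marker] = 1 if level > cutoff else 0
        else:
            scores[marker] = 1 if level < cutoff else 0
    return scores


def classify(levels, cutoffs, total_cutoff):
    """Sum per-marker scores and compare the total with a predetermined score."""
    total = sum(score_biomarkers(levels, cutoffs).values())
    return total, total >= total_cutoff


# Hypothetical z-scored levels and cutoffs for three panel members.
example_levels = {"TOP2A": 2.1, "FHL1": -1.8, "AGER": 0.3}
example_cutoffs = {
    "TOP2A": (1.0, "up"),    # assumed direction, for illustration only
    "FHL1": (-1.0, "down"),  # assumed direction, for illustration only
    "AGER": (-1.0, "down"),  # assumed direction, for illustration only
}
total, positive = classify(example_levels, example_cutoffs, total_cutoff=2)
```

A summed binary score is only one possible design; weighted scores or a trained classifier would fit the same description.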
In any of the embodiments described herein, each biomarker may be a protein biomarker. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be an antibody or an aptamer.
In some embodiments, a biomarker is an RNA transcript. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be a nucleic acid probe. In any of the embodiments described herein, the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.). In some embodiments, the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
In any of the embodiments described herein, the methods further comprise treating the subject for lung adenocarcinoma. In some embodiments, treating the subject for lung adenocarcinoma comprises a treatment regimen of administering one or more of chemotherapy, radiation, surgery, etc. In some embodiments, biomarkers described herein are monitored before, during, and/or after treatment.
In some embodiments, methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from lung adenocarcinoma, but not providing interventional treatment of the lung adenocarcinoma. In some embodiments, when embodiments herein indicate a low likelihood of success in treating lung adenocarcinoma, palliative care is pursued in place of lung treatment. In some embodiments, palliative care is provided in addition to treatment for lung adenocarcinoma.
In some embodiments, methods of monitoring progression or severity of lung adenocarcinoma and/or monitoring effectiveness of treatment in a subject are provided. In some embodiments, a method comprises detecting the level of one or more lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1, or any sub-combinations thereof) in a sample from the subject at a first time point. In some embodiments, the method further comprises measuring the level of one or more of the biomarkers at a second time point. In some embodiments, lung adenocarcinoma severity is improving (e.g., declining) if the level of said biomarkers is improved at the second time point relative to the first time point.
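The two-time-point comparison can be sketched as follows. The sketch assumes that each monitored marker has a known direction of change in disease ("up" or "down"), so that movement in the opposite direction counts as improvement. That assumption, the `tolerance` parameter, the example values, and the function name are illustrative and are not part of the disclosed method.

```python
def compare_time_points(first, second, disease_direction, tolerance=0.0):
    """Label each marker as 'improved', 'worsened', or 'unchanged'.

    first, second:     {marker: level} at the first and second time points
    disease_direction: {marker: "up" or "down"}, the assumed direction of
                       change associated with disease for that marker
    tolerance:         changes no larger than this are treated as unchanged
    """
    trend = {}
    for marker, direction in disease_direction.items():
        delta = second[marker] - first[marker]
        if abs(delta) <= tolerance:
            trend[marker] = "unchanged"
        elif (delta < 0) == (direction == "up"):
            # Level moved away from the disease-associated direction.
            trend[marker] = "improved"
        else:
            trend[marker] = "worsened"
    return trend


# Hypothetical levels at two time points for two panel members.
time0 = {"TOP2A": 2.0, "AGER": -1.5}
time1 = {"TOP2A": 1.2, "AGER": -1.5}
directions = {"TOP2A": "up", "AGER": "down"}  # assumed, for illustration only
trend = compare_time_points(time0, time1, directions)
```

The same function can be applied to successive pairs of time points (time 1 versus time 2, and so on) when more than two measurements are available.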
In some embodiments, biomarkers or panels thereof provide a prognosis regarding the future course of lung adenocarcinoma in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.). In some embodiments, treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.) are made based on the detection and/or quantification of one or more (e.g., 1, 2, 3, 4, 5) of the biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1, or any sub-combinations thereof).
In some embodiments, kits are provided. In some embodiments, a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1). In some embodiments, a kit comprises N capture/detection reagents. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, N is 3 to 50. In some embodiments, N is 4 to 50. In some embodiments, N is 5 to 50. In some embodiments, at least one of the 50 biomarker proteins is selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1).
In some embodiments, compositions are provided comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, or at least five capture/detection reagents that each specifically bind to a different biomarker selected from the lung adenocarcinoma biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1).
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be best understood from the following detailed description when read in conjunction with the accompanying drawings. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. It is emphasized that, according to common practice, the various features of the drawings are not to-scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.
Figure 1. The analysis procedure of RNA sequencing data. Each step and packages used in alignment, quantification, and DE analysis are described in this figure.
Figure 2. Scatterplot of calculated probabilities of lung adenocarcinoma with the selected 50-gene panel. The model was trained with the Random Forest algorithm; 371/564 case/control samples (417/634 in total) were randomly selected to train the model.
Figure 3. ROC curves for models of lung adenocarcinoma assessment with selected biomarker profile evaluated on early stage patients versus normal subjects. Average true positive rate was calculated with 500 10-fold cross validation fits of the model.
Figure 4. Unsupervised hierarchical cluster analysis with heat map shows the abundance pattern of selected biomarkers of early stage lung adenocarcinoma patients versus normal subjects.
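The ROC evaluation summarized for Figure 3 can be illustrated with a simplified sketch. It computes the area under the ROC curve from per-sample probabilities (such as those plotted in Figure 2) and averages it over bootstrap resamples. It is a stand-in under stated assumptions: it does not retrain the Random Forest model in each round, and the function names, the seed, and the toy probabilities are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for lung adenocarcinoma, 0 for normal; scores: predicted probability.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    # A case ranked above a control counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_rounds=500, seed=0):
    """Mean AUC over bootstrap resamples of the (label, score) pairs."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(resampled_labels, [scores[i] for i in idx]))
    return sum(aucs) / len(aucs)


# Toy probabilities (not study data): two cases and two controls.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Resampling the predictions measures how stable the AUC estimate is; repeating the model fit in each round, as the figure description indicates, would additionally capture variability from training.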
DETAILED DESCRIPTION OF THE INVENTION
Lung adenocarcinoma markers and panels
In some aspects of the invention, lung adenocarcinoma biomarkers are provided. By a "biomarker" or "marker" it is meant a molecular entity whose representation in a sample is associated with a disease phenotype. By "lung adenocarcinoma" it is meant a subtype of non-small cell lung cancer that is often diagnosed in an outer area of the lung and arises from the secretory (glandular) cells located in the epithelium lining the bronchi. Thus, by a "lung adenocarcinoma biomarker" or "lung adenocarcinoma marker" it is meant a molecular entity whose representation in a sample is associated with a lung adenocarcinoma phenotype, e.g., the presence of lung adenocarcinoma, the stage of lung adenocarcinoma, a prognosis associated with the lung adenocarcinoma, the predictability of the lung adenocarcinoma being responsive to a therapy, etc. In other words, the marker may be said to be differentially represented in a sample having a lung adenocarcinoma phenotype.
Lung adenocarcinoma biomarkers include proteins that are differentially represented in a lung adenocarcinoma phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc. By a "gene" or "recombinant gene" it is meant a nucleic acid comprising an open reading frame that encodes the protein. The boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to the coding sequence. In addition, a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell), and associated regulatory sequences, and may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like. The terms "gene product" and "expression product" are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA; and the polypeptide translation products of such RNA transcripts, i.e., the amino acid product encoded by a gene. A gene product can be, for example, an RNA transcript of the gene, e.g., an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, full length polypeptide, splice variants of the full length polypeptide, post-translationally modified polypeptide, and fragments of the gene product, e.g., peptides, etc. In some instances, an elevated level of marker or marker activity may be associated with the lung adenocarcinoma phenotype.
In other instances, a reduced level of marker or marker activity may be associated with the lung adenocarcinoma phenotype.
Lung cancer stage
Lung cancer stages were classified with the American Joint Committee on Cancer (AJCC) staging system (Table 1). In the AJCC system, T is used to categorize the pathology of the tumor:
TX: Primary tumor cannot be assessed, or tumor proven by the presence of malignant cells in sputum or bronchial washings but not visualized by imaging or bronchoscopy.
T0: No evidence of primary tumor.
Tis: Carcinoma in situ; Tis (AIS): adenocarcinoma; Tis (SCIS): squamous cell carcinoma.
T1: Tumor 3 cm or less in greatest dimension, surrounded by lung or visceral pleura, without bronchoscopic evidence of invasion more proximal than the lobar bronchus (i.e., not in the main bronchus); the uncommon superficial spreading tumor of any size with its invasive component limited to the bronchial wall, which may extend proximal to the main bronchus, is also classified as T1a.
T1mi: Minimally invasive adenocarcinoma.
T1a: Tumor 1 cm or less in greatest dimension.
T1b: Tumor more than 1 cm but not more than 2 cm in greatest dimension.
T1c: Tumor more than 2 cm but not more than 3 cm in greatest dimension.
T2: Tumor more than 3 cm but not more than 5 cm; or tumor with any of the following features (T2 tumors with these features are classified T2a if 4 cm or less or if size cannot be determined and as T2b if greater than 4 cm but not larger than 5 cm): involves main bronchus regardless of distance to the carina, but without involving the carina; invades visceral pleura; associated with atelectasis or obstructive pneumonitis that extends to the hilar region, either involving part of the lung or the entire lung.
T2a: Tumor more than 3 cm but not more than 4 cm in greatest dimension.
T2b: Tumor more than 4 cm but not more than 5 cm in greatest dimension.
T3: Tumor more than 5 cm but not more than 7 cm in greatest dimension or one that directly invades any of the following: parietal pleura (PL3), chest wall (including superior sulcus tumors), phrenic nerve, parietal pericardium; or associated separate tumor nodule(s) in the same lobe as the primary.
T4: Tumors more than 7 cm or one that invades any of the following: diaphragm, mediastinum, heart, great vessels, trachea, recurrent laryngeal nerve, esophagus, vertebral body, carina; separate tumor nodule(s) in a different ipsilateral lobe to that of the primary.
N describes the pathology of local lymph nodes:
NX: Regional lymph nodes cannot be assessed.
N0: No regional lymph node metastasis.
N1: Metastasis in ipsilateral peribronchial and/or ipsilateral hilar lymph nodes and intrapulmonary nodes, including involvement by direct extension.
N2: Metastasis in ipsilateral mediastinal and/or subcarinal lymph node(s).
N3: Metastasis in contralateral mediastinal, contralateral hilar, ipsilateral or contralateral scalene, or supraclavicular lymph node(s).
M describes the extent, if any, of metastasis:
M0: No distant metastasis.
M1: Distant metastasis.
M1a: Separate tumor nodule(s) in a contralateral lobe; tumor with pleural nodules or malignant pleural or pericardial effusion; most pleural (pericardial) effusions with lung cancer are due to tumor; in a few patients, however, multiple microscopic examinations of pleural (pericardial) fluid are negative for tumor, and the fluid is non-bloody and is not an exudate; where these elements and clinical judgment dictate that the effusion is not related to the tumor, the effusion should be excluded as a staging descriptor.
M1b: Single extrathoracic metastasis in a single organ and involvement of a single distant (non-regional) node.
M1c: Multiple extrathoracic metastases in one or several organs.
Table 1. Stage Grouping of the Eighth Edition of the TNM Classification of Lung Cancer (J Thorac Oncol. 2016;11:39-51)
STAGE               T          N        M
Occult carcinoma    TX         N0       M0
0                   Tis        N0       M0
IA1                 T1mi       N0       M0
                    T1a        N0       M0
IA2                 T1b        N0       M0
IA3                 T1c        N0       M0
IB                  T2a        N0       M0
IIA                 T2b        N0       M0
IIB                 T1a,b,c    N1       M0
                    T2a,b      N1       M0
                    T3         N0       M0
IIIA                T1a,b,c    N2       M0
                    T2a,b      N2       M0
                    T3         N1       M0
                    T4         N0       M0
                    T4         N1       M0
IIIB                T1a,b,c    N3       M0
                    T2a,b      N3       M0
                    T3         N2       M0
                    T4         N2       M0
IIIC                T3         N3       M0
                    T4         N3       M0
IVA                 Any T      Any N    M1a
                    Any T      Any N    M1b
IVB                 Any T      Any N    M1c
Abbreviations: T1mi, minimally invasive adenocarcinoma; Tis, tumor in situ.
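For data curation, the stage grouping of Table 1 can be expressed as a lookup from T, N, and M categories to a stage group. The sketch below encodes only the combinations listed in the table and returns None for anything else. The function and variable names are hypothetical and are not part of the described pipeline.

```python
T1 = ["T1a", "T1b", "T1c"]
T2 = ["T2a", "T2b"]

# (stage group, T categories, N category) for M0 disease, following Table 1.
_M0_ROWS = [
    ("Occult carcinoma", ["TX"], "N0"),
    ("0", ["Tis"], "N0"),
    ("IA1", ["T1mi", "T1a"], "N0"),
    ("IA2", ["T1b"], "N0"),
    ("IA3", ["T1c"], "N0"),
    ("IB", ["T2a"], "N0"),
    ("IIA", ["T2b"], "N0"),
    ("IIB", T1 + T2, "N1"),
    ("IIB", ["T3"], "N0"),
    ("IIIA", T1 + T2, "N2"),
    ("IIIA", ["T3"], "N1"),
    ("IIIA", ["T4"], "N0"),
    ("IIIA", ["T4"], "N1"),
    ("IIIB", T1 + T2, "N3"),
    ("IIIB", ["T3", "T4"], "N2"),
    ("IIIC", ["T3", "T4"], "N3"),
]

# Flatten the rows into a {(T, N): stage group} dictionary.
STAGE_TABLE = {(t, n): stage for stage, ts, n in _M0_ROWS for t in ts}


def stage_group(t, n, m):
    """Return the stage group for a T/N/M combination, or None if not listed."""
    if m in ("M1a", "M1b"):
        return "IVA"  # any T, any N
    if m == "M1c":
        return "IVB"  # any T, any N
    if m != "M0":
        return None
    return STAGE_TABLE.get((t, n))


def is_early_stage(t, n, m, early=("IA1", "IA2", "IA3", "IB")):
    """Hypothetical helper: the set of 'early' stage groups is an assumption."""
    return stage_group(t, n, m) in early
```

For example, `stage_group("T2b", "N1", "M0")` resolves to stage IIB by way of the T2a,b / N1 row of the table.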
Detection of Biomarkers and Determination of Biomarker Levels
The presence of a biomarker or a biomarker level for the biomarkers described herein can be detected using any of a variety of analytical methods. In one embodiment, a biomarker level is detected using a capture reagent. In various embodiments, the capture reagent is exposed to the biomarker in solution or is exposed to the biomarker while the capture reagent is immobilized on a solid support. In other embodiments, the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support. The capture reagent is selected based on the type of analysis to be conducted. Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, FV fragments, single chain FV fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
In some embodiments, biomarker presence or level is detected using a biomarker/capture reagent complex. In some embodiments, the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.
In some embodiments, biomarker presence or level is detected directly from the biomarker in a biological sample.
In some embodiments, biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample. In some embodiments of the multiplexed format, capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support. In some embodiments, a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example quantum dots. In some embodiments, an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
In some embodiments, the fluorescent label is a fluorescent dye molecule. In some embodiments, the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance. In some embodiments, the dye molecule includes an AlexaFluor molecule, such as, for example, AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different AlexaFluor molecules. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats. For example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of Fluorescence Spectroscopy, by J. R. Lakowicz, Springer Science+Business Media, Inc., 2004. See Bioluminescence & Chemiluminescence: Progress & Current Applications; Philip E. Stanley and Larry J. Kricka editors, World Scientific Publishing Company, January 2002.
In one or more embodiments, a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level. Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
In some embodiments, the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing). Generally, the enzyme catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence. Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
In some embodiments, the detection method is a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a measurable signal. In some embodiments, multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
In some embodiments, the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
Determination of Biomarker Levels Using Gene Expression Profiling
Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
Thus, in some embodiments, a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
In some embodiments, mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, RNAseq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
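The standard-curve comparison described above can be sketched as follows. This is a minimal illustration, assuming a log-linear relationship between threshold cycle (Ct) and template copy number; the dilution series, the Ct values, and the function names are hypothetical and are not taken from this disclosure.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical dilution series of a standard (illustrative values only).
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [30.0, 26.7, 23.4, 20.1, 16.8]

# Fit Ct against log10(copies), then read an unknown sample off the curve.
slope, intercept = fit_line([math.log10(c) for c in standard_copies], standard_ct)
unknown_copies = copies_from_ct(25.05, slope, intercept)
```

Dividing the estimated copy number by the number of cells contributing to the reaction would give a per-cell figure, provided that cell count is known.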
Determination of Biomarker Levels Using Immunoassays
Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format. To improve specificity and sensitivity of an assay method based on immuno-reactivity, monoclonal antibodies and fragments thereof are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected. The response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
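Reading an unknown sample off a standard curve can be illustrated with a short sketch. The example assumes piecewise-linear interpolation between calibrators and a signal that rises monotonically with concentration; real assays often use a fitted curve instead, and the calibrator values and function name below are hypothetical.

```python
def interpolate_concentration(signal, standards):
    """Piecewise-linear read-off of a concentration from a standard curve.

    `standards` is a list of (concentration, signal) pairs. The signal is
    assumed to increase strictly with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal outside the range of the standard curve")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            fraction = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + fraction * (c_hi - c_lo)

# Hypothetical calibrators: (concentration, measured signal).
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 0.85), (100.0, 1.45)]
estimated = interpolate_concentration(0.55, standards)
```

Signals outside the calibrated range raise an error rather than being extrapolated, which reflects a conservative design choice in this sketch.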
Numerous immunoassay formats have been designed. ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody, and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (I-125) or fluorescence.
Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescence, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
Methods for detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label. The products of reactions catalyzed by appropriate enzymes (where the detectable label is an enzyme; see above) can be, without limitation, fluorescent, luminescent, or radioactive, or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
Determination of Biomarkers Using Histology/Cytology Methods
In some embodiments, the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods. In some embodiments, one or more capture reagents specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide, permeabilizing the cell sample, treating for analyte retrieval, staining, destaining, washing, blocking, and reacting with one or more capture reagents in a buffered solution. In another embodiment, the cell sample is produced from a cell block. In some embodiments, one or more capture reagents specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagents in a buffered solution. In another embodiment, fixing and dehydrating are replaced with freezing.
Data Analysis and Reporting
In some embodiments, the results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.). Results, analyses, and/or data (e.g., signature, disease score, diagnosis, recommended course, etc.) are identified and/or reported as an outcome/result of an analysis. A result may be produced by receiving or generating data (e.g., test results) and transforming the data to provide an outcome or result. An outcome or result may be determinative of an action to be taken. In some embodiments, results determined by methods described herein can be independently verified by further or repeat testing.
In some embodiments, analysis results are reported (e.g., to a health care professional (e.g., laboratory technician or manager, physician, nurse, or assistant, etc.), patient, researcher, investigator, etc.). In some embodiments, a result is provided on a peripheral, device, or component of an apparatus. For example, sometimes an outcome is provided by a printer or display. In some embodiments, an outcome is reported in the form of a report. Generally, an outcome can be displayed in a suitable format that facilitates downstream use of the reported information. Non-limiting examples of formats suitable for use for reporting and/or displaying data, characteristics, etc. include text, outline, digital data, a graph, graphs, a picture, a pictograph, a chart, a bar graph, a pie-chart, a diagram, a flow chart, a scatter plot, a map, a histogram, a density chart, a function graph, a circuit diagram, a block diagram, a bubble map, a constellation diagram, a contour diagram, a cartogram, a spider chart, a Venn diagram, a nomogram, and the like, and combinations of the foregoing. Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood corresponding to treatment, etc.). Such a representation reflects information not determinable in the absence of the method steps described herein. Converting biological data into understandable characteristics of a subject allows actions to be taken in response to such information. In some embodiments, a downstream individual (e.g., clinician, patient, etc.), upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject, is made.
The term "receiving a report" as used herein refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis. The report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service and the like). In some embodiments the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form. The file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file or a medical record file. A report may be encrypted to prevent unauthorized viewing.
As noted above, in some embodiments, systems and methods described herein transform data from one form into another form (e.g., from biomarker levels to diagnostic/prognostic determination, etc.). In some embodiments, the terms "transformed", "transformation", and grammatical derivations or equivalents thereof, refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
Kits
Any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein. The biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein. Furthermore, any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
In some embodiments, a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained. Alternatively, rather than one or more computer program products, one or more instructions for manually performing the above steps by a human can be provided. In some embodiments, a kit comprises a solid support, a capture reagent, and a signal generating material. The kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.
The kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample. Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.
In some embodiments, kits are provided for the analysis of lung adenocarcinoma, wherein the kits comprise PCR primers for one or more biomarkers described herein. In some embodiments, a kit may further include instructions for use and correlation of the biomarkers. In some embodiments, a kit may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA. The kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
For example, a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs. In some embodiments, an algorithm or computer program assigns a score for each biomarker quantified based on said comparison and, in some embodiments, combines the assigned scores for each biomarker quantified to obtain a total score. Further, in some embodiments, an algorithm or computer program compares the total score with a predetermined score, and uses the comparison to determine a diagnosis/prognosis. Alternatively, rather than one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided.
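The cutoff-and-score logic described above can be sketched as follows. This is a minimal illustration only: the cutoff values, the direction attached to each cutoff, the measured levels, and the predetermined score are all hypothetical, and the scoring rule (one point per biomarker that crosses its cutoff) is an assumption rather than a rule stated in this disclosure.

```python
def biomarker_score(level, cutoff, direction):
    """Return 1 if the level crosses its cutoff in the stated direction, else 0."""
    if direction == "above":
        return 1 if level > cutoff else 0
    if direction == "below":
        return 1 if level < cutoff else 0
    raise ValueError(f"unknown direction: {direction!r}")

def total_score(levels, cutoffs):
    """Sum the per-biomarker scores over every biomarker that was quantified."""
    return sum(biomarker_score(levels[g], *cutoffs[g]) for g in levels)

def determine(levels, cutoffs, predetermined_score):
    """Compare the total score with a predetermined score."""
    score = total_score(levels, cutoffs)
    return score, score >= predetermined_score

# Hypothetical cutoffs and measured levels (arbitrary units, illustrative only).
cutoffs = {"MMP11": (5.0, "above"), "FHL1": (3.0, "below"), "CST1": (4.0, "above")}
levels = {"MMP11": 7.2, "FHL1": 1.1, "CST1": 2.5}
score, flagged = determine(levels, cutoffs, predetermined_score=2)
```

A direction is attached to each cutoff because a panel may mix biomarkers that rise and biomarkers that fall in disease.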
Methods of Treatment
In some embodiments, following a determination that a subject suffers from lung adenocarcinoma, the subject is appropriately treated. In some embodiments, therapy is administered to treat lung adenocarcinoma. In some embodiments, therapy is administered to treat complications of lung adenocarcinoma (e.g., surgery, radiation, chemotherapy). In some embodiments, treatment comprises palliative care.
In some embodiments, methods of monitoring treatment of lung adenocarcinoma are provided. In some embodiments, the present methods of detecting biomarkers are carried out at a time 0. In some embodiments, the method is carried out again at a time 1, and optionally, a time 2, and optionally, a time 3, etc., in order to monitor the progression of lung adenocarcinoma or to monitor the effectiveness of one or more treatments of lung adenocarcinoma. Time points for detection may be separated by, for example, at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or by 1 year or more. In some embodiments, a treatment regimen is altered based upon the results of monitoring (e.g., upon determining that a first treatment is ineffective). In some embodiments, the level of intervention may be altered.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.
EXAMPLE 1
Materials and methods
Data collection and pre-processing.
The raw count RNA sequencing data for lung adenocarcinoma patients were downloaded from the GDC data portal. The patient clinical data, including specific tumor stage and grade, were also downloaded from the GDC data portal. The SRA RNA sequencing data for normal lung tissue were downloaded from the GTEx data portal through dbGaP (Table 2). The two data sets were then manually curated and categorized based on the available stage and grade information for each sample in the patient clinical data (Table 2). In this patent we used 417 early stage lung adenocarcinoma samples and 634 normal lung samples as our dataset for early stage lung adenocarcinoma biomarker detection.
Table 2. Data sets used for RNA sequencing differential expression analysis.
Data set | Library layout | Tissue source | Early stage | Late stage | Low grade | High grade | Normal control | Total samples
GDC | Pair-end | Lung adenocarcinoma tissue | 417 | 110 | NA | NA | 59 | 586
GTEx | Pair-end | Normal lung tissue | NA | NA | NA | NA | 575 | 575

Tissue source | Stage I | Stage II | Stage III | Stage IV | Unknown stage | Normal control | Total samples
Lung adenocarcinoma tissue | 294 | 123 | 84 | 26 | NA | 59 | 586
Normal lung tissue | NA | NA | NA | NA | NA | 575 | 575
Genomic sequencing pipeline for RNA sequencing data.
The entire RNAseq pipeline was divided into two parts for GTEx data: alignment and quantification (Figure 1). The alignment step consists of: SRA to bam conversion using the SRA Toolkit (SRA Toolkit development team), bam to fastq conversion using Biobambam (Tischler G et al., 2014), and fastq to aligned bam conversion using STAR (Alex D et al., 2016). The quantification step consists of: quality improvement filtering using Fixmate (http://broadinstitute.github.io/picard/), sorting and quality filtering using samtools (Li H et al., 2009), and sequence counting using HTSeq (Simon A et al., 2014).
The quantification step outputs raw gene counts for the GTEx data, which are combined with the GDC gene profile for downstream analysis.
Normalizations of RNA sequencing data.
The gene expression profile is then pre-filtered based on the mean expression per gene. The filtered profile is then normalized using the quantile method and converted to log2 scale. The ComBat package (from edgeR, http://www.r-project.org) is then used to perform further normalization between GDC case, GDC control, and GTEx control samples to minimize the difference between normal controls from the two databases (Figure 1).
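The pre-filtering, quantile normalization, and log2 conversion steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the Examples: the mean-count threshold, the pseudocount of 1 added before the log2 transform, the simple tie handling, and the toy count matrix are all hypothetical. The batch-adjustment step is not reproduced here.

```python
import math

def prefilter(counts, min_mean):
    """Keep genes whose mean count across samples is at least `min_mean`."""
    return {g: row for g, row in counts.items() if sum(row) / len(row) >= min_mean}

def quantile_normalize(counts):
    """Give every sample (column) the same distribution: the rank-wise mean."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    columns = [[counts[g][j] for g in genes] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / n_samples
                  for i in range(len(genes))]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        # Ties are broken by gene order (a simplification; other tie rules exist).
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized

def log2_transform(values, pseudocount=1.0):
    """Convert to log2 scale; the pseudocount (an assumption) avoids log2(0)."""
    return {g: [math.log2(v + pseudocount) for v in row] for g, row in values.items()}

# Toy count matrix: gene -> counts in three samples (illustrative values only).
counts = {"A": [5, 4, 3], "B": [2, 1, 4], "C": [3, 4, 6], "D": [4, 2, 8], "E": [0, 0, 1]}
filtered = prefilter(counts, min_mean=1.0)
normalized = quantile_normalize(filtered)
log_profile = log2_transform(normalized)
```

After normalization every sample column contains the same set of values, differing only in which gene holds which value.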
Differentiated gene selection.
The normalized gene profile is then analyzed with a linear model using the R package 'limma' (http://www.r-project.org). The 50 genes with relatively low p-values and relatively large absolute values of log2 fold change were selected as our panel.
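Selecting a panel from linear-model output can be sketched as a filter on significance and effect size followed by ranking. The thresholds and the ranking rule below are assumptions, since the source states only that genes with relatively low p-values and relatively large absolute log2 fold change were chosen. The first three entries reuse values listed in Table 3; GENE_X and GENE_Y are hypothetical non-selected genes added for contrast.

```python
def select_genes(stats, max_fdr, min_abs_logfc, top_n):
    """Keep genes passing both thresholds, then take the `top_n` smallest FDRs."""
    passing = [
        (gene, logfc, fdr)
        for gene, (logfc, fdr) in stats.items()
        if fdr <= max_fdr and abs(logfc) >= min_abs_logfc
    ]
    passing.sort(key=lambda row: row[2])  # most significant first
    return [gene for gene, _, _ in passing[:top_n]]

# gene -> (log2 fold change, FDR)
stats = {
    "FHL1": (-3.918, 1.61e-88),
    "CST1": (5.202, 1.19e-90),
    "SLC6A4": (-7.429, 7.32e-49),
    "GENE_X": (0.4, 1e-30),   # hypothetical: significant but small effect
    "GENE_Y": (4.0, 0.2),     # hypothetical: large effect but not significant
}
panel = select_genes(stats, max_fdr=0.01, min_abs_logfc=3.0, top_n=2)
```

Filtering on the absolute fold change keeps both up-regulated and down-regulated genes in the panel.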
Random forest analysis.
The selected gene expression profile was first normalized to z-scores across all samples. The z-scores of the gene expression profiles for the samples randomized to the statistical training cohort (n = 935) were then analyzed by Random Forest analysis using the R package 'randomForest' (http://www.r-project.org/). All subjects in the training cohort were subsequently assigned to one of two possible subgroups (normal and early stage). With the trained model applied to both the training cohort and the testing cohort (n = 116), the probability of each sample belonging to each subgroup can be calculated (Figure 2). Receiver operating characteristic (ROC) analysis was conducted (Figure 3) to evaluate the ability of the selected gene expression profile to differentiate subjects with early stage lung adenocarcinoma from normal samples in the testing cohort. This process was repeated 500 times using a bootstrapping algorithm to obtain a more accurate evaluation of the model.
Heat map.
Unsupervised hierarchical clustering analysis was performed (Figure 4) to visually depict the association between the disease status and the abundance pattern of the transcriptome profiles. This analysis was used to demonstrate the effectiveness of the selected gene panel in distinguishing the early stage lung adenocarcinoma class from the normal class.
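The z-score step that precedes model training can be sketched as follows. This minimal example assumes each gene is standardized across samples using the sample standard deviation (n - 1 denominator); the source does not state which denominator was used, and the expression values are hypothetical. The Random Forest fit itself is not reproduced here.

```python
import statistics

def zscore_rows(profile):
    """Standardize each gene (row) across samples to mean 0 and unit SD."""
    standardized = {}
    for gene, values in profile.items():
        mean = statistics.fmean(values)
        # Sample SD (n - 1); the population SD is an equally plausible choice.
        sd = statistics.stdev(values)
        standardized[gene] = [(v - mean) / sd for v in values]
    return standardized

# Hypothetical log2 expression values for two panel genes in three samples.
profile = {"FHL1": [2.0, 4.0, 6.0], "CST1": [1.0, 1.0, 4.0]}
z_profile = zscore_rows(profile)
```

Standardizing per gene puts highly and lowly expressed genes on a common scale before they are passed to a classifier or a clustering routine.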
EXAMPLE 2
Results
Data collection, pre-processing.
The RNA sequencing data for early stage lung adenocarcinoma tissue and normal lung tissue were downloaded from the GDC and GTEx data portals. The patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal. The normal lung tissue data from GTEx were processed using the developed RNA-seq pipeline.
Statistical results for fifty selected genes.
A linear model from the R package "limma" was applied to compare the gene profiles of early stage lung adenocarcinoma and normal samples. The log2 fold change (LogFC) and FDR-adjusted p-value for each selected gene are shown in Table 3.
Table 3. The 50 genes used to differentiate early stage lung adenocarcinoma patients from normal subjects.
Gene symbol LogFC FDR
FHL1 -3.918 1.61E-88
CD5L -4.060 1.84E-53
PTPRH 4.348 6.91E-65
MMP11 4.827 5.30E-57
ANGPT4 -4.370 4.37E-47
RS1 -4.390 5.60E-58
CAV1 -4.093 4.14E-51
SLC6A4 -7.429 7.32E-49
CLIC5 -4.422 4.64E-60
OTX1 3.376 8.73E-54
KIF14 3.258 4.92E-87
ATP10B 4.421 5.02E-54
COL10A1 4.708 5.72E-88
TOP2A 3.490 2.15E-67
PTPRQ -4.757 4.28E-40
SH3GL3 -5.041 1.39E-60
FCN3 -4.758 6.21E-89
CRABP2 4.518 6.87E-67
ABCA12 4.469 4.50E-85
FAM83A 5.975 4.75E-57
GPM6A -5.657 6.09E-46
WNT3A -5.027 1.58E-40
ITLN2 -6.709 2.68E-68
SCUBE1 -4.214 5.56E-66
CLEC3B -4.307 3.50E-80
CTHRC1 3.730 2.68E-83
CA4 -5.490 5.06E-40
FAM107A -4.910 3.02E-59
STXBP6 -4.079 5.68E-59
FABP4 -6.026 5.14E-66
CST1 5.202 1.19E-90
FRMD5 3.486 3.30E-92
ETV4 3.456 7.78E-66
C10orf67 -4.510 4.35E-55
SERTM1 -6.414 2.02E-71
PYCR1 3.525 1.45E-61
IQGAP3 3.350 2.98E-88
SAPCD2 3.332 1.69E-43
ZYG11A 3.862 9.86E-76
AGER -6.536 1.14E-72
FAM83A-AS1 4.416 2.21E-47
RP11-371A19.2 -4.949 6.04E-71
MNX1-AS1 3.562 1.31E-39
MEX3A 3.656 2.01E-52
TUBB3 4.129 8.13E-73
RP11-141J13.5 -4.886 8.18E-44
RP11-353N14.2 3.543 1.10E-39
CTD-3010D24.3 3.569 7.13E-79
FENDRR -4.430 1.03E-56
AFAP1-AS1 5.377 5.55E-56
Performance of transcriptomics profile-based prognostic algorithm
The Random Forest based risk model stratified all subjects in the training and testing cohorts into two levels of risk for progression as discussed above (normal or early stage). The 50 selected gene profiles (normalized) were used as the model input. The risk scores of lung cancer were calculated by the model (Figure 2). We used 0.5 as the cutoff threshold.
The c statistic of the model measured on the testing cohort was 1 (Figure 3).
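For readers unfamiliar with the c statistic, the following sketch shows how it and a 0.5 cutoff can be applied to model risk scores. The c statistic is computed here as the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The scores and labels are hypothetical and are not the study data.

```python
def c_statistic(scores, labels):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

def classify(scores, cutoff=0.5):
    """Call a sample early stage (1) when its risk score reaches the cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]

# Hypothetical risk scores; label 1 = early stage, 0 = normal.
separated_scores = [0.92, 0.81, 0.67, 0.12, 0.30, 0.05]
separated_labels = [1, 1, 1, 0, 0, 0]
overlap_scores = [0.9, 0.4, 0.6, 0.2]
overlap_labels = [1, 1, 0, 0]

auc_separated = c_statistic(separated_scores, separated_labels)
auc_overlap = c_statistic(overlap_scores, overlap_labels)
calls = classify(separated_scores)
```

A c statistic of 1 means every case scored above every control; it does not by itself guarantee that a fixed cutoff such as 0.5 separates the two groups, which is why the cutoff is checked separately.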
Unsupervised hierarchical clustering with transcriptomics profiles
Unsupervised hierarchical clustering analysis was applied to the selected gene profiles to visually depict the association of the disease status with the abundance patterns of these gene profiles (Figure 4). This analysis demonstrated two major clusters reflecting normal samples and early stage lung cancer samples. The error rate of the unsupervised clustering is 0.381%, which reinforced the effectiveness of the selected gene profiles for lung cancer assessment.
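An error rate for a two-cluster solution can be computed by comparing cluster membership with the known sample classes, as in the sketch below. The cluster assignments and labels are hypothetical. As a consistency check, and as an inference not stated in the source, if all 1,051 samples (417 early stage plus 634 normal) were clustered, an error rate of 0.381% would correspond to 4 misplaced samples.

```python
def clustering_error_rate(cluster_ids, labels):
    """Error rate of a two-cluster solution against known binary labels.

    Cluster numbering is arbitrary, so both possible cluster-to-label
    mappings are considered and the better one is reported.
    """
    n = len(labels)
    mismatches = sum(1 for c, y in zip(cluster_ids, labels) if c != y)
    return min(mismatches, n - mismatches) / n

# Hypothetical assignments: cluster ids from a dendrogram cut, labels 1 = early stage.
cluster_ids = [0, 0, 1, 1, 1, 0]
labels = [1, 1, 0, 0, 0, 0]
rate = clustering_error_rate(cluster_ids, labels)

# Inferred check on the reported figure: 4 of 1,051 samples, as a percentage.
reported_percent = round(100 * 4 / 1051, 3)
```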

CLAIMS

What is claimed is:
1. A method, comprising detecting the level of one or more target analytes, but fewer than 50 target analytes, in a sample from a subject to be tested for lung adenocarcinoma, one or more of the target analytes being selected from the group consisting of FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1, wherein a change in the expression of these genes is associated with lung adenocarcinoma.
2. The method of claim 1, further comprising detecting one or more additional target analytes.
3. The method of claim 2, comprising detecting three or more target analytes being selected from the group consisting of FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1.
4. The method of claim 2, comprising detecting ten or more target analytes.
5. The method of claim 1, wherein the sample is a blood product selected from whole blood; plasma; serum; and filtered, concentrated, fractionated or diluted samples of the preceding.
6. The method of claim 1, wherein the sample is a biopsy tissue.
7. The method of claim 1, wherein the method comprises contacting the sample with a set of capture reagents, wherein each capture reagent specifically binds to a different target analyte being detected.
8. The method of claim 7, wherein each capture reagent is an antibody.
9. The method of claim 7, wherein each capture reagent is a nucleic acid probe.
10. Reagents comprising capture reagents for the detection of two or more target analytes, but fewer than 50 target analytes, two or more of the target analytes being selected from the group consisting of FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1.
11. The reagents of claim 10, wherein said capture reagents are antibodies.
12. The reagents of claim 10, wherein said capture reagents are nucleic acid probes.
13. A kit comprising the reagents of claim 10 and one or more additional reagents for carrying out an assay in a sample from a subject.
14. The reagents of claim 10, comprising capture reagents for detecting three or more target analytes selected from the group consisting of FHL1, CD5L, PTPRH, MMP11, ANGPT4, RS1, CAV1, SLC6A4, CLIC5, OTX1, KIF14, ATP10B, COL10A1, TOP2A, PTPRQ, SH3GL3, FCN3, CRABP2, ABCA12, FAM83A, GPM6A, WNT3A, ITLN2, SCUBE1, CLEC3B, CTHRC1, CA4, FAM107A, STXBP6, FABP4, CST1, FRMD5, ETV4, C10orf67, SERTM1, PYCR1, IQGAP3, SAPCD2, ZYG11A, AGER, FAM83A-AS1, RP11-371A19.2, MNX1-AS1, MEX3A, TUBB3, RP11-141J13.5, RP11-353N14.2, CTD-3010D24.3, FENDRR, AFAP1-AS1.
PCT/US2017/023474 2017-03-21 2017-03-21 Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling WO2018174860A1 (en)


Publications (1)

Publication Number Publication Date
WO2018174860A1 true WO2018174860A1 (en) 2018-09-27

Family

ID=63586056




Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17902272

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17902272

Country of ref document: EP

Kind code of ref document: A1