WO2023152568A2 - Compositions and methods for characterizing lung cancer - Google Patents

Compositions and methods for characterizing lung cancer

Publication number
WO2023152568A2
Authority
WO
WIPO (PCT)
Prior art keywords
hsa
trf
iso
mir
seq
Application number
PCT/IB2023/000079
Other languages
French (fr)
Other versions
WO2023152568A3 (en)
Inventor
Trine B. ROUNGE
Sinan Ugur UMU
Hilde LANGSETH
Robert Lyle
Original Assignee
Oslo Universitetssykehus Hf
Kreftregisteret
Application filed by Oslo Universitetssykehus Hf, Kreftregisteret
Publication of WO2023152568A2
Publication of WO2023152568A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/156 Polymorphic or mutational markers
    • C12Q2600/158 Expression markers
    • C12Q2600/178 Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • the present invention relates to compositions and methods for characterizing cancer.
  • the present invention relates to compositions and methods for identifying individuals at increased risk of developing lung cancer.
  • Non-small-cell (NSCLC) and small-cell (SCLC) are the two major subtypes of LC.
  • the symptoms generally occur at a late stage and prognosis is poor.
  • the stage at diagnosis typically determines patient survival (3-5).
  • Screening with low-dose computed tomography (LDCT) can be effective for early detection (5, 6) and can reduce LC mortality by up to 20% in high-risk groups (7-9).
  • LDCT has limitations such as high false-positive rates, risk of overdiagnosis, and high cost (6, 10).
  • Annual CT scans also cause harmful radiation exposure (5, 8).
  • Robust biomarkers can help stratify high-risk groups and increase accuracy in patient inclusion criteria for LDCT-based screening programs (8).
  • RNA biomarkers can be used to detect cancer (8, 11, 12).
  • MicroRNAs (miRNAs) are a class of short RNAs about 21 nucleotides long.
  • They can be found both in serum (13, 17, 18) and plasma (13, 18, 19) as cell-free circulating RNAs.
  • miRNAs can function as tumor suppressors or oncomiRs, and regulate tumor traits such as cell growth, angiogenesis, immune system evasion and metastasis (14, 20).
  • the search for RNA biomarkers is not limited to miRNAs.
  • RNA classes such as protein coding mRNAs, tRNAs, piwi-interacting RNAs (piRNAs) and long-noncoding RNAs (lncRNAs) have been associated with cancer (21, 22).
  • LC prognosis is closely linked to the stage of disease when diagnosed.
  • Cell-free RNA molecules have been proposed as early diagnosis biomarkers of LC. They are associated with all hallmarks of cancer and are readily obtainable from serum as circulating RNAs.
  • Experiments described herein investigated the use of serum RNAs for the early detection of LC in smokers at different prediagnostic time intervals and histological subtypes. The results indicated that smokers can be robustly separated from healthy controls before LC diagnosis regardless of histology with an average AUC of 0.80 (95% CI, 0.75-0.85).
  • NSCLC non-small cell LC
  • a method of assaying a sample from a subject comprising: determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL
  • kits comprising reagents for determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23- BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TC
  • Additional embodiments provide a method of determining an increased risk of lung cancer, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example, iso-20- 5KP25HFF, GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-
  • Yet other embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, LINC01362, Y-RNA, ISO-23-BONKZOUOD, ISO-22-MKJIJLJ2Q, ISO-21 -N2NBQRZ00, GBP3, ISO-20- RNUW92OI, GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, ISO-23-909U247N04, tRF-I89NJ4S2, tRF-9MV47P596VE, tRF- 86J8WPMN1EJ3, tRF-86V8WPMNlEJ3, and/or tRF-Q
  • Yet other embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso- 21-DIPPZBOIO, iso-20-RNUW92OI, TANCI, iso-17-BJ93X24, iso-17-DIRN504, tRF-29- 3IRW18V6XOIE, RP11-182L
  • Still further embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 15 or more) markers selected from, for example, hsa-miR-1273h- 5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, TCAGGCTCAGTCCCCTCCCGATT (ISO-23-8K4P8R8SDE), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers are present, increased, or decreased in the sample.
  • Certain embodiments provide a method of determining an increased risk of SCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22- 947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, ISO-23-X3749W540L, tRF- BS68BFD2, tRF-R29P4P9L5HJVE, and/or tRF-ZRS3S3RX8HYVD; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers are present, increased, or
  • a method of determining an increased risk of SCLC comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more) markers selected from, for example, ISO-23-BONKZOUDW, ATL3, iso-21-Q85XJJ70D, HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR- 339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C in a sample from a subject; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers are present, increased, or decreased in the sample.
  • the increased risk of developing lung cancer is an increased risk of developing lung cancer within 2-8 (e.g., 2-5 or 6-8) years.
  • Also provided herein is a method of determining an increased risk of NSCLC comprising: a) assaying a sample from a subject for the level of one or more markers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) selected from, for example, tRF-20- 739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21 -B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa- 26131, iso-21 -DIPPZBOIO, iso-20-RNUW92OI, TANCI, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182
  • Also provided herein is a method of determining an increased risk of NSCLC comprising: a) assaying a sample from a subject for the level of one or more markers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) selected from, for example, hsa-miR-1273h-5p piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, ISO-23-8K4P8R8SDE, tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject developing lung cancer within 6-8 years when the level of the one or more markers are present, increased, or decreased in the sample.
  • a method of determining an increased risk of SCLC comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11) (ISO-23-BONKZ01JDW), ATL3, GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C in a sample from a subject; and b) determining an increased risk of the subject developing lung cancer within 2-5 years when the level of the one or more markers are present, increased, or decreased in the
  • a method of determining an increased risk of SCLC comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, MARCH8, tRF-25- JY7383RPD9, tRF-27-QlQ89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL and/or MSN in a sample from a subject; and b) determining an increased risk of the subject developing lung cancer within 8-10 years when the level of the one or more markers are present, increased, or decreased in the sample.
  • the assaying is repeated at regular or irregular time intervals.
  • subjects identified as having an increased risk of lung cancer are offered further cancer screening (e.g., molecular diagnostics and/or imaging (e.g., CT scans)).
  • the present invention is not limited to particular methods of assaying a sample from a subject for the level or presence of the markers in the sample.
  • Exemplary methods utilize reagents including but not limited to, one or more sequencing primers, one or more amplification primers or one or more nucleic acid probes.
  • the present invention is not limited to particular samples. Examples include but are not limited to, whole blood, a blood product, a cell sample, a tissue sample, or a bodily fluid sample (e.g., urine).
  • the level of expression of the one or more biomarkers may be determined by comparison to a control as further defined herein, or as a score, for example when the sample is a blind sample, as further described herein.
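As an illustration of what comparison to a control could look like, the following is a minimal sketch and not the method of the disclosure. It assumes marker levels are non-negative normalized counts; the function name `classify_level`, the pseudocount, and the log2 fold-change threshold are hypothetical choices made here for illustration only.

```python
import math
from statistics import mean

def classify_level(sample_level, control_levels, log2_threshold=1.0, pseudocount=1.0):
    """Label a marker level relative to the mean of control samples.

    Assumption: levels are non-negative normalized counts. The pseudocount
    avoids division by zero and the threshold is an arbitrary illustration.
    """
    control = mean(control_levels)
    log2_fc = math.log2((sample_level + pseudocount) / (control + pseudocount))
    if log2_fc >= log2_threshold:
        return "increased"
    if log2_fc <= -log2_threshold:
        return "decreased"
    return "unchanged"
```

In practice the threshold and normalization would depend on the assay platform and on how the control group is defined.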
  • FIG. 1 A. Sample selection flow chart. B. Five different ML algorithms were tested to find the optimal one, XGBoost.
  • FIG. 2 A. Each boxplot shows the performance of each algorithm as measured by AUC.
  • FIG. 3 Sliding-window analysis identified better-performing models that use prediagnostic samples from specific time intervals, such as the SCLC models, which were restricted to samples from 2 to 5 years prior to diagnosis.
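The sliding-window idea can be sketched as follows. This is a hedged illustration only: the window width and step are not stated in this excerpt, and the record field `years_to_diagnosis` and both function names are hypothetical.

```python
def sliding_windows(min_years, max_years, width, step=1):
    """Yield (start, end) windows, in years before diagnosis, within [min_years, max_years]."""
    start = min_years
    while start + width <= max_years:
        yield (start, start + width)
        start += step

def select_samples(samples, start, end):
    """Keep samples collected between `start` and `end` years before diagnosis (inclusive).

    Assumption: each sample is a dict with a hypothetical 'years_to_diagnosis' key.
    """
    return [s for s in samples if start <= s["years_to_diagnosis"] <= end]
```

A separate model would then be trained and evaluated on the samples selected for each window, for example the 2 to 5 year window mentioned for the SCLC models.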
  • FIG. 4 Each ROC curve is based on the prediction results of a randomly created testing dataset (in total 5). AUC values show the average of these predictions.
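The averaging of AUC values over several test sets can be sketched as below. This is a minimal sketch under stated assumptions: AUC is computed with the rank-based (Mann-Whitney) formulation, labels are 1 for cases and 0 for controls, and the function names `auc` and `mean_auc` are hypothetical. It does not reproduce the models or data of the disclosure.

```python
from statistics import mean

def auc(labels, scores):
    """Rank-based AUC: probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(test_sets):
    """Average AUC over an iterable of (labels, scores) pairs, one per test set."""
    return mean(auc(labels, scores) for labels, scores in test_sets)
```

With five randomly created test sets, `mean_auc` would be called on a list of five `(labels, scores)` pairs.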
  • FIG. 5 Exemplary clinical uses of RNA biomarkers in LC screening.
  • FIG. 6 Graphic representation of full-time model feature importance.
  • FIG. 7 Graphic representation of prediagnostic feature importance.
  • detect may describe either the general act of discovering or discerning or the specific observation of a detectably labeled composition.
  • the term “subject” refers to any organisms that are screened using the diagnostic methods described herein. Such organisms preferably include, but are not limited to, mammals (e.g., humans).
  • diagnosis refers to the recognition of a disease by its signs and symptoms, or genetic analysis, pathological analysis, histological analysis, and the like.
  • the term "characterizing cancer in a subject” refers to the identification of one or more properties of a cancer sample in a subject, including but not limited to, the presence of benign, pre-cancerous or cancerous tissue, the stage of the cancer, and the subject's prognosis. Cancers may be characterized by the identification of the expression of one or more cancer marker genes, including but not limited to, those described herein.
  • stage of cancer refers to a qualitative or quantitative assessment of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor and the extent of metastases (e.g., localized or distant).
  • nucleic acid molecule refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA.
  • the term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to,
  • gene refers to a nucleic acid (e.g., DNA) sequence that comprises coding and non-coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA).
  • the polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragments are retained.
  • the term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences.
  • the term "gene” encompasses both cDNA and genomic forms of a gene.
  • a genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns” or “intervening regions” or “intervening sequences.”
  • Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript.
  • oligonucleotide refers to a short length of single-stranded polynucleotide chain. Oligonucleotides are typically less than 200 residues long (e.g., between 15 and 100), however, as used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to by their length. For example, a 24 residue oligonucleotide is referred to as a "24-mer”. Oligonucleotides can form secondary and tertiary structures by self-hybridizing or by hybridizing to other polynucleotides. Such structures can include, but are not limited to, duplexes, hairpins, cruciforms, bends, and triplexes.
  • the terms “complementary” or “complementarity” are used in reference to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules.
  • sequence “5'-A-G-T-3'” is complementary to the sequence “3'-T-C-A-5'.”
  • Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or there may be “complete” or “total” complementarity between the nucleic acids.
  • the degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids.
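The base-pairing rules and the distinction between partial and complete complementarity can be illustrated with a short sketch. It assumes the DNA alphabet used in the example above (RNA would pair A with U), equal-length strands both written 5' to 3', and no gaps; the function names are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))

def complementarity(strand, other):
    """Fraction of positions that base-pair when two equal-length strands,
    both written 5' to 3', are aligned antiparallel."""
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand.upper()
    b = other.upper()[::-1]  # read the second strand 3' to 5'
    matched = sum(1 for x, y in zip(a, b) if PAIRS.get(x) == y)
    return matched / len(a)
```

For the example above, the strand 3'-T-C-A-5' is written 5' to 3' as `ACT`, so `complementarity("AGT", "ACT")` gives complete complementarity, while a value between 0 and 1 indicates partial complementarity.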
  • a partially complementary sequence, i.e., a nucleic acid molecule that at least partially inhibits a completely complementary nucleic acid molecule from hybridizing to a target nucleic acid, is "substantially homologous."
  • the inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency.
  • a substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous nucleic acid molecule to a target under conditions of low stringency.
  • this is not to say that low stringency conditions are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction.
  • the absence of non-specific binding may be tested by the use of a second target that is substantially non-complementary (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
  • hybridization is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains pairing of complementary nucleic acids within its structure is said to be “self-hybridized.”
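Two of the factors named above, the G:C content and the Tm, can be estimated for a short oligonucleotide as sketched below. The excerpt does not say how Tm is determined; this sketch assumes the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough approximation for short DNA oligonucleotides, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough Tm estimate in degrees Celsius using the Wallace rule.

    Assumption: a short DNA oligonucleotide; salt and concentration effects are ignored.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Applied to the 20-nucleotide sequence of SEQ ID NO:1 (GAGGGGCAGAGAGCGAGACA), the sketch gives a G:C fraction of 0.65 and a Wallace-rule estimate of 66 °C.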
  • isolated when used in relation to a nucleic acid, as in “an isolated oligonucleotide” or “isolated polynucleotide” refers to a nucleic acid sequence that is identified and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. Isolated nucleic acid is, as such, present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state they exist in nature.
  • a given DNA sequence e.g., a gene
  • RNA sequences such as a specific mRNA sequence encoding a specific protein
  • isolated nucleic acid encoding a given protein includes, by way of example, such nucleic acid in cells ordinarily expressing the given protein where the nucleic acid is in a chromosomal location different from that of natural cells or is otherwise flanked by a different nucleic acid sequence than that found in nature.
  • the isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form.
  • the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded) but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).
  • sample is used in its broadest sense. In one sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues (e.g., biopsy samples), cells, vesicles, and gases. Biological samples include blood products, such as plasma, serum and the like as well as bodily fluid such as urine. Such examples are not however to be construed as limiting the sample types applicable to the present invention.
  • Experiments described herein identified a set of markers that can predict an increased risk of a subject (e.g., a smoker or former smoker) for developing lung cancer (e.g., NSCLC or SCLC).
  • a method of assaying a sample from a subject comprising: determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example: i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO: 1)), GBP3, hsa-miR- 30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, is
  • the present disclosure is not limited to particular methods of detecting the described markers. Exemplary detection methods are described herein.
  • the cancer markers of the present disclosure are detected using a variety of nucleic acid techniques, including but not limited to: nucleic acid sequencing, nucleic acid hybridization, and nucleic acid amplification.
  • nucleic acid sequencing methods are utilized (e.g., for detection of amplified nucleic acids).
  • the technology provided herein finds use in a Second Generation (a.k.a. Next Generation or Next-Gen), Third Generation (a.k.a. Next-Next-Gen), or Fourth Generation (a.k.a. N3-Gen) sequencing technology including, but not limited to, pyrosequencing, sequencing-by-ligation, single molecule sequencing, sequence-by-synthesis (SBS), semiconductor sequencing, massively parallel clonal sequencing, massively parallel single molecule SBS, massively parallel single molecule real-time sequencing, massively parallel single molecule real-time nanopore technology, etc.
  • Morozova and Marra provide a review of some such technologies in Genomics, 92: 255 (2008), herein incorporated by reference in its entirety. Those of ordinary skill in the art will recognize that RNA is usually reverse transcribed to DNA before sequencing.
  • nucleic acid sequencing techniques are suitable, including fluorescence-based sequencing methodologies (See, e.g., Birren et al., Genome Analysis: Analyzing DNA, 1, Cold Spring Harbor, N.Y.; herein incorporated by reference in its entirety).
  • the technology finds use in automated sequencing techniques understood in the art.
  • the present technology finds use in parallel sequencing of partitioned amplicons (PCT Publication No: WO2006084132 to Kevin McKernan et al., herein incorporated by reference in its entirety).
  • the technology finds use in DNA sequencing by parallel oligonucleotide extension (See, e.g., U.S. Pat. No.
  • nucleic acid hybridization techniques include, but are not limited to, in situ hybridization (ISH), microarray, and Southern or Northern blot.
  • In situ hybridization (ISH) is a type of hybridization that uses a labeled complementary DNA or RNA strand as a probe to localize a specific DNA or RNA sequence in a portion or section of tissue (in situ), or, if the tissue is small enough, the entire tissue (whole mount ISH).
  • DNA ISH can be used to determine the structure of chromosomes.
  • RNA ISH is used to measure and localize mRNAs and other transcripts (e.g., cancer markers) within tissue sections or whole mounts.
  • ISH x-ray fluorescence microscopy
  • ISH can also use two or more probes, labeled with radioactivity or the other non-radioactive labels, to simultaneously detect two or more transcripts.
  • cancer markers are detected using fluorescence in situ hybridization (FISH).
  • FISH assays utilize bacterial artificial chromosomes (BACs). These have been used extensively in the human genome sequencing project (see Nature 409: 953-958 (2001)) and clones containing specific BACs are available through distributors that can be located through many sources, e.g., NCBI. Each BAC clone from the human genome has been given a reference name that unambiguously identifies it. These names can be used to find a corresponding GenBank sequence and to order copies of the clone from a distributor.
  • the present disclosure further provides a method of performing a FISH assay on the patient sample.
  • the methods disclosed herein may comprise performing a FISH assay on one or more cells, tissues, organs, or fluids surrounding such cells, tissues and organs.
  • the methods disclosed herein further comprise performing a FISH assay on human breast cells, human breast tissue or on the fluid surrounding said human breast cells or human breast tissue.
  • Guidance regarding methodology may be obtained from many references including: In situ Hybridization: Medical Applications (eds. G. R. Coulton and J. de Belleroche), Kluwer Academic Publishers, Boston (1992); In situ Hybridization: In Neurobiology; Advances in Methodology (eds. J. H. Eberwine, K. L. Valentino, and J. D.
  • kits that are commercially available and that provide protocols for performing FISH assays (available from e.g., Oncor, Inc., Gaithersburg, MD). Patents providing guidance on methodology include U.S.
  • One or more cancer markers may be detected by conducting one or more hybridization reactions.
  • the one or more hybridization reactions may comprise one or more hybridization arrays, hybridization reactions, hybridization chain reactions, isothermal hybridization reactions, nucleic acid hybridization reactions, or a combination thereof.
  • the one or more hybridization arrays may comprise hybridization array genotyping, hybridization array proportional sensing, DNA hybridization arrays, macroarrays, microarrays, high-density oligonucleotide arrays, genomic hybridization arrays, comparative hybridization arrays, or a combination thereof.
  • microarray types include DNA microarrays (e.g., cDNA microarrays and oligonucleotide microarrays), protein microarrays, tissue microarrays, transfection or cell microarrays, chemical compound microarrays, and antibody microarrays.
  • a DNA microarray, commonly known as a gene chip, DNA chip, or biochip, comprises DNA segments affixed to a solid surface (e.g., glass, plastic or silicon chip).
  • the affixed DNA segments are known as probes, thousands of which can be used in a single DNA microarray.
  • Microarrays can be used to identify disease genes or transcripts (e.g., cancer markers) by comparing gene expression in disease and normal cells.
  • Microarrays can be fabricated using a variety of technologies, including but not limited to: printing with fine-pointed pins onto glass slides; photolithography using pre-made masks; photolithography using dynamic micromirror devices; ink-jet printing; or electrochemistry on microelectrode arrays.
  • the methods disclosed herein may comprise conducting one or more amplification reactions.
  • Nucleic acids e.g., cancer markers
  • Conducting one or more amplification reactions may comprise one or more PCR-based amplifications, non-PCR based amplifications, or a combination thereof.
  • nucleic acid amplification techniques include, but are not limited to, polymerase chain reaction (PCR), reverse transcription polymerase chain reaction (RT-PCR), nested PCR, linear amplification, multiple displacement amplification (MDA), real-time SDA, rolling circle amplification, circle-to-circle amplification, transcription-mediated amplification (TMA), ligase chain reaction (LCR), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA).
  • PCR polymerase chain reaction
  • RT-PCR reverse transcription polymerase chain reaction
  • MDA multiple displacement amplification
  • TMA transcription-mediated amplification
  • LCR ligase chain reaction
  • SDA strand displacement amplification
  • NASBA nucleic acid sequence based amplification
  • PCR The polymerase chain reaction (U.S. Pat. Nos. 4,683,195, 4,683,202, 4,800,159 and 4,965,188, each of which is herein incorporated by reference in its entirety), commonly referred to as PCR, uses multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase copy numbers of a target nucleic acid sequence.
  • RT-PCR reverse transcriptase (RT) is used to make a complementary DNA (cDNA) from mRNA, and the cDNA is then amplified by PCR to produce multiple copies of DNA.
  • cDNA complementary DNA
  • the expression level of the RNA biomarkers of the present invention in a sample is quantified by real-time quantitative polymerase chain reaction (qPCR).
  • TMA Transcription-mediated amplification synthesizes multiple copies of a target nucleic acid sequence autocatalytically under conditions of substantially constant temperature, ionic strength, and pH, in which multiple RNA copies of the target sequence autocatalytically generate additional copies.
  • TMA optionally incorporates the use of blocking moieties, terminating moieties, and other modifying moieties to improve TMA process sensitivity and accuracy.
  • the ligase chain reaction (Weiss, R., Science 254: 1292 (1991), herein incorporated by reference in its entirety), commonly referred to as LCR, uses two sets of complementary DNA oligonucleotides that hybridize to adjacent regions of the target nucleic acid.
  • the DNA oligonucleotides are covalently linked by a DNA ligase in repeated cycles of thermal denaturation, hybridization and ligation to produce a detectable double-stranded ligated oligonucleotide product.
  • Strand displacement amplification (Walker, G. et al., Proc. Natl. Acad. Sci. USA 89: 392-396 (1992); U.S. Pat. Nos. 5,270,184 and 5,455,166, each of which is herein incorporated by reference in its entirety), commonly referred to as SDA, uses cycles of annealing pairs of primer sequences to opposite strands of a target sequence, primer extension in the presence of a dNTPaS to produce a duplex hemiphosphorothioated primer extension product, endonuclease-mediated nicking of a hemimodified restriction endonuclease recognition site, and polymerase-mediated primer extension from the 3' end of the nick to displace an existing strand and produce a strand for the next round of primer annealing, nicking and strand displacement, resulting in geometric amplification of product.
  • Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymer
  • amplification methods include, for example: nucleic acid sequence-based amplification (U.S. Pat. No. 5,130,238, herein incorporated by reference in its entirety), commonly referred to as NASBA; one that uses an RNA replicase to amplify the probe molecule itself (Lizardi et al., BioTechnol. 6: 1197 (1988), herein incorporated by reference in its entirety), commonly referred to as Qβ replicase; a transcription-based amplification method (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173 (1989)); and, self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci.
  • the present invention provides compositions and methods for predicting the likelihood or increased risk of a subject (e.g., a smoker) developing lung cancer.
  • subjects identified by other factors as being at increased risk of lung cancer e.g., history of smoking, family history of lung cancer
  • at-risk subjects are screened at regular intervals (e.g., monthly, yearly, biannually, or other intervals).
  • individuals identified as likely to develop lung cancer based on presence or level of the markers described herein are offered additional screening (e.g., lung imaging such as CT scans).
  • a differential comparison of biomarker expression levels is utilized.
  • the result of the differential comparison for any one of the biomarkers as described in the present disclosure can result in the expression status of the biomarker being termed upregulated, downregulated, or unchanged.
  • the combined expression status of one or more biomarkers thus results in a determination that a subject is at risk of developing lung cancer or has lung cancer.
  • Such a diagnosis can be made on the basis that a particular biomarker expression is considered to be upregulated or downregulated compared to a control or a second comparison sample.
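The differential comparison described above can be sketched in a few lines. This is a minimal illustration, not the method of the disclosure: the function name `expression_status`, the use of a log2 fold change, and the threshold of 1.0 (a two-fold change) are all assumptions chosen for the example.

```python
import math


def expression_status(sample_level, control_level, log2_threshold=1.0):
    """Label a biomarker as upregulated, downregulated, or unchanged.

    The log2 fold-change rule and the default threshold are illustrative
    assumptions; the source does not specify a cutoff.
    """
    if sample_level <= 0 or control_level <= 0:
        raise ValueError("expression levels must be positive")
    log2_fold_change = math.log2(sample_level / control_level)
    if log2_fold_change >= log2_threshold:
        return "upregulated"
    if log2_fold_change <= -log2_threshold:
        return "downregulated"
    return "unchanged"
```

In practice the threshold would be chosen from the data and the comparison would be made against a control group rather than a single control value.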
  • the method further comprises measuring the expression level of at least one of the identified biomarkers, which when compared to a control, the expression level is not altered in the subject.
  • the method as described herein further comprises measuring the expression level of at least one of the identified biomarkers, wherein the upregulation of the biomarker, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
  • the downregulation of at least one of the identified biomarkers as listed, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
  • the comparison of the identified biomarker expression levels includes comparison of biomarker expression levels between samples obtained from subjects at risk of developing lung cancer and a control group.
  • the control group is defined as a group of subjects, wherein the subjects are not defined as being at risk of developing cancer, such as lung cancer, or do not have cancer.
  • the control group is nonsmokers.
  • the control group is subjects who have never smoked.
  • the control group is a cancer-free group, and in some especially preferred embodiments, cancer-free nonsmokers.
  • the control group is a group of subjects, wherein the subjects do not have lung cancer and are nonsmokers.
  • the control group is a group of normal, cancer-free subjects.
  • the control is at least one selected from the group consisting of a lung cancer free control (normal) and a lung cancer patient or subject at risk of developing lung cancer.
  • the determination of whether the subject is at risk of developing lung cancer, or is in a prediagnostic stage of lung cancer, involves determination of a score based on expression of one or more of the RNA biomarkers identified herein.
  • the term “score” refers to an integer or number that can be determined mathematically, for example by using computational models as known in the art, which can include but are not limited to SVM (support vector machine), as an example, and that is calculated using any one of a multitude of mathematical equations and/or algorithms known in the art for the purpose of statistical classification. Such a score is used to enumerate one outcome on a spectrum of possible outcomes.
  • a blind sample may be input into an algorithm, which in turn calculates a score based on the information provided by the analysis of the blind sample. This results in the generation of a score for said blind sample. Based on this score, a decision can be made, for example, on how likely the patient from whom the blind sample was obtained is to develop lung cancer (i.e., is at risk of developing lung cancer or in a prediagnostic phase of lung cancer).
  • the ends of the spectrum may be defined logically based on the data provided, or arbitrarily according to the requirement of the experimenter. In both cases the spectrum needs to be defined before a blind sample is tested.
  • the score generated by such a blind sample may indicate that the corresponding patient is at risk of developing lung cancer, based on a spectrum defined as a scale from 1 to 50, with “1” being defined as not being at risk of developing lung cancer and “50” being defined as having a very high risk of developing lung cancer.
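As a rough sketch of how a score on the 1 to 50 spectrum mentioned above might be produced, the example below maps a classifier output onto that scale. The input probability, the function name `risk_score`, and the linear mapping are assumptions made for illustration; the source does not state how the score is computed.

```python
def risk_score(probability, low=1, high=50):
    """Map a classifier probability in [0, 1] onto an integer risk scale.

    `low` means not at risk and `high` means very high risk. The linear
    mapping is a hypothetical choice, not taken from the source.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return low + round(probability * (high - low))
```

The key point carried over from the text is that the ends of the scale are fixed before any blind sample is scored.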
  • there are a variety of methods for the measurement of RNA biomarker expression including, but not limited to, hybridization-based methods, for example, microarray, northern blotting, bioluminescent methods, sequencing methods and real-time quantitative polymerase chain reaction (qPCR or RT-qPCR).
  • qPCR real-time quantitative polymerase chain reaction
  • a variation of such a method, for example digital polymerase chain reaction (digital PCR), may also be used.
  • the method as disclosed herein further comprises measuring the expression level of at least one biomarker as listed herein and/or identified in the examples.
  • any sample obtained from a subject can be used according to the method of the present disclosure, so long as the sample in question contains nucleic acid sequences. More specifically, the sample is to contain RNA.
  • the sample is obtained from a subject that may or may not have cancer or may or may not be at risk of developing lung cancer (e.g., NSCLC or SCLC).
  • the sample is obtained from a subject who has cancer.
  • the sample is obtained from a subject who is cancer-free.
  • the sample is obtained from a subject who is lung cancer-free.
  • the sample is obtained from a subject who is normal and lung cancer-free.
  • the sample is obtained from a subject who is a smoker.
  • the sample is obtained from a subject who is a former smoker.
  • A person skilled in the art, having possession of the present disclosure, would be capable of working the present invention.
  • An illustrative example as to the use of the present invention is provided as follows: having obtained a sample from a subject, of whom it is not known if they suffer from lung cancer or if they are lung cancer free, or of whom it is not known if they are at risk of developing lung cancer (e.g., NSCLC or SCLC), the sample is analyzed and a differential expression of one or a set of two or more RNA biomarkers, according to the present disclosure and as listed herein, is determined. In some preferred embodiments, this differential expression data is then compared to a control value.
  • a further mathematical score may be determined, which would also take into consideration further statistical parameters relevant to increasing the significance and the accuracy of the provided data set. Based on this information, the person skilled in the art would then be able to determine if the subject in question is cancer-free or has cancer, or, in especially preferred embodiments, if the subject is at risk of developing lung cancer (e.g., NSCLC or SCLC) or is in a prediagnostic stage of lung cancer (e.g., NSCLC or SCLC).
  • lung cancer e.g., non-small-cell lung cancer (NSCLC) or small-cell lung cancer (SCLC)
  • the result of the differential comparison for any one of the biomarkers as described in the present disclosure can result in the expression status of the biomarker being termed upregulated, downregulated, or unchanged.
  • the combined expression status of one or more biomarkers thus results in a determination that a subject is at risk of developing lung cancer (e.g., NSCLC or SCLC) or has lung cancer.
  • a diagnosis or prediction can be made on the basis that a particular biomarker expression is considered to be upregulated or downregulated compared to a control or a second comparison sample.
  • the method further comprises measuring the expression level of at least one of the identified biomarkers, which when compared to a control, the expression level is not altered in the subject.
  • the method as described herein further comprises measuring the expression level of at least one of the identified biomarkers, wherein the upregulation of the biomarker, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
  • the downregulation of at least one of the identified biomarkers as listed, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
  • the comparison of the identified biomarker expression levels includes comparison of biomarker expression levels between samples obtained from subjects with cancer and a control group.
  • the control group is defined as a group of subjects, wherein the subjects do not have cancer.
  • the control group is a cancer-free group.
  • the control group is a group of subjects, wherein the subjects do not have lung cancer.
  • the control group is a group of normal, cancer-free subjects.
  • the control is at least one selected from the group consisting of a lung cancer free control (normal) and a lung cancer patient or subject at risk of developing lung cancer.
  • the control group is subjects who have never smoked.
  • the control group is subjects who are former smokers.
  • the control group is smokers.
  • the determination of whether the subject is at risk of developing lung cancer involves determination of a score based on expression of one or more of the RNA biomarkers identified herein.
  • the term “score” refers to an integer or number that can be determined mathematically, for example by using computational models as known in the art, which can include but are not limited to SVM (support vector machine), as an example, and that is calculated using any one of a multitude of mathematical equations and/or algorithms known in the art for the purpose of statistical classification. Such a score is used to enumerate one outcome on a spectrum of possible outcomes.
  • a blind sample may be input into an algorithm, which in turn calculates a score based on the information provided by the analysis of the blind sample. This results in the generation of a score for said blind sample. Based on this score, a decision can be made, for example, on how likely the patient from whom the blind sample was obtained is to develop lung cancer (i.e., is at risk of developing lung cancer or in a prediagnostic phase of lung cancer).
  • the ends of the spectrum may be defined logically based on the data provided, or arbitrarily according to the requirement of the experimenter. In both cases the spectrum needs to be defined before a blind sample is tested.
  • the score generated by such a blind sample may indicate that the corresponding patient is at risk of developing lung cancer, based on a spectrum defined as a scale from 1 to 50, with “1” being defined as not being at risk of developing lung cancer and “50” being defined as having a very high risk of developing lung cancer.
  • there are a variety of methods for the measurement of RNA biomarker expression including, but not limited to, hybridization-based methods, for example, microarray, northern blotting, bioluminescent methods, sequencing methods and real-time quantitative polymerase chain reaction (qPCR or RT-qPCR).
  • qPCR real-time quantitative polymerase chain reaction
  • a variation of such a method, for example digital polymerase chain reaction (digital PCR), may also be used.
  • the method as disclosed herein further comprises measuring the expression level of at least one biomarker as listed herein and/or identified in the examples.
  • any sample obtained from a subject can be used according to the method of the present disclosure, so long as the sample in question contains nucleic acid sequences. More specifically, the sample is to contain RNA.
  • the sample is obtained from a subject that may or may not have cancer or may or may not be at risk of developing lung cancer.
  • the sample is obtained from a subject who has cancer.
  • the sample is obtained from a subject who is cancer-free.
  • the sample is obtained from a subject who is lung cancer-free.
  • the sample is obtained from a subject who is normal and lung cancer-free.
  • A person skilled in the art, having possession of the present disclosure, would be capable of working the present invention.
  • An illustrative example as to the use of the present invention is provided as follows: having obtained a sample from a subject, of whom it is not known if they suffer from lung cancer or if they are lung cancer free, or of whom it is not known if they are at risk of developing lung cancer, the sample is analyzed and a differential expression of one or a set of two or more RNA biomarkers, according to the present disclosure and as listed herein, is determined. This differential expression data is then compared to the differential expression levels provided herein, which a person skilled in the art would understand.
  • a further mathematical score may be determined, which would also take into consideration further statistical parameters relevant to increasing the significance and the accuracy of the provided data set. Based on this information, the person skilled in the art would then be able to determine if the subject in question is cancer-free or has cancer, or, in especially preferred embodiments, if the subject is at risk of developing lung cancer or is in a prediagnostic stage of lung cancer.
  • a sample e.g., a blood or tissue sample
  • a profiling service e.g., clinical lab at a medical facility, genomic profiling business, etc.
  • the subject may visit a medical center to have the sample obtained and sent to the profiling center, or subjects may collect the sample themselves and directly send it to a profiling center.
  • the information may be directly sent to the profiling service by the subject (e.g., an information card containing the information may be scanned by a computer and the data transmitted to a computer of the profiling center using an electronic communication system).
  • Once received by the profiling service, the sample is processed and a profile is produced (i.e., marker data), specific for the diagnostic or prognostic information desired for the subject.
  • the profile data is then prepared in a format suitable for interpretation by a treating clinician.
  • the prepared format may represent a diagnosis or risk assessment for the subject, along with recommendations for particular treatment options.
  • the data may be displayed to the clinician by any suitable method.
  • the profiling service generates a report that can be printed for the clinician (e.g., at the point of care) or displayed to the clinician on a computer monitor.
  • the information is first analyzed at the point of care or at a regional facility.
  • the raw data is then sent to a central processing facility for further analysis and/or to convert the raw data to information useful for a clinician or patient.
  • the central processing facility provides the advantage of privacy (all data is stored in a central facility with uniform security protocols), speed, and uniformity of data analysis.
  • the central processing facility can then control the fate of the data following treatment of the subject. For example, using an electronic communication system, the central facility can provide data to the clinician, the subject, or researchers.
  • the subject is able to directly access the data using the electronic communication system.
  • the subject may choose further intervention or counseling based on the results.
  • the data is used for research use.
  • the data may be used to further optimize the inclusion or elimination of markers as useful indicators of a particular condition or stage of disease or as a companion diagnostic to determine a treatment course of action.
  • the results are used to select candidate therapies for drug screening or clinical trials.
  • compositions for use in the methods described herein include, but are not limited to, kits comprising one or more reagents for determining the presence and/or level of markers described herein in a sample.
  • the reagents are, for example, one or more nucleic acid primers for the amplification, extension, or sequencing of the genes.
  • kits contain all of the components necessary to perform a detection assay, including all controls, directions for performing assays, and any necessary software for analysis and presentation of results.
  • Tumor staging: Detailed cancer information was selected from the CRN that has systematically collected mandatory notification on cancer occurrence for the Norwegian population since 1952 (30). The cases were classified into histological subtypes: NSCLC, SCLC and others, the latter referring to other less defined or multiple histologies. Stage at diagnosis was encoded with the TNM system: early (localized - stage I), locally advanced (regional - stage II and III), advanced or metastatic (distant - stage IV) and unknown (32).
  • RNAs were extracted from 400 µL serum using phenol-chloroform and miRNeasy Serum/Plasma kit (Qiagen, Valencia, CA, USA). Libraries were prepared with the NEBNext Small RNA kit (NEB, Ipswich, MA, USA) and sequenced on a HiSeq 2500 platform to on average 18 million sequences per sample (Illumina, San Diego, CA, USA).
  • NEB NEBNext Small RNA kit
  • tRFs tRNA derived fragments
  • snoRNAs miscRNAs
  • lncRNAs lncRNAs and mRNAs.
  • RNAs with fewer than 5 reads in less than 80% of the samples were filtered out to ensure a stable signal.
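The abundance filter above is worded ambiguously, so the sketch below states its reading explicitly: it assumes the intent is to keep an RNA only when it has at least 5 reads in at least 80% of samples. The function name `filter_low_count_rnas` and the dictionary layout of the count data are hypothetical.

```python
def filter_low_count_rnas(counts, min_reads=5, min_fraction=0.8):
    """Keep RNAs with at least `min_reads` reads in at least `min_fraction` of samples.

    `counts` maps an RNA identifier to a list of per-sample read counts.
    This keep-rule is an assumed interpretation of the filter in the text.
    """
    kept = {}
    for rna_id, per_sample in counts.items():
        if not per_sample:
            continue  # no samples, nothing to evaluate
        n_passing = sum(1 for reads in per_sample if reads >= min_reads)
        if n_passing / len(per_sample) >= min_fraction:
            kept[rna_id] = per_sample
    return kept
```

With five samples, an RNA detected at 5 or more reads in four of them passes, while one detected in a single sample is dropped.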
  • the optmatch (v0.9-11) R package (github.com/markmfredrickson/optmatch) selected appropriately matched controls while building models.
  • RNA transcripts can be grouped on type and other biological characteristics. Therefore, for the SGL models, we used RNA types as grouping variables.
  • Standard full-time and prediagnostic models: We refer to models that do not take prediagnostic time into account as standard full-time models (Fig. 1B). We trained these models both for all histologies combined and for each histology separately. Prediagnostic models were created using a sliding windows approach to find optimal time intervals. We first selected 3 different window sizes, 2, 3 and 4 years, which were moved over 10 years prior to diagnosis time. We then built models based on samples captured by these sliding windows. We used the workflow described above to train both standard and prediagnostic models.
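The sliding windows approach can be illustrated with a short sketch. The window sizes (2, 3 and 4 years) and the 10-year horizon come from the text; the 1-year step, the half-open interval convention, and the `years_to_diagnosis` field are assumptions, since the source does not state them.

```python
def sliding_windows(sizes=(2, 3, 4), horizon=10, step=1):
    """Enumerate (start, end) windows, in years before diagnosis.

    The 1-year step is an assumed value, not taken from the source.
    """
    windows = []
    for size in sizes:
        start = 0
        while start + size <= horizon:
            windows.append((start, start + size))
            start += step
    return windows


def samples_in_window(samples, window):
    """Select case samples whose time to diagnosis falls in [start, end)."""
    start, end = window
    return [s for s in samples if start <= s["years_to_diagnosis"] < end]
```

Under these assumptions the enumeration yields 24 candidate windows, and a separate model would be trained on the samples captured by each one.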
  • Feature reduction/selection methods: We implemented feature selection methods to improve model performances, including single-RNA class, lasso-selection and significance-selection.
  • In the single-RNA class method, we dropped all RNA types except one.
  • In lasso-selection, all non-zero features selected by the lasso classification models were pooled.
  • In significance-selection, a univariate regression analysis was done per feature, and significant features (multiple testing adjusted) were used to train classification models.
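The significance-selection step can be sketched as follows, assuming the per-feature p-values from the univariate regressions are already available. The source says only that p-values were adjusted for multiple testing; the Benjamini-Hochberg procedure, the 0.05 cutoff, and the function names used here are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_significant(p_by_feature, alpha=0.05):
    """Keep features whose adjusted p-value is at most `alpha` (assumed cutoff)."""
    names = list(p_by_feature)
    adjusted = benjamini_hochberg([p_by_feature[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q <= alpha]
```

Only the retained features would then be passed on to train the classification models.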
  • RNA-seq profiles: In this study, we selected 400 patients with prediagnostic serum samples including multiple samples from the same patients. We also included 525 individuals as controls. After excluding failed or low input samples, we obtained RNA-seq data from 1061 serum samples. However, samples from individuals without any smoking history (i.e., never smokers) or missing information were excluded from further analyses. This resulted in 535 case and 263 control samples from 645 current or former smokers for modelling and testing (Table 1 and Fig. 1A).
  • RNAs were selected as candidate features and used in the models: 202 miRNAs, 1137 isomiRs, 89 miscRNAs, 380 piRNAs, 119 snoRNAs, 530 tRFs, 790 mRNAs and 59 lncRNAs.
  • ML algorithms can differentiate between prediagnostic cases and controls regardless of prediagnostic time.
  • RNAs e.g., miRNAs or tRFs.
  • Because XGBoost produced the most predictive full-time models, we analyzed their best predictors and ranked them by importance.
  • the top 3 features were an isomiR of hsa-miR-486-5p, piR-hsa-28723 and INTS10 for all histologies; Y-RNA, piR-hsa-28723 and GBP3 for NSCLC; and tRF-BS68BFD2, RN7SL724P and tRF-947673FE5 for SCLC.
  • An in-depth investigation of selected features by other algorithms also showed common RNAs.
  • Y-RNA and hsa-miR-486-5p isomiR were among the top predictors of the RF, elastic-net, SGL and lasso models for NSCLC; tRF-BS68BFD2 was among them for SCLC.
  • RNA class e.g., miRNA, isomiR etc.
  • XGBoost the best algorithm
  • This method showed that miscRNA-only and miRNA-only models achieved better classification performance than the other RNA classes regardless of histology and stage at diagnosis (Table 2).
  • the best separators of these models included hsa-miR-99a-5p, hsa-miR-1908-5p, hsa-miR-3925-5p and the Y-RNA related transcripts RNY1P5 and RNY4P30.
  • miRNAs and isomiRs for NSCLC and miscRNAs for SCLC produced better models (Table 2).
  • the most important features of histology-dependent models included hsa-miR-629-5p, hsa-miR-99a-5p, hsa-miR-486-5p isomiR, hsa-miR-151a-3p isomiR for NSCLC; 7SL RNA related transcripts and Vault-RNA for SCLC.
  • RNA class models also implied that feature selection can further improve model performances; therefore, we tested other feature selection methods.
  • the results showed lasso feature selection improved AUC values and reduced complexity (Table 2).
  • the most important features of lasso-selected models included hsa-miR-423-5p isomiR, GBP3 and piR-hsa-28723 for all histologies; Y-RNA, hsa-miR-423-5p isomiR and LINC01362 for NSCLC; HIST1H4E, PTCH2 and tRF-R29P4P9L5HJVE for SCLC.
  • univariate significant feature selection greatly reduced model complexity with an acceptable performance (Table 2).
  • SCLC models only included 11 RNAs.
  • the most important features were GBP3, LINC01362 and hsa-miR-30a-5p for all histologies; LINC01362, GBP3 and tRF-9MV47P596V for NSCLC; piR-hsa-7001 and tRF-7343RX6NMH3 for SCLC.
  • RNA levels are dynamic and histology-specific in prediagnostic samples (25).
  • an isomiR of hsa-miR-451a and RN7SL181P were the most important features of prediagnostic SCLC models. Enrichment analysis of the most important features identified signaling pathways, such as MAPK, PI3K-Akt, RAS, and other pathways like choline metabolism, cellular senescence and PD-L1 expression & PD-1 checkpoint. Similarly, NSCLC models restricted to 6 to 8 years prior to diagnosis had an average AUC of 0.81 (95% CI, 0.75-0.86).
  • RNAs of this period were tRF-YP9L0N4V3, an isomiR of hsa-miR-484 and tRF-9MV47P596V. More than 70 pathways were enriched such as endocytosis, MAPK, RAS, choline metabolism and neurotrophin signaling pathway.
  • RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res. 2019;47(D1):D1250-D1251.

Abstract

The present invention relates to compositions and methods for characterizing cancer. In particular, the present invention relates to compositions and methods for identifying individuals at increased risk of developing lung cancer.

Description

COMPOSITIONS AND METHODS FOR CHARACTERIZING LUNG CANCER
CROSS-REFERENCE TO RELATED APPLICATION
The present application claims priority to U.S. Provisional Patent Application No. 63/308,633, filed February 10, 2022, which is hereby incorporated by reference in its entirety.
SEQUENCE LISTING
The text of the computer readable sequence listing filed herewith, titled “39611_601_SequenceListing”, created February 9, 2023, having a file size of 11,718 bytes, is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates to compositions and methods for characterizing cancer. In particular, the present invention relates to compositions and methods for identifying individuals at increased risk of developing lung cancer.
BACKGROUND OF THE INVENTION
Lung cancer (LC) continues to be the leading cause of cancer-related deaths despite declining smoking prevalence (1, 2). Non-small-cell (NSCLC) and small-cell (SCLC) are the two major subtypes of LC. The symptoms generally occur at a late stage and prognosis is poor. The stage at diagnosis typically determines patient survival (3-5). Screening with low-dose computed tomography (LDCT) can be effective for early detection (5, 6) and reduce LC mortality by up to 20% in high-risk groups (7-9). However, LDCT has limitations such as high false-positive rates, risk of overdiagnosis and high cost (6, 10). Annual CT scans also cause harmful radiation exposure (5, 8). Robust biomarkers can help stratify high-risk groups and increase accuracy in patient inclusion criteria for LDCT-based screening programs (8).
Liquid biopsies, quantifying molecular biomarkers (tumor-derived DNAs, proteins and RNAs) in circulation, can be used to detect cancer (8, 11, 12). MicroRNAs (miRNA), a class of short RNAs ~21 nucleotides long, have been widely investigated for their biomarker potential (13-16). They can be found both in serum (13, 17, 18) and plasma (13, 18, 19) as cell-free circulating RNAs. miRNAs can function as tumor suppressors or oncomiRs, and regulate tumor traits such as cell growth, angiogenesis, immune system evasion and metastasis (14, 20). The search for RNA biomarkers is not limited to miRNAs. Aberrant expression of other RNA classes, such as protein coding mRNAs, tRNAs, piwi-interacting RNAs (piRNAs) and long-noncoding RNAs (lncRNAs), have been associated with cancer (21, 22). Despite the immense potential of cell-free RNAs, the promise of non-invasive RNA biomarkers of cancer has not yet been fulfilled.
Diagnostic tests that can identify at-risk individuals are required.
SUMMARY OF THE INVENTION
The present invention relates to compositions and methods for characterizing cancer. In particular, the present invention relates to compositions and methods for identifying individuals at increased risk of developing lung cancer.
Lung cancer (LC) prognosis is closely linked to the stage of disease when diagnosed. Cell-free RNA molecules have been proposed as early diagnosis biomarkers of LC. They are associated with all hallmarks of cancer and are readily obtainable from serum as circulating RNAs. Experiments described herein investigated the use of serum RNAs for the early detection of LC in smokers at different prediagnostic time intervals and histological subtypes. The results indicated that smokers can be robustly separated from healthy controls before LC diagnosis regardless of histology with an average AUC of 0.80 (95% CI, 0.75-0.85). Furthermore, the strongest models that took both time to diagnosis and histology into account successfully predicted non-small cell LC (NSCLC) between 6 to 8 years, with an AUC of 0.82 (95% CI, 0.76-0.88), and SCLC between 2 to 5 years, with an AUC of 0.89 (95% CI, 0.77-1.0), prior to diagnosis.
Accordingly, in some embodiments, provided herein is a method of assaying a sample from a subject (e.g., a smoker or former smoker), comprising: determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO:3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO:4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO:5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO:6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO:7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO:8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; iii) tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; iv) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; v) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO:10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; and/or vi) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; and/or vii) MARCH8, tRF-25-JY7383RPD9, tRF-27-Q1Q89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL, and/or MSN.
In some embodiments, provided herein is a kit comprising reagents for determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO: 1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO: 2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO: 3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO: 4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO: 5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO: 6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO: 7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO: 8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; iii) tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; iv) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO: 9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; v) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO: 10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; vi) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO: 11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO: 12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; or vii) MARCH8, tRF-25-JY7383RPD9, tRF-27-Q1Q89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL, and/or MSN.
Further embodiments provide a method of determining an increased risk of lung cancer (e.g., NSCLC or SCLC), comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO: 1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO: 2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO: 3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO: 4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO: 5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO: 6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO: 7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO: 8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; iii) tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; iv) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO: 9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; v) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO: 10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and/or tRF-ZRS3S3RX8HYVD; vi) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO: 11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO: 12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; and/or vii) MARCH8, tRF-25-JY7383RPD9, tRF-27-Q1Q89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL, and/or MSN; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Additional embodiments provide a method of determining an increased risk of lung cancer, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example, iso-20-5KP25HFF, GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z, CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY, TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Yet other embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, LINC01362, Y-RNA, iso-23-B0NKZ01J0D, iso-22-MKJIJLJ2Q, iso-21-N2NBQRZ00, GBP3, iso-20-RNUW92OI, GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04, tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Yet other embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Still further embodiments provide a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 15 or more) markers selected from, for example, hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Certain embodiments provide a method of determining an increased risk of SCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L, tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and/or tRF-ZRS3S3RX8HYVD; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
Other embodiments provide a method of determining an increased risk of SCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more) markers selected from, for example, iso-23-B0NKZ01JDW, ATL3, iso-21-Q85XJJ70D, HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; and b) determining an increased risk of the subject developing lung cancer when the level of the one or more markers is present, increased, or decreased in the sample.
In some embodiments, the increased risk of developing lung cancer is an increased risk of developing lung cancer within 2-8 (e.g., 2-5 or 6-8) years.
Also provided herein is a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more markers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) selected from, for example, tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; and b) determining an increased risk of the subject developing lung cancer within 0-2 years when the level of the one or more markers is present, increased, or decreased in the sample.
Also provided herein is a method of determining an increased risk of NSCLC, comprising: a) assaying a sample from a subject for the level of one or more markers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) selected from, for example, hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE, tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject developing lung cancer within 6-8 years when the level of the one or more markers is present, increased, or decreased in the sample.
Additionally provided herein is a method of determining an increased risk of SCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO: 11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO: 12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; and b) determining an increased risk of the subject developing lung cancer within 2-5 years when the level of the one or more markers is present, increased, or decreased in the sample.
Additionally provided herein is a method of determining an increased risk of SCLC, comprising: a) assaying a sample from a subject for the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 or more) markers selected from, for example, MARCH8, tRF-25-JY7383RPD9, tRF-27-Q1Q89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL, and/or MSN; and b) determining an increased risk of the subject developing lung cancer within 8-10 years when the level of the one or more markers is present, increased, or decreased in the sample.
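The pairing of histology-specific marker panels with prediagnostic time windows described above can be sketched as a simple lookup table. This is a minimal illustration only: the `Panel` type, the `panels_for` helper, and the choice of three example markers per panel are hypothetical, and the marker names and windows are copied from the embodiments above rather than from any validated assay.

```python
from typing import List, NamedTuple, Tuple


class Panel(NamedTuple):
    """One marker panel and the prediagnostic window it is described for."""
    histology: str                      # "NSCLC" or "SCLC"
    years_before_diagnosis: Tuple[int, int]
    example_markers: Tuple[str, ...]    # a few markers from the panel, not the full list


# Windows and markers taken from the embodiments above (abbreviated lists).
PANELS = (
    Panel("NSCLC", (0, 2), ("RNU2-19P", "hsa-miR-193a-5p", "NUDT3")),
    Panel("NSCLC", (6, 8), ("hsa-miR-1273h-5p", "piR-hsa-27124", "PTCH2")),
    Panel("SCLC", (2, 5), ("ATL3", "HMGB1", "hsa-miR-19b-3p")),
    Panel("SCLC", (8, 10), ("MARCH8", "EXOC3", "PTCH2")),
)


def panels_for(histology: str, years: float) -> List[Panel]:
    """Return the panels whose window (inclusive) covers `years` before diagnosis."""
    return [
        p for p in PANELS
        if p.histology == histology
        and p.years_before_diagnosis[0] <= years <= p.years_before_diagnosis[1]
    ]
```

For example, `panels_for("SCLC", 3)` selects the 2-5 year SCLC panel, while `panels_for("NSCLC", 4)` returns an empty list because no NSCLC window above covers 4 years.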
In some embodiments, the assaying is repeated at regular or irregular time intervals. In some embodiments, subjects identified as having an increased risk of lung cancer are offered further cancer screening (e.g., molecular diagnostics and/or imaging (e.g., CT scans)).
The present invention is not limited to particular methods of assaying a sample from a subject for the level or presence of the markers in the sample. Exemplary methods utilize reagents including but not limited to, one or more sequencing primers, one or more amplification primers or one or more nucleic acid probes.
The present invention is not limited to particular samples. Examples include but are not limited to, whole blood, a blood product, a cell sample, a tissue sample, or a bodily fluid sample (e.g., urine).
It will be further understood that in each of the methods described above, the level of expression of the one or more biomarkers may be determined by comparison to a control as further defined herein, or as a score, for example when the sample is a blind sample, as further described herein.
Additional embodiments are described herein.
DESCRIPTION OF THE FIGURES
FIG. 1: A. Sample selection flow chart. B. Five different ML algorithms were tested to find the optimal one, XGBoost.
FIG. 2: A. Each boxplot shows the performance of each algorithm measured by AUC. B. Each ROC curve is based on the prediction results of a randomly created testing dataset (in total 5).
FIG. 3: Sliding-window analysis identified better-performing models that use prediagnostic samples from specific time intervals, such as the SCLC models, which were restricted to samples collected 2 to 5 years prior to diagnosis.
FIG. 4: Each ROC curve is based on the prediction results of a randomly created testing dataset (in total 5). AUC values show the average of these predictions.
FIG. 5: Exemplary clinical uses of RNA biomarkers in LC screening.
FIG. 6: Graphic representation of full-time model feature importance.
FIG. 7: Graphic representation of prediagnostic feature importance.
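The averaging of AUC values over randomly created testing datasets mentioned for FIGS. 2 and 4 can be illustrated with a small, dependency-free sketch. It does not reproduce the XGBoost models of the experiments; the function names `auc` and `mean_auc` and all labels and scores are hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0); ties count 0.5."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def mean_auc(test_sets: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the AUC over several independently drawn testing datasets."""
    values = [auc(labels, scores) for labels, scores in test_sets]
    return sum(values) / len(values)
```

With two toy testing datasets, one perfectly separated and one with a single misordered pair, `mean_auc([([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]), ([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1])])` averages 1.0 and 0.75 to give 0.875.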
DEFINITIONS
To facilitate an understanding of the present invention, a number of terms and phrases are defined below:
As used herein, the terms “detect”, “detecting” or “detection” may describe either the general act of discovering or discerning or the specific observation of a detectably labeled composition.
As used herein, the term “subject” refers to any organism that is screened using the diagnostic methods described herein. Such organisms preferably include, but are not limited to, mammals (e.g., humans).
The term “diagnosed,” as used herein, refers to the recognition of a disease by its signs and symptoms, or genetic analysis, pathological analysis, histological analysis, and the like.
As used herein, the term "characterizing cancer in a subject" refers to the identification of one or more properties of a cancer sample in a subject, including but not limited to, the presence of benign, pre-cancerous or cancerous tissue, the stage of the cancer, and the subject's prognosis. Cancers may be characterized by the identification of the expression of one or more cancer marker genes, including but not limited to, those described herein.
As used herein, the term "stage of cancer" refers to a qualitative or quantitative assessment of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor and the extent of metastases (e.g., localized or distant).
As used herein, the term "nucleic acid molecule" refers to any nucleic acid-containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl)uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine.
The term "gene" refers to a nucleic acid (e.g., DNA) sequence that comprises coding and non-coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragments are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
As used herein, the term "oligonucleotide" refers to a short length of single-stranded polynucleotide chain. Oligonucleotides are typically less than 200 residues long (e.g., between 15 and 100); however, as used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to by their length. For example, a 24-residue oligonucleotide is referred to as a "24-mer". Oligonucleotides can form secondary and tertiary structures by self-hybridizing or by hybridizing to other polynucleotides. Such structures can include, but are not limited to, duplexes, hairpins, cruciforms, bends, and triplexes.
As used herein, the terms "complementary" or "complementarity" are used in reference to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "5'-A-G-T-3'," is complementary to the sequence "3'-T-C-A-5'." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids.
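The base-pairing rules in this definition can be made concrete with a short sketch. The helper names `reverse_complement` and `complementarity` are hypothetical, only the four standard DNA bases are handled, and both strands are written 5' to 3' and assumed to be of equal length.

```python
# Watson-Crick pairs for the four standard DNA bases.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that base-pair when two equal-length strands,
    both written 5'->3', are aligned antiparallel.
    1.0 is 'complete' complementarity; values in between are 'partial'."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    b_antiparallel = strand_b.upper()[::-1]   # read strand_b 3'->5'
    matches = sum(
        1 for x, y in zip(strand_a.upper(), b_antiparallel) if _PAIR.get(x) == y
    )
    return matches / len(strand_a)
```

Using the example from the definition, the strand 3'-T-C-A-5' is written 5'-ACT-3', so `reverse_complement("AGT")` gives `"ACT"` and `complementarity("AGT", "ACT")` is 1.0, whereas `complementarity("AGT", "ACC")` is 2/3 (partial complementarity).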
The term "homology" refers to a degree of complementarity. There may be partial homology or complete homology (i.e., identity). A partially complementary sequence is a nucleic acid molecule that at least partially inhibits a completely complementary nucleic acid molecule from hybridizing to a target nucleic acid; such a sequence is "substantially homologous." The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous nucleic acid molecule to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target that is substantially non-complementary (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
As used herein, the term "hybridization" is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains pairing of complementary nucleic acids within its structure is said to be "self-hybridized."
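Two of the factors named above, the G:C content and the Tm, can be estimated from sequence alone. The following sketch is illustrative only: `gc_fraction` and `wallace_tm` are hypothetical helper names, and the Tm uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a rough rule of thumb for short oligonucleotides that is an assumption of this example and not a method stated in the specification.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). Intended only for short oligonucleotides."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc
```

Applied to the 20-nucleotide sequence of SEQ ID NO: 1 (GAGGGGCAGAGAGCGAGACA; 10 G, 3 C, 7 A, 0 T), `gc_fraction` gives 0.65 and `wallace_tm` gives 66.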
The term "isolated" when used in relation to a nucleic acid, as in "an isolated oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence that is identified and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. Isolated nucleic acid is, as such, present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state in which they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a given protein includes, by way of example, such nucleic acid in cells ordinarily expressing the given protein where the nucleic acid is in a chromosomal location different from that of natural cells or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded) but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).
As used herein, the term "sample" is used in its broadest sense. In one sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues (e.g., biopsy samples), cells, vesicles, and gases. Biological samples include blood products, such as plasma, serum and the like as well as bodily fluid such as urine. Such examples are not however to be construed as limiting the sample types applicable to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to compositions and methods for characterizing cancer. In particular, the present invention relates to compositions and methods for identifying individuals at increased risk of developing lung cancer. Experiments described herein identified a set of markers that can predict an increased risk of a subject (e.g., a smoker or former smoker) for developing lung cancer (e.g., NSCLC or SCLC). Accordingly, in some embodiments, provided herein is a method of assaying a sample from a subject, comprising: determining the level of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 or more) markers selected from, for example: i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO: 1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO: 2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO: 3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO: 4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO: 5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO: 6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO: 7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO: 8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; iii) tRF-20-739P8WQ0, tRF-20-J4S2I7L7, RNU2-19P, hsa-miR-193a-5p, NUDT3, iso-21-B0NKZ0RJ0, RBM39, tRF-29-7EMQ18Y3E7IN, iso-23-B0NKZ01J0E, iso-22-80FOUHBBP, piR-hsa-26131, iso-21-DIPPZBOIO, iso-20-RNUW92OI, TANC1, iso-17-BJ93X24, iso-17-DIRN504, tRF-29-3IRW18V6XOIE, RP11-182L21.6, tRF-28-6SXMSL73VLD5, hsa-miR-375-3p, hsa-miR-184, and/or tRF-35-I3Z9HMI8W47W1R; iv) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO: 9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; v) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO: 10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and/or tRF-ZRS3S3RX8HYVD; vi) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO: 11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO: 12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and/or TNFRSF13C; and/or vii) MARCH8, tRF-25-JY7383RPD9, tRF-27-Q1Q89P9L842, tRF-22-947673FE5, EXOC3, PTCH2, piR-hsa-1593, piR-hsa-28391, GRAP2, B4GALT2, ATP8A1, C6orf223, FADS1, iso-22-B04KZ01JL, and/or MSN.
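Because several marker identifiers are easily garbled in transcription, a simple consistency check can help when handling the list above. The sketch below assumes, as an inference from the examples listed (e.g., iso-20-5KP25HFF paired with a 20-nucleotide sequence), that the number in an `iso-N-...` identifier equals the length of the associated sequence; the helper names are hypothetical and only three of the listed identifier and sequence pairs are included.

```python
import re
from typing import Optional

# Identifier/sequence pairs copied from the marker list above (SEQ ID NOs: 1, 2, 5).
EXAMPLES = {
    "iso-20-5KP25HFF": "GAGGGGCAGAGAGCGAGACA",
    "iso-23-BQ8DQWM4Z": "AACATTCAACGCTGTCGGTGAGT",
    "iso-22-MKJIJLJ2Q": "CGGGGCAGCTCAGTACAGGATT",
}


def declared_length(marker_id: str) -> Optional[int]:
    """Extract N from an 'iso-N-XXXX' identifier; None for other marker types."""
    match = re.fullmatch(r"iso-(\d+)-[0-9A-Z]+", marker_id)
    return int(match.group(1)) if match else None


def length_matches(marker_id: str, sequence: str) -> bool:
    """True when the identifier declares a length equal to the sequence length."""
    n = declared_length(marker_id)
    return n is not None and n == len(sequence)
```

All three pairs in `EXAMPLES` pass `length_matches`, while an identifier such as `GBP3` has no declared length and is skipped by this check.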
The present disclosure is not limited to particular methods of detecting the described markers. Exemplary detection methods are described herein.
The cancer markers of the present disclosure are detected using a variety of nucleic acid techniques, including but not limited to: nucleic acid sequencing, nucleic acid hybridization, and nucleic acid amplification.
In some embodiments, nucleic acid sequencing methods are utilized (e.g., for detection of amplified nucleic acids). In some embodiments, the technology provided herein finds use in a Second Generation (a.k.a. Next Generation or Next-Gen), Third Generation (a.k.a. Next-Next-Gen), or Fourth Generation (a.k.a. N3-Gen) sequencing technology including, but not limited to, pyrosequencing, sequencing-by-ligation, single molecule sequencing, sequence-by-synthesis (SBS), semiconductor sequencing, massively parallel clonal sequencing, massively parallel single molecule SBS, massively parallel single molecule real-time sequencing, massively parallel single molecule real-time nanopore technology, etc. Morozova and Marra provide a review of some such technologies in Genomics, 92: 255 (2008), herein incorporated by reference in its entirety. Those of ordinary skill in the art will recognize that RNA is usually reverse transcribed to DNA before sequencing.
A number of nucleic acid sequencing techniques are suitable, including fluorescence-based sequencing methodologies (See, e.g., Birren et al., Genome Analysis: Analyzing DNA, 1, Cold Spring Harbor, N.Y.; herein incorporated by reference in its entirety). In some embodiments, the technology finds use in automated sequencing techniques understood in the art. In some embodiments, the present technology finds use in parallel sequencing of partitioned amplicons (PCT Publication No. WO2006084132 to Kevin McKernan et al., herein incorporated by reference in its entirety). In some embodiments, the technology finds use in DNA sequencing by parallel oligonucleotide extension (See, e.g., U.S. Pat. No. 5,750,341 to Macevicz et al., and U.S. Pat. No. 6,306,597 to Macevicz et al., both of which are herein incorporated by reference in their entireties). Additional examples of sequencing techniques in which the technology finds use include the Church polony technology (Mitra et al., 2003, Analytical Biochemistry 320, 55-65; Shendure et al., 2005 Science 309, 1728-1732; U.S. Pat. No. 6,432,360, U.S. Pat. No. 6,485,944, U.S. Pat. No. 6,511,803; herein incorporated by reference in their entireties), the 454 picotiter pyrosequencing technology (Margulies et al., 2005 Nature 437, 376-380; US 20050130173; herein incorporated by reference in their entireties), the Solexa single base addition technology (Bennett et al., 2005, Pharmacogenomics, 6, 373-382; U.S. Pat. No. 6,787,308; U.S. Pat. No. 6,833,246; herein incorporated by reference in their entireties), the Lynx massively parallel signature sequencing technology (Brenner et al. (2000). Nat. Biotechnol. 18:630-634; U.S. Pat. No. 5,695,934; U.S. Pat. No. 5,714,330; herein incorporated by reference in their entireties), and the Adessi PCR colony technology (Adessi et al. (2000). Nucleic Acids Res. 28, E87; WO 00018957; herein incorporated by reference in their entireties).
Illustrative non-limiting examples of nucleic acid hybridization techniques include, but are not limited to, in situ hybridization (ISH), microarray, and Southern or Northern blot. In situ hybridization (ISH) is a type of hybridization that uses a labeled complementary DNA or RNA strand as a probe to localize a specific DNA or RNA sequence in a portion or section of tissue (in situ), or, if the tissue is small enough, the entire tissue (whole mount ISH). DNA ISH can be used to determine the structure of chromosomes. RNA ISH is used to measure and localize mRNAs and other transcripts (e.g., cancer markers) within tissue sections or whole mounts. Sample cells and tissues are usually treated to fix the target transcripts in place and to increase access of the probe. The probe hybridizes to the target sequence at elevated temperature, and then the excess probe is washed away. The probe that was labeled with either radio-, fluorescent- or antigen-labeled bases is localized and quantitated in the tissue using either autoradiography, fluorescence microscopy or immunohistochemistry, respectively. ISH can also use two or more probes, labeled with radioactivity or the other non-radioactive labels, to simultaneously detect two or more transcripts.
In some embodiments, cancer markers are detected using fluorescence in situ hybridization (FISH). In some embodiments, FISH assays utilize bacterial artificial chromosomes (BACs). These have been used extensively in the human genome sequencing project (see Nature 409: 953-958 (2001)) and clones containing specific BACs are available through distributors that can be located through many sources, e.g., NCBI. Each BAC clone from the human genome has been given a reference name that unambiguously identifies it. These names can be used to find a corresponding GenBank sequence and to order copies of the clone from a distributor.
The present disclosure further provides a method of performing a FISH assay on the patient sample. The methods disclosed herein may comprise performing a FISH assay on one or more cells, tissues, organs, or fluids surrounding such cells, tissues and organs. In some instances, the methods disclosed herein further comprise performing a FISH assay on human breast cells, human breast tissue or on the fluid surrounding said human breast cells or human breast tissue. Guidance regarding methodology may be obtained from many references including: In situ Hybridization: Medical Applications (eds. G. R. Coulton and J. de Belleroche), Kluwer Academic Publishers, Boston (1992); In situ Hybridization: In Neurobiology; Advances in Methodology (eds. J. H. Eberwine, K. L. Valentino, and J. D. Barchas), Oxford University Press Inc., England (1994); In situ Hybridization: A Practical Approach (ed. D. G. Wilkinson), Oxford University Press Inc., England (1992)); Kuo, et al., Am. J. Hum. Genet. 49:112-119 (1991); Klinger, et al., Am. J. Hum. Genet. 51:55-65 (1992); and Ward, et al., Am. J. Hum. Genet. 52:854-865 (1993)). There are also kits that are commercially available and that provide protocols for performing FISH assays (available from e.g., Oncor, Inc., Gaithersburg, MD). Patents providing guidance on methodology include U.S. 5,225,326; 5,545,524; 6,121,489 and 6,573,043. All of these references are hereby incorporated by reference in their entirety and may be used along with similar references in the art and with the information provided in the Examples section herein to establish procedural steps convenient for a particular laboratory.
One or more cancer markers may be detected by conducting one or more hybridization reactions. The one or more hybridization reactions may comprise one or more hybridization arrays, hybridization reactions, hybridization chain reactions, isothermal hybridization reactions, nucleic acid hybridization reactions, or a combination thereof. The one or more hybridization arrays may comprise hybridization array genotyping, hybridization array proportional sensing, DNA hybridization arrays, macroarrays, microarrays, high-density oligonucleotide arrays, genomic hybridization arrays, comparative hybridization arrays, or a combination thereof.
Different kinds of biological assays are called microarrays including, but not limited to: DNA microarrays (e.g., cDNA microarrays and oligonucleotide microarrays); protein microarrays; tissue microarrays; transfection or cell microarrays; chemical compound microarrays; and antibody microarrays. A DNA microarray, commonly known as a gene chip, DNA chip, or biochip, is a collection of microscopic DNA spots attached to a solid surface (e.g., glass, plastic or silicon chip) forming an array for the purpose of expression profiling or monitoring expression levels for thousands of genes simultaneously. The affixed DNA segments are known as probes, thousands of which can be used in a single DNA microarray. Microarrays can be used to identify disease genes or transcripts (e.g., cancer markers) by comparing gene expression in disease and normal cells. Microarrays can be fabricated using a variety of technologies, including but not limited to: printing with fine-pointed pins onto glass slides; photolithography using pre-made masks; photolithography using dynamic micromirror devices; ink-jet printing; or electrochemistry on microelectrode arrays.
The methods disclosed herein may comprise conducting one or more amplification reactions. Nucleic acids (e.g., cancer markers) may be amplified prior to or simultaneous with detection. Conducting one or more amplification reactions may comprise one or more PCR-based amplifications, non-PCR based amplifications, or a combination thereof. Illustrative non-limiting examples of nucleic acid amplification techniques include, but are not limited to, polymerase chain reaction (PCR), reverse transcription polymerase chain reaction (RT-PCR), nested PCR, linear amplification, multiple displacement amplification (MDA), real-time SDA, rolling circle amplification, circle-to-circle amplification, transcription-mediated amplification (TMA), ligase chain reaction (LCR), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA). Those of ordinary skill in the art will recognize that certain amplification techniques (e.g., PCR) require that RNA be reverse transcribed to DNA prior to amplification (e.g., RT-PCR).
The polymerase chain reaction (U.S. Pat. Nos. 4,683,195, 4,683,202, 4,800,159 and 4,965,188, each of which is herein incorporated by reference in its entirety), commonly referred to as PCR, uses multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase copy numbers of a target nucleic acid sequence. In a variation called RT-PCR, reverse transcriptase (RT) is used to make a complementary DNA (cDNA) from mRNA, and the cDNA is then amplified by PCR to produce multiple copies of DNA. For other various permutations of PCR see, e.g, U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159; Mullis et al., Meth. Enzymol. 155: 335 (1987); and Murakawa et al., DNA 7: 287 (1988), each of which is herein incorporated by reference in its entirety. In some particularly preferred embodiments, the expression level of the RNA biomarkers of the present invention in a sample is quantified by real-time quantitative polymerase chain reaction (qPCR).
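The exponential increase in copy number described above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming an idealized reaction in which every cycle multiplies the number of target copies by (1 + efficiency); the function name and the efficiency parameter are illustrative assumptions and are not part of the disclosure.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield after a number of cycles.

    Assumption: each cycle multiplies the copy number by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling. Real reactions plateau
    as reagents are consumed, which this sketch ignores.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Example: a single template copy after 10 perfect doubling cycles.
single_copy_yield = copies_after_cycles(1, 10)
```

Under these assumptions the yield depends only on the starting copy number, the cycle count, and the per-cycle efficiency, which is why cycle-threshold measurements in qPCR can be related back to starting template abundance.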
Transcription mediated amplification (U.S. Pat. Nos. 5,480,784 and 5,399,491, each of which is herein incorporated by reference in its entirety), commonly referred to as TMA, synthesizes multiple copies of a target nucleic acid sequence autocatalytically under conditions of substantially constant temperature, ionic strength, and pH in which multiple RNA copies of the target sequence autocatalytically generate additional copies. See, e.g., U.S. Pat. Nos. 5,399,491 and 5,824,518, each of which is herein incorporated by reference in its entirety. In a variation described in U.S. Publ. No. 20060046265 (herein incorporated by reference in its entirety), TMA optionally incorporates the use of blocking moieties, terminating moieties, and other modifying moieties to improve TMA process sensitivity and accuracy.
The ligase chain reaction (Weiss, R., Science 254: 1292 (1991), herein incorporated by reference in its entirety), commonly referred to as LCR, uses two sets of complementary DNA oligonucleotides that hybridize to adjacent regions of the target nucleic acid. The DNA oligonucleotides are covalently linked by a DNA ligase in repeated cycles of thermal denaturation, hybridization and ligation to produce a detectable double-stranded ligated oligonucleotide product.
Strand displacement amplification (Walker, G. et al., Proc. Natl. Acad. Sci. USA 89: 392-396 (1992); U.S. Pat. Nos. 5,270,184 and 5,455,166, each of which is herein incorporated by reference in its entirety), commonly referred to as SDA, uses cycles of annealing pairs of primer sequences to opposite strands of a target sequence, primer extension in the presence of a dNTPαS to produce a duplex hemiphosphorothioated primer extension product, endonuclease-mediated nicking of a hemimodified restriction endonuclease recognition site, and polymerase-mediated primer extension from the 3' end of the nick to displace an existing strand and produce a strand for the next round of primer annealing, nicking and strand displacement, resulting in geometric amplification of product. Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymerases at higher temperatures in essentially the same method (EP Pat. No. 0 684 315).
Other amplification methods include, for example: nucleic acid sequence-based amplification (U.S. Pat. No. 5,130,238, herein incorporated by reference in its entirety), commonly referred to as NASBA; one that uses an RNA replicase to amplify the probe molecule itself (Lizardi et al., BioTechnol. 6: 1197 (1988), herein incorporated by reference in its entirety), commonly referred to as Qβ replicase; a transcription-based amplification method (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173 (1989)); and self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874 (1990), each of which is herein incorporated by reference in its entirety). For further discussion of known amplification methods see Persing, David H., “In Vitro Nucleic Acid Amplification Techniques” in Diagnostic Medical Microbiology: Principles and Applications (Persing et al., Eds.), pp. 51-87 (American Society for Microbiology, Washington, DC (1993)).
In some embodiments, as described herein, the present invention provides compositions and methods for predicting the likelihood or increased risk of a subject (e.g., a smoker) developing lung cancer. In some embodiments, subjects identified by other factors as being at increased risk of lung cancer (e.g., history of smoking, family history of lung cancer) are screened using the markers described herein. In some embodiments, at-risk subjects are screened at regular intervals (e.g., monthly, yearly, biannually, or other intervals). In some embodiments, individuals identified as likely to develop lung cancer based on presence or level of the markers described herein are offered additional screening (e.g., lung imaging such as CT scans).
In some preferred embodiments, a differential comparison of biomarker expression levels is utilized. The result of the differential comparison for any one of the biomarkers as described in the present disclosure can result in the expression status of the biomarker being termed upregulated, downregulated, or unchanged. The combined results of the expression status of one or more biomarkers thus result in a determination that a subject is at risk of developing lung cancer or has lung cancer. Such a diagnosis can be made on the basis that a particular biomarker expression is considered to be upregulated or downregulated compared to a control or a second comparison sample. Thus, in one example, the method further comprises measuring the expression level of at least one of the identified biomarkers, wherein, when compared to a control, the expression level is not altered in the subject. In another example, the method as described herein further comprises measuring the expression level of at least one of the identified biomarkers, wherein the upregulation of the biomarker, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer. In another example, the downregulation of at least one of the identified biomarkers as listed, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
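As an illustration of how an expression status might be assigned from a differential comparison, the following is a minimal sketch in Python. It assumes that expression levels are positive normalized values and that a biomarker is called changed when the absolute log2 fold change against the control reaches a chosen cutoff; the function name and the default cutoff of 1.0 are hypothetical and are not specified in the present disclosure.

```python
import math


def expression_status(subject_level, control_level, min_abs_log2_fc=1.0):
    """Label a biomarker as upregulated, downregulated, or unchanged.

    Assumptions (illustrative only): both levels are positive normalized
    expression values, and the log2 fold-change cutoff is chosen by the user.
    """
    if subject_level <= 0 or control_level <= 0:
        raise ValueError("expression levels must be positive")
    log2_fc = math.log2(subject_level / control_level)
    if log2_fc >= min_abs_log2_fc:
        return "upregulated"
    if log2_fc <= -min_abs_log2_fc:
        return "downregulated"
    return "unchanged"


# Example: a fourfold higher level in the subject than in the control.
status = expression_status(8.0, 2.0)
```

In practice a statistical test across groups of samples, rather than a single fold-change cutoff, would typically support the call; the cutoff here only makes the three possible outcomes concrete.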
The comparison of the identified biomarker expression levels, as described in the methods disclosed in the present disclosure, includes comparison of biomarker expression levels between samples obtained from subjects at risk of developing lung cancer and a control group. The control group is defined as a group of subjects, wherein the subjects are not defined as being at risk of developing cancer, such as lung cancer, or do not have cancer. In some embodiments, the control group is nonsmokers. In some embodiments, the control group is subjects who have never smoked. In another example, the control group is a cancer-free group, and in some especially preferred embodiments, cancer-free nonsmokers. In one example, the control group is a group of subjects, wherein the subjects do not have lung cancer and are nonsmokers. In another example, the control group is a group of normal, cancer-free subjects. In another example, the control is at least one selected from the group consisting of a lung cancer free control (normal) and a lung cancer patient or subject at risk of developing lung cancer.
In other preferred embodiments, the determination of whether the subject is at risk of developing lung cancer, or is in a prediagnostic stage of lung cancer, involves determination of a score based on expression of one or more of the RNA biomarkers identified herein. As used herein, the term “score” refers to an integer or number that can be determined mathematically, for example by using computational models as known in the art, which can include, but are not limited to, SVM, as an example, and that is calculated using any one of a multitude of mathematical equations and/or algorithms known in the art for the purpose of statistical classification. Such a score is used to enumerate one outcome on a spectrum of possible outcomes. The relevance and statistical significance of such a score depend on the size and the quality of the underlying data set used to establish the results spectrum. For example, a blind sample may be input into an algorithm, which in turn calculates a score based on the information provided by the analysis of the blind sample. This results in the generation of a score for said blind sample. Based on this score, a decision can be made, for example, as to how likely the patient from whom the blind sample was obtained is to develop lung cancer (i.e., is at risk of developing lung cancer or in a prediagnostic phase of lung cancer). The ends of the spectrum may be defined logically based on the data provided, or arbitrarily according to the requirement of the experimenter. In both cases the spectrum needs to be defined before a blind sample is tested. As a result, the score generated by such a blind sample, for example the number “40”, may indicate that the corresponding patient is at risk of developing lung cancer, based on a spectrum defined as a scale from 1 to 50, with “1” being defined as not being at risk of developing lung cancer and “50” being defined as having a very high risk of developing lung cancer.
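The mapping of a classifier output onto a predefined spectrum can be sketched as follows. This is a minimal illustration in Python, assuming that the statistical classifier returns a probability between 0 and 1 and that the spectrum is the 1 to 50 scale used in the example above; the function name and the linear mapping are assumptions made for illustration and are not the scoring method of the disclosure.

```python
def risk_score(probability, low=1, high=50):
    """Map a classifier probability in [0, 1] onto an integer scale.

    Assumption: a simple linear mapping, with `low` meaning not at risk
    and `high` meaning very high risk, as in the 1-to-50 example above.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return low + round(probability * (high - low))


# Example: a blind sample with a predicted probability of 0.8.
blind_sample_score = risk_score(0.8)
```

Because the ends of the scale are fixed before any blind sample is scored, the same probability always yields the same score, which is the property the paragraph above requires of the spectrum.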
As discussed above, there are a variety of methods for the measurement of RNA biomarker expression including, but not limited to, hybridization-based methods, for example, microarray, northern blotting, bioluminescent methods, sequencing methods and real-time quantitative polymerase chain reaction (qPCR or RT-qPCR). In some preferred embodiments, the most robust technology that provides precise, reproducible and accurate quantitative results and the highest dynamic range is qPCR, which is currently considered the standard commonly used to validate the results of other technologies. A variation of such a method, for example digital polymerase chain reaction (digital PCR), may also be used. Thus, in one example, the method as disclosed herein further comprises measuring the expression level of at least one biomarker as listed herein and/or identified in the examples.
In some preferred embodiments, any sample obtained from a subject can be used according to the method of the present disclosure, so long as the sample in question contains nucleic acid sequences. More specifically, the sample is to contain RNA. In one example, the sample is obtained from a subject that may or may not have cancer or may or may not be at risk of developing lung cancer (e.g., NSCLC or SCLC). In another example, the sample is obtained from a subject who has cancer. In another example, the sample is obtained from a subject who is cancer-free. In yet another example, the sample is obtained from a subject who is lung cancer-free. In a further example, the sample is obtained from a subject who is normal and lung cancer-free. In a further example, the sample is obtained from a subject who is a smoker. In a further example, the sample is obtained from a subject who is a former smoker.
A person skilled in the art, having possession of the present disclosure, would be capable of working the present invention. An illustrative example as to the use of the present invention is provided as follows: having obtained a sample from a subject, of whom it is not known whether they suffer from lung cancer or are lung cancer-free, or of whom it is not known whether they are at risk of developing lung cancer (e.g., NSCLC or SCLC), the sample is analyzed and a differential expression of one or a set of two or more RNA biomarkers, according to the present disclosure and as listed herein, is determined. In some preferred embodiments, this differential expression data is then compared to a control value. Optionally, a further mathematical score may be determined, which would also take into consideration further statistical parameters relevant to increasing the significance and the accuracy of the provided data set. Based on this information, the person skilled in the art would then be able to determine if the subject in question is cancer-free or has cancer, or, in especially preferred embodiments, if the subject is at risk of developing lung cancer (e.g., NSCLC or SCLC) or is in a prediagnostic stage of lung cancer (e.g., NSCLC or SCLC).
The result of the differential comparison for any one of the biomarkers as described in the present disclosure can result in the expression status of the biomarker being termed upregulated, downregulated, or unchanged. The combined results of the expression status of one or more biomarkers thus result in a determination that a subject is at risk of developing lung cancer (e.g., NSCLC or SCLC) or has lung cancer. Such a diagnosis or prediction can be made on the basis that a particular biomarker expression is considered to be upregulated or downregulated compared to a control or a second comparison sample. Thus, in one example, the method further comprises measuring the expression level of at least one of the identified biomarkers, wherein, when compared to a control, the expression level is not altered in the subject. In another example, the method as described herein further comprises measuring the expression level of at least one of the identified biomarkers, wherein the upregulation of the biomarker, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer. In another example, the downregulation of at least one of the identified biomarkers as listed, as compared to the control, indicates the subject is at risk of developing lung cancer or has lung cancer.
The comparison of the identified biomarker expression levels, as described in the methods disclosed in the present disclosure, includes comparison of biomarker expression levels between samples obtained from subjects with cancer and a control group. The control group is defined as a group of subjects, wherein the subjects do not have cancer. In another example, the control group is a cancer-free group. In one example, the control group is a group of subjects, wherein the subjects do not have lung cancer. In another example, the control group is a group of normal, cancer-free subjects. In another example, the control is at least one selected from the group consisting of a lung cancer free control (normal) and a lung cancer patient or subject at risk of developing lung cancer. In another example, the control group is subjects who have never smoked. In another example, the control group is subjects who are former smokers. In another example, the control group is smokers.
In other preferred embodiments, the determination of whether the subject is at risk of developing lung cancer (e.g., NSCLC or SCLC), or is in a prediagnostic stage of lung cancer, involves determination of a score based on expression of one or more of the RNA biomarkers identified herein. As used herein, the term “score” refers to an integer or number that can be determined mathematically, for example by using computational models as known in the art, which can include, but are not limited to, SVM, as an example, and that is calculated using any one of a multitude of mathematical equations and/or algorithms known in the art for the purpose of statistical classification. Such a score is used to enumerate one outcome on a spectrum of possible outcomes. The relevance and statistical significance of such a score depend on the size and the quality of the underlying data set used to establish the results spectrum. For example, a blind sample may be input into an algorithm, which in turn calculates a score based on the information provided by the analysis of the blind sample. This results in the generation of a score for said blind sample. Based on this score, a decision can be made, for example, as to how likely the patient from whom the blind sample was obtained is to develop lung cancer (i.e., is at risk of developing lung cancer or in a prediagnostic phase of lung cancer). The ends of the spectrum may be defined logically based on the data provided, or arbitrarily according to the requirement of the experimenter. In both cases the spectrum needs to be defined before a blind sample is tested. As a result, the score generated by such a blind sample, for example the number “40”, may indicate that the corresponding patient is at risk of developing lung cancer, based on a spectrum defined as a scale from 1 to 50, with “1” being defined as not being at risk of developing lung cancer and “50” being defined as having a very high risk of developing lung cancer.
As discussed above, there are a variety of methods for the measurement of RNA biomarker expression including, but not limited to, hybridization-based methods, for example, microarray, northern blotting, bioluminescent methods, sequencing methods and real-time quantitative polymerase chain reaction (qPCR or RT-qPCR). In some preferred embodiments, the most robust technology that provides precise, reproducible and accurate quantitative results and the highest dynamic range is qPCR, which is currently considered the standard commonly used to validate the results of other technologies. A variation of such a method, for example digital polymerase chain reaction (digital PCR), may also be used. Thus, in one example, the method as disclosed herein further comprises measuring the expression level of at least one biomarker as listed herein and/or identified in the examples.
Any sample obtained from a subject can be used according to the method of the present disclosure, so long as the sample in question contains nucleic acid sequences. More specifically, the sample is to contain RNA. In one example, the sample is obtained from a subject that may or may not have cancer or may or may not be at risk of developing lung cancer. In another example, the sample is obtained from a subject who has cancer. In another example, the sample is obtained from a subject who is cancer-free. In yet another example, the sample is obtained from a subject who is lung cancer-free. In a further example, the sample is obtained from a subject who is normal and lung cancer-free.
A person skilled in the art, having possession of the present disclosure, would be capable of working the present invention. An illustrative example as to the use of the present invention is provided as follows: having obtained a sample from a subject, of whom it is not known whether they suffer from lung cancer or are lung cancer-free, or of whom it is not known whether they are at risk of developing lung cancer, the sample is analyzed and a differential expression of one or a set of two or more RNA biomarkers, according to the present disclosure and as listed herein, is determined. This differential expression data is then compared to the differential expression levels provided herein, which a person skilled in the art would understand. Optionally, a further mathematical score may be determined, which would also take into consideration further statistical parameters relevant to increasing the significance and the accuracy of the provided data set. Based on this information, the person skilled in the art would then be able to determine if the subject in question is cancer-free or has cancer, or, in especially preferred embodiments, if the subject is at risk of developing lung cancer or is in a prediagnostic stage of lung cancer.
The present disclosure further contemplates any method capable of receiving, processing, and transmitting the information to and from laboratories conducting the assays, information providers, medical personnel, and subjects. For example, in some embodiments of the present disclosure, a sample (e.g., a blood or tissue sample) is obtained from a subject and submitted to a profiling service (e.g., clinical lab at a medical facility, genomic profiling business, etc.), located in any part of the world (e.g., in a country different than the country where the subject resides or where the information is ultimately used) to generate raw data. Where the sample comprises a tissue or other biological sample, the subject may visit a medical center to have the sample obtained and sent to the profiling center, or subjects may collect the sample themselves and directly send it to a profiling center. Where the sample comprises previously determined biological information, the information may be directly sent to the profiling service by the subject (e.g., an information card containing the information may be scanned by a computer and the data transmitted to a computer of the profiling center using an electronic communication system). Once received by the profiling service, the sample is processed and a profile is produced (i.e., marker data), specific for the diagnostic or prognostic information desired for the subject.
The profile data is then prepared in a format suitable for interpretation by a treating clinician. For example, rather than providing raw expression data, the prepared format may represent a diagnosis or risk assessment for the subject, along with recommendations for particular treatment options. The data may be displayed to the clinician by any suitable method. For example, in some embodiments, the profiling service generates a report that can be printed for the clinician (e.g., at the point of care) or displayed to the clinician on a computer monitor.
In some embodiments, the information is first analyzed at the point of care or at a regional facility. The raw data is then sent to a central processing facility for further analysis and/or to convert the raw data to information useful for a clinician or patient. The central processing facility provides the advantage of privacy (all data is stored in a central facility with uniform security protocols), speed, and uniformity of data analysis. The central processing facility can then control the fate of the data following treatment of the subject. For example, using an electronic communication system, the central facility can provide data to the clinician, the subject, or researchers.
In some embodiments, the subject is able to directly access the data using the electronic communication system. The subject may choose further intervention or counseling based on the results. In some embodiments, the data is used for research use. For example, the data may be used to further optimize the inclusion or elimination of markers as useful indicators of a particular condition or stage of disease or as a companion diagnostic to determine a treatment course of action. In some embodiments, the results are used to select candidate therapies for drug screening or clinical trials.
Compositions for use in the methods described herein include, but are not limited to, kits comprising one or more reagents for determining the presence and/or level of markers described herein in a sample. In some embodiments, the reagents are, for example, one or more nucleic acid primers for the amplification, extension, or sequencing of the genes.
In preferred embodiments, the kits contain all of the components necessary to perform a detection assay, including all controls, directions for performing assays, and any necessary software for analysis and presentation of results.
EXPERIMENTAL
The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.
Example 1
Methods
Study population and data sources. We used the population-based Janus Serum Bank (JSB) cohort containing prediagnostic serum samples (29). The study participants were identified by linking the JSB to the Cancer Registry of Norway (CRN) (30). We restricted our analyses to patients later diagnosed with LC up to 10 years after blood donation and to control samples from individuals who were cancer-free (except for non-melanoma skin cancer) for at least 10 years after sample collection. We frequency matched cases and controls on confounders, such as sex, age at donation and sample pre-processing (28), as described in our previous study (25). Smoking status, collected from health survey data, was classified as current, former, or never smoker (30). We only included smokers (current and former) in the study.
Tumor staging. Detailed cancer information was selected from the CRN that has systematically collected mandatory notification on cancer occurrence for the Norwegian population since 1952 (30). The cases were classified into histological subtypes: NSCLC, SCLC and others, the latter referring to other less defined or multiple histologies. Stage at diagnosis was encoded with the TNM system: early (localized - stage I), locally advanced (regional - stage II and III), advanced or metastatic (distant - stage IV) and unknown (32).
Laboratory processing and bioinformatic analyses. We extracted RNA from 400 µL serum using phenol-chloroform and miRNeasy Serum/Plasma kit (Qiagen, Valencia, CA, USA). Libraries were prepared with the NEBNext Small RNA kit (NEB, Ipswich, MA, USA) and sequenced on a HiSeq 2500 platform to on average 18 million sequences per sample (Illumina, San Diego, CA, USA). We used a large annotation dataset that contains 10 different RNA classes available in serum (17): miRNAs, miRNA hairpin, isomiRs, piRNAs, tRNAs, tRNA derived fragments (tRFs), snoRNAs, miscRNAs, lncRNAs and mRNAs. Our bioinformatics workflow includes quality control, adapter trimming, read mapping, read counting and creation of count tables. Algorithms, their versions and annotation databases are described in (17, 25). RNAs with fewer than 5 reads in less than 80% of the samples were filtered out to ensure a stable signal. We used DESeq2’s (33) variance stabilizing normalization function to normalize identified RNA counts. The optmatch (v0.9-11) R package (github.com/markmfredrickson/optmatch) selected appropriately matched controls while building models.
Machine learning classification algorithms and training/testing workflow. High dimensionality (fewer samples than variables) is often a problem in modelling RNA-seq data. Our preliminary analysis showed that ML algorithms with regularization produced successful models. Non-linear associations were also expected. Therefore, we selected five ML algorithms to create our initial models: lasso, elastic-net, sparse group lasso (SGL), random forest (RF) and extreme gradient boosting (XGBoost) algorithms. We used 5-fold cross validation (if available) to tune hyperparameters for model training, which also prevents overfitting. RNA transcripts can be grouped on type and other biological characteristics. Therefore, for the SGL models, we used RNA types as grouping variables.
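The 5-fold cross-validation used for hyperparameter tuning can be sketched as a partition of the training samples into five folds, each of which serves once as the validation set. The following minimal Python sketch is illustrative only: the study used R implementations, and the function name, the shuffling seed, and the round-robin fold assignment are assumptions rather than details from the described workflow.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold CV.

    Assumption: samples are shuffled once and dealt into k folds in
    round-robin order, so fold sizes differ by at most one sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(validation)


# Example: five train/validation partitions of 23 hypothetical samples.
splits = list(k_fold_indices(23, k=5))
```

A candidate hyperparameter value would then be scored on each validation fold after fitting on the corresponding training indices, and the value with the best average validation performance would be kept.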
R implementations of these algorithms were used: caret (v6.0-84) and glmnet (yl.Q- 18) packages for elastic-net and the lasso, sglfast (v0.10) and msgl (v2.3.9) for the SGL models and xgboost (v1.0.0.2) for XGBoost. The datasets were split into training (70%) and test (30%) sets (Fig. 1B) for each group (i.e., histologies or prediagnostic time window). We repeated this step 5 times to select 5 different training and test datasets which were balanced for case/control numbers and also matched for confounders (e.g., sex and age). The test dataset of each repeat was only used for testing to overcome overfitting and assess true performance. The performance of the classifiers was mainly evaluated by area under the ROC curves (AUC). We also calculated accuracy, sensitivity and specificity.
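The evaluation metrics named above can be computed directly from test-set labels and classifier scores. The following is a minimal Python sketch, assuming labels of 1 for cases and 0 for controls and a hypothetical decision threshold of 0.5 for the accuracy, sensitivity and specificity; it is not the code used in the study, which relied on R packages.

```python
def auc(labels, scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Equivalent to the area under the ROC curve; ties count as half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def confusion_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at an assumed threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Example with four hypothetical test samples (two cases, two controls).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
example_auc = auc(example_labels, example_scores)
example_metrics = confusion_metrics(example_labels, example_scores)
```

AUC is independent of any threshold, which is one reason it is convenient as the main criterion when models are compared across repeated train/test splits.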
Standard full-time and prediagnostic models. We refer to models that do not take prediagnostic time into account as standard full-time models (Fig. 1B). We trained these models both for all histologies combined and for specific histologies. Prediagnostic models were created using a sliding-window approach to find optimal time intervals. We first selected 3 different window sizes (2, 3 and 4 years), which were moved over the 10 years prior to diagnosis. We then built models based on the samples captured by these sliding windows. We used the workflow described above to train both standard and prediagnostic models.
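The sliding-window selection can be sketched as follows. This Python illustration assumes a step of one year and closed window boundaries, neither of which is stated in the text. The function names and the example sampling times are hypothetical.

```python
def sliding_windows(sizes=(2, 3, 4), horizon=10, step=1):
    """Enumerate (start, end) windows, in years before diagnosis.

    The one-year step is an assumption made for illustration.
    """
    windows = []
    for size in sizes:
        start = 0
        while start + size <= horizon:
            windows.append((start, start + size))
            start += step
    return windows


def samples_in_window(times, window):
    """Indices of samples whose prediagnostic time falls inside the window."""
    lo, hi = window
    return [i for i, t in enumerate(times) if lo <= t <= hi]


windows = sliding_windows()
print(len(windows))

# Made-up prediagnostic sampling times (years before diagnosis).
times = [1.0, 2.5, 4.9, 7.2]
print(samples_in_window(times, (2, 5)))
```

Under these assumptions the enumeration yields 24 candidate windows, including intervals such as 2 to 5, 3 to 5 and 6 to 8 years prior to diagnosis.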
Feature reduction/selection methods. We implemented three feature selection methods to improve model performance: single-RNA class, lasso-selection and significance-selection. In the single-RNA class method, we dropped all RNA types except one. In lasso-selection, all non-zero features selected by the lasso classification models were pooled, and new classification models were then retrained using only these features. In significance-selection, a univariate regression analysis was done per feature, and the significant features (after multiple-testing adjustment) were used to train classification models.
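Two of these steps can be sketched in Python. The first function pools the features with non-zero coefficients across several fitted lasso models. The second adjusts univariate p-values for multiple testing with the Benjamini-Hochberg procedure, which is an assumption here because the text does not name the adjustment method. The feature names, coefficients and p-values are made up.

```python
def pool_nonzero_features(coefficient_sets):
    """Union of features with a non-zero coefficient in any fitted model."""
    pooled = set()
    for coefs in coefficient_sets:
        pooled.update(name for name, w in coefs.items() if w != 0.0)
    return sorted(pooled)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical lasso coefficients from three training repeats.
repeats = [
    {"rna_1": 0.8, "rna_2": 0.0, "rna_3": -0.2},
    {"rna_1": 0.5, "rna_2": 0.0, "rna_3": 0.0},
    {"rna_1": 0.0, "rna_2": 0.0, "rna_4": 0.1},
]
print(pool_nonzero_features(repeats))

# Hypothetical univariate p-values for four features.
pvals = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(pvals))
```

A retrained model would then be restricted to the pooled names, or to the features whose adjusted p-value falls below a chosen threshold.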
Results
Patient characteristics and RNA-seq profiles. In this study, we selected 400 patients with prediagnostic serum samples, including multiple samples from the same patients. We also included 525 individuals as controls. After excluding failed or low-input samples, we obtained RNA-seq data from 1061 serum samples. However, samples from individuals without any smoking history (i.e., never smokers) or with missing information were excluded from further analyses. This resulted in 535 case and 263 control samples from 645 current or former smokers for modelling and testing (Table 1 and Fig. 1A).
After filtering out low-count transcripts, 3306 RNAs were selected as candidate features and used in the models: 202 miRNAs, 1137 isomiRs, 89 miscRNAs, 380 piRNAs, 119 snoRNAs, 530 tRFs, 790 mRNAs and 59 lncRNAs.
ML algorithms can differentiate between prediagnostic cases and controls regardless of prediagnostic time. We first evaluated the classification performance of the ML algorithms in terms of average AUCs on test datasets, created by five random repeats as explained in materials and methods.
All samples were included in algorithm evaluation regardless of their stage at diagnosis and prediagnostic time; the resulting models are referred to as full-time standard models (Fig. 1B). The average AUC of all algorithms was 0.67 (95% CI, 0.65-0.68) for all histologies, 0.67 (95% CI, 0.65-0.69) for NSCLC and 0.64 (95% CI, 0.62-0.66) for SCLC (Fig. 2A, dashed lines) on the test datasets. The XGBoost algorithm produced a higher AUC than the average, 0.71 (95% CI, 0.68-0.73). The XGBoost models also performed better when the samples were stratified by histology: NSCLC, 0.70 (95% CI, 0.65-0.75), and SCLC, 0.71 (95% CI, 0.68-0.74).
Although the models of all algorithms had comparable performance in terms of average AUC, they differed in the total number of non-zero features (i.e., in model complexity). For example, RF selected more than 3000 non-zero features while the lasso model selected fewer than 25 features. However, the profiles of the top features, ranked by feature importance, usually consisted of similar RNAs (e.g., miRNAs or tRFs).
Since XGBoost produced the most predictive full-time models, we analyzed the best predictors of these models and ranked them by importance. The top 3 features were an isomiR of hsa-miR-486-5p, piR-hsa-28723 and INTS10 for all histologies; Y-RNA, piR-hsa-28723 and GBP3 for NSCLC; and tRF-BS68BFD2, RN7SL724P and tRF-947673FE5 for SCLC. An in-depth investigation of the features selected by the other algorithms also showed common RNAs. For example, Y-RNA and the hsa-miR-486-5p isomiR were among the top predictors of the RF, elastic-net, SGL and lasso models for NSCLC, as was tRF-BS68BFD2 for SCLC. We also performed KEGG pathway enrichment analysis based on the common miRNA, mRNA and isomiR features. The results showed that many cancer-related pathways were significantly (p < 0.01) enriched, such as MAPK signaling, mTOR signaling and AMPK signaling.
Feature selection improves model performance and reduces model complexity. We implemented regularization, and the model parameters were selected by cross-validation during training. However, the total number of selected features was higher than expected. We therefore produced XGBoost (the best algorithm) models that included only a single RNA class (e.g., miRNA or isomiR). This method showed that miscRNA-only and miRNA-only models achieved better classification performance than the other RNA classes regardless of histology and stage at diagnosis (Table 2). The best separators of these models included hsa-miR-99a-5p, hsa-miR-1908-5p, hsa-miR-3925-5p and the Y-RNA related transcripts RNY1P5 and RNY4P30. When we took histology into account, miRNAs and isomiRs for NSCLC and miscRNAs for SCLC produced better models (Table 2). The most important features of the histology-dependent models included hsa-miR-629-5p, hsa-miR-99a-5p, an hsa-miR-486-5p isomiR and an hsa-miR-151a-3p isomiR for NSCLC, and 7SL RNA related transcripts and Vault-RNA for SCLC.
The single-RNA class models also implied that feature selection can further improve model performance; we therefore tested other feature selection methods. The results showed that lasso feature selection improved AUC values and reduced complexity (Table 2). The most important features of the lasso-selected models included an hsa-miR-423-5p isomiR, GBP3 and piR-hsa-28723 for all histologies; Y-RNA, an hsa-miR-423-5p isomiR and LINC01362 for NSCLC; and HIST1H4E, PTCH2 and tRF-R29P4P9L5HJVE for SCLC. Moreover, univariate significance-based feature selection greatly reduced model complexity while retaining acceptable performance (Table 2). For example, the SCLC models included only 11 RNAs. The most important features were GBP3, LINC01362 and hsa-miR-30a-5p for all histologies; LINC01362, GBP3 and tRF-9MV47P596V for NSCLC; and piR-hsa-7001 and tRF-7343RX6NMH3 for SCLC.
Histology-specific prediagnostic models can improve prediction performance. We previously demonstrated that RNA levels are dynamic and histology-specific in prediagnostic samples (25). We therefore trained models stratified by prediagnostic time, with time intervals selected by the sliding-window approach explained in materials and methods.
The results showed that including prediagnostic time and histological subtype together creates better models for specific time intervals. For example, SCLC models restricted to samples from 2 to 5 years prior to diagnosis had an average AUC of 0.84 (95% CI, 0.77-0.9). Another SCLC model, which used only miRNAs and was restricted to 3 to 5 years prior to diagnosis, had an average AUC of 0.85 (95% CI, 0.76-0.93) on the test datasets. Both SCLC models selected the same miRNAs among their most important features, such as hsa-miR-30a-5p, hsa-miR-339-3p and hsa-miR-215-5p. Besides miRNAs, an isomiR of hsa-miR-451a and RN7SL181P were the most important features of the prediagnostic SCLC models. Enrichment analysis of the most important features identified signaling pathways such as MAPK, PI3K-Akt and RAS, and other pathways such as choline metabolism, cellular senescence, and PD-L1 expression and PD-1 checkpoint. Similarly, NSCLC models restricted to 6 to 8 years prior to diagnosis had an average AUC of 0.81 (95% CI, 0.75-0.86). The most important RNAs of this period were tRF-YP9L0N4V3, an isomiR of hsa-miR-484 and tRF-9MV47P596V. More than 70 pathways were enriched, such as endocytosis, MAPK, RAS, choline metabolism and the neurotrophin signaling pathway.
Frequent features can create simple and accurate models. We created simple models by compiling the top-performing features from the previous models. This produced less complex full-time and prediagnostic models (Fig. 4) with slightly improved prediction performance. The average AUC was 0.8 (95% CI, 0.75-0.85) for the all-histologies model, 0.79 (95% CI, 0.76-0.82) for the NSCLC model and 0.83 (95% CI, 0.78-0.88) for the SCLC model (Fig. 4 and Table 3). Graphic representations of full-time model feature importance and prediagnostic feature importance are provided in Figs. 6 and 7.
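One way to compile frequently selected features is sketched below in Python. The selection rule is an assumption made for illustration (a feature is kept when it appears among the top k features of at least a minimum number of models); the text does not specify the exact rule. The function name, feature names and rankings are hypothetical.

```python
from collections import Counter


def frequent_features(rankings, top_k=3, min_models=2):
    """Features found in the top_k of at least min_models ranked lists.

    rankings: one list per model, ordered from most to least important.
    """
    counts = Counter()
    for ranked in rankings:
        # A set is used so that a feature counts at most once per model.
        counts.update(set(ranked[:top_k]))
    return sorted(f for f, n in counts.items() if n >= min_models)


# Hypothetical importance rankings from three previously trained models.
rankings = [
    ["rna_1", "rna_2", "rna_3", "rna_9"],
    ["rna_2", "rna_4", "rna_1", "rna_3"],
    ["rna_5", "rna_2", "rna_6", "rna_1"],
]
print(frequent_features(rankings))
```

A simple model would then be trained on the returned subset only, which keeps the number of features small.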
Table 1. Clinical and histological characteristics of samples used in modelling.
Stage | Early (Localized) | Locally Advanced (Regional) | Advanced (Distant) | Unknown | Controls
Histology
NSCLC | 84 | 99 | 167 | 11 | -
SCLC | 9 | 35 | 76 | 4 | -
Others | 10 | 5 | 31 | 4 | -
Sex
Male | 78 | 104 | 178 | 12 | 185
Female | 25 | 35 | 96 | 7 | 78
Age at donation, yrs
Mean (SD) | 54.3 (7.33) | 54.9 (9.08) | 53.5 (8.25) | 51.8 (6.53) | 49.9 (10.9)
Age at diagnosis, yrs
Mean (SD) | 59.8 (7.67) | 60.6 (8.89) | 59.4 (8.31) | 58.6 (6.05) | -
Prediagnostic sampling time, yrs
Mean (SD) | 5.52 (2.81) | 5.63 (2.79) | 5.91 (2.66) | 6.75 (2.18) | -
Total Samples | 103 | 139 | 274 | 19 | 263
Total Individuals | 645 (smokers)
Table 2. Averages of AUCs, accuracies (acc), sensitivities (sn) and specificities (sp) of the XGBoost algorithm models on test datasets when prediagnostic time was not included.
[Table 2 is reproduced as an image in the original publication (imgf000032_0001, imgf000033_0001).]
* Average number of non-zero features selected by the models.
Table 3. All selected features and performance of simple XGBoost models.
[Table 3 is reproduced as an image in the original publication (imgf000033_0002, imgf000034_0001).]
* Including other histologies, ** Prediagnostic models.
References
1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394-424.
2. Wild CP, Weiderpass W, Stewart BW. World Cancer Report: Cancer Research for Cancer Prevention. Lyon, France: International Agency for Research on Cancer. Published online 2020. http://publications.iarc.fr/586
3. Aberle DR, Berg CD, Black WC, et al. The National Lung Screening Trial: overview and study design. Radiology. 2011;258(1):243-253.
4. Brustugun OT, Gronberg BH, Fjellbirkeland L, et al. Substantial nation-wide improvement in lung cancer relative survival in Norway from 2000 to 2016. Lung Cancer. 2018;122:138-145.
5. Bach PB, Mirkin JN, Oliver TK, et al. Benefits and harms of CT screening for lung cancer: a systematic review. JAMA. 2012;307(22):2418-2429.
6. Peled N, Ilouze M. Screening for Lung Cancer: What Comes Next? J Clin Oncol. 2015;33(33):3847-3848.
7. Seijo LM, Peled N, Ajona D, et al. Biomarkers in Lung Cancer Screening: Achievements, Promises, and Challenges. J Thorac Oncol. 2019;14(3):343-357.
8. Hanash SM, Ostrin EJ, Fahrmann JF. Blood based biomarkers beyond genomics for lung cancer screening. Transl Lung Cancer Res. 2018;7(3):327-335.
9. de Koning HJ, van der Aalst CM, de Jong PA, et al. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial. N Engl J Med. 2020;382(6):503-513.
10. Gopal M, Abdullah SE, Grady JJ, Goodwin JS. Screening for lung cancer with low-dose computed tomography: a systematic review and meta-analysis of the baseline findings of randomized controlled trials. J Thorac Oncol. 2010;5(8):1233-1239.
11. Ko J, Baldassano SN, Loh P-L, Kording K, Litt B, Issadore D. Machine learning to detect signatures of disease in liquid biopsies - a user’s guide. Lab Chip. 2018;18(3):395-405.
12. Sandfeld-Paulsen B, Jakobsen KR, Bæk R, et al. Exosomal Proteins as Diagnostic Biomarkers in Lung Cancer. J Thorac Oncol. 2016;11(10):1701-1710.
13. Keller A, Meese E. Can circulating miRNAs live up to the promise of being minimal invasive biomarkers in clinical settings? Wiley Interdiscip Rev RNA. 2016;7(2):148-156.
14. Pichler M, Calin GA. MicroRNAs in cancer: from developmental genes in worms to their clinical application in patients. Br J Cancer. 2015;113(4):569-573.
15. Tian F, Wang J, Ouyang T, et al. MiR-486-5p Serves as a Good Biomarker in Nonsmall Cell Lung Cancer and Suppresses Cell Growth With the Involvement of a Target PIK3R1. Front Genet. 2019;10:688.
16. Fehlmann T, Kahraman M, Ludwig N, et al. Evaluating the Use of Circulating MicroRNA Profiles for Lung Cancer Detection in Symptomatic Patients. JAMA Oncol. 2020;6(5):714-723.
17. Umu SU, Langseth H, Bucher-Johannessen C, et al. A comprehensive profile of circulating RNAs in human serum. RNA Biol. 2018;15(2):242-250.
18. Murillo OD, Thistlethwaite W, Rozowsky J, et al. exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids. Cell. 2019;177(2):463-477.e15.
19. Freedman JE, Gerstein M, Mick E, et al. Diverse human extracellular RNAs are widely detected in human plasma. Nat Commun. 2016;7:11106.
20. Svoronos AA, Engelman DM, Slack FJ. OncomiR or Tumor Suppressor? The Duplicity of MicroRNAs in Cancer. Cancer Res. 2016;76(13):3666-3670.
21. Slack FJ, Chinnaiyan AM. The Role of Non-coding RNAs in Oncology. Cell. 2019;179(5):1033-1055.
22. Kim KM, Abdelmohsen K, Mustapic M, Kapogiannis D, Gorospe M. RNA in extracellular vesicles. Wiley Interdiscip Rev RNA. 2017;8(4). doi:10.1002/wrna.1413
23. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57-70.
24. Gutschner T, Diederichs S. The hallmarks of cancer: a long non-coding RNA point of view. RNA Biol. 2012;9(6):703-719.
25. Umu SU, Langseth H, Keller A, et al. A 10 year prediagnostic followup study shows that serum RNA signals are highly dynamic in lung carcinogenesis. Mol Oncol. Published online 2019. https://febs.onlinelibrary.wiley.com/doi/abs/10.1002/1878-0261.12620
26. Lund E, Holden L, Bovelstad H, et al. A new statistical method for curve group analysis of longitudinal gene expression data illustrated for breast cancer in the NOWAC postgenome cohort as a proof of principle. BMC Med Res Methodol. 2016;16:28.
27. Burton J, Umu SU, Langseth H, et al. Serum RNA Profiling in the 10-Years Period Prior to Diagnosis of Testicular Germ Cell Tumor. Published online October 28, 2020. doi:10.3389/fonc.2020.574977
28. Rounge TB, Umu SU, Keller A, et al. Circulating small non-coding RNAs associated with age, sex, smoking, body mass and physical activity. Sci Rep. 2018;8(1):1760.
29. Langseth H, Gislefoss RE, Martinsen JI, Dillner J, Ursin G. Cohort Profile: The Janus Serum Bank Cohort in Norway. Int J Epidemiol. 2017;46(2):403-404g.
30. Larsen IK, Smastuen M, Johannesen TB, et al. Data quality at the Cancer Registry of Norway: an overview of comparability, completeness, validity and timeliness. Eur J Cancer. 2009;45(7):1218-1231.
31. Hjerkind KV, Gislefoss RE, Tretli S, et al. Cohort Profile Update: The Janus Serum Bank Cohort in Norway. Int J Epidemiol. 2017;46(4):1101-1102f.
32. Cancer Registry of Norway. Cancer in Norway 2017 - Cancer Incidence, Mortality, Survival and Prevalence in Norway. (TE Robsahm TK Grimsrud S Laronningen E Jakobsen G Ursin IKLBMTBJ, ed.).; 2018.
33. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
34. Yanaihara N, Caplen N, Bowman E, et al. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell. 2006;9(3):189-198.
35. The RNAcentral Consortium. RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res. 2019;47(D1):D1250-D1251.
36. Zhou R, Zhou X, Yin Z, et al. MicroRNA-574-5p promotes metastasis of non-small cell lung cancer by targeting PTPRU. Sci Rep. 2016;6:35714.
37. Foss KM, Sima C, Ugolini D, Neri M, Allen KE, Weiss GJ. miR-1254 and miR-574-5p: serum-based microRNA biomarkers for early-stage non-small cell lung cancer. J Thorac Oncol. 2011;6(3):482-488.
38. ElKhouly AM, Youness RA, Gad MZ. MicroRNA-486-5p and microRNA-486-3p: Multifaceted pleiotropic mediators in oncological and non-oncological conditions. Noncoding RNA Res. 2020;5(1):11-21.
39. Abdelmohsen K, Panda AC, Kang M-J, et al. 7SL RNA represses p53 translation by competing with HuR. Nucleic Acids Res. 2014;42(15):10099-10111.
40. Li C, Qin F, Hu F, et al. Characterization and selective incorporation of small noncoding RNAs in non-small cell lung cancer extracellular vesicles. Cell Biosci. 2018;8:2.
41. Yu H, Guan Z, Cuk K, Zhang Y, Brenner H. Circulating MicroRNA Biomarkers for Lung Cancer Detection in East Asian Populations. Cancers. 2019;11(3). doi:10.3390/cancers11030415
42. Babapoor S, Fleming E, Wu R, Dadras SS. A novel miR-451a isomiR, associated with amelanotypic phenotype, acts as a tumor suppressor in melanoma by retarding cell migration and invasion. PLoS One. 2014;9(9):e107502.
43. Glunde K, Jacobs MA, Bhujwalla ZM. Choline metabolism in cancer: implications for diagnosis and therapy. Expert Rev Mol Diagn. 2006;6(6):821-829.
44. Klupczynska A, Plewa S, Kasprzyk M, Dyszkiewicz W, Kokot ZJ, Matysiak J. Serum lipidome screening in patients with stage I non-small cell lung cancer. Clin Exp Med. 2019;19(4):505-513.

Claims

We claim:
1. A method of assaying a sample from a subject in need thereof, comprising: assaying the sample from the subject to determine the level of one or more markers selected from the group consisting of: i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO:3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO:4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO:5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO:6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO:7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO:8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and tRF-Q1Q89P9L8422E; iii) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and tRF-PSQP4PW3FJI0V; iv) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO:10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; and v) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and TNFRSF13C.
2. A method of determining an increased risk of lung cancer in a subject in need thereof, comprising: a) assaying a sample from the subject to determine the level of one or more markers selected from the group consisting of: i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO:3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO:4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO:5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO:6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO:7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO:8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and tRF-Q1Q89P9L8422E; iii) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and tRF-PSQP4PW3FJI0V; iv) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO:10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; and v) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and TNFRSF13C; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample.
3. A method of determining an increased risk of lung cancer in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO:3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and tRF-QKF1R3WE8RO8IS; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample.
4. A method of determining an increased risk of non-small cell lung cancer (NSCLC) in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO:4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO:5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO:6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO:7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO:8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and tRF-Q1Q89P9L8422E; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample.
5. A method of determining an increased risk of NSCLC in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample as compared to a control.
6. A method of determining an increased risk of small cell lung cancer (SCLC) in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO:10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample.
7. A method of determining an increased risk of SCLC in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, or TNFRSF13C in a sample from a subject; and b) determining an increased risk of the subject for developing lung cancer when the one or more markers are present in the sample or increased in the sample.
8. The method of any one of claims 2, 3, 4, and 6, wherein the increased risk of developing lung cancer is an increased risk of developing lung cancer within 2-8 years.
9. The method of claim 5, wherein the increased risk of developing lung cancer is an increased risk of developing lung cancer within 6-8 years.
10. The method of claim 7, wherein the increased risk of developing lung cancer is an increased risk of developing lung cancer within 2-5 years.
11. A method of determining an increased risk of NSCLC in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and tRF-PSQP4PW3FJI0V; and b) determining an increased risk of the subject developing lung cancer within 6-8 years when the one or more markers are present in the sample or increased in the sample.
12. A method of determining an increased risk of NSCLC in a subject in need thereof, comprising: a) assaying a sample from the subject for the level of one or more markers selected from the group consisting of iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and TNFRSF13C in a sample from a subject; and b) determining an increased risk of the subject developing lung cancer within 3-5 years when the one or more markers are present in the sample or increased in the sample.
13. The method of any of claims 1 to 12, wherein the assaying comprises the use of one or more reagents selected from the group consisting of one or more sequencing primers, one or more amplification primers and one or more nucleic acid probes.
14. The method of any of claims 1 to 13, wherein the sample is selected from the group consisting of whole blood, a blood product, a cell sample, a tissue sample, and a bodily fluid sample.
15. The method of any of claims 1 to 14, wherein the one or more markers are 5 or more.
16. The method of any of claims 1 to 14, wherein the one or more markers are 10 or more.
17. The method of any of claims 1 to 14, wherein the one or more markers are 15 or more.
18. The method of any of claims 1 to 14, wherein the one or more markers are 20 or more.
19. The method of any of claims 1 to 14, wherein the one or more markers are 25 or more.
20. The method of any of claims 1 to 19, wherein the subject is a smoker or former smoker.
21. The method of any of claims 1 to 20, further comprising repeating the assaying at regular or irregular time intervals.
22. The method of claim 21, wherein the time intervals are selected from the group consisting of monthly, yearly, and biannually.
23. The method of any of claims 1 to 22, wherein subjects identified as having an increased risk of lung cancer are offered further cancer screening.
24. The method of claim 23, wherein the further cancer screening is selected from the group consisting of molecular diagnostics and computer tomography (CT) scans.
25. The method of any one of claims 1 to 24, wherein the level of the one or more markers in the sample is compared to a control.
26. The method of claim 25, wherein the control is selected from the group consisting of at least one of a nonsmoker control, a former smoker control, and a cancer-free control.
27. The method of any one of claims 1 to 24, wherein the level of the one or more markers in the sample is expressed as a score.
28. The method of any one of claims 1 to 27, wherein the sample comprises nucleic acids.
29. The method of claim 28, wherein the sample comprises an RNA species.
30. The method of claim 29, wherein the sample comprises one or more RNA species selected from the group consisting of miRNA, isomiR, miscRNA, piRNAs, snoRNA, tRFs, mRNA, and lncRNA.
31. A kit, comprising: reagents for determining the level of one or more markers selected from the group consisting of: i) iso-20-5KP25HFF (GAGGGGCAGAGAGCGAGACA (SEQ ID NO:1)), GBP3, hsa-miR-30a-5p, INTS10, LINC01362, piR-hsa-28723, RNU1-8P, iso-23-BQ8DQWM4Z (AACATTCAACGCTGTCGGTGAGT (SEQ ID NO:2)), CTD-3252C9.4, DST, HBA2, HIST2H2AC, hsa-miR-99b-3p, LATS1, piR-hsa-28391, piR-hsa-28394, RN7SL181P, RN7SL8P, RNU2-27P, iso-23-8YUYFYKSY (TCCTGTACTGAGCTGCCCCGAGG (SEQ ID NO:3)), TLN1, tRF-V47P59D9, tRF-86V8WPMN1EJ3, tRF-6SXMSL73VL4Y, and/or tRF-QKF1R3WE8RO8IS; ii) LINC01362, Y-RNA, iso-23-B0NKZ01J0D (AAAAGCTGGGTTGAGAGGGCGTC (SEQ ID NO:4)), iso-22-MKJIJLJ2Q (CGGGGCAGCTCAGTACAGGATT (SEQ ID NO:5)), iso-21-N2NBQRZ00 (CTGGACTGAAGCTCCTTGAGG (SEQ ID NO:6)), GBP3, iso-20-RNUW92OI (GGGTTTACGTTGGGAGAACT (SEQ ID NO:7)), GNAS, hsa-miR-30a-3p, NHSL2, piR-hsa-28488, RC3H2, RN7SL181P, RNU2-19P, RNY4P27, iso-23-909U247N04 (TGGAGTGTGACAATGGTGTTTGG (SEQ ID NO:8)), tRF-I89NJ4S2, tRF-9MV47P596VE, tRF-86J8WPMN1EJ3, tRF-86V8WPMN1EJ3, and/or tRF-Q1Q89P9L8422E; iii) hsa-miR-1273h-5p, piR-hsa-27124, PTCH2, RN7SL40P, RN7SL617P, RNU2-20P, RNY4P16, RNY4P28, RNY4P9, iso-23-8K4P8R8SDE (TCAGGCTCAGTCCCCTCCCGATT (SEQ ID NO:9)), tRF-9MV47P594, tRF-YP9LON4V3, tRF-KY7343RXI7, and/or tRF-PSQP4PW3FJI0V; iv) AC113404.1, C6orf223, HIST1H4E, hsa-miR-30a-5p, hsa-miR-574-5p, ODC1, PTCH2, PTMA, RN7SL181P, tRF-22-947673FE5, AKAP9, MIGA1, RAP1B, RN7SL724P, RUFY2, iso-23-X3749W540L (TGAGTGTGTGTGTGTGAGTGTGA (SEQ ID NO:10)), tRF-BS68BFD2, tRF-R29P4P9L5HJVE, and tRF-ZRS3S3RX8HYVD; and/or v) iso-23-B0NKZ01JDW (AAAAGCTGGGTTGAGAGGGCGCT (SEQ ID NO:11)), ATL3, iso-21-Q85XJJ70D (GCTGGGATTACAGGCGTGAGC (SEQ ID NO:12)), HMGB1, hsa-miR-19b-3p, hsa-miR-215-5p, hsa-miR-30a-5p, hsa-miR-339-3p, hsa-miR-760, RN7SL277P, and TNFRSF13C.
32. The kit of claim 31, wherein the reagents are selected from the group consisting of one or more sequencing primers, one or more amplification primers and one or more nucleic acid probes.
PCT/IB2023/000079 2022-02-10 2023-02-10 Compositions and methods for characterizing lung cancer WO2023152568A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263308633P 2022-02-10 2022-02-10
US63/308,633 2022-02-10

Publications (2)

Publication Number Publication Date
WO2023152568A2 WO2023152568A2 (en) 2023-08-17
WO2023152568A3 WO2023152568A3 (en) 2023-09-21

Family

ID=85772624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/000079 WO2023152568A2 (en) 2022-02-10 2023-02-10 Compositions and methods for characterizing lung cancer

Country Status (1)

Country Link
WO (1) WO2023152568A2 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101475984A (en) * 2008-12-15 2009-07-08 江苏命码生物科技有限公司 Non-small cell lung cancer detection marker, detection method thereof, related biochip and reagent kit
AU2011329753B2 (en) * 2010-11-19 2015-07-23 The Regents Of The University Of Michigan ncRNA and uses thereof
ES2911253T3 (en) * 2017-06-12 2022-05-18 Fraunhofer Ges Forschung miRNA-574-5p as a biomarker for prostaglandin-dependent tumor stratification
CN107916292A (en) * 2017-12-29 2018-04-17 唐山市人民医院 Predict 423 5p of brain metastasis molecular marked compound miR and the application in medicine and diagnostic kit
JP2023504334A (en) * 2019-08-26 2023-02-03 リキッド ラング ディーエックス Biomarkers for diagnosis of lung cancer
CN112760381B (en) * 2021-02-08 2022-11-01 复旦大学附属中山医院 miRNA (micro ribonucleic acid) kit for detecting lung adenocarcinoma prognosis

Patent Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4683202B1 (en) 1985-03-28 1990-11-27 Cetus Corp
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4683195B1 (en) 1986-01-30 1990-11-27 Cetus Corp
US4800159A (en) 1986-02-07 1989-01-24 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences
US4965188A (en) 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US5130238A (en) 1988-06-24 1992-07-14 Cangene Corporation Enhanced nucleic acid amplification process
US5225326A (en) 1988-08-31 1993-07-06 Research Development Foundation One step in situ hybridization assay
US5480784A (en) 1989-07-11 1996-01-02 Gen-Probe Incorporated Nucleic acid sequence amplification methods
US5399491A (en) 1989-07-11 1995-03-21 Gen-Probe Incorporated Nucleic acid sequence amplification methods
US5824518A (en) 1989-07-11 1998-10-20 Gen-Probe Incorporated Nucleic acid sequence amplification methods
US5455166A (en) 1991-01-31 1995-10-03 Becton, Dickinson And Company Strand displacement amplification
US5270184A (en) 1991-11-19 1993-12-14 Becton, Dickinson And Company Nucleic acid target generation
US5545524A (en) 1991-12-04 1996-08-13 The Regents Of The University Of Michigan Compositions and methods for chromosome region-specific probes
US5714330A (en) 1994-04-04 1998-02-03 Lynx Therapeutics, Inc. DNA sequencing by stepwise ligation and cleavage
EP0684315A1 (en) 1994-04-18 1995-11-29 Becton, Dickinson and Company Strand displacement amplification using thermophilic enzymes
US5695934A (en) 1994-10-13 1997-12-09 Lynx Therapeutics, Inc. Massively parallel sequencing of sorted polynucleotides
US5750341A (en) 1995-04-17 1998-05-12 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US6306597B1 (en) 1995-04-17 2001-10-23 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US6121489A (en) 1996-03-05 2000-09-19 Trega Biosciences, Inc. Selectively N-alkylated peptidomimetic combinatorial libraries and compounds therein
US6432360B1 (en) 1997-10-10 2002-08-13 President And Fellows Of Harvard College Replica amplification of nucleic acid arrays
US6485944B1 (en) 1997-10-10 2002-11-26 President And Fellows Of Harvard College Replica amplification of nucleic acid arrays
US6511803B1 (en) 1997-10-10 2003-01-28 President And Fellows Of Harvard College Replica amplification of nucleic acid arrays
US6787308B2 (en) 1998-07-30 2004-09-07 Solexa Ltd. Arrayed biomolecules and their use in sequencing
WO2000018957A1 (en) 1998-09-30 2000-04-06 Applied Research Systems Ars Holding N.V. Methods of nucleic acid amplification and sequencing
US6573043B1 (en) 1998-10-07 2003-06-03 Genentech, Inc. Tissue analysis and kits therefor
US6833246B2 (en) 1999-09-29 2004-12-21 Solexa, Ltd. Polynucleotide sequencing
US20050130173A1 (en) 2003-01-29 2005-06-16 Leamon John H. Methods of amplifying and sequencing nucleic acids
WO2006084132A2 (en) 2005-02-01 2006-08-10 Agencourt Bioscience Corp. Reagents, methods, and libraries for bead-based squencing

Non-Patent Citations (63)

* Cited by examiner, † Cited by third party
Title
"Hybridization: In Neurobiology; Advances in Methodology", 1994, OXFORD UNIVERSITY PRESS INC.
"The RNAcentral Consortium. RNAcentral: a hub of information for non-coding RNA sequences", NUCLEIC ACIDS RES., vol. 47, no. D1, 2019, pages D1250 - D1251
ABDELMOHSEN K, PANDA AC, KANG M-J ET AL.: "7SL RNA represses p53 translation by competing with HuR", NUCLEIC ACIDS RES., vol. 42, no. 15, 2014, pages 10099 - 10111
ABERLE DR, BERG CD, BLACK WC ET AL.: "The National Lung Screening Trial: overview and study design", RADIOLOGY, vol. 258, no. 1, 2011, pages 243 - 253
ADESSI ET AL., NUCLEIC ACID RES., vol. 28, 2000, pages E87
BABAPOOR S, FLEMING E, WU R, DADRAS SS: "A novel miR-451a isomiR, associated with amelanotypic phenotype, acts as a tumor suppressor in melanoma by retarding cell migration and invasion", PLOS ONE, vol. 9, no. 9, 2014, pages e107502
BACH PB, MIRKIN JN, OLIVER TK ET AL.: "Benefits and harms of CT screening for lung cancer: a systematic review", JAMA, vol. 307, no. 22, 2012, pages 2418 - 2429
BENNETT ET AL., PHARMACOGENOMICS, vol. 6, 2005, pages 373 - 382
BRAY F, FERLAY J, SOERJOMATARAM I, SIEGEL RL, TORRE LA, JEMAL A: "Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries", CA CANCER J CLIN, vol. 68, no. 6, 2018, pages 394 - 424
BRENNER ET AL., NAT. BIOTECHNOL., vol. 18, 2000, pages 630 - 634
BRUSTUGUN OT, GRONBERG BH, FJELLBIRKELAND L ET AL.: "Substantial nation-wide improvement in lung cancer relative survival in Norway from 2000 to 2016", LUNG CANCER, vol. 122, 2018, pages 138 - 145, XP085421906, DOI: 10.1016/j.lungcan.2018.06.003
BURTON J, UMU SU, LANGSETH H ET AL., SERUM RNA PROFILING IN THE 10-YEARS PERIOD PRIOR TO DIAGNOSIS OF TESTICULAR GERM CELL TUMOR, 28 October 2020 (2020-10-28)
DE KONING HJ, VAN DER AALST CM, DE JONG PA ET AL.: "Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial", N ENGL J MED., vol. 382, no. 6, 2020, pages 503 - 513
ELKHOULY AM, YOUNESS RA, GAD MZ: "MicroRNA-486-5p and microRNA-486-3p: Multifaceted pleiotropic mediators in oncological and non-oncological conditions", NONCODING RNA RES, vol. 5, no. 1, 2020, pages 11 - 21
FEHLMANN T, KAHRAMAN M, LUDWIG N ET AL.: "Evaluating the Use of Circulating MicroRNA Profiles for Lung Cancer Detection in Symptomatic Patients", JAMA ONCOL, vol. 6, no. 5, 2020, pages 714 - 723
FOSS KM, SIMA C, UGOLINI D, NERI M, ALLEN KE, WEISS GJ: "miR-1254 and miR-574-5p: serum-based microRNA biomarkers for early-stage non-small cell lung cancer", J THORAC ONCOL, vol. 6, no. 3, 2011, pages 482 - 488, XP008137808
FREEDMAN JE, GERSTEIN M, MICK E ET AL.: "Diverse human extracellular RNAs are widely detected in human plasma", NAT COMMUN, vol. 7, 2016, pages 11106
GLUNDE K, JACOBS MA, BHUJWALLA ZM: "Choline metabolism in cancer: implications for diagnosis and therapy", EXPERT REV MOL DIAGN, vol. 6, no. 6, 2006, pages 821 - 829, XP008123152
GOPAL M, ABDULLAH SE, GRADY JJ, GOODWIN JS: "Screening for lung cancer with low-dose computed tomography: a systematic review and meta-analysis of the baseline findings of randomized controlled trials", J THORAC ONCOL., vol. 5, no. 8, 2010, pages 1233 - 1239
GUATELLI ET AL., PROC. NATL. ACAD. SCI. USA, vol. 87, 1990, pages 1874
GUTSCHNER T, DIEDERICHS S: "The hallmarks of cancer: a long non-coding RNA point of view", RNA BIOL, vol. 9, no. 6, 2012, pages 703 - 719, XP055137471, DOI: 10.4161/rna.20481
HANAHAN D, WEINBERG RA: "The hallmarks of cancer", CELL, vol. 100, no. 1, 2000, pages 57 - 70, XP055447752
HANASH SM, OSTRIN EJ, FAHRMANN JF: "Blood based biomarkers beyond genomics for lung cancer screening", TRANSL LUNG CANCER RES., vol. 7, no. 3, 2018, pages 327 - 335
HJERKIND KV, GISLEFOSS RE, TRETLI S ET AL.: "Cohort Profile Update: The Janus Serum Bank Cohort in Norway", INT J EPIDEMIOL, vol. 46, no. 4, 2017, pages 1101 - 1102
KELLER A, MEESE E: "Can circulating miRNAs live up to the promise of being minimal invasive biomarkers in clinical settings?", WILEY INTERDISCIP REV RNA., vol. 7, no. 2, 2016, pages 148 - 156, XP055458971, DOI: 10.1002/wrna.1320
KIM KM, ABDELMOHSEN K, MUSTAPIC M, KAPOGIANNIS D, GOROSPE M: "RNA in extracellular vesicles", WILEY INTERDISCIP REV RNA, vol. 8, no. 4, 2017
KLINGER ET AL., AM. J. HUM. GENET., vol. 51, 1992, pages 55 - 65
KLUPCZYNSKA A, PLEWA S, KASPRZYK M, DYSZKIEWICZ W, KOKOT ZJ, MATYSIAK J: "Serum lipidome screening in patients with stage I non-small cell lung cancer", CLIN EXP MED, vol. 19, no. 4, 2019, pages 505 - 513, XP036908535, DOI: 10.1007/s10238-019-00566-7
KO J, BALDASSANO SN, LOH P-L, KORDING K, LITT B, ISSADORE D: "Machine learning to detect signatures of disease in liquid biopsies - a user's guide", LAB CHIP, vol. 18, no. 3, 2018, pages 395 - 405, XP055703589, DOI: 10.1039/C7LC00955K
KUO ET AL., AM. J. HUM. GENET., vol. 49, 1991, pages 112 - 119
KWOH ET AL., PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 1173
LANGSETH H, GISLEFOSS RE, MARTINSEN JI, DILLNER J, URSIN G: "Cohort Profile: The Janus Serum Bank Cohort in Norway", INT J EPIDEMIOL, vol. 46, no. 2, 2017, pages 403 - 404
LARSEN IK, SMASTUEN M, JOHANNESEN TB ET AL.: "Data quality at the Cancer Registry of Norway: an overview of comparability, completeness, validity and timeliness", EUR J CANCER, vol. 45, no. 7, 2009, pages 1218 - 1231, XP026039877, DOI: 10.1016/j.ejca.2008.10.037
LI C, QIN F, HU F ET AL.: "Characterization and selective incorporation of small non-coding RNAs in non-small cell lung cancer extracellular vesicles", CELL BIOSCI, vol. 8, 2018, pages 2
LIZARDI ET AL., BIOTECHNOL., vol. 6, 1988, pages 1197
LOVE MI, HUBER W, ANDERS S: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2", GENOME BIOL, vol. 15, no. 12, 2014, pages 550, XP021210395, DOI: 10.1186/s13059-014-0550-8
LUND E, HOLDEN L, BOVELSTAD H ET AL.: "A new statistical method for curve group analysis of longitudinal gene expression data illustrated for breast cancer in the NOWAC postgenome cohort as a proof of principle", BMC MED RES METHODOL, vol. 16, 2016, pages 28
MARGULIES ET AL., NATURE, vol. 437, 2005, pages 376 - 380
MITRA ET AL., ANALYTICAL BIOCHEMISTRY, vol. 320, 2003, pages 55 - 65
MOROZOVA, MARRA, GENOMICS, vol. 92, 2008, pages 255
MULLIS ET AL., METH. ENZYMOL., vol. 155, 1987, pages 335
MURAKAWA ET AL., DNA, vol. 7, 1988, pages 287
MURILLO OD, THISTLETHWAITE W, ROZOWSKY J ET AL.: "exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids", CELL, vol. 177, no. 2, 2019, pages 463 - 477
NATURE, vol. 409, 2001, pages 953 - 958
PELED N, ILOUZE M: "Screening for Lung Cancer: What Comes Next?", J CLIN ONCOL, vol. 33, no. 33, 2015, pages 3847 - 3848
PERSING, DAVID H. ET AL.: "Diagnostic Medical Microbiology: Principles and Applications", 1993, AMERICAN SOCIETY FOR MICROBIOLOGY, article "In Vitro Nucleic Acid Amplification Techniques", pages: 51 - 87
PICHLER M, CALIN GA: "MicroRNAs in cancer: from developmental genes in worms to their clinical application in patients", BR J CANCER, vol. 113, no. 4, 2015, pages 569 - 573
ROUNGE TB, UMU SU, KELLER A ET AL.: "Circulating small non-coding RNAs associated with age, sex, smoking, body mass and physical activity", SCI REP, vol. 8, no. 1, 2018, pages 1760
SANDFELD-PAULSEN B, JAKOBSEN KR, BÆK R ET AL.: "Exosomal Proteins as Diagnostic Biomarkers in Lung Cancer", J THORAC ONCOL, vol. 11, no. 10, 2016, pages 1701 - 1710
SEIJO LM, PELED N, AJONA D: "Biomarkers in Lung Cancer Screening: Achievements, Promises, and Challenges", J THORAC ONCOL, vol. 14, no. 3, 2019, pages 343 - 357
SHENDURE ET AL., SCIENCE, vol. 309, 2005, pages 1728 - 1732
SLACK FJ, CHINNAIYAN AM: "The Role of Non-coding RNAs in Oncology", CELL, vol. 179, no. 5, 2019, pages 1033 - 1055, XP085907036, DOI: 10.1016/j.cell.2019.10.017
SVORONOS AA, ENGELMAN DM, SLACK FJ: "OncomiR or Tumor Suppressor? The Duplicity of MicroRNAs in Cancer", CANCER RES., vol. 76, no. 13, 2016, pages 3666 - 3670
TIAN F, WANG J, OUYANG T ET AL.: "MiR-486-5p Serves as a Good Biomarker in Nonsmall Cell Lung Cancer and Suppresses Cell Growth With the Involvement of a Target PIK3R1", FRONT GENET., vol. 10, 2019, pages 688
UMU SU, LANGSETH H, BUCHER-JOHANNESSEN C ET AL.: "A comprehensive profile of circulating RNAs in human serum", RNA BIOL, vol. 15, no. 2, 2018, pages 242 - 250
UMU SU, LANGSETH H, KELLER A ET AL.: "A 10 year prediagnostic followup study shows that serum RNA signals are highly dynamic in lung carcinogenesis", MOL ONCOL, 2019, Retrieved from the Internet <URL:https://febs.onlinelibrary.wiley.com/doi/abs/10.1002/1878-0261.12620>
WALKER, G. ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 392 - 396
WARD ET AL., AM. J. HUM. GENET., vol. 52, 1993, pages 854 - 865
WEISS, R., SCIENCE, vol. 254, 1991, pages 1292
WILD CP, WEIDERPASS W, STEWART BW: "World Cancer Report: Cancer Research for Cancer Prevention", LYON, FRANCE: INTERNATIONAL AGENCY FOR RESEARCH ON CANCER, 2020
YANAIHARA N, CAPLEN N, BOWMAN E ET AL.: "Unique microRNA molecular profiles in lung cancer diagnosis and prognosis", CANCER CELL, vol. 9, no. 3, 2006, pages 189 - 198
YU H, GUAN Z, CUK K, ZHANG Y, BRENNER H: "Circulating MicroRNA Biomarkers for Lung Cancer Detection in East Asian Populations", CANCERS, vol. 11, no. 3, 2019
ZHOU R, ZHOU X, YIN Z ET AL.: "MicroRNA-574-5p promotes metastasis of non-small cell lung cancer by targeting PTPRU", SCI REP, vol. 6, 2016, pages 35714

Also Published As

Publication number Publication date
WO2023152568A3 (en) 2023-09-21

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23713426

Country of ref document: EP

Kind code of ref document: A2