WO2011146937A1 - Methods and kits useful in diagnosing nsclc - Google Patents

Methods and kits useful in diagnosing NSCLC

Info

Publication number
WO2011146937A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
marker
oligonucleotide
group
sequence
Application number
PCT/US2011/037606
Other languages
French (fr)
Inventor
Glen Weiss
Original Assignee
The Translational Genomics Research Institute
Application filed by The Translational Genomics Research Institute

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • The present invention relates to methods, kits, and miRNA markers for classifying a sample into a group of early-stage Non-Small Cell Lung Cancer (NSCLC).
  • NSCLC Non-Small Cell Lung Cancer
  • Lung cancer is the most common cause of cancer-related death in the world. In the United States, over 90 million individuals are at risk for developing lung cancer, and the disease is estimated to remain a major health problem for at least the next 50 years. Lung cancer is often insidious, and it may produce no symptoms until the disease is well advanced. Approximately 7-10% of patients with lung cancer are asymptomatic. Over 75% of lung cancer cases are diagnosed at an advanced stage, conferring a poor prognosis, because there is no practical way to screen a large number of people at risk. Therefore, the need to detect and diagnose lung cancer at an early and potentially curable stage is great.
  • SCLC small cell lung cancer
  • Adenocarcinomas, which are often found in an outer area of the lung.
  • Bronchioloalveolar carcinomas, which are found in the distal bronchioles or alveoli of the lung.
  • Squamous cell carcinomas, which are usually found in the center of the lung by an air tube (bronchus).
  • Large cell carcinomas, which can occur in any part of the lung and tend to grow and spread faster than the other three types.
  • Non-small cell lung cancer is divided into five stages: Stage 0: the cancer has not spread beyond the inner lining of the lung; Stage I: the cancer is small and has not spread to the lymph nodes; Stage II: the cancer has spread to some lymph nodes near the original tumor; Stage III: the cancer has spread to nearby tissue or to distant lymph nodes; Stage IV: the cancer has spread to other organs of the body such as the other lung, brain, or liver. Stages 0-II are deemed the early stages of NSCLC, which are critical for treatment and survival.
  • One aspect of the invention provides a method of classifying a subject into a group.
  • the method generally comprises the steps of receiving a sample containing RNA from the subject; adding a first reagent comprising a first oligonucleotide capable of specific binding to a marker including a sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4 to a mixture comprising the RNA; and subjecting the mixture to conditions that allow detection of the binding of the first reagent to the marker; and classifying the subject into a group based on the expression level of the marker in comparison to the level of the same marker in a control.
  • the general method may further comprise the step of adding reverse transcriptase to the mixture allowing the formation of a DNA template comprising the nucleic acid sequence of the marker.
  • the general methods may further comprise the step of adding a second oligonucleotide and a third oligonucleotide to the mixture, wherein the second oligonucleotide and the third oligonucleotide bind to opposite strands of the DNA template comprising the nucleic acid sequence of the marker allowing nucleic acid amplification.
  • the general method may further comprise the step of adding a fourth oligonucleotide to the mixture wherein the fourth oligonucleotide binds to the DNA template between the sequences to which the second oligonucleotide and the third oligonucleotide are capable of binding.
  • the fourth nucleic acid comprises a label.
  • the label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
  • the sample in this general method is a biological sample.
  • the sample comprises bodily fluid selected from a group consisting of whole blood, plasma and serum.
  • the sample is serum.
  • the marker in the general method is a classifier for identifying a subject at risk of or having non-small cell lung cancer (NSCLC). Specifically, higher expression of the marker in the subject in comparison to the expression of the marker in a control classifies the subject into a group of early-stage NSCLC.
  • the marker in the general method is selected from the group consisting of miR-574-5p represented by SEQ ID NO: 1, miR-1254 represented by SEQ ID NO: 2, miR-1268 represented by SEQ ID NO: 3, and miR-1228 represented by SEQ ID NO: 4.
  • the marker is miR-574-5p represented by SEQ ID NO: 1.
  • the preferred marker is miR-1254 represented by SEQ ID NO: 2.
  • the subject is a mammal.
  • the subject is a human.
  • kits for facilitating the classification of a subject into a group comprising a first reagent comprising a first oligonucleotide capable of specific binding to a marker that includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4.
  • the kit may further comprise a second oligonucleotide and a third oligonucleotide, wherein the second oligonucleotide and the third oligonucleotide are capable of binding to opposite strands of a DNA construct comprising the reverse transcription product of the marker.
  • the kit may further comprise a fourth oligonucleotide capable of binding to a sequence between the sequences to which the second oligonucleotide and the third oligonucleotide are capable of binding.
  • the fourth oligonucleotide will comprise a label.
  • the label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
  • the kit may further comprise an indication of a result that signifies classification of the sample to a group of early stage NSCLC.
  • the result provided by the components of the kit can be in a form selected from the group consisting of a ΔCt value and a nucleic acid sequence including a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4.
  • the indication of the kit is selected from the group consisting of a positive control, a writing, and software configured to detect the result as input and to identify the sample as one in the early-stage NSCLC group.
  • Still another aspect of the invention provides an isolated sequence having at least
  • sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, and reverse transcription and amplification products thereof.
  • the isolated sequence may further comprise a label attached to said sequence. Preferred labels are as described above.
  • Figure 1 depicts the ROC curve for miR-1254 and miR-574-5p in the discovery cohort, showing that the markers miR-1254 and miR-574-5p have an AUC (area under the ROC curve) of 0.77, that is, a 77% probability in classification, with 82% sensitivity and 77% specificity; and
  • Figure 2 depicts the ROC curve for miR-1254 and miR-574-5p in the validation cohort, showing that the markers miR-1254 and miR-574-5p have an AUC of 0.75, with 73% sensitivity and 71% specificity.
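  • As an illustration of the quantities reported in the figures, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from per-subject marker scores. The function names, the threshold, and the example scores are assumptions made for illustration only; they are not data or analysis code from the application.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a subject positive when its score is >= threshold.

    labels: 1 = case (e.g., early-stage NSCLC), 0 = control.
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical scores: three cases followed by three controls.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)  # 8 of 9 case/control pairs are ranked correctly
```

  • Sweeping the threshold over all observed scores and plotting sensitivity against (1 - specificity) would trace an ROC curve of the kind the figures describe.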
  • the present invention provides miRNA markers for NSCLC early detection.
  • the miRNA markers are differentially expressed, serum-based, microRNAs between patients with early-stage NSCLC and a control group.
  • the ability to diagnose NSCLC at an early stage leads to improved survival due to early intervention and treatment.
  • miRNAs have been shown as a major new class of regulatory gene products. For example, in human heart, liver or brain, miRNAs play a role in tissue specification or cell lineage decisions. In addition, miRNAs influence a variety of processes, including early development, cell proliferation and cell death, apoptosis, and fat metabolism. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic disease. Differences in miRNA expression have also been found to be associated with cancer diagnosis, prognosis, and susceptibility to treatments.
  • a mature miRNA is typically an 18-25 nucleotide non-coding RNA that regulates expression of mRNA including sequences complementary to the miRNA.
  • These small RNA molecules are known to control gene expression by regulating the stability and/or translation of mRNAs.
  • miRNAs bind to the 3' UTR of target mRNAs and suppress translation.
  • miRNAs may also bind to target mRNAs and mediate gene silencing through the RNAi pathway. miRNAs may also regulate gene expression by causing chromatin condensation.
  • Endogenously expressed miRNAs are processed by endonucleolytic cleavage from larger double-stranded RNA precursor molecules.
  • the resulting small single-stranded miRNAs are incorporated into a multi-protein complex, termed RISC.
  • the small RNA in RISC provides sequence information that is used to guide the RNA-protein complex to its target RNA molecules.
  • the degree of complementarity between the small RNA and its target determines the fate of the bound mRNA. Perfect pairing induces target RNA cleavage, as is the case for siRNAs and most plant miRNAs. In comparison, imperfect pairing in the central part of the duplex leads to a block in translation.
  • miRNAs regulate various biological functions including developmental processes, developmental timing, cell proliferation, neuronal gene expression and cell fate, apoptosis, tissue growth, viral pathogenesis, brain morphogenesis, muscle differentiation, stem cell division and progression of human diseases. Many miRNAs are conserved in sequence and function between distantly related organisms. However, condition-specific, time-specific, and individual-specific levels of gene expression may be due to the interactions of different miRNAs which lead to genetic expression of various traits. miRNA genetic alterations, such as deletions, insertions, reversions or conversions, may affect the accuracy or specificity of miRNA-related gene regulation. miRNA genetic alterations may be used as markers for disease prognosis and diagnosis. miRNA alleles may alternatively be used as targets for disease treatment and as markers for disease prognosis and diagnosis.
  • miRNA may be amplified by any of a number of techniques including reverse transcription followed by PCR. Some techniques of reverse transcription of miRNA use a targeted stem-loop primer to prime reverse transcription of the miRNA into a cDNA template. The cDNA may then be used as the template for any type of PCR including any type of quantitative PCR.
  • a stem-loop oligonucleotide is a single stranded oligonucleotide that includes a sequence capable of binding to a specific marker because it includes a nucleic acid sequence complementary to the marker. The sequence complementary to the marker is flanked by inverted repeats that form self-complementary sequences. Such nucleotides may contain a fluorophore quencher pair at the 5' and 3' ends of the oligonucleotide.
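  • The structure described above (a marker-complementary sequence flanked by inverted repeats) can be sketched as follows. This is a minimal illustration under stated assumptions: the marker and stem sequences are arbitrary placeholders, not any SEQ ID NO from the application, and the helper names are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the DNA reverse complement of seq; U is read as T."""
    dna = seq.upper().replace("U", "T")
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def build_stem_loop(marker, stem):
    """Assemble 5' stem + marker-complementary loop + 3' stem.

    The 3' stem is the reverse complement of the 5' stem, so the two
    arms form the self-complementary inverted repeat.
    """
    return stem + reverse_complement(marker) + reverse_complement(stem)


def is_stem_loop_for(probe, marker, stem_len):
    """Check that probe has inverted-repeat arms around a marker-complementary loop."""
    if stem_len <= 0 or len(probe) <= 2 * stem_len:
        return False
    five_arm = probe[:stem_len]
    loop = probe[stem_len:-stem_len]
    three_arm = probe[-stem_len:]
    return (three_arm == reverse_complement(five_arm)
            and loop == reverse_complement(marker))


# Placeholder sequences chosen only for illustration.
marker = "AUGGCUACGUAC"
probe = build_stem_loop(marker, stem="GCGAGC")
```

  • A fluorophore and quencher attached at the two ends of such an oligonucleotide would sit close together while the stem is closed and separate when the loop binds the marker; that labeling chemistry is outside the scope of this sketch.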
  • a marker may be any molecular structure produced by a cell, expressed inside the cell, accessible on the cell surface, or secreted by the cell.
  • a marker may be any protein, carbohydrate, fat, nucleic acid, catalytic site, or any combination of these, such as, an enzyme, glycoprotein, cell membrane, virus, cell, organ, organelle, or any uni- or multimolecular structure or any other such structure now known or yet to be disclosed whether alone or in combination.
  • a marker may also be called a target and the terms are used interchangeably.
  • a marker may be represented by the sequence of a nucleic acid from which it can be derived or any other chemical structure.
  • a marker may be represented by a protein sequence.
  • the concept of a marker is not limited to the products of the exact nucleic acid sequence or protein sequence by which it may be represented. Rather, a marker encompasses all molecules that may be detected by a method of assessing the expression of the marker.
  • the marker is an isolated miRNA.
  • When an isolated nucleic acid includes a particular sequence, the sequence may be a part of a longer nucleic acid or may be the entirety of that sequence, which is an identified marker associated with a specific phenotype or trait.
  • the isolated nucleic acid may contain nucleotides at the 5' end, 3' end of that longer sequence, or both.
  • the concept of a nucleic acid including a particular sequence of a marker further encompasses nucleic acids that contain less than the full sequence of the marker sequence but are still capable of specifically detecting the alleles of the marker.
  • allele encompasses any form of a particular nucleic acid that may be recognized as a form of that particular nucleic acid on account of its location, sequence, epigenetic modification or any other characteristic that may identify it as being a specific form of that particular nucleic acid.
  • Nonlimiting examples of alleles include various forms of a gene that include point mutations, silent mutations, deletions, frameshift mutations, single nucleotide polymorphisms (SNPs), inversions, translocations, heterochromatic insertions, and differentially methylated sequences relative to a reference gene sequence, whether alone or in combination. Different alleles may or may not encode proteins or peptides.
  • Different alleles may differ in expression level, expression pattern, temporal or spatial specificity, and expression regulation.
  • the protein from different alleles may or may not be functional. Further, the protein may be gain-of-function, loss-of-function, or with altered function.
  • An allele may also be called a mutation or a mutant.
  • An allele may be compared to another allele that may be termed a wild type form of an allele. In some cases, the wild type allele is more common than the mutant.
  • a miRNA marker represented by a particular sequence or structure includes its alleles comprising insertions, deletions, truncations, alternative splicing derivatives, differentially modified sequences, and altered targeting specificity.
  • the different alleles of a miRNA marker may be in circulating or non-circulating forms in some cases, and in soluble or insoluble forms in others.
  • Nucleic acid sequences, including the sequence of a miRNA molecule, may be identified by the IUPAC letter code, which is as follows: A - adenine base; C - cytosine base; G - guanine base; T or U - thymine or uracil base; M - A or C; R - A or G; W - A or T; S - C or G; Y - C or T; K - G or T; V - A or C or G; H - A or C or T; D - A or G or T; B - C or G or T; N or X - A or C or G or T.
  • T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA.
  • a sequence having less than 60%, less than 70%, less than 80%, less than 90%, less than 95%, less than 99% or 100% identity to the identifying sequence may still be encompassed by the invention if it is capable of binding to its complementary sequence and/or facilitating nucleic acid amplification of a desired target sequence. If a sequence is represented in degenerate form, for example through the use of codes other than A, C, G, T, or U, the concept of a nucleic acid including the sequence also encompasses a mixture of nucleic acids of different sequences that still meet the conditions imposed by the degenerate sequence.
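  • To make the idea of a degenerate sequence concrete, the following minimal sketch expands a sequence written in the letter code listed above into every concrete sequence it permits, and checks whether a given sequence satisfies it. The function names are hypothetical, and treating U as T throughout is a simplifying assumption.

```python
from itertools import product

# Letter code as listed in the text; U is treated as T for simplicity.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT", "X": "ACGT",
}


def expand_degenerate(seq):
    """List every concrete DNA sequence that a degenerate sequence allows."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches(degenerate, concrete):
    """True if the concrete sequence meets the degenerate sequence's conditions."""
    concrete = concrete.upper().replace("U", "T")
    if len(degenerate) != len(concrete):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(degenerate.upper(), concrete))


variants = expand_degenerate("ARY")  # ['AAC', 'AAT', 'AGC', 'AGT']
```

  • The number of concrete sequences grows multiplicatively with each degenerate position, so full expansion is practical only for short sequences with few ambiguous letters.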
  • Sequence Identity refers to a relationship between two or more polypeptide sequences, or two or more polynucleotide sequences, namely a reference sequence, and a given sequence to be compared with the reference sequence. Sequence identity is determined by comparing the given sequence to the reference sequence after the sequences have been optimally aligned to produce the highest degree of sequence similarity, as determined by the match between strings of such sequences. Upon such alignment, sequence identity is ascertained on a position-by-position basis, e.g., the sequences are "identical" at a particular position if at that position, the nucleotides or amino acid residues are identical.
  • Sequence identity can be readily calculated by known methods, including but not limited to, those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H. G., eds., Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press (1987); Sequence Analysis Primer, Gribskov, M.
  • Preferred methods to determine the sequence identity are designed to give the largest match between the sequences tested. Methods to determine sequence identity are codified in publicly available computer programs which determine sequence identity between given sequences. Examples of such programs include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research, 12(1):387 (1984)), BLASTP, BLASTN and FASTA (Altschul, S. F. et al., J.
  • The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S. F. et al., J. Molec. Biol., 215:403-410 (1990), the teachings of which are incorporated herein by reference). These programs optimally align sequences using default gap weights in order to produce the highest level of sequence identity between the given and reference sequences.
  • nucleotide sequence having at least, for example, 85%, preferably 90%, even more preferably 95% "sequence identity" to a reference nucleotide sequence it is intended that the nucleotide sequence of the given polynucleotide is identical to the reference sequence except that the given polynucleotide sequence may include up to 15, preferably up to 10, even more preferably up to 5 point mutations per each 100 nucleotides of the reference nucleotide sequence.
  • a polynucleotide having a nucleotide sequence having at least 85%, preferably 90%, even more preferably 95% identity relative to the reference nucleotide sequence up to 15%, preferably 10%, even more preferably 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 15%, preferably 10%, even more preferably 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence.
  • These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
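  • The position-by-position comparison and the "point mutations per 100 nucleotides" reading of percent identity described above can be sketched as follows. This is an illustrative sketch only: it assumes the two sequences have already been aligned (for example by one of the programs named above), it counts gap columns as mismatches, which is one of several conventions in use, and the function names are hypothetical.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent of alignment columns where both sequences carry the same residue.

    Inputs must already be aligned: equal length, with '-' marking a gap.
    Columns containing a gap are counted as mismatches (an assumption).
    """
    if not aligned_ref or len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_query.upper())
        if a == b and a != "-"
    )
    return 100.0 * same / len(aligned_ref)


def max_point_mutations(identity_percent, reference_length):
    """Largest number of differing positions compatible with a minimum identity.

    Both arguments are integers, so the arithmetic is exact.
    """
    return reference_length * (100 - identity_percent) // 100


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")  # one mismatch in ten: 90.0
allowed = max_point_mutations(85, 100)                   # 15 per 100 nucleotides
```

  • For sequences as short as a mature miRNA, a single mismatch changes the identity by several percentage points, so thresholds such as 85%, 90%, or 95% correspond to only a few permitted differences.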
  • sequence homology refers to a method of determining the relatedness of two sequences. To determine sequence homology, two or more sequences are optimally aligned, and gaps are introduced if necessary.
  • the homolog sequence comprises at least a stretch of 50, even more preferably at least 100, even more preferably at least 250, and even more preferably at least 500 nucleotides.
  • a “conservative substitution” refers to the substitution of an amino acid residue or nucleotide with another amino acid residue or nucleotide having similar characteristics or properties including size, hydrophobicity, etc., such that the overall functionality does not change significantly.
  • isolated means altered by the hand of man from its natural state, i.e., if it occurs in nature, it has been changed or removed from its original environment, or both.
  • a polynucleotide or polypeptide naturally present in a living organism is not “isolated,” but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is “isolated,” as the term is employed herein.
  • the concept of the expression of a marker encompasses any and all processes producing biological molecules or materials derived from the nucleic acid form of the marker.
  • Expression thus includes processes such as RNA transcription, mRNA splicing, protein translation, protein folding, post-translational modification, membrane transport, associations with other molecules, addition of carbohydrate moieties to proteins, phosphorylation, protein complex formation and any other process along a continuum that results in biological material derived from genetic material whether in vitro, in vivo, or ex vivo.
  • Expression also encompasses all processes through which the production of material derived from a nucleic acid template may be actively or passively suppressed. Such processes include all aspects of transcriptional and translational regulation. Examples include heterochromatic silencing, transcription factor inhibition, any form of RNAi silencing, microRNA silencing, alternative splicing, protease digestion, posttranslational modification, and alternative protein folding.
  • Expression may be assessed by any number of methods used to detect material derived from a nucleic acid template used currently in the art and yet to be developed.
  • methods include any nucleic acid detection method including the following nonlimiting examples, microarray analysis, RNA in situ hybridization, RNAse protection assay, Northern blot, reverse transcriptase PCR, quantitative PCR, quantitative reverse transcriptase PCR, quantitative real-time reverse transcriptase PCR, reverse transcriptase treatment followed by direct sequencing, direct sequencing of genomic DNA, or any other method of detecting a specific nucleic acid now known or yet to be disclosed.
  • Other examples include any process of assessing protein expression including flow cytometry, immunohistochemistry, ELISA, Western blot, immunoaffinity chromatography, HPLC, mass spectrometry, protein microarray analysis, PAGE analysis, isoelectric focusing, 2-D gel electrophoresis, or any enzymatic assay or any method that uses a protein reagent, nucleic acid reagent, or other reagent capable of specifically binding to or otherwise recognizing a specific nucleic acid or protein marker.
  • ligands capable of specifically binding a marker.
  • ligands include antibodies, antibody complexes, conjugates, natural ligands, small molecules, nanoparticles, or any other molecular entity capable of specific binding to a marker.
  • antibody is used herein in the broadest sense and refers generally to a molecule that contains at least one antigen binding site that immunospecifically binds to a particular antigen target of interest.
  • Antibody thus includes but is not limited to native antibodies and variants thereof, fragments of native antibodies and variants thereof, peptibodies and variants thereof, and antibody mimetics that mimic the structure and/or function of an antibody or a specified fragment or portion thereof, including single chain antibodies and fragments thereof.
  • the term thus includes full length antibodies and/or their variants as well as immunologically active fragments thereof, thus encompassing antibody fragments capable of binding to a biological molecule (such as an antigen or receptor) or portions thereof, including but not limited to Fab, Fab', F(ab')2, facb, pFc', Fd, Fv or scFv (see, e.g., CURRENT PROTOCOLS IN IMMUNOLOGY, Colligan et al., eds., John Wiley & Sons, Inc., NY, 1994-2001).
  • Antibodies may be monoclonal, polyclonal, or any antibody fragment including a
  • Ligands may be associated with a label such as a radioactive isotope or chelate thereof, dye (fluorescent or nonfluorescent), stain, enzyme, metal, or any other substance capable of aiding a machine or a human eye in differentiating a cell expressing a marker from a cell not expressing a marker. Additionally, expression may be assessed by monomeric or multimeric ligands associated with substances capable of killing the cell. Such substances include protein or small molecule toxins, cytokines, pro-apoptotic substances, pore forming substances, radioactive isotopes, or any other substance capable of killing a cell.
  • Differential expression encompasses any detectable difference between the expression of a marker in one sample relative to the expression of the marker in another sample. Differential expression may be assessed by a detector, an instrument containing a detector, by aided or unaided human eye, or any other method that may detect differential expression.
  • Examples include but are not limited to differential staining of cells in an IHC assay configured to detect a marker, differential detection of bound RNA on a microarray to which a sequence capable of binding to the marker is bound, differential RT-PCR results measured in ΔCt or alternatively in the number of PCR cycles necessary to reach a particular optical density at a wavelength at which a double stranded DNA binding dye (e.g., SYBR Green) incorporates, differential results in measuring label from a reporter probe used in an RT-PCR reaction, differential detection of fluorescence labels on cells using a flow cytometer, differential intensities of bands in a Northern blot, differential intensities of bands in an RNAse protection assay, differential cell death measured by apoptotic markers, differential cell death measured by shrinkage of a tumor, or any method that allows a detection of a difference in signal between one sample or set of samples and another sample or set of samples.
  • a label used to facilitate the differential expression detection may be any substance capable of aiding a machine, detector, sensor, device, or enhanced or unenhanced human eye in differentiating a labeled composition from an unlabeled composition.
  • labels include but are not limited to: a radioactive isotope or chelate thereof, dye (fluorescent or nonfluorescent), stain, enzyme, or nonradioactive metal. Specific examples include but are not limited to: fluorescein, biotin, digoxigenin, alkaline phosphatase, streptavidin, 3H, 14C,
  • rhodamine, 4-(4'-dimethylamino-phenylazo)benzoic acid ("Dabcyl"), 4-(4'-dimethylamino-phenylazo)sulfonic acid (sulfonyl chloride) ("Dabsyl"), 5-((2-aminoethyl)-amino)-naphthalene-1-sulfonic acid ("EDANS"), psoralen derivatives, haptens, cyanines, acridines, fluorescent rhodol derivatives, cholesterol derivatives, ethylenediaminetetraacetic acid ("EDTA") and derivatives thereof, or any other compound that may be differentially detected.
  • the label may also include one or more fluorescent dyes optimized for use in genotyping.
  • fluorescent dyes include but are not limited to: dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ+, Gold 540, MGB and LIZ.
  • the expression of the marker in a sample may be compared to a level of expression predetermined to predict the presence or absence of a particular cellular or physiological characteristic.
  • the level of expression may be derived from a single control or a set of controls.
  • a control may be any sample with a previously determined level of expression.
  • a control may comprise material within the sample or material from sources other than the sample.
  • the expression of a marker in a sample may be compared to a control that has a level of expression predetermined to signal or not signal a cellular or physiological characteristic. This level of expression may be derived from a single source of material including the sample itself or from a set of sources. Comparison of the expression of the marker in the sample to a particular level of expression results in a prediction that the sample exhibits or does not exhibit the cellular or physiological characteristic.
  • Prediction of a cellular or physiological characteristic includes the prediction of any cellular or physiological state that may be predicted by assessing the expression of a marker. Examples include the identity of a cell as a particular cell including a particular normal or cancer cell type, the likelihood that one or more diseases is present or absent, the likelihood that a present disease will progress, remain unchanged, or regress, the likelihood that a disease will respond or not respond to a particular therapy, or any other disease outcome. Further examples include the likelihood that a cell will move, senesce, apoptose, differentiate, metastasize, or change from any state to any other state or maintain its current state.
  • Expression of a marker in a sample may be more or less than that of a level predetermined to predict the presence or absence of a cellular or physiological characteristic.
  • the expression of the marker in the sample may be more than 1,000,000x, more than 100,000x, more than 10,000x, more than 1000x, more than 100x, more than 10x, more than 5x, more than 2x, more than 1x, less than 1x, less than 0.5x, less than 0.1x, less than 0.01x, less than 0.001x, less than 0.0001x, less than 0.00001x, less than 0.000001x, less than 0.0000001x, or any value more or less than that of a level predetermined to predict the presence or absence of a cellular or physiological characteristic.
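As a minimal sketch of the comparison described above, the expression of a marker can be written as a multiple of the predetermined level and tested against a chosen fold threshold. The function names and the numeric values are illustrative assumptions, not values from the source.

```python
def fold_change(sample_expression, reference_level):
    """Return sample expression as a multiple of the predetermined level."""
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return sample_expression / reference_level


def exceeds_fold(sample_expression, reference_level, fold_threshold):
    """True when the sample is more than fold_threshold times the reference."""
    return fold_change(sample_expression, reference_level) > fold_threshold


# Hypothetical values: a sample at 50 units against a reference of 10 units.
print(fold_change(50.0, 10.0))
print(exceeds_fold(50.0, 10.0, 2.0))
```

A "less than" comparison follows the same pattern with the inequality reversed.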
  • One type of cellular or physiological characteristic is the risk that a particular disease outcome will occur. Assessing this risk includes the performing of any type of test, assay, examination, result, readout, or interpretation that correlates with an increased or decreased probability that an individual has had, currently has, or will develop a particular disease, disorder, symptom, syndrome, or any condition related to health or bodily state. Examples of disease outcomes include, but need not be limited to survival, death, progression of existing disease, remission of existing disease, initiation of onset of a disease in an otherwise disease-free subject, or the continued lack of disease in a subject in which there has been a remission of disease. Assessing the risk of a particular disease encompasses diagnosis in which the type of disease afflicting a subject is determined.
  • Assessing the risk of a disease outcome also encompasses the concept of prognosis.
  • a prognosis may be any assessment of the risk of disease outcome in an individual in which a particular disease has been diagnosed. Assessing the risk further encompasses prediction of therapeutic response in which a treatment regimen is chosen based on the assessment. Assessing the risk also encompasses a prediction of overall survival after diagnosis.
  • Determining the level of expression that signifies a physiological or cellular characteristic may be assessed by any of a number of methods. The skilled artisan understands that numerous methods may be used to select a level of expression for a particular marker or a plurality of markers that signifies a particular physiological or cellular characteristic.
  • a threshold value may be obtained by performing the assay method on samples obtained from a population of patients having a certain type of disease (cancer for example), and from a second population of subjects that do not have the disease.
  • a population of patients, all of which have a disease such as cancer may be followed for a period of time.
  • the population may be divided into two or more groups. For example, the population may be divided into a first group of patients whose disease progresses to a particular endpoint and a second group of patients whose disease does not progress to the particular endpoint. Examples of endpoints include disease recurrence, death, metastasis or other states to which disease may progress. If expression of the marker in a sample is more similar to the predetermined expression of the marker in one group relative to the other group, the sample may be assigned a risk of having the same outcome as the patient group to which it is more similar.
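One simple way to sketch the assignment described above is to compare the sample's expression with the mean expression of each outcome group and pick the closer one. Distance to the group mean is an assumed similarity measure chosen for illustration; the source does not specify one, and the group names and values below are hypothetical.

```python
from statistics import mean


def assign_group(sample_value, group_values):
    """Return the name of the group whose mean expression is closest.

    group_values maps a group name to a list of expression values
    measured in patients of that group.
    """
    return min(
        group_values,
        key=lambda name: abs(sample_value - mean(group_values[name])),
    )


# Hypothetical expression values for two outcome groups.
groups = {
    "progressed": [8.0, 9.0, 10.0],
    "not_progressed": [2.0, 3.0, 4.0],
}
print(assign_group(7.5, groups))
```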
  • one or more levels of expression of the marker may be selected that provide an acceptable ability to signify a particular physiological or cellular characteristic. Examples of such characteristics include identifying or diagnosing a particular disease, assessing a risk of outcome or a prognostic risk, or assessing the risk that a particular treatment will or will not be effective.
  • Receiver Operating Characteristic curves, or "ROC" curves, may be calculated by plotting the value of a variable versus its relative frequency in two populations.
  • A ROC curve is a graphical plot of the sensitivity, or true positive rate (TPR), vs. false positive rate (FPR), for a binary classifier system as its discrimination threshold is varied.
  • the ROC can also be represented equivalently by plotting the fraction of true positives out of the positives (TPR) vs. the fraction of false positives out of the negatives (FPR).
  • TPR measures how well a classifier or diagnostic test correctly classifies positive instances among all positive samples available during the test.
  • FPR defines how many incorrect positive results occur among all negative samples available during the test.
  • a ROC space is defined by FPR and TPR as the x and y axes respectively, and depicts the relative trade-offs between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs (1 - specificity) plot. Each prediction result, or one instance of a confusion matrix, represents one point in the ROC space. The diagonal divides the ROC space: points above the diagonal represent good classification results, and points below it represent poor results. The best possible prediction method would yield a point in the upper left corner, at coordinate (0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives). The (0,1) point is also called a perfect classification.
  • When a threshold for classification in a binary classifier system is selected, expression of the marker in the sample above the threshold indicates the sample is similar to one group and expression of the marker below the threshold indicates the sample is similar to the other group.
  • the area under the ROC curve (AUC, "Area Under Curve") is a measure of the probability that the expression correctly indicated the similarity of the sample to the proper group.
  • ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution (Hanley et al, Radiology 143: 29-36 (1982)).
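The ROC construction described above can be sketched as follows: sweep the classification threshold over the observed scores, record one (FPR, TPR) point per threshold, and integrate with the trapezoidal rule to obtain the AUC. The function names and the example scores are illustrative assumptions; a sample is treated as positive when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points for every distinct threshold.

    labels: 1 for positive samples, 0 for negative samples.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves the point up and to the right.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores for two positives and two negatives.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)
print(auc(pts))
```

A perfectly separating set of scores yields a curve passing through (0,1) and an AUC of 1.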
  • levels of expression may be established by assessing the expression of a marker in a sample from one patient, assessing the expression of additional samples from the same patient obtained later in time, and comparing the expression of the marker from the later samples with the initial sample or samples.
  • This method may be used in the case of markers that indicate, for example, progression or worsening of disease or lack of efficacy of a treatment regimen or remission of a disease or efficacy of a treatment regimen.
  • Other methods may be used to assess how accurately the expression of a marker signifies a particular physiological or cellular characteristic. Such methods include a positive likelihood ratio, negative likelihood ratio, odds ratio, and/or hazard ratio.
  • In a likelihood ratio, the likelihood that the expression of the marker would be found in a sample with a particular cellular or physiological characteristic is compared with the likelihood that the expression of the marker would be found in a sample lacking the particular cellular or physiological characteristic.
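For a binary test, the positive and negative likelihood ratios are conventionally computed from sensitivity and specificity as LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below assumes that convention; the function name and input values are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test with 80% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.8, 0.9)
print(round(lr_pos, 2), round(lr_neg, 2))
```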
  • An odds ratio measures effect size and describes the amount of association or non-independence between two groups.
  • An odds ratio is the ratio of the odds of a marker being expressed in one set of samples versus the odds of the marker being expressed in the other set of samples.
  • An odds ratio of 1 indicates that the event or condition is equally likely to occur in both groups.
  • An odds ratio greater or less than 1 indicates that expression of the marker is more likely to occur in one group or the other, depending on how the odds ratio calculation was set up.
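The odds ratio described above can be sketched from a 2x2 table of counts: the odds of the marker being expressed in one set of samples divided by the odds in the other set. The function name and counts are hypothetical.

```python
def odds_ratio(expressed_a, not_expressed_a, expressed_b, not_expressed_b):
    """Odds of expression in set A divided by odds of expression in set B."""
    if 0 in (not_expressed_a, expressed_b, not_expressed_b):
        raise ValueError("zero count makes the odds ratio undefined")
    odds_a = expressed_a / not_expressed_a
    odds_b = expressed_b / not_expressed_b
    return odds_a / odds_b


# Hypothetical counts: 20 of 30 samples express the marker in set A,
# 10 of 30 in set B.
print(odds_ratio(20, 10, 10, 20))
```

Swapping the two sets inverts the ratio, which is why the direction of the effect depends on how the calculation was set up.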
  • a hazard ratio may be calculated by an estimate of relative risk. Relative risk is the chance that a particular event will take place. It is a ratio of the probability that an event such as development or progression of a disease will occur in samples that exceed a threshold level of expression of a marker over the probability that the event will occur in samples that do not exceed a threshold level of expression of a marker.
  • a hazard ratio may be calculated by the limit of the number of events per unit time divided by the number at risk as the time interval decreases. In the case of a hazard ratio, a value of 1 indicates that the relative risk is equal in both the first and second groups; a value greater or less than 1 indicates that the risk is greater in one group or another, depending on the inputs into the calculation.
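The ratio of probabilities described above (event probability in samples exceeding the threshold over event probability in samples not exceeding it) can be sketched as below. The function name and counts are hypothetical, and this simple ratio ignores follow-up time, which a time-to-event hazard estimate would take into account.

```python
def relative_risk(events_above, n_above, events_below, n_below):
    """Probability of the event above the threshold over that below it."""
    if n_above == 0 or n_below == 0 or events_below == 0:
        raise ValueError("relative risk is undefined for these counts")
    risk_above = events_above / n_above
    risk_below = events_below / n_below
    return risk_above / risk_below


# Hypothetical cohort: 30 of 100 high-expression subjects progress,
# 10 of 100 low-expression subjects progress.
print(round(relative_risk(30, 100, 10, 100), 2))
```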
  • threshold levels of expression may be determined. This can be the case in so-called "tertile," "quartile," or "quintile" analyses. In these methods, multiple groups can be considered together as a single population, and are divided into 3 or more bins having equal numbers of individuals. The boundaries between these "bins" may be considered threshold levels of expression indicating a particular level of risk of a disease developing or signifying a physiological or cellular state. A risk may be assigned based on which "bin" a test subject falls into. [0044] The invention contemplates assessing the expression of the marker in any biological sample from which the expression may be assessed. The type of biological sample, the subject comprising the sample, and the manner in which the sample is collected can and will vary, and are known to those skilled in the art. Numerous types of samples can be obtained from a subject to produce a miRNA expression profile.
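The tertile, quartile, or quintile binning described in the preceding paragraph can be sketched as follows. Placing each boundary at the midpoint between neighboring bins is an assumption made for illustration, as are the function names and the example values.

```python
def bin_boundaries(values, n_bins):
    """Return n_bins - 1 thresholds splitting values into equal-sized bins."""
    ordered = sorted(values)
    if n_bins < 2 or len(ordered) % n_bins != 0:
        raise ValueError("population must divide evenly into n_bins bins")
    size = len(ordered) // n_bins
    # Assumed convention: boundary is the midpoint between the last value
    # of one bin and the first value of the next.
    return [
        (ordered[i * size - 1] + ordered[i * size]) / 2.0
        for i in range(1, n_bins)
    ]


def assign_bin(value, boundaries):
    """Return the 0-based index of the bin that value falls into."""
    return sum(1 for b in boundaries if value > b)


# Hypothetical expression values for nine individuals, split into tertiles.
thresholds = bin_boundaries([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
print(thresholds)
print(assign_bin(5, thresholds))
```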
  • sample or “biological sample” is used in its broadest sense.
  • a sample may comprise a bodily fluid including whole blood, serum, plasma, urine, saliva, cerebral spinal fluid, semen, vaginal fluid, pulmonary fluid, tears, perspiration, mucus and the like; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print or any other material isolated in whole or in part from a living subject.
  • samples include, but are not limited to, tissue isolated from primates, e.g., humans, or rodents, e.g., mice, and rats.
  • Biological samples may also include sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histologic purposes such as blood, plasma, serum, sputum, stool, tears, mucus, hair, skin, and the like. Biological samples also include explants and primary and/or transformed cell cultures derived from patient tissues.
  • a biological sample is typically obtained from a eukaryotic organism, most preferably a mammal such as a primate e.g., chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird; reptile; or fish.
  • a biological sample for use is obtained in methods described in this invention.
  • samples include but are not limited to biopsy or other in vivo or ex vivo analysis of prostate, breast, skin, muscle, fascia, brain, endometrium, lung, head and neck, pancreas, small intestine, blood, liver, testes, ovaries, colon, skin, stomach, esophagus, spleen, lymph node, bone marrow, kidney, placenta, or fetus.
  • the sample comprises a fluid sample, such as peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, cerebrospinal fluid, amniotic fluid, lacrimal fluid, stool, or urine.
  • Samples include single cells, whole organs or any fraction of a whole organ, in any condition including in vitro, ex vivo, in vivo, post-mortem, fresh, fixed, or frozen.
  • the term "subject" is used in its broadest sense.
  • the subject is a mammal.
  • mammals include humans, dogs, cats, horses, cows, sheep, goats, and pigs.
  • a subject includes any human or non-human mammal, including for example: a primate, cow, horse, pig, sheep, goat, dog, cat, or rodent, capable of developing cancer including human patients that are suspected of having cancer, that have been diagnosed with cancer, or that have a family history of cancer.
  • Methods of identifying subjects suspected of having cancer include but are not limited to: physical examination, family medical history, subject medical history, endometrial biopsy, or a number of imaging technologies such as ultrasonography, computed tomography, magnetic resonance imaging, magnetic resonance spectroscopy, or positron emission tomography.
  • Cancer cells include any cells derived from a tumor, neoplasm, cancer, precancer, cell line, malignancy, or any other source of cells that have the potential to expand and grow to an unlimited degree. Cancer cells may be derived from naturally occurring sources or may be artificially created. Cancer cells may also be capable of invasion into other tissues and metastasis. Cancer cells further encompass any malignant cells that have invaded other tissues and/or metastasized. One or more cancer cells in the context of an organism may also be called a cancer, tumor, neoplasm, growth, malignancy, or any other term used in the art to describe cells in a cancerous state.
  • cancers that could serve as sources of cancer cells include solid tumors such as fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon cancer, colorectal cancer, kidney cancer, pancreatic cancer, bone cancer, breast cancer, ovarian cancer, prostate cancer, esophageal cancer, stomach cancer, oral cancer, nasal cancer, throat cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma,
  • Additional cancers that may serve as sources of cancer cells include blood borne cancers such as acute lymphoblastic leukemia ("ALL"), acute lymphoblastic B-cell leukemia, acute lymphoblastic T-cell leukemia, acute myeloblastic leukemia ("AML"), acute promyelocytic leukemia ("APL"), acute monoblastic leukemia, acute erythroleukemic leukemia, acute megakaryoblastic leukemia, acute myelomonocytic leukemia, acute nonlymphocytic leukemia, acute undifferentiated leukemia, chronic myelocytic leukemia ("CML"), chronic lymphocytic leukemia ("CLL"), hairy cell leukemia, multiple myeloma, lymphoblastic leukemia, myelogenous leukemia, lymphocytic leukemia, myelocytic leukemia, Hodgkin's disease, non-Hodgkin's Lymphoma, Walden
  • kits to be used in assessing the expression of an RNA in a subject to assess the risk of developing disease.
  • Kits include any combination of components that may facilitate the performance of an assay.
  • a kit that facilitates assessing the expression of an RNA may include suitable nucleic acid-based and immunological reagents as well as suitable buffers, control reagents, and printed protocols.
  • kits that facilitate nucleic acid based methods.
  • kits comprise one or more of the following: specific nucleic acids such as oligonucleotides, labeling reagents, enzymes including PCR amplification reagents such as Taq or Pfu, reverse transcriptase, or one or more other polymerases, and/or reagents that facilitate hybridization.
  • Specific nucleic acids may include nucleic acids, polynucleotides, oligonucleotides (DNA, or RNA), or any combination of molecules that includes one or more of the above, or any other molecular entity capable of specific binding to a nucleic acid marker.
  • the specific nucleic acid comprises one or more oligonucleotides capable of hybridizing to the marker.
  • a specific nucleic acid may include a label.
  • a label may be any substance capable of aiding a machine, detector, sensor, device, or enhanced or unenhanced human eye in differentiating a sample that displays positive expression from a sample that displays reduced expression.
  • Examples of labels include but are not limited to: a radioactive isotope or chelate thereof, a dye (fluorescent or nonfluorescent), stain, enzyme, or nonradioactive metal.
  • EDANS, psoralene derivatives, haptens, cyanines, acridines, fluorescent rhodol derivatives, cholesterol derivatives; ethylenediaminetetraacetic acid (EDTA)
  • the label includes one or more dyes optimized for use in genotyping.
  • dyes include but are not limited to: FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
  • An oligonucleotide may be any polynucleotide of at least 2 nucleotides.
  • Oligonucleotides may be less than 10, less than 15, less than 20, less than 30, less than 40, less than 50, less than 75, less than 100, less than 200, less than 500, or more than 500 nucleotides in length. While oligonucleotides are often linear, they may, depending on their sequence and storage conditions, assume a two- or three-dimensional structure. Oligonucleotides may be chemically synthesized by any of a number of methods including sequential synthesis, solid phase synthesis, or any other synthesis method now known or yet to be disclosed. Alternatively, oligonucleotides may be produced by recombinant DNA based methods. In some aspects of the invention, an oligonucleotide may be 2 to 1000 bases in length.
  • oligonucleotide may be 5 to 500 bases in length, 5 to 100 bases in length, 5 to 50 bases in length, or 10 to 30 bases in length.
  • Oligonucleotides may be directly labeled, used as primers in PCR or sequencing reactions, or affixed directly to a solid substrate as in oligonucleotide arrays among other things.
  • the probe may be affixed to a solid substrate.
  • the sample may be affixed to a solid substrate.
  • a probe or sample may be covalently bound to the substrate or it may be bound by some noncovalent interaction including electrostatic, hydrophobic, hydrogen bonding, van der Waals, magnetic, or any other interaction by which a probe such as an oligonucleotide probe may be attached to a substrate while maintaining its ability to recognize the allele to which it has specificity.
  • a substrate may be any solid or semisolid material onto which a probe may be affixed, attached or printed, either singly or in the formation of a microarray.
  • substrate materials include but are not limited to polyvinyl, polystyrene, polypropylene, polyester or any other plastic, glass, silicon dioxide or other silanes, hydrogels, gold, platinum, microbeads, micelles and other lipid formations, nitrocellulose, or nylon membranes.
  • the substrate may take any form, including a spherical bead or flat surface.
  • the probe may be bound to a substrate in the case of an array.
  • the sample may be bound to a substrate as in, for example, a Southern blot, Northern blot, or other method that affixes the sample to a substrate.
  • a kit may also contain an indication of a level of expression that signifies a particular physiological or cellular characteristic.
  • An indication includes any guide to a level of expression that, using the kit in which the indication is provided, would signal the presence or absence of any physiological or cellular state that the kit is configured to detect.
  • the indication may be expressed numerically, expressed as a color, expressed as an intensity of a band, derived from a standard curve, or derived from a control.
  • the indication may be printed on a writing that may be included in the kit or it may be posted on the internet or embedded in a software package.
  • the kit may be used to detect the presence of non-small cell lung cancer (NSCLC) in a patient.
  • NSCLC includes any carcinoma derived from lung tissues that does not include small cell lung cancers. Examples of non-small cell lung cancers include adenocarcinomas, large cell carcinomas, and squamous cell carcinomas of the lung.
  • a patient may be suspected of having NSCLC based upon symptoms such as chest pain, chronic cough, hemoptysis, dyspnea (difficulty breathing), fatigue, lung infection such as pneumonia or bronchitis, shortness of breath, swollen lymph nodes, weight loss, or wheezing.
  • a patient may be suspected of being at risk of having NSCLC based upon environmental conditions such as exposure to tobacco or other smoke, exposure to radon or characteristics such as family history, age, or other conditions.
  • RNA extraction and cel-miRNA spike-in: Total RNA was isolated from a volume of 75-200 µL of serum or plasma using phenol and guanidine thiocyanate. A total of 2 ng of cel-miR-39 was spiked in to each sample after the addition of the phenol and guanidine thiocyanate to serve as an external processing control.
  • miRNA microarray profiling: A minimum of 100 ng of total RNA from the discovery cohort was added to the GenoExplorer™ microRNA Expression System (GenoSensor Corporation, Tempe, AZ) containing probes in triplicate for 880 mature miRNAs (Sanger miRNA Registry version 13, September 2008) and 473 pre-miRNAs along with positive and negative control probes. SAM data analysis was applied to find miRNAs significantly differentially expressed in one condition in contrast to the other. Data were normalized to PC-U6B, U6-337, 5S-rRNA, and PC-HU5S. The top differentially-expressed miRNAs in serum from NSCLC patients versus controls were identified based on fold change, p-values, q-values, and false discovery rates (FDR).
  • PCR was performed on the selected miRNA candidates to validate the miRNA array results.
  • the GenoExplorer™ miRNA First-Strand cDNA Core Kit (2002-50, GenoSensor Corporation) was used to generate miRNA first-strand cDNA. miRNA expression levels were measured using SYBR Green (04887352001; Roche; Indianapolis, IN). miRNA specific forward primers and a universal reverse primer were purchased (GenoSensor Corporation). The reaction conditions were 15 minutes denaturation at 94°C followed by 45 cycles of 94°C for 30 seconds, 59°C for 15 seconds, and 72°C for 30 seconds. Melting curve analysis was used to assess the specificity of the amplified product. All qRT-PCR reactions were carried out in triplicate on the Lightcycler 480 (Roche). miRNA expression was normalized to the expression of RNU6 and cel-miR-39 separately.
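The source does not spell out how the normalization was computed. A common approach for qRT-PCR, assumed here purely for illustration, is the ΔCt method: average the triplicate Ct values, subtract the reference Ct (for example RNU6 or cel-miR-39) from the target Ct, and, if a relative quantity is wanted, take 2^(-ΔCt) under the assumption of roughly 100% amplification efficiency. The function names and Ct values below are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct across replicate wells."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(d_ct):
    """Relative quantity 2^(-dCt), assuming ~100% amplification efficiency."""
    return 2.0 ** (-d_ct)


# Hypothetical triplicate Ct values for a target miRNA and a reference.
d = delta_ct([25.0, 25.0, 25.0], [22.0, 22.0, 22.0])
print(d)
print(relative_expression(d))
```

Because the source normalizes to RNU6 and cel-miR-39 separately, a sketch like this would be run once per reference rather than combining the two.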
  • qRT-PCR data analysis: A logit regression model was fit on the PCR data with the predictors being one or more miRNAs (a predictor set therefore comprises different combinations of miRNAs).
  • a receiver operating characteristic (ROC) curve was plotted and area-under-the-curve (AUC) was computed.
  • P-values for the predictor set were also computed with the t-test and the Mann-Whitney-Wilcoxon test, using coefficients obtained from the logit regression.
  • the AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. It can be shown that the area under the ROC curve is closely related to the Mann-Whitney U, which tests whether positives are ranked higher than negatives.
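The relationship stated above can be made concrete: counting, over all positive/negative pairs, how often the positive scores higher (ties counted as one half) gives the Mann-Whitney U statistic, and dividing U by the number of pairs gives the AUC. The function names and scores in this sketch are illustrative assumptions.

```python
def mann_whitney_u(positive_scores, negative_scores):
    """U statistic: number of (positive, negative) pairs ranked correctly."""
    u = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5  # ties contribute one half
    return u


def auc_from_u(positive_scores, negative_scores):
    """AUC as U divided by the total number of positive/negative pairs."""
    pairs = len(positive_scores) * len(negative_scores)
    if pairs == 0:
        raise ValueError("need at least one positive and one negative score")
    return mann_whitney_u(positive_scores, negative_scores) / pairs


# Hypothetical model scores for two cases and two controls.
print(auc_from_u([0.9, 0.7], [0.8, 0.6]))
```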
  • Baseline characteristics for the validation cohort are shown in Table A.
  • the median age was 62 years, with a majority of men, and nearly all had a documented history of tobacco smoking.
  • the median follow-up time for the validation controls was 40.9 months (range 8.8-108.5 months).
  • Adenocarcinoma was the most common NSCLC histology in the validation cohort.
  • miRNA profiling analysis reveals differentially-expressed miRNAs. miRNA profiling was performed on 24 samples of the discovery cohort (including two separate serum samples from controls #1075 and #1462). Four differentially-expressed miRNAs between the NSCLC patients and the controls were chosen as the most likely to stratify NSCLC patients from controls based on the fold change, p-value, q-value, and FDR (see Table C). The median signal intensity was normalized to the expression of PC-U6B, U6-337, 5S-rRNA, and PC-HU5S, which were used for analysis control. In Table C, LCS stands for lung cancer serum, CS stands for control serum.
  • FIG. 1 shows that markers miR-1254 and miR-574-5p together classify samples with an AUC of 0.77, with 82% sensitivity and 77% specificity.
  • Data in Figure 1 were normalized to expression of cel-miR-39. The data showed that qRT-PCR expression of miR-1254 and miR-574-5p was able to stratify early stage NSCLC patients from controls with an AUC (area-under-the-curve) of 0.77 and a sensitivity and specificity of 82% and 77%, respectively. These data demonstrated the existence of significantly differentially-expressed serum-based miRNAs between NSCLC patients and controls.
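Sensitivity and specificity figures such as those reported above are derived from the counts of true and false calls at a chosen threshold. The sketch below shows the arithmetic; the function name and counts are hypothetical and are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN), the fraction of cases detected
    specificity = TN / (TN + FP), the fraction of controls correctly cleared
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts: 9 of 10 cases detected, 8 of 10 controls cleared.
print(sensitivity_specificity(tp=9, fn=1, tn=8, fp=2))
```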

Abstract

The invention pertains to the use of microRNA methods and kits to identify subjects with early-stage non-small cell lung cancer using serum-based miRNA biomarkers.

Description

METHODS AND KITS USEFUL IN DIAGNOSING NSCLC
CROSS REFERENCE
[0001] This application is related to and claims the priority benefit of U.S. provisional application 61/347,191, filed on May 21, 2010, the teachings and content of which are incorporated by reference herein.
FIELD OF INVENTION
[0002] The present invention relates to methods, kits, and miRNA markers for classifying a sample into a group of early-stage Non-Small Cell Lung Cancer (NSCLC).
BACKGROUND OF THE INVENTION
[0003] Lung cancer is the most common cause of cancer-related death in the world. In the United States, over 90 million individuals are at risk for developing lung cancer and this disease is estimated to remain a major health problem for at least the next 50 years. Lung cancer is often insidious, and it may produce no symptoms until the disease is well advanced. Approximately 7-10% of patients with lung cancer are asymptomatic. Over 75% of lung cancer cases are diagnosed at an advanced stage, conferring a poor prognosis, because there is no practical way to screen the large number of people at risk. Therefore, the need to detect and diagnose lung cancer at an early and potentially curable stage is great.
[0004] NSCLC (Non-Small Cell Lung Cancer), which accounts for 85% of lung cancers, is the most common type of lung cancer. NSCLC usually grows and spreads more slowly than small cell lung cancer (SCLC), but like other lung cancers it has a very poor prognosis. Because of the importance of stage in the therapeutic decision-making process, all patients with NSCLC must be staged adequately. There are four histological forms of NSCLC: (1) Adenocarcinomas, which are often found in an outer area of the lung; (2) Bronchioloalveolar carcinomas, which are in the distal bronchioles or alveoli of the lung; (3) Squamous cell carcinomas, which are usually found in the center of the lung by an air tube (bronchus); (4) Large cell carcinomas, which can occur in any part of the lung and tend to grow and spread faster than the other three types. Non-small cell lung cancer is divided into five stages: Stage 0: the cancer has not spread beyond the inner lining of the lung; Stage I: the cancer is small and hasn't spread to the lymph nodes; Stage II: the cancer has spread to some lymph nodes near the original tumor; Stage III: the cancer has spread to nearby tissue or spread to far away lymph nodes; Stage IV: the cancer has spread to other organs of the body such as the other lung, brain, or liver. Stages 0-II are deemed early-stage NSCLC, which is critical for treatment and survival.
However, the existing technology for detecting and diagnosing NSCLC based on clinical parameters alone, such as computerized tomography (CT) screening of at-risk populations, has several drawbacks including: (a) detection of benign lung nodules at a rate of up to 50%; and (b) for every lung cancer death prevented by CT screening, approximately two false-positive invasive procedures will result. It is, therefore, crucial to develop a biomarker for use in NSCLC screening. Such a marker would increase the probability of earlier NSCLC detection, facilitate the therapeutic decision-making process, lower the risk of potential patient harm, and increase efficiency both in terms of cost and resource utilization.
BRIEF SUMMARY OF THE INVENTION
[0005] One aspect of the invention provides a method of classifying a subject into a group. The method generally comprises the steps of receiving a sample containing RNA from the subject; adding a first reagent comprising a first oligonucleotide capable of specific binding to a marker including a sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4 to a mixture comprising the RNA; and subjecting the mixture to conditions that allow detection of the binding of the first reagent to the marker; and classifying the subject into a group based on the expression level of the marker in comparison to the level of the same marker in a control. The general method may further comprise the step of adding reverse transcriptase to the mixture allowing the formation of a DNA template comprising the nucleic acid sequence of the marker. In some forms of the invention, the general methods may further comprise the step of adding a second oligonucleotide and a third oligonucleotide to the mixture, wherein the second oligonucleotide and the third oligonucleotide bind to opposite strands of the DNA template comprising the nucleic acid sequence of the marker allowing nucleic acid amplification. In other forms of the invention, the general method may further comprise the step of adding a fourth oligonucleotide to the mixture wherein the fourth oligonucleotide binds to the DNA template between the sequences to which the second oligonucleotide and the third oligonucleotide are capable of binding. Preferably, the fourth nucleic acid comprises a label. In some forms of the invention, the label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ. The sample in this general method is a biological sample. Preferably, the sample comprises bodily fluid selected from a group consisting of whole blood, plasma and serum.
In some forms of the general method, the sample is serum. Additionally, the marker in the general method is a classifier for identifying a subject at risk of or having non-small cell lung cancer (NSCLC). Specifically, higher expression of the marker in the subject in comparison to the expression of the marker in a control classifies the subject into a group of early stage NSCLC. The marker in the general method is selected from the group consisting of miR-574-5p represented by SEQ ID NO: 1, miR-1254 represented by SEQ ID NO: 2, miR-1268 represented by SEQ ID NO: 3, and miR-1228 represented by SEQ ID NO: 4. Preferably, the marker is miR-574-5p represented by SEQ ID NO: 1. Alternatively, the preferred marker is miR-1254 represented by SEQ ID NO: 2. In some forms of the general method, the subject is a mammal. Preferably, the subject is a human.
[0006] Another aspect of the invention generally provides a kit for facilitating the classification of a subject into a group. Generally, the kit comprises a first reagent comprising a first oligonucleotide capable of specific binding to a marker that includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4. In some forms of the kit, the kit may further comprise a second oligonucleotide and a third oligonucleotide, wherein the second oligonucleotide and the third oligonucleotide are capable of binding to opposite strands of a DNA construct comprising the reverse transcription product of the marker. In another form of the kit, the kit may further comprise a fourth oligonucleotide capable of binding to a sequence between the sequences to which the second oligonucleotide and the third oligonucleotide are capable of binding. Preferably, the fourth oligonucleotide will comprise a label. In some forms of the invention, the label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ. In some forms of the invention, the kit may further comprise an indication of a result that signifies classification of the sample to a group of early stage NSCLC. The result provided by the components of the kit can be in a form selected from the group consisting of a ΔCt value and a nucleic acid sequence including a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4. In some forms of the invention, the indication of the kit is selected from the group consisting of a positive control, a writing, and software configured to receive the result as input and to identify the sample as one in the early stage NSCLC group.
[0007] Still another aspect of the invention provides an isolated sequence having at least 80%, more preferably at least 85%, still more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98%, and even more preferably at least 99% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, and reverse transcription and amplification products thereof. The isolated sequence may further comprise a label attached to said sequence. Preferred labels are as described above.
[0008] Other aspects and features of the disclosure are described more thoroughly below.
BRIEF DESCRIPTION OF THE FIGURES
[0009] Figure 1 depicts miR-1254 and miR-574-5p ROC curve using the discovery cohort, and showing that marker miR-1254 and miR-574-5p have an AUC (Area Under ROC Curve) of 0.77, that is 77% probability in classification, and had 82% sensitivity and 77% specificity; and
[0010] Figure 2 depicts miR-1254 and miR-574-5p ROC curve using the validation cohort, and showing that marker miR-1254 and miR-574-5p have an AUC of 0.75 and a 73% and 71% sensitivity and specificity, respectively.
DETAILED DESCRIPTION OF THE INVENTION
[0011] The present invention provides miRNA markers for NSCLC early detection. Specifically, the miRNA markers are serum-based microRNAs that are differentially expressed between patients with early-stage NSCLC and a control group. The ability to diagnose NSCLC at an early stage leads to improved survival due to early intervention and treatment.
I. miRNA and Markers
[0012] miRNAs have been shown to be a major new class of regulatory gene products. For example, in human heart, liver or brain, miRNAs play a role in tissue specification or cell lineage decisions. In addition, miRNAs influence a variety of processes, including early development, cell proliferation and cell death, apoptosis, and fat metabolism. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic disease. Differences in miRNA expression have also been found to be associated with cancer diagnosis, prognosis, and susceptibility to treatments.
[0013] A mature miRNA is typically an 18-25 nucleotide non-coding RNA that regulates expression of mRNA including sequences complementary to the miRNA. These small RNA molecules are known to control gene expression by regulating the stability and/or translation of mRNAs. For example, miRNAs bind to the 3' UTR of target mRNAs and suppress translation. miRNAs may also bind to target mRNAs and mediate gene silencing through the RNAi pathway. miRNAs may also regulate gene expression by causing chromatin condensation.
[0014] Endogenously expressed miRNAs are processed by endonucleolytic cleavage from larger double-stranded RNA precursor molecules. The resulting small single-stranded miRNAs are incorporated into a multi-protein complex, termed RISC. The small RNA in RISC provides sequence information that is used to guide the RNA-protein complex to its target RNA molecules. The degree of complementarity between the small RNA and its target determines the fate of the bound mRNA. Perfect pairing induces target RNA cleavage, as is the case for siRNAs and most plant miRNAs. In comparison, imperfect pairing in the central part of the duplex leads to a block in translation.
[0015] miRNAs regulate various biological functions including developmental processes, developmental timing, cell proliferation, neuronal gene expression and cell fate, apoptosis, tissue growth, viral pathogenesis, brain morphogenesis, muscle differentiation, stem cell division and progression of human diseases. Many miRNAs are conserved in sequence and function between distantly related organisms. However, condition-specific, time-specific, and individual-specific levels of gene expression may be due to the interactions of different miRNAs which lead to genetic expression of various traits. miRNA genetic alterations, such as deletions, insertions, reversions or conversions, may affect the accuracy or specificity of miRNA-related gene regulation. miRNA genetic alterations may be used as markers for disease prognosis and diagnosis. miRNA alleles may alternatively be used as targets for disease treatment, and as markers for disease prognosis and diagnosis.
[0016] miRNA may be amplified by any of a number of techniques including reverse transcription followed by PCR. Some techniques of reverse transcription of miRNA use a targeted stem-loop primer to prime reverse transcription of the miRNA into a cDNA template. The cDNA may then be used as a template for any type of PCR including any type of quantitative PCR. A stem-loop oligonucleotide is a single stranded oligonucleotide that includes a sequence capable of binding to a specific marker because it includes a nucleic acid sequence complementary to the marker. The sequence complementary to the marker is flanked by inverted repeats that form self-complementary sequences. Such oligonucleotides may contain a fluorophore quencher pair at the 5' and 3' ends of the oligonucleotide.
[0017] A marker may be any molecular structure produced by a cell, expressed inside the cell, accessible on the cell surface, or secreted by the cell. A marker may be any protein, carbohydrate, fat, nucleic acid, catalytic site, or any combination of these, such as an enzyme, glycoprotein, cell membrane, virus, cell, organ, organelle, or any uni- or multimolecular structure or any other such structure now known or yet to be disclosed whether alone or in combination. A marker may also be called a target and the terms are used interchangeably. A marker may be represented by the sequence of a nucleic acid from which it can be derived or any other chemical structure. Examples of such nucleic acids include miRNA, tRNA, siRNA, mRNA, cDNA, or genomic DNA sequences including complementary sequences. Alternatively, a marker may be represented by a protein sequence. The concept of a marker is not limited to the products of the exact nucleic acid sequence or protein sequence by which it may be represented. Rather, a marker encompasses all molecules that may be detected by a method of assessing the expression of the marker. In one preferred embodiment of the invention, the marker is an isolated miRNA.
[0018] When an isolated nucleic acid includes a particular sequence, the sequence may be a part of a longer nucleic acid or may be the entirety of that sequence, which is an identified marker associated with a specific phenotype or a trait. The isolated nucleic acid may contain nucleotides at the 5' end, 3' end of that longer sequence, or both. Thus the concept of a nucleic acid including a particular sequence of a marker further encompasses nucleic acids that contain less than the full sequence of the marker sequence but are still capable of specifically detecting the alleles of the marker.
[0019] The term "allele" encompasses any form of a particular nucleic acid that may be recognized as a form of that particular nucleic acid on account of its location, sequence, epigenetic modification or any other characteristic that may identify it as being a specific form of that particular nucleic acid. Nonlimiting examples of alleles include various forms of a gene that include point mutations, silent mutations, deletions, frameshift mutations, single nucleotide polymorphisms (SNPs), inversions, translocations, heterochromatic insertions, and differentially methylated sequences relative to a reference gene sequence, whether alone or in combination. Different alleles may or may not encode proteins or peptides. Different alleles may differ in expression level, pattern, temporal or spatial specificity, and expression regulation. In the case of encoded proteins, the protein from different alleles may or may not be functional. Further, the protein may be gain-of-function, loss-of-function, or with altered function. An allele may also be called a mutation or a mutant. An allele may be compared to another allele that may be termed a wild type form of an allele. In some cases, the wild type allele is more common than the mutant. One aspect of the present invention provides that a miRNA marker represented by a particular sequence or structure includes its alleles comprising insertions, deletions, truncations, alternative splicing derivatives, differentially modified sequences, and altered targeting specificity. The different alleles of a miRNA marker may be in circulating or non-circulating forms in some cases, and in soluble or insoluble forms in other cases.
[0020] Nucleic acid sequences, including the sequence of a miRNA molecule, may be identified by the IUPAC letter code which is as follows: A - Adenine base; C - Cytosine base; G - Guanine base; T or U - Thymine or Uracil base; M - A or C; R - A or G; W - A or T; S - C or G; Y - C or T; K - G or T; V - A or C or G; H - A or C or T; D - A or G or T; B - C or G or T; N or X - A or C or G or T. Note that T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA. A sequence having less than 60%, less than 70%, less than 80%, less than 90%, less than 95%, less than 99% or 100% identity to the identifying sequence may still be encompassed by the invention if it is capable of binding to its complementary sequence and/or facilitating nucleic acid amplification of a desired target sequence. If a sequence is represented in degenerate form, for example through the use of codes other than A, C, G, T, or U, the concept of a nucleic acid including the sequence also encompasses a mixture of nucleic acids of different sequences that still meet the conditions imposed by the degenerate sequence.
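As an illustration of how a degenerate sequence stands for a mixture of concrete sequences, the following minimal Python sketch expands the letter code listed above. The function name expand_degenerate and the example input are hypothetical and are not taken from the disclosure; U is mapped to T so that the output uses a DNA alphabet.

```python
from itertools import product

# Letter code as listed in the paragraph above (DNA alphabet; U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT", "X": "ACGT",
}

def expand_degenerate(seq):
    """Return every concrete sequence consistent with a degenerate sequence.

    Hypothetical helper for illustration only. The number of results grows
    exponentially with the number of degenerate positions.
    """
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

if __name__ == "__main__":
    # "ACRT" is an arbitrary example, not one of SEQ ID NO: 1-4.
    print(expand_degenerate("ACRT"))
```

Under these assumptions, a sequence with one R position corresponds to a mixture of two nucleic acids, one with A and one with G at that position.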
[0021] "Sequence Identity" as it is known in the art refers to a relationship between two or more polypeptide sequences, or two or more polynucleotide sequences, namely a reference sequence, and a given sequence to be compared with the reference sequence. Sequence identity is determined by comparing the given sequence to the reference sequence after the sequences have been optimally aligned to produce the highest degree of sequence similarity, as determined by the match between strings of such sequences. Upon such alignment, sequence identity is ascertained on a position-by-position basis, e.g., the sequences are "identical" at a particular position if at that position, the nucleotides or amino acid residues are identical. The total number of such position identities is then divided by the total number of nucleotides or residues in the reference sequence to give % sequence identity. Sequence identity can be readily calculated by known methods, including but not limited to, those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H. G., eds., Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press (1987); Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M. Stockton Press, New York (1991); and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988), the teachings of which are incorporated herein by reference. Preferred methods to determine the sequence identity are designed to give the largest match between the sequences tested. Methods to determine sequence identity are codified in publicly available computer programs which determine sequence identity between given sequences.
Examples of such programs include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research, 12(1):387 (1984)), BLASTP, BLASTN and FASTA (Altschul, S. F. et al., J. Molec. Biol., 215:403-410 (1990)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI NLM NIH Bethesda, MD 20894, Altschul, S. F. et al., J. Molec. Biol., 215:403-410 (1990), the teachings of which are incorporated herein by reference). These programs optimally align sequences using default gap weights in order to produce the highest level of sequence identity between the given and reference sequences. As an illustration, by a polynucleotide having a nucleotide sequence having at least, for example, 85%, preferably 90%, even more preferably 95% "sequence identity" to a reference nucleotide sequence, it is intended that the nucleotide sequence of the given polynucleotide is identical to the reference sequence except that the given polynucleotide sequence may include up to 15, preferably up to 10, even more preferably up to 5 point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, in a polynucleotide having a nucleotide sequence having at least 85%, preferably 90%, even more preferably 95% identity relative to the reference nucleotide sequence, up to 15%, preferably 10%, even more preferably 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 15%, preferably 10%, even more preferably 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
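The position-by-position calculation described above can be sketched as follows. This is a minimal illustration only: it assumes the two sequences have already been optimally aligned by a separate program (for example, one of those named above) and that gaps are written as "-". The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_reference, aligned_given, gap="-"):
    """Percent sequence identity of a given sequence to a reference sequence.

    Both inputs are rows of an existing pairwise alignment (equal length,
    gaps written as `gap`). Identical positions are counted and divided by
    the number of nucleotides or residues in the reference sequence, as in
    the definition above. Alignment itself is not performed here.
    """
    if len(aligned_reference) != len(aligned_given):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in aligned_reference if r != gap)
    if reference_length == 0:
        raise ValueError("reference sequence is empty")
    identities = sum(
        1
        for r, g in zip(aligned_reference, aligned_given)
        if r == g and r != gap
    )
    return 100.0 * identities / reference_length

if __name__ == "__main__":
    # Arbitrary 10-nucleotide example with a single substitution.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

One substitution in a 10-nucleotide reference leaves 9 of 10 positions identical, consistent with the "point mutations per each 100 nucleotides" reading given above.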
[0022] However, in contrast to "sequence identity", conservative amino acid substitutions are counted as a match when determining sequence homology. "Sequence homology", as used herein, refers to a method of determining the relatedness of two sequences. To determine sequence homology, two or more sequences are optimally aligned, and gaps are introduced if necessary. In other words, to obtain a polypeptide or polynucleotide having 95% sequence homology with a reference sequence, 85%, preferably 90%, even more preferably 95% of the amino acid residues or nucleotides in the reference sequence must match or comprise a conservative substitution with another amino acid or nucleotide, or a number of amino acids or nucleotides up to 15%, preferably up to 10%, even more preferably up to 5% of the total amino acid residues or nucleotides, not including conservative substitutions, in the reference sequence may be inserted into the reference sequence. Preferably, the homolog sequence comprises at least a stretch of 50, even more preferably at least 100, even more preferably at least 250, and even more preferably at least 500 nucleotides. A "conservative substitution" refers to the substitution of an amino acid residue or nucleotide with another amino acid residue or nucleotide having similar characteristics or properties including size, hydrophobicity, etc., such that the overall functionality does not change significantly.
[0023] "Isolated" means altered by the hand of man from its natural state, i.e., if it occurs in nature, it has been changed or removed from its original environment, or both. For example, a polynucleotide or polypeptide naturally present in a living organism is not "isolated," but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is "isolated", as the term is employed herein.
II. Methods for Detecting Marker Expression
[0024] The concept of the expression of a marker encompasses any and all processes producing biological molecules or materials derived from the nucleic acid form of the marker. Expression thus includes processes such as RNA transcription, mRNA splicing, protein translation, protein folding, post-translational modification, membrane transport, associations with other molecules, addition of carbohydrate moieties to proteins, phosphorylation, protein complex formation and any other process along a continuum that results in biological material derived from genetic material whether in vitro, in vivo, or ex vivo. Expression also encompasses all processes through which the production of material derived from a nucleic acid template may be actively or passively suppressed. Such processes include all aspects of transcriptional and translational regulation. Examples include heterochromatic silencing, transcription factor inhibition, any form of RNAi silencing, microRNA silencing, alternative splicing, protease digestion, posttranslational modification, and alternative protein folding.
[0025] Expression may be assessed by any number of methods used to detect material derived from a nucleic acid template, whether used currently in the art or yet to be developed. Examples of such methods include any nucleic acid detection method including the following nonlimiting examples: microarray analysis, RNA in situ hybridization, RNAse protection assay, Northern blot, reverse transcriptase PCR, quantitative PCR, quantitative reverse transcriptase PCR, quantitative real-time reverse transcriptase PCR, reverse transcriptase treatment followed by direct sequencing, direct sequencing of genomic DNA, or any other method of detecting a specific nucleic acid now known or yet to be disclosed. Other examples include any process of assessing protein expression including flow cytometry, immunohistochemistry, ELISA, Western blot, immunoaffinity chromatography, HPLC, mass spectrometry, protein microarray analysis, PAGE analysis, isoelectric focusing, 2-D gel electrophoresis, or any enzymatic assay or any method that uses a protein reagent, nucleic acid reagent, or other reagent capable of specifically binding to or otherwise recognizing a specific nucleic acid or protein marker.
[0026] Other methods used to assess expression include the use of natural or artificial ligands capable of specifically binding a marker. Such ligands include antibodies, antibody complexes, conjugates, natural ligands, small molecules, nanoparticles, or any other molecular entity capable of specific binding to a marker. The term "antibody" is used herein in the broadest sense and refers generally to a molecule that contains at least one antigen binding site that immunospecifically binds to a particular antigen target of interest. Antibody thus includes but is not limited to native antibodies and variants thereof, fragments of native antibodies and variants thereof, peptibodies and variants thereof, and antibody mimetics that mimic the structure and/or function of an antibody or a specified fragment or portion thereof, including single chain antibodies and fragments thereof. The term thus includes full length antibodies and/or their variants as well as immunologically active fragments thereof, thus encompassing antibody fragments capable of binding to a biological molecule (such as an antigen or receptor) or portions thereof, including but not limited to Fab, Fab', F(ab')2, facb, pFc', Fd, Fv or scFv (See, e.g., CURRENT PROTOCOLS IN IMMUNOLOGY (Colligan et al., eds., John Wiley & Sons, Inc., NY, 1994-2001)).
[0027] Antibodies may be monoclonal, polyclonal, or any antibody fragment including a Fab, F(ab)2, Fv, scFv, phage display antibody, peptibody, multispecific ligand, or any other reagent with specific binding to a marker. Ligands may be associated with a label such as a radioactive isotope or chelate thereof, dye (fluorescent or nonfluorescent), stain, enzyme, metal, or any other substance capable of aiding a machine or a human eye in differentiating a cell expressing a marker from a cell not expressing a marker. Additionally, expression may be assessed by monomeric or multimeric ligands associated with substances capable of killing the cell. Such substances include protein or small molecule toxins, cytokines, pro-apoptotic substances, pore forming substances, radioactive isotopes, or any other substance capable of killing a cell.
[0028] Differential expression encompasses any detectable difference between the expression of a marker in one sample relative to the expression of the marker in another sample. Differential expression may be assessed by a detector, an instrument containing a detector, by aided or unaided human eye, or any other method that may detect differential expression. Examples include but are not limited to differential staining of cells in an IHC assay configured to detect a marker, differential detection of bound RNA on a microarray to which a sequence capable of binding to the marker is bound, differential results in RT-PCR measured in ΔCt or alternatively in the number of PCR cycles necessary to reach a particular optical density at a wavelength at which a double stranded DNA binding dye (e.g., SYBR Green) incorporates, differential results in measuring label from a reporter probe used in an RT-PCR reaction, differential detection of fluorescence labels on cells using a flow cytometer, differential intensities of bands in a Northern blot, differential intensities of bands in an RNAse protection assay, differential cell death measured by apoptotic markers, differential cell death measured by shrinkage of a tumor, or any method that allows a detection of a difference in signal between one sample or set of samples and another sample or set of samples.
[0029] A label used to facilitate the differential expression detection may be any substance capable of aiding a machine, detector, sensor, device, or enhanced or unenhanced human eye in differentiating a labeled composition from an unlabeled composition. Examples of labels include but are not limited to: a radioactive isotope or chelate thereof, dye (fluorescent or nonfluorescent), stain, enzyme, or nonradioactive metal. Specific examples include but are not limited to: fluorescein, biotin, digoxigenin, alkaline phosphatase, streptavidin, 3H, 14C, 32P, 35S, or any other compound capable of emitting radiation, rhodamine, 4-(4'-dimethylamino-phenylazo)benzoic acid ("Dabcyl"), 4-(4'-dimethylamino-phenylazo)sulfonic acid (sulfonyl chloride) ("Dabsyl"), 5-((2-aminoethyl)-amino)-naphthalene-1-sulfonic acid ("EDANS"), Psoralene derivatives, haptens, cyanines, acridines, fluorescent rhodol derivatives, cholesterol derivatives, ethylenediaminetetraacetic acid ("EDTA") and derivatives thereof, or any other compound that may be differentially detected. The label may also include one or more fluorescent dyes optimized for use in genotyping. Examples of such dyes include but are not limited to: dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ+, Gold 540, MGB and LIZ.
[0030] The expression of the marker in a sample may be compared to a level of expression predetermined to predict the presence or absence of a particular cellular or physiological characteristic. The level of expression may be derived from a single control or a set of controls. A control may be any sample with a previously determined level of expression. A control may comprise material within the sample or material from sources other than the sample. Alternatively, the expression of a marker in a sample may be compared to a control that has a level of expression predetermined to signal or not signal a cellular or physiological characteristic. This level of expression may be derived from a single source of material including the sample itself or from a set of sources. Comparison of the expression of the marker in the sample to a particular level of expression results in a prediction that the sample exhibits or does not exhibit the cellular or physiological characteristic.
[0031] Prediction of a cellular or physiological characteristic includes the prediction of any cellular or physiological state that may be predicted by assessing the expression of a marker. Examples include the identity of a cell as a particular cell including a particular normal or cancer cell type, the likelihood that one or more diseases is present or absent, the likelihood that a present disease will progress, remain unchanged, or regress, the likelihood that a disease will respond or not respond to a particular therapy, or any other disease outcome. Further examples include the likelihood that a cell will move, senesce, apoptose, differentiate, metastasize, or change from any state to any other state or maintain its current state.
[0032] Expression of a marker in a sample may be more or less than that of a level predetermined to predict the presence or absence of a cellular or physiological characteristic. The expression of the marker in the sample may be more than 1,000,000x, more than 100,000x, more than 10,000x, more than 1000x, more than 100x, more than 10x, more than 5x, more than 2x, more than 1x, less than 1x, less than 0.5x, less than 0.1x, less than 0.01x, less than 0.001x, less than 0.0001x, less than 0.00001x, less than 0.000001x, less than 0.0000001x or any value more or less than that of a level predetermined to predict the presence or absence of a cellular or physiological characteristic.
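One common way to express such a fold difference from quantitative RT-PCR data is sketched below in Python. The sketch is an assumption-laden illustration and not a method stated in this disclosure: it takes ΔCt as the marker Ct minus the Ct of a reference (normalizer) RNA, and it assumes that the amount of product doubles with each PCR cycle. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_marker, ct_reference):
    """Delta-Ct: marker Ct minus the Ct of a reference (normalizer) RNA."""
    return ct_marker - ct_reference

def fold_difference(delta_ct_sample, delta_ct_control):
    """Fold difference in marker expression, sample relative to control.

    Assumes product doubles each cycle, so a Delta-Ct that is lower by one
    cycle corresponds to roughly twice the starting amount of marker.
    """
    return 2.0 ** -(delta_ct_sample - delta_ct_control)

if __name__ == "__main__":
    # Hypothetical Ct values chosen only to show the arithmetic.
    sample = delta_ct(ct_marker=25.0, ct_reference=20.0)
    control = delta_ct(ct_marker=27.0, ct_reference=20.0)
    print(fold_difference(sample, control))
```

With these hypothetical values the sample ΔCt is two cycles lower than the control ΔCt, which under the doubling assumption corresponds to "more than 2x" expression in the sample.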
[0033] One type of cellular or physiological characteristic is the risk that a particular disease outcome will occur. Assessing this risk includes the performing of any type of test, assay, examination, result, readout, or interpretation that correlates with an increased or decreased probability that an individual has had, currently has, or will develop a particular disease, disorder, symptom, syndrome, or any condition related to health or bodily state. Examples of disease outcomes include, but need not be limited to survival, death, progression of existing disease, remission of existing disease, initiation of onset of a disease in an otherwise disease-free subject, or the continued lack of disease in a subject in which there has been a remission of disease. Assessing the risk of a particular disease encompasses diagnosis in which the type of disease afflicting a subject is determined. Assessing the risk of a disease outcome also encompasses the concept of prognosis. A prognosis may be any assessment of the risk of disease outcome in an individual in which a particular disease has been diagnosed. Assessing the risk further encompasses prediction of therapeutic response in which a treatment regimen is chosen based on the assessment. Assessing the risk also encompasses a prediction of overall survival after diagnosis.
[0034] Determining the level of expression that signifies a physiological or cellular characteristic may be assessed by any of a number of methods. The skilled artisan understands that numerous methods may be used to select a level of expression for a particular marker or a plurality of markers that signifies a particular physiological or cellular characteristic. In diagnosing the presence of a disease, a threshold value may be obtained by performing the assay method on samples obtained from a population of patients having a certain type of disease (cancer for example), and from a second population of subjects that do not have the disease. In assessing disease outcome or the effect of treatment, a population of patients, all of which have a disease such as cancer, may be followed for a period of time. After the period of time expires, the population may be divided into two or more groups. For example, the population may be divided into a first group of patients whose disease progresses to a particular endpoint and a second group of patients whose disease does not progress to the particular endpoint. Examples of endpoints include disease recurrence, death, metastasis or other states to which disease may progress. If expression of the marker in a sample is more similar to the predetermined expression of the marker in one group relative to the other group, the sample may be assigned a risk of having the same outcome as the patient group to which it is more similar.
[0035] In addition, one or more levels of expression of the marker may be selected that provide an acceptable ability to signify a particular physiological or cellular characteristic. Examples of such characteristics include identifying or diagnosing a particular disease, assessing a risk of outcome or a prognostic risk, or assessing the risk that a particular treatment will or will not be effective.
[0036] For any particular marker, a distribution of marker expression levels for subjects with and without a disease may overlap. The area of overlap indicates where the test cannot distinguish the two groups. This indicates that the test does not distinguish between the two populations with complete accuracy. Therefore, Receiver Operating Characteristic curves, or "ROC" curves, may be calculated by plotting the value of a variable versus its relative frequency in two populations. A ROC curve is a graphical plot of the sensitivity, or true positive rate (TPR), vs. the false positive rate (FPR), for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives out of the positives (TPR) vs. the fraction of false positives out of the negatives (FPR). TPR measures how well a classifier or a diagnostic test classifies positive instances correctly among all positive samples available during the test. FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test.
[0037] A ROC space is defined by FPR and TPR as x and y axes respectively, which depicts relative trade-offs between true positive (benefits) and false positive (costs). Since TPR is equivalent with sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs (1 - specificity) plot. Each prediction result or one instance of a confusion matrix represents one point in the ROC space. The diagonal divides the ROC space. Points above the diagonal represent good classification results, points below the line poor results. The best possible prediction method would yield a point in the upper left corner or coordinate (0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives). The (0,1) point is also called a perfect classification.
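The construction of points in ROC space can be sketched as follows. This minimal Python illustration assumes that higher marker expression indicates the positive group, consistent with the classification rule in the summary above. The function name roc_points and the expression values are hypothetical and are not data from the cohorts described herein.

```python
def roc_points(scores, is_positive):
    """Return (FPR, TPR) points obtained by sweeping the threshold.

    scores: marker expression values, one per sample.
    is_positive: True where the sample truly belongs to the positive group.
    A sample is called positive when its score is >= the threshold.
    """
    positives = sum(1 for y in is_positive if y)
    negatives = len(is_positive) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points

if __name__ == "__main__":
    # Hypothetical expression values; the two groups separate perfectly here.
    for fpr, tpr in roc_points([0.9, 0.8, 0.4, 0.2], [True, True, False, False]):
        print(fpr, tpr)
```

In this perfectly separated example one threshold yields the point (0, 1), the perfect classification described above. Overlapping distributions would move the curve toward the diagonal.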
[0038] When a threshold for classification in a binary classifier system is selected, expression of the marker in the sample above the threshold indicates the sample is similar to one group and expression of the marker below the threshold indicates the sample is similar to the other group. The area under the ROC curve (AUC, "Area Under Curve") is a measure of the probability that the expression correctly indicated the similarity of the sample to the proper group. ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution (Hanley et al, Radiology 143: 29-36 (1982)).
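As an illustration of the ROC construction described above, the following minimal Python sketch sweeps a classification threshold over marker expression values, records one (FPR, TPR) point per threshold, and integrates the curve with the trapezoidal rule. The function names and the expression values are hypothetical and are not taken from this disclosure; the sketch also assumes that higher expression indicates the positive group.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold downward.

    scores: marker expression values (assumed: higher = more like the positive group).
    labels: 1 for the positive group, 0 for the negative group.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical expression values, not data from this study.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

A point at (0, 1) on such a curve corresponds to the perfect classification described in paragraph [0037], and an AUC of 0.5 corresponds to the diagonal.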
[0039] Additionally, levels of expression may be established by assessing the expression of a marker in a sample from one patient, assessing the expression of additional samples from the same patient obtained later in time, and comparing the expression of the marker from the later samples with the initial sample or samples. This method may be used in the case of markers that indicate, for example, progression or worsening of disease or lack of efficacy of a treatment regimen or remission of a disease or efficacy of a treatment regimen.
[0040] Other methods may be used to assess how accurately the expression of a marker signifies a particular physiological or cellular characteristic. Such methods include a positive likelihood ratio, negative likelihood ratio, odds ratio, and/or hazard ratio. In the case of a likelihood ratio, the likelihood that the expression of the marker would be found in a sample with a particular cellular or physiological characteristic is compared with the likelihood that the expression of the marker would be found in a sample lacking the particular cellular or physiological characteristic.
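The likelihood ratios mentioned above are commonly computed from sensitivity and specificity. The sketch below uses the standard definitions (positive likelihood ratio = sensitivity / (1 - specificity); negative likelihood ratio = (1 - sensitivity) / specificity). The function name and the input values are hypothetical and chosen only for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Assumes 0 < specificity < 1 so that neither denominator is zero.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Illustrative inputs only (hypothetical values).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.80, specificity=0.75)
print(round(lr_pos, 2), round(lr_neg, 2))
```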
[0041] An odds ratio measures effect size and describes the amount of association or non-independence between two groups. An odds ratio is the ratio of the odds of a marker being expressed in one set of samples versus the odds of the marker being expressed in the other set of samples. An odds ratio of 1 indicates that the event or condition is equally likely to occur in both groups. An odds ratio greater or less than 1 indicates that expression of the marker is more likely to occur in one group or the other depending on how the odds ratio calculation was set up.
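A minimal sketch of the odds ratio calculation follows, assuming each sample has been scored simply as expressing or not expressing the marker. The function name and the counts are hypothetical. Swapping the two groups inverts the ratio, which is the sense in which the direction depends on how the calculation is set up.

```python
def odds_ratio(expressed_a, not_expressed_a, expressed_b, not_expressed_b):
    """Odds of marker expression in group A divided by the odds in group B."""
    odds_a = expressed_a / not_expressed_a
    odds_b = expressed_b / not_expressed_b
    return odds_a / odds_b


# Hypothetical counts, not data from this study.
forward = odds_ratio(15, 5, 8, 12)    # group A over group B
reverse = odds_ratio(8, 12, 15, 5)    # group B over group A
print(forward, reverse)
```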
[0042] A hazard ratio may be calculated by an estimate of relative risk. Relative risk is the chance that a particular event will take place. It is a ratio of the probability that an event such as development or progression of a disease will occur in samples that exceed a threshold level of expression of a marker over the probability that the event will occur in samples that do not exceed a threshold level of expression of a marker. Alternatively, a hazard ratio may be calculated by the limit of the number of events per unit time divided by the number at risk as the time interval decreases. In the case of a hazard ratio, a value of 1 indicates that the relative risk is equal in both the first and second groups; a value greater or less than 1 indicates that the risk is greater in one group or another, depending on the inputs into the calculation.
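The two calculations described above can be sketched as follows. The relative risk function divides the event probability in samples above the expression threshold by the event probability in samples below it. The hazard function is only a crude discrete approximation of the limit described in the text (events per unit time divided by the number at risk over a finite interval). All names and counts are hypothetical.

```python
def relative_risk(events_above, n_above, events_below, n_below):
    """Event probability above the threshold divided by the probability below it."""
    return (events_above / n_above) / (events_below / n_below)


def hazard_rate(events, at_risk, interval):
    """Crude hazard estimate: events per unit time divided by the number at risk."""
    return events / interval / at_risk


# Hypothetical counts, not data from this study.
rr = relative_risk(events_above=12, n_above=40, events_below=6, n_below=60)
hr = (hazard_rate(events=4, at_risk=40, interval=2.0)
      / hazard_rate(events=3, at_risk=60, interval=2.0))
print(rr, hr)
```

In both cases a value of 1 means the two groups carry equal risk.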
[0043] Additionally, multiple threshold levels of expression may be determined. This can be the case in so-called "tertile," "quartile," or "quintile" analyses. In these methods, multiple groups can be considered together as a single population, and are divided into 3 or more bins having equal numbers of individuals. The boundary between two of these "bins" may be considered a threshold level of expression indicating a particular level of risk of a disease developing or signifying a physiological or cellular state. A risk may be assigned based on which "bin" a test subject falls into.
[0044] The invention contemplates assessing the expression of the marker in any biological sample from which the expression may be assessed. The type of biological sample, the subject comprising the sample, and the manner in which the sample is collected can and will vary, and are known to those skilled in the art. Numerous types of samples can be obtained from a subject to produce a miRNA expression profile.
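The tertile, quartile, or quintile analysis of paragraph [0043] can be sketched in Python as below. The sketch pools all expression values, splits them into equal-count bins, and treats each boundary between adjacent bins as a threshold. Placing the boundary at the midpoint between neighboring values is an assumption made here for illustration, as are the function names and the expression values.

```python
def bin_boundaries(values, n_bins):
    """Return the n_bins - 1 thresholds that split the values into equal-count bins.

    Assumes len(values) is divisible by n_bins. Each boundary is placed at the
    midpoint between the last value of one bin and the first value of the next
    (an illustrative choice, not specified in the text).
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [(ordered[i * size - 1] + ordered[i * size]) / 2.0
            for i in range(1, n_bins)]


def assign_bin(value, boundaries):
    """Return the 0-based index of the bin that a test subject's value falls into."""
    return sum(1 for b in boundaries if value > b)


# Hypothetical pooled expression values, not data from this study.
pooled = [7, 2, 9, 4, 1, 6, 3, 8, 5]
tertile_thresholds = bin_boundaries(pooled, n_bins=3)
print(tertile_thresholds)
print(assign_bin(2, tertile_thresholds), assign_bin(5, tertile_thresholds), assign_bin(9, tertile_thresholds))
```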
[0045] The term "sample" or "biological sample" is used in its broadest sense. Depending upon the embodiment of the invention, for example, a sample may comprise a bodily fluid including whole blood, serum, plasma, urine, saliva, cerebral spinal fluid, semen, vaginal fluid, pulmonary fluid, tears, perspiration, mucus and the like; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print or any other material isolated in whole or in part from a living subject. Such samples include, but are not limited to, tissue isolated from primates, e.g., humans, or rodents, e.g., mice and rats. Biological samples may also include sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histologic purposes such as blood, plasma, serum, sputum, stool, tears, mucus, hair, skin, and the like. Biological samples also include explants and primary and/or transformed cell cultures derived from patient tissues. A biological sample is typically obtained from a eukaryotic organism, most preferably a mammal such as a primate, e.g., chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird; reptile; or fish.
[0046] A biological sample is obtained for use in the methods described in this invention. Most often, this will be done by removing a sample from a subject, but can also be accomplished by using previously isolated samples (e.g., isolated by another person, at another time, and/or for another purpose), or by performing the methods of the invention in vivo. Archival tissues, having treatment or outcome history, will be particularly useful.
[0047] Examples of sources of samples include but are not limited to biopsy or other in vivo or ex vivo analysis of prostate, breast, skin, muscle, fascia, brain, endometrium, lung, head and neck, pancreas, small intestine, blood, liver, testes, ovaries, colon, skin, stomach, esophagus, spleen, lymph node, bone marrow, kidney, placenta, or fetus. In some aspects of the invention, the sample comprises a fluid sample, such as peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, cerebrospinal fluid, amniotic fluid, lacrimal fluid, stool, or urine. Samples include single cells, whole organs or any fraction of a whole organ, in any condition including in vitro, ex vivo, in vivo, post-mortem, fresh, fixed, or frozen.
[0048] The term "subject" is used in its broadest sense. In a preferred embodiment, the subject is a mammal. Non-limiting examples of mammals include humans, dogs, cats, horses, cows, sheep, goats, and pigs. Preferably, a subject includes any human or non-human mammal, including for example: a primate, cow, horse, pig, sheep, goat, dog, cat, or rodent, capable of developing cancer including human patients that are suspected of having cancer, that have been diagnosed with cancer, or that have a family history of cancer. Methods of identifying subjects suspected of having cancer include but are not limited to: physical examination, family medical history, subject medical history, endometrial biopsy, or a number of imaging technologies such as ultrasonography, computed tomography, magnetic resonance imaging, magnetic resonance spectroscopy, or positron emission tomography.
[0049] Cancer cells include any cells derived from a tumor, neoplasm, cancer, precancer, cell line, malignancy, or any other source of cells that have the potential to expand and grow to an unlimited degree. Cancer cells may be derived from naturally occurring sources or may be artificially created. Cancer cells may also be capable of invasion into other tissues and metastasis. Cancer cells further encompass any malignant cells that have invaded other tissues and/or metastasized. One or more cancer cells in the context of an organism may also be called a cancer, tumor, neoplasm, growth, malignancy, or any other term used in the art to describe cells in a cancerous state.
[0050] Examples of cancers that could serve as sources of cancer cells include solid tumors such as fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon cancer, colorectal cancer, kidney cancer, pancreatic cancer, bone cancer, breast cancer, ovarian cancer, prostate cancer, esophageal cancer, stomach cancer, oral cancer, nasal cancer, throat cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, cervical cancer, uterine cancer, testicular cancer, small cell lung carcinoma, bladder carcinoma, lung cancer, epithelial carcinoma, glioma, glioblastoma multiforme, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, skin cancer, melanoma, neuroblastoma, and retinoblastoma.
[0051] Additional cancers that may serve as sources of cancer cells include blood borne cancers such as acute lymphoblastic leukemia ("ALL"), acute lymphoblastic B-cell leukemia, acute lymphoblastic T-cell leukemia, acute myeloblastic leukemia ("AML"), acute promyelocytic leukemia ("APL"), acute monoblastic leukemia, acute erythroleukemic leukemia, acute megakaryoblastic leukemia, acute myelomonocytic leukemia, acute nonlymphocytic leukemia, acute undifferentiated leukemia, chronic myelocytic leukemia ("CML"), chronic lymphocytic leukemia ("CLL"), hairy cell leukemia, multiple myeloma, lymphoblastic leukemia, myelogenous leukemia, lymphocytic leukemia, myelocytic leukemia, Hodgkin's disease, non-Hodgkin's lymphoma, Waldenstrom's macroglobulinemia, heavy chain disease, and polycythemia vera.
III. Kits
[0052] The present invention further provides kits to be used in assessing the expression of an RNA in a subject to assess the risk of developing disease. Kits include any combination of components that may facilitate the performance of an assay. A kit that facilitates assessing the expression of an RNA may include suitable nucleic acid-based and immunological reagents as well as suitable buffers, control reagents, and printed protocols.
[0053] The present invention also provides kits that facilitate nucleic acid based methods. The kits comprise one or more of the following: specific nucleic acids such as oligonucleotides, labeling reagents, enzymes including PCR amplification reagents such as Taq or Pfu, reverse transcriptase, or one or more other polymerases, and/or reagents that facilitate hybridization. Specific nucleic acids may include nucleic acids, polynucleotides, oligonucleotides (DNA, or RNA), or any combination of molecules that includes one or more of the above, or any other molecular entity capable of specific binding to a nucleic acid marker. In one aspect of the invention, the specific nucleic acid comprises one or more oligonucleotides capable of hybridizing to the marker.
[0054] A specific nucleic acid may include a label. A label may be any substance capable of aiding a machine, detector, sensor, device, or enhanced or unenhanced human eye in differentiating a sample that displays positive expression from a sample that displays reduced expression. Examples of labels include but are not limited to: a radioactive isotope or chelate thereof, a dye (fluorescent or nonfluorescent), stain, enzyme, or nonradioactive metal. Specific examples include, but are not limited to: fluorescein, biotin, digoxigenin, alkaline phosphatase, streptavidin, 3H, 14C, 32P, 35S, or any other compound capable of emitting radiation, rhodamine, 4-(4'-dimethylaminophenylazo) benzoic acid ("Dabcyl"); 4-(4'-dimethylamino-phenylazo)sulfonic acid (sulfonyl chloride) ("Dabsyl"); 5-((2-aminoethyl)-amino)-naphthalene-1-sulfonic acid ("EDANS"); Psoralene derivatives, haptens, cyanines, acridines, fluorescent rhodol derivatives, cholesterol derivatives; ethylenediaminetetraacetic acid ("EDTA"); and derivatives thereof or any other compound that signals the presence of the labeled nucleic acid. In one embodiment of the invention, the label includes one or more dyes optimized for use in genotyping. Examples of such dyes include but are not limited to: FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
[0055] An oligonucleotide may be any polynucleotide of at least 2 nucleotides. Oligonucleotides may be less than 10, less than 15, less than 20, less than 30, less than 40, less than 50, less than 75, less than 100, less than 200, less than 500, or more than 500 nucleotides in length. While oligonucleotides are often linear, they may, depending on their sequence and storage conditions, assume a two- or three-dimensional structure. Oligonucleotides may be chemically synthesized by any of a number of methods including sequential synthesis, solid phase synthesis, or any other synthesis method now known or yet to be disclosed. Alternatively, oligonucleotides may be produced by recombinant DNA based methods. In some aspects of the invention, an oligonucleotide may be 2 to 1000 bases in length. In other aspects, it may be 5 to 500 bases in length, 5 to 100 bases in length, 5 to 50 bases in length, or 10 to 30 bases in length. One skilled in the art would understand the length of oligonucleotide necessary to perform a particular task. Oligonucleotides may be directly labeled, used as primers in PCR or sequencing reactions, or affixed directly to a solid substrate as in oligonucleotide arrays among other things.
[0056] In some aspects of the invention, the probe may be affixed to a solid substrate. In other aspects of the invention, the sample may be affixed to a solid substrate. A probe or sample may be covalently bound to the substrate or it may be bound by some noncovalent interaction including electrostatic, hydrophobic, hydrogen bonding, van der Waals, magnetic, or any other interaction by which a probe such as an oligonucleotide probe may be attached to a substrate while maintaining its ability to recognize the allele to which it has specificity. A substrate may be any solid or semisolid material onto which a probe may be affixed, attached or printed, either singly or in the formation of a microarray. Examples of substrate materials include but are not limited to polyvinyl, polystyrene, polypropylene, polyester or any other plastic, glass, silicon dioxide or other silanes, hydrogels, gold, platinum, microbeads, micelles and other lipid formations, nitrocellulose, or nylon membranes. The substrate may take any form, including a spherical bead or flat surface. For example, the probe may be bound to a substrate in the case of an array. The sample may be bound to a substrate as, for example, in the case of a Southern blot, Northern blot or other method that affixes the sample to a substrate.
[0057] A kit may also contain an indication of a level of expression that signifies a particular physiological or cellular characteristic. An indication includes any guide to a level of expression that, using the kit in which the indication is provided, would signal the presence or absence of any physiological or cellular state that the kit is configured to detect. The indication may be expressed numerically, expressed as a color, expressed as an intensity of a band, derived from a standard curve, or derived from a control. The indication may be printed on a writing that may be included in the kit or it may be posted on the internet or embedded in a software package.
[0058] In some aspects of the invention, the kit may be used to detect the presence of non-small cell lung cancer (NSCLC) in a patient. NSCLC includes any carcinoma derived from lung tissues that does not include small cell lung cancers. Examples of non-small cell lung cancers include adenocarcinomas, large cell carcinomas, and squamous cell carcinomas of the lung. A patient may be suspected of having NSCLC based upon symptoms such as chest pain, chronic cough, hemoptysis, dyspnea (difficulty breathing), fatigue, lung infection such as pneumonia or bronchitis, shortness of breath, swollen lymph nodes, weight loss, or wheezing. A patient may be suspected of being at risk of having NSCLC based upon environmental conditions such as exposure to tobacco or other smoke, exposure to radon or characteristics such as family history, age, or other conditions.
EXAMPLES
Example 1 - Serum-based microRNA Biomarkers for Early-stage NSCLC
I. Materials and Methods
[0059] Patient Samples. Serum and plasma samples from patients with (1) early-stage NSCLC or (2) mesothelioma and (3) healthy individuals (controls) with a history of smoking tobacco were utilized in this study. Note: all samples from cancer cases were collected prior to initiation of cancer-directed therapy. The discovery cohort consisted of serum samples from 11 controls and 11 NSCLC patients. The validation cohort consisted of serum samples taken from 22 NSCLC patients and 31 controls. Additionally, serum from 3 early-stage mesothelioma patients and a total of 6 plasma samples from both discovery and validation cohorts (4 NSCLC and 2 controls) were also available for analysis (Table A). All samples were acquired from the Cancer of RESpiratory Tract (CREST) biorepository under a standard protocol.
[0060] RNA Extraction and cel-miRNA spike in. Total RNA was isolated from a volume of 75-200 μl of serum or plasma using phenol and guanidine thiocyanate. A total of 2 ng of cel-miR-39 was spiked in to each sample after the addition of the phenol and guanidine thiocyanate to serve as an external processing control.
[0061] miRNA microarray profiling. A minimum of 100 ng of total RNA from the discovery cohort was added to the GenoExplorer™ microRNA Expression System (GenoSensor Corporation, Tempe, AZ) containing probes in triplicate for 880 mature miRNAs (Sanger miRNA Registry version 13, September 2008) and 473 pre-miRNAs along with positive and negative control probes. SAM data analysis was applied to find significantly differentially-expressed miRNAs in one condition in contrast to the other. Data were normalized to PC-U6B, U6-337, 5S-rRNA, and PC-HU5S. The top differentially-expressed miRNAs in serum from NSCLC patients versus controls were identified based on fold change, p-values, q-values, and false discovery rates (FDR).
[0062] Quantitative reverse transcription-PCR (qRT-PCR) analysis of miRNAs. qRT-PCR was performed on the selected miRNA candidates to validate the miRNA array results. The GenoExplorer™ miRNA First-Strand cDNA Core Kit (2002-50, GenoSensor Corporation) was used to generate miRNA first-strand cDNA. miRNA expression levels were measured using SYBR Green (04887352001; Roche; Indianapolis, IN). miRNA specific forward primers and a universal reverse primer were purchased (GenoSensor Corporation). The reaction conditions were 15 minutes denaturation at 94°C followed by 45 cycles of 94°C for 30 seconds, 59°C for 15 seconds, and 72°C for 30 seconds. Melting curve analysis was used to assess the specificity of the amplified product. All qRT-PCR reactions were carried out in triplicate on the Lightcycler 480 (Roche). miRNA expression was normalized to the expression of RNU6 and cel-miR-39 separately.
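The text does not state the formula used to normalize miRNA expression to RNU6 or cel-miR-39. One common approach, shown here purely as an assumption, is the delta-Ct calculation: average the triplicate threshold cycle (Ct) values, subtract the reference Ct from the target Ct, and convert to relative expression as 2 raised to the negative delta-Ct (which assumes roughly 100% amplification efficiency). The function names and Ct values below are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target miRNA minus mean Ct of the reference (e.g. a spike-in)."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(d_ct):
    """Relative expression 2^(-delta Ct); assumes ~100% amplification efficiency."""
    return 2.0 ** (-d_ct)


# Hypothetical triplicate Ct values, not data from this study.
target = [28.1, 28.3, 28.2]       # candidate miRNA
reference = [24.2, 24.1, 24.3]    # reference control
d_ct = delta_ct(target, reference)
print(round(d_ct, 3), round(relative_expression(d_ct), 4))
```

Because the text normalizes to RNU6 and cel-miR-39 separately, such a calculation would be run once per reference rather than against a combined reference.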
[0063] qRT-PCR data analysis. A logit regression model was fit on the PCR data with the predictors being one or more miRNAs (each predictor set therefore comprises a different combination of miRNAs). For each logit classifier using a predictor set, a receiver operating characteristic (ROC) curve was plotted and the area under the curve (AUC) was computed. P-values for the predictor set using the t statistical test and the Mann-Whitney-Wilcoxon test were also computed using coefficients obtained from logit regression. The AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. It can be shown that the area under the ROC curve is closely related to the Mann-Whitney U, which tests whether positives are ranked higher than negatives.
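The relationship between the AUC and the Mann-Whitney U statistic can be illustrated with a short sketch. A logit classifier turns a predictor set into a score between 0 and 1, and the AUC equals U divided by the number of positive/negative pairs, where U counts the pairs in which the positive sample scores higher (ties count one half). The coefficients, scores, and function names below are hypothetical; no model fitting is performed here.

```python
import math


def logit_score(features, intercept, coefficients):
    """Logistic model output for one sample given its predictor values."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def mann_whitney_u(pos_scores, neg_scores):
    """Count positive/negative pairs where the positive ranks higher (ties = 0.5)."""
    u = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def auc_from_u(pos_scores, neg_scores):
    """AUC = U / (n_pos * n_neg)."""
    return mann_whitney_u(pos_scores, neg_scores) / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores, not data from this study.
pos = [0.9, 0.8, 0.6, 0.55]
neg = [0.7, 0.5, 0.4, 0.3]
print(mann_whitney_u(pos, neg), auc_from_u(pos, neg))
print(logit_score([0.0, 0.0], intercept=0.0, coefficients=[1.0, 1.0]))
```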
II. Results
[0064] RNA extracted from human samples. The 24 available samples from the discovery cohort had a median of 352 ng total RNA (range = 147-1762 ng) derived from 200 μl serum and proceeded to miRNA microarray profiling and subsequent qRT-PCR analysis. Baseline characteristics for the discovery cohort are shown in Table A. The median age was 64 and 65 years for controls and NSCLC, respectively. Nearly all were men and all but one individual (control #2678) had a history of tobacco smoking in both groups. The median follow-up time for the discovery cohort controls was 33.4 months (range 27.4-114.9 months). The majority of the discovery cohort NSCLC cases were adenocarcinoma. Samples for the validation cohort became available for analysis at a later date. The 53 serum samples from the validation cohort had a median of 870 ng total RNA (range = 410-3,796 ng) derived from 100 to 200 μl serum and proceeded to qRT-PCR analysis. Baseline characteristics for the validation cohort are shown in Table A. In both controls and NSCLC, the median age was 62 years, with a majority of men, and nearly all had a documented history of tobacco smoking. The median follow-up time for the validation controls was 40.9 months (range 8.8-108.5 months). Adenocarcinoma was the most common NSCLC histology in the validation cohort.
Figure imgf000025_0001
Carcinoma
Other na 1 na 1 na
Unavailable na 1 na 1 na
[0065] Total RNA was also extracted from the 3 serum samples from early-stage mesothelioma patients and 6 plasma samples from NSCLC and control for qRT-PCR analysis. Detailed baseline clinical characteristics for all samples are shown in Table B.
Table B. Clinical Characteristics of all Samples
Figure imgf000026_0001
Figure imgf000027_0001
Figure imgf000028_0001
Figure imgf000029_0001
Figure imgf000030_0001
Samples 1075 and 1462 had two replicates.
^Subject has smoked 4 cigars a day for 30 years.
[0066] miRNA profiling analysis reveals differentially-expressed miRNAs. miRNA profiling was performed on 24 samples of the discovery cohort (including two separate serum samples from controls #1075 and #1462). Four differentially-expressed miRNAs between the NSCLC patients and the controls were chosen as the most likely to stratify NSCLC patients from controls based on the fold change, p-value, q-value, and FDR (see Table C). The median signal intensity was normalized to the expression of PC-U6B, U6-337, 5S-rRNA, and PC-HU5S, which were used for analysis control. In Table C, LCS stands for lung cancer serum and CS stands for control serum. An odds ratio of the LCS/CS median greater or less than 1 indicates that expression of the marker is more likely to occur in one group or the other depending on how the odds ratio calculation was set up. Therefore, hsa-miR-574-5p, hsa-miR-1254, hsa-miR-1268 and hsa-miR-1228 were identified as potential markers associated with early-stage lung cancer.
Figure imgf000030_0002
[0067] The sequences of the four miRNAs are shown in Table D. Each of the four miRNAs was followed up with qRT-PCR verification.
Figure imgf000031_0001
[0068] Validation of miRNA array findings by qRT-PCR in discovery cohort. The four miRNAs identified by array profiling were measured by qRT-PCR in the discovery cohort. All possible combinations of these four miRNAs were plotted on ROC curves to determine the combination that would stratify the patients with the highest sensitivity and specificity (see Table C). The best set of predictors included two of the four miRNAs, hsa-miR-1254 (miR-1254) and hsa-miR-574-5p (miR-574-5p), which were differentially expressed between the controls and stage I/II NSCLC patient samples at p-values of 0.016 and 0.0277 with the t statistical test and the Mann-Whitney-Wilcoxon test, respectively. Both markers have a high probability that the expression correctly indicates the similarity of the sample to the proper group, and this probability is described by the AUC. Figure 1 shows that markers miR-1254 and miR-574-5p have a 77% probability of correct classification, with 82% sensitivity and 77% specificity. Data in Figure 1 were normalized to expression of cel-miR-39. The data showed that qRT-PCR expression of miR-1254 and miR-574-5p was able to stratify early-stage NSCLC patients from controls with an AUC (area under the curve) of 0.77 and a sensitivity and specificity of 82% and 77%, respectively. These data demonstrate the existence of significantly differentially-expressed serum-based miRNAs between NSCLC patients and controls.
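Enumerating the candidate predictor sets for the four miRNAs can be sketched as below. The sketch assumes that "all possible combinations" means every non-empty subset, including single markers, which gives 15 predictor sets; the text does not say whether single-marker sets were evaluated. The function name is hypothetical.

```python
from itertools import combinations

# The four candidate markers named in the text.
MARKERS = ["miR-574-5p", "miR-1254", "miR-1268", "miR-1228"]


def predictor_sets(markers):
    """Return every non-empty subset of the markers, smallest sets first."""
    sets = []
    for size in range(1, len(markers) + 1):
        sets.extend(combinations(markers, size))
    return sets


all_sets = predictor_sets(MARKERS)
print(len(all_sets))
for predictor_set in all_sets:
    print(predictor_set)
```

Each predictor set would then be fit and scored separately, with the set giving the best ROC performance retained.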
[0069] Measuring miRNA biomarkers in a validation cohort. Expression of miR-1254 and miR-574-5p was measured in the serum from 53 NSCLC and control cases in a validation cohort. When normalized to cel-miR-39, the ROC curve generated from the validation cohort had an AUC of 0.75 and a sensitivity and specificity of 73% and 71%, respectively (p-values of 0.0123 and 0.0025 with the t statistical test and the Mann-Whitney-Wilcoxon test, respectively) (Figure 2). The data show that the model correctly classified 16/22 NSCLC cases and 22/31 control cases, stratifying early-stage NSCLC and controls with a sensitivity and specificity of 73% and 71%, respectively. No differences were seen in the sensitivity or specificity when applying miR-1254 and miR-574-5p classifiers to adenocarcinoma or gender specific analysis. Expression of miR-1254 and miR-574-5p was also measured in the serum of 3 early-stage mesothelioma cases. The available plasma from 4 NSCLC and 2 controls did not consistently classify into either the serum NSCLC or the serum control group.
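The reported validation-cohort sensitivity and specificity follow directly from the classification counts given above (16 of 22 NSCLC cases and 22 of 31 controls correctly classified). A short check, with hypothetical function names:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of disease cases that the model classifies as disease."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of controls that the model classifies as control."""
    return true_neg / (true_neg + false_pos)


# Counts from paragraph [0069]: 16/22 NSCLC and 22/31 controls classified correctly.
sens = sensitivity(true_pos=16, false_neg=22 - 16)
spec = specificity(true_neg=22, false_pos=31 - 22)
print(round(sens * 100), round(spec * 100))  # 73 71
```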

Claims

CLAIMS We claim:
1. A method of classifying a subject into a group, comprising the steps of:
receiving a sample containing RNA from the subject;
adding a first reagent to a mixture comprising said RNA, said first reagent comprising a first oligonucleotide capable of specific binding to a marker including a sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4;
subjecting said mixture to conditions that allow detection of the binding of said first reagent to said marker; and
classifying the subject into a group based on the expression level of said marker in comparison to the level of the same marker in a control.
2. The method of claim 1, further comprising the step of adding reverse transcriptase to said mixture allowing the formation of a DNA template comprising the nucleic acid sequence of the marker.
3. The method of claim 2, further comprising the steps of adding a second oligonucleotide and a third oligonucleotide to said mixture, and amplifying said mixture, wherein said second oligonucleotide and said third oligonucleotide bind to sequences of opposite strands of said DNA template comprising said nucleic acid sequence of the marker.
4. The method of claim 3, further comprising the step of adding a fourth oligonucleotide to said mixture, wherein said fourth oligonucleotide binds to said DNA template between said sequences to which said second oligonucleotide and said third oligonucleotide are bound.
5. The method of claim 4 wherein said fourth nucleic acid comprises a label.
6. The method of claim 5 wherein said label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
7. The method of claim 1, wherein said sample is a biological sample.
8. The method of claim 1, wherein said sample comprises bodily fluid selected from the group consisting of whole blood, plasma and serum.
9. The method of claim 1, wherein said sample is serum.
10. The method of claim 1, wherein said marker is a classifier for identifying a subject at the risk for developing or having non-small cell lung cancer (NSCLC).
11. The method of claim 10, wherein the higher expression of said marker in the subject in comparison to the expression of said marker in a control classifies the subject in a group of early stage NSCLC.
12. The method of claim 1, wherein said marker is selected from the group consisting of miR-574-5p represented by SEQ ID NO: 1, miR-1254 represented by SEQ ID NO: 2, miR-1268 represented by SEQ ID NO: 3, and miR-1228 represented by SEQ ID NO: 4.
13. The method of claim 1, wherein said marker is miR-574-5p represented by SEQ ID NO: 1.
14. The method of claim 1, wherein said marker is miR-1254 represented by SEQ ID NO: 2.
15. The method of claim 1, wherein the subject is a mammal.
16. The method of claim 1, wherein the subject is a human.
17. A kit for facilitating the classification of a subject into a group, comprising: a first reagent comprising a first oligonucleotide capable of specific binding to a marker that includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4.
18. The kit of claim 17, further comprising a second oligonucleotide and a third oligonucleotide wherein said second oligonucleotide and said third oligonucleotide are capable of binding to sequences of opposite strands of a DNA construct comprising the reverse transcription product of said marker.
19. The kit of claim 18, further comprising a fourth oligonucleotide capable of binding to a sequence between said sequences to which said second oligonucleotide and said third oligonucleotide are capable of binding.
20. The kit of claim 19, wherein said fourth oligonucleotide comprises a label.
21. The kit of claim 20, wherein said label comprises a fluorescent label selected from the group consisting of FAM, dR110, 5-FAM, 6FAM, dR6G, JOE, HEX, VIC, TET, dTAMRA, TAMRA, NED, dROX, PET, BHQ, Gold540, and LIZ.
22. The kit of claim 17, further comprising an indication of a result that signifies classification of the sample to a group of early stage NSCLC.
23. The kit of claim 22, wherein said indication of a result is in a form selected from the group consisting of a ΔCt value and a nucleic acid sequence including a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4.
24. The kit of claim 22, wherein said indication of a result is selected from the group consisting of a positive control, a writing, and software configured to detect the indication of a result as input and identification of the sample as one in the early stage NSCLC.
25. An isolated sequence having at least 80% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, and reverse transcription and amplification products thereof.
26. The isolated sequence of claim 25, further comprising a label attached to said sequence.
PCT/US2011/037606 2010-05-21 2011-05-23 Methods and kits useful in diagnosing nsclc WO2011146937A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34719110P 2010-05-21 2010-05-21
US61/347,191 2010-05-21

Publications (1)

Publication Number Publication Date
WO2011146937A1 true WO2011146937A1 (en) 2011-11-24

Family

ID=44992092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/037606 WO2011146937A1 (en) 2010-05-21 2011-05-23 Methods and kits useful in diagnosing nsclc

Country Status (1)

Country Link
WO (1) WO2011146937A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090143326A1 (en) * 2007-10-04 2009-06-04 Santaris Pharma A/S MICROMIRs
US20090326051A1 (en) * 2008-06-04 2009-12-31 The Board Of Regents Of The University Of Texas System Modulation of gene expression through endogenous small RNA targeting of gene promoters
US20100047804A1 (en) * 2004-05-14 2010-02-25 Rosetta Genomics Ltd. Methods for distinguishing between lung squamous carcinoma and other non small cell lung cancers

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012123591A1 (en) * 2011-03-17 2012-09-20 INSERM (Institut National de la Santé et de la Recherche Médicale) Method for targeting nucleic acids to the nucleus
RU2507268C1 (en) * 2012-10-04 2014-02-20 Федеральное государственное унитарное предприятие "Государственный научно-исследовательский институт генетики и селекции промышленных микроорганизмов" (ФГУП "ГосНИИгенетика") System of markers based on group of microrna genes for diagnostics of non-small-cell lung cancer including epidermoid cancer and adenocarcinoma
KR20220070556A (en) 2014-06-18 2022-05-31 도레이 카부시키가이샤 Kit or device for detecting lung cancer, and lung cancer detection method
US10620228B2 (en) 2014-06-18 2020-04-14 Toray Industries, Inc. Lung cancer detection kit or device, and detection method
EP3967769A2 (en) 2014-06-18 2022-03-16 Toray Industries, Inc. Lung cancer detection kit or device, and detection method
KR20170018409A (en) 2014-06-18 2017-02-17 도레이 카부시키가이샤 Kit or device for detecting lung cancer, and lung cancer detection method
US11519927B2 (en) 2014-06-18 2022-12-06 Toray Industries, Inc. Lung cancer detection kit or device, and detection method
KR20230038819A (en) 2014-06-18 2023-03-21 도레이 카부시키가이샤 Kit or device for detecting lung cancer, and lung cancer detection method
KR20230136701A (en) 2014-06-18 2023-09-26 도레이 카부시키가이샤 Kit or device for detecting lung cancer, and lung cancer detection method
US12117462B2 (en) 2014-06-18 2024-10-15 Toray Industries, Inc. Lung cancer detection kit or device, and detection method
JP2020523039A (en) * 2017-06-12 2020-08-06 フラウンホファー ゲセルシャフト ツール フェールデルンク ダー アンゲヴァンテン フォルシュンク エー.ファオ. MiRNA-574-5p as a biomarker for stratification of prostaglandin E-dependent tumors
JP7171712B2 (en) 2017-06-12 2022-11-15 フラウンホファー ゲセルシャフト ツール フェールデルンク ダー アンゲヴァンテン フォルシュンク エー.ファオ. miRNA-574-5p as a biomarker for stratification of prostaglandin E-dependent tumors
US11634778B2 (en) 2017-06-29 2023-04-25 Toray Industries, Inc. Kit, device, and method for detecting lung cancer
US12043872B2 (en) 2017-06-29 2024-07-23 Toray Industries, Inc. Kit, device, and method for detecting lung cancer

Similar Documents

Publication Publication Date Title
JP5940517B2 (en) Methods for predicting breast cancer recurrence under endocrine therapy
US8911940B2 (en) Methods of assessing a risk of cancer progression
JP6049739B2 (en) Marker genes for classification of prostate cancer
CN103459597A (en) Marker for predicting stomach cancer prognosis and method for predicting stomach cancer prognosis
JP2018524972A (en) Methods and compositions for diagnosis or detection of lung cancer
JP2011525106A (en) Markers for diffuse B large cell lymphoma and methods of use thereof
JP2011526487A (en) Breast cancer genome fingerprint
WO2017223216A1 (en) Compositions and methods for diagnosing lung cancers using gene expression profiles
US20150024956A1 (en) Methods for diagnosis and/or prognosis of gynecological cancer
CN110229899B (en) Plasma marker combinations for early diagnosis or prognosis prediction of colorectal cancer
EP3918611A1 (en) Novel biomarkers and diagnostic profiles for prostate cancer
WO2011146937A1 (en) Methods and kits useful in diagnosing nsclc
CN115927608B (en) Biomarkers, methods and diagnostic devices for predicting pancreatic cancer risk
KR102156282B1 (en) Method of predicting prognosis of brain tumors
CN110573629B (en) Methods and kits for diagnosing early pancreatic cancer
JP6949315B2 (en) Differentiation evaluation method for squamous cell lung cancer and adenocarcinoma of the lung
CN110331207A (en) Adenocarcinoma of lung biomarker and related application
KR20190143417A (en) Method of predicting prognosis of brain tumors
CN118414427A (en) Urine miRNA marker for diagnosing prostate cancer, diagnostic reagent and kit
EP3728630A1 (en) Compositions and methods for diagnosing lung cancers using gene expression profiles
JPWO2015105190A1 (en) Evaluation method for lymph node metastasis of endometrial cancer
CN114164273A (en) Prognosis marker of squamous cell carcinoma, establishment method of prognosis risk evaluation model and application of prognosis marker
US10066270B2 (en) Methods and kits used in classifying adrenocortical carcinoma
US20190010558A1 (en) Method for determining the risk of recurrence of an estrogen receptor-positive and her2-negative primary mammary carcinoma under an endocrine therapy
CN106636351B (en) One kind SNP marker relevant to breast cancer and its application

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 11784392; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 11784392; Country of ref document: EP; Kind code of ref document: A1)