US20230366034A1 - Compositions and methods for diagnosing lung cancers using gene expression profiles - Google Patents

Compositions and methods for diagnosing lung cancers using gene expression profiles

Info

Publication number
US20230366034A1
Authority
US
United States
Prior art keywords
endogenous
genes
gene
composition
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/306,548
Inventor
Michael Showe
Louise C. Showe
Andrei V. Kossenkov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wistar Institute of Anatomy and Biology
Original Assignee
Wistar Institute of Anatomy and Biology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wistar Institute of Anatomy and Biology
Priority to US18/306,548
Publication of US20230366034A1
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57423 Specifically defined cancers of lung
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00 Antineoplastic agents
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/60 Complex ways of combining multiple protein biomarkers for diagnosis

Definitions

  • Lung cancer is the most common worldwide cause of cancer mortality. In the United States, lung cancer is the second most prevalent cancer in both men and women and will account for more than 174,000 new cases per year and more than 162,000 cancer deaths. In fact, lung cancer accounts for more deaths each year than from breast, prostate and colorectal cancers combined.
  • the high mortality (80-85% in five years), which has shown little or no improvement in the past 30 years, emphasizes the fact that new and effective tools to facilitate early diagnosis prior to metastasis to regional nodes or beyond the lung are needed.
  • High risk populations include smokers, former smokers, and individuals with markers associated with genetic predispositions. Because surgical removal of early stage tumors remains the most effective treatment for lung cancer, there has been great interest in screening high-risk patients with low dose spiral CT (LDCT). This strategy identifies non-calcified pulmonary nodules in approximately 30-70% of high risk individuals but only a small proportion of detected nodules are ultimately diagnosed as lung cancers (0.4 to 2.7%). Currently, the only way to differentiate subjects with lung nodules of benign etiology from subjects with malignant nodules is an invasive biopsy, surgery, or prolonged observation with repeated scanning.
  • a diagnostic test would be easily accessible, inexpensive, demonstrate high sensitivity and specificity, and result in improved patient outcomes (medically and financially).
  • Others have shown that classifiers which utilize epithelial cells have high accuracy. However, harvesting these cells requires an invasive bronchoscopy. See, Silvestri et al, N Engl J Med. 2015 Jul. 16; 373(3): 243-251, which is incorporated herein by reference.
  • a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a patient sample.
  • Each gene, gene fragment, gene transcript or expression product is selected from the genes of Table I or Table II.
  • at least one polynucleotide or oligonucleotide is attached to a detectable label.
  • the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 559 genes in Table I. In another embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 100 genes in Table II.
  • a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more ligands, wherein each ligand hybridizes to a different gene expression product in a patient sample.
  • Each gene expression product is selected from the genes of Table I or Table II.
  • at least one ligand is attached to a detectable label.
  • the composition or kit includes ligands which detect the expression products of each of the 559 genes in Table I.
  • the composition or kit includes ligands which detect the expression products of each of the 100 genes in Table II.
  • compositions described herein enable detection of changes in expression in the genes in the subject's gene expression profile from that of a reference gene expression profile.
  • the various reference gene expression profiles are described below.
  • the composition provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying changes in the expression of three or more genes in the sample of a subject, said genes selected from the genes of Table I or Table II, and comparing that subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of said genes correlate with a diagnosis or evaluation of a lung cancer.
  • the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a subject, the gene expression profile comprising 10 or more gene expression products of 10 or more informative genes as described herein.
  • the 10 or more informative genes are selected from the genes of Table I or Table II.
  • the gene expression profile contains all 559 genes of Table I.
  • the gene expression profile contains all 100 genes of Table II.
  • the subject's gene expression profile is compared with a reference gene expression profile from a variety of sources described below. Changes in expression of the informative genes correlate with a diagnosis or evaluation of a lung cancer.
  • the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • a method of detecting lung cancer in a patient includes obtaining a sample from the patient; and detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.
  • a method of diagnosing lung cancer in a subject includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.
  • a method of diagnosing and treating lung cancer in a subject having a neoplastic growth includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and removing the neoplastic growth.
  • Other appropriate treatments may also be provided.
  • FIG. 1 is a table showing patient characteristics for the samples used in Example 1.
  • CV SVM: cross validated support vector machine classifier
  • the chart ( FIG. 4 A ) shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.
  • When the sensitivity is 0.90, the specificity is 0.24; when the sensitivity is 0.87, the specificity is 0.42.
  • FIG. 6 is a graph showing the cross validated support vector machine classifier (CV SVM) of 25% of the data set used for the 559 Classifier, used as a testing set for the 100 Classifier.
  • ROC Area 0.82. According to the curve, when the sensitivity is 0.90, the specificity is 0.62; when the sensitivity is 0.79, the specificity is 0.68; and when the sensitivity is 0.71, the specificity is 0.75.
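The ROC area and the sensitivity/specificity pairs quoted for the figures can be illustrated with a small stdlib-only sketch. It is not the patent's analysis code: the classifier scores below are hypothetical, and the function names `roc_auc` and `specificity_at_sensitivity` are assumptions made for illustration.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC area as the probability that a positive (cancer) sample
    scores higher than a negative (benign nodule) sample; ties count half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def specificity_at_sensitivity(scores_pos, scores_neg, target_sensitivity):
    """Best specificity over all score thresholds whose sensitivity
    is at least the requested target."""
    best = 0.0
    for thr in sorted(set(scores_pos) | set(scores_neg)):
        # A sample is called positive when its score is >= the threshold.
        sens = sum(s >= thr for s in scores_pos) / len(scores_pos)
        spec = sum(s < thr for s in scores_neg) / len(scores_neg)
        if sens >= target_sensitivity:
            best = max(best, spec)
    return best


# Hypothetical classifier scores, not data from the patent.
cancer_scores = [0.9, 0.8, 0.7, 0.4]
nodule_scores = [0.6, 0.3, 0.2, 0.1]

auc = roc_auc(cancer_scores, nodule_scores)
spec_at_full_sens = specificity_at_sensitivity(cancer_scores, nodule_scores, 1.0)
print(auc, spec_at_full_sens)
```

Moving the threshold trades sensitivity against specificity, which is why each figure reports several operating points along a single curve.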
  • compositions and methods described herein apply gene expression technology to blood screening for the detection and diagnosis of lung cancer.
  • the compositions and methods described herein provide the ability to distinguish a cancerous tumor from a non-cancerous nodule, by determining a characteristic RNA expression profile of the genes of the blood of a mammalian, preferably human, subject. The profile is compared with the profile of one or more subjects of the same class (e.g., patients having lung cancer or a non-cancerous nodule) or a control to provide a useful diagnosis.
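Comparing a subject's profile with class reference profiles can be sketched as follows. This is a deliberately simple stand-in that assigns the subject to the reference profile with the highest Pearson correlation; the figures in this document refer to a cross validated support vector machine classifier, which this sketch does not implement. The expression values and the names `pearson` and `closest_reference` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closest_reference(subject_profile, reference_profiles):
    """Return the label of the reference profile most correlated with the subject."""
    return max(
        reference_profiles,
        key=lambda label: pearson(subject_profile, reference_profiles[label]),
    )


# Hypothetical normalized expression levels for four genes (same gene order throughout).
references = {
    "cancer": [2.0, 0.5, 1.8, 0.4],
    "benign_nodule": [1.0, 1.1, 0.9, 1.0],
}
subject = [1.9, 0.6, 1.7, 0.5]

label = closest_reference(subject, references)
print(label)
```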
  • lung cancer screening employ compositions suitable for conducting a simple and cost-effective and non-invasive blood test using gene expression profiling that could alert the patient and physician to obtain further studies, such as a chest radiograph or CT scan, in much the same way that the prostate specific antigen is used to help diagnose and follow the progress of prostate cancer.
  • the application of these profiles provides overlapping and confirmatory diagnoses of the type of lung disease, beginning with the initial test for malignant vs. non-malignant disease.
  • “Patient” or “subject” as used herein means a mammalian animal, including a human, a veterinary or farm animal, a domestic animal or pet, and animals normally used for clinical research. In one embodiment, the subject of these methods and compositions is a human.
  • “Control” or “Control subject” as used herein refers to the source of the reference gene expression profiles as well as the particular panel of control subjects described herein.
  • the control or reference level is from a single subject.
  • the control or reference level is from a population of individuals sharing a specific characteristic.
  • the control or reference level is an assigned value which correlates with the level of a specific control individual or population, although not necessarily measured at the time of assaying the test subject's sample.
  • the control subject or reference is from a patient (or population) having a non-cancerous nodule.
  • the control subject or reference is from a patient (or population) having a cancerous tumor.
  • control subject can be a subject or population with lung cancer, such as a subject who is a current or former smoker with malignant disease, a subject with a solid lung tumor prior to surgery for removal of same; a subject with a solid lung tumor following surgical removal of said tumor; a subject with a solid lung tumor prior to therapy for same; and a subject with a solid lung tumor during or following therapy for same.
  • the controls for purposes of the compositions and methods described herein include any of the following classes of reference human subject with no lung cancer.
  • Such non-healthy controls include the classes of smoker with non-malignant disease, a former smoker with non-malignant disease (including patients with lung nodules), a non-smoker who has chronic obstructive pulmonary disease (COPD), and a former smoker with COPD.
  • the control subject is a healthy non-smoker with no disease or a healthy smoker with no disease.
  • sample as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells.
  • the most suitable sample for use in this invention includes whole blood.
  • Other useful biological samples include, without limitation, peripheral blood mononuclear cells, plasma, saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoscopy sample, bronchoalveolar lavage fluid, and other cellular exudates from a patient having cancer.
  • Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples are concentrated by conventional means.
  • the term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. More specifically, as used herein, the term “cancer” means any lung cancer.
  • the lung cancer is non-small cell lung cancer (NSCLC).
  • the lung cancer is lung adenocarcinoma (AC or LAC).
  • the lung cancer is lung squamous cell carcinoma (SCC or LSCC).
  • the lung cancer is a stage I or stage II NSCLC.
  • the lung cancer is a mixture of early and late stages and types of NSCLC.
  • tumor refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
  • nodule refers to an abnormal buildup of tissue which is benign.
  • cancer tumor refers to a malignant tumor.
  • By “diagnosis” or “evaluation” it is meant a diagnosis of a lung cancer, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy.
  • “diagnosis” or “evaluation” refers to distinguishing between a cancerous tumor and a benign pulmonary nodule.
  • sensitivity (also called the true positive rate) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
  • specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
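The two definitions above reduce to simple ratios of confusion-matrix counts. The sketch below assumes binary labels (1 for cancer, 0 for benign nodule) and uses hypothetical example data; the function name `sensitivity_specificity` is an assumption for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical labels: 4 cancers and 6 benign nodules.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```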
  • By “change in expression” is meant an upregulation of one or more selected genes in comparison to the reference or control; a downregulation of one or more selected genes in comparison to the reference or control; or a combination of certain upregulated genes and downregulated genes.
  • By “therapeutic reagent” or “regimen” is meant any type of treatment employed in the treatment of cancers with or without solid tumors, including, without limitation, chemotherapeutic pharmaceuticals, biological response modifiers, radiation, diet, vitamin therapy, hormone therapies, gene therapy, surgical resection, etc.
  • By “informative genes” as used herein is meant those genes the expression of which changes (either in an up-regulated or down-regulated manner) characteristically in the presence of lung cancer. A statistically significant number of such informative genes thus form suitable gene expression profiles for use in the methods and compositions. Such genes are shown in Table I and Table II below. Such genes make up the “expression profile”.
  • the term “statistically significant number of genes” in the context of this invention differs depending on the degree of change in gene expression observed.
  • the degree of change in gene expression varies with the type of cancer and with the size or spread of the cancer or solid tumor.
  • the degree of change also varies with the immune response of the individual and is subject to variation with each individual. For example, in one embodiment of this invention, a large change, e.g., 2-3 fold increase or decrease in a small number of genes, e.g., in about 10 to 20 genes, is statistically significant. In another embodiment, a smaller relative change in about 15 more genes is statistically significant.
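A fold change of the kind mentioned above (e.g., a 2-3 fold increase or decrease) can be expressed on a log2 scale and turned into an up/down call. The sketch below is illustrative only: the 2-fold default cutoff, the expression values and the names `log2_fold_change` and `call_change` are assumptions, not parameters taken from the patent.

```python
import math


def log2_fold_change(subject_level, reference_level):
    """log2 of the subject/reference expression ratio (both levels must be > 0)."""
    return math.log2(subject_level / reference_level)


def call_change(subject_level, reference_level, min_fold=2.0):
    """Label a gene 'up', 'down' or 'unchanged' relative to the reference."""
    lfc = log2_fold_change(subject_level, reference_level)
    cutoff = math.log2(min_fold)  # a 2-fold change corresponds to |log2 FC| = 1
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical (subject, reference) expression levels for three placeholder genes.
levels = {"GENE_A": (8.0, 2.0), "GENE_B": (1.0, 4.0), "GENE_C": (3.0, 2.5)}
calls = {gene: call_change(subj, ref) for gene, (subj, ref) in levels.items()}
print(calls)
```

A fold-change cutoff alone says nothing about statistical significance; it only captures the direction and magnitude of the change.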
  • the methods and compositions described herein contemplate examination of the expression profile of a “statistically significant number of genes” ranging from 5 to about 559 genes in a single profile.
  • the genes are selected from Table I.
  • the genes are selected from Table II.
  • the gene profile is formed by a statistically significant number of 5 or more genes.
  • the gene profile is formed by a statistically significant number of 10 or more genes.
  • the gene profile is formed by a statistically significant number of 15 or more genes.
  • the gene profile is formed by a statistically significant number of 20 or more genes.
  • the gene profile is formed by a statistically significant number of 25 or more genes.
  • the gene profile is formed by a statistically significant number of 30 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 35 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 40 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 45 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 50 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 60 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 65 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 70 or more genes.
  • the gene profile is formed by a statistically significant number of 75 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 80 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 85 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 90 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 95 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 100 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 200 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 300 or more genes.
  • the gene profile is formed by a statistically significant number of 350 or more genes. In still another embodiment, the gene profile is formed by 400 or more genes. In still another embodiment, the gene profile is formed by 539 genes. In still another embodiment, the gene profile is formed by 559 genes. In still other embodiments, the gene profiles examined as part of these methods contain, as statistically significant numbers of genes, from 10 to 559 genes, and any numbers therebetween.
  • the gene profile is formed by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116
  • the gene profile is formed by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or all 100 genes of Table II.
  • Table I and Table II below refer to a collection of known genes useful in discriminating between a subject having a lung cancer, e.g., NSCLC, and subjects having benign (non-malignant) lung nodules.
  • the sequences of the genes identified in Table I and Table II are publicly available.
  • One skilled in the art may readily reproduce the compositions and methods described herein by use of the sequences of the genes, all of which are publicly available from conventional sources, such as GenBank.
  • GenBank accession number for each gene is provided.
  • microarray refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide or oligonucleotide probes, on a substrate.
  • polynucleotide when used in singular or plural form, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
  • polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions.
  • polynucleotide refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA.
  • the strands in such regions may be from the same molecule or from different molecules.
  • the regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules.
  • One of the molecules of a triple-helical region often is an oligonucleotide.
  • polynucleotide specifically includes cDNAs.
  • the term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases.
  • DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein.
  • DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases are included within the term “polynucleotides” as defined herein.
  • polynucleotide embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.
  • oligonucleotide refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.
  • differentially expressed gene refers to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a control subject, such as a subject having a benign nodule.
  • the terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product.
  • Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects, non-healthy controls and subjects suffering from a disease, specifically cancer, or between various stages of the same disease.
  • Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.
  • “differential gene expression” is considered to be present when there is a statistically significant (p < 0.05) difference in gene expression between the subject and control samples.
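The document does not say which statistical test yields the p-value. As one possible illustration, the sketch below runs an exact two-sided permutation test on the difference in group means for a single gene, using only the Python standard library. The test choice, the expression values and the name `permutation_p_value` are all assumptions.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the absolute difference in means.

    Enumerates every way of relabeling the pooled samples, so it is only
    practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point noise in the comparison.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical normalized expression of one gene in two groups.
cancer = [5.1, 5.3, 5.6, 5.8, 6.0]
nodule = [3.9, 4.0, 4.2, 4.4, 4.6]

p = permutation_p_value(cancer, nodule)
is_differential = p < 0.05
print(p, is_differential)
```

When many genes are tested at once, a multiple-testing correction would normally be applied on top of a per-gene threshold such as 0.05; that step is outside this sketch.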
  • RNA transcript is used to refer to the level of the transcript determined by normalization to the level of reference mRNAs, which might be all measured transcripts in the specimen or a particular reference set of mRNAs.
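Normalization to reference mRNAs, as described above, can be sketched as dividing each transcript level by the mean level of a reference set. The gene names, values and the function `normalize_to_reference` are hypothetical; passing every measured gene as the reference set corresponds to the "all measured transcripts" option.

```python
from statistics import mean


def normalize_to_reference(raw_levels, reference_genes):
    """Divide each raw transcript level by the mean level of the reference mRNAs."""
    ref_mean = mean(raw_levels[g] for g in reference_genes)
    return {gene: level / ref_mean for gene, level in raw_levels.items()}


# Hypothetical raw signal intensities; REF_1 and REF_2 stand in for reference mRNAs.
raw = {"GENE_A": 200.0, "GENE_B": 50.0, "REF_1": 90.0, "REF_2": 110.0}

normalized = normalize_to_reference(raw, ["REF_1", "REF_2"])
print(normalized["GENE_A"], normalized["GENE_B"])
```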
  • gene amplification refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line.
  • the duplicated region (a stretch of amplified DNA) is often referred to as “amplicon.”
  • the amount of the messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene expressed.
  • suitable gene expression profiles include profiles containing any number between at least 5 through 559 genes from Table I.
  • suitable gene expression profiles include profiles containing any number between at least 5 through 100 genes from Table II.
  • gene profiles formed by genes selected from a table are used in rank order, e.g., genes ranked in the top of the list demonstrated more significant discriminatory results in the tests, and thus may be more significant in a profile than lower ranked genes.
  • the genes forming a useful gene profile do not have to be in rank order and may be any gene from the table.
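Both ways of forming a profile described above, taking the top-ranked genes or picking any genes from the table, can be sketched in a few lines. The gene identifiers below are placeholders rather than entries from Table I or Table II, and the function names are assumptions.

```python
def top_ranked_profile(ranked_genes, k):
    """Return the k highest-ranked genes from a table listed in rank order."""
    if not 1 <= k <= len(ranked_genes):
        raise ValueError("k must be between 1 and the number of genes in the table")
    return ranked_genes[:k]


def subset_profile(table_genes, chosen):
    """Return an arbitrary profile, checking that every gene comes from the table."""
    table = set(table_genes)
    missing = [g for g in chosen if g not in table]
    if missing:
        raise ValueError(f"genes not in table: {missing}")
    return list(chosen)


# Placeholder identifiers standing in for a ranked gene table.
ranked_table = ["G1", "G2", "G3", "G4", "G5"]

top_two = top_ranked_profile(ranked_table, 2)
any_three = subset_profile(ranked_table, ["G5", "G2", "G3"])
print(top_two, any_three)
```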
  • the term “100 Classifier” or “100 Biomarker Classifier” refers to the 100 genes of Table II.
  • the term “559 Classifier” or “559 Biomarker Classifier” refers to the 559 genes of Table I.
  • subsets of the genes of Table I or Table II, as described herein, are also useful, and, in another embodiment, the terms may refer to those subsets as well.
  • “labels” or “reporter molecules” are chemical or biochemical moieties useful for labeling a nucleic acid (including a single nucleotide), polynucleotide, oligonucleotide, or protein ligand, e.g., amino acid or antibody.
  • “Labels” and “reporter molecules” include fluorescent agents, chemiluminescent agents, chromogenic agents, quenching agents, radionucleotides, enzymes, substrates, cofactors, inhibitors, magnetic particles, and other moieties known in the art.
  • “Labels” or “reporter molecules” are capable of generating a measurable signal and may be covalently or noncovalently joined or bound to an oligonucleotide or nucleotide (e.g., a non-natural nucleotide) or ligand.
  • the inventors have shown that the gene expression profiles of the whole blood of lung cancer patients differ significantly from those seen in patients having non-cancerous lung nodules. For example, changes in the gene expression products of the genes of Table I and/or Table II can be observed and detected by the methods of this invention in the normal circulating blood of patients with early stage solid lung tumors.
  • the gene expression profiles described herein provide new diagnostic markers for the early detection of lung cancer and could prevent patients from undergoing unnecessary procedures relating to surgery or biopsy for a benign nodule. Since the risks are very low, the benefit to risk ratio is very high.
  • the methods and compositions described herein may be used in conjunction with clinical risk factors to help physicians make more accurate decisions about how to manage patients with lung nodules. Another advantage of this invention is that diagnosis may occur early, since diagnosis does not depend on detecting circulating tumor cells, which are present in only vanishingly small numbers in early stage lung cancers.
  • a composition for classifying a nodule as cancerous or benign in a mammalian subject.
  • the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the polynucleotide or oligonucleotide or ligand hybridizes to an mRNA.
  • a novel gene expression profile or signature can identify and distinguish patients having cancerous tumors from patients having benign nodules. See, for example, the genes identified in Table I and Table II, which may form a suitable gene expression profile. In another embodiment, a portion of the genes of Table I form a suitable profile. In yet another embodiment, a portion of the genes of Table II form a suitable profile. As discussed herein, these profiles are used to distinguish between cancerous and non-cancerous tumors by generating a discriminant score based on differences in gene expression profiles as exemplified below. The validity of these signatures was established on samples collected at different locations by different groups in a cohort of patients with undiagnosed lung nodules. See Example 7, FIGS. 2A-2B and FIG. 6.
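  • The idea of a discriminant score can be illustrated with a minimal sketch. The scoring function is not specified in this passage, so a linear weighted sum with a cutoff of zero is assumed here purely for illustration; the weights, gene names, and threshold are all hypothetical.

```python
def discriminant_score(expression, weights, bias=0.0):
    """Weighted sum of normalized expression values (assumed linear form)."""
    missing = [gene for gene in weights if gene not in expression]
    if missing:
        raise KeyError(f"expression values missing for: {missing}")
    return sum(weights[gene] * expression[gene] for gene in weights) + bias


def classify(score, threshold=0.0):
    """Assumed decision rule: scores above the threshold are called cancerous."""
    return "cancerous" if score > threshold else "benign"


# Hypothetical weights and normalized expression values for two placeholder genes.
weights = {"GENE_A": 1.5, "GENE_B": -2.0}
sample = {"GENE_A": 2.0, "GENE_B": 0.5}
score = discriminant_score(sample, weights)
label = classify(score)
```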
  • the lung cancer signatures or gene expression profiles identified herein i.e., Table I or Table II
  • the composition includes 10 to 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes 10 to 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II.
  • the composition includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117
  • the composition includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridize
  • the composition includes at least 3 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 5 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 15 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 20 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 25 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 30 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 35 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 40 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 45 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 50 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 55 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 60 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 65 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 70 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 75 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 80 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 85 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 90 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 95 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 150 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes at least 200 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 250 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes at least 300 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 350 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes at least 400 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 450 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes at least 500 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
  • the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I.
  • the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table II.
  • the expression profile is formed by the first 3 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 5 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 10 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 15 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 20 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 25 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 30 genes in rank order of Table I or Table II.
  • the expression profile is formed by the first 35 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 40 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 45 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 50 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 55 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 60 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 65 genes in rank order of Table I or Table II.
  • the expression profile is formed by the first 70 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 75 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 80 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 85 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 90 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 95 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 100 genes in rank order of Table I or Table II.
  • the expression profile is formed by the first 150 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 200 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 250 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 300 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 350 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 400 genes in rank order of Table I. In yet another embodiment, the expression profile is formed by the first 539 genes in rank order of Table I.
  • compositions described herein can be used with the gene expression profiling methods which are known in the art.
  • the compositions can be adapted accordingly to suit the method for which they are intended to be used.
  • at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label.
  • each polynucleotide or oligonucleotide is attached to a different detectable label, each capable of being detected independently.
  • Such reagents are useful in assays such as the nCounter, as described below, and with the diagnostic methods described herein.
  • the composition comprises a capture oligonucleotide or ligand, which hybridizes to at least one polynucleotide or oligonucleotide or ligand.
  • capture oligonucleotide or ligand may include a nucleic acid sequence which is specific for a portion of the oligonucleotide or polynucleotide or ligand which is specific for the gene of interest.
  • the capture ligand may be a peptide or polypeptide which is specific for the ligand to the gene of interest.
  • the capture ligand is an antibody, as in a sandwich ELISA.
  • the capture oligonucleotide also includes a moiety which allows for binding with a substrate.
  • a substrate includes, without limitation, a plate, bead, slide, well, chip or chamber.
  • the composition includes a capture oligonucleotide for each different polynucleotide or oligonucleotide which is specific to a gene of interest.
  • Each capture oligonucleotide may contain the same moiety which allows for binding with the same substrate.
  • the binding moiety is biotin.
  • a composition for such diagnosis or evaluation in a mammalian subject as described herein can be a kit or a reagent.
  • a composition includes a substrate upon which the ligands used to detect and quantitate mRNA are immobilized.
  • the reagent in one embodiment, is an amplification nucleic acid primer (such as an RNA primer) or primer pair that amplifies and detects a nucleic acid sequence of the mRNA.
  • the reagent is a polynucleotide probe that hybridizes to the target sequence.
  • the target sequences are illustrated in Table III.
  • the reagent is an antibody or fragment of an antibody.
  • the reagent can include multiple said primers, probes or antibodies, each specific for at least one gene, gene fragment or expression product of Table I or Table II.
  • the reagent can be associated with a conventional detectable label.
  • the composition is a kit containing the relevant multiple polynucleotides or oligonucleotide probes or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items.
  • at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label.
  • the reagent is immobilized on a substrate.
  • Exemplary substrates include a microarray, chip, microfluidics card, or chamber.
  • the composition is a kit designed for use with the nCounter Nanostring system, as further discussed below.
  • Methods of gene expression profiling that were used in generating the profiles useful in the compositions and methods described herein or in performing the diagnostic steps using the compositions described herein are known and well summarized in U.S. Pat. No. 7,081,340.
  • Such methods of gene expression profiling include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, and proteomics-based methods.
  • the most commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization; RNAse protection assays; nCounter® Analysis; and PCR-based methods, such as RT-PCR.
  • antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes.
  • Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).
  • compositions described herein are adapted for use in the methods of gene expression profiling and/or diagnosis described herein, and those known in the art.
  • sample or “biological sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells.
  • a suitable sample is whole blood.
  • the sample may be venous blood.
  • the sample may be arterial blood.
  • a suitable sample for use in the methods described herein includes peripheral blood, more specifically peripheral blood mononuclear cells.
  • Other useful biological samples include, without limitation, plasma or serum.
  • the sample is saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoalveolar lavage fluid, and other cellular exudates from a subject suspected of having a lung disease.
  • samples may further be diluted with saline, buffer or a physiologically acceptable diluent.
  • such samples are concentrated by conventional means. It should be understood that the use or reference throughout this specification to any one biological sample is exemplary only. For example, where in the specification the sample is referred to as whole blood, it is understood that other samples, e.g., serum, plasma, etc., may also be employed in another embodiment.
  • the biological sample is whole blood
  • the method employs the PAXgene Blood RNA Workflow system (Qiagen). That system involves blood collection (e.g., single blood draws) and RNA stabilization, followed by transport and storage, followed by purification of total RNA and molecular RNA testing.
  • This system provides immediate RNA stabilization and consistent blood draw volumes.
  • the blood can be drawn at a physician's office or clinic, and the specimen transported and stored in the same tube.
  • Short term RNA stability is 3 days at 18-25° C. or 5 days at 2-8° C. Long term RNA stability is 4 years at −20 to −70° C.
  • This sample collection system enables the user to reliably obtain data on gene expression in whole blood.
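  • The stability figures stated above can be encoded as a simple lookup, sketched here. The function names are hypothetical, and treating 4 years as 4 × 365 days is an approximation made only for this example.

```python
# (min_temp_c, max_temp_c, max_days) taken from the stability figures in the text.
STABILITY_WINDOWS = [
    (18.0, 25.0, 3),           # short term, room temperature
    (2.0, 8.0, 5),             # short term, refrigerated
    (-70.0, -20.0, 4 * 365),   # long term, frozen (4 years approximated in days)
]


def max_storage_days(temp_c):
    """Return the stated stability limit in days, or None if no range applies."""
    for low, high, days in STABILITY_WINDOWS:
        if low <= temp_c <= high:
            return days
    return None


def within_stability(temp_c, days_stored):
    """True when the storage time is within the stated window for that temperature."""
    limit = max_storage_days(temp_c)
    return limit is not None and days_stored <= limit
```

  • Temperatures that fall outside the three stated ranges return `None`, since the text gives no stability figure for them.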
  • the biological sample is whole blood. While the PAXgene system has more noise than the use of PBMC as a biological sample source, the benefits of PAXgene sample collection outweigh the problems. Noise can be subtracted bioinformatically by the person of skill in the art.
  • the biological samples may be collected using the proprietary PAXgene Blood RNA System (PreAnalytiX, a Qiagen/BD company).
  • the PAXgene Blood RNA System comprises two integrated components: PAXgene Blood RNA Tube and the PAXgene Blood RNA Kit. Blood samples are drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex-vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, batch samples, and to minimize the urgency to process samples following collection, greatly enhances lab efficiency and reduces costs. Thereafter, the miRNA is detected and/or measured using a variety of assays.
  • nCounter® Analysis system (NanoString Technologies, Inc., Seattle WA).
  • the nCounter Analysis System utilizes a digital color-coded barcode technology that is based on direct multiplexed measurement of gene expression and offers high levels of precision and sensitivity (<1 copy per cell).
  • the technology uses molecular “barcodes” and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction.
  • Each color-coded barcode is attached to a single target-specific probe (i.e., polynucleotide, oligonucleotide or ligand) corresponding to a gene of interest, i.e., a gene of Table I.
  • the CodeSet includes all 559 genes of Table I. In another embodiment, the CodeSet includes all 100 genes of Table II. In another embodiment, the CodeSet includes at least 3 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 5 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 10 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 15 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 20 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 25 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 30 genes of Table I or Table II.
  • the CodeSet includes at least 40 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 50 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 60 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 70 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 80 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 90 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 100 genes of Table I. In another embodiment, the CodeSet includes at least 200 genes of Table I. In another embodiment, the CodeSet includes at least 300 genes of Table I. In yet another embodiment, the CodeSet includes at least 400 genes of Table I.
  • the CodeSet includes at least 500 genes of Table I. In yet another embodiment, the CodeSet is formed by the first 539 genes in rank order of Table I. In yet another embodiment, the CodeSet includes any subset of genes of Table I, as described herein. In another embodiment, the CodeSet includes any subset of genes of Table II, as described herein.
  • the NanoString platform employs two ~50 base probes per mRNA that hybridize in solution.
  • the Reporter Probe carries the signal; the Capture Probe allows the complex to be immobilized for data collection.
  • the probes are mixed with the patient sample. After hybridization, the excess probes are removed and the probe/target complexes aligned and immobilized to a substrate, e.g., in the nCounter Cartridge.
  • Sample Cartridges are placed in the Digital Analyzer for data collection. Color codes on the surface of the cartridge are counted and tabulated for each target molecule.
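  • The counting and tabulation step can be pictured with the following sketch. It assumes each imaged reporter is read out as a string of color codes and that a lookup table assigns each barcode to a target; the barcode strings, the mapping, and the function name are hypothetical and are not the instrument's actual data format.

```python
from collections import Counter


def tabulate_barcodes(observed_barcodes, barcode_to_gene):
    """Count how many times each target's barcode was observed.

    Barcodes that are not in the lookup table are ignored.
    """
    counts = Counter()
    for barcode in observed_barcodes:
        gene = barcode_to_gene.get(barcode)
        if gene is not None:
            counts[gene] += 1
    return counts


# Hypothetical color-code strings (G, R, B, Y) and placeholder target names.
barcode_to_gene = {"GRBYGR": "GENE_A", "YBGRYB": "GENE_B"}
observed = ["GRBYGR", "YBGRYB", "GRBYGR", "RRRRRR", "GRBYGR"]
counts = tabulate_barcodes(observed, barcode_to_gene)
```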
  • A benefit of the use of the NanoString nCounter system is that no amplification of mRNA is necessary in order to perform the detection and quantification.
  • other suitable quantitative methods are used. See, e.g., Geiss et al, Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol. 2008 March; 26(3):317-25. doi: 10.1038/nbt1385. Epub 2008 Feb. 17, which is incorporated herein by reference in its entirety.
  • Another suitable quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure.
  • the first step is the isolation of mRNA from a target sample (e.g., typically total RNA isolated from human PBMC).
  • mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples.
  • RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, according to the manufacturer's instructions.
  • Exemplary commercial products include TRI-REAGENT, Qiagen RNeasy mini-columns, MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNA Stat-60 (Tel-Test).
  • Conventional techniques such as cesium chloride density gradient centrifugation may also be employed.
  • the first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction.
  • the two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT).
  • the reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. See, e.g., manufacturer's instructions accompanying the product GENEAMP RNA PCR kit (Perkin Elmer, Calif., USA).
  • the derived cDNA can then be used as a template in the subsequent RT-PCR reaction.
  • the PCR step generally uses a thermostable DNA-dependent DNA polymerase, such as the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity.
  • TAQMAN® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used.
  • Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction.
  • the target sequence is shown in Table III.
  • a third oligonucleotide, or probe is designed to detect nucleotide sequence located between the two PCR primers.
  • the probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe.
  • the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner.
  • the resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore.
  • One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
  • TaqMan® RT-PCR can be performed using commercially available equipment.
  • the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7900® Sequence Detection System®.
  • the system amplifies samples in a 96-well format on a thermocycler.
  • laser-induced fluorescent signal is collected in real-time through fiber optic cables for all 96 wells, and detected at the CCD.
  • the system includes software for running the instrument and for analyzing the data.
  • 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
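The threshold-cycle idea above can be sketched in a few lines. This is a minimal illustration, not the instrument software's algorithm: the baseline window (first 5 cycles) and the 10-standard-deviation cutoff are assumptions chosen for the example, and the readings are made up.

```python
from statistics import mean, stdev

def threshold_cycle(fluorescence, baseline_cycles=5, n_sd=10.0):
    """Return the first 1-based cycle whose signal exceeds baseline mean + n_sd * SD.

    The baseline is estimated from the first `baseline_cycles` readings.
    Returns None if the signal never crosses the threshold.
    """
    baseline = fluorescence[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for cycle, signal in enumerate(fluorescence, start=1):
        if cycle > baseline_cycles and signal > threshold:
            return cycle
    return None

# Hypothetical per-cycle fluorescence readings (arbitrary units).
readings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 1.10, 1.40, 2.20, 4.00]
ct = threshold_cycle(readings)
```

A lower Ct indicates that more starting template was present, since the signal rises above background after fewer cycles.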
  • RT-PCR is usually performed using an internal standard.
  • the ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment.
  • RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
  • Real time PCR is comparable both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR.
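As an illustration of normalizing to a housekeeping gene, the sketch below uses the widely used comparative (2^-ΔΔCt) convention. That convention, the assumption of roughly 100% amplification efficiency, and the Ct values are not taken from this document; they are assumptions for the example.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus a control, normalized to a housekeeping gene."""
    delta_sample = ct_target_sample - ct_ref_sample      # normalize within the test sample
    delta_control = ct_target_control - ct_ref_control   # normalize within the control sample
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)  # assumes product doubles every cycle

# Hypothetical Ct values: target gene and a housekeeping gene (e.g. GAPDH).
fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                           ct_target_control=26.0, ct_ref_control=18.0)
```

With these inputs the target crosses threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a four-fold higher relative level.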
  • the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions, except a single base, and serves as an internal standard.
  • the cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides.
  • the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for analysis with matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis.
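One way to read the competitor approach quantitatively is as a ratio against a known spike-in amount. The sketch below assumes that the target cDNA amount scales with the ratio of the two mass signals; the function name, peak areas and copy number are hypothetical and not taken from this document.

```python
def estimate_cdna_copies(cdna_peak_area, competitor_peak_area, competitor_copies):
    """Estimate target cDNA copies from the cDNA:competitor mass-signal ratio."""
    if competitor_peak_area <= 0:
        raise ValueError("competitor signal must be positive")
    return competitor_copies * (cdna_peak_area / competitor_peak_area)

# Hypothetical peak areas for the two primer-extension products.
copies = estimate_cdna_copies(cdna_peak_area=300.0,
                              competitor_peak_area=150.0,
                              competitor_copies=1000)
```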
  • PCR-based techniques which are known to the art and may be used for gene expression profiling include, e.g., differential display, amplified fragment length polymorphism (iAFLP), and BeadArray™ technology (Illumina, San Diego, CA) using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression; and high coverage expression profiling (HiCEP) analysis.
  • Differential gene expression can also be identified, or confirmed, using the microarray technique.
  • the expression profile of lung cancer-associated genes can be measured in either fresh or paraffin-embedded tissue, using microarray technology.
  • polynucleotide sequences of interest including cDNAs and oligonucleotides
  • the arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest.
  • the source of mRNA is total RNA isolated from whole blood of controls and patient subjects.
  • PCR amplified inserts of cDNA clones are applied to a substrate in a dense array.
  • all 559 nucleotide sequences from Table III are applied to the substrate.
  • the microarrayed genes, immobilized on the microchip, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera.
  • Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance.
  • dual color fluorescence separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously.
  • the miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols.
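For a dual-color hybridization, relative abundance per gene is commonly summarized as a log2 ratio of the two channel intensities. The sketch below is illustrative only: the gene names, intensities and the intensity floor are assumptions, and the two-fold cutoff simply mirrors the detection limit mentioned above.

```python
import math

def log2_ratios(sample_intensity, reference_intensity, floor=1.0):
    """Per-gene log2(sample / reference); intensities below `floor` are raised to it."""
    ratios = {}
    for gene, sample_value in sample_intensity.items():
        reference_value = reference_intensity[gene]
        ratios[gene] = math.log2(max(sample_value, floor) / max(reference_value, floor))
    return ratios

# Hypothetical background-corrected intensities for the two dye channels.
sample = {"GENE_A": 800.0, "GENE_B": 100.0, "GENE_C": 300.0}
reference = {"GENE_A": 200.0, "GENE_B": 400.0, "GENE_C": 300.0}
ratios = log2_ratios(sample, reference)

# Genes differing by at least two-fold in either direction (|log2 ratio| >= 1).
two_fold = sorted(gene for gene, value in ratios.items() if abs(value) >= 1.0)
```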
  • Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript.
  • the expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), both of which are incorporated herein by reference.
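The tag-counting step can be sketched as follows. The tag sequences, the tag-to-gene lookup and the gene names are hypothetical; a real analysis would use a curated tag annotation.

```python
from collections import Counter

def tag_abundance(observed_tags, tag_to_gene):
    """Count each sequence tag, then sum counts per gene; unknown tags go to 'unassigned'."""
    per_gene = Counter()
    for tag, count in Counter(observed_tags).items():
        per_gene[tag_to_gene.get(tag, "unassigned")] += count
    return per_gene

# Hypothetical tags and gene names, for illustration only.
tag_to_gene = {"CATGAAACTG": "GENE_A", "CATGTTTGCA": "GENE_B"}
observed = ["CATGAAACTG", "CATGTTTGCA", "CATGAAACTG", "CATGGGGCCC", "CATGAAACTG"]
abundance = tag_abundance(observed, tag_to_gene)
```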
  • MPSS Massively Parallel Signature Sequencing
  • the free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
  • Immunohistochemistry methods are also suitable for detecting the expression levels of the gene expression products of the informative genes described for use in the methods and compositions herein.
  • Antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, or other protein-binding ligands specific for each marker are used to detect expression.
  • the antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase.
  • unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Protocols and kits for immunohistochemical analyses are well known in the art and are commercially available.
  • a composition for diagnosing lung cancer in a mammalian subject as described herein can be a kit or a reagent.
  • a composition includes a substrate upon which said polynucleotides or oligonucleotides or ligands are immobilized.
  • the composition is a kit containing the relevant 5 or more polynucleotides or oligonucleotides or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items.
  • at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label.
  • a composition for diagnosing lung cancer in a mammalian subject includes 5 or more PCR primer-probe sets. Each primer-probe set amplifies a different polynucleotide sequence from a gene expression product of 5 or more informative genes found in the blood of the subject. These informative genes are selected to form a gene expression profile or signature which is distinguishable between a subject having lung cancer and a subject having a non-cancerous nodule. Changes in expression in the genes in the gene expression profile from that of a reference gene expression profile are correlated with a lung cancer, such as non-small cell lung cancer (NSCLC).
  • the informative genes are selected from among the genes identified in Table I. In another embodiment of this composition, the informative genes are selected from among the genes identified in Table II. This collection of genes is those for which the gene product expression is altered (i.e., increased or decreased) versus the same gene product expression in the blood of a reference control (i.e., a patient having a non-cancerous nodule).
  • polynucleotide or oligonucleotide or ligands i.e., probes
  • An example of such a composition contains probes to a targeted portion of the 559 genes of Table I.
  • probes are generated to all 559 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 539 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 3 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 5 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 10 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 15 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 20 genes from Table I or Table II for use in the composition.
  • probes are generated to the first 25 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 30 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 35 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 40 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 45 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 50 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 60 genes from Table I or Table II for use in the composition.
  • probes are generated to the first 65 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 70 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 75 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 80 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 85 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 90 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 95 genes from Table I or Table II for use in the composition.
  • probes are generated to the first 100 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 200 genes from Table I for use in the composition. In yet another embodiment, probes are generated to 300 genes from Table I for use in the composition. Still other embodiments employ probes to a targeted portion of other combinations of the genes in Table I or Table II. The selected genes from the Table need not be in rank order; rather any combination that clearly shows a difference in expression between the reference control and the diseased patient is useful in such a composition.
  • the reference control is a non-healthy control (NHC) as described above.
  • the reference control may be any class of controls as described above in “Definitions”.
  • compositions based on the genes selected from Table I or Table II described herein, optionally associated with detectable labels can be presented in the format of a microfluidics card, a chip or chamber, or a kit adapted for use with the Nanostring, PCR, RT-PCR or Q PCR techniques described above.
  • a format is a diagnostic assay using TAQMAN® Quantitative PCR low density arrays.
  • such a format is a diagnostic assay using the Nanostring nCounter platform.
  • the PCR primers and probes are preferably designed based upon intron sequences present in the gene(s) to be amplified selected from the gene expression profile.
  • Exemplary target sequences are shown in Table III.
  • the design of the primer and probe sequences is within the skill of the art once the particular gene target is selected.
  • the particular methods selected for the primer and probe design and the particular primer and probe sequences are not limiting features of these compositions.
  • a ready explanation of primer and probe design techniques available to those of skill in the art is summarized in U.S. Pat. No.
  • optimal PCR primers and probes used in the compositions described herein are generally 17-30 bases in length, and contain about 20-80%, such as, for example, about 50-60% G+C bases. Melting temperatures of between 50 and 80° C., e.g. about 50 to 70° C. are typically preferred.
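The length, G+C and melting-temperature ranges above can be turned into a simple screening check. This is a rough sketch: the Wallace rule used for the melting temperature is an assumed approximation (it is crude for oligonucleotides of this length), and the primer sequence is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def meets_guidelines(seq, min_len=17, max_len=30,
                     min_gc=0.20, max_gc=0.80, min_tm=50, max_tm=80):
    """Check a primer or probe against the length, G+C and Tm ranges given in the text."""
    return (min_len <= len(seq) <= max_len
            and min_gc <= gc_fraction(seq) <= max_gc
            and min_tm <= wallace_tm(seq) <= max_tm)

primer = "AGCTGACCTGAGGTCCATGA"  # hypothetical 20-mer, not from Table III
ok = meets_guidelines(primer)
```

In practice a nearest-neighbor thermodynamic model would give a more reliable melting temperature; the simple rule is used here only to keep the example self-contained.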
  • a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I.
  • a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I or Table II.
  • composition relies on recognition of the same gene profiles as described above for the Nanostring compositions but employs the techniques of a cDNA array. Hybridization of the immobilized polynucleotides in the composition to the gene expression products present in the blood of the patient subject is employed to quantitate the expression of the informative genes selected from among the genes identified in Table I or Table II to generate a gene expression profile for the patient, which is then compared to that of a reference sample. As described above, depending upon the identification of the profile (i.e., that of genes of Table I or subsets thereof, that of genes of Table II or subsets thereof), this composition enables the diagnosis and prognosis of NSCLC lung cancers.
  • a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject.
  • a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject.
  • the gene expression profile contains the genes of Table I or Table II, as described above for the other compositions. This composition enables detection of the proteins expressed by the genes in the indicated Tables.
  • the ligands are antibodies to the proteins encoded by the genes in the profile.
  • various forms of antibody e.g., polyclonal, monoclonal, recombinant, chimeric, as well as fragments and components (e.g., CDRs, single chain variable regions, etc.) may be used in place of antibodies.
  • Such ligands may be immobilized on suitable substrates for contact with the subject's blood and analyzed in a conventional fashion.
  • the ligands are associated with detectable labels.
  • These compositions also enable detection of changes in proteins encoded by the genes in the gene expression profile from those of a reference gene expression profile. Such changes correlate with lung cancer in a manner similar to that for the PCR and polynucleotide-containing compositions described above.
  • the gene expression profile can, in one embodiment, include at least the first 25 of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 10 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 15 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 20 or more of the informative genes of Table I or Table II.
  • the gene expression profile can include 30 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 40 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 50 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 60 or more of the informative genes of Table I or Table II.
  • the gene expression profile can include 70 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 80 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 90 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 100 of the informative genes of Table II. In one embodiment, for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include at least the first 100 of the informative genes of Table I.
  • the gene expression profile can include 200 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 300 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 400 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 500 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 539 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 559 of the informative genes of Table I.
  • compositions may be used to diagnose lung cancers, such as stage I or stage II NSCLC. Further these compositions are useful to provide a supplemental or original diagnosis in a subject having lung nodules of unknown etiology.
  • compositions provide a variety of diagnostic tools which permit a blood-based, non-invasive assessment of disease status in a subject.
  • Use of these compositions in diagnostic tests, which may be coupled with other screening tests, such as a chest X-ray or CT scan, increases diagnostic accuracy and/or directs additional testing.
  • a method for diagnosing lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a mammalian, preferably human, subject.
  • the gene expression profile includes 100 or more gene expression products of 100 or more informative genes having increased or decreased expression in lung cancer.
  • the gene expression profiles are formed by selection of 100 or more informative genes from the genes of Table I.
  • the gene expression profile includes 10 or more gene expression products of 10 or more informative genes having increased or decreased expression in lung cancer.
  • the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table I.
  • the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table II.
  • the gene expression profile includes 5 or more gene expression products of 5 or more informative genes having increased or decreased expression in lung cancer.
  • the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table I.
  • the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table II.
  • Comparison of a subject's gene expression profile with a reference gene expression profile permits identification of changes in expression of the informative genes that correlate with a lung cancer (e.g., NSCLC). This method may be performed using any of the compositions described above. In one embodiment, the method enables the diagnosis of a cancerous tumor from a benign nodule.
  • compositions described herein are provided for diagnosing lung cancer in a subject.
  • the diagnostic compositions and methods described herein provide a variety of advantages over current diagnostic methods. Among such advantages are the following. As exemplified herein, subjects with cancerous tumors are distinguished from those with benign nodules. These methods and compositions provide a solution to the practical diagnostic problem of whether a patient who presents at a lung clinic with a small nodule has malignant disease. Patients with an intermediate-risk nodule would clearly benefit from a non-invasive test that would move the patient into either a very low-likelihood or a very high-likelihood category of disease risk. An accurate estimate of malignancy based on a genomic profile (i.e., estimating a given patient has a 90% probability of having cancer versus estimating the patient has only a 5% chance of having cancer) would result in fewer surgeries for benign disease, more early stage tumors removed at a curable stage, fewer follow-up CT scans, and reduction of the significant psychological costs of worrying about a nodule.
  • the economic impact would also likely be significant, such as reducing the current estimated cost of additional health care associated with CT screening for lung cancer, i.e., $116,000 per quality adjusted life-year gained.
  • a non-invasive blood genomics test that has a sufficient sensitivity and specificity would significantly alter the post-test probability of malignancy and thus, the subsequent clinical care.
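How a test with a given sensitivity and specificity shifts the probability of malignancy follows from Bayes' rule. In the sketch below the 30% pre-test probability is a hypothetical value for an intermediate-risk nodule; the 90% sensitivity and 46% specificity are figures reported elsewhere in this document and are used only as an illustration.

```python
def post_test_probability(pretest, sensitivity, specificity, test_positive=True):
    """Probability of disease after a test result, by Bayes' rule."""
    if test_positive:
        true_positive = pretest * sensitivity
        false_positive = (1 - pretest) * (1 - specificity)
        return true_positive / (true_positive + false_positive)
    false_negative = pretest * (1 - sensitivity)
    true_negative = (1 - pretest) * specificity
    return false_negative / (false_negative + true_negative)

# Hypothetical pre-test probability; sensitivity/specificity as illustrative inputs.
p_after_positive = post_test_probability(0.30, 0.90, 0.46, test_positive=True)
p_after_negative = post_test_probability(0.30, 0.90, 0.46, test_positive=False)
```

Under these assumptions a negative result lowers the estimated probability of malignancy substantially more than a positive result raises it, which is the expected behavior of a high-sensitivity, moderate-specificity test.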
  • a desirable advantage of these methods over existing methods is that they are able to characterize the disease state from a minimally-invasive procedure, i.e., by taking a blood sample.
  • current practice for classification of cancer tumors from gene expression profiles depends on a tissue sample, usually a sample from a tumor. In the case of very small tumors a biopsy is problematic and clearly if no tumor is known or visible, a sample from it is impossible. No purification of tumor is required, as is the case when tumor samples are analyzed.
  • a recently published method depends on brushing epithelial cells from the lung during bronchoscopy, a method which is also considerably more invasive than taking a blood sample. Blood samples have an additional advantage, which is that the material is easily prepared and stabilized for later analysis, which is important when messenger RNA is to be analyzed.
  • the 559 classifier described herein showed a ROC-AUC of 0.81 over all tested samples.
  • the specificity is about 46%.
  • the nodule classification accuracy is assessed by size without using a specific threshold for sensitivity; as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases.
  • the accuracy of the gene classifier is about 89% for nodules ≤8 mm.
  • the accuracy of the gene classifier is about 75% for nodules >8 to about ≤12 mm.
  • the accuracy of the gene classifier is about 68% for nodules >12 to about ≤16 mm.
  • the accuracy of the gene classifier is about 53% for nodules ≥16 mm. See examples below.
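Accuracy stratified by nodule size can be computed by binning each nodule and comparing the classifier call with the confirmed diagnosis. The sketch below uses the 8, 12 and 16 mm cut points from the text; the records are invented for illustration and do not reproduce the study data.

```python
def size_bin(size_mm):
    """Assign a nodule to a size bin using the 8, 12 and 16 mm cut points."""
    if size_mm <= 8:
        return "<=8 mm"
    if size_mm <= 12:
        return ">8-12 mm"
    if size_mm <= 16:
        return ">12-16 mm"
    return ">16 mm"

def accuracy_by_size(records):
    """records: iterable of (size_mm, predicted_label, true_label) tuples."""
    totals, correct = {}, {}
    for size_mm, predicted, actual in records:
        label = size_bin(size_mm)
        totals[label] = totals.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (predicted == actual)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical classifier calls versus confirmed diagnoses.
records = [
    (6, "benign", "benign"), (7, "cancer", "cancer"),
    (10, "cancer", "benign"), (11, "cancer", "cancer"),
    (14, "cancer", "cancer"), (20, "cancer", "benign"),
]
accuracy = accuracy_by_size(records)
```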
  • the specificity is about 54% and the ROC-AUC is about 0.85 at about 90% sensitivity. In another embodiment, for larger nodules, about >10 mm, the specificity is about 24% and the ROC-AUC about 0.71 at about 90% sensitivity.
  • the 100 Classifier described herein showed a ROC-AUC of 0.82 over all tested samples.
  • the specificity is about 62%.
  • the sensitivity is about 79%, the specificity is about 68%.
  • the sensitivity is about 71%, the specificity is about 75%. See examples below.
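The performance figures quoted above (ROC-AUC, and specificity at a fixed sensitivity) can be computed from classifier scores as sketched below. The scores and labels are invented for illustration; the functions are a plain-Python sketch, not the analysis pipeline used for the reported results.

```python
def roc_auc(scores, labels):
    """Probability that a random cancer sample outscores a random benign one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Highest specificity among score thresholds whose sensitivity meets the target."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        specificity = sum(n < threshold for n in neg) / len(neg)
        if sensitivity >= target_sensitivity:
            best = max(best, specificity)
    return best

# Hypothetical classifier scores; label 1 = cancer, 0 = benign nodule.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
spec_at_90 = specificity_at_sensitivity(scores, labels, 0.90)
```

Fixing sensitivity first and then reporting specificity reflects the trade-off listed above: lowering the required sensitivity allows a stricter threshold and therefore a higher specificity.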
  • compositions and methods allow for more accurate diagnosis and treatment of lung cancer.
  • the methods described include treatment of the lung cancer. Treatment may include removal of the neoplastic growth, chemotherapy and/or any other treatment known in the art or described herein.
  • a method for diagnosing the existence or evaluating a lung cancer in a mammalian subject which includes identifying changes in the expression of 5, 10, 15 or more genes in the sample of said subject, said genes selected from the genes of Table I or the genes of Table II.
  • the subject's gene expression levels are compared with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.
  • the diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy.
  • the changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.
  • the method includes the size of a lung nodule in the subject.
  • the specificity and sensitivity may be variable based on the size of the nodule.
  • the specificity is about 46% at about 90% sensitivity.
  • the specificity is about 54% at about 90% sensitivity for nodules ≤10 mm.
  • the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.
  • the reference or control comprises three or more genes of Table I from a sample of at least one reference subject.
  • the reference subject may be selected from the group consisting of: (a) a smoker with malignant disease, (b) a smoker with non-malignant disease, (c) a former smoker with non-malignant disease, (d) a healthy non-smoker with no disease, (e) a non-smoker who has chronic obstructive pulmonary disease (COPD), (f) a former smoker with COPD, (g) a subject with a solid lung tumor prior to surgery for removal of same; (h) a subject with a solid lung tumor following surgical removal of said tumor; (i) a subject with a solid lung tumor prior to therapy for same; and (j) a subject with a solid lung tumor during or following therapy for same.
  • the reference or control subject (a)-(j) is the same test subject at a temporally earlier timepoint.
  • the sample is selected from those described herein.
  • the sample is peripheral blood.
  • the nucleic acids in the sample are, in some embodiments, stabilized prior to identifying changes in the gene expression levels. Such stabilization may be accomplished, e.g., using the PAXgene system, described herein.
  • the method of detecting lung cancer in a patient includes
  • the method of diagnosing lung cancer in a subject includes
  • the method includes
  • Example 1 Patient Population—Analysis A
  • Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
  • the “control” cohort was derived from patients with benign lung nodules (e.g. ground glass opacities, single nodules, granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.
  • Example 2 Patient Population—Analysis B
  • Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
  • the “control” cohort was derived from patients with benign lung nodules (e.g. granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.
  • Blood samples were collected in the clinic by the tissue acquisition technician. Blood samples were drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex-vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, batch samples, and to minimize the urgency to process samples following collection, greatly enhances lab efficiency and reduces costs.
  • PAXgene RNA is prepared using a standard commercially available kit from Qiagen™ that allows purification of mRNA. The resulting RNA is used for mRNA profiling. The RNA quality is determined using a Bioanalyzer. Only samples with RNA Integrity numbers >3 were used.
  • RNA is isolated as follows. Turn shaker-incubator on and set to 55° C. before beginning. Unless otherwise noted, all steps in this protocol including centrifugation steps, should be carried out at room temp (15-25° C.). This protocol assumes samples are stores at ⁇ 80° C. Unfrozen samples that have been left a RT per the Qiagen protocol of a minimum of 2 hours should be processed in the same way.
  • If RNA samples will not be used immediately, store at −20° C. or −70° C. Since the RNA remains denatured after repeated freezing and thawing, it is not necessary to repeat the incubation at 65° C.
  • a gene expression profile with the smallest number of genes that maintain satisfactory accuracy is provided by the use of 100 or more of the genes identified in Table I as well as by the use of 10 or more of the genes identified in Table II.
  • These gene profiles or signatures permit simpler and more practical tests that are easy to use in a standard clinical laboratory. Because the number of discriminating genes is small enough, NanoString nCounter® platforms are developed using these gene expression profiles.
  • samples were then transferred to the nCounter Prep Station for processing using the Standard Protocol setting (Run Time: 2 hr 35 min).
  • the Prep Station robot, during the Standard Protocol, washed samples to remove excess Reporter and Capture Probes.
  • Samples were moved to a streptavidin-coated cartridge where purified target-probe complexes were immobilized in preparation for imaging by the nCounter Digital Analyzer.
  • the cartridge was sealed and placed in the Digital Analyzer using a Field of View (FOV) setting at 555.
  • a fluorescent microscope tabulated the raw counts for each unique barcode associated with a target mRNA. Data collected was stored in .csv files and then transferred to the Bioinformatics Facility for analysis according to the manufacturer's instructions.
  • Support Vector Machine can be applied to gene expression datasets for gene function discovery and classification.
  • SVM has been found to be most efficient at distinguishing the more closely related cases and controls that reside in the margins.
  • SVM-RFE (48, 54) was used to develop gene expression classifiers which distinguish clinically defined classes of patients from clinically defined classes of controls (smokers, non-smokers, COPD, granuloma, etc).
  • SVM-RFE is a SVM based model utilized in the art that removes genes, recursively based on their contribution to the discrimination, between the two classes being analyzed. The lowest scoring genes by coefficient weights were removed and the remaining genes were scored again and the procedure was repeated until only a few genes remained. This method has been used in several studies to perform classification and gene selection tasks. However, choosing appropriate values of the algorithm parameters (penalty parameter, kernel-function, etc.) can often influence performance.
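  • The recursive elimination loop described above can be sketched in a few lines. This is a minimal illustration only, not the classifier used in the study: the linear SVM is a toy trained by batch subgradient descent, the squared coefficient weight is assumed as the ranking score, and the function names, gene names, expression values, and parameter values (`epochs`, `lr`, `C`) are all hypothetical.

```python
def train_linear_svm(X, y, epochs=200, lr=0.01, C=1.0):
    """Toy linear SVM: batch subgradient descent on the regularized hinge loss.

    X is a list of per-sample feature rows, y holds labels in {-1, +1}.
    Returns (weights, bias). This is an illustrative stand-in, not a
    production solver.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for row, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, row)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j, xj in enumerate(row):
                    grad_w[j] -= C * label * xj
                grad_b -= C * label
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, gene_names, n_keep, drop_per_round=1):
    """Retrain, score genes by squared weight, drop the lowest, repeat."""
    active = list(range(len(gene_names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        order = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        n_drop = min(drop_per_round, len(active) - n_keep)
        dropped = set(order[:n_drop])
        active = [g for k, g in enumerate(active) if k not in dropped]
    return [gene_names[j] for j in active]


# Hypothetical toy data: two genes track the class label, three do not.
names = ["GENE_A", "GENE_B", "NOISE_1", "NOISE_2", "NOISE_3"]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = cancer, -1 = benign nodule
X = [
    [2.0, 1.5, 0.1, 0.2, 0.1],
    [2.0, 2.5, -0.1, 0.2, 0.1],
    [2.0, 2.0, 0.1, -0.2, 0.1],
    [2.0, 2.0, -0.1, -0.2, 0.1],
    [-2.0, -2.0, 0.1, 0.2, 0.1],
    [-2.0, -1.5, -0.1, 0.2, 0.1],
    [-2.0, -2.5, 0.1, -0.2, 0.1],
    [-2.0, -2.0, -0.1, -0.2, 0.1],
]
selected = svm_rfe(X, y, names, n_keep=2)
```

  • Setting `drop_per_round` above 1 removes several genes per round, which corresponds to the "single or small numbers of genes" removed at each round of SVM-RFE.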
  • SVM-RCE is a related SVM based model, in that it, like SVM-RFE assesses the relative contributions of the genes to the classifier. SVM-RCE assesses the contributions of groups of correlated genes instead of individual genes. Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single or small numbers of genes at each round of the algorithm.
  • the SVM-RCE method is briefly described here. Low expressing genes (average expression less than 2× background) were removed, quantile normalization performed, and then “outlier” arrays whose median expression values differ by more than 3 sigma from the median of the dataset were removed. The remaining samples were subject to SVM-RCE using ten repetitions of 10-fold cross-validation of the algorithm. The genes were reduced by t-test (applied on the training set) to an experimentally determined optimal value which produces highest accuracy in the final result. These starting genes were clustered by K-means into clusters of correlated genes whose average size is 3-5 genes. SVM classification scoring was carried out on each cluster using 3-fold resampling repeated 5 times, and the worst scoring clusters eliminated.
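  • The first two preprocessing steps named above (removal of low expressing genes and quantile normalization) can be illustrated as follows. This is a hedged sketch under simple assumptions: samples are plain lists with one value per gene, the background level is supplied by the caller, and ties are broken by position rather than averaged. The function names are hypothetical, and the outlier-array filter and the clustering steps are not shown.

```python
def filter_low_expression(samples, background, fold=2.0):
    """Return indices of genes whose mean expression is at least fold * background."""
    n_genes = len(samples[0])
    keep = []
    for g in range(n_genes):
        mean_expr = sum(s[g] for s in samples) / len(samples)
        if mean_expr >= fold * background:
            keep.append(g)
    return keep


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    The value of rank r in each sample is replaced by the mean of the
    rank-r values across all samples.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])  # gene indices by rank
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical counts for two samples and three genes.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
kept = filter_low_expression(raw, background=1.0)
norm = quantile_normalize(raw)
```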
  • K-fold cross-validation (K usually equal to 10)
  • the algorithm was trained on a random selection of 90% of the patients and 90% of the controls and then tested on the remaining 10%. This was repeated until all of the samples have been employed as test subjects and the cumulated classifier makes use of all of the samples, but no sample is tested using a training set of which it is a part.
  • K-fold separation was performed M times producing different combinations of patients and controls in each of K folds each time. Therefore, for individual dataset M*K rounds of permuted selection of training and testing sets were used for each set of genes.
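  • A minimal sketch of the M repetitions of K-fold separation described above, assuming that patients and controls are split separately so that each fold holds about 1/K of each class. The function name and the `seed` parameter are hypothetical; the generator yields M*K train/test index pairs, and within one repetition every sample appears in exactly one test fold and never in its own training set.

```python
import random


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)  # a new random partition for each repetition
            for pos, i in enumerate(shuffled):
                folds[pos % k].append(i)  # deal each class evenly across folds
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


# Hypothetical cohort: 20 patients and 20 controls, K = 10, M = 3.
cohort = ["cancer"] * 20 + ["nodule"] * 20
splits = list(repeated_stratified_kfold(cohort, k=10, repeats=3, seed=1))
```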
  • Resampling To demonstrate dependence of the classifier on the disease state, patients and controls from the dataset were chosen at random (permuted) and the classification was repeated. The accuracy of classification using randomized samples was compared to the accuracy of the developed classifier to determine the p value for the classifier, i.e., the possibility that the classifier might have been chosen by chance. In order to test the generality of a classifier developed in this manner, it was used to classify independent sets of samples that were not used in developing the classifier. The cross-validation accuracies of the permuted and original classifier were compared on independent test sets to confirm its validity in classifying new samples.
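  • The permutation comparison described above can be sketched as follows. The sketch assumes a caller-supplied function `evaluate` (hypothetical) that takes a list of class labels, rebuilds the classifier against those labels, and returns its cross-validated accuracy. The p value is estimated as the fraction of label permutations that score at least as well as the real classifier, with the common add-one correction, which is an assumption here and is not stated in the source.

```python
import random


def permutation_p_value(observed_accuracy, labels, evaluate,
                        n_permutations=1000, seed=0):
    """Estimate how often randomly permuted labels match or beat the observed accuracy."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        permuted = labels[:]
        rng.shuffle(permuted)  # break the link between samples and disease state
        if evaluate(permuted) >= observed_accuracy:
            hits += 1
    # The add-one correction keeps the estimate above zero.
    return (hits + 1) / (n_permutations + 1)
```

  • With 1000 permutations the smallest p value this estimate can report is 1/1001, so more permutations are needed to support smaller p values.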
  • Performance of each classifier was estimated by different methods and several performance measurements were used for comparing classifiers with each other. These measurements include accuracy, area under ROC curve, sensitivity, specificity, true positive rate and true negative rate. Based on the required properties of the classification of interest, different performance measurements can be used to pick the optimal classifier, e.g. a classifier used to screen the whole population would require better specificity to compensate for the small (~1%) prevalence of the disease and therefore avoid a large number of false positive hits, while a diagnostic classifier of patients in hospital should be more sensitive.
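  • The effect of low prevalence on false positive hits can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, and the sensitivity, specificity and prevalence values are inputs chosen for the example, not results reported for any screening population.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# At 1% prevalence, a test with modest specificity yields mostly false positives,
# while the same sensitivity with very high specificity does far better.
ppv_modest, _ = predictive_values(sensitivity=0.90, specificity=0.46, prevalence=0.01)
ppv_strict, _ = predictive_values(sensitivity=0.90, specificity=0.99, prevalence=0.01)
```

  • Under these assumed inputs, fewer than 2 in 100 positive calls are true cancers at the modest specificity, against nearly half at the high specificity.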
  • RNA samples were all collected in PAXgene RNA stabilization tubes and RNA was extracted according to the manufacturer. Samples were tested on a Nanostring nCounter™ (as described above) against a custom panel of 559 probes (Table III). In addition, they were tested against a 100 probe subset of the 559 marker panel.
  • the 559 classifier developed on all the samples showed a ROC-AUC of 0.81 ( FIG. 2 A ). With the Sensitivity set at 90%, the specificity is 46%. When performed on a balanced set of 556 samples (278 cancer, 278 nodule), similar performance is shown ( FIG. 2 B ). For both sets, UHR controls, post samples, and patients with other cancers were excluded.
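  • Figures such as "ROC-AUC" and "specificity at a set sensitivity" can be derived from per-sample classifier scores as sketched below. The assumptions are that higher scores indicate cancer, that labels are 1 for cancer and 0 for nodule, and that a sample is called cancer when its score is at or above the threshold. The function names and the toy scores are hypothetical.

```python
import math


def roc_auc(scores, labels):
    """Probability that a random cancer sample outscores a random nodule sample."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Pick the highest threshold reaching the target sensitivity.

    Returns (threshold, achieved sensitivity, specificity).
    """
    pos = sorted((s for s, lab in zip(scores, labels) if lab == 1), reverse=True)
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    # The small epsilon guards against floating-point error in the product.
    n_needed = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[n_needed - 1]
    sensitivity = sum(1 for s in pos if s >= threshold) / len(pos)
    specificity = sum(1 for s in neg if s < threshold) / len(neg)
    return threshold, sensitivity, specificity


# Hypothetical scores: four cancers followed by four nodules.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2]
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0]
auc = roc_auc(toy_scores, toy_truth)
```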
  • nodules ≤8 mm were correctly classified 88.9% of the time, for nodules >8, ≤12 mm accuracy was 75%, for nodules >12, ≤16 mm accuracy was 68%, for nodules >16 mm accuracy was 53.6%. See Table IV below.
  • the chart shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.

Abstract

Methods and compositions are provided for diagnosing lung cancer in a mammalian subject by use of 10 or more selected genes, e.g., a gene expression profile, from the blood of the subject which is characteristic of disease. The gene expression profile includes 10 or more genes of Table I or Table II herein.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This is a continuation application of U.S. application Ser. No. 16/312,036, filed Dec. 20, 2018, which is a National Stage Entry under 35 U.S.C. 371 of International Patent Application No. PCT/US2017/038571, filed Jun. 21, 2017, which claims priority to U.S. Provisional Application No. 62/352,865, filed Jun. 21, 2016. These applications are incorporated by reference herein.
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • This invention was made with government support under Grant No. CA010815 awarded by the National Institutes of Health. The government has certain rights in the invention.
  • INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED IN ELECTRONIC FORM
  • The contents of the electronic sequence listing (WST164USC1_SeqList.xml; size 536,359 bytes; and Date of Creation: Apr. 25, 2023) are herein incorporated by reference in their entirety.
  • BACKGROUND OF THE INVENTION
  • Lung cancer is the most common worldwide cause of cancer mortality. In the United States, lung cancer is the second most prevalent cancer in both men and women and will account for more than 174,000 new cases per year and more than 162,000 cancer deaths. In fact, lung cancer accounts for more deaths each year than from breast, prostate and colorectal cancers combined.
  • The high mortality (80-85% in five years), which has shown little or no improvement in the past 30 years, emphasizes the fact that new and effective tools to facilitate early diagnosis prior to metastasis to regional nodes or beyond the lung are needed.
  • High risk populations include smokers, former smokers, and individuals with markers associated with genetic predispositions. Because surgical removal of early stage tumors remains the most effective treatment for lung cancer, there has been great interest in screening high-risk patients with low dose spiral CT (LDCT). This strategy identifies non-calcified pulmonary nodules in approximately 30-70% of high risk individuals but only a small proportion of detected nodules are ultimately diagnosed as lung cancers (0.4 to 2.7%). Currently, the only way to differentiate subjects with lung nodules of benign etiology from subjects with malignant nodules is an invasive biopsy, surgery, or prolonged observation with repeated scanning. Even using the best clinical algorithms, 20-55% of patients selected to undergo surgical lung biopsy for indeterminate lung nodules are found to have benign disease, and those that do not undergo immediate biopsy or resection require sequential imaging studies. The use of serial CT in this group of patients runs the risk of delaying potentially curative therapy, along with the costs of repeat scans, the not-insignificant radiation doses, and the anxiety of the patient.
  • Ideally, a diagnostic test would be easily accessible, inexpensive, demonstrate high sensitivity and specificity, and result in improved patient outcomes (medically and financially). Others have shown that classifiers which utilize epithelial cells have high accuracy. However, harvesting these cells requires an invasive bronchoscopy. See, Silvestri et al, N Engl J Med. 2015 Jul. 16; 373(3): 243-251, which is incorporated herein by reference.
  • Efforts are in progress to develop non-invasive diagnostics using sputum, blood or serum and analyzing for products of tumor cells, methylated tumor DNA, single nucleotide polymorphisms (SNPs), expressed messenger RNA or proteins. This broad array of molecular tests with potential utility for early diagnosis of lung cancer has been discussed in the literature. Although each of these approaches has its own merits, none has yet passed the exploratory stage in the effort to detect patients with early stage lung cancer, even in high-risk groups, or patients who have a preliminary diagnosis based on radiological and other clinical factors. A simple blood test, a routine event associated with regular clinical office visits, would be an ideal diagnostic test.
  • SUMMARY OF THE INVENTION
  • In one aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a patient sample. Each gene, gene fragment, gene transcript or expression product is selected from the genes of Table I or Table II. In one embodiment, at least one polynucleotide or oligonucleotide is attached to a detectable label. In one embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 559 genes in Table I. In another embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 100 genes in Table II.
  • In another aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more ligands, wherein each ligand hybridizes to a different gene expression product in a patient sample. Each gene expression product is selected from the genes of Table I or Table II. In one embodiment, at least one ligand is attached to a detectable label. In one embodiment, the composition or kit includes ligands which detect the expression products of each of the 559 genes in Table I. In another embodiment, the composition or kit includes ligands which detect the expression products of each of the 100 genes in Table II.
  • The compositions described herein enable detection of changes in expression in the genes in the subject's gene expression profile from that of a reference gene expression profile. The various reference gene expression profiles are described below. In one embodiment, the composition provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying changes in the expression of three or more genes in the sample of a subject, said genes selected from the genes of Table I or Table II, and comparing that subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of said gene expression correlates with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said gene expression provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a subject, the gene expression profile comprising 10 or more gene expression products of 10 or more informative genes as described herein. The 10 or more informative genes are selected from the genes of Table I or Table II. In one embodiment, the gene expression profile contains all 559 genes of Table I. In another embodiment, the gene expression profile contains all 100 genes of Table II. The subject's gene expression profile is compared with a reference gene expression profile from a variety of sources described below. Changes in expression of the informative genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said gene expression provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.
  • In another aspect, a method of detecting lung cancer in a patient is provided. The method includes obtaining a sample from the patient; and detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.
  • In yet another aspect, a method of diagnosing lung cancer in a subject is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.
  • In another aspect, a method of diagnosing and treating lung cancer in a subject having a neoplastic growth is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and removing the neoplastic growth. Other appropriate treatments may also be provided.
  • Other aspects and advantages of these compositions and methods are described further in the following detailed description of the preferred embodiments thereof.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a table showing patient characteristics for the samples used in Example 1.
  • FIGS. 2A and 2B are graphs showing the cross validated support vector machine classifier (CV SVM) of all 610 samples (FIG. 2A, Accuracy=0.75, ROC Area=0.81. According to the curve, when the sensitivity is 0.91, the specificity is 0.46; when the sensitivity is 0.72, the specificity is 0.77) and a balanced set of 556 samples (FIG. 2B, Accuracy=0.76, ROC Area=0.81, According to the curve, when the sensitivity is 0.90, the specificity is 0.48; when the sensitivity is 0.76, the specificity is 0.77), using the 559 Classifier. The full and balanced sets show similar performance.
  • FIG. 3 is a bar graph showing sensitivity of the classifier by nodule size groups (x-axis). Data shows that larger nodules are more likely to be misclassified (p=1.54×10⁻⁴).
  • FIGS. 4A to 4C show the classification of sample groups (cancer, FIG. 4B, n=204; and nodule, FIG. 4C, n=331) stratified by lesion size. Over cancers >5 mm and higher, r=0.95. For nodules of all sizes, r=0.97. The chart (FIG. 4A) shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.
  • FIGS. 5A and 5B are graphs showing the cross validated support vector machine classifier (CV SVM) of all cancer samples (n=278) vs. small nodules (<10 mm) (n=244) (FIG. 5A, Accuracy=0.79, ROC Area=0.85. According to the curve, when the sensitivity is 0.90, the specificity is 0.54; when the sensitivity is 0.77, the specificity is 0.82) and 10-fold CV SVM using all cancer samples (n=278) vs. large nodules (≥10 mm) (n=88) (FIG. 5B, Accuracy=0.76, ROC Area=0.71. According to the curve, when the sensitivity is 0.90, the specificity is 0.24; when the sensitivity is 0.87, the specificity is 0.42).
  • FIG. 6 is a graph showing the cross validated support vector machine classifier (CV SVM) of 25% of the data set used for the 559 Classifier, used as a testing set for the 100 Classifier. ROC Area=0.82. According to the curve, when the sensitivity is 0.90, the specificity is 0.62; when the sensitivity is 0.79, the specificity is 0.68; and when the sensitivity is 0.71, the specificity is 0.75.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The methods and compositions described herein apply gene expression technology to blood screening for the detection and diagnosis of lung cancer. The compositions and methods described herein provide the ability to distinguish a cancerous tumor from a non-cancerous nodule, by determining a characteristic RNA expression profile of the genes of the blood of a mammalian, preferably human, subject. The profile is compared with the profile of one or more subjects of the same class (e.g., patients having lung cancer or a non-cancerous nodule) or a control to provide a useful diagnosis.
  • These methods of lung cancer screening employ compositions suitable for conducting a simple, cost-effective and non-invasive blood test using gene expression profiling that could alert the patient and physician to obtain further studies, such as a chest radiograph or CT scan, in much the same way that the prostate specific antigen is used to help diagnose and follow the progress of prostate cancer. The application of these profiles provides overlapping and confirmatory diagnoses of the type of lung disease, beginning with the initial test for malignant vs. non-malignant disease.
  • “Patient” or “subject” as used herein means a mammalian animal, including a human, a veterinary or farm animal, a domestic animal or pet, and animals normally used for clinical research. In one embodiment, the subject of these methods and compositions is a human.
  • “Control” or “Control subject” as used herein refers to the source of the reference gene expression profiles as well as the particular panel of control subjects described herein. In one embodiment, the control or reference level is from a single subject. In another embodiment, the control or reference level is from a population of individuals sharing a specific characteristic. In yet another embodiment, the control or reference level is an assigned value which correlates with the level of a specific control individual or population, although not necessarily measured at the time of assaying the test subject's sample. In one embodiment, the control subject or reference is from a patient (or population) having a non-cancerous nodule. In another embodiment, the control subject or reference is from a patient (or population) having a cancerous tumor. In other embodiments, the control subject can be a subject or population with lung cancer, such as a subject who is a current or former smoker with malignant disease, a subject with a solid lung tumor prior to surgery for removal of same; a subject with a solid lung tumor following surgical removal of said tumor; a subject with a solid lung tumor prior to therapy for same; and a subject with a solid lung tumor during or following therapy for same. In other embodiments, the controls for purposes of the compositions and methods described herein include any of the following classes of reference human subject with no lung cancer. Such non-healthy controls (NHC) include the classes of smoker with non-malignant disease, a former smoker with non-malignant disease (including patients with lung nodules), a non-smoker who has chronic obstructive pulmonary disease (COPD), and a former smoker with COPD. In still other embodiments, the control subject is a healthy non-smoker with no disease or a healthy smoker with no disease.
  • “Sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. The most suitable sample for use in this invention includes whole blood. Other useful biological samples include, without limitation, peripheral blood mononuclear cells, plasma, saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoscopy sample, bronchoalveolar lavage fluid, and other cellular exudates from a patient having cancer. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples are concentrated by conventional means.
  • As used herein, the term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. More specifically, as used herein, the term “cancer” means any lung cancer. In one embodiment, the lung cancer is non-small cell lung cancer (NSCLC). In a more specific embodiment, the lung cancer is lung adenocarcinoma (AC or LAC). In another more specific embodiment, the lung cancer is lung squamous cell carcinoma (SCC or LSCC). In another embodiment, the lung cancer is a stage I or stage II NSCLC. In still another embodiment, the lung cancer is a mixture of early and late stages and types of NSCLC.
  • The term “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The term “nodule” refers to an abnormal buildup of tissue which is benign. The term “cancerous tumor” refers to a malignant tumor.
  • By “diagnosis” or “evaluation” it is meant a diagnosis of a lung cancer, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In one embodiment, “diagnosis” or “evaluation” refers to distinguishing between a cancerous tumor and a benign pulmonary nodule.
  • As used herein, “sensitivity” (also called the true positive rate), measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
  • As used herein, “specificity” (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
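  • The two definitions above reduce to simple counts over a set of classified samples. The following sketch assumes that the "positive" class is cancer and that truth and prediction are given as parallel lists of labels; the function name and the example labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive="cancer"):
    """Return (sensitivity, specificity) from true and predicted labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # cancer correctly called cancer
            else:
                fn += 1  # cancer missed
        else:
            if pred == positive:
                fp += 1  # benign sample called cancer
            else:
                tn += 1  # benign sample correctly called benign
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical example: 4 cancers and 6 nodules.
truth = ["cancer"] * 4 + ["nodule"] * 6
calls = ["cancer", "cancer", "cancer", "nodule",
         "nodule", "nodule", "nodule", "cancer", "cancer", "nodule"]
sens, spec = sensitivity_specificity(truth, calls)
```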
  • By “change in expression” is meant an upregulation of one or more selected genes in comparison to the reference or control; a downregulation of one or more selected genes in comparison to the reference or control; or a combination of certain upregulated genes and down regulated genes.
  • By “therapeutic reagent” or “regimen” is meant any type of treatment employed in the treatment of cancers with or without solid tumors, including, without limitation, chemotherapeutic pharmaceuticals, biological response modifiers, radiation, diet, vitamin therapy, hormone therapies, gene therapy, surgical resection, etc.
  • By “informative genes” as used herein is meant those genes the expression of which changes (either in an up-regulated or down-regulated manner) characteristically in the presence of lung cancer. A statistically significant number of such informative genes thus form suitable gene expression profiles for use in the methods and compositions. Such genes are shown in Table I and Table II below. Such genes make up the “expression profile”.
  • The term “statistically significant number of genes” in the context of this invention differs depending on the degree of change in gene expression observed. The degree of change in gene expression varies with the type of cancer and with the size or spread of the cancer or solid tumor. The degree of change also varies with the immune response of the individual and is subject to variation with each individual. For example, in one embodiment of this invention, a large change, e.g., 2-3 fold increase or decrease in a small number of genes, e.g., in about 10 to 20 genes, is statistically significant. In another embodiment, a smaller relative change in about 15 or more genes is statistically significant.
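  • As an illustration of what a "2-3 fold increase or decrease" means in practice, the sketch below labels each gene as up-regulated, down-regulated or unchanged relative to a reference level. It is an assumption-laden example: the 2-fold cutoff, the function name, and the gene identifiers and expression levels are hypothetical, and the sketch does not test statistical significance.

```python
import math


def call_expression_changes(subject, reference, min_fold=2.0):
    """Label each gene 'up', 'down' or 'unchanged' by fold change versus reference."""
    cutoff = math.log2(min_fold)
    calls = {}
    for gene, level in subject.items():
        log2_fold = math.log2(level / reference[gene])
        if log2_fold >= cutoff:
            calls[gene] = "up"
        elif log2_fold <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical expression levels for three placeholder genes.
subject_levels = {"G1": 400.0, "G2": 50.0, "G3": 110.0}
reference_levels = {"G1": 100.0, "G2": 200.0, "G3": 100.0}
changes = call_expression_changes(subject_levels, reference_levels)
```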
  • Thus, the methods and compositions described herein contemplate examination of the expression profile of a “statistically significant number of genes” ranging from 5 to about 559 genes in a single profile. In one embodiment, the genes are selected from Table I. In another embodiment, the genes are selected from Table II. In one embodiment, the gene profile is formed by a statistically significant number of 5 or more genes. In one embodiment, the gene profile is formed by a statistically significant number of 10 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 15 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 20 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 25 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 30 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 35 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 40 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 45 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 50 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 60 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 65 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 70 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 75 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 80 or more genes. 
In another embodiment, the gene profile is formed by a statistically significant number of 85 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 90 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 95 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 100 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 200 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 300 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 350 or more genes. In still another embodiment, the gene profile is formed by 400 or more genes. In still another embodiment, the gene profile is formed by 539 genes. In still another embodiment, the gene profile is formed by 559 genes. In still other embodiments, the gene profiles examined as part of these methods contain, as statistically significant numbers of genes, from 10 to 559 genes, and any numbers therebetween. 
In another embodiment, the gene profile is formed by any whole number of genes from 1 through 558, or all 559 genes, of Table I. In another embodiment, the gene profile is formed by any whole number of genes from 1 through 99, or all 100 genes, of Table II.
  • Table I and Table II below refer to a collection of known genes useful in discriminating between a subject having a lung cancer, e.g., NSCLC, and subjects having benign (non-malignant) lung nodules. The sequences of the genes identified in Table I and Table II are publicly available. One skilled in the art may readily reproduce the compositions and methods described herein by use of the sequences of the genes, all of which are publicly available from conventional sources, such as GenBank. The GenBank accession number for each gene is provided.
  • The term “microarray” refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide or oligonucleotide probes, on a substrate.
  • The term “polynucleotide,” when used in singular or plural form, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.
  • The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.
  • The terms “differentially expressed gene”, “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a control subject, such as a subject having a benign nodule. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects, non-healthy controls and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative and qualitative differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is a statistically significant (p<0.05) difference in gene expression between the subject and control samples.
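  • As a non-limiting illustration of the p<0.05 criterion above, the following Python sketch applies a two-sided permutation test to the expression values of a single gene in two groups. The specification does not name a particular statistical test at this point, so the choice of a permutation test, the function names, the number of permutations, and the expression values are all assumptions made for illustration only.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign samples to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def is_differentially_expressed(cases, controls, alpha=0.05):
    """Apply the p < alpha threshold to one gene."""
    return permutation_p_value(cases, controls) < alpha


# Hypothetical normalized expression values for one gene (illustrative only).
cancer = [8.1, 8.4, 7.9, 8.6, 8.3, 8.5, 8.2, 8.0]
benign = [7.0, 7.2, 6.9, 7.4, 7.1, 7.3, 6.8, 7.2]
print(is_differentially_expressed(cancer, benign))
```

  • When many genes are tested at once, a correction for multiple comparisons would ordinarily be applied before the threshold is used; that step is omitted from this sketch.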
  • The term “over-expression” with regard to an RNA transcript is used to refer to the level of the transcript determined by normalization to the level of reference mRNAs, which might be all measured transcripts in the specimen or a particular reference set of mRNAs.
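  • As a non-limiting illustration of normalization to reference mRNAs, the following Python sketch divides each transcript level by the geometric mean of a chosen reference set. The use of a geometric mean, the function name, and the raw counts are assumptions made for illustration; the reference set could equally be all measured transcripts, as noted above.

```python
import math


def normalize_to_reference(counts, reference_genes):
    """Scale each transcript by the geometric mean of the reference transcripts."""
    ref_values = [counts[gene] for gene in reference_genes]
    if any(value <= 0 for value in ref_values):
        raise ValueError("reference counts must be positive")
    geo_mean = math.exp(sum(math.log(v) for v in ref_values) / len(ref_values))
    return {gene: value / geo_mean for gene, value in counts.items()}


# Hypothetical raw counts. TBP, GUSB and PGK1 are listed as Housekeeping
# in Table I; PLEKHG4 and CFD are listed as Endogenous.
raw = {"PLEKHG4": 120.0, "CFD": 300.0, "TBP": 100.0, "GUSB": 400.0, "PGK1": 200.0}
normalized = normalize_to_reference(raw, ["TBP", "GUSB", "PGK1"])
print(normalized)
```

  • A normalized value above that of a comparison sample for the same transcript would then indicate relative over-expression under this sketch.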
  • The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as an “amplicon.” Usually, the amount of the messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene expressed.
  • In the context of the compositions and methods described herein, reference to “10 or more”, “at least 10” etc. of the genes listed in Table I or Table II means any one or any and all combinations of the genes listed. For example, suitable gene expression profiles include profiles containing any number from 5 through 559 of the genes from Table I. In another example, suitable gene expression profiles include profiles containing any number from 5 through 100 of the genes from Table II. In one embodiment, gene profiles formed by genes selected from a table are used in rank order, e.g., genes ranked at the top of the list demonstrated more significant discriminatory results in the tests, and thus may be more significant in a profile than lower ranked genes. However, in other embodiments the genes forming a useful gene profile do not have to be in rank order and may be any gene from the table. As used herein, the term “100 Classifier” or “100 Biomarker Classifier” refers to the 100 genes of Table II. As used herein, the term “559 Classifier” or “559 Biomarker Classifier” refers to the 559 genes of Table I. However, subsets of the genes of Table I or Table II, as described herein, are also useful, and, in another embodiment, the terms may refer to those subsets as well.
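  • As a non-limiting illustration of forming a profile in rank order, the following Python sketch selects the n highest-ranked genes from a ranked table and, separately, counts how many unordered profiles of a given size could be drawn when rank order is not required. The function names are hypothetical, and only the first ten rows of Table I are used.

```python
import math


def top_ranked_profile(ranked_genes, n):
    """Return the n highest-ranked genes, with rank 1 first."""
    if not 1 <= n <= len(ranked_genes):
        raise ValueError("n must be between 1 and the table size")
    ordered = sorted(ranked_genes, key=lambda item: item[0])
    return [gene for _, gene in ordered[:n]]


def count_unordered_profiles(table_size, n):
    """Number of distinct n-gene subsets when rank order is not required."""
    return math.comb(table_size, n)


# First ten rows of Table I as (rank, gene) pairs.
table_i_head = [
    (1, "PLEKHG4"), (2, "SLC25A20"), (3, "LETM2"), (4, "GLIS3"),
    (5, "LOC100132797"), (6, "ARHGEF5"), (7, "TCF7L2"), (8, "SFRS2IP"),
    (9, "CFD"), (10, "AZI2"),
]
profile = top_ranked_profile(table_i_head, 5)
print(profile)
print(count_unordered_profiles(len(table_i_head), 5))
```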
  • As used herein, “labels” or “reporter molecules” are chemical or biochemical moieties useful for labeling a nucleic acid (including a single nucleotide), polynucleotide, oligonucleotide, or protein ligand, e.g., amino acid or antibody. “Labels” and “reporter molecules” include fluorescent agents, chemiluminescent agents, chromogenic agents, quenching agents, radionucleotides, enzymes, substrates, cofactors, inhibitors, magnetic particles, and other moieties known in the art. “Labels” or “reporter molecules” are capable of generating a measurable signal and may be covalently or noncovalently joined or bound to an oligonucleotide or nucleotide (e.g., a non-natural nucleotide) or ligand.
  • Unless defined otherwise in this specification, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application.
  • I. GENE EXPRESSION PROFILES
  • The inventors have shown that the gene expression profiles of the whole blood of lung cancer patients differ significantly from those seen in patients having non-cancerous lung nodules. For example, changes in the gene expression products of the genes of Table I and/or Table II can be observed and detected by the methods of this invention in the normal circulating blood of patients with early stage solid lung tumors.
  • The gene expression profiles described herein provide new diagnostic markers for the early detection of lung cancer and could prevent patients from undergoing unnecessary procedures relating to surgery or biopsy for a benign nodule. Since the risks are very low, the benefit to risk ratio is very high. In one embodiment, the methods and compositions described herein may be used in conjunction with clinical risk factors to help physicians make more accurate decisions about how to manage patients with lung nodules. Another advantage of this invention is that diagnosis may occur early, since diagnosis is not dependent upon detecting circulating tumor cells, which are present in only vanishingly small numbers in early stage lung cancers.
  • In one aspect, a composition is provided for classifying a nodule as cancerous or benign in a mammalian subject. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In another embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the polynucleotide or oligonucleotide or ligand hybridizes to an mRNA.
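  • As a non-limiting illustration of the "at least 10 ... each ... a different gene" condition above, the following Python sketch checks whether a set of probes targets a minimum number of distinct genes drawn from a table. The function name, the probe identifiers, and the mapping of probes to genes are hypothetical; the gene names are taken from the first rows of Table I.

```python
def panel_meets_minimum(probe_targets, table_genes, minimum=10):
    """Check that the probes cover at least `minimum` distinct table genes.

    probe_targets maps a probe identifier to the gene it targets.
    """
    distinct_targets = set(probe_targets.values())
    on_table = distinct_targets & set(table_genes)
    return len(on_table) >= minimum


# Genes from the first twelve rows of Table I.
table_genes = [
    "PLEKHG4", "SLC25A20", "LETM2", "GLIS3", "LOC100132797", "ARHGEF5",
    "TCF7L2", "SFRS2IP", "CFD", "AZI2", "STOM", "CD1A",
]

# Hypothetical probe identifiers, one probe per gene for the first ten genes.
panel = {f"probe_{i}": gene for i, gene in enumerate(table_genes[:10], start=1)}
print(panel_meets_minimum(panel, table_genes))

# Two probes aimed at the same gene count only once.
redundant = dict(panel, probe_11="PLEKHG4")
redundant.pop("probe_10")
print(panel_meets_minimum(redundant, table_genes))
```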
  • TABLE I
    Rank Gene Sequence ID# Gene Class Name
    1 PLEKHG4 NM_015432.3 Endogenous
    2 SLC25A20 NM_000387.5 Endogenous
    3 LETM2 NM_144652.3 Endogenous
    4 GLIS3 NM_001042413.1 Endogenous
    5 LOC100132797 XR_036994.1 Endogenous
    6 ARHGEF5 NM_005435.3 Endogenous
    7 TCF7L2 NM_030756.4 Endogenous
    8 SFRS2IP NM_004719.2 Endogenous
    9 CFD NM_001928.2 Endogenous
    10 AZI2 NM_022461.4 Endogenous
    11 STOM NM_004099.5 Endogenous
    12 CD1A NM_001763.2 Endogenous
    13 PANK2 NM_153640.2 Endogenous
    14 CNIH4 NM_014184.3 Endogenous
    15 EVI2A NM_014210.3 Endogenous
    16 BATF NM_006399.3 Endogenous
    17 TCP1 NM_030752.2 Endogenous
    18 BX108566 BX108566.1 Endogenous
    19 ANXA1 NM_000700.2 Endogenous
    20 PSMA3 NM_152132.2 Endogenous
    21 IRF4 NM_002460.1 Endogenous
    22 STAG3 NM_012447.3 Endogenous
    23 NDUFS4 NM_002495.2 Endogenous
    24 HAT1 NM_003642.3 Endogenous
    25 ANXA1 b NM_000700.1 Endogenous
    26 LOC148137 NM_144692.1 Endogenous
    27 LDHA NM_001165416.1 Endogenous
    28 PSME3 NM_005789.3 Endogenous
    29 REPS1 NM_001128617.2 Endogenous
    30 CDH5 NM_001795.3 Endogenous
    31 NAT5 NM_181528.3 Endogenous
    32 PLAC8 NM_001130715.1 Endogenous
    33 GSTO1 NM_004832.2 Endogenous
    34 DGUOK NM_080916.2 Endogenous
    35 OLR1 NM_002543.3 Endogenous
    36 MYST4 NM_012330.3 Endogenous
    37 TIMM8B ENST00000504148.1 Endogenous
    38 LY96 NM_015364.4 Endogenous
    39 CCDC72 NM_015933.4 Endogenous
    40 ATP5I NM_007100.2 Endogenous
    41 WDR91 NM_014149.3 Endogenous
    42 MAGEA3 NM_005362.3 Endogenous
    43 AK093878 AK093878.1 Endogenous
    44 EYA3 NM_001990.3 Endogenous
    45 ACAA2 NM_006111.2 Endogenous
    46 ETFDH NM_004453.3 Endogenous
    47 CCT6A NM_001762.3 Endogenous
    48 HSCB NM_172002.3 Endogenous
    49 EMR4 NM_001080498.2 Endogenous
    50 USP5 NM_003481.2 Endogenous
    51 SIK1 NM_173354.3 Endogenous
    52 SYNJ1 NM_003895.3 Endogenous
    53 KLRB1 NM_002258.2 Endogenous
    54 CLK2 XM_941392.1 Endogenous
    55 SNORA56 NR_002984.1 Endogenous
    56 TP53BP1 NM_005657.2 Endogenous
    57 RBX1 NM_014248.3 Endogenous
    58 CNPY2 NM_014255.5 Endogenous
    59 RELA NM_021975.2 Endogenous
    60 LOC732371 XM_001133019.1 Endogenous
    61 TMEM218 NM_001080546.2 Endogenous
    62 LOC91431 NM_001099776.1 Endogenous
    63 GZMB NM_004131.3 Endogenous
    64 CAMP NM_004345.4 Endogenous
    65 RBM16 NM_014892.4 Endogenous
    66 MID1IP1 NM_021242.5 Endogenous
    67 LOC399942 XM_934471.1 Endogenous
    68 COMMD6 NM_203497.3 Endogenous
    69 PPP6C NM_002721.4 Endogenous
    70 BCOR NM_017745.5 Endogenous
    71 PDCD10 NM_145859.1 Endogenous
    72 HLA-DMB NM_002118.3 Endogenous
    73 DNAJB1 NM_006145.2 Endogenous
    74 KYNU NM_001032998.1 Endogenous
    75 TM2D2 NM_078473.2 Endogenous
    76 FAM179A NM_199280.2 Endogenous
    77 FAM43A NM_153690.4 Endogenous
    78 QTRTD1 NM_024638.3 Endogenous
    79 MARCKSL1 NM_023009.5 Endogenous
    80 FAM193A NM_003704.3 Endogenous
    81 AK026725 AK026725.1 Endogenous
    82 SERPINB10 NM_005024.1 Endogenous
    83 OSBP ILMN_1706376.1 Endogenous
    84 ST6GAL1 NM_003032.2 Endogenous
    85 NDUFAF2 NM_174889.4 Endogenous
    86 UBE2I NM_194259.2 Endogenous
    87 CTAG1B NM_001327.2 Endogenous
    88 TRAF6 NM_145803.1 Endogenous
    89 REPIN1 NM_014374.3 Endogenous
    90 LAMA5 NM_005560.4 Endogenous
    91 TBC1D12 NM_015188.1 Endogenous
    92 TGIF1 b NM_173208.1 Endogenous
    93 LOC728533 XR_015610.3 Endogenous
    94 CLN8 NM_018941.3 Endogenous
    95 COX7B NM_001866.2 Endogenous
    96 DYNC2LI1 NM_016008.3 Endogenous
    97 ANP32B NM_006401.2 Endogenous
    98 PTGDR2 NM_004778.1 Endogenous
    99 MRPS16 NM_016065.3 Endogenous
    100 NIPBL NM_133433.3 Endogenous
    101 PPP2R5C NM_178588.1 Endogenous
    102 DPF2 NM_006268.4 Endogenous
    103 RAB10 NM_016131.4 Endogenous
    104 MYADM NM_001020820.1 Endogenous
    105 CCND3 NM_001760.2 Endogenous
    106 CC2D1B NM_032449.2 Endogenous
    107 HLA-G NM_002127.4 Endogenous
    108 CKS2 NM_001827.1 Endogenous
    109 HPSE NM_006665.5 Endogenous
    110 UBE2G1 NM_003342.4 Endogenous
    111 MED16 NM_005481.2 Endogenous
    112 LOC339674 XM_934917.1 Endogenous
    113 RNF114 NM_018683.3 Endogenous
    114 KIR2DS3 NM_012313.1 Endogenous
    115 AMD1 NM_001634.4 Endogenous
    116 S100A8 NM_002964.4 Endogenous
    117 NFATC4 NM_001136022.2 Endogenous
    118 RPL39L NM_052969.1 Endogenous
    119 LOC399753 XM_930634.1 Endogenous
    120 FKBP1A NM_054014.3 Endogenous
    121 CHMP5 NM_016410.5 Endogenous
    122 CABC1 NM_020247.4 Endogenous
    123 HLA-B NM_005514.6 Endogenous
    124 TRIM39 NM_021253.3 Endogenous
    125 LOC645914 XM_928884.1 Endogenous
    126 CD79A NM_021601.3 Endogenous
    127 GLRX ILMN_1737308.1 Endogenous
    128 RPL26L1 NM_016093.2 Endogenous
    129 USP21 NM_012475.4 Endogenous
    130 CD70 NM_001252.2 Endogenous
    131 SPINK5 NM_006846.3 Endogenous
    132 HUWE1 NM_031407.6 Endogenous
    133 STK38 NM_007271.3 Endogenous
    134 SEMG1 NM_003007.2 Endogenous
    135 NDUFA4 NM_002489.3 Endogenous
    136 MYADM b NM_001020820.1 Endogenous
    137 SGK1 b NM_005627.3 Endogenous
    138 SLAMF8 NM_020125.2 Endogenous
    139 LOC653773 XM_938755.1 Endogenous
    140 RPS24 NM_001026.4 Endogenous
    141 LOC338799 NR_002809.2 Endogenous
    142 MAP3K7 NM_145333.1 Endogenous
    143 KLRD1 NM_002262.3 Endogenous
    144 LOC732111 XM_001134275.1 Endogenous
    145 CD69 NM_001781.2 Endogenous
    146 DDIT4 NM_019058.2 Endogenous
    147 C1orf222 NM_001003808.1 Endogenous
    148 PFAS NM_012393.2 Endogenous
    149 USP9Y NM_004654.3 Endogenous
    150 COLEC12 NM_130386.2 Endogenous
    151 VPS37C NM_017966.4 Endogenous
    152 SAP130 NM_024545.3 Endogenous
    153 CDC42EP2 NM_006779.3 Endogenous
    154 LOC643319 XM_927980.1 Endogenous
    155 ASF1B NM_018154.2 Endogenous
    156 AK094576 AK094576.1 Endogenous
    157 BANP NM_079837.2 Endogenous
    158 TBK1 NM_013254.2 Endogenous
    159 GNS NM_002076.3 Endogenous
    160 IL1R2 NM_173343.1 Endogenous
    161 CLEC4C NM_203503.1 Endogenous
    162 TM9SF1 NM_006405.6 Endogenous
    163 PTGDR NM_000953.2 Endogenous
    164 GOLGA3 NM_005895.3 Endogenous
    165 CLEC4A NM_194448.2 Endogenous
    166 TSC1 NM_000368.4 Endogenous
    167 SFMBT1 NM_001005158.2 Endogenous
    168 GLT25D1 NM_024656.2 Endogenous
    169 LOC100130229 XM_001717158.1 Endogenous
    170 PHF8 NM_015107.2 Endogenous
    171 PUM1 NM_001020658.1 Endogenous
    172 SMARCC1 NM_003074.3 Endogenous
    173 AK126342 AK126342.1 Endogenous
    174 ACSL5 NM_203379.1 Endogenous
    175 TGIF1 NM_003244.2 Endogenous
    176 BF375676 BF375676.1 Endogenous
    177 SPA17 NM_017425.3 Endogenous
    178 FLNB NM_001457.3 Endogenous
    179 FAM105B NM_138348.4 Endogenous
    180 CPPED1 NM_018340.2 Endogenous
    181 TRIM32 NM_012210.3 Endogenous
    182 RNF34 NM_025126.3 Endogenous
    183 SLC45A3 NM_033102.2 Endogenous
    184 P2RY10 NM_198333.1 Endogenous
    185 AKR1C3 NM_003739.4 Endogenous
    186 NME1-NME2 NM_001018136.2 Endogenous
    187 AMPD3 NM_000480.2 Endogenous
    188 HSP90AB1 NM_007355.3 Endogenous
    189 RBM4B NM_031492.3 Endogenous
    190 DMBT1 NM_007329.2 Endogenous
    191 TMCO1 NM_019026.3 Endogenous
    192 CASP2 NM_032983.3 Endogenous
    193 C1orf103 NM_018372.3 Endogenous
    194 ARHGAP17 NM_018054.5 Endogenous
    195 IFNA17 NM_021268.2 Endogenous
    196 CTSZ NM_001336.3 Endogenous
    197 DBI NM_001079862.1 Endogenous
    198 TXNRD1 b NM_182743.2 Endogenous
    199 KIAA0460 NM_015203.4 Endogenous
    200 PDGFD NM_033135.3 Endogenous
    201 ATG5 NM_004849.2 Endogenous
    202 ITFG2 NM_018463.3 Endogenous
    203 HERC1 NM_003922.3 Endogenous
    204 MEN1 NM_130799.2 Endogenous
    205 IFI27L2 NM_032036.2 Endogenous
    206 LOC729887 XR_040891.2 Endogenous
    207 PI4K2A NM_018425.3 Endogenous
    208 RAG1 NM_000448.2 Endogenous
    209 CREB5 NM_182898.3 Endogenous
    210 SLC6A12 NM_003044.4 Endogenous
    211 CDKN1A NM_000389.2 Endogenous
    212 AW173314 AW173314.1 Endogenous
    213 SAP130 b NM_024545.3 Endogenous
    214 ABCA5 NM_018672.4 Endogenous
    215 SLC25A37 NM_016612.2 Endogenous
    216 MYLIP NM_013262.3 Endogenous
    217 GATA2 NM_001145662.1 Endogenous
    218 ATP5L NM_006476.4 Endogenous
    219 RPS27L NM_015920.3 Endogenous
    220 DB338252 DB338252.1 Endogenous
    221 FRAT2 NM_012083.2 Endogenous
    222 CCL4 NM_002984.2 Endogenous
    223 CD79B NM_000626.2 Endogenous
    224 MBD1 NM_015844.2 Endogenous
    225 TIAM1 NM_003253.2 Endogenous
    226 HSD11B1 NM_181755.1 Endogenous
    227 TPR NM_003292.2 Endogenous
    228 EID2B NM_152361.2 Endogenous
    229 PDSS1 NM_014317.3 Endogenous
    230 C9orf164 NM_182635.1 Endogenous
    231 ARHGEF18 NM_015318.3 Endogenous
    232 TXNRD1 NM_001093771.2 Endogenous
    233 HNRNPAB NM_004499.3 Endogenous
    234 TTN NM_133378.4 Endogenous
    235 EP300 NM_001429.2 Endogenous
    236 CCDC97 NM_052848.1 Endogenous
    237 HK3 NM_002115.2 Endogenous
    238 CRKL NM_005207.3 Endogenous
    239 NCOA5 NM_020967.2 Endogenous
    240 AK124143 AK124143.1 Endogenous
    241 LBA1 NM_014831.2 Endogenous
    242 SLC9A3R1 NM_004252.3 Endogenous
    243 CRY2 NM_021117.3 Endogenous
    244 ATG4B NM_178326.2 Endogenous
    245 CD97 NM_078481.3 Endogenous
    246 TTC9 NM_015351.1 Endogenous
    247 BMPR2 NM_001204.6 Endogenous
    248 LPIN2 NM_014646.2 Endogenous
    249 UBA1 NM_003334.3 Endogenous
    250 SETD1B XM_037523.11 Endogenous
    251 PRPF8 NM_006445.3 Endogenous
    252 RNASE2 NM_002934.2 Endogenous
    253 KIAA0101 NM_014736.4 Endogenous
    254 ARG1 NM_000045.3 Endogenous
    255 UBTF NM_001076683.1 Endogenous
    256 MFSD1 NM_022736.2 Endogenous
    257 IDO1 NM_002164.3 Endogenous
    258 MS4A6A NM_022349.3 Endogenous
    259 C22orf30 NM_173566.2 Endogenous
    260 HNRNPK NM_031263.2 Endogenous
    261 ARL8B NM_018184.2 Endogenous
    262 SETD2 NM_014159.6 Endogenous
    263 NCAPG NM_022346.4 Endogenous
    264 EEF1B2 NM_001037663.1 Endogenous
    265 TRIM39 b NM_172016.2 Endogenous
    266 EHD4 NM_139265.3 Endogenous
    267 IRF1 NM_002198.1 Endogenous
    268 LOC100129022 XM_001716591.1 Endogenous
    269 TRAF3IP2 NM_147686.3 Endogenous
    270 PSMA6 NM_002791.2 Endogenous
    271 RHOG NM_001665.3 Endogenous
    272 CN312986 CN312986.1 Endogenous
    273 PSMB8 NM_004159.4 Endogenous
    274 ZNF239 NM_001099283.1 Endogenous
    275 CLPTM1 NM_001294.3 Endogenous
    276 NADK NM_023018.4 Endogenous
    277 C8orf76 NM_032847.2 Endogenous
    278 LIF NM_002309.3 Endogenous
    279 EGR1 NM_001964.2 Endogenous
    280 ARG1 b NM_000045.2 Endogenous
    281 MERTK NM_006343.2 Endogenous
    282 RHOU NM_021205.5 Endogenous
    283 PFDN5 b NM_145897.2 Endogenous
    284 MAGEA1 NM_004988.4 Endogenous
    285 SEC24C NM_198597.2 Endogenous
    286 SLC11A1 NM_000578.3 Endogenous
    287 TCF20 NM_181492.2 Endogenous
    288 AHCYL1 NM_001242676.1 Endogenous
    289 TPT1 NM_003295.3 Endogenous
    290 KIR2DL5A XM_001126354.1 Endogenous
    291 IRAK2 NM_001570.3 Endogenous
    292 C17orf51 XM_944416.1 Endogenous
    293 C14orf156 NM_031210.5 Endogenous
    294 ATP2C1 NM_014382.3 Endogenous
    295 SOCS1 NM_003745.1 Endogenous
    296 JAK1 NM_002227.1 Endogenous
    297 RSL24D1 NM_016304.2 Endogenous
    298 AP2S1 NM_021575.3 Endogenous
    299 PHRF1 NM_020901.3 Endogenous
    300 GPI NM_000175.2 Endogenous
    301 NCR1 NM_004829.5 Endogenous
    302 AKAP4 NM_139289.1 Endogenous
    303 CD160 NM_007053.3 Endogenous
    304 DDX23 NM_004818.2 Endogenous
    305 GNL3 NM_014366.4 Endogenous
    306 NFKB2 NM_002502.2 Endogenous
    307 CSK NM_004383.2 Endogenous
    308 PELP1 NM_014389.2 Endogenous
    309 KLRF1 b NM_016523.2 Endogenous
    310 CS NM_004077.2 Endogenous
    311 PHCA NM_018367.6 Endogenous
    312 LOC644315 XR_017529.2 Endogenous
    313 NUDT18 NM_024815.3 Endogenous
    314 XCL2 NM_003175.3 Endogenous
    315 KLRC1 NM_002259.3 Endogenous
    316 ARHGAP18 NM_033515.2 Endogenous
    317 CTDSP2 NM_005730.3 Endogenous
    318 P2RY5 NM_005767.5 Endogenous
    319 CREB1 NM_004379.3 Endogenous
    320 RHOB NM_004040.3 Endogenous
    321 DCAF7 NM_005828.4 Endogenous
    322 NUP153 NM_005124.3 Endogenous
    323 AFTPH NM_017657.4 Endogenous
    324 EWSR1 NM_005243.3 Endogenous
    325 LYN NM_002350.1 Endogenous
    326 CYBB NM_000397.3 Endogenous
    327 TMEM70 NM_017866.5 Endogenous
    328 PPP1R3E XM_927029.1 Endogenous
    329 PSMB1 NM_002793.3 Endogenous
    330 RERE b NM_012102.3 Endogenous
    331 RXRA NM_002957.5 Endogenous
    332 GZMA NM_006144.3 Endogenous
    333 ERLIN1 NM_006459.3 Endogenous
    334 KRTAP10-3 NM_198696.2 Endogenous
    335 SAMSN1 NM_022136.3 Endogenous
    336 LRRC47 NM_020710.2 Endogenous
    337 MARCKS NM_002356.6 Endogenous
    338 HOPX NM_139211.4 Endogenous
    339 KLRF1 NM_016523.1 Endogenous
    340 NFAT5 NM_138713.3 Endogenous
    341 SLC15A2 NM_021082.3 Endogenous
    342 STK16 NM_003691.2 Endogenous
    343 KIR_Activating_Subgroup_2 NM_014512.1 Endogenous
    344 TBCE NM_001079515.2 Endogenous
    345 BAG3 NM_004281.3 Endogenous
    346 SFRS4 NM_005626.4 Endogenous
    347 AW270402 AW270402.1 Endogenous
    348 CCL3L1 NM_021006.4 Endogenous
    349 HERC3 NM_014606.2 Endogenous
    350 RPL34 NM_000995.3 Endogenous
    351 ALAS1 NM_000688.4 Endogenous
    352 CCR9 NM_031200.1 Endogenous
    353 CORO1C ILMN_1745954.1 Endogenous
    354 FAIM3 NM_005449.4 Endogenous
    355 SFPQ NM_005066.2 Endogenous
    356 HOOK3 NM_032410.3 Endogenous
    357 CD36 NM_000072.3 Endogenous
    358 IL7 NM_000880.2 Endogenous
    359 CBLL1 NM_024814.3 Endogenous
    360 HVCN1 NM_032369.3 Endogenous
    361 HMGB1 NM_002128.4 Endogenous
    362 SIN3A NM_015477.2 Endogenous
    363 CASP3 NM_032991.2 Endogenous
    364 BQ189294 BQ189294.1 Endogenous
    365 NDRG2 NM_016250.2 Endogenous
    366 BX400436 BX400436.2 Endogenous
    367 IFNAR2 NM_000874.3 Endogenous
    368 MS4A6A b NM_152851.2 Endogenous
    369 KLRC2 NM_002260.3 Endogenous
    370 S100A12 b NM_005621.1 Endogenous
    371 ATM NM_000051.3 Endogenous
    372 NLRP3 NM_001079821.2 Endogenous
    373 HAVCR2 NM_032782.3 Endogenous
    374 C4B NM_001002029.3 Endogenous
    375 CTSW NM_001335.3 Endogenous
    376 TMEM170B NM_001100829.2 Endogenous
    377 EIF4ENIF1 NM_019843.2 Endogenous
    378 CCL3 NM_002983.2 Endogenous
    379 CHCHD3 NM_017812.2 Endogenous
    380 CST7 NM_003650.3 Endogenous
    381 SFRS15 NM_020706.2 Endogenous
    382 STIP1 NM_006819.2 Endogenous
    383 MPDU1 NM_004870.3 Endogenous
    384 DHX16 b NM_001164239.1 Endogenous
    385 INTS4 NM_033547.3 Endogenous
    386 USP16 NM_001032410.1 Endogenous
    387 IFNAR1 NM_000629.2 Endogenous
    388 ITCH NM_001257138.1 Endogenous
    389 FOXK2 NM_004514.3 Endogenous
    390 LOC642812 XR_036892.1 Endogenous
    391 KIAA1967 NM_021174.5 Endogenous
    392 LOC440928 XM_942885.1 Endogenous
    393 NDUFV2 NM_021074.4 Endogenous
    394 IL4 NM_000589.2 Endogenous
    395 CIAPIN1 NM_020313.3 Endogenous
    396 CXCL2 NM_002089.3 Endogenous
    397 TXN NM_003329.3 Endogenous
    398 PRG2 NM_002728.4 Endogenous
    399 MS4A2 NM_000139.3 Endogenous
    400 YPEL1 NM_013313.4 Endogenous
    401 POLR2A NM_000937.4 Endogenous
    402 C19orf10 NM_019107.3 Endogenous
    403 IGFBP7 NM_001553.2 Endogenous
    404 ITGAE NM_002208.4 Endogenous
    405 CXCR5 b NM_001716.3 Endogenous
    406 BID NM_001196.2 Endogenous
    407 LOC100133273 XR_039238.1 Endogenous
    408 FNBP1 NM_015033.2 Endogenous
    409 IFNGR1 NM_000416.1 Endogenous
    410 STAT6 NM_003153.4 Endogenous
    411 CR2 NM_001006658.2 Endogenous
    412 CCL3L3 NM_001001437.3 Endogenous
    413 RFWD2 NM_022457.6 Endogenous
    414 SP2 NM_003110.5 Endogenous
    415 BAT2D1 NM_015172.3 Endogenous
    416 CX3CL1 NM_002996.3 Endogenous
    417 GPATCH3 NM_022078.2 Endogenous
    418 CASP1 NM_033294.3 Endogenous
    419 NAGK NM_017567.4 Endogenous
    420 IER5 NM_016545.4 Endogenous
    421 PHLPP2 NM_015020.3 Endogenous
    422 RPL31 NM_000993.4 Endogenous
    423 SPEN NM_015001.2 Endogenous
    424 TMSB4X NM_021109.3 Endogenous
    425 IL8RB NM_001557.3 Endogenous
    426 XPC NR_027299.1 Endogenous
    427 SNX11 NM_152244.1 Endogenous
    428 SPN NM_003123.3 Endogenous
    429 ANKHD1 NM_017747.2 Endogenous
    430 CCR6 NM_031409.2 Endogenous
    431 DZIP3 NM_014648.3 Endogenous
    432 MRPL27 NM_148571.1 Endogenous
    433 SREBF1 NM_001005291.2 Endogenous
    434 CD14 NM_000591.2 Endogenous
    435 TNFSF8 NM_001244.3 Endogenous
    436 C3 NM_000064.2 Endogenous
    437 FAM50B NM_012135.1 Endogenous
    438 RASSF5 NM_182664.2 Endogenous
    439 BU743228 BU743228.1 Endogenous
    440 NFATC1 NM_172389.1 Endogenous
    441 DOCK5 NM_024940.6 Endogenous
    442 PACS1 NM_018026.3 Endogenous
    443 CYP1B1 NM_000104.3 Endogenous
    444 CLIC3 ILMN_1796423.1 Endogenous
    445 PSMA4 NM_002789.3 Endogenous
    446 ZNF341 NM_032819.4 Endogenous
    447 PRPF3 NM_004698.2 Endogenous
    448 PSMA6 b NM_002791.2 Endogenous
    449 LOC648927 XR_038906.2 Endogenous
    450 KCTD12 NM_138444.3 Endogenous
    451 LOC440389 XM_498648.3 Endogenous
    452 U2AF2 NM_007279.2 Endogenous
    453 CLEC5A NM_013252.2 Endogenous
    454 PRRG4 NM_024081.5 Endogenous
    455 TNFRSF9 NM_001561.5 Endogenous
    456 NDUFB3 NM_002491.2 Endogenous
    457 BCL6 NM_001130845.1 Endogenous
    458 SGK1 NM_005627.3 Endogenous
    459 CIP29 NM_033082.3 Endogenous
    460 CD160 b NM_007053.2 Endogenous
    461 ARCN1 NM_001655.4 Endogenous
    462 LOC151162 NR_024275.1 Endogenous
    463 GPR65 NM_003608.3 Endogenous
    464 CCR1 NM_001295.2 Endogenous
    465 TFCP2 NM_005653.4 Endogenous
    466 SGK NM_005627.3 Endogenous
    467 RNF214 NM_207343.3 Endogenous
    468 TMC8 NM_152468.4 Endogenous
    469 RBM14 NM_006328.3 Endogenous
    470 USP34 NM_014709.3 Endogenous
    471 BACH2 NM_021813.3 Endogenous
    472 LILRA5 NM_021250.3 Endogenous
    473 C5orf21 NM_032042.5 Endogenous
    474 LOC441073 XR_018937.2 Endogenous
    475 TAX1BP1 NM_001079864.2 Endogenous
    476 TNFSF13 NM_003808.3 Endogenous
    477 PIM2 NM_006875.3 Endogenous
    478 RNF19B NM_153341.3 Endogenous
    479 EPHX2 NM_001979.5 Endogenous
    480 LILRA5 b NM_181879.2 Endogenous
    481 ABCF1 NM_001025091.1 Endogenous
    482 C4orf27 NM_017867.2 Endogenous
    483 PSMB7 NM_002799.2 Endogenous
    484 LPCAT4 NM_153613.2 Endogenous
    485 TRIM21 NM_003141.3 Endogenous
    486 LOC728835 XM_001133190.1 Endogenous
    487 NFKB1 NM_003998.3 Endogenous
    488 CR2 b NM_001006658.1 Endogenous
    489 HMGB2 NM_002129.3 Endogenous
    490 IL1B NM_000576.2 Endogenous
    491 C20orf52 NM_080748.2 Endogenous
    492 DNAJB6 NM_058246.3 Endogenous
    493 PFDN5 NM_145897.2 Endogenous
    494 RPS6 NM_001010.2 Endogenous
    495 LEF1 NM_016269.4 Endogenous
    496 DKFZp761P0423 XM_291277.4 Endogenous
    497 LOC647340 XR_018104.1 Endogenous
    498 FTHL16 XR_041433.1 Endogenous
    499 COX6C NM_004374.2 Endogenous
    500 BCL10 NM_003921.2 Endogenous
    501 CD48 NM_001778.2 Endogenous
    502 ZMIZ1 NM_020338.3 Endogenous
    503 GZMH NM_033423.4 Endogenous
    504 TRRAP NM_003496.3 Endogenous
    505 SH2D3C NM_170600.2 Endogenous
    506 UBC NM_021009.3 Endogenous
    507 TXNDC17 NM_032731.3 Endogenous
    508 ATP5J2 NM_004889.3 Endogenous
    509 KIAA1267 NM_015443.3 Endogenous
    510 RFX1 NM_002918.4 Endogenous
    511 WDR1 NM_005112.4 Endogenous
    512 LOC100129697 XM_001732822.2 Endogenous
    513 TOMM7 NM_019059.2 Endogenous
    514 ARHGAP26 NM_015071.4 Endogenous
    515 HSPA6 NM_002155.4 Endogenous
    516 FLJ10357 NM_018071.4 Endogenous
    517 ITGAL NM_002209.2 Endogenous
    518 BX089765 BX089765.1 Endogenous
    519 RERE NM_001042682.1 Endogenous
    520 C15orf39 NM_015492.4 Endogenous
    521 BX436458 BX436458.2 Endogenous
    522 RWDD1 NM_001007464.2 Endogenous
    523 TMBIM6 NM_003217.2 Endogenous
    524 SLC6A6 NM_003043.5 Endogenous
    525 KIAA0174 NM_014761.3 Endogenous
    526 IL16 NM_004513.4 Endogenous
    527 EGLN1 NM_022051.1 Endogenous
    528 LOC391126 XR_017684.2 Endogenous
    529 TAPBP NM_003190.4 Endogenous
    530 NUMB NM_001005744.1 Endogenous
    531 CENTD2 NM_001040118.2 Endogenous
    532 CLSTN1 NM_001009566.2 Endogenous
    533 PSMA4 b NM_002789.4 Endogenous
    534 LOC648000 XM_371757.4 Endogenous
    535 COX7C NM_001867.2 Endogenous
    536 PIK3CD NM_005026.3 Endogenous
    537 UQCRQ NM_014402.4 Endogenous
    538 IDS NM_006123.4 Endogenous
    539 C19orf59 NM_174918.2 Endogenous
    540 MYL12A NM_006471.3 Housekeeping
    541 EIF2B4 NM_015636.3 Housekeeping
    542 DGUOK b NM_080916.2 Housekeeping
    543 PSMC1 NM_002802.2 Housekeeping
    544 CHFR NM_018223.2 Housekeeping
    545 ARPC2 NM_005731.2 Housekeeping
    546 ATP5B NM_001686.3 Housekeeping
    547 RPL3 NM_001033853.1 Housekeeping
    548 ZNF143 NM_003442.5 Housekeeping
    549 PSMD7 NM_002811.4 Housekeeping
    550 TBP NM_003194.4 Housekeeping
    551 DHX16 NM_003587.4 Housekeeping
    552 TUG1 NR_002323.2 Housekeeping
    553 GUSB NM_000181.3 Housekeeping
    554 HDAC3 NM_003883.3 Housekeeping
    555 SDHA NM_004168.3 Housekeeping
    556 PGK1 NM_000291.3 Housekeeping
    557 STAMBP NM_006463.4 Housekeeping
    558 MTCH1 NM_014341.2 Housekeeping
    559 TUBB NM_178014.2 Housekeeping
  • TABLE II
    Rank Sequence ID# Gene Class Name
    1 TPR NM_003292.2 Endogenous
    2 DNAJB1 NM_006145.2 Endogenous
    3 PDCD10 NM_145859.1 Endogenous
    4 PSMB7 NM_002799.2 Endogenous
    5 MERTK NM_006343.2 Endogenous
    6 AFTPH NM_017657.4 Endogenous
    7 BCOR NM_017745.5 Endogenous
    8 RASSF5 NM_182664.2 Endogenous
    9 SNX11 NM_152244.1 Endogenous
    10 ANP32B NM_006401.2 Endogenous
    11 C4B NM_001002029.3 Endogenous
    12 NME1-NME2 NM_001018136.2 Endogenous
    13 DGUOK NM_080916.2 Endogenous
    14 CYP1B1 NM_000104.3 Endogenous
    15 MPDU1 NM_004870.3 Endogenous
    16 MED16 NM_005481.2 Endogenous
    17 FAM179A NM_199280.2 Endogenous
    18 CPPED1 NM_018340.2 Endogenous
    19 LOC648927 XR_038906.2 Endogenous
    20 ANKHD1 NM_017747.2 Endogenous
    21 CN312986 CN312986.1 Endogenous
    22 PHCA NM_018367.6 Endogenous
    23 CD1A NM_001763.2 Endogenous
    24 NCOA5 NM_020967.2 Endogenous
    25 SLC6A12 NM_003044.4 Endogenous
    26 LOC728533 XR_015610.3 Endogenous
    27 TRAF3IP2 NM_147686.3 Endogenous
    28 TBCE NM_001079515.2 Endogenous
    29 CCT6A NM_001762.3 Endogenous
    30 P2RY5 NM_005767.5 Endogenous
    31 RNASE2 NM_002934.2 Endogenous
    32 CLN8 NM_018941.3 Endogenous
    33 REPS1 NM_001128617.2 Endogenous
    34 TPT1 NM_003295.3 Endogenous
    35 LOC100129022 XM_001716591.1 Endogenous
    36 KLRC1 NM_002259.3 Endogenous
    37 AZI2 NM_022461.4 Endogenous
    38 FAM193A NM_003704.3 Endogenous
    39 PLAC8 NM_001130715.1 Endogenous
    40 LDHA NM_001165416.1 Endogenous
    41 GPATCH3 NM_022078.2 Endogenous
    42 RBM14 NM_006328.3 Endogenous
    43 KYNU NM_001032998.1 Endogenous
    44 PPP2R5C NM_178588.1 Endogenous
    45 S100A12 b NM_005621.1 Endogenous
    46 SFMBT1 NM_001005158.2 Endogenous
    47 CCR6 NM_031409.2 Endogenous
    48 TRIM39 NM_021253.3 Endogenous
    49 AK126342 AK126342.1 Endogenous
    50 SLC45A3 NM_033102.2 Endogenous
    51 IL4 NM_000589.2 Endogenous
    52 UBE2I NM_194259.2 Endogenous
    53 PRPF3 NM_004698.2 Endogenous
    54 NDUFB3 NM_002491.2 Endogenous
    55 CRKL NM_005207.3 Endogenous
    56 IDO1 NM_002164.3 Endogenous
    57 PUM1 NM_001020658.1 Endogenous
    58 BCL10 NM_003921.2 Endogenous
    59 TMBIM6 NM_003217.2 Endogenous
    60 C17orf51 XM_944416.1 Endogenous
    61 BANP NM_079837.2 Endogenous
    62 HAVCR2 NM_032782.3 Endogenous
    63 BAG3 NM_004281.3 Endogenous
    64 DBI NM_001079862.1 Endogenous
    65 C4orf27 NM_017867.2 Endogenous
    66 TSC1 NM_000368.4 Endogenous
    67 LPCAT4 NM_153613.2 Endogenous
    68 SAMSN1 NM_022136.3 Endogenous
    69 SNORA56 NR_002984.1 Endogenous
    70 ARG1 NM_000045.3 Endogenous
    71 IL1R2 NM_173343.1 Endogenous
    72 CCND3 NM_001760.2 Endogenous
    73 USP9Y NM_004654.3 Endogenous
    74 ATP2C1 NM_014382.3 Endogenous
    75 PSMB1 NM_002793.3 Endogenous
    76 NDUFAF2 NM_174889.4 Endogenous
    77 VPS37C NM_017966.4 Endogenous
    78 HAT1 NM_003642.3 Endogenous
    79 LOC732371 XM_001133019.1 Endogenous
    80 LOC148137 NM_144692.1 Endogenous
    81 CCR1 NM_001295.2 Endogenous
    82 CCDC97 NM_052848.1 Endogenous
    83 PPP6C NM_002721.4 Endogenous
    84 GPI NM_000175.2 Endogenous
    85 PIM2 NM_006875.3 Endogenous
    86 STAT6 NM_003153.4 Endogenous
    87 BATF NM_006399.3 Endogenous
    88 EIF4ENIF1 NM_019843.2 Endogenous
    89 HSP90AB1 NM_007355.3 Endogenous
    90 U2AF2 NM_007279.2 Endogenous
    91 CYBB NM_000397.3 Endogenous
    92 WDR1 NM_005112.4 Endogenous
    93 PSMB8 NM_004159.4 Endogenous
    94 TBC1D12 NM_015188.1 Endogenous
    95 LOC648000 XM_371757.4 Endogenous
    96 XCL2 NM_003175.3 Endogenous
    97 PTGDR NM_000953.2 Endogenous
    98 ACSL5 NM_203379.1 Endogenous
    99 CASP1 NM_033294.3 Endogenous
    100 UBTF NM_001076683.1 Endogenous
  • In one embodiment, a novel gene expression profile or signature can identify and distinguish patients having cancerous tumors from patients having benign nodules. See, for example, the genes identified in Table I and Table II, which may form a suitable gene expression profile. In another embodiment, a portion of the genes of Table I forms a suitable profile. In yet another embodiment, a portion of the genes of Table II forms a suitable profile. As discussed herein, these profiles are used to distinguish between cancerous and non-cancerous tumors by generating a discriminant score based on differences in gene expression profiles, as exemplified below. The validity of these signatures was established on samples collected at different locations by different groups in a cohort of patients with undiagnosed lung nodules. See Example 7, FIGS. 2A-2B and FIG. 6. The lung cancer signatures or gene expression profiles identified herein (i.e., Table I or Table II) may be further optimized to reduce the number of gene expression products necessary and increase the accuracy of diagnosis.
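  • As a purely illustrative sketch of what a discriminant score can look like, the following Python example computes a weighted sum of expression values and assigns a class by the sign of the result. The linear form, the function name `discriminant_score`, the sign convention, and all numeric values are assumptions introduced here for illustration; they are placeholders, not measured data or coefficients from the Examples.

```python
from typing import Dict


def discriminant_score(expression: Dict[str, float],
                       weights: Dict[str, float],
                       bias: float = 0.0) -> float:
    """Weighted sum of expression values over the genes in `weights`.

    Hypothetical helper: a linear score is only one possible form.
    """
    missing = set(weights) - set(expression)
    if missing:
        raise KeyError(f"no expression value for: {sorted(missing)}")
    return bias + sum(weights[g] * expression[g] for g in weights)


# Placeholder numbers only; gene symbols are the first three of Table II.
weights = {"TPR": 0.5, "DNAJB1": -0.25, "PDCD10": 1.0}
sample = {"TPR": 2.0, "DNAJB1": 4.0, "PDCD10": 1.0}

score = discriminant_score(sample, weights)
# Assumed convention: positive scores are called cancer-like.
label = "cancer-like" if score > 0 else "benign-like"
print(score, label)
```

In this sketch a gene missing from the sample raises an error rather than being silently treated as zero, since a silent default would shift the score.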
  • In one embodiment, the composition includes 10 to 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes 10 to 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II. 
In another embodiment, the composition includes any whole number from 1 to 559, inclusive, of polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes any whole number from 1 to 100, inclusive, of polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II. In one embodiment, the composition includes at least 3 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 5 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 15 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 20 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 25 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 30 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 35 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 40 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 45 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 50 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 55 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 60 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 65 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 70 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 75 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 80 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 85 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 90 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 95 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 150 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 200 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 250 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 300 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. 
In one embodiment, the composition includes at least 350 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 400 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 450 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 500 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I. In another embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table II.
  • In yet another embodiment, the expression profile is formed by the first 3 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 5 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 10 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 15 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 20 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 25 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 30 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 35 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 40 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 45 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 50 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 55 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 60 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 65 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 70 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 75 genes in rank order of Table I or Table II. 
In yet another embodiment, the expression profile is formed by the first 80 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 85 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 90 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 95 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 100 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 150 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 200 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 250 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 300 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 350 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 400 genes in rank order of Table I. In yet another embodiment, the expression profile is formed by the first 539 genes in rank order of Table I.
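  • As a minimal sketch of forming a profile from the first N genes in rank order, the following Python example selects the top-ranked entries from a ranked table. Only the first five rows of Table II are reproduced; the name `first_n_in_rank_order` is a hypothetical helper introduced here for illustration.

```python
from typing import List, Tuple

# First five rows of Table II as (rank, gene, accession); truncated for brevity.
RANKED_GENES: List[Tuple[int, str, str]] = [
    (1, "TPR", "NM_003292.2"),
    (2, "DNAJB1", "NM_006145.2"),
    (3, "PDCD10", "NM_145859.1"),
    (4, "PSMB7", "NM_002799.2"),
    (5, "MERTK", "NM_006343.2"),
]


def first_n_in_rank_order(table: List[Tuple[int, str, str]], n: int) -> List[str]:
    """Return the gene symbols of the n best-ranked rows (hypothetical helper)."""
    if not 1 <= n <= len(table):
        raise ValueError(f"n must be between 1 and {len(table)}")
    ordered = sorted(table, key=lambda row: row[0])  # sort by rank, ascending
    return [gene for _, gene, _ in ordered[:n]]


profile = first_n_in_rank_order(RANKED_GENES, 3)
print(profile)
```

Sorting by the rank column before slicing keeps the selection correct even if the rows are supplied out of order.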
  • As discussed below, the compositions described herein can be used with the gene expression profiling methods which are known in the art. Thus, the compositions can be adapted accordingly to suit the method for which they are intended to be used. In one embodiment, at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label. In certain embodiments, each polynucleotide or oligonucleotide is attached to a different detectable label, each capable of being detected independently. Such reagents are useful in assays such as the nCounter, as described below, and with the diagnostic methods described herein.
  • In another embodiment, the composition comprises a capture oligonucleotide or ligand, which hybridizes to at least one polynucleotide or oligonucleotide or ligand. In one embodiment, such capture oligonucleotide or ligand may include a nucleic acid sequence which is specific for a portion of the oligonucleotide or polynucleotide or ligand which is specific for the gene of interest. The capture ligand may be a peptide or polypeptide which is specific for the ligand to the gene of interest. In one embodiment, the capture ligand is an antibody, as in a sandwich ELISA.
  • The capture oligonucleotide also includes a moiety which allows for binding with a substrate. Such substrate includes, without limitation, a plate, bead, slide, well, chip or chamber. In one embodiment, the composition includes a capture oligonucleotide for each different polynucleotide or oligonucleotide which is specific to a gene of interest. Each capture oligonucleotide may contain the same moiety which allows for binding with the same substrate. In one embodiment, the binding moiety is biotin.
  • Thus, a composition for such diagnosis or evaluation in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which the ligands used to detect and quantitate mRNA are immobilized. The reagent, in one embodiment, is an amplification nucleic acid primer (such as an RNA primer) or primer pair that amplifies and detects a nucleic acid sequence of the mRNA. In another embodiment, the reagent is a polynucleotide probe that hybridizes to the target sequence. In another embodiment, the target sequences are illustrated in Table III. In another embodiment, the reagent is an antibody or fragment of an antibody. The reagent can include multiple said primers, probes or antibodies, each specific for at least one gene, gene fragment or expression product of Table I or Table II. Optionally, the reagent can be associated with a conventional detectable label.
  • In another embodiment, the composition is a kit containing the relevant multiple polynucleotides or oligonucleotide probes or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label. In certain embodiments, the reagent is immobilized on a substrate. Exemplary substrates include a microarray, chip, microfluidics card, or chamber.
  • In one embodiment, the composition is a kit designed for use with the nCounter Nanostring system, as further discussed below.
  • II. GENE EXPRESSION PROFILING METHODS
  • Methods of gene expression profiling that were used in generating the profiles useful in the compositions and methods described herein or in performing the diagnostic steps using the compositions described herein are known and well summarized in U.S. Pat. No. 7,081,340. Such methods of gene expression profiling include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, and proteomics-based methods. The most commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization; RNAse protection assays; nCounter® Analysis; and PCR-based methods, such as RT-PCR. Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).
  • In certain embodiments, the compositions described herein are adapted for use in the methods of gene expression profiling and/or diagnosis described herein, and those known in the art.
  • A. Patient Sample
  • The “sample” or “biological sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. In one embodiment, a suitable sample is whole blood. In another embodiment, the sample may be venous blood. In another embodiment, the sample may be arterial blood. In another embodiment, a suitable sample for use in the methods described herein includes peripheral blood, more specifically peripheral blood mononuclear cells. Other useful biological samples include, without limitation, plasma or serum. In still other embodiments, the sample is saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoalveolar lavage fluid, or other cellular exudates from a subject suspected of having a lung disease. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples are concentrated by conventional means. It should be understood that the use of or reference to any one biological sample throughout this specification is exemplary only. For example, where in the specification the sample is referred to as whole blood, it is understood that other samples, e.g., serum, plasma, etc., may also be employed in another embodiment.
  • In one embodiment, the biological sample is whole blood, and the method employs the PAXgene Blood RNA Workflow system (Qiagen). That system involves blood collection (e.g., single blood draws) and RNA stabilization, followed by transport and storage, followed by purification of total RNA and molecular RNA testing. This system provides immediate RNA stabilization and consistent blood draw volumes. The blood can be drawn at a physician's office or clinic, and the specimen transported and stored in the same tube. Short-term RNA stability is 3 days at 18-25° C. or 5 days at 2-8° C. Long-term RNA stability is 4 years at −20 to −70° C. This sample collection system enables the user to reliably obtain data on gene expression in whole blood. While the PAXgene system has more noise than the use of PBMC as a biological sample source, the benefits of PAXgene sample collection outweigh the problems. Noise can be subtracted bioinformatically by the person of skill in the art.
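  • The storage figures stated above can be encoded as a simple lookup, sketched here in Python. The names `StabilityWindow` and `within_stated_stability` are hypothetical; the sketch assumes 4 years is approximated as 4 × 365 days and treats any temperature outside the three stated ranges as unsupported, because the text makes no statement about such temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityWindow:
    min_temp_c: float
    max_temp_c: float
    max_days: float


# Ranges and durations as stated in the text; 4 years approximated as 1460 days.
WINDOWS = (
    StabilityWindow(18.0, 25.0, 3.0),
    StabilityWindow(2.0, 8.0, 5.0),
    StabilityWindow(-70.0, -20.0, 4 * 365.0),
)


def within_stated_stability(temp_c: float, days_stored: float) -> bool:
    """True if storage falls inside one of the stated stability windows."""
    for window in WINDOWS:
        if window.min_temp_c <= temp_c <= window.max_temp_c:
            return 0 <= days_stored <= window.max_days
    # Temperature outside every stated range: no stability claim is available.
    return False


print(within_stated_stability(22.0, 2.0))
print(within_stated_stability(22.0, 4.0))
```

A check of this kind could be run at sample accession to flag specimens whose transport history falls outside the stated windows.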
  • In one embodiment, the biological samples may be collected using the proprietary PAXgene Blood RNA System (PreAnalytiX, a Qiagen/BD company). The PAXgene Blood RNA System comprises two integrated components: the PAXgene Blood RNA Tube and the PAXgene Blood RNA Kit. Blood samples are drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency to process samples following collection greatly enhances lab efficiency and reduces costs. Thereafter, the mRNA is detected and/or measured using a variety of assays.
  • B. Nanostring Analysis
  • A sensitive and flexible quantitative method that is suitable for use with the compositions and methods described herein is the nCounter® Analysis system (NanoString Technologies, Inc., Seattle WA). The nCounter Analysis System utilizes a digital color-coded barcode technology that is based on direct multiplexed measurement of gene expression and offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe (i.e., polynucleotide, oligonucleotide or ligand) corresponding to a gene of interest, i.e., a gene of Table I. Mixed together with controls, they form a multiplexed CodeSet. In one embodiment, the CodeSet includes all 559 genes of Table I. In another embodiment, the CodeSet includes all 100 genes of Table II. In another embodiment, the CodeSet includes at least 3 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 5 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 10 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 15 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 20 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 25 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 30 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 40 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 50 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 60 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 70 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 80 genes of Table I or Table II. 
In yet another embodiment, the CodeSet includes at least 90 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 100 genes of Table I. In another embodiment, the CodeSet includes at least 200 genes of Table I. In another embodiment, the CodeSet includes at least 300 genes of Table I. In yet another embodiment, the CodeSet includes at least 400 genes of Table I. In yet another embodiment, the CodeSet includes at least 500 genes of Table I. In yet another embodiment, the CodeSet is formed by the first 539 genes in rank order of Table I. In yet another embodiment, the CodeSet includes any subset of genes of Table I, as described herein. In another embodiment, the CodeSet includes any subset of genes of Table II, as described herein.
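The embodiment in which the CodeSet is formed by the first N genes in rank order can be illustrated with a minimal Python sketch. The function name `select_codeset` and the placeholder gene names are assumptions made for illustration only; they do not reproduce the genes or the ranking of Table I.

```python
from typing import List, Sequence


def select_codeset(ranked_genes: Sequence[str], n_genes: int) -> List[str]:
    """Return the first n_genes entries of a rank-ordered gene list."""
    if not 1 <= n_genes <= len(ranked_genes):
        raise ValueError(
            f"n_genes must be between 1 and {len(ranked_genes)}, got {n_genes}"
        )
    return list(ranked_genes[:n_genes])


# Hypothetical placeholder names; these are NOT the genes or ranking of Table I.
# The list has 559 entries only to mirror the size of Table I.
ranked_genes = [f"GENE_{i:03d}" for i in range(1, 560)]

# Example: a CodeSet formed by the first 539 genes in rank order.
codeset_539 = select_codeset(ranked_genes, 539)
print(len(codeset_539), codeset_539[0], codeset_539[-1])
```

The same call with a smaller `n_genes` (e.g., 3, 5, 10 or 100) would give the smaller rank-ordered subsets, under the same assumption that the input list is already sorted by rank.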
  • The NanoString platform employs two ~50-base probes per mRNA that hybridize in solution. The Reporter Probe carries the signal; the Capture Probe allows the complex to be immobilized for data collection. The probes are mixed with the patient sample. After hybridization, the excess probes are removed and the probe/target complexes are aligned and immobilized on a substrate, e.g., in the nCounter Cartridge.
  • The target sequences utilized in the Examples below for each of the genes of Table I and Table II are shown in Table III below, and are reproduced in the sequence listing. These sequences are portions of the published sequences of these genes. Suitable alternatives may be readily designed by one of skill in the art.
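Each entry of Table III pairs a position range with a target sequence, so an entry can be sanity-checked by comparing the length of the sequence with the span of the range and by confirming that only the bases A, C, G and T appear. The following Python sketch shows one way to do this. The names `TargetEntry` and `check_entry` are hypothetical, and the reading of the positions as 1-based and inclusive is an assumption that happens to be consistent with the 100-base spans listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetEntry:
    seq_id: int
    gene: str
    accession: str
    start: int     # first base of the target region (assumed 1-based, inclusive)
    end: int       # last base of the target region (assumed inclusive)
    sequence: str


def check_entry(entry: TargetEntry) -> List[str]:
    """Return a list of problems found in one table entry (empty if none)."""
    problems = []
    expected_len = entry.end - entry.start + 1
    if len(entry.sequence) != expected_len:
        problems.append(f"length {len(entry.sequence)} != expected {expected_len}")
    unexpected = sorted(set(entry.sequence.upper()) - set("ACGT"))
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems


# Sequence ID# 1 (ABCA5) as listed in Table III, with its line fragments joined.
abca5 = TargetEntry(
    seq_id=1,
    gene="ABCA5",
    accession="NM_018672.4",
    start=6839,
    end=6938,
    sequence=(
        "AAGGAAGACTGTGTGTAGAATCT"
        "TACGTAATAGTCTGATTCTTTGA"
        "CTCTGTGGCTAGAATGACAGTTA"
        "TCTATGGAGGTGGTAGAATTAAG"
        "CCATACCT"
    ),
)
print(check_entry(abca5))
```

A check of this kind flags transcription errors, such as a non-nucleotide character or a truncated sequence, but it does not confirm that the sequence matches the cited accession.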
  • Sample Cartridges are placed in the Digital Analyzer for data collection. Color codes on the surface of the cartridge are counted and tabulated for each target molecule.
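The counting and tabulation step can be pictured with a short Python sketch. It assumes, purely for illustration, that each detected barcode is reported as a string of color letters and that a lookup table maps each barcode to its target gene. The barcode strings, the function name `tabulate_counts` and the example reads are hypothetical and do not describe the actual output format of the Digital Analyzer.

```python
from collections import Counter
from typing import Dict, Iterable


def tabulate_counts(
    detected_codes: Iterable[str], code_to_target: Dict[str, str]
) -> Counter:
    """Count how many times each target's barcode was detected."""
    # Start every known target at zero so that undetected targets are reported too.
    counts = Counter({target: 0 for target in code_to_target.values()})
    for code in detected_codes:
        target = code_to_target.get(code)
        if target is not None:  # ignore codes that are not part of the CodeSet
            counts[target] += 1
    return counts


# Hypothetical barcode assignments for two genes named in Table III.
code_to_target = {"GRYBGR": "ABCA5", "BYGRYB": "ABCF1"}

# Hypothetical list of barcodes read from a cartridge.
detected = ["GRYBGR", "BYGRYB", "GRYBGR", "RRRRRR"]

for target, count in sorted(tabulate_counts(detected, code_to_target).items()):
    print(target, count)
```

Because each barcode is counted directly, the tally per target serves as the expression measurement without any amplification step.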
  • A benefit of the NanoString nCounter system is that no amplification of mRNA is necessary in order to perform the detection and quantification. However, in alternate embodiments, other suitable quantitative methods are used. See, e.g., Geiss et al., Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol. 2008 March; 26(3):317-25. doi: 10.1038/nbt1385. Epub 2008 Feb. 17, which is incorporated herein by reference in its entirety.
  • TABLE III
    Sequence ID#  Gene  Position  Target Sequence
      1 ABCA5 NM_018672.4  6839- AAGGAAGACTGTGTGTAGAATCT
     6938 TACGTAATAGTCTGATTCTTTGA
    CTCTGTGGCTAGAATGACAGTTA
    TCTATGGAGGTGGTAGAATTAAG
    CCATACCT
      2 ABCF1 NM_00102509  2875- CCTAAACAAACAAGAGGTGACC
    1.1  2974 ACCTTATTGTGAGGTTCCATCCA
    GCCAAGTTTATGTGGCCTATTGT
    CTCAGGACTCTCATCACTCAGAA
    GCCTGCCTC
      3 ACAA2 NM_006111.2  1605- CTCACTGTGACCCATCCTTACTC
     1704 TACTTGGCCAGGCCACAGTAAAA
    CAAGTGACCTTCAGAGCAGCTGC
    CACAACTGGCCATGCCCTGCCAT
    TGAAACAG
      4 PHCA NM_018367.6  3324- AGCCAATAGTGATTTGTTTGCAT
     3423 ATCACCTAATGTGAAAAGTGCTC
    ATCTGTGAACTCTACAGCAAATT
    ATATTTTAGAAAATACTTTGTGA
    GGCCGGGC
      5 ACSL5 NM_203379.1  2701- CTATCACTCATGTCAATCATATC
     2800 TATGAGACAAATGTCTCCGATGC
    TCTTCTGCGTAAATTAAATTGTG
    TACTGAAGGGAAAAGTTTGATCA
    TACCAAAC
      6 CABC1 NM_020247.4  2536- TTCTAGAGTGAGATTTGTGTTTT
     2635 CTGCCCTTTTCCTCTCCAGCCGA
    TGGGCTGGAGCTGGGAGAGGTGC
    TGAGCTAACAGTGCCAACAAGT
    GCTCCTTAA
      7 CD97 NM_078481.3  3186- GCCAGTACTCGGGACAGACTAA
     3285 GGGCGCTTGTCCCATCCTGGACT
    TTTCCTCTCATGTCTTTGCTGCA
    GAACTGAAGAGACTAGGCGCTGG
    GGCTCAGCT
      8 AFTPH NM_017657.4  2741- CTACCACCCGTCCAGTTTGACTG
     2840 GAGTAGCAGTGGCCTTACTAACC
    CTTTAGATGGTGTGGATCCGGAG
    TTGTATGAGTTAACAACTTCTAA
    GCTGGAAA
      9 AHCYL1 NM_00124267  2401- CTACCCGGCAGGTAGGTTAGATG
    6.1  2500 TGGGTGGTGCATGTTAATTTCCC
    TTAGAAGTTCCAAGCCCTGTTTC
    CTGCGTAAAGGTGGTATGTCCAG
    TTCAGAGA
     10 AK AK026725.1  1869- AATGAAATTACTGTAGAGTCAGC
    026725  1968 AAAGAAGTAGAGAAGAAAAAAC
    ACCAAGAATGAGGAGAACCTAG
    CAAGGGCAGGCTTTTGGAAGCA
    AGAGGTAGATA
     11 AK AK093878.1  1554- AGAATTTCTTGGTAGCTTTACAC
    093878  1653 CGAAAAATGCGTGTAACTAAAT
    ACCAGACATCTTGACCATTCAGC
    TAGAACCCTGGCAGCAACAGAG
    CTATTTAATT
     12 AK AK094576.1  1765- CCCCTCCAGCCAGCCCTGCGTGG
    094576  1864 TTGTGGCCCCACTGCAGAAACGC
    CTCCGCTTAACACTCCAGCCTCT
    CTTCTATTCGGTCAGGCCACAGC
    TGCTGACT
     13 AK AK124143.1  2252- GTACCTGGTAGAAATTGTGTCTT
    124143  2351 GGAATGACCCTTTCGAGTTATTG
    ACATGGCTCTGATGAATAGAACA
    TGAGCCCCAAAACTAAATCCAA
    AAGGAATTT
     14 AK AK126342.1  2906- CTTATTGATTAGTGAATGTAGCT
    126342  3005 TAAGCCTTTGTATGTGTCCTCAG
    GGGGCAGACCGACTTTAAGAGG
    GACCAGATAACGTTTGAATGGA
    GGGATTATAT
     15 AKAP4 NM_139289.1   417- CTGTAAGTGTCCTCAACTGGCTT
      516 CTCAGTGATCTCCAGAAGTATGC
    CTTGGGTTTCCAACATGCACTGA
    GCCCCTCAACCTCTACCTGTAAA
    CATAAAGT
     16 AKR1C3 NM_003739.4  1097- GAGGACGTCTCTATGCCGGTGAC
     1196 TGGACATATCACCTCTACTTAAA
    TCCGTCCTGTTTAGCGACTTCAG
    TCAACTACAGCTGAGTCCATAGG
    CCAGAAAG
     17 ALAS1 NM_000688.4  1616- GGGGATCGGGATGGAGTCATGC
     1715 CAAAAATGGACATCATTTCTGGA
    ACACTTGGCAAAGCCTTTGGTTG
    TGTTGGAGGGTACATCGCCAGCA
    CGAGTTCTC
     18 AMD1 NM_001634.4   572- ACCACCCTCTTGCTGAAAGCACT
      671 GGTTCCCCTGTTGAAGCTTGCTA
    GGGATTACAGTGGGTTTGACTCA
    ATTCAAAGCTTCTTTTATTCTCG
    TAAGAATT
     19 AMPD3 NM_000480.2  3389- GTGATGCTCAGGGGCTGTCAAAG
     3488 TGACTGCGTTCATCAGTTTTACA
    CTGGGGCTGCTACATAATATTTT
    CATTTGAACGAAGAACTTCAAAA
    AGCACAGG
     20 ANKHD1 NM_017747.2  7665- CTTGGAACCCTATGATAAAAGTT
     7764 ATCCAAAATTCAACTGAATGCAC
    TGATGCCCAGCAGATTTGGCCTG
    GCACGTGGGCACCTCATATTGGA
    AACATGCA
     21 ANP32B NM_006401.2   661- CACCTTGGAACCTTTGAAAAAGT
      760 TAGAATGTCTGAAAAGCCTGGAC
    CTCTTTAACTGTGAGGTTACCAA
    CCTGAATGACTACCGAGAGAGT
    GTCTTCAAG
     22 ANXA1b NM_000700.1   516- GAAATCAGAGACATTAACAGGG
      615 TCTACAGAGAGGAACTGAAGAG
    AGATCTGGCCAAAGACATAACCT
    CAGACACATCTGGAGATTTTCGG
    AACGCTTTGC
     23 ANXA1 NM_000700.2  1191- TGGATGAAACCAAAGGAGATTA
     1290 TGAGAAAATCCTGGTGGCTCTTT
    GTGGAGGAAACTAAACATTCCCT
    TGATGGTCTCAAGCTATGATCAG
    AAGACTTTA
     24 AP2S1 NM_021575.3   746- CGAGTAACCGTGCCGTTGTCGTG
      845 TGATGCCATAAGCGTCTGTGCGT
    GGAGTCCCCAATAAACCTGTGGT
    CCTGCCTGGCCTTGCCGTCAAAA
    AAAAAAAA
     25 CENTD2 NM_00104011  4923- AAACTCCAGAACAGCAGAAAGC
    8.2  5022 GGGTGCTGTAGAGGAGCACTCA
    GCTCACGGGGAGGGAGCTCTTG
    GCTGAGCTTCTACAGGGCTGAGA
    GCTGCGCTTTG
     26 ARCN1 NM_001655.4  3437- CACTTTTAGCTGGTTGAAAAGTA
     3536 CCACTCCCACTCTGAACATCTGG
    CCGTCCCTGCAAAGAGTGTACTG
    TGCTTGAAGCAGAGCACTCACAC
    ATAAATGG
     27 ARG1b NM_000045.2   506- AAGGAACTAAAAGGAAAGATTC
      605 CCGATGTGCCAGGATTCTCCTGG
    GTGACTCCCTGTATATCTGCCAA
    GGATATTGTGTATATTGGCTTGA
    GAGACGTGG
     28 ARG1 NM_000045.3   989- TTCGGACTTGCTCGGGAGGGTAA
     1088 TCACAAGCCTATTGACTACCTTA
    ACCCACCTAAGTAAATGTGGAA
    ACATCCGATATAAATCTCATAGT
    TAATGGCAT
     29 ARHGAP NM_018054.5  3027- CATGTATGGTCTGTGTCTCCCCA
    17  3126 GTCCCCTCAGAACCATGCCCATG
    GATGGTGACTGCTGGCTCTGTCA
    CCTCATCAAACTGGATGTGACCC
    ATGCCGCC
     30 ARHGAP NM_033515.2  2499- TTTTTGACCAAAAAGATAACAAA
    18  2598 TACCAGGTATGGCAAGTTGTGAA
    GACAGCACATTAAAACATACCTA
    ATTTCACAGTATTCCTGTCACGA
    CAGAATGT
     31 ARHGAP NM_015071.4  6088- TCCCTGAGCTTTCCCAGTAGCCT
    26  6187 CCAGTTTCCTTTGTAAGACCCAG
    GGATCACTTAGCCATAGCCTGAA
    TCTTTTAGGGGTATTAAGGTCAG
    CCTCTCAC
     32 ARHGEF NM_015318.3  5128- GATTACAACATTTCCTCACTGCG
    18  5227 GGATATTTCTGACCCGCTTTAGA
    ACTTAAGACCTGATTCTAGCAAT
    AAACGTGTCCGAGATGAGCGGT
    GAAAAAAAA
     33 FLJ NM_018071.4  5402- GAATGTGTCTCCTCCACAGTGGC
    10357  5501 TCCCAGAGGTTCCACACACTCTC
    TGAAGCTCCTTCTCCCACACTGC
    ACCTACTCCTTGAGGCTGAACTG
    GTCACAGA
     34 ARHGEF NM_005435.3  5151- GGGGGACCATTGGGGCCTGAGC
    5  5250 CAAGGAACTTTCCTTCTACTGCC
    TTATAGTGCTTAAACATTCTCCG
    CCTCCAGGGTGCAGATTCAGAGC
    TGGCCAGAG
     35 ARL8B NM_018184.2  2491- ACCATTACAAAGAATGTGGCAA
     2590 CTTGCTTGTGCCTAAAAGGAGGA
    ATTGGAACTAGAATGTGTGACTC
    TGTGGGGACTGCATAGGTTTGTT
    AATTGACCT
     36 ARPC2 NM_005731.2   951- ACGGGGAAGACGTTTTCATCCCG
     1050 CTAATCTTGGGAATAAGAGGAG
    GAAGCGGCTGGCAACTGAAGGC
    TGGAACACTTGCTACTGGATAAT
    CGTAGCTTTT
     37 ASF1B NM_018154.2  1476- CTGTCTCCGGGCCAGGGTCAGGG
     1575 ACCCTCTGCCTCTGGCAGCCTTA
    ACCTGTCCTCTGCTAGGACCAGG
    GTGATTTCAAGCCAGGGAAGCA
    ACTGGGACC
     38 ATG4B NM_178326.2   106- GGACGCAGCTACTCTGACCTACG
      205 ACACTCTCCGGTTTGCTGAGTTT
    GAAGATTTTCCTGAGACCTCAGA
    GCCCGTTTGGATACTGGGTAGAA
    AATACAGC
     39 ATG5 NM_004849.2  1105- TGCAGTGGCTGAGTGAACATCTG
     1204 AGCTACCCGGATAATTTTCTTCA
    TATTAGTATCATCCCACAGCCAA
    CAGATTGAAGGATCAACTATTTG
    CCTGAACA
     40 ATM NM_000051.3    31- ACGCTAAGTCGCTGGCCATTGGT
      130 GGACATGGCGCAGGCGCGTTTGC
    TCCGACGGGCCGAATGTTTTGGG
    GCAGTGTTTTGAGCGCGGAGACC
    GCGTGATA
     41 ATP2C1 NM_014382.3  4070- TAAAAAGTCCCCAAACCCAAAC
     4169 AAATGGTTTATGAACCAGAGTAT
    ATGTGGAAGATTCTTTGCTGGTC
    TTGCTCTGTGTGCATCTGAAGCT
    TCTTTGGCC
     42 ATP5B NM_001686.3  1626- CTATATGGTGGGACCCATTGAAG
     1725 AAGCTGTGGCAAAAGCTGATAA
    GCTGGCTGAAGAGCATTCATCGT
    GAGGGGTCTTTGTCCTCTGTACT
    GTCTCTCTC
     43 ATP5I NM_007100.2   256- TTGCCAGAGAATTGGCAGAAGA
      355 TGACAGCATATTAAAGTGAGTGA
    CCCTGCGACCCACTCTTTGGACC
    AGCAGCGGATGAATAAAGCTTC
    CTGTGTTGTG
     44 ATP5J2 NM_004889.3   267- GCTGGCATGCTACGTGCTCTTTA
      366 GCTACTCCTTTTCCTACAAGCAT
    CTCAAGCACGAGCGGCTCCGCA
    AATACCACTGAAGAGGACACAC
    TCTGCACCCC
     45 ATP5L NM_006476.4   196- GGGACGGGGTCCTGCAGCGGGT
      295 CCTTCCGGCGGGTGACATTCAGC
    CGGCGGTTCGGGGCGACGGACT
    CTCCATTCCAGAACCATGGCCCA
    ATTTGTCCGT
     46 AW AW173314.1   419- AGCAGAAGGCAGGGGAGTCCAC
    173314   518 ACAGGGCAAGCAGCAACCAGGC
    TTCTGAGGACAGGAAAGGAGGG
    AGCATCTGGTGGGAAGCTGGCG
    AGGAGGGGCTGG
     47 AW AW270402.1   203- GATATCTCACACACGGAATAATC
    270402   302 ATTAAGAAACAACCACTGTTGAG
    CAAAGTTGATAGGCAGTAAGGA
    AATAAAGTGGACATAAACACAG
    CAGTACTAAT
     48 AZI2 NM_022461.4  3031- GAATTGGTGTCAGATGCTGGAAT
     3130 TTATTCTGACCAATGAACACAGC
    TGACTCAGGGGAGTACAATCTCC
    TGCCAAGTAATAGAACCAAACC
    CAATATGCA
     49 BACH2 NM_021813.3  8696- TCCAGAACCAGTCTGATGCAAGT
     8795 GCACCTCTAATATATGCCTTACA
    AACTCCAGAGGCCATATTCAAAA
    CAGGGTCTTCTCAGTGTATGCAA
    GGGGCTGC
     50 BAG3 NM_004281.3  2304- CCCCACCACCTGTTAGCTGTGGT
     2403 TGTGCACTGTCTTTTGTAGCTCT
    GGACTGGAGGGGTAGATGGGGAG
    TCAATTACCCATCACATAAATAT
    GAAACATT
     51 BANP NM_079837.2  2125- GGAGCCCTTTGCTGTGTGCTCTG
     2224 TCCAGTGTCATGAGGCAGGTGTT
    TGCAAAGCCAGCTCTCGGTTCCG
    ATGGGGTATTGCTGACCTACTTT
    TCTAGGGG
     52 BATF NM_006399.3   294- CCTGGCAAACAGGACTCATCTGA
      393 TGATGTGAGAAGAGTTCAGAGG
    AGGGAGAAAAATCGTATTGCCG
    CCCAGAAGAGCCGACAGAGGCA
    GACACAGAAGG
     53 BCL10 NM_003921.2  1251- TGAAAATACCATCTTCTCTTCAA
     1350 CTACACTTCCCAGACCTGGGGAC
    CCAGGGGCTCCTCCTTTGCCACC
    AGATCTACAGTTAGAAGAAGAA
    GGAACTTGT
     54 BCL6 NM_00113084  3401- CCTCACGGTGCCTTTTTTCACGG
    5.1  3500 AAGTTTTCAATGATGGGCGAGCG
    TGCACCATCCCTTTTTGAAGTGT
    AGGCAGACACAGGGACTTGAAG
    TTGTTACTA
     55 BCOR NM_017745.5  5794- ATACAAAGCTCTGATGACAGGCC
     5893 ATGACTGTAGAGTGGTCAGAACT
    GTGTGGTTGGTTTGAGGGAGCGA
    ATTCGGGGAAGGCACTTGGTGAT
    ATAACTTT
     56 BF BF375676.1   141- TGTATTTCTGTGCAATGAGAGAG
    375676   240 GCTCTTTATGGTGGTGCTACAAA
    CAAGCTCATCTTTGGAACTGGCA
    CTCTGCTTGCTGTCCAGCCAAAT
    ATCCAGAA
     57 BID NM_001196.2  1876- AAGCACGACAGTGGATGCTGGG
     1975 TCCATATCACACACATTGCTGTG
    AACAGGAAACTCCTGTGACCAC
    AACATGAGGCCACTGGAGACGC
    ATATGAGTAAG
     58 BMPR2 NM_001204.6  1164- CAGCGGCCCTGGCGGGTGCCCTG
     1263 GCTACCATGGACCATCCTGCTGG
    TCAGCACTGCGGCTGCTTCGCAG
    AATCAAGAACGGCTATGTGCGTT
    TAAAGATC
     59 BQ BQ189294.1   416- GCTGGAGTGATTGGCCCTGATGA
    189294   515 CCATGGAGAAAAGAGAGTAGGG
    AGAACAGTATAACCAGAAGTCA
    GGGGGGTCTCCTGGAATCCCTCC
    TCACAATACC
     60 BU BU743228.1   154- CCCTGTGGGCCTTGCAGGCCAGT
    743228   253 CCAGGCAGGTCTTTCACACTGTT
    GTCCCACATAACAGAAAAAGCT
    GAGCAGACAGGGTAGGAAACAC
    ACTTGCATCT
     61 BX BX089765.1   106- TTAAGCAACTTGCTCCAGTGACG
    089765   205 CAGCTGGTAAGCAGCAGAGCTG
    GGATTAAAACCCAGGCATTCTGA
    TTCCACCACCTACACACTTAGCC
    ATTCCGCCC
     62 BX BX108566.1   365- ATTTAGGGTGAGAGCTTCACAGC
    108566   464 TGAAAATCTCCTTTAAAGAAAAC
    GCGGCCCAAATGTGCTGGGAGG
    AGAAGCCAGTGGATCTAGGAGG
    GGGCCCGGCG
     63 BX BX400436.2     1- ATATTTTGGAGAGGGAAGTTGGC
    400436   100 TCACTGTTGTAGAGGACCTGAAC
    AAGGTGTTCCCACCCGAGGTCGC
    TGTGTTTGAGCCATCAGAAGCAG
    AGATCTCC
     64 BX BX436458.2   518- ATGCAGACAATTTGCCTGTGAGA
    436458   617 TGAGGAAAATTCTCTGGAAGATT
    TAGGCCCTGAGAGCTGAAAAGG
    GACCCTAAACATTACCTGGTGAC
    AACTGCCCT
     65 C15orf NM_015492.4  3535- CCTGAGCTTTTAACGTGAGGGTC
    39  3634 TTTATTGGATAGGACTACTCCCT
    ATTTCTTGCCTAGAGAACACACA
    TGGGCTTTGGAGCCCGACAGACC
    TGGGCTTG
     66 C17orf XM_944416.1  4909- AAGGATGGGGGTGGATTGACCA
    51  5008 AGCTGGGCCAGAGGTGCGAGGA
    GCTGATCTGCGAGCCCTGTGTGC
    CTGTGAGTCCTGGCGGAGTGGCC
    GTGCGTGGTG
     67 C3 NM_000064.2  4397- CATCTACCTGGACAAGGTCTCAC
     4496 ACTCTGAGGATGACTGTCTAGCT
    TTCAAAGTTCACCAATACTTTAA
    TGTAGAGCTTATCCAGCCTGGAG
    CAGTCAAG
     68 C4B NM_00100202  4438- GAGTCCAGGGTGCACTACACCGT
    9.3  4537 GTGCATCTGGCGGAACGGCAAG
    GTGGGGCTGTCTGGCATGGCCAT
    CGCGGACGTCACCCTCCTGAGTG
    GATTCCACG
     69 C4orf NM_017867.2   682- GAACCGTGAAGATGAAACAGAG
    27   781 AGATAAGAAAGTTGTGACAAAG
    ACCTTTCATGGTGCAGGCTTGGT
    TGTTCCAGTAGATAAAAATGATG
    TTGGGTACCG
     70 C8orf NM_032847.2  1029- TAAAAGATGAAGTTCACCCAGA
    76  1128 GGTGAAGTGTGTTGGCTCCGTAG
    CCCTGACTGCCTTGGTGACTGTA
    TCCTCAGAAGAATTTGAAGACAA
    GTGGTTCAG
     71 C9orf NM_182635.1   529- CGCTGGCCATGGGGAAGCCACCT
    164   628 CCAGGGCAGTCCCAGGGACTGA
    ATTGGAAGTTGTCCCAAGTCACT
    TCAGGTCCAACTGGGACAGCAG
    AGGTAACCCC
     72 CAMP NM_004345.4   623- TTGTCCAGAGAATCAAGGATTTT
      722 TTGCGGAATCTTGTACCCAGGAC
    AGAGTCCTAGTGTGTGCCCTACC
    CTGGCTCAGGCTTCTGGGCTCTG
    AGAAATAA
     73 CASP1 NM_033294.3   219- ATTTATCCAATAATGGACAAGTC
      318 AAGCCGCACACGTCTTGCTCTCA
    TTATCTGCAATGAAGAATTTGAC
    AGTATTCCTAGAAGAACTGGAGC
    TGAGGTTG
     74 CASP2 NM_032983.3  3347- CCCACCACTCTTGACTCAGGTGG
     3446 TGTCCTTCTTCCTCAAGTCTTGA
    CAATTCCCGGGCCCTTCAGTCCC
    TGAGCAGTCTACTTCTGTGTCTG
    TCACCACA
     75 CASP3 NM_032991.2   686- ACTCCACAGCACCTGGTTATTAT
      785 TCTTGGCGAAATTCAAAGGATGG
    CTCCTGGTTCATCCAGTCGCTTT
    GTGCCATGCTGAAACAGTATGCC
    GACAAGCT
     76 CBLL1 NM_024814.3  1967- ATGAGGGGGAAAAAAACTTATG
     2066 TGTAGTCAATCTTTTAAGCTTTG
    ACTGTTTTGGGAAGGAAGAGTAC
    CTCTTATCGAGGTAGTATAAAAC
    ACATAGGGT
     77 CC2D1B NM_032449.2  4183- TTGCATAAGCACAGCTCAAGAAC
     4282 TGAGCTTTGTATGTGTCCTTTTG
    GGGGATAACAGGGCTGGACCATG
    CTTCCCTGCCCTTAAACGCAGAG
    CTTTTAGT
     78 KIAA NM_021174.5   201- GGGAGAGGGCCCACACAGTCTC
    1967   300 CTCGCCGGCACCGGCCTCCTCCA
    TTTTTCCGGGCCTTGCGTGGAGG
    GTTTTGGCGGATGTTTTTGAACG
    AAGGAATGT
     79 CCDC97 NM_052848.1  2867- ATCCAGAGTGAGACAGCATTGG
     2966 AGGGACAAGTGTGCATGCAGAT
    GTCCTCAGACGGGAAGGTTTGAG
    AAGGGTCAGATGGTAGGCGGGC
    CTAACAAGGGC
     80 CCL3 NM_002983.2   160- CAGTTCTCTGCATCACTTGCTGC
      259 TGACACGCCGACCGCCTGCTGCT
    TCAGCTACACCTCCCGGCAGATT
    CCACAGAATTTCATAGCTGACTA
    CTTTGAGA
     81 CCL3L1 NM_021006.4   422- GGAGCCTGAGCCTTGGGAACAT
      521 GCGTGTGACCTCTACAGCTACCT
    CTTCTATGGACTGGTTATTGCCA
    AACAGCCACACTGTGGGACTCTT
    CTTAACTTA
     82 CCL3L3 NM_00100143   402- GGGGAGGAGCAGGAGCCTGAGC
    7.3   501 CTTGGGAACATGCGTGTGACCTC
    CACAGCTACCTCTTCTATGGACT
    GGTTATTGCCAAACAGCCACACT
    GTGGGACTC
     83 CCL4 NM_002984.2    36- TTCTGCAGCCTCACCTCTGAGAA
      135 AACCTCTTTGCCACCAATACCAT
    GAAGCTCTGCGTGACTGTCCTGT
    CTCTCCTCATGCTAGTAGCTGCC
    TTCTGCTC
     84 CCND3 NM_001760.2  1216- GGCCAGCCATGTCTGCATTTCGG
     1315 TGGCTAGTCAAGCTCCTCCTCCC
    TGCATCTGACCAGCAGCGCCTTT
    CCCAACTCTAGCTGGGGGTGGGC
    CAGGCTGA
     85 CCR1 NM_001295.2   536- CATCATTTGGGCCCTGGCCATCT
      635 TGGCTTCCATGCCAGGCTTATAC
    TTTTCCAAGACCCAATGGGAATT
    CACTCACCACACCTGCAGCCTTC
    ACTTTCCT
     86 CCR6 NM_031409.2   936- CTTTAACTGCGGGATGCTGCTCC
     1035 TGACTTGCATTAGCATGGACCGG
    TACATCGCCATTGTACAGGCGAC
    TAAGTCATTCCGGCTCCGATCCA
    GAACACTA
     87 CCR9 NM_031200.1  1096- CCCTGTTCTCTATGTTTTTGTGG
     1195 GTGAGAGATTCCGCCGGGATCTC
    GTGAAAACCCTGAAGAACTTGGG
    TTGCATCAGCCAGGCCCAGTGGG
    TTTCATTT
     88 CCT6A NM_001762.3   281- GCCCAAGGGCACCATGAAGATG
      380 CTCGTTTCTGGCGCTGGAGACAT
    CAAACTTACTAAAGACGGCAAT
    GTGCTGCTTCACGAAATGCAAAT
    TCAACACCCA
     89 CD14 NM_000591.2   886- GCCCAAGCACACTCGCCTGCCTT
      985 TTCCTGCGAACAGGTTCGCGCCT
    TCCCGGCCCTTACCAGCCTAGAC
    CTGTCTGACAATCCTGGACTGGG
    CGAACGCG
     90 CD160b NM_007053.2   501- TTGATGTTCACCATAAGCCAAGT
      600 CACACCGTTGCACAGTGGGACCT
    ACCAGTGTTGTGCCAGAAGCCAG
    AAGTCAGGTATCCGCCTTCAGGG
    CCATTTTT
     91 CD160 NM_007053.3  1286- AAAGGAAGACAGCCAGATCCAG
     1385 TGATTGACTTGGCATGAAAATGA
    GAAAATGCAGACAGACCTCAAC
    ATTCAACAACATCCATACAGCAC
    TGCTGGAGGA
     92 CD1A NM_001763.2  1816- CCTGTTTTAGATATCCCTTACTC
     1915 CAGAGGGCCTTCCCTGACTTACA
    AGTGGGAAGCAGTCTCTTCCTGG
    TCTGAACTCCCGCCACATTTTAG
    CCGTACTT
     93 CD36 NM_000072.3  1619- TAAAGAATCTGAAGAGGAACTA
     1718 TATTGTGCCTATTCTTTGGCTTA
    ATGAGACTGGGACCATTGGTGAT
    GAGAAGGCAAACATGTTCAGAAG
    TCAAGTAAC
     94 CD48 NM_001778.2   271- AATTTAAAGGCAGGGTCAGACTT
      370 GATCCTCAGAGTGGCGCACTGTA
    CATCTCTAAGGTCCAGAAAGAG
    GACAACAGCACCTACATCATGA
    GGGTGTTGAA
     95 CD69 NM_001781.2  1360- TATACAGTGTCTTACAGAGAAAA
     1459 GACATAAGCAAAGACTATGAGG
    AATATTTGCAAGACATAGAATAG
    TGTTGGAAAATGTGCAATATGTG
    ATGTGGCAA
     96 CD70 NM_001252.2   191- CCTATGGGTGCGTCCTGCGGGCT
      290 GCTTTGGTCCCATTGGTCGCGGG
    CTTGGTGATCTGCCTCGTGGTGT
    GCATCCAGCGCTTCGCACAGGCT
    CAGCAGCA
     97 CD79A NM_021601.3   617- TGAAGATGAAAACCTTTATGAAG
      716 GCCTGAACCTGGACGACTGCTCC
    ATGTATGAGGACATCTCCCGGGG
    CCTCCAGGGCACCTACCAGGATG
    TGGGCAGC
     98 CD79B NM_000626.2   350- GAAGCTGGAAAAGGGCCGCATG
      449 GAAGAGTCCCAGAACGAATCTCT
    CGCCACCCTCACCATCCAAGGCA
    TCCGGTTTGAGGACAATGGCATC
    TACTTCTGT
     99 CDC42 NM_006779.3  1779- AGGGCTTTGTGGAGGACAGGCCT
    EP2  1878 TGCCCTCAAGAACGTCGTACCTG
    ACGCTGAGCCTGTCATGAGAATG
    CAACAGGAGCAAACCAAGTGTT
    GCTGTGACA
    100 CDH5 NM_001795.3  3406- TCTCCCCTTCTCTGCCTCACCTG
     3505 GTCGCCAATCCATGCTCTCTTTC
    TTTTCTCTGTCTACTCCTTATCC
    CTTGGTTTAGAGGAACCCAAGAT
    GTGGCCTT
    101 CDKN1A NM_000389.2  1976- CATGTGTCCTGGTTCCCGTTTCT
     2075 CCACCTAGACTGTAAACCTCTCG
    AGGGCAGGGACCACACCCTGTAC
    TGTTCTGTGTCTTTCACAGCTCC
    TCCCACAA
    102 CFD NM_001928.2   860- CTGGTTGGTCTTTATTGAGCACC
      959 TACTATATGCAGAAGGGGAGGC
    CGAGGTGGGAGGATCATTGGAT
    CTCAGGAGTTCGAGATCAGCATG
    GGCCACGTAG
    103 CHCHD3 NM_017812.2  1173- TCCACCCTAACAAAGTAGGATGG
     1272 GGTTGGGGGCTAAATTAATTGGA
    GTGGGGCGAGGAGAGAGCCAGA
    AAACATAGATCCGAGGGCAGCA
    GTGCTGGGTG
    104 CHFR NM_018223.2  2836- CGCCGCTCCCTCATGCTGCCCGG
     2935 GCCCTTCCTCCAAGACCCTACAG
    AGCCTGAGGGGCACCTTGGCTTC
    CGCCTGTGCTAGCTTTGCCATGT
    CATCTGGA
    105 CHMP5 NM_016410.5  1148- ACTAAGGAAATGGAATCTTAAA
     1247 AGTCTATGACAGTGTAACTCTAC
    AGTCTCAAAATGACCTGATAAAT
    TGATAAGACAAAGATGAGATTA
    TTGGGGCTGT
    106 CIAPIN NM_020313.3  1816- GCATGTCTTGTAAAGAGAGGGG
    1  1915 ATGTGCATTTGTGTGTGATGTTG
    GATAGTCATCCACGCTCAGTTTG
    GACCATTGGAGGAACTTAGTGTC
    ACGCACAAA
    107 CKS2 NM_001827.1   228- AGACTTGGTGTCCAACAGAGTCT
      327 AGGCTGGGTTCATTACATGATTC
    ATGAGCCAGAACCACATATTCTT
    CTCTTTAGACGACCTCTTCCAAA
    AGATCAAC
    108 CLEC4A NM_194448.2   389- ATTTCTACTGAATCAGCATCTTG
      488 GCAAGACAGTGAGAAGGACTGT
    GCTAGAATGGAGGCTCACCTGCT
    GGTGATAAACACTCAAGAAGAG
    CAGGATTTCA
    109 CLEC4C NM_203503.1   571- TACGAGAGTATCAACAGTATCAT
      670 CCAAGCCTGACCTGCGTCATGGA
    AGGAAAGGACATAGAAGATTGG
    AGCTGCTGCCCAACCCCTTGGAC
    TTCATTTCA
    110 CLEC5A NM_013252.2  3251- CCCCATCCAACCCTTAGACTCAC
     3350 GAACAAATCCACCTGAGATCAG
    CAGAGCCACCCTAGATCAGCTGA
    AACTCTAAGCACAAAAATAAAA
    ACTTATCACT
    111 CLIC3 ILMN_179642    99- CGTACGCCGCTACCTGGACAGCG
    3.1   198 CGATGCAGGAGAAAGAGTTCAA
    ATACACGTGTCCGCACAGCGCCG
    AGATCCTGGCGGCCTACCGGCCC
    GCCGTGCAC
    112 CLK2 XM_941392.1   552- GATTATAGCCGGGATCGGGGAG
      651 ATGCCTACTATGACACAGACTAT
    CGGCATTCCTATGAATATCAGCG
    GGAGAACAGCAGTTACCGCAGC
    CAGCGCAGCA
    113 CLN8 NM_018941.3  4486- GGCGCCAGAGCTGGGCTCTTCAA
     4585 CACGGCATTTAGCGCAGAAAGTC
    GTGGTTCAGGCAGTATGGGCCGC
    TGTGACAAAACACCTAAGACTG
    GGTAGTTTA
    114 CLPTM1 NM_001294.3  2389- TCTGTGTTTCCAGCCATCTCGCC
     2488 CTGCCAGCCCAGCACCACTGGGA
    ATCATGGTGAAGCTGATGCAGCG
    TTGCCGAGGGGGTGGGTTGGGC
    GGGGGTGGG
    115 CLSTN1 NM_00100956  4990- TTGAATACTGTTCTGTGACCCTG
    6.2  5089 ACTGCTAGTTCTGAGGACACTGG
    TGGCTGTGCTATGTGTGGCCATC
    CTCCATGTCCCGTCCCTGTAGCT
    GCTCTGTT
    116 CN CN312986.1   491- AGGAAACTAAGACATGGAAAGG
    312986   590 TTAGGTAACTTGCCCAAGGTCGC
    ACAGCTAGTAAGTGGCAGACAT
    CCAGAGTCTCTGCTCTGCTCTTA
    ACTCTCACCA
    117 CNIH4 NM_014184.3   526- AATGACTGAAGCTGGAGAAGCC
      625 GTGGTTGAAGTCAGCCTACACTA
    CAGTGCACAGTTGAGGAGCCAG
    AGACTTCTTAAATCATCCTTAGA
    ACCGTGACCA
    118 CNPY2 NM_014255.5  1038- TTGCAGTAAGCGAACAGATCTTT
     1137 GTGACCATGCCCTGCACATATCG
    CATGATGAGCTATGAACCACTGG
    AGCAGCCCACACTGGCTTGATGG
    ATCACCCC
    119 COLEC NM_130386.2   901- ACACAAGCCAGGCTATCCAGCG
    12  1000 AATCAAGAACGACTTTCAAAATC
    TGCAGCAGGTTTTTCTTCAAGCC
    AAGAAGGACACGGATTGGCTGA
    AGGAGAAAGT
    120 GLT25 NM_024656.2  3067- CTGTGTGCCAGGCCTCACAGACT
    D1  3166 CCCAGTTGGGTTGAAGAATGGTT
    GACTGAGTTTGATTCTTCCTGTA
    CCCTCGGTCGTCTGAGCTGTGTG
    CGGACAAC
    121 COMMD6 NM_203497.3    32- CTCTCGAGTCCGGGCCGCAAGTC
      131 CCAGACGCTGCCCATGGAGGCGT
    CCAGCGAGCCGCCGCTGGATGCT
    AAGTCCGATGTCACCAACCAGCT
    TGTAGATT
    122 CORO1C ILMN_174595    98- AAGTAAAGTTGTTGATGGTGGTG
    4.1   197 AAACACCGTAGGGCATGTGGTTC
    AAAGAGAAGCAGGAGGGCAAGG
    GAAAGTTACCCTGATCTTAGTTT
    GTAGCTTAT
    123 COX6C NM_004374.2    70- GAAGTTTTGCCAAAACCTCGGAT
      169 GCGTGGCCTTCTGGCCAGGCGTC
    TGCGAAATCATATGGCTGTAGCA
    TTCGTGCTATCCCTGGGGGTTGC
    AGCTTTGT
    124 COX7B NM_001866.2   160- CAGAGCCACCAGAAACGTACAC
      259 CTGATTTTCATGACAAATACGGT
    AATGCTGTATTAGCTAGTGGAGC
    CACTTTCTGTATTGTTACATGGA
    CATATGTAG
    125 COX7C NM_001867.2     1- CAAGGTCGTGAAAAAAAAGGTC
      100 TTGGTGAGGTGCCGCCATTTCAT
    CTGTCCTCATTCTCTGCGCCTTT
    CGCAGAGCTTCCAGCAGCGGTAT
    GTTGGGCCA
    126 CPPED1 NM_018340.2  2494- TGTATTTGTTTCTTTACAACAGG
     2593 TGTAGGTATAGGAGGTCAAGAAA
    AGGAGTTCGGTAAAGGGCATAG
    CTAATAACAACCACACATTGGGC
    CAGGCACAG
    127 CR2b NM_00100665   486- GGTGTCAAGCAAATAATATGTGG
    8.1   585 GGGCCGACACGACTACCAACCT
    GTGTAAGTGTTTTCCCTCTCGAG
    TGTCCAGCACTTCCTATGATCCA
    CAATGGACA
    128 CR2 NM_00100665  3581- AGCCCAGTTTCACTGCCATATAC
    8.2  3680 TCTTCAAGGACTTTCTGAAGCCT
    CACTTATGAGATGCCTGAAGCCA
    GGCCATGGCTATAAACAATTACA
    TGGCTCTA
    129 CREB1 NM_004379.3  4856- TTTGATGGTAGGTCAGCAGCAGT
     4955 GCTAGTCTCTGAAAGCACAATAC
    CAGTCAGGCAGCCTATCCCATCA
    GATGTCATCTGGCTGAAGTTTAT
    CTCTGTCT
    130 CREB5 NM_182898.3  7898- ACCTACTCACCTTTTTCCCTTCT
     7997 AAGTTCTGCTAAATCACATCTGC
    CTCATAGAGAAAGGAATGTTGCC
    TTTGAGAACTGTCTTGGAGAACA
    GATAAGCT
    131 CRKL NM_005207.3  4901- TTCTAAAGGAGCAGAAGGACAG
     5000 GTCTCTGAGACAGGATCGTTGTC
    CCTACAGGAGGAACAGTGGCCTT
    GCTTCTTAGACGGTCTTCACTGT
    GTGTTTTAA
    132 CRY2 NM_021117.3  4013- CAGCTCAGGTGGCCCTGAGGGCT
     4112 CCCTCGGAACAGTGCCTCAAATC
    CTGACCCAAGGGCCAGCATGGG
    GAAGAGATGGTTGCAGGCAAAA
    TGCACTTTAT
    133 CS NM_004077.2  2080- CCTCCTAGCAAGACCTGTTGGTT
     2179 AGCTGGACATGCTTTGGCAATTT
    TTTTATACTACCAAGTGACCATA
    AAGGCATGGCATTTGTTGTGACT
    GGCACCCA
    134 CSK NM_004383.2  2501- TCTAGGGACCCCTCGCCCCAGCC
     2600 TCATTCCCCATTCTGTGTCCCAT
    GTCCCGTGTCTCCTCGGTCGCCC
    CGTGTTTGCGCTTGACCATGTTG
    CACTGTTT
    135 CST7 NM_003650.3   618- CAACCACACCTTGAAGCAGACTC
      717 TGAGCTGCTACTCTGAAGTCTGG
    GTCGTGCCCTGGCTCCAGCACTT
    CGAGGTGCCTGTTCTCCGTTGTC
    ACTGACCC
    136 CTAG1B NM_001327.2   286- GCGGGGCCAGGGGGCCGGAGAG
      385 CCGCCTGCTTGAGTTCTACCTCG
    CCATGCCTTTCGCGACACCCATG
    GAAGCAGAGCTGGCCCGCAGGA
    GCCTGGCCCA
    137 CTDSP2 NM_005730.3  4685- GAGGTCGGGCCAGCTGCCCCATT
     4784 CTTTTAACGTTGTAGGGCCTGCC
    CATGGAGCGGACCCTCCTCTTTG
    GGCCTCGTGAGCTTTTTTGCTTA
    TCATGTTC
    138 CTSW NM_001335.3  1076- TGCACCGAGGGAGCAATACCTGT
     1175 GGCATCACCAAGTTCCCGCTCAC
    TGCCCGTGTGCAGAAACCGGATA
    TGAAGCCCCGAGTCTCCTGCCCT
    CCCTGAAC
    139 CTSZ NM_001336.3  1174- CACTGGCTGCGAGTGTTCCTGAG
     1273 AGTTGAAAGTGGGATGACTTATG
    ACACTTGCACAGCATGGCTCTGC
    CTCACAATGATGCAGTCAGCCAC
    CTGGTGAA
    140 CX3CL1 NM_002996.3   141- AGCACCACGGTGTGACGAAATG
      240 CAACATCACGTGCAGCAAGATG
    ACATCAAAGATACCTGTAGCTTT
    GCTCATCCACTATCAACAGAACC
    AGGCATCATG
    141 CXCL2 NM_002089.3   855- ATCACATGTCAGCCACTGTGATA
      954 GAGGCTGAGGAATCCAAGAAAA
    TGGCCAGTGAGATCAATGTGACG
    GCAGGGAAATGTATGTGTGTCTA
    TTTTGTAAC
    142 IL8RB NM_001557.3   410- ACCTCAAAAATGGAAGATTTTAA
      509 CATGGAGAGTGACAGCTTTGAA
    GATTTCTGGAAAGGTGAAGATCT
    TAGTAATTACAGTTACAGCTCTA
    CCCTGCCCC
    143 CXCR5b NM_001716.3  2619- ACGTCCCTTTTTTCTCTGAGTAT
     2718 CTCCTCGCAAGCTGGGTAATCGA
    TGGGGGAGTCTGAAGCAGATGCA
    AAGAGGCAAGAGGCTGGATTTT
    GAATTTTCT
    144 CYBB NM_000397.3  3787- ACTGGAGAGGGTACCTCAGTTAT
     3886 AAGGAGTCTGAGAATATTGGCCC
    TTTCTAACCTATGTGCATAATTA
    AAACCAGCTTCATTTGTTGCTCC
    GAGAGTGT
    145 CYP1B1 NM_000104.3  2361- CTTACACCAAACTACTGAATGAA
     2460 GCAGTATTTTGGTAACCAGGCCA
    TTTTTGGTGGGAATCCAAGATTG
    GTCTCCCATATGCAGAAATAGAC
    AAAAAGTA
    146 DB DB338252.1   436- GTTCTTGGTCTGTATGTGTAGGT
    338252   535 GGAGGGAGGCAAAGTTGTGGTA
    ATAAAGTGGGAAGGCCCGGGAA
    GAACAGCTAACTGTATAGGGGT
    GAAATGACGCT
    147 DBI NM_00107986   241- CATAAATACAGAACGGCCCGGG
    2.1   340 ATGTTGGACTTCACGGGCAAGGC
    CAAGTGGGATGCCTGGAATGAG
    CTGAAAGGGACTTCCAAGGAAG
    ATGCCATGAAA
    148 DCAF7 NM_005828.4  6155- TTAACACTGTGCTGTGAAACAAC
     6254 TATGGGGAATCTCCATTGAAGGC
    TACTTCATGGGCACCTGAAAGTG
    GAGTGTTATAGCTATGACTTTCT
    ATTTCTTG
    149 DDIT4 NM_019058.2  1414- GACCTGTTGTAGGCAGCTATCTT
     1513 ACAGACGCATGAATGTAAGAGT
    AGGAAGGGGTGGGTGTCAGGGA
    TCACTTGGGATCTTTGACACTTG
    AAAAATTACA
    150 DDX23 NM_004818.2  2811- ATTGCACTGGGCCATCAGCTCAT
     2910 GCCAGGCTATGGGGGCAGCCAG
    TTGGCATTGCTCCCCAGACTGAA
    CAGAAACCTGGCCGCCGGATGG
    GACCTCCTTT
    151 DGUOK NM_080916.2   573- ACATCGAGTGGCATATCTATCAG
      672 GACTGGCATTCTTTTCTCCTGTG
    GGAGTTTGCCAGCCGGATCACAT
    TACATGGCTTCATCTACCTCCAG
    GCTTCTCC
    152 DGUOKb NM_080916.2   903- TTGTAAAGAATCTGTAACCAATA
     1002 CCATGAAGTTCAGGCTGTGATCT
    GGGCTCCCTGACTTTCTGAAGCT
    AGAAAAATGTTGTGTCTCCCAAC
    CACCTTTC
    153 DHX16b NM_00116423  2491- CCCGTGTCAACTTCTTTCTCCCT
    9.1  2590 GGCGGTGACCACCTGGTTCTGCT
    AAATGTTTACACACAGTGGGCTG
    AGAGTGGTTACTCTTCCCAGTGG
    TGCTATGA
    154 DHX16 NM_003587.4  3189- ACCAAAGAGTTCATGAGACAGG
     3288 TACTGGAGATTGAGAGCAGTTGG
    CTTCTGGAGGTGGCTCCCCATTA
    TTATAAGGCCAAGGAGCTAGAA
    GATCCCCATG
    155 DKFZp XM_291277.4  4192- CTCCTGCAGCTTCTGTGAGCCAA
    761P04  4291 GCCCCAGCCTGCACCGTCGCTGC
    23 CCCTTCCCTGCCTAACCCTTTCC
    TGTCTCGCCTTGGAAGCACCCAT
    GTCTCCCT
    156 DMBT1 NM_007329.2  3713- CACAATGGCTGGCTCTCCCACAA
     3812 CTGTGGCCATCATGAAGACGCTG
    GTGTCATCTGCTCAGCTTCCCAG
    TCCCAGCCGACACCCAGCCCAGA
    CACTTGGC
    157 DNAJB1 NM_006145.2  1904- GACCTCTGGCTCCAGTGAAGCTG
     2003 AATGTCCTCACTTTGTGGGTCAC
    ACTCTTTACATTTCTGTAAGGCA
    ATCTTGGCACACGTGGGGCTTAC
    CAGTGGCC
    158 DNAJB6 NM_058246.3  2087- CTTCCCTGCATGCTCCCTCCCAG
     2186 TGACTTTCCTTCCCTTTCACATG
    AGGATCTGCCGTTCATGTTGCTT
    TCTCCTTTGTCCTCTTGGACTTG
    AGGGCATT
    159 DOCK5 NM_024940.6  7201- AAAGAGATTTCCATTTCTGCTGC
     7300 CAGAGCTGGTATTTGCCTGCCTG
    ATTCTCTGTGTTTCCTGTTTCAC
    CGCCACCCTTTCAGGAGAGAACT
    ACACCAGT
    160 DPF2 NM_006268.4  2249- TCTCAGCTCATGGGGAAGCCACA
     2348 TAGACATCCCTTTCTTCCCTTGC
    ACGCTCGCTAGCAGCTGGTAAGG
    TCTTCACACCCTGATTCCTCAAG
    TTTTCTGC
    161 DYNC2L NM_016008.3   351- TTTGGGAACTCGGTGGAGGAACC
    I1   450 TCTTTATTGGACTTAATCAGCAT
    ACCCATCACAGGTGACACCTTAC
    GGACGTTTTCTCTTGTTCTCGTT
    CTGGATCT
    162 DZIP3 NM_014648.3  4323- CCCAGTGTCTTGCCCAGTAGATA
     4422 CAAGATAAATATTGCCAGAATCA
    GATATCAGGAAGTAGTAAGAAA
    AGGAGTTAATATGCAAACTAAAT
    CACTCGCTC
    163 EEF1B2 NM_00103766   699- GGATACGGAATTAAGAAACTTC
    3.1   798 AAATACAGTGTGTAGTTGAAGAT
    GATAAAGTTGGAACAGATATGCT
    GGAGGAGCAGATCACTGCTTTTG
    AGGACTATG
    164 EGLN1 NM_022051.1  3976- AGCAGCATGGACGACCTGATAC
     4075 GCCACTGTAACGGGAAGCTGGG
    CAGCTACAAAATCAATGGCCGG
    ACGAAAGCCATGGT
    165 EGR1 NM_001964.2  1506- GAGGCATACCAAGATCCACTTGC
     1605 GGCAGAAGGACAAGAAAGCAGA
    CAAAAGTGTTGTGGCCTCTTCGG
    CCACCTCCTCTCTCTCTTCCTAC
    CCGTCCCCG
    166 EHD4 NM_139265.3  2605- TCAAACATTAAATATCCCGAGGT
     2704 CTCCTTGGTGGGTGGCAGGATTT
    AAATTCAATCAAATCCTGTCCTA
    GTGTGTGCAGTGTCTTCGGCCCT
    GTGGACAC
    167 EID2B NM_152361.2   628- GCCAGTTTAGTTAACTCAGTCAT
      727 TAGGGGGAATGCAAACTGGAAG
    GGAATACGGCAATGTGCAATTG
    AAGGAGGAAGCACACTCCGAAA
    TGGAAACAGAC
    168 EIF2B4 NM_015636.3  1497- GTCTCTAATGAGCTAGATGACCC
     1596 TGATGATCTGCAATGTAAGCGGG
    GAGAACATGTTGCGCTGGCTAAC
    TGGCAGAACCACGCATCCCTACG
    GTTGTTGA
    169 EIF4EN NM_019843.2  3051- CACACTGGGCAGGACCCTGCTTC
    IF1  3150 ATCTCGGGTTGGTTTATGGGCTT
    TTACTTTGGAGCACTCTGTGTGA
    AGCTGTTTGGTGGAACCCATGCA
    TCTGGTGT
    170 EMR4 NM_00108049  1719- GGGAAGACGATTGGATCAATCA
    8.2  1818 TTGCATACTCATTCACCATCATC
    AACACCCTTCAGGGAGTGTTGCT
    CTTTGTGGTACACTGTCTCCTTA
    ATCGCCAGG
    171 EP300 NM_001429.2   716- CCAGCCAGGCCCAACAGAGCAG
      815 TCCTGGATTAGGTTTGATAAATA
    GCATGGTCAAAAGCCCAATGAC
    ACAGGCAGGCTTGACTTCTCCCA
    ACATGGGGAT
    172 EPHX2 NM_001979.5  1909- CATCCTTCCACCTGCTGGGGCAC
     2008 CATTCTTAGTATACAGAGGTGGC
    CTTACACACATCTTGCATGGATG
    GCAGCATTGTTCTGAAGGGGTTT
    GCAGAAAA
    173 ERLIN1 NM_006459.3  3197- TGATGGCCCTGGAGGCGGGGCT
     3296 GAGGAACAGGGAAATGCCGCTG
    TGAAGTCTTAAAGCACTTCTGCT
    TAAACTCCCATGTGTGAGGAGTG
    TGCCTCCCTG
    174 ETFDH NM_004453.3  1904- TGACCTCTTGTCATCTGTGGCTC
     2003 TGAGTGGTACTAATCATGAACAT
    GACCAGCCGGCACACTTAACCTT
    AAGGGATGACAGTATACCTGTAA
    ATAGAAAT
    175 EVI2A NM_014210.3  1410- GAGAGAGCTAAACTGTGTAATTT
     1509 AATGGTATCTTCCTTGCTGGATG
    TGGCAGAATCCACACCAGCTTAT
    CAACCAACACAGCTAATTTTAGA
    ATAGATCC
    176 EWSR1 NM_005243.3  2248- AAAAATGGATAAAGGCGAGCAC
     2347 CGTCAGGAGCGCAGAGATCGGC
    CCTACTAGATGCAGAGACCCCGC
    AGAGCTGCATTGACTACCAGATT
    TATTTTTTAA
    177 EYA3 NM_001990.3  1551- GATTCCTGGTTAGGAACTGCATT
     1650 AAAGTCCTTACTTCTCATCCAGT
    CCAGAAAGAATTGTGTGAATGTT
    CTGATCACTACCACCCAGCTGGT
    TCCAGCCC
    178 C5orf NM_032042.5  4058- TTAGAACAAGTAGAATGGGAAA
    21  4157 GGAGTGACTGATAAATCTAAGAT
    TCAAAATAGTCCCGTCGAAACTT
    AAAGGCCAGATTATTGCTTTGGA
    GCTTTCTAT
    179 FAM NM_199280.2  3306- ACTCTTAGACTCAGAGTCCTTGG
    179A  3405 GAGGCAGCCGCAAGGCCACTGA
    CAGAGGGGTGGCCCCTGACAGC
    AAGACAACTGGCAGCTCATACCC
    TTTTCAGCTG
    180 FAM NM_003704.3  4523- CCCTGACTTGTAGCCAGCTTGTG
    193A  4622 TAAGATCCCTTGCAGAACGAGA
    AAGTTAAAAACAAGCCCACCCA
    GTACTCACACCATCAAGTCTGTT
    ATAGAGTGTA
    181 FAM43A NM_153690.4  2741- AGACCCCTGAAATGTTGCCAAAT
     2840 TCTTCAAATAACTGTTTGGGGGG
    TGGGGGGAGATGAAAGAGAGTC
    GCGTTTTGTTTACAGTTAAAGAC
    ATCCAATAT
    182 FAM50B NM_012135.1  1273- TTCTGAGTATTTTAGTGTTGCCA
     1372 CCTGGATTTGCTGCATTGCTCTG
    CTGAGCTGTATTGAAACCATGAC
    TGGGCCCACTGTCAGACAGAAAT
    TAGAATAG
    183 FAIM3 NM_005449.4  1689- CAGGCTCTAGATCACATGGCATC
     1788 AGGCTGGGGCAGAGGCATAGCT
    ATTGTCTCGGGCATCCTTCCCAG
    GGTTGGGTCTTACACAAATAGAA
    GGCTCTTGC
    184 FKBP1A NM_054014.3   301- AGAAACAAGCCCTTTAAGTTTAT
      400 GCTAGGCAAGCAGGAGGTGATC
    CGAGGCTGGGAAGAAGGGGTTG
    CCCAGATGAGTGTGGGTCAGAG
    AGCCAAACTGA
    185 FLNB NM_001457.3  9148- CAGACCTGAGCTGGCTTTGGAAT
     9247 GAGGTTAAAGTGTCAGGGACGTT
    GCCTGAGCCCAAATGTGTAGTGT
    GGTCTGGGCAGGCAGACCTTTAG
    GTTTTGCT
    186 FNBP1 NM_015033.2  5237- TGTGTGTTGCACTAATTCTAAAC
     5336 TTTGGAGGCATTTTGCTGTGTGA
    GGCCGATCGCCACTGTAAAGGTC
    CTAGAGTTGCCTGTTTGTCTCTG
    GAGATGGA
    187 FOXK2 NM_004514.3  4387- TTTTTTGCCGTAGGCACCATTCT
     4486 GCATCTTGAACCCAGACTGAAGT
    GTGCCTCTCACAGATGGAAGGTG
    CACACGCTCCTGTCTCCTCCTCA
    CTCTGCCA
    188 FRAT2 NM_012083.2  1769- CTTGTCCTCCCAGCTGAGCTTTC
     1868 TTATTCCACCCTTTCTGGTGTCT
    ATAGGAATGCATGAGAGACCCTG
    GACGTTTTTCTGCTCTCTTCTGG
    CCCTCCAT
    189 FTHL16 XR_041433.1   255- GGACTCAGAGGCCGCCATCAAC
      354 CGCCAGATCAACCTAGAGCTCTG
    TGCCTCCTACGTTTACCTGTCCA
    TGTCTTACTGCTTTGACCGTGAT
    GATGTGGCT
    190 GATA2 NM_00114566  2573- GTCCAGTTGATTGTACGTAGCCA
    2.1  2672 CAGGAGCCCTGCTATGAAAGGA
    ATAAAACCTACACACAAGGTTG
    GAGCTTTGCAATTCTTTTTGGAA
    AAGAGCTGGG
    191 GLIS3 NM_00104241   548- ACTCGCGCTGGCCGGCCGGGGG
    3.1   647 AAGGGACCCGCACGCCGGGCTTT
    GTTGTGGAAATCCCGGTTACCTG
    GCTTATAACCCACACCATGGATA
    ACTTATTGG
    192 GLRX ILMN_173730   119- AAAGCATAGTTGGTCTTGGTGTC
    8.1   218 ATATGGATCAGAGGCACAAGTG
    CAGAGGCTGTGGTCATGCGGAA
    CACTCTGTTATTTAAGATGGCTA
    TCCAGATAAT
    193 GNL3 NM_014366.4  1733- TACAGCAGGTGAACAGTCTACA
     1832 AGGTCTTTTATCTTGGATAAAAT
    CATTGAAGAGGATGATGCTTATG
    ACTTCAGTACAGATTATGTGTAA
    CAGAACAAT
    194 GNS NM_002076.3  4988- CCTGTGTTTGCATCCTCTGTTCC
     5087 TATTCTGCCCTTGCTCTGTGTCA
    TCTCAGTCATTTGACTTAGAAAG
    TGCCCTTCAAAAGGACCCTGTTC
    ACTGCTGC
    195 GOLGA3 NM_005895.3  8961- CTCACTGACCGGAAGGTCCAGGT
     9060 GAATCTCGTCATAAGTGATCTCA
    GGCTCTCACAGGATCCGGAGGG
    AAATGTGTTAGAGGGTCTGGAA
    AATTCAGTGC
    196 GPAT NM_022078.2  1686- AGTCTGGGAGCAGCAGTCTTCGT
    CH3  1785 GGCTGGTTCAGGGTGTTTTGTTC
    CGAGCCTGCCTGCCTGCCGGTTC
    TATACCTCAGGGGCATTTTTACA
    AAAAGCCC
    197 GPI NM_000175.2  1696- CAGTGCTCAAGTGACCTCTCACG
     1795 ACGCTTCTACCAATGGGCTCATC
    AACTTCATCAAGCAGCAGCGCG
    AGGCCAGAGTCCAATAAACTCGT
    GCTCATCTG
    198 GPR65 NM_003608.3  1899- TATGATTTTTCTCACTCTTTCTT
     1998 TGGACTCCAGGGTGTCAGCCATC
    AGGTCTCCTAATTTTGTGTACCG
    GTCTCCAACAACCCCAGCTACTG
    AATACTGC
    199 GSTO1 NM_004832.2   897- AGAGCTCTACTTACAGAACAGCC
      996 CTGAGGCCTGTGACTATGGGCTC
    TGAAGGGGGCAGGAGTCAGCAA
    TAAAGCTATGTCTGATATTTTCC
    TTCACTAAT
    200 GUSB NM_000181.3  2032- GGTATCCCCACTCAGTAGCCAAG
     2131 TCACAATGTTTGGAAAACAGCCT
    GTTTACTTGAGCAAGACTGATAC
    CACCTGCGTGTCCCTTCCTCCCC
    GAGTCAGG
    201 GZMA NM_006144.3   636- GCCTCCGAGGTGGAAGAGACTC
      735 GTGCAATGGAGATTCTGGAAGCC
    CTTTGTTGTGCGAGGGTGTTTTC
    CGAGGGGTCACTTCCTTTGGCCT
    TGAAAATAA
    202 GZMB NM_004131.3   541- ACACTACAAGAGGTGAAGATGA
      640 CAGTGCAGGAAGATCGAAAGTG
    CGAATCTGACTTACGCCATTATT
    ACGACAGTACCATTGAGTTGTGC
    GTGGGGGACC
    203 GZMH NM_033423.4   718- GGCCCCTCGTGTGTAAGGACGTA
      817 GCCCAAGGTATTCTCTCCTATGG
    AAACAAAAAAGGGACACCTCCA
    GGAGTCTACATCAAGGTCTCACA
    CTTCCTGCC
    204 HAT1 NM_003642.3  1235- AACCAAATAGAAATAAGCATGC
     1334 AACATGAACAGCTGGAAGAGAG
    TTTTCAGGAACTAGTGGAAGATT
    ACCGGCGTGTTATTGAACGACTT
    GCTCAAGAGT
    205 HAVCR2 NM_032782.3   956- TATATGAAGTGGAGGAGCCCAA
     1055 TGAGTATTATTGCTATGTCAGCA
    GCAGGCAGCAACCCTCACAACCT
    TTGGGTTGTCGCTTTGCAATGCC
    ATAGATCCA
    206 HDAC3 NM_003883.3  1765- AAGATGAAGAGAGAGAGATTTG
     1864 GAAGGGGCTCTGGCTCCCTAACA
    CCTGAATCCCAGATGATGGGAA
    GTATGTTTTCAAGTGTGGGGAGG
    ATATGAAAAT
    207 HERC1 NM_003922.3 14664- CAATCGACATGGACAACTACATG
    14763 CTCTCGAGAAACGTGGACAACG
    CCGAGGGCTCCGACACTGACTAC
    TGACCGTGCGGGTGCTCTCACCC
    TCCCTTCTC
    208 HERC3 NM_014606.2  3796- TAAGAATGATTTAGACTGACCTG
     3895 TCCTTTTTTATCTGCGCATGCGA
    GAACATCACCTTCCTCTGTACAC
    TTGGAAATGCCTCTGGCTTGTTG
    CAGCCCTC
    209 HK3 NM_002115.2  2785- AGTCAGAGGATGGGTCCGGCAA
     2884 AGGTGCGGCCCTGGTCACCGCTG
    TTGCCTGCCGCCTTGCGCAGTTG
    ACTCGTGTCTGAGGAAACCTCCA
    GGCTGAGGA
    210 HLA-B NM_005514.6   938- CCCTGAGATGGGAGCCGTCTTCC
     1037 CAGTCCACCGTCCCCATCGTGGG
    CATTGTTGCTGGCCTGGCTGTCC
    TAGCAGTTGTGGTCATCGGAGCT
    GTGGTCGC
    211 HLA- NM_002118.3    21- CCCGTGAGCTGGAAGGAACAGA
    DMB   120 TTTAATATCTAGGGGCTGGGTAT
    CCCCACATCACTCATTTGGGGGG
    TCAAGGGACCCGGGCAATATAG
    TATTCTGCTC
    212 HLA-G NM_002127.4  1181- AAGAGCTCAGATTGAAAAGGAG
     1280 GGAGCTACTCTCAGGCTGCAATG
    TGAAACAGCTGCCCTGTGTGGGA
    CTGAGTGGCAAGTCCCTTTGTGA
    CTTCAAGAA
    213 HMGB1 NM_002128.4   209- TATGCATTTTTTGTGCAAACTTG
      308 TCGGGAGGAGCATAAGAAGAAGC
    ACCCAGATGCTTCAGTCAACTTC
    TCAGAGTTTTCTAAGAAGTGCTC
    AGAGAGGT
    214 HMGB2 NM_002129.3   670- TGCTGCATATCGTGCCAAGGGCA
      769 AAAGTGAAGCAGGAAAGAAGGG
    CCCTGGCAGGCCAACAGGCTCA
    AAGAAGAAGAACGAACCAGAAG
    ATGAGGAGGAG
    215 HNRNP NM_004499.3  1246- CCCCATGGAAATCACTCTCCTGT
    AB  1345 TGACTATTTCCAGAGCTCTAGGT
    GTTTAGGCAGCGTGTGGTGTCTG
    AGAGGCCATAGCGCCATCATGG
    GCTGATTTT
    216 HNRNPK NM_031263.2   538- TCCCTACCTTGGAAGAGGGCCTG
      637 CAGTTGCCATCACCCACTGCAAC
    CAGCCAGCTCCCGCTCGAATCTG
    ATGCTGTGGAATGCTTAAATTAC
    CAACACTA
    217 HOOK3 NM_032410.3  2391- GCAAGGTAGAGAAGTTGTGCCG
     2490 CTCAATCACAGACACCTGCACCC
    ACAACATACTTCTGTTACACACA
    AGAACATTTCAGGAAACTCAGCC
    AGCTTATTT
    218 HOPX NM_139211.4   590- AACAATAGGAAGCTATGTGTATC
      689 TTCTGTGTAAAGCAGTGGCTTCA
    CTGGAAAAATGGTGTGGCTAGC
    ATTTCCCTTTGAGTCATGATGAC
    AGATGGTGT
    219 HPSE NM_006665.5  3920- GAGGTTCCTATAATTGTCTCTGA
     4019 GTAACCCTTTGGAATGGAGAGG
    GTGTTGGTCAGTCTACAAACTGA
    ACACTGCAGTTCTGCGCTTTTTA
    CCAGTGAAA
    220 HSCB NM_172002.3   343- TCCACCCAGATTTCTTCAGCCAG
      442 AGGTCTCAGACTGAAAAGGACTT
    CTCAGAGAAGCATTCGACCCTGG
    TGAATGATGCCTATAAGACCCTC
    CTGGCCCC
    221 HSD11B NM_181755.1   156- GCCTACTACTACTATTCTGCAAA
    1   255 CGAGGAATTCAGACCAGAGATG
    CTCCAAGGAAAGAAAGTGATTG
    TCACAGGGGCCAGCAAAGGGAT
    CGGAAGAGAGA
    222 HSP90A NM_007355.3  1531- GGCATTCTCTAAAAATCTCAAGC
    B1  1630 TTGGAATCCACGAAGACTCCACT
    AACCGCCGCCGCCTGTCTGAGCT
    GCTGCGCTATCATACCTCCCAGT
    CTGGAGAT
    223 HSPA6 NM_002155.4  1990- GTGGCACTCAAGCCCGCCAGGG
     2089 GGACCCCAGCACCGGCCCCATCA
    TTGAGGAGGTTGATTGAATGGCC
    CTTCGTGATAAGTCAGCTGTGAC
    TGTCAGGGC
    224 HUWE1 NM_031407.6 13637- CCACCAACTCACCGTGTGTGTCC
    13736 CAGCTGCCCCATCTTCCCCAGCG
    CATACCTGTTCCTCTTCTCATTC
    TCTCCCCGCCGCCTGTTTCCTCA
    CCTTCTCT
    225 HVCN1 NM_032369.3   747- TGTTCCAGGAGCACCAGTTTGAG
      846 GCTCTGGGCCTGCTGATTCTGCT
    CCGGCTGTGGCGGGTGGCCCGG
    ATCATCAATGGGATTATCATCTC
    AGTTAAGAC
    226 IDO1 NM_002164.3    51- CTATTATAAGATGCTCTGAAAAC
      150 TCTTCAGACACTGAGGGGCACCA
    GAGGAGCAGACTACAAGAATGG
    CACACGCTATGGAAAACTCCTGG
    ACAATCAGT
    227 IDS NM_006123.4  1016- TGGATGGACATCAGGCAACGGG
     1115 AAGACGTCCAAGCCTTAAACATC
    AGTGTGCCGTATGGTCCAATTCC
    TGTGGACTTTCAGCGGAAAATCC
    GCCAGAGCT
    228 IER5 NM_016545.4  1802- ACTTTACACCTACCCCTCACCGG
     1901 AAAGCTAGACCCGCTTCAGGGCC
    AGGAGTGGCGTTTCCGCACAGG
    ATTTCCTAAGACGAGAGGGATTT
    AGCCAAGAG
    229 IFI27 NM_032036.2   305- GTCAGTGTTGGGGGCCTGCTTGG
    L2   404 GGAATTCACCTTCTTCTTCTCTC
    CCAGCTGAACCCGAGGCTAAAGA
    AGATGAGGCAAGAGAAAATGTA
    CCCCAAGGT
    230 IFNA17 NM_021268.2   292- TGAGATGATCCAGCAGACCTTCA
      391 ATCTCTTCAGCACAGAGGACTCA
    TCTGCTGCTTGGGAACAGAGCCT
    CCTAGAAAAATTTTCCACTGAAC
    TTTACCAG
    231 IFNAR1 NM_000629.2  3124- CTAATCAGCTCTCAGTGATCAAC
     3223 CCACTCTTGTTATGGGTGGTCTC
    TGTCACTTTGAATGCCAGGCTGG
    CTTCTCGTCTAGCAGTATTCAGA
    TACCCCTT
    232 IFNAR2 NM_000874.3   632- AAATACCACAAGATCATTTTGTG
      731 ACCTCACAGATGAGTGGAGAAG
    CACACACGAGGCCTATGTCACCG
    TCCTAGAAGGATTCAGCGGGAA
    CACAACGTTG
    233 IFNGR1 NM_000416.1  1141- CCCGGGCAGCCATCTGACTCCAA
     1240 TAGAGAGAGAGAGTTCTTCACCT
    TTAAGTAGTAACCAGTCTGAACC
    TGGCAGCATCGCTTTAAACTCGT
    ATCACTCC
    234 IGFBP7 NM_001553.2   584- ATCGGAATCCCGACACCTGTCCT
      683 CATCTGGAACAAGGTAAAAAGG
    GGTCACTATGGAGTTCAAAGGAC
    AGAACTCCTGCCTGGTGACCGGG
    ACAACCTGG
    235 IL16 NM_004513.4  1263- GGCATCTCCAACATCATCATCCA
     1362 ACGAAGACTCAGCTGCAAATGG
    TTCTGCTGAAACATCTGCCTTGG
    ACACAGGGTTCTCGCTCAACCTT
    TCAGAGCTG
    236 IL1B NM_000576.2   841- GGGACCAAAGGCGGCCAGGATA
      940 TAACTGACTTCACCATGCAATTT
    GTGTCTTCCTAAAGAGAGCTGTA
    CCCAGAGAGTCCTGTGCTGAATG
    TGGACTCAA
    237 IL1R2 NM_173343.1   114- TGCTTCTGCCACGTGCTGCTGGG
      213 TCTCAGTCCTCCACTTCCCGTGT
    CCTCTGGAAGTTGTCAGGAGCAA
    TGTTGCGCTTGTACGTGTTGGTA
    ATGGGAGT
    238 IL4 NM_000589.2   626- GACACTCGCTGCCTGGGTGCGAC
      725 TGCACAGCAGTTCCACAGGCACA
    AGCAGCTGATCCGATTCCTGAAA
    CGGCTCGACAGGAACCTCTGGG
    GCCTGGCGG
    239 IL7 NM_000880.2    39- AATAACCCAGCTTGCGTCCTGCA
      138 CACTTGTGGCTTCCGTGCACACA
    TTAACAACTCATGGTTCTAGCTC
    CCAGTCGCCAAGCGTTGCCAAGG
    CGTTGAGA
    240 INTS4 NM_033547.3   652- CCCACGTGTCAGAACAGCAGCTA
      751 TAAAAGCCATGTTGCAGCTCCAT
    GAAAGAGGACTGAAATTACACC
    AAACAATTTATAATCAGGCCTGT
    AAATTACTC
    241 IRAK2 NM_001570.3  1286- GTGTTGGCCGAGGTCCTCACGGG
     1385 CATCCCTGCAATGGATAACAACC
    GAAGCCCGGTTTACCTGAAGGAC
    TTACTCCTCAGTGATATTCCAAG
    CAGCACCG
    242 IRF1 NM_002198.1   511- CTGTGCGAGTGTACCGGATGCTT
      610 CCACCTCTCACCAAGAACCAGAG
    AAAAGAAAGAAAGTCGAAGTCC
    AGCCGAGATGCTAAGAGCAAGG
    CCAAGAGGAA
    243 IRF4 NM_002460.1   326- GGGCACTGTTTAAAGGAAAGTTC
      425 CGAGAAGGCATCGACAAGCCGG
    ACCCTCCCACCTGGAAGACGCGC
    CTGCGGTGCGCTTTGAACAAGAG
    CAATGACTT
    244 KIAA NM_014761.3  2187- ATGGATGGGACTCTTATGTCATA
    0174  2286 ACTTCTGTTACTCCTTTGGCCCA
    TAGCTAAGGTCATCCTTCCCCAC
    AGGGGTGGCTTTGGGATTGGATG
    ATACAGCT
    245 ITCH NM_00125713   439- GAGGTGACAAAGAGCCAACAGA
    8.1   538 GACAATAGGAGACTTGTCAATTT
    GTCTTGATGGGCTACAGTTAGAG
    TCTGAAGTTGTTACCAATGGTGA
    AACTACATG
    246 ITFG2 NM_018463.3  1985- GTCTGGTCTTACCCATGTTCCTA
     2084 GCAACCCTGAGATGATTTTCTTC
    CATTTACCAAAGCAGCCGGGTCA
    GTGCTTTCTCACGTTGCCGTATT
    CTTCAGGT
    247 ITGAE NM_002208.4  3406- CTGAATGCAGAGAACCACAGAA
     3505 CTAAGATCACTGTCGTCTTCCTG
    AAAGATGAGAAGTACCATTCTTT
    GCCTATCATCATTAAAGGCAGCG
    TTGGTGGAC
    248 ITGAL NM_002209.2  3906- GTGAGGGCTTGTCATTACCAGAC
     4005 GGTTCACCAGCCTCTCTTGGTTT
    CCTTCCTTGGAAGAGAATGTCTG
    ATCTAAATGTGGAGAAACTGTAG
    TCTCAGGA
    249 JAK1 NM_002227.1   286- GAGAACACCAAGCTCTGGTATGC
      385 TCCAAATCGCACCATCACCGTTG
    ATGACAAGATGTCCCTCCGGCTC
    CACTACCGGATGAGGTTCTATTT
    CACCAATT
    250 KIAA NM_015443.3  4402- CCTTCACATCCAGATCCCTGTCG
    1267  4501 GTGTTAGTTCCACTCTTGGTCTT
    TCACGCTCCCCTTGCCTGTGGAA
    CATTGTCTGGTCCTAGCTGTGGT
    TCCCATTG
    251 MYST4 NM_012330.3  6541- CCCAGACTGTAGCCATGCAGGGT
     6640 CCTGCACGGACTTTAACGATGCA
    AAGAGGCATGAACATGAGTGTG
    AACCTGATGCCAGCGCCAGCCTA
    CAATGTCAA
    252 KCTD12 NM_138444.3  4208- ACAAGTAAAATAACTTGACATG
     4307 AGCACCTTTAGATCCCTTCCCCT
    CCATGGGCTTTGGGCCACAGAAT
    GAACCTTTGAGGCCTGTAAAGTG
    GATTGTAAT
    253 KIAA NM_014736.4   236- CGACATCAGTTTCATCGAGGAAA
    0101   335 GCTGAAAATAAATATGCAGGAG
    GGAACCCCGTTTGCGTGCGCCCA
    ACTCCCAAGTGGCAAAAAGGAA
    TTGGAGAATT
    254 SETD1B XM_037523.11  7779- ATCGTGCCCAGTGTTAACCTCGG
     7878 CTGGCCTTCACTAAGGGGACTAG
    ACCTCCCTCTCCCCAGGAGCCCC
    AGCCCCAGAGTGGTTTGCAATAA
    TCAAGATA
    255 KIR2DL XM_00112635   265- GAGGTGACATATGCACAGTTGG
    5A 4.1   364 ATCACTGCGTTTTCACACAGACA
    AAAATCACTTCCCCTTCTCAGAG
    GCCCAAGACACCTCCAACAGAT
    ACCACCATGT
    256 KIR_Activating_ NM_014512.1   719- TCCGAAACCGGTAACCCCAGAC
    Subgroup_2   818 ACCTACATGTTCTGATTGGGACC
    TCAGTGGTCAAAATCCCTTTCAC
    CATCCTCCTCTTCTTTCTCCTTC
    ATCGCTGGT
    257 KIR2D NM_012313.1     1- CCGGCAGCACCATGTCGCTCATG
    S3   100 GTCATCAGCATGGCATGTGTTGG
    GTTCTTCTGGCTGCAGGGGGCCT
    GGCCACATGAGGGATTCCGCAG
    AAAACCTTC
    258 KLRB1 NM_002258.2   357- CAGCAACTCCGAGAGAAATGCTT
      456 GTTATTTTCTCACACTGTCAACC
    CTTGGAATAACAGTCTAGCTGAT
    TGTTCCACCAAAGAATCCAGCCT
    GCTGCTTA
    259 KLRC1 NM_002259.3   336- ACCTATCACTGCAAAGATTTACC
      435 ATCAGCTCCAGAGAAGCTCATTG
    TTGGGATCCTGGGAATTATCTGT
    CTTATCTTAATGGCCTCTGTGGT
    AACGATAG
    260 KLRC2 NM_002260.3   943- TATGTGAGTCAGCTTATAGGAAG
     1042 TACCAAGAACAGTCAAACCCAT
    GGAGACAGAAAGTAGAATAGTG
    GTTGCCAATGTCTCAGGGAGGTT
    GAAATAGGAG
    261 KLRD1 NM_002262.3   597- CAATTTTACTGGATTGGACTCTC
      696 TTACAGTGAGGAGCACACCGCCT
    GGTTGTGGGAGAATGGCTCTGCA
    CTCTCCCAGTATCTATTTCCATC
    ATTTGAAA
    262 KLRF1 NM_016523.1   544- TATACAGAAAAACCTAAGACAA
      643 TTAAACTACGTATGGATTGGGCT
    TAACTTTACCTCCTTGAAAATGA
    CATGGACTTGGGTGGATGGTTCT
    CCAATAGAT
    263 KLRF1b NM_016523.2   849- AAGTGCAATTAAATGCCAAAATC
      948 TCTTCTCCCTTCTCCCTCCATCA
    TCGACACTGGTCTAGCCTCAGAG
    TAACCCCTGTTAACAAACTAAAA
    TGTACACT
    264 KRTAP NM_198696.2   213- CTGCTGCCAGGCGGCCTGTGAGC
    10-3   312 CCAGCCCCTGCCAGTCAGGCTGC
    ACCAGCTCCTGCACGCCCTCGTG
    CTGCCAGCAGTCTAGCTGCCAGC
    CAGCTTGC
    265 KYNU NM_00103299   936- TTGCCTGCTGGTGTTCCTACAAG
    8.1  1035 TATTTAAATGCAGGAGCAGGAG
    GAATTGCTGGTGCCTTCATTCAT
    GAAAAGCATGCCCATACGATTA
    AACCTGCGAG
    266 LAMA5 NM_005560.4 11163- CCAACCCCGGCCCCTGGTCAGGC
    11262 CCCTGCAGCTGCCTCACACCGCC
    CCTTGTGCTCGCCTCATAGGTGT
    CTATTTGGACTCTAAGCTCTACG
    GGTGACAG
    267 LDHA NM_00116541  1348- ATCTTGTGTAGTCTTCAACTGGT
    6.1  1447 TAGTGTGAAATAGTTCTGCCACC
    TCTGACGCACCACTGCCAATGCT
    GTACGTACTGCATTTGCCCCTTG
    AGCCAGGT
    268 LEF1 NM_016269.4  3136- AACACATAGTGGCTTCTCCGCCC
     3235 TTGTAAGGTGTTCAGTAGAGCTA
    AATAAATGTAATAGCCAAACCC
    ACTCTGTTGGTAGCAATTGGCAG
    CCCTATTTC
    269 LETM2 NM_144652.3  1331- AAAGGACCCATCACTTCTTCTGA
     1430 AGAACCTACACTCCAGGCCAAAT
    CACAAATGACGGCCCAGAACAG
    CAAGGCTAGTTCAAAAGGAGCA
    TAAAGGACTA
    270 LIF NM_002309.3  1241- GGGATGGAAGGCTGTCTTCTTTT
     1340 GAGGATGATCAGAGAACTTGGG
    CATAGGAACAATCTGGCAGAAG
    TTTCCAGAAGGAGGTCACTTGGC
    ATTCAGGCTC
    271 LILRA5 NM_021250.3  1044- TTGAATGCTGGAGCCTTGGAAGC
     1143 GAATCTGATGGTCCTAGGAGGTT
    CGGGAAGACCATCTGAGGCCTAT
    GCCATCTGGACTGTCTGCTGGCA
    ATTTCTTT
    272 LILRA5  NM_181879.2   546- CACCCTCTCAGCCCTGCCCAGTC
    b   645 CTGTGGTGACCTCAGGAGAGAA
    CGTGACCCTCCAGTGTGGCTCAC
    GGCTGAGATTCGACAGGTTCATT
    CTGACTGAG
    273 LOC NR_002809.2   471- GCGGCAGCCAATCAGCGCGCGG
    338799   570 CTTCTATAGGGCTTGAGTTATTA
    GACGCTGATCTCAAAACATCCTT
    CATCAGACACGAAGGAGAGGCC
    AACAGATGAG
    274 LOC100 XM_00171659   568- AGGGTCATGCAGCTACTGAGGTC
    129022 1.1   667 ACAGCCTGGATTCATACACAGGT
    CTGACTCCTGAGCACTTAGCCAG
    GTGGCTGTAACAGTGTTCCCAGA
    AACACAGG
    275 LOC100 XM_00173282  1148- ACCTGTCTTCCGGGTCTGTTCAC
    129697 2.2  1247 CCGTCCCCTGGACTGGCACCAGC
    ACAGAGGGTCGAGTGTTGGCAC
    CTGTCTTCTGGGTCTCCATCCCT
    CCCTTTGTT
    276 LOC100 XM_00171715  1469- GAGAATGTCTGCGCGGAGACAG
    130229 8.1  1568 CATAGCTCTGTAGAAATGAGTGG
    CAGCGTATGTAACCTGGCATTTT
    GAACCCAGGAGCACAATTTTATT
    AAAGGAAAA
    277 LOC100 XR_036994.1    15- GAGTAGTAGGTGGACAGCCGTC
    132797   114 CCACACAAGGGTTTGTATCTGGG
    CTACACAGATTCCCTTCAGAAAA
    GCACCAATGTAAGCAACTCCCTT
    ACAGTTGCT
    278 LOC100 XR_039238.1   342- GAGATAGCTTCCTGAAATGTGTG
    133273   441 AAGGAAAATGATCAGAAAAAGA
    AAGAAGCCAAAGAGAAAGGTAC
    CTGGGTTCAACTAAAGTGCCAGC
    CTGCTCCACC
    279 LOC NM_144692.1  3367- GCTCTGTCCTTTGCCGCTCAGAC
    148137  3466 CAAAAACCTTAGAGCTGTCTTTG
    ACTTCTGTCTTTCCCTTCCACCC
    ACAGTTAACCAGGAAATCCTGCC
    ATCTCCGC
    280 LOC NR_024275.1  5062- GGTTACAGCCATTTTGTGTGATT
    151162  5161 CACTTCGGGGGTTAAGTAATGCA
    GGATTCTGCAAACAAGGTGTCGC
    CGTCCAAATGTACTGTCCTGGCA
    TAGAGAGC
    281 C1orf NM_00100380  2561- ACATGGCGCCACGGCCACTTCCT
    222 8.1  2660 GCTGCCCTGGACCCCGCAAGCCC
    AGGGACATCCAAGAGCACCCCT
    CCTGAGACCCCAGACTCAGAAG
    CAGCGAGAAG
    282 LOC XM_934917.1   376- CCCCTGGTGGACCGCGACCTCCG
    339674   475 CAAGACGCTAATGGTGCGCGAC
    AACCTGGCCTTCGGCGGCCCGGA
    GGTCTGAGCCGACTTGCAAAGG
    GGATAGGCGG
    283 LOC XM_371757.4   210- GCAAAGCACTATCACAAGGAAT
    648000   309 ATAGGCAAATGTACAGAACTGA
    AATTCGAGTGGCGAGGATGGCA
    AGAAAAGCTGGCAACTTCTATGT
    ACCTGCAGAAC
    284 LOC XR_017684.2    82- AAGATTATGTCTTCCCCTGTTTC
    391126   181 CAAAGAGCTGAGACAGAAGTACA
    ATGTGCAATCCATGCCCATCCGA
    AAGGATGATGAAGTTCAGGTTGT
    ACGAGGGC
    285 LOC XM_930634.1  1448- ATGGGACCCACTCTACTGAGGCT
    399753  1547 TTATGTAGAACTCATAGAGGAAG
    CTGGCTTTGAGGAATGAACTACC
    CTGTGCTTTTCTTAGGACTAAAA
    TCTCAGGA
    286 LOC XM_934471.1    21- GACGGTAACCGGGACCCAGTGT
    399942   120 CTGCTCCTGTCACCTTCGCCTCC
    TAATCCCTAGCCACTATGCGAGA
    TGACTCCTTCAACACCTTCAGTG
    AGACGGGTG
    287 LOC XM_498648.3   552- GAGTTTTCCAAACCCTGGATTTC
    440389   651 CTTCGGAGAGAGCTAGATTCTAT
    TCCATTCTTGGAATTCAGCTCCT
    TGCCCTTCTCTGTGACCCCGGAT
    CGCGAATG
    288 LOC XM_942885.1  1533- TGTTGCAAAAGCCAACTACCACT
    440928  1632 GTCAAACTTAGCCCGTTTACAAC
    ATGGGGAAAGGCGTATTTCTTAC
    TAATATCTCAACAACGATAACAA
    TGCTGTAT
    289 LOC XR_018937.2   287- CGGGTGCAGCGGGAAAAGGCTA
    441073   386 ATGGCACAACTGTCCACGTAGGC
    ATTCACCCCAGCAAGGTGGTTAT
    CACTAGGCTAAAACTGGACAAA
    GACTGTGAAA
    290 LOC XR_036892.1   591- GGTGAAGAATTTGTTCTATTATG
    642812   690 AAGATACTGTCTGGGCTAAAAA
    GCTTACAGTGAGTGGAAGATAG
    CAACTTGTAGGGTTGGTGGCTGA
    ACAGGCCGAC
    291 LOC XM_927980.1   255- CTGGCTCAAGGATGGCACGGTGT
    643319   354 TATGTGAGCTCAATAATGCACTG
    TACCCCAAGGGGCAGGTCCCAGT
    AAAGAAGATCCAGGCCTCCACC
    ATGGCCTTC
    292 LOC XR_017529.2    38- CAGGCGCTGCAAGTTCTCCCAGG
    644315   137 AGAAAGCCATGTTCAGTTCGAGC
    GCCAAGATCGTGAAGCCCAATG
    GCGAGAAGCCGGACGAGTTCGA
    GTCCGGCCAT
    293 LOC XM_928884.1    13- GAAGCACTGGTAAATGTCTGCTG
    645914   112 CATTAACTCACTCAGACCAAACT
    TTCTCTTATCTAGGTCCAAAAGG
    AAGCTGCTCGGCTGGAAGGAAC
    CTGGTGAGG
    294 LOC XR_018104.1   670- AGGTGCTGCAAAATTACCAGGA
    647340   769 ATACAGTCTGGCCAACAGCATCT
    ACTACTCTCTGAAGGAGTCCACC
    ACTAGTGAGCAGAGTGCCAGGA
    TGACAGCCAT
    295 LOC XR_038906.2  1638- TGGAGAGAAGAATGAAGAGGTG
    648927  1737 GTGGTTCTGGGTTTGATTTGAGT
    TCACCTGTGGGCAGTGGGCAGTG
    TCTTGGTGAAAGGGAGCGGATA
    CTACTTTTTG
    296 LOC XM_938755.1    38- GCCCTTCTGCCATCAACGAGGTG
    653773   137 GTGACCCAAGAACATACCATCA
    ACATTCACAAGCGCATCCATGGA
    GAGGGCTTCAAGAAGCGTGCTCC
    TCGGGCACT
    297 LOC XR_015610.3  1861- GTAGTTGTCCACTGCTTTCCTGG
    728533  1960 ATGGATGGGACTCTTATGTCATA
    ACTTCTATACTCCTTTGGCCCAT
    AGCTAAGGTCATCCTTCCCCACA
    GGGGTGGC
    298 LOC XM_00113319   510- CCAAACCAAAAGAGGCAAGCAA
    728835 0.1   609 GTCTGCGCTGACCCCAGTGAGTC
    CTGGGTCCAGGAGTACGTGTATG
    ACCTGGAACTGAACTGAGCTGCT
    CAGAGACAG
    299 LOC XR_040891.2   625- CCCTGGGTGCCCCTTAACCCGGG
    729887   724 CGGTAGCTCGTTAAGATGGCGAA
    GTGTCCGGTCCGGAACACGCGA
    AACCCCAAATCCCGCCTGCCCGA
    CCTCCTGAC
    300 LOC XM_00113427   765- GCGCGGTTGCGGTTAGCGGGCGC
    732111 5.1   864 GGTGCCAAAGCTGCCATCCCCAG
    CTCACAGCTCCTCATATCCACCC
    TGCCCTCATCTTTATGAATTGCG
    TGTAGACC
    301 LOC XM_00113301   182- GCCCTTCAGAGCTGCGGGAGATC
    732371 9.1   281 ATTGATGAGTGCCGGGCCCATGA
    TCCCTCTGTGCGGCCCTCTGTGG
    ATGAGCAGAAGCGCAGACTTAA
    TGATGTGTT
    302 LOC NM_00109977  2666- ATGTTGCATTGACTAGAGGAAAG
    91431 6.1  2765 AGGCATTTGTTGATTGTGGGAAA
    TTTAGCCTGTTTGAGGAAAAATC
    AACTTTGGGGACGAGTGATCCAA
    CACTGCGA
    303 P2RY5 NM_005767.5  2026- AGATTGTTTGCACTGGCGTGTGG
     2125 TTAACTGTGATCGGAGGAAGTGC
    ACCCGCCGTTTTTGTTCAGTCTA
    CCCACTCTCAGGGTAACAATGCC
    TCAGAAGC
    304 LPCAT4 NM_153613.2  1560- CCCCACACACCTCTCGAGGCACC
     1659 TCCCAGACACCAAATGCCTCATC
    CCCAGGCAACCCCACTGCTCTGG
    CCAATGGGACTGTGCAAGCACCC
    AAGCAGAA
    305 LPIN2 NM_014646.2  5620- AGAAAAAACTTAAAAATGGGAT
     5719 GTCCTAAAATGAAAGCTGCTCAA
    AGTCACAGAACAACCGAGGGAC
    AAAGGAGATTGGATGACTGGGA
    AGCGCTGGCCC
    306 C1orf NM_018372.3  1543- TTCCAATACCCAGCTTGCTTCCA
    103  1642 TGGCCAATCTAAGGGCAGAGAA
    GAATAAAGTGGAGAAACCATCT
    CCTTCTACCACAAATCCACATAT
    GAACCAATCC
    307 LRRC47 NM_020710.2  2461- GGGTCAGTGACGGACACTTACCT
     2560 GACAGCGGATCCACAATATTCTC
    GTGCAGTGTGTTTGGAATCCTGG
    TCTGGGCTCTCGTCGTTGGCCTT
    GTAGATCA
    308 LY96 NM_015364.4   439- AAGGGAGAGACTGTGAATACAA
      538 CAATATCATTCTCCTTCAAGGGA
    ATAAAATTTTCTAAGGGAAAATA
    CAAATGTGTTGTTGAAGCTATTT
    CTGGGAGCC
    309 LYN NM_002350.1  1286- TCCTGAAGAGCGATGAAGGTGG
     1385 CAAAGTGCTGCTTCCAAAGCTCA
    TTGACTTTTCTGCTCAGATTGCA
    GAGGGAATGGCATACATCGAGC
    GGAAGAACTA
    310 MAGEA1 NM_004988.4   477- AGGGGCCAAGCACCTCTTGTATC
      576 CTGGAGTCCTTGTTCCGAGCAGT
    AATCACTAAGAAGGTGGCTGATT
    TGGTTGGTTTTCTGCTCCTCAAA
    TATCGAGC
    311 MAGEA3 NM_005362.3   850- ACTGTGCCCCTGAGGAGAAAATC
      949 TGGGAGGAGCTGAGTGTGTTAG
    AGGTGTTTGAGGGGAGGGAAGA
    CAGTATCTTGGGGGATCCCAAGA
    AGCTGCTCAC
    312 MAP3K7 NM_145333.1   671- GCCATATTATACTGCTGCCCACG
      770 CAATGAGTTGGTGTTTACAGTGT
    TCCCAAGGAGTGGCTTATCTTCA
    CAGCATGCAACCCAAAGCGCTA
    ATTCACAGG
    313 MARCKS NM_002356.6  1800- GTCAAAAAGGGATATCAAATGA
     1899 AGTGATGGGGTCACAATGGGGA
    AATTGAAGTGGTGCATAACATTG
    CCAAAATAGTGTGCCACTAGAA
    ATGGTGTAAAG
    314 MARCKS NM_023009.5  1117- TCCAAGTAGGTTTTGTTTACCCT
    L1  1216 ACTCCCCAAATCCCTGAGCCAGA
    AGTGGGGTGCTTATACTCCCAAA
    CCTTGAGTGTCCAGCCTTCCCCT
    GTTGTTTT
    315 MBD1 NM_015844.2  2380- TGGCTGCAGGCCTGACTACTGCC
     2479 CACACCAACGAGGTGATCTAGC
    AGATACATGGCAACGTGTGAACT
    GCAACAACGCCTGGTGCCCCAGC
    ACCAACCTT
    316 C19orf NM_174918.2  1062- CATACTAGAGTATACTGCGGCGT
    59  1161 GTTTTCTGTCTACCCATGTCATG
    GTGGGGGAGATTTATCTCCGTAC
    ATGTGGGTGTCGCCATGTGTGCC
    CTGTCACT
    317 MED16 NM_005481.2  2152- TCTGAAGCCCAGCTGCCTGCCCG
     2251 TGTATACGGCCACCTCGGATACC
    CAGGACAGCATGTCCCTGCTCTT
    CCGCCTGCTCACCAAGCTCTGGA
    TCTGCTGT
    318 MEN1 NM_130799.2  2222- CCCAGCCCCTAGAAACCCAAGCT
     2321 CCTCCTCGGAACCGCTCACCTAG
    AGCCAGACCAACGTTACTCAGG
    GCTCCTCCCAGCTTGTAGGAGCT
    GAGGTTTCA
    319 MERTK NM_006343.2   666- GAAGAGATCGTGTCTGATCCCAT
      765 CTACATCGAAGTACAAGGACTTC
    CTCACTTTACTAAGCAGCCTGAG
    AGCATGAATGTCACCAGAAACA
    CAGCCTTCA
    320 MFSD1 NM_022736.2  2023- AAGGGCTGCGTTACACAAAATA
     2122 AACAATGGCATTGTCATAGGCCT
    TCCTTTTACTAGTAGGGCATAAT
    GCTAGGGAATATGTGAAGATGTT
    TTTATGAAG
    321 MID1IP NM_021242.5  3472- AGCTGGCATTTCGCCAGCTTGTA
    1  3571 CGTAGCTTGCCACTCAGTGAAAA
    TAATAACATTATTATGAGAAAGT
    GGACTTAACCGAAATGGAACCA
    ACTGACATT
    322 MPDU1 NM_004870.3  1226- CATTCAGCCAAGCCTCCTCCTCT
     1325 AGCAGCAATTTCCAGCTGTGTAA
    CACTATCCTGGGCAAATGTTTTA
    CCCTGTCCTCCAGCCTCCCTGCT
    TCCCTTCT
    323 MRPL27 NM_148571.1  2189- TCAAACTGGTAGCTATGCTTTGA
     2288 TGTCCTGTTGAGGCCATCGGACA
    GAGACTGGAGCCCAGGTGACAG
    GAGATGGTGATACCAGAAGTCA
    AGGGTTGGGG
    324 MRPS16 NM_016065.3  1811- ATTCAAATGTGGCTGTGATTTCT
     1910 GCATATATCATAGATGGGATCCT
    TCTGAGAATACTGGAATAGGGA
    ATTAGGACACCAAGCCAATTCAG
    CTGTGAACC
    325 MS4A2 NM_000139.3   662- TTCTCACCATTCTGGGACTTGGT
      761 AGTGCTGTGTCACTCACAATCTG
    TGGAGCTGGGGAAGAACTCAAA
    GGAAACAAGGTTCCAGAGGATC
    GTGTTTATGA
    326 MS4A6A NM_022349.3  1290- CTGGGAAGTTAAATGACTGGCCT
     1389 GGCATTATGCTATGAGTTTGTGC
    CTTTGCTGAGGACACTAGAACCT
    GGCTTGCCTCCCTTATAAGCAGA
    AACAATTT
    327 MS4A6A  NM_152851.2   880- CTGCGGTGGAAACAGGCTTACTC
    b   979 TGACTTCCCTGGGAGTGTACTTT
    TCCTGCCTCACAGTTACATTGGT
    AATTCTGGCATGTCCTCAAAAAT
    GACTCATG
    328 MTCH1 NM_014341.2  2081- TCCTCCTCATCTAATGCTCATCT
     2180 GTTTAATGGTGATGCCTCGCGTA
    CAGGATCTGGTTACCTGTGCAGT
    TGTGAATACCCAGAGGTTGGGCA
    GATCAGTG
    329 MYADM NM_00102082  2656- TCTTTTTCCTGGCCATGAGGACA
    0.1  2755 AAAATTACTGAGTGGCCCTTAAA
    GAGGGAAGTTTGTTTTCAGCTGT
    TCTCTTTTGCCCGTAGGTGGGAG
    GGTGGGGA
    330 MYADM  NM_00102082  2789- TGAATGTGTAGTGCACACGCACG
    b 0.1  2888 GGTGTTTCTGTGTGCTAGTTGCT
    TCTTGCTGCTGCTTCCTGCTTGT
    CTGGGACTCACATACATAACGTG
    ATATATAT
    331 C19orf NM_019107.3   649- TGTCCCTGAAAGGGCCAGCACAT
    10   748 CACTGGTTTTCTAGGAGGGACTC
    TTAAGTTTTCTACCTGGGCTGAC
    GTTGCCTTGTCCGGAGGGGCTTG
    CAGGGTGG
    332 MYL12A NM_006471.3   305- TCTCTGGGTAGCAGGGTGGTGTG
      404 ATAGCGGCAGCGAGGGGCTCGG
    AGAGGTGCTCGGATTCTCGTAGC
    TGTGCCGGGACTTAACCACCACC
    ATGTCGAGC
    333 MYLIP NM_013262.3  2701- TTGGGCATTTTGGAAGCTGGTCA
     2800 GCTAGCAGGTTTTCTGGGATGTC
    GGGAGACCTAGATGACCTTATCG
    GGTGCAATACTAGCTAAGGTAA
    AGCTAGAAA
    334 NAT5 NM_181528.3   735- AAACATACCACTCTCATGGTTCA
      834 TAGTATTCACTGTATGTATGCTA
    GGGAAAAGACTTGCTCCAGTCTC
    CTCCTCAGTTCTGTGCCTGAGAA
    CCACTGCT
    335 NADK NM_023018.4  2449- TCCGGGGCTAGTGATCGTGATCC
     2548 CTTTTATTTGCAACTGTAATGAG
    AATTTTTCACACTAACACAGCGA
    GGGACTCAACACGCTGATTCTCC
    TCCTGCCT
    336 NAGK NM_017567.4  1362- GGGCCAGGCACATCGGGCACCT
     1461 CCTCCCCATGGACTATAGCGCCA
    ATGCCATTGCCTTCTATTCCTAC
    ACCTTTTCCTAGGGGGCTGGTCC
    CGGCTCCAC
    337 NCAPG NM_022346.4  3080- ACCCAAGCATCAAAGTCTACTCA
     3179 GCTAAAGACTAACAGAGGACAG
    AGAAAAGTGACAGTTTCAGCTA
    GGACGAACAGGAGGTGTCAGAC
    TGCTGAAGCCG
    338 NCOA5 NM_020967.2  2837- TGGACATGTTCTCGAGATGGGTG
     2936 GCTGTTCGCGACTTTTGTACCAG
    AGTGAAATTGTTAGAAGGAGGG
    TTTCTGGCTGTGGTTCTAAATGG
    AGCCCCAGG
    339 NCR1 NM_004829.5   603- CGATGTTTTGGCTCCTATAACAA
      702 CCATGCCTGGTCTTTCCCCAGTG
    AGCCAGTGAAGCTCCTGGTCACA
    GGCGACATTGAGAACACCAGCC
    TTGCACCTG
    340 NDRG2 NM_016250.2  1516- TATGCATCCTCTGTCCTGATCTA
     1615 GGTGTCTATAGCTGAGGGGTAAG
    AGGTTGTTGTAGTTGTCCTGGTG
    CCTCCATCAGACTCTCCCTACTT
    GTCCCATA
    341 NDUFA4 NM_002489.3   262- TGGGACAGAAATAACCCAGAGC
      361 CCTGGAACAAACTGGGTCCCAAT
    GATCAATACAAGTTCTACTCAGT
    GAATGTGGATTACAGCAAGCTG
    AAGAAGGAAC
    342 NDUFAF NM_174889.4   486- TCCTGCCTCCACCAGTTCAAACT
    2   585 CAAATTAAAGGCCATGCCTCTGC
    TCCATACTTTGGAAAGGAAGAAC
    CCTCAGTGGCTCCCAGCAGCACT
    GGTAAAAC
    343 NDUFB3 NM_002491.2   383- ACAATGGAAGATAGAAGGGACA
      482 CCATTAGAAACTATCCAGAAGA
    AGCTGGCTGCAAAAGGGCTAAG
    GGATCCATGGGGCCGCAATGAA
    GCTTGGAGATAC
    344 NDUFS4 NM_002495.2   326- GAGTTTGATACCAGAGAGCGAT
      425 GGGAAAATCCTTTGATGGGTTGG
    GCATCAACGGCTGATCCCTTATC
    CAACATGGTTCTAACCTTCAGTA
    CTAAAGAAG
    345 NDUFV2 NM_021074.4   687- TTACTATGAGGATTTGACAGCTA
      786 AGGATATTGAAGAAATTATTGAT
    GAGCTCAAGGCTGGCAAAATCC
    CAAAACCAGGGCCAAGGAGTGG
    ACGCTTCTCT
    346 NFAT5 NM_138713.3  3857- CCCAAGAAGCATTTTTTGCAGCA
     3956 CCGAACTCAATTTCTCCACTTCA
    GTCAACATCAAACAGTGAACAA
    CAAGCTGCTTTCCAACAGCAAGC
    TCCAATATC
    347 NFATC1 NM_172389.1  1985- CGAATTCTCTGGTGGTTGAGATC
     2084 CCGCCATTTCGGAATCAGAGGAT
    AACCAGCCCCGTTCACGTCAGTT
    TCTACGTCTGCAACGGGAAGAG
    AAAGCGAAG
    348 NFATC4 NM_00113602  2297- ACAAGAGGGTTTCCCGGCCAGTC
    2.2  2396 CAGGTCTACTTTTATGTCTCCAA
    TGGGCGGAGGAAACGCAGTCCT
    ACCCAGAGTTTCAGGTTTCTGCC
    TGTGATCTG
    349 NFKB1 NM_003998.3  3606- CGGATGCATCTGGGGATGAGGTT
     3705 GCTTACTAAGCTTTGCCAGCTGC
    TGCTGGATCACAGCTGCTTTCTG
    TTGTCATTGCTGTTGTCCCTCTG
    CTACGTTC
    350 NFKB2 NM_002502.2   826- ATCTCCGGGGGCATCAAACCTGA
      925 AGATTTCTCGAATGGACAAGACA
    GCAGGCTCTGTGCGGGGTGGAG
    ATGAAGTTTATCTGCTTTGTGAC
    AAGGTGCAG
    351 NIPBL NM_133433.3  8755- GCGCCGTGATGGCCGCAAACTG
     8854 GTGCCTTGGGTAGACACTATTAA
    AGAGTCAGACATTATTTACAAAA
    AAATTGCTCTAACGAGTGCTAAT
    AAGCTGACT
    352 NLRP3 NM_00107982   416- AGTGGGGTTCAGATAATGCACGT
    1.2   515 GTTTCGAATCCCACTGTGATATG
    CCAGGAAGACAGCATTGAAGAG
    GAGTGGATGGGTTTACTGGAGTA
    CCTTTCGAG
    353 NME1- NM_00101813   484- ACCTGGAGCGCACCTTCATCGCC
    NME2 6.2   583 ATCAAGCCGGACGGCGTGCAGC
    GCGGCCTGGTGGGCGAGATCATC
    AAGCGCTTCGAGCAGAAGGGAT
    TCCGCCTCGT
    354 NUDT18 NM_024815.3  1369- CCCCAGTGGCATCTCCTCATCAC
     1468 GTTCTGTGCCGTCCTTGGGAAAG
    GCCTGCATTCTGATCCTTCCAGG
    CCCTTCGAGCATGGAGGGGCACT
    GGGGAAGG
    355 NUMB NM_00100574  2833- CATAAGATTGATTTATCATTGAT
    4.1  2932 GCCTACTGAAATAAAAAGAGGA
    AAGGCTGGAAGCTGCAGACAGG
    ATCCCTAGCTTGTTTTCTGTCAG
    TCATTCATTG
    356 NUP153 NM_005124.3  5104- TTTATGATCCAGCAGATTATTCA
     5203 CTGATTTGACATAGTCTGGCTGT
    ACCCAGGAATGGAGCCTGCACG
    GTGAATGGCTTTGTATAGAACCT
    CTTTGTCTA
    357 OLR1 NM_002543.3  1524- ACACATTTTGGGACAAGTGGGG
     1623 AGCCCAAGAAAGTAATTAGTAA
    GTGAGTGGTCTTTTCTGTAAGCT
    AATCCACAACCTGTTACCACTTC
    CTGAATCAGT
    358 OSBP ILMN_170637   130- TTCTCTTCCTTCACCATCTGCAC
    6.1   229 TACATTTCTGGCTGATCCCAATC
    AGATTCCCGCTAATGGAAGAAGT
    TTAGAATCTTTCAGGTGGAATAA
    AGTCACAT
    359 FAM105 NM_138348.4  2537- TGCAGATGGTGTTCACATGAACC
    B  2636 GGAGACATCACTCTTTAGGATTC
    TACTGGCAGCCCCTGAATTGGCT
    CAACGTTTGTGGAGGTGGTATTT
    CCCTGAAG
    360 P2RY10 NM_198333.1   972- TTACACCATGGTAAAGGAAACC
     1071 ATCATTAGCAGTTGTCCCGTTGT
    CCGAATCGCACTGTATTTCCACC
    CTTTTTGCCTGTGCCTTGCAAGT
    CTCTGCTGC
    361 PACS1 NM_018026.3  3830- CGCTGTCTTCGTGGCTTCCACCC
     3929 TTGTTAATGATGCTCCTGCCTCT
    GCCTCCCAGCCCCTCACCCAGCA
    CAGCTCTGCCTGGACTTGGAGAG
    ATGGGAGG
    362 PANK2 NM_153640.2   824- AGTGGATAAACTAGTACGAGAT
      923 ATTTATGGAGGGGACTATGAGA
    GGTTTGGACTGCCAGGCTGGGCT
    GTGGCTTCAAGCTTTGGAAACAT
    GATGAGCAAG
    363 PDCD10 NM_145859.1   901- AAGAGATGTACTTCTCAGTGGCA
     1000 GTATTGAACTGCCTTTATCTGTA
    AATTTTAAAGTTTGACTGTATAA
    ATTATCAGTCCCTCCTGAAGGGA
    TCTAATCC
    364 PDGFD NM_033135.3  3394- CCTGTGAAAACATCAGTTTCCTG
     3493 TACCAAAGTCAAAATGAACGTTA
    CATCACTCTAACCTGAACAGCTC
    ACAATGTAGCTGTAAATATAAAA
    AATGAGAG
    365 PDSS1 NM_014317.3  1199- CATGAAGCAATAAGAGAGATCA
     1298 GTAAACTTCGACCATCCCCAGAA
    AGAGATGCCCTCATTCAGCTTTC
    AGAAATTGTACTCACAAGAGAT
    AAATGACAAC
    366 PELP1 NM_014389.2  1989- TGGCCCCGTCTCCTCGCTGCCCA
     2088 CCTCCTCTTGCCTGTGCCCTGCA
    AGCCTTCTCCCTCGGCCAGCGAG
    AAGATAGCCTTGAGGTCTCCTCT
    TTCTGCTC
    367 PFAS NM_012393.2  5109- CATCCCTAGATCCTAACCCTTTA
     5208 GTATGCTGGAATTCTACTCTTCA
    CTTACTGCATTGACTGTTGTTGA
    TTAGTTATTATTGCAAAGCACTG
    TCACCGGC
    368 PFDN5 NM_145897.2   232- ATCGATGTGGGAACTGGGTACTA
      331 TGTAGAGAAGACAGCTGAGGAT
    GCCAAGGACTTCTTCAAGAGGA
    AGATAGATTTTCTAACCAAGCAG
    ATGGAGAAAA
    369 PFDN5  NM_145897.2   331- ATCCAACCAGCTCTTCAGGAGAA
    b   430 GCACGCCATGAAACAGGCCGTC
    ATGGAAATGATGAGTCAGAAGA
    TTCAGCAGCTCACAGCCCTGGGG
    GCAGCTCAGG
    370 PGK1 NM_000291.3  1122- GTCCTGAAAGCAGCAAGAAGTA
     1221 TGCTGAGGCTGTCACTCGGGCTA
    AGCAGATTGTGTGGAATGGTCCT
    GTGGGGGTATTTGAATGGGAAG
    CTTTTGCCCG
    371 PHF8 NM_015107.2  5704- ATCAAGGTTTAGAACACCATGAG
     5803 ATAGTTACCCCTGATCTCCAGTC
    CCTAGCTGGGGGCTGGACAGGG
    GGAAGGGAGAGAGGATTTCTAT
    TCACCTTTAA
    372 PHLPP2 NM_015020.3  7601- CCAGTTGGGTGTGGCAGATCTAC
     7700 TGAATATCAAATGATGCTCTTCT
    TCCCATGTAGACCTTCAGCAAAA
    GCCGGTACTTGGAAGCCACAGG
    CTCACCTTC
    373 PHRF1 NM_020901.3  5239- GGGAAATGGGGGGCATCACCAT
     5338 GCCTGCCGTCGGGTTCCTGCGCT
    GACACCTGGTCTGTGCACCTGTG
    TTGCTCACAGTTGAAAACTGGAC
    ACTTTTGTA
    374 PI4K2A NM_018425.3  3886- TCCATGGAATTGCTGAGACGTGG
     3985 CTCCTGGGGCTATTTCTCCCTAA
    TAAAGGATGATCCAGGTCCTCAT
    TTCCAAAGTCCCAATGCTCTGAA
    AACCAAAA
    375 PIK3CD NM_005026.3  4799- GAGCCAGAAGTAGCCGCCCGCT
     4898 CAGCGGCTCAGGTGCCAGCTCTG
    TTCTGATTCACCAGGGGTCCGTC
    AGTAGTCATTGCCACCCGCGGGG
    CACCTCCCT
    376 PIM2 NM_006875.3  1947- TTTTTGGGGGATGGGCTAGGGGA
     2046 AATAAGGCTTGCTGTTTGTTCTC
    CTGGGGCGCTCCCTCCAACTTTT
    GCAGATTCTTGCAACCTCCTCCT
    GAGCCGGG
    377 PLAC8 NM_00113071   289- CTGATATGAATGAATGCTGTCTG
    5.1   388 TGTGGAACAAGCGTCGCAATGA
    GGACTCTCTACAGGACCCGATAT
    GGCATCCCTGGATCTATTTGTGA
    TGACTATAT
    378 PLEKHG NM_015432.3  6365- CCAGTTGTGGGTTAAGAATAGGC
    4  6464 TAGAGCAGACATTGGGTGTTTCC
    ATGCTGTAGGCTGGTGGGGGACC
    ATGTGCCTCTAGGCAGTGACTAG
    GGTGCCCC
    379 POLR2A NM_000937.4  6539- CCCCTGCCTGTCCCCAAATTGAA
     6638 GATCCTTCCTTGCCTGTGGCTTG
    ATGCGGGGCGGGTAAAGGGTAT
    TTTAACTTAGGGGTAGTTCCTGC
    TGTGAGTGG
    380 PPP1R XM_927029.1  4342- CAGAACCTCCTCAGTTCCTTCAC
    3E  4441 AGTGCAACCCTGTGTACTTGGCC
    CGCAACCCAATAGTATTGTGCCT
    CACTTCACCTTCCATGGGCAACT
    GCCCTCCC
    381 PPP2R NM_178588.1   941- ACAGCACCCTCACGGAACCAGT
    5C  1040 GGTGATGGCACTTCTCAAATACT
    GGCCAAAGACTCACAGTCCAAA
    AGAAGTAATGTTCTTAAACGAAT
    TAGAAGAGAT
    382 PPP6C NM_002721.4  1536- TTAAGAAATTTCAGCAGCAAAGT
     1635 TGTTATTCAGTGGGCACGATGGA
    CTCCAAATGCCTCAAGTTATGTA
    TACCTGTCCCAGATGTAAACTTC
    ATTGTCCT
    383 PRG2 NM_002728.4   257- CTCTGGAAGTGAAGATGCCTCCA
      356 AGAAAGATGGGGCTGTTGAGTCT
    ATCTCAGTGCCAGATATGGTGGA
    CAAAAACCTTACGTGTCCTGAGG
    AAGAGGAC
    384 PRPF3 NM_004698.2  2116- CCTACAGAGAACATGGCTCGTGA
     2215 GCATTTCAAAAAGCATGGGGCTG
    AACACTACTGGGACCTTGCGCTG
    AGTGAATCTGTGTTAGAGTCCAC
    TGATTGAG
    385 PRPF8 NM_006445.3  7091- ACTCTGCGGATCGGGAGGACCTG
     7190 TATGCCTGACCGTTTCCCTGCCT
    CCTGCTTCAGCCTCCCGAGGCCG
    AAGCCTCAGCCCCTCCAGACAGG
    CCGCTGAC
    386 C22orf NM_173566.2 10495- CCCGTTGAGCTGGCCATCTAGTG
    30 10594 CAGTGTGCTCTCAGATTCCATGT
    TTGTTGATTGTGTGTCTTCACAA
    GCCCCTCTCTGGTGCTGAATTGG
    ATTTGAAT
    387 BAT2D1 NM_015172.3  9620- AGAACAGTGAGTACCTAGAACT
     9719 GTGCCACTAATTAAAGGAAATCC
    TAAGAAGGTGCATTTCTTTACAG
    AGCTGTGTCATGCCATCCTTTGG
    GCCCTCTGC
    388 PRRG4 NM_024081.5   761- GAAGACCTGAGGAGGCTGCCTT
      860 GTCTCCATTGCCGCCTTCTGTGG
    AGGATGCAGGATTACCTTCTTAT
    GAACAGGCAGTGGCGCTGACCA
    GAAAACACAG
    389 PSMA3 NM_152132.2   422- CTTTGGCTACAACATTCCACTAA
      521 AACATCTTGCAGACAGAGTGGCC
    ATGTATGTGCATGCATATACACT
    CTACAGTGCTGTTAGACCTTTTG
    GCTGCAGT
    390 PSMA4 NM_002789.3   541- GTACATTGGCTGGGATAAGCACT
      640 ATGGCTTTCAGCTCTATCAGAGT
    GACCCTAGTGGAAATTACGGGG
    GATGGAAGGCCACATGCATTGG
    AAATAATAGC
    391 PSMA4  NM_002789.4   879- GAGGAAGAAGAAGCCAAAGCTG
    b   978 AGCGTGAGAAGAAAGAAAAAGA
    ACAGAAAGAAAAGGATAAATAG
    AATCAGAGATTTTATTACTCATT
    TGGGGCACCAT
    392 PSMA6 NM_002791.2   218- GGTCGGCTCTACCAAGTAGAATA
      317 TGCTTTTAAGGCTATTAACCAGG
    GTGGCCTTACATCAGTAGCTGTC
    AGAGGGAAAGACTGTGCAGTAA
    TTGTCACAC
    393 PSMA6  NM_002791.2   866- GATGCTCACCTTGTTGCTCTAGC
    b   965 AGAGAGAGACTAAACATTGTCG
    TTAGTTTACCAGATCCGTGATGC
    CACTTACCTGTGTGTTTGGTAAC
    AACAAACCA
    394 PSMB1 NM_002793.3   687- GCGGCTGGTGAAAGATGTCTTCA
      786 TTTCTGCGGCTGAGAGAGATGTG
    TACACTGGGGACGCACTCCGGAT
    CTGCATAGTGACCAAAGAGGGC
    ATCAGGGAG
    395 PSMB7 NM_002799.2   421- GTTACATTGGTGCAGCCCTAGTT
      520 TTAGGGGGAGTAGATGTTACTGG
    ACCTCACCTCTACAGCATCTATC
    CTCATGGATCAACTGATAAGTTG
    CCTTATGT
    396 PSMB8 NM_004159.4  1216- ACTCACAGAGACAGCTATTCTGG
     1315 AGGCGTTGTCAATATGTACCACA
    TGAAGGAAGATGGTTGGGTGAA
    AGTAGAAAGTACAGATGTCAGT
    GACCTGCTGC
    397 PSMC1 NM_002802.2  1487- CATCCTGTGTCTTTTGGAGTACG
     1586 ATGTGTAAGTGCCCATTGGGTGG
    CCTGTTGGTCACTGTGCAGCAGT
    CTGCTTCCCAATAAAGCGTGCTC
    TTTCACAA
    398 PSMD7 NM_002811.4  1231- GAGCTCTCTGCCTCCGGTCACTC
     1330 TTGCTGTGGTGCTACGTGGAAGT
    GAATGGAGACTGATCTCAAATCT
    GAACTGCAGCTTTCGCTGCTGTG
    AGTTGGGG
    399 PSME3 NM_005789.3  3203- TCCCGAGTGATACCCATGAACTG
     3302 CCAGTAGAGGCTGCTATCGTTCC
    ATGTGTAAGGAATGAACTGGTTC
    AAGGCGCGTCCTACCCAGTCATT
    TTCTTTAC
    400 PTGDR NM_000953.2  2341- TATGATGACTGAAAGGGAAAAG
     2440 TGGAGGAAACGCAGCTGCAACT
    GAAGCGGAGACTCTAAACCCAG
    CTTGCAGGTAAGAGCTTTCACCT
    TTGGTAAAAGA
    401 PTGDR2 NM_004778.1  1836- GCCAATGCTTACTGCGCTAGACG
     1935 CTTCATCCCACAATCTTAAGGGG
    CAGCTTCTATTAGCCAGTCTTTA
    CAGCTGAGCACATTCTGGCTCAG
    GGAGGTTA
    402 PUM1 NM_00102065  3753- AAATGTTCTAGTGTAGAGTCTGA
    8.1  3852 GACGGGCAAGTGGTTGCTCCAG
    GATTACTCCCTCCTCCAAAAAAG
    GAATCAAATCCACGAGTGGAAA
    AGCCTTTGTA
    403 QTRTD1 NM_024638.3  2508- TTAGATTAGAGTCATAGCCTTAA
     2607 TAGCCCTAGTTGTCATCCTGGGA
    GACAGGCAACAGTAGAGATATT
    TGAGAGCCTAAAGAGAGGTTTG
    GCCTGTGGGT
    404 RAB10 NM_016131.4  3593- AGGGCTTTGCCCCTTTTCTGTAA
     3692 GTCTCTTGGGATCCTGTGTAGAA
    GCTGTTCTCATTAAACACCAAAC
    AGTTAAGTCCATTCTCTGGTACT
    AGCTACAA
    405 RAG1 NM_000448.2  2301- CAGTCTACATTTGTACTCTTTGT
     2400 GATGCCACCCGTCTGGAAGCCTC
    TCAAAATCTTGTCTTCCACTCTA
    TAACCAGAAGCCATGCTGAGAAC
    CTGGAACG
    406 RASSF5 NM_182664.2  3061- TCGTCCTGCATGTCTCTAACATT
     3160 AATAGAAGGCATGGCTCCTGCTG
    CAACCGCTGTGAATGCTGCTGAG
    AACCTCCCTCTATGGGGATGGCT
    ATTTTATT
    407 RBM14 NM_006328.3  2661- TGGTATGTATCCAAGTCCCTGCT
     2760 GACCACTAATGTTCTAGCTGATG
    GTGAGCGGCACAGTCCCACTTCC
    CCATCTCCCCAAGTAGGTGGTGT
    TAGAAAAC
    408 RBM4B NM_031492.3  1557- TAGGAGTTGAATCCTTCTCCCTG
     1656 CCTACCTGCAGCATCTCCTTTCC
    CTTTAAAATGACCATGTAGTGGC
    AAGCAGCCTTTTACTCTTCTGTT
    AGCTCTGG
    409 RBX1 NM_014248.3   158- GATATTGTGGTTGATAACTGTGC
      257 CATCTGCAGGAACCACATTATGG
    ATCTTTGCATAGAATGTCAAGCT
    AACCAGGCGTCCGCTACTTCAGA
    AGAGTGTA
    410 RELA NM_021975.2   361- GATGGCTTCTATGAGGCTGAGCT
      460 CTGCCCGGACCGCTGCATCCACA
    GTTTCCAGAACCTGGGAATCCAG
    TGTGTGAAGAAGCGGGACCTGG
    AGCAGGCTA
    411 REPIN1 NM_014374.3  2491- TGTGTCCAGGCTCTTGTCTGAAC
     2590 ACCGCAGCCCCTCCTTCGCTCCT
    TCCAGAGCTCAGCATGTCACGGC
    AAGGACTGCCGCATTGGTGATGG
    AGGGCCAG
    412 REPS1 NM_00112861  1289- CACCAACCAGTACTCTTTTAACC
    7.2  1388 ATGCATCCTGCTTCTGTCCAGGA
    CCAGACAACAGTACGAACTGTA
    GCATCAGCTACAACTGCCATTGA
    AATTCGTAG
    413 RERE NM_00104268  5916- AACCCTCGACCCGAAACCCTCAC
    2.1  6015 CAGATAAACTACAGTTTGTTTAG
    GAGGCCCTGACCTTCATGGTGTC
    TTTGAAGCCCAACCACTCGGTTT
    CCTTCGGA
    414 RERE  NM_012102.3  7734- GCATTCTTGTTAGCTTTGCTTTT
    b  7833 CTCCCCATATCCCAAGGCGAAGC
    GCTGAGATTCTTCCATCTAAAAA
    ACCCTCGACCCGAAACCCTCACC
    AGATAAAC
    415 RFWD2 NM_022457.6  2606- TTTTCTTTTCCCTCCTTTATGAC
     2705 CTTTGGGACATTGGGAATACCCA
    GCCAACTCTCCACCATCAATGTA
    ACTCCATGGACATTGCTGCTCTT
    GGTGGTGT
    416 RFX1 NM_002918.4  4187- ATAAAAATCACTATTTTGTGTGC
     4286 TCCGCGTGCTATAGCTTTTGGGG
    CGGCCCTGCCCAGTCCCCGTGCC
    CACGGGGCTCCCTCTCCCGGTGG
    TGAAAGTG
    417 RHOB NM_004040.3  1707- GGGAGGAGGGAGGATGCGCTGT
     1806 GGGGTTGTTTTTGCCATAAGCGA
    ACTTTGTGCCTGTCCTAGAAGTG
    AAAATTGTTCAGTCCAAGAAACT
    GATGTTATT
    418 RHOG NM_001665.3  1045- CTTTCCACACAGTTGTTGCTGCC
     1144 TATTGTGGTGCCGCCTCAGGTTA
    GGGGCTCTCAGCCATCTCTAACC
    TCTGCCCTCGCTGCTCTTGGAAT
    TGCGCCCC
    419 RHOU NM_021205.5  4174- TTGACAGACTCAAGAGAAACTA
     4273 CCCAGGTATTACACAAGCCAAA
    ATGGGAGCAAGGCCTTCTCTCCA
    GACTATCGTAACCTGGTGCCTTA
    CCAAGTTGTG
    420 RNASE2 NM_002934.2   331- TGACCTGTCCTAGTAACAAAACT
      430 CGCAAAAATTGTCACCACAGTGG
    AAGCCAGGTGCCTTTAATCCACT
    GTAACCTCACAACTCCAAGTCCA
    CAGAATAT
    421 RNF114 NM_018683.3  2246- AATTCAGATCATCTCAGAAGTCT
     2345 GGAGGGAAATCTGGCGAAACCT
    TCGTTTGAGGGACTGATGTGAGT
    GTATGTCCACCTCACTGGTGGCA
    CCGAGAAAC
    422 RNF19B NM_153341.3  2222- CCCCAGAGCCCAAGGTGCACCG
     2321 AGCCCAAGTGCCCATATGAACCT
    CTCTGCCCTAGCCGAGGGACAAA
    CTGTCTTGAAGCCAGAAGGTGGA
    GAAGCCAGA
    423 RNF214 NM_207343.3  2068- ACCTGTAAGCTATGTCTAATGTG
     2167 CCAGAAACTCGTCCAGCCCAGTG
    AGCTGCATCCAATGGCGTGTACC
    CATGTATTGCACAAGGAGTGTAT
    CAAATTCT
    424 RNF34 NM_025126.3  1619- CTTCTGTCCTCTTTGGATGAGAT
     1718 CAGTGTCCACAAGTGGCCGACAT
    GGAACATGCTGAGCAGTGGCTCC
    TCTGAATGTTCACTTTATTAGTC
    ATGTATAT
    425 C20orf NM_080748.2   274- CTCAGGATCGGAATGCGGGGTC
    52   373 GAGAGCTGATGGGCGGCATTGG
    GAAAACCATGATGCAGAGTGGC
    GGCACCTTTGGCACATTCATGGC
    CATTGGGATGG
    426 RPL26 NM_016093.2     4- CACTCAGGGTCTGAGGCAGCTAG
    L1   103 TAGCCGGAGGGTCACCATGAAG
    TTCAATCCCTTCGTTACCTCGGA
    CCGCAGTAAAAACCGCAAACGT
    CACTTCAATG
    427 RPL3 NM_00103385  1072- AGAAGAAAGCATTCATGGGACC
    3.1  1171 ACTGAAGAAAGACCGAATTGCA
    AAGGAAGAAGGAGCTTAATGCC
    AGGAACAGATTTTGCAGTTGGTG
    GGGTCTCAATA
    428 RPL31 NM_000993.4    20- CTTGCAACTGCGGCTTTCCTTCT
      119 CCCACAATCCTTCGCGCTCTTCC
    TTTCCAACTTGGACGCTGCAGAA
    TGGCTCCCGCAAAGAAGGGTGGC
    GAGAAGAA
    429 RPL34 NM_000995.3   471- ACCTCACCTCAGCTTGAGAGAGC
      570 CAGTTGTGTGCATCTCTTTCCAG
    TTTTGCATCCAGTGACGTCTGCT
    TGGCATCTTGAGATTGTTATGGT
    GAGAGTAT
    430 RPL39L NM_052969.1   139- GCGGGTTCGGGTCGGTGACACGC
      238 AGACCTGAGGGAGCTGGGCCCG
    CCTTTTCCGCCCGCGCCCCAGGC
    CCTTGCAGATCGAGATTTGCGTC
    CTAGAGTGG
    431 KIAA NM_015203.4  4795- CCCCTTGGGTCCCTCACACAGAG
    0460  4894 ACACCATCAGCCGGAGTGGTATA
    ATCTTACGGAGTCCCCGGCCAGA
    CTTTCGGCCTAGGGAACCTTTTC
    TCAGCAGA
    432 RPS24 NM_001026.4   482- ATGAAGAAAGTCAGGGGGACTG
      581 CAAAGGCCAATGTTGGTGCTGGC
    AAAAAGCCGAAGGAGTAAAGGT
    GCTGCAATGATGTTAGCTGTGGC
    CACTGTGGAT
    433 RPS27L NM_015920.3   241- TAAAATGTCCAGGTTGCTACAAG
      340 ATCACCACGGTTTTCAGCCATGC
    TCAGACAGTGGTTCTTTGTGTAG
    GTTGTTCAACAGTGTTGTGCCAG
    CCTACAGG
    434 RPS6 NM_001010.2   172- GAATGGAAGGGTTATGTGGTCCG
      271 AATCAGTGGTGGGAACGACAAA
    CAAGGTTTCCCCATGAAGCAGGG
    TGTCTTGACCCATGGCCGTGTCC
    GCCTGCTAC
    435 RSL24 NM_016304.2  1232- TGGAGTGACACTACACTCTAGAA
    D1  1331 TTTCCACTTTGGAGAATACTCAG
    TTCCAACTTGTGATTCCTGATAG
    AACAGACTTTACTTTTCTAGCCC
    AGCATTGA
    436 RWDD1 NM_00100746   998- TGGAGGATGATGAAGATGATCC
    4.2  1097 AGACTATAATCCTGCTGACCCAG
    AGAGTGACTCAGCTGACTAATGG
    ACTGTCCCCATCTGCAGAGAGGC
    TTGACTGCC
    437 RXRA NM_002957.5  5301- AGTAATTTTTAAAGCCTTGCTCT
     5400 GTTGTGTCCTGTTGCCGGCTCTG
    GCCTTCCTGTGACTGACTGTGAA
    GTGGCTTCTCCGTACGATTGTCT
    CTGAAACA
    438 S100A NM_005621.1   261- CAAGATGAACAGGTCGACTTTCA
    12b   360 AGAATTCATATCCCTGGTAGCCA
    TTGCGCTGAAGGCTGCCCATTAC
    CACACCCACAAAGAGTAGGTAG
    CTCTCTGAA
    439 S100A8 NM_002964.4   366- GTTAACTTCCAGGAGTTCCTCAT
      465 TCTGGTGATAAAGATGGGCGTGG
    CAGCCCACAAAAAAAGCCATGA
    AGAAAGCCACAAAGAGTAGCTG
    AGTTACTGGG
    440 SAMSN1 NM_022136.3  1024- ACCTGAGCCCCTATCCTTGAGCT
     1123 CAGACATCTCCTTAAATAAGTCA
    CAGTTAGATGACTGCCCAAGGG
    ACTCTGGTTGCTATATCTCATCA
    GGAAATTCA
    441 SAP NM_024545.3  3091- GATCTCCACCGAATAAACGAACT
    130b  3190 GATACAGGGAAATATGCAGAGG
    TGTAAACTTGTGATGGATCAAAT
    CAGTGAAGCCAGAGACTCCATG
    CTTAAGGTTT
    442 SAP130 NM_024545.3  3720- CGGTTCTTCTGCCTGACCTTCAA
     3819 ATGCCCATGTTGGCCTTTTACAG
    CAGTGCCACGGCACCAAGCGAG
    CTGCCACATCTCACACTCTAAAG
    GGTTTGAAC
    443 CIP29 NM_033082.3   622- AACTGGAACCACAGAGGATACA
      721 GAGGCAAAGAAGAGGAAAAGAG
    CAGAGCGCTTTGGGATTGCCTGA
    TGAAAAGTTCCTGATACTTTCTG
    TTCTCCAGTG
    444 SFRS2 NM_004719.2  4203- AGTTCTTCTCATGTAAGTAATAA
    IP  4302 CATGAGTACACCAGTTTTGCCTG
    CTCCGACAGCAGCCCCAGGAAA
    TACGGGAATGGTTCAGGGACCA
    AGTTCTGGTA
    445 SFRS15 NM_020706.2  3635- GAGAGAAGGAAGAAGCCCGAGG
     3734 AAAGGAAAAGCCTGAGGTGACA
    GACAGGGCAGGTGGTAACAAAA
    CCGTTGAACCTCCCATTAGCCAA
    GTGGGAAATGT
    446 RBM16 NM_014892.4  4111- TGATTATTTTGAAGGGGCCACTT
     4210 CTCAACGAAAAGGTGATAATGT
    GCCTCAGGTTAATGGTGAAAATA
    CAGAGAGACATGCTCAGCCACC
    ACCTATACCA
    447 SDHA NM_004168.3  2042- GTCACTCTGGAATATAGACCCGT
     2141 GATCGACAAAACTTTGAACGAG
    GCTGACTGTGCCACCGTCCCGCC
    AGCCATTCGCTCCTACTGATGAG
    ACAAGATGT
    448 SEC24C NM_198597.2  4194- AGGCAGAGGCAGCTGGAGCGCC
     4293 GTTCTCTCCTGCTGGGACACCGC
    TTGGGCTTTGGTATTGACTGAGT
    GGCTGACAGTTATCTTCCAACCC
    CAACTGGCT
    449 SEMG1 NM_003007.2  1291- GGCAGACACCAACATGGATCTC
     1390 ATGGGGGATTGGATATTGTAATT
    ATAGAGCAGGAAGATGACAGTG
    ATCGTCATTTGGCACAACATCTT
    AACAACGACC
    450 SERPIN NM_005024.1   891- AGACAGTTATGATCTCAAGTCAA
    B10   990 CCCTGAGCAGTATGGGGATGAGT
    GATGCCTTCAGCCAAAGCAAAG
    CTGATTTCTCAGGAATGTCTTCA
    GCAAGAAAC
    451 SETD2 NM_014159.6  7956- TGGTTAGAAGCCATCAGAGGTGC
     8055 AAGGGCTTAGAAAAGACCCTGG
    CCAGACCTGACTCCACTCTTAAA
    CCTGGGTCTTCTCCTTGGCGGTG
    CTGTCAGCG
    452 SFMBT1 NM_00100515  2844- AAGGATCGAAGTTGCTGAAAGG
    8.2  2943 CTTCACCTGGACAGTAACCCCTT
    GAAGTGGAGTGTGGCAGACGTT
    GTGCGGTTCATCAGATCCACTGA
    CTGTGCTCCA
    453 SFPQ NM_005066.2  2800- GGTTATGTAAGCAAAGCTGAACT
     2899 GTAAATCTTCAGGAATATGTATT
    AAGATTGTGGAATGGGTGTAAG
    ACAATTGGTAGGGGGTGAAAGT
    GGGTTTGATT
    454 SGK1 NM_005627.3  1622- ACGAGCGTTAGAGTGCCGCCTTA
     1721 GACGGAGGCAGGAGTTTCGTTA
    GAAAGCGGACGCTGTTCTAAAA
    AAGGTCTCCTGCAGATCTGTCTG
    GGCTGTGATG
    455 SGK NM_005627.3   173- GAAGCAGAGGAGGATGGGTCTG
      272 AACGACTTTATTCAGAAGATTGC
    CAATAACTCCTATGCATGCAAAC
    ACCCTGAAGTTCAGTCCATCTTG
    AAGATCTCC
    456 SGK1b NM_005627.3  1814- GGATATGCTGTGTGAACCGTCGT
     1913 GTGAGTGTGGTATGCCTGATCAC
    AGATGGATTTTGTTATAAGCATC
    AATGTGACACTTGCAGGACACTA
    CAACGTGG
    457 SH2D3C NM_170600.2  2795- AGCACCCCAAGGACACTGTGATC
     2894 AACCCGAGAATGTTCTGGGTTCA
    ACTCAAGCATCTCCCTTGCACCT
    CCAGGGTCCTGCGTGGACTCTGG
    GTTCCATC
    458 SIK1 NM_173354.3  4185- TCGCTCATAAAGAAGTTTTTGGG
     4284 ATGGGAGAGAATCCAGACCATC
    TTGGGGCAGCCAGGCCCTTGCCT
    TCATTTTTACAGAGGTAGCACAA
    CTGATTCCA
    459 SIN3A NM_015477.2  4666- TTTATTCCTGACGATTCCCTTGC
     4765 TGCCTACCCTTTTCTCTCCTCTG
    GTTCTCAACCTCAACGAGTTCAA
    ATCAGTTGTCCTTTTTAGCTCCC
    GTGGAACT
    460 SLAMF8 NM_020125.2  3173- AACAAATATTGATTGAGGGCGCT
     3272 GCATGTGCTGGGTACATTTCTTG
    GCACTTGGGAATCAGTAGTCAAG
    CGAAACCCTTGCCTTTGAGAGTT
    TATGGTCT
    461 SLC11 NM_000578.3  2072- GCAGGATAGAGTGGGACAGTTC
    A1  2171 CTGAGACCAGCCAACCTGGGGG
    CTTTAGGGACCTGCTGTTTCCTA
    GCGCAGCCATGTGATTACCCTCT
    GGGTCTCAGT
    462 SLC15 NM_021082.3  2548- AACTCATTAAAACTTGTGCAGTG
    A2  2647 TTGCTGGAGCTGGCCTGGTGTCT
    CCAAATGACCATGAAAATACAC
    ACGTATAATGGAGATCATTCTCT
    GTGGGTATG
    463 SLC25 NM_000387.5  1511- ATCTTCTTCAGTCCCTAGCCAGG
    A20  1610 AATACCCATTTGATTTCCAGGGT
    GCCATCTAATCCTGGGCTGTACA
    TGTGGATATGGACTTGAGGCCCA
    CCTCTGTG
    464 SLC25 NM_016612.2  1217- TCCAGCCCCTTGCCCTCTCCTCA
    A37  1316 CACGTAGATCATTTTTTTTTTGC
    AGGGTGCTGCCTATGGGCCCTCT
    GCTCCCCAATGCCTTAGAGAGAG
    GAGGGGAC
    465 SLC45 NM_033102.2  2455- AGTTTCTAGGATGAAACACTCCT
    A3  2554 CCATGGGATTTGAACATATGAAA
    GTTATTTGTAGGGGAAGAGTCCT
    GAGGGGCAACACACAAGAACCA
    GGTCCCCTC
    466 SLC6 NM_003044.4  3220- GATATTGCTAACTGATCACAGAT
    A12  3319 TCTTTCCCACCTCACAATCCTTC
    CGAATGTGCTCCAGGCAGCACCA
    TTTGCCATCCTGCTTCTAACGCA
    AACCCCTG
    467 SLC6 NM_003043.5  4438- ATTCTAGACCAAAGACACAGGC
    A6  4537 AGACCAAGTCCCCAGGCCCCGCC
    TGGAAGGAAGTCGTTCCTCAACT
    CTCCCCAAGGCACCTGTCTCCAA
    TCAGAGCCC
    468 SLC9 NM_004252.3  1811- ATTAACATGATTTTCCTGGTTGT
    A3R1  1910 TACATCCAGGGCATGGCAGTGGC
    CTCAGCCTTAAACTTTTGTTCCT
    ACTCCCACCCTCAGCGAACTGGG
    CAGCACGG
    469 C14orf NM_031210.5    46- CGGCCTCAGCAGCGAGAGGTGC
    156   145 TGCGGCGCTGCGTAGAAGTATCA
    ATCAGCCGGTTGCTTTTGTGAGA
    AGAATTCCTTGGACTGCGGCGTC
    GAGTCAGCT
    470 SMAR NM_003074.3  5281- CAATGGCCAGGGTTTTACCTACT
    CC1  5380 TCCTGCCAGTCTTTCCCAAAGGA
    AACTCATTCCAAATACTTCTTTT
    TTCCCCTGGAGTCCGAGAAGGAA
    AATGGAAT
    471 SNORA NR_002984.1    30- CTCGTGGGACTCTAGAGGGAGTC
    56   129 AGTCTGCAACAGTAAGTGGTGA
    GTTCTTCTGTCCAGCGTCAGTAT
    TTTGATGGTGGCTTTAGACTTGC
    CAGATAACA
    472 SNX11 NM_152244.1  2261- CCCTCCCTGTCGCCCACTCCTCC
     2360 CTCCTCTGGCTATCCTACCCTGT
    CTGTGGGCTCTTTTACTACCAGC
    CTATGCTGTGGGACTGTCATGGC
    ATTTAGTT
    473 SOCS1 NM_003745.1  1026- TTAACTGTATCTGGAGCCAGGAC
     1125 CTGAACTCGCACCTCCTACCTCT
    TCATGTTTACATATACCCAGTAT
    CTTTGCACAAACCAGGGGTTGGG
    GGAGGGTC
    474 SP2 NM_003110.5  2701- GGGGGCAATGATGAGCATATGAA
     2800 TTTTTTCTCACTCTAGCAATTCC
    CTTTTCTAAATGACACAGCATTT
    AAACTCAAATCTGGATTCAGATA
    ACAGCACC
    475 SPA17 NM_017425.3   176- CAAGGATTTGGGAATCTTCTTGA
      275 AGGGCTGACACGCGAGATTCTG
    AGAGAGCAACCGGACAATATAC
    CAGCTTTTGCAGCAGCCTATTTT
    GAGAGCCTTC
    476 SPEN NM_015001.2 11995- GTATTGCCCACTCATTTGTATAA
    12094 GTGCGCTTCGGTACAGCACGGGT
    CCTGCTCCCGCGATGTGGAAGTG
    TCACACGGCACCTGTACAAAAA
    GACTGGCTA
    477 SPINK5 NM_006846.3  2596- GAGCAATGACAAAGAGGATCTG
     2695 TGTCGTGAATTTCGAAGCATGCA
    GAGAAATGGAAAGCTTATCTGC
    ACCAGAGAAAATAACCCTGTTCG
    AGGCCCATAT
    478 SPN NM_003123.3  2346- AGTGCCTGCGTGTGTCCACTCGT
     2445 GGGTGTGGTTTGTGTGCAAGAGC
    TGAGGATTTGGCGATGCTTGGGA
    GGGGTAGTTGTGGGTACAGACG
    GTGTGGGGG
    479 SREB NM_00100529  3985- CCCCTCCTTGCTCTGCAGGCACC
    F1 1.2  4084 TTAGTGGCTTTTTTCCTCCTGTG
    TACAGGGAAGAGAGGGGTACATT
    TCCCTGTGCTGACGGAAGCCAAC
    TTGGCTTT
    480 SFRS4 NM_005626.4  2080- TACTCATGGCCCACAGTAGAATA
     2179 TCCAAAACGCCTTGGCTTTCAGG
    CCTGGCCTTTCCTACAGGGAGCT
    CAGTAACCTGGACGGCTCTAAGG
    CTGGAATG
    481 ST6GA NM_003032.2  3783- CTGATTTTAATCTTCGAATCATG
    L1  3882 ACACTGAGTGCAGAGGAGGTGG
    CATTCCGACAGCAGGACATACAT
    GTTGGTGTGAAGACTGGGACGA
    CACTGGGTAG
    482 STAG3 NM_012447.3  3424- AAGTGCCTGCAGCATGTCTCCCA
     3523 GGCACCTGGCCATCCCTGGGGCC
    CAGTCACCACCTACTGCCACTCC
    CTCAGCCCTGTGGAGAACACAGC
    AGAGACCA
    483 STAMBP NM_006463.4  1926- TTTCCTGTGGTTTATGGCAATAT
     2025 GAATGGAGCTTATTACTGGGGTG
    AGGGACAGCTTACTCCATTTGAC
    CAGATTGTTTGGCTAACACATCC
    CGAAGAAT
    484 STAT6 NM_003153.4  3725- ACTGTGCCCAAGTGGGTCCAAGT
     3824 GGCTGTGACATCTACGTATGGCT
    CCACACCTCCAATGCTGCCTGGG
    AGCCAGGGTGAGAGTCTGGGTC
    CAGGCCTGG
    485 STIP1 NM_006819.2  1906- CCCGGGGAAGACACAGAGACTC
     2005 GTACCTGCGCTGTTTGTGCCGCC
    GCTGCCTCTGGGCCCTCCCAGCA
    CACGCATGGTCTCTTCACCGCTG
    CCCTCGAGT
    486 STK16 NM_003691.2  1420- GGGGTAGCGGGGTCAGGACAATC
     1519 ATCTCAGTCCTGCATCTTTTCTT
    CTGCTTTCTTCCCTCCAAGAGCA
    AAACCTGGGCAAGGGGACTTAC
    TGAGTGGGG
    487 STK38 NM_007271.3  3269- TTGTCAGTGAAACTACTTTGGAT
     3368 TTTAACCTCTTAGAGGAAGAAAA
    AAGGTTAGGGAAGTGTCAACTCT
    GGATGAAGGTGATGTGTTTGCCT
    CTCAGTCT
    488 STOM NM_004099.5  2953- TTCTGCCTTGTGAATTCGTAGTC
     3052 CAATCAGCTGAAATTAAATCACT
    TGGGAGGGACGCATAGAAGGAG
    CTCTAGGAACACAGTGCCAGTGC
    AGAAGTTTC
    489 SYNJ1 NM_003895.3  4746- CCCTCTGCTCCCGCCCGGCACCA
     4845 GCCCTCCAGTAGATCCTTTCACG
    ACCTTGGCCTCTAAGGCTTCACC
    CACACTGGACTTTACAGAAAGAT
    AACGCCAT
    490 TAPBP NM_003190.4  3397- CTTGCCCTCCCTGGGTCGCAGAC
     3496 GAGGTCGGCCTCGTCATTCCCCG
    CAGACCGCCGCGCGTCCCTCTTG
    TGCGGTTCACCACAGTTGTATTT
    AAGTGATC
    491 TAX1 NM_00107986  2081- CAGCCAGCCTGCTCGAAACTTTA
    BP1 4.2  2180 GTCGGCCTGATGGCTTAGAGGAC
    TCTGAGGATAGCAAAGAAGATG
    AGAATGTGCCTACTGCTCCTGAT
    CCTCCAAGT
    492 TBC1 NM_015188.1  5451- TTCCAAGGAATGCACTAAGCCTT
    D12  5550 CAGTCTTTTTAGACTGACAGTAC
    TGGCAGCTAAAATATTGTACTGT
    ATCTTCTCTTGAGCCCAGTATGT
    AGGAAATA
    493 TBCE NM_00107951  1541- TATGCTGAAAAACCAGCTACTAA
    5.2  1640 CACTGAAGATAAAATACCCTCAT
    CAACTTGATCAGAAAGTCCTGGA
    GAAACAACTGCCGGGCTCCATG
    ACAATTCAA
    494 TBK1 NM_013254.2  1611- ACCAGTCTTCAGGATATCGACAG
     1710 CAGATTATCTCCAGGTGGATCAC
    TGGCAGACGCATGGGCACATCA
    AGAAGGCACTCATCCGAAAGAC
    AGAAATGTAG
    495 TBP NM_003194.4  1441- TGTAAGTGCCCACCGCGGGATGC
     1540 CGGGAAGGGGCATTATTTGTGCA
    CTGAGAACACCGCGCAGCGTGA
    CTGTGAGTTGCTCATACCGTGCT
    GCTATCTGG
    496 TCF20 NM_181492.2  6765- CCAGGCCTGTGTTGCCAGAGCTG
     6864 GCAGTGTGAGCTGTAGGCAGGG
    ACGGGGAGGGACTGTCGCTGTG
    ATCAGAGTGGGTTAAGCTGACCA
    GGAACACCCA
    497 TCF7L2 NM_030756.4  2067- GGCCCACCTGTCCATGATGCCTC
     2166 CGCCACCCGCCCTCCTGCTCGCT
    GAGGCCACCCACAAGGCCTCCG
    CCCTCTGTCCCAACGGGGCCCTG
    GACCTGCCC
    498 TCP1 NM_030752.2   254- GTGTTCGGTGACCGCAGCACTGG
      353 GGAAACGATCCGCTCCCAAAAC
    GTTATGGCTGCAGCTTCGATTGC
    CAATATTGTAAAAAGTTCTCTTG
    GTCCAGTTG
    499 TFCP2 NM_005653.4  2271- CCTCTGAAAACGGCCCTCTTGAA
     2370 GGGGGATATGAATGGAGATTTG
    AAGGTCTGCAAGAACCTGACTCG
    TCTGACTGTGTGTGGAGGAGTCC
    AGGCCATGG
    500 TGIF1 NM_003244.2  1041- ACCTCAACCAGGACTTCAGTGGA
     1140 TTTCAGCTTCTAGTGGATGTTGC
    ACTCAAACGGGCTGCAGAGATG
    GAGCTTCAGGCAAAACTTACAGC
    TTAACCCAT
    501 TGIF1b NM_173208.1   691- CCCCGGGATCAGTTTTGGCTCGT
      790 CCATCAGTGATCTGCCATACCAC
    TGTGACTGCATTGAAAGATGTCC
    CTTTCTCTCTCTGCCAGTCGGTC
    GGTGTGGG
    502 TIAM1 NM_003253.2  5293- CCTAACTCTGCCCACCCTCCTGT
     5392 ACCGTCGACAAGAATGTCCCCTT
    AGGTCGCGCTCTTGCACACACGG
    TTTTGGCAGCTGACTTGGTTCTG
    AAGCCATG
    503 TIMM8B ENST0000050   339- GAATGACAGAAGCAAAGGACTT
    4148.1   438 GTTACTAAGCAGATTTAAGGGTC
    AGTGGGGGAAGGCTATCAACCC
    ATTGTCAGATCAGCATCAGGCTG
    TTATCAAGTC
    504 TM2D2 NM_078473.2  2970- ACCCATCATCCATCTGCCCACAA
     3069 ACCTGGCCAAATGTGATACAACC
    TGAAAACCTGATGGACTAAAGG
    AGTACTATTTAACAATTGATTGC
    CTTTGCACT
    505 TM9SF1 NM_006405.6  1996- CGCTGGTGGTGGCGATCTGTGCT
     2095 GAGTGTTGGCTCCACCGGCCTCT
    TCATCTTCCTCTACTCAGTTTTC
    TATTATGCCCGGCGCTCCAACAT
    GTCTGGGG
    506 CCDC72 NM_015933.4   124- GAGGAGCAGAAGAAACTCGAGG
      223 AGCTAAAAGCGAAGGCCGCGGG
    GAAGGGGCCCTTGGCCACAGGT
    GGAATTAAGAAATCTGGCAAAA
    AGTAAGCTGTTC
    507 TMBIM6 NM_003217.2  2282- CTCTCCCTATTCACAACCAGTGC
     2381 ACAGTTTGACACAGTGGCCTCAG
    GTTCACAGTGCACCATGTCACTG
    TGCTATCCTACGAAATCATTTGT
    TTCTAAGT
    508 TMC8 NM_152468.4  2238- AGGCCAATGCCAGGGCCATCCA
     2337 CAGGCTCCGGAAGCAGCTGGTGT
    GGCAGGTTCAGGAGAAGTGGCA
    CCTGGTGGAGGACCTGTCGCGAC
    TGCTGCCGGA
    509 TMCO1 NM_019026.3   992- TCATTTACATAAGTATTTTCTGT
     1091 GGGACCGACTCTCAAGGCACTGT
    GTATGCCCTGCAAGTTGGCTGTC
    TATGAGCATTTAGAGATTTAGAA
    GAAAAATT
    510 TMEM NM_00110082  7652- AGGAGAATAAATGTTGGAGGGG
    170B 9.2  7751 TAATACACAAAAACAAAGGCAT
    ATTTGATGAAGTACCCTGTGTTA
    TGTGAACACAATTTCCCCTTCTG
    TTAAGACTAT
    511 TMEM NM_00108054  1313- GCTCTGTGAAGGCAATGAGTGTC
    218 6.2  1412 ACTTCCCTCTGCTCTAATAAAGC
    AATAAATAATAGCTAAAGGGCT
    GACTTTCACTTCGAACTCTTGGC
    CACGGCTTT
    512 TMEM70 NM_017866.5  1952- GGTGGTTAGCTATACGGGAAATG
     2051 GTAAGTAGTGTTGTCTTCAGTAT
    CTTAATTTGTTTCTGCAACTGTG
    CACTCCTCCCTTGGTGGCACCCT
    ATGGGTGT
    513 TMSB4X NM_021109.3   286- TTAACTTTGTAAGATGCAAAGAG
      385 GTTGGATCAAGTTTAAATGACTG
    TGCTGCCCCTTTCACATCAAAGA
    ACTACTGACAACGAAGGCCGCG
    CCTGCCTTT
    514 TNFR NM_001561.5  1848- GCCTGGAGGAAGTTTTGGAAAG
    SF9  1947 AGTTCAAGTGTCTGTATATCCTA
    TGGTCTTCTCCATCCTCACACCT
    TCTGCCTTTGTCCTGCTCCCTTT
    TAAGCCAGG
    515 TNF NM_003808.3   811- AGTCAGAGAGCCGGCACTCTCA
    SF13   910 GTTGCCCTCTGGTTGAGTTGGGG
    GGCAGCTCTGGGGGCCGTGGCTT
    GTGCCATGGCTCTGCTGACCCAA
    CAAACAGAG
    516 TNFSF8 NM_001244.3   519- CCCTCAAAGGAGGAAATTGCTCA
      618 GAAGACCTCTTATGTATCCTGAA
    AAGGGCTCCATTCAAGAAGTCAT
    GGGCCTACCTCCAAGTGGCAAA
    GCATCTAAA
    517 TOMM7 NM_019059.2   251- TCTGGCTCGGATAAGAGATGGG
      350 ACATCATTCAGTCACTAGTTGGA
    TGGCACAAGGCTCTTCACAGACG
    CATCTGTAGCAGAGTGGATCTTG
    TACTAACTT
    518 TP53 NM_005657.2  5591- TACTTCCTGTGCCTTGCCAGTGG
    BP1  5690 GATTCCTTGTGTGTCTCATGTCT
    GGGTCCATGATAGTTGCCATGCC
    AACCAGCTCCAGAACTACCGTAA
    TTATCTGT
    519 TPR NM_003292.2  7194- TCTCCCCTCCACCAGCCAGGATC
     7293 CTCCTTCTAGCTCATCTGTAGAT
    ACTAGTAGTAGTCAACCAAAGCC
    TTTCAGACGAGTAAGACTTCAGA
    CAACATTG
    520 TPT1 NM_003295.3    18- GCCTGCGTCGCTTCCGGAGGCGC
      117 AGCGGGCGATGACGTAGAGGGA
    CGTGCCCTCTATATGAGGTTGGG
    GAGCGGCTGAGTCGGCCTTTTCC
    GCCCGCTCC
    521 TRAF NM_147686.3  2449- GCCAGTGTCCCATATGTTCCTCC
    3IP2  2548 TGACAGTTTGATGTGTCCATTCT
    GGGCCTCTCAGTGCTTAGCAAGT
    AGATAATGTAAGGGATGTGGCA
    GCAAATGGA
    522 TRAF6 NM_145803.1  1840- CACCCGCTTTGACATGGGTAGCC
     1939 TTCGGAGGGAGGGTTTTCAGCCA
    CGAAGTACTGATGCAGGGGTAT
    AGCTTGCCCTCACTTGCTCAAAA
    ACAACTACC
    523 LBA1 NM_014831.2 10132- CTGGGAAACCTTCATGCCTCTCT
    10231 GATGGTTACTGCCCACCCTTACC
    CCACCCCTCAGCTCAGCCTGGTA
    TGGAAAGCAAGGTGCACGTTGG
    TCTTTGATT
    524 TRIM21 NM_003141.3  1637- TCTGCAGAGGCATCCGGATCCCA
     1736 GCAAGCGAGCTTTAGCAGGGAA
    GTCACTTCACCATCAACATTCCT
    GCCCCAGATGGCTTTGTGATTCC
    CTCCAGTGA
    525 TRIM32 NM_012210.3  2681- GTGCTACCAAAGGGGATACACA
     2780 AGCCCTTTAGGAAGCAGTACCTC
    TCGCCTGGAGGATCTGTGCCATC
    TTGGATTGAGAATTGCAGATGTG
    ACAGAATGG
    526 TRIM39 NM_021253.3  3141- CTGCTATTCGGGTAATCTTCACA
     3240 GAAATGACTGAGAGAAGAATCT
    GCAGTTTACTGAGGGCATTTCAG
    TTCCTCCTACCACCTCAACAGGA
    CTTTGTCCA
    527 TRIM NM_172016.2  2841- CTCTATACCAATAAGTCAGTCAC
    39b  2940 CTTGCTCCTCTCCAGAGGCAAAG
    TGGAAGAGATCCTGCAAGACAC
    ATCTATCCTTTCACAGTGTTCCC
    AAGGGAACT
    528 TRRAP NM_003496.3 12169- AGTTGATGAACCCATCATGCTGG
    12268 TTTTTCTCTGAGCACAAAGTTTT
    AGGCTGTACACAGCCAGCCTTGG
    GAATCTCGTTGAGCGTTCGGCGT
    GGATCCAC
    529 TSC1 NM_000368.4  8068- CCCCAGACCAACCCTTCCCTCCC
     8167 TTTCCCCACCTCTTACAGTGTTT
    GGGACAGGAGGGTATGGTGCTGC
    TCTGTGTAGCAAGTACTTTGGCT
    ATTGAAAGA
    530 TTC9 NM_015351.1  4050- TACTAATCAGGCATCTGACCTGC
     4149 ACTGTCATCCCCTGCCTGGACTT
    TTGCGATGGACTCTTTGGGGGAA
    AAACTAACGCTTTTTAATTATTG
    TGAAAGCA
    531 TTN NM_133378.4   850- TCGACTGCTCAGATCTCAGAATC
      949 AAGACAAACCCGAATTGAAAAG
    AAGATTGAAGCCCACTTTGATGC
    CAGATCAATTGCAACAGTTGAGA
    TGGTCATAG
    532 TUBB NM_178014.2  2223- CAAAAAAGAATGAACACCCCTG
     2322 ACTCTGGAGTGGTGTATACTGCC
    ACATCAGTGTTTGAGTCAGTCCC
    CAGAGGAGAGGGGAACCCTCCT
    CCATCTTTTT
    533 TUG1 NR_002323.2  7082- TAAGCTAGAGGTCATGGTCACTG
     7181 AAATTACTTTCCAAAGTGGAAGA
    CAAAATGAAACAGGAACTGAGG
    GAATATTTAAGATCCCACAGAAG
    CGTAAAAAT
    534 TXN NM_003329.3   152- TTGGATCCATTTCCATCGGTCCT
      251 TACAGCCGCTCGTCAGACTCCAG
    CAGCCAAGATGGTGAAGCAGATC
    GAGAGCAAGACTGCTTTTCAGGA
    AGCCTTGG
    535 TXNDC NM_032731.3   378- TCATCTACTGCCAAGTAGGAGAA
    17   477 AAGCCTTATTGGAAAGATCCAAA
    TAATGACTTCAGAAAAAACTTGA
    AAGTAACAGCAGTGCCTACACTA
    CTTAAGTA
    536 TXNRD1 NM_00109377  3348- CTCAGTTGCAGCACTGAGTGGTC
    1.2  3447 AAAATACATTTCTGGGCCACCTC
    AGGGAACCCATGCATCTGCCTGG
    CATTTAGGCAGCAGAGCCCCTGA
    CCGTCCCC
    537 TXNR NM_182743.2  2438- TGTTGCATGGAAGGGATAGTTTG
    D1b  2537 GCTCCCTTGGAGGCTATGTAGGC
    TTGTCCCGGGAAAGAGAACTGTC
    CTGCAGCTGAAATGGACTGTTCT
    TTACTGAC
    538 U2AF2 NM_007279.2  2871- TTTATGGCCAAACTATTTTGAAT
     2970 TTTGTTGTCCGGCCCTCAGTGCC
    CTGCCCTCTCCCTTACCAGGACC
    ACAGCTCTGTTCCTTCGGCCTCT
    GGTCCTCT
    539 UBA1 NM_003334.3  3307- CCGCCACGTGCGGGCGCTGGTGC
     3406 TTGAGCTGTGCTGTAACGACGAG
    AGCGGCGAGGATGTCGAGGTTC
    CCTATGTCCGATACACCATCCGC
    TGACCCCGT
    540 UBC NM_021009.3  1876- TGCAGATCTTCGTGAAGACCCTG
     1975 ACTGGTAAGACCATCACTCTCGA
    AGTGGAGCCGAGTGACACCATT
    GAGAATGTCAAGGCAAAGATCC
    AAGACAAGGA
    541 UBE2G1 NM_003342.4   685- ACGCTGGCTCCCTATCCACACTG
      784 TGGAAACCATCATGATTAGTGTC
    ATTTCTATGCTGGCAGACCCTAA
    TGGAGACTCACCTGCTAATGTTG
    ATGCTGCG
    542 UBE2I NM_194259.2   288- CTGCTCTGCTGACTGGGGAAGTC
      387 ATCGTGCCACCCAGAACCTGAGT
    GCGGGCCTCTCAGAGCTCCTTCG
    TCCGTGGGTCTGCCGGGGACTGG
    GCCTTGTC
    543 UBTF NM_00107668  2724- GGGGGTCCCAAAGAGTTTGATG
    3.1  2823 AGGCCCTCCACACCTGCGGCCCA
    ATCCAAGGTGGGGTGGAAGCTT
    GGGGAAGACCCATTCCTTCCCAG
    AGGGGCCTGC
    544 UQCRQ NM_014402.4    97- TGACGCGGATGCGGCATGTGATC
      196 AGCTACAGCTTGTCACCGTTCGA
    GCAGCGCGCCTATCCGCACGTCT
    TCACTAAAGGAATCCCCAATGTT
    CTGCGCCG
    545 USP16 NM_00103241  2487- TCTATTCCTTATATGGAGTTGTT
    0.1  2586 GAACACAGTGGTACTATGAGGTC
    GGGGCATTACACTGCCTATGCCA
    AGGCAAGAACCGCAAATAGTCAT
    CTCTCTAA
    546 USP21 NM_012475.4  1499- CCTTTTCACTAAGGAAGAAGAGC
     1598 TAGAGTCGGAGAATGCCCCAGT
    GTGTGACCGATGTCGGCAGAAA
    ACTCGAAGTACCAAAAAGTTGA
    CAGTACAAAGA
    547 USP34 NM_014709.3 10104- AGGAGCACACTGTAGACAGCTG
    10203 CATCAGTGACATGAAAACAGAA
    ACCAGGGAGGTCCTGACCCCAA
    CGAGCACTTCTGACAATGAGACC
    AGAGACTCCTC
    548 USP5 NM_003481.2  2720- AGAGCAGAGGGGCAGCGATAGA
     2819 CTCTGGGGATGGAGCAGGACGG
    GGACGGGAGGGGCCGGCCACCT
    GTCTGTAAGGAGACTTTGTTGCT
    TCCCCTGCCCC
    549 USP9Y NM_004654.3    86- GGTGTGGAAAGACTTTTCTGGGC
      185 TCAGAGGTGAAACTGACCCTTGT
    GTATCAGCAGCATTTCTGACTGA
    CTGAGAGAGTGTAGTGATTAACA
    GAGTTGTG
    550 VPS37C NM_017966.4  2579- TTATAAAGAGAAATCACTAATGG
     2678 ACTCTACTGGTTTGAGTGCTTCT
    GAGCTGGATGACCGACCGCCTGT
    ATGTTTGTGTAATTAATTGCCAT
    AATAAACT
    551 WDR1 NM_005112.4  2325- AACTGTTGCCTGTCAGTGTTTAC
     2424 AAACTAGTGCGTTGACGGCACCG
    TGTCCAAGTTTTTAGAACCCTTG
    TTAGCCAGACCGAGGTGTCCTGG
    TCACCGTT
    552 WDR91 NM_014149.3  2777- CAGGCTCTCCTGTTGCTTTGCCA
     2876 TGGAGCCAGGTCAGCTCTCTGTC
    TGTTCTGCTGGGTAACAAGGTTT
    GGCAGTTCCTGTTTCTCTGGGCT
    TAAGTCAA
    553 XCL2 NM_003175.3   378- GTAGTCTCTGGCACCCTGTCCGT
      477 CTCCAGCCAGCCAGCTCATTTCA
    CTTTACACCCTCATGGACTGAGA
    TTATACTCACCTTTTATGAAAGC
    ACTGCATG
    554 XPC NR_027299.1  3168- CTGGATGGTGGTGCATCCGTGAA
     3267 TGCGCTGATCGTTTCTTCCAGTT
    AGAGTCTTCATCTGTCCGACAAG
    TTCACTCGCCTCGGTTGCGGACC
    TAGGACCA
    555 YPEL1 NM_013313.4  3672- GCTCATTTTTAAACCAAATGAAC
     3771 AGACCATGAGCTGGCTTCAGGG
    GAAGTGCTATTCACAGGACCATA
    TCCACCACCCTCTTAAATTCCTA
    AACAATATC
    556 ZMIZ1 NM_020338.3  7171- ATGATCACAGGTGATTCACACGT
     7270 ACACACATAAACACACCCACCA
    GTGCAGCCTGAAGTAACTCCCAC
    AGAAACCATCATCGTCTTTGTAC
    ATCGTATGT
    557 ZNF143 NM_003442.5  2292- TATCAGATCACAAACTCCTAGAG
     2391 TCTACATGCAAGACTAGTAAAGT
    CTTATGGAGTCTTATGATGGATT
    TTTAACTTCCCGTGGAAAAAAAA
    ATAAAGGC
    558 ZNF239 NM_00109928  1496- AGAGCTCCAACCTTCACATCCAC
    3.1  1595 CAGCGGGTTCACAAGAAAGATC
    CTCGCTAACTGACATTAGCCCAT
    TCAGGTCTTCACAGCGCTCATAC
    TGTAAAAAC
    559 ZNF341 NM_032819.4  3247- CAGACGGTTCCCCACAGCATCCT
     3346 CAGACAGCTCTGTGATGTAGCTT
    TTAGGAGGCACTCAGGTGTCACG
    GCTAGACTGCAGCTATGAGACA
    GATCTGGCT
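  • As an aid to reading the table above, the following is a minimal sketch of how one row could be checked for internal consistency. It assumes that the position range in each row is inclusive (so that 7601-7700 denotes 100 nucleotides) and that target sequences contain only the letters A, C, G and T. The function names are hypothetical and are not part of the disclosure.

```python
VALID_BASES = set("ACGT")


def assemble_target(fragments):
    """Join the wrapped sequence fragments of one table row into one string."""
    return "".join(fragment.strip().upper() for fragment in fragments)


def check_target(start, end, fragments):
    """Return a list of problems found for one row (empty list means none)."""
    sequence = assemble_target(fragments)
    problems = []
    expected_length = end - start + 1  # assumption: positions are inclusive
    if len(sequence) != expected_length:
        problems.append(f"length {len(sequence)} != expected {expected_length}")
    unexpected = sorted(set(sequence) - VALID_BASES)
    if unexpected:
        problems.append(f"unexpected characters: {''.join(unexpected)}")
    return problems


# Row 372 (PHLPP2, NM_015020.3, positions 7601-7700) as printed in the table.
phlpp2_fragments = [
    "CCAGTTGGGTGTGGCAGATCTAC",
    "TGAATATCAAATGATGCTCTTCT",
    "TCCCATGTAGACCTTCAGCAAAA",
    "GCCGGTACTTGGAAGCCACAGG",
    "CTCACCTTC",
]
print(check_target(7601, 7700, phlpp2_fragments))
```

A check of this kind may help flag transcription errors in a sequence listing, such as a character that is not a nucleotide code or a fragment that was dropped during line wrapping.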
  • C. Polymerase Chain Reaction (PCR) Techniques
  • Another suitable quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations (e.g., in normal and tumor tissues), to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. The first step is the isolation of mRNA from a target sample (typically, total RNA isolated from human PBMC). mRNA can also be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples.
  • General methods for mRNA extraction are well known in the art and are described, for example, in standard textbooks of molecular biology. In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, according to the manufacturer's instructions. Exemplary commercial products include TRI-REAGENT, Qiagen RNeasy mini-columns, MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNA Stat-60 (Tel-Test). Conventional techniques such as cesium chloride density gradient centrifugation may also be employed.
  • After RNA isolation, the first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. See, e.g., manufacturer's instructions accompanying the product GENEAMP RNA PCR kit (Perkin Elmer, Calif., USA). The derived cDNA can then be used as a template in the subsequent PCR reaction.
  • The PCR step generally uses a thermostable DNA-dependent DNA polymerase, such as the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TaqMan® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. In one embodiment, the target sequence is shown in Table III. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
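  • The relationship between probe cleavage and amplification described above can be illustrated with a minimal sketch. It assumes an idealized reaction in which every template is copied with a fixed per-cycle efficiency, the probe is never limiting, and exactly one reporter molecule is released per newly synthesized strand. The function name and parameters are hypothetical.

```python
def reporter_release(initial_templates, cycles, efficiency=1.0):
    """Return the cumulative number of free reporter molecules after each cycle.

    efficiency is the fraction of templates copied per cycle (1.0 = doubling).
    """
    amplicons = float(initial_templates)
    released = 0.0
    history = []
    for _ in range(cycles):
        new_molecules = amplicons * efficiency  # strands synthesized this cycle
        released += new_molecules               # one reporter freed per new strand
        amplicons += new_molecules
        history.append(released)
    return history


# Hypothetical example: 10 starting templates, 3 cycles, perfect doubling.
print(reporter_release(10, 3))
```

Under these assumptions the free reporter signal tracks the amount of product formed, which is why the fluorescence recorded at each cycle can be interpreted quantitatively.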
  • TaqMan® RT-PCR can be performed using commercially available equipment. In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7900® Sequence Detection System®. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real time through fiber optic cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
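  • The threshold cycle can be illustrated with a minimal sketch. The text does not specify how statistical significance is determined, so the sketch assumes one common convention: the threshold is set at the mean of an early baseline window plus ten standard deviations, and the crossing point is found by linear interpolation between cycles. The function names, the baseline window and the simulated data are all hypothetical and do not represent the instrument software.

```python
from statistics import mean, stdev


def threshold_cycle(fluorescence, baseline_cycles=(3, 15), sd_multiplier=10.0):
    """Return the fractional cycle at which the signal first crosses the
    threshold, or None if it never does. Cycles are numbered from 1."""
    start, end = baseline_cycles
    baseline = fluorescence[start - 1:end]
    threshold = mean(baseline) + sd_multiplier * stdev(baseline)
    for i in range(end, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if prev < threshold <= curr:
            # Linear interpolation between the two cycles around the crossing.
            return i + (threshold - prev) / (curr - prev)
    return None


def simulated_curve(n_cycles=40, start_signal=1e-9, efficiency=1.0, plateau=1.0):
    """Build a toy amplification curve: background + exponential product."""
    curve = []
    for cycle in range(1, n_cycles + 1):
        product = min(start_signal * (1 + efficiency) ** cycle, plateau)
        noise = 0.001 * ((cycle % 3) - 1)  # deterministic stand-in for noise
        curve.append(0.05 + product + noise)
    return curve


low_input = simulated_curve(start_signal=1e-9)
high_input = simulated_curve(start_signal=1e-8)
print("Ct, lower input: ", threshold_cycle(low_input))
print("Ct, higher input:", threshold_cycle(high_input))
```

In this toy model a tenfold increase in starting material lowers Ct by roughly log2(10), or about 3.3 cycles, which reflects the inverse relationship between Ct and the initial amount of template.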
  • To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
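  • As a minimal sketch of normalization against an internal standard, the widely used comparative Ct (2^-ΔΔCt) calculation is shown below. This specification does not prescribe that calculation; it is offered only as one conventional way to use a housekeeping gene such as GAPDH or β-actin, and it assumes approximately 100% amplification efficiency for both target and reference. The function and parameter names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control,
    normalized to a reference (housekeeping) gene, by 2^-ddCt.
    Assumes roughly 100% PCR efficiency for both amplicons."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, test sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control
    return 2.0 ** -(delta_sample - delta_control)


# Target crosses threshold 2 cycles earlier in the sample than in the
# control, with identical reference-gene Ct values in both.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

  • In this example the normalized target signal appears two cycles earlier in the sample than in the control, which corresponds to an approximately four-fold higher expression.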
  • Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR.
  • In another PCR method, i.e., the MassARRAY-based gene expression profiling method (Sequenom, Inc., San Diego, CA), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions, except a single base, and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for analysis by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS). The cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated.
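  • The peak-area ratio quantification described above can be sketched as follows. This is a single-point simplification that assumes the competitor and the cDNA amplify and extend with equal efficiency, so that the ratio of peak areas equals the ratio of starting amounts; in practice a titration of competitor concentrations may be used. The function name `cdna_amount` and the units are hypothetical.

```python
def cdna_amount(competitor_amount, area_cdna, area_competitor):
    """Estimate the starting cDNA amount from a known spiked competitor
    amount and the two peak areas in the mass spectrum. Assumes equal
    amplification and extension efficiency for both templates."""
    if area_competitor <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_amount * area_cdna / area_competitor


# 1000 competitor copies spiked in; cDNA peak is twice the competitor peak.
estimate = cdna_amount(1000.0, area_cdna=300.0, area_competitor=150.0)
```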
  • Still other embodiments of PCR-based techniques which are known to the art and may be used for gene expression profiling include, e.g., differential display, amplified fragment length polymorphism (iAFLP), and BeadArray™ technology (Illumina, San Diego, CA) using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression; and high coverage expression profiling (HiCEP) analysis.
  • D. Microarrays
  • Differential gene expression can also be identified, or confirmed using the microarray technique. Thus, the expression profile of lung cancer-associated genes can be measured in either fresh or paraffin-embedded tissue, using microarray technology. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. Just as in the other methods and compositions herein, the source of mRNA is total RNA isolated from whole blood of controls and patient subjects.
  • In one embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. In one embodiment, all 559 nucleotide sequences from Table III are applied to the substrate. The microarrayed genes, immobilized on the microchip, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols.
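  • As an illustrative sketch of the dual color comparison described above, the relative abundance for each arrayed gene can be expressed as a log2 ratio of the two channel intensities, with an approximately two-fold difference corresponding to an absolute log2 ratio of at least 1. The gene names, intensity values and the intensity floor used to avoid division by zero are hypothetical, and background subtraction and dye normalization are deliberately omitted.

```python
import math


def log2_ratios(sample, reference, floor=1.0):
    """Per-gene log2(sample / reference) for genes present in both
    channels. Intensities below `floor` are raised to `floor`."""
    return {
        gene: math.log2(max(sample[gene], floor) / max(reference[gene], floor))
        for gene in sample
        if gene in reference
    }


def two_fold_changed(ratios):
    """Genes whose expression differs by at least two-fold (|log2| >= 1)."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= 1.0)


sample = {"GENE_A": 400.0, "GENE_B": 100.0, "GENE_C": 50.0}
reference = {"GENE_A": 100.0, "GENE_B": 110.0, "GENE_C": 200.0}
ratios = log2_ratios(sample, reference)
changed = two_fold_changed(ratios)
```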
  • Other useful methods summarized by U.S. Pat. No. 7,081,340, and incorporated by reference herein include Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). Briefly, serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10 to 14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many transcripts are linked together to form long serial molecules, that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g. Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), both of which are incorporated herein by reference.
  • Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS), described by Brenner et al., Nature Biotechnology 18:630-634 (2000) (which is incorporated herein by reference), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
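  • The tag-counting step of SAGE described above can be sketched as follows. The sketch is a deliberate simplification: it assumes that each tag is a fixed 10 bp immediately following an anchoring-site sequence (here "CATG") in the sequenced serial molecule, and it ignores ditag structure and reverse-complement tags. The function name, the anchoring sequence default and the example sequence are hypothetical.

```python
from collections import Counter


def count_tags(serial_molecule, anchor="CATG", tag_len=10):
    """Count fixed-length tags that directly follow each anchor site.
    The search resumes after each tag so that an anchor-like sequence
    inside a tag is not treated as a new anchor."""
    counts = Counter()
    pos = serial_molecule.find(anchor)
    while pos != -1:
        tag_start = pos + len(anchor)
        tag = serial_molecule[tag_start:tag_start + tag_len]
        if len(tag) == tag_len:  # skip a truncated tag at the end
            counts[tag] += 1
        pos = serial_molecule.find(anchor, tag_start + tag_len)
    return counts


molecule = ("CATG" + "AAAAAAAAAA" +
            "CATG" + "CCCCCCCCCC" +
            "CATG" + "AAAAAAAAAA")
tag_counts = count_tags(molecule)
```

  • The abundance of each tag in `tag_counts` would then be mapped to the gene corresponding to that tag, which requires a tag-to-gene lookup that is not shown here.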
  • E. Immunohistochemistry
  • Immunohistochemistry methods are also suitable for detecting the expression levels of the gene expression products of the informative genes described for use in the methods and compositions herein. Antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, or other protein-binding ligands specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Protocols and kits for immunohistochemical analyses are well known in the art and are commercially available.
  • III. COMPOSITIONS OF THE INVENTION
  • The methods for diagnosing lung cancer described herein which utilize defined gene expression profiles permit the development of simplified diagnostic tools for diagnosing lung cancer, e.g., NSCLC vs. non-cancerous nodule. Thus, a composition for diagnosing lung cancer in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which said polynucleotides or oligonucleotides or ligands are immobilized. In another embodiment, the composition is a kit containing the relevant 5 or more polynucleotides or oligonucleotides or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label.
  • In one embodiment, a composition for diagnosing lung cancer in a mammalian subject includes 5 or more PCR primer-probe sets. Each primer-probe set amplifies a different polynucleotide sequence from a gene expression product of 5 or more informative genes found in the blood of the subject. These informative genes are selected to form a gene expression profile or signature which is distinguishable between a subject having lung cancer and a subject having a non-cancerous nodule. Changes in expression in the genes in the gene expression profile from that of a reference gene expression profile are correlated with a lung cancer, such as non-small cell lung cancer (NSCLC).
  • In one embodiment of this composition, the informative genes are selected from among the genes identified in Table I. In another embodiment of this composition, the informative genes are selected from among the genes identified in Table II. This collection of genes is those for which the gene product expression is altered (i.e., increased or decreased) versus the same gene product expression in the blood of a reference control (i.e., a patient having a non-cancerous nodule). In one embodiment, polynucleotide or oligonucleotide or ligands, i.e., probes, are generated to 5 or more informative genes from Table I or Table II for use in the composition (the CodeSet). An example of such a composition contains probes to a targeted portion of the 559 genes of Table I. In another embodiment, probes are generated to all 559 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 539 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 3 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 5 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 10 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 15 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 20 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 25 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 30 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 35 genes from Table I or Table II for use in the composition. 
In yet another embodiment, probes are generated to the first 40 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 45 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 50 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 60 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 65 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 70 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 75 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 80 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 85 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 90 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 95 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 100 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 200 genes from Table I for use in the composition. In yet another embodiment, probes are generated to 300 genes from Table I for use in the composition. Still other embodiments employ probes to a targeted portion of other combinations of the genes in Table I or Table II. 
The selected genes from the Table need not be in rank order; rather, any combination that clearly shows a difference in expression between the reference control and the diseased patient is useful in such a composition.
  • In one embodiment of the compositions described above, the reference control is a non-healthy control (NHC) as described above. In other embodiments, the reference control may be any class of controls as described above in “Definitions”.
  • The compositions based on the genes selected from Table I or Table II described herein, optionally associated with detectable labels, can be presented in the format of a microfluidics card, a chip or chamber, or a kit adapted for use with the Nanostring, PCR, RT-PCR or Q PCR techniques described above. In one aspect, such a format is a diagnostic assay using TAQMAN® Quantitative PCR low density arrays. In another aspect, such a format is a diagnostic assay using the Nanostring nCounter platform.
  • For use in the above-noted compositions the PCR primers and probes are preferably designed based upon intron sequences present in the gene(s) to be amplified selected from the gene expression profile. Exemplary target sequences are shown in Table III. The design of the primer and probe sequences is within the skill of the art once the particular gene target is selected. The particular methods selected for the primer and probe design and the particular primer and probe sequences are not limiting features of these compositions. A ready explanation of primer and probe design techniques available to those of skill in the art is summarized in U.S. Pat. No. 7,081,340, with reference to publicly available tools such as DNA BLAST software, the Repeat Masker program (Baylor College of Medicine), Primer Express (Applied Biosystems); MGB assay-by-design (Applied Biosystems); Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers).
  • In general, optimal PCR primers and probes used in the compositions described herein are 17-30 bases in length, and contain about 20-80%, such as, for example, about 50-60% G+C bases. Melting temperatures of between 50 and 80° C., e.g. about 50 to 70° C. are typically preferred.
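  • A minimal sketch of screening a candidate primer or probe against the length and G+C ranges stated above is given below. It checks only length (17-30 bases) and G+C content (about 20-80%); melting temperature is not computed, since that would require a thermodynamic model that is not specified here. The function names and the example sequences are hypothetical and are not taken from Table III.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_ok(seq, min_len=17, max_len=30, min_gc=0.20, max_gc=0.80):
    """True if the sequence falls within the stated length and G+C ranges."""
    if not (min_len <= len(seq) <= max_len):
        return False
    return min_gc <= gc_fraction(seq) <= max_gc


candidate = "ATGCATGCATGCATGCATGC"  # 20 bases, 50% G+C (made-up sequence)
acceptable = primer_ok(candidate)
```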
  • In another aspect, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I. In another embodiment, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I or Table II. This type of composition relies on recognition of the same gene profiles as described above for the Nanostring compositions but employs the techniques of a cDNA array. Hybridization of the immobilized polynucleotides in the composition to the gene expression products present in the blood of the patient subject is employed to quantitate the expression of the informative genes selected from among the genes identified in Table I or Table II to generate a gene expression profile for the patient, which is then compared to that of a reference sample. As described above, depending upon the identification of the profile (i.e., that of genes of Table I or subsets thereof, that of genes of Table II or subsets thereof), this composition enables the diagnosis and prognosis of NSCLC lung cancers. Again, the selection of the polynucleotide sequences, their length and labels used in the composition are routine determinations made by one of skill in the art in view of the teachings of which genes can form the gene expression profiles suitable for the diagnosis and prognosis of lung cancers.
  • In yet another aspect, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject. In another embodiment, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject. The gene expression profile contains the genes of Table I or Table II, as described above for the other compositions. This composition enables detection of the proteins expressed by the genes in the indicated Tables. While preferably the ligands are antibodies to the proteins encoded by the genes in the profile, it would be evident to one of skill in the art that various forms of antibody, e.g., polyclonal, monoclonal, recombinant, chimeric, as well as fragments and components (e.g., CDRs, single chain variable regions, etc.) may be used in place of antibodies. Such ligands may be immobilized on suitable substrates for contact with the subject's blood and analyzed in a conventional fashion. In certain embodiments, the ligands are associated with detectable labels. These compositions also enable detection of changes in proteins encoded by the genes in the gene expression profile from those of a reference gene expression profile. Such changes correlate with lung cancer in a manner similar to that for the PCR and polynucleotide-containing compositions described above.
  • For all of the above forms of diagnostic/prognostic compositions, the gene expression profile can, in one embodiment, include at least the first 25 of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 10 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 15 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 20 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 30 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 40 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 50 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 60 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 70 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 80 or more of the informative genes of Table I or Table II. 
In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 90 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 100 of the informative genes of Table II. In one embodiment, for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include at least the first 100 of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 200 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 300 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 400 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 500 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 539 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 559 of the informative genes of Table I.
  • These compositions may be used to diagnose lung cancers, such as stage I or stage II NSCLC. Further these compositions are useful to provide a supplemental or original diagnosis in a subject having lung nodules of unknown etiology.
  • IV. DIAGNOSTIC METHODS OF THE INVENTION
  • All of the above-described compositions provide a variety of diagnostic tools which permit a blood-based, non-invasive assessment of disease status in a subject. Use of these compositions in diagnostic tests, which may be coupled with other screening tests, such as a chest X-ray or CT scan, increases diagnostic accuracy and/or directs additional testing.
  • Thus, in one aspect, a method is provided for diagnosing lung cancer in a mammalian subject. This method involves identifying a gene expression profile in the blood of a mammalian, preferably human, subject. In one embodiment, the gene expression profile includes 100 or more gene expression products of 100 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 100 or more informative genes from the genes of Table I. In another embodiment, the gene expression profile includes 10 or more gene expression products of 10 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table II. In another embodiment, the gene expression profile includes 5 or more gene expression products of 5 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table II. Comparison of a subject's gene expression profile with a reference gene expression profile permits identification of changes in expression of the informative genes that correlate with a lung cancer (e.g., NSCLC). This method may be performed using any of the compositions described above. In one embodiment, the method enables a cancerous tumor to be distinguished from a benign nodule.
  • In another aspect, use of any of the compositions described herein is provided for diagnosing lung cancer in a subject.
  • The diagnostic compositions and methods described herein provide a variety of advantages over current diagnostic methods. Among such advantages are the following. As exemplified herein, subjects with cancerous tumors are distinguished from those with benign nodules. These methods and compositions provide a solution to the practical diagnostic problem of whether a patient who presents at a lung clinic with a small nodule has malignant disease. Patients with an intermediate-risk nodule would clearly benefit from a non-invasive test that would move the patient into either a very low-likelihood or a very high-likelihood category of disease risk. An accurate estimate of malignancy based on a genomic profile (i.e. estimating a given patient has a 90% probability of having cancer versus estimating the patient has only a 5% chance of having cancer) would result in fewer surgeries for benign disease, more early stage tumors removed at a curable stage, fewer follow-up CT scans, and reduction of the significant psychological costs of worrying about a nodule. The economic impact would also likely be significant, such as reducing the current estimated cost of additional health care associated with CT screening for lung cancer, i.e., $116,000 per quality adjusted life-year gained. A non-invasive blood genomics test that has a sufficient sensitivity and specificity would significantly alter the post-test probability of malignancy and thus, the subsequent clinical care.
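  • The effect of a test's sensitivity and specificity on the post-test probability of malignancy can be illustrated with Bayes' rule, as sketched below. The pre-test probability, sensitivity and specificity used in the example are arbitrary illustrative values and are not results reported in this specification; the function name is hypothetical.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of malignancy after a positive or negative test result,
    computed with Bayes' rule from the pre-test probability."""
    if positive:
        true_pos = sensitivity * pre_test
        false_pos = (1.0 - specificity) * (1.0 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pre_test
    true_neg = specificity * (1.0 - pre_test)
    return false_neg / (false_neg + true_neg)


# Intermediate-risk nodule (50% pre-test), illustrative test characteristics.
after_positive = post_test_probability(0.50, 0.90, 0.50, positive=True)
after_negative = post_test_probability(0.50, 0.90, 0.50, positive=False)
```

  • With these illustrative inputs, a negative result moves the subject from an intermediate pre-test probability toward a lower-likelihood category, while a positive result moves the subject toward a higher-likelihood category.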
  • A desirable advantage of these methods over existing methods is that they are able to characterize the disease state from a minimally-invasive procedure, i.e., by taking a blood sample. In contrast, current practice for classification of cancer tumors from gene expression profiles depends on a tissue sample, usually a sample from a tumor. In the case of very small tumors a biopsy is problematic and clearly if no tumor is known or visible, a sample from it is impossible. No purification of tumor is required, as is the case when tumor samples are analyzed. A recently published method depends on brushing epithelial cells from the lung during bronchoscopy, a method which is also considerably more invasive than taking a blood sample. Blood samples have an additional advantage, which is that the material is easily prepared and stabilized for later analysis, which is important when messenger RNA is to be analyzed.
  • The 559 classifier described herein showed a ROC-AUC of 0.81 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 46%. When the nodule classification accuracy is assessed by size without using a specific threshold for sensitivity, as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases. In one embodiment, the accuracy of the gene classifier is about 89% for nodules ≤8 mm. In another embodiment, the accuracy of the gene classifier is about 75% for nodules >8 to about ≤12 mm. In yet another embodiment, the accuracy of the gene classifier is about 68% for nodules >12 to about ≤16 mm. In another embodiment, the accuracy of the gene classifier is about 53% for nodules >16 mm. See examples below.
  • In one embodiment, for nodules about <10 mm, the specificity is about 54% and the ROC-AUC is about 0.85 at about 90% sensitivity. In another embodiment, for larger nodules, about >10 mm, the specificity is about 24% and the ROC-AUC is about 0.71 at about 90% sensitivity.
  • The 100 Classifier described herein showed a ROC-AUC of 0.82 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 62%. In another embodiment, when the sensitivity is about 79%, the specificity is about 68%. In one embodiment, when the sensitivity is about 71%, the specificity is about 75%. See examples below.
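  • For illustration, the two performance measures referred to above, ROC-AUC and specificity at a fixed sensitivity, can be computed from classifier scores as sketched below. The scores are made-up values, not data from the examples, and the function names are hypothetical. The AUC is computed as the fraction of cancer/benign score pairs that are correctly ordered, which is equivalent to the area under the ROC curve.

```python
import math


def roc_auc(scores_cancer, scores_benign):
    """AUC as the fraction of (cancer, benign) pairs in which the cancer
    sample scores higher; ties count as half."""
    wins = 0.0
    for c in scores_cancer:
        for b in scores_benign:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(scores_cancer) * len(scores_benign))


def specificity_at_sensitivity(scores_cancer, scores_benign, target=0.90):
    """Specificity at the highest score threshold that still calls at
    least `target` of the cancer samples positive (score >= threshold)."""
    ranked = sorted(scores_cancer, reverse=True)
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    threshold = ranked[needed - 1]
    true_neg = sum(1 for b in scores_benign if b < threshold)
    return true_neg / len(scores_benign)


cancer = [0.9, 0.8, 0.7, 0.6, 0.4]   # made-up classifier scores
benign = [0.5, 0.3, 0.2, 0.1]
auc = roc_auc(cancer, benign)
spec_at_full_sens = specificity_at_sensitivity(cancer, benign, target=1.0)
```

  • Raising the required sensitivity lowers the score threshold, which generally reduces specificity; this is the trade-off reflected in the paired sensitivity and specificity values recited above.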
  • These compositions and methods allow for more accurate diagnosis and treatment of lung cancer. Thus, in one embodiment, the methods described include treatment of the lung cancer. Treatment may include removal of the neoplastic growth, chemotherapy and/or any other treatment known in the art or described herein.
  • In one embodiment, a method for diagnosing the existence or evaluating a lung cancer in a mammalian subject is provided, which includes identifying changes in the expression of 5, 10, 15 or more genes in the sample of said subject, said genes selected from the genes of Table I or the genes of Table II. The subject's gene expression levels are compared with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.
  • In one embodiment, the diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In another embodiment, the changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.
  • In one embodiment, the method takes into account the size of a lung nodule in the subject. The specificity and sensitivity may be variable based on the size of the nodule. In one embodiment, the specificity is about 46% at about 90% sensitivity. In another embodiment, the specificity is about 54% at about 90% sensitivity for nodules <10 mm. In yet another embodiment, the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.
  • In another embodiment, the reference or control comprises three or more genes of Table I from a sample of at least one reference subject. The reference subject may be selected from the group consisting of: (a) a smoker with malignant disease, (b) a smoker with non-malignant disease, (c) a former smoker with non-malignant disease, (d) a healthy non-smoker with no disease, (e) a non-smoker who has chronic obstructive pulmonary disease (COPD), (f) a former smoker with COPD, (g) a subject with a solid lung tumor prior to surgery for removal of same; (h) a subject with a solid lung tumor following surgical removal of said tumor; (i) a subject with a solid lung tumor prior to therapy for same; and (j) a subject with a solid lung tumor during or following therapy for same. In one embodiment, the reference or control subject (a)-(j) is the same test subject at a temporally earlier timepoint.
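  • A minimal sketch of assessing classification accuracy by nodule size, using the size ranges recited above (≤8 mm, >8 to ≤12 mm, >12 to ≤16 mm, and >16 mm), is given below. The records are invented for illustration and do not reproduce the data of the examples; the function names and the record layout (size in mm, predicted class, actual class) are hypothetical.

```python
def size_bin(size_mm):
    """Assign a nodule to one of the four size ranges recited above."""
    if size_mm <= 8:
        return "<=8 mm"
    if size_mm <= 12:
        return ">8 to <=12 mm"
    if size_mm <= 16:
        return ">12 to <=16 mm"
    return ">16 mm"


def accuracy_by_size(records):
    """Fraction of correct calls per size bin.
    Each record is (size_mm, predicted_class, actual_class)."""
    totals, correct = {}, {}
    for size_mm, predicted, actual in records:
        key = size_bin(size_mm)
        totals[key] = totals.get(key, 0) + 1
        correct[key] = correct.get(key, 0) + (1 if predicted == actual else 0)
    return {key: correct[key] / totals[key] for key in totals}


records = [
    (6, "benign", "benign"),
    (8, "cancer", "benign"),
    (10, "cancer", "cancer"),
    (20, "cancer", "cancer"),
]
by_size = accuracy_by_size(records)
```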
  • The sample is selected from those described herein. In one embodiment, the sample is peripheral blood. The nucleic acids in the sample are, in some embodiments, stabilized prior to identifying changes in the gene expression levels. Such stabilization may be accomplished, e.g., using the Pax Gene system, described herein.
  • In one embodiment, the method of detecting lung cancer in a patient includes
      • a. obtaining a sample from the patient; and
      • b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.
  • In another embodiment, the method of diagnosing lung cancer in a subject includes
      • a. obtaining a blood sample from a subject;
      • b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 100 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and
      • c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.
  • In yet another embodiment, the method includes
      • a. obtaining a blood sample from a subject;
      • b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product;
      • c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and
      • d. removing the neoplastic growth.
    V. EXAMPLES
  • The invention is now described with reference to the following examples. These examples are provided for the purpose of illustration only and the invention should in no way be construed as being limited to these examples but rather should be construed to encompass any and all variations that become evident as a result of the teaching provided herein.
  • Example 1: Patient Population—Analysis A
  • For development of the gene classifier described herein, blood samples and clinical information were collected from 150 subjects, 73 having a diagnosis of lung cancer and 77 having a diagnosis of benign nodule. Patient characteristics are shown in FIG. 1 .
  • Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
  • The “control” cohort was derived from patients with benign lung nodules (e.g. ground glass opacities, single nodules, granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.
  • Example 2: Patient Population—Analysis B
  • Further blood samples and clinical information were collected from 120 subjects, 60 having a diagnosis of lung cancer and 60 having a diagnosis of benign nodule. Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
  • The “control” cohort was derived from patients with benign lung nodules (e.g. granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.
  • Example 3: Sample Collection Protocols and Processing
  • Blood samples were collected in the clinic by the tissue acquisition technician. Blood samples were drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex-vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency to process samples following collection greatly enhances lab efficiency and reduces costs.
  • Example 4: RNA Purification and Quality Assessment
  • PAXgene RNA is prepared using a standard commercially available kit from Qiagen™ that allows purification of mRNA. The resulting RNA is used for mRNA profiling. The RNA quality is determined using a Bioanalyzer. Only samples with RNA Integrity numbers >3 were used.
  • Briefly, RNA is isolated as follows. Turn the shaker-incubator on and set it to 55° C. before beginning. Unless otherwise noted, all steps in this protocol, including centrifugation steps, should be carried out at room temperature (15-25° C.). This protocol assumes samples are stored at −80° C. Unfrozen samples that have been left at room temperature (RT) for the minimum of 2 hours specified by the Qiagen protocol should be processed in the same way.
  • Thaw Paxgene tubes upright in a plastic rack. Invert tubes at least 10 times to mix before starting isolation. Prepare all necessary tubes. For each sample, the following are needed: 2 numbered 1.5 ml Eppendorf tubes; 1 Eppendorf tube with the sample information (this is the final tube); 1 Lilac Paxgene spin column; 1 Red Paxgene Spin column; and 5 Processing tubes.
  • Centrifuge the PAXgene Blood RNA Tube for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge (Sigma 4-15C centrifuge; Rotor: Sigma Nr. 11140, 7/01, 5500/min; Holder: Sigma 13115, 286 g 14/D; Inside tube holder: 18010, 125 g). Note: After thawing, ensure that the blood sample has been incubated in the PAXgene Blood RNA Tube for a minimum of 2 hours at room temperature (15-25° C.), in order to achieve complete lysis of blood cells.
  • Under the hood, remove the supernatant by decanting into bleach. When the supernatant is decanted, take care not to disturb the pellet, and dry the rim of the tube with a clean paper towel. Dispose of the decanted supernatant by placing any clotted blood into a bag and then into the infectious waste, and by pouring the fluid portion down the sink and washing it down with plenty of water. Add 4 ml RNase-free water to the pellet, and close the tube using a fresh secondary Hemogard closure.
  • Vortex until the pellet is visibly dissolved. Weigh the tubes in the centrifuge holder again to ensure they are balanced, and centrifuge for 10 minutes at 5000×g using a swing-out rotor Qiagen centrifuge. Small debris remaining in the supernatant after vortexing but before centrifugation will not affect the procedure.
  • Remove and discard the entire supernatant. Leave tube upside-down for 1 min to drain off all supernatant. Incomplete removal of the supernatant will inhibit lysis and dilute the lysate, and therefore affect the conditions for binding RNA to the PAXgene membrane.
  • Add 350 μl Buffer BM1 and pipet up and down to lyse the pellet.
  • Pipet the re-suspended sample into a labeled 1.5 ml microcentrifuge tube. Add 300 μl Buffer BM2. Then add 40 μl proteinase K. Mix by vortexing for 5 seconds, and incubate for 10 minutes at 55° C. using a shaker-incubator at the highest possible speed, 800 rpm on Eppendorf thermomixer. (If using a shaking water bath instead of a thermomixer, quickly vortex the samples every 2-3 minutes during the incubation. Keep the vortexer next to the incubator).
  • Pipet the lysate directly into a PAXgene Shredder spin column (lilac tube) placed in a 2 ml processing tube, and centrifuge for 3 minutes at 24° C. and 18,500×g in the TOMY Microtwin centrifuge. Carefully pipet the lysate into the spin column and visually check that the lysate is completely transferred to the spin column. To prevent damage to columns and tubes, do not exceed 20,000×g.
  • Carefully transfer the entire supernatant of the flow-through fraction to a fresh 1.5 ml microcentrifuge tube without disturbing the pellet in the processing tube. Discard the pellet in the processing tube.
  • Add 700 μl isopropanol (100%) to the supernatant. Mix by vortexing.
  • Pipet 690 μl sample into the PAXgene RNA spin column (red) placed in a 2 ml processing tube, and centrifuge for 1 minute at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
  • Pipet the remaining sample into the PAXgene RNA spin column (red), and centrifuge for 1 minute at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through. Carefully pipet the sample into the spin column and visually check that the sample is completely transferred to the spin column.
  • Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column. Centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
  • Prepare DNase I incubation mix for step 13. Add 10 μl DNase I stock solution to 70 μl Buffer RDD in a 1.5 ml microcentrifuge tube. Mix by gently flicking the tube, and centrifuge briefly to collect residual liquid from the sides of the tube.
  • Pipet the DNase I incubation mix (80 μl) directly onto the PAXgene RNA spin column membrane, and place on the benchtop (20-30° C.) for 15 minutes. Ensure that the DNase I incubation mix is placed directly onto the membrane. DNase digestion will be incomplete if part of the mix is applied to and remains on the walls or the O-ring of the spin column.
  • Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column, and centrifuge for 15 sec at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
  • Pipet 500 μl Buffer BM4 to the PAXgene RNA spin column, and centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
  • Add another 500 μl Buffer BM4 to the PAXgene RNA spin column. Centrifuge for 2 minutes at 18,500×g.
  • Discard the tube containing the flow-through, and place the PAXgene RNA spin column in a new 2 ml processing tube. Centrifuge for 1 minute at 18,500×g.
  • Discard the tube containing the flow-through. Place the PAXgene RNA spin column in a labeled 1.5 ml microcentrifuge tube (final tube), and pipet 40 μl Buffer BR5 directly onto the PAXgene RNA spin column membrane. Centrifuge for 1 minute at 10,000×g to elute the RNA. It is important to wet the entire membrane with Buffer BR5 in order to achieve maximum elution efficiency.
  • Repeat the elution step as described, using 40 μl Buffer BR5 and the same microcentrifuge tube. Centrifuge for 1 minute at 20,000×g to elute the RNA.
  • Incubate the eluate for 5 minutes at 65° C. in the shaker-incubator without shaking. After incubation, chill immediately on ice. This incubation at 65° C. denatures the RNA for downstream applications. Do not exceed the incubation time or temperature.
  • If the RNA samples will not be used immediately, store at −20° C. or −70° C. Since the RNA remains denatured after repeated freezing and thawing, it is not necessary to repeat the incubation at 65° C.
  • Example 5: Measurement of RNA Levels
  • To provide a biomarker signature that can be used in clinical practice to diagnose lung cancer, a gene expression profile with the smallest number of genes that maintains satisfactory accuracy is provided by the use of 100 or more of the genes identified in Table I as well as by the use of 10 or more of the genes identified in Table II. These gene profiles or signatures permit simpler and more practical tests that are easy to use in a standard clinical laboratory. Because the number of discriminating genes is small enough, NanoString nCounter® platforms are developed using these gene expression profiles.
  • A. Nanostring nCounter® Platform Gene Expression Assay Protocol
  • Total RNA was isolated from whole blood using the Paxgene Blood miRNA Kit, as described above, and samples were checked for RNA quality. Samples were analyzed with the Agilent 2100 Bioanalyzer on a RNA Nano chip, using the RIN score and electropherogram picture as indicators for good sample integrity. Samples were also quantitated on the Nanodrop (ND-1000 Spectrophotometer) where 260/280 and 260/230 readings were recorded and evaluated for Nanostring-compatibility. From the concentrations taken by Nanodrop, total RNA samples were normalized to contain 100 ng in 5 μL, using Nuclease-free water as diluent, into Nanostring-provided tube strips. An 8 μL aliquot of a mixture of the Nanostring nCounter Reporter CodeSet and Hybridization Buffer (70 μL Hybridization Buffer, 42 μL Reporter CodeSet per 12 assays) and 2 μL of Capture ProbeSet was added to each 5 μL RNA sample. Samples were hybridized for 19 hours at 65° C. in the Thermocycler (Eppendorf). During hybridization, Reporter Probes, which have fluorescent barcodes specific to each mRNA of interest to the user, and biotinylated Capture Probes bound to their associated target mRNA to create target-probe complexes. After hybridization was complete, samples were then transferred to the nCounter Prep Station for processing using the Standard Protocol setting (Run Time: 2 hr 35 min). The Prep Station robot, during the Standard Protocol, washed samples to remove excess Reporter and Capture Probes. Samples were moved to a streptavidin-coated cartridge where purified target-probe complexes were immobilized in preparation for imaging by the nCounter Digital Analyzer. Upon completion, the cartridge was sealed and placed in the Digital Analyzer using a Field of View (FOV) setting at 555. A fluorescent microscope tabulated the raw counts for each unique barcode associated with a target mRNA. 
  • Data collected was stored in .csv files and then transferred to the Bioinformatics Facility for analysis according to the manufacturer's instructions.
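  • The sample normalization and hybridization mix volumes described above are simple proportional calculations. The following is a minimal sketch, assuming the Nanodrop reading is expressed in ng/μL; the function names are hypothetical, and no pipetting overage is added because the text does not state one.

```python
def dilution_volumes(stock_ng_per_ul, target_ng=100.0, final_ul=5.0):
    """Return (rna_ul, water_ul) that give target_ng of RNA in final_ul."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    rna_ul = target_ng / stock_ng_per_ul
    if rna_ul > final_ul:
        raise ValueError("stock is too dilute to reach the target mass")
    return rna_ul, final_ul - rna_ul


def hybridization_master_mix(n_samples, buffer_ul_per_12=70.0,
                             reporter_ul_per_12=42.0):
    """Scale the per-12-assay mix (70 uL buffer, 42 uL Reporter CodeSet)."""
    batches = n_samples / 12.0
    return buffer_ul_per_12 * batches, reporter_ul_per_12 * batches


# Hypothetical sample measured at 50 ng/uL on the Nanodrop.
rna, water = dilution_volumes(50.0)
print(f"{rna:.2f} uL RNA + {water:.2f} uL nuclease-free water")
```

  • Raising an error when the stock is too dilute is a design choice of this sketch; a laboratory might instead concentrate the sample or flag it for review.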
  • Example 6: Biomarker Selection
  • Support Vector Machine (SVM) can be applied to gene expression datasets for gene function discovery and classification. SVM has been found to be most efficient at distinguishing the more closely related cases and controls that reside in the margins. Primarily SVM-RFE (48, 54) was used to develop gene expression classifiers which distinguish clinically defined classes of patients from clinically defined classes of controls (smokers, non-smokers, COPD, granuloma, etc.). SVM-RFE is an SVM-based model utilized in the art that recursively removes genes based on their contribution to the discrimination between the two classes being analyzed. The lowest scoring genes by coefficient weights were removed, the remaining genes were scored again, and the procedure was repeated until only a few genes remained. This method has been used in several studies to perform classification and gene selection tasks. However, the choice of values for the algorithm parameters (penalty parameter, kernel function, etc.) can often influence performance.
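  • The recursive elimination loop described above can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in the studies: it assumes a linear SVM without a bias term, trained by full-batch subgradient descent on the regularized hinge loss, and the function names, learning rate, epoch count and penalty value are all hypothetical choices.

```python
def train_linear_svm(rows, labels, features, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM on the given feature columns; return {feature: weight}.

    rows: list of feature lists; labels: +1 or -1. There is no bias term,
    so the inputs are assumed to be centred beforehand.
    """
    w = {j: 0.0 for j in features}
    n = len(rows)
    for _ in range(epochs):
        grad = {j: lam * w[j] for j in features}
        for x, y in zip(rows, labels):
            margin = y * sum(w[j] * x[j] for j in features)
            if margin < 1.0:  # sample lies inside the margin
                for j in features:
                    grad[j] -= y * x[j] / n
        for j in features:
            w[j] -= lr * grad[j]
    return w


def svm_rfe(rows, labels, n_keep=1, drop_per_round=1):
    """Repeatedly retrain and drop the features with the smallest |weight|.

    Returns (surviving_features, elimination_order).
    """
    features = list(range(len(rows[0])))
    eliminated = []
    while len(features) > n_keep:
        w = train_linear_svm(rows, labels, features)
        ranked = sorted(features, key=lambda j: abs(w[j]))  # weakest first
        n_drop = min(drop_per_round, len(features) - n_keep)
        dropped = ranked[:n_drop]
        eliminated.extend(dropped)
        features = [j for j in features if j not in dropped]
    return features, eliminated


# Toy data: column 0 separates the classes, columns 1 and 2 do not.
toy_rows = [[1.0, 0.5, -0.5], [1.0, -0.5, 0.5],
            [-1.0, 0.5, -0.5], [-1.0, -0.5, 0.5]]
toy_labels = [1, 1, -1, -1]
print(svm_rfe(toy_rows, toy_labels))
```

  • Retraining after every removal is what distinguishes recursive elimination from a single ranking pass: a gene's weight can change once correlated genes have been dropped.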
  • SVM-RCE is a related SVM-based model in that, like SVM-RFE, it assesses the relative contributions of the genes to the classifier. However, SVM-RCE assesses the contributions of groups of correlated genes instead of individual genes. Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single gene or a small number of genes at each round of the algorithm.
  • The SVM-RCE method is briefly described here. Low expressing genes (average expression less than 2× background) were removed, quantile normalization performed, and then “outlier” arrays whose median expression values differ by more than 3 sigma from the median of the dataset were removed. The remaining samples were subject to SVM-RCE using ten repetitions of 10-fold cross-validation of the algorithm. The genes were reduced by t-test (applied on the training set) to an experimentally determined optimal value which produces highest accuracy in the final result. These starting genes were clustered by K-means into clusters of correlated genes whose average size is 3-5 genes. SVM classification scoring was carried out on each cluster using 3-fold resampling repeated 5 times, and the worst scoring clusters eliminated. Accuracy is determined on the surviving pool of genes using the left-out 10% of samples (testing set) and the top-scoring 100 genes were recorded. The procedure was repeated from the clustering step to an end point of 2 clusters. The optimal gene panel was taken to be the minimal number of genes which gives the maximal accuracy starting with the most frequently selected gene. The identity of the individual genes in this panel is not fixed, since the order reflects the number of times a given gene was selected in the top 100 informative genes and this order is subject to some variation.
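  • The final step above, ordering genes by how often they were recorded among the top-scoring genes and taking the smallest panel that reaches maximal accuracy, can be sketched as follows. The gene names are placeholders, and `accuracy_of_panel` is a hypothetical callable standing in for whatever cross-validated accuracy estimate is used.

```python
from collections import Counter


def rank_by_selection_frequency(top_gene_lists):
    """Rank genes by the number of rounds in which they were recorded.

    Ties are broken alphabetically so that the order is reproducible.
    """
    counts = Counter(g for genes in top_gene_lists for g in set(genes))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def minimal_panel(ranked, accuracy_of_panel):
    """Return the smallest top-n panel that attains the maximal accuracy."""
    genes = [g for g, _ in ranked]
    scores = [accuracy_of_panel(genes[:n]) for n in range(1, len(genes) + 1)]
    best = max(scores)
    n_best = scores.index(best) + 1  # first n that reaches the maximum
    return genes[:n_best], best


# Placeholder top lists from three hypothetical cross-validation rounds.
rounds = [["GENE_A", "GENE_B", "GENE_C"], ["GENE_A", "GENE_C"], ["GENE_A", "GENE_B"]]
print(rank_by_selection_frequency(rounds))
```

  • Because the ranking depends on counts accumulated over randomized rounds, genes with equal or nearly equal counts may swap places between runs, which is consistent with the statement that the identity of the individual genes in the panel is not fixed.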
  • A. Biomarker Selection.
  • Genes which score highest (by SVM) in discriminating cancerous tumors from benign nodules were examined for their utility for clinical tests. Factors considered include larger differences in expression levels between classes and low variability within classes. When selecting biomarkers for validation, an effort was made to select genes with distinct expression profiles to avoid selection of correlated genes and to identify genes with differential expression levels that were robust by alternative techniques including PCR and/or immuno-histochemistry.
  • B. Validation.
  • Three methods of validation were considered.
  • Cross-Validation: To minimize over-fitting within a dataset, K-fold cross-validation (K usually equal to 10) was used, in which the dataset is split randomly into K parts, with K−1 parts used for training and 1 for testing. Thus, for K=10 the algorithm was trained on a random selection of 90% of the patients and 90% of the controls and then tested on the remaining 10%. This was repeated until all of the samples had been employed as test subjects, so that the cumulated classifier makes use of all of the samples but no sample is tested using a training set of which it is a part. To reduce the impact of randomization, K-fold separation was performed M times, producing different combinations of patients and controls in each of the K folds each time. Therefore, for an individual dataset, M*K rounds of permuted selection of training and testing sets were used for each set of genes.
  • Independent Validation: To estimate the reproducibility of the data and the generality of the classifier, one needs to examine the classifier that was built using one dataset and tested using another dataset to estimate the performance of the classifier. To estimate the performance, validation on the second set was performed using the classifier developed with the original dataset.
  • Resampling (permutation): To demonstrate dependence of the classifier on the disease state, patients and controls from the dataset were chosen at random (permuted) and the classification was repeated. The accuracy of classification using randomized samples was compared to the accuracy of the developed classifier to determine the p value for the classifier, i.e., the possibility that the classifier might have been chosen by chance. In order to test the generality of a classifier developed in this manner, it was used to classify independent sets of samples that were not used in developing the classifier. The cross-validation accuracies of the permuted and original classifier were compared on independent test sets to confirm its validity in classifying new samples.
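  • The cross-validation and permutation schemes above can be illustrated with a short sketch. It assumes class-stratified folds (so that each test fold holds roughly 10% of the patients and 10% of the controls when K=10) and an empirical p value with a +1 correction; both are common conventions rather than details stated in the text, and the function names are hypothetical.

```python
import random


def repeated_stratified_kfold(labels, k=10, m=10, seed=0):
    """Yield (train_idx, test_idx) for M repetitions of stratified K-fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(m):
        folds = [[] for _ in range(k)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for pos, i in enumerate(shuffled):
                folds[pos % k].append(i)  # deal each class round-robin
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def permutation_p_value(observed_accuracy, permuted_accuracies):
    """Share of label-permuted runs that match or beat the observed accuracy.

    The +1 terms count the observed run itself, so p is never exactly zero.
    """
    hits = sum(1 for a in permuted_accuracies if a >= observed_accuracy)
    return (hits + 1) / (len(permuted_accuracies) + 1)


# 20 hypothetical patients (1) and 20 controls (0): M*K = 3*10 = 30 rounds.
splits = list(repeated_stratified_kfold([1] * 20 + [0] * 20, k=10, m=3))
print(len(splits), "train/test rounds")
```

  • Within each repetition every sample appears in exactly one test fold, so no sample is ever tested by a classifier trained on it.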
  • C. Classifier Performance
  • Performance of each classifier was estimated by different methods, and several performance measurements were used to compare classifiers with each other. These measurements include accuracy, area under the ROC curve, sensitivity, specificity, true positive rate and true negative rate. Based on the required properties of the classification of interest, different performance measurements can be used to pick the optimal classifier; e.g., a classifier used to screen the whole population would require better specificity to compensate for the small (~1%) prevalence of the disease and thereby avoid a large number of false positive hits, while a diagnostic classifier for patients in hospital should be more sensitive.
  • For diagnosing cancerous tumors from benign nodules, higher sensitivity is more desirable than specificity, as the patients are already at high risk.
  • Example 7: Testing of the Classifiers
  • Peripheral blood samples were all collected in PAXgene RNA stabilization tubes and RNA was extracted according to the manufacturer's instructions. Samples were tested on a Nanostring nCounter™ (as described above) against a custom panel of 559 probes (Table III). In addition, they were tested against a 100 probe subset of the 559 marker panel.
  • For the 559 Classifier, 432 probes were selected based on previous microarray data, 107 probes were selected from Nanostring studies and 20 were housekeeping genes. We analyzed 610 PAXgene RNA samples (278 cancers, 332 controls) derived from 5 collection sites. For QC, a Universal RNA standard (Agilent) was included in each batch of 36 samples tested. Probe expression values were normalized using the 20 housekeeping genes as well as spike-in positive and negative controls supplied by Nanostring (included in classifier). Z-scores were calculated for probe count values and served as the input to a Support Vector Machine (SVM) classifier using a polynomial kernel. Classification performance was evaluated by 10-fold cross-validation of the samples.
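  • The normalization and Z-score steps above might look like the following sketch. Several details are assumptions not stated in the text: housekeeping scaling uses the geometric mean of the housekeeping probes, Z-scores are computed per probe across samples with the population standard deviation, and the spike-in control steps are omitted. All names are hypothetical.

```python
import math
from statistics import mean, pstdev


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def housekeeping_normalize(samples, housekeeping):
    """Scale each sample so its housekeeping geometric mean matches the cohort mean.

    samples: {sample_id: {probe: count}}; housekeeping: list of probe names.
    """
    hk = {s: geometric_mean([c[p] for p in housekeeping])
          for s, c in samples.items()}
    target = mean(hk.values())
    return {s: {p: v * target / hk[s] for p, v in c.items()}
            for s, c in samples.items()}


def zscore_probes(samples):
    """Z-score each probe across samples; constant probes map to 0.0."""
    probes = list(next(iter(samples.values())).keys())
    out = {s: {} for s in samples}
    for p in probes:
        vals = [samples[s][p] for s in samples]
        mu, sd = mean(vals), pstdev(vals)
        for s in samples:
            out[s][p] = (samples[s][p] - mu) / sd if sd > 0 else 0.0
    return out


# Two hypothetical samples with one housekeeping probe and one target probe.
raw = {"s1": {"HK1": 100.0, "G1": 10.0}, "s2": {"HK1": 200.0, "G1": 40.0}}
norm = housekeeping_normalize(raw, ["HK1"])
print(zscore_probes(norm)["s1"]["G1"])
```

  • In a cross-validated setting the mean and standard deviation would normally be estimated on the training fold only and then applied to the test fold, so that no information from test samples reaches the classifier.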
  • A. 559 Classifier
  • As shown in FIGS. 2A to 2B, the 559 classifier developed on all the samples showed a ROC-AUC of 0.81 (FIG. 2A). With the Sensitivity set at 90%, the specificity is 46%. When performed on a balanced set of 556 samples (278 cancer, 278 nodule), similar performance is shown (FIG. 2B). For both sets, UHR controls, post samples, and patients with other cancers were excluded.
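  • Reporting specificity at a fixed 90% sensitivity amounts to choosing the strictest score threshold that still detects at least 90% of cancers. A minimal sketch, assuming the classifier emits a continuous score where higher means more cancer-like; the function names and example scores are hypothetical.

```python
def specificity_at_sensitivity(scores, labels, min_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) at the strictest
    threshold whose sensitivity is still >= min_sensitivity.

    labels: 1 = cancer, 0 = benign. A sample is positive if score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        sens = sum(s >= t for s in pos) / len(pos)
        if sens >= min_sensitivity:
            spec = sum(s < t for s in neg) / len(neg)
            return t, sens, spec
    return None


def roc_auc(scores, labels):
    """AUC as the chance that a cancer outscores a benign nodule (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: five cancers followed by five benign nodules.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.65, 0.5, 0.3, 0.2, 0.1]
example_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(specificity_at_sensitivity(example_scores, example_labels))
print(roc_auc(example_scores, example_labels))
```

  • Lowering the threshold can only raise sensitivity and lower specificity, so the first threshold that meets the sensitivity target is also the one with the highest attainable specificity.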
  • When nodule classification accuracy is assessed by size without using a specific threshold for sensitivity, we find that as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases (FIG. 3). In this analysis, nodules ≤8 mm were correctly classified 88.9% of the time; accuracy was 75% for nodules >8 mm and ≤12 mm, 68% for nodules >12 mm and ≤16 mm, and 53.6% for nodules >16 mm. See Table IV below.
  • TABLE IV
    Nodule Size     Correct   Incorrect   Total   Specificity
    <=5 mm          108       19          127     85.0%
    >5, <=8 mm      88        11          99      88.9%
    >8, <=12 mm     40        13          53      75.5%
    >12, <=16 mm    17        8           25      68.0%
    >16 mm          15        13          28      53.6%
    Total           268       64          332     80.7%
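  • The percentages in Table IV follow directly from the correct and incorrect counts. The short sketch below recomputes them as a check; the counts are copied from Table IV, and the function names are hypothetical.

```python
# (size bin, correctly classified, incorrectly classified) from Table IV
TABLE_IV = [
    ("<=5 mm", 108, 19),
    (">5, <=8 mm", 88, 11),
    (">8, <=12 mm", 40, 13),
    (">12, <=16 mm", 17, 8),
    (">16 mm", 15, 13),
]


def specificity_by_bin(rows):
    """Return {bin: percent of benign nodules classified correctly}."""
    return {name: 100.0 * ok / (ok + bad) for name, ok, bad in rows}


def overall(rows):
    """Return (correct, total, percent correct) over all size bins."""
    ok = sum(r[1] for r in rows)
    total = sum(r[1] + r[2] for r in rows)
    return ok, total, 100.0 * ok / total


for name, pct in specificity_by_bin(TABLE_IV).items():
    print(f"{name:>14}: {pct:.1f}%")
print("overall: %d / %d = %.1f%%" % overall(TABLE_IV))
```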
  • A second set of nodules was tested and the accuracy of the classifier for size groups was determined by sample group (cancer vs benign nodule). Similarly, as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases (FIGS. 4A to 4C). For cancers >5 mm, r=0.95. For nodules of all sizes, r=0.97. The chart shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.
  • Since classification accuracy was found to be negatively correlated with benign nodule size, we reanalyzed the data using only nodules <10 mm (n=244) (FIG. 5A) with sensitivity fixed at 90%. In this case the specificity rises to 54% and the ROC-AUC to 0.85. For larger nodules, >10 mm (n=88), the specificity drops to 24% and the ROC-AUC drops to 0.71 (FIG. 5B). See Table V below.
  • TABLE V
                                     Small (≤10 mm)   Large (>10 mm)   All nodules
    N (nodules)                      244              88               332
    min                              1                10.4             1
    max                              10               90               90
    mean                             6.07             17.8             8.7
    median                           6                15               6
    std                              1.73             10.6             7.13
    ROC Area                         0.85             0.71             0.81
    Specificity at 90% Sensitivity   54%              42%              46%
  • B. 100 Marker Classifier
  • We then reanalyzed the data from the 633 samples analyzed by W559 on the Nanostring platform in order to identify the minimal number of probes required to maintain the performance attained with the whole panel. We used SVM-RFE for probe selection as previously described. We used 75% of the data as the training set for SVM-RFE and then tested the performance of the top 100 probes (Table II) selected by this process on an independent testing set composed of the remaining 25% of the samples. Samples were randomly selected for the training and testing sets (Table VI below). The accuracy obtained on the testing set is shown in FIG. 6. In this analysis, at a sensitivity of 90%, specificity was 62%; at a sensitivity of 79%, specificity was 68%; and at a sensitivity of 71%, specificity was 75% (FIG. 6). In summary, the ROC-AUC is 0.82, and at a sensitivity of 0.90 we achieve a specificity of 0.62.
  • TABLE VI
    nodules                  cancer
    >       <=      n        >       <=      n
    0       5       130      0       14      86
    5       8       109      14      22      75
    8       12.5    65       22      33      64
    12.5            57       33              47
  • Each and every patent, patent application, and publication, including the priority application, U.S. Provisional Patent Application No. 62/352,865, filed Jun. 21, 2016, and publicly available gene sequence cited throughout the disclosure is expressly incorporated herein by reference in its entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims include such embodiments and equivalent variations.

Claims (20)

1. A composition for diagnosing the existence or evaluating the progression of a lung cancer in a mammalian subject, said composition comprising at least 10 polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.
2. The composition of claim 1, wherein at least one polynucleotide or oligonucleotide is attached to a detectable label.
3. The composition of claim 2, wherein each polynucleotide or oligonucleotide is attached to a different detectable label.
4. The composition of claim 1, further comprising a capture oligonucleotide, which hybridizes to at least one polynucleotide or oligonucleotide.
5. The composition of claim 4, wherein the capture oligonucleotide is capable of hybridizing to each polynucleotide or oligonucleotide.
6. The composition of claim 4, wherein the capture oligonucleotide binds to a substrate.
7. The composition of claim 6, further comprising a substrate to which the capture oligonucleotide binds.
8. The composition of claim 1, comprising at least 15 polynucleotides or oligonucleotides.
9. The composition of claim 1, comprising at least 25 polynucleotides or oligonucleotides.
10. The composition of claim 1, comprising at least 50 polynucleotides or oligonucleotides.
11. The composition of claim 1, comprising at least 100 polynucleotides or oligonucleotides.
12. The composition of claim 1, comprising at least 500 polynucleotides or oligonucleotides.
13. The composition of claim 1, comprising polynucleotides or oligonucleotides capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I.
14. A kit comprising the composition of claim 1 and an apparatus for sample collection.
15. A method for diagnosing the existence or evaluating a lung cancer in a mammalian subject comprising identifying changes in the expression of 10 or more genes in the sample of said subject, said genes selected from the genes of Table I; and comparing said subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlates with a diagnosis or evaluation of a lung cancer.
16. The method according to claim 15, wherein said diagnosis or evaluation comprise one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy.
17. The method according to claim 15, wherein said changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.
18. The method according to claim 15, further comprising identifying the size of a lung nodule in the subject.
19. The method according to claim 15, wherein the specificity is about 46% at about 90% sensitivity or about 54% at about 90% for nodules <10 mm.
20. The method according to claim 15, wherein the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules ≥8 mm and <12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.
US18/306,548 2016-06-21 2023-04-25 Compositions and methods for diagnosing lung cancers using gene expression profiles Pending US20230366034A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/306,548 US20230366034A1 (en) 2016-06-21 2023-04-25 Compositions and methods for diagnosing lung cancers using gene expression profiles

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662352865P 2016-06-21 2016-06-21
PCT/US2017/038571 WO2017223216A1 (en) 2016-06-21 2017-06-21 Compositions and methods for diagnosing lung cancers using gene expression profiles
US201816312036A 2018-12-20 2018-12-20
US18/306,548 US20230366034A1 (en) 2016-06-21 2023-04-25 Compositions and methods for diagnosing lung cancers using gene expression profiles

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US16/312,036 Continuation US11661632B2 (en) 2016-06-21 2017-06-21 Compositions and methods for diagnosing lung cancers using gene expression profiles
PCT/US2017/038571 Continuation WO2017223216A1 (en) 2016-06-21 2017-06-21 Compositions and methods for diagnosing lung cancers using gene expression profiles

Publications (1)

Publication Number Publication Date
US20230366034A1 true US20230366034A1 (en) 2023-11-16

Family

ID=60783927

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/312,036 Active 2038-07-29 US11661632B2 (en) 2016-06-21 2017-06-21 Compositions and methods for diagnosing lung cancers using gene expression profiles
US18/306,548 Pending US20230366034A1 (en) 2016-06-21 2023-04-25 Compositions and methods for diagnosing lung cancers using gene expression profiles

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/312,036 Active 2038-07-29 US11661632B2 (en) 2016-06-21 2017-06-21 Compositions and methods for diagnosing lung cancers using gene expression profiles

Country Status (13)

Country Link
US (2) US11661632B2 (en)
EP (1) EP3472361A4 (en)
JP (1) JP2019522478A (en)
KR (1) KR20190026769A (en)
CN (1) CN109715830A (en)
AU (1) AU2017281099A1 (en)
BR (1) BR112018076528A2 (en)
CA (1) CA3026809A1 (en)
IL (1) IL263635A (en)
MX (1) MX2018016051A (en)
RU (1) RU2018145532A (en)
SG (1) SG11201810914VA (en)
WO (1) WO2017223216A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108350485A (en) 2015-10-30 2018-07-31 Exact Sciences Development Company, LLC The multiplex amplification detection assay of plasma DNA and separation and detection
CA3022911A1 (en) 2016-05-05 2017-11-09 Exact Sciences Development Company, Llc Detection of lung neoplasia by analysis of methylated DNA
CN108624692B (en) * 2018-06-25 2021-08-03 Shanghai Bohao Medical Laboratory Co., Ltd. Gene marker for screening benign and malignant pulmonary nodules and application thereof
CA3049459A1 (en) 2017-01-27 2018-08-02 Exact Sciences Development Company, Llc Detection of colon neoplasia by analysis of methylated DNA
JP2021506308A (en) * 2017-12-19 2021-02-22 The Wistar Institute of Anatomy and Biology Compositions and Methods for Diagnosing Lung Cancer Using Gene Expression Profiles
BR112021009795A2 (en) * 2018-11-27 2021-08-17 Exact Sciences Development Company, Llc characterization methods and to characterize a sample, kit and composition
KR102199000B1 (en) * 2020-09-25 2021-01-06 Ewha Womans University Industry-Academic Cooperation Foundation A novel biomarker for diagnosing liver cancer

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6582908B2 (en) * 1990-12-06 2003-06-24 Affymetrix, Inc. Oligonucleotides
CA2478850C (en) 2002-03-13 2018-02-27 Genomic Health, Inc. Gene expression profiling in biopsied tumor tissues
EP2305811A1 (en) * 2005-07-27 2011-04-06 Oncotherapy Science, Inc. Method of diagnosing small cell lung cancer
US8053183B2 (en) * 2005-07-27 2011-11-08 Oncotherapy Science, Inc. Method of diagnosing esophageal cancer
WO2007141004A1 (en) 2006-06-09 2007-12-13 Bayer Healthcare Ag Use of adipsin (adn) as a therapeutic or diagnostic target
US20090047689A1 (en) 2007-06-20 2009-02-19 John Kolman Autoantigen biomarkers for early diagnosis of lung adenocarcinoma
WO2009075799A2 (en) 2007-12-05 2009-06-18 The Wistar Institute Of Anatomy And Biology Method for diagnosing lung cancers using gene expression profiles in peripheral blood mononuclear cells
JP5701212B2 (en) 2008-09-09 2015-04-15 ソマロジック・インコーポレーテッド Lung cancer biomarkers and their use
GB201000688D0 (en) * 2010-01-15 2010-03-03 Diagenic Asa Product and method
US20130116150A1 (en) 2010-07-09 2013-05-09 Somalogic, Inc. Lung Cancer Biomarkers and Uses Thereof
EP2527459A1 (en) * 2011-05-02 2012-11-28 Rheinische Friedrich-Wilhelms-Universität Bonn Blood-based gene detection of non-small cell lung cancer
CA2869729C (en) * 2012-04-10 2021-12-28 Vib Vzw Novel markers for detecting microsatellite instability in cancer and determining synthetic lethality with inhibition of the DNA base excision repair pathway
EP2841603A4 (en) * 2012-04-26 2016-05-25 Allegro Diagnostics Corp Methods for evaluating lung cancer status
US20150315643A1 (en) * 2012-12-13 2015-11-05 Baylor Research Institute Blood transcriptional signatures of active pulmonary tuberculosis and sarcoidosis
KR102461014B1 (en) * 2014-07-14 2022-10-31 Veracyte, Inc. Methods for Evaluating Lung Cancer Status
JP2021506308A (en) 2017-12-19 2021-02-22 The Wistar Institute of Anatomy and Biology Compositions and Methods for Diagnosing Lung Cancer Using Gene Expression Profiles

Also Published As

Publication number Publication date
AU2017281099A1 (en) 2019-01-03
IL263635A (en) 2019-02-03
JP2019522478A (en) 2019-08-15
EP3472361A4 (en) 2020-02-19
RU2018145532A (en) 2020-07-23
MX2018016051A (en) 2019-08-29
SG11201810914VA (en) 2019-01-30
RU2018145532A3 (en) 2020-10-16
CN109715830A (en) 2019-05-03
KR20190026769A (en) 2019-03-13
US20200123613A1 (en) 2020-04-23
WO2017223216A1 (en) 2017-12-28
US11661632B2 (en) 2023-05-30
BR112018076528A2 (en) 2019-04-02
EP3472361A1 (en) 2019-04-24
CA3026809A1 (en) 2017-12-28

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION