US20230366034A1

US20230366034A1 - Compositions and methods for diagnosing lung cancers using gene expression profiles

Info

Publication number: US20230366034A1
Application number: US18/306,548
Authority: US
Inventors: Michael Showe; Louise C. Showe; Andrei V. Kossenkov
Original assignee: Wistar Institute of Anatomy and Biology
Current assignee: Wistar Institute of Anatomy and Biology
Priority date: 2016-06-21
Filing date: 2023-04-25
Publication date: 2023-11-16
Also published as: AU2017281099A1; IL263635A; JP2019522478A; EP3472361A4; RU2018145532A; MX2018016051A; SG11201810914VA; RU2018145532A3; CN109715830A; KR20190026769A; US20200123613A1; WO2017223216A1; US11661632B2; BR112018076528A2; EP3472361A1; CA3026809A1

Abstract

Methods and compositions are provided for diagnosing lung cancer in a mammalian subject by use of 10 or more selected genes, e.g., a gene expression profile, from the blood of the subject which is characteristic of disease. The gene expression profile includes 10 or more genes of Table I or Table II herein.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation application of U.S. application Ser. No. 16/312,036, filed Dec. 20, 2018, which is a National Stage Entry under 35 U.S.C. 371 of International Patent Application No. PCT/US2017/038571, filed Jun. 21, 2017, which claims priority to U.S. Provisional Application No. 62/352,865, filed Jun. 21, 2016. These applications are incorporated by reference herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. CA010815 awarded by the National Institutes of Health. The government has certain rights in the invention.

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED IN ELECTRONIC FORM

The contents of the electronic sequence listing (WST164USC1_SeqList.xml; size 536,359 bytes; and Date of Creation: Apr. 25, 2023) are herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

Lung cancer is the most common cause of cancer mortality worldwide. In the United States, lung cancer is the second most prevalent cancer in both men and women and will account for more than 174,000 new cases per year and more than 162,000 cancer deaths. In fact, lung cancer accounts for more deaths each year than breast, prostate and colorectal cancers combined.
The high mortality (80-85% in five years), which has shown little or no improvement in the past 30 years, emphasizes the fact that new and effective tools to facilitate early diagnosis prior to metastasis to regional nodes or beyond the lung are needed.
High risk populations include smokers, former smokers, and individuals with markers associated with genetic predispositions. Because surgical removal of early stage tumors remains the most effective treatment for lung cancer, there has been great interest in screening high-risk patients with low dose spiral CT (LDCT). This strategy identifies non-calcified pulmonary nodules in approximately 30-70% of high risk individuals, but only a small proportion of detected nodules are ultimately diagnosed as lung cancers (0.4 to 2.7%). Currently, the only way to differentiate subjects with lung nodules of benign etiology from subjects with malignant nodules is an invasive biopsy, surgery, or prolonged observation with repeated scanning. Even using the best clinical algorithms, 20-55% of patients selected to undergo surgical lung biopsy for indeterminate lung nodules are found to have benign disease, and those who do not undergo immediate biopsy or resection require sequential imaging studies. The use of serial CT in this group of patients runs the risk of delaying potentially curative therapy, and adds the costs of repeat scans, the not-insignificant radiation doses, and the anxiety of the patient.
Ideally, a diagnostic test would be easily accessible, inexpensive, demonstrate high sensitivity and specificity, and result in improved patient outcomes (medically and financially). Others have shown that classifiers which utilize epithelial cells have high accuracy. However, harvesting these cells requires an invasive bronchoscopy. See, Silvestri et al, N Engl J Med. 2015 Jul. 16; 373(3): 243-251, which is incorporated herein by reference.
Efforts are in progress to develop non-invasive diagnostics using sputum, blood or serum and analyzing for products of tumor cells, methylated tumor DNA, single nucleotide polymorphisms (SNPs), expressed messenger RNA or proteins. This broad array of molecular tests with potential utility for early diagnosis of lung cancer has been discussed in the literature. Although each of these approaches has its own merits, none has yet passed the exploratory stage in the effort to detect patients with early stage lung cancer, even in high-risk groups, or patients who have a preliminary diagnosis based on radiological and other clinical factors. A simple blood test, a routine event associated with regular clinical office visits, would be an ideal diagnostic test.

SUMMARY OF THE INVENTION

In one aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a patient sample. Each gene, gene fragment, gene transcript or expression product is selected from the genes of Table I or Table II. In one embodiment, at least one polynucleotide or oligonucleotide is attached to a detectable label. In one embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 559 genes in Table I. In another embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 100 genes in Table II.
In another aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more ligands, wherein each ligand hybridizes to a different gene expression product in a patient sample. Each gene expression product is selected from the genes of Table I or Table II. In one embodiment, at least one ligand is attached to a detectable label. In one embodiment, the composition or kit includes ligands which detect the expression products of each of the 559 genes in Table I. In another embodiment, the composition or kit includes ligands which detect the expression products of each of the 100 genes in Table II.
The compositions described herein enable detection of changes in expression in the genes in the subject's gene expression profile from that of a reference gene expression profile. The various reference gene expression profiles are described below. In one embodiment, the composition provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.
In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying changes in the expression of three or more genes in the sample of a subject, said genes selected from the genes of Table I or Table II, and comparing that subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of said genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.
In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a subject, the gene expression profile comprising 10 or more gene expression products of 10 or more informative genes as described herein. The 10 or more informative genes are selected from the genes of Table I or Table II. In one embodiment, the gene expression profile contains all 559 genes of Table I. In another embodiment, the gene expression profile contains all 100 genes of Table II. The subject's gene expression profile is compared with a reference gene expression profile from a variety of sources described below. Changes in expression of the informative genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.
In another aspect, a method of detecting lung cancer in a patient is provided. The method includes obtaining a sample from the patient; and detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.
In yet another aspect, a method of diagnosing lung cancer in a subject is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.
In another aspect, a method of diagnosing and treating lung cancer in a subject having a neoplastic growth is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and removing the neoplastic growth. Other appropriate treatments may also be provided.
Other aspects and advantages of these compositions and methods are described further in the following detailed description of the preferred embodiments thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a table showing patient characteristics for the samples used in Example 1.

FIGS. 2A and 2B are graphs showing the cross validated support vector machine classifier (CV SVM) of all 610 samples (FIG. 2A, Accuracy=0.75, ROC Area=0.81. According to the curve, when the sensitivity is 0.91, the specificity is 0.46; when the sensitivity is 0.72, the specificity is 0.77) and a balanced set of 556 samples (FIG. 2B, Accuracy=0.76, ROC Area=0.81. According to the curve, when the sensitivity is 0.90, the specificity is 0.48; when the sensitivity is 0.76, the specificity is 0.77), using the 559 Classifier. The full and balanced sets show similar performance.

FIG. 3 is a bar graph showing sensitivity of the classifier by nodule size groups (x-axis). The data show that larger nodules are more likely to be misclassified (p=1.54×10⁻⁴).

FIGS. 4A to 4C show the classification of sample groups (cancer, FIG. 4B, n=204; and nodule, FIG. 4C, n=331) stratified by lesion size. For cancers >5 mm, r=0.95. For nodules of all sizes, r=0.97. The chart (FIG. 4A) shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.

FIGS. 5A and 5B are graphs showing the cross validated support vector machine classifier (CV SVM) of all cancer samples (n=278) vs. small nodules (<10 mm) (n=244) (FIG. 5A, Accuracy=0.79, ROC Area=0.85. According to the curve, when the sensitivity is 0.90, the specificity is 0.54; when the sensitivity is 0.77, the specificity is 0.82) and 10-fold CV SVM using all cancer samples (n=278) vs. large nodules (≥10 mm) (n=88) (FIG. 5B, Accuracy=0.76, ROC Area=0.71. According to the curve, when the sensitivity is 0.90, the specificity is 0.24; when the sensitivity is 0.87, the specificity is 0.42).

FIG. 6 is a graph showing the cross validated support vector machine classifier (CV SVM) of 25% of the data set used for the 559 Classifier, used as a testing set for the 100 Classifier. ROC Area=0.82. According to the curve, when the sensitivity is 0.90, the specificity is 0.62; when the sensitivity is 0.79, the specificity is 0.68; and when the sensitivity is 0.71, the specificity is 0.75.
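By way of illustration only, the "ROC Area" and the sensitivity/specificity pairs quoted in the figure descriptions above can be computed from classifier scores as sketched below. The scores, labels, and function names are hypothetical and are not taken from the examples; the sketch assumes that a higher score indicates cancer and that label 1 denotes cancer and label 0 denotes a benign nodule.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def operating_point(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called cancer."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical classifier scores for eight samples (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(roc_auc(scores, labels))                 # 0.875
print(operating_point(scores, labels, 0.5))    # (1.0, 0.75)
print(operating_point(scores, labels, 0.75))   # (0.5, 1.0)
```

Moving the threshold trades sensitivity against specificity, which is why each figure description reports more than one sensitivity/specificity pair for a single curve.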

DETAILED DESCRIPTION OF THE INVENTION

The methods and compositions described herein apply gene expression technology to blood screening for the detection and diagnosis of lung cancer. The compositions and methods described herein provide the ability to distinguish a cancerous tumor from a non-cancerous nodule, by determining a characteristic RNA expression profile of the genes of the blood of a mammalian, preferably human, subject. The profile is compared with the profile of one or more subjects of the same class (e.g., patients having lung cancer or a non-cancerous nodule) or a control to provide a useful diagnosis.
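As a minimal sketch of comparing a subject's profile with class reference profiles, the following assigns the subject to the reference profile with which its expression values correlate best. This correlation comparison is an assumption chosen for simplicity and is not the support vector machine classifier referred to in the figures; the gene names, expression values, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def closest_reference(subject_profile, reference_profiles):
    """Return (class_name, correlation) of the best-matching reference profile."""
    genes = sorted(subject_profile)
    subject_vec = [subject_profile[g] for g in genes]
    best = None
    for name, ref in reference_profiles.items():
        r = pearson(subject_vec, [ref[g] for g in genes])
        if best is None or r > best[1]:
            best = (name, r)
    return best


# Hypothetical log2 expression levels; gene names are placeholders.
references = {
    "cancer": {"GENE_A": 9.0, "GENE_B": 6.0, "GENE_C": 8.5, "GENE_D": 5.0},
    "benign_nodule": {"GENE_A": 7.0, "GENE_B": 8.0, "GENE_C": 6.5, "GENE_D": 7.5},
}
subject = {"GENE_A": 8.8, "GENE_B": 6.3, "GENE_C": 8.1, "GENE_D": 5.4}

print(closest_reference(subject, references)[0])  # cancer
```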
These methods of lung cancer screening employ compositions suitable for conducting a simple and cost-effective and non-invasive blood test using gene expression profiling that could alert the patient and physician to obtain further studies, such as a chest radiograph or CT scan, in much the same way that the prostate specific antigen is used to help diagnose and follow the progress of prostate cancer. The application of these profiles provides overlapping and confirmatory diagnoses of the type of lung disease, beginning with the initial test for malignant vs. non-malignant disease.
“Patient” or “subject” as used herein means a mammalian animal, including a human, a veterinary or farm animal, a domestic animal or pet, and animals normally used for clinical research. In one embodiment, the subject of these methods and compositions is a human.
“Control” or “Control subject” as used herein refers to the source of the reference gene expression profiles as well as the particular panel of control subjects described herein. In one embodiment, the control or reference level is from a single subject. In another embodiment, the control or reference level is from a population of individuals sharing a specific characteristic. In yet another embodiment, the control or reference level is an assigned value which correlates with the level of a specific control individual or population, although not necessarily measured at the time of assaying the test subject's sample. In one embodiment, the control subject or reference is from a patient (or population) having a non-cancerous nodule. In another embodiment, the control subject or reference is from a patient (or population) having a cancerous tumor. In other embodiments, the control subject can be a subject or population with lung cancer, such as a subject who is a current or former smoker with malignant disease; a subject with a solid lung tumor prior to surgery for removal of same; a subject with a solid lung tumor following surgical removal of said tumor; a subject with a solid lung tumor prior to therapy for same; and a subject with a solid lung tumor during or following therapy for same. In other embodiments, the controls for purposes of the compositions and methods described herein include any of the following classes of reference human subjects with no lung cancer. Such non-healthy controls (NHC) include the classes of a smoker with non-malignant disease, a former smoker with non-malignant disease (including patients with lung nodules), a non-smoker who has chronic obstructive pulmonary disease (COPD), and a former smoker with COPD. In still other embodiments, the control subject is a healthy non-smoker with no disease or a healthy smoker with no disease.
“Sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. The most suitable sample for use in this invention includes whole blood. Other useful biological samples include, without limitation, peripheral blood mononuclear cells, plasma, saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoscopy sample, bronchoalveolar lavage fluid, and other cellular exudates from a patient having cancer. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples are concentrated by conventional means.
As used herein, the term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. More specifically, as used herein, the term “cancer” means any lung cancer. In one embodiment, the lung cancer is non-small cell lung cancer (NSCLC). In a more specific embodiment, the lung cancer is lung adenocarcinoma (AC or LAC). In another more specific embodiment, the lung cancer is lung squamous cell carcinoma (SCC or LSCC). In another embodiment, the lung cancer is a stage I or stage II NSCLC. In still another embodiment, the lung cancer is a mixture of early and late stages and types of NSCLC.
The term “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The term “nodule” refers to an abnormal buildup of tissue which is benign. The term “cancerous tumor” refers to a malignant tumor.
By “diagnosis” or “evaluation” it is meant a diagnosis of a lung cancer, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In one embodiment, “diagnosis” or “evaluation” refers to distinguishing between a cancerous tumor and a benign pulmonary nodule.
As used herein, “sensitivity” (also called the true positive rate), measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
As used herein, “specificity” (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
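For illustration, the two definitions above reduce to simple ratios of counts: sensitivity is true positives divided by all actual positives, and specificity is true negatives divided by all actual negatives. The sketch below assumes that "cancer" is the positive class; the labels and the function name are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="cancer"):
    """Return (sensitivity, specificity) for paired true/predicted labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1   # positive correctly identified
            else:
                fn += 1   # positive missed
        else:
            if pred == positive:
                fp += 1   # negative wrongly called positive
            else:
                tn += 1   # negative correctly identified
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls for eight subjects (illustrative only).
truth = ["cancer"] * 4 + ["nodule"] * 4
calls = ["cancer", "cancer", "cancer", "nodule",
         "nodule", "nodule", "nodule", "cancer"]

sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# sensitivity=0.75, specificity=0.75
```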
By “change in expression” is meant an upregulation of one or more selected genes in comparison to the reference or control; a downregulation of one or more selected genes in comparison to the reference or control; or a combination of certain upregulated genes and down regulated genes.
By “therapeutic reagent” or “regimen” is meant any type of treatment employed in the treatment of cancers with or without solid tumors, including, without limitation, chemotherapeutic pharmaceuticals, biological response modifiers, radiation, diet, vitamin therapy, hormone therapies, gene therapy, surgical resection, etc.
By “informative genes” as used herein is meant those genes the expression of which changes (either in an up-regulated or down-regulated manner) characteristically in the presence of lung cancer. A statistically significant number of such informative genes thus form suitable gene expression profiles for use in the methods and compositions. Such genes are shown in Table I and Table II below. Such genes make up the “expression profile”.
The term “statistically significant number of genes” in the context of this invention differs depending on the degree of change in gene expression observed. The degree of change in gene expression varies with the type of cancer and with the size or spread of the cancer or solid tumor. The degree of change also varies with the immune response of the individual and is subject to variation with each individual. For example, in one embodiment of this invention, a large change, e.g., 2-3 fold increase or decrease in a small number of genes, e.g., in about 10 to 20 genes, is statistically significant. In another embodiment, a smaller relative change in about 15 or more genes is statistically significant.
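As a hedged illustration of a fold change in expression, the sketch below computes the log2 ratio of a subject's level to a reference level for each gene and lists the genes changed by at least a chosen fold in either direction. The 2-fold default mirrors the "2-3 fold" example above; the gene names, expression values, and function names are hypothetical.

```python
import math


def log2_fold_changes(subject, reference):
    """Map gene -> log2(subject level / reference level) for shared genes."""
    return {
        gene: math.log2(subject[gene] / reference[gene])
        for gene in subject
        if gene in reference and subject[gene] > 0 and reference[gene] > 0
    }


def genes_changed(subject, reference, min_fold=2.0):
    """Return (upregulated, downregulated) gene lists at a fold threshold."""
    cutoff = math.log2(min_fold)
    changes = log2_fold_changes(subject, reference)
    up = sorted(g for g, lfc in changes.items() if lfc >= cutoff)
    down = sorted(g for g, lfc in changes.items() if lfc <= -cutoff)
    return up, down


# Hypothetical normalized expression levels; gene names are placeholders.
reference = {"GENE_A": 100.0, "GENE_B": 80.0, "GENE_C": 50.0, "GENE_D": 40.0}
subject = {"GENE_A": 250.0, "GENE_B": 20.0, "GENE_C": 55.0, "GENE_D": 120.0}

up, down = genes_changed(subject, reference)
print(up, down)  # ['GENE_A', 'GENE_D'] ['GENE_B']
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically (+1 and -1), which keeps the up and down thresholds comparable.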
Thus, the methods and compositions described herein contemplate examination of the expression profile of a “statistically significant number of genes” ranging from 5 to about 559 genes in a single profile. In one embodiment, the genes are selected from Table I. In another embodiment, the genes are selected from Table II. In various embodiments, the gene profile is formed by a statistically significant number of 5 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 65 or more, 70 or more, 75 or more, 80 or more, 85 or more, 90 or more, 95 or more, 100 or more, 200 or more, 300 or more, or 350 or more genes. In still other embodiments, the gene profile is formed by 400 or more genes, by 539 genes, or by 559 genes. In still other embodiments, the gene profiles examined as part of these methods contain, as statistically significant numbers of genes, from 10 to 559 genes, and any numbers therebetween.
In another embodiment, the gene profile is formed by any whole number of genes from 1 through 558, or by all 559 genes, of Table I. In another embodiment, the gene profile is formed by any whole number of genes from 1 through 99, or by all 100 genes, of Table II.
Table I and Table II below refer to a collection of known genes useful in discriminating between a subject having a lung cancer, e.g., NSCLC, and subjects having benign (non-malignant) lung nodules. The sequences of the genes identified in Table I and Table II are publicly available. One skilled in the art may readily reproduce the compositions and methods described herein by use of the sequences of the genes, all of which are publicly available from conventional sources, such as GenBank. The GenBank accession number for each gene is provided.
The term “microarray” refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide or oligonucleotide probes, on a substrate.
The term “polynucleotide,” when used in singular or plural form, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.
The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.
The terms “differentially expressed gene”, “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a control subject, such as a subject having a benign nodule. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects, non-healthy controls and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is a statistically significant (p<0.05) difference in gene expression between the subject and control samples.
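The p<0.05 criterion above does not name a particular statistical test. Purely as an assumption for illustration, the sketch below uses an exact two-sided permutation test on the difference in group means for a single gene; the expression values and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way of relabeling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so splits tied with the observed one are counted.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


# Hypothetical log2 expression values for one gene (illustrative only).
cancer = [8.9, 9.1, 9.4, 9.0, 9.3]
nodule = [8.1, 8.3, 8.0, 8.4, 8.2]

p = permutation_p_value(cancer, nodule)
print(f"p={p:.4f}, differentially expressed: {p < 0.05}")
# p=0.0079, differentially expressed: True
```

The exact enumeration is only practical for small groups; with five samples per group there are 252 relabelings, so the smallest attainable two-sided p-value is 2/252.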
The term “over-expression” with regard to an RNA transcript is used to refer to the level of the transcript determined by normalization to the level of reference mRNAs, which might be all measured transcripts in the specimen or a particular reference set of mRNAs.
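As a minimal sketch of normalization to a reference set, the following divides each transcript level by the geometric mean of selected reference transcripts. The use of a geometric mean is an assumption, since the definition above states only that levels are normalized to reference mRNAs. The function names and count values are hypothetical; GUSB, TBP and PGK1 are used as the reference set because Table I lists them in the Housekeeping class.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return exp(sum(log(v) for v in values) / len(values))


def normalize_to_reference(counts, reference_genes):
    """Divide every transcript level by the geometric mean of the reference set.

    Assumption: geometric-mean scaling is one common convention; the
    specification does not prescribe a particular normalization formula.
    """
    reference = [counts[gene] for gene in reference_genes]
    if any(value <= 0 for value in reference):
        raise ValueError("reference counts must be positive")
    scale = geometric_mean(reference)
    return {gene: value / scale for gene, value in counts.items()}


# Hypothetical raw counts for one specimen (illustrative values only).
raw = {"ARG1": 400.0, "IL4": 25.0, "GUSB": 100.0, "TBP": 100.0, "PGK1": 100.0}
normalized = normalize_to_reference(raw, ["GUSB", "TBP", "PGK1"])
print(normalized["ARG1"], normalized["IL4"])
```

Passing every measured gene as `reference_genes` corresponds to the alternative mentioned in the definition, in which all measured transcripts serve as the reference.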
The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as an “amplicon.” Usually, the amount of the messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene expressed.
In the context of the compositions and methods described herein, reference to “10 or more”, “at least 10” etc. of the genes listed in Table I or Table II means any one or any and all combinations of the genes listed. For example, suitable gene expression profiles include profiles containing any number between at least 5 through 559 genes from Table I. In another example, suitable gene expression profiles include profiles containing any number between at least 5 through 100 genes from Table II. In one embodiment, gene profiles formed by genes selected from a table are used in rank order, e.g., genes ranked in the top of the list demonstrated more significant discriminatory results in the tests, and thus may be more significant in a profile than lower ranked genes. However, in other embodiments the genes forming a useful gene profile do not have to be in rank order and may be any gene from the table. As used herein, the term “100 Classifier” or “100 Biomarker Classifier” refers to the 100 genes of Table II. As used herein, the term “559 Classifier” or “559 Biomarker Classifier” refers to the 559 genes of Table I. However, subsets of the genes of Table I or Table II, as described herein, are also useful, and, in another embodiment, the terms may refer to those subsets as well.
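The rank-order embodiment can be sketched as a simple selection of the highest-ranked entries of a table. The function name and the `minimum` parameter are hypothetical; the default of 10 mirrors the “10 or more” language above, and the gene symbols are the first twelve entries of Table II in rank order.

```python
def top_ranked_profile(ranked_genes, size, minimum=10):
    """Return the `size` highest-ranked genes from a rank-ordered list.

    `minimum` is an illustrative lower bound on profile size; it is a
    parameter because the text describes several alternative minimums.
    """
    if not minimum <= size <= len(ranked_genes):
        raise ValueError(
            f"size must be between {minimum} and {len(ranked_genes)}"
        )
    return ranked_genes[:size]


# First twelve gene symbols of Table II, in rank order.
TABLE_II_HEAD = [
    "TPR", "DNAJB1", "PDCD10", "PSMB7", "MERTK", "AFTPH",
    "BCOR", "RASSF5", "SNX11", "ANP32B", "C4B", "NME1-NME2",
]

profile = top_ranked_profile(TABLE_II_HEAD, 10)
print(profile)
```

For the embodiments in which genes need not be taken in rank order, the slice would be replaced by any selection of distinct genes from the same table.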
As used herein, “labels” or “reporter molecules” are chemical or biochemical moieties useful for labeling a nucleic acid (including a single nucleotide), polynucleotide, oligonucleotide, or protein ligand, e.g., amino acid or antibody. “Labels” and “reporter molecules” include fluorescent agents, chemiluminescent agents, chromogenic agents, quenching agents, radionucleotides, enzymes, substrates, cofactors, inhibitors, magnetic particles, and other moieties known in the art. “Labels” or “reporter molecules” are capable of generating a measurable signal and may be covalently or noncovalently joined or bound to an oligonucleotide or nucleotide (e.g., a non-natural nucleotide) or ligand.
Unless defined otherwise in this specification, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application.

I. GENE EXPRESSION PROFILES

The inventors have shown that the gene expression profiles of the whole blood of lung cancer patients differ significantly from those seen in patients having non-cancerous lung nodules. For example, changes in the gene expression products of the genes of Table I and/or Table II can be observed and detected by the methods of this invention in the normal circulating blood of patients with early stage solid lung tumors.
The gene expression profiles described herein provide new diagnostic markers for the early detection of lung cancer and could prevent patients from undergoing unnecessary procedures relating to surgery or biopsy for a benign nodule. Since the risks are very low, the benefit-to-risk ratio is very high. In one embodiment, the methods and compositions described herein may be used in conjunction with clinical risk factors to help physicians make more accurate decisions about how to manage patients with lung nodules. Another advantage of this invention is that diagnosis may occur early since diagnosis is not dependent upon detecting circulating tumor cells, which are present in only vanishingly small numbers in early stage lung cancers.
In one aspect, a composition is provided for classifying a nodule as cancerous or benign in a mammalian subject. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In another embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the polynucleotide or oligonucleotide or ligand hybridizes to an mRNA.
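The composition requirements above (a minimum number of probes, each directed to a different gene selected from a table) can be expressed as a small consistency check. This is only a sketch: the function name, the probe identifiers and the data layout are hypothetical, and the gene symbols are the ten highest-ranked entries of Table I.

```python
def validate_panel(probe_targets, table_genes, minimum=10):
    """Return a list of problems found in a proposed probe panel.

    probe_targets maps a (hypothetical) probe identifier to the gene it is
    designed to detect; an empty list means the panel passes all checks.
    """
    targets = list(probe_targets.values())
    problems = []
    if len(targets) < minimum:
        problems.append(
            f"panel has {len(targets)} probes; at least {minimum} required"
        )
    if len(set(targets)) != len(targets):
        problems.append("two or more probes target the same gene")
    unknown = sorted(set(targets) - set(table_genes))
    if unknown:
        problems.append("targets not in table: " + ", ".join(unknown))
    return problems


# Ten highest-ranked gene symbols of Table I.
TABLE_I_TOP10 = [
    "PLEKHG4", "SLC25A20", "LETM2", "GLIS3", "LOC100132797",
    "ARHGEF5", "TCF7L2", "SFRS2IP", "CFD", "AZI2",
]

# Hypothetical probe identifiers, one per gene.
panel = {f"probe_{i:02d}": gene for i, gene in enumerate(TABLE_I_TOP10, start=1)}
print(validate_panel(panel, TABLE_I_TOP10))
```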

TABLE I

Rank	Gene	Sequence ID#	Class Name

1	PLEKHG4	NM_015432.3	Endogenous
2	SLC25A20	NM_000387.5	Endogenous
3	LETM2	NM_144652.3	Endogenous
4	GLIS3	NM_001042413.1	Endogenous
5	LOC100132797	XR_036994.1	Endogenous
6	ARHGEF5	NM_005435.3	Endogenous
7	TCF7L2	NM_030756.4	Endogenous
8	SFRS2IP	NM_004719.2	Endogenous
9	CFD	NM_001928.2	Endogenous
10	AZI2	NM_022461.4	Endogenous
11	STOM	NM_004099.5	Endogenous
12	CD1A	NM_001763.2	Endogenous
13	PANK2	NM_153640.2	Endogenous
14	CNIH4	NM_014184.3	Endogenous
15	EVI2A	NM_014210.3	Endogenous
16	BATF	NM_006399.3	Endogenous
17	TCP1	NM_030752.2	Endogenous
18	BX108566	BX108566.1	Endogenous
19	ANXA1	NM_000700.2	Endogenous
20	PSMA3	NM_152132.2	Endogenous
21	IRF4	NM_002460.1	Endogenous
22	STAG3	NM_012447.3	Endogenous
23	NDUFS4	NM_002495.2	Endogenous
24	HAT1	NM_003642.3	Endogenous
25	ANXA1 b	NM_000700.1	Endogenous
26	LOC148137	NM_144692.1	Endogenous
27	LDHA	NM_001165416.1	Endogenous
28	PSME3	NM_005789.3	Endogenous
29	REPS1	NM_001128617.2	Endogenous
30	CDH5	NM_001795.3	Endogenous
31	NAT5	NM_181528.3	Endogenous
32	PLAC8	NM_001130715.1	Endogenous
33	GSTO1	NM_004832.2	Endogenous
34	DGUOK	NM_080916.2	Endogenous
35	OLR1	NM_002543.3	Endogenous
36	MYST4	NM_012330.3	Endogenous
37	TIMM8B	ENST00000504148.1	Endogenous
38	LY96	NM_015364.4	Endogenous
39	CCDC72	NM_015933.4	Endogenous
40	ATP5I	NM_007100.2	Endogenous
41	WDR91	NM_014149.3	Endogenous
42	MAGEA3	NM_005362.3	Endogenous
43	AK093878	AK093878.1	Endogenous
44	EYA3	NM_001990.3	Endogenous
45	ACAA2	NM_006111.2	Endogenous
46	ETFDH	NM_004453.3	Endogenous
47	CCT6A	NM_001762.3	Endogenous
48	HSCB	NM_172002.3	Endogenous
49	EMR4	NM_001080498.2	Endogenous
50	USP5	NM_003481.2	Endogenous
51	SIK1	NM_173354.3	Endogenous
52	SYNJ1	NM_003895.3	Endogenous
53	KLRB1	NM_002258.2	Endogenous
54	CLK2	XM_941392.1	Endogenous
55	SNORA56	NR_002984.1	Endogenous
56	TP53BP1	NM_005657.2	Endogenous
57	RBX1	NM_014248.3	Endogenous
58	CNPY2	NM_014255.5	Endogenous
59	RELA	NM_021975.2	Endogenous
60	LOC732371	XM_001133019.1	Endogenous
61	TMEM218	NM_001080546.2	Endogenous
62	LOC91431	NM_001099776.1	Endogenous
63	GZMB	NM_004131.3	Endogenous
64	CAMP	NM_004345.4	Endogenous
65	RBM16	NM_014892.4	Endogenous
66	MID1IP1	NM_021242.5	Endogenous
67	LOC399942	XM_934471.1	Endogenous
68	COMMD6	NM_203497.3	Endogenous
69	PPP6C	NM_002721.4	Endogenous
70	BCOR	NM_017745.5	Endogenous
71	PDCD10	NM_145859.1	Endogenous
72	HLA-DMB	NM_002118.3	Endogenous
73	DNAJB1	NM_006145.2	Endogenous
74	KYNU	NM_001032998.1	Endogenous
75	TM2D2	NM_078473.2	Endogenous
76	FAM179A	NM_199280.2	Endogenous
77	FAM43A	NM_153690.4	Endogenous
78	QTRTD1	NM_024638.3	Endogenous
79	MARCKSL1	NM_023009.5	Endogenous
80	FAM193A	NM_003704.3	Endogenous
81	AK026725	AK026725.1	Endogenous
82	SERPINB10	NM_005024.1	Endogenous
83	OSBP	ILMN_1706376.1	Endogenous
84	ST6GAL1	NM_003032.2	Endogenous
85	NDUFAF2	NM_174889.4	Endogenous
86	UBE2I	NM_194259.2	Endogenous
87	CTAG1B	NM_001327.2	Endogenous
88	TRAF6	NM_145803.1	Endogenous
89	REPIN1	NM_014374.3	Endogenous
90	LAMA5	NM_005560.4	Endogenous
91	TBC1D12	NM_015188.1	Endogenous
92	TGIF1 b	NM_173208.1	Endogenous
93	LOC728533	XR_015610.3	Endogenous
94	CLN8	NM_018941.3	Endogenous
95	COX7B	NM_001866.2	Endogenous
96	DYNC2LI1	NM_016008.3	Endogenous
97	ANP32B	NM_006401.2	Endogenous
98	PTGDR2	NM_004778.1	Endogenous
99	MRPS16	NM_016065.3	Endogenous
100	NIPBL	NM_133433.3	Endogenous
101	PPP2R5C	NM_178588.1	Endogenous
102	DPF2	NM_006268.4	Endogenous
103	RAB10	NM_016131.4	Endogenous
104	MYADM	NM_001020820.1	Endogenous
105	CCND3	NM_001760.2	Endogenous
106	CC2D1B	NM_032449.2	Endogenous
107	HLA-G	NM_002127.4	Endogenous
108	CKS2	NM_001827.1	Endogenous
109	HPSE	NM_006665.5	Endogenous
110	UBE2G1	NM_003342.4	Endogenous
111	MED16	NM_005481.2	Endogenous
112	LOC339674	XM_934917.1	Endogenous
113	RNF114	NM_018683.3	Endogenous
114	KIR2DS3	NM_012313.1	Endogenous
115	AMD1	NM_001634.4	Endogenous
116	S100A8	NM_002964.4	Endogenous
117	NFATC4	NM_001136022.2	Endogenous
118	RPL39L	NM_052969.1	Endogenous
119	LOC399753	XM_930634.1	Endogenous
120	FKBP1A	NM_054014.3	Endogenous
121	CHMP5	NM_016410.5	Endogenous
122	CABC1	NM_020247.4	Endogenous
123	HLA-B	NM_005514.6	Endogenous
124	TRIM39	NM_021253.3	Endogenous
125	LOC645914	XM_928884.1	Endogenous
126	CD79A	NM_021601.3	Endogenous
127	GLRX	ILMN_1737308.1	Endogenous
128	RPL26L1	NM_016093.2	Endogenous
129	USP21	NM_012475.4	Endogenous
130	CD70	NM_001252.2	Endogenous
131	SPINK5	NM_006846.3	Endogenous
132	HUWE1	NM_031407.6	Endogenous
133	STK38	NM_007271.3	Endogenous
134	SEMG1	NM_003007.2	Endogenous
135	NDUFA4	NM_002489.3	Endogenous
136	MYADM b	NM_001020820.1	Endogenous
137	SGK1 b	NM_005627.3	Endogenous
138	SLAMF8	NM_020125.2	Endogenous
139	LOC653773	XM_938755.1	Endogenous
140	RPS24	NM_001026.4	Endogenous
141	LOC338799	NR_002809.2	Endogenous
142	MAP3K7	NM_145333.1	Endogenous
143	KLRD1	NM_002262.3	Endogenous
144	LOC732111	XM_001134275.1	Endogenous
145	CD69	NM_001781.2	Endogenous
146	DDIT4	NM_019058.2	Endogenous
147	C1orf222	NM_001003808.1	Endogenous
148	PFAS	NM_012393.2	Endogenous
149	USP9Y	NM_004654.3	Endogenous
150	COLEC12	NM_130386.2	Endogenous
151	VPS37C	NM_017966.4	Endogenous
152	SAP130	NM_024545.3	Endogenous
153	CDC42EP2	NM_006779.3	Endogenous
154	LOC643319	XM_927980.1	Endogenous
155	ASF1B	NM_018154.2	Endogenous
156	AK094576	AK094576.1	Endogenous
157	BANP	NM_079837.2	Endogenous
158	TBK1	NM_013254.2	Endogenous
159	GNS	NM_002076.3	Endogenous
160	IL1R2	NM_173343.1	Endogenous
161	CLEC4C	NM_203503.1	Endogenous
162	TM9SF1	NM_006405.6	Endogenous
163	PTGDR	NM_000953.2	Endogenous
164	GOLGA3	NM_005895.3	Endogenous
165	CLEC4A	NM_194448.2	Endogenous
166	TSC1	NM_000368.4	Endogenous
167	SFMBT1	NM_001005158.2	Endogenous
168	GLT25D1	NM_024656.2	Endogenous
169	LOC100130229	XM_001717158.1	Endogenous
170	PHF8	NM_015107.2	Endogenous
171	PUM1	NM_001020658.1	Endogenous
172	SMARCC1	NM_003074.3	Endogenous
173	AK126342	AK126342.1	Endogenous
174	ACSL5	NM_203379.1	Endogenous
175	TGIF1	NM_003244.2	Endogenous
176	BF375676	BF375676.1	Endogenous
177	SPA17	NM_017425.3	Endogenous
178	FLNB	NM_001457.3	Endogenous
179	FAM105B	NM_138348.4	Endogenous
180	CPPED1	NM_018340.2	Endogenous
181	TRIM32	NM_012210.3	Endogenous
182	RNF34	NM_025126.3	Endogenous
183	SLC45A3	NM_033102.2	Endogenous
184	P2RY10	NM_198333.1	Endogenous
185	AKR1C3	NM_003739.4	Endogenous
186	NME1-NME2	NM_001018136.2	Endogenous
187	AMPD3	NM_000480.2	Endogenous
188	HSP90AB1	NM_007355.3	Endogenous
189	RBM4B	NM_031492.3	Endogenous
190	DMBT1	NM_007329.2	Endogenous
191	TMCO1	NM_019026.3	Endogenous
192	CASP2	NM_032983.3	Endogenous
193	C1orf103	NM_018372.3	Endogenous
194	ARHGAP17	NM_018054.5	Endogenous
195	IFNA17	NM_021268.2	Endogenous
196	CTSZ	NM_001336.3	Endogenous
197	DBI	NM_001079862.1	Endogenous
198	TXNRD1 b	NM_182743.2	Endogenous
199	KIAA0460	NM_015203.4	Endogenous
200	PDGFD	NM_033135.3	Endogenous
201	ATG5	NM_004849.2	Endogenous
202	ITFG2	NM_018463.3	Endogenous
203	HERC1	NM_003922.3	Endogenous
204	MEN1	NM_130799.2	Endogenous
205	IFI27L2	NM_032036.2	Endogenous
206	LOC729887	XR_040891.2	Endogenous
207	PI4K2A	NM_018425.3	Endogenous
208	RAG1	NM_000448.2	Endogenous
209	CREB5	NM_182898.3	Endogenous
210	SLC6A12	NM_003044.4	Endogenous
211	CDKN1A	NM_000389.2	Endogenous
212	AW173314	AW173314.1	Endogenous
213	SAP130 b	NM_024545.3	Endogenous
214	ABCA5	NM_018672.4	Endogenous
215	SLC25A37	NM_016612.2	Endogenous
216	MYLIP	NM_013262.3	Endogenous
217	GATA2	NM_001145662.1	Endogenous
218	ATP5L	NM_006476.4	Endogenous
219	RPS27L	NM_015920.3	Endogenous
220	DB338252	DB338252.1	Endogenous
221	FRAT2	NM_012083.2	Endogenous
222	CCL4	NM_002984.2	Endogenous
223	CD79B	NM_000626.2	Endogenous
224	MBD1	NM_015844.2	Endogenous
225	TIAM1	NM_003253.2	Endogenous
226	HSD11B1	NM_181755.1	Endogenous
227	TPR	NM_003292.2	Endogenous
228	EID2B	NM_152361.2	Endogenous
229	PDSS1	NM_014317.3	Endogenous
230	C9orf164	NM_182635.1	Endogenous
231	ARHGEF18	NM_015318.3	Endogenous
232	TXNRD1	NM_001093771.2	Endogenous
233	HNRNPAB	NM_004499.3	Endogenous
234	TTN	NM_133378.4	Endogenous
235	EP300	NM_001429.2	Endogenous
236	CCDC97	NM_052848.1	Endogenous
237	HK3	NM_002115.2	Endogenous
238	CRKL	NM_005207.3	Endogenous
239	NCOA5	NM_020967.2	Endogenous
240	AK124143	AK124143.1	Endogenous
241	LBA1	NM_014831.2	Endogenous
242	SLC9A3R1	NM_004252.3	Endogenous
243	CRY2	NM_021117.3	Endogenous
244	ATG4B	NM_178326.2	Endogenous
245	CD97	NM_078481.3	Endogenous
246	TTC9	NM_015351.1	Endogenous
247	BMPR2	NM_001204.6	Endogenous
248	LPIN2	NM_014646.2	Endogenous
249	UBA1	NM_003334.3	Endogenous
250	SETD1B	XM_037523.11	Endogenous
251	PRPF8	NM_006445.3	Endogenous
252	RNASE2	NM_002934.2	Endogenous
253	KIAA0101	NM_014736.4	Endogenous
254	ARG1	NM_000045.3	Endogenous
255	UBTF	NM_001076683.1	Endogenous
256	MFSD1	NM_022736.2	Endogenous
257	IDO1	NM_002164.3	Endogenous
258	MS4A6A	NM_022349.3	Endogenous
259	C22orf30	NM_173566.2	Endogenous
260	HNRNPK	NM_031263.2	Endogenous
261	ARL8B	NM_018184.2	Endogenous
262	SETD2	NM_014159.6	Endogenous
263	NCAPG	NM_022346.4	Endogenous
264	EEF1B2	NM_001037663.1	Endogenous
265	TRIM39 b	NM_172016.2	Endogenous
266	EHD4	NM_139265.3	Endogenous
267	IRF1	NM_002198.1	Endogenous
268	LOC100129022	XM_001716591.1	Endogenous
269	TRAF3IP2	NM_147686.3	Endogenous
270	PSMA6	NM_002791.2	Endogenous
271	RHOG	NM_001665.3	Endogenous
272	CN312986	CN312986.1	Endogenous
273	PSMB8	NM_004159.4	Endogenous
274	ZNF239	NM_001099283.1	Endogenous
275	CLPTM1	NM_001294.3	Endogenous
276	NADK	NM_023018.4	Endogenous
277	C8orf76	NM_032847.2	Endogenous
278	LIF	NM_002309.3	Endogenous
279	EGR1	NM_001964.2	Endogenous
280	ARG1 b	NM_000045.2	Endogenous
281	MERTK	NM_006343.2	Endogenous
282	RHOU	NM_021205.5	Endogenous
283	PFDN5 b	NM_145897.2	Endogenous
284	MAGEA1	NM_004988.4	Endogenous
285	SEC24C	NM_198597.2	Endogenous
286	SLC11A1	NM_000578.3	Endogenous
287	TCF20	NM_181492.2	Endogenous
288	AHCYL1	NM_001242676.1	Endogenous
289	TPT1	NM_003295.3	Endogenous
290	KIR2DL5A	XM_001126354.1	Endogenous
291	IRAK2	NM_001570.3	Endogenous
292	C17orf51	XM_944416.1	Endogenous
293	C14orf156	NM_031210.5	Endogenous
294	ATP2C1	NM_014382.3	Endogenous
295	SOCS1	NM_003745.1	Endogenous
296	JAK1	NM_002227.1	Endogenous
297	RSL24D1	NM_016304.2	Endogenous
298	AP2S1	NM_021575.3	Endogenous
299	PHRF1	NM_020901.3	Endogenous
300	GPI	NM_000175.2	Endogenous
301	NCR1	NM_004829.5	Endogenous
302	AKAP4	NM_139289.1	Endogenous
303	CD160	NM_007053.3	Endogenous
304	DDX23	NM_004818.2	Endogenous
305	GNL3	NM_014366.4	Endogenous
306	NFKB2	NM_002502.2	Endogenous
307	CSK	NM_004383.2	Endogenous
308	PELP1	NM_014389.2	Endogenous
309	KLRF1 b	NM_016523.2	Endogenous
310	CS	NM_004077.2	Endogenous
311	PHCA	NM_018367.6	Endogenous
312	LOC644315	XR_017529.2	Endogenous
313	NUDT18	NM_024815.3	Endogenous
314	XCL2	NM_003175.3	Endogenous
315	KLRC1	NM_002259.3	Endogenous
316	ARHGAP18	NM_033515.2	Endogenous
317	CTDSP2	NM_005730.3	Endogenous
318	P2RY5	NM_005767.5	Endogenous
319	CREB1	NM_004379.3	Endogenous
320	RHOB	NM_004040.3	Endogenous
321	DCAF7	NM_005828.4	Endogenous
322	NUP153	NM_005124.3	Endogenous
323	AFTPH	NM_017657.4	Endogenous
324	EWSR1	NM_005243.3	Endogenous
325	LYN	NM_002350.1	Endogenous
326	CYBB	NM_000397.3	Endogenous
327	TMEM70	NM_017866.5	Endogenous
328	PPP1R3E	XM_927029.1	Endogenous
329	PSMB1	NM_002793.3	Endogenous
330	RERE b	NM_012102.3	Endogenous
331	RXRA	NM_002957.5	Endogenous
332	GZMA	NM_006144.3	Endogenous
333	ERLIN1	NM_006459.3	Endogenous
334	KRTAP10-3	NM_198696.2	Endogenous
335	SAMSN1	NM_022136.3	Endogenous
336	LRRC47	NM_020710.2	Endogenous
337	MARCKS	NM_002356.6	Endogenous
338	HOPX	NM_139211.4	Endogenous
339	KLRF1	NM_016523.1	Endogenous
340	NFAT5	NM_138713.3	Endogenous
341	SLC15A2	NM_021082.3	Endogenous
342	STK16	NM_003691.2	Endogenous
343	KIR_Activating_Subgroup_2	NM_014512.1	Endogenous
344	TBCE	NM_001079515.2	Endogenous
345	BAG3	NM_004281.3	Endogenous
346	SFRS4	NM_005626.4	Endogenous
347	AW270402	AW270402.1	Endogenous
348	CCL3L1	NM_021006.4	Endogenous
349	HERC3	NM_014606.2	Endogenous
350	RPL34	NM_000995.3	Endogenous
351	ALAS1	NM_000688.4	Endogenous
352	CCR9	NM_031200.1	Endogenous
353	CORO1C	ILMN_1745954.1	Endogenous
354	FAIM3	NM_005449.4	Endogenous
355	SFPQ	NM_005066.2	Endogenous
356	HOOK3	NM_032410.3	Endogenous
357	CD36	NM_000072.3	Endogenous
358	IL7	NM_000880.2	Endogenous
359	CBLL1	NM_024814.3	Endogenous
360	HVCN1	NM_032369.3	Endogenous
361	HMGB1	NM_002128.4	Endogenous
362	SIN3A	NM_015477.2	Endogenous
363	CASP3	NM_032991.2	Endogenous
364	BQ189294	BQ189294.1	Endogenous
365	NDRG2	NM_016250.2	Endogenous
366	BX400436	BX400436.2	Endogenous
367	IFNAR2	NM_000874.3	Endogenous
368	MS4A6A b	NM_152851.2	Endogenous
369	KLRC2	NM_002260.3	Endogenous
370	S100A12 b	NM_005621.1	Endogenous
371	ATM	NM_000051.3	Endogenous
372	NLRP3	NM_001079821.2	Endogenous
373	HAVCR2	NM_032782.3	Endogenous
374	C4B	NM_001002029.3	Endogenous
375	CTSW	NM_001335.3	Endogenous
376	TMEM170B	NM_001100829.2	Endogenous
377	EIF4ENIF1	NM_019843.2	Endogenous
378	CCL3	NM_002983.2	Endogenous
379	CHCHD3	NM_017812.2	Endogenous
380	CST7	NM_003650.3	Endogenous
381	SFRS15	NM_020706.2	Endogenous
382	STIP1	NM_006819.2	Endogenous
383	MPDU1	NM_004870.3	Endogenous
384	DHX16 b	NM_001164239.1	Endogenous
385	INTS4	NM_033547.3	Endogenous
386	USP16	NM_001032410.1	Endogenous
387	IFNAR1	NM_000629.2	Endogenous
388	ITCH	NM_001257138.1	Endogenous
389	FOXK2	NM_004514.3	Endogenous
390	LOC642812	XR_036892.1	Endogenous
391	KIAA1967	NM_021174.5	Endogenous
392	LOC440928	XM_942885.1	Endogenous
393	NDUFV2	NM_021074.4	Endogenous
394	IL4	NM_000589.2	Endogenous
395	CIAPIN1	NM_020313.3	Endogenous
396	CXCL2	NM_002089.3	Endogenous
397	TXN	NM_003329.3	Endogenous
398	PRG2	NM_002728.4	Endogenous
399	MS4A2	NM_000139.3	Endogenous
400	YPEL1	NM_013313.4	Endogenous
401	POLR2A	NM_000937.4	Endogenous
402	C19orf10	NM_019107.3	Endogenous
403	IGFBP7	NM_001553.2	Endogenous
404	ITGAE	NM_002208.4	Endogenous
405	CXCR5 b	NM_001716.3	Endogenous
406	BID	NM_001196.2	Endogenous
407	LOC100133273	XR_039238.1	Endogenous
408	FNBP1	NM_015033.2	Endogenous
409	IFNGR1	NM_000416.1	Endogenous
410	STAT6	NM_003153.4	Endogenous
411	CR2	NM_001006658.2	Endogenous
412	CCL3L3	NM_001001437.3	Endogenous
413	RFWD2	NM_022457.6	Endogenous
414	SP2	NM_003110.5	Endogenous
415	BAT2D1	NM_015172.3	Endogenous
416	CX3CL1	NM_002996.3	Endogenous
417	GPATCH3	NM_022078.2	Endogenous
418	CASP1	NM_033294.3	Endogenous
419	NAGK	NM_017567.4	Endogenous
420	IER5	NM_016545.4	Endogenous
421	PHLPP2	NM_015020.3	Endogenous
422	RPL31	NM_000993.4	Endogenous
423	SPEN	NM_015001.2	Endogenous
424	TMSB4X	NM_021109.3	Endogenous
425	IL8RB	NM_001557.3	Endogenous
426	XPC	NR_027299.1	Endogenous
427	SNX11	NM_152244.1	Endogenous
428	SPN	NM_003123.3	Endogenous
429	ANKHD1	NM_017747.2	Endogenous
430	CCR6	NM_031409.2	Endogenous
431	DZIP3	NM_014648.3	Endogenous
432	MRPL27	NM_148571.1	Endogenous
433	SREBF1	NM_001005291.2	Endogenous
434	CD14	NM_000591.2	Endogenous
435	TNFSF8	NM_001244.3	Endogenous
436	C3	NM_000064.2	Endogenous
437	FAM50B	NM_012135.1	Endogenous
438	RASSF5	NM_182664.2	Endogenous
439	BU743228	BU743228.1	Endogenous
440	NFATC1	NM_172389.1	Endogenous
441	DOCK5	NM_024940.6	Endogenous
442	PACS1	NM_018026.3	Endogenous
443	CYP1B1	NM_000104.3	Endogenous
444	CLIC3	ILMN_1796423.1	Endogenous
445	PSMA4	NM_002789.3	Endogenous
446	ZNF341	NM_032819.4	Endogenous
447	PRPF3	NM_004698.2	Endogenous
448	PSMA6 b	NM_002791.2	Endogenous
449	LOC648927	XR_038906.2	Endogenous
450	KCTD12	NM_138444.3	Endogenous
451	LOC440389	XM_498648.3	Endogenous
452	U2AF2	NM_007279.2	Endogenous
453	CLEC5A	NM_013252.2	Endogenous
454	PRRG4	NM_024081.5	Endogenous
455	TNFRSF9	NM_001561.5	Endogenous
456	NDUFB3	NM_002491.2	Endogenous
457	BCL6	NM_001130845.1	Endogenous
458	SGK1	NM_005627.3	Endogenous
459	CIP29	NM_033082.3	Endogenous
460	CD160 b	NM_007053.2	Endogenous
461	ARCN1	NM_001655.4	Endogenous
462	LOC151162	NR_024275.1	Endogenous
463	GPR65	NM_003608.3	Endogenous
464	CCR1	NM_001295.2	Endogenous
465	TFCP2	NM_005653.4	Endogenous
466	SGK	NM_005627.3	Endogenous
467	RNF214	NM_207343.3	Endogenous
468	TMC8	NM_152468.4	Endogenous
469	RBM14	NM_006328.3	Endogenous
470	USP34	NM_014709.3	Endogenous
471	BACH2	NM_021813.3	Endogenous
472	LILRA5	NM_021250.3	Endogenous
473	C5orf21	NM_032042.5	Endogenous
474	LOC441073	XR_018937.2	Endogenous
475	TAX1BP1	NM_001079864.2	Endogenous
476	TNFSF13	NM_003808.3	Endogenous
477	PIM2	NM_006875.3	Endogenous
478	RNF19B	NM_153341.3	Endogenous
479	EPHX2	NM_001979.5	Endogenous
480	LILRA5 b	NM_181879.2	Endogenous
481	ABCF1	NM_001025091.1	Endogenous
482	C4orf27	NM_017867.2	Endogenous
483	PSMB7	NM_002799.2	Endogenous
484	LPCAT4	NM_153613.2	Endogenous
485	TRIM21	NM_003141.3	Endogenous
486	LOC728835	XM_001133190.1	Endogenous
487	NFKB1	NM_003998.3	Endogenous
488	CR2 b	NM_001006658.1	Endogenous
489	HMGB2	NM_002129.3	Endogenous
490	IL1B	NM_000576.2	Endogenous
491	C20orf52	NM_080748.2	Endogenous
492	DNAJB6	NM_058246.3	Endogenous
493	PFDN5	NM_145897.2	Endogenous
494	RPS6	NM_001010.2	Endogenous
495	LEF1	NM_016269.4	Endogenous
496	DKFZp761P0423	XM_291277.4	Endogenous
497	LOC647340	XR_018104.1	Endogenous
498	FTHL16	XR_041433.1	Endogenous
499	COX6C	NM_004374.2	Endogenous
500	BCL10	NM_003921.2	Endogenous
501	CD48	NM_001778.2	Endogenous
502	ZMIZ1	NM_020338.3	Endogenous
503	GZMH	NM_033423.4	Endogenous
504	TRRAP	NM_003496.3	Endogenous
505	SH2D3C	NM_170600.2	Endogenous
506	UBC	NM_021009.3	Endogenous
507	TXNDC17	NM_032731.3	Endogenous
508	ATP5J2	NM_004889.3	Endogenous
509	KIAA1267	NM_015443.3	Endogenous
510	RFX1	NM_002918.4	Endogenous
511	WDR1	NM_005112.4	Endogenous
512	LOC100129697	XM_001732822.2	Endogenous
513	TOMM7	NM_019059.2	Endogenous
514	ARHGAP26	NM_015071.4	Endogenous
515	HSPA6	NM_002155.4	Endogenous
516	FLJ10357	NM_018071.4	Endogenous
517	ITGAL	NM_002209.2	Endogenous
518	BX089765	BX089765.1	Endogenous
519	RERE	NM_001042682.1	Endogenous
520	C15orf39	NM_015492.4	Endogenous
521	BX436458	BX436458.2	Endogenous
522	RWDD1	NM_001007464.2	Endogenous
523	TMBIM6	NM_003217.2	Endogenous
524	SLC6A6	NM_003043.5	Endogenous
525	KIAA0174	NM_014761.3	Endogenous
526	IL16	NM_004513.4	Endogenous
527	EGLN1	NM_022051.1	Endogenous
528	LOC391126	XR_017684.2	Endogenous
529	TAPBP	NM_003190.4	Endogenous
530	NUMB	NM_001005744.1	Endogenous
531	CENTD2	NM_001040118.2	Endogenous
532	CLSTN1	NM_001009566.2	Endogenous
533	PSMA4 b	NM_002789.4	Endogenous
534	LOC648000	XM_371757.4	Endogenous
535	COX7C	NM_001867.2	Endogenous
536	PIK3CD	NM_005026.3	Endogenous
537	UQCRQ	NM_014402.4	Endogenous
538	IDS	NM_006123.4	Endogenous
539	C19orf59	NM_174918.2	Endogenous
540	MYL12A	NM_006471.3	Housekeeping
541	EIF2B4	NM_015636.3	Housekeeping
542	DGUOK b	NM_080916.2	Housekeeping
543	PSMC1	NM_002802.2	Housekeeping
544	CHFR	NM_018223.2	Housekeeping
545	ARPC2	NM_005731.2	Housekeeping
546	ATP5B	NM_001686.3	Housekeeping
547	RPL3	NM_001033853.1	Housekeeping
548	ZNF143	NM_003442.5	Housekeeping
549	PSMD7	NM_002811.4	Housekeeping
550	TBP	NM_003194.4	Housekeeping
551	DHX16	NM_003587.4	Housekeeping
552	TUG1	NR_002323.2	Housekeeping
553	GUSB	NM_000181.3	Housekeeping
554	HDAC3	NM_003883.3	Housekeeping
555	SDHA	NM_004168.3	Housekeeping
556	PGK1	NM_000291.3	Housekeeping
557	STAMBP	NM_006463.4	Housekeeping
558	MTCH1	NM_014341.2	Housekeeping
559	TUBB	NM_178014.2	Housekeeping

TABLE II

Rank	Gene	Sequence ID#	Class Name

1	TPR	NM_003292.2	Endogenous
2	DNAJB1	NM_006145.2	Endogenous
3	PDCD10	NM_145859.1	Endogenous
4	PSMB7	NM_002799.2	Endogenous
5	MERTK	NM_006343.2	Endogenous
6	AFTPH	NM_017657.4	Endogenous
7	BCOR	NM_017745.5	Endogenous
8	RASSF5	NM_182664.2	Endogenous
9	SNX11	NM_152244.1	Endogenous
10	ANP32B	NM_006401.2	Endogenous
11	C4B	NM_001002029.3	Endogenous
12	NME1-NME2	NM_001018136.2	Endogenous
13	DGUOK	NM_080916.2	Endogenous
14	CYP1B1	NM_000104.3	Endogenous
15	MPDU1	NM_004870.3	Endogenous
16	MED16	NM_005481.2	Endogenous
17	FAM179A	NM_199280.2	Endogenous
18	CPPED1	NM_018340.2	Endogenous
19	LOC648927	XR_038906.2	Endogenous
20	ANKHD1	NM_017747.2	Endogenous
21	CN312986	CN312986.1	Endogenous
22	PHCA	NM_018367.6	Endogenous
23	CD1A	NM_001763.2	Endogenous
24	NCOA5	NM_020967.2	Endogenous
25	SLC6A12	NM_003044.4	Endogenous
26	LOC728533	XR_015610.3	Endogenous
27	TRAF3IP2	NM_147686.3	Endogenous
28	TBCE	NM_001079515.2	Endogenous
29	CCT6A	NM_001762.3	Endogenous
30	P2RY5	NM_005767.5	Endogenous
31	RNASE2	NM_002934.2	Endogenous
32	CLN8	NM_018941.3	Endogenous
33	REPS1	NM_001128617.2	Endogenous
34	TPT1	NM_003295.3	Endogenous
35	LOC100129022	XM_001716591.1	Endogenous
36	KLRC1	NM_002259.3	Endogenous
37	AZI2	NM_022461.4	Endogenous
38	FAM193A	NM_003704.3	Endogenous
39	PLAC8	NM_001130715.1	Endogenous
40	LDHA	NM_001165416.1	Endogenous
41	GPATCH3	NM_022078.2	Endogenous
42	RBM14	NM_006328.3	Endogenous
43	KYNU	NM_001032998.1	Endogenous
44	PPP2R5C	NM_178588.1	Endogenous
45	S100A12 b	NM_005621.1	Endogenous
46	SFMBT1	NM_001005158.2	Endogenous
47	CCR6	NM_031409.2	Endogenous
48	TRIM39	NM_021253.3	Endogenous
49	AK126342	AK126342.1	Endogenous
50	SLC45A3	NM_033102.2	Endogenous
51	IL4	NM_000589.2	Endogenous
52	UBE2I	NM_194259.2	Endogenous
53	PRPF3	NM_004698.2	Endogenous
54	NDUFB3	NM_002491.2	Endogenous
55	CRKL	NM_005207.3	Endogenous
56	IDO1	NM_002164.3	Endogenous
57	PUM1	NM_001020658.1	Endogenous
58	BCL10	NM_003921.2	Endogenous
59	TMBIM6	NM_003217.2	Endogenous
60	C17orf51	XM_944416.1	Endogenous
61	BANP	NM_079837.2	Endogenous
62	HAVCR2	NM_032782.3	Endogenous
63	BAG3	NM_004281.3	Endogenous
64	DBI	NM_001079862.1	Endogenous
65	C4orf27	NM_017867.2	Endogenous
66	TSC1	NM_000368.4	Endogenous
67	LPCAT4	NM_153613.2	Endogenous
68	SAMSN1	NM_022136.3	Endogenous
69	SNORA56	NR_002984.1	Endogenous
70	ARG1	NM_000045.3	Endogenous
71	IL1R2	NM_173343.1	Endogenous
72	CCND3	NM_001760.2	Endogenous
73	USP9Y	NM_004654.3	Endogenous
74	ATP2C1	NM_014382.3	Endogenous
75	PSMB1	NM_002793.3	Endogenous
76	NDUFAF2	NM_174889.4	Endogenous
77	VPS37C	NM_017966.4	Endogenous
78	HAT1	NM_003642.3	Endogenous
79	LOC732371	XM_001133019.1	Endogenous
80	LOC148137	NM_144692.1	Endogenous
81	CCR1	NM_001295.2	Endogenous
82	CCDC97	NM_052848.1	Endogenous
83	PPP6C	NM_002721.4	Endogenous
84	GPI	NM_000175.2	Endogenous
85	PIM2	NM_006875.3	Endogenous
86	STAT6	NM_003153.4	Endogenous
87	BATF	NM_006399.3	Endogenous
88	EIF4ENIF1	NM_019843.2	Endogenous
89	HSP90AB1	NM_007355.3	Endogenous
90	U2AF2	NM_007279.2	Endogenous
91	CYBB	NM_000397.3	Endogenous
92	WDR1	NM_005112.4	Endogenous
93	PSMB8	NM_004159.4	Endogenous
94	TBC1D12	NM_015188.1	Endogenous
95	LOC648000	XM_371757.4	Endogenous
96	XCL2	NM_003175.3	Endogenous
97	PTGDR	NM_000953.2	Endogenous
98	ACSL5	NM_203379.1	Endogenous
99	CASP1	NM_033294.3	Endogenous
100	UBTF	NM_001076683.1	Endogenous

In one embodiment, a novel gene expression profile or signature can identify and distinguish patients having cancerous tumors from patients having benign nodules. See, for example, the genes identified in Table I and Table II, which may form a suitable gene expression profile. In another embodiment, a portion of the genes of Table I form a suitable profile. In yet another embodiment, a portion of the genes of Table II form a suitable profile. As discussed herein, these profiles are used to distinguish between cancerous and non-cancerous tumors by generating a discriminant score based on differences in gene expression profiles as exemplified below. The validity of these signatures was established on samples collected at different locations by different groups in a cohort of patients with undiagnosed lung nodules. See Example 7 and FIGS. 2A-2B and FIG. 6. The lung cancer signatures or gene expression profiles identified herein (i.e., Table I or Table II) may be further optimized to reduce the numbers of gene expression products necessary and increase accuracy of diagnosis.
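One way to picture a discriminant score is as a weighted sum of normalized expression values compared against a cutoff. The linear form, the weights, the cutoff and the function names below are all assumptions made for illustration; this passage does not specify the classifier, and the numbers are not results from this specification. The three gene symbols are the highest-ranked entries of Table II.

```python
def discriminant_score(expression, weights, bias=0.0):
    """Weighted sum of expression values for the genes in the profile.

    Assumption: a linear score is used here purely as an illustration.
    """
    missing = [gene for gene in weights if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return bias + sum(w * expression[gene] for gene, w in weights.items())


def classify(score, threshold=0.0):
    """Label a nodule from its score using a hypothetical cutoff."""
    return "cancerous" if score > threshold else "benign"


# Hypothetical weights and normalized expression values.
weights = {"TPR": 0.8, "DNAJB1": -0.5, "PDCD10": 0.3}
sample = {"TPR": 1.2, "DNAJB1": 0.4, "PDCD10": -0.6}

score = discriminant_score(sample, weights)
print(round(score, 2), classify(score))
```

In practice the weights and cutoff would be learned from a training cohort and checked on independent samples, as the validation described above suggests.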
In one embodiment, the composition includes 10 to 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes 10 to 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II.
In another embodiment, the composition includes any whole number from 1 through 559, inclusive, of polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes any whole number from 1 through 100, inclusive, of polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II. In one embodiment, the composition includes at least 3 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
In one embodiment, the composition includes at least 5 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 15 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 20 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 25 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 30 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 35 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 40 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 45 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 50 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 55 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 60 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 65 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 70 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 75 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 80 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 85 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 90 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 95 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 150 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 200 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 250 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 300 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. 
In one embodiment, the composition includes at least 350 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 400 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 450 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 500 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I. In another embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table II.
In yet another embodiment, the expression profile is formed by the first 3 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 5 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 10 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 15 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 20 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 25 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 30 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 35 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 40 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 45 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 50 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 55 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 60 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 65 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 70 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 75 genes in rank order of Table I or Table II. 
In yet another embodiment, the expression profile is formed by the first 80 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 85 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 90 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 95 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 100 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 150 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 200 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 250 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 300 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 350 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 400 genes in rank order of Table I. In yet another embodiment, the expression profile is formed by the first 539 genes in rank order of Table I.
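By way of non-limiting illustration, selecting the first N genes in rank order of a table can be sketched in Python as follows. The function name `first_n_in_rank_order` and the placeholder gene labels are assumptions introduced for this sketch only; they are not the gene names or the rank order of Table I or Table II.

```python
from typing import List, Sequence


def first_n_in_rank_order(ranked_genes: Sequence[str], n: int) -> List[str]:
    """Return the first n entries of a list that is already in rank order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > len(ranked_genes):
        raise ValueError(
            f"requested {n} genes but the table holds only {len(ranked_genes)}"
        )
    return list(ranked_genes[:n])


# Hypothetical placeholder ranking with 100 entries (not the real table contents).
ranked = [f"GENE_{i:03d}" for i in range(1, 101)]

# An expression profile formed by the first 10 genes in rank order.
profile = first_n_in_rank_order(ranked, 10)
```

Substituting the actual rank-ordered gene list of Table I or Table II for the placeholder list would give the corresponding profile for any of the subset sizes recited above.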
As discussed below, the compositions described herein can be used with the gene expression profiling methods which are known in the art. Thus, the compositions can be adapted accordingly to suit the method for which they are intended to be used. In one embodiment, at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label. In certain embodiments, each polynucleotide or oligonucleotide is attached to a different detectable label, each capable of being detected independently. Such reagents are useful in assays such as the nCounter, as described below, and with the diagnostic methods described herein.
In another embodiment, the composition comprises a capture oligonucleotide or ligand, which hybridizes to at least one polynucleotide or oligonucleotide or ligand. In one embodiment, such capture oligonucleotide or ligand may include a nucleic acid sequence which is specific for a portion of the oligonucleotide or polynucleotide or ligand which is specific for the gene of interest. The capture ligand may be a peptide or polypeptide which is specific for the ligand to the gene of interest. In one embodiment, the capture ligand is an antibody, as in a sandwich ELISA.
The capture oligonucleotide also includes a moiety which allows for binding with a substrate. Such substrate includes, without limitation, a plate, bead, slide, well, chip or chamber. In one embodiment, the composition includes a capture oligonucleotide for each different polynucleotide or oligonucleotide which is specific to a gene of interest. Each capture oligonucleotide may contain the same moiety which allows for binding with the same substrate. In one embodiment, the binding moiety is biotin.
Thus, a composition for such diagnosis or evaluation in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which the ligands used to detect and quantitate mRNA are immobilized. The reagent, in one embodiment, is an amplification nucleic acid primer (such as an RNA primer) or primer pair that amplifies and detects a nucleic acid sequence of the mRNA. In another embodiment, the reagent is a polynucleotide probe that hybridizes to the target sequence. In another embodiment, the target sequences are illustrated in Table III. In another embodiment, the reagent is an antibody or fragment of an antibody. The reagent can include multiple said primers, probes or antibodies, each specific for at least one gene, gene fragment or expression product of Table I or Table II. Optionally, the reagent can be associated with a conventional detectable label.
In another embodiment, the composition is a kit containing the relevant multiple polynucleotides or oligonucleotide probes or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label. In certain embodiments, the reagent is immobilized on a substrate. Exemplary substrates include a microarray, chip, microfluidics card, or chamber.
In one embodiment, the composition is a kit designed for use with the nCounter Nanostring system, as further discussed below.

II. GENE EXPRESSION PROFILING METHODS

Methods of gene expression profiling that were used in generating the profiles useful in the compositions and methods described herein or in performing the diagnostic steps using the compositions described herein are known and well summarized in U.S. Pat. No. 7,081,340. Such methods of gene expression profiling include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, and proteomics-based methods. The most commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization; RNAse protection assays; nCounter® Analysis; and PCR-based methods, such as RT-PCR. Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).
In certain embodiments, the compositions described herein are adapted for use in the methods of gene expression profiling and/or diagnosis described herein, and those known in the art.
A. Patient Sample
The “sample” or “biological sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. In one embodiment, a suitable sample is whole blood. In another embodiment, the sample may be venous blood. In another embodiment, the sample may be arterial blood. In another embodiment, a suitable sample for use in the methods described herein includes peripheral blood, more specifically peripheral blood mononuclear cells. Other useful biological samples include, without limitation, plasma or serum. In still other embodiments, the sample is saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoalveolar lavage fluid, or another cellular exudate from a subject suspected of having a lung disease. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples are concentrated by conventional means. It should be understood that the use or reference throughout this specification to any one biological sample is exemplary only. For example, where in the specification the sample is referred to as whole blood, it is understood that other samples, e.g., serum, plasma, etc., may also be employed in another embodiment.
In one embodiment, the biological sample is whole blood, and the method employs the PAXgene Blood RNA Workflow system (Qiagen). That system involves blood collection (e.g., single blood draws) and RNA stabilization, followed by transport and storage, followed by purification of total RNA and molecular RNA testing. This system provides immediate RNA stabilization and consistent blood draw volumes. The blood can be drawn at a physician's office or clinic, and the specimen transported and stored in the same tube. Short-term RNA stability is 3 days at 18-25° C. or 5 days at 2-8° C. Long-term RNA stability is 4 years at −20 to −70° C. This sample collection system enables the user to reliably obtain data on gene expression in whole blood. While the PAXgene system has more noise than the use of PBMC as a biological sample source, the benefits of PAXgene sample collection outweigh the problems. Noise can be subtracted bioinformatically by the person of skill in the art.
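By way of non-limiting illustration, the stability figures stated above can be expressed as a simple lookup. The sketch below is an assumption-laden convenience, not part of the PAXgene system: the function name `within_stated_stability` is hypothetical, a year is taken as 365 days, and any temperature outside the three stated ranges returns False because the text gives no figure for it.

```python
# (lowest temperature in deg C, highest temperature in deg C, maximum storage days)
# Values are the ones stated in the text; 4 years is converted assuming 365-day years.
STATED_WINDOWS = [
    (18.0, 25.0, 3.0),
    (2.0, 8.0, 5.0),
    (-70.0, -20.0, 4 * 365.0),
]


def within_stated_stability(temp_c: float, days: float) -> bool:
    """Return True if storage at temp_c for `days` falls inside a stated window."""
    for low, high, max_days in STATED_WINDOWS:
        if low <= temp_c <= high:
            return days <= max_days
    # No stability figure is given for this temperature.
    return False


room_temp_two_days = within_stated_stability(22.0, 2.0)
room_temp_four_days = within_stated_stability(22.0, 4.0)
```

Here `room_temp_two_days` is True and `room_temp_four_days` is False, reflecting the 3-day limit stated for 18-25° C.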
In one embodiment, the biological samples may be collected using the proprietary PAXgene Blood RNA System (PreAnalytiX, a Qiagen/BD company). The PAXgene Blood RNA System comprises two integrated components: the PAXgene Blood RNA Tube and the PAXgene Blood RNA Kit. Blood samples are drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex-vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency to process samples following collection greatly enhances lab efficiency and reduces costs. Thereafter, the mRNA is detected and/or measured using a variety of assays.
B. Nanostring Analysis
A sensitive and flexible quantitative method that is suitable for use with the compositions and methods described herein is the nCounter® Analysis system (NanoString Technologies, Inc., Seattle WA). The nCounter Analysis System utilizes a digital color-coded barcode technology that is based on direct multiplexed measurement of gene expression and offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe (i.e., polynucleotide, oligonucleotide or ligand) corresponding to a gene of interest, i.e., a gene of Table I. Mixed together with controls, they form a multiplexed CodeSet. In one embodiment, the CodeSet includes all 559 genes of Table I. In another embodiment, the CodeSet includes all 100 genes of Table II. In another embodiment, the CodeSet includes at least 3 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 5 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 10 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 15 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 20 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 25 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 30 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 40 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 50 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 60 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 70 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 80 genes of Table I or Table II. 
In yet another embodiment, the CodeSet includes at least 90 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 100 genes of Table I. In another embodiment, the CodeSet includes at least 200 genes of Table I. In another embodiment, the CodeSet includes at least 300 genes of Table I. In yet another embodiment, the CodeSet includes at least 400 genes of Table I. In yet another embodiment, the CodeSet includes at least 500 genes of Table I. In yet another embodiment, the CodeSet is formed by the first 539 genes in rank order of Table I. In yet another embodiment, the CodeSet includes any subset of genes of Table I, as described herein. In another embodiment, the CodeSet includes any subset of genes of Table II, as described herein.
The NanoString platform employs two ~50-base probes per mRNA that hybridize in solution. The Reporter Probe carries the signal; the Capture Probe allows the complex to be immobilized for data collection. The probes are mixed with the patient sample. After hybridization, the excess probes are removed and the probe/target complexes are aligned and immobilized on a substrate, e.g., in the nCounter Cartridge.
The target sequences utilized in the Examples below for each of the genes of Table I and Table II are shown in Table III below, and are reproduced in the sequence listing. These sequences are portions of the published sequences of these genes. Suitable alternatives may be readily designed by one of skill in the art.
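By way of non-limiting illustration, an entry of Table III can be checked for internal consistency as sketched below. The sketch assumes that the Position column gives a 1-based, inclusive start-end range on the referenced accession, so that the span should equal the length of the target sequence; the function names `parse_position` and `check_target` are hypothetical. The values shown are those of entry 1 of Table III (ABCA5), with the wrapped sequence lines rejoined.

```python
from typing import Tuple

VALID_BASES = set("ACGT")


def parse_position(position: str) -> Tuple[int, int]:
    """Split a 'start-end' position string into two integers."""
    start_text, end_text = position.split("-")
    return int(start_text), int(end_text)


def check_target(position: str, target_sequence: str) -> bool:
    """True if the sequence contains only A, C, G, T and its length
    equals the inclusive span given by the position string."""
    start, end = parse_position(position)
    only_valid_bases = set(target_sequence) <= VALID_BASES
    return only_valid_bases and (end - start + 1) == len(target_sequence)


# Entry 1 of Table III: ABCA5, NM_018672.4, position 6839-6938.
abca5_position = "6839-6938"
abca5_target = (
    "AAGGAAGACTGTGTGTAGAATCT"
    "TACGTAATAGTCTGATTCTTTGA"
    "CTCTGTGGCTAGAATGACAGTTA"
    "TCTATGGAGGTGGTAGAATTAAG"
    "CCATACCT"
)
entry_ok = check_target(abca5_position, abca5_target)
```

A check of this kind flags entries whose sequence length does not match the stated span or that contain characters other than the four bases, either of which may indicate a transcription error.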
Sample Cartridges are placed in the Digital Analyzer for data collection. Color codes on the surface of the cartridge are counted and tabulated for each target molecule.
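By way of non-limiting illustration, the tabulation of counted color codes per target can be sketched as follows. This is not the software of the nCounter Digital Analyzer; the function name `tabulate_counts`, the four-letter color codes, and the code-to-gene assignments are all hypothetical and are used only to show the counting step.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def tabulate_counts(
    observed_codes: Iterable[str], code_to_gene: Mapping[str, str]
) -> Dict[str, int]:
    """Count how many times each target's color code was observed.

    Codes that are not part of the CodeSet mapping are ignored.
    """
    counts: Counter = Counter()
    for code in observed_codes:
        gene = code_to_gene.get(code)
        if gene is not None:
            counts[gene] += 1
    return dict(counts)


# Hypothetical color codes assigned to two genes that appear in Table III.
code_to_gene = {"RGBY": "ABCA5", "BYRG": "ABCF1"}

# Hypothetical codes read from a cartridge; "XXXX" is not in the CodeSet.
observed = ["RGBY", "BYRG", "RGBY", "XXXX"]

table = tabulate_counts(observed, code_to_gene)
```

With the hypothetical input above, `table` holds a count of 2 for ABCA5 and 1 for ABCF1, and the unrecognized code is not counted.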
A benefit of the use of the NanoString nCounter system is that no amplification of mRNA is necessary in order to perform the detection and quantification. However, in alternate embodiments, other suitable quantitative methods are used. See, e.g., Geiss et al, Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol. 2008 March; 26(3):317-25. doi: 10.1038/nbt1385. Epub 2008 Feb. 17, which is incorporated herein by reference in its entirety.

TABLE III

	Sequence ID#	Gene	Position	Target Sequence

1	ABCA5	NM_018672.4	6839-	AAGGAAGACTGTGTGTAGAATCT
			6938	TACGTAATAGTCTGATTCTTTGA
				CTCTGTGGCTAGAATGACAGTTA
				TCTATGGAGGTGGTAGAATTAAG
				CCATACCT

2	ABCF1	NM_00102509	2875-	CCTAAACAAACAAGAGGTGACC
		1.1	2974	ACCTTATTGTGAGGTTCCATCCA
				GCCAAGTTTATGTGGCCTATTGT
				CTCAGGACTCTCATCACTCAGAA
				GCCTGCCTC

3	ACAA2	NM_006111.2	1605-	CTCACTGTGACCCATCCTTACTC
			1704	TACTTGGCCAGGCCACAGTAAAA
				CAAGTGACCTTCAGAGCAGCTGC
				CACAACTGGCCATGCCCTGCCAT
				TGAAACAG

4	PHCA	NM_018367.6	3324-	AGCCAATAGTGATTTGTTTGCAT
			3423	ATCACCTAATGTGAAAAGTGCTC
				ATCTGTGAACTCTACAGCAAATT
				ATATTTTAGAAAATACTTTGTGA
				GGCCGGGC

5	ACSL5	NM_203379.1	2701-	CTATCACTCATGTCAATCATATC
			2800	TATGAGACAAATGTCTCCGATGC
				TCTTCTGCGTAAATTAAATTGTG
				TACTGAAGGGAAAAGTTTGATCA
				TACCAAAC

6	CABC1	NM_020247.4	2536-	TTCTAGAGTGAGATTTGTGTTTT
			2635	CTGCCCTTTTCCTCTCCAGCCGA
				TGGGCTGGAGCTGGGAGAGGTGC
				TGAGCTAACAGTGCCAACAAGT
				GCTCCTTAA

7	CD97	NM_078481.3	3186-	GCCAGTACTCGGGACAGACTAA
			3285	GGGCGCTTGTCCCATCCTGGACT
				TTTCCTCTCATGTCTTTGCTGCA
				GAACTGAAGAGACTAGGCGCTGG
				GGCTCAGCT

8	AFTPH	NM_017657.4	2741-	CTACCACCCGTCCAGTTTGACTG
			2840	GAGTAGCAGTGGCCTTACTAACC
				CTTTAGATGGTGTGGATCCGGAG
				TTGTATGAGTTAACAACTTCTAA
				GCTGGAAA

9	AHCYL1	NM_00124267	2401-	CTACCCGGCAGGTAGGTTAGATG
		6.1	2500	TGGGTGGTGCATGTTAATTTCCC
				TTAGAAGTTCCAAGCCCTGTTTC
				CTGCGTAAAGGTGGTATGTCCAG
				TTCAGAGA

10	AK	AK026725.1	1869-	AATGAAATTACTGTAGAGTCAGC
	026725		1968	AAAGAAGTAGAGAAGAAAAAAC
				ACCAAGAATGAGGAGAACCTAG
				CAAGGGCAGGCTTTTGGAAGCA
				AGAGGTAGATA

11	AK	AK093878.1	1554-	AGAATTTCTTGGTAGCTTTACAC
	093878		1653	CGAAAAATGCGTGTAACTAAAT
				ACCAGACATCTTGACCATTCAGC
				TAGAACCCTGGCAGCAACAGAG
				CTATTTAATT

12	AK	AK094576.1	1765-	CCCCTCCAGCCAGCCCTGCGTGG
	094576		1864	TTGTGGCCCCACTGCAGAAACGC
				CTCCGCTTAACACTCCAGCCTCT
				CTTCTATTCGGTCAGGCCACAGC
				TGCTGACT

13	AK	AK124143.1	2252-	GTACCTGGTAGAAATTGTGTCTT
	124143		2351	GGAATGACCCTTTCGAGTTATTG
				ACATGGCTCTGATGAATAGAACA
				TGAGCCCCAAAACTAAATCCAA
				AAGGAATTT

14	AK	AK126342.1	2906-	CTTATTGATTAGTGAATGTAGCT
	126342		3005	TAAGCCTTTGTATGTGTCCTCAG
				GGGGCAGACCGACTTTAAGAGG
				GACCAGATAACGTTTGAATGGA
				GGGATTATAT

15	AKAP4	NM_139289.1	417-	CTGTAAGTGTCCTCAACTGGCTT
			516	CTCAGTGATCTCCAGAAGTATGC
				CTTGGGTTTCCAACATGCACTGA
				GCCCCTCAACCTCTACCTGTAAA
				CATAAAGT

16	AKR1C3	NM_003739.4	1097-	GAGGACGTCTCTATGCCGGTGAC
			1196	TGGACATATCACCTCTACTTAAA
				TCCGTCCTGTTTAGCGACTTCAG
				TCAACTACAGCTGAGTCCATAGG
				CCAGAAAG

17	ALAS1	NM_000688.4	1616-	GGGGATCGGGATGGAGTCATGC
			1715	CAAAAATGGACATCATTTCTGGA
				ACACTTGGCAAAGCCTTTGGTTG
				TGTTGGAGGGTACATCGCCAGCA
				CGAGTTCTC

18	AMD1	NM_001634.4	572-	ACCACCCTCTTGCTGAAAGCACT
			671	GGTTCCCCTGTTGAAGCTTGCTA
				GGGATTACAGTGGGTTTGACTCA
				ATTCAAAGCTTCTTTTATTCTCG
				TAAGAATT

19	AMPD3	NM_000480.2	3389-	GTGATGCTCAGGGGCTGTCAAAG
			3488	TGACTGCGTTCATCAGTTTTACA
				CTGGGGCTGCTACATAATATTTT
				CATTTGAACGAAGAACTTCAAAA
				AGCACAGG

20	ANKHD1	NM_017747.2	7665-	CTTGGAACCCTATGATAAAAGTT
			7764	ATCCAAAATTCAACTGAATGCAC
				TGATGCCCAGCAGATTTGGCCTG
				GCACGTGGGCACCTCATATTGGA
				AACATGCA

21	ANP32B	NM_006401.2	661-	CACCTTGGAACCTTTGAAAAAGT
			760	TAGAATGTCTGAAAAGCCTGGAC
				CTCTTTAACTGTGAGGTTACCAA
				CCTGAATGACTACCGAGAGAGT
				GTCTTCAAG

22	ANXA1b	NM_000700.1	516-	GAAATCAGAGACATTAACAGGG
			615	TCTACAGAGAGGAACTGAAGAG
				AGATCTGGCCAAAGACATAACCT
				CAGACACATCTGGAGATTTTCGG
				AACGCTTTGC

23	ANXA1	NM_000700.2	1191-	TGGATGAAACCAAAGGAGATTA
			1290	TGAGAAAATCCTGGTGGCTCTTT
				GTGGAGGAAACTAAACATTCCCT
				TGATGGTCTCAAGCTATGATCAG
				AAGACTTTA

24	AP2S1	NM_021575.3	746-	CGAGTAACCGTGCCGTTGTCGTG
			845	TGATGCCATAAGCGTCTGTGCGT
				GGAGTCCCCAATAAACCTGTGGT
				CCTGCCTGGCCTTGCCGTCAAAA
				AAAAAAAA

25	CENTD2	NM_00104011	4923-	AAACTCCAGAACAGCAGAAAGC
		8.2	5022	GGGTGCTGTAGAGGAGCACTCA
				GCTCACGGGGAGGGAGCTCTTG
				GCTGAGCTTCTACAGGGCTGAGA
				GCTGCGCTTTG

26	ARCN1	NM_001655.4	3437-	CACTTTTAGCTGGTTGAAAAGTA
			3536	CCACTCCCACTCTGAACATCTGG
				CCGTCCCTGCAAAGAGTGTACTG
				TGCTTGAAGCAGAGCACTCACAC
				ATAAATGG

27	ARG1b	NM_000045.2	506-	AAGGAACTAAAAGGAAAGATTC
			605	CCGATGTGCCAGGATTCTCCTGG
				GTGACTCCCTGTATATCTGCCAA
				GGATATTGTGTATATTGGCTTGA
				GAGACGTGG

28	ARG1	NM_000045.3	989-	TTCGGACTTGCTCGGGAGGGTAA
			1088	TCACAAGCCTATTGACTACCTTA
				ACCCACCTAAGTAAATGTGGAA
				ACATCCGATATAAATCTCATAGT
				TAATGGCAT

29	ARHGAP	NM_018054.5	3027-	CATGTATGGTCTGTGTCTCCCCA
	17		3126	GTCCCCTCAGAACCATGCCCATG
				GATGGTGACTGCTGGCTCTGTCA
				CCTCATCAAACTGGATGTGACCC
				ATGCCGCC

30	ARHGAP	NM_033515.2	2499-	TTTTTGACCAAAAAGATAACAAA
	18		2598	TACCAGGTATGGCAAGTTGTGAA
				GACAGCACATTAAAACATACCTA
				ATTTCACAGTATTCCTGTCACGA
				CAGAATGT

31	ARHGAP	NM_015071.4	6088-	TCCCTGAGCTTTCCCAGTAGCCT
	26		6187	CCAGTTTCCTTTGTAAGACCCAG
				GGATCACTTAGCCATAGCCTGAA
				TCTTTTAGGGGTATTAAGGTCAG
				CCTCTCAC

32	ARHGEF	NM_015318.3	5128-	GATTACAACATTTCCTCACTGCG
	18		5227	GGATATTTCTGACCCGCTTTAGA
				ACTTAAGACCTGATTCTAGCAAT
				AAACGTGTCCGAGATGAGCGGT
				GAAAAAAAA

33	FLJ	NM_018071.4	5402-	GAATGTGTCTCCTCCACAGTGGC
	10357		5501	TCCCAGAGGTTCCACACACTCTC
				TGAAGCTCCTTCTCCCACACTGC
				ACCTACTCCTTGAGGCTGAACTG
				GTCACAGA

34	ARHGEF	NM_005435.3	5151-	GGGGGACCATTGGGGCCTGAGC
	5		5250	CAAGGAACTTTCCTTCTACTGCC
				TTATAGTGCTTAAACATTCTCCG
				CCTCCAGGGTGCAGATTCAGAGC
				TGGCCAGAG

35	ARL8B	NM_018184.2	2491-	ACCATTACAAAGAATGTGGCAA
			2590	CTTGCTTGTGCCTAAAAGGAGGA
				ATTGGAACTAGAATGTGTGACTC
				TGTGGGGACTGCATAGGTTTGTT
				AATTGACCT

36	ARPC2	NM_005731.2	951-	ACGGGGAAGACGTTTTCATCCCG
			1050	CTAATCTTGGGAATAAGAGGAG
				GAAGCGGCTGGCAACTGAAGGC
				TGGAACACTTGCTACTGGATAAT
				CGTAGCTTTT

37	ASF1B	NM_018154.2	1476-	CTGTCTCCGGGCCAGGGTCAGGG
			1575	ACCCTCTGCCTCTGGCAGCCTTA
				ACCTGTCCTCTGCTAGGACCAGG
				GTGATTTCAAGCCAGGGAAGCA
				ACTGGGACC

38	ATG4B	NM_178326.2	106-	GGACGCAGCTACTCTGACCTACG
			205	ACACTCTCCGGTTTGCTGAGTTT
				GAAGATTTTCCTGAGACCTCAGA
				GCCCGTTTGGATACTGGGTAGAA
				AATACAGC

39	ATG5	NM_004849.2	1105-	TGCAGTGGCTGAGTGAACATCTG
			1204	AGCTACCCGGATAATTTTCTTCA
				TATTAGTATCATCCCACAGCCAA
				CAGATTGAAGGATCAACTATTTG
				CCTGAACA

40	ATM	NM_000051.3	31-	ACGCTAAGTCGCTGGCCATTGGT
			130	GGACATGGCGCAGGCGCGTTTGC
				TCCGACGGGCCGAATGTTTTGGG
				GCAGTGTTTTGAGCGCGGAGACC
				GCGTGATA

41	ATP2C1	NM_014382.3	4070-	TAAAAAGTCCCCAAACCCAAAC
			4169	AAATGGTTTATGAACCAGAGTAT
				ATGTGGAAGATTCTTTGCTGGTC
				TTGCTCTGTGTGCATCTGAAGCT
				TCTTTGGCC

42	ATP5B	NM_001686.3	1626-	CTATATGGTGGGACCCATTGAAG
			1725	AAGCTGTGGCAAAAGCTGATAA
				GCTGGCTGAAGAGCATTCATCGT
				GAGGGGTCTTTGTCCTCTGTACT
				GTCTCTCTC

43	ATP5I	NM_007100.2	256-	TTGCCAGAGAATTGGCAGAAGA
			355	TGACAGCATATTAAAGTGAGTGA
				CCCTGCGACCCACTCTTTGGACC
				AGCAGCGGATGAATAAAGCTTC
				CTGTGTTGTG

44	ATP5J2	NM_004889.3	267-	GCTGGCATGCTACGTGCTCTTTA
			366	GCTACTCCTTTTCCTACAAGCAT
				CTCAAGCACGAGCGGCTCCGCA
				AATACCACTGAAGAGGACACAC
				TCTGCACCCC

45	ATP5L	NM_006476.4	196-	GGGACGGGGTCCTGCAGCGGGT
			295	CCTTCCGGCGGGTGACATTCAGC
				CGGCGGTTCGGGGCGACGGACT
				CTCCATTCCAGAACCATGGCCCA
				ATTTGTCCGT

46	AW	AW173314.1	419-	AGCAGAAGGCAGGGGAGTCCAC
	173314		518	ACAGGGCAAGCAGCAACCAGGC
				TTCTGAGGACAGGAAAGGAGGG
				AGCATCTGGTGGGAAGCTGGCG
				AGGAGGGGCTGG

47	AW	AW270402.1	203-	GATATCTCACACACGGAATAATC
	270402		302	ATTAAGAAACAACCACTGTTGAG
				CAAAGTTGATAGGCAGTAAGGA
				AATAAAGTGGACATAAACACAG
				CAGTACTAAT

48	AZI2	NM_022461.4	3031-	GAATTGGTGTCAGATGCTGGAAT
			3130	TTATTCTGACCAATGAACACAGC
				TGACTCAGGGGAGTACAATCTCC
				TGCCAAGTAATAGAACCAAACC
				CAATATGCA

49	BACH2	NM_021813.3	8696-	TCCAGAACCAGTCTGATGCAAGT
			8795	GCACCTCTAATATATGCCTTACA
				AACTCCAGAGGCCATATTCAAAA
				CAGGGTCTTCTCAGTGTATGCAA
				GGGGCTGC

50	BAG3	NM_004281.3	2304-	CCCCACCACCTGTTAGCTGTGGT
			2403	TGTGCACTGTCTTTTGTAGCTCT
				GGACTGGAGGGGTAGATGGGGAG
				TCAATTACCCATCACATAAATAT
				GAAACATT

51	BANP	NM_079837.2	2125-	GGAGCCCTTTGCTGTGTGCTCTG
			2224	TCCAGTGTCATGAGGCAGGTGTT
				TGCAAAGCCAGCTCTCGGTTCCG
				ATGGGGTATTGCTGACCTACTTT
				TCTAGGGG

52	BATF	NM_006399.3	294-	CCTGGCAAACAGGACTCATCTGA
			393	TGATGTGAGAAGAGTTCAGAGG
				AGGGAGAAAAATCGTATTGCCG
				CCCAGAAGAGCCGACAGAGGCA
				GACACAGAAGG

53	BCL10	NM_003921.2	1251-	TGAAAATACCATCTTCTCTTCAA
			1350	CTACACTTCCCAGACCTGGGGAC
				CCAGGGGCTCCTCCTTTGCCACC
				AGATCTACAGTTAGAAGAAGAA
				GGAACTTGT

54	BCL6	NM_00113084	3401-	CCTCACGGTGCCTTTTTTCACGG
		5.1	3500	AAGTTTTCAATGATGGGCGAGCG
				TGCACCATCCCTTTTTGAAGTGT
				AGGCAGACACAGGGACTTGAAG
				TTGTTACTA

55	BCOR	NM_017745.5	5794-	ATACAAAGCTCTGATGACAGGCC
			5893	ATGACTGTAGAGTGGTCAGAACT
				GTGTGGTTGGTTTGAGGGAGCGA
				ATTCGGGGAAGGCACTTGGTGAT
				ATAACTTT

56	BF	BF375676.1	141-	TGTATTTCTGTGCAATGAGAGAG
	375676		240	GCTCTTTATGGTGGTGCTACAAA
				CAAGCTCATCTTTGGAACTGGCA
				CTCTGCTTGCTGTCCAGCCAAAT
				ATCCAGAA

57	BID	NM_001196.2	1876-	AAGCACGACAGTGGATGCTGGG
			1975	TCCATATCACACACATTGCTGTG
				AACAGGAAACTCCTGTGACCAC
				AACATGAGGCCACTGGAGACGC
				ATATGAGTAAG

58	BMPR2	NM_001204.6	1164-	CAGCGGCCCTGGCGGGTGCCCTG
			1263	GCTACCATGGACCATCCTGCTGG
				TCAGCACTGCGGCTGCTTCGCAG
				AATCAAGAACGGCTATGTGCGTT
				TAAAGATC

59	BQ	BQ189294.1	416-	GCTGGAGTGATTGGCCCTGATGA
	189294		515	CCATGGAGAAAAGAGAGTAGGG
				AGAACAGTATAACCAGAAGTCA
				GGGGGGTCTCCTGGAATCCCTCC
				TCACAATACC

60	BU	BU743228.1	154-	CCCTGTGGGCCTTGCAGGCCAGT
	743228		253	CCAGGCAGGTCTTTCACACTGTT
				GTCCCACATAACAGAAAAAGCT
				GAGCAGACAGGGTAGGAAACAC
				ACTTGCATCT

61	BX	BX089765.1	106-	TTAAGCAACTTGCTCCAGTGACG
	089765		205	CAGCTGGTAAGCAGCAGAGCTG
				GGATTAAAACCCAGGCATTCTGA
				TTCCACCACCTACACACTTAGCC
				ATTCCGCCC

62	BX	BX108566.1	365-	ATTTAGGGTGAGAGCTTCACAGC
	108566		464	TGAAAATCTCCTTTAAAGAAAAC
				GCGGCCCAAATGTGCTGGGAGG
				AGAAGCCAGTGGATCTAGGAGG
				GGGCCCGGCG

63	BX	BX400436.2	1-	ATATTTTGGAGAGGGAAGTTGGC
	400436		100	TCACTGTTGTAGAGGACCTGAAC
				AAGGTGTTCCCACCCGAGGTCGC
				TGTGTTTGAGCCATCAGAAGCAG
				AGATCTCC

64	BX	BX436458.2	518-	ATGCAGACAATTTGCCTGTGAGA
	436458		617	TGAGGAAAATTCTCTGGAAGATT
				TAGGCCCTGAGAGCTGAAAAGG
				GACCCTAAACATTACCTGGTGAC
				AACTGCCCT

65	C15orf	NM_015492.4	3535-	CCTGAGCTTTTAACGTGAGGGTC
	39		3634	TTTATTGGATAGGACTACTCCCT
				ATTTCTTGCCTAGAGAACACACA
				TGGGCTTTGGAGCCCGACAGACC
				TGGGCTTG

66	C17orf	XM_944416.1	4909-	AAGGATGGGGGTGGATTGACCA
	51		5008	AGCTGGGCCAGAGGTGCGAGGA
				GCTGATCTGCGAGCCCTGTGTGC
				CTGTGAGTCCTGGCGGAGTGGCC
				GTGCGTGGTG

67	C3	NM_000064.2	4397-	CATCTACCTGGACAAGGTCTCAC
			4496	ACTCTGAGGATGACTGTCTAGCT
				TTCAAAGTTCACCAATACTTTAA
				TGTAGAGCTTATCCAGCCTGGAG
				CAGTCAAG

68	C4B	NM_00100202	4438-	GAGTCCAGGGTGCACTACACCGT
		9.3	4537	GTGCATCTGGCGGAACGGCAAG
				GTGGGGCTGTCTGGCATGGCCAT
				CGCGGACGTCACCCTCCTGAGTG
				GATTCCACG

69	C4orf	NM_017867.2	682-	GAACCGTGAAGATGAAACAGAG
	27		781	AGATAAGAAAGTTGTGACAAAG
				ACCTTTCATGGTGCAGGCTTGGT
				TGTTCCAGTAGATAAAAATGATG
				TTGGGTACCG

70	C8orf	NM_032847.2	1029-	TAAAAGATGAAGTTCACCCAGA
	76		1128	GGTGAAGTGTGTTGGCTCCGTAG
				CCCTGACTGCCTTGGTGACTGTA
				TCCTCAGAAGAATTTGAAGACAA
				GTGGTTCAG

71	C9orf	NM_182635.1	529-	CGCTGGCCATGGGGAAGCCACCT
	164		628	CCAGGGCAGTCCCAGGGACTGA
				ATTGGAAGTTGTCCCAAGTCACT
				TCAGGTCCAACTGGGACAGCAG
				AGGTAACCCC

72	CAMP	NM_004345.4	623-	TTGTCCAGAGAATCAAGGATTTT
			722	TTGCGGAATCTTGTACCCAGGAC
				AGAGTCCTAGTGTGTGCCCTACC
				CTGGCTCAGGCTTCTGGGCTCTG
				AGAAATAA

73	CASP1	NM_033294.3	219-	ATTTATCCAATAATGGACAAGTC
			318	AAGCCGCACACGTCTTGCTCTCA
				TTATCTGCAATGAAGAATTTGAC
				AGTATTCCTAGAAGAACTGGAGC
				TGAGGTTG

74	CASP2	NM_032983.3	3347-	CCCACCACTCTTGACTCAGGTGG
			3446	TGTCCTTCTTCCTCAAGTCTTGA
				CAATTCCCGGGCCCTTCAGTCCC
				TGAGCAGTCTACTTCTGTGTCTG
				TCACCACA

75	CASP3	NM_032991.2	686-	ACTCCACAGCACCTGGTTATTAT
			785	TCTTGGCGAAATTCAAAGGATGG
				CTCCTGGTTCATCCAGTCGCTTT
				GTGCCATGCTGAAACAGTATGCC
				GACAAGCT

76	CBLL1	NM_024814.3	1967-	ATGAGGGGGAAAAAAACTTATG
			2066	TGTAGTCAATCTTTTAAGCTTTG
				ACTGTTTTGGGAAGGAAGAGTAC
				CTCTTATCGAGGTAGTATAAAAC
				ACATAGGGT

77	CC2D1B	NM_032449.2	4183-	TTGCATAAGCACAGCTCAAGAAC
			4282	TGAGCTTTGTATGTGTCCTTTTG
				GGGGATAACAGGGCTGGACCATG
				CTTCCCTGCCCTTAAACGCAGAG
				CTTTTAGT

78	KIAA	NM_021174.5	201-	GGGAGAGGGCCCACACAGTCTC
	1967		300	CTCGCCGGCACCGGCCTCCTCCA
				TTTTTCCGGGCCTTGCGTGGAGG
				GTTTTGGCGGATGTTTTTGAACG
				AAGGAATGT

79	CCDC97	NM_052848.1	2867-	ATCCAGAGTGAGACAGCATTGG
			2966	AGGGACAAGTGTGCATGCAGAT
				GTCCTCAGACGGGAAGGTTTGAG
				AAGGGTCAGATGGTAGGCGGGC
				CTAACAAGGGC

80	CCL3	NM_002983.2	160-	CAGTTCTCTGCATCACTTGCTGC
			259	TGACACGCCGACCGCCTGCTGCT
				TCAGCTACACCTCCCGGCAGATT
				CCACAGAATTTCATAGCTGACTA
				CTTTGAGA

81	CCL3L1	NM_021006.4	422-	GGAGCCTGAGCCTTGGGAACAT
			521	GCGTGTGACCTCTACAGCTACCT
				CTTCTATGGACTGGTTATTGCCA
				AACAGCCACACTGTGGGACTCTT
				CTTAACTTA

82	CCL3L3	NM_00100143	402-	GGGGAGGAGCAGGAGCCTGAGC
		7.3	501	CTTGGGAACATGCGTGTGACCTC
				CACAGCTACCTCTTCTATGGACT
				GGTTATTGCCAAACAGCCACACT
				GTGGGACTC

83	CCL4	NM_002984.2	36-	TTCTGCAGCCTCACCTCTGAGAA
			135	AACCTCTTTGCCACCAATACCAT
				GAAGCTCTGCGTGACTGTCCTGT
				CTCTCCTCATGCTAGTAGCTGCC
				TTCTGCTC

84	CCND3	NM_001760.2	1216-	GGCCAGCCATGTCTGCATTTCGG
			1315	TGGCTAGTCAAGCTCCTCCTCCC
				TGCATCTGACCAGCAGCGCCTTT
				CCCAACTCTAGCTGGGGGTGGGC
				CAGGCTGA

85	CCR1	NM_001295.2	536-	CATCATTTGGGCCCTGGCCATCT
			635	TGGCTTCCATGCCAGGCTTATAC
				TTTTCCAAGACCCAATGGGAATT
				CACTCACCACACCTGCAGCCTTC
				ACTTTCCT

86	CCR6	NM_031409.2	936-	CTTTAACTGCGGGATGCTGCTCC
			1035	TGACTTGCATTAGCATGGACCGG
				TACATCGCCATTGTACAGGCGAC
				TAAGTCATTCCGGCTCCGATCCA
				GAACACTA

87	CCR9	NM_031200.1	1096-	CCCTGTTCTCTATGTTTTTGTGG
			1195	GTGAGAGATTCCGCCGGGATCTC
				GTGAAAACCCTGAAGAACTTGGG
				TTGCATCAGCCAGGCCCAGTGGG
				TTTCATTT

88	CCT6A	NM_001762.3	281-	GCCCAAGGGCACCATGAAGATG
			380	CTCGTTTCTGGCGCTGGAGACAT
				CAAACTTACTAAAGACGGCAAT
				GTGCTGCTTCACGAAATGCAAAT
				TCAACACCCA

89	CD14	NM_000591.2	886-	GCCCAAGCACACTCGCCTGCCTT
			985	TTCCTGCGAACAGGTTCGCGCCT
				TCCCGGCCCTTACCAGCCTAGAC
				CTGTCTGACAATCCTGGACTGGG
				CGAACGCG

90	CD160b	NM_007053.2	501-	TTGATGTTCACCATAAGCCAAGT
			600	CACACCGTTGCACAGTGGGACCT
				ACCAGTGTTGTGCCAGAAGCCAG
				AAGTCAGGTATCCGCCTTCAGGG
				CCATTTTT

91	CD160	NM_007053.3	1286-	AAAGGAAGACAGCCAGATCCAG
			1385	TGATTGACTTGGCATGAAAATGA
				GAAAATGCAGACAGACCTCAAC
				ATTCAACAACATCCATACAGCAC
				TGCTGGAGGA

92	CD1A	NM_001763.2	1816-	CCTGTTTTAGATATCCCTTACTC
			1915	CAGAGGGCCTTCCCTGACTTACA
				AGTGGGAAGCAGTCTCTTCCTGG
				TCTGAACTCCCGCCACATTTTAG
				CCGTACTT

93	CD36	NM_000072.3	1619-	TAAAGAATCTGAAGAGGAACTA
			1718	TATTGTGCCTATTCTTTGGCTTA
				ATGAGACTGGGACCATTGGTGAT
				GAGAAGGCAAACATGTTCAGAAG
				TCAAGTAAC

94	CD48	NM_001778.2	271-	AATTTAAAGGCAGGGTCAGACTT
			370	GATCCTCAGAGTGGCGCACTGTA
				CATCTCTAAGGTCCAGAAAGAG
				GACAACAGCACCTACATCATGA
				GGGTGTTGAA

95	CD69	NM_001781.2	1360-	TATACAGTGTCTTACAGAGAAAA
			1459	GACATAAGCAAAGACTATGAGG
				AATATTTGCAAGACATAGAATAG
				TGTTGGAAAATGTGCAATATGTG
				ATGTGGCAA

96	CD70	NM_001252.2	191-	CCTATGGGTGCGTCCTGCGGGCT
			290	GCTTTGGTCCCATTGGTCGCGGG
				CTTGGTGATCTGCCTCGTGGTGT
				GCATCCAGCGCTTCGCACAGGCT
				CAGCAGCA

97	CD79A	NM_021601.3	617-	TGAAGATGAAAACCTTTATGAAG
			716	GCCTGAACCTGGACGACTGCTCC
				ATGTATGAGGACATCTCCCGGGG
				CCTCCAGGGCACCTACCAGGATG
				TGGGCAGC

98	CD79B	NM_000626.2	350-	GAAGCTGGAAAAGGGCCGCATG
			449	GAAGAGTCCCAGAACGAATCTCT
				CGCCACCCTCACCATCCAAGGCA
				TCCGGTTTGAGGACAATGGCATC
				TACTTCTGT

99	CDC42	NM_006779.3	1779-	AGGGCTTTGTGGAGGACAGGCCT
	EP2		1878	TGCCCTCAAGAACGTCGTACCTG
				ACGCTGAGCCTGTCATGAGAATG
				CAACAGGAGCAAACCAAGTGTT
				GCTGTGACA

100	CDH5	NM_001795.3	3406-	TCTCCCCTTCTCTGCCTCACCTG
			3505	GTCGCCAATCCATGCTCTCTTTC
				TTTTCTCTGTCTACTCCTTATCC
				CTTGGTTTAGAGGAACCCAAGAT
				GTGGCCTT

101	CDKN1A	NM_000389.2	1976-	CATGTGTCCTGGTTCCCGTTTCT
			2075	CCACCTAGACTGTAAACCTCTCG
				AGGGCAGGGACCACACCCTGTAC
				TGTTCTGTGTCTTTCACAGCTCC
				TCCCACAA

102	CFD	NM_001928.2	860-	CTGGTTGGTCTTTATTGAGCACC
			959	TACTATATGCAGAAGGGGAGGC
				CGAGGTGGGAGGATCATTGGAT
				CTCAGGAGTTCGAGATCAGCATG
				GGCCACGTAG

103	CHCHD3	NM_017812.2	1173-	TCCACCCTAACAAAGTAGGATGG
			1272	GGTTGGGGGCTAAATTAATTGGA
				GTGGGGCGAGGAGAGAGCCAGA
				AAACATAGATCCGAGGGCAGCA
				GTGCTGGGTG

104	CHFR	NM_018223.2	2836-	CGCCGCTCCCTCATGCTGCCCGG
			2935	GCCCTTCCTCCAAGACCCTACAG
				AGCCTGAGGGGCACCTTGGCTTC
				CGCCTGTGCTAGCTTTGCCATGT
				CATCTGGA

105	CHMP5	NM_016410.5	1148-	ACTAAGGAAATGGAATCTTAAA
			1247	AGTCTATGACAGTGTAACTCTAC
				AGTCTCAAAATGACCTGATAAAT
				TGATAAGACAAAGATGAGATTA
				TTGGGGCTGT

106	CIAPIN	NM_020313.3	1816-	GCATGTCTTGTAAAGAGAGGGG
	1		1915	ATGTGCATTTGTGTGTGATGTTG
				GATAGTCATCCACGCTCAGTTTG
				GACCATTGGAGGAACTTAGTGTC
				ACGCACAAA

107	CKS2	NM_001827.1	228-	AGACTTGGTGTCCAACAGAGTCT
			327	AGGCTGGGTTCATTACATGATTC
				ATGAGCCAGAACCACATATTCTT
				CTCTTTAGACGACCTCTTCCAAA
				AGATCAAC

108	CLEC4A	NM_194448.2	389-	ATTTCTACTGAATCAGCATCTTG
			488	GCAAGACAGTGAGAAGGACTGT
				GCTAGAATGGAGGCTCACCTGCT
				GGTGATAAACACTCAAGAAGAG
				CAGGATTTCA

109	CLEC4C	NM_203503.1	571-	TACGAGAGTATCAACAGTATCAT
			670	CCAAGCCTGACCTGCGTCATGGA
				AGGAAAGGACATAGAAGATTGG
				AGCTGCTGCCCAACCCCTTGGAC
				TTCATTTCA

110	CLEC5A	NM_013252.2	3251-	CCCCATCCAACCCTTAGACTCAC
			3350	GAACAAATCCACCTGAGATCAG
				CAGAGCCACCCTAGATCAGCTGA
				AACTCTAAGCACAAAAATAAAA
				ACTTATCACT

111	CLIC3	ILMN_179642	99-	CGTACGCCGCTACCTGGACAGCG
		3.1	198	CGATGCAGGAGAAAGAGTTCAA
				ATACACGTGTCCGCACAGCGCCG
				AGATCCTGGCGGCCTACCGGCCC
				GCCGTGCAC

112	CLK2	XM_941392.1	552-	GATTATAGCCGGGATCGGGGAG
			651	ATGCCTACTATGACACAGACTAT
				CGGCATTCCTATGAATATCAGCG
				GGAGAACAGCAGTTACCGCAGC
				CAGCGCAGCA

113	CLN8	NM_018941.3	4486-	GGCGCCAGAGCTGGGCTCTTCAA
			4585	CACGGCATTTAGCGCAGAAAGTC
				GTGGTTCAGGCAGTATGGGCCGC
				TGTGACAAAACACCTAAGACTG
				GGTAGTTTA

114	CLPTM1	NM_001294.3	2389-	TCTGTGTTTCCAGCCATCTCGCC
			2488	CTGCCAGCCCAGCACCACTGGGA
				ATCATGGTGAAGCTGATGCAGCG
				TTGCCGAGGGGGTGGGTTGGGC
				GGGGGTGGG

115	CLSTN1	NM_00100956	4990-	TTGAATACTGTTCTGTGACCCTG
		6.2	5089	ACTGCTAGTTCTGAGGACACTGG
				TGGCTGTGCTATGTGTGGCCATC
				CTCCATGTCCCGTCCCTGTAGCT
				GCTCTGTT

116	CN	CN312986.1	491-	AGGAAACTAAGACATGGAAAGG
	312986		590	TTAGGTAACTTGCCCAAGGTCGC
				ACAGCTAGTAAGTGGCAGACAT
				CCAGAGTCTCTGCTCTGCTCTTA
				ACTCTCACCA

117	CNIH4	NM_014184.3	526-	AATGACTGAAGCTGGAGAAGCC
			625	GTGGTTGAAGTCAGCCTACACTA
				CAGTGCACAGTTGAGGAGCCAG
				AGACTTCTTAAATCATCCTTAGA
				ACCGTGACCA

118	CNPY2	NM_014255.5	1038-	TTGCAGTAAGCGAACAGATCTTT
			1137	GTGACCATGCCCTGCACATATCG
				CATGATGAGCTATGAACCACTGG
				AGCAGCCCACACTGGCTTGATGG
				ATCACCCC

119	COLEC	NM_130386.2	901-	ACACAAGCCAGGCTATCCAGCG
	12		1000	AATCAAGAACGACTTTCAAAATC
				TGCAGCAGGTTTTTCTTCAAGCC
				AAGAAGGACACGGATTGGCTGA
				AGGAGAAAGT

120	GLT25	NM_024656.2	3067-	CTGTGTGCCAGGCCTCACAGACT
	D1		3166	CCCAGTTGGGTTGAAGAATGGTT
				GACTGAGTTTGATTCTTCCTGTA
				CCCTCGGTCGTCTGAGCTGTGTG
				CGGACAAC

121	COMMD6	NM_203497.3	32-	CTCTCGAGTCCGGGCCGCAAGTC
			131	CCAGACGCTGCCCATGGAGGCGT
				CCAGCGAGCCGCCGCTGGATGCT
				AAGTCCGATGTCACCAACCAGCT
				TGTAGATT

122	CORO1C	ILMN_174595	98-	AAGTAAAGTTGTTGATGGTGGTG
		4.1	197	AAACACCGTAGGGCATGTGGTTC
				AAAGAGAAGCAGGAGGGCAAGG
				GAAAGTTACCCTGATCTTAGTTT
				GTAGCTTAT

123	COX6C	NM_004374.2	70-	GAAGTTTTGCCAAAACCTCGGAT
			169	GCGTGGCCTTCTGGCCAGGCGTC
				TGCGAAATCATATGGCTGTAGCA
				TTCGTGCTATCCCTGGGGGTTGC
				AGCTTTGT

124	COX7B	NM_001866.2	160-	CAGAGCCACCAGAAACGTACAC
			259	CTGATTTTCATGACAAATACGGT
				AATGCTGTATTAGCTAGTGGAGC
				CACTTTCTGTATTGTTACATGGA
				CATATGTAG

125	COX7C	NM_001867.2	1-	CAAGGTCGTGAAAAAAAAGGTC
			100	TTGGTGAGGTGCCGCCATTTCAT
				CTGTCCTCATTCTCTGCGCCTTT
				CGCAGAGCTTCCAGCAGCGGTAT
				GTTGGGCCA

126	CPPED1	NM_018340.2	2494-	TGTATTTGTTTCTTTACAACAGG
			2593	TGTAGGTATAGGAGGTCAAGAAA
				AGGAGTTCGGTAAAGGGCATAG
				CTAATAACAACCACACATTGGGC
				CAGGCACAG

127	CR2b	NM_00100665	486-	GGTGTCAAGCAAATAATATGTGG
		8.1	585	GGGCCGACACGACTACCAACCT
				GTGTAAGTGTTTTCCCTCTCGAG
				TGTCCAGCACTTCCTATGATCCA
				CAATGGACA

128	CR2	NM_00100665	3581-	AGCCCAGTTTCACTGCCATATAC
		8.2	3680	TCTTCAAGGACTTTCTGAAGCCT
				CACTTATGAGATGCCTGAAGCCA
				GGCCATGGCTATAAACAATTACA
				TGGCTCTA

129	CREB1	NM_004379.3	4856-	TTTGATGGTAGGTCAGCAGCAGT
			4955	GCTAGTCTCTGAAAGCACAATAC
				CAGTCAGGCAGCCTATCCCATCA
				GATGTCATCTGGCTGAAGTTTAT
				CTCTGTCT

130	CREB5	NM_182898.3	7898-	ACCTACTCACCTTTTTCCCTTCT
			7997	AAGTTCTGCTAAATCACATCTGC
				CTCATAGAGAAAGGAATGTTGCC
				TTTGAGAACTGTCTTGGAGAACA
				GATAAGCT

131	CRKL	NM_005207.3	4901-	TTCTAAAGGAGCAGAAGGACAG
			5000	GTCTCTGAGACAGGATCGTTGTC
				CCTACAGGAGGAACAGTGGCCTT
				GCTTCTTAGACGGTCTTCACTGT
				GTGTTTTAA

132	CRY2	NM_021117.3	4013-	CAGCTCAGGTGGCCCTGAGGGCT
			4112	CCCTCGGAACAGTGCCTCAAATC
				CTGACCCAAGGGCCAGCATGGG
				GAAGAGATGGTTGCAGGCAAAA
				TGCACTTTAT

133	CS	NM_004077.2	2080-	CCTCCTAGCAAGACCTGTTGGTT
			2179	AGCTGGACATGCTTTGGCAATTT
				TTTTATACTACCAAGTGACCATA
				AAGGCATGGCATTTGTTGTGACT
				GGCACCCA

134	CSK	NM_004383.2	2501-	TCTAGGGACCCCTCGCCCCAGCC
			2600	TCATTCCCCATTCTGTGTCCCAT
				GTCCCGTGTCTCCTCGGTCGCCC
				CGTGTTTGCGCTTGACCATGTTG
				CACTGTTT

135	CST7	NM_003650.3	618-	CAACCACACCTTGAAGCAGACTC
			717	TGAGCTGCTACTCTGAAGTCTGG
				GTCGTGCCCTGGCTCCAGCACTT
				CGAGGTGCCTGTTCTCCGTTGTC
				ACTGACCC

136	CTAG1B	NM_001327.2	286-	GCGGGGCCAGGGGGCCGGAGAG
			385	CCGCCTGCTTGAGTTCTACCTCG
				CCATGCCTTTCGCGACACCCATG
				GAAGCAGAGCTGGCCCGCAGGA
				GCCTGGCCCA

137	CTDSP2	NM_005730.3	4685-	GAGGTCGGGCCAGCTGCCCCATT
			4784	CTTTTAACGTTGTAGGGCCTGCC
				CATGGAGCGGACCCTCCTCTTTG
				GGCCTCGTGAGCTTTTTTGCTTA
				TCATGTTC

138	CTSW	NM_001335.3	1076-	TGCACCGAGGGAGCAATACCTGT
			1175	GGCATCACCAAGTTCCCGCTCAC
				TGCCCGTGTGCAGAAACCGGATA
				TGAAGCCCCGAGTCTCCTGCCCT
				CCCTGAAC

139	CTSZ	NM_001336.3	1174-	CACTGGCTGCGAGTGTTCCTGAG
			1273	AGTTGAAAGTGGGATGACTTATG
				ACACTTGCACAGCATGGCTCTGC
				CTCACAATGATGCAGTCAGCCAC
				CTGGTGAA

140	CX3CL1	NM_002996.3	141-	AGCACCACGGTGTGACGAAATG
			240	CAACATCACGTGCAGCAAGATG
				ACATCAAAGATACCTGTAGCTTT
				GCTCATCCACTATCAACAGAACC
				AGGCATCATG

141	CXCL2	NM_002089.3	855-	ATCACATGTCAGCCACTGTGATA
			954	GAGGCTGAGGAATCCAAGAAAA
				TGGCCAGTGAGATCAATGTGACG
				GCAGGGAAATGTATGTGTGTCTA
				TTTTGTAAC

142	IL8RB	NM_001557.3	410-	ACCTCAAAAATGGAAGATTTTAA
			509	CATGGAGAGTGACAGCTTTGAA
				GATTTCTGGAAAGGTGAAGATCT
				TAGTAATTACAGTTACAGCTCTA
				CCCTGCCCC

143	CXCR5b	NM_001716.3	2619-	ACGTCCCTTTTTTCTCTGAGTAT
			2718	CTCCTCGCAAGCTGGGTAATCGA
				TGGGGGAGTCTGAAGCAGATGCA
				AAGAGGCAAGAGGCTGGATTTT
				GAATTTTCT

144	CYBB	NM_000397.3	3787-	ACTGGAGAGGGTACCTCAGTTAT
			3886	AAGGAGTCTGAGAATATTGGCCC
				TTTCTAACCTATGTGCATAATTA
				AAACCAGCTTCATTTGTTGCTCC
				GAGAGTGT

145	CYP1B1	NM_000104.3	2361-	CTTACACCAAACTACTGAATGAA
			2460	GCAGTATTTTGGTAACCAGGCCA
				TTTTTGGTGGGAATCCAAGATTG
				GTCTCCCATATGCAGAAATAGAC
				AAAAAGTA

146	DB	DB338252.1	436-	GTTCTTGGTCTGTATGTGTAGGT
	338252		535	GGAGGGAGGCAAAGTTGTGGTA
				ATAAAGTGGGAAGGCCCGGGAA
				GAACAGCTAACTGTATAGGGGT
				GAAATGACGCT

147	DBI	NM_00107986	241-	CATAAATACAGAACGGCCCGGG
		2.1	340	ATGTTGGACTTCACGGGCAAGGC
				CAAGTGGGATGCCTGGAATGAG
				CTGAAAGGGACTTCCAAGGAAG
				ATGCCATGAAA

148	DCAF7	NM_005828.4	6155-	TTAACACTGTGCTGTGAAACAAC
			6254	TATGGGGAATCTCCATTGAAGGC
				TACTTCATGGGCACCTGAAAGTG
				GAGTGTTATAGCTATGACTTTCT
				ATTTCTTG

149	DDIT4	NM_019058.2	1414-	GACCTGTTGTAGGCAGCTATCTT
			1513	ACAGACGCATGAATGTAAGAGT
				AGGAAGGGGTGGGTGTCAGGGA
				TCACTTGGGATCTTTGACACTTG
				AAAAATTACA

150	DDX23	NM_004818.2	2811-	ATTGCACTGGGCCATCAGCTCAT
			2910	GCCAGGCTATGGGGGCAGCCAG
				TTGGCATTGCTCCCCAGACTGAA
				CAGAAACCTGGCCGCCGGATGG
				GACCTCCTTT

151	DGUOK	NM_080916.2	573-	ACATCGAGTGGCATATCTATCAG
			672	GACTGGCATTCTTTTCTCCTGTG
				GGAGTTTGCCAGCCGGATCACAT
				TACATGGCTTCATCTACCTCCAG
				GCTTCTCC

152	DGUOKb	NM_080916.2	903-	TTGTAAAGAATCTGTAACCAATA
			1002	CCATGAAGTTCAGGCTGTGATCT
				GGGCTCCCTGACTTTCTGAAGCT
				AGAAAAATGTTGTGTCTCCCAAC
				CACCTTTC

153	DHX16b	NM_00116423	2491-	CCCGTGTCAACTTCTTTCTCCCT
		9.1	2590	GGCGGTGACCACCTGGTTCTGCT
				AAATGTTTACACACAGTGGGCTG
				AGAGTGGTTACTCTTCCCAGTGG
				TGCTATGA

154	DHX16	NM_003587.4	3189-	ACCAAAGAGTTCATGAGACAGG
			3288	TACTGGAGATTGAGAGCAGTTGG
				CTTCTGGAGGTGGCTCCCCATTA
				TTATAAGGCCAAGGAGCTAGAA
				GATCCCCATG

155	DKFZp	XM_291277.4	4192-	CTCCTGCAGCTTCTGTGAGCCAA
	761P04		4291	GCCCCAGCCTGCACCGTCGCTGC
	23			CCCTTCCCTGCCTAACCCTTTCC
				TGTCTCGCCTTGGAAGCACCCAT
				GTCTCCCT

156	DMBT1	NM_007329.2	3713-	CACAATGGCTGGCTCTCCCACAA
			3812	CTGTGGCCATCATGAAGACGCTG
				GTGTCATCTGCTCAGCTTCCCAG
				TCCCAGCCGACACCCAGCCCAGA
				CACTTGGC

157	DNAJB1	NM_006145.2	1904-	GACCTCTGGCTCCAGTGAAGCTG
			2003	AATGTCCTCACTTTGTGGGTCAC
				ACTCTTTACATTTCTGTAAGGCA
				ATCTTGGCACACGTGGGGCTTAC
				CAGTGGCC

158	DNAJB6	NM_058246.3	2087-	CTTCCCTGCATGCTCCCTCCCAG
			2186	TGACTTTCCTTCCCTTTCACATG
				AGGATCTGCCGTTCATGTTGCTT
				TCTCCTTTGTCCTCTTGGACTTG
				AGGGCATT

159	DOCK5	NM_024940.6	7201-	AAAGAGATTTCCATTTCTGCTGC
			7300	CAGAGCTGGTATTTGCCTGCCTG
				ATTCTCTGTGTTTCCTGTTTCAC
				CGCCACCCTTTCAGGAGAGAACT
				ACACCAGT

160	DPF2	NM_006268.4	2249-	TCTCAGCTCATGGGGAAGCCACA
			2348	TAGACATCCCTTTCTTCCCTTGC
				ACGCTCGCTAGCAGCTGGTAAGG
				TCTTCACACCCTGATTCCTCAAG
				TTTTCTGC

161	DYNC2L	NM_016008.3	351-	TTTGGGAACTCGGTGGAGGAACC
	I1		450	TCTTTATTGGACTTAATCAGCAT
				ACCCATCACAGGTGACACCTTAC
				GGACGTTTTCTCTTGTTCTCGTT
				CTGGATCT

162	DZIP3	NM_014648.3	4323-	CCCAGTGTCTTGCCCAGTAGATA
			4422	CAAGATAAATATTGCCAGAATCA
				GATATCAGGAAGTAGTAAGAAA
				AGGAGTTAATATGCAAACTAAAT
				CACTCGCTC

163	EEF1B2	NM_00103766	699-	GGATACGGAATTAAGAAACTTC
		3.1	798	AAATACAGTGTGTAGTTGAAGAT
				GATAAAGTTGGAACAGATATGCT
				GGAGGAGCAGATCACTGCTTTTG
				AGGACTATG

164	EGLN1	NM_022051.1	3976-	AGCAGCATGGACGACCTGATAC
			4075	GCCACTGTAACGGGAAGCTGGG
				CAGCTACAAAATCAATGGCCGG
				ACGAAAGCCATGGT

165	EGR1	NM_001964.2	1506-	GAGGCATACCAAGATCCACTTGC
			1605	GGCAGAAGGACAAGAAAGCAGA
				CAAAAGTGTTGTGGCCTCTTCGG
				CCACCTCCTCTCTCTCTTCCTAC
				CCGTCCCCG

166	EHD4	NM_139265.3	2605-	TCAAACATTAAATATCCCGAGGT
			2704	CTCCTTGGTGGGTGGCAGGATTT
				AAATTCAATCAAATCCTGTCCTA
				GTGTGTGCAGTGTCTTCGGCCCT
				GTGGACAC

167	EID2B	NM_152361.2	628-	GCCAGTTTAGTTAACTCAGTCAT
			727	TAGGGGGAATGCAAACTGGAAG
				GGAATACGGCAATGTGCAATTG
				AAGGAGGAAGCACACTCCGAAA
				TGGAAACAGAC

168	EIF2B4	NM_015636.3	1497-	GTCTCTAATGAGCTAGATGACCC
			1596	TGATGATCTGCAATGTAAGCGGG
				GAGAACATGTTGCGCTGGCTAAC
				TGGCAGAACCACGCATCCCTACG
				GTTGTTGA

169	EIF4EN	NM_019843.2	3051-	CACACTGGGCAGGACCCTGCTTC
	IF1		3150	ATCTCGGGTTGGTTTATGGGCTT
				TTACTTTGGAGCACTCTGTGTGA
				AGCTGTTTGGTGGAACCCATGCA
				TCTGGTGT

170	EMR4	NM_00108049	1719-	GGGAAGACGATTGGATCAATCA
		8.2	1818	TTGCATACTCATTCACCATCATC
				AACACCCTTCAGGGAGTGTTGCT
				CTTTGTGGTACACTGTCTCCTTA
				ATCGCCAGG

171	EP300	NM_001429.2	716-	CCAGCCAGGCCCAACAGAGCAG
			815	TCCTGGATTAGGTTTGATAAATA
				GCATGGTCAAAAGCCCAATGAC
				ACAGGCAGGCTTGACTTCTCCCA
				ACATGGGGAT

172	EPHX2	NM_001979.5	1909-	CATCCTTCCACCTGCTGGGGCAC
			2008	CATTCTTAGTATACAGAGGTGGC
				CTTACACACATCTTGCATGGATG
				GCAGCATTGTTCTGAAGGGGTTT
				GCAGAAAA

173	ERLIN1	NM_006459.3	3197-	TGATGGCCCTGGAGGCGGGGCT
			3296	GAGGAACAGGGAAATGCCGCTG
				TGAAGTCTTAAAGCACTTCTGCT
				TAAACTCCCATGTGTGAGGAGTG
				TGCCTCCCTG

174	ETFDH	NM_004453.3	1904-	TGACCTCTTGTCATCTGTGGCTC
			2003	TGAGTGGTACTAATCATGAACAT
				GACCAGCCGGCACACTTAACCTT
				AAGGGATGACAGTATACCTGTAA
				ATAGAAAT

175	EVI2A	NM_014210.3	1410-	GAGAGAGCTAAACTGTGTAATTT
			1509	AATGGTATCTTCCTTGCTGGATG
				TGGCAGAATCCACACCAGCTTAT
				CAACCAACACAGCTAATTTTAGA
				ATAGATCC

176	EWSR1	NM_005243.3	2248-	AAAAATGGATAAAGGCGAGCAC
			2347	CGTCAGGAGCGCAGAGATCGGC
				CCTACTAGATGCAGAGACCCCGC
				AGAGCTGCATTGACTACCAGATT
				TATTTTTTAA

177	EYA3	NM_001990.3	1551-	GATTCCTGGTTAGGAACTGCATT
			1650	AAAGTCCTTACTTCTCATCCAGT
				CCAGAAAGAATTGTGTGAATGTT
				CTGATCACTACCACCCAGCTGGT
				TCCAGCCC

178	C5orf	NM_032042.5	4058-	TTAGAACAAGTAGAATGGGAAA
	21		4157	GGAGTGACTGATAAATCTAAGAT
				TCAAAATAGTCCCGTCGAAACTT
				AAAGGCCAGATTATTGCTTTGGA
				GCTTTCTAT

179	FAM	NM_199280.2	3306-	ACTCTTAGACTCAGAGTCCTTGG
	179A		3405	GAGGCAGCCGCAAGGCCACTGA
				CAGAGGGGTGGCCCCTGACAGC
				AAGACAACTGGCAGCTCATACCC
				TTTTCAGCTG

180	FAM	NM_003704.3	4523-	CCCTGACTTGTAGCCAGCTTGTG
	193A		4622	TAAGATCCCTTGCAGAACGAGA
				AAGTTAAAAACAAGCCCACCCA
				GTACTCACACCATCAAGTCTGTT
				ATAGAGTGTA

181	FAM43A	NM_153690.4	2741-	AGACCCCTGAAATGTTGCCAAAT
			2840	TCTTCAAATAACTGTTTGGGGGG
				TGGGGGGAGATGAAAGAGAGTC
				GCGTTTTGTTTACAGTTAAAGAC
				ATCCAATAT

182	FAM50B	NM_012135.1	1273-	TTCTGAGTATTTTAGTGTTGCCA
			1372	CCTGGATTTGCTGCATTGCTCTG
				CTGAGCTGTATTGAAACCATGAC
				TGGGCCCACTGTCAGACAGAAAT
				TAGAATAG

183	FAIM3	NM_005449.4	1689-	CAGGCTCTAGATCACATGGCATC
			1788	AGGCTGGGGCAGAGGCATAGCT
				ATTGTCTCGGGCATCCTTCCCAG
				GGTTGGGTCTTACACAAATAGAA
				GGCTCTTGC

184	FKBP1A	NM_054014.3	301-	AGAAACAAGCCCTTTAAGTTTAT
			400	GCTAGGCAAGCAGGAGGTGATC
				CGAGGCTGGGAAGAAGGGGTTG
				CCCAGATGAGTGTGGGTCAGAG
				AGCCAAACTGA

185	FLNB	NM_001457.3	9148-	CAGACCTGAGCTGGCTTTGGAAT
			9247	GAGGTTAAAGTGTCAGGGACGTT
				GCCTGAGCCCAAATGTGTAGTGT
				GGTCTGGGCAGGCAGACCTTTAG
				GTTTTGCT

186	FNBP1	NM_015033.2	5237-	TGTGTGTTGCACTAATTCTAAAC
			5336	TTTGGAGGCATTTTGCTGTGTGA
				GGCCGATCGCCACTGTAAAGGTC
				CTAGAGTTGCCTGTTTGTCTCTG
				GAGATGGA

187	FOXK2	NM_004514.3	4387-	TTTTTTGCCGTAGGCACCATTCT
			4486	GCATCTTGAACCCAGACTGAAGT
				GTGCCTCTCACAGATGGAAGGTG
				CACACGCTCCTGTCTCCTCCTCA
				CTCTGCCA

188	FRAT2	NM_012083.2	1769-	CTTGTCCTCCCAGCTGAGCTTTC
			1868	TTATTCCACCCTTTCTGGTGTCT
				ATAGGAATGCATGAGAGACCCTG
				GACGTTTTTCTGCTCTCTTCTGG
				CCCTCCAT

189	FTHL16	XR_041433.1	255-	GGACTCAGAGGCCGCCATCAAC
			354	CGCCAGATCAACCTAGAGCTCTG
				TGCCTCCTACGTTTACCTGTCCA
				TGTCTTACTGCTTTGACCGTGAT
				GATGTGGCT

190	GATA2	NM_00114566	2573-	GTCCAGTTGATTGTACGTAGCCA
		2.1	2672	CAGGAGCCCTGCTATGAAAGGA
				ATAAAACCTACACACAAGGTTG
				GAGCTTTGCAATTCTTTTTGGAA
				AAGAGCTGGG

191	GLIS3	NM_00104241	548-	ACTCGCGCTGGCCGGCCGGGGG
		3.1	647	AAGGGACCCGCACGCCGGGCTTT
				GTTGTGGAAATCCCGGTTACCTG
				GCTTATAACCCACACCATGGATA
				ACTTATTGG

192	GLRX	ILMN_173730	119-	AAAGCATAGTTGGTCTTGGTGTC
		8.1	218	ATATGGATCAGAGGCACAAGTG
				CAGAGGCTGTGGTCATGCGGAA
				CACTCTGTTATTTAAGATGGCTA
				TCCAGATAAT

193	GNL3	NM_014366.4	1733-	TACAGCAGGTGAACAGTCTACA
			1832	AGGTCTTTTATCTTGGATAAAAT
				CATTGAAGAGGATGATGCTTATG
				ACTTCAGTACAGATTATGTGTAA
				CAGAACAAT

194	GNS	NM_002076.3	4988-	CCTGTGTTTGCATCCTCTGTTCC
			5087	TATTCTGCCCTTGCTCTGTGTCA
				TCTCAGTCATTTGACTTAGAAAG
				TGCCCTTCAAAAGGACCCTGTTC
				ACTGCTGC

195	GOLGA3	NM_005895.3	8961-	CTCACTGACCGGAAGGTCCAGGT
			9060	GAATCTCGTCATAAGTGATCTCA
				GGCTCTCACAGGATCCGGAGGG
				AAATGTGTTAGAGGGTCTGGAA
				AATTCAGTGC

196	GPAT	NM_022078.2	1686-	AGTCTGGGAGCAGCAGTCTTCGT
	CH3		1785	GGCTGGTTCAGGGTGTTTTGTTC
				CGAGCCTGCCTGCCTGCCGGTTC
				TATACCTCAGGGGCATTTTTACA
				AAAAGCCC

197	GPI	NM_000175.2	1696-	CAGTGCTCAAGTGACCTCTCACG
			1795	ACGCTTCTACCAATGGGCTCATC
				AACTTCATCAAGCAGCAGCGCG
				AGGCCAGAGTCCAATAAACTCGT
				GCTCATCTG

198	GPR65	NM_003608.3	1899-	TATGATTTTTCTCACTCTTTCTT
			1998	TGGACTCCAGGGTGTCAGCCATC
				AGGTCTCCTAATTTTGTGTACCG
				GTCTCCAACAACCCCAGCTACTG
				AATACTGC

199	GSTO1	NM_004832.2	897-	AGAGCTCTACTTACAGAACAGCC
			996	CTGAGGCCTGTGACTATGGGCTC
				TGAAGGGGGCAGGAGTCAGCAA
				TAAAGCTATGTCTGATATTTTCC
				TTCACTAAT

200	GUSB	NM_000181.3	2032-	GGTATCCCCACTCAGTAGCCAAG
			2131	TCACAATGTTTGGAAAACAGCCT
				GTTTACTTGAGCAAGACTGATAC
				CACCTGCGTGTCCCTTCCTCCCC
				GAGTCAGG

201	GZMA	NM_006144.3	636-	GCCTCCGAGGTGGAAGAGACTC
			735	GTGCAATGGAGATTCTGGAAGCC
				CTTTGTTGTGCGAGGGTGTTTTC
				CGAGGGGTCACTTCCTTTGGCCT
				TGAAAATAA

202	GZMB	NM_004131.3	541-	ACACTACAAGAGGTGAAGATGA
			640	CAGTGCAGGAAGATCGAAAGTG
				CGAATCTGACTTACGCCATTATT
				ACGACAGTACCATTGAGTTGTGC
				GTGGGGGACC

203	GZMH	NM_033423.4	718-	GGCCCCTCGTGTGTAAGGACGTA
			817	GCCCAAGGTATTCTCTCCTATGG
				AAACAAAAAAGGGACACCTCCA
				GGAGTCTACATCAAGGTCTCACA
				CTTCCTGCC

204	HAT1	NM_003642.3	1235-	AACCAAATAGAAATAAGCATGC
			1334	AACATGAACAGCTGGAAGAGAG
				TTTTCAGGAACTAGTGGAAGATT
				ACCGGCGTGTTATTGAACGACTT
				GCTCAAGAGT

205	HAVCR2	NM_032782.3	956-	TATATGAAGTGGAGGAGCCCAA
			1055	TGAGTATTATTGCTATGTCAGCA
				GCAGGCAGCAACCCTCACAACCT
				TTGGGTTGTCGCTTTGCAATGCC
				ATAGATCCA

206	HDAC3	NM_003883.3	1765-	AAGATGAAGAGAGAGAGATTTG
			1864	GAAGGGGCTCTGGCTCCCTAACA
				CCTGAATCCCAGATGATGGGAA
				GTATGTTTTCAAGTGTGGGGAGG
				ATATGAAAAT

207	HERC1	NM_003922.3	14664-	CAATCGACATGGACAACTACATG
			14763	CTCTCGAGAAACGTGGACAACG
				CCGAGGGCTCCGACACTGACTAC
				TGACCGTGCGGGTGCTCTCACCC
				TCCCTTCTC

208	HERC3	NM_014606.2	3796-	TAAGAATGATTTAGACTGACCTG
			3895	TCCTTTTTTATCTGCGCATGCGA
				GAACATCACCTTCCTCTGTACAC
				TTGGAAATGCCTCTGGCTTGTTG
				CAGCCCTC

209	HK3	NM_002115.2	2785-	AGTCAGAGGATGGGTCCGGCAA
			2884	AGGTGCGGCCCTGGTCACCGCTG
				TTGCCTGCCGCCTTGCGCAGTTG
				ACTCGTGTCTGAGGAAACCTCCA
				GGCTGAGGA

210	HLA-B	NM_005514.6	938-	CCCTGAGATGGGAGCCGTCTTCC
			1037	CAGTCCACCGTCCCCATCGTGGG
				CATTGTTGCTGGCCTGGCTGTCC
				TAGCAGTTGTGGTCATCGGAGCT
				GTGGTCGC

211	HLA-	NM_002118.3	21-	CCCGTGAGCTGGAAGGAACAGA
	DMB		120	TTTAATATCTAGGGGCTGGGTAT
				CCCCACATCACTCATTTGGGGGG
				TCAAGGGACCCGGGCAATATAG
				TATTCTGCTC

212	HLA-G	NM_002127.4	1181-	AAGAGCTCAGATTGAAAAGGAG
			1280	GGAGCTACTCTCAGGCTGCAATG
				TGAAACAGCTGCCCTGTGTGGGA
				CTGAGTGGCAAGTCCCTTTGTGA
				CTTCAAGAA

213	HMGB1	NM_002128.4	209-	TATGCATTTTTTGTGCAAACTTG
			308	TCGGGAGGAGCATAAGAAGAAGC
				ACCCAGATGCTTCAGTCAACTTC
				TCAGAGTTTTCTAAGAAGTGCTC
				AGAGAGGT

214	HMGB2	NM_002129.3	670-	TGCTGCATATCGTGCCAAGGGCA
			769	AAAGTGAAGCAGGAAAGAAGGG
				CCCTGGCAGGCCAACAGGCTCA
				AAGAAGAAGAACGAACCAGAAG
				ATGAGGAGGAG

215	HNRNP	NM_004499.3	1246-	CCCCATGGAAATCACTCTCCTGT
	AB		1345	TGACTATTTCCAGAGCTCTAGGT
				GTTTAGGCAGCGTGTGGTGTCTG
				AGAGGCCATAGCGCCATCATGG
				GCTGATTTT

216	HNRNPK	NM_031263.2	538-	TCCCTACCTTGGAAGAGGGCCTG
			637	CAGTTGCCATCACCCACTGCAAC
				CAGCCAGCTCCCGCTCGAATCTG
				ATGCTGTGGAATGCTTAAATTAC
				CAACACTA

217	HOOK3	NM_032410.3	2391-	GCAAGGTAGAGAAGTTGTGCCG
			2490	CTCAATCACAGACACCTGCACCC
				ACAACATACTTCTGTTACACACA
				AGAACATTTCAGGAAACTCAGCC
				AGCTTATTT

218	HOPX	NM_139211.4	590-	AACAATAGGAAGCTATGTGTATC
			689	TTCTGTGTAAAGCAGTGGCTTCA
				CTGGAAAAATGGTGTGGCTAGC
				ATTTCCCTTTGAGTCATGATGAC
				AGATGGTGT

219	HPSE	NM_006665.5	3920-	GAGGTTCCTATAATTGTCTCTGA
			4019	GTAACCCTTTGGAATGGAGAGG
				GTGTTGGTCAGTCTACAAACTGA
				ACACTGCAGTTCTGCGCTTTTTA
				CCAGTGAAA

220	HSCB	NM_172002.3	343-	TCCACCCAGATTTCTTCAGCCAG
			442	AGGTCTCAGACTGAAAAGGACTT
				CTCAGAGAAGCATTCGACCCTGG
				TGAATGATGCCTATAAGACCCTC
				CTGGCCCC

221	HSD11B	NM_181755.1	156-	GCCTACTACTACTATTCTGCAAA
	1		255	CGAGGAATTCAGACCAGAGATG
				CTCCAAGGAAAGAAAGTGATTG
				TCACAGGGGCCAGCAAAGGGAT
				CGGAAGAGAGA

222	HSP90A	NM_007355.3	1531-	GGCATTCTCTAAAAATCTCAAGC
	B1		1630	TTGGAATCCACGAAGACTCCACT
				AACCGCCGCCGCCTGTCTGAGCT
				GCTGCGCTATCATACCTCCCAGT
				CTGGAGAT

223	HSPA6	NM_002155.4	1990-	GTGGCACTCAAGCCCGCCAGGG
			2089	GGACCCCAGCACCGGCCCCATCA
				TTGAGGAGGTTGATTGAATGGCC
				CTTCGTGATAAGTCAGCTGTGAC
				TGTCAGGGC

224	HUWE1	NM_031407.6	13637-	CCACCAACTCACCGTGTGTGTCC
			13736	CAGCTGCCCCATCTTCCCCAGCG
				CATACCTGTTCCTCTTCTCATTC
				TCTCCCCGCCGCCTGTTTCCTCA
				CCTTCTCT

225	HVCN1	NM_032369.3	747-	TGTTCCAGGAGCACCAGTTTGAG
			846	GCTCTGGGCCTGCTGATTCTGCT
				CCGGCTGTGGCGGGTGGCCCGG
				ATCATCAATGGGATTATCATCTC
				AGTTAAGAC

226	IDO1	NM_002164.3	51-	CTATTATAAGATGCTCTGAAAAC
			150	TCTTCAGACACTGAGGGGCACCA
				GAGGAGCAGACTACAAGAATGG
				CACACGCTATGGAAAACTCCTGG
				ACAATCAGT

227	IDS	NM_006123.4	1016-	TGGATGGACATCAGGCAACGGG
			1115	AAGACGTCCAAGCCTTAAACATC
				AGTGTGCCGTATGGTCCAATTCC
				TGTGGACTTTCAGCGGAAAATCC
				GCCAGAGCT

228	IER5	NM_016545.4	1802-	ACTTTACACCTACCCCTCACCGG
			1901	AAAGCTAGACCCGCTTCAGGGCC
				AGGAGTGGCGTTTCCGCACAGG
				ATTTCCTAAGACGAGAGGGATTT
				AGCCAAGAG

229	IFI27	NM_032036.2	305-	GTCAGTGTTGGGGGCCTGCTTGG
	L2		404	GGAATTCACCTTCTTCTTCTCTC
				CCAGCTGAACCCGAGGCTAAAGA
				AGATGAGGCAAGAGAAAATGTA
				CCCCAAGGT

230	IFNA17	NM_021268.2	292-	TGAGATGATCCAGCAGACCTTCA
			391	ATCTCTTCAGCACAGAGGACTCA
				TCTGCTGCTTGGGAACAGAGCCT
				CCTAGAAAAATTTTCCACTGAAC
				TTTACCAG

231	IFNAR1	NM_000629.2	3124-	CTAATCAGCTCTCAGTGATCAAC
			3223	CCACTCTTGTTATGGGTGGTCTC
				TGTCACTTTGAATGCCAGGCTGG
				CTTCTCGTCTAGCAGTATTCAGA
				TACCCCTT

232	IFNAR2	NM_000874.3	632-	AAATACCACAAGATCATTTTGTG
			731	ACCTCACAGATGAGTGGAGAAG
				CACACACGAGGCCTATGTCACCG
				TCCTAGAAGGATTCAGCGGGAA
				CACAACGTTG

233	IFNGR1	NM_000416.1	1141-	CCCGGGCAGCCATCTGACTCCAA
			1240	TAGAGAGAGAGAGTTCTTCACCT
				TTAAGTAGTAACCAGTCTGAACC
				TGGCAGCATCGCTTTAAACTCGT
				ATCACTCC

234	IGFBP7	NM_001553.2	584-	ATCGGAATCCCGACACCTGTCCT
			683	CATCTGGAACAAGGTAAAAAGG
				GGTCACTATGGAGTTCAAAGGAC
				AGAACTCCTGCCTGGTGACCGGG
				ACAACCTGG

235	IL16	NM_004513.4	1263-	GGCATCTCCAACATCATCATCCA
			1362	ACGAAGACTCAGCTGCAAATGG
				TTCTGCTGAAACATCTGCCTTGG
				ACACAGGGTTCTCGCTCAACCTT
				TCAGAGCTG

236	IL1B	NM_000576.2	841-	GGGACCAAAGGCGGCCAGGATA
			940	TAACTGACTTCACCATGCAATTT
				GTGTCTTCCTAAAGAGAGCTGTA
				CCCAGAGAGTCCTGTGCTGAATG
				TGGACTCAA

237	IL1R2	NM_173343.1	114-	TGCTTCTGCCACGTGCTGCTGGG
			213	TCTCAGTCCTCCACTTCCCGTGT
				CCTCTGGAAGTTGTCAGGAGCAA
				TGTTGCGCTTGTACGTGTTGGTA
				ATGGGAGT

238	IL4	NM_000589.2	626-	GACACTCGCTGCCTGGGTGCGAC
			725	TGCACAGCAGTTCCACAGGCACA
				AGCAGCTGATCCGATTCCTGAAA
				CGGCTCGACAGGAACCTCTGGG
				GCCTGGCGG

239	IL7	NM_000880.2	39-	AATAACCCAGCTTGCGTCCTGCA
			138	CACTTGTGGCTTCCGTGCACACA
				TTAACAACTCATGGTTCTAGCTC
				CCAGTCGCCAAGCGTTGCCAAGG
				CGTTGAGA

240	INTS4	NM_033547.3	652-	CCCACGTGTCAGAACAGCAGCTA
			751	TAAAAGCCATGTTGCAGCTCCAT
				GAAAGAGGACTGAAATTACACC
				AAACAATTTATAATCAGGCCTGT
				AAATTACTC

241	IRAK2	NM_001570.3	1286-	GTGTTGGCCGAGGTCCTCACGGG
			1385	CATCCCTGCAATGGATAACAACC
				GAAGCCCGGTTTACCTGAAGGAC
				TTACTCCTCAGTGATATTCCAAG
				CAGCACCG

242	IRF1	NM_002198.1	511-	CTGTGCGAGTGTACCGGATGCTT
			610	CCACCTCTCACCAAGAACCAGAG
				AAAAGAAAGAAAGTCGAAGTCC
				AGCCGAGATGCTAAGAGCAAGG
				CCAAGAGGAA

243	IRF4	NM_002460.1	326-	GGGCACTGTTTAAAGGAAAGTTC
			425	CGAGAAGGCATCGACAAGCCGG
				ACCCTCCCACCTGGAAGACGCGC
				CTGCGGTGCGCTTTGAACAAGAG
				CAATGACTT

244	KIAA	NM_014761.3	2187-	ATGGATGGGACTCTTATGTCATA
	0174		2286	ACTTCTGTTACTCCTTTGGCCCA
				TAGCTAAGGTCATCCTTCCCCAC
				AGGGGTGGCTTTGGGATTGGATG
				ATACAGCT

245	ITCH	NM_00125713	439-	GAGGTGACAAAGAGCCAACAGA
		8.1	538	GACAATAGGAGACTTGTCAATTT
				GTCTTGATGGGCTACAGTTAGAG
				TCTGAAGTTGTTACCAATGGTGA
				AACTACATG

246	ITFG2	NM_018463.3	1985-	GTCTGGTCTTACCCATGTTCCTA
			2084	GCAACCCTGAGATGATTTTCTTC
				CATTTACCAAAGCAGCCGGGTCA
				GTGCTTTCTCACGTTGCCGTATT
				CTTCAGGT

247	ITGAE	NM_002208.4	3406-	CTGAATGCAGAGAACCACAGAA
			3505	CTAAGATCACTGTCGTCTTCCTG
				AAAGATGAGAAGTACCATTCTTT
				GCCTATCATCATTAAAGGCAGCG
				TTGGTGGAC

248	ITGAL	NM_002209.2	3906-	GTGAGGGCTTGTCATTACCAGAC
			4005	GGTTCACCAGCCTCTCTTGGTTT
				CCTTCCTTGGAAGAGAATGTCTG
				ATCTAAATGTGGAGAAACTGTAG
				TCTCAGGA

249	JAK1	NM_002227.1	286-	GAGAACACCAAGCTCTGGTATGC
			385	TCCAAATCGCACCATCACCGTTG
				ATGACAAGATGTCCCTCCGGCTC
				CACTACCGGATGAGGTTCTATTT
				CACCAATT

250	KIAA	NM_015443.3	4402-	CCTTCACATCCAGATCCCTGTCG
	1267		4501	GTGTTAGTTCCACTCTTGGTCTT
				TCACGCTCCCCTTGCCTGTGGAA
				CATTGTCTGGTCCTAGCTGTGGT
				TCCCATTG

251	MYST4	NM_012330.3	6541-	CCCAGACTGTAGCCATGCAGGGT
			6640	CCTGCACGGACTTTAACGATGCA
				AAGAGGCATGAACATGAGTGTG
				AACCTGATGCCAGCGCCAGCCTA
				CAATGTCAA

252	KCTD12	NM_138444.3	4208-	ACAAGTAAAATAACTTGACATG
			4307	AGCACCTTTAGATCCCTTCCCCT
				CCATGGGCTTTGGGCCACAGAAT
				GAACCTTTGAGGCCTGTAAAGTG
				GATTGTAAT

253	KIAA	NM_014736.4	236-	CGACATCAGTTTCATCGAGGAAA
	0101		335	GCTGAAAATAAATATGCAGGAG
				GGAACCCCGTTTGCGTGCGCCCA
				ACTCCCAAGTGGCAAAAAGGAA
				TTGGAGAATT

254	SETD1B	XM_037523.11	7779-	ATCGTGCCCAGTGTTAACCTCGG
			7878	CTGGCCTTCACTAAGGGGACTAG
				ACCTCCCTCTCCCCAGGAGCCCC
				AGCCCCAGAGTGGTTTGCAATAA
				TCAAGATA

255	KIR2DL	XM_00112635	265-	GAGGTGACATATGCACAGTTGG
	5A	4.1	364	ATCACTGCGTTTTCACACAGACA
				AAAATCACTTCCCCTTCTCAGAG
				GCCCAAGACACCTCCAACAGAT
				ACCACCATGT

256	KIR_Activating_Subgroup_2	NM_014512.1	719-	TCCGAAACCGGTAACCCCAGAC
			818	ACCTACATGTTCTGATTGGGACC
				TCAGTGGTCAAAATCCCTTTCAC
				CATCCTCCTCTTCTTTCTCCTTC
				ATCGCTGGT

257	KIR2D	NM_012313.1	1-	CCGGCAGCACCATGTCGCTCATG
	S3		100	GTCATCAGCATGGCATGTGTTGG
				GTTCTTCTGGCTGCAGGGGGCCT
				GGCCACATGAGGGATTCCGCAG
				AAAACCTTC

258	KLRB1	NM_002258.2	357-	CAGCAACTCCGAGAGAAATGCTT
			456	GTTATTTTCTCACACTGTCAACC
				CTTGGAATAACAGTCTAGCTGAT
				TGTTCCACCAAAGAATCCAGCCT
				GCTGCTTA

259	KLRC1	NM_002259.3	336-	ACCTATCACTGCAAAGATTTACC
			435	ATCAGCTCCAGAGAAGCTCATTG
				TTGGGATCCTGGGAATTATCTGT
				CTTATCTTAATGGCCTCTGTGGT
				AACGATAG

260	KLRC2	NM_002260.3	943-	TATGTGAGTCAGCTTATAGGAAG
			1042	TACCAAGAACAGTCAAACCCAT
				GGAGACAGAAAGTAGAATAGTG
				GTTGCCAATGTCTCAGGGAGGTT
				GAAATAGGAG

261	KLRD1	NM_002262.3	597-	CAATTTTACTGGATTGGACTCTC
			696	TTACAGTGAGGAGCACACCGCCT
				GGTTGTGGGAGAATGGCTCTGCA
				CTCTCCCAGTATCTATTTCCATC
				ATTTGAAA

262	KLRF1	NM_016523.1	544-	TATACAGAAAAACCTAAGACAA
			643	TTAAACTACGTATGGATTGGGCT
				TAACTTTACCTCCTTGAAAATGA
				CATGGACTTGGGTGGATGGTTCT
				CCAATAGAT

263	KLRF1b	NM_016523.2	849-	AAGTGCAATTAAATGCCAAAATC
			948	TCTTCTCCCTTCTCCCTCCATCA
				TCGACACTGGTCTAGCCTCAGAG
				TAACCCCTGTTAACAAACTAAAA
				TGTACACT

264	KRTAP	NM_198696.2	213-	CTGCTGCCAGGCGGCCTGTGAGC
	10-3		312	CCAGCCCCTGCCAGTCAGGCTGC
				ACCAGCTCCTGCACGCCCTCGTG
				CTGCCAGCAGTCTAGCTGCCAGC
				CAGCTTGC

265	KYNU	NM_00103299	936-	TTGCCTGCTGGTGTTCCTACAAG
		8.1	1035	TATTTAAATGCAGGAGCAGGAG
				GAATTGCTGGTGCCTTCATTCAT
				GAAAAGCATGCCCATACGATTA
				AACCTGCGAG

266	LAMA5	NM_005560.4	11163-	CCAACCCCGGCCCCTGGTCAGGC
			11262	CCCTGCAGCTGCCTCACACCGCC
				CCTTGTGCTCGCCTCATAGGTGT
				CTATTTGGACTCTAAGCTCTACG
				GGTGACAG

267	LDHA	NM_00116541	1348-	ATCTTGTGTAGTCTTCAACTGGT
		6.1	1447	TAGTGTGAAATAGTTCTGCCACC
				TCTGACGCACCACTGCCAATGCT
				GTACGTACTGCATTTGCCCCTTG
				AGCCAGGT

268	LEF1	NM_016269.4	3136-	AACACATAGTGGCTTCTCCGCCC
			3235	TTGTAAGGTGTTCAGTAGAGCTA
				AATAAATGTAATAGCCAAACCC
				ACTCTGTTGGTAGCAATTGGCAG
				CCCTATTTC

269	LETM2	NM_144652.3	1331-	AAAGGACCCATCACTTCTTCTGA
			1430	AGAACCTACACTCCAGGCCAAAT
				CACAAATGACGGCCCAGAACAG
				CAAGGCTAGTTCAAAAGGAGCA
				TAAAGGACTA

270	LIF	NM_002309.3	1241-	GGGATGGAAGGCTGTCTTCTTTT
			1340	GAGGATGATCAGAGAACTTGGG
				CATAGGAACAATCTGGCAGAAG
				TTTCCAGAAGGAGGTCACTTGGC
				ATTCAGGCTC

271	LILRA5	NM_021250.3	1044-	TTGAATGCTGGAGCCTTGGAAGC
			1143	GAATCTGATGGTCCTAGGAGGTT
				CGGGAAGACCATCTGAGGCCTAT
				GCCATCTGGACTGTCTGCTGGCA
				ATTTCTTT

272	LILRA5	NM_181879.2	546-	CACCCTCTCAGCCCTGCCCAGTC
	b		645	CTGTGGTGACCTCAGGAGAGAA
				CGTGACCCTCCAGTGTGGCTCAC
				GGCTGAGATTCGACAGGTTCATT
				CTGACTGAG

273	LOC	NR_002809.2	471-	GCGGCAGCCAATCAGCGCGCGG
	338799		570	CTTCTATAGGGCTTGAGTTATTA
				GACGCTGATCTCAAAACATCCTT
				CATCAGACACGAAGGAGAGGCC
				AACAGATGAG

274	LOC100	XM_00171659	568-	AGGGTCATGCAGCTACTGAGGTC
	129022	1.1	667	ACAGCCTGGATTCATACACAGGT
				CTGACTCCTGAGCACTTAGCCAG
				GTGGCTGTAACAGTGTTCCCAGA
				AACACAGG

275	LOC100	XM_00173282	1148-	ACCTGTCTTCCGGGTCTGTTCAC
	129697	2.2	1247	CCGTCCCCTGGACTGGCACCAGC
				ACAGAGGGTCGAGTGTTGGCAC
				CTGTCTTCTGGGTCTCCATCCCT
				CCCTTTGTT

276	LOC100	XM_00171715	1469-	GAGAATGTCTGCGCGGAGACAG
	130229	8.1	1568	CATAGCTCTGTAGAAATGAGTGG
				CAGCGTATGTAACCTGGCATTTT
				GAACCCAGGAGCACAATTTTATT
				AAAGGAAAA

277	LOC100	XR_036994.1	15-	GAGTAGTAGGTGGACAGCCGTC
	132797		114	CCACACAAGGGTTTGTATCTGGG
				CTACACAGATTCCCTTCAGAAAA
				GCACCAATGTAAGCAACTCCCTT
				ACAGTTGCT

278	LOC100	XR_039238.1	342-	GAGATAGCTTCCTGAAATGTGTG
	133273		441	AAGGAAAATGATCAGAAAAAGA
				AAGAAGCCAAAGAGAAAGGTAC
				CTGGGTTCAACTAAAGTGCCAGC
				CTGCTCCACC

279	LOC	NM_144692.1	3367-	GCTCTGTCCTTTGCCGCTCAGAC
	148137		3466	CAAAAACCTTAGAGCTGTCTTTG
				ACTTCTGTCTTTCCCTTCCACCC
				ACAGTTAACCAGGAAATCCTGCC
				ATCTCCGC

280	LOC	NR_024275.1	5062-	GGTTACAGCCATTTTGTGTGATT
	151162		5161	CACTTCGGGGGTTAAGTAATGCA
				GGATTCTGCAAACAAGGTGTCGC
				CGTCCAAATGTACTGTCCTGGCA
				TAGAGAGC

281	C1orf	NM_00100380	2561-	ACATGGCGCCACGGCCACTTCCT
	222	8.1	2660	GCTGCCCTGGACCCCGCAAGCCC
				AGGGACATCCAAGAGCACCCCT
				CCTGAGACCCCAGACTCAGAAG
				CAGCGAGAAG

282	LOC	XM_934917.1	376-	CCCCTGGTGGACCGCGACCTCCG
	339674		475	CAAGACGCTAATGGTGCGCGAC
				AACCTGGCCTTCGGCGGCCCGGA
				GGTCTGAGCCGACTTGCAAAGG
				GGATAGGCGG

283	LOC	XM_371757.4	210-	GCAAAGCACTATCACAAGGAAT
	648000		309	ATAGGCAAATGTACAGAACTGA
				AATTCGAGTGGCGAGGATGGCA
				AGAAAAGCTGGCAACTTCTATGT
				ACCTGCAGAAC

284	LOC	XR_017684.2	82-	AAGATTATGTCTTCCCCTGTTTC
	391126		181	CAAAGAGCTGAGACAGAAGTACA
				ATGTGCAATCCATGCCCATCCGA
				AAGGATGATGAAGTTCAGGTTGT
				ACGAGGGC

285	LOC	XM_930634.1	1448-	ATGGGACCCACTCTACTGAGGCT
	399753		1547	TTATGTAGAACTCATAGAGGAAG
				CTGGCTTTGAGGAATGAACTACC
				CTGTGCTTTTCTTAGGACTAAAA
				TCTCAGGA

286	LOC	XM_934471.1	21-	GACGGTAACCGGGACCCAGTGT
	399942		120	CTGCTCCTGTCACCTTCGCCTCC
				TAATCCCTAGCCACTATGCGAGA
				TGACTCCTTCAACACCTTCAGTG
				AGACGGGTG

287	LOC	XM_498648.3	552-	GAGTTTTCCAAACCCTGGATTTC
	440389		651	CTTCGGAGAGAGCTAGATTCTAT
				TCCATTCTTGGAATTCAGCTCCT
				TGCCCTTCTCTGTGACCCCGGAT
				CGCGAATG

288	LOC	XM_942885.1	1533-	TGTTGCAAAAGCCAACTACCACT
	440928		1632	GTCAAACTTAGCCCGTTTACAAC
				ATGGGGAAAGGCGTATTTCTTAC
				TAATATCTCAACAACGATAACAA
				TGCTGTAT

289	LOC	XR_018937.2	287-	CGGGTGCAGCGGGAAAAGGCTA
	441073		386	ATGGCACAACTGTCCACGTAGGC
				ATTCACCCCAGCAAGGTGGTTAT
				CACTAGGCTAAAACTGGACAAA
				GACTGTGAAA

290	LOC	XR_036892.1	591-	GGTGAAGAATTTGTTCTATTATG
	642812		690	AAGATACTGTCTGGGCTAAAAA
				GCTTACAGTGAGTGGAAGATAG
				CAACTTGTAGGGTTGGTGGCTGA
				ACAGGCCGAC

291	LOC	XM_927980.1	255-	CTGGCTCAAGGATGGCACGGTGT
	643319		354	TATGTGAGCTCAATAATGCACTG
				TACCCCAAGGGGCAGGTCCCAGT
				AAAGAAGATCCAGGCCTCCACC
				ATGGCCTTC

292	LOC	XR_017529.2	38-	CAGGCGCTGCAAGTTCTCCCAGG
	644315		137	AGAAAGCCATGTTCAGTTCGAGC
				GCCAAGATCGTGAAGCCCAATG
				GCGAGAAGCCGGACGAGTTCGA
				GTCCGGCCAT

293	LOC	XM_928884.1	13-	GAAGCACTGGTAAATGTCTGCTG
	645914		112	CATTAACTCACTCAGACCAAACT
				TTCTCTTATCTAGGTCCAAAAGG
				AAGCTGCTCGGCTGGAAGGAAC
				CTGGTGAGG

294	LOC	XR_018104.1	670-	AGGTGCTGCAAAATTACCAGGA
	647340		769	ATACAGTCTGGCCAACAGCATCT
				ACTACTCTCTGAAGGAGTCCACC
				ACTAGTGAGCAGAGTGCCAGGA
				TGACAGCCAT

295	LOC	XR_038906.2	1638-	TGGAGAGAAGAATGAAGAGGTG
	648927		1737	GTGGTTCTGGGTTTGATTTGAGT
				TCACCTGTGGGCAGTGGGCAGTG
				TCTTGGTGAAAGGGAGCGGATA
				CTACTTTTTG

296	LOC	XM_938755.1	38-	GCCCTTCTGCCATCAACGAGGTG
	653773		137	GTGACCCAAGAACATACCATCA
				ACATTCACAAGCGCATCCATGGA
				GAGGGCTTCAAGAAGCGTGCTCC
				TCGGGCACT

297	LOC	XR_015610.3	1861-	GTAGTTGTCCACTGCTTTCCTGG
	728533		1960	ATGGATGGGACTCTTATGTCATA
				ACTTCTATACTCCTTTGGCCCAT
				AGCTAAGGTCATCCTTCCCCACA
				GGGGTGGC

298	LOC	XM_00113319	510-	CCAAACCAAAAGAGGCAAGCAA
	728835	0.1	609	GTCTGCGCTGACCCCAGTGAGTC
				CTGGGTCCAGGAGTACGTGTATG
				ACCTGGAACTGAACTGAGCTGCT
				CAGAGACAG

299	LOC	XR_040891.2	625-	CCCTGGGTGCCCCTTAACCCGGG
	729887		724	CGGTAGCTCGTTAAGATGGCGAA
				GTGTCCGGTCCGGAACACGCGA
				AACCCCAAATCCCGCCTGCCCGA
				CCTCCTGAC

300	LOC	XM_00113427	765-	GCGCGGTTGCGGTTAGCGGGCGC
	732111	5.1	864	GGTGCCAAAGCTGCCATCCCCAG
				CTCACAGCTCCTCATATCCACCC
				TGCCCTCATCTTTATGAATTGCG
				TGTAGACC

301	LOC	XM_00113301	182-	GCCCTTCAGAGCTGCGGGAGATC
	732371	9.1	281	ATTGATGAGTGCCGGGCCCATGA
				TCCCTCTGTGCGGCCCTCTGTGG
				ATGAGCAGAAGCGCAGACTTAA
				TGATGTGTT

302	LOC	NM_00109977	2666-	ATGTTGCATTGACTAGAGGAAAG
	91431	6.1	2765	AGGCATTTGTTGATTGTGGGAAA
				TTTAGCCTGTTTGAGGAAAAATC
				AACTTTGGGGACGAGTGATCCAA
				CACTGCGA

303	P2RY5	NM_005767.5	2026-	AGATTGTTTGCACTGGCGTGTGG
			2125	TTAACTGTGATCGGAGGAAGTGC
				ACCCGCCGTTTTTGTTCAGTCTA
				CCCACTCTCAGGGTAACAATGCC
				TCAGAAGC

304	LPCAT4	NM_153613.2	1560-	CCCCACACACCTCTCGAGGCACC
			1659	TCCCAGACACCAAATGCCTCATC
				CCCAGGCAACCCCACTGCTCTGG
				CCAATGGGACTGTGCAAGCACCC
				AAGCAGAA

305	LPIN2	NM_014646.2	5620-	AGAAAAAACTTAAAAATGGGAT
			5719	GTCCTAAAATGAAAGCTGCTCAA
				AGTCACAGAACAACCGAGGGAC
				AAAGGAGATTGGATGACTGGGA
				AGCGCTGGCCC

306	C1orf	NM_018372.3	1543-	TTCCAATACCCAGCTTGCTTCCA
	103		1642	TGGCCAATCTAAGGGCAGAGAA
				GAATAAAGTGGAGAAACCATCT
				CCTTCTACCACAAATCCACATAT
				GAACCAATCC

307	LRRC47	NM_020710.2	2461-	GGGTCAGTGACGGACACTTACCT
			2560	GACAGCGGATCCACAATATTCTC
				GTGCAGTGTGTTTGGAATCCTGG
				TCTGGGCTCTCGTCGTTGGCCTT
				GTAGATCA

308	LY96	NM_015364.4	439-	AAGGGAGAGACTGTGAATACAA
			538	CAATATCATTCTCCTTCAAGGGA
				ATAAAATTTTCTAAGGGAAAATA
				CAAATGTGTTGTTGAAGCTATTT
				CTGGGAGCC

309	LYN	NM_002350.1	1286-	TCCTGAAGAGCGATGAAGGTGG
			1385	CAAAGTGCTGCTTCCAAAGCTCA
				TTGACTTTTCTGCTCAGATTGCA
				GAGGGAATGGCATACATCGAGC
				GGAAGAACTA

310	MAGEA1	NM_004988.4	477-	AGGGGCCAAGCACCTCTTGTATC
			576	CTGGAGTCCTTGTTCCGAGCAGT
				AATCACTAAGAAGGTGGCTGATT
				TGGTTGGTTTTCTGCTCCTCAAA
				TATCGAGC

311	MAGEA3	NM_005362.3	850-	ACTGTGCCCCTGAGGAGAAAATC
			949	TGGGAGGAGCTGAGTGTGTTAG
				AGGTGTTTGAGGGGAGGGAAGA
				CAGTATCTTGGGGGATCCCAAGA
				AGCTGCTCAC

312	MAP3K7	NM_145333.1	671-	GCCATATTATACTGCTGCCCACG
			770	CAATGAGTTGGTGTTTACAGTGT
				TCCCAAGGAGTGGCTTATCTTCA
				CAGCATGCAACCCAAAGCGCTA
				ATTCACAGG

313	MARCKS	NM_002356.6	1800-	GTCAAAAAGGGATATCAAATGA
			1899	AGTGATGGGGTCACAATGGGGA
				AATTGAAGTGGTGCATAACATTG
				CCAAAATAGTGTGCCACTAGAA
				ATGGTGTAAAG

314	MARCKS	NM_023009.5	1117-	TCCAAGTAGGTTTTGTTTACCCT
	L1		1216	ACTCCCCAAATCCCTGAGCCAGA
				AGTGGGGTGCTTATACTCCCAAA
				CCTTGAGTGTCCAGCCTTCCCCT
				GTTGTTTT

315	MBD1	NM_015844.2	2380-	TGGCTGCAGGCCTGACTACTGCC
			2479	CACACCAACGAGGTGATCTAGC
				AGATACATGGCAACGTGTGAACT
				GCAACAACGCCTGGTGCCCCAGC
				ACCAACCTT

316	C19orf	NM_174918.2	1062-	CATACTAGAGTATACTGCGGCGT
	59		1161	GTTTTCTGTCTACCCATGTCATG
				GTGGGGGAGATTTATCTCCGTAC
				ATGTGGGTGTCGCCATGTGTGCC
				CTGTCACT

317	MED16	NM_005481.2	2152-	TCTGAAGCCCAGCTGCCTGCCCG
			2251	TGTATACGGCCACCTCGGATACC
				CAGGACAGCATGTCCCTGCTCTT
				CCGCCTGCTCACCAAGCTCTGGA
				TCTGCTGT

318	MEN1	NM_130799.2	2222-	CCCAGCCCCTAGAAACCCAAGCT
			2321	CCTCCTCGGAACCGCTCACCTAG
				AGCCAGACCAACGTTACTCAGG
				GCTCCTCCCAGCTTGTAGGAGCT
				GAGGTTTCA

319	MERTK	NM_006343.2	666-	GAAGAGATCGTGTCTGATCCCAT
			765	CTACATCGAAGTACAAGGACTTC
				CTCACTTTACTAAGCAGCCTGAG
				AGCATGAATGTCACCAGAAACA
				CAGCCTTCA

320	MFSD1	NM_022736.2	2023-	AAGGGCTGCGTTACACAAAATA
			2122	AACAATGGCATTGTCATAGGCCT
				TCCTTTTACTAGTAGGGCATAAT
				GCTAGGGAATATGTGAAGATGTT
				TTTATGAAG

321	MID1IP	NM_021242.5	3472-	AGCTGGCATTTCGCCAGCTTGTA
	1		3571	CGTAGCTTGCCACTCAGTGAAAA
				TAATAACATTATTATGAGAAAGT
				GGACTTAACCGAAATGGAACCA
				ACTGACATT

322	MPDU1	NM_004870.3	1226-	CATTCAGCCAAGCCTCCTCCTCT
			1325	AGCAGCAATTTCCAGCTGTGTAA
				CACTATCCTGGGCAAATGTTTTA
				CCCTGTCCTCCAGCCTCCCTGCT
				TCCCTTCT

323	MRPL27	NM_148571.1	2189-	TCAAACTGGTAGCTATGCTTTGA
			2288	TGTCCTGTTGAGGCCATCGGACA
				GAGACTGGAGCCCAGGTGACAG
				GAGATGGTGATACCAGAAGTCA
				AGGGTTGGGG

324	MRPS16	NM_016065.3	1811-	ATTCAAATGTGGCTGTGATTTCT
			1910	GCATATATCATAGATGGGATCCT
				TCTGAGAATACTGGAATAGGGA
				ATTAGGACACCAAGCCAATTCAG
				CTGTGAACC

325	MS4A2	NM_000139.3	662-	TTCTCACCATTCTGGGACTTGGT
			761	AGTGCTGTGTCACTCACAATCTG
				TGGAGCTGGGGAAGAACTCAAA
				GGAAACAAGGTTCCAGAGGATC
				GTGTTTATGA

326	MS4A6A	NM_022349.3	1290-	CTGGGAAGTTAAATGACTGGCCT
			1389	GGCATTATGCTATGAGTTTGTGC
				CTTTGCTGAGGACACTAGAACCT
				GGCTTGCCTCCCTTATAAGCAGA
				AACAATTT

327	MS4A6A	NM_152851.2	880-	CTGCGGTGGAAACAGGCTTACTC
	b		979	TGACTTCCCTGGGAGTGTACTTT
				TCCTGCCTCACAGTTACATTGGT
				AATTCTGGCATGTCCTCAAAAAT
				GACTCATG

328	MTCH1	NM_014341.2	2081-	TCCTCCTCATCTAATGCTCATCT
			2180	GTTTAATGGTGATGCCTCGCGTA
				CAGGATCTGGTTACCTGTGCAGT
				TGTGAATACCCAGAGGTTGGGCA
				GATCAGTG

329	MYADM	NM_00102082	2656-	TCTTTTTCCTGGCCATGAGGACA
		0.1	2755	AAAATTACTGAGTGGCCCTTAAA
				GAGGGAAGTTTGTTTTCAGCTGT
				TCTCTTTTGCCCGTAGGTGGGAG
				GGTGGGGA

330	MYADM	NM_00102082	2789-	TGAATGTGTAGTGCACACGCACG
	b	0.1	2888	GGTGTTTCTGTGTGCTAGTTGCT
				TCTTGCTGCTGCTTCCTGCTTGT
				CTGGGACTCACATACATAACGTG
				ATATATAT

331	C19orf	NM_019107.3	649-	TGTCCCTGAAAGGGCCAGCACAT
	10		748	CACTGGTTTTCTAGGAGGGACTC
				TTAAGTTTTCTACCTGGGCTGAC
				GTTGCCTTGTCCGGAGGGGCTTG
				CAGGGTGG

332	MYL12A	NM_006471.3	305-	TCTCTGGGTAGCAGGGTGGTGTG
			404	ATAGCGGCAGCGAGGGGCTCGG
				AGAGGTGCTCGGATTCTCGTAGC
				TGTGCCGGGACTTAACCACCACC
				ATGTCGAGC

333	MYLIP	NM_013262.3	2701-	TTGGGCATTTTGGAAGCTGGTCA
			2800	GCTAGCAGGTTTTCTGGGATGTC
				GGGAGACCTAGATGACCTTATCG
				GGTGCAATACTAGCTAAGGTAA
				AGCTAGAAA

334	NAT5	NM_181528.3	735-	AAACATACCACTCTCATGGTTCA
			834	TAGTATTCACTGTATGTATGCTA
				GGGAAAAGACTTGCTCCAGTCTC
				CTCCTCAGTTCTGTGCCTGAGAA
				CCACTGCT

335	NADK	NM_023018.4	2449-	TCCGGGGCTAGTGATCGTGATCC
			2548	CTTTTATTTGCAACTGTAATGAG
				AATTTTTCACACTAACACAGCGA
				GGGACTCAACACGCTGATTCTCC
				TCCTGCCT

336	NAGK	NM_017567.4	1362-	GGGCCAGGCACATCGGGCACCT
			1461	CCTCCCCATGGACTATAGCGCCA
				ATGCCATTGCCTTCTATTCCTAC
				ACCTTTTCCTAGGGGGCTGGTCC
				CGGCTCCAC

337	NCAPG	NM_022346.4	3080-	ACCCAAGCATCAAAGTCTACTCA
			3179	GCTAAAGACTAACAGAGGACAG
				AGAAAAGTGACAGTTTCAGCTA
				GGACGAACAGGAGGTGTCAGAC
				TGCTGAAGCCG

338	NCOA5	NM_020967.2	2837-	TGGACATGTTCTCGAGATGGGTG
			2936	GCTGTTCGCGACTTTTGTACCAG
				AGTGAAATTGTTAGAAGGAGGG
				TTTCTGGCTGTGGTTCTAAATGG
				AGCCCCAGG

339	NCR1	NM_004829.5	603-	CGATGTTTTGGCTCCTATAACAA
			702	CCATGCCTGGTCTTTCCCCAGTG
				AGCCAGTGAAGCTCCTGGTCACA
				GGCGACATTGAGAACACCAGCC
				TTGCACCTG

340	NDRG2	NM_016250.2	1516-	TATGCATCCTCTGTCCTGATCTA
			1615	GGTGTCTATAGCTGAGGGGTAAG
				AGGTTGTTGTAGTTGTCCTGGTG
				CCTCCATCAGACTCTCCCTACTT
				GTCCCATA

341	NDUFA4	NM_002489.3	262-	TGGGACAGAAATAACCCAGAGC
			361	CCTGGAACAAACTGGGTCCCAAT
				GATCAATACAAGTTCTACTCAGT
				GAATGTGGATTACAGCAAGCTG
				AAGAAGGAAC

342	NDUFAF	NM_174889.4	486-	TCCTGCCTCCACCAGTTCAAACT
	2		585	CAAATTAAAGGCCATGCCTCTGC
				TCCATACTTTGGAAAGGAAGAAC
				CCTCAGTGGCTCCCAGCAGCACT
				GGTAAAAC

343	NDUFB3	NM_002491.2	383-	ACAATGGAAGATAGAAGGGACA
			482	CCATTAGAAACTATCCAGAAGA
				AGCTGGCTGCAAAAGGGCTAAG
				GGATCCATGGGGCCGCAATGAA
				GCTTGGAGATAC

344	NDUFS4	NM_002495.2	326-	GAGTTTGATACCAGAGAGCGAT
			425	GGGAAAATCCTTTGATGGGTTGG
				GCATCAACGGCTGATCCCTTATC
				CAACATGGTTCTAACCTTCAGTA
				CTAAAGAAG

345	NDUFV2	NM_021074.4	687-	TTACTATGAGGATTTGACAGCTA
			786	AGGATATTGAAGAAATTATTGAT
				GAGCTCAAGGCTGGCAAAATCC
				CAAAACCAGGGCCAAGGAGTGG
				ACGCTTCTCT

346	NFAT5	NM_138713.3	3857-	CCCAAGAAGCATTTTTTGCAGCA
			3956	CCGAACTCAATTTCTCCACTTCA
				GTCAACATCAAACAGTGAACAA
				CAAGCTGCTTTCCAACAGCAAGC
				TCCAATATC

347	NFATC1	NM_172389.1	1985-	CGAATTCTCTGGTGGTTGAGATC
			2084	CCGCCATTTCGGAATCAGAGGAT
				AACCAGCCCCGTTCACGTCAGTT
				TCTACGTCTGCAACGGGAAGAG
				AAAGCGAAG

348	NFATC4	NM_00113602	2297-	ACAAGAGGGTTTCCCGGCCAGTC
		2.2	2396	CAGGTCTACTTTTATGTCTCCAA
				TGGGCGGAGGAAACGCAGTCCT
				ACCCAGAGTTTCAGGTTTCTGCC
				TGTGATCTG

349	NFKB1	NM_003998.3	3606-	CGGATGCATCTGGGGATGAGGTT
			3705	GCTTACTAAGCTTTGCCAGCTGC
				TGCTGGATCACAGCTGCTTTCTG
				TTGTCATTGCTGTTGTCCCTCTG
				CTACGTTC

350	NFKB2	NM_002502.2	826-	ATCTCCGGGGGCATCAAACCTGA
			925	AGATTTCTCGAATGGACAAGACA
				GCAGGCTCTGTGCGGGGTGGAG
				ATGAAGTTTATCTGCTTTGTGAC
				AAGGTGCAG

351	NIPBL	NM_133433.3	8755-	GCGCCGTGATGGCCGCAAACTG
			8854	GTGCCTTGGGTAGACACTATTAA
				AGAGTCAGACATTATTTACAAAA
				AAATTGCTCTAACGAGTGCTAAT
				AAGCTGACT

352	NLRP3	NM_00107982	416-	AGTGGGGTTCAGATAATGCACGT
		1.2	515	GTTTCGAATCCCACTGTGATATG
				CCAGGAAGACAGCATTGAAGAG
				GAGTGGATGGGTTTACTGGAGTA
				CCTTTCGAG

353	NME1-	NM_00101813	484-	ACCTGGAGCGCACCTTCATCGCC
	NME2	6.2	583	ATCAAGCCGGACGGCGTGCAGC
				GCGGCCTGGTGGGCGAGATCATC
				AAGCGCTTCGAGCAGAAGGGAT
				TCCGCCTCGT

354	NUDT18	NM_024815.3	1369-	CCCCAGTGGCATCTCCTCATCAC
			1468	GTTCTGTGCCGTCCTTGGGAAAG
				GCCTGCATTCTGATCCTTCCAGG
				CCCTTCGAGCATGGAGGGGCACT
				GGGGAAGG

355	NUMB	NM_00100574	2833-	CATAAGATTGATTTATCATTGAT
		4.1	2932	GCCTACTGAAATAAAAAGAGGA
				AAGGCTGGAAGCTGCAGACAGG
				ATCCCTAGCTTGTTTTCTGTCAG
				TCATTCATTG

356	NUP153	NM_005124.3	5104-	TTTATGATCCAGCAGATTATTCA
			5203	CTGATTTGACATAGTCTGGCTGT
				ACCCAGGAATGGAGCCTGCACG
				GTGAATGGCTTTGTATAGAACCT
				CTTTGTCTA

357	OLR1	NM_002543.3	1524-	ACACATTTTGGGACAAGTGGGG
			1623	AGCCCAAGAAAGTAATTAGTAA
				GTGAGTGGTCTTTTCTGTAAGCT
				AATCCACAACCTGTTACCACTTC
				CTGAATCAGT

358	OSBP	ILMN_170637	130-	TTCTCTTCCTTCACCATCTGCAC
		6.1	229	TACATTTCTGGCTGATCCCAATC
				AGATTCCCGCTAATGGAAGAAGT
				TTAGAATCTTTCAGGTGGAATAA
				AGTCACAT

359	FAM105	NM_138348.4	2537-	TGCAGATGGTGTTCACATGAACC
	B		2636	GGAGACATCACTCTTTAGGATTC
				TACTGGCAGCCCCTGAATTGGCT
				CAACGTTTGTGGAGGTGGTATTT
				CCCTGAAG

360	P2RY10	NM_198333.1	972-	TTACACCATGGTAAAGGAAACC
			1071	ATCATTAGCAGTTGTCCCGTTGT
				CCGAATCGCACTGTATTTCCACC
				CTTTTTGCCTGTGCCTTGCAAGT
				CTCTGCTGC

361	PACS1	NM_018026.3	3830-	CGCTGTCTTCGTGGCTTCCACCC
			3929	TTGTTAATGATGCTCCTGCCTCT
				GCCTCCCAGCCCCTCACCCAGCA
				CAGCTCTGCCTGGACTTGGAGAG
				ATGGGAGG

362	PANK2	NM_153640.2	824-	AGTGGATAAACTAGTACGAGAT
			923	ATTTATGGAGGGGACTATGAGA
				GGTTTGGACTGCCAGGCTGGGCT
				GTGGCTTCAAGCTTTGGAAACAT
				GATGAGCAAG

363	PDCD10	NM_145859.1	901-	AAGAGATGTACTTCTCAGTGGCA
			1000	GTATTGAACTGCCTTTATCTGTA
				AATTTTAAAGTTTGACTGTATAA
				ATTATCAGTCCCTCCTGAAGGGA
				TCTAATCC

364	PDGFD	NM_033135.3	3394-	CCTGTGAAAACATCAGTTTCCTG
			3493	TACCAAAGTCAAAATGAACGTTA
				CATCACTCTAACCTGAACAGCTC
				ACAATGTAGCTGTAAATATAAAA
				AATGAGAG

365	PDSS1	NM_014317.3	1199-	CATGAAGCAATAAGAGAGATCA
			1298	GTAAACTTCGACCATCCCCAGAA
				AGAGATGCCCTCATTCAGCTTTC
				AGAAATTGTACTCACAAGAGAT
				AAATGACAAC

366	PELP1	NM_014389.2	1989-	TGGCCCCGTCTCCTCGCTGCCCA
			2088	CCTCCTCTTGCCTGTGCCCTGCA
				AGCCTTCTCCCTCGGCCAGCGAG
				AAGATAGCCTTGAGGTCTCCTCT
				TTCTGCTC

367	PFAS	NM_012393.2	5109-	CATCCCTAGATCCTAACCCTTTA
			5208	GTATGCTGGAATTCTACTCTTCA
				CTTACTGCATTGACTGTTGTTGA
				TTAGTTATTATTGCAAAGCACTG
				TCACCGGC

368	PFDN5	NM_145897.2	232-	ATCGATGTGGGAACTGGGTACTA
			331	TGTAGAGAAGACAGCTGAGGAT
				GCCAAGGACTTCTTCAAGAGGA
				AGATAGATTTTCTAACCAAGCAG
				ATGGAGAAAA

369	PFDN5	NM_145897.2	331-	ATCCAACCAGCTCTTCAGGAGAA
	b		430	GCACGCCATGAAACAGGCCGTC
				ATGGAAATGATGAGTCAGAAGA
				TTCAGCAGCTCACAGCCCTGGGG
				GCAGCTCAGG

370	PGK1	NM_000291.3	1122-	GTCCTGAAAGCAGCAAGAAGTA
			1221	TGCTGAGGCTGTCACTCGGGCTA
				AGCAGATTGTGTGGAATGGTCCT
				GTGGGGGTATTTGAATGGGAAG
				CTTTTGCCCG

371	PHF8	NM_015107.2	5704-	ATCAAGGTTTAGAACACCATGAG
			5803	ATAGTTACCCCTGATCTCCAGTC
				CCTAGCTGGGGGCTGGACAGGG
				GGAAGGGAGAGAGGATTTCTAT
				TCACCTTTAA

372	PHLPP2	NM_015020.3	7601-	CCAGTTGGGTGTGGCAGATCTAC
			7700	TGAATATCAAATGATGCTCTTCT
				TCCCATGTAGACCTTCAGCAAAA
				GCCGGTACTTGGAAGCCACAGG
				CTCACCTTC

373	PHRF1	NM_020901.3	5239-	GGGAAATGGGGGGCATCACCAT
			5338	GCCTGCCGTCGGGTTCCTGCGCT
				GACACCTGGTCTGTGCACCTGTG
				TTGCTCACAGTTGAAAACTGGAC
				ACTTTTGTA

374	PI4K2A	NM_018425.3	3886-	TCCATGGAATTGCTGAGACGTGG
			3985	CTCCTGGGGCTATTTCTCCCTAA
				TAAAGGATGATCCAGGTCCTCAT
				TTCCAAAGTCCCAATGCTCTGAA
				AACCAAAA

375	PIK3CD	NM_005026.3	4799-	GAGCCAGAAGTAGCCGCCCGCT
			4898	CAGCGGCTCAGGTGCCAGCTCTG
				TTCTGATTCACCAGGGGTCCGTC
				AGTAGTCATTGCCACCCGCGGGG
				CACCTCCCT

376	PIM2	NM_006875.3	1947-	TTTTTGGGGGATGGGCTAGGGGA
			2046	AATAAGGCTTGCTGTTTGTTCTC
				CTGGGGCGCTCCCTCCAACTTTT
				GCAGATTCTTGCAACCTCCTCCT
				GAGCCGGG

377	PLAC8	NM_00113071	289-	CTGATATGAATGAATGCTGTCTG
		5.1	388	TGTGGAACAAGCGTCGCAATGA
				GGACTCTCTACAGGACCCGATAT
				GGCATCCCTGGATCTATTTGTGA
				TGACTATAT

378	PLEKHG	NM_015432.3	6365-	CCAGTTGTGGGTTAAGAATAGGC
	4		6464	TAGAGCAGACATTGGGTGTTTCC
				ATGCTGTAGGCTGGTGGGGGACC
				ATGTGCCTCTAGGCAGTGACTAG
				GGTGCCCC

379	POLR2A	NM_000937.4	6539-	CCCCTGCCTGTCCCCAAATTGAA
			6638	GATCCTTCCTTGCCTGTGGCTTG
				ATGCGGGGCGGGTAAAGGGTAT
				TTTAACTTAGGGGTAGTTCCTGC
				TGTGAGTGG

380	PPP1R	XM_927029.1	4342-	CAGAACCTCCTCAGTTCCTTCAC
	3E		4441	AGTGCAACCCTGTGTACTTGGCC
				CGCAACCCAATAGTATTGTGCCT
				CACTTCACCTTCCATGGGCAACT
				GCCCTCCC

381	PPP2R	NM_178588.1	941-	ACAGCACCCTCACGGAACCAGT
	5C		1040	GGTGATGGCACTTCTCAAATACT
				GGCCAAAGACTCACAGTCCAAA
				AGAAGTAATGTTCTTAAACGAAT
				TAGAAGAGAT

382	PPP6C	NM_002721.4	1536-	TTAAGAAATTTCAGCAGCAAAGT
			1635	TGTTATTCAGTGGGCACGATGGA
				CTCCAAATGCCTCAAGTTATGTA
				TACCTGTCCCAGATGTAAACTTC
				ATTGTCCT

383	PRG2	NM_002728.4	257-	CTCTGGAAGTGAAGATGCCTCCA
			356	AGAAAGATGGGGCTGTTGAGTCT
				ATCTCAGTGCCAGATATGGTGGA
				CAAAAACCTTACGTGTCCTGAGG
				AAGAGGAC

384	PRPF3	NM_004698.2	2116-	CCTACAGAGAACATGGCTCGTGA
			2215	GCATTTCAAAAAGCATGGGGCTG
				AACACTACTGGGACCTTGCGCTG
				AGTGAATCTGTGTTAGAGTCCAC
				TGATTGAG

385	PRPF8	NM_006445.3	7091-	ACTCTGCGGATCGGGAGGACCTG
			7190	TATGCCTGACCGTTTCCCTGCCT
				CCTGCTTCAGCCTCCCGAGGCCG
				AAGCCTCAGCCCCTCCAGACAGG
				CCGCTGAC

386	C22orf	NM_173566.2	10495-	CCCGTTGAGCTGGCCATCTAGTG
	30		10594	CAGTGTGCTCTCAGATTCCATGT
				TTGTTGATTGTGTGTCTTCACAA
				GCCCCTCTCTGGTGCTGAATTGG
				ATTTGAAT

387	BAT2D1	NM_015172.3	9620-	AGAACAGTGAGTACCTAGAACT
			9719	GTGCCACTAATTAAAGGAAATCC
				TAAGAAGGTGCATTTCTTTACAG
				AGCTGTGTCATGCCATCCTTTGG
				GCCCTCTGC

388	PRRG4	NM_024081.5	761-	GAAGACCTGAGGAGGCTGCCTT
			860	GTCTCCATTGCCGCCTTCTGTGG
				AGGATGCAGGATTACCTTCTTAT
				GAACAGGCAGTGGCGCTGACCA
				GAAAACACAG

389	PSMA3	NM_152132.2	422-	CTTTGGCTACAACATTCCACTAA
			521	AACATCTTGCAGACAGAGTGGCC
				ATGTATGTGCATGCATATACACT
				CTACAGTGCTGTTAGACCTTTTG
				GCTGCAGT

390	PSMA4	NM_002789.3	541-	GTACATTGGCTGGGATAAGCACT
			640	ATGGCTTTCAGCTCTATCAGAGT
				GACCCTAGTGGAAATTACGGGG
				GATGGAAGGCCACATGCATTGG
				AAATAATAGC

391	PSMA4	NM_002789.4	879-	GAGGAAGAAGAAGCCAAAGCTG
	b		978	AGCGTGAGAAGAAAGAAAAAGA
				ACAGAAAGAAAAGGATAAATAG
				AATCAGAGATTTTATTACTCATT
				TGGGGCACCAT

392	PSMA6	NM_002791.2	218-	GGTCGGCTCTACCAAGTAGAATA
			317	TGCTTTTAAGGCTATTAACCAGG
				GTGGCCTTACATCAGTAGCTGTC
				AGAGGGAAAGACTGTGCAGTAA
				TTGTCACAC

393	PSMA6	NM_002791.2	866-	GATGCTCACCTTGTTGCTCTAGC
	b		965	AGAGAGAGACTAAACATTGTCG
				TTAGTTTACCAGATCCGTGATGC
				CACTTACCTGTGTGTTTGGTAAC
				AACAAACCA

394	PSMB1	NM_002793.3	687-	GCGGCTGGTGAAAGATGTCTTCA
			786	TTTCTGCGGCTGAGAGAGATGTG
				TACACTGGGGACGCACTCCGGAT
				CTGCATAGTGACCAAAGAGGGC
				ATCAGGGAG

395	PSMB7	NM_002799.2	421-	GTTACATTGGTGCAGCCCTAGTT
			520	TTAGGGGGAGTAGATGTTACTGG
				ACCTCACCTCTACAGCATCTATC
				CTCATGGATCAACTGATAAGTTG
				CCTTATGT

396	PSMB8	NM_004159.4	1216-	ACTCACAGAGACAGCTATTCTGG
			1315	AGGCGTTGTCAATATGTACCACA
				TGAAGGAAGATGGTTGGGTGAA
				AGTAGAAAGTACAGATGTCAGT
				GACCTGCTGC

397	PSMC1	NM_002802.2	1487-	CATCCTGTGTCTTTTGGAGTACG
			1586	ATGTGTAAGTGCCCATTGGGTGG
				CCTGTTGGTCACTGTGCAGCAGT
				CTGCTTCCCAATAAAGCGTGCTC
				TTTCACAA

398	PSMD7	NM_002811.4	1231-	GAGCTCTCTGCCTCCGGTCACTC
			1330	TTGCTGTGGTGCTACGTGGAAGT
				GAATGGAGACTGATCTCAAATCT
				GAACTGCAGCTTTCGCTGCTGTG
				AGTTGGGG

399	PSME3	NM_005789.3	3203-	TCCCGAGTGATACCCATGAACTG
			3302	CCAGTAGAGGCTGCTATCGTTCC
				ATGTGTAAGGAATGAACTGGTTC
				AAGGCGCGTCCTACCCAGTCATT
				TTCTTTAC

400	PTGDR	NM_000953.2	2341-	TATGATGACTGAAAGGGAAAAG
			2440	TGGAGGAAACGCAGCTGCAACT
				GAAGCGGAGACTCTAAACCCAG
				CTTGCAGGTAAGAGCTTTCACCT
				TTGGTAAAAGA

401	PTGDR2	NM_004778.1	1836-	GCCAATGCTTACTGCGCTAGACG
			1935	CTTCATCCCACAATCTTAAGGGG
				CAGCTTCTATTAGCCAGTCTTTA
				CAGCTGAGCACATTCTGGCTCAG
				GGAGGTTA

402	PUM1	NM_00102065	3753-	AAATGTTCTAGTGTAGAGTCTGA
		8.1	3852	GACGGGCAAGTGGTTGCTCCAG
				GATTACTCCCTCCTCCAAAAAAG
				GAATCAAATCCACGAGTGGAAA
				AGCCTTTGTA

403	QTRTD1	NM_024638.3	2508-	TTAGATTAGAGTCATAGCCTTAA
			2607	TAGCCCTAGTTGTCATCCTGGGA
				GACAGGCAACAGTAGAGATATT
				TGAGAGCCTAAAGAGAGGTTTG
				GCCTGTGGGT

404	RAB10	NM_016131.4	3593-	AGGGCTTTGCCCCTTTTCTGTAA
			3692	GTCTCTTGGGATCCTGTGTAGAA
				GCTGTTCTCATTAAACACCAAAC
				AGTTAAGTCCATTCTCTGGTACT
				AGCTACAA

405	RAG1	NM_000448.2	2301-	CAGTCTACATTTGTACTCTTTGT
			2400	GATGCCACCCGTCTGGAAGCCTC
				TCAAAATCTTGTCTTCCACTCTA
				TAACCAGAAGCCATGCTGAGAAC
				CTGGAACG

406	RASSF5	NM_182664.2	3061-	TCGTCCTGCATGTCTCTAACATT
			3160	AATAGAAGGCATGGCTCCTGCTG
				CAACCGCTGTGAATGCTGCTGAG
				AACCTCCCTCTATGGGGATGGCT
				ATTTTATT

407	RBM14	NM_006328.3	2661-	TGGTATGTATCCAAGTCCCTGCT
			2760	GACCACTAATGTTCTAGCTGATG
				GTGAGCGGCACAGTCCCACTTCC
				CCATCTCCCCAAGTAGGTGGTGT
				TAGAAAAC

408	RBM4B	NM_031492.3	1557-	TAGGAGTTGAATCCTTCTCCCTG
			1656	CCTACCTGCAGCATCTCCTTTCC
				CTTTAAAATGACCATGTAGTGGC
				AAGCAGCCTTTTACTCTTCTGTT
				AGCTCTGG

409	RBX1	NM_014248.3	158-	GATATTGTGGTTGATAACTGTGC
			257	CATCTGCAGGAACCACATTATGG
				ATCTTTGCATAGAATGTCAAGCT
				AACCAGGCGTCCGCTACTTCAGA
				AGAGTGTA

410	RELA	NM_021975.2	361-	GATGGCTTCTATGAGGCTGAGCT
			460	CTGCCCGGACCGCTGCATCCACA
				GTTTCCAGAACCTGGGAATCCAG
				TGTGTGAAGAAGCGGGACCTGG
				AGCAGGCTA

411	REPIN1	NM_014374.3	2491-	TGTGTCCAGGCTCTTGTCTGAAC
			2590	ACCGCAGCCCCTCCTTCGCTCCT
				TCCAGAGCTCAGCATGTCACGGC
				AAGGACTGCCGCATTGGTGATGG
				AGGGCCAG

412	REPS1	NM_00112861	1289-	CACCAACCAGTACTCTTTTAACC
		7.2	1388	ATGCATCCTGCTTCTGTCCAGGA
				CCAGACAACAGTACGAACTGTA
				GCATCAGCTACAACTGCCATTGA
				AATTCGTAG

413	RERE	NM_00104268	5916-	AACCCTCGACCCGAAACCCTCAC
		2.1	6015	CAGATAAACTACAGTTTGTTTAG
				GAGGCCCTGACCTTCATGGTGTC
				TTTGAAGCCCAACCACTCGGTTT
				CCTTCGGA

414	RERE	NM_012102.3	7734-	GCATTCTTGTTAGCTTTGCTTTT
	b		7833	CTCCCCATATCCCAAGGCGAAGC
				GCTGAGATTCTTCCATCTAAAAA
				ACCCTCGACCCGAAACCCTCACC
				AGATAAAC

415	RFWD2	NM_022457.6	2606-	TTTTCTTTTCCCTCCTTTATGAC
			2705	CTTTGGGACATTGGGAATACCCA
				GCCAACTCTCCACCATCAATGTA
				ACTCCATGGACATTGCTGCTCTT
				GGTGGTGT

416	RFX1	NM_002918.4	4187-	ATAAAAATCACTATTTTGTGTGC
			4286	TCCGCGTGCTATAGCTTTTGGGG
				CGGCCCTGCCCAGTCCCCGTGCC
				CACGGGGCTCCCTCTCCCGGTGG
				TGAAAGTG

417	RHOB	NM_004040.3	1707-	GGGAGGAGGGAGGATGCGCTGT
			1806	GGGGTTGTTTTTGCCATAAGCGA
				ACTTTGTGCCTGTCCTAGAAGTG
				AAAATTGTTCAGTCCAAGAAACT
				GATGTTATT

418	RHOG	NM_001665.3	1045-	CTTTCCACACAGTTGTTGCTGCC
			1144	TATTGTGGTGCCGCCTCAGGTTA
				GGGGCTCTCAGCCATCTCTAACC
				TCTGCCCTCGCTGCTCTTGGAAT
				TGCGCCCC

419	RHOU	NM_021205.5	4174-	TTGACAGACTCAAGAGAAACTA
			4273	CCCAGGTATTACACAAGCCAAA
				ATGGGAGCAAGGCCTTCTCTCCA
				GACTATCGTAACCTGGTGCCTTA
				CCAAGTTGTG

420	RNASE2	NM_002934.2	331-	TGACCTGTCCTAGTAACAAAACT
			430	CGCAAAAATTGTCACCACAGTGG
				AAGCCAGGTGCCTTTAATCCACT
				GTAACCTCACAACTCCAAGTCCA
				CAGAATAT

421	RNF114	NM_018683.3	2246-	AATTCAGATCATCTCAGAAGTCT
			2345	GGAGGGAAATCTGGCGAAACCT
				TCGTTTGAGGGACTGATGTGAGT
				GTATGTCCACCTCACTGGTGGCA
				CCGAGAAAC

422	RNF19B	NM_153341.3	2222-	CCCCAGAGCCCAAGGTGCACCG
			2321	AGCCCAAGTGCCCATATGAACCT
				CTCTGCCCTAGCCGAGGGACAAA
				CTGTCTTGAAGCCAGAAGGTGGA
				GAAGCCAGA

423	RNF214	NM_207343.3	2068-	ACCTGTAAGCTATGTCTAATGTG
			2167	CCAGAAACTCGTCCAGCCCAGTG
				AGCTGCATCCAATGGCGTGTACC
				CATGTATTGCACAAGGAGTGTAT
				CAAATTCT

424	RNF34	NM_025126.3	1619-	CTTCTGTCCTCTTTGGATGAGAT
			1718	CAGTGTCCACAAGTGGCCGACAT
				GGAACATGCTGAGCAGTGGCTCC
				TCTGAATGTTCACTTTATTAGTC
				ATGTATAT

425	C20orf	NM_080748.2	274-	CTCAGGATCGGAATGCGGGGTC
	52		373	GAGAGCTGATGGGCGGCATTGG
				GAAAACCATGATGCAGAGTGGC
				GGCACCTTTGGCACATTCATGGC
				CATTGGGATGG

426	RPL26	NM_016093.2	4-	CACTCAGGGTCTGAGGCAGCTAG
	L1		103	TAGCCGGAGGGTCACCATGAAG
				TTCAATCCCTTCGTTACCTCGGA
				CCGCAGTAAAAACCGCAAACGT
				CACTTCAATG

427	RPL3	NM_00103385	1072-	AGAAGAAAGCATTCATGGGACC
		3.1	1171	ACTGAAGAAAGACCGAATTGCA
				AAGGAAGAAGGAGCTTAATGCC
				AGGAACAGATTTTGCAGTTGGTG
				GGGTCTCAATA

428	RPL31	NM_000993.4	20-	CTTGCAACTGCGGCTTTCCTTCT
			119	CCCACAATCCTTCGCGCTCTTCC
				TTTCCAACTTGGACGCTGCAGAA
				TGGCTCCCGCAAAGAAGGGTGGC
				GAGAAGAA

429	RPL34	NM_000995.3	471-	ACCTCACCTCAGCTTGAGAGAGC
			570	CAGTTGTGTGCATCTCTTTCCAG
				TTTTGCATCCAGTGACGTCTGCT
				TGGCATCTTGAGATTGTTATGGT
				GAGAGTAT

430	RPL39L	NM_052969.1	139-	GCGGGTTCGGGTCGGTGACACGC
			238	AGACCTGAGGGAGCTGGGCCCG
				CCTTTTCCGCCCGCGCCCCAGGC
				CCTTGCAGATCGAGATTTGCGTC
				CTAGAGTGG

431	KIAA	NM_015203.4	4795-	CCCCTTGGGTCCCTCACACAGAG
	0460		4894	ACACCATCAGCCGGAGTGGTATA
				ATCTTACGGAGTCCCCGGCCAGA
				CTTTCGGCCTAGGGAACCTTTTC
				TCAGCAGA

432	RPS24	NM_001026.4	482-	ATGAAGAAAGTCAGGGGGACTG
			581	CAAAGGCCAATGTTGGTGCTGGC
				AAAAAGCCGAAGGAGTAAAGGT
				GCTGCAATGATGTTAGCTGTGGC
				CACTGTGGAT

433	RPS27L	NM_015920.3	241-	TAAAATGTCCAGGTTGCTACAAG
			340	ATCACCACGGTTTTCAGCCATGC
				TCAGACAGTGGTTCTTTGTGTAG
				GTTGTTCAACAGTGTTGTGCCAG
				CCTACAGG

434	RPS6	NM_001010.2	172-	GAATGGAAGGGTTATGTGGTCCG
			271	AATCAGTGGTGGGAACGACAAA
				CAAGGTTTCCCCATGAAGCAGGG
				TGTCTTGACCCATGGCCGTGTCC
				GCCTGCTAC

435	RSL24	NM_016304.2	1232-	TGGAGTGACACTACACTCTAGAA
	D1		1331	TTTCCACTTTGGAGAATACTCAG
				TTCCAACTTGTGATTCCTGATAG
				AACAGACTTTACTTTTCTAGCCC
				AGCATTGA

436	RWDD1	NM_00100746	998-	TGGAGGATGATGAAGATGATCC
		4.2	1097	AGACTATAATCCTGCTGACCCAG
				AGAGTGACTCAGCTGACTAATGG
				ACTGTCCCCATCTGCAGAGAGGC
				TTGACTGCC

437	RXRA	NM_002957.5	5301-	AGTAATTTTTAAAGCCTTGCTCT
			5400	GTTGTGTCCTGTTGCCGGCTCTG
				GCCTTCCTGTGACTGACTGTGAA
				GTGGCTTCTCCGTACGATTGTCT
				CTGAAACA

438	S100A	NM_005621.1	261-	CAAGATGAACAGGTCGACTTTCA
	12b		360	AGAATTCATATCCCTGGTAGCCA
				TTGCGCTGAAGGCTGCCCATTAC
				CACACCCACAAAGAGTAGGTAG
				CTCTCTGAA

439	S100A8	NM_002964.4	366-	GTTAACTTCCAGGAGTTCCTCAT
			465	TCTGGTGATAAAGATGGGCGTGG
				CAGCCCACAAAAAAAGCCATGA
				AGAAAGCCACAAAGAGTAGCTG
				AGTTACTGGG

440	SAMSN1	NM_022136.3	1024-	ACCTGAGCCCCTATCCTTGAGCT
			1123	CAGACATCTCCTTAAATAAGTCA
				CAGTTAGATGACTGCCCAAGGG
				ACTCTGGTTGCTATATCTCATCA
				GGAAATTCA

441	SAP	NM_024545.3	3091-	GATCTCCACCGAATAAACGAACT
	130b		3190	GATACAGGGAAATATGCAGAGG
				TGTAAACTTGTGATGGATCAAAT
				CAGTGAAGCCAGAGACTCCATG
				CTTAAGGTTT

442	SAP130	NM_024545.3	3720-	CGGTTCTTCTGCCTGACCTTCAA
			3819	ATGCCCATGTTGGCCTTTTACAG
				CAGTGCCACGGCACCAAGCGAG
				CTGCCACATCTCACACTCTAAAG
				GGTTTGAAC

443	CIP29	NM_033082.3	622-	AACTGGAACCACAGAGGATACA
			721	GAGGCAAAGAAGAGGAAAAGAG
				CAGAGCGCTTTGGGATTGCCTGA
				TGAAAAGTTCCTGATACTTTCTG
				TTCTCCAGTG

444	SFRS2	NM_004719.2	4203-	AGTTCTTCTCATGTAAGTAATAA
	IP		4302	CATGAGTACACCAGTTTTGCCTG
				CTCCGACAGCAGCCCCAGGAAA
				TACGGGAATGGTTCAGGGACCA
				AGTTCTGGTA

445	SFRS15	NM_020706.2	3635-	GAGAGAAGGAAGAAGCCCGAGG
			3734	AAAGGAAAAGCCTGAGGTGACA
				GACAGGGCAGGTGGTAACAAAA
				CCGTTGAACCTCCCATTAGCCAA
				GTGGGAAATGT

446	RBM16	NM_014892.4	4111-	TGATTATTTTGAAGGGGCCACTT
			4210	CTCAACGAAAAGGTGATAATGT
				GCCTCAGGTTAATGGTGAAAATA
				CAGAGAGACATGCTCAGCCACC
				ACCTATACCA

447	SDHA	NM_004168.3	2042-	GTCACTCTGGAATATAGACCCGT
			2141	GATCGACAAAACTTTGAACGAG
				GCTGACTGTGCCACCGTCCCGCC
				AGCCATTCGCTCCTACTGATGAG
				ACAAGATGT

448	SEC24C	NM_198597.2	4194-	AGGCAGAGGCAGCTGGAGCGCC
			4293	GTTCTCTCCTGCTGGGACACCGC
				TTGGGCTTTGGTATTGACTGAGT
				GGCTGACAGTTATCTTCCAACCC
				CAACTGGCT

449	SEMG1	NM_003007.2	1291-	GGCAGACACCAACATGGATCTC
			1390	ATGGGGGATTGGATATTGTAATT
				ATAGAGCAGGAAGATGACAGTG
				ATCGTCATTTGGCACAACATCTT
				AACAACGACC

450	SERPIN	NM_005024.1	891-	AGACAGTTATGATCTCAAGTCAA
	B10		990	CCCTGAGCAGTATGGGGATGAGT
				GATGCCTTCAGCCAAAGCAAAG
				CTGATTTCTCAGGAATGTCTTCA
				GCAAGAAAC

451	SETD2	NM_014159.6	7956-	TGGTTAGAAGCCATCAGAGGTGC
			8055	AAGGGCTTAGAAAAGACCCTGG
				CCAGACCTGACTCCACTCTTAAA
				CCTGGGTCTTCTCCTTGGCGGTG
				CTGTCAGCG

452	SFMBT1	NM_00100515	2844-	AAGGATCGAAGTTGCTGAAAGG
		8.2	2943	CTTCACCTGGACAGTAACCCCTT
				GAAGTGGAGTGTGGCAGACGTT
				GTGCGGTTCATCAGATCCACTGA
				CTGTGCTCCA

453	SFPQ	NM_005066.2	2800-	GGTTATGTAAGCAAAGCTGAACT
			2899	GTAAATCTTCAGGAATATGTATT
				AAGATTGTGGAATGGGTGTAAG
				ACAATTGGTAGGGGGTGAAAGT
				GGGTTTGATT

454	SGK1	NM_005627.3	1622-	ACGAGCGTTAGAGTGCCGCCTTA
			1721	GACGGAGGCAGGAGTTTCGTTA
				GAAAGCGGACGCTGTTCTAAAA
				AAGGTCTCCTGCAGATCTGTCTG
				GGCTGTGATG

455	SGK	NM_005627.3	173-	GAAGCAGAGGAGGATGGGTCTG
			272	AACGACTTTATTCAGAAGATTGC
				CAATAACTCCTATGCATGCAAAC
				ACCCTGAAGTTCAGTCCATCTTG
				AAGATCTCC

456	SGK1b	NM_005627.3	1814-	GGATATGCTGTGTGAACCGTCGT
			1913	GTGAGTGTGGTATGCCTGATCAC
				AGATGGATTTTGTTATAAGCATC
				AATGTGACACTTGCAGGACACTA
				CAACGTGG

457	SH2D3C	NM_170600.2	2795-	AGCACCCCAAGGACACTGTGATC
			2894	AACCCGAGAATGTTCTGGGTTCA
				ACTCAAGCATCTCCCTTGCACCT
				CCAGGGTCCTGCGTGGACTCTGG
				GTTCCATC

458	SIK1	NM_173354.3	4185-	TCGCTCATAAAGAAGTTTTTGGG
			4284	ATGGGAGAGAATCCAGACCATC
				TTGGGGCAGCCAGGCCCTTGCCT
				TCATTTTTACAGAGGTAGCACAA
				CTGATTCCA

459	SIN3A	NM_015477.2	4666-	TTTATTCCTGACGATTCCCTTGC
			4765	TGCCTACCCTTTTCTCTCCTCTG
				GTTCTCAACCTCAACGAGTTCAA
				ATCAGTTGTCCTTTTTAGCTCCC
				GTGGAACT

460	SLAMF8	NM_020125.2	3173-	AACAAATATTGATTGAGGGCGCT
			3272	GCATGTGCTGGGTACATTTCTTG
				GCACTTGGGAATCAGTAGTCAAG
				CGAAACCCTTGCCTTTGAGAGTT
				TATGGTCT

461	SLC11	NM_000578.3	2072-	GCAGGATAGAGTGGGACAGTTC
	A1		2171	CTGAGACCAGCCAACCTGGGGG
				CTTTAGGGACCTGCTGTTTCCTA
				GCGCAGCCATGTGATTACCCTCT
				GGGTCTCAGT

462	SLC15	NM_021082.3	2548-	AACTCATTAAAACTTGTGCAGTG
	A2		2647	TTGCTGGAGCTGGCCTGGTGTCT
				CCAAATGACCATGAAAATACAC
				ACGTATAATGGAGATCATTCTCT
				GTGGGTATG

463	SLC25	NM_000387.5	1511-	ATCTTCTTCAGTCCCTAGCCAGG
	A20		1610	AATACCCATTTGATTTCCAGGGT
				GCCATCTAATCCTGGGCTGTACA
				TGTGGATATGGACTTGAGGCCCA
				CCTCTGTG

464	SLC25	NM_016612.2	1217-	TCCAGCCCCTTGCCCTCTCCTCA
	A37		1316	CACGTAGATCATTTTTTTTTTGC
				AGGGTGCTGCCTATGGGCCCTCT
				GCTCCCCAATGCCTTAGAGAGAG
				GAGGGGAC

465	SLC45	NM_033102.2	2455-	AGTTTCTAGGATGAAACACTCCT
	A3		2554	CCATGGGATTTGAACATATGAAA
				GTTATTTGTAGGGGAAGAGTCCT
				GAGGGGCAACACACAAGAACCA
				GGTCCCCTC

466	SLC6	NM_003044.4	3220-	GATATTGCTAACTGATCACAGAT
	A12		3319	TCTTTCCCACCTCACAATCCTTC
				CGAATGTGCTCCAGGCAGCACCA
				TTTGCCATCCTGCTTCTAACGCA
				AACCCCTG

467	SLC6	NM_003043.5	4438-	ATTCTAGACCAAAGACACAGGC
	A6		4537	AGACCAAGTCCCCAGGCCCCGCC
				TGGAAGGAAGTCGTTCCTCAACT
				CTCCCCAAGGCACCTGTCTCCAA
				TCAGAGCCC

468	SLC9	NM_004252.3	1811-	ATTAACATGATTTTCCTGGTTGT
	A3R1		1910	TACATCCAGGGCATGGCAGTGGC
				CTCAGCCTTAAACTTTTGTTCCT
				ACTCCCACCCTCAGCGAACTGGG
				CAGCACGG

469	C14orf	NM_031210.5	46-	CGGCCTCAGCAGCGAGAGGTGC
	156		145	TGCGGCGCTGCGTAGAAGTATCA
				ATCAGCCGGTTGCTTTTGTGAGA
				AGAATTCCTTGGACTGCGGCGTC
				GAGTCAGCT

470	SMAR	NM_003074.3	5281-	CAATGGCCAGGGTTTTACCTACT
	CC1		5380	TCCTGCCAGTCTTTCCCAAAGGA
				AACTCATTCCAAATACTTCTTTT
				TTCCCCTGGAGTCCGAGAAGGAA
				AATGGAAT

471	SNORA	NR_002984.1	30-	CTCGTGGGACTCTAGAGGGAGTC
	56		129	AGTCTGCAACAGTAAGTGGTGA
				GTTCTTCTGTCCAGCGTCAGTAT
				TTTGATGGTGGCTTTAGACTTGC
				CAGATAACA

472	SNX11	NM_152244.1	2261-	CCCTCCCTGTCGCCCACTCCTCC
			2360	CTCCTCTGGCTATCCTACCCTGT
				CTGTGGGCTCTTTTACTACCAGC
				CTATGCTGTGGGACTGTCATGGC
				ATTTAGTT

473	SOCS1	NM_003745.1	1026-	TTAACTGTATCTGGAGCCAGGAC
			1125	CTGAACTCGCACCTCCTACCTCT
				TCATGTTTACATATACCCAGTAT
				CTTTGCACAAACCAGGGGTTGGG
				GGAGGGTC

474	SP2	NM_003110.5	2701-	GGGGGCAATGATGAGCATATGAA
			2800	TTTTTTCTCACTCTAGCAATTCC
				CTTTTCTAAATGACACAGCATTT
				AAACTCAAATCTGGATTCAGATA
				ACAGCACC

475	SPA17	NM_017425.3	176-	CAAGGATTTGGGAATCTTCTTGA
			275	AGGGCTGACACGCGAGATTCTG
				AGAGAGCAACCGGACAATATAC
				CAGCTTTTGCAGCAGCCTATTTT
				GAGAGCCTTC

476	SPEN	NM_015001.2	11995-	GTATTGCCCACTCATTTGTATAA
			12094	GTGCGCTTCGGTACAGCACGGGT
				CCTGCTCCCGCGATGTGGAAGTG
				TCACACGGCACCTGTACAAAAA
				GACTGGCTA

477	SPINK5	NM_006846.3	2596-	GAGCAATGACAAAGAGGATCTG
			2695	TGTCGTGAATTTCGAAGCATGCA
				GAGAAATGGAAAGCTTATCTGC
				ACCAGAGAAAATAACCCTGTTCG
				AGGCCCATAT

478	SPN	NM_003123.3	2346-	AGTGCCTGCGTGTGTCCACTCGT
			2445	GGGTGTGGTTTGTGTGCAAGAGC
				TGAGGATTTGGCGATGCTTGGGA
				GGGGTAGTTGTGGGTACAGACG
				GTGTGGGGG

479	SREB	NM_00100529	3985-	CCCCTCCTTGCTCTGCAGGCACC
	F1	1.2	4084	TTAGTGGCTTTTTTCCTCCTGTG
				TACAGGGAAGAGAGGGGTACATT
				TCCCTGTGCTGACGGAAGCCAAC
				TTGGCTTT

480	SFRS4	NM_005626.4	2080-	TACTCATGGCCCACAGTAGAATA
			2179	TCCAAAACGCCTTGGCTTTCAGG
				CCTGGCCTTTCCTACAGGGAGCT
				CAGTAACCTGGACGGCTCTAAGG
				CTGGAATG

481	ST6GA	NM_003032.2	3783-	CTGATTTTAATCTTCGAATCATG
	L1		3882	ACACTGAGTGCAGAGGAGGTGG
				CATTCCGACAGCAGGACATACAT
				GTTGGTGTGAAGACTGGGACGA
				CACTGGGTAG

482	STAG3	NM_012447.3	3424-	AAGTGCCTGCAGCATGTCTCCCA
			3523	GGCACCTGGCCATCCCTGGGGCC
				CAGTCACCACCTACTGCCACTCC
				CTCAGCCCTGTGGAGAACACAGC
				AGAGACCA

483	STAMBP	NM_006463.4	1926-	TTTCCTGTGGTTTATGGCAATAT
			2025	GAATGGAGCTTATTACTGGGGTG
				AGGGACAGCTTACTCCATTTGAC
				CAGATTGTTTGGCTAACACATCC
				CGAAGAAT

484	STAT6	NM_003153.4	3725-	ACTGTGCCCAAGTGGGTCCAAGT
			3824	GGCTGTGACATCTACGTATGGCT
				CCACACCTCCAATGCTGCCTGGG
				AGCCAGGGTGAGAGTCTGGGTC
				CAGGCCTGG

485	STIP1	NM_006819.2	1906-	CCCGGGGAAGACACAGAGACTC
			2005	GTACCTGCGCTGTTTGTGCCGCC
				GCTGCCTCTGGGCCCTCCCAGCA
				CACGCATGGTCTCTTCACCGCTG
				CCCTCGAGT

486	STK16	NM_003691.2	1420-	GGGGTAGCGGGGTCAGGACAATC
			1519	ATCTCAGTCCTGCATCTTTTCTT
				CTGCTTTCTTCCCTCCAAGAGCA
				AAACCTGGGCAAGGGGACTTAC
				TGAGTGGGG

487	STK38	NM_007271.3	3269-	TTGTCAGTGAAACTACTTTGGAT
			3368	TTTAACCTCTTAGAGGAAGAAAA
				AAGGTTAGGGAAGTGTCAACTCT
				GGATGAAGGTGATGTGTTTGCCT
				CTCAGTCT

488	STOM	NM_004099.5	2953-	TTCTGCCTTGTGAATTCGTAGTC
			3052	CAATCAGCTGAAATTAAATCACT
				TGGGAGGGACGCATAGAAGGAG
				CTCTAGGAACACAGTGCCAGTGC
				AGAAGTTTC

489	SYNJ1	NM_003895.3	4746-	CCCTCTGCTCCCGCCCGGCACCA
			4845	GCCCTCCAGTAGATCCTTTCACG
				ACCTTGGCCTCTAAGGCTTCACC
				CACACTGGACTTTACAGAAAGAT
				AACGCCAT

490	TAPBP	NM_003190.4	3397-	CTTGCCCTCCCTGGGTCGCAGAC
			3496	GAGGTCGGCCTCGTCATTCCCCG
				CAGACCGCCGCGCGTCCCTCTTG
				TGCGGTTCACCACAGTTGTATTT
				AAGTGATC

491	TAX1	NM_00107986	2081-	CAGCCAGCCTGCTCGAAACTTTA
	BP1	4.2	2180	GTCGGCCTGATGGCTTAGAGGAC
				TCTGAGGATAGCAAAGAAGATG
				AGAATGTGCCTACTGCTCCTGAT
				CCTCCAAGT

492	TBC1	NM_015188.1	5451-	TTCCAAGGAATGCACTAAGCCTT
	D12		5550	CAGTCTTTTTAGACTGACAGTAC
				TGGCAGCTAAAATATTGTACTGT
				ATCTTCTCTTGAGCCCAGTATGT
				AGGAAATA

493	TBCE	NM_00107951	1541-	TATGCTGAAAAACCAGCTACTAA
		5.2	1640	CACTGAAGATAAAATACCCTCAT
				CAACTTGATCAGAAAGTCCTGGA
				GAAACAACTGCCGGGCTCCATG
				ACAATTCAA

494	TBK1	NM_013254.2	1611-	ACCAGTCTTCAGGATATCGACAG
			1710	CAGATTATCTCCAGGTGGATCAC
				TGGCAGACGCATGGGCACATCA
				AGAAGGCACTCATCCGAAAGAC
				AGAAATGTAG

495	TBP	NM_003194.4	1441-	TGTAAGTGCCCACCGCGGGATGC
			1540	CGGGAAGGGGCATTATTTGTGCA
				CTGAGAACACCGCGCAGCGTGA
				CTGTGAGTTGCTCATACCGTGCT
				GCTATCTGG

496	TCF20	NM_181492.2	6765-	CCAGGCCTGTGTTGCCAGAGCTG
			6864	GCAGTGTGAGCTGTAGGCAGGG
				ACGGGGAGGGACTGTCGCTGTG
				ATCAGAGTGGGTTAAGCTGACCA
				GGAACACCCA

497	TCF7L2	NM_030756.4	2067-	GGCCCACCTGTCCATGATGCCTC
			2166	CGCCACCCGCCCTCCTGCTCGCT
				GAGGCCACCCACAAGGCCTCCG
				CCCTCTGTCCCAACGGGGCCCTG
				GACCTGCCC

498	TCP1	NM_030752.2	254-	GTGTTCGGTGACCGCAGCACTGG
			353	GGAAACGATCCGCTCCCAAAAC
				GTTATGGCTGCAGCTTCGATTGC
				CAATATTGTAAAAAGTTCTCTTG
				GTCCAGTTG

499	TFCP2	NM_005653.4	2271-	CCTCTGAAAACGGCCCTCTTGAA
			2370	GGGGGATATGAATGGAGATTTG
				AAGGTCTGCAAGAACCTGACTCG
				TCTGACTGTGTGTGGAGGAGTCC
				AGGCCATGG

500	TGIF1	NM_003244.2	1041-	ACCTCAACCAGGACTTCAGTGGA
			1140	TTTCAGCTTCTAGTGGATGTTGC
				ACTCAAACGGGCTGCAGAGATG
				GAGCTTCAGGCAAAACTTACAGC
				TTAACCCAT

501	TGIF1b	NM_173208.1	691-	CCCCGGGATCAGTTTTGGCTCGT
			790	CCATCAGTGATCTGCCATACCAC
				TGTGACTGCATTGAAAGATGTCC
				CTTTCTCTCTCTGCCAGTCGGTC
				GGTGTGGG

502	TIAM1	NM_003253.2	5293-	CCTAACTCTGCCCACCCTCCTGT
			5392	ACCGTCGACAAGAATGTCCCCTT
				AGGTCGCGCTCTTGCACACACGG
				TTTTGGCAGCTGACTTGGTTCTG
				AAGCCATG

503	TIMM8B	ENST0000050	339-	GAATGACAGAAGCAAAGGACTT
		4148.1	438	GTTACTAAGCAGATTTAAGGGTC
				AGTGGGGGAAGGCTATCAACCC
				ATTGTCAGATCAGCATCAGGCTG
				TTATCAAGTC

504	TM2D2	NM_078473.2	2970-	ACCCATCATCCATCTGCCCACAA
			3069	ACCTGGCCAAATGTGATACAACC
				TGAAAACCTGATGGACTAAAGG
				AGTACTATTTAACAATTGATTGC
				CTTTGCACT

505	TM9SF1	NM_006405.6	1996-	CGCTGGTGGTGGCGATCTGTGCT
			2095	GAGTGTTGGCTCCACCGGCCTCT
				TCATCTTCCTCTACTCAGTTTTC
				TATTATGCCCGGCGCTCCAACAT
				GTCTGGGG

506	CCDC72	NM_015933.4	124-	GAGGAGCAGAAGAAACTCGAGG
			223	AGCTAAAAGCGAAGGCCGCGGG
				GAAGGGGCCCTTGGCCACAGGT
				GGAATTAAGAAATCTGGCAAAA
				AGTAAGCTGTTC

507	TMBIM6	NM_003217.2	2282-	CTCTCCCTATTCACAACCAGTGC
			2381	ACAGTTTGACACAGTGGCCTCAG
				GTTCACAGTGCACCATGTCACTG
				TGCTATCCTACGAAATCATTTGT
				TTCTAAGT

508	TMC8	NM_152468.4	2238-	AGGCCAATGCCAGGGCCATCCA
			2337	CAGGCTCCGGAAGCAGCTGGTGT
				GGCAGGTTCAGGAGAAGTGGCA
				CCTGGTGGAGGACCTGTCGCGAC
				TGCTGCCGGA

509	TMCO1	NM_019026.3	992-	TCATTTACATAAGTATTTTCTGT
			1091	GGGACCGACTCTCAAGGCACTGT
				GTATGCCCTGCAAGTTGGCTGTC
				TATGAGCATTTAGAGATTTAGAA
				GAAAAATT

510	TMEM	NM_00110082	7652-	AGGAGAATAAATGTTGGAGGGG
	170B	9.2	7751	TAATACACAAAAACAAAGGCAT
				ATTTGATGAAGTACCCTGTGTTA
				TGTGAACACAATTTCCCCTTCTG
				TTAAGACTAT

511	TMEM	NM_00108054	1313-	GCTCTGTGAAGGCAATGAGTGTC
	218	6.2	1412	ACTTCCCTCTGCTCTAATAAAGC
				AATAAATAATAGCTAAAGGGCT
				GACTTTCACTTCGAACTCTTGGC
				CACGGCTTT

512	TMEM70	NM_017866.5	1952-	GGTGGTTAGCTATACGGGAAATG
			2051	GTAAGTAGTGTTGTCTTCAGTAT
				CTTAATTTGTTTCTGCAACTGTG
				CACTCCTCCCTTGGTGGCACCCT
				ATGGGTGT

513	TMSB4X	NM_021109.3	286-	TTAACTTTGTAAGATGCAAAGAG
			385	GTTGGATCAAGTTTAAATGACTG
				TGCTGCCCCTTTCACATCAAAGA
				ACTACTGACAACGAAGGCCGCG
				CCTGCCTTT

514	TNFR	NM_001561.5	1848-	GCCTGGAGGAAGTTTTGGAAAG
	SF9		1947	AGTTCAAGTGTCTGTATATCCTA
				TGGTCTTCTCCATCCTCACACCT
				TCTGCCTTTGTCCTGCTCCCTTT
				TAAGCCAGG

515	TNF	NM_003808.3	811-	AGTCAGAGAGCCGGCACTCTCA
	SF13		910	GTTGCCCTCTGGTTGAGTTGGGG
				GGCAGCTCTGGGGGCCGTGGCTT
				GTGCCATGGCTCTGCTGACCCAA
				CAAACAGAG

516	TNFSF8	NM_001244.3	519-	CCCTCAAAGGAGGAAATTGCTCA
			618	GAAGACCTCTTATGTATCCTGAA
				AAGGGCTCCATTCAAGAAGTCAT
				GGGCCTACCTCCAAGTGGCAAA
				GCATCTAAA

517	TOMM7	NM_019059.2	251-	TCTGGCTCGGATAAGAGATGGG
			350	ACATCATTCAGTCACTAGTTGGA
				TGGCACAAGGCTCTTCACAGACG
				CATCTGTAGCAGAGTGGATCTTG
				TACTAACTT

518	TP53	NM_005657.2	5591-	TACTTCCTGTGCCTTGCCAGTGG
	BP1		5690	GATTCCTTGTGTGTCTCATGTCT
				GGGTCCATGATAGTTGCCATGCC
				AACCAGCTCCAGAACTACCGTAA
				TTATCTGT

519	TPR	NM_003292.2	7194-	TCTCCCCTCCACCAGCCAGGATC
			7293	CTCCTTCTAGCTCATCTGTAGAT
				ACTAGTAGTAGTCAACCAAAGCC
				TTTCAGACGAGTAAGACTTCAGA
				CAACATTG

520	TPT1	NM_003295.3	18-	GCCTGCGTCGCTTCCGGAGGCGC
			117	AGCGGGCGATGACGTAGAGGGA
				CGTGCCCTCTATATGAGGTTGGG
				GAGCGGCTGAGTCGGCCTTTTCC
				GCCCGCTCC

521	TRAF	NM_147686.3	2449-	GCCAGTGTCCCATATGTTCCTCC
	3IP2		2548	TGACAGTTTGATGTGTCCATTCT
				GGGCCTCTCAGTGCTTAGCAAGT
				AGATAATGTAAGGGATGTGGCA
				GCAAATGGA

522	TRAF6	NM_145803.1	1840-	CACCCGCTTTGACATGGGTAGCC
			1939	TTCGGAGGGAGGGTTTTCAGCCA
				CGAAGTACTGATGCAGGGGTAT
				AGCTTGCCCTCACTTGCTCAAAA
				ACAACTACC

523	LBA1	NM_014831.2	10132-	CTGGGAAACCTTCATGCCTCTCT
			10231	GATGGTTACTGCCCACCCTTACC
				CCACCCCTCAGCTCAGCCTGGTA
				TGGAAAGCAAGGTGCACGTTGG
				TCTTTGATT

524	TRIM21	NM_003141.3	1637-	TCTGCAGAGGCATCCGGATCCCA
			1736	GCAAGCGAGCTTTAGCAGGGAA
				GTCACTTCACCATCAACATTCCT
				GCCCCAGATGGCTTTGTGATTCC
				CTCCAGTGA

525	TRIM32	NM_012210.3	2681-	GTGCTACCAAAGGGGATACACA
			2780	AGCCCTTTAGGAAGCAGTACCTC
				TCGCCTGGAGGATCTGTGCCATC
				TTGGATTGAGAATTGCAGATGTG
				ACAGAATGG

526	TRIM39	NM_021253.3	3141-	CTGCTATTCGGGTAATCTTCACA
			3240	GAAATGACTGAGAGAAGAATCT
				GCAGTTTACTGAGGGCATTTCAG
				TTCCTCCTACCACCTCAACAGGA
				CTTTGTCCA

527	TRIM	NM_172016.2	2841-	CTCTATACCAATAAGTCAGTCAC
	39b		2940	CTTGCTCCTCTCCAGAGGCAAAG
				TGGAAGAGATCCTGCAAGACAC
				ATCTATCCTTTCACAGTGTTCCC
				AAGGGAACT

528	TRRAP	NM_003496.3	12169-	AGTTGATGAACCCATCATGCTGG
			12268	TTTTTCTCTGAGCACAAAGTTTT
				AGGCTGTACACAGCCAGCCTTGG
				GAATCTCGTTGAGCGTTCGGCGT
				GGATCCAC

529	TSC1	NM_000368.4	8068-	CCCCAGACCAACCCTTCCCTCCC
			8167	TTTCCCCACCTCTTACAGTGTTT
				GGGACAGGAGGGTATGGTGCTGC
				TCTGTGTAGCAAGTACTTTGGCT
				ATTGAAAGA

530	TTC9	NM_015351.1	4050-	TACTAATCAGGCATCTGACCTGC
			4149	ACTGTCATCCCCTGCCTGGACTT
				TTGCGATGGACTCTTTGGGGGAA
				AAACTAACGCTTTTTAATTATTG
				TGAAAGCA

531	TTN	NM_133378.4	850-	TCGACTGCTCAGATCTCAGAATC
			949	AAGACAAACCCGAATTGAAAAG
				AAGATTGAAGCCCACTTTGATGC
				CAGATCAATTGCAACAGTTGAGA
				TGGTCATAG

532	TUBB	NM_178014.2	2223-	CAAAAAAGAATGAACACCCCTG
			2322	ACTCTGGAGTGGTGTATACTGCC
				ACATCAGTGTTTGAGTCAGTCCC
				CAGAGGAGAGGGGAACCCTCCT
				CCATCTTTTT

533	TUG1	NR_002323.2	7082-	TAAGCTAGAGGTCATGGTCACTG
			7181	AAATTACTTTCCAAAGTGGAAGA
				CAAAATGAAACAGGAACTGAGG
				GAATATTTAAGATCCCACAGAAG
				CGTAAAAAT

534	TXN	NM_003329.3	152-	TTGGATCCATTTCCATCGGTCCT
			251	TACAGCCGCTCGTCAGACTCCAG
				CAGCCAAGATGGTGAAGCAGATC
				GAGAGCAAGACTGCTTTTCAGGA
				AGCCTTGG

535	TXNDC	NM_032731.3	378-	TCATCTACTGCCAAGTAGGAGAA
	17		477	AAGCCTTATTGGAAAGATCCAAA
				TAATGACTTCAGAAAAAACTTGA
				AAGTAACAGCAGTGCCTACACTA
				CTTAAGTA

536	TXNRD1	NM_00109377	3348-	CTCAGTTGCAGCACTGAGTGGTC
		1.2	3447	AAAATACATTTCTGGGCCACCTC
				AGGGAACCCATGCATCTGCCTGG
				CATTTAGGCAGCAGAGCCCCTGA
				CCGTCCCC

537	TXNR	NM_182743.2	2438-	TGTTGCATGGAAGGGATAGTTTG
	D1b		2537	GCTCCCTTGGAGGCTATGTAGGC
				TTGTCCCGGGAAAGAGAACTGTC
				CTGCAGCTGAAATGGACTGTTCT
				TTACTGAC

538	U2AF2	NM_007279.2	2871-	TTTATGGCCAAACTATTTTGAAT
			2970	TTTGTTGTCCGGCCCTCAGTGCC
				CTGCCCTCTCCCTTACCAGGACC
				ACAGCTCTGTTCCTTCGGCCTCT
				GGTCCTCT

539	UBA1	NM_003334.3	3307-	CCGCCACGTGCGGGCGCTGGTGC
			3406	TTGAGCTGTGCTGTAACGACGAG
				AGCGGCGAGGATGTCGAGGTTC
				CCTATGTCCGATACACCATCCGC
				TGACCCCGT

540	UBC	NM_021009.3	1876-	TGCAGATCTTCGTGAAGACCCTG
			1975	ACTGGTAAGACCATCACTCTCGA
				AGTGGAGCCGAGTGACACCATT
				GAGAATGTCAAGGCAAAGATCC
				AAGACAAGGA

541	UBE2G1	NM_003342.4	685-	ACGCTGGCTCCCTATCCACACTG
			784	TGGAAACCATCATGATTAGTGTC
				ATTTCTATGCTGGCAGACCCTAA
				TGGAGACTCACCTGCTAATGTTG
				ATGCTGCG

542	UBE2I	NM_194259.2	288-	CTGCTCTGCTGACTGGGGAAGTC
			387	ATCGTGCCACCCAGAACCTGAGT
				GCGGGCCTCTCAGAGCTCCTTCG
				TCCGTGGGTCTGCCGGGGACTGG
				GCCTTGTC

543	UBTF	NM_00107668	2724-	GGGGGTCCCAAAGAGTTTGATG
		3.1	2823	AGGCCCTCCACACCTGCGGCCCA
				ATCCAAGGTGGGGTGGAAGCTT
				GGGGAAGACCCATTCCTTCCCAG
				AGGGGCCTGC

544	UQCRQ	NM_014402.4	97-	TGACGCGGATGCGGCATGTGATC
			196	AGCTACAGCTTGTCACCGTTCGA
				GCAGCGCGCCTATCCGCACGTCT
				TCACTAAAGGAATCCCCAATGTT
				CTGCGCCG

545	USP16	NM_00103241	2487-	TCTATTCCTTATATGGAGTTGTT
		0.1	2586	GAACACAGTGGTACTATGAGGTC
				GGGGCATTACACTGCCTATGCCA
				AGGCAAGAACCGCAAATAGTCAT
				CTCTCTAA

546	USP21	NM_012475.4	1499-	CCTTTTCACTAAGGAAGAAGAGC
			1598	TAGAGTCGGAGAATGCCCCAGT
				GTGTGACCGATGTCGGCAGAAA
				ACTCGAAGTACCAAAAAGTTGA
				CAGTACAAAGA

547	USP34	NM_014709.3	10104-	AGGAGCACACTGTAGACAGCTG
			10203	CATCAGTGACATGAAAACAGAA
				ACCAGGGAGGTCCTGACCCCAA
				CGAGCACTTCTGACAATGAGACC
				AGAGACTCCTC

548	USP5	NM_003481.2	2720-	AGAGCAGAGGGGCAGCGATAGA
			2819	CTCTGGGGATGGAGCAGGACGG
				GGACGGGAGGGGCCGGCCACCT
				GTCTGTAAGGAGACTTTGTTGCT
				TCCCCTGCCCC

549	USP9Y	NM_004654.3	86-	GGTGTGGAAAGACTTTTCTGGGC
			185	TCAGAGGTGAAACTGACCCTTGT
				GTATCAGCAGCATTTCTGACTGA
				CTGAGAGAGTGTAGTGATTAACA
				GAGTTGTG

550	VPS37C	NM_017966.4	2579-	TTATAAAGAGAAATCACTAATGG
			2678	ACTCTACTGGTTTGAGTGCTTCT
				GAGCTGGATGACCGACCGCCTGT
				ATGTTTGTGTAATTAATTGCCAT
				AATAAACT

551	WDR1	NM_005112.4	2325-	AACTGTTGCCTGTCAGTGTTTAC
			2424	AAACTAGTGCGTTGACGGCACCG
				TGTCCAAGTTTTTAGAACCCTTG
				TTAGCCAGACCGAGGTGTCCTGG
				TCACCGTT

552	WDR91	NM_014149.3	2777-	CAGGCTCTCCTGTTGCTTTGCCA
			2876	TGGAGCCAGGTCAGCTCTCTGTC
				TGTTCTGCTGGGTAACAAGGTTT
				GGCAGTTCCTGTTTCTCTGGGCT
				TAAGTCAA

553	XCL2	NM_003175.3	378-	GTAGTCTCTGGCACCCTGTCCGT
			477	CTCCAGCCAGCCAGCTCATTTCA
				CTTTACACCCTCATGGACTGAGA
				TTATACTCACCTTTTATGAAAGC
				ACTGCATG

554	XPC	NR_027299.1	3168-	CTGGATGGTGGTGCATCCGTGAA
			3267	TGCGCTGATCGTTTCTTCCAGTT
				AGAGTCTTCATCTGTCCGACAAG
				TTCACTCGCCTCGGTTGCGGACC
				TAGGACCA

555	YPEL1	NM_013313.4	3672-	GCTCATTTTTAAACCAAATGAAC
			3771	AGACCATGAGCTGGCTTCAGGG
				GAAGTGCTATTCACAGGACCATA
				TCCACCACCCTCTTAAATTCCTA
				AACAATATC

556	ZMIZ1	NM_020338.3	7171-	ATGATCACAGGTGATTCACACGT
			7270	ACACACATAAACACACCCACCA
				GTGCAGCCTGAAGTAACTCCCAC
				AGAAACCATCATCGTCTTTGTAC
				ATCGTATGT

557	ZNF143	NM_003442.5	2292-	TATCAGATCACAAACTCCTAGAG
			2391	TCTACATGCAAGACTAGTAAAGT
				CTTATGGAGTCTTATGATGGATT
				TTTAACTTCCCGTGGAAAAAAAA
				ATAAAGGC

558	ZNF239	NM_00109928	1496-	AGAGCTCCAACCTTCACATCCAC
		3.1	1595	CAGCGGGTTCACAAGAAAGATC
				CTCGCTAACTGACATTAGCCCAT
				TCAGGTCTTCACAGCGCTCATAC
				TGTAAAAAC

559	ZNF341	NM_032819.4	3247-	CAGACGGTTCCCCACAGCATCCT
			3346	CAGACAGCTCTGTGATGTAGCTT
				TTAGGAGGCACTCAGGTGTCACG
				GCTAGACTGCAGCTATGAGACA
				GATCTGGCT

C. Polymerase Chain Reaction (PCR) Techniques
Another suitable quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. The first step is the isolation of mRNA from a target sample (e.g., typically total RNA isolated from human PBMC). mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples.
General methods for mRNA extraction are well known in the art and are described in standard textbooks of molecular biology. In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, according to the manufacturer's instructions. Exemplary commercial products include TRI-REAGENT, Qiagen RNeasy mini-columns, MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNA Stat-60 (Tel-Test). Conventional techniques such as cesium chloride density gradient centrifugation may also be employed.
The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. See, e.g., manufacturer's instructions accompanying the product GENEAMP RNA PCR kit (Perkin Elmer, Calif., USA). The derived cDNA can then be used as a template in the subsequent RT-PCR reaction.
The PCR step generally uses a thermostable DNA-dependent DNA polymerase, such as the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TAQMAN® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. In one embodiment, the target sequence is shown in Table III. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
TaqMan® RT-PCR can be performed using commercially available equipment. In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7900® Sequence Detection System®. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optic cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
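By way of illustration only, the determination of a threshold cycle from per-cycle fluorescence values can be sketched as follows. The threshold rule used here (mean of the early baseline cycles plus a multiple of their standard deviation) and the names `threshold_cycle`, `baseline_cycles` and `n_sd` are assumptions made for this sketch; they are not the algorithm of any particular instrument software, and the readings are hypothetical.

```python
from statistics import mean, stdev


def threshold_cycle(fluorescence, baseline_cycles=5, n_sd=10.0):
    """Return the 1-based cycle at which fluorescence first exceeds the
    threshold, or None if it never does.

    Assumed rule (illustrative only): threshold = baseline mean
    + n_sd * baseline standard deviation, with the baseline taken from
    the first `baseline_cycles` readings.
    """
    baseline = fluorescence[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for cycle, value in enumerate(fluorescence, start=1):
        # Only cycles after the baseline window can be called as Ct.
        if cycle > baseline_cycles and value > threshold:
            return cycle
    return None


# Hypothetical readings: a flat baseline followed by rapid signal growth.
readings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.05, 1.30, 2.10, 4.00, 7.90]
ct = threshold_cycle(readings)
print(ct)
```

A lower Ct indicates that more template was present at the start of the reaction, because the signal crosses the threshold after fewer cycles.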
To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
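As a non-limiting sketch, normalization of a target gene's Ct to an internal standard can be expressed as a difference of Ct values. The comparative Ct (2^-ΔΔCt) calculation shown below is a widely used convention that is assumed here for illustration; it is not recited above, and it presumes approximately 100% amplification efficiency for both target and reference. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target gene normalized to the internal standard."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target in a sample versus a control.

    Assumes a doubling of product per cycle (100% efficiency).
    """
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the reference gene is stable at 18 cycles,
# while the target crosses threshold two cycles earlier in the sample.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)
```

Because the reference Ct is subtracted within each sample, differences in the amount of input RNA between samples largely cancel out of the comparison.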
Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR.
In another PCR method, i.e., the MassARRAY-based gene expression profiling method (Sequenom, Inc., San Diego, CA), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions except a single base, and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis. The cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated.
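The peak-area ratio calculation can be illustrated with a minimal sketch. It assumes that the competitor and the cDNA amplify and ionize with equal efficiency, so that the ratio of peak areas equals the ratio of starting amounts; the function name, units and values are hypothetical and are not taken from any vendor software.

```python
def quantify_cdna(area_cdna, area_competitor, competitor_amount):
    """Estimate the cDNA amount from a known amount of spiked competitor.

    Assumption (illustrative): peak area is proportional to starting
    amount, with the same proportionality for both species.
    """
    if area_competitor <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_amount * area_cdna / area_competitor


# Hypothetical spectrum: the cDNA peak is three times the competitor
# peak, and 2.0 units of competitor were spiked into the reaction.
estimate = quantify_cdna(area_cdna=1500.0, area_competitor=500.0,
                         competitor_amount=2.0)
print(estimate)
```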
Still other embodiments of PCR-based techniques which are known to the art and may be used for gene expression profiling include, e.g., differential display, amplified fragment length polymorphism (iAFLP), and BeadArray™ technology (Illumina, San Diego, CA) using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression; and high coverage expression profiling (HiCEP) analysis.
D. Microarrays
Differential gene expression can also be identified or confirmed using the microarray technique. Thus, the expression profile of lung cancer-associated genes can be measured in either fresh or paraffin-embedded tissue, using microarray technology. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. Just as in the other methods and compositions herein, the source of mRNA is total RNA isolated from whole blood of controls and patient subjects.
In one embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. In one embodiment, all 559 nucleotide sequences from Table III are applied to the substrate. The microarrayed genes, immobilized on the microchip, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols.
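As an illustrative sketch of the dual color comparison, the relative abundance of each transcript can be expressed as a log2 ratio of the two channel intensities, and genes differing by at least two-fold can be flagged. The gene labels, intensities and function names below are hypothetical, and background subtraction and channel normalization, which a real analysis would need, are omitted.

```python
import math


def log2_ratios(intensity_a, intensity_b):
    """Per-gene log2(A/B) for genes measured in both channels."""
    return {
        gene: math.log2(intensity_a[gene] / intensity_b[gene])
        for gene in intensity_a
        if gene in intensity_b
    }


def at_least_two_fold(ratios, min_abs_log2=1.0):
    """Genes whose expression differs by two-fold or more (|log2| >= 1)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= min_abs_log2)


# Hypothetical spot intensities for two separately labeled RNA sources.
channel_a = {"gene_A": 800.0, "gene_B": 500.0, "gene_C": 150.0}
channel_b = {"gene_A": 200.0, "gene_B": 480.0, "gene_C": 600.0}

ratios = log2_ratios(channel_a, channel_b)
flagged = at_least_two_fold(ratios)
print(flagged)
```

Working on a log2 scale treats a four-fold increase (+2) and a four-fold decrease (-2) symmetrically, which a raw ratio does not.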
Other useful methods, summarized in U.S. Pat. No. 7,081,340, which is incorporated by reference herein, include Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). Briefly, SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10 to 14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many tags are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), both of which are incorporated herein by reference.
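The tag-counting step of SAGE can be sketched as follows, assuming the tags have already been extracted from the sequenced serial molecules. The 10 bp tag sequences, the tag-to-gene mapping and the function name are hypothetical placeholders used only to show the bookkeeping.

```python
from collections import Counter


def tag_abundance(tags, tag_to_gene):
    """Fraction of all observed tags attributable to each gene.

    Tags with no entry in `tag_to_gene` are pooled as "unassigned".
    """
    counts = Counter(tags)
    total = sum(counts.values())
    per_gene = Counter()
    for tag, n in counts.items():
        per_gene[tag_to_gene.get(tag, "unassigned")] += n
    return {gene: n / total for gene, n in per_gene.items()}


# Hypothetical tags read from sequenced concatemers.
observed = ["AAACCCGGGT"] * 3 + ["TTTGGGCCCA", "ACGTACGTAC"]
mapping = {"AAACCCGGGT": "gene_A", "TTTGGGCCCA": "gene_B"}

abundance = tag_abundance(observed, mapping)
print(abundance)
```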
Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS), described by Brenner et al., Nature Biotechnology 18:630-634 (2000) (which is incorporated herein by reference), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
E. Immunohistochemistry
Immunohistochemistry methods are also suitable for detecting the expression levels of the gene expression products of the informative genes described for use in the methods and compositions herein. Antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, or other protein-binding ligands specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Protocols and kits for immunohistochemical analyses are well known in the art and are commercially available.

III. COMPOSITIONS OF THE INVENTION

The methods for diagnosing lung cancer described herein, which utilize defined gene expression profiles, permit the development of simplified diagnostic tools for diagnosing lung cancer, e.g., NSCLC vs. non-cancerous nodule. Thus, a composition for diagnosing lung cancer in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which said polynucleotides or oligonucleotides or ligands are immobilized. In another embodiment, the composition is a kit containing the relevant 5 or more polynucleotides or oligonucleotides or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label.
In one embodiment, a composition for diagnosing lung cancer in a mammalian subject includes 5 or more PCR primer-probe sets. Each primer-probe set amplifies a different polynucleotide sequence from a gene expression product of 5 or more informative genes found in the blood of the subject. These informative genes are selected to form a gene expression profile or signature which is distinguishable between a subject having lung cancer and a subject having a non-cancerous nodule. Changes in expression in the genes in the gene expression profile from that of a reference gene expression profile are correlated with a lung cancer, such as non-small cell lung cancer (NSCLC).
In one embodiment of this composition, the informative genes are selected from among the genes identified in Table I. In another embodiment of this composition, the informative genes are selected from among the genes identified in Table II. This collection of genes is those for which the gene product expression is altered (i.e., increased or decreased) versus the same gene product expression in the blood of a reference control (i.e., a patient having a non-cancerous nodule). In one embodiment, polynucleotide or oligonucleotide or ligands, i.e., probes, are generated to 5 or more informative genes from Table I or Table II for use in the composition (the CodeSet). An example of such a composition contains probes to a targeted portion of the 559 genes of Table I. In another embodiment, probes are generated to all 559 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 539 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 3 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 5 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 10 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 15 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 20 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 25 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 30 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 35 genes from Table I or Table II for use in the composition. 
In yet another embodiment, probes are generated to the first 40 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 45 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 50 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 60 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 65 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 70 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 75 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 80 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 85 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 90 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 95 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 100 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 200 genes from Table I for use in the composition. In yet another embodiment, probes are generated to 300 genes from Table I for use in the composition. Still other embodiments employ probes to a targeted portion of other combinations of the genes in Table I or Table II. 
The selected genes from the Table need not be in rank order; rather, any combination that clearly shows a difference in expression between the reference control and the diseased patient is useful in such a composition.
In one embodiment of the compositions described above, the reference control is a non-healthy control (NHC) as described above. In other embodiments, the reference control may be any class of controls as described above in “Definitions”.
The compositions based on the genes selected from Table I or Table II described herein, optionally associated with detectable labels, can be presented in the format of a microfluidics card, a chip or chamber, or a kit adapted for use with the Nanostring, PCR, RT-PCR or Q PCR techniques described above. In one aspect, such a format is a diagnostic assay using TAQMAN® Quantitative PCR low density arrays. In another aspect, such a format is a diagnostic assay using the Nanostring nCounter platform.
For use in the above-noted compositions the PCR primers and probes are preferably designed based upon intron sequences present in the gene(s) to be amplified selected from the gene expression profile. Exemplary target sequences are shown in Table III. The design of the primer and probe sequences is within the skill of the art once the particular gene target is selected. The particular methods selected for the primer and probe design and the particular primer and probe sequences are not limiting features of these compositions. A ready explanation of primer and probe design techniques available to those of skill in the art is summarized in U.S. Pat. No. 7,081,340, with reference to publicly available tools such as DNA BLAST software, the Repeat Masker program (Baylor College of Medicine), Primer Express (Applied Biosystems), MGB assay-by-design (Applied Biosystems), and Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers).
Optimal PCR primers and probes used in the compositions described herein are generally 17-30 bases in length and contain about 20-80%, such as, for example, about 50-60%, G+C bases. Melting temperatures of between 50 and 80° C., e.g., about 50 to 70° C., are typically preferred.
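The length, G+C content and melting temperature ranges stated above can be checked for a candidate oligonucleotide with a short script. The following is a minimal sketch only: the melting temperature formula is a commonly used basic G+C approximation chosen here as an assumption (the specification does not prescribe a Tm calculation), and the candidate sequence and function names are hypothetical and are not taken from Table III.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def approx_tm(seq: str) -> float:
    """Approximate melting temperature in deg C.

    Assumption: basic G+C formula Tm = 64.9 + 41 * (G + C - 16.4) / N.
    Dedicated design tools use more detailed thermodynamic models.
    """
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def check_primer(seq, length=(17, 30), gc=(20.0, 80.0), tm=(50.0, 80.0)):
    """Return pass/fail flags for the ranges stated in the text."""
    return {
        "length_ok": length[0] <= len(seq) <= length[1],
        "gc_ok": gc[0] <= gc_percent(seq) <= gc[1],
        "tm_ok": tm[0] <= approx_tm(seq) <= tm[1],
    }


# Hypothetical 20-base candidate used only for illustration.
candidate = "ATGCGTACCTGAGCTAGGCT"
report = check_primer(candidate)
print(len(candidate), gc_percent(candidate), round(approx_tm(candidate), 2), report)
```

Narrowing the `gc` and `tm` arguments to (50.0, 60.0) and (50.0, 70.0) applies the narrower ranges mentioned in the text instead of the broad ones.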
In another aspect, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I. In another embodiment, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides immobilized on a substrate, wherein the plurality of genomic probes hybridize to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I or Table II. This type of composition relies on recognition of the same gene profiles as described above for the Nanostring compositions but employs the techniques of a cDNA array. Hybridization of the immobilized polynucleotides in the composition to the gene expression products present in the blood of the subject is employed to quantitate the expression of the informative genes selected from among the genes identified in Table I or Table II to generate a gene expression profile for the patient, which is then compared to that of a reference sample. As described above, depending upon the identification of the profile (i.e., that of genes of Table I or subsets thereof, that of genes of Table II or subsets thereof), this composition enables the diagnosis and prognosis of NSCLC lung cancers. Again, the selection of the polynucleotide sequences, their length and labels used in the composition are routine determinations made by one of skill in the art in view of the teachings of which genes can form the gene expression profiles suitable for the diagnosis and prognosis of lung cancers.
In yet another aspect, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject. In another embodiment, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject. The gene expression profile contains the genes of Table I or Table II, as described above for the other compositions. This composition enables detection of the proteins expressed by the genes in the indicated Tables. While preferably the ligands are antibodies to the proteins encoded by the genes in the profile, it would be evident to one of skill in the art that various forms of antibody, e.g., polyclonal, monoclonal, recombinant, chimeric, as well as fragments and components (e.g., CDRs, single chain variable regions, etc.) may be used in place of antibodies. Such ligands may be immobilized on suitable substrates for contact with the subject's blood and analyzed in a conventional fashion. In certain embodiments, the ligands are associated with detectable labels. These compositions also enable detection of changes in proteins encoded by the genes in the gene expression profile from those of a reference gene expression profile. Such changes correlate with lung cancer in a manner similar to that for the PCR and polynucleotide-containing compositions described above.
For all of the above forms of diagnostic/prognostic compositions, the gene expression profile can, in one embodiment, include at least the first 25 of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 10 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 15 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 20 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 30 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 40 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 50 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 60 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 70 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 80 or more of the informative genes of Table I or Table II. 
In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 90 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 100 of the informative genes of Table II. In one embodiment, for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include at least the first 100 of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 200 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 300 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 400 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 500 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 539 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 559 of the informative genes of Table I.
These compositions may be used to diagnose lung cancers, such as stage I or stage II NSCLC. Further these compositions are useful to provide a supplemental or original diagnosis in a subject having lung nodules of unknown etiology.

IV. DIAGNOSTIC METHODS OF THE INVENTION

All of the above-described compositions provide a variety of diagnostic tools which permit a blood-based, non-invasive assessment of disease status in a subject. Use of these compositions in diagnostic tests, which may be coupled with other screening tests, such as a chest X-ray or CT scan, increases diagnostic accuracy and/or directs additional testing.
Thus, in one aspect, a method is provided for diagnosing lung cancer in a mammalian subject. This method involves identifying a gene expression profile in the blood of a mammalian, preferably human, subject. In one embodiment, the gene expression profile includes 100 or more gene expression products of 100 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 100 or more informative genes from the genes of Table I. In another embodiment, the gene expression profile includes 10 or more gene expression products of 10 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table II. In another embodiment, the gene expression profile includes 10 or more gene expression products of 5 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table II. Comparison of a subject's gene expression profile with a reference gene expression profile permits identification of changes in expression of the informative genes that correlate with a lung cancer (e.g., NSCLC). This method may be performed using any of the compositions described above. In one embodiment, the method enables the diagnosis of a cancerous tumor from a benign nodule.
In another aspect, use of any of the compositions described herein is provided for diagnosing lung cancer in a subject.
The diagnostic compositions and methods described herein provide a variety of advantages over current diagnostic methods. Among such advantages are the following. As exemplified herein, subjects with cancerous tumors are distinguished from those with benign nodules. These methods and compositions provide a solution to the practical diagnostic problem of whether a patient who presents at a lung clinic with a small nodule has malignant disease. Patients with an intermediate-risk nodule would clearly benefit from a non-invasive test that would move the patient into either a very low-likelihood or a very high-likelihood category of disease risk. An accurate estimate of malignancy based on a genomic profile (i.e. estimating a given patient has a 90% probability of having cancer versus estimating the patient has only a 5% chance of having cancer) would result in fewer surgeries for benign disease, more early stage tumors removed at a curable stage, fewer follow-up CT scans, and reduction of the significant psychological costs of worrying about a nodule. The economic impact would also likely be significant, such as reducing the current estimated cost of additional health care associated with CT screening for lung cancer, i.e., $116,000 per quality adjusted life-year gained. A non-invasive blood genomics test that has a sufficient sensitivity and specificity would significantly alter the post-test probability of malignancy and thus, the subsequent clinical care.
A desirable advantage of these methods over existing methods is that they are able to characterize the disease state from a minimally-invasive procedure, i.e., by taking a blood sample. In contrast, current practice for classification of cancer tumors from gene expression profiles depends on a tissue sample, usually a sample from a tumor. In the case of very small tumors a biopsy is problematic and clearly if no tumor is known or visible, a sample from it is impossible. No purification of tumor is required, as is the case when tumor samples are analyzed. A recently published method depends on brushing epithelial cells from the lung during bronchoscopy, a method which is also considerably more invasive than taking a blood sample. Blood samples have an additional advantage, which is that the material is easily prepared and stabilized for later analysis, which is important when messenger RNA is to be analyzed.
The 559 classifier described herein showed a ROC-AUC of 0.81 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 46%. When the nodule classification accuracy is assessed by size without using a specific threshold for sensitivity, as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases. In one embodiment, the accuracy of the gene classifier is about 89% for nodules ≤8 mm. In another embodiment, the accuracy of the gene classifier is about 75% for nodules >8 to about ≤12 mm. In yet another embodiment, the accuracy of the gene classifier is about 68% for nodules >12 to about ≤16 mm. In another embodiment, the accuracy of the gene classifier is about 53% for nodules >16 mm. See examples below.
In one embodiment, for nodules about <10 mm, the specificity is about 54% and the ROC-AUC is about 0.85 at about 90% sensitivity. In another embodiment, for larger nodules, about >10 mm, the specificity is about 24% and the ROC-AUC is about 0.71 at about 90% sensitivity.
The 100 Classifier described herein showed a ROC-AUC of 0.82 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 62%. In another embodiment, when the sensitivity is about 79%, the specificity is about 68%. In one embodiment, when the sensitivity is about 71%, the specificity is about 75%. See examples below.
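The effect of the stated sensitivity and specificity values on the post-test probability of malignancy can be illustrated with Bayes' rule. The sketch below uses the operating points quoted above (about 90% sensitivity with about 46% or about 62% specificity); the 30% pre-test probability is a hypothetical value chosen only for illustration and is not a figure from this disclosure, and the function name is likewise an assumption.

```python
def post_test_probabilities(pre_test, sensitivity, specificity):
    """Return (P(cancer | positive test), P(cancer | negative test))."""
    # Overall probability of a positive and of a negative test result.
    p_pos = sensitivity * pre_test + (1 - specificity) * (1 - pre_test)
    p_neg = (1 - sensitivity) * pre_test + specificity * (1 - pre_test)
    return (sensitivity * pre_test / p_pos,
            (1 - sensitivity) * pre_test / p_neg)


# Operating points quoted in the text: (sensitivity, specificity).
operating_points = {
    "559 classifier": (0.90, 0.46),
    "100 classifier": (0.90, 0.62),
}

PRE_TEST = 0.30  # hypothetical pre-test probability of malignancy

for name, (sens, spec) in operating_points.items():
    pos, neg = post_test_probabilities(PRE_TEST, sens, spec)
    print(f"{name}: positive -> {pos:.3f}, negative -> {neg:.3f}")
```

Under these assumptions, a negative result at high sensitivity moves the probability of malignancy well below the pre-test value, which is the kind of shift toward a low-likelihood category discussed above.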
These compositions and methods allow for more accurate diagnosis and treatment of lung cancer. Thus, in one embodiment, the methods described include treatment of the lung cancer. Treatment may include removal of the neoplastic growth, chemotherapy and/or any other treatment known in the art or described herein.
In one embodiment, a method for diagnosing the existence or evaluating a lung cancer in a mammalian subject is provided, which includes identifying changes in the expression of 5, 10, 15 or more genes in the sample of said subject, said genes selected from the genes of Table I or the genes of Table II. The subject's gene expression levels are compared with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.
In one embodiment, the diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In another embodiment, the changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.
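The comparison of a subject's expression levels with a reference, and the labeling of each gene as upregulated or downregulated, can be sketched as a log2 fold-change calculation. This is an illustrative sketch only: the gene identifiers, the expression values and the one-log2-unit cutoff are all hypothetical assumptions, and the classifiers described in this disclosure are built with SVM-based methods rather than a fixed fold-change threshold.

```python
import math


def classify_changes(subject, reference, min_abs_log2fc=1.0):
    """Label each gene 'up', 'down' or 'unchanged' relative to the reference.

    min_abs_log2fc is an assumed cutoff (1.0 corresponds to a two-fold change).
    """
    calls = {}
    for gene, level in subject.items():
        log2fc = math.log2(level / reference[gene])
        if log2fc >= min_abs_log2fc:
            calls[gene] = "up"
        elif log2fc <= -min_abs_log2fc:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical expression values (arbitrary units), not data from the examples.
subject = {"GENE_A": 400.0, "GENE_B": 50.0, "GENE_C": 110.0}
reference = {"GENE_A": 100.0, "GENE_B": 200.0, "GENE_C": 100.0}

print(classify_changes(subject, reference))
```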
In one embodiment, the method takes into account the size of a lung nodule in the subject. The specificity and sensitivity may be variable based on the size of the nodule. In one embodiment, the specificity is about 46% at about 90% sensitivity. In another embodiment, the specificity is about 54% at about 90% sensitivity for nodules <10 mm. In yet another embodiment, the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.
In another embodiment, the reference or control comprises three or more genes of Table I from a sample of at least one reference subject. The reference subject may be selected from the group consisting of: (a) a smoker with malignant disease, (b) a smoker with non-malignant disease, (c) a former smoker with non-malignant disease, (d) a healthy non-smoker with no disease, (e) a non-smoker who has chronic obstructive pulmonary disease (COPD), (f) a former smoker with COPD, (g) a subject with a solid lung tumor prior to surgery for removal of same; (h) a subject with a solid lung tumor following surgical removal of said tumor; (i) a subject with a solid lung tumor prior to therapy for same; and (j) a subject with a solid lung tumor during or following therapy for same. In one embodiment, the reference or control subject (a)-(j) is the same test subject at a temporally earlier timepoint.
The sample is selected from those described herein. In one embodiment, the sample is peripheral blood. The nucleic acids in the sample are, in some embodiments, stabilized prior to identifying changes in the gene expression levels. Such stabilization may be accomplished, e.g., using the PAXgene system, described herein.
In one embodiment, the method of detecting lung cancer in a patient includes

- a. obtaining a sample from the patient; and
- b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.

In another embodiment, the method of diagnosing lung cancer in a subject includes

- a. obtaining a blood sample from a subject;
- b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 100 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and
- c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.

In yet another embodiment, the method includes

- a. obtaining a blood sample from a subject;
- b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product;
- c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and
- d. removing the neoplastic growth.

V. EXAMPLES

The invention is now described with reference to the following examples. These examples are provided for the purpose of illustration only and the invention should in no way be construed as being limited to these examples but rather should be construed to encompass any and all variations that become evident as a result of the teaching provided herein.

Example 1: Patient Population—Analysis A

For development of the gene classifier described herein, blood samples and clinical information were collected from 150 subjects, 73 having a diagnosis of lung cancer and 77 having a diagnosis of benign nodule. Patient characteristics are shown in FIG. 1.
Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
The “control” cohort was derived from patients with benign lung nodules (e.g. ground glass opacities, single nodules, granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.

Example 2: Patient Population—Analysis B

Further blood samples and clinical information were collected from 120 subjects, 60 having a diagnosis of lung cancer and 60 having a diagnosis of benign nodule. Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although with medical illness. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, as biomarker levels may change with therapy. Thus the majority of the cancer patients were early stage (i.e., Stage I and Stage II).
The “control” cohort was derived from patients with benign lung nodules (e.g. granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.

Example 3: Sample Collection Protocols and Processing

Blood samples were collected in the clinic by the tissue acquisition technician. Blood samples were drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex-vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency of processing samples following collection greatly enhances lab efficiency and reduces costs.

Example 4: RNA Purification and Quality Assessment

PAXgene RNA is prepared using a standard commercially available kit from Qiagen™ that allows purification of mRNA. The resulting RNA is used for mRNA profiling. The RNA quality is determined using a Bioanalyzer. Only samples with RNA Integrity numbers >3 were used.
Briefly, RNA is isolated as follows. Turn the shaker-incubator on and set it to 55° C. before beginning. Unless otherwise noted, all steps in this protocol, including centrifugation steps, should be carried out at room temperature (15-25° C.). This protocol assumes samples are stored at −80° C. Unfrozen samples that have been left at room temperature for a minimum of 2 hours, per the Qiagen protocol, should be processed in the same way.
Thaw Paxgene tubes upright in a plastic rack. Invert tubes at least 10 times to mix before starting isolation. Prepare all necessary tubes. For each sample, the following are needed: 2 numbered 1.5 ml Eppendorf tubes; 1 Eppendorf tube with the sample information (this is the final tube); 1 Lilac Paxgene spin column; 1 Red Paxgene Spin column; and 5 Processing tubes.
Centrifuge the PAXgene Blood RNA Tube for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge (Sigma 4-15C centrifuge; rotor: Sigma Nr. 11140, 7/01, 5500/min; holder: Sigma 13115, 286 g 14/D; inside tube holder: 18010, 125 g). Note: After thawing, ensure that the blood sample has been incubated in the PAXgene Blood RNA Tube for a minimum of 2 hours at room temperature (15-25° C.), in order to achieve complete lysis of blood cells.
Under the hood, remove the supernatant by decanting into bleach. When the supernatant is decanted, take care not to disturb the pellet, and dry the rim of the tube with a clean paper towel. Dispose of the decanted supernatant as follows: place any clotted blood into a bag and then into the infectious waste, pour the fluid portion down the sink, and wash it down with plenty of water. Add 4 ml RNase-free water to the pellet, and close the tube using a fresh secondary Hemogard closure.
Vortex until the pellet is visibly dissolved. Weigh the tubes in the centrifuge holder again to ensure they are balanced, and centrifuge for 10 minutes at 5000×g using a swing-out rotor Qiagen centrifuge. Small debris remaining in the supernatant after vortexing but before centrifugation will not affect the procedure.
Remove and discard the entire supernatant. Leave tube upside-down for 1 min to drain off all supernatant. Incomplete removal of the supernatant will inhibit lysis and dilute the lysate, and therefore affect the conditions for binding RNA to the PAXgene membrane.
Add 350 μl Buffer BM1 and pipet up and down to lyse the pellet.
Pipet the re-suspended sample into a labeled 1.5 ml microcentrifuge tube. Add 300 μl Buffer BM2. Then add 40 μl proteinase K. Mix by vortexing for 5 seconds, and incubate for 10 minutes at 55° C. using a shaker-incubator at the highest possible speed, 800 rpm on Eppendorf thermomixer. (If using a shaking water bath instead of a thermomixer, quickly vortex the samples every 2-3 minutes during the incubation. Keep the vortexer next to the incubator).
Pipet the lysate directly into a PAXgene Shredder spin column (lilac tube) placed in a 2 ml processing tube, and centrifuge for 3 minutes at 24° C. at 18,500×g in the TOMY Microtwin centrifuge. Carefully pipet the lysate into the spin column and visually check that the lysate is completely transferred to the spin column. To prevent damage to columns and tubes, do not exceed 20,000×g.
Carefully transfer the entire supernatant of the flow-through fraction to a fresh 1.5 ml microcentrifuge tube without disturbing the pellet in the processing tube. Discard the pellet in the processing tube.
Add 700 μl isopropanol (100%) to the supernatant. Mix by vortexing.
Pipet 690 μl sample into the PAXgene RNA spin column (red) placed in a 2 ml processing tube, and centrifuge for 1 minute at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
Pipet the remaining sample into the PAXgene RNA spin column (red), and centrifuge for 1 minute at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through. Carefully pipet the sample into the spin column and visually check that the sample is completely transferred to the spin column.
Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column. Centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
Prepare DNase I incubation mix for step 13. Add 10 μl DNase I stock solution to 70 μl Buffer RDD in a 1.5 ml microcentrifuge tube. Mix by gently flicking the tube, and centrifuge briefly to collect residual liquid from the sides of the tube.
Pipet the DNase I incubation mix (80 μl) directly onto the PAXgene RNA spin column membrane, and place on the benchtop (20-30° C.) for 15 minutes. Ensure that the DNase I incubation mix is placed directly onto the membrane. DNase digestion will be incomplete if part of the mix is applied to and remains on the walls or the O-ring of the spin column.
Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column, and centrifuge for 15 sec at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
Pipet 500 μl Buffer BM4 to the PAXgene RNA spin column, and centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.
Add another 500 μl Buffer BM4 to the PAXgene RNA spin column. Centrifuge for 2 minutes at 18,500×g.
Discard the tube containing the flow-through, and place the PAXgene RNA spin column in a new 2 ml processing tube. Centrifuge for 1 minute at 18,500×g.
Discard the tube containing the flow-through. Place the PAXgene RNA spin column in a labeled 1.5 ml microcentrifuge tube (final tube), and pipet 40 μl Buffer BR5 directly onto the PAXgene RNA spin column membrane. Centrifuge for 1 minute at 10,000×g to elute the RNA. It is important to wet the entire membrane with Buffer BR5 in order to achieve maximum elution efficiency.
Repeat the elution step as described, using 40 μl Buffer BR5 and the same microcentrifuge tube. Centrifuge for 1 minute at 20,000×g to elute the RNA.
Incubate the eluate for 5 minutes at 65° C. in the shaker-incubator without shaking. After incubation, chill immediately on ice. This incubation at 65° C. denatures the RNA for downstream applications. Do not exceed the incubation time or temperature.
If the RNA samples will not be used immediately, store at −20° C. or −70° C. Since the RNA remains denatured after repeated freezing and thawing, it is not necessary to repeat the incubation at 65° C.

Example 5: Measurement of RNA Levels

To provide a biomarker signature that can be used in clinical practice to diagnose lung cancer, a gene expression profile with the smallest number of genes that maintain satisfactory accuracy is provided by the use of 100 or more of the genes identified in Table I as well as by the use of 10 or more of the genes identified in Table II. These gene profiles or signatures permit simpler and more practical tests that are easy to use in a standard clinical laboratory. Because the number of discriminating genes is small enough, NanoString nCounter® platforms are developed using these gene expression profiles.
A. Nanostring nCounter® Platform Gene Expression Assay Protocol
Total RNA was isolated from whole blood using the Paxgene Blood miRNA Kit, as described above, and samples were checked for RNA quality. Samples were analyzed with the Agilent 2100 Bioanalyzer on a RNA Nano chip, using the RIN score and electropherogram picture as indicators for good sample integrity. Samples were also quantitated on the Nanodrop (ND-1000 Spectrophotometer) where 260/280 and 260/230 readings were recorded and evaluated for Nanostring-compatibility. From the concentrations taken by Nanodrop, total RNA samples were normalized to contain 100 ng in 5 μL, using Nuclease-free water as diluent, into Nanostring-provided tube strips. An 8 μL aliquot of a mixture of the Nanostring nCounter Reporter CodeSet and Hybridization Buffer (70 μL Hybridization Buffer, 42 μL Reporter CodeSet per 12 assays) and 2 μL of Capture ProbeSet was added to each 5 μL RNA sample. Samples were hybridized for 19 hours at 65° C. in the Thermocycler (Eppendorf). During hybridization, Reporter Probes, which have fluorescent barcodes specific to each mRNA of interest to the user, and biotinylated Capture Probes bound to their associated target mRNA to create target-probe complexes. After hybridization was complete, samples were then transferred to the nCounter Prep Station for processing using the Standard Protocol setting (Run Time: 2 hr 35 min). The Prep Station robot, during the Standard Protocol, washed samples to remove excess Reporter and Capture Probes. Samples were moved to a streptavidin-coated cartridge where purified target-probe complexes were immobilized in preparation for imaging by the nCounter Digital Analyzer. Upon completion, the cartridge was sealed and placed in the Digital Analyzer using a Field of View (FOV) setting at 555. A fluorescent microscope tabulated the raw counts for each unique barcode associated with a target mRNA. 
Data collected was stored in .csv files and then transferred to the Bioinformatics Facility for analysis according to the manufacturer's instructions.
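As an illustration of the normalization step in the protocol above, the following minimal Python sketch computes the volumes of RNA stock and nuclease-free water needed to reach 100 ng in 5 μL from a measured concentration. The function name, the example concentration of 50 ng/μL and the error handling are assumptions made for illustration only; the per-reaction volumes (5 μL RNA, 8 μL Reporter CodeSet/Hybridization Buffer mixture, 2 μL Capture ProbeSet) are taken from the protocol.

```python
# Per-reaction volumes stated in the protocol (in microliters).
RNA_UL = 5.0
REPORTER_MIX_UL = 8.0
CAPTURE_UL = 2.0


def dilution_volumes(conc_ng_per_ul, target_ng=100.0, final_ul=RNA_UL):
    """Return (rna_ul, water_ul) giving target_ng of RNA in final_ul.

    conc_ng_per_ul is the measured stock concentration (hypothetical input).
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    rna_ul = target_ng / conc_ng_per_ul
    if rna_ul > final_ul:
        # The stock is too dilute to deliver target_ng within final_ul.
        raise ValueError("sample too dilute for the requested mass and volume")
    return rna_ul, final_ul - rna_ul


# Hypothetical sample at 50 ng/uL: 2 uL of RNA plus 3 uL of water.
rna_ul, water_ul = dilution_volumes(50.0)

# Total hybridization volume per reaction under the stated protocol.
reaction_ul = RNA_UL + REPORTER_MIX_UL + CAPTURE_UL
print(rna_ul, water_ul, reaction_ul)
```

A sample below 20 ng/μL cannot supply 100 ng in 5 μL, which is why the sketch raises an error in that case rather than returning a negative water volume.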

Example 6: Biomarker Selection

Support Vector Machine (SVM) can be applied to gene expression datasets for gene function discovery and classification. SVM has been found to be most efficient at distinguishing the more closely related cases and controls that reside in the margins. SVM-RFE (48, 54) was the primary method used to develop gene expression classifiers which distinguish clinically defined classes of patients from clinically defined classes of controls (smokers, non-smokers, COPD, granuloma, etc.). SVM-RFE is an SVM-based model utilized in the art that recursively removes genes based on their contribution to the discrimination between the two classes being analyzed. The lowest scoring genes by coefficient weights were removed, the remaining genes were scored again, and the procedure was repeated until only a few genes remained. This method has been used in several studies to perform classification and gene selection tasks. However, the choice of algorithm parameters (penalty parameter, kernel function, etc.) can often influence performance.
SVM-RCE is a related SVM-based model in that, like SVM-RFE, it assesses the relative contributions of the genes to the classifier. However, SVM-RCE assesses the contributions of groups of correlated genes instead of individual genes. Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single gene or a small number of genes at each round of the algorithm.
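The elimination loop described above (score the genes, remove the lowest scoring ones, re-score the survivors, repeat) can be sketched in Python as follows. This is only a sketch of the recursion, not the classifier used in the examples: to keep it self-contained, a hypothetical stand-in weight function (difference of class means per gene) replaces the coefficient weights of a trained linear SVM, and all function names and the toy data are assumptions.

```python
from statistics import mean


def centroid_weights(X, y, features):
    """Stand-in for linear SVM coefficients (an assumption, not an SVM):
    difference between the class means of each surviving feature."""
    pos = [row for row, lab in zip(X, y) if lab == 1]
    neg = [row for row, lab in zip(X, y) if lab == 0]
    return {f: mean(r[f] for r in pos) - mean(r[f] for r in neg)
            for f in features}


def recursive_feature_elimination(X, y, n_keep, step=1,
                                  weight_fn=centroid_weights):
    """Re-score the surviving features each round and drop the weakest."""
    surviving = list(range(len(X[0])))
    eliminated = []  # feature indices in order of removal, weakest first
    while len(surviving) > n_keep:
        weights = weight_fn(X, y, surviving)
        n_drop = min(step, len(surviving) - n_keep)
        weakest = sorted(surviving, key=lambda f: abs(weights[f]))[:n_drop]
        eliminated.extend(weakest)
        surviving = [f for f in surviving if f not in weakest]
    return surviving, eliminated


# Toy data: 4 samples x 4 genes; label 1 = case, 0 = control.
X = [[5, 1, 3, 1.0],
     [6, 1, 3, 1.2],
     [1, 1, 2, 1.0],
     [0, 1, 2, 1.0]]
y = [1, 1, 0, 0]
kept, removed = recursive_feature_elimination(X, y, n_keep=2)
print(kept, removed)
```

With a real SVM, the weights of the surviving genes change after each retraining, which is the reason for re-scoring in every round; with the stand-in used here the ranking is the same in every round, so only the control flow is illustrated.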
The SVM-RCE method is briefly described here. Low expressing genes (average expression less than 2× background) were removed, quantile normalization was performed, and then “outlier” arrays whose median expression values differed by more than 3 sigma from the median of the dataset were removed. The remaining samples were subjected to SVM-RCE using ten repetitions of 10-fold cross-validation of the algorithm. The genes were reduced by t-test (applied on the training set) to an experimentally determined optimal number which produced the highest accuracy in the final result. These starting genes were clustered by K-means into clusters of correlated genes whose average size was 3-5 genes. SVM classification scoring was carried out on each cluster using 3-fold resampling repeated 5 times, and the worst scoring clusters were eliminated. Accuracy was determined on the surviving pool of genes using the left-out 10% of samples (testing set) and the top-scoring 100 genes were recorded. The procedure was repeated from the clustering step to an end point of 2 clusters. The optimal gene panel was taken to be the minimal number of genes which gives the maximal accuracy starting with the most frequently selected gene. The identity of the individual genes in this panel is not fixed, since the order reflects the number of times a given gene was selected in the top 100 informative genes and this order is subject to some variation.
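The final panel-selection step described above (ordering genes by how often they were recorded among the top-scoring genes, then taking the smallest panel that reaches the maximal accuracy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the alphabetical tie-breaking, the toy gene names and the toy accuracy lookup are hypothetical, and in practice the accuracy of each candidate panel would come from cross-validation rather than a lookup table.

```python
from collections import Counter


def rank_by_selection_frequency(top_lists):
    """Order genes by how many per-round top-gene lists contain them."""
    counts = Counter(g for genes in top_lists for g in set(genes))
    # Descending count; ties broken alphabetically for reproducibility.
    return sorted(counts, key=lambda g: (-counts[g], g))


def minimal_best_panel(ranked_genes, accuracy_of):
    """Return the smallest prefix of ranked_genes with maximal accuracy."""
    best_size, best_acc = 0, float("-inf")
    for size in range(1, len(ranked_genes) + 1):
        acc = accuracy_of(ranked_genes[:size])
        if acc > best_acc:  # strict '>' keeps the smaller panel on ties
            best_size, best_acc = size, acc
    return ranked_genes[:best_size], best_acc


# Toy top-gene lists from three hypothetical cross-validation rounds.
top_lists = [["A", "B", "C"], ["A", "B", "D"], ["A", "C", "E"]]
ranked = rank_by_selection_frequency(top_lists)

# Hypothetical accuracy as a function of panel size only.
toy_accuracy = {1: 0.70, 2: 0.80, 3: 0.85, 4: 0.85, 5: 0.83}
panel, acc = minimal_best_panel(ranked, lambda genes: toy_accuracy[len(genes)])
print(ranked, panel, acc)
```

Because the ranking depends on selection counts, two genes with similar counts can swap places between runs, which matches the statement above that the identity of the genes in the panel is subject to some variation.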
A. Biomarker Selection.
Genes which score highest (by SVM) in discriminating cancerous tumors from benign nodules were examined for their utility for clinical tests. Factors considered include larger differences in expression levels between classes and low variability within classes. When selecting biomarkers for validation, an effort was made to select genes with distinct expression profiles, to avoid selection of correlated genes, and to identify genes with differential expression levels that were robust when measured by alternative techniques including PCR and/or immunohistochemistry.
B. Validation.
Three methods of validation were considered.
Cross-Validation: To minimize over-fitting within a dataset, K-fold cross-validation (K usually equal to 10) was used, in which the dataset is split randomly into K parts, with K−1 parts used for training and 1 part used for testing. Thus, for K=10 the algorithm was trained on a random selection of 90% of the patients and 90% of the controls and then tested on the remaining 10%. This was repeated until all of the samples had been employed as test subjects, so that the cumulated classifier makes use of all of the samples, but no sample is tested using a training set of which it is a part. To reduce the impact of randomization, the K-fold separation was performed M times, producing different combinations of patients and controls in each of the K folds each time. Therefore, for each individual dataset, M×K rounds of permuted selection of training and testing sets were used for each set of genes.
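The repeated K-fold scheme described above can be sketched in Python as follows. The sketch assumes stratified folds (patients and controls are dealt into folds separately, consistent with training on 90% of the patients and 90% of the controls); the function name, the seed handling and the toy class sizes are assumptions for illustration.

```python
import random


def repeated_stratified_kfold(labels, k=10, m=10, seed=0):
    """Yield (train_idx, test_idx) for m repetitions of stratified k-fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    for _ in range(m):
        folds = [[] for _ in range(k)]
        # Shuffle each class separately and deal its samples round-robin,
        # so every fold receives a similar share of each class.
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for pos, i in enumerate(shuffled):
                folds[pos % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


# Toy dataset: 30 patients (label 1) and 40 controls (label 0).
toy_labels = [1] * 30 + [0] * 40
rounds = list(repeated_stratified_kfold(toy_labels, k=10, m=3))
print(len(rounds))  # M*K rounds of training/testing splits
```

Within each repetition every sample appears in exactly one test fold, so no sample is ever tested by a model whose training set contained it.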
Independent Validation: To estimate the reproducibility of the data and the generality of the classifier, a classifier built using one dataset must be tested using another dataset. Accordingly, validation on a second set was performed using the classifier developed with the original dataset in order to estimate its performance.
Resampling (permutation): To demonstrate dependence of the classifier on the disease state, patients and controls from the dataset were assigned to classes at random (permuted) and the classification was repeated. The accuracy of classification using randomized samples was compared to the accuracy of the developed classifier to determine the p value for the classifier, i.e., the probability that the classifier might have been obtained by chance. In order to test the generality of a classifier developed in this manner, it was used to classify independent sets of samples that were not used in developing the classifier. The cross-validation accuracies of the permuted and original classifiers were compared on independent test sets to confirm the validity of the classifier in classifying new samples.
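The permutation comparison described above can be sketched in Python as follows. This is a minimal illustration, not the procedure as actually run: the function names are hypothetical, `accuracy_with_labels` stands for whatever routine re-runs training and cross-validation on the permuted labels, and the "+1" correction in the p value is an assumed convention that avoids reporting a p value of exactly zero.

```python
import random


def permutation_p_value(observed_acc, labels, accuracy_with_labels,
                        n_perm=1000, seed=0):
    """Fraction of label permutations whose accuracy reaches the observed one.

    accuracy_with_labels(labels) is a caller-supplied (hypothetical) function
    that repeats the classification with the given labels.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if accuracy_with_labels(shuffled) >= observed_acc:
            hits += 1
    # Assumed convention: count the observed labeling as one permutation.
    return (hits + 1) / (n_perm + 1)


# Toy stand-in: fixed predictions compared against (permuted) labels.
predictions = [1] * 10 + [0] * 10
true_labels = [1] * 9 + [0] + [0] * 9 + [1]


def toy_accuracy(labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


observed = toy_accuracy(true_labels)
p_value = permutation_p_value(observed, true_labels, toy_accuracy, n_perm=200)
print(observed, p_value)
```

In the toy stand-in the predictions are held fixed; in the procedure described above the classifier would be rebuilt for every permutation, which is far more expensive but follows the same counting logic.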
C. Classifier Performance
Performance of each classifier was estimated by different methods, and several performance measurements were used for comparing classifiers with each other. These measurements include accuracy, area under ROC curve, sensitivity, specificity, true positive rate and true negative rate. Based on the required properties of the classification of interest, different performance measurements can be used to pick the optimal classifier, e.g., a classifier used for screening the whole population would require better specificity to compensate for the small (~1%) prevalence of the disease and thereby avoid a large number of false positive hits, while a diagnostic classifier of patients in hospital should be more sensitive.
For diagnosing cancerous tumors from benign nodules, higher sensitivity is more desirable than specificity, as the patients are already at high risk.
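The trade-off described above can be made concrete by fixing a target sensitivity and reading off the resulting specificity. The following Python sketch assumes that each sample has a classifier score in which higher values are more cancer-like and that labels are coded 1 for cancer and 0 for benign; the function name and the toy scores are hypothetical.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Pick the highest threshold reaching the target sensitivity and
    return (threshold, sensitivity, specificity).

    A sample is called cancer when its score is >= threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Number of cancers that must be called positive (small tolerance
    # guards against floating-point error in the product).
    needed = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[needed - 1]
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return threshold, sensitivity, specificity


# Toy scores for 10 cancers and 10 benign nodules.
cancer = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2]
benign = [0.9, 0.7, 0.58, 0.5, 0.4, 0.3, 0.25, 0.2, 0.1, 0.05]
toy_scores = cancer + benign
toy_labels = [1] * 10 + [0] * 10
thr, sens, spec = specificity_at_sensitivity(toy_scores, toy_labels)
print(thr, sens, spec)
```

Lowering the target sensitivity raises the threshold and therefore the specificity, which is the adjustment a screening application would favor.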

Example 7: Testing of the Classifiers

Peripheral blood samples were all collected in PAXgene RNA stabilization tubes and RNA was extracted according to the manufacturer's instructions. Samples were tested on a Nanostring nCounter™ (as described above) against a custom panel of 559 probes (Table III). In addition, they were tested against a 100 probe subset of the 559 marker panel.
For the 559 Classifier, 432 probes were selected based on previous microarray data, 107 probes were selected from Nanostring studies and 20 were housekeeping genes. We analyzed 610 PAXgene RNA samples (278 cancers, 332 controls) derived from 5 collection sites. For QC, a Universal RNA standard (Agilent) was included in each batch of 36 samples tested. Probe expression values were normalized using the 20 housekeeping genes as well as spike-in positive and negative controls supplied by Nanostring (included in classifier). Z-scores were calculated for probe count values and served as the input to a Support Vector Machine (SVM) classifier using a polynomial kernel. Classification performance was evaluated by 10-fold cross-validation of the samples.
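The Z-score transformation mentioned above can be sketched in Python as follows. The text does not state whether Z-scores were computed per probe across samples or per sample across probes; this sketch assumes the former, uses the sample standard deviation, and maps constant probes to zero, all of which are assumptions. The function name and toy counts are hypothetical.

```python
from statistics import mean, stdev


def zscore_probes(counts):
    """Z-score each probe (column) across samples (rows).

    counts: list of samples, each a list of normalized probe counts.
    """
    n_probes = len(counts[0])
    columns = [[row[j] for row in counts] for j in range(n_probes)]
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        # A probe with zero spread carries no information; map it to 0.0.
        [(row[j] - mus[j]) / sds[j] if sds[j] > 0 else 0.0
         for j in range(n_probes)]
        for row in counts
    ]


# Toy normalized counts: 3 samples x 2 probes.
toy_counts = [[10, 100], [20, 100], [30, 100]]
z = zscore_probes(toy_counts)
print(z)
```

Putting every probe on a common scale in this way keeps highly expressed probes from dominating the kernel computation in the SVM.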
A. 559 Classifier
As shown in FIGS. 2A to 2B, the 559 classifier developed on all the samples showed a ROC-AUC of 0.81 (FIG. 2A). With the sensitivity set at 90%, the specificity is 46%. When the analysis is performed on a balanced set of 556 samples (278 cancer, 278 nodule), similar performance is shown (FIG. 2B). For both sets, UHR controls, post samples, and patients with other cancers were excluded.
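For readers unfamiliar with the ROC-AUC reported above, the following Python sketch computes it from classifier scores using the pairwise (Mann-Whitney) formulation: the AUC equals the fraction of cancer/benign pairs in which the cancer sample receives the higher score, with ties counted as one half. The function name and the toy scores are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = cancer, 0 = benign; higher score = more cancer-like.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Each (cancer, benign) pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 cancers and 3 benign nodules.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.4]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
print(auc)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so a value of 0.81 means that about four out of five cancer/benign pairs are ordered correctly.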
When nodule classification accuracy is assessed by size without using a specific threshold for sensitivity, we find that as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases (FIG. 3). In this analysis, nodules ≤5 mm were correctly classified 85.0% of the time; for nodules >5 mm and ≤8 mm, accuracy was 88.9%; for nodules >8 mm and ≤12 mm, accuracy was 75.5%; for nodules >12 mm and ≤16 mm, accuracy was 68.0%; and for nodules >16 mm, accuracy was 53.6%. See Table IV below.

TABLE IV

Nodule Size	Correct	Incorrect	Total	Specificity

<=5 mm	108	19	127	85.0%
>5, <=8 mm	88	11	99	88.9%
>8, <=12 mm	40	13	53	75.5%
>12, <=16 mm	17	8	25	68.0%
>16 mm	15	13	28	53.6%
Total	268	64	332	80.7%
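
The specificity column of Table IV follows directly from the counts, since every nodule in the table is benign and specificity is therefore correct / (correct + incorrect). The short Python sketch below recomputes the column from the counts in the table; the variable and function names are illustrative assumptions.

```python
# (correct, incorrect) counts of benign nodules per size group, from Table IV.
table_iv = {
    "<=5 mm": (108, 19),
    ">5, <=8 mm": (88, 11),
    ">8, <=12 mm": (40, 13),
    ">12, <=16 mm": (17, 8),
    ">16 mm": (15, 13),
}


def specificity_percent(correct, incorrect):
    """Percentage of benign nodules correctly classified as benign."""
    return round(100 * correct / (correct + incorrect), 1)


per_group = {size: specificity_percent(c, i) for size, (c, i) in table_iv.items()}
total_correct = sum(c for c, _ in table_iv.values())
total_incorrect = sum(i for _, i in table_iv.values())
overall = specificity_percent(total_correct, total_incorrect)

for size, pct in per_group.items():
    print(f"{size}\t{pct}%")
print(f"Total\t{overall}%")
```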

A second set of nodules was tested and the accuracy of the classifier for size groups was determined by sample group (cancer vs benign nodule). Similarly, as nodule size and the cancer risk factor increase, the number of benign nodules classified as cancer increases (FIGS. 4A to 4C). For cancers >5 mm, r=0.95. For nodules of all sizes, r=0.97. The chart shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.
Since classification accuracy was found to be negatively correlated with benign nodule size, we reanalyzed the data using only nodules <10 mm (n=244) (FIG. 5A) with sensitivity fixed at 90%. In this case the specificity rises to 54% and the ROC-AUC to 0.85. For larger nodules, >10 mm (n=88), the specificity drops to 24% and the ROC-AUC drops to 0.71 (FIG. 5B). See Table V below.

TABLE V

	Small (≤10 mm)	Large (>10 mm)	All nodules

N (nodules)	244	88	332
min (mm)	1	10.4	1
max (mm)	10	90	90
mean (mm)	6.07	17.8	8.7
median (mm)	6	15	6
std (mm)	1.73	10.6	7.13
ROC Area	0.85	0.71	0.81
Specificity at 90% Sensitivity	54%	42%	46%

B. 100 Marker Classifier
We next reanalyzed the data from the 633 samples analyzed by W559 on the Nanostring platform in order to identify the minimal number of probes required to maintain the performance attained with the whole panel. We used SVM-RFE for probe selection as previously described. We used 75% of the data as the training set for SVM-RFE and then tested the performance of the top 100 probes (Table II) selected by this process on an independent testing set composed of the remaining 25% of the samples. Samples were randomly selected for the training and testing sets (see Table VI below). The accuracy obtained on the testing set is shown in FIG. 6. In this analysis, at a sensitivity of 90%, specificity was 62%; at a sensitivity of 79%, specificity was 68%; and at a sensitivity of 71%, specificity was 75% (FIG. 6). In summary, the ROC-AUC is 0.82 and at a sensitivity of 0.90 we achieve a specificity of 0.62.

TABLE VI

Nodule size (mm)	n	Cancer size (mm)	n

>0, <=5	130	>0, <=14	86
>5, <=8	109	>14, <=22	75
>8, <=12.5	65	>22, <=33	64
>12.5	57	>33	47

Each and every patent, patent application, and publication, including the priority application, U.S. Provisional Patent Application No. 62/352,865, filed Jun. 21, 2016, and publicly available gene sequence cited throughout the disclosure is expressly incorporated herein by reference in its entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims include such embodiments and equivalent variations.

Claims

1. A composition for diagnosing the existence or evaluating the progression of a lung cancer in a mammalian subject, said composition comprising at least 10 polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I.

2. The composition of claim 1, wherein at least one polynucleotide or oligonucleotide is attached to a detectable label.

3. The composition of claim 2, wherein each polynucleotide or oligonucleotide is attached to a different detectable label.

4. The composition of claim 1, further comprising a capture oligonucleotide, which hybridizes to at least one polynucleotide or oligonucleotide.

5. The composition of claim 4, wherein the capture oligonucleotide is capable of hybridizing to each polynucleotide or oligonucleotide.

6. The composition of claim 4, wherein the capture oligonucleotide binds to a substrate.

7. The composition of claim 6, further comprising a substrate to which the capture oligonucleotide binds.

8. The composition of claim 1, comprising at least 15 polynucleotides or oligonucleotides.

9. The composition of claim 1, comprising at least 25 polynucleotides or oligonucleotides.

10. The composition of claim 1, comprising at least 50 polynucleotides or oligonucleotides.

11. The composition of claim 1, comprising at least 100 polynucleotides or oligonucleotides.

12. The composition of claim 1, comprising at least 500 polynucleotides or oligonucleotides.

13. The composition of claim 1, comprising polynucleotides or oligonucleotides capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I.

14. A kit comprising the composition of claim 1 and an apparatus for sample collection.

15. A method for diagnosing the existence or evaluating a lung cancer in a mammalian subject comprising identifying changes in the expression of 10 or more genes in the sample of said subject, said genes selected from the genes of Table I; and comparing said subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlates with a diagnosis or evaluation of a lung cancer.

16. The method according to claim 15, wherein said diagnosis or evaluation comprise one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy.

17. The method according to claim 15, wherein said changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.

18. The method according to claim 15, further comprising identifying the size of a lung nodule in the subject.

19. The method according to claim 15, wherein the specificity is about 46% at about 90% sensitivity or about 54% at about 90% sensitivity for nodules <10 mm.

20. The method according to claim 15, wherein the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules ≥8 mm and <12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.