WO2009037527A2

WO2009037527A2 - Methods, systems, and compositions for cancer diagnosis

Info

Publication number: WO2009037527A2
Application number: PCT/IB2007/004629
Authority: WO
Inventors: Osvaldo L. Podhajcer; Fernando Juan Pitossi
Original assignee: Gentron, Llc
Priority date: 2006-11-16
Filing date: 2007-11-15
Publication date: 2009-03-26
Also published as: JP2010509909A; WO2009037527A3; AU2007359085A1; EP2094871A4; EP2094871A2; CA2670019A1

Abstract

The invention features methods, systems, and compositions for diagnosing non-central nervous system (non-CNS) cancers by detecting changes in gene or protein expression in the CNS, e.g., in cerebrospinal fluid, in brain or spinal cord tissue samples, or in a bodily fluid.

Description

METHODS, SYSTEMS, AND COMPOSITIONS FOR CANCER DIAGNOSIS

FIELD OF THE INVENTION

The invention relates to methods and compositions for risk assessment, identification, diagnosis, prognosis, and/or monitoring of cancer, and for early therapeutic intervention.

BACKGROUND OF THE INVENTION

It is axiomatic that early diagnosis and concomitant early therapeutic intervention are critical to successful treatment and/or management of most human disorders. However, many disorders presently cannot be or are not diagnosed until the pathological process is already advanced. For example, many solid tumors are usually not clinically detectable before they can be palpated or visualized by tissue imaging techniques (i.e., when they are at least 0.5 cm in size), at which time neoplasia may have been present for years. Similarly, the diagnostic criterion for diabetes mellitus (increased fasting plasma glucose levels or hyperglycemia) identifies the disorder when glucose intolerance (the underlying cause of hyperglycemia) is already present. In another example, rheumatoid arthritis (RA) is diagnosed by the presence of joint stiffness and soreness and the presence of positive rheumatoid factor, all factors that indicate RA is already present and may be advanced.

One main function of the nervous system is to maintain homeostasis by sensing and reacting to signals that reach a certain threshold. The CNS can sense and react to signals emerging from other systems previously considered to be unrelated, such as the immune system (Besedovsky and del Rey, Endocr. Rev. 17(1):64-102 (1996); Tracey, Nature 420(6917):853-859 (2002); Blalock, J. Intern. Med. 257(2):126-138 (2005); Ader et al., Lancet 345(8942):99-103 (1995); Steinman, Nat. Immunol. 5(6):575-581 (2004)). For example, the brain can sense immune peripheral signals and react through the activation of the hypothalamus-pituitary-adrenal axis resulting in the modulation of an ongoing immune response (Besedovsky et al., Science 221, 564 (1983); A. V. Turnbull, C. L. Rivier, Physiol. Rev. 79, 1 (1999)). Peripheral signals can impact the brain by humoral or recently discovered neural pathways such as vagal nerve activation, supporting the idea that new pathways of communication between the CNS and the periphery are yet to be discovered (Tracey, Nature 420(6917):853-859 (2002); Goehler et al., Auton. Neurosci. 85(1-3):49-59 (2000)).

Cancer remains a leading cause of death in industrialized countries. Major improvements in patient survival were attained mainly through early disease detection when still amenable to elimination, highlighting the need for detecting cancer presence at the earliest possible stage (Etzioni et al., Nat. Rev. Cancer 3(4):243-252 (2003)).

Cancer progression is characterized by high genomic instability and high mutation rates, each mutation producing new genetic clones (Khong and Restifo, Nat. Immunol. 3(11):999-1005 (2002)).

The development of high throughput screening approaches such as functional genomics and proteomics has provided a new biological platform to search for molecules associated with different disorders. Gene-expression profiles based on microarray analysis have been of some use to predict survival of patients with lung carcinoma (Beer et al., Nat. Med. 8(8):816-24 (2002)). A similar approach identified a group of genes that were said to be useful to predict the clinical outcome of diffuse large B-cell lymphoma following combination chemotherapy (Shipp et al., Nat. Med. 8(1):68-74 (2002)). In addition, comparison of the proteomic profile of patients with ovarian or prostate cancer compared to non-cancerous volunteers was said to have provided a set of serum proteins that might be useful for early cancer detection (Petricoin et al., Lancet 359(9306):572-7 (2002); Petricoin et al., J. Natl. Cancer Inst. 94(20):1576-8 (2002)).

At present, most functional genomics studies in cancer have used samples of cancerous tissues obtained from patients to generate cancer-associated gene expression profiles (either by a genomics or a proteomics approach).

SUMMARY OF THE INVENTION

The methods and systems described herein are based, at least in part, on the discovery that the central nervous system (CNS) exhibits specific changes in gene expression (e.g., changes in patterns of gene expression) in response to the presence of a peripheral (non-CNS) cancer. While not bound by any theory, the inventors believe that specific changes in gene expression in the CNS, e.g., in the brain, occur in response to the presence of peripheral cancer at an early stage in the development of the cancer, e.g., before the disorder is clinically detectable and/or before the subject is symptomatic. Thus, peripheral cancers can be diagnosed at an early stage and targeted for early therapeutic intervention by analyzing changes or patterns in CNS expression of genes described herein.

In one aspect, the invention provides methods for diagnosing a non-central nervous system (non-CNS) cancer in a subject. The methods include providing a reference gene expression profile comprising five or more genes selected from the genes listed in one or more of FIGs. 5A-5I, 11A, 12, or 21A-23C, or homologs thereof, and optionally one or more genes listed in FIGs. 5J or 11B; generating a subject gene expression profile comprising detecting expression of all genes of the reference gene expression profile in a CNS sample of the subject; and comparing the subject gene expression profile with the reference gene expression profile. A match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer. In some embodiments, the methods also include providing a control gene expression profile corresponding to one or more healthy subjects; and comparing the subject gene expression profile with the control gene expression profile, wherein a match of the subject gene expression profile to the control gene expression profile indicates the subject does not have and is not likely to develop non-CNS cancer.

In some embodiments, the CNS sample is a sample of one or more cells from the brain, e.g., cells from the hypothalamus, the midbrain, and the prefrontal cortex. In some embodiments, the brain cells are from the hypothalamus.

In some embodiments, two or more reference gene expression profiles are used, each specific for a different non-CNS cancer.

In some embodiments, the non-CNS cancer is selected from the group consisting of lung cancer, colon cancer, and mammary cancer. In some embodiments, the non-CNS cancer is a solid tumor less than 0.5 cm in diameter.

In some embodiments, the reference gene expression profile includes ten or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C or homologs thereof. Gene expression can be detected using any method known in the art; in a preferred embodiment, gene expression is detected using a microarray assay.

In some embodiments, the subject is a human and the reference gene expression profile comprises one or more human homologs of genes listed in FIGs. 5A-5I, 11A, 12, and 21A-23C. The subject can have a family history of cancer, can lack a clinical sign of a cancer as evaluated by imaging analysis, and/or be a carrier of a gene associated with an increased risk of developing the non-CNS cancer, e.g., the BRCA1, BRCA2, hMSH2, hMLH1, or hMSH6 gene.

In some embodiments, the methods further include generating a record of the result of the comparing step; and optionally transmitting the record to the subject, a health care provider, or another party.

In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop non-CNS lung cancer.

In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop non-CNS mammary cancer.

In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS colon cancer.

In another aspect, the invention features reference gene expression profiles corresponding to the presence of a non-central nervous system (non-CNS) cancer. The profile can include expression data of five or more genes, e.g., ten, twelve, twenty, thirty, forty, or fifty genes, selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, or 21A-23C, and optionally any gene listed in one or both of FIGs. 5J or 11B.

In some embodiments, the reference gene expression profile includes expression data for five or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C.

In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is lung cancer. In some embodiments, the reference gene expression profile comprises expression data for five or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is lung cancer. In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is mammary cancer. In some embodiments, the reference gene expression profile comprises expression data for Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is mammary cancer.

In some embodiments, the reference gene expression profile includes expression data for one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is colon cancer. In some embodiments, the reference gene expression profile includes expression data for five or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof.

In a further aspect, the invention provides computer-readable media including a data set corresponding to a reference gene expression profile described herein.

In an additional aspect, the invention features systems for diagnosing a non-central nervous system (non-CNS) cancer in a subject. The systems include: a sampling device to obtain a central nervous system (CNS) sample; a gene expression detection device that generates gene expression data for one or more genes in the CNS sample or an imaging device to obtain an image of gene expression of one or more genes in the CNS and generate gene expression data for the one or more genes; a reference gene expression profile as described herein for a specific non-CNS cancer; and a comparator that receives and compares the gene expression data with the reference gene expression profile.

In yet another aspect, the invention provides methods for diagnosing a non-central nervous system (non-CNS) cancer in a subject. The methods include providing a reference gene expression profile comprising five or more genes selected from any gene listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C, and optionally any gene listed in one or both of FIGs. 5J or 11B; generating a subject expression profile comprising detecting expression of proteins encoded by all genes of the reference gene expression profile in a CNS sample (e.g., a cerebrospinal fluid (CSF) sample) of the subject; and comparing the subject expression profile with the reference gene expression profile. A match of the subject expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer.

In some embodiments, two or more reference gene expression profiles are used, each specific for a different non-CNS cancer. In some embodiments, the non-CNS cancer is selected from the group consisting of lung cancer, colon cancer, and mammary cancer.

In some embodiments, the non-CNS cancer is a solid tumor less than 0.5 cm in diameter.

In some embodiments, the reference gene expression profile includes five or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C, or homologs thereof. In some embodiments, the methods also include obtaining a control gene expression profile corresponding to one or more healthy subjects; and comparing the subject expression profile with the control gene expression profile, wherein a match of the subject expression profile to the control gene expression profile indicates the subject does not have and will not likely develop the non-CNS cancer.

In some embodiments, the subject is a human and the reference gene expression profile comprises one or more human homologs of genes listed in FIGs. 5A-5I, 11A, 12, and 21A-23C.

The subject can have a family history of cancer, can lack a clinical sign of a cancer as evaluated by imaging analysis, and/or be a carrier of a gene associated with an increased risk of developing the non-CNS cancer, e.g., the BRCA1, BRCA2, hMSH2, hMLH1, or hMSH6 gene.

In some embodiments, the methods further include generating a record of the result of the comparing step; and optionally transmitting the record to the subject, a health care provider, or another party.

In some embodiments, the reference gene expression profile includes expression data of one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS lung cancer.

In some embodiments, the reference gene expression profile includes expression data of one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS mammary cancer.

In some embodiments, the reference gene expression profile includes expression data of one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS colon cancer.

In another aspect, the invention features methods for diagnosing a non-central nervous system (non-CNS) cancer in a subject. The methods include providing a reference gene expression profile comprising five or more genes selected from the group consisting of Atxn2, Gabarapl2, Atp5k, Gpr109a, and Kin; generating a subject gene expression profile comprising detecting expression of all genes of the reference gene expression profile in a blood sample of the subject; and comparing the subject gene expression profile with the reference gene expression profile. A match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer.

In yet a further aspect, the invention provides methods for diagnosing a non-central nervous system (non-CNS) cancer in a subject. The methods include providing a reference protein expression profile comprising five or more proteins selected from the genes listed in Table 1; generating a subject protein expression profile comprising detecting expression of all proteins of the reference protein expression profile in a cerebrospinal fluid (CSF) sample of the subject; and comparing the subject protein expression profile with the reference protein expression profile. A match of the subject protein expression profile to the reference protein expression profile indicates the subject has or is likely to develop the non-CNS cancer.

The methods described herein are useful, inter alia, for risk assessment for a variety of disorders, for early detection and diagnosis of disease, for monitoring of progression of disease, for monitoring efficacy of treatment for a disease, and/or evaluation of clinical status.

As used herein, a "disorder" or "disease" is an alteration in the state of the body or of some of its cells, tissues, or organs, that threatens health. The two terms are meant to encompass all stages of an illness, including the very early stages of an illness (e.g., early alterations in the body that may not be detectable by the subject or a health care provider, but nonetheless set in motion a disease process). For example, the terms "disorder" and "disease" encompass the state of neoplasia, before a neoplasm or tumor is formed; and early immunological reactions to an antigen, e.g., in the development of rheumatoid arthritis, before inflammation is symptomatic.

As used herein, a "neoplasia" or "cancer" is an unregulated and progressive proliferation of cells under conditions that normally would not elicit, or would cause cessation of, proliferation of normal cells. Neoplasia can result in the formation of a "neoplasm," a new and abnormal growth of tissue. If the abnormally proliferating cells form a mass, the neoplasm is generally referred to as a "tumor." A neoplasm or cancer can be benign or malignant.

A "subject" is a human or animal that is tested for the presence of a possible cancer. The animal can be a mammal, e.g., a domesticated animal such as a dog, cat, horse, pig, cow or goat; an experimental animal such as an experimental rodent (e.g., a mouse, rat, guinea pig, or hamster); a rabbit; or an experimental primate, e.g., a chimpanzee or monkey.

As used herein, the terms "matches," "matching," or "match," when referring to a comparison of expression profiles, mean that at least 75% of the genes in a subject expression profile are differentially expressed. If data on the direction of differential expression is included in the reference expression profile, a "match" means that at least 75% of the genes in a subject expression profile are either up- or down-regulated in the same manner as the genes in the reference expression profile. For example, if genes 1 through 5 are up-regulated and genes 6 through 10 are down-regulated in the reference expression profile, then a subject profile where genes 1 through 10 are down-regulated would not be a match, whereas a subject profile where genes 1, 2, 3, 4, and 6 are up-regulated and genes 5, 7, 8, 9, and 10 are down-regulated would be a match. A "high level match" means that at least 75% of the genes come within at least plus or minus 50% of the expression level (or Log2 ratio of expression level) of the gene in the reference expression profile. For example, assume the following reference expression profile: for gene A, the Log2 ratio of expression level in the presence of a disorder to the expression level in the absence of the disorder is +0.4; for gene B, the ratio is -0.4; for gene C, the ratio is +0.2; and for gene D, the ratio is -0.2. A subject profile with the following values (A = +0.3; B = -0.3; C = +0.1; D = +0.3) is a high level match because genes A, B, C in the subject profile (75% of the genes in the reference profile) are within ± 50% of the ratios for those genes in the reference profile.
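The two definitions above can be illustrated with a minimal Python sketch. The function names, the dictionary representation of a profile (gene name mapped to a signed Log2 ratio), and the small floating-point allowance are assumptions made for illustration only; the specification does not prescribe any particular implementation. The example data reproduce the two worked examples in the preceding paragraph.

```python
from typing import Dict

Profile = Dict[str, float]  # gene name -> signed Log2 ratio (hypothetical representation)


def direction_match(reference: Profile, subject: Profile, threshold: float = 0.75) -> bool:
    """True if at least `threshold` of the reference genes change in the
    same direction (both up or both down) in the subject profile."""
    same = sum(
        1
        for gene, ref in reference.items()
        if gene in subject and ref * subject[gene] > 0
    )
    return same / len(reference) >= threshold


def high_level_match(reference: Profile, subject: Profile,
                     threshold: float = 0.75, tolerance: float = 0.5) -> bool:
    """True if at least `threshold` of the reference genes have a subject
    value within +/- `tolerance` (as a fraction) of the reference value."""
    eps = 1e-12  # allowance for floating-point rounding at the boundary
    close = sum(
        1
        for gene, ref in reference.items()
        if gene in subject and abs(subject[gene] - ref) <= tolerance * abs(ref) + eps
    )
    return close / len(reference) >= threshold


# Direction example: genes 1-5 up, genes 6-10 down in the reference.
reference_dir = {f"gene{i}": (1.0 if i <= 5 else -1.0) for i in range(1, 11)}
all_down = {f"gene{i}": -1.0 for i in range(1, 11)}
mixed = {f"gene{i}": (1.0 if i in (1, 2, 3, 4, 6) else -1.0) for i in range(1, 11)}

# High-level example: Log2 ratios for genes A-D.
reference_log2 = {"A": 0.4, "B": -0.4, "C": 0.2, "D": -0.2}
subject_log2 = {"A": 0.3, "B": -0.3, "C": 0.1, "D": 0.3}

print(direction_match(reference_dir, all_down))        # 5 of 10 agree
print(direction_match(reference_dir, mixed))           # 8 of 10 agree
print(high_level_match(reference_log2, subject_log2))  # 3 of 4 within 50%
```

In this sketch a gene missing from the subject profile simply counts as a non-agreeing gene, which is one possible convention among several.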

Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a table showing number of genes differentially expressed in the CNS of mice at different times after injection of different tumor cells (lung, mammary, and colon) compared to their respective controls. Data for this table was obtained with multiple analysis of variance (ANOVA) tests, one test for each time point. The table shows the number of differentially expressed genes (Genes column), the percentage of differentially expressed genes that were up-regulated (Up column), the average fold change (Fold column), and the 5th and 95th percentiles. The data correspond to genes showing fold changes > 1.2.

FIGs. 2A to 2D are a series of graphs that show an analysis of the quality of the data sets in graphical form, evaluated by the standard error of biological replicates. The graphs show the density distribution of the absolute value of log10 standard error (ASE) and their median value for the different disease models: lung cancer (FIG. 2A), breast cancer (FIG. 2B), colon cancer (FIG. 2C) and arthritis (FIG. 2D). "Ht" stands for hypothalamus; "Cx" stands for prefrontal cortex; "Mb" stands for midbrain; and "Li" stands for liver.

FIG. 3 is a table showing the number of genes differentially expressed in mice injected with tumor cells compared to their respective controls. Differentially expressed genes were selected as described herein (initial analysis). The table shows the number of differentially expressed genes (Genes row), the proportion of upregulated genes (Up row), the average fold change (Fold row), and the 5th and 95th percentiles. The data correspond to genes showing fold changes > 1.2.

FIG. 4 is a table showing the number of genes differentially expressed in mice injected with cancer cells compared to their respective controls. Selected genes correspond to those showing altered expression in only one direction (second more restrictive analysis). The table shows the number of differentially expressed genes (Genes row), the proportion of upregulated genes (Up row), the average fold change (Fold row), and the 5th and 95th percentiles. Data correspond to genes showing > 1.2 fold change (see Experimental methodology for further details on gene selection).

FIGs. 5A-5I are tables of genes differentially expressed in the brains of mice injected with cancer cells. Genes selected for inclusion in this table correspond to those showing altered expression in only one direction (i.e., up or down) in a more restrictive analysis (described in example 3). Each table includes genes differentially expressed in certain brain regions, as follows: 5A-C, cortex; 5D-F, hypothalamus; and 5G-I, midbrain. The mice were injected with colon cancer cells (5A, 5D, and 5G), breast cancer cells (5B, 5E, and 5H), or lung cancer cells (5C, 5F, and 5I).

FIG. 5J is a series of tables of additional genes differentially expressed in mice injected with cancer cells. Genes selected for inclusion in this table correspond to those showing altered expression in only one direction in a more restrictive analysis (described in example 3), using ANOVA for paired data. Ht, hypothalamus; ct, cortex; mb, midbrain.

FIGs. 6A to 6C are a series of graphs that show the False Discovery Rate (FDR; q-value) for different brain regions and liver datasets corresponding to the three cancer models. Genes showing > 1.2 fold change in expression levels were rank-sorted according to their p-value of differential expression. The figure shows the q-value estimated from this ranked list. Cx: Prefrontal cortex; Ht: Hypothalamus; Mb: Midbrain; Li: Liver.

FIG. 7A is a bar graph that shows the validation of microarray data by real time PCR. Genes corresponding to the hypothalamus of the lung cancer model were selected by criteria 1 or 2 and grouped according to their fold change. Bars represent the proportion of validated time points and the numbers above the bars indicate the number of assayed time points.

FIG. 7B is a series of Venn diagrams showing the intersections between lung, mammary, and colon cancer models for the three brain areas. Numbers indicate numbers of differentially expressed genes within the second more restrictive analysis. * p < 0.05, ** p < 0.01 Fisher's exact test.

FIGs. 8A, B, and C are a series of three two-dimensional cluster analyses of hypothalamic samples. (8A) Hierarchical cluster analysis of hypothalamic samples obtained from the lung and colon cancer models based on genes responding differentially between these tumor models. Hierarchical cluster analysis of samples from (8B) mammary and lung cancer models and (8C) colon and mammary cancer models, based on the subset of genes responding differentially between the paired models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant clusters of samples are defined by the first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FIGs. 8D, 8E, and 8F are tables listing the genes identified in the hypothalamic sample cluster analyses shown in FIGs. 8A-C. 8D, colon vs. breast; 8E, colon vs. lung; 8F, breast vs. lung.

FIGs. 9A, B, and C are a series of three two-dimensional cluster analyses. Cortical samples were obtained from (A) mammary and lung cancer models, (B) colon and lung cancer models, and (C) mammary and colon cancer models based on a subset of genes responding differentially between the paired cancer models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant clusters of samples are defined by a first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FIGs. 9D, 9E, and 9F are tables listing the genes identified in the cortical sample cluster analyses shown in FIGs. 9A-C. 9D, colon vs. breast; 9E, colon vs. lung; 9F, breast vs. lung.

FIGs. 1 OA, B, and C are a series of three two-dimensional cluster analyses. Midbrain samples were obtained from (A) mammary and lung cancer models, (B) colon and lung cancer models and (C) mammary and colon cancer models based on a subset of genes responding differentially between the paired cancer models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant clusters of samples are defined by a first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FlGs. 1 1 A and 11 B are tables of genes selected from the hypothalamus, prefrontal cortex and midbrain of mice injected with lung, breast and colon cancer cells, validated by real-time PCR. Genes in these tables were obtained from an in- house printed 1OK oligonucleotide-based array (p < 0.05). Column definitions for 1 1 A and 1 1 B are as follows: Area: Brain area from which the sample was obtained; Model: Cancer model; locus: Entrcz gene number (=locuslink number) number (unique identified for each gene); fold: fold change (%); p.value: p-valuc estimated by ANOVA (should be < 0.05); gene: Gene symbol from entrez-gcnc database; description: Gene description from cntrcz-gene database; locus(Homo): Entrez-gene number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database.; gene(Homo): Gene symbol for the Homo sapiens homolog. FlG. 12 is a table of genes assayed by real-time PCR. Genes in these tables were not present in the in-house printed 1OK oligonucleotide-based array, but are known to be involved in a behavioral state known as sickness behavior. The data is expressed as average ± standard error of the mean. Column definition: Area: Brain area from which the sample was obtained; Model: Cancer model; locus: Entrcz gene number (=locuslink number) number (unique identified for each gene); fold: fold change (%); p.value: p-value estimated by ANOVA (should be < 0.05); gene: Gene symbol from entrez-gene database; description: Gene description from entrez-gcne database; locus(Homo): Entrez-gcnc number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database; gene(Homo): Gene symbol for the Homo sapiens homolog.

FlGs. 13 A and B are bar graphs showing the results of behavioral tests. (13A) Mean weight of pellets displaced (burrowed) from a tube by mice at different time points after injection of tumor cells, n = 9 animals per group. (13B) Forced swimming test done at 9 days after tumor cell injection. Y-axcs shows the time to immobility (seconds), n = 7 animals per group.

FIGs. HA, B, and C are a series of three two-dimensional cluster analyses. Hypothalamic samples were obtained from (14A) arthritis and lung cancer models, ( 14B) arthritis and mammary cancer models and (14C) arthritis and colon cancer models based on a subset of genes responding differentially between the different paired models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant cluster of samples arc defined by the first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FIGs. 14D-G arc tables of genes that are differentially expressed in the hypothalamus between arthritis and cancer models. 14D, arthritis vs. colon; 14E, arthritis vs. lung; 14P, arthritis vs. breast; 14G, arthritis vs. colon, lung, and breast. Column definitions: locus: Entrez gene number (=locuslink number) number (unique identified for each gene); gene: Gene symbol from cntrcz-gcnc database; description: Gene description from entrez-gcne database; Iocus(Homo): Entrcz-genc number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGcnc database; gcnc(Homo): Gene symbol for the Homo sapiens homolog. FIGs. 15A, B, and C are a scries of three two-dimensional cluster analyses. Cortical samples were obtained from ( 15A) arthritis and lung cancer models, (15B) arthritis and mammary cancer models and (15C) arthritis and colon cancer models based on a subset of genes responding differentially between the different paired models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant clusters of samples are defined by a first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FIGs. 15D-G are tables of genes that are differentially expressed in the cortex between arthritis and cancer models. 15D, arthritis vs. colon; 15E, arthritis vs. lung; 15F, arthritis vs. breast; 15G, arthritis vs. colon, lung, and breast. Column definitions are as in FIGs. 14D-G.

FIGs. 16A, B, and C are a series of three two-dimensional cluster analyses. Midbrain samples were obtained from (16A) arthritis and lung cancer models, (16B) arthritis and mammary cancer models and (16C) arthritis and colon cancer models based on a subset of genes responding differentially between the different paired models. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The two dominant clusters of samples are defined by a first branch division of the hierarchical tree belonging to the samples axis. Sample labels indicate cancer model and time point.

FIGs. 16D-G are tables of genes that are differentially expressed in the midbrain between arthritis and cancer models. 16D, arthritis vs. colon; 16E, arthritis vs. lung; 16F, arthritis vs. breast; 16G, arthritis vs. colon, lung, and breast. Column definitions are as in FIGs. 14D-G.

FIG. 17A is a hierarchical cluster analysis of hypothalamic samples obtained from the arthritis and cancer models, based on the 10 top-ranked genes from each pair-wise comparison. Each column represents a sample and each row a single gene. Grey scale represents the fold change as a percentage. The white line marks the subdivision into the dominant clusters of samples. Sample labels indicate disease model and time point.

FIGs. 17B-D are tables of genes selected from each region that change in either arthritis model with p-value < 0.05 and fold change > 20%. A gene that changes in both models must change in the same direction. Selected genes changed in at least one time point; genes that changed in more than one time point must have changed in the same direction (i.e., always up- or down-regulated). Column definitions: locus: Entrez gene number (=locuslink number) (unique identifier for each gene); C57 and DBA: Fold change (%) for each arthritis model (C57BL/6 or DBA/1); max: Maximum fold change for either arthritis model (should be > 20); p.value: p-value estimated by ANOVA (should be < 0.05); gene: Gene symbol from entrez-gene database; description: Gene description from entrez-gene database; locus(Homo): Entrez-gene number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database; gene(Homo): Gene symbol for the Homo sapiens homolog.

FIG. 18 is a graph showing the False Discovery Rate (FDR; q-value) for the different brain region datasets corresponding to the arthritis models. Genes showing a greater than 20% fold change in expression levels were rank-sorted according to their p-value of differential expression. The figure shows the q-value estimated from this ranked list. Cx: Prefrontal cortex; Ht: Hypothalamus; Mb: Midbrain.
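As an illustration only, the selection rule described for FIGs. 17B-D and the ranking by p-value described for FIG. 18 can be sketched in Python. The text does not state how the q-values were estimated; the Benjamini-Hochberg procedure is assumed here. Treating the 20% threshold as an absolute value, and all function and field names, are likewise assumptions made for this sketch.

```python
def select_genes(rows, p_max=0.05, min_fold=20.0):
    """Keep genes with p < p_max and |fold change| > min_fold (%) in at
    least one arthritis model; if both models change, the direction must agree.
    Each row is a dict with hypothetical keys: locus, C57, DBA, p_value."""
    selected = []
    for row in rows:
        changed = [f for f in (row["C57"], row["DBA"]) if abs(f) > min_fold]
        if row["p_value"] >= p_max or not changed:
            continue
        # Both models changed: reject if one is up and the other is down.
        if len(changed) == 2 and (changed[0] > 0) != (changed[1] > 0):
            continue
        selected.append(row)
    return selected


def bh_qvalues(p_values):
    """Benjamini-Hochberg q-values (assumed method), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical example rows (not data from the figures).
example_rows = [
    {"locus": 1, "C57": 30.0, "DBA": 25.0, "p_value": 0.01},
    {"locus": 2, "C57": 30.0, "DBA": -25.0, "p_value": 0.01},
    {"locus": 3, "C57": 10.0, "DBA": 5.0, "p_value": 0.01},
    {"locus": 4, "C57": -40.0, "DBA": 3.0, "p_value": 0.20},
    {"locus": 5, "C57": -40.0, "DBA": 3.0, "p_value": 0.03},
]
kept = select_genes(example_rows)
q_values = bh_qvalues([row["p_value"] for row in kept])
```

A q-value computed this way estimates the fraction of false discoveries expected if every gene ranked at or above that position in the list is called differentially expressed.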

FIGs. 19A, D and G are directed acyclic graphs (DAGs) showing the Gene Ontology Consortium database (GO) gene sets enriched in differentially expressed genes. Filled circles represent differential gene sets at p < 0.05. Intensity indicates the average fold change as percentage for each gene set (p < 0.05, GSEA). Open circles indicate non-differential gene sets (p ≥ 0.05). Closely related gene sets were grouped in categories (upper labels) highlighted with the same color across the three time points. Leading-edge genes (LEG) were identified from selected terminal gene sets.

FIGs. 19B, E and H are bar graphs showing LEG manually grouped according to their biological function in an adult brain.

FIGs. 19C, F and I are bar graphs illustrating the number of genes corresponding to the "neuronal connections" function that belong to specific signaling pathways. The horizontal bars indicate the signaling pathway, and the size of each bar is proportional to the number of genes.

FIGs. 19J-L are tables of hypothalamic leading edge genes from differential gene ontology sets for breast cancer (19J), colon cancer (19K), and lung cancer (19L). Column definitions: locus: Entrez gene number (=locuslink number) (unique identifier for each gene); fold: fold change (%); p.value: p-value estimated by ANOVA (should be < 0.05); gene: Gene symbol from entrez-gene database; description: Gene description from entrez-gene database; locus(Homo): Entrez-gene number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database; gene(Homo): Gene symbol for the Homo sapiens homolog.

FIG. 20 is a list of individual genes that are differentially expressed in the brain in response to cancer. This list contains the genes present in at least one of the previous lists (Figures 5, 8-10, and 14-16), with the exception of the ones for gene expression profiles that discriminate between different cancer models (Figures 11-12). Total number of genes: 999. Column definitions: locus: Entrez gene number (=locuslink number) (unique identifier for each gene); gene: Gene symbol from entrez-gene database; description: Gene description from entrez-gene database; locus(Homo): Entrez-gene number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database; gene(Homo): Gene symbol for the Homo sapiens homolog.

FIGs. 21A-C, 22A-C, and 23A-C are lists of the 50 best candidate genes for each tumor type (21A-C, colon; 22A-C, lung; 23A-C, mammary). These lists were made from lists described herein and include genes validated by real-time PCR and Figure 5, using ANOVA for paired data. Column definitions: locus: Entrez gene number (=locuslink number) (unique identifier for each gene); gene: Gene symbol from entrez-gene database; description: Gene description from entrez-gene database; locus(Homo): Entrez-gene number (=locuslink number) for the Homo sapiens homolog obtained from HomoloGene database; gene(Homo): Gene symbol for the Homo sapiens homolog.

DETAILED DESCRIPTION

The methods described herein rely, in part, on the detection of gene expression in the CNS to identify (e.g., diagnose or monitor) peripheral (non-CNS) tissues or organs for early stages of cancer (e.g., in some cases, within hours, days, weeks or months of the appearance of the cancer). Early identification and/or diagnosis of disease provides an opportunity for early therapeutic intervention to target the cancer before it becomes overly advanced or aggressive.

General Methodology

The CNS is involved in the body's response to any internal or external stimulus that by its intensity or functional relevance could alter internal homeostasis.

As part of this function, the CNS and the immune system interact to obtain a suitable immune response when necessary.

An immune response impacts the brain via neural and humoral mechanisms.

Neural mechanisms primarily involve the activation of the vagal nerve. Humoral mechanisms can include cytokine-mediated action directly on brain structures, e.g., cytokine-mediated increases in neural firing rates (Rothwell and Hopkins, Trends Neurosci. 18(3):130-136 (1995); Wang et al., Nature 421(6921):384-388 (2003)). In one example, peripheral cytokines have been shown to bind and activate the vagal nerve, which in turn activates neurons of the nucleus of the tractus solitarius and the hypothalamus in the brain (Watkins and Maier, Proc. Natl. Acad. Sci. USA 96(14):7710-7713 (1999)). Humoral signals from the periphery act as potent messengers to the brain.

Cytokines in the brain can exert their action at a much lower dose than in the periphery. For example, intracerebral administration of interleukin-1 (IL-1) at a dose of 100 pg to 10 ng elicits maximal changes in fever, gastric function, increased metabolism and behavioral changes, while several micrograms of this cytokine are necessary to elicit similar responses when administered to the periphery (Rothwell and Hopkins, (1995), supra).

After sensing an internal immune signal, the brain reacts in different ways. A paradigm of CNS response to immune signals is the activation of neuroendocrine axes such as the hypothalamus-pituitary-adrenal axis. The activation of this axis results in the liberation of glucocorticoids, which in turn can modulate the ongoing immune response in under 10 minutes. Vagotomy has been shown to blunt the activation of the hypothalamus-pituitary-adrenal axis after intraperitoneal administration of cytokines (Watkins and Maier, (1999), supra). This feedback mechanism is of high physiological relevance; i.e., inhibition of glucocorticoid production after cytokine release in the periphery usually results in the death of the organism (Besedovsky and del Rey, Endocr. Rev. 17(1):64-102 (1996)).

The brain can also sense signals that will affect the immune and other systems from the external milieu. For example, the triggering of a stress reaction can result in the release of glucocorticoids and the attenuation of an ongoing immune response. The effects of stress on the immune system are well documented in animal models and humans (Deinzer et al., Int. J. Psychophysiol. 37(3):219-32 (2000); Marshall et al., Brain Behav. Immun. 12(4):297-307 (1998); Benschop et al., FASEB J. 10(4):517-24 (1996); Sheridan et al., Ann. N.Y. Acad. Sci. 840:803-8 (1998)). In addition, there is anecdotal and preliminary evidence that mind/body interventions such as meditation or yoga could have an influence on the immune system (Cassileth, CA Cancer J. Clin. 49(6):362-75 (1999)).

The new methods harness this natural reaction of the CNS as a way to detect peripheral cancer and other disorders at an early stage. While not limited by any theory, the methods described herein are based, in part, on the discovery that the CNS senses the presence of "alarm signals" from peripheral (non-CNS) disorders at an early stage in the development of cancer progression. Thus, the methods described herein relate to diagnosing peripheral cancers by detecting gene expression in the CNS, e.g., in a CNS sample from a subject, such as a human. In one aspect, a non-CNS cancer can be identified based on a specific profile of gene expression in the brain (e.g., a profile of genes selected from genes in FIGs. 5, 12, and 13 as expressed in the hypothalamus, cortex, and/or midbrain regions of the brain) within hours, weeks or months after cancer progression is initiated in the body. In some embodiments, a non-CNS cancer can be identified based on a profile of gene expression in the brain after cancer progression is initiated in the body, but before the cancer is clinically detectable and/or in an advanced stage.

Cancer Development

It is generally accepted that a clinically detectable tumor mass is composed of cells that, although abnormal, evade immune surveillance and resist immune system attack. During the time of neoplastic progression, cells are characterized by high mutation rates, reflected, inter alia, in phenotypic changes such as down-regulation of histocompatibility antigens. A tumor may thus become resistant to a particular therapeutic by clonal selection and proliferation from the tumor mass of a cell clone having a mutation that allows the cell to resist the given therapeutic. The "natural selection" of tumor cell clones occurs at a given rate leading to the appearance of malignant cells having genetic and epigenetic traits that facilitate growth and escape from the immune system. It is estimated that the average malignancy contains more than 10,000 mutations (Stoler et al., Proc. Natl. Acad. Sci. USA 96(26):15121-6 (1999)). Therefore, it can be concluded that the antigen profile of established cancers by no means reflects the cell genotype and phenotype of very early stage neoplasia. Moreover, it is reasonable to assume that tumor antigens present in the established cancer and the response they can induce in the organism will be different than the antigens and responses induced by early stage neoplastic cells. The new methods described herein can detect such early stage neoplastic cells in spite of these obstacles.

Some neoplasms, e.g., some cancers, can grow for long periods (e.g., for 1, 2, 5, 10, 15, 20 or 25 years) before they are clinically detectable using prior known technology and/or before they become malignant. This period provides an extraordinary window of opportunity for detection of cancerous cells before the malignant tumor is clinically detectable by current strategies. During this period tumor cells undergo several modifications at the molecular level as a result of their genomic instability. Each genetic change is potentially selective for proliferation and/or is capable of triggering a new "alarm signal" to recruit and activate local innate and adaptive immune responses. In a simple view, 10,000 alarm signals are produced during the 10 to 15 years of tumor development before the tumor is clinically detectable.

Methods Of Detecting Gene Expression

Gene expression in the CNS can be detected in vitro, e.g., in an isolated CNS sample, or in vivo, e.g., using in vivo imaging techniques.

Central Nervous System (CNS) Samples

The CNS refers to the brain (including the cranial nerves) and spinal cord. A CNS sample can be, e.g., a cell or tissue from the brain or spinal cord, or a sample of the cerebrospinal fluid (CSF) that fills the ventricles of the brain and the central canal of the spinal cord. Where the detection of gene expression is to be done in a CNS sample isolated from the subject, a CNS sample can be obtained by any number of methods available to the skilled artisan. For example, a CNS cell or tissue sample can be obtained from the brain, e.g., by needle biopsy or by open surgical incision. Imaging of the brain can be performed to determine the precise positioning of the needle or scalpel to enter the brain.

In one example, known as stereotactic biopsy, a tiny hole is drilled into the skull with the patient under light sedation or general anesthesia, and a needle is inserted into the brain tissue guided by computer-assisted imaging techniques such as computerized tomography (CT) or magnetic resonance imaging (MRI) scans. The needle is used to remove a sample of cells, whose gene expression can then be detected by a routine assay, e.g., a gene expression assay described herein. In another example, a sample of CSF can be obtained by routine methods, such as by lumbar puncture. This procedure can be done on an outpatient basis, e.g., under local anesthetic.

The number of cells or amount of CSF needed to perform a particular gene expression assay on a CNS sample will vary; however, some techniques, such as PCR-based techniques, will require a very small number of cells, e.g., as few as 10 to 100 cells (Klein et al., Nat. Biotechnol. 20(4):387-92 (2002)). The CNS sample can be used immediately in a diagnostic test described herein, or it can be stored, e.g., cooled or frozen, and/or transported to a facility where the diagnostic test is performed.

Nucleic Acid-Based Methods

In one embodiment, the methods described herein will utilize techniques for detection of gene expression where a polynucleotide (such as an RNA, mRNA, DNA, cDNA, or other nucleic acid corresponding to the gene) is detected. It should be understood by the skilled artisan that many methods for nucleic acid-based detection of gene expression exist and that any suitable method for detection can be used. Typical assay formats utilize nucleic acid hybridization and include, e.g., 1) nuclear run-on assay, 2) slot blot assay, 3) northern blot assay, 4) magnetic particle separation, 5) nucleic acid or DNA arrays or chips (also discussed in more detail below), 6) reverse northern blot assay, 7) dot blot assay, 8) in situ hybridization, 9) RNase protection assay, 10) ligase chain reaction, 11) polymerase chain reaction (PCR), 12) reverse transcriptase (RT)-PCR, and 13) differential display RT-PCR (DDRT-PCR) or any combination of any two or more of these methods. Such assays can employ the use of detectable labels such as radioactive labels, enzyme labels, chemiluminescent labels, fluorescent labels, or other suitable labels, to detect, identify, or monitor the presence or level of a particular nucleic acid being detected. Such techniques and labels are known in the art and widely available to the skilled artisan.

In one embodiment, an RNase protection assay can be utilized in the methods described herein by hybridizing multiple DNA probes corresponding to one or more members of a panel of sequences to mRNA isolated from a CNS sample from a subject to be tested. The expression profile for one or more genes from the CNS sample can be compared to a reference gene expression profile, e.g., a basal pattern of expression, or other negative or positive control (e.g., a profile from a patient known to have no peripheral disease, or a standard or average profile derived from subjects known to not have the particular disorder being tested). In one example, the gene expression profile from the test CNS sample is compared to a reference gene expression profile that is associated with the presence of a non-CNS neoplasia. If the test gene expression profile matches the reference gene expression profile, it indicates that the subject has, or is at risk for developing, the non-CNS neoplastic disorder.

The methods described herein are also well suited for polymerase chain reaction (PCR)-based methods. PCR-based methods include RT-PCR (U.S. Patent No. 4,683,202), ligase chain reaction (Barany, Proc. Natl. Acad. Sci. USA 88:189-193 (1991)), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-1878 (1990)), transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177 (1989)), Q-Beta Replicase (Lizardi et al., BioTechnology 6:1197 (1988)), rolling circle replication (Lizardi et al., U.S. Patent No. 5,854,033), or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques known in the art. PCR amplification of mRNAs expressed in a CNS sample can be performed directly from mRNA isolated from the sample, or from cDNA reverse-transcribed from such isolated mRNA. The amplified nucleic acid can then be hybridized to a particular probe of interest, e.g., a probe for a CNS gene as described herein, to determine its expression. The probe can be disposed on an address of an array, e.g., an array described herein. Such methods are routine and are particularly amenable to routine adaptation to automated systems employing computer-controlled reagent aliquoting and signal detection. See, e.g., Klein et al., Nat. Biotechnol. 20(4):387-92 (2002).

In another embodiment, in situ methods are used to detect the presence or level of mRNA corresponding to a particular gene. In such methods, a CNS cell or tissue sample can be prepared/processed and immobilized on a support, typically a glass slide, and then contacted with a probe (e.g., a probe for a CNS gene described herein).

In still another embodiment, serial analysis of gene expression, as described in U.S. Patent No. 5,695,937, is used to detect transcript levels of a CNS gene described herein.

Polypeptide-Based Methods

In other embodiments, the methods described herein utilize techniques for detection of gene expression where a gene product (polypeptide) encoded by a gene is detected or where an activity of the polypeptide, e.g., an enzymatic activity, is detected. Such methods are particularly advantageous for detecting the expression of genes that encode polypeptides that are secreted from CNS cells, e.g., into the CSF.

A variety of methods can be used to determine the level of protein encoded by a CNS gene. In general, these methods include contacting a CNS sample (such as a brain cell sample or a CSF sample) with an agent, such as an antibody, that selectively binds to the protein of interest. In one embodiment, the antibody bears a detectable label. Antibodies can be polyclonal, or more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) can be used. The term "labeled," with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with a detectable substance. Such detection methods can be used to detect a CNS gene product in a CNS sample in vitro as well as in vivo. In vitro techniques include immunoassays such as enzyme linked immunosorbent assays (ELISAs), immunoprecipitations, immunofluorescence, enzyme immunoassay (EIA), radioimmunoassay (RIA), Western blot analysis, and Luminex™ xMAP™ detection assay. Some immunoassays are "sandwich" type assays, in which a target analyte(s) is "sandwiched" between a labeled antibody and an antibody immobilized onto a solid support. The assay is read by observing the presence and amount of antigen-labeled antibody complex bound to the immobilized antibody.

Another immunoassay useful in the methods described herein is a "competition" type immunoassay, wherein an antibody bound to a solid surface is contacted with a sample (e.g., a CSF sample) containing both an unknown quantity of antigen analyte and labeled antigen of the same type. The amount of labeled antigen bound on the solid surface is then determined to provide an indirect measure of the amount of antigen analyte in the sample. Such immunoassays are readily performed in a "dipstick" format (e.g., a flow-through or migratory dipstick design) for convenient use. A dipstick-based assay optionally includes an internal negative or positive control. Numerous types of dipstick immunoassays are known in the art and are described, e.g., in U.S. Patent Nos. 5,656,448; 4,366,241; and 4,770,853.

In other embodiments, antibody-based assays are performed in an array format. For example, a CNS sample is labeled, e.g., biotinylated, and then contacted to an antibody, e.g., an antibody positioned on an antibody array. The sample can be detected, e.g., with avidin coupled to a fluorescent label.

In vivo techniques include, e.g., introducing into a subject (e.g., into the CSF) a labeled antibody that binds to the gene product to be detected. The antibody can be labeled, e.g., with a radioactive marker, whose presence and location in a subject can be detected by standard imaging techniques.

Polyclonal and monoclonal antibodies to be used to detect a particular CNS gene product will, in certain cases, be available. For example, commercially available antibodies exist for many of the CNS marker genes described herein. Alternatively, a skilled artisan can make a suitable antibody for use in a diagnostic assay using routine techniques. Methods of making and using polyclonal and monoclonal antibodies to detect a particular target are described, e.g., in Harlow et al., Using Antibodies: A Laboratory Manual: Portable Protocol I, Cold Spring Harbor Laboratory (December 1, 1998). Methods for making modified antibodies and antigen-binding antibody fragments (e.g., chimeric antibodies, reshaped antibodies, humanized antibodies, or fragments thereof, e.g., Fab', Fab, F(ab')2 fragments); or biosynthetic antibodies (e.g., single chain antibodies, single domain antibodies (DABs), Fv, single chain Fv (scFv), and the like), are known in the art and can be found, e.g., in Zola, Monoclonal Antibodies: Preparation and Use of Monoclonal Antibodies and Engineered Antibody Derivatives, Springer Verlag (December 15, 2000; 1st edition).

Imaging of CNS Gene Expression

In one embodiment, the methods described herein utilize techniques for imaging of gene expression, e.g., non-invasive imaging of gene expression, in the CNS. For example, a labeled probe that is capable of detecting the expression of a target gene can be delivered into the brain through the blood-brain barrier (BBB) by targeting the labeled probe to the brain via endogenous BBB transport systems, such as carrier-mediated transport systems that exist for the transport of nutrients across the BBB. Similarly, receptor-mediated transcytosis systems operate to transport circulating peptides across the BBB, such as insulin, transferrin, or insulin-like growth factors. These endogenous peptides can act as "transporting peptides," or "molecular Trojan horses," to ferry a labeled diagnostic probe as described herein across the BBB. The label can then be detected by known brain imaging techniques. Such an approach is described, e.g., in U.S. Patent No. 6,372,250. In other embodiments, the methods described in Shi et al., Proc. Natl. Acad. Sci. USA 97(26):14709-14 (2000), and Lee et al., J. Nucl. Med. 43(7):948-56 (2002), for imaging of gene expression in the brain in vivo using an antisense radiopharmaceutical combined with drug-targeting technology to traverse the BBB, can be used.

Other methods of delivering into the brain a labeled probe that is capable of detecting the expression of a target gene are described, e.g., in U.S. Pat. No. 5,720,720. This patent describes methods of delivering agents (such as labeled antibodies for imaging gene products) into the brain by high-flow microinfusion.

Detection of Changes in CNS Gene Expression in Bodily Fluids

In some cases, gene activation in the CNS can result in a measurable alteration in a gene product at a distant site, e.g., in a fluid such as blood, urine or semen (i.e., a fluid other than CSF). It is known, for example, that the cerebral cortex, hippocampus, entorhinal cortex, parts of the thalamus, basal ganglia, cerebellum, and the reticular formation influence the output of the autonomic nervous system (Kandel et al., Principles of Neural Science, Third Edition, Appleton & Lange). These influences can result in measurable alterations of gene expression at the mRNA or protein level in autonomic ganglia or in innervated organs. An example of this type of interaction is the immunomodulatory action of the activation of the vagus nerve after cytokine release in the periphery (Tracey, Nature 420:853-9 (2002)).

In addition, in some embodiments, gene activation in the CNS can be detected by measuring levels of CNS proteins in blood, i.e., proteins that were expressed in CNS tissues and ended up in the blood stream. For example, neurons in the CNS can trigger the release of hormones in blood via the activation of several neuroendocrine axes such as the hypothalamus-pituitary-adrenal, -gonadal, or -thyroid axes (Besedovsky and del Rey, Endocr. Rev. 17:1-39 (1996)). Moreover, brain extracellular fluid drains into blood and deep cervical lymph (Cserr et al., Brain Pathol. 2(4):269-76 (1992)). Cerebral extracellular fluids drain from the brain into the blood across the arachnoid villi and into the lymph along certain cranial nerves (primarily olfactory) and spinal nerve root ganglia. A minimum of 14 to 47% of protein injected into different regions of brain or cerebrospinal fluid passes through the lymph. Thus, CSF markers drain into, and can be detected in, lymph, blood, or serum. Such markers found in blood may also be enriched, and thereby detectable, in urine, due to selective filtration of blood components by the kidneys.

The CNS is connected to the testes via the autonomic nervous system as well as the endocrine system. If a change in gene activity in the brain results in modifications in the activity of the hypothalamus-pituitary-gonadal axis or in the innervation of the testes, these changes could then be detected in fluids related to the testes, such as semen. For example, patients with spinal cord injury have been shown to have alterations in the composition of their semen (see Naderi and Safarinejad, Clin. Endocrinol. 58(2):177-84 (2003)).

Routine methods can be used to identify gene products in peripheral tissues, such as peripheral bodily fluids, which are the result of changes in gene expression in the CNS. For example, a candidate marker gene can be disrupted in the brain of an experimental animal. A change in the expression of a candidate gene in a peripheral tissue in the experimental animal, compared to a wild type animal (i.e., an animal not disrupted for the candidate marker gene) indicates that the expression of the candidate molecule in the peripheral tissue is tied to changes in gene expression in the CNS.

In some embodiments, CNS gene products detected in blood include those listed in Table 1, below.

Arrays

The methods described herein are readily adapted for nucleic acid or protein arrays, e.g., nucleic acid and/or protein "chips," following the methods known in the art. In a typical embodiment, an array chip includes multiple probes (e.g., DNA probes and/or antibody probes) for detection of expression of multiple CNS genes. In one embodiment, the probes on a specific chip are chosen to detect the members of one or more specific panels or "clusters" of genes, each cluster being associated with a specific gene expression profile if a non-CNS neoplasia is present in the subject from whom the CNS sample was taken. A chip can contain tens, hundreds, or thousands of individual probes immobilized (tethered) at discrete, predetermined locations (addresses or "spots") on a solid, planar support, e.g., glass, metal, or nylon. An array can be a macroarray or microarray, the difference being in the size of the spots. Macroarrays contain spots of about 300 microns in diameter or larger and can be imaged using gel or blot scanners. Microarrays contain spots less than 300 microns, typically less than 200 microns, in diameter.

For analysis and comparison of profiles of gene expression in the methods described herein, a nucleic acid array can be constructed using nucleic acid probes for at least four, e.g., at least 10, 20, 40, 60, 80, or 100 CNS genes listed in the tables or figures herein. Such an array can include control probes (i.e., probes for genes whose expression is expected to remain unaffected in a negative sample, e.g., a sample from a subject not having a non-CNS disorder). Typically, such controls or "normal" non-disease samples are obtained from healthy volunteers. Longitudinal studies of healthy volunteers can be performed to confirm that the control samples are from individuals that remained disease free over time. Such studies provide the raw data for a database of control gene expression profiles. Such a database provides a source of normal or control "reference" profiles that can be used in the present methods. Control samples can also be obtained post-mortem from individuals who died for a reason unrelated to the disorder being diagnosed (e.g., individuals who died from an accidental trauma). In such cases, post-mortem samples should be taken as soon as possible after death, e.g., no later than 3 hours after death.

A population of labeled cDNA representing total mRNA from a sample of a tissue of interest, e.g., brain, spinal cord, or CSF, is contacted with the nucleic acid, e.g., DNA, array under suitable hybridization conditions. Hybridization of cDNAs with sequences in the array is detected, e.g., by fluorescence at particular addresses on the solid support. Thus, a pattern of fluorescence representing a gene expression pattern in the CNS sample of a particular subject or group of subjects is obtained. These patterns of gene expression can be digitized and stored electronically for computerized analysis and comparison. For example, an array can be used to compare expression of CNS genes in individuals being tested with one or more reference gene expression profiles stored electronically, e.g., in a digital database, where the reference gene expression profile is associated with either the presence (positive control) or absence (negative control) of a peripheral neoplasia or other disorder.
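The comparison of a digitized test profile against stored reference profiles can be sketched as follows. This is a minimal illustration, not the method of the invention: the text does not specify a similarity measure, so Pearson correlation over the shared genes is assumed, and the gene names, fold-change values, and labels are hypothetical placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / sqrt(var_x * var_y)


def best_reference(test_profile, references, min_genes=3):
    """Return (label, correlation) for the stored reference profile most
    similar to the test profile, or None if no reference shares enough genes.
    Profiles are dicts mapping gene symbol -> fold change (%)."""
    best = None
    for label, reference in references:
        shared = sorted(set(test_profile) & set(reference))
        if len(shared) < min_genes:
            continue
        r = pearson([test_profile[g] for g in shared],
                    [reference[g] for g in shared])
        if best is None or r > best[1]:
            best = (label, r)
    return best


# Hypothetical reference database: one positive and one negative control.
reference_db = [
    ("positive control", {"geneA": 30, "geneB": -25, "geneC": 0, "geneD": 45}),
    ("negative control", {"geneA": 0, "geneB": 2, "geneC": -1, "geneD": 1}),
]
test_sample = {"geneA": 35, "geneB": -28, "geneC": 5, "geneD": 40}
match = best_reference(test_sample, reference_db)
```

In practice a threshold on the similarity score, chosen from validation data, would be needed before a match is treated as indicative; no such threshold is given in the text.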

In some embodiments, cDNAs are used as probes to form the array. Suitable cDNAs can be obtained by conventional polymerase chain reaction (PCR) techniques. The length of the cDNAs can be from 20 to 2,000 nucleotides, e.g., from 100 to 1,000 nucleotides. Other methods known in the art for producing cDNAs can be used. For example, reverse transcription of a cloned sequence can be used (for example, as described in Sambrook et al., eds., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989). The cDNA probes are deposited or placed ("printed" or "spotted") onto a suitable solid support (substrate), e.g., a coated glass microscope slide, at specific, predetermined locations (addresses) in a two-dimensional grid. A small volume, e.g., 5 nanoliters, of a concentrated DNA solution is used in each spot. Spotting can be carried out using a commercial microspotting device (sometimes called an arraying machine or gridding robot) according to the vendor's instructions. Commercial vendors of solid supports and equipment for producing DNA arrays include BioRobotics Ltd., Cambridge, UK; Corning Science Products Division, Acton, MA; GENPAK Inc., Stony Brook, NY; SciMatrix, Inc., Durham, NC; and TeleChem International, Sunnyvale, CA.

The cDNAs can be attached to the solid support by any suitable method. In general, the linkage is covalent. Suitable methods of covalently linking DNA molecules to the solid support include amino cross-linking and UV crosslinking. For guidance concerning construction of cDNA arrays, see, e.g., DeRisi et al., Nature Genetics 14:457-460 (1996); Khan et al., Electrophoresis 20:223-229 (1999); Lockhart et al., Nature Biotechnol. 14:1675-1680 (1996).

In some embodiments of the methods described herein, the immobilized DNA probes in the array are synthetic oligonucleotides. Preformed oligonucleotides can be spotted to form a DNA array, using techniques described herein with regard to cDNAs. In general, however, the oligonucleotides are synthesized directly on the solid support. Methods for synthesizing oligonucleotide arrays are known in the art. See, e.g., Fodor et al., U.S. Patent No. 5,744,305. The sequences of the oligonucleotides represent portions of the sequences of a particular gene to be detected. Generally, the lengths of oligonucleotides are 10 to 50 nucleotides, e.g., 15, 20, 25, 30, 35, 40, or 45 nucleotides.

Also useful in the methods are aptamer arrays. Aptamers are nucleic acid molecules that bind to specific target molecules based on their three-dimensional conformation rather than hybridization. The aptamers are selected, for example, by synthesizing an initial heterogeneous population of oligonucleotides, and then selecting oligonucleotides within the population that bind tightly to a particular target molecule. Once an aptamer that binds to a particular target molecule has been identified, it can be replicated using a variety of techniques known in biological and other arts, e.g., by cloning and polymerase chain reaction (PCR) amplification followed by transcription. The target molecules can be nucleic acids, proteins, peptides, small organic or inorganic compounds, and even entire micro-organisms.

The synthesis of a heterogeneous population of oligonucleotides and the selection of aptamers within that population can be accomplished using a procedure known as the Systematic Evolution of Ligands by Exponential Enrichment or SELEX. The SELEX method is described in, e.g., Gold et al., U.S. Patent Nos. 5,270,163 and 5,567,588; Fitzwater et al., Methods in Enzymology, 267:275-301 (1996); and in Ellington and Szostak, Nature 346:818-22 (1990). Briefly, a heterogeneous DNA oligomer population is synthesized to provide candidate oligomers for the in vitro selection of aptamers. This initial DNA oligomer population is a set of random sequences 15 to 100 nucleotides in length flanked by fixed 5' and 3' sequences 10 to 50 nucleotides in length. The fixed regions provide sites for PCR primer hybridization and, in one implementation, for initiation of transcription by an RNA polymerase to produce a population of RNA oligomers.

The fixed regions also contain restriction sites for cloning selected aptamers. Many examples of fixed regions can be used in aptamer evolution. See, e.g., Conrad et al., Methods in Enzymology, 267:336-83 (1996); Ciesiolka et al., Methods in Enzymology, 267:315-35 (1996); and Fitzwater, (1996), supra. Aptamers are generally selected in a 5 to 100 cycle procedure. In each cycle, oligomers are bound to the target molecule, purified by isolating the target to which they are bound, released from the target, and then replicated by 20 to 30 generations of PCR amplification.
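The layout of the starting oligomer population described above (a random region flanked by fixed 5' and 3' regions) can be sketched in a few lines of Python. This is only an illustration: the two fixed sequences, the function name `make_candidate_pool`, and the pool size are hypothetical placeholders and are not taken from the cited SELEX references.

```python
import random

# Hypothetical fixed flanking regions (placeholders, not from the source).
# Each falls within the 10 to 50 nucleotide range given in the text.
FIXED_5PRIME = "GGGAGACAAGAATAAACGCTCAA"
FIXED_3PRIME = "TTCGACAGGAGGCTCACAACAGGC"


def make_candidate_pool(n_oligos, random_len, seed=0):
    """Return n_oligos DNA strings: fixed 5' region + random core + fixed 3' region."""
    if not 15 <= random_len <= 100:
        raise ValueError("random region should be 15 to 100 nucleotides long")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pool = []
    for _ in range(n_oligos):
        core = "".join(rng.choice("ACGT") for _ in range(random_len))
        pool.append(FIXED_5PRIME + core + FIXED_3PRIME)
    return pool


pool = make_candidate_pool(n_oligos=5, random_len=40)
for oligo in pool:
    print(len(oligo), oligo)
```

A real starting population would contain many orders of magnitude more sequences; the sketch only shows how the fixed regions bracket the randomized core so that every candidate carries the same primer-binding sites.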

Aptamer selection is similar to evolutionary selection of a function in biology. Subjecting the heterogeneous oligonucleotide population to the aptamer selection procedure described above is analogous to subjecting a continuously reproducing biological population to 10 to 20 severe selection events for the function, with each selection separated by 20 to 30 generations of replication. Heterogeneity is introduced, e.g., only at the beginning of the aptamer selection procedure, and does not occur throughout the replication process. Alternatively, heterogeneity can be introduced at later stages of the aptamer selection procedure. Various oligomers can be used for aptamer selection, including, e.g., 2'-fluoro-ribonucleotide oligomers, NH2-substituted and OCH3-substituted ribose aptamers, and deoxyribose aptamers. RNA and DNA populations are equally capable of providing aptamers configured to bind to any type of target molecule. See Griffiths et al., EMBO J. 13:3245-60 (1994).

Using 2'-fluoro-ribonucleotide oligomers is likely to increase binding affinities ten to one hundred fold over those obtained with unsubstituted ribo- or deoxyribo-oligonucleotides. See, e.g., Pagratis et al., Nature Biotechnology 15(1):68-73 (1997). Such modified bases provide additional binding interactions and increase the stability of aptamer secondary structures. These modifications also make the aptamers resistant to nucleases, a significant advantage for real world applications of the system. See, e.g., Lin et al., Nuc. Acids Res. 22:5229-34 (1994); Pagratis, (1997), supra.

In the present invention, aptamers can be used to detect, e.g., mRNAs, cDNAs, or proteins corresponding to CNS marker genes.

In some embodiments of the invention, probes (e.g., nucleic acid probes, antibodies, or aptamers) for the human homologs of animal model CNS genes are used in the detection method. In other embodiments, the probe used for detection consists of highly conserved regions of a gene, e.g., a sequence that is highly conserved between homologous mouse and human sequences.

Sample Preparation and Analysis

In general, mRNA from the CNS cells or tissue is reverse transcribed into cDNA under conditions such that the relative amounts of cDNA produced representing specific genes reflect the relative amounts of the mRNA in the sample. Comparative hybridization methods involve comparing the amounts of various, specific mRNAs in two tissue samples, as indicated by the amounts of corresponding cDNAs hybridized to sequences from the genes of interest.

The mRNA used to produce cDNA is generally isolated from other cellular contents and components. One useful approach for mRNA isolation is a two-step approach. In the first step, total RNA is isolated. The second step is based on hybridization of the poly(A) tails of mRNAs to oligo(dT) molecules bound to a solid support, e.g., a chromatographic column or magnetic beads. Total RNA isolation and mRNA isolation are known in the art and can be accomplished, for example, using commercial kits according to the vendor's instructions. Similarly, synthesis of cDNA from isolated mRNA is known in the art and can be accomplished using commercial kits according to the vendor's instructions. Fluorescent labeling of cDNA can be achieved by including a fluorescently labeled deoxynucleotide, e.g., Cy5-dUTP or Cy3-dUTP, in the cDNA synthesis reaction. For guidance concerning isolation of mRNA and synthesis of fluorescently labeled cDNA for analysis on a DNA array, see, e.g., Ross et al., Nature Genetics 24:227-235 (2000).
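As an illustration of the comparative hybridization described above, the following minimal Python sketch converts per-gene fluorescence intensities from two samples into log2 ratios and labels each gene as up-regulated, down-regulated, or unchanged. The gene names, the intensity values, the intensity floor, and the two-fold threshold are all assumptions chosen for the example; the source does not prescribe a particular ratio or cutoff.

```python
import math


def log2_ratios(test_signal, reference_signal, floor=1.0):
    """Per-gene log2(test / reference) from background-corrected intensities.

    `floor` is an assumed lower bound that avoids dividing by, or taking
    the logarithm of, zero.
    """
    ratios = {}
    for gene, test_value in test_signal.items():
        ref_value = reference_signal[gene]
        ratios[gene] = math.log2(max(test_value, floor) / max(ref_value, floor))
    return ratios


def call_direction(ratios, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged'.

    threshold=1.0 on the log2 scale corresponds to an assumed two-fold change.
    """
    calls = {}
    for gene, value in ratios.items():
        if value >= threshold:
            calls[gene] = "up"
        elif value <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Made-up intensities for three placeholder genes.
test = {"gene_a": 4000.0, "gene_b": 250.0, "gene_c": 1000.0}
ref = {"gene_a": 1000.0, "gene_b": 1000.0, "gene_c": 1100.0}
ratios = log2_ratios(test, ref)
print(call_direction(ratios))
```

In practice the intensities would first be normalized between the two fluorescence channels; that step is omitted here to keep the sketch short.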

Conventional techniques for hybridization and washing of DNA arrays, detection of hybridization, and data analysis can be employed in the new methods without undue experimentation. Commercial vendors of hardware and software for scanning DNA arrays and analyzing data include Cartesian Technologies, Inc. (Irvine, CA); GSI Lumonics (Watertown, MA); Genetic Microsystems Inc. (Woburn, MA); and Scanalytics, Inc. (Fairfax, VA).

In other embodiments, the expression level of one or more CNS genes is reflected in the presence and/or level of protein present in cells of a CNS sample to be assayed. The presence or level of protein in a CNS sample can be detected by routine methods. For example, a CNS sample (e.g., a CSF sample) can be analyzed by gel electrophoresis techniques such as 2-dimensional (2D) PAGE. Once protein spots are separated on a 2D-PAGE gel, differentially expressed spots can be identified, e.g., by matrix assisted laser desorption ionization time of flight (MALDI-TOF) and electrospray ionization (ESI). This method can also be used for peptide analysis to provide the fingerprint of a particular protein in a sample.

A second proteomic approach can involve obtaining a proteomic spectrum by directly analyzing a CNS sample, such as a CSF sample, by mass spectroscopy. For example, surface enhanced laser desorption ionization time of flight (SELDI-TOF) analysis can be performed to generate a proteomic pattern from a CNS sample. SELDI-TOF analysis has been shown to be able to identify a cluster pattern that differentiates between normal and disease patients. See Paweletz et al., Dis. Markers, 17(4):301-7 (2001).

Generating Expression Profiles

An expression profile used in the methods described herein is a pattern of expression of two or more CNS genes or proteins. In some cases, an expression profile can be a pattern of expression of 5, 10, 25, 50, 100, 200, 500, or more genes or proteins. A "reference gene expression profile" as used herein is a characteristic pattern (dataset) of expression (e.g., up or down regulated and/or level of expression) of two or more CNS genes, where the pattern of expression is associated with risk or presence of a particular disorder (e.g., a ratio of the level of expression associated with a particular disorder to the level of expression in a person without the disorder). The association between the characteristic profile and the particular disorder is determined through the generation and analysis of CNS gene or protein expression data to identify correlations between particular patterns of CNS gene or protein expression (e.g., relative increases and/or decreases of gene expression of particular genes compared to a negative control) and particular clinical states. For example, a reference gene expression profile can be data for a set of genes (also referred to herein as a "panel" or "cluster" of genes), where each gene of the set is either down-regulated or up-regulated when associated with a specific peripheral disorder or any peripheral disorder.

A reference profile can also include a value, e.g., a relative value, of gene expression for two or more genes in a panel, where at least one gene of the panel is down-regulated and at least one gene is up-regulated.
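One way to picture such a profile as a data record is sketched below in Python. The class name `ReferenceProfile`, its field names, the choice of a log2 ratio as the "relative value," and the gene names are all illustrative assumptions; the source does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceProfile:
    """Illustrative container for a reference or control expression profile."""
    disorder: str                 # e.g., the peripheral disorder the profile is tied to
    kind: str                     # "reference" or "control"
    log2_ratio: dict = field(default_factory=dict)  # gene -> relative value (assumed log2 scale)

    def direction(self, gene):
        """Return 'up', 'down', or 'unchanged' for one gene of the panel."""
        value = self.log2_ratio[gene]
        if value > 0:
            return "up"
        if value < 0:
            return "down"
        return "unchanged"

    def has_mixed_regulation(self):
        """True if at least one gene is up-regulated and at least one is down-regulated."""
        directions = {self.direction(gene) for gene in self.log2_ratio}
        return "up" in directions and "down" in directions


# Placeholder genes and values, for illustration only.
profile = ReferenceProfile(
    disorder="lung carcinoma",
    kind="reference",
    log2_ratio={"gene_a": 1.6, "gene_b": -2.1},
)
print(profile.direction("gene_a"), profile.has_mixed_regulation())
```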

Exemplary gene expression profiles associated with non-CNS carcinoma (or particular types of non-CNS carcinoma, such as breast, lung or colon carcinoma) are shown in FIGs. 5A-5I, 11A, 12, and 21A-23C. A reference gene expression profile can include data from at least a portion of the genes or gene products shown in these figures. For example, a reference gene expression profile associated with lung carcinoma can include a value for the differential expression of 1, 2, 5, 10, 20, 30, 40, 50, or more genes or gene products listed as CNS markers for carcinoma in FIGs. 5A-5I, 11A, 12, and 21A-23C; in some embodiments, the profile includes values for genes listed in FIGs. 5J and/or 11B. FIGs. 5A-J are tables of genes that are differentially expressed in the brain 18, 72, or 192 hours after mice are inoculated with either lung, colon, or mammary cancer cells. FIGs. 11A-B are tables of certain genes from hypothalamus samples of mice injected with lung cancer cells that were validated as differentially expressed by real-time PCR. FIG. 12 is a table of genes that are known to be involved in a behavioral state known as sickness behavior and that are not represented in the 10K oligonucleotide-based array used in Examples 2 and 3, but were found to be differentially expressed in the brain after inoculation of cancer cells. FIGs. 21A-23C are lists of the best candidate genes for each type of cancer.

Reference profiles can be generated by detecting changes in patterns of gene expression in the CNS in response to the presence of non-CNS disease in an experimental animal, and identifying the human homologs of the genes and gene clusters that are differentially expressed in a certain pattern in the experimental samples, as exemplified in Examples 1 described herein.

A reference gene expression profile can also be obtained by evaluating human CNS gene expression data. For example, a database is created and maintained where CNS gene expression data is obtained and stored, e.g., electronically, e.g., digitally, for tens, hundreds, or thousands of individuals. The individuals can be followed and evaluated with regard to, e.g., cancer clinical state longitudinally (e.g., over at least 5 years, 10 years, 15 years, 20 years, 30 years, 50 years or a lifetime). The expression profiles of individuals who developed a particular disease, e.g., 5 years, 10 years, 15 years, 20 years, 30 years, or 50 years after the CNS gene expression data was obtained, are compared with the expression profiles of individuals who remained disease free. A similar comparison is made between individuals who developed one clinical type of the disorder and those who developed another, or between individuals who developed the disease at an early age and those who developed it at a late age. These analyses provide specific reference CNS gene expression profiles that are associated with different stages of disease, e.g., different stages of neoplasia, or different types of tumors. A "control gene expression profile" is a profile of a given set of genes in a healthy (normal) individual or animal model.

Both reference and control gene expression profiles are typically stored in electronic digital form, e.g., on a computer-readable medium, such as a CD, diskette, DVD, hard drive, computer memory, or memory cards, along with identifying information such as gender, type and stage of disorder, age group, and race of the subject.

A "subject expression profile" is obtained from a CNS sample of a subject to be tested for the presence of cancer. First, a CNS sample, e.g., a brain cell sample or CSF sample, is obtained from the subject by routine means such as brain needle biopsy (for a brain cell sample) or a lumbar puncture (for CSF), as described herein. The sample is then prepared for use in a method of detecting gene or protein expression, e.g., any method of detecting gene or protein expression described herein. In one embodiment, total RNA can be prepared from the sample, and reverse transcribed into cDNA for use in a nucleic acid array assay described herein. In another embodiment, total protein is prepared from the sample for use in an antibody assay described herein. The prepared sample can then be contacted with an array (e.g., an antibody or nucleic acid array) that can detect expression levels (or protein levels in the case of an antibody array) of at least one cluster or panel of CNS genes or gene products corresponding to the cluster or panel of CNS genes or gene products of one or more particular reference gene expression profiles to which the test sample will be compared. For example, a prepared CNS sample from the subject can be contacted with a nucleic acid array containing nucleic acid probes or an antibody array containing antibody probes for two or more, e.g., between 2 and 150, between 10 and 50, or between 20 and 30, of the genes shown herein, e.g., in FIGs. 5A-5I, 11A, 12, and 21A-23C. In some embodiments, the array can contain probes for each of the marker genes in a particular cluster disclosed herein, e.g., in any of FIGs. 5A-5I, 11A, 12, and 21A-23C. In some embodiments, the array also contains probes for one or more marker genes in FIGs. 5J and/or 11B.

The results of the array assay are obtained by routine techniques, such as fluorescence detection and measurement of bound antibody or hybridized nucleic acid for each position (each probe) on the array. A dataset of the values for the level of each polypeptide or gene detected in the CNS sample by each antibody or probe on the array can then be generated. The dataset can contain information such as patient identifier, and actual and/or relative levels of expression or protein detected. Such a dataset can be used directly as the subject expression profile or the dataset can be converted into a format comparable to the format of the reference profile. Once the subject expression profile is generated, it can be compared to a reference expression profile as described herein.

Analyzing Expression Profiles

The new methods and systems enable one to evaluate a test subject by comparing a subject expression profile from the test subject with a reference gene expression profile associated with the presence of a particular disorder and/or a control ("normal") gene expression profile associated with the absence of a particular non-CNS disorder. Longitudinal studies of CNS gene expression in multiple volunteers are performed to identify and confirm control expression profiles that are associated with individuals who remain disease free, or reference profiles that are associated with individuals who develop the disease. Such studies provide the raw data for a database of negative and positive control expression profiles that can be used in the present methods.

"Subject" and "reference" profiles can be obtained by methods described herein. In one embodiment, the methods include obtaining a CNS sample from a subject (either directly or indirectly from a caregiver or other party), creating an expression profile from the sample, and comparing the subject's expression profile to one or more control and/or reference profiles and/or selecting a reference profile most similar to that of the subject.

As with other detection methods, profile-based assays can be performed prior to the onset of symptoms (in which case they are diagnostic), prior to treatment (in which case they are prognostic) or during the course of treatment (in which case they serve as monitors).

A variety of routine statistical measures can be used to compare two expression profiles. One possible metric is the length of the distance vector that is the difference between the two profiles. Each of the subject and reference profiles is represented as a multi-dimensional vector, wherein each dimension is a value in the profile, e.g., a value for the expression of a particular gene in a panel. A subject profile and reference or control profile can be said to "match" if the two profiles satisfy the conditions for a "match" given above. If the subject and reference profile match, the subject can be identified as having the peripheral disorder with which the reference profile is associated. If the subject and normal control profile match, the subject is likely to be free of the peripheral disorder. In one embodiment, pattern recognition software is used to identify matching profiles. These profiles can be obtained using univariate classical statistical techniques, like signal-to-noise ratio, correlation analysis and ANOVA (Golub et al., Science 286:531-7 (1999); van 't Veer et al., Nature 415:530-6 (2002); Pavlidis, P., Methods 31:282-9 (2003)), or specific multivariate techniques, like the SPLASH (structural pattern localization analysis by sequential histograms) algorithm implemented in the Genes@Work software package (IBM Corp.) (Lepre et al., Bioinformatics 20:1033-44 (2004)).
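The distance-vector metric mentioned above can be sketched as follows in Python. The profiles are treated as vectors over the genes of the reference panel, and the length of their difference is the Euclidean distance. The function names, the gene names, the values, and the cutoff `max_distance` are assumptions made for illustration; the actual conditions for a "match" are those defined elsewhere in this description.

```python
import math


def profile_distance(subject, reference):
    """Length of the difference vector between two profiles.

    Both profiles are dicts mapping gene -> expression value. Every gene in
    the reference panel must also be present in the subject profile.
    """
    missing = [gene for gene in reference if gene not in subject]
    if missing:
        raise KeyError(f"subject profile lacks panel genes: {missing}")
    return math.sqrt(sum((subject[g] - reference[g]) ** 2 for g in reference))


def is_match(subject, reference, max_distance):
    """Assumed match rule: the distance does not exceed a chosen cutoff."""
    return profile_distance(subject, reference) <= max_distance


# Placeholder values: the difference vector is (3, -4), whose length is 5.
subject = {"gene_a": 2.0, "gene_b": -1.0}
reference = {"gene_a": -1.0, "gene_b": 3.0}
print(profile_distance(subject, reference))
print(is_match(subject, reference, max_distance=1.0))
```

Other routine measures, such as a correlation coefficient, could be substituted for the Euclidean distance without changing the overall structure.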

In another embodiment, gene expression profiles are analyzed by quantitative pattern comparison performed by applying a nearest neighbor classifier (see Jelinek et al., Mol. Cancer Res., 1:346-61 (2003)). Based on the nearest neighbor classifier, a score is defined which, together with a permutations-derived distribution, can be used to estimate the probability of each subject profile of belonging to a class defined by a reference gene expression pattern (see Jelinek, (2003), supra).
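The general idea of a nearest neighbor classifier combined with a permutation-derived distribution can be sketched as below. This Python example is a generic illustration and is not the scoring scheme of the cited reference: the use of Euclidean distance as the score, the shuffling of the subject's values across genes as the permutation scheme, and all names and values are assumptions.

```python
import math
import random


def distance(u, v):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(subject, labeled_profiles):
    """Return (label, distance) of the labeled profile closest to the subject."""
    scored = ((label, distance(subject, vector)) for label, vector in labeled_profiles)
    return min(scored, key=lambda pair: pair[1])


def permutation_p_value(subject, class_profile, n_permutations=1000, seed=0):
    """Fraction of gene-shuffled subject vectors at least as close to the
    class profile as the unshuffled subject (with a +1 correction)."""
    rng = random.Random(seed)
    observed = distance(subject, class_profile)
    shuffled = list(subject)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if distance(shuffled, class_profile) <= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Placeholder profiles over a four-gene panel.
labeled = [
    ("tumor-associated", [2.0, -1.5, 1.0, -2.0]),
    ("control", [0.0, 0.0, 0.0, 0.0]),
]
subject = [1.8, -1.2, 0.9, -2.2]

label, score = nearest_neighbor(subject, labeled)
p_value = permutation_p_value(subject, labeled[0][1])
print(label, round(score, 3), round(p_value, 3))
```

A small p-value here indicates that the subject's closeness to the class profile is unlikely to arise when the gene labels are scrambled. With a panel of only four genes the number of distinct permutations is small, so a realistic panel would be considerably larger.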

The result of the diagnostic test, which can be transmitted in paper or electronic form to the subject, a caregiver, or another interested party, can be the subject expression profile per se, a result of a comparison of the subject expression profile with another profile, a most similar reference profile, or a descriptor of any of these. Transmission can occur across a computer network (e.g., in the form of a computer transmission such as a computer data signal embedded in a carrier wave). The new systems also include a computer-readable medium (such as a CD, diskette, or hard drive) having executable code for effecting the following steps: receive a subject expression profile; access a database of reference expression profiles; and either i) select a matching reference profile most similar to the subject expression profile, or ii) determine at least one comparison score for the similarity of the subject expression profile to at least one reference profile. The subject expression profile and the reference expression profile each include a value representing the level of expression of one or more of the identified genes or gene products or the proteins they encode.

Predictive Medicine

The methods described herein are generally useful in the field of predictive medicine and, more specifically, are useful in diagnostic and prognostic assays, in monitoring progression of a neoplasia, or monitoring of response to treatment, e.g., in clinical trials. For example, one can determine whether a subject has a very early stage neoplasia, in the absence of other, e.g., clinical, indications of neoplasia. The methods are particularly useful, e.g., for patients who have had surgery or treatment to remove cancer, in which case the methods could be used to monitor recurrence or metastasis, for persons living in regions of high incidence of cancer due, e.g., to environmental factors, or for individuals who have a family history of a disease (e.g., diabetes, asthma or cancer) or are carriers of a disease susceptibility gene, e.g., a cancer susceptibility gene (e.g., BRCA1 or BRCA2, hMSH2, MLH1, MSH2, or MSH6). Other cancer susceptibility genes are described in The Genetic Basis of Human Cancer, 2nd edition (Vogelstein and Kinzler, Eds.), McGraw-Hill Professional (2002). Such individuals can be evaluated using the methods described herein.

In some cases, for example, where the risk of developing a disease is high (e.g., where an individual has a strong family history of cancer, or where an individual carries a cancer susceptibility gene or lives in a high risk area for cancer), an individual can be evaluated periodically (e.g., every 10 years, every 5 years, or every year) during his or her lifetime.

A System for Diagnosing a Non-CNS Disorder

A system for diagnosing a non-central nervous system (non-CNS) disorder in a subject can include the following elements: a sampling device to obtain a CNS sample, a gene expression detection device, a reference gene expression profile, and a means for comparing gene expression (e.g., a comparator) of one or more genes in the CNS sample with the reference gene expression profile. A sampling device obtains a CNS sample by a minimally invasive technique, e.g., a form of neurosurgery. Minimally invasive neurosurgery techniques include computer-assisted stereo-taxis, intra-operative ultrasound, brain mapping and neuro-endoscopy, among other techniques. Stereo-taxis refers to a system of navigating to any area within the brain, with the aid of imaging techniques that display external reference landmarks and neural structures.

Alternatively, a "sample" can be taken by imaging gene expression, e.g., in the brain, rather than taking an actual sample. Brain imaging can be performed by Computer Tomography Scan (CT), Magnetic Resonance Imaging (MRI) or Positron Emission Tomography (PET), among other methods. Signals originating from these methods provide reference points from which a computer can calculate and present trajectories and depths to any target point within the brain. The latest generation of stereo-tactic systems, which includes the Cosman-Roberts-Wells (CRW) system, can be used with MRI and cerebral angiographic localization. Intra-operative ultrasound can be used either alone or in combination with stereo-taxis. Intra-operative ultrasound is used to identify structures such as the ventricles prior to dural opening. The ultrasound probe can also be used to guide a needle biopsy of a deep-seated lesion to obtain the CNS sample. Both the rigid and fiber-optic flexible endoscopes can be used to obtain a brain sample using minimally invasive techniques. Lasers and various other instruments (including biopsy instruments) can be attached and used. A sampling device to obtain cerebrospinal fluid by lumbar puncture can also be guided by any of the imaging methods listed above. Gene expression detection devices include those described herein under the subheadings Nucleic Acid-Based Methods, Array, and Sample Preparation and Analysis. The comparator can be a computer loaded with pattern recognition software, as described herein.

Computer-Readable Medium

In another aspect, the new systems feature a computer-readable medium having a plurality of digitally encoded data records or data sets. Each data record or data set includes a value representing the level of expression of a CNS gene, and a descriptor of the sample. The descriptor can be, e.g., an identifier (e.g., an identifier for the patient from which the sample was obtained, e.g., a name or a reference code that can be matched with patient information only by those having access to a decoding table), a diagnosis made, or a treatment to be performed in the event the level of expression reaches a certain level or falls below a certain level. The data record can also include values representing the level of expression of related genes (e.g., the data record can include values for each of a plurality of genes in a gene "cluster," where a particular reference gene expression for the genes in the cluster is associated with a non-CNS disorder). The data record can also include values for control genes (e.g., genes whose expression is not changed in control samples or whose expression is not diagnostically correlated with a non-CNS disorder). The data record can be structured in various ways, e.g., as a table (e.g., a table that is part of a database such as a relational database (e.g., a SQL database of the Oracle or Sybase database environments)) or as a list.
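A data record of this kind, stored as a table in a relational database, might look like the following minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration (the text names Oracle and Sybase environments); the table name, column names, sample code, gene names, and values are all hypothetical.

```python
import sqlite3

# In-memory database used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE expression_record (
        sample_code      TEXT NOT NULL,               -- coded patient identifier
        gene             TEXT NOT NULL,               -- CNS gene in the cluster
        expression_level REAL NOT NULL,               -- value for the level of expression
        is_control_gene  INTEGER NOT NULL DEFAULT 0,  -- 1 for control genes
        diagnosis        TEXT,                        -- optional descriptor
        PRIMARY KEY (sample_code, gene)
    )
    """
)

# Placeholder rows: two cluster genes and one control gene for one sample.
rows = [
    ("S-0001", "gene_a", 2.4, 0, None),
    ("S-0001", "gene_b", 0.6, 0, None),
    ("S-0001", "control_gene", 1.0, 1, None),
]
conn.executemany("INSERT INTO expression_record VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Retrieve the cluster values for one sample, excluding control genes.
cluster = conn.execute(
    "SELECT gene, expression_level FROM expression_record "
    "WHERE sample_code = ? AND is_control_gene = 0 ORDER BY gene",
    ("S-0001",),
).fetchall()
print(cluster)
```

Keeping only a coded identifier in the table, with the decoding table held separately, mirrors the descriptor arrangement described above.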

Isolating Homologous Sequences from Other Species

Human homologs of genes that are useful in the methods described herein and are listed in FIGs. 5A-5I, 11A, 12, and 21A-23C can be found in public databases such as GenBank and others that are available on the Internet.

The human homologs of CNS marker genes and their products (e.g., human homologs of CNS marker genes identified by experiments in non-human experimental animals) are useful for various embodiments of the methods described herein. Human homologs are known for most of the CNS marker genes provided herein. In those cases where a human homolog is not identified, several standard approaches can be used to identify such genes. These methods include low stringency hybridization screens of human libraries with a mouse marker gene nucleic acid sequence, polymerase chain reactions (PCR) of human DNA sequence primed with degenerate oligonucleotides derived from a mouse marker gene, two-hybrid screens, and database screens for homologous sequences.

Chemotherapeutic Agents

In one embodiment, the methods described herein can identify or diagnose the presence of a non-CNS neoplasia in a subject at an early stage, e.g., before a neoplasm has formed, before a neoplasm is clinically detectable, and/or before a tumor has become malignant. As such, a neoplasm detected by a method described herein is amenable to treatment by an agent that targets neoplastic cells in general or targets specific neoplastic cells in particular. In one embodiment, a subject may be treated with a chemotherapeutic agent. Chemotherapeutic agents, as used herein, refer to chemical therapeutic agents or drugs used in the treatment of neoplasia. This term is used for simplicity notwithstanding the fact that other compounds may be technically described as chemotherapeutic agents in that they exert an anti-cancer effect. A number of exemplary chemotherapeutic agents are described below. Suitable chemotherapeutic agents include: antitubulin/antimicrotubule drugs, e.g., paclitaxel, taxol, tamoxifen, vincristine, vinblastine, vindesine, vinorelbine, taxotere; topoisomerase I inhibitors, e.g., topotecan, camptothecin, doxorubicin, etoposide, mitoxantrone, daunorubicin, idarubicin, teniposide, amsacrine, epirubicin, merbarone, piroxantrone hydrochloride; antimetabolites, e.g., 5-fluorouracil (5-FU), methotrexate, 6-mercaptopurine, 6-thioguanine, fludarabine phosphate, cytarabine/Ara-C, trimetrexate, gemcitabine, acivicin, alanosine, pyrazofurin, N-phosphonacetyl-L-aspartate=PALA, pentostatin, 5-azacitidine, 5-aza-2'-deoxycytidine, ara-A, cladribine, 5-fluorouridine, FUDR, tiazofurin, N-[5-[N-(3,4-dihydro-2-methyl-4-oxoquinazolin-6-ylmethyl)-N-methylamino]-2-thenoyl]-L-glutamic acid; alkylating agents, e.g., cisplatin, carboplatin, mitomycin C, BCNU=Carmustine, melphalan, thiotepa, busulfan, chlorambucil, plicamycin, dacarbazine, ifosfamide phosphate, cyclophosphamide, nitrogen mustard, uracil mustard, and pipobroman, 4-ipomeanol; estrogen modulators, e.g., raloxifene; piroxicam; 9-cis retinoic acid.

Suitable dosages for the selected chemotherapeutic agent are known to those of skill in the art. For example, where the agent is doxorubicin, a suitable dosage may include 30 mg/m² of patient skin surface area, administered intravenously, twice at 1 week intervals. However, one of skill in the art can readily adjust the route of administration, the number of doses received, the timing of the doses, and the dosage amount, as needed. Bearing in mind these considerations, generally, a suitable dose for a given chemotherapeutic agent is between 10 mg/m² and about 500 mg/m², and more preferably, between 50 mg/m² and about 250 mg/m² of patient skin surface area (the skin surface of an average sized adult human is about 1.8 m²).
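The arithmetic behind a surface-area-based dose is simply the dose per square meter multiplied by the patient's skin surface area. The short Python sketch below works through the figures given above (30 mg/m² and an average adult surface area of about 1.8 m²), which come to roughly 54 mg per administration. The function name is hypothetical, and the sketch illustrates the unit conversion only; it is not a dosing tool.

```python
def dose_mg(dose_per_m2_mg, body_surface_area_m2):
    """Total dose in mg = dose per square meter (mg/m2) x body surface area (m2)."""
    if dose_per_m2_mg <= 0 or body_surface_area_m2 <= 0:
        raise ValueError("dose and surface area must be positive")
    return dose_per_m2_mg * body_surface_area_m2


# Figures taken from the text: 30 mg/m2 and a surface area of about 1.8 m2.
print(round(dose_mg(30.0, 1.8), 1))
```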

Such a dose, which may be readily adjusted depending upon the particular drug or agent selected, may be administered by any suitable route, including, e.g., intravenously, intradermally, by direct site injection, intraperitoneally, intranasally, or the like. Doses may be repeated as needed.

In one embodiment, because a method described herein can identify or diagnose the presence of a non-CNS neoplasia in a subject at an early stage, e.g., before a neoplasm has formed, before a neoplasm is clinically detectable, and/or before a tumor has become malignant, the dose of a chemotherapeutic agent may be lower than that typically used after a neoplasm, e.g., a cancer, is detected or diagnosed by clinical methods, such as visualization or palpation of a tumor mass.

Screening for Therapeutic Targets

A CNS marker gene for a non-CNS cancer, e.g., a CNS marker gene described herein, may not only "sense" the presence of the cancer, but also actively participate in responding to the presence of the cancer by generating a response, e.g., an antitumor response. Alternatively, a CNS marker gene may respond to the presence of non-CNS cancer by promoting progression of the cancer, e.g., inducing growth of a neoplasm or promoting malignant transformation of a neoplasm. As a therapeutic strategy, one would want to promote the expression or activity of the former type of gene, and/or inhibit the expression or activity of the latter type of gene, in the CNS. Thus, regardless of whether a CNS marker gene generates a response to curb or promote a specific cancer, its identification can provide a target for inhibiting progression of the cancer.

One way to identify such CNS marker genes that are also potential therapeutic targets is to identify CNS genes that are differentially expressed in animals that exhibit an inhibitory response against a disease compared to animals that do not exhibit an inhibitory response. For example, experimental animals can be injected with tumor inducing cells (e.g., colon cancer cells such as CT26) that express an interleukin (IL), e.g., IL-12. Injection of tumor cells genetically modified to express IL-12 is known to induce Th1 immune-mediated tumor rejection (Adris et al., Cancer Res. 60(23):6696-703 (2000)). Control mice can be injected with tumor cells that do not express IL-12. At different times after injection, gene expression in the CNS is analyzed in the animals, as described herein, e.g., by microarray analysis. Thus, genes that "turn off" and "turn on" specifically in the CNS (e.g., brain) of the animals can be identified. Some of these genes will respond to the presence of the IL. Others will correspond to genes actively engaged in the "stimulation" of the antitumor immune response. This strategy can be used for any interleukin gene that may be involved in the stimulation of an antitumor immune response. Identification of brain genes actively involved in "stimulating" an antitumor response will provide a target for therapeutic intervention, e.g., by direct use of the gene or its gene product, or by screening for agents that block or stimulate their activity.

A second strategy for identifying CNS genes that are potential therapeutic targets is to use transgenic animals (e.g., knockout mice) having brain-specific disruptions (e.g., knockouts) in specific genes. A great number of CNS-specific knockout mice are currently available to the skilled artisan (see, e.g., the Jackson Laboratory web site, describing numerous JAX® mice models used in neurobiology), and many more can be expected to become routinely available. A role in the CNS response to non-CNS disease can be established for any particular gene for which a brain knockout animal can be obtained or produced, by inducing the disorder in the knockout mice (e.g., as described herein for cancer, RA, asthma, or obesity), and evaluating disease outcome.

CNS marker genes and gene products that are also potential therapeutic targets are listed in FIGs. 5A-5I, 11A, 12, and 21A-23C. These genes are or encode molecules involved in cell signaling (e.g., growth factors, hormones, cytokines, and their receptors) and are also differentially expressed markers in each of the tumors studied.

Vaccines

The methods described herein also provide targets for preventive vaccination. A set of brain genes that "senses" a disease may include receptors for known or unknown ligands. A disease cell might produce these ligands to inhibit the induction of a brain-derived anti-disease response. In such an instance, identifying a CNS gene that is involved in an anti-disease response can lead to the identification of a gene product secreted by the diseased cell that might act on the brain to inhibit the disease response. A genetic vaccine targeting these products could be a viable therapeutic strategy. One approach to identify CNS targets for preventive vaccination in the treatment of non-CNS disorders is the following: obtain a CNS gene expression profile (using techniques such as those described herein above) from animals that exhibit an anti-disease response, e.g., in the case of a tumor, an IL-12 mediated antitumor response, in an experimental tumor model. It is expected that from the cluster of genes "sensing" the tumor, some will change their expression levels in the presence of IL-12. This subset of genes will likely be those involved in "generating" the antitumor response. This subset of genes is likely to have predictable modulators. For example, if a CNS gene that changes its expression profile in response to a non-CNS gene in the presence of IL-12 is a receptor, one could predict that the change in gene expression of such a receptor could be brought about by its ligand. Thus, a preventive genetic vaccine could be designed to generate a memory response to such a ligand.

A second experimental approach can involve identifying those CNS genes that change their activity in response to a non-tumorigenic dose of tumor cells (e.g., a condition where neoplasia exists in the body, but no neoplasm is yet formed). From this subset of CNS genes one can predict the modulating genes responsible for their changes in activity, as explained above. Such modulating genes, which may be derived from the neoplastic cells, are likely to be initial tumor-derived signals of alarm in the peripheral body. Thus, a preventive genetic vaccine could be designed to generate a memory response to such genes.

A vaccine can be, e.g., a polypeptide or nucleic acid corresponding to the gene to be targeted. Vaccines described herein can be administered, or inoculated, to an individual in a physiologically compatible solution such as water, saline, Tris-EDTA (TE) buffer, or phosphate buffered saline (PBS). They can also be administered in the presence of substances (e.g., facilitating agents and adjuvants) that have the capability of promoting uptake or recruiting immune system cells to the site of inoculation. Vaccines have many modes and routes of administration. They can be administered intradermally (ID) or intramuscularly (IM), and by either route they can be delivered by needle injection, gene gun, or needleless jet injection (e.g., Biojector™, Bioject Inc., Portland, OR). Other modes of administration include oral, intravenous, intraperitoneal, intrapulmonary, intravitreal, and subcutaneous inoculation. Topical inoculation is also possible, and can be referred to as mucosal vaccination. Topical routes include, for example, intranasal, ocular, oral, vaginal, or rectal routes. Delivery by these topical routes can be by nose drops, eye drops, inhalants, suppositories, or microspheres.

The following examples are illustrative only and not intended to be limiting.

EXAMPLES

Example 1: Preliminary Study of CNS Expression Profiles Associated With Cancer

To establish whether the brain can sense an incipient tumor growing at the periphery, Lewis lung, CT-26 colon, and 4T1 mammary tumorigenic cells were injected into syngeneic mice and the transcriptomes in different brain regions were evaluated. The hypothalamus was selected because of its functional relevance in monitoring the internal milieu, the prefrontal cortex for being an associative region, and the midbrain due to its role in the regulation of basic parameters of homeostasis.

In Vivo Studies - Tumor Models

CT-26.WT colon carcinoma and LL/2(LLC1) lung carcinoma cells were grown in DMEM, and 4T1 mammary carcinoma cells were grown in RPMI containing 10% FBS and antibiotics, in a humidified chamber with 5% CO₂. All the cells were obtained from the American Type Culture Collection (ATCC, Manassas, VA). Eight-week-old mice housed in HEPA filtered air racks (Tecniplast, Italy), 5 animals per cage with food and water ad libitum, were injected subcutaneously with malignant cells. Balb/C males were injected with 1×10⁶ CT-26.WT cells in 300 μl of PBS, Balb/C female mice received 1×10⁵ 4T1 cells in 100 μl of PBS, between the first and the second mammary gland, and C57BL/6 male mice were injected with 1×10⁶ LL/2(LLC1) cells in 300 μl of PBS.

In the absence of obvious control cells, sham control mice were injected with the corresponding volume of PBS. Injection of normal cells was not considered a reliable control, since such cells die by anoikis when injected at an inappropriate site and eventually send signals of death. Therefore, each different tumor type became a control for the other tumor types.

At the corresponding time points (18, 72, or 192 hours), mice were killed by cervical dislocation and the hypothalamus, prefrontal cortex, and midbrain regions were dissected. At the same time the liver was extracted. All regions were frozen in dry ice immediately and stored at -80°C until RNA extraction.

All animal procedures were performed according to the rules and standards of the regulations for the use of laboratory animals of the National Institutes of Health, USA. Animal experiments were approved by the Ethical Committee of the Institute Leloir.

Microarray Studies

Total RNA was extracted using TRIzol Reagent (Invitrogen, Carlsbad, CA). Poly A+ RNA was obtained from total RNA using the MicroPoly(A) Pure kit (Ambion, Austin, Texas). RNA was reverse transcribed using SuperScript II RT (Invitrogen) with oligo dT primers and random primers, both in the presence of aa-dUTP (Sigma Co., St Louis, MO). The cDNA was ethanol precipitated, resuspended in 4.5 μl of 0.1 M NaHCO₃ (pH 9.0), mixed with 4.5 μl of the dye (Cy3 or Cy5) resuspended in DMSO (Amersham Pharmacia, Sweden), and incubated for 1 hour at room temperature in the dark. The probes were purified using the SNAP Gel Purification Kit (Invitrogen) following manufacturer instructions with the following modifications: at the initial step, 500 μl of loading buffer (2.25 M guanidinium HCl in 70% isopropanol) were added to the sample, and the sample was placed in a SNAP column and incubated for 4 minutes before the first centrifugation. A 50-mer mouse 10K Oligo Set (MWG, Germany) was printed on UltraGAPS slides (Corning, Acton, MA) using a Virtek Chipreader Arrayer (Virtek Vision International Inc., Ontario, Canada). Printed slides were prehybridized at 42°C in 2X SSC, 0.1% SDS, 1% BSA. Labeled probes were mixed with hybridization buffer containing 30% formamide and hybridized overnight at 42°C.

Data Acquisition and Image Processing

The slides were scanned with a Chip-Reader (Bio-Rad, Hercules, CA) at 10 μm resolution and 16 bit pixel depth, and images were analyzed with VersArray™ Analyzer software v4.5 (BioRad).

Data Filtration and Normalization

All the data processing was performed under the R System v2.2.1 (The R Project for Statistical Computing, available on the world wide web at r-project.org). The data were filtered to eliminate dust-derived data points (spots with size less than 75 pixels or with a mean to median correlation of less than 80% (Tran et al., Nucleic Acids Res. 30(12):e54 (2002))), saturated data points (spots with a proportion of saturated pixels greater than 20%), and low signal data points (spots with signal to background ratio below 1.2). Intensity- and spatial-related biases were simultaneously corrected with a 3D-normalization approach. Briefly, the log-ratio of spot intensities for spot i, $M_i = \log_2(R_i/G_i)$, was replaced by the residuals of the non-linear regression $M_i \sim f(x_i, y_i, A_i)$, where $(x_i, y_i)$ is the location of spot i on the slide and $A_i$ is the mean spot intensity, i.e., $A_i = \tfrac{1}{2}\log_2(R_i G_i)$. Locally weighted 3D-polynomial surface regression was carried out with the loess function of the R system.

Data Integration Between Replicated Slides and Expression Set Generation

M values from dye-swap (technical) replicates were weighted by their quality scores (QC) and averaged. QC was defined as the ratio of spot area to spot perimeter scaled in a range (0, 1]. The resulting expression set was scale-normalized so that each sample has the same median absolute deviation (Smyth and Speed, Methods 31(4):265-73 (2003)).

Results

The cancer cell inocula led to tumor growth in 100% of the mice, a prerequisite for these studies. To avoid amplification of RNA samples that might induce loss of low abundance transcripts and to control for technical variability, each of the biological replicates was obtained from a pool of 15 mice. Control mice were injected with vehicle alone. Mice were sacrificed at 18, 72, and 192 hours after the initiation of the experiment, and the transcriptional profile of the brain was analyzed using a replicated dye-swap design on printed 10K oligonucleotide-based arrays (see Experiment methodology below for the rationale for the experimental design). Multiple analysis of variance (ANOVA) tests were performed, one test for each time point. As expected from a chronic stimulus, such as a growing cancer at its earlier stages, transcript levels in mice injected with malignant cells compared to their respective controls showed moderate changes in amplitude (up to 200%). FIG. 1 shows the results for the three experimental models (lung, mammary, and colon). For each experimental model, samples were taken from the hypothalamus, the prefrontal cortex, and the midbrain at three different time points (18, 72, and 192 hours). For each cancer model with each sample at each time point, an ANOVA test was performed. FIG. 1 shows the number of genes found to be differentially expressed using the ANOVA test; the percentage of genes differentially expressed that were up-regulated; the average fold change of the differentially expressed genes; and the 5th and 95th percentile values of the fold change data. As one can see from FIG. 1, genes expressed in the midbrain of the colon cancer model mice at the 192 hour time point exhibited up to a 200% change (95th percentile value) in expression levels.

Interestingly, moderate changes in transcript levels were also found in transcriptomic analysis performed in the brain of mice exposed to non-acute conditions such as changes in the hippocampus following cognitive impairment and cortical changes between sleep and wakefulness (Cirelli et al., Neuron 41(1):35-43 (2004); Blalock et al., J. Neurosci. 23(9):3807-19 (2003)).

The largest number of differentially expressed genes was found in the hypothalami of mice injected with lung cancer cells (FIG. 1). In addition, the lung and mammary cancer models showed the largest number of differentially expressed genes at the 18 hour time point, while in the colon cancer model the number of differentially expressed genes was evenly distributed across the three time points (FIG. 1).

The quality of each dataset was evaluated by the standard error of biological replicates, and the density distribution of the absolute value of log10 standard error (ASE) and their median value for the different cancer models was graphed. ASE was calculated for each gene across the biological replicates. No difference in the quality of the different datasets was found, indicating that the observed differences between brain regions and cancer models reflect the biology of the system (FIGs. 2A-D). Thus, the different numbers of differentially expressed genes identified in the different brain areas reflect the ability of the different brain areas to sense tumor presence, rather than being an artifactual result.

Taken together, these results indicate that there is an immediate recognition by the brain of the presence of malignant cells at the periphery.

Example 2: Study of CNS Expression Profiles Associated With Cancer

A more complete study was done using a mixed model ANOVA design to identify differentially expressed genes at any of the three time points. Genes having statistically significant changes at any time point were selected for further analysis. FIG. 3 shows the results for the three experimental models (lung, mammary, and colon). For each experimental model, samples were taken from the hypothalamus, the prefrontal cortex, and the midbrain at three different time points (18, 72, and 192 hours). For each cancer model and brain area, an ANOVA test was performed. FIG. 3 shows the number of genes found to be differentially expressed using the ANOVA test; the percentage of genes differentially expressed that were up-regulated; the average fold change of the differentially expressed genes; and the 5th and 95th percentile values of the fold change data.

Analysis of Biological Replicates

At least four biological replicates were performed at each time point. For the identification of differentially expressed genes, a mixed effects experimental design was employed with factor treatment (levels: tumor and control) and random effect experiment (levels: 1-4) without replication. The statistical significance for the treatment factor was estimated by ANOVA (P. Pavlidis, Methods 31, 282 (2003)). For the identification of genes affected by the factor treatment at any time point, we decided to add the factor time (levels: 18, 72, and 192 hours) to the previous design instead of performing multiple ANOVA tests, because this design has better sensitivity without loss of selectivity when compared with multiple one-way ANOVA or t-tests (P. Pavlidis, W. S. Noble, Genome Biol. 2, 42 (2001)).

To identify those genes that changed their expression levels in the same direction (i.e., genes that changed in at least one time point and that, if they changed in more than one time point, were always up-regulated or always down-regulated), the statistical significance for the average difference between treatments was estimated by ANOVA.

Fisher's exact test was used to estimate whether the number of differentially expressed genes shared by two or more tumor models was significantly higher than the number expected by chance alone.

Example 3: A More Restrictive Analysis of the Study in Example 2

In a more restrictive analysis, genes showing up-regulation at one time point but down-regulation at a different time point were eliminated. Genes that exhibited up-regulation at two time points, but no change at the third time point, were included in the more restrictive analysis. FIG. 4 shows the results of this more restrictive selection process. As one can see by comparing FIGs. 3 and 4, about half the genes were eliminated by this more stringent selection process. FIGs. 5A-J are tables of all of the genes identified by this second, more restrictive selection process.
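The direction-consistency rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `consistent_direction` and the encoding of each gene as a list of signed changes (positive for up-regulation, negative for down-regulation, 0 for no significant change at that time point) are hypothetical and are not taken from the study.

```python
def consistent_direction(changes):
    """Return True if a gene changed at one or more time points and
    every change had the same sign.

    `changes` holds one value per time point (e.g., 18, 72, 192 hours):
    > 0 for up-regulation, < 0 for down-regulation, 0 for no
    significant change (hypothetical encoding).
    """
    went_up = any(c > 0 for c in changes)
    went_down = any(c < 0 for c in changes)
    # Keep genes that moved, but never in both directions.
    return (went_up or went_down) and not (went_up and went_down)


# Up at two time points, unchanged at the third: retained.
print(consistent_direction([0.3, 0.2, 0.0]))
# Up at one time point, down at another: eliminated.
print(consistent_direction([0.3, -0.2, 0.0]))
```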

Example 4: Real Time PCR Validation Studies of the Hypothalamus in the Lung Cancer Model

Real time PCR validation studies were focused on the hypothalamic region of the lung cancer group for two main reasons: (a) it was one of the regions with the largest changes in gene expression (see FIGs. 1 and 4) and (b) analysis of the false discovery rate for genes rank-sorted according to statistical significance (Benjamini et al., Behav. Brain Res. 125(1-2):279-84 (2001)) confirmed that the hypothalamic region as a whole, and the lung cancer model in particular, showed the largest number of differentially expressed genes. To study whether other organs might show gene expression changes due to tumor cell growth, cancer cells were injected into mice and changes in transcript levels in the liver were evaluated. Figures 6A-C show the estimated proportion of false discoveries (y-axis) as a function of the number of genes identified as differentially expressed (x-axis) after injection with lung cancer cells (6A), breast cancer cells (6B), or colon cancer cells (6C). False discovery rate analysis with an arbitrary cutoff for false discoveries set at 0.2 revealed only 3 differentially expressed genes in the liver (FIGs. 6A-C), which is in marked contrast to the approximately 200 hypothalamic genes selected in the lung cancer model or the average of more than 60 genes if all brain regions and tumor models were pooled (FIG. 6). Real time PCR was used to validate all the annotated genes showing an amplitude change in the liver microarrays larger than 30%. Only a sequence of a zinc finger gene (Zfand2a) of unknown function was validated out of the 11 genes differentially expressed in the microarray data. Samples for real time PCR validation studies were selected according to the significance of their p-value in the microarrays independently of their amplitude of changes. This method of selection led to 55% validation by real time PCR. The percent validation rose to 83% if only amplitude changes > 1.3 fold were considered. FIG. 7A shows the results of the real time PCR validation study grouped according to percentage fold change, including fold changes greater than 5% (1.05). Of the genes that showed a fold change greater than 5% in the experiment of Example 2, 55% of the 29 genes were validated by real time PCR. Of the genes that showed a fold change greater than 5% in Example 3, 55% of the 22 genes were validated by real time PCR. The extent of validation by real-time PCR was similar to that reported previously for low levels of differential expression in the brain (Wurmbach et al., Methods 31(4):306-16 (2003)).
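A curve of estimated false discovery proportion against the number of genes selected, of the kind plotted in FIGs. 6A-C, can be sketched as follows. The text does not state which estimator was used; the sketch assumes a Benjamini-Hochberg-style estimate (number of tests times the k-th smallest p-value, divided by k), and the function names and p-values are hypothetical.

```python
def estimated_fdr_curve(p_values):
    """Estimated false discovery proportion when the k most significant
    genes are selected, for k = 1..m (Benjamini-Hochberg-style estimate;
    an assumption, not necessarily the estimator used in the study)."""
    m = len(p_values)
    ranked = sorted(p_values)
    curve = [min(1.0, m * p / (k + 1)) for k, p in enumerate(ranked)]
    # Make the curve non-decreasing in k by taking a running minimum
    # from the least significant gene backwards.
    for k in range(m - 2, -1, -1):
        curve[k] = min(curve[k], curve[k + 1])
    return curve


def genes_below_cutoff(p_values, cutoff=0.2):
    """Number of genes selected at the given false discovery cutoff."""
    return sum(1 for q in estimated_fdr_curve(p_values) if q <= cutoff)


# Hypothetical p-values for five genes.
example = [0.01, 0.02, 0.03, 0.5, 0.9]
print(genes_below_cutoff(example, cutoff=0.2))
```

The 0.2 default mirrors the arbitrary cutoff mentioned in the text; any other value can be passed in.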

Since the extent of validation was very similar between the experiments in Examples 2 and 3 (FIG. 7A), the rest of the experiment was conducted with the list of genes showing altered expression in only one direction (the second, more restrictive analysis), based on the idea that they would be more reliable candidates for monitoring cancer growth.

Real-Time PCR

0.5 μg of mRNA was reverse-transcribed using oligo(dT)₁₂₋₁₈ (Invitrogen) and SuperScript II RNase H⁻ Reverse Transcriptase (Invitrogen) following manufacturer instructions. mRNA degradation was performed by incubation at 37°C for 15 minutes with 2 μl of 2.5 M NaOH. Reactions were neutralized with 10 μl of 2 M HEPES free acid, followed by cDNA precipitation. Quantification of cDNA was performed using Oligreen ssDNA Quantitation Reagent (Invitrogen). Primers were designed using the Primer3 program (at www-genome.wi.mit.edu), and obtained from Invitrogen. Each gene was analyzed by comparing with two housekeeping genes (beta2-microglobulin and beta-actin) using SYBR Green I (Invitrogen) in 96-well optical plates on an iCycler iQ Real-Time Detection System (Bio-Rad). For each 25 μl reaction, 1 μl cDNA dilution, 2.5 μl 10X PCR Buffer, 1.5 μl 50 mM MgCl₂, 0.75 μl 10 mM dNTP Mix, 0.5 μl of each primer (10 μM), 0.75 μl SYBR Green I (1:1000 dilution), 0.25 μl 10 mg/ml BSA, 0.25 μl Rox reference dye (Invitrogen), 0.25 μl glycerol, and 0.25 μl Platinum Taq DNA Polymerase (Invitrogen) were used. PCR conditions were set as follows: 2.5 min at 94°C, and 40 cycles of 45 sec at 94°C, 30 sec at 59°C, and 15 sec at 72°C. Melting curves were obtained for all samples. Each target gene was assessed 2-4 times, each time in triplicate. Normalization versus an average of both housekeeping genes was then performed. Statistical significance was estimated by ANOVA.

Example 5: Venn Diagram Analysis of Genes Selected in Example 3

Venn diagrams of genes selected in Example 3 showed a statistically significant number of shared genes when comparing the hypothalamus and midbrain data of the lung and mammary cancer models. FIG. 7B shows that 12 genes differentially expressed in the hypothalamus of lung cancer model mice were also differentially expressed in the hypothalamus of mammary cancer model mice. FIG. 7B also shows that 8 genes differentially expressed in the midbrain of lung cancer model mice were also differentially expressed in the midbrain of mammary cancer model mice. These data indicate that the brain changes gene expression levels in unique areas in response to peripheral tumor cells.
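Whether an overlap such as the 12 shared hypothalamic genes exceeds chance can be assessed with a one-sided Fisher's exact test, which for two gene lists drawn from the same array reduces to a hypergeometric tail probability. The sketch below is illustrative only: the function name `overlap_p_value` is hypothetical, and the array size and list sizes in the usage example are made-up placeholders, since the per-list totals appear only in the figures.

```python
from math import comb


def overlap_p_value(n_total, n_a, n_b, n_shared):
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Probability of observing at least `n_shared` genes in common when
    lists of sizes `n_a` and `n_b` are drawn independently from
    `n_total` genes (hypergeometric upper tail).
    """
    denominator = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_shared, min(n_a, n_b) + 1)
    )
    return tail / denominator


# Hypothetical counts: 10,000 genes on the array, lists of 200 and 100
# genes, 12 genes shared (about 2 would be expected by chance).
print(overlap_p_value(10_000, 200, 100, 12))
```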

Example 6: Search for a Subset of Differentially Expressed Genes That Discriminate Between Two Cancer Models.

A search for a subset of differentially expressed genes that might discriminate between paired cancer models was conducted. FIG. 8A shows the clustering of 20 hypothalamic samples (columns) obtained from the lung and colon cancer models based on 33 differentially expressed genes (rows). Similarly, FIG. 8 shows the clustering of hypothalamic samples obtained from mammary and lung cancer models (FIG. 8B) and from colon and mammary cancer models (FIG. 8C). FIG. 9 shows the clustering of cortical samples obtained from mammary and lung cancer models (FIG. 9A); from colon and lung models (FIG. 9B); and from mammary and colon models (FIG. 9C). FIG. 10 shows the clustering of midbrain samples obtained from mammary and lung cancer models (FIG. 10A); from colon and lung models (FIG. 10B); and from mammary and colon models (FIG. 10C). In these figures, the first branch division of most of the hierarchical clustering analyses in the different areas separated the paired cancer models with perfect accuracy, indicating the presence of brain gene expression signatures that discriminate between the different cancers. In addition, the hypothalamic region also showed statistically significant predictive ability for discriminating between lung and colon cancer (ROC-score: 0.98, p < 0.0001), and colon and mammary cancer (ROC-score: 1, p < 0.0001).

Hierarchical Cluster Analysis

From the list of genes that changed their expression levels in the same direction throughout the experiment (p < 0.05; see Analysis of Biological Replicates above), we selected those responding differentially between tumor models (p < 0.01, t-test). Only experimental samples with less than 40% of missing values, and genes with less than 20% of missing values, were included in the analysis. For the Euclidean distance calculation, data were scaled as $Ms_i = (M_i - \bar{M}) / \mathrm{SD}(M)$, where $\bar{M}$ is the mean and SD is the standard deviation. Cluster analysis was performed using Ward's minimum variance agglomeration method (J. A. Hartigan, Clustering Algorithms. John Wiley & Sons Inc, New York, 1975, pp. 366).
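The scaling and distance steps that precede the clustering can be sketched as below. Several details are assumptions rather than statements from the text: the scaling is applied per gene across samples, the sample standard deviation is used, missing values are taken to have been removed already, and the names `scale_gene` and `distance_matrix` are hypothetical.

```python
import math
import statistics


def scale_gene(values):
    """Scale one gene's M values across samples: (M_i - mean(M)) / SD(M).
    Uses the sample SD (an assumption); a constant gene would divide
    by zero and should be removed beforehand."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(genes_by_samples):
    """Pairwise Euclidean distances between samples (columns) after
    scaling each gene (row)."""
    scaled = [scale_gene(row) for row in genes_by_samples]
    samples = list(zip(*scaled))  # transpose: one tuple per sample
    return [[euclidean(s, t) for t in samples] for s in samples]


# Two hypothetical genes measured in three samples.
for row in distance_matrix([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]):
    print(row)
```

The scaled data could then be handed to any Ward-linkage routine, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`.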

Predictive Ability Estimation

A leave-one-out (LOO) cross validation was performed. If there are n samples, for every sample in turn, the classifier was trained on the remaining n-1 samples and the resulting hypothesis was then tested on the sample left out.

Classification was accomplished with a weighted version of the classical k-nearest neighbour (knn) method (see Weighted Nearest Neighbor below). The LOO process produced n hypothesis output values which were compared with a given detection threshold τ to predict the class. By varying τ, a curve of sensitivity vs. 1-specificity of the classification, called the Receiver Operating Characteristic (ROC) curve, can be drawn. The area under the ROC curve (the ROC-score) is a measure of the separating ability of the classifier. Very good classifiers have ROC-scores close to 1; bad classifiers have ROC-scores around 0.5. The statistical significance of the ROC-score was estimated from a null model generated by 10,000 random permutations of the class labels. The p-value of an observed ROC-score was computed as the fraction of randomized ROC-scores greater than the observed score.

Weighted Nearest Neighbor (wNN)

Variables (genes) useful for classification were selected from the training set (see Variables Selection below). The training set (Y) and the test set (X) were then standardized by dividing the variables by the standard deviation of Y. Next, the Euclidean distance d(y, x) between each sample y of the training set, belonging to class l, and the test sample x was calculated. Distances were converted to weights (w) by the Gauss kernel function $w = (2\pi)^{-1/2} \exp(-d^2 / (2s^2))$, where s is the standard deviation of the distances d. The decision function f(x) was built using all the samples for training the wNN classifier, so k = M (as defined below). The hypothesis output value was generated as the estimated voting-score of x belonging to class 0: $f_0(x) = m_0^{-1} \sum_i [w_i \, I(l_i = 0)] / \sum_i w_i$, where l is the vector of class labels and $m_0$ is the number of samples belonging to class 0. Thus, the voting-scores for the classes are:

$f_0(x) = m_0^{-1} \sum_i [w_i \, I(l_i = 0)] / \sum_i w_i$

$f_1(x) = m_1^{-1} \sum_i [w_i \, I(l_i = 1)] / \sum_i w_i$

where $m_0$ and $m_1$ are the numbers of samples belonging to class 0 and 1, respectively.

In this way, the decision function is $D(x) = f_1(x) / f_0(x)$, and if D(x) > 1, then x is assigned to class 1, whereas if D(x) < 1, x is assigned to class 0.

Variables Selection

To select variables (genes) suitable for classification, the training set was searched for a set of genes that maximizes the ROC-score in a LOO process. In brief, genes were rank-ordered according to their differential expression between classes using a t-test. The number of genes in the classifier was optimized by sequentially adding genes from the top of this rank-ordered list, starting at 2 genes and up to 100 genes. The power for correct classification was estimated for each set of genes by the ROC-score calculated with a LOO performed on the training set. The minimum number of genes maximizing the ROC-score was selected as useful for classification.
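The classifier and its evaluation, as described in the Predictive Ability Estimation and Weighted Nearest Neighbor sections, can be sketched as follows. This is a minimal reading of the description, not the original implementation: the function names are hypothetical, the sample standard deviation is assumed, the ROC-score is computed with the rank-based (pairwise comparison) formula for the area under the curve, the guards for degenerate cases are additions, and gene selection is left out.

```python
import math
import random
import statistics


def wnn_decision(train_x, train_labels, x):
    """Return D(x) = f1(x) / f0(x); D(x) > 1 assigns x to class 1.
    Both classes must be present in the training labels."""
    n_vars = len(x)
    # Standardize each variable by its SD in the training set (1.0 if constant).
    sds = [statistics.stdev([row[j] for row in train_x]) or 1.0
           for j in range(n_vars)]
    dists = [math.sqrt(sum(((row[j] - x[j]) / sds[j]) ** 2
                           for j in range(n_vars)))
             for row in train_x]
    s = statistics.stdev(dists) or 1.0
    # Gauss kernel weights: (2*pi)^(-1/2) * exp(-d^2 / (2 s^2)).
    weights = [math.exp(-d * d / (2 * s * s)) / math.sqrt(2 * math.pi)
               for d in dists]
    total = sum(weights)
    if total == 0.0:
        return 1.0  # all weights underflowed: no evidence either way
    f = []
    for c in (0, 1):
        m_c = train_labels.count(c)
        vote = sum(w for w, l in zip(weights, train_labels) if l == c)
        f.append(vote / (m_c * total))
    return math.inf if f[0] == 0.0 else f[1] / f[0]


def loo_scores(xs, labels):
    """Leave-one-out: score each sample with a classifier trained on the rest."""
    return [wnn_decision(xs[:i] + xs[i + 1:], labels[:i] + labels[i + 1:], xs[i])
            for i in range(len(xs))]


def roc_score(scores, labels):
    """Area under the ROC curve: fraction of (class 1, class 0) pairs
    ranked correctly, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(xs, labels, n_perm=200, seed=0):
    """Observed ROC-score and the fraction of label permutations whose
    ROC-score is greater than it."""
    rng = random.Random(seed)
    observed = roc_score(loo_scores(xs, labels), labels)
    greater = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if roc_score(loo_scores(xs, shuffled), shuffled) > observed:
            greater += 1
    return observed, greater / n_perm


# Hypothetical two-gene data: three class-0 and three class-1 samples.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]]
labels = [0, 0, 0, 1, 1, 1]
print(permutation_p_value(xs, labels))
```

The variables selection step could wrap `roc_score(loo_scores(...))` in a loop that adds t-test-ranked genes one at a time and keeps the smallest set that maximizes the score.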

Example 7: Grouping of Differentially Expressed Genes into Functional Categories

To get insight into the biological significance of the microarray information, differentially expressed genes were grouped in functional categories using the functional classification of the unbiased Gene Ontology Consortium database (GO) (M. Ashburner et al., Nat. Genet. 25, 25 (2000)) and further analyzed using Gene Set Enrichment Analysis (GSEA) (A. Subramanian et al., Proc. Natl. Acad. Sci. USA, (2005)). GSEA considers all the genes with no arbitrary cut-off, identifying sets of genes that collectively show significant changes in activity, even though the average individual change is low. Analysis was concentrated on the hypothalamic region. Functional gene sets enriched in differentially expressed genes between each tumor type and their respective controls were identified. The functional categories were further grouped when they were closely related biological processes. A global analysis of functional categories at p < 0.05 found no statistically significant gene sets shared between the different cancer models. This additional type of analysis allows the identification of families of genes differentially expressed in tumor-bearing mice. This analysis is based on the functional relationship among differentially expressed genes and is accepted in the literature, and could give additional information about relevant genes not detected by other means. However, grouping gene sets in categories after constructing directed acyclic GO graphs revealed the existence of similar biological processes shared by the three models (Fig. 19A-I). To explore whether these common processes involved genes with common biological functions, we identified the leading-edge subset of genes (A. Subramanian et al., Proc. Natl. Acad. Sci. USA, (2005)).

This analysis (Fig. 19J-L) identified genes shared by the three cancer models that reflected similar biological processes such as axon guidance and cytoskeleton and synaptic remodelling, which were grouped as "neuronal connections" (Fig. 19B, E, H, and J-L); this included members of gene families acting as positional cues (de Wit, J. & Verhaagen, J. (2003) Prog. Neurobiol. 71, 249-67; Crowner et al. (2003) Curr. Biol. 13, 967-72) such as the TGFβ/BMP, Wnt, hedgehog, and Notch signalling molecules (Fig. 19C, F, and I). Time-course analysis of the lung cancer model points to increased "neuronal connections" associated with the up-regulation of the TGFβ/BMP pathway at 18 hours, the Wnt pathway at 72 hours, and the activation of the Notch pathway at 192 hours (Fig. 19C and Fig. 19J-L). The three pathways as well as the hedgehog signalling pathway were also activated in the mammary and colon cancer models (Fig. 19F and I). In contrast to the lung cancer model, "neuronal connections" were either down-regulated or showed no change in colon and mammary cancer, respectively, suggesting that these signalling pathways might be involved in additional biological processes as well (Fig. 19J-L). It is of note that GSEA analysis yielded no significant changes in additional signalling pathways despite the fact that the array contained gene sequences corresponding to at least 30 additional signalling pathways. Additional leading-edge genes include transcripts associated with "immune response" in the lung and mammary cancer models, and "sickness response" in the three models (Fig. 19B, Fig. 19E, H, and Fig. 19J-L). Down-regulation of "apoptosis" in the lung and colon cancer models and down-regulation of "synaptic activity" in the mammary and colon cancer models were also observed (Fig. 19B, Fig. 19E, H, and 19J-L). Finally, 35% to 40% of leading-edge genes showed unrelated or unknown biological functions (see miscellaneous in Fig. 19A-L).

FIG. 11 is a table of genes validated by real-time PCR. Analysis of these genes showed that some are involved in neuronal development and activity (a clearly represented biological category in the GSEA analysis). These genes included thromboxane A2 receptor, which is involved in myelin homeostasis (Blackman et al., J. Biol. Chem. 273(1):475-83 (1998)); ataxin-2, which is involved in dopaminergic transmission (Boesch et al., Mov. Disord. 19(11):1320-1325 (2004); Wullner et al., Arch. Neurol. 62(8):1280-5 (2005)); and contactin 2, oxytocin, and GABAArapl2, which participate in synaptic activity (Furley et al., Cell 61(1):157-70 (1990); Theodosis et al., Neurochem. Int. 45(4):491-501 (2004); Okazaki et al., Brain Res. Mol. Brain Res. 85(1-2):1-12 (2000)). Interestingly, thromboxane A2 receptor and oxytocin could be assigned to inflammation / immune-related processes, a biological category also overrepresented in GSEA analysis, and also to a behavioral state known as sickness behavior. This state changes the behavioral priorities of animals and human beings to concentrate efforts in coping with an ongoing disease; importantly, different studies provided strong links between this behavioral state and cancer progression (reviewed in Konsman et al., Trends Neurosci. 25(3):154-9 (2002)).

Gene Annotation and Gene Set Enrichment Analysis

Gene information was obtained from Entrez Gene (available on the world wide web at ncbi.nlm.nih.gov/entrez), and Mouse Genome Informatics (available on the world wide web at informatics.jax.org). Functional gene sets were generated from the Biological Process (BP) ontology of the Gene Ontology (GO) database (Ashburner et al., Nat. Genet. 25(1):25-9 (2000)). Mappings from Entrez Gene numbers to GO-BP categories were obtained from the Gene Ontology Annotation database (GOA) (Camon et al., Nucleic Acids Res. 32(Database issue):D262-6 (2004)), implemented in the GO R-package. Genes were rank-ordered according to their p-value for differential expression estimated by a t-test. Gene sets overrepresented at the top of the ranked gene list were identified with the Gene Set Enrichment Analysis algorithm (Subramanian et al., Proc. Natl. Acad. Sci. USA 102(43):15545-50 (2005)). Statistical significance (nominal p values) was estimated by permuting the genes 10,000 times, i.e., genes were randomly assigned to the sets while maintaining their size. Differential nodes refer to the statistically significant GO-BP sets (p < 0.05).
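The gene-permutation step described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: it assumes an unweighted Kolmogorov-Smirnov-style running sum as the enrichment statistic (the published GSEA algorithm uses a weighted variant), a one-sided test for enrichment at the top of the ranked list, and a +1 correction on the nominal p value. The function names and the toy gene identifiers are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (assumed statistic).

    Walks down the ranked list, stepping up at set members and down at
    non-members, and returns the maximum deviation from zero.
    """
    members = set(gene_set)
    in_set = [g in members for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for hit in in_set:
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_permutation_p(ranked_genes, gene_set, n_perm=10000, seed=0):
    """Nominal p value from random gene sets of the same size.

    Genes are randomly assigned to the set while its size is kept,
    mirroring the permutation scheme described in the text.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = len(set(gene_set) & set(ranked_genes))
    at_least_as_extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if enrichment_score(ranked_genes, random_set) >= observed:
            at_least_as_extreme += 1
    # +1 correction (an assumption) avoids reporting p = 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 100 genes already ordered by t-test p value (hypothetical).
    ranked = [f"gene{i}" for i in range(100)]
    toy_set = {"gene0", "gene2", "gene3", "gene7", "gene40"}
    score, p_value = gene_permutation_p(ranked, toy_set, n_perm=2000)
    print(f"ES = {score:.3f}, nominal p = {p_value:.4f}")
```

Under these assumptions, a set would be reported as a differential node when its nominal p value falls below 0.05.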

Example 8: Study of Genes Related to a Behavioral State Known as Sickness Behavior

The expression of genes that were not present in the array used in Examples 1, 2, and 3, but known to be involved in sickness behavior, was studied.

FIG. 12 is a table of genes known to be involved in sickness behavior, but not present in the microarray used in the above examples and genes that have been validated by real-time PCR to be differentially expressed in at least one time point, one brain region, and one cancer model. For example, levels of cyclo-oxygenase-2 (ptgs-2) in the hypothalamus and the cortex of mice injected with the three cancer types were found to be up-regulated (+17.3% at 18 hours, in the cortex, in the mammary cancer model; +9.8% at 18 hours, in the hypothalamus, in the colon cancer model; and +14.3% at 72 hours, in the hypothalamus, in the lung cancer model; see FIG. 12).

Transcript levels of genes involved in attenuating sickness behaviour via interference with cytokine function were downregulated in the hypothalami of mice injected with different tumor models. For example, pro-opiomelanocortin-alpha (Pomc1) was decreased in all models; arginine vasopressin in lung and mammary cancer cells; and melanocortin receptors 4 and 5 (Mc4r and Mc5r) in colon cancer cells alone (FIG. 12). No changes in transcript levels were observed for a sickness behavior-related group of cytokines, IGF-I, IL-1β and TNF-α. These data indicate the generation of a molecular environment prone to the development of sickness behaviour.

In addition, the diminished expression of indoleamine 2,3-dioxygenase (Indo) in the hypothalami of all tumor models (FIG. 12) might indicate greater accessibility of tryptophan to produce serotonin, a neurotransmitter associated with cancer-related cachexia (Konsman et al., Trends Neurosci. 25(3):154-159 (2002); Laviano et al., Nutrition 18(1):100-105 (2002)), suggesting that the molecular changes leading to cachexia might initiate early, before clinical symptoms are detected. Finally, after an initial increase at 18 hours, neuropeptide Y (Npy) transcript levels returned to normal or decreased in the cortical area of mice injected with mammary and lung cancer cells, while in colon cancer a decrease was observed much later at 192 hours (FIG. 12). Decrease in Npy and Mcr levels might also precede altered feeding behaviour and response to stress that occurs later in cancer development (Heilig and Thorsell, Rev. Neurosci. 13(1):85-94 (2002)).

Example 9: Behavioral Studies of Tumor-Bearing Animals

The expression of genes related to sickness behaviour was not accompanied by behavioral changes in the tumor-bearing animals, as analyzed by burrowing and swimming tests.

Burrowing

This test was performed as described (Guenther et al., Eur. J. Neurosci. 14(2):401-409 (2001)). Briefly, at least two hours before the start of a dark period, mice were placed in individual plastic cages containing plastic tubes (20 cm long, 4 cm diameter) filled with pellets. The lower closed end was resting on the cage floor. The open end was supported 3 cm above the cage floor with screws, preventing the contents from being non-purposefully displaced. The tube was filled with 300 grams of pellets. The weight of pellets remaining in the tube was measured 16 hours later, from which the weight displaced (burrowed) was calculated.

Forced Swimming Test

Mice were individually placed in cylinders (19 cm in diameter and 25 cm high) filled with water (24 ± 1 °C) up to 15 cm deep. Time to immobility was recorded (Petit-Demouliere et al., Psychopharmacology (Berl) 177(3):245-55 (2005)). Mice were considered immobile when they made only the minimal movements needed to stay afloat. After the end of the experiment, mice were dried with a towel and placed back in the cage.

Results

FIGs. 13A-B show the results of behavioral tests of tumor-bearing mice and normal control mice. The results are substantially similar for tumor-bearing and control mice.

These results suggest that the brain could modulate the expression profile of certain genes long before behavioral changes are manifested.

Example 10: Comparison Between Rheumatoid Arthritis and Cancer Models

To further evaluate the specificity of the molecular response of the brain to cancer, gene expression changes in a chronic disease other than cancer were assessed. Two models of rheumatoid arthritis (RA) in DBA/1 and C57 Black mice were used. Brain samples were obtained at a very early stage of disease, i.e., one week after immunization/boost in both models. Microarray analysis identified 89, 73, and 104 differentially expressed genes with >1.2 fold change in the hypothalamus, prefrontal cortex, and midbrain, respectively, when compared with control mice. Comparison of the data between models showed that mice injected with cancer cells and RA mice shared non-significant numbers of differentially expressed genes. Hierarchical clustering analysis demonstrated that the first branch division separated all the arthritic samples from paired samples corresponding to each cancer model with perfect accuracy (FIGs. 14A-C, 15A-C, and 16A-C). A similar analysis considered the ten top-ranked hypothalamic genes in each of the 6 possible pair-wise comparisons between the RA groups of mice and the three cancer models. FIGs. 17A-B show that in this analysis 100% of the arthritis and 100% of the cancer model samples split in the first clustering branch division. The second branch division included 100% of mammary cancer samples; 1 out of 7 of the colon cancer samples; and 3 out of 12 lung cancer samples (FIG. 17A). Finally, the third branch division separated the rest of the colon cancer samples from the lung cancer samples (FIG. 17A). The cortex and midbrain areas were less effective than the hypothalamus in discriminating between the different samples. Calculation of the ROC-score demonstrated that the prefrontal cortex data showed statistically significant predictive ability for discrimination between RA and mammary cancer models (ROC-score: 0.8333, p < 0.001), and RA and lung cancer models (ROC-score: 0.8013, p < 0.001). By comparison, the hypothalamus was able to discriminate between RA and colon cancer models (ROC-score: 0.9107, p < 0.001), and RA and mammary cancer models (ROC-score: 1, p < 0.001). The midbrain showed no predictive capacity to discriminate between the two models of chronic disease, although, upon false discovery rate analysis, the midbrain showed the greatest number of differentially expressed genes (FIG. 18). GSEA analysis of hypothalamic samples showed no statistically significant shared gene sets between RA and cancer models. FIGs. 17B-D are lists of genes selected from each region that change for either arthritis model with p-value < 0.05 and fold change > 20%; for the genes that change in both models, the change must be in the same direction.
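The text does not state how the ROC-scores quoted above were computed or how their p values were derived. As an illustration only, the sketch below computes the area under the ROC curve in its rank-based (Mann-Whitney) form: the fraction of sample pairs, one from each group, that a per-sample score orders correctly, with ties counted as one half. The function name and the per-sample scores are hypothetical.

```python
def roc_score(scores_group_a, scores_group_b):
    """Area under the ROC curve for separating group A from group B.

    Equals the probability that a randomly chosen group-A sample scores
    higher than a randomly chosen group-B sample (ties count 0.5).
    A value of 1.0 means perfect separation; 0.5 means no better than chance.
    """
    if not scores_group_a or not scores_group_b:
        raise ValueError("both groups need at least one sample")
    correctly_ordered = 0.0
    for a in scores_group_a:
        for b in scores_group_b:
            if a > b:
                correctly_ordered += 1.0
            elif a == b:
                correctly_ordered += 0.5
    return correctly_ordered / (len(scores_group_a) * len(scores_group_b))


if __name__ == "__main__":
    # Hypothetical classifier scores for cancer-model and RA-model samples.
    cancer_scores = [0.9, 0.8, 0.7]
    arthritis_scores = [0.1, 0.2, 0.75]
    print(roc_score(cancer_scores, arthritis_scores))
```

On this reading, the reported ROC-score of 1 for the hypothalamus (RA versus mammary cancer) would correspond to every cancer sample outscoring every RA sample.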

In Vivo Studies - Arthritis model

Immunization procedure: 10-12-week-old DBA/1 and C57BL/6 male mice were injected intradermally at the base of the tail at days 1 and 21, with 0.1 ml of 2 mg/ml Type II chicken collagen (Sigma Co., St. Louis, MO), combined with an equal volume of complete Freund's Adjuvant (CFA, Sigma Co.).

Sham-treated animals were injected with CFA alone. At day 24, mice were injected i.p. with LPS (lipopolysaccharide, 40 μg in 0.1 ml PBS, Sigma Co.) and one week later (day 31) mice showing signs of early disease were killed by cervical dislocation and the hypothalamus, pre-frontal cortex and midbrain were dissected. Onset of arthritis was macroscopically visible as paw swelling 3 weeks after immunization. Using this immunization procedure, between 73% and 90% of mice developed arthritis. Clinical features of arthritis could be monitored by assessing paw swelling and the number of swollen paws using calipers once every two days (Taylor et al., Eur. J. Immunol. 25(3):763-9 (1995)).

Early detection of cancer, when intervention can often be implemented more successfully, remains a major challenge. The purpose of the work described herein was to establish whether the CNS, as a known integrator of signals emanating from the periphery, could detect cancer progression. The present studies demonstrate that the brain indeed reacts to the presence of peripheral tumors, with changing gene expression levels seen as early as 18 hours after injection of cancer cells. Single gene and gene set analyses demonstrated that molecular signatures for each cancer model are specific. In addition, the brain recognizes the early onset of another chronic disease such as RA with a specific gene profile, different from the ones observed in the cancer models. In contrast, a peripheral organ such as the liver was essentially unable to detect the presence of the tumors. Finally, the hypothalamus appeared as a region that can more efficiently detect and discriminate very early signals emanating from chronic peripheral diseases.

Example 11: Identification of Differentially Expressed Proteins in CSF

The experiments described in this example were designed to determine whether the differential CNS gene expression associated with cancer as described herein also correlates with the presence of different proteins in the CSF.

In vivo studies were conducted as described in Example 1. CSF samples were obtained at 24 hours and at 9 days post injection, from the same group of 30 animals (15 tumor injected and 15 controls). For each tumor type, four independent experiments were done.

For CSF extraction, animals were anesthetized and CSF was obtained by puncture of the fourth ventricle with a 27G butterfly. CSF samples from each experimental group were pooled. CSF samples were precipitated with TCA/acetone, resuspended in an appropriate volume of lysis buffer (Urea 7 M, Thiourea 2 M, CHAPS 4%, TCEP-HCl 2 mM), incubated with shaking for 1 hour at room temperature, and stored at -80 °C. To perform 2-D analysis, samples were resuspended in an appropriate volume of rehydration solution and applied by the cup loading method onto overnight-rehydrated 11 cm pH 3-10 IPG strips (GE Healthcare, Piscataway, NJ). Isoelectrofocusing (IEF) was performed in an Ettan IPGphor isoelectrofocusing system (GE Healthcare, Piscataway, NJ) according to the manufacturer's instructions. After IEF, strips were incubated for 15 minutes in equilibration buffer (50 mM Tris-HCl, 6 M urea, 30% v/v glycerol, 2% SDS, 0.005% bromophenol blue) containing 1% DTT, followed by a second 15-minute incubation with 4% iodoacetamide in equilibration buffer. The second dimension (SDS-PAGE) was performed in a Bio-Rad Miniprotean III system (Bio-Rad, Hercules, CA).

Data analysis, statistics and protein identification:

SYPRO Ruby (Sigma, St. Louis, MO) stained gels were scanned in an Image scanner and analyzed with Image Master 2D Elite 3.10 software (both from GE Healthcare, Piscataway, NJ). Spots were automatically detected, manually edited, and matched between gels. Normalization and inference statistics were performed under R-System V2.1.1. Proteins were detected using mass-spectrometry compatible silver nitrate staining.

Results

The proteins identified as either increased or decreased are listed in Table 1.

TABLE 1: PROTEINS DIFFERENTIALLY PRESENT IN CSF

LOCUS DEFINITION

P28665 Murinoglobulin-1 precursor (MuG1)

BAA23958 TCR beta chain

BAE28703 unnamed protein product

AAA40942 alpha-1 type III collagen

XP_995007 similar to Zinc finger protein 235 (Zinc finger protein 93) (Zfp-93)

XP_576394 similar to glyceraldehyde-3-phosphate dehydrogenase

P16617 Phosphoglycerate kinase 1

BAE26325 unnamed protein product

AAH85098 Enolase 1, alpha non-neuron

NP_033931 carbonic anhydrase 2

BAB27362 unnamed protein product

AAH62957 Cp protein

AAH43338 Complement component 3

XP_223362 similar to mKIAA1458 protein

These results indicate that there is a differential pattern of proteins present in the CSF of animals with cancer, as opposed to healthy animals, likely as a consequence of the differential gene expression in the CNS as described herein.

Example 12: Identification of Differentially Expressed CNS Proteins in Blood

The experiments described in this example were designed to determine whether the differential CNS gene expression associated with cancer as described herein also correlates with the presence of different proteins in the blood of the implanted mice.

In vivo studies were conducted as described in Example 1. Blood was collected in a 50 ml conical Falcon tube with EDTA 10%. Lysis buffer (KHCO₃ 10 mM, NH₄Cl 150 mM, EDTA 0.1 mM, pH = 8) was added in a proportion of 150 ml of lysis buffer per 1 ml of blood. Samples were incubated at room temperature (RT) for 15 minutes with occasional agitation, and then centrifuged for 15 minutes at RT at 300 x g. The supernatant was discarded and the pellet was homogenized with 500 μl of Trizol™ reagent (Invitrogen), and total RNA was isolated following the manufacturer's instructions. RNA was DNase treated and purified with the RNeasy Mini Kit (Qiagen) following the manufacturer's instructions. Real-time PCR experiments were conducted as in Example 4.

The results, shown in Table 2, indicate a differential pattern of gene expression present in the blood of animals with cancer, as opposed to healthy animals.

TABLE 2: VALIDATED GENES IN BLOOD SAMPLES

Gene; GenBank Acc. #; Fold change (%); p Value; Time point (hr)

Atxn2; NM_009125; -106.38; 0.020; 18

Gabarapl2; NM_026693; -237.23; 6.37E-06; 18

Gabarapl2; NM_026693; +59.42; 0.028; 192

Atp5k; NM_007507; -1004.77; 0.003; 18

Atp5k; NM_007507; -54.99; 0.041; 72

Gpr109a; NM_030701; -279.9; 0.039; 18

Kin; X58472; -1796.78; 0.031; 18

OTHER EMBODIMENTS

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

Claims

What is claimed is:

1. A method of diagnosing a non-central nervous system (non-CNS) cancer in a subject, the method comprising: providing a reference gene expression profile comprising five or more genes selected from the genes listed in one or more of FIGs. 5A-5I, 11A, 12, or 21A-23C, or homologs thereof, and optionally one or more genes listed in FIGs. 5J or 11B; generating a subject gene expression profile comprising detecting expression of all genes of the reference gene expression profile in a CNS sample of the subject; and comparing the subject gene expression profile with the reference gene expression profile, wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer.

2. The method of claim 1, wherein the CNS sample is a sample of one or more cells from the brain.

3. The method of claim 2, wherein the brain cells are selected from the group consisting of cells from the hypothalamus, the midbrain, and the prefrontal cortex.

4. The method of claim 3, wherein the brain cells are selected from the hypothalamus.

5. The method of claim 1, wherein two or more reference gene expression profiles are used, each specific for a different non-CNS cancer.

6. The method of claim 1, wherein the non-CNS cancer is selected from the group consisting of lung cancer, colon cancer, and mammary cancer.

7. The method of claim 1, wherein the non-CNS cancer is a solid tumor less than 0.5 cm in diameter.

8. The method of claim 1, wherein the reference gene expression profile comprises ten or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C or homologs thereof.

9. The method of claim 1, further comprising providing a control gene expression profile corresponding to one or more healthy subjects; and comparing the subject gene expression profile with the control gene expression profile, wherein a match of the subject gene expression profile to the control gene expression profile indicates the subject does not have and is not likely to develop non-CNS cancer.

10. The method of claim 1, wherein gene expression is detected using a microarray assay.

11. The method of claim 1, wherein the subject is a human and the reference gene expression profile comprises one or more human homologs of genes listed in FIGs. 5A-5I, 11A, 12, and 21A-23C.

12. The method of claim 1, wherein the subject has a family history of cancer.

13. The method of claim 1, wherein the subject lacks a clinical sign of a cancer as evaluated by imaging analysis.

14. The method of claim 1, wherein the subject is a carrier of a gene associated with an increased risk of developing the disorder.

15. The method of claim 14, wherein the subject is a carrier of the BRCA1, BRCA2, hMSH2, hMLH1, or hMSH6 gene.

16. The method of claim 1, further comprising generating a record of the result of the comparing step; and optionally transmitting the record to the subject, a health care provider, or another party.

17. The method of claim 1, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop non-CNS lung cancer.

18. The method of claim 1, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop non-CNS mammary cancer.

19. The method of claim 1, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS colon cancer.

20. A reference gene expression profile corresponding to the presence of a non-central nervous system (non-CNS) cancer, comprising expression data of five or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, or 21A-23C, and optionally any gene listed in one or both of FIGs. 5J or 11B.

21. The reference gene expression profile of claim 20, wherein the reference gene expression profile comprises expression data for five or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C.

22. The reference gene expression profile of claim 20, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is lung cancer.

23. The reference gene expression profile of claim 22, wherein the reference gene expression profile comprises expression data for five or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is lung cancer.

24. The reference gene expression profile of claim 20, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is mammary cancer.

25. The reference gene expression profile of claim 24, wherein the reference gene expression profile comprises expression data for Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is mammary cancer.

26. The reference gene expression profile of claim 20, wherein the reference gene expression profile comprises expression data for one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and wherein the non-central nervous system (non-CNS) cancer is colon cancer.

27. The reference gene expression profile of claim 26, wherein the reference gene expression profile comprises expression data for five or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof.

28. A computer-readable medium comprising a data set corresponding to a reference gene expression profile of claim 20.

29. A system for diagnosing a non-central nervous system (non-CNS) cancer in a subject, the system comprising: a sampling device to obtain a central nervous system (CNS) sample; a gene expression detection device that generates gene expression data for one or more genes in the CNS sample or an imaging device to obtain an image of gene expression of one or more genes in the CNS and generate gene expression data for the one or more genes; a reference gene expression profile of claim 20 for a specific non-CNS cancer; and a comparator that receives and compares the gene expression data with the reference gene expression profile.

30. A method of diagnosing a non-central nervous system (non-CNS) cancer in a subject, the method comprising: providing a reference gene expression profile comprising five or more genes selected from any gene listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C, and optionally any gene listed in one or both of FIGs. 5J or 11B; generating a subject expression profile comprising detecting expression of proteins encoded by all genes of the reference gene expression profile in a CNS sample of the subject; and comparing the subject expression profile with the reference gene expression profile, wherein a match of the subject expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer.

31. The method of claim 30, wherein the CNS sample is a cerebrospinal fluid (CSF) sample.

32. The method of claim 30, wherein two or more reference gene expression profiles are used, each specific for a different non-CNS cancer.

33. The method of claim 30, wherein the non-CNS cancer is selected from the group consisting of lung cancer, colon cancer, and mammary cancer.

34. The method of claim 30, wherein the non-CNS cancer is a solid tumor less than 0.5 cm in diameter.

35. The method of claim 30, wherein the reference gene expression profile comprises five or more genes selected from any genes listed in one or more of FIGs. 5A-5I, 11A, 12, and 21A-23C, or homologs thereof.

36. The method of claim 30, further comprising obtaining a control gene expression profile corresponding to one or more healthy subjects; and comparing the subject expression profile with the control gene expression profile, wherein a match of the subject expression profile to the control gene expression profile indicates the subject does not have and will not likely develop the non-CNS cancer.

37. The method of claim 30, wherein the subject is a human and the reference gene expression profile comprises one or more human homologs of genes listed in FIGs. 5A-5I, 11A, 12, and 21A-23C.

38. The method of claim 30, wherein the subject has a family history of cancer.

39. The method of claim 30, wherein the subject lacks a clinical sign of a cancer as evaluated by imaging analysis.

40. The method of claim 30, wherein the subject is a carrier of a gene associated with an increased risk of developing the disorder.

41. The method of claim 30, wherein the subject is a carrier of the BRCA1, BRCA2, hMSH2, hMLH1, or hMSH6 gene.

42. The method of claim 30, further comprising generating a record of the result of the comparing step; and optionally transmitting the record to the subject, a health care provider, or another party.

43. The method of claim 30, wherein the reference gene expression profile comprises expression data of one or more genes selected from the following group of genes: Tbxa2r, Atxn2, Cntn2, Oxt, Gabarapl2, Unc84a, Atp5k, Bmp15, Kin, Nadk, Avp, Indo, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS lung cancer.

44. The method of claim 30, wherein the reference gene expression profile comprises expression data of one or more genes selected from the following group of genes: Avp, Indo, Pomc, Npy, Ptgs2, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS mammary cancer.

45. The method of claim 30, wherein the reference gene expression profile comprises expression data of one or more genes selected from the following group of genes: Avp, Indo, Mc4r, Mc5r, Pomc, Ptgs2, Npy, and homologs thereof; and wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS colon cancer.

46. A method of diagnosing a non-central nervous system (non-CNS) cancer in a subject, the method comprising: providing a reference gene expression profile comprising five or more genes selected from the group consisting of Atxn2, Gabarapl2, Atp5k, Gpr109a, and Kin; generating a subject gene expression profile comprising detecting expression of all genes of the reference gene expression profile in a blood sample of the subject; and comparing the subject gene expression profile with the reference gene expression profile, wherein a match of the subject gene expression profile to the reference gene expression profile indicates the subject has or is likely to develop the non-CNS cancer.

47. A method of diagnosing a non-central nervous system (non-CNS) cancer in a subject, the method comprising: providing a reference protein expression profile comprising five or more proteins selected from the genes listed in Table 1; generating a subject protein expression profile comprising detecting expression of all proteins of the reference protein expression profile in a cerebrospinal fluid (CSF) sample of the subject; and comparing the subject protein expression profile with the reference protein expression profile, wherein a match of the subject protein expression profile to the reference protein expression profile indicates the subject has or is likely to develop the non-CNS cancer.