WO2002044411A1 - Use of profiling for detecting aneuploidy - Google Patents

Use of profiling for detecting aneuploidy

Info

Publication number
WO2002044411A1
Authority
WO
WIPO (PCT)
Prior art keywords
genes
type
organism
cell
cellular constituents
Prior art date
Application number
PCT/US2000/035352
Other languages
French (fr)
Inventor
Matthew J. Marton
Timothy R. Hughes
Original Assignee
Rosetta Inpharmatics, Inc.
Application filed by Rosetta Inpharmatics, Inc.
Publication of WO2002044411A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Definitions

  • the present invention relates to methods of using profiles to detect aneuploidy, in particular, to determine the likelihood that aneuploidy is present in a cell type or organism.
  • the present invention relates to methods of diagnosing or determining the predisposition of an organism toward diseases that are associated with abnormal copy numbers of one or more genes, i.e., aneuploidy.
  • the present invention also relates to methods of correcting a profile for the presence of aneuploidy.
  • the present invention further relates to a computer system, a computer program product, and kits for detecting aneuploidy or determining the likelihood that aneuploidy is present in a cell type or organism from profiles.
  • Aneuploid cells have a chromosomal constitution that differs from the usual chromosomal constitution for a given species.
  • Germ line cells are said to have n chromosomes. If an organism or species has 2n chromosomes in its somatic cells, the organism or species is said to be diploid. Different organisms or species, or the same organism at different phases of a life cycle, can have different ploidy.
  • Yeast cells, e.g., can grow as haploid (n chromosomes), diploid (2n chromosomes), or polyploid (such as 4n; see Galitski et al. (1999) Science 285:251-254). Many plant species are octaploid (8n).
  • Aneuploidy may occur by loss or gain of one or more chromosomes or chromosomal segments and can have drastic effects on phenotypic expression.
  • Aneuploidy usually results from non-disjunction of chromosomes during meiosis, which in turn results in gametes having too many or too few chromosomes.
  • Chromosomal non-disjunction can also occur during mitosis, resulting in individuals that express chromosomal mosaicism, i.e., having some somatic cells or tissues that are aneuploid and others that are euploid, which may be associated with mild to severe phenotypic manifestations.
  • Euploid (“true-ploid”) cells have the appropriate or correct amount of genetic material for a given species. Therefore, they are the opposite of aneuploid cells. Aneuploidy can also result from spurious recombination events that result in the amplification or duplication of either full chromosomes or chromosomal segments. Aneuploidy is often lethal in animals but can be tolerated to a greater extent in plants. Trisomies (2n+1 chromosomes) are the most common form of aneuploidy and result in the least severe phenotypic aberrations.
  • aneuploid species of some plants may either have almost wild-type characteristics or may be small and infertile, depending on which chromosome is affected (E.R. Sears, “The Aneuploids of Common Wheat,” University of Missouri Research Bulletin, November, 1954).
  • Contiguous gene syndromes result from deletions and amplifications of regions of chromosomes. Contiguous gene syndromes in humans may cause severe mental and physical deformities, and include Alagille syndrome (20p12 chromosomal deletion), Angelman syndrome (15q11 deletion of maternal chromosome), DiGeorge syndrome (22q11.21 deletion), Langer-Giedion syndrome (8q24.1 deletion), Miller-Dieker syndrome (17p13.3 deletion), Prader-Willi syndrome (15q11 deletion of paternal chromosome), Rubinstein-Taybi syndrome (16p13 deletion), Smith-Magenis syndrome (17p11.2 deletion), and Williams syndrome (7q11.23 deletion) (The Merck Manual 2233-37, Mark H. Beers and Robert Berkow eds., Merck Research Laboratories 17th ed. 1999).
  • cancers that may be associated with aneuploidy include, but are not limited to, leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia; myeloproliferative diseases; adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer; sarcomas including liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor and
  • aneuploidy correlates one hundred percent with transformation of mammalian cells in vitro using non-genotoxic carcinogens such as colcemid, benz[a]pyrene, methylcholanthrene, dimethylbenzanthracene, 17 beta-estradiol, and diethylstilbestrol (Li et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94:14506-14511; Tsutsui and Barrett (1997) Environ Health Perspect. 105 Suppl. 3:619-624).
  • Variations in gene dosage may occur not only in the nuclear DNA, but also in the DNA of the sub-cellular organelles including the mitochondrion and chloroplast. These variations may prove advantageous or deleterious. For example, some defects in mitochondrial DNA are known to be pathogenic in humans (Shadel, G.S. et al. (1997) Ann. Rev. Biochem. 66:409-435).
  • Chromosomal abnormalities may prove advantageous to a cell or organism.
  • aneuploidy resulting in the amplifications of certain genes can compensate for deletions or defects in other genes or otherwise prove advantageous by, e.g., conferring a growth advantage on the aneuploid organism or cancerous cells.
  • plants that are polyploid may be cultivated because they have new traits that are not seen in diploid species, such as increased vigor and higher yield (see the internet site at cc.ndsu.nodak.edu).
  • chromosomal abnormalities have been detected by karyotyping via microscopic examination of stained cells and their chromosomes. Circulating blood lymphocytes or amniocytes are collected and cultured in vitro under conditions that stimulate cell division. Colchicine is then added to arrest mitosis during metaphase.
  • Comparative genomic hybridization (CGH)
  • CGH on cDNA microarrays has been used to detect DNA copy-number variation in breast cancer cell lines and tumors (Solinas-Toldo et al. (1997) Genes Chromosomes Cancer 20(4):399-407; Pinkel et al. (1998) Nat. Genetics 20(2):207-211; Pollack et al. (1999) Nature Genetics 23:41-46).
  • the present invention relates to methods for detecting the presence of aneuploidy, in particular, to determine the likelihood that aneuploidy is present in a cell type or organism.
  • the invention relates to methods of using expression profiles to detect the presence of aneuploidy.
  • the present invention also relates to methods of diagnosing or determining the predisposition of an organism toward diseases that are associated with abnormal copy numbers of one or more genes, i.e., aneuploidy.
  • the present invention relates to computer systems, computer program products, and kits for detecting aneuploidy, or diagnosing or determining the predisposition of a subject to diseases associated with aneuploidy, using profiles.
  • the present invention relates to a method of determining whether aneuploidy is likely to be present in a cell type or organism comprising: (a) quantifying levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said plurality of genes comprising genes mapped to different chromosomes; (b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome, to the mean quantified levels of said cellular constituents associated with said plurality of genes; and (c) identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes are substantially the same for each of said genes and are dissimilar to the mean quantified levels of said cellular constituents associated with said plurality of genes; wherein identifying said genes in step (c) indicates that aneuploidy of said same chromosome or portion thereof is likely to be present in said cell type
  • the present invention relates to a computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: (a) comparing quantified levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosome
  • a computer program product for directing a user computer in a computer-aided determination of whether aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for comparing quantified levels, in one or more cells of said cell type or organism, of cellular constituents associated with genes in the genome of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
  • the present invention relates to a method of detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising: (a) quantifying the levels of a plurality of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality comprising cellular constituents associated with genes mapped to different chromosomes; (b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; and (c) identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (c) indicates that an
  • the present invention relates to a computer system for detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: (a) comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that ane
  • a computer program product for directing a user computer in a computer-aided determination of whether a cell type or organism is predisposed to a disease associated with aneuploidy
  • said computer program product comprising: computer code for comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with each of said genes is substantially the same for each of said genes and is dissimilar to mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism
  • the present invention relates to a method of determining whether aneuploidy is likely to be present in a cell type or organism comprising detecting an expression bias that is shared by a first plurality of genes mapped to a single chromosome or mapped to a chromosomal portion of interest in a cell of said cell type or from said organism, wherein said expression bias is present when measured levels of a first plurality of cellular constituents associated with said first plurality of genes are different from the mean of measured levels of a second plurality of cellular constituents associated with a second plurality of genes in said cell, wherein said second plurality of genes consists of at least one gene (or at least 10 or 50 or 100 or 1,000 genes) not mapped to said chromosome or not mapped to said chromosomal portion.
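The chromosome-level comparison described in the method, system, and program-product statements above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the quantified levels are log2 expression ratios against a reference sample, and the function name `chromosome_bias`, the thresholds `min_shift` and `max_spread`, and all gene and chromosome labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def chromosome_bias(log_ratios, gene_to_chrom, min_shift=0.5, max_spread=0.3):
    """Return {chromosome: shift} for chromosomes whose genes share a bias.

    log_ratios: gene -> log2 expression ratio (assumed input format).
    gene_to_chrom: gene -> chromosome label.
    min_shift, max_spread: hypothetical thresholds, not taken from the patent.
    """
    by_chrom = defaultdict(list)
    for gene, value in log_ratios.items():
        by_chrom[gene_to_chrom[gene]].append(value)

    flagged = {}
    for chrom, values in by_chrom.items():
        # Reference set: genes NOT mapped to the chromosome under test.
        others = [v for c, vals in by_chrom.items() if c != chrom for v in vals]
        if len(values) < 2 or not others:
            continue
        shift = mean(values) - mean(others)  # dissimilar to the reference mean?
        spread = pstdev(values)              # "substantially the same" per gene?
        if abs(shift) >= min_shift and spread <= max_spread:
            flagged[chrom] = shift
    return flagged


# Toy data: genes on chromosome II are expressed about two-fold higher.
gene_to_chrom = {
    "a1": "I", "a2": "I", "a3": "I",
    "b1": "II", "b2": "II", "b3": "II",
    "c1": "III", "c2": "III", "c3": "III",
    "d1": "IV", "d2": "IV", "d3": "IV",
}
log_ratios = {
    "a1": 0.0, "a2": 0.1, "a3": -0.1,
    "b1": 1.0, "b2": 0.9, "b3": 1.1,
    "c1": 0.0, "c2": 0.1, "c3": -0.1,
    "d1": 0.05, "d2": -0.05, "d3": 0.0,
}
flagged = chromosome_bias(log_ratios, gene_to_chrom)
print(flagged)
```

The reference mean here is taken over genes on the other chromosomes, mirroring the "second plurality of genes" not mapped to the chromosome of interest. With only a few chromosomes, an aneuploid chromosome pulls that reference mean toward itself, so the thresholds would need tuning on real data.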
  • the present invention relates to a method for detecting the presence of aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism.
  • the present invention relates to a method of diagnosing a disease associated with aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
  • the present invention relates to a computer system for diagnosing a disease associated with a known aneuploidy in a cell type or type of organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar
  • a computer program product for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprising: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
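The landmark-profile comparison in the statements above might be sketched as below. The excerpt does not say which similarity measure is used, so Pearson correlation is an assumption made for illustration; the function names `pearson` and `most_similar_landmarks`, the landmark labels, and the toy profiles are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / sqrt(vx * vy)


def most_similar_landmarks(profile, landmarks, top=1):
    """Rank landmark profiles by similarity to the query profile."""
    scored = sorted(
        ((pearson(profile, lm), name) for name, lm in landmarks.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:top]]


# Hypothetical database: each landmark comes from a cell with a known
# copy-number alteration; positions are the same ordered set of genes.
landmarks = {
    "chrII gain": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "chrIII loss": [0.0, 0.0, 0.0, 0.0, -1.0, -1.0],
}
query = [0.1, -0.1, 0.9, 1.1, 0.0, 0.1]
best = most_similar_landmarks(query, landmarks)
print(best)
```

The known alteration attached to the best-matching landmark is then read as the likely alteration in the query cell. Any other profile-similarity measure could be substituted for `pearson` without changing the surrounding logic.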
  • the present invention relates to a kit for detecting the presence of aneuploidy in a subject comprising: (a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of said subject, wherein said different nucleotide sequences are known to be increased or decreased as a result of aneuploidy; and (b) expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in a cell of said subject having a known alteration in copy number of said at least one gene.
  • the present invention relates to a kit for detecting the presence of aneuploidy in a subject comprising: (a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism; and (b) a container comprising RNA, or cDNA derived therefrom, of a cell having a known aneuploidy.
  • the present invention relates to a method of determining whether aneuploidy of one or more genes is likely to be present in a cell type or organism comprising: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely
  • the present invention relates to a computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-
  • a computer program product for directing a user computer in a computer-aided determination that aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes
  • the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
  • the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy, comprising: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
  • the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
  • the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy, comprising: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
  • the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer program product comprising: computer code for determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of
  • the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
  • the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
  • the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
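The offset-based correction recited in the items above can be illustrated with a minimal sketch. It assumes that the quantified levels are log10 expression ratios, so that the offset is a difference of means and dividing a linear ratio by the offset corresponds to subtracting it in log space. The function names and the data layout (dictionaries keyed by gene) are hypothetical and are not taken from the source.

```python
from statistics import mean

def chromosomal_offset(log_ratios, chrom_of, target, reference):
    """Mean log10 ratio of genes on `target` minus the mean over genes on
    the `reference` chromosomes (assumed to have wild type copy number)."""
    on_target = [v for g, v in log_ratios.items() if chrom_of[g] == target]
    on_ref = [v for g, v in log_ratios.items() if chrom_of[g] in reference]
    return mean(on_target) - mean(on_ref)

def correct_profile(log_ratios, chrom_of, target, reference):
    """Remove the offset from every gene on `target`.
    Assumption: values are log10 ratios, so correction is a subtraction."""
    offset = chromosomal_offset(log_ratios, chrom_of, target, reference)
    return {g: (v - offset if chrom_of[g] == target else v)
            for g, v in log_ratios.items()}
```

Genes on the reference chromosomes pass through unchanged, so the corrected profile can still be compared with profiles from euploid samples.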
  • FIG. 1 shows chromosome VII expression bias in erg4Δ and ecm18Δ/ecm18Δ mutants of the yeast Saccharomyces cerevisiae as determined by expression profiling, and confirmation of aneuploidy by two-color hybridization of genomic DNA from said mutants to DNA microarrays.
  • Circles represent the mean of the log10(expression ratio) of all genes on an individual chromosome and squares represent the mean of the log10(genomic content signal ratios) of all genes on an individual chromosome.
  • FIG. 2 shows segmental aneuploidy in an rpl20aΔ/rpl20aΔ mutant.
  • FIG. 3 illustrates a computer system useful for embodiments of the invention.
  • FIG. 4 shows selection for aneuploidy in rnr1Δ and rsp24aΔ/rsp24aΔ mutants that results in a growth advantage.
  • FIG. 5 shows spurious correlation between two mutants displaying a large transcriptional signature resulting from aneuploidy.
  • FIG. 6A shows expression data for the tup1Δ deletion mutant that reveals chromosome-wide expression biases that are consistent with aneuploidy.
  • FIGS. 6B-6C respectively show chromosome-wide expression biases in rpb1Δ187 and hhf2 expression profiles consistent with aneuploidy.
  • FIG. 6D shows an expression profile of a pip2Δ oaf1Δ double mutant determined by SAGE analysis that suggests a chromosome-wide expression bias that is consistent with aneuploidy.
  • a cell e.g., “mutation of a gene in a cell”
  • a "cell type,” as used herein, can refer to a cell of a species of interest (e.g., corn, bean, human, mouse), a lineage of interest (e.g., blood cell, nerve cell, skin cell), or a tissue of interest (e.g., lung, brain, heart).
  • Such cells can be from naturally single-celled organisms or derived from multi-cellular higher organisms.
  • the cell can be a cell of a plant, including but not limited to a monocot, such as rice, corn, wheat and other grasses, or a dicot, such as beans, Arabidopsis, potatoes or tobacco, or an animal, including but not limited to mammals, primates, humans, and non-human animals such as dogs, cats, horses, cows, sheep, mice, rats, etc.
  • Aneuploidy may have effects on the biological state of a cell, which can be represented by measured amounts of cellular constituents as defined in Section 5.1.1, below.
  • the variations in gene dosage, in addition to affecting the biological state of the cell, may also affect the phenotype or predisposition of an organism to a disease.
  • the inventors have discovered that a variation in gene copy number is mirrored in the expression profiles of an organism.
  • an organism that is, e.g., trisomic for a particular chromosome will exhibit, for example, increased levels of mRNA transcribed from a plurality of genes on the trisomic chromosome.
  • the invention is also premised upon the observation that, in some organisms with altered gene dosage, i.e.
  • biological sample is broadly defined to include any cell, tissue, organ or multicellular organism.
  • a biological sample can be derived, for example, from cell or tissue cultures in vitro.
  • a biological sample can be derived from a living organism or organisms or from a population of single cell organisms.
  • the state of a biological sample can be measured by the content, activities or structures of its cellular constituents.
  • the state of a biological sample is taken from the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including, but not limited to characterizing the effects of variations in gene dosages, i.e., copy number.
  • the term "cellular constituent" is also broadly defined in this disclosure to encompass any kind of measurable biological variable.
  • the measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological sample), or their activities, or their states of modification (e.g., phosphorylation), or other measurements relevant to the biology of a biological sample.
  • this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called herein aspects of the biological state of a biological sample. It is noted that, as used herein, the term "cellular constituent" is not intended to refer to known subcellular organelles such as mitochondria, chloroplasts, lysosomes, etc.
  • One aspect of the biological state of a biological sample is its transcriptional state.
  • the transcriptional state is the currently preferred aspect of the biological state measured in this invention.
  • the transcriptional state of a biological sample includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Preferably, a substantial fraction of all constituent RNA species in the biological sample are measured, but at least a sufficient fraction is measured to characterize a variation in gene dosage.
  • the transcriptional state of a biological sample can be conveniently determined by, e.g., measuring cDNA abundances by any of several existing gene expression technologies.
  • One particularly preferred embodiment of the invention employs DNA arrays for measuring mRNA or transcript level of a large number of genes.
  • Another aspect of the biological state of a biological sample usefully measured in the present invention is its translational state.
  • the translational state of a biological sample includes the identities and abundances of the constituent protein species in the biological sample under a given set of conditions. Preferably, a substantial fraction of all constituent protein species in the biological sample is measured, but at least a sufficient fraction is measured to characterize the action of a perturbation of interest.
  • the transcriptional state is often representative of the translational state.
  • Other aspects of the biological state of a biological sample are also of use in this invention.
  • the activity state of a biological sample includes the activities of the constituent protein species (and also, optionally, catalytically active nucleic acid species) in the biological sample under a given set of conditions.
  • the translational state is often representative of the activity state.
  • This invention is also adaptable, where relevant, to "mixed" aspects of the biological state of a biological sample in which measurements of different aspects of the biological state of a biological sample are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species, are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to other aspects of the biological state of the biological sample that are measurable.
  • the biological state of a biological sample is represented by a profile of a plurality of cellular constituents.
  • S_i can be the transcription level of gene i or, alternatively, the abundance or activity level of protein i.
  • the elements S_i are continuous variables. For example, transcriptional rates are typically indicated as numbers of molecules synthesized per unit of time.
  • Transcriptional rates can also be indicated as percentages of a control rate.
  • the elements S_i can be categorical variables.
  • transcriptional rates can be indicated as either "on" or "off," where the value "on" indicates a transcriptional rate above a user-determined threshold value and "off" indicates a transcriptional rate below that threshold.
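The categorical "on"/"off" representation described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, and because the text does not say how a rate exactly equal to the threshold is classified, the sketch assumes it counts as "off".

```python
def binarize_rates(rates, threshold):
    """Map each transcriptional rate to "on" (above the user-determined
    threshold) or "off" (otherwise; equality is assumed to mean "off")."""
    return {gene: ("on" if rate > threshold else "off")
            for gene, rate in rates.items()}
```

For example, with a threshold of 5 molecules per unit time, a gene transcribed at 12 would be labeled "on" and one transcribed at 1.5 would be labeled "off".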
  • the response of a biological sample to a variation in gene dosage resulting from aneuploidy can be measured by observing changes in the biological state of the sample.
  • a biological response profile is a collection of such changes of cellular constituents.
  • the profile of a biological sample (e.g., a cell or cell culture) resulting from the variation in gene dosage m can be represented by the vector v^(m) = [v_1^(m), ..., v_i^(m), ..., v_k^(m)] (Equation 2).
  • v_i^(m) is the amplitude of the response of cellular constituent i in a biological sample subject to the variation in gene dosage m, i.e., aneuploidy, such as that which occurs as a result of trisomy of a particular chromosome.
  • v_i^(m) can be simply the absolute measured amounts, e.g., abundances, activity levels or levels of modification, of cellular constituent i in a biological sample having the variation in gene dosage m, or the difference in measured amounts of cellular constituent i between a biological sample that has the variation in gene dosage m and a sample that does not have the variation in gene dosage m.
  • v_i^(m) can be the ratio (or, more preferably, the logarithm of the ratio) of the measured amounts of cellular constituent i in a sample having the variation in gene dosage m to a sample that does not have the variation in gene dosage m.
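The alternative definitions of the response amplitude given above (absolute amount, difference, ratio, or logarithm of the ratio) can be compared side by side in a short sketch. The function name, the `mode` strings, and the dictionary layout are illustrative assumptions. The choice of log10 follows the figure descriptions, although the text permits any logarithm.

```python
import math

def response_profile(modified, reference, mode="log_ratio"):
    """Build a response profile from measured amounts.
    `modified` and `reference` map constituent -> measured amount in the
    sample with and without the variation in gene dosage, respectively."""
    profile = {}
    for constituent, amount in modified.items():
        ref = reference[constituent]
        if mode == "absolute":
            profile[constituent] = amount
        elif mode == "difference":
            profile[constituent] = amount - ref
        elif mode == "ratio":
            profile[constituent] = amount / ref
        elif mode == "log_ratio":
            profile[constituent] = math.log10(amount / ref)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return profile
```

The log-ratio form treats a two-fold increase and a two-fold decrease symmetrically around zero, which is one reason it is described as preferred.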
  • Aneuploidy can include, for example, genetic "knockouts" in which one or more particular genes of the cell or organism are deleted or inactivated, e.g., by standard techniques, such as homologous recombination, that are well known in the art. Such aneuploidy can also include amplifications, e.g., duplications, of at least one gene, of a portion of a gene sufficient to be expressed, or of a chromosome or a portion thereof.
  • the response v_i^(m) of the i'th cellular constituent to a particular alteration in gene dosage, m, can simply be the ratio of or difference between the measured amounts of cellular constituent i in a cell or cells having the particular altered gene dosage and in a cell or cells that do not have the altered gene dosage.
  • v_i^(m) can be the ratio (or, more preferably, the logarithm of the ratio) of the measured amounts of cellular constituent i in a cell or cells having the particular alteration in gene dosage to such measured amounts in a cell or cells that do not have the particular alteration in gene dosage.
  • the response v_i^(m) of the i'th cellular constituent to a particular alteration of gene dosage, m, can be the absolute amount of cellular constituent i in the cell or cells having the altered gene dosage, e.g., the number of mRNA molecules per cell.
  • v_i^(m) is set equal to zero for all cellular constituents i whose responses are below a threshold amplitude or confidence level which can be determined, e.g., from knowledge of the measurement error behavior. For example, in some embodiments, only cellular constituents that have a response greater than or equal to two standard errors in more than N profiles may be selected for subsequent analysis, where the number of profiles N is selected by a user of the invention.
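The selection rule just described (a response of at least two standard errors in more than N profiles) can be sketched as follows. This is a minimal sketch under stated assumptions: each constituent has a single known standard error, the magnitude of the response is what is compared, and the function names are hypothetical.

```python
def select_constituents(profiles, std_errors, n_profiles, k=2.0):
    """Return constituents whose |response| >= k standard errors in more
    than `n_profiles` of the given profiles.
    `profiles` is a list of dicts mapping constituent -> response."""
    counts = {}
    for profile in profiles:
        for constituent, value in profile.items():
            if abs(value) >= k * std_errors[constituent]:
                counts[constituent] = counts.get(constituent, 0) + 1
    return {c for c, n in counts.items() if n > n_profiles}

def zero_unselected(profile, selected):
    """Set the response of every unselected constituent to zero."""
    return {c: (v if c in selected else 0.0) for c, v in profile.items()}
```

Raising N makes the filter stricter, because a constituent must respond in more profiles before it is retained.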
  • v_i^(m) may be equal to the measured value.
  • v_i^(m) may be made equal to the expression and/or activity of the i'th cellular constituent at the highest dosage of the gene, m.
  • the response at different gene dosages, u, may be interpolated to a smooth, piecewise continuous function, e.g., by spline- or model-fitting, and v_i^(m) made equal to some parameter of the interpolation.
  • the variable "u" in Equation 3, above, refers to an arbitrary value of the gene dosage level where the response of the i'th cellular constituent is to be evaluated.
  • S can be any smooth, or at least piecewise continuous, function of limited support having a width characteristic of the structure expected in the response functions.
  • An exemplary width can be chosen to be the distance over which the response function being interpolated rises from 10% to 90% of its asymptotic value.
  • Exemplary S functions include linear and Gaussian interpolation.
  • in model-fitting, the response data at various levels u_t of the gene dosage are interpolated by approximating the response by a single parameterized function.
  • An exemplary model-fitting function appropriate for approximating transcriptional state data is the Hill function:
  • the Hill function shown in Equation 4, above, comprises adjustable parameters of: (1) an amplitude parameter a; (2) an exponent n; and (3) an inflection point parameter u_0.
  • the adjustable parameters are selected independently for each cellular constituent.
  • the adjustable parameters are selected so that, for each cellular constituent of the response, the sum of the squares of the distances of H(u_t) from v_i^(m)(u_t) is minimized.
  • This preferred parameter adjustment method is well known in the art as a least squares fit of H() to v_i^(m)(). Such a fit can be done using any of the many available numerical methods known in the art (see, e.g., Press et al., 1996, Numerical Recipes in C, 2nd Ed., Cambridge University Press, Chpts.
  • the response amplitude v_i^(m) can then be selected to be equal to, e.g., the amplitude parameter a in Equation 4.
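The least squares fit described above can be sketched in plain Python. Equation 4 is not reproduced in this text, so the sketch assumes the commonly used form H(u) = a (u/u_0)^n / (1 + (u/u_0)^n), which has the three named parameters. It also substitutes a simple grid search over n and u_0 for a general numerical optimizer, solving for the amplitude a in closed form at each grid point. The function names and grids are hypothetical.

```python
def hill(u, a, n, u0):
    """Assumed Hill form: a * (u/u0)**n / (1 + (u/u0)**n), for u >= 0."""
    x = (u / u0) ** n
    return a * x / (1.0 + x)

def fit_hill(us, vs, n_grid, u0_grid):
    """Least squares fit of hill() to points (us, vs) by grid search.
    For fixed (n, u0) the model is linear in a, so the best a is
    sum(v*h) / sum(h*h), where h is the unit-amplitude curve."""
    best = None
    for n in n_grid:
        for u0 in u0_grid:
            shape = [hill(u, 1.0, n, u0) for u in us]
            denom = sum(h * h for h in shape)
            if denom == 0.0:
                continue  # curve is identically zero on these points
            a = sum(v * h for v, h in zip(vs, shape)) / denom
            sse = sum((v - a * h) ** 2 for v, h in zip(vs, shape))
            if best is None or sse < best[0]:
                best = (sse, a, n, u0)
    return best[1:]  # (a, n, u0)
```

The fitted amplitude a would then serve as the response amplitude for that cellular constituent, with the fit repeated independently for each constituent.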
  • the biological response profile data may be categorical. For example, in a binary approximation the response amplitude v_i^(m) is set equal to zero if there is no significant response, and is set equal to 1 if there is a significant response.
  • the response amplitude (1) is set equal to +1 if cellular constituent i has a significant increase in expression or activity in a biological sample having gene dosage m; (2) is set equal to zero if there is no significant response; and (3) is set equal to -1 if there is a significant decrease in expression or activity.
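The three-valued encoding above can be written as a small function. The text does not define "significant", so this sketch assumes a response is significant when its magnitude is at least k standard errors. Both the criterion and the function name are illustrative assumptions.

```python
def ternary_response(response, std_error, k=2.0):
    """Encode a response as +1 (significant increase), -1 (significant
    decrease) or 0 (no significant response).
    Assumed significance criterion: |response| >= k * std_error."""
    if response >= k * std_error:
        return 1
    if response <= -k * std_error:
        return -1
    return 0
```

Because only the direction of each significant change is kept, two profiles encoded this way can be compared even when their relative amplitudes differ.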
  • Such embodiments are particularly preferred if it is known or suspected that the responses to which the biological response profile v^(m) is to be compared do not have the same relative amplitudes as v^(m) but do involve the same cellular constituents.
  • the methods of the present invention use profiles, which comprise measurements of levels of individual cellular constituents (or changes in such measurements), e.g., measurements of abundances of mRNA or protein species, protein activities, levels of protein modification such as phosphorylation of kinases, etc., to detect aneuploidy, in particular, to determine the likelihood that aneuploidy is present in the genome of an organism.
  • if a profile of a subject cell type or organism is shown to correlate with one or more compendium profiles from a cell type or organism having aneuploidy associated with a certain disease, then the disease can be diagnosed or predicted in the subject cell type or organism.
  • calculation of the expression bias of chromosomally adjacent genes can be used to determine the presence of aneuploidy of a chromosome, or a portion thereof.
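One way to calculate an expression bias over chromosomally adjacent genes is a sliding-window mean of log10 expression ratios taken in chromosomal order. The sketch below is an assumption about how such a calculation could look. The window size, the cutoff, and the function names are hypothetical, and the source does not prescribe a specific windowing scheme.

```python
from statistics import mean

def sliding_bias(ordered_log_ratios, window):
    """Mean log10 ratio of each run of `window` adjacent genes, where the
    input list is ordered by position along one chromosome."""
    last_start = len(ordered_log_ratios) - window
    return [mean(ordered_log_ratios[i:i + window])
            for i in range(last_start + 1)]

def biased_windows(ordered_log_ratios, window, cutoff):
    """Start indices of windows whose mean bias magnitude is >= cutoff."""
    biases = sliding_bias(ordered_log_ratios, window)
    return [i for i, b in enumerate(biases) if abs(b) >= cutoff]
```

A run of consecutive flagged windows would point to a chromosomal segment, rather than a whole chromosome, with altered copy number.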
  • detected co-regulation of sets of genes in aneuploid cells may reveal the chromosomal localization/mapping of unmapped genes in a genome, since those genes located in the same region of a chromosome in an aneuploid cell type or organism are more likely to show similarity in expression levels.
  • detection of aneuploidy in cells will facilitate the accurate interpretation of whole genome expression data, particularly from cells known to have genetic instability, such as cancer cells.
  • the methods of this invention employ certain types of cells, certain observations of changes in aspects of the biological state of these cells, and certain comparisons of the observed changes. In the following, these cell types, observations, and comparisons are described in turn in detail.
  • Wild-type cells are reference, or standard, cells used in a particular application or embodiment of the methods of this invention. Being only a reference cell, a wild-type cell need not be a cell normally found in nature, and often will be a recombinant or genetically altered cell line. Usually the cells are cultured in vitro as a cell line or strain. Other cell types used in the particular application of the present invention are preferably derived from the wild-type cells. Less preferably, other cell types are derived from cells substantially isogenic with wild-type cells.
  • wild-type cells might be a particular cell line of the yeast Saccharomyces cerevisiae, or a particular mammalian cell line (e.g., HeLa cells).
  • although this disclosure often makes reference to single cells (e.g., "RNA is isolated from a cell deleted for a single gene"), it will be understood by those of skill in the art that more often any particular step of the invention will be carried out using a plurality of genetically identical cells, e.g., from a cultured cell line or tissue sample from a human patient.
  • Two cells are said to be "substantially isogenic" where their expressed genomes differ by a known amount that is at less than 10% of genetic loci, more preferably at less than 1%, or even more preferably at less than 0.1%.
  • two cells can be considered substantially isogenic when the portions of their genomes relevant to the effects of altered gene dosages of interest differ by the preceding amounts. It is preferable that the differing loci be individually known.
  • Modified cells are derived from wild-type cells by modifications to the genome of the wild-type cells.
  • protein activities result in part from protein abundances; protein abundances result from translation of mRNA (balanced against protein degradation); and mRNA abundances result from transcription of DNA and splicing of mRNA precursors (balanced against mRNA degradation). Therefore, genetic level modifications to a cellular DNA constituent alter transcribed mRNA abundances, translated protein abundances, and ultimately protein activities.
  • modified cells include those cells having altered gene dosages.
  • an example of a modified cell comprises a cell having at least one gene, usually a protein-coding gene, that is substantially amplified.
  • a modified cell comprises a cell having at least one gene that is substantially deleted.
  • deletion mutants also include mutants in which a gene has been disrupted so that usually no detectable mRNA or bioactive protein is expressed from the gene, even though some portion of the genetic material may be present.
  • a modified cell further comprises a cell having a deviation from an exact multiple of the haploid number of chromosomes.
  • modified cells having altered gene dosages may not be derived from the wild-type cells but may instead be derived from cells that are substantially isogenic with wild-type cells, except for their particular genetic modifications.
  • aneuploidy refers to a state or condition of a cell or organism wherein at least one gene in the genome of said cell or organism has a gene dosage that is altered from the gene dosage of a wild-type cell of said type or wild-type organism.
  • the altered gene dosage can be the result of, inter alia, chromosome non-disjunction, homologous recombination or chromosome breakage.
  • an "aneuploid cell or organism” is a cell or organism exhibiting variation in the dosage of at least one gene, or a portion thereof.
  • the methods of the invention involve observing changes in any of several aspects of the biological state of a cell (e.g., changes in the transcriptional state, in the translational state, in the activity state, and so forth) between a wild-type cell and a modified cell in order to detect a variation in gene dosage between the two cells.
  • it may be useful to create a known variation in dosage of a particular gene and measure the resulting profile, e.g., to create a database relating a profile to a particular aneuploidy, such as trisomy of human chromosome 21.
  • a variation in gene dosage can be achieved by amplification of one or more genes, or by over-expression or under-expression of the encoded RNA or protein of a gene (see Section 5.6 and its subsections, infra).
  • a variation in gene dosage may result indirectly from introduction of one or more point mutations, insertions or deletions into a gene of interest by triggering an unwanted secondary event, such as compensation for the loss of function of a particular gene by amplification of a paralog gene and the surrounding genetic material. In the latter case, the aneuploidy that occurs in response to a genetic mutation is unpredictable and should be characterized.
  • Aneuploidy of one or more genes in the genome of a cell or organism may result in a "perturbation" (change in the measured level) of a cellular constituent associated with said one or more genes, e.g., by resulting in an increase in mRNA messages transcribed from amplified genes or in an increase in protein levels encoded by the mRNAs.
  • Measured levels of other cellular constituents may remain constant, and measured levels of still other cellular constituents may decrease in an aneuploid cell.
  • the set of measured levels of cellular constituents can be referred to as a profile.
  • a profile can be a pattern of changes in mRNA abundances, protein abundances, protein activity levels, etc.
  • a first cellular constituent and a second cellular constituent are said to be “differently perturbed” when for the first cellular constituent there is a positive perturbation and for the second cellular constituent there is a negative perturbation or no perturbation.
  • cellular constituents are “differently perturbed” if for the first cellular constituent there is a negative perturbation and for the second cellular constituent there is a positive perturbation or no perturbation.
  • two cellular constituents are “differently perturbed” if for the first cellular constituent there is no perturbation and for the second cellular constituent there is either a positive or a negative perturbation.
  • two perturbations can be said to be "differently perturbed" where the measured values for the two perturbations are detectably different, preferably having a statistically significant difference.
  • perturbations of a first and a second cellular constituent are said to be the "same" when both have a negative or a positive perturbation, or where the measured values are not significantly different.
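The sign-based definitions of "differently perturbed" and "same" given above can be expressed as a short sketch. It assumes that a perturbation counts as positive or negative only when its magnitude exceeds a tolerance, which stands in for the unspecified significance test. The names are hypothetical.

```python
def perturbation_sign(value, tolerance=0.0):
    """Classify a perturbation as +1 (positive), -1 (negative) or 0 (none).
    Values within `tolerance` of zero are treated as no perturbation."""
    if value > tolerance:
        return 1
    if value < -tolerance:
        return -1
    return 0

def differently_perturbed(first, second, tolerance=0.0):
    """True when the two constituents fall in different sign classes."""
    return perturbation_sign(first, tolerance) != perturbation_sign(second, tolerance)
```

This covers the three sign-based cases only. The alternative criterion, under which two perturbations differ when their measured values are significantly different, would need an explicit statistical test in place of the tolerance.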
  • a numerical abundance or activity ratio can be calculated and placed in the profile. For example, in the case of transcriptional state measurements by quantitative gene expression technologies, a numerical expression ratio of the abundances of cDNAs (or mRNAs in an appropriate technology) in a modified biological sample and in a wild-type biological sample can be calculated. Alternatively, a logarithm (e.g., log10) (or another monotonic function) of the abundance ratio can be used. Alternatively, an absolute numerical abundance or activity, e.g., a number of mRNA molecules in a cell, can be measured and placed in the profile.
  • arbitrary integer values can be assigned to each type of perturbation of a cellular constituent. For example, the value +1 can be assigned to a positive perturbation; the value -1 to a negative perturbation; and the value 0 to no perturbation.
  • the resulting profile can be arranged as the transcript array is arranged.
  • variations in gene dosage are detected by measuring and comparing changes in the transcriptional state of a cell. Analysis of the transcriptional state is often sufficient for purposes of characterizing aneuploidy, because no global dosage compensation mechanism for autosomes (non-sex chromosomes) is known to exist for normalization of expression from each gene (or chromosome) in aneuploid strains. Most aneuploidies produce a significant and characteristic change in the transcriptional state of the cell. Further, in yeast and humans, and probably other organisms, the homeostatic expression mechanisms to compensate for aneuploidy of autosomes have never been reported, and are not expected to exist.
  • aneuploidy may also exist in the genetic material of sub-cellular organelles, e.g., mitochondria and chloroplasts.
  • gene copy number of, e.g., mitochondrial or chloroplast DNA may also be assayed by the methods of the present invention in order to detect, e.g., the relative number of mitochondria or chloroplasts in a cell type or organism, or the presence of abnormal copy numbers of genes in these organelles, which may be indicative of desirable phenotypes or of disease.
  • the modified-cell profile includes a plurality of perturbation values that represent the perturbation in cellular constituents observed in an aspect of the biological state of a modified cell resulting from an indicated variation in gene dosage, as described above.
  • the levels of cellular constituents associated with genes on different chromosomes are quantified, and are compared to quantified levels of cellular constituents associated with genes mapped to the same chromosome or a portion thereof.
  • Aneuploidy of a chromosome or a portion thereof is then determined by identifying at least 1, preferably at least 4, still more preferably at least 10, and even more preferably at least 50 genes mapped to the same chromosome or a portion thereof for which the level of the cellular constituent associated with each gene is substantially the same and is dissimilar to the mean quantified levels of cellular constituents associated with genes mapped to different chromosomes.
  • by a cellular constituent that is "associated with" a gene is meant a cellular constituent that either directly or indirectly originates from said gene.
  • the cellular constituent may be the mRNA that is transcribed from said gene. Alternatively, it may be the protein that is translated from said mRNA.
  • the cellular constituent "associated with" a gene may be, for example, a protein target that is phosphorylated by the protein product of said gene, such that an increase in phosphorylation of said protein target is indicative of an increased amount or activity of the protein product of said gene.
  • an aspect of the biological state of a modified cell with a variation in gene dosage is measured and compared to that aspect of the biological state of the cell without such a variation (wild-type) in order to determine the cellular constituents in this aspect that are perturbed or are not perturbed.
  • a profile comprising a collection of the measured changes in cellular constituents in the modified cell relative to a wild-type cell is not generally limited to revealing only changes directly due to the variation in gene dosage, because changes in the elements of the biological state that are indirectly affected by the particular gene dosage will also be apparent. This type of profile provides information about the effects of the variation in gene dosage on the biological state of a wild-type cell.
  • the methods of this invention detect the presence of altered gene dosage, i.e., aneuploidy, in a cell type or organism.
  • a "landmark profile," as used herein, refers to a profile of a modified cell or organism having a known alteration in copy number of one or more genes or to a profile of a wild-type cell or organism.
  • a group of such profiles, preferably comprising a plurality of landmark profiles each associated with a different, known aneuploidy, is herein called a compendium of landmark profiles and is assembled for detecting aneuploidy in an unknown cell type or organism.
  • a landmark profile that is "indicative of the presence or absence of aneuploidy of a particular gene, chromosomal region or chromosome", as used herein, does not have to conclusively indicate that aneuploidy is present or absent.
  • a landmark profile that is indicative of the presence or absence, respectively, of aneuploidy indicates an increased probability that aneuploidy is present or absent, respectively, which can be with varying degrees of certainty, from aneuploidy being more likely than not present or absent, to it being reasonably conclusive that aneuploidy is present or absent, respectively.
  • PCR polymerase chain reaction
  • comparative genomic hybridization labeled DNA is hybridized to metaphase chromosome spreads from normal cells and from cells suspected of being aneuploid. By measuring the relative amounts of hybridization of the labeled DNA to the two genomes, variations in gene copy number between the genomes can be detected.
  • a landmark profile that is "indicative of the presence of aneuploidy" can be indicative of the presence of a particular type of aneuploidy. Therefore, an organism that has trisomy of chromosome 1 is likely to have a different profile from that of the same type of organism that has a 100-fold amplification of five contiguous genes on chromosome 1, which is likely to in turn have a different profile from that of the same type of organism that has a deletion of the short arm of chromosome 1. Therefore, the profile not only indicates the presence of aneuploidy, but can also indicate the type of aneuploidy that is present.
  • the profiles are measured in the following ways.
  • the expression profile of a cell is determined by observing its transcript array. This cell may be a cell that is suspected of having aneuploidy, or it may be a cell having a known alteration in copy number of one or more genes.
  • deletion transcript profiles where the genome modification includes variations in gene dosage wherein the gene dosage is decreased with respect to the gene dosage in a wild-type cell
  • amplification transcript profiles where the genome modification includes variations in gene dosage wherein the gene dosage is increased with respect to the gene dosage in a wild-type cell
  • deletion transcript profiles and amplification transcript profiles are examples of transcript profiles of cells exhibiting aneuploidy.
  • Methods for determining whether aneuploidy is likely to be present in a cell type or organism identify the probable variations in gene dosage that result from aneuploidy by observing profiles, preferably expression profiles.
  • the methods include three principal steps.
  • a first step includes quantifying levels of a plurality of cellular constituents associated with a plurality of genes in the genome of a cell type or organism that are mapped to different chromosomes.
  • the cellular constituents are mRNA species, i.e., levels of cellular constituents are represented by levels of mRNA species.
  • the mRNA levels may be measured by increases or decreases relative to mRNA levels in a wild-type cell.
  • the transcriptional state may be related to the absolute measured amounts of cellular constituents, e.g., the number of, for example, mRNA molecules, in a cell.
  • the cellular constituents are protein species, which are quantified by, for example, measuring the amount or activity of protein species.
  • a combination of the transcriptional and translational states of a cell type is observed.
  • a second step includes comparing the quantified levels of cellular constituents associated with at least 1, preferably at least 3, still more preferably at least 10, and even more preferably at least 50 genes mapped to the same chromosome or a portion thereof to the mean quantified levels of said cellular constituents associated with the plurality of genes.
  • a third step involves identifying genes mapped to the same chromosome or a portion thereof for which the level of cellular constituents for each gene is substantially the same, and for which the level of cellular constituents is dissimilar to the mean quantified levels of cellular constituents for said plurality of genes. If the genes identified in this step are adjacent on the same chromosome, then there is an indication that aneuploidy of the chromosome, or a portion thereof, is likely to be present in the cell type or organism.
  • a method of determining whether aneuploidy is likely to be present in a cell type or organism comprises detecting an expression bias that is shared by one or more genes mapped to a single chromosome or a portion thereof.
  • the expression bias is a measure of levels of a first plurality of cellular constituents associated with said first plurality of genes that is different from the mean measure of levels of a second plurality of cellular constituents associated with a second plurality of genes in the cell type, wherein the second plurality consists of at least one gene (or at least 10 or 50 or 100 or 1,000) that is not mapped to said chromosome or portion thereof.
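The comparison in the steps above can be sketched as follows, assuming mRNA levels expressed as log10 ratios against a wild-type cell. The function name `chromosome_bias`, the data layout, and the sample values are hypothetical. The sketch does not check that the genes are adjacent on the chromosome.

```python
from statistics import mean, pstdev


def chromosome_bias(levels, chromosome):
    """Compare genes on one chromosome with genes mapped elsewhere.

    levels: dict mapping gene name -> (chromosome, log10 ratio).
    Returns (offset from the mean of the other chromosomes,
             spread of the levels within the chromosome).
    """
    on = [v for c, v in levels.values() if c == chromosome]
    off = [v for c, v in levels.values() if c != chromosome]
    return mean(on) - mean(off), pstdev(on)


# Hypothetical data: three genes on chromosome I, three elsewhere.
levels = {
    "g1": ("I", 0.17), "g2": ("I", 0.18), "g3": ("I", 0.19),
    "g4": ("II", 0.01), "g5": ("II", -0.02), "g6": ("III", 0.01),
}
bias, spread = chromosome_bias(levels, "I")
print(round(bias, 3), round(spread, 3))
```

A large offset combined with a small spread corresponds to levels that are substantially the same within the chromosome and dissimilar to the mean of the other genes.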
  • a profile or a predicted profile of a subject cell is compared to a database comprising landmark profiles (i.e. a compendium), each of which (a) arises from a cell having a known alteration in copy number of at least one gene, and (b) is digitally stored in association with the known alteration in copy number, to determine the degree of similarity between the profile of the subject cell and the landmark profiles.
  • landmark profiles i.e. a compendium
  • the profile is preferably compared to a compendium of aneuploid profiles, that is, a compendium comprising landmark profiles generated from measurements of the transcriptional state of cells with known aneuploidies of at least one gene.
  • the aneuploid profiles having the greatest similarity to the profile of the subject cell indicate which aneuploidy is likely to be present in the subject cell.
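A compendium lookup of this kind can be sketched as below. The choice of the Pearson correlation as the similarity measure, the names `pearson` and `rank_landmarks`, and the profile values are assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def rank_landmarks(subject, compendium):
    """Return (name, similarity) pairs, most similar landmark first."""
    scores = {name: pearson(subject, prof) for name, prof in compendium.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical landmark profiles (log10 ratios for six genes).
compendium = {
    "trisomy chr I": [0.18, 0.18, 0.18, 0.0, 0.0, 0.0],
    "monosomy chr II": [0.0, 0.0, 0.0, -0.3, -0.3, 0.0],
}
subject = [0.16, 0.20, 0.17, 0.01, -0.02, 0.0]
ranking = rank_landmarks(subject, compendium)
print(ranking[0][0])
```

The landmark at the head of the ranking is the candidate aneuploidy; the similarity value itself indicates how strongly the match should be trusted.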
  • amounts of a plurality of cellular constituents are measured in a cell of a cell type, and a predicted profile is derived therefrom for comparison to one or more landmark profiles.
  • the predicted profile may be for different cellular constituents than those for which amounts were measured in the experiment.
  • a translational profile of protein levels may be used to predict the corresponding transcript profile, which may be used for comparison to a database comprising landmark transcript profiles.
  • a transcript profile of an immature organism, e.g., a seedling, may be acquired and may be used to predict the transcript profile of the mature organism.
  • the measured amounts of cellular constituents are determined in comparison to a wild-type cell of said cell type or said organism. In another embodiment, the measured amounts of cellular constituents are absolute measured amounts of cellular constituents, e.g., a number of mRNA molecules per cell.
  • This subsection describes embodiments of the invention relating to diagnosis of a disease or to determination of a predisposition to a disease in a cell type or organism.
  • the predisposition of a subject to a disease associated with aneuploidy is determined, or a disease associated with aneuploidy is diagnosed in a subject by observing the profile, preferably the expression profile, of the subject.
  • Subjects include, but are not limited to, humans, primates, mammals, fish, birds, mice, livestock animals such as cows, pigs, goats, sheep, horses, companion animals such as cats and dogs, flowering plants, and crop plants such as corn, wheat, rice, beans, soy, and alfalfa.
  • Cells from said subjects to be assayed for said detection of disease or predisposition toward a disease associated with aneuploidy may be obtained, e.g., by biopsy or amniocentesis.
  • the subject is a human and the disease to which a predisposition is determined or which is diagnosed in said subject includes, but is not
  • trisomy 21 Edwards syndrome (trisomy 18), and Patau syndrome (trisomy 13); diseases associated with deletions of an arm of a chromosome, such as cri du chat syndrome (5p deletion) and Wolf-Hirschhorn syndrome (4p deletion); diseases associated with contiguous gene syndromes such as Alagille syndrome (20p12 deletion), Angelman syndrome
  • the subject is a human and the disease to which a predisposition is determined or which is diagnosed in said subject includes, but is not
  • cancers such as breast cancer; colon cancer; leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia;
  • leukemias such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia
  • lymphomas such as Burkitt's lymphoma, and non-Hodgkin's lymphoma
  • adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer
  • sarcomas including liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor and peripheral neuroepithelioma
  • testicular and ovarian dysgerminoma; retinoblastoma; Wilms' tumor; neuroblastoma; malignant
  • hereditary papillary renal carcinomas have been associated with trisomy of chromosomes 7, 8 and 17 (Fletcher, 1997, Renal and bladder cancers. In Human Cytogenetic Cancer Markers, eds. Wolman & Sell, Totowa, NJ, Humana Press, 169-202; Zhuang et al., 1998, Nat. Genet. 20:66-69; Sen, 2000, Current Opinion in Oncology 12:82-
  • the predisposition of a subject to a disease is detected, or a disease associated with aneuploidy is diagnosed, by quantifying levels of a plurality of cellular constituents associated with genes mapped to the same chromosome, or a portion thereof, and comparing these levels to the mean quantified levels of cellular constituents associated with a plurality of genes mapped to different chromosomes.
  • if the level of each cellular constituent associated with each gene mapped to the same chromosome or a portion thereof is substantially the same for each of said genes, and is dissimilar to the mean quantified levels of cellular constituents associated with said plurality of genes mapped to different chromosomes, and if said genes mapped to the same chromosome or a portion thereof are adjacent on said chromosome, then aneuploidy of said chromosome or portion thereof is likely to be present, and said subject is likely to have a predisposition to a disease or to have a disease associated with said aneuploidy.
  • the profile of the subject to be diagnosed is compared to a compendium comprising landmark profiles, some of which are from a cell or organism having an altered copy number of at least one gene that is diagnostic or prognostic of a particular disease.
  • Diseases associated with landmark profiles having the greatest similarity to said cell profile are those diseases present in said subject.
  • the cell type from which the landmark profiles are derived is substantially isogenic to the cell type being diagnosed.
  • the cell type from which the landmark profiles are derived is preferably from the same species and tissue type as the cell type of the subject being diagnosed or assayed for predisposition to disease.
  • the landmark profiles to which the profile of the subject to be diagnosed or assayed for a predisposition to disease is compared is preferably a set of landmark profiles from fat cells of an organism of the same species.
  • the predisposition of a cell type or organism to a disease associated with aneuploidy can be detected as follows.
  • a profile of an immature (not fully differentiated), mature or asymptomatic cell, e.g., from amniotic cells of a fetus, is compared to a compendium comprising landmark profiles, each of which arises from an immature cell, or from an asymptomatic cell, having an identified alteration in copy number of at least one gene that is associated with a disease, in order to determine the degree of similarity between the profile of the immature or asymptomatic cell and the landmark profiles.
  • Similarity of the immature or asymptomatic cell profiles indicates eventual similarity of profiles associated with mature cells or with cells in which a disease is present.
  • the predisposition of the immature or asymptomatic cell toward the disease associated with aneuploidy can be detected.
  • by "asymptomatic cell," as used herein, is meant a cell that does not show a pathology related to aneuploidy, even though the genome of the cell may exhibit variations in gene dosage from a wild-type cell.
  • landmark profiles for detection of the predisposition of humans toward diseases associated with aneuploidy may include, inter alia, those associated with diseases discussed above in this section.
  • amounts of a plurality of cellular constituents can be measured in an immature or asymptomatic cell of a cell type, and a predicted profile can be derived therefrom for comparison to one or more landmark profiles.
  • the predicted profile is then compared to the compendium of landmark profiles of mature cells or of cells having symptoms of a disease in order to detect the predisposition of an immature or asymptomatic cell to a disease associated with aneuploidy.
  • the profile of the immature or asymptomatic cell can be compared directly to the compendium of landmark profiles of mature cells or of cells having symptoms of a disease.
  • whole chromosomal aneuploidy is determined using mean chromosomal ratio plots.
  • in a mean chromosomal ratio plot, the ratio of measured amounts of cellular constituents associated with at least 10%, preferably at least 30%, more preferably at least 60%, even more preferably at least 90%, most preferably all of the genes on a chromosome in an aneuploid cell and on the chromosome of a wild-type cell, e.g., the expression ratio, is plotted as a function of chromosome location, i.e., which chromosome the genes reside on. For example, the expression levels (circles) and genomic dosage (squares) for each chromosome correlate in FIG.
  • the mean expression ratio for each chromosome may be represented as an error-weighted mean of at least 5 genes, preferably at least 10 genes, more preferably at least 50 genes, more preferably at least 100 genes, even more preferably at least 10%, even more preferably at least 30%, even more preferably at least 60%, even more preferably at least 90%, most preferably all of the genes present on that chromosome, with the error calculated based on the quality and intensity of the data.
  • a chromosome has a statistically significant chromosome-wide expression bias if the mean chromosomal ratio has an offset of greater than 0.1 in log space and is at least ten standard deviations from the mean ($P < 10^{-20}$).
  • P values can be calculated from the number of standard deviations from the mean, assuming a Gaussian distribution, and the error of the mean ratio in log space computed from the spread of the data, taking into account the error of each point and the number of data points.
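The significance criterion above can be sketched as follows. The sketch assumes inverse-variance weighting for the error-weighted mean, measures the offset from zero (no change in log space), and uses a two-sided Gaussian tail for the P value; the text does not specify these details, and the function names and data are hypothetical.

```python
from math import erfc, sqrt


def error_weighted_mean(log_ratios, errors):
    """Inverse-variance weighted mean and its standard error (assumed weighting)."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * q for w, q in zip(weights, log_ratios)) / total
    return mean, 1.0 / sqrt(total)


def chromosome_wide_bias(log_ratios, errors, min_offset=0.1, min_sd=10.0):
    """Return (mean, number of standard deviations, P value, significant?)."""
    mean, err = error_weighted_mean(log_ratios, errors)
    n_sd = abs(mean) / err
    p_value = erfc(n_sd / sqrt(2.0))  # two-sided Gaussian tail probability
    return mean, n_sd, p_value, (abs(mean) > min_offset and n_sd >= min_sd)


# Hypothetical chromosomes with 100 genes each and a per-gene error of 0.05.
trisomic = chromosome_wide_bias([0.18] * 100, [0.05] * 100)
euploid = chromosome_wide_bias([0.02, -0.02] * 50, [0.05] * 100)
print(trisomic[3], euploid[3])
```

Requiring both the offset and the standard-deviation threshold keeps a small but very precisely measured shift from being reported as a chromosome-wide bias.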
  • $x_{ik}$ is $q_{ik}/\sigma_{ik}$ and $x_{jk}$ is $q_{jk}/\sigma_{jk}$
  • $q_{ik}$ and $q_{jk}$ are the logarithms of the expression ratios between the perturbed and baseline conditions for gene k in profiles i and j, respectively
  • $\sigma_{ik}$ and $\sigma_{jk}$ are the uncertainties in the measurements of $q_{ik}$ and $q_{jk}$, respectively.
  • z is normally distributed with standard error $1/(n-3)^{1/2}$ and n is the total number of measurements (Fisher, 1921, Metron 1:3).
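The quantities defined above can be combined as in the following sketch. The correlation formula itself is not given in this text, so the normalized dot product of the error-scaled values is an assumption, as are the function names and the sample data. The Fisher transform is z = atanh(r), with standard error 1/(n-3)^(1/2) as stated above.

```python
from math import atanh, sqrt


def error_scaled(q, sigma):
    """x_k = q_k / sigma_k for each gene k."""
    return [qk / sk for qk, sk in zip(q, sigma)]


def profile_correlation(x_i, x_j):
    """Normalized dot product of two error-scaled profiles (assumed form of r)."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    return dot / sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))


def fisher_z_score(r, n):
    """Fisher z = atanh(r) divided by its standard error 1/sqrt(n - 3)."""
    return atanh(r) * sqrt(n - 3)


# Hypothetical log ratios for six genes in two profiles, uniform error 0.05.
q_i = [0.20, 0.10, -0.30, 0.40, 0.00, -0.10]
q_j = [0.25, 0.05, -0.20, 0.35, 0.05, -0.15]
sigma = [0.05] * 6
r = profile_correlation(error_scaled(q_i, sigma), error_scaled(q_j, sigma))
score = fisher_z_score(r, len(q_i))
print(round(r, 3), round(score, 2))
```

The score is the number of standard errors by which z departs from zero, which is the value expected for uncorrelated profiles.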
  • a non-parametric approach to assigning a probability to any r value is to randomize the order of the elements in the data vectors (i.e., the gene indices), and then generate a Monte Carlo distribution of r arising from the rearranged data, which satisfies the uncorrelated hypothesis. The value of r computed from the actual data is then compared to this distribution in order to assign a likelihood that the correlation is not random.
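The randomization procedure can be sketched as follows. The Pearson correlation, the number of permutations, the fixed seed, and the sample vectors are illustrative assumptions.

```python
import random
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Estimate how often shuffled gene indices give |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = correlation(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize the gene order of one vector
        if abs(correlation(a, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical profiles over twelve genes.
a = [0.1 * k for k in range(12)]
b = [x + (0.03 if k % 2 else -0.03) for k, x in enumerate(a)]
r_obs, p = permutation_p_value(a, b)
print(round(r_obs, 3), p < 0.01)
```

Because the null distribution is built from the data themselves, no Gaussian assumption about r is needed.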
  • aneuploidy may be detected by correlation of profiles with strains of known aneuploidy.
  • segmental aneuploidy can be detected by scanning the expression ratio data for instances in which a number, i.e., at least two, preferably at least four, of non-overlapping, chromosomally-adjacent genes are all up- or down-regulated at, e.g., a 0.05 significance threshold.
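A scan of this kind can be sketched as below, assuming each gene carries a log ratio and a P value and that the list is already sorted by chromosomal position. The function name `find_segments` and the data are hypothetical.

```python
def find_segments(genes, alpha=0.05, min_run=4):
    """Find runs of adjacent genes that are all significantly up or all down.

    genes: list of (name, log_ratio, p_value) in chromosomal order.
    Returns a list of (sign, [gene names]) with sign +1 or -1.
    """
    segments, run, run_sign = [], [], 0
    for name, ratio, p in genes:
        # Sign is 0 when the gene does not pass the significance threshold.
        sign = (ratio > 0) - (ratio < 0) if p <= alpha else 0
        if sign != 0 and sign == run_sign:
            run.append(name)
        else:
            if len(run) >= min_run:
                segments.append((run_sign, run))
            run, run_sign = ([name], sign) if sign != 0 else ([], 0)
    if len(run) >= min_run:
        segments.append((run_sign, run))
    return segments


# Hypothetical chromosome with a four-gene amplified segment.
genes = [
    ("a", 0.01, 0.6), ("b", 0.30, 0.001), ("c", 0.28, 0.002),
    ("d", 0.31, 0.001), ("e", 0.25, 0.01), ("f", -0.02, 0.7),
]
segments = find_segments(genes)
print(segments)
```

Lowering `min_run` to two matches the minimum stated above, at the cost of more chance runs.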
  • FIG. 2a and 2b depict the log10 of expression ratios of cellular constituents associated with all genes on chromosome XV of a yeast rpl20aΔ/rpl20aΔ mutant as a function of chromosome location. Segmental aneuploidy on chromosome XV is shown by expression ratio data (FIG. 2a-b), and is confirmed by assaying genomic DNA copy number (FIG. 2c-d).
5.5 USING CO-VARYING SETS TO DETECT ANEUPLOIDY
  • the methods of the present invention can involve using cellular constituents in the biological response profiles that are arranged or grouped according to their tendency to co-vary in response to a perturbation. For example, if groups of cellular constituents that normally co-vary in response to perturbations (preferably over at least 3, 5, 10, 50 or 100 different perturbations) are identified, deviations from that covariation may indicate the presence of aneuploidy in cells.
  • this Section describes specific embodiments for arranging the cellular constituents into co-varying sets. Clustering methods are also described in International Patent Publication WO 00/24936, published May 4, 2000, which is incorporated herein by reference in its entirety.
  • the basis or co-varying sets are identified by means of a clustering algorithm (i.e., by means of "clustering analysis”).
  • Clustering algorithms of this invention may be generally classified as “model-based” or “model-independent” algorithms.
  • model-based clustering methods assume that co-varying sets or clusters map to some predefined distribution shape in the cellular constituent "vector space.”
  • many model-based clustering algorithms assume ellipsoidal cluster distributions having a particular eccentricity.
  • model-independent clustering algorithms make no assumptions about cluster shape.
  • model-independent methods are substantially identical to assuming "hyperspherical" cluster distributions. Hyperspherical cluster distributions are generally preferred in the methods of this invention, e.g., when the perturbation vector elements $v_m$ have similar scales and meanings, such as the abundances of different mRNA species.
  • the clustering methods and algorithms of the present invention may be further classified as "hierarchical" or "fixed-number-of-groups" algorithms (see, e.g., S-Plus Guide to Statistical and Mathematical Analysis v.3.3, 1995, MathSoft, Inc.: StatSci. Division, Seattle, Washington).
  • Such algorithms are well known in the art (see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., San Diego: Academic Press; Everitt, 1974, Cluster Analysis, London: Heinemann Educ.
  • hierarchical clustering methods and/or algorithms are employed in the methods of this invention.
  • the clustering analysis of the present invention is done using the hclust routine or algorithm (see, e.g., 'hclust' routine from the software package S-Plus, MathSoft, Inc., Cambridge, MA).
  • the clustering algorithms used in the present invention operate on a table of data containing measurements of a plurality of cellular constituents, preferably gene expression measurements.
  • the data table analyzed by the clustering methods of the present invention comprises an N × K array or matrix wherein N is the total number of conditions or perturbations and K is the number of cellular constituents measured or analyzed.
  • the clustering algorithms of the present invention analyze such arrays or matrices to determine dissimilarities between cellular constituents. Mathematically, dissimilarities between cellular constituents i and j are expressed as "distances" $I_{i,j}$. For example, in one embodiment, the Euclidian distance is determined according to Equation 8:
  • in Equation 8, $v_i^{(m)}$ and $v_j^{(m)}$ are the responses of cellular constituents i and j, respectively, to the perturbation m.
  • the Euclidian distance in Equation 8, above, is squared to place progressively greater weight on cellular constituents that are further apart.
  • the distance measure $I_{i,j}$ is the Manhattan distance provided by Equation 9:
  • the distance measure is preferably a percent disagreement defined by Equation 10:
  • r is defined by Equation 11, below:
  • in Equation 11, the dot product $v_i \cdot v_j$ is defined according to Equation 12:
  • the distance measure can be some other distance measure known in the art, such as the Chebychev distance, the power distance, and percent disagreement, to name a few.
  • the distance measure is appropriate to the biological questions being asked, e.g., for identifying co-varying and/or co-regulated cellular constituents including co-varying or co-regulated genes.
  • the distance measure is $I_{i,j} = 1 - r_{i,j}$, with the correlation coefficient $r_{i,j}$ comprising a weighted dot product of the response vectors $v_i$ and $v_j$.
  • $r_{i,j}$ is preferably defined by Equation 13:
  • in Equation 13, the quantities $\sigma_i^{(m)}$ and $\sigma_j^{(m)}$ are the standard errors associated with the measurement of the i'th and j'th cellular constituents, respectively, in experiment m.
  • the correlation coefficients provided by Equations 11 and 13 are bounded between values of +1, which indicates that the two response vectors are perfectly correlated and essentially identical, and -1, which indicates that the two response vectors are "anti-correlated" or "anti-sense" (i.e., are opposites). These correlation coefficients are particularly preferable in embodiments of the invention where cellular constituent sets or clusters are sought of constituents which have responses of the same sign. However, in other embodiments, it can be preferable to identify cellular constituent sets or clusters which are co-regulated or involved in the same biological responses or pathways but comprise both similar and anti-correlated responses. In such embodiments, it is preferable to use the absolute value of the correlation coefficient provided by Equation 11 or 13; i.e., $|r_{i,j}|$.
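The distance measures named above can be sketched as follows. The Euclidean and Manhattan forms are the standard definitions; the weighted correlation is one plausible reading of a "weighted dot product" (each response divided by its standard error before a normalized dot product) and is an assumption, as are all function names.

```python
from math import sqrt


def euclidean(v_i, v_j):
    """Square root of the summed squared differences over perturbations m."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


def manhattan(v_i, v_j):
    """Sum of absolute differences over perturbations m."""
    return sum(abs(a - b) for a, b in zip(v_i, v_j))


def weighted_correlation(v_i, v_j, s_i, s_j):
    """Normalized dot product of responses scaled by their standard errors (assumed form)."""
    x = [a / s for a, s in zip(v_i, s_i)]
    y = [b / s for b, s in zip(v_j, s_j)]
    dot = sum(a * b for a, b in zip(x, y))
    return dot / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def correlation_distance(v_i, v_j, s_i, s_j, use_absolute=False):
    """Distance 1 - r, or 1 - |r| when anti-correlated responses should group together."""
    r = weighted_correlation(v_i, v_j, s_i, s_j)
    return 1.0 - (abs(r) if use_absolute else r)


# Hypothetical responses of two constituents to three perturbations.
v = [1.0, 2.0, 3.0]
w = [-1.0, -2.0, -3.0]
ones = [1.0, 1.0, 1.0]
print(correlation_distance(v, w, ones, ones))
print(correlation_distance(v, w, ones, ones, use_absolute=True))
```

With the signed coefficient, perfectly anti-correlated responses are as far apart as possible; with the absolute value they fall at distance zero and would be placed in the same set.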
  • the relationships between co-regulated and/or co-varying cellular constituents may be even more complex, such as in instances wherein multiple biological pathways (for example, multiple signaling pathways) converge on the same cellular constituent to produce different outcomes.
  • it is preferable to use a correlation coefficient $r_{i,j}^{(\mathrm{change})}$ which is capable of identifying co-varying and/or co-regulated cellular constituents irrespective of the sign.
  • the correlation coefficient specified by Equation 14, below, is particularly useful in such embodiments.
  • clustering algorithms used in the methods of the invention also use one or more linkage rules to group cellular constituents into one or more sets or "clusters.”
  • single linkage or the nearest neighbor method determines the distance between the two closest objects (i.e., between the two closest cellular constituents) in a data table.
  • complete linkage methods determine the greatest distance between any two objects (i.e., cellular constituents) in different clusters or sets.
  • the unweighted pair-group average evaluates the "distance" between two clusters or sets by determining the average distance between all pairs of objects (i.e., cellular constituents) in the two clusters.
  • the weighted pair-group average evaluates the distance between two clusters or sets by determining the weighted average distance between all pairs of objects in the two clusters, wherein the weighting factor is proportional to the size of the respective clusters.
  • an agglomerative hierarchical clustering algorithm is used. Such algorithms are known in the art and described, e.g., in Hartigan, supra. Briefly, the algorithm preferably starts with each object (e.g., each cellular constituent) as a separate group. In each successive step, the algorithm identifies the two most similar objects by finding the minimum of all the pair-wise similarity measures, merges them into one object (i.e., into one "cluster") and updates the between-cluster similarity measures accordingly. The procedure continues until all objects are found in a single group. When merging two closest objects, a heuristic criterion of average linkage is preferably employed to redefine the between-cluster similarity measures.
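The agglomerative procedure with average linkage can be sketched as follows. This is a small illustration written in Python rather than the S-Plus routine cited in the text; it works on a precomputed matrix of pairwise distances, and the function name, the stopping parameter `n_clusters`, and the matrix values are hypothetical.

```python
def average_linkage(dist, n_clusters=1):
    """Agglomerative clustering on a symmetric distance matrix.

    dist: list of lists of pairwise distances between objects.
    Returns (clusters, merges); each merge is (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(dist))]  # every object starts alone
    merges = []

    def between(a, b):
        # Unweighted average of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        d, x, y = min(
            (between(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, merges


# Hypothetical distances: objects 0 and 1 are close, as are 2 and 3.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
clusters, merges = average_linkage(dist, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Running with the default `n_clusters=1` continues until all objects are in a single group, and the recorded merge distances describe the branchings of the clustering tree.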
  • clustering yields a rigid hierarchical structure among objects and defines their memberships.
  • Genesets may be readily defined based on the branchings of a clustering tree.
  • genesets may be defined based on the many smaller branchings of a clustering tree, or, optionally, larger genesets may be defined corresponding to the larger branches of a clustering tree.
  • the choice of branching level at which genesets are defined matches the number of distinct response pathways expected. In embodiments wherein little or no information is available to indicate the number of pathways, the genesets should be defined according to the branching level wherein the branches of the clustering tree are "truly distinct.”
  • Truly distinct may be defined, e.g., by a minimum distance value between the individual branches.
  • the distance values between truly distinct genesets are in the range of 0.2 to 0.4, where a distance of zero corresponds to perfect correlation and a distance of unity corresponds to no correlation.
  • distances between truly distinct genesets may be larger in certain embodiments, e.g., wherein there is poorer quality data or fewer experiments n in the profile data.
  • the distance between truly distinct genesets may be less than 0.2.
  • truly distinct cellular constituent sets are defined by means of an objective test of statistical significance for each bifurcation in the clustering tree.
  • truly distinct cellular constituent sets are defined by means of a statistical test which uses Monte Carlo randomization of the experiment index m for the responses of each cellular constituent across the set of experiments.
  • the experiment index m of each cellular constituent's response $v_i^{(m)}$ is randomly permutated, as indicated by Equation 15: $v_i^{(m)} \rightarrow v_i^{\Pi(m)}$ (Equation 15)
  • a large number of permutations of the experiment index m is generated for each cellular constituent's response.
  • the number of permutations is from 50 to about 1000, more preferably from 50 to about 100.
  • Hierarchical clustering is performed on the permutated data, preferably using the same clustering algorithm as used for the original unpermuted data;
  • in Equation 16, $D_i$ is the square of the distance measure for cellular constituent i with respect to the center (i.e., the mean) of its assigned cluster.
  • the superscripts (1) and (2) indicate whether the square of the distance measure $D_i$ is made with respect to (1) the center of its entire branch, or (2) the center of the appropriate cluster out of the two clusters.
  • the distance function $D_i$ in Equation 16 may be defined according to any one of several embodiments. In particular, the various embodiments described supra for the definition of $I_{i,j}$ may also be used to define $D_i$ in Equation 16.
  • the distribution of fractional improvements obtained from the above-described Monte Carlo methods provides an estimate of the distribution under the null hypothesis, i.e., the hypothesis that a particular branching in a cluster tree is not significant or distinct.
  • a significance can thus be assigned to the actual fractional improvement (i.e., the fractional improvement of the unpermuted data) by comparing the actual fractional improvement to the distribution of fractional improvements for the permuted data.
  • the significance is expressed in terms of the standard deviation of the null hypothesis distribution, e.g., by fitting a log normal model to the null hypothesis distribution obtained from the permuted data.
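Expressing the significance in standard deviations of a log-normal null distribution can be sketched as below. The sketch assumes all fractional improvements are positive and fits the log-normal model by taking the mean and sample standard deviation of the logarithms; the function name and the values are hypothetical.

```python
from math import log
from statistics import mean, stdev


def lognormal_significance(actual, null_values):
    """Number of standard deviations separating `actual` from the null distribution.

    A log-normal model is fitted by working with the logarithms of the
    fractional improvements obtained from the permuted data.
    """
    logs = [log(v) for v in null_values]
    return (log(actual) - mean(logs)) / stdev(logs)


# Hypothetical fractional improvements from permuted data, and the actual value.
null_improvements = [0.05, 0.04, 0.06, 0.05, 0.045, 0.055]
actual_improvement = 0.40
significance = lognormal_significance(actual_improvement, null_improvements)
print(round(significance, 1))
```

A branching whose actual improvement lies many standard deviations above the permuted values would be treated as distinct.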
  • an objective statistical test is preferably employed to determine the statistical reliability of the grouping decisions of any clustering method or algorithm.
  • a similar test is used for both hierarchical and non-hierarchical clustering methods.
  • the statistical test employed comprises (a) obtaining a measure of the compactness of the clusters determined by one of the clustering methods of this invention, and (b) comparing the obtained measure of compactness to a hypothetical measure of compactness of cellular constituents regrouped in an increased number of clusters.
  • a hypothetical measure of compactness preferably comprises the measure of compactness for clusters selected at the next lowest branch in a clustering tree.
  • the hypothetical measure of compactness is preferably the compactness obtained for N+1 clusters by the same methods.
  • Cluster compactness may be quantitatively defined, e.g., as the mean squared distance of elements of the cluster from the "cluster mean," or, more preferably, as the inverse of the mean squared distance of elements from the cluster mean.
  • the cluster mean of a particular cluster is generally defined as the mean of the response vectors of all elements in the cluster.
  • the above definition of mean is problematic in embodiments wherein response vectors can be in opposite directions such that the above defined cluster mean could be zero.
  • cluster compactness such as, but not limited to, the mean squared distance between all pairs of elements in the cluster.
  • the cluster compactness may be defined to comprise the average distance (or more preferably the inverse of the average distance) from each element (e.g., cellular constituent) of the cluster to all other elements in that cluster.
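The alternative compactness definitions above can be compared side by side. The following is a sketch only, assuming Euclidean squared distance between response vectors; the function names are hypothetical. The pairwise form does not use a cluster mean, so it stays meaningful when response vectors point in opposite directions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two response vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_sq_dist_to_mean(cluster):
    """Mean squared distance of the elements from the cluster mean."""
    n = len(cluster)
    dim = len(cluster[0])
    center = [sum(vec[d] for vec in cluster) / n for d in range(dim)]
    return sum(squared_distance(vec, center) for vec in cluster) / n


def mean_sq_pairwise_dist(cluster):
    """Mean squared distance over all pairs of elements (needs >= 2 elements)."""
    n = len(cluster)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(squared_distance(cluster[i], cluster[j]) for i, j in pairs)
    return total / len(pairs)


def compactness(cluster, scatter=mean_sq_pairwise_dist):
    """Inverse of the chosen scatter measure; larger means tighter."""
    s = scatter(cluster)
    return float("inf") if s == 0 else 1.0 / s


# Illustrative clusters of two-component response vectors.
tight = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.0]]
loose = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0]]
```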
  • step (b) above of comparing cluster compactness to a hypothetical compactness comprises generating a non-parametric statistical distribution for the changed compactness in an increased number of clusters. More preferably, such a distribution is generated using a model which mimics the actual data but has no intrinsic clustered structures (i.e., a "null hypothesis" model). For example, such distributions may be generated by (a) randomizing the perturbation experiment index m for each actual perturbation vector v^(m) and (b) calculating the change in compactness which occurs for each distribution, e.g., by increasing the number of clusters from N to N+1 (non-hierarchical clustering methods), or by increasing the branching level at which clusters are defined (hierarchical methods).
  • the increased compactness is given by the parameter E, which is defined by Equation 17 in terms of the mean compactness for N clusters and for N+1 clusters.
  • the statistical methods of this invention provide methods to analyze the significance of E. Specifically, these methods provide an empirical distribution approach for the analysis of E by comparing the actual increase in compactness, E_0, for actual experimental data to an empirical distribution of E values determined from randomly permuted data (e.g., by Equation 15 above).
  • the coordinates (i.e., the indices) of the vectors in each cluster being subdivided are "reflected" about the cluster center, e.g., by first translating the coordinate axes to the cluster center.
  • the randomly permuted data are re-evaluated by cluster algorithms, most preferably by the same cluster algorithm used to determine the original cluster(s), so that new clusters are determined for the permutated data, and a value of E is evaluated for these new clusters (i.e., for splitting one or more of the new clusters).
  • Steps one and two above are repeated for some number of Monte Carlo trials to generate a distribution of E values.
  • the number of Monte Carlo trials is from about 50 to about 1000, and more preferably from about 50 to about 100.
  • E_0 is compared to this empirical distribution of E values.
  • the confidence level in the number of clusters may be evaluated from 1-x/M.
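The Monte Carlo procedure above can be sketched as follows. This excerpt does not define x and M, so the sketch assumes that M is the number of Monte Carlo trials and x is the number of trials whose E value is at least as large as the actual value. It further assumes that "reflection" means flipping each coordinate about the cluster center independently with probability one half. Both function names are hypothetical.

```python
import random


def reflect_about_center(cluster, rng):
    """Return a permuted copy of the cluster in which every coordinate of
    every vector is randomly reflected about the cluster center
    (assumption: each sign flip is independent with probability 1/2)."""
    n = len(cluster)
    dim = len(cluster[0])
    center = [sum(v[d] for v in cluster) / n for d in range(dim)]
    permuted = []
    for v in cluster:
        permuted.append([
            center[d] + rng.choice((-1, 1)) * (v[d] - center[d])
            for d in range(dim)
        ])
    return permuted


def confidence_level(e_actual, e_null):
    """Confidence 1 - x/M, where M = len(e_null) and x counts the
    permuted-data E values that reach or exceed e_actual (assumption)."""
    m = len(e_null)
    if m == 0:
        raise ValueError("need at least one Monte Carlo trial")
    x = sum(1 for e in e_null if e >= e_actual)
    return 1.0 - x / m


# Illustrative use: the E values would come from re-clustering each
# permuted data set with the same algorithm used on the actual data.
rng = random.Random(0)
permuted = reflect_about_center([[0.0, 0.0], [2.0, 4.0]], rng)
conf = confidence_level(0.5, [0.1, 0.2, 0.6, 0.3])
```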
  • Cellular constituent sets can also be defined based upon the mechanism of the regulation of cellular constituents.
  • genesets can often be defined based upon the regulation mechanism of individual genes. Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated, and, as such, are more likely to co-vary.
  • the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (see, e.g., Stormo and Hartzell, 1989, Proc. Natl. Acad. Sci. 86:1183-1187; and Hertz and Stormo, 1995, Proc. of 3rd Intl. Conf.
  • the common promoter sequence responsive to Gcn4 in 20 genes is likely to be responsible for those 20 genes co-varying over a wide variety of perturbations.
  • Co-regulated and/or co-varying genes may also be in the up- or down-stream relationship where the products of up-stream genes regulate the activity of down-stream genes.
  • there are numerous varieties of gene regulation networks. Accordingly, the methods of the present invention are not limited to any particular kind of gene regulation mechanism. If it can be derived or determined from their mechanisms of regulation, whatever that mechanism happens to be, that two or more genes are co-regulated in terms of their activity change in response to perturbation, those two or more genes may be clustered into a geneset.
  • clustering may be used to cluster genesets when the regulation of genes of interest is partially known.
  • the number of genesets may be predetermined by understanding (which may be incomplete or limited) of the regulation mechanism or mechanisms.
  • the clustering methods may be constrained to produce the predetermined number of clusters. For example, in a particular embodiment promoter sequence comparison may indicate that the measured genes should fall into three distinct genesets. The clustering methods described above may then be constrained to generate exactly three genesets with the greatest possible distinction between those three sets.
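Constraining a clustering method to a predetermined number of genesets can be illustrated with a small agglomerative procedure that stops merging once the requested number of clusters remains. This is a sketch, not the method of the source: average linkage with Euclidean distance is swapped in for simplicity, the function names are hypothetical, and the profiles are invented.

```python
def euclidean(u, v):
    """Euclidean distance between two response profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def average_linkage(c1, c2, data):
    """Mean distance between all cross-cluster pairs of profiles."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_into(data, k):
    """Merge the two closest clusters repeatedly until exactly k remain.
    Returns lists of row indices into `data`."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Six hypothetical genes measured over three perturbations; promoter
# comparison is assumed to have suggested three genesets.
profiles = [
    [2.0, 2.1, 0.0], [1.9, 2.0, 0.1],
    [0.0, 0.1, 3.0], [0.1, 0.0, 2.9],
    [-2.0, -2.0, 0.0], [-2.1, -1.9, 0.1],
]
genesets = cluster_into(profiles, 3)
```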
  • Cellular constituent sets such as cellular constituent sets identified by any of the above methods or combinations thereof, may be refined using any of several sources of corroborating information.
  • corroborating information which may be used to refine cellular constituent sets include, but are by no means limited to, searches for common regulatory sequence patterns, literature evidence for co-regulations, sequence homology (e.g., of genes or proteins), and known shared function.
  • a cellular constituent database or “compendium” is used for the refinement of genesets.
  • the compendium is a "dynamic database."
  • a compendium containing raw data for cluster analysis of cellular constituent sets (e.g., genesets) is used to continuously update geneset definitions.
  • the cellular constituents are re-ordered according to the cellular constituent sets or clusters obtained or provided by the above-described methods and visually displayed.
  • the biological state of a cell is determined by measuring the expression levels of a plurality of genes in a cell to produce a transcript (or expression) profile.
  • the effects of altered dosages of individual genes, chromosomal regions or entire chromosomes in a cell can be conveniently and exhaustively examined by using a library of cell mutants, wherein each mutant has an altered dosage of one or more genes.
  • gene dosage can be altered by increasing or decreasing the amount of DNA of a gene, or by increasing or decreasing the levels or activities of RNA or protein encoded by said gene.
  • a mutation in a gene of a cell or organism may result in altered dosage of other genes because the cell or organism compensates for, e.g., loss of function of the mutated gene.
  • altered gene dosage of a particular gene m may be the result of a mutation in a paralog gene m' that has a similar function to gene m.
  • a mutation of gene m' that results in a deletion or down-regulation of the gene may be compensated for by, e.g., homologous recombination and selection for increased dosage of gene m, which has a similar function to gene m'.
  • a mutation in a gene that results in altered dosages of other genes can be spontaneous or can be introduced by techniques including, but not limited to, transfection, homologous recombination, promoter replacement, or RNA antisense approaches.
  • aneuploidy may be induced by making mutations in genes whose function is to maintain a wild-type chromosome number in a cell type or organism. Thus, when these mutants become aneuploid, there is likely to be no mechanism in the cell to correct the altered gene dosage.
  • the transcript profiles of each of the resulting aneuploid cells are measured to produce a "compendium" comprising landmark transcript profiles, each of which is uniquely associated with a particular dose of one or more genes in an organism.
  • the compendium may comprise landmark profiles for different dosages of a particular gene, e.g., gene m, because a profile generated from a cell type or organism having a duplication of gene m may be different from a profile generated from the same cell type or organism having a 100-fold amplification of gene m.
  • a compendium can also be constructed by measuring other cellular constituents that are indicative of the biological states of aneuploid cells, which include, but are not limited to, protein expression and protein activity levels.
  • the compendium comprising landmark profiles is a database stored on a computer readable medium that carries out the comparisons.
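Comparing a measured profile against the landmark profiles in a compendium might look like the following sketch. The similarity measure (Pearson correlation), the dictionary layout, the labels, and the numbers are all assumptions made for illustration; this excerpt does not specify how the comparison is performed.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def best_landmark(query, compendium):
    """Return (label, score) of the landmark profile most similar to query."""
    scored = [(pearson(query, prof), label) for label, prof in compendium.items()]
    score, label = max(scored)
    return label, score


# Hypothetical landmark profiles (e.g., log expression ratios for four
# genes), each keyed by the dosage alteration that produced it.
compendium = {
    "gene m, 2 copies": [1.0, 0.0, 0.0, 0.1],
    "gene n, 2 copies": [0.0, 1.0, 0.1, 0.0],
    "chromosomal region R, extra copy": [0.0, 0.1, 1.0, 1.0],
}
query = [0.1, 0.0, 0.9, 1.1]
label, score = best_landmark(query, compendium)
```

In practice each profile would contain many more cellular constituents, and a threshold on the score would be needed before calling a match.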
  • the database contains at least 10 profiles, at least 50 profiles, at least 100 profiles, at least 500 profiles, at least 1,000 profiles, at least 10,000 profiles, or at least 50,000 profiles, each profile containing measurements of at least 10, preferably at least 50, more preferably at least 100, more preferably at least 500, even more preferably at least 1,000, even more preferably at least 10,000, most preferably at least 50,000 cellular constituents.
  • a library of aneuploid cells is generated by targeting mutations to particular genes of an organism and selecting for mutants that compensate for the targeted mutations with altered dosage levels of other genes.
  • Saccharomyces cerevisiae is particularly well-suited to this technique of generating mutants. While many organisms repair double-stranded DNA ends that are not part of telomeres by end-to-end ligation, S. cerevisiae uses homologous recombination.
  • targeted perturbations of genes can be made in yeast by transforming the yeast with a particular DNA sequence, which integrates at a locus with high homology.
  • a library of aneuploid cells is generated by first randomly mutagenizing the cells using, e.g., chemical agents, radiation or retroviral-mediated insertion mutagenesis and subsequent identification of cells that compensate for these mutations by exhibiting altered gene copy number.
  • profiles may change with environmental perturbations, so that when generating a compendium comprising landmark profiles, differences in environmental variables, e.g., growth medium, temperature, cell density, pH, etc., should be minimized.
  • the organism or cell from which that profile was generated should be grown under the same environmental conditions as the aneuploid cells from which the compendium was compiled.
  • profiles will change with tissue type and developmental state.
  • the database comprises landmark profiles for altered dosages of at least 2%, preferably at least 5%, more preferably at least 15%, even more preferably at least 20%, even more preferably at least 40%, most preferably at least 75%, of genes in the genome of a cell type or organism, and may also include profiles from strains having different copy numbers of the same gene, since these can be fundamentally different from each other.
  • the number of landmark profiles is reduced to the minimum necessary to identify altered copy number of particular genes or chromosomal regions.
  • aneuploidy of a particular chromosomal region can be represented in the compendium set by only a few profiles from cell types or organisms having aneuploidy of genes that are located throughout the chromosomal region, i.e., each chromosomal region can be represented in the compendium by at least one profile from a cell type or organism having altered copy number of one gene, but multiple profiles from cell types or organisms having altered copy numbers of many genes in the chromosomal region may not be necessary.
  • the database comprises landmark profiles for at least 100, preferably at least 250, more preferably at least 500, even more preferably at least 1,000, even more preferably at least 10,000, even more preferably at least 50,000, most preferably at least 75,000 genes in the genome of a cell or organism, each gene having an altered copy number.
  • the database comprises landmark profiles for at least 1/4, preferably at least 1/2, most preferably at least 3/4 of the genes in the genome of a cell or organism, each gene having an altered copy number.
  • the cell or organism for which the database contains landmark profiles is a human, livestock or companion animal or plant.
  • Genetically modified cells, i.e., mutant cells from which aneuploid cells can result, can be made using cells of any organism for which genomic sequence information is available and for which methods are available that allow alteration in dosage of specific genes.
  • the genetically modified cells that exhibit aneuploidy are used to make aneuploid profiles.
  • a compendium is constructed that includes transcript profiles that represent the transcriptional states of each of a plurality of modified cells with an indicated dosage level of one or more genes, e.g., a set of cells in which each cell has a duplication of a particular gene.
  • Such a compendium is advantageous to detect aneuploidy in a systematic and automatable manner.
  • the compendium includes aneuploid transcript profiles for the genes likely to result in a disease or syndrome.
  • the invention is carried out using a yeast, with Saccharomyces cerevisiae most preferred because the sequence of the entire genome of a S. cerevisiae strain has been determined.
  • yeast Saccharomyces cerevisiae
  • well-established methods for deleting or otherwise disrupting or modifying specific genes are available in yeast. It is believed that most (approximately four-fifths) of the genes in S. cerevisiae can be deleted, one at a time, with little or no effect on the ability of the organism to reproduce.
  • Another advantage is that biological functions are often conserved between yeast and humans. For example, almost half of the proteins identified as defective in human heritable diseases show amino acid similarity to yeast proteins (Goffeau et al., 1996, Life with 6000 genes.
  • a preferred strain of yeast is a S. cerevisiae strain for which yeast genomic sequence is known, such as strain S288C or substantially isogenic derivatives of it (see, e.g., Nature 369, 371-8 (1994); P.N.A.S. 92:3809-13 (1995); E.M.B.O. J. 13:5795-5809 (1994), Science 265:2077-2082 (1994); E.M.B.O. J. 15:2031-49 (1996), all of which are incorporated herein).
  • yeast strains are available from American Type Culture Collection, Rockville, MD 20852. Standard techniques for manipulating yeast are described in C. Kaiser, S.
  • yeast cells are used.
  • yeast genes are disrupted or deleted using the method of Baudin et al, 1993, A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae, Nucl. Acids Res. 21:3329-3330, which is incorporated by reference in its entirety for all purposes.
  • This method uses a selectable marker, e.g., the KanMx gene, which serves in a gene replacement cassette.
  • the cassette is transformed into a haploid yeast strain and homologous recombination results in the replacement of the targeted gene (ORF) with the selectable marker.
  • a precise null mutation (a deletion from start codon to stop codon) is generated.
  • the polynucleotide (e.g., containing a selectable marker) used for transformation of the yeast includes an oligonucleotide marker that serves as a unique identifier of the resulting deletion strain as described, for example, in Shoemaker et al., 1996, Nature Genetics 14:450.
  • perturbations can be verified by PCR using the internal KanMx sequences, or using an external primer in the yeast genome that immediately flanks the disrupted open reading frame, and assaying for a PCR product of the expected size.
  • in yeast, it may sometimes be advantageous to disrupt ORFs in three yeast strains, i.e., haploid strains of the a and α mating types, and a diploid strain (for deletions of essential genes).
  • precise deletion of yeast genes is accomplished by using a PCR-mediated gene disruption strategy using homologous recombination (Winzeler et al. (1999) Science 285:901-906).
  • in the method of Winzeler et al. (1999) Science 285:901-906, short regions of yeast sequence that are upstream and downstream of a targeted gene are placed at each end of a selectable marker gene through PCR.
  • the resulting PCR products, when transformed into yeast, can replace the targeted gene by homologous recombination. For most genes, greater than 95% of the yeast transformants carry the correct gene deletion.
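The primer design implied by this PCR-mediated strategy can be sketched as string manipulation. This is an illustration under stated assumptions: the 45-nucleotide default homology length, the function names, and the marker priming sequences passed in by the caller are all hypothetical, and no real KanMx primer sequences are given here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def deletion_primers(upstream, downstream, marker_fwd, marker_rev, flank=45):
    """Build the two PCR primers that add gene-flanking homology to a marker.

    upstream:   genomic sequence ending immediately before the start codon
    downstream: genomic sequence starting immediately after the stop codon
    marker_fwd / marker_rev: priming sequences at the two ends of the
    selectable marker cassette, each written 5'->3' as a primer
    """
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("flanking sequence shorter than requested homology")
    forward = upstream[-flank:].upper() + marker_fwd.upper()
    reverse = reverse_complement(downstream[:flank]) + marker_rev.upper()
    return forward, reverse


# Toy sequences with a 4-nucleotide flank, for illustration only.
fwd, rev = deletion_primers("AAAACCCC", "GGGGTTTT", "ATGC", "TTAA", flank=4)
```

Amplifying the marker with these two primers yields a product whose ends match the sequence on either side of the targeted open reading frame, which is what allows homologous recombination to swap the marker in for the gene.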
  • the method of the present invention can be carried out using cells from any eukaryote for which genomic sequence of at least one gene is available, e.g., fruit flies (e.g., D. melanogaster), nematodes (e.g., C. elegans), and mammalian cells such as cells derived from mice and humans.
  • 100% of the genome of D. melanogaster has been sequenced (Jasny, 2000, Science 287:2181).
  • Methods for disruption of specific genes are well known to those of skill in the art, see, e.g., Anderson, 1995, Methods Cell Biol. 48:31;
  • Ribozymes are RNAs which are capable of catalyzing RNA cleavage reactions (Cech, 1987, Science 236:1532-1539; PCT International Publication WO 90/11364, published October 4, 1990; Sarver et al, 1990, Science 247: 1222-1225). "Hairpin" and "hammerhead" RNA ribozymes can be designed to specifically cleave a particular target mRNA.
  • Ribozyme methods involve exposing a cell to, inducing expression in a cell, etc. of such small RNA ribozyme molecules. (Grassi and Marini, 1996, Annals of Medicine 28: 499-510; Gibson, 1996, Cancer and Metastasis Reviews 15: 287-299). Ribozymes can be routinely expressed in vivo in sufficient number to be catalytically effective in cleaving mRNA, and thereby modifying mRNA abundances in a cell.
  • for expression in vivo, a ribozyme coding DNA sequence, designed according to the previous rules and synthesized, for example, by standard phosphoramidite chemistry, can be ligated into a restriction enzyme site in the anticodon stem and loop of a gene encoding a tRNA, which can then be transformed into and expressed in a cell of interest by methods routine in the art.
  • tDNA genes (i.e., genes encoding tRNAs)
  • an inducible promoter (e.g., a glucocorticoid or a tetracycline response element)
  • ribozymes can be routinely designed to cleave virtually any mRNA sequence, and a cell can be routinely transformed with DNA coding for such ribozyme sequences such that a catalytically effective amount of the ribozyme is expressed. Accordingly the abundance of virtually any RNA species in a cell can be essentially eliminated.
  • activity of a target RNA (preferably mRNA) species is inhibited by use of antisense nucleic acids.
  • antisense nucleic acid refers to a nucleic acid capable of hybridizing to a sequence-specific (e.g., non-poly A) portion of the target RNA, for example its translation initiation region, by virtue of some sequence complementarity to a coding and/or non-coding region.
  • the antisense nucleic acids of the invention can be oligonucleotides that are double-stranded or single-stranded, RNA or DNA or a modification or derivative thereof, which can be directly administered to a cell or which can be produced intracellularly by transcription of exogenous, introduced sequences in quantities sufficient to inhibit translation of the target RNA.
  • antisense nucleic acids are of at least six nucleotides and are preferably oligonucleotides (ranging from 6 to about 200 nucleotides).
  • the oligonucleotide is at least 10 nucleotides, at least 15 nucleotides, at least 100 nucleotides, or at least 200 nucleotides.
  • the oligonucleotides can be DNA or RNA or chimeric mixtures or derivatives or modified versions thereof, single-stranded or double-stranded.
  • the oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone.
  • the oligonucleotide may include other appending groups such as peptides, or agents facilitating transport across the cell membrane (see, e.g., Letsinger et al, 1989, Proc. Natl. Acad. Sci. U.S.A. 86: 6553-6556; Lemaitre et al, 1987, Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. WO 88/09810, published December 15, 1988), hybridization-triggered cleavage agents (see, e.g., Krol et al, 1988, BioTechniques 6: 958-976) or intercalating agents (see, e.g., Zon, 1988, Pharm. Res.
  • an antisense oligonucleotide is provided, preferably as single-stranded DNA.
  • the oligonucleotide may be modified at any position on its structure with constituents generally known in the art.
  • the antisense oligonucleotides may comprise at least one modified base moiety which is selected from the group including but not limited to 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
  • the oligonucleotide comprises at least one modified sugar moiety selected from the group including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and hexose.
  • the oligonucleotide comprises at least one modified phosphate backbone selected from the group consisting of a phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, and a formacetal or analog thereof.
  • the oligonucleotide is a 2-α-anomeric oligonucleotide.
  • An α-anomeric oligonucleotide forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gautier et al, 1987, Nucl. Acids Res. 15: 6625-6641).
  • the oligonucleotide may be conjugated to another molecule, e.g., a peptide, hybridization triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.
  • Oligonucleotides of the invention may be synthesized by standard methods known in the art, e.g., by use of an automated DNA synthesizer (such as are commercially available from Biosearch, Applied Biosystems, etc.).
  • phosphorothioate oligonucleotides may be synthesized by the method of Stein et al. (1988, Nucl. Acids Res.
  • methylphosphonate oligonucleotides can be prepared by use of controlled pore glass polymer supports (Sarin et al, 1988, Proc. Natl. Acad. Sci. U.S.A. 85: 7448-7451), etc.
  • the oligonucleotide is a 2'-O-methylribonucleotide (Inoue et al, 1987, Nucl. Acids Res. 15: 6131-6148), or a chimeric RNA-DNA analog (Inoue et al, 1987, FEBS Lett. 215: 327-330).
  • the antisense nucleic acids of the invention are produced intracellularly by transcription from an exogenous sequence.
  • a vector can be introduced in vivo such that it is taken up by a cell, within which cell the vector or a portion thereof is transcribed, producing an antisense nucleic acid (RNA) of the invention.
  • Such a vector would contain a sequence encoding the antisense nucleic acid.
  • Such a vector can remain episomal or become chromosomally integrated, as long as it can be transcribed to produce the desired antisense RNA.
  • Such vectors can be constructed by recombinant DNA technology methods standard in the art.
  • Vectors can be plasmid, viral, or others known in the art, used for replication and expression in mammalian cells. Expression of the sequences encoding the antisense RNAs can be by any promoter known in the art to act in a cell of interest. Such promoters can be inducible or constitutive.
  • Such promoters for mammalian cells include, but are not limited to: the SV40 early promoter region (Bernoist and Chambon, 1981, Nature 290: 304-310), the promoter contained in the 3' long terminal repeat of Rous sarcoma virus (Yamamoto et al, 1980, Cell 22: 787-797), the herpes thymidine kinase promoter (Wagner et al, 1981, Proc. Natl. Acad. Sci. U.S.A. 78: 1441-1445), the regulatory sequences of the metallothionein gene (Brinster et al, 1982, Nature 296: 39-42), etc.
  • the antisense nucleic acids of the invention comprise a sequence complementary to at least a portion of a target RNA species.
  • absolute complementarity although preferred, is not required.
  • the ability to hybridize will depend on both the degree of complementarity and the length of the antisense nucleic acid.
  • the longer the hybridizing nucleic acid the more base mismatches with a target RNA it may contain and still form a stable duplex (or triplex, as the case may be).
  • One skilled in the art can ascertain a tolerable degree of mismatch by use of standard procedures to determine the melting point of the hybridized complex.
  • the amount of antisense nucleic acid that will be effective in the inhibition of translation of the target RNA can be determined by standard assay techniques.
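The notion of complementarity to a target RNA can be made concrete with a short sketch that derives a fully complementary single-stranded DNA oligonucleotide and counts mismatches in a candidate. The target sequence and function names are hypothetical; duplex stability itself would still have to be assessed experimentally (e.g., by melting point), as the text notes.

```python
# Watson-Crick partner (as a DNA base) for each RNA base of the target.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna(target_rna):
    """Return the perfectly complementary antisense DNA oligonucleotide,
    written 5'->3', for an RNA target written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(target_rna.upper()))


def mismatches(oligo, target_rna):
    """Count positions where `oligo` differs from the perfect antisense."""
    perfect = antisense_dna(target_rna)
    if len(oligo) != len(perfect):
        raise ValueError("oligo and target must have the same length")
    return sum(1 for a, b in zip(oligo.upper(), perfect) if a != b)


# Hypothetical 15-nucleotide stretch around a translation initiation codon.
target = "AUGGCUAGCAAGGAG"
oligo = antisense_dna(target)
```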
  • antisense nucleic acids can be routinely designed to target virtually any mRNA sequence, and a cell can be routinely transformed with or exposed to nucleic acids
  • RNA aptamers can be introduced into or expressed in a cell.
  • RNA aptamers are specific RNA ligands for proteins, such as for Tat and Rev RNA
  • RNA interference can also be used to modify RNA abundances (Guo et al, 1995, Cell 81:611-620; Fire et al, 1998, Nature 391:806-811).
  • dsRNAs are injected into cells to specifically block expression of their homologous genes.
  • anti-sense strand can inactivate the corresponding gene. It is suggested that the dsRNAs are cut by nucleases into 21-23 nucleotide fragments. These fragments hybridize to the homologous region of their corresponding mRNAs to form double-stranded segments, which are then degraded by nucleases (Grant, 1999, Cell 96:303-306; Zamore et al, 2000, Cell 101:25-33; Bass, 2000, Cell 101:235-238; Petcherski et al, 2000, Nature 405:364-
  • one or more dsRNAs having sequences homologous to the sequences of one or more mRNAs whose abundances are to be modified are transfected into a cell or tissue sample. Any standard methods for introducing nucleic acids into cells can be used.
  • Methods of modifying protein abundances include, inter alia, those altering protein degradation rates and those using antibodies (which bind to proteins affecting abundances or activities of native target protein species). Increasing (or decreasing) the degradation rates of a protein species decreases (or increases) the abundance of that species.
  • a heat-inducible or drug-inducible N-terminal degron, which is an N-terminal protein fragment that exposes a degradation signal promoting rapid protein degradation at a higher temperature (e.g., 37° C)
  • Such an exemplary degron is Arg-DHFR^ts, a variant of murine dihydrofolate reductase in which the N-terminal Val is replaced by Arg and the Pro at position 66 is replaced with Leu.
  • a gene for a target protein, P, is replaced by standard gene targeting methods known in the art (Lodish et al, 1995, Molecular Biology of the Cell W.H. Freeman and Co., New York, especially chap 8) with a gene coding for the fusion protein Ub-Arg-DHFR^ts-P ("Ub" stands for ubiquitin).
  • the N-terminal ubiquitin is rapidly cleaved after translation, exposing the N-terminal degron.
  • at lower temperatures, lysines internal to Arg-DHFR^ts are not exposed, ubiquitination of the fusion protein does not occur, degradation is slow, and active target protein levels are high.
  • at higher temperatures, lysines internal to Arg-DHFR^ts are exposed, ubiquitination of the fusion protein occurs, degradation is rapid, and active target protein levels are low. Heat activation is blocked by exposure to methotrexate.
  • This method is adaptable to other N-terminal degrons which are responsive to other inducing factors, such as drugs and temperature changes.
  • Target protein abundances and also, directly or indirectly, their activities can also be decreased by (neutralizing) antibodies.
  • antibodies to suitable epitopes on protein surfaces may decrease the abundance, and thereby indirectly decrease the activity, of the wild-type active form of a target protein by aggregating active forms into complexes with less or minimal activity as compared to the unaggregated wild-type form.
  • antibodies may directly decrease protein activity by, e.g., interacting directly with active sites or by blocking access of substrates to active sites.
  • (activating) antibodies may also interact with proteins and their active sites to increase resulting activity.
  • antibodies of the various types to be described
  • antibodies can be raised against specific protein species (by the methods to be described) and their effects screened.
  • the effects of the antibodies can be assayed and suitable antibodies selected that raise or lower the target protein species concentration and/or activity.
  • assays involve introducing antibodies into a cell (see below), and assaying the amount or activity of the wild-type target protein by standard means (such as immunoassays) known in the art.
  • the net activity of the wild-type form can be assayed by assay means appropriate to the known activity of the target protein.
  • Antibodies can be introduced into cells in numerous fashions, including, for example, microinjection of antibodies into a cell (Morgan et al, 1988, Immunology Today 9:84-86) or transforming hybridoma mRNA encoding a desired antibody into a cell (Burke et al, 1984, Cell 36:847-858).
  • recombinant antibodies can be engineered and ectopically expressed in a wide variety of non-lymphoid cell types to bind to target proteins as well as to block target protein activities (Biocca et al, 1995, Trends in Cell Biology 5:248-252).
  • a first step is the selection of a particular monoclonal antibody with appropriate specificity to the target protein (see below).
  • sequences encoding the variable regions of the selected antibody can be cloned into various engineered antibody formats, including, for example, whole antibody, Fab fragments, Fv fragments, single chain Fv fragments (V_H and V_L regions united by a peptide linker) ("ScFv" fragments), diabodies (two associated ScFv fragments with different specificities), and so forth (Hayden et al, 1997, Current Opinion in Immunology 9:210-212).
  • Intracellularly expressed antibodies of the various formats can be targeted into cellular compartments (e.g., the cytoplasm, the nucleus, the mitochondria, etc.) by expressing them as fusions with the various known intracellular leader sequences (Bradbury et al, 1995, Antibody Engineering (vol. 2) (Borrebaeck ed.), pp 295-361, IRL Press).
  • the ScFv format appears to be particularly suitable for cytoplasmic targeting.
  • Antibody types include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab fragments, and an Fab expression library.
  • Various procedures known in the art may be used for the production of polyclonal antibodies to a target protein.
  • various host animals can be immunized by injection with the target protein; such host animals include, but are not limited to, rabbits, mice, rats, etc.
  • adjuvants can be used to increase the immunological response, depending on the host species, and include, but are not limited to, Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human adjuvants such as bacillus Calmette-Guerin (BCG) and Corynebacterium parvum.
  • any technique that provides for the production of antibody molecules by continuous cell lines in culture may be used.
  • Such techniques include, but are not restricted to, the hybridoma technique originally developed by Kohler and Milstein (1975, Nature 256: 495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al, 1983, Immunology Today 4: 72), and the EBV hybridoma technique to produce human monoclonal antibodies (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
  • monoclonal antibodies can be produced in germ-free animals utilizing recent technology (PCT/US90/02545).
  • human antibodies may be used and can be obtained by using human hybridomas (Cote et al, 1983, Proc. Natl. Acad. Sci. USA 80: 2026-2030), or by transforming human B cells with EBV virus in vitro (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
  • techniques developed for the production of "chimeric antibodies” (Morrison et al, 1984, Proc. Natl. Acad. Sci.
  • 4,946,778 can be adapted to produce single chain antibodies specific to the target protein.
  • An additional embodiment of the invention utilizes the techniques described for the construction of Fab expression libraries (Huse et al, 1989, Science 246: 1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity for the target protein.
  • Antibody fragments that contain the idiotypes of the target protein can be generated by techniques known in the art.
  • such fragments include, but are not limited to: the F(ab')2 fragment, which can be produced by pepsin digestion of the antibody molecule; the Fab' fragments that can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab fragments that can be generated by treating the antibody molecule with papain and a reducing agent; and Fv fragments.
  • screening for the desired antibody can be accomplished by techniques known in the art, e.g., ELISA (enzyme-linked immunosorbent assay).
  • To select antibodies specific to a target protein one may assay generated hybridomas or a phage display antibody library for an antibody that binds to the target protein.
  • Methods of directly modifying protein activities include, inter alia, dominant negative mutations, specific drugs (used in the sense of this application), and also the use of antibodies, as previously discussed.
  • Dominant negative mutations are mutations to endogenous genes or mutant exogenous genes that when expressed in a cell disrupt the activity of a targeted protein species.
  • general rules exist that guide the selection of an appropriate strategy for constructing dominant negative mutations that disrupt activity of that target (Herskowitz, 1987, Nature 329:219-222).
  • overexpression of an inactive form can cause competition for natural substrates or ligands sufficient to significantly reduce net activity of the target protein.
  • Such overexpression can be achieved by, for example, associating a promoter of increased activity with the mutant gene.
  • changes to active site residues can be made so that a virtually irreversible association occurs with the target ligand.
  • Such can be achieved with certain tyrosine kinases by careful replacement of active site serine residues (Perlmutter et al, 1996, Current Opinion in Immunology 8:285-290).
  • multimeric activity can be decreased by expression of genes coding exogenous protein fragments that bind to multimeric association domains and prevent multimer formation.
  • an inactive protein unit of a particular type can tie up wild-type active units in inactive multimers, and thereby decrease multimeric activity (Nocka et al, 1990, The EMBO J. 9:1805-1813).
  • the DNA binding domain can be deleted from the DNA binding unit, or the activation domain deleted from the activation unit.
  • the DNA binding domain unit can be expressed without the domain causing association with the activation unit. Thereby, DNA binding sites are tied up without any possible activation of expression.
  • expression of a rigid unit can inactivate resultant complexes.
  • proteins involved in cellular mechanisms are typically composed of associations of many subunits of a few types. These structures are often highly sensitive to disruption by inclusion of a few monomeric units with structural defects. Such mutant monomers disrupt the relevant protein activities.
  • mutant target proteins that are sensitive to temperature (or other exogenous factors) can be found by mutagenesis and screening procedures that are well-known in the art. Also, one of skill in the art will appreciate that expression of antibodies binding and inhibiting a target protein can be employed as another dominant negative strategy.
  • activities of certain target proteins can be altered by exposure to exogenous drugs or ligands.
  • a drug is known that interacts with only one target protein in the cell and alters the activity of only that one target protein. Exposure of a cell to that drug thereby modifies the cell. The alteration can be either a decrease or an increase of activity.
  • a drug is known and used that alters the activity of only a few (e.g., 2-5) target proteins with separate, distinguishable, and non-overlapping effects.
  • the methods of the present invention are directed toward correcting for the effects of aneuploidy in a profile.
  • aneuploidy may arise spontaneously in a cell as an indirect result of, for example, a mutation, missegregation of chromosomes, or some selection, such as one that offers a growth advantage. Consequently, profiles may be generated from aneuploid cells where the aneuploidy is not a desired characteristic, but where it contaminates the profile. In fact, aneuploidy may go undetected in the cells from which the profiles were generated. The results of undesired and undetected aneuploidy may be spurious correlations between profiles, which may in turn lead to erroneous interpretations of, for example, gene function.
  • profiles may be corrected for the effects of aneuploidy as follows.
  • the mean chromosomal ratio offset is the difference between the mean quantified level of a plurality of cellular constituents associated with a plurality of genes having an abnormal copy number (i.e., those mapped to the aneuploid chromosome or chromosomal segment) and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes having a wild type copy number (i.e., those mapped to chromosomes or chromosomal segments having a wild type copy number) (Fig.
  • the mean chromosomal ratio offset for chromosome VII shown in Fig. 1d is about 10^0.2, or about 58%, while that shown in Fig. 1e is about 10^0.14, or about 35%.
  • the mean quantified level of the plurality of cellular constituents associated with the plurality of genes mapped to the affected chromosome or chromosomal segment is divided by the mean chromosomal ratio offset in order to correct for the effects of aneuploidy.
  • the mean quantified level of the plurality of cellular constituents associated with the plurality of genes on the aneuploid chromosome of Fig. 1d will be decreased by 58% and the mean quantified level of the plurality of cellular constituents associated with the plurality of genes on the aneuploid chromosome of Fig. 1e will be decreased by 35%.
  • the plurality of cellular constituents is mRNA transcripts
  • the mean quantified level is an expression ratio, i.e., the ratio of the level of gene transcripts in the aneuploid cell and the level of gene transcripts in a wild type cell.
  • the mean chromosomal ratio offset is determined for at least 2 genes, preferably at least 10 genes, more preferably at least 50 genes mapped to the same aneuploid chromosome or chromosomal segment.
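The chromosome-wise correction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes expression ratios are stored as log10 values, so that the "difference" of the two means is a log-space offset and "dividing" a ratio by the offset becomes a subtraction in log space. The function name, gene identifiers, and data are hypothetical.

```python
from statistics import mean

def correct_for_aneuploidy(log_ratios, gene_chromosome, aneuploid_chromosome):
    """Return (offset, corrected) for log10 expression ratios keyed by gene.

    log_ratios:      dict gene -> log10(level in aneuploid cell / level in wild type)
    gene_chromosome: dict gene -> chromosome label
    """
    affected = [g for g in log_ratios if gene_chromosome[g] == aneuploid_chromosome]
    unaffected = [g for g in log_ratios if gene_chromosome[g] != aneuploid_chromosome]
    if len(affected) < 2 or not unaffected:
        raise ValueError("need at least 2 affected genes and 1 unaffected gene")
    # Mean chromosomal ratio offset: difference of the two means (log10 units).
    offset = mean(log_ratios[g] for g in affected) - mean(log_ratios[g] for g in unaffected)
    corrected = dict(log_ratios)
    for g in affected:
        # Subtracting in log space equals dividing the linear ratio by 10**offset.
        corrected[g] = log_ratios[g] - offset
    return offset, corrected

# Hypothetical data: three genes on the aneuploid chromosome, three elsewhere.
log_ratios = {"g1": 0.25, "g2": 0.15, "g3": 0.20, "g4": 0.02, "g5": -0.02, "g6": 0.0}
chrom = {"g1": "VII", "g2": "VII", "g3": "VII", "g4": "I", "g5": "II", "g6": "IV"}
offset, corrected = correct_for_aneuploidy(log_ratios, chrom, "VII")
print(f"offset = {offset:.2f} log10 units, i.e. a {10 ** offset:.2f}-fold ratio")
```

Working in log space keeps the correction symmetric for gains and losses of a chromosome; genes on unaffected chromosomes are returned unchanged.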
  • Another illustration of aneuploidy resulting in a spurious correlation of profiles is the correlation of the profiles of Saccharomyces cerevisiae mutants +/mcm1 and yor080w/yor080w (Fig. 5).
  • the cells harboring the mutations have lost chromosome III, on which is located the α2 transcription factor. This factor regulates transcription on many other chromosomes. Consequently, loss of chromosome III affects not only levels of cellular constituents associated with genes on chromosome III, but also levels of cellular constituents associated with genes on many other chromosomes.
  • the mean chromosomal ratio offset can be determined and the expression ratio of genes on chromosome III can be divided by this amount. However, this correction would clearly be suboptimal because it does not correct for changes in levels of cellular constituents associated with genes on other chromosomes.
  • profiles may be corrected for the effects of aneuploidy as follows.
  • the mean ratio offset for at least 50%, at least 75%, or preferably all genes known to be affected by the aneuploidy is determined.
  • the mean ratio offset is determined for all genes known to be regulated by a gene on chromosome III, such as the transcription factor α2.
  • the mean quantified level of a plurality of cellular constituents altered by the presence of aneuploidy is divided by the mean ratio offset in order to correct for the effects of aneuploidy.
  • the mean ratio offset is determined for at least two affected genes, preferably at least 10 affected genes, more preferably at least 50 affected genes.
  • identities of genes regulated by a particular gene on an aneuploid chromosome or chromosomal segment are preferably known.
  • the genes affected by the aneuploidy are classified based on a characteristic.
  • genes regulated by a given aneuploidy may tend to be “highly regulated” or “strongly induced” (or “strongly repressed”), and another class of genes might be “slightly regulated” or “slightly induced” (or “slightly repressed”) by the aneuploidy. If so, then the mean offset applied to each of these classes of genes should be different. For example, some genes might always be strongly induced, say 20-fold, while others are only slightly induced, say 1.5-fold. In that case, the expression ratios of genes in each of these classes would be divided by the mean offset for that class of genes.
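The class-wise correction can be sketched as follows. This is an illustrative sketch only: it assumes log10 expression ratios, class labels supplied by the user, and a baseline equal to the mean log10 ratio of unaffected genes (0.0 here). The function name, labels, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def correct_by_class(log_ratios, gene_class, baseline=0.0):
    """Subtract each class's mean log10 offset from the genes in that class.

    log_ratios: dict gene -> log10 expression ratio (genes affected by the aneuploidy)
    gene_class: dict gene -> class label, e.g. "strongly induced"
    baseline:   mean log10 ratio of genes unaffected by the aneuploidy
    """
    by_class = defaultdict(list)
    for gene, value in log_ratios.items():
        by_class[gene_class[gene]].append(value)
    # One mean offset per class, measured relative to the unaffected baseline.
    offsets = {label: mean(values) - baseline for label, values in by_class.items()}
    # Subtraction in log space corresponds to division of the linear ratio.
    corrected = {g: v - offsets[gene_class[g]] for g, v in log_ratios.items()}
    return offsets, corrected

# Hypothetical data: log10(20) is about 1.30, log10(1.5) is about 0.18.
ratios = {"a": 1.30, "b": 1.32, "c": 0.17, "d": 0.19}
classes = {"a": "strongly induced", "b": "strongly induced",
           "c": "slightly induced", "d": "slightly induced"}
offsets, corrected = correct_by_class(ratios, classes)
```

Applying a single pooled offset to both classes would over-correct the slightly induced genes and under-correct the strongly induced ones, which is why the offset is computed per class.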
  • Figure 3 illustrates an exemplary computer system suitable for implementation of the analytic methods of this invention.
  • Computer system 301 is illustrated as comprising internal components and being linked to external components.
  • the internal components of this computer system include processor element 302 interconnected with main memory 303.
  • computer system 301 can be an Intel Pentium®-based processor of 200 MHz or greater clock rate and with 32 MB or more of main memory.
  • computer system 301 is an Alta cluster of nine computers; a head "node” and eight sibling "nodes,” each having an
  • the Alta cluster comprises 128Mb of random access memory (“RAM”) on the head node and 256 Mb of RAM on each of the eight sibling nodes.
  • exemplary computer system depicted in FIG. 3 and having only a single processor and a single memory unit.
  • the external components include mass storage 304.
  • This mass storage can be one or more hard disks which are typically packaged together with the processor and memory.
  • Such hard disks are typically of 1 Gb or greater storage capacity and more preferably having
  • each node of the Alta cluster comprises a hard drive.
  • the head node has a hard drive with 6 Gb of storage capacity whereas each sibling node has a hard drive with 9 Gb of storage capacity.
  • Other external components include user interface device 305, which can be a monitor and a keyboard together with a pointing device 306 such as a
  • the computer system is also linked to a network link 307, which can be, e.g., part of an Ethernet link to other local computer systems, remote computer systems, or wide area communication networks such as the Internet.
  • each computer system in the preferred Alta cluster of computers described above is connected via an NFS network. This network link allows the computer
  • Software component 310 represents an operating system, which is responsible for managing the computer system and its network interconnections.
  • the operating system can be, for example, of the Microsoft Windows™ family, such as Windows 98, Windows 95 or Windows NT.
  • the operating system can be a Macintosh operating system, a UNIX operating system or the LINUX operating system.
  • Software component 311 represents common languages and functions conveniently present in the system to assist programs implementing the methods specific to the present invention.
  • Languages that can be used to program the analytic methods of the invention include, for example, C, and C++; PERL; FORTRAN; and JAVA.
  • the methods of the present invention can also be programmed or modeled in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including specific algorithms to be used, thereby freeing a user of the need to procedurally program individual equations and algorithms.
  • Such packages include, e.g., Matlab from Mathworks (Natick, MA), Mathematica from Wolfram Research (Champaign, Illinois) or S-Plus from MathSoft (Seattle, Washington).
  • software component 312 represents analytic methods of the present invention as programmed in a procedural language or symbolic package.
  • the computer system also contains a database 313 of landmark profiles.
  • a user first loads profile data into the computer system 301. These data can be directly entered by the user from monitor 305 and keyboard 306, or from other computer systems linked by network connection 307, or on removable storage media such as a CD-ROM or floppy disk (not illustrated) or through the network (307).
  • profile analysis software 312 which performs the steps of comparing the profile to the database 313 of landmark profiles.
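The comparison of a profile against a database of landmark profiles can be sketched as follows. The similarity measure is not specified in this passage, so the Pearson correlation used here is an assumption, as are the function names, landmark labels, and data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def most_similar_landmarks(profile, landmarks, top=1):
    """Return the `top` landmark names with the highest correlation to `profile`."""
    scored = sorted(((pearson(profile, lm), name) for name, lm in landmarks.items()),
                    reverse=True)
    return [(name, score) for score, name in scored[:top]]

# Hypothetical landmark database: log10 ratios for six genes in cells with a
# known alteration in copy number.
landmarks = {
    "gain of chromosome A": [0.2, 0.2, 0.0, 0.0, 0.0, 0.0],
    "loss of chromosome B": [0.0, 0.0, 0.0, 0.0, -0.3, -0.3],
}
query = [0.18, 0.22, 0.01, -0.01, 0.0, 0.02]
best = most_similar_landmarks(query, landmarks)
print(best[0][0])
```

Because correlation ignores a constant shift and scale, it compares the pattern of changes across genes rather than their absolute size; a distance-based measure would be a reasonable alternative where magnitude matters.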
  • a computer system for determining whether aneuploidy is likely to be present in a cell type or organism comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: (a) comparing quantified levels of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality of genes being mapped to the same chromosome, to mean quantified levels of cellular constituents associated with genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to mean quantified levels of said plurality of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be
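Steps (a) and (b) of such a system might be sketched as below. This is a hedged illustration only: "substantially the same" is approximated by a cutoff on the population standard deviation of the genes on one chromosome, and "dissimilar" by a cutoff on the shift of that chromosome's mean relative to all other genes. Both thresholds, the function name, and the data are hypothetical, and values are assumed to be log10 expression ratios.

```python
from statistics import mean, pstdev

def flag_aneuploid_chromosomes(log_ratios, gene_chromosome,
                               min_shift=0.15, max_spread=0.1):
    """Return [(chromosome, shift)] for chromosomes whose genes shift together."""
    by_chrom = {}
    for gene, value in log_ratios.items():
        by_chrom.setdefault(gene_chromosome[gene], []).append(value)
    flagged = []
    for chrom, values in by_chrom.items():
        # Step (a): compare this chromosome's genes with genes on other chromosomes.
        others = [v for c, vs in by_chrom.items() if c != chrom for v in vs]
        if len(values) < 2 or not others:
            continue
        shift = mean(values) - mean(others)
        # Step (b): genes agree with each other but differ from the rest.
        if abs(shift) >= min_shift and pstdev(values) <= max_spread:
            flagged.append((chrom, shift))
    return flagged

# Hypothetical data: chromosome VII is uniformly elevated.
log_ratios = {"g1": 0.25, "g2": 0.15, "g3": 0.20,
              "g4": 0.02, "g5": 0.01, "g6": -0.02, "g7": 0.0}
chrom = {"g1": "VII", "g2": "VII", "g3": "VII",
         "g4": "I", "g5": "I", "g6": "II", "g7": "II"}
flagged = flag_aneuploid_chromosomes(log_ratios, chrom)
```

With only a few chromosomes, the elevated chromosome also pulls up the mean of the "other" genes, so the shift threshold needs to be chosen with the number of measured genes in mind.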
  • a computer system for detecting the predisposition of a cell type or organism to a disease comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: (a) comparing quantified levels of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality of genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or
  • a computer system for diagnosing a disease associated with a known aneuploidy in a cell type or type of organism comprises one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type
  • a computer system for detecting the presence of aneuploidy in a cell type or type of organism comprises one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism.
  • a computer system for determining whether aneuploidy is likely to be present in a cell type or organism comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute step of: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild- type organism; and wherein identifying
  • a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes
  • a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one
  • a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: .
  • a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
  • a computer program product for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprises: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
  • a computer program product for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprises: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism.
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer program product comprises: computer code for determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment
  • said computer program product comprises: computer code for dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
  • a computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
  • kits for determining the biological state of a cell type or organism contain microarrays, such as those described in subsections below.
  • the microarrays include one or more test probes, each of which has a polynucleotide sequence that is complementary to a sequence of RNA or DNA to be detected. Each probe preferably has a different nucleic acid sequence, and the position of each probe on the solid surface is preferably known.
  • the microarrays are preferably addressable arrays, and more preferably are positionally addressable arrays.
  • each probe (or group of identical probe molecules) of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface). In preferred embodiments, each probe is covalently attached to the solid support at a single site.
  • the probes contained in the kits of this invention are nucleic acids capable of hybridizing specifically to nucleic acid sequences derived from RNA species that are known to increase or decrease in a cell or organism having a particular altered gene copy number that is detected by the kit.
  • kits of the invention preferably substantially exclude nucleic acids that hybridize to RNA species that are not increased or decreased in a cell or organism having a particular altered gene copy number that is detected by the kit.
  • the kits of the invention comprise an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of said subject, wherein said different nucleotide sequences are known to be increased or decreased as a result of aneuploidy and expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in
  • kits of the invention comprise an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism, and a container comprising RNA, or cDNA derived therefrom, of a cell having a known aneuploidy.
  • kits can be used to diagnose a disease associated with aneuploidy in a cell type or organism, i.e., by determining the profile of the cell type or organism and comparing the profile to a compendium of landmark profiles from cells having a known alteration in copy number in at least one gene that is associated with a disease in order to determine if the cell type or organism exhibits the aneuploidy associated with the disease.
  • a profile of a first cell at a later developmental stage can be predicted from the profile of the first cell measured at an earlier developmental stage and can be compared to a compendium having profiles from a second cell that is at a developmental stage more similar to the later developmental stage of the first cell and exhibiting aneuploidy associated with a disease in order to determine the first cell's predisposition to the disease.
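  • By way of illustration only, the comparison of a measured profile against a compendium of landmark profiles might be sketched as follows. The sketch assumes that each profile is a list of log expression ratios over the same ordered set of genes and that similarity is scored by Pearson correlation. The similarity measure, the function names and the example values are assumptions made for illustration and are not taken from the specification.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of log ratios."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no similarity information
    return sxy / math.sqrt(sxx * syy)


def best_landmark(profile, compendium):
    """Return (name, correlation) of the landmark profile most similar to `profile`."""
    scored = [(name, pearson(profile, landmark)) for name, landmark in compendium.items()]
    return max(scored, key=lambda item: item[1])


# Hypothetical landmark profiles (log10 ratios for five genes, invented values).
compendium = {
    "trisomy_A": [0.18, 0.17, 0.00, 0.01, 0.00],
    "monosomy_B": [0.00, 0.01, -0.30, -0.29, 0.00],
}
query = [0.20, 0.15, 0.02, -0.01, 0.01]
name, r = best_landmark(query, compendium)
print(name, round(r, 3))
```

  • In such a sketch, a high correlation with a landmark profile would merely suggest, and would not by itself establish, that the cell exhibits the corresponding alteration in copy number.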
  • Diseases in humans associated with aneuploidy that can be diagnosed or predicted using the kits of the invention include, but are not limited to, trisomic diseases such as Down syndrome cases (trisomy of chromosome 21), Edwards syndrome cases (trisomy of chromosome 18) and Patau syndrome (trisomy of chromosome 13); diseases associated with deletions of an arm of a chromosome, such as cri du chat syndrome (5p deletion) and Wolf-Hirschhorn syndrome (4p deletion); diseases associated with contiguous gene syndromes such as Alagille syndrome (20p12 deletion), Angelman syndrome (maternal chromosome at 15q11 deletion), DiGeorge syndrome (22q11.21 deletion), Langer-Giedion syndrome (8q24.1 deletion), Miller-Dieker syndrome (17p13.3 deletion), Prader-Willi syndrome (paternal chromosome at 15q11 deletion), Rubinstein-Taybi syndrome (16p13- deletion), Smith Magenis syndrome (17p11.2 deletion), and
  • cancers in humans are also associated with gene amplifications, deletions or translocations and can be diagnosed or predicted using the kits of the present invention.
  • These cancers that may be associated with aneuploidy include, but are not limited to, colon cancer; breast cancer; leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia; myeloproliferative diseases; adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer; sarcomas including liposarcom
  • kits of the invention may be used to detect or predict phenotypes, including beneficial phenotypes, resulting from the presence of aneuploidy in a cell type or organism.
  • the profiling methods of the present invention can be performed using any probe or probes that comprise a polynucleotide sequence and which are immobilized to a solid support or surface.
  • the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA.
  • the polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof.
  • the polynucleotide sequences of the probes may be full or partial sequences of genomic DNA, cDNA, or mRNA sequences extracted from cells.
  • the polynucleotide sequences of the probes may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences.
  • the probe sequences can be synthesized either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.
  • the probe or probes used in the methods of the invention are preferably immobilized to a solid support which may be either porous or non-porous.
  • the probes of the invention may be polynucleotide sequences that are attached to a nitrocellulose or nylon membrane or filter.
  • Such hybridization probes are well known in the art (see, e.g., Sambrook et al, Eds., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York).
  • the solid support or surface may be a glass or plastic surface.
  • This invention is particularly useful for the analysis of gene expression profiles in order to determine the likelihood of alterations to the genotype of a cell.
  • Some embodiments of this invention are based on measuring the transcriptional state of a cell.
  • the transcriptional state can be measured by techniques of hybridization to microarrays of probes consisting of a solid phase on the surface of which are immobilized a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics.
  • the solid phase may be a nonporous or, optionally, a porous material such as a gel.
  • microarrays can be employed for analyzing aspects of the biological state of a cell other than the transcriptional state, such as the translational state, the activity state, or mixed aspects.
  • a microarray comprises a support or surface with an ordered array of binding (e.g., hybridization) sites or "probes" for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes.
  • the microarrays are addressable arrays, preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface).
  • each probe is covalently attached to the solid support at a single site.
  • Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain characteristics: The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other.
  • microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions, and include large nylon arrays, such as those sold by Research Genetics.
  • the microarrays are preferably small, e.g., between 5 cm² and 25 cm², preferably between 12 cm² and 13 cm².
  • arrays are also contemplated and may be preferable, e.g., for use in screening and/or signature chips comprising a very large number of distinct oligonucleotide probe sequences.
  • a given binding site or unique set of binding sites in the microarray will specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific mRNA, or to a specific cDNA derived therefrom).
  • a binding site or unique set of binding sites in the microarray will specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific mRNA, or to a specific cDNA derived therefrom).
  • other, related or similar sequences may cross hybridize to a given binding site.
  • the microarrays of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected.
  • Each probe preferably has a different nucleic acid sequence, and the position of each probe on the solid surface is preferably known.
  • the microarrays are preferably addressable arrays, and more preferably are positionally addressable arrays.
  • each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface).
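  • As a minimal, non-limiting sketch of positional addressability, the identity of a probe may be recovered from its position by a simple lookup. The row-major layout, the function name `build_layout` and the four example sequences are hypothetical and serve only to illustrate the idea.

```python
def build_layout(probe_sequences, n_columns):
    """Assign each probe a (row, column) address in row-major order.

    The returned dict maps a position on the support to the sequence
    of the probe deposited there (an assumed, illustrative layout).
    """
    return {divmod(i, n_columns): seq for i, seq in enumerate(probe_sequences)}


# Hypothetical probes laid out on a 2 x 2 grid.
layout = build_layout(["ACGT", "TTGA", "GGCC", "CATG"], n_columns=2)
print(layout[(1, 0)])  # sequence of the probe at row 1, column 0
```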
  • the density of probes on a microarray is between about 100 and 1,000 different (i.e., non-identical) probes per 1 cm². More preferably, a microarray of the invention will have between about 1,000 and 5,000 different probes per 1 cm², between about 5,000 and 10,000 different probes per 1 cm², between about 10,000 and 15,000 different probes per 1 cm² or between about 15,000 and 20,000 different probes per 1 cm².
  • the microarray is a high density array, preferably having a density of between about 1,000 and 5,000 different probes per 1 cm².
  • the microarrays of the invention therefore preferably contain at least 2,500, at least 5,000, at least 10,000, at least 15,000, at least 20,000, at least 25,000, at least 50,000, at least 55,000, at least 100,000 or at least 150,000 different (i.e., non-identical) probes.
  • the density of probes on a microarray is between about 100 and 1,000 different (i.e., non-identical) probes per 1 cm², between 1,000 and 5,000 different probes per 1 cm², between 5,000 and 10,000 different probes per 1 cm², between 10,000 and 15,000 different probes per 1 cm², between 15,000 and 20,000 different probes per 1 cm², between 20,000 and 50,000 different probes per 1 cm², between 50,000 and 100,000 different probes per 1 cm², between 100,000 and 500,000 different probes per 1 cm², or more than 500,000 different (i.e., non-identical) probes per 1 cm².
  • the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (i.e., an mRNA or a cDNA derived therefrom), and in which binding sites are present for products of most or almost all of the genes in the organism's genome.
  • the binding site can be a DNA or DNA analogue to which a particular RNA can specifically hybridize.
  • the DNA or DNA analogue can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment.
  • although the microarray contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required.
  • the microarray will have binding sites corresponding to at least about 5% of the genes in the genome, sometimes to as many as 25%, often to at least about 50%, more often to at least about 75%, even more often to at least about 85%, even more often to about 90%, and still more often to at least about 99%.
  • "picoarrays", which may have binding sites for several hundred genes, may also be used.
  • Such arrays are microarrays which contain binding sites for products of only a limited number of genes in the target organism's genome.
  • a picoarray contains binding sites corresponding to fewer than about 50% of the genes in the genome of an organism.
  • the microarray has binding sites for genes associated with one or more biological pathways responsible for producing a phenotype of interest.
  • a "gene” is typically identified as the portion of DNA that is transcribed by RNA polymerase.
  • a gene may include a 5' untranslated region ("UTR"), introns, exons and a 3' UTR.
  • a gene comprises at least 25 to 100,000 nucleotides from which a messenger RNA is transcribed in the organism or in some cell in a multicellular organism.
  • the number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well characterized portion of the genome.
  • the "probe" to which a particular polynucleotide molecule specifically hybridizes according to the invention is a complementary polynucleotide sequence.
  • the probes of the microarray comprise nucleotide sequences greater than about 250 bases in length corresponding to one or more genes or gene fragments.
  • the probes may comprise DNA or DNA "mimics" (e.g., derivatives and analogues) corresponding to at least a portion of each gene in an organism's genome.
  • the probes of the microarray are complementary RNA or RNA mimics.
  • DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of specific hybridization with RNA.
  • the nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone.
  • Exemplary DNA mimics include, e.g., phosphorothioates.
  • DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences.
  • PCR primers are preferably chosen based on known sequence of the genes or cDNA that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray).
  • Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences).
  • each probe on the microarray will be between 20 bases and 50,000 bases, and usually between 300 bases and 1000 bases in length.
  • PCR methods are well known in the art, and are described, for example, in Innis et al., eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA. It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.
  • An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using N-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acid Res. 14:5399-5401; McBride et al., 1983, Tetrahedron Lett. 24:246-248).
  • Synthetic sequences are typically between about 15 and about 500 bases in length, more typically between about 20 and about 100 bases, most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine.
  • nucleic acid analogues may be used as binding sites for hybridization.
  • An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al, 1993, Nature 363:566-568; U.S. Patent No. 5,539,083).
  • the hybridization sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al, 1995, Genomics 29:201-209).
  • the probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface
  • oligonucleotides (generally of length 20 to 70 bases) of known sequence are synthesized directly on a surface such as a derivatized glass slide. Usually, the array produced is redundant, with several oligonucleotide molecules per RNA. Oligonucleotide probes can be chosen to distinguish between alternatively spliced mRNAs.
  • microarrays, e.g., by masking
  • any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., supra), could be used.
  • very small arrays will frequently be preferred because hybridization volumes will be smaller.
  • microarrays of the invention are
  • oligonucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate.
  • the microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i.e., the different probes).
  • Target polynucleotides which may be analyzed by the methods and compositions of the invention include RNA molecules such as, but by no means limited to, messenger RNA (mRNA) molecules, ribosomal RNA (rRNA) molecules, cRNA molecules (i.e., RNA molecules prepared from cDNA molecules that are transcribed in vivo) and fragments thereof.
  • Target polynucleotides which may also be analyzed by the methods and compositions of the present invention include, but are not limited to DNA molecules such as genomic DNA molecules, cDNA molecules, and fragments thereof including oligonucleotides, ESTs, STSs, etc.
  • the target polynucleotides may be from any source.
  • the target polynucleotide molecules may be naturally occurring nucleic acid molecules such as genomic or extragenomic DNA molecules isolated from an organism, or RNA molecules, such as mRNA molecules, isolated from an organism.
  • the polynucleotide molecules may be synthesized, including, e.g., nucleic acid molecules synthesized enzymatically in vivo or in vitro, such as cDNA molecules, or polynucleotide molecules synthesized by PCR, RNA molecules synthesized by in vitro transcription, etc.
  • the sample of target polynucleotides can comprise, e.g., molecules of DNA, RNA, or copolymers of DNA and RNA.
  • the target polynucleotides of the invention will correspond to particular genes or to particular gene transcripts (e.g., to particular mRNA sequences expressed in cells or to particular cDNA sequences derived from such mRNA sequences).
  • the target polynucleotides may correspond to particular fragments of a gene transcript.
  • the target polynucleotides may correspond to different exons of the same gene, e.g., so that different splice variants of that gene may be detected and/or analyzed.
  • the target polynucleotides to be analyzed are prepared in vitro from nucleic acids extracted from cells.
  • RNA is extracted from cells (e.g., total cellular RNA, poly(A)+ messenger RNA, or a fraction thereof) and messenger RNA is purified from the total extracted RNA.
  • Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e.g., in Sambrook et al., supra.
  • RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al, 1979, Biochemistry 18:5294-5299).
  • cDNA is then synthesized from the purified mRNA using, e.g., oligo-dT or random primers.
  • the target polynucleotides are cRNA prepared from purified messenger RNA extracted from cells.
  • cRNA is defined as RNA complementary to the source RNA.
  • the extracted RNAs are amplified using a process in which double-stranded cDNAs are synthesized from the RNAs using a primer linked to an RNA polymerase promoter in a direction capable of directing transcription of anti-sense RNA.
  • Anti-sense RNAs or cRNAs are then transcribed from the second strand of the double-stranded cDNAs using an RNA polymerase (see, e.g., U.S. Patent Nos. 5,891,636, 5,716,785; 5,545,522 and 6,132,997; see also, U.S. Patent Application Serial No. 09/411,074, filed October 4, 1999 by Linsley and Schelter and U.S. Provisional Patent Application Serial No. to be assigned, Attorney Docket No. 9301-124-888, filed on November 28, 2000, by Ziman et al). Both oligo-dT primers (U.S. Patent Nos. 5,545,522 and 6,132,997) or random primers (U.S.
  • the target polynucleotides are short and/or fragmented polynucleotide molecules which are representative of the original nucleic acid population of the cell.
  • the target polynucleotides to be analyzed by the methods and compositions of the invention are preferably detectably labeled.
  • cDNA can be labeled directly, e.g., with nucleotide analogs, or indirectly, e.g., by making a second, labeled cDNA strand using the first strand as a template.
  • the double-stranded cDNA can be transcribed into cRNA and labeled.
  • the detectable label is a fluorescent label, e.g., by incorporation of nucleotide analogs.
  • Other labels suitable for use in the present invention include, but are not limited to, biotin, iminobiotin, antigens, cofactors, dinitrophenol, lipoic acid, olefinic compounds, detectable polypeptides, electron rich molecules, enzymes capable of generating a detectable signal by action upon a substrate, and radioactive isotopes.
  • Preferred radioactive isotopes include ³²P, ³⁵S, ¹⁴C, ¹⁵N and ¹²⁵I.
  • Fluorescent molecules suitable for the present invention include, but are not limited to, fluorescein and its derivatives, rhodamine and its derivatives, texas red, 5'carboxy-fluorescein (“FMA”), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (“JOE”), N,N,N',N'-tetramethyl-6-carboxy-rhodamine (“TAMRA”), 6'carboxy-X-rhodamine (“ROX”), HEX, TET, IRD40, and IRD41.
  • Fluorescent molecules that are suitable for the invention further include: cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650, and BODIPY-650/670; and ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568, and ALEXA-594; as well as other fluorescent dyes which will be known to those who are skilled in the art.
  • Electron rich indicator molecules suitable for the present invention include, but are not limited to, ferritin, hemocyanin, and colloidal gold.
  • the target polynucleotides may be labeled by specifically complexing a first group to the polynucleotide.
  • a second group, covalently linked to an indicator molecule and which has an affinity for the first group, can be used to indirectly detect the target polynucleotide.
  • compounds suitable for use as a first group include, but are not limited to, biotin and iminobiotin.
  • Compounds suitable for use as a second group include, but are not limited to, avidin and streptavidin.
  • nucleic acid hybridization and wash conditions are chosen so that the polynucleotide molecules to be analyzed by the invention (referred to herein as the "target polynucleotide molecules") specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to a specific array site, wherein its complementary DNA is located.
  • arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules.
  • Arrays containing single-stranded probe DNA may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self complementary sequences.
  • Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, or DNA) of probe and target nucleic acids.
  • General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al, (supra), and in Ausubel et al, 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York.
  • typical hybridization conditions are hybridization in 5 X SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25
  • Particularly preferred hybridization conditions for use with the screening and/or signaling chips of the present invention include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5 °C, more preferably within 2 °C) in
  • when detectably labeled (e.g., with a fluorophore) cDNA or cRNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product of the gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
  • cDNAs or cRNAs from two different cells are hybridized to the binding sites of the microarray.
  • one cell is a wild-type cell and another cell is of the same type but is aneuploid.
  • the cDNA or cRNA derived from each of the two cell types are differently labeled so that they can be distinguished.
  • cDNA or cRNA from an aneuploid cell is synthesized using a fluorescein-labeled dNTP
  • cDNA or cRNA from a second, wild-type cell is synthesized using a rhodamine-labeled dNTP.
  • the relative intensity of signal from each cDNA or cRNA set is determined for each site on the array, and any relative difference in abundance of a particular mRNA is thereby detected.
  • the cDNA or cRNA from the aneuploid cell will fluoresce green when the fluorophore is stimulated, and the cDNA or cRNA from the wild-type cell will fluoresce red.
  • when the aneuploidy has no effect, either directly or indirectly, on the relative abundance of a particular mRNA in a cell, the mRNA will be equally prevalent in both cells, and, upon reverse transcription, red-labeled and green-labeled cDNA or cRNA will be equally prevalent.
  • the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores.
  • when the aneuploidy either directly or indirectly increases the prevalence of the mRNA in the cell, the ratio of green to red fluorescence will increase. When the mutation decreases the mRNA prevalence, the ratio will decrease.
  • the fluorescent labels in two-color differential hybridization experiments are reversed to reduce biases peculiar to individual genes or array spot locations, and consequently, to reduce experimental error.
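  • Purely for illustration, the effect of reversing the fluorescent labels may be sketched as follows. The sketch assumes that the two hybridizations of a fluor-reversed pair are combined by averaging the log ratios of aneuploid signal to wild-type signal, so that a multiplicative dye bias cancels. The averaging rule, the function names and the intensities are assumptions and are not taken from the specification.

```python
import math


def log_ratio(aneuploid_signal, wild_type_signal):
    """log10 of the aneuploid-to-wild-type signal ratio at one array site."""
    return math.log10(aneuploid_signal / wild_type_signal)


def fluor_reversed_log_ratio(green1, red1, green2, red2):
    """Combine a fluor-reversed pair of hybridizations.

    Hybridization 1: aneuploid sample labeled green, wild type labeled red.
    Hybridization 2: labels swapped (aneuploid red, wild type green).
    """
    return 0.5 * (log_ratio(green1, red1) + log_ratio(red2, green2))


# Invented example: the true ratio is 1.5 and the green channel reads 1.2x too high.
combined = fluor_reversed_log_ratio(green1=180.0, red1=100.0, green2=120.0, red2=150.0)
print(10 ** combined)  # the 1.2x dye bias cancels in the combined ratio
```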
  • the fluorescence emissions at each site of a transcript array can be, preferably, detected by scanning confocal laser microscopy or a charge-coupled device ("CCD").
  • a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used.
  • a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al, 1996, Genome Res. 6:639-645).
  • the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective.
  • Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes.
  • fluorescence laser scanning devices are described, e.g., in Schena et al., 1996, Genome Res. 6:639-645.
  • the fiber-optic bundle described by Ferguson et al, 1996, Nature Biotech. 14:1681-1684 may be used to monitor mRNA abundance levels at a large number of sites simultaneously. Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit analog to digital board.
  • the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by alterations in the genotype of a cell.
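  • One possible form of the cross-talk correction and ratio calculation described above is sketched below, by way of example only. The sketch assumes a linear model in which a fixed, experimentally determined fraction of each fluor's emission appears in the other channel. The linear model, the coefficient names `red_into_green` and `green_into_red`, and the numerical values are assumptions for illustration.

```python
def correct_cross_talk(observed_green, observed_red, red_into_green, green_into_red):
    """Undo linear channel overlap.

    Assumed model:
        observed_green = green + red_into_green * red
        observed_red   = red   + green_into_red * green
    Returns the estimated (green, red) emissions.
    """
    det = 1.0 - red_into_green * green_into_red
    if det == 0:
        raise ValueError("cross-talk coefficients make the model singular")
    green = (observed_green - red_into_green * observed_red) / det
    red = (observed_red - green_into_red * observed_green) / det
    return green, red


def emission_ratio(observed_green, observed_red, red_into_green=0.0, green_into_red=0.0):
    """Green-to-red emission ratio at one site after cross-talk correction."""
    green, red = correct_cross_talk(observed_green, observed_red, red_into_green, green_into_red)
    return green / red


# Invented site: true emissions 200 (green) and 100 (red), with 5% and 2% overlap.
ratio = emission_ratio(205.0, 104.0, red_into_green=0.05, green_into_red=0.02)
print(ratio)
```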
  • the relative abundance of an mRNA in two cells or cell lines is scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested) or as not perturbed (i.e., the relative abundance is the same).
  • a difference between the two sources of RNA of at least a factor of about 25% (i.e., RNA is 25% more abundant in one source than in the other source), more usually about 50%, even more often by a factor of about 2 (i.e., twice as abundant), 3 (three times as abundant), or 5 (five times as abundant) is scored as a perturbation.
  • Present detection methods allow reliable detection of difference of an order of about 3-fold to about 5-fold, but more sensitive methods are expected to be developed.
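  • As a non-limiting sketch, the scoring of a perturbation by a fold-difference threshold might be expressed as follows. The function name `score_perturbation` and its default threshold are hypothetical; the thresholds exercised in the example (1.25-fold and 3-fold) correspond to values mentioned above.

```python
def score_perturbation(abundance_a, abundance_b, min_fold=2.0):
    """Score the relative abundance of one mRNA in two sources.

    Returns (perturbed, fold), where fold >= 1 is the ratio of the more
    abundant source to the less abundant one, and perturbed is True when
    that ratio reaches the chosen threshold.
    """
    if abundance_a <= 0 or abundance_b <= 0:
        raise ValueError("abundances must be positive")
    fold = max(abundance_a, abundance_b) / min(abundance_a, abundance_b)
    return fold >= min_fold, fold


print(score_perturbation(125.0, 100.0, min_fold=1.25))  # 25% more abundant
print(score_perturbation(110.0, 100.0, min_fold=1.25))  # below the threshold
print(score_perturbation(100.0, 300.0, min_fold=3.0))   # direction does not matter
```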
  • Yeast strains. The genotypes of the nearly 300 strains used to generate expression profiles can be found at the Rosetta Inpharmatics, Inc. web site (www.rii.com). Essentially all 300 strains are derived from strain BY4743 (MATa/MATα his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 ura3Δ0/ura3Δ0 +/met15Δ0 +/lys2Δ0), the parental strain for the
  • Yeast culture and cDNA microarray expression analysis. Experimental cultures were grown, harvested, and processed in parallel with corresponding wild-type or control cultures. Several colonies of similar size were picked from freshly-streaked YAPD agar plates into liquid Synthetic Complete medium (SC) with 2% glucose, grown overnight at 30 °C to mid-log phase, diluted to 0.4 - 1.0 x 10⁶ cells/ml, and grown an additional 5-7 hours until reaching 0.4 - 1.0 x 10⁷ cells/ml, at which point they were sedimented by centrifugation for 2 minutes at room temperature and frozen in liquid nitrogen. The final optical densities of experimental and control cultures were matched as closely as possible.
  • RNA was prepared by phenol:chloroform extraction followed by ethanol precipitation, as described previously (Marton et al. (1998) Nat. Med. 4:1293-1301), except that vortexing with glass beads was replaced by a 10 minute incubation at 65 °C followed by 1 minute of vortexing.
  • Poly-A+ RNA purification, cDNA labeling, microarray production, and microarray hybridization and washing were as described previously (Marton et al. (1998) Nat. Med. 4:1293-1301) with measurements taken in fluor-reversed pairs. Arrays were scanned, images were quantitated and physical artifacts (dust and salt residue) edited as described previously (Marton et al. (1998) Nat. Med.
  • Genomic DNA extraction, labeling, and hybridization to microarrays. Genomic DNA was extracted from 5 ml saturated cultures grown in YPD medium with minor modifications to standard techniques. See Hoffman, C.S. & Winston, F. (1987) Gene 57:267-272. Two micrograms of genomic DNA were denatured and annealed to 1 μg random hexamers, and labeled at 37 °C in 15 μl reactions containing 1x NEB buffer 2, 7 units of Klenow fragment of DNA Polymerase I, 500 μM dATP, dCTP, and dGTP, 200 μM dUTP, and 100 μM Cy-dUTP. Production of
  • Microarrays were scanned on either a General Scanning ScanArray 3000 or a Genetic Microsystems 418 Array Scanner. For determination of aneuploidy in small colonies versus large colonies, cells were streaked on five plates, and approximately 2000 small colonies or
  • the mean chromosomal ratio plots (Figs. 1, 4, 5) display, in logarithmic scale, the average of all expression ratios for each individual chromosome.
  • the mean expression ratio for each chromosome is an error-weighted mean of all the ORFs present on that chromosome, with the error calculated based on the individual spot intensity and the slide quality, i.e., the degree to which the determined ratios agree in each of two slides from a fluor-reversed pair of hybridizations done per experiment.
  • a chromosome was flagged as having a statistically significant chromosome-wide expression bias if the mean chromosomal ratio had an offset of greater than 0.1 in log space and was at least ten standard deviations from the mean (P < 10⁻²⁰).
  • P values were calculated from the number of standard deviations from the mean, assuming a Gaussian distribution, which was verified by analysis of 63 wild-type vs wild-type control experiments (Hughes et al. (2000) Cell 102(1):109-126).
  • the estimated systematic bias of each chromosome with respect to the mean is at the level of 0.0016 of log₁₀(ratio).
  • the error bar of the mean ratio in log space was computed from the spread of the data, taking into account the error of each point and the number of data points.
  • Segmental aneuploidy. To explore expression profiling data for potential occurrences of segmental aneuploidy, data were scanned for instances in which four or more non-overlapping, chromosomally-adjacent genes were all up- or down-regulated at a 0.05 significance threshold. Twenty-two cases were identified in which at least four adjacent genes were apparently coordinately regulated. Four cases were tested (three of the four are listed in Table 1d) and all were confirmed experimentally by genomic DNA hybridization. The rpl20aΔ mutant contained a 56-ORF duplication from YOR290c to
  • YOR343c which in the wild-type is flanked by retrotransposon long terminal repeats (LTRs) and a Ty2 transposon on the centromeric and telomeric sides, respectively.
  • The top3Δ mutant contained a 28-ORF duplication from YLR228c to YLR256w and in the wild-type is flanked by LTRs and a Ty1 transposon on the centromeric and telomeric sides, respectively.
  • The genomic DNA hybridization of the rad27Δ mutant was consistent with an
  • chromosome revealed that, on average, the expression of all genes on chromosome VII was higher in the erg4Δ and ecm18Δ/ecm18Δ mutants, respectively, than in the parental wild-type control to which the mutant was compared (Fig. 1d,e; circles).
  • Genomic DNA from the mutant and parental wild-type strains was isolated, labeled and hybridized to DNA microarrays, and the results plotted in the same manner (Fig. 1d,e; squares).
  • The mutant strains contain an additional copy or copies of chromosome VII.
  • The discovery of a spurious correlation resulting from aneuploidy in two independent yeast mutants not known to suffer chromosome instability prompted a search for additional examples of aneuploidy in a collection of expression profiles.
  • Plots of the mean expression ratio for each chromosome for all other mutants profiled revealed that expression profiles from ~8% of the mutants (22 of 290) contained at least one chromosome that displayed a mean chromosomal ratio bias greater than 0.1 in log space and that was at
  • YOR343c is precisely flanked by retrotransposon long terminal repeats (Fig. 2b, d) and contains RPL20B, which encodes a protein with 99% identity to RPL20A.
  • the duplication may have been the result of a homologous recombination event and a selection for increased dosage of RPL20B.
  • An expression profile thus serves as a tool for the detection of aneuploidy, including even small deletions or duplications.
  • This example shows that aneuploidy can be detected in publicly available expression data obtained using SAGE and using microarrays.
  • Example 1 Several studies contained data suggestive of aneuploidy but the expression biases did not meet the criteria described above in Example 1 (0.1 bias in log space and at least ten standard deviations from the mean). For example, an expression bias was noted in data from strain E1 that underwent adaptive evolution during approximately 500 generations in glucose-limited media (Ferea, T.L., Botstein, D., Brown, P.O. & Rosenzweig, R.F. (1999) Proc. Natl. Acad. Sci.
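
The chromosome-wide test summarized in the excerpts above (an error-weighted mean log ratio per chromosome, flagged when its offset exceeds 0.1 in log space and lies at least ten standard deviations from the mean) and the scan for runs of four or more adjacent, significantly regulated genes might be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used for the reported experiments: inverse-variance weighting, a standard error derived solely from the per-gene errors (rather than also from the spread of the data), the input layout, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def chromosome_means(genes):
    """genes: iterable of (chromosome, log10_ratio, sigma) tuples.

    Returns {chromosome: (weighted_mean, standard_error)} using
    inverse-variance weights (an assumed form of "error-weighted").
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # chromosome -> [sum(w*x), sum(w)]
    for chrom, log_ratio, sigma in genes:
        w = 1.0 / sigma ** 2
        sums[chrom][0] += w * log_ratio
        sums[chrom][1] += w
    return {c: (wx / w, math.sqrt(1.0 / w)) for c, (wx, w) in sums.items()}


def flag_chromosomes(genes, min_offset=0.1, min_sd=10.0):
    """Return chromosomes whose mean log ratio is offset from the
    genome-wide mean by more than min_offset and by at least min_sd
    standard errors (thresholds taken from the text)."""
    genes = list(genes)
    per_chrom = chromosome_means(genes)
    overall, _ = chromosome_means(("all", x, s) for _, x, s in genes)["all"]
    flagged = []
    for chrom, (mean, se) in per_chrom.items():
        offset = mean - overall
        if abs(offset) > min_offset and abs(offset) / se >= min_sd:
            flagged.append(chrom)
    return sorted(flagged)


def adjacent_runs(calls, min_run=4):
    """calls: per-gene calls in chromosomal order, each +1 (significantly
    up), -1 (significantly down) or 0 (not significant).

    Returns (start, end, sign) triples, end exclusive, for runs of at
    least min_run adjacent genes sharing the same non-zero call.
    """
    runs, start = [], 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != 0 and i - start >= min_run:
                runs.append((start, i, calls[start]))
            start = i
    return runs
```

In this sketch the per-gene significance calls passed to `adjacent_runs` are assumed to have been made beforehand at the 0.05 threshold mentioned in the text.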

Abstract

The present invention relates to methods for detecting aneuploidy or, in particular, determining the likelihood that aneuploidy is present in a cell type or organism. In particular, the invention relates to the use of profiles for detecting aneuploidy or for determining the likelihood of the presence of aneuploidy in a cell type or organism that is associated with a disease, or with a predisposition toward a certain disease. The present invention also relates to methods of correcting a profile for the presence of aneuploidy. The present invention further relates to a computer system, a computer program product and kits for detecting aneuploidy or determining the likelihood that aneuploidy is present in a cell type or organism.

Description

USE OF PROFILING FOR DETECTING ANEUPLOIDY
1. FIELD OF THE INVENTION
The present invention relates to methods of using profiles to detect aneuploidy, in particular, to determine the likelihood that aneuploidy is present in a cell type or organism. In particular, the present invention relates to methods of diagnosing or determining the predisposition of an organism toward diseases that are associated with abnormal copy numbers of one or more genes, i.e., aneuploidy. The present invention also relates to methods of correcting a profile for the presence of aneuploidy. The present invention further relates to a computer system, a computer program product, and kits for detecting aneuploidy or determining the likelihood that aneuploidy is present in a cell type or organism from profiles.
2. BACKGROUND OF THE INVENTION
Aneuploid cells have a chromosomal constitution that differs from the usual chromosomal constitution for a given species. Germ line cells are said to have n chromosomes. If an organism or species has 2n number of chromosomes in its somatic cells, the organism or species is said to be diploid. Different organisms or species, or the same organism at different phases of a life cycle, can have different ploidy. Yeast cells, e.g., can grow as haploid (n number of chromosomes) or diploid (2n number of chromosomes) or polyploid (such as 4n; see Galitski et al. (1999) Science 285:251-254). Many plant species are octaploid (8n). Humans are naturally diploid in somatic cells. Aneuploidy may occur by loss or gain of one or more chromosomes or chromosomal segments and can have drastic effects on phenotypic expression. Aneuploidy usually results from non-disjunction of chromosomes during meiosis, which in turn results in gametes having too many or too few chromosomes. Chromosomal non-disjunction can also occur during mitosis, resulting in individuals that express chromosomal mosaicism, i.e., having some somatic cells or tissues that are aneuploid and others that are euploid, which may be associated with mild to severe phenotypic manifestations. Euploid ("true-ploid") cells have the appropriate or correct amount of genetic material for a given species. Therefore, they are the opposite of aneuploid cells. Aneuploidy can also result from spurious recombination events that result in the amplification or duplication of either full chromosomes or chromosomal segments. Aneuploidy is often lethal in animals but can be tolerated to a greater extent in plants. Trisomies (2n+1 chromosomes) are the most common form of aneuploidy and result in the least severe phenotypic aberrations.
In humans, approximately 95% of Down syndrome cases, 95% of Edwards syndrome cases, and 80% of Patau syndrome cases result from complete trisomy of a chromosome, i.e., of chromosome 21, chromosome 18, and chromosome 13, respectively. (The Merck Manual 2233-37, Mark H. Beers and Robert Berkow eds., Merck Research Laboratories 17th ed. 1999). In humans, these and other syndromes caused by aneuploidy are likely to be accompanied by physical deformities, severe mental retardation and decreased life expectancy. In contrast, aneuploid species of some plants, e.g., wheat, may either have almost wild-type characteristics or may be small and infertile, depending on which chromosome is affected (E.R. Sears, "The Aneuploids of Common Wheat," University of Missouri Research Bulletin, November, 1954).
Contiguous gene syndromes result from deletions and amplifications of regions of chromosomes. Contiguous gene syndromes in humans may cause severe mental and physical deformities, and include Alagille syndrome (20p12 chromosomal deletion), Angelman syndrome (15q11 deletion of maternal chromosome), DiGeorge syndrome (22q11.21 deletion), Langer-Giedion syndrome (8q24.1 deletion), Miller-Dieker syndrome (17p13.3 deletion), Prader-Willi syndrome (15q11 deletion of paternal chromosome), Rubinstein-Taybi syndrome (16p13 deletion), Smith-Magenis syndrome (17p11.2 deletion), and Williams syndrome (7q11.23 deletion) (The Merck Manual 2233-37, Mark H. Beers and Robert Berkow eds., Merck Research Laboratories 17th ed. 1999).
Gene amplification, deletions and translocations have also been linked to a number of cancers. For example, recurrent DNA amplifications, i.e., in the region of the ERBB2 gene, have been found in several human breast tumor cell lines and in one primary breast tumor (Pollack et al. (1999) Nature Genetics 23:41-46). Likewise, a 50-fold amplification of a 300 kilobase region around the myc oncogene was found in a colon cancer cell line (Kallioniemi et al. (1992) Science 258:818-821). Furthermore, cancers may be associated with physical deletions of regions of chromosomes that include tumor suppressor genes (see Kallioniemi, supra). Other cancers that may be associated with aneuploidy include, but are not limited to, leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia; myeloproliferative diseases; adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer; sarcomas including liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor and peripheral neuroepithelioma; testicular and ovarian dysgerminoma; retinoblastoma; Wilms' tumor; neuroblastoma; malignant melanoma; and mesothelioma. Indeed, nearly all solid cancers are aneuploid, and furthermore, aneuploidy correlates one hundred percent with transformation of mammalian cells in vitro using non-genotoxic carcinogens such as colcemid, benz[a]pyrene, methylcholanthrene, dimethylbenzanthracene, 17 beta-estradiol, and diethylstilbestrol (Li et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94:14506-14511; Tsutsui and Barrett (1997) Environ Health Perspect. 105 Suppl. 3:619-624).
This indicates that, in addition to the more broadly-established role of mutation of individual genes, aneuploidy itself may be a cause of neoplastic transformation.
Variations in gene dosage may occur not only in the nuclear DNA, but also in the DNA of the sub-cellular organelles including the mitochondrion and chloroplast. These variations may prove advantageous or deleterious. For example, some defects in mitochondrial DNA are known to be pathogenic in humans (Shadel, G.S. et al. (1997) Ann. Rev. Biochem. 66:409-435).
Chromosomal abnormalities may prove advantageous to a cell or organism. For example, aneuploidy resulting in the amplification of certain genes can compensate for deletions or defects in other genes or otherwise prove advantageous by, e.g., conferring a growth advantage on the aneuploid organism or cancerous cells. In addition, plants that are polyploid may be cultivated because they have new traits that are not seen in diploid species, such as increased vigor and higher yield (see the internet site at cc.ndsu.nodak.edu).
Traditionally, chromosomal abnormalities have been detected by karyotyping via microscopic examination of stained cells and their chromosomes. Circulating blood lymphocytes or amniocytes are collected and cultured in vitro under conditions that stimulate cell division. Colchicine is then added to arrest mitosis during metaphase.
Subsequently, cells are spread onto a microscope slide and stained using, e.g., G (Giemsa) or Q (fluorescent) banding techniques. The individual chromosomes are then photographed, and their images are cut out and arranged in order to determine the karyotype of the organism (The Merck Manual, p. 2233, Mark H. Beers and Robert Berkow eds., Merck Research Laboratories 17th ed. 1999).
Other techniques for detecting chromosomal abnormalities include restriction fragment length polymorphism (RFLP) analysis, and fluorescent in situ hybridization (FISH), which uses DNA probes with fluorescent tags to identify the organization of genes, and to locate gene deletions, rearrangements and duplications. Comparative genomic hybridization (CGH) has been used to reveal DNA copy-number variations across a genome using differentially labeled test and reference genomic DNAs, e.g., DNAs from malignant and normal cells, co-hybridized to normal metaphase chromosome spreads (Kallioniemi et al., supra). In addition, CGH on cDNA microarrays has been used to detect DNA copy-number variation in breast cancer cell lines and tumors (Solinas-Toldo et al. (1997) Genes Chromosomes Cancer 20(4):399-407; Pinkel et al. (1998) Nat. Genetics 20(2):207-211; Pollack et al. (1999) Nature Genetics 23:41-46).
Currently available techniques for examining chromosomal abnormalities, such as aneuploidy, including gene deletions, gene amplifications and gene translocations, have limitations. Traditional karyotyping is limited by the low number of high-quality metaphase chromosome spreads and can only be used to detect gross chromosomal abnormalities. Higher resolution techniques, such as FISH, are labor intensive if done on a large scale. Likewise, RFLP analysis can only be done on one genetic locus at a time, and some genomic changes might not result in a change in restriction fragment length. Traditional CGH has a limited mapping resolution, e.g., about 20 megabases; furthermore, small insertions or deletions may be difficult to detect using this method. Thus, there is clearly a need for a more sensitive and comprehensive method of detecting aneuploidy, as well as for a method of determining the likelihood that aneuploidy is present, in organisms.
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.
3. SUMMARY OF THE INVENTION
The present invention relates to methods for detecting the presence of aneuploidy, in particular, to determine the likelihood that aneuploidy is present in a cell type or organism. In particular, the invention relates to methods of using expression profiles to detect the presence of aneuploidy. The present invention also relates to methods of diagnosing or determining the predisposition of an organism toward diseases that are associated with abnormal copy numbers of one or more genes, i.e., aneuploidy. In addition, the present invention relates to computer systems, computer program products, and kits for detecting aneuploidy, or diagnosing or determining the predisposition of a subject to diseases associated with aneuploidy, using profiles.
In a first embodiment, the present invention relates to a method of determining whether aneuploidy is likely to be present in a cell type or organism comprising: (a) quantifying levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said plurality of genes comprising genes mapped to different chromosomes; (b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome, to the mean quantified levels of said cellular constituents associated with said plurality of genes; and (c) identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes are substantially the same for each of said genes and are dissimilar to the mean quantified levels of said cellular constituents associated with said plurality of genes; wherein identifying said genes in step (c) indicates that aneuploidy of said same chromosome or portion thereof is likely to be present in said cell type or organism.
In a second embodiment, the present invention relates to a computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: (a) comparing quantified levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
In a third embodiment, a computer program product is provided for directing a user computer in a computer-aided determination of whether aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for comparing quantified levels, in one or more cells of said cell type or organism, of cellular constituents associated with genes in the genome of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism. 
In a fourth embodiment, the present invention relates to a method of detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising: (a) quantifying the levels of a plurality of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality comprising cellular constituents associated with genes mapped to different chromosomes; (b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; and (c) identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (c) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof. 
In a fifth embodiment, the present invention relates to a computer system for detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: (a) comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
In a sixth embodiment, a computer program product is provided for directing a user computer in a computer-aided determination of whether a cell type or organism is predisposed to a disease associated with aneuploidy, said computer program product comprising: computer code for comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with each of said genes is substantially the same for each of said genes and is dissimilar to mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
In a seventh embodiment, the present invention relates to a method of determining whether aneuploidy is likely to be present in a cell type or organism comprising detecting an expression bias that is shared by a first plurality of genes mapped to a single chromosome or mapped to a chromosomal portion of interest in a cell of said cell type or from said organism, wherein said expression bias is present when measured levels of a first plurality of cellular constituents associated with said first plurality of genes are different from the mean of measured levels of a second plurality of cellular constituents associated with a second plurality of genes in said cell, wherein said second plurality of genes consists of at least one gene (or at least 10 or 50 or 100 or 1,000 genes) not mapped to said chromosome or not mapped to said chromosomal portion.
In an eighth embodiment, the present invention relates to a method for detecting the presence of aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism.
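The comparison of a first profile against a database of landmark profiles might be illustrated as follows. This is a minimal sketch, assuming that each profile is a vector of log ratios over the same ordered set of cellular constituents and that similarity is measured by Pearson correlation; the embodiment does not prescribe a similarity measure, and the function names and landmark labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def most_similar_landmarks(profile, landmarks, top=1):
    """landmarks: {label: profile}, each label naming a known copy-number
    alteration. Returns the `top` (label, correlation) pairs, best first."""
    scored = [(label, pearson(profile, lm)) for label, lm in landmarks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

Under these assumptions, the label of the best-matching landmark profile suggests which known copy-number alteration may be present in the profiled cell.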
In a ninth embodiment, the present invention relates to a method of diagnosing a disease associated with aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism. 
In a tenth embodiment, the present invention relates to a computer system for diagnosing a disease associated with a known aneuploidy in a cell type or type of organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism. 
In an eleventh embodiment, a computer program product is provided for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprising: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
In a twelfth embodiment, the present invention relates to a kit for detecting the presence of aneuploidy in a subject comprising: (a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of said subject, wherein said different nucleotide sequences are known to be increased or decreased as a result of aneuploidy; and (b) expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in a cell of said subject having a known alteration in copy number of said at least one gene.
In a thirteenth embodiment, the present invention relates to a kit for detecting the presence of aneuploidy in a subject comprising: (a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism; and (b) a container comprising RNA, or cDNA derived therefrom, of a cell having a known aneuploidy.
In a fourteenth embodiment, the present invention relates to a method of determining whether aneuploidy of one or more genes is likely to be present in a cell type or organism comprising: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
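One way to picture the identification step is to look, within a single wild-type co-varying set, for members whose response in the suspect cell departs from that of the other members. The sketch below is illustrative only: it assumes responses are log10 ratios measured over the same perturbations, uses the mean absolute deviation from the per-perturbation median of the other members as the dissimilarity measure, and applies an arbitrary tolerance of 0.1; none of these choices, nor the function name, comes from the embodiment.

```python
from statistics import mean, median


def outlying_members(responses, tolerance=0.1):
    """responses: {constituent: [log10 ratio per perturbation]} for one
    wild-type co-varying set, measured in the suspect cell.

    Returns the constituents whose mean absolute deviation from the
    per-perturbation median of the other set members exceeds tolerance.
    """
    flagged = []
    for name, values in responses.items():
        others = [v for k, v in responses.items() if k != name]
        if not others:
            continue  # a one-member set has nothing to compare against
        deviations = [
            abs(x - median(column)) for x, column in zip(values, zip(*others))
        ]
        if mean(deviations) > tolerance:
            flagged.append(name)
    return sorted(flagged)
```

A member carrying one extra gene copy in a haploid background would, if expression scaled with copy number, show a roughly constant offset near log10(2) ≈ 0.30 across perturbations, which this measure would pick up.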
In a fifteenth embodiment, the present invention relates to a computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
In a sixteenth embodiment, a computer program product is provided for directing a user computer in a computer-aided determination that aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
In a seventeenth embodiment, the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
In an eighteenth embodiment, the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy, comprising: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
In a nineteenth embodiment, the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
In a twentieth embodiment, the present invention relates to a method of correcting a profile of a cell type or organism for aneuploidy, comprising: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
In a twenty-first embodiment, the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
In a twenty-second embodiment, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
In a twenty-third embodiment, the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
In a twenty-fourth embodiment, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
In a twenty-fifth embodiment, the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
In a twenty-sixth embodiment, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
In a twenty-seventh embodiment, the present invention relates to a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
In a twenty-eighth embodiment, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
4. BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows chromosome VII expression bias in erg4Δ and ecm18Δ/ecm18Δ mutants of the yeast Saccharomyces cerevisiae as determined by expression profiling, and confirmation of aneuploidy by two-color hybridization of genomic DNA from said mutants to DNA microarrays. Circles represent the mean of the log10(expression ratio) of all genes on an individual chromosome and squares represent the mean of the log10(genomic content signal ratios) of all genes on an individual chromosome.
FIG. 2 shows segmental aneuploidy in an rpl20aΔ/rpl20aΔ mutant.
FIG. 3 illustrates a computer system useful for embodiments of the invention.
FIG. 4 shows selection for aneuploidy in rnr1Δ and rsp24aΔ/rsp24aΔ mutants that results in a growth advantage.
FIG. 5 shows spurious correlation between two mutants displaying a large transcriptional signature resulting from aneuploidy.
FIG. 6A shows expression data for the tup1Δ deletion mutant that reveals chromosome-wide expression biases that are consistent with aneuploidy; FIGS. 6B-6C respectively show chromosome-wide expression biases in rpb1Δ187 and hhf2 expression profiles consistent with aneuploidy; and FIG. 6D shows an expression profile of a pip2Δ oaf1Δ double mutant determined by SAGE analysis that suggests a chromosome-wide expression bias that is consistent with aneuploidy.
5. DETAILED DESCRIPTION OF THE INVENTION
This section presents a detailed description of the invention and its applications. The description is by way of several exemplary illustrations, in increasing detail and specificity, of the general methods of this invention. These examples are non-limiting, and related variants will be apparent to one of skill in the art.
Although, for simplicity, this disclosure often makes references to gene expression profiles, transcriptional rate, transcript levels, etc., it will be understood by those skilled in the art that the methods of the invention are useful for the analysis of any biological response profile. In particular, one skilled in the art will recognize that the methods of the present invention are equally applicable to profiles that comprise measurements of other cellular constituents such as, but not limited to, measurements of protein abundance or protein activity levels.
Moreover, although for simplicity this disclosure often makes reference to "a cell" (e.g., "mutation of a gene in a cell"), it will be understood by those of skill in the art that any particular step of the invention will also be construed as covering use of a plurality of cells, e.g., from a tissue sample from an organism, or from a cultured cell line. A "cell type," as used herein, can refer to a cell of a species of interest (e.g., corn, bean, human, mouse), a lineage of interest (e.g., blood cell, nerve cell, skin cell), or a tissue of interest (e.g., lung, brain, heart). Such cells can be from naturally single-celled organisms or derived from multi-cellular higher organisms. The cell can be a cell of a plant, including but not limited to a monocot, such as rice, corn, wheat and other grasses, or a dicot, such as beans, Arabidopsis, potatoes or tobacco, or an animal, including but not limited to mammals, primates, humans, and non-human animals such as dogs, cats, horses, cows, sheep, mice, rats, etc.
5.1 INTRODUCTION
Aneuploidy may have effects on the biological state of a cell, which can be represented by measured amounts of cellular constituents as defined in Section 5.1.1, below. The variations in gene dosage, in addition to affecting the biological state of the cell, may also affect the phenotype or predisposition of an organism to a disease. The inventors have discovered that a variation in gene copy number is mirrored in the expression profiles of an organism. Thus, an organism that is, e.g., trisomic for a particular chromosome, will exhibit, for example, increased levels of mRNA transcribed from a plurality of genes on the trisomic chromosome. The invention is also premised upon the observation that, in some organisms with altered gene dosage, i.e. copy number, such as yeast and humans, there is no global somatic chromosome dosage compensation mechanism to normalize expression from each gene having an abnormal copy number. As a result, the expression of a plurality of the genes on a, e.g., trisomic chromosome, will be up-regulated. Thus, in general, altered gene copy number is not masked by mechanisms at work in the organism for maintaining homeostasis. An expression profile thus serves as a tool for the detection of aneuploidy, including even small deletions or duplications.
This section first presents background on representations of biological state in terms of measured amounts of cellular constituents. Next, an overview is presented of the methods of the invention for detecting aneuploidy and for diagnosis or prognosis of a disease. The following sections present specific non-limiting embodiments of this invention in greater detail.
5.1.1 DEFINITION OF BIOLOGICAL STATE
As used herein, the term "biological sample" is broadly defined to include any cell, tissue, organ or multicellular organism. A biological sample can be derived, for example, from cell or tissue cultures in vitro. Alternatively, a biological sample can be derived from a living organism or organisms or from a population of single cell organisms.
The state of a biological sample can be measured by the content, activities or structures of its cellular constituents. The state of a biological sample, as used herein, is taken from the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including, but not limited to, characterizing the effects of variations in gene dosages, i.e., copy number. The term "cellular constituent" is also broadly defined in this disclosure to encompass any kind of measurable biological variable. The measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological sample), or their activities, or their states of modification (e.g., phosphorylation), or other measurements relevant to the biology of a biological sample. In various embodiments, this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called herein aspects of the biological state of a biological sample. It is noted that, as used herein, the term "cellular constituent" is not intended to refer to known subcellular organelles such as mitochondria, chloroplasts, lysosomes, etc.
One aspect of the biological state of a biological sample (e.g., a cell or cell culture) usefully measured in the present invention is its transcriptional state. In fact, the transcriptional state is the currently preferred aspect of the biological state measured in this invention. The transcriptional state of a biological sample includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Preferably, a substantial fraction of all constituent RNA species in the biological sample are measured, but at least a sufficient fraction is measured to characterize a variation in gene dosage. The transcriptional state of a biological sample can be conveniently determined by, e.g., measuring cDNA abundances by any of several existing gene expression technologies. One particularly preferred embodiment of the invention employs DNA arrays for measuring mRNA or transcript level of a large number of genes. Another aspect of the biological state of a biological sample usefully measured in the present invention is its translational state. The translational state of a biological sample includes the identities and abundances of the constituent protein species in the biological sample under a given set of conditions. Preferably, a substantial fraction of all constituent protein species in the biological sample is measured, but at least a sufficient fraction is measured to characterize the action of a perturbation of interest. As is known to those of skill in the art, the transcriptional state is often representative of the translational state. Other aspects of the biological state of a biological sample are also of use in this invention. For example, the activity state of a biological sample, as that term is used herein, includes the activities of the constituent protein species (and also, optionally, catalytically active nucleic acid species) in the biological sample under a given set of conditions. 
As is known to those of skill in the art, the translational state is often representative of the activity state. This invention is also adaptable, where relevant, to "mixed" aspects of the biological state of a biological sample in which measurements of different aspects of the biological state of a biological sample are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species, are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to other aspects of the biological state of the biological sample that are measurable.
Preferably, the biological state of a biological sample (e.g., of a cell or cell culture) is represented by a profile of a plurality of cellular constituents. Such a profile of cellular constituents can be represented, for example, by a vector S,
$$S = [S_1, \ldots, S_i, \ldots, S_k] \qquad \text{(Equation 1)}$$
wherein $S_i$ is the level or value of the i'th cellular constituent. For example, $S_i$ can be the transcription level of gene i or, alternatively, the abundance or activity level of protein i. In certain embodiments, the elements $S_i$ are continuous variables. For example, transcriptional rates are typically indicated as numbers of molecules synthesized per unit of time. Transcriptional rates can also be indicated as percentages of a control rate. In certain other embodiments, the elements $S_i$ can be categorical variables. For example, transcriptional rates can be indicated as either "on" or "off," where the value "on" indicates a transcriptional rate above a user-determined threshold value and "off" indicates a transcriptional rate below that threshold.
5.1.2 REPRESENTATION OF BIOLOGICAL RESPONSES
The response of a biological sample to a variation in gene dosage resulting from aneuploidy can be measured by observing changes in the biological state of the sample. A biological response profile is a collection of such changes of cellular constituents. For example, the profile of a biological sample (e.g., a cell or cell culture) resulting from the variation in gene dosage m can be represented by the vector $v^{(m)}$,
$$v^{(m)} = [v_1^{(m)}, \ldots, v_i^{(m)}, \ldots, v_k^{(m)}] \qquad \text{(Equation 2)}$$
In Equation 2, $v_i^{(m)}$ is the amplitude of the response of cellular constituent i in a biological sample subject to the variation in gene dosage m, i.e., aneuploidy, such as that which occurs as a result of trisomy of a particular chromosome. In some embodiments, $v_i^{(m)}$ can be simply the absolute measured amounts, e.g., abundances, activity levels or levels of modification, of cellular constituent i in a biological sample having the variation in gene dosage m, or the difference in measured amounts of cellular constituent i between a biological sample that has the variation in gene dosage m and a sample that does not have the variation in gene dosage m. In other embodiments, $v_i^{(m)}$ can be the ratio (or, more preferably, the logarithm of the ratio) of the measured amounts of cellular constituent i in a sample having the variation in gene dosage m to a sample that does not have the variation in gene dosage m.
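By way of non-limiting illustration, the log-ratio form of the response profile can be sketched as follows. The function name `log_ratio_profile`, the dictionary layout, and the gene names and abundance values are hypothetical and are assumed here only for the example; the base-10 logarithm is chosen to match the log10 ratios used in the figures.

```python
import math

def log_ratio_profile(sample, reference):
    """Return log10(sample / reference) for each constituent present in both.

    `sample` and `reference` are hypothetical dicts mapping a constituent
    name to a positive measured abundance.
    """
    profile = {}
    for name, level in sample.items():
        ref = reference.get(name)
        # Skip constituents that cannot give a finite log ratio.
        if ref is None or ref <= 0 or level <= 0:
            continue
        profile[name] = math.log10(level / ref)
    return profile

# Illustrative (made-up) abundances for a sample with altered gene dosage
# and for a sample without the alteration.
aneuploid = {"geneA": 300.0, "geneB": 100.0, "geneC": 50.0}
wild_type = {"geneA": 200.0, "geneB": 100.0, "geneC": 100.0}
v = log_ratio_profile(aneuploid, wild_type)
```

In this sketch a constituent present at 1.5 times the reference level, as might be expected for a gene on a trisomic chromosome in a diploid, has a response of log10(1.5), and an unchanged constituent has a response of zero.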
Aneuploidy can include, for example, genetic "knockouts" in which one or more particular genes of the cell or organism are deleted or inactivated, e.g., by standard techniques, such as homologous recombination, that are well known in the art. Such aneuploidy can also include amplifications, e.g., duplications, of at least one gene, of a portion of a gene sufficient to be expressed, or of a chromosome or a portion thereof. In such embodiments, the response $v_i^{(m)}$ of the i'th cellular constituent to a particular alteration in gene dosage, m, can simply be the ratio of or difference between the measured amounts of cellular constituent i in a cell or cells having the particular altered gene dosage and in a cell or cells that do not have the altered gene dosage. In other such embodiments, $v_i^{(m)}$ can be the ratio (or, more preferably, the logarithm of the ratio) of the measured amounts of cellular constituent i in a cell or cells having the particular alteration in gene dosage to such measured amounts in a cell or cells that do not have the particular alteration in gene dosage. In still other embodiments, the response $v_i^{(m)}$ of the i'th cellular constituent to a particular alteration of gene dosage, m, can be the absolute amount of cellular constituent i in the cell or cells having the altered gene dosage, e.g., the number of mRNA molecules per cell.
In preferred embodiments, $v_i^{(m)}$ is set equal to zero for all cellular constituents i whose responses are below a threshold amplitude or confidence level which can be determined, e.g., from knowledge of the measurement error behavior. For example, in some embodiments, only cellular constituents that have a response greater than or equal to two standard errors in more than N profiles may be selected for subsequent analysis, where the number of profiles N is selected by a user of the invention.
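The thresholding described above can be sketched, purely as an illustration, as follows. The function names, the list-of-lists layout (one list per profile, one entry per cellular constituent) and the numerical values are assumptions made for the example; the two-standard-error criterion and the "more than N profiles" rule are taken from the preceding paragraph.

```python
def threshold_profile(responses, std_errors, n_se=2.0):
    """Zero every response whose magnitude is below n_se standard errors."""
    return [v if abs(v) >= n_se * se else 0.0
            for v, se in zip(responses, std_errors)]

def select_constituents(profiles, errors, min_profiles, n_se=2.0):
    """Return indices of constituents whose response is at least n_se
    standard errors in more than `min_profiles` profiles."""
    selected = []
    for i in range(len(profiles[0])):
        hits = sum(1 for p, e in zip(profiles, errors)
                   if abs(p[i]) >= n_se * e[i])
        if hits > min_profiles:
            selected.append(i)
    return selected

# Three hypothetical profiles of three constituents, with made-up errors.
profiles = [[0.30, 0.02, -0.25], [0.28, 0.05, 0.01], [0.31, -0.01, -0.22]]
errors = [[0.05, 0.05, 0.05]] * 3
thresholded = threshold_profile(profiles[0], errors[0])
kept = select_constituents(profiles, errors, min_profiles=1)
```

Here the second constituent never reaches two standard errors and is therefore zeroed in the first profile and excluded from the selected set.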
For those cellular constituents whose responses are above the threshold amplitude, $v_i^{(m)}$ may be equal to the measured value. For example, in embodiments wherein the variation in gene dosage m comprises graded levels of gene dosage, n, $v_i^{(m)}$ may be made equal to the expression and/or activity of the i'th cellular constituent at the highest dosage of the gene, m. Alternatively, the response at different gene dosages, u, may be interpolated to a smooth, piecewise continuous function, e.g., by spline- or model-fitting, and $v_i^{(m)}$ made equal to some parameter of the interpolation. For example, in spline-fitting the response data to various levels of gene dosage m are interpolated by summing products of an appropriate spline interpolation function S multiplied by the measured data values, as illustrated by Equation 3:
$$v_i^{(m)}(u) = \sum_l S(u - u_l)\, v_i^{(m)}(u_l) \qquad \text{(Equation 3)}$$
The variable "u" in Equation 3, above, refers to an arbitrary value of the gene dosage level where the response of the i'th cellular constituent is to be evaluated. In general, S can be any smooth, or at least piecewise continuous, function of limited support having a width characteristic of the structure expected in the response functions. An exemplary width can be chosen to be the distance over which the response function being interpolated rises from 10% to 90% of its asymptotic value. Exemplary S functions include linear and Gaussian interpolation.
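As a minimal sketch of the sum in Equation 3, the following example uses a linear ("tent") interpolation function for S. The function names, the dosage levels and the response values are hypothetical. The sketch further assumes uniformly spaced dosage levels with the kernel width set equal to that spacing, in which case the sum reduces to ordinary piecewise-linear interpolation between the measured points; other choices of S or of the width would behave differently.

```python
def tent(x, width):
    """Linear ("tent") kernel: 1 at x = 0, falling to 0 at |x| >= width."""
    return max(0.0, 1.0 - abs(x) / width)

def interpolate_response(u, levels, values, width):
    """Evaluate the sum over l of S(u - u_l) * v(u_l), with S a tent kernel."""
    return sum(tent(u - u_l, width) * v_l
               for u_l, v_l in zip(levels, values))

# Made-up responses of one constituent at four evenly spaced dosage levels.
levels = [1.0, 2.0, 3.0, 4.0]
values = [0.0, 0.1, 0.3, 0.3]
at_measured_level = interpolate_response(2.0, levels, values, width=1.0)
between_levels = interpolate_response(2.5, levels, values, width=1.0)
```

With these assumptions the interpolated value at a measured level equals the measured response, and the value midway between two levels is the average of the two neighboring responses.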
In model-fitting, the response data to various levels $u_l$ of the gene dosage n are interpolated by approximating the response by a single parameterized function. An exemplary model-fitting function appropriate for approximating transcriptional state data is the Hill function:
$$H(u) = \frac{a\,(u/u_0)^n}{1 + (u/u_0)^n} \qquad \text{(Equation 4)}$$
The Hill function shown in Equation 4, above, comprises adjustable parameters of: (1) an amplitude parameter a; (2) an exponent n; and (3) an inflection point parameter $u_0$. The adjustable parameters are selected independently for each cellular constituent. Preferably, the adjustable parameters are selected so that, for each cellular constituent of the response, the sum of the squares of the distances of $H(u_l)$ from $v_i^{(m)}(u_l)$ is minimized. This preferred parameter adjustment method is well known in the art as a least squares fit of H() to $v_i^{(m)}$().
Such a fit can be done using any of the many available numerical methods known in the art (see, e.g., Press et al., 1996, Numerical Recipes in C, 2nd Ed., Cambridge University Press, Chpts. 10 and 14; Branch et al., 1996, Matlab Optimization Toolbox User's Guide, Mathworks, Natick, MA). The response amplitude $v_i^{(m)}$ can then be selected to be equal to, e.g., the amplitude parameter a in Equation 4.
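One simple way such a least squares fit could be carried out is sketched below. This is not the numerical method of the cited references; it is a coarse grid search, assumed here only because it needs no external library. It relies on the observation that, for a fixed inflection point and exponent, the Hill function is linear in the amplitude a, so the least squares amplitude has a closed form. The function names, grids and synthetic data are hypothetical.

```python
def hill(u, a, u0, n):
    """Hill function H(u) = a * (u/u0)**n / (1 + (u/u0)**n)."""
    x = (u / u0) ** n
    return a * x / (1.0 + x)

def fit_hill(levels, responses, u0_grid, n_grid):
    """Least squares fit by grid search over u0 and n.

    For fixed (u0, n) the best amplitude is a = sum(h*v) / sum(h*h),
    where h is the unit-amplitude Hill curve at the measured levels.
    Returns (sum of squared errors, a, u0, n) for the best grid point.
    """
    best = None
    for u0 in u0_grid:
        for n in n_grid:
            h = [hill(u, 1.0, u0, n) for u in levels]
            denom = sum(x * x for x in h)
            if denom == 0.0:
                continue
            a = sum(x * v for x, v in zip(h, responses)) / denom
            sse = sum((a * x - v) ** 2 for x, v in zip(h, responses))
            if best is None or sse < best[0]:
                best = (sse, a, u0, n)
    return best

# Synthetic, noise-free responses generated from known parameters.
levels = [0.5, 1.0, 2.0, 4.0, 8.0]
responses = [hill(u, 0.6, 2.0, 2.0) for u in levels]
u0_grid = [0.5 * k for k in range(1, 11)]  # 0.5 to 5.0
n_grid = [0.5 * k for k in range(1, 9)]    # 0.5 to 4.0
sse, a, u0, n = fit_hill(levels, responses, u0_grid, n_grid)
```

In practice a gradient-based or simplex optimizer would normally refine the parameters beyond the grid resolution; the grid search is used here only to keep the sketch self-contained.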
In an alternative embodiment, the biological response profile data may be categorical. For example, in a binary approximation the response amplitude $v_i^{(m)}$ is set equal to zero if there is no significant response, and is set equal to 1 if there is a significant response. Alternatively, in a ternary approximation the response amplitude: (1) is set equal to +1 if cellular constituent i has a significant increase in expression or activity in a biological sample having gene dosage m; (2) is set equal to zero if there is no significant response; and (3) is set equal to -1 if there is a significant decrease in expression or activity. Such embodiments are particularly preferred if it is known or suspected that the responses to which the biological response profile $v^{(m)}$ is to be compared do not have the same relative amplitudes as $v^{(m)}$ but do involve the same cellular constituents. In yet other embodiments, it is desirable to use "Mutual Information" as described, e.g., by Brunel (1998, Neural Computation 10(7):1731-1757).
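The binary and ternary approximations can be illustrated with the following sketch. The text does not fix what counts as a "significant" response, so the example assumes, for illustration only, that a response is significant when its magnitude is at least two standard errors; the function names and values are likewise hypothetical.

```python
def ternary_profile(responses, std_errors, n_se=2.0):
    """Map each response to +1, 0 or -1.

    Assumption: "significant" means |response| >= n_se standard errors.
    """
    out = []
    for v, se in zip(responses, std_errors):
        if abs(v) < n_se * se:
            out.append(0)       # no significant response
        elif v > 0:
            out.append(1)       # significant increase
        else:
            out.append(-1)      # significant decrease
    return out

def binary_profile(responses, std_errors, n_se=2.0):
    """Map each response to 1 (significant) or 0 (not significant)."""
    return [abs(t) for t in ternary_profile(responses, std_errors, n_se)]

ternary = ternary_profile([0.30, 0.02, -0.25], [0.05, 0.05, 0.05])
binary = binary_profile([0.30, 0.02, -0.25], [0.05, 0.05, 0.05])
```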
In all of the above-described embodiments, it is often preferred to normalize the profile by scaling all elements of the vector $v^{(m)}$ (i.e., $v_i^{(m)}$ for all i) by the same constant so that the vector length $|v^{(m)}|$ is unity. Generally, the vector length can be defined by Equation 5:
$$|v^{(m)}| = \sqrt{\sum_i \left(v_i^{(m)}\right)^2} \qquad \text{(Equation 5)}$$
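The normalization to unit vector length can be sketched as follows, with the vector length computed as the square root of the sum of the squared elements. The function name and the example values are hypothetical; returning an all-zero profile unchanged is a choice made for this sketch, since such a profile has no defined direction.

```python
import math

def normalize(profile):
    """Scale all elements by the same constant so the vector length is 1."""
    length = math.sqrt(sum(v * v for v in profile))
    if length == 0.0:
        return list(profile)  # an all-zero profile cannot be scaled
    return [v / length for v in profile]

unit = normalize([3.0, 0.0, -4.0])
unit_length = math.sqrt(sum(v * v for v in unit))
```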
5.2 METHODS OF DETECTING ANEUPLOIDY
This section presents first the general methods of the invention, then presents certain alternative embodiments of the invention, including applications of the methods of the invention to the diagnosis of disease, the determination of a predisposition of an organism toward a disease, and gene mapping.
The methods of the present invention use profiles, which comprise measurements of levels of individual cellular constituents (or changes in such measurements), e.g., measurements of abundances of mRNA or protein species, protein activities, levels of protein modification such as phosphorylation of kinases, etc., to detect aneuploidy, in particular, to determine the likelihood that aneuploidy is present in the genome of an organism. The RNA transcript or protein abundance profile of an organism may be compared with a library of landmark profiles, or "compendium," obtained from organisms with known copy numbers of particular genes. If a profile of a subject cell type or organism is shown to correlate with one or more compendium profiles from a cell type or organism having aneuploidy associated with a certain disease, then the disease can be diagnosed or predicted in the subject cell type or organism. Alternatively, calculation of the expression bias of chromosomally adjacent genes can be used to determine the presence of aneuploidy of a chromosome, or a portion thereof. Furthermore, detected co-regulation of sets of genes in aneuploid cells may reveal the chromosomal localization/mapping of unmapped genes in a genome, since those genes located in the same region of a chromosome in an aneuploid cell type or organism are more likely to show similarity in expression levels. In addition, detection of aneuploidy in cells will facilitate the accurate interpretation of whole genome expression data, particularly from cells known to have genetic instability, such as cancer cells.
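The calculation of a chromosome-wide expression bias can be illustrated by averaging the log10 expression ratios of all genes mapped to each chromosome. In the sketch below the function names, the gene names, the gene-to-chromosome map, the ratio values and the cutoff of 0.1 are all hypothetical and are assumed only for the example; the text does not prescribe a particular cutoff.

```python
from collections import defaultdict

def chromosome_bias(log_ratios, gene_to_chromosome):
    """Return the mean log10 expression ratio of the genes on each chromosome."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gene, value in log_ratios.items():
        chrom = gene_to_chromosome.get(gene)
        if chrom is None:
            continue  # unmapped genes do not contribute
        totals[chrom] += value
        counts[chrom] += 1
    return {chrom: totals[chrom] / counts[chrom] for chrom in totals}

def flag_chromosomes(bias, cutoff=0.1):
    """List chromosomes whose mean bias exceeds an assumed cutoff in magnitude."""
    return sorted(c for c, mean in bias.items() if abs(mean) > cutoff)

# Made-up log10 ratios: genes g1-g3 on chromosome VII are elevated.
log_ratios = {"g1": 0.18, "g2": 0.17, "g3": 0.19,
              "g4": 0.01, "g5": -0.02, "g6": 0.01}
gene_map = {"g1": "VII", "g2": "VII", "g3": "VII",
            "g4": "IV", "g5": "IV", "g6": "IV"}
bias = chromosome_bias(log_ratios, gene_map)
flagged = flag_chromosomes(bias)
```

A mean near log10(1.5), about 0.18, across a whole chromosome would be consistent with three copies of that chromosome in an otherwise diploid cell, whereas gene-specific regulation would not be expected to shift the mean of every gene on one chromosome.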
5.2.1 GENERAL METHODS OF THE INVENTION
The methods of this invention employ certain types of cells, certain observations of changes in aspects of the biological state of these cells, and certain comparisons of the observed changes. In the following, these cell types, observations, and comparisons are described in turn in detail.
The present invention makes use of two principal types of cells: wild-type cells, and modified cells. "Wild-type" cells are reference, or standard, cells used in a particular application or embodiment of the methods of this invention. Being only a reference cell, a wild-type cell need not be a cell normally found in nature, and often will be a recombinant or genetically altered cell line. Usually the cells are cultured in vitro as a cell line or strain. Other cell types used in the particular application of the present invention are preferably derived from the wild-type cells. Less preferably, other cell types are derived from cells substantially isogenic with wild-type cells. For example, wild-type cells might be a particular cell line of the yeast Saccharomyces cerevisiae, or a particular mammalian cell line (e.g., HeLa cells). Although, for simplicity this disclosure often makes reference to single cells (e.g., "RNA is isolated from a cell deleted for a single gene"), it will be understood by those of skill in the art that more often any particular step of the invention will be carried out using a plurality of genetically identical cells, e.g., from a cultured cell line or tissue sample from a human patient.
Two cells are said to be "substantially isogenic" where their expressed genomes differ by a known amount that is at less than 10% of genetic loci, more preferably at less than 1%, or even more preferably at less than 0.1%. Alternatively, two cells can be considered substantially isogenic when the portions of their genomes relevant to the effects of altered gene dosages of interest differ by the preceding amounts. It is preferable that the differing loci be individually known.
"Modified cells" are derived from wild-type cells by modifications to the genome of the wild-type cells. As is commonly appreciated, protein activities result in part from protein abundances; protein abundances result from translation of mRNA (balanced against protein degradation); and mRNA abundances result from transcription of DNA and splicing of mRNA precursors (balanced against mRNA degradation). Therefore, genetic level modifications to a cellular DNA constituent alters transcribed mRNA abundances, translated protein abundances, and ultimately protein activities. As used herein, modified cells include those cells having altered gene dosages. Thus, an example of a modified cell comprises a cell having at least one gene, usually a protein-coding gene, that is substantially amplified. Alternatively, a modified cell comprises a cell having at least one gene that is substantially deleted. As used herein, deletion mutants also include mutants in which a gene has been disrupted so that usually no detectable mRNA or bioactive protein is expressed from the gene, even though some portion of the genetic material may be present. A modified cell further comprises a cell having a deviation from an exact multiple of the haploid number of chromosomes. Alternately, modified cells having altered gene dosages may not be derived from the wild-type cells but may instead be derived from cells that are substantially isogenic with wild-type cells, except for their particular genetic modifications.
As used herein, "aneuploidy" refers to a state or condition of a cell or organism wherein at least one gene in the genome of said cell or organism has a gene dosage that is altered from the gene dosage of a wild-type cell of said type or wild-type organism. The altered gene dosage can be the result of, inter alia, chromosome non-disjunction, homologous recombination or chromosome breakage. Thus, if a cell type or organism is diploid and there are two copies of a gene in the wild-type organism, three copies of the gene in that organism (or a cell therefrom) is an example of aneuploidy. "Segmental aneuploidy," as used herein, refers to altered gene dosages of several contiguous genes on a chromosome. Therefore, "segmental aneuploidy" is present in a cell or organism when preferably at least 2 contiguous genes, more preferably at least 10 contiguous genes, even more preferably at least 25 contiguous genes, even more preferably at least 100 contiguous genes in the genome of a cell or organism are amplified or deleted relative to a wild-type cell of said same type or a wild-type organism. Another example of "segmental aneuploidy" is an amplification or deletion of an entire arm of a chromosome. The methods of the present invention may be used to detect deletions or duplications of regions of DNA that are as large as several hundred kilobases (e.g., three hundred) to as small as about 1 kilobase in length. In addition, an "aneuploid cell or organism" is a cell or organism exhibiting variation in the dosage of at least one gene, or a portion thereof.
The methods of the invention involve observing changes in any of several aspects of the biological state of a cell (e.g., changes in the transcriptional state, in the translational state, in the activity state, and so forth) between a wild-type cell and a modified cell in order to detect a variation in gene dosage between the two cells. In some embodiments, it may be useful to create a known variation in dosage of a particular gene and measure the resulting profile, e.g., to create a database relating a profile to a particular aneuploidy, such as trisomy of human chromosome 21. By way of example, a variation in gene dosage can be achieved by amplification of one or more genes, or by over-expression or under-expression of the encoded RNA or protein of a gene (see Section 5.6 and its subsections, infra). In addition, a variation in gene dosage may result indirectly from introduction of one or more point mutations, insertions or deletions into a gene of interest by triggering an unwanted secondary event, such as compensation for the loss of function of a particular gene by amplification of a paralog gene and the surrounding genetic material. In the latter case, the aneuploidy that occurs in response to a genetic mutation is unpredictable and should be characterized. Aneuploidy of one or more genes in the genome of a cell or organism may result in a "perturbation" (change in the measured level) of a cellular constituent associated with said one or more genes, e.g., by resulting in an increase in mRNA messages transcribed from amplified genes or in an increase in protein levels encoded by the mRNAs. Measured levels of other cellular constituents may remain constant, and measured levels of still other cellular constituents may decrease in an aneuploid cell. The set of measured levels of cellular constituents can be referred to as a profile. For example, a profile can be a pattern of changes in mRNA abundances, protein abundances, protein activity levels, etc.
As used herein, a first cellular constituent and a second cellular constituent (that are the same or different and are from the same or a different cell) are said to be "differently perturbed" when for the first cellular constituent there is a positive perturbation and for the second cellular constituent there is a negative perturbation or no perturbation. In addition, cellular constituents are "differently perturbed" if for the first cellular constituent there is a negative perturbation and for the second cellular constituent there is a positive perturbation or no perturbation. Furthermore, two cellular constituents are "differently perturbed" if for the first cellular constituent there is no perturbation and for the second cellular constituent there is either a positive or a negative perturbation. In cases where the values of perturbations are measured, two cellular constituents can be said to be "differently perturbed" where the measured values for the two perturbations are detectably different, preferably having a statistically significant difference. As used herein, perturbations of a first and a second cellular constituent are said to be the "same" when both have a negative or a positive perturbation, or where the measured values are not significantly different.
The actual values present in a profile depend essentially on the measurement methods available for the particular cellular constituents being measured. Where quantitative abundances or activities are available, a numerical abundance or activity ratio can be calculated and placed in the profile. For example, in the case of transcriptional state measurements by quantitative gene expression technologies, a numerical expression ratio of the abundances of cDNAs (or mRNAs in an appropriate technology) in a modified biological sample and in a wild-type biological sample can be calculated. Alternatively, a logarithm (e.g., log10) (or another monotonic function) of the abundance ratio can be used. Alternatively, an absolute numerical abundance or activity, e.g., a number of mRNA molecules in a cell, can be measured and placed in the profile. Where only qualitative data is available, arbitrary integer values can be assigned to each type of perturbation of a cellular constituent. For example, the value +1 can be assigned to a positive perturbation; the value -1 to a negative perturbation; and the value 0 to no perturbation.
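As a minimal sketch of the two kinds of profile entries just described, the following computes a log10 abundance ratio where quantitative data are available and assigns +1, -1 or 0 where only a qualitative call is available. The gene names, abundance values and call labels are hypothetical.

```python
import math

# Integer codes for qualitative calls, as described in the text.
QUALITATIVE_CODES = {"increased": 1, "decreased": -1, "unchanged": 0}

def quantitative_entry(modified_abundance, wild_type_abundance):
    """log10 of the abundance ratio (modified sample / wild-type sample)."""
    return math.log10(modified_abundance / wild_type_abundance)

def qualitative_entry(call):
    """Map a qualitative perturbation call to its integer value."""
    return QUALITATIVE_CODES[call]

# Hypothetical (modified, wild-type) abundances for two genes.
measured = {"geneA": (300.0, 100.0), "geneB": (50.0, 200.0)}
quantitative_profile = {
    gene: quantitative_entry(mod, wt) for gene, (mod, wt) in measured.items()
}
qualitative_profile = [
    qualitative_entry(call) for call in ("increased", "unchanged", "decreased")
]
```

A logarithm makes a three-fold increase and a three-fold decrease symmetric about zero, which is one reason a monotonic function of the ratio can be convenient.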
It is often convenient to represent graphically a profile as a two-dimensional physical array of perturbation values. When making such a graphical representation, the assignment of particular perturbation values to particular array positions can be entirely arbitrary or can be guided by any convenient principles. For example, related cellular constituents, such as genes, proteins, or protein activities of a particular pathway, can be grouped together, e.g., by "clustering" as described in co-pending U.S. Patent Application Nos. 09/220,142 (filed December 23, 1998), 09/428,427 (filed October 27, 1999), PCT International Publication WO 00/39336 published July 6, 2000 and PCT International Publication WO 00/24936 published May 4, 2000, which are incorporated herein by reference in their entireties. In the case of transcriptional state measurements by gene transcript arrays, the resulting profile can be arranged as the transcript array is arranged.
In preferred embodiments, variations in gene dosage are detected by measuring and comparing changes in the transcriptional state of a cell. Analysis of the transcriptional state is often sufficient for purposes of characterizing aneuploidy, because no global dosage compensation mechanism for autosomes (non-sex chromosomes) is known to exist for normalization of expression from each gene (or chromosome) in aneuploid strains. Most aneuploidies produce a significant and characteristic change in the transcriptional state of the cell. Further, in yeast and humans, and probably other organisms, homeostatic expression mechanisms that compensate for aneuploidy of autosomes have never been reported, and are not expected to exist.
It will be understood by those of skill in the art that, although in general aneuploidy is discussed in terms of genomic nuclear DNA, aneuploidy may also exist in the genetic material of sub-cellular organelles, e.g., mitochondria and chloroplasts. Thus, gene copy number of, e.g., mitochondrial or chloroplast DNA, may also be assayed by the methods of the present invention in order to detect, e.g., the relative number of mitochondria or chloroplasts in a cell type or organism, or the presence of abnormal copy numbers of genes in these organelles, which may be indicative of desirable phenotypes or of disease. For example, a crop plant having a greater number of chloroplasts per cell than a wild-type plant may be desirable for growing in areas with shorter growth seasons. In addition, an alteration in the dosages of several mitochondrial genes may be indicative of the presence of or predisposition toward a particular disease in an organism.
The modified-cell profile includes a plurality of perturbation values that represent the perturbation in cellular constituents observed in an aspect of the biological state of a modified cell resulting from an indicated variation in gene dosage, as described above. In one embodiment, the levels of cellular constituents associated with genes on different chromosomes are quantified, and are compared to quantified levels of cellular constituents associated with genes mapped to the same chromosome or a portion thereof. Aneuploidy of a chromosome or a portion thereof is then determined by identifying at least 1, preferably at least 4, still more preferably at least 10, and even more preferably at least 50 genes mapped to the same chromosome or a portion thereof for which the level of the cellular constituent associated with each gene is substantially the same and is dissimilar to the mean quantified levels of cellular constituents associated with genes mapped to different chromosomes.
By a cellular constituent that is "associated with" a gene, as used herein, is meant a cellular constituent that either directly or indirectly originates from said gene. For example, the cellular constituent may be the mRNA that is transcribed from said gene. Alternatively, it may be the protein that is translated from said mRNA. Furthermore, the cellular constituent "associated with" a gene may be, for example, a protein target that is phosphorylated by the protein product of said gene, such that an increase in phosphorylation of said protein target is indicative of an increased amount or activity of the protein product of said gene.
In an alternative embodiment, an aspect of the biological state of a modified cell with a variation in gene dosage is measured and compared to that aspect of the biological state of the cell without such a variation (wild-type) in order to determine the cellular constituents in this aspect that are perturbed or are not perturbed. A profile comprising a collection of the measured changes in cellular constituents in the modified cell relative to a wild-type cell is not generally limited to revealing only changes directly due to the variation in gene dosage, because changes in the elements of the biological state that are indirectly affected by the particular gene dosage will also be apparent. This type of profile provides information about the effects of the variation in gene dosage on the biological state of a wild-type cell. The methods of this invention detect the presence of altered gene dosage, i.e., aneuploidy, in a cell type or organism. A "landmark profile," as used herein, refers to a profile of a modified cell or organism having a known alteration in copy number of one or more genes or to a profile of a wild-type cell or organism. A group of such profiles, preferably comprising a plurality of landmark profiles, each associated with a different, known aneuploidy, and herein called a compendium of landmark profiles, is assembled for detecting aneuploidy in an unknown cell type or organism.
It will be understood that a landmark profile that is "indicative of the presence or absence of aneuploidy of a particular gene, chromosomal region or chromosome", as used herein, does not have to conclusively indicate that aneuploidy is present or absent. A landmark profile that is indicative of the presence or absence, respectively, of aneuploidy indicates an increased probability that aneuploidy is present or absent, respectively, which can be with varying degrees of certainty, from aneuploidy being more likely than not present or absent, to it being reasonably conclusive that aneuploidy is present or absent, respectively. It will be understood by one of skill in the art that the presence or absence of aneuploidy of a particular gene, chromosome or portion thereof can be confirmed by other methods, such as polymerase chain reaction ("PCR") or comparative genomic hybridization. Using quantitative PCR, DNA copy number aberrations for single genes in a chromosomal region can be detected. In comparative genomic hybridization, labeled DNA is hybridized to metaphase chromosome spreads from normal cells and from cells suspected of being aneuploid. By measuring the relative amounts of hybridization of the labeled DNA to the two genomes, variations in gene copy number between the genomes can be detected.
It will be further understood that a landmark profile that is "indicative of the presence of aneuploidy" can be indicative of the presence of a particular type of aneuploidy. Therefore, an organism that has trisomy of chromosome 1 is likely to have a different profile from that of the same type of organism that has a 100-fold amplification of five contiguous genes on chromosome 1, which is likely to in turn have a different profile from that of the same type of organism that has a deletion of the short arm of chromosome 1. Therefore, the profile not only indicates the presence of aneuploidy, but can also indicate the type of aneuploidy that is present.
In a specific embodiment, in which the observed aspect of the biological state is the transcriptional state, and in which the transcriptional state is measured by hybridization to a gene transcript array, the profiles, also known as expression profiles or transcript profiles, are measured in the following ways. The expression profile of a cell is determined by observing its transcript array. This cell may be a cell that is suspected of having aneuploidy, or it may be a cell having a known alteration in copy number of one or more genes. In particular, deletion transcript profiles, where the genome modification includes variations in gene dosage wherein the gene dosage is decreased with respect to the gene dosage in a wild-type cell, and amplification transcript profiles, where the genome modification includes variations in gene dosage wherein the gene dosage is increased with respect to the gene dosage in a wild-type cell, are examples of transcript profiles of cells exhibiting aneuploidy.
Methods for determining whether aneuploidy is likely to be present in a cell type or organism according to the present invention identify the probable variations in gene dosage that result from aneuploidy by observing profiles, preferably expression profiles. In one preferred general embodiment, the methods include three principal steps. A first step includes quantifying levels of a plurality of cellular constituents associated with a plurality of genes in the genome of a cell type or organism that are mapped to different chromosomes. In one embodiment, when the transcriptional state is observed, the cellular constituents are mRNA species, i.e., levels of cellular constituents are represented by levels of mRNA species. The mRNA levels may be measured by increases or decreases relative to mRNA levels in a wild-type cell. Alternatively, the transcriptional state may be related to the absolute measured amounts of cellular constituents, e.g., the number of, for example, mRNA molecules, in a cell. Alternatively, when the translational state is observed, the cellular constituents are protein species, which are quantified by, for example, measuring the amount or activity of protein species. In yet another embodiment, a combination of the transcriptional and translational states of a cell type is observed.
A second step includes comparing the quantified levels of cellular constituents associated with at least 1, preferably at least 3, still more preferably at least 10, and even more preferably at least 50 genes mapped to the same chromosome or a portion thereof to the mean quantified levels of said cellular constituents associated with the plurality of genes. A third step involves identifying genes mapped to the same chromosome or a portion thereof for which the level of cellular constituents for each gene is substantially the same, and for which the level of cellular constituents is dissimilar to the mean quantified levels of cellular constituents for said plurality of genes. If the genes identified in this step are adjacent on the same chromosome, then there is an indication that aneuploidy of the chromosome, or a portion thereof, is likely to be present in the cell type or organism.
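The three steps above can be sketched as follows. This is only an illustrative reading of the text: the tolerance for "substantially the same" (`max_spread`), the tolerance for "dissimilar" (`min_offset`), the minimum number of genes, and all data values are assumptions, and the levels are taken to be log10 expression ratios listed in chromosomal order.

```python
from statistics import mean

def longest_biased_run(values, reference, min_offset, max_spread):
    """Indices of the longest run of adjacent genes whose levels are close to
    one another and all differ from the reference mean."""
    best = []
    for start in range(len(values)):
        for stop in range(start + 1, len(values) + 1):
            window = values[start:stop]
            similar = max(window) - min(window) <= max_spread
            dissimilar = all(abs(v - reference) >= min_offset for v in window)
            if similar and dissimilar and len(window) > len(best):
                best = list(range(start, stop))
    return best

def flag_chromosomes(levels, min_offset=0.1, max_spread=0.1, min_genes=3):
    """levels maps chromosome name -> levels ordered by chromosomal position."""
    # Step 2 reference: mean level over the genes on all chromosomes.
    overall_mean = mean(v for vals in levels.values() for v in vals)
    flagged = {}
    for chrom, vals in levels.items():
        # Step 3: adjacent genes that agree with each other but not the mean.
        run = longest_biased_run(vals, overall_mean, min_offset, max_spread)
        if len(run) >= min_genes:
            flagged[chrom] = run
    return flagged

# Step 1 (quantification) is represented here by hypothetical measured values.
levels = {
    "chrI": [0.01, -0.02, 0.00, 0.03],
    "chrII": [0.02, 0.20, 0.18, 0.19, 0.17, -0.01],
    "chrIII": [-0.01, 0.00, 0.02],
}
flagged = flag_chromosomes(levels)
```

In this hypothetical data set, four adjacent genes on "chrII" share an elevated level, which under the assumed tolerances would be reported as a candidate segmental aneuploidy.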
In a second embodiment of the invention, a method of determining whether aneuploidy is likely to be present in a cell type or organism comprises detecting an expression bias that is shared by one or more genes mapped to a single chromosome or a portion thereof. The expression bias is a measure of levels of a first plurality of cellular constituents associated with said genes that is different from the mean measure of levels of a second plurality of cellular constituents associated with a second plurality of genes in the cell type, wherein the second plurality consists of at least one gene (or at least 10 or 50 or 100 or 1,000) that is not mapped to said chromosome or portion thereof.
In a third embodiment of the invention, a profile or a predicted profile of a subject cell is compared to a database comprising landmark profiles (i.e., a compendium), each of which (a) arises from a cell having a known alteration in copy number of at least one gene, and (b) is digitally stored in association with the known alteration in copy number, to determine the degree of similarity between the profile of the subject cell and the landmark profiles. When the transcriptional state is observed, the profile is preferably compared to a compendium of aneuploid profiles, that is, a compendium comprising landmark profiles generated from measurements of the transcriptional state of cells with known aneuploidies of at least one gene. The aneuploid profiles having the greatest similarity to the profile of the subject cell indicate which aneuploidy is likely to be present in the subject cell.
Conversely, if the subject cell profile is not similar to any of the profiles of aneuploid cells, then aneuploidy of the types represented in the compendium is likely to be absent in the subject cell.
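A comparison of a subject profile against a compendium might be sketched as below. The passage does not fix a similarity measure; Pearson correlation is used here purely as an assumption, and the cutoff `min_similarity`, the landmark labels and all profile values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def best_match(subject, compendium, min_similarity=0.8):
    """Return (label, score) of the most similar landmark profile, or
    (None, score) if no landmark reaches the assumed similarity cutoff."""
    scores = {label: pearson(subject, prof) for label, prof in compendium.items()}
    label = max(scores, key=scores.get)
    if scores[label] < min_similarity:
        return None, scores[label]
    return label, scores[label]

# Hypothetical landmark profiles stored with their known copy-number changes.
compendium = {
    "trisomy chrA": [0.18, 0.17, 0.0, 0.0, 0.0, 0.0],
    "monosomy chrB": [0.0, 0.0, -0.3, -0.3, 0.0, 0.0],
}
match = best_match([0.16, 0.19, 0.01, -0.02, 0.0, 0.01], compendium)
no_match = best_match([0.0, 0.01, -0.01, 0.0, 0.2, 0.2], compendium)
```

The first subject profile resembles the hypothetical trisomy landmark; the second resembles neither landmark closely enough, which corresponds to the case in which the aneuploidies represented in the compendium are likely to be absent.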
In yet another embodiment, amounts of a plurality of cellular constituents are measured in a cell of a cell type, and a predicted profile is derived therefrom for comparison to one or more landmark profiles. The predicted profile may be for different cellular constituents than those for which amounts were measured in the experiment. For example, a translational profile of protein levels may be used to predict the corresponding transcript profile, which may be used for comparison to a database comprising landmark transcript profiles. Alternatively, a transcript profile of an immature organism, e.g., a seedling, may be acquired and may be used to predict the transcript profile of the mature organism.
In one embodiment, the measured amounts of cellular constituents are determined in comparison to a wild-type cell of said cell type or said organism. In another embodiment, the measured amounts of cellular constituents are absolute measured amounts of cellular constituents, e.g., a number of mRNA molecules per cell.
5.3 DIAGNOSIS AND PREDICTION OF DISEASES ASSOCIATED WITH ANEUPLOIDY
This subsection describes embodiments of the invention relating to diagnosis of a disease or to determination of a predisposition to a disease in a cell type or organism.
In preferred embodiments, the predisposition of a subject to a disease associated with aneuploidy is determined, or a disease associated with aneuploidy is diagnosed in a subject by observing the profile, preferably the expression profile, of the subject. Subjects include, but are not limited to, humans, primates, mammals, fish, birds, mice, livestock animals such as cows, pigs, goats, sheep, horses, companion animals such as cats and dogs, flowering plants, and crop plants such as corn, wheat, rice, beans, soy, and alfalfa. Cells from said subjects to be assayed for said detection of disease or predisposition toward a disease associated with aneuploidy may be obtained, e.g., by biopsy or amniocentesis.
In a preferred embodiment, the subject is a human and the disease to which a predisposition is determined or which is diagnosed in said subject includes, but is not limited to, diseases associated with chromosome trisomies, such as Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), and Patau syndrome (trisomy 13); diseases associated with deletions of an arm of a chromosome, such as cri du chat syndrome (5p deletion) and Wolf-Hirschhorn syndrome (4p deletion); diseases associated with contiguous gene syndromes such as Alagille syndrome (20p12 deletion), Angelman syndrome (maternal chromosome at 15q11 deletion), DiGeorge syndrome (22q11.21 deletion), Langer-Giedion syndrome (8q24.1 deletion), Miller-Dieker syndrome (17p13.3 deletion), Prader-Willi syndrome (paternal chromosome at 15q11 deletion), Rubinstein-Taybi syndrome (16p13- deletion), Smith-Magenis syndrome (17p11.2 deletion), and Williams syndrome (7q11.23 deletion); diseases associated with sex chromosome abnormalities such as Turner syndrome (45,X female or 45,X/46,XX or 45,X/47,XXX mosaics), Triple X syndrome (47,XXX), rare X chromosome abnormalities (48,XXXX and 49,XXXXX), Klinefelter's syndrome (47,XXY male) and 47,XYY syndrome.
In another preferred embodiment, the subject is a human and the disease to which a predisposition is determined or which is diagnosed in said subject includes, but is not limited to, cancers, such as breast cancer; colon cancer; leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia; myeloproliferative diseases; adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer; sarcomas including liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor and peripheral neuroepithelioma; testicular and ovarian dysgerminoma; retinoblastoma; Wilms' tumor; neuroblastoma; malignant melanoma; and mesothelioma.
For example, hereditary papillary renal carcinomas have been associated with trisomy of chromosomes 7, 8 and 17 (Fletcher, 1997, Renal and bladder cancers. In Human Cytogenetic Cancer Markers, eds. Wolman & Sell, Totowa, NJ, Humana Press, 169-202; Zhuang et al., 1998, Nat. Genet. 20:66-69; Sen, 2000, Current Opinion in Oncology 12:82-88). Moderate gains of sequences from chromosomes 8 and 13 have been found to occur in most colorectal tumors (Sen, 2000, Current Opinion in Oncology 12:82-88), and aneuploidy of chromosome 4 has been found to occur in metastatic colorectal cancer (Malkhosyan et al., 1998, Proc. Natl. Acad. Sci. USA 95:10170-10175). Follicular thyroid tumors have been frequently found to result from loss of chromosome 22 (Hemmer et al., 1998, Br. J. Cancer 78:1012-17), and cervical cancer progression has been associated with aneuploidy of chromosomes 1, 7, and X (Bulten et al., 1998, Am. J. Pathol. 152:495-503). Recent studies suggest that acute myeloid leukemia may result from monosomy of chromosome 7 (Krauter et al., 1999, Ann. Hematol. 78:265-269) and patients with myelodysplastic syndromes, which may lead to leukemias, have been reported to display monosomy of chromosomes 5 and 7 (Van Den Neste et al., 1999, Br. J. Hematol. 105:268-270). In addition, aneuploidy has been found in over 20,000 analyzed solid tumors (Heim & Mitelman, 1995, Cancer cytogenetics, ed. 2, New York, Wiley Liss, Inc.).
In one embodiment, the predisposition of a subject to a disease is detected, or a disease associated with aneuploidy is diagnosed, by quantifying levels of a plurality of cellular constituents associated with genes mapped to the same chromosome, or a portion thereof, and comparing these levels to the mean quantified levels of cellular constituents associated with a plurality of genes mapped to different chromosomes. If the level of each cellular constituent associated with each gene mapped to the same chromosome or a portion thereof is substantially the same for each of said genes, and is dissimilar to the mean quantified levels of cellular constituents associated with said plurality of genes mapped to different chromosomes, and if said genes mapped to the same chromosome or a portion thereof are adjacent on said chromosome, then aneuploidy of said chromosome or portion thereof is likely to be present, and said subject is likely to have a predisposition to a disease or to have a disease associated with said aneuploidy.
In another embodiment, the profile of the subject to be diagnosed is compared to a compendium comprising landmark profiles, some of which are from a cell or organism having an altered copy number of at least one gene that is diagnostic or prognostic of a particular disease. Diseases associated with landmark profiles having the greatest similarity to said cell profile are those diseases present in said subject. In a preferred embodiment, the cell type from which the landmark profiles are derived is substantially isogenic to the cell type being diagnosed. The cell type from which the landmark profiles are derived is preferably from the same species and tissue type as the cell type of the subject being diagnosed or assayed for predisposition to disease. For example, if a disease is being diagnosed in a subject or the subject is being assayed for a predisposition to disease by obtaining a fat biopsy, the landmark profiles to which the profile of the subject is compared are preferably a set of landmark profiles from fat cells of an organism of the same species.
In a specific embodiment, the predisposition of a cell type or organism to a disease associated with aneuploidy can be detected as follows. A profile of an immature (not fully differentiated), mature or asymptomatic cell, e.g., from amniotic cells of a fetus, is compared to a compendium comprising landmark profiles each of which arises from an immature cell, or from an asymptomatic cell, having an identified alteration in copy number of at least one gene that is associated with a disease in order to determine the degree of similarity between the profile of the immature or asymptomatic cell and the landmark profiles. Similarity of the immature or asymptomatic cell profiles indicates eventual similarity of profiles associated with mature cells or with cells in which a disease is present. By comparing the profile of an immature or asymptomatic cell to the landmark profiles of immature cells or of asymptomatic cells having an identified alteration in copy number of at least one gene that is associated with a disease, the predisposition of the immature or asymptomatic cell toward the disease associated with aneuploidy can be detected. By "asymptomatic cell," as used herein, is meant a cell that does not show a pathology related to aneuploidy, even though the genome of the cell may exhibit variations in gene dosage from a wild type cell. Such landmark profiles for detection of the predisposition of humans toward diseases associated with aneuploidy may include, inter alia, those associated with diseases discussed above in this section.
In another embodiment, amounts of a plurality of cellular constituents can be measured in an immature or asymptomatic cell of a cell type, and a predicted profile can be derived therefrom for comparison to one or more landmark profiles. The predicted profile is then compared to the compendium of landmark profiles of mature cells or of cells having symptoms of a disease in order to detect the predisposition of an immature or asymptomatic cell to a disease associated with aneuploidy. Alternatively, the profile of the immature or asymptomatic cell can be compared directly to the compendium of landmark profiles of mature cells or of cells having symptoms of a disease.
5.4 DETECTION OF ANEUPLOIDY BY DETERMINATION OF GENE EXPRESSION AS A FUNCTION OF CHROMOSOME LOCATION
In a preferred embodiment, whole chromosomal aneuploidy is determined using mean chromosomal ratio plots. In a mean chromosomal ratio plot, the ratio of the measured amounts of cellular constituents associated with at least 10%, preferably at least 30%, more preferably at least 60%, even more preferably at least 90%, most preferably all of the genes on a chromosome in an aneuploid cell to the measured amounts for the same chromosome in a wild-type cell, e.g., the expression ratio, is plotted as a function of chromosome location, i.e., which chromosome the genes reside on. For example, the expression levels (circles) and genomic dosage (squares) for each chromosome correlate in FIG. 1 (d and e) for the erg4 and ecm18/ecm18 mutant strains of yeast, indicating that these mutants had additional copies of chromosome seven, which resulted in elevated expression of the genes on chromosome seven by 58% and 35%, respectively, compared to wild-type cells.
The mean expression ratio for each chromosome may be represented as an error-weighted mean of at least 5 genes, preferably at least 10 genes, more preferably at least 50 genes, more preferably at least 100 genes, even more preferably at least 10%, even more preferably at least 30%, even more preferably at least 60%, even more preferably at least 90%, most preferably all of the genes present on that chromosome, with the error calculated based on the quality and intensity of the data. In this embodiment, a chromosome has a statistically significant chromosome-wide expression bias if the mean chromosomal ratio has an offset of greater than 0.1 in log space and is at least ten standard deviations from the mean ($P < 10^{-20}$). P values can be calculated from the number of standard deviations from the mean, assuming a Gaussian distribution, and the error of the mean ratio in log space computed from the spread of the data, taking into account the error of each point and the number of data points. When the ORFs on a particular chromosome have a mean chromosomal ratio that meets the above-identified criteria, an expression bias of genes on that chromosome resulting from aneuploidy is likely to be present.
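The two criteria above can be illustrated with a minimal Python sketch. It assumes inverse-variance weighting for the error-weighted mean and derives the error of the mean from the per-gene errors only; the text does not fix either choice, and it also allows the spread of the data to enter the error. The function names and the example chromosome are hypothetical.

```python
import math

def weighted_mean_log_ratio(log_ratios, errors):
    """Return the inverse-variance weighted mean and its standard error (assumed weighting)."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * q for w, q in zip(weights, log_ratios)) / total
    return mean, math.sqrt(1.0 / total)

def has_chromosome_wide_bias(log_ratios, errors, min_offset=0.1, min_sigma=10.0):
    """Offset above 0.1 in log space and at least ten standard errors away from zero."""
    mean, sem = weighted_mean_log_ratio(log_ratios, errors)
    return abs(mean) > min_offset and abs(mean) / sem >= min_sigma

def gaussian_tail_probability(n_sigma):
    """Two-sided Gaussian probability of a deviation of at least n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

# Hypothetical chromosome: 200 genes, each about 58% elevated, log10 error 0.1 per gene.
ratios = [math.log10(1.58)] * 200
errors = [0.1] * 200
print(has_chromosome_wide_bias(ratios, errors))
print(gaussian_tail_probability(10.0) < 1e-20)
```

Both printed values are True for this example: the mean offset is about 0.2 in log space, and a ten standard deviation excursion under a Gaussian model corresponds to a probability below the stated threshold.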
In a less preferred embodiment, aneuploidy can be detected using a correlation coefficient that measures the similarity between two profiles from cell types or organisms having deletion mutations of different genes on the same chromosome (FIG. 1a-c). For example, when comparing the expression profiles of the two yeast strains, one having a deletion mutation in the erg4 gene, which is on chromosome VII, and the other having a deletion mutation in the ecm18 gene, which is on chromosome IV, a correlation is found to exist (r=0.63). This correlation is primarily due to the aneuploidy of chromosome VII that exists in both strains, as is shown by the fact that the correlation disappears when the expression ratios for genes on chromosome VII are excluded from the plot (FIG. 1b). Furthermore, if the expression ratios for genes located on a chromosome having a wild-type copy number (e.g., chromosome IV in this example) are excluded from the plot, the correlation of the expression profile persists (FIG. 1c). Detection of aneuploidy is important to avoid interpreting spurious correlation of profiles resulting from aneuploidy as correlations due to co-regulation of genes that are, e.g., in the same biological pathway. The similarity between profiles may be measured by a weighted correlation coefficient, r, given by Equation 6:
$$r_{ij} = \frac{\sum_k x_{ik}\, x_{jk}}{\left(\sum_k x_{ik}^2\right)^{1/2} \left(\sum_k x_{jk}^2\right)^{1/2}}$$ (Equation 6)
where $x_{ik}$ is $q_{ik}/\sigma_{ik}$ and $x_{jk}$ is $q_{jk}/\sigma_{jk}$, where $q_{ik}$ and $q_{jk}$ are the logarithms of the expression ratios between the perturbed and baseline conditions for gene k in profiles i and j, respectively, and $\sigma_{ik}$ and $\sigma_{jk}$ are the uncertainties in the measurements of $q_{ik}$ and $q_{jk}$, respectively. This is an optimally-weighted measure of correlation between two profiles, given the error estimate, σ, on each data point.
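As a minimal sketch of the weighted correlation described here, the following Python function divides each log ratio by its uncertainty and then forms the normalized sum of products. The function name and the sample numbers are hypothetical; only the formula follows the definitions above.

```python
import math

def weighted_correlation(q_i, q_j, sigma_i, sigma_j):
    """Correlation of two log-ratio profiles, each point scaled by its uncertainty."""
    x_i = [q / s for q, s in zip(q_i, sigma_i)]
    x_j = [q / s for q, s in zip(q_j, sigma_j)]
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i)) * math.sqrt(sum(b * b for b in x_j))
    return numerator / denominator

# Hypothetical profiles: the last gene disagrees between the two profiles.
q_i = [0.20, 0.25, 0.18, -0.60]
q_j = [0.22, 0.19, 0.21, 0.55]
uniform = [0.05, 0.05, 0.05, 0.05]
noisy_last = [0.05, 0.05, 0.05, 1.00]   # the disagreeing point is poorly measured
print(weighted_correlation(q_i, q_j, uniform, uniform))
print(weighted_correlation(q_i, q_j, noisy_last, noisy_last))
```

The second call assigns a large uncertainty to the disagreeing measurement, so that point contributes little and the correlation is dominated by the well-measured genes.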
Two profiles are "similar" for the purposes of the methods of the present invention if they have a statistically significant correlation. For example, values of r significantly different from zero are those that have a small likelihood of occurring by chance under the hypothesis that the profiles are in fact not correlated. Under the uncorrelated hypothesis, the probability distribution of r is approximated by Equation 7:
$$z = \tfrac{1}{2}\left[\ln(1+r) - \ln(1-r)\right]$$ (Equation 7)
wherein z is normally distributed with standard error $1/(n-3)^{1/2}$ and n is the total number of measurements (Fisher, 1921, Metron 1:3). Thus, a pair of values r and n that results in a z value greater than $2/(n-3)^{1/2}$ indicates profile similarity at a two standard deviation level of significance.
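A short Python sketch of this test follows. The function names are hypothetical, and the example pairs r = 0.63 (the correlation quoted earlier for the two yeast strains) with an assumed n of 100 measurements purely for illustration.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation coefficient r (with -1 < r < 1)."""
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))

def is_similar(r, n, n_sigma=2.0):
    """True if z exceeds n_sigma standard errors, the standard error being 1/sqrt(n - 3)."""
    return fisher_z(r) > n_sigma / math.sqrt(n - 3)

print(is_similar(0.63, 100))   # n = 100 is an assumed value
print(is_similar(0.10, 50))
```

With these inputs the first comparison passes the two standard deviation threshold and the second does not.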
A non-parametric approach to assigning a probability to any r value is to randomize the order of the elements in the data vectors (i.e., the gene indices), and then generate a Monte Carlo distribution of r arising from the rearranged data, which satisfies the uncorrelated hypothesis. The value of r computed from the actual data is then compared to this distribution in order to assign a likelihood that the correlation is not random.
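The randomization procedure can be sketched in Python as follows. This is an illustration under stated assumptions: an unweighted correlation is used for brevity, the probability estimate adds one to numerator and denominator (a common convention, not taken from the text), and all names are hypothetical.

```python
import math
import random

def correlation(x, y):
    """Normalized dot product of two data vectors."""
    numerator = sum(a * b for a, b in zip(x, y))
    return numerator / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def permutation_p_value(x, y, trials=1000, seed=0):
    """Estimate how often shuffled gene indices give r at least as large as observed."""
    rng = random.Random(seed)
    observed = correlation(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)          # destroys any gene-by-gene correspondence
        if correlation(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)

profile = [i - 9.5 for i in range(20)]  # hypothetical log-ratio vector
r, p = permutation_p_value(profile, profile, trials=200, seed=1)
print(r, p)
```

A small estimated probability indicates that a correlation as large as the observed one rarely arises once the gene indices are randomized.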
Thus aneuploidy may be detected by correlation of profiles with strains of known aneuploidy.
In another preferred embodiment, segmental aneuploidy can be detected by scanning the expression ratio data for instances in which a number, i.e., at least two, preferably at least four, of non-overlapping, chromosomally-adjacent genes are all up- or down-regulated at, e.g., a 0.05 significance threshold. FIG. 2a and 2b depict the log10 of expression ratios of cellular constituents associated with all genes on chromosome XV of a yeast rpl20aΔ/rpl20aΔ mutant as a function of chromosome location. Segmental aneuploidy on chromosome XV is shown by expression ratio data (FIG. 2a-b), and is confirmed by assaying genomic DNA copy number (FIG. 2c-d).
5.5 USING CO-VARYING SETS TO DETECT ANEUPLOIDY
In some embodiments, the methods of the present invention can involve using cellular constituents in the biological response profiles that are arranged or grouped according to their tendency to co-vary in response to a perturbation. For example, if groups of cellular constituents that normally co-vary in response to perturbations (preferably over at least 3, 5, 10, 50 or 100 different perturbations) are identified, deviations from that covariation may indicate the presence of aneuploidy in cells. In particular, this Section describes specific embodiments for arranging the cellular constituents into co-varying sets. Clustering methods are also described in International Patent Publication WO 00/24936, published May 4, 2000, which is incorporated herein by reference in its entirety.
Clustering Algorithms:
Preferably, the basis or co-varying sets are identified by means of a clustering algorithm (i.e., by means of "clustering analysis"). Clustering algorithms of this invention may be generally classified as "model-based" or "model-independent" algorithms. In particular, model-based clustering methods assume that co-varying sets or clusters map to some predefined distribution shape in the cellular constituent "vector space." For example, many model-based clustering algorithms assume ellipsoidal cluster distributions having a particular eccentricity. By contrast, model-independent clustering algorithms make no assumptions about cluster shape. As is recognized by those skilled in the art, such model-independent methods are substantially identical to assuming "hyperspherical" cluster distributions. Hyperspherical cluster distributions are generally preferred in the methods of this invention, e.g., when the perturbation vector elements $v_i^{(m)}$ have similar scales and meanings, such as the abundances of different mRNA species.
The clustering methods and algorithms of the present invention may be further classified as "hierarchical" or "fixed-number-of-groups" algorithms (see, e.g., S-Plus Guide to Statistical and Mathematical Analysis v.3.3, 1995, MathSoft, Inc.: StatSci. Division, Seattle, Washington). Such algorithms are well known in the art (see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., San Diego: Academic Press; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for Applications, New York: Academic Press), and include, e.g., hierarchical agglomerative clustering algorithms, the "k-means" algorithm of Hartigan (supra), and model-based clustering algorithms such as mclust by MathSoft, Inc.
Preferably, hierarchical clustering methods and/or algorithms are employed in the methods of this invention. In a particularly preferred embodiment, the clustering analysis of the present invention is done using the hclust routine or algorithm (see, e.g., 'hclust' routine from the software package S-Plus, MathSoft, Inc., Cambridge, MA).
The clustering algorithms used in the present invention operate on a table of data containing measurements of a plurality of cellular constituents, preferably gene expression measurements. Specifically, the data table analyzed by the clustering methods of the present invention comprises an N x K array or matrix wherein N is the total number of conditions or perturbations and K is the number of cellular constituents measured or analyzed. The clustering algorithms of the present invention analyze such arrays or matrices to determine dissimilarities between cellular constituents. Mathematically, dissimilarities between cellular constituents i and j are expressed as "distances" $I_{i,j}$. For example, in one embodiment, the Euclidian distance is determined according to the Equation 8:
$$I_{i,j} = \left( \sum_m \left( v_i^{(m)} - v_j^{(m)} \right)^2 \right)^{1/2}$$ (Equation 8)
In Equation 8, above, $v_i^{(m)}$ and $v_j^{(m)}$ are the responses of cellular constituents i and j, respectively, to the perturbation m. In other embodiments, the Euclidian distance in Equation 8, above, is squared to place progressively greater weight on cellular constituents that are further apart. In alternative embodiments, the distance measure $I_{i,j}$ is the Manhattan distance provided by Equation 9:
$$I_{i,j} = \sum_m \left| v_i^{(m)} - v_j^{(m)} \right|$$ (Equation 9)
In embodiments wherein the biological response profile data is categorical (e.g., wherein each element $v_i^{(m)}$ = 1 or 0), the distance measure is preferably a percent disagreement defined by Equation 10:
$$I_{i,j} = \frac{\text{number of } m \text{ for which } v_i^{(m)} \neq v_j^{(m)}}{N}$$ (Equation 10)
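The Euclidian, Manhattan and percent-disagreement distances described above can be sketched in a few lines of Python. The function names are hypothetical; each function takes the response vectors of two cellular constituents across the same N perturbations.

```python
import math

def euclidean_distance(v_i, v_j):
    """Square root of the summed squared response differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))

def manhattan_distance(v_i, v_j):
    """Sum of absolute response differences."""
    return sum(abs(a - b) for a, b in zip(v_i, v_j))

def percent_disagreement(v_i, v_j):
    """Fraction of perturbations in which two categorical responses differ."""
    return sum(1 for a, b in zip(v_i, v_j) if a != b) / len(v_i)

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))        # 5.0
print(manhattan_distance([0.0, 0.0], [3.0, 4.0]))        # 7.0
print(percent_disagreement([1, 0, 1, 1], [1, 1, 0, 1]))  # 0.5
```

Squaring the Euclidian result gives the variant that places progressively greater weight on constituents that are further apart.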
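In a particularly preferred embodiment, the distance is defined as $I_{i,j} = 1 - r_{i,j}$, where $r_{i,j}$ is the "correlation coefficient" or normalized "dot product" between the response vectors $\mathbf{v}_i$ and $\mathbf{v}_j$. In particular, $r_{i,j}$ is defined by Equation 11, below:
$$r_{i,j} = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{|\mathbf{v}_i|\,|\mathbf{v}_j|}$$ (Equation 11)
In Equation 11, the dot product $\mathbf{v}_i \cdot \mathbf{v}_j$ is defined according to Equation 12:
$$\mathbf{v}_i \cdot \mathbf{v}_j = \sum_m \left( v_i^{(m)} \times v_j^{(m)} \right)$$ (Equation 12)
Further, the quantities $|\mathbf{v}_i|$ and $|\mathbf{v}_j|$ in Equation 11 are provided by the relations $|\mathbf{v}_i| = (\mathbf{v}_i \cdot \mathbf{v}_i)^{1/2}$ and $|\mathbf{v}_j| = (\mathbf{v}_j \cdot \mathbf{v}_j)^{1/2}$.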
In still other embodiments, the distance measure can be some other distance measure known in the art, such as the Chebychev distance, the power distance, and percent disagreement, to name a few. Most preferably, the distance measure is appropriate to the biological questions being asked, e.g., for identifying co-varying and/or co-regulated cellular constituents including co-varying or co-regulated genes. For example, in a particularly preferred embodiment, the distance measure is $I_{i,j} = 1 - r_{i,j}$, with the correlation coefficient $r_{i,j}$ comprising a weighted dot product of the response vectors $\mathbf{v}_i$ and $\mathbf{v}_j$. Specifically, in this preferred embodiment, $r_{i,j}$ is preferably defined by Equation 13:
$$r_{i,j} = \frac{\sum_m \frac{v_i^{(m)}\, v_j^{(m)}}{\sigma_i^{(m)}\, \sigma_j^{(m)}}}{\left[ \sum_m \left( \frac{v_i^{(m)}}{\sigma_i^{(m)}} \right)^2 \sum_m \left( \frac{v_j^{(m)}}{\sigma_j^{(m)}} \right)^2 \right]^{1/2}}$$ (Equation 13)
In Equation 13, above, the quantities $\sigma_i^{(m)}$ and $\sigma_j^{(m)}$ are the standard errors associated with the measurement of the i'th and j'th cellular constituents, respectively, in experiment m.
The correlation coefficients provided by Equations 11 and 13 are bounded between values of +1, which indicates that the two response vectors are perfectly correlated and essentially identical, and -1, which indicates that the two response vectors are "anti-correlated" or "anti-sense" (i.e., are opposites). These correlation coefficients are particularly preferred in embodiments of the invention where cellular constituent sets or clusters are sought of constituents which have responses of the same sign. However, in other embodiments, it can be preferable to identify cellular constituent sets or clusters which are co-regulated or involved in the same biological responses or pathways but comprise both similar and anti-correlated responses. In such embodiments, it is preferable to use the absolute value of the correlation coefficient provided by Equation 11 or 13, i.e., $|r_{i,j}|$, as the correlation coefficient. In still other embodiments, the relationships between co-regulated and/or co-varying cellular constituents may be even more complex, such as in instances wherein multiple biological pathways (for example, multiple signaling pathways) converge on the same cellular constituent to produce different outcomes. In such embodiments, it is preferable to use a correlation coefficient $r_{i,j} = r_{i,j}^{(change)}$ which is capable of identifying co-varying and/or co-regulated cellular constituents irrespective of the sign. The correlation coefficient specified by Equation 14, below, is particularly useful in such embodiments.
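$$r_{i,j}^{(change)} = \frac{\sum_m \left| \frac{v_i^{(m)}}{\sigma_i^{(m)}} \right| \left| \frac{v_j^{(m)}}{\sigma_j^{(m)}} \right|}{\left[ \sum_m \left( \frac{v_i^{(m)}}{\sigma_i^{(m)}} \right)^2 \sum_m \left( \frac{v_j^{(m)}}{\sigma_j^{(m)}} \right)^2 \right]^{1/2}}$$ (Equation 14)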
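The effect of taking the absolute value of the correlation coefficient can be seen in a small Python sketch. It uses the unweighted dot-product correlation for brevity; the function names and response vectors are hypothetical.

```python
import math

def correlation(v_i, v_j):
    """Normalized dot product of two response vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return dot / (norm_i * norm_j)

def correlation_distance(v_i, v_j, ignore_sign=False):
    """Distance 1 - r, or 1 - |r| when anti-correlated responses should group together."""
    r = correlation(v_i, v_j)
    return 1.0 - (abs(r) if ignore_sign else r)

induced = [1.0, 2.0, 3.0]
repressed = [-2.0, -4.0, -6.0]   # mirror-image response to the same perturbations
print(correlation_distance(induced, repressed))
print(correlation_distance(induced, repressed, ignore_sign=True))
```

With the signed coefficient the two constituents are as far apart as possible (distance near 2); with the absolute value they are treated as essentially identical (distance near 0).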
Generally, the clustering algorithms used in the methods of the invention also use one or more linkage rules to group cellular constituents into one or more sets or "clusters."
For example, single linkage or the nearest neighbor method determines the distance between the two closest objects (i.e., between the two closest cellular constituents) in a data table. By contrast, complete linkage methods determine the greatest distance between any two objects (i.e., cellular constituents) in different clusters or sets. Alternatively, the unweighted pair-group average evaluates the "distance" between two clusters or sets by determining the average distance between all pairs of objects (i.e., cellular constituents) in the two clusters. Alternatively, the weighted pair-group average evaluates the distance between two clusters or sets by determining the weighted average distance between all pairs of objects in the two clusters, wherein the weighting factor is proportional to the size of the respective clusters. Other linkage rules, such as the unweighted and weighted pair-group centroid and Ward's method, are also useful for certain embodiments of the present invention (see, e.g., Ward, 1963, J. Am. Stat. Assn 58:236; Hartigan, 1975, Clustering Algorithms, New York: Wiley).
In particularly preferred embodiments, an agglomerative hierarchical clustering algorithm is used. Such algorithms are known in the art and described, e.g., in Hartigan, supra. Briefly, the algorithm preferably starts with each object (e.g., each cellular constituent) as a separate group. In each successive step, the algorithm identifies the two most similar objects by finding the minimum of all the pair-wise similarity measures, merges them into one object (i.e., into one "cluster") and updates the between-cluster similarity measures accordingly. The procedure continues until all objects are found in a single group. When merging two closest objects, a heuristic criterion of average linkage is preferably employed to redefine the between-cluster similarity measures. Since two objects are combined at each similarity level, such a clustering algorithm yields a rigid hierarchical structure among objects and defines their memberships. Once a clustering algorithm has grouped the cellular constituents from the data table into sets or clusters, e.g., by application of linkage rules such as those described supra, a clustering "tree" may be generated to illustrate the clusters of cellular constituents so determined.
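The agglomerative procedure with average linkage can be sketched in plain Python as follows. This is a deliberately simple illustration, not the hclust routine: it recomputes all between-cluster averages at every step, uses the distance 1 - r, and returns the merge history from which a tree could be drawn. The optional `stop_distance` argument is a hypothetical convenience for halting once the closest pair of clusters is further apart than a chosen value.

```python
import math

def correlation_distance(a, b):
    """Distance 1 - r, with r the normalized dot product of two response vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def average_linkage(c1, c2, vectors, dist):
    """Mean distance over all pairs of members drawn from the two clusters."""
    total = sum(dist(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, stop_distance=None, dist=correlation_distance):
    """Merge the closest clusters step by step; return (clusters, merge history)."""
    clusters = [(i,) for i in range(len(vectors))]   # every object starts alone
    merges = []
    while len(clusters) > 1:
        d, a, b = min(
            (average_linkage(clusters[a], clusters[b], vectors, dist), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if stop_distance is not None and d > stop_distance:
            break
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical responses of four constituents to three perturbations.
responses = [[1, 2, 3], [2, 4, 6.1], [3, -1, 0], [6, -2, 0.1]]
clusters, merges = agglomerate(responses, stop_distance=0.3)
print(sorted(clusters))
```

For these responses the first two and the last two constituents form separate clusters; calling `agglomerate(responses)` without a stopping distance continues until a single group remains.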
Genesets may be readily defined based on the branchings of a clustering tree. In particular, genesets may be defined based on the many smaller branchings of a clustering tree, or, optionally, larger genesets may be defined corresponding to the larger branches of a clustering tree. Preferably, the choice of branching level at which genesets are defined matches the number of distinct response pathways expected. In embodiments wherein little or no information is available to indicate the number of pathways, the genesets should be defined according to the branching level wherein the branches of the clustering tree are "truly distinct."
"Truly distinct," as used herein, may be defined, e.g., by a minimum distance value between the individual branches. Typically, the distance values between truly distinct genesets are in the range of 0.2 to 0.4, where a distance of zero corresponds to perfect correlation and a distance of unity corresponds to no correlation. However, distances between truly distinct genesets may be larger in certain embodiments, e.g., wherein there is poorer quality data or fewer experiments n in the profile data. Alternatively, in other embodiments, e.g., having better quality data or more experiments n in the profile dataset, the distance between truly distinct genesets may be less than 0.2.
Statistical Significance:
Preferably, truly distinct cellular constituent sets are defined by means of an objective test of statistical significance for each bifurcation in the clustering tree. For example, in one aspect of the invention, truly distinct cellular constituent sets are defined by means of a statistical test which uses Monte Carlo randomization of the experiment index m for the responses of each cellular constituent across the set of experiments. For example, in one preferred embodiment, the experiment index m of each cellular constituent's response $v_i^{(m)}$ is randomly permutated, as indicated by Equation 15:
$$v_i^{(m)} \rightarrow v_i^{\pi(m)}$$ (Equation 15)
More specifically, a large number of permutations of the experiment index m is generated for each cellular constituent's response. Preferably, the number of permutations is from 50 to about 1000, more preferably from 50 to about 100. For each branching of the original clustering tree, and for each permutation of the experiment index:
(1) hierarchical clustering is performed on the permutated data, preferably using the same clustering algorithm as used for the original unpermuted data; and
(2) the fractional improvement f in the total scatter is computed with respect to the cluster centers in going from one cluster to two clusters.
In particular, the fractional improvement f is computed according to Equation 16, below:
$$f = 1 - \frac{\sum_i D_i^{(2)}}{\sum_i D_i^{(1)}}$$ (Equation 16)
In Equation 16, $D_i$ is the square of the distance measure for cellular constituent i with respect to the center (i.e., the mean) of its assigned cluster. The superscripts (1) and (2) indicate whether the square of the distance measure $D_i$ is made with respect to (1) the center of its entire branch, or (2) the center of the appropriate cluster out of the two clusters. The distance function $D_i$ in Equation 16 may be defined according to any one of several embodiments. In particular, the various embodiments described supra for the definition of $I_{i,j}$ may also be used to define $D_i$ in Equation 16.
The distribution of fractional improvements obtained from the above-described Monte Carlo methods provides an estimate of the distribution under the null hypothesis, i.e., the hypothesis that a particular branching in a cluster tree is not significant or distinct. A significance can thus be assigned to the actual fractional improvement (i.e., the fractional improvement of the unpermuted data) by comparing the actual fractional improvement to the distribution of fractional improvements for the permuted data. Preferably, the significance is expressed in terms of the standard deviation of the null hypothesis distribution, e.g., by fitting a log normal model to the null hypothesis distribution obtained from the permuted data.
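The Monte Carlo test for a single branching can be sketched in Python as below. Several simplifications are assumptions of this sketch rather than features of the described method: the two-way split is a small 2-means routine standing in for re-running the hierarchical algorithm, the squared Euclidian distance to the cluster mean is used as the distance measure, and the log normal fit is reduced to a mean and standard deviation of the logarithms. All names and data are hypothetical.

```python
import math
import random
import statistics

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_in_two(rows, iterations=20):
    """Stand-in two-way split: 2-means seeded with the two most distant rows."""
    _, s0, s1 = max((sq_dist(a, b), i, j)
                    for i, a in enumerate(rows) for j, b in enumerate(rows) if i < j)
    centres = [rows[s0], rows[s1]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [0 if sq_dist(r, centres[0]) <= sq_dist(r, centres[1]) else 1 for r in rows]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centres[k] = centroid(members)
    return labels, centres

def fractional_improvement(rows):
    """f = 1 - (scatter about two cluster centres) / (scatter about the branch centre)."""
    whole = centroid(rows)
    labels, centres = split_in_two(rows)
    scatter_1 = sum(sq_dist(r, whole) for r in rows)
    scatter_2 = sum(sq_dist(r, centres[lab]) for r, lab in zip(rows, labels))
    return 1.0 - scatter_2 / scatter_1

def null_distribution(rows, permutations=100, seed=0):
    """Permute the experiment index independently for each constituent, then re-split."""
    rng = random.Random(seed)
    return [fractional_improvement([rng.sample(r, len(r)) for r in rows])
            for _ in range(permutations)]

def significance_in_sigma(observed, null):
    """Deviation of log(f) from the null, in null standard deviations (log normal model)."""
    logs = [math.log(f) for f in null]
    return (math.log(observed) - statistics.mean(logs)) / statistics.stdev(logs)

# Hypothetical branch: three constituents respond one way, three the opposite way.
up = [[1, 1, 1, 1, -1, -1, -1, -1],
      [1.1, 0.9, 1, 1, -1, -1, -0.9, -1.1],
      [0.9, 1.1, 1, 1, -1, -1, -1.1, -0.9]]
branch = up + [[-x for x in row] for row in up]
f_actual = fractional_improvement(branch)
f_null = null_distribution(branch, permutations=50)
print(f_actual, significance_in_sigma(f_actual, f_null))
```

A branching is treated as distinct when the actual fractional improvement lies well outside the distribution obtained from the permuted data.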
In more detail, an objective statistical test is preferably employed to determine the statistical reliability of the grouping decisions of any clustering method or algorithm. Preferably, a similar test is used for both hierarchical and non-hierarchical clustering methods. More preferably, the statistical test employed comprises (a) obtaining a measure of the compactness of the clusters determined by one of the clustering methods of this invention, and (b) comparing the obtained measure of compactness to a hypothetical measure of compactness of cellular constituents regrouped in an increased number of clusters. For example, in embodiments wherein hierarchical clustering algorithms, such as hclust, are employed, such a hypothetical measure of compactness preferably comprises the measure of compactness for clusters selected at the next lowest branch in a clustering tree. Alternatively, in embodiments wherein non-hierarchical clustering methods or algorithms are employed, e.g., to generate N clusters, the hypothetical measure of compactness is preferably the compactness obtained for N+1 clusters by the same methods.
Cluster compactness may be quantitatively defined, e.g., as the mean squared distance of elements of the cluster from the "cluster mean," or, more preferably, as the inverse of the mean squared distance of elements from the cluster mean. The cluster mean of a particular cluster is generally defined as the mean of the response vectors of all elements in the cluster. However, in certain embodiments, e.g., wherein the absolute value of Equation 11 or 13 is used to evaluate the distance metric (i.e., $I_{i,j} = 1 - |r_{i,j}|$) of the clustering algorithm, such a definition of cluster mean is problematic. More generally, the above definition of mean is problematic in embodiments wherein response vectors can be in opposite directions such that the above defined cluster mean could be zero. Accordingly, in such embodiments, it is preferable to choose a different definition of cluster compactness such as, but not limited to, the mean squared distance between all pairs of elements in the cluster. Alternatively, the cluster compactness may be defined to comprise the average distance (or more preferably the inverse of the average distance) from each element (e.g., cellular constituent) of the cluster to all other elements in that cluster.
Preferably, step (b) above of comparing cluster compactness to a hypothetical compactness comprises generating a non-parametric statistical distribution for the changed compactness in an increased number of clusters. More preferably, such a distribution is generated using a model which mimics the actual data but has no intrinsic clustered structures (i.e., a "null hypothesis" model). For example, such distributions may be generated by (a) randomizing the perturbation experiment index m for each actual perturbation vector $v_i^{(m)}$ and (b) calculating the change in compactness which occurs for each distribution, e.g., by increasing the number of clusters from N to N+1 (non-hierarchical clustering methods), or by increasing the branching level at which clusters are defined (hierarchical methods).
In an exemplary embodiment, the increased compactness is given by the parameter E, which is defined by Equation 17, below:
$$E = \frac{D_{mean}^{(N)} - D_{mean}^{(N+1)}}{D_{mean}^{(N+1)}}$$ (Equation 17)
However, other definitions that are apparent to those skilled in the art can also be used in the statistical methods of this invention. In general, the exact definition of E is not crucial provided it is monotonically related to increase in cluster compactness. The statistical methods of this invention provide methods to analyze the significance of E. Specifically, these methods provide an empirical distribution approach for the analysis of E by comparing the actual increase in compactness, E0, for actual experimental data to an empirical distribution of E values determined from randomly permuted data (e.g., by Equation 15 above). The coordinates (i.e., the indices) of the vectors in each cluster being subdivided are "reflected" about the cluster center, e.g., by first translating the coordinate axes to the cluster center. Second, the randomly permuted data are re-evaluated by cluster algorithms, most preferably by the same cluster algorithm used to determine the original cluster(s), so that new clusters are determined for the permutated data, and a value of E is evaluated for these new clusters (i.e., for splitting one or more of the new clusters). Steps one and two above are repeated for some number of Monte Carlo trials to generate a distribution of E values. Preferably, the number of Monte Carlo trials is from about 50 to about 1000, and more preferably from about 50 to about 100. Finally, the actual increase in compactness, i.e., E0, is compared to this empirical distribution of E values. For example, if M Monte Carlo simulations are performed, of which x have E values greater than E0, then the confidence level in the number of clusters may be evaluated from 1 - x/M. In particular, if M = 100 and x = 4, then the confidence level that increasing the number of clusters is significant is 1 - 4/100 = 96%.
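The final comparison reduces to counting how many Monte Carlo trials exceed the actual value. A minimal Python sketch, with a hypothetical function name and made-up E values chosen to reproduce the M = 100, x = 4 example:

```python
def cluster_confidence(e_actual, e_permuted):
    """Confidence 1 - x/M, where x of the M permuted-data E values exceed the actual E0."""
    x = sum(1 for e in e_permuted if e > e_actual)
    return 1.0 - x / len(e_permuted)

# Hypothetical Monte Carlo results: 4 of 100 permuted trials exceed the actual E0 of 0.3.
permuted = [0.5] * 4 + [0.1] * 96
print(cluster_confidence(0.3, permuted))
```

The printed value is 0.96, up to floating-point rounding.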
The above methods are equally applicable to embodiments comprising hierarchical clusters and/or a plurality of elements (e.g., more than two cellular constituents).
Classification Based Upon Mechanisms of Regulation:
Cellular constituent sets can also be defined based upon the mechanism of the regulation of cellular constituents. For example, genesets can often be defined based upon the regulation mechanism of individual genes. Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated, and, as such, are more likely to co-vary. In some preferred embodiments, the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (see, e.g., Stormo and Hartzell, 1989, Proc. Natl. Acad. Sci. 86:1183-1187; and Hertz and Stormo, 1995, Proc. of 3rd Intl. Conf. on Bioinformatics and Genome Research, Lim and Cantor, eds., Singapore: World Scientific Publishing Co., Ltd., pp. 201-216). For example, the common promoter sequence responsive to Gcn4 in 20 genes is likely to be responsible for those 20 genes co-varying over a wide variety of perturbations. Co-regulated and/or co-varying genes may also be in the up- or down-stream relationship where the products of up-stream genes regulate the activity of down-stream genes. For example, as is well known to those of skill in the art, there are numerous varieties of gene regulation networks. Accordingly, the methods of the present invention are not limited to any particular kind of gene regulation mechanism. If it can be derived or determined from their mechanisms of regulation, whatever that mechanism happens to be, that two or more genes are co-regulated in terms of their activity change in response to perturbation, those two or more genes may be clustered into a geneset.
In many embodiments, knowledge of the exact regulation mechanisms of certain cellular constituents may be limited and/or incomplete. In such embodiments, it may be preferred to combine cluster analysis methods, described above, with knowledge of regulatory mechanisms to derive better defined, i.e., refined cellular constituent sets. For example, in some embodiments, clustering may be used to cluster genesets when the regulation of genes of interest is partially known. In particular, in many embodiments, the number of genesets may be predetermined by understanding (which may be incomplete or limited) of the regulation mechanism or mechanisms. In such embodiments, the clustering methods may be constrained to produce the predetermined number of clusters. For example, in a particular embodiment, promoter sequence comparison may indicate that the measured genes should fall into three distinct genesets. The clustering methods described above may then be constrained to generate exactly three genesets with the greatest possible distinction between those three sets.
Refinement of Cellular Constituent Sets:
Cellular constituent sets, such as cellular constituent sets identified by any of the above methods or combinations thereof, may be refined using any of several sources of corroborating information. Examples of corroborating information which may be used to refine cellular constituent sets include, but are by no means limited to, searches for common regulatory sequence patterns, literature evidence for co-regulations, sequence homology (e.g., of genes or proteins), and known shared function.
In preferred embodiments, a cellular constituent database or "compendium" is used for the refinement of genesets. In particularly preferred embodiments the compendium is a "dynamic database." For example, in certain embodiments, a compendium containing raw data for cluster analysis of cellular constituent sets (e.g., for genesets) is used to continuously update geneset definitions.
Re-ordering the Cellular Constituent Index:
As noted above, in preferred embodiments of the present invention the cellular constituents are re-ordered according to the cellular constituent sets or clusters obtained or provided by the above-described methods and visually displayed. Analytically, such a reordering corresponds to transforming a particular original profile, such as a profile that measures a response to a particular perturbation, e.g., $\mathbf{v}^{(n)} = \{v_i^{(n)}\}$, to the re-ordered profile $\{v_{\pi(i)}^{(n)}\}$, where i is the cellular constituent index.
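A minimal Python sketch of such a re-ordering follows, assuming each cellular constituent already carries a cluster label (for example from a clustering tree). The function name, labels and values are hypothetical.

```python
def reorder_by_cluster(profile, cluster_labels):
    """Return the permutation that groups constituents by cluster, and the re-ordered profile."""
    order = sorted(range(len(profile)), key=lambda i: (cluster_labels[i], i))
    return order, [profile[i] for i in order]

profile = [0.1, -0.5, 0.3, 0.7]   # hypothetical responses of constituents 0..3
labels = [1, 0, 1, 0]             # hypothetical cluster membership
order, reordered = reorder_by_cluster(profile, labels)
print(order)       # [1, 3, 0, 2]
print(reordered)   # [-0.5, 0.7, 0.1, 0.3]
```

Constituents within a cluster keep their original relative order, so the same permutation can be applied to every profile before display.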
5.6 USING A COMPENDIUM OF LANDMARK PROFILES TO DETECT ANEUPLOIDY
In a preferred embodiment, the biological state of a cell is determined by measuring the expression levels of a plurality of genes in a cell to produce a transcript (or expression) profile. The effects of altered dosages of individual genes, chromosomal regions or entire chromosomes in a cell can be conveniently and exhaustively examined by using a library of cell mutants, wherein each mutant has an altered dosage of one or more genes. In another embodiment, gene dosage can be altered by increasing or decreasing the amount of DNA of a gene, or by increasing or decreasing the levels or activities of RNA or protein encoded by said gene. In yet another embodiment, a mutation in a gene of a cell or organism may result in altered dosage of other genes because the cell or organism compensates for, e.g., loss of function of the mutated gene. Thus, altered gene dosage of a particular gene m may be the result of a mutation in a paralog gene m' that has a similar function to gene m. For instance, a mutation of gene m' that results in a deletion or down-regulation of the gene may be compensated for by, e.g., homologous recombination and selection for increased dosage of gene m, which has a similar function to gene m'. One of skill in the art will readily appreciate that a mutation in a gene that results in altered dosages of other genes, e.g., paralog genes, can be spontaneous or can be introduced by techniques including, but not limited to, transfection, homologous recombination, promoter replacement, or RNA antisense approaches. In yet another embodiment, aneuploidy may be induced by making mutations in genes whose function is to maintain a wild-type chromosome number in a cell type or organism. Thus, when these mutants become aneuploid, there is likely to be no mechanism in the cell to correct the altered gene dosage.
The transcript profiles of each of the resulting aneuploid cells are measured to produce a "compendium" comprising landmark transcript profiles, each of which is uniquely associated with a particular dose of one or more genes in an organism. One of skill in the art will recognize that the compendium may comprise landmark profiles for different dosages of a particular gene, e.g., gene m, because a profile generated from a cell type or organism having a duplication of gene m may be different from a profile generated from the same cell type or organism having a 100-fold amplification of gene m. One of ordinary skill in the art will also readily recognize that a compendium can also be constructed by measuring other cellular constituents that are indicative of the biological states of aneuploid cells, which include, but are not limited to, protein expression and protein activity levels. Preferably, the compendium comprising landmark profiles is a database stored on a computer readable medium that carries out the comparisons. In specific embodiments, the database contains at least 10 profiles, at least 50 profiles, at least 100 profiles, at least 500 profiles, at least 1,000 profiles, at least 10,000 profiles, or at least 50,000 profiles, each profile containing measurements of at least 10, preferably at least 50, more preferably at least 100, more preferably at least 500, even more preferably at least 1,000, even more preferably at least 10,000, most preferably at least 50,000 cellular constituents.
In some embodiments, a library of aneuploid cells is generated by targeting mutations to particular genes of an organism and selecting for mutants that compensate for the targeted mutations with altered dosage levels of other genes. Saccharomyces cerevisiae is particularly well-suited to this technique of generating mutants. While many organisms repair double-stranded DNA ends that are not part of telomeres by end-to-end ligation, S. cerevisiae uses homologous recombination. Thus, targeted perturbations of genes can be made in yeast by transforming the yeast with a particular DNA sequence, which integrates at a locus with high homology. In other embodiments, a library of aneuploid cells is generated by first randomly mutagenizing the cells using, e.g., chemical agents, radiation or retroviral-mediated insertion mutagenesis and subsequent identification of cells that compensate for these mutations by exhibiting altered gene copy number. One of ordinary skill in the art will recognize that profiles may change with environmental perturbations, so that when generating a compendium comprising landmark profiles, differences in environmental variables, e.g., growth medium, temperature, cell density, pH, etc., should be minimized. Likewise, when comparing a new profile to the compendium, the organism or cell from which that profile was generated should be grown under the same environmental conditions as the aneuploid cells from which the compendium was compiled. One of ordinary skill in the art will further recognize that, in the case of multicellular organisms, profiles will change with tissue type and developmental state.
In one embodiment, the database comprises landmark profiles for altered dosages of at least 2%, preferably at least 5%, more preferably at least 15%, even more preferably at least 20%, even more preferably at least 40%, most preferably at least 75%, of genes in the genome of a cell type or organism, and may also include profiles from strains having different copy numbers of the same gene, since these can be fundamentally different from each other. In another embodiment, the number of landmark profiles is reduced to the minimum necessary to identify altered copy number of particular genes or chromosomal regions. For example, aneuploidy of a particular chromosomal region can be represented in the compendium set by only a few profiles from cell types or organisms having aneuploidy of genes that are located throughout the chromosomal region, i.e., each chromosomal region can be represented in the compendium by at least one profile from a cell type or organism having altered copy number of one gene, but multiple profiles from cell types or organisms having altered copy numbers of many genes in the chromosomal region may not be necessary.
In a specific embodiment, the database comprises landmark profiles for at least 100, preferably at least 250, more preferably at least 500, even more preferably at least 1,000, even more preferably at least 10,000, even more preferably at least 50,000, most preferably at least 75,000 genes in the genome of a cell or organism, each gene having an altered copy number. In another embodiment, the database comprises landmark profiles for at least 1/4, preferably at least 1/2, most preferably at least 3/4 of the genes in the genome of a cell or organism, each gene having an altered copy number. In various embodiments, the cell or organism for which the database contains landmark profiles is a human, livestock or companion animal or plant.
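As an illustration of how a new profile might be compared against a compendium of landmark profiles, the following minimal sketch scores a query profile against each landmark and reports the best match. The use of the Pearson correlation coefficient as the similarity measure is an assumption made for this sketch, not a requirement of this section, and the landmark labels and values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return sxy / math.sqrt(sxx * syy)


def best_landmark(
    query: Sequence[float], compendium: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return (label, correlation) of the landmark profile most similar to query."""
    scored = [(label, pearson(query, prof)) for label, prof in compendium.items()]
    return max(scored, key=lambda item: item[1])


# Hypothetical landmark profiles (e.g., log expression ratios for five genes).
compendium = {
    "chrVII_duplication": [0.0, 1.0, 1.0, 0.0, 0.0],
    "chrX_monosomy": [-1.0, 0.0, 0.0, 0.0, -1.0],
}
query = [0.1, 0.9, 1.1, -0.1, 0.0]
label, r = best_landmark(query, compendium)
print(label, round(r, 3))
```

In practice the query and landmark profiles would need to be measured under the same environmental conditions, as noted above, for such a comparison to be meaningful.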
5.6.1 GENETIC MODIFICATIONS
Genetically modified cells, i.e., mutant cells from which aneuploid cells can result, can be made using cells of any organism for which genomic sequence information is available and for which methods are available that allow alteration in dosage of specific genes. The genetically modified cells that exhibit aneuploidy are used to make aneuploid profiles. Preferably, a compendium is constructed that includes transcript profiles that represent the transcriptional states of each of a plurality of modified cells with an indicated dosage level of one or more genes, e.g., a set of cells in which each cell has a duplication of a particular gene. Such a compendium is advantageous to detect aneuploidy in a systematic and automatable manner. Preferably, the compendium includes aneuploid transcript profiles for the genes likely to result in a disease or syndrome.
In one embodiment, the invention is carried out using a yeast, with Saccharomyces cerevisiae most preferred because the sequence of the entire genome of a S. cerevisiae strain has been determined. In addition, well-established methods for deleting or otherwise disrupting or modifying specific genes are available in yeast. It is believed that most (approximately four-fifths) of the genes in S. cerevisiae can be deleted, one at a time, with little or no effect on the ability of the organism to reproduce. Another advantage is that biological functions are often conserved between yeast and humans. For example, almost half of the proteins identified as defective in human heritable diseases show amino acid similarity to yeast proteins (Goffeau et al., 1996, Life with 6000 genes. Science 274:546-567). A preferred strain of yeast is a S. cerevisiae strain for which yeast genomic sequence is known, such as strain S288C or substantially isogenic derivatives of it (see, e.g., Nature 369, 371-8 (1994); P.N.A.S. 92:3809-13 (1995); E.M.B.O. J. 13:5795-5809 (1994); Science 265:2077-2082 (1994); E.M.B.O. J. 15:2031-49 (1996), all of which are incorporated herein). However, other strains may be used as well. Yeast strains are available from American Type Culture Collection, Rockville, MD 20852. Standard techniques for manipulating yeast are described in C. Kaiser, S. Michaelis, & A. Mitchell, 1994, Methods in Yeast Genetics: A Cold Spring Harbor Laboratory Course Manual, Cold Spring Harbor Laboratory Press, New York; and Sherman et al, 1986, Methods in Yeast Genetics: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, both of which are incorporated by reference in their entirety and for all purposes.
5.6.2 CONSTRUCTION OF DELETION AND OVER-EXPRESSION MUTANTS IN YEAST
In one embodiment of the invention, yeast cells are used. In one embodiment, yeast genes are disrupted or deleted using the method of Baudin et al, 1993, A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae, Nucl. Acids Res. 21:3329-3330, which is incorporated by reference in its entirety for all purposes. This method uses a selectable marker, e.g., the KanMx gene, which serves in a gene replacement cassette. The cassette is transformed into a haploid yeast strain and homologous recombination results in the replacement of the targeted gene (ORF) with the selectable marker. In one embodiment, a precise null mutation (a deletion from start codon to stop codon) is generated. Also see, Wach et al, 1994, New heterologous modules for classical or PCR-based gene perturbations in Saccharomyces cerevisiae, Yeast 10:1793-1808; Rothstein, 1991, Methods Enzymol. 194:281, each of which is incorporated by reference in its entirety for all purposes. An advantage of using precise null mutants is that it avoids problems with residual or altered functions associated with truncated products. However, in some embodiments (e.g., when investigating potential targets in the excluded set) a deletion or mutation affecting less than the entire protein coding sequence, e.g., a deletion of only one domain of a protein having multiple domains and multiple activities, is used.
In some embodiments, the polynucleotide (e.g., containing a selectable marker) used for transformation of the yeast includes an oligonucleotide marker that serves as a unique identifier of the resulting deletion strain as described, for example, in Shoemaker et al., 1996, Nature Genetics 14:450. Once made, perturbations can be verified by PCR using the internal KanMx sequences, or using an external primer in the yeast genome that immediately flanks the disrupted open reading frame, and assaying for a PCR product of the expected size. When yeast is used, it may sometimes be advantageous to disrupt ORFs in three yeast strains, i.e., haploid strains of the a and α mating types, and a diploid strain (for deletions of essential genes).
In another embodiment, precise deletion of yeast genes is accomplished by using a PCR-mediated gene disruption strategy using homologous recombination (Winzeler et al. (1999) Science 285:901-906). In this method, short regions of yeast sequence that are upstream and downstream of a targeted gene are placed at each end of a selectable marker gene through PCR. The resulting PCR products, when transformed into yeast, can replace the targeted gene by homologous recombination. For most genes, greater than 95% of the yeast transformants carry the correct gene deletion.
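The primer design implied by this PCR-mediated strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the homology length, the toy genome, and the two marker-priming sequences are hypothetical placeholders, not sequences taken from the cited method, and the target ORF is assumed to lie on the forward strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def deletion_primers(genome, orf_start, orf_end, marker_fwd, marker_rev, homology=45):
    """Build the two chimeric primers for a precise start-to-stop deletion.

    orf_start: 0-based index of the first base of the start codon.
    orf_end:   0-based index one past the last base of the stop codon.
    marker_fwd / marker_rev: priming sequences on the marker cassette, each
    already written 5'->3' in primer orientation (hypothetical inputs).
    """
    if orf_start < homology or orf_end + homology > len(genome):
        raise ValueError("not enough flanking sequence for the requested homology")
    upstream = genome[orf_start - homology:orf_start]
    downstream = genome[orf_end:orf_end + homology]
    # Forward primer: upstream homology followed by the marker priming site.
    forward = upstream.upper() + marker_fwd
    # Reverse primer: reverse complement of the downstream homology,
    # followed by the marker priming site for the opposite strand.
    reverse = reverse_complement(downstream) + marker_rev
    return forward, reverse


# Toy example with 4-base homology arms so the result is easy to read.
genome = "TTTTACGA" + "ATGGGGTAA" + "CAGTTTTT"
forward, reverse = deletion_primers(genome, 8, 17, "GGATCC", "GAATTC", homology=4)
print(forward)  # ACGAGGATCC
print(reverse)  # ACTGGAATTC
```

Amplifying the marker with such primers yields a product flanked by the yeast homology regions, which is the fragment that replaces the targeted gene on transformation.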
5.6.3 CONSTRUCTION OF MUTANTS IN OTHER ORGANISMS
The method of the present invention can be carried out using cells from any eukaryote for which genomic sequence of at least one gene is available, e.g., fruit flies (e.g., D. melanogaster), nematodes (e.g., C. elegans), and mammalian cells such as cells derived from mice and humans. For example, 100% of the genome of D. melanogaster has been sequenced (Jasny, 2000, Science 287:2181). Methods for disruption of specific genes are well known to those of skill in the art, see, e.g., Anderson, 1995, Methods Cell Biol. 48:31; Pettitt et al, 1996, Development 122:4149-4157; Spradling et al, 1995, Proc. Natl. Acad. Sci. USA; Ramirez-Solis et al, 1993, Methods Enzymol. 225:855; and Thomas et al, 1987, Cell 51:503, each of which is incorporated herein by reference in its entirety for all purposes.
Other known methods of cellular modification target RNA abundances or activities, protein abundances, or protein activities. Examples of such methods are described in the following.
Methods of Modifying RNA Abundances or Activities
Methods of modifying RNA abundances and activities currently fall within three classes: ribozymes, antisense species, and RNA aptamers (Good et al, 1997, Gene Therapy 4: 45-54). Ribozymes are RNAs which are capable of catalyzing RNA cleavage reactions (Cech, 1987, Science 236:1532-1539; PCT International Publication WO 90/11364, published October 4, 1990; Sarver et al, 1990, Science 247: 1222-1225). "Hairpin" and "hammerhead" RNA ribozymes can be designed to specifically cleave a particular target mRNA. Rules have been established for the design of short RNA molecules with ribozyme activity, which are capable of cleaving other RNA molecules in a highly sequence specific way and can be targeted to virtually all kinds of RNA (Haseloff et al, 1988, Nature 334:585-591; Koizumi et al, 1988, FEBS Lett., 228:228-230; Koizumi et al, 1988, FEBS Lett., 239:285-288). Ribozyme methods involve exposing a cell to, inducing expression in a cell, etc. of such small RNA ribozyme molecules (Grassi and Marini, 1996, Annals of Medicine 28: 499-510; Gibson, 1996, Cancer and Metastasis Reviews 15: 287-299). Ribozymes can be routinely expressed in vivo in sufficient number to be catalytically effective in cleaving mRNA, and thereby modifying mRNA abundances in a cell (Cotten et al, 1989, Ribozyme mediated destruction of RNA in vivo, The EMBO J. 8:3861-3866). In particular, a ribozyme coding DNA sequence, designed according to the previous rules and synthesized, for example, by standard phosphoramidite chemistry, can be ligated into a restriction enzyme site in the anticodon stem and loop of a gene encoding a tRNA, which can then be transformed into and expressed in a cell of interest by methods routine in the art. tDNA genes (i.e., genes encoding tRNAs) are useful in this application because of their small size, high rate of transcription, and ubiquitous expression in different kinds of tissues. Alternately, an inducible promoter (e.g., a glucocorticoid or a tetracycline response element) can be used so that ribozyme expression can be selectively controlled. Therefore, ribozymes can be routinely designed to cleave virtually any mRNA sequence, and a cell can be routinely transformed with DNA coding for such ribozyme sequences such that a catalytically effective amount of the ribozyme is expressed. Accordingly, the abundance of virtually any RNA species in a cell can be essentially eliminated.
In another embodiment, activity of a target RNA (preferably mRNA) species, specifically its rate of translation, is inhibited by use of antisense nucleic acids. An "antisense" nucleic acid as used herein refers to a nucleic acid capable of hybridizing to a sequence-specific (e.g., non-poly A) portion of the target RNA, for example its translation initiation region, by virtue of some sequence complementarity to a coding and/or non-coding region. The antisense nucleic acids of the invention can be oligonucleotides that are double-stranded or single-stranded, RNA or DNA or a modification or derivative thereof, which can be directly administered to a cell or which can be produced intracellularly by transcription of exogenous, introduced sequences in quantities sufficient to inhibit translation of the target RNA.
Preferably, antisense nucleic acids are of at least six nucleotides and are preferably oligonucleotides (ranging from 6 to about 200 nucleotides). In specific aspects, the oligonucleotide is at least 10 nucleotides, at least 15 nucleotides, at least 100 nucleotides, or at least 200 nucleotides. The oligonucleotides can be DNA or RNA or chimeric mixtures or derivatives or modified versions thereof, single-stranded or double-stranded. The oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone. The oligonucleotide may include other appending groups such as peptides, or agents facilitating transport across the cell membrane (see, e.g., Letsinger et al, 1989, Proc. Natl. Acad. Sci. U.S.A. 86: 6553-6556; Lemaitre et al, 1987, Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. WO 88/09810, published December 15, 1988), hybridization-triggered cleavage agents (see, e.g., Krol et al, 1988, BioTechniques 6: 958-976) or intercalating agents (see, e.g., Zon, 1988, Pharm. Res. 5: 539-549). In a preferred aspect of the invention, an antisense oligonucleotide is provided, preferably as single-stranded DNA. The oligonucleotide may be modified at any position on its structure with constituents generally known in the art.
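By way of illustration only, the following minimal sketch derives a single-stranded DNA antisense oligonucleotide that is fully complementary to the translation initiation region of a target mRNA. It assumes that the first AUG in the sequence is the initiation codon, which is not true of every mRNA; the flank size and the example sequence are hypothetical.

```python
# Complement of each RNA base, written as DNA, since the text prefers
# single-stranded DNA antisense oligonucleotides.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(target_rna: str) -> str:
    """DNA oligonucleotide (5'->3') fully complementary to target_rna (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target_rna.upper()))


def antisense_to_initiation_region(mrna: str, flank: int = 3) -> str:
    """Antisense oligo covering the first AUG plus `flank` bases on each side."""
    rna = mrna.upper()
    aug = rna.find("AUG")  # assumption: the first AUG is the initiation codon
    if aug < 0:
        raise ValueError("no AUG codon found")
    start = max(0, aug - flank)
    end = min(len(rna), aug + 3 + flank)
    oligo = antisense_dna(rna[start:end])
    if len(oligo) < 6:
        # The text calls for antisense nucleic acids of at least six nucleotides.
        raise ValueError("oligonucleotide shorter than six nucleotides")
    return oligo


mrna = "GGCACCAUGGCUAGC"  # hypothetical mRNA fragment
oligo = antisense_to_initiation_region(mrna)
print(oligo)  # AGCCATGGT
```

A practical design would use a longer window and would also check the candidate against other transcripts for unintended complementarity, which this sketch does not attempt.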
The antisense oligonucleotides may comprise at least one modified base moiety which is selected from the group including but not limited to 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine.
In another embodiment, the oligonucleotide comprises at least one modified sugar moiety selected from the group including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and hexose.
In yet another embodiment, the oligonucleotide comprises at least one modified phosphate backbone selected from the group consisting of a phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, and a formacetal or analog thereof.
In yet another embodiment, the oligonucleotide is a 2-α-anomeric oligonucleotide. An α-anomeric oligonucleotide forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gautier et al, 1987, Nucl. Acids Res. 15: 6625-6641).
The oligonucleotide may be conjugated to another molecule, e.g., a peptide, hybridization-triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc. Oligonucleotides of the invention may be synthesized by standard methods known in the art, e.g., by use of an automated DNA synthesizer (such as are commercially available from Biosearch, Applied Biosystems, etc.). As examples, phosphorothioate oligonucleotides may be synthesized by the method of Stein et al. (1988, Nucl. Acids Res. 16: 3209), methylphosphonate oligonucleotides can be prepared by use of controlled pore glass polymer supports (Sarin et al, 1988, Proc. Natl. Acad. Sci. U.S.A. 85: 7448-7451), etc. In another embodiment, the oligonucleotide is a 2'-O-methylribonucleotide (Inoue et al, 1987, Nucl. Acids Res. 15: 6131-6148), or a chimeric RNA-DNA analog (Inoue et al, 1987, FEBS Lett. 215: 327-330).
In an alternative embodiment, the antisense nucleic acids of the invention are produced intracellularly by transcription from an exogenous sequence. For example, a vector can be introduced in vivo such that it is taken up by a cell, within which cell the vector or a portion thereof is transcribed, producing an antisense nucleic acid (RNA) of the invention. Such a vector would contain a sequence encoding the antisense nucleic acid. Such a vector can remain episomal or become chromosomally integrated, as long as it can be transcribed to produce the desired antisense RNA. Such vectors can be constructed by recombinant DNA technology methods standard in the art. Vectors can be plasmid, viral, or others known in the art, used for replication and expression in mammalian cells. Expression of the sequences encoding the antisense RNAs can be by any promoter known in the art to act in a cell of interest. Such promoters can be inducible or constitutive. Such promoters for mammalian cells include, but are not limited to: the SV40 early promoter region (Benoist and Chambon, 1981, Nature 290: 304-310), the promoter contained in the 3' long terminal repeat of Rous sarcoma virus (Yamamoto et al, 1980, Cell 22: 787-797), the herpes thymidine kinase promoter (Wagner et al, 1981, Proc. Natl. Acad. Sci. U.S.A. 78: 1441-1445), the regulatory sequences of the metallothionein gene (Brinster et al, 1982, Nature 296: 39-42), etc. The antisense nucleic acids of the invention comprise a sequence complementary to at least a portion of a target RNA species. However, absolute complementarity, although preferred, is not required. A sequence "complementary to at least a portion of an RNA," as referred to herein, means a sequence having sufficient complementarity to be able to hybridize with the RNA, forming a stable duplex; in the case of double-stranded antisense nucleic acids, a single strand of the duplex DNA may thus be tested, or triplex formation may be assayed.
The ability to hybridize will depend on both the degree of complementarity and the length of the antisense nucleic acid. Generally, the longer the hybridizing nucleic acid, the more base mismatches with a target RNA it may contain and still form a stable duplex (or triplex, as the case may be). One skilled in the art can ascertain a tolerable degree of mismatch by use of standard procedures to determine the melting point of the hybridized complex. The amount of antisense nucleic acid that will be effective in the inhibition of translation of the target RNA can be determined by standard assay techniques.
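As a rough illustration of how duplex stability depends on length and base composition, the following sketch applies the Wallace rule of thumb, under which the melting temperature of a short oligonucleotide duplex is estimated as 2 °C per A or T plus 4 °C per G or C. This rule is an assumption introduced here for illustration; it is only a coarse estimate for short, perfectly matched oligonucleotides, it does not model mismatches, and it does not replace the empirical melting-point procedures referred to above.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (degrees C) of a short DNA oligo duplex.

    Wallace rule of thumb: Tm = 2 * (A + T) + 4 * (G + C).
    """
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("oligo must be a non-empty DNA sequence of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 9-mer: 4 A/T bases and 5 G/C bases.
tm = wallace_tm("AGCCATGGT")
print(tm)  # 28
```

The estimate grows with both length and G+C content, which is consistent with the statement above that longer hybridizing nucleic acids can tolerate more mismatches while still forming a stable duplex.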
Therefore, antisense nucleic acids can be routinely designed to target virtually any mRNA sequence, and a cell can be routinely transformed with or exposed to nucleic acids coding for such antisense sequences such that an effective amount of the antisense nucleic acid is expressed. Accordingly, the translation of virtually any RNA species in a cell can be inhibited.
In a further embodiment, RNA aptamers can be introduced into or expressed in a cell. RNA aptamers are specific RNA ligands for proteins, such as for Tat and Rev RNA (Good et al, 1997, Gene Therapy 4: 45-54) that can specifically inhibit their translation.
Post-transcriptional gene silencing (PTGS) or RNA interference (RNAi) can also be used to modify RNA abundances (Guo et al, 1995, Cell 81:611-620; Fire et al, 1998, Nature 391:806-811). In RNAi, dsRNAs are injected into cells to specifically block expression of their homologous genes. In particular, in RNAi, both the sense strand and the anti-sense strand can inactivate the corresponding gene. It is suggested that the dsRNAs are cut by nucleases into 21-23 nucleotide fragments. These fragments hybridize to the homologous region of their corresponding mRNAs to form double-stranded segments, which are then degraded by nucleases (Grant, 1999, Cell 96:303-306; Zamore et al, 2000, Cell 101:25-33; Bass, 2000, Cell 101:235-238; Petcherski et al, 2000, Nature 405:364-368). Therefore, in one embodiment, one or more dsRNAs having sequences homologous to the sequences of one or more mRNAs whose abundances are to be modified are transfected into a cell or tissue sample. Any standard methods for introducing nucleic acids into cells can be used.
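The suggested fragmentation of a dsRNA into 21-23 nucleotide pieces can be pictured with a simple sketch. It assumes, purely for illustration, that cleavage proceeds in consecutive, non-overlapping windows from one end of the sense strand; the actual cleavage pattern in a cell is not specified here, and the sequences are hypothetical. Because a sense-strand fragment is identical to the mRNA region that its antisense partner would hybridize to, each fragment is located on the mRNA by exact match.

```python
from typing import List


def fragment_dsrna(sense_strand: str, size: int = 21) -> List[str]:
    """Split the sense strand into consecutive fragments of `size` nucleotides.

    Any remainder shorter than `size` at the end is dropped.
    """
    if not 21 <= size <= 23:
        raise ValueError("fragment size is expected to be 21-23 nucleotides")
    return [
        sense_strand[i:i + size]
        for i in range(0, len(sense_strand) - size + 1, size)
    ]


def locate_on_mrna(fragments: List[str], mrna: str) -> List[int]:
    """0-based position of each fragment's first exact match in the mRNA (-1 if none)."""
    return [mrna.find(fragment) for fragment in fragments]


# Hypothetical mRNA; the dsRNA sense strand is taken from part of it.
mrna = "GGCACCAUGGCUAGCUUGACCGAUCGGAUCCAUGCAAGCUUGGCACUGGCCUAAGC"
sense_strand = mrna[6:50]  # 44 nucleotides
fragments = fragment_dsrna(sense_strand, size=21)
positions = locate_on_mrna(fragments, mrna)
print(len(fragments), positions)
```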
Methods of Modifying Protein Abundances
Methods of modifying protein abundances include, inter alia, those altering protein degradation rates and those using antibodies (which bind to proteins affecting abundances or activities of native target protein species). Increasing (or decreasing) the degradation rates of a protein species decreases (or increases) the abundance of that species. Methods for controllably increasing the degradation rate of a target protein in response to elevated temperature or exposure to a particular drug, which are known in the art, can be employed in this invention. For example, one such method employs a heat-inducible or drug-inducible N-terminal degron, which is an N-terminal protein fragment that exposes a degradation signal promoting rapid protein degradation at a higher temperature (e.g., 37°C) and which is hidden to prevent rapid degradation at a lower temperature (e.g., 23°C) (Dohmen et al., 1994, Science 263:1273-1276). Such an exemplary degron is Arg-DHFRts, a variant of murine dihydrofolate reductase in which the N-terminal Val is replaced by Arg and the Pro at position 66 is replaced with Leu. According to this method, for example, a gene for a target protein, P, is replaced by standard gene targeting methods known in the art (Lodish et al, 1995, Molecular Biology of the Cell, W.H. Freeman and Co., New York, especially chap 8) with a gene coding for the fusion protein Ub-Arg-DHFRts-P ("Ub" stands for ubiquitin). The N-terminal ubiquitin is rapidly cleaved after translation, exposing the N-terminal degron. At lower temperatures, lysines internal to Arg-DHFRts are not exposed, ubiquitination of the fusion protein does not occur, degradation is slow, and active target protein levels are high. At higher temperatures (in the absence of methotrexate), lysines internal to Arg-DHFRts are exposed, ubiquitination of the fusion protein occurs, degradation is rapid, and active target protein levels are low. Heat activation is blocked by exposure to methotrexate. This method is adaptable to other N-terminal degrons which are responsive to other inducing factors, such as drugs and temperature changes.
Target protein abundances and also, directly or indirectly, their activities can also be decreased by (neutralizing) antibodies. For example, antibodies to suitable epitopes on protein surfaces may decrease the abundance, and thereby indirectly decrease the activity, of the wild-type active form of a target protein by aggregating active forms into complexes with less or minimal activity as compared to the unaggregated wild-type form. Alternately, antibodies may directly decrease protein activity by, e.g., interacting directly with active sites or by blocking access of substrates to active sites. Conversely, in certain cases, (activating) antibodies may also interact with proteins and their active sites to increase resulting activity. In either case, antibodies (of the various types to be described) can be raised against specific protein species (by the methods to be described) and their effects screened. The effects of the antibodies can be assayed and suitable antibodies selected that raise or lower the target protein species concentration and/or activity. Such assays involve introducing antibodies into a cell (see below), and assaying the concentration or activity of the wild-type form of the target protein by standard means (such as immunoassays) known in the art. The net activity of the wild-type form can be assayed by assay means appropriate to the known activity of the target protein. Antibodies can be introduced into cells in numerous fashions, including, for example, microinjection of antibodies into a cell (Morgan et al, 1988, Immunology Today 9:84-86) or transforming hybridoma mRNA encoding a desired antibody into a cell (Burke et al, 1984, Cell 36:847-858). In a further technique, recombinant antibodies can be engineered and ectopically expressed in a wide variety of non-lymphoid cell types to bind to target proteins as well as to block target protein activities (Biocca et al, 1995, Trends in Cell Biology 5:248-252).
A first step is the selection of a particular monoclonal antibody with appropriate specificity to the target protein (see below). Then sequences encoding the variable regions of the selected antibody can be cloned into various engineered antibody formats, including, for example, whole antibody, Fab fragments, Fv fragments, single chain Fv fragments (VH and VL regions united by a peptide linker) ("ScFv" fragments), diabodies (two associated ScFv fragments with different specificities), and so forth (Hayden et al, 1997, Current Opinion in Immunology 9:210-212). Intracellularly expressed antibodies of the various formats can be targeted into cellular compartments (e.g., the cytoplasm, the nucleus, the mitochondria, etc.) by expressing them as fusions with the various known intracellular leader sequences (Bradbury et al, 1995, Antibody Engineering (vol. 2) (Borrebaeck ed.), pp 295-361, IRL Press). In particular, the ScFv format appears to be particularly suitable for cytoplasmic targeting.
Antibody types include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab fragments, and an Fab expression library. Various procedures known in the art may be used for the production of polyclonal antibodies to a target protein. For production of the antibody, various host animals can be immunized by injection with the target protein; such host animals include, but are not limited to, rabbits, mice, rats, etc. Various adjuvants can be used to increase the immunological response, depending on the host species, and include, but are not limited to, Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human adjuvants such as bacillus Calmette-Guerin (BCG) and Corynebacterium parvum.
For preparation of monoclonal antibodies directed towards a target protein, any technique that provides for the production of antibody molecules by continuous cell lines in culture may be used. Such techniques include, but are not restricted to, the hybridoma technique originally developed by Kohler and Milstein (1975, Nature 256: 495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al, 1983, Immunology Today 4: 72), and the EBV hybridoma technique to produce human monoclonal antibodies (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). In an additional embodiment of the invention, monoclonal antibodies can be produced in germ-free animals utilizing recent technology (PCT/US90/02545). According to the invention, human antibodies may be used and can be obtained by using human hybridomas (Cote et al, 1983, Proc. Natl. Acad. Sci. USA 80: 2026-2030), or by transforming human B cells with EBV virus in vitro (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). In fact, according to the invention, techniques developed for the production of "chimeric antibodies" (Morrison et al, 1984, Proc. Natl. Acad. Sci. USA 81: 6851-6855; Neuberger et al, 1984, Nature 312:604-608; Takeda et al, 1985, Nature 314: 452-454) by splicing the genes from a mouse antibody molecule specific for the target protein together with genes from a human antibody molecule of appropriate biological activity can be used; such antibodies are within the scope of this invention. Additionally, where monoclonal antibodies are advantageous, they can be alternatively selected from large antibody libraries using the techniques of phage display (Marks et al, 1992, J. Biol. Chem. 267:16007-16010). 
Using this technique, libraries of up to 10^12 different antibodies have been expressed on the surface of fd filamentous phage, creating a "single pot" in vitro immune system of antibodies available for the selection of monoclonal antibodies (Griffiths et al, 1994, EMBO J. 13:3245-3260). Selection of antibodies from such libraries can be done by techniques known in the art, including contacting the phage to immobilized target protein, selecting and cloning phage bound to the target, and subcloning the sequences encoding the antibody variable regions into an appropriate vector expressing a desired antibody format. According to the invention, techniques described for the production of single chain antibodies (U.S. Patent No. 4,946,778) can be adapted to produce single chain antibodies specific to the target protein. An additional embodiment of the invention utilizes the techniques described for the construction of Fab expression libraries (Huse et al, 1989, Science 246: 1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity for the target protein.
Antibody fragments that contain the idiotypes of the target protein can be generated by techniques known in the art. For example, such fragments include, but are not limited to: the F(ab')2 fragment which can be produced by pepsin digestion of the antibody molecule; the Fab' fragments that can be generated by reducing the disulfide bridges of the F(ab')2 fragment, the Fab fragments that can be generated by treating the antibody molecule with papain and a reducing agent, and Fv fragments.
In the production of antibodies, screening for the desired antibody can be accomplished by techniques known in the art, e.g., ELISA (enzyme-linked immunosorbent assay). To select antibodies specific to a target protein, one may assay generated hybridomas or a phage display antibody library for an antibody that binds to the target protein.
Methods of Modifying Protein Activities
Methods of directly modifying protein activities include, inter alia, dominant negative mutations, specific drugs (used in the sense of this application), and also the use of antibodies, as previously discussed. Dominant negative mutations are mutations to endogenous genes or mutant exogenous genes that when expressed in a cell disrupt the activity of a targeted protein species. Depending on the structure and activity of the targeted protein, general rules exist that guide the selection of an appropriate strategy for constructing dominant negative mutations that disrupt activity of that target (Herskowitz, 1987, Nature 329:219-222). In the case of active monomeric forms, overexpression of an inactive form can cause competition for natural substrates or ligands sufficient to significantly reduce net activity of the target protein. Such overexpression can be achieved by, for example, associating a promoter of increased activity with the mutant gene. Alternatively, changes to active site residues can be made so that a virtually irreversible association occurs with the target ligand. This can be achieved with certain tyrosine kinases by careful replacement of active site serine residues (Perlmutter et al, 1996, Current Opinion in Immunology 8:285-290). In the case of active multimeric forms, several strategies can guide selection of a dominant negative mutant. Multimeric activity can be decreased by expression of genes coding exogenous protein fragments that bind to multimeric association domains and prevent multimer formation. Alternatively, overexpression of an inactive protein unit of a particular type can tie up wild-type active units in inactive multimers, and thereby decrease multimeric activity (Nocka et al, 1990, The EMBO J. 9:1805-1813). For example, in the case of dimeric DNA binding proteins, the DNA binding domain can be deleted from the DNA binding unit, or the activation domain deleted from the activation unit.
Also, in this case, the DNA binding domain unit can be expressed without the domain causing association with the activation unit. Thereby, DNA binding sites are tied up without any possible activation of expression. In the case where a particular type of unit normally undergoes a conformational change during activity, expression of a rigid unit can inactivate resultant complexes. For a further example, proteins involved in cellular mechanisms, such as cellular motility, the mitotic process, cellular architecture, and so forth, are typically composed of associations of many subunits of a few types. These structures are often highly sensitive to disruption by inclusion of a few monomeric units with structural defects. Such mutant monomers disrupt the relevant protein activities.
In addition to dominant negative mutations, mutant target proteins that are sensitive to temperature (or other exogenous factors) can be found by mutagenesis and screening procedures that are well-known in the art. Also, one of skill in the art will appreciate that expression of antibodies binding and inhibiting a target protein can be employed as another dominant negative strategy.
Finally, alternatively to techniques involving mutations, activities of certain target proteins can be altered by exposure to exogenous drugs or ligands. In a preferable case, a drug is known that interacts with only one target protein in the cell and alters the activity of only that one target protein. Exposure of a cell to that drug thereby modifies the cell. The alteration can be either a decrease or an increase of activity. Less preferably, a drug is known and used that alters the activity of only a few (e.g., 2-5) target proteins with separate, distinguishable, and non-overlapping effects.
5.7 METHODS OF CORRECTING THE EFFECTS OF ANEUPLOIDY IN PROFILES
In one embodiment, the methods of the present invention are directed toward correcting for the effects of aneuploidy in a profile. As discussed above, aneuploidy may arise spontaneously in a cell as an indirect result of, for example, a mutation, missegregation of chromosomes, or some selection, such as one that offers a growth advantage. Consequently, profiles may be generated from aneuploid cells where the aneuploidy is not a desired characteristic, but where it contaminates the profile. In fact, aneuploidy may go undetected in the cells from which the profiles were generated. The results of undesired and undetected aneuploidy may be spurious correlations between profiles, which may in turn lead to erroneous interpretations of, for example, gene function.
An illustration of this is the correlation between profiles for two Saccharomyces cerevisiae null mutants, erg4 and ecm18/ecm18 (see Example, below). Neither of these mutants is known to cause chromosome instability. Nevertheless, if each of the mutants is aneuploid for the same chromosome or chromosomal segment, then a correlation will exist between the profiles. For example, if each mutant has an extra copy of chromosome VII, then both profiles will exhibit, e.g., increased transcript levels from the genes on chromosome VII. As a result, the correlation coefficient for the two profiles will be high (see Fig. 1d,e). It might be inferred from this correlation that the two genes, ERG4 and ECM18, are in the same cellular pathway. Alternatively, if modes of action of a drug are being determined, the correlation between the profiles may lead to an incorrect interpretation of the action of the drug. In order for profiles to more accurately reflect changes in measured amounts of cellular constituents due to a known perturbation, such as a gene knockout or exposure to a particular level of a drug, they are preferably corrected for the effects of aneuploidy.
In some embodiments, profiles may be corrected for the effects of aneuploidy as follows. In a first step, the mean chromosomal ratio offset for the affected chromosome or chromosomal segment is determined. The mean chromosomal ratio offset is the difference between the mean quantified level of a plurality of cellular constituents associated with a plurality of genes having an abnormal copy number (i.e., those mapped to the aneuploid chromosome or chromosomal segment) and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes having a wild type copy number (i.e., those mapped to chromosomes or chromosomal segments having a wild type copy number) (Fig. 1d,e). For example, the mean chromosomal ratio offset for chromosome VII shown in Fig. 1d is about 10^0.2, or about 58%, while that shown in Fig. 1e is about 10^0.14, or about 35%. In a second step, the mean quantified level of the plurality of cellular constituents associated with the plurality of genes mapped to the affected chromosome or chromosomal segment is divided by the mean chromosomal ratio offset in order to correct for the effects of aneuploidy. As a result of the correction, the mean quantified level of the plurality of cellular constituents associated with the plurality of genes on the aneuploid chromosome of Fig. 1d will be decreased by 58% and the mean quantified level of the plurality of cellular constituents associated with the plurality of genes on the aneuploid chromosome of Fig. 1e will be decreased by 35%.
In one embodiment, the plurality of cellular constituents is mRNA transcripts, and the mean quantified level is an expression ratio, i.e., the ratio of the level of gene transcripts in the aneuploid cell and the level of gene transcripts in a wild type cell. In some embodiments, the mean chromosomal ratio offset is determined for at least 2 genes, preferably at least 10 genes, more preferably at least 50 genes mapped to the same aneuploid chromosome or chromosomal segment.
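The two-step correction described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes that expression ratios are held as log10 values, so that the "difference" of means is an offset in log10 units and "dividing" the linear ratio by 10^offset is the same as subtracting the offset. The function names, gene identifiers, and numbers are hypothetical.

```python
import math
from statistics import mean

def mean_chromosomal_ratio_offset(log_ratios, chrom_of, aneuploid_chrom):
    """Mean log10 ratio of genes on the aneuploid chromosome minus the
    mean log10 ratio of genes on all other (assumed wild type) chromosomes."""
    affected = [r for g, r in log_ratios.items() if chrom_of[g] == aneuploid_chrom]
    normal = [r for g, r in log_ratios.items() if chrom_of[g] != aneuploid_chrom]
    return mean(affected) - mean(normal)

def correct_profile(log_ratios, chrom_of, aneuploid_chrom):
    """Subtract the offset (log10 space) from every gene on the aneuploid
    chromosome; equivalent to dividing linear ratios by 10**offset."""
    offset = mean_chromosomal_ratio_offset(log_ratios, chrom_of, aneuploid_chrom)
    return {
        g: (r - offset if chrom_of[g] == aneuploid_chrom else r)
        for g, r in log_ratios.items()
    }

# Hypothetical data: two genes on chromosome VII, two on chromosome I.
chrom_of = {"g1": "VII", "g2": "VII", "g3": "I", "g4": "I"}
log_ratios = {"g1": 0.25, "g2": 0.15, "g3": 0.05, "g4": -0.05}

offset = mean_chromosomal_ratio_offset(log_ratios, chrom_of, "VII")
fold = 10 ** offset  # linear fold factor corresponding to the offset
corrected = correct_profile(log_ratios, chrom_of, "VII")
```

With these illustrative numbers the offset is 0.2 in log10 units, i.e., a fold factor of about 1.58, and after correction the mean log10 ratio of the chromosome VII genes matches that of the chromosome I genes.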
Another illustration of aneuploidy resulting in a spurious correlation of profiles is the correlation of the profiles of Saccharomyces cerevisiae mutants +/mcm1 and yor080w/yor080w (Fig. 5). Here, the cells harboring the mutations have lost chromosome III, on which is located the α2 transcription factor. This factor regulates transcription on many other chromosomes. Consequently, loss of chromosome III affects not only levels of cellular constituents associated with genes on chromosome III, but also levels of cellular constituents associated with genes on many other chromosomes. As described above in this section, the mean chromosomal ratio offset can be determined and the expression ratio of genes on chromosome III can be divided by this amount. However, this correction would clearly be suboptimal because it does not correct for changes in levels of cellular constituents associated with genes on other chromosomes.
Accordingly, in one aspect of the invention, profiles may be corrected for the effects of aneuploidy as follows. In a first step, the mean ratio offset for at least 50%, at least 75%, or preferably all genes known to be affected by the aneuploidy is determined. For example, in the case of loss of chromosome III, the mean ratio offset is determined for all genes known to be regulated by a gene on chromosome III, such as the transcription factor α2. In a second step, the mean quantified level of a plurality of cellular constituents altered by the presence of aneuploidy is divided by the mean ratio offset in order to correct for the effects of aneuploidy. The mean ratio offset is determined for at least two affected genes, preferably at least 10 affected genes, more preferably at least 50 affected genes. One of skill in the art will recognize that, in this embodiment, the identities of genes regulated by a particular gene on an aneuploid chromosome or chromosomal segment are preferably known. In some embodiments in accordance with this aspect of the invention, the genes affected by the aneuploidy are classified based on a characteristic. For example, some genes regulated by a given aneuploidy may tend to be "highly regulated" or "strongly induced" (or "strongly repressed"), and another class of genes might be "slightly regulated" or "slightly induced" (or "slightly repressed") by the aneuploidy. If so, then application of a mean offset for these classes of genes should be different. For example, some genes might always be strongly induced, say 20-fold, while others are only slightly induced, say 1.5-fold. Clearly, the expression ratios of genes in each of these classes would be divided by the mean offset for that class of genes. It would make little sense to treat all genes regulated by a particular gene affected by an aneuploidy the same way, if in fact, some were highly induced and others slightly induced.
To classify genes affected by a given aneuploidy, by way of example, techniques to define a covarying set may be used (see e.g., Section 5.5).
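The class-wise correction described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: ratios are held as log10 values, the class labels ("strong", "slight") are assigned beforehand by some other method such as a covarying-set analysis, and genes without a class label are treated as unaffected. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

def classwise_correct(log_ratios, gene_class):
    """Correct affected genes with a separate mean offset per class.

    log_ratios: gene -> log10 expression ratio.
    gene_class: affected gene -> class label; genes absent from this
                mapping are assumed to be unaffected by the aneuploidy.
    Returns (corrected log ratios, per-class offsets).
    """
    baseline = mean(r for g, r in log_ratios.items() if g not in gene_class)
    by_class = defaultdict(list)
    for gene, label in gene_class.items():
        by_class[label].append(log_ratios[gene])
    # Offset of each class relative to the unaffected genes (log10 units).
    offsets = {label: mean(vals) - baseline for label, vals in by_class.items()}
    corrected = {
        g: (r - offsets[gene_class[g]] if g in gene_class else r)
        for g, r in log_ratios.items()
    }
    return corrected, offsets

# Hypothetical profile: a, b strongly induced; c, d slightly induced.
log_ratios = {"a": 1.4, "b": 1.2, "c": 0.2, "d": 0.1, "e": 0.0, "f": 0.0}
gene_class = {"a": "strong", "b": "strong", "c": "slight", "d": "slight"}

corrected, offsets = classwise_correct(log_ratios, gene_class)
```

Here the "strong" class receives an offset of about 1.3 log10 units (roughly 20-fold) and the "slight" class about 0.15 (roughly 1.4-fold), so neither class is over- or under-corrected by a single pooled offset.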
5.8 IMPLEMENTATION SYSTEMS AND METHODS
The analytic methods described in the previous subsections can preferably be implemented by use of the following computer systems and according to the following programs and methods. Figure 3 illustrates an exemplary computer system suitable for implementation of the analytic methods of this invention. Computer system 301 is illustrated as comprising internal components and being linked to external components. The internal components of this computer system include processor element 302 interconnected with main memory 303. For example, computer system 301 can be an Intel Pentium®-based processor of 200 MHz or greater clock rate and with 32 MB or more of main memory.
It is noted that although the present description and figures refer to an exemplary computer system having a memory unit and a processor unit, the computer systems of the present invention are not limited to those consisting of a single memory unit or a single processor unit. Indeed, computer systems comprising a plurality of processor units and/or a plurality of memory units (e.g., having a plurality of SIMMs or DRAMs) are well known in the art. Indeed, such systems are generally recognized in the art as having improved performance capabilities over computer systems that have only a single processor unit or a single memory unit. For example, in one preferred embodiment, computer system 301 is an Alta cluster of nine computers; a head "node" and eight sibling "nodes," each having an i686 central processing unit ("CPU"). In addition, the Alta cluster comprises 128 Mb of random access memory ("RAM") on the head node and 256 Mb of RAM on each of the eight sibling nodes. Nevertheless and as the skilled artisan readily appreciates, as such computer systems relate to the present invention, a computer system that has a plurality of memory units and/or a plurality of processor units is, in fact, substantially equivalent to the exemplary computer system depicted in FIG. 3 and having only a single processor and a single memory unit.
The external components include mass storage 304. This mass storage can be one or more hard disks which are typically packaged together with the processor and memory. Such hard disks are typically of 1 Gb or greater storage capacity and more preferably having at least 6 Gb of storage capacity. For example, in the preferred embodiment described above each node of the Alta cluster comprises a hard drive. Specifically, the head node has a hard drive with 6 Gb of storage capacity whereas each sibling node has a hard drive with 9 Gb of storage capacity. Other external components include user interface device 305, which can be a monitor and a keyboard together with a pointing device 306 such as a "mouse" or other graphical input device. Typically, the computer system is also linked to a network link 307, which can be, e.g., part of an Ethernet link to other local computer systems, remote computer systems, or wide area communication networks such as the Internet. For example, each computer system in the preferred Alta cluster of computers described above is connected via an NFS network. This network link allows the computer systems in the cluster to share data and processing tasks with one another.
Loaded into memory during operation of this system are several software components, which are both standard in the art and special to the instant invention. These software components collectively cause the computer system to function according to the methods of the invention. The software components are typically stored on mass storage 304. Software component 310 represents an operating system, which is responsible for managing the computer system and its network interconnections. The operating system can be, for example, of the Microsoft Windows™ family, such as Windows 98, Windows 95 or Windows NT. Alternatively, the operating system can be a Macintosh operating system, a UNIX operating system or the LINUX operating system. Software component 311 represents common languages and functions conveniently present in the system to assist programs implementing the methods specific to the present invention. Languages that can be used to program the analytic methods of the invention include, for example, C and C++; PERL; FORTRAN; and JAVA. The methods of the present invention can also be programmed or modeled in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including specific algorithms to be used, thereby freeing a user of the need to procedurally program individual equations and algorithms. Such packages include, e.g., Matlab from MathWorks (Natick, MA), Mathematica from Wolfram Research (Champaign, Illinois) or S-Plus from MathSoft (Seattle, Washington). Accordingly, software component 312 represents analytic methods of the present invention as programmed in a procedural language or symbolic package. In a preferred embodiment, the computer system also contains a database 313 of landmark profiles.
In an exemplary implementation, to practice the methods of the present invention, a user first loads profile data into the computer system 301. These data can be directly entered by the user from monitor 305 and keyboard 306, or from other computer systems linked by network connection 307, or on removable storage media such as a CD-ROM or floppy disk (not illustrated) or through the network (307). Next the user causes execution of profile analysis software 312 which performs the steps of comparing the profile to the database 313 of landmark profiles.
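The comparison step performed by profile analysis software 312 might look like the following sketch. The text does not fix a similarity measure at this point; Pearson correlation is assumed here because correlation coefficients between profiles are used elsewhere in the description. The database contents, landmark names, and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def most_similar_landmarks(profile, landmarks, top=1):
    """Rank landmark profiles by correlation with the query profile.

    landmarks: name -> list of values over the same genes, in the same order.
    Returns the `top` best (name, correlation) pairs.
    """
    scored = sorted(
        ((pearson(profile, values), name) for name, values in landmarks.items()),
        reverse=True,
    )
    return [(name, r) for r, name in scored[:top]]

# Hypothetical landmark database over six genes (log10 ratios).
landmarks = {
    "extra_chrVII": [0.2, 0.2, 0.0, 0.0, 0.0, 0.0],
    "loss_chrIII": [0.0, 0.0, -0.3, -0.3, 0.0, 0.0],
}
query = [0.18, 0.22, 0.01, -0.02, 0.0, 0.01]

best = most_similar_landmarks(query, landmarks)
```

In this toy example the query profile is ranked closest to the landmark with an extra copy of chromosome VII, which is the kind of match the embodiments below interpret as indicating that aneuploidy.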
In one embodiment, a computer system for determining whether aneuploidy is likely to be present in a cell type or organism comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: (a) comparing quantified levels of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality of genes being mapped to the same chromosome, to mean quantified levels of cellular constituents associated with genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to mean quantified levels of said plurality of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
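Steps (a) and (b) of this embodiment can be sketched as below. The thresholds for "substantially the same" and "dissimilar" are not specified in the text, so `min_shift` and `max_spread` are purely illustrative assumptions, as are the gene identifiers and values; ratios are assumed to be log10 values.

```python
import math
from statistics import mean, pstdev

def flag_aneuploid_chromosomes(log_ratios, chrom_of, min_shift=0.15, max_spread=0.05):
    """Flag chromosomes whose genes share a similar level (low spread) that
    differs from the mean level of genes on all other chromosomes.

    Returns chromosome -> shift (mean on chromosome minus mean elsewhere).
    """
    by_chrom = {}
    for gene, ratio in log_ratios.items():
        by_chrom.setdefault(chrom_of[gene], []).append(ratio)
    flagged = {}
    for chrom, values in by_chrom.items():
        others = [r for c, v in by_chrom.items() if c != chrom for r in v]
        if len(values) < 2 or not others:
            continue  # need a plurality of genes and a comparison set
        shift = mean(values) - mean(others)              # step (a)
        if abs(shift) >= min_shift and pstdev(values) <= max_spread:  # step (b)
            flagged[chrom] = shift
    return flagged

# Hypothetical profile: three genes on each of four chromosomes.
values = {
    "VII": [0.21, 0.19, 0.20],
    "I": [0.0, 0.02, -0.02],
    "III": [0.01, -0.01, 0.0],
    "IV": [0.0, 0.01, -0.01],
}
log_ratios, chrom_of = {}, {}
for chrom, vals in values.items():
    for i, v in enumerate(vals):
        gene = f"{chrom}_g{i}"
        log_ratios[gene] = v
        chrom_of[gene] = chrom

flagged = flag_aneuploid_chromosomes(log_ratios, chrom_of)
```

Only chromosome VII is flagged in this example. Because the comparison mean includes the aneuploid chromosome when other chromosomes are tested, a real implementation would likely need more chromosomes or a more robust baseline than this sketch uses.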
In another embodiment, a computer system for detecting the predisposition of a cell type or organism to a disease comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: (a) comparing quantified levels of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality of genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present, and that said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
In another embodiment, a computer system for diagnosing a disease associated with a known aneuploidy in a cell type or type of organism comprises one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
In still another embodiment, a computer system for detecting the presence of aneuploidy in a cell type or type of organism comprises one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism. 
In still another embodiment, a computer system for determining whether aneuploidy is likely to be present in a cell type or organism comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
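One way to picture the identifying step is the sketch below. It assumes, for illustration only, that "not similar" is judged by a low Pearson correlation between a constituent's responses in the suspect cell and the average response of the co-varying set in the wild type; the threshold `min_r`, the gene names, and the data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def deviating_members(test_responses, wildtype_responses, min_r=0.8):
    """Return members of a wild-type co-varying set whose responses in the
    suspect cell no longer track the set's wild-type consensus response.

    Both arguments map gene -> responses over the same ordered perturbations.
    """
    n = len(next(iter(wildtype_responses.values())))
    consensus = [
        sum(v[i] for v in wildtype_responses.values()) / len(wildtype_responses)
        for i in range(n)
    ]
    return sorted(
        gene for gene, resp in test_responses.items()
        if pearson(resp, consensus) < min_r
    )

# Hypothetical co-varying set of three genes across four perturbations.
wildtype = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.0],
    "C": [0.9, 1.9, 3.1, 4.1],
}
suspect = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.0, 2.1, 3.0, 3.9],
    "C": [2.5, 2.4, 2.6, 2.5],  # no longer co-varies with the set
}

outliers = deviating_members(suspect, wildtype)
```

In this example only gene C is returned, which under the embodiment above would suggest that the gene encoding C may have an altered copy number.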
In still another embodiment, a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
In another embodiment, a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
In yet another embodiment, a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
In still another embodiment, a computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment comprises: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
In an exemplary implementation, a computer program product for directing a user computer in a computer-aided determination of whether aneuploidy is likely to be present in a cell type or organism comprises: computer code for comparing quantified levels of a plurality of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with a plurality of genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
In another exemplary implementation, a computer program product for directing a user computer in a computer-aided determination of the predisposition of a cell type or organism to a disease associated with a known aneuploidy comprises: computer code for comparing quantified levels of a plurality of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with a plurality of genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of cellular constituents associated with genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present, and that said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
In another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprises: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism. 
h still another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprises: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism. 
In yet another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided determination that aneuploidy is likely to be present in a cell type or organism, said computer program product comprises: computer code for identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
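One way to picture the identification step is sketched below, under stated assumptions. For each member of a wild-type co-varying set, its response across perturbations in the suspect cell is correlated with the mean response of the other members, and members that fail to track the set are flagged. The correlation measure, the 0.5 threshold, the gene names, and the data are all hypothetical and are not taken from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def deviating_constituents(responses, min_corr=0.5):
    """Return members of a co-varying set that no longer track the set.

    `responses` maps each set member to its responses, in the suspect cell,
    across the same ordered list of perturbations.
    """
    flagged = []
    names = list(responses)
    for name in names:
        others = [responses[o] for o in names if o != name]
        # Mean response of the remaining set members, per perturbation.
        mean_others = [sum(col) / len(others) for col in zip(*others)]
        if pearson(responses[name], mean_others) < min_corr:
            flagged.append(name)
    return flagged

# Hypothetical responses to four perturbations (assumed values).
responses = {
    "GENE_A": [1.0, -0.5, 0.8, -0.2],
    "GENE_B": [0.9, -0.4, 0.7, -0.3],
    "GENE_C": [1.1, -0.6, 0.9, -0.1],
    "GENE_D": [-0.2, 0.6, -0.1, 0.5],  # no longer co-varies with the set
}
flagged = deviating_constituents(responses)
```

A flagged constituent is only a candidate: in the terms used above, it indicates that aneuploidy of the encoding gene is likely, not that it has been established.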
In yet another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
In another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
In still another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
In still another exemplary implementation, a computer program product is provided for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprises: computer code for dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
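The offset correction described in the preceding implementations can be sketched as follows. This is an illustration under an explicit assumption that is not stated in the text: quantified levels are taken to be log10 expression ratios, so that the "difference" of mean levels is the logarithm of the offset ratio, and "dividing" a level by that ratio corresponds to subtracting the offset in log space. The function names, gene names, chromosome labels, and values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def mean_chromosomal_offset(log_ratios, chrom_of, target_chrom, reference_chroms):
    """Mean log ratio on the suspect chromosome minus the mean log ratio
    on chromosomes assumed to have wild-type copy number."""
    target = [v for g, v in log_ratios.items() if chrom_of[g] == target_chrom]
    reference = [v for g, v in log_ratios.items() if chrom_of[g] in reference_chroms]
    return mean(target) - mean(reference)

def correct_profile(log_ratios, chrom_of, target_chrom, reference_chroms):
    """Remove the chromosome-wide offset from genes on the target chromosome.

    Subtracting in log space is the same as dividing the linear ratio by
    10 ** offset (assumption: profiles are log10 ratios).
    """
    offset = mean_chromosomal_offset(log_ratios, chrom_of, target_chrom, reference_chroms)
    return {
        g: (v - offset if chrom_of[g] == target_chrom else v)
        for g, v in log_ratios.items()
    }

# Hypothetical data: g1 and g2 lie on a duplicated chromosome "VII".
chrom_of = {"g1": "VII", "g2": "VII", "g3": "I", "g4": "I"}
log_ratios = {"g1": 0.30, "g2": 0.32, "g3": 0.01, "g4": -0.01}

offset = mean_chromosomal_offset(log_ratios, chrom_of, "VII", {"I"})
corrected = correct_profile(log_ratios, chrom_of, "VII", {"I"})
```

After correction, the genes on the target chromosome are centered on the same mean as the reference genes, so residual differences reflect regulation of individual genes and not the chromosome-wide copy-number effect.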
Alternative computer systems and software for implementing the analytic methods of this invention will be apparent to one of skill in the art and are intended to be comprehended within the accompanying claims. In particular, the accompanying claims are intended to include the alternative program structures for implementing the methods of this invention that will be readily apparent to one of skill in the art.
5.9 ANALYTIC KIT IMPLEMENTATION
In a preferred embodiment, the methods of this invention can be implemented by use of kits for determining the biological state of a cell type or organism. Such kits contain microarrays, such as those described in subsections below. The microarrays include one or more test probes, each of which has a polynucleotide sequence that is complementary to a sequence of RNA or DNA to be detected. Each probe preferably has a different nucleic acid sequence, and the position of each probe on the solid surface is preferably known. Indeed, the microarrays are preferably addressable arrays, and more preferably are positionally addressable arrays. Specifically, each probe (or group of identical probe molecules) of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface). In preferred embodiments, each probe is covalently attached to the solid support at a single site. In particular, the probes contained in the kits of this invention are nucleic acids capable of hybridizing specifically to nucleic acid sequences derived from RNA species that are known to increase or decrease in a cell or organism having a particular altered gene copy number that is detected by the kit. The probes contained in the kits of the invention preferably substantially exclude nucleic acids that hybridize to RNA species that are not increased or decreased in a cell or organism having a particular altered gene copy number that is detected by the kit.
In one embodiment, the kits of the invention comprise an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of said subject, wherein said different nucleotide sequences are known to be increased or decreased as a result of aneuploidy; and expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in a cell of said subject having a known alteration in copy number of said at least one gene. In another embodiment, the kits of the invention comprise an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism, and a container comprising RNA, or cDNA derived therefrom, of a cell having a known aneuploidy.
In particular, kits can be used to diagnose a disease associated with aneuploidy in a cell type or organism, i.e., by determining the profile of the cell type or organism and comparing the profile to a compendium of landmark profiles from cells having a known alteration in copy number in at least one gene that is associated with a disease in order to determine if the cell type or organism exhibits the aneuploidy associated with the disease. Alternatively, a profile of a first cell at a later developmental stage can be predicted from the profile of the first cell measured at an earlier developmental stage and can be compared to a compendium having profiles from a second cell that is at a developmental stage more similar to the later developmental stage of the first cell and exhibiting aneuploidy associated with a disease in order to determine the first cell's predisposition to the disease.
Diseases in humans associated with aneuploidy that can be diagnosed or predicted using the kits of the invention include, but are not limited to, trisomic diseases such as Down syndrome (trisomy of chromosome 21), Edwards syndrome (trisomy of chromosome 18) and Patau syndrome (trisomy of chromosome 13); diseases associated with deletions of an arm of a chromosome, such as cri du chat syndrome (5p deletion) and Wolf-Hirschhorn syndrome (4p deletion); diseases associated with contiguous gene syndromes such as Alagille syndrome (20p12 deletion), Angelman syndrome (15q11 deletion on the maternal chromosome), DiGeorge syndrome (22q11.21 deletion), Langer-Giedion syndrome (8q24.1 deletion), Miller-Dieker syndrome (17p13.3 deletion), Prader-Willi syndrome (15q11 deletion on the paternal chromosome), Rubinstein-Taybi syndrome (16p13 deletion), Smith-Magenis syndrome (17p11.2 deletion), and Williams syndrome (7q11.23 deletion); diseases associated with sex chromosome abnormalities such as Turner syndrome (45,X female or 45,X/46,XX or 45,X/47,XXX mosaics), Triple X syndrome (47,XXX), rare X chromosome abnormalities (48,XXXX and 49,XXXXX), Klinefelter's syndrome (47,XXY male) and 47,XYY syndrome.
Certain cancers in humans are also associated with gene amplifications, deletions or translocations and can be diagnosed or predicted using the kits of the present invention. These cancers that may be associated with aneuploidy include, but are not limited to, colon cancer; breast cancer; leukemias, such as acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, and acute myelomonocytic leukemia; lymphomas, such as Burkitt's lymphoma, and non-Hodgkin's lymphoma; lymphocytic leukemias, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia; myeloproliferative diseases; adenocarcinomas including small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, and ovarian cancer; sarcomas including liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor and peripheral neuroepithelioma; testicular and ovarian dysgerminoma; retinoblastoma; Wilms' tumor; neuroblastoma; malignant melanoma; and mesothelioma.
Alternatively, the kits of the invention may be used to detect or predict phenotypes, including beneficial phenotypes, resulting from the presence of aneuploidy in a cell type or organism.
Alternative kits for implementing the analytic methods of this invention will be apparent to one of skill in the art and are intended to be comprehended within the accompanying claims.
5.10 METHODS FOR DETERMINING PROFILES USING TRANSCRIPT MICROARRAYS
This section provides details of a preferred embodiment wherein polynucleotide microarrays are used to determine profiles comprising measurements of RNA transcript levels. One of skill in the art will appreciate that there are numerous other possible methods for determining profiles including, but not limited to, the use of protein arrays, and that profiles may also comprise measurements (or changes in measurements), e.g., of abundances of protein species, protein activities, and/or levels of protein modification.
In general, the profiling methods of the present invention can be performed using any probe or probes that comprise a polynucleotide sequence and which are immobilized to a solid support or surface. For example, the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof. For example, the polynucleotide sequences of the probes may be full or partial sequences of genomic DNA, cDNA, or mRNA sequences extracted from cells. The polynucleotide sequences of the probes may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences. The probe sequences can be synthesized either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.
The probe or probes used in the methods of the invention are preferably immobilized to a solid support which may be either porous or non-porous. For example, the probes of the invention may be polynucleotide sequences that are attached to a nitrocellulose or nylon membrane or filter. Such hybridization probes are well known in the art (see, e.g., Sambrook et al, Eds., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York). Alternatively, the solid support or surface may be a glass or plastic surface.
5.10.1 MICROARRAYS GENERALLY
This invention is particularly useful for the analysis of gene expression profiles in order to determine the likelihood of alterations to the genotype of a cell. Some embodiments of this invention are based on measuring the transcriptional state of a cell. The transcriptional state can be measured by techniques of hybridization to microarrays of probes consisting of a solid phase on the surface of which are immobilized a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics. The solid phase may be a nonporous or, optionally, a porous material such as a gel. In various alternative embodiments, microarrays can be employed for analyzing aspects of the biological state of a cell other than the transcriptional state, such as the translational state, the activity state, or mixed aspects.
In preferred embodiments, a microarray comprises a support or surface with an ordered array of binding (e.g., hybridization) sites or "probes" for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes. Preferably the microarrays are addressable arrays, preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface). In preferred embodiments, each probe is covalently attached to the solid support at a single site. Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain characteristics: The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions, and include large nylon arrays, such as those sold by Research Genetics. The microarrays are preferably small, e.g., between 5 cm² and 25 cm², preferably between 12 cm² and 13 cm². However, larger arrays are also contemplated and may be preferable, e.g., for use in screening and/or signature chips comprising a very large number of distinct oligonucleotide probe sequences. Preferably, a given binding site or unique set of binding sites in the microarray will specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific mRNA, or to a specific cDNA derived therefrom). However, in general, other related or similar sequences may cross-hybridize to a given binding site.
Although there may be more than one physical binding site per specific RNA or DNA, for the sake of clarity the discussion below will assume that there is a single, completely complementary binding site.
The microarrays of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected. Each probe preferably has a different nucleic acid sequence, and the position of each probe on the solid surface is preferably known. Indeed, the microarrays are preferably addressable arrays, and more preferably are positionally addressable arrays. Specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface).
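Positional addressing amounts to a lookup from a location on the support to the identity of the probe placed there. The sketch below illustrates that idea only; the row-by-row layout, the function names, and the short probe sequences are assumptions made for the example.

```python
def build_array(probes, n_cols):
    """Assign each distinct probe sequence to a (row, column) position,
    filling the support row by row."""
    if len(set(probes)) != len(probes):
        raise ValueError("each probe should have a different sequence")
    return {divmod(i, n_cols): seq for i, seq in enumerate(probes)}

def probe_at(array, position):
    """Recover the identity (sequence) of a probe from its position."""
    return array[position]

# Hypothetical probe sequences laid out on a support two columns wide.
probes = ["ACGTTGCA", "TTGACCGA", "GGCATCAT", "CATGGTCA"]
array = build_array(probes, n_cols=2)
identity = probe_at(array, (1, 0))  # second row, first column
```

Because the mapping is fixed when the array is made, a signal read at a given position can be attributed to a specific sequence without any further identification step.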
Preferably, the density of probes on a microarray is between about 100 and 1,000 different (i.e., non-identical) probes per 1 cm². More preferably, a microarray of the invention will have between about 1,000 and 5,000 different probes per 1 cm², between about 5,000 and 10,000 different probes per 1 cm², between about 10,000 and 15,000 different probes per 1 cm² or between about 15,000 and 20,000 different probes per 1 cm². In a particularly preferred embodiment, the microarray is a high density array, preferably having a density of between about 1,000 and 5,000 different probes per 1 cm². The microarrays of the invention therefore preferably contain at least 2,500, at least 5,000, at least 10,000, at least 15,000, at least 20,000, at least 25,000, at least 50,000, at least 55,000, at least 100,000 or at least 150,000 different (i.e., non-identical) probes.
In specific embodiments, the density of probes on a microarray is between about 100 and 1,000 different (i.e., non-identical) probes per 1 cm², between 1,000 and 5,000 different probes per 1 cm², between 5,000 and 10,000 different probes per 1 cm², between 10,000 and 15,000 different probes per 1 cm², between 15,000 and 20,000 different probes per 1 cm², between 20,000 and 50,000 different probes per 1 cm², between 50,000 and 100,000 different probes per 1 cm², between 100,000 and 500,000 different probes per 1 cm², or more than 500,000 different (i.e., non-identical) probes per 1 cm². In one embodiment, the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (i.e., an mRNA or a cDNA derived therefrom), and in which binding sites are present for products of most or almost all of the genes in the organism's genome. For example, the binding site can be a DNA or DNA analogue to which a particular RNA can specifically hybridize. The DNA or DNA analogue can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment.
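The relationship between probe density, array area, and total probe count is simple arithmetic, shown here as a short sketch. The function names are hypothetical, and the specific combinations of density and area are chosen for illustration from within the ranges given above.

```python
def probe_capacity(density_per_cm2, area_cm2):
    """Number of distinct probes that fit on an array of the given area."""
    return int(density_per_cm2 * area_cm2)

def min_area_cm2(n_probes, density_per_cm2):
    """Smallest array area that holds n_probes at the given density."""
    return n_probes / density_per_cm2

# A 12.5 cm^2 array at 1,000 probes per cm^2.
capacity = probe_capacity(1_000, 12.5)

# Area needed for 20,000 probes at 5,000 probes per cm^2.
area = min_area_cm2(20_000, 5_000)
```

Under these assumed figures, an array in the preferred 12 cm² to 13 cm² size range at 1,000 probes per cm² holds on the order of 12,500 distinct probes.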
Although in a preferred embodiment the microarray contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required. Usually the microarray will have binding sites corresponding to at least about 5% of the genes in the genome, sometimes to as many as 25%, often to at least about 50%, more often to at least about 75%, even more often to at least about 85%, even more often to about 90%, and still more often to at least about 99%. Alternatively, however, "picoarrays," which may have binding sites for several hundred genes, may also be used. Such arrays are microarrays which contain binding sites for products of only a limited number of genes in the target organism's genome. Generally, a picoarray contains binding sites corresponding to fewer than about 50% of the genes in the genome of an organism.
Preferably, the microarray has binding sites for genes associated with one or more biological pathways responsible for producing a phenotype of interest. A "gene" is typically identified as the portion of DNA that is transcribed by RNA polymerase. Thus, a gene may include a 5' untranslated region ("UTR"), introns, exons and a 3' UTR. Thus, a gene comprises at least 25 to 100,000 nucleotides from which a messenger RNA is transcribed in the organism or in some cell in a multicellular organism. The number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well characterized portion of the genome. When the genome of an organism of interest that has few introns, such as yeast, has been sequenced, the number of open reading frames ("ORFs") can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the genome of Saccharomyces cerevisiae has been completely sequenced, and is reported to have approximately 6275 ORFs longer than 99 amino acids. Analysis of these ORFs indicates that there are 5885 ORFs that are likely to encode protein products (Goffeau et al., 1996, Science 274:546-567). In contrast, the human genome is estimated to contain approximately 10⁵ genes, although estimates vary from about 35,000 to about 120,000 genes (Crollius et al. (2000) Nat. Genetics 25:235-238; Ewing et al. (2000) Nat. Genetics 25:232-234; Liang et al. (2000) Nat. Genetics 25:239-240).
5.10.2 PREPARATION OF PROBES FOR MICROARRAYS
As noted above, the "probe" to which a particular polynucleotide molecule specifically hybridizes according to the invention is a complementary polynucleotide sequence. In one embodiment, the probes of the microarray comprise nucleotide sequences greater than about 250 bases in length corresponding to one or more genes or gene fragments. For example, the probes may comprise DNA or DNA "mimics" (e.g., derivatives and analogues) corresponding to at least a portion of each gene in an organism's genome. In another embodiment, the probes of the microarray are complementary RNA or RNA mimics. DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of specific hybridization with RNA. The nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone. Exemplary DNA mimics include, e.g., phosphorothioates. DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences. PCR primers are preferably chosen based on known sequence of the genes or cDNA that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray). Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). Typically each probe on the microarray will be between 20 bases and 50,000 bases, and usually between 300 bases and 1000 bases in length. PCR methods are well known in the art, and are described, for example, in Innis et al., eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA.
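The uniqueness criterion for amplified fragments (no more than 10 bases of contiguous identical sequence shared with any other fragment) can be checked by testing whether two fragments have any 11-base substring in common. The following is a minimal sketch of that check, not the procedure used by any particular primer-design program; the function names and sequences are hypothetical, and only the strand as written is compared (reverse complements are not considered).

```python
def kmers(seq, k):
    """All contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_long_run(a, b, max_shared=10):
    """True if a and b share a contiguous identical run longer than max_shared."""
    k = max_shared + 1
    return not kmers(a, k).isdisjoint(kmers(b, k))

def unique_fragments(fragments, max_shared=10):
    """Names of fragments that share no over-long run with any other fragment."""
    names = list(fragments)
    clashing = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shares_long_run(fragments[a], fragments[b], max_shared):
                clashing.update((a, b))
    return [n for n in names if n not in clashing]

# Hypothetical fragments: frag1 and frag2 share the 11-mer GGGCCGCTGAA.
fragments = {
    "frag1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "frag2": "TTTACGGGCCGCTGAACATTACG",
    "frag3": "CACACACAGTGTGTGTCTCTCTGAGAGAG",
}
unique = unique_fragments(fragments)
```

The pairwise comparison is quadratic in the number of fragments, which is adequate for a sketch; a genome-scale design would more likely index all 11-mers once and look for collisions.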
It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.
An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using N-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acid Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:246-248). Synthetic sequences are typically between about 15 and about 500 bases in length, more typically between about 20 and about 100 bases, most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., 1993, Nature 363:566-568; U.S. Patent No. 5,539,083). In alternative embodiments, the hybridization sites (i.e., the probes) are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Genomics 29:207-209).
5.10.3 ATTACHING PROBES TO THE SOLID SURFACE
The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material. A preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA (see also, DeRisi et al., 1996, Nature Genetics 14:457-460; Shalon et al., 1996, Genome Res. 6:639-645; and Schena et al., 1995, Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286).
A second preferred method for making microarrays is by making high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature Biotechnology 14:1675; U.S. Patent Nos. 5,578,832; 5,556,752; and 5,510,270) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., Biosensors & Bioelectronics 11:687-690). When these methods are used, oligonucleotides (generally of length 20 to 70 bases) of known sequence are synthesized directly on a surface such as a derivatized glass slide. Usually, the array produced is redundant, with several oligonucleotide molecules per RNA. Oligonucleotide probes can be chosen to distinguish between alternatively spliced mRNAs.
Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nuc. Acids. Res. 20:1679-1684), may also be used. In principle, and as noted supra, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., supra) could be used. However, as will be recognized by those skilled in the art, very small arrays will frequently be preferred because hybridization volumes will be smaller. In a particularly preferred embodiment, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in International Patent Publication No. WO 98/41531, published September 24, 1998; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J.K. Setlow, Ed., Plenum Press, New York at pages 111-123; U.S. Patent No. 6,028,189 to Blanchard. Specifically, the oligonucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i.e., the different probes).
5.10.4 TARGET POLYNUCLEOTIDE MOLECULES
Target polynucleotides which may be analyzed by the methods and compositions of the invention include RNA molecules such as, but by no means limited to, messenger RNA (mRNA) molecules, ribosomal RNA (rRNA) molecules, cRNA molecules (i.e., RNA molecules prepared from cDNA molecules that are transcribed in vivo) and fragments thereof. Target polynucleotides which may also be analyzed by the methods and compositions of the present invention include, but are not limited to, DNA molecules such as genomic DNA molecules, cDNA molecules, and fragments thereof including oligonucleotides, ESTs, STSs, etc.
The target polynucleotides may be from any source. For example, the target polynucleotide molecules may be naturally occurring nucleic acid molecules such as genomic or extragenomic DNA molecules isolated from an organism, or RNA molecules, such as mRNA molecules, isolated from an organism. Alternatively, the polynucleotide molecules may be synthesized, including, e.g., nucleic acid molecules synthesized enzymatically in vivo or in vitro, such as cDNA molecules, or polynucleotide molecules synthesized by PCR, RNA molecules synthesized by in vitro transcription, etc. The sample of target polynucleotides can comprise, e.g., molecules of DNA, RNA, or copolymers of DNA and RNA. In preferred embodiments, the target polynucleotides of the invention will correspond to particular genes or to particular gene transcripts (e.g., to particular mRNA sequences expressed in cells or to particular cDNA sequences derived from such mRNA sequences). However, in many embodiments, particularly those embodiments wherein the polynucleotide molecules are derived from mammalian cells, the target polynucleotides may correspond to particular fragments of a gene transcript. For example, the target polynucleotides may correspond to different exons of the same gene, e.g., so that different splice variants of that gene may be detected and/or analyzed. In preferred embodiments, the target polynucleotides to be analyzed are prepared in vitro from nucleic acids extracted from cells. For example, in one embodiment, RNA is extracted from cells (e.g., total cellular RNA, poly(A)+ messenger RNA, or a fraction thereof) and messenger RNA is purified from the total extracted RNA. Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e.g., in Sambrook et al., supra.
In one embodiment, RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299). cDNA is then synthesized from the purified mRNA using, e.g., oligo-dT or random primers. In another preferred embodiment, the target polynucleotides are cRNA prepared from purified messenger RNA extracted from cells. As used herein, cRNA is defined as RNA complementary to the source RNA. The extracted RNAs are amplified using a process in which double-stranded cDNAs are synthesized from the RNAs using a primer linked to an RNA polymerase promoter in a direction capable of directing transcription of anti-sense RNA. Anti-sense RNAs or cRNAs are then transcribed from the second strand of the double-stranded cDNAs using an RNA polymerase (see, e.g., U.S. Patent Nos. 5,891,636; 5,716,785; 5,545,522 and 6,132,997; see also, U.S. Patent Application Serial No. 09/411,074, filed October 4, 1999 by Linsley and Schelter and U.S. Provisional Patent Application Serial No. to be assigned, Attorney Docket No. 9301-124-888, filed on November 28, 2000, by Ziman et al.). Either oligo-dT primers (U.S. Patent Nos. 5,545,522 and 6,132,997) or random primers (U.S. Provisional Patent Application Serial No. to be assigned, Attorney Docket No. 9301-124-888, filed November 28, 2000, by Ziman et al.) that contain an RNA polymerase promoter or complement thereof can be used. Preferably, the target polynucleotides are short and/or fragmented polynucleotide molecules which are representative of the original nucleic acid population of the cell.
The target polynucleotides to be analyzed by the methods and compositions of the invention are preferably detectably labeled. For example, cDNA can be labeled directly, e.g., with nucleotide analogs, or indirectly, e.g., by making a second, labeled cDNA strand using the first strand as a template. Alternatively, the double-stranded cDNA can be transcribed into cRNA and labeled.
Preferably, the detectable label is a fluorescent label, e.g., one introduced by incorporation of nucleotide analogs. Other labels suitable for use in the present invention include, but are not limited to, biotin, iminobiotin, antigens, cofactors, dinitrophenol, lipoic acid, olefinic compounds, detectable polypeptides, electron rich molecules, enzymes capable of generating a detectable signal by action upon a substrate, and radioactive isotopes. Preferred radioactive isotopes include 32P, 35S, 14C, 15N and 125I. Fluorescent molecules suitable for the present invention include, but are not limited to, fluorescein and its derivatives, rhodamine and its derivatives, texas red, 5'carboxy-fluorescein ("FMA"), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein ("JOE"), N,N,N',N'-tetramethyl-6-carboxy-rhodamine ("TAMRA"), 6'carboxy-X-rhodamine ("ROX"), HEX, TET, IRD40, and IRD41. Fluorescent molecules that are suitable for the invention further include: cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes, including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650, and BODIPY-650/670; and ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568, and ALEXA-594; as well as other fluorescent dyes which will be known to those who are skilled in the art. Electron rich indicator molecules suitable for the present invention include, but are not limited to, ferritin, hemocyanin, and colloidal gold. Alternatively, in less preferred embodiments the target polynucleotides may be labeled by specifically complexing a first group to the polynucleotide. A second group, which is covalently linked to an indicator molecule and which has an affinity for the first group, can be used to indirectly detect the target polynucleotide. In such an embodiment, compounds suitable for use as a first group include, but are not limited to, biotin and iminobiotin. Compounds suitable for use as a second group include, but are not limited to, avidin and streptavidin.
5.10.5 HYBRIDIZATION TO MICROARRAYS
As described supra, nucleic acid hybridization and wash conditions are chosen so that the polynucleotide molecules to be analyzed by the invention (referred to herein as the "target polynucleotide molecules") specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to a specific array site, wherein its complementary DNA is located. Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self complementary sequences.
Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of probe and target nucleic acids. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al. (supra), and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York. When the cDNA microarrays of Schena et al. are used, typical hybridization conditions are hybridization in 5 X SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25 °C in low stringency wash buffer (1 X SSC plus 0.2% SDS), followed by 10 minutes at 25 °C in higher stringency wash buffer (0.1 X SSC plus 0.2% SDS) (Schena et al., 1996, Proc. Natl. Acad. Sci. U.S.A. 93:10614). Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V., and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, CA.
Particularly preferred hybridization conditions for use with the screening and/or signaling chips of the present invention include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5 °C, more preferably within 2 °C) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.
5.10.6 SIGNAL DETECTION AND DATA ANALYSIS
It will be appreciated that when cDNA or cRNA complementary to the RNA of a cell is made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array corresponding to any particular gene will reflect the prevalence in the cell of mRNA transcribed from that gene. For example, when detectably labeled (e.g., with a fluorophore) cDNA or cRNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product of the gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
In preferred embodiments, cDNAs or cRNAs from two different cells are hybridized to the binding sites of the microarray. In the case of the instant invention, as an example, one cell is a wild-type cell and another cell is of the same type but is aneuploid. The cDNA or cRNA derived from each of the two cell types are differently labeled so that they can be distinguished. In one embodiment, for example, cDNA or cRNA from an aneuploid cell is synthesized using a fluorescein-labeled dNTP, and cDNA or cRNA from a second, wild-type cell is synthesized using a rhodamine-labeled dNTP. When the two cDNAs or cRNAs are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA or cRNA set is determined for each site on the array, and any relative difference in abundance of a particular mRNA is thereby detected.
In the example described above, the cDNA or cRNA from the aneuploid cell will fluoresce green when the fluorophore is stimulated, and the cDNA or cRNA from the wild-type cell will fluoresce red. As a result, when the aneuploidy has no effect, either directly or indirectly, on the relative abundance of a particular mRNA in a cell, the mRNA will be equally prevalent in both cells, and, upon reverse transcription, red-labeled and green-labeled cDNA or cRNA will be equally prevalent. When hybridized to the microarray, the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores. In contrast, when the aneuploidy either directly or indirectly increases the prevalence of the mRNA in the cell, the ratio of green to red fluorescence will increase. When the mutation decreases the mRNA prevalence, the ratio will decrease.
The use of a two-color fluorescence labeling and detection scheme to define alterations in gene expression has been described, e.g., in Schena et al., 1995, Science 270:467-470. An advantage of using cDNA or cRNA labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene in two cell genotypes can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses.
In one embodiment, the fluorescent labels in two-color differential hybridization experiments are reversed to reduce biases peculiar to individual genes or array spot locations, and consequently, to reduce experimental error. In other words, it is preferable to first measure gene expression with one labeling (e.g., labeling wild-type cells with a first fluorophore and mutant cells with a second fluorophore) of the mRNA from the two cells being measured, and then to measure gene expression from the two cells with reversed labeling (e.g., labeling wild-type cells with the second fluorophore and mutant cells with the first fluorophore).
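By way of illustration only, the benefit of a fluor-reversed pair can be sketched in a few lines of Python. The function name and the choice to average the two log10 ratios with equal weight are assumptions made for this sketch and are not taken from the specification; the point is simply that a multiplicative bias tied to one fluorophore enters the two hybridizations with opposite sign and cancels in the average.

```python
import math


def combined_log_ratio(mut_a, wt_a, wt_b, mut_b):
    """Average log10(mutant / wild-type) over a fluor-reversed pair.

    Hybridization A: mutant carries fluor 1 (mut_a), wild-type fluor 2 (wt_a).
    Hybridization B: labels swapped, wild-type carries fluor 1 (wt_b),
    mutant fluor 2 (mut_b).
    Equal weighting of the two hybridizations is an assumption of this sketch.
    """
    log_a = math.log10(mut_a / wt_a)
    log_b = math.log10(mut_b / wt_b)
    return 0.5 * (log_a + log_b)


# Hypothetical spot: true mutant/wild-type ratio of 2, with fluor 1 reading
# 1.5x too bright at this spot in both hybridizations.
bias = 1.5
value = combined_log_ratio(mut_a=bias * 200.0, wt_a=100.0,
                           wt_b=bias * 100.0, mut_b=200.0)
print(10 ** value)  # close to the true ratio of 2
```

In this hypothetical spot, either hybridization taken alone would report a ratio of 3 or of about 1.33; only the pair recovers the underlying two-fold difference.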
When fluorescently labeled probes are used, the fluorescence emissions at each site of a transcript array can be, preferably, detected by scanning confocal laser microscopy or a charge-coupled device ("CCD"). In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, Genome Res. 6:639-645). In a preferred embodiment, the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Such fluorescence laser scanning devices are described, e.g., in Schena et al., 1996, Genome Res. 6:639-645. Alternatively, the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684, may be used to monitor mRNA abundance levels at a large number of sites simultaneously. Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit analog to digital board. In one embodiment, the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by alterations in the genotype of a cell.
According to the method of the invention, the relative abundance of an mRNA in two cells or cell lines is scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested) or as not perturbed (i.e., the relative abundance is the same). As used herein, a difference between the two sources of RNA of at least a factor of about 25% (i.e., RNA is 25% more abundant in one source than in the other source), more usually about 50%, even more often by a factor of about 2 (i.e., twice as abundant), 3 (three times as abundant), or 5 (five times as abundant) is scored as a perturbation. Present detection methods allow reliable detection of differences of an order of about 3-fold to about 5-fold, but more sensitive methods are expected to be developed. Preferably, in addition to identifying a perturbation as positive or negative, it is advantageous to determine the magnitude of the perturbation. This can be carried out, as noted above, by calculating the ratio of the emission of the two fluorophores used for differential labeling, or by analogous methods that will be readily apparent to those of skill in the art.
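The scoring rule just described can be illustrated with a short Python sketch. The function name, the return format, and the default two-fold threshold are choices made for this illustration only; the specification itself contemplates thresholds ranging from about 25% to about five-fold.

```python
def score_perturbation(signal_a, signal_b, min_fold=2.0):
    """Score one array site as perturbed or not from two channel signals.

    Returns (is_perturbed, fold, direction), where fold is the larger of
    a/b and b/a and direction is "up", "down" or "none" relative to
    signal_b. The threshold min_fold is an illustrative assumption.
    """
    if signal_a <= 0 or signal_b <= 0:
        raise ValueError("signals must be positive")
    ratio = signal_a / signal_b
    fold = max(ratio, 1.0 / ratio)
    if fold < min_fold:
        return False, fold, "none"
    return True, fold, "up" if ratio > 1 else "down"


print(score_perturbation(300.0, 100.0))                 # three-fold up
print(score_perturbation(100.0, 400.0))                 # four-fold down
print(score_perturbation(110.0, 100.0, min_fold=1.25))  # below threshold
```

Reporting the fold value alongside the yes/no call reflects the preference stated above for determining the magnitude of a perturbation and not merely its presence.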
6. EXAMPLES
The following example is presented by way of illustration of the previously described invention and is not limiting of that description.
6.1 EXAMPLE 1: DETECTION OF ANEUPLOIDY IN YEAST USING DNA MICROARRAY EXPRESSION PROFILING
This example demonstrates the appearance of aneuploidy in yeast knockout strains detected using the methods of the invention.
6.1.1 MATERIALS AND METHODS
Yeast strains. The genotypes of the nearly 300 strains used to generate expression profiles can be found at the Rosetta Inpharmatics, Inc. web site (www.rii.com). Essentially all 300 strains are derived from strain BY4743 (MATa/MATα his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 ura3Δ0/ura3Δ0 +/met15Δ0 +/lys2Δ0), the parental strain for the International Saccharomyces Genome Deletion Consortium. Winzeler, E.A. et al. (1999) Science 285:901-906 (Stanford University web site).
In order to minimize the potential impact of unlinked recessive mutations associated with strain construction, homozygous diploid deletion mutants were profiled when possible. All deletions extend from start codon to stop codon. For experiments involving tet-regulatable genes, the natural promoter on the chromosome was replaced with a heptamerized tet operator fused to a kanamycin-resistance cassette enabling direct integration, and the 'tet activator' (tTA*, which dissociates in the presence of doxycycline) was supplied either on a CEN plasmid (Gari et al. (1997) Yeast 13:837-848) or integrated into the genome.
Yeast culture and cDNA microarray expression analysis. Experimental (mutant) cultures were grown, harvested, and processed in parallel with corresponding wild-type or control cultures. Several colonies of similar size were picked from freshly-streaked YAPD agar plates into liquid Synthetic Complete medium (SC) with 2% glucose, grown overnight at 30 °C to mid-log phase, diluted to 0.4-1.0 x 10^6 cells/ml, and grown an additional 5-7 hours until reaching 0.4-1.0 x 10^7 cells/ml, at which point they were sedimented by centrifugation for 2 minutes at room temperature and frozen in liquid nitrogen. The final optical densities of experimental and control cultures were matched as closely as possible. Total RNA was prepared by phenol:chloroform extraction followed by ethanol precipitation, as described previously (Marton et al. (1998) Nat. Med. 4:1293-1301), except that vortexing with glass beads was replaced by a 10 minute incubation at 65 °C followed by 1 minute of vortexing. Poly-A+ RNA purification, cDNA labeling, microarray production, and microarray hybridization and washing were as described previously (Marton et al. (1998) Nat. Med. 4:1293-1301) with measurements taken in fluor-reversed pairs. Arrays were scanned, images were quantitated and physical artifacts (dust and salt residue) edited as described previously (Marton et al. (1998) Nat. Med. 4:1293-1301). Resulting data files were evaluated by a series of quality-control criteria relating first to the image itself and second to known biological artifacts. Experiments flagged as containing biological artifacts were noted but not excluded, for purposes of illustrating the impact of biases.
Genomic DNA extraction, labeling, and hybridization to microarrays. In order to confirm the presence of aneuploidy, genomic DNA was extracted, with minor modifications to standard techniques, from 5 ml saturated cultures grown in YPD medium. See Hoffman, C.S. & Winston, F. (1987) Gene 57:267-272. Two micrograms of genomic DNA were denatured and annealed to 1 μg random hexamers, and labeled at 37 °C in 15 μl reactions containing 1x NEB buffer 2, 7 units of Klenow fragment of DNA Polymerase I, 500 μM dATP, dCTP, and dGTP, 200 μM dUTP, and 100 μM Cy-dUTP. Production of cDNA microarrays, hybridizations, washing, and image analysis were performed according to the two-color procedure described in Marton et al. (1998) Nat. Med. 4:1293-1301. Microarrays were scanned on either a General Scanning ScanArray 3000 or a Genetic Microsystems 418 Array Scanner. For determination of aneuploidy in small colonies versus large colonies, cells were streaked on five plates, and approximately 2000 small colonies or 50 large colonies were picked using a toothpick and resuspended directly in lysis buffer for DNA extraction (Hoffman, C.S. & Winston, F. (1987) Gene 57:267-272).
Analysis of data
Normalization. The expression ratio is the expression level of a gene in the mutant relative to the wild-type control, conveyed as a ratio. A correlation plot (Figs. 1a-c, 5a) displays the expression ratio of each gene from one profile plotted versus its expression ratio in a second expression profile. The Cy3 and Cy5 channels are normalized by the mean signal intensity for all yeast ORF spots. Thus, by convention, the mean expression ratio for all spots is unity.
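As a purely illustrative sketch of this normalization, the following Python function divides each channel by its own mean over all ORF spots before forming the per-spot ratio. The function name and the assignment of the mutant sample to the Cy5 channel are assumptions of the sketch; in a fluor-reversed pair the assignment is swapped in the second hybridization.

```python
def expression_ratios(cy5, cy3):
    """Per-spot expression ratios after mean-intensity normalization.

    cy5, cy3: background-corrected intensities for the same ORF spots,
    in the same order. Here Cy5 is assumed to carry the mutant sample
    and Cy3 the wild-type control (an assumption of this sketch).
    """
    if len(cy5) != len(cy3) or not cy5:
        raise ValueError("channels must be non-empty and of equal length")
    mean5 = sum(cy5) / len(cy5)
    mean3 = sum(cy3) / len(cy3)
    return [(a / mean5) / (b / mean3) for a, b in zip(cy5, cy3)]


# A uniformly brighter Cy5 channel is removed by the normalization.
print(expression_ratios([200.0, 400.0, 600.0], [100.0, 200.0, 300.0]))
# Genuine differences between spots survive it.
print(expression_ratios([100.0, 300.0], [200.0, 200.0]))
```

Because each channel is scaled by its own mean, a difference in overall labeling efficiency or scanner gain between the two fluorophores does not by itself shift the ratios away from unity.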
Whole chromosome aneuploidy. The mean chromosomal ratio plots (Figs. 1, 4, 5) display, in logarithmic scale, the average of all expression ratios for each individual chromosome. The mean expression ratio for each chromosome is an error-weighted mean of all the ORFs present on that chromosome, with the error calculated based on the individual spot intensity and the slide quality, i.e., the degree to which the determined ratios agree in each of two slides from a fluor-reversed pair of hybridizations done per experiment. A chromosome was flagged as having a statistically significant chromosome-wide expression bias if the mean chromosomal ratio had an offset of greater than 0.1 in log space and was at least ten standard deviations from the mean (P < 10^-20). P values were calculated from the number of standard deviations from the mean, assuming a Gaussian distribution, which was verified by analysis of 63 wild-type vs wild-type control experiments (Hughes et al. (2000) Cell 102(1):109-126). The estimated systematic bias of each chromosome with respect to the mean is at the level of 0.0016 of log10(ratio). The error bar of the mean ratio in log space was computed from the spread of the data, taking into account the error of each point and the number of data points.
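The flagging criterion can be sketched as follows. This is a minimal illustration, not the error model of the specification: it assumes inverse-variance weighting of the per-ORF log10 ratios and takes the standard error of the weighted mean as the unit of "standard deviations from the mean", whereas the specification also folds the spread of the data into the error bar. All function and parameter names are hypothetical.

```python
import math


def weighted_mean(log_ratios, errors):
    """Inverse-variance weighted mean of log10 ratios and its standard error.

    Inverse-variance weighting is an assumption of this sketch.
    """
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, log_ratios)) / total
    return mean, math.sqrt(1.0 / total)


def flag_chromosome(log_ratios, errors, min_offset=0.1, min_sd=10.0):
    """Flag a chromosome-wide bias: offset > 0.1 in log space and >= 10 SD."""
    mean, se = weighted_mean(log_ratios, errors)
    return abs(mean) > min_offset and abs(mean) / se >= min_sd


# Hypothetical chromosome: 100 ORFs, each with log10(ratio) 0.3 +/- 0.1.
print(flag_chromosome([0.3] * 100, [0.1] * 100))   # large offset, many ORFs
print(flag_chromosome([0.05] * 100, [0.1] * 100))  # offset below 0.1
print(flag_chromosome([0.3] * 4, [0.1] * 4))       # too few ORFs for 10 SD
```

Requiring both conditions means that a tiny but statistically well-determined offset is not flagged, and neither is a large offset supported by only a handful of ORFs.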
Segmental aneuploidy. To explore expression profiling data for potential occurrences of segmental aneuploidy, data were scanned for instances in which four or more non-overlapping, chromosomally-adjacent genes were all up- or down-regulated at a 0.05 significance threshold. Twenty-two cases were identified in which at least four adjacent genes were apparently coordinately regulated. Four cases were tested (three of the four are listed in Table 1d) and all were confirmed experimentally by genomic DNA hybridization. The rpl20aΔ mutant contained a 56-ORF duplication from YOR290c to YOR343c, which in the wild-type is flanked by retrotransposon long terminal repeats (LTRs) and a Ty2 transposon on the centromeric and telomeric sides, respectively. The top3Δ mutant contained a 28-ORF duplication from YLR228c to YLR256w, which in the wild-type is flanked by LTRs and a Ty1 transposon on the centromeric and telomeric sides, respectively. The genomic DNA hybridization of the rad27Δ mutant was consistent with an 18-ORF deletion from YDR367w to YDR385w. The centromeric side of the duplicated region is flanked by two LTRs and a Ty1 transposon, whereas the telomeric side has no obvious sequence features.
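The scan for coordinately regulated neighbors can be sketched as a single pass over the genes of one chromosome. This is an illustrative sketch under stated assumptions: the input is assumed to be already ordered by chromosomal position and already filtered to non-overlapping genes, and the function name, tuple layout, and gene names are hypothetical.

```python
def find_segments(genes, alpha=0.05, min_run=4):
    """Find runs of adjacent genes all significantly up or all down.

    genes: list of (name, log_ratio, p_value) in chromosomal order.
    Returns a list of (sign, [names]) with sign +1 (up) or -1 (down)
    for every maximal run of at least min_run genes.
    """
    segments = []
    run, run_sign = [], 0
    for name, log_ratio, p_value in genes:
        sign = 0
        if p_value < alpha and log_ratio != 0:
            sign = 1 if log_ratio > 0 else -1
        if sign != 0 and sign == run_sign:
            run.append(name)  # extend the current run
            continue
        if len(run) >= min_run:
            segments.append((run_sign, run))  # close a qualifying run
        run = [name] if sign != 0 else []
        run_sign = sign
    if len(run) >= min_run:
        segments.append((run_sign, run))
    return segments


example = [
    ("g1", 0.3, 0.01), ("g2", 0.4, 0.01), ("g3", 0.2, 0.02),
    ("g4", 0.5, 0.001), ("g5", 0.1, 0.5),
    ("g6", -0.3, 0.01), ("g7", -0.2, 0.01), ("g8", -0.4, 0.01),
]
print(find_segments(example))  # one up-regulated run of four genes
```

In the example, the run of three down-regulated genes is not reported because it falls short of the four-gene minimum used in the text.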
6.1.2 RESULTS AND DISCUSSION
Using a two-color competitive hybridization DNA microarray protocol (DeRisi et al. (1997) Science 278:680-686; Marton et al. (1998) Nature Med. 4:1293-1301), expression profiles were generated for nearly three hundred S. cerevisiae deletion mutants as previously described (Hughes et al. (2000) Cell 102:109-126), mostly obtained through the Saccharomyces Genome Deletion Consortium. See Winzeler, E.A. et al. (1999) Science 285:901-906. An unexpected expression profile similarity (r = 0.63; Fig. 1a) was observed between mutants harboring null mutations in ERG4 and ECM18. See Lai, M.H. et al. (1994) Gene 140:41-49; Lussier, M. et al. (1997) 147:435-450. Many of the shared up-regulations corresponded to genes located on chromosome VII, but not other chromosomes (Fig. 1b,c). A plot of the mean of the expression ratios for all genes on a particular chromosome revealed that, on average, the expression of all genes on chromosome VII was higher in the erg4Δ and ecm18Δ/ecm18Δ mutants, respectively, than in the parental wild-type control to which the mutant was compared (Fig. 1d,e; circles). To determine whether this increased expression could be explained by increased gene dosage, genomic DNA from the mutant and parental wild-type strains was isolated, labeled and hybridized to DNA microarrays, and the results plotted in the same manner (Fig. 1d,e; squares). These data indicate that the mutants possess more genomic DNA from chromosome VII than the wild-type control. Because the elevated gene expression and genomic DNA ratios include essentially all genes on the chromosome, the simplest model explaining these observations is that the mutant strains contain an additional copy or copies of chromosome VII.
The discovery of a spurious correlation resulting from aneuploidy in two independent yeast mutants not known to suffer chromosome instability prompted a search for additional examples of aneuploidy in a collection of expression profiles. Plots of the mean expression ratio for each chromosome for all other mutants profiled revealed that expression profiles from ~8% of the mutants (22 of 290) contained at least one chromosome that displayed a mean chromosomal ratio bias greater than 0.1 in log space and that was at least ten standard deviations from the mean. Each case was confirmed by hybridizing genomic DNA from the mutant strains to microarrays as above (Table 1a,b,c). While several of these mutants (bub1, bub3, and bim1) are known to have defects in chromosome segregation, the majority are not thought to be directly involved in genome stability (Hoyt, M.A., Totis, L. & Roberts, B.T. (1991) Cell 66:507-517; Schwartz, K., Richards, K. & Botstein, D. (1997) Mol. Biol. Cell 8:2677-2691).
In five of the aneuploid mutants (Table 1c), the additional chromosome harbored a gene encoding a highly related (80-90% identical) protein. For example, the rps24aΔ/rps24aΔ and rnr1Δ strains profiled both contained extra copies of chromosome IX, on which are located RPS24B (91% identical to RPS24A) and RNR3 (80% identical to RNR1). In all five cases, the deletions resulted in a slow-growth phenotype, suggesting that gain of the whole chromosome may have been a result of a selection for increased growth rate by increasing gene dosage of the paralog (i.e., RPS24B and RNR3) of the deleted genes. When slow-growing colonies of the rps24aΔ/rps24aΔ and rnr1Δ mutants were streaked on solid medium, fast-growing colonies were observed (Fig. 4a). Comparative hybridization of genomic DNA from pooled large colonies versus pooled small colonies for each mutant revealed that large colonies contained an additional copy or copies of chromosome IX (Fig. 4b,c), strongly suggesting that the extra chromosome provided a selective growth advantage.
Expression biases within chromosomal segments (Table 1d) were identified by plotting the expression ratio of each gene as a function of its chromosomal location. An expression bias in a 56-ORF region on the right arm of chromosome XV was noted in the rpl20aΔ/rpl20aΔ mutant expression profile (Fig. 2d). The genomic content data (Fig. 2c) mirror precisely the expression data in this region, suggesting that duplication can completely explain the expression bias. Interestingly, this region (between ORFs YOR290c and YOR343c) is precisely flanked by retrotransposon long terminal repeats (Fig. 2b, d) and contains RPL20B, which encodes a protein with 99% identity to RPL20A. The duplication may have been the result of a homologous recombination event and a selection for increased dosage of RPL20B.
The presence of chromosome-wide expression biases in 8% of the approximately 300 strains indicates that whole-chromosome aneuploidy is widespread in laboratory yeast strains. However, considering the number of cell divisions involved in strain construction and storage, the frequency of aneuploidy in these strains (excluding mutants with a clear growth defect or chromosome missegregation phenotype) is in agreement with previous estimates of the frequency of mitotic chromosome loss (Hartwell, L.H. & Smith, D. (1985) Genetics 110:381-395). These results have several implications. First, the data show that the mRNA abundance of nearly every gene on trisomic or monosomic chromosomes is altered, suggesting that in yeast there is no global dosage compensation mechanism to normalize expression from each gene (or chromosome). An expression profile thus serves as a tool for the detection of aneuploidy, including even small deletions or duplications. Second, the fact that unexpected alterations in DNA copy number can lead to spurious correlations between expression profiles poses an important potential hazard in drawing conclusions from gene expression data, particularly from cell lines or tumor cells that have unstable genomes. The presence of aneuploidy may complicate the interpretation of expression profiles. For example, loss of one copy of chromosome III, which contains the heteroallelic MATa/MATα mating control locus, resulted in a false correlation between our mcm1Δ/MCM1 and yor080wΔ/yor080wΔ mutant expression profiles (Fig. 5). In contrast to the erg4Δ - ecm18Δ/ecm18Δ correlation, which was dependent upon a large number of small-magnitude expression changes arising from genes on the duplicated chromosome, the mcm1Δ/MCM1 - yor080wΔ/yor080wΔ correlation was mostly due to expression changes of genes present on chromosomes other than the aneuploid chromosome, as might be expected when a key transcriptional regulator is affected directly by the aneuploidy (Fig. 5a).
Finally, the potential for aneuploidy to surreptitiously mask or alter phenotypes of deleterious mutations is a more general concern for geneticists when interpreting their results. The observation that very large duplications are recovered as dominant suppressors of single-gene mutations might suggest that the prevalence of such duplications in evolution and in cancer cells may be the result of a need to compensate for loss of function of other genes. See Wolfe, K.H. & Shields, D.C. (1997) Nature 387:708-713; Smith, N.G., Knight, R. & Hurst, L.D. (1999) Bioessays 21:697-703.
Table 1

                                 Di- or Trisomy       Monosomy
Genotype               Strain #  Chrom #   SD         Chrom #   SD     Possible Explanation

a
ecm1Δ/ecm1Δ            2311      3         21         -         -      -
ecm18Δ/ecm18Δ          4719      7         39         -         -      -
erg4Δ                  7363      7         46         -         -      -
ste20Δ/ste20Δ          2012      11        27         -         -      -
rml2Δ/rml2Δ            1852      13        22         -         -      -
rpd3Δ/rpd3Δ            320       13        23         -         -      -
yhr011wΔ/yhr011wΔ      2018      14        30         -         -      -
pfd2Δ/pfd2Δ            1778      14        37         -         -      -
yor051cΔ/yor051cΔ      2083      14        38         -         -      -
mcm1Δ/MCM1             120       -         -          3         15     -
mcm1Δ/MCM1             121       -         -          3         16     -
yap3Δ/yap3Δ            2010      -         -          3         24     -
yor080wΔ/yor080wΔ      2103      -         -          3         15     -

b
bim1Δ/bim1Δ            406       25        26         1         11     spindle defects
bub1Δ                  277       2,10      36,26      -         -      mitotic checkpoint defects
bub3Δ/bub3Δ            1040      2,13      16,11      1         13     mitotic checkpoint defects
sin3Δ/sin3Δ            3432      5,11      14,16      -         -      -

c
rnr1Δ                  111       9         10         -         -      selection for RNR3
rpl27aΔ/rpl27aΔ        2017      4         30         -         -      selection for RPL27B
rpl34aΔ/rpl34aΔ        382       9         21         -         -      selection for RPL34B
rps24aΔ/rps24aΔ        985       9         12         -         -      selection for RPS24B
rps27bΔ/rps27bΔ        2024      11        30         -         -      selection for RPS27A

d
rad27Δ/rad27Δ          1184      NA        NA         18 ORFs   NA     enhanced recombination
rpl20aΔ/rpl20aΔ        2373      56 ORFs   NA         NA        NA     selection for RPL20B
top3Δ                  9379      28 ORFs   NA         NA        NA     strain construction

6.2 EXAMPLE 2: DETECTION OF SPURIOUS ANEUPLOIDY IN YEAST DELETION MUTANTS USING DNA MICROARRAY EXPRESSION PROFILING
This example shows that aneuploidy can be detected in publicly available expression data obtained using SAGE and using microarrays.
6.2.1 MATERIALS AND METHODS
Expression data for the tup1Δ deletion mutant were obtained from the Stanford University web site (DeRisi, J.L., Iyer, V.R., & Brown, P.O. (1997) Science 278:680-686). Mutant/wild-type control expression ratios were used without applying an intensity threshold. The error model described in Example 1 above was applied to the data, assuming similar data quality. Expression data were also downloaded for another experiment utilizing a wild-type strain (wild-type compared to overexpression of YAP1). After normalization by total signal intensity, the mean chromosomal expression of the wild-type channels from the two independent hybridizations were compared. Expression profiles for sixteen mutants, including rpb1Δ187 and hhf2, were obtained from the MIT web site (Holstege, F.C. et al. (1998) Cell 95:717-728; Wyrick, J.J. et al. (1999) 402:418-421). Most of these mutants were profiled in duplicate (i.e., two mutant hybridizations and two wild-type control hybridizations); genes called "absent" (i.e., genes not expressed) in any of the four hybridizations were excluded from analysis. The pip2Δ oaf1Δ double mutant SAGE expression data (Kal, A.J. et al. (1999) Mol. Biol. Cell 10:1859-1872) were downloaded from the Molecular Biology of the Cell web site. The data were first normalized for number of tags sequenced to compensate for the nearly three-fold excess of tags for the wild-type control over the mutant (14,367 and 5419 tags for the wild-type and mutant, respectively). Tags more than 500 bases upstream of the 3' end of an ORF were excluded from analysis. The sum of all tags from the mutant was divided by the sum of all tags from the wild-type. Errors were estimated by the square root of the sample size (Fig. 6d).
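The SAGE calculation can be illustrated with a short Python sketch. It assumes that the tags assigned to one chromosome are summed per library, that each sum is normalized by the library's total tag count, and that the counting error on each sum is its square root; the propagation of those errors into the ratio, the function name, and the per-chromosome counts in the example are assumptions of the sketch. Only the two library totals are taken from the text above.

```python
import math


def chromosome_tag_ratio(mut_chr, wt_chr, mut_total, wt_total):
    """Mutant / wild-type tag ratio for one chromosome, with error estimate.

    mut_chr, wt_chr: summed tag counts on the chromosome in each library.
    mut_total, wt_total: total tags sequenced in each library.
    Counting errors are taken as sqrt(count) and propagated in quadrature
    (an assumption of this sketch).
    """
    ratio = (mut_chr / mut_total) / (wt_chr / wt_total)
    rel_err = math.sqrt(1.0 / mut_chr + 1.0 / wt_chr)
    return ratio, ratio * rel_err


# Library totals from the text; chromosome counts are hypothetical.
ratio, err = chromosome_tag_ratio(mut_chr=150, wt_chr=400,
                                  mut_total=5419, wt_total=14367)
print(round(ratio, 3), round(err, 3))
```

Because the mutant library is small, the square-root term is dominated by the mutant count, which is why the per-chromosome error bars for this data set are comparatively wide.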
6.2.2 RESULTS
Using expression data for the tup1Δ deletion mutant obtained from the Stanford University web site (DeRisi, J.L., Iyer, V.R., & Brown, P.O. (1997) Science 278:680-686), chromosome-wide expression biases were observed that are consistent with aneuploidy (Fig. 6a). Because the analysis suggested that the wild-type strain was the source of the aneuploidy in the tup1Δ strain, expression data were also downloaded for another experiment utilizing a wild-type strain (wild-type compared to overexpression of YAP1). After normalization by total signal intensity, the mean chromosomal expression of the wild-type channels from the two independent hybridizations were compared and were consistent with the hypothesis that the wild-type control in the tup1 data had an additional copy or copies of chromosome XIII. This analysis, while suggestive, has limitations because it assumes the two independent hybridizations have correlated errors, i.e., it treats a competitive two-color experiment as if it were a one-color experiment.
Of the data downloaded from the MIT web site (Holstege, F.C. et al. (1998) Cell 95:717-728; Wyrick, J.J. et al. (1999) 402:418-421), chromosome-wide expression biases were detected only in the rpb1Δ187 and hhf2 profiles (Fig. 6b-c). The hhf2 strain expression profile suggested the loss of chromosome I. Since this strain was haploid, the simplest explanation is that the control strain was the source of the aneuploidy. The pip2Δ oaf1Δ double mutant SAGE expression data downloaded from the Molecular Biology of the Cell web site (Kal, A.J. et al. (1999) Mol. Biol. Cell 10:1859-1872) suggest that this mutant exhibits aneuploidy of chromosome III.
Several studies contained data suggestive of aneuploidy but the expression biases did not meet the criteria described above in Example 1 (0.1 bias in log space and at least ten standard deviations from the mean). For example, an expression bias was noted in data from strain E1 that underwent adaptive evolution during approximately 500 generations in glucose-limited media (Ferea, T.L., Botstein, D., Brown, P.O. & Rosenzweig, R.F. (1999) Proc. Natl. Acad. Sci. USA 96:9721-9726) (chromosome XIV mean log10(ratio) = 0.07; ten standard deviations from the mean), and in two tetraploid strains recently profiled (Galitski, T., Saldanha, A.J., Styles, C.A., Lander, E.S. & Fink, G.R. (1999) Science 282:699-705) (chromosome VI of strain MATa/MATα/MATα/MATα had a mean log10(ratio) = 0.16 and was seven standard deviations from the mean; chromosome I of strain MATa/MATa/MATa/MATa had a mean log10(ratio) = 0.17 and was six standard deviations from the mean).
Expression profiles in which the same strain is profiled under two or more conditions, such as kinetic analyses (e.g., during diauxic shift (DeRisi, J.L., Iyer, V.R., & Brown, P.O. (1997) Science 278:680-686), sporulation induction (Chu, S., et al. (1998) Science 282:699-705) or cell cycle progression (Spellman, P.T., et al. (1998) Mol. Biol. Cell 9:3273-3297)), drug treatments (e.g., methyl methanesulfonate treatment (Jelinsky, S.A. & Samson, L.D. (1999) Proc. Natl. Acad. Sci. USA 96:1486-1491)) or gene induction experiments (e.g., GAL-CLB2 and GAL-CLN3 experiments (Spellman, P.T., et al. (1998) Mol. Biol. Cell 9:3273-3297)) are not expected to be susceptible to this type of problem and did not exhibit chromosome-wide expression biases. Other suitable methods include U.S. Patent Nos. 5,545,522, 5,891,636, and 5,716,785.
In publicly available S. cerevisiae expression profiling data, several cases of chromosome-wide expression biases were observed, including the tup1Δ data of DeRisi et al. (DeRisi, J.L., Iyer, V.R., & Brown, P.O. (1997) Science 278:680-686) (Fig. 6a), the rpb1Δ187 data reported in Holstege et al. (Holstege, F.C. et al. (1998) Cell 95:717-728) (Fig. 6b), and the hhf2 depletion expression data of Wyrick et al. (Wyrick, J.J. et al. (1999) 402:418-421) (Fig. 6c). In addition, an expression profile of a pip2Δ oaf1Δ double mutant was determined by SAGE analysis, and although the number of sequence tags for the mutant is small, there appears to be chromosome-wide expression bias in that experiment as well (Fig. 6d). The expression profiles exhibited by these mutant strains thus were consistent with aneuploidy.
7. REFERENCES CITED
All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

What is claimed is:
1. A method of determining whether aneuploidy is likely to be present in a cell type or organism comprising: (a) quantifying levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said plurality of genes comprising genes mapped to different chromosomes;
(b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome, to the mean quantified levels of said cellular constituents associated with said plurality of genes; and
(c) identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes are substantially the same for each of said genes and are dissimilar to the mean quantified levels of said cellular constituents associated with said plurality of genes; wherein identifying said genes in step (c) indicates that aneuploidy of said same chromosome or portion thereof is likely to be present in said cell type or organism.
2. The method of claim 1 wherein at least some of said genes identified in step (c) are adjacent on said same chromosome.
3. The method of claim 1, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
4. The method of claim 1, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
5. The method of claim 1, wherein said genes mapped to the same chromosome span at least one-half of the chromosome.
6. The method of claim 5, wherein said genes mapped to the same chromosome are at least 3 genes.
7. The method of claim 6, wherein said genes mapped to the same chromosome are at least 10 genes.
8. The method of claim 7, wherein said genes mapped to the same chromosome are at least 50 genes.
9. The method of claim 8, wherein said genes mapped to the same chromosome are substantially all of the genes within said plurality known to map to said same chromosome.
10. The method of claim 1, wherein the aneuploidy is a deletion of at least part of a chromosome.
11. The method of claim 1, wherein the aneuploidy is a duplication of at least part of a chromosome.
12. The method of claim 10 or 11 wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
13. The method of claim 1, wherein said plurality of genes are at least 1,000 genes.
14. The method of claim 13, wherein said plurality of genes are at least 10,000 genes.
15. The method of claim 14, wherein said plurality of genes are at least 50,000 genes.
16. The method of claim 3, wherein said measuring is performed by a method comprising
(a) contacting a positionally-addressable array of polynucleotide probes with a sample comprising RNAs or nucleic acids derived therefrom from said cell sample under conditions conducive to hybridization between said probes and said RNAs or nucleic acids, wherein said array comprises a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of a support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene of said cell sample; and
(b) measuring hybridization between said probes and said RNAs or nucleic acids.
17. The method of claim 16 wherein said microarray comprises between 100 and 1,000 different oligonucleotide probes per 1 cm2.
18. The method of claim 16 wherein said microarray comprises between 1,000 and 5,000 different probes per 1 cm2.
19. The method of claim 16 wherein said microarray comprises between 5,000 and 10,000 different probes per 1 cm2.
20. The method of claim 16 wherein said microarray comprises between 10,000 and 15,000 different probes per 1 cm2.
21. The method of claim 16 wherein said microarray comprises between 15,000 and 20,000 different probes per 1 cm2.
22. The method of claim 16 wherein said microarray comprises between 20,000 and 50,000 different probes per 1 cm2.
23. The method of claim 16 wherein said microarray comprises at least 1,000 different probes per 1 cm2.
24. The method of claim 1, wherein said quantifying determines absolute abundances of said cellular constituents.
25. The method of claim 1, wherein said quantifying determines the ratio of abundance of a cellular constituent in said cells relative to the abundance of said cellular constituent in wild-type cells of said cell type, for each cellular constituent whose level is quantified.
26. A computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising:
one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of:
(a) comparing quantified levels, in one or more cells of said cell type or organism, of a plurality of cellular constituents associated with a plurality of genes in the genome of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
27. The computer system of claim 26, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of RNAs encoded by said plurality of genes.
28. The computer system of claim 26, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
29. A computer program product for directing a user computer in a computer-aided determination of whether aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for comparing quantified levels, in one or more cells of said cell type or organism, of cellular constituents associated with genes in the genome of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified levels of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism.
30. The computer program product of claim 29, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of RNAs encoded by said plurality of genes.
31. The computer program product of claim 29, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
32. A method of detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising:
(a) quantifying the levels of a plurality of cellular constituents associated with a plurality of genes in the genome of one or more cells of said cell type or organism, said plurality comprising cellular constituents associated with genes mapped to different chromosomes;
(b) comparing the quantified levels of cellular constituents associated with genes mapped to the same chromosome to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; and
(c) identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with said genes is substantially the same for each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (c) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
33. The method of claim 32 wherein at least some of said genes identified in step (c) are adjacent on said same chromosome.
34. The method of claim 32, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
35. The method of claim 32, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
36. The method of claim 32, wherein said genes mapped to the same chromosome span at least one-half of the chromosome.
37. The method of claim 36, wherein said genes mapped to the same chromosome are at least 3 genes.
38. The method of claim 37, wherein said genes mapped to the same chromosome are at least 10 genes.
39. The method of claim 38, wherein said genes mapped to the same chromosome are at least 50 genes.
40. The method of claim 39, wherein said genes mapped to the same chromosome are substantially all of the genes within said plurality known to map to said same chromosome.
41. The method of claim 32, wherein the aneuploidy is a deletion of at least part of a chromosome.
42. The method of claim 32, wherein the aneuploidy is a duplication of at least part of a chromosome.
43. The method of claim 41 or 42, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
44. The method of claim 32, wherein said plurality of genes are at least 1,000 genes.
45. The method of claim 44, wherein said plurality of genes are at least 10,000 genes.
46. The method of claim 45, wherein said plurality of genes are at least 50,000 genes.
47. The method of claim 34, wherein said measuring is performed by a method comprising
(a) contacting a positionally-addressable array of polynucleotide probes with a sample comprising RNAs or nucleic acids derived therefrom from said cell sample under conditions conducive to hybridization between said probes and said RNAs or nucleic acids, wherein said array comprises a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of a support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene of said cell sample; and
(b) measuring hybridization between said probes and said RNAs or nucleic acids.
48. The method of claim 47 wherein said microarray comprises between 100 and 1,000 different oligonucleotide probes per 1 cm2.
49. The method of claim 47 wherein said microarray comprises between 1,000 and 5,000 different probes per 1 cm2.
50. The method of claim 47 wherein said microarray comprises between 5,000 and 10,000 different probes per 1 cm2.
51. The method of claim 47 wherein said microarray comprises between 10,000 and 15,000 different probes per 1 cm2.
52. The method of claim 47 wherein said microarray comprises between 15,000 and 20,000 different probes per 1 cm2.
53. The method of claim 47 wherein said microarray comprises between 20,000 and 50,000 different probes per 1 cm2.
54. The method of claim 47 wherein said microarray comprises at least 1,000 different probes per 1 cm2.
55. The method of claim 32, wherein said quantifying determines absolute abundances of said cellular constituents.
56. The method of claim 32, wherein said quantifying determines the ratio of abundance of a cellular constituent in said cell relative to the abundance of said cellular constituent in a wild-type cell, for each cellular constituent whose level is quantified.
57. A computer system for detecting the predisposition of a cell type or organism to a disease associated with aneuploidy, comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of:
(a) comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism, said genes being mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and (b) identifying genes mapped to the same chromosome for which the quantified level is substantially the same for each cellular constituent associated with each of said genes and is dissimilar to the mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes in step (b) indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
58. The computer system of claim 57, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of RNAs encoded by said plurality of genes.
59. The computer system of claim 57, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
60. A computer program product for directing a user computer in a computer-aided determination of whether a cell type or organism is predisposed to a disease associated with aneuploidy, said computer program product comprising: computer code for comparing quantified levels of cellular constituents associated with genes in the genome of one or more cells of said cell type or organism mapped to the same chromosome, to mean quantified levels of a plurality of cellular constituents associated with a plurality of genes mapped to different chromosomes; and computer code for identifying genes mapped to the same chromosome for which the quantified level of cellular constituents associated with each of said genes is substantially the same for each of said genes and is dissimilar to mean quantified levels of said plurality of cellular constituents associated with said plurality of genes mapped to different chromosomes; wherein identifying said genes indicates that aneuploidy of said same chromosome or portion thereof containing said genes is likely to be present in said cell type or organism, and wherein said cell type or organism is likely to be predisposed to a disease associated with said aneuploidy of said same chromosome or portion thereof.
61. The computer program product of claim 60, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of RNAs encoded by said plurality of genes.
62. The computer program product of claim 60, wherein said plurality of cellular constituents associated with a plurality of genes are a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
63. A method of determining whether aneuploidy is likely to be present in a cell type or organism comprising detecting an expression bias that is shared by a first plurality of genes mapped to a single chromosome or mapped to a chromosomal portion of interest in a cell of said cell type or from said organism, wherein said expression bias is present when measured levels of a first plurality of cellular constituents associated with said first plurality of genes are different from the mean of measured levels of a second plurality of cellular constituents associated with a second plurality of genes in said cell, wherein said second plurality of genes consists of at least one gene not mapped to said chromosome or not mapped to said chromosomal portion.
64. The method of claim 63, wherein said first plurality of cellular constituents associated with a first plurality of genes are abundances of a plurality of RNAs encoded by said first plurality of genes and wherein said second plurality of cellular constituents associated with said second plurality of genes are abundances of a plurality of RNAs encoded by said second plurality of genes.
65. The method of claim 63, wherein said first plurality of cellular constituents associated with a first plurality of genes are abundances of a first plurality of proteins translated from a first plurality of mRNAs that are encoded by said first plurality of genes and wherein said second plurality of cellular constituents associated with a second plurality of genes are abundances of a second plurality of proteins translated from a second plurality of mRNAs that are encoded by said second plurality of genes.
66. The method of claim 63, wherein said first plurality of genes mapped to a single chromosome span at least one-half of the chromosome.
67. The method of claim 66, wherein said first plurality of genes mapped to a single chromosome comprise at least 3 genes.
68. The method of claim 67, wherein said first plurality of genes mapped to a single chromosome comprise at least 10 genes.
69. The method of claim 68, wherein said first plurality of genes mapped to a single chromosome comprise at least 50 genes.
70. The method of claim 69, wherein said first plurality of genes mapped to a single chromosome are substantially all of the genes within said first plurality known to map to said single chromosome.
71. The method of claim 63, wherein the aneuploidy is a deletion of at least part of a chromosome.
72. The method of claim 63, wherein the aneuploidy is a duplication of at least part of a chromosome.
73. The method of claim 71 or 72, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
74. The method of claim 63, wherein said first plurality of genes comprise at least 1,000 genes.
75. The method of claim 74, wherein said first plurality of genes comprise at least 10,000 genes.
76. The method of claim 75, wherein said first plurality of genes comprise at least 50,000 genes.
77. The method of claim 64 wherein abundances of said first plurality and said second plurality of RNAs are measured by hybridizing said first plurality and said second plurality of RNAs, or cDNAs derived therefrom, to a microarray, said microarray comprising:
(a) a surface, and
(b) binding sites for a plurality of polynucleotide species attached to said surface, wherein said binding sites are attached to said surface such that the identity of a binding site can be determined from its position on the surface.
78. The method of claim 77 wherein said microarray comprises between 100 and 1,000 different oligonucleotide probes per 1 cm2.
79. The method of claim 77 wherein said microarray comprises between 1,000 and 5,000 different probes per 1 cm2.
80. The method of claim 77 wherein said microarray comprises between 5,000 and 10,000 different probes per 1 cm2.
81. The method of claim 77 wherein said microarray comprises between 10,000 and 15,000 different probes per 1 cm2.
82. The method of claim 77 wherein said microarray comprises between 15,000 and 20,000 different probes per 1 cm2.
83. The method of claim 77 wherein said microarray comprises at least 1,000 different probes per 1 cm2.
84. The method of claim 63, wherein said levels of said first plurality and said second plurality of cellular constituents are absolute abundances of said cellular constituents.
85. The method of claim 63, wherein said levels of said first plurality of cellular constituents are the ratios of abundances of said first plurality of cellular constituents in said cell relative to the abundances of said first plurality of cellular constituents in a wild-type cell, and wherein said levels of said second plurality of cellular constituents are the ratios of abundances of said second plurality of cellular constituents in said cell relative to the abundances of said second plurality of cellular constituents in a wild-type cell.
86. A method for detecting the presence of aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein the known alteration in copy number of at least one known gene in the one or more landmark profiles determined to be most similar is indicative of the presence of aneuploidy in said first cell type or type of organism.
87. The method of claim 86 wherein said measured changes of a plurality of cellular constituents in a biological sample comprise changes in abundances of mRNA species in said biological sample.
88. The method of claim 87, wherein said measuring is performed by a method comprising
(a) contacting a positionally-addressable array of polynucleotide probes with a sample comprising RNAs or nucleic acids derived therefrom from said cell sample under conditions conducive to hybridization between said probes and said RNAs or nucleic acids, wherein said array comprises a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of a support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a gene of said cell sample; and
(b) measuring hybridization between said probes and said RNAs or nucleic acids.
89. The method of claim 86 wherein the database comprises landmark profiles for at least 100 genes in the genome of said cell type or organism, each gene having an altered copy number.
90. The method of claim 89 wherein the database comprises landmark profiles for at least 5,000 genes in the genome of said cell type or organism, each gene having an altered copy number.
91. The method of claim 90 wherein the database comprises landmark profiles for at least 75,000 genes in the genome of said cell type or organism, each gene having an altered copy number.
92. The method of claim 86 wherein the database comprises landmark profiles for at least 1/2 of the genes in the genome of said cell type or organism, each gene having an altered copy number.
93. The method of claim 92 wherein the database comprises landmark profiles for at least 3/4 of the genes in the genome of said cell type or organism, each gene having an altered copy number.
94. The method of claim 86 wherein the database comprises landmark profiles for at least 2% of the genes in the genome of said cell type or organism, each gene having an altered copy number.
95. The method of claim 94 wherein the database comprises landmark profiles for at least 15% of the genes in the genome of said cell type or organism, each gene having an altered copy number.
96. The method of claim 95 wherein the database comprises landmark profiles for at least 75% of the genes in the genome of said cell type or organism, each gene having an altered copy number.
97. The method of claim 86 wherein the measured amounts of the plurality of cellular constituents in said first cell of said cell type or of said type of organism are determined in comparison to a wild-type cell of said cell type or said type of organism, and wherein the measured amounts of the plurality of cellular constituents in said second cell of said cell type or of said type of organism are determined in comparison to a wild-type cell of said cell type or of said type of organism.
98. The method of claim 86 wherein measured amounts of the plurality of cellular constituents in said first cell of said cell type or of said organism and the measured amounts of the plurality of cellular constituents in said second cell of said cell type or of said type of organism are absolute amounts of the pluralities of cellular constituents.
99. A method of diagnosing a disease associated with aneuploidy in a cell type or type of organism, comprising: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
100. The method of claim 99, wherein said predicted profile is compared to said database, and said first profile is at a first developmental stage or first condition and said predicted profile is at a second, different developmental stage or condition more similar to the developmental stage or condition of said second cell than said first cell.
101. The method of claim 99 wherein said measured changes of a plurality of cellular constituents in a biological sample comprise changes in abundances of mRNA species in said biological sample.
102. The method of claim 101, wherein said measuring is performed by a method comprising
(a) contacting a positionally-addressable array of polynucleotide probes with a sample comprising RNAs or nucleic acids derived therefrom from said cell sample under conditions conducive to hybridization between said probes and said RNAs or nucleic acids, wherein said array comprises a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of a support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene of said cell sample; and
(b) measuring hybridization between said probes and said RNAs or nucleic acids.
103. The method of claim 99 wherein the database comprises landmark profiles for at least 100 genes in the genome of said cell type or organism, each gene having an altered copy number.
104. The method of claim 103 wherein the database comprises landmark profiles for at least 50,000 genes in the genome of said cell type or organism, each gene having an altered copy number.
105. The method of claim 99 wherein the database comprises landmark profiles for at least 1/2 of the genes in the genome of said cell type or organism, each gene having an altered copy number.
106. The method of claim 105 wherein the database comprises landmark profiles for at least 3/4 of the genes in the genome of said cell type or organism, each gene having an altered copy number.
107. The method of claim 99 wherein the database comprises landmark profiles for at least 2% of the genes in the genome of said cell type or organism, each gene having an altered copy number.
108. The method of claim 107 wherein the database comprises landmark profiles for at least 75% of the genes in the genome of said cell type or organism, each gene having an altered copy number.
109. The method of claim 99 wherein the measured amounts of the plurality of cellular constituents in said first cell of said cell type or of said type of organism are determined in comparison to a wild-type cell of said cell type or said type of organism, and wherein the measured amounts of the plurality of cellular constituents in said second cell of said cell type or of said type of organism are determined in comparison to a wild-type cell of said cell type or of said type of organism.
110. The method of claim 99 wherein measured amounts of the plurality of cellular constituents in said first cell of said cell type or of said organism and the measured amounts of the plurality of cellular constituents in said second cell of said cell type or of said type of organism are absolute amounts of the pluralities of cellular constituents.
111. A computer system for diagnosing a disease associated with a known aneuploidy in a cell type or type of organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute steps of: comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
112. The computer system of claim 111, wherein said predicted profile is compared to said database, and said first profile is at a first developmental stage or first condition and said predicted profile is at a second, different developmental stage or condition more similar to the developmental stage or condition of said second cell than said first cell.
113. A computer program product for directing a user computer in a computer-aided diagnosis of a disease associated with a known aneuploidy in a cell type or organism, said computer program product comprising: computer code for comparing a first profile or a predicted profile derived therefrom to a database comprising a plurality of landmark profiles to determine the one or more landmark profiles most similar to said first or predicted profile; wherein said first profile comprises measured amounts of a plurality of cellular constituents in a first cell of said cell type or of said type of organism; wherein each landmark profile comprises measured amounts of a plurality of cellular constituents in a second cell of said cell type or type of organism having a known alteration in copy number of at least one known gene; and wherein a disease associated with said known alteration in copy number of at least one known gene associated with the one or more landmark profiles determined to be most similar to said first or predicted profile is present in said first cell type or type of organism.
114. The computer program product of claim 113, wherein said predicted profile is compared to said database, and said first profile is at a first developmental stage or first condition and said predicted profile is at a second, different developmental stage or condition more similar to the developmental stage or condition of said second cell than said first cell.
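The comparison step recited in claims 111 to 114 can be sketched as follows. This is a minimal illustration only: the claims do not name a similarity measure, so Pearson correlation is an assumption, and the function names, landmark labels and numeric values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def most_similar_landmarks(first_profile, landmark_db, top_n=1):
    """Rank landmark profiles by similarity to the first (or predicted) profile.

    landmark_db maps a label for a known copy-number alteration to the
    profile measured in a cell carrying that alteration. Similarity is
    Pearson correlation here, which is an assumption of this sketch.
    """
    scored = [(label, pearson(first_profile, landmark))
              for label, landmark in landmark_db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical log10 expression ratios for five cellular constituents.
landmark_db = {
    "trisomy_A": [0.18, 0.17, 0.0, 0.01, -0.01],
    "deletion_B": [0.0, 0.01, -0.30, -0.29, 0.0],
}
query = [0.16, 0.19, 0.02, -0.01, 0.0]
best = most_similar_landmarks(query, landmark_db)
print(best[0][0])
```

The label of the top-ranked landmark profile would then be looked up against the disease associated with that alteration; that lookup is outside this sketch.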
115. A kit for detecting the presence of aneuploidy comprising:
(a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism, wherein said different nucleotide sequences are known to be increased or decreased as a result of aneuploidy; and
(b) expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in a cell of said organism having a known alteration in copy number of said at least one gene.
116. A kit for detecting the presence of aneuploidy comprising:
(a) an array comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of said support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene in the genome of an organism; and
(b) a container comprising RNA, or cDNA derived therefrom, of a cell having a known aneuploidy.
117. The kit of claim 116, further comprising: expression profiles, in electronic or written form, each correlated to a known alteration in copy number of at least one gene, wherein said expression profiles are obtained by measuring a plurality of cellular constituents in a cell of a subject having a known alteration in copy number of said at least one gene.
118. The kit of claim 115 or 116, wherein said aneuploidy is associated with a disease or a predisposition to a disease and said organism is a human.
119. The kit of claim 118, wherein said aneuploidy results from amplification, deletion or translocation of at least one gene and said disease is cancer.
120. The kit of claim 118, wherein said aneuploidy results from chromosome trisomy and said disease is selected from the group consisting of Down syndrome, Edwards syndrome, Patau syndrome, triple X syndrome, Klinefelter syndrome, and 47,XYY syndrome.
121. The kit of claim 118, wherein said aneuploidy results from a deletion of a chromosome or portion thereof and said disease is selected from the group consisting of cri du chat syndrome, Wolf-Hirschhorn syndrome, Alagille syndrome, Angelman syndrome, DiGeorge syndrome, Langer-Giedion syndrome, Miller-Dieker syndrome, Prader-Willi syndrome, Rubinstein-Taybi syndrome, Smith-Magenis syndrome, Williams syndrome and Turner syndrome.
122. The method of claim 32 or 99, wherein said aneuploidy results from amplification, deletion or translocation of at least one gene and said disease is cancer.
123. The method of claim 122, wherein said cancer is selected from the group consisting of breast cancer, colon cancer, acute myelogenous leukemia, chronic myelocytic leukemia, acute promyelocytic leukemia, acute nonlymphocytic leukemia, acute monocytic leukemia, acute myelomonocytic leukemia, Burkitt's lymphoma, non-Hodgkin's lymphoma, acute lymphoblastic leukemia, chronic lymphocytic leukemia, myeloproliferative diseases, small cell lung cancer, kidney cancer, uterine cancer, cervical cancer, prostate cancer, bladder cancer, ovarian cancer, liposarcoma, synovial sarcoma, rhabdomyosarcoma, extraskeletal myxoid chondrosarcoma, Ewing's tumor, peripheral neuroepithelioma, testicular and ovarian dysgerminoma, retinoblastoma, Wilms' tumor, neuroblastoma, malignant melanoma, and mesothelioma.
124. A method of determining whether aneuploidy of one or more genes is likely to be present in a cell type or organism comprising: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
125. The method of claim 124, wherein said one or more cellular constituents are abundances of one or more RNAs.
126. The method of claim 124, wherein said one or more cellular constituents are abundances of one or more proteins.
127. The method of claim 124, wherein the aneuploidy is a deletion of at least part of a chromosome.
128. The method of claim 124, wherein the aneuploidy is a duplication of at least part of a chromosome.
129. The method of claim 127 or 128, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
130. The method of claim 125, wherein said measuring is performed by a method comprising:
(a) contacting a positionally-addressable array of polynucleotide probes with a sample comprising RNAs or nucleic acids derived therefrom from said cell sample under conditions conducive to hybridization between said probes and said RNAs or nucleic acids, wherein said array comprises a plurality of polynucleotide probes of different nucleotide sequences bound to different regions of a support, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a sequence in a different gene of said cell sample; and
(b) measuring hybridization between said probes and said RNAs or nucleic acids.
131. The method of claim 130 wherein said microarray comprises between 100 and 1,000 different oligonucleotide probes per 1 cm².
132. The method of claim 130 wherein said microarray comprises between 1,000 and 5,000 different probes per 1 cm².
133. The method of claim 130 wherein said microarray comprises between 5,000 and 10,000 different probes per 1 cm².
134. The method of claim 130 wherein said microarray comprises between 10,000 and 15,000 different probes per 1 cm².
135. The method of claim 130 wherein said microarray comprises between 15,000 and 20,000 different probes per 1 cm².
136. The method of claim 130 wherein said microarray comprises between 20,000 and 50,000 different probes per 1 cm².
137. The method of claim 130 wherein said microarray comprises at least 1,000 different probes per 1 cm².
138. A computer system for determining whether aneuploidy is likely to be present in a cell type or organism, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute step of: identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
139. The computer system of claim 138 wherein said programs further cause the one or more processor units to execute step of: defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
140. The computer system of claim 139 wherein said programs cause the one or more processor units to execute said step of defining wild-type co-varying cellular constituent sets by means of an agglomerative hierarchical clustering algorithm.
141. A computer program product for directing a user computer in a computer-aided determination that aneuploidy is likely to be present in a cell type or organism, said computer program product comprising: computer code for identifying one or more cellular constituents that are members of a wild-type co-varying cellular constituent set, wherein the variation in said one or more cellular constituents in a cell or organism suspected of being aneuploid, in response to one or more perturbations, is not similar to the variation of said one or more cellular constituents or other cellular constituents in said wild-type co-varying cellular constituent set, in a wild-type cell of the same type or in a wild-type organism; wherein said wild-type cellular constituent set consists of cellular constituents that co-vary in a wild-type cell or wild-type organism in response to a plurality of perturbations to said wild-type cell or wild-type organism; and wherein identifying said one or more cellular constituents indicates that aneuploidy of one or more genes encoding said one or more cellular constituents is likely to be present in said cell type or organism.
142. The computer program product of claim 141, further comprising: computer code for defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
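The step of defining wild-type co-varying cellular constituent sets by agglomerative hierarchical clustering (claims 139, 140 and 142) might be sketched as below. The claims do not fix a distance measure, linkage rule or stopping criterion; average linkage on 1 minus Pearson correlation and the `max_distance` cut-off are assumptions of this sketch, and all names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def covarying_sets(responses, max_distance=0.5):
    """Group constituents whose wild-type responses co-vary.

    responses maps a constituent name to its measured response across the
    same ordered list of perturbations. Clusters are merged bottom-up
    (average linkage, distance = 1 - Pearson correlation) until the closest
    pair of clusters is farther apart than max_distance.
    """
    clusters = [[name] for name in responses]

    def linkage_distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(responses[a], responses[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        best_pair, best_dist = None, None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage_distance(clusters[i], clusters[j])
            if best_dist is None or d < best_dist:
                best_pair, best_dist = (i, j), d
        if best_dist > max_distance:
            break
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical wild-type responses of four constituents to four perturbations.
wild_type = {
    "g1": [1.0, -0.5, 0.8, 0.0],
    "g2": [0.9, -0.4, 0.7, 0.1],
    "g3": [-0.6, 0.9, -0.2, 0.5],
    "g4": [-0.5, 1.0, -0.3, 0.4],
}
sets = covarying_sets(wild_type)
print(sets)
```

A constituent whose response in the suspect cell no longer tracks the other members of its set would then be a candidate for the identifying step of claims 124, 138 and 141.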
143. A method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising:
(a) determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and
(b) dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
144. The method of claim 143, wherein the value of the mean chromosomal ratio offset is determined for at least 3 genes located on said chromosome or chromosomal segment.
145. The method of claim 144, wherein the value of the mean chromosomal ratio offset is determined for at least 10 genes located on said chromosome or chromosomal segment.
146. The method of claim 145, wherein the value of the mean chromosomal ratio offset is determined for at least 50 genes located on said chromosome or chromosomal segment.
147. The method of claim 143, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
148. The method of claim 143, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
149. The method of claim 143, wherein the aneuploidy is a deletion of at least part of a chromosome.
150. The method of claim 143, wherein the aneuploidy is a duplication of at least part of a chromosome.
151. The method of claim 149 or 150, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
152. The method of claim 143, wherein said plurality of genes are at least 1,000 genes.
153. The method of claim 152, wherein said plurality of genes are at least 10,000 genes.
154. The method of claim 153, wherein said plurality of genes are at least 50,000 genes.
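Steps (a) and (b) of claim 143 can be illustrated numerically under one possible reading. The sketch assumes that quantified levels are stored as log10 expression ratios, so that the difference of the two means is a log-scale offset and dividing by the corresponding linear-scale ratio is the same as subtracting the offset in log space. That reading, the per-gene application of the correction, and all names and values below are assumptions of this example and are not stated in the claims.

```python
from math import log10
from statistics import mean


def mean_offset(segment_log_ratios, reference_log_ratios):
    """Step (a): difference between the mean level on the aneuploid segment
    and the mean level on a segment with wild-type copy number."""
    return mean(segment_log_ratios) - mean(reference_log_ratios)


def correct_segment(segment_log_ratios, reference_log_ratios):
    """Step (b), under the log-ratio assumption: divide each linear-scale
    ratio by the linear-scale offset ratio, then return to log10 scale."""
    offset = mean_offset(segment_log_ratios, reference_log_ratios)
    offset_ratio = 10 ** offset  # linear-scale equivalent of the offset
    return [log10(10 ** value / offset_ratio) for value in segment_log_ratios]


# Hypothetical log10 ratios: three genes on a duplicated segment and three
# genes on a segment with wild-type copy number.
segment = [0.30, 0.28, 0.32]
reference = [0.0, 0.02, -0.02]

offset = mean_offset(segment, reference)
corrected = correct_segment(segment, reference)
print(round(offset, 3), [round(v, 3) for v in corrected])
```

After correction the segment mean coincides with the reference mean, while gene-to-gene differences within the segment are preserved.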
155. A method of correcting a profile of a cell type or organism for aneuploidy, comprising:
(a) determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and
(b) dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
156. The method of claim 155, wherein the value of the mean chromosomal ratio offset is determined for at least 3 genes located on said chromosome or chromosomal segment.
157. The method of claim 156, wherein the value of the mean chromosomal ratio offset is determined for at least 10 genes located on said chromosome or chromosomal segment.
158. The method of claim 157, wherein the value of the mean chromosomal ratio offset is determined for at least 50 genes located on said chromosome or chromosomal segment.
159. The method of claim 155, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
160. The method of claim 155, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
161. The method of claim 155, wherein the aneuploidy is a deletion of at least part of a chromosome.
162. The method of claim 155, wherein the aneuploidy is a duplication of at least part of a chromosome.
163. The method of claim 161 or 162, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
164. The method of claim 155, wherein said plurality of genes are at least 1,000 genes.
165. The method of claim 164, wherein said plurality of genes are at least 10,000 genes.
166. The method of claim 165, wherein said plurality of genes are at least 50,000 genes.
167. A method of correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, comprising: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
168. The method of claim 167, wherein the value of the mean chromosomal ratio offset is determined for at least 3 genes located on said chromosome or chromosomal segment.
169. The method of claim 168, wherein the value of the mean chromosomal ratio offset is determined for at least 10 genes located on said chromosome or chromosomal segment.
170. The method of claim 169, wherein the value of the mean chromosomal ratio offset is determined for at least 50 genes located on said chromosome or chromosomal segment.
171. The method of claim 167, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
172. The method of claim 167, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
173. The method of claim 167, wherein the aneuploidy is a deletion of at least part of a chromosome.
174. The method of claim 167, wherein the aneuploidy is a duplication of at least part of a chromosome.
175. The method of claim 173 or 174, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
176. The method of claim 167, wherein said plurality of genes are at least 1,000 genes.
177. The method of claim 176, wherein said plurality of genes are at least 10,000 genes.
178. The method of claim 177, wherein said plurality of genes are at least 50,000 genes.
179. A method of correcting a profile of a cell type or organism for aneuploidy, comprising: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
180. The method of claim 179, wherein the value of the mean chromosomal ratio offset is determined for at least 3 genes located on said chromosome or chromosomal segment.
181. The method of claim 180, wherein the value of the mean chromosomal ratio offset is determined for at least 10 genes located on said chromosome or chromosomal segment.
182. The method of claim 181, wherein the value of the mean chromosomal ratio offset is determined for at least 50 genes located on said chromosome or chromosomal segment.
183. The method of claim 179, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of RNAs encoded by said plurality of genes.
184. The method of claim 179, wherein said plurality of cellular constituents associated with a plurality of genes are abundances of a plurality of proteins translated from a plurality of mRNAs that are encoded by said plurality of genes.
185. The method of claim 179, wherein the aneuploidy is a deletion of at least part of a chromosome.
186. The method of claim 179, wherein the aneuploidy is a duplication of at least part of a chromosome.
187. The method of claim 185 or 186, wherein the aneuploidy is a deletion or duplication of less than 10 adjacent genes.
188. The method of claim 179, wherein said plurality of genes are at least 1,000 genes.
189. The method of claim 188, wherein said plurality of genes are at least 10,000 genes.
190. The method of claim 189, wherein said plurality of genes are at least 50,000 genes.
191. A computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
192. The computer system of claim 191 wherein said programs further cause the one or more processor units to execute the step of: defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
193. The computer system of claim 192, wherein said programs cause the one or more processor units to execute said step of defining wild-type co-varying cellular constituents by means of an agglomerative hierarchical clustering algorithm.
194. A computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for determining the value of the mean chromosomal offset ratio for a plurality of genes mapped to said chromosome or chromosomal segment in the cell type or organism, wherein said value is the difference between the mean quantified level of a plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment by the value of the mean chromosomal offset ratio.
195. The computer program product of claim 194, further comprising: computer code for defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
196. A computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the steps of: determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
197. The computer system of claim 196 wherein said programs further cause the one or more processor units to execute the step of: defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
198. The computer system of claim 197, wherein said programs cause the one or more processor units to execute said step of defining wild-type co-varying cellular constituents by means of an agglomerative hierarchical clustering algorithm.
199. A computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for determining the value of the mean offset ratio for a plurality of genes associated with a plurality of cellular constituents whose mean quantified level is altered by the presence of one or more genes in said cell type or organism having an abnormal copy number, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number; and computer code for dividing the mean quantified level of said plurality of cellular constituents that are altered by the presence of said one or more genes having an abnormal copy number by the value of the mean offset ratio.
200. The computer program product of claim 199, further comprising: computer code for defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
201. A computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
202. The computer system of claim 201 wherein said programs further cause the one or more processor units to execute the step of: defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
203. The computer system of claim 202, wherein said programs cause the one or more processor units to execute said step of defining wild-type co-varying cellular constituents by means of an agglomerative hierarchical clustering algorithm.
204. A computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to a chromosome or chromosomal segment in a cell type or organism by the value of the mean chromosomal offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents associated with said plurality of genes mapped to said chromosome or chromosomal segment and the mean quantified level of a plurality of cellular constituents associated with a plurality of genes mapped to at least one chromosome or chromosomal segment having a wild type copy number.
205. The computer program product of claim 204, further comprising: computer code for defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
206. A computer system for correcting a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer system comprising: one or more processor units; one or more memory units connected to said one or more processor units, said one or more memory units containing one or more programs which cause said one or more processor units to execute the step of: dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
207. The computer system of claim 206, wherein said programs further cause the one or more processor units to execute the step of: defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
208. The computer system of claim 207, wherein said programs cause the one or more processor units to execute said step of defining wild-type co-varying cellular constituents by means of an agglomerative hierarchical clustering algorithm.
209. A computer program product for directing a user computer in a computer-aided correction of a profile of a cell type or organism for aneuploidy of a chromosome or chromosomal segment, said computer program product comprising: computer code for dividing the mean quantified level of a plurality of cellular constituents that are altered by the presence of one or more genes in a cell type or organism having an abnormal copy number by the value of the mean offset ratio for said plurality of genes, wherein said value is the difference between the mean quantified level of said plurality of cellular constituents that are altered by the presence of one or more genes having an abnormal copy number and the mean quantified level of a plurality of cellular constituents that are unaltered by the presence of said one or more genes having an abnormal copy number.
210. The computer program product of claim 209, further comprising: computer code for defining one or more wild-type co-varying cellular constituent sets in which a plurality of cellular constituents in a wild-type cell or wild-type organism co-vary in response to a plurality of perturbations to said wild-type cell or wild-type organism.
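The division step recited in claims 201, 204, 206 and 209 can be illustrated with a minimal sketch. It follows the claim wording literally: the offset value is the difference between the mean level of the affected constituents and the mean level of the unaffected (wild-type copy number) constituents, and the mean level of the affected constituents is divided by that value. The function names, the use of plain Python lists, and the example numbers are assumptions made for illustration; the claims do not specify a data representation or the units of the quantified levels.

```python
from statistics import fmean


def mean_offset(affected_levels, reference_levels):
    """Difference between the mean level of the affected constituents and
    the mean level of the reference (wild-type copy number) constituents."""
    return fmean(affected_levels) - fmean(reference_levels)


def corrected_mean_level(affected_levels, reference_levels):
    """Divide the mean affected level by the offset value, as the claims word it.

    Hypothetical helper: the zero-offset check is an added safeguard,
    not something recited in the claims.
    """
    offset = mean_offset(affected_levels, reference_levels)
    if offset == 0:
        raise ValueError("offset is zero; the division is undefined")
    return fmean(affected_levels) / offset


# Illustrative values only (not data from the application).
affected = [2.0, 2.5, 1.5]   # levels for genes on the aneuploid segment
reference = [1.0, 1.0, 1.0]  # levels for genes at wild-type copy number
offset_value = mean_offset(affected, reference)
corrected = corrected_mean_level(affected, reference)
```

With these illustrative inputs the affected mean is 2.0 and the reference mean is 1.0, so the offset value is 1.0 and the quotient is 2.0. When the two means coincide the quotient is undefined, which is why the sketch raises an error in that case.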
PCT/US2000/035352 2000-12-01 2000-12-22 Use of profiling for detecting aneuploidy WO2002044411A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US25059700P 2000-12-01 2000-12-01
US60/250,597 2000-12-01

Publications (1)

Publication Number Publication Date
WO2002044411A1 (en) 2002-06-06

Family

ID=22948395

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/035352 WO2002044411A1 (en) 2000-12-01 2000-12-22 Use of profiling for detecting aneuploidy

Country Status (1)

Country Link
WO (1) WO2002044411A1 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10242359A1 (en) * 2002-09-12 2004-03-25 Alopex Gmbh Amplifying information from genetic material containing many fragments, useful e.g. for analyzing frequency of chromosomes in oocytes
CN100449003C (en) * 2003-06-20 2009-01-07 中山大学 Method of detecting tumor gene with boichip
US8560243B2 (en) 1998-12-28 2013-10-15 Microsoft Corporation Methods for determining therapeutic index from gene expression profiles
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111545B2 (en) 2010-05-18 2021-09-07 Natera, Inc. Methods for simultaneous amplification of target loci
US11186863B2 (en) 2019-04-02 2021-11-30 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11230731B2 (en) 2018-04-02 2022-01-25 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11286530B2 (en) 2010-05-18 2022-03-29 Natera, Inc. Methods for simultaneous amplification of target loci
US11306359B2 (en) 2005-11-26 2022-04-19 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US11306357B2 (en) 2010-05-18 2022-04-19 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11319595B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11390916B2 (en) 2014-04-21 2022-07-19 Natera, Inc. Methods for simultaneous amplification of target loci
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US11519028B2 (en) 2016-12-07 2022-12-06 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5665549A (en) * 1992-03-04 1997-09-09 The Regents Of The University Of California Comparative genomic hybridization (CGH)
WO2000024925A1 (en) * 1998-10-28 2000-05-04 Luminis Pty Ltd Karyotyping means and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUGHES et al.; "Widespread aneuploidy revealed by DNA microarray expression profiling", Nature Genetics, July 2000, Volume 25, pages 333-337. *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560243B2 (en) 1998-12-28 2013-10-15 Microsoft Corporation Methods for determining therapeutic index from gene expression profiles
DE10242359A1 (en) * 2002-09-12 2004-03-25 Alopex Gmbh Amplifying information from genetic material containing many fragments, useful e.g. for analyzing frequency of chromosomes in oocytes
CN100449003C (en) * 2003-06-20 2009-01-07 中山大学 Method of detecting tumor gene with boichip
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11306359B2 (en) 2005-11-26 2022-04-19 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US11519035B2 (en) 2010-05-18 2022-12-06 Natera, Inc. Methods for simultaneous amplification of target loci
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
US11286530B2 (en) 2010-05-18 2022-03-29 Natera, Inc. Methods for simultaneous amplification of target loci
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11306357B2 (en) 2010-05-18 2022-04-19 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11312996B2 (en) 2010-05-18 2022-04-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11746376B2 (en) 2010-05-18 2023-09-05 Natera, Inc. Methods for amplification of cell-free DNA using ligated adaptors and universal and inner target-specific primers for multiplexed nested PCR
US11525162B2 (en) 2010-05-18 2022-12-13 Natera, Inc. Methods for simultaneous amplification of target loci
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11111545B2 (en) 2010-05-18 2021-09-07 Natera, Inc. Methods for simultaneous amplification of target loci
US11482300B2 (en) 2010-05-18 2022-10-25 Natera, Inc. Methods for preparing a DNA fraction from a biological sample for analyzing genotypes of cell-free DNA
US11414709B2 (en) 2014-04-21 2022-08-16 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319595B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11408037B2 (en) 2014-04-21 2022-08-09 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11390916B2 (en) 2014-04-21 2022-07-19 Natera, Inc. Methods for simultaneous amplification of target loci
US11530454B2 (en) 2014-04-21 2022-12-20 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319596B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11486008B2 (en) 2014-04-21 2022-11-01 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11371100B2 (en) 2014-04-21 2022-06-28 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11946101B2 (en) 2015-05-11 2024-04-02 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US11519028B2 (en) 2016-12-07 2022-12-06 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11530442B2 (en) 2016-12-07 2022-12-20 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11230731B2 (en) 2018-04-02 2022-01-25 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11788121B2 (en) 2018-04-02 2023-10-17 Enumera Molecular, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
US11186863B2 (en) 2019-04-02 2021-11-30 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11959129B2 (en) 2019-04-02 2024-04-16 Enumera Molecular, Inc. Methods, systems, and compositions for counting nucleic acid molecules

Similar Documents

Publication Publication Date Title
US6468476B1 (en) Methods for using-co-regulated genesets to enhance detection and classification of gene expression patterns
WO2002044411A1 (en) Use of profiling for detecting aneuploidy
US6203987B1 (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
US6950752B1 (en) Methods for removing artifact from biological profiles
US6801859B1 (en) Methods of characterizing drug activities using consensus profiles
US6132969A (en) Methods for testing biological network models
AU774830B2 (en) Statistical combining of cell expression profiles
US6370478B1 (en) Methods for drug interaction prediction using biological response profiles
CA2282792A1 (en) Methods for drug target screening
AU3890699A (en) Methods for identifying pathways of drug action
EP1483720A1 (en) Computer systems and methods for identifying genes and determining pathways associated with traits
US20040091933A1 (en) Methods for genetic interpretation and prediction of phenotype
WO2002002740A2 (en) Methods and compositions for determining gene function
US7807447B1 (en) Compositions and methods for exon profiling
EP1141415A1 (en) Methods for robust discrimination of profiles
AU773456B2 (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
WO2002002741A2 (en) Methods for genetic interpretation and prediction of phenotype
US20020146694A1 (en) Functionating genomes with cross-species coregulation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

NENP Non-entry into the national phase

Ref country code: JP