EP1141415A1 - Methods for robust discrimination of profiles

Methods for robust discrimination of profiles

Info

Publication number
EP1141415A1
Authority
EP
European Patent Office
Prior art keywords
profile
constituent
biological sample
augmented
profiles
Legal status
Withdrawn
Application number
EP99968165A
Other languages
English (en)
French (fr)
Inventor
Stephen H. Friend
Roland Stoughton
Current Assignee
Rosetta Inpharmatics LLC
Original Assignee
Rosetta Inpharmatics LLC
Application filed by Rosetta Inpharmatics LLC

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells

Definitions

  • the field of this invention relates to methods for discriminating between the subtle effects of a first perturbation and a second perturbation on a biological sample.
  • the invention also relates to improved methods for identifying disease states in patients.
  • the invention provides improved methods for optimizing drug therapy regimens in diseased subjects.
  • the invention also generally relates to improved methods for determining the subtle effects of pharmacological agents on a biological system.
  • Cellular constituents include gene expression levels, abundance of mRNA encoding specific genes, and protein expression levels in a biological sample. Levels of various constituents of a cell, such as mRNA encoding genes and/or protein expression levels, are known to change in response to drug treatments and other perturbations of the cell's biological state. Measurements of a plurality of such "cellular constituents" therefore contain a wealth of information about the effect of perturbations on the cell's biological state. The collection of such measurements is generally referred to as the "profile" of the cell's biological state.
  • LC/MS/MS: microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry
  • Mortensen et al. describe a method for producing embryonic stem (ES) cell lines whereby both alleles are inactivated by homologous recombination. Using the methods of Mortensen et al., it is possible to obtain homozygous mutationally altered cells, i.e., double knockouts of ES cell lines. Mortensen et al. propose that their method may be generally applicable to other genes and to cell lines other than ES cells. Mortensen et al., 1992, Production of homozygous mutant ES cells with a single targeting construct, Mol. Cell. Biol. 12:2391-2395.
  • Comparison of profiles with other profiles in a database can give clues to the molecular targets of drugs and related functions, efficacy and toxicity of drug candidates and/or pharmacological agents. Such comparisons may also be used to derive consensus profiles representative of ideal drug activities or disease states. Profile comparison can also help detect diseases in a patient at an early stage and provide improved clinical outcome projections for a patient diagnosed with a disease.
  • the 365 experiments include experiments with/without drugs at different concentrations, with/without specific genes in the yeast strain, combinations of drug treatment and gene deletion, changes in culture density, growth temperature, medium composition, and stimulations with endogenous hormones like mating factor.
  • Although several thousand cellular constituents are profiled in each experiment depicted in Figure 1, typically only a small number of constituents change significantly, and often none at all.
  • a profile derived from any single one of the 365 experiments in Figure 1 would not provide enough information to determine the subtle effects of a particular perturbation. Consequently, comparisons using conventional profiles fail to provide sufficient information to discern the subtle effects of a perturbation on a biological system. In view of the above background, there is a great demand in the art for robust profile comparison methods.
  • This invention provides robust profile comparison methods. These methods are used to determine a degree of similarity between an effect of a first perturbation and a second perturbation on a biological system.
  • the methods of this invention have extensive applications in the areas of preventive health care, drug discovery, drug candidate lead selection, drug candidate validation, drug regimen optimization in a variety of patient populations, development of clinical trial protocols to satisfy United States Food and Drug Administration (FDA) requirements including those for investigative new drugs, satisfaction of related clinical trial protocol requirements in administrative agencies that are equivalent to the FDA in countries other than the United States, drug and/or drug candidate efficacy, drug and/or drug candidate toxicity, diagnostic applications such as disease monitoring in a variety of patient populations, and for the prediction of the clinical outcome of a patient.
  • FDA: United States Food and Drug Administration
  • One aspect of the invention includes a method comprising the steps of (a) determining a first set of constituent profiles, wherein each constituent profile in the set is determined using a different one of a plurality of initial states of a biological sample by measuring a response of the biological sample to the first perturbation when the biological sample is in the selected initial state; (b) determining a second set of constituent profiles, each constituent profile of the second set determined using a different one of a plurality of initial states of the biological sample by measuring a response of the biological sample to a second perturbation when the biological sample is in the selected initial state; (c) combining the first set of constituent profiles into a first augmented profile; (d) combining the second set of constituent profiles into a second augmented profile; and (e) comparing the first augmented profile with the second augmented profile to determine the degree of similarity between the first perturbation and the second perturbation.
  • At least one constituent profile in the first set of constituent profiles is a first response profile and at least one constituent profile in the second set of constituent profiles is a second response profile.
  • the first response profile is determined by at least one measurement of at least one cellular constituent in the biological sample when the biological sample is in an initial state selected from a plurality of initial states.
  • the second response profile is determined by at least one measurement of at least one cellular constituent in said biological sample when said biological sample is in the selected initial state.
  • at least one constituent profile in the first set of constituent profiles is a first projected profile and at least one constituent profile in the second set of constituent profiles is a second projected profile.
  • the first and second projected profiles each contain a plurality of cellular constituent set values derived according to a definition of co-varying cellular constituent sets.
  • the first and second projected profiles could be determined by an initial state selected from said plurality of initial states of the biological sample.
  • An augmented profile could include any combination of projected profiles and response profiles.
  • the biological sample is a cell line.
  • the cell line could be of a unicellular organism, and at least one initial state included in a plurality of initial states could be determined by altering the biological sample in a manner that alters cell wall permeability.
  • the biological sample is substantially isogenic to Saccharomyces cerevisiae.
  • the biological sample is a cell line that expresses a macromolecule that serves as a drug efflux pump.
  • some of the initial biological states are generated by selecting isogenic cell lines that do not possess macromolecules that have an ability to act as a drug efflux pump.
  • the biological sample is a cell line and the first initial state that is selected from a plurality of initial states is determined by a first set of culture growth conditions and a second initial state that is selected from a plurality of initial states is determined by a second set of culture growth conditions.
  • the first culture growth conditions and the second culture growth conditions vary by a variable such as an amount of a nutrient that is necessary for viability of said cell line, an amount of a trace element, an amount of a mineral, a culture temperature, and/or the nature of the container the sample is cultured in. Examples of containers include but are not limited to shaker flasks, culture plates and incubators.
  • the biological sample is a cell line and a first initial state that is selected from a plurality of initial states is determined by a culture growth density of the cell line and a second initial state that is selected from a plurality of initial states is determined by a second culture growth density of the cell line, wherein the two culture growth densities vary by an amount.
  • the biological sample is a cell line and a first initial state that is selected from a plurality of initial states is determined by a first amount of a pharmacological agent that is contacted with the biological sample and a second initial state that is selected from said plurality of initial states is determined by a second amount of a pharmacological agent that is contacted with the biological sample.
  • a first initial state is determined by a genetic feature of the biological sample.
  • the biological sample could be Saccharomyces cerevisiae having a genome and the first initial state that is selected from a plurality of initial states is determined by a genetic feature selected from the group consisting of a haploid state of the genome, a diploid state of the genome, a heterozygous state of a gene included in the genome, a homozygous state of a gene included in the genome, a mutation of a gene included in the genome, a deletion of a portion of a gene from the genome, an alteration of a regulatory sequence of a gene in the genome, an exogenous gene integrated into the genome and an exogenous oligonucleotide integrated into the genome.
  • the biological sample could be a cell line having a genome wherein the first initial state that is selected from a plurality of initial states is determined by a genetic feature selected from the group consisting of a heterozygous state of a gene included in the genome, a homozygous state of a gene included in the genome, a mutation of a gene included in the genome, a deletion of a portion of a gene from the genome, an alteration of a regulatory sequence of a gene in the genome, an exogenous gene integrated into the genome, and an exogenous oligonucleotide integrated into the genome.
  • the biological sample is a cell line and the first initial state that is selected from a plurality of initial states is determined by a state of a biological pathway that is selected from a compendium of biological pathways present in the cell line.
  • the biological sample is substantially isogenic with Saccharomyces cerevisiae and the biological pathway is a mating pathway.
  • the first perturbation is a first amount of a first pharmacological agent that is contacted with the biological sample.
  • the second perturbation is a second amount of the first pharmacological agent that is contacted with the biological sample, and the first and second amounts of pharmacological agent vary.
  • the second perturbation is a second amount of a second pharmacological agent that is contacted with said biological sample.
  • the biological sample includes a genome and the first perturbation is determined by the introduction of an exogenous gene into the genome, and/or deletion of at least one gene in the genome.
  • the first perturbation is a method, the method comprising: contacting said biological sample with a hormone, a drug, a peptide, an oligonucleotide, a mineral, a composition of media, a phage, a trace element, a salt, a colony stimulating factor or a source of irradiation.
  • the first perturbation is a method, the method comprising: contacting an amount of an organic compound that has a molecular weight less than 1000 Daltons with said biological sample.
  • $P'$ is a first augmented profile
  • $P'_1$ is a first constituent profile in a first set of constituent profiles that is determined by measuring a response of a biological sample to a first
  • $P'_N$ is an $N$th constituent profile in the first set of constituent profiles that is determined by measuring a response of the biological sample to the first perturbation when the biological sample is in an $N$th biological state selected
  • the second augmented profile is:
  • $P''_1$ is a first constituent profile in a second set of constituent profiles that is determined by measuring a response of the biological sample to the second perturbation when the biological sample is in said biological state;
  • $P''_N$ is an $N$th constituent profile in the second set of constituent profiles that
  • the quantitative measure of similarity is derived from Shannon mutual information theory.
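The combination of constituent profiles into augmented profiles, and their comparison, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `augment` and `pearson` and the toy numbers are hypothetical, concatenation is assumed as the combining step, and Pearson correlation stands in for the similarity measure (a measure derived from Shannon mutual information is not shown).

```python
import math

def augment(constituent_profiles):
    """Concatenate constituent profiles (one per initial state) into one augmented profile."""
    return [value for profile in constituent_profiles for value in profile]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log-ratio responses of 4 cellular constituents in two initial states.
drug_a = [[1.0, 0.0, -1.0, 0.0],   # initial state 1
          [0.0, 2.0, 0.0, 0.0]]    # initial state 2
drug_b = [[1.0, 0.0, -1.0, 0.0],   # initial state 1: same response as drug A
          [0.0, 0.0, 0.0, -2.0]]   # initial state 2: different response

similarity_state1 = pearson(drug_a[0], drug_b[0])
similarity_augmented = pearson(augment(drug_a), augment(drug_b))
```

In this toy case the two perturbations look the same in state 1 alone, while the augmented profiles are less correlated because the second initial state exposes a difference.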
  • each constituent profile includes a plurality of elements that each represent an amount of a cellular constituent in a biological sample.
  • the cellular constituents are independently selected from the group consisting of a gene expression level, an amount of an mRNA encoding a gene, an amount of a protein, an amount of an enzymatic activity, an amount of an epitope presented by a macromolecule, an amount of a divalent cation, an amount of a phosphorylated protein, an amount of a dephosphorylated protein, an amount of a hormone, and an amount of a peptide.
  • Another aspect of the invention is a method of determining an effect of a first perturbation on a subject, the method comprising: (a) determining a plurality of augmented profiles, each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets, wherein: each constituent profile set in the plurality of constituent profile sets is determined by obtaining a biological sample from the subject at a different time; and each constituent profile in the constituent profile set is determined by measuring a biological response of the biological sample to a different second perturbation selected from a plurality of perturbations; and (b) comparing the plurality of augmented profiles to determine the effect of the first perturbation on the subject.
  • the first perturbation may be selected from the group consisting of a diseased state, introduction of an exogenous gene into the genome of the subject, and a behavioral health risk.
  • the first constituent profile set in the plurality of constituent profile sets represents a baseline state, and all other constituent profile sets in the plurality of constituent profile sets are expressed as a ratio or logarithmic ratio relative to the first constituent profile set.
  • the first perturbation is a drug that is taken by the subject of interest at regular intervals.
  • Fig. 1 represents the results of 365 mRNA transcription profiling experiments. Methods were as described for a subset of these experiments in Section 6, supra.
  • Each of the 365 rows in this image has, when printed at full resolution, 6000 gray-scale pixels representing the ratio in mRNA expression of the 6000 yeast genes between the pair of cell conditions in that experiment pair. Black denotes upregulation of a gene's transcription, white denotes downregulation, and the middle gray denotes very little or no change.
  • the 365 condition pairs include comparisons of with/without drugs at different concentrations, with/without specific genes in the yeast strain, combinations of drug treatment and gene deletion, changes in culture density, growth temperature, medium composition, and stimulations with endogenous hormones like mating factor.
  • Fig. 2 represents profiles to drugs in multiple conditions. Although the response to the drugs under starting State 1 may be small or nonexistent, the concatenated response profiles obtained in different states may provide robust discrimination of the activities of the different compounds. ↑ denotes upregulation. ↓ denotes downregulation. Absence of an arrow denotes no change for that cellular constituent.
  • Fig. 3A illustrates a profile for the immunosuppressant drug cyclosporin A.
  • Fig. 3B illustrates a profile for the immunosuppressant drug FK506.
  • the horizontal axis is the intensity of the individual hybridized spots on the microarray, representing individual mRNA species abundance in the two cultures.
  • the vertical axis is the log10 of the ratio of the intensity measured for one fluorescent label (Culture 1) to that measured for the other label (Culture 2). Error bars and names are displayed only for those genes which had up or down regulations due to the drug that were significant at the 95% confidence level or better.
  • Fig. 4 shows the high correlation (similarity) between the effects of cyclosporin A and FK506 on S. cerevisiae that had been cultured in the presence of 1 μg/ml of FK506 and 30 μg/ml of cyclosporin respectively.
  • Fig. 5A illustrates a response profile for the gene deletion strain FPR cultured in the presence of 1 μg/ml of FK506.
  • Fig. 5B illustrates a response profile for the gene deletion strain CPH1 cultured in the presence of 1 μg/ml of FK506.
  • Fig. 5C illustrates a response profile for the gene deletion strain FPR cultured in the presence of 50 μg/ml of cyclosporin.
  • Fig. 5D illustrates a response profile for the gene deletion strain CPH1 cultured in the presence of 50 μg/ml of cyclosporin.
  • Fig. 6 illustrates the reduced correlation between the effects of cyclosporin and FK506 in yeast when augmented profiles are used.
  • Fig. 7 illustrates a computer system useful for embodiments of the invention.
  • a basis for the present invention is the unexpected discovery that augmented profiles provide a method for robustly discriminating between the subtle effects of a first perturbation and a second perturbation on a biological sample.
  • Augmented profiles are derived by combining a plurality of response profiles and/or projected profiles, which are in turn based upon measurements of cellular constituents within a biological sample as the biological sample is placed in a series of different starting states. This section presents a detailed description of the invention and its applications.
  • a biological sample and/or biological system includes a cell line, a culture of a cell line, a tissue sample obtained from a subject, a Homo sapiens, a mammal, a yeast substantially isogenic to Saccharomyces cerevisiae, or any other art-recognized biological system.
  • a perturbation includes the exposure of a biological sample to a drug candidate or pharmacologic agent, the introduction of an exogenous gene into a biological sample, the deletion of a gene from the biological sample, changes in the culture conditions of the biological sample, or any other art recognized method of perturbing a biological sample.
  • a constituent profile is a profile used in the formation of an augmented profile.
  • the constituent profile may, for example, be a response profile or a projected profile, which are described infra.
  • Behavioral Health Risk includes, but is not limited to, consumption of alcohol and cigarette smoking.
  • biological sample is broadly defined to include any cell, tissue, organ or multicellular organism.
  • a biological sample can be derived, for example, from cell or tissue cultures in vitro.
  • a biological sample can be derived from a living organism or from a population of single cell organisms.
  • the state of a biological sample can be measured by the content, activities or structures of its cellular constituents.
  • the state of a biological sample, as used herein, is determined by the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including characterizing the effects of a drug or other perturbation.
  • cellular constituent is broadly defined herein to encompass any kind of measurable biological variable.
  • the measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological sample), their activities, their states of modification (e.g., phosphorylation), or other art-recognized measurements relevant to the physiological state of a biological sample.
  • this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called aspects of the biological state of a biological sample.
  • the transcriptional state of a biological sample includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Often, a substantial fraction of all constituent RNA species in the biological sample are measured, but at least a sufficient fraction is measured to characterize the action of a drug or other perturbation of interest.
  • the transcriptional state of a biological sample can be conveniently determined by measuring cDNA abundances by any of several existing gene expression technologies. DNA arrays for measuring mRNA or transcript level of a large number of genes can be employed to ascertain the biological state of a sample.
  • the translational state of a biological sample includes the identities and abundances of the constituent protein species in the biological sample under a given set of conditions. Preferably a substantial fraction of all constituent protein species in the biological sample is measured, but at least a sufficient fraction is measured to characterize the action of a drug of interest.
  • the transcriptional state is often representative of the translational state.
  • the activity state of a biological sample includes the activities of the constituent protein species (and also optionally catalytically active nucleic acid species) in the biological sample under a given set of conditions.
  • the translational state is often representative of the activity state.
  • This invention is also adaptable, where relevant, to "mixed" aspects of the biological state of a biological sample in which measurements of different aspects of the biological state of a biological sample are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species, are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to any other aspect of a biological state of a biological sample that is measurable.
  • the biological state of a biological sample (e.g., a cell or cell culture)
  • a profile of some number of cellular constituents can be represented by the vector S.
  • $S = [S_1, \ldots, S_i, \ldots, S_k]$ (1)
  • $S_i$ is the level of the $i$th cellular constituent, for example, the transcript level of gene $i$, or alternatively, the abundance or activity level of protein $i$.
  • cellular constituents are measured as continuous variables.
  • transcriptional rates are typically measured as number of molecules synthesized per unit of time.
  • Transcriptional rate may also be measured as percentage of a control rate.
  • cellular constituents may be measured as categorical variables.
  • transcriptional rates may be measured as either "on" or "off", where the value "on" indicates a transcriptional rate above a predetermined threshold and the value "off" indicates a transcriptional rate below that threshold.
  • the responses of a biological sample to a perturbation can be measured by observing the changes in the biological state of the biological sample.
  • a response profile is a collection of changes of cellular constituents.
  • the response profile of a biological sample (e.g., a cell or cell culture) to the perturbation $m$ may be defined as the vector $v^{(m)}$:
  • $v^{(m)} = [v_1^{(m)}, \ldots, v_i^{(m)}, \ldots, v_k^{(m)}]$ (2)
  • $v_i^{(m)}$ is the amplitude of response of cellular constituent $i$ under the perturbation $m$.
  • biological response to the application of a pharmacological agent is measured by the induced change in the transcript level of at least 2 genes, preferably more than 10 genes, more preferably more than 100 genes and most preferably more than 1,000 genes.
  • biological response profiles comprise simply the difference between biological variables before and after perturbation.
  • the biological response is defined as the ratio of cellular constituents before and after a perturbation is applied.
  • $v_i^{(m)}$ is set to zero if the response of gene $i$ is below some threshold amplitude or confidence level determined from knowledge of the measurement error behavior. In such embodiments, those cellular constituents whose measured responses are lower than the threshold are given the response value of zero, whereas those cellular constituents whose measured responses are greater than the threshold retain their measured response values.
  • This truncation of the response vector is suitable when most of the smaller responses are expected to be greatly dominated by measurement error. After the truncation, the response vector $v^{(m)}$ also approximates a "matched detector" (see, e.g., Van Trees, 1968, Detection, Estimation, and Modulation Theory, Vol. I).
  • truncation levels can be set based upon the purpose of detection and the measurement errors. For example, in some embodiments, genes whose transcript level changes are lower than two fold or more preferably four fold are given the value of zero.
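The truncation of small responses can be sketched as follows. This is a minimal illustration, assuming the response is expressed as a log10 ratio of measured levels and that the threshold is a fold change; the function name `truncated_response` and the example values are hypothetical.

```python
import math

def truncated_response(before, after, fold_threshold=2.0):
    """Log10-ratio response vector with sub-threshold changes set to zero."""
    cutoff = math.log10(fold_threshold)
    response = []
    for level_before, level_after in zip(before, after):
        log_ratio = math.log10(level_after / level_before)
        # Keep the measured response only if it reaches the fold-change threshold.
        response.append(log_ratio if abs(log_ratio) >= cutoff else 0.0)
    return response

# Hypothetical transcript levels for four genes before and after a perturbation.
before = [100.0, 100.0, 100.0, 100.0]
after = [120.0, 400.0, 25.0, 90.0]
v = truncated_response(before, after)
```

Here the 1.2-fold and 0.9-fold changes are zeroed, while the four-fold increase and four-fold decrease keep their measured values.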
  • perturbations are applied at several levels of strength. For example, different amounts of a drug may be applied to a biological sample to observe its response.
  • the perturbation responses may be interpolated by approximating each by a single parameterized "model" function of the perturbation strength u.
  • An exemplary model function appropriate for approximating transcriptional state data is the Hill function, which has adjustable parameters $a$, $u_0$, and $n$.
  • the adjustable parameters are selected independently for each cellular constituent of the perturbation response.
  • the adjustable parameters are selected for each cellular constituent so that the sum of the squares of the differences between the model function (e.g., the Hill function, Equation 3) and the corresponding experimental data at each perturbation strength is minimized.
  • This preferable parameter adjustment method is known in the art as a least squares fit.
  • Other possible model functions are based on polynomial fitting. A more detailed description of model fitting and biological response has been disclosed in Friend and Stoughton, Methods of Determining Protein Activity Levels Using Gene Expression Profiles, U.S. Provisional Application Serial No. 60/084,742, filed on May 8, 1998, which is incorporated herein by reference in its entirety for all purposes.
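A least squares fit of a Hill-type model to responses measured at several perturbation strengths can be sketched as follows. Equation 3 is not reproduced in this text, so the commonly used form $H(u) = a (u/u_0)^n / (1 + (u/u_0)^n)$ is assumed here. The grid search, the function names `hill` and `fit_hill`, and the synthetic data are all hypothetical choices for illustration.

```python
def hill(u, a, u0, n):
    """Assumed Hill form: amplitude a, half-maximal strength u0, exponent n."""
    x = (u / u0) ** n
    return a * x / (1.0 + x)

def fit_hill(strengths, responses, u0_grid, n_grid):
    """Least-squares fit by grid search over (u0, n); a is solved in closed form."""
    best = None
    for u0 in u0_grid:
        for n in n_grid:
            shape = [hill(u, 1.0, u0, n) for u in strengths]
            denom = sum(s * s for s in shape)
            if denom == 0.0:
                continue
            # For fixed (u0, n) the model is linear in a, so the optimal a is a projection.
            a = sum(s * r for s, r in zip(shape, responses)) / denom
            sse = sum((a * s - r) ** 2 for s, r in zip(shape, responses))
            if best is None or sse < best[0]:
                best = (sse, a, u0, n)
    return best

# Synthetic, noise-free responses of one cellular constituent at five drug strengths.
strengths = [0.1, 0.3, 1.0, 3.0, 10.0]
responses = [hill(u, 2.0, 1.0, 2.0) for u in strengths]
u0_grid = [0.25 * i for i in range(1, 13)]   # 0.25 .. 3.0
n_grid = [0.5 * i for i in range(1, 9)]      # 0.5 .. 4.0
sse, a_fit, u0_fit, n_fit = fit_hill(strengths, responses, u0_grid, n_grid)
```

The fit would be repeated independently for each cellular constituent. A gradient-based optimizer could replace the grid search; the grid keeps the sketch dependency-free.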
  • the methods of the invention are useful for comparing augmented profiles that contain any number of response profiles and/or projected profiles. Projected profiles are best understood after a discussion of genesets, which are sets of co-regulated genes. Projected profiles are useful for analyzing many types of cellular constituents including genesets.
  • genes tend to increase or decrease their expression in groups. Genes tend to increase or decrease their rates of transcription together when they possess similar regulatory sequence patterns, i.e., transcription factor binding sites. This is the mechanism for coordinated response to particular signaling inputs (see, e.g., Madhani and Fink, 1998,
  • a preferred embodiment for identifying such basis genesets involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition. 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis. London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms. New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy. Freeman; Anderberg, 1973, Cluster Analysis for Applications. Academic Press: New York).
  • the expression of a large number of genes is monitored as biological samples are subjected to a wide variety of perturbations.
  • a table of data containing the gene expression measurements is used for cluster analysis.
  • Cluster analysis operates on a table of data which has the dimension m x k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
  • Clustering algorithms use dissimilarities or distances between objects when forming clusters.
  • the distance used is Euclidean distance in multidimensional space: $I(x,y) = \left[ \sum_i (X_i - Y_i)^2 \right]^{1/2}$, where
  • $I(x,y)$ is the distance between gene X and gene Y;
  • $X_i$ and $Y_i$ are the gene expression responses under perturbation $i$.
  • the Euclidean distance may be squared to place progressively greater weight on objects that are further apart.
  • the distance measure may be the Manhattan distance, e.g., between gene X and Y, which is provided by: $I(x,y) = \sum_i |X_i - Y_i|$, where
  • $X_i$ and $Y_i$ are gene expression responses under perturbation $i$.
  • Another distance definition, which is particularly useful in the context of cellular response, is
  • $I = 1 - r$, where $r$ is the correlation coefficient between the response vectors X, Y, also called the normalized dot product $X \cdot Y / (|X||Y|)$.
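The three distance definitions above can be written compactly. The sketch below is illustrative only; the function names and the example response vectors are assumptions. The third function follows the text in using the normalized dot product (without mean-centering) as the correlation coefficient.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two response vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def one_minus_r(x, y):
    """I = 1 - r, with r the normalized dot product X.Y / (|X||Y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


# Responses of two genes under three perturbations (made-up numbers).
gene_x = [1.0, 2.0, 3.0]
gene_y = [2.0, 4.0, 6.0]
```

Here `gene_y` is a scaled copy of `gene_x`, so the Euclidean and Manhattan distances are nonzero while `one_minus_r` is essentially zero, which is why the correlation-based measure suits genes that respond together but with different amplitudes.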
  • Various cluster linkage rules are useful for defining genesets.
  • Single linkage, a nearest neighbor method, determines the distance between clusters by the two closest objects.
  • complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct "clumps."
  • the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct "clumps."
  • the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight.
  • This method is particularly useful for embodiments where the cluster size is suspected to be greatly varied (Sneath and Sokal, 1973, Numerical taxonomy. San Francisco: W. H. Freeman & Co.).
  • Other cluster linkage rules such as the unweighted and weighted pair-group centroid and Ward's method are also useful for some embodiments of the invention. See, e.g., Ward, 1963, J. Am. Stat Assn. 58:236; Hartigan, 1975, Clustering algorithms. New York: Wiley.
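The single, complete, and unweighted pair-group average linkage rules differ only in how they reduce the set of pairwise distances between two clusters. A minimal sketch, with hypothetical function names and one-dimensional toy data:

```python
def pair_distances(cluster_a, cluster_b, dist):
    """All distances between one object of cluster_a and one of cluster_b."""
    return [dist(a, b) for a in cluster_a for b in cluster_b]


def single_linkage(cluster_a, cluster_b, dist):
    """Nearest neighbor rule: distance between the two closest objects."""
    return min(pair_distances(cluster_a, cluster_b, dist))


def complete_linkage(cluster_a, cluster_b, dist):
    """Greatest distance between any two objects in the different clusters."""
    return max(pair_distances(cluster_a, cluster_b, dist))


def average_linkage(cluster_a, cluster_b, dist):
    """Unweighted pair-group average over all cross-cluster pairs."""
    d = pair_distances(cluster_a, cluster_b, dist)
    return sum(d) / len(d)


def abs_dist(a, b):
    """Toy distance for one-dimensional responses."""
    return abs(a[0] - b[0])


cluster_a = [[0.0], [1.0]]
cluster_b = [[3.0], [5.0]]
```

Any of the distance measures defined earlier could be passed as `dist` in place of `abs_dist`.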
  • Genesets may be defined based on the many smaller branches of a tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct response pathways expected. If little or no prior information is available about the number of pathways, then the tree should be divided into as many branches as are truly distinct. 'Truly distinct' may be defined by a minimum distance value between the individual branches. Typical values are in the range 0.2 to 0.4 where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
  • 'truly distinct' may be defined with an objective test of statistical significance for each bifurcation in the tree.
  • the Monte Carlo randomization of the experiment index for each cellular constituent's responses across the set of experiments is used to define an objective test.
  • the objective test is defined in the following manner: Let $p_{ki}$ be the response of constituent $k$ in experiment $i$. Let $\Pi(i)$ be a random permutation of the experiment index. Then for each of a large (about 100 to 1000) number of different random permutations, construct $p_{k\Pi(i)}$. For each branching in the original tree, for each permutation, compute the fractional improvement $f$ in the total scatter about the cluster centers in going from one cluster to two: $f = 1 - \sum_k D_k^{(2)} / \sum_k D_k^{(1)}$, where
  • $D_k$ is the square of the distance measure for constituent $k$ with respect to the center (mean) of its assigned cluster.
  • Superscript 1 or 2 indicates whether it is with respect to the center of the entire branch or with respect to the center of the appropriate cluster out of the two subclusters.
  • $D = 1 - r$, where $r$ is the correlation coefficient between the responses of one constituent across the experiment set vs. the responses of the other (or vs. the mean cluster response).
  • the distribution of fractional improvements obtained from the Monte Carlo procedure is an estimate of the distribution under the null hypothesis that a given branching was not significant.
  • the actual fractional improvement for that branching with the unpermuted data is then compared to the cumulative probability distribution from the null hypothesis to assign significance.
  • Standard deviations are derived by fitting a log normal model for the null hypothesis distribution. Using this procedure, a standard deviation greater than about 2, for example, indicates that the branching is significant at the 95% confidence level.
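The Monte Carlo test for a single bifurcation might be sketched as below. Several pieces are assumptions rather than the procedure of the specification: the fractional improvement is taken as one minus the ratio of summed squared distances after and before the split; `split_in_two` is a hypothetical stand-in for re-running the clustering on each permuted data set; and significance is reported as the empirical fraction of permutations that match or beat the observed improvement, instead of fitting a log normal model to the null distribution.

```python
import math
import random


def pearson(x, y):
    """Correlation coefficient r; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def scatter(rows):
    """Sum over constituents of (1 - r)^2 relative to the cluster mean."""
    center = [sum(col) / len(rows) for col in zip(*rows)]
    return sum((1.0 - pearson(row, center)) ** 2 for row in rows)


def split_in_two(rows):
    """Hypothetical bifurcation: seed with the most distant pair of
    constituents, then assign every constituent to the nearer seed."""
    pairs = [(1.0 - pearson(rows[i], rows[j]), i, j)
             for i in range(len(rows)) for j in range(i + 1, len(rows))]
    _, i, j = max(pairs)
    left, right = [], []
    for row in rows:
        if 1.0 - pearson(row, rows[i]) <= 1.0 - pearson(row, rows[j]):
            left.append(row)
        else:
            right.append(row)
    return left, right


def fractional_improvement(rows):
    """Assumed form: 1 - (scatter about two centers) / (scatter about one)."""
    left, right = split_in_two(rows)
    total = scatter(rows)
    if total == 0.0:
        return 0.0
    return 1.0 - (scatter(left) + scatter(right)) / total


def branching_p_value(rows, n_perm=200, seed=0):
    """Permute the experiment index of each constituent independently and
    count how often the permuted improvement reaches the observed one."""
    rng = random.Random(seed)
    f_obs = fractional_improvement(rows)
    exceed = 0
    for _ in range(n_perm):
        permuted = [rng.sample(row, len(row)) for row in rows]
        if fractional_improvement(permuted) >= f_obs:
            exceed += 1
    return f_obs, exceed / n_perm


# Five constituents over six experiments: three rise together, two fall together.
up = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = [[a * v for v in up] for a in (1.0, 2.0, 3.0)]
rows += [[b * (7.0 - v) for v in up] for b in (1.0, 2.0)]
f_obs, p_value = branching_p_value(rows)
```

For this toy branch the two subclusters are perfectly coherent, so the observed fractional improvement is close to 1; a small `p_value` would then mark the branching as significant.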
  • Genesets defined by cluster analysis typically have underlying biological significance.
  • Another aspect of the cluster analysis method provides the definition of basis vectors for use in profile projection described in the following sections.
  • a set of basis vectors V has $k \times n$ dimensions, where $k$ is the number of genes and $n$ is the number of genesets.
  • $V^{(n)}_k$ is the amplitude contribution of gene index $k$ in basis vector $n$.
  • $V^{(n)}_k$ is proportional to the response of gene $k$ in geneset $n$ over the training data set used to define the genesets.
  • the elements $V^{(n)}_k$ are normalized so that each $V^{(n)}$ has unit length by dividing by the square root of the number of genes in geneset $n$. This produces basis vectors which are not only orthogonal (the genesets derived from cutting the clustering tree are disjoint), but also orthonormal (unit length). With this choice of normalization, random measurement errors in profiles project onto the $V^{(n)}$ in such a way that the amplitudes tend to be comparable for each $n$. Normalization prevents large genesets from dominating the results of similarity calculations.
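A sketch of how normalized basis vectors could be built from disjoint genesets follows. It assumes the simplest case, in which the unnormalized amplitude is 1 for a gene inside the geneset and 0 otherwise, so that dividing by the square root of the geneset size gives unit length; the function name and the example assignment are hypothetical.

```python
import math


def basis_vectors(assignments, n_genesets):
    """Build V as a list of k rows (genes) by n columns (genesets).

    assignments[k] is the geneset index of gene k. Each column is an
    indicator vector divided by sqrt(number of genes in that geneset).
    """
    sizes = [assignments.count(n) for n in range(n_genesets)]
    return [[1.0 / math.sqrt(sizes[n]) if assignments[k] == n else 0.0
             for n in range(n_genesets)]
            for k in range(len(assignments))]


# Six genes: four in geneset 0, two in geneset 1.
assignments = [0, 0, 1, 0, 1, 0]
V = basis_vectors(assignments, 2)
col0 = [row[0] for row in V]
col1 = [row[1] for row in V]
```

Because the genesets are disjoint, the columns never overlap and are therefore orthogonal; the normalization makes each of unit length regardless of geneset size.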
  • Genesets can also be defined based upon the mechanism of the regulation of genes. Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated. In some preferred embodiments, the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (Stormo and Hartzell, 1989, Identifying protein binding sites from unaligned DNA fragments, Proc Natl Acad Sci 86:1183-1187; Hertz and Stormo, 1995, Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps, Proc of 3rd Intl Conf on Bioinformatics and Genome Research.
  • K-means clustering may be used to cluster genesets when the regulation of genes of interest is partially known. K-means clustering is particularly useful in cases where the number of genesets is predetermined by the understanding of the regulatory mechanism. In general, K-means clustering is constrained to produce exactly the number of clusters desired. Therefore, if promoter sequence comparison indicates the measured genes should fall into three genesets, K-means clustering may be used to generate exactly three genesets with greatest possible distinction between clusters.
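A bare-bones K-means pass that forces the genes into a preset number of genesets is sketched below. The naive initialization (the first `k` response vectors serve as starting centers), the squared Euclidean distance, and the toy responses are assumptions for illustration; a cluster that loses all its members simply keeps its previous center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Assign each point to one of k clusters by alternating
    nearest-center assignment and center recomputation."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Responses of six genes under two perturbations (made-up numbers),
# expected from promoter comparison to fall into three genesets.
responses = [[0.0, 0.0], [10.0, 10.0], [20.0, 0.0],
             [0.0, 1.0], [10.0, 11.0], [21.0, 0.0]]
labels, centers = kmeans(responses, 3)
```

The result depends on the starting centers, so in practice several initializations would usually be tried.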
  • the expression value of genes can be converted into the expression value for genesets. This process is referred to as projection.
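  • the projection is as follows: $P = [P_1, \ldots, P_n] = p \cdot V$, where $p$ is the expression profile, $P$ is the projected profile, and $V$ is the set of basis vectors, in which
  • $V^{(n)}_k$ is the amplitude of cellular constituent index $k$ of basis vector $n$.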
  • the value of geneset expression is simply the average of the expression value of the genes within the geneset. In some other embodiments, the average is weighted so that highly expressed genes do not dominate the geneset value.
  • the collection of the expression values of the genesets is the projected profile.
  • projected profiles $P_i$ may be obtained for any set of profiles indexed by $i$. Similarities between the $P_i$ may be more clearly seen than between the original profiles $p_i$, for two reasons. First, measurement errors in extraneous genes have been excluded or averaged out. Second, the basis genesets tend to capture the biology of the profiles $p_i$, and so are matched detectors for their individual response components.
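Projection of a gene-level profile onto genesets is a vector-matrix product. The sketch below assumes `V` is stored as one row per gene and one column per geneset, with the indicator-style normalized amplitudes used earlier; the names and numbers are illustrative only.

```python
import math


def project(profile, V):
    """Projected profile: P_n = sum over genes k of p_k * V[k][n]."""
    n_sets = len(V[0])
    return [sum(p_k * row[n] for p_k, row in zip(profile, V))
            for n in range(n_sets)]


# Six genes, two genesets (four genes in the first, two in the second).
c = 1.0 / math.sqrt(2.0)
V = [[0.5, 0.0], [0.5, 0.0], [0.0, c], [0.5, 0.0], [0.0, c], [0.5, 0.0]]
profile = [2.0, 4.0, 1.0, 6.0, 3.0, 8.0]
projected = project(profile, V)
```

The six-element profile collapses to two geneset amplitudes, one per column of `V`.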
  • Classification and clustering of the profiles both are based on an objective similarity metric, call it S, where one useful definition is $S_{ij} = S(P_i, P_j) = P_i \cdot P_j / (|P_i||P_j|)$.
  • This definition is the generalized angle cosine between the vectors $P_i$ and $P_j$. It is the projected version of the conventional correlation coefficient between $p_i$ and $p_j$. Profile $p_i$ is deemed most similar to that other profile $p_j$ for which $S_{ij}$ is maximum.
  • New profiles may be classified according to their similarity to profiles of known biological significance, such as the response patterns for known drugs or perturbations in specific biological pathways. Sets of new profiles may be clustered using the distance metric $I_{ij} = 1 - S_{ij}$.
  • the statistical significance of any observed similarity $S_{ij}$ may be assessed using an empirical probability distribution generated under the null hypothesis of no correlation. This distribution is generated by performing the projection, Equations (9) and (10), for many different random permutations of the constituent index in the original profile $p$. That is, the ordered set $p_k$ is replaced by $p_{\Pi(k)}$, where $\Pi(k)$ is a permutation, for ~100 to 1000 different random permutations. The probability of the similarity $S_{ij}$ arising by chance is then the fraction of these permutations for which the similarity $S_{ij}$ (permuted) exceeds the similarity observed using the original unpermuted data.
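The similarity metric and its permutation test can be sketched together. The function names, the fixed random seed, and the example data are assumptions; only one of the two profiles is permuted, following the description of permuting the constituent index of the original profile.

```python
import math
import random


def project(profile, V):
    """P_n = sum over constituents k of p_k * V[k][n]."""
    return [sum(p * row[n] for p, row in zip(profile, V))
            for n in range(len(V[0]))]


def cosine(a, b):
    """Generalized angle cosine; 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def similarity_p_value(p_i, p_j, V, n_perm=1000, seed=0):
    """Observed similarity of the projected profiles, and the fraction of
    random constituent permutations of p_i that exceed it."""
    rng = random.Random(seed)
    P_j = project(p_j, V)
    s_obs = cosine(project(p_i, V), P_j)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.sample(p_i, len(p_i))  # random permutation of p_i
        if cosine(project(shuffled, V), P_j) > s_obs:
            exceed += 1
    return s_obs, exceed / n_perm


c = 1.0 / math.sqrt(2.0)
V = [[0.5, 0.0], [0.5, 0.0], [0.0, c], [0.5, 0.0], [0.0, c], [0.5, 0.0]]
p_j = [2.0, 4.0, 1.0, 6.0, 3.0, 8.0]
p_i = [0.5 * v for v in p_j]  # same pattern at half the amplitude
s_obs, p_chance = similarity_p_value(p_i, p_j, V)
```

The distance used for clustering projected profiles would then be `1.0 - s_obs`.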
  • a biological sample is placed in alternative states by, for example, introducing mutations or changing growth conditions, to make the biological sample more responsive to a given perturbation.
  • This concept is illustrated in Figure 2.
  • the drugs have only limited responses and comparison of their effects is tenuous and based on little information.
  • with augmented profiles consisting of concatenated profiles from multiple states or conditions, the profiles become much more informative. Because they are more informative, they can provide improved detail on the effects of different perturbations, such as drugs in the illustration, on a patient.
  • the different states may be different culture growth conditions, background genetic strains, or additional drug treatments, to name a few.
  • Additional states may be chosen based on prior biological knowledge to elicit specific responses in otherwise unresponsive cells, or they may be chosen more or less at random with the knowledge that the resulting additional diversity in the augmented response profile will tend to allow better discrimination, on average.
  • Techniques to change the initial state and possibly elicit responses include, for example, inhibiting drug efflux pumps or enhancing cell wall permeability by genetic modification of the organism, growing in nutrient-poor media, growing on plates vs. in volume culture, adding certain trace elements or minerals to the media, using haploid, diploid, and heterozygous background strains, and activating pathways such as the mating pathway, which have widespread effects on cell state and are likely to change the responsiveness to the stimuli that are being compared.
  • Robust augmented profile comparison has wide ranging applicability, such as providing a method for robust discrimination of drug activities or disease states in vivo.
  • multiple conditions are provided by following a patient in time or through other environmental or medical insults and by concatenation of the multiple profiles obtained under these different host conditions.
  • Profiles may be expressed as departure profiles from baseline states by forming the ratio or log(ratio) of constituent levels with respect to a baseline state, or any second perturbation.
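A departure profile of this kind can be computed element by element. In the sketch below the base-10 logarithm, the function name, and the constituent levels are assumptions; the text specifies only a ratio or log(ratio) with respect to a baseline.

```python
import math


def departure_profile(levels, baseline):
    """log10 of each constituent level relative to its baseline level."""
    return [math.log10(x / b) for x, b in zip(levels, baseline)]


# Made-up constituent levels in a perturbed state and in the baseline state.
perturbed = [200.0, 50.0, 100.0]
baseline = [100.0, 100.0, 100.0]
departure = departure_profile(perturbed, baseline)
```

With the log(ratio), a two-fold increase and a two-fold decrease have equal magnitude and opposite sign, and an unchanged constituent contributes zero.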
  • Equation (12), $S_{ij} = P_i \cdot P_j / (|P_i||P_j|)$ (12), can be used to define similarity on the concatenated profiles $P_i$, as it would be defined on single-state profiles $p_i$, where
  • $\cdot$ denotes dot product
  • $|\;|$ denotes vector norm (length).
  • Many other quantitative measures of similarity are possible, such as Shannon mutual information [C.E. Shannon and W. Weaver, The mathematical theory of communication, University of Illinois Press, Urbana, IL, 1949], or modifications of Equation (12) where elements of the profiles are set to "1" ("-1") if they exceed a positive (negative) threshold and "0" if they do not.
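The effect of concatenating profiles from several states can be illustrated with a small sketch. The function names, the symmetric threshold, and the numbers are assumptions chosen so that two hypothetical drugs look identical in a weakly responsive state but opposite in a second state.

```python
import math


def augment(profiles_by_state):
    """Concatenate single-state profiles into one augmented profile."""
    return [v for profile in profiles_by_state for v in profile]


def similarity(a, b):
    """Equation (12): dot product divided by the product of the norms."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm


def ternary(profile, threshold):
    """Thresholded variant: 1 above +threshold, -1 below -threshold, else 0."""
    return [1 if v > threshold else -1 if v < -threshold else 0
            for v in profile]


# State 1: both drugs elicit the same weak response.
# State 2: the responses are strong and opposite.
drug_a = [[0.0, 0.1], [1.0, -1.0]]
drug_b = [[0.0, 0.1], [-1.0, 1.0]]
s_single = similarity(drug_a[0], drug_b[0])
s_augmented = similarity(augment(drug_a), augment(drug_b))
```

Judged on state 1 alone the drugs appear identical, while the augmented profiles are strongly anti-correlated; `ternary` could be applied to both augmented profiles before calling `similarity` to obtain the thresholded measure.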
  • the methods of the present invention have applicability to the field of drug candidate lead selection.
  • a target enzyme will be screened against a large library of proprietary and/or nonproprietary compounds. Such a screening effort is referred to as a primary assay. Primary assays are often reduced to a robotic format in which thousands of compounds are screened per day. These efforts will result in a large number of compounds that produce the desired activity, which is typically the inhibition of the activity of a selected target enzyme.
  • Hits from the primary assay are typically screened in appropriately designed secondary assays. While the format of the secondary assay may vary depending on the scope of the drug discovery project, a typical secondary assay includes the dose response of a compound on whole cells. Thus in such a cell-based assay, the presence of some cellular constituent, such as TNF secretion, is measured as the cells are incubated in increasing concentrations of test compound.
  • secondary assays are typically used to compare the activity of hits from the primary assay with the activity of some reference compound.
  • the reference compound may be one that has proven efficacy in the appropriate clinical setting, a known drug or simply a prior lead. Comparison of newly developed compounds against the active reference compound serves as an excellent tool for marking progress and for determining what is to be expected of new compounds.
  • the methods of the present invention will serve as an improved secondary assay. Accordingly, the effect of dosing an appropriate cell line with a reference compound can be compared to the effect of dosing the same cell line with each of the hits from the primary screening assay.
  • appropriate cellular constituents of the cell line can be measured using any of the techniques described in this specification or known in the art. Further, these measurements can be done when the cell line is placed in a variety of different initial biological states. For example, cell response profiles can be measured when the reference compound has been contacted with the cells after they have been cultured in a variety of cell culture densities, temperatures, or other culture conditions. These response profiles are combined to form a reference augmented profile.
  • a potential drug candidate will exhibit excellent activity in the primary in vitro assay and secondary cell-based assays. Even if a compound is successful in both primary and secondary assays, there remains a need to validate the compound.
  • Compound validation addresses the difficult issue of verifying that a test compound was successful in the primary and secondary assays because of selective effects on the desired target rather than unselective effects on multiple physiological processes.
  • Compounds that selectively affect the desired target are preferred over compounds that unselectively affect a wide variety of cellular constituents. For example, a compound that is excessively hydrophobic may bind to the target enzyme by unselective hydrophobic interactions.
  • a nonselective kinase inhibitor such as staurosporine will bind and inhibit dozens of kinases.
  • a test compound may perform well in the secondary assay because it is toxic to the cells or because the compound knocked out a biological pathway that is unrelated to the biological pathway of interest.
  • reference augmented profiles are compared with augmented profiles generated from compounds that need validation. For example, reference compounds that have a general toxic effect on the biological sample will have distinct augmented profiles. Thus a low correlation between such reference toxic compounds and test compounds of interest is desired.
  • a high correlation between an augmented profile derived from a previously validated compound and a test compound would indicate that the test compound is selectively influencing the proper biological pathway.
  • a previously validated compound may be obtained from animal trials or from prior scientific publications.
  • augmented profiles developed from biological samples obtained from a patient can be compared with reference augmented profiles that represent model drug responses of patients with favorable clinical outcome. Data derived from such comparisons would then be used to optimize a particular drug regimen thus maximizing the effectiveness of drug treatment and reducing its costs in terms of response time and financial expenditure.
  • the augmented profiles taken from patients can also be used to discover unsatisfactory therapeutic responses caused by inadequate drug exposure or undesirable side-effects before they manifest in unfavorable symptoms.
  • Robust augmented profile comparison can also be used to detect poor compliance with a dosage regimen.
  • regular comparison of augmented profiles can be used to detect and monitor interactions with co-ingested medications or the effects of changes in the physical status of the patient.
  • augmented profiles will provide an invaluable service in the field of preventive health care.
  • biological samples are obtained from subjects on a routine basis over time. Augmented profiles are developed based upon these biological samples. Comparison of these augmented profiles to a database that includes several model disease states provides advance warning that the subject has a particular disease before the disease manifests itself in any outward clinical symptoms.
  • Such a diagnostic tool is particularly valuable in diseases such as cancer because early treatment leads to improved chances of recovery and/or survival.
  • Appropriately chosen augmented profile comparisons will also provide useful information on health risks in a subject.
  • appropriately designed augmented profiles will be used to determine if a patient should alter their diet, exercise more, take certain vitamins, or alter other behavioral aspects.
  • the utility of the robust profile comparison method will increase.
  • Robust profile comparison has utility in the field of disease monitoring. For example, robust comparison of an augmented profile obtained from a cancer patient before and after the start of a drug therapy regimen can provide a physician with valuable information about the effects that a particular cancer drug regimen has on a patient.
  • augmented profiles are compared with a database of augmented profiles to determine if the subject's augmented profile correlates with those patients in which the drug had a positive effect, no effect, toxic side-effects, or any combination thereof.
  • augmented profiles are used to track the health of an AIDS patient. Currently, biological markers such as T cell count are used as a crude indicator of the progression of the AIDS virus.
  • augmented profiles taken from patients throughout the various stages of a disease may be stored in national and/or proprietary databases. Then, using the robust profile comparison methods of the present invention, augmented profiles obtained from new patients may be compared with profiles in the database.
  • This invention utilizes the ability to measure the responses of a biological sample to a large variety of perturbations. This section provides some exemplary methods for measuring biological responses.
  • the transcriptional rate can be measured by techniques of hybridization to arrays of nucleic acid or nucleic acid mimic probes, described in the next subsection, or by other gene expression technologies, such as those described in the subsequent subsection. However measured, the result is either the absolute or relative amounts of transcripts, or response data including values representing RNA abundance ratios, which usually reflect DNA expression ratios (in the absence of differences in RNA degradation rates).
  • aspects of the biological state other than the transcriptional state such as the translational state, the activity state, or mixed aspects can be measured.
  • measurement of the transcriptional state is made by hybridization to transcript arrays, which are described in this subsection. Certain other methods of transcriptional state measurement are described later in this subsection.
  • transcript arrays, also called herein "microarrays"
  • Transcript arrays can be employed for analyzing the transcriptional state in a biological sample and especially for measuring the transcriptional states of a biological sample exposed to graded levels of a drug of interest or to graded perturbations to a biological pathway of interest.
  • transcript arrays are produced by hybridizing detectably labeled polynucleotides representing the mRNA transcripts present in a cell (e.g., fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray.
  • a microarray is a surface with an ordered array of binding (e.g., hybridization) sites for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes.
  • Microarrays can be made in a number of ways, of which several are described below.
  • microarrays share certain preferred characteristics:
  • the arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other.
  • the microarrays are small, usually smaller than 5 cm², and they are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions.
  • a given binding site or unique set of binding sites in the microarray will specifically bind the product of a single gene in the cell.
  • when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product of the gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
  • cDNAs from two different cells are hybridized to the binding sites of the microa ⁇ ay.
  • for drug responses, one biological sample is exposed to a drug and another biological sample of the same type is not exposed to the drug.
  • for pathway responses, one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation.
  • the cDNA derived from each of the two cell types is differently labeled so that the two can be distinguished.
  • cDNA from a cell treated with a drug is synthesized using a fluorescein-labeled dNTP
  • cDNA from a second cell, not drug-exposed, is synthesized using a rhodamine-labeled dNTP.
  • the cDNA from the drug-treated (or pathway perturbed) cell will fluoresce green when the fluorophore is stimulated and the cDNA from the untreated cell will fluoresce red.
  • when the drug treatment has no effect, either directly or indirectly, on the relative abundance of a particular mRNA in a cell, the mRNA will be equally prevalent in both cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent.
  • the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores (and appear brown in combination).
  • an advantage of using cDNA labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene in two cell states can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses.
  • it is also possible to use cDNA from a single cell, and compare, for example, the absolute amount of a particular mRNA in, e.g., a drug-treated or pathway-perturbed cell and an untreated cell.
  • Microarrays are known in the art and consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof) can be specifically hybridized or bound at a known position.
  • the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (e.g., a protein or RNA), and in which binding sites are present for products of most or almost all of the genes in the organism's genome.
  • the "binding site" is a nucleic acid or nucleic acid analogue to which a particular cognate cDNA can specifically hybridize.
  • the nucleic acid or analogue of the binding site can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than-full-length cDNA, or a gene fragment.
  • although the microarray preferably contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required.
  • the microarray will have binding sites corresponding to at least about 50% of the genes in the genome, often at least about 75%, more often at least about 85%, even more often more than about 90%, and most often at least about 99%.
  • the microarray has binding sites for genes relevant to the action of a drug of interest or in a biological pathway of interest.
  • a "gene" is an open reading frame (ORF) of preferably at least 50, 75, or 99 amino acids from which a messenger RNA is transcribed in the organism (e.g., if a single cell) or in some cell in a multicellular organism.
  • the number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well-characterized portion of the genome.
  • the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the Saccharomyces cerevisiae genome has been completely sequenced and is reported to have approximately 6275 open reading frames (ORFs) longer than 99 amino acids.
  • of these ORFs, 5885 are likely to specify protein products (Goffeau et al., 1996, Life with 6000 genes, Science 274:546-567, which is incorporated by reference in its entirety for all purposes). In contrast, the human genome is estimated to contain approximately $10^5$ genes.
  • the "binding site" to which a particular cognate cDNA specifically hybridizes is usually a nucleic acid or nucleic acid analogue attached at that binding site.
  • the binding sites of the microarray are DNA polynucleotides corresponding to at least a portion of each gene in an organism's genome. These DNAs can be obtained by, e.g., polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences.
  • PCR primers are chosen, based on the known sequence of the genes or cDNA, that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray).
  • Computer programs are useful in the design of primers with the required specificity and optimal amplification properties. In the case of binding sites corresponding to very long genes, it will sometimes be desirable to amplify segments near the 3' end of the gene so that when oligo-dT primed cDNA probes are hybridized to the microarray, less-than-full length probes will bind efficiently.
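The uniqueness criterion for amplified fragments lends itself to a simple computational check. The sketch below is a hypothetical helper, not a primer design program: it flags pairs of fragments that share a run of more than 10 contiguous identical bases, compares only the strands as given (reverse complements are ignored), and uses made-up sequences.

```python
def kmers(seq, k):
    """All contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_run(frag_a, frag_b, max_shared=10):
    """True if the fragments share more than max_shared contiguous
    identical bases, i.e. at least one common (max_shared + 1)-mer."""
    k = max_shared + 1
    return not kmers(frag_a, k).isdisjoint(kmers(frag_b, k))


def conflicting_pairs(fragments, max_shared=10):
    """Index pairs of fragments that violate the uniqueness criterion."""
    return [(i, j)
            for i in range(len(fragments))
            for j in range(i + 1, len(fragments))
            if shares_run(fragments[i], fragments[j], max_shared)]


shared = "ACGTTGCAAGC"  # 11 bases
fragments = [
    "TTTT" + shared + "GGGG",
    "CCCC" + shared + "AAAA",        # shares all 11 bases with fragment 0
    "CCCC" + shared[:10] + "TAAAA",  # shares only 10 bases with fragment 0
]
conflicts = conflicting_pairs(fragments)
```

Building the set of k-mers once per fragment keeps the comparison simple; for genome-scale fragment sets an index over all k-mers would avoid the pairwise loop.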
  • each gene fragment on the microa ⁇ ay will be between about 50 bp and about 2000 bp, more typically between about 100 bp and about 1000 bp, and usually between about 300 bp and about 800 bp in length.
  • PCR methods are well known and are described, for example, in Innis et al. eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA, which is incorporated by reference in its entirety for all purposes.
  • an alternative means for generating the nucleic acid for the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using N-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acid Res 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:245-248). Synthetic sequences are between about 15 and about 500 bases in length, more typically between about 20 and about 50 bases.
  • synthetic nucleic acids include non-natural bases, e.g., inosine.
  • nucleic acid analogues may be used as binding sites for hybridization.
  • an example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., 1993, PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick hydrogen-bonding rules, Nature 365:566-568; see also U.S. Patent No. 5,539,083).
  • the binding (hybridization) sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-209).
  • the polynucleotide of the binding sites is RNA.
  • the nucleic acid or analogue is attached to a solid support, which may be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA.
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays.
  • Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Light-directed spatially addressable parallel chemical synthesis, Science 251:767-773; Pease et al., 1994, Light-directed oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci.
  • oligonucleotides (e.g., 20-mers)
  • a surface such as a derivatized glass slide.
  • Oligonucleotide probes can be chosen to detect alternatively spliced mRNAs or to serve as various types of control.
  • Another preferred method of making microarrays is by use of an inkjet printing process to synthesize oligonucleotides directly on a solid phase, as described, e.g., in co-pending U.S. patent application Serial No. 09/008,120 filed on January 16, 1998, by Blanchard entitled "Chemical Synthesis Using Solvent Microdroplets", which is incorporated by reference herein in its entirety.
  • microarrays e.g., by masking
  • any type of a ⁇ ay for example, dot blots on a nylon hybridization membrane ⁇ see Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989), could be used, although, as will be recognized by those of skill in the art, very small a ⁇ ays will be prefe ⁇ ed because hybridization volumes will be smaller.
  • RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al, 1979, Biochemistry 18:5294-5299).
  • Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et al., supra).
  • Cells of interest include wild-type cells, drug-exposed wild-type cells, modified cells, and drug-exposed modified cells.
  • Labeled cDNA is prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see, e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). Reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, most preferably a fluorescently labeled dNTP.
  • isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotech. 14:1675, which is incorporated by reference in its entirety for all purposes).
  • the cDNA or RNA probe can be synthesized in the absence of detectable label and may be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • When fluorescently-labeled probes are used, many suitable fluorophores are known, including fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, CA). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
  • a label other than a fluorescent label is used.
  • a radioactive label, or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995, High density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207; Pietu et al., 1996, Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Res. 6:492).
  • use of radioisotopes is a less-preferred embodiment.
  • labeled cDNA is synthesized by incubating a mixture containing 0.5 mM dGTP, dATP and dCTP plus 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e.g., 0.1 mM Rhodamine 110 UTP (Perkin Elmer Cetus) or 0.1 mM Cy3 dUTP (Amersham)) with reverse transcriptase (e.g., Superscript™ II, LTI Inc.) at 42° C for 60 min.
  • Nucleic acid hybridization and wash conditions are optimally chosen so that the probe "specifically binds" or "specifically hybridizes" to a specific array site, i.e., the probe hybridizes, duplexes or binds to a sequence array site with a complementary nucleic acid sequence but does not hybridize to a site with a non-complementary nucleic acid sequence.
  • One polynucleotide sequence is considered complementary to another when, if the shorter of the polynucleotides is less than or equal to 25 bases, there are no mismatches using standard base-pairing rules or, if the shorter of the polynucleotides is longer than 25 bases, there is no more than a 5% mismatch.
  • the polynucleotides are perfectly complementary (no mismatches). It can easily be demonstrated that specific hybridization conditions result in specific hybridization by carrying out a hybridization assay including negative controls (see, e.g., Shalon et al., supra, and Chee et al., supra).
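The complementarity rule stated above (no mismatches when the shorter polynucleotide is 25 bases or fewer, no more than 5% mismatch otherwise) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the sequences are assumed to be DNA written 5'->3', and the ungapped sliding comparison is a simplifying assumption rather than anything the text prescribes.

```python
# Illustrative sketch of the complementarity rule; names are hypothetical.
# Assumptions (not from the source): DNA alphabet, sequences written 5'->3',
# and the shorter strand is compared without gaps against every window of
# the reverse complement of the longer strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def min_mismatches(a: str, b: str) -> int:
    """Fewest mismatches over all ungapped placements of the shorter strand."""
    shorter, longer = sorted((a.upper(), b.upper()), key=len)
    target = reverse_complement(longer)
    n = len(shorter)
    return min(
        sum(x != y for x, y in zip(shorter, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def is_complementary(a: str, b: str) -> bool:
    """No mismatches up to 25 bases; at most 5% mismatch for longer strands."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("sequences must be non-empty")
    mismatches = min_mismatches(a, b)
    if n <= 25:
        return mismatches == 0
    # Integer arithmetic avoids floating-point edge cases at exactly 5%.
    return mismatches * 100 <= 5 * n
```

For example, `is_complementary("ATGC", "GCAT")` holds because `GCAT` is the reverse complement of `ATGC`, whereas a single mismatch in a 4-mer fails the test.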
  • Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide.
  • hybridization conditions are hybridization in 5 X SSC plus 0.2% SDS at 65° C for 4 hours followed by washes at 25° C in low stringency wash buffer (1 X SSC plus 0.2% SDS) followed by 10 minutes at 25° C in high stringency wash buffer (0.1 X SSC plus 0.2% SDS) (Schena et al., 1996, Proc. Natl. Acad. Sci. USA, 93:10614).
  • Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V. and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, CA.
  • the fluorescence emissions at each site of a transcript array are preferably detected by scanning confocal laser microscopy.
  • a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used.
  • a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization, Genome Research 6:639-645, which is incorporated by reference in its entirety for all purposes).
  • the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., 1996, Genome Res. 6:639-645 and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684, may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
  • Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit analog to digital board.
  • the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made.
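One way such a cross-talk correction might be applied is sketched below. The text does not specify the form of the correction; the linear two-channel bleed-through model, the function name, and the coefficient names are all assumptions made for illustration, with the two coefficients standing in for the experimentally determined values.

```python
def correct_cross_talk(measured_green, measured_red,
                       red_into_green, green_into_red):
    """Undo linear bleed-through between two fluor channels.

    Assumed model (hypothetical, not taken from the source):
        measured_green = green + red_into_green * red
        measured_red   = red   + green_into_red * green
    The two coefficients would be determined experimentally.
    """
    det = 1.0 - red_into_green * green_into_red
    if det <= 0:
        raise ValueError("cross-talk coefficients are too large to invert")
    # Solve the 2x2 linear system for the underlying channel intensities.
    green = (measured_green - red_into_green * measured_red) / det
    red = (measured_red - green_into_red * measured_green) / det
    return green, red
```

With coefficients of 0.1 and 0.05, measured values of 105 (green) and 55 (red) map back to underlying intensities of about 100 and 50.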
  • a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other tested event.
  • the relative abundance of an mRNA in two biological samples is scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested), or as not perturbed (i.e., the relative abundance is the same).
  • a difference between the two sources of RNA of at least a factor of about 25% (i.e., RNA is 25% more abundant in one source than in the other source), more usually about 50%, even more often by a factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is scored as a perturbation.
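The ratio-based scoring described above can be sketched as follows. The function name and the returned field names are hypothetical; the threshold is expressed as a fold change, so 1.25 corresponds to "about 25%", 1.5 to "about 50%", and 2, 3 or 5 to the larger factors mentioned.

```python
def score_perturbation(intensity_a, intensity_b, min_fold=2.0):
    """Score a two-channel measurement as perturbed or not perturbed.

    intensity_a, intensity_b: background-corrected emissions of the two
    fluorophores at one array site (assumed to be positive numbers).
    min_fold: fold difference, in either direction, required to call a
    perturbation (e.g., 1.25, 1.5, 2, 3 or 5).
    """
    if intensity_a <= 0 or intensity_b <= 0:
        raise ValueError("intensities must be positive")
    ratio = intensity_a / intensity_b
    # Treat up- and down-regulation symmetrically.
    fold = max(ratio, 1.0 / ratio)
    return {"ratio": ratio, "fold": fold, "perturbed": fold >= min_fold}
```

A site with intensities 200 and 100 is scored as perturbed at a two-fold threshold, while 100 versus 110 is not perturbed even at the 25% threshold.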
  • Genesets can be determined by observing the gene expression response to perturbation of a particular pathway. For instance, transcript arrays reflecting the transcriptional state of a biological sample of interest are made by hybridizing a mixture of two differently labeled probes, each corresponding (i.e., complementary) to the mRNA of a different sample of interest, to the microarray.
  • the two samples may be of the same type, i.e., of the same species and strain, but differ genetically at a limited number of loci. Alternatively, they are isogenic and differ in their environmental history (e.g., exposed to a drug versus not exposed).
  • genes whose expression is highly correlated may belong to a geneset.
  • gene expression change in response to a large number of perturbations is used to construct a clustering tree for the purpose of defining genesets.
  • the perturbations should target different pathways.
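As a rough illustration of grouping co-regulated genes, the sketch below links any two genes whose response profiles across perturbations have a Pearson correlation above a threshold and reports the connected groups. This is a deliberately simplified stand-in for building and cutting a clustering tree; the function names, the threshold of 0.9, and the single-linkage grouping are assumptions, not the method of the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_genes(profiles, min_corr=0.9):
    """Group genes whose profiles are linked by correlation >= min_corr.

    profiles: dict mapping gene name -> list of expression changes, one
    value per perturbation. Returns a sorted list of sorted gene groups.
    """
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_corr:
                parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

Profiles that rise together across the perturbations end up in one group, and profiles that fall together in another; a constant profile would need to be filtered out first, since its correlation is undefined.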
  • biological samples are subjected to graded perturbations to pathways of interest.
  • the samples exposed to the perturbation and samples not exposed to the perturbation are used to construct transcript arrays, which are measured to find the mRNAs with modified expression and the degree of modification due to exposure to the perturbation. In this way, the perturbation-response relationship is obtained.
  • the density of levels of the graded drug exposure and graded perturbation control parameter is governed by the sharpness and structure in the individual gene responses - the steeper the steepest part of the response, the denser the levels needed to properly resolve the response.
  • to reduce experimental error, it is preferable to reverse the fluorescent labels in two-color differential hybridization experiments, which reduces biases peculiar to individual genes or array spot locations.
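A common way to combine a label-reversed pair of hybridizations is to average the log ratios with the sign of the reversed experiment flipped, so that a multiplicative dye bias cancels. The text does not give a formula; the function name and the multiplicative-bias assumption below are illustrative only.

```python
from math import log2


def dye_swap_log_ratio(forward_ratio, reversed_ratio):
    """Combine a label-reversed pair into one log2 sample/reference ratio.

    forward_ratio: red/green ratio with the sample labeled red.
    reversed_ratio: red/green ratio with the sample labeled green.
    Assumes (hypothetically) that any dye bias multiplies the red/green
    ratio by the same factor in both experiments, so it cancels here.
    """
    if forward_ratio <= 0 or reversed_ratio <= 0:
        raise ValueError("ratios must be positive")
    return 0.5 * (log2(forward_ratio) - log2(reversed_ratio))
```

For instance, a true four-fold induction measured with a 1.5-fold red bias gives ratios of 6 and 0.375, and the combined value is log2(4) = 2.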
  • Multiple measurements over exposure levels and perturbation control parameter levels provide additional experimental error control. With adequate sampling, a trade-off between averaging of errors and loss of structure in the response functions may be made when choosing the width of the spline function S used to interpolate response data.
  • the cells are exposed to graded levels of the drug, drug candidate of interest or graded strength of other perturbation.
  • the compound is usually added to their nutrient medium.
  • for yeast, it is preferable to harvest the cells in early log phase, since expression patterns are relatively insensitive to time of harvest at that time.
  • levels of the drug or other compounds may be added. The particular level employed depends on the particular characteristics of the drug, but usually will be between about 1 ng/ml and 100 mg/ml. In some cases a drug will be solubilized in a solvent such as DMSO.
  • the cells exposed to the drug and cells not exposed to the drug are used to construct transcript arrays, which are measured to find the mRNAs with altered expression due to exposure to the drug. Thereby, the drug response is obtained.
  • the levels of drug exposure used provide sufficient resolution (e.g., by using approximately 10 levels of drug exposure) of rapidly changing regions of the drug response.
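A graded exposure series spanning the stated range could be laid out as below. Evenly spaced levels on a logarithmic scale are an assumption chosen for illustration; the text only asks that the levels resolve rapidly changing regions of the response, so in practice extra levels might be placed where the response is steepest. The function name is hypothetical.

```python
def dose_levels(low, high, n):
    """Return n concentrations from low to high, evenly spaced on a log scale.

    low, high: positive concentrations in the same unit (e.g., ng/ml).
    n: number of levels (at least 2).
    """
    if not (0 < low < high) or n < 2:
        raise ValueError("need 0 < low < high and n >= 2")
    step = (high / low) ** (1.0 / (n - 1))
    return [low * step ** i for i in range(n)]


# Ten levels between about 1 ng/ml and 100 mg/ml (1e8 ng/ml).
levels_ng_per_ml = dose_levels(1.0, 1e8, 10)
```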
  • the transcriptional state of a cell may be measured by other gene expression technologies known in the art.
  • Several such technologies produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., European Patent 0 534858 A1, filed September 24, 1992, by Zabeau et al.), or methods selecting restriction fragments with sites closest to a defined mRNA end (see, e.g., Prashar et al., 1996, Proc. Natl. Acad. Sci. USA 93:659-663).
  • other methods statistically sample cDNA pools, such as by sequencing sufficient bases (e.g., 20-50 bases) in each of multiple cDNAs to identify each cDNA, or by sequencing short tags (e.g., 9-10 bases) which are generated at known positions relative to a defined mRNA end (see, e.g., Velculescu, 1995, Science 270:484-487).
  • Measurement of the translational state may be performed according to several methods.
  • whole genome monitoring of protein (i.e., the "proteome," Goffeau et al., supra)
  • binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome.
  • antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to the action of a drug of interest.
  • Methods for making monoclonal antibodies are well known (see, e.g., Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor, New York, which is incorporated in its entirety for all purposes).
  • monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell.
  • proteins from the cell are contacted to the array and their binding is assayed with assays known in the art.
  • proteins can be separated by two-dimensional gel electrophoresis systems.
  • Two-dimensional gel electrophoresis is well-known in the art and typically involves iso-electric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., 1990, Gel Electrophoresis of Proteins: A Practical Approach, IRL Press, New York; Shevchenko et al., 1996, Proc. Natl. Acad. Sci. USA 93:1440-1445; Sagliocco et al., 1996, Yeast 12:1519-1533; Lander, 1996, Science 274:536-539.
  • the resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing. Using these techniques, it is possible to identify a substantial fraction of all the proteins produced under given physiological conditions, including in cells (e.g., in yeast) exposed to a drug, or in cells modified by, e.g., deletion or over-expression of a specific gene.
  • the methods of the invention are applicable to any cellular constituent that can be monitored. For instance, where activities of proteins relevant to the characterization of a perturbation, such as drug action, can be measured, profiles can be based on such measurements. Activity measurements can be performed by any functional, biochemical, or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with the natural substrate(s), and the rate of transformation measured. Where the activity involves association in multimeric units, for example association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured. Also, where only a functional activity is known, for example, as in cell cycle control, performance of the function can be observed. However known and measured, the changes in protein activities form the response data.
  • response data may be formed of mixed aspects of the biological state of a cell.
  • Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances, and changes in certain protein activities.
  • Methods for targeted perturbation of cellular states at various levels of a cell are increasingly known and applied in the art. Any such methods that are capable of specifically targeting and controllably modifying (e.g., either by a graded increase or activation or by a graded decrease or inhibition) specific cellular constituents (e.g., gene expression, RNA concentrations, protein abundances, protein activities, and so forth) can be employed in performing cellular state perturbations. Controllable modifications of cellular constituents consequentially controllably perturb cellular states originating at the modified cellular constituents. Preferable modification methods are capable of individually targeting each of a plurality of cellular constituents and most preferably a substantial fraction of such cellular constituents.
  • Cellular state perturbations may be made in cell types derived from any organism for which genomic or expressed sequence information is available and for which methods are available that permit controllable modification of the expression of specific genes. Genomic sequencing information is available for several eukaryotic organisms, including humans, nematodes, Arabidopsis, and Saccharomyces cerevisiae.
  • RNA abundances or activities, direct modifications of protein abundances, and direct modification of protein activities including use of drugs (or chemical moieties in general) with specific known action.
  • the most commonly used controllable promoter in yeast is the GAL1 promoter (Johnston et al., 1984, Sequences that regulate the divergent GAL1-GAL10 promoter in Saccharomyces cerevisiae, Mol Cell. Biol. 8: 1440-1448).
  • the GAL1 promoter is strongly repressed by the presence of glucose in the growth medium, and is gradually switched on in a graded manner to high levels of expression by the decreasing abundance of glucose and the presence of galactose.
  • the GAL1 promoter usually allows a 5-100 fold range of expression control on a gene of interest.
  • promoter systems include the MET25 promoter (Kerjan et al., 1986, Nucleotide sequence of the Saccharomyces cerevisiae MET25 gene, Nucl. Acids. Res. 14:7861-7871), which is induced by the absence of methionine in the growth medium, and the CUP1 promoter, which is induced by copper (Mascorro-Gallardo et al., 1996, Construction of a CUP1 promoter-based vector to modulate gene expression in Saccharomyces cerevisiae, Gene 172:169-170). All of these promoter systems are controllable in that gene expression can be incrementally controlled by incremental changes in the abundances of a controlling moiety in the growth medium.
  • In mammalian cells, several means of titrating expression of genes are available (Spencer, 1996, Creating conditional mutations in mammals, Trends Genet. 12:181-187). As mentioned above, the Tet system is widely used, both in its original form, the "forward" system, in which addition of doxycycline represses transcription, and in the newer "reverse" system, in which doxycycline addition stimulates transcription (Gossen et al., 1995, Proc.
  • CID chemical-induced dimerization
  • the gene of interest is put under the control of the CID-responsive promoter, and transfected into cells expressing two different hybrid proteins, one comprised of a DNA-binding domain fused to FKBP12, which binds FK506.
  • the other hybrid protein contains a transcriptional activation domain also fused to FKBP12.
  • the CID inducing molecule is FK1012, a homodimeric version of FK506 that is able to bind simultaneously both the DNA binding and transcriptional activating hybrid proteins. In the graded presence of FK1012, graded transcription of the controlled gene is activated.
  • the gene of interest is put under the control of the controllable promoter, and a plasmid harboring this construct along with an antibiotic resistance gene is transfected into cultured mammalian cells.
  • the plasmid DNA integrates into the genome, and drug resistant colonies are selected and screened for appropriate expression of the regulated gene.
  • the regulated gene can be inserted into an episomal plasmid such as pCEP4 (Invitrogen, Inc.), which contains components of the Epstein-Barr virus necessary for plasmid replication.
  • titratable expression systems, such as the ones described above, are introduced for use into cells or organisms lacking the corresponding endogenous gene and/or gene activity, e.g., organisms in which the endogenous gene has been disrupted or deleted.
  • Methods for producing such "knock outs" are well known to those of skill in the art, see e.g., Pettitt et al, 1996, Development 122:4149-4157; Spradling et al, 1995, Proc.
  • Transfection systems can introduce controllable perturbations in biological cellular states in mammalian cells.
  • transfection or transduction of a target gene can be used with cells that do not naturally express the target gene of interest.
  • Such non-expressing cells can be derived from a tissue not normally expressing the target gene or the target gene can be specifically mutated in the cell.
  • the target gene of interest can be cloned into one of many mammalian expression plasmids, for example, the pcDNA3.1 +/- system (Invitrogen, Inc.) or retroviral vectors, and introduced into the non-expressing host cells.
  • Transfected or transduced cells expressing the target gene may be isolated by selection for a drug resistance marker encoded by the expression vector.
  • the level of gene transcription is monotonically related to the transfection dosage. In this way, the effects of varying levels of the target gene may be investigated.
  • a particular example of the use of this method is the search for drugs that target the src-family protein tyrosine kinase, lck, a key component of the T cell receptor activation cellular state (Anderson et al., 1994, Involvement of the protein tyrosine kinase p56 (lck) in T cell signaling and thymocyte development, Adv. Immunol. 56:171-178). Inhibitors of this enzyme are of interest as potential immunosuppressive drugs (Hanke, 1996, Discovery of a Novel, Potent, and src family-selective tyrosine kinase inhibitor, J. Biol. Chem. 271:695-701).
  • A specific mutant of the Jurkat T cell line (JCaM1) is available that does not express lck kinase (Straus et al., 1992, Genetic evidence for the involvement of the lck tyrosine kinase in signal transduction through the T cell antigen receptor, Cell 70:585-593). Therefore, introduction of the lck gene into JCaM1 by transfection or transduction permits specific perturbation of cellular states of T cell activation regulated by the lck kinase. The efficiency of transfection or transduction, and thus the level of perturbation, is dose related. The method is generally useful for providing perturbations of gene expression or protein abundances in cells not normally expressing the genes to be perturbed.
  • methods of modifying RNA abundances and activities currently fall within three classes: ribozymes, antisense species, and RNA aptamers (Good et al., 1997, Gene Therapy 4: 45-54). Controllable application or exposure of a cell to these entities permits controllable perturbation of RNA abundances.
  • Ribozymes are RNAs which are capable of catalyzing RNA cleavage reactions (Cech, 1987, Science 236:1532-1539; PCT International Publication WO 90/11364, published October 4, 1990; Sarver et al., 1990, Science 247: 1222-1225). "Hairpin" and "hammerhead" RNA ribozymes can be designed to specifically cleave a particular target mRNA. Rules have been established for the design of short RNA molecules with ribozyme activity, which are capable of cleaving other RNA molecules in a highly sequence specific way and can be targeted to virtually all kinds of RNA.
  • Ribozyme methods involve, e.g., exposing a cell to such small RNA ribozyme molecules or inducing their expression in a cell (Grassi and Marini, 1996, Annals of Medicine 28: 499-510; Gibson, 1996, Cancer and Metastasis Reviews 15: 287-299).
  • Ribozymes can be routinely expressed in vivo in sufficient number to be catalytically effective in cleaving mRNA, and thereby modifying mRNA abundances in a cell. (Cotten et al, 1989, Ribozyme mediated destruction of RNA in vivo, The EMBO J. 8:3861-3866).
  • a ribozyme coding DNA sequence designed according to the previous rules and synthesized, for example, by standard phosphoramidite chemistry, can be ligated into a restriction enzyme site in the anticodon stem and loop of a gene encoding a tRNA, which can then be transformed into and expressed in a cell of interest by methods routine in the art.
  • an inducible promoter (e.g., a glucocorticoid or a tetracycline response element) is also introduced into this construct so that ribozyme expression can be selectively controlled.
  • tDNA genes (i.e., genes encoding tRNAs)
  • ribozymes can be routinely designed to cleave virtually any mRNA sequence, and a cell can be routinely transformed with DNA coding for such ribozyme sequences such that a controllable and catalytically effective amount of the ribozyme is expressed. Accordingly the abundance of virtually any RNA species in a cell can be perturbed.
  • RNA (preferably mRNA)
  • an "antisense" nucleic acid refers to a nucleic acid capable of hybridizing to a sequence-specific (e.g., non-poly A) portion of the target RNA, for example its translation initiation region, by virtue of some sequence complementarity to a coding and/or non-coding region.
  • the antisense nucleic acids of the invention can be oligonucleotides that are double-stranded or single-stranded, RNA or DNA or a modification or derivative thereof, which can be directly administered in a controllable manner to a cell or which can be produced intracellularly by transcription of exogenous, introduced sequences in controllable quantities sufficient to perturb translation of the target RNA.
  • Antisense nucleic acids are typically at least six nucleotides and are preferably oligonucleotides (ranging from 6 to about 200 nucleotides).
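To make the idea of sequence complementarity concrete, the sketch below derives a single-stranded DNA antisense oligonucleotide as the reverse complement of a chosen region of a target mRNA. The function name and arguments are hypothetical, the 6 to 200 nucleotide limits follow the range given above, and real designs would also weigh factors such as target accessibility that this illustration ignores.

```python
# Pairing of each mRNA base with the DNA base of the antisense strand.
RNA_TO_ANTISENSE_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_antisense(mrna, start, length):
    """Return a DNA oligo (5'->3') complementary to mrna[start:start+length].

    mrna: target RNA sequence written 5'->3' using A, C, G and U.
    start: zero-based index of the first targeted base.
    length: oligo length; restricted here to 6..200 nucleotides.
    """
    if not 6 <= length <= 200:
        raise ValueError("length must be between 6 and 200 nucleotides")
    if start < 0 or start + length > len(mrna):
        raise ValueError("target region lies outside the mRNA")
    region = mrna.upper()[start:start + length]
    # Reverse the region, then pair each base, to get the 5'->3' oligo.
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(region))
```

For example, targeting the first six bases of `AUGGCCAUUG` (the region `AUGGCC`, which includes a start codon) yields the oligo `GGCCAT`.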
  • the oligonucleotides can be DNA or RNA or chimeric mixtures or derivatives or modified versions thereof, single-stranded or double-stranded.
  • the oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone.
  • the oligonucleotide may include other appending groups such as peptides, or agents facilitating transport across the cell membrane (see, e.g., Letsinger et al., 1989, Proc. Natl. Acad. Sci. U.S.A.
  • Antisense oligonucleotides are typically in the form of single-stranded DNA.
  • the oligonucleotide may be modified at any position on its structure with constituents generally known in the art.
  • Antisense oligonucleotides may comprise at least one modified base moiety such as 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-a
  • Antisense oligonucleotides may contain modified sugar moieties such as arabinose, 2-fluoroarabinose, xylulose, and hexose.
  • Antisense oligonucleotides may contain modified phosphate backbones such as a phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, and a formacetal or analog thereof.
  • An antisense oligonucleotide may be a 2-α-anomeric oligonucleotide.
  • an α-anomeric oligonucleotide forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gautier et al., 1987, Nucl. Acids Res. 15: 6625-6641).
  • the oligonucleotide may be conjugated to another molecule, e.g., a peptide, hybridization triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.
  • Antisense nucleic acids comprise a sequence complementary to at least a portion of a target RNA species. However, absolute complementarity is not required.
  • a sequence "complementary to at least a portion of an RNA," as referred to herein, means a sequence having sufficient complementarity to be able to hybridize with the RNA, forming a stable duplex; in the case of double-stranded antisense nucleic acids, a single strand of the duplex DNA may thus be tested, or triplex formation may be assayed. The ability to hybridize will depend on both the degree of complementarity and the length of the antisense nucleic acid.
  • the longer the hybridizing nucleic acid the more base mismatches with a target RNA it may contain and still form a stable duplex (or triplex, as the case may be).
  • One skilled in the art can ascertain a tolerable degree of mismatch by use of standard procedures to determine the melting point of the hybridized complex.
  • the amount of antisense nucleic acid that will be effective in inhibiting translation of the target RNA can be determined by standard assay techniques.
  • Antisense oligonucleotides may be synthesized by standard methods known in the art, e.g. by use of an automated DNA synthesizer (such as are commercially available from Biosearch, Applied Biosystems, etc.).
  • phosphorothioate oligonucleotides may be synthesized by the method of Stein et al. (1988, Nucl. Acids Res. 16: 3209)
  • methylphosphonate oligonucleotides can be prepared by use of controlled pore glass polymer supports (Sarin et al, 1988, Proc. Natl. Acad. Sci. U.S.A. 85: 7448-7451), etc.
  • the oligonucleotide is a 2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids Res. 15: 6131-6148), or a chimeric RNA-DNA analog (Inoue et al., 1987, FEBS Lett. 215: 327-330).
  • the synthesized antisense oligonucleotides can then be administered to a cell in a controlled manner.
  • the antisense oligonucleotides can be placed in the growth environment of the cell at controlled levels where they may be taken up by the cell.
  • the uptake of the antisense oligonucleotides can be assisted by use of methods well known in the art.
  • antisense nucleic acids are controllably expressed intracellularly by transcription from an exogenous sequence.
  • a vector can be introduced in vivo such that it is taken up by a cell, within which cell the vector or a portion thereof is transcribed, producing an antisense nucleic acid (RNA) of the invention.
  • Such a vector would contain a sequence encoding the antisense nucleic acid.
  • Such a vector can remain episomal or become chromosomally integrated, as long as it can be transcribed to produce the desired antisense RNA.
  • Such vectors can be constructed by recombinant DNA technology methods standard in the art.
  • Vectors can be plasmid, viral, or others known in the art, used for replication and expression in mammalian cells.
  • Expression of the sequences encoding the antisense RNAs can be by any promoter known in the art to act in a cell of interest. Such promoters can be inducible or constitutive. Most preferably, promoters are controllable or inducible by the administration of an exogenous moiety in order to achieve controlled expression of the antisense oligonucleotide. Such controllable promoters include the Tet promoter. Less preferably usable promoters for mammalian cells include, but are not limited to: the SV40 early promoter region (Bernoist and Chambon, 1981, Nature 290:
  • antisense nucleic acids can be routinely designed to target virtually any mRNA sequence, and a cell can be routinely transformed with or exposed to nucleic acids coding for such antisense sequences such that an effective and controllable amount of the antisense nucleic acid is expressed. Accordingly the translation of virtually any RNA species in a cell can be controllably perturbed.
  • RNA aptamers can be introduced into or expressed in a cell.
  • RNA aptamers are specific RNA ligands for proteins, such as for Tat and Rev RNA (Good et al, 1997, Gene Therapy 4: 45-54) that can specifically inhibit their translation.
  • Methods of modifying protein abundances include, inter alia, those altering protein degradation rates and those using antibodies (which bind to proteins affecting abundances of activities of native target protein species). Increasing (or decreasing) the degradation rates of a protein species decreases (or increases) the abundance of that species. Methods for controllably increasing the degradation rate of a target protein in response to elevated temperature and/or exposure to a particular drug, which are known in the art, can be employed.
  • one such method employs a heat-inducible or drug-inducible N-terminal degron, which is an N-terminal protein fragment that exposes a degradation signal promoting rapid protein degradation at a higher temperature (e.g., 37° C) and which is hidden to prevent rapid degradation at a lower temperature (e.g., 23° C) (Dohmen et al, 1994, Science 263:1273-1276).
  • a degron is Arg-DHFRts, a variant of murine dihydrofolate reductase in which the N-terminal Val is replaced by Arg and the Pro at position 66 is replaced with Leu.
  • a gene for a target protein, P, is replaced by standard gene targeting methods known in the art (Lodish et al, 1995, Molecular Biology of the Cell. W.H. Freeman and Co., New York, especially chapter 8) with a gene coding for the fusion protein Ub-Arg-DHFRts-P ("Ub" stands for ubiquitin).
  • the N-terminal ubiquitin is rapidly cleaved after translation, exposing the N-terminal degron. At lower temperatures, lysines internal to Arg-DHFRts are not exposed, ubiquitination of the fusion protein does not occur, degradation is slow, and active target protein levels are high.
  • Target protein abundances and, directly or indirectly, their activities can be decreased by (neutralizing) antibodies.
  • protein abundances/activities can be controllably modified.
  • antibodies to suitable epitopes on protein surfaces may decrease the abundance, and thereby indirectly decrease the activity, of the wild-type active form of a target protein by aggregating active forms into complexes with less or minimal activity as compared to the unaggregated wild-type form.
  • antibodies may directly decrease protein activity by, e.g., interacting directly with active sites or by blocking access of substrates to active sites.
  • (activating) antibodies may also interact with proteins and their active sites to increase resulting activity.
  • antibodies of the various types to be described
  • antibodies can be raised against specific protein species (by the methods to be described) and their effects screened.
  • the effects of the antibodies can be assayed and suitable antibodies selected that raise or lower the target protein species concentration and/or activity.
  • assays involve introducing antibodies into a cell (see below), and assaying the concentration of the wild-type amount or activities of the target protein by standard means (such as immunoassays) known in the art.
  • the net activity of the wild-type form can be assayed by assay means appropriate to the known activity of the target protein.
  • Antibodies can be introduced into cells in numerous fashions, including, for example, microinjection of antibodies into a cell (Morgan et al, 1988, Immunology Today 9:84-86) or transforming hybridoma mRNA encoding a desired antibody into a cell (Burke et al, 1984, Cell 36:847-858).
  • recombinant antibodies can be engineered and ectopically expressed in a wide variety of non-lymphoid cell types to bind to target proteins as well as to block target protein activities (Biocca et al, 1995, Trends in Cell Biology 5:248-252).
  • expression of the antibody is under control of a controllable promoter, such as the Tet promoter.
  • a first step is the selection of a particular monoclonal antibody with appropriate specificity to the target protein (see below). Then sequences encoding the variable regions of the selected antibody can be cloned into various engineered antibody formats, including, for example, whole antibody, Fab fragments, Fv fragments, single chain Fv fragments (VH and VL regions united by a peptide linker) ("ScFv" fragments), diabodies (two associated ScFv fragments with different specificities), and so forth (Hayden et al, 1997, Current Opinion in Immunology 9:210-212).
  • Intracellularly expressed antibodies of the various formats can be targeted into cellular compartments (e.g., the cytoplasm, the nucleus, the mitochondria, etc.) by expressing them as fusions with the various known intracellular leader sequences (Bradbury et al, 1995, Antibody Engineering (vol. 2) (Borrebaeck ed.), pp 295-361, IRL Press).
  • the ScFv format appears to be particularly suitable for cytoplasmic targeting.
  • Antibody types include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab fragments, and an Fab expression library.
  • Various procedures known in the art may be used for the production of polyclonal antibodies to a target protein.
  • various host animals can be immunized by injection with the target protein; such host animals include, but are not limited to, rabbits, mice, rats, etc.
  • adjuvants can be used to increase the immunological response, depending on the host species, and include, but are not limited to, Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human adjuvants such as bacillus Calmette-Guerin (BCG) and corynebacterium parvum.
  • any technique that provides for the production of antibody molecules by continuous cell lines in culture may be used.
  • Such techniques include, but are not restricted to, the hybridoma technique originally developed by Kohler and Milstein (1975, Nature 256: 495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al, 1983, Immunology Today 4: 72), and the EBV hybridoma technique to produce human monoclonal antibodies (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
  • monoclonal antibodies can be produced in germ-free animals utilizing recent technology (PCT/US90/02545).
  • human antibodies may be used and can be obtained by using human hybridomas (Cote et al, 1983, Proc. Natl. Acad. Sci. USA 80: 2026-2030), or by transforming human B cells with EBV virus in vitro (Cole et al, 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
  • techniques developed for the production of "chimeric antibodies” (Morrison et al, 1984, Proc. Natl. Acad. Sci.
  • monoclonal antibodies can be alternatively selected from large antibody libraries using the techniques of phage display (Marks et al, 1992, J. Biol. Chem. 267:16007-16010). Using this technique, libraries of up to 10^12 different antibodies have been expressed on the surface of fd filamentous phage, creating a "single pot" in vitro immune system of antibodies available for the selection of monoclonal antibodies (Griffiths et al, 1994, EMBO J. 13:3245-3260).
  • Selection of antibodies from such libraries can be done by techniques known in the art, including contacting the phage to immobilized target protein, selecting and cloning phage bound to the target, and subcloning the sequences encoding the antibody variable regions into an appropriate vector expressing a desired antibody format.
  • Methods of directly modifying protein activities include, inter alia, dominant negative mutations, specific drugs (used in the sense of this application) or chemical moieties generally, and also the use of antibodies.
  • Dominant negative mutations are mutations to endogenous genes or mutant exogenous genes that, when expressed in a cell, disrupt the activity of a targeted protein species.
  • general rules exist that guide the selection of an appropriate strategy for constructing dominant negative mutations that disrupt activity of that target (Herskowitz, 1987, Nature 329:219-222).
  • over expression of an inactive form can cause competition for natural substrates or ligands sufficient to significantly reduce net activity of the target protein.
  • Such over expression can be achieved by, for example, associating a promoter, preferably a controllable or inducible promoter, of increased activity with the mutant gene.
  • changes to active site residues can be made so that a virtually irreversible association occurs with the target ligand. Such can be achieved with certain tyrosine kinases by careful replacement of active site serine residues (Perlmutter et al, 1996, Current Opinion in Immunology 8:285-290).
  • Multimeric activity can be controllably decreased by expression of genes coding exogenous protein fragments that bind to multimeric association domains and prevent multimer formation.
  • controllable over expression of an inactive protein unit of a particular type can tie up wild-type active units in inactive multimers, and thereby decrease multimeric activity (Nocka et al, 1990, The EMBO J. 9:1805-1813).
  • the DNA binding domain can be deleted from the DNA binding unit, or the activation domain deleted from the activation unit.
  • the DNA binding domain unit can be expressed without the domain causing association with the activation unit.
  • DNA binding sites are tied up without any possible activation of expression.
  • a particular type of unit normally undergoes a conformational change during activity
  • expression of a rigid unit can inactivate resultant complexes.
  • proteins involved in cellular mechanisms such as cellular motility, the mitotic process, cellular architecture, and so forth, are typically composed of associations of many subunits of a few types. These structures are often highly sensitive to disruption by inclusion of a few monomeric units with structural defects. Such mutant monomers disrupt the relevant protein activities and can be controllably expressed in a cell.
  • mutant target proteins that are sensitive to temperature (or other exogenous factors) can be found by mutagenesis and screening procedures that are well-known in the art.
  • activities of certain target proteins can be controllably altered by exposure to exogenous drugs or ligands.
  • a drug is known that interacts with only one target protein in the cell and alters the activity of only that one target protein.
  • Graded exposure of a cell to varying amounts of that drug thereby causes graded perturbations of cellular states originating at that protein.
  • the alteration can be either a decrease or an increase of activity.
  • a drug is known and used that alters the activity of only a few (e.g., 2-5) target proteins with separate, distinguishable, and non-overlapping effects.
  • Graded exposure to such a drug causes graded perturbations to the several cellular states originating at the target proteins.
  • two profiles generated using the immunosuppressant drugs Cyclosporin A and FK506 provide an illustration of one aspect of the present invention.
  • the profiles were obtained with mRNA transcript arrays in the yeast S. cerevisiae as described in M. Marton, et al, supra and in the Experimental section infra.
  • the transcriptional signatures of these drugs are illustrated in Figure 3.
  • the horizontal axis in these plots is the intensity of the individual hybridized spots on the microarray, representing individual mRNA species abundances in the two.
  • the vertical axis is the log10 of the ratio of the intensity measured for one fluorescent label (Culture 1) to that measured for the other label (Culture 2).
  • FIG. 4 shows the high correlation (similarity) between the effects of the two drugs at these concentrations, 1 µg/ml for FK506 and 30 µg/ml for Cyclosporin, where both drugs affect primarily the calcineurin-mediated pathway which is the yeast analogue of the T-cell activation pathway in humans.
  • the correlation coefficient of the log10(expression ratio) is 0.98, where this is computed based on those genes which were significantly up or down regulated at the 95% confidence level in either experiment.
  • Cyclosporin A was added to a concentration of 30 µg/ml.
  • Cells were broken by standard procedures (see e.g. Ausubel et al, Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (New York), 12.12.1 - 13.12.5) with the following modifications.
  • Cell pellets were resuspended in breaking buffer (0.2M Tris HCl pH 7.6, 0.5M NaCl, 10 mM EDTA, 1% SDS), vortexed for 2 minutes on a VWR multitube vortexer at setting 8 in the presence of 60% glass beads (425-600 µm mesh; Sigma) and phenol:chloroform (50:50, v/v).
  • RNA was isolated by two sequential chromatographic purifications over oligo dT cellulose (NEB) using established protocols (see e.g. Ausubel et al, supra). Preparation and hybridization of the labeled sample
  • Fluorescently-labeled cDNA was prepared, purified and hybridized essentially as described by DeRisi et al (DeRisi et al, 1997, Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278:680-686). Briefly, Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA during reverse transcription (Superscript II, LTI, Inc.) and purified by concentrating to less than 10 µl using Microcon-30 microconcentrators (Amicon).
  • Paired cDNAs were resuspended in 20-26 µl hybridization solution (3x SSC, 0.75 µg/ml poly A DNA, 0.2% SDS) and applied to the microarray under a 22x30 mm coverslip for 6 hr at 63 °C, all according to DeRisi et al, (1997), supra.
  • PCR products containing common 5' and 3' sequences were used as templates with amino-modified forward primer and unmodified reverse primers to PCR amplify 6065 ORFs from the S. cerevisiae genome.
  • First pass success rate was 94%.
  • Amplification reactions that gave products of unexpected sizes were excluded from subsequent analysis.
  • ORFs that could not be amplified from purchased templates were amplified from genomic DNA. DNA samples from 100 µl reactions were isopropanol precipitated, resuspended in water, brought to 3x SSC in a total volume of 15 µl, and transferred to 384-well microtiter plates (Genetix).
  • PCR products were spotted onto 1x3 inch polylysine-treated glass slides by a robot built according to specifications provided in Schena et al, supra; DeRisi et al, 1996, Discovery and analysis of inflammatory disease-related genes using cDNA microarrays, PNAS USA. 94:2150-2155; and DeRisi et al, (1997). After printing, slides were processed following published protocols. See DeRisi et al, (1997).
  • Microarrays were imaged on a prototype multi-frame CCD camera in development at Applied Precision, Inc. (Seattle, WA). Each CCD image frame was approximately 2 mm square. Exposures of 2 sec in the Cy5 channel (white light through Chroma 618-648 nm excitation filter, Chroma 657-727 nm emission filter) and 1 sec in the Cy3 channel (Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission filter) were done consecutively in each frame before moving to the next, spatially contiguous frame. Color isolation between the Cy3 and Cy5 channels was ~100:1 or better. Frames were knitted together in software to make the complete images.
  • the intensities of spots (~100 µm) were quantified from the 10 µm pixels by frame background subtraction and intensity averaging in each channel. Dynamic range of the resulting spot intensities was typically a ratio of 1000 between the brightest spots and the background-subtracted additive error level. Normalization between the channels was accomplished by normalizing each channel to the mean intensities of all genes. This procedure is nearly equivalent to normalization between channels using the intensity ratio of genomic DNA spots (see DeRisi et al, 1997), but is possibly more robust since it is based on the intensities of several thousand spots distributed over the array.
  • This confidence level was assigned based on an error model which assigns a lognormal probability distribution to each gene's expression ratio with characteristic width based on the observed scatter in its repeated measurements (repeated arrays at the same nominal experimental conditions) and on the individual array hybridization quality. This latter dependence was derived from control experiments in which both Cy3 and Cy5 samples were derived from the same RNA sample. For large numbers of repeated measurements the error reduces to the observed scatter. For a single measurement the error is based on the array quality and the spot intensity.
  • Random measurement errors in the x and y signatures tend to bias the correlation toward zero. In most experiments the great majority of genes are not significantly affected but do exhibit small random measurement errors. Selecting only the 95% confidence genes for the correlation calculation, rather than the entire genome, reduces this bias and makes the actual biological correlations more apparent.
  • Correlations between a profile and itself are unity by definition.
  • Error limits on the correlation are 95% confidence limits based on the individual measurement error bars, and assuming uncorrelated errors. They do not include the bias mentioned above; thus, a departure of ρ from unity does not necessarily mean that the underlying biological correlation is imperfect. However, a correlation of 0.7 ± 0.1, for example, is very significantly different from zero. Small (magnitude of ρ < 0.2) but formally significant correlations in the tables and text probably are due to small systematic biases in the Cy5/Cy3 ratios which violate the assumption of independent measurement errors used to generate the
  • Expression ratios are based on mean intensities over each spot.
  • the occasional smaller spots have fewer image pixels in the average. This does not degrade accuracy noticeably until the number of pixels falls below ten, in which case the spot is rejected from the data set.
  • Wander of spot positions with respect to the nominal grid is adaptively tracked in array subregions by the image processing software.
  • Unequal spot wander within a subregion greater than half a spot spacing is problematic for the automated quantitating algorithms; in this case the spot is rejected from analysis based on human inspection of the wander. Any spots partially overlapping are excluded from the data set. Less than 1% of spots typically are rejected for these reasons.
  • FIG. 7 illustrates an exemplary computer system suitable for implementation of the analytic methods of this invention.
  • Computer system 501 is illustrated as comprising internal components and being linked to external components.
  • the internal components of this computer system include processor element 502 interconnected with main memory 503.
  • computer system 501 can be an Intel 8086-, 80386-, 80486-, Pentium®, or Pentium®-based processor with preferably 32 MB or more of main memory.
  • the external components include mass storage 504.
  • This mass storage can be one or more hard disks (which are typically packaged together with the processor and memory). Such hard disks are preferably of 1 GB or greater storage capacity.
  • Other external components include user interface device 505, which can be a monitor, together with inputting device 506, which can be a "mouse", or other graphic input devices (not illustrated), and/or a keyboard.
  • a printing device 508 can also be attached to the computer 501.
  • computer system 501 is also linked to network link 507, which can be part of an Ethernet link to other local computer systems, remote computer systems, or wide area communication networks, such as the Internet.
  • This network link allows computer system 501 to share data and processing tasks with other computer systems.
  • Software component 510 represents the operating system, which is responsible for managing computer system 501 and its network interconnections. This operating system can be, for example, of the Microsoft Windows® family, such as Windows 3.1, Windows 95, Windows 98, or Windows NT.
  • Software component 511 represents common languages and functions conveniently present on this system to assist programs implementing the methods specific to this invention. Many high or low level computer languages can be used to program the analytic methods of this invention. Instructions can be interpreted during run-time or compiled.
  • Preferred languages include C/C++, FORTRAN and JAVA®.
  • the methods of this invention are programmed in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including algorithms to be used, thereby freeing a user of the need to procedurally program individual equations or algorithms.
  • Such packages include Matlab from Mathworks (Natick, MA), Mathematica® from Wolfram Research (Champaign, IL), or S-Plus® from Math Soft (Cambridge, MA).
  • software component 512 and/or 513 represents the analytic methods of this invention as programmed in a procedural language or symbolic package.
  • a user first loads experimental data into the computer system 501. These data can be directly entered by the user from monitor 505, keyboard 506, or from other computer systems linked by network connection 507, or on removable storage media such as a CD-ROM, floppy disk (not illustrated), tape drive (not illustrated), ZIP® drive (not illustrated) or through the network (507).
  • the user causes execution of expression profile analysis software 512 which performs the methods of the present invention.
  • a user first loads experimental data and/or databases into the computer system. This data is loaded into the memory from the storage media (504) or from a remote computer, preferably from a dynamic geneset database system, through the network (507). Next the user causes execution of software that performs the steps of the present invention.
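
By way of illustration only, the spot rejection, channel normalization, and correlation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the software of the invention: the names `Spot`, `keep_spot`, `log10_ratios`, `pearson`, and `correlation_on_significant` are hypothetical; the ratio is taken as Cy5 over Cy3 (the text does not say which label is the numerator); the correlation is assumed to be an ordinary Pearson coefficient; the wander and overlap flags and the 95% significance flags are assumed to be supplied by earlier steps; and the requirement that both intensities be positive is added only so that the logarithm is defined.

```python
import math
from dataclasses import dataclass

MIN_PIXELS = 10  # from the text: spots averaged over fewer than ten pixels are rejected


@dataclass
class Spot:
    cy5: float                     # background-subtracted mean intensity, Cy5 channel
    cy3: float                     # background-subtracted mean intensity, Cy3 channel
    n_pixels: int                  # number of image pixels in the spot average
    overlaps: bool = False         # partially overlaps a neighbouring spot
    wander_rejected: bool = False  # rejected on human inspection of spot wander


def keep_spot(spot):
    """Return True if the spot passes the rejection rules.

    The positivity check is an added assumption so that log10 is defined.
    """
    return (spot.n_pixels >= MIN_PIXELS
            and not spot.overlaps
            and not spot.wander_rejected
            and spot.cy5 > 0 and spot.cy3 > 0)


def log10_ratios(spots):
    """Normalize each channel to its mean over the given spots,
    then return log10(Cy5/Cy3) per spot (Cy5 as numerator is an assumption)."""
    mean5 = sum(s.cy5 for s in spots) / len(spots)
    mean3 = sum(s.cy3 for s in spots) / len(spots)
    return [math.log10((s.cy5 / mean5) / (s.cy3 / mean3)) for s in spots]


def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_on_significant(x, y, sig_x, sig_y):
    """Correlate two log10-ratio profiles using only genes flagged as
    significant in either experiment. x and y must be aligned gene by gene."""
    keep = [i for i, (a, b) in enumerate(zip(sig_x, sig_y)) if a or b]
    return pearson([x[i] for i in keep], [y[i] for i in keep])
```

In such a sketch, spots would first be filtered with `keep_spot`, the surviving spots passed to `log10_ratios`, and two profiles aligned by gene then compared with `correlation_on_significant`. Restricting the comparison to flagged genes reflects the observation that unaffected genes contribute mainly measurement noise, which biases the correlation toward zero.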

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Food Science & Technology (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
EP99968165A 1998-12-23 1999-12-21 Verfahren zur grobunterscheidung von profilen Withdrawn EP1141415A1 (de)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US222597 1988-07-21
US220274 1994-03-30
US22027498A 1998-12-23 1998-12-23
US22259798A 1998-12-28 1998-12-28
PCT/US1999/030577 WO2000039337A1 (en) 1998-12-23 1999-12-21 Methods for robust discrimination of profiles

Publications (1)

Publication Number Publication Date
EP1141415A1 (de) 2001-10-10

Family

ID=26914716

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99968165A Withdrawn EP1141415A1 (de) 1998-12-23 1999-12-21 Verfahren zur grobunterscheidung von profilen

Country Status (5)

Country Link
EP (1) EP1141415A1 (de)
JP (1) JP2002533699A (de)
AU (1) AU2483900A (de)
CA (1) CA2356891A1 (de)
WO (1) WO2000039337A1 (de)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10040289A1 (de) * 2000-08-17 2002-02-28 Aventis Res & Tech Gmbh & Co Verfahren zur Herstellung und Ermittlung geeigneter Effektoren von Zielmolekülen mit Substanzbibliotheken
US6861034B1 (en) 2000-11-22 2005-03-01 Xerox Corporation Priming mechanisms for drop ejection devices
US6514704B2 (en) * 2001-02-01 2003-02-04 Xerox Corporation Quality control mechanism and process for a biofluid multi-ejector system
AUPR480901A0 (en) * 2001-05-04 2001-05-31 Genomics Research Partners Pty Ltd Diagnostic method for assessing a condition of a performance animal
US7348144B2 (en) * 2003-08-13 2008-03-25 Agilent Technologies, Inc. Methods and system for multi-drug treatment discovery
EP2027465A2 (de) 2006-05-17 2009-02-25 Cellumen, Inc. Verfahren für automatisierte gewebeanalyse
WO2012125807A2 (en) 2011-03-17 2012-09-20 Cernostics, Inc. Systems and compositions for diagnosing barrett's esophagus and methods of using the same
US10929753B1 (en) 2014-01-20 2021-02-23 Persyst Development Corporation System and method for generating a probability value for an event
US11457855B2 (en) 2018-03-12 2022-10-04 Persyst Development Corporation Method and system for utilizing empirical null hypothesis for a biological time series

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5800992A (en) * 1989-06-07 1998-09-01 Fodor; Stephen P.A. Method of detecting nucleic acids
WO1996012187A1 (en) * 1994-10-13 1996-04-25 Horus Therapeutics, Inc. Computer assisted methods for diagnosing diseases
US5564433A (en) * 1994-12-19 1996-10-15 Thornton; Kirtley E. Method for the display, analysis, classification, and correlation of electrical brain function potentials

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0039337A1 *

Also Published As

Publication number Publication date
WO2000039337A1 (en) 2000-07-06
CA2356891A1 (en) 2000-07-06
JP2002533699A (ja) 2002-10-08
WO2000039337A9 (en) 2000-11-30
AU2483900A (en) 2000-07-31

Similar Documents

Publication Publication Date Title
US6468476B1 (en) Methods for using-co-regulated genesets to enhance detection and classification of gene expression patterns
US6203987B1 (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
US6950752B1 (en) Methods for removing artifact from biological profiles
US6801859B1 (en) Methods of characterizing drug activities using consensus profiles
US6859735B1 (en) Computer systems for identifying pathways of drug action
AU738900B2 (en) Methods for drug target screening
US6370478B1 (en) Methods for drug interaction prediction using biological response profiles
US6324479B1 (en) Methods of determining protein activity levels using gene expression profiles
US20040091933A1 (en) Methods for genetic interpretation and prediction of phenotype
WO2002044411A1 (en) Use of profiling for detecting aneuploidy
WO2002002740A2 (en) Methods and compositions for determining gene function
EP1349957A2 (de) Zusammensetzungen und verfahren, geeignet für exon profiling
WO2000039337A1 (en) Methods for robust discrimination of profiles
AU773456B2 (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
WO2002002741A2 (en) Methods for genetic interpretation and prediction of phenotype
US20020146694A1 (en) Functionating genomes with cross-species coregulation

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010719

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030701