CA2356891A1

CA2356891A1 - Methods for robust discrimination of profiles

Info

Publication number: CA2356891A1
Application number: CA002356891A
Authority: CA
Inventors: Stephen H. Friend; Roland Stoughton
Original assignee: Individual
Current assignee: Rosetta Inpharmatics LLC
Priority date: 1998-12-23
Filing date: 1999-12-21
Publication date: 2000-07-06
Also published as: JP2002533699A; EP1141415A1; WO2000039337A9; AU2483900A; WO2000039337A1

Abstract

Methods for discriminating between the subtle effects of a first perturbation and a second perturbation on a biological sample are provided. Further, methods for identifying disease states in patients and methods for optimizing drug therapy regimens in diseased subjects are provided. Finally, improved methods for determining the subtle effects of pharmacological agents on a biological system are provided.

Description

Methods for Robust Discrimination of Profiles

This is a continuation-in-part of copending application serial number 09/220,274, by Stoughton et al., filed December 23, 1998, entitled "Methods for Robust Discrimination of Profiles", which is incorporated by reference herein in its entirety.

The field of this invention relates to methods for discriminating between the subtle effects of a first perturbation and a second perturbation on a biological sample. The invention also relates to improved methods for identifying disease states in patients. In addition, the invention provides improved methods for optimizing drug therapy regimens in diseased subjects. The invention also generally relates to improved methods for determining the subtle effects of pharmacological agents on a biological system.
2 BACKGROUND OF THE INVENTION
2.1 Profiles of Cellular Constituents
"Cellular constituents" include gene expression levels, abundance of mRNA encoding specific genes, and protein expression levels in a biological sample. Levels of various constituents of a cell, such as mRNA encoding genes and/or protein expression levels, are known to change in response to drug treatments and other perturbations of the cell's biological state. Measurements of a plurality of such "cellular constituents" therefore contain a wealth of information about the effect of perturbations on the cell's biological state. The collection of such measurements is generally referred to as the "profile" of the cell's biological state.
There may be on the order of 100,000 different cellular constituents for mammalian cells. Consequently, the profile of a particular cell is typically complex.
The profile of any given state of a biological sample is often measured after the sample has been subjected to a perturbation. Such perturbations include, for example, exposure of the sample to a drug candidate, the introduction of an exogenous gene, the deletion of a gene from the sample, or changes in culture conditions. Comprehensive measurements of cellular constituents, or profiles of gene and protein expression and their response to perturbations in the cell, therefore have a wide range of utility including the ability to compare and understand the effects of drugs, diagnose disease, and optimize patient drug regimens. In addition, they have further application in basic life science research.

Within the past decade, several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, e.g., Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA micro-array, Science 270:467-470; Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotechnology 14:1675-1680; Blanchard et al., 1996, Sequence to array: Probing the genome's secrets, Nature Biotechnology 14, 1649; U.S. Patent 5,569,588, issued October 29, 1996 to Ashby et al. entitled "Methods for Drug Screening"). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as humans, for which there is an increasing knowledge of the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.
On another front, the direct measurement of protein abundance has been improved by the use of microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry (LC/MS/MS) to directly identify proteins contained in mixtures. This technology promises to push the dynamic range for which protein abundance can be measured in a biological sample. Using LC/MS/MS, McCormack et al. have demonstrated that proteins presented in sample mixtures can be readily identified with a 30-fold difference in molar quantity, that the identifications are reproducible, and that proteins within the mixture can be identified at low femtomole levels. McCormack et al., 1997, Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level, Anal. Chem. 69:767-776. In a review of tandem mass spectrometry, Chait points out that an additional advantage of this technology is that it is orders of magnitude faster than more conventional approaches such as Edman sequencing. Chait, 1996, Trawling for proteins in the post-genome era, Nat. Biotech. 14:1544.
Other technological advances have provided for the ability to specifically perturb biological samples with individual genetic mutations. For example, Mortensen et al. describe a method for producing embryonic stem (ES) cell lines whereby both alleles are inactivated by homologous recombination. Using the methods of Mortensen et al., it is possible to obtain homozygous mutationally altered cells, i.e., double knockouts of ES cell lines. Mortensen et al. propose that their method may be generally applicable to other genes and to cell lines other than ES cells. Mortensen et al., 1992, Production of homozygous mutant ES cells with a single targeting construct, Mol. Cell. Biol. 12:2391-2395.
In another promising technology, Wach et al. provide a dominant resistance module for selection of S. cerevisiae transformants which consists entirely of heterologous DNA. The module can also be used to provide PCR-based gene disruptions. Wach et al., 1994, New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae, Yeast 10:1793-808.
Technological advances, such as the use of DNA microarrays, are already being used in drug discovery (see, e.g., Morton et al., 1998, Drug target validation and identification of secondary drug target effects using DNA microarrays, Nature Medicine in press; Gray et al., 1998, Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors, Science 281:533-538).
2.2 Profile Comparisons
Comparison of profiles with other profiles in a database (see, e.g., U.S. Patent 5,777,888, issued July 7, 1998 to Rine et al. entitled "Systems for generating and analyzing stimulus-response output signal matrices") or clustering of profiles by similarity can give clues to the molecular targets of drugs and related functions, efficacy and toxicity of drug candidates and/or pharmacological agents. Such comparisons may also be used to derive consensus profiles representative of ideal drug activities or disease states.
Profile comparison can also help detect diseases in a patient at an early stage and provide improved clinical outcome projections for a patient diagnosed with a disease.
At the center of all these profile comparison efforts is the need for robust discrimination of subtle differences in activity of the experimental conditions ("perturbations") that are often associated with the different profiles. To date such robust discrimination has not been achieved. In a typical perturbation experiment, the responses of several thousand cellular constituents are measured, yet only a small number of constituents change significantly. Frequently none of the cellular constituents change at all. Consequently, there is frequently not enough information available in a conventional profile to provide an accurate assessment of the subtle effects of a perturbation. Figure 1 illustrates this art-recognized problem. In Figure 1, the results of 365 mRNA transcription profiling experiments are shown. The 365 experiments include experiments with/without drugs at different concentrations, with/without specific genes in the yeast strain, combinations of drug treatment and gene deletion, changes in culture density, growth temperature, medium composition, and stimulations with endogenous hormones like mating factor. Although several thousand cellular constituents are being profiled in each experiment depicted in Figure 1, typically only a small number of constituents change significantly, and often none at all. As a consequence, a profile derived from any of the 365 experiments in Figure 1 would not provide enough information to determine the subtle effects of a particular perturbation. Consequently, profile comparisons using conventional profiles suffer from a failure to provide sufficient information to discern the subtle effects of a perturbation on a biological system.

According to the above background, there is a great demand in the art for robust profile comparison methods.
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.

This invention provides robust profile comparison methods. These methods are used to determine a degree of similarity between an effect of a first perturbation and a second perturbation on a biological system. The methods of this invention have extensive applications in the areas of preventive health care, drug discovery, drug candidate lead selection, drug candidate validation, drug regimen optimization in a variety of patient populations, development of clinical trial protocols to satisfy United States Food and Drug Administration (FDA) requirements including those for investigative new drugs, satisfaction of related clinical trial protocol requirements in administrative agencies that are equivalent to the FDA in countries other than the United States, drug and/or drug candidate efficacy, drug and/or drug candidate toxicity, diagnostic applications such as disease monitoring in a variety of patient populations, and for the prediction of the clinical outcome of a patient.
One aspect of the invention includes a method comprising the steps of (a) determining a first set of constituent profiles, wherein each constituent profile in the set is determined by a different one of a plurality of initial states of a biological sample by measuring a response of the biological sample to the first perturbation when the biological sample is in the selected initial state; (b) determining a second set of constituent profiles, each constituent profile of the second set determined using a different one of a plurality of initial states of the biological sample by measuring a response of the biological sample to a second perturbation when the biological sample is in the selected initial state; (c) combining the first set of constituent profiles into a first augmented profile; (d) combining the second set of constituent profiles into a second augmented profile; and (e) comparing the first augmented profile with the second augmented profile to determine the degree of similarity between the first perturbation and the second perturbation.
In accord with a second aspect of the invention at least one constituent profile in the first set of constituent profiles is a first response profile and at least one constituent profile in the second set of constituent profiles is a second response profile. The first response profile is determined by at least one measurement of at least one cellular constituent in the biological sample when the biological sample is in an initial state selected from a plurality of initial states, and the second response profile is determined by at least one measurement of at least one cellular constituent in said biological sample when said biological sample is in the selected initial state.

In accord with another aspect of the invention at least one constituent profile in the first set of constituent profiles is a first projected profile and at least one constituent profile in the second set of constituent profiles is a second projected profile. In this aspect of the invention, the first and second projected profiles each contain a plurality of cellular constituent set values derived according to a definition of co-varying cellular constituent sets. The first and second projected profiles could be determined by an initial state selected from said plurality of initial states of the biological sample. An augmented profile could include any combination of projected profiles and response profiles.
In accord with another aspect of the invention the biological sample is a cell line. The cell line could be of a unicellular organism and at least one initial state included in a plurality of initial states could be determined by altering the biological sample in a manner that alters cell wall permeability. In another aspect the biological sample is substantially isogenic to Saccharomyces cerevisiae.
In another aspect of the invention, the biological sample is a cell line that expresses a macromolecule that serves as a drug efflux pump. In this embodiment, some of the initial biological states are generated by selecting isogenic cell lines that do not possess macromolecules that have an ability to act as a drug efflux pump.
In another embodiment, the biological sample is a cell line and the first initial state that is selected from a plurality of initial states is determined by a first set of culture growth conditions and a second initial state that is selected from a plurality of initial states is determined by a second set of culture growth conditions. In this embodiment, the first culture growth conditions and the second culture growth conditions vary by a variable such as an amount of a nutrient that is necessary for viability of said cell line, an amount of a trace element, an amount of a mineral, a culture temperature, and/or the nature of the container the sample is cultured in. Examples of containers include but are not limited to shaker flasks, culture plates and incubators.
In another aspect of the invention, the biological sample is a cell line and a first initial state that is selected from a plurality of initial states is determined by a first culture growth density of the cell line and a second initial state that is selected from a plurality of initial states is determined by a second culture growth density of the cell line, wherein the two culture growth densities vary by an amount.
In another aspect of the invention, the biological sample is a cell line and a first initial state that is selected from a plurality of initial states is determined by a first amount of a pharmacological agent that is contacted with the biological sample and a second initial state that is selected from said plurality of initial states is determined by a second amount of a pharmacological agent that is contacted with the biological sample.
In accord with another aspect of the invention a first initial state is determined by a genetic feature of the biological sample. In this aspect of the invention, the biological sample could be Saccharomyces cerevisiae having a genome and the first initial state that is selected from a plurality of initial states is determined by a genetic feature selected from the group consisting of a haploid state of the genome, a diploid state of the genome, a heterozygous state of a gene included in the genome, a homozygous state of a gene included in the genome, a mutation of a gene included in the genome, a deletion of a portion of a gene from the genome, an alteration of a regulatory sequence of a gene in the genome, an exogenous gene integrated into the genome and an exogenous oligonucleotide integrated into the genome.
In accordance with another aspect of the invention, the biological sample could be a cell line having a genome wherein the first initial state that is selected from a plurality of initial states is determined by a genetic feature selected from the group consisting of a heterozygous state of a gene included in the genome, a homozygous state of a gene included in the genome, a mutation of a gene included in the genome, a deletion of a portion of a gene from the genome, an alteration of a regulatory sequence of a gene in the genome, an exogenous gene integrated into the genome, and an exogenous oligonucleotide integrated into the genome.
In another aspect of the invention, the biological sample is a cell line and the first initial state that is selected from a plurality of initial states is determined by a state of a biological pathway that is selected from a compendium of biological pathways present in the cell line. In one aspect of the invention, the biological sample is substantially isogenic with Saccharomyces cerevisiae and the biological pathway is a mating pathway.
In yet another aspect of the invention, the first perturbation is a first amount of a first pharmacological agent that is contacted with the biological sample. In another aspect, the second perturbation is a second amount of the first pharmacological agent that is contacted with the biological sample, and the first and second amounts of pharmacological agent vary.
In another aspect, the second perturbation is a second amount of a second pharmacological agent that is contacted with said biological sample.
In accordance with another aspect of the invention, the biological sample includes a genome and the first perturbation is determined by the introduction of an exogenous gene into the genome, and/or deletion of at least one gene in the genome.
In accordance with another aspect of the invention, the first perturbation is a method, the method comprising: contacting said biological sample with a hormone, a drug, a peptide, an oligonucleotide, a mineral, a composition of media, a phage, a trace element, a salt, a colony stimulating factor or a source of irradiation. In another aspect, the first perturbation is a method, the method comprising: contacting an amount of an organic compound that has a molecular weight less than 1000 Daltons with said biological sample.
In accordance with another aspect of the invention, the first augmented profile is expressed as:

$$P^{i} = [P^{i}_{1}; \ldots; P^{i}_{N}]$$
wherein,
$P^{i}$ is a first augmented profile;
$P^{i}_{1}$ is a first constituent profile in a first set of constituent profiles that is determined by measuring a response of a biological sample to a first perturbation when the biological sample is in a first biological state selected from a plurality of initial states;
$P^{i}_{N}$ is an $N$th constituent profile in the first set of constituent profiles that is determined by measuring a response of the biological sample to the first perturbation when the biological sample is in an $N$th biological state selected from the plurality of initial states; and
the second augmented profile is:
$$P^{j} = [P^{j}_{1}; \ldots; P^{j}_{N}]$$
wherein,
$P^{j}$ is a second augmented profile;
$P^{j}_{1}$ is a first constituent profile in a second set of constituent profiles that is determined by measuring a response of the biological sample to the second perturbation when the biological sample is in said first biological state;
$P^{j}_{N}$ is an $N$th constituent profile in the second set of constituent profiles that is determined by measuring a response of the biological sample to the second perturbation when the biological sample is in an $N$th biological state selected from the plurality of initial states; and
$N$ is the number of states in the plurality of initial states.
In this embodiment the step of comparing the first augmented profile with the second augmented profile to determine the correlation is performed by comparing $P^{i}$ to $P^{j}$ using a quantitative measure of similarity. In one aspect this quantitative measure of similarity is a generalized dot product:
$$r_{ij} = \frac{P^{i} \cdot P^{j}}{|P^{i}|\,|P^{j}|}$$
wherein $\cdot$ denotes the dot product, $|\cdot|$ denotes the vector norm, and $r_{ij}$ denotes similarity. In another aspect of the invention, the quantitative measure of similarity is derived from Shannon mutual information theory.
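As an illustration only, and not the implementation described in this application, the concatenation of constituent profiles into an augmented profile and the normalized dot product can be sketched in Python. The function names `augment` and `similarity`, the handling of an all-zero profile, and all numeric values are hypothetical assumptions introduced for this sketch.

```python
import math

def augment(constituent_profiles):
    """Concatenate constituent profiles (one per initial state) into one augmented profile."""
    return [value for profile in constituent_profiles for value in profile]

def similarity(p_i, p_j):
    """Normalized dot product r_ij = (P^i . P^j) / (|P^i| |P^j|)."""
    if len(p_i) != len(p_j):
        raise ValueError("augmented profiles must have the same length")
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm_i = math.sqrt(sum(a * a for a in p_i))
    norm_j = math.sqrt(sum(b * b for b in p_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: an all-zero profile is treated as uncorrelated
    return dot / (norm_i * norm_j)

# Hypothetical responses of three constituents to two drugs in two initial states.
drug_a = [[0.5, 0.0, 0.0], [0.0, 0.8, 0.0]]
drug_b = [[0.5, 0.0, 0.0], [0.0, -0.8, 0.0]]

r_state1 = similarity(drug_a[0], drug_b[0])                  # first initial state only
r_augmented = similarity(augment(drug_a), augment(drug_b))   # both states combined
```

In this made-up example the two drugs are indistinguishable in the first initial state alone (`r_state1` is 1.0), whereas the augmented profiles, which also include the second initial state, yield a much lower similarity.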
In another aspect of the invention, each constituent profile includes a plurality of elements that each represent an amount of a cellular constituent in a biological sample.
Accordingly, the cellular constituents are independently selected from the group consisting of a gene expression level, an amount of an mRNA encoding a gene, an amount of a protein, an amount of an enzymatic activity, an amount of an epitope presented by a macromolecule, an amount of a divalent cation, an amount of a phosphorylated protein, an amount of a dephosphorylated protein, an amount of a hormone, and an amount of a peptide.
Another aspect of the invention is a method of determining an effect of a first perturbation on a subject, the method comprising: (a) determining a plurality of augmented profiles, each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets, wherein:
each constituent profile set in the plurality of constituent profile sets is determined by obtaining a biological sample from the subject at a different time; and
each constituent profile in the constituent profile set is determined by measuring a biological response of the biological sample to a different second perturbation selected from a plurality of perturbations;
and (b) comparing the plurality of augmented profiles to determine the effect of the first perturbation on the subject. The first perturbation may be selected from the group consisting of a diseased state, introduction of an exogenous gene into the genome of the subject, and a behavioral health risk. Optionally, the first constituent profile set in the plurality of constituent profile sets represents a baseline state and all other constituent profile sets in the plurality of constituent profile sets are expressed as a ratio or logarithmic ratio of the first constituent profile set. Optionally, the first perturbation is a drug that is taken by the subject of interest at regular intervals.
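The optional baseline normalization can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `log_ratio_to_baseline`, the nested-list layout (time point, then second perturbation, then constituent), and the abundance values are all hypothetical, and every measured value is assumed to be strictly positive so that the logarithm is defined.

```python
import math

def log_ratio_to_baseline(profile_sets):
    """Express each later constituent profile set as log10(value / baseline value).

    profile_sets[0] is taken as the baseline; each set holds one profile per
    second perturbation, and each profile holds one value per constituent.
    """
    baseline = profile_sets[0]
    result = []
    for later in profile_sets[1:]:
        result.append([
            [math.log10(v / b) for v, b in zip(profile, base_profile)]
            for profile, base_profile in zip(later, baseline)
        ])
    return result

# Hypothetical abundances: 2 sampling times x 2 second perturbations x 2 constituents.
time_series = [
    [[100.0, 50.0], [10.0, 20.0]],   # baseline sample
    [[1000.0, 50.0], [1.0, 20.0]],   # sample obtained at a later time
]
ratios = log_ratio_to_baseline(time_series)
```

Each entry of `ratios` can then be concatenated into an augmented profile for its sampling time, so that unchanged constituents contribute zero and ten-fold changes contribute plus or minus one.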
4 BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 represents the results of 365 mRNA transcription profiling experiments.
Methods were as described for a subset of these experiments in Section 6, supra. Each of the 365 rows in this image has, when printed at full resolution, 6000 gray-scale pixels representing the ratio in mRNA expression of the 6000 yeast genes between the pair of cell conditions in that experiment pair. Black denotes upregulation of a gene's transcription, white denotes downregulation, and the middle gray denotes very little or no change. The gray-scale bar at the bottom of Figure 1 indicates a scale from log10(ratio) = -1 (ten-fold downregulation) to log10(ratio) = +1 (ten-fold upregulation) for reference.
The 365 condition pairs include comparisons of with/without drugs at different concentrations, with/without specific genes in the yeast strain, combinations of drug treatment and gene deletion, changes in culture density, growth temperature, medium composition, and stimulations with endogenous hormones like mating factor.
Fig. 2 represents profiles to drugs in multiple conditions. Although the response to the drugs under starting State 1 may be small or nonexistent, the concatenated response profiles obtained in different states may provide robust discrimination of the activities of the different compounds. ↑ denotes upregulation. ↓ denotes downregulation. Absence of an arrow denotes no change for that cellular constituent.
Fig. 3A illustrates a profile for the immunosuppressant drug cyclosporin A.
Fig. 3B illustrates a profile for the immunosuppressant drug FK506. In both figures, the horizontal axis is the intensity of the individual hybridized spots on the microarray, representing individual mRNA species abundance in the two cultures. The vertical axis is the log10 of the ratio of the intensity measured for one fluorescent label (Culture 1) to that measured for the other label (Culture 2). Error bars and names are displayed only for those genes which had up or down regulations due to the drug that were significant at the 95% confidence level or better.
Fig. 4 shows the high correlation (similarity) between the effects of cyclosporin A and FK506 on S. cerevisiae that had been cultured in the presence of 1 µg/ml of FK506 and 30 µg/ml of cyclosporin respectively.
Fig. 5A illustrates a response profile for the gene deletion strain FPR cultured in the presence of 1 µg/ml of FK506.
Fig. 5B illustrates a response profile for the gene deletion strain CPH1 cultured in the presence of 1 µg/ml of FK506.
Fig. 5C illustrates a response profile for the gene deletion strain FPR cultured in the presence of 50 µg/ml of cyclosporin.
Fig. 5D illustrates a response profile for the gene deletion strain CPH1 cultured in the presence of 50 µg/ml of cyclosporin.
Fig. 6 illustrates the reduced correlation between the effects of cyclosporin and FK506 in yeast when augmented profiles are used.
Fig. 7 illustrates a computer system useful for embodiments of the invention.

A basis for the present invention is the unexpected discovery that augmented profiles provide a method for robustly discriminating between the subtle effects of a first perturbation and a second perturbation on a biological sample. Augmented profiles are derived by the combination of a plurality of response profiles and/or projection profiles that are in turn based upon the measurement of cellular constituents within a biological sample as the biological sample is placed in a series of different starting states.
This section presents a detailed description of the invention and its applications.
5.1 INTRODUCTION
To appreciate the methods of the present invention, an understanding of some preliminary concepts such as biological state, response profiles, and projection profiles is necessary. After these concepts are understood, one skilled in the art will understand the concept of an augmented profile. Further, the improvements that augmented profiles provide in the field of profile comparison will be appreciated after the details of the present invention are described and an example is presented.
5.1.1 GENERAL DEFINITIONS
Biological sample and/or Biological system: As used herein, a biological sample and/or biological system includes a cell line, a culture of a cell line, a tissue sample obtained from a subject, a Homo sapiens, a mammal, a yeast substantially isogenic to Saccharomyces cerevisiae, or any other art recognized biological system.
Perturbation: As used herein, a perturbation includes the exposure of a biological sample to a drug candidate or pharmacologic agent, the introduction of an exogenous gene into a biological sample, the deletion of a gene from the biological sample, changes in the culture conditions of the biological sample, or any other art recognized method of perturbing a biological sample.
Constituent Profile: A constituent profile is a profile used in the formation of an augmented profile. The constituent profile may, for example, be a response profile or a projected profile, which are described infra.
Behavioral Health Risk: As used herein, a behavioral health risk includes, but is not limited to, consumption of alcohol and cigarette smoking.
5.1.2 BIOLOGICAL SAMPLE
As used herein, the term "biological sample" is broadly defined to include any cell, tissue, organ or multicellular organism. A biological sample can be derived, for example, from cell or tissue cultures in vitro. Alternatively, a biological sample can be derived from a living organism or from a population of single cell organisms.
The state of a biological sample can be measured by the content, activities or structures of its cellular constituents. The state of a biological sample, as used herein, is determined by the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including characterizing the effects of a drug or other perturbation.
The term "cellular constituent" is broadly defined herein to encompass any kind of measurable biological variable. The measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological sample), their activities, their states of modification (e.g., phosphorylation), or other art recognized measurements relevant to the physiological state of a biological sample. In various embodiments, this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called aspects of the biological state of a biological sample.
One aspect of the biological state of a biological sample (e.g., a cell or cell culture) usefully measured in the present invention is its transcriptional state. The transcriptional state of a biological sample includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Often, a substantial fraction of all constituent RNA species in the biological sample are measured, but at least a sufficient fraction is measured to characterize the action of a drug or other perturbation of interest. The transcriptional state of a biological sample can be conveniently determined by measuring cDNA abundances by any of several existing gene expression technologies.
DNA arrays for measuring mRNA or transcript level of a large number of genes can be employed to ascertain the biological state of a sample.
Another aspect of the biological state of a biological sample usefully measured is its translational state. The translational state of a biological sample includes the identities and abundances of the constituent protein species in the biological sample under a given set of conditions. Preferably a substantial fraction of all constituent protein species in the biological sample is measured, but at least a sufficient fraction is measured to characterize the action of a drug of interest. The transcriptional state is often representative of the translational state.
Other aspects of the biological state of a biological sample are also of use in this invention. For example, the activity state of a biological sample includes the activities of the constituent protein species (and also optionally catalytically active nucleic acid species) in the biological sample under a given set of conditions. As is known to those of skill in the art, the translational state is often representative of the activity state.
This invention is also adaptable, where relevant, to "mixed" aspects of the biological state of a biological sample in which measurements of different aspects of the biological state of a biological sample are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to any other aspect of a biological state of a biological sample that is measurable.
The biological state of a biological sample (e.g., a cell or cell culture) can be represented by a profile of some number of cellular constituents. Such a profile of cellular constituents can be represented by the vector S:
$$S = [S_{1}, \ldots, S_{i}, \ldots, S_{k}] \qquad (1)$$
where $S_{i}$ is the level of the i'th cellular constituent, for example, the transcript level of gene i, or alternatively, the abundance or activity level of protein i.
In some embodiments, cellular constituents are measured as continuous variables.
For example, transcriptional rates are typically measured as number of molecules synthesized per unit of time. Transcriptional rate may also be measured as percentage of a control rate. However, in some other embodiments, cellular constituents may be measured as categorical variables. For example, transcriptional rates may be measured as either "on" or "off", where the value "on" indicates a transcriptional rate above a predetermined threshold and the value "off" indicates a transcriptional rate below that threshold.
5.1.3 RESPONSE PROFILES
The responses of a biological sample to a perturbation, such as a pharmacological agent, can be measured by observing the changes in the biological state of the biological sample. A response profile is a collection of changes of cellular constituents. The response profile of a biological sample (e.g., a cell or cell culture) to the perturbation m may be defined as the vector $v^{(m)}$:
$$v^{(m)} = [v_1^{(m)}, \ldots, v_i^{(m)}, \ldots, v_k^{(m)}] \qquad (2)$$
where $v_i^{(m)}$ is the amplitude of response of cellular constituent i under the perturbation m. In some embodiments of response profiles, biological response to the application of a pharmacological agent is measured by the induced change in the transcript level of at least 2 genes, preferably more than 10 genes, more preferably more than 100 genes and most preferably more than 1,000 genes. In some embodiments, biological response profiles comprise simply the difference between biological variables before and after perturbation. In some preferred embodiments, the biological response is defined as the ratio of cellular constituents before and after a perturbation is applied.
In some preferred embodiments, $v_i^{(m)}$ is set to zero if the response of gene i is below some threshold amplitude or confidence level determined from knowledge of the measurement error behavior. In such embodiments, those cellular constituents whose measured responses are lower than the threshold are given the response value of zero, whereas those cellular constituents whose measured responses are greater than the threshold retain their measured response values. This truncation of the response vector is suitable when most of the smaller responses are expected to be greatly dominated by measurement error. After the truncation, the response vector $v^{(m)}$ also approximates a 'matched detector' (see, e.g., Van Trees, 1968, Detection, Estimation, and Modulation Theory Vol. I, Wiley & Sons) for the existence of similar perturbations. It is apparent to those skilled in the art that the truncation levels can be set based upon the purpose of detection and the measurement errors. For example, in some embodiments, genes whose transcript level changes are lower than two fold or more preferably four fold are given the value of zero.
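By way of illustration only, the ratio-based response profile and its truncation might be sketched as follows. The function name `response_profile`, the use of log10 ratios, and the example constituent levels are assumptions made for this sketch and are not prescribed by the specification.

```python
import math

def response_profile(before, after, fold_threshold=2.0):
    """Log10 ratio per constituent; responses under the fold threshold become zero."""
    cutoff = math.log10(fold_threshold)
    profile = []
    for b, a in zip(before, after):
        v = math.log10(a / b)          # response amplitude of one constituent
        profile.append(v if abs(v) >= cutoff else 0.0)
    return profile

# Hypothetical constituent levels before and after a perturbation.
before = [100.0, 50.0, 200.0, 80.0]
after = [105.0, 250.0, 40.0, 120.0]
profile = response_profile(before, after)
print(profile)
```

Here the first and fourth constituents change by less than two fold and are set to zero, while the five-fold increase and five-fold decrease retain their measured values.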
In some preferred embodiments of response profiles, perturbations are applied at several levels of strength. For example, different amounts of a drug may be applied to a biological sample to observe its response. In such embodiments, the perturbation responses may be interpolated by approximating each by a single parameterized "model" function of the perturbation strength u. An exemplary model function appropriate for approximating transcriptional state data is the Hill function, which has adjustable parameters a, $u_0$, and n:
$$H(u) = \frac{a\,(u/u_0)^n}{1 + (u/u_0)^n} \qquad (3)$$
The adjustable parameters are selected independently for each cellular constituent of the perturbation response. Preferably, the adjustable parameters are selected for each cellular constituent so that the sum of the squares of the differences between the model function (e.g., the Hill function, Equation 3) and the corresponding experimental data at each perturbation strength is minimized. This preferable parameter adjustment method is known in the art as a least squares fit. Other possible model functions are based on polynomial fitting. More detailed description of model fitting and biological response has been disclosed in Friend and Stoughton, Methods of Determining Protein Activity Levels Using Gene Expression Profiles, U.S. Provisional Application Serial No. 60/084,742, filed on May 8, 1998, which is incorporated herein by reference in its entirety for all purposes.
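A least squares fit of the Hill function of Equation 3 could, for example, be sketched as below. This is a minimal illustration only: the grid search over $u_0$ and n, the closed-form solution for a, the function names, and the synthetic dose-response data are all assumptions of the sketch; any standard nonlinear least squares routine could be substituted.

```python
def hill(u, a, u0, n):
    """Hill function of Equation 3: a*(u/u0)^n / (1 + (u/u0)^n)."""
    x = (u / u0) ** n
    return a * x / (1.0 + x)

def fit_hill(strengths, responses, u0_grid, n_grid):
    """Least squares fit: grid search over (u0, n), amplitude a in closed form."""
    best = None
    for u0 in u0_grid:
        for n in n_grid:
            h = [hill(u, 1.0, u0, n) for u in strengths]
            hh = sum(x * x for x in h)
            if hh == 0.0:
                continue
            # For fixed (u0, n) the model is linear in a, so a has a direct solution.
            a = sum(y * x for y, x in zip(responses, h)) / hh
            sse = sum((y - a * x) ** 2 for y, x in zip(responses, h))
            if best is None or sse < best[0]:
                best = (sse, a, u0, n)
    return best

# Synthetic data for one cellular constituent (hypothetical values).
strengths = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
responses = [hill(u, 2.0, 1.5, 2.0) for u in strengths]
u0_grid = [0.25 * i for i in range(1, 41)]   # 0.25 .. 10.0
n_grid = [0.5 * i for i in range(1, 9)]      # 0.5 .. 4.0
sse, a, u0, n = fit_hill(strengths, responses, u0_grid, n_grid)
print(a, u0, n, sse)
```

In practice the fit would be repeated independently for each cellular constituent, as stated above.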
5.1.4 PROJECTED PROFILES
The methods of the invention are useful for comparing augmented profiles that contain any number of response profiles and/or projected profiles. Projected profiles are best understood after a discussion of genesets, which are co-regulated genes.
Projected profiles are useful for analyzing many types of cellular constituents including genesets.
5.1.4.1 CO-REGULATED GENES AND GENESETS
Certain genes tend to increase or decrease their expression in groups. Genes tend to increase or decrease their rates of transcription together when they possess similar regulatory sequence patterns, i.e., transcription factor binding sites. This is the mechanism for coordinated response to particular signaling inputs (see, e.g., Madhani and Fink, 1998, The riddle of MAP kinase signaling specificity, Trends in Genetics 14:151-155; Arnone and Davidson, 1997, The hardwiring of development: organization and function of genomic regulatory systems, Development 124:1851-1864). Separate genes which make different components of a necessary protein or cellular structure will tend to co-vary.
Duplicated genes (see, e.g., Wagner, 1996, Genetic redundancy caused by gene duplications and its evolution in networks of transcriptional regulators, Biol. Cybern. 74:557-567) will also tend to co-vary to the extent mutations have not led to functional divergence in the regulatory regions. Further, because regulatory sequences are modular (see, e.g., Yuh et al., 1998, Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene, Science 279:1896-1902), the more modules two genes have in common, the greater the variety of conditions under which they are expected to co-vary their transcriptional rates. Separation between modules also is an important determinant since co-activators also are involved. In summary therefore, for any finite set of conditions, it is expected that genes will not all vary independently, and that there are simplifying subsets of genes and proteins that will co-vary. These co-varying sets of genes form a complete basis in the mathematical sense with which to describe all the profile changes within that finite set of conditions.
5.1.4.2 GENESET CLASSIFICATION BY CLUSTER ANALYSIS
For many applications, it is desirable to find basis genesets that are co-regulated over a wide variety of conditions. A preferred embodiment for identifying such basis genesets involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for Applications, Academic Press: New York).
In some embodiments employing cluster analysis, the expression of a large number of genes is monitored as biological samples are subjected to a wide variety of perturbations.
A table of data containing the gene expression measurements is used for cluster analysis. In order to obtain basis genesets that contain genes which co-vary over a wide variety of conditions, multiple perturbations or conditions are employed. Cluster analysis operates on a table of data which has the dimension m x k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
A number of clustering algorithms are useful for clustering analysis.
Clustering algorithms use dissimilarities or distances between objects when forming clusters. In some embodiments, the distance used is Euclidean distance in multidimensional space:
$$I(x,y) = \left\{ \sum_i (X_i - Y_i)^2 \right\}^{1/2} \qquad (4)$$
where I(x,y) is the distance between gene X and gene Y; $X_i$ and $Y_i$ are gene expression responses under perturbation i. The Euclidean distance may be squared to place progressively greater weight on objects that are further apart. Alternatively, the distance measure may be the Manhattan distance, e.g., between gene X and Y, which is provided by:
$$I(x,y) = \sum_i |X_i - Y_i| \qquad (5)$$
Again, $X_i$ and $Y_i$ are gene expression responses under perturbation i. Some other definitions of distances are Chebychev distance, power distance, and percent disagreement.
Percent disagreement, defined as I(x,y) = (number of $X_i \neq Y_i$)/i, is particularly useful for the method of this invention if the data for the dimensions are categorical in nature.
Another useful distance definition, which is particularly useful in the context of cellular response, is I = 1 - r, where r is the correlation coefficient between the response vectors X, Y, also called the normalized dot product $X \cdot Y / (|X||Y|)$.
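The distance definitions above may be sketched, purely for illustration, as the following functions. The function names and the example response vectors are hypothetical; percent disagreement is taken here as the number of disagreeing entries divided by the number of perturbations.

```python
import math

def euclidean(x, y):
    """Equation 4: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Manhattan distance: summed absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def percent_disagreement(x, y):
    """Fraction of perturbations at which two categorical responses differ."""
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)

def correlation_distance(x, y):
    """I = 1 - r, with r the normalized dot product X.Y / (|X||Y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

gene_x = [1.0, 2.0, 3.0]
gene_y = [2.0, 4.0, 6.0]
print(euclidean(gene_x, gene_y), manhattan(gene_x, gene_y))
print(correlation_distance(gene_x, gene_y))
print(percent_disagreement(["on", "off", "on", "on"], ["on", "on", "on", "off"]))
```

Note that the two example genes are far apart in Euclidean terms yet have a correlation distance near zero, because their responses differ only in amplitude.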
Various cluster linkage rules are useful for defining genesets. Single linkage, a nearest neighbor method, determines the distance between the two closest objects. By contrast, complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct "clumps."
Alternatively, the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct "clumps." Finally, the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight.
This method is particularly useful for embodiments where the cluster size is suspected to be greatly varied (Sneath and Sokal, 1973, Numerical Taxonomy, San Francisco: W. H. Freeman & Co.). Other cluster linkage rules, such as the unweighted and weighted pair-group centroid and Ward's method, are also useful for some embodiments of the invention.
See, e.g., Ward, 1963, J. Am. Stat. Assn. 58:236; Hartigan, 1975, Clustering Algorithms, New York: Wiley.
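As a hedged illustration of three of the linkage rules named above, the distance between two clusters can be computed from all pairwise object distances. The function names and the one-dimensional example objects are assumptions of this sketch.

```python
def pair_distances(cluster_a, cluster_b, dist):
    """All distances between one object of cluster_a and one of cluster_b."""
    return [dist(x, y) for x in cluster_a for y in cluster_b]

def single_linkage(cluster_a, cluster_b, dist):
    """Nearest neighbor rule: distance between the two closest objects."""
    return min(pair_distances(cluster_a, cluster_b, dist))

def complete_linkage(cluster_a, cluster_b, dist):
    """Greatest distance between any two objects in the different clusters."""
    return max(pair_distances(cluster_a, cluster_b, dist))

def average_linkage(cluster_a, cluster_b, dist):
    """Unweighted pair-group average over all cross-cluster pairs."""
    d = pair_distances(cluster_a, cluster_b, dist)
    return sum(d) / len(d)

def abs_diff(x, y):
    # Simple distance for one-dimensional example objects.
    return abs(x - y)

cluster_a = [0.0, 1.0]
cluster_b = [3.0, 5.0]
print(single_linkage(cluster_a, cluster_b, abs_diff))
print(complete_linkage(cluster_a, cluster_b, abs_diff))
print(average_linkage(cluster_a, cluster_b, abs_diff))
```

Any of the distance measures discussed earlier (for example 1 - r on response vectors) could be passed in place of `abs_diff`.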
As the diversity of perturbations in the clustering set becomes very large, the genesets which are clearly distinguishable get smaller and more numerous.
However, even over very large experiment sets, there are small genesets that retain their coherence. These genesets are termed irreducible genesets. Typically, a large number of diverse perturbations are applied to obtain such irreducible genesets.
Often, the clustering of genesets is represented graphically and is termed a 'tree'.
Genesets may be defined based on the many smaller branches of a tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct response pathways expected. If little or no prior information is available about the number of pathways, then the tree should be divided into as many branches as are truly distinct. 'Truly distinct' may be defined by a minimum distance value between the individual branches. Typical values are in the range 0.2 to 0.4 where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
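One way to picture cutting the tree at a minimum distance value is the following sketch, which merges clusters by average linkage on the distance 1 - r and stops once the closest remaining pair of clusters is at least the chosen distance apart. The function names, the choice of average linkage, and the four example genes are assumptions made only for illustration.

```python
import math

def corr_distance(x, y):
    """1 - r, with r the Pearson correlation of two response vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 1.0 - num / den

def cut_clusters(responses, min_distance):
    """Average-linkage agglomeration, halted at the given cut level."""
    clusters = [[name] for name in responses]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = [corr_distance(responses[a], responses[b])
                     for a in clusters[i] for b in clusters[j]]
                avg = sum(d) / len(d)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        if best[0] >= min_distance:
            break                       # remaining branches are 'truly distinct'
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical responses of four genes across five experiments.
responses = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "g3": [5.0, 1.0, 4.0, 1.0, 5.0],
    "g4": [4.8, 1.2, 4.1, 0.9, 5.2],
}
genesets = cut_clusters(responses, min_distance=0.3)
print(genesets)
```

With a cut level of 0.3, within the typical range stated above, the example yields two genesets of two genes each.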
Preferably, 'truly distinct' may be defined with an objective test of statistical significance for each bifurcation in the tree. In one aspect of the invention, the Monte Carlo randomization of the experiment index for each cellular constituent's responses across the set of experiments is used to define an objective test.
In some embodiments, the objective test is defined in the following manner:
Let $p_{ki}$ be the response of constituent k in experiment i. Let $\Pi(i)$ be a random permutation of the experiment index. Then for each of a large (about 100 to 1000) number of different random permutations, construct $p_{k\Pi(i)}$. For each branching in the original tree, for each permutation:
(1) perform hierarchical clustering with the same algorithm ('hclust' in this case) used on the original unpermuted data;
(2) compute fractional improvement f in the total scatter with respect to cluster centers in going from one cluster to two clusters
$$f = 1 - \sum_k D_k^{(2)} \Big/ \sum_k D_k^{(1)} \qquad (6)$$
where $D_k$ is the square of the distance measure for constituent k with respect to the center (mean) of its assigned cluster. Superscript 1 or 2 indicates whether it is with respect to the center of the entire branch or with respect to the center of the appropriate cluster out of the two subclusters. There is considerable freedom in the definition of the distance function D used in the clustering procedure. In these examples, D = 1 - r, where r is the correlation coefficient between the responses of one constituent across the experiment set vs. the responses of the other (or vs. the mean cluster response).
The distribution of fractional improvements obtained from the Monte Carlo procedure is an estimate of the distribution under the null hypothesis that a given branching was not significant. The actual fractional improvement for that branching with the unpermuted data is then compared to the cumulative probability distribution from the null hypothesis to assign significance. Standard deviations are derived by fitting a log normal model for the null hypothesis distribution. Using this procedure, a standard deviation greater than about 2, for example, indicates that the branching is significant at the 95% confidence level. Genesets defined by cluster analysis typically have underlying biological significance.
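The Monte Carlo test described above might be sketched as follows. Several simplifications are assumed and should be read as such: the function `two_way_split` is a hypothetical stand-in for re-running hierarchical clustering on each permuted data set, $D_k$ is taken as the square of 1 - r relative to the cluster mean, and significance is reported as an empirical fraction rather than through the log normal fit mentioned in the text.

```python
import math
import random

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0

def scatter(rows):
    """Sum over constituents of D_k = (1 - r)^2 relative to the cluster mean."""
    center = [sum(col) / len(rows) for col in zip(*rows)]
    return sum((1.0 - corr(row, center)) ** 2 for row in rows)

def two_way_split(rows):
    """Stand-in for clustering: seed with the most distant pair, assign the rest."""
    pairs = [(1.0 - corr(rows[i], rows[j]), i, j)
             for i in range(len(rows)) for j in range(i + 1, len(rows))]
    _, s1, s2 = max(pairs)
    left, right = [], []
    for row in rows:
        nearer_first = corr(row, rows[s1]) >= corr(row, rows[s2])
        (left if nearer_first else right).append(row)
    return left, right

def fractional_improvement(rows):
    """Equation 6: f = 1 - sum(D_k^(2)) / sum(D_k^(1))."""
    left, right = two_way_split(rows)
    return 1.0 - (scatter(left) + scatter(right)) / scatter(rows)

def branching_p_value(rows, n_permutations=200, seed=0):
    rng = random.Random(seed)
    observed = fractional_improvement(rows)
    exceed = 0
    for _ in range(n_permutations):
        # Permute the experiment index independently for each constituent.
        permuted = [rng.sample(row, len(row)) for row in rows]
        if fractional_improvement(permuted) >= observed:
            exceed += 1
    return observed, exceed / n_permutations

# Hypothetical branch: three trending genes and three oscillating genes.
rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.2, 1.9, 3.1, 4.2, 4.9, 6.1, 6.8, 8.1],
    [0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1, 7.9],
    [5.0, 1.0, 6.0, 2.0, 7.0, 1.5, 6.5, 2.5],
    [5.1, 0.8, 6.2, 2.1, 6.9, 1.4, 6.6, 2.4],
    [4.9, 1.1, 5.9, 1.8, 7.2, 1.6, 6.4, 2.7],
]
observed_f, p_value = branching_p_value(rows)
print(observed_f, p_value)
```

A small returned fraction suggests that the observed branching is unlikely to have arisen from constituents with no shared response pattern.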
Another aspect of the cluster analysis method provides the definition of basis vectors for use in profile projection described in the following sections.
A set of basis vectors V has k x n dimensions, where k is the number of genes and n is the number of genesets.
$$V = \begin{bmatrix} V_1^{(1)} & \cdots & V_1^{(n)} \\ \vdots & & \vdots \\ V_k^{(1)} & \cdots & V_k^{(n)} \end{bmatrix} \qquad (7)$$
$V_k^{(n)}$ is the amplitude contribution of gene index k in basis vector n. In some embodiments, $V_k^{(n)} = 1$ if gene k is a member of geneset n, and $V_k^{(n)} = 0$ if gene k is not a member of geneset n. In some embodiments, $V_k^{(n)}$ is proportional to the response of gene k in geneset n over the training data set used to define the genesets.
In some preferred embodiments, the elements $V_k^{(n)}$ are normalized so that each basis vector $V^{(n)}$ has unit length by dividing by the square root of the number of genes in geneset n. This produces basis vectors which are not only orthogonal (the genesets derived from cutting the clustering tree are disjoint), but also orthonormal (unit length). With this choice of normalization, random measurement errors in profiles project onto the $V_k^{(n)}$ in such a way that the amplitudes tend to be comparable for each n. Normalization prevents large genesets from dominating the results of similarity calculations.
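For illustration, membership-based basis vectors with the normalization just described can be built and checked for orthonormality as in the sketch below. The gene names, the two example genesets, and the function names are hypothetical.

```python
import math

def basis_vectors(genes, genesets):
    """One basis vector per geneset: 1/sqrt(size) for members, 0 otherwise."""
    vectors = []
    for members in genesets:
        scale = 1.0 / math.sqrt(len(members))
        vectors.append([scale if g in members else 0.0 for g in genes])
    return vectors

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

genes = ["g1", "g2", "g3", "g4", "g5"]
genesets = [{"g1", "g2"}, {"g3", "g4", "g5"}]   # disjoint, as from a tree cut
vectors = basis_vectors(genes, genesets)
print(dot(vectors[0], vectors[0]), dot(vectors[1], vectors[1]))
print(dot(vectors[0], vectors[1]))
```

Because the genesets are disjoint, the cross product of the two vectors is zero, and the division by the square root of the geneset size gives each vector unit length.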
5.1.4.3 GENESET CLASSIFICATION BASED UPON MECHANISMS OF REGULATION
Genesets can also be defined based upon the mechanism of the regulation of genes.
Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated. In some preferred embodiments, the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (Stormo and Hartzell, 1989, Identifying protein binding sites from unaligned DNA fragments, Proc Natl Acad Sci 86:1183-1187; Hertz and Stormo, 1995, Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps, Proc of 3rd Intl Conf on Bioinformatics and Genome Research, Lim and Cantor, eds., World Scientific Publishing Co., Ltd. Singapore, pp. 201-216). For example, as Example 3, infra, shows, a common promoter sequence responsive to Gcn4 in 20 genes may be responsible for those 20 genes being co-regulated over a wide variety of perturbations.

The co-regulation of genes is not limited to those with binding sites for the same transcriptional factor. Co-regulated (co-varying) genes may be in the up-stream/down-stream relationship where the products of up-stream genes regulate the activity of down-stream genes. It is well known to those of skill in the art that there are numerous varieties of gene regulation networks. One of skill in the art also understands that the methods of this invention are not limited to any particular kind of gene regulation mechanism.
If it can be derived from the mechanism of regulation that two genes are co-regulated in terms of their activity change in response to perturbation, the two genes may be clustered into a geneset.
Because of the lack of complete understanding of the regulation of genes of interest, it is often preferred to combine cluster analysis with regulatory mechanism knowledge to derive better defined genesets. In some embodiments, K-means clustering may be used to cluster genesets when the regulation of genes of interest is partially known. K-means clustering is particularly useful in cases where the number of genesets is predetermined by the understanding of the regulatory mechanism. In general, K-means clustering is constrained to produce exactly the number of clusters desired. Therefore, if promoter sequence comparison indicates the measured genes should fall into three genesets, K-means clustering may be used to generate exactly three genesets with greatest possible distinction between clusters.
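A minimal K-means sketch for the three-geneset case mentioned above is given below. It is illustrative only: the initial centers are simply the first k response vectors (an assumption made to keep the example deterministic), squared Euclidean distance is used, and the six example genes are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_center(point, centers):
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))

def kmeans(points, k, n_iter=50):
    """Assign each point to one of exactly k clusters."""
    centers = [list(p) for p in points[:k]]      # naive deterministic start
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [nearest_center(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break                                # assignments have stabilized
        centers = new_centers
    return labels, centers

# Hypothetical responses of six genes under two perturbations.
points = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0),
          (0.2, 0.1), (5.1, 4.9), (9.8, 0.3)]
labels, centers = kmeans(points, k=3)
print(labels)
```

In a practical setting the starting centers would usually be chosen more carefully, for example from genes already known to share a regulatory mechanism.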
5.1.4.4 REPRESENTING PROJECTED PROFILES
The expression value of genes can be converted into the expression value for genesets. This process is referred to as projection. In some embodiments, the projection is as follows:
$$P = [P_1, \ldots, P_i, \ldots, P_n] = p \cdot V \qquad (8)$$
wherein p is the expression profile, P is the projected profile, $P_i$ is the expression value for geneset i and V is a predefined set of basis vectors. The basis vectors have been previously defined in Equation 7 (Section 5.1.4.2, supra) as:
$$V = \begin{bmatrix} V_1^{(1)} & \cdots & V_1^{(n)} \\ \vdots & & \vdots \\ V_k^{(1)} & \cdots & V_k^{(n)} \end{bmatrix} \qquad (9)$$
wherein $V_k^{(n)}$ is the amplitude of cellular constituent index k of basis vector n.
In one preferred embodiment, the value of geneset expression is simply the average of the expression value of the genes within the geneset. In some other embodiments, the average is weighted so that highly expressed genes do not dominate the geneset value. The collection of the expression values of the genesets is the projected profile.
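The projection of Equation 8 and the simple-average variant can be illustrated with the following sketch. The gene indices, the two example genesets, and the function names are assumptions; the basis matrix V is built with unit-length columns as one possible choice among those described in Section 5.1.4.2.

```python
import math

def build_basis(n_genes, genesets):
    """V with k rows (genes) and n columns (genesets), unit-length columns."""
    V = [[0.0] * len(genesets) for _ in range(n_genes)]
    for col, members in enumerate(genesets):
        for k in members:
            V[k][col] = 1.0 / math.sqrt(len(members))
    return V

def project(p, V):
    """Equation 8: P_n is the sum over genes k of p_k * V[k][n]."""
    n_sets = len(V[0])
    return [sum(p[k] * V[k][n] for k in range(len(p))) for n in range(n_sets)]

def geneset_averages(p, genesets):
    """Alternative: plain average of the member genes' expression values."""
    return [sum(p[k] for k in members) / len(members) for members in genesets]

genesets = [[0, 1], [2, 3, 4]]          # hypothetical gene indices per geneset
p = [2.0, 4.0, 0.0, 3.0, 3.0]           # hypothetical expression profile
V = build_basis(len(p), genesets)
P = project(p, V)
averages = geneset_averages(p, genesets)
print(P, averages)
```

The two variants differ only by a per-geneset scale factor: the normalized projection equals the average multiplied by the square root of the geneset size.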
5.2 PROFILE COMPARISON AND CLASSIFICATION
Once the basis genesets are chosen, projected profiles $P_i$ may be obtained for any set of profiles indexed by i. Similarities between the $P_i$ may be more clearly seen than between the original profiles $p_i$ for two reasons. First, measurement errors in extraneous genes have been excluded or averaged out. Second, the basis genesets tend to capture the biology of the profiles $p_i$ and so are matched detectors for their individual response components.
Classification and clustering of the profiles both are based on an objective similarity metric, call it S, where one useful definition is
$$S_{ij} = S(P_i, P_j) = P_i \cdot P_j / (|P_i||P_j|) \qquad (10)$$
This definition is the generalized angle cosine between the vectors $P_i$ and $P_j$. It is the projected version of the conventional correlation coefficient between $p_i$ and $p_j$. Profile $p_i$ is deemed most similar to that other profile $p_j$ for which $S_{ij}$ is maximum. New profiles may be classified according to their similarity to profiles of known biological significance, such as the response patterns for known drugs or perturbations in specific biological pathways. Sets of new profiles may be clustered using the distance metric
$$D_{ij} = 1 - S_{ij} \qquad (11)$$
where this clustering is analogous to clustering in the original larger space of the entire set of response measurements, but has the advantages just mentioned of reduced measurement error effects and enhanced capture of the relevant biology.
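Classification by maximum similarity, using Equations 10 and 11, might be sketched as follows. The library entries, the query profile, and the function names are hypothetical and serve only to illustrate the calculation.

```python
import math

def similarity(P_i, P_j):
    """Equation 10: generalized angle cosine between two projected profiles."""
    dot = sum(a * b for a, b in zip(P_i, P_j))
    norm = math.sqrt(sum(a * a for a in P_i)) * math.sqrt(sum(b * b for b in P_j))
    return dot / norm

def distance(P_i, P_j):
    """Equation 11: D = 1 - S."""
    return 1.0 - similarity(P_i, P_j)

def classify(query, library):
    """Name of the library profile with maximum similarity to the query."""
    return max(library, key=lambda name: similarity(query, library[name]))

# Hypothetical projected profiles over three genesets.
library = {
    "drug A": [1.0, 0.1, -0.5],
    "drug B": [-0.2, 0.9, 0.4],
}
query = [0.8, 0.0, -0.6]
best_match = classify(query, library)
print(best_match, similarity(query, library[best_match]))
```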
The statistical significance of any observed similarity $S_{ij}$ may be assessed using an empirical probability distribution generated under the null hypothesis of no correlation.
This distribution is generated by performing the projection, Equations (9) and (10), for many different random permutations of the constituent index in the original profile p. That is, the ordered set $p_k$ is replaced by $p_{\Pi(k)}$, where $\Pi(k)$ is a permutation, for 100 to 1000 different random permutations. The probability of the similarity $S_{ij}$ arising by chance is then the fraction of these permutations for which the similarity $S_{ij}$ (permuted) exceeds the similarity observed using the original unpermuted data.
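The permutation procedure just described can be sketched as below. The eight-gene profile, the two genesets, the reference projected profile, and the function names are assumptions of the example; the projection uses unit-length membership vectors as one of the options described earlier.

```python
import math
import random

def project(p, genesets):
    """Project a profile onto unit-length membership basis vectors."""
    return [sum(p[k] for k in members) / math.sqrt(len(members)) for members in genesets]

def similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def chance_probability(p, reference, genesets, n_permutations=500, seed=0):
    """Fraction of constituent-index permutations whose similarity exceeds the observed."""
    rng = random.Random(seed)
    observed = similarity(project(p, genesets), reference)
    exceed = 0
    for _ in range(n_permutations):
        permuted = rng.sample(p, len(p))     # random permutation of the constituent index
        if similarity(project(permuted, genesets), reference) > observed:
            exceed += 1
    return observed, exceed / n_permutations

genesets = [[0, 1, 2, 3], [4, 5, 6, 7]]     # hypothetical gene indices
p = [2.0, 2.2, 1.9, 2.1, -1.0, -1.1, -0.9, -1.2]
reference = [1.0, -0.5]                     # hypothetical projected reference profile
observed, probability = chance_probability(p, reference, genesets)
print(observed, probability)
```

A small returned fraction indicates that the observed similarity is unlikely under the null hypothesis of no correlation.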

5.3 AUGMENTED PROFILES AND ROBUST DISCRIMINATION
5.3.1 METHOD FOR ROBUST DISCRIMINATION
In the methods of this invention, a biological sample is placed in alternative states by, for example, introducing mutations or changing growth conditions, to make the biological sample more responsive to a given perturbation. This concept is illustrated in Figure 2. Under State 1, in Figure 2, the drugs have only limited responses and comparison of their effects is tenuous and based on little information. By forming augmented profiles consisting of concatenated profiles from multiple states or conditions, the profiles become much more informative. Because they are more informative, they can provide improved detail on the effects of different perturbations, such as drugs in the illustration, on a patient.
The different states may be different culture growth conditions, background genetic strains, or additional drug treatments, to name a few. These additional states may be chosen based on prior biological knowledge to elicit specific responses in otherwise unresponsive cells, or they may be chosen more or less at random with the knowledge that the resulting additional diversity in the augmented response profile will tend to allow better discrimination, on average. Techniques to change the initial state and possibly elicit responses include, for example, inhibiting drug efflux pumps or enhancing cell wall permeability by genetic modification of the organism, growing in nutrient-poor media, growing on plates vs. in volume culture, adding certain trace elements or minerals to the media, using haploid, diploid, and heterozygous background strains, and activating pathways such as the mating pathway which have widespread effects on cell state and are likely to change the responsiveness to the stimuli that are being compared.
Robust augmented profiles comparison has wide ranging applicability, such as providing a method for robust discrimination of drug activities or disease states in vivo. In such applications, multiple conditions are provided by following a patient in time or through other environmental or medical insults and by concatenation of the multiple profiles obtained under these different host conditions. Profiles may be expressed as departure profiles from baseline states by forming the ratio or log(ratio) of constituent levels with respect to a baseline state, or any second perturbation.
Mathematically, comparisons of augmented profiles are done in a manner that is analogous to the comparison of profiles obtained in a single state as described in section 5.2.
The concatenated profile may be written P = [p1; p2; ...; pN] of length NL, where p1 is the profile in the first state, N is the number of states and L is the number of cellular constituents measured in a single state. Measures of similarity, such as a generalized dot product,
$$r_{ij} = P^i * P^j / (|P^i||P^j|) \qquad (12)$$
can be defined on the concatenated profiles, as they would be defined on single-state profiles p1. In Equation (12), * denotes dot product and | | denotes vector norm (length).
Many other quantitative measures of similarity are possible, such as Shannon mutual information [C.E. Shannon and W. Weaver, The mathematical theory of communication, University of Illinois Press, Urbana, IL, 1949], or modifications of Equation (12) where elements of the profiles are set to "1" ("-1") if they exceed a positive (negative) threshold and "0" if they do not.
These measures of similarity then support searches of augmented-profile libraries for the profile most similar to a query profile, and clustering of sets of augmented profiles into groups that are likely to share characteristics like toxicity or effectiveness.
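As a hedged sketch of the augmented-profile comparison, the example below concatenates per-state profiles, applies the similarity of Equation 12, and searches a small library for the best match. The drug names, the three-constituent profiles, and the function names are hypothetical.

```python
import math

def augment(profiles_by_state):
    """Concatenate the per-state profiles p1; p2; ...; pN into one profile."""
    return [value for profile in profiles_by_state for value in profile]

def similarity(x, y):
    """Equation 12: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def most_similar(query, library):
    return max(library, key=lambda name: similarity(query, library[name]))

# Two hypothetical drugs that look identical in state 1 but differ in state 2.
drug_a_states = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
drug_b_states = [[1.0, 0.0, 0.0], [0.0, 0.0, -2.0]]
query_states = [[0.9, 0.1, 0.0], [0.1, 1.8, 0.2]]

library = {"drug A": augment(drug_a_states), "drug B": augment(drug_b_states)}
query = augment(query_states)

state1_a = similarity(query_states[0], drug_a_states[0])
state1_b = similarity(query_states[0], drug_b_states[0])
best = most_similar(query, library)
print(state1_a, state1_b, best)
```

In state 1 alone the query is equally similar to both library entries, whereas the augmented profiles separate them, which is the behavior the concatenation is intended to provide.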
5.3.2 ILLUSTRATIVE DRUG DISCOVERY APPLICATIONS
Robust discrimination of augmented profiles has wide applicability to several aspects of drug discovery as outlined in the following sections.
5.3.2.1 DRUG CANDIDATE LEAD SELECTION
The methods of the present invention have applicability to the field of drug candidate lead selection. In many drug discovery efforts, a target enzyme will be screened against a large library of proprietary and/or nonproprietary compounds. Such a screening effort is referred to as a primary assay. Primary assays are often reduced to a robotic format in which thousands of compounds are screened per day. These efforts will result in a large number of compounds that produce the desired activity, which is typically the inhibition of the activity of a selected target enzyme.
Compounds that are successful in the primary assay are typically called hits or leads. Hits from the primary assay are typically screened in appropriately designed secondary assays. While the format of the secondary assay may vary depending on the scope of the drug discovery project, a typical secondary assay includes the dose response of a compound on whole cells. Thus in such a cell-based assay, the presence of some cellular constituent, such as TNF secretion, is measured as the cells are incubated in increasing concentrations of test compound.
In order to measure the suitability of a test compound, secondary assays are typically used to compare the activity of hits from the primary assay with the activity of some reference compound. The reference compound may be one that has proven efficacy in the appropriate clinical setting, a known drug or simply a prior lead. Comparison of newly developed compounds against the active reference compound serves as an excellent tool for measuring progress and for determining what is to be expected of new compounds.
In one aspect, the methods of the present invention will serve as an improved secondary assay. Accordingly, the effect of dosing an appropriate cell line with a reference compound can be compared to the effect of dosing the same cell line with each of the hits from the primary screening assay. In this embodiment, appropriate cellular constituents of the cell line can be measured using any of the techniques described in this specification or known in the art. Further, these measurements can be done when the cell line is placed in a variety of different initial biological states. For example, cell response profiles can be measured when the reference compound has been contacted with the cells after they have been cultured in a variety of cell culture densities, temperatures, or other culture conditions.
Each of these response profiles is combined to form a reference augmented profile.
Similar augmented profiles are created for each of the hits from the primary screening assay and these augmented profiles are compared with the reference profile. By comparing augmented profiles generated from each compound of interest rather than individual response or projected profiles, subtle differences between the effects of each test compound can be detected. Even small changes in cellular constituents associated with a known toxicity or a desired physiologic event will become statistically meaningful using the methods of this invention.
5.3.2.2 DRUG CANDIDATE VALIDATION
Often in the drug discovery process, a potential drug candidate will exhibit excellent activity in the primary in vitro assay and secondary cell-based assays. Even if a compound is successful in both primary and secondary assays, there remains a need to validate the compound. Compound validation addresses the difficult issue of verifying that a test compound was successful in the primary and secondary assays because of selective effects on the desired target rather than unselective effects on multiple physiological processes.
Compounds that selectively affect the desired target are preferred over compounds that unselectively affect a wide variety of cellular constituents. For example, a compound that is excessively hydrophobic may bind to the target enzyme by unselective hydrophobic interactions. The problem with such an excessively hydrophobic compound is that it is likely to unselectively bind and/or inhibit several cellular constituents as well.
Compounds that nonselectively inhibit all enzymes in a class are also undesirable. For example, in addition to inhibiting a target kinase of interest, a nonselective kinase inhibitor such as staurosporine will bind and inhibit dozens of kinases. A test compound may perform well in the secondary assay because it is toxic to the cells or because the compound knocked out a biological pathway that is unrelated to the biological pathway of interest.
The methods of the present invention provide improved means for validating test compounds in a drug discovery effort. In this embodiment of the invention, augmented profiles (reference augmented profiles) based on the compounds that have a known effect on a biological sample are compared with augmented profiles generated from compounds that need validation. For example, reference compounds that have a general toxic effect on the biological sample will have distinct augmented profiles. Thus a low correlation between such reference toxic compounds and test compounds of interest is desired.
Similarly, a high correlation between an augmented profile derived from a previously validated compound and a test compound would indicate that the test compound is selectively influencing the proper biological pathway. A previously validated compound may be obtained from animal trials or from prior scientific publications.
5.3.2.3 DRUG REGIMEN OPTIMIZATION IN A VARIETY OF PATIENT POPULATIONS
The methods of the present invention provide improved methods for optimizing drug regimens in a variety of patient populations. In one embodiment, augmented profiles developed from biological samples obtained from a patient can be compared with reference augmented profiles that represent model drug responses of patients with favorable clinical outcome. Data derived from such comparisons would then be used to optimize a particular drug regimen, thus maximizing the effectiveness of drug treatment and reducing its costs in terms of response time and financial expenditure.
The augmented profiles taken from patients can also be used to discover unsatisfactory therapeutic responses caused by inadequate drug exposure or undesirable side-effects before they manifest in unfavorable symptoms. Robust augmented profile comparison can also be used to detect poor compliance with a dosage regimen.
In another embodiment, regular comparison of augmented profiles can be used to detect and monitor interactions with co-ingested medications or the effects of changes in the physical status of the patient.
5.3.3 ILLUSTRATIVE DIAGNOSTIC APPLICATION
5.3.3.1 PREVENTIVE HEALTH CARE
Because of its improved ability at measuring the subtle effects of a perturbation on a biological sample, comparison of augmented profiles will provide an invaluable service in the field of preventive health care. In one embodiment of the invention, biological samples are obtained from subjects on a routine basis over time. Augmented profiles are developed based upon these biological samples. Comparison of these augmented profiles to a database that includes several model disease states provides advance warning that the subject has a particular disease before the disease manifests itself in any outward clinical symptoms.
Such a diagnostic tool is particularly valuable in diseases such as cancer because early treatment leads to improved chances of recovery and/or survival. Appropriately chosen augmented profile comparisons will also provide useful information on health risks in a subject. Thus appropriately designed augmented profiles will be used to determine if a patient should alter their diet, exercise more, take certain vitamins, or alter other behavioral aspects. As the database of reference augmented profiles is enriched, the utility of the robust profile compariso

Claims

WHAT IS CLAIMED IS:

1. A method for determining a degree of similarity between an effect of a first perturbation and an effect of a second perturbation on a biological sample, the method comprising:
(a) determining a first set of constituent profiles, each constituent profile of said first set is determined using a different one of a plurality of initial states of said biological sample by measuring a response of said biological sample to said first perturbation when said biological sample is in said different one of said initial states;
(b) determining a second set of constituent profiles, each constituent profile of said second set determined using a different one of said plurality of initial states by measuring a response of said biological sample to said second perturbation when said biological sample is in said different one of said initial states;
(c) combining said first set of constituent profiles into a first augmented profile;
(d) combining said second set of constituent profiles into a second augmented profile; and (e) comparing said first augmented profile with said second augmented profile to determine said degree of similarity.

2. A method for determining a degree of similarity between an effect of a first perturbation and an effect of a second perturbation on a biological sample, the method comprising:
(a) combining a first set of constituent profiles into a first augmented profile;
each constituent profile in said first set determined by:
a different one of a plurality of initial states of said biological sample wherein a response of said biological sample to said first perturbation when said biological sample is in said different one of said initial states is measured;
(b) combining a second set of constituent profiles into a second augmented profile; each constituent profile of said second set determined by:
a different one of said plurality of initial states of said biological sample;
wherein a response of said biological sample to said second perturbation when said biological sample is in said different one of said initial states is measured; and (c) comparing said first augmented profile with said second augmented profile to determine said degree of similarity.

3. A method for determining a degree of similarity between an effect of a first perturbation and an effect of a second perturbation on a biological sample by comparing a first augmented profile with a second augmented profile to determine said degree of similarity; wherein:
(i) said first augmented profile is determined by combining a first set of constituent profiles; each constituent profile of said first set determined with a different one of a plurality of initial states of said biological sample by measuring a response of said biological sample to said first perturbation when said biological sample is in said different one of said initial states; and (ii) said second augmented profile is determined by combining a second set of constituent profiles; each constituent profile of said second set is determined with said different one of said plurality of initial states of said biological sample by measuring a response of said biological sample to said second perturbation when said biological sample is in said different one of said initial states.

4. The method of claim 1, 2 or 3 wherein each said initial state is different.

5. The method of claim 1, 2 or 3 wherein two or more of said initial states are the same.

6. The method of claim 1, 2 or 3 wherein at least one constituent profile in said first set of constituent profiles is a first response profile and at least one constituent profile in said second set of constituent profiles is a second response profile, wherein said first response profile is determined by at least one measurement of at least one cellular constituent in said biological sample when said biological sample is in an initial state selected from said plurality of initial states, and said second response profile is determined by at least one measurement of at least one said cellular constituent in said biological sample when said biological sample is in said initial state.

7. The method of claim 6 wherein said first response profile and said second response profile are determined by said initial state of said biological sample at a time when said measurements are made.

8. The method of claim 1, 2 or 3 wherein at least one constituent profile in said first set of constituent profiles is a first projected profile and at least one constituent profile in said second set of constituent profiles is a second projected profile, wherein said first and said second projected profile each contain a plurality of cellular constituent set values derived according to a definition of co-varying cellular constituent sets.

9. The method of claim 8 wherein said first projected profile and said second projected profile are determined by an initial state selected from said plurality of initial states.

10. The method of claim 8 wherein said definition is based upon co-variation of said cellular constituents under a plurality of different perturbations.

11. The method of claim 8 wherein said definition of co-varying cellular constituent sets is defined by a similarity tree derived by a cluster analysis of said cellular constituents under said plurality of perturbations.

12. The method of claim 11 wherein said co-varying cellular constituent sets are defined as branches of said similarity tree.
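
Claims 8 through 12 describe projected profiles whose elements are values for co-varying cellular constituent sets. The sketch below is one possible reading, not the claimed procedure: it assumes the sets are already defined (for example, as branches of a similarity tree) and are supplied as lists of element indices, and it assumes each set value is the arithmetic mean of its members. The function name `project` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def project(profile: Sequence[float],
            constituent_sets: Sequence[Sequence[int]]) -> List[float]:
    """Reduce a constituent profile to one value per co-varying set.

    ASSUMPTION: each set value is the mean of the member elements; the
    claims do not state how set values are computed.
    """
    projected = []
    for members in constituent_sets:
        if not members:
            raise ValueError("each constituent set needs at least one member")
        projected.append(sum(profile[i] for i in members) / len(members))
    return projected


# Hypothetical profile of five cellular constituents grouped into three sets.
profile = [1.0, 3.0, -2.0, -4.0, 0.5]
sets = [[0, 1], [2, 3], [4]]
projected_profile = project(profile, sets)
```

Projected profiles built this way are shorter than the underlying response profiles, and they can be combined into an augmented profile in the same manner as any other constituent profile.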

13. The method of claim 1, 2, or 3 wherein said biological sample is an organism having a cell wall and at least one initial state selected from said plurality of initial states is determined by altering said biological sample in a manner that alters said cell wall permeability.

14. The method of claim 1, 2, or 3 wherein said biological sample is a cell line.

15. The method of claim 14 wherein said biological sample is substantially isogenic to Saccharomyces cerevisiae.

16. The method of claim 14 wherein said cell line expresses a macromolecule that has an ability to act as a drug efflux pump and an initial state that is selected from said plurality of initial states is determined by a mutant activity of said macromolecule in said cell line.

17. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by a first set of culture growth conditions and a second initial state that is selected from said plurality of initial states is determined by a second set of culture growth conditions, wherein said first culture growth conditions and said second culture growth conditions vary by an amount of a component of said culture growth conditions.

18. The method of claim 17 wherein said component of said culture growth conditions is an amount of a nutrient that is necessary for viability of said cell line.

19. The method of claim 17 wherein said component of said culture growth conditions is an amount of a trace element.

20. The method of claim 17 wherein said component is selected from the group consisting of iron, manganese, zinc, copper, molybdenum, boron, chlorine, calcium, sodium, chromium, potassium, magnesium, and selenium.

21. The method of claim 17 wherein said component of said culture growth conditions is an incubation temperature.

22. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by a culture growth density of said cell line and a second initial state that is selected from said plurality of initial states is determined by a second culture growth density of said cell line, wherein said first culture growth density and said second culture growth density vary by an amount.

23. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by a first amount of a pharmacological agent that is contacted with said biological sample and a second initial state that is selected from said plurality of initial states is determined by a second amount of a pharmacological agent that is contacted with said biological sample.

24. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by incubating said cell line on a surface.

25. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by incubating said cell line in a liquid.

26. The method of claim 14 wherein said biological sample is incubated in a container and a first initial state that is selected from said plurality of initial states is determined by the container that said biological sample is incubated in and the container is selected from the group consisting of shaker flasks, culture plates, incubators, 96-well microtiter plates, and 384-well microtiter plates.

27. The method of claim 1, 2, or 3 wherein a first initial state that is selected from said plurality of initial states is determined by a genetic feature of said biological sample.

28. The method of claim 27 wherein the biological sample is substantially isogenic to Saccharomyces cerevisiae having a genome; and a first initial state, which is selected from said plurality of initial states, is determined by a genetic feature selected from the group consisting of a haploid state of said genome, a diploid state of said genome, a heterozygous state of a gene included in said genome, a homozygous state of a gene included in said genome, a mutation of a gene included in said genome, a deletion of a portion of a gene from said genome, an alteration of a regulatory sequence of a gene in said genome, an exogenous gene integrated into said genome and an exogenous oligonucleotide integrated into said genome.

29. The method of claim 27 wherein said biological sample is a cell line having a genome;
wherein a first initial state is selected from said plurality of initial states; wherein said first initial state is determined by a genetic feature selected from the group consisting of a heterozygous state of a gene included in said genome, a homozygous state of a gene included in said genome, a mutation of a gene included in said genome, a deletion of a portion of a gene from said genome, an alteration of a regulatory sequence of a gene in said genome, an exogenous gene integrated into said genome of said cell line, and an exogenous oligonucleotide integrated into said genome.

30. The method of claim 29 wherein a second initial state that is selected from said plurality of initial states is determined by contacting said biological sample with an amount of a composition; wherein said composition comprises a pharmacological agent, an endogenous hormone, a growth factor, a peptide, or an oligonucleotide.

31. The method of claim 14 wherein a first initial state that is selected from said plurality of initial states is determined by a state of a biological pathway; wherein said biological pathway is selected from a compendium of biological pathways present in said cell line.

32. The method of claim 31 wherein said biological sample is substantially isogenic to Saccharomyces cerevisiae and said biological pathway is a mating pathway.

33. The method of claim 1, 2, or 3 wherein said first perturbation is a first amount of a first pharmacological agent that is contacted with said biological sample.

34. The method of claim 33 wherein said second perturbation is a second amount of said first pharmacological agent that is contacted with said biological sample, wherein said first and said second amount of said first pharmacological agent are different.

35. The method of claim 33 wherein said second perturbation is an amount of a second pharmacological agent that is contacted with said biological sample.

36. The method of claim 1, 2, or 3 wherein said biological sample includes a genome and said first perturbation is determined by the introduction of an exogenous gene into said genome.

37. The method of claim 1, 2, or 3 wherein said biological sample includes a genome and said first perturbation includes a deletion of at least a substantial portion of one gene in said genome.

38. The method of claim 1, 2, or 3 wherein said first perturbation is a method, the method comprising: contacting said biological sample with an agent selected from the group consisting of a hormone, a drug, a peptide, an oligonucleotide, a mineral, a composition of media, a phage, a trace element, a salt, a colony stimulating factor, and a source of irradiation.

39. The method of claim 1, 2, or 3 wherein said first perturbation is a method, the method comprising: contacting an amount of an organic compound that has a molecular weight less than 1000 Daltons with said biological sample.

40. The method of claim 1, 2, or 3 wherein said first set of constituent profiles is combined into said first augmented profile by concatenating said first set of constituent profiles and said second set of constituent profiles is combined into said second augmented profile by concatenating said second set of constituent profiles.
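
The concatenation recited in claim 40 can be illustrated with a short sketch. It assumes each constituent profile is a list of numbers, one per measured cellular constituent, and that the profiles are supplied in a fixed order of initial states; the function name `augment` and the example values are hypothetical.

```python
from typing import List, Sequence


def augment(constituent_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate constituent profiles end to end into one augmented profile."""
    augmented: List[float] = []
    for profile in constituent_profiles:
        augmented.extend(profile)
    return augmented


# Hypothetical responses of three cellular constituents to one perturbation,
# measured with the sample in two different initial states.
first_set = [[0.5, -1.2, 0.0], [0.7, -0.9, 0.1]]
first_augmented = augment(first_set)
```

Two augmented profiles are only comparable element by element if both sets list the same initial states in the same order, so a practical implementation would likely enforce that ordering.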

41. The method of claim 1, 2, or 3 wherein said first augmented profile is:
$P_i = [P^{i}_{1}; \ldots; P^{i}_{N}]$
wherein, $P_i$ is said first augmented profile;
$P^{i}_{1}$ is a first constituent profile in said first set of constituent profiles that is determined by measuring a response of said biological sample to said first perturbation when said biological sample is in said first biological state;
$P^{i}_{N}$ is an $N$th constituent profile in said first set of constituent profiles that is determined by measuring a response of said biological sample to said first perturbation when said biological sample is in an $N$th biological state selected from said plurality of initial states; and said second augmented profile is:
$P_j = [P^{j}_{1}; \ldots; P^{j}_{N}]$
wherein, $P_j$ is said second augmented profile;
$P^{j}_{1}$ is a first constituent profile in said second set of constituent profiles that is determined by measuring a response of said biological sample to said second perturbation when said biological sample is in said first biological state;
$P^{j}_{N}$ is an $N$th constituent profile in said second set of constituent profiles that is determined by measuring a response of said biological sample to said second perturbation when said biological sample is in an $N$th biological state selected from said plurality of initial states;
and $N$ is the number of states in said plurality of initial states; and said step of comparing said first augmented profile with said second augmented profile to determine said correlation is performed by comparing $P_i$ to $P_j$ using a quantitative measure of similarity.

42. The method of claim 41 wherein said quantitative measure of similarity is a generalized dot product:
$r_{ij} = P_i \cdot P_j / (|P_i|\,|P_j|)$ wherein $\cdot$ denotes dot product, $|\ |$ denotes vector norm and $r_{ij}$ denotes similarity.
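
The generalized dot product of claim 42 is the dot product of the two augmented profiles divided by the product of their vector norms. The sketch below assumes the Euclidean norm and equal-length numeric profiles; the function name `similarity` is hypothetical, and the handling of all-zero profiles is a design choice made here, not something the claim specifies.

```python
import math
from typing import Sequence


def similarity(p_i: Sequence[float], p_j: Sequence[float]) -> float:
    """Return the dot product of p_i and p_j divided by the product of their norms."""
    if len(p_i) != len(p_j):
        raise ValueError("augmented profiles must have the same length")
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm_i = math.sqrt(sum(a * a for a in p_i))
    norm_j = math.sqrt(sum(b * b for b in p_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # ASSUMPTION: an all-zero profile is treated as an error.
        raise ValueError("similarity is undefined for an all-zero profile")
    return dot / (norm_i * norm_j)
```

With this measure, profiles that differ only by a positive scale factor score 1, profiles with opposite responses score -1, and profiles with no shared responding elements score 0.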

43. The method of claim 41 wherein said quantitative measure of similarity is derived from Shannon mutual information theory.
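
Claim 43 does not state how a similarity measure is derived from mutual information. As one possible illustration only, the sketch below computes the plug-in estimate of the mutual information, in bits, between two profiles whose elements have already been reduced to discrete symbols such as -1, 0 and 1. The discretization, the estimator, and the function name `mutual_information` are all assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits for two discretized profiles."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # counts of co-occurring symbol pairs
    count_x = Counter(x)         # marginal counts for the first profile
    count_y = Counter(y)         # marginal counts for the second profile
    total = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written in terms of counts
        total += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return total
```

Unlike the normalized dot product, this quantity is also high for perfectly anti-correlated profiles, because it measures statistical dependence rather than the direction of the response.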

44. The method of claim 1, 2 or 3 wherein each constituent profile includes a plurality of elements, each element representing an amount of a cellular constituent in said biological sample.

45. The method of claim 44 wherein each said element of at least one constituent profile in said first set and each said element of at least one constituent profile in said second set is assigned a "-1", if said element exceeds a negative threshold, "1", if said element exceeds a positive threshold, and "0", if said element does not exceed said positive and said negative threshold;
and said positive threshold corresponds to a first amount of one or more cellular constituents in said biological sample and said negative threshold corresponds to a second amount of one or more cellular constituents in said biological sample.
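
The three-level assignment of claim 45 can be sketched as follows. The sketch reads "exceeds a negative threshold" as "is more negative than that threshold", which is an interpretation; the function name `discretize` and the threshold values in the example are hypothetical.

```python
from typing import List, Sequence


def discretize(profile: Sequence[float],
               positive_threshold: float,
               negative_threshold: float) -> List[int]:
    """Map each element to 1, -1 or 0 according to the two thresholds."""
    levels = []
    for value in profile:
        if value > positive_threshold:
            levels.append(1)     # response above the positive threshold
        elif value < negative_threshold:
            levels.append(-1)    # response below the negative threshold
        else:
            levels.append(0)     # response within the two thresholds
    return levels


# Hypothetical response values with thresholds of +0.5 and -0.5.
levels = discretize([0.8, -0.1, -0.9, 0.3], 0.5, -0.5)
```

Discretized profiles of this kind discard the magnitude of each response and keep only its direction, which can reduce the influence of measurement noise near zero.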

46. The method of claim 44 wherein each said cellular constituent is independently selected from the group consisting of a gene expression level, an amount of an mRNA encoding a gene, an amount of a protein, an amount of an enzymatic activity, an amount of an epitope presented by a macromolecule, an amount of a divalent cation, an amount of a phosphorylated protein, an amount of a dephosphorylated protein, an amount of a hormone, and an amount of a peptide.

47. The method of claim 1, 2, or 3 wherein each said initial state of said biological sample is provided by selecting said biological sample at a different time.

48. The method of claim 1, 2, or 3 wherein said second set of constituent profiles represents a baseline state of said biological sample.

49. The method of claim 1, 2, or 3 wherein said second perturbation is wild-type activity and said second set of constituent profiles represents a wild-type state of said biological sample.

50. A method of determining an effect of a first perturbation on a subject, the method comprising:
(a) determining a plurality of augmented profiles; each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by obtaining a biological sample from said subject at a different time; and each constituent profile in said constituent profile set is determined by measuring a biological response of said biological sample to a different second perturbation selected from a plurality of perturbations;
and (b) comparing said plurality of augmented profiles to determine said effect of said first perturbation on said subject.

51. The method of claim 50, wherein said first perturbation is selected from the group consisting of a diseased state, introduction of an exogenous gene into the genome of said subject, and a behavioral health risk.

52. A method of determining an effect of a first perturbation on a subject, the method comprising:
(a) determining a plurality of augmented profiles; each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by obtaining a biological sample from said subject at a different stage of an environmental insult; and each constituent profile in said constituent profile set is determined by measuring a biological response of said biological sample to a different second perturbation selected from a plurality of perturbations;
and (b) comparing said plurality of augmented profiles to determine said effect of said first perturbation on said subject.

53. The method of claim 52, wherein the environmental insult is a disease that has afflicted said subject.

54. The method of claim 50 or 52 wherein a first constituent profile set in said plurality of constituent profile sets represents a baseline state and all other constituent profile sets in said plurality of constituent profile sets are expressed as a ratio of said first constituent profile set.

55. The method of claim 50 or 52 wherein a first constituent profile set in said plurality of constituent profile sets represents a baseline state and all other constituent profile sets in said plurality of constituent profile sets are expressed as a logarithmic ratio of said first constituent profile set.
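
Claims 54 and 55 express later measurements relative to a baseline, either as a ratio or as a logarithmic ratio. A minimal sketch for a single profile follows. It assumes strictly positive abundance values and a base-10 logarithm; the claims do not specify the base, and the function name `log_ratio` and the example abundances are hypothetical.

```python
import math
from typing import List, Sequence


def log_ratio(profile: Sequence[float],
              baseline: Sequence[float]) -> List[float]:
    """Return log10(profile / baseline), element by element."""
    if len(profile) != len(baseline):
        raise ValueError("profile and baseline must have the same length")
    ratios = []
    for value, reference in zip(profile, baseline):
        if value <= 0.0 or reference <= 0.0:
            # ASSUMPTION: abundances must be positive for the log to exist.
            raise ValueError("abundances must be strictly positive")
        ratios.append(math.log10(value / reference))
    return ratios


# Hypothetical abundances at a later time point versus the baseline sample.
later_vs_baseline = log_ratio([200.0, 50.0, 100.0], [100.0, 100.0, 100.0])
```

On a logarithmic scale a two-fold increase and a two-fold decrease have equal magnitude and opposite sign, and an unchanged constituent maps to zero.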

56. The method of claim 50 or 52 wherein said first perturbation is a drug that is taken by said subject at regular intervals.

57. A method of determining a biological state of a first subject, the method comprising:
(a) determining a first set of constituent profiles, each constituent profile of said first set being determined by measuring a response of a biological sample derived from said first subject to a perturbation at a different time;
(b) determining a second set of constituent profiles, each constituent profile of said second set being determined by measuring a response of a second biological sample, which is derived from a second subject having a known biological state, to said perturbation at a different time;
(c) combining said first set of constituent profiles into a first augmented profile;
(d) combining said second set of constituent profiles into a second augmented profile; and (e) comparing said first augmented profile with said second augmented profile to predict the biological state of said first subject.

58. A method of diagnosing a disease state in a subject, the method comprising:
(a) determining a first set of constituent profiles, each constituent profile of said first set being determined by measuring a response of a biological sample obtained from said subject to a different perturbation selected from a plurality of perturbations;
(b) combining said first set of constituent profiles into a first augmented profile; and (c) comparing said first augmented profile with a library of augmented profiles, wherein each augmented profile in said library of augmented profiles is derived from a different biological sample with a known biological state, to diagnose said disease state.

59. The method of claim 58 wherein the comparing step includes the step of clustering said library into groups based upon similarities to said first augmented profile.
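
The library comparison of claims 58 and 59 can be illustrated with a simplified sketch. The claims do not name a clustering algorithm, so a single similarity cutoff stands in for the clustering step here. The normalized dot product, the cutoff of 0.8, the library labels, and the function names are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalized dot product of two equal-length, non-zero profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def group_library(query: Sequence[float],
                  library: Dict[str, Sequence[float]],
                  cutoff: float = 0.8) -> Dict[str, List[str]]:
    """Split library entries into two groups by similarity to the query."""
    groups: Dict[str, List[str]] = {"similar": [], "dissimilar": []}
    for label, profile in library.items():
        key = "similar" if cosine(query, profile) >= cutoff else "dissimilar"
        groups[key].append(label)
    return groups


# Hypothetical library of augmented profiles with known biological states.
library = {
    "state_A": [1.0, 0.0, 1.0],
    "state_B": [-1.0, 0.5, -1.0],
    "state_C": [0.9, 0.1, 1.1],
}
groups = group_library([1.0, 0.0, 1.0], library)
```

The known biological states of the entries in the "similar" group would then be candidate diagnoses for the subject; a fuller implementation might use hierarchical clustering instead of a fixed cutoff.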

60. A method of drug discovery, the method comprising the steps of:
(a) determining a plurality of augmented profiles; each augmented profile being determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by use of a test compound; and each constituent profile in said constituent profile set is determined by contacting said test compound with a cell line that is in a different biological state selected from a plurality of biological states;
and (b) comparing said plurality of augmented profiles to determine the effect of said test compound on said cell line.

61. A method of determining a biological state of a first subject, the method comprising comparing a first augmented profile with a second augmented profile to predict said biological state of said first subject wherein said first and said second augmented profile are derived by (a) determining a first set of constituent profiles, each constituent profile of said first set is determined by measuring a response of a biological sample derived from said first subject to a perturbation at a different time;
(b) determining a second set of constituent profiles, each constituent profile of said second set is determined by measuring a response of a second biological sample, which is derived from a second subject having a known biological state, to said perturbation at a different time;
(c) combining said first set of constituent profiles into a first augmented profile; and (d) combining said second set of constituent profiles into a second augmented profile.

62. A method of diagnosing a disease state in a subject, the method comprising comparing a first augmented profile with a library of augmented profiles, wherein each augmented profile in said library of augmented profiles is derived from a different biological sample with a known biological state and said first augmented profile is derived by (a) determining a first set of constituent profiles, each constituent profile of said first set of constituent profiles is determined by measuring a response of a biological sample obtained from said subject to a different perturbation selected from a plurality of perturbations; and (b) combining said first set of constituent profiles to derive said first augmented profile.

63. A computer system for determining a degree of similarity between an effect of a first perturbation and an effect of a second perturbation on a biological system, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising:
(a) combining a first set of constituent profiles into a first augmented profile;
each constituent profile in said first set determined by:
a different one of a plurality of initial states of said biological system wherein a response of said biological system to said first perturbation when said biological system is in said different one of said initial states is measured;

(b) combining a second set of constituent profiles into a second augmented profile; each constituent profile of said second set determined by:
a different one of said plurality of initial states of said biological sample; wherein a response of said biological sample to said second perturbation when said biological sample is in said different one of said initial states is measured; and (c) comparing said first augmented profile with said second augmented profile to determine said degree of similarity.

64. A computer system for determining a degree of similarity between an effect of a first perturbation and an effect of a second perturbation on a biological sample, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method that comprises comparing a first augmented profile with a second augmented profile to determine said degree of similarity;
wherein:
(i) said first augmented profile is determined by combining a first set of constituent profiles; each constituent profile of said first set determined with a different one of a plurality of initial states of said biological sample by measuring a response of said biological sample to said first perturbation when said biological sample is in said different one of said initial states;
and (ii) said second augmented profile is determined by combining a second set of constituent profiles; each constituent profile of said second set is determined with said different one of said plurality of initial states of said biological sample by measuring a response of said biological sample to said second perturbation when said biological sample is in said different one of said initial states.

65. A computer system for determining an effect of a first perturbation on a subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising:
(a) determining a plurality of augmented profiles; each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by obtaining a biological sample from said subject at a different time; and each constituent profile in said constituent profile set is determined by measuring a biological response of said biological sample to a different second perturbation selected from a plurality of perturbations;
and (b) comparing said plurality of augmented profiles to determine said effect of said first perturbation on said subject.

66. A computer system for determining an effect of a first perturbation on a subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising:
(a) determining a plurality of augmented profiles; each augmented profile determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by obtaining a biological sample from said subject at a different stage of an environmental insult; and each constituent profile in said constituent profile set is determined by measuring a biological response of said biological sample to a different second perturbation selected from a plurality of perturbations;
and (b) comparing said plurality of augmented profiles to determine said effect of said first perturbation on said subject.

67. A computer system for determining a biological state of a first subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising:
(a) determining a first set of constituent profiles, each constituent profile of said first set being determined by measuring a response of a biological sample derived from said first subject to a perturbation at a different time;
(b) determining a second set of constituent profiles, each constituent profile of said second set being determined by measuring a response of a second biological sample, which is derived from a second subject having a known biological state, to said perturbation at a different time;
(c) combining said first set of constituent profiles into a first augmented profile;

(d) combining said second set of constituent profiles into a second augmented profile; and (e) comparing said first augmented profile with said second augmented profile to predict the biological state of said first subject.

68. A computer system for diagnosing a disease state in a subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising:
(a) determining a first set of constituent profiles, each constituent profile of said first set being determined by measuring a response of a biological sample obtained from said subject to a different perturbation selected from a plurality of perturbations;
(b) combining said first set of constituent profiles into a first augmented profile; and (c) comparing said first augmented profile with a library of augmented profiles, wherein each augmented profile in said library of augmented profiles is derived from a different biological sample with a known biological state, to diagnose said disease state.

69. A computer system for advancing drug discovery, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising the steps of:
(a) determining a plurality of augmented profiles; each augmented profile being determined by combining a constituent profile set selected from a plurality of constituent profile sets wherein:
each said constituent profile set in said plurality of constituent profile sets is determined by use of a test compound; and each constituent profile in said constituent profile set is determined by contacting said test compound with a cell line that is in a different biological state selected from a plurality of biological states;
and (b) comparing said plurality of augmented profiles to determine the effect of said test compound on said cell line.

70. A computer system for determining a biological state of a first subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising comparing a first augmented profile with a second augmented profile to predict said biological state of said first subject wherein said first and said second augmented profile are derived by:
(a) determining a first set of constituent profiles, each constituent profile of said first set is determined by measuring a response of a biological sample derived from said first subject to a perturbation at a different time;
(b) determining a second set of constituent profiles, each constituent profile of said second set is determined by measuring a response of a second biological sample, which is derived from a second subject having a known biological state, to said perturbation at a different time;
(c) combining said first set of constituent profiles into a first augmented profile; and (d) combining said second set of constituent profiles into a second augmented profile.

71. A computer system for diagnosing a disease state in a subject, the computer system comprising a processor, and a memory encoding one or more programs coupled to the processor, wherein the one or more programs cause the processor to perform a method comprising comparing a first augmented profile with a library of augmented profiles, wherein each augmented profile in said library of augmented profiles is derived from a different biological sample with a known biological state and said first augmented profile is derived by:
(a) determining a first set of constituent profiles, each constituent profile of said first set of constituent profiles is determined by measuring a response of a biological sample obtained from said subject to a different perturbation selected from a plurality of perturbations; and (b) combining said first set of constituent profiles to derive said first augmented profile.